Bacterial isolates
The microbiology laboratory of Ibn Rochd University Hospital Centre (IR-UHC) of Casablanca carries out surveillance of invasive and non-invasive pneumococcal infections in children ≤5 years [8]. All pneumococcal strains were isolated and identified according to standard bacteriology procedures. Serogrouping was done by the checkerboard method with Pneumotest-latex (Statens Serum Institut, Copenhagen, Denmark). Serotyping was performed by the Quellung capsular swelling reaction using Statens Serum Institut antisera (Copenhagen, Denmark).
Antibiotic susceptibility tests were performed on Mueller-Hinton agar supplemented with 5% sheep blood (BioMérieux, Marcy-l'Etoile, France) and interpreted according to the Clinical and Laboratory Standards Institute (CLSI, 2012) recommendations [35]. Oxacillin (1 µg) was used to screen for penicillin non-susceptible S. pneumoniae. Erythromycin, chloramphenicol, clindamycin, vancomycin, cotrimoxazole, rifampicin, tetracycline and levofloxacin were tested by the disc diffusion method. The MICs of penicillin G and ceftriaxone were determined by the E-test method (Oxoid, Basingstoke, UK) on the same medium. The breakpoints recommended by CLSI in 2012 were used for interpretation, given here as susceptible and resistant thresholds: ≤0.06 µg/ml and ≥2 µg/ml for penicillin; for ceftriaxone, ≤0.5 µg/ml and ≥2 µg/ml for meningeal isolates and ≤1 µg/ml and ≥4 µg/ml for non-meningeal isolates. Quality control was conducted using S. pneumoniae ATCC 49619.
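The interpretation rule described above can be sketched in a few lines of Python. This is only an illustration of how a MIC maps to a category using the breakpoints quoted in the text; the function name, the dictionary keys and the "S/I/R" labels are assumptions introduced here, not part of the laboratory workflow.

```python
# Breakpoints in µg/ml as quoted in the text: (susceptible if <=, resistant if >=).
# The dictionary keys are hypothetical labels chosen for this sketch.
BREAKPOINTS = {
    "penicillin": (0.06, 2.0),
    "ceftriaxone_meningeal": (0.5, 2.0),
    "ceftriaxone_non_meningeal": (1.0, 4.0),
}


def interpret_mic(mic, key):
    """Return 'S', 'I' or 'R' for a MIC value (µg/ml) under the given breakpoint pair."""
    susceptible_max, resistant_min = BREAKPOINTS[key]
    if mic <= susceptible_max:
        return "S"
    if mic >= resistant_min:
        return "R"
    # Values strictly between the two thresholds are treated as intermediate.
    return "I"


if __name__ == "__main__":
    for value in (0.03, 1.0, 2.0):
        print(value, interpret_mic(value, "ceftriaxone_non_meningeal"))
```

Keeping the thresholds in a single table makes it easy to see that the same ceftriaxone MIC can fall into different categories depending on whether the isolate is meningeal.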
From 2007 to 2014, nine strains (invasive and non-invasive) of serotype 1 were isolated from children under five years old. Three of them were lost. Six isolates (5 invasive and 1 non-invasive) of S. pneumoniae serotype 1 causing infections among children under 5 years were randomly selected from the data bank of the microbiology laboratory of IR-UHC of Casablanca for whole genome sequencing (WGS) analysis. All isolated strains were stored in brain heart infusion broth with 15% glycerol at −80 °C.
Bacterial DNA preparation and whole genome sequencing
The genomic DNA of the six selected strains was purified with the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany) following the manufacturer's recommendations. DNA quality and quantity were estimated by measuring the absorbance of each sample at 260 nm and 280 nm with a NanoVue™ Plus Spectrophotometer (GE Healthcare UK Limited, UK) following the manufacturer's instructions. Extracted DNA was stored at −20 °C. The six strains were whole-genome sequenced on an Illumina HiSeq 2500 platform at the Wellcome Trust Sanger Institute, as part of the PAGe project. Libraries were constructed using the Nextera XT DNA Library Preparation Kit and sequenced with the HiSeq Reagent Kit (paired-end reads of 150 bp).
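As an illustration of what the 260 nm and 280 nm readings are conventionally used for, the sketch below computes a concentration estimate and a purity ratio. It assumes the common conversion of one A260 unit to about 50 ng/µl of double-stranded DNA at a 1 cm path length; the conversion factor, the function names and the example readings are assumptions made here and are not stated in the text.

```python
# Conventional conversion factor for double-stranded DNA (assumption, 1 cm path length).
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration_ng_per_ul(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration (ng/µl) from the absorbance at 260 nm."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """Return the A260/A280 ratio; values near 1.8 are usually taken as clean DNA."""
    if a280 <= 0:
        raise ValueError("A280 must be positive to compute a ratio")
    return a260 / a280


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    print(dsdna_concentration_ng_per_ul(0.5))
    print(round(purity_ratio(0.9, 0.5), 2))
```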
Genome assembly and annotation
The quality of the reads generated by high-throughput sequencing was assessed using FastQC v0.11.8 [15]. After removal of adaptor sequences, the reads of each isolate were de novo assembled using SPAdes v3.11.1 [36] with the k-mer sizes automatically determined by the package. The resulting draft assemblies were annotated using Prokka (Prokaryotic annotation), which predicts genes based on available annotation information such as proteins and coding sequences (CDS) [37]. Average Nucleotide Identity (ANI), a whole-genome similarity metric, was used to investigate the relatedness among the isolate genomes.
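The read-to-annotation steps above can be laid out as a sequence of command lines. The Python sketch below only builds the argument lists and does not run anything; the sample name, file paths and directory layout are hypothetical, the adaptor-removal step is omitted because the text does not name the tool used, and the exact options the authors passed are not known.

```python
from pathlib import Path


def build_pipeline_commands(sample, reads_1, reads_2, workdir):
    """Return the FastQC, SPAdes and Prokka command lines for one isolate."""
    work = Path(workdir)
    qc_dir = work / "fastqc"            # FastQC expects this directory to exist
    asm_dir = work / "spades" / sample
    ann_dir = work / "prokka" / sample
    return [
        # Read quality report for both mates.
        ["fastqc", "--outdir", str(qc_dir), reads_1, reads_2],
        # De novo assembly; no -k option, so SPAdes chooses k-mer sizes itself.
        ["spades.py", "-1", reads_1, "-2", reads_2, "-o", str(asm_dir)],
        # Annotation of the contigs written by SPAdes.
        ["prokka", "--outdir", str(ann_dir), "--prefix", sample,
         str(asm_dir / "contigs.fasta")],
    ]


if __name__ == "__main__":
    # Hypothetical sample and file names.
    for cmd in build_pipeline_commands("isolate_01", "r1.fastq.gz", "r2.fastq.gz", "out"):
        print(" ".join(cmd))
```

Each list could be handed to `subprocess.run(cmd, check=True)` once the tools are installed; building the commands separately keeps the sketch inspectable without them.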
Recombination, phylogenetic and population structure analysis
In the study by Chaguza et al. [28], the phylogenetic analysis of the population structure of serotype 1 in Africa showed that all isolates were grouped into five distinct clades. From those clades, we selected a balanced set of 74 previously published public genomes of serotype 1 from nine African countries (Egypt, Ethiopia, Ghana, Malawi, Mozambique, Niger, Nigeria, South Africa and The Gambia) [28]. Data were retrieved from the European Nucleotide Archive (ENA) database (Additional file 1). Recombination was analyzed among the strains from Morocco and the 74 public genomes of serotype 1 using the Gubbins algorithm [38] over the core genome alignment generated by progressiveMauve [39], a software package that attempts to align orthologous and xenologous regions among genome sequences. First, we removed inconsistent alignment columns with trimAl [40] in all concatenated locally collinear blocks, and then Gubbins was run over the core genome alignment. To infer the phylogenetic relationships among the 80 isolates, Maximum Likelihood (ML) phylogenetic analyses were performed using RAxML v8.2.12 [41] on the core genome obtained with progressiveMauve, after filtering out recombinant regions with Gubbins, with 1000 bootstrap iterations. The clustering analysis was done with hierBAPS (a Bayesian clustering tool for population genetics).
Sequence types (STs) of the Moroccan S. pneumoniae isolates were determined from the sequences of seven housekeeping genes (aroE, gdh, gki, recP, spi, xpt, and ddl) obtained from the WGS results. Allelic numbers and STs were assigned using the pneumococcal Multilocus Sequence Typing (MLST) website (https://pubmlst.org/spneumoniae/).
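The MLST assignment step amounts to looking up a seven-allele profile in a table of known profiles. The Python sketch below shows that lookup under stated assumptions: the allele numbers and ST labels in the toy table are invented for illustration and do not correspond to real PubMLST entries, and the function names are hypothetical.

```python
# Locus order as listed in the text.
MLST_LOCI = ("aroE", "gdh", "gki", "recP", "spi", "xpt", "ddl")


def profile_key(alleles):
    """Turn a {locus: allele number} mapping into an ordered tuple."""
    return tuple(alleles[locus] for locus in MLST_LOCI)


def assign_st(alleles, profile_table):
    """Return the sequence type for an allelic profile, or None if it is not in the table."""
    return profile_table.get(profile_key(alleles))


if __name__ == "__main__":
    # Toy profile table with made-up allele numbers and labels.
    toy_table = {
        (1, 1, 1, 1, 1, 1, 1): "ST-toy-1",
        (1, 1, 1, 1, 1, 1, 2): "ST-toy-2",
    }
    isolate = dict(zip(MLST_LOCI, (1, 1, 1, 1, 1, 1, 2)))
    print(assign_st(isolate, toy_table))
```

A profile absent from the table returns `None`, which corresponds to a combination that would need to be submitted to the database as a potential new ST.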
Pangenome reconstruction
To reconstruct the pangenome of the whole dataset, all 80 assembled and annotated genomes (6 genomes from Morocco and 74 public genomes) were analyzed with Roary v3.11.2 [42]. Discriminant analysis of principal components (DAPC) [43] was used to investigate differences in the distribution of accessory genes among sublineages.
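To make the core versus accessory distinction concrete, the Python sketch below partitions genes from a presence/absence mapping. It is not Roary itself: the 99% core threshold is an assumption mirroring Roary's usual default, the threshold actually applied is not stated in the text, and the gene and genome names are hypothetical.

```python
def split_core_accessory(presence, n_genomes, core_fraction=0.99):
    """Split genes into core and accessory sets.

    presence maps each gene name to the set of genomes that carry it.
    A gene is called core when it occurs in at least core_fraction of genomes.
    """
    core, accessory = set(), set()
    for gene, genomes in presence.items():
        if len(genomes) >= core_fraction * n_genomes:
            core.add(gene)
        else:
            accessory.add(gene)
    return core, accessory


if __name__ == "__main__":
    # Toy example with four hypothetical genomes.
    toy = {
        "geneA": {"g1", "g2", "g3", "g4"},
        "geneB": {"g1", "g2", "g3"},
        "geneC": {"g4"},
    }
    core, accessory = split_core_accessory(toy, n_genomes=4)
    print(sorted(core), sorted(accessory))
```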
Finally, for each pair of genomes, we performed a linear regression between the Jaccard distance based on the accessory genes and the nucleotide diversity at synonymous sites of core genes, to provide insight into the adaptive evolution of the accessory genome. The analysis was done using the R package pagoo (https://github.com/iferres/pagoo), computing the Jaccard distance between each pair of organisms with the vegan::vegdist function [44] and the pairwise nucleotide diversity with the pegas::nuc.div function [45].
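The two quantities in this regression can be illustrated with a small, dependency-free Python sketch. It is a stand-in for the R functions named above, not a reimplementation of them: the Jaccard distance is computed on presence/absence sets, the fit is ordinary least squares, the Jaccard distance is assumed to be the response variable, and the diversity values in the example are hypothetical.

```python
from itertools import combinations


def jaccard_distance(genes_a, genes_b):
    """Jaccard distance between two sets of accessory genes."""
    union = genes_a | genes_b
    if not union:
        return 0.0
    return 1.0 - len(genes_a & genes_b) / len(union)


def pairwise_jaccard(accessory):
    """Return {(genome_i, genome_j): distance} for every pair of genomes."""
    return {
        (a, b): jaccard_distance(accessory[a], accessory[b])
        for a, b in combinations(sorted(accessory), 2)
    }


def linear_fit(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical accessory gene content for three genomes.
    accessory = {"g1": {"a", "b"}, "g2": {"b", "c"}, "g3": {"a", "b", "c"}}
    distances = pairwise_jaccard(accessory)
    # Hypothetical pairwise synonymous diversity values, same pair order.
    diversity = {("g1", "g2"): 0.004, ("g1", "g3"): 0.002, ("g2", "g3"): 0.001}
    pairs = sorted(distances)
    print(linear_fit([diversity[p] for p in pairs], [distances[p] for p in pairs]))
```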