Study design and participants
Infants were delivered in two suburban health centres (suburban arm) and three urban university hospitals (hospital arm) in the Abomey-Calavi, Sô-Ava and Cotonou districts of southern Benin, where malaria is hyper-endemic [43]. In both arms, only infants born to mothers living in the Abomey-Calavi district were recruited, to facilitate follow-up and minimize the effect of geographical origin. In the suburban arm, which included only normal gestations with low-risk deliveries (no maternal risk factors for infection), all consecutive births were included, whereas in the hospital arm only newborns of mothers with maternal-foetal risk factors for infection (prematurity, prolonged rupture of membranes, maternal fever) were included. In both arms, exclusion criteria were maternal HIV-positive status, major congenital malformation and refusal of consent. All children in both arms were followed clinically on a bi-monthly basis during the first 3 months of life. Follow-up consisted of scheduled home visits and unscheduled emergency visits if the infant was ill. The study protocol was approved by the local institutional review board (CER-ISBA 85-5). Written informed consent was obtained from the parents.
Exposures and neonatal sepsis definition
The exposure was the occurrence of gestational malaria (GM), defined as a malaria infection during pregnancy or at delivery. For women in the suburban arm, malaria screening was performed by thick blood smear at each scheduled prenatal visit. Mothers in the hospital arm were screened only at delivery; in this group, antenatal malaria was established from the mother's history (anamnesis). In both arms, placental and maternal peripheral blood smears were performed. Parasitaemia was quantified using the Lambaréné technique, which has a detection threshold of five parasites per microliter [44].
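Parasite quantification with the Lambaréné technique reduces to simple arithmetic once the smear geometry is fixed. The Python sketch below is illustrative only: it assumes the commonly described layout of 10 µL of blood spread over a 10 mm × 18 mm rectangle and a circular microscope field of known diameter. These parameters, the function names and the example reading are assumptions, not values reported in this study.

```python
import math

# ASSUMED smear geometry (illustrative; not reported in this section):
BLOOD_VOLUME_UL = 10.0        # microlitres of blood deposited on the slide
SMEAR_AREA_MM2 = 10.0 * 18.0  # blood spread over a 10 mm x 18 mm rectangle


def microscope_factor(field_diameter_mm: float) -> float:
    """Parasites per microlitre corresponding to one parasite per field.

    The smear contains SMEAR_AREA / field_area fields that together hold
    BLOOD_VOLUME_UL of blood, so one field holds
    BLOOD_VOLUME_UL * field_area / SMEAR_AREA microlitres.
    """
    if field_diameter_mm <= 0:
        raise ValueError("field diameter must be positive")
    field_area_mm2 = math.pi * (field_diameter_mm / 2) ** 2
    return SMEAR_AREA_MM2 / (field_area_mm2 * BLOOD_VOLUME_UL)


def parasites_per_ul(parasites_counted: int, fields_read: int,
                     field_diameter_mm: float) -> float:
    """Parasite density = mean parasites per field x microscope factor."""
    if fields_read <= 0:
        raise ValueError("at least one field must be read")
    mean_per_field = parasites_counted / fields_read
    return mean_per_field * microscope_factor(field_diameter_mm)


# Hypothetical reading: 12 parasites seen in 100 fields of 0.5 mm diameter.
density = parasites_per_ul(12, 100, 0.5)
```

Because the factor depends only on the field area, it would be determined once per microscope and reused for every slide read on it.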
Neonatal sepsis was suspected in neonates presenting more than two of the following laboratory criteria: neutrophil count < 7500/mm³ or > 14 500/mm³, band forms > 1500/mm³, immature-to-total neutrophil ratio > 0.16, platelet count < 150 000/mm³ and CRP > 10 mg/L. Suspected neonatal sepsis was considered clinical sepsis when it was associated with the following clinical signs: temperature instability; respiratory distress or apnoea; seizures, altered tone, irritability or lethargy; vomiting, altered feeding pattern or ileus; altered skin perfusion or haemodynamic signs (tachycardia, hypotension); hypoglycaemia or hyperglycaemia, hyperlactataemia; or identification of a focal infection such as soft tissue infection or conjunctivitis. All newborns with clinical sepsis were subsequently adjudicated by an independent paediatrician (PT) and classified as 'presumed sepsis' or 'definite clinical sepsis', both grouped as 'adjudicated sepsis'. In discordant cases, a second independent paediatrician (ULT) performed the final adjudication, with access to the microbiological culture results in addition to a full review of the medical file. In parallel with microbiological culture (BACT/ALERT® system), dedicated BioFire® FilmArray® panels (bioMérieux, Marcy-l'Étoile, France) were run on all positive blood cultures (Blood Culture Identification (BCID) panel), on cerebrospinal fluid (Meningitis/Encephalitis panel) and on respiratory and gastrointestinal samples (Pneumonia and Gastrointestinal panels). The adjudicators were blinded to all studied biomarkers.
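The two-step screening rule above (laboratory criteria first, then clinical signs) can be expressed as a small decision function. This is a sketch under stated assumptions: "more than two" criteria is read literally as three or more, one or more clinical signs is assumed to be sufficient, and the sign labels and class names are hypothetical groupings of the signs listed in the text.

```python
from dataclasses import dataclass

# Hypothetical labels for the clinical sign groups listed in the text.
CLINICAL_SIGNS = frozenset({
    "temperature_instability",
    "respiratory_distress_or_apnoea",
    "neurological_signs",       # seizures, altered tone, irritability, lethargy
    "digestive_signs",          # vomiting, altered feeding, ileus
    "perfusion_or_haemodynamic_signs",
    "glycaemia_or_lactate_abnormality",
    "focal_infection",
})


@dataclass
class LabWorkup:
    neutrophils_per_mm3: float
    bands_per_mm3: float
    immature_to_total_ratio: float
    platelets_per_mm3: float
    crp_mg_per_l: float


def count_lab_criteria(lab: LabWorkup) -> int:
    """Number of laboratory criteria met, using the thresholds in the text."""
    criteria = (
        lab.neutrophils_per_mm3 < 7500 or lab.neutrophils_per_mm3 > 14500,
        lab.bands_per_mm3 > 1500,
        lab.immature_to_total_ratio > 0.16,
        lab.platelets_per_mm3 < 150000,
        lab.crp_mg_per_l > 10,
    )
    return sum(criteria)


def is_suspected_sepsis(lab: LabWorkup) -> bool:
    # ASSUMPTION: "more than two" criteria read literally as three or more.
    return count_lab_criteria(lab) > 2


def is_clinical_sepsis(lab: LabWorkup, signs: set) -> bool:
    # ASSUMPTION: at least one recognised clinical sign is sufficient.
    return is_suspected_sepsis(lab) and bool(set(signs) & CLINICAL_SIGNS)
```

Cases flagged this way would then go to adjudication; the function only reproduces the screening step, not the paediatricians' final classification.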
Biomarker sampling
Clinical examination data were collected at birth and at each follow-up visit. Blood samples were obtained at birth and at week (W) 1, W4, W8 and W12. The study protocol has been described in detail elsewhere [44]. Sampling and analytical methods are presented in Supplementary Appendix 2.
Outcomes
The primary outcome was clinical neonatal sepsis, and the secondary outcome was mortality within the first three months of life. Neonatal sepsis was diagnosed by the local paediatrician on the basis of the clinical examination and an initial laboratory workup including a haemogram, C-reactive protein (CRP) and microbiological cultures (blood, cerebrospinal fluid and urine). Neonatal sepsis occurring within the first 72 hours after birth was classified as early-onset neonatal sepsis (EONS), and sepsis occurring thereafter as late-onset neonatal sepsis (LONS) (for the detailed diagnostic algorithm, see the published study protocol [44]).
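The EONS/LONS split depends only on the age of the infant at sepsis onset. A minimal sketch, assuming birth and onset are recorded as timestamps and that an onset at exactly 72 hours counts as early onset (the text does not specify the boundary case); the function names are hypothetical.

```python
from datetime import datetime

EONS_CUTOFF_HOURS = 72.0


def age_in_hours(birth: datetime, onset: datetime) -> float:
    """Age of the infant at sepsis onset, in hours."""
    hours = (onset - birth).total_seconds() / 3600
    if hours < 0:
        raise ValueError("onset cannot precede birth")
    return hours


def onset_category(birth: datetime, onset: datetime) -> str:
    """'EONS' if sepsis starts within the first 72 h of life, else 'LONS'."""
    # ASSUMPTION: an onset at exactly 72 h is counted as early onset.
    return "EONS" if age_in_hours(birth, onset) <= EONS_CUTOFF_HOURS else "LONS"
```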
Statistical analysis
An independent statistician (FB) (Soladis Inc., Lyon, France; https://www.soladis.com/) supported the statistical methodology and performed all analyses using R software version 3.6.1. Normality of the variables was assessed with the Kolmogorov-Smirnov test. Qualitative data are presented as counts and frequencies, and quantitative data as medians and IQR (interquartile range: [Q1–Q3]). Qualitative variables were compared using the chi-squared test (or Fisher's exact test when expected counts were small). For two-group comparisons, quantitative data were compared using Student's t-test, or the Mann-Whitney U test when the distribution was not normal, or Welch's t-test when homoscedasticity was rejected. For more than two groups, quantitative data were compared using ANOVA, or the Kruskal-Wallis test when the distribution was not normal or homoscedasticity was rejected.
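The test-selection rule above can be sketched as a decision function. The analyses themselves were run in R; this Python/SciPy version only illustrates the logic. The text does not name the homoscedasticity test or the "small expected numbers" threshold, so Levene's test and an expected count below 5 are assumptions, as are the function names.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05


def looks_normal(x, alpha: float = ALPHA) -> bool:
    """Kolmogorov-Smirnov test against a normal fitted to the sample.

    Mean and SD are estimated from the same data, so the nominal p-value
    is conservative (a Lilliefors correction would be stricter).
    """
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)
    return stats.kstest(z, "norm").pvalue >= alpha


def compare_quantitative(*groups, alpha: float = ALPHA):
    """Return (test name, p value) following the decision rule in the text."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    normal = all(looks_normal(g, alpha) for g in groups)
    # ASSUMPTION: homoscedasticity checked with Levene's test.
    equal_var = stats.levene(*groups).pvalue >= alpha
    if len(groups) == 2:
        if not normal:
            res = stats.mannwhitneyu(*groups, alternative="two-sided")
            return "Mann-Whitney U", res.pvalue
        if not equal_var:
            return "Welch t-test", stats.ttest_ind(*groups, equal_var=False).pvalue
        return "Student t-test", stats.ttest_ind(*groups).pvalue
    if normal and equal_var:
        return "ANOVA", stats.f_oneway(*groups).pvalue
    return "Kruskal-Wallis", stats.kruskal(*groups).pvalue


def compare_qualitative(table, min_expected: float = 5.0):
    """Chi-squared test, or Fisher's exact test for a 2x2 table with small
    expected counts (ASSUMPTION: 'small' means below min_expected)."""
    table = np.asarray(table)
    _, p_chi2, _, expected = stats.chi2_contingency(table)
    if table.shape == (2, 2) and (expected < min_expected).any():
        _, p_fisher = stats.fisher_exact(table)
        return "Fisher exact", p_fisher
    return "Chi-squared", p_chi2
```

SciPy's `fisher_exact` handles 2 × 2 tables, which is why the Fisher branch is restricted to that shape here.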
To evaluate the diagnostic accuracy of the biomarkers, a data-driven analysis was performed, which included selecting the cut-off (discrimination) values defining positive and negative test results. Several methods for selecting optimal cut-off values in diagnostic tests have been proposed in the literature, depending on the intended use of the test. Here, we selected the cut-off that achieved a sensitivity of 0.95 while maximizing specificity; the same cut-off criterion was used throughout this publication [45]. The CD74/IP-10 score was defined as the ratio of CD74 gene expression level to IP-10 serum concentration. Three datasets were used to test more complex models: genes (denoted 'CX3CR1 & CD74'), proteins (denoted 'Protein biomarkers') and the combined set of biomarkers (denoted 'All biomarkers'). To avoid overfitting and to compare models, we used stratified random sampling, performed within each class so as to preserve the overall class distribution of the data: the data were randomly split into 75% and 25% subsets, and this was repeated 200 times. These splits were used to optimize hyperparameters with the caret package. The average AUC of each model was then computed, and for each dataset the model with the best average AUC was selected. AUCs were compared using a bootstrap approach. A p value < 0.05 was considered significant. Descriptions of the statistical algorithms are provided in Supplementary Appendix 2.
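The cut-off rule and the repeated stratified 75/25 splits can be sketched with NumPy. This is an illustration under stated assumptions, not the authors' R/caret code: it assumes that higher scores indicate sepsis (test positive when score ≥ cut-off; a marker where lower values point to sepsis would be negated first), reads "a sensitivity of 0.95" as "at least 0.95", and treats the 75% subset as the training set. In caret, splits of this kind are typically generated with `createDataPartition(y, times = 200, p = 0.75)`.

```python
import numpy as np


def select_cutoff(scores, has_sepsis, min_sensitivity=0.95):
    """Cut-off with sensitivity >= min_sensitivity and maximal specificity.

    ASSUMPTION: higher scores indicate sepsis, i.e. a test is positive when
    score >= cut-off. Returns (cutoff, sensitivity, specificity).
    """
    scores = np.asarray(scores, dtype=float)
    has_sepsis = np.asarray(has_sepsis, dtype=bool)
    best = None
    for cutoff in np.unique(scores):           # every observed value, ascending
        positive = scores >= cutoff
        sensitivity = positive[has_sepsis].mean()
        specificity = (~positive[~has_sepsis]).mean()
        if sensitivity >= min_sensitivity and (best is None or specificity > best[2]):
            best = (float(cutoff), float(sensitivity), float(specificity))
    if best is None:
        raise ValueError("no cut-off reaches the required sensitivity")
    return best


def stratified_splits(labels, n_repeats=200, train_fraction=0.75, seed=0):
    """Yield (train_idx, test_idx) pairs, sampling separately within each class."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    all_idx = np.arange(len(labels))
    for _ in range(n_repeats):
        train_parts = []
        for cls in np.unique(labels):
            idx = np.flatnonzero(labels == cls)
            rng.shuffle(idx)
            train_parts.append(idx[: int(round(train_fraction * len(idx)))])
        train_idx = np.sort(np.concatenate(train_parts))
        yield train_idx, np.setdiff1d(all_idx, train_idx)
```

Within each split, a model would be tuned on the 75% part and its AUC measured on the held-out 25%; averaging over the 200 splits gives the mean AUC used to choose the best model for each dataset.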
To compare clinical variables (VC) with biomarkers, clinical variables of interest were selected on the basis of expert opinion (PT) and corresponded to neonatal risk factors (e.g. gestational age, weight, maternal risk factors, multiple gestation, APGAR score) (Supplementary Table 1). To make clinical and biomarker data comparable, both were transformed onto a common scale. Categorical clinical variables were converted into a complete set of dummy variables [46]. Several transformations were then applied to the data set: a Yeo-Johnson transformation, a non-linear transformation that reduces skewness and brings the distribution closer to normal, followed by centring (subtracting the mean of each variable) and scaling (dividing by its standard deviation).
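The preprocessing chain can be sketched with NumPy and SciPy's `stats.yeojohnson`, which fits the Yeo-Johnson λ by maximum likelihood. This is a sketch, not the authors' pipeline: the example variables are hypothetical, and applying Yeo-Johnson to continuous variables only (leaving dummy columns untransformed) is an assumption, since the text does not say which columns each step was applied to.

```python
import numpy as np
from scipy import stats


def complete_dummies(values):
    """One 0/1 column per observed level (no reference level dropped)."""
    values = np.asarray(values)
    levels = np.unique(values)
    return (values[:, None] == levels[None, :]).astype(float), levels


def yeo_johnson_center_scale(x):
    """Yeo-Johnson (lambda fitted by maximum likelihood), then centre and scale."""
    transformed, lmbda = stats.yeojohnson(np.asarray(x, dtype=float))
    centred = transformed - transformed.mean()
    return centred / transformed.std(ddof=1), lmbda


# Hypothetical example: a right-skewed continuous variable and a categorical one.
rupture_duration_h = [1, 2, 2, 3, 5, 12, 48]
delivery_mode = ["vaginal", "caesarean", "vaginal", "vaginal",
                 "caesarean", "vaginal", "vaginal"]
rupture_z, rupture_lambda = yeo_johnson_center_scale(rupture_duration_h)
mode_dummies, mode_levels = complete_dummies(delivery_mode)
```

Keeping one column per level (a "complete" rather than full-rank set) mirrors the wording of the text; a full-rank encoding would drop one reference column per variable.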
To build the heatmap, Partial Least Squares (PLS) regression, implemented in the mixOmics package, was used to compare the two datasets. The biomarker matrix was deflated with respect to the information extracted/modelled from the local regression on the clinical variables. Consequently, the latent variables computed to predict biomarkers from clinical variables differ from those computed to predict clinical variables from biomarkers. A Clustered Image Map (CIM) is a two-dimensional visualization in which rows and/or columns are reordered according to a hierarchical clustering method to reveal patterns of interest; here it displays the correlations between variables. The dendrograms generated by the clustering are shown on the left side and at the top of the image. Rows and columns were clustered using complete linkage with Euclidean distance. Only variables with covariances greater than max(covariance)/2 were shown.
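The filtering and ordering steps behind the CIM can be sketched with SciPy's hierarchical clustering. The figure itself was produced with mixOmics, where the plotted matrix is derived from the PLS components; this sketch simply takes an association matrix as given. Reading the display threshold as "keep a variable when its strongest absolute association exceeds half of the largest absolute value in the matrix" is an assumption, and the function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage


def _cluster_order(matrix):
    """Leaf order from complete-linkage clustering with Euclidean distance."""
    if matrix.shape[0] < 2:
        return np.arange(matrix.shape[0])
    return leaves_list(linkage(matrix, method="complete", metric="euclidean"))


def cim_layout(assoc, row_names, col_names):
    """Filter an association matrix and reorder it as in a clustered image map.

    ASSUMPTION: a variable is kept when its strongest absolute association
    exceeds half of the largest absolute value in the whole matrix.
    """
    assoc = np.asarray(assoc, dtype=float)
    threshold = np.abs(assoc).max() / 2
    keep_r = np.abs(assoc).max(axis=1) > threshold
    keep_c = np.abs(assoc).max(axis=0) > threshold
    sub = assoc[np.ix_(keep_r, keep_c)]
    rows = np.asarray(row_names)[keep_r]
    cols = np.asarray(col_names)[keep_c]
    r_order = _cluster_order(sub)        # cluster rows (e.g. clinical variables)
    c_order = _cluster_order(sub.T)      # cluster columns (e.g. biomarkers)
    return sub[np.ix_(r_order, c_order)], list(rows[r_order]), list(cols[c_order])
```

The leaf orders returned by `leaves_list` are the same orders a dendrogram would draw, so variables merged early in the clustering end up adjacent in the image.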
N-integration and feature selection with Projection to Latent Structures (PLS) models with sparse discriminant analysis, implemented in the mixOmics package, were used to relate the two datasets to the outcome; the results are displayed as a circos plot. The circos plot represents the correlations between variables of different types, which are arranged in the side quadrants. It shows connections within and between blocks, as well as the expression level of each variable in each class. The circos plots were built from a similarity matrix extended to the case of multiple datasets. Only variables with correlations greater than max(correlation)/2 were shown [46].
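The selection of links drawn in a circos plot can be sketched as follows. In mixOmics the similarity matrix is derived from the latent components of the integrative model; purely for illustration, this sketch substitutes plain Pearson correlations between the variables of two blocks and keeps pairs whose absolute correlation exceeds max(correlation)/2. The blocks, variable names and function name are hypothetical.

```python
import numpy as np


def between_block_links(block_a, block_b, names_a, names_b):
    """(var_a, var_b, r) pairs with |r| > max|r| / 2, strongest first.

    ASSUMPTION: Pearson correlation stands in for the component-based
    similarity matrix that mixOmics actually uses.
    """
    a = np.asarray(block_a, dtype=float)
    b = np.asarray(block_b, dtype=float)
    if a.shape[0] != b.shape[0]:
        raise ValueError("blocks must share the same samples (rows)")
    n = a.shape[0]
    za = (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)
    zb = (b - b.mean(axis=0)) / b.std(axis=0, ddof=1)
    r = za.T @ zb / (n - 1)          # Pearson correlation matrix, block A x block B
    cutoff = np.abs(r).max() / 2
    links = [
        (names_a[i], names_b[j], float(r[i, j]))
        for i, j in zip(*np.nonzero(np.abs(r) > cutoff))
    ]
    return sorted(links, key=lambda link: abs(link[2]), reverse=True)


# Hypothetical blocks: 5 samples, 2 clinical variables and 2 biomarkers.
clinical = [[1, 2], [2, 1], [3, 2], [4, 1], [5, 2]]
biomarkers = [[2, 5], [4, 3], [6, 4], [8, 1], [10, 2]]
links = between_block_links(clinical, biomarkers, ["x", "y"], ["u", "v"])
```

Because the threshold is relative to the strongest link, the number of displayed connections adapts to each dataset rather than depending on a fixed correlation value.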