SomaLogic’s default normalization procedure consists of 5 steps, hybridization normalization, intraplate normalization, plate scaling, calibration, and adaptive normalization by maximum likelihood (ANML). These steps aim to reduce technical variability and remove batch effects. We use a specialised notation system to refer to various combinations of these steps. This system is explained in Fig. 1a.
Normalization procedures impact on analysis results
We consider how different normalization procedures impact the analysis of differential protein abundances between semaglutide and placebo arms in STEP 1 and STEP 2. Analysis was done by robust regression using bisquare M-estimation. The results of each aptamer are considered a match between different normalization procedures if both are non-significant or if they are both significant with estimated effects in matching directions. We consider an effect significant at a 5% false discovery rate using Benjamini-Yekutieli10.
Documentation describing the normalization procedure is publicly available2, however, the code itself and the references used are proprietary. Therefore, we reverse-engineered the procedure to allow us to remove individual steps. We also implemented a simpler alternative to ANML which we refer to as median normalization. Figure 1b shows that the reverse engineered version of the full pipeline closely matches that of the vendor, with 98.2% of aptamers matching in STEP 1, and 99.2% matching in STEP 2. Aptamers were sorted into matching categories in most cases, with a high correlation of 0.999 and 0.997 between the estimated log fold changes, for STEP 1 and STEP 2 respectively. The discrepancy seen is likely due to the difference in reference values used. The vendor uses proprietary values for the hybridization normalization and ANML steps. The reverse engineered implementation uses references based on the median of the pooled STEP 1 and STEP 2 data (using study specific reference made no notable difference).
We now compare different combinations of normalization steps to see how they impact results. Figure 1c shows substantial difference in STEP 1 between the categorization of results for raw data when compared to V-HIPC, with only 48.4% of aptamers matching. We also see a modest 45.7% for V-HIPC vs. V-HIPCA. In STEP 2, we 93.8% of aptamers match for V-HIPC and V-HIPCA, however if aptamers that are non-significant for both procedures are removed the overlap is only 38.3%. A similar trend is seen for raw vs. VHIPC in STEP 2.
Figure 1d shows that the plate scaling step has no effect on results if the calibration step is applied (RE-HIPC vs. RE-HIC). No aptamers switch category and the log fold changes have equal values giving a correlation of 1.000. This is due to the raw data of RE-HIPC and RE-HIC being completely equal. Removing the intraplate normalization step has a negligible effect with 96.5% aptamers matching in STEP 1. However, removing the hybridization step as well does lead to a substantial number of mismatching aptamers, with an overlap of 85.4% in STEP 1.
Figure 1e evaluates whether the complexity of ANML is warranted. ANML uses an iterative process based on reference ranges for each analyte. It is compared to a simpler median normalization step. We see the simpler process gives similar results as ANML, with 98.2% overlap in STEP 1.
Figure 2a shows volcano plots for treatment effect using raw data, VHIPC, and VHIPCA. In STEP 1, we see 811 significantly downregulated aptamers with the raw data, this increases to 3422 with V-HIPC, but drops to 741 with V-HIPCA. VHIPC finds 125 significantly upregulated aptamers in STEP 1, but this increases to 632 aptamers with V-HIPCA. This trend where the number of up- and downregulated aptamers is more even with V-HIPCA compared to raw data and V-HIPC is also seen in STEP 2. Figure 2c shows a histogram of ratio of the fold change estimates of V-HIPCA over V-HIPC, they show ANML shifts the estimated fold change of all aptamers upwards. In Supplementary Fig. 1, we see the same phenomenon of ANML shifting fold change estimates for gender and age at baseline. We investigate this fold change estimate shift, using two simulations. For Fig. 2b and Fig. 2d, we simulated a treatment effect in a portion of aptamers, the first simulation had an equal number of up- and downregulated aptamers (balanced effects) and the second had only downregulated aptamers (unbalanced effects). The second simulation simplifies the trend seen in STEP 1 and 2, where most aptamers are downregulated. Figure 2d shows that for the simulation with unbalanced effects ANML introduces a bias, whereas for the simulation with balanced effects it does not. With unbalanced effects, ANML introduces 101 false positives (5.0% of significant aptamers) with a positive log fold change, where only 1 false positive is found using REHIPC. Using REHIPC results in 10 false negative, and ANML increases this to 324.
ANML scale factors
ANML is based on scale factors that are multiplied to the data. Ideally, the scale factors adjust for technical factors while preserving the biological signal. We test to see if ANML’s scale factors are associated with biological factors, in particular semaglutide treatment, age, and gender. In Table 1, we see that ANML produces scale factors that are significantly associated with treatment, gender, and age. This means that the measurements of different groups of subjects are shifted relative to each other, which explains the shift in fold change estimates.
Table 1
Analysis of scale factors of ANML (SomaLogic’s implementation). The log2 scale factors were fitted against different factors using robust regression with a bisquare M-estimator at 90% efficiency. For the analysis of age, the fold change shows the increase for a 10-year difference.
analysis | contrast | visit | study | fold change | p-value |
age | - | Visit 2 | STEP 1 | 1.02 | P < 0.001 |
| | Visit 2 | STEP 2 | 1.02 | P < 0.001 |
| | Visit 24 | STEP 1 | 1.01 | P < 0.001 |
| | Visit 24 | STEP 2 | 1.03 | P < 0.001 |
gender | Male / Female | Visit 2 | STEP 1 | 1.02 | P < 0.001 |
| | Visit 2 | STEP 2 | 1.03 | P < 0.001 |
| | Visit 24 | STEP 1 | 1.02 | P < 0.001 |
| | Visit 24 | STEP 2 | 1.03 | P < 0.001 |
treatment | Treatment / Placebo | Visit 2 | STEP 1 | 1.00 | P = 0.979 |
| | Visit 2 | STEP 2 | 1.00 | P = 0.549 |
| | Visit 24 | STEP 1 | 1.04 | P < 0.001 |
| | Visit 24 | STEP 2 | 1.02 | P < 0.001 |
Estimated proportion of technical variability
To estimate the proportion of variation that is technical, we compare the variation of subject samples against QC samples. The QC samples are technical replicates meaning there is no biological variation, whereas the subject samples contain both biological variability and technical variability. Figure 4a shows the estimated proportion of technical variability. This is estimated by taking the quotient of the median absolute deviation (MAD) of the technical replicated QC samples over the regular samples. The regular samples are impacted by both technical and biological variability, whereas the QC samples are only impacted by technical variability. We see (Fig. 4a) that the hybridization normalization step reduces the estimated proportion of technical variance from (raw) 69.3–39.5% (RE-H). The RE-HIPC procedures further reduces the estimated proportion to 31.3%. However, including ANML on top of this processing pipeline, slightly increases the proportion of technical variability to 32.6%. We see ANML decreases variability in both subject samples (Fig. 4b) and QC samples (Fig. 4c), however, the increase in the ratio of the two suggests the procedure is removing biological signal.
Robust regression
In Fig. 3c we see a QQ-plot for all sample types, which all have heavy tails. Seeing heavy tails for buffers, QC, and calibrators indicates that the extreme values are a result of technical variability and do not reliably reflect biology. This violates the assumption of normal distributed residuals for linear regression. We therefore use robust regression using M-estimation.
Figure 3a shows a that robust methods find more aptamers with significant differential abundance. However, linear regression does not offer a conservative option, with 672 aptamers being statistically significant when using linear regression but nonsignificant with M-estimation in STEP 1 (only 9 aptamers for STEP 2).
In Fig. 3b we see that linear regression and M-estimation categorize 80.8% of aptamers the same in terms of significance and directionality and have a correlation of 0.857 between the estimated log fold change, in STEP 1. For STEP 2 93.3% of aptamers match, with a correlation of fold change estimates of 0.835.
An alternative method that is robust to outliers is rank-based inverse normal transformation (RBINT), which has previously been applied to SomaScan data11. In Supplementary Fig. 2, we compare RBINT to alternatives using gender as the only covariate. Gender was chosen because it allows us to use the non-parametric method Kruskal-Wallis, allowing us to compare rank-based inverse normal transformation (RBINT) with a well-established alternative. RBINT is only based on the rank of measurements, meaning it does not produce meaningful fold change estimates. It has also previously been shown to inflate type I error in certain circumstances12. In our comparison we see it finds more positive results than Kruskal-Walls making it unlikely that it provides type 1 error control. Research has shown RBINT outperforms linear regression on raw values in specific simulations13.
Spuriomers
SomaScan includes 20 spuriomers which are random sequences of DNA that do not target any known human protein and, thus, should not be associated with biological factors such as treatment, gender, and age. We can therefore use them to assess non-specific binding for the raw data and the normalization procedures VHIPC and VHIPCA. We assess treatment, gender, and age to see how biological signal impacts the spuriomers.
Supplementary Fig. 3 shows volcano plots of these 20 spuriomers for treatment, gender, and age. Previously we have corrected for multiplicity using all aptamers, however, in this analysis we only correct for the 20 spuriomers. We see spuriomers with significant association with treatment effect for both VHIPC and VHIPCA in in STEP 1, but none for STEP 2. For gender we see 8 significant spuriomers for V-HIPC in STEP 2, and 1 significant spuriomer for STEP 1 and VHIPCA. For age we see many significant spuriomers for both studies and all three levels of normalization.