We use low-pass whole genome sequencing (LP-WGS) for the analysis of Long COVID patients. Clinical applications of LP-WGS have been scant to date, in part because of its novelty [5,6]: little time has passed since the technology became widely available. With only two months between sending samples and obtaining meaningful results, LP-WGS allowed us to repurpose genome markers from other studies as indicators of genetic susceptibility to Long COVID. Because 1000G data serve as the reference dataset for variant imputation in LP-WGS, it is possible to cover most variants identified in GWAS analyses and therefore the polygenic risk scores (PRS) that make use of GWAS [28]. For a fraction of the cost of standard clinical 30x coverage of the whole genome, LP-WGS allows ~81 million SNPs from the 1000G to be imputed with 99% accuracy [5]. Our results show the benefit of using a small sample with genome-wide coverage of variants while leveraging the value of PRS, which draw on GWAS involving many thousands of cases to assign weights to risk-allele contributions to traits tested in validated populations such as the UK Biobank.
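To make the weighting idea concrete, the sketch below computes a PRS as a weighted sum of imputed effect-allele dosages. It is a minimal illustration rather than the pipeline used in this study: the function name `polygenic_score`, the variant identifiers, and the weights are all hypothetical, and real scores from the PGS Catalog involve many more variants as well as allele harmonization and normalization against a reference population.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0 to 2) over the variants in a score.

    Variants listed in `weights` but absent from `dosages` are skipped.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Hypothetical per-variant weights and imputed dosages, for illustration only
weights = {"var_A": 0.5, "var_B": -0.25, "var_C": 0.125}
dosages = {"var_A": 2.0, "var_B": 1.0, "var_D": 1.0}

score = polygenic_score(dosages, weights)  # uses var_A and var_B only
```

Skipping missing variants is one possible design choice; a real analysis would more likely report how many variants of the score were covered by imputation.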
With regard to Long COVID, no significantly associated genomic markers have been published to date [29]. This is in part due to the lack of definitive datasets that allow clear-cut identification of patients. The definition of clinical symptoms and signs remains tenuous [4], which makes it difficult to amass datasets that are sufficiently distinct and large to enable genetic association. Because of these limitations, we have taken a different approach: we make use of existing PRS from the PGS Catalog [7] that are likely to be associated with Long COVID symptoms. Using PRS to test symptoms relevant to Long COVID offers a new way of developing an understanding of disease by repurposing existing data for different applications. Our study design is thus able to suggest genetic markers that are highly likely to contribute to Long COVID despite testing a small sample population.
We have been able to show that the risk allele distributions of the PRS trait ‘Tiredness/lethargy in past two weeks’ differ significantly between Long COVID cases and controls. We acknowledge that the control population we use is suboptimal, given that it may include individuals affected by Long COVID, although they would likely be a small fraction. This ‘contamination’ of our control population does not seem to impair discrimination between cases and controls, suggesting that our stringent inclusion criteria, together with the use of a matched general population, were an adequate strategy.
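A case-versus-control comparison of this kind can be sketched as follows. This section does not restate the statistical test used, so the permutation test on the difference in mean PRS below is only a stand-in, and the function name `permutation_p_value` and all score values are synthetic assumptions made for illustration.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(cases: Sequence[float], controls: Sequence[float],
                        n_permutations: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on the absolute difference in mean score."""
    rng = random.Random(seed)
    observed = abs(mean(cases) - mean(controls))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_permutations):
        # Reassign case/control labels at random and recompute the statistic
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_cases]) - mean(pooled[n_cases:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero
    return (hits + 1) / (n_permutations + 1)


# Synthetic PRS values, for illustration only
case_scores = [1.2, 1.5, 1.1, 1.8, 1.4, 1.6]
control_scores = [0.3, 0.1, 0.5, -0.2, 0.4, 0.0, 0.2, 0.6]

p = permutation_p_value(case_scores, control_scores, n_permutations=2000)
```

A permutation test makes no distributional assumption, which suits small samples, but a small share of affected individuals among the controls would still tend to shrink the observed difference rather than inflate it.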
We also recognize the limitation that PRS have been trained on Northern European populations, mostly the UK Biobank, which will differ somewhat in genetic ancestry from the patient cohort in which we tested the PRS (Iberian Spanish, IBS). In a previous publication [28] we argued that differences between ancestries within the same continent may influence results but still leave sufficient statistical power for PRS to enable stratification of high-risk individuals. The traits or symptoms we have tested in this study do not necessarily overlap fully with the symptoms described in the clinic (e.g., ‘Tiredness/lethargy in past two weeks’ partially overlaps with fatigue). Yet the overlap seems substantial enough to infer that a manifestation of fatigue would be concordant with the more specific trait PRS used in our analysis.
Although our case population is predominantly female (~80%), we have not distinguished between sexes when comparing results, given the small sample size. We leave this question for future work and acknowledge that including only women in the analysis could slightly change our results. In addition, our approach to selecting symptoms from the PGS Catalog could be regarded as somewhat ad hoc, since it rests on a subjective (although rigorous) selection. In future work we plan to test many more PRS following the selection criteria outlined in Methods. We expect that testing hundreds of PRS could reveal unsuspected traits and genetic predispositions likely to be associated with Long COVID. We also plan to expand the Long COVID patient cohort for PRS testing, which could provide further clues about the pathophysiological mechanisms of the disease.
Our approach does not identify Long COVID-specific markers but rather PRS with weighted allele risks for independent traits, which can be used to stratify patients by genetic predisposition to Long COVID. In this regard, our methodology differs from that of other studies that have examined genetic traits associated with tiredness [30], the most common symptom reported by Long COVID patients [16,17]. We believe our significantly associated PRS could be used as a potential test for predisposition to Long COVID, helping to predict risk and prioritize care for the most susceptible patients. Although the results shown here are promising, they will need to be tested in a larger and more diverse population than the one used here.
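One way such stratification could work in practice is to place an individual's score within a reference distribution and flag those above a chosen percentile. The sketch below is an assumption-laden illustration only: the helper names `reference_percentile` and `is_high_risk`, the 90th-percentile cutoff, and the reference scores are hypothetical and are not taken from this study.

```python
from bisect import bisect_right
from typing import Sequence


def reference_percentile(score: float, reference_scores: Sequence[float]) -> float:
    """Percentage of reference scores that are less than or equal to `score`."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


def is_high_risk(score: float, reference_scores: Sequence[float],
                 cutoff_percentile: float = 90.0) -> bool:
    """Flag a score at or above the (hypothetical) cutoff percentile."""
    return reference_percentile(score, reference_scores) >= cutoff_percentile


# Synthetic reference distribution, for illustration only
reference = [float(i) for i in range(1, 101)]

pct = reference_percentile(95.0, reference)
flagged = is_high_risk(95.0, reference)
```

Any cutoff used for prioritizing care would need to be calibrated and validated in a larger, more diverse cohort before clinical use.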
The results presented here leave major questions open. We have not been able to determine why Long COVID affects women more than men. We also do not know the exact nature of the association between ‘Tiredness/lethargy in past two weeks’ and the underlying molecular mechanisms of the disease. It also remains speculative whether persons predisposed to depression are more liable to Long COVID. What our results do suggest, however, is an unequivocal genetic signature for the disease. These results should help dispel doubts about whether Long COVID is merely a subjective condition: there must be a pathophysiological mechanism behind Long COVID that makes some individuals more prone to developing it.
Finally, our results were obtained in an Iberian Spanish population. However, when comparing cases against controls spanning all 1000G Europeans (n = 503), we obtain almost identical results (data not shown). This suggests that the results are applicable not only to Iberians but to Europeans more broadly. Although it remains speculative, we would also expect the approach to apply to ancestrally diverse populations, albeit with less precision. Regrettably, on this occasion we only have European patients in our tested cohort. We encourage scientists and patient groups from other ancestries and continents to help test these hypotheses, so that the fruits of genomic science become accessible to all humanity regardless of ancestral origin.