Clinical data of the studied cohort
A total of 444 COVID-19 unrelated patients (263 males and 181 females) admitted to the biggest hospitals in Sofia, Bulgaria, from October 2020 to April 2022, were included in the present study. They were divided into three qualitative phenotypic groups depending on the clinical severity: (1) patients with severe and critical course (181 cases; 41%); (2) individuals having moderate disease (129 cases; 29%); and (3) all cases who test positive for SARS-CoV-2 but who have some mild or no symptoms consistent with COVID-19 (134 cases; 30%) (Fig. 1). Phenotype definitions are described in Methods. The most frequent co-morbidities were malignant diseases, hypertension, cardiovascular disease, and diabetes mellitus (Table 1). Detailed clinical data of each patient are recorded in a local database and could be provided on request.
Analysis of the infected patients showed that the sex is significantly associated with the disease severity. Males are more likely to develop severe or critical disease than females. Relative risk for severe disease in males is 1.32 and OR = 1.59 (p = 0.024, 95% CI-1, 07 − 2,35). Within the infected subjects, patients with severe infections and those who died were significantly older than the remaining clinical groups (asymptomatic/mild infection and moderate infection) (p = 0.002). Correlation analysis also showed significantly association between age and disease severity (Spearman correlation coefficient – 0.193 with two tailed p = 4.06x10− 5).
Recurrent genetic variants
An expected observation of our study was the high prevalence of subjects carrying three rare variants in the primary structural component of airways mucus gene, MUC5AC. These include c.1993A > G (rs36195734, MAF 0.02%) found in 302 subjects (68% in total; 28% severely ill), c.2018C > G (rs200292517, MAF 0.001%) identified in 281 (63% in total; 25% severely ill), and c.1974C > A (rs74811639, MAF 0.09%) which was detected in 147 (33% in total; 13% severely ill) patients.
Identification of pathogenic variants in COVID-19 patients
Given multiple previous reports related to the impact of pathogenic genetic variants on the severity and outcome of COVID-1930, we set off to test such effect using our cohort of 444 patients with varying degree of severity of the disease and clinical outcomes.
Using WES data, we focused on genes that have been involved in viral infection sensitivity, host immune response, genes related to coagulation, genes related to cardiovascular function, and genes that were previously shown to be associated with COVID-19 outcomes. Finally, we collected available information in a specific panel of 1172 genes (Suppl. Table S1). After applying filter steps described above, 9,237 rare variants remained in the gene panel for 444 patients; the mean number of variants per person was 20.8. A total of 519 (out of 9,237) were variants classified as either pathogenic or likely pathogenic, located in 244 different genes. These variants were observed in a total of 296 patients (Suppl. Table S2).
Unique variants
Our dataset of rare genetic variants found in patients with different COVID-19 outcomes was analyzed for alterations with unique combination of position and allele. The number of these variants was 5,854 and they were located in 1083 different genes. They were detected in 439 patients, each carrying between 8 and 26 variants. Distribution of variants by type, MAF and functional consequences is shown in Fig. 2. Missense variants predominated among variants with other functional consequences (87%), synonymous changes counted 5.4%, and only 2.8% were represented by frameshift and splicing variants; (Fig. 2a). Ultra rare variants with MAF < 0.01% were the most numerous (Fig. 2b).
Fifty eight percent of the identified unique variants (n = 3,395) were located in 673 genes involved in immune response. Seventy eight percent (n = 4,558) of these rare variants were detected in genes with an autosomal recessive (AR) inheritance mode (CFTR, IL17RC, CCDC103, OAS2, OAS3, THBS3, ELF5, FBRSL1, SLC22A31, NR1H2, TYK2, IL36RN, DNAH5, VPS13B, etc.); 19.3% (n = 1,134) of the variants were found in genes with an autosomal dominant (AD) inheritance mode (such as ABCA7, NFKB1, PLCG2, RNASEL, TLR3, BRCA1/2, CHEK2, MUC5B, and others), 1.7% (n = 99) variants were identified in X-linked genes (among which TLR7, ACE2 and G6PD), and 63 variants (1%) were located in overlapping AR and AD genes. Differences between patients with severe versus non-severe COVID-19 were not observed in the proportions of these variants.
Novel variants
Approximately 900 (n = 916; 9.9%) of the identified 9,237 rare variants were novel, unreported in gnomAD or dbSNP (as of April 2024) (Suppl. Table S3). Of note, 47 of those mutations (5.2%) appeared in heterozygous state in more than one (2 or 3) unrelated individuals (Table 2). Recurring extremely rare variants included synonymous, missense and frameshift changes. They were identified predominantly in individuals with mild course of the disease, although no conclusion as to the effect of the variants could be drawn considering the small number of patients involved.
Table 2
Novel rare variants recurring in more than one cases. Bold: different substitutions involving the same amino acid residue in two additional cases; Phenotype of the carriers: 0 – critical, 1 – severe, 2 – moderate, 3 – mild, 4 - asymptomatic
Gene Names
|
Chr:Pos
|
Nucleotide position
|
Effect on protein
|
Number of carriers
|
Phenotype of the carriers
|
Rare novel variants recurring in more than one cases
|
ACE
|
17:61561237
|
NM_000789.4:c.1614C > T
|
NP_000780.1:p.Phe538=
|
2
|
1,3
|
ADAMTSL4
|
1:150529195
|
NM_019032.6:c.1675C > A
|
NP_061905.2:p.Arg559Ser
|
2
|
0,1
|
ADIPOR1
|
1:202915630
|
NM_015999.6:c.367G > A
|
NP_057083.2:p.Ala123Thr
|
2
|
3,3
|
AGBL2
|
11:47703611
|
NM_024783.4:c.1825T > G
|
NP_079059.2:p.Cys609Gly
|
2
|
2,3
|
ANGPT1
|
8:108276554
|
NM_001146.5:c.1231A > T
|
NP_001137.2:p.Thr411Ser
|
2
|
3,1
|
ANKRD11
|
16:89346802
|
NM_013275.6:c.6148A > G
|
NP_037407.4:p.Thr2050Ala
|
2
|
2,4
|
C5
|
9:123837071
|
NM_001317163.2:c.44C > A
|
NP_001304092.1:p.Pro15Gln
|
2
|
1,1
|
CC2D2A
|
4:15482381
|
NM_001080522.2:c.181delC
|
NP_001073991.2:p.Gln61Argfs*13
|
2
|
1,2
|
CCPG1, DNAAF4-CCPG1
|
15:55664195
|
NM_001204450.2:c.502A > G, NR_037923.1:n.1919A > G
|
NP_001191379.1:p.Ser168Gly
|
2
|
3,3
|
CDAN1
|
15:43018579
|
NM_138477.4:c.3133G > C
|
NP_612486.2:p.Asp1045His
|
2
|
1,3
|
CFAP65
|
2:219885975
|
NM_194302.4:c.3158G > A
|
NP_919278.2:p.Ser1053Asn
|
2
|
3,3
|
CFTR
|
7:117307042
|
NM_000492.4:c.4323C > G
|
NP_000483.3:p.Ile1441Met
|
3
|
1,1,3
|
DCLRE1B
|
1:114449747
|
NM_022836.4:c.319C > A
|
NP_073747.1:p.Leu107Ile
|
3
|
1,3,3
|
DEF6
|
6:35278407
|
NM_022047.4:c.409A > G
|
NP_071330.3:p.Met137Val
|
2
|
1,3
|
DNAH9
|
17:11672458
|
NM_001372.4:c.7364G > C
|
NP_001363.2:p.Ser2455Thr
|
2
|
2,2
|
ENTREP2, NSMCE3
|
15:29561633
|
NM_015307.2:c.277-16951G > C, NM_138704.4:c.277G > C
|
NP_619649.1:p.Val93Leu
|
2
|
2,2
|
ESPN
|
1:6505825
|
NM_031475.3:c.1302_1325del
|
NP_113663.2:p.Ser436_Pro443del
|
2
|
3,3
|
F13B
|
1:197032161
|
NM_001994.3:c.91G > T
|
NP_001985.2:p.Glu31Ter
|
2
|
1,3
|
FLG
|
1:152284690
|
NM_002016.2:c.2672G > A
|
NP_002007.1:p.Ser891Asn
|
2
|
1,2
|
GAS2L2
|
17:34077177
|
NM_139285.4:c.546G > T
|
NP_644814.1:p.Glu182Asp
|
2
|
1,3
|
HHATL
|
3:42739041
|
NM_020707.4:c.824G > A
|
NP_065758.3:p.Ser275Asn
|
3
|
2,2,3
|
HIP1
|
7:75187530
|
NM_005338.7:c.1405A > G
|
NP_005329.3:p.Ser469Gly
|
3
|
3,3,3
|
INO80
|
15:41275066
|
NM_017553.3:c.4447T > C
|
NP_060023.1:p.Ser1483Pro
|
2
|
0,3
|
IRF2
|
4:185311880
|
NM_002199.4:c.718C > T
|
NP_002190.2:p.Pro240Ser
|
2
|
1,3
|
KANSL1
|
17:44116476
|
NM_001193466.2:c.2309G > T
|
NP_001180395.1:p.Arg770Leu
|
2
|
3,3
|
SOGA3
|
6:127767931
|
NM_014702.5:c.1533T > A
|
NP_055517.3:p.Asn511Lys
|
3
|
3,3,3
|
LDLR
|
19:11217326
|
NM_000527.5:c.780C > T
|
NP_000518.1:p.Asp260=
|
2
|
3,3
|
LRP4
|
11:46917857
|
NM_002334.4:c.952C > G
|
NP_002325.2:p.Leu318Val
|
2
|
1,2
|
MASP2
|
1:11094962
|
NM_006610.4:c.1010G > C
|
NP_006601.2:p.Gly337Ala
|
2
|
2,2
|
MCM4
|
8:48877224
|
NM_005914.4:c.784C > T
|
NP_005905.2:p.Pro262Ser
|
2
|
3,3
|
MUC5B
|
11:1271906
|
NM_002458.3:c.13796C > A
|
NP_002449.2:p.Thr4599Lys
|
2
|
2,2
|
NACAD
|
7:45122138
|
NM_001146334.2:c.3641A > C
|
NP_001139806.1:p.Lys1214Thr
|
2
|
0,3
|
NLRC4
|
2:32463358
|
NM_021209.4:c.2364A > T
|
NP_067032.3:p.Lys788Asn
|
2
|
1,1
|
ODAD2
|
10:28283873
|
NM_018076.5:c.199G > A
|
NP_060546.2:p.Ala67Thr
|
2
|
3,3
|
PDE4A
|
19:10577953
|
NM_001111307.2:c.2317G > C
|
NP_001104777.1:p.Glu773Gln
|
2
|
1, 1
|
PRDM16
|
1:3103020
|
NM_022114.4:c.369G > C
|
NP_071397.3:p.Glu123Asp
|
2
|
3,3
|
RANBP2
|
2:109392290
|
NM_006267.5:c.8395A > G
|
NP_006258.3:p.Ile2799Val
|
3
|
0,2,3
|
SLC47A1
|
17:19437356
|
NM_018242.3:c.104T > A
|
NP_060712.2:p.Leu35Gln
|
2
|
1,2
|
SLC4A2
|
7:150767275
|
NM_003040.4:c.1291A > G
|
NP_003031.3:p.Ser431Gly
|
2
|
3,3
|
SMARCA4
|
19:11144456
|
NM_001128849.3:c.3788G > T
|
NP_001122321.1:p.Ser1263Ile
|
2
|
0,3
|
SRC
|
20:36028639
|
NM_005417.5:c.981G > A
|
NP_005408.1:p.Gln327=
|
3
|
1,1,3
|
STAT2
|
12:56742747
|
NM_005419.4:c.1537A > G
|
NP_005410.1:p.Asn513Asp
|
3
|
3,3,3
|
TCN2
|
22:31013400
|
NM_000355.4:c.1024C > G
|
NP_000346.2:p.Pro342Ala
|
2
|
2,3
|
TLR4
|
9:120476855
|
NM_138554.5:c.2449G > A
|
NP_612564.1:p.Asp817Asn
|
2
|
2,2
|
TOP2B
|
3:25670534
|
NM_001330700.2:c.1792A > C
|
NP_001317629.1:p.Ile598Leu
|
2
|
0,3
|
VPS45
|
1:150082729
|
NM_007259.5:c.1612C > T
|
NP_009190.2:p.His538Tyr
|
2
|
1,1
|
CENPF
|
1:214811208
|
NM_016343.4:c.1447-1G > T
|
-
|
2
|
1,1
|
Different substitutions involving the same amino acid residue in two additional cases
|
PKD1L3
|
16:72033750
|
NM_181536.2:c.128G > C
|
NP_853514.1:p.Ser43Thr
|
1
|
2
|
PKD1L3
|
16:72033750
|
NM_181536.2:c.128G > T
|
NP_853514.1:p.Ser43Ile
|
1
|
1
|
Within this analysis, 93 variants subsequently classified as either pathogenic or likely pathogenic according to the ACMG criteria were identified. Among them, 38 were found in the subgroup of patients (n = 181, 41%) who developed severe complications during SARS-CoV-2 infection or deceased. These variants were dispersed across 34 distinct genes which are described in Suppl. Table S3. The subgroups corresponding to patients with moderate (n = 129, 29%), and mild/asymptomatic infections (n = 134, 30%) carried 55 novel pathogenic/likely pathogenic variants in total, which were located in 51 genes. (Suppl. Table S3). Genes were mainly connected to pathways for respiratory, immune, cardiovascular, and neurological diseases, all of which have been linked to the wide range of symptoms seen in severe, non-severe and long-COVID patients. Statistical analysis did not identify differences in the proportion of carriers of pathogenic genetic variants between the groups. These results indicate that the pathogenic allele carrier status does not directly influence the course of COVID-19 in our patient group.
Genetic variants associated with SARS-CoV-2 severity
In our study, we did not find any of the variants in ACE1 (rs4646994), ACE2 (rs2074192), IFNAR2 (rs2236757, rs13050728), TYK2 (rs2304256, rs7956615), OAS1 (rs10774671, rs4766664), OAS3 (rs10735079), CD40 (rs4813003), FCGR2A (rs1801274), CASP3 (rs113420705), and DPP9 (rs2277732)31, nor the variants in TLR3, TLR4, TLR7, TLR8 and TLR9 previously associated with prognosis and susceptibility to COVID-19 infection32, but we have detected ultra-rare variants (MAF < 0.01%) in three genes related to immune response (MUC5AC, ABCA7, FLNA), previously linked to the COVID-19 host response to different degrees33.
A variant, ACE2-c.1189A > G (p.Asn397Asp), found in a 37-year-old female (19334R) was included in a recently reported predisposing ACE2-genetic background34. The p.Gln665Glu variant in IL17RC, which was not described in public databases (dbSNP and gnomAD), was detected in a 58-year-old male from our cohort (18810R) who was diagnosed with autoimmune rheumatologic disease and diabetes type 2 at the time of the SARS-CoV-2 infection (Suppl. Table S2). This patient also carried the variants p.Ala40Glu and p.Glu329Gln located in the genes NFKB1 and TMPRSS2, respectively, which by its action facilitate the SARS-CoV-2 entry35. The same case was heterozygous for a pathogenic variant p.His154Pro in the gene CCDC103 associated with pulmonary symptoms in long-COVID patients36.
Rare variants affecting interferon signaling pathways (UNC93B1, IRF9, TLR3, IFNA1, TICAM1, IRF3, and IRF7) were identified in 7 (7/444; 1.6%) patients below 60 years of age who were previously healthy.
Novel genetic variants (not reported in the dbSNP and gnomAD) were detected in patients with critical and severe COVID-19 (Table 2, Suppl. Table S3). Among them, variants which were classified as pathogenic and possibly pathogenic according to ACMG criteria, were found in IL10RB, CENPF, DGKE, FYCO1, TONSL, CIB1, HLA-A, DCDC1, VPS13B and BRCA2 (Suppl. Table S3). Two of the severe patients (18012S and 18057S) shared a splice variant, c.1447-1G > T, in the gene CENPF, which according to the prediction programs is very likely to affect the splicing (Suppl. Table S3). An increase in the levels of centromeric protein CENPF which is involved in the regulation of cell cycle is observed as a result of viral infection suggesting that this protein is important for the viral replication control37. Therefore, it is possible that pathogenic variants in CENPF might influence the course of COVID-19. The c.804 + 2delT variant found in the IL10RB gene in a severely infected patient 17955S (Suppl. Table S3) represents a splice variant which, according to the prediction algorithms would affect splicing. IL10RB, encoding an interleukin receptor subunit, is reported to be a key regulator of COVID-19 host susceptibility and severity. Elevated blood levels of IL10RB are associated with a poor prognosis for the course of SARS-CoV-2 infection38, as it was recorded in case 17955S. Newly found variants in FYCO1 were detected in 3 patients with severe course of COVID-19. Two of these variants, classified as likely pathogenic, were identified in heterozygous state in patients with severe pneumonia, 18565R and 18291R (Suppl. Table S3). Variants of uncertain significance localized in FYCO1 were found in all clinical patient subgroups except the asymptomatic carriers.
Despite the absence of a significant effect of the presence of pathogenic variant on COVID-19 illness, we found that severely infected patients from our group carried pathogenic variants in genes linked to Mendelian disorders with increased susceptibility to infectious diseases, auto-inflammation, auto-immunity, allergy or malignancies. Pathogenic variants in the F7 (p.Ala354Val and p.Arg283Trp) and F11 (p.Gln244Ter and p.Trp519Ter) coagulation-related genes were found in patients with severe COVID-19 (Suppl. Table S2). Four severely affected cases (three males and one female) were carriers of a pathogenic variant c.563C > T (p.Ser188Phe) in the G6PD gene, mutations in which are associated with glucose-6-phosphate dehydrogenase deficiency (Suppl. Table S2). Pathogenic variants p.Arg4496Ter and p.Arg3539His in DNAH5 were detected in two patients with a critical and severe course of COVID-19, respectively (Suppl. Table S2). Variants in genes involved in the formation of cilia and flagellum have been suggested to be relevant to the body's response to SARS-CoV-2 infection. One of the deceased patients in our group, clinically diagnosed with hypertension (18542R), was a carrier of a pathogenic LoF variant in the DNAH5 gene, p.Arg4496Ter (Suppl. Table S2).
Heterozygous pathogenic variants in LDLR, p.Ala399Thr and p.Ser286Arg, were observed in three patients with severe and one with critical course of COVID-19, respectively (Suppl. Table S2). Eight CF-causing variants were identified in 9 patients with severe COVID-19 (Suppl. Table S2). A variant, c.2758G > T (p.Val920Leu), located in the AR gene CFTR and found to be deleterious in cystic fibrosis patients39 was identified in heterozygosity in the severely infected patient 18059S. Two cases with severe infection were carriers of well-known CF-disease causing mutations: CFTR-c.2491G > T (p.Glu831Ter) located in coding exon 1540, and CFTR-c.328G > C (p.Asp110His) located in coding exon 4 of the gene41.
We also noticed that some patients with critical or severe progression carried known pathogenic variants in genes linked to AD diseases. Mutations were found in genes related to cardiovascular disease (SCN5A, LDLR, DSG2), DNA damage repair response (CHEK2), coagulation (PROC), primary immune disorder (TNFRSF1A, COL7A1, LZTR1, CASP10), hemoglobin subunit β (HBB), and other genes (COL9A3, MUC5B, CHRND), associated with severe COVID-19. These cases required intubation and developed severe complications during SARS-CoV-2 infection or some of them died. Clinical features of the subjects carrying the identified variants, including the COVID-19 disease course and co-morbidities, are summarized in Supplementary Table S4.
This subset included three individuals with variants in the DNA damage repair response Fanconi anemia (FA)-BRCA pathway gene, CHEK2, which has been previously implicated in the pathogenesis of severe COVID-1942. In patients 18118R and 19110R we identified known pathogenic variant CHEK2-c.433C > T (p.Arg145Trp) that has been reported 20 times in ClinVar in hereditary cancer-predisposing syndrome patients (Suppl. Table S4). The second mutation, CHEK2-c.715G > A (p.Glu239Lys), found in patient 17759R is associated with familial breast cancer and is classified as pathogenic, supported by clinical interpretations previously reported in ClinVar43. Nevertheless, none of the patients had any symptoms of malignant tumors, although the ECOG score was changed in case 17759R (Suppl. Table S4).
Functional network analysis of genes with pathogenic/likely pathogenic variants
We further performed a network analysis, including 244 genes with pathogenic/likely pathogenic variants (Suppl. Table S2) to examine functional interactions between them. As a result, we identified 51 protein-protein interaction networks consisting of between 2 and 28 genes, and 9 singletons (Fig. 3). The majority of molecular functions, interactions and pathways are related to immune responses and DNA metabolism and repair, epithelial cilium movement and determination of left/right asymmetry, cellular response to oxidized low-density lipoprotein, regulation of lipid storage and lipid transport, complement activation; several interactions and pathways are related to the metabolic and cardiovascular systems, which could lead to multi-organ complications and dysfunction (Suppl. Table S5).
By comparing the clusters in terms of rare pathogenic/likely pathogenic changes carried by patients with severe/critical infections and the clinical groups of asymptomatic/mild and moderate cases we found one gene cluster that has an impact on severe outcomes of COVID-19. This main component contains 13 highly interconnected genes, ABCA7, CCDC103, CCNO, DNAH1, DNAH11, DNAH5, DNAH9, LVRN, NME8, STK36, SPAG1, GAS2L2, RAD51AP2 (Fig. 3, Table 3), related to epithelial cilium movement; mutations in these loci are associated with primary ciliary dyskinesia. The number of variants was 17, all in heterozygous state, and they were identified in 20 different patients (Table 3). Sixteen of these candidate variants were found in genes with an AR inheritance mode (such as CCDC103, CCNO, DNAH1, DNAH11, DNAH5, DNAH9, LVRN, NME8, STK36, SPAG1, GAS2L2, or RAD51AP2) and one variant was detected in gene with an AD inheritance mode (ABCA7). Statistical analysis reached significance in the assessment of all genes from the cluster among clinical groups. Results showed that patients carrying pathogenic or likely pathogenic variants in this gene cluster are more likely to develop critical or severe COVID-19 diseases with OR of 2.67 (95% CI 1.1–6.5), two tailed p = 0.028.
Variants DNAH11-c.11804C > T (p.Pro3935Leu) and CCDC103-c.461A > C (p.His154Pro) were found in three cases, and DNAH1-c.171delC (p.Lys58Serfs*25) was detected in two cases. The detected heterozygous frameshift variant in ABCA7, c.2871delC (p.Ser958Profs*34), was found in a 52-year-old male who was recorded with very severe pneumonia (18213R) and no co-morbidities. The p.Ser958Profs*34 variant in ABCA7 was not described in public databases (gnomAD) and was not found in other patients from our cohort (Table 3). According to the ACMG guidelines this change was classified as likely pathogenic. One of the deceased patients, clinically previously diagnosed with a hypertension (18542R), carried a pathogenic LoF variant in the DNAH5 gene (p.Arg4496Ter). Two cases with no pre-existing co-morbidities, one with moderate (18058S) and one with severe disease (19581R) carried two variants. Patient 18058S carried two likely pathogenic variants, c.11804C > T (p.Pro3935Leu) and c.607C > T (p.Arg203Ter), in DNAH11 and NME8, respectively. One pathogenic, CCDC103-c.461A > C (p.His154Pro), and one likely pathogenic change, LVRN-c.1856dupA (p.Asn619Lysfs*26), were found in a severely infected 72-year-old male (19581R).