A growing body of epidemiological evidence suggests that men exhibit a higher mortality rate from COVID-19 than women 1–3, yet the underlying biology remains largely unknown. Hypotheses pertaining to the expression of viral entry proteins, hormone levels, and immune systems are being actively explored 4,5, and sex steroid hormone drugs are being investigated in clinical trials (Estradiol: NCT04359329, Progesterone: NCT04365127, Degarelix: NCT04397718). In a recent study, Takahashi et al. examined sex differences in immune responses in COVID-19 and found that blood samples from female patients showed more robust T-cell activation than those from male patients during SARS-CoV-2 infection 6. Comparing samples from nasopharyngeal swabs, Lieberman et al. observed that male patients had reduced B cell-specific and NK cell-specific transcripts and an increase in inhibition of nuclear factor kappa-B signaling 7.
SARS-CoV-2 engages the receptor ACE2 (angiotensin-converting enzyme 2) for entry into the target cell through its spike protein 8. Its internalization requires priming of the spike protein by the cellular protease TMPRSS2 (transmembrane protease, serine 2) in the host cell 9; thus, co-expression of ACE2 and TMPRSS2 on the target cell surface is required for virus entry. The high mortality in patients with COVID-19 may be partially driven by the strong affinity of the virus for ACE2 and the facilitation from TMPRSS2. A few studies 10–12 analyzed gene expression of ACE2 and/or TMPRSS2 between sexes using bulk or single-cell RNA-Seq samples primarily profiled from healthy individuals. However, many hospitalized COVID-19 patients have an underlying illness that increases mortality risk, and are in an older age range (>60 years) 13. Furthermore, the number of patients in these studies 10–12 was relatively small.
In this study, we address the question of sex differences in response to SARS-CoV-2 in three ways. First, in order to quantify expression differences between sexes, we leverage public gene expression profiles covering a wide range of ages, tissues, and disease conditions, and later utilize Electronic Medical Records (EMR) to validate findings. Second, we harness the emerging COVID-19 patient gene expression profiles to characterize cellular response differences between sexes in the upper airway and blood. Lastly, we investigate the in vitro antiviral activity of sex steroid hormones in two cell lines infected by SARS-CoV-2.
Expression of ACE2 and TMPRSS2 in a diverse and comprehensive set of human samples
We first compiled three large independent expression datasets consisting of 220,835 samples from diverse tissue types and patient populations (healthy and disease conditions) and completed their meta-information, including sex, age group (younger: 0-19, middle: 20-59, and older: >60), and tissue of origin (14 main tissues), through machine learning and manual annotation (Figure S1). To minimize batch effects, all the samples in each dataset were profiled under the same platform and processed using the same pipeline. The first dataset was compiled from the Treehouse project (T), where 17,654 RNA-Seq samples primarily from consortium projects including TCGA, GTEx, and TARGET were processed through the Toil pipeline 14. The second dataset was downloaded from the ARCHS4 (A) project, where 60,936 human RNA-Seq samples profiled under the Illumina HiSeq 2000 platform were aligned using Kallisto 15. The last dataset was collected from the GEO (G), where 145,947 samples profiled under the Affymetrix GPL570 platform were processed using Robust Multi-array Average (RMA). The sex, age group, and tissue of origin were obtained from the original resources (Methods); however, a substantial number of samples had missing metadata, especially in sets A and G, where only 1,407 and 4,392 samples have all sex, age, and tissue information, respectively. Leveraging their expression profiles, we built machine learning models (deep multi-task neural network and XGBoost) that completed metadata for the majority of samples with high confidence (Table S1). All of these predictions were further manually inspected based on unstructured sample metainformation available in the source files when possible.
Since both the proportion of samples with high expression of entry proteins and the absolute expression value of these proteins within individual samples are important for understanding sex differences, we analyzed both categorical and continuous expression data. We first categorized the samples within each dataset into high (e.g., top 10% within each dataset) and normal expression groups (i.e., ACE2 high vs. ACE2 normal, TMPRSS2 high vs. TMPRSS2 normal, and ACE2&TMPRSS2 high vs. ACE2&TMPRSS2 normal), and then merged all three datasets into a single matrix (referred to as the Merged dataset) consisting of 220,835 samples. The Merged dataset, which includes diseased samples, healthy samples, and samples treated with perturbagens, might be one of the best resources thus far for investigating the expression of entry proteins. Similarly, we compiled a single matrix consisting of 8,066 healthy samples from T (referred to as the Healthy dataset). Logistic regressions were applied to predict the high expression group using age, tissue, and sex as features (by default, 95% confidence interval (CI), female as the reference). In addition to analyzing categorical expression data using the Merged dataset, we compared absolute expression between sex groups for each dataset separately.
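The categorization step can be illustrated with a minimal sketch. It assumes each sample is a dictionary holding a dataset label and one expression value per gene; this layout, the function name, and the toy values are hypothetical and are not taken from the study's pipeline. The cutoff is computed separately within each dataset, so that a sample is compared only against samples profiled on the same platform.

```python
from collections import defaultdict


def label_high_expression(samples, gene, top_fraction=0.10):
    """Flag samples whose `gene` value falls in the top fraction of their own dataset.

    samples: list of dicts with a "dataset" key and one key per gene
             (hypothetical layout, used only for illustration).
    Returns a list of booleans aligned with `samples`.
    """
    by_dataset = defaultdict(list)
    for s in samples:
        by_dataset[s["dataset"]].append(s[gene])

    # One cutoff per dataset: the smallest value among its top `top_fraction` samples.
    cutoffs = {}
    for name, values in by_dataset.items():
        ordered = sorted(values)
        n_high = max(1, round(len(ordered) * top_fraction))
        cutoffs[name] = ordered[-n_high]

    # Values tied with the cutoff are also flagged, so ties can enlarge the high group.
    return [s[gene] >= cutoffs[s["dataset"]] for s in samples]


# Toy data: 20 samples in dataset "T" and 10 samples in dataset "G".
toy = [{"dataset": "T", "ACE2": float(v), "TMPRSS2": float(21 - v)} for v in range(1, 21)]
toy += [{"dataset": "G", "ACE2": float(v), "TMPRSS2": float(v)} for v in range(101, 111)]

ace2_high = label_high_expression(toy, "ACE2")
tmprss2_high = label_high_expression(toy, "TMPRSS2")
# Assumption: "ACE2&TMPRSS2 high" means high for both genes in the same sample.
both_high = [a and t for a, t in zip(ace2_high, tmprss2_high)]
```

Because the threshold is dataset-specific, the resulting binary labels can be pooled across platforms even though the raw expression scales differ.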
Table 1. Odds ratios of sex in the prediction of the high expression group in the Healthy dataset and in the Merged dataset. In the regression model, Y is the binary expression level of an entry protein, and X is sex, with female as the reference. The all-ages group is adjusted for age, tissue, and data source, and the older group (>60 years) is adjusted for tissue and data source. *: P < 0.001 and #: P = 0.009.
| Entry protein | Healthy dataset, all ages (n=8,066) | Healthy dataset, >60 (n=2,849) | Merged dataset, all ages (n=220,835) | Merged dataset, >60 (n=37,911) |
|---|---|---|---|---|
| ACE2 | 1.06 [0.86-1.32] | 1.02 [0.72-1.44] | 1.25 [1.19-1.30]* | 1.15 [1.07-1.23]* |
| TMPRSS2 | 1.03 [0.85-1.27] | 0.91 [0.65-1.29] | 1.28 [1.23-1.34]* | 1.32 [1.24-1.42]* |
| ACE2 & TMPRSS2 | 0.80 [0.52-1.24] | 0.55 [0.27-1.10] | 1.16 [1.09-1.24]* | 1.12 [1.03-1.22]# |
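For readers who want to see how an odds ratio and its 95% CI arise, the following minimal sketch computes an unadjusted odds ratio with a Wald interval from a 2x2 table of counts. This is a simplification and not the adjusted model behind Table 1: the values above are adjusted for age, tissue, and data source, which requires a multivariable logistic regression. With sex as the only predictor, however, the exponentiated logistic regression coefficient equals this 2x2 odds ratio. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_wald(high_m, normal_m, high_f, normal_f, z=1.96):
    """Unadjusted odds ratio (male vs. female, female as reference) with a Wald CI.

    Arguments are counts of samples in each cell of the 2x2 table.
    z=1.96 gives an approximate 95% confidence interval.
    """
    odds_ratio = (high_m * normal_f) / (normal_m * high_f)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / high_m + 1 / normal_m + 1 / high_f + 1 / normal_f)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_est, ci_low, ci_high = odds_ratio_wald(high_m=120, normal_m=880, high_f=100, normal_f=900)
print(f"OR {or_est:.2f} [{ci_low:.2f}-{ci_high:.2f}]")
```

An interval that includes 1, as in this toy example, corresponds to the non-significant entries in the Healthy dataset columns.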
We did not observe any significant difference in the proportion of samples with high expression of ACE2, TMPRSS2, or ACE2&TMPRSS2 between women and men in the Healthy dataset after adjusting for age and tissue (Table 1). However, the proportion of highly expressed samples is larger in men than in women in the Merged dataset (ACE2: OR 1.25 [1.19-1.30], P < 0.001; TMPRSS2: OR 1.28 [1.23-1.34], P < 0.001; ACE2&TMPRSS2: OR 1.16 [1.09-1.24], P < 0.001) (Table 1). In the older group, a difference in proportion was also observed in the Merged dataset (ACE2: OR 1.15 [1.07-1.23], P < 0.001; TMPRSS2: OR 1.32 [1.24-1.42], P < 0.001; ACE2&TMPRSS2: OR 1.12 [1.03-1.22], P = 0.009), but not in the Healthy dataset (Table 1). Neither ACE2 nor TMPRSS2 is highly expressed in the majority of samples, but both are highly expressed in a considerable number of samples from both men and women, as suggested by the long tails in both G and A, but not in T (normal) (Figure 1A). Compared to the younger group (0-19), the older group (>60) has a larger difference between males and females in ACE2 expression (G: M/F 1.11 [1.1-1.12], P < 0.001 in older vs. M/F 0.99 [0.98-0.99], P < 0.001 in younger) (Figure 1A) as well as in TMPRSS2 expression (G: M/F 1.04 [1.03-1.04], P < 0.001 in older vs. M/F 1.0 [0.99-1.0], P = 0.58 in younger) (Figure S2A). Further analysis of additional disease samples with the highest expression of ACE2 revealed that Crohn’s disease, ulcerative colitis, Barrett’s esophagus, trachoma, and ichthyosis have overall higher ACE2 expression in disease samples compared to controls (Student’s t-test, P < 0.05, Figure S3). No difference in ACE2 expression between sexes was observed in these disease samples, likely due to the small sample size for each disease. In summary, although no expression difference of entry proteins between sexes was observed in the Healthy dataset, higher ACE2 expression was found in men, especially in older men, in the Merged dataset.
Next, we investigated whether there are expression differences in individual tissues. Perhaps because of the wide coverage of samples in A and G, expression of ACE2 has a larger variation in both datasets than in the Healthy dataset T, especially in the kidney, small intestine, heart, liver, and colon (Figure 1B), while a large variation exists in the expression of TMPRSS2 in the kidney, small intestine, liver, colon, lung, pancreas, and prostate (Figure S2B). ACE2 is not differentially expressed between sexes in the lung (OR: 0.9 [0.78-1.04], P > 0.001), and women have even lower TMPRSS2 expression in the lung in the Merged dataset (OR: 0.71 [0.64-0.78], P < 0.001) (Table S2). Notably, the kidney is the only tissue showing a remarkable difference in ACE2 expression between sexes in both A and G (Figure 1B). After adjusting for age and data source, the OR is 1.45 [1.26-1.67] (P < 0.001) (Table S2). The kidney is also the only tissue showing a significant difference in TMPRSS2 expression between sexes in both A and G (Figure S2B). We were able to further map 28% of those samples with high ACE2 expression to their diseases based on sample metainformation. The top mapped diseases are clear cell/renal cell carcinoma (60.8%), renal interstitial fibrosis (9.1%), acute kidney injury (7.5%), glomerulosclerosis (6.7%), nephritis (4.3%), and nephropathy (2.6%). In addition, as steroid hormone receptors regulate the renin-angiotensin-aldosterone system, where ACE2 is an essential component 16, we examined the expression relationship between ACE2 and steroid hormone receptors. ACE2 expression has a higher correlation with AR expression (Androgen Receptor, Spearman Rho: 0.72, P < 0.001) than with ESR1 expression (Estrogen Receptor 1, Rho: 0.19, P < 0.001), ESR2 expression (Estrogen Receptor 2, Rho: -0.12, P < 0.001), and PGR expression (Progesterone Receptor, Rho: 0.26, P < 0.001) in the kidney (Figure S4A).
The genes regulated by AR also highly overlap with the genes positively co-expressed with ACE2 in the healthy adrenal gland (P = 3.08E-5, Figure S4B), suggesting that ACE2 expression might be associated with androgen receptor activity in the kidney.
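The Spearman correlations reported above are Pearson correlations computed on ranks. A minimal sketch follows, assuming two equal-length lists of expression values per gene; the function names and the toy values are illustrative only and do not reproduce the study's data or software.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy expression values for two genes across five samples (hypothetical).
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.0, 4.0, 8.0, 16.0, 32.0]
rho = spearman_rho(gene_a, gene_b)
```

Because only ranks enter the calculation, any monotonic relationship, such as the exponential one in the toy data, yields a coefficient of 1 even though the relationship is not linear.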
In order to find clinical evidence for these findings, we analyzed serum creatinine levels of 6,031 COVID-19 patients (4,621 inpatients and 1,410 outpatients with available labs) measured in five member hospitals of the Mount Sinai Health System up to May 10, 2020. We observed that men have significantly higher serum creatinine levels than women after normalizing to sex-specific reference ranges and adjusting for age and race (Inpatients OR: 1.89 [1.66-2.15], P < 0.001; Outpatients OR: 2.12 [1.68-2.66], P < 0.001) (Extended Data), indicating that male COVID-19 patients are more likely to have kidney dysfunction than female patients. However, both expression and clinical data analyses suggest that the sex difference in the kidney is not specific to the older group (Table S2 and Extended Data). Recent studies reported that acute kidney injury is common in patients hospitalized with COVID-19 and is associated with increased mortality 17,18. Together, the expression difference of entry proteins in the kidney between sexes might be a factor contributing to sex differences in COVID-19 susceptibility.
Sex stratified analysis of host responses to SARS-CoV-2
We searched GEO and SRA to obtain COVID-19 patient RNA-Seq samples and reprocessed raw sequence data when possible. We compiled four datasets with sex information (one from upper airway nasopharyngeal, one from upper airway naso/oro-pharyngeal, one from blood PBMC, and one from blood leukocytes), totaling 782 samples (Table S4). In each dataset, the ratio of the number of samples between sexes is close to 1. For each large upper airway dataset, we stratified samples into an older age group (>60 years) and a middle age group (20-60 years). We enumerated all the possible comparisons for each dataset (i.e., female control vs. female patient, male control vs. male patient, female control vs. male control, and female patient vs. male patient), with each comparison using the same thresholds to select differentially expressed (DE) genes. In comparing female and male samples either in the control group or in the COVID-19 group, only a few sex-specific genes were dysregulated between women and men. However, expression of a vast number of genes was significantly changed (P < 0.001, absolute fold change > 2) between healthy controls and COVID-19 patients in either men or women (e.g., 4,269 DE in female CT vs. SARS2 and 911 DE in male CT vs. SARS2 in the middle age group; 627 DE in female CT vs. SARS2 and 29 DE in male CT vs. SARS2 in the older age group in GSE152075) (Figure 2 A-F). Interestingly, such changes seem unique to each sex, with only a small portion of DE genes shared by both sexes. The two datasets from blood show the largest proportion of shared DE genes (35.7% and 30.8%, Figure 2E, 2F), while the dataset from older male upper airway has the lowest proportion of shared DE genes (0.1%, Figure 2B). The lower proportion is likely because fewer genes were differentially expressed in older male upper airways after SARS-CoV-2 infection. Female patients presented very distinct gene expression changes in all datasets, especially in the younger group (Figure 2A, 2C).
Pathway enrichment analysis of these distinct DE genes confirmed the immune response differences (cytokine-mediated signaling, cellular response to interferon-gamma and interferon 1) as previously reported 6,7; however, a few other non-immune-related pathways were enriched in female patients, including down-regulation of mitochondrial respiratory responses and regulation of cholesterol biosynthesis (Figure 2 G-I). The younger male group presented downregulation of various immune responses such as humoral immune response, acute inflammatory response, and Fc-gamma signaling pathways (Figure 2G), but no enriched pathways were observed in older male COVID-19 patients. Together with the higher susceptibility in older men, the analysis suggests that men and women have distinct host cellular responses in addition to immune responses. Importantly, our dataset suggests that weak host responses in the upper airway could be one indicator of susceptibility.
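The thresholding and overlap steps in the sex-stratified comparisons can be sketched as follows. The sketch assumes each comparison yields a mapping from gene to a (log2 fold change, p-value) pair, and that the shared percentage is the size of the intersection divided by the size of the union of the two DE sets. Both the input layout and this definition of "shared" are assumptions, since the text does not specify them; the gene names and values are placeholders.

```python
import math


def select_de_genes(results, p_cutoff=0.001, min_abs_fold=2.0):
    """Return the set of genes passing the p-value and absolute fold change thresholds.

    results: dict mapping gene -> (log2_fold_change, p_value) (hypothetical layout).
    An absolute fold change > 2 corresponds to |log2 fold change| > 1.
    """
    min_abs_log2 = math.log2(min_abs_fold)
    return {
        gene
        for gene, (log2_fc, p_value) in results.items()
        if p_value < p_cutoff and abs(log2_fc) > min_abs_log2
    }


def shared_fraction(de_a, de_b):
    """Fraction of DE genes shared by two comparisons (intersection over union)."""
    union = de_a | de_b
    return len(de_a & de_b) / len(union) if union else 0.0


# Placeholder results for control vs. COVID-19 comparisons within each sex.
female_results = {
    "GENE_A": (2.5, 1e-5),
    "GENE_B": (-1.4, 1e-4),
    "GENE_C": (0.5, 1e-6),   # significant but fold change too small
    "GENE_D": (3.0, 0.01),   # large fold change but not significant
}
male_results = {
    "GENE_A": (1.2, 1e-4),
    "GENE_B": (-0.2, 0.5),
    "GENE_E": (-2.0, 1e-5),
}

female_de = select_de_genes(female_results)
male_de = select_de_genes(male_results)
shared = shared_fraction(female_de, male_de)
```

Applying the same thresholds to every comparison, as the text describes, keeps the DE set sizes comparable between sexes and age groups.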
We further inferred the enrichment of 64 cell types in COVID-19 leukocyte samples using xCell 19 and compared cell type enrichment. In both men and women, CD8+ T cells and memory CD8+ T cells were suppressed in COVID-19 ICU patients (Figure S5). NK cells were suppressed in male ICU patients while neutrophils were elevated in female ICU patients. One striking difference between men and women came from the enrichment of mesangial cells, a kidney-specific cell type (control vs. COVID-19: male P = 1E-7 and female P = 1E-1, Student’s t-test). Logistic regression analysis of mesangial cell enrichment against disease severity (non-ICU vs. ICU) indicated that patients with higher enrichment of mesangial cells are more likely to be admitted to the ICU (OR: 3.2 [1.6-8.3], P < 0.001) (adjusted for age, Supplementary Figure S5). Together with the higher expression of ACE2 and higher creatinine levels in men, this analysis implies that impaired kidney function could be one source of sex differences.
Responses to sex steroid hormone treatment
The difference in sex hormone levels between sexes might contribute to disease susceptibility; however, establishing the connection requires the development of robust SARS-CoV-2 animal models or the launch of clinical trials, neither of which could be accomplished soon. Therefore, we sought to understand how infected cells respond to treatment with hormones in vitro, including estrogens, progesterone, and androgens. We first evaluated the anti-SARS-CoV-2 activity of estradiol (estrogen receptor agonist), fulvestrant (estrogen receptor antagonist), danazol (androgen receptor agonist), bicalutamide (androgen receptor antagonist), and hydroxyprogesterone caproate (OHPC, progesterone receptor agonist) in Vero and Calu-3 cells (Table 2, Figure S6). Among these drugs, only OHPC was effective in cells challenged with SARS-CoV-2 (IC50 13 µM in Vero and 6.4 µM in Calu-3). A previous study validated progesterone in vitro and proposed that it might act through targeting sigma receptors, the inhibitors of which displayed antiviral activity in vitro 20. Thus, we evaluated an additional progesterone receptor agonist, desogestrel, and did not observe efficacy (IC50 > 50 µM). Similarly, in an independent screening effort from the NCATS OpenData Portal project, three progesterone receptor agonists (desogestrel, chlormadinone acetate, and danazol) showed weak anti-cytopathic effect activity (20 µM) 21. This suggests that the potential protective effect of progesterone might come from its off-target effect on sigma receptors. We further surveyed the activity of 62 steroid and non-steroid hormone drugs through a literature search and by querying large-scale screening databases including the NCATS OpenData Portal, and confirmed that ER agonists, ER antagonists, AR agonists, AR antagonists, and PR agonists generally did not present in vitro anti-SARS-CoV-2 activity, except diethylstilbestrol, a non-steroid ER agonist (with an IC50 of 4.5 µM) (Extended Data).
However, six out of eight selective estrogen receptor modulators (SERMs) showed considerable activity (IC50: 3.4-12 µM). SERMs also presented anti-EBOV activity in previous screening efforts, and their activity appeared to be an off-target effect 22,23. Together, the role of hormones in antiviral activity is still inconclusive; however, we hope our data will prompt deeper investigation of their effects in vivo or in the clinic.
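As a rough illustration of how an IC50 can be read off a dose-response series, the sketch below interpolates linearly on the log10 concentration scale between the two doses that bracket 50% inhibition. This is an assumption made for illustration: the text does not state how the IC50 values were derived from the curves in Figure S6, and a fitted sigmoidal model is a common alternative. The function name and the toy measurements are hypothetical.

```python
import math


def ic50_by_interpolation(concs_um, inhibition_pct):
    """Estimate IC50 (in µM) by log-linear interpolation around 50% inhibition.

    concs_um: tested concentrations in µM (must be positive).
    inhibition_pct: percent inhibition measured at each concentration.
    Returns None when no adjacent pair of doses brackets 50% inhibition,
    e.g. when inhibition stays below 50% (reported as "> highest dose").
    """
    points = sorted(zip(concs_um, inhibition_pct))
    for (c_low, y_low), (c_high, y_high) in zip(points, points[1:]):
        if y_low < 50.0 <= y_high:
            frac = (50.0 - y_low) / (y_high - y_low)
            log_ic50 = math.log10(c_low) + frac * (math.log10(c_high) - math.log10(c_low))
            return 10 ** log_ic50
    return None


# Hypothetical dose-response series, not taken from Figure S6.
active = ic50_by_interpolation([1.0, 100.0], [0.0, 100.0])
inactive = ic50_by_interpolation([1.0, 10.0, 50.0], [5.0, 10.0, 20.0])
```

A compound whose inhibition never reaches 50% at the highest tested dose returns None here, which corresponds to the ">50" entries in Table 2.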
Table 2: In vitro anti-SARS-CoV-2 efficacy of steroid sex hormone drugs. The IC50s in Vero and Calu-3 cells were summarized from dose-response curves (Figure S6). The IC50s of SERMs (Selective Estrogen Receptor Modulators) were collected from published studies (Extended Data).
| Drug | Drug class | Vero (µM) | Calu-3 (µM) | Other studies (µM) |
|---|---|---|---|---|
| Estradiol | Estrogen receptor agonist | >50 | 27.3 | |
| Fulvestrant | Estrogen receptor antagonist | >50 | >50 | |
| Danazol | Androgen receptor agonist | >50 | 25.6 | |
| Bicalutamide | Androgen receptor antagonist | >50 | >50 | |
| Hydroxyprogesterone caproate | Progesterone receptor agonist | 13.0 | 6.4 | |
| Desogestrel | Progesterone receptor agonist | >50 | >50 | |
| Bazedoxifene | SERM | NA | NA | 3.4 (ref. 24) |
| Droloxifene | SERM | NA | NA | 6.6 (ref. 24) |
| Ospemifene | SERM | NA | NA | 12.6 (ref. 21) |
| Raloxifene hydrochloride | SERM | NA | NA | 2.2 (ref. 21) |
| Tamoxifen | SERM | NA | NA | 9.0 (ref. 25) |
| Toremifene | SERM | NA | NA | 11.3 (ref. 25) |