GWAS for STEM occupations
As reported in Table 1, the data comprise 178,876 UK Biobank participants of European ancestry who passed quality control, among whom 9,273 (5.2%) had a STEM career. In the pooled sample (N = 178,876), 47.8% were male, 34.9% held a college or university degree, and the average age was 54.3 (s.d. = 7.6). The share of male participants among STEM professionals (N = 9,273) was 83.5%, much higher than that among non-STEM workers (45.9%), suggesting a notable gender gap in the professional STEM field in the UK labor market.
Table 1
Summary statistics of the analytical sample in UK Biobank
| Variable | (1) Pooled: Mean | (1) Pooled: Std. Dev. | (2) STEM participants: Mean | (2) STEM participants: Std. Dev. | (3) Non-STEM participants: Mean | (3) Non-STEM participants: Std. Dev. |
|---|---|---|---|---|---|---|
| Age | 54.3 | 7.6 | 53.4 | 7.9 | 54.4 | 7.6 |
| Male | 47.8% | - | 83.5% | - | 45.9% | - |
| College/university degree | 34.9% | - | 56.9% | - | 33.8% | - |
| STEM occupation | 5.2% | - | 100.0% | - | 0.0% | - |
| Height (centimeters) | 170.8 | 9.2 | 175.4 | 8.1 | 170.5 | 9.2 |
| Weight (kilograms) | 78.6 | 15.9 | 82.9 | 14.7 | 78.4 | 16.0 |
| BMI | 27.3 | 4.7 | 27.0 | 4.2 | 27.3 | 4.7 |
| Household size | 2.6 | 1.3 | 2.6 | 1.3 | 2.6 | 1.3 |
| Household income (pounds) | 65593.9 | 46458.2 | 78087.8 | 44435.4 | 64915.4 | 46469.5 |
| N | 178,976 | | 9,273 | | 169,703 | |
We identified two genome-wide significant (p < 5e-8) loci for the binary trait of STEM (Figure 1 and Table 2), with lead single-nucleotide polymorphisms (SNPs) rs10048736 on chromosome 2 and rs12903858 on chromosome 15. Using LocusZoom (Pruim et al., 2010), SNPs within 500 kb of the lead SNPs were plotted in Figure 1. The genomic inflation factor (λcc) was estimated to be 1.07, suggesting no substantial impact of population structure or unmodeled relatedness. The NHGRI-EBI GWAS Catalog linked the lead SNPs and their nearest genes to neuroticism, educational attainment, body mass index, sleep-related phenotypes, corpus callosum mid-posterior volume, and Parkinson’s disease progression.
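For readers unfamiliar with the genomic inflation factor, the following is a minimal sketch of how such a statistic is commonly computed: p-values are converted to 1-d.f. chi-square statistics, and their median is divided by the median expected under the null (about 0.455). The function name and the toy p-values are assumptions for illustration only; this is not the pipeline used in the study.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()


def genomic_inflation(p_values):
    """Return median(observed chi-square) / median(null chi-square, 1 d.f.)."""
    # A two-sided p-value p corresponds to z = Phi^{-1}(p / 2) (negative tail);
    # squaring z gives the 1-d.f. chi-square statistic.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # The median of a 1-d.f. chi-square is the squared 75th percentile of N(0, 1).
    null_median = _STD_NORMAL.inv_cdf(0.75) ** 2
    return median(chi2) / null_median


# Hypothetical p-values spread evenly over (0, 1), i.e. a well-calibrated scan.
toy_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(toy_p)
print(f"lambda = {lam:.3f}")
```

Values close to 1 indicate that the bulk of test statistics behave as expected under the null; modest inflation in a large sample can also reflect polygenicity rather than confounding.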
Table 2
Summary of top loci for STEM trait identified from GWAS
| SNP | CHR | BP | A1 | A2 | EAF | Beta | SE | P | Nearest Gene(s) | Function |
|---|---|---|---|---|---|---|---|---|---|---|
| rs10048736 | 2 | 144239303 | A | G | 0.36 | 0.086 | 0.014 | 1.46E-09 | ARHGAP15 | Intron Variant |
| rs12691680 | 2 | 144204338 | T | C | 0.36 | 0.086 | 0.014 | 1.94E-09 | ARHGAP15 | Intron Variant |
| rs12903858 | 15 | 45972166 | C | T | 0.43 | -0.086 | 0.014 | 2.26E-09 | SQOR/SQRDL | Intron Variant |
| rs770075436 | 2 | 144228576 | T | TAAAG | 0.35 | 0.085 | 0.014 | 2.45E-09 | ARHGAP15 | Intron Variant |
| rs13411140 | 2 | 144215811 | T | C | 0.36 | 0.085 | 0.014 | 2.74E-09 | ARHGAP15 | Intron Variant |
| rs28684621 | 2 | 144231584 | G | T | 0.36 | 0.084 | 0.014 | 3.77E-09 | ARHGAP15 | Intron Variant |
| rs56081031 | 2 | 144197423 | C | A | 0.27 | -0.092 | 0.016 | 4.56E-09 | ARHGAP15 | Intron Variant |
| rs28380327 | 2 | 144232491 | T | A | 0.36 | 0.084 | 0.014 | 4.59E-09 | ARHGAP15 | Intron Variant |
| rs35789697 | 2 | 144248718 | A | G | 0.35 | 0.084 | 0.014 | 4.89E-09 | ARHGAP15 | Intron Variant |
| rs2381455 | 2 | 144182917 | G | T | 0.27 | -0.089 | 0.016 | 1.73E-08 | ARHGAP15 | Intron Variant |
| rs6705184 | 2 | 144265362 | G | C | 0.5 | 0.079 | 0.014 | 1.96E-08 | ARHGAP15 | Intron Variant |
| rs4662334 | 2 | 144241887 | G | A | 0.36 | 0.08 | 0.014 | 2.20E-08 | ARHGAP15 | Intron Variant |
| rs12617059 | 2 | 144238363 | T | G | 0.27 | -0.087 | 0.016 | 2.91E-08 | ARHGAP15 | Intron Variant |
| rs12999615 | 2 | 144149614 | A | T | 0.27 | -0.087 | 0.016 | 3.08E-08 | ARHGAP15 | Intron Variant |
| rs4402695 | 2 | 144259542 | T | C | 0.27 | -0.087 | 0.016 | 3.94E-08 | ARHGAP15 | Intron Variant |
| rs57014442 | 2 | 144220812 | AAT | A | 0.27 | -0.086 | 0.016 | 4.04E-08 | ARHGAP15 | Intron Variant |
| rs35564832 | 2 | 144237261 | G | A | 0.27 | -0.086 | 0.016 | 4.14E-08 | ARHGAP15 | Intron Variant |
| rs13020925 | 2 | 144263416 | C | T | 0.27 | -0.087 | 0.016 | 4.52E-08 | ARHGAP15 | Intron Variant |
Heritability and genetic correlations of STEM occupations
For the trait of STEM occupation, the LDSC SNP-based heritability (h2) was estimated to be 4.2% (95% CI, 2.8% to 5.6%). The SNP heritability of STEM occupation is smaller than that of physical traits but broadly in line with that of some behavioral traits, such as risk tolerance (h2 = 4.6%; Linnér et al., 2019) and leadership position (h2 ranged from 3% to 8%; Song et al., 2022).
Next, we examined genetic correlations between STEM occupation and 17 potentially linked personal traits using GWAS summary statistics from previous studies, including educational attainment, intelligence, personality, risk preference, height, brain volume, income, and sleep duration. As shown in Figure 2, we found significant positive genetic correlations of STEM occupation with educational attainment (rg = 0.68, 95% CI, 0.55 to 0.82), cognitive ability (rg = 0.62, 95% CI, 0.48 to 0.75), intelligence (rg = 0.60, 95% CI, 0.44 to 0.75), household income (rg = 0.45, 95% CI, 0.32 to 0.59), noncognitive ability (rg = 0.42, 95% CI, 0.30 to 0.53), and sleep duration (rg = 0.12, 95% CI, 0.00 to 0.23). We found significant negative genetic correlations with ever smoked regularly (rg = -0.41, 95% CI, -0.53 to -0.29), insomnia (rg = -0.29, 95% CI, -0.43 to -0.15), morningness (rg = -0.24, 95% CI, -0.35 to -0.12), risk-taking (rg = -0.22, 95% CI, -0.34 to -0.10), drinks per week (rg = -0.18, 95% CI, -0.28 to -0.07), neuroticism (rg = -0.17, 95% CI, -0.28 to -0.07), and number of children (rg = -0.16, 95% CI, -0.31 to -0.00).
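To help compare the strength of evidence behind these estimates, the sketch below recovers an approximate standard error and z-score from a reported 95% confidence interval. It assumes the intervals are symmetric normal-approximation intervals (estimate ± 1.96 × SE) and ignores rounding in the reported bounds; the helper names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()
Z_975 = _STD_NORMAL.inv_cdf(0.975)  # about 1.96


def se_and_z_from_ci(estimate, lower, upper):
    """Approximate SE and z-score implied by a symmetric 95% CI."""
    se = (upper - lower) / (2.0 * Z_975)
    return se, estimate / se


def two_sided_p(z):
    """Two-sided p-value for a z-score under the standard normal."""
    return 2.0 * _STD_NORMAL.cdf(-abs(z))


# Estimates and bounds as reported in the text above.
reported = {
    "SNP heritability": (0.042, 0.028, 0.056),
    "educational attainment (rg)": (0.68, 0.55, 0.82),
    "sleep duration (rg)": (0.12, 0.00, 0.23),
    "ever smoked regularly (rg)": (-0.41, -0.53, -0.29),
}

for name, (est, lo, hi) in reported.items():
    se, z = se_and_z_from_ci(est, lo, hi)
    print(f"{name}: SE ~ {se:.3f}, z ~ {z:.2f}, p ~ {two_sided_p(z):.2g}")
```

Estimates whose interval touches zero after rounding, such as sleep duration and number of children, sit close to the conventional significance boundary, which this kind of back-calculation makes visible.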
The explanatory power of the STEM polygenic score
To examine the explanatory power of the polygenic score (PGS) of STEM, we constructed the PGS based on the GWAS results (p-value threshold of 0.05; standardized between 0 and 1) and obtained parameters from multiple regression analyses of various socioeconomic outcomes, including STEM jobs, educational attainment (measured by whether a participant holds a college/university degree), and household income (Table 3). All regressions were adjusted for age, age-squared, sex (except for rows b and c), educational attainment (except for column 2), and the first ten genetic principal components (PCs) of each participant.
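The construction described above can be sketched as follows, under stated assumptions: the score is taken to be a weighted sum of effect-allele dosages over SNPs with p < 0.05, and "standardized between 0 and 1" is read as min-max scaling. Any LD clumping or pruning step is omitted, and all inputs are toy values (the first two betas borrow the magnitudes reported for the lead SNPs in Table 2; everything else is made up).

```python
def polygenic_scores(dosages, betas, p_values, p_threshold=0.05):
    """Weighted allele-count score per person, using SNPs with p < p_threshold.

    dosages: one row per person, one effect-allele dosage (0 to 2) per SNP.
    """
    keep = [j for j, p in enumerate(p_values) if p < p_threshold]
    return [sum(betas[j] * row[j] for j in keep) for row in dosages]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy inputs: 3 SNPs, 4 people.
betas = [0.086, -0.086, 0.010]
p_values = [1.46e-9, 2.26e-9, 0.40]  # the third SNP fails the threshold
dosages = [
    [0, 2, 1],
    [1, 1, 2],
    [2, 0, 0],
    [1, 2, 1],
]

raw = polygenic_scores(dosages, betas, p_values)
pgs = min_max_scale(raw)
print([round(v, 3) for v in pgs])
```

With a score on the 0 to 1 scale, a regression coefficient describes the difference between the lowest- and highest-scoring participants, which is why the results are reported per 0.1 increment.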
In the pooled UKB cohort (Table 3, row a), a 0.1 increase in the STEM polygenic score was associated with a 7.6% increase in the probability of being a STEM professional, a 4.6% increase in the probability of earning a college/university degree, and a 634-pound increase in annual household income. Moreover, for the phenotype of STEM jobs, the STEM polygenic score accounted for 3.5% of the variance. This is on top of sex and educational attainment, which accounted for 0.1% and 1.4% of the variance of STEM jobs, respectively. Interestingly, the estimated effects of the PGS on the propensities to have a STEM job and a college/university degree were more pronounced in males than in females, whereas the opposite was found for household income (Table 3, rows b and c).
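The per-0.1 figures quoted above follow from scaling the pooled coefficients in Table 3, row (a). The short sketch below reproduces that arithmetic, assuming the coefficients can be read as linear changes per one unit of the 0 to 1 score; the dictionary keys are hypothetical labels.

```python
# Pooled-cohort coefficients from Table 3, row (a): change per 1.0 of the PGS.
pooled_beta = {
    "stem_job": 0.764,
    "college_degree": 0.455,
    "household_income_gbp": 6340.0,
}


def effect_of_increment(beta_per_unit, increment=0.1):
    """Change in the outcome implied by a given increment of the PGS."""
    return beta_per_unit * increment


effects = {name: effect_of_increment(b) for name, b in pooled_beta.items()}

print(f"STEM job: {effects['stem_job'] * 100:.2f} points")
print(f"College degree: {effects['college_degree'] * 100:.2f} points")
print(f"Household income: {effects['household_income_gbp']:.0f} pounds")
```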
We also estimated the effects of the STEM PGS in five independent UK Biobank cohorts with ethnic backgrounds different from the original discovery dataset (i.e., White British): Irish (N = 13,108), Indian (N = 5,835), Caribbean (N = 4,420), African (N = 3,308), and Chinese (N = 1,538). As shown in Table 3, rows d to h, the STEM polygenic score was significantly associated with the phenotype of STEM jobs in all five independent UK Biobank subsamples (p < 0.0001). For educational attainment, the STEM PGS was a significant predictor in the Irish (p < 0.0001), Indian (p < 0.0001), and African (p = 0.009) subsamples. For household income, the STEM PGS was a significant predictor only in the Caribbean (p = 0.011) and African (p = 0.038) subsamples.
Table 3
Parameter estimates of STEM polygenic score in explaining socioeconomic attainment
| UKB Cohort | (1) STEM jobs: beta | (1) STEM jobs: p | (1) STEM jobs: N | (2) Educational attainment: beta | (2) Educational attainment: p | (2) Educational attainment: N | (3) Income: beta | (3) Income: p | (3) Income: N |
|---|---|---|---|---|---|---|---|---|---|
| (a) Pooled | 0.764 | <0.001 | 315,400 | 0.455 | <0.001 | 487,202 | 6340.0 | <0.001 | 415,766 |
| (b) Males | 1.169 | <0.001 | 150,713 | 0.532 | <0.001 | 223,043 | 5395.1 | <0.001 | 198,227 |
| (c) Females | 0.318 | <0.001 | 164,687 | 0.379 | <0.001 | 264,159 | 6744.2 | <0.001 | 217,539 |
| (d) Irish | 0.667 | <0.001 | 8,617 | 0.395 | <0.001 | 12,707 | 5826.5 | 0.246 | 11,133 |
| (e) Indian | 0.990 | <0.001 | 4,092 | 0.459 | <0.001 | 5,660 | 13444.8 | 0.150 | 4,253 |
| (f) Caribbean | 0.522 | <0.001 | 3,217 | 0.185 | 0.081 | 4,295 | 21886.0 | 0.011 | 3,305 |
| (g) African | 1.399 | <0.001 | 2,212 | 0.417 | 0.009 | 3,202 | 24189.6 | 0.038 | 2,445 |
| (h) Chinese | 1.629 | <0.001 | 1,112 | 0.218 | 0.264 | 1,502 | 615.1 | 0.977 | 1,201 |
Notes: All regressions were adjusted for age, age-squared, sex (except for rows b and c), and the first ten genetic principal components of each participant. Regressions of STEM jobs and income were also adjusted for whether the participant held a college/university degree.
Assortative mating and intergenerational occupational transmission of STEM occupations
To test for assortative mating of STEM jobs at both genetic and phenotypic levels, we identified 39,985 couples as pairs of unrelated (i.e., genetic relationship < 0.05) opposite-sex individuals matched on several household variables as described previously (Yengo et al., 2018; Cheesman et al., 2020). We found significant correlations between couples for STEM occupations both genetically (ρ = 4.3%, p < 0.0001) and phenotypically (ρ = 11.0%, p < 0.0001), lending support to the existence of assortative mating and economic homogamy from the perspective of occupational choices (Yengo et al., 2018; Gonalons-Pons et al., 2017).
We also identified 3,708 parent-offspring pairs based on the kinship coefficients described in Cheesman et al. (2020) to examine the intergenerational transmission of STEM occupations. Surprisingly, although we found a significant genetic correlation of STEM jobs between parents and offspring (ρ = 52.9%, p < 0.0001), the phenotypic correlation of actual STEM occupational choices was not significant (ρ = 4.5%, p = 0.21), implying potentially higher intergenerational occupational mobility and more employment opportunities for offspring of STEM parents (Modalsli, 2017).
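The pairwise correlations reported in this section can be illustrated with a small sketch. It assumes a plain Pearson correlation computed across pairs, applied either to binary STEM indicators (phenotypic) or to each member's polygenic score (genetic); the exact estimator used in the study is not stated here, and the pairs below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical pairs: (member A has a STEM job, member B has a STEM job).
pairs = [(1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (0, 0), (1, 1), (0, 0)]
member_a = [a for a, _ in pairs]
member_b = [b for _, b in pairs]

rho = pearson(member_a, member_b)
print(f"phenotypic correlation across pairs: {rho:.3f}")
```

The same function applies unchanged to continuous polygenic scores, so the genetic and phenotypic correlations for couples or parent-offspring pairs can be compared on a common scale.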
Associations between average STEM polygenic score and regional economic performance
Finally, we investigated whether the regional average STEM polygenic score is associated with the economic performance of local administrative authorities in the UK. Figure 3 shows the geographic distributions of the average STEM polygenic score at the local authority level based on both current home address (a) and birthplace (b) provided in the UK Biobank. We tested the association between the average STEM polygenic score (based on home address) and four publicly available indicators of regional economic performance for the 380 local administrative units in the UK Biobank dataset: gross domestic product (GDP) from 1998 to 2019, value-added tax (VAT) from 1998 to 2019, business counts by industry group in 2016, and business employment by 17 industry groups in 2016. All regional data were collected from the website of the UK’s Office for National Statistics (ONS).
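The regional averaging step can be sketched as a simple group-by mean: each participant contributes their polygenic score to the local authority of their home address. The area labels and scores below are hypothetical, and the function name is an assumption for illustration.

```python
from collections import defaultdict


def mean_pgs_by_area(records):
    """Average polygenic score per area from (area, score) records."""
    totals = defaultdict(lambda: [0.0, 0])  # area -> [running sum, count]
    for area, score in records:
        totals[area][0] += score
        totals[area][1] += 1
    return {area: total / count for area, (total, count) in totals.items()}


# Hypothetical participants: (local authority of home address, PGS on 0-1 scale).
records = [
    ("Area A", 0.40),
    ("Area A", 0.60),
    ("Area B", 0.30),
    ("Area B", 0.50),
    ("Area B", 0.40),
]

area_means = mean_pgs_by_area(records)
for area, value in sorted(area_means.items()):
    print(f"{area}: mean PGS = {value:.2f}")
```

The resulting area-level means are the quantities that would then enter a regression against regional indicators, alongside controls such as population.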
As shown in Figure 4, controlling for regional population and aggregate regional indicators (i.e., East Midlands, East of England, London, North East, North West, Scotland, South East, South West, Wales, West Midlands, and Yorkshire and The Humber), the STEM PGS has a statistically significant association with VAT (panel b; 1998-2009 and 2016-2019), but not with GDP (panel a). In terms of local business counts, a 0.1 increase in the average STEM polygenic score is significantly associated with 942 more businesses in professional, scientific & technical, 250 more in information & communication, 245 more in construction, 215 more in arts, entertainment, recreation and other services, 96 more in production, 44 more in education, 35 more in motor trades, and 16 more in public administration & defence. Similarly, for local business employment, a 0.1 increase in the average STEM polygenic score is significantly associated with 3,415 more jobs in professional, scientific & technical, 1,176 more in arts, entertainment, recreation and other services, 1,079 more in agriculture, forestry & fishing, 868 more in education, 783 more in construction, 256 more in motor trades, and 113 more in public administration & defence.
Regional business counts and employment are crucial indicators of the local market and business performance, and the value-added tax has been linked to economic efficiency (Adhikari, 2020). Thus, as discussed by Abdellaoui et al. (2019), the results presented here imply that some of the regional economic outcomes are directionally linked to STEM-associated alleles detected by GWAS conducted in the current study. But such links must be interpreted with caution as they may not necessarily be causal. Other factors such as labor migration and business reallocation might drive these links.