Participants
The structural MRI data used in this study were obtained from the ABIDE I and II projects (38, 39). All data collection procedures were approved by the local Institutional Review Board. Subject inclusion criteria were as follows: 1) complete whole-brain coverage, 2) good image quality (see follows), and 3) sites with more than 10 subjects in each group after meeting the above criteria. Finally, a total of 1829 subjects (ABIDE I: 460 patients with ASD and 515 healthy controls (HCs) from 15 sites and ABIDE II: 379 patients with ASD and 475 HCs from 13 sites) were included in our study.
Image quality control included two steps: 1) each image was visually inspected for obvious artifacts due to head motion; 2) check the quality control report generated by the Computational Anatomy Toolbox (CAT12) manually, images were excluded if their weighted average image quality rating (IQR) was lower than 70 and volumes with low mean homogeneity (below two standard deviations from the sample mean) were again visually inspected for artefacts. Additional information about the subjects for each site is provided in Table 1. Further information on data acquisition and site-specific details (i.e., protocols, test batteries used, and scanning parameters) is available at the ABIDE website (https://fcon_1000.projects.nitrc.org/indi/abide/).
Table 1. Distribution of Participants Across Different Sites in ABIDE I and II Datasets.
ABIDE, Autism Brain Imaging Data Exchange; ASD, autism spectrum disorder; HC, healthy control; BNI, Barrow Neurological Institute; CALTECH, California Institute of Technology; CMU, Carnegie Mellon University; EMC, Erasmus University Medical Center Rotterdam; GU, Georgetown University; IP, Institut Pasteur and Robert Debré Hospital; IU, Indiana University; KKI, Kennedy Krieger Institute; LEUVEN, University of Leuven; MAX_MUN, Ludwig Maximilians University Munich; NYU, New York University Langone Medical Center; OHSU, Oregon Health and Science University; OLIN, Institute of Living at Hartford Hospital; PITT, University of Pittsburgh School of Medicine; SBL, Social Brain Lab BCN NIC UMC Groningen and Netherlands Institute for Neurosciences; SDSU, San Diego State University; SU, Stanford University; TCD, Trinity Centre for Health Sciences; TRINITY, Trinity Centre for Health Sciences; UCD, University of California Davis; UCLA, University of California, Los Angeles; UM, University of Michigan; USM, University of Utah School of Medicine; YALE, Yale Child Study Center, Yale School of Medicine.
Magnetic resonance imaging data pre-processing
T1 images were manually set the origin at the anterior commissure, then were processed using the CAT12 toolbox (version 1980, Structural Brain Mapping, Jena University Hospital, Jena, Germany) implemented in SPM12 (version 7771, Institute of Neurology, London, UK). We employed the default parameters of CAT12 for this pre-processing procedure. All the T1-weighted images were corrected for bias-field inhomogeneities, then segmented into gray matter, white matter, and cerebrospinal fluid and spatially normalized using the DARTEL algorithm (40). The final resulting voxel size was 1.5 × 1.5 × 1.5 mm.
For surface-based morphometry, CAT12 toolbox computes multiple surface parameters, including FD (41). These surface parameters of the left and right hemispheres were separately resampled and smoothed with a 20-mm FWHM Gaussian kernel. The software parcellated the cortex into 400 regions of interest (ROI) using the Schaefer atlas 17 networks for surface measures (42). We then averaged the FD value from each ROI in each participant for further analyses.
Mega analysis
As the ABIDE datasets are multicentric with heterogeneous acquisition parameters across sites, raw FD values were harmonized between sites using the ComBat method to remove inter-site variation and preserve biological variability (43). Independent two-sample t-tests were performed between ASD and HC groups to identify cortical complexity changes related differences. All statistical analyses were performed using R software (version 4.2, https://www.r-project.org/), and a threshold of p < 0.05, False Discovery Rate (FDR) corrected, was applied.
Meta analysis
To account for differences in scanners, acquisitions and sample characteristics, statistical analysis was conducted using a prospective meta-analytic technique, where each site is initially treated as an independent study and results are pooled to define significance. Effect sizes were computed as standardized mean differences (Cohen’s d) using Hedges’g as estimator. The between-study variance τ2 was estimated using the restricted maximum likelihood method, from which we computed the proportion of variance imputable to heterogeneity. Computations were performed using R (https://www.r-project.org) with the packages meta and metafor (44). We report statistical significance for an α level of .05.
Gene expression data processing
Gene expression data were obtained from the AHBA (http://human.brain-map.org). The AHBA comprises the normalized expression data of 20737 genes represented by 58692 probes taken from 3702 brain tissue samples from six donors (one female and five males, aged 24–57 years) (32). A newly proposed pipeline for transcription-neuroimaging association studies based on AHBA data was used in this study (33). Only genes that were consistently expressed across donors (i.e., average inter-donor correlation ≥0.5) were considered for our analyses. To correct for donor-specific effects, scaled robust sigmoid (SRS) normalization was used to ensure equivalent scaling of expression values for each donor. After this procedure, the expression values were more comparable across donors. Finally, we obtained a normalized gene expression matrix of 400 × 15633 (ROI × gene) (42). The detailed preprocessing steps are described in Supplementary Material.
Identifying transcriptomic correlates of cortical complexity changes in ASD
To identify genes whose expression was inferentially correlated with ASD-related alterations, we used a PLS regression, a multi-variate technique accounting for inherent shared topological structure between brain-derived neuroimaging phenotypes and gene expression. Significant genes were obtained by regressing each gene against our t-statistical maps and using a one-sample t-test to determine whether the slopes were different from 0. To correct for multiple comparisons, the procedure was repeated by randomly rotating our maps using 1000 spin permutations, which were compared with the original t-statistic to assess gene significance. We then ranked all genes according to their z score weights to the PLS components.
From the ranked PLS gene list, genes with Z score more than 1.96 or less than -1.96 (PFDR < 0.05, FDR corrected for the total number of genes tested), were selected. These are denoted as PLS+ and PLS-, representing genes most positively and negatively associated with FD changes in ASD patients.
Gene enrichment analyses
We employed the Metascape software to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathways analysis (https://metascape.org/) (45). GO was used to determine their biological functions including molecular functions (MFs), biological processes (BPs), and cellular components (CCs) (46). KEGG was used to identify related biological pathways (47).
We employed Specific Expression Analysis (SEA) (48) to assess the potential over-representation of cortical complexity changes related genes in three specific domains, namely cell types, brain regions, and developmental stages. This analysis incorporated a specificity index probability (pSI), offering insights into the enrichment levels of genes within specific terms compared to others (49). The transcriptional profiles for brain development were sourced from the BrainSpan Atlas of the Developing Human Brain (http://www.brainspan.org/). All enrichment analyses were executed using Fisher's exact tests, and Benjamini-Hochberg False Discovery Rate (BH-FDR) correction was applied to account for multiple comparisons, ensuring a stringent threshold of significance (q < 0.05) for gene functional annotations.
We used TissueEnrich R package (50) and cell type specific expression analyses (CSEA) tools (http://doughertytools.wustl.edu/CSEAtool.html) (49) to conduct tissue, cell type, and temporal specific expression analyses. These specific expression analyses could help to determine the specific tissues, cortical cell types, and developmental stages in which the FD alteration related genes were overrepresented. Fisher’s exact tests were used to assess the significance of the above-mentioned enrichment analyses. Multiple testing was corrected using the BH-FDR correction with a corrected P value of 0.05.
We constructed PPI networks form the up and down regulated gene sets using STRING version 10.5, with the highest confidence value of 0.9 (51). Hub genes were defined by the top 1% of the node degree in the PPI networks. Additionally, the Human Brain Transcriptome database (http://hbatlas.org/) was used to characterize the spatial–temporal expression trajectory of hub genes with the highest node degree.