Subjects
Thirty-one subjects with ASD and thirty matched controls participated in this study. Because of poor fMRI data quality, 29 subjects with ASD and 28 controls were ultimately included in the analysis. The participants with ASD (aged 20.2 ± 6.0 years, five female) were recruited from a community autism program. We reconfirmed the diagnosis of ASD using the diagnostic criteria of the Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5) [1]. The age- and sex-matched controls (aged 22.3 ± 3.5 years, eight female) were recruited from the local community and screened for major psychiatric illness with structured interviews. No subject took part in any intervention or medication program during the experimental period. Participants with a comorbid psychiatric or medical condition, a history of head injury, or a genetic disorder associated with autism were excluded. All participants had normal or corrected-to-normal visual acuity.
Procedures
Before fMRI scanning, each participant completed the State-Trait Anxiety Inventory (STAI) [20], to assess self-reported anxiety, and the Autism-Spectrum Quotient (AQ) [21].
The fMRI paradigm was derived from the work of Etkin et al. [9]. The visual stimuli consisted of black-and-white pictures of male and female faces with fearful and neutral expressions, chosen from the Pictures of Facial Affect [22]. The faces were oriented to maximize inter-stimulus alignment of the eyes and mouths, then artificially colorized (red, yellow, or blue) and equalized for luminosity. During scanning, subjects performed a color-identification task, in which they judged the color of each face and indicated their answer with a keypad button press. Each trial began with a 200-ms fixation cross cueing subjects to focus on the center of the screen, followed by a 400-ms blank screen and a 200-ms face presentation. Participants then had 1200 ms to respond with a key press indicating the color of the face. Non-masked stimuli consisted of a 200-ms fearful- or neutral-expression face. Backwardly masked stimuli consisted of a 17-ms fearful or neutral face, followed by a 183-ms neutral-face mask belonging to a different individual of the same color and gender. Each 12-s epoch consisted of six trials of the same stimulus type [Explicit Fearful (EF), Explicit Neutral (EN), Implicit Fearful (IF), or Implicit Neutral (IN)], randomized with respect to color and gender. The presentation order of the 12 epochs (two per stimulus type) and 12 fixation blocks (each a 12-s fixation cross) was pseudo-randomized, and two counterbalanced run orders were used to avoid stimulus-order effects. The stimuli were presented using Matlab software (MathWorks, Inc., Sherborn, MA, USA), triggered by the first radio-frequency pulse of the functional run, and displayed on VisuaStim XGA LCD screen goggles (Resonance Technology, Northridge, CA).
The screen resolution was 800 × 600 with a refresh rate of 60 Hz. Behavioral responses were recorded through a fORP interface unit and saved by the Matlab program.
Immediately after fMRI scanning, participants performed a detection task in which all of the stimuli were shown again and they were alerted to the presence of fearful faces. Subjects were administered a forced-choice test under the same presentation conditions as during scanning and asked to indicate whether or not they had seen a fearful face. The detection task was designed to assess possible awareness of the masked fearful faces; the chance level for correct answers was 50%. Performance was quantified with a detection sensitivity index (d′) based on the percentage of trials in which a masked stimulus was detected when presented [hits (H)], adjusted for the percentage of trials in which a masked stimulus was ‘detected’ when not presented [false alarms (FA)]: d′ = z-score(percentage H) − z-score(percentage FA), with chance performance = 0 ± 1.74.
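As an illustrative sketch (not the authors' analysis code), d′ can be computed from the hit and false-alarm rates using the inverse-normal (z) transform, here taken from scipy:

```python
from scipy.stats import norm

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Detection sensitivity: d' = z(hit rate) - z(false-alarm rate)."""
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# A participant at chance (hit rate equals false-alarm rate) yields d' = 0;
# e.g., 70% hits against 30% false alarms gives d' of about 1.05.
print(d_prime(0.70, 0.30))
```

In practice, rates of exactly 0 or 1 are usually nudged (e.g., by a half-trial correction) before the transform, since z(0) and z(1) are infinite.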
Functional MRI data acquisition, image processing and analysis
Functional and structural MRI data were acquired on a 3T scanner (Siemens Magnetom Tim Trio, Erlangen, Germany) equipped with a high-resolution 32-channel head array coil. Functional data were collected with a gradient-echo, T2*-weighted echo-planar imaging (EPI) sequence with blood oxygen level-dependent (BOLD) contrast. To optimize the BOLD signal in the amygdala [23], twenty-nine interleaved slices were acquired along the AC-PC plane, with a 96 × 128 matrix, a 19.2 × 25.6 cm2 field of view (FOV), and a 2 × 2 × 2 mm voxel size, yielding 144 volumes for the functional run (TR = 2 s, TE = 36 ms, flip angle = 70°, slice thickness = 2 mm, no gap). Parallel imaging (GRAPPA, acceleration factor 2) was used to speed acquisition. Structural data were acquired with a magnetization-prepared rapid gradient echo sequence (TR = 2.53 s, TE = 3.03 ms, FOV = 256 × 224 mm2, flip angle = 7°, matrix = 224 × 256, voxel size = 1.0 × 1.0 × 1.0 mm3, 192 sagittal slices/slab, slice thickness = 1 mm, no gap).
Image processing and analysis were performed using SPM8 (Wellcome Department of Imaging Neuroscience, London, UK) in MATLAB 7.0 (MathWorks Inc., Sherborn, MA, USA). Structural scans were coregistered to the SPM8 T1 template, and a skull-stripped image was created from the segmented gray matter, white matter, and CSF images; these segmented images were combined into a subject-specific brain template. EPI images were realigned and high-pass filtered (128-s cutoff), then coregistered to the subject-specific brain template, normalized to MNI space, and smoothed (4-mm FWHM). The voxel size used in the functional analysis was 2 × 2 × 2 mm3. All subjects who completed scanning showed less than one voxel of in-plane motion.
Preprocessing of the T1-weighted images used the DARTEL algorithm: new segment, generating roughly aligned gray matter (GM) and white matter (WM) images for each subject; create template, determining the nonlinear deformations that warp all GM and WM images to match each other; and normalization to Montreal Neurological Institute (MNI) space, in which the images were normalized to the MNI template and smoothed with an 8-mm full-width at half-maximum Gaussian filter. The GM, WM, and cerebrospinal fluid (CSF) structures of each participant were obtained after this processing.
A two-level approach for the block-design fMRI data was adopted using the general linear model implemented in SPM8: fixed-effects analyses at the single-subject level generated individual contrast maps, and random-effects analyses were performed at the group level. At the single-subject level, contrast images were calculated comparing each explicitly and implicitly presented fearful-face block with the corresponding neutral baseline. Shorthand (e.g., EF − EN) indicates contrasts of regressors (e.g., Explicit Fearful > Explicit Neutral); error bars in figures signify SEM. To isolate the effect of the fear content of the stimuli from other aspects of the stimuli and the task, we subtracted neutral (EN) or masked-neutral (IN) activity from fearful (EF) or masked-fearful (IF) activity, respectively. Explicit perception of fearful faces is denoted non-masked fear (EF − EN), and implicit perception masked fear (IF − IN). These first-level contrast images were then entered into a second-level full factorial analysis: 2 (group: ASD vs. CTL) × 2 (attention: explicit vs. implicit). Whole-brain activations were corrected for multiple comparisons at a family-wise error (FWE) rate of P < .05.
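The contrast subtraction itself is simple arithmetic on the per-condition beta maps; the array names and shapes below are hypothetical stand-ins for the maps produced by a first-level GLM:

```python
import numpy as np

# Hypothetical first-level beta maps (one value per voxel) for the four conditions
rng = np.random.default_rng(0)
betas = {cond: rng.standard_normal(1000) for cond in ("EF", "EN", "IF", "IN")}

# Voxel-wise subtraction isolates the fear effect from other stimulus/task aspects
non_masked_fear = betas["EF"] - betas["EN"]  # explicit fear: EF - EN
masked_fear = betas["IF"] - betas["IN"]      # implicit fear: IF - IN
```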
Using MarsBar (http://marsbar.sourceforge.net/), regions of interest (ROIs) were defined in the right and left amygdala according to a prior meta-analysis [24]. Signals across all voxels within a 5-mm-radius sphere in each ROI were averaged and evaluated for the masked and non-masked comparisons. The individual mean parameter estimates (beta values) were then subjected to a mixed ANOVA to test for main effects of group (ASD vs. CTL) and attention (explicit vs. implicit), as well as the group-by-attention interaction.
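The sphere averaging can be sketched as follows; this is a simplified illustration on an isotropic voxel grid (it ignores the image affine that MarsBar handles internally), with all names hypothetical:

```python
import numpy as np

def sphere_mean(volume, center_mm, voxel_size_mm=2.0, radius_mm=5.0):
    """Mean parameter estimate over voxels within radius_mm of center_mm.

    volume: 3-D array of beta values; center_mm is expressed in the same
    frame, assuming an isotropic grid with the origin at voxel (0, 0, 0).
    """
    coords = np.indices(volume.shape).reshape(3, -1).T * voxel_size_mm
    dist = np.linalg.norm(coords - np.asarray(center_mm), axis=1)
    return volume.reshape(-1)[dist <= radius_mm].mean()

# A uniform volume averages to its constant value regardless of sphere placement
vol = np.ones((20, 20, 20))
print(sphere_mean(vol, center_mm=(20.0, 20.0, 20.0)))
```

The per-subject means returned by such a routine are what enter the mixed ANOVA as the dependent variable.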
Functional connectivity analysis
Psychophysiological interaction (PPI) analysis assesses the hypothesis that activity in one brain region can be explained by an interaction between cognitive processes and hemodynamic activity in another brain region. The individual time series for the right amygdala was obtained by extracting the first principal component from all raw voxel time series in a 5-mm-radius sphere centered on the coordinates of the subject-specific amygdala activation. These time series were mean-corrected and high-pass filtered to remove low-frequency signal drifts. The physiological factor was then multiplied by the psychological factor to form the interaction term, i.e., a third regressor derived from the first two. PPI analyses were carried out for each subject with a design matrix containing the interaction term, the psychological factor, and the physiological factor as regressors. PPI analyses were conducted separately for each group (ASD vs. CTL) to identify brain regions showing significant changes in functional coupling with the amygdala during explicitly and implicitly perceived fear, and subject-specific contrast images were then entered into random-effects analyses to compare groups. Monte Carlo simulation implemented in AlphaSim [25] determined that a 5-voxel extent at a height threshold of P < .005 (uncorrected) yielded an FWE-corrected threshold of P < .05, accounting for spatial correlation among neighboring voxels. Subsequently, a multiple regression model was run separately for each seed to estimate the regression coefficients between all voxels and the interaction time series (with task, movement, and linear drift as nuisance regressors). The strength of association between each voxel and the interaction time series was measured with R2 values.
These coefficients of determination were square-rooted and then multiplied by the sign of their respective estimated beta weights to recover the direction of association. The resulting correlation coefficients were converted to z-scores using Fisher's r-to-z transformation, and the resulting statistical maps were entered into a second-level group analysis (ASD vs. CTL) by running a voxel-wise two-sample t-test on the z-scores of the interaction effect for each seed separately.
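The PPI steps above can be sketched end-to-end for a single voxel. Everything here is illustrative (random data, a simplified block regressor, no nuisance terms beyond an intercept) rather than the SPM implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n_scans = 144

# Psychological factor: +1 during fear blocks, -1 during neutral blocks
# (illustrative block structure: 12 epochs of 6 trials each)
psych = np.tile(np.repeat([1.0, -1.0], 6), 12)

# Physiological factor: mean-corrected seed (amygdala) time series
physio = rng.standard_normal(n_scans)
physio -= physio.mean()

# Interaction term: element-wise product of the two factors
ppi = psych * physio

# Design matrix: interaction, psychological, physiological regressors + intercept
X = np.column_stack([ppi, psych, physio, np.ones(n_scans)])

# Regress a synthetic voxel time series on the design; compute R^2
y = 0.5 * ppi + rng.standard_normal(n_scans)
beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
r2 = 1.0 - res[0] / np.sum((y - y.mean()) ** 2)

# Signed square root recovers direction; Fisher's r-to-z yields the z-score
r = np.sign(beta[0]) * np.sqrt(r2)
z = np.arctanh(r)
```

The per-voxel z maps built this way are what the second-level two-sample t-test compares between groups.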