2.1 ICA test description and the scientific rationale behind the test
The ICA test is a rapid visual categorization task with backward masking 20,21,24. The test takes advantage of the human brain’s strong reaction to animal stimuli 25,26. One hundred natural images (50 animal and 50 non-animal), carefully selected to span various levels of difficulty, are presented to the participants in rapid succession. Images are presented at the center of the screen and subtend 7° of visual angle. In some images the head or body of the animal is clearly visible, which makes it easier to detect. In other images the animals are further away or presented in cluttered environments, making them more difficult to detect. A few sample images are shown in Figure 1. We used grayscale images so that common forms of color blindness could not affect participants’ results. Furthermore, color images can facilitate animal detection solely based on color 27,28, without full processing of the shape of the stimulus. This could have made the task easier and less suitable for detecting less severe cognitive dysfunctions.
The strongest categorical division represented in the human higher level visual cortex appears to be that between animates and inanimates 29,30. Studies also show that on average it takes about 100 ms to 120 ms for the human brain to differentiate animate from inanimate stimuli 26,31,32. Following this rationale, each image is presented for 100 ms, followed by a 20 ms inter-stimulus interval (ISI), followed by a dynamic noisy mask (for 250 ms), followed by the subject’s categorization into animal vs non-animal (Figure 1). Shorter ISIs make the animal detection task more difficult, whereas longer ISIs reduce the test’s usefulness, as it may then fail to detect less severe cognitive impairments. The dynamic mask is used to remove (or at least reduce) the effect of recurrent processes in the brain 33,34. This makes the task more challenging by reducing the ongoing recurrent neural activity that could artificially boost the subject’s performance; it further reduces the chances of learning the stimuli. For more information about rapid visual categorization tasks, refer to Mirzaei et al. (2013) 21.
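The trial timing described above can be laid out as a simple timeline. The following is a minimal sketch using only the durations stated in the text; the phase names and the `phase_onsets` helper are illustrative and not part of the ICA software. It shows that the mask begins 120 ms after image onset, which sits at the upper end of the 100 ms to 120 ms range cited for animate/inanimate differentiation.

```python
# Durations (ms) taken from the text; the phase names are illustrative labels.
TRIAL_PHASES_MS = [("image", 100), ("isi", 20), ("mask", 250)]


def phase_onsets(phases):
    """Return the onset time (ms from image onset) of each phase in a trial."""
    onsets = {}
    t = 0
    for name, duration in phases:
        onsets[name] = t
        t += duration
    # The response is collected once the mask has ended.
    onsets["response_window"] = t
    return onsets


onsets = phase_onsets(TRIAL_PHASES_MS)
# Stimulus onset asynchrony: time from image onset to mask onset.
soa_ms = onsets["mask"] - onsets["image"]
```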
The ICA test starts with a different set of 10 introductory images (5 animal, 5 non-animal) to familiarize participants with the task. These images are excluded from further analysis. If participants perform above chance (>50%) on these 10 images, they continue to the main task. If they perform at chance level (or below), the test instructions are presented again, followed by a new set of 10 introductory images. If they perform above chance in this second attempt, they progress to the main task. If they again fail to perform above chance, the test is aborted.
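The familiarization logic can be summarized as a small decision rule. The sketch below is illustrative only: the function name and return labels are hypothetical, and it assumes that performing exactly at chance on the second attempt is treated like performing below chance.

```python
def practice_outcome(correct_first, correct_second=None, n_images=10):
    """Decide what happens after the introductory block(s).

    correct_first / correct_second: number of correct responses in the first
    and (if taken) second introductory block of n_images images.
    Returns "proceed", "repeat_practice" or "abort" (hypothetical labels).
    """
    chance = n_images / 2  # 50% of the introductory images
    if correct_first > chance:
        return "proceed"
    if correct_second is None:
        # First attempt at or below chance: show instructions and a new block.
        return "repeat_practice"
    # Assumption: at-chance performance on the second attempt also aborts.
    return "proceed" if correct_second > chance else "abort"
```

For example, a participant with 5/10 correct on the first block repeats the practice; 7/10 on the second block then lets them proceed to the main task.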
Backward masking: To construct the dynamic mask, following the procedure of Bacon-Macé and colleagues (2005) 20,21, a white noise image was filtered at four different spatial scales, and the resulting images were thresholded to generate high-contrast binary patterns. For each spatial scale, four new images were generated by rotating and mirroring the original image, giving a pool of 16 images. The noisy mask used in the ICA test was a sequence of 8 images chosen randomly from the pool, with each spatial scale appearing twice in the dynamic mask.
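The mask construction can be sketched as follows. This is not the authors' implementation: the text does not specify the filter, the threshold, the image size, or which rotations and mirrorings were used, so the box blur, median threshold, 32 x 32 image, blur radii and the four transforms below are all assumptions chosen for illustration.

```python
import random


def white_noise(size, rng):
    """Square image of uniform random pixel values."""
    return [[rng.random() for _ in range(size)] for _ in range(size)]


def box_blur(img, radius):
    """Assumed low-pass filter (wrap-around borders); larger radius = coarser scale."""
    n = len(img)
    offsets = range(-radius, radius + 1)
    return [[sum(img[(r + dr) % n][(c + dc) % n] for dr in offsets for dc in offsets)
             for c in range(n)] for r in range(n)]


def threshold(img):
    """Binarise at the median (assumed) to obtain a high-contrast pattern."""
    flat = sorted(v for row in img for v in row)
    median = flat[len(flat) // 2]
    return [[1 if v >= median else 0 for v in row] for row in img]


def variants(img):
    """Four rotated/mirrored versions of one binary pattern (assumed choice)."""
    rot90 = [list(row) for row in zip(*img[::-1])]
    rot180 = [row[::-1] for row in img[::-1]]
    mirror_lr = [row[::-1] for row in img]
    mirror_ud = [list(row) for row in img[::-1]]
    return [rot90, rot180, mirror_lr, mirror_ud]


def build_pool(size=32, radii=(1, 2, 4, 8), seed=0):
    """Pool of 4 scales x 4 variants = 16 binary images, keyed by blur radius."""
    noise = white_noise(size, random.Random(seed))
    return {radius: variants(threshold(box_blur(noise, radius))) for radius in radii}


def sample_mask(pool, rng):
    """Sequence of 8 frames in random order, two per spatial scale."""
    frames = [(radius, img) for radius, images in pool.items()
              for img in rng.sample(images, 2)]
    rng.shuffle(frames)
    return frames
```

Calling `sample_mask(build_pool(), random.Random())` yields one 8-frame mask sequence; a fresh call gives a different random selection and order.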
2.2 Brief International Cognitive Assessment for MS (BICAMS)
The BICAMS battery consists of three standard pen-and-paper tests, measuring speed of information processing, visuo-spatial memory and verbal learning.
Symbol Digit Modalities Test (SDMT): The SDMT is designed to assess speed of information processing, and takes about 5 minutes to administer 35.
California Verbal Learning Test, 2nd edition (CVLT-II): The CVLT-II test 36,37 begins with the examiner reading a list of 16 words. Participants listen to the list and then report as many of the items as they can recall. Five learning trials of the CVLT-II are used in BICAMS 6; these take about 10 minutes to administer.
Brief Visual Memory Test–Revised (BVMT-R): The BVMT-R test assesses visuo-spatial memory 38,39. In this test, in three consecutive trials, six abstract shapes are presented to the participant for 10 seconds. After each trial, the display is removed from view and participants are asked to draw the stimuli with pencil on paper. The test takes about 5 minutes to administer.
2.3 Participants
In total, 174 volunteers took part in substudy 1 (Table 1): 91 patients diagnosed with multiple sclerosis (MS), and 83 age-, gender- and education-matched healthy controls. 48 MS patients took part in substudy 2 (Table 2). Of all participants, 25 attended both substudies. Participants’ ages ranged from 18 to 65 years. The study was conducted according to the Declaration of Helsinki and approved by the local ethics committee at Royan Institute. Informed written consent was obtained from all participants. Patient participants were consecutively recruited from the outpatient clinic of the Aria Medical Complex for MS in Tehran, Iran. Patients were diagnosed by a consultant neurologist according to the McDonald diagnostic criteria (2010 revision) 40. Healthy controls (HC) were recruited through local advertisement.
Exclusion criteria were: severe depression and other major psychiatric comorbidities; presence of neurological disorders or medical illness that independently affect brain function and cognition (other than MS for the patient group); visual problems that cannot be corrected with eyeglasses and that prevent the participant from reading; upper limb motor dysfunction; history of epileptic seizures; and history of illicit substance and/or alcohol dependence.
For each participant, information on age, education and gender was collected, together with the clinical characteristics of MS subtype. We quantified participant disability and disability progression over time using the Expanded Disability Status Scale (EDSS).
For the purposes of this study, patients with severe abnormality in at least one of the BICAMS sub-tests (defined as 2 SD below the norm) or with mild abnormality (defined as 1 SD below the norm) in at least two sub-tests of BICAMS were identified as cognitively impaired.
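The impairment rule can be expressed directly on normative z-scores. The following is a minimal sketch, assuming each BICAMS sub-test score has already been converted to a z-score against its norm; the function name and dictionary keys are illustrative, and treating a score exactly at the 1 SD or 2 SD boundary as abnormal is an assumption.

```python
def is_cognitively_impaired(z_scores, severe_sd=2.0, mild_sd=1.0):
    """Apply the study's impairment rule to BICAMS sub-test z-scores.

    z_scores: mapping of sub-test name -> z-score relative to the norm
              (negative values mean worse than the norm).
    Impaired if at least one sub-test is at or below -severe_sd, or at
    least two sub-tests are at or below -mild_sd (boundary handling assumed).
    """
    values = list(z_scores.values())
    n_severe = sum(1 for z in values if z <= -severe_sd)
    n_mild_or_worse = sum(1 for z in values if z <= -mild_sd)
    return n_severe >= 1 or n_mild_or_worse >= 2
```

For example, z-scores of -1.2 (SDMT), -1.1 (CVLT-II) and 0.5 (BVMT-R) meet the two-mild-abnormalities criterion, whereas a single score of -1.5 with the other two at the norm does not.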
2.4 Study procedures
Substudy 1: 174 participants (Table 1) took the iPad-based ICA test and the pen-and-paper BICAMS test, administered in random order. The same researchers who administered the BICAMS directed participants on how to take the iPad ICA test. In this substudy we investigated the convergent validity of the ICA test with BICAMS, the ICA’s test-retest reliability, and the sensitivity and specificity of the ICA platform in detecting cognitive impairment in MS.
To measure test-retest reliability for the ICA test, a subset of 21 MS and 22 HC participants was called back after five weeks (± 15 days) to take the ICA test as well as the SDMT. The subset’s characteristics were similar to those of the primary set in terms of age, education and gender ratio. For both the SDMT and the ICA, the same forms of the tests were used in the re-test session. Note that in the ICA test, while the images were the same, their presentation order is randomized in every administration.
Substudy 2: In this substudy, we investigated ICA’s correlation with the level of serum NfL in 48 MS patients (Table 2). Participants took the iPad-based ICA test and the pen-and-paper SDMT test, administered in random order. ICA and SDMT were administered in the same session, but blood samples were collected in another visit with a gap of 2-3 days in between.
Blood samples were collected in tubes for serum isolation, then centrifuged at 3000 rpm for 20 minutes of blood draw, and finally placed on ice. Serum samples were measured at 1:4 dilution. NfL concentrations in serum were measured using a commercial ELISA (NF-light® ELISA, UmanDiagnostics, Umeå, Sweden). We used an anti-NF-L monoclonal antibody (mAB) as the capture antibody and a biotin-labeled anti-NF-L mAB as the detection antibody. All samples were measured blinded. ELISA readings were converted to units per milliliter using a standard curve constructed from calibrators (bovine lyophilized NfL obtained from UmanDiagnostics).
Participants in substudy 2 also attended an 8-week physical/cognitive rehabilitation program, details of which are reported in separate studies 41,42. For the purposes of this study, to show ICA’s ability as a digital biomarker to track changes in cognition, we report pre- and post-rehabilitation ICA results for this group of participants, and the ICA correlation with NfL pre- and post-rehabilitation. Participants were divided into a rehabilitation group of 38 individuals and a control group of ten; the control group only took the tests before and after these 8 weeks, without attending the rehabilitation program. The rehabilitation group attended three sessions each week, each lasting about 70 minutes.
The physical rehabilitation program included a combination of endurance and resistance exercises, with gradually increasing intensities over the 8-week period.
The cognitive rehabilitation program included playing newly developed games in a virtual reality (VR) environment, targeting sensorimotor integration, memory-based navigation, and visual search.
2.5 Accuracy, speed, and ICA summary score calculations
Participants’ responses to each image and their reaction times (i.e. the time between image onset and response) are recorded and used to calculate their overall accuracy and speed. Speed and accuracy are then used to calculate an overall summary score, which we refer to as the ICA score.
Accuracy is simply defined as the number of correct categorisations divided by the total number of images, multiplied by 100.
(see Equation 1 in the Supplementary Files)
Speed is defined based on the participant’s reaction times in trials to which they responded correctly.
(see Equation 2 in the Supplementary Files)
Speed is inversely related to participants’ reaction times; the higher the speed, the lower the reaction time.
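As an illustration of the accuracy definition, and of which reaction times enter the speed calculation, the sketch below works on a hypothetical list of trial records (the field names `category`, `response` and `rt_ms` and the toy values are assumptions). It does not compute the speed or ICA score themselves, since Equations 2 and 3 are given only in the Supplementary Files.

```python
def accuracy_percent(trials):
    """Correct categorisations divided by total number of images, times 100."""
    correct = sum(1 for t in trials if t["response"] == t["category"])
    return 100.0 * correct / len(trials)


def correct_reaction_times(trials):
    """Reaction times (ms) from correctly answered trials only."""
    return [t["rt_ms"] for t in trials if t["response"] == t["category"]]


# Toy records for illustration only; not data from the study.
example_trials = [
    {"category": "animal", "response": "animal", "rt_ms": 520},
    {"category": "non-animal", "response": "non-animal", "rt_ms": 610},
    {"category": "animal", "response": "non-animal", "rt_ms": 700},
    {"category": "non-animal", "response": "non-animal", "rt_ms": 580},
]
```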
Preprocessing: We used the boxplot method to remove outlier reaction times before computing the ICA score. The boxplot is a non-parametric method for describing groups of numerical data through their quartiles, and it allows outliers in the data to be detected. Following the boxplot approach, reaction times greater than q3 + w * (q3 - q1) or less than q1 - w * (q3 - q1) are considered outliers, where q1 is the lower quartile and q3 is the upper quartile of the reaction times, and w is the 'whisker' length (w = 1.5). The number of reaction-time data points removed by the boxplot method can vary case by case; if this number exceeds 40% of the observed images, the results are deemed invalid, and a warning is shown to the clinician to repeat the test. In this study none of the participants triggered such a warning. The maximum proportion of outliers was 15%, which occurred in one of the MS patients.
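The outlier rule can be sketched as below. The quartile convention is not stated in the text, so the use of Python's `statistics.quantiles` with the "inclusive" method is an assumption (other conventions give slightly different q1 and q3), as is taking the number of supplied reaction times as the denominator for the 40% validity check. The function name is illustrative.

```python
from statistics import quantiles


def remove_rt_outliers(rts, w=1.5, max_removed_fraction=0.40):
    """Boxplot-style outlier removal for reaction times (ms).

    Returns (kept, removed, valid), where valid is False if more than
    max_removed_fraction of the reaction times were flagged as outliers.
    """
    # Quartile convention is an assumption; the paper does not specify one.
    q1, _, q3 = quantiles(rts, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - w * iqr, q3 + w * iqr
    kept = [rt for rt in rts if low <= rt <= high]
    removed = [rt for rt in rts if rt < low or rt > high]
    valid = len(removed) <= max_removed_fraction * len(rts)
    return kept, removed, valid
```

With the toy input `[400, 420, 430, 450, 460, 470, 480, 500, 2000]`, q1 = 430 and q3 = 480 under this convention, so the bounds are 355 ms and 555 ms and only the 2000 ms response is removed.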
The ICA summary score is a combination of accuracy and speed, defined as follows:
(see Equation 3 in the Supplementary Files)
2.6 ICA’s artificial intelligence (AI) engine
ICA’s AI engine (Figure 2) used in this study was a multinomial logistic regression (MLR) classifier trained on the set of ICA features extracted from the ICA test for each participant. These features included the ICA score and the trend of speed and accuracy during the test (i.e. whether the speed and/or accuracy were increasing or decreasing during the time-course of the test). The classifier also took the subject’s age, gender and education as inputs, in order to match subjects with similar demographics.
The multinomial logistic regression (MLR) classifier 43 is a supervised regression-based learning algorithm. The learning algorithm’s task is to learn a set of weights for a regression model that maps participants’ ICA test output to classification labels.
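To make the idea of learning weights that map features to class labels concrete, the following is a minimal multinomial (softmax) logistic regression trained by batch gradient descent. It is a generic sketch, not the ICA AI engine: the feature order, the 0/1 gender coding, the pre-standardized values, the learning rate, the number of epochs and the toy data are all assumptions made for illustration.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, x):
    """Class probabilities for one feature vector; last weight per row is the bias."""
    xb = list(x) + [1.0]
    return softmax([sum(w * v for w, v in zip(row, xb)) for row in W])


def predict(W, x):
    """Index of the most probable class."""
    p = predict_proba(W, x)
    return p.index(max(p))


def train_mlr(X, y, n_classes, lr=0.5, epochs=500):
    """Fit one weight vector per class by minimising the cross-entropy loss."""
    n_inputs = len(X[0]) + 1  # features plus bias
    W = [[0.0] * n_inputs for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_inputs for _ in range(n_classes)]
        for x, label in zip(X, y):
            p = predict_proba(W, x)
            xb = list(x) + [1.0]
            for k in range(n_classes):
                err = p[k] - (1.0 if k == label else 0.0)
                for j, v in enumerate(xb):
                    grad[k][j] += err * v
        for k in range(n_classes):
            for j in range(n_inputs):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


# Toy, pre-standardized rows (NOT study data). Assumed feature order:
# [ICA score, speed trend, accuracy trend, age, gender (0/1), education]
X_toy = [
    [1.2, 0.1, 0.0, -0.5, 0, 0.3],
    [0.9, 0.0, 0.1, 0.4, 1, -0.2],
    [1.0, -0.1, 0.1, 0.0, 1, 0.5],
    [-1.1, -0.2, -0.1, 0.3, 0, -0.4],
    [-0.8, -0.1, -0.2, -0.3, 1, 0.1],
    [-1.3, 0.0, -0.1, 0.6, 0, 0.0],
]
y_toy = [0, 0, 0, 1, 1, 1]  # hypothetical labels: 0 = unimpaired, 1 = impaired
```

After `W = train_mlr(X_toy, y_toy, n_classes=2)`, `predict_proba(W, x)` returns the class probabilities for a new feature vector `x`. In practice an established library implementation with regularization and cross-validation would be preferable to this hand-written loop.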
The basic difference between ICA’s classification of patients (using the AI engine) and the conventional way of defining an optimal cut-off value for classification is the dimensionality (or the number of features) used to make the classification. For example, in a conventional assessment tool, an optimal cut-off value is defined based on the test score. This is a one-dimensional classification problem with only one free parameter to optimize, and therefore less flexibility to learn from more data. In ICA, however, the test returns a rich set of features (one reaction time and one accuracy value for each image). The ICA score is the most informative summary score, but on top of this, we used a classifier to find the optimum classification boundary in the higher dimensional space. There are more free parameters to optimize here, and the classifier can therefore benefit from more data to set these parameters and achieve a higher accuracy. Furthermore, ICA’s performance can be further improved over time if it is exposed to more labelled data. This can be done by providing new batches of training data to update the current AI model available in the cloud.