In Experiment 1 we examined category learning in the 5/5 task for the first instantiation, a schematic “robot”, illustrated in Fig. 1. In the robot each feature is distinct and instantiated by a simple geometric form (e.g., triangles or rectangles for ears). Although features are physically connected, they do not form parts of a familiar figure. The images are black and white line drawings without texture, shading, or color, which also enhances the salience of individual features. We predicted that this instantiation would facilitate selective attention to dimensions and therefore we would see use of prototype representations.
Method
Participants
Participants were 44 undergraduate students from South China Normal University (23 females and 21 males), aged between 18 and 23. None had previously participated in related category learning experiments. All had normal or corrected-to-normal vision. Participants were paid for their participation. According to G*Power calculations, the experiment required at least 30 participants to detect a medium effect size (1-β = 0.8; α = 0.05). This study was approved by the SCNU Institutional Review Board.
Materials
Stimuli were formed using the 5/5 category structure illustrated in Table 1 and Fig. 1. Stimuli consisted of line drawings of robots that differed along four possible binary features: antennae (straight or curly), ears (triangular or rectangular), eyes (triangular or circular), and base (white square or black oval). A total of 10 different stimuli were formed from each prototype following the feature values in Table 1. Dimensions were randomly assigned to features and counterbalanced across participants.
Table 1 “5/4” and “5/5” category structure.
Note: Feature definitions for the 5/4 and 5/5 structures. The 5/4 structure differs from the 5/5 structure in that stimulus B5, indicated by the surrounding box, is not included in the 5/4 structure. A0 and B0 are the prototypes of categories A and B, respectively. High-similarity exemplar pairs (three overlapping features) are A1 (1110) and B1 (1100); A1 (1110) and B2 (0110); A2 (1010) and A1 (1110); and A2 (1010) and A3 (1011).
Procedure
On each trial, one of the 10 possible stimuli was randomly selected and presented at the center of the computer display. The screen was set to a resolution of 1024 × 768 and a size of 6 × 8 cm. Participants were asked to categorize the stimulus by pressing the F (for category A) or J (for category B) key on the keyboard. Each stimulus was presented until the participant made a response, at which time the stimulus was replaced by informative feedback (the words “Right” or “Wrong” in Chinese, 对 and 错, respectively). Maximum stimulus presentation time was 5 s; if a participant did not respond within this time, the trial was terminated and no response was collected. Feedback was displayed for 1 s, followed by a fixation cross during a 500 ms intertrial interval. The experiment was divided into blocks of 10 trials within which each stimulus was presented once in random order. Participants continued performing the task until they reached a criterion of accuracy at or above 90% on three consecutive blocks, or completed a total of 70 blocks without reaching criterion.
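The stopping rule described above can be sketched as follows. This is a minimal illustration, not the authors' experiment code: the function name `blocks_to_criterion` and its parameters are hypothetical, and it assumes that the three criterion blocks themselves count toward the reported number of blocks.

```python
from typing import List, Optional


def blocks_to_criterion(block_accuracy: List[float],
                        threshold: float = 0.90,
                        run_length: int = 3,
                        max_blocks: int = 70) -> Optional[int]:
    """Return the 1-indexed block at which criterion is met, or None.

    Criterion: accuracy at or above `threshold` on `run_length`
    consecutive blocks, within at most `max_blocks` blocks.
    Assumption: the criterion blocks are included in the count.
    """
    consecutive = 0
    for block_number, accuracy in enumerate(block_accuracy[:max_blocks], start=1):
        if accuracy >= threshold:
            consecutive += 1
        else:
            consecutive = 0  # a sub-threshold block resets the run
        if consecutive == run_length:
            return block_number
    return None  # participant completed the blocks without reaching criterion
```

With 10 trials per block, a block accuracy of 0.9 corresponds to at most one error in that block.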
Computational Modelling
We applied the two computational models described in the Introduction, the MPM and GCM, to participants’ data.
The GCM
The exemplar-based GCM represents categories by storing exemplars in memory. It classifies items by first calculating the weighted distance between to-be-classified item i and an exemplar stored in memory, j, by
$$d_{ij}=C\left[\sum_{k=1}^{N}w_{k}\left|x_{ik}-x_{jk}\right|\right] \tag{1}$$
where \(x_{ik}\) is the value (0 or 1) of item i on dimension k, and N is the number of dimensions. Differences on each dimension are weighted by a dimensional decision weight (attention weight), \(w_{k}\) (0 ≤ \(w_{k}\) ≤ 1, Σ\(w_{k}\) = 1). C is an overall sensitivity parameter that determines the rate at which similarity declines with distance (see Medin & Schaffer, 1978, and Nosofsky, 1984, for more extensive discussion). C was initialized within the range 0–20 to meet the requirements of the genetic algorithm modeling function and was subsequently optimized during the modeling process. Distance is transformed by the following exponential decay function:
$$\eta_{ij}=e^{-d_{ij}} \tag{2}$$
Equation 2 gives the similarity between item i and exemplar j (Shepard, 1987). The summed similarity of item i to the exemplars of category A is \(\sum_{j\in A}\eta_{ij}\), and its summed similarity to the exemplars of category B is \(\sum_{j\in B}\eta_{ij}\). The probability of a category A response is then given by
$$P\left(R_{A}\mid S_{i}\right)=\frac{\sum_{j\in A}\eta_{ij}}{\sum_{j\in A}\eta_{ij}+\sum_{j\in B}\eta_{ij}} \tag{3}$$
which states that the probability of a Category A response R given item i is the ratio of the summed similarity between that item and all stored exemplars j in Category A to the summed similarity between that item and all stored exemplars in Categories A and B. In the full GCM, the summed similarities are raised to a power given by a gamma parameter that specifies the level of deterministic responding. Following Nosofsky & Zaki (2002, p. 926) and Rehder & Hoffman (2005, p. 829), the value of the gamma parameter is set to 1, so it does not appear explicitly in Eq. 3. Although a gamma parameter may also be added to the MPM choice rule (Eq. 6), its value cannot be estimated independently of the sensitivity parameter. In fact, versions of the MPM with and without the gamma parameter are mathematically identical (see Nosofsky & Zaki, 2002, p. 926, for a complete proof).
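Equations 1–3 can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' MATLAB code: the function names are hypothetical, gamma is fixed at 1, the equal attention weights and the sensitivity value are assumptions chosen for the example, and the exemplar set is limited to the five stimuli whose feature values are listed in the note to Table 1.

```python
import math
from typing import Sequence


def weighted_distance(item: Sequence[int], exemplar: Sequence[int],
                      weights: Sequence[float], c: float) -> float:
    """Eq. 1: sensitivity-scaled, attention-weighted city-block distance."""
    return c * sum(w * abs(xi - xj) for w, xi, xj in zip(weights, item, exemplar))


def similarity(distance: float) -> float:
    """Eq. 2: similarity decays exponentially with distance."""
    return math.exp(-distance)


def gcm_prob_a(item, exemplars_a, exemplars_b, weights, c) -> float:
    """Eq. 3 (gamma = 1): summed similarity to A over summed similarity to A and B."""
    sim_a = sum(similarity(weighted_distance(item, e, weights, c)) for e in exemplars_a)
    sim_b = sum(similarity(weighted_distance(item, e, weights, c)) for e in exemplars_b)
    return sim_a / (sim_a + sim_b)


# Illustrative subset only: exemplars named in the Table 1 note.
exemplars_a = [(1, 1, 1, 0), (1, 0, 1, 0), (1, 0, 1, 1)]  # A1, A2, A3
exemplars_b = [(1, 1, 0, 0), (0, 1, 1, 0)]                # B1, B2
weights = (0.25, 0.25, 0.25, 0.25)  # assumption: equal attention weights
c = 4.0                             # assumption: arbitrary sensitivity value

# Probability of a category A response to stimulus A2 (1010).
p_a2 = gcm_prob_a((1, 0, 1, 0), exemplars_a, exemplars_b, weights, c)
```

When C is 0, every distance is 0 and the rule reduces to the proportion of stored exemplars that belong to category A; larger values of C make the response depend increasingly on the nearest stored exemplars.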
The MPM
The MPM is similar to the GCM in that items are classified via distances in a multidimensional space. However, distances in this model are calculated between to-be-classified items and a prototype rather than with previously classified exemplars,
$$d_{ip}=C\left[\sum_{k=1}^{N}w_{k}\left|x_{ik}-P_{k}\right|\right] \tag{4}$$
where \(P_{k}\) is the value of the category prototype on dimension k. Otherwise, the distance calculation is the same as in Eq. 1. As with the GCM’s Eq. 2, distance is transformed into similarity by the exponential decay function
$$\eta_{ip}=e^{-d_{ip}} \tag{5}$$
where the sensitivity parameter C, which enters through the distance in Eq. 4, represents the participant’s sensitivity to similarity (Nosofsky & Zaki, 2002). Similarity to the Category A and Category B prototypes is then entered into the choice equation given by
$$P\left(R_{A}\mid S_{i}\right)=\frac{\eta_{ip_{A}}}{\eta_{ip_{A}}+\eta_{ip_{B}}} \tag{6}$$
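Equations 4–6 can be sketched in the same way. Again this is a hedged illustration rather than the authors' implementation: the function names are hypothetical, and the prototype values 1111 (A0) and 0000 (B0), the equal weights, and the sensitivity value are assumptions, since the feature values in Table 1 are not reproduced here.

```python
import math
from typing import Sequence


def prototype_distance(item: Sequence[int], prototype: Sequence[int],
                       weights: Sequence[float], c: float) -> float:
    """Eq. 4: sensitivity-scaled, attention-weighted distance to a prototype."""
    return c * sum(w * abs(x - p) for w, x, p in zip(weights, item, prototype))


def mpm_prob_a(item, proto_a, proto_b, weights, c) -> float:
    """Eqs. 5 and 6: exponential similarity to each prototype, then the choice rule."""
    sim_a = math.exp(-prototype_distance(item, proto_a, weights, c))
    sim_b = math.exp(-prototype_distance(item, proto_b, weights, c))
    return sim_a / (sim_a + sim_b)


proto_a = (1, 1, 1, 1)  # assumption: A0 prototype
proto_b = (0, 0, 0, 0)  # assumption: B0 prototype
weights = (0.25, 0.25, 0.25, 0.25)  # assumption: equal attention weights
c = 4.0                             # assumption: arbitrary sensitivity value

p_a1 = mpm_prob_a((1, 1, 1, 0), proto_a, proto_b, weights, c)  # stimulus A1
p_a2 = mpm_prob_a((1, 0, 1, 0), proto_a, proto_b, weights, c)  # stimulus A2
```

Under these assumed prototypes and equal weights, A1 (1110) shares three features with the A prototype and is assigned to category A with probability above 0.5, whereas A2 (1010) is equidistant from both prototypes and yields exactly 0.5.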
We used a genetic algorithm to fit the data, with 3000 iterations for parameter estimation and the sum of squared deviations (SSD) as the index of fit; the lower the SSD, the better the fit. One limitation of this modelling approach is that it uses overall performance across the entire experiment to fit the data, which ignores any changes in representation that might occur over time. To better characterize representation use over time, we adopted a stage modelling approach, in which we took each set of 10 blocks of learning as a stage and calculated the degree of fit for each stage independently. This allowed us to identify changes in the use of prototype and exemplar representations across the whole learning process (Ashby, 2019). Model fitting was performed using custom code programmed in MATLAB (R2009a). Code for each of these models is available in the supplementary materials.
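The fit index and the stage-wise grouping can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' MATLAB code: it omits the genetic algorithm search itself, all function names are hypothetical, and the observed and predicted response proportions in the example are made-up numbers.

```python
from typing import Dict, List, Sequence


def ssd(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of squared deviations between observed and predicted
    category A response proportions (lower means a better fit)."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def split_into_stages(blocks: List, stage_size: int = 10) -> List[List]:
    """Group consecutive blocks into stages of `stage_size` blocks each.
    The last stage may be shorter if the block count is not a multiple."""
    return [blocks[i:i + stage_size] for i in range(0, len(blocks), stage_size)]


def best_model(observed: Sequence[float],
               predictions: Dict[str, Sequence[float]]) -> str:
    """Return the name of the model whose predictions give the lowest SSD."""
    return min(predictions, key=lambda name: ssd(observed, predictions[name]))


# Hypothetical numbers for one participant in one stage (three stimuli).
observed = [0.9, 0.6, 0.2]
predictions = {
    "MPM": [0.85, 0.65, 0.25],
    "GCM": [0.70, 0.70, 0.40],
}
winner = best_model(observed, predictions)
stages = split_into_stages(list(range(1, 26)))  # 25 blocks -> 3 stages
```

Classifying each participant separately within each stage in this way is what allows the proportion of participants best fit by each model to be tracked across learning.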
Results
Learning to Criterion
JASP 0.10.2 statistical software was used for data analysis. Mean accuracy was calculated for each participant for each block, and the number of participants who reached criterion and the mean number of blocks to criterion were then calculated. A total of 29 of the 44 participants met the learning criterion before block 70. Across all participants, the mean number of blocks to reach criterion was 27.35 (SD = 15.61; 10 trials per block). Because there was a large amount of variability between participants, we divided participants into “good” and “poor” learners based on blocks to criterion, with those reaching criterion in 27 or fewer blocks categorized as “good” learners (n = 18) and those requiring more than 27 blocks as “poor” learners (n = 26). This allowed us to assess whether patterns in representation use were characteristic of all participants or primarily of good or poor learners alone.
Representations: Diagnostic Stimulus Classification evidence
As discussed in the Introduction, responses to stimuli A1 and A2 can be used to identify the representations used by participants. A1 is a good match for category A based on prototype similarity, but a poor match based on exemplar similarity. Conversely, A2 is a good match for category A based on exemplar similarity but a poor match based on prototype similarity. Therefore, if accuracy for A1 is higher than for A2, participants are likely using a prototype representation, whereas if accuracy for A2 is higher than for A1, they are likely using an exemplar representation. A1 and A2 accuracy rates (calculated as the proportion of category A responses) are shown in Table 2.
Table 2
Categorization of diagnostic stimuli
| | Whole | First half | Second half | Good learner (N = 18) | Poor learner (N = 26) |
|---|---|---|---|---|---|
| A1 | 0.71 (0.17) | 0.64 (0.20) | 0.85 (0.18) | 0.77 (0.16) | 0.67 (0.17) |
| A2 | 0.70 (0.15) | 0.66 (0.18) | 0.77 (0.24) | 0.72 (0.16) | 0.68 (0.14) |
Note: Proportion of classification of stimuli A1 and A2 as members of Category A. A1 stimuli are diagnostic of a prototype representation, A2 of an exemplar representation. Standard deviations are indicated in parentheses.
Overall, both A1 and A2 tended to be categorized as category A, with no significant difference between A1 and A2 accuracy across the whole experiment, t(43) = 0.73, p = 0.469, Cohen’s d = 0.11. To examine the evolution of representation over time, we divided the experiment into halves. Because participants varied in the number of blocks to criterion, the two halves were determined for each participant individually. For example, for a participant who reached criterion in 20 blocks, blocks 1–10 were defined as the first half and blocks 11–20 as the second half, whereas for a participant who reached criterion in 60 blocks, the first half included blocks 1–30 and the second half blocks 31–60. A 2 (stimulus: A1 vs. A2) × 2 (half: first vs. second) repeated measures ANOVA found no significant main effect of stimulus, F(1,43) = 1.16, p = 0.288, η2 = 0.03, but a significant main effect of half, F(1,43) = 33.25, p < 0.001, η2 = 0.44, such that accuracy increased in the second half. The interaction was also significant, F(1,43) = 4.14, p = 0.048, η2 = 0.09, and appeared to be driven by a difference between A1 and A2 that emerged in the second half. A simple effects test revealed a significant difference between A1 and A2 in the second half, F(1,43) = 4.08, p = 0.050, η2 = 0.09, but not in the first half, F(1,43) = 0.21, p = 0.649, η2 = 0.05. This result suggests that participants shifted toward reliance on prototype representations as they learned.
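The participant-specific definition of halves can be sketched as follows. The function name is hypothetical, and the handling of an odd number of blocks (the extra block is placed in the second half) is an assumption, since the text only gives examples with even block counts.

```python
from typing import List, Tuple


def split_halves(n_blocks: int) -> Tuple[List[int], List[int]]:
    """Split a participant's completed blocks (1-indexed) into two halves.

    Assumption: with an odd number of blocks, the middle block is
    assigned to the second half.
    """
    midpoint = n_blocks // 2
    first_half = list(range(1, midpoint + 1))
    second_half = list(range(midpoint + 1, n_blocks + 1))
    return first_half, second_half


# A participant who reached criterion in 20 blocks:
first, second = split_halves(20)  # blocks 1-10 and blocks 11-20
```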
We also compared good and poor learners. A 2 (stimulus: A1 vs. A2) × 2 (learner: good vs. poor) repeated measures ANOVA found a significant interaction, F(1,42) = 6.96, p = 0.012, η2 = 0.14. Simple effects tests showed that the interaction was mainly due to good learners showing a significant difference between A1 and A2, F(1,42) = 4.84, p = 0.033, η2 = 0.10, whereas the difference was not significant for poor learners, F(1,42) = 2.19, p = 0.146, η2 = 0.05. This result shows that good learners were especially likely to use prototype representations.
Representation: MPM and GCM Model fitting evidence
Each participant was categorized as primarily MPM or primarily GCM based on the fit of each model across all blocks that they completed (see above for model fitting details). Participants whose behavior was best fit by the MPM (n = 18) reached criterion in 24 blocks on average, whereas participants whose behavior was best fit by the GCM (n = 26) reached criterion in 34 blocks on average, a marginally significant difference, t(33) = 1.89, p = 0.068, Cohen’s d = 0.65. The learning curves for MPM and GCM participants are shown in Fig. 2; as can be seen, the difference between strategies emerges around block 11. We further examined model fits in the good and poor learner groups separately (not illustrated). A higher proportion of the MPM group were good learners (61%), and a higher proportion of the GCM group were poor learners (73%). A chi-square test revealed that good learners were significantly more likely to be categorized as MPM users, whereas poor learners were significantly more likely to be categorized as GCM users, χ2 = 5.14, df = 1, p = 0.023.
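As a worked check of the chi-square statistic, the sketch below computes an uncorrected Pearson chi-square for a 2 × 2 table. The cell counts are not reported directly in the text; they are an assumption reconstructed from the reported group sizes and percentages (11 of 18 MPM participants being good learners is about 61%, and 19 of 26 GCM participants being poor learners is about 73%). The function name is hypothetical.

```python
from typing import List


def pearson_chi_square(table: List[List[int]]) -> float:
    """Pearson chi-square statistic (no continuity correction)
    for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Assumed counts reconstructed from the reported percentages.
# Rows: MPM group, GCM group. Columns: good learners, poor learners.
counts = [[11, 7],
          [7, 19]]
chi2 = pearson_chi_square(counts)
print(round(chi2, 2))  # 5.14
```

With these assumed counts the statistic agrees with the reported value for df = 1.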
We further examined the development of representation use over time by fitting the model to each participant across groups of 10 blocks (100 trials). As shown in Table 3 and Fig. 3, the proportion of participants fit by the MPM model increased across blocks. There was also a tendency for the initial blocks to be best fit by the MPM model.
Table 3
Stage fitness of MPM and GCM across blocks.
| | Blocks | 1~10 | 11~20 | 21~30 | 31~40 | 41~50 | 51~60 | 61~70 |
|---|---|---|---|---|---|---|---|---|
| | N | 44 | 40 | 30 | 25 | 18 | 15 | 13 |
| MPM | SSD | 0.217 | 0.256 | 0.253 | 0.329 | 0.222 | 0.274 | 0.133 |
| | Fitness | .73 | .45 | .47 | .28 | .66 | .80 | .85 |
| GCM | SSD | 0.237 | 0.241 | 0.240 | 0.299 | 0.282 | 0.323 | 0.234 |
| | Fitness | .27 | .55 | .53 | .72 | .33 | .20 | .15 |
Note: Table indicates for each grouping of 10 blocks (100 trials each) the proportion of participants best fit by each model and the mean degree of fit measured by SSD. For this analysis participants were categorized as MPM or GCM on each set of blocks separately, rather than categorized based on their overall fit across the whole experiment. SSD: Sum of squared deviations, the measure of model fit. Fitness: proportion of participants best fit by the model in the particular group of 10 blocks.
Dimensional weighting
Table 4
Fitness and dimension weighting for MPM and GCM participants
| Model | | Fitness | D1 | D2 | D3 | D4 | C |
|---|---|---|---|---|---|---|---|
| MPM | 1st half | 0.535 | 0.311 | 0.055 | 0.331 | 0.303 | 0.230 |
| | 2nd half | 0.973 | 0.201 | 0.079 | 0.532 | 0.188 | 0.395 |
| | Total | 0.103 | 0.233 | 0.103 | 0.521 | 0.144 | 1.242 |
| GCM | 1st half | 1.430 | 0.223 | 0.088 | 0.477 | 0.212 | 0.477 |
| | 2nd half | 0.978 | 0.394 | 0.098 | 0.351 | 0.158 | 2.228 |
| | Total | 0.106 | 0.226 | 0.158 | 0.403 | 0.214 | 4.125 |
Note: Halves were defined for each participant individually based on their blocks to criterion. N=44. D1-D4: Dimension 1 – Dimension 4 weights, respectively. C: sensitivity parameter
We examined the weighting of each dimension for the MPM and GCM participants across the two halves of the experiment, shown in Table 4. Because participants varied in the number of blocks to criterion, the two halves were determined for each participant individually. For example, for a participant who reached criterion in 20 blocks, blocks 1–10 were defined as the first half and blocks 11–20 as the second half, whereas for a participant who reached criterion in 60 blocks, the first half included blocks 1–30 and the second half blocks 31–60. It is helpful to compare the weightings made by the participants to the diagnostic values of the dimensions for an ideal observer, which are D1 = 0.7, D2 = 0.6, D3 = 0.8, and D4 = 0.6. Thus, a participant whose behavior matches the diagnostic value of the dimensions should show the following pattern of weighting: D3 > D1 > D2 = D4.
To compare the overall fits in the MPM and GCM groups, a 2 (group: MPM vs. GCM) × 4 (dimension: D1–D4) ANOVA was performed. The results revealed a significant main effect of dimension, F(3,129) = 14.274, p < 0.001, η2 = 0.249, no main effect of model group, F(3,129) = 0.140, p = 0.710, η2 = 0.003, and a significant interaction, F(3,129) = 13.864, p < 0.001, η2 = 0.244. Post hoc tests showed differential weighting of the dimensions within each model group. Participants in the GCM group showed the pattern D3 > D1 > D4 = D2 (p values for the pairwise comparisons between adjacent dimensions: p < 0.001, p = 0.026, p = 0.163). In contrast, participants in the MPM group showed the pattern D3 > D1 = D4 = D2 (p = 0.031, p = 0.832, p = 0.277). To summarize, both groups weighted D3 most heavily, which correctly reflects its importance. However, the MPM group did not make further distinctions among the remaining dimensions overall, whereas the GCM group did.
To examine how the weighting of dimensions changed across learning in each model group, separate 2 (experiment half: first vs. second) × 4 (dimension: D1–D4) ANOVAs were performed. For the MPM group, the main effect of dimension was significant, F(3,129) = 12.86, p < 0.001, η2 = 0.23, the main effect of experiment half was not significant, F(3,129) = 1.31, p = 0.259, η2 = 0.03, and the interaction was significant, F(3,129) = 5.67, p = 0.002, η2 = 0.12. Post hoc simple effects tests showed that, for the first half, D3 = D1 = D4 > D2 (p = 0.831, p = 0.936, p < 0.001), and for the second half, D3 > D1, D4, and D2 (p < 0.001 for each comparison) and D1 > D2 (p = 0.025). Overall, this pattern of results indicates that MPM participants began to weight the more critical dimensions over the less critical dimensions early in training, and developed a strong weighting for the most critical dimension, D3, as they neared criterion performance.
For the GCM group, the main effect of dimension was significant, F(3,129) = 10.54, p < 0.001, η2 = 0.20, the main effect of half was not significant, F(3,129) = 0.12, p = 0.660, η2 = 0.05, and the interaction was significant, F(3,129) = 3.60, p = 0.019, η2 = 0.08. Post hoc simple effects tests showed that, for the first half, D3 > D1 = D4 = D2 (p = 0.03, p = 0.865, p = 0.055), and for the second half, D1 = D3 > D4 = D2 (p = 0.663, p = 0.024, p = 0.352). This pattern of results indicates that GCM participants correctly weighted D3 most strongly early in training, but spread their attention across dimensions as they approached criterion. A comparison of the MPM and GCM groups indicates that, as they approached criterion, the MPM participants became more focused on the most important dimension, whereas the GCM participants became less focused.
Discussion
Overall, we found that participants who learned the 5/5 category using the robot instantiation benefitted from using a prototype representation. Participants whose performance was best fit by the MPM performed better than those whose performance was fit by GCM. Good learners were more likely to use prototype representation (as evidenced by A1-A2 stimulus classification and by computational modeling). All participants showed a shift towards prototype representation across the time course of learning. Finally, when we examined the weighting of individual dimensions we found that participants fit by the MPM weighted the most diagnostic feature D3 more strongly in the second half than did participants fit by GCM.