Surface features and feedback type affect formation of prototype or exemplar representations in the 5/5 category learning task

doi:10.21203/rs.3.rs-2368221/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Theories of category learning have typically focused on how the underlying category structure affects the category representations acquired by learners. However, there is limited research as to how other factors such as how the category structure is instantiated as stimulus features and how feedback is structured affect what representations are learned and utilized. Across three experiments we varied the surface appearance and type of feedback while holding category structure consistent. We used a novel “5/5” categorization task developed from the well-studied 5/4 task with the addition of one more stimulus to clarify an ambiguity in the 5/4 prototypes. We used multiple methods including computational modeling to identify whether participants categorized on the basis of exemplar or prototype representations. We found substantial differences when the same structure was instantiated as schematic robot-like stimuli and richer bee-like images, in that the former was characterized by use of prototypes and the latter by use of exemplars. We also compared standard correct/incorrect feedback with point-valued feedback for the bee stimulus set and found exemplar use in the former but greater prototype use in the latter. These results indicated that in addition to the underlying structure of categories, the appearance of the stimuli, and form of feedback may affect the strategies utilized and resulting representations during category learning.

Category learning

5/5 category structure

prototype and exemplar representation

computational model

Categorization is a critical human capacity that allows us to learn about the underlying structure of the environment in order to support appropriate behavior. Researchers have studied a wide variety of category structures, including categories defined on the basis of univariate rules, multiple discrete-valued features, and/or continuously varying features. We focus on multidimensional categories with discrete features, a type of task structure characteristic of many categorization studies including the classic Shepard et al. (1961) tasks, family resemblance tasks (Medin et al., 1987), and the 5/4 task (Medin & Schaffer, 1978).

The dominant theories of how participants learn discrete multidimensional categories are prototype and exemplar theories. According to prototype theories (Minda & Smith, 2001, 2002; Rosch & Mervis, 1975), categories are represented by their central tendency, the prototype, which is abstracted from specific exemplars and embodies all characteristic features. When a new sample is encountered, people categorize it into the category whose prototype is the most similar. According to exemplar theories (Kruschke, 1992; Medin & Schaffer, 1978; Nosofsky, 1984, 1992; Nosofsky et al., 2018), category learning is based on memorized representations of individual exemplars. When a new sample is encountered, people categorize it into the category that has the most similar previously studied exemplars.

In our study we used a new discrete multidimensional category learning task, the 5/5 task, which is a variant of the highly studied 5/4 task (Braunlich & Love, 2019; Johansen & Palmeri, 2002; Medin & Schaffer, 1978; Minda & Smith, 2002; Rehder & Hoffman, 2005; Smith & Minda, 2000; Zaki et al., 2003). Our goal was to test how two manipulations of task characteristics other than the underlying category structure affect whether participants rely on prototype or exemplar representations. These manipulations were surface instantiation of the categories (as schematic or richly detailed stimuli) and type of feedback (as correct/incorrect or point-valued). We used multiple methods to assess for each participant whether performance was characterized by exemplar or prototype representations, including responses on critical diagnostic stimuli, and computational modeling. We also examined how representation use changed across the time course of learning.

The 5/4 and 5/5 category learning tasks.

One of the most commonly used discrete multidimensional category learning tasks is the 5/4 task (Medin & Schaffer, 1978). In this task participants learn two categories, A and B, through trial and error. The categories vary along four binary-valued dimensions, as illustrated in Table 1. Only 9 total stimuli, 5 from A and 4 from B, are included in the training set to allow for tight control of prototype and exemplar similarity.

One way the 5/4 task is used to distinguish between prototype and exemplar representations is through examination of responses to two diagnostic stimuli, A1 and A2, which prototype and exemplar theories categorize differently. According to prototype theory, the prototype for Category A is A0(1111), because 1 is the most common value on each dimension in Category A, and the prototype for Category B is B0(0000), because 0 is the most common value on each dimension in Category B. Thus, because A1 (1110) shares three features with the A prototype but only one with the B prototype, the prototype model predicts that A1 will be classified as a member of Category A more often than will A2 (1010), which shares two features with both prototypes. In other words, the prototype theory predicts an A1 advantage over A2 in classification performance. Exemplar theory, in contrast, predicts an A2 advantage, because whereas A2 shares three features with two Category A members (A1 and A3) and two or fewer with any Category B member, A1 shares three features with only one other Category A member (A2) and two Category B members (B1 and B2). The use of A1 vs A2 categorization to identify category learning strategy is supported by a large number of previous studies (Rehder & Hoffman, 2005; Smith & Minda, 2000).
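The feature counts behind the A1 and A2 predictions can be checked with a short sketch. The codes for A1, A2, A3, B1, B2 and the two prototypes are those quoted in the text; the codes for A4, A5, B3 and B4 are an assumption taken from the standard Medin and Schaffer (1978) 5/4 structure, because Table 1 is not reproduced here. The helper names are hypothetical.

```python
# 5/4 training set. A1-A3, B1, B2 and the prototypes are quoted in the text;
# A4, A5, B3, B4 are ASSUMED from the standard Medin & Schaffer (1978) structure.
CATEGORY_A = {"A1": "1110", "A2": "1010", "A3": "1011", "A4": "1101", "A5": "0111"}
CATEGORY_B = {"B1": "1100", "B2": "0110", "B3": "0001", "B4": "0000"}
PROTOTYPE_A, PROTOTYPE_B = "1111", "0000"


def shared_features(x, y):
    """Number of dimensions on which two stimulus codes have the same value."""
    return sum(a == b for a, b in zip(x, y))


def close_neighbours(name):
    """Other training items that share exactly three features with `name`."""
    training = {**CATEGORY_A, **CATEGORY_B}
    target = training[name]
    return sorted(
        other
        for other, code in training.items()
        if other != name and shared_features(target, code) == 3
    )


for name in ("A1", "A2"):
    code = CATEGORY_A[name]
    print(
        name,
        "shared with A0:", shared_features(code, PROTOTYPE_A),
        "shared with B0:", shared_features(code, PROTOTYPE_B),
        "three-feature neighbours:", close_neighbours(name),
    )
```

Under these assumptions A1 is closer to the A prototype than A2 is, while A2 has only Category A items among its three-feature neighbours, which is the dissociation the two theories exploit.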

However, the 5/4 category structure includes an ambiguity for category B in dimension 2: across the four B stimuli, the 0 and 1 values occur equally often. Most research assumes that the prototype for category B is B0 (0000), but based solely on the B category stimuli it could equally well be (0100). This difference does not affect the validity of the use of A1 or A2 to identify strategy, but it does result in a very weak diagnostic weight for dimension 2. Computational modeling studies using the 5/4 task showed that participants' attention weight for dimension 2 was negligible (for example, the weight was 0.049 in Liu et al., 2012b, on a scale ranging from 0.0 to 1.0). We eliminated this ambiguity by adding a fifth stimulus, exemplar B5 (1001), to category B, as shown in Table 1, and refer to this version as the 5/5 task. The addition of B5 fixes the prototype of category B as B0 (0000) and increases the diagnostic value of dimension 2, but does not affect the use of stimuli A1 and A2 for strategy determination.
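The dimension 2 ambiguity, and its removal by B5, can be illustrated by taking the most common value on each dimension across the Category B items. This is a minimal sketch: B1, B2 and B5 are quoted in the text, whereas the codes for B3 and B4 are assumed from the standard 5/4 structure, and `modal_values` is a hypothetical helper.

```python
# Category B items. B1, B2 and B5 are quoted in the text;
# B3 (0001) and B4 (0000) are ASSUMED from the standard 5/4 structure.
B_ITEMS_54 = ["1100", "0110", "0001", "0000"]
B5 = "1001"


def modal_values(items):
    """Most common value on each dimension; None marks a tie."""
    result = []
    for values in zip(*items):  # one tuple of characters per dimension
        ones = values.count("1")
        zeros = len(values) - ones
        if ones > zeros:
            result.append("1")
        elif zeros > ones:
            result.append("0")
        else:
            result.append(None)
    return result


print("5/4 Category B:", modal_values(B_ITEMS_54))
print("5/5 Category B:", modal_values(B_ITEMS_54 + [B5]))
```

With four B items the second dimension is tied; with B5 added, 0 is the majority value on every dimension, so the prototype is 0000.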

Factors Influencing Reliance On Prototype And Exemplar Representations

Most previous research aimed to find evidence for either prototype or exemplar strategies under the assumption that a single unitary category learning function underlies all category learning tasks. Recent research has found evidence for multiple category learning strategies that can result in qualitatively different representations (Ashby et al., 1998; Ashby & Maddox, 2005; Seger & Miller, 2010). Within the area of discrete multidimensional category learning, researchers have presented evidence that performance can be fit by both prototype and exemplar models (Mack et al., 2013), and that different neural regions may underlie learning of each (Bowman et al., 2020; Xing et al., 2018).

A complete account of the factors determining when and whether prototype or exemplar representations are formed during learning of discrete multidimensional categories has not yet been established, but there are some intriguing hints. Most prominently, people have examined different types of category structures. Bowman and Zeithamova (2020) manipulated category coherence (closeness of stimuli in the training set to the prototype) and found that more coherent categories were more likely to lead to prototype representations and less coherent to exemplar representations.

Another factor that may influence the type of representation acquired is the surface appearance of the category, referred to as the category instantiation. The underlying structure of a discrete multidimensional category is defined abstractly in terms of the number of dimensions manipulated and the number of discrete values within each dimension, but it does not dictate how the categories are presented in an experiment. Markman and Maddox (2003) found that participants have greatly increased difficulty in learning when the same abstract structure is instantiated with multiple valued features (for example, value 1 on dimension 1 could be instantiated as one of several shades of gray rather than a single shade). Over time researchers have used a wide variety of instantiations that vary in abstractness and realism, but these have rarely been compared and potential biasing by choice of surface instantiation has rarely been acknowledged. Blair and Homa (2003) tested two surface instantiations of a 5/4 category (geometric forms and schematic bugs) and found a primarily memorization strategy for both; however, the criterion for considering that participants learned other than by memorization was strict: participants had to show greater accuracy than could be predicted by memorization alone. We hypothesized that the salience of dimensions would affect the likelihood of ultimately developing exemplar or prototype representations. More specifically, we propose that surface instantiations that were more schematic and in which individual features were prominent would be learned via prototype representations, but that as stimuli became more individualized and individual features less prominent exemplar representations would become more dominant. We discuss potential mechanisms that might underlie the effect of dimensional salience on representations in more detail in the Discussion.

Another factor affecting the type of representation is the type of feedback given to the participants (Xing et al., 2015). Participants who were told the number of different exemplars they would be learning were more likely to use exemplar representations when told the set was small (Liu et al., 2012a). Most previous studies have used simple informative feedback (correct/incorrect) that gave no guidance about the relative importance of stimuli or dimensions. However, using a different set of categorization tasks (rule based and information integration), Liu and colleagues (2020) found that type of feedback (correct/incorrect, gain only, loss only, or combination gain/loss) had different effects for different category learning tasks, with information integration benefiting from combined gain and loss feedback, and rule based benefiting from either gain or loss information in isolation. We hypothesized that increasing the value and informativeness of feedback would also affect prototype and exemplar learning in discrete multidimensional tasks; specifically, we predicted that point value feedback calibrated to the similarity of the stimulus to the prototype would highlight the underlying structure of the category and increase the likelihood that participants would learn to use a prototype representation.

Computational Modeling

Computational modeling has brought many exciting results to category learning research (Mack et al., 2016; Mack et al., 2020; Mok & Love, 2019; Rosedahl et al., 2018). We adapted two well-established computational models to identify category representations: the generalized context model (GCM; McKinley & Nosofsky, 1995; Nosofsky & Zaki, 2002) and the multiplicative prototype model (MPM; Minda & Smith, 2001, 2002). The models were also used to identify which dimensions participants weighted most in their decisions and to compare these weights across strategies.

Current Study

In the current study we compared different surface instantiations and feedback informativeness for category learning. Our overall hypothesis was that differences in surface features and feedback that highlight (or fail to highlight) dimensions would result in participants relying on different category representations. We designed three experiments to test this idea. Experiments 1 and 2 compared category learning of schematic stimuli in which dimensions were more salient (“robot” stimuli, Experiment 1) with richer and more naturalistic stimuli in which individual dimensions were less salient (“bee” stimuli, Experiment 2). Experiments 2 and 3 compared category learning with simple informative feedback (right/wrong; Experiment 2) with rewarded point value feedback scaled to prototype similarity (Experiment 3). All of the experiments used the 5/5 category structure. We hypothesized that conditions that emphasized the salience of dimensions would lead to use of prototype representations, whereas conditions with lower salience for dimensions would lead to exemplar representations. Thus, we predicted that in Experiment 1 participants would tend to learn prototypes, and those who used prototypes would be more likely to be successful at learning, whereas the opposite would be true in Experiment 2. We also predicted that point-based feedback would better highlight category structure than accuracy only feedback. Therefore, we predicted that participants in Experiment 3 would be more likely to use prototype-based representations than participants in Experiment 2.

In Experiment 1 we examined category learning in the 5/5 task for the first instantiation, a schematic “robot”, illustrated in Fig. 1. In the robot each feature is distinct and instantiated by a simple geometric form (e.g., triangles or rectangles for ears). Although features are physically connected, they do not form parts of a familiar figure. The images are black and white line drawings without texture, shading, or color, which also enhances the salience of individual features. We predicted that this instantiation would facilitate selective attention to dimensions and therefore we would see use of prototype representations.

Method

Participants

Participants included 44 undergraduate students from South China Normal University, 23 females and 21 males, aged between 18 and 23. None had previously participated in related category learning experiments. All had normal or corrected-to-normal vision. Participants were paid for their participation. According to G*Power calculations, the experiment required at least 30 participants to detect a medium effect size (1-β = 0.8; α = 0.05). This study was approved by the SCNU Institutional Review Board.

Materials

Stimuli were formed using the 5/5 category structure illustrated in Table 1 and Fig. 1. Stimuli consisted of line drawings of robots that differed along four possible binary features: antennae (straight or curly), ears (triangular or rectangular), eyes (triangular or circular), and base (white square or black oval). A total of 10 different stimuli were formed from each prototype following the feature values in Table 1. Dimensions were randomly assigned to features and counterbalanced across participants.

Table 1 “5/4” and “5/5” category structure.

Note: Feature definitions for the 5/4 and 5/5 structures. The 5/4 structure differed from the 5/5 structure in that stimulus B5, indicated by the surrounding box, is not included in the 5/4 structure. A0 and B0 are the prototypes of categories A and B, respectively. High similarity exemplar pairs (3 feature overlap) are stimuli A1(1110) and B1(1100); A1(1110) and B2(0110); A2(1010) and A1(1110); and A2(1010) and A3(1011).

Procedure

On each trial one of the stimuli from the set of 10 possible stimuli was randomly presented in the central position of the computer display screen. The screen was set to a resolution of 1024×768 and size of 6×8cm. Participants were asked to categorize the stimulus by pressing the F (for category A) or J (for category B) keys on the keyboard. Each stimulus was presented until the participant made a response at which time the stimulus was replaced by informative feedback (the words “Right” or “Wrong” in Chinese, 对 and 错, respectively). Maximum stimulus presentation time was 5 s; if a participant did not respond within this time the trial was terminated and no response collected. Feedback was displayed for 1 s followed by a fixation cross during a 500 ms intertrial interval. The experiment was divided into blocks of 10 trials within which each stimulus was presented once in random order. Participants continued performing the task until they reached a criterion of accuracy at or above 90% on three consecutive blocks, or completed a total of 70 blocks without reaching criterion.
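The stopping rule described above (accuracy at or above 90% on three consecutive blocks, or 70 blocks in total) can be sketched as follows. The function name and its arguments are hypothetical; the thresholds are those stated in the Procedure.

```python
def blocks_to_criterion(block_accuracy, threshold=0.9, run_length=3, max_blocks=70):
    """Return the block number at which the criterion is met, else None.

    block_accuracy: per-block proportion correct, in presentation order.
    The criterion is `run_length` consecutive blocks at or above `threshold`
    within the first `max_blocks` blocks.
    """
    run = 0
    for block_number, accuracy in enumerate(block_accuracy[:max_blocks], start=1):
        run = run + 1 if accuracy >= threshold else 0
        if run == run_length:
            return block_number
    return None


# Blocks 2-4 are all at or above 0.9, so the criterion is met at block 4.
print(blocks_to_criterion([0.5, 0.9, 1.0, 0.9, 0.8]))
```

A participant who never produces three qualifying blocks in a row within 70 blocks yields `None`, corresponding to completing the task without reaching criterion.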

Computational Modelling

We applied the two computational models described in the Introduction, the MPM and GCM, to participants’ data.

The GCM

The exemplar-based GCM represents categories by storing exemplars in memory. It classifies items by first calculating the weighted distance between to-be-classified item i and an exemplar stored in memory, j, by

$$d_{ij} = C\left[\sum_{k=1}^{N} w_{k}\left|x_{ik} - x_{jk}\right|\right] \tag{1}$$

where $x_{ik}$ is the value (0 or 1) of item i on dimension k. Differences along each dimension are weighted by a dimensional decision weight (attention weight), $w_{k}$ ($0 \le w_{k} \le 1$, $\sum_{k} w_{k} = 1$). C is an overall sensitivity parameter that determines the rate at which similarity declines with distance (see Medin & Schaffer, 1978, and Nosofsky, 1984, for more extensive discussion). C was initialized within the range 0–20 to meet the requirements of the genetic algorithm modeling function and was subsequently optimized during the modeling process. Distance is transformed into similarity by the following exponential decay function:

$$\eta_{ij} = e^{-d_{ij}} \tag{2}$$

Equation 2 produces a similarity between exemplars i and j (Shepard, 1987). The summed similarity of item i to the stored exemplars of category A is $\sum_{j\in A}\eta_{ij}$, and to the stored exemplars of category B is $\sum_{j\in B}\eta_{ij}$. These summed similarities enter the choice rule

$$P\left(R_{A} \mid S_{i}\right) = \frac{\sum_{j\in A}\eta_{ij}}{\sum_{j\in A}\eta_{ij} + \sum_{j\in B}\eta_{ij}} \tag{3}$$

which states that the probability of a Category A response R given item i is the ratio of the summed similarity between that item and all stored exemplars j in Category A to the summed similarity between that item and all stored exemplars in Categories A and B. In the full GCM the summed similarities are raised to a gamma parameter specifying the level of deterministic responding; here gamma is set to 1 (see Nosofsky & Zaki, 2002, p. 926, and Rehder & Hoffman, 2005, p. 829), so it does not appear in Eq. 3. Although a gamma parameter may also be added to the MPM choice rule (Eq. 6), its value cannot be estimated independently of the sensitivity parameter: versions of the MPM with and without the gamma parameter are mathematically identical (see Nosofsky & Zaki, 2002, p. 926, for a complete proof).
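The three GCM equations can be sketched in a few lines of Python. This is an illustration only, not the authors' MATLAB code: the codes for A4, A5, B3 and B4 are assumed from the standard 5/4 structure, the function name is hypothetical, gamma is fixed at 1, and the equal weights with C = 4 are arbitrary demonstration values rather than fitted parameters.

```python
import math

# 5/5 training set. A4, A5, B3, B4 are ASSUMED from the standard 5/4 structure.
CATEGORY_A = ["1110", "1010", "1011", "1101", "0111"]
CATEGORY_B = ["1100", "0110", "0001", "0000", "1001"]


def gcm_prob_a(item, weights, c, cat_a=CATEGORY_A, cat_b=CATEGORY_B):
    """P(respond A | item) under the GCM with gamma fixed at 1."""

    def similarity(x, y):
        # Eq. 1: weighted city-block distance scaled by the sensitivity C.
        distance = c * sum(
            w * abs(int(a) - int(b)) for w, a, b in zip(weights, x, y)
        )
        # Eq. 2: exponential decay of similarity with distance.
        return math.exp(-distance)

    # Eq. 3: summed similarity to A exemplars relative to all exemplars.
    sim_a = sum(similarity(item, exemplar) for exemplar in cat_a)
    sim_b = sum(similarity(item, exemplar) for exemplar in cat_b)
    return sim_a / (sim_a + sim_b)


# Illustrative (not fitted) parameters: equal attention weights, C = 4.
demo_weights, demo_c = [0.25, 0.25, 0.25, 0.25], 4.0
p_a1 = gcm_prob_a("1110", demo_weights, demo_c)
p_a2 = gcm_prob_a("1010", demo_weights, demo_c)
print("P(A | A1) =", round(p_a1, 3), " P(A | A2) =", round(p_a2, 3))
```

With these demonstration values the model assigns A2 a higher probability of a Category A response than A1, in line with the A2 advantage that exemplar theory predicts.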

The MPM

The MPM is similar to the GCM in that items are classified via distances in a multidimensional space. However, distances in this model are calculated between to-be-classified items and a prototype rather than with previously classified exemplars,

$$d_{ip} = C\left[\sum_{k=1}^{N} w_{k}\left|x_{ik} - P_{k}\right|\right] \tag{4}$$

where $P_{k}$ is the value of the category prototype on dimension k. Otherwise, the distance calculation is the same as in Eq. 1. As with the GCM’s Eq. 2, distance is transformed into similarity by the exponential decay function

$$\eta_{ip} = e^{-d_{ip}} \tag{5}$$

where C, which enters through the distance $d_{ip}$, represents the participant’s sensitivity to similarity (Nosofsky & Zaki, 2002). Similarity to the Category A and Category B prototypes is then entered into the choice equation given by

$$P\left(R_{A} \mid S_{i}\right) = \frac{\eta_{ipA}}{\eta_{ipA} + \eta_{ipB}} \tag{6}$$
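The prototype model can be sketched in the same way as the GCM. Again this is an illustration under stated assumptions rather than the authors' implementation: the function name is hypothetical, and the equal weights with C = 4 are arbitrary demonstration values, not fitted parameters.

```python
import math

PROTOTYPE_A, PROTOTYPE_B = "1111", "0000"


def mpm_prob_a(item, weights, c, proto_a=PROTOTYPE_A, proto_b=PROTOTYPE_B):
    """P(respond A | item) under the multiplicative prototype model."""

    def similarity(x, prototype):
        # Eq. 4: weighted distance between the item and a prototype.
        distance = c * sum(
            w * abs(int(a) - int(p)) for w, a, p in zip(weights, x, prototype)
        )
        # Eq. 5: exponential decay of similarity with distance.
        return math.exp(-distance)

    # Eq. 6: similarity to the A prototype relative to both prototypes.
    sim_a = similarity(item, proto_a)
    sim_b = similarity(item, proto_b)
    return sim_a / (sim_a + sim_b)


# Illustrative (not fitted) parameters: equal attention weights, C = 4.
demo_weights, demo_c = [0.25, 0.25, 0.25, 0.25], 4.0
p_a1 = mpm_prob_a("1110", demo_weights, demo_c)
p_a2 = mpm_prob_a("1010", demo_weights, demo_c)
print("P(A | A1) =", round(p_a1, 3), " P(A | A2) =", round(p_a2, 3))
```

With equal weights A2 is equidistant from both prototypes and is classified at chance, whereas A1 is classified as A well above chance, which is the A1 advantage that prototype theory predicts.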

We used a genetic algorithm to fit the data, with 3000 iterations for parameter estimation and the sum of squared deviations (SSD) as the index of fit: the lower the SSD, the better the fit. One limitation of this modeling approach is that it fits overall performance across the entire experiment, which ignores any changes in representation that might occur over time. To better characterize representation use over time, we adopted a stage modeling approach in which we treated each set of 10 blocks of learning as a stage and calculated the degree of fit for each stage independently. This allowed us to identify changes in the use of prototype and exemplar representations across the whole learning process (Ashby, 2019). Model fitting was performed using custom code written in MATLAB (R2009a). Code for each of these models is available in the supplementary materials.
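The fitting logic can be sketched as follows. This is not the authors' MATLAB genetic algorithm, whose settings are not given here: the search below is a much simpler mutate-and-select loop used as a stand-in, and all function names, step sizes and starting values are assumptions. It also assumes every block contains one response per stimulus (trials without a response are not handled), and the stimulus codes for A4, A5, B3 and B4 are assumed from the standard 5/4 structure.

```python
import math
import random

# 5/5 stimuli in a fixed order (A1-A5, then B1-B5); A4, A5, B3, B4 are ASSUMED.
STIMULI = ["1110", "1010", "1011", "1101", "0111",
           "1100", "0110", "0001", "0000", "1001"]


def mpm_predict(weights, c):
    """Predicted P(A) for every stimulus under the prototype model."""
    def sim(item, proto):
        d = c * sum(w * abs(int(x) - int(p)) for w, x, p in zip(weights, item, proto))
        return math.exp(-d)
    return [sim(s, "1111") / (sim(s, "1111") + sim(s, "0000")) for s in STIMULI]


def ssd(predicted, observed):
    """Sum of squared deviations between predicted and observed P(A)."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def stage_proportions(block_responses, blocks_per_stage=10):
    """Observed P(A) per stimulus for each stage of `blocks_per_stage` blocks.

    block_responses: one list per block with a 0/1 entry per stimulus
    (1 = Category A response), ordered as in STIMULI.
    """
    stages = []
    for start in range(0, len(block_responses), blocks_per_stage):
        chunk = block_responses[start:start + blocks_per_stage]
        stages.append([sum(column) / len(chunk) for column in zip(*chunk)])
    return stages


def fit(predict, observed, n_iter=3000, seed=0):
    """Mutate-and-select search over weights and C (stand-in for a GA)."""
    rng = random.Random(seed)
    weights, c = [0.25] * 4, 1.0            # assumed starting point
    best = ssd(predict(weights, c), observed)
    for _ in range(n_iter):
        raw = [max(1e-6, w + rng.gauss(0, 0.05)) for w in weights]
        total = sum(raw)
        cand_w = [r / total for r in raw]   # weights stay positive, sum to 1
        cand_c = min(20.0, max(0.0, c + rng.gauss(0, 0.5)))  # C kept in 0-20
        err = ssd(predict(cand_w, cand_c), observed)
        if err < best:                      # keep only improvements
            weights, c, best = cand_w, cand_c, err
    return weights, c, best
```

A stage-wise comparison would call `fit` once per stage and per model (a GCM prediction function with the same signature could be passed in place of `mpm_predict`) and label each stage by the model with the lower SSD.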

Results

Learning to Criterion

JASP 0.10.2 statistical software was used for data analysis. Mean accuracy was calculated for each participant for each block. We then calculated the number of participants who reached criterion and the mean number of learning blocks needed to reach it. A total of 29 of the 44 participants met the learning criterion earlier than block 70. Across all participants the mean number of blocks to reach criterion was 27.35 (SD = 15.61, 10 trials per block). Because there was a large amount of variability between participants, we divided participants into “good” and “poor” learners based on blocks to criterion, with those reaching criterion in 27 or fewer blocks categorized as “good” learners (n = 18) and those requiring more than 27 blocks as “poor” learners (n = 26). This allowed us to assess whether patterns in representation use were characteristic of all participants or primarily of good or poor learners alone.

Representations: Diagnostic Stimulus Classification evidence

As discussed in the Introduction, responses to stimuli A1 and A2 can be used to identify the representations used by participants. A1 is a good match for category A based on prototype similarity, but a poor match based on exemplar matching. Conversely, A2 is a good match for category A based on exemplar similarity but a poor match based on prototype similarity. Therefore, if the accuracy rate of A1 is higher than that of A2, it indicates that participants are using a prototype representation, whereas if A2 is higher than A1, it indicates an exemplar representation. A1 and A2 accuracy rates (calculated as the mean of category A response) are shown in Table 2.

Table 2

Categorization of diagnostic stimuli
	Whole	First half	Second half	Good learner (N = 18)	Poor learner (N = 26)
A1	0.71(0.17)	0.64(0.20)	0.85(0.18)	0.77(0.16)	0.67(0.17)
A2	0.70(0.15)	0.66(0.18)	0.77(0.24)	0.72(0.16)	0.68(0.14)

Note: Proportion of classification of stimuli A1 and A2 as members of Category A. A1 stimuli are diagnostic of a prototype representation, A2 of an exemplar representation. Standard deviations are indicated in parentheses.

Overall, both A1 and A2 tended to be categorized as category A, with no significant difference in A1 and A2 accuracy across the whole experiment, t(43) = 0.73, p = 0.469, Cohen’s d = 0.11. To examine the evolution of representation over time, we divided the experiment into halves. Because participants varied in the number of blocks to criterion, the two halves were determined for each participant individually. For example, for a participant who reached criterion in 20 blocks, blocks 1-10 were defined as the first half and blocks 11-20 as the second half, whereas for a participant who reached criterion in 60 blocks, the first half included blocks 1-30 and the second half blocks 31-60. A 2 (Stimulus: A1 vs A2) × 2 (Half: first vs second) repeated measures ANOVA found no significant main effect of stimulus, F(1,43) = 1.16, p = 0.288, η² = 0.03, but a main effect of half, F(1,43) = 33.25, p < 0.001, η² = 0.44, such that accuracy increased in the second half. The interaction was also significant, F(1,43) = 4.14, p = 0.048, η² = 0.09, which appeared to be driven by a difference between A1 and A2 that emerged in the second half. A simple effect test revealed a significant difference between A1 and A2 in the second half, F(1,43) = 4.08, p = 0.050, η² = 0.09, but not in the first half, F(1,43) = 0.21, p = 0.649, η² = 0.05. This result suggests that participants shifted to reliance on prototype representations as they learned.

We also compared good and poor learners. A 2 (Stimulus: A1 vs A2) × 2 (Learners: good vs poor) repeated measures ANOVA found a significant interaction, F(1,42) = 6.96, p = 0.012, η² = 0.14. Further simple effect tests showed that the interaction was mainly due to the good learners showing a significant difference between A1 and A2, F(1,42) = 4.84, p = 0.033, η² = 0.10, whereas the difference was not significant for the poor learners, F(1,42) = 2.19, p = 0.146, η² = 0.05. This result shows that good learners were especially likely to use prototype representations.

Representation: MPM and GCM Model fitting evidence

Each participant was categorized as primarily MPM or primarily GCM based on the fit of the model across all blocks that they completed (see above for model fitting details). The participants whose behavior was best fit by the MPM (n = 18) reached criterion in 24 blocks on average, while the participants whose behavior was best fit by the GCM (n = 26) reached criterion in 34 blocks on average, a marginally significant difference, t(33) = 1.89, p = 0.068, Cohen’s d = 0.65. The learning curves for MPM and GCM participants are shown in Fig. 2: as can be seen, the differences between strategies emerge around block 11. We further examined model fits in the good and poor learner groups separately (not illustrated). A higher proportion of the MPM group were good learners (61%), and a higher proportion of the GCM group were poor learners (73%). A chi-square test revealed that good learners were significantly more likely to be categorized as MPM users, while poor learners were significantly more likely to be categorized as GCM users, χ² = 5.14, df = 1, p = 0.023.
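The chi-square statistic can be reproduced from the reported group sizes and percentages. The cell counts below are a reconstruction, not values given in the text: 61% of the 18 MPM participants is taken as 11 good learners, and 73% of the 26 GCM participants as 19 poor learners. The sketch assumes a Pearson chi-square without continuity correction, and the function name is hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and p value for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: MPM, GCM. Columns: good learners, poor learners.
# Counts are RECONSTRUCTED from the reported percentages and group sizes.
counts = [[11, 7],
          [7, 19]]
chi2, p_value = chi_square_2x2(counts)
print("chi-square =", round(chi2, 2), " p =", round(p_value, 3))
```

Under these reconstructed counts the statistic and p value agree with the reported values to the precision given in the text.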

We further examined the development of representation use over time by fitting the model to each participant across groups of 10 blocks (100 trials). As shown in Table 3 and Fig. 3, the proportion of participants fit by the MPM model increased across blocks. There was also a tendency for the initial blocks to be best fit by the MPM model.

Table 3

Stage fitness of MPM and GCM across blocks.
	Blocks	1–10	11–20	21–30	31–40	41–50	51–60	61–70
	N	44	40	30	25	18	15	13
MPM	SSD	0.217	0.256	0.253	0.329	0.222	0.274	0.133
	Fitness	.73	.45	.47	.28	.66	.80	.85
GCM	SSD	0.237	0.241	0.240	0.299	0.282	0.323	0.234
	Fitness	.27	.55	.53	.72	.33	.20	.15

Note: Table indicates for each grouping of 10 blocks (100 trials each) the proportion of participants best fit by each model and the mean degree of fit measured by SSD. For this analysis participants were categorized as MPM or GCM on each set of blocks separately, rather than categorized based on their overall fit across the whole experiment. SSD: Sum of squared deviations, the measure of model fit. Fitness: proportion of participants best fit by the model in the particular group of 10 blocks.

Dimensional weighting

Table 4

Fitness and dimension weighting for MPM and GCM participants
Model		Fitness	D1	D2	D3	D4	C
MPM	1st half	0.535	0.311	0.055	0.331	0.303	0.230
	2nd half	0.973	0.201	0.079	0.532	0.188	0.395
	Total	0.103	0.233	0.103	0.521	0.144	1.242
GCM	1st half	1.430	0.223	0.088	0.477	0.212	0.477
	2nd half	0.978	0.394	0.098	0.351	0.158	2.228
	Total	0.106	0.226	0.158	0.403	0.214	4.125

Note: Halves were defined for each participant individually based on their blocks to criterion. N=44. D1-D4: Dimension 1 – Dimension 4 weights, respectively. C: sensitivity parameter

We examined the weighting of each dimension for the MPM and GCM participants across the two halves of the experiment, shown in Table 4. As in the diagnostic stimulus analysis, the two halves were determined for each participant individually on the basis of their blocks to criterion. It is helpful to compare the weightings made by the participants to the diagnostic value of each dimension for an ideal observer, which are D1 = 0.7, D2 = 0.6, D3 = 0.8, and D4 = 0.6. Thus, a participant whose behavior matches the diagnostic value of the dimensions should show the following pattern of weighting: D3 > D1 > D2 = D4.
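One way to arrive at these diagnostic values is to count, for each dimension, the proportion of the ten training stimuli whose value matches the prototype of their own category. This definition is an assumption (the text does not spell it out), as are the codes for A4, A5, B3 and B4, which are taken from the standard 5/4 structure; the function name is hypothetical.

```python
# 5/5 training set. A4, A5, B3, B4 are ASSUMED from the standard 5/4 structure.
CATEGORY_A = ["1110", "1010", "1011", "1101", "0111"]
CATEGORY_B = ["1100", "0110", "0001", "0000", "1001"]


def diagnostic_values(cat_a, cat_b, proto_a="1111", proto_b="0000"):
    """Per dimension: share of stimuli whose value matches their own prototype."""
    n_items = len(cat_a) + len(cat_b)
    values = []
    for k in range(len(proto_a)):
        matches = sum(item[k] == proto_a[k] for item in cat_a)
        matches += sum(item[k] == proto_b[k] for item in cat_b)
        values.append(matches / n_items)
    return values


print(diagnostic_values(CATEGORY_A, CATEGORY_B))
```

Under these assumptions the four proportions are 0.7, 0.6, 0.8 and 0.6, matching the values stated for D1 to D4 and giving the ordering D3 > D1 > D2 = D4.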

To compare the overall fits in the MPM and GCM groups, a 2 (Group: MPM or GCM) × 4 (Dimension: D1-D4) ANOVA was performed. The results revealed a significant main effect of dimension, F(3,129) = 14.274, p < 0.001, η² = 0.249, no main effect of model group, F(3,129) = 0.140, p = 0.710, η² = 0.003, and a significant interaction, F(3,129) = 13.864, p < 0.001, η² = 0.244. Post hoc tests showed differential weightings of the dimensions within each model group. Participants in the GCM group showed the following pattern of weightings: D3 > D1 > D4 = D2 (p values for the pairwise comparisons between adjacent dimensions: p < 0.001, p = 0.026, p = 0.163). In contrast, participants in the MPM group showed the following pattern: D3 > D1 = D4 = D2 (p = 0.031, p = 0.832, p = 0.277). To summarize, both groups weighted D3 in a way that correctly reflects its importance. However, the MPM group did not make further distinctions between the remaining dimensions, whereas the GCM group did.

For the GCM group, the main effect of dimension was significant, F(3,129) = 10.54, p < 0.001, η² = 0.20, the main effect of half was not significant, F(3,129) = 0.12, p = 0.660, η² = 0.05, and the interaction was significant, F(3,129) = 3.60, p = 0.019, η² = 0.08. Post hoc simple effect tests showed that, for the first half, D3 > D1 = D4 = D2 (p = 0.03, p = 0.865, p = 0.055), and for the second half, D1 = D3 > D4 = D2 (p = 0.663, p = 0.024, p = 0.352). This pattern of results indicates a correct strongest weighting of D3 early in training, but a spreading of attention across dimensions as participants approached criterion. A comparison of the MPM and GCM groups indicates that as they approached criterion the MPM participants became more focused on the most important dimension, whereas GCM participants became less focused.

Discussion

Overall, we found that participants who learned the 5/5 category using the robot instantiation benefitted from using a prototype representation. Participants whose performance was best fit by the MPM performed better than those whose performance was fit by GCM. Good learners were more likely to use prototype representation (as evidenced by A1-A2 stimulus classification and by computational modeling). All participants showed a shift towards prototype representation across the time course of learning. Finally, when we examined the weighting of individual dimensions we found that participants fit by the MPM weighted the most diagnostic feature D3 more strongly in the second half than did participants fit by GCM.

For Experiment 2 we chose to instantiate the category structure using a different format designed to make individual feature dimensions less salient. We used the figurative bee stimuli illustrated in Fig. 1. Compared with the robot stimuli in Experiment 1, the bees were less schematic: each individual feature was more complex, and features were more strongly connected with the main body of the bee. Furthermore, encoding of the stimulus as a whole was supported by depicting the stimulus in color rather than as a black and white line drawing. We hypothesized that this instantiation would lead to greater use of exemplar representations for categorization.

Method

Participants

Participants were 32 undergraduate students from South China Normal, 20 females and 12 males, aged between 18 and 23. None had previously participated in related category learning experiments. All had normal or corrected-to-normal vision. Participants were paid for their participation. According to G*Power calculations, the experiment required at least 30 participants to detect a medium effect size (1-β = 0.8; α = 0.05).

Materials

The category structure was the same as in Experiment 1, but the individual stimuli were instantiated as “bees” rather than “robots”, as shown in Fig. 1. For the bee stimuli the four feature dimensions were the legs (2 or 6), wings (1 or 2 pairs), antennae (with or without), and body markings (stripes or spots). As in Experiment 1, these features were randomly assigned to dimensional values and counterbalanced across participants. The bee format was chosen because of its relatively rich appearance, including coloration: for all bees the body color was yellow and the wings were gray.

Procedure

The procedure was the same as in Experiment 1, except that instead of being told that they would learn to categorize stimuli as “A” and “B”, participants were told that they would learn to categorize the bees as "toxic" and "non-toxic". The F and J keys of a computer keyboard were labeled to represent toxic and non-toxic, respectively.

Results

Learning to Criterion

The number of participants reaching criterion and the mean number of blocks to criterion were calculated. Of the 32 total participants, 25 met the learning criterion in fewer than 70 blocks, with a mean of 29.19 blocks to criterion (sd = 13.59, 10 trials per block). Participants were again divided into “good” and “poor” learner groups based on the mean blocks to criterion, with good learners (n = 19) being those who reached criterion in 29 or fewer blocks, and poor learners (n = 13) being those who required more than 29 blocks.

Representation: Diagnostic Stimulus Classification evidence

Table 5

Categorization of diagnostic stimuli, Experiment 2
	Whole	First half	Second half	Good learner (N = 19)	Poor learner (N = 13)
A1	0.70(0.21)	0.70(0.21)	0.67(0.24)	0.69(0.20)	0.71(0.22)
A2	0.70(0.20)	0.72(0.23)	0.75(0.19)	0.72(0.17)	0.67(0.24)

Accuracy for stimuli A1 and A2 is shown in Table 5. The accuracy for all participants across the whole experiment on A1 and A2 was compared using a t-test, which revealed no significant difference, t(31) = 0.03, p = 0.978, Cohen’s d = 0.05. Differences emerged when the task was broken down into halves, again defined individually for each participant based on their total number of blocks. A 2 (stimulus: A1 vs A2) x 2 (half: first vs second) repeated measures ANOVA revealed a significant interaction, F(1, 31) = 6.50, p = 0.016, η² = 0.17. The main effect of stimulus was significant, F(1, 31) = 4.58, p = 0.04, η² = 0.13, but the main effect of half was not, F(1, 31) = 0.00, p = 0.990, η² = 0.00. A simple effects analysis found a significant difference between A1 and A2 in the second half, F(1, 31) = 9.76, p = 0.004, η² = 0.24, but not in the first half, F(1, 31) = 0.22, p = 0.640, η² = 0.01. This result suggests that in the later stage of learning, participants tended to form exemplar representations. Although some numeric differences between good and poor learners can be seen in Table 5, a 2 (stimulus: A1 vs A2) x 2 (group: good vs poor) ANOVA found that neither the interaction nor the main effects reached significance, F(1, 31) = 0.02, p = 0.882, η² = 0.001; F(1, 31) = 0.06, p = 0.805, η² = 0.002; F(1, 31) = 1, p = 0.326, η² = 0.032.
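The effect sizes reported with these F tests can be checked against the F statistics themselves. As a sketch, assuming the reported η² values are partial eta squared (the text does not state which variant was used), the conversion is F·df_effect / (F·df_effect + df_error). The function name below is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Convert an F statistic and its degrees of freedom to partial eta squared."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F tests from the stimulus x half ANOVA reported in the paragraph above.
reported_tests = {
    "interaction": (6.50, 1, 31),
    "stimulus main effect": (4.58, 1, 31),
    "second-half simple effect": (9.76, 1, 31),
}

for label, (f_value, df_effect, df_error) in reported_tests.items():
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: partial eta squared = {eta:.2f}")
```

Under this assumption the three values round to 0.17, 0.13, and 0.24, which agrees with the effect sizes reported in the text.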

Representation: MPM and GCM Model evidence

Participants best fit by the MPM reached criterion in 35.79 blocks on average, whereas the participants fit by the GCM reached criterion in 26.61 blocks on average. This difference indicated a trend for slower learning in MPM participants, but was not statistically significant, t(30) = 1.78, p = 0.086, Cohen’s d = 0.70. The learning curves for MPM and GCM participants are illustrated in Fig. 4. A visual inspection reveals that the lines diverge around block 12.

The distribution of good and poor learners across the GCM and MPM groups was tested using a Chi-square test. This analysis showed a trend for good learners to be disproportionately likely to use the GCM representation, χ² = 3.52, df = 1, p = 0.061. A higher proportion of the GCM group were good learners (70%), and a higher proportion of the MPM group were poor learners (67%).
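The Chi-square statistic above can be illustrated with a Pearson test on a 2 x 2 table of counts. The cell counts are not reported in the text; the counts used below are an assumption, chosen because they are consistent with the reported group sizes (19 good and 13 poor learners) and percentages (70% and 67%). The sketch uses no continuity correction, and the function name is illustrative.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With df = 1 the upper-tail probability has a closed form via erfc.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# ASSUMED counts, not reported in the text.
# Rows: GCM group, MPM group. Columns: good learners, poor learners.
assumed_counts = [[16, 7], [3, 6]]

chi2, p_value = chi_square_2x2(assumed_counts)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

With these assumed counts the statistic is approximately 3.52 with p of about 0.06, in line with the reported values; other cell counts could in principle be compatible with the rounded percentages.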

As in Experiment 1, staged model fitting was performed across groups of 10 blocks (100 trials). The results are shown in Table 6 and illustrated in Fig. 5. The results showed that across the time course of learning, a larger proportion of the participants were best fit by the GCM.

Table 6

Stage fitness of MPM and GCM across blocks.
	Blocks	1–10	11–20	21–30	31–40	41–50	51–60	61–70
	N	32	31	19	11	8	6	5
MPM	SSD	0.247	0.211	0.321	0.554	0.611	0.285	0.492
	Fitness	.50	.61	.37	.18	.38	.33	.20
GCM	SSD	0.264	0.225	0.188	0.364	0.111	0.171	0.153
	Fitness	.50	.39	.63	.82	.62	.67	.80
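To make the proportions in Table 6 easier to read, they can be converted back to participant counts. This sketch assumes that each "Fitness" value is the share of the N participants still in training at that stage who were best fit by the model in question; that reading is inferred from the surrounding text rather than stated in the table. Variable and function names are illustrative.

```python
# Values transcribed from Table 6.
n_per_stage = [32, 31, 19, 11, 8, 6, 5]
mpm_share = [0.50, 0.61, 0.37, 0.18, 0.38, 0.33, 0.20]


def implied_counts(n_values, shares):
    """Return (mpm_count, gcm_count) per stage implied by rounded proportions."""
    counts = []
    for n, share in zip(n_values, shares):
        mpm_count = round(n * share)  # participants best fit by MPM
        counts.append((mpm_count, n - mpm_count))  # remainder best fit by GCM
    return counts


counts = implied_counts(n_per_stage, mpm_share)
for stage, (mpm_count, gcm_count) in enumerate(counts, start=1):
    print(f"stage {stage}: MPM = {mpm_count}, GCM = {gcm_count}")
```

Read this way, the GCM group outnumbers the MPM group at every stage after the second, even as the number of participants remaining shrinks to five.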

Dimensional weighting

Table 7

Fitness and dimension weighting for MPM and GCM participants
Model		Fitness	D1	D2	D3	D4	C
MPM	1st half	0.215	0.247	0.098	0.343	0.312	1.433
	2nd half	0.086	0.234	0.129	0.392	0.249	3.036
	Total	0.143	0.273	0.113	0.434	0.180	1.385
GCM	1st half	0.263	0.246	0.147	0.273	0.334	5.834
	2nd half	0.223	0.257	0.246	0.290	0.207	5.546
	Total	0.128	0.253	0.202	0.291	0.254	5.114

Note: Halves were defined for each participant individually based on their blocks to criterion. N=32. D1-D4: Dimension 1 – Dimension 4 weights, respectively. C: sensitivity parameter

The same analysis approach was taken as in Experiment 1; results are shown in Table 7. First, the learning curve for each participant was divided into halves based on the individual’s blocks to criterion. Then, to compare the overall fits in the MPM and GCM groups, a 2 (group: MPM or GCM) x 4 (dimension: D1-D4) ANOVA was performed, and the results were compared with the ideal observer weightings, D3 > D1 > D2 = D4. The dimension main effect was significant, F(3, 93) = 4.29, p = 0.01, η² = 0.12, the model main effect was not significant, F(3, 93) = 0.69, p = 0.414, η² = 0.02, and the interaction effect was significant, F(3, 93) = 10.85, p < 0.001, η² = 0.26. Post hoc simple effects tests showed that for the MPM group, D3 > D1 = D4 > D2 (p = 0.011, p = 0.101, p = 0.02). For the GCM group, there were no significant pairwise differences between dimensions, resulting in the overall pattern D3 = D1 = D4 = D2 (all ps > .05). Overall, the distribution of attention across the four feature dimensions for participants fit by the MPM was more consistent with the diagnostic characteristics of each dimension than for participants fit by the GCM.
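The ideal observer ordering D3 > D1 > D2 = D4 can be recovered from the category structure itself. The sketch below assumes, for illustration, that the diagnosticity of a dimension is the accuracy of a one-dimensional rule (value 1 means Category A, value 0 means Category B) applied to the ten training stimuli of the 5/5 structure as listed in Table 8. This operationalization and the function name are assumptions, not necessarily the computation used for the ideal observer weights.

```python
# Training stimuli of the 5/5 structure (D1-D4), as listed in Table 8.
CATEGORY_A = ["1110", "1010", "1011", "1101", "0111"]  # A1-A5
CATEGORY_B = ["1100", "0110", "0001", "0000", "1001"]  # B1-B5


def rule_accuracy(dimension: int) -> float:
    """Accuracy of the rule 'value 1 -> A, value 0 -> B' on one dimension."""
    hits = sum(stimulus[dimension] == "1" for stimulus in CATEGORY_A)
    hits += sum(stimulus[dimension] == "0" for stimulus in CATEGORY_B)
    return hits / (len(CATEGORY_A) + len(CATEGORY_B))


accuracies = {f"D{d + 1}": rule_accuracy(d) for d in range(4)}
for name, accuracy in accuracies.items():
    print(f"{name}: {accuracy:.1f}")
```

Under this assumption the single-dimension accuracies are 0.7, 0.6, 0.8, and 0.6 for D1 to D4, which reproduces the ordering D3 > D1 > D2 = D4.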

In order to examine how the weighting of dimensions changed across learning in each model group, separate 2 (experiment half: first, last) x 4 (dimension: D1-D4) ANOVAs were performed. For the MPM group, the dimension main effect was significant, F(3, 93) = 7.70, p < 0.001, η² = 0.20, the half main effect was not significant, F(3, 93) = 1, p = 0.325, η² = 0.03, and the interaction effect was not significant, F(3, 93) = 1.63, p = 0.187, η² = 0.05. Post hoc multiple comparison tests showed that, for the first half, D3 = D4 = D1 > D2 (p = 0.602, p = 0.308, p = 0.01), and for the second half, D3 > D4 > D2 and D3 > D1 (p = 0.046, p = 0.016, p = 0.032). Overall, these results show that MPM participants differentially weighted the dimensions across both halves, with the weights improving in their match to the ideal observer weights in the second half. For the GCM group, the dimension main effect was not significant, F(3, 93) = 0.38, p = 0.544, η² = 0.01, and the half main effect was not significant, F(3, 93) = 0.75, p = 0.527, η² = 0.02, but the interaction effect was significant, F(3, 93) = 4.79, p = 0.004, η² = 0.13. Post hoc simple effects tests showed some differential weighting of dimensions in the first half that did not accurately reflect the underlying weightings of the dimensions: D4 > D2, p = 0.008. In the second half, attention was more evenly spread across dimensions, so that there were no significant differences between any two dimensions.

Overall, these results indicated different patterns of weighting in the two groups, such that participants best fit by the MPM increased their weighting of the most critical dimensions over time, whereas those best fit by GCM weighted dimensions more evenly over time.

Discussion

Across Experiments 1 and 2 we found that the different instantiations of the categories (as schematic robots or detailed bees) led to different reliance on exemplar versus prototype representations. In Experiment 2 participants who were best fit by the GCM performed better than participants fit by the MPM. There was a shift across the whole experiment to exemplar use. This is the opposite of what was found in Experiment 1. For dimensional attention, participants best fit by the MPM showed greater weighting of diagnostic features D3 and D1 in both Experiments, whereas participants best fit by GCM showed no differences in dimensional weighting; however, greater weighting of diagnostic dimensions and MPM use was associated with better learning in Experiment 1 but worse learning in Experiment 2. These differences between Experiments 1 and 2 all support our hypothesis that the surface instantiation of a category can affect the representations learned, and that more schematic instantiations are best supported by prototype representations and more integrated instantiations best supported by exemplar representations.

In Experiment 3 we tested the hypothesis that feedback that highlighted the match of the stimulus to the prototype would affect tendency for prototype and exemplar representation. For this experiment we used the same stimulus instantiation as Experiment 2 (bees stimulus set), but manipulated the feedback. In Experiment 2 feedback was the same, right or wrong, for each stimulus. In Experiment 3 participants were given point value feedback associated with a small monetary reward. Points were informative about category membership based on the prototype theory: the most typical stimuli (closest to the prototype) received the smallest number of points whereas the least typical stimuli (those farthest from the prototype) received a larger number of points. We hypothesized that point value feedback would increase the salience of category membership and ultimately lead to increased use of prototype representations.

Method

Participants

Participants were 32 undergraduate students from South China Normal, 17 females and 15 males, aged between 18 and 23. None had previously participated in related category learning experiments. All had normal or corrected-to-normal vision. Participants were paid for their participation. According to G*Power calculations, the experiment required at least 30 participants to detect a medium effect size (1-β = 0.8; α = 0.05).

Materials

Table 8

“5/4” and “5/5” category structure.
	A						B
	D1	D2	D3	D4	Score		D1	D2	D3	D4	Score
A1	1	1	1	0	2	B1	1	1	0	0	3
A2	1	0	1	0	3	B2	0	1	1	0	3
A3	1	0	1	1	2	B3	0	0	0	1	2
A4	1	1	0	1	2	B4	0	0	0	0	1
A5	0	1	1	1	2	B5	1	0	0	1	3
A0	1	1	1	1		B0	0	0	0	0

Note: A0 and B0 are the prototypes of categories A and B, respectively. High-similarity exemplar pairs (three overlapping features) are A1 (1110) and B1 (1100), A1 (1110) and B2 (0110), A2 (1010) and A1 (1110), and A2 (1010) and A3 (1011).

The category definition and stimuli were the same as in Experiment 2. The feedback was changed to award numeric points for correct categorization. The score of each exemplar was based on how similar the stimulus was to its category prototype, with greater similarity leading to a lower score. This gives extra reward and information for the hardest stimuli (those that differ the most from the prototype). The score for each stimulus is given in Table 8. Points ranged from 1 to 3: 1 point for a stimulus that matched the category prototype on all four dimensions, 2 points for a match on three dimensions, and 3 points for a match on two dimensions. Ten percent of the participant's standard experiment participation payment was reserved to be used as a bonus. Each point was worth .01 RMB; the average participant received an end-of-study bonus of 7 RMB.
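The scoring rule can be stated compactly as points = 5 minus the number of features shared with the category prototype. The sketch below applies that rule to the stimulus codes in Table 8 and compares the result with the Score columns of the same table. The dictionary and function names are illustrative, and the "5 minus overlap" form is a restatement of the rule described above rather than a formula given by the authors.

```python
# Stimulus codes (D1-D4) and prototypes from Table 8.
PROTOTYPES = {"A": "1111", "B": "0000"}
STIMULI = {
    "A1": "1110", "A2": "1010", "A3": "1011", "A4": "1101", "A5": "0111",
    "B1": "1100", "B2": "0110", "B3": "0001", "B4": "0000", "B5": "1001",
}
# Score columns transcribed from Table 8, used only to check the rule.
TABLE_SCORES = {
    "A1": 2, "A2": 3, "A3": 2, "A4": 2, "A5": 2,
    "B1": 3, "B2": 3, "B3": 2, "B4": 1, "B5": 3,
}


def points_for(stimulus_id: str) -> int:
    """Points for a correct response: fewer shared features -> more points."""
    features = STIMULI[stimulus_id]
    prototype = PROTOTYPES[stimulus_id[0]]  # first letter names the category
    overlap = sum(f == p for f, p in zip(features, prototype))
    return 5 - overlap


computed = {stimulus_id: points_for(stimulus_id) for stimulus_id in STIMULI}
print(computed == TABLE_SCORES)
```

The computed points match the Score columns for all ten stimuli, with B4 (identical to the Category B prototype) the only stimulus worth a single point.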

Procedure

Except for the different feedback method, the procedure was the same as in Experiment 2. Participants received points for correct answers. No points were deducted for incorrect responses. If the participant responded incorrectly, the screen displayed an “X” and no other information. If the participant responded correctly, the screen displayed the following information: “Right, + x points. Total score = y”, where x was the number of points earned on the current trial and y was the total number of points earned in the experiment to that point.
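The feedback rule described above can be sketched as a small function. The function and argument names are hypothetical; only the displayed strings and the absence of a penalty for errors come from the procedure as described.

```python
from typing import Tuple


def give_feedback(correct: bool, points: int, total: int) -> Tuple[str, int]:
    """Return the feedback text and the updated running total for one trial."""
    if not correct:
        # Incorrect responses show only an X; no points are deducted.
        return "X", total
    total += points
    return f"Right, + {points} points. Total score = {total}", total


# Example: a correct 3-point trial followed by an error.
message_1, running_total = give_feedback(True, 3, 0)
message_2, running_total = give_feedback(False, 2, running_total)
print(message_1)
print(message_2)
```

Because errors leave the total unchanged, the running score can only stay flat or increase across trials.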

Results

The number of participants and the mean of learning blocks to reach the criteria were calculated. Of the 32 participants, 25 met the learning criteria in fewer than 70 blocks. On average participants reached criterion after 33.19 blocks (sd = 13.96, 10 trials per block).

Representation: Diagnostic Stimulus Classification evidence

Overall, the comparison of stimuli A1 and A2 indicated a tendency to use prototype representations. As shown in Table 9, the accuracy rate for A1 (.72) was significantly higher than that for A2 (.64), t(31) = 2.33, p = 0.027, Cohen's d = 0.41, which supported the use of prototype representations in this experiment.
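For a within-participant comparison such as this one, a common effect size is d = t / √n. The sketch below assumes that this is the variant reported (the text does not specify it) and that n = 32 participants contributed to the test; the function name is illustrative.

```python
import math


def paired_cohens_d(t_value: float, n_participants: int) -> float:
    """Cohen's d for a paired design, computed from t and the sample size."""
    return t_value / math.sqrt(n_participants)


d_a1_vs_a2 = paired_cohens_d(2.33, 32)
print(f"d = {d_a1_vs_a2:.2f}")
```

Under these assumptions the result rounds to 0.41, matching the reported effect size.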

Table 9

Categorization of diagnostic stimuli, Experiment 3
	Whole	First half	Second half	Good learner (N = 17)	Poor learner (N = 15)
A1	0.72(0.17)	0.70(0.19)	0.74(0.23)	0.78(0.17)	0.65(0.14)
A2	0.64(0.25)	0.62(0.26)	0.67(0.27)	0.67(0.23)	0.57(0.26)

Note: N = 32. Proportion of classification of stimuli A1 and A2 as members of Category A. A1 stimuli are diagnostic of a prototype representation, A2 of an exemplar representation. Standard deviations are indicated in parentheses.
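The diagnostic logic in the note above can be illustrated with textbook forms of the two models. The sketch below assumes exponential-decay similarity over attention-weighted city-block distance, summed over stored exemplars for the GCM and computed against the prototypes A0 and B0 for the MPM, using the stimulus codes from Table 8. The equal weights, the sensitivity value c = 5, and all function names are illustrative assumptions; the models fitted in this study may include further parameters (for example, a guessing rate).

```python
import math

# Stimulus codes (D1-D4) from Table 8.
CATEGORY_A = ["1110", "1010", "1011", "1101", "0111"]  # A1-A5
CATEGORY_B = ["1100", "0110", "0001", "0000", "1001"]  # B1-B5
PROTOTYPE_A, PROTOTYPE_B = "1111", "0000"              # A0, B0


def similarity(x, y, weights, c):
    """Exponential-decay similarity over weighted city-block distance."""
    distance = sum(w for w, a, b in zip(weights, x, y) if a != b)
    return math.exp(-c * distance)


def gcm_prob_a(stimulus, weights, c):
    """Exemplar model: compare summed similarity to all stored exemplars."""
    sim_a = sum(similarity(stimulus, e, weights, c) for e in CATEGORY_A)
    sim_b = sum(similarity(stimulus, e, weights, c) for e in CATEGORY_B)
    return sim_a / (sim_a + sim_b)


def mpm_prob_a(stimulus, weights, c):
    """Prototype model: compare similarity to the two category prototypes."""
    sim_a = similarity(stimulus, PROTOTYPE_A, weights, c)
    sim_b = similarity(stimulus, PROTOTYPE_B, weights, c)
    return sim_a / (sim_a + sim_b)


weights = [0.25, 0.25, 0.25, 0.25]  # ASSUMED equal attention, for illustration
c = 5.0                             # ASSUMED sensitivity, for illustration

for name, stimulus in [("A1", "1110"), ("A2", "1010")]:
    p_mpm = mpm_prob_a(stimulus, weights, c)
    p_gcm = gcm_prob_a(stimulus, weights, c)
    print(f"{name}: MPM P(A) = {p_mpm:.2f}, GCM P(A) = {p_gcm:.2f}")
```

With these illustrative settings the prototype model assigns A1 to Category A more often than A2 (A2 is equidistant from both prototypes, so its predicted probability is 0.5), whereas the exemplar model favors A2 over A1. This is the contrast that makes the A1 versus A2 comparison diagnostic.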

A 2 (stimulus: A1 vs A2) x 2 (group: good vs poor) ANOVA revealed a significant main effect of stimulus, such that A1 accuracy was higher than A2 accuracy, F(1, 30) = 5.14, p = 0.031, η² = 0.15, and a trend for good learners to show higher accuracy than poor learners, F(1, 30) = 3.81, p = 0.06, η² = 0.11. There was no significant interaction between the variables, F(1, 30) = 0.09, p = 0.765, η² = 0.003. This indicates that both good and poor learners had a bias towards learning prototypes.

To examine changes across the time course of learning, a 2 (stimulus: A1 vs A2) x 2 (half: first, last) ANOVA was conducted. As in the previous analyses, there was a significant difference overall between A1 and A2, F(1, 31) = 4.58, p = 0.04, η² = 0.13, but the difference between the first and second halves failed to reach significance, F(1, 31) = 0.00, p = 0.99, η² = 0.00. There was also a significant interaction, F(1, 31) = 6.50, p = 0.016, η² = 0.17. Tests of the simple effects found a significant difference between A1 and A2 in the second half, F(1, 31) = 9.76, p = 0.004, η² = 0.24, which was not present in the first half, F(1, 31) = 0.22, p = 0.64, η² = 0.01. This result suggests that in the later stage of learning, participants tended to form prototype representations.

Representation: MPM and GCM Model fitting evidence

Participants were divided into those whose performance was best fit by MPM or GCM using the same modeling approach as in the previous experiments. Participants who were best fit by MPM reached criterion in 30.79 blocks on average, whereas those best fit by GCM reached criterion in 36.69 blocks on average. This difference was not statistically significant, t(30) = 1.18, p = 0.246, Cohen’s d = 0.43. Learning curves across blocks for the MPM and GCM groups are plotted in Fig. 6.

We further examined model fits in the good and poor learner groups separately (not illustrated). A Chi-square test revealed no significant differences in distribution of MPM and GCM use across good and poor learners, 58% and 54%, χ² = 0.43, df = 1, p = 0.513.

As in Experiments 1 and 2, staged model fitting was performed across groups of 10 blocks. The results are shown in Table 10 and illustrated in Fig. 7. The results showed that with the progress of learning, a larger proportion of the participants were best fit by the MPM.

Table 10

Stage fitness of MPM and GCM across blocks.
	Blocks	1–10	11–20	21–30	31–40	41–50	51–60	61–70
	N	32	31	24	18	12	9	7
MPM	SSD	0.206	0.275	0.416	0.527	0.156	0.133	0.116
	Fitness	.66	.52	.50	.44	.50	.55	.71
GCM	SSD	0.308	0.248	0.336	0.327	0.259	0.176	0.192
	Fitness	.33	.48	.50	.54	.50	.45	.29

Dimensional weighting

Table 11

Fitness and dimension weighting for MPM and GCM participants
Model		Fitness	D1	D2	D3	D4	C
MPM	1st half	0.204	0.274	0.197	0.351	0.177	1.355
	2nd half	0.363	0.213	0.1	0.409	0.278	3.624
	Total	0.141	0.236	0.145	0.406	0.212	1.635
GCM	1st half	0.308	0.258	0.233	0.3	0.209	7.188
	2nd half	0.178	0.181	0.156	0.345	0.318	6.752
	Total	0.178	0.193	0.216	0.293	0.298	6.445

Note: Halves were defined for each participant individually based on their blocks to criterion. N=32. D1-D4: Dimension 1 – Dimension 4 weights, respectively. C: sensitivity parameter

The same analysis approach for the dimensional weightings was taken as in Experiments 1 and 2; results are shown in Table 11. To compare the MPM and GCM groups, a 2 (group: MPM or GCM) x 4 (dimension: D1-D4) ANOVA was performed, and the results were compared with the ideal observer weightings, D3 > D1 > D2 = D4. The results showed a trend for the dimension main effect, F(3, 93) = 2.54, p = 0.065, η² = 0.08, the group main effect was not significant, F(3, 93) = 1.49, p = 0.231, η² = 0.05, and the interaction effect was significant, F(3, 93) = 5.90, p = 0.001, η² = 0.16. Post hoc tests showed that for participants fit by the MPM, D3 > D1 > D2 and D3 > D4 (p = 0.006, p = 0.032, p = 0.002). For the GCM, none of the pairwise comparisons between dimensions were significant, D4 = D3 = D2 = D1 (all ps > .05). Overall, the dimensional weights for participants fit by the MPM were more consistent with the diagnostic characteristics of each dimension.

In order to examine how the weighting of dimensions changed across learning in each model group, separate 2 (experiment half: first, last) x 4 (dimension: D1-D4) ANOVAs were performed. The learning curve for each participant was divided into halves based on the individual’s blocks to criterion. In the MPM group, the dimension main effect was significant, F(3, 93) = 9.27, p < 0.001, η² = 0.23, the half main effect was not significant, F(3, 93) = 0.53, p = 0.474, η² = 0.02, and the interaction effect was significant, F(3, 93) = 5.02, p = 0.003, η² = 0.14. Post hoc tests showed that the greater weighting of the most diagnostic dimension, D3, was apparent in the first half, D3 > D4 and D3 > D2 (p = 0.006, p = 0.037), and was maintained in the second half, in which a greater weighting for D1 also emerged, D3 > D1 = D4 > D2 (p < 0.001, p = 0.197, p < 0.001). In the GCM group there were no significant main effects or interaction, indicating similar weighting of dimensions across both halves of the study (dimension main effect, F(3, 93) = 1.52, p = 0.216, η² = 0.05; half main effect, F(3, 93) = 0.81, p = 0.375, η² = 0.03; interaction effect, F(3, 93) = 1.80, p = 0.161, η² = 0.06).

Overall, the patterns of attentional weights were similar to Experiment 2, with the weights for participants fit by MPM more consistent with the diagnostic characteristics of each dimension, and weights for participants best fit by GCM distributed across dimensions.

Discussion

Across Experiments 2 and 3 we found a different pattern of results depending on the type of feedback given despite the use of the same category structure and instantiation in both studies. When feedback was limited to correct/incorrect (as in Experiment 2) participants adopted exemplar strategies and use of exemplar strategies was associated with better performance. However, in Experiment 3 this reliance on exemplar representations was reduced and reliance on prototype representations increased. Overall, in Experiment 3 participants who were best fit by the prototype model performed better than participants best fit by the exemplar model. There was also an overall shift across the study to the use of prototype representations. When dimensional weightings were examined, a similar pattern was observed in both Experiments 2 and 3: higher weighting of the diagnostic dimensions D1 and D3 in those best fit by MPM, and no significant differences in dimension weighting in those best fit by GCM. Combined with the overall superiority seen in learning by GCM participants in Experiment 2 and by MPM participants in Experiment 3, it appears that the greater attentional weighting on D3 and D1 was helpful in Experiment 3 but not in Experiment 2.

It is also important to note that performance overall was worse in Experiment 3 than in Experiment 2: participants required more blocks to reach criterion. This indicates that the addition of the point value feedback with reward did not enhance learning, as would be predicted if the informativeness of feedback were useful for supporting learning. The implications of this finding for the cognitive strategies underlying learning are discussed more fully in the General Discussion.

Overall, we found substantial differences in representation use across different instantiations of the same category learning task based on the same underlying structure. When stimuli were represented as schematic robots there was a clear bias towards prototype representations: good participants endorsed probe stimuli consistent with prototype representations, participants fit by the MPM performed better, and participants overall showed a shift towards categorization fit by MPM. In contrast, when the same structure was instantiated as more naturalistic bee stimuli, there was a bias towards exemplar representation use, again demonstrated by both endorsement of probe stimuli (A1 and A2), better performance by the GCM fitted participants, and a shift across blocks to increased fit by GCM.

When stimulus instantiation was held constant (bee stimuli) but the feedback regimen was changed to point feedback, participants also showed differences in resulting representations. When the bees were categorized using point feedback, participants showed greater reliance on prototype representation as evidenced by endorsement of probe stimuli and shift to better fit by the MPM across blocks of trials. Together these results imply that the representations learned during category learning are not merely a function of the underlying category structure but also depend on the surface features of stimuli and the type of feedback received.

Stimulus Instantiation Effects On Representation

We found that our “robot” stimuli were more likely to lead participants to use prototype representations, whereas our “bee” stimuli were more likely to lead participants to use exemplar representations. We hypothesized that the salience of the dimensions would play a role in determining which type of representation would be most effective. Individual features were more salient in the robots than in the bees, which may have made identifying the most diagnostic dimension easier. An inspection of the dimension weights across experiments supports this possibility (refer to Tables 4, 7 and 11): the most diagnostic dimension, D3, was identified quickly for the robot stimuli in Experiment 1, with a first half weight of 0.67. In contrast, when the bee stimuli were used in Experiments 2 and 3, the first half weights for D3 were weaker, 0.41 and 0.35, respectively. Other research has found that salience and ease of identification of feature values affect learning. Markman and Maddox (2003) found that for stimuli instantiated using multi-level feature values (not limited to 0 and 1; for example, using a range of shades of gray rather than a single gray), participants were more likely to acquire exemplar representations. Zettersten and Lupyan (2019) found that categories with easily verbalizable features were learned more rapidly than those with hard-to-verbalize features.

Feedback Informativeness Effects On Category Representations

Experiments 2 and 3 used two different forms of feedback, traditional right/wrong feedback versus point-valued feedback, respectively. When point-valued feedback was used in Experiment 3, stimuli with low similarity to the prototype received a greater number of points if classified correctly. The change in feedback resulted in a shift from largely exemplar representation to an increased use of prototype representation. Interestingly, even though feedback was more informative in the point-valued version, participants performed worse overall, as evidenced by an increase in blocks required to reach criterion. One possibility is that the inclusion of the point values changed the strategies that participants used on the task. Previous research examining the effects of type of feedback on category learning has largely used rule-based and information integration tasks (Ashby & O’Brien, 2007; Freedberg et al., 2017; Maddox et al., 2003; Worthy et al., 2013). These studies have found that a greater amount of information generally supports rule-based learning reliant on explicit hypothesis testing, as long as the amount of information does not exceed working memory capacity (Liu et al., 2012a). In contrast, information integration is supported by the reinforcement value of the feedback (Liu et al., 2021).

Representational Change Across The Process Of Category Learning

Using the new analysis method of phased model fitting, we found that the percentage of participants fit by prototype and exemplar models changed across the whole learning process. In all experiments there was an initial bias in the first 10 blocks towards prototype representations that then either shifted towards exemplar representations (in Experiment 2) or became more pronounced (in Experiments 1 and 3). Liu et al. argued for three stages of learning, each characterized by a different learning strategy or representation (Liu et al., 2012a, 2012b). In Liu et al.’s theory, the first phase is the rule strategy phase, in which participants seek to identify the most diagnostic feature dimension in order to construct a unidimensional rule. Accuracy at this stage is usually close to the diagnostic value of the feature dimension identified. The second stage is the rule plus exception strategy stage, in which participants are not satisfied with the accuracy of the single-dimension rule and try to remember some exception exemplars. The classification accuracy at this stage is slightly higher than the diagnostic level, but less than 90%. The final phase is the information integration phase, in which additional dimensions are included in the representation; this stage could be supported by either prototype or exemplar memory. The classification accuracy at this stage is over 90%.

Between Task Structure And Representations: Implications For Strategy Use In Category Learning

In this paper we have focused on the representation of category structure learned by participants, focusing on prototype and exemplar representations. We identified two factors that affect the salience of features and dimensions and ultimately lead to differential use of prototype or exemplar representations. However, this raises the question of how and why these differences in salience result in differences in representations. This requires an examination of the different strategies that can be brought to bear by participants to learn categories, of which a number have been proposed.

First there are attentional mechanisms. Increased salience of dimensions may support ease of visual parsing of the stimulus and identification of critical features. Many theories postulate that category learning is supported by learned differences in attention to features and dimensions, and that these attentional differences determine the basis of categorization (Kruschke, 1992). We used computational modeling to examine the weights of different dimensions and found, consistent with previous research, that prototype processing was associated with greater weighting of diagnostic features that reflect the underlying dimension weights within the category structure itself (Smith & Minda, 2000; Minda & Smith, 2001; 2002). Other research has questioned the nature of this weighting and whether it truly reflects visual attention. Taking eyetracking as a measure of visual attention, Rehder and Hoffman (2005) found that eye fixations better reflected the distributed attention to features characteristic of GCM, rather than the highly focused attention on critical dimensions of MPM. Our results are consistent with greater dimensional weighting occurring when learning is best supported by prototypes, but readers should note the caveat that these weights may not be diagnostic solely of visual attention and may be influenced by later decisional processes.

In addition to attentional mechanisms, participants can attempt to apply different cognitive strategies during learning, which can affect the representations acquired. One strategy is an explicit memory strategy in which participants endeavor to memorize individual exemplars along with their category membership. A memorization strategy would be expected to result in behavior best fit by an exemplar model. In many category learning tasks memorization has limited efficacy because these tasks include a large number of different stimuli that can overwhelm working memory capacity. However, in the 5/4 and 5/5 tasks only 9 or 10 individual stimuli are used, respectively, which may be amenable to a memorization strategy. Blair and Homa (2003) argued that stimuli in the 5/4 task are just as easy to memorize as they are to learn to categorize. Liu et al. (2012) showed using the 5/4 task that if participants were told the number of unique stimuli in advance they tended to acquire exemplar representations. Even for categorization tasks in which performance cannot be fully accounted for by memorization, memorization can serve as one component in learning. For example, “rule plus exception” theories propose that exception stimuli are memorized (Davis et al., 2011). For memorization, rich stimuli from familiar categories (such as the bees used in Experiments 2 and 3) may have an advantage, as there are more features available for high fidelity encoding, and the memory representations may benefit from being built on preexisting conceptually structured memory representations.

A third mechanism is active hypothesis testing, as postulated to underlie rule-based category learning (Ashby & Maddox, 2005; Seger & Miller, 2010). Rule-based strategies typically work best when a low number of dimensions are manipulated, so that the rule complexity does not exceed working memory capacity for maintaining and updating across trials. In the 5/4 task rule learning may support early learning, as participants learn to identify a single dimension and make decisions based on that dimension, or even learn to base the decision on two dimensions (Liu et al., 2012b). However, continuing to improve accuracy requires rules with more dimensions, which typically exceed the working memory capacity of most participants and thus require a shift to another strategy. To the extent that both rule learning and prototype representations emphasize the most distinctive dimensions, one would predict that rule-based learning would have prototype-like representations. However, rule learning focusing on one or two dominant features would be unlikely to accurately represent the less dominant features.

A final mechanism is reinforcement learning. Reinforcement learning has been shown to account well for learning over complex multivalued and continuous-valued dimensions, such as are typical of information-integration category learning tasks. According to the COVIS model (Ashby et al., 2011), reinforcement learning serves to map small regions of perceptual space to categories. Which category representation appears to result will depend on the interaction of attention with perceptual space. If attention is directed to some dimensions more than others, then the overall patterns of categorization behavior may mimic prototype learning. To the extent that perceptual space includes equal attention to all features, one would predict exemplar representations (Ashby & Rosedahl, 2017).
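One way to see how attention could change the apparent representation is a small similarity-based sketch. It is not an implementation of COVIS or of reinforcement learning; it only assumes an attention-weighted exemplar-style similarity rule, the classic 5/4 training items of Medin and Schaffer (1978), and two hypothetical attention profiles (`EQUAL` and `SELECTIVE`) with an arbitrary sensitivity `c`. When attention is spread equally, items A4 and B1 are kept apart; when attention is restricted to two dimensions on which they are identical, the same rule treats them identically, so item-specific knowledge no longer shows up in behavior.

```python
import math

# Classic 5/4 training items (Medin & Schaffer, 1978), 0/1 coding.
# Assumption: used for illustration; not the 5/5 stimulus set of this paper.
ITEMS = {
    "A1": (1, 1, 1, 0), "A2": (1, 0, 1, 0), "A3": (1, 0, 1, 1),
    "A4": (1, 1, 0, 1), "A5": (0, 1, 1, 1),
    "B1": (1, 1, 0, 0), "B2": (0, 1, 1, 0), "B3": (0, 0, 0, 1),
    "B4": (0, 0, 0, 0),
}

EQUAL = (0.25, 0.25, 0.25, 0.25)   # attention spread over all dimensions
SELECTIVE = (0.5, 0.0, 0.5, 0.0)   # hypothetical: only dimensions 1 and 3 attended


def p_a(name, weights, c=4.0):
    """P(category A) for a stored item under attention-weighted similarity."""
    x = ITEMS[name]
    totals = {"A": 0.0, "B": 0.0}
    for label, y in ITEMS.items():
        # Attention-weighted city-block distance between the two items.
        distance = sum(w * abs(a - b) for w, a, b in zip(weights, x, y))
        totals[label[0]] += math.exp(-c * distance)
    return totals["A"] / (totals["A"] + totals["B"])


if __name__ == "__main__":
    for name in ("A4", "B1"):
        print(name,
              "equal:", round(p_a(name, EQUAL), 3),
              "selective:", round(p_a(name, SELECTIVE), 3))
```

The choice of dimensions 1 and 3 for the selective profile is arbitrary; other profiles would merge different pairs of items.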

In a series of experiments, we showed that the surface feature instantiation and feedback type can significantly affect the representations acquired during category learning of discrete multidimensional stimuli. These results highlight the importance of factors other than the underlying category structure in guiding what is learned during category learning, and indicate that these factors should be incorporated into future experimental work and theoretical development.

Data availability:

Data may be obtained upon request to the first author (Zhiya Liu).

Author contributions:

Zhiya Liu: Conceptualization; Formal analysis; Funding acquisition; Investigation; Methodology; Resources; Supervision; Visualization; Writing – original draft; Writing – review & editing.

Hao Chen: Investigation; Methodology; Visualization; Writing – original draft.

Jianru Feng: Investigation; Methodology; Visualization; Writing – original draft.

Carol A. Seger: Formal analysis; Methodology; Visualization; Writing – original draft; Writing – review & editing.

Funding sources:

The MOE Project of Key Research Institute of Humanities and Social Sciences in Universities (16JJD880025)

Conflict of interest:

The authors declare no conflicts of interest.

Ashby, F. G., Alfonso-Reese, L. A., Turken, A. U., & Waldron, E. M. (1998). A neuropsychological theory of multiple systems in category learning. Psychological Review, 105(3), 442–481. DOI: 10.1037/0033-295X.105.3.442
Ashby, F. G. (2019). State-trace analysis misinterpreted and misapplied: Reply to Stephens, Matzke, and Hayes (2019). Journal of Mathematical Psychology, 91, 195–200. DOI: 10.1016/j.jmp.2019.07.001
Ashby, F. G., & Maddox, W. T. (2005). Human category learning. Annual Review of Psychology, 56, 149–178. DOI: 10.1146/annurev.psych.56.091103.070217
Ashby, F. G., & O’Brien, J. B. (2007). The effects of positive versus negative feedback on information-integration category learning. Perception & Psychophysics, 69, 865–878. DOI: 10.3758/BF03193923
Ashby, F. G., Paul, E. J., & Maddox, W. T. (2011). COVIS. In E. M. Pothos & A. J. Wills (Eds.), Formal Approaches in Categorization (pp. 65–87). Cambridge University Press. DOI: 10.1017/CBO9780511921322.004
Ashby, F. G., & Rosedahl, L. (2017). A neural interpretation of exemplar theory. Psychological Review, 124(4), 472–482. DOI: 10.1037/rev0000064
Blair, M., & Homa, D. (2003). As easy to memorize as they are to classify: The 5-4 categories and the category advantage. Memory & Cognition, 31(8), 1293–1301. DOI: 10.3758/BF03195812
Bowman, C. R., Iwashita, T., & Zeithamova, D. (2020). Tracking prototype and exemplar representations in the brain across learning. ELife, 9, e59360. DOI: 10.7554/eLife.59360
Bowman, C. R., & Zeithamova, D. (2020). Training set coherence and set size effects on concept generalization and recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition. DOI: 10.1037/xlm0000824
Braunlich, K., & Love, B. C. (2019). Occipitotemporal representations reflect individual differences in conceptual knowledge. Journal of Experimental Psychology: General, 148(7), 1192–1203. DOI: 10.1037/xge0000501
Davis, T., Love, B. C., & Preston, A. R. (2011). Learning the exception to the rule: Model-based FMRI reveals specialized representations for surprising category members. Cerebral Cortex, 22, 260–273. DOI: 10.1093/cercor/bhr036
Freedberg, M., Glass, B., Filoteo, J. V., Hazeltine, E., & Maddox, W. T. (2017). Comparing the effects of positive and negative feedback in information-integration category learning. Memory & Cognition, 45, 12–25. DOI: 10.3758/s13421-016-0638-3
Johansen, M. K., & Palmeri, T. J. (2002). Are there representational shifts during category learning? Cognitive Psychology, 45, 482–553. DOI: 10.1016/S0010-0285(02)00505-4
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22–44. DOI: 10.1037/0033-295X.99.1.22
Liu, Z. Y., Zhang, Y., Ma, D., Xu, Q., & Seger, C. A. (2020). Differing effects of gain and loss feedback on rule-based and information-integration category learning. Psychonomic Bulletin & Review, 28(6). DOI: 10.3758/s13423-020-01816-6
Liu, Z. Y., Song, X. H., & Seger, C. A. (2012a). Six-year-old children's ability on category learning: Category representation, attention and learning strategy. Acta Psychologica Sinica (Chinese), 44, 634–646. DOI: 10.3724/SP.J.1041.2012.00634
Liu, Z. Y., Huang, Y. L., & Seger, C. A. (2012b). The expectation effect of the sample size in category learning. Acta Psychologica Sinica (Chinese), 44(6), 754–765. DOI: 10.3724/SP.J.1041.2012.00754
Mack, M. L., Preston, A. R., & Love, B. C. (2013). Decoding the brain’s algorithm for categorization from its neural implementation. Current Biology, 23. DOI: 10.1016/j.cub.2013.08.035
Mack, M. L., Love, B. C., & Preston, A. R. (2016). Dynamic updating of hippocampal object representations reflects new conceptual knowledge. Proceedings of the National Academy of Sciences (PNAS). DOI: 10.1073/pnas.1614048113
Mack, M. L., Preston, A. R., & Love, B. C. (2020). Ventromedial prefrontal cortex compression during concept learning. Nature Communications. DOI: 10.1038/s41467-019-13930-8
Maddox, W. T., Ashby, F. G., & Bohil, C. J. (2003). Delayed feedback effects on rule-based and information-integration category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29(4), 650–662. DOI: 10.1037/0278-7393.29.4.650
Markman, A. B., & Maddox, W. T. (2003). Classification of exemplars with single- and multiple-feature manifestations: The effects of relevant dimension variation and category structure. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29(1), 107–117. DOI: 10.1037/0278-7393.29.1.107
McKinley, S. C., & Nosofsky, R. M. (1995). Investigations of exemplar and decision bound models in large, ill-defined category structures. Journal of Experimental Psychology: Human Perception and Performance, 21, 128–148. DOI: 10.1037/0096-1523.21.1.128
Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207–238. DOI: 10.1037/0033-295X.85.3.207
Medin, D. L., Wattenmaker, W. D., & Hampson, S. E. (1987). Family resemblance, conceptual cohesiveness, and category construction. Cognitive Psychology, 19(2), 242–279. DOI: 10.1016/0010-0285(87)90012-0
Minda, J. P., & Smith, J. D. (2001). Prototypes in category learning: The effects of category size, category structure, and stimulus complexity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 775–799. DOI: 10.1037/0278-7393.27.3.775
Minda, J. P., & Smith, J. D. (2002). Comparing prototype-based and exemplar-based accounts of category learning and attentional allocation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 275–292. DOI: 10.1037/0278-7393.28.2.275
Mok, R., & Love, B. C. (2019). A non-spatial account of place and grid cells based on clustering models of concept learning. Nature Communications. DOI: 10.1038/s41467-019-13760-8
Nosofsky, R. M. (1984). Choice, similarity, and the context theory of classification. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 104–114. DOI: 10.1037/0278-7393.10.1.104
Nosofsky, R. M. (1992). Exemplars, prototypes, and similarity rules. In A. F. Healy, S. M. Kosslyn, & R. M. Shiffrin (Eds.), From learning theory to connectionist theory: Essays in honor of William K. Estes (pp. 149–167). Hillsdale, NJ: Erlbaum.
Nosofsky, R. M., Sanders, C. A., & McDaniel, M. A. (2018). Tests of an Exemplar-Memory model of classification learning in a High-Dimensional Natural-Science category domain. Journal of Experimental Psychology: General, 147(3), 328–353. DOI: 10.1037/xge0000369
Nosofsky, R. M., & Zaki, S. R. (2002). Exemplar and prototype models revisited: Response strategies, selective attention, and stimulus generalization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 924–940. DOI: 10.1037/0278-7393.28.5.924
Rehder, B., & Hoffman, A. (2005). Thirty-something categorization results explained: Attention, eyetracking, and models of category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 811–829. DOI: 10.1037/0278-7393.31.5.811
Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7(4), 573–605. DOI: 10.1016/0010-0285(75)90024-9
Rosedahl, L., Eckstein, M., & Ashby, F. G. (2018). Retinal-specific category learning. Nature Human Behaviour, 2. DOI: 10.1038/s41562-018-0370-z
Seger, C. A., & Miller, E. K. (2010). Category learning in the brain. Annual Review of Neuroscience, 33, 203–219. DOI: 10.1146/annurev.neuro.051508.135546
Shepard, R. N., Hovland, C. I., & Jenkins, H. M. (1961). Learning and memorization of classifications. Psychological Monographs: General and Applied, 75(13), 1–42. DOI: 10.1037/h0093825
Smith, J. D., & Minda, J. P. (2000). Thirty categorization results in search of a model. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 3–27. DOI: 10.1037//0278-7393.26.1.3
Worthy, D. A., Markman, A. B., & Maddox, W. T. (2013). Feedback and stimulus-offset timing effects in perceptual category learning. Brain & Cognition, 81(2), 283–293. DOI: 10.1016/j.bandc.2012.11.006
Xing, Q., Sun, H. L., & Che, J. S. (2015). Effect of feedback nature on family resemblance category learning. Psychological Exploration (in Chinese), 35, 222–227.
Xing, Q., Sun, H., & Che, J. (2018). Effect of feedback value on family resemble category learning: An ERPs study. Studies of Psychology and Behavior (in Chinese), 16(3), 300–307.
Zaki, S. R., Nosofsky, R. M., Stanton, R. D., & Cohen, A. L. (2003). Prototype and exemplar accounts of category learning and attentional allocation: A reassessment. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1160–1173. DOI: 10.1037/0278-7393.29.6.1160
Zettersten, M., & Lupyan, G. (2019). Finding categories through words: More nameable features improve category learning. Cognition, 196, 104135. DOI: 10.1016/j.cognition.2019.104135

The Supplementary Materials are not available with this version.
