Species variety in an unbalanced market. The tagging of legume-based ingredients from the dataset led to the identification of 32 different species, including soy (presented in Table S1). This variety is a rather unexpected observation and, at first glance, could be considered an encouraging result in terms of biodiversity. Yet Figure 1 shows a highly asymmetrical distribution of the identified species. Products for which only soy was identified in the ingredient list account for 72% of all products. Conversely, products for which only NSL ingredients (one or more) were identified represent 19% of all products. The remaining 9% are products containing both soy and NSL-based ingredients. There are nearly 4 times more product launches containing soy-based ingredients than product launches containing NSL-based ingredients.
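As an illustration of how the three shares above can be derived, the following minimal Python sketch classifies each product from two presence flags. The function names and the toy records are assumptions made for illustration; they are not the pipeline or the data used in this study.

```python
from collections import Counter


def classify(has_soy: bool, has_nsl: bool) -> str:
    """Assign a product to one category from its two presence flags."""
    if has_soy and has_nsl:
        return "both"
    if has_soy:
        return "soy_only"
    if has_nsl:
        return "nsl_only"
    return "none"


def category_shares(products):
    """products: iterable of (has_soy, has_nsl) pairs; returns shares in percent."""
    counts = Counter(classify(soy, nsl) for soy, nsl in products)
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


# Toy records (hypothetical, NOT corpus data) built to mirror the reported proportions.
toy_products = [(True, False)] * 72 + [(False, True)] * 19 + [(True, True)] * 9
shares = category_shares(toy_products)
soy_only_to_nsl_only = shares["soy_only"] / shares["nsl_only"]
```

With these proportions, soy-only products outnumber NSL-only products by a factor of 72/19, i.e. about 3.8.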
The analysis of the frequencies of the tagged species (see Table S1 in the appendices) according to their position confirms this highly unbalanced market in favor of a few species - primarily soy, and pea among pulses. More precisely, a quartet of species, Pisum sativum L., Phaseolus vulgaris L., Cicer arietinum L. and Lens culinaris Medik., accounts for almost 80% of NSL ingredients, while around twenty NSL species have a frequency of appearance below 1% among products containing NSL species. However, products with NSL ingredients more frequently mention the NSL species in the product description (on the packaging) than products with soy, for which this frequency is only 4%. For instance, 70% of products containing Lens culinaris Medik. mention the Lens species in the product description.
This unbalanced situation between soy and NSL-based ingredients could be linked to economic factors, such as the availability of each species for agri-food companies. According to the FAO[1], soy remains the most cultivated legume in the world, with an annual production of more than 300 million tons over the last decade. This is 3 times higher than the total production of the most frequent NSL species mentioned above. These observations first confirm the existence of a strong “technological lock-in” around one legume species - soy - which is used for food worldwide. The dominance of this species could be analyzed as a structural trend in agri-food markets29,32.
This dominance of soy is observed across all market segments (Figure 2), with the exception of the “Spreads” and “Fruits & Vegetables” segments, where the gap between products containing soy-based ingredients and products containing NSL-based ingredients is almost negligible or even reversed, respectively. However, the growth rates of products containing soy-based and NSL-based ingredients lead us to several observations. While soy dominates in terms of volume, the cumulative growth rate of products containing NSL-based ingredients is much higher than that of products containing soy, whatever the market segment considered (see also Table S1). Some products containing NSL-based ingredients experienced a very high cumulative growth rate, particularly in the “Dairy” segment, where it was almost 12 times higher than that of soy-based products. It was almost 9 times higher in the “Desserts” segment and 7 times higher in the “Breakfast” segment. More generally, these observations point to a growing interest of agri-food companies in NSL-based products30. Such growing interest, if confirmed over time, could favor greater biodiversity of legume species used in the food supply.
Figure 2 - Shares of products containing soy-based ingredients or NSL-based ingredients in each market segment (%). The percentages may sum to more than 100% because some products contain both soy and NSL-based ingredients. The color intensity reflects the cumulative growth of product launches in the market segment for soy and NSL over the decade. The market segment categories are those established in Mintel-GNPD and detailed in another work32.
[Here comes Table 1]
Europe presents the least unbalanced market between soy and NSL ingredients. Differences between products containing soy-based ingredients and those containing NSL-based ingredients are also more significant when we observe the share of these two categories in each main geographical area of our corpus (Figure 3).
Figure 3 - Soy and NSL-based products in main geographic areas covered by the corpus.
Soy is dominant in every geographical area. Let us now consider three groups of geographic areas. The first encompasses areas where the shares of products containing soy-based ingredients and NSL-based ingredients are almost balanced: this is the case for Europe and Southern Asia, where around 40% of product launches contain NSL-based ingredients. The second gathers areas where the share of products containing soy-based ingredients is extremely dominant (around 80%): this group mainly concerns Asian markets (except the Southern Asian market) and the North and South American markets. The third group includes areas where the share of products containing soy-based ingredients is less dominant than in the second group but still represents the majority (over 60% and under 77%). This last group concerns markets that are smaller than the other areas and for which Mintel-GNPD does not have full coverage, as is the case for Africa.
The interpretation of such differences between geographical areas is probably multi-factorial. The structuring of the different markets may reflect differences in food cultures. For instance, soy products dominate Asian markets, with the exception of the Southern Asian market, which includes India, a country where the production and consumption of pulses (particularly lentils) are among the highest in the world35,36. Such differences can also be interpreted as the consequences of the different national or international public support schemes for pulse consumption, as is the case for Europe37. Still regarding the European area, this quasi-balance between soy and NSL-based food products could indicate a shift in the technological lock-in that European countries have encountered until now30–32, which would benefit the biodiversification of the processed food supply in the near future.
More generally, the overall structure of the corpus, whether in terms of market segments or geographical areas, reveals that a small number of species account for the bulk of legume-based food innovation production. This concentration stands in the way of greater biodiversity in the food markets. The more we use a small number of species to produce a larger and growing variety of foods (soy is present in all market segments), the less room there is for the development of other species. This situation, partly resulting from historical and economic factors that led to a lock-in situation, could undergo contemporary changes. But to confirm an actual possible shift that would favor the use of NSL species in food markets, we also need to look at the ways these species are used in product formulations.
Product-context use of legumes: introducing the importance of the ingredient. To go further in assessing the biodiversity of legume species used in product launches, we looked at the ‘product-context of use’ of these species. In terms of biodiversity, we assumed that it could be misleading to put on the same level food products that use these species for different reasons, which we do not know. Notably, from the point of view of product formulation, we may consider that the functional properties (technological, organoleptic, nutritional, etc.) derived from the parts of the species used matter more than the species itself. We therefore propose to approach what we call the ‘product-context of use’ by jointly analyzing the different positions at which species appear in ingredient lists.
A good starting point is to examine where the identified species appear in ingredient lists. Regulations require ingredients to be listed in descending order of importance, with the first one weighing the most and the last one the least. Hence, we assume that a species used for only a few of its functionalities (i.e. one that has undergone a treatment process aimed at extracting one or another of its parts, such as peptides, starches, a gelling compound, etc.) is more likely to be found among the least important ingredients of an ingredient list (i.e. those weighing the least). This approach can be further refined by assessing whether the species identified in the food products are part of the marketing pitch. We assume that the mention of the species on the product packaging (in addition to its appearance in the ingredient list) gives higher specificity to the species used, as it is positively associated with the identity of the product. From this point of view, differences between soy-based ingredients and NSL-based ingredients are quite striking.
[Here comes Table 2]
Table 2 reports the mean position of the soy and NSL-based ingredients according to the ingredient list length, grouped in deciles. We observe that about half (52%) of all products containing NSL-based ingredients are concentrated within the first four deciles, with the first decile alone gathering almost 20% of the products containing NSL-based ingredients. For products containing soy-based ingredients, this threshold is reached only from the 6th decile upwards. On the other hand, unlike products containing NSL-based ingredients, the decile distribution of products containing soy-based ingredients seems relatively balanced, which would support the idea of flexible uses of this species.
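The grouping behind Table 2 can be sketched as follows, assuming each product is summarized by the length of its ingredient list and the 1-based position of its legume ingredient. The decile cut points are computed with Python's statistics.quantiles; the function name, the choice of the "inclusive" method and the toy records are illustrative assumptions, not the procedure actually used to build the table.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean, quantiles


def mean_position_by_decile(records):
    """records: list of (list_length, position) pairs.

    Products are binned into deciles of ingredient-list length, and the
    mean position of the legume ingredient is returned for each decile.
    """
    lengths = [length for length, _ in records]
    # Nine cut points splitting the observed list lengths into ten groups.
    cuts = quantiles(lengths, n=10, method="inclusive")
    groups = defaultdict(list)
    for length, position in records:
        decile = bisect_left(cuts, length) + 1  # 1 = shortest lists
        groups[decile].append(position)
    return {decile: mean(positions) for decile, positions in sorted(groups.items())}


# Toy records (hypothetical): lists of 1 to 20 ingredients, with the legume
# ingredient placed halfway down each list.
toy_records = [(length, (length + 1) // 2) for length in range(1, 21)]
table = mean_position_by_decile(toy_records)
```

Running the same function separately on soy and NSL records would give the two mean-position columns to compare decile by decile.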
More generally, soy-based ingredients tend to appear more frequently in food products with complex formulations (i.e., longer ingredient lists), and almost systematically at a higher rank, i.e. later in the list (column ‘Soy ingr. mean position’ in Table 2), than NSL-based ingredients. In all the deciles except the last three, the mean position of NSL-based ingredients is lower than that of soy-based ingredients. This means that NSL-based ingredients tend to appear earlier in ingredient lists, suggesting that the amount used in product formulations is probably larger than for soy-based ingredients. This result could be explained by the fact that soy has been much more widely studied than pulses in the field of Food Sciences & Technology, particularly during the last decade38. Research and development in this domain led to a broader knowledge base on the various uses and functionalities of soy, compared to all other pulses/NSL. In view of this, our results could confirm that soy use is associated with a larger array of functional ingredients than NSL-based ingredients. The likelihood of finding soy-based ingredients used as additives in product formulation is then probably higher than for NSL-based ingredients, whose position is most often among the top ingredients of ingredient lists. Thus, we assume that NSL-based ingredients are probably more frequently used as specific components of the product formulation.
We then considered only the top five ingredients (“InTop” in Figure 4) of depth 1 in the ingredient lists. This allowed us to eliminate most ingredients that are part of the composition of higher-depth ingredient lists (i.e., ingredients written between brackets). This selection criterion does not completely change the overall structure of the corpus. We observe here the same general trend. Soy-based products still account for almost twice (n=93,359) the number of NSL-based products (n=53,161). The latter still tend to appear more among the top 5 ingredients (69% to 78%), while soy-based ingredients appear more in the lower part of ingredient lists.
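The depth-1 selection described above can be illustrated with a minimal Python sketch. It assumes that ingredient lists are comma-separated strings in which sub-ingredients are written between round or square brackets; the function names, the substring matching and the example list are hypothetical and are not taken from the corpus or from the processing chain used in this study.

```python
def top_level_ingredients(ingredient_list: str) -> list[str]:
    """Split on commas outside brackets and drop bracketed sub-lists.

    A nesting level of 0 in this code corresponds to depth 1 in the text.
    """
    items, current, nesting = [], [], 0
    for ch in ingredient_list:
        if ch in "([":
            nesting += 1
        elif ch in ")]":
            nesting = max(nesting - 1, 0)
        elif ch == "," and nesting == 0:
            items.append("".join(current).strip())
            current = []
        elif nesting == 0:
            current.append(ch)
    items.append("".join(current).strip())
    return [item for item in items if item]


def in_top(ingredient_list: str, keyword: str, n: int = 5) -> bool:
    """True if the keyword occurs in one of the first n depth-1 ingredients."""
    top = top_level_ingredients(ingredient_list)[:n]
    return any(keyword in item.lower() for item in top)


# Hypothetical ingredient list, for illustration only.
example = ("Water, chickpeas (32%), sunflower oil, "
           "tahini (sesame seeds, soy lecithin), salt, lemon juice")
```

In this example, "chickpeas" counts as a top-five ingredient, whereas "soy lecithin" is ignored because it only appears inside the bracketed composition of another ingredient.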
Based on this new criterion, we refined our analysis by classifying species according to their frequency of appearance among the top five ingredients and on product packaging. Figure 4 presents the result of a k-means clustering performed on these two dimensions, for species present in at least one hundred product launches. The 5 resulting groups are identified by categorical colors, and species are displayed in a three-dimensional space showing their frequency of appearance (in percent) among the top 5 ingredients, among the remaining ingredients, and on the product packaging. In addition, to help interpret the results of this clustering, we also examined the most frequent ingredient expressions associated with the species in each group.
Figure 4 - 3D Scatter plot of most frequent legume species. Each species is plotted in the 3D graph according to its frequency of appearance in the top five ingredients (InTop%), in the remaining ingredients (InRemList %), and in the product description (InDesc %). Each color represents a cluster resulting from the k-means clustering (5 clusters were requested, according to results provided by the silhouette coefficients method39).
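To make the silhouette criterion more concrete, the sketch below computes the mean silhouette coefficient of a labelling and uses it to compare candidate numbers of clusters. It does not reimplement k-means: the candidate labellings are assumed to come from any k-means routine. The function names, the toy coordinates (InTop %, InRemList %, InDesc %) and the labellings are hypothetical and do not reproduce the data or the result shown in Figure 4.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)  # singleton cluster, or only one cluster
            continue
        a = sum(own) / len(own)                           # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in dists.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points, labelings):
    """labelings: {k: labels}; returns the k with the highest mean silhouette."""
    return max(labelings, key=lambda k: silhouette(points, labelings[k]))


# Hypothetical species coordinates: (InTop %, InRemList %, InDesc %).
toy_points = [(80, 20, 70), (78, 22, 65), (20, 80, 5),
              (25, 75, 8), (50, 50, 30), (52, 48, 28)]
# Hypothetical labellings, as a k-means run could return for k=2 and k=3.
toy_labelings = {2: [0, 0, 1, 1, 0, 0], 3: [0, 0, 1, 1, 2, 2]}
chosen_k = best_k(toy_points, toy_labelings)
```

On these toy coordinates, the three-cluster labelling obtains the higher mean silhouette, so it would be retained over the two-cluster one.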
A central axis structures the cluster distribution in Figure 4. This axis distinguishes the species mostly found among the top 5 ingredients and frequently in product descriptions (the green cluster) from those found more frequently among the remaining ingredients and rarely cited in product descriptions (the blue cluster). More precisely, at one end of this axis we find a set of 6 species highlighted in green (Phaseolus coccineus L., Cajanus cajan L., Lens culinaris Medik., Phaseolus vulgaris L., Cicer arietinum L. and Vicia faba L.), characterized by a high frequency of appearance among the first ingredients and a high rate of mentions in product descriptions. These features lead us to suggest that product identity is more closely associated with those NSL species, whatever their functional use. In that sense, species from this group could have a more positive impact on market biodiversity, as they are of key interest for the food industry, in comparison to species used only for a functional interest, which could therefore be substituted by another species. The most frequent ingredients associated with species from this group do not seem to indicate a fractionated use. For example, in the case of Lens culinaris Medik., the most common ingredients mentioning this species refer to it directly by its vernacular name, without mentioning specific parts (“lentils”, n=2522; “red lentils”, n=1250; “green lentils”, n=818). When this ingredient is associated with a processing term, the most frequent one is milling (“lentil flour”, n=1195). The same is true for Cicer arietinum L. (“chickpeas”, n=7467; “chickpea flour”, n=3295). This cluster thus gathers ingredients that seldom undergo processing.
At the opposite end of this central axis, plotted in blue, we find a group of 4 species, including soy (Canavalia gladiata Jacq., Pachyrhizus erosus L., Dolichos lablab L. and Glycine max L.). They present a reversed profile: a low frequency among first ingredients and in product descriptions. This cluster could also include Ceratonia siliqua L. (plotted in purple), which has been identified as a cluster in its own right due to its extreme behavior: it is hardly ever mentioned, either in product descriptions or among first ingredients. In this group, for the two most frequent species, Glycine max L. and Ceratonia siliqua L., the ingredients most frequently associated correspond to fractionated uses (“soybean oil”, n=58584; “soy lecithin”, n=50490; “soy protein”, n=18569; “locust bean gum”, n=4903; “carob bean gum”, n=2242).
This axis, which contrasts species according to their frequency of appearance (within the top 5 ingredients and in the product description), may also tend to oppose different product-contexts of use of species. This brings us back to our initial hypothesis: the more frequently a species is used in a fractionated way, the more likely it is to be found among ingredients of lesser importance (in terms of volume and therefore rank and level) in ingredient lists, and the less prominence it is given on product packaging.
Hence, the case of the median cluster (plotted in red in Figure 4) is very interesting. Here we find species characterized by a balanced score between their frequency of appearance among the top and remaining ingredients, but not systematically mentioned in product descriptions (Pisum sativum L., Vigna unguiculata L., Vigna angularis L., Vigna radiata L., Lupinus angustifolius L.). According to our main hypothesis, this median position could reveal various strategies of the food industry for those species, which could become more “identity” or more “generic” species depending on their future uses. In other words, those species are sometimes used as key components of product formulation, and sometimes not. The analysis of the most frequent ingredients referring to the main species of this group seems to substantiate this observation. For Pisum sativum L., the two most common ingredients are “peas” (n=10,132) and “pea protein” (n=6,423), and for Lupinus angustifolius L., the most common ingredient is “lupin flour” (n=1,095).
Finally, a group of three species (plotted in orange in Figure 4), made up of Vigna aconitifolia Jacq., Vigna mungo L. and Phaseolus acutifolius L., seems to be opposed to the first group described (plotted in green) due to the lower propensity of these species to be cited in product descriptions. These “discreet” species have a very low frequency of appearance, but the analysis of the most cited ingredients related to them brings them closer to the first group. For example, the most common ingredients referring to Vigna mungo L. mention the species by its vernacular name (“black gram lentils”, n=1412; “black lentils”, n=114), and when a process is mentioned, in most cases it concerns flours, the product resulting from grinding, possibly coupled with sieving (“black lentil flour”, n=56). We observe the same thing for Phaseolus acutifolius L. (“tepary bean flour”, n=128; “tepary beans”, n=25).