Sample Selection
We evaluate the quality of offset credits from the methodologies with the largest number of cookstove offset project activities on the VCM: GS-TPDDTEC, GS-Simplified, CDM-AMS-II-G, and CDM-AMS-I-E. We also review GS’s new Methodology for Metered & Measured Cooking Energy Devices (GS-Metered), released in October 2021.
The methodologies deploy different project stoves. GS-TPDDTEC (previously GS’s Methodology for Improved Cook-Stoves and Kitchen Regimes) is the most versatile methodology, covering any thermal domestic technology switch that is less GHG intensive, including but not limited to improved biomass, heat retention, solar, LPG, and electric stoves. CDM-AMS-I-E replaces non-renewable biomass with renewable energy (e.g., renewable biomass, biogas, bioethanol, electric stoves). Designed for smaller projects, GS-Simplified and CDM-AMS-II-G have limited scopes, only allowing for biomass efficiency projects (e.g., traditional fuelwood stove to an improved fuelwood stove). GS-Metered is designed for cookstoves with metered or other direct fuel monitoring (e.g., purchase records) such as electric, LPG, biogas, bioethanol, or advanced biomass pellet stoves.
Most cookstove projects are structured as Programs of Activities (PoAs), in which multiple similar project activities (called voluntary project activities (VPAs) on the VCM and component project activities (CPAs) on the CDM) are bundled together to allow for rapid replication, only requiring a quick check from a validator and not a full registration procedure75. In order to reflect the diversity of projects on the VCM, we evaluated VPAs separately. CDM methodologies are used on both the CDM and the VCM, but we limited our scope to only VCM-registered projects (i.e., those certified by GS or Verra).
We then identified the 15 countries with the most credits from cookstove projects on the market and selected the projects with the most credits from each country for each methodology. There were so few projects under AMS-I-E and GS-Metered that we selected all eligible projects that provided enough information to recreate offset credit calculations on a stove-day basis for individual stove types. For the GS-TPDDTEC, GS-Simplified, and CDM-AMS-II-G projects, we prioritized projects that posted at least one monitoring report and provided their exact calculations and the stove days. No AMS-I-E and GS-Metered projects had issued credits; however, we still decided to include all listed projects under these two methodologies because of the importance of these methodologies, in terms of offering other methods for monitoring stove usage and fuel consumption, and because of the greater potential emissions reductions and health benefits from fuel switch projects that these protocols accommodate. We also ensured that our sample covered all types of fuel transitions, except for electric stoves. There were no issued projects actively deploying an electric stove, and the only listed electric project under GS Metered had no files available.
This approach resulted in 36 projects spanning 22 countries, accounting for 37% of all issued credits from these cookstove methodologies on the VCM (as of November 9th, 2022). Our sample contains seven methodology-project type combinations: (1) GS-Firewood, (2) GS-Charcoal, (3) GS-LPG, (4) CDM-AMS-II-G-Firewood, (5) CDM-AMS-II-G-Charcoal, (6) CDM-AMS-I-E-Ethanol, and (7) GS-Metered-Pellet (WHO Tier 4+ Biomass Pellet Stove). We have no reason to believe that these projects are not representative of the entire pool of cookstove credits on the VCM. The 20 GS standard projects represent 42% of the GS credits on the VCM. The 12 AMS-II-G projects contain 23% of that methodology’s credits. The 4 AMS-I-E projects did not have any issued credits as of March 31st, 2022 (when we selected the sample), and the only AMS-I-E projects that had been issued credits were also credited under AMS-I-I.
Factors affecting cookstove offset quality
Due to the nature of this analysis, the methods we chose to utilize in our over/under crediting analysis are also our results and inform our recommendations. Thus, our methods are summarized in the main text. Here, in the expanded methods section, we include a full standalone explanation and justification for the methods we used to estimate the climate benefit of cookstoves offset projects and our recommendations for amending the cookstoves methodologies.
Note on uncertainty
Quantification of emission reductions from offset programs is inherently uncertain. Emission reductions must be estimated against an immeasurable counterfactual scenario. Other factors, notably the fraction of non-renewable biomass (fNRB), upstream emissions, and leakage, are also difficult to measure and, with limited research to date, involve substantial uncertainty. Since offset credits are often used to “offset” or trade with direct emissions reductions, offset programs are tasked with estimating program impacts conservatively when there is uncertainty, in order to maintain the integrity of an emissions reduction claim. Here, conservative means more likely to under-credit than to over-credit. Our analysis uses the most rigorous and up-to-date values from the literature when available (e.g., fNRB). Rather than choosing conservative methods for all factors, we make no or only minimal corrections to factors with little published research, notably additionality, leakage, and fuel consumption.
Note on methodology versions
All methodologies, except for recently released GS-Metered, have undergone considerable updates over the years of credit generation that affect the methodological factors we study. Our recommendations and discussion below focus on the most recent version of each methodology and any updates proposed by the registry. However, most credits on the VCM including those still available for purchase are issued under previous methodology versions. Therefore, our quantitative over/under crediting analysis assesses the credits generated regardless of the methodology version used. We note in the main text and detail below where updated methodologies address over-crediting.
Adoption, usage, and stacking rates
Definition: Efficient cookstove projects reduce emissions to the extent that users: (1) “adopt” a more efficient project stove, defined as the % of distributed stoves actually in use; (2) use the project stove, where “usage” is defined as the % of meals cooked using the project stove; and (3) stop or reduce “stacking,” defined as the % of meals cooked using the baseline stove(s) in concert with the project stove. These rates determine the change between pre- and post-project fuel use.
We note that with these definitions, usage and stacking do not sum to 100%, as a household can use the project stove for every meal (usage=100%) and simultaneously use the baseline stove for every meal (stacking=100%). These rates determine the change in biomass or alternative fuel use induced by the project. Methodologies define these rates differently; most refer to adoption as “usage,” and some incorrectly include stacking as a source of leakage (see Box 1). In this paper and analysis, we always use the terms as defined above for clarity and consistency. Projects typically report a % of baseline stove use, which is then incorporated into the fuel consumption calculation. Thus, we use the project’s documentation to fit their reporting to our definition of stacking. For example, we would remove the deduction from fuel consumption or leakage. We then exactly replicate their calculated emission reductions using our definitions before implementing our adjustment.
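To make the definitions concrete, the following minimal sketch computes usage and stacking from a hypothetical meal log for a single household. The `Meal` record and the `usage_and_stacking` helper are illustrative names, not part of any methodology. The sketch assumes that stacking counts every meal in which the baseline stove was used. Both rates are computed over the same set of meals, so they need not sum to 100%.

```python
from dataclasses import dataclass


@dataclass
class Meal:
    """One cooking event in a hypothetical household meal log."""
    used_project_stove: bool
    used_baseline_stove: bool


def usage_and_stacking(meals):
    """Return (usage, stacking) as fractions of all meals cooked.

    usage    = share of meals cooked with the project stove
    stacking = share of meals in which the baseline stove was used
    """
    if not meals:
        raise ValueError("meal log is empty")
    n = len(meals)
    usage = sum(m.used_project_stove for m in meals) / n
    stacking = sum(m.used_baseline_stove for m in meals) / n
    return usage, stacking


# Hypothetical log: two meals on both stoves, one project-only, one baseline-only.
log = [Meal(True, True), Meal(True, True), Meal(True, False), Meal(False, True)]
usage, stacking = usage_and_stacking(log)
print(f"usage={usage:.0%}, stacking={stacking:.0%}")  # usage=75%, stacking=75%
```

In this made-up log the two rates sum to 150%, which is why a reported share of baseline stove use cannot be read as the complement of project stove usage.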
Approaches: Methods for monitoring adoption, usage, and stacking fall into three categories. AMS-I-E, GS-Simplified, and AMS-II-G track them through short cross-sectional surveys. GS-TPDDTEC requires in-field, multi-day kitchen performance tests (KPTs) for a sample of households, capturing both usage and stacking rates by directly measuring daily fuel usage. Results are then applied to the full set of project households, according to their surveyed use of the stove (adoption). GS-Metered uses the most robust approach, directly tracking project stove and fuel use in all participating households through meters or fuel sales data.
Weaknesses in the methods used by cookstoves offset projects: Default surveys for all methodologies, commonly used by projects, are infrequent, simplistic, and vulnerable to social desirability27–30 and recall30,31 biases.
The various default surveys ask 14-40 questions, which also cover general information, leakage, and end-user training. Only 1-2 questions are explicitly identified to calculate adoption, usage, and/or stacking rates. All sample surveys ask specifically about the number of times the stoves were used over a time period, without the context of how many total cooking events were conducted, the nature of those cooking events (e.g., tea, dinner), etc. For example, AMS-II-G’s default survey simply asks households if they used the improved stove in the last week or month. Credits are generated for all households that reply “yes” as if they used the stove 100% of the time for the entire crediting period, typically a year or two. Stacking is estimated separately. AMS-II-G surveys stacking through two questions: “Do you use your traditional (baseline) cookstove also?” and, if yes, “How many meals did you prepare using the traditional (baseline) cookstove last week or last month?” The answers to these questions are then used to discount the amount of biomass saved from the project (see supplemental).
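The arithmetic consequence of crediting every “yes” household at 100% use can be sketched as follows. All numbers are made up for illustration, and the function names are hypothetical; the sketch only contrasts stove-days credited from a binary answer with stove-days implied by each household's actual share of days in use.

```python
def credited_stove_days(survey_yes, period_days):
    """Stove-days credited when every 'yes' household counts as 100% use."""
    return sum(1 for yes in survey_yes if yes) * period_days


def usage_weighted_stove_days(usage_fractions, period_days):
    """Stove-days implied by each household's share of days in use."""
    return sum(usage_fractions) * period_days


# Hypothetical five-household sample over a 365-day crediting period.
survey_yes = [True, True, True, True, False]
usage_fractions = [0.9, 0.5, 0.2, 0.1, 0.0]  # made-up values, not measured data

credited = credited_stove_days(survey_yes, 365)
weighted = usage_weighted_stove_days(usage_fractions, 365)
print(credited, round(weighted, 1))
```

With these assumed values, the binary survey credits more than twice the stove-days that the usage-weighted count supports, even though four of five households truthfully answered “yes.”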
The current approach to use surveys to obtain these values can lead to inaccurate results. For example, households often only use an LPG stove to make tea in the mornings. These households would answer “yes” to the question, “have you used the stove regularly since you installed it?” even though this is not the same level of usage as preparing an entire meal. The survey would capture the use of the baseline stove but equating a meal of chai and porridge in the morning to heavier meals will not result in accurate estimates of fuel reductions. All default surveys ask three to four further questions about adoption/usage/stacking, but the answers are commonly ignored in emissions reduction calculations. GS’s suggested surveys are more comprehensive than CDM’s; however, they still do not collect the information required to estimate stove adoption, usage, and stacking rates (see supplemental for full depiction of each methodology’s survey). GS-Metered provides a default survey; however, this is not required as surveying is only one option to determine the baseline or leakage (see Box 1).
Social desirability bias has been well documented in survey methods27–30, whereby participants provide responses (in this case inflated adoption and usage rates) which they believe the surveyors (hired by the cookstove project developer) want to hear. Social desirability bias is more likely when the data collection is a one-off (cross-sectional), simple survey, rather than more intensive monitoring that can build rapport and more definitively confirm the household’s long-term stove use (i.e., it is easier to overstate use once than to overstate it consistently)76,77. Research has established further methods to decrease social desirability bias (e.g., asking about specific days, using multiple carefully crafted questions, and triangulating with photos), yet none of these methods are required by the methodologies (see section on specific survey recommendations).
Projects can inadvertently increase social desirability bias through their educational and marketing campaigns as respondents are keenly aware that being a good customer means using the project stove30. Additionally, participating households may know that project developers benefit financially from high rates of adoption and usage, and so may inflate those rates especially when asked simple questions. Since project developers profit from that desirability bias, they are not motivated to minimize it.
The survey based methods commonly used are further compromised by their infrequency as households may suffer from unintentional recall bias30,32,33. For example, a household may use a project stove every day for a few weeks, abandon it for a while, and then return to it again. The household may not remember exactly how long each phase lasted, and only respond given their recent experience. The survey asks about the last week or month, but that week or month may not be indicative of the past year or next year for which their response will hold. This uncertainty could thus lead to over- or under-crediting33, and is more likely to lead to over-crediting when paired with social desirability bias30.
These biases have been documented for both cookstoves and another household-level carbon-financed technology. In the cookstove literature, studies have compared the most robust and objective monitoring (i.e., wireless sensors21, stove use monitors31, KPTs78, and purchase data30) to survey data, finding higher rates of reported stove/fuel use from the surveys. Specifically, Ramanathan et al. (2017) found that wireless monitors recorded only 25% of the usage that surveys reported21. Wilson et al. (2015) compared stove use monitors to self-reported surveys, finding that the surveys reported twice as much stove usage as the monitors31. Beyond cooking specifically, Pickering et al. (2017) evaluated a carbon-financed water filter program and found usage rates of 19%, while the project reported 81% over the same 2-3 year time period79.
KPTs, if done well, reasonably robustly capture usage and stacking, albeit with a few weaknesses. As a form of social desirability bias called the Hawthorne effect, households may change their behavior in the presence of project staff who can observe their stove choices while weighing the fuel34. Due to cost, KPTs are only required every two years on a sample of households; however, stacking and fuel quality and availability can be seasonal35, and stove use varies significantly between households. KPTs might not provide an accurate representation of stove use across the participant pool over the two-year crediting period. There are further concerns about KPTs (see fuel consumption).
Infrequent monitoring has not been shown to be indicative of the long-term mean. Studies have shown that even with objective measures (i.e., not surveys or KPTs) such as particulate and temperature sensors80 and stove use monitors81, short-term measurements (e.g., one or two random or consecutive days) did not consistently estimate the long-term mean. Research indicates that adoption and usage diminish drastically over time, both in the short term (few months to a year)82 and long term (years)37. Since estimated adoption, usage, and stacking rates are locked in until another survey (annual) or KPT (every two years) is required, this may lead to some under and over-crediting depending on the monitoring period length.
Adoption, usage, and stacking rates in the literature: The projects in our sample that use surveys report adoption and usage rates (90% and 99%, respectively) much higher than rates documented in the literature. As opposed to the simple, infrequent methodology surveys, research studies rely on highly engaged methods: a combination of KPTs, stove use monitors (SUMs), photos, and/or comprehensive longitudinal surveys (recommended when a high rate of bias is suspected76) that together can fully capture adoption, usage, and stacking. Further, the enumerators are trained to explain to respondents that they have no interest in the household’s answers, only in the truth. Two recent review studies compiled the usage and stacking rates from studies of improved and clean cookstove programs. Jeuland et al.’s review of cookstove programs found an average usage rate of 48% (minimum 16%, maximum 80%)83 across eight studies located in South America, Africa, and Asia. A review of stacking identified 11 case studies spanning Sub-Saharan Africa, Asia, and Latin America and found stacking rates ranging from 28% to 100%84, with an average of 76%. National data on stacking (i.e., beyond specific RCTs) is limited, as most national surveys or Demographic Health Surveys only collect data on the primary fuel used (although this is currently changing under WHO’s guidance). Shankar et al. argued that with access to multiple options, everybody stacks, even in high-middle income countries (e.g., using a microwave, a gas stove, and occasionally barbecuing)84.
Only one empirical study has quantified the Hawthorne effect in KPTs, although this source of bias is widely acknowledged in the cookstove literature34,85,86. Simons et al. found a 53% increase in project stove usage and 29% decrease in pre-project devices while being monitored during the KPT as compared to when stove use was monitored with sensors34. The study used a peer-reviewed algorithm to translate temperature sensor data into fuel consumption, to make it comparable to the KPT87. Cutting edge technology in the cookstove monitoring space includes a Fuel Use Electronic Logger (FUEL) which monitors household fuel consumption, air pollution, and temperature78,88,89. It has been shown to be comparable to the KPT78; however, the household was still keenly aware that they were being monitored. The FUEL requires an actively involved field team and household78. For example, the user must store all firewood in the monitor’s holder, remove it immediately before use, and only add additional wood after storing it in the holder for at least a minute. Thus, this work does not yet quantify the Hawthorne effect compared to KPTs as both still are quite intrusive approaches.
To determine a reasonable range for adoption rates, we identified and reviewed nine randomized control trials (RCTs) that implemented improved and clean stove programs across Central and South America42,44,45, Asia37,43, and Sub-Saharan Africa38–41. Our review found a minimum, average, and maximum adoption rate of 40%, 53%, and 65% respectively.
Approach: The default surveys used by offset projects have known biases because of how they are written and conducted. If adoption and usage rates were truly as high as the projects report, and stacking as low, the cookstove sector at large would not document and bemoan the barriers to effective interventions90,91. Without metering these projects to reveal the true adoption, usage, and stacking rates, and given the known weaknesses in the survey methods used by offset projects, the research studies reviewed above provide the best evidence38–41, using multiple robust methods in parallel, to estimate the project’s climate benefit. The studies within the adoption38–41, usage83, and stacking84 reviews cover a range of stove types, countries, and contexts similar to those of cookstove carbon offset projects (see below for a discussion on research vs. results-based financing projects). Stove and country specific adoption, usage, and stacking rates would limit the sample we would be able to evaluate. Rather, we use the reviews detailed above to construct a bounded range of the likely adoption, usage, and stacking rates across stove types and contexts. The range allows for uncertainty.
We replace all survey-derived adoption and usage rates with literature values as the best data available. We use empirical ranges in a Monte Carlo Analysis (MCA) with a triangle distribution: adoption 53% [40%, 65%]38–41, usage 48% [16%, 80%]83, and stacking 76% [28%, 100%]84. We discount KPT-derived (i.e., GS-TPDDTEC) usage and stacking rates within the MCA, setting the maximum adjustment from an empirical study estimating the Hawthorne effect (-53% in usage and 29% in stacking)34. We use a uniform distribution for the Hawthorne effect because of the greater uncertainty in the true value, given that there is only one study. We do not correct GS-Metered.
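A minimal sketch of the sampling step is given below, using only the Python standard library. It is not the analysis code. It assumes that the central literature value serves as the mode of each triangle distribution, that the Hawthorne adjustment is applied multiplicatively (usage scaled down, stacking scaled up and capped at 100%), and that the stacking bound is the 29% change reported by Simons et al. The names `RATE_RANGES`, `draw_survey_rates`, and `adjust_kpt_rates` are hypothetical.

```python
import random
import statistics

# (low, mode, high) for each rate, from the literature ranges stated above.
# Assumption: the central literature value is used as the triangle's mode.
RATE_RANGES = {
    "adoption": (0.40, 0.53, 0.65),
    "usage": (0.16, 0.48, 0.80),
    "stacking": (0.28, 0.76, 1.00),
}

# Upper bounds of the uniform Hawthorne adjustment (single-study estimate).
HAWTHORNE_USAGE_MAX = 0.53
HAWTHORNE_STACKING_MAX = 0.29  # assumed: the 29% figure from the KPT discussion


def draw_survey_rates(rng):
    """One Monte Carlo draw of literature-based rates for a survey project."""
    return {
        name: rng.triangular(low, high, mode)  # stdlib order: low, high, mode
        for name, (low, mode, high) in RATE_RANGES.items()
    }


def adjust_kpt_rates(kpt_usage, kpt_stacking, rng):
    """One draw of Hawthorne-adjusted KPT rates (assumed multiplicative form)."""
    usage_cut = rng.uniform(0.0, HAWTHORNE_USAGE_MAX)
    stacking_rise = rng.uniform(0.0, HAWTHORNE_STACKING_MAX)
    usage = kpt_usage * (1.0 - usage_cut)
    stacking = min(1.0, kpt_stacking * (1.0 + stacking_rise))
    return usage, stacking


rng = random.Random(42)  # fixed seed so the sketch is reproducible
draws = [draw_survey_rates(rng) for _ in range(20_000)]
for name in RATE_RANGES:
    values = [d[name] for d in draws]
    print(name, round(statistics.mean(values), 3))
```

Because a triangle distribution's mean is (low + mode + high) / 3, the sampled means sit near, but not exactly at, the central literature values; treating the literature average as the mode is a modeling choice of this sketch.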
For two projects in our sample, we take a slightly different approach. One project required the destruction of the baseline stove. Another project built the improved stove in the exact place of the baseline stove. There is limited research on the effect of requiring the easily reversible commitment of removing the baseline stove. Harrell et al.’s study in India presented households the option of a second LPG cylinder in return for destroying or filling in their traditional stove. The researchers compared LPG usage and baseline stove use to similar households who obtained the second cylinder without this commitment. They found that baseline stove use decreased with the soft commitment, but 39% of households rebuilt the baseline stove92. Harrell et al. suggest that requiring the destruction of the baseline stove can lead to lower rates of stacking; however, these households were already LPG users, which is a very different context from most cookstove carbon offset projects. Further research is needed. While this one study would suggest that the adoption and stacking rates of the projects that require the destruction of the baseline stove are likely lower than those of a project without this requirement, the projects with this requirement are still likely to have stacking rates higher than their surveys would suggest. With this limited literature, we construct an uncertainty range within an MCA using a uniform distribution with the maximum as the mean of each of the triangle distributions outlined above (adoption: 53%, usage: 48%, and stacking: 76%). These are rare project requirements within cookstove projects on the market; no methodology requires this.
Beyond these two projects, we have no reason to believe that cookstove offset projects’ rates of adoption, usage, or stacking would be any different than projects studied in the literature, especially as many projects in the literature work with NGOs similar to those implementing carbon financed projects. An important distinction is that research is often based on research-funded cookstove activities, while VCM projects are results-based schemes (i.e., VCM projects will only generate revenues if they demonstrate usage). Results-based financing should imply that cookstove offset developers are motivated to structure the project so that households adopt and use the stoves; however, obtaining high rates of adoption and usage is difficult as the literature has shown38–41. This incentive to create projects that households like is blurred with the incentive to ensure high reported usage rates. This should mean that there is a further imperative on the methodologies to meticulously track actual adoption, usage, and stacking. Yet only GS-Metered requires robust methods. If anything, the literature’s low adoption and usage rates are suspected to be even lower (and stacking even higher) had the research team not been so highly engaged, limiting the household’s ability to show true valuation or resource allocation (i.e., stove maintenance)37. Thus, the values derived from the literature likely represent the higher end of adoption and usage.
Recommendations: The methodologies should require projects to choose between the following options:
- Use meters or fuel purchase data for adoption and usage and conduct a longitudinal stacking/rebound survey (with observation and photos); if a project has metered or fuel purchase data, this option is required.
- Conduct a KPT, adjusting for the Hawthorne effect with a literature-derived default, for usage and stacking, tracking adoption with a robust longitudinal survey (with observation and photos) or a conservative literature-derived default
- Capture all with a robust longitudinal survey (with observation and photos)
- Assume all conservative literature-derived default values
Specific survey recommendations: Longitudinal surveys are favored when social desirability biases are likely to be high76, such as when enumerators work for a company or organization with a known financial interest in the participants’ answers. Well-known guidance on survey design to avoid bias suggests (1) asking about specific days50 (yesterday, the day before yesterday, and within the last seven days), (2) asking specifically when and what they cooked (i.e., deploying time-use strategies), and (3) even having households self-administer the surveys to reduce recall and social desirability bias93, although this is often not feasible in low- and middle-income settings94. Triangulating with photos and observation (i.e., is the intervention stove clearly used or put away) and asking similar questions in different forms can also reduce bias. Currently, the surveys suggest kitchen visits, taking photos, etc.; however, only one or two questions are used to calculate usage rates, and these other checks are not used in the calculation. Instead, the projects should be required to outline before monitoring how the rates will explicitly be calculated, leveraging all available and relevant questionnaire data. For example, projects should use the answers to multiple specific questions to support usage and the photos to confirm it. The frequency of follow-ups in robust research ranges from every 1-3 weeks to once a year37,82,95; however, it would be important to capture seasonality. Stove use monitors, meters, and fuel purchase data improve the accuracy of quantifying project emissions from the new stove13, but surveys are still needed to monitor continued use of the baseline stove, as stove use monitors often burn up in traditional stoves. Observation can play a large role in monitoring baseline stove use, noting if the traditional stove is warm, contains embers or signs of use, etc.
To obtain trustworthy survey results, field teams must be well trained and motivated, and it is recommended that data analysts clean survey data at least every few days. We acknowledge that these are high and expensive expectations that many projects may not have the resources to perform. However, if survey methods do not avoid bias, developers can simply use data from published studies that use robust methods.
Notes on methodology requirements: While KPTs are used by some AMS-I-E-Ethanol projects, they are only used to estimate the amount of ethanol used in the project scenario (not all fuels used) to calculate the thermal equivalent amount of biomass that the project is displacing, assuming a static amount of energy consumption. These projects use surveys to determine adoption and usage, and so we replace their survey results with literature-derived values as we do with other projects that use surveys. GS-Metered does not require the monitoring of stacking, as it only credits for project fuel usage and assumes a static amount of energy consumption, like AMS-I-E. GS-Metered and AMS-I-E are thus vulnerable to over-crediting from the unmonitored use of the baseline stove only in the context of rebound (see section below). GS-Metered requires that projects demonstrate continual purchase of the new fuel for use in the new stove (i.e., customers are not burning pellets for warmth instead of cooking).
Not all projects release their surveys; therefore, we do not know whether some projects use more robust survey methods than the methodologies’ default surveys. However, no project is required to do so. Even if the projects use better surveys than the defaults, if these surveys did not follow established guidance on reducing social desirability and recall bias, we would still recommend they use literature values.
Fuel consumption
Methodologies use three approaches to estimate the difference between baseline and project fuel consumption. AMS-II-G and GS-Simplified determine baseline fuel use, and then use stove efficiencies to estimate fuel use savings. GS-TPDDTEC determines baseline and project fuel consumption separately, and calculates emissions reductions as the difference between the two. GS-Metered/AMS-I-E back calculate baseline fuel consumption from the measured/surveyed project fuel use assuming the equivalent energy would have been used in the baseline by the less efficient baseline stove.
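The three approaches can be sketched in terms of fuel quantities alone. This is a simplified illustration under stated assumptions, not the methodologies' full equations: it omits fNRB, emission factors, and leakage, it expresses the efficiency-based approach as scaling baseline fuel by one minus the efficiency ratio, and it introduces net calorific values (`ncv_*`) so that the back-calculation can hold delivered energy constant across different fuels. All function and parameter names are hypothetical.

```python
import math


def savings_efficiency_ratio(baseline_fuel, eff_baseline, eff_project):
    """AMS-II-G / GS-Simplified style: scale baseline fuel by the efficiency ratio."""
    return baseline_fuel * (1.0 - eff_baseline / eff_project)


def savings_measured_difference(baseline_fuel, project_fuel):
    """GS-TPDDTEC style: baseline and project fuel are determined separately."""
    return baseline_fuel - project_fuel


def back_calculated_baseline_fuel(project_fuel, ncv_project, eff_project,
                                  ncv_baseline, eff_baseline):
    """GS-Metered / AMS-I-E style: hold delivered energy constant and solve
    for the baseline fuel that would have delivered the same energy."""
    delivered_energy = project_fuel * ncv_project * eff_project
    return delivered_energy / (ncv_baseline * eff_baseline)


# Hypothetical inputs (illustrative only, not default values from any methodology).
ratio_savings = savings_efficiency_ratio(2.0, 0.10, 0.25)  # t fuel per year
diff_savings = savings_measured_difference(2.0, 1.5)       # t fuel per year
implied_baseline = back_calculated_baseline_fuel(
    project_fuel=100.0, ncv_project=30.0, eff_project=0.6,  # kg, MJ/kg, fraction
    ncv_baseline=15.0, eff_baseline=0.1,
)
print(ratio_savings, diff_savings, implied_baseline)
```

Only the measured-difference form can return small or negative savings when households keep cooking the same amount or more; the other two forms produce savings by construction whenever the project stove is rated as more efficient.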
Methodologies allow projects to select between several options to determine most inputs into each of these approaches. AMS-II-G, GS-Simplified, and GS-TPDDTEC allow projects to determine total baseline fuel use using a default value (0.446-0.5 tons of firewood/capita/year), literature or national survey data, their own survey, or a KPT (rarely chosen)22,24 (Box 1). AMS-II-G and GS-Simplified use default values for the baseline stove efficiency and determine the project stove efficiency with a laboratory test. GS-Metered and AMS-I-E determine baseline fuel consumption with default values, literature, or surveys. GS-TPDDTEC and GS-Metered allow for no flexibility in estimating project consumption, requiring KPTs and metered or sales data respectively.
Weaknesses in the methods used by cookstoves offset projects: We identified a few inaccuracies within these options to estimate fuel consumption. Old CDM default baseline stove efficiencies are lower than those found in the literature. There are well-documented discrepancies between stove performance in the lab and in household use, often with higher performance and efficiency noted in the lab, where fuel and combustion conditions are controlled and/or optimized (e.g., 49). Other studies have shown that the improved stove in the field consumes the same or more biomass as the baseline stove49,96.
Numerous factors contribute to an improved stove’s poor performance in the field as perfectly controlling combustion is difficult in real world scenarios. However, the largest two sources of poor in-field performance relate to the fuel and to the user. In the field, fuelwood has varied quality (i.e., moisture, wood-type, etc.) which will affect the stove’s performance, while the fuelwood used in a laboratory is typically of uniform quality. Lower efficiencies in the field are also a product of users not following manufacturer guidance. For example, some stoves require small pieces of logs or branches, which creates further time and effort for the household97. A study of an improved stove in Malawi found that to avoid this extra hassle users placed larger wood pieces into the stove, which then stuck out of the combustion chamber, allowing for incomplete combustion and decreasing the stove’s efficiency49.
The testing protocols (e.g., WBT 4.2.3, CCT 2.0) give little guidance regarding the specific actions of the operator during the laboratory test, and a review of thermal efficiency protocols found systematic errors associated with the test operator’s behavior98. Zhang et al. states that in China’s stove testing facilities, there is a saying that, “‘Thirty percent depend on the stove, and seventy percent rely on the operator’, which implies the operator has a greater influence on the performance rating than any inherent characteristics of the stove itself”98. We also note that some projects estimate project stove efficiency with the use of pot skirts, which can increase the efficiency by 5-10%; however, pot skirts are not fixed and users often remove them, especially to accommodate larger pots97. These inaccuracies together can create artificially high baseline and low project consumption values.
As described above, baselines constructed with project-led fuel consumption surveys are vulnerable to social desirability27–30 and recall30,31 biases, as households struggle to estimate kilograms of fuel used30, especially since use is seasonal. Kar et al.’s assessment of an LPG program in India found that surveyed consumption values can be far higher or lower than tracked fuel usage33, but other research has found that self-reported cooking fuel consumption values were statistically significantly larger than the tracked consumption from purchase data30 or stove use monitors99.
These biases can also be present in national survey data50–53. For example, in a national survey, households again may struggle to estimate the weight of firewood used daily33 or inflate their use of an improved or clean stove to present themselves as having a higher level of affluence. Further, lengthy national surveys often do not have the resources, time, or need to triangulate between questions, photos, and monitoring techniques to estimate these cooking statistics that the research studies we cite use. National surveys may track fuel consumption by asking, “Over the last month, did the household purchase/acquire/consume firewood, charcoal, or other cooking fuel?” and then ask for that amount in grams, kilograms, milliliters, liters, or kWhs100. Beyond bias, the use of national surveys may also compromise the baseline as a national average may not represent the project’s specific target population.
Further, no methodology requires the baseline value to be updated throughout the crediting period. Urbanization, decreasing household sizes, and increasing incomes trend towards less GHG intensive fuels and stoves and thus lower baseline emissions. However, project baselines are typically fixed for 5-10 years.
In theory, energy delivered to the pot should be similar or even increase (i.e., rebound effect) in the project compared to the baseline; however, most projects only show a decrease in energy consumption. We do not assume that energy needs with the baseline and project stoves are the same, since multiple factors affect a household’s cooking demand and practice. Adoption of a more efficient stove is not a guarantee that fuel consumption will decrease. Equation-based, non-field methods will inherently show lower fuel consumption after the improved stove is introduced. These equations assume static delivered energy and do not consider true daily fuel usage or the rebound effect.
Further detail on potential bias in field-based methods: Even field-based methods can lead to unreasonable baseline and project fuel consumption values. KPTs, although better than surveyed consumption, are highly variable47,101. As noted in our discussion of why improved stoves may perform differently in the field, field tests are more accurate to real-world conditions but less precise, as those conditions introduce considerable variability from factors such as fuel quality and user behavior. An improved stove’s performance may differ between the lab and the field, but also between households. Unlike in laboratory conditions, the tester cannot control factors such as temperature, air flow, fuel quality, and cooking techniques between households47,101.
Additionally, household fuel consumption is very specific to a household’s context (size, ages of family members, types of foods cooked, occupations, etc.). Thus, sample households chosen for KPTs must be carefully selected. The variability and uncertainty in measuring a specific household’s daily fuel use can lead to KPT-derived consumption values outside of a reasonable range. Therefore, field-based methods are valuable, but still vulnerable to some bias and uncertainty.
A potential source of under-crediting for stoves with a high turndown ratio: A potential source of under-crediting for GS Metered and AMS-I-E is the nuance between energy delivered to the pot vs. useful energy, which are equated in the methodology. To establish baseline useful energy, the methodologies use the efficiency of the baseline stove (as determined in laboratory settings). The result is not useful energy, but energy that went into the pot. Thus, energy delivered to the pot can still contain wasted energy. The amount of wasted energy relates to the power level households were using to cook. For the most time-consuming cooking tasks, the baseline stove may only “need” a certain power; however, since baseline stoves cannot be turned down to this level, the stove may be operating at a higher power to perform the task, wasting energy. This is most apparent when it comes to keeping water boiling, which is one of the most common long-duration cooking tasks. For example, a baseline stove may only need to operate at 2 kW to keep water boiling, but its minimum power may be 4 or even 6 kW, leading to 2-3x higher energy delivered than what was needed. This means that improved stoves that can be turned down to deliver heat at a low level can save greater amounts of fuel than methodologies estimate, implying under-crediting.
To incorporate this source of under-crediting, we would have to establish the type and quantity of food cooked in the baseline scenario, and how much of the new fuel would be needed to cook that food in the new stove. We would also have to assume that the user would in fact turn down the stove appropriately. We only note this source of under-crediting. These methodologies could keep this source of under-crediting for stoves with high turndown ratios as a needed buffer of conservatism.
Approach: For projects that use default stove efficiency values to calculate emission reductions, we update those figures to the 2022 CDM Methodology Panel’s recommendations, which reflect current literature46. These update the baseline firewood stove’s efficiency from 10% to 15% and the baseline charcoal stove’s efficiency from 20% to 25%.
Without a way to ground truth fuel consumption, we simply confine fuel consumption values to a reasonable literature-derived range of 2-4 MJ/capita/day54–56 of energy delivered to the pot. This range is reasonable beyond the literature as GS-TPDDTEC includes guidance that 0.75 ton/capita/year (~3.2 MJ-delivered/capita/day) is a ‘threshold’ or nearing the maximum guidance and that 0.95 ton/capita/year (~4 MJ-delivered/capita/day) is the maximum allowable baseline. In cases of projects reporting multiple fuels, we allowed for the individual fuels to be under 2 MJ-delivered/capita/day as long as the total cooking fuel use was over 2 MJ-delivered/capita/day. In cases of low project consumption, we set the project value at 2 MJ-delivered/capita/day of the fuel and then recalculated the baseline value based on the project’s reported fuel savings as a percent. Our corrections are minimal; there is likely further over-crediting than we are able to adjust for.
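The unit conversion and the low-consumption correction described above can be sketched as follows. This is a minimal illustration, not the study's code: the net calorific value (15.6 MJ/kg) and the 10% stove efficiency are assumptions chosen because they reproduce the ~3.2 and ~4 MJ-delivered/capita/day figures, and the function names are hypothetical.

```python
# Assumed constants (not taken from the project documents).
WOOD_NCV_MJ_PER_KG = 15.6    # assumed net calorific value of firewood
BASELINE_EFFICIENCY = 0.10   # assumed efficiency of a traditional firewood stove


def delivered_mj_per_capita_day(tonnes_per_capita_year,
                                ncv=WOOD_NCV_MJ_PER_KG,
                                efficiency=BASELINE_EFFICIENCY):
    """Convert annual firewood use (t/capita/yr) to energy delivered to the pot."""
    kg_per_day = tonnes_per_capita_year * 1000.0 / 365.0
    return kg_per_day * ncv * efficiency


def raise_low_project_value(baseline_mj, project_mj, lower=2.0):
    """If project consumption is below the lower bound, set it to the bound and
    rebuild the baseline so the reported percent fuel savings is preserved."""
    if project_mj >= lower:
        return baseline_mj, project_mj
    savings = 1.0 - project_mj / baseline_mj   # reported savings as a fraction
    return lower / (1.0 - savings), lower


print(delivered_mj_per_capita_day(0.75))   # close to 3.2
print(delivered_mj_per_capita_day(0.95))   # close to 4
print(raise_low_project_value(3.0, 1.5))   # 50% savings kept: (4.0, 2.0)
```

How the upper bound interacts with a recalculated baseline is not specified in the text, so the sketch leaves that step out.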
Fraction of non-renewable biomass (fNRB)
Projects that reduce biomass use should only be credited for the proportion of emissions reduced from non-renewable sources. This is because projects should only be credited for emissions reductions from biomass use associated with the loss of carbon storage in natural systems over the project crediting period.
Previously, all methods relied on inaccurate CDM fNRB default values. As these defaults have now expired, projects may calculate fNRB values from a CDM tool57 or assume a 30% default (rarely chosen). Both the earlier defaults and the tool rely on a simple equation and Food and Agriculture Organization (FAO) data102. Both options assume trees and bushes have no regenerative capacity, overstating forest degradation as compared to published literature4,58. Bailis et al. (2015)’s WISDOM model determines fNRB estimates that account for biomass regrowth and geographical, ecological, and land use heterogeneity4,58. The WISDOM model can be implemented at a minimum administrative unit of analysis (i.e., villages, counties, states) and has been used to estimate fNRB for 25 countries. Using fNRB values from WISDOM at the regional level, Bailis et al. (2017)11 estimate abated carbon as 41-59% less than estimates using CDM and GS protocols. Johnson, Edwards, and Masera (2010)103 and Bailis et al. (2017)11 advocate for the use of community-level fNRB values in emission reduction calculations. National level estimates lack the resolution needed to reflect fuelwood consumption for specific communities, requisite for small-scale projects. The WISDOM model incorporates cost-distance maps to limit the uncertainty through specifically evaluating the land cover most likely impacted by the project4.
The most robust fNRB approach to date is a dynamic landscape model, MoFuSS (Modeling Fuelwood Sustainability Scenarios)59, although only a few national values are currently available. Protocols and projects should monitor new MoFuSS values and update their crediting appropriately. In our sample, 11 projects used the CDM default values, 18 used the CDM tool, four used previous surveys from the literature, and two used preliminary values to be determined upon verification. The average fNRB used in our sample of projects was 88% [min: 61%, max: 100%], while the average, if the projects had used the Bailis et al. 2015 value, would be 38% [min: 6%, max: 65%]. On average, the projects’ chosen fNRB values are 2.8 [min: 1.1, max: 16.4] times the Bailis et al. 2015 values4.
The CDM Methodology Panel as of June 30th, 202246 recommended that projects which individually determine the percentage through a CDM tool, must address, compare, and justify any difference to established academic literature, specifically noting Bailis et al. 20154. However, this change has not yet been implemented. There is no guidance for which fNRB value within Bailis et al. 20154 to use, and projects still have leeway to determine their own fNRB as the Bailis et al. 20154 numbers are not enforced.
Approach: Within a Monte Carlo framework, we utilize a triangle distribution that draws on Bailis et al.’s fNRB estimates at the subnational level (if the subnational location is listed in the PDD) or at the national level (in cases where the project’s scope is the entire country)4. We utilize the fNRB values from Bailis et al.’s second, low-yield scenario, which considers that the byproducts of deforestation are used to meet a fraction of the wood demand and then the other component represents other sources when the byproducts of deforestation are exhausted, assuming productivity from the Forest Service of India4. For the mode of the triangle distribution, we utilize Bailis et al.’s “expected” fNRB value, which Bailis et al. constructed as a linear transformation from their estimated minimum value of fNRB4. We utilize the low-yield scenario’s minimum value as the triangle distribution’s minimum. Since we have no information about the shape of the distribution above this expected value, we set the maximum of the triangle distribution as 10% greater than the expected value.
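A minimal sketch of drawing fNRB values from such a triangle distribution, using only the Python standard library. The input values are placeholders rather than Bailis et al. estimates, and reading "10% greater" as a relative increase capped at 100% is an assumption.

```python
import random


def sample_fnrb(minimum, expected, n_draws=10_000, seed=0):
    """Draw fNRB values from a triangle distribution.

    minimum  -- low-yield scenario minimum fNRB (fraction between 0 and 1)
    expected -- "expected" fNRB, used as the mode
    The maximum is taken as 10% above the expected value in relative terms,
    capped at 1.0; this interpretation is an assumption.
    """
    maximum = min(1.0, expected * 1.10)
    rng = random.Random(seed)
    # random.triangular takes (low, high, mode) in that order.
    return [rng.triangular(minimum, maximum, expected) for _ in range(n_draws)]


# Placeholder inputs for illustration only.
draws = sample_fnrb(minimum=0.20, expected=0.40)
print(min(draws), max(draws), sum(draws) / len(draws))
```

Because the maximum sits close to the mode, the resulting distribution is left-skewed, with a mean below the expected value.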
Emission factors
All cooking fuels have CO2 and non-CO2 emissions at the point of use, and all except for collected firewood have upstream emissions. To translate baseline and project fuel use into GHG emissions, GS and CDM methodologies take two very different approaches.
GS uses outdated (i.e., 2006) IPCC default emission factors (EFs) for both CO2 and non-CO2 gases. GS methodologies make lifecycle emissions accounting optional, a source of under-crediting when excluded, and as a result only eight projects include it. Counterintuitively, the CDM, instead of using an emissions factor for the fuel wood or charcoal used by the baseline stove for all registered projects, uses an EF per unit energy of a fossil fuel that is projected to substitute only the non-renewable biomass portion of past fuel usage. This projected EF is based on a weighted regional value of different fossil fuels: kerosene, liquefied petroleum gas, and coal, all of which have lower carbon dioxide EFs than firewood and charcoal per unit of heat delivered to the pot. This avoids the restriction that CDM projects are not permitted to credit reductions in deforestation. This workaround, in theory, provides a conservative estimate of true emissions reductions. It is less complex than directly measuring the baseline technology’s emission profile. It does not, however, reflect actual emission reductions. Since 2019 (version 11.1), CDM methodologies have included the projected fossil fuel’s non-CO2 gases; their earlier exclusion was a (past) minimal source of under-crediting. Although recent CDM methodologies (as of v11.1) account for upstream emissions, this is still a source of under-crediting, as the upstream emissions of the CDM-required projected fossil fuel, which is inappropriately used as the baseline, are much less than those of the typical baseline fuel (firewood/charcoal).
EFs in the literature: Emission profiles (point of use and upstream) of various cooking fuels vary widely in the field, depending on moisture content, wood type, sourcing of feedstock, characteristics of production, distribution, etc. Since the IPCC values were established, a considerable number of studies have specifically examined the lifecycle emissions of cooking fuels. Recently, Floess et al.61 compiled a database of lifecycle emission factors for cooking fuels, drawing from peer-reviewed literature10,104,105, the EPA/CCA’s analysis48, and the Greenhouse Gases, Regulated Emissions, and Energy Use in Technologies (GREET) model106 (see Floess et al.61 Supplemental Table S9).
Freeman & Zerrifi and Sanford & Burney argue that the CDM and GS methodologies both result in substantial under-crediting by ignoring reductions in black carbon (BC) emissions, estimated as having 600 times the global warming potential (GWP) of CO2 15,19. Subsequent research, however, has found the climate benefits of reducing BC to be ambiguous62,63. This ambiguity is in part due to the simultaneous reduction of co-emitted species, some of which have climate cooling effects. For example, improved stoves shrink the ratio of OC (a climate cooler) to EC (a climate warmer), an effect that lessens and may negate the climate benefits of reducing BC emissions107. Grieshop et al. note that accounting for benefits over shorter time periods could counterbalance these complicating factors, as BC’s short lifespan may offer immediate climate relief, while the imbalance of EC to OC has implications over a longer period of time108.
Whitman and Lehmann explain that all methodologies over-credit due to an error in the way they account for reductions in methane emissions from renewable biomass14. Instead of claiming a reduction in methane using its full 100-year GWP value, the methodologies should use a lower GWP value that accounts for the renewability of the CO2 that methane becomes in the atmosphere after its lifetime of 10-20 years. However, since the IPCC GWP figures have since increased, the old GWP values are now a small source of under-crediting.
Approach: We replace the methodologies’ respective EF approaches with cooking-fuel-specific EFs, including upstream and non-CO2 emissions, from Floess et al.61, the most comprehensive, recent database.
In practice, this means we add methane and nitrous oxide (i.e., non-CO2 gases) for earlier CDM projects and update the IPCC values used by GS projects with the Floess et al.61 figures. We incorporate (or update for the projects that already include upstream emissions) all production, processing, and distribution emissions for each cooking fuel from Floess et al.61.
We updated all methodologies’ GWPs to the most recent IPCC values, accounting for the most recent IPCC distinctions for renewable/nonrenewable biomass (i.e., 28 rGWP and 30 GWP)14. Due to high uncertainty around the climate impacts of black carbon emissions from cookstove projects62,63, we, like the methodologies, exclude black carbon (see online methods).
Rebound effect
Households commonly increase their overall cooking energy consumption with access to an improved stove (e.g.,65,66). The improved stove lowers the “cost” of cooking and provides another burner, allowing the household to increase their fuel consumption. The most recent literature and modeling of clean cooking transitions is moving towards including this increase in cooking energy consumption109,110. Therefore, we reviewed and analyzed the literature of improved stove studies, finding estimates of increased cooking fuel consumption which averaged to roughly a 22% increase in consumption. Only projects that utilize KPTs would capture this increase47, while laboratory water boiling tests (WBT), thermal output formulas, controlled cooking tests, and metering/purchase data would not capture this phenomenon as they assume that total household energy delivered to the pot is constant.
The average rebound effect from five published studies39,65,67–69 with available datasets is 22% (i.e., consumption measured with the KPT was found to be 22% higher than the consumption expected from an efficiency equation that assumes static consumption). The three projects in our sample that reported both KPT and WBT values found rebounds of 16%, 32%, and 47%. We reduced our emission reduction estimates by 22% for projects that do not utilize KPTs to monitor project fuel consumption, reflecting the rebound effect implied in published datasets and confirmed by the three projects in our sample that reported both KPT (observed) and WBT (expected) values.
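The rebound calculation and the 22% adjustment can be illustrated with a short sketch. The function names and the numeric inputs are hypothetical; only the 22% figure and the rule of applying it to projects without KPTs come from the text.

```python
def rebound_fraction(kpt_observed, expected_static):
    """Rebound as the fractional excess of KPT-observed project consumption
    over the consumption expected from an efficiency equation."""
    return kpt_observed / expected_static - 1.0


def apply_rebound_adjustment(emission_reductions, uses_kpt, rebound=0.22):
    """Reduce estimated emission reductions by the rebound share, but only for
    projects that do not monitor project fuel consumption with KPTs."""
    if uses_kpt:
        return emission_reductions
    return emission_reductions * (1.0 - rebound)


# Hypothetical inputs: 1.22 kg/day observed vs. 1.00 kg/day expected.
print(rebound_fraction(1.22, 1.00))
print(apply_rebound_adjustment(1000.0, uses_kpt=False))  # reduced by 22%
print(apply_rebound_adjustment(1000.0, uses_kpt=True))   # left unchanged
```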
Additionality
Offsets should only be generated for reductions caused by the offset program and should not generate credits from improved or clean cookstoves that would have been purchased regardless of the prospect of income from carbon revenues. The methodologies treat additionality as a binary. Projects in locations with low penetration rates of improved stoves, small projects in certain locations, or projects using specific technologies are automatically considered additional. All other projects must demonstrate that they would not have gone forward without carbon revenues using the CDM additionality tool, which has been criticized for being inaccurate60,72,111. In the published literature, only one narrative case study explores the additionality of a cookstoves project, finding strong evidence for the additionality of the studied project112. We define additionality as the proportion of participating households who would likely not have used an efficient stove were it not for the offset program. We tracked the reported percentage of control households obtaining the improved or clean stove and then weighted by the time period of the study. We also triangulated with recent projections for clean fuel use up to 2030113 globally and for the countries within our sample. These results were inconclusive, as two RCTs found positive spillover effects from the stove programs, suggesting even greater additionality under our non-binary definition. Others found extremely high rates of control households obtaining the stove, suggesting lower additionality, due to government cooking policy (i.e., India’s PMUY policy43; see supplemental information). Given that the sources we reviewed did not converge within a reasonable range, we defer to the methodologies’ tests for additionality and identify this aspect as a factor for future research.
Our recommendation to establish baselines with required KPTs and/or robust project-led surveys will also hedge against non-additional stoves. Baselines constructed from national statistics may not capture non-additional stoves, particularly in urban, heterogenous contexts where households might already use or obtain a clean fuel.
Leakage
Leakage can occur when reduced use of firewood or charcoal by project households leads to increased fuel usage by non-project households; GS methodologies include additional sources of emissions within their definition of leakage, which at times includes stove stacking, upstream emissions, etc. CDM methodologies account for leakage either through household surveys or an unjustified default value of 5% of final emission reductions114. This default is used by 50% of our sample. GS methodologies use the same 5% default value or allow projects to thoroughly evaluate five potential sources of leakage. The unjustified default of 5% is another source of uncertainty114. It is largely unknown whether the introduction of an improved stove program increases the consumption of biomass of non-project households. When projects do track leakage, they rarely report a value larger than 5%. This is not empirical evidence that 5% is conservative as projects are incentivized to find low leakage.
Analogous to our quantitative additionality assessment, we identified three cookstove randomized controlled trials that tracked the control households’ biomass consumption from baseline to endline. Following advice from Calyx Global115, specifically Hilda Galt, we also reviewed all the GS projects for each country in our sample to find the projects that actively tracked leakage as opposed to relying on the default value. We do note that some projects inappropriately included upstream emissions or baseline stove use within leakage. We then compiled the percentage of leakage each of these projects found (see supplemental information). These results were also inconclusive, as some of the RCTs found that the average control household actually decreased its consumption37,116, suggesting zero or even negative leakage (although leakage was not isolated as the factor), while another found that control households increased their consumption by 33%39 (see supplemental information). The GS projects tracking leakage from each country in our sample found a range of 0% to 6% leakage, with an average of 1%. Projects are not unbiased in their assessment, as more leakage implies fewer credits. As in the baseline, projects devise their own questions, risk developing leading questions, or rely on a single question to determine leakage. Given that the sources we reviewed were inconclusive and potentially biased, we do not correct any leakage values and identify this aspect as a factor for future research.
Adjusting factors
Using the values listed in the latest verified monitoring report or project documents of these 36 projects, we calculated the number of VERs on a per stove-day basis. We only included projects (or monitoring reports from projects) in our sample if we were able to replicate exactly the number of VERs either in total or on a per stove-time basis. Once we replicated the credits generated under the methodologies, we adjusted all of the identified factors contributing to either over- or under-crediting as described above. Then, we conducted analyses isolating each factor. Finally, we conducted one analysis evaluating only ex post values across all protocols (i.e., excluding adoption, usage, and stacking rates), because at the time of sampling GS-Metered-Pellet and AMS-I-E Ethanol projects had generated no credits, and thus their adoption and stacking rates are ex ante values from the PDD rather than ex post values from monitoring reports as with all other protocols. In all analyses in which we adjusted the emission factors, firewood-charcoal conversion factor, or consumption, we removed GS-Metered and AMS-I-E’s calculation approach and calculated the baseline emissions and project emissions separately. For example, we used the baseline and project consumption reported in their PDDs to calculate the difference between baseline and project emissions instead of using their baseline conversion factor approach (see fuel consumption section).
In total, we have analyses in which (1) all factors were adjusted, (2) only adoption rates were adjusted, (3) only usage rates were adjusted, (4) only stacking rates were adjusted, (5) only fNRB values were adjusted, (6) only emission factors (including upstream emissions) were adjusted, (7) only the firewood-charcoal conversion factor was adjusted, (8) only consumption (baseline and project) values were adjusted, (9) only the rebound adjustment was incorporated, and (10) all factors were adjusted, except adoption, usage, and stacking (see supplemental). For each analysis, we were then able to project the amount of over- or under-crediting for these cookstove projects and methodologies and for the market.
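One way to organize these ten analyses is as sets of factor values in which factors are switched from their reported to their adjusted values. The sketch below is illustrative only: the factor names, the placeholder values, and the `build_scenarios` helper are hypothetical and are not taken from the study's code.

```python
# Hypothetical factor names mirroring the list of analyses in the text.
FACTORS = [
    "adoption", "usage", "stacking", "fnrb", "emission_factors",
    "firewood_charcoal_conversion", "consumption", "rebound",
]


def build_scenarios(reported, adjusted,
                    ex_ante=("adoption", "usage", "stacking")):
    """Return a mapping of scenario name -> factor values.

    reported -- factor values as stated by the project
    adjusted -- literature-based replacements for the same factors
    ex_ante  -- factors left at reported values in the ex-post-only analysis
    """
    scenarios = {"all_adjusted": dict(adjusted)}
    # One analysis per factor: adjust that factor only.
    for name in adjusted:
        only_one = dict(reported)
        only_one[name] = adjusted[name]
        scenarios["only_" + name] = only_one
    # Final analysis: adjust everything except the ex ante factors.
    ex_post = dict(adjusted)
    for name in ex_ante:
        ex_post[name] = reported[name]
    scenarios["all_except_" + "_".join(ex_ante)] = ex_post
    return scenarios


# Placeholder values stand in for real factor values.
reported = {name: "reported" for name in FACTORS}
adjusted = {name: "adjusted" for name in FACTORS}
scenarios = build_scenarios(reported, adjusted)
print(len(scenarios))  # 1 + 8 + 1 = 10 analyses
```

Each scenario could then be passed to whatever credit calculation replicates a given project, so that the contribution of a single factor is isolated.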
We then split the results by methodology-project type combination and then by country. Our sample contains 7 methodology-project type combinations: (1) GS-Firewood, (2) GS-Charcoal, (3) GS-LPG Stove, (4) CDM-AMS-II-G-Firewood, (5) CDM-AMS-II-G-Charcoal, (6) CDM-AMS-I-E-Ethanol, and (7) GS-Metered-Pellets.
Monte Carlo Framework
A Monte Carlo Analysis is a statistical framework which calculates possible outcomes when input parameters are randomly varied within a specified range and a given distribution117. When used for fNRB, adoption, usage, and stacking rates, the Monte Carlo analysis generates values within our defined limits, following a defined distribution. We specified the simulation to run 10,000 times randomly generating new values for each of these factors and calculating an associated emission reduction. We acknowledge the inherent uncertainty within our factors and bound them within a literature derived range.
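The simulation loop can be sketched with the standard library alone. This is a hedged illustration rather than the study's implementation: the `monte_carlo` and `summarize` helpers, the toy multiplicative model, and all bounds are hypothetical, and using a triangle distribution for every factor is an assumption (the text specifies it only for fNRB).

```python
import random
import statistics


def monte_carlo(factor_bounds, model, n_runs=10_000, seed=0):
    """Run `model` n_runs times on randomly drawn factor values.

    factor_bounds -- {factor name: (low, mode, high)}
    model         -- callable mapping {factor name: value} to emission reductions
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        # random.triangular takes (low, high, mode) in that order.
        draw = {name: rng.triangular(low, high, mode)
                for name, (low, mode, high) in factor_bounds.items()}
        results.append(model(draw))
    return results


def summarize(results):
    """Mean plus approximate 5th and 95th percentiles (simple rank-based)."""
    ordered = sorted(results)
    last = len(ordered) - 1
    return {
        "mean": statistics.fmean(ordered),
        "p05": ordered[int(0.05 * last)],
        "p95": ordered[int(0.95 * last)],
    }


# Placeholder bounds and a toy model, for illustration only.
bounds = {"fnrb": (0.20, 0.40, 0.44), "adoption": (0.50, 0.70, 0.90)}


def toy_model(f):
    return 100.0 * f["fnrb"] * f["adoption"]


print(summarize(monte_carlo(bounds, toy_model)))
```

Fixing the seed keeps runs reproducible, which helps when comparing analyses that differ in a single factor.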
Extrapolating emission reductions
We extrapolate our estimated amount of over-crediting from our sample of monitoring reports to each project’s total issued credits. We then sum those to compare the amount of VERs we estimate to those issued. To extrapolate to the entire cookstove market, we take the average range of over-crediting for each methodology-stove combination and then apply these to the total amount of credits issued for each methodology-stove combination.
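The two extrapolation steps can be written out as a short sketch. All numbers are invented for illustration and the function names are hypothetical. The over-crediting factor is defined here as issued credits divided by estimated credits, which is an assumption about how the ratio is expressed.

```python
def estimated_project_credits(total_issued, sample_issued, sample_estimated):
    """Scale a project's total issued credits by the over-crediting factor
    found in its sampled monitoring report(s)."""
    over_crediting_factor = sample_issued / sample_estimated
    return total_issued / over_crediting_factor


def estimated_market_credits(issued_by_combination, factor_by_combination):
    """Apply an average over-crediting factor per methodology-stove
    combination to the credits issued under that combination."""
    return {
        combo: issued / factor_by_combination[combo]
        for combo, issued in issued_by_combination.items()
    }


# Invented numbers: the sampled report issued 1,000 credits where we estimate
# 250, and the project has issued 10,000 credits in total.
print(estimated_project_credits(10_000, 1_000, 250))

# Invented market-level example for a single combination.
print(estimated_market_credits({"GS-Firewood": 8_000}, {"GS-Firewood": 4.0}))
```

Since the text refers to an average range of over-crediting, the same functions could be called once with the lower bound and once with the upper bound of that range.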
Limitations
Our study has some limitations that must moderate our conclusions. We only cover 37% of the market, across 22 countries; however, we attempted to have a fully representative sample across methodologies, locations, and project types (see methods). We were limited to projects that were transparent enough to provide their exact calculations or stove-days within their monitoring or validation reports. There is uncertainty around the factors, which is why we utilized a Monte Carlo framework as explained above. We were also limited by the details provided by the projects and the standards. For example, numerous projects did not specify the rural or urban setting or more specific administrative units, which is very important for fNRB.
Commercial Credits
A few of our sampled projects credited stoves used for commercial purposes (restaurants, schools, etc.). We do not adjust their adoption, usage, or stacking rates, or baseline/project fuel consumption. There are still barriers to adoption, usage, and ending stacking for commercial institutions; however, the literature on these rates is limited118, and thus an area for future research.
All analysis was conducted in Python 3.