Machine learning-based evidence and attribution mapping of 100,000 climate impact studies

doi:10.21203/rs.3.rs-783398/v1


Research Article


This work is licensed under a CC BY 4.0 License

Version 1


An ever-growing body of evidence suggests that climate change is already impacting human and natural systems around the world. Global environmental assessments of this evidence, for example by the Intergovernmental Panel on Climate Change (IPCC)¹, face increasing challenges in appraising an exponentially growing literature² and diverse approaches to climate change attribution. Here we use the language representation model BERT to identify and classify studies on observed climate impacts, producing a machine-learning-assisted evidence map which provides the most comprehensive picture of the literature to date. We identify 100,724 (62,950–162,838) publications covering a broad range of impacts in human and natural systems across all continents. By combining our spatially resolved database with human-attributable changes in temperature and precipitation at the grid cell level, we infer that attributable climate change impacts may be occurring in regions encompassing 85% of the world's population and 80% of its land area. Our results also reveal a substantial 'attribution gap', as robust evidence for attributable impacts is twice as prevalent in high income compared to low income countries. While substantial gaps remain in confidently establishing attributable climate impacts at the regional and sectoral level, our unique database illustrates the broad extent to which anthropogenic climate change may already be impacting natural systems and societies across the globe.

Environmental Policy

Climatology

Climate change

climate impacts

detection and attribution

machine learning

BERT

There is overwhelming evidence that the impacts of climate change are already being observed in human and natural systems³. These effects are emerging in a range of different systems and at different scales, covering a broad range of research fields from glaciology to agricultural science, and marine biology to migration and conflict research¹. The evidence base for observed climate impacts is expanding⁴, and the wider climate literature is growing exponentially^5,6. Systematic reviews and systematic maps offer structured ways to collectively identify and describe this evidence while maintaining transparency, attempting to ensure comprehensiveness and reduce bias⁷. However, their scope is often confined to very specific questions covering no more than dozens to hundreds of studies.

In the climate science community, evidence-based assessments of observed climate change impacts are performed by the Intergovernmental Panel on Climate Change (IPCC)¹. Since the first Assessment Report (AR) of the IPCC in 1990, we estimate that the number of studies relevant to observed climate impacts published per year has increased by more than two orders of magnitude (Fig. 1a). Since the third AR, published in 2001, the number has increased ten-fold. This exponential growth in peer-reviewed scientific publications on climate change^5,6 is already pushing manual expert assessments to their limits. To address this issue, recent work has investigated ways to handle big literature in sustainability science by scaling systematic review and map methods to large bodies of published research using technological innovations and machine learning methods^8–12.

Fully utilising the available knowledge on emerging climate change impacts is key to informing global policy processes¹³ as well as regional and local risk assessments and on-the-ground action on climate adaptation^14,15. While the global policy process may be served well with literature assessments presenting results aggregated on the level of continents or world regions^1,16, informing climate adaptation typically requires more highly localised and contextualised information on climate impacts^17,18.

Another core challenge of literature reviews and assessments of observed climate impacts relates to the question of whether climate impacts can be attributed to anthropogenic forcing⁴. Anthropogenic climate change signals have been identified in observed trends in a number of variables⁴, including temperature¹⁹, precipitation²⁰, sea level rise²¹, water resources²² and selected extreme weather events²³. However, while very high confidence can be reached for attribution findings at larger (e.g., global) scales, confidence in these assessments still varies substantially between regions and remains relatively tentative at smaller spatial scales. Confidence also strongly depends on the variable being considered, and specifically decreases further down the impact chain, i.e. for indicators of changes in human and natural systems that are driven by changes in other climate impact variables⁴. In addition, methodological approaches and robustness criteria for climate change attribution differ widely between studies and disciplines, requiring case-by-case expert judgement in order to compile a comprehensive evidence base.

This points to the added value of linking the body of regional and local-scale studies on climate impacts associated with common climate drivers, such as changes in temperature and precipitation, to a spatially resolved detection/attribution database for those variables.

Using BERT, a state-of-the-art deep learning language representation model²⁴, we develop a machine learning pipeline to identify, locate and classify studies on observed climate impacts at a scale beyond what is possible manually (see Extended Figure 1). We combine this spatially resolved dataset with an approach to attributing observed trends in surface temperature and precipitation at the grid cell level (5° × 5° and 2.5° × 2.5° cells, respectively) to human influence on the climate. In doing so, we establish a new paradigm for assessing the impacts of climate change across human and natural systems.

Mapping over 100,000 impact studies

We searched two large bibliographic databases (Web of Science and Scopus) using an inclusive and transparent search method to systematically identify the literature on climate impacts. We assessed comprehensiveness by ensuring that our search string returned all references from tables 18.5–18.9 in AR5 WGII, which deal with the detection and attribution of climate impacts. Recent breakthroughs in natural language processing (NLP) have extended the capabilities of text classification. BERT (Bidirectional Encoder Representations from Transformers) is a deep learning language model trained using semi-supervised learning on massive corpora to represent text such that word representations depend on context. The pretrained model can be fine-tuned on downstream tasks, and has achieved state-of-the-art results across a range of NLP tasks. Using training data assembled by collaboratively screening and coding 2,629 abstracts, we fine-tune a DistilBERT model²⁵ with supervised learning. Working likewise from the abstract text, the model classifies which documents are relevant to understanding the observed impacts of climate change in general. It also predicts the human or natural systems in which they document impacts (i.e., the impact categories), as well as the climate variable(s) driving the documented impacts. Uncertainty estimates for the predictions are derived from bootstrapping. We employ a nested cross-validation approach for hyperparameter tuning, model selection and classifier evaluation, and find that our binary inclusion classifier achieves an average F1 score of 0.71 and a ROC AUC score of 0.92. Impact type is predicted with an average macro F1 score of 0.84, while climate driver is predicted with an average F1 score of 0.79 (see Methods section and Extended Figures 1–5 for a detailed explanation of the labelling, machine learning approach and classifier performance).
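The evaluation metrics quoted above can be made concrete with a small, dependency-free sketch. The sketch makes three assumptions. Inclusion labels are binary 0/1. Impact categories form a multi-label 0/1 indicator matrix, which the text does not confirm. The function names are hypothetical. In practice, library routines such as scikit-learn's `f1_score` (with `average="macro"`) and `roc_auc_score` compute the same quantities. The reported scores are presumably averages over the outer folds of the nested cross-validation rather than single evaluations.

```python
from typing import Sequence


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # convention: no true positives means F1 = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of per-label F1 over a multi-label 0/1 indicator matrix."""
    n_labels = len(y_true[0])
    per_label = [
        f1_binary([row[j] for row in y_true], [row[j] for row in y_pred])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random relevant document outscores a random
    irrelevant one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Macro F1 weights every impact category equally, so rare categories (such as mountains, snow and ice) count as much as common ones. ROC AUC, unlike F1, does not depend on a decision threshold.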

Our query returned 603,759 unique documents (Fig. 1a), far more than could have been screened by hand. Of these, we estimate that 100,724 (62,950–162,838) documents are relevant to understanding the observed impacts of climate change in general, based on the spread of inclusion/exclusion predictions obtained from our model via bootstrapping (Fig. 1a). This body of relevant publications has grown substantially through the IPCC assessment cycles: 48,911 (39,602–79,464) articles have been published in the sixth assessment cycle so far, more than twice the number published during the AR5 period.
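One way the bootstrapped predictions could yield a central estimate and an uncertainty range is sketched below. This rule is an assumption for illustration and not necessarily the authors' procedure. Each document receives an inclusion probability from every bootstrapped classifier. It counts as relevant if its mean prediction reaches 0.5. For the lower (upper) bound, it counts if its mean minus (plus) one standard deviation reaches 0.5. The function name and threshold are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def relevance_counts(
    predictions: Sequence[Sequence[float]], threshold: float = 0.5
) -> tuple[int, int, int]:
    """Return (central, lower, upper) counts of documents predicted relevant.

    predictions[i] holds the inclusion probabilities that each bootstrapped
    classifier assigns to document i. Assumed rule (illustrative only): count
    a document if mean, mean - std, or mean + std reaches the threshold.
    """
    central = lower = upper = 0
    for doc_preds in predictions:
        m = mean(doc_preds)
        s = pstdev(doc_preds)  # spread of predictions across bootstrap runs
        central += m >= threshold
        lower += m - s >= threshold
        upper += m + s >= threshold
    return central, lower, upper
```

The standard deviation is non-negative, so under this rule the lower bound never exceeds the central estimate and the central estimate never exceeds the upper bound. This matches the ordering of the reported range.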

We used a pre-trained, neural-network-based geoparser²⁶ to extract structured geographic information from the titles and abstracts of the studies in our database. Although the number of relevant studies in North America, Asia and Europe is much higher than in South America, Africa and Oceania, there is a large body of relevant studies available on all continents (Fig. 1c). The relevant publications are also unevenly distributed across impact categories, with by far the largest number of studies, 34,988 (18,520–65,666), documenting impacts on terrestrial and freshwater ecosystems (Fig. 1b). However, even the category with the smallest coverage (mountains, snow and ice) still has 6,307 (3,526–12,228) studies.

In contrast to the map of observed impacts produced by the IPCC, we do not include only papers which formally attribute impacts to observed trends in climate. Instead, we take a more comprehensive approach. Our objective is to map all possibly relevant studies on climate-related changes, rather than to list studies where the relationship between an observed climate trend and specific impacts has been demonstrated with high confidence, or even linked to human influence on the climate. This includes studies attributing impacts to observed trends in climate variables even where the authors do not attribute these trends to human influence, such as a study documenting the influence of the date of snowmelt on the phenology and population growth of mammals²⁷. In addition, we include studies which provide evidence on the sensitivity of human or natural systems to climate metrics, such as how heart disease mortality responds to variations in temperature²⁸. Finally, we include documents describing the impacts of extreme events and studies which detect significant trends in climate variables or climate extremes²⁹, regardless of whether or not these trends are in line with the expected effects of anthropogenic climate change. We exclude all studies which only describe potential or modelled impacts of future climate change.

Combining geolocated literature with climate information

To add context on the role of anthropogenic climate change in driving impacts, or more precisely the role of historical changes in anthropogenic climate forcing agents such as greenhouse gases and aerosols, we combine our machine-learning-selected literature database with a spatially explicit analysis of detectable and attributable trends in two key climate variables. Combining evidence from climate model simulations and observational datasets allows us to identify trends in near-surface temperature and precipitation that are likely attributable in part to anthropogenic climate change, at the level of 5° (temperature) or 2.5° (precipitation) grid cells^19,20. Here we apply this methodology to observational data updated to 2018 for temperature (Fig. 2a) and to 2016 for precipitation (Fig. 2b), analysing in each case trends from 1951 onwards. Grid cells in our categories ±2 or ±3 show where trends cannot be explained by internal variability and are either consistent with or greater than the expected change in climate model simulations that include anthropogenic forcing agents such as increasing greenhouse gases. We infer that these cells display detectable and at least partly attributable trends (see Methods for more details).
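The grid-cell categories can be illustrated with a simplified, hypothetical classification rule. The sketch summarises internal variability as a single trend magnitude `noise_band` and represents the all-forcing historical simulations as a range `[forced_low, forced_high]`. The published methodology (refs. 19 and 20) uses more detailed statistical tests. Categories ±2, ±3, 0 and ±4 follow the descriptions in the text. Category ±1 (a detectable trend weaker than modelled) is an assumption, added only to make the rule exhaustive.

```python
from typing import Optional


def trend_category(
    observed: Optional[float],
    noise_band: float,
    forced_low: float,
    forced_high: float,
) -> Optional[int]:
    """Assign a signed category to a grid-cell trend (hypothetical simplification).

    observed:    observed linear trend since 1951, or None if data are insufficient
    noise_band:  largest trend magnitude internal variability alone is assumed to produce
    forced_low, forced_high: range of trends in all-forcing historical simulations
    """
    if observed is None:
        return None                  # grey: insufficient data to establish a trend
    if abs(observed) <= noise_band:
        return 0                     # not distinguishable from internal variability
    sign = 1 if observed > 0 else -1
    forced_mid = (forced_low + forced_high) / 2
    if forced_mid * observed < 0:
        return 4 * sign              # detectable, but opposite in sign to the models
    if forced_low <= observed <= forced_high:
        return 2 * sign              # detectable and consistent with all-forcing runs
    if abs(observed) > max(abs(forced_low), abs(forced_high)):
        return 3 * sign              # detectable and larger than all-forcing runs
    return 1 * sign                  # assumed: detectable but weaker than modelled


def is_attributable(category: Optional[int]) -> bool:
    """Cells in categories ±2 or ±3 count as (at least partly) attributable."""
    return category is not None and abs(category) in (2, 3)
```

The sign carries the direction of the trend (warming or cooling, wetting or drying). The magnitude encodes how the observed trend compares with internal variability and with the forced response.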

We next resolve the structured geographic information extracted from our studies, which ranges from continental scale down to individual watersheds or communities, to sets of grid cells (Extended Fig. 9, Methods). We can then derive the weighted number of studies per grid cell according to the number of grid cells to which each study relates. Combining studies on impacts related to temperature or precipitation with the gridded information on attributable trends in these variables provides a necessary (though not necessarily sufficient) condition for a systematic two-step attribution of the impacts identified by the classifier to anthropogenic activities³⁰. Where studies documenting impacts associated with changes in temperature or precipitation co-occur with attributable trends in those variables, we consider there to be at least preliminary evidence for attributable impacts in these areas. This approach is similar in nature to the “joint attribution” applied in IPCC AR4^31,32.
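The per-cell weighting described above can be sketched as follows. The sketch assumes each study has already been resolved to a set of (row, column) cell indices and that each study's climate drivers are known. Each study contributes a total weight of 1, spread evenly across its cells. A study covering a whole continent therefore counts for far less in any single cell than one covering a single watershed. The second function restricts the count to studies whose driver has an attributable trend in that cell, which is the co-occurrence condition for two-step attribution. All names and data structures are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Mapping

Cell = tuple[int, int]  # hypothetical (row, column) index on a regular grid


def weighted_study_counts(study_cells: Mapping[Hashable, Iterable[Cell]]) -> dict[Cell, float]:
    """Each study spreads a total weight of 1 evenly over the cells it refers to."""
    counts: defaultdict[Cell, float] = defaultdict(float)
    for cells in study_cells.values():
        unique = set(cells)
        for cell in unique:
            counts[cell] += 1 / len(unique)
    return dict(counts)


def attributable_evidence(
    study_cells: Mapping[Hashable, Iterable[Cell]],
    study_drivers: Mapping[Hashable, set[str]],
    attributable: Mapping[str, set[Cell]],
) -> dict[Cell, float]:
    """Weighted studies per cell, counting a study in a cell only if one of its
    drivers (e.g. "temperature") has an attributable trend in that cell."""
    counts: defaultdict[Cell, float] = defaultdict(float)
    for study, cells in study_cells.items():
        unique = set(cells)
        drivers = study_drivers.get(study, set())
        for cell in unique:
            if any(cell in attributable.get(d, set()) for d in drivers):
                counts[cell] += 1 / len(unique)
    return dict(counts)
```

Because weights sum to 1 per study, the total over all cells in `weighted_study_counts` equals the number of located studies.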

In general, we note that the type of automated assessment procedure presented here is no substitute for careful assessment by experts, but it can identify large numbers of studies for a region that may point towards attributable human influence on impacts. Confidence in multi-step attribution claims depends on confidence in the attribution of the individual components (steps), along with the confidence in, or limitations of, the links between the steps in the proposed causal chain³². One limitation of our partially automated two-step attribution approach concerns the trends cited in impact studies. We cannot verify that every such temperature or precipitation trend matches, in sign, magnitude or time period, the trends attributed to human influence by the regional detection and attribution studies for temperature¹⁹ and precipitation²⁰. This is a greater problem for studies of precipitation-driven impacts, where both wetting and drying trends occur with greater temporal variation, though these make up a minority of the partially attributed studies and grid cells. We also note that not all studies in our database document impacts in response to trends in climate variables. Where impacts are attributed to extreme events or to variation in temperature or precipitation, the fact that recent trends in temperature or precipitation can be attributed to human influence provides important context, but does not allow robust attribution of those impacts. These factors limit confidence in the cases we identify as potentially attributable to anthropogenic forcing. Our approach could be extended with more fine-grained analysis of studies, or with attribution of additional signals in climate variables, in order to make more robust attribution statements.

For 80% of global land area (excluding Antarctica), trends in temperature and/or precipitation can be attributed at least in part to human influence on the climate according to our analysis (purple cells, Fig. 2c). Using gridded population density data³³, we calculate that this area is home to 85% of the world’s population. The majority of land grid cells show attributable warming trends. The exceptions are cells where trends cannot be robustly distinguished from internal variability (white cells, category 0) and cells with insufficient data to establish trends (grey cells). For precipitation, attributable wetting and drying trends show greater geographical variation. There are also more grid cells where a trend in precipitation cannot be established, or where the observed trend is opposite in sign to that in historical climate model simulations (green and yellow cells, ±4).
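The land-area and population shares can be computed along the following lines. This minimal sketch assumes all cells sit on one common latitude-longitude grid. In the text, temperature and precipitation use different resolutions, so some regridding would be needed. Each cell is assumed to carry a land fraction, a population total and an attributability flag, and the `GridCell` type is hypothetical. On a regular grid, cell area is proportional to the cosine of latitude, so high-latitude cells are down-weighted when computing area shares.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GridCell:
    lat: float            # latitude of the cell centre, degrees
    land_fraction: float  # share of the cell that is land (0..1)
    population: float     # e.g. summed from gridded population density
    attributable: bool    # attributable temperature and/or precipitation trend


def coverage(cells: Sequence[GridCell]) -> tuple[float, float]:
    """Return (share of land area, share of population) in attributable cells."""

    def area(c: GridCell) -> float:
        # Relative land area: cell area scales with cos(latitude) on a lat-lon grid.
        return math.cos(math.radians(c.lat)) * c.land_fraction

    total_area = sum(area(c) for c in cells)
    total_pop = sum(c.population for c in cells)
    covered_area = sum(area(c) for c in cells if c.attributable)
    covered_pop = sum(c.population for c in cells if c.attributable)
    return covered_area / total_area, covered_pop / total_pop
```

Excluding Antarctica, as in the text, simply means leaving its cells out of the input sequence.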

Though most of the world’s population resides in areas where trends in temperature and/or precipitation can be at least partially attributed to human influence according to our analysis, there is substantial geographical variation in the degree to which the impacts of temperature and precipitation on human and natural systems have been studied. We characterise areas with fewer than 5 weighted studies per grid cell as displaying low evidence, areas with between 5 and 20 weighted studies as displaying robust evidence, and areas with more than 20 weighted studies as displaying high evidence.
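The evidence thresholds, and how they combine with attributability to give the shading described for Fig. 2c, can be expressed as a small lookup. The text does not say whether exactly 20 weighted studies counts as robust or high evidence. The sketch assumes high, consistent with the "20 studies or more" wording used later for the attribution gap. Function names and labels are hypothetical.

```python
def evidence_level(weighted_studies: float) -> str:
    """Classify a cell's weighted study count using the thresholds in the text."""
    if weighted_studies < 5:
        return "low"
    if weighted_studies < 20:
        return "robust"
    return "high"  # assumption: exactly 20 counts as high


def map_shading(attributable: bool, weighted_studies: float) -> str:
    """Hypothetical label mirroring the Fig. 2c description: purple where
    temperature and/or precipitation trends are attributable, grey elsewhere,
    darker where evidence is robust or high."""
    hue = "purple" if attributable else "grey"
    tone = "light" if evidence_level(weighted_studies) == "low" else "dark"
    return f"{tone} {hue}"
```

For example, `map_shading(True, 12.0)` gives `"dark purple"`, the combination of robust evidence and attributable trends discussed next.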

For 48% of global land area (hosting 74% of global population), we find robust or high evidence of impacts on human and natural systems colocated with attributable temperature or precipitation trends (Fig. 2c). Areas with this combination of evidence are indicated by the darker purple cells. These constitute almost all grid cells in Western Europe, North America, South and East Asia, and there are parts of all continents for which we have similar pockets of substantial preliminary evidence.

However, for 33% of global land area (hosting 11% of global population), there is evidence that long-term trends in precipitation and temperature are attributable at least in part to human influence. Yet the existing literature appears to contain relatively little evidence on how these trends affect human and natural systems (Fig. 2c, lightest purple shading). In line with research measuring climate impacts using remote sensing³⁴, this imbalance suggests that the lack of evidence from individual studies is more likely due to these locations being less intensively studied than to an absence of impacts. Parts of Western Africa and of South-east, Western and Northern Asia contain several such lightly shaded grid cells. In these cells there is evidence that the climate (temperature and/or precipitation) has changed because of human influence, but little evidence on how this may be affecting human and natural systems. These evident gaps point to a lack of impacts research commensurate with current knowledge of how the local climate is changing.

Some of the spatial features can be explained by geographical characteristics. Among the regions with limited evidence are vast, sparsely populated and difficult-to-reach areas with a comparably uniform biosphere and climate, such as Siberia or the Sahara. Beyond these features, however, our results clearly reveal a substantial 'attribution gap'. We find that 23% of the population of low income countries live in areas with low impact evidence despite at least partially attributable trends in temperature and/or precipitation (Fig. 2d). In high income countries, this figure is only 3%. A density of 5 or more studies per grid cell with attributable impacts is 1.76 times as prevalent by population in high income countries (88%) as in low income countries (50%). A density of 20 or more studies with attributable impacts is more than 4 times as prevalent (81% compared to 17%).
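The population shares behind the attribution gap can be sketched with a grouped sum. The example assumes each grid cell is assigned to a single income group; in reality, cells straddling borders would need splitting. It uses synthetic data chosen to reproduce the 88% and 50% shares quoted above, and the function name is hypothetical. The ratios in the text follow directly: 88/50 = 1.76 and 81/17 ≈ 4.8.

```python
from collections import defaultdict
from typing import Iterable

# (income_group, population, attributable_trend, weighted_studies) per grid cell
CellRecord = tuple[str, float, bool, float]


def share_by_group(cells: Iterable[CellRecord], min_studies: float) -> dict[str, float]:
    """Population share per income group living in cells that have an
    attributable trend AND at least `min_studies` weighted studies."""
    total: defaultdict[str, float] = defaultdict(float)
    covered: defaultdict[str, float] = defaultdict(float)
    for group, population, attributable, studies in cells:
        total[group] += population
        if attributable and studies >= min_studies:
            covered[group] += population
    return {group: covered[group] / total[group] for group in total}


# Synthetic cells (not real data) chosen to reproduce the 88% vs 50% shares.
EXAMPLE_CELLS: list[CellRecord] = [
    ("high", 88.0, True, 25.0),
    ("high", 12.0, False, 30.0),
    ("low", 50.0, True, 6.0),
    ("low", 30.0, True, 1.0),
    ("low", 20.0, False, 0.0),
]

shares = share_by_group(EXAMPLE_CELLS, min_studies=5)
ratio = shares["high"] / shares["low"]  # 0.88 / 0.50 = 1.76
```

Raising `min_studies` to 20 reproduces the stricter "high evidence" comparison.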

In the remaining grey grid cells (Fig. 2c), trends in precipitation and temperature have not been attributed to human influence on the climate according to the methodology in refs. 19 and 20, as applied to CMIP6 models. This does not rule out the possibility that trends in precipitation or temperature driven at least in part by human influence have occurred in these regions. Some observed changes that actually include anthropogenic contributions may not yet be attributable at the grid cell level. Possible reasons include a lack of adequate observational data, high levels of natural variability compared to the climate change signal, and limitations in models or estimated climate forcings. This categorisation of individual grid cells may well change as new observational data are collected, as models improve, as the global climate continues to warm, or as detection/attribution methodologies improve. Darker grey grid cells (10% of analysed land area) have no detectable temperature or precipitation trends that can be attributed to human influence at the grid cell level. In these cells, there nevertheless appears to be substantial evidence that local trends in some climate variables lead to impacts on human and natural systems. For example, many studies refer to the impacts of temperature in the state of Western Australia, yet an attributable temperature trend can be demonstrated for only 22 of the state's 40 grid cells. For 16 of the remaining 18 cells, a lack of data means that a detectable trend cannot be established, and for the final 2 cells, no attributable trend can be established.

The lightest grey cells (17% of land area) describe areas where we do not detect anthropogenic influence on regional temperature or precipitation and find few publications about the impacts of temperature or precipitation on human and natural systems. Apart from high latitudes and over the ocean, these cells are primarily in Africa. Consider the light grey patch over the central part of sub-Saharan Africa. There, limitations of observational data or models, or a low signal-to-noise ratio, mean that we are unable to attribute temperature or precipitation trends to human influence on the climate using the methodologies employed here (see Extended Fig. 4). We have also identified few studies analysing the impacts of climate change on human and natural systems in this region. These evidence gaps constitute significant blind spots in our understanding of climate impacts and, in some cases, in our understanding of attributable anthropogenic influence on regional precipitation and/or temperature.

In total, 57,366 studies discuss impacts related to a driver which our analysis suggests can be attributed in part to human influence on the climate in at least one grid cell to which the study refers. We find hundreds of partially or mostly attributable studies (where there are attributable trends in the relevant climate variable for at least 1% or more than 50% of grid cells respectively) in each impact category across all continents (Fig. 3, indicated by the darker green and purple bars). This figure ranges from 268 (143-514) studies of impacts on mountains, snow and ice in Africa to 7,835 (4,308-13,552) studies of impacts on terrestrial ecosystems in North America. Wide confidence intervals here reflect the compound uncertainty deriving from classification of relevance, impact and driver.
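The distinction between partially and mostly attributable studies reduces to one quantity: the share of a study's grid cells with an attributable trend in its driver. The sketch treats the two labels as mutually exclusive, so a study above 50% is reported only as "mostly". This is an assumption about how the counts in Fig. 3 are partitioned, and the function name is hypothetical.

```python
Cell = tuple[int, int]  # hypothetical (row, column) grid index


def study_attribution(cells: list[Cell], attributable_cells: set[Cell]) -> str:
    """Label a study by the share of its cells with an attributable trend in
    its driver: 'mostly' (>50%), 'partially' (>=1%), otherwise 'not'."""
    unique = set(cells)
    if not unique:
        return "not"  # study could not be located on the grid
    share = len(unique & attributable_cells) / len(unique)
    if share > 0.5:
        return "mostly"
    if share >= 0.01:
        return "partially"
    return "not"
```

For any study covering fewer than 100 cells, a single attributable cell already exceeds the 1% threshold. Only very large-scale studies (such as continental ones) can have an attributable cell and still fall below it.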

Our analysis also allows quantification of how the share of research on each impact category varies from continent to continent. For example, research on human and managed systems makes up 12% of all research globally, but only 10% of research in Europe, compared to 19% in Africa. This focus is striking: the absolute number of such studies in Africa (1,466) is similar to that in Europe (1,799), despite the vast difference in the total number of studies between the two continents. The greater share of research in Africa documenting impacts in human and managed systems may reflect the high vulnerability of sub-Saharan Africa in particular to climate impacts³⁵.

We develop a novel two-step attribution process which combines a transparent and reproducible^36,37 machine learning approach to identifying studies on observed climate impacts with model-based assessments of detectable anthropogenic contributions to historical temperature and precipitation trends. Using machine learning to scale up evidence synthesis allows us to map 100,000 studies of climate impacts, providing the most comprehensive picture of the evidence base to date. Bringing together these two lines of evidence on climate change and climate impacts provides a new bridge between the climate science community and the impacts, adaptation, and vulnerabilities communities, and highlights the synergistic nature of their approaches.

First, our spatially resolved approach allows for the systematic provision of regional to local, sector-specific climate impact information to local or regional experts and adaptation practitioners. This opens the prospect of a novel climate service supporting the uptake of scientific information in local contexts and providing relevant information for adaptation action. Second, the quantification of an “attribution gap” highlights the need for more research on climate impacts in low income countries. Furthermore, the automated nature of the assessment allows for continuous updating of the database, creating a ‘living’ evidence map. This map can also be improved and extended by incorporating additional sources of relevant publications (e.g. non-English-language evidence, or improved/expanded regional detection/attribution studies) and by targeted assisted learning in regional or topical areas of interest.

The database we compile is vast, but neither complete nor perfect. Our systematic query-based literature search in the Web of Science and Scopus, two large bibliographic databases, is extensive, but will still miss some relevant studies. The selection and categorisation of studies was achieved using machine learning, meaning that our results are subject to additional uncertainties, which compound at each level of classification. Further, documents were coded only at the abstract level, and only the abstracts were used as inputs to our classifiers. The information we extract is relatively simple (the impact area studied and the documented driver). We therefore expect it to be covered in a study's abstract, which provides a condensed summary of its findings. Full texts are noisy: alongside the results and topic of a study, they contain contextual information and discussion of related research. Applying classifiers to them would greatly increase the risk of false positives. We thus consider our approach well justified for such high-level syntheses.

The database we assemble will also incorrectly exclude some relevant documents and contain some documents that have been incorrectly included or incorrectly coded, but the approach enables us to report both classifier performance and associated uncertainties. Additionally, some included studies may be of low quality. Neither the human reviewers nor the machine learning pipeline followed a process for critical appraisal, which is a key component of formal systematic reviews. Some systems are also subject to other anthropogenic interference, such as the global biosphere, managed systems like agriculture, or human systems themselves. For these, identifying a robust climate change driver requires careful assessment of other socio-economic factors^38,39, which adds further complexity⁴⁰.

The two-step attribution process is also only applied for the subset of papers which provide evidence on impacts driven by temperature and precipitation. Exploring the role of human influence for studies analysing the effects of factors other than trends in mean temperature or precipitation as the main driver would require additional attribution strategies, but these could in principle be combined with individual studies in similar ways. There is a growing literature on attributable human influence on a number of climate metrics at the regional scale as well as extreme events^41–43, and therefore much scope for expansion of this approach. Finally, we note that plausible causal chains of cascading impacts are not covered by our attribution approach (such as temperature driving an increase in drought, leading to reduced agricultural yields) except where studies address each part of the causal chain.

These caveats highlight that the type of machine learning-assisted evidence map we present here is no substitute for careful assessment by experts, either in the context of a gold-standard systematic review⁴⁴ or in IPCC assessments. However, in an age of “big literature”^8,10, it is an invaluable complement. The use of machine learning means we consider more evidence than would otherwise be feasible, showing where evidence appears to be more prevalent and where important gaps can be observed. While traditional assessments can offer relatively precise but incomplete pictures of the evidence, our machine-learning-assisted approach generates an expansive preliminary but quantifiably uncertain map. Further, it enables us to provide an automated, living systematic map of climate impacts that can be readily updated. Ultimately, we hope that our global, living, automated, and multi-scale database will help to jump-start a host of reviews of climate impacts on particular topics or particular geographic regions.

Machine-learning pipelines such as the one developed here will be useful in preparing the IPCC for the age of big literature by scaling systematic evidence mapping approaches. Beyond this, our results show how synthesis and transparency can be lifted to new levels by combining so-far disparate lines of evidence and by reporting classifier performance as well as associated uncertainties. If science advances by standing on the shoulders of giants, in times of ever-expanding scientific literature giants’ shoulders become harder to reach. Our computer-assisted evidence mapping approach can offer a leg-up.

Competing interests: The authors declare no competing interests.

Author contribution statement

M.W.C., J.C.M. and C-F.S. designed the research. M.W.C. developed the coding platform and machine learning pipeline to identify studies, with advice from M.R. M.W.C., C-F.S., G.H., Q.L. and E.T. developed the codebook and coordinated screening and coding. M.W.C., Q.L., S.N. and C-F.S. conceptualised the link to detection and attribution data. S.N. performed the reanalysis of temperature and precipitation trends with T.R.K. M.W.C. and S.N. designed and implemented the matching of studies with detection and attribution data. All other authors contributed to screening and coding studies. M.W.C., C-F.S., J.C.M., Q.L. and S.N. wrote the manuscript with contributions from all authors.

1. IPCC. Climate Change 2014: Impacts, Adaptation, and Vulnerability. Part A: Global and Sectoral Aspects. Contribution of Working Group II to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. (Cambridge University Press, 2014).

2. Minx, J. C., Callaghan, M., Lamb, W. F., Garard, J. & Edenhofer, O. Learning about climate change solutions in the IPCC and beyond. Environ. Sci. Policy 77, 252–259 (2017).

3. Cramer, W. et al. Detection and attribution of observed impacts. in Climate Change 2014: Impacts, Adaptation, and Vulnerability. Part A: Global and Sectoral Aspects. Contribution of Working Group II to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (eds. Field, C. B. et al.) 979–1037 (Cambridge University Press, 2014).

4. Hansen, G. & Stone, D. Assessing the observed impact of anthropogenic climate change. Nat. Clim. Change 6 , 532–537 (2016).

5. Haunschild, R., Bornmann, L. & Marx, W. Climate Change Research in View of Bibliometrics. PLOS ONE 11 , e0160393 (2016).

6. Grieneisen, M. L. & Zhang, M. The current status of climate change research. Nat. Clim. Change 1 , 72–73 (2011).

7. Haddaway, N. R. & Pullin, A. S. The Policy Role of Systematic Reviews: Past, Present and Future. Springer Sci. Rev. 2 , 179–183 (2014).

8. Callaghan, M. W., Minx, J. C. & Forster, P. M. A topography of climate change research. Nat. Clim. Change 10 , 118–123 (2020).

9. Porciello, J., Ivanina, M., Islam, M., Einarson, S. & Hirsh, H. Accelerating evidence-informed decision-making for the Sustainable Development Goals using machine learning. Nat. Mach. Intell. 2 , 559–565 (2020).

10. Nunez-Mir, G. C., Iannone, B. V., Curtis, K. & Fei, S. Evaluating the evolution of forest restoration research in a changing world: a “big literature” review. New For. 46 , 669–682 (2015).

11. Westgate, M. J. et al. Software support for environmental evidence synthesis. Nat. Ecol. Evol. 2 , 588–590 (2018).

12. Lamb, W. F., Creutzig, F., Callaghan, M. W. & Minx, J. C. Learning about urban climate solutions from case studies. Nat. Clim. Change 9 , 279–287 (2019).

13. Schleussner, C.-F. & Fyson, C. L. Scenarios science needed in UNFCCC periodic review. Nat. Clim. Change 10 , (2020).

14. Fankhauser, S. Adaptation to Climate Change. Annu. Rev. Resour. Econ. 9 , 209–230 (2017).

15. Bedsworth, L. W. & Hanak, E. Adaptation to Climate Change. J. Am. Plann. Assoc. 76 , 477–495 (2010).

16. IPCC. Managing the Risks of Extreme Events and Disasters to Advance Climate Change Adaptation. (Cambridge University Press, 2012).

17. Hallegatte, S. & Mach, K. J. Make climate-change assessments more relevant. Nature 534 , 613 (2016).

18. The need for bottom-up assessments of climate risks and adaptation in climate-sensitive regions. Nat. Clim. Change. https://www.nature.com/articles/s41558-019-0502-0.

19. Knutson, T. R., Zeng, F. & Wittenberg, A. T. Multimodel Assessment of Regional Surface Temperature Trends: CMIP3 and CMIP5 Twentieth-Century Simulations. J. Clim. 26 , 8709–8743 (2013).

20. Knutson, T. R. & Zeng, F. Model Assessment of Observed Precipitation Trends over Land Regions: Detectable Human Influences and Possible Low Bias in Model Trends. J. Clim. 31 , 4617–4637 (2018).

21. Nerem, R. S. et al. Climate-change-driven accelerated sea-level rise detected in the altimeter era. Proc. Natl. Acad. Sci. U. S. A. 115 , 2022–2025 (2018).

22. Gudmundsson, L., Leonard, M., Do, H. X., Westra, S. & Seneviratne, S. I. Observed Trends in Global Indicators of Mean and Extreme Streamflow. Geophys. Res. Lett. 46 , 756–766 (2019).

23. Padrón, R. S. et al. Observed changes in dry-season water availability attributed to human-induced climate change. Nat. Geosci. 13 , 477–481 (2020).

24. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ArXiv181004805 Cs (2019).

25. Sanh, V., Debut, L., Chaumond, J. & Wolf, T. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv191001108 Cs (2020).

26. Halterman, A. Mordecai: Full Text Geoparsing and Event Geocoding. J. Open Source Softw. 2 , 91 (2017).

27. Lane, J. E., Kruuk, L. E. B., Charmantier, A., Murie, J. O. & Dobson, F. S. Delayed phenology and reduced fitness associated with climate change in a wild hibernator. Nature 489 , 554–557 (2012).

28. Zhang, Y. Q., Yu, C. H. & Bao, J. Z. Acute effect of daily mean temperature on ischemic heart disease mortality: a multivariable meta-analysis from 12 counties across Hubei Province, China. Zhonghua Yu Fang Yi Xue Za Zhi 50 , 990–995 (2016).

29. Barry, A. A. et al. West Africa climate extremes and climate change indices. Int. J. Climatol. 38 , e921–e938 (2018).

30. Hegerl, G. C. et al. Good Practice Guidance Paper on Detection and Attribution Related to Anthropogenic Climate Change. in Meeting Report of the Intergovernmental Panel on Climate Change Expert Meeting on Detection and Attribution of Anthropogenic Climate Change (eds. Stocker, T. F. et al.) (IPCC Working Group I Technical Support Unit, University of Bern, 2010).

31. Rosenzweig, C. et al. Assessment of observed changes and responses in natural and managed systems. in Climate Change 2007: Impacts, Adaptation and Vulnerability. Contribution of Working Group II to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change 79–131 (Cambridge University Press, 2007).

32. Rosenzweig, C. et al. Attributing physical and biological impacts to anthropogenic climate change. Nature 453 , 353–357 (2008).

33. Center for International Earth Science Information Network - CIESIN - Columbia University. Gridded Population of the World, Version 4 (GPWv4): Population Density, Revision 11. (2018).

34. Frank, D. et al. Effects of climate extremes on the terrestrial carbon cycle: concepts, processes and potential future impacts. Glob. Change Biol. 21 , 2861–2880 (2015).

35. Schleussner, C.-F. et al. 1.5°C Hotspots: Climate Hazards, Vulnerabilities, and Impacts. Annu. Rev. Environ. Resour. 43 , 135–163 (2018).

36. Peng, R. D. Reproducible Research in Computational Science. Science 334 , 1226–1227 (2011).

37. Müller-Hansen, F., Callaghan, M. W. & Minx, J. C. Text as big data: Develop codes of practice for rigorous computational text analysis in energy social science. Energy Res. Soc. Sci. 70 , 101691 (2020).

38. Shepherd, T. G. Storyline approach to the construction of regional climate change information. Proc. R. Soc. Math. Phys. Eng. Sci. 475 , 20190013 (2019).

39. Rosenzweig, C. & Neofotis, P. Detection and attribution of anthropogenic climate change impacts. WIREs Clim. Change 4 , 121–150 (2013).

40. Mengel, M., Treu, S., Lange, S. & Frieler, K. ATTRICI 1.0 - counterfactual climate for impact attribution. Geosci. Model Dev. Discuss. 1–26 (2020) doi:10.5194/gmd-2020-145.

41. Diffenbaugh, N. S. Verification of extreme event attribution: Using out-of-sample observations to assess changes in probabilities of unprecedented events. Sci. Adv. 6 , eaay2368 (2020).

42. Herring, S. C., Christidis, N., Hoell, A., Hoerling, M. P. & Stott, P. A. Explaining Extreme Events of 2019 from a Climate Perspective. Bull. Am. Meteorol. Soc. 102 , S1–S116 (2021).

43. Gudmundsson, L. et al. Globally observed trends in mean and extreme river flow attributed to climate change. Science 371 , 1159–1162 (2021).

44. Cochrane Handbook for Systematic Reviews of Interventions. (John Wiley & Sons, 2019).

Outline

An overview of each of the steps taken in this study is given in Extended Fig. 1. These are outlined briefly here and explained in detail in the following sections. Over 600,000 documents were retrieved from bibliographic databases using a search query. 2,373 of these documents were screened for relevance and coded for impact type and driver by human reviewers. The implicit inclusion and coding decisions for a further 351 documents were extracted from Tables 18.5-18.9 in the contribution of Working Group II to the Fifth Assessment Report of the IPCC¹. Machine learning classifiers were trained to predict the relevance of documents from their titles and abstracts, and evaluated using nested cross-validation. The best-performing classifier was then fit with all labelled documents, using bootstrapping to make predictions with confidence intervals for the relevance of the remaining documents. Documents predicted to be irrelevant were discarded, as were documents labelled by reviewers as irrelevant. Multilabel classifiers were then trained using the remaining labelled relevant documents, and assessed in a similar fashion using cross-validation. Predictions for impact type and driver were then made for the remaining unlabelled documents. Geographical entities were extracted from the included studies using a geoparser, and each entity was matched to the set of 2.5 degree grid cells overlapping it. Observed trends in precipitation and temperature were collected for 2.5 and 5 degree grid cells, respectively, and compared with climate model simulations to assess whether observed trends were detectable (i.e., unusual compared with natural variability, and in the same direction as simulated by historical forcing climate model simulations) and at least partially attributable to human influence on the climate, as discussed below.
Finally, documents predicted to be driven by temperature or precipitation were extracted from the database of studies and merged with the grid cell attribution datasets so that each document could be characterised by the presence of human-attributable climate trends in the grid cells it referred to, and each grid cell could be characterised by the number of studies referring to it.

Search, screening and coding

Search Strategy

Potentially relevant documents were assembled by developing a query to search bibliographic databases. To validate the query, we tested it against a set of records known to be relevant. Tables 18.5-18.9 in the contribution of Working Group II to the Fifth Assessment Report of the IPCC² (AR5 WGII) contain the studies considered in its assessment of the observed impacts of climate change. After extracting these references, we built a query that would return all of the references in the tables that specifically referred to the role of climate change (rather than to counterfactual explanations for impacts). The query is reproduced in the Supplementary Information (in Web of Science format; the same query was used for Scopus) and is made up of three lists of keywords linked with Boolean ANDs. The first set of keywords refers to climate and climate variables, the second to impacts, and the third to observations and attribution.
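The three-part structure of the query, and its validation against the AR5 WGII references, can be illustrated with a short Python sketch. The keyword lists below are hypothetical placeholders (the real lists are in the Supplementary Information), `TS=` is the Web of Science topic field tag, and `build_query` and `recall` are illustrative helpers, not part of the authors' pipeline. `recall` mirrors the validation step: the share of known-relevant references that a candidate query retrieves.

```python
def build_query(keyword_groups, field="TS"):
    """Join keywords within a group with OR and join the groups with AND."""
    clauses = []
    for group in keyword_groups:
        # Quote multi-word phrases so they are matched as exact phrases.
        terms = [f'"{kw}"' if " " in kw else kw for kw in group]
        clauses.append("(" + " OR ".join(terms) + ")")
    return f"{field}=(" + " AND ".join(clauses) + ")"


def recall(retrieved_ids, known_relevant_ids):
    """Share of known-relevant records (e.g. AR5 WGII references) the query returns."""
    known = set(known_relevant_ids)
    return len(known & set(retrieved_ids)) / len(known)


# Hypothetical, heavily shortened keyword lists for illustration only.
climate_terms = ["climat*", "temperature*", "precipitation"]
impact_terms = ["impact*", "effect*", "response*"]
observation_terms = ["observ*", "detect*", "attribut*"]

query = build_query([climate_terms, impact_terms, observation_terms])
print(query)
```

A query that retrieves every AR5 WGII reference scores a recall of 1.0; recall alone says nothing about how many irrelevant records are also returned, which is what the subsequent screening and classification steps deal with.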

The query was performed on Scopus and the following citation indices from the Web of Science Core Collection:

Science Citation Index Expanded (SCI-EXPANDED): 1900–present
Social Sciences Citation Index (SSCI): 1900–present
Arts & Humanities Citation Index (A&HCI): 1975–present
Conference Proceedings Citation Index – Science (CPCI-S): 1990–present
Conference Proceedings Citation Index – Social Science & Humanities (CPCI-SSH): 1990–present
Emerging Sources Citation Index (ESCI): 2015–present

The queries were updated on October 19, 2020. Extended Table 1 documents the number of documents retrieved from each database and the total number of records after deduplication, which matched records on publication year and on fuzzy title similarity measured with trigrams. The query results were imported into a database and deduplicated using the NACSOS review platform³.
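Fuzzy title matching with trigram similarity can be sketched in plain Python. This is an illustration under stated assumptions, not the NACSOS implementation: the word padding loosely follows the convention of PostgreSQL's pg_trgm extension (two leading spaces and one trailing space per word), similarity is the number of shared trigrams divided by the number of distinct trigrams, and the 0.8 threshold and the helper names are hypothetical.

```python
import re


def trigrams(text):
    """Set of character trigrams of each word, padded in the style of pg_trgm."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of the two strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def deduplicate(records, threshold=0.8):
    """Keep the first of any (title, year) records with equal years and similar titles."""
    kept = []
    for title, year in records:
        if not any(year == y and similarity(title, t) >= threshold for t, y in kept):
            kept.append((title, year))
    return kept


records = [
    ("Observed trends in global indicators of mean and extreme streamflow", 2019),
    ("Observed Trends in Global Indicators of Mean and Extreme Streamflow.", 2019),
    ("Globally observed trends in mean and extreme river flow attributed to climate change", 2021),
]
unique = deduplicate(records)
print(len(unique))  # 2
```

The pairwise loop is quadratic in the number of records; at the scale of several hundred thousand records, an indexed trigram search in the database would be the practical choice.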

Inclusion and exclusion criteria

We take a broad definition of climate impacts to include all studies relevant to understanding the observed impacts of climate change. This includes:

Studies which explicitly link impacts to climate change (8% of coded studies)
Studies which link impacts to trends in climate drivers like temperature or precipitation (42% of coded studies)
Studies which link impacts to extreme climate events (6% of coded studies)
Studies which link impacts to variation in climate drivers (39% of coded studies)
Studies which document regional or local climate trends (11% of coded studies)

Documents which only provide evidence of likely future impacts of climate change were excluded.

With this broad definition of climate impacts evidence, we do not claim that each study is in and of itself evidence of the impacts of climate change. Rather, taken together, and in the context of observations and climate models, this collection of included studies constitutes the evidence base necessary for understanding climate impacts.

Coding impacts and drivers

For each document selected for inclusion, reviewers coded the attribution category, the climate impacts and, where appropriate, the drivers. Impacts and their drivers were chosen from 75 specific categories, which were aggregated according to the hierarchy of categories given in the supplementary file category_aggregation.csv. 93% of included studies were coded with impacts in one or more of the 5 broad impact categories used by IPCC AR5:

Mountains, snow and ice (11.42% of included studies)
Rivers, lakes and soil moisture (21.27% of included studies)
Terrestrial ecosystems (33.13% of included studies)
Coastal and marine ecosystems (13.21% of included studies)
Human and managed systems (21.42% of included studies)

Remaining studies documented only trends in climate variables without reference to any of these systems.

Screening and Coding

A total of 2,373 documents were screened by members of the author team using the NACSOS platform³, of which 1,125 were included as relevant and coded for impacts and drivers. The median number of documents coded per reviewer was 133, and the mean was 173.

In addition, documents extracted from Tables 18.5-18.9 in AR5 WGII were automatically labelled as relevant and tagged with the broad impact categories corresponding to the table in which they were found.

In order to mitigate a highly unbalanced sample (few relevant documents among many irrelevant documents), and to make best use of reviewing resources, some documents were selected for screening using an adapted active learning pipeline. With active learning, a classifier (see following section for details) is trained using existing screening decisions to predict the relevance of documents yet to be reviewed. Usually, reviewers screen subsequent documents in decreasing order of predicted relevance, and the classifier is periodically updated with the newly generated labels. Given that our goal was not to screen all relevant documents but to generate useful labels efficiently, we instead created samples of documents with relevance predictions greater than 0.2, 0.3 and 0.4, thereby excluding documents with a low likelihood of being relevant. Documents were first screened by a small group of reviewers who developed the categorisation scheme for impacts and drivers. A subsequent set of documents was screened by all reviewers, and differences in coding were discussed and alterations recorded. Reviewers were then split by expertise into teams corresponding to the AR5 impact categories, and screened documents with a predicted relevance to the given category greater than 0.33. Each team screened a sample of documents and discussed differences in screening and coding decisions. Teams reached average Cohen’s kappa scores between 0.66, indicating substantial agreement, and 1.0, indicating full agreement. After this initial round of double coding, reviewers proceeded to screen documents individually. Additional documents were selected for screening using keyword searches (https://github.com/mcallaghan/regional-impacts-map/blob/master/literature_identification/category_keywords.ipynb) to identify documents from infrequently appearing subcategories.
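Cohen's kappa corrects the raw agreement between two reviewers for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed share of matching decisions and p_e is the share expected from each reviewer's label frequencies. The sketch below is a plain re-implementation of that formula, not the tool the authors used, and the include/exclude decisions are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two reviewers who labelled the same documents."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each reviewer's label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2
    if expected == 1:
        return 1.0  # both reviewers used a single, identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical include (1) / exclude (0) decisions by two reviewers.
reviewer_1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
reviewer_2 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # 0.6
```

Here the reviewers agree on 8 of 10 documents (p_o = 0.8), but because each includes half the documents, half of that agreement is expected by chance (p_e = 0.5), giving a kappa of 0.6.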

Because the documents selected using the methods described above are unlikely to be representative of the full set of documents returned by the query, we also screened 732 documents drawn at random which we used for validation.

Machine learning classifiers for inclusion, impact type and drivers

We first trained a binary classifier to predict the inclusion/exclusion decision given by reviewers. We use a nested cross-validation procedure (Extended Fig. 2) to optimise parameter settings and evaluate the performance of a support vector machine (SVM) classifier⁴ as well as a pre-trained DistilBERT model fine-tuned with our labelled dataset⁵. Support vector machines have a long history of applications in evidence synthesis⁶, while the BERT⁷ (Bidirectional Encoder Representations from Transformers) model recently achieved state-of-the-art results in a variety of natural language processing challenges and has begun to be used in evidence synthesis pipelines⁸.

In our nested cross-validation procedure, we first separate the documents that were drawn at random from the population of documents identified by the query from the remaining, unrepresentative documents. Only randomly selected documents are used in validation and test sets, so that the estimate of classifier performance on the whole dataset is not biased. In the outer cross-validation loop, a separate test set is drawn from the randomly selected documents for each of the k folds, and all other documents are assigned to the training set. The inner CV loop draws k inner validation sets from the remaining random documents in the training set, and allocates all other documents in the training set to an inner training set. The inner loop is used to optimise hyperparameters for each model using grid search: a model is initialised with each combination of hyperparameters, fit on each inner training set and evaluated on each inner validation set. The combination of hyperparameters with the best mean F1 score across inner folds is selected as the best model. This model is then fit with the training data from the outer CV and evaluated with the test data. The outer CV thus returns k scores for each metric, which we report below.
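The splitting logic of this procedure can be sketched with NumPy. This is a minimal sketch under stated assumptions: `nested_splits` is a hypothetical helper, documents are shuffled with a fixed seed, and the hyperparameter grid search and model fitting are omitted. It only shows how randomly sampled documents are confined to test and validation sets while every non-random document always stays in training.

```python
import numpy as np


def nested_splits(is_random, k_outer=5, k_inner=5, seed=0):
    """Yield (train, test, inner) index splits for nested cross-validation.

    Only randomly sampled documents (is_random True) ever enter a test or
    validation set; all other labelled documents are always used for training.
    `inner` is a list of (inner_train, validation) index pairs.
    """
    rng = np.random.default_rng(seed)
    is_random = np.asarray(is_random, dtype=bool)
    random_idx = rng.permutation(np.flatnonzero(is_random))
    other_idx = np.flatnonzero(~is_random)
    for test in np.array_split(random_idx, k_outer):
        train_random = np.setdiff1d(random_idx, test)
        train = np.concatenate([train_random, other_idx])
        inner = []
        for val in np.array_split(rng.permutation(train_random), k_inner):
            inner.append((np.setdiff1d(train, val), val))
        yield train, test, inner


# Hypothetical labelled set: 20 randomly sampled and 30 actively selected documents.
is_random = [True] * 20 + [False] * 30
for train, test, inner in nested_splits(is_random):
    # The grid search would use the inner splits here; the best configuration
    # would then be refit on `train` and scored on `test`.
    print(len(train), len(test), len(inner))
```

Each outer fold here tests on 4 random documents and trains on the other 46, so the five outer test sets together cover every randomly sampled document exactly once.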

We evaluated our binary inclusion/exclusion classifiers with 5 inner and outer folds. DistilBERT clearly outperformed SVM across all metrics, achieving an average F1 score of 0.71, and an average ROC AUC score of 0.92 (Extended Fig. 3). A final DistilBERT model configuration was chosen using the same procedure on the outer folds. Each combination of parameter settings was tested on each outer fold, and the combination of parameter settings with the highest mean F1 score was selected.

This final model was used to predict the relevance of all remaining documents. To create a confidence interval for each prediction, 5 versions of the final model were trained on 5 folds of the data. Upper and lower estimates for each document are given by the mean plus or minus one standard deviation. All documents where the lower estimate was below 0.5 were excluded from the study.
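The confidence interval and exclusion rule take only a few lines of NumPy. In this sketch the five rows stand in for the five fold-specific models, the scores are invented, and the use of the population standard deviation (`ddof=0`) is an assumption, since the text does not specify it; `relevance_bounds` is a hypothetical helper.

```python
import numpy as np


def relevance_bounds(predictions):
    """Lower, mean and upper relevance estimates across an ensemble of models.

    `predictions` has shape (n_models, n_documents); bounds are mean -/+ 1 SD.
    """
    predictions = np.asarray(predictions, dtype=float)
    mean = predictions.mean(axis=0)
    sd = predictions.std(axis=0)  # population SD (ddof=0); an assumption
    return mean - sd, mean, mean + sd


# Hypothetical predicted probabilities from 5 models (rows) for 3 documents (columns).
scores = [[0.90, 0.60, 0.20],
          [0.80, 0.40, 0.10],
          [0.85, 0.70, 0.30],
          [0.95, 0.50, 0.20],
          [0.90, 0.45, 0.25]]
lower, mean, upper = relevance_bounds(scores)
included = lower >= 0.5  # exclude documents whose lower estimate is below 0.5
print(included)  # [ True False False]
```

The second document has a mean prediction above 0.5 (0.53) but is still excluded, because the models disagree enough that its lower estimate falls below the threshold.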

We then trained multilabel classifiers to predict the impact category and the driver category of included documents. Classifier parameters were optimised, and classifiers evaluated, with the same nested cross-validation method, using only those labelled documents which were included. Because documents selected for screening using the active learning process are broadly representative of the documents to which the multilabel classifiers are applied, all documents selected in this manner are also used for validation. With fewer documents in this set, and fewer of them drawn from a random sample, we used a smaller k value of 3 for cross-validation. We treat each class equally and optimise using the macro-averaged F1 score. For the prediction of impact categories, DistilBERT outperforms SVM, achieving a macro-averaged F1 score of 0.84 and a macro-averaged ROC AUC score of 0.95 (Extended Fig. 4). For the classification of climate drivers, we optimise for the macro-averaged F1 score for the categories temperature and precipitation. DistilBERT again outperforms SVM, achieving an average F1 score of 0.79 and an average ROC AUC score of 0.86. Where no individual class has a predicted probability greater than 0.5, documents are classed as “Other systems”.
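The thresholding rule for multilabel predictions and the macro-averaged F1 score can be sketched as follows. The 0.5 threshold and the “Other systems” fallback come from the text; `assign_labels` and `macro_f1` are hypothetical helpers, and scoring a class with no true or predicted positives as 0 is an assumption that follows a common convention.

```python
import numpy as np


def assign_labels(probabilities, class_names, threshold=0.5, fallback="Other systems"):
    """Every class whose probability exceeds the threshold, else the fallback label."""
    labels = []
    for row in np.asarray(probabilities, dtype=float):
        chosen = [name for name, p in zip(class_names, row) if p > threshold]
        labels.append(chosen or [fallback])
    return labels


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores for binary indicator matrices."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    scores = []
    for j in range(y_true.shape[1]):
        tp = np.sum(y_true[:, j] & y_pred[:, j])
        fp = np.sum(~y_true[:, j] & y_pred[:, j])
        fn = np.sum(y_true[:, j] & ~y_pred[:, j])
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


classes = ["Terrestrial ecosystems", "Coastal and marine ecosystems", "Human and managed systems"]
print(assign_labels([[0.9, 0.1, 0.6], [0.2, 0.3, 0.4]], classes))
```

Macro-averaging gives a rare category the same weight as a common one, which is why it suits the aim of treating each class equally.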

Detection and Attribution

To put our database of impact studies in context, we match studies with grid cell level detection and attribution of temperature and precipitation trends to human influence on the climate.

Updating attribution of temperature and precipitation trends

We followed a previously published methodology⁹,¹⁰ used to attribute observed temperature and precipitation trends to human influence around the globe at the level of typical climate model grid cells (5 degree grid boxes for temperature and 2.5 degree grid boxes for precipitation). The different resolutions reflect those of the available observational datasets, which we did not regrid for our project. The method relies on a comparison of grid-box-scale trends in observational datasets for temperature (HadCRUT4 version 4.6¹¹) and precipitation (GPCC v2018, obtainable from https://psl.noaa.gov/data/gridded/data.gpcc.html) with those produced in climate model runs from CMIP6¹². The CMIP6 runs simulate climate changes over the historical period under the influence of either all forcings (i.e., both natural and anthropogenic, referred to as “ALL”) or natural forcings only (referred to as “NAT”).

We analysed the outputs of these simulations from 10 CMIP6 models, namely MIROC6, IPSL-CM6A-LR, CanESM5, HadGEM3-GC31-LL, CNRM-CM6-1, GFDL-ESM4, ACCESS-ESM1-5, BCC-CSM2-MR, NorESM2-LM and CESM2. The model selection was based on the availability of ALL, NAT and “piControl” runs (simulating internal climate variations in the absence of external forcings, apart from a constant solar forcing). The analysis tests the ability of the corresponding ALL simulations to reproduce the observed regional trends in annual mean temperature and precipitation¹³. For some models the ALL simulations were not available after 2014, in which case we appended the first few years of the ssp585 simulations of future climate conditions to match the length of the observational data.

Linear trends over 1951-2018 (for temperature) and 1951-2016 (for precipitation) were computed for each grid cell with adequate data in each observational dataset, following the criteria of refs. 9 and 10 (see Extended Figs. 6a and 6b). For temperature, we computed a linear trend for each ensemble member of the HadCRUT4 dataset, from which observed trend distributions were derived. Precipitation trends were not computed for grid cells where less than 20% of data was available in the first or last 10% of the observed time series, or where the entire time series had less than 70% of data available. For temperature, we divided the trend period into five roughly equal periods and required each period to have at least 20% temporal coverage of annual means. We considered an annual mean available if at least 40% of the months in that year were available.
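The data-availability criteria and the trend calculation can be expressed as small NumPy functions for a single grid cell. This sketch assumes missing values are stored as NaN, that the 10% edge windows are rounded to whole years, and that trends are ordinary least-squares slopes via `np.polyfit`; the HadCRUT4 ensemble handling is omitted and all function names are hypothetical.

```python
import numpy as np


def annual_means(monthly, min_fraction=0.4):
    """Annual means from a (years, 12) array; NaN where < 40% of months are present."""
    monthly = np.asarray(monthly, dtype=float)
    counts = (~np.isnan(monthly)).sum(axis=1)
    sums = np.nansum(monthly, axis=1)
    means = np.full(monthly.shape[0], np.nan)
    ok = counts >= min_fraction * monthly.shape[1]
    means[ok] = sums[ok] / counts[ok]
    return means


def temperature_coverage_ok(annual, n_periods=5, min_fraction=0.2):
    """Each of five roughly equal sub-periods needs >= 20% of annual means."""
    available = ~np.isnan(np.asarray(annual, dtype=float))
    return all(chunk.mean() >= min_fraction
               for chunk in np.array_split(available, n_periods))


def precipitation_coverage_ok(annual, edge_fraction=0.1):
    """>= 20% data in the first and last 10% of the series and >= 70% overall."""
    available = ~np.isnan(np.asarray(annual, dtype=float))
    edge = max(1, int(round(edge_fraction * available.size)))
    return bool(available[:edge].mean() >= 0.2
                and available[-edge:].mean() >= 0.2
                and available.mean() >= 0.7)


def linear_trend(years, values):
    """Ordinary least-squares slope over the non-missing values (units per year)."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    ok = ~np.isnan(values)
    slope, _intercept = np.polyfit(years[ok], values[ok], 1)
    return slope
```

For a 1951-2016 precipitation series (66 years), the edge windows are the first and last 7 years, so a cell is skipped if, for example, all of 1951-1957 is missing, even when 89% of the full series is available.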

To compare them with the observational data, the ALL and NAT runs of each model were first re-gridded onto the observational grids (5° × 5° for temperature and 2.5° × 2.5° for precipitation), excluding times and grid locations where observed data were missing, before linear trends were computed for each grid cell with adequate temporal coverage (see Extended Figs. 6c and 6d). For each model, we then assessed the potential effect of internal variability by computing trends of the same length as the period being investigated in 50 random samples of the corresponding piControl run. The model control runs had first been corrected for any long-term drift, and the anomaly series adjusted by a factor to ensure consistency of low-frequency variability between model control runs and internal variability estimated from observations (discussed further below). We then combined the resulting trend distributions from the piControl runs with the trends computed in the ensemble mean of the ALL and NAT runs. Following previous studies⁹,¹⁰, the final trend distribution for temperature was based on an aggregate of all constructed model trend distributions (and thus included the spread of different model ensemble means), whereas for precipitation an average distribution of model trends across the ensemble was used (i.e., the distribution had the average characteristics of the 10 CMIP6 models).
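One way to build the internal-variability trend distribution from a control run is sketched below. Assumptions: the control run is an annual series for one grid cell that has already been de-drifted, segments start at uniformly random positions, the variability adjustment is a simple multiplicative `scale`, and forced and unforced parts are combined by adding the control-run trends to the forced ensemble-mean trend. The text does not spell out the combination step at this level of detail, and the function names are hypothetical.

```python
import numpy as np


def control_trend_samples(control, trend_length, n_samples=50, scale=1.0, seed=0):
    """Linear trends of `trend_length` years from random segments of a control run.

    `control` is an annual, already de-drifted series; `scale` is the
    obs/model variability ratio used to rescale the control anomalies.
    """
    rng = np.random.default_rng(seed)
    anomalies = scale * (np.asarray(control, dtype=float) - np.mean(control))
    t = np.arange(trend_length)
    starts = rng.integers(0, anomalies.size - trend_length + 1, size=n_samples)
    return np.array([np.polyfit(t, anomalies[s:s + trend_length], 1)[0] for s in starts])


def forced_trend_distribution(ensemble_mean_trend, control_trends):
    """Centre the internal-variability trends on a forced (ALL or NAT) ensemble-mean trend."""
    return ensemble_mean_trend + np.asarray(control_trends, dtype=float)


# Hypothetical usage for a 68-year (1951-2018) temperature trend at one grid cell:
# nat_trends = forced_trend_distribution(nat_mean_trend, control_trend_samples(ctrl, 68))
```

Because scaling the anomalies by a factor scales every segment trend by the same factor, the variability adjustment widens or narrows the trend distribution without shifting its centre.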

Attribution categories were assigned to grid cells (Extended Fig. 6e,f) based on where their observed trend (or trend distribution, in the case of temperature) lay relative to the final trend distributions derived from the ALL and NAT runs. For grid cells where the observed trend was in the same direction (sign) as the mean of the ALL trend distribution and outside the 5th-95th percentile range of the NAT trend distribution, the observed trend was categorised as -3 (+3), -2 (+2) or -1 (+1), depending on whether it was significantly stronger than, consistent with, or significantly weaker than the simulated decrease (increase). Categories -3 (+3) and -2 (+2) are defined as decreases (increases) that are detectable and at least partially attributable to anthropogenic forcing, according to our methodology. Categories -1 (+1) are detectable but not attributable. If the observed trend was significantly different from the NAT distribution but in the opposite direction to the mean of the All-Forcing distribution, it was categorised as -4 (observed decrease, modeled increase) or +4 (observed increase, modeled decrease). All observed trends (or trend distributions, in the case of temperature) that intersected the 5th-95th percentile range of the corresponding NAT trend distributions were categorised as non-detectable, or indistinguishable from natural variability (category 0). This includes cases where observed trends or trend distributions had a mean of opposite sign to that of the ALL trend distribution but lay within the range of the NAT distribution.
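The category rules above can be summarised as a decision function for a single grid cell. This sketch treats the observed trend as a point value (for temperature the text compares a distribution of observed trends instead) and assumes that “significantly stronger” or “weaker” means lying beyond the 5th-95th percentile range of the ALL distribution; `attribution_category` is a hypothetical helper.

```python
import numpy as np


def attribution_category(obs_trend, all_trends, nat_trends):
    """Category from -4 to +4 for one grid cell, given samples of the ALL and NAT trend distributions."""
    nat_lo, nat_hi = np.percentile(nat_trends, [5, 95])
    if nat_lo <= obs_trend <= nat_hi:
        return 0  # indistinguishable from natural variability
    all_lo, all_hi = np.percentile(all_trends, [5, 95])
    sign = 1 if obs_trend > 0 else -1
    if np.sign(np.mean(all_trends)) != sign:
        return 4 * sign  # detectable, but opposite in sign to the ALL mean
    if all_lo <= obs_trend <= all_hi:
        return 2 * sign  # consistent with the simulated change: attributable
    # Outside the ALL range: stronger (attributable) or weaker (detectable only).
    stronger = obs_trend > all_hi if sign > 0 else obs_trend < all_lo
    return 3 * sign if stronger else sign
```

Under these assumptions, a warming trend that exceeds natural variability but falls short of the bulk of the ALL distribution lands in category +1: detectable, but not attributable.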

Once the grid cells had been categorised, the temperature results were re-gridded to a 2.5° × 2.5° grid so that they could be superimposed on the categories obtained for precipitation.
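Assuming the 5° and 2.5° grids share cell edges, re-gridding a categorical field is a block copy rather than an interpolation: each 5° cell maps onto the 2 × 2 block of 2.5° cells it contains. A minimal NumPy sketch, with a hypothetical function name:

```python
import numpy as np


def to_half_resolution(categories_5deg):
    """Copy each 5 degree cell's category into the 2 x 2 block of 2.5 degree cells it covers."""
    grid = np.asarray(categories_5deg)
    return grid.repeat(2, axis=0).repeat(2, axis=1)


# A global 5 degree grid (36 x 72 cells) becomes a 72 x 144 grid at 2.5 degrees.
print(to_half_resolution(np.zeros((36, 72), dtype=int)).shape)  # (72, 144)
```

Copying rather than interpolating keeps the values as valid integer categories; averaging neighbouring categories would produce meaningless intermediate values.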

Our analysis requires the internal variability for each grid location and variable to be estimated from model control runs. To compare internal variability estimated from observations with that simulated in the model control runs, Extended Figs. 7 and 8 show fractional difference maps of estimated internal low-frequency variability (model vs. observed) for each model individually and for the ensemble mean of the modeled variability (the latter being most relevant for our analysis, which is based on combined estimated variability across the models). The observed low-frequency internal variability is estimated by subtracting the multi-model ensemble All-Forcing change from the observations and computing the standard deviation of the annual residuals after applying a 7-year running mean filter. For models, we use the simulated variability from the various control runs, again smoothed with the 7-year running mean. The averaged internal low-frequency variability comparison for precipitation (Extended Fig. 7, top panel) shows red in most regions, indicating that, by this measure, the CMIP6 models tend to overestimate observed variability. Our detection results for precipitation will therefore tend to be conservative, while, conversely, our assessment of whether the All-Forcing runs are consistent with observations will tend to be liberal, because the modeled spread is relatively wide. However, blue regions in Extended Fig. 7 over some tropical areas, including parts of Africa and South America, indicate that internal low-frequency variability is undersimulated there. We therefore adjusted the control run variability and trends by the ratio [Obs. stdev / Model stdev], based on the comparisons with observed estimated internal variability in Extended Fig. 7, before computing our assessment categories.
Results without this variability adjustment (not shown) are broadly similar, but show more category -4 trends (unexplained trends of incorrect sign) over Africa, where internal low-frequency variability appears to be underestimated in models according to this analysis, and slightly less detectable human influence in middle and high latitudes, where internal variability is apparently overestimated in models.
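The variability ratio used for the adjustment can be sketched for one grid cell as follows. Assumptions: annual series, a 7-year running mean that drops the incomplete windows at both ends, and the population standard deviation; the function names are hypothetical.

```python
import numpy as np


def running_mean(x, window=7):
    """Centred running mean; years without a full window at each end are dropped."""
    return np.convolve(x, np.ones(window) / window, mode="valid")


def low_frequency_std(series, window=7):
    """Standard deviation of the 7-year smoothed series."""
    return np.std(running_mean(np.asarray(series, dtype=float), window))


def variability_ratio(observed, all_forcing_mean, control, window=7):
    """Obs. stdev / Model stdev of low-frequency internal variability at one grid cell.

    Observed internal variability is approximated by the residual of the
    observations after subtracting the multi-model All-Forcing ensemble mean.
    """
    residual = np.asarray(observed, dtype=float) - np.asarray(all_forcing_mean, dtype=float)
    return low_frequency_std(residual, window) / low_frequency_std(control, window)


# The de-drifted control-run anomalies would then be rescaled, e.g.
# adjusted_control = variability_ratio(obs, all_mean, control) * control
```

A ratio above 1 marks a cell where models undersimulate low-frequency variability (the blue tropical regions); rescaling the control run then widens the NAT distribution there and makes detection harder.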

For surface temperature (Extended Fig. 8), the comparison of simulated internal variability with observed estimates gives results similar to those of Knutson et al. (2013) for CMIP3 and CMIP5: models tend to simulate more internal variability than the observed estimate in northern mid to high latitudes, typically less than observed over most other ocean regions at lower latitudes, and show mixed results over land regions. The results are similar whether or not we include the grid-point-scale adjustment of simulated internal variability in our detection and attribution analysis (unadjusted control run-based assessment not shown). For the assessment of 1951-2018 observed trends (Extended Fig. 6), there are some additional regions with detectable anthropogenic warming compared to Knutson et al. (2013), as expected, since that analysis only examined trends through 2010. Since the end of the ‘global warming hiatus’ around 2014, the additional recent years have strengthened the ongoing warming signal, further increasing the area assessed as having detectable anthropogenic warming. In Extended Fig. 6 and elsewhere in the study, we use the adjusted control run results for our assessments of both temperature and precipitation.

Spatial resolution of studies

To match these data with the finest-scale resolution of our database, we resolved each study to the set of 2.5 degree grid cells overlapping the smallest geographical entity extracted from its title and abstract using the geoparser Mordecai¹⁴. For each study, we calculated, for each variable, the proportion of the grid cells corresponding to this entity in which an attributable trend can be found. For example, panels a and b of Extended Figure 9 show that 20 of Sudan’s 27 grid cells show an attributable anthropogenic warming trend, so each study that refers to Sudan and documents impacts predicted to be driven by temperature receives a temperature trend proportion value of 20/27. Such a study is therefore counted in the dark red bars in Fig. 3, which count studies where an attributable temperature trend can be demonstrated for more than 50% of the grid cells the study refers to.
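The proportion calculation for the Sudan example can be sketched with Python sets. Grid cells are represented here by hypothetical integer identifiers; in practice they would be the 2.5 degree cells matched to the geoparsed entity, and `attributable_fraction` is an illustrative helper.

```python
def attributable_fraction(entity_cells, attributable_cells):
    """Share of an entity's 2.5 degree grid cells that show an attributable trend."""
    entity_cells = set(entity_cells)
    return len(entity_cells & set(attributable_cells)) / len(entity_cells)


# Hypothetical cell identifiers reproducing the Sudan example: 27 cells, 20 attributable.
sudan_cells = range(27)
attributable_warming = list(range(20)) + [500, 501]  # cells outside Sudan are ignored
fraction = attributable_fraction(sudan_cells, attributable_warming)
counts_in_dark_red_bar = fraction > 0.5
print(round(fraction, 3), counts_in_dark_red_bar)  # 0.741 True
```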

We also calculate a weighted number of studies for each grid cell: each study contributes 1 divided by the number of grid cells it refers to to each of those grid cells, and these contributions are summed over all identified relevant studies. Extended Figures 9c and 9d show 11 studies which refer to impacts predicted to be driven by temperature trends in Sudan, where Sudan is the smallest geographical entity mentioned. Each grid cell in Sudan therefore receives 11/27 weighted studies. Because some geographical entities were too small to contain a single 2.5 degree grid cell, their longitude-latitude coordinates were instead assigned to the nearest grid cell, and the corresponding studies apportioned to that one grid cell. Because 4 additional studies refer to Khartoum, we add 4/1 to the weighted studies value in the grid cell containing Khartoum.
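The weighting scheme can be sketched in a few lines, reproducing the Sudan and Khartoum example. The cell identifiers are hypothetical (Sudan's 27 cells as 0-26, Khartoum's cell as 13), `weighted_study_counts` is an illustrative helper, and a cell listed twice for the same study is counted once.

```python
from collections import defaultdict


def weighted_study_counts(studies):
    """Give each study a total weight of 1, split evenly over the cells it refers to.

    `studies` holds one collection of grid-cell identifiers per study.
    """
    weights = defaultdict(float)
    for cells in studies:
        cells = set(cells)
        for cell in cells:
            weights[cell] += 1 / len(cells)
    return dict(weights)


# Hypothetical identifiers: Sudan covers cells 0-26 and Khartoum lies in cell 13.
sudan = range(27)
khartoum = [13]
weights = weighted_study_counts([sudan] * 11 + [khartoum] * 4)
print(round(weights[0], 3), round(weights[13], 3))  # 0.407 4.407
```

Because each study's weight sums to 1, the weighted counts over all grid cells add up to the number of studies (15 here), so studies of large regions are not over-counted relative to site-level studies.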

1. Cramer, W. et al. Detection and attribution of observed impacts. in Climate Change 2014: Impacts, Adaptation, and Vulnerability. Part A: Global and Sectoral Aspects. Contribution of Working Group II to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (eds. Field, C. B. et al.) 979–1037 (Cambridge University Press, 2014).

2. IPCC. Climate Change 2014: Impacts, Adaptation, and Vulnerability. Part A: Global and Sectoral Aspects. Contribution of Working Group II to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. (Cambridge University Press, 2014).

3. Callaghan, M., Müller-Hansen, F., Hilaire, J. & Lee, Y. T. NACSOS: NLP Assisted Classification, Synthesis and Online Screening. (Zenodo, 2020). doi:10.5281/zenodo.4121526.

4. Chang, C.-C. & Lin, C.-J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2 , 27:1-27:27 (2011).

5. Sanh, V., Debut, L., Chaumond, J. & Wolf, T. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv191001108 Cs (2020).

6. Cohen, A. M. An Effective General Purpose Approach for Automated Biomedical Document Classification. AMIA. Annu. Symp. Proc. 2006, 161–165 (2006).

7. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ArXiv181004805 Cs (2019).

8. Porciello, J., Ivanina, M., Islam, M., Einarson, S. & Hirsh, H. Accelerating evidence-informed decision-making for the Sustainable Development Goals using machine learning. Nat. Mach. Intell. 2 , 559–565 (2020).

9. Knutson, T. R. & Zeng, F. Model Assessment of Observed Precipitation Trends over Land Regions: Detectable Human Influences and Possible Low Bias in Model Trends. J. Clim. 31 , 4617–4637 (2018).

10. Knutson, T. R., Zeng, F. & Wittenberg, A. T. Multimodel Assessment of Regional Surface Temperature Trends: CMIP3 and CMIP5 Twentieth-Century Simulations. J. Clim. 26 , 8709–8743 (2013).

11. Morice, C. P., Kennedy, J. J., Rayner, N. A. & Jones, P. D. Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: The HadCRUT4 data set. J. Geophys. Res. Atmospheres 117 , D08101 (2012). doi:10.1029/2011JD017187.

12. Eyring, V. et al. Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization. Geosci. Model Dev. 9 , 1937–1958 (2016).