This review adheres to the guidelines of the Collaboration for Environmental Evidence (CEE) (https://environmentalevidence.org/) and follows the PRISMA reporting standards 27. Following CEE guidelines, we published an a priori protocol 26 that was reviewed by two external experts on systematic review methodology and on the content subject. The PRISMA 2020 Reporting Checklist is provided in SI.
Search strategy
We conducted a comprehensive search for publications in five languages (Chinese, English, German, French, and Spanish) to reduce bias and ensure a wide geographical coverage of potential studies 26. We included both peer-reviewed papers and grey literature. We searched Web of Science and Google Scholar to identify publications in all languages. We further searched Scopus and CABI in English using the same search string as for Web of Science, and CNKI (www.cnki.net) in Chinese. The search was restricted to articles published after 1 January 2020. The main search was carried out on 17 June 2022 and search results were updated until 31 August 2022.
Together with native speakers and after a series of initial scoping searches, we defined sets of keywords composed of #1 type of settlement, #2 type of greenspace, #3 COVID-19 related terms, and #4 different usage terms. The search string was developed in English and subsequently translated into the other languages. We improved the list of English search terms through a machine learning strategy with the R package litsearchr 66. During this procedure, all abstracts, titles and keywords from the first 1000 hits in Web of Science were text-mined to generate a list of frequently mentioned keywords, sorted by strength. The outcome was a list of 2081 terms, from which we identified seven that we added to the initial search string. The final search string #5 combines the search terms of the different elements (#1-#4) using the Boolean operator AND. Search string #4 was left out of the searches in languages other than English, as only few studies are available in those languages and they would not be captured by such a specific search string. The asterisk (*) is a wildcard character that represents any group of characters, including no character. Quotation marks are used to search for exact phrases.
To test the comprehensiveness of the search strategy, we pre-selected a list of 20 key papers 26 that we considered central to the subject and that cover different subject domains and regions of interest. We extended the set of English keywords until all 20 papers were retrieved by a search on Web of Science.
The search string used was the following (in Web of Science format, TS=topic search):
#1 TS = (urban* OR town* OR settlement* OR "populated area*" OR agglomeration* OR "built environment*" OR city OR cities OR village* OR "public space*")
#2 TS = (green* OR park* OR "open space*" OR "natur* area*" OR "urban natur*" OR garden* OR forest* OR vegetate* OR ecosystem* OR backyard* OR cemeter* OR graveyard* OR waterside* OR river* OR "roof garden*" OR balcon* OR "vertical green*" OR agricultur* OR "protected area*" OR "nature reserve*" OR "national park*")
#3 TS = (pandemic* OR COVID OR COVID-19 OR corona* OR coronavirus OR SARS-COV-2 OR lockdown* OR "social distancing" OR "Severe Acute Respiratory Syndrome Coronavirus 2" OR "2019-nCoV")
#4 TS = (use OR need* OR benefit* OR recreation* OR health OR service* OR motivation* OR mobility OR attitude* OR leisure OR walk* OR hike OR hiking OR running OR play* OR window* OR view* OR gardening OR jogging OR sport* OR "physical activity" OR outdoor*)
#5 TS = #1 AND #2 AND #3 AND #4
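The way the four elements combine into search string #5 can be sketched in a few lines of Python. This is only an illustration of the Boolean logic, not the tool used for the searches: the helper names `or_block` and `build_query` are hypothetical, and the term lists are abbreviated versions of #1-#4.

```python
# Illustrative sketch: abbreviated term lists, hypothetical helper names.
def or_block(terms):
    """Join terms with OR, quote multi-word phrases, and wrap in TS=(...)."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "TS=(" + " OR ".join(quoted) + ")"


def build_query(*term_sets):
    """Combine the OR blocks with the Boolean operator AND."""
    return " AND ".join(or_block(terms) for terms in term_sets)


settlement = ["urban*", "city", "public space*"]          # subset of #1
greenspace = ["green*", "park*", "nature reserve*"]       # subset of #2
covid = ["pandemic*", "COVID-19", "social distancing"]    # subset of #3
usage = ["use", "recreation*", "physical activity"]       # subset of #4

# English search: all four elements.
english_query = build_query(settlement, greenspace, covid, usage)
# Other languages: the usage element (#4) was left out.
other_language_query = build_query(settlement, greenspace, covid)

print(english_query)
```

Dropping the fourth argument broadens the query, which corresponds to omitting the usage terms in languages with few available studies.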
Screening
In the screening process, we applied predefined eligibility criteria in two stages: first to titles and abstracts, and then to full-text articles. Each article was independently screened by at least two reviewers, and any disagreements were resolved through discussion or consultation with a third person. Articles were included based on the following eligibility criteria:
- Population: Studies that were conducted in a country affected by the COVID pandemic
- Exposure: Studies that looked at the change in use of urban greenspaces
- Comparator: Studies that refer to the time before and during or after lockdowns
- Outcomes: Actual or stated uses of urban greenspaces or outcomes on health/well-being that directly relate to the use of greenspaces
- Data: Studies with quantitative or qualitative data on the outcomes
- Languages: English, Chinese, German, Spanish, and French
Articles not fulfilling the eligibility criteria were excluded. Additionally, the following articles were excluded: a) reviews, b) studies based on anecdotal evidence, c) studies where changes in use of urban greenspace since COVID are assumed as a given (no data linked with the change).
Grey literature was also screened in two stages: first based on titles and a short excerpt from Google Scholar, then by assessing the full text. All other screening steps were conducted with the online tool Rayyan 67. We report all reviewing steps with article numbers in a PRISMA diagram (Fig.1) 27. A list of articles excluded at the full-text stage and the reasons for exclusion is provided in SI.
Data extraction
After screening, we extracted pre-defined variables from the selected studies into Excel. At the beginning of the data extraction process, we jointly extracted data from 5 articles. After that, one person continued the extraction and discussed unclear cases with a second person throughout the process. In a second round, a third person independently checked all references and the previously extracted data. For articles reporting on multiple locations and types of greenspaces, data were extracted separately for each city (location) and UGS type (study). We extracted the relevant data from the text, tables and graphs. In cases where only graphs were available, we used an online tool (www.graphreader.com) to extract numerical data for analysis.
For each study, we extracted information on geographic location (city name and country), reference period (before/ during/ after first lockdown), type of UGS studied (categorized into 7 categories), methodology used (categorized into 6 categories), sample size, study period, percentage of change in UGS use, proportion of people that increased/ decreased/ did not change UGS use, as well as a qualitative expression of increase/ decrease/ no change with regards to the reference period. As different countries and cities implemented varying measures to combat COVID-19, we defined the reference period as being before, during or after the most restrictive government-imposed measures (lockdowns) in the first phase of COVID-19 in 2020. The measures reported in the studies ranged from movement restrictions 68 to strict curfews 69–71. We based the classification of whether a study referred to the time during or after the first lockdowns on the description provided in the papers. Given that we only analysed papers published between 2020 and 2022, we did not have information on the total duration of COVID-19, and thus, it was not feasible to classify papers according to a longer timeline. Therefore, the common reference point among the papers was the outbreak of COVID-19 in early 2020 with the first lockdowns, compared to later phases.
A table of all included studies with metadata and extracted data is provided in SI.
Contextual variables
We considered several potential effect modifiers from the pre-defined set 26 to further analyse their impact on the changes in UGS use. These modifiers included strictness of COVID-19 policies, regional economic situation (indicated by GDP per capita) and type of green space. Some other potential modifiers defined in the protocol, such as location of UGS in the urban fabric, were not consistently reported in the analysed literature, which limited our ability to make comparisons across studies. For the same reason, we also did not consider individual effect modifiers of study participants such as gender, age, etc.
To determine the stringency of COVID-19 policy measures, we relied on the Oxford COVID-19 Government Response Tracker 28. Since all studies in our review referred to changes during and after the lockdowns in the first phase of the pandemic in 2020, we calculated the average government stringency for this year only. The data were available on national scales, except for Canada, the US, UK, and Australia, where we used sub-national data (Fig. S1 in supplemental information).
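The annual averaging of the stringency index can be illustrated as follows. This is a minimal sketch under stated assumptions: the `(region, date, index)` record layout and the function name `mean_stringency_2020` are hypothetical and do not reflect the tracker's actual file format, and the values are made up.

```python
from datetime import date
from statistics import mean


def mean_stringency_2020(records):
    """Average the daily stringency index over 2020 for each region.

    `records` is an iterable of (region, date, index) tuples (assumed layout).
    """
    by_region = {}
    for region, day, index in records:
        # Keep only 2020 observations and skip missing index values.
        if day.year == 2020 and index is not None:
            by_region.setdefault(region, []).append(index)
    return {region: mean(values) for region, values in by_region.items()}


# Made-up values for illustration only.
records = [
    ("Region A", date(2020, 3, 1), 20.0),
    ("Region A", date(2020, 4, 1), 80.0),
    ("Region A", date(2021, 1, 1), 60.0),  # outside 2020, ignored
    ("Region B", date(2020, 4, 1), 50.0),
]
print(mean_stringency_2020(records))
```

The same function applies to national and sub-national series, since the region label is simply the grouping key.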
To assess the regional distribution of wealth, we used GDP per capita as an indicator. We extracted the GDP value for each studied location from a gridded dataset (Fig. S2) for the year 2015 with 5-arc min (~10km) resolution 29. We performed this extraction in ArcGIS using the “Extract Values to Points” function.
Regarding the classification of reported UGS, we categorized them into different greenspace categories based on their functional and structural characteristics. Although the terminology used across studies varied, we coded the study results into the following overarching categories: 1) Private gardens (including yards and balconies), 2) UGS near home (including roadside greenspace), 3) forests and nature reserves (including watersides and all natural and semi-natural remnants of vegetation and vacant land that are not maintained in a park-like manner), 4) Public gardens and urban agriculture (including community gardens and publicly accessible agricultural fields), 5) UGS unspecified (many papers do not differentiate types of UGS, including those that rely on the generic “park” category provided by Google and Apple mobility reports), 6) Public parks (maintained by municipalities for leisure and other purposes), 7) Historic gardens (typically gated, with restricted entry based on fees).
Critical appraisal
To assess the quality of the included studies, we employed a 6-point (0-5) Likert-scale rating system 26. Two independent reviewers conducted the critical appraisal for all studies that were included after full text screening. Studies that did not meet the knockout criteria, which included clearly stated research questions and a clear description of the methods used, were excluded.
We assigned points based on specific criteria to rate the methodological quality of the studies. The criteria included: 1) Sampling methods appropriate for the research question, such as the number of observations and geographical distribution; 2) Adequacy of statistical analysis for the research question, with clear descriptions and data fit for model; 3) Inclusion of control cases referring to the time before COVID-19 as a baseline for comparison; 4) Consideration of confounding variables, such as the temporal distribution of observations and weather conditions, that could impact the results; 5) Inclusion and proper handling of confounding variables in the data analysis. Studies that did not meet at least criterion 1 received 0 points and were subsequently excluded from the review, making this a de facto knockout criterion. Qualitative studies with low (<10) sample sizes were included in the review but could only achieve 1 point due to their limited comparability. The results of the critical appraisal are included in the table of included studies in SI.
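The scoring rules can be expressed as a small function. This is a sketch under one explicit assumption that the text does not confirm: each of the five criteria contributes exactly one point. The function name `appraisal_score` and its arguments are hypothetical.

```python
def appraisal_score(criteria_met, qualitative=False, sample_size=None):
    """Return a 0-5 quality score from five True/False criterion ratings.

    Assumption: one point per criterion met. The actual rating scheme in the
    protocol may weight the criteria differently.
    """
    if len(criteria_met) != 5:
        raise ValueError("expected ratings for criteria 1-5")
    if not criteria_met[0]:
        return 0  # criterion 1 acts as a knockout: the study is excluded
    score = sum(1 for met in criteria_met if met)
    if qualitative and sample_size is not None and sample_size < 10:
        score = min(score, 1)  # small qualitative studies are capped at 1 point
    return score


print(appraisal_score([True, True, False, True, False]))
```

For example, a study failing criterion 1 scores 0 regardless of the other ratings, and a qualitative study with eight participants scores at most 1.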
To present the quality of studies, we displayed their scores in the figures and included them as weights in the statistical tests. This approach gave more importance to high quality studies, thus ensuring more reliable results in the analysis. Additionally, for a subset of studies that reported magnitudes of change based on quantitative surveys, we included sample sizes as weights in the analysis to account for their statistical strength.
Furthermore, we assessed external validity of the included studies based on their geographical location, type of green space and socio-economic setting. These contextual factors were considered during the discussion section to provide a contextual evaluation of the study findings.
Data analysis
To map the results of the included locations, we obtained geographical coordinates for each city covered by an article. We used the “Geography data type” formatting in Excel and conducted additional searches in Google Maps when necessary. In cases where only larger administrative units were indicated in the studies, we used the respective capital cities as the locations. We used QGIS to display locations on a world map in the Equal Earth projection 72.
We mostly analysed data at the location level, i.e., for each city where results were reported. In a few cases, where an article reported opposing results for different types of UGS within the same city, we used the average change across all types so that articles with very detailed reporting would not dominate the overall analysis. Only for the direct comparison between UGS types did we analyse the available results for each UGS type separately.
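The aggregation to the location level amounts to a grouped mean. The sketch below assumes a hypothetical `(city, ugs_type, percent_change)` row layout and uses made-up values; it only illustrates the averaging step.

```python
from collections import defaultdict
from statistics import mean


def mean_change_by_city(rows):
    """Average the reported change in use across UGS types within each city.

    `rows` holds (city, ugs_type, percent_change) tuples (assumed layout).
    """
    grouped = defaultdict(list)
    for city, _ugs_type, change in rows:
        grouped[city].append(change)
    return {city: mean(changes) for city, changes in grouped.items()}


# Made-up values for illustration only.
rows = [
    ("City A", "Public parks", -20.0),
    ("City A", "Forests and nature reserves", 40.0),
    ("City B", "UGS unspecified", 15.0),
]
print(mean_change_by_city(rows))
```

Each city then contributes a single value to the overall analysis, however many UGS types the article reported.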
We coded the studies based on whether they reported overall increase, decrease or no change in UGS use. We included studies with different methodologies and detailed study questions, resulting in a mix of studies reporting general changes, and others providing more specific insights. To enable global comparability, we simplified results by summarizing them for each location and differentiating between overall reported increases and decreases based on the respective study methods. In a subset of studies, we specifically analysed the magnitudes of change in UGS use. To be included in the analysis of change magnitudes, studies had to be based on quantitative surveys as a comparable methodology, report sample sizes, and provide their results as relative changes in UGS use in percentage.
For comparing COVID-19 stringency and GDP per capita between cases of decreased vs. increased use of UGS, we employed logistic regressions. We adopted this approach to facilitate a binary comparison and to account for the study score from critical appraisal as a weighting variable. Although this binary approach required disregarding the few cases of reported “no change”, it allowed us to use a more robust statistical test of the GLM-type with a binomial distribution family. We also plotted the relative changes against COVID-19 stringency and GDP, weighted by sample size, for visual interpretation.
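To make the weighted binary comparison concrete, the sketch below fits a logistic model with one predictor and case weights by Newton-Raphson. It is not the analysis code: the analysis itself was run in R, the function name `fit_weighted_logistic` is hypothetical, and the data are made up. It only shows how quality scores enter the likelihood as weights.

```python
import math


def fit_weighted_logistic(x, y, w, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson with case weights w."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)        # weighted score contribution
            v = wi * p * (1.0 - p)   # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1


# Made-up data: stringency index, 1 = increased use / 0 = decreased use,
# and the critical appraisal score used as weight.
stringency = [40, 45, 50, 55, 60, 65, 70, 75]
increased = [1, 1, 0, 1, 0, 1, 0, 0]
score = [3, 2, 4, 1, 5, 2, 3, 4]

b0, b1 = fit_weighted_logistic(stringency, increased, score)
print(b0, b1)
```

A negative slope in such a fit would indicate that the odds of a reported increase fall as stringency rises. Cases of "no change" have no place in the binary outcome, which is why they are set aside.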
To assess changes in UGS use across different types of green spaces, we grouped the numbers of studied categories in contingency tables. We statistically compared the distribution in the contingency tables with an expected random distribution using a χ²-test, weighting the results by the study scores with the “weights” package in R. The distribution of these categories was displayed in mosaic plots, weighted by study score, using the “vcd” package in R. All other data visualizations were produced in R with the “ggplot2” package. For displaying means and error bars, we calculated means with 95% confidence intervals using the “Rmisc” package.
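The idea of a score-weighted contingency test can be sketched as below. This is an illustration under assumptions, not a reimplementation of the R package: each study adds its quality score (rather than a count of 1) to its cell, and expected values assume independence of rows and columns. The function name `weighted_chi_square` and the data are hypothetical.

```python
from collections import defaultdict


def weighted_chi_square(observations):
    """Pearson chi-square statistic for a weighted two-way table.

    `observations` holds (ugs_type, direction, weight) tuples, where weight
    is the study's quality score (assumed layout).
    """
    table = defaultdict(float)
    row_totals = defaultdict(float)
    col_totals = defaultdict(float)
    total = 0.0
    for row, col, weight in observations:
        table[(row, col)] += weight
        row_totals[row] += weight
        col_totals[col] += weight
        total += weight
    statistic = 0.0
    for row in row_totals:
        for col in col_totals:
            expected = row_totals[row] * col_totals[col] / total
            observed = table.get((row, col), 0.0)
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Made-up observations for illustration only.
observations = [
    ("Public parks", "decrease", 4),
    ("Public parks", "increase", 1),
    ("Forests and nature reserves", "decrease", 1),
    ("Forests and nature reserves", "increase", 4),
]
statistic, dof = weighted_chi_square(observations)
print(statistic, dof)
```

The statistic would then be compared against a χ² distribution with the returned degrees of freedom to obtain a p-value.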
We discuss publication bias based on the distribution of studies reporting increase, decrease or no change in combination with their geographical distribution and their quality scores. We do not expect strong publication bias between increase or decrease, as there is no obvious reason why either of them would be more likely to be reported. However, there might be a certain bias toward not publishing studies that show no change over time.
During the writing process of this work, we used Generative AI and AI-assisted technologies, specifically ChatGPT, to improve the language for clarity and conciseness. After using this tool, we thoroughly reviewed and edited the content and we take full responsibility for the content of the publication.
Deviations from the protocol
Contrary to what was stated in the protocol, we did not use the ROSES reporting standards 73, because the journal editor requested that we use PRISMA.
In the protocol, we stated that double screening would be applied to at least 100 references and continued until sufficient agreement was reached. In practice, we double-screened throughout and added a third screener in cases of divergent results.
In the inclusion criteria, we previously stated that only countries with “lockdowns” would be included. However, we did include countries such as Sweden that did not have formal lockdowns but were still affected by COVID-19 and applied certain social distancing measures.
Unlike previously stated, we did not systematically conduct additional searches on Google search, as the number of potentially relevant results generated with the pre-defined set of keywords by far exceeded the limit of 1000 results that are displayed by Google.