Reproducibility and transparency are cornerstones of reputable, rigorous, and mature science1–5, and for programming, the reproducibility spectrum6 begins with public, permanently archived code6,7. In ecology and evolution, programming has become the basis of most analyses8, and the benefits of code-sharing are increasingly recognized5,9. Clear, reusable code released under a permissive license10 may increase a paper's impact and reduce duplicated effort, allowing science to progress more effectively1,3–5. Well-documented code also provides a valuable educational resource11. In addition, code-sharing could make it easier to credit developers, because software and package usage data can be harvested directly from published code12.
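To illustrate how package usage might be harvested from shared scripts, the following minimal Python sketch (not the method of ref. 12) counts the R packages that a script loads with library(), require(), or requireNamespace(), or references with the pkg::function notation. The function name harvest_packages and the regular expressions are our own assumptions; dynamic loading, "#" characters inside strings, and packages declared only in DESCRIPTION files are not handled.

```python
import re
from collections import Counter

# Package names loaded via library(), require() or requireNamespace().
# Assumption: the package name is written literally as the first argument.
_LOAD_CALL = re.compile(
    r"\b(?:library|require|requireNamespace)\(\s*[\"']?([A-Za-z][A-Za-z0-9.]*)"
)
# Package names used with the pkg::fn or pkg:::fn notation.
_NAMESPACE = re.compile(r"\b([A-Za-z][A-Za-z0-9.]*):::?")


def harvest_packages(r_source: str) -> Counter:
    """Count R package references in the source text of one script."""
    counts = Counter()
    for line in r_source.splitlines():
        # Drop comments (naive: also cuts at a '#' that sits inside a string).
        code = line.split("#", 1)[0]
        counts.update(_LOAD_CALL.findall(code))
        counts.update(_NAMESPACE.findall(code))
    return counts


example_script = '''
library(lme4)
require("vegan")  # community ecology
fit <- lme4::glmer(y ~ x + (1 | site), data = d, family = poisson)
# library(ggplot2) is commented out and therefore ignored
'''
usage = harvest_packages(example_script)  # lme4 counted twice, vegan once
```

Aggregating such counts over many published scripts would give a rough picture of which packages a field relies on, which is the kind of usage data that could support crediting their developers.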
Has the growing appreciation of code-sharing benefits changed code-sharing practices over time? Recent evidence suggests that biologists may be reluctant to share code. A study of agent-based models found that 81% of publications did not provide code13, and the PLOS Open Science Indicators showed that 92% of publications in Agricultural and Biological Sciences fail to share code (compared with 49% that fail to share data)14. Some papers state that "code is available upon request", but this promise may not be kept15. Even when code is published, its license may prevent re-use10. Resistance to sharing and re-using code may arise from unfamiliarity with best sharing practices, insecurity about code quality, fears of misuse or unsolicited appropriation of ideas, and the cost of preparing code for release16. However, it has been argued that many perceived problems with code-sharing stem from a misunderstanding of its risks and benefits16. To better understand how code-sharing practices change over time and whether code-sharing benefits citation rates, we estimated trends in the sharing of R17 code since 2010 and tested whether papers that shared code were cited at a higher rate.
We identified 28,227 articles published in ecology and evolution journals between Jan. 1, 2010, and Aug. 19, 2022, that cited the R programming language17 (see Online Methods). We then surveyed a random sample of 1,001 of these papers to assess trends in code-sharing frequency and whether code-sharing is related to the number of citations a paper receives.
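A survey sample of this kind can be drawn reproducibly by fixing the seed of a pseudo-random number generator, so that others can regenerate exactly the same set of papers. The Python sketch below assumes that article identifiers are available as strings; the function name, the placeholder identifiers, and the seed value 2022 are hypothetical and do not describe the procedure in the Online Methods.

```python
import random


def draw_survey_sample(article_ids, n=1001, seed=2022):
    """Draw a reproducible simple random sample without replacement.

    The seed value is a hypothetical choice; any fixed integer works,
    as long as it is reported alongside the sample.
    """
    if n > len(article_ids):
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)  # private generator: the draw depends only on the seed
    # Sorting first makes the result independent of the order in which IDs were collected.
    return rng.sample(sorted(article_ids), n)


# Placeholder identifiers standing in for the 28,227 candidate articles.
population = [f"article_{i:05d}" for i in range(28227)]
sample = draw_survey_sample(population)
```

Re-running the function with the same identifiers and seed (under the same Python version) returns the same 1,001 articles, which is what makes the survey itself reproducible.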
Overall, R code was available in only 49 of the 1,001 papers examined (4.9%; Figure 1). When code was included, it was most often in the Supplemental Information (41%), followed by GitHub (20%), Figshare (6%), and other repositories (33%). Open-access publications were roughly 70% more likely to include code than closed-access publications (7.21% vs. 4.22%; χ² = 4.442, p < 0.05). Code-sharing was estimated to increase by 0.5% per year, but this trend was not significant (p = 0.11). The years 2021 and 2022 showed a shift towards more frequent sharing, but the percentage of papers sharing code has remained below 15% throughout the past decade (Figure 1).
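The headline percentages above follow from simple arithmetic on the reported figures. The short Python sketch below recomputes the overall sharing rate from the reported counts and the relative difference between the reported open- and closed-access rates; the helper function names are ours.

```python
def percent(part: int, whole: int) -> float:
    """Express part / whole as a percentage."""
    return 100 * part / whole


def relative_increase(rate_a: float, rate_b: float) -> float:
    """Percentage by which rate_a exceeds rate_b (e.g. 70.0 means 70% higher)."""
    return 100 * (rate_a / rate_b - 1)


overall = percent(49, 1001)                     # papers with R code / papers surveyed
open_vs_closed = relative_increase(7.21, 4.22)  # reported open- vs closed-access sharing rates

print(f"{overall:.1f}% of surveyed papers shared code")
print(f"open access: {open_vs_closed:.0f}% more likely to share code")
```

The ratio 7.21 / 4.22 is about 1.71, i.e. open-access papers shared code roughly 70% more often than closed-access papers.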
We found that papers including code have a disproportionate impact on the literature (Figure 2) and accumulate citations faster (a marginally significant year-by-code-inclusion interaction; p = 0.0863). We also found a significant interaction between open access and code inclusion (p = 0.0265), with publications meeting both Open Science criteria (open code and open access) having the highest predicted citation rates (Figure 2). For example, in year 13 after publication, Open Science papers are expected to receive more than twice as many citations as fully closed papers (96.25 vs. 36.89; Figure 2).
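To make the size of the year-13 example concrete, the Python sketch below compares the two predicted citation counts reported above; it only restates those two numbers and does not refit the citation model.

```python
open_science = 96.25   # predicted citations in year 13: open code and open access
fully_closed = 36.89   # predicted citations in year 13: no code, closed access

ratio = open_science / fully_closed       # how many times as many citations
difference = open_science - fully_closed  # absolute gap in predicted citations

print(f"{ratio:.2f} times as many citations ({difference:.2f} more)")
```

The ratio of roughly 2.6 is what "more than twice as many citations" refers to.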
The ecological and evolutionary literature falls far short of the code-sharing required for complete reproducibility and transparency, and there has been no systematic increase over the last 12 years. This undoubtedly hinders scientific progress and has far-reaching financial consequences, because code for common analytical tasks must be rewritten2. Our results also indicate that failing to share code may limit the impact of most scientists' work, as sharing code leads to a higher rate of citation accumulation, particularly when combined with open-access publication. Scientists and journals that embrace code-sharing may be more impactful, even when publications are closed-access. For instance, our results suggest that a scientist who publishes two Open Science papers in a journal with an impact factor of 3 will have 112 more citations after 13 years than if they had published both papers closed-access and without sharing code.
To create an environment conducive to reproducibility and transparency, we call upon scientists, funding sponsors, and institutions to champion code-sharing and to recognize it as a legitimate and valuable contribution to the scientific process. Major funding sponsors are beginning to mandate Open Science approaches (e.g., the EU's Horizon Europe: https://research-and-innovation.ec.europa.eu/funding/funding-opportunities/funding-programmes-and-open-calls/horizon-europe_en; the US Year of Open Science: https://open.science.gov/). Scientific journals can also play a significant role in this transformation1,3,4. At a minimum, journals can facilitate the deposition of code in a stable repository and provide a link to that repository (ideally in both human- and machine-readable formats)6. More ambitious solutions might include linking methods text to the corresponding code, employing dedicated code editors to help improve code style and clarity, or incorporating computational notebooks (e.g., R Markdown, Quarto, Jupyter)6. Such measures would enhance transparency in reporting and give reviewers and readers the information they need to reproduce and validate a study's findings. Adopting these principles and practices will promote the integration of open code into the scientific landscape, enhancing the verifiability and impact of our research.
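As one possible illustration of a machine-readable link, a code release could be described with a CodeMeta record, a JSON-LD vocabulary for software metadata. The Python sketch below assumes the CodeMeta 3.0 context URL and SPDX license URLs; the function name, project name, and repository URL are placeholders, and individual journals may require a different format.

```python
import json


def codemeta_record(name, repository_url, spdx_license, doi=None):
    """Build a minimal CodeMeta (JSON-LD) description of an analysis code release.

    Assumes the CodeMeta 3.0 context; spdx_license is an SPDX identifier such as "MIT".
    """
    record = {
        "@context": "https://w3id.org/codemeta/3.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "codeRepository": repository_url,
        "license": f"https://spdx.org/licenses/{spdx_license}",
        "programmingLanguage": "R",
    }
    if doi is not None:
        # A DOI from a permanent archive (e.g. Zenodo) complements the repository link.
        record["identifier"] = f"https://doi.org/{doi}"
    return record


# Hypothetical project; the URL is a placeholder, not a real repository.
example = codemeta_record("example-analysis", "https://github.com/example/analysis", "MIT")
print(json.dumps(example, indent=2))
```

Because the record is plain JSON, a journal could publish it next to the human-readable link so that indexing services can pick up the repository and license automatically.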
Given the growing list of reasons for code-sharing, we encourage scientists to embrace open code, and open access more generally. Although maintaining well-documented code in a version-controlled public repository (e.g., GitHub) or a public archive with directions for its use (e.g., Zenodo) is ideal, options that require less effort can at least make code available to interested parties. Recent advances in artificial intelligence (e.g., ChatGPT) have made documenting scripts easier, lowering the cost of code-sharing for authors18. Software licenses play a critical role in code-sharing: permissive licenses (e.g., MIT) encourage re-use, whereas restrictive or proprietary licenses still allow methodological transparency but limit or prevent re-use of the published code10. Where code is published without a license, the author retains the copyright10, which gives scientists a way to embrace transparency without allowing others to use their code. Conversely, scientists who want their code to be freely available need to include an appropriate license. Finally, we stress that as code-sharing increases, our attribution practices must keep pace, both for scientific transparency and to credit developers12.