Code sharing increases citations, but remains uncommon

doi:10.21203/rs.3.rs-3222221/v1


Brief Communication


This work is licensed under a CC BY 4.0 License

Version 1


Biologists increasingly rely on computer code, which makes published code important for transparency, reproducibility, training, and as a basis for further work. Here we conduct a literature review examining temporal trends in code sharing in ecology and evolution publications since 2010, and test for an influence of code sharing on citation rate. We find that scientists are overwhelmingly (95%) failing to publish their code and that there has been no significant improvement over time, but we also find evidence that code sharing can considerably increase citations, particularly when combined with open-access publication.


Reproducibility and transparency are cornerstones of reputable, rigorous, and mature science^1–5, and for programming, the reproducibility spectrum^6 begins with public, permanently archived code^6,7. In ecology and evolution, programming has become the basis of most analyses^8, and the benefits of code sharing are increasingly recognized^5,9. Clear, reusable code released under a permissive license^10 may increase the impact of papers and reduce duplicated effort, allowing science to progress more effectively^1,3–5. Well-documented code also provides a valuable educational resource^11. Finally, code sharing could make it easier to credit developers, because software and package usage data can be harvested directly from published code^12.

Has the growing appreciation of these benefits changed code-sharing practices over time? Recent evidence suggests that biologists may be reluctant to share code. A study of agent-based models found that 81% of publications did not provide code^13, while the PLOS Open Science Indicators revealed that 92% of publications in Agricultural and Biological Sciences do not share code (compared with 49% that do not share data)^14. Although some papers state that “code is available upon request”, this promise may not be kept^15. Even where code is published, its licensing may prevent reuse^10. Resistance to code sharing and reuse may arise from unfamiliarity with best sharing practices, insecurity about code quality, fear of misuse or unsolicited appropriation of ideas, and the cost of preparing code for release^16. However, it has been argued that many perceived problems with code sharing stem from a misunderstanding of its risks and benefits^16. To better understand how code-sharing practices have changed over time and whether code sharing benefits citation rates, we estimated trends in the sharing of R^17 code since 2010 and tested whether papers that shared code were cited at a higher rate.

We identified 28,227 articles citing the R programming language^17 that were published in ecology and evolution journals between Jan. 1, 2010, and Aug. 19, 2022 (see Online Methods). We used a randomized survey of 1,001 of these papers to assess trends in code-sharing frequency and whether code sharing is related to the number of citations each paper receives.
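The survey design, drawing 1,001 papers at random from the 28,227 identified, can be sketched as simple random sampling without replacement. This is a minimal illustration that assumes the articles are indexed 0 to 28,226 and uses an arbitrary, hypothetical seed; the authors' actual sampling procedure is described in the Online Methods and may differ.

```python
import random

N_ARTICLES = 28_227   # articles citing R identified by the search (from the text)
SAMPLE_SIZE = 1_001   # papers surveyed (from the text)


def draw_survey_sample(n_articles: int, sample_size: int, seed: int = 2022) -> list[int]:
    """Return sorted, distinct article indices drawn uniformly without replacement.

    The seed is a hypothetical choice; fixing it makes the draw reproducible.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_articles), sample_size))


sample = draw_survey_sample(N_ARTICLES, SAMPLE_SIZE)
print(len(sample))  # 1001
```

Recording the seed alongside the sampled identifiers is what lets someone else regenerate exactly the same subset of papers.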

Overall, R code was available in only 49 of the 1,001 papers examined (4.9%) (Figure 1). When included, code was most often provided in the Supplemental Information (41%), followed by GitHub (20%), Figshare (6%), and other repositories (33%). Open-access publications were 70% more likely to include code than closed-access publications (7.21% vs. 4.22%; χ² = 4.442, p < 0.05). Code sharing was estimated to increase by 0.5% per year, but this trend was not significant (p = 0.11). The years 2021 and 2022 showed a shift towards more frequent sharing, but the percentage of papers sharing code has remained consistently below 15% over the past decade (Figure 1).
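The headline percentages follow from simple arithmetic on the reported figures, and the short sketch below recomputes them. The relative difference between 7.21% and 4.22% works out to about 71%, consistent with the rounded 70% given above. The χ² test itself cannot be reproduced from the text, because the underlying counts of open- and closed-access papers are not reported.

```python
def percent(part: float, whole: float) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# Papers with code out of papers surveyed (from the text)
shared, surveyed = 49, 1001
share_rate = percent(shared, surveyed)
not_shared = 100.0 - share_rate

# Percentage of papers sharing code, open vs. closed access (from the text)
oa_rate, closed_rate = 7.21, 4.22
relative_increase = percent(oa_rate - closed_rate, closed_rate)

print(f"{share_rate:.1f}% shared code; {not_shared:.1f}% did not")
# 4.9% shared code; 95.1% did not
print(f"open access: {relative_increase:.0f}% more likely to share code")
# open access: 71% more likely to share code
```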

We found that papers including code have a disproportionate impact on the literature (Figure 2) and accumulate citations faster (a marginally significant year × code-inclusion interaction; p = 0.0863). Further, we found a significant interaction between open access and code inclusion (p = 0.0265), with publications meeting both Open Science criteria (i.e., open code and open access) having the highest predicted citation rates overall (Figure 2). For example, Open Science papers are expected to receive more than twice as many citations as fully closed papers in year 13 after publication (96.25 vs. 36.89; Figure 2).
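The "more than twice as many citations" comparison can be checked directly from the two predicted values. Both numbers are model predictions for year 13 reported in the text, not observed counts; this sketch only restates their ratio and per-paper difference.

```python
# Predicted citations in year 13 after publication (values from the text)
open_science = 96.25   # open code + open access
fully_closed = 36.89   # no code + closed access

ratio = open_science / fully_closed
difference = open_science - fully_closed

print(f"ratio: {ratio:.2f}x")                        # ratio: 2.61x
print(f"difference: {difference:.2f} citations")     # difference: 59.36 citations
```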

Ecological and evolutionary literature falls far short of the code sharing required for complete reproducibility and transparency, and there has been no systematic increase over the last 12 years. This undoubtedly hinders scientific progress and has far-reaching financial consequences, because code must be rewritten for common analytical tasks^2. Our results also indicate that failing to share code may limit the impact of most scientists' work, as sharing code leads to faster citation accumulation, particularly when combined with open-access publication. Scientists and journals that embrace code sharing may be more impactful, even if the publications are closed-access. For instance, our results suggest that a scientist who publishes two Open Science papers in a journal with an impact factor of 3 will have 112 more citations after 13 years than if they had published the same papers closed-access and without sharing code.

To create an environment conducive to reproducibility and transparency, we call upon scientists, funding sponsors, and institutions to champion code sharing and acknowledge it as a legitimate and valuable contribution to the scientific process. Major funding sponsors are beginning to mandate Open Science approaches (e.g., the EU’s Horizon Europe: https://research-and-innovation.ec.europa.eu/funding/funding-opportunities/funding-programmes-and-open-calls/horizon-europe_en; the US Year of Open Science: https://open.science.gov/). Scientific journals can also play a significant role in this transformation^1,3,4. At a minimum, journals should facilitate the deposition of code in a stable repository and provide a link to that repository (ideally in both human- and machine-readable formats)^6. More ambitious solutions might include linking methods text to the corresponding code, employing dedicated code editors to help improve code style and clarity, or incorporating computational notebooks (e.g., R Markdown, Quarto, Jupyter)^6. Such measures would enhance transparency in reporting and give reviewers and readers the information needed to reproduce and validate a study's findings. Adopting these principles and practices will help integrate open code into the scientific landscape, enhancing the verifiability and impact of our research.

Given the growing list of reasons for code sharing, we encourage scientists to embrace open code, and open access more generally. Although maintaining well-documented code in a version-controlled public repository (e.g., GitHub) or a public archive with directions for its use (e.g., Zenodo) is ideal, options that require less effort can at least make code available to interested parties. Recent advances in artificial intelligence (e.g., ChatGPT) have made documenting scripts easier, lowering the cost of code sharing for authors^18. Software licenses play a critical role in code sharing: permissive licenses (e.g., MIT) encourage reuse, while restrictive or proprietary licenses can still allow methodological transparency while limiting or preventing reuse of the published code^10. Where code is published without a license, the author retains the copyright^10, which gives scientists a way to embrace transparency without permitting others to use their code. Conversely, scientists who want their code to be freely reusable need to include an appropriate license. Finally, we stress that as code sharing increases, our attribution practices must keep pace, both for scientific transparency and to credit developers^12.
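Because a missing license changes what others may do with published code, it can help to check scripts for an explicit license declaration before release. The sketch below looks for an SPDX short-form tag (e.g. `# SPDX-License-Identifier: MIT`) near the top of a script's text. SPDX tags are one common machine-readable convention, not something prescribed here; a `LICENSE` file at the repository root is the other usual route, so a missing tag does not prove the code is unlicensed. The function name `declared_license` and the 10-line header window are hypothetical choices, and the R scripts are simply passed in as strings.

```python
import re
from typing import Optional

# Matches an SPDX short-form license tag such as "# SPDX-License-Identifier: MIT".
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(\S.*)")


def declared_license(script_text: str, header_lines: int = 10) -> Optional[str]:
    """Return the license expression declared in a script's header, or None if absent.

    Only the first `header_lines` lines are searched, because such tags are
    conventionally placed at the top of a file.
    """
    header = "\n".join(script_text.splitlines()[:header_lines])
    match = SPDX_TAG.search(header)
    return match.group(1).strip() if match else None


licensed_script = "# SPDX-License-Identifier: MIT\nfit <- lm(y ~ x, data = d)\n"
unlicensed_script = "fit <- lm(y ~ x, data = d)\nsummary(fit)\n"

print(declared_license(licensed_script))    # MIT
print(declared_license(unlicensed_script))  # None
```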

1. Nosek, B. A. et al. Promoting an open research culture. Science 348, 1422–1425 (2015).
2. Freedman, L. P., Cockburn, I. M. & Simcoe, T. S. The economics of reproducibility in preclinical research. PLoS Biol. 13, e1002165 (2015).
3. McNutt, M. Journals unite for reproducibility. Science 346, 679 (2014).
4. Reality check on reproducibility. Nature 533, 437 (2016).
5. Munafò, M. R. et al. A manifesto for reproducible science. Nat. Hum. Behav. 1, 0021 (2017).
6. Peng, R. D. Reproducible research in computational science. Science 334, 1226–1227 (2011).
7. Parker, T. H., Nakagawa, S., Gurevitch, J. & IIEE (Improving Inference in Evolutionary Biology and Ecology) workshop participants. Promoting transparency in evolutionary biology and ecology. Ecol. Lett. (2016).
8. Feng, X., Qiao, H. & Enquist, B. J. Doubling demands in programming skills call for ecoinformatics education. Front. Ecol. Environ. 18, 123–124 (2020).
9. Parker, T. H. et al. Transparency in ecology and evolution: real problems, real solutions. Trends Ecol. Evol. 31, 711–719 (2016).
10. Stodden, V. The legal framework for reproducible scientific research: licensing and copyright. Comput. Sci. Eng. 11, 35–40 (2009).
11. Busjahn, T. & Schulte, C. The use of code reading in teaching programming. In Proceedings of the 13th Koli Calling International Conference on Computing Education Research 3–11 (Association for Computing Machinery, 2013).
12. Merow, C. et al. Better incentives are needed to reward academic software development. Nat. Ecol. Evol. (2023).
13. Barton, C. M. et al. How to make models more useful. Proc. Natl. Acad. Sci. U.S.A. 119, e2202112119 (2022).
14. Public Library of Science. PLOS Open Science Indicators. doi:10.6084/m9.figshare.21687686.v3 (2023).
15. Stodden, V., Seiler, J. & Ma, Z. An empirical analysis of journal policy effectiveness for computational reproducibility. Proc. Natl. Acad. Sci. U.S.A. 115, 2584–2589 (2018).
16. Gomes, D. G. E. et al. Why don’t we share data and code? Perceived barriers and benefits to public archiving practices. Proc. Biol. Sci. 289, 20221113 (2022).
17. R Core Team. R: A Language and Environment for Statistical Computing. https://www.R-project.org/ (2021).
18. Merow, C., Serra-Diaz, J. M., Enquist, B. J. & Wilson, A. M. AI chatbots can boost scientific coding. Nat. Ecol. Evol. (2023).

Competing interests: the authors declare no competing interests.

Rcodesharingmethods.docx
