Every fifth published metagenome is not available to science

Ester M. Eckert, Andrea Di Cesare, Diego Fontaneto, Thomas U. Berendonk, Helmut Bürgmann, Eddie Cytryn, Despo Fatta-Kassinos, Andrea Franzetti, D. G. Joakim Larsson, Célia M. Manaia, Amy Pruden, Andrew C. Singer, Nikolina Udikovic-Kolic, Gianluca Corno*

Have you ever sought to use metagenomic DNA sequences reported in scientific publications? Were you successful? Here, we reveal that for no fewer than 20% of the papers found in our literature search, published between 2016 and 2019, the metagenomes either were not deposited in a repository or were simply inaccessible. The proportion of inaccessible data in the literature has increased year-on-year. Noncompliance with Open Data is best predicted by the scientific discipline of the journal. The number of citations, the journal type (e.g., Open Access or subscription), and the publisher are not good predictors of data accessibility. However, publications in high-impact-factor journals do display a higher likelihood of accessible metagenomic data sets. Twenty-first-century science demands compliance with the ethical standard of sharing metagenomes, and DNA sequence data more broadly. Data accessibility must become a routine and mandatory component of manuscript submission, a requirement that should apply across the growing number of disciplines using metagenomics. Compliance must be ensured and reinforced by funders, publishers, editors, reviewers and, ultimately, the authors.
