Data Sharing and Research Parasites
On January 21, 2016, the Editors of the New England Journal of Medicine, Dan L. Longo, M.D., and Jeffrey M. Drazen, M.D., published an editorial that characterized scientists who re-analyzed published data sets as "research parasites" (N Engl J Med 2016; 374:276-277). Many groups of scientists wrote thoughtful Letters to the Editor in response. While the NEJM published some responses, many, including one initiated by the members of the FGED Society and endorsed by many others, were rejected. This page was created to allow the FGED Society's letter to be published and made part of the public record.
A complete list of signatories to this letter is provided below. We encourage further discussion of this issue on social media and have set up a Facebook page for this purpose. On Twitter, use the hashtag #ResearchParasites.
A reply to “Data Sharing”
Recently, the New England Journal of Medicine and five other journals published “Sharing Clinical Trial Data — A Proposal from the International Committee of Medical Journal Editors (ICMJE)”[1-6]. The proposal was followed the next day by an NEJM editorial from Longo and Drazen titled “Data Sharing”[7]. We wish to reply to both the proposal and the editorial, as we have serious concerns with each: both would allow primary authors to retain control of shared data, and both emphasize the importance of maximizing benefit to the primary authors.
There are two major issues with the ICMJE proposal. First, the ICMJE proposes that data should be shared no later than 6 months after publication. We believe that data should be shared upon publication and that data should be made available to the reviewers at the time of submission. Reviewers and readers cannot adequately assess a publication without access to the primary data on which it is based. Requiring release of data to reviewers at the time of submission and to readers at the time of publication also provides the journals and society with leverage that may be necessary to ensure release. Moreover, this model of release allows reviewers to verify that the data is actually available as described. A previous study on the persistence of “supplemental resources” in the biomedical literature[8] shows that the availability of shared data declines over time unless the data is stored with the journal or with an independent and well-funded third party (often a governmental agency). In addition, the authors of that study noted “Our work suggests that approximately 10% of all supplemental data was not available at the time of publication” even though the manuscripts specifically indicated that such data was available. This suggests that a check of data availability is a necessary part of the review process.
Second, the ICMJE proposal requires that “authors of secondary analyses using these shared data must attest that their use was in accordance with the terms (if any) agreed to upon their receipt.” This requirement leaves the long-term control of the data and secondary analyses in the hands of the primary authors. The rationale for this requirement is that “the reasonable rights of investigators and trial sponsors must also be protected”. This concern is expanded upon by Drs. Longo and Drazen, who worry “that a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited.” It is our position that data sharing is intended to encourage others to use the data and to maximize the scientific benefit of the data. We hold that to maximize benefit to the patients, science and the community, data must be freely shared without constraint on subsequent usage. This is especially important when the data gathering activities were funded by governmental agencies or other funding sources that are responsible to the public at large. The obligation of others is to fully acknowledge the source of the data, but not necessarily to collaborate with those who produced it. Collaboration is often quite difficult due to a wide variety of factors, including disparate or conflicting goals and interpersonal relationship issues. Moreover, the reanalysis of data to “disprove what the original investigators had posited” is exactly how science should work. That is, if another author uses someone else’s data to disprove what the original investigators posited, that is a good thing for science (and patients), provided the second opinion is indeed the correct one.
Drs. Longo and Drazen cite an additional concern that they have with data sharing as it relates to clinical research. They state “that someone not involved in the generation and collection of the data may not understand the choices made in defining the parameters”. They further suggest that this may lead to false conclusions or inappropriate cross-study comparisons. While this is indeed a valid concern, we would argue that it is not a valid rationale against sharing clinical data. Rather, this concern should be addressed by requiring appropriate and more complete annotation of the data such that other researchers can “understand the choices made in defining the parameters”.
To address this issue, “Minimum Information” standards for the publication of shared data have been developed in fields ranging from high-throughput biological assays[9-15] to selected types of clinical studies[16-18]. While not fully successful in practice (due to issues both with the standards themselves and with author adherence to them), such standards go a long way towards providing others with the data annotations needed to allow cross-comparability. We would further argue that a huge benefit of data sharing is that issues of inadequate annotation and study design become far more apparent to the reviewers and readers. Beyond enabling cross-comparability and additional analyses, adequate information on data and study design is needed to judge the original study itself: without it, readers cannot assess the potential for biases and errors in study design, and the validity of the study may be called into question.
As long as some continue to insist that clinical trial and other clinical data should remain in the exclusive control of those who generate the data until those researchers can milk it for all it's worth, progress will be slower than it could be, to the detriment of future patients and our society as a whole. Longo and Drazen themselves state "The potential for leveraging existing results for even more benefit pays appropriate increased tribute to the patients who put themselves at risk to generate the data. The moral imperative to honor their collective sacrifice is the trump card that takes this trick." We suggest that the first sentence of this quote should be fully internalized. It is a moral imperative, owed to the patients, to get the most from the data, even if that means the original researchers do not.
1. Taichman DB, Backus J, Baethge C, et al. Sharing clinical trial data: a proposal from the International Committee of Medical Journal Editors. Lancet 2016.
2. Taichman DB, Backus J, Baethge C, et al. Sharing clinical trial data: a proposal from the International Committee of Medical Journal Editors. CMAJ 2016.
3. Taichman DB, Backus J, Baethge C, et al. Sharing Clinical Trial Data: A Proposal From the International Committee of Medical Journal Editors. JAMA 2016.
4. Taichman DB, Backus J, Baethge C, et al. Sharing Clinical Trial Data: A Proposal From the International Committee of Medical Journal Editors. Ann Intern Med 2016.
5. Taichman DB, Backus J, Baethge C, et al. Sharing Clinical Trial Data: A Proposal from the International Committee of Medical Journal Editors. PLoS Med 2016; 13(1): e1001950.
6. Taichman DB, Backus J, Baethge C, et al. Sharing Clinical Trial Data - A Proposal from the International Committee of Medical Journal Editors. N Engl J Med 2016.
7. Longo DL, Drazen JM. Data Sharing. N Engl J Med 2016; 374(3): 276-7.
8. Anderson NR, Tarczy-Hornoch P, Bumgarner RE. On the persistence of supplementary resources in biomedical publications. BMC Bioinformatics 2006; 7: 260.
9. Huang J, Mirel D, Pugh E, et al. Minimum Information about a Genotyping Experiment (MIGEN). Stand Genomic Sci 2011; 5(2): 224-9.
10. Yilmaz P, Kottmann R, Field D, et al. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat Biotechnol 2011; 29(5): 415-20.
11. Bourbeillon J, Orchard S, Benhar I, et al. Minimum information about a protein affinity reagent (MIAPAR). Nat Biotechnol 2010; 28(7): 650-3.
12. Bustin SA, Benes V, Garson JA, et al. The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin Chem 2009; 55(4): 611-22.
13. Deutsch EW, Ball CA, Berman JJ, et al. Minimum information specification for in situ hybridization and immunohistochemistry experiments (MISFISHIE). Nat Biotechnol 2008; 26(3): 305-12.
14. Taylor CF, Paton NW, Lilley KS, et al. The minimum information about a proteomics experiment (MIAPE). Nat Biotechnol 2007; 25(8): 887-93.
15. Brazma A, Hingamp P, Quackenbush J, et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 2001; 29(4): 365-71.
16. Scudamore CL, Soilleux EJ, Karp NA, et al. Recommendations for minimum information for publication of experimental pathology data: MINPEPA guidelines. J Pathol 2016; 238(2): 359-67.
17. Lemmon VP, Ferguson AR, Popovich PG, et al. Minimum information about a spinal cord injury experiment: a proposed reporting standard for spinal cord injury experiments. J Neurotrauma 2014; 31(15): 1354-61.
18. Norlin L, Fransson MN, Eriksson M, et al. A Minimum Data Set for Sharing Biobank Samples, Information, and Data: MIABIS. Biopreserv Biobank 2012; 10(4): 343-8.
19. Brazma A. Minimum Information About a Microarray Experiment (MIAME)--successes, failures, challenges. ScientificWorldJournal 2009; 9: 420-3.