DATA SHARING: MAKING HEADWAY IN A COMPETITIVE RESEARCH MILIEU

If kindergarten teaches us all we need to know about life, many researchers missed the part about sharing.
Ten years ago, a team of researchers at the Massachusetts General Hospital (MGH) Institute for Health Policy published the first in a series of surveys on data sharing practices and attitudes among academic researchers. The studies have revealed what most of us already know – in the sandbox of experimental neurology and neuroscience, not everyone wants to share.

A high-resolution multiphoton microscopy mosaic image showing the distribution of cell bodies (Hoechst 33342; blue) and alpha-synuclein (green) in the cerebellum of a non-transgenic animal, from the BIRN project.

In their first survey, they discovered that academic researchers involved with commercialization of their research, or who collaborate with industry, were more likely to withhold results from other researchers.
Things appear to be changing, but the idea of sharing data can still seem counterintuitive. Grant money is harder than ever to come by, and competition for space in professional journals is fierce. Factor in the desire for personal advancement and prestige, plus the allure of possibly striking it rich, and it would seem that sharing data, although a lofty ideal, hasn’t got a prayer if it remains voluntary.
Yet this may not be the case, according to MGH researcher Eric Campbell, who, with his colleague David Blumenthal, co-authored the surveys. Sharing is slowly making inroads, Campbell said. In some areas, though, notably clinical trials and privately funded research, opposition to sharing any data prior to publication or a patent remains steadfast.
“Much has changed, and I suspect that in many areas of research there’s greater awareness of the need for open sharing – at least in terms of replicating published results,” he said. “We’re clearly moving toward more open research, at least that’s the talk. The problem is that policies need to be established by consensus, they need to be enforced, and they need to have teeth, and that means some kind of penalties.”

High Stakes
Campbell said resistance to data sharing is understandable, given that academic institutions, individual researchers and, of course, the pharmaceutical industry invest huge amounts of time and money in research, hoping to recoup the outlay and turn a profit.
Developing a new drug now costs more than $800 million and can require as many as 70 separate studies, according to industry data.
“For universities, big hits are few and far between,” he noted. “They feel the need to protect their data and there’s nothing wrong with that because they put up money into their research and researchers.”
A number of medical journals, professional societies and privately funded consortiums now require some degree of data sharing as a condition of publication or participation, and the NIH has adopted rules requiring grant recipients to sign an agreement to share data. Recipients of grants of $500,000 or more per year must now provide a “data-sharing plan” in their grant application, but the rules allow significant leeway on what can be kept private. In addition, many researchers receiving public funding or grant money for clinical trials are finding ways to circumvent such requirements and keep their data private, observers said.
“I still run into problems,” noted Robert Williams, co-director of the Center of Genomics and Bioinformatics at the University of Tennessee School of Medicine in Memphis. “Especially when requesting data from researchers involved in projects where it takes a great deal of money and effort to develop a data set that can be mined or a repository of specimens where samples have a short shelf-life, for example tissues from a population of patients with a specific disease,” he said.
“It’s understandable. I know it’s very difficult to let go of primary data. Of course it’s less excusable when a study has been published, but it goes on all the time. Some investigators skirt policies, and others ignore them.”
He also agreed that the policies in place are often not enforced. “We recently tried to get some microarray data from a study published in the Public Library of Science (PLOS), and were unsuccessful. This is data supporting a study that has appeared in a public open-access journal, but we weren’t granted access. We were told the data was peripheral to the study, but this can be said for about three out of four new data sets. It should have been available.”

Data Hoarding
Withholding data not only harms biomedical progress in general but can also frustrate promising junior researchers and even turn them away from pursuing avenues of discovery, according to Campbell. In one of his surveys, fully half of graduate students and postdoctoral fellows reported having data requests denied, and half again said that withholding had had a negative effect on their research or the progress of their lab or group. Some even reported having to abandon a line of research, being unable to confirm the results of other scientists, or having their research or a publication delayed.
“It is incumbent on all academic institutions to protect their investments. But it is equally important for junior researchers to have access to data from other, more recognized researchers,” according to Campbell.
Nonetheless, an entrenched climate of competition for grants and other sources of funding will not be easily overcome, he noted.
“Many researchers tend to keep their data to themselves or only grant access to researchers with name recognition in a field. We should be moving in the other direction and making data sharing an important part of the research lexicon, as a matter of professional ethics and integrity.”
According to Maryann Martone, professor-in-residence in the Department of Neurosciences and co-director of the National Center for Microscopy and Imaging Research (NCMIR) at the University of California, San Diego, the reality is that most investment in research is now, and will continue to be, driven by the potential for profit and prestige – and these are powerful incentives not to share data.
“The landscape is better than it was ten years ago, but for some types of data it remains a big problem,” she said. “I think expectations were that things would be better by now. Some inroads have been made, but many rank and file researchers are still reluctant to make all of their raw data available freely. The view seems to be, why submit data unless someone asks for it.”

Different Worlds
Eldon Geisert, a professor at the University of Tennessee College of Medicine in Memphis, also cautioned that progress in areas such as genomics does not mean that researchers in other disciplines are suddenly embracing the practice. The Human Genome Project set a precedent for sharing data that remains the exception, not the rule, he said.
Geisert, who works in both academic genomics and commercial brain cancer research (he also runs a small laboratory), said that data-sharing practices differ greatly between the two fields, as well as between public and private ventures.
“Genomic data we share freely because there’s so much data and so many ideas that you just don’t care – it’s wide open. But in our private research we try not to share anything until a patent is awarded – period. You have to protect intellectual property or you can lose it very easily, especially data that can be mined.”
Without elaborating, Geisert said his commercial work involves glioblastoma multiforme, the type of brain tumor recently diagnosed in Sen. Edward Kennedy (D-Mass).
“With private research and commercial trials data, the rule is you don’t talk about it – not on cell phones or even land lines – there’s a lot of paranoia, but it may be justified. I live in two worlds – genomics and commercial research, and I can tell you between the two, it’s nice being able to talk to people without having to protect anything.”
In his experience, he added, the same holds true for major repositories of clinical trial data, including those sponsored by government agencies.
“We’ve requested data from the National Cancer Institute that we know is there and should be publicly available, but haven’t been able to get access to it. I guess it’s true across the board: The name of the game is protect your data at any cost.”

Clinical Trials
Although few researchers expect data sharing in clinical trials to be embraced any time soon, major pharmaceutical companies have shown signs of progress. In May 2006, GlaxoSmithKline announced that it would provide open access to trial summaries of more than 2,600 clinical drug trials on 52 prescription drugs and vaccines.
“We have created a record of transparency which we believe is unsurpassed in regard to medical interventions that affect the daily lives of patients,” said Frank Rockhold, Senior Vice President, Biomedical Data Sciences, GSK Research & Development. “The initiative to make healthcare information more widely available is growing as more pharmaceutical companies create results databases. It will progress further if not only more companies but also academic and government sponsors create public databases of the results of their research into various medical interventions.”
Last October, industry giant Eli Lilly also came on board. CEO Sidney Taurel endorsed a voluntary data-sharing collaboration among the healthcare industry, the Pharmaceutical Research and Manufacturers of America (PhRMA), and government agencies, “for safety” purposes. The cooperative effort, which would include integrated, “constantly updated computer systems and databases,” could cost upwards of $400 million and take 10 years to develop, he said.
Eli Lilly will disclose almost all of its clinical trial data from marketed drugs over the last 10 years. Lilly has also promised to embrace data transparency by continuing to reveal data from all “early to late stage clinical trials, including safety information and outcomes.” It will even disclose trials that test established drugs for “off-label uses,” Taurel explained.
PhRMA now operates a publicly accessible database of some clinical trial results, and NIH continues to develop ClinicalTrials.gov, although both sites post only late-stage data and allow sponsors considerable leeway in which results to share. Far more controversial is the release of early-stage data, which can undermine sponsors’ competitive position.
According to Andrew Vickers, a research methodologist at Memorial Sloan-Kettering Cancer Center in New York City, these changes have more to do with public relations than with data sharing.
“In the clinical trial community investigators routinely refuse to share raw data from a randomized trial without giving a reason,” he stated in a recent article published in the journal Trials. He offered a novel solution.
“Clinical trialists … sometimes appear overly concerned with being scooped and with misrepresentation of their work. Both possibilities can be avoided with simple measures such as inclusion of the original trialists as co-authors on any publication resulting from data sharing. Moreover, if we treat any data set as belonging to the patients who comprise it, rather than the investigators, such concerns fall away,” according to Vickers.
While the NIH’s data-sharing requirements for grantees are “a good start,” he noted, they do not ensure that data are ultimately made available to others interested in the research. “Moreover, only a fraction of randomized trials are NIH-funded to the tune of $500,000 per year.”
He suggested making it illegal to experiment on human subjects without publishing the raw data within an appropriate period of time.
“I assume there may be special circumstances in which it might be reasonable to keep data confidential – for example, during early phase development of a patented drug – and one might envisage that exceptions might be granted. Lest my proposal seem rather draconian, consider whether some of the Vioxx-related deaths might have been avoided had Merck been forced to publish raw data on individual patients,” he noted.

‘Mom and Apple Pie’
Computational neuroscience is one area that lags behind others in terms of data sharing, but here too there is greater access than ever to raw data, according to Yale University neurobiologist Nicholas “Ted” Carnevale, who develops computational models of neurons and biological neural networks.
The change has occurred primarily because source data can be manipulated and, if not evaluated by others, may be suspect, he explained, adding that making raw data freely available to other researchers amounts to peer review on a grand scale. And as in other areas of experimental research, computational neuroscientists have come to realize that if they refuse to make their data available for scrutiny, their work automatically becomes suspect and is likely to be discounted by others in the field.
“What we all want is reproducibility,” he said. “I feel that it’s up to the research community to set its own data sharing standards. The acceptance is finally there in neuroscience. No one’s against it – it’s like mom and apple pie – but nobody is rushing into it either.”
UCSD’s Martone said neuroimaging also lags behind the data sharing curve.
“In neuroimaging, data is tough to collect, easy to misplace, and a large number of researchers are still uncomfortable with sharing because there’s a strong culture [pressure] to not put out data unless it’s of the highest quality possible, which a lot of supporting neuroimaging data clearly isn’t.”
Something as simple as an out-of-focus sample can have repercussions that harm a researcher’s reputation, she observed. “You have to have thick skin if you are putting it all out there for other researchers to see. You can get crucified if you’re not careful.”

The Nascent ‘Dataverse’
The first high-profile battle over open sharing involved the much-publicized rift between the principal researchers involved in sequencing the human genome.
Nobel laureate James Watson, who jointly discovered the structure of DNA in 1953, was the first director of the $3 billion Human Genome Project. He and his NIH colleague J. Craig Venter had a falling out over a number of issues, with Venter adamant about the need for state-of-the-art high-throughput sequencing technology to accomplish the mission.
Watson and Venter both left the NIH in 1992, Venter to start the nonprofit Institute for Genomic Research (TIGR). Venter later formed Celera Genomics, a well-funded private enterprise dedicated to decoding the genome first, and later still the J. Craig Venter Institute.
The other architect, Francis Collins, who replaced Watson and became Venter’s chief competitor in the race, is credited with mobilizing the NIH effort through the adoption of Venter’s rapid sequencing techniques and seeing the project through to completion. He recently announced his departure as director of the National Human Genome Research Institute.
Unlike Collins, who became an outspoken advocate for open sharing of emerging genome data, Venter sought to keep Celera’s findings under wraps and keep private competitors at bay until the project was done. Ultimately both shared credit for the completed genome when it was published in 2003.
Squabbles aside, no one can argue against the contribution of the Human Genome Project in raising awareness of the debate and paving the way for data sharing in huge projects. Since then, the practice has become integrated into several areas of research. In neurology, sharing neuroimaging, proteomic and genomic data has become widely accepted, and there is progress, albeit slower, in other areas.
Today, web-based initiatives such as the NIH repositories for gene sequence data on Alzheimer’s and other neurological diseases are used by thousands of researchers around the world, as are programs such as the Laboratory of Neuro Imaging (LONI), the Human Brain Project and the Allen Brain Atlas. Similar databases exist in Europe and Japan. This initial wave of an emerging data sharing paradigm has laid the infrastructure for handling massive amounts of data from different sources, including high-throughput gene assay databases, complex neuroimaging repositories, and other components of the bioresearch “dataverse.”

Blue Skies?
Last year, the National Institute of Neurological Disorders and Stroke (NINDS) asked researchers for a wish list – a “blue sky vision” – of future NIH-sponsored neurological research. Promoting data sharing topped the list of process and policy recommendations.
At a recent congressional hearing on the FY2009 NIH budget, NINDS Director Story Landis emphasized NIH’s commitment to encouraging data sharing among the nation’s researchers working on federally funded studies.
“Most of the hundreds of diseases caused by defects in single genes are relatively uncommon, but combined variations in multiple genes, often operating in concert with environmental influences, contribute to many common neurological disorders and to differences in how people respond to treatment,” Landis said.
“This year investigators identified the first two new genes linked to multiple sclerosis in more than 30 years, reinforcing the rationale for a therapy that is already in clinical trials. Because these studies often require participation of thousands of people, sharing data among researchers is essential.”
A major Parkinson’s disease study at NINDS last year “set a standard in rapid data sharing,” she continued, and the NINDS Human Genetics Repository and other Institute efforts “will continue to emphasize the importance of open sharing of data among neurologists.”
Another example, according to Landis, is the recently launched Alzheimer’s Disease Neuroimaging Initiative (ADNI), a project to develop an imaging and biomarker database that researchers in both the public and private sectors can tap as they develop and test drugs for memory decline.
At $60 million, one-third of it funded by private Alzheimer’s associations and pharmaceutical companies, ADNI is the largest public-private partnership on brain research underway at the NIH, and, unlike the Human Genome Project, it will make a central database of raw data freely available to researchers during development, she explained.
While such initiatives might advance data sharing in general, the stagnant state of the NIH budget over the past five years may be pushing things in the opposite direction, commented Martone, who heads the UCSD coordinating center of the Biomedical Informatics Research Network (BIRN), another multi-center cooperative initiative with open data sharing.
“Actually, I think that the funding situation at the NIH may be exacerbating the problem,” she said. “Because NIH has less grant money available, the competition for funds has grown much tougher, and that feeds into the tendency of researchers to want to keep their cards close to their vest.”
Kurt Samson
