SPEED BUMP FOR OPEN ACCESS TO GENOMIC DATA

A bioinformatics method that could help solve crimes or identify disaster victims has also raised genetic privacy concerns. Those concerns threaten to stall the rich data-sharing potential of genome-wide association studies just as the power of these studies to detect risk-conferring alleles in complex genetic diseases is being realized.

Genome-wide association studies have been used to great effect in recent years to pick up subtle polymorphisms that confer small risks and thus become evident only when large cohorts are analyzed. Using this strategy, researchers have identified genes and gene candidates for a number of common neurologic diseases, including multiple sclerosis, autism, ADHD and schizophrenia, and have made significant inroads in understanding how individual SNPs affect therapeutic response, potentially paving the way for personalized medicine.

To facilitate data sharing and accelerate genetic studies, the National Institutes of Health has made a concerted effort to ensure that summary data from genome-wide association studies is freely available to researchers, and to require researchers to bank genetic data from NIH-funded studies in online repositories. But in a policy change announced August 29, the National Human Genome Research Institute (NHGRI), along with the Wellcome Trust and the Broad Institute, took a cautionary step backward, limiting access to the very data for which they’ve advocated greater sharing.

The move was prompted by the discovery that, with enough genomic data on an individual, it is possible to determine whether that individual participated in a given genetic study by analyzing pooled summary data, such as the data that until recently was readily available on NIH’s dbGaP or CGEMS Web sites. In the August 29 issue of PLoS Genetics, David W. Craig and colleagues at the Translational Genomics Research Institute (TGen) in Phoenix and the University of California, Los Angeles, spelled out a methodology by which an individual genotype could be detected, probabilistically, from a mix of DNA samples or from pooled data sets of aggregate single nucleotide polymorphisms. The researchers developed the method as a means of identifying a single person’s DNA from among many potential samples from, say, a contaminated crime scene or mass-casualty disaster.
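
The detection idea can be illustrated with a small simulation. The article does not spell out the statistic, so the sketch below assumes one form described for this kind of analysis: for each SNP, compare how far a person's genotype sits from a reference group's allele frequency versus the pooled study frequency, then summarize across SNPs with a one-sample t-statistic. All names, cohort sizes and allele frequencies here are hypothetical, and the data are simulated, not drawn from dbGaP or CGEMS.

```python
import math
import random


def draw_genotype(freqs, rng):
    """Simulate one person: allele dosage per SNP (0, 0.5 or 1) from two allele draws."""
    return [((rng.random() < p) + (rng.random() < p)) / 2 for p in freqs]


def allele_frequencies(genotypes):
    """Aggregate (summary) allele frequency per SNP across a group of people."""
    n = len(genotypes)
    return [sum(column) / n for column in zip(*genotypes)]


def membership_t(person, pool_freq, ref_freq):
    """One-sample t-statistic over per-SNP distances.

    Assumed statistic: d = |y - reference| - |y - pool| for each SNP.
    d tends to be positive when the person's own DNA is part of the pool.
    """
    d = [abs(y - r) - abs(y - m) for y, m, r in zip(person, pool_freq, ref_freq)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    return mean / math.sqrt(var / n)


# Hypothetical toy cohort: 20,000 independent SNPs, 50 people per group.
rng = random.Random(0)
freqs = [rng.uniform(0.1, 0.9) for _ in range(20000)]
pool = [draw_genotype(freqs, rng) for _ in range(50)]       # study participants
reference = [draw_genotype(freqs, rng) for _ in range(50)]  # separate reference group
outsider = draw_genotype(freqs, rng)                        # took part in neither

pool_freq = allele_frequencies(pool)      # the only study data an analyst would see
ref_freq = allele_frequencies(reference)

t_member = membership_t(pool[0], pool_freq, ref_freq)
t_outsider = membership_t(outsider, pool_freq, ref_freq)
print(f"participant: t = {t_member:.1f}")
print(f"non-participant: t = {t_outsider:.1f}")
```

In this toy setup a large positive value points toward membership in the pool, while a value near zero is consistent with a non-participant. A real analysis would also have to account for ancestry, genotyping error and correlation between neighboring SNPs.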

In a letter to Science magazine published online September 4, NIH Director Elias Zerhouni and National Heart, Lung and Blood Institute Director Elizabeth Nabel said that, in addition to having important implications for forensics and genome-wide studies, the TGen/UCLA research “has also changed our understanding of the risks of making aggregate genomic data publicly available.”

Privacy Trumps Data Sharing

“Sharing genomic data and, particularly, allele frequencies has become common practice, if not an imperative, in science,” Zerhouni and Nabel wrote.  “Yet, the protection of participant privacy and the confidentiality of their data are of paramount importance.”

Informed by Craig in advance of the paper’s publication that study participants’ genetic information privacy could be compromised, NIH moved quickly to remove aggregate genomic data from public access.  Such data is now sealed off behind a firewall, accessible to researchers only after an application and review process and subject to specific terms and conditions of use. The change essentially treats aggregate data as individual-level genotype/phenotype data, to which access was already controlled because of perceived privacy vulnerabilities.

Laura Lyman Rodriguez, senior advisor to the director for research policy at NHGRI, said the action “was the appropriate, if cautious, thing to do.”

“I think it was very unexpected that you could even make this kind of inference about inclusion within a group,” Rodriguez told NerveCenter.  “So the fact that something that had been previously thought not possible was now possible reminded everyone of the speed with which the methodologies and the technologies are changing.  To be prudent and to be respectful of the [study] participants, the thinking was that it was better to be cautious.”

She said the new policy was not intended to pose a “high barrier” for researchers, and that the response time for data access requests averages about two weeks.

Ripple Effects

The move has nonetheless caused ripple effects throughout the genetics research community, as universities mull whether to pull data from their own Web sites and grapple with issues of informed consent in the face of the apparent vulnerabilities to participant confidentiality.

“To me, it’s taking a very large step to prevent a small risk,” says Bradley Malin, a Vanderbilt University geneticist who is involved in an NHGRI-funded, five-center study to correlate SNPs from various DNA collections with clinical phenotypes of disease. “The amount of technical savviness that you would need to have in order to leverage this type of an attack, and the amount of ethical responsibility you would need to violate [gives] it a very low probability of occurring.”

Malin called the data restrictions a “knee-jerk reaction” on the part of NIH and predicted that “as time moves forward they’ll have a better idea of how much information we can share before the threat [of a privacy breach] becomes too big.”

David Karp, a geneticist at the University of Texas Southwestern Medical Center who is involved in the development of ImmPort (http://www.ImmPort.org/), a data-archiving and -sharing platform supported by the National Institute of Allergy and Infectious Diseases (NIAID), agrees that there “is a very small likelihood of harm to people who have contributed DNA to genome-wide association studies.” He suggests that the government’s conservative response might be seen as “unnecessarily paternalistic and patronizing to a degree, because it says that we are going to make a decision to possibly limit the utility of research that people volunteered for because we’re not certain that we can protect their identity 100 percent.”

Social and Ethical Questions

Karp advocates an open and continuous dialogue between investigators and study participants that straightforwardly addresses open questions such as privacy protection. “There are social and ethical considerations that need to be discussed and figured out, but I think the risk is that scientists are making decisions that the community of subjects who we’re trying to protect don’t feel need to be made for them.”

Genetic privacy issues have been in the news with the recent debate and eventual passage of the Genetic Information Nondiscrimination Act, which will bar employers and health-insurance companies (but not life or long-term-care insurers) from using an individual’s genetic blueprint to deny employment or health coverage. The law takes effect in May 2009 for health insurers and in November 2009 for employers.

Brenda Patoine
