Opportunities Exist for the National Institutes of Health to Strengthen Controls in Place To Permit and Monitor Access to Its Sensitive Data

The Policy

Synopsis

As the largest public funder of biomedical research, the National Institutes of Health (NIH) controls access to several publicly accessible genomic datasets. Because genomic data are sensitive, the NIH released the Genomic Data Sharing (GDS) Policy on August 27, 2014, which “sets forth expectations that ensure the broad and responsible sharing of genomic research data.” According to a February 2019 review from the Office of Inspector General (OIG), titled Opportunities Exist for the National Institutes of Health to Strengthen Controls in Place To Permit and Monitor Access to Its Sensitive Data, the NIH had not considered the national security risks posed by foreign researchers' access to genomic data when drafting the GDS Policy. The report highlights the following main findings:

  • The NIH did not assess the risks to national security posed by permitting access to genomic data by researchers from other countries;
  • The NIH did not ensure that the GDS policy stayed current with national security needs; and
  • The NIH did not verify that researchers from other countries had proper training on how to handle sensitive genomic data.

Overall, the report concludes that current policies governing access to genomic data focus on investigators' research qualifications. According to the report, these policies do not consider whether providing a particular foreign investigator or foreign entity with access to sensitive genomic data could pose a national security threat. Importantly, the report points out that it is not clear whether the documents that lay out how sensitive data can be used for research purposes are legally binding outside of the United States. Moreover, the NIH granted access to genomic data to Chinese companies with ties to the Chinese government, which has been accused of using genetic data for surveillance.

The report includes a series of recommendations for the NIH, including developing required security training for investigators and implementing a protocol to verify that foreign researchers have completed proper training. It further recommends that the NIH work with an organization with national security expertise to assess “the impact of a potential misuse of genomic data provided to foreign [researcher]s.”

Lastly, the report includes a response by the NIH that disagrees with some of the report's conclusions. The NIH disagreed with the need to conduct a security assessment specifically focused on foreign investigators' access to genomic data. Furthermore, the NIH “does not concur with OIG's finding and corresponding recommendation regarding additional internal controls specific to foreign [researcher]s…” The report ends with the Office of Inspector General's reply to the NIH, stating that despite the NIH's responses, “further response by NIH is merited that addresses the risks posed by foreign [researcher]s.”

Context

Testimony submitted on March 10, 2017, by Agent Edward H. You of the Weapons of Mass Destruction Directorate of the Federal Bureau of Investigation indicates that intellectual property theft through research collaborations with China poses a theoretical threat of marginalizing the United States pharmaceutical industry and may lead to deficiencies in innovation among biological research entities in the United States. According to Agent You, “the near term benefits for US entities are realized through acquisition of data to support disease research, health diagnostics, genealogy studies, and personal health information. However, the long-term implications based on China’s potential access to the same data (usage contracts notwithstanding) have not been assessed.” Separately, in a recent cyberattack, individuals working on behalf of a branch of the Iranian government's armed services stole intellectual property from universities across the world, including in the United States. The attack gave impetus to concerns that some countries might attempt to obtain and use US data for interests that conflict with domestic policy.

Policies regarding the sharing of genomic data differ by country, and the use of cloud computing to store and analyze genomic data could expose the data to illegal access by bad actors. Global sharing of genomic data for research purposes also remains controversial among some segments of the public. In an August 2018 letter, Dr. Francis Collins, Director of the NIH, states that the “NIH is aware that some foreign entities have mounted systematic programs to influence NIH researchers and peer reviewers and to take advantage of the long tradition of trust, fairness, and excellence of NIH-supported research activities. This kind of inappropriate influence is not limited to biomedical research; it has been a significant issue for defense and energy research for some time.” In October 2018, the NIH replied to inquiries from Senator Charles Grassley (R-IA) about NIH policies for protecting taxpayer-funded research from foreign influence. In February 2019, a group reported that the NIH referred twelve cases of noncompliance with NIH policies, including failure to report foreign affiliations on funding applications, to federal investigators. Senator Grassley stated that “these [federally-funded] projects can produce important breakthroughs for patients and industry, keeping America at the cutting edge. I intend to continue scrutinizing this area so taxpayers get their money’s worth when funding this research and foreign actors can’t pilfer the good work done by legitimate researchers[.]”

The NIH funds both domestic and international institutions, and it highlights the importance of balancing international collaboration against the risk of intellectual property theft. One form of intellectual property theft, the export of scientific research data from the United States to foreign labs, presents a significant financial burden for the United States; the annual cost of intellectual property theft by China is estimated to be between $225 billion and $600 billion. A December 2018 report by an NIH Working Group states that, “unfortunately, some foreign governments have initiated systematic programs to unduly influence and capitalize on U.S.-conducted research, including that funded by NIH.” Thus, the threat that intellectual property theft poses to national security, the high financial costs of theft, the recent interest in improving the United States biodefense infrastructure, and the increased availability of genomic data collectively led to the OIG audit of NIH policies regarding data access by foreign investigators.

The Science

Science Synopsis

Genetic material, composed of sequences of DNA, is unique to each individual and can be analyzed to reveal sensitive information, such as genetic susceptibility to certain diseases, including some types of cancer and sickle cell anemia. Genetics is the area of science concerned with the role of genes in inheritance, and it usually focuses on studying a few genes at a time. The National Human Genome Research Institute defines genomics as the “study of all of a person's genes (the genome), including interactions of those genes with each other and with the person's environment.” Together, genetics and genomics contribute greatly to our understanding of human health. However, concerns about how genetic data (the product of analyzing genetic material) might be used by some entities have led to the passage of laws such as the Genetic Information Nondiscrimination Act of 2008, which protects the public from having their genetic data used against them in employment or health insurance settings.

To accelerate biomedical research, the NIH promotes the sharing of both human and non-human genetic data. According to the NIH, data sharing facilitates reproducibility and improves statistical power. The National Human Genome Research Institute, within the NIH, states that “when conducting genomics research, two essential values of science research need to be balanced — the need to share data broadly to maximize its utility for ongoing scientific exploration, and the need to protect research participants' privacy.”

The NIH houses several distinct repositories that include sensitive human data, including the Database of Genotypes and Phenotypes, The Cancer Genome Atlas, and the Sequence Read Archive. The data housed in these repositories include genomic data (i.e., data related to the sequencing of the human genome). Some of these data are openly accessible to researchers, while others are under controlled access; the level of access depends on the nature of the data. Applications for controlled-access genomic data must be reviewed by an NIH Data Access Committee. When submitting a grant proposal to the NIH, scientists are required to submit a “Data Sharing Plan” that details either how the generated data will be shared or the barriers to sharing them. Several concerns about improperly de-identified genomic data have been raised, including instances in which personally identifying information was recovered from supposedly anonymous databases.

Scientific Assumptions

  • Data that the NIH maintains can be weaponized (p. 5): There is a considerable body of literature detailing the potential privacy breaches related to genomic data. Furthermore, the NIH houses sequencing data for pathogens such as bacteria and viruses. Studying the genomes of some pathogens (e.g., the avian flu) can fall under the category of dual-use research, that is, research that has the potential to improve human health yet, in the hands of a rogue agent, could harm the public.

The Debate

Scientific Controversies / Uncertainties

How to properly protect genomic data remains an active area of research. Even “de-identified” genomic datasets remain at risk of being traced back to the individuals who contributed them. According to a 2016 article, “Beyond Our Borders? Public Resistance to Global Genomic Data Sharing,” global sharing of genomic data “enables the best science and ultimately the greatest contributions to human well-being.” The article’s authors further argue that international collaboration might be the only feasible way of collecting enough data on rare diseases. Yet some people in the United States remain apprehensive about global genomic data sharing.

Endorsements & Opposition

  • Senator Charles Grassley (R-IA), Press Release, May 7, 2019: “The inspector general’s finding that NIH didn’t consider risks posed by foreign individuals when allowing access to sensitive information is alarming,” Grassley said. “National security is the primary concern of the federal government, and that should inform every decision at every department and agency. I appreciate the inspector general’s candid assessment. Only by bringing light to these issues will there ever be accountability. Foreign governments and non-state actors are attempting to exploit our weaknesses at every turn. NIH needs to be more vigilant against this very real threat and I intend to follow-up on these and related issues.”

Potential Impacts

Senator Charles Grassley (R-IA) states that national security is the “primary concern of the federal government.” He further argues that the NIH needs to be “more vigilant.” If the recommendations of the OIG report are followed, NIH internal policies on how genomic data are shared could change, especially with respect to access by foreign researchers. As outlined in the report, “NIH could strengthen its controls by developing a security framework, conducting a risk assessment, and implementing additional appropriate security controls designed to safeguard sensitive data. We also recommend that NIH develop and implement mechanisms to ensure data security policies keep current with emerging threats.”