The Need
Charles Danko’s lab was interested in using primary data from dbGaP, which holds the data and results from studies that have investigated the interaction of genotype and phenotype in humans. As part of its breast cancer research, the Danko lab wanted to use the data to investigate how frequently changes occur in regions that control gene transcription.
The Challenge
Using Restricted Access Data (RAD) held in dbGaP requires meeting stringent security requirements. To protect the privacy and intent of research participants, data access is restricted to scientific investigators pursuing research questions consistent with the informed consent agreements provided by individual research participants. The NIH also provides a set of data security best practices that researchers must follow in order to receive access.
The Solution
The Cornell Restricted Access Data Center (CRADC) provides secure access to restricted access and confidential data of all kinds, and is often the first stop for researchers who are requesting restricted access data.
For Danko and other life scientists doing data-intensive research, the tools needed to analyze the data require large amounts of computation, storage and memory, and they run on Linux operating systems. CRADC is Windows-based and therefore was not a viable solution.
To fulfill the security requirements of the dbGaP data, the Danko lab set up a secure high-performance computer at the Biotechnology Resource Center (BRC) Bioinformatics Facility. Working with BRC staff, the Office of Sponsored Programs (OSP) and the Institutional Review Board (IRB), they also created a security plan that documents the policies, operational procedures and technical configuration of their system.
With the growing need for high-performance computing support for restricted data, the OSP, BRC and CRADC are coordinating to improve the way that researchers are guided to the RAD resources they need.