dbGaP

Guidance on access and use of dbGaP data

The NCBI Database of Genotypes and Phenotypes (dbGaP) archives the results of studies that have investigated the interaction of genotype and phenotype and distributes these results to investigators for secondary study. The types of data available include phenotype data, association (GWAS) data, summary-level analysis data, SRA (Sequence Read Archive) data, reference alignment (BAM) data, VCF (Variant Call Format) data, expression data, imputed genotype data, image data, etc.

Investigators who are permanent employees of Princeton and are eligible to serve as a Principal Investigator (PI) at Princeton may request access to dbGaP datasets for research purposes.

Note that under new NIH security requirements, effective January 25, 2025, Approved Users of NIH controlled-access data must keep downloaded data on institutional IT systems and third-party computing infrastructure that comply with the cybersecurity standards in NIST SP 800-171, “Protecting Controlled Unclassified Information in Nonfederal Systems and Organizations”. Currently, Citadel is the only system at Princeton that meets these requirements, so all downloaded data should be stored and processed on Citadel.

These security requirements also apply to other NIH controlled-access repositories. A list of all repositories to which these requirements apply is available from NIH.

Process for requesting access to dbGaP datasets
  1. The investigator creates a project and follows the steps to complete and submit the online application through the dbGaP Authorized Access System as a “Principal Investigator” using their eRA Commons account.
    • Investigators who do not already have an eRA Commons account should work with their grant manager to set one up.
  2. The application is routed to Princeton's Institutional Signing Official in the Office of Research and Project Administration (ORPA), who approves the submission once steps 3 and 4 are complete.
    • If the appropriate Signing Official is not listed in the application, they will not be able to view the Data Access Request.
  3. To initiate ORPA’s review of the application, the investigator (or their representative) submits a request for a Data Use Agreement (DUA) in ERA. 
    • In addition to the information required for the DUA agreement type, please provide the following in the ERA submission:
      • The dataset being requested, including version numbers
      • The dbGaP project number
      • The storage location where the data will be saved and the IT person who oversees that location
  4. ORPA reviews the submitted information and either approves the request or contacts the investigator if additional information or steps are needed.
  5. Once ORPA approves, NIH reviews the request and authorizes access to the data. The investigator may then access the dataset through the dbGaP Authorized Access System.
Project renewals and closeouts

Projects nearing expiration in dbGaP must be either renewed or closed. 

For renewals, please follow the access request process described above, but instead of submitting a new DUA request in ERA, submit an amendment to the original DUA.

For closeouts, please initiate the closeout process in the dbGaP system. Also submit an amendment to the original DUA in ERA that includes documentation confirming the data have been destroyed.
Need help?

Contact Liz Powell and Aaron Collie.