Welcome to Hail-Annotate’s documentation!
Hail-Annotate is a Python-based guide that helps users annotate genetic variant information with allele frequencies from the GnomAD population genomics database. This project is not affiliated with the GnomAD or Hail teams, although users can familiarize themselves with GnomAD via the project’s official website.
The following pages walk users through a small project that takes a variant call format (VCF) file and annotates its variants with their corresponding allele frequencies in the GnomAD dataset.
This guide requires the Hail genomics library and Google Cloud infrastructure. Please refer to the “Getting Started” section for full details.
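To make the annotation task described above concrete, here is a minimal conceptual sketch in plain Python. It is not Hail code and not the pipeline this guide builds. It only illustrates the underlying idea: each variant is keyed by chromosome, position, reference allele, and alternate allele, and its allele frequency is looked up in a reference table. The function names, the `reference_af` dictionary, and the frequency value are all hypothetical and made up for illustration.

```python
# Conceptual sketch only: plain Python, not Hail.
# All names and values below are hypothetical, chosen for illustration.
from typing import Dict, Iterable, List, Optional, Tuple

# A variant is identified by (chromosome, position, ref allele, alt allele).
VariantKey = Tuple[str, int, str, str]


def parse_vcf_line(line: str) -> List[VariantKey]:
    """Return one key per alternate allele on a VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    # Multi-allelic sites list several ALT alleles separated by commas.
    return [(chrom, int(pos), ref, allele) for allele in alt.split(",")]


def annotate(
    vcf_lines: Iterable[str],
    reference_af: Dict[VariantKey, float],
) -> List[Tuple[VariantKey, Optional[float]]]:
    """Pair each variant with its reference allele frequency, or None if absent."""
    annotated = []
    for line in vcf_lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        for key in parse_vcf_line(line):
            annotated.append((key, reference_af.get(key)))
    return annotated


# Made-up reference frequencies standing in for a population database.
reference_af = {("1", 1000, "A", "G"): 0.25}

vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tG\t.\tPASS\t.",
    "1\t2000\t.\tC\tT,G\t.\tPASS\t.",
]

result = annotate(vcf_lines, reference_af)
for key, af in result:
    print(key, af)
```

A dictionary lookup like this only works when the reference data fits in memory. The guide relies on Hail and Google Cloud because the real reference dataset and input VCF files can be far too large for that approach.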
Project Motivation
I often need to annotate VCF files with allele frequency information from GnomAD. For my annotation tasks, I initially used locally downloaded copies of the GnomAD v2.1.1 exome and genome VCF files, which I stored on an external hard drive. Writing Hail code to accomplish this was fairly straightforward, but annotating large VCF files was slow and occasionally exceeded the available space on my external hard drive. For this reason, I developed a simple pipeline that uses Google Cloud to run Hail annotation tasks. While I think Hail is very well documented, I struggled to write Hail code and deploy it on Google Cloud. I created this repository in the hope of guiding others through this process.
Acknowledgements
Many thanks to the Hail team, especially Dan King, whose own Hail Cloud tutorials were of considerable help as I developed this project.
Note
This project is under active development and is not associated with the Broad Institute, the Hail team, or the GnomAD project. This work is something that I have undertaken in my free time and it is not affiliated with any of my past or present employers.
Contents: