Deep sequencing of 10,000 human genomes

05-10-16, 11:57

We report on the sequencing of 10,545 human genomes at 30×–40× coverage with an emphasis on quality metrics and novel variant and sequence discovery. We find that 84% of an individual human genome can be sequenced confidently. This high-confidence region includes 91.5% of exon sequence and 95.2% of known pathogenic variant positions. We present the distribution of over 150 million single-nucleotide variants in the coding and noncoding genome. Each newly sequenced genome contributes an average of 8,579 novel variants. In addition, each genome carries on average 0.7 Mb of sequence that is not found in the main build of the hg38 reference genome. The density of this catalog of variation allowed us to construct high-resolution profiles that define genomic sites that are highly intolerant of genetic variation. These results indicate that the data generated by deep genome sequencing is of the quality necessary for clinical use.
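For readers unfamiliar with the "30×–40× coverage" figure, here is a rough sketch of what fold coverage means. It assumes an idealized Lander–Waterman model in which read depth at each base is Poisson-distributed with mean equal to the fold coverage, plus an approximate genome size of 3.1 Gb. Both are assumptions for illustration, not the authors' pipeline. The names `sequenced_bases` and `fraction_below_depth` are hypothetical helpers.

```python
import math

# Idealized Lander-Waterman model (assumption): read starts are uniform and
# independent, so per-base depth is approximately Poisson with mean c,
# where c is the fold coverage. Not the method used in the paper.

GENOME_SIZE_BP = 3.1e9  # approximate haploid human genome size (assumption)


def sequenced_bases(coverage: float, genome_size: float = GENOME_SIZE_BP) -> float:
    """Total bases that must be sequenced to reach a given mean fold coverage."""
    return coverage * genome_size


def fraction_below_depth(coverage: float, min_depth: int) -> float:
    """P(depth < min_depth) when depth ~ Poisson(coverage)."""
    term = math.exp(-coverage)  # P(depth == 0)
    total = 0.0
    for k in range(min_depth):
        total += term
        term *= coverage / (k + 1)  # P(depth == k+1) from P(depth == k)
    return total


for c in (30, 40):
    gb = sequenced_bases(c) / 1e9
    low = fraction_below_depth(c, 10)
    print(f"{c}x: ~{gb:.0f} Gb sequenced, {low:.2e} of bases below 10x under the model")
```

Under this idealization almost no bases fall below 10× depth at 30×, so the finding that only 84% of a genome is sequenced confidently likely reflects things the simple model ignores, such as repetitive or hard-to-map regions and uneven coverage.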

The goal of clinical use of the genome requires standards for sequencing, analysis, and interpretation. Our work specifically addresses the first two steps: sequencing and sequence analysis. The performance of the platform, implemented in full production mode, improves on recent benchmarks for the accurate interpretation of next-generation DNA sequencing in the clinical setting (22 (http://m.pnas.org/content/early/2016/10/03/1613365113.full#ref-22), 28 (http://m.pnas.org/content/early/2016/10/03/1613365113.full#ref-28), 29 (http://m.pnas.org/content/early/2016/10/03/1613365113.full#ref-29)). This is needed for laboratory standards, regulatory purposes, and clinical diagnostics and research. The third step, interpretation, remains a major issue given the many types of genetic evidence that laboratories consider. Initiatives such as ClinVar and policies and guidelines (10 (http://m.pnas.org/content/early/2016/10/03/1613365113.full#ref-10), 30 (http://m.pnas.org/content/early/2016/10/03/1613365113.full#ref-30)) set standards for clinical interpretation.
This report also extends prior efforts at genome and exome sequencing by detailing the distribution of human variation in the noncoding genome. The amount of data supports the discovery of sites in the genome that are intolerant to variation. The 10,545 genomes provide estimates of the rate of discovery of new SNVs and complement the human genome by more than 3 Mb through the identification of nonreference and putative human-like sequences. These data anticipate the relentless accumulation of rare variants and the scale of observable mutagenesis of the human genome.

we ain't seen nothing yet
once research gets in full swing, it will be hard to digest all the news