Complete Genomics and the Personal Genome Project publish and make freely available through open access the largest set of whole personal genome sequences with experimental haplotype phasing
Publish Date: 2016-10-11

October 11, 2016, Mountain View, California - Complete Genomics, Inc., a BGI company, and the Personal Genome Project (PGP) announced today the publication of over 100 individual whole genome sequences with experimental haplotype phasing. This data set was generated using Long Fragment Read (LFR) technology and represents the largest set of high-coverage whole human genome assemblies with comprehensive, experimentally determined haplotypes.

“The vast majority of genomic data that has been generated to date is without experimentally derived haplotypes,” explained Dr. Brock Peters, Senior Director of Research and project leader for Complete Genomics. “This represents a very unique set of data that is freely available for anyone to use through open access data publication.” A total of 184 individuals, recruited by the PGP, took part in the project. Each individual consented to have their identity, genome, and phenotype data made freely and publicly available. Blood samples were collected by the PGP team and sent to Complete Genomics for DNA isolation, LFR library generation, and whole genome sequencing. Currently, 114 genome assemblies are available, with the remainder expected to be released in the next few months. As part of the release of this genomic data, Complete Genomics and the PGP also published a description and comprehensive analyses of the quality of the data in the journal GigaScience.

"In 2011, we made freely available a set of 69 whole human genome sequence assemblies which quickly became a highly utilized resource and benchmark for the genetics community,” stated Dr. Radoje Drmanac, Chief Scientific Officer of Complete Genomics.  “We are proud to continue the tradition by releasing this set of experimentally haplotyped whole human genome sequence assemblies.  This represents the largest and most accurate set of human haplotypes currently available." The published variant and phasing data can be found at GigaScience, an open access data journal.  Corresponding reads and mappings for all genomes will be made available through the Database of Genotypes and Phenotypes (dbGaP).  “Combining Complete Genomics’ advanced WGS with the PGP’s informed consent policy which allows for unrestricted access and GigaScience’s open access data publication method enables the full release of a large data set with exceptional scientific value.  We expect it will be used by many researchers around the world”, explained Dr. George Church, founder of the PGP.  

The technology used to generate this data set, LFR, was previously described by Complete Genomics in a 2012 Nature publication. In that publication, genome quality and completeness were shown to be very high. In the GigaScience publication, LFR was again shown to be highly accurate and complete. Each sample was sequenced to 100X coverage, allowing most variants to be detected with high confidence. This in turn allowed over 98% of heterozygous variants to be placed into long contigs approaching 1 Mb in length. On average, over 85% of haplotypes contained no errors, with the majority of the remaining 15% having only a single phasing error.


GigaScience paper:


