De novo Sequencing

De novo Assembly

The main work of our group is developing de novo assemblers, especially for short reads. We recently released SOAPdenovo, an assembler that has successfully assembled several large genomes, including cucumber, Asian individuals, and the giant panda. As current sequencing technologies produce longer reads and read pairs with longer insert sizes, we are improving the software to accommodate them.
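SOAPdenovo, like many short-read assemblers, is built on a de Bruijn graph. The Python sketch below shows that general idea only and is not SOAPdenovo's implementation. Each read is split into k-mers, each k-mer links its (k-1)-prefix to its (k-1)-suffix, and maximal non-branching paths are spelled out as contigs. The function names and the value of k are hypothetical. Real assemblers also handle reverse complements, sequencing errors, repeats, and isolated cycles, all of which this sketch ignores.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return graph


def in_degrees(graph):
    """Count incoming edges for every node."""
    indeg = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1
    return indeg


def contigs(graph):
    """Spell out maximal non-branching paths (hypothetical helper, no error handling)."""
    indeg = in_degrees(graph)
    nodes = set(graph) | set(indeg)

    def one_in_one_out(n):
        return indeg[n] == 1 and len(graph.get(n, ())) == 1

    result = []
    for node in sorted(nodes):
        if one_in_one_out(node):
            continue  # interior node: covered by a walk that starts elsewhere
        for nxt in sorted(graph.get(node, ())):
            path = node + nxt[-1]
            # Extend while the path stays unambiguous.
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            result.append(path)
    return result
```

For example, `contigs(build_de_bruijn(["ACGTTG", "TTGCA"], 4))` joins the two overlapping reads into the single contig `"ACGTTGCA"`, while a branch in the graph (reads `"ACGT"` and `"ACGA"` with k = 3) splits the output into separate contigs at the branching node.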

Ongoing Projects:

1) Large plant genomes with plenty of repeats

2) Highly heterozygous diploid genomes

Big genomes & Assembly

Many organisms have very complex genomes, and assembling short sequencing reads into a complete and accurate reference genome is a major challenge in bioinformatics. Our group works on producing accurate de novo assemblies, at an acceptable computational cost, from the short reads generated by next-generation sequencers. The latest revolution in DNA sequencing has been brought about by the development of these automated sequencers, but their applications have so far seemed limited to resequencing and transcript discovery. Several algorithms have recently been proposed and implemented in different assemblers; building on them, we aim to demonstrate the feasibility of de novo assembly using Solexa reads. For highly repetitive genomes we apply a compromise strategy, in addition to handling sequencing errors and assembling reads that contain mismatches. We use tools developed by Dr. Zhu Hongmei's group to assemble novel genomes according to their different characteristics.
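The source does not say how sequencing errors are handled. One widely used approach for short reads is a k-mer frequency ("k-mer spectrum") check, shown here only as an illustration. A random base error usually creates k-mers that occur very rarely across the whole read set, so reads containing such k-mers are flagged as suspect. The function names, k, and the `min_count` threshold are hypothetical. A real pipeline would correct or trim reads rather than just flag them.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def flag_suspect_reads(reads, k, min_count):
    """Return indices of reads containing at least one k-mer seen fewer than min_count times."""
    counts = kmer_counts(reads, k)
    suspects = []
    for idx, read in enumerate(reads):
        kmers = (read[i:i + k] for i in range(len(read) - k + 1))
        if any(counts[km] < min_count for km in kmers):
            suspects.append(idx)
    return suspects
```

With three copies of `"ACGTAC"` and one read `"ACGAAC"` carrying a substitution, `flag_suspect_reads(reads, 3, 2)` flags only the last read (index 3), because its k-mers `CGA`, `GAA`, and `AAC` each appear once.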

Ongoing Projects:

1) The Cucumber Genome Initiative: assembly of the cucumber genome using a compromise strategy

2) Potato genome project: assembly of the potato genome

Metagenomics & Transcriptome Assembly

The group is interested in transcriptome and metagenome assembly. The main difficulty in transcriptome studies is the assembly problem. Compared with whole-genome assembly, however, transcript assembly is much easier, because most repeats do not lie in protein-coding regions or UTRs, and even gene duplications do not cause serious problems. Through collaborations with other groups at BGI, we have been able to assemble RNA-Seq data. At present we focus on transcriptome assembly; building on this work, we will try to apply a similar method to metagenomics.

Ongoing Projects:

1) Transcriptome assembly of callus from indica rice

2) Building an overlap-based short-read assembler
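The second project above mentions an overlap-based assembler. The core step of that approach is finding suffix-prefix overlaps between reads. The Python sketch below pairs an exact overlap finder with a simple greedy merge: it repeatedly joins the two reads that share the longest overlap. It is an illustrative assumption, not the group's design. The function names and `min_len` are hypothetical, and real overlap-layout-consensus assemblers use indexes, tolerate mismatches, and remove contained reads.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of a matching suffix
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start  # earliest start gives the longest overlap
        start += 1


def greedy_assemble(reads, min_len):
    """Repeatedly merge the pair of reads with the longest overlap until none remain."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads
```

For instance, `greedy_assemble(["ACGTTG", "TTGCA", "GCATT"], 3)` returns `["ACGTTGCATT"]`. The all-pairs comparison is quadratic in the number of reads, which is one reason production assemblers index reads instead.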

Annotation

Our group focuses on genome annotation, which includes identifying structural and functional elements and integrating and displaying this information at the genomic level. The work consists of four main parts:

(1) Identify protein-coding genes with an automatic pipeline that combines evidence from different methods, such as de novo prediction, cDNA/EST mapping, and protein homology alignment.

(2) Assign functional descriptions to genes by searching known databases (such as NT/NR and SwissProt/TrEMBL), predict protein domains, and infer ontology terms.

(3) Identify ncRNA genes such as rRNA, tRNA, miRNA, and other RNA genes by de novo prediction or homology searching.

(4) Identify repeat sequences: tandem repeats with TRF and transposable elements with RepeatMasker.

Other analyses, such as prediction of regulatory elements and pseudogenes, are under development. All these methods and software are included in the GACP (Genome Annotation and Comparison Pipeline) project, which has been updated continually since 2006 and reached version 7.0 in June 2009.
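Tandem repeats are identified with TRF, which uses a probabilistic model that tolerates mismatches and indels. The Python sketch below is a much simpler, exact-match illustration of what a tandem repeat is, and it is not TRF's algorithm. For each period, it looks for a unit that repeats back to back at least `min_copies` times. Units that are themselves repeats of a shorter unit (such as `AA` or `ATAT`) are skipped, so that each run is reported at its basic period. The function names and default thresholds are hypothetical.

```python
def is_primitive(unit):
    """True if unit is not a repetition of a shorter string ('AT' -> True, 'ATAT' -> False)."""
    return (unit + unit).find(unit, 1) == len(unit)


def exact_tandem_repeats(seq, max_period=6, min_copies=3):
    """Report (start, period, copies) for runs of an exactly repeated primitive unit."""
    hits = []
    for period in range(1, max_period + 1):
        i = 0
        while i + period <= len(seq):
            unit = seq[i:i + period]
            if not is_primitive(unit):
                i += 1  # e.g. 'AA' is left to the period-1 scan
                continue
            copies = 1
            # Count consecutive exact copies of unit starting at position i.
            while seq[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, period, copies))
                i += copies * period  # skip past the reported run
            else:
                i += 1
    return hits
```

For example, `exact_tandem_repeats("GGCATATATATCC")` reports the dinucleotide run `ATATATAT` as `(3, 2, 4)`: it starts at position 3, has period 2, and contains 4 copies.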

Projects and brief description:

We have finished annotation projects for a broad range of species, including plants, animals, bacteria, and fungi. In 2004, we published a silkworm paper in Science, in which gene-prediction parameters were specially trained for the species. In 2005, we published a rice paper in PLoS that used a post-filtering method to remove TE contamination from gene models. In 2007, we published a bacterial paper in the Journal of Bacteriology that presented a comprehensive annotation and analysis. Projects undertaken but not yet published include improved annotations of rice and silkworm, and annotation of a set of microbes and BACs. With the arrival of the large-scale sequencing era, many genomes will be sequenced and will require fine annotation in the near future. By the end of 2008, the cucumber, giant panda, tobacco, and a new Drosophila genome will be fully sequenced and annotated.

Copyright © 1999-2010 BGI