Bioinformatics Seminar, Tuesday, Sept 18 @ 4:30PM

Date: Tuesday, September 18, 2012
Time: 4:30 PM - 5:30 PM
Place: LILY G126

Speaker: Michael Schatz; Cold Spring Harbor Laboratory, Cold Spring Harbor, NY

Title: "De novo assembly of complex genomes"


Emerging third-generation single-molecule sequencing instruments can generate much longer sequences than prior methods, with the potential to dramatically improve genome and transcriptome assembly for complex genomes. However, the high error rate of the sequence reads makes their use in de novo assembly challenging and has limited them to specialized applications. To address these limitations, we introduced a novel sequence correction algorithm and assembly strategy that uses shorter, high-identity sequences to correct the errors inherent in long, single-molecule sequences. We demonstrate the utility of this approach on PacBio RS reads of phage, prokaryotic, and eukaryotic whole genomes, including the de novo assembly of yeast (Saccharomyces cerevisiae) and the novel genome of the parrot Melopsittacus undulatus, as well as on RNA-seq reads of the corn (Zea mays) transcriptome. Our approach achieves over 99.9% read correction accuracy and produces substantially better assemblies than any other sequencing strategy currently available: in many cases, doubling the median contig size relative to high-coverage, second-generation assemblies.

Associated Reading:
S. Koren, M.C. Schatz, et al. 2012. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nature Biotechnology. doi:10.1038/nbt.2280.

