Reference-guided assembly of four diverse Arabidopsis thaliana genomes
- PMID: 21646520
- PMCID: PMC3121819
- DOI: 10.1073/pnas.1107739108
Abstract
We present whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago. Using a newly developed reference-guided approach, we assembled large contigs from 9 to 42 Gb of Illumina short-read data from the Landsberg erecta (Ler-1), C24, Bur-0, and Kro-0 strains, which have been sequenced as part of the 1,001 Genomes Project for this species. Using alignments against the reference sequence, we first reduced the complexity of the de novo assembly and later integrated reads without similarity to the reference sequence. As an example, half of the noncentromeric C24 genome was covered by scaffolds that are longer than 260 kb, with a maximum of 2.2 Mb. Moreover, over 96% of the reference genome was covered by the reference-guided assembly, compared with only 87% with a complete de novo assembly. Comparisons with 2 Mb of dideoxy sequence reveal that the per-base error rate of the reference-guided assemblies was below 1 in 10,000. Our assemblies provide a detailed, genomewide picture of large-scale differences between A. thaliana individuals, most of which are difficult to access with alignment-consensus methods only. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone. Genome assemblies, raw reads, and further information are accessible through http://1001genomes.org/projects/assemblies.html.
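The scaffold statistic quoted in the abstract (half of the noncentromeric C24 genome covered by scaffolds longer than 260 kb) is an N50-style measure taken relative to genome size, often called NG50. The following is a minimal sketch of how such a value can be computed; it is not the authors' code, and the function name `ng50` and the toy scaffold lengths are hypothetical and chosen only for illustration.

```python
def ng50(scaffold_lengths, genome_size):
    """Return the length L such that scaffolds of length >= L
    together cover at least half of genome_size (all values in bp)."""
    covered = 0
    # Walk from the longest scaffold down, accumulating covered bases.
    for length in sorted(scaffold_lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return None  # the scaffolds cover less than half of the genome


# Hypothetical scaffold lengths in bp (NOT data from the paper).
toy_scaffolds = [500_000, 300_000, 260_000, 100_000, 40_000]
toy_genome_size = 2_000_000

print(ng50(toy_scaffolds, toy_genome_size))
```

In this toy input, the three longest scaffolds are the first to reach half of the assumed genome size, so the function reports the length of the third one. Measuring against genome size rather than total assembly length keeps the statistic comparable between assemblies that recover different fractions of the genome.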
Conflict of interest statement
The authors declare no conflict of interest.