PeerJ. 2018 Apr 2;6:e4588.
doi: 10.7717/peerj.4588. eCollection 2018.

PlasmidSeeker: Identification of Known Plasmids From Bacterial Whole Genome Sequencing Reads

Free PMC article

Märt Roosaare et al. PeerJ.

Abstract

Background: Plasmids play an important role in the dissemination of antibiotic resistance, making their detection an important task. Using whole genome sequencing (WGS), it is possible to capture both bacterial and plasmid sequence data, but short read lengths make plasmid detection a complex problem.

Results: We developed PlasmidSeeker, a tool that detects plasmids in bacterial WGS data without read assembly. The PlasmidSeeker algorithm is based on k-mers and uses k-mer abundance to distinguish plasmid from bacterial sequences. On a set of simulated and real bacterial WGS samples, PlasmidSeeker achieved 100% sensitivity and 99.98% specificity.
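To make the k-mer idea concrete, here is a minimal sketch of assembly-free plasmid screening. It is not PlasmidSeeker's code: the function names (`kmers`, `count_read_kmers`, `plasmid_summary`) are hypothetical, only the forward strand is counted, and using the median k-mer count as an abundance estimate is an assumption. For one reference plasmid it reports F, the fraction of its plasmid-specific k-mers seen in the reads, and the ratio of their median abundance to the chromosomal median. A plasmid present at higher copy number than the chromosome shows up as a ratio above 1.

```python
import random
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only, to keep the sketch short)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_read_kmers(reads, k):
    """Count how many times each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def plasmid_summary(counts, plasmid, chromosome, k):
    """Return (F, abundance ratio) for one reference plasmid.

    F: fraction of plasmid-specific k-mers seen at least once in the reads.
    ratio: median abundance of those k-mers divided by the median abundance
    of chromosomal k-mers (hypothetical copy-number proxy).
    """
    chrom_kmers = set(kmers(chromosome, k))
    # Assumption: drop plasmid k-mers that also occur in the chromosome,
    # so the two abundance estimates do not mix.
    specific = set(kmers(plasmid, k)) - chrom_kmers
    found = [counts[km] for km in specific if counts[km] > 0]
    f = len(found) / len(specific) if specific else 0.0
    chrom_cov = median(counts[km] for km in chrom_kmers)
    ratio = median(found) / chrom_cov if found and chrom_cov else 0.0
    return f, ratio


# Toy data: whole sequences stand in for reads (2x chromosome, 6x plasmid).
rng = random.Random(7)
chromosome = "".join(rng.choice("ACGT") for _ in range(300))
plasmid = "".join(rng.choice("ACGT") for _ in range(60))
k = 15
with_plasmid = count_read_kmers([chromosome] * 2 + [plasmid] * 6, k)
without_plasmid = count_read_kmers([chromosome] * 2, k)
print(plasmid_summary(with_plasmid, plasmid, chromosome, k))
print(plasmid_summary(without_plasmid, plasmid, chromosome, k))
```

Real reads are short, overlapping and error-prone, so real k-mer counts are noisier than in this toy. That noise is why the figures below treat F as a thresholded score rather than an exact 1.0.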

Conclusion: PlasmidSeeker enables quick detection of known plasmids and complements existing tools that assemble plasmids de novo. The PlasmidSeeker source code is stored on GitHub: https://github.com/bioinfo-ut/PlasmidSeeker.

Keywords: Plasmid; Unassembled; Whole genome sequencing; k-mer.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1. Fraction of unique k-mers in bacteria and unique chromosomal k-mers shared with the plasmid database.
Dashed lines indicate the fraction of unique k-mers in the assembled full genome of the bacterium (number of unique k-mers divided by all k-mers of the bacterium), and solid lines indicate the fraction of unique chromosomal k-mers that are also present in the plasmid database.
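The two quantities plotted in Figure 1 can be sketched as follows. This assumes "unique" means a k-mer that occurs exactly once in the assembled genome, and that only the forward strand is counted. Both are simplifications, and the helper names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Occurrences of each k-mer in seq (forward strand only)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def unique_fraction(genome, k):
    """Number of k-mers seen exactly once divided by all k-mers of the genome."""
    counts = kmer_counts(genome, k)
    total = sum(counts.values())
    unique = {km for km, n in counts.items() if n == 1}
    return len(unique) / total, unique


def shared_with_db(unique_kmers, plasmid_db, k):
    """Fraction of the unique chromosomal k-mers also present in any database plasmid."""
    db = set()
    for p in plasmid_db:
        db.update(p[i:i + k] for i in range(len(p) - k + 1))
    return len(unique_kmers & db) / len(unique_kmers) if unique_kmers else 0.0


frac, unique = unique_fraction("ACGTACGTTT", 4)  # ACGT occurs twice; 5 of 7 k-mers are unique
print(frac, shared_with_db(unique, ["TACGAA"], 4))
```

Both fractions depend on k. Shorter k-mers are more likely to repeat within the genome and to collide with the plasmid database, which is the trade-off the figure illustrates.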
Figure 2. Fraction of detected plasmid k-mers (F) is affected by the distance from the reference plasmid and k-mer length.
F decreases with increasing k-mer length, and the differences are more pronounced at larger distances. In all cases, increasing the distance decreased F. Name, size and bacterial host of the plasmids: pUM505, 123,322 bp, P. aeruginosa; pOSAK1, 3,306 bp, E. coli. Green lines show results for k = 16, orange lines for k = 24 and blue lines for k = 32. Distance from the reference plasmid is given in nucleotide substitutions per bp.
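The trend in Figure 2 follows from a simple fact: one substitution destroys every k-mer that overlaps it, so the number of lost k-mers per substitution grows with k. The sketch below reproduces the qualitative effect under stated assumptions. Substitutions are placed at evenly spaced positions (a hypothetical deterministic stand-in for real divergence), the mutated sequence itself plays the role of the sample rather than simulated reads, and the helper names are illustrative.

```python
import random


def kmer_set(seq, k):
    """All distinct k-mers of seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mutate_evenly(seq, distance):
    """Introduce one substitution every round(1 / distance) bp.

    Deterministic stand-in for divergence; distance is in substitutions per bp.
    """
    if distance <= 0:
        return seq
    step = max(1, round(1 / distance))
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}  # always changes the base
    bases = list(seq)
    for i in range(step // 2, len(bases), step):
        bases[i] = swap[bases[i]]
    return "".join(bases)


def fraction_detected(reference, sample, k):
    """F: fraction of the reference plasmid's k-mers present in the sample."""
    ref = kmer_set(reference, k)
    return len(ref & kmer_set(sample, k)) / len(ref)


rng = random.Random(42)
reference = "".join(rng.choice("ACGT") for _ in range(3000))
for distance in (0.0, 0.01, 0.02):
    sample = mutate_evenly(reference, distance)
    row = [round(fraction_detected(reference, sample, k), 3) for k in (16, 24, 32)]
    print(f"distance={distance}: F(k=16, 24, 32) = {row}")
```

With one substitution every 50 bp, roughly k/50 of the reference k-mers are lost. That makes F about 0.68 for k = 16 but only about 0.36 for k = 32, matching the direction of the published curves. The exact values here depend entirely on the toy setup.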
Figure 3. Sample coverage affects the fraction of detectable (frequency >1) chromosomal k-mers in simulated samples.
Each simulated sample was converted to a k-mer list, and all k-mers not present in the reference bacterium were discarded. The fraction of detectable k-mers was calculated by dividing the number of k-mers with frequency >1 by the total number of strain k-mers. The theoretical curve assumes a Poisson distribution, a read length equal to that of the simulated samples (80 bp) and an average error rate of 0.01/bp.
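The caption does not give the exact formula behind the theoretical curve, so the following is only a sketch of a common Poisson model. It assumes the per-k-mer coverage λ equals base coverage × (L − k + 1)/L × (1 − e)^k, where L is the read length, e the per-base error rate and (1 − e)^k the chance that a k-mer is error-free. A k-mer is "detectable" when its count exceeds 1, so the detectable fraction is P(X > 1) = 1 − e^(−λ)(1 + λ). The default k = 20 is an illustrative assumption, not a value stated in this figure.

```python
import math


def expected_kmer_coverage(base_cov, read_len, k, error_rate):
    """Assumed model: each read yields read_len - k + 1 k-mers, and a k-mer is
    error-free with probability (1 - error_rate) ** k."""
    return base_cov * (read_len - k + 1) / read_len * (1 - error_rate) ** k


def fraction_detectable(base_cov, read_len=80, k=20, error_rate=0.01):
    """P(X > 1) for X ~ Poisson(lam), i.e. 1 - P(X = 0) - P(X = 1)."""
    lam = expected_kmer_coverage(base_cov, read_len, k, error_rate)
    return 1 - math.exp(-lam) * (1 + lam)


for cov in (1, 2, 5, 10, 20):
    print(cov, round(fraction_detectable(cov)))
```

The model captures the shape in Figure 3: at low coverage many chromosomal k-mers are seen at most once and drop out, while above roughly 10x nearly all of them clear the frequency >1 cutoff.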
Figure 4. Threshold of the fraction of plasmid k-mers detected (F) affects the number of false positives in real and simulated samples.
The number of false-positive identifications decreased as the F threshold increased. At an F threshold of 95%, there were no false positives. No false negatives occurred at any threshold value, and there were no false positives for P. stuartii or C. callunae. Simulated samples are marked “sim”. Read lengths: simulated samples, 80 bp; C. callunae, 300 bp; C. freundii, 400 bp; P. stuartii, 202 bp.
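The threshold sweep in Figure 4 amounts to counting false positives and false negatives as the F cutoff changes. A minimal sketch follows. The `Call` records are hypothetical toy values, not data from the study, and treating a plasmid as reported when F is greater than or equal to the threshold is an assumption about the comparison.

```python
from dataclasses import dataclass


@dataclass
class Call:
    plasmid: str
    f: float             # fraction of the reference plasmid's k-mers found in the sample
    truly_present: bool  # ground truth (known for simulated samples)


def confusion_at(calls, threshold):
    """Return (false positives, false negatives) when reporting plasmids with f >= threshold."""
    reported = [c for c in calls if c.f >= threshold]
    fp = sum(not c.truly_present for c in reported)
    fn = sum(c.truly_present and c.f < threshold for c in calls)
    return fp, fn


# Hypothetical toy calls, for illustration only.
calls = [
    Call("pA", 0.99, True),
    Call("pB", 0.97, True),
    Call("pC", 0.90, False),  # related plasmid sharing many k-mers
    Call("pD", 0.60, False),
]
for t in (0.80, 0.95, 0.98):
    print(t, confusion_at(calls, t))
```

Raising the threshold removes partially matching relatives first. If it is set too high, true plasmids with a few missing k-mers start to drop out, which is the trade-off behind choosing a single F cutoff.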
Figure 5. Clustering algorithm used by PlasmidSeeker.
First, plasmids are sorted by length, starting with the longest (1). Clusters are formed based on the overlap coefficient C (the number of shared k-mers relative to the k-mer count of the smaller reference plasmid). In step 1, the longest reference plasmid in the results (its k-mer list) is compared to all other detected reference plasmids, and all plasmids with C exceeding a threshold are recruited to Cluster 1. In step 2, reference plasmids already placed in Cluster 1 are excluded, and the longest remaining plasmid seeds the next cluster. The process continues until all plasmids are assessed. Numbers denote different plasmids (1 is the longest, 8 the shortest) and colors indicate shared k-mers.
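The greedy procedure in Figure 5 can be sketched as follows. The overlap coefficient is |A ∩ B| / min(|A|, |B|), and the longest unclustered plasmid seeds each new cluster. Representing each plasmid as a `(length, k-mer set)` pair and the function names are assumptions made for illustration. The threshold value is not given in the caption, so the values used below are arbitrary.

```python
def overlap_coefficient(a, b):
    """Shared k-mers relative to the smaller k-mer set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def cluster_plasmids(plasmids, threshold):
    """Greedy clustering; plasmids maps name -> (length, set of k-mers)."""
    remaining = sorted(plasmids, key=lambda name: plasmids[name][0], reverse=True)
    clusters = []
    while remaining:
        seed = remaining.pop(0)  # longest plasmid not yet clustered
        seed_kmers = plasmids[seed][1]
        members, still_left = [seed], []
        for name in remaining:
            if overlap_coefficient(seed_kmers, plasmids[name][1]) > threshold:
                members.append(name)
            else:
                still_left.append(name)
        remaining = still_left  # clustered plasmids are excluded from later steps
        clusters.append(members)
    return clusters


toy = {
    "p1": (100, {"a", "b", "c", "d"}),
    "p2": (80, {"a", "b", "c"}),
    "p3": (60, {"x", "y"}),
    "p4": (50, {"y", "z"}),
}
print(cluster_plasmids(toy, 0.5))
print(cluster_plasmids(toy, 0.4))
```

Because C is normalized by the smaller set, a short plasmid that is almost entirely contained in a longer one scores close to 1 and joins the longer plasmid's cluster, even though the two differ greatly in size.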

Grant support

This work was supported by the European Union through the European Regional Development Fund through Estonian Centre of Excellence in Genomics and Translational Medicine (project No. 2014-2020.4.01.15-0012) and by the Estonian Ministry of Education and Research (institutional grant IUT34-11). There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
