, 40 (Database issue), D33-7

The International Nucleotide Sequence Database Collaboration

Ilene Karsch-Mizrachi et al. Nucleic Acids Res.


The members of the International Nucleotide Sequence Database Collaboration (INSDC) set out to capture, preserve and present globally comprehensive public domain nucleotide sequence information. The work of the long-standing collaboration includes the provision of data formats, annotation conventions and routine global data exchange. Among the many developments to INSDC resources in 2011 are the newly launched BioProject database and improved handling of assembly information. In this article, we outline INSDC services and update the reader on developments in 2011.


Figure 1.
Sample BioProject records, shown from the NCBI website, for (A) a genome sequencing project and (B) an epigenomic project. Linkages are created from the BioProject record to the resources containing data and from the BioProject record to other BioProject records that are part of the same initiative or contain related data.
Figure 2.
(A) Cumulative base pairs in INSDC over time, excluding the Trace Archive (raw data from capillary sequencing platforms). (B) Base pairs in INSDC over time since 1980, broken down into selected data components. Cumulative data volume in base pairs broken down into assembled sequence (whole genome shotgun methods and others) and raw next-generation-sequence data.
Figure 3.
Cumulative growth in the number of sequences included in the traditional INSDC sequence archives over time. Bulk sequence data includes non-WGS bulk submission types, i.e. EST, GSS, Patent and Transcriptome Shotgun Assembly (TSA). WGS includes the number of sequence overlap contigs. Non-bulk data is the remainder.
Figure 4.
Growth in genomes. The layered chart shows the number of new species with genomes entered into INSDC databases over time by taxonomic group. The 2011 time point includes data released in the first 9 months.


