Genomics Proteomics Bioinformatics. 2017 Feb;15(1):14-18.
doi: 10.1016/j.gpb.2017.01.001. Epub 2017 Feb 2.

GSA: Genome Sequence Archive

Free PMC article

Yanqing Wang et al. Genomics Proteomics Bioinformatics.


With sequencing technologies rapidly advancing towards higher throughput and lower cost, sequence data are being generated at an unprecedented rate. To provide an efficient and easy-to-use platform for managing these large volumes of sequence data, we present the Genome Sequence Archive (GSA), a data repository for archiving raw sequence data. In compliance with the data standards and structures of the International Nucleotide Sequence Database Collaboration (INSDC), GSA adopts four data objects (BioProject, BioSample, Experiment, and Run) for data organization, accepts raw sequence reads produced by a variety of sequencing platforms, stores both sequence reads and metadata submitted from all over the world, and makes these data publicly available to scientific communities worldwide. In the era of big data, GSA is an important complement to the existing INSDC members, easing the growing burden of handling the sequence data deluge; it also takes on significant responsibility for global big data archiving and provides free, unrestricted access to all publicly available data in support of research activities throughout the world.

Keywords: Big data; GSA; Genome Sequence Archive; INSDC; Raw sequence data.
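
The four data objects named in the abstract can be pictured as a small set of linked records. The following is a minimal sketch only, not GSA's actual schema: the field names, the placeholder accession strings, and the way the objects reference one another (an Experiment pointing to one BioProject and one BioSample, a Run belonging to one Experiment) are assumptions made for illustration, loosely following common INSDC practice.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    # One sequencing run: the raw read files it produced.
    accession: str
    files: List[str] = field(default_factory=list)


@dataclass
class BioSample:
    # Description of the biological material that was sequenced.
    accession: str
    organism: str


@dataclass
class Experiment:
    # Assumed linkage: an Experiment refers to one BioProject and one BioSample.
    accession: str
    project_accession: str
    sample_accession: str
    platform: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class BioProject:
    # Top-level record describing the overall study.
    accession: str
    title: str
    samples: List[BioSample] = field(default_factory=list)
    experiments: List[Experiment] = field(default_factory=list)

    def run_accessions(self) -> List[str]:
        # Collect every Run accession reachable from this project.
        return [run.accession for exp in self.experiments for run in exp.runs]


# All accessions and values below are hypothetical placeholders,
# not real GSA accession numbers or prefixes.
project = BioProject(accession="PROJECT-1", title="Example study")
sample = BioSample(accession="SAMPLE-1", organism="Oryza sativa")
project.samples.append(sample)

experiment = Experiment(
    accession="EXPERIMENT-1",
    project_accession=project.accession,
    sample_accession=sample.accession,
    platform="hypothetical platform",
)
experiment.runs.append(Run("RUN-1", ["reads_1.fastq.gz", "reads_2.fastq.gz"]))
project.experiments.append(experiment)

print(project.run_accessions())
```

Keeping the read files on Run and the descriptive metadata on the other three objects mirrors the abstract's distinction between sequence reads and metadata, though how GSA stores them internally is not stated here.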


Figure 1. Data model in GSA. Prefixes of accession numbers for data objects, including BioProject, BioSample, Experiment, and Run, are indicated in red. Data objects Experiment and Run constitute China Read Archive.
Figure 2. Data statistics of GSA. A. Numbers of BioProjects and BioSamples in GSA. B. Numbers of Experiments and Runs, as well as file size in GSA. All statistics are based on data submissions ranging from December 2015 to December 2016.
Figure 3. Graphic illustration of data submissions to GSA. Two representative studies are provided here as examples to depict the data objects involved in data submission.


