A new concept of sequence data distribution on wide area networks

Comput Appl Biosci. 1994 Sep;10(5):519-26. doi: 10.1093/bioinformatics/10.5.519.

Abstract

Accepted concepts in distributed applications design have been applied in the development of a network-based system for the synchronization of remote sequence database access sites by an incremental update mechanism. Computer hardware requirements, network bandwidth, and stability considerations make centralized access to essential computerized resources undesirable. A network model has been developed to distribute access over a collection of remotely situated computer centers. The formally independent database-access nodes join to form a heterogeneous, long-distance, co-operating network that can compensate for the deficiencies of unstable network links, thereby ensuring uninterrupted access to the resource. In order to guarantee consistency among these nodes, several distributed transaction protocols have been investigated; based on these results, a prototype system has been implemented. A layered software architecture makes the distributed transaction protocol transparent to the individual database system and the underlying network. Individual components of this network communicate by means of Remote Procedure Calls (RPCs). A prototype software system operates to synchronize up-to-date copies of the PIR-International Protein Sequence Database (Barker et al., 1993) at a number of different sites using the public Internet as the transport vehicle.
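
The incremental update idea described above can be illustrated with a minimal sketch. It is not the authors' protocol: the abstract does not say which distributed transaction protocol was adopted, so the serial-numbered update log, the `Update` and `Node` classes, the `synchronize` function, and the entry identifiers are all hypothetical. The remote procedure call is also replaced by a local method call to keep the example self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    """One incremental change to the database (hypothetical format)."""
    serial: int      # position in the update log, starting at 1
    entry_id: str    # identifier of the sequence entry (made up here)
    sequence: str    # new sequence data for that entry


@dataclass
class Node:
    """A database-access site holding its own copy of the database."""
    name: str
    entries: dict = field(default_factory=dict)
    last_serial: int = 0  # serial of the last update applied locally

    def apply(self, update: Update) -> bool:
        # Accept only the next update in order, so a replayed or
        # out-of-order message cannot corrupt the local copy.
        if update.serial != self.last_serial + 1:
            return False
        self.entries[update.entry_id] = update.sequence
        self.last_serial = update.serial
        return True


def synchronize(log: list, node: Node, link_up: bool = True) -> int:
    """Send the node every logged update it has not applied yet.

    Assumes `log` is ordered by serial. Returns the number of updates
    applied. In a networked system each apply() would be a remote
    procedure call; here it is a plain local call.
    """
    if not link_up:
        return 0  # an unreachable node simply catches up later
    applied = 0
    for update in log:
        if update.serial > node.last_serial and node.apply(update):
            applied += 1
    return applied


log = [
    Update(1, "ENTRY1", "MKV"),
    Update(2, "ENTRY2", "GAL"),
    Update(3, "ENTRY1", "MKVT"),
]
site_a = Node("site_a")
site_b = Node("site_b")

synchronize(log[:2], site_a)
synchronize(log[:2], site_b, link_up=False)  # link down: nothing applied
synchronize(log, site_a)                     # site_a receives update 3 only
synchronize(log, site_b)                     # link restored: site_b catches up
```

Because each node records the serial of the last update it applied, a site that was unreachable receives only the updates it missed once the link returns, rather than a full copy of the database. A real deployment would additionally need the transactional guarantees the abstract refers to, which this sketch does not attempt.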

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Computer Communication Networks*
  • Data Collection
  • Databases, Factual
  • Gene Library*
  • Information Systems
  • Software*