High-throughput computing in the sciences

Methods Enzymol. 2009:467:197-227. doi: 10.1016/S0076-6879(09)67008-7.

Abstract

While it is true that the modern computer is many orders of magnitude faster than that of yesteryear; this tremendous growth in CPU clock rates is now over. Unfortunately, however, the growth in demand for computational power has not abated; whereas researchers a decade ago could simply wait for computers to get faster, today the only solution to the growing need for more powerful computational resource lies in the exploitation of parallelism. Software parallelization falls generally into two broad categories--"true parallel" and high-throughput computing. This chapter focuses on the latter of these two types of parallelism. With high-throughput computing, users can run many copies of their software at the same time across many different computers. This technique for achieving parallelism is powerful in its ability to provide high degrees of parallelism, yet simple in its conceptual implementation. This chapter covers various patterns of high-throughput computing usage and the skills and techniques necessary to take full advantage of them. By utilizing numerous examples and sample codes and scripts, we hope to provide the reader not only with a deeper understanding of the principles behind high-throughput computing, but also with a set of tools and references that will prove invaluable as she explores software parallelism with her own software applications and research.

MeSH terms

  • Algorithms
  • Computational Biology* / instrumentation
  • Computational Biology* / methods
  • Computers*
  • Database Management Systems
  • Humans
  • Monte Carlo Method
  • Science*
  • Software*
  • User-Computer Interface