Pac Symp Biocomput. 2017;22:154-165.
doi: 10.1142/9789813207813_0016.

RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS

Free PMC article


Gaurav Kaushik et al. Pac Symp Biocomput. 2017.

Abstract

As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework for performing a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions.

Figures

Figure 1
Illustration of a directed acyclic graph (DAG). The DAG may be traversed from left-to-right, moving from node-to-node along the edges that connect them.
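The left-to-right traversal in Figure 1 corresponds to visiting nodes in topological order: a node is visited only after all nodes with edges into it. A minimal sketch of such a traversal (using Kahn's algorithm; the graph and node names are illustrative, not taken from the paper):

```python
from collections import deque

def topological_order(edges):
    """Traverse a DAG left-to-right: edges maps node -> downstream nodes."""
    indegree = {}
    for node, targets in edges.items():
        indegree.setdefault(node, 0)
        for t in targets:
            indegree[t] = indegree.get(t, 0) + 1
    # Nodes with no incoming edges can be visited immediately.
    ready = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for t in edges.get(node, []):
            indegree[t] -= 1
            if indegree[t] == 0:  # all upstream nodes visited
                ready.append(t)
    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle; not a DAG")
    return order

dag = {"input": ["A", "B"], "A": ["C"], "B": ["C"], "C": []}
order = topological_order(dag)
```

Because a DAG has no cycles, this traversal always terminates and visits each node exactly once.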
Figure 2
The process of parsing a workflow description. A. The machine-readable document is interpreted, from which B. a DAG is produced. From the DAG, C. subgraphs representing computational jobs that can be sent to backends for scheduling/execution and D. a job tree is resolved, which identifies “parent” and “leaf” nodes. Each leaf represents an individual job.
Figure 3
A DAG created from a workflow described by the Common Workflow Language which contains two tools ( A, B ). Tools have input and output ports, which define discrete data elements that are passed downstream along the edges of the DAG.
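The port model in Figure 3 can be sketched as follows: tools expose named input and output ports, and each workflow edge connects an upstream output port to a downstream input port. This is a hypothetical illustration of the concept only; the class and port names are invented, not the Rabix or CWL API:

```python
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    inputs: list   # names of input ports
    outputs: list  # names of output ports

@dataclass
class Edge:
    source: tuple  # (tool name, output port)
    target: tuple  # (tool name, input port)

def upstream_sources(edges, tool_name):
    """Return the (tool, port) pairs that feed a tool's input ports."""
    return [e.source for e in edges if e.target[0] == tool_name]

tool_a = Tool("A", inputs=["in1"], outputs=["out1"])
tool_b = Tool("B", inputs=["in1"], outputs=["out1"])
# Tool A's output port feeds Tool B's input port, as in Figure 3.
edges = [Edge(("A", "out1"), ("B", "in1"))]
sources = upstream_sources(edges, "B")
```

Modeling edges between ports, rather than between whole tools, is what lets discrete data elements flow independently along the DAG.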
Figure 4
The algorithm as the graph is traversed. A. The engine interprets the top level of the workflow description and B. inspects the contents of the workflow node, determining the DAG structure and the links between each step (edges). Passing value1 from the workflow input to the input of Tool A triggers an input event, and a job (analysis of Tool A with its inputs) is sent to a backend node. C. Execution continues and the engine traverses the DAG. D. The workflow is completed when the output of the final tool (W.B.O., value3) is passed to the overall workflow output (W.O.). The port counters allow the engine to track when nodes are ready to be executed even if upstream jobs are only partially completed.
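The port counters in Figure 4 can be sketched as simple per-node bookkeeping: each node records which input ports still await a value, and when the last one is filled the node's job becomes ready for scheduling. A minimal sketch under that assumption (class and method names are illustrative):

```python
class PortCounters:
    """Track unfilled input ports per node; a node with none is ready."""

    def __init__(self, nodes):
        # nodes: dict mapping node name -> list of its input port names
        self.pending = {n: set(ports) for n, ports in nodes.items()}
        self.ready = []

    def deliver(self, node, port):
        """Record that a value arrived on (node, port)."""
        self.pending[node].discard(port)
        if not self.pending[node]:  # all input ports filled
            self.ready.append(node)

pc = PortCounters({"A": ["in1"], "B": ["in1", "in2"]})
pc.deliver("A", "in1")  # A now has all of its inputs
pc.deliver("B", "in1")  # B still waits on in2
pc.deliver("B", "in2")  # B becomes ready
```

Because readiness is decided per port rather than per upstream job, a node can start as soon as its own inputs arrive, even while other outputs of the upstream jobs are still being produced.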
Figure 5
Graph transformations when performing parallelization. A. In this workflow, a function is performed on two inputs: an int and an array of ints. B. The flattened DAG created by the engine. Each value of the array is scattered as a single process to reduce computation time.
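The scatter transformation in Figure 5 expands one node whose input is an array into one independent job per element, so the elements can run in parallel. A hedged sketch of the idea (the function names are invented for illustration; in the engine the per-element jobs would be dispatched to backend nodes rather than run in a loop):

```python
def scatter(job_fn, values):
    """Expand one array-input job into independent per-element jobs."""
    return [(job_fn, v) for v in values]

def run_all(jobs):
    # Stand-in for dispatching each job to a backend; runs sequentially here.
    return [fn(v) for fn, v in jobs]

inc = lambda x: x + 1          # stand-in for the real tool
jobs = scatter(inc, [10, 20, 30])
results = run_all(jobs)
```

Since each per-element job is independent, the engine is free to schedule them concurrently, and the outputs are gathered back into an array in the original order.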
Figure 6
Graph transformations for sequential scattered nodes. A. The workflow from Fig. 5 with an additional downstream function whose input can also be scattered. B. During execution, the engine is able to look ahead to the next stage in the workflow. If any input is available (e.g. the value 11 returned by a tool), downstream processes that can proceed are started. C. The completed workflow.
Figure 7
Jobs can be grouped (grey background) for execution on a backend node based on criteria set by the workflow or tool author.
Figure 8
Graph transformations of nested workflows to optimize total execution time. A. A workflow consisting of two tools. B. The workflow in Fig. 8a extended with a third tool. The engine allows the downstream tool to start executing once the necessary inputs are ready, even if the upstream workflow has yet to produce all of its outputs. No refactoring of the workflow in 8a is required.
