Motivation: Most experimental evidence on kinetic parameters is buried in the literature, whose manual searching is complex, time consuming and partial. These shortcomings become particularly acute in systems biology, where these parameters need to be integrated into detailed, genome-scale, metabolic models. These problems are addressed by KiPar, a dedicated information retrieval system designed to facilitate access to the literature relevant for kinetic modelling of a given metabolic pathway in yeast. Searching for kinetic data in the context of an individual pathway offers modularity as a way of tackling the complexity of developing a full metabolic model. It is also suitable for large-scale mining, since multiple reactions and their kinetic parameters can be specified in a single search request, rather than one reaction at a time, which is unsuitable given the size of genome-scale models.
Results: We developed an integrative approach, combining public data and software resources for the rapid development of large-scale text mining tools targeting complex biological information. The user supplies input in the form of identifiers used in relevant data resources to refer to the concepts of interest, e.g. EC numbers, GO and SBO identifiers. By doing so, the user is freed from providing any other knowledge or terminology concerned with these concepts and their relations, since they are retrieved from these and cross-referenced resources automatically. The terminology acquired is used to index the literature by mapping concepts to their synonyms, and then to textual documents mentioning them. The indexing results and the previously acquired knowledge about relations between concepts are used to formulate complex search queries aiming at documents relevant to the user's information needs. The conceptual approach is demonstrated in the implementation of KiPar. Evaluation reveals that KiPar performs better than a Boolean search. The precision achieved for abstracts (60%) and full-text articles (48%) is considerably better than the baseline precision (44% and 24%, respectively). The baseline recall is improved by 36% for abstracts and by 100% for full text. It appears that full-text articles are a much richer source of information on kinetic data than are their abstracts. Finally, the combined results for abstracts and full text compared with the curated literature provide high values for relative recall (88%) and novelty ratio (92%), suggesting that the system is able to retrieve a high proportion of new documents.
Availability: Source code and documentation are available at: (http://www.mcisb.org/resources/kipar/).