The high-quality in vivo preclinical safety data produced by the pharmaceutical industry during drug development, which follows numerous strict guidelines, are mostly not available in the public domain. These safety data are sometimes published as a condensed summary for the few compounds that reach the market, but the majority of studies are never made public and are often difficult to access in an automated way, even sometimes within the owning company itself. It is evident from many academic and industrial examples, that useful data mining and model development requires large and representative data sets and careful curation of the collected data. In 2010, under the auspices of the Innovative Medicines Initiative, the eTOX project started with the objective of extracting and sharing preclinical study data from paper or pdf archives of toxicology departments of the 13 participating pharmaceutical companies and using such data for establishing a detailed, well-curated database, which could then serve as source for read-across approaches (early assessment of the potential toxicity of a drug candidate by comparison of similar structure and/or effects) and training of predictive models. The paper describes the efforts undertaken to allow effective data sharing intellectual property (IP) protection and set up of adequate controlled vocabularies) and to establish the database (currently with over 4000 studies contributed by the pharma companies corresponding to more than 1400 compounds). In addition, the status of predictive models building and some specific features of the eTOX predictive system (eTOXsys) are presented as decision support knowledge-based tools for drug development process at an early stage.