The search for molecules with activity against Mycobacterium tuberculosis (Mtb) is employing many approaches in parallel including high throughput screening and computational methods. We have developed a database (CDD TB) to capture public and private Mtb data while enabling data mining and collaborations with other researchers. We have used the public data along with several cheminformatics approaches to produce models that describe active and inactive compounds. We have compared these datasets to those for known FDA approved drugs and between Mtb active and inactive compounds. The distribution of polar surface area and pK(a) of active compounds was found to be a statistically significant determinant of activity against Mtb. Hydrophobicity was not always statistically significant. Bayesian classification models for 220, 463 molecules were generated and tested with external molecules, and enabled the discrimination of active or inactive substructures from other datasets in the CDD TB. Computational pharmacophores based on known Mtb drugs were able to map to and retrieve a small subset of some of the Mtb datasets, including a high percentage of Mtb actives. The combination of the database, dataset analysis, Bayesian and pharmacophore models provides new insights into molecular properties and features that are determinants of activity in whole cells. This study provides novel insights into the key 1D molecular descriptors, 2D chemical substructures and 3D pharmacophores which can be used to mine the chemistry space, prioritizing those molecules with a higher probability of activity against Mtb.