Objective: We propose classification integration as a new method for integrating data from different sources. We also propose reclassification as a new method for combining existing medical classifications for different classes.
Background: In many problems the raw data are already classified according to a set of features but need to be reclassified. Data reclassification is usually achieved using data integration methods that require the raw data, which may not be available or sharable because of privacy and legal concerns.
Methodology: We introduce general classification integration and reclassification methods that create new classes by flexibly combining the existing classes, without requiring access to the raw data. The flexibility is achieved by representing any linear classification in a constraint database.
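As an illustration only, the idea of storing linear classifiers as constraints and combining them into new classes can be sketched as follows. This is a minimal sketch in plain Python, not the implementation used in this work: the `LinearConstraint` type, the `classify` helper, the feature names, and all weights are hypothetical, and a real constraint database would evaluate such constraints through its own query language.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LinearConstraint:
    """One linear inequality over named features (hypothetical representation).

    positive=True  means: sum(coeffs[f] * x[f]) + bias >= 0
    positive=False means: sum(coeffs[f] * x[f]) + bias <  0
    """
    coeffs: Dict[str, float]
    bias: float
    positive: bool = True

    def holds(self, point: Dict[str, float]) -> bool:
        score = sum(w * point[f] for f, w in self.coeffs.items()) + self.bias
        return (score >= 0) == self.positive

    def negated(self) -> "LinearConstraint":
        # The complement of a half-space is the opposite (strict) half-space.
        return LinearConstraint(self.coeffs, self.bias, not self.positive)


def classify(point: Dict[str, float],
             classes: Dict[str, List[LinearConstraint]]) -> List[str]:
    """Return every class whose constraints (a conjunction) all hold."""
    return [label for label, constraints in classes.items()
            if all(c.holds(point) for c in constraints)]


# Two existing linear classifiers, e.g. trained separately on different
# sources. The weights below are made up for illustration.
has_disease = LinearConstraint({"age": 0.04, "chol": 0.01}, -4.0)
needs_drug = LinearConstraint({"bp": 0.02}, -2.6)

# Reclassification: new classes are combinations of the stored constraints.
# Only the constraints are needed here, not the original training data.
new_classes = {
    "disease_and_drug": [has_disease, needs_drug],
    "disease_no_drug": [has_disease, needs_drug.negated()],
    "no_disease_drug": [has_disease.negated(), needs_drug],
    "no_disease_no_drug": [has_disease.negated(), needs_drug.negated()],
}

patient_a = {"age": 60, "chol": 250, "bp": 140}
patient_b = {"age": 30, "chol": 180, "bp": 120}
print(classify(patient_a, new_classes))
print(classify(patient_b, new_classes))
```

Because each new class is only a conjunction of stored inequalities, it can be defined and evaluated from the classifier parameters alone, which is the property that makes sharing the raw data unnecessary in this sketch.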
Results: Experiments using support vector machines and decision trees on heart disease diagnosis and primary biliary cirrhosis data show that our classification integration method is more accurate than current data integration methods when the data contain many missing values. The reclassification problem can also be solved using constraint databases without requiring access to the raw data.
Conclusions: The classification integration and reclassification methods are applied to two particular data sets. Besides these particular cases, our general method is also appropriate for many other application areas, where it may yield similar accuracy improvements. These methods may also be extended to non-linear classifiers.
Copyright 2010 Elsevier B.V. All rights reserved.