Axillary lymph node status remains the most valuable prognostic factor for breast cancer patients. However, approximately 20-30% of node-positive patients remain free of distant metastases for 15-30 years. It is therefore important to develop molecular markers that can predict the risk of distant metastasis and to develop patient-tailored therapy strategies. We hypothesized that lymph node metastases may represent the most metastatic fraction of the primary cancers. We therefore used microarrays to identify genes differentially expressed between primary tumors and their paired lymph node metastasis samples, collected from 26 patients. A set of 79 genes differentially expressed between primary cancers and metastasis samples correctly separated most primary cancers from lymph node metastases. Decreased expression of matrix metalloproteinase 2, fibronectin, osteoblast specific factor 2, and collagen type XI alpha 1 in lymph node metastases was further confirmed by real-time RT-PCR performed on 30 specimen pairs. This set of genes also classified 35 primary cancers into two groups with different prognoses: a "high risk group" and a "low risk group." Patients in the "high risk group" had a hazard ratio of 4.65 (95% CI 1.02-21.13, P = 0.047) for developing distant metastasis within 43 months compared with the "low risk group." These results suggest that the gene signature of 79 genes differentially expressed between primary cancers and lymph node metastases can also predict the clinical outcome of node-positive patients, and that molecular classification based on this signature could guide patient-tailored therapy.
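
The reported hazard ratio, confidence interval, and P value can be checked against one another with a short calculation. The sketch below assumes the interval is a Wald interval computed on the log scale. The abstract does not state how it was derived, so this is an illustration, not the authors' method. The function name `wald_check` is hypothetical.

```python
import math


def wald_check(hr, ci_low, ci_high, z_crit=1.959964):
    """Back-calculate SE, z, and a two-sided P value from a hazard ratio
    and its 95% CI, ASSUMING a Wald interval on the log-hazard scale."""
    log_hr = math.log(hr)
    # Width of the CI on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided normal P value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    # For a log-scale Wald interval, the point estimate is the
    # geometric mean of the two limits.
    midpoint = math.sqrt(ci_low * ci_high)
    return se, z, p, midpoint


se, z, p, midpoint = wald_check(4.65, 1.02, 21.13)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, P = {p:.3f}, "
      f"geometric midpoint = {midpoint:.2f}")
```

Under this assumption, the back-calculated P value falls close to the reported 0.047 and the geometric midpoint of the interval falls close to 4.65, so the three reported numbers are mutually consistent. The lower limit of 1.02 sits just above 1, which reflects the small cohort of 35 primary cancers.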