AMIA Annu Symp Proc. 2012;2012:1350-9.
Epub 2012 Nov 3.

A collaborative framework for Distributed Privacy-Preserving Support Vector Machine learning

Jialan Que et al. AMIA Annu Symp Proc. 2012.

Abstract

A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters based on a centralized repository of data. However, in the field of biomedicine, patient data are sometimes stored in local repositories or institutions where they were collected, and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers to effectively learn from the distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates "privacy-insensitive" intermediary results. The globally learned model is guaranteed to be exactly the same as learned from combined data. We also provide a free web-service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner.
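
The claim that the global model matches the one learned from combined data can be illustrated for the simplest case. The sketch below assumes a linear kernel and vertically partitioned data (each site holds different features for the same records); under that assumption the Gram matrix of the combined data equals the sum of the sites' local Gram matrices, so a server can obtain it without seeing any raw feature values. The function names, the two-site split, and the toy data are hypothetical and are not taken from the paper or its web-service.

```python
def gram(rows):
    """Linear-kernel Gram matrix: K[i][j] = <rows[i], rows[j]>."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in rows] for ri in rows]


def add_matrices(mats):
    """Element-wise sum of equally sized square matrices (the server-side step)."""
    n = len(mats[0])
    return [[sum(m[i][j] for m in mats) for j in range(n)] for i in range(n)]


# Toy data (hypothetical): 4 records, 5 integer features.
full = [
    [1, 0, 2, -1, 3],
    [0, 1, 1, 2, -2],
    [2, 2, 0, 1, 1],
    [-1, 3, 1, 0, 2],
]

# Vertical partition: site A holds features 0-1, site B holds features 2-4.
# Rows must be aligned across sites by a shared record identifier.
site_a = [row[:2] for row in full]
site_b = [row[2:] for row in full]

# Each site computes its local Gram matrix; only these matrices are shared.
global_k = add_matrices([gram(site_a), gram(site_b)])

# Reference: Gram matrix computed from the pooled data.
central_k = gram(full)

print(global_k == central_k)
```

Because any SVM solver that accepts a precomputed kernel sees identical input in both cases, the resulting model is identical as well. Nonlinear kernels would need a different combination rule, which this sketch does not cover.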

Figures

Figure 1
The geometric view of an SVM model. The dots have class labels “+1”, and the “X”s have class labels “−1”. W is the set of parameters to be learned, and m is the margin, or the longest distance between the support vector (dotted line) for a given class and the separating plane (dashed line). Note that f(X) = WX in the figure.
Figure 2
Vertically distributed data matrix with m dimensions (features) and n rows (records). Each vertical slice represents a different site.
Figure 3
DPP-SVM web-service framework. Three procedures are executed consecutively to build a global SVM from vertically partitioned data: (1) local kernel matrix calculation; (2) local kernel matrix transmission to the server; and (3) partial weights calculation on local data. The web-service exchanges local kernel matrices (i.e., non-sensitive intermediary results) between the server and the participants, and calculates global model parameters. Expensive matrix multiplications of high-dimensional data (e.g., whole genome features) are handled by powerful private servers or clouds, so a local clinic with limited computing power can still build a global SVM model from calculations done on the cloud. In addition, the DPP-SVM framework enforces privacy because no participant ever leaks sensitive patient information to other participants or to the server (although a unique identifier needs to be agreed upon at all sites).
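
The partial weights step in this caption can be sketched as follows, assuming a linear kernel. For a linear SVM the weight vector is w = sum_i alpha_i * y_i * x_i, so once the dual coefficients alpha are known, each site can compute the slice of w that belongs to its own features, and a decision score is the sum of per-site partial dot products. The coefficient values, the record split, and all function names below are hypothetical; the alpha values are hand-picked for illustration and are not the output of a real solver.

```python
def partial_weights(local_rows, alpha, y):
    """Slice of w = sum_i alpha_i * y_i * x_i restricted to one site's columns."""
    n_features = len(local_rows[0])
    return [
        sum(a * label * row[j] for a, label, row in zip(alpha, y, local_rows))
        for j in range(n_features)
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Toy data (hypothetical): 4 records, 5 integer features, labels in {+1, -1}.
full = [
    [1, 0, 2, -1, 3],
    [0, 1, 1, 2, -2],
    [2, 2, 0, 1, 1],
    [-1, 3, 1, 0, 2],
]
y = [1, -1, 1, -1]
alpha = [1, 2, 1, 0]  # hypothetical dual coefficients with sum(alpha_i * y_i) == 0

# Vertical partition: site A holds features 0-1, site B holds features 2-4.
site_a = [row[:2] for row in full]
site_b = [row[2:] for row in full]

# Each site computes its own slice of the weight vector from local data only.
w_a = partial_weights(site_a, alpha, y)
w_b = partial_weights(site_b, alpha, y)

# Reference: weight vector computed from the pooled data.
w_central = partial_weights(full, alpha, y)

# Scoring a new record: each site contributes a partial sum over its features.
new_record = [1, 1, 0, 2, -1]
score_distributed = dot(w_a, new_record[:2]) + dot(w_b, new_record[2:])
score_central = dot(w_central, new_record)

print(w_a + w_b == w_central, score_distributed == score_central)
```

In this arrangement each site only ever needs alpha, the labels, and its own columns, which is consistent with the caption's statement that raw patient data stay local. The bias term is omitted here for brevity.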
Figure 4
The structure of the web application for our DPP-SVM learning framework. Tasks created by a single site (task creator) are stored in the trusted server (task manager). Signed applets running on the user’s side perform the local calculations and transmit privacy-insensitive intermediary results to the trusted server. The final SVM model is calculated using the kernel combiner and the partial sum combiner, both scheduled in the trusted server.
Figure 5
Detailed workflow for distributed privacy preserving support vector machine (DPP-SVM). The entire process of the DPP-SVM is driven by emails (red blocks), which are sent by the task manager from the trusted server. There is no need for participants to disclose their data to others, and their data are only processed locally (yellow blocks). Task-related information (name, description and participants) and intermediary results are stored in the database on the server (parallelograms).
Figure 6
Snapshots of DPP-SVM web application interfaces. (a) The HTML form used to create a task: ① task name, ② expiration period, ③ task description, ④ the creator’s email, ⑤ participant email(s), and ⑥ the global model parameter C. (b) The signed Java applet that processes data locally.
Figure 7
The discrimination performance of DPP-SVMs is the same as that of the centralized SVM, with precision O(10^−6), using the tic-tac-toe data set. The three subfigures correspond to (a) the ROC curves of 10 cross-validation folds, (b) the vertically averaged ROC curve with standard deviations for false positive rates, and (c) the AUC (area under the ROC curve) for all 10 folds of both SVMs (i.e., distributed and centralized). Note that the color bar in (a) indicates cutoff values for different ROCs.
Figure 8
The discrimination performance of DPP-SVMs is the same as that of the centralized SVM, with precision O(10^−6), using the hospital discharge data set. The three subfigures correspond to (a) the ROC curves of 10 cross-validation folds, (b) the vertically averaged ROC curve with standard deviations for false positive rates, and (c) the AUC for all 10 folds of both SVMs (i.e., distributed and centralized). Note that the color bar in (a) indicates cutoff values for different ROCs.
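
One way to check agreement of the kind reported in Figures 7 and 8 is to compute the AUC for both models on the same fold and compare the values against a tolerance. The sketch below uses the pairwise (Mann-Whitney) definition of AUC; the labels, scores, the size of the simulated numerical noise, and the 1e-6 tolerance are hypothetical stand-ins and are not data or code from the paper.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == -1]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical decision scores for one cross-validation fold.
labels = [1, 1, -1, 1, -1, -1]
central_scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.5]

# Simulate a distributed model whose scores differ only by tiny numerical noise.
distributed_scores = [s + 1e-9 for s in central_scores]

auc_central = auc(labels, central_scores)
auc_distributed = auc(labels, distributed_scores)

TOLERANCE = 1e-6  # assumed threshold, chosen to mirror the O(10^-6) in the captions
print(abs(auc_central - auc_distributed) < TOLERANCE)
```

The pairwise form is quadratic in the number of records, which is acceptable for a small fold; a rank-based computation would scale better on larger data.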
Figure 9
Time comparison between centralized SVM and DPP-SVMs with various numbers of participants using the tic-tac-toe data. The left figure plots the time and the right figure shows their box plots. The red square brackets on the right figure indicate differences that were not significant. Black brackets indicate significantly different times.
Figure 10
Time comparison between centralized SVM and DPP-SVMs with various numbers of participants using the hospital discharge data. The left figure plots the time and the right figure shows their box plots. All time differences were statistically significant.
