Inducing Domain-Specific Sentiment Lexicons from Unlabeled Corpora

William L Hamilton; Kevin Clark; Jure Leskovec; Dan Jurafsky

doi:10.18653/v1/D16-1057

Proc Conf Empir Methods Nat Lang Process. 2016 Nov;2016:595-605. doi: 10.18653/v1/D16-1057.

Authors

William L Hamilton¹, Kevin Clark¹, Jure Leskovec¹, Dan Jurafsky¹

Affiliation

¹ Department of Computer Science, Stanford University, Stanford, CA 94305.

Abstract

A word's sentiment depends on the domain in which it is used. Computational social science research thus requires sentiment lexicons that are specific to the domains being studied. We combine domain-specific word embeddings with a label propagation framework to induce accurate domain-specific sentiment lexicons using small sets of seed words. We show that our approach achieves state-of-the-art performance on inducing sentiment lexicons from domain-specific corpora and that our purely corpus-based approach outperforms methods that rely on hand-curated resources (e.g., WordNet). Using our framework, we induce and release historical sentiment lexicons for 150 years of English and community-specific sentiment lexicons for 250 online communities from the social media forum Reddit. The historical lexicons we induce show that more than 5% of sentiment-bearing (non-neutral) English words completely switched polarity during the last 150 years, and the community-specific lexicons highlight how sentiment varies drastically between different communities.
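To make the idea of propagating seed labels over word embeddings concrete, the following is a minimal sketch, not the authors' implementation. It assumes a k-nearest-neighbour graph built from cosine similarity, a random walk with restart from each seed set, and a polarity score equal to the positive walk's share of the total. The function names, the toy two-dimensional vectors, and the parameters `k`, `beta`, and `iters` are all hypothetical choices made for illustration.

```python
import numpy as np

def build_graph(vectors, k=2):
    """Symmetrically normalized k-nearest-neighbour graph over word vectors."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sim = unit @ unit.T                    # cosine similarity
    np.fill_diagonal(sim, -np.inf)         # a word is not its own neighbour
    n = len(vectors)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sim[i])[-k:]     # indices of the k most similar words
        W[i, nbrs] = np.exp(sim[i, nbrs])  # assumed edge weight, always positive
    W = np.maximum(W, W.T)                 # make the graph undirected
    d = W.sum(axis=1)
    return W / np.sqrt(np.outer(d, d))     # D^(-1/2) W D^(-1/2)

def walk(T, seed_idx, beta=0.9, iters=50):
    """Random walk with restart: how strongly each word is tied to the seeds."""
    s = np.zeros(T.shape[0])
    s[seed_idx] = 1.0 / len(seed_idx)
    p = s.copy()
    for _ in range(iters):
        p = beta * (T @ p) + (1.0 - beta) * s
    return p

def polarity(vectors, pos_idx, neg_idx, k=2):
    """Score in [0, 1]; values above 0.5 lean positive."""
    T = build_graph(vectors, k)
    pos, neg = walk(T, pos_idx), walk(T, neg_idx)
    total = pos + neg
    # Words unreachable from every seed are left at a neutral 0.5.
    return np.divide(pos, total, out=np.full_like(pos, 0.5), where=total > 0)

# Toy data (hypothetical): two clusters in a 2-D "embedding" space.
words = ["good", "great", "nice", "bad", "awful", "poor"]
vectors = np.array([[1.0, 0.2], [0.9, 0.3], [1.0, 0.1],
                    [-1.0, 0.2], [-0.9, 0.3], [-1.0, 0.1]])
values = polarity(vectors, pos_idx=[0], neg_idx=[3])
scores = dict(zip(words, values))
```

Here "good" and "bad" serve as the only seeds, and the remaining words receive scores according to which seed they are more strongly connected to in the graph. Training the embeddings on a domain-specific corpus is what would make the resulting scores domain-specific; this sketch takes the vectors as given.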

Grants and funding

U54 EB020405/EB/NIBIB NIH HHS/United States