2015 Aug;2015:995-1004.
doi: 10.1145/2783258.2783362.

ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering

Free PMC article

Xiang Ren et al. KDD.

Entity recognition is an important but challenging research problem. In practice, many text collections come from specific, dynamic, or emerging domains, which poses significant new challenges for entity recognition: name ambiguity and context sparsity increase, and entities must be detected without domain restriction. In this paper, we investigate entity recognition (ER) with distant supervision and propose a novel relation phrase-based ER framework, called ClusType, that runs data-driven phrase mining to generate entity mention candidates and relation phrases, and enforces the principle that relation phrases should be softly clustered when propagating type information between their argument entities. We then predict the type of each entity mention based on the type signatures of its co-occurring relation phrases and the type indicators of its surface name, as computed over the corpus. Specifically, we formulate a joint optimization problem for two tasks: type propagation with relation phrases and multi-view relation phrase clustering. Our experiments on multiple genres (news, Yelp reviews and tweets) demonstrate the effectiveness and robustness of ClusType, with an average 37% improvement in F1 score over the best compared method.
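
The type prediction step described above can be illustrated with a toy sketch. This is not the ClusType algorithm: it omits phrase mining, soft clustering of relation phrases, and the joint optimization entirely, and only shows the idea of scoring a mention's type from two sources of evidence. The function name `predict_type`, the weight `alpha`, the simple weighted-average scoring rule, and all numeric values are assumptions made for illustration.

```python
def predict_type(surface_name, relation_phrases, name_indicator,
                 phrase_signature, types, alpha=0.5):
    """Toy type scoring for one entity mention (illustrative assumption only).

    name_indicator:   surface name -> {type: weight}, hypothetical values
    phrase_signature: relation phrase -> {type: weight}, hypothetical values
    alpha:            hypothetical mixing weight between the two views
    """
    scores = {}
    for t in types:
        # Evidence from the mention's surface name.
        name_score = name_indicator.get(surface_name, {}).get(t, 0.0)
        # Evidence from co-occurring relation phrases, averaged.
        if relation_phrases:
            ctx_score = sum(
                phrase_signature.get(p, {}).get(t, 0.0)
                for p in relation_phrases
            ) / len(relation_phrases)
        else:
            ctx_score = 0.0
        scores[t] = alpha * name_score + (1 - alpha) * ctx_score
    best = max(scores, key=scores.get)
    return best, scores


# Made-up weights: an ambiguous surface name resolved by its context.
TYPES = ["LOCATION", "ORGANIZATION"]
NAME_INDICATOR = {"White House": {"LOCATION": 0.5, "ORGANIZATION": 0.5}}
PHRASE_SIGNATURE = {
    "announced": {"ORGANIZATION": 0.9, "LOCATION": 0.1},
    "is located in": {"LOCATION": 0.9, "ORGANIZATION": 0.1},
}

as_org, org_scores = predict_type(
    "White House", ["announced"], NAME_INDICATOR, PHRASE_SIGNATURE, TYPES)
as_loc, loc_scores = predict_type(
    "White House", ["is located in"], NAME_INDICATOR, PHRASE_SIGNATURE, TYPES)
```

In this sketch the same surface name receives different types depending on the relation phrase it co-occurs with, which is the kind of name ambiguity the abstract refers to.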

Keywords: Entity Recognition and Typing; Relation Phrase Clustering.


Figure 1. An example of distant supervision
Figure 2. The constructed heterogeneous graph
Figure 3. Example output of candidate generation
Figure 4. Example entity name-relation phrase links from Yelp reviews
Figure 5. Example mention-mention links for entity surface name “White House” from Tweets
Figure 6. Performance breakdown by types
Figure 7. Performance changes in F1 score with #clusters, #seeds and corpus size on Tweets
Figure 8. Case studies on context sparsity and surface name popularity on the Tweet dataset.