Proceedings Article

Incorporating Social Context and Domain Knowledge for Entity Recognition

TLDR
The SOCINST model is proposed, which can automatically construct a context of subtopics for each instance, with each subtopic representing one possible meaning of the instance; domain knowledge is further incorporated into the model using a Dirichlet tree distribution.
Abstract
Recognizing entity instances in documents according to a knowledge base is a fundamental problem in many data mining applications. The problem is extremely challenging for short documents in complex domains such as social media and biomedical domains. Large concept spaces and instance ambiguity are key issues that need to be addressed. Most of these documents are created in a social context by their authors via social interactions, such as replies and citations. Such social contexts are largely ignored in the instance-recognition literature. How can users' interactions help entity instance recognition? How can the social context be modeled so as to resolve the ambiguity of different instances? In this paper, we propose the SOCINST model to formalize the problem as a probabilistic model. Given a set of short documents (e.g., tweets or paper abstracts) posted by users who may connect with each other, SOCINST can automatically construct a context of subtopics for each instance, with each subtopic representing one possible meaning of the instance. The model is also able to incorporate social relationships between users to help build the social context. We further incorporate domain knowledge into the model using a Dirichlet tree distribution. We evaluate the proposed model on three different genres of datasets: ICDM'12 Contest, Weibo, and I2B2. On ICDM'12 Contest, the proposed model clearly outperforms all the top contestants (+21.4%; p < 1e-5 with t-test). On Weibo and I2B2, our results also show that the recognition accuracy of SOCINST is 5.3-26.6% better than that of several alternative methods.
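To make the Dirichlet tree idea from the abstract concrete, below is a minimal sketch (not taken from the paper) of how a two-level Dirichlet tree can encode domain knowledge: internal nodes group vocabulary terms that a knowledge base marks as belonging to the same concept, so probability mass is first split across concepts and then shared among terms within each concept. The group names, example terms, the function sample_from_dirichlet_tree, and the root_alpha/leaf_alpha hyperparameters are illustrative assumptions, not SOCINST's actual tree structure or settings.

```python
# Minimal sketch of drawing word probabilities from a two-level Dirichlet tree.
# Assumptions (not from the paper): hypothetical concept groups and edge weights;
# SOCINST's actual tree structure and hyperparameters are not specified here.
import numpy as np

rng = np.random.default_rng(0)

# Domain knowledge: vocabulary terms grouped by concept (hypothetical groups).
concept_groups = {
    "medication": ["aspirin", "ibuprofen"],
    "symptom":    ["headache", "fever", "cough"],
}

def sample_from_dirichlet_tree(groups, root_alpha=1.0, leaf_alpha=10.0, rng=rng):
    """Draw a word distribution by first splitting mass across concept groups
    (Dirichlet at the root), then within each group (Dirichlet at each leaf node).
    A large leaf_alpha keeps terms of the same concept at similar probabilities,
    which is how the tree encodes the domain constraint."""
    group_names = list(groups)
    group_mass = rng.dirichlet([root_alpha] * len(group_names))
    word_probs = {}
    for name, mass in zip(group_names, group_mass):
        within = rng.dirichlet([leaf_alpha] * len(groups[name]))
        for word, p in zip(groups[name], within):
            word_probs[word] = mass * p
    return word_probs

print(sample_from_dirichlet_tree(concept_groups))
```

In this sketch, terms under the same internal node rise and fall together when the root reallocates mass between concepts; a standard (flat) Dirichlet prior cannot express that coupling, which is the usual motivation for using a Dirichlet tree to inject domain knowledge.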
