Proceedings Article

Incorporating Social Context and Domain Knowledge for Entity Recognition

TLDR
The SOCINST model is proposed; it automatically constructs a context of subtopics for each instance, with each subtopic representing one possible meaning of the instance, and incorporates domain knowledge using a Dirichlet tree distribution.
Abstract
Recognizing entity instances in documents according to a knowledge base is a fundamental problem in many data mining applications. The problem is extremely challenging for short documents in complex domains such as social media and biomedicine, where large concept spaces and instance ambiguity are the key issues to address. Most such documents are created in a social context: authors interact with one another through replies and citations. This social context is largely ignored in the instance-recognition literature. How can users' interactions help entity instance recognition? How can the social context be modeled so as to resolve the ambiguity of different instances? In this paper, we propose the SOCINST model, which formalizes the problem as a probabilistic model. Given a set of short documents (e.g., tweets or paper abstracts) posted by users who may be connected with each other, SOCINST automatically constructs a context of subtopics for each instance, with each subtopic representing one possible meaning of the instance. The model can also incorporate social relationships between users to help build the social context. We further incorporate domain knowledge into the model using a Dirichlet tree distribution. We evaluate the proposed model on three datasets of different genres: ICDM'12 Contest, Weibo, and I2B2. On ICDM'12 Contest, the proposed model clearly outperforms all the top contestants (+21.4%; p ≪ 1e-5 with t-test). On Weibo and I2B2, the recognition accuracy of SOCINST is 5.3% to 26.6% higher than that of several alternative methods.
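The abstract does not spell out how the Dirichlet tree distribution is set up, so the following is only a minimal sketch of the general technique, not the SOCINST implementation. It assumes a hypothetical toy tree (`TREE`), invented edge weights, and a helper named `sample_dirichlet_tree`; none of these come from the paper. Each internal node draws a Dirichlet over its children, and a leaf's probability is the product of the branch probabilities on its path from the root.

```python
import random

# Hypothetical toy tree: internal node -> list of (child, edge weight).
# Leaves are word senses; the grouping stands in for domain knowledge.
TREE = {
    "root": [("fruit", 2.0), ("company", 2.0)],
    "fruit": [("apple_fruit", 5.0), ("banana", 5.0)],
    "company": [("apple_inc", 5.0), ("google", 5.0)],
}


def sample_dirichlet(weights, rng):
    """Draw from Dirichlet(weights) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(w, 1.0) for w in weights]
    total = sum(draws)
    return [d / total for d in draws]


def sample_dirichlet_tree(tree, root, rng):
    """Draw one multinomial over the leaves of a Dirichlet tree."""
    leaf_probs = {}
    stack = [(root, 1.0)]  # (node, probability mass reaching that node)
    while stack:
        node, mass = stack.pop()
        if node not in tree:  # leaf: record the accumulated mass
            leaf_probs[node] = mass
            continue
        children = [child for child, _ in tree[node]]
        weights = [weight for _, weight in tree[node]]
        branch = sample_dirichlet(weights, rng)
        for child, p in zip(children, branch):
            stack.append((child, mass * p))
    return leaf_probs


if __name__ == "__main__":
    probs = sample_dirichlet_tree(TREE, "root", random.Random(0))
    for leaf, p in sorted(probs.items()):
        print(f"{leaf}: {p:.3f}")
```

Because leaves under the same internal node share that node's branch probability, their probabilities rise and fall together, which is one way a tree prior can encode that certain words or senses belong to the same group.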

Citations
Journal Article

TempoRec: Temporal-Topic Based Recommender for Social Network Services

TL;DR: A hybrid recommendation algorithm based on social relations and time-sequenced topics is verified on real Sina Weibo datasets and achieves better mean average precision (MAP) than existing counterparts.
Book

Knowledge Graphs: An Information Retrieval Perspective

TL;DR: An overview of the literature on knowledge graphs (KGs) in the context of information retrieval (IR) is provided and how KGs can be employed to support IR tasks, including document and entity retrieval is discussed.
Proceedings Article

Multi-modal Bayesian embeddings for learning social knowledge graphs

TL;DR: A multi-modal Bayesian embedding model, GenVector, is proposed to learn latent topics that generate word and network embeddings in a shared latent topic space, and significantly decreases the error rate in an online A/B test with live users.
Proceedings Article

AMiner: Mining Deep Knowledge from Big Scholar Data

TL;DR: This talk will focus on answering two fundamental questions for author-centric network analysis: who is who?
Journal Article

A semantic and social‐based collaborative recommendation of friends in social networks

TL;DR: A novel approach which combines a user‐based collaborative filtering (CF) algorithm with semantic and social recommendations for the recommendation of users in social networks is proposed and a social recommender system based on this approach is developed.
References
Proceedings Article

Resolving Entity Morphs in Censored Data

TL;DR: This paper exploits temporal constraints to collect cross-source comparable corpora relevant to any given morph query and to identify target candidates, and proposes various novel similarity measurements, including surface features, meta-path based semantic features, and social correlation features, which it combines in a learning-to-rank framework.
Proceedings Article

Accurate Product Name Recognition from User Generated Content

TL;DR: This paper proposes a hybrid approach which combines the results obtained by several separately trained recognition models and achieves the best performance in the contest.
Proceedings Article

Exploiting user clicks for automatic seed set generation for entity matching

TL;DR: This paper presents an approach that leverages user clicks during Web search to automatically generate training data for entity matching, and finds that Web pages clicked for a given query are likely to be about the same entity.