scispace - formally typeset

Information extraction

About: Information extraction is a research topic. Over its lifetime, 14,312 publications have been published within this topic, receiving 295,135 citations. The topic is also known as: IE.


Open access · Posted Content
Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November 2015 and are publicly available.
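The dataflow-graph idea the abstract describes can be illustrated with a toy sketch. This is plain Python, not TensorFlow's actual API, and every name below is hypothetical: a computation is first built as a graph of operation nodes, and executing it means walking the graph's dependencies — which is what lets the same graph be scheduled on different devices.

```python
# Toy sketch of a dataflow graph (hypothetical, not TensorFlow's API):
# each node stores an operation and its input edges; evaluation walks
# the dependency structure before applying the node's own operation.

class Node:
    def __init__(self, op, *inputs):
        self.op = op          # function computing this node's value
        self.inputs = inputs  # upstream nodes (the graph's edges)

    def eval(self):
        # Evaluate dependencies first, then apply this node's operation.
        return self.op(*(n.eval() for n in self.inputs))

def constant(value):
    return Node(lambda: value)

def add(a, b):
    return Node(lambda x, y: x + y, a, b)

def mul(a, b):
    return Node(lambda x, y: x * y, a, b)

# Build the graph for (2 + 3) * 4, then execute it.
graph = mul(add(constant(2.0), constant(3.0)), constant(4.0))
result = graph.eval()
print(result)
```

The key property, separating graph construction from graph execution, is what the abstract refers to when it says the same computation can run unchanged on phones or on clusters of GPU machines.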

Topics: Interface (computing) (57%), Deep learning (55%), Information extraction (54%)

9,253 Citations

Open access · Book
12 Jun 2009
Abstract: This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation. With it, you'll learn how to write Python programs that work with large collections of unstructured text. You'll access richly annotated datasets using a comprehensive range of linguistic data structures, and you'll understand the main algorithms for analyzing the content and structure of written communication. Packed with examples and exercises, Natural Language Processing with Python will help you: extract information from unstructured text, either to guess the topic or identify "named entities"; analyze linguistic structure in text, including parsing and semantic analysis; access popular linguistic databases, including WordNet and treebanks; and integrate techniques drawn from fields as diverse as linguistics and artificial intelligence. This book will help you gain practical skills in natural language processing using the Python programming language and the Natural Language Toolkit (NLTK) open source library. If you're interested in developing web applications, analyzing multilingual news sources, or documenting endangered languages -- or if you're simply curious to have a programmer's perspective on how human language works -- you'll find Natural Language Processing with Python both fascinating and immensely useful.
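The "named entity" guessing the book teaches can be hinted at with a rule-based toy. This is plain Python, not NLTK, and the capitalization heuristic below is purely an assumption for illustration: treat runs of capitalized words (that are not sentence-initial) as candidate entities.

```python
import re

# Crude, rule-based entity guessing (illustrative only, not NLTK):
# match two or more consecutive capitalized words, or a single
# capitalized word preceded by a lowercase word (so sentence-initial
# "The" is excluded).

def candidate_entities(text):
    pattern = r"(?:[A-Z][a-z]+\s)+[A-Z][a-z]+|(?<=[a-z,]\s)[A-Z][a-z]+"
    return [m.group().strip() for m in re.finditer(pattern, text)]

sentence = "The system was built at Carnegie Mellon University by Tom Mitchell."
print(candidate_entities(sentence))
```

Real NLP toolkits replace such brittle regexes with tokenization, part-of-speech tagging, and trained chunkers — exactly the pipeline the book walks through.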

Topics: Language identification (66%), Natural language programming (65%), Language technology (63%)

3,135 Citations

Open access · Proceedings Article · DOI: 10.3115/1219840.1219885
25 Jun 2005
Abstract: Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic models. By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference. We use this technique to augment an existing CRF-based information extraction system with long-distance dependency models, enforcing label consistency and extraction template consistency constraints. This technique results in an error reduction of up to 9% over state-of-the-art systems on two established information extraction tasks.
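The core idea — Gibbs sampling making a non-local factor tractable — can be sketched on a toy factored model. This is a hypothetical simplification, not the paper's CRF system: each position has a local score for label 1, and a global "label consistency" factor rewards repeated tokens for receiving the same label. Gibbs sampling resamples one position at a time from its conditional distribution, which stays cheap to compute even though the consistency factor couples distant positions.

```python
import math
import random

# Toy Gibbs sampler over binary labels with a non-local consistency
# factor (illustrative model, not the paper's CRF).

def gibbs_sample(tokens, local_score, consistency=2.0, iters=200, seed=0):
    rng = random.Random(seed)
    labels = [0] * len(tokens)

    def conditional_p1(i):
        # Unnormalized log-score of assigning label l at position i,
        # conditioned on all other current labels.
        def score(l):
            s = local_score[i] if l == 1 else 0.0
            # Non-local factor: agreement with other occurrences of
            # the same token anywhere in the sequence.
            for j, t in enumerate(tokens):
                if j != i and t == tokens[i] and labels[j] == l:
                    s += consistency
            return s
        a, b = score(1), score(0)
        return math.exp(a) / (math.exp(a) + math.exp(b))

    for _ in range(iters):
        for i in range(len(tokens)):
            labels[i] = 1 if rng.random() < conditional_p1(i) else 0
    return labels

# "Paris" has strong local evidence in one position and weak evidence
# in the other; the consistency factor pulls both toward the same label.
tokens = ["in", "Paris", "and", "later", "Paris", "again"]
local = [-3.0, 4.0, -3.0, -3.0, 0.5, -3.0]
print(gibbs_sample(tokens, local))
```

Cooling the sampling temperature over iterations would turn this into the simulated-annealing decoder the abstract proposes in place of Viterbi.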

Topics: Approximate inference (61%), Information extraction (58%), Gibbs sampling (57%)

3,090 Citations

Open access · Proceedings Article
Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, +2 more authors (2 institutions)
11 Jul 2010
Abstract: We consider here the problem of building a never-ending language learner; that is, an intelligent computer agent that runs forever and that each day must (1) extract, or read, information from the web to populate a growing structured knowledge base, and (2) learn to perform this task better than on the previous day. In particular, we propose an approach and a set of design principles for such an agent, describe a partial implementation of such a system that has already learned to extract a knowledge base containing over 242,000 beliefs with an estimated precision of 74% after running for 67 days, and discuss lessons learned from this preliminary attempt to build a never-ending learning agent.
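The extract-then-learn loop the abstract describes can be sketched as a minimal bootstrapping cycle. This is a hypothetical simplification, not the system's implementation: starting from seed instances of a category, each "day" alternates between learning textual patterns from known instances and applying those patterns to extract new instances.

```python
# Minimal bootstrapped-extraction loop (illustrative only): patterns here
# are just the single word preceding a known instance, and new instances
# are words that follow a learned pattern.

def extract_loop(corpus, seeds, days=2):
    known = set(seeds)
    for _ in range(days):
        # 1) Learn patterns: contexts in which known instances appear.
        patterns = set()
        for sentence in corpus:
            words = sentence.split()
            for i, w in enumerate(words):
                if w in known and i > 0:
                    patterns.add(words[i - 1])
        # 2) Apply patterns: words following a learned pattern become
        #    new candidate instances in the knowledge base.
        for sentence in corpus:
            words = sentence.split()
            for i, w in enumerate(words):
                if w in patterns and i + 1 < len(words):
                    known.add(words[i + 1])
    return known

corpus = [
    "mayor of Paris spoke",
    "mayor of Berlin resigned",
    "suburb of Tokyo grew",
]
print(extract_loop(corpus, seeds={"Paris"}))
```

Note how the overly general pattern "of" also pulls in "Tokyo" from an unrelated context — a small example of the semantic drift that motivates the coupling constraints and design principles the paper proposes.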

Topics: Knowledge base (55%), Software agent (52%), Information extraction (50%)

1,755 Citations

Proceedings Article · DOI: 10.1145/1401890.1402008
Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, +2 more authors (2 institutions)
24 Aug 2008
Abstract: This paper addresses several key issues in the ArnetMiner system, which aims at extracting and mining academic social networks. Specifically, the system focuses on: 1) Extracting researcher profiles automatically from the Web; 2) Integrating the publication data into the network from existing digital libraries; 3) Modeling the entire academic network; and 4) Providing search services for the academic network. So far, 448,470 researcher profiles have been extracted using a unified tagging approach. We integrate publications from online Web databases and propose a probabilistic framework to deal with the name ambiguity problem. Furthermore, we propose a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues. Search services such as expertise search and people association search have been provided based on the modeling results. In this paper, we describe the architecture and main features of the system. We also present the empirical evaluation of the proposed methods.
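The name ambiguity problem the abstract mentions can be illustrated with a simple coauthor-overlap baseline. This is for exposition only, not the probabilistic framework the paper proposes: publications bearing the same ambiguous author name are merged into one "person" cluster whenever they share a coauthor.

```python
# Toy name disambiguation by coauthor overlap (a common simple baseline,
# shown here for illustration): each paper is a set of coauthor names,
# and papers are merged transitively when their coauthor sets intersect.

def disambiguate(pubs):
    clusters = []
    for coauthors in pubs:
        # Find every existing cluster sharing a coauthor with this paper.
        merged = [c for c in clusters if c & coauthors]
        for c in merged:
            clusters.remove(c)
        # Fuse the paper and all overlapping clusters into one person.
        clusters.append(set().union(coauthors, *merged))
    return clusters

papers = [
    {"A. Smith", "B. Jones"},  # person 1
    {"B. Jones", "C. Wu"},     # shares B. Jones -> merged with person 1
    {"D. Lee"},                # no overlap -> a second person
]
print(disambiguate(papers))
```

A probabilistic framework like the paper's goes further, weighing evidence such as venues, topics, and citation links rather than relying on exact coauthor matches.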

Topics: Information extraction (52%), Digital library (50%), Topic model (50%)

1,672 Citations

[Chart: number of papers in the topic in previous years]

Top Attributes


Topic's top 5 most impactful authors

Ralph Grishman: 61 papers, 3.9K citations
Fabio Ciravegna: 35 papers, 1.9K citations
Horacio Saggion: 24 papers, 381 citations
Heng Ji: 24 papers, 703 citations
Andrew McCallum: 23 papers, 2.5K citations

Network Information
Related Topics (5)
Query expansion: 17.5K papers, 452.7K citations (90% related)
Knowledge extraction: 20.2K papers, 413.4K citations (90% related)
(topic name missing): 26.6K papers, 393.3K citations (89% related)
Supervised learning: 20.8K papers, 710.5K citations (89% related)
Question answering: 14K papers, 375.4K citations (89% related)