
Shallow parsing

About: Shallow parsing is a research topic. Over its lifetime, 397 publications have been published within this topic, receiving 10,211 citations.


Papers
Book Chapter
03 Sep 2012
TL;DR: Three Machine Learning techniques are tested on the 1-million token manually annotated subcorpus of the National Corpus of Polish: Decision Tree induction, Memory-Based Learning and Conditional Random Fields.
Abstract: The published experiments with shallow parsing for Slavic languages are characterised by the small size of the corpora used. With the publication of the National Corpus of Polish (NCP), a new opportunity was opened: to test several chunking algorithms on the 1-million-token manually annotated subcorpus of the NCP. We test three Machine Learning techniques: Decision Tree induction, Memory-Based Learning and Conditional Random Fields. We also investigate the influence of tagging errors on the overall chunker performance, which turns out to be quite substantial.

14 citations
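As a rough illustration of the simplest of the three techniques compared above, the sketch below trains a decision-tree chunker on per-token features over POS-tagged text. The toy sentences, tag names and feature set are invented for illustration and are not the NCP annotation scheme or the authors' setup.

```python
# Minimal sketch: decision-tree induction for BIO chunking over POS-tagged
# tokens. Toy data and features only; not the NCP tagset or the paper's setup.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Training data: sentences as (word, POS tag, chunk label) triples.
train = [
    [("Jan", "subst", "B-NP"), ("kupił", "fin", "O"),
     ("nowy", "adj", "B-NP"), ("samochód", "subst", "I-NP")],
    [("Ona", "ppron", "B-NP"), ("czyta", "fin", "O"),
     ("ciekawą", "adj", "B-NP"), ("książkę", "subst", "I-NP")],
]

def token_features(sent, i):
    """Per-token features: surface form, POS, and neighbouring POS tags."""
    word, pos, _ = sent[i]
    return {
        "word": word.lower(),
        "pos": pos,
        "prev_pos": sent[i - 1][1] if i > 0 else "BOS",
        "next_pos": sent[i + 1][1] if i < len(sent) - 1 else "EOS",
    }

X = [token_features(s, i) for s in train for i in range(len(s))]
y = [label for s in train for _, _, label in s]

vec = DictVectorizer(sparse=False)
clf = DecisionTreeClassifier(random_state=0).fit(vec.fit_transform(X), y)

# Chunk an unseen toy sentence (chunk labels unknown, hence None placeholders).
test = [("Piotr", "subst", None), ("pisze", "fin", None), ("list", "subst", None)]
feats = [token_features(test, i) for i in range(len(test))]
print(clf.predict(vec.transform(feats)))
```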

Proceedings Article
01 Jan 2001
TL;DR: This work introduces shapaqa, a shallow parsing approach to online, open-domain question answering on the World Wide Web, which uses a memory-based shallow parser to analyze web pages retrieved via normal keyword search on a search engine.
Abstract: We introduce shapaqa, a shallow parsing approach to online, open-domain question answering on the World Wide Web. Given a form-based natural language question as input, the system uses a memory-based shallow parser to analyze web pages retrieved using normal keyword search on a search engine. Two versions of the system are evaluated on a test set of 200 questions. In combination with two back-off methods, a mean reciprocal rank of .46 is achieved.

14 citations
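The abstract above reports a mean reciprocal rank (MRR) of .46. For reference, this is how MRR is computed over a question set; the ranks in the example below are invented and are not the shapaqa evaluation data.

```python
# The abstract reports MRR = .46; this is the metric's definition in code.
def mean_reciprocal_rank(ranks):
    """ranks[i] is the 1-based rank of the first correct answer for question i,
    or None if no correct answer was returned (counts as 0)."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)

# Hypothetical outcomes for five questions (not shapaqa's actual results).
print(mean_reciprocal_rank([1, 2, None, 1, 3]))  # 0.5666...
```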

01 Jan 2003
TL;DR: This article proposes to apply shallow parsing, implemented by means of cascades of finite-state transducers, to extract complex index terms based on an approximate grammar of Spanish, with the aim of improving the effectiveness of the extracted index terms.
Abstract: The extraction of the keywords that characterize each document in a given collection is one of the most important components of an Information Retrieval system. In this article, we propose to apply shallow parsing, implemented by means of cascades of finite-state transducers, to extract complex index terms based on an approximate grammar of Spanish. The effectiveness of the extracted index terms has been evaluated on the CLEF collection.

14 citations
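As a loose illustration of the idea in the abstract above, the sketch below runs a small cascade of finite-state (regular-expression) patterns over POS-tagged text to pull out multi-word index terms. The tagset, the pattern and the Spanish example are assumptions for illustration, not the approximate grammar used in the paper.

```python
# Rough sketch: cascaded regular-expression (finite-state) matching over a
# POS-tagged sentence to extract multi-word index terms. Tagset, pattern and
# example are invented; this is not the paper's approximate grammar of Spanish.
import re

# A POS-tagged sentence encoded as "word/TAG" tokens.
tagged = "el/DET sistema/N de/PREP recuperación/N de/PREP información/N es/V eficaz/ADJ"
tokens = [t.rsplit("/", 1) for t in tagged.split()]

# Stage 1: project the sentence onto its tag sequence.
tag_string = " ".join(tag for _, tag in tokens)

# Stage 2: a finite-state pattern for complex nominals, i.e. a noun followed
# by zero or more (preposition + noun) groups, such as "N PREP N PREP N".
pattern = re.compile(r"N(?: PREP N)*")

index_terms = []
for m in pattern.finditer(tag_string):
    start = tag_string[: m.start()].count(" ")   # token offset of the match
    length = m.group().count(" ") + 1            # number of matched tokens
    index_terms.append(" ".join(w for w, _ in tokens[start:start + length]))

print(index_terms)  # ['sistema de recuperación de información']
```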

Journal Article
Christoph Tillmann, Tong Zhang
TL;DR: A novel training method is presented for a localized phrase-based prediction model for statistical machine translation (SMT) that explicitly handles local phrase reordering, together with a novel stochastic gradient descent training algorithm that can easily handle millions of features.
Abstract: In this article, we present a novel training method for a localized phrase-based prediction model for statistical machine translation (SMT). The model predicts block neighbors to carry out a phrase-based translation that explicitly handles local phrase reordering. We use a maximum likelihood criterion to train a log-linear block bigram model which uses real-valued features (e.g., a language model score) as well as binary features based on the block identities themselves (e.g., block bigram features). The model training relies on an efficient enumeration of local block neighbors in parallel training data. A novel stochastic gradient descent (SGD) training algorithm is presented that can easily handle millions of features. Moreover, when viewing SMT as a block generation process, it becomes quite similar to sequential natural language annotation problems such as part-of-speech tagging, phrase chunking, or shallow parsing. Our novel approach is successfully tested on a standard Arabic-English translation task using two different phrase reordering models: a block orientation model and a phrase-distortion model.

14 citations
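The abstract above describes maximum-likelihood training of a log-linear model over millions of sparse features using stochastic gradient descent. The sketch below shows that kind of training loop in miniature for a binary log-linear classifier; the feature names, data and hyperparameters are illustrative assumptions, not the paper's block bigram model.

```python
# Sketch of SGD training for a log-linear (logistic) model over sparse
# features, mixing binary indicator features with a real-valued score.
# Data, feature names and hyperparameters are invented for illustration.
import math
import random
from collections import defaultdict

# Each example: (sparse feature dict, label in {0, 1}).
data = [
    ({"block_bigram=der|the": 1.0, "lm_score": 0.7}, 1),
    ({"block_bigram=der|a": 1.0, "lm_score": 0.2}, 0),
    ({"block_bigram=haus|house": 1.0, "lm_score": 0.9}, 1),
    ({"block_bigram=haus|mouse": 1.0, "lm_score": 0.1}, 0),
]

weights = defaultdict(float)   # sparse weight vector: only active features get entries
lr = 0.1                       # learning rate

def prob(feats):
    """P(label = 1 | feats) under the current log-linear model."""
    score = sum(weights[f] * v for f, v in feats.items())
    return 1.0 / (1.0 + math.exp(-score))

random.seed(0)
for epoch in range(50):
    random.shuffle(data)
    for feats, label in data:
        error = label - prob(feats)          # gradient of the log-likelihood
        for f, v in feats.items():
            weights[f] += lr * error * v     # update only the active features

for feats, label in data:
    print(label, round(prob(feats), 3))
```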

Journal Article
TL;DR: This paper proposes to integrate shallow parsing features and heuristic position information into the modeling of the training process without introducing a domain lexicon. It shows that after adding the proposed features, nearly all measures of both the conditional random fields model and the contrast model are improved, and that the results of the conditional random fields are more efficient than those of the contrast model.

13 citations
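As a minimal sketch of the kind of feature set the summary above describes, the function below builds per-token feature dicts that combine a shallow-parsing chunk tag with heuristic position information, in the shape a CRF toolkit would typically consume. The sentence, chunk tags and feature names are assumptions for illustration.

```python
# Sketch of per-token features combining a shallow-parsing chunk tag with
# heuristic position information, in the dict-per-token shape many CRF
# toolkits consume. Sentence, tags and feature names are invented.
def crf_features(sent, i):
    """sent is a list of (word, POS tag, chunk tag) triples; i is a token index."""
    word, pos, chunk = sent[i]
    n = len(sent)
    return {
        "word": word.lower(),
        "pos": pos,
        "chunk": chunk,                           # shallow-parsing feature
        "prev_chunk": sent[i - 1][2] if i > 0 else "BOS",
        "rel_pos": round(i / max(n - 1, 1), 2),   # heuristic position: relative offset
        "is_first": i == 0,
        "is_last": i == n - 1,
    }

sent = [("The", "DT", "B-NP"), ("gearbox", "NN", "I-NP"),
        ("failed", "VBD", "B-VP"), ("yesterday", "NN", "B-NP")]
print([crf_features(sent, i) for i in range(len(sent))])
```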


Network Information
Related Topics (5)
Machine translation
22.1K papers, 574.4K citations
81% related
Natural language
31.1K papers, 806.8K citations
79% related
Language model
17.5K papers, 545K citations
79% related
Parsing
21.5K papers, 545.4K citations
79% related
Query language
17.2K papers, 496.2K citations
74% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2021    7
2020    12
2019    6
2018    5
2017    11
2016    11