Journal ArticleDOI

MaltParser: A language-independent system for data-driven dependency parsing

TL;DR: Experimental evaluation confirms that MaltParser can achieve robust, efficient and accurate parsing for a wide range of languages without language-specific enhancements and with rather limited amounts of training data.
Abstract: Parsing unrestricted text is useful for many language technology applications but requires parsing methods that are both robust and efficient. MaltParser is a language-independent system for data-driven dependency parsing that can be used to induce a parser for a new language from a treebank sample in a simple yet flexible manner. Experimental evaluation confirms that MaltParser can achieve robust, efficient and accurate parsing for a wide range of languages without language-specific enhancements and with rather limited amounts of training data.
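
MaltParser realizes data-driven parsing as deterministic, classifier-guided transition-based parsing. The Python sketch below shows the general shape of such a parser, using the arc-standard transition system purely for brevity (MaltParser supports several transition systems and a configurable feature model); every name here, including the toy_predict stand-in for a trained classifier, is illustrative and not MaltParser's actual code.

```python
# Sketch of deterministic, classifier-guided transition-based parsing
# (the general strategy behind MaltParser). Arc-standard transitions are
# used here for brevity; `predict` stands in for a trained classifier.
SHIFT, LEFT_ARC, RIGHT_ARC = "SH", "LA", "RA"

def parse(words, predict):
    """Return a head index for each token (tokens are 1-based, 0 is the root)."""
    stack = [0]                                   # artificial root
    buffer = list(range(1, len(words) + 1))
    heads = {i: None for i in range(1, len(words) + 1)}

    while buffer or len(stack) > 1:
        action = predict(stack, buffer, heads, words)
        if action == SHIFT and buffer:
            stack.append(buffer.pop(0))
        elif action == LEFT_ARC and len(stack) > 2:
            dep = stack.pop(-2)                   # second-topmost becomes dependent
            heads[dep] = stack[-1]                # ...of the stack top
        elif action == RIGHT_ARC and len(stack) > 1:
            dep = stack.pop()                     # stack top becomes dependent
            heads[dep] = stack[-1]                # ...of the new stack top
        else:                                     # illegal prediction: safe fallback
            if buffer:
                stack.append(buffer.pop(0))
            else:
                heads[stack.pop()] = stack[-1]
    return heads

# A toy rule standing in for the learned classifier.
def toy_predict(stack, buffer, heads, words):
    return SHIFT if buffer else RIGHT_ARC

print(parse(["Economic", "news", "affected", "markets"], toy_predict))
# {1: 0, 2: 1, 3: 2, 4: 3} -- a right-branching chain under the toy rule
```

In MaltParser the prediction step is a history-based classifier, typically trained with support vector machines or memory-based learning on configurations extracted from a treebank.
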
Citations
Journal ArticleDOI
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Abstract: LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users to easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
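
As a concrete entry point, scikit-learn's SVC class is built on LIBSVM, so the issues the article discusses (kernel choice, parameter selection, probability estimates) surface directly in its interface. A minimal, hedged usage sketch; the dataset and parameter values are arbitrary choices, not recommendations from the paper:

```python
# Illustrative use of LIBSVM through scikit-learn's SVC wrapper.
# Dataset, kernel, and parameter values are arbitrary choices for the sketch.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# RBF kernel with C and gamma set by hand; in practice these are the
# "parameter selection" issue the LIBSVM paper discusses (e.g. grid search).
clf = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True)
clf.fit(X_train, y_train)

print("accuracy:", clf.score(X_test, y_test))
# probability=True enables the multiclass probability estimates that the
# paper obtains by coupling pairwise estimates.
print("class probabilities:", clf.predict_proba(X_test[:2]))
```
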

40,826 citations


Cites background from "MaltParser: A language-independent ..."

  • ...[table of domains and representative works] Computer vision: LIBPMK (Grauman and Darrell, 2005); Natural language processing: Maltparser (Nivre et al., 2007); Neuroimaging: PyMVPA (Hanke et al....


Proceedings Article
06 Jan 2007
TL;DR: This paper introduces Open Information Extraction (OIE), a new extraction paradigm in which the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input.
Abstract: Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (e.g., extract the location and time of seminars from a set of announcements). Shifting to a new domain requires the user to name the target relations and to manually create new extraction rules or hand-tag new training examples. This manual labor scales linearly with the number of target relations. This paper introduces Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input. The paper also introduces TEXTRUNNER, a fully implemented, highly scalable OIE system where the tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries. We report on experiments over a 9,000,000 Web page corpus that compare TEXTRUNNER with KNOWITALL, a state-of-the-art Web IE system. TEXTRUNNER achieves an error reduction of 33% on a comparable set of extractions. Furthermore, in the amount of time it takes KNOWITALL to perform extraction for a handful of pre-specified relations, TEXTRUNNER extracts a far broader set of facts reflecting orders of magnitude more relations, discovered on the fly. We report statistics on TEXTRUNNER's 11,000,000 highest probability tuples, and show that they contain over 1,000,000 concrete facts and over 6,500,000 more abstract assertions.
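
To make the notion of a "relational tuple" concrete, here is a deliberately naive sketch that emits (arg1, relation, arg2) triples with a single hand-written pattern. It only illustrates the shape of Open IE output; TEXTRUNNER itself uses a learned extractor and assigns probabilities to tuples, none of which is reproduced here.

```python
# Toy illustration of the (arg1, relation, arg2) tuples that open IE systems
# emit. A single hand-written regular expression stands in for TEXTRUNNER's
# learned extractor and probability model; this is only a shape sketch.
import re

PATTERN = re.compile(
    r"([A-Z]\w*(?:\s+[A-Z]\w*)*)"    # arg1: capitalized phrase
    r"\s+((?:[a-z]+\s+)*[a-z]+)\s+"  # relation: run of lowercase words
    r"([A-Z]\w*(?:\s+[A-Z]\w*)*)"    # arg2: capitalized phrase
)

def extract_tuples(sentence):
    """Return crude (arg1, relation, arg2) triples found in one sentence."""
    return [m.groups() for m in PATTERN.finditer(sentence)]

print(extract_tuples("Paris is the capital of France."))
# [('Paris', 'is the capital of', 'France')]
print(extract_tuples("Edison invented the Phonograph in Menlo Park."))
# [('Edison', 'invented the', 'Phonograph')]
```
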

1,574 citations

Proceedings Article
01 May 2012
TL;DR: The paper reports on new data sets and their features, additional annotation tools and models provided through the website, and essential interfaces and on-line services included in the OPUS project.
Abstract: This paper presents the current status of OPUS, a growing language resource of parallel corpora and related tools. The focus in OPUS is to provide freely available data sets in various formats together with basic annotation to be useful for applications in computational linguistics, translation studies and cross-linguistic corpus studies. In this paper, we report on new data sets and their features, additional annotation tools and models provided from the website, and essential interfaces and on-line services included in the project.
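
Many OPUS data sets can be downloaded as plain, line-aligned text files (the "Moses" format), which makes them easy to consume without special tooling. A minimal reading sketch follows; the file names are placeholders rather than real OPUS download paths.

```python
# Minimal sketch: reading a sentence-aligned parallel corpus in the plain
# "Moses" format (two line-aligned text files), one of the formats OPUS
# distributes. File names below are placeholders, not real OPUS paths.
from itertools import islice

def read_parallel(src_path, tgt_path, limit=None):
    """Yield (source, target) sentence pairs from two line-aligned files."""
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        for s, t in islice(zip(src, tgt), limit):
            yield s.rstrip("\n"), t.rstrip("\n")

if __name__ == "__main__":
    for src_sent, tgt_sent in read_parallel("corpus.de-en.de",
                                            "corpus.de-en.en", limit=3):
        print(src_sent, "|||", tgt_sent)
```
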

1,559 citations

Book
Nizar Habash
30 Aug 2010
TL;DR: The goal is to introduce Arabic linguistic phenomena and review the state-of-the-art in Arabic processing to provide system developers and researchers in natural language processing and computational linguistics with the necessary background information for working with the Arabic language.
Abstract: The Arabic language has recently become the focus of an increasing number of projects in natural language processing (NLP) and computational linguistics (CL). In this book, I try to provide NLP/CL system developers and researchers (computer scientists and linguists alike) with the necessary background information for working with Arabic. I discuss various Arabic linguistic phenomena and review the state-of-the-art in Arabic processing.

715 citations


Cites background or methods from "MaltParser: A language-independent ..."

  • ...There are several state-of-the-art parsers used for parsing Arabic: Bikel parser (phrase structure) [153, 119, 154], Malt parser (dependency) [146, 112, 110] and Stanford parser [155, 156], among others....


  • ...PADT was used in the CoNLL 2006 and CoNLL 2007 shared tasks on dependency parsing [146] and its morphological data has been used for training automatic taggers [114]....


Proceedings Article
01 Dec 2007
TL;DR: The paper defines the tasks of the different tracks, describes how the data sets were created from existing treebanks for ten languages, characterizes the different approaches of the participating systems, reports the test results, and provides a first analysis of these results.
Abstract: The Conference on Computational Natural Language Learning features a shared task, in which participants train and test their learning systems on the same data sets. In 2007, as in 2006, the shared task has been devoted to dependency parsing, this year with both a multilingual track and a domain adaptation track. In this paper, we define the tasks of the different tracks and describe how the data sets were created from existing treebanks for ten languages. In addition, we characterize the different approaches of the participating systems, report the test results, and provide a first analysis of these results.
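
Systems in the shared task are compared with labeled and unlabeled attachment scores (LAS/UAS) computed over CoNLL-format files. A rough scoring sketch, assuming the CoNLL-X column layout with HEAD in column 7 and DEPREL in column 8, and ignoring the punctuation conventions of the official evaluation script:

```python
# Rough sketch of labeled/unlabeled attachment score (LAS/UAS) over two
# CoNLL-format files (gold and system output). Assumes the CoNLL-X column
# order with HEAD in column 7 and DEPREL in column 8, and ignores the
# punctuation-handling details of the official evaluation script.
def read_tokens(path):
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:                     # skip sentence-separating blank lines
                cols = line.split("\t")
                yield cols[6], cols[7]   # (head, deprel)

def attachment_scores(gold_path, system_path):
    total = uas = las = 0
    for (g_head, g_rel), (s_head, s_rel) in zip(read_tokens(gold_path),
                                                read_tokens(system_path)):
        total += 1
        if g_head == s_head:
            uas += 1
            if g_rel == s_rel:
                las += 1
    return las / total, uas / total

# Usage (file names are placeholders):
# las, uas = attachment_scores("gold.conll", "system.conll")
# print(f"LAS = {las:.2%}, UAS = {uas:.2%}")
```
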

606 citations

References
Journal ArticleDOI
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Abstract: LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users to easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.

40,826 citations

Book
Vladimir Vapnik
01 Jan 1995
TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
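
Since the citation contexts below single out the maximum margin strategy, it may help to recall the standard soft-margin SVM formulation built on it (textbook notation, not a quotation from the book):

```latex
% Soft-margin SVM primal: maximize the margin 2/\|w\| while penalizing
% violations \xi_i; C trades margin width against training error.
\min_{w,\,b,\,\xi}\ \frac{1}{2}\,\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_{i}
\quad\text{subject to}\quad
y_{i}\,(w^{\top}x_{i} + b) \ge 1 - \xi_{i},\qquad \xi_{i} \ge 0,\quad i = 1,\dots,n.
```
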

40,147 citations


"MaltParser: A language-independent ..." refers methods in this paper

  • ...…be used to solve the learning problem posed by inductive dependency parsing, most of the work done in this area has been based on support vector machines (SVM) and memory-based learning (MBL).6 SVM is a hyperplane classifier that relies on the maximum margin strategy introduced by Vapnik (1995)....


  • ...SVM is a hyperplane classifier that relies on the maximum margin strategy introduced by Vapnik (1995) ....


01 Jan 1995

4,292 citations


"MaltParser: A language-independent ..." refers methods in this paper

  • ...…be used to solve the learning problem posed by inductive dependency parsing, most of the work done in this area has been based on support vector machines (SVM) and memory-based learning (MBL).6 SVM is a hyperplane classifier that relies on the maximum margin strategy introduced by Vapnik (1995)....


Journal ArticleDOI
TL;DR: Three statistical models for natural language parsing are described, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree.
Abstract: This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree. Independence assumptions then lead to parameters that encode the X-bar schema, subcategorization, ordering of complements, placement of adjuncts, bigram lexical dependencies, wh-movement, and preferences for close attachment. All of these preferences are expressed by probabilities conditioned on lexical heads. The models are evaluated on the Penn Wall Street Journal Treebank, showing that their accuracy is competitive with other models in the literature. To gain a better understanding of the models, we also give results on different constituent types, as well as a breakdown of precision/recall results in recovering various types of dependencies. We analyze various characteristics of the models through experiments on parsing accuracy, by collecting frequencies of various structures in the treebank, and through linguistically motivated examples. Finally, we compare the models to others that have been applied to parsing the treebank, aiming to give some explanation of the difference in performance of the various models.
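
The "sequence of decisions" view in this abstract is a history-based factorization; schematically, and in generic notation rather than Collins's exact parameterization:

```latex
% History-based model: the probability of a parse T for sentence S is the
% product of the probabilities of the derivation decisions d_1,\dots,d_m,
% each conditioned on (a feature function \Phi of) the preceding history.
P(T \mid S) \;=\; \prod_{i=1}^{m} P\bigl(d_{i} \,\bigm|\, \Phi(d_{1},\dots,d_{i-1}, S)\bigr)
```
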

1,956 citations


"MaltParser: A language-independent ..." refers methods in this paper

  • ...The data has been converted to dependency trees using the head percolation table of Yamada and Matsumoto (2003), and dependency type labels have been inferred using a variation of the scheme employed by Collins (1999), which makes use of the nonterminal labels on the head daughter, non-head daughter and parent corresponding to a given dependency relation....


  • ...This methodology is exemplified by the influential parsers of Collins (1997; 1999) and Charniak (2000), among others....


  • ...Thus, several studies have reported a substantial increase in error rate when applying state-of-the-art statistical parsers developed for English to other languages, such as Czech (Collins et al. 1999), Chinese (Bikel and Chiang 2000; Levy and Manning 2003), German (Dubey and Keller 2003), and Italian (Corazza et al. 2004)....


  • ...History-based feature models for predicting the next parser action (Black et al. 1992; Magerman 1995; Ratnaparkhi 1997; Collins 1999) 3....


  • ...In more recent developments, the history-based model has replaced the grammar completely, as in the parsers of Collins (1997; 1999) and Charniak (2000)....


Journal ArticleDOI
TL;DR: In this paper, the authors present two approaches for obtaining class probabilities, which can be reduced to linear systems and are easy to implement, and show conceptually and experimentally that the proposed approaches are more stable than the two existing popular methods: voting and the method by Hastie and Tibshirani (1998).
Abstract: Pairwise coupling is a popular multi-class classification method that combines all comparisons for each pair of classes. This paper presents two approaches for obtaining class probabilities. Both methods can be reduced to linear systems and are easy to implement. We show conceptually and experimentally that the proposed approaches are more stable than the two existing popular methods: voting and the method by Hastie and Tibshirani (1998).
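
For a sense of how pairwise estimates are combined, the numpy sketch below implements the equality-constrained least-squares formulation that, as I read the paper, its second method reduces to: minimize the sum over i != j of (r_ji * p_i - r_ij * p_j)^2 subject to sum_i p_i = 1. It drops the non-negativity constraint on p and is an illustration of the idea, not the authors' reference implementation.

```python
# Hedged sketch: combine pairwise estimates r[i, j] ~ P(y = i | y in {i, j})
# into one distribution p by solving the equality-constrained least-squares
# problem  min_p sum_{i != j} (r[j, i] * p[i] - r[i, j] * p[j])^2,  sum(p) = 1.
# My reading of the paper's second method; non-negativity of p is ignored.
import numpy as np

def couple(r):
    k = r.shape[0]
    Q = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i == j:
                Q[i, i] = sum(r[s, i] ** 2 for s in range(k) if s != i)
            else:
                Q[i, j] = -r[j, i] * r[i, j]
    # KKT system for the equality constraint sum(p) = 1.
    A = np.block([[Q, np.ones((k, 1))],
                  [np.ones((1, k)), np.zeros((1, 1))]])
    rhs = np.zeros(k + 1)
    rhs[-1] = 1.0
    return np.linalg.solve(A, rhs)[:k]

# Toy pairwise estimates for three classes (r[i, j] + r[j, i] = 1).
r = np.array([[0.0, 0.7, 0.6],
              [0.3, 0.0, 0.8],
              [0.4, 0.2, 0.0]])
print(couple(r))   # approximately [0.51, 0.30, 0.19]
```
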

1,888 citations