Author

Daniël de Kok

Other affiliations: University of Groningen
Bio: Daniël de Kok is an academic researcher from University of Tübingen. The author has contributed to research in topics: Parsing & Treebank. The author has an h-index of 8 and has co-authored 24 publications that have received 430 citations. Previous affiliations of Daniël de Kok include University of Groningen.

Papers
Book Chapter
01 Jan 2013
TL;DR: This chapter presents Lassy Small, a manually verified syntactically annotated corpus of written Dutch, and Lassy Large, a much larger automatically annotated corpus, together with browse and search tools for syntactically annotated corpora.
Abstract: This chapter presents the Lassy Small and Lassy Large treebanks, as well as related tools and applications. Lassy Small is a corpus of written Dutch texts (1,000,000 words) which has been syntactically annotated with manual verification and correction. Lassy Large is a much larger corpus (over 500,000,000 words) which has been syntactically annotated fully automatically. In addition, various browse and search tools for syntactically annotated corpora have been developed and made available. Their potential for applications in corpus linguistics and information extraction has been illustrated and evaluated in a series of case studies.

94 citations

Proceedings Article
19 Jun 2011
TL;DR: This work proposes reversible stochastic attribute-value grammars, in which a single statistical model is employed both for parse selection and fluency ranking.
Abstract: An attractive property of attribute-value grammars is their reversibility. Attribute-value grammars are usually coupled with separate statistical components for parse selection and fluency ranking. We propose reversible stochastic attribute-value grammars, in which a single statistical model is employed both for parse selection and fluency ranking.

58 citations

Proceedings Article
06 Aug 2009
TL;DR: This work extends the iterative method of Sagot and de la Clergerie (2006) to treat n-grams of an arbitrary length, and proposes a new evaluation metric which will enable us to compare different error miners.
Abstract: Error mining is a useful technique for identifying forms that cause incomplete parses of sentences. We extend the iterative method of Sagot and de la Clergerie (2006) to treat n-grams of an arbitrary length. An inherent problem of incorporating longer n-grams is data sparseness. Our new method takes sparseness into account, producing n-grams that are as long as necessary to identify problematic forms, but not longer. Not every cause for parsing errors can be captured effectively by looking at word n-grams. We report on an algorithm for building more general patterns for mining, consisting of words and part of speech tags. It is not easy to evaluate the various error mining techniques. We propose a new evaluation metric which will enable us to compare different error miners.

24 citations


Cited by
Journal Article

682 citations

Proceedings Article
04 Sep 2013
TL;DR: This paper discusses some implementation and data processing challenges encountered while developing a new multilingual version of DBpedia Spotlight that is faster, more accurate and easier to configure, and compares the solution to the previous system.
Abstract: There has recently been an increased interest in named entity recognition and disambiguation systems at major conferences such as WWW, SIGIR, ACL, KDD, etc. However, most work has focused on algorithms and evaluations, leaving little space for implementation details. In this paper, we discuss some implementation and data processing challenges we encountered while developing a new multilingual version of DBpedia Spotlight that is faster, more accurate and easier to configure. We compare our solution to the previous system, considering time performance, space requirements and accuracy in the context of the Dutch and English languages. Additionally, we report results for 9 additional languages among the largest Wikipedias. Finally, we present challenges and experiences to foment the discussion with other developers interested in recognition and disambiguation of entities in natural language text.

529 citations

Journal Article
01 Dec 2013
TL;DR: This work introduces the ACL Anthology Network (AAN), a comprehensive, manually curated networked database of citations, collaborations, and summaries in the field of Computational Linguistics, and presents statistics about the network, including the most cited authors and the most central collaborators.
Abstract: We introduce the ACL Anthology Network (AAN), a comprehensive manually curated networked database of citations, collaborations, and summaries in the field of Computational Linguistics. We also present a number of statistics about the network including the most cited authors, the most central collaborators, as well as network statistics about the paper citation, author citation, and author collaboration networks.

332 citations

Journal Article
TL;DR: This article presents a novel approach to parsing argumentation structures: argument components are identified using sequence labeling at the token level, and a new joint model is applied to detect argumentation structures.
Abstract: In this article, we present a novel approach for parsing argumentation structures. We identify argument components using sequence labeling at the token level and apply a new joint model for detecti...

301 citations