Stefan Evert
Researcher at University of Erlangen-Nuremberg
Publications - 101
Citations - 2871
Stefan Evert is an academic researcher at the University of Erlangen-Nuremberg. The author has contributed to research on topics including text corpora and semantic similarity, has an h-index of 26, and has co-authored 101 publications that have received 2700 citations. Previous affiliations of Stefan Evert include the University of Osnabrück and the University of Stuttgart.
Papers
Dissertation
The statistics of word cooccurrences : word pairs and collocations
TL;DR: This dissertation presents a comprehensive repository of association measures, organized into thematic groups and visualized as surfaces in a three-dimensional coordinate space, with the properties of each measure determined by the geometric shape of its surface.
Proceedings Article
Methods for the Qualitative Evaluation of Lexical Association Measures
Stefan Evert,Brigitte Krenn +1 more
TL;DR: This paper presents methods for a qualitative, unbiased comparison of lexical association measures, reports the results obtained, and shows how estimates for the very large number of hapax legomena and double occurrences can be inferred from random samples.
Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium
Stefan Evert,Andrew Hardie +1 more
TL;DR: Recent work to update CWB for the new century includes support for multiple character sets, most especially Unicode (in the form of UTF-8), allowing all the world’s writing systems to be utilised within a CWB-indexed corpus, and support for powerful Perl-style regular expressions in CQP queries.
Journal Article
Using small random samples for the manual evaluation of statistical association measures
Stefan Evert,Brigitte Krenn +1 more
TL;DR: It is shown how an evaluation strategy based on random samples can reduce the amount of manual annotation work significantly, making it possible to perform many more evaluation experiments under specific conditions.
Book
Corpus Linguistics with BNCweb - a Practical Guide
TL;DR: Key methodological issues in corpus linguistics, such as collocations, keywords and the categorization of concordance lines, are addressed step by step with BNCweb, a user-friendly web-based tool that supports sophisticated analyses of the 100-million-word British National Corpus.