Stefan Evert

Researcher at University of Erlangen-Nuremberg

Publications -  101
Citations -  2871

Stefan Evert is an academic researcher from University of Erlangen-Nuremberg. The author has contributed to research in topics: Text corpus & Semantic similarity. The author has an h-index of 26, co-authored 101 publications receiving 2700 citations. Previous affiliations of Stefan Evert include University of Osnabrück & University of Stuttgart.

Papers
Dissertation

The statistics of word cooccurrences : word pairs and collocations

Stefan Evert
TL;DR: In this article, the authors present a comprehensive repository of association measures, which are organized into thematic groups and visualized as surfaces in a three-dimensional coordinate space, with the properties of each measure determined by the geometric shapes of the respective surfaces.
Proceedings Article

Methods for the Qualitative Evaluation of Lexical Association Measures

TL;DR: This paper presents methods for a qualitative, unbiased comparison of lexical association measures and the results obtained, and shows how estimates for the very large number of hapax legomena and double occurrences can be inferred from random samples.

Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium

TL;DR: Recent work to update CWB for the new century includes support for multiple character sets, most especially Unicode (in the form of UTF-8), allowing all the world’s writing systems to be utilised within a CWB-indexed corpus, and support for powerful Perl-style regular expressions in CQP queries.
Journal Article

Using small random samples for the manual evaluation of statistical association measures

TL;DR: It is shown how an evaluation strategy based on random samples can reduce the amount of manual annotation work significantly, making it possible to perform many more evaluation experiments under specific conditions.
Book

Corpus Linguistics with BNCweb - a Practical Guide

TL;DR: Key methodological issues in corpus linguistics, such as collocations, keywords and the categorization of concordance lines, are addressed step by step with BNCweb, a user-friendly web-based tool that supports sophisticated analyses of the 100-million-word British National Corpus.