Author

Winfried Lötzsch

Bio: Winfried Lötzsch is an academic researcher from Hasso Plattner Institute. The author has contributed to research in topics: Reinforcement learning & Deep learning. The author has an h-index of 1 and has co-authored 5 publications receiving 45 citations. Previous affiliations of Winfried Lötzsch include Chemnitz University of Technology & University of Potsdam.

Papers
Book ChapterDOI
20 Mar 2016
TL;DR: This paper selects 15 of the most influential papers for author identification and recruits a group of students to reimplement them from scratch, laying the groundwork for integrating author identification with information retrieval to eventually scale the former to the web.
Abstract: In this paper, we revisit author identification research by conducting a new kind of large-scale reproducibility study: we select 15 of the most influential papers for author identification and recruit a group of students to reimplement them from scratch. Since no open source implementations have been released for the selected papers to date, our public release will have a significant impact on researchers entering the field. This way, we lay the groundwork for integrating author identification with information retrieval to eventually scale the former to the web. Furthermore, we assess the reproducibility of all reimplemented papers in detail, and conduct the first comparative evaluation of all approaches on three well-known corpora.

47 citations

Posted Content
TL;DR: A newly created combination of two commonly used reinforcement learning methods is tested to see whether it learns more effectively than a baseline, and different ways of preprocessing information are compared with the goal of reducing training time and eventually helping the algorithm to converge.
Abstract: Deep reinforcement learning enables algorithms to learn complex behavior, deal with continuous action spaces and find good strategies in environments with high-dimensional state spaces. With deep reinforcement learning being an active area of research and many concurrent inventions, we decided to focus on a relatively simple robotic task to evaluate a set of ideas that might help to solve recent reinforcement learning problems. We test whether a newly created combination of two commonly used reinforcement learning methods is able to learn more effectively than a baseline. We also compare different ideas to preprocess information before it is fed to the reinforcement learning algorithm. The goal of this strategy is to reduce training time and eventually help the algorithm to converge. The concluding evaluation proves the general applicability of the described concepts by testing them using a simulated environment. These concepts might be reused for future experiments.

3 citations

Posted Content
TL;DR: This paper establishes a hierarchy of learning power depending on whether $C$-indices are required (a) on all outputs; (b) only on outputs relevant for the class to be learned and (c) only in the limit as final, correct hypotheses.
Abstract: In language learning in the limit, the most common type of hypothesis is to give an enumerator for a language. This so-called $W$-index allows for naming arbitrary computably enumerable languages, with the drawback that even the membership problem is undecidable. In this paper we use a different system which allows for naming arbitrary decidable languages, namely programs for characteristic functions (called $C$-indices). These indices have the drawback that it is now not decidable whether a given hypothesis is even a legal $C$-index. In this first analysis of learning with $C$-indices, we give a structured account of the learning power of various restrictions employing $C$-indices, also when compared with $W$-indices. We establish a hierarchy of learning power depending on whether $C$-indices are required (a) on all outputs; (b) only on outputs relevant for the class to be learned and (c) only in the limit as final, correct hypotheses. Furthermore, all these settings are weaker than learning with $W$-indices (even when restricted to classes of computable languages). We analyze all these questions also in relation to the mode of data presentation. Finally, we also ask about the relation of semantic versus syntactic convergence and derive the map of pairwise relations for these two kinds of convergence coupled with various forms of data presentation.
Posted Content
TL;DR: Several maps (depictions of all pairwise relations) of various groups of learning criteria are provided, including a map for monotonicity restrictions and similar criteria and a map for restrictions on data presentation; the paper also considers, for various learning criteria, whether learners can be assumed consistent.
Abstract: We study learning of indexed families from positive data where a learner can freely choose a hypothesis space (with uniformly decidable membership) comprising at least the languages to be learned. This abstracts a very universal learning task which can be found in many areas, for example learning of (subsets of) regular languages or learning of natural languages. We are interested in various restrictions on learning, such as consistency, conservativeness or set-drivenness, exemplifying various natural learning restrictions. Building on previous results from the literature, we provide several maps (depictions of all pairwise relations) of various groups of learning criteria, including a map for monotonicity restrictions and similar criteria and a map for restrictions on data presentation. Furthermore, we consider, for various learning criteria, whether learners can be assumed consistent.
DOI
01 Jan 2017
TL;DR: This paper investigates to what extent a recent asynchronously parallel actor-critic approach, initially proposed to speed up discrete RL algorithms, could be used for the continuous control of robotic arms.
Abstract: Recent advances in deep reinforcement learning methods have attracted a lot of attention, because of their ability to use raw signals such as video streams as inputs, instead of pre-processed state variables. However, the most popular methods (value-based methods, e.g. deep Q-networks) focus on discrete action spaces (e.g. the left/right buttons), while realistic robotic applications usually require a continuous action space (for example the joint space). Policy gradient methods, such as stochastic policy gradient or deep deterministic policy gradient, propose to overcome this problem by allowing continuous action spaces. Despite their promises, they suffer from long training times as they need huge numbers of interactions to converge. In this paper, we investigate to what extent a recent asynchronously parallel actor-critic approach, initially proposed to speed up discrete RL algorithms, could be used for the continuous control of robotic arms. We demonstrate the capabilities of this end-to-end learning algorithm on a simulated 2 degrees-of-freedom robotic arm and discuss its applications to more realistic scenarios.

Cited by
Journal ArticleDOI
TL;DR: An extensive performance analysis is performed on a corpus of 1,000 authors to investigate authorship attribution, verification, and clustering using 14 algorithms from the literature.
Abstract: The analysis of authorial style, termed stylometry, assumes that style is quantifiably measurable for evaluation of distinctive qualities. Stylometry research has yielded several methods and tools over the past 200 years to handle a variety of challenging cases. This survey reviews several articles within five prominent subtasks: authorship attribution, authorship verification, authorship profiling, stylochronometry, and adversarial stylometry. Discussions on datasets, features, experimental techniques, and recent approaches are provided. Further, a current research challenge lies in the inability of authorship analysis techniques to scale to a large number of authors with few text samples. Here, we perform an extensive performance analysis on a corpus of 1,000 authors to investigate authorship attribution, verification, and clustering using 14 algorithms from the literature. Finally, several remaining research challenges are discussed, along with descriptions of various open-source and commercial software that may be useful for stylometry subtasks.

129 citations

Posted Content
TL;DR: This work applies CNNs to large-scale authorship attribution, which aims to determine an unknown text's author among many candidate authors, motivated by their ability to process character-level signals and to differentiate between a large number of classes.
Abstract: Convolutional neural networks (CNNs) have demonstrated superior capability for extracting information from raw signals in computer vision. Recently, character-level and multi-channel CNNs have exhibited excellent performance for sentence classification tasks. We apply CNNs to large-scale authorship attribution, which aims to determine an unknown text's author among many candidate authors, motivated by their ability to process character-level signals and to differentiate between a large number of classes, while making fast predictions in comparison to state-of-the-art approaches. We extensively evaluate CNN-based approaches that leverage word and character channels and compare them against state-of-the-art methods for a large range of author numbers, shedding new light on traditional approaches. We show that character-level CNNs outperform the state-of-the-art on four out of five datasets in different domains. Additionally, we present the first application of authorship attribution to reddit.

92 citations

Book ChapterDOI
11 Apr 2019
TL;DR: This paper combines ideas from “generalised differential privacy” and machine learning techniques for text processing to model privacy for text documents, defining a privacy mechanism that operates at the level of text documents represented as “bags-of-words”.
Abstract: We address the problem of how to “obfuscate” texts by removing stylistic clues which can identify authorship, whilst preserving (as much as possible) the content of the text. In this paper we combine ideas from “generalised differential privacy” and machine learning techniques for text processing to model privacy for text documents. We define a privacy mechanism that operates at the level of text documents represented as “bags-of-words”—these representations are typical in machine learning and contain sufficient information to carry out many kinds of classification tasks including topic identification and authorship attribution (of the original documents). We show that our mechanism satisfies privacy with respect to a metric for semantic similarity, thereby providing a balance between utility, defined by the semantic content of texts, with the obfuscation of stylistic clues. We demonstrate our implementation on a “fan fiction” dataset, confirming that it is indeed possible to disguise writing style effectively whilst preserving enough information and variation for accurate content classification tasks. We refer the reader to our complete paper [15] which contains full proofs and further experimentation details.

73 citations

01 Jan 2016
TL;DR: The impact of 3 obfuscators on the performance of a total of 44 authorship verification approaches has been measured and analyzed; the best-performing obfuscator successfully impacts the decision-making process of the authorship verifiers on average in about 47% of the cases.
Abstract: We report on the first large-scale evaluation of author obfuscation approaches built to attack authorship verification approaches: the impact of 3 obfuscators on the performance of a total of 44 authorship verification approaches has been measured and analyzed. The best-performing obfuscator successfully impacts the decision-making process of the authorship verifiers on average in about 47% of the cases, causing them to misjudge a given pair of documents as having been written by “different authors” when in fact they would have decided otherwise if one of them had not been automatically obfuscated. The evaluated obfuscators have been submitted to a shared task on author obfuscation that we organized at the PAN 2016 lab on digital text forensics. We contribute further by surveying the literature on author obfuscation, by collecting and organizing evaluation methodology for this domain, and by introducing performance measures tailored to measuring the impact of author obfuscation on authorship verification.

63 citations

Book ChapterDOI
09 Sep 2019
TL;DR: The fact that the PAN 2019 evaluation lab continues to invite the submission of software rather than its run output, using the TIRA experimentation platform, demarcates a good start into the second decade of PAN evaluation labs.
Abstract: We briefly report on the four shared tasks organized as part of the PAN 2019 evaluation lab on digital text forensics and authorship analysis. Each task is introduced, motivated, and the results obtained are presented. Altogether, the four tasks attracted 373 registrations, yielding 72 successful submissions. This, and the fact that we continue to invite the submission of software rather than its run output, using the TIRA experimentation platform, demarcates a good start into the second decade of PAN evaluation labs.

58 citations