Home
/
Authors
/
Gregor Wiedemann

Author

Gregor Wiedemann

Bio: Gregor Wiedemann is an academic researcher from University of Hamburg. The author has contributed to research in topics: Topic model & Information extraction. The author has an hindex of 13, co-authored 53 publications receiving 690 citations. Previous affiliations of Gregor Wiedemann include Leipzig University.

Papers published on a yearly basis

2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2011

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Applying LDA Topic Modeling in Communication Research: Toward a Valid and Reliable Methodology

[...]

Daniel Maier¹, Annie Waldherr², Peter Miltner¹, Gregor Wiedemann³, Andreas Niekler³, Alexa Keinert¹, Barbara Pfetsch¹, Gerhard Heyer³, Ueli Reber⁴, Thomas Häussler⁴, Hannah Schmid-Petri⁵, Silke Adam⁴ - Show less +8 more•Institutions (5)

Free University of Berlin¹, University of Münster², Leipzig University³, University of Bern⁴, University of Passau⁵

16 Feb 2018-Communication Methods and Measures

TL;DR: The overall goal is to make LDA topic modeling more accessible to communication researchers and to ensure compliance with disciplinary standards by developing a brief hands-on user guide for applying L DA topic modeling.

...read moreread less

Abstract: Latent Dirichlet allocation (LDA) topic models are increasingly being used in communication research. Yet, questions regarding reliability and validity of the approach have received little attentio...

...read moreread less

375 citations

Posted Content•

Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings

[...]

Gregor Wiedemann, Steffen Remus, Avi Chawla, Chris Biemann

23 Sep 2019-arXiv: Computation and Language

TL;DR: This paper proposed a simple but effective approach to word sense disambiguation using a nearest neighbor classification on contextualized word embeddings (CWEs) and compared the performance of different CWE models for the task and reported improvements above the current state of the art for two standard WSD benchmark datasets.

...read moreread less

Abstract: Contextualized word embeddings (CWE) such as provided by ELMo (Peters et al., 2018), Flair NLP (Akbik et al., 2018), or BERT (Devlin et al., 2019) are a major recent innovation in NLP. CWEs provide semantic vector representations of words depending on their respective context. Their advantage over static word embeddings has been shown for a number of tasks, such as text classification, sequence tagging, or machine translation. Since vectors of the same word type can vary depending on the respective context, they implicitly provide a model for word sense disambiguation (WSD). We introduce a simple but effective approach to WSD using a nearest neighbor classification on CWEs. We compare the performance of different CWE models for the task and can report improvements above the current state of the art for two standard WSD benchmark datasets. We further show that the pre-trained BERT model is able to place polysemic words into distinct 'sense' regions of the embedding space, while ELMo and Flair NLP do not seem to possess this ability.

...read moreread less

103 citations

Journal Article•DOI•

Opening up to Big Data: Computer-Assisted Analysis of Textual Data in Social Sciences

[...]

Gregor Wiedemann¹•Institutions (1)

Leipzig University¹

25 May 2013-Historical Social Research

TL;DR: To clarify methodological differences of various computer-assisted text analysis approaches the article suggests a typology from the perspective of a qualitative researcher, which shows compatibilities between manual qualitative data analysis methods and computational, rather quantitative approaches for large scale mixed method text analysis designs.

...read moreread less

Abstract: Two developments in computational text analysis may change the way qualitative data analysis in social sciences is performed: 1. the availability of digital text worth to investigate is growing rapidly, and 2. the improvement of algorithmic information extraction approaches, also called text mining, allows for further bridging the gap between qualitative and quantitative text analysis. The key factor hereby is the inclusion of context into computational linguistic models which extends conventional computational content analysis towards the extraction of meaning. To clarify methodological differences of various computer-assisted text analysis approaches the article suggests a typology from the perspective of a qualitative researcher. This typology shows compatibilities between manual qualitative data analysis methods and computational, rather quantitative approaches for large scale mixed method text analysis designs. URN: http://nbn-resolving.de/urn:nbn:de:0114-fqs1302231

...read moreread less

90 citations

Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings

[...]

Gregor Wiedemann, Steffen Remus, Avi Chawla, Chris Biemann

01 Sep 2019

TL;DR: A simple but effective approach to WSD using a nearest neighbor classification on CWEs and it is shown that the pre-trained BERT model is able to place polysemic words into distinct 'sense' regions of the embedding space, while ELMo and Flair NLP do not seem to possess this ability.

...read moreread less

55 citations

Book•DOI•

Text Mining for Qualitative Data Analysis in the Social Sciences

[...]

Gregor Wiedemann

01 Jan 2016

39 citations

1
2
3
4
…
5
6
7
8
9
10
11
12

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

A Primer in BERTology: What We Know About How BERT Works

[...]

Anna Rogers¹, Olga Kovaleva¹, Anna Rumshisky¹•Institutions (1)

University of Massachusetts Lowell¹

01 Jan 2020-Transactions of the Association for Computational Linguistics

TL;DR: A survey of over 150 studies of the BERT model can be found in this paper, where the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue and approaches to compression.

...read moreread less

Abstract: Transformer-based models have pushed state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue and approaches to compression. We then outline directions for future research.

...read moreread less

617 citations

Posted Content•

A Primer in BERTology: What we know about how BERT works

[...]

Anna Rogers¹, Olga Kovaleva², Anna Rumshisky²•Institutions (2)

University of Copenhagen¹, University of Massachusetts Lowell²

27 Feb 2020-arXiv: Computation and Language

TL;DR: This paper is the first survey of over 150 studies of the popular BERT model, reviewing the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression.

...read moreread less

616 citations

Proceedings Article•DOI•

SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval).

[...]

Marcos Zampieri¹, Shervin Malmasi², Preslav Nakov³, Sara Rosenthal⁴, Noura Farra⁴, Ritesh Kumar⁵ - Show less +2 more•Institutions (5)

University of Wolverhampton¹, Brigham and Women's Hospital², Qatar Computing Research Institute³, Columbia University⁴, Indian Institutes of Technology⁵

01 Jun 2019

TL;DR: The SemEval-2019 Task 6 on Identifying and categorizing Offensive Language in Social Media (OffensEval) as mentioned in this paper was based on a new dataset, the Offensive Language Identification Dataset (OLID), which contains over 14,000 English tweets, and featured three sub-tasks.

...read moreread less

Abstract: We present the results and the main findings of SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval). The task was based on a new dataset, the Offensive Language Identification Dataset (OLID), which contains over 14,000 English tweets, and it featured three sub-tasks. In sub-task A, systems were asked to discriminate between offensive and non-offensive posts. In sub-task B, systems had to identify the type of offensive content in the post. Finally, in sub-task C, systems had to detect the target of the offensive posts. OffensEval attracted a large number of participants and it was one of the most popular tasks in SemEval-2019. In total, nearly 800 teams signed up to participate in the task and 115 of them submitted results, which are presented and analyzed in this report.

...read moreread less

498 citations

Proceedings Article•DOI•

SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)

[...]

Marcos Zampieri¹, Preslav Nakov², Sara Rosenthal³, Pepa Atanasova⁴, Georgi Karadzhov⁵, Hamdy Mubarak², Leon Derczynski⁶, Zeses Pitenis⁷, Çağrı Çöltekin⁸ - Show less +5 more•Institutions (8)

Rochester Institute of Technology¹, Qatar Computing Research Institute², IBM³, University of Copenhagen⁴, Massachusetts Institute of Technology⁵, IT University of Copenhagen⁶, University of Wolverhampton⁷, University of Tübingen⁸

17 Jul 2020

TL;DR: The SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2020) as mentioned in this paper included three subtasks corresponding to the hierarchical taxonomy of the OLID schema, and was offered in five languages: Arabic, Danish, English, Greek, and Turkish.

...read moreread less

Abstract: We present the results and the main findings of SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval-2020). The task included three subtasks corresponding to the hierarchical taxonomy of the OLID schema from OffensEval-2019, and it was offered in five languages: Arabic, Danish, English, Greek, and Turkish. OffensEval-2020 was one of the most popular tasks at SemEval-2020, attracting a large number of participants across all subtasks and languages: a total of 528 teams signed up to participate in the task, 145 teams submitted official runs on the test data, and 70 teams submitted system description papers.

...read moreread less

249 citations

The ERP response to the amount of information conveyed by words in sentences (vol 140, pg 1, 2015)

[...]

Stefan L. Frank, Leun J. Otten, Giulia Galli, Gabriella Vigliocco

01 Nov 2015

TL;DR: The authors investigated whether event-related potentials (ERPs) too are predicted by information measures and found that different information measures quantify cognitively different processes and that readers do not make use of a sentence's hierarchical structure for generating expectations about the upcoming word.

...read moreread less

Abstract: Reading times on words in a sentence depend on the amount of information the words convey, which can be estimated by probabilistic language models. We investigate whether event-related potentials (ERPs), too, are predicted by information measures. Three types of language models estimated four different information measures on each word of a sample of English sentences. Six different ERP deflections were extracted from the EEG signal of participants reading the same sentences. A comparison between the information measures and ERPs revealed a reliable correlation between N400 amplitude and word surprisal. Language models that make no use of syntactic structure fitted the data better than did a phrase-structure grammar, which did not account for unique variance in N400 amplitude. These findings suggest that different information measures quantify cognitively different processes and that readers do not make use of a sentence's hierarchical structure for generating expectations about the upcoming word.

...read moreread less

207 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196

Collapse