Author

Malte Ostendorff

Bio: Malte Ostendorff is an academic researcher from the University of Konstanz. The author has contributed to research in topics: Document classification & Recommender systems. The author has an h-index of 6 and has co-authored 22 publications receiving 98 citations. Previous affiliations of Malte Ostendorff include the German Research Centre for Artificial Intelligence.

Papers
Posted Content
TL;DR: Building upon BERT, a deep neural language model, the authors demonstrate how to combine text representations with metadata and knowledge graph embeddings that encode author information.
Abstract: In this paper, we focus on the classification of books using short descriptive texts (cover blurbs) and additional metadata. Building upon BERT, a deep neural language model, we demonstrate how to combine text representations with metadata and knowledge graph embeddings, which encode author information. Compared to the standard BERT approach we achieve considerably better results for the classification task. For a more coarse-grained classification using eight labels we achieve an F1-score of 87.20, while a detailed classification using 343 labels yields an F1-score of 64.70. We make the source code and trained models of our experiments publicly available.

47 citations
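The fusion step described in the abstract above is straightforward to picture in code. Below is a minimal, illustrative PyTorch sketch of one way to combine a BERT text representation with metadata features and a pretrained author knowledge graph embedding for genre classification; it is not the authors' released implementation, and the model name and all dimensions are assumptions.

```python
# Illustrative sketch only -- not the paper's released code. The model name
# and feature dimensions are assumptions for demonstration purposes.
import torch
import torch.nn as nn
from transformers import BertModel

class BertWithMetadataAndKG(nn.Module):
    def __init__(self, num_labels, metadata_dim=10, kg_dim=200,
                 bert_name="bert-base-german-cased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # Concatenate the [CLS] text vector with metadata features and a
        # precomputed author knowledge graph embedding, then classify.
        self.classifier = nn.Sequential(
            nn.Dropout(0.1),
            nn.Linear(hidden + metadata_dim + kg_dim, num_labels),
        )

    def forward(self, input_ids, attention_mask, metadata, kg_embedding):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token representation
        features = torch.cat([cls, metadata, kg_embedding], dim=-1)
        return self.classifier(features)   # one logit per genre label
```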

Posted Content
TL;DR: The QURATOR project, funded by the German Federal Ministry of Education and Research, develops a sustainable and innovative technology platform that provides services to support knowledge workers in various industries to address the challenges they face when curating digital content.
Abstract: In all domains and sectors, the demand for intelligent systems to support the processing and generation of digital content is rapidly increasing. The availability of vast amounts of content and the pressure to publish new content quickly and in rapid succession requires faster, more efficient and smarter processing and generation methods. With a consortium of ten partners from research and industry and a broad range of expertise in AI, Machine Learning and Language Technologies, the QURATOR project, funded by the German Federal Ministry of Education and Research, develops a sustainable and innovative technology platform that provides services to support knowledge workers in various industries to address the challenges they face when curating digital content. The project's vision and ambition is to establish an ecosystem for content curation technologies that significantly pushes the current state of the art and transforms its region, the metropolitan area Berlin-Brandenburg, into a global centre of excellence for curation technologies.

22 citations

Journal ArticleDOI
17 Mar 2022 - Findings
TL;DR: This work proposes a novel approach to formulate, extract, encode and inject hierarchical structure information explicitly into an extractive summarization model based on a pre-trained, encoder-only Transformer language model (HiStruct+ model), which substantially improves SOTA ROUGE scores for extractive summarization on PubMed and arXiv.
Abstract: Transformer-based language models usually treat texts as linear sequences. However, most texts also have an inherent hierarchical structure, i.e., parts of a text can be identified using their position in this hierarchy. In addition, section titles usually indicate the common topic of their respective sentences. We propose a novel approach to formulate, extract, encode and inject hierarchical structure information explicitly into an extractive summarization model based on a pre-trained, encoder-only Transformer language model (HiStruct+ model), which improves SOTA ROUGEs for extractive summarization on PubMed and arXiv substantially. Using various experimental settings on three datasets (i.e., CNN/DailyMail, PubMed and arXiv), our HiStruct+ model outperforms a strong baseline collectively, which differs from our model only in that the hierarchical structure information is not injected. It is also observed that the more conspicuous the hierarchical structure of a dataset, the larger the improvements our method gains. The ablation study demonstrates that the hierarchical position information is the main contributor to our model's SOTA performance.

19 citations
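One way to read "inject hierarchical structure information explicitly" is as additional learnable position embeddings added to sentence representations before extraction scoring. The sketch below is a simplification of that idea, not the published HiStruct+ code; the hidden size and vocabulary limits are assumptions.

```python
# Simplified sketch of injecting hierarchical position information; the real
# HiStruct+ model is more elaborate. Sizes and limits are assumptions.
import torch.nn as nn

class HierarchicalPositionInjection(nn.Module):
    def __init__(self, hidden=768, max_sections=50, max_sents_per_section=200):
        super().__init__()
        # Learnable embeddings for the section index and for the sentence's
        # position within its section.
        self.section_emb = nn.Embedding(max_sections, hidden)
        self.sentence_emb = nn.Embedding(max_sents_per_section, hidden)
        self.scorer = nn.Linear(hidden, 1)  # per-sentence extraction logit

    def forward(self, sent_vecs, section_ids, sent_in_section_ids):
        # sent_vecs: (batch, n_sents, hidden) from an encoder-only Transformer
        h = (sent_vecs
             + self.section_emb(section_ids)
             + self.sentence_emb(sent_in_section_ids))
        return self.scorer(h).squeeze(-1)   # (batch, n_sents) sentence scores
```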

Proceedings ArticleDOI
01 Dec 2020
TL;DR: A qualitative analysis validates the quantitative results and indicates that aspect-based document similarity indeed leads to more fine-grained recommendations.
Abstract: Traditional document similarity measures provide a coarse-grained distinction between similar and dissimilar documents. Typically, they do not consider in what aspects two documents are similar. This limits the granularity of applications like recommender systems that rely on document similarity. In this paper, we extend similarity with aspect information by performing a pairwise document classification task. We evaluate our aspect-based document similarity approach for research papers. Paper citations indicate the aspect-based similarity, i.e., the title of a section in which a citation occurs acts as a label for the pair of citing and cited paper. We apply a series of Transformer models such as RoBERTa, ELECTRA, XLNet, and BERT variations and compare them to an LSTM baseline. We perform our experiments on two newly constructed datasets of 172,073 research paper pairs from the ACL Anthology and CORD-19 corpus. According to our results, SciBERT is the best performing system with F1-scores of up to 0.83. A qualitative analysis validates our quantitative results and indicates that aspect-based document similarity indeed leads to more fine-grained recommendations.

18 citations
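The pairwise classification setup described above maps naturally onto a standard sequence-pair encoding. Here is a hedged sketch using Hugging Face Transformers with the SciBERT checkpoint named in the abstract; the label set and input texts are placeholders, and the classification head here is randomly initialized rather than trained.

```python
# Sketch of aspect-based document similarity as pairwise classification.
# Label set and texts are placeholders; the classification head is untrained.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "allenai/scibert_scivocab_uncased"
labels = ["introduction", "related work", "method", "experiment", "none"]
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(
    name, num_labels=len(labels))

citing = "Title and abstract of the citing paper ..."
cited = "Title and abstract of the cited paper ..."

# Both papers are encoded as one sequence pair, BERT-style:
# [CLS] citing [SEP] cited [SEP]
enc = tok(citing, cited, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
print(labels[logits.argmax(-1).item()])  # predicted aspect label
```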


Cited by
Journal Article
TL;DR: In this article, van Dijk proposed a new, interdisciplinary theory of news in the press, which represents a very ambitious and somewhat speculative effort to weave together a broad range of existing news research approaches into a coherent, heuristic framework.
Abstract: VAN DIJK, TEUN A., News as Discourse. Hillsdale, N.J.: Lawrence Erlbaum, 1988. $29.95 cloth. This book attempts the development of a "new, interdisciplinary theory of news in the press" (p. vii). It represents a very ambitious and somewhat speculative effort to weave together a broad range of existing news research approaches into a coherent, heuristic framework. Van Dijk succeeds in providing a useful summary of the literature in news research. Especially valuable is his discussion of recent European research. However, the overall framework is still at an early stage of development. Its utility remains to be demonstrated by future research. For some time now, communication researchers have talked of passing paradigms and ferment in the field. With the decline of past paradigms, our discipline has been left with a hodgepodge of small-scale theories. This is particularly true in the area of news research where various narrative, discourse and information processing theories abound. Van Dijk's book may signal a new era, an era in which efforts will be made to integrate existing conceptual fragments into broader frameworks. His work may be seen as providing a model for others who seek to make sense of our current proliferation of theories. Van Dijk's approach is centered in the tradition of discourse analysis, which evolved out of an integration of literary analysis and linguistics. However, he has aggressively modified earlier forms of discourse analysis in an effort to incorporate insights into the structure and interpretation of discourse derived from cognitive psychology. He is not content to simply apply discourse analysis to the evaluation of news stories. He recognizes the utility of constructing an approach which also considers the production of news by media practitioners and the interpretation of news by audience members. It is these broader concerns which set van Dijk's approach apart from previous analyses of news content. A central concept in van Dijk's theory is the notion of story schemas, which are defined as implicit structures that underlie typical stories. These schemas permit the easy production of news and also facilitate its interpretation by news consumers. The schema concept is at once powerful and ambiguous. There is growing research evidence that demonstrates the utility of positing the existence of cognitive structures (schemas) in people's minds which are activated by content cues and guide interpretation of all forms of communication. The schema concept helps to explain why complex and seemingly ambiguous messages often are easily interpreted by audience members. It also can explain why the same message can be interpreted in highly discrepant ways. If messages contain conflicting cues that lead people to activate different schemas, or if people don't share a homogeneous set of schemas, then it is likely that many contrasting interpretations of story content will be developed. But despite growing consensus concerning the utility of schema as a concept, researchers remain quite divided over both its definition and the type of research that will lead to the most useful findings. …

581 citations

Journal ArticleDOI
TL;DR: Some of the chief advancements of these methods and their applications in rational materials design are reviewed, followed by a discussion of some of the main challenges and opportunities the authors currently face, together with a perspective on the future of rational materials design and discovery.
Abstract: Developing algorithmic approaches for the rational design and discovery of materials can enable us to systematically find novel materials, which can have huge technological and social impact. However, such rational design requires a holistic perspective over the full multistage design process, which involves exploring immense materials spaces, their properties, and process design and engineering as well as a techno-economic assessment. The complexity of exploring all of these options using conventional scientific approaches seems intractable. Instead, novel tools from the field of machine learning can potentially solve some of our challenges on the way to rational materials design. Here we review some of the chief advancements of these methods and their applications in rational materials design, followed by a discussion on some of the main challenges and opportunities we currently face together with our perspective on the future of rational materials design and discovery.

145 citations

Proceedings ArticleDOI
01 Dec 2020
TL;DR: This work presents the experiments which led to the creation of the BERT- and ELECTRA-based German language models, GBERT and GELECTRA, and shows that these models are the best German models to date.
Abstract: In this work we present the experiments which led to the creation of our BERT- and ELECTRA-based German language models, GBERT and GELECTRA. By varying the input training data, model size, and the presence of Whole Word Masking (WWM), we were able to attain SoTA performance across a set of document classification and named entity recognition (NER) tasks for both models of base and large size. We adopt an evaluation-driven approach in training these models and our results indicate that both adding more data and utilizing WWM improve model performance. By benchmarking against existing German models, we show that these models are the best German models to date. All trained models will be made publicly available to the research community.

130 citations
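Whole Word Masking, which the abstract credits with part of the gains, differs from plain token-level masking in that all WordPiece pieces of a chosen word are masked together. A minimal, self-contained sketch of the idea (not deepset's training code; the masking rate and tokens are illustrative):

```python
# Illustrative Whole Word Masking, not the paper's training pipeline.
import random

def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Group WordPiece tokens (continuations marked with '##') into whole
    words, then mask every piece of a randomly chosen word jointly."""
    words, cur = [], []
    for i, t in enumerate(tokens):
        if t.startswith("##") and cur:
            cur.append(i)           # continuation piece of the current word
        else:
            if cur:
                words.append(cur)
            cur = [i]               # start of a new word
    if cur:
        words.append(cur)
    out = list(tokens)
    for word in words:
        if random.random() < mask_prob:
            for i in word:          # mask all pieces of the chosen word
                out[i] = mask_token
    return out

# With mask_prob=1.0, every word (including multi-piece words) is masked.
print(whole_word_mask(["Sprach", "##modell", "für", "Deutsch"], mask_prob=1.0))
```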

Proceedings ArticleDOI
01 Jul 2020
TL;DR: This work proposes a novel topic-informed BERT-based architecture for pairwise semantic similarity detection and shows that the model improves performance over strong neural baselines across a variety of English language datasets.
Abstract: Semantic similarity detection is a fundamental task in natural language understanding. Adding topic information has been useful for previous feature-engineered semantic similarity models as well as neural models for other tasks. There is currently no standard way of combining topics with pretrained contextual representations such as BERT. We propose a novel topic-informed BERT-based architecture for pairwise semantic similarity detection and show that our model improves performance over strong neural baselines across a variety of English language datasets. We find that the addition of topics to BERT helps particularly with resolving domain-specific cases.

107 citations
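One plausible reading of "adding topics to BERT" is concatenating a document-topic distribution (e.g., inferred by an LDA model) with the sequence pair's [CLS] vector before classification. The paper investigates specific fusion architectures; the sketch below shows only this simplest variant, with the topic count and model name assumed.

```python
# Simplest fusion variant, for illustration only; the paper's architectures
# differ. num_topics and the model name are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel

class TopicInformedPairClassifier(nn.Module):
    def __init__(self, num_topics=100, bert_name="bert-base-uncased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        self.classifier = nn.Linear(hidden + num_topics, 2)  # similar or not

    def forward(self, input_ids, attention_mask, topic_dist):
        # topic_dist: (batch, num_topics) document-topic probabilities,
        # e.g., from an LDA model trained on the same corpus.
        cls = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state[:, 0]
        return self.classifier(torch.cat([cls, topic_dist], dim=-1))
```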

Posted Content
Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ B. Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri S. Chatterji, Annie Chen, Kathleen Creel, Jared Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah D. Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Ahmad Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf H. Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Yang Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
TL;DR: The authors provide a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications.
Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

76 citations