
Showing papers on "Shallow parsing" published in 2021


Journal ArticleDOI
TL;DR: This paper describes the linguistic work on developing annotation guidelines, the manual corpus annotation, the preparation of the neural models used for chunking - the first ones for the Polish language - and the evaluation of these models.

1 citation


Book ChapterDOI
20 Sep 2021
TL;DR: This article demonstrates an opposite approach: ontology-based entailment of words combined with simple shallow parsing rules, which increases the UAS metric from 0.82 for SpaCy to 0.834 for the authors' approach.
Abstract: The common approach to the analysis of natural texts assumes that semantic analysis follows the parsing stage. However, medical texts are known to be very complicated and written in a highly specific language, and traditional parsers show relatively poor performance on them. In this article, we demonstrate an opposite approach: ontology-based entailment of words combined with simple shallow parsing rules. This allows us to increase the UAS metric from 0.82 for SpaCy to 0.834 for our approach.
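
As a rough illustration of the evaluation metric quoted above, here is a minimal sketch of how UAS (unlabeled attachment score) is typically computed for dependency parses; the head-index representation is a common convention, not something taken from this paper.

    def unlabeled_attachment_score(gold_heads, predicted_heads):
        """UAS: fraction of tokens whose predicted head matches the gold head."""
        assert len(gold_heads) == len(predicted_heads)
        correct = sum(1 for g, p in zip(gold_heads, predicted_heads) if g == p)
        return correct / len(gold_heads)

    # Heads are token indices (0 = root), one entry per token.
    gold = [2, 0, 2, 5, 3]
    pred = [2, 0, 2, 5, 5]
    print(unlabeled_attachment_score(gold, pred))  # 0.8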

Proceedings ArticleDOI
12 Aug 2021
TL;DR: In this paper, the authors present an online API to access a number of Natural Language Processing services developed at KTH, including tokenization, part-of-speech tagging, shallow parsing, compound word analysis, word inflection, lemmatization, spelling error detection and correction, grammar checking, and more.
Abstract: We present an online API to access a number of Natural Language Processing services developed at KTH. The services work on Swedish text. They include tokenization, part-of-speech tagging, shallow parsing, compound word analysis, word inflection, lemmatization, spelling error detection and correction, grammar checking, and more. The services can be accessed in several ways, including a RESTful interface, direct socket communication, and premade Web forms. The services are open to anyone. The source code is also freely available, making it possible to set up another server or run the tools locally. We have also evaluated the performance of several of the services and compared them to other available systems. Both precision and recall for the Granska grammar checker are higher than for Microsoft Word and Google Docs. The evaluation also shows that recall improves greatly when all the grammar checking services in the API are combined, compared to any single method, and the API makes combining services easy.
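
As a hedged sketch of how such a RESTful service might be called from Python, consider the following; the endpoint URL and parameter names are illustrative assumptions, not the actual KTH API.

    import requests

    # Hypothetical endpoint and parameter names -- consult the real API docs.
    BASE_URL = "https://example.org/nlp-api"

    def tag_text(text):
        """Send Swedish text to a (hypothetical) part-of-speech tagging endpoint."""
        response = requests.post(f"{BASE_URL}/pos", data={"text": text})
        response.raise_for_status()
        return response.json()

    print(tag_text("Jag läser en bok."))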

Posted Content
TL;DR: This article showed that the linguistic observation on pauses can be used to improve accuracy in machine-learnt language understanding tasks and applied pause duration to enrich contextual embeddings to improve shallow parsing of entities.
Abstract: Entity tags in human-machine dialog are integral to natural language understanding (NLU) tasks in conversational assistants. However, current systems struggle to accurately parse spoken queries with the typical use of text input alone, and often fail to understand the user intent. Previous work in linguistics has identified a cross-language tendency for longer speech pauses surrounding nouns as compared to verbs. We demonstrate that this linguistic observation on pauses can be used to improve accuracy in machine-learnt language understanding tasks. Analysis of pauses in French and English utterances from a commercial voice assistant shows a statistically significant difference in pause duration at multi-token entity span boundaries compared to within entity spans. Additionally, in contrast to text-based NLU, we apply pause duration to enrich contextual embeddings to improve shallow parsing of entities. Results show that our proposed novel embeddings reduce the relative error rate by up to 8% consistently across three domains for French, without any added annotation or alignment costs to the parser.
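
One plausible realization of the pause-enriched embeddings the abstract describes is to append a normalized pause-duration feature to each contextual token embedding; the concatenation scheme and dimensions below are assumptions, not the paper's exact architecture.

    import numpy as np

    def enrich_with_pauses(token_embeddings, pause_durations):
        """Append a log-scaled pause-duration feature to each token embedding.

        token_embeddings: (num_tokens, dim) array from a pretrained encoder.
        pause_durations: seconds of silence following each token (0.0 if none).
        """
        pauses = np.log1p(np.asarray(pause_durations, dtype=np.float32)).reshape(-1, 1)
        return np.concatenate([token_embeddings, pauses], axis=1)

    # Example: 3 tokens with 4-dim embeddings; a longer pause near a span boundary.
    emb = np.random.rand(3, 4).astype(np.float32)
    print(enrich_with_pauses(emb, [0.05, 0.42, 0.0]).shape)  # (3, 5)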

Posted Content
TL;DR: In this paper, the authors introduce a shallow parsing task for which training data is relatively cheap to create, with the aim of learning a lexicon for automated compliance checking, and train a sequence tagger that achieves a 79.93 F1-score on the test set.
Abstract: Automated Compliance Checking (ACC) systems aim to semantically parse building regulations to a set of rules. However, semantic parsing is known to be hard and requires large amounts of training data. The complexity of creating such training data has led to research that focuses on small sub-tasks, such as shallow parsing or the extraction of a limited subset of rules. This study introduces a shallow parsing task for which training data is relatively cheap to create, with the aim of learning a lexicon for ACC. We annotate a small domain-specific dataset of 200 sentences, SPaR.txt, and train a sequence tagger that achieves a 79.93 F1-score on the test set. We then show through manual evaluation that the model identifies most (89.84%) defined terms in a set of building regulation documents, and that both contiguous and discontiguous Multi-Word Expressions (MWE) are discovered with reasonable accuracy (70.3%).
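
To make the sequence tagging task concrete, here is a minimal sketch of decoding BIO tags into term spans, the kind of output such a tagger produces; the tag scheme and example sentence are illustrative assumptions, not the SPaR.txt annotation scheme itself.

    def bio_to_spans(tokens, tags):
        """Decode BIO tags into (start, end, text) spans for tagged terms."""
        spans, start = [], None
        for i, tag in enumerate(tags):
            if tag == "B":  # beginning of a new term
                if start is not None:
                    spans.append((start, i, " ".join(tokens[start:i])))
                start = i
            elif tag == "O":  # outside any term
                if start is not None:
                    spans.append((start, i, " ".join(tokens[start:i])))
                    start = None
            # "I" continues the current term
        if start is not None:
            spans.append((start, len(tokens), " ".join(tokens[start:])))
        return spans

    tokens = ["Fire", "doors", "must", "be", "self-closing"]
    tags = ["B", "I", "O", "O", "B"]
    print(bio_to_spans(tokens, tags))
    # [(0, 2, 'Fire doors'), (4, 5, 'self-closing')]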
