Topic

Shallow parsing

About: Shallow parsing is a research topic. Over its lifetime, 397 publications have been published within this topic, receiving 10,211 citations.


Papers
Proceedings Article
12 Aug 2021
TL;DR: In this paper, the authors present an online API to access a number of Natural Language Processing services developed at KTH, including tokenization, part-of-speech tagging, shallow parsing, compound word analysis, word inflection, lemmatization, spelling error detection and correction, grammar checking, and more.
Abstract: We present an online API to access a number of Natural Language Processing services developed at KTH. The services work on Swedish text. They include tokenization, part-of-speech tagging, shallow parsing, compound word analysis, word inflection, lemmatization, spelling error detection and correction, grammar checking, and more. The services can be accessed in several ways, including a RESTful interface, direct socket communication, and premade Web forms. The services are open to anyone. The source code is also freely available making it possible to set up another server or run the tools locally. We have also evaluated the performance of several of the services and compared them to other available systems. Both the precision and the recall for the Granska grammar checker are higher than for both Microsoft Word and Google Docs. The evaluation also shows that the recall is greatly improved when combining all the grammar checking services in the API, compared to any one method, and combining services is made easy by the API.
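The observation that combining grammar checkers raises recall follows from how recall is computed: a gold error counts as found if any service flags it. The Python sketch below illustrates this with made-up error positions for three hypothetical checkers (`checker_a`, `checker_b`, `checker_c`). It does not call the KTH API, whose endpoints are not described here, and its numbers are not results from the paper.

```python
# Illustrative sketch: why the union of several grammar checkers' flags
# can raise recall. All data is hypothetical, not from the KTH evaluation.

def precision_recall(flagged: set[int], gold: set[int]) -> tuple[float, float]:
    """Precision and recall of flagged error positions against gold errors."""
    true_pos = len(flagged & gold)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical gold error positions (e.g. token indices) in a test text.
gold = {3, 8, 15, 21, 30}

# Hypothetical outputs of three independent checkers.
checkers = {
    "checker_a": {3, 8, 12},
    "checker_b": {15, 21},
    "checker_c": {8, 30, 40},
}

for name, flags in checkers.items():
    p, r = precision_recall(flags, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
# checker_a: precision=0.67 recall=0.40
# checker_b: precision=1.00 recall=0.40
# checker_c: precision=0.67 recall=0.40

# Combining services: a position is flagged if any checker flags it.
combined = set().union(*checkers.values())
p, r = precision_recall(combined, gold)
print(f"combined: precision={p:.2f} recall={r:.2f}")
# combined: precision=0.71 recall=1.00
```

With this toy data each checker alone reaches a recall of 0.40, while the union reaches 1.00. Taking the union can also lower precision, because every false alarm from every checker is kept; here the union's precision (0.71) happens to stay above that of two of the three checkers.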
Book Chapter
20 Sep 2017
TL;DR: The paper gives a technical description of CoSyCo, a corpus of syntactic co-occurrences, which provides information on syntactically connected words in the Russian language.
Abstract: The paper gives a technical description of CoSyCo, a corpus of syntactic co-occurrences, which provides information on syntactically connected words in the Russian language. The paper includes an overview of the corpora collected to create CoSyCo and the number of combinations collected. We also provide a short evaluation of the gathered information.
Book Chapter
01 Jan 2020
TL;DR: It is found that, on all the corpora, training and tagging with the machine learning method take more time when more features and more fine-grained tagging schemes are used; nevertheless, the tagging time is less affected by them.
Abstract: Text chunking, also known as shallow parsing, is an important task in natural language processing and is very useful for other tasks. By means of discriminative machine learning methods and extensive experiments, this paper investigates the impacts of different tagging schemes and feature types on chunking efficiency and effectiveness on corpora with different chunk specifications and languages. We find that, on all the corpora, training and tagging with the machine learning method take more time when more features and more fine-grained tagging schemes are used. Nevertheless, the tagging time is less affected by them. Our investigation also reveals that the method with more features and more fine-grained tagging schemes performs better, but the chunk specification of the corpus may affect the choice.
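The abstract does not name the tagging schemes it compares. As an assumed illustration, the Python sketch below encodes the same chunk spans under IOB2 (tags B-, I-, O) and under the more fine-grained IOBES scheme (which adds E- for a chunk's last token and S- for a single-token chunk), then decodes either back to spans. The example sentence and the helper names `encode` and `decode` are hypothetical.

```python
# Sketch: encode the same chunk spans under two tagging schemes.
# IOB2 uses B-/I-/O; IOBES adds E- (chunk end) and S- (single-token chunk).

Chunk = tuple[int, int, str]  # (start, end_exclusive, chunk_type)

def encode(n_tokens: int, chunks: list[Chunk], scheme: str = "IOB2") -> list[str]:
    """Return one tag per token for non-overlapping chunk spans."""
    tags = ["O"] * n_tokens
    for start, end, label in chunks:
        if scheme == "IOBES" and end - start == 1:
            tags[start] = f"S-{label}"
            continue
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
        if scheme == "IOBES":
            tags[end - 1] = f"E-{label}"
    return tags

def decode(tags: list[str]) -> list[Chunk]:
    """Recover chunk spans from IOB2 or IOBES tags."""
    chunks: list[Chunk] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing chunk
        prefix, _, typ = tag.partition("-")
        # An open chunk ends before this token on O, B, S, or a type change.
        if start is not None and (prefix in ("O", "B", "S") or typ != label):
            chunks.append((start, i, label))
            start = None
        # B and S open a chunk; a stray I or E is treated as a chunk start.
        if prefix in ("B", "S") or (prefix in ("I", "E") and start is None):
            start, label = i, typ
        # E and S close the chunk on this token.
        if prefix in ("E", "S"):
            chunks.append((start, i + 1, label))
            start = None
    return chunks

tokens = ["He", "reckons", "the", "current", "account", "deficit"]
chunks = [(0, 1, "NP"), (1, 2, "VP"), (2, 6, "NP")]
print(encode(len(tokens), chunks, "IOB2"))
# ['B-NP', 'B-VP', 'B-NP', 'I-NP', 'I-NP', 'I-NP']
print(encode(len(tokens), chunks, "IOBES"))
# ['S-NP', 'S-VP', 'B-NP', 'I-NP', 'I-NP', 'E-NP']
```

For k chunk types, IOB2 has 2k + 1 distinct tags and IOBES has 4k + 1, so a finer scheme enlarges the label set a classifier must learn. This is one plausible contributor to the longer training times the paper reports, though the abstract does not state the cause.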
Book Chapter
01 Jan 2004
TL;DR: It is suggested that Language Technologies, and more specifically Information Extraction technologies, provide substantial help in Customer Opinion Monitoring compared with alternative approaches, including both the “traditional” methodology of employing human operators to read documents and formalize relevant opinions/facts to be stored, and data mining techniques based on the non-linguistic structure of the page.
Abstract: The paper addresses a crucial topic in current CRM processes, i.e. the constant monitoring of customer opinions. We use the label “Real Time Customer Opinion Monitoring” to denote the process of retrieving, analyzing and assessing opinions, judgments and criticisms about products and brands from newsgroups, message boards, consumer association sites and other public sources on the Internet. We suggest that Language Technologies, and more specifically Information Extraction technologies, provide substantial help in Customer Opinion Monitoring compared with alternative approaches, including the “traditional” methodology of employing human operators to read documents and formalize relevant opinions/facts to be stored, as well as data mining techniques based either on the non-linguistic structure of the page (web mining) or on statistical rather than linguistic analysis of the text (text mining in its standard meaning). In light of these considerations, a novel application (ArgoServer) is presented, in which different technologies cooperate with the core linguistic information extraction engine to constantly update a database of product- or brand-related customer opinions automatically gathered from newsgroups. The paper emphasizes how far the currently implemented shallow parsing techniques can go in understanding the contents of customers' and users' messages, thus extracting database records from relevant textual segments. It also stresses the limits inherently associated with the use of purely shallow techniques for language comprehension, and shows how a new, emerging linguistic technology to be developed in the context of the European project Deep Thought could in principle overcome such limits.
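ArgoServer's extraction rules are not given in the abstract. As a hypothetical illustration of what shallow parsing can extract, the Python sketch below chunks a POS-tagged message into noun phrases with a greedy rule (optional determiner, adjectives, then one or more nouns) and matches the flat pattern NP + verb + adjective to produce (target, opinion) records. The message, its Penn Treebank style tags, and the function names `chunk_nps` and `extract_opinions` are assumptions.

```python
# Hypothetical sketch of shallow-parsing-based opinion extraction.
# Not ArgoServer's actual method; the input is assumed to be POS-tagged already.

tagged = [("The", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"),
          ("terrible", "JJ"), (",", ","), ("but", "CC"), ("the", "DT"),
          ("screen", "NN"), ("looks", "VBZ"), ("great", "JJ")]

def chunk_nps(tagged: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Greedy NP chunker: optional DT, any JJ, then one or more NN* tokens.

    Returns (tag, text) items, where NP chunks use the tag "NP".
    """
    out, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:  # at least one noun: emit an NP chunk
            out.append(("NP", " ".join(w for w, _ in tagged[i:k])))
            i = k
        else:      # no noun: keep this single token unchunked
            out.append((tagged[i][1], tagged[i][0]))
            i += 1
    return out

def extract_opinions(chunks: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Match the flat pattern NP + verb + adjective and emit opinion records."""
    records = []
    for (t1, target), (t2, _), (t3, opinion) in zip(chunks, chunks[1:], chunks[2:]):
        if t1 == "NP" and t2.startswith("VB") and t3 == "JJ":
            records.append({"target": target, "opinion": opinion})
    return records

print(extract_opinions(chunk_nps(tagged)))
# [{'target': 'The battery life', 'opinion': 'terrible'}, {'target': 'the screen', 'opinion': 'great'}]
```

The same pattern illustrates the kind of limits the paper mentions: a message such as “the screen is not great” (VBZ RB JJ) produces no record at all, and a sentence with a pronoun subject such as “it is great” is also missed, because the greedy rule does not chunk pronouns and nothing resolves what they refer to.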
Proceedings Article
Qiang Zhou, Hang Yu
01 Oct 2008
TL;DR: A new relation tagging scheme is designed to represent different intra-chunk relations, and several feature-engineering experiments are carried out to select the best baseline statistical model; outside lexical knowledge is also applied to improve parsing performance.
Abstract: Multiword chunking is designed as a shallow parsing technique to recognize the external constituent and internal relation tags of a chunk in a sentence. In this paper, we propose a new solution to this problem. We design a new relation tagging scheme to represent different intra-chunk relations and carry out several feature-engineering experiments to select the best baseline statistical model. We also apply outside knowledge from a large-scale lexical relationship knowledge base to improve parsing performance. By integrating all the above techniques, we develop a new Chinese MWC parser. Experimental results show that its parsing performance can greatly exceed that of the rule-based parser trained and tested on the same data set.

Network Information
Related Topics (5)
Machine translation: 22.1K papers, 574.4K citations, 81% related
Natural language: 31.1K papers, 806.8K citations, 79% related
Language model: 17.5K papers, 545K citations, 79% related
Parsing: 21.5K papers, 545.4K citations, 79% related
Query language: 17.2K papers, 496.2K citations, 74% related
Performance
Metrics
Number of papers in the topic in previous years
Year   Papers
2021   7
2020   12
2019   6
2018   5
2017   11
2016   11