Topic

Noisy text analytics

About: Noisy text analytics is a research topic. To date, 700 publications have appeared within this topic, receiving 28,759 citations in total.


Papers
Book Chapter
René Schneider
21 Apr 1998
TL;DR: This paper introduces a statistical method for bootstrapping a lexicon from a very small number of "noisy," domain-specific texts. The method identifies regularities in grammatical forms as well as recurring ungrammatical forms in the input text.
Abstract: Optical character recognition (OCR) still causes considerable information loss and noise in texts, so many documents are unsuitable for information extraction systems. This paper introduces a statistical method for bootstrapping a lexicon from a very small number of "noisy," domain-specific texts. The method identifies regularities in grammatical forms as well as recurring ungrammatical forms in the input text. Through a combination of frequency lists and Levenshtein matrices, a language-independent, robust core lexicon is constructed that also supports the analysis of "noisy texts."
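The combination of frequency lists and Levenshtein matrices described in the abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the paper's actual procedure: the function names (`levenshtein`, `build_core_lexicon`) and the thresholds (`min_freq`, `max_dist`) are hypothetical, and the rule of mapping each rare form to the most frequent form within a small edit distance is a simplification chosen for illustration.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming matrix, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def build_core_lexicon(tokens, min_freq=2, max_dist=1):
    """Map every token to a core form (hypothetical rule, for illustration only).

    Tokens seen at least min_freq times are treated as core forms. Rarer
    tokens are mapped to the most frequent core form within max_dist edits,
    or kept unchanged when no core form is close enough.
    """
    freq = Counter(tokens)
    # most_common() sorts by descending frequency, so the first match wins.
    core = [w for w, c in freq.most_common() if c >= min_freq]
    mapping = {}
    for word, count in freq.items():
        if count >= min_freq:
            mapping[word] = word
            continue
        match = next((c for c in core if levenshtein(word, c) <= max_dist), None)
        mapping[word] = match if match is not None else word
    return mapping


# Example: OCR-style variants collapse onto their frequent neighbours.
tokens = ["invoice"] * 3 + ["inv0ice", "total", "total", "tota1", "xyz"]
lexicon = build_core_lexicon(tokens)
```

In this example, `inv0ice` and `tota1` are mapped to `invoice` and `total`, while `xyz` has no close frequent neighbour and is left as it is.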
Proceedings Article
11 Dec 2004
TL;DR: Telugu is one of the oldest and most popular languages of India, spoken by more than 66 million people, especially in South India, yet little work has been reported on optical character recognition systems for Telugu text.
Abstract: Telugu is one of the oldest and most popular languages of India, spoken by more than 66 million people, especially in South India. Little work has been reported on the development of optical character recognition systems for Telugu text, so it remains an area of current research. During recognition, a symbol is often recognized erroneously because the recognizer outputs a very similar-looking symbol instead. Several sets of symbols that are commonly confused with each other are identified and presented in a table called a confusion table. Special logic and algorithms based on simple structural features are developed to resolve these confusions and improve recognition accuracy considerably, without much additional computational effort.
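The idea of a confusion table resolved by simple structural features can be sketched roughly as below. Everything in this sketch is an assumption made for illustration: the symbol names (`sym_a`, `sym_b`, `sym_c`), the two features (loop count, presence of a top stroke), and the mismatch-counting rule are hypothetical placeholders, not the symbols, features, or algorithms used in the paper.

```python
# Hypothetical confusion table: each recognized symbol maps to the set of
# similar-looking symbols it is commonly confused with (itself included).
CONFUSION_TABLE = {
    "sym_a": {"sym_a", "sym_b"},
    "sym_b": {"sym_a", "sym_b"},
}

# Hypothetical structural features expected for each symbol:
# (number of closed loops, whether a top stroke is present).
EXPECTED_FEATURES = {
    "sym_a": (1, True),
    "sym_b": (1, False),
    "sym_c": (0, True),
}


def resolve(recognized, observed_features):
    """Re-check a recognizer output against its confusion set.

    Symbols without a confusion set are returned unchanged. Otherwise the
    candidate whose expected features differ least from the observed ones
    is chosen; ties are broken in favour of the recognizer's own output.
    """
    candidates = CONFUSION_TABLE.get(recognized)
    if not candidates:
        return recognized

    def mismatches(symbol):
        expected = EXPECTED_FEATURES[symbol]
        return sum(e != o for e, o in zip(expected, observed_features))

    return min(sorted(candidates),
               key=lambda s: (mismatches(s), s != recognized))


# The recognizer said sym_a, but no top stroke was observed in the glyph.
corrected = resolve("sym_a", (1, False))
```

Because the check only runs for symbols that appear in the confusion table and compares a handful of cheap features, the extra cost per symbol stays small, which matches the abstract's point about limited additional computational effort.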
Patent
Stephen G. Holmes
06 Dec 2004
TL;DR: A handheld text image processing device can include a camera, a graphics processing component, a text processing component, and a memory. It can capture, store, and manipulate information associated with the representation of textual information in a convenient and efficient way.
Abstract: Information associated with the representation of textual information can be captured, stored, and manipulated in a convenient and efficient automated manner that conserves resources. A handheld text image processing device can include a camera, a graphics processing component, a text processing component, and a memory. The camera captures digital picture information associated with text on an object. The graphics processing component performs graphics processing on the digital picture information that facilitates text recognition (e.g., transforms, rotations, etc.). The text processing component recognizes representations of the text in the digital picture information and converts the digital picture information associated with the text into a text file format. The memory stores the information in a text file format. The text file format can represent the textual information using fewer bits than the file format in which the text information is captured. The text information can also be communicated in a text file format.
Patent
17 Feb 2000
TL;DR: A translating device stores in a translation information database, alongside an original language text and its object language text, a user confirmation text written in the original language and edited by adding to or changing the expression on the basis of the object language text.
Abstract: A translating device (10) stores the following in a translation information database (18), in addition to an original language text and its object language text: a user confirmation original language text, written in the original language and edited by adding to or changing the expression on the basis of the object language text; a situation describing text that describes the situation in which the expression of the user confirmation original language text is used; a situation describing image and a situation describing speech relating to that situation; and translation information, including information concerning the original language text and the object language text, that is, the application limit applied to the candidate translation results for the original language text. The original language text is translated into an object language text by using the translation information.

Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 75% related
Server: 79.5K papers, 1.4M citations, 74% related
Cluster analysis: 146.5K papers, 2.9M citations, 74% related
Feature (computer vision): 128.2K papers, 1.7M citations, 73% related
Wireless sensor network: 142K papers, 2.4M citations, 73% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    6
2022    8
2020    1
2019    1
2018    4
2017    23