Topic
Noisy text analytics
About: Noisy text analytics is a research topic. To date, 700 publications on this topic have received 28,759 citations.
Papers
21 Apr 1998
TL;DR: This paper introduces a statistical method for bootstrapping a lexicon from a very small number of "noisy," domain-specific texts; the method identifies regular grammatical forms as well as recurring ungrammatical forms in the input text.
Abstract: Optical character recognition (OCR) still introduces considerable information loss and noise into texts, so many documents are unsuitable for information extraction systems. This paper introduces a statistical method for bootstrapping a lexicon from a very small number of "noisy," domain-specific texts. The method identifies regular grammatical forms as well as recurring ungrammatical forms in the input text. Through a combination of frequency lists and Levenshtein matrices, a language-independent, robust core lexicon is constructed that also supports the analysis of "noisy texts."
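The combination of frequency lists and Levenshtein matrices can be illustrated with a minimal sketch. This is not the paper's actual procedure: the function names, the `min_count` and `max_dist` thresholds, and the rule of mapping each rare token to its nearest frequent form are all assumptions made for illustration.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance, computed row by row over the Levenshtein matrix."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def build_core_lexicon(tokens, min_count=2, max_dist=1):
    """Hypothetical helper: frequent tokens form the core lexicon, and
    rare tokens within max_dist edits of a core entry are mapped to it."""
    freq = Counter(tokens)
    core = [w for w, c in freq.most_common() if c >= min_count]
    core_set = set(core)
    mapping = {}
    for word in freq:
        if word in core_set:
            continue
        # Prefer the closest core entry; break ties by higher frequency.
        candidates = [(levenshtein(word, c), -freq[c], c) for c in core]
        if candidates:
            dist, _, best = min(candidates)
            if dist <= max_dist:
                mapping[word] = best
    return core, mapping


tokens = "invoice invoice invoice inv0ice total total tota1 date".split()
core, mapping = build_core_lexicon(tokens)
print(core)     # frequent forms kept as lexicon entries
print(mapping)  # noisy variants mapped to their likely source form
```

In this toy input, the OCR-style variants `inv0ice` and `tota1` fall within one edit of a frequent form, while `date` is left unmapped because no core entry is close enough.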
11 Dec 2004
TL;DR: Telugu is one of the oldest and most popular languages of India, spoken by more than 66 million people, especially in South India, yet little work has been reported on the development of optical character recognition systems for Telugu text.
Abstract: Telugu is one of the oldest and most popular languages of India, spoken by more than 66 million people, especially in South India. Not much work has been reported on the development of optical character recognition systems for Telugu text, so it is an area of current research. During recognition, it is observed that in many cases a symbol is recognized erroneously because the recognizer outputs a very similar-looking symbol. Several such sets of commonly confused symbols are identified and presented in a table called a confusion table. Special logic and algorithms based on simple structural features are developed to resolve the confusion and improve recognition accuracy considerably without much additional computational effort.
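The idea of a confusion table can be sketched as follows. The abstract does not list the actual Telugu symbol sets or structural features, so this sketch uses Latin placeholder symbols and hypothetical feature names (`has_tail`, `has_bottom_bar`); it only shows the general shape of a table-driven post-correction step.

```python
from typing import Callable, Dict, FrozenSet

Features = Dict[str, bool]

# Hypothetical confusion table: each key groups symbols that the recognizer
# tends to mix up; each value is a rule that picks one member of the group
# from a simple structural feature of the glyph image.
CONFUSION_TABLE: Dict[FrozenSet[str], Callable[[Features], str]] = {
    frozenset({"O", "Q"}): lambda f: "Q" if f.get("has_tail", False) else "O",
    frozenset({"E", "F"}): lambda f: "E" if f.get("has_bottom_bar", False) else "F",
}


def resolve(symbol: str, features: Features) -> str:
    """Re-check a recognized symbol only if it belongs to a confusion set."""
    for group, rule in CONFUSION_TABLE.items():
        if symbol in group:
            return rule(features)
    return symbol  # symbols outside every confusion set pass through unchanged


print(resolve("O", {"has_tail": True}))
print(resolve("A", {}))
```

Because the extra feature check runs only for symbols that appear in a confusion set, the added cost stays small, which is consistent with the abstract's remark about limited additional computational effort.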
06 Dec 2004
TL;DR: A handheld text image processing device can include a camera, a graphics processing component, a text processing component and a memory, and can capture, store and manipulate information associated with the representation of textual information in a convenient and efficient way.
Abstract: Information associated with the representation of textual information can be captured, stored and manipulated in a convenient and efficient automated manner that conserves resources. A handheld text image processing device can include a camera, a graphics processing component, a text processing component and a memory. The camera captures digital picture information associated with text on an object. The graphics processing component performs graphics processing on the digital picture information that facilitates text recognition (e.g., transforms, rotations, etc.). The text processing component recognizes representations of the text in the digital picture information and converts the digital picture information associated with the text into a text file format. The memory stores the information in a text file format. The text file format can represent the textual information using fewer bits than the file format in which the text information is captured. The text information can also be communicated in a text file format.
17 Feb 2000
TL;DR: A translating device stores in a translation information database, in addition to an original language text and its object language text, a user confirmation text written in the original language and edited by adding to or changing the expression on the basis of the object language text.
Abstract: A translating device (10) stores in a translation information database (18), in addition to an original language text and its object language text, the following: a user confirmation original language text, written in the original language and edited by adding to or changing the expression on the basis of the object language text; a situation describing text that describes the situation in which the expression of the user confirmation original language text is used; a situation describing image and a situation describing speech relating to that situation; and translation information, including information concerning the original language text and the object language text, that is, the application limit applied to the candidate translation results for the original language text. The original language text is translated into an object language text by using the translation information.