Topic

Noisy text analytics

About: Noisy text analytics is a research topic. Over its lifetime, 700 publications have been published within this topic, receiving 28,759 citations.


Papers
Patent
Hsiao-Wuen Hon, Dong Li, Xuedong Huang, Yun-Chen Ju, Xianghui Sean Zhang
TL;DR: A computer-implemented system and method for proofreading text receives text from a user into a text editing module and converts at least a portion of it to an audio signal when an indicator is detected; the indicator defines a boundary in the text, either by being embodied in the text or by comprising delays in receiving text.
Abstract: A computer implemented system and method of proofreading text in a computer system includes receiving text from a user into a text editing module. At least a portion of the text is converted to an audio signal upon the detection of an indicator, the indicator defining a boundary in the text by either being embodied therein or comprising delays in receiving text. The audio signal is played through a speaker to the user to provide feedback.

224 citations
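
The boundary indicator described in the abstract above can be illustrated with a minimal sketch. It is not the patent's implementation: the punctuation set, the pause threshold, and the function name `chunks_to_speak` are all assumptions made for illustration, and the text-to-speech step itself is left out.

```python
from typing import List, Optional, Tuple

# Assumed values for illustration only; the abstract names neither.
BOUNDARY_CHARS = {".", "!", "?"}  # indicators embodied in the text
PAUSE_SECONDS = 1.5               # delay in receiving text that counts as a boundary


def chunks_to_speak(keystrokes: List[Tuple[str, float]]) -> List[str]:
    """Split (character, timestamp) keystrokes into chunks for a text-to-speech step."""
    chunks: List[str] = []
    current: List[str] = []
    last_time: Optional[float] = None
    for char, t in keystrokes:
        # A long pause before this keystroke closes the pending chunk.
        if current and last_time is not None and t - last_time >= PAUSE_SECONDS:
            chunks.append("".join(current).strip())
            current = []
        current.append(char)
        last_time = t
        # An embedded indicator closes the chunk immediately.
        if char in BOUNDARY_CHARS:
            chunks.append("".join(current).strip())
            current = []
    # Flush whatever is left when the input ends.
    if current:
        chunks.append("".join(current).strip())
    return [c for c in chunks if c]
```

Each returned chunk would then be handed to whatever speech synthesizer the system uses, so the user hears the text back at natural boundaries.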

Posted Content
TL;DR: This work proposes to localize text in a holistic manner, by casting scene text detection as a semantic segmentation problem, and demonstrates that the proposed algorithm substantially outperforms previous state-of-the-art approaches.
Abstract: Recently, scene text detection has become an active research topic in computer vision and document analysis, because of its great importance and significant challenge. However, the vast majority of existing methods detect text within local regions, typically through extracting character, word or line level candidates followed by candidate aggregation and false positive elimination, which potentially excludes the effect of wide-scope and long-range contextual cues in the scene. To take full advantage of the rich information available in the whole natural image, we propose to localize text in a holistic manner, by casting scene text detection as a semantic segmentation problem. The proposed algorithm directly runs on full images and produces global, pixel-wise prediction maps, in which detections are subsequently formed. To make better use of the properties of text, three types of information regarding text region, individual characters and their relationship are estimated with a single Fully Convolutional Network (FCN) model. With such predictions of text properties, the proposed algorithm can simultaneously handle horizontal, multi-oriented and curved text in real-world natural images. The experiments on standard benchmarks, including ICDAR 2013, ICDAR 2015 and MSRA-TD500, demonstrate that the proposed algorithm substantially outperforms previous state-of-the-art approaches. Moreover, we report the first baseline result on the recently released, large-scale dataset COCO-Text.

220 citations
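
The step of forming detections from a pixel-wise prediction map, mentioned in the abstract above, can be sketched as follows. The abstract does not say how the authors do this; the sketch assumes a simple threshold followed by 4-connected component grouping, and it returns axis-aligned boxes, which is a simplification that would not capture the curved or multi-oriented text the paper targets. The name `boxes_from_map` and the default threshold are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def boxes_from_map(prob: List[List[float]], threshold: float = 0.5) -> List[Box]:
    """Group above-threshold pixels into 4-connected regions and box each region."""
    rows = len(prob)
    cols = len(prob[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or prob[r][c] < threshold:
                continue
            # Breadth-first search over one connected text region.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, left = min(top, y), min(left, x)
                bottom, right = max(bottom, y), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and prob[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

In practice the prediction map would come from the segmentation network; here it is just a nested list of per-pixel text probabilities.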

Proceedings Article
01 Mar 1996
TL;DR: The interaction between text segments and text themes is used to characterize text structure, and to formulate specifications for information retrieval, text traversal, and text summarization.
Abstract: With the widespread use of full-text information retrieval, passage-retrieval techniques are becoming increasingly popular. Larger texts can then be replaced by important text excerpts, thereby simplifying the retrieval task and improving retrieval effectiveness. Passage-level evidence about the use of words in local contexts is also useful for resolving language ambiguities and improving retrieval output. Two main text decomposition strategies are introduced in this study, including a chronological decomposition into text segments, and semantic decomposition into text themes. The interaction between text segments and text themes is then used to characterize text structure, and to formulate specifications for information retrieval, text traversal, and text summarization.

213 citations

Patent
27 Jun 2003
TL;DR: Selected text or data and any associated markup language data are passed to an action dynamically linked library (DLL) to obtain actions associated with the markup language elements applied to the text or data; a recognizer DLL may also use the markup to recognize and label certain data types.
Abstract: Markup language data applied to text or data is leveraged for providing helpful actions on certain types of text or data such as names, addresses, etc. Selected portions of text or data entered into a document and any associated markup language data are passed to an action dynamically linked library (DLL) for obtaining actions associated with markup language elements applied to the text or data. The text or data may be passed to a recognizer DLL for recognition of certain data types. The recognizer DLL utilizes markup language data associated with the text or data to assist recognition and labeling of text or data. After all applicable text and/or data is recognized and labeled, an action DLL is called for actions associated with the labeled text or data.

208 citations

Patent
27 Dec 2007
TL;DR: A novel predictive feature extraction method combines linguistic and statistical information to represent information embedded in a noisy source language, mitigating the performance loss that traditional speech recognition systems suffer with larger domain size, scarce training data, and noisy environmental conditions.
Abstract: The performance of traditional speech recognition systems (as applied to information extraction or translation) decreases significantly with larger domain size and scarce training data, as well as under noisy environmental conditions. This invention mitigates these problems through the introduction of a novel predictive feature extraction method which combines linguistic and statistical information for representation of information embedded in a noisy source language. The predictive features are combined with text classifiers to map the noisy text to one of the semantically or functionally similar groups. The features used by the classifier can be syntactic, semantic, and statistical.

199 citations
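
The idea in the abstract above, mapping noisy text to one of several similar groups using combined features, can be illustrated with a small sketch. It is not the patented method: character trigrams stand in for the statistical features, whole-word tokens stand in for a crude linguistic cue, and a nearest-centroid classifier with cosine similarity is a swapped-in technique chosen for brevity. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def features(text: str) -> Counter:
    """Character trigrams (statistical) plus whole-word tokens (a crude linguistic cue)."""
    lowered = text.lower()
    padded = f" {lowered} "
    feats = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    feats.update("w=" + word for word in lowered.split())
    return feats


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def train(examples: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Sum the feature counts of each group's examples into one centroid per group."""
    centroids: Dict[str, Counter] = {}
    for text, label in examples:
        centroids.setdefault(label, Counter()).update(features(text))
    return centroids


def classify(text: str, centroids: Dict[str, Counter]) -> str:
    """Return the group whose centroid is most similar to the noisy input."""
    feats = features(text)
    return max(centroids, key=lambda label: cosine(feats, centroids[label]))
```

Character trigrams tolerate misspellings and recognition errors because a corrupted word still shares most of its trigrams with the clean form, which is why a noisy query such as "bok a flite to bostn" can still land in the same group as "book a flight to boston".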


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 75% related
Server: 79.5K papers, 1.4M citations, 74% related
Cluster analysis: 146.5K papers, 2.9M citations, 74% related
Feature (computer vision): 128.2K papers, 1.7M citations, 73% related
Wireless sensor network: 142K papers, 2.4M citations, 73% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    6
2022    8
2020    1
2019    1
2018    4
2017    23