Author

J.-M. Jolion

Bio: J.-M. Jolion is an academic researcher from Vision Institute. The author has contributed to research in topics: Image retrieval & Concept search. The author has an h-index of 1 and has co-authored 1 publication receiving 249 citations.

Papers
Proceedings ArticleDOI
10 Dec 2002
TL;DR: An algorithm is presented that localizes artificial text in images and videos using a measure of accumulated gradients and morphological post-processing; the quality of the localized text is then improved by robust multiple-frame integration.
Abstract: The systems currently available for content-based image and video retrieval work without semantic knowledge, i.e. they use image processing methods to extract low-level features of the data. The similarity obtained by these approaches does not always correspond to the similarity a human user would expect. A way to include more semantic knowledge in the indexing process is to use the text included in the images and video sequences. It is rich in information yet easy to use, e.g. through keyword-based queries. In this paper we present an algorithm to localize artificial text in images and videos using a measure of accumulated gradients and morphological post-processing to detect the text. The quality of the localized text is improved by robust multiple-frame integration. A new technique for the binarization of the text boxes is proposed. Finally, detection results and OCR results obtained with a commercial OCR system are presented.
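The gradient-accumulation idea is straightforward to sketch. Below is a minimal, illustrative Python/OpenCV implementation of that step only; the thresholds, kernel sizes, and geometric filters are assumptions chosen for clarity, not the authors' parameters, and the multiple-frame integration and binarization stages are omitted.

```python
import cv2
import numpy as np

def localize_text_boxes(gray, grad_thresh=80, min_area=150):
    """Return bounding boxes of likely artificial-text regions (sketch)."""
    # Horizontal gradients: the dense vertical strokes of rendered text
    # give strong responses that accumulate along a text line.
    grad = np.abs(cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3))
    accum = cv2.boxFilter(grad, -1, (15, 1))  # accumulate over a wide window
    accum = cv2.normalize(accum, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    # Threshold the accumulated gradient map to a binary text/non-text mask.
    _, mask = cv2.threshold(accum, grad_thresh, 255, cv2.THRESH_BINARY)

    # Morphological post-processing: close inter-character gaps so each
    # text line fuses into one connected component.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    # Keep components whose geometry is plausible for a line of text.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w * h >= min_area and w > h:  # text lines are wider than tall
            boxes.append((x, y, w, h))
    return boxes
```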

262 citations


Cited by
01 Jan 1979
TL;DR: This special issue aims to gather recent advances in learning-with-shared-information methods and their applications in computer vision and multimedia analysis, with particular emphasis on interesting real-world applications.
Abstract: In the real world, a realistic setting for computer vision or multimedia recognition problems is that some classes have abundant training data while many others have only a small amount. How to use frequent classes to help learn rare classes, for which training data are harder to collect, is therefore an open question. Learning with shared information is an emerging topic in machine learning, computer vision, and multimedia analysis. Different levels of components can be shared during concept modeling and machine learning stages, such as generic object parts, attributes, transformations, regularization parameters, and training examples. Regarding specific methods, multi-task learning, transfer learning, and deep learning can be seen as different strategies for sharing information. These learning-with-shared-information methods are very effective in solving real-world large-scale problems. This special issue aims at gathering the recent advances in learning-with-shared-information methods and their applications in computer vision and multimedia analysis. Both state-of-the-art works and literature reviews are welcome for submission. Papers addressing interesting real-world computer vision and multimedia applications are especially encouraged. Topics of interest include, but are not limited to:
• Multi-task learning or transfer learning for large-scale computer vision and multimedia analysis
• Deep learning for large-scale computer vision and multimedia analysis
• Multi-modal approaches for large-scale computer vision and multimedia analysis
• Different sharing strategies, e.g., sharing generic object parts, attributes, transformations, regularization parameters, and training examples
• Real-world computer vision and multimedia applications based on learning with shared information, e.g., event detection, object recognition, object detection, action recognition, human head pose estimation, object tracking, location-based services, and semantic indexing
• New datasets and metrics to evaluate the benefit of the proposed sharing ability for specific computer vision or multimedia problems
• Survey papers on learning with shared information
Authors who are unsure whether their planned submission is in scope may contact the guest editors prior to the submission deadline with an abstract, in order to receive feedback.

1,758 citations

Proceedings ArticleDOI
Simon M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, R. Young
03 Aug 2003
TL;DR: The robust reading problem was broken down into three sub-problems, with a competition for each stage plus a competition for the best overall system; the text locating contest was the only one to receive any entries.
Abstract: This paper describes the robust reading competitions for ICDAR 2003. With the rapid growth in research over the last few years on recognizing text in natural scenes, there is an urgent need to establish some common benchmark datasets, and gain a clear understanding of the current state of the art. We use the term robust reading to refer to text images that are beyond the capabilities of current commercial OCR packages. We chose to break down the robust reading problem into three sub-problems, and run competitions for each stage, and also a competition for the best overall system. The sub-problems we chose were text locating, character recognition and word recognition. By breaking down the problem in this way, we hope to gain a better understanding of the state of the art in each of the sub-problems. Furthermore, our methodology involves storing detailed results of applying each algorithm to each image in the data sets, allowing researchers to study in depth the strengths and weaknesses of each algorithm. The text locating contest was the only one to have any entries. We report the results of this contest, and show cases where the leading algorithms succeed and fail.

607 citations

Journal ArticleDOI
TL;DR: A survey of application domains, technical challenges, and solutions for the analysis of documents captured by digital cameras is presented, along with sample applications under development and feasible ideas for future work.
Abstract: The increasing availability of high-performance, low-priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. Digital cameras attached to cellular phones, PDAs, or wearable computers, and standalone image or video devices are highly mobile and easy to use; they can capture images of thick books, historical manuscripts too fragile to touch, and text in scenes, making them much more versatile than desktop scanners. Should robust solutions to the analysis of documents captured with such devices become available, there will clearly be a demand in many domains. Traditional scanner-based document analysis techniques provide us with a good reference and starting point, but they cannot be used directly on camera-captured images. Camera-captured images can suffer from low resolution, blur, and perspective distortion, as well as complex layout and interaction of the content and background. In this paper we present a survey of application domains, technical challenges, and solutions for the analysis of documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We discuss document analysis from a single camera-captured image as well as multiple frames and highlight some sample applications under development and feasible ideas for future development.
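One of the standard remedies the survey discusses is removing perspective distortion before applying scanner-oriented analysis. The sketch below shows the usual homography-based rectification in Python/OpenCV; it assumes the four page corners have already been detected (a separate problem), and the output size is an arbitrary choice.

```python
import cv2
import numpy as np

def rectify_page(image, corners, out_w=800, out_h=1100):
    """Warp a quadrilateral page region to a fronto-parallel rectangle.

    corners: four (x, y) points ordered top-left, top-right,
             bottom-right, bottom-left (assumed already detected).
    """
    src = np.float32(corners)
    dst = np.float32([(0, 0), (out_w, 0), (out_w, out_h), (0, out_h)])
    H = cv2.getPerspectiveTransform(src, dst)  # 3x3 homography
    return cv2.warpPerspective(image, H, (out_w, out_h))
```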

493 citations

Journal ArticleDOI
TL;DR: A new framework to detect text strings with arbitrary orientations in complex natural scene images that outperforms state-of-the-art results on the public Robust Reading Dataset, which contains text only in horizontal orientation.
Abstract: Text information in natural scene images serves as important clues for many image-based applications such as scene understanding, content-based image retrieval, assistive navigation, and automatic geocoding. However, locating text in a complex background with multiple colors is a challenging task. In this paper, we explore a new framework to detect text strings with arbitrary orientations in complex natural scene images. Our proposed framework of text string detection consists of two steps: 1) image partition to find text character candidates based on local gradient features and color uniformity of character components and 2) character candidate grouping to detect text strings based on joint structural features of text characters in each text string, such as character size differences, distances between neighboring characters, and character alignment. By assuming that a text string has at least three characters, we propose two algorithms of text string detection: 1) an adjacent character grouping method and 2) a text line grouping method. The adjacent character grouping method calculates the sibling groups of each character candidate as string segments and then merges the intersecting sibling groups into text strings. The text line grouping method performs a Hough transform to fit text lines among the centroids of text candidates. Each fitted text line describes the orientation of a potential text string. The detected text string is represented by a rectangular region covering all characters whose centroids are cascaded in its text line. To improve efficiency and accuracy, our algorithms are carried out at multiple scales. The proposed methods outperform the state-of-the-art results on the public Robust Reading Dataset, which contains text only in horizontal orientation. Furthermore, the effectiveness of our methods in detecting text strings with arbitrary orientations is evaluated on the Oriented Scene Text Dataset, collected by ourselves, which contains text strings in nonhorizontal orientations.
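The text-line-grouping step is easy to illustrate. The paper fits lines through character centroids with a Hough transform; the Python sketch below substitutes a simpler pairwise line fit with an inlier count, keeping only lines supported by at least three characters as the abstract assumes. The distance tolerance is an arbitrary assumption.

```python
import numpy as np
from itertools import combinations

def group_text_lines(centroids, tol=5.0, min_chars=3):
    """Group character centroids into text lines of arbitrary orientation.

    centroids: sequence of (x, y) character-candidate centroids.
    Returns a list of frozensets of centroid indices, one per text line.
    """
    pts = np.asarray(centroids, dtype=float)
    lines = []
    for i, j in combinations(range(len(pts)), 2):
        p, q = pts[i], pts[j]
        d = q - p
        norm = np.hypot(*d)
        if norm == 0:
            continue
        n = np.array([-d[1], d[0]]) / norm  # unit normal of line through p, q
        dist = np.abs((pts - p) @ n)        # point-to-line distances
        inliers = frozenset(np.nonzero(dist <= tol)[0])
        if len(inliers) >= min_chars and inliers not in lines:
            lines.append(inliers)
    # Drop lines whose support is fully contained in a larger line.
    return [s for s in lines if not any(s < t for t in lines)]
```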

355 citations

Proceedings ArticleDOI
31 Aug 2005
TL;DR: The main result is that the leading 2005 entry has improved significantly on the leading 2003 entry, with an increase in average f-score from 0.5 to 0.62, where the f-score is the same adapted information retrieval measure used for the 2003 competition.
Abstract: This paper describes the results of the ICDAR 2005 competition for locating text in camera-captured scenes. For this we used the same data as the ICDAR 2003 competition, which has been kept private until now. This allows a direct comparison with the 2003 entries. The main result is that the leading 2005 entry has improved significantly on the leading 2003 entry, with an increase in average f-score from 0.5 to 0.62, where the f-score is the same adapted information retrieval measure used for the 2003 competition. The paper also discusses the Web-based deployment and evaluation of text locating systems, and one of the leading entries has now been deployed in this way. This mode of usage could lead to more complete and more immediate knowledge of the strengths and weaknesses of each newly developed system.
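For readers unfamiliar with the adapted measure, the sketch below follows one common reading of the ICDAR 2003 text-locating protocol: rectangles match by intersection area over the area of the smallest rectangle containing both, precision and recall average each box's best match score, and the f-score is their harmonic mean. Treat this as an approximation for intuition, not the official scoring code.

```python
def match(a, b):
    """Overlap of two (x, y, w, h) boxes: intersection area divided by
    the area of the minimum axis-aligned rectangle containing both."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))   # intersection width
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))   # intersection height
    uw = max(ax2, bx2) - min(a[0], b[0])           # containing-box width
    uh = max(ay2, by2) - min(a[1], b[1])           # containing-box height
    return (iw * ih) / (uw * uh) if uw * uh > 0 else 0.0

def f_score(estimates, targets, alpha=0.5):
    """Adapted f-score over detected boxes and ground-truth boxes."""
    if not estimates or not targets:
        return 0.0
    precision = sum(max(match(e, t) for t in targets)
                    for e in estimates) / len(estimates)
    recall = sum(max(match(t, e) for e in estimates)
                 for t in targets) / len(targets)
    if precision == 0 or recall == 0:
        return 0.0
    # Harmonic combination; alpha=0.5 weights precision and recall equally.
    return 1.0 / (alpha / precision + (1 - alpha) / recall)
```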

304 citations