Author

Eng-Jon Ong

Bio: Eng-Jon Ong is an academic researcher from the University of Surrey. The author has contributed to research in topics including 3D pose estimation and Feature (computer vision). The author has an h-index of 25 and has co-authored 62 publications receiving 1834 citations. Previous affiliations of Eng-Jon Ong include Queen Mary University of London and the University of Oxford.


Papers
Proceedings ArticleDOI
17 May 2004
TL;DR: A novel, unsupervised approach to training an efficient and robust detector which is capable of not only detecting the presence of human hands within an image but classifying the hand shape.
Abstract: The ability to detect a person's unconstrained hand in a natural video sequence has applications in sign language, gesture recognition and HCI. This paper presents a novel, unsupervised approach to training an efficient and robust detector which is capable of not only detecting the presence of human hands within an image but also classifying the hand shape. A database of images is first clustered using a k-medoid clustering algorithm with a distance metric based upon shape context. From this, a tree structure of boosted cascades is constructed. The head of the tree provides a general hand detector, while the individual branches of the tree classify a valid shape as belonging to one of the predetermined clusters exemplified by an indicative hand shape. Preliminary experiments showed that the approach achieves a promising 99.8% success rate on hand detection and 97.4% success at classification. Although we demonstrate the approach within the domain of hand shape, it is equally applicable to other problems where both detection and classification are required for objects that display high variability in appearance.

283 citations

01 Jan 2017
TL;DR: This paper discusses sign language recognition using linguistic sub-units, including those learnt from appearance data as well as those inferred from either 2D or 3D tracking data.
Abstract: This paper discusses sign language recognition using linguistic sub-units. It presents three types of sub-units for consideration: those learnt from appearance data as well as those inferred from either 2D or 3D tracking data. These sub-units are then combined using a sign level classifier; here, two options are presented. The first uses Markov Models to encode the temporal changes between sub-units. The second makes use of Sequential Pattern Boosting to apply discriminative feature selection at the same time as encoding temporal information. This approach is more robust to noise and performs well in signer independent tests, improving results from the 54% achieved by the Markov Chains to 76%.

146 citations

Book ChapterDOI
TL;DR: This paper discusses sign language recognition using linguistic sub-units, presenting three types of sub-units for consideration: those learnt from appearance data as well as those inferred from either 2D or 3D tracking data.
Abstract: This paper discusses sign language recognition using linguistic sub-units. It presents three types of sub-units for consideration: those learnt from appearance data as well as those inferred from either 2D or 3D tracking data. These sub-units are then combined using a sign level classifier; here, two options are presented. The first uses Markov Models to encode the temporal changes between sub-units. The second makes use of Sequential Pattern Boosting to apply discriminative feature selection at the same time as encoding temporal information. This approach is more robust to noise and performs well in signer independent tests, improving results from the 54% achieved by the Markov Chains to 76%.

135 citations

Proceedings ArticleDOI
07 Sep 2004
TL;DR: A flexible monocular system capable of recognising sign lexicons far greater in number than previous approaches and generating extremely high recognition rates for large lexicons with as little as a single training instance per sign is presented.
Abstract: This paper presents a flexible monocular system capable of recognising sign lexicons far greater in number than previous approaches. The power of the system is due to four key elements: (i) Head and hand detection based upon boosting, which removes the need for temperamental colour segmentation; (ii) A body centred description of activity which overcomes issues with camera placement, calibration and user; (iii) A two stage classification in which stage I generates a high level linguistic description of activity which naturally generalises and hence reduces training; (iv) A stage II classifier bank which does not require HMMs, further reducing training requirements. The outcome is a system capable of running in real-time and generating extremely high recognition rates for large lexicons with as little as a single training instance per sign. We demonstrate classification rates as high as 92% for a lexicon of 164 words with extremely low training requirements, outperforming previous approaches where thousands of training examples are required.

103 citations

Journal ArticleDOI
TL;DR: The results show that orientation-selective Gabor filters enhance differences in pose and that different filter orientations are optimal at different poses, while principal component analysis was found to provide an identity-invariant representation in which similarities can be calculated more robustly.

98 citations


Cited by
Book
30 Sep 2010
TL;DR: Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images and takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene.
Abstract: Humans perceive the three-dimensional structure of the world with apparent ease. However, despite all of the recent advances in computer vision research, the dream of having a computer interpret an image at the same level as a two-year-old remains elusive. Why is computer vision such a challenging problem and what is the current state of the art? Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images. It also describes challenging real-world applications where vision is being successfully used, both for specialized applications such as medical imaging, and for fun, consumer-level tasks such as image editing and stitching, which students can apply to their own personal photos and videos. More than just a source of recipes, this exceptionally authoritative and comprehensive textbook/reference also takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene. These problems are also analyzed using statistical models and solved using rigorous engineering techniques. Topics and features: structured to support active curricula and project-oriented courses, with tips in the Introduction for using the book in a variety of customized courses; presents exercises at the end of each chapter with a heavy emphasis on testing algorithms and containing numerous suggestions for small mid-term projects; provides additional material and more detailed mathematical topics in the Appendices, which cover linear algebra, numerical techniques, and Bayesian estimation theory; suggests additional reading at the end of each chapter, including the latest research in each sub-field, in addition to a full Bibliography at the end of the book; supplies supplementary course material for students at the associated website, http://szeliski.org/Book/.
Suitable for an upper-level undergraduate or graduate-level course in computer science or engineering, this textbook focuses on basic techniques that work under real-world conditions and encourages students to push their creative boundaries. Its design and exposition also make it eminently suitable as a unique reference to the fundamental techniques and current research literature in computer vision.

4,146 citations

Journal ArticleDOI
TL;DR: This survey reviews recent trends in video-based human capture and analysis, as well as discussing open problems for future research to achieve automatic visual analysis of human movement.

2,738 citations

Journal ArticleDOI
01 Aug 2004
TL;DR: This paper reviews recent developments and general strategies of the processing framework of visual surveillance in dynamic scenes, and analyzes possible research directions, e.g., occlusion handling, a combination of two- and three-dimensional tracking, fusion of information from multiple sensors, and remote surveillance.
Abstract: Visual surveillance in dynamic scenes, especially for humans and vehicles, is currently one of the most active research topics in computer vision. It has a wide spectrum of promising applications, including access control in special areas, human identification at a distance, crowd flux statistics and congestion analysis, detection of anomalous behaviors, and interactive surveillance using multiple cameras. In general, the processing framework of visual surveillance in dynamic scenes includes the following stages: modeling of environments, detection of motion, classification of moving objects, tracking, understanding and description of behaviors, human identification, and fusion of data from multiple cameras. We review recent developments and general strategies of all these stages. Finally, we analyze possible research directions, e.g., occlusion handling, a combination of two- and three-dimensional tracking, a combination of motion analysis and biometrics, anomaly detection and behavior prediction, content-based retrieval of surveillance videos, behavior understanding and natural language description, fusion of information from multiple sensors, and remote surveillance.

2,321 citations

Journal ArticleDOI
TL;DR: A comprehensive survey of computer vision-based human motion capture literature from the past two decades is presented, with a general overview based on a taxonomy of system functionalities, broken down into four processes: initialization, tracking, pose estimation, and recognition.

1,917 citations

Journal ArticleDOI
TL;DR: This paper discusses the inherent difficulties in head pose estimation and presents an organized survey describing the evolution of the field, comparing systems by focusing on their ability to estimate coarse and fine head pose and highlighting approaches well suited for unconstrained environments.
Abstract: The capacity to estimate the head pose of another person is a common human ability that presents a unique challenge for computer vision systems. Compared to face detection and recognition, which have been the primary foci of face-related vision research, identity-invariant head pose estimation has fewer rigorously evaluated systems or generic solutions. In this paper, we discuss the inherent difficulties in head pose estimation and present an organized survey describing the evolution of the field. Our discussion focuses on the advantages and disadvantages of each approach and spans 90 of the most innovative and characteristic papers that have been published on this topic. We compare these systems by focusing on their ability to estimate coarse and fine head pose, highlighting approaches that are well suited for unconstrained environments.

1,402 citations