Showing papers by "Michihiko Minoh" published in 2012


Book Chapter · DOI
07 Oct 2012
TL;DR: A set-based discriminative ranking model (SBDR) is proposed, which iterates between set-to-set distance finding and discriminative feature space projection to optimize both simultaneously.
Abstract: Recently, both face recognition and body-based person re-identification have been extended from single-image scenarios to video-based or, more generally, image-set-based problems. Set-based recognition brings new research and application opportunities, but at the same time raises great modeling and optimization challenges. In our view, the major issue is how to make the best use of the multiple samples available for each individual without being disturbed by the large within-set variations. Because a globally optimal learning model is difficult to design, most existing solutions are still based on unsupervised matching, which can be further categorized into three groups: a) set-based signature generation, b) direct set-to-set matching, and c) between-set distance finding. The first two rely on good feature representations, while the third exploits the data set structure and set-based distance measurement. Their main shortcoming is the lack of learning-based discriminative ability. In this paper, we propose a set-based discriminative ranking model (SBDR), which iterates between set-to-set distance finding and discriminative feature space projection to optimize both simultaneously. Extensive experiments on widely used face recognition and person re-identification datasets not only demonstrate the superiority of our approach, but also shed some light on its properties and application domain.

57 citations


Proceedings Article · DOI
01 Sep 2012
TL;DR: A new way called Common-Near-Neighbor Analysis is presented, which analyzes the commonness of the near neighbors of each pair of samples in a learned metric space, measured by a novel rank-order based dissimilarity.
Abstract: Person re-identification tackles the problem of whether an observed person of interest reappears in a network of cameras. The difficulty stems primarily from the few samples per class combined with large intra-class variations in real scenarios: illumination, pose and viewpoint changes across cameras. So far, proposals in the literature have treated this either as a matching problem focusing on feature representation or as a classification/ranking problem relying on metric optimization. This paper presents a new approach called Common-Near-Neighbor Analysis, which to some extent combines the strengths of these two methodologies. It analyzes the commonness of the near neighbors of each pair of samples in a learned metric space, measured by a novel rank-order based dissimilarity. Our method, using only the color cue, has been tested on widely used benchmark datasets, showing significant performance improvement over the state of the art.

48 citations
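The abstract does not define the rank-order based dissimilarity, so the following only illustrates the general idea of scoring a pair of samples by how many near neighbors they have in common. It is a minimal sketch, assuming plain Euclidean distance in place of the learned metric and a simple shared-neighbor count in place of the paper's rank-order measure; `knn_sets`, `common_neighbor_dissimilarity` and the neighborhood size `k` are hypothetical.

```python
import math

def knn_sets(points, k):
    """For each point, the set of indices of its k nearest other points (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        # Sort the other points by distance; ties are broken by index.
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append({j for _, j in others[:k]})
    return result

def common_neighbor_dissimilarity(points, i, j, k=3):
    """1 - |kNN(i) & kNN(j)| / k: 0.0 when i and j share all k near neighbors."""
    nn = knn_sets(points, k)
    return 1.0 - len(nn[i] & nn[j]) / k

# Two tight clusters of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(common_neighbor_dissimilarity(points, 0, 3))  # same cluster: small value
print(common_neighbor_dissimilarity(points, 0, 4))  # different clusters: 1.0
```

Samples in the same tight cluster share most of their neighbors and receive a small dissimilarity even when they are not each other's closest match, which is the intuition a neighbor-commonness measure exploits.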


Proceedings Article · DOI
Yang Wu¹, Michihiko Minoh¹, Masayuki Mukunoki¹, Wei Li¹, Shihong Lao²
18 Sep 2012
TL;DR: A collaborative representation over all the gallery images of known individuals is built to best approximate the query images (containing an unknown person) via affine combinations, so as to reveal the identity of the querying person.
Abstract: In this paper, we propose a simple and effective solution to the important and challenging problem of across-camera person re-identification. We focus on the common case in video surveillance where multiple images or video frames are available for each person. Instead of exploring new features, the proposed approach aims at making better use of such images/frames. It builds a collaborative representation over all the gallery images (of known individuals) to best approximate the query images (containing an unknown person) via affine combinations. The approximation is measured by the nearest point distance between the two affine hulls constructed from the query images and the gallery images, respectively. By enforcing sparsity on the samples used to approximate the two nearest points, the relative importance of the gallery images belonging to different persons can reveal the identity of the querying person. Extensive experiments on public benchmark datasets demonstrate that the proposed approach greatly outperforms state-of-the-art methods.

30 citations
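The nearest point distance between two affine hulls reduces to a small least-squares problem: each hull is its mean plus any combination of its centered samples, and the two free coefficient vectors are chosen to bring the points as close as possible. The sketch below assumes NumPy and one feature vector per column; it computes only that distance and omits both the sparsity constraint and the joint representation over the whole gallery described in the abstract. `affine_hull_distance` is a hypothetical name.

```python
import numpy as np

def affine_hull_distance(A, B):
    """Nearest point distance between the affine hulls spanned by the columns of A and B.

    A: (d, n_a) array, B: (d, n_b) array, one feature vector per column.
    Returns the distance and the two nearest points.
    """
    mu_a, mu_b = A.mean(axis=1), B.mean(axis=1)
    U_a, U_b = A - mu_a[:, None], B - mu_b[:, None]  # directions within each hull
    M = np.hstack([U_a, -U_b])
    # Choose v = [v_a; v_b] so that (mu_a + U_a v_a) - (mu_b + U_b v_b) is as short as possible.
    v, *_ = np.linalg.lstsq(M, mu_b - mu_a, rcond=None)
    n_a = U_a.shape[1]
    p_a = mu_a + U_a @ v[:n_a]
    p_b = mu_b + U_b @ v[n_a:]
    return float(np.linalg.norm(p_a - p_b)), p_a, p_b

# Two parallel lines in 3-D, two units apart along z.
A = np.array([[0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
B = np.array([[0.0, 1.0], [0.0, 0.0], [2.0, 2.0]])
print(affine_hull_distance(A, B)[0])
```

Comparing a query set against each gallery person's hull separately and taking the smallest distance would give a simple per-person baseline; the approach in the abstract instead builds one representation over all gallery images and reads the identity off the sparse coefficients.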


Proceedings Article
01 Nov 2012
TL;DR: The proposed method is applicable to various real-world object recognition tasks rather than only the well-controlled face recognition problem, and it allows an existing dictionary to be used for testing new data without time-consuming data annotation and model re-training.
Abstract: A simple and effective method is proposed for object recognition via collaborative representation with ridge regression. Unlike existing sparse representation and collaborative representation based approaches, the proposal does not need extensive training samples for each testing class, and it is robust to localization errors and large within-class variations; it is thus applicable to various real-world object recognition tasks rather than only the well-controlled face recognition problem. Its discriminative power is drawn from a third-party dataset, which can differ from the training and testing datasets; therefore, it allows an existing dictionary to be used for testing new data without time-consuming data annotation and model re-training. As an example, the proposal is extensively tested on the representative and very challenging task of person re-identification, setting new state-of-the-art results on widely adopted benchmark datasets using only simple and common features.

15 citations
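Collaborative representation with ridge regression has a closed-form code, `a = (D^T D + lam*I)^-1 D^T y`, after which a query is commonly assigned to the class whose atoms reconstruct it with the smallest residual. The sketch below shows that generic decision rule, assuming NumPy and a dictionary `D` with one sample per column; it does not model the third-party dictionary the abstract describes, and `ridge_code`, `crc_classify` and the value of `lam` are hypothetical.

```python
import numpy as np

def ridge_code(D, y, lam=0.01):
    """Closed-form solution of min_a ||y - D a||^2 + lam * ||a||^2."""
    n = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)

def crc_classify(D, labels, y, lam=0.01):
    """Code y over all atoms jointly, then pick the class with the smallest reconstruction residual."""
    labels = np.asarray(labels)
    a = ridge_code(D, y, lam)
    residuals = {
        int(c): float(np.linalg.norm(y - D[:, labels == c] @ a[labels == c]))
        for c in np.unique(labels)
    }
    return min(residuals, key=residuals.get), residuals

# Four atoms (columns), two per class.
D = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.0, 0.1, 1.0, 0.9],
              [0.0, 0.0, 0.0, 0.1]])
labels = [0, 0, 1, 1]
pred, res = crc_classify(D, labels, np.array([1.0, 0.05, 0.0]))
print(pred, res)
```

Because the code is solved over all classes at once, samples of different classes compete to explain the query, and the ridge penalty keeps this a single linear solve rather than the iterative l1 optimization used by sparse representation.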


Proceedings Article · DOI
02 Nov 2012
TL;DR: A method is proposed that uses physical signals obtained during cutting, by attaching load and sound sensors to the chopping board, to enable more precise recognition of ingredients in food preparation activities.
Abstract: We propose a method for recognizing ingredients in food preparation activities. Research on object recognition mainly focuses on visual information alone; however, ingredients are difficult to recognize from visual information only, because of their limited color variations and because their within-class differences in shape are larger than their inter-class differences. In this paper, we propose a method that uses physical signals obtained during cutting by attaching load and sound sensors to the chopping board. The load may depend on an ingredient's hardness, and the sound produced when a knife passes through an ingredient reflects the ingredient's structure. These signals are therefore expected to enable more precise recognition. We confirmed the effectiveness of integrating the three modalities (visual, auditory, and load) through experiments in which the developed method was applied to 23 classes of ingredients.

13 citations


Journal Article · DOI
TL;DR: It is argued that students should explicitly report their own comprehension in the classroom; with students' comprehension made available at the slide level, a machine learning technique is applied to classify presentation slides according to comprehension levels.
Abstract: Comprehension assessment is an essential tool in classroom learning. However, the judgment often relies on the experience of an instructor who observes students' behavior during the lessons. We argue that students should explicitly report their own comprehension in the classroom. With students' comprehension made available at the slide level, we apply a machine learning technique to classify presentation slides according to comprehension levels. Our experimental results suggest that presentation-based features are as predictive as bag-of-words feature vectors, which have proved successful in text classification tasks. Our analysis of presentation-based features reveals possible causes of poor lecture comprehension.

11 citations


Book Chapter · DOI
09 Jul 2012
TL;DR: This paper focuses on the learners’ interest level as an example of an important affective state, and investigates a method for estimating it from their nonverbal behaviors.
Abstract: A method for recognizing or estimating learners’ affective states plays a key role in realizing agent-based conversational e-Learning. In this paper, we focus on the learners’ interest level as an example of an important affective state, and investigate a method for estimating it from their nonverbal behaviors. In conversational situations, the meaning of nonverbal behaviors varies depending on the context of the conversation. Therefore, we do not use the nonverbal behaviors themselves; instead, we use their occurrence frequencies as inputs to the estimation mechanism. In our experiment, the proposed method estimated whether the learners’ interest level was “High” or “Low” with an accuracy of more than 70%.

8 citations
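Using occurrence frequencies instead of individual behaviors means the input no longer depends on what a single nod or glance means in its particular context. A minimal sketch of such a feature extractor is given below, assuming time-stamped behavior events and a fixed time window; the behavior vocabulary, `frequency_features`, and the per-minute normalization are hypothetical, and the estimator that consumes these features is not specified in the abstract.

```python
from collections import Counter

# Hypothetical behavior vocabulary; the abstract does not list which behaviors were used.
BEHAVIORS = ("nod", "smile", "gaze_away", "lean_forward")

def frequency_features(events, start, end, behaviors=BEHAVIORS):
    """Occurrences per minute of each behavior type among events with start <= t < end.

    events: iterable of (timestamp_seconds, behavior_name) pairs.
    """
    minutes = (end - start) / 60.0
    counts = Counter(name for t, name in events if start <= t < end)
    return [counts[b] / minutes for b in behaviors]

events = [(5, "nod"), (20, "nod"), (40, "smile"), (70, "nod")]
print(frequency_features(events, 0, 60))
```

The fixed behavior order gives every window a vector of the same length, so windows of different durations can be fed to whatever "High"/"Low" classifier is used.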



Proceedings Article · DOI
13 Oct 2012
TL;DR: A novel method is proposed for reconstructing the shape model of a non-rigid object represented as the union of rigid components; it uses the Pinhole-to-Projection Pyramid obtained from each range image to solve the assignment task non-iteratively.
Abstract: In this paper, we propose a novel method for reconstructing the shape model of a non-rigid object. We represent the non-rigid object as the union of rigid components, and acquire range images of the object and the motion of each component while the object changes its shape. We acquire the range images using one-shot scanning, and we use marker-based motion capture for motion acquisition. Based on these data, our method registers the range images and assigns a shape to each component. We propose using the Pinhole-to-Projection Pyramid obtained from each range image to solve the assignment task non-iteratively. The effectiveness of our method is demonstrated by applying it to reconstruct the shape of a human hand.

3 citations


Book Chapter · DOI
09 Jul 2012
TL;DR: A method is proposed for automatically estimating students’ posture sequences in the classroom from video footage by introducing spatio-temporal constraints, in which the belief of postures is propagated through a given time interval while considering the confidence of the observations.
Abstract: We propose a method for automatically estimating students’ posture sequences in the classroom from video footage. A posture sequence is a time series of a student’s postures during a lecture, and a student's posture is described by the states of his head, body trunk (torso), and hands/arms, which we call the body part states. The detection of body parts from video footage contains many errors. To cope with these errors, we introduce spatio-temporal constraints, in which we propagate the belief of postures through a given time interval while taking the confidence of the observations into account. Through this propagation, we can revise erroneous detection results and estimate an appropriate posture sequence. In the experiment, we apply the proposed method to a real lecture and show that it improves the accuracy of posture sequence estimation.

1 citation
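The abstract describes propagating posture beliefs through a time interval while weighting observations by their confidence, but it does not give the model. As a stand-in illustration only, the sketch below uses Viterbi decoding over a "sticky" hidden Markov model for a single body part: per-frame detector confidences act as observation likelihoods, and `p_stay` (hypothetical) is the assumed probability of keeping the same state between frames. `smooth_states` is a hypothetical name, and at least two states are required.

```python
import math

def smooth_states(obs_conf, p_stay=0.9):
    """Most likely state sequence given per-frame confidences obs_conf[t][s] (Viterbi)."""
    n_states = len(obs_conf[0])

    def log_trans(a, b):
        # Stay with probability p_stay; spread the rest evenly over the other states.
        return math.log(p_stay if a == b else (1 - p_stay) / (n_states - 1))

    def log_obs(c):
        return math.log(max(c, 1e-12))

    score = [log_obs(c) - math.log(n_states) for c in obs_conf[0]]  # uniform prior
    back = []
    for frame in obs_conf[1:]:
        ptr, new = [], []
        for j in range(n_states):
            best = max(range(n_states), key=lambda i: score[i] + log_trans(i, j))
            ptr.append(best)
            new.append(score[best] + log_trans(best, j) + log_obs(frame[j]))
        back.append(ptr)
        score = new
    # Trace the best final state back to the first frame.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Frame 2 has a weak, isolated detection of state 1.
obs = [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1], [0.9, 0.1]]
print(smooth_states(obs))
```

An isolated low-confidence flip is overridden by its neighbors, while a change that persists over several confident frames survives the smoothing.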


Journal Article · DOI
15 Sep 2012
TL;DR: This paper proposes a method that applies pre-roll sequentially to segments of video data divided at points where the continuity of the video content breaks, and shows that it can preserve spatial, temporal, and continuous quality with little degradation of real-time quality.
Abstract: This paper focuses on live video distribution using video streaming technology. In general, an important challenge for video streaming is to preserve the spatial, temporal, continuous, and real-time quality of video. Pre-roll streaming, which buffers a certain amount of data before video playback, can retain spatial and temporal quality without degrading continuous quality, but real-time quality remains a problem because, in live video distribution, it is difficult to know in advance the amount of data of video that has yet to be filmed. Hence, we propose a method that applies pre-roll sequentially to segments of video data divided at points where the continuity of the video content breaks, and show that it can meet the spatial, temporal, and continuous quality requirements with little degradation of real-time quality. In the experiment, we show that, for lecture video streaming, the proposed method met the other three qualities with little degradation of real-time quality.
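The real-time cost of pre-roll can be made concrete with a little arithmetic. Assuming frames are sent back-to-back over a constant-bandwidth link, and that a segment's frame sizes are known when its pre-roll is chosen (exactly what live capture makes difficult), the smallest startup delay is the largest gap between the time a frame finishes arriving and the time it is due to play. The sketch computes that delay for a whole stream or separately for each segment between break points; `min_preroll`, `preroll_per_segment` and the break-point list are hypothetical, and the model ignores capture time and any carry-over of buffered data between segments.

```python
def min_preroll(frame_bytes, fps, bytes_per_sec):
    """Smallest startup delay (s) such that every frame has fully arrived when it is due.

    Assumes transmission starts at t = 0, frames are sent back-to-back at a constant rate,
    and frame i is played at delay + i / fps.
    """
    received = 0.0
    delay = 0.0
    for i, size in enumerate(frame_bytes):
        received += size
        arrival = received / bytes_per_sec     # time at which frame i has fully arrived
        delay = max(delay, arrival - i / fps)  # frame i is due at delay + i / fps
    return delay

def preroll_per_segment(frame_bytes, break_points, fps, bytes_per_sec):
    """Pre-roll each segment independently; break_points are frame indices where a new segment starts."""
    bounds = [0, *break_points, len(frame_bytes)]
    return [min_preroll(frame_bytes[a:b], fps, bytes_per_sec) for a, b in zip(bounds, bounds[1:])]

frames = [100, 100, 100, 100]  # bytes per frame
print(min_preroll(frames, 1, 50))               # one pre-roll for the whole stream
print(preroll_per_segment(frames, [2], 1, 50))  # one pre-roll per segment
```

With four 100-byte frames at 1 frame/s over 50 bytes/s, pre-rolling the whole stream needs a 5 s delay, whereas splitting after the second frame needs a 3 s pre-roll before each half: playback starts sooner, and the second pause falls where the content is already discontinuous.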

Book Chapter · DOI
09 Jul 2012
TL;DR: In the experiment, the proposed method for generating a key by quantizing facial features based on entropy was applied to a public facial image database, and the system performance and integrity were evaluated.
Abstract: To achieve privacy protection in facial image retrieval systems, we propose a method of encrypting facial images with a key produced from facial features. Because facial features vary even for the same person, it is not advisable to use them directly as a cryptographic key. Therefore, we propose a method for generating a key by quantizing the facial features based on entropy. In our experiment, we applied the proposed method to a public facial image database, and evaluated the system performance and integrity by calculating the false acceptance rate and the false rejection rate.
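The abstract does not say how the entropy-based quantization is performed. One common way to make a quantizer maximize entropy is to place each feature's cut points at empirical quantiles, so that every bin is about equally likely. The sketch below does that, hashes the resulting bin indices into a key with SHA-256, and measures FRR (genuine pairs whose keys differ) and FAR (impostor pairs whose keys collide). It is a minimal sketch under those assumptions: the function names are hypothetical, the encryption step itself is not shown, and `n_bins` must not exceed 256 because bin indices are packed into bytes.

```python
import bisect
import hashlib
import statistics

def fit_equiprobable_edges(train_feats, n_bins=4):
    """Per-feature cut points at empirical quantiles, so each bin is about equally likely;
    this maximizes each quantized feature's entropy at log2(n_bins) bits."""
    return [statistics.quantiles(col, n=n_bins, method="inclusive") for col in zip(*train_feats)]

def feature_key(feat, edges):
    """Quantize each feature to a bin index, then hash the indices into a 256-bit key."""
    idx = bytes(bisect.bisect_right(e, x) for x, e in zip(feat, edges))
    return hashlib.sha256(idx).digest()

def far_frr(genuine_pairs, impostor_pairs, edges):
    """FAR: impostor pairs whose keys match; FRR: genuine pairs whose keys differ."""
    frr = sum(feature_key(a, edges) != feature_key(b, edges) for a, b in genuine_pairs) / len(genuine_pairs)
    far = sum(feature_key(a, edges) == feature_key(b, edges) for a, b in impostor_pairs) / len(impostor_pairs)
    return far, frr

train = [(float(i), 10.0 * i) for i in range(8)]  # toy 2-D "facial features"
edges = fit_equiprobable_edges(train, n_bins=2)
a, b, c, d = (1.0, 10.0), (2.0, 20.0), (4.0, 10.0), (6.0, 60.0)
print(far_frr(genuine_pairs=[(a, b), (a, c)], impostor_pairs=[(a, d)], edges=edges))
```

Coarser bins make genuine samples more likely to fall into the same cells (lower FRR) but shrink the key space and raise the chance that impostors collide (higher FAR), which is the trade-off the two rates capture.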