
Showing papers on "Standard test image" published in 2013


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper presents a system for image parsing, or labeling each pixel in an image with its semantic category, aimed at achieving broad coverage across hundreds of object categories, many of them sparsely sampled.
Abstract: This paper presents a system for image parsing, or labeling each pixel in an image with its semantic category, aimed at achieving broad coverage across hundreds of object categories, many of them sparsely sampled. The system combines region-level features with per-exemplar sliding window detectors. Per-exemplar detectors are better suited for our parsing task than traditional bounding box detectors: they perform well on classes with little training data and high intra-class variation, and they allow object masks to be transferred into the test image for pixel-level segmentation. The proposed system achieves state-of-the-art accuracy on three challenging datasets, the largest of which contains 45,676 images and 232 labels.

263 citations


Proceedings Article
11 Nov 2013
TL;DR: A new database for copy-move forgery detection (CMFD) is developed, consisting of 260 forged image sets, each of which includes the forged image, two masks, and the original image.
Abstract: Due to the availability of many sophisticated image processing tools, digital image forgery is nowadays very common. One of the most common forgery methods is copy-move forgery, where part of an image is copied to another location in the same image with the aim of hiding or adding some image content. Numerous algorithms have been proposed for copy-move forgery detection (CMFD), but only a few benchmarking databases exist for algorithm evaluation. We developed a new database for CMFD that consists of 260 forged image sets. Every image set includes the forged image, two masks, and the original image. Images are grouped into 5 categories according to the applied manipulation: translation, rotation, scaling, combination, and distortion. In addition, postprocessing methods, such as JPEG compression, blurring, noise adding, and color reduction, are applied to all forged and original images. In this paper we present the database organization and content, the creation of the forged images, the postprocessing methods, and database testing. The CoMoFoD database is available at http://www.vcl.fer.hr/comofod.

225 citations


Journal ArticleDOI
TL;DR: This paper proposes a forgery detection method that exploits subtle inconsistencies in the color of the illumination of images that is applicable to images containing two or more people and requires no expert interaction for the tampering decision.
Abstract: For decades, photographs have been used to document space-time events and they have often served as evidence in courts. Although photographers are able to create composites of analog pictures, this process is very time consuming and requires expert knowledge. Today, however, powerful digital image editing software makes image modifications straightforward. This undermines our trust in photographs and, in particular, questions pictures as evidence for real-world events. In this paper, we analyze one of the most common forms of photographic manipulation, known as image composition or splicing. We propose a forgery detection method that exploits subtle inconsistencies in the color of the illumination of images. Our approach is machine-learning-based and requires minimal user interaction. The technique is applicable to images containing two or more people and requires no expert interaction for the tampering decision. To achieve this, we incorporate information from physics- and statistical-based illuminant estimators on image regions of similar material. From these illuminant estimates, we extract texture- and edge-based features which are then provided to a machine-learning approach for automatic decision-making. The classification performance using an SVM meta-fusion classifier is promising. It yields detection rates of 86% on a new benchmark dataset consisting of 200 images, and 83% on 50 images that were collected from the Internet.

220 citations


Journal ArticleDOI
TL;DR: A robust hashing method is developed for detecting image forgery including removal, insertion, and replacement of objects, and abnormal color modification, and for locating the forged area.
Abstract: A robust hashing method is developed for detecting image forgery including removal, insertion, and replacement of objects, and abnormal color modification, and for locating the forged area. Both global and local features are used in forming the hash sequence. The global features are based on Zernike moments representing luminance and chrominance characteristics of the image as a whole. The local features include position and texture information of salient regions in the image. Secret keys are introduced in feature extraction and hash construction. While being robust against content-preserving image processing, the hash is sensitive to malicious tampering and, therefore, applicable to image authentication. The hash of a test image is compared with that of a reference image. When the hash distance is greater than a threshold τ1 and less than τ2, the received image is judged as a fake. By decomposing the hashes, the type of image forgery and location of forged areas can be determined. Probability of collision between hashes of different images approaches zero. Experimental results are presented to show effectiveness of the method.

200 citations
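The two-threshold decision rule from the abstract above is simple enough to sketch. This is a minimal illustration, not the authors' implementation: the hash itself (Zernike moments plus salient-region features under secret keys) is not reproduced, and `hash_distance`, `tau1`, and `tau2` are hypothetical placeholders.

```python
# Hedged sketch of the two-threshold authentication rule described above.
# hash_distance, tau1, tau2 are hypothetical placeholders; the real hash
# combines Zernike-moment and salient-region features under secret keys.

def judge_image(hash_distance: float, tau1: float, tau2: float) -> str:
    """Classify a test image by its hash distance to the reference image."""
    if hash_distance <= tau1:
        return "authentic"        # robust to content-preserving processing
    if hash_distance < tau2:
        return "forged"           # decompose the hashes to locate the tampering
    return "different image"      # too dissimilar to be the same content
```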


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work presents a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning, and can detect faces under challenging conditions without explicitly modeling their variations.
Abstract: Detecting faces in uncontrolled environments continues to be a challenge to traditional face detection methods due to the large variation in facial appearances, as well as occlusion and clutter. In order to overcome these challenges, we present a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning. A large database of faces with bounding rectangles and facial landmark locations is collected, and simple discriminative classifiers are learned from each of them. A voting-based method is then proposed to let these classifiers cast votes on the test image through an efficient image retrieval technique. As a result, faces can be very efficiently detected by selecting the modes from the voting maps, without resorting to exhaustive sliding window-style scanning. Moreover, due to the exemplar-based framework, our approach can detect faces under challenging conditions without explicitly modeling their variations. Evaluation on two public benchmark datasets shows that our new face detection approach is accurate and efficient, and achieves state-of-the-art performance. We further propose to use image retrieval for face validation (in order to remove false positives) and for face alignment/landmark localization. The same methodology can also be easily generalized to other face-related tasks, such as attribute recognition, as well as general object detection.

168 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work proposes an exemplar-based face image segmentation algorithm, taking inspiration from previous works on image parsing for general scenes, that first selects a subset of exemplar images from the database, then computes a nonrigid warp for each exemplar image to align it with the test image.
Abstract: In this work, we propose an exemplar-based face image segmentation algorithm. We take inspiration from previous works on image parsing for general scenes. Our approach assumes a database of exemplar face images, each of which is associated with a hand-labeled segmentation map. Given a test image, our algorithm first selects a subset of exemplar images from the database. Our algorithm then computes a nonrigid warp for each exemplar image to align it with the test image. Finally, we propagate labels from the exemplar images to the test image in a pixel-wise manner, using trained weights to modulate and combine label maps from different exemplars. We evaluate our method on two challenging datasets and compare with two face parsing algorithms and a general scene parsing algorithm. We also compare our segmentation results with contour-based face alignment results, that is, we first run the alignment algorithms to extract contour points and then derive segments from the contours. Our algorithm compares favorably with all previous works on all datasets evaluated.

158 citations


Journal ArticleDOI
TL;DR: Experimental results have shown that the image reconstruction with basis images distinctly outperforms the ICA feature extraction approach in defect inspection of solar modules in electroluminescence (EL) images.
Abstract: Solar power has become an attractive alternative source of electrical energy. Solar cells that form the basis of a solar power system are mainly based on multicrystalline silicon. A set of solar cells is assembled and interconnected into a large solar module to offer a large amount of electrical power for commercial applications. Many defects in a solar module cannot be visually observed with the conventional CCD imaging system. This paper aims at defect inspection of solar modules in electroluminescence (EL) images. The solar module charged with electrical current will emit infrared light whose intensity will be darker for intrinsic crystal grain boundaries and extrinsic defects including micro-cracks, breaks and finger interruptions. The EL image can distinctly highlight the invisible defects, but it also creates a random inhomogeneous background, which makes the inspection task extremely difficult. The proposed method is based on independent component analysis (ICA), and involves a learning stage and a detection stage. The large solar module image is first divided into small solar cell subimages. In the training stage, a set of defect-free solar cell subimages is used to find a set of independent basis images using ICA. In the inspection stage, each solar cell subimage under inspection is reconstructed as a linear combination of the learned basis images. The coefficients of the linear combination are used as the feature vector for classification. The reconstruction error between the test image and its reconstruction from the ICA basis images is also evaluated for detecting the presence of defects. Experimental results have shown that image reconstruction with basis images distinctly outperforms the ICA feature extraction approach. It can achieve a mean recognition rate of 93.4% for a set of 80 test samples.

140 citations
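The reconstruction-error test described in the abstract above can be sketched in a few lines. This is a minimal sketch under assumptions, not the authors' pipeline: it uses scikit-learn's FastICA, assumes each cell subimage is flattened into a row vector, and treats the defect threshold as a hypothetical tuning parameter.

```python
# Minimal sketch of ICA-basis reconstruction for defect detection.
# X_train holds flattened defect-free subimages as rows; the defect
# threshold is a hypothetical parameter, not a value from the paper.
import numpy as np
from sklearn.decomposition import FastICA

def learn_basis(X_train: np.ndarray, n_components: int = 32) -> np.ndarray:
    ica = FastICA(n_components=n_components, whiten="unit-variance", max_iter=1000)
    ica.fit(X_train)
    return ica.components_                    # (n_components, n_pixels) basis images

def reconstruction_error(basis: np.ndarray, x_test: np.ndarray) -> float:
    coeffs, *_ = np.linalg.lstsq(basis.T, x_test, rcond=None)  # linear combination
    x_hat = basis.T @ coeffs                                   # reconstructed subimage
    return float(np.linalg.norm(x_test - x_hat))

# A subimage whose error exceeds a threshold learned from defect-free
# samples is flagged as defective.
```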


Journal ArticleDOI
TL;DR: Benefiting from sparse feature-based image transformation, the method is more robust to corrupted input data, and can be considered as a simultaneous image restoration and transformation process.
Abstract: In this paper, we propose a framework of transforming images from a source image space to a target image space, based on learning coupled dictionaries from a training set of paired images. The framework can be used for applications such as image super-resolution and estimation of image intrinsic components (shading and albedo). It is based on a local parametric regression approach, using sparse feature representations over learned coupled dictionaries across the source and target image spaces. After coupled dictionary learning, sparse coefficient vectors of training image patch pairs are partitioned into easily retrievable local clusters. For any test image patch, we can quickly index into its closest local cluster and perform a local parametric regression between the learned sparse feature spaces. The obtained sparse representation (together with the learned target space dictionary) provides multiple constraints for each pixel of the target image to be estimated. The final target image is reconstructed based on these constraints. The contributions of our proposed framework are three-fold. 1) We propose a concept of coupled dictionary learning based on coupled sparse coding which requires the sparse coefficient vectors of a pair of corresponding source and target image patches to have the same support, i.e., the same indices of nonzero elements. 2) We devise a space partitioning scheme to divide the high-dimensional but sparse feature space into local clusters. The partitioning facilitates extremely fast retrieval of closest local clusters for query patches. 3) Benefiting from sparse feature-based image transformation, our method is more robust to corrupted input data, and can be considered as a simultaneous image restoration and transformation process. Experiments on intrinsic image estimation and super-resolution demonstrate the effectiveness and efficiency of our proposed method.

130 citations
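The core mapping step in the paper above, coding a source patch and reusing its coefficients in the target space, can be sketched as below. Assumptions: the coupled dictionaries Ds and Dt are already learned with unit-norm atoms, and scikit-learn's orthogonal matching pursuit stands in for the paper's sparse coding and clustered local regression.

```python
# Hedged sketch of coupled-dictionary mapping: sparse-code a source patch,
# then reuse the same coefficients (same support) with the target dictionary.
# Ds, Dt are assumed pre-learned with unit-norm atoms; OMP stands in for the
# paper's sparse coding and local-cluster regression.
import numpy as np
from sklearn.linear_model import orthogonal_mp

def map_patch(Ds: np.ndarray, Dt: np.ndarray, x_src: np.ndarray, k: int = 5) -> np.ndarray:
    """Ds: (dim_src, n_atoms), Dt: (dim_tgt, n_atoms), x_src: (dim_src,)."""
    alpha = orthogonal_mp(Ds, x_src, n_nonzero_coefs=k)  # sparse code on the source side
    return Dt @ alpha                                    # estimate of the target patch
```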


Journal ArticleDOI
TL;DR: A novel method for the blind detection of MF in digital images is presented, introducing two new feature sets that allow us to distinguish a median-filtered image from an untouched or average-filtered one.
Abstract: Recently, the median filtering (MF) detector has attracted wide interest as a forensic tool for recovering an image's processing history. This paper presents a novel method for the blind detection of MF in digital images. Following some strongly indicative analyses in the difference domain of images, we introduce two new feature sets that allow us to distinguish a median-filtered image from an untouched or average-filtered one. The effectiveness of the proposed features is verified with evidence from exhaustive experiments on a large composite image database. Compared with prior art, the proposed method achieves significant performance improvement in the case of low resolution and strong JPEG post-compression. In addition, it is demonstrated that our method is more robust against additive noise than other existing MF detectors. With the analyses and extensive experimental research presented in this paper, we hope that the proposed method will add a new tool to the arsenal of forensic analysts.

126 citations
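The paper's exact feature sets are not reproduced here, but difference-domain statistics of the general kind it builds on are easy to illustrate. The truncation bound T and the feature design below are hypothetical.

```python
# Illustrative difference-domain statistics (not the paper's exact features):
# normalized histograms of truncated first-order pixel differences.
import numpy as np

def difference_domain_features(gray: np.ndarray, T: int = 3) -> np.ndarray:
    feats = []
    for d in (np.diff(gray.astype(int), axis=0),   # vertical differences
              np.diff(gray.astype(int), axis=1)):  # horizontal differences
        d = np.clip(d, -T, T)                      # truncate to [-T, T]
        hist, _ = np.histogram(d, bins=2 * T + 1, range=(-T, T + 1), density=True)
        feats.append(hist)
    return np.concatenate(feats)

# Such vectors, computed for median-filtered vs. untouched or average-filtered
# training images, are fed to a binary classifier such as an SVM.
```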


Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed new super-resolution (SR) scheme achieves significant improvement compared with four state-of-the-art schemes in terms of both subjective and objective qualities.
Abstract: This paper proposes a new super-resolution (SR) scheme for landmark images by retrieving correlated web images. Using correlated web images significantly improves the exemplar-based SR. Given a low-resolution (LR) image, we extract local descriptors from its up-sampled version and bundle the descriptors according to their spatial relationship to retrieve correlated high-resolution (HR) images from the web. Though similar in content, the retrieved images are usually taken with different illumination, focal lengths, and shot perspectives, resulting in uncertainty for the HR detail approximation. To solve this problem, we first propose aligning these images to the up-sampled LR image through a global registration, which identifies the corresponding regions in these images and reduces the mismatching. Second, we propose a structure-aware matching criterion and adaptive block sizes to improve the mapping accuracy between LR and HR patches. Finally, these matched HR patches are blended together by solving an energy minimization problem to recover the desired HR image. Experimental results demonstrate that our SR scheme achieves significant improvement compared with four state-of-the-art schemes in terms of both subjective and objective qualities.

111 citations


Journal Article
TL;DR: A simple and effective full-reference color image quality measure (CQM) based on reversible luminance and chrominance (YUV) color transformation and the peak signal-to-noise ratio (PSNR) measure is proposed, which relies on the distinct response of the human eye to luminance and color.
Abstract: Various methods for measuring perceptual image quality attempt to quantify the visibility of differences between an original digital image and its distorted version using a variety of known properties of the human vision system (HVS). In this paper, we propose a simple and effective full-reference color image quality measure (CQM) based on a reversible luminance and chrominance (YUV) color transformation and the peak signal-to-noise ratio (PSNR) measure. The main motivation for this new measure is the distinct response of the human eye to luminance and color. Experimental studies of the applicability of the CQM to a well-known test image under 6 different distortions, all perceivable by the human vision system and all with the same PSNR value (27.67), are presented. The CQM results are obtained as 39.56, 38.93, 38.08, 37.43, 37.10, and 36.79 dB for the distorted images, showing that the image quality of the first image is noticeably higher than that of the others despite the identical PSNR value. This attests that using the CQM together with the traditional PSNR approach provides more discriminative results.
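A measure in the spirit of the CQM can be sketched as a weighted combination of per-channel PSNR values after a reversible RGB-to-YUV transform. The BT.601 coefficients below are standard, but the weights w_y and w_c are hypothetical placeholders; the paper's exact transform and weighting are not reproduced here.

```python
# Hedged sketch of a YUV-weighted, PSNR-based quality measure in the spirit
# of the CQM. w_y and w_c are hypothetical HVS-motivated weights, not the
# paper's values; the RGB->YUV matrix is the standard ITU-R BT.601 one.
import numpy as np

def rgb_to_yuv(img: np.ndarray) -> np.ndarray:
    m = np.array([[ 0.299,  0.587,  0.114],
                  [-0.147, -0.289,  0.436],
                  [ 0.615, -0.515, -0.100]])
    return img @ m.T

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 255.0) -> float:
    mse = np.mean((a - b) ** 2)
    return float(10 * np.log10(peak ** 2 / mse))

def cqm_like(ref_rgb, dist_rgb, w_y: float = 0.75, w_c: float = 0.125) -> float:
    ref = rgb_to_yuv(ref_rgb.astype(float))
    dist = rgb_to_yuv(dist_rgb.astype(float))
    return (w_y * psnr(ref[..., 0], dist[..., 0])     # luminance, weighted higher
            + w_c * psnr(ref[..., 1], dist[..., 1])   # chrominance U
            + w_c * psnr(ref[..., 2], dist[..., 2]))  # chrominance V
```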

Journal ArticleDOI
TL;DR: The results of this study showed that 2D FIR filters designed via ABC optimization can eliminate speckle noise quite well on noise-added test images and on intrinsically noisy ultrasound images.

Proceedings ArticleDOI
01 Oct 2013
TL;DR: A novel facial representation model for cattle based on local binary pattern (LBP) texture features is presented, several extended LBP descriptors are introduced, and the encouraging results on cattle face recognition demonstrate the model's efficiency and accuracy.
Abstract: In response to the current need for positive identification of cattle for traceability, this paper presents a novel facial representation model of cattle based on local binary pattern (LBP) texture features; some extended LBP descriptors are also introduced. Algorithm training was performed independently on several normalized gray face images of 30 cattle (with sets of six, seven, eight, and nine images each). Robust alignment by sparse and low-rank decomposition was also used to align the images because of variations in illumination, image misalignment, and occlusion in the test image. The performance of this technique was assessed on a separate set of images using the weighted Chi-square distance [1]. The encouraging results on cattle face recognition demonstrate the efficiency and accuracy of the LBP descriptor. More training sets and modified algorithms will be considered to improve recognition rates. Future work should aim at improving the automation of the system and combining the LBP histogram with other effective histograms.
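The LBP-histogram-plus-weighted-chi-square pipeline this paper builds on can be sketched as follows; the region weights and grid layout here are hypothetical, and scikit-image's `local_binary_pattern` stands in for the paper's extended descriptors.

```python
# Sketch of LBP histogram extraction and weighted chi-square matching.
# Region weights are hypothetical; the paper uses its own weighting and
# extended LBP variants.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray: np.ndarray, P: int = 8, R: float = 1.0) -> np.ndarray:
    codes = local_binary_pattern(gray, P, R, method="uniform")  # P+2 uniform labels
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def weighted_chi_square(h1: np.ndarray, h2: np.ndarray, w: float = 1.0) -> float:
    eps = 1e-10                                  # avoid division by zero
    return float(w * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

# In practice the face image is split into a grid of regions, per-region
# histograms are compared, and the weighted sums decide the nearest identity.
```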

Proceedings ArticleDOI
01 Sep 2013
TL;DR: A new set of high-quality color image sequences captured with the authors' professional digital cinema camera is presented; the set is freely available via FTP at ftp://imageset@ftp.arri.de/
Abstract: In 1991 Kodak released a set of 24 digital color images derived from a variety of film source materials. Since then, most image processing algorithms have been developed, optimized, tested and compared using this set. Until a few years ago it was considered "the" image set; however, today it shows its limitations. Researchers have expressed their need for better, more up-to-date material. We present a new set of high-quality color image sequences captured with our professional digital cinema camera. This camera stores uncompressed raw sensor data, and the set is freely available via FTP at ftp://imageset@ftp.arri.de/ password: imageset.

Journal ArticleDOI
TL;DR: The nearest-farthest subspace (NFS) classifier is proposed, which exploits both relationships to classify a test image; comparisons with the NS classifier and other state-of-the-art methods on four well-known public face databases demonstrate the good performance of the FS and NFS classifiers.

Journal ArticleDOI
TL;DR: An on-line computer-aided diagnostic system called “UroImage” that classifies a Transrectal Ultrasound (TRUS) image into cancerous or non-cancerous with the help of non-linear Higher Order Spectra (HOS) features and Discrete Wavelet Transform (DWT) coefficients is proposed.
Abstract: In this work, we have proposed an on-line computer-aided diagnostic system called “UroImage” that classifies a Transrectal Ultrasound (TRUS) image into cancerous or non-cancerous with the help of non-linear Higher Order Spectra (HOS) features and Discrete Wavelet Transform (DWT) coefficients. The UroImage system consists of an on-line system where five significant features (one DWT-based feature and four HOS-based features) are extracted from the test image. These on-line features are transformed by the classifier parameters obtained using the training dataset to determine the class. We trained and tested six classifiers. The dataset used for evaluation had 144 TRUS images which were split into training and testing sets. Three-fold and ten-fold cross-validation protocols were adopted for training and estimating the accuracy of the classifiers. The ground truth used for training was obtained using the biopsy results. Among the six classifiers, using 10-fold cross-validation technique, Support Vector Machin...

Journal ArticleDOI
TL;DR: This work aims to differentiate a stego image from its cover image based on steganalysis results of decomposed image blocks, and finds a classifier for each class to decide whether a block is from a cover or stego image.

Journal ArticleDOI
TL;DR: This work proposes a Laplacian Joint Group Lasso model to jointly reconstruct the regions within a test image with a set of labeled training data, and extends the LJGL model to a kernel version in order to achieve non-linear reconstruction.

Patent
05 Jul 2013
TL;DR: In this article, an authentication method and an authentication system are provided; the method includes providing a test image in a first state and obtaining the test image in a second state in response to a rotating operation.
Abstract: An authentication method and an authentication system are provided. The authentication method includes the following steps. A test image is provided in a first state. The test image is obtained in a second state in response to a rotating operation. A difference value is calculated between each of the image hash values of the test image in the second state and those of the test image in a third state. The authentication is determined to be successful if the difference value is less than a threshold value, wherein the third state is the state in which the test image is upright.

Journal ArticleDOI
TL;DR: A novel feature descriptor, Local Polar DCT Features (LPDF), which is robust to a variety of image transformations, even with very low dimensions is presented.
Abstract: We present a novel feature descriptor, Local Polar DCT Features (LPDF), which is robust to a variety of image transformations. Specifically, the local patch is quantized in the designed polar geometric structure and the 2-D DCT features are then extracted and rearranged. A subset of the resulting DCT coefficients is selected as our compact LPDF descriptor. We perform a comprehensive performance evaluation with state-of-the-art methods, i.e., SIFT, DAISY, LIOP, and GLOH, on the standard Oxford dataset and two additional test image pairs. Experimental results demonstrate the superiority of the proposed descriptor under various image transformations, even with very low dimensions.
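A rough sketch of the descriptor's structure, polar resampling of the patch followed by a 2-D DCT and coefficient selection, is given below. The grid size, nearest-neighbor sampling, and the way low-order coefficients are kept are all hypothetical choices; the paper's exact quantization and selection are not reproduced.

```python
# Hedged sketch of an LPDF-like descriptor: resample the patch on a polar
# grid, take the 2-D DCT, keep a small low-order subset. Grid size and
# coefficient selection are hypothetical choices.
import numpy as np
from scipy.fft import dctn

def lpdf_like(patch: np.ndarray, n_r: int = 8, n_theta: int = 16, keep: int = 32):
    h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rmax = min(cy, cx)
    rs = np.linspace(rmax / n_r, rmax, n_r)                  # radial samples
    ts = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)  # angular samples
    ys = np.round(cy + rs[:, None] * np.sin(ts)).astype(int)
    xs = np.round(cx + rs[:, None] * np.cos(ts)).astype(int)
    polar = patch[ys, xs]                                    # (n_r, n_theta) polar patch
    coeffs = dctn(polar, norm="ortho")                       # 2-D DCT
    return coeffs.ravel()[:keep]                             # compact low-order subset
```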

Proceedings ArticleDOI
15 Jul 2013
TL;DR: The main idea is that the time-series curve of a 3D hand gesture is divided into various finger combinations, called 'fingerlets', which can either be learned or be set manually to represent each gesture and to capture inter-class variations.
Abstract: 3D Human Computer Interaction (HCI) is becoming more and more popular thanks to the emergence of commercial depth cameras. Moreover, hand gestures provide a natural and attractive alternative to cumbersome interface devices for HCI. In this paper, we present an Image-to-Class Dynamic Time Warping (I2C-DTW) approach for 3D hand gesture recognition. The main idea is that we divide the time-series curve of a 3D hand gesture into various finger combinations, called 'fingerlets', which can either be learned or be set manually to represent each gesture and to capture inter-class variations. Furthermore, the I2C-DTW approach searches for the minimal path to warp two fingerlets, one from the test image and one from the specific class. Gesture recognition then uses the ensemble of multiple image-to-class DTW distances over fingerlets to obtain better performance. The proposed approach is evaluated on two 3D hand gesture datasets, and the experimental results show that the proposed I2C-DTW approach significantly improves recognition performance.
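The warping step at the heart of this approach is standard dynamic time warping, sketched below for 1-D sequences; fingerlet extraction and the image-to-class aggregation are not reproduced here.

```python
# Minimal DTW sketch: cost of optimally warping two 1-D sequences.
# Fingerlet extraction and the image-to-class ensemble are not shown.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)   # accumulated-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return float(D[n, m])

# Image-to-class recognition aggregates such distances between the test
# image's fingerlets and each class's fingerlet set, then picks the minimum.
```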

Journal ArticleDOI
TL;DR: This paper presents experimental proof of the accuracy of the Eigenfaces approach, which directly classifies a test image as belonging to one of the six standard expressions: anger, disgust, fear, happiness, sadness, or surprise.

01 Jan 2013
TL;DR: This paper mainly addresses the methodological analysis of the Principal Component Analysis (PCA) method, presenting a comprehensive discussion of PCA and simulating it on some data sets using MATLAB.
Abstract: Principal Components Analysis (PCA) is a practical and standard statistical tool in modern data analysis that has found application in different areas such as face recognition, image compression and neuroscience. It has been called one of the most valuable results from applied linear algebra. PCA is a straightforward, non-parametric method for extracting pertinent information from confusing data sets. It presents a roadmap for how to reduce a complex data set to a lower dimension to disclose the hidden, simplified structures that often underlie it. This paper mainly addresses the methodological analysis of the Principal Component Analysis (PCA) method. PCA is a statistical approach for reducing the number of variables which is most widely used in face recognition. In PCA, every image in the training set is represented as a linear combination of weighted eigenvectors called eigenfaces. These eigenvectors are obtained from the covariance matrix of a training image set. The weights are found after selecting a set of the most relevant eigenfaces. Recognition is performed by projecting a test image onto the subspace spanned by the eigenfaces, and classification is then done by measuring the minimum Euclidean distance. In this paper we present a comprehensive discussion of PCA and also simulate it on some data sets using MATLAB.
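The projection-and-nearest-neighbor scheme described above is easy to sketch with NumPy (the paper simulates PCA in MATLAB; Python is used here for consistency with the other sketches). Training images are assumed flattened into rows; the names are illustrative.

```python
# Minimal eigenface sketch: learn the subspace from training images, project
# a test image, and classify by minimum Euclidean distance in weight space.
import numpy as np

def train_eigenfaces(X: np.ndarray, k: int):
    """X: (n_images, n_pixels) flattened training faces; keep top-k eigenfaces."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # eigenvectors of covariance
    eigenfaces = Vt[:k]                                # (k, n_pixels)
    weights = Xc @ eigenfaces.T                        # training projections
    return mean, eigenfaces, weights

def classify(x_test: np.ndarray, mean, eigenfaces, weights, labels):
    w = (x_test - mean) @ eigenfaces.T                 # project onto the subspace
    dists = np.linalg.norm(weights - w, axis=1)        # Euclidean distances
    return labels[int(np.argmin(dists))]               # nearest training face's label
```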

Patent
17 Jan 2013
TL;DR: In this article, a method for reconstructing a time series of images from data acquired with a medical imaging system is provided, where the acquired data and selected image block set are then used to jointly estimate a plurality of images that form a time-series of images while promoting locally low rank structure in the images.
Abstract: A method for reconstructing a time series of images from data acquired with a medical imaging system is provided. Data is acquired with the medical imaging system, and a set of image blocks that defines the location and size of each of a plurality of image blocks in the image domain is then selected. The acquired data and selected image block set are then used to jointly estimate a plurality of images that form a time series of images while promoting locally low-rank structure in the images.

Journal ArticleDOI
TL;DR: Experimental results on three publicly available benchmark datasets show that in all scenarios the structured models lead to more accurate predictions, and leverage user input much more effectively than state-of-the-art independent models.
Abstract: We propose structured prediction models for image labeling that explicitly take into account dependencies among image labels. In our tree-structured models, image labels are nodes, and edges encode dependency relations. To allow for more complex dependencies, we combine labels in a single node and use mixtures of trees. Our models are more expressive than independent predictors, and lead to more accurate label predictions. The gain becomes more significant in an interactive scenario where a user provides the value of some of the image labels at test time. Such an interactive scenario offers an interesting tradeoff between label accuracy and manual labeling effort. The structured models are used to decide which labels should be set by the user, and transfer the user input to more accurate predictions on other image labels. We also apply our models to attribute-based image classification, where attribute predictions of a test image are mapped to class probabilities by means of a given attribute-class mapping. Experimental results on three publicly available benchmark datasets show that in all scenarios our structured models lead to more accurate predictions, and leverage user input much more effectively than state-of-the-art independent models.

Journal ArticleDOI
TL;DR: An Eigenvector-based system is presented to recognize facial expressions from digital facial images; recognition is performed by calculating the minimum Euclidean distance between the test image and the different expressions.
Abstract: In this paper, an Eigenvector-based system is presented to recognize facial expressions from digital facial images. In the approach, the images were first acquired, and five significant portions were cropped from each image to extract and store the Eigenvectors specific to the expressions. The Eigenvectors for the test images were also computed, and finally the input facial image was recognized by finding the expression with the minimum Euclidean distance to the test image. A human face carries a lot of important information while people interact with one another. In social interaction, the most common communicative hint is given by one's facial expression. Facial expressions have been studied extensively, mainly in psychology. According to Mehrabian's study (1), facial expressions convey 55% of the message in human communication, compared with 7% conveyed by linguistic language and 38% by paralanguage.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: The proposed algorithm spatially clusters eye tracking data obtained in an image into different coherent groups and subsequently models the likelihood of the clusters containing faces and text using a fully connected Markov Random Field (MRF).
Abstract: Eye movement studies have confirmed that overt attention is highly biased towards faces and text regions in images. In this paper we explore a novel problem of predicting face and text regions in images using eye tracking data from multiple subjects. The problem is challenging as we aim to predict the semantics (face/text/background) only from eye tracking data without utilizing any image information. The proposed algorithm spatially clusters eye tracking data obtained in an image into different coherent groups and subsequently models the likelihood of the clusters containing faces and text using a fully connected Markov Random Field (MRF). Given the eye tracking data from a test image, it predicts potential face/head (humans, dogs and cats) and text locations reliably. Furthermore, the approach can be used to select regions of interest for further analysis by object detectors for faces and text. The hybrid eye position/object detector approach achieves better detection performance and reduced computation time compared to using only the object detection algorithm. We also present a new eye tracking dataset on 300 images selected from ICDAR, Street-view, Flickr and Oxford-IIIT Pet Dataset from 15 subjects.

Patent
03 Apr 2013
TL;DR: In this article, a method for detecting optical-axis offset of a lens in equipment is presented, where a standard image acquiring module is used for focusing a standard lens assembled in the equipment at a pickup position, picking up an image sample, and acquiring an image of the image sample.
Abstract: The invention discloses a device and a method for detecting optical-axis offset of a lens in equipment. The device comprises a standard image acquiring module, a reference coordinate system setup module, a test image acquiring module, a test cursor position determining module and an optical-axis offset detecting module, wherein the standard image acquiring module is used for focusing a standard lens assembled in the equipment at a pickup position, picking up an image sample and acquiring a standard image of the image sample; the reference coordinate system setup module is used for taking the center of the standard image as coordinate origin and setting up a reference coordinate system; the test image acquiring module is used for focusing a to-be-detected lens assembled in the equipment at the pickup position, picking up the image sample and acquiring a test image of the image sample; the test cursor position determining module is used for taking the center of the test image as test cursor and determining the position of the test cursor in the reference coordinate system; and the optical-axis offset detecting module is used for determining optical-axis offset and/or optical-axis offset angle of the to-be-detected lens according to the position. The device and the method for detecting optical-axis offset of the lens can solve the technical problem that the optical-axis offset of the lens in the equipment cannot be detected during assembly.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper proposes a novel, robust approach to dramatically enhance co-detection by extracting a shared low-rank representation of the object instances in multiple feature spaces, analogous to that of the well-known Robust PCA.
Abstract: Object co-detection aims at simultaneous detection of objects of the same category from a pool of related images by exploiting consistent visual patterns present in candidate objects in the images. The related image set may contain a mixture of annotated objects and candidate objects generated by automatic detectors. Co-detection differs from the conventional object detection paradigm in which detection over each test image is determined one-by-one independently without taking advantage of common patterns in the data pool. In this paper, we propose a novel, robust approach to dramatically enhance co-detection by extracting a shared low-rank representation of the object instances in multiple feature spaces. The idea is analogous to that of the well-known Robust PCA, but has not been explored in object co-detection so far. The representation is based on a linear reconstruction over the entire data set and the low-rank approach enables effective removal of noisy and outlier samples. The extracted low-rank representation can be used to detect the target objects by spectral clustering. Extensive experiments over diverse benchmark datasets demonstrate consistent and significant performance gains of the proposed method over the state-of-the-art object co-detection method and the generic object detection methods without co-detection formulations.

Journal ArticleDOI
TL;DR: An image matching method based on affine transformation of local image areas is proposed that provides significant improvement in robustness for different viewpoint images matching in the 2D scene and 3D scene.
Abstract: In recent years, many methods have been put forward to improve image matching for different viewpoint images. However, these methods are still not able to achieve stable results, especially when a large variation in view occurs. In this paper, an image matching method based on affine transformation of local image areas is proposed. First, local stable regions are extracted from the reference image and the test image, and transformed to circular areas according to the second-order moment. Then, scale-invariant features are detected and matched in the transformed regions. Finally, we use an epipolar constraint based on the fundamental matrix to eliminate wrong corresponding pairs. The goal of our method is not to increase the invariance of the detector but to improve the final performance of the matching results. The experimental results demonstrate that, compared with traditional detectors, the proposed method provides a significant improvement in robustness for different-viewpoint image matching in 2D and 3D scenes. Moreover, the efficiency is greatly improved compared with the affine scale-invariant feature transform (Affine-SIFT).
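The final pruning step, eliminating wrong correspondences with an epipolar constraint, can be sketched with OpenCV's RANSAC fundamental-matrix estimator; the region extraction and moment-based normalization stages are not reproduced, and the thresholds are hypothetical.

```python
# Sketch of epipolar outlier removal with a RANSAC-estimated fundamental
# matrix (OpenCV). Thresholds are hypothetical; earlier stages (region
# extraction, moment normalization, SIFT matching) are not shown.
import cv2
import numpy as np

def filter_matches(pts_ref: np.ndarray, pts_test: np.ndarray):
    """pts_ref, pts_test: (N, 2) float arrays of matched keypoint coordinates."""
    F, inlier_mask = cv2.findFundamentalMat(
        pts_ref, pts_test, cv2.FM_RANSAC, 1.0, 0.99)
    if F is None:                      # estimation can fail with too few matches
        return pts_ref[:0], pts_test[:0], None
    keep = inlier_mask.ravel().astype(bool)
    return pts_ref[keep], pts_test[keep], F
```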