
Showing papers on "Standard test image" published in 2012


Journal ArticleDOI
TL;DR: An efficient general-purpose blind/no-reference image quality assessment (IQA) algorithm using a natural scene statistics model of discrete cosine transform (DCT) coefficients, which requires minimal training and adopts a simple probabilistic model for score prediction.
Abstract: We develop an efficient general-purpose blind/no-reference image quality assessment (IQA) algorithm using a natural scene statistics (NSS) model of discrete cosine transform (DCT) coefficients. The algorithm is computationally appealing, given the availability of platforms optimized for DCT computation. The approach relies on a simple Bayesian inference model to predict image quality scores given certain extracted features. The features are based on an NSS model of the image DCT coefficients. The estimated parameters of the model are utilized to form features that are indicative of perceptual quality. These features are used in a simple Bayesian inference approach to predict quality scores. The resulting algorithm, which we name BLIINDS-II, requires minimal training and adopts a simple probabilistic model for score prediction. Given the extracted features from a test image, the quality score that maximizes the probability of the empirically determined inference model is chosen as the predicted quality score of that image. When tested on the LIVE IQA database, BLIINDS-II is shown to correlate highly with human judgments of quality, at a level that is competitive with the popular SSIM index.
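
A minimal sketch of the kind of DCT-domain natural-scene-statistics features the abstract describes, assuming 8x8 block DCTs and a moment-matched generalized-Gaussian shape parameter; the actual BLIINDS-II feature set and its Bayesian score prediction are richer than this.

```python
import numpy as np
from scipy.fftpack import dct
from scipy.special import gamma
from scipy.optimize import minimize_scalar

def block_dct(img, block=8):
    """2-D DCT of non-overlapping blocks (image trimmed to a multiple of `block`)."""
    h, w = img.shape
    img = img[: h - h % block, : w - w % block]
    blocks = img.reshape(h // block, block, -1, block).swapaxes(1, 2)
    return dct(dct(blocks, axis=-1, norm='ortho'), axis=-2, norm='ortho')

def ggd_shape(coeffs):
    """Estimate the generalized-Gaussian shape parameter by moment matching."""
    coeffs = coeffs.ravel()
    rho = np.mean(np.abs(coeffs)) ** 2 / (np.mean(coeffs ** 2) + 1e-12)
    # invert rho(g) = Gamma(2/g)^2 / (Gamma(1/g) Gamma(3/g)) numerically
    obj = lambda g: (gamma(2 / g) ** 2 / (gamma(1 / g) * gamma(3 / g)) - rho) ** 2
    return minimize_scalar(obj, bounds=(0.05, 10), method='bounded').x

def nss_dct_features(img):
    """Two BLIINDS-style features: pooled GGD shape of the AC coefficients and
    the coefficient of variation of per-block AC energy."""
    blocks = block_dct(np.asarray(img, dtype=float))
    ac = blocks.copy()
    ac[..., 0, 0] = 0.0                      # drop the DC term of each block
    energies = (ac ** 2).sum(axis=(-1, -2))
    return np.array([ggd_shape(ac),
                     energies.std() / (energies.mean() + 1e-12)])
```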

1,484 citations


Journal ArticleDOI
TL;DR: This work proposes a conceptually simple face recognition system that achieves a high degree of robustness and stability to illumination variation, image misalignment, and partial occlusion, and demonstrates how to capture a set of training images with enough illumination variation that they span test images taken under uncontrolled illumination.
Abstract: Many classic and contemporary face recognition algorithms work well on public data sets, but degrade sharply when they are used in a real recognition system. This is mostly due to the difficulty of simultaneously handling variations in illumination, image misalignment, and occlusion in the test image. We consider a scenario where the training images are well controlled and test images are only loosely controlled. We propose a conceptually simple face recognition system that achieves a high degree of robustness and stability to illumination variation, image misalignment, and partial occlusion. The system uses tools from sparse representation to align a test face image to a set of frontal training images. The region of attraction of our alignment algorithm is computed empirically for public face data sets such as Multi-PIE. We demonstrate how to capture a set of training images with enough illumination variation that they span test images taken under uncontrolled illumination. In order to evaluate how our algorithms work under practical testing conditions, we have implemented a complete face recognition system, including a projector-based training acquisition system. Our system can efficiently and effectively recognize faces under a variety of realistic conditions, using only frontal images under the proposed illuminations as training.

669 citations


Proceedings ArticleDOI
16 Jun 2012
TL;DR: This work proposes a novel low-rank matrix approximation algorithm with structural incoherence for robust face recognition that decomposes raw training data into a set of representative basis with corresponding sparse errors for better modeling the face images.
Abstract: We address the problem of robust face recognition, in which both training and test image data might be corrupted due to occlusion and disguise. From standard face recognition algorithms such as Eigenfaces to recently proposed sparse representation-based classification (SRC) methods, most prior works did not consider possible contamination of data during training, and thus the associated performance might be degraded. Based on the recent success of low-rank matrix recovery, we propose a novel low-rank matrix approximation algorithm with structural incoherence for robust face recognition. Our method not only decomposes raw training data into a set of representative bases with corresponding sparse errors for better modeling of the face images, but also advocates structural incoherence between the bases learned from different classes. These bases are encouraged to be as independent as possible due to the regularization on structural incoherence. We show that this provides additional discriminating ability to the original low-rank models for improved performance. Experimental results on public face databases verify the effectiveness and robustness of our method, which is also shown to outperform state-of-the-art SRC based approaches.

227 citations


Journal ArticleDOI
01 Nov 2012
TL;DR: Important findings include 1) the clear advantage gained in complementing texture with border and geometry features, compared to using texture information only, and 2) higher contribution of texture features than border-based features in the optimized feature set.
Abstract: This paper presents a novel computer-aided diagnosis system for melanoma. The novelty lies in the optimized selection and integration of features derived from textural, border-based, and geometrical properties of the melanoma lesion. The texture features are derived using wavelet decomposition, the border features are derived by constructing a boundary-series model of the lesion border and analyzing it in the spatial and frequency domains, and the geometry features are derived from shape indexes. The optimized selection of features is achieved using the gain-ratio method, which is shown to be computationally efficient for the melanoma diagnosis application. Classification is done using four classifiers, namely support vector machine, random forest, logistic model tree, and hidden naive Bayes. The proposed diagnostic system is applied to a set of 289 dermoscopy images (114 malignant, 175 benign) partitioned into train, validation, and test image sets. The system achieves an accuracy of 91.26% and an area-under-curve value of 0.937 when 23 features are used. Other important findings include 1) the clear advantage gained in complementing texture with border and geometry features, compared to using texture information only, and 2) the higher contribution of texture features than border-based features in the optimized feature set.
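
As a rough illustration of the select-then-classify pipeline (not the paper's implementation), the sketch below uses scikit-learn with mutual information as a stand-in for the gain-ratio ranking, an RBF SVM as one of the four classifiers, and random placeholder features in place of the wavelet, border, and geometry descriptors.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

# placeholder features and labels; y = 1 for malignant, 0 for benign
rng = np.random.default_rng(0)
X, y = rng.normal(size=(289, 60)), rng.integers(0, 2, 289)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

clf = make_pipeline(
    StandardScaler(),
    SelectKBest(mutual_info_classif, k=23),   # stand-in for the gain-ratio ranking
    SVC(kernel='rbf', probability=True))
clf.fit(X_tr, y_tr)

prob = clf.predict_proba(X_te)[:, 1]
print('accuracy:', accuracy_score(y_te, (prob > 0.5).astype(int)))
print('AUC     :', roc_auc_score(y_te, prob))
```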

163 citations


Book ChapterDOI
04 Jan 2012
TL;DR: This work proposes several ways to combine recent image-level and segment-level techniques to predict both image and segment labels jointly and confirms that the two levels offer complementary information.
Abstract: For the task of assigning labels to an image to summarize its contents, many early attempts use segment-level information and try to determine which parts of the images correspond to which labels. The best-performing methods use global image similarity and nearest neighbor techniques to transfer labels from training images to test images. However, global methods cannot localize the labels in the images, unlike segment-level methods. Also, they cannot take advantage of training images that are only locally similar to a test image. We propose several ways to combine recent image-level and segment-level techniques to predict both image and segment labels jointly. We cast our experimental study in a unified framework for both image-level and segment-level annotation tasks. On three challenging datasets, our joint prediction of image and segment labels outperforms either prediction alone on both tasks. This confirms that the two levels offer complementary information.

140 citations


Proceedings ArticleDOI
16 Jun 2012
TL;DR: A simplified and computationally-efficient version of the recent 2D-to-3D image conversion algorithm, which is validated quantitatively on a Kinect-captured image+depth dataset against the Make3D algorithm.
Abstract: Among 2D-to-3D image conversion methods, those involving human operators have been most successful but also time-consuming and costly. Automatic methods, that typically make use of a deterministic 3D scene model, have not yet achieved the same level of quality as they often rely on assumptions that are easily violated in practice. In this paper, we adopt the radically different approach of “learning” the 3D scene structure. We develop a simplified and computationally-efficient version of our recent 2D-to-3D image conversion algorithm. Given a repository of 3D images, either as stereopairs or image+depth pairs, we find k pairs whose photometric content most closely matches that of a 2D query to be converted. Then, we fuse the k corresponding depth fields and align the fused depth with the 2D query. Unlike in our original work, we validate the simplified algorithm quantitatively on a Kinect-captured image+depth dataset against the Make3D algorithm. While far from perfect, the presented results demonstrate that online repositories of 3D content can be used for effective 2D-to-3D image conversion.
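
A toy sketch of the repository-matching step described above, assuming precomputed global descriptors and aligned depth maps for the repository images; the paper's alignment of the fused depth to the 2D query is omitted.

```python
import numpy as np

def estimate_depth(query_feat, repo_feats, repo_depths, k=5):
    """Median-fuse the depth maps of the k repository images whose global
    descriptors are closest to the query's (a crude stand-in for the paper's
    photometric matching and depth-alignment steps)."""
    d = np.linalg.norm(repo_feats - query_feat, axis=1)
    nearest = np.argsort(d)[:k]
    return np.median(repo_depths[nearest], axis=0)

# toy usage: 100 repository images with 128-D descriptors and 48x64 depth maps
rng = np.random.default_rng(1)
repo_feats  = rng.normal(size=(100, 128))
repo_depths = rng.random(size=(100, 48, 64))
depth = estimate_depth(rng.normal(size=128), repo_feats, repo_depths)
print(depth.shape)   # (48, 64)
```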

131 citations


Journal ArticleDOI
TL;DR: The proposed approach for matching low-resolution probe images with higher resolution gallery images, which are often available during enrollment, using Multidimensional Scaling (MDS), improves the matching performance significantly as compared to performing matching in the low- resolution domain or using super-resolution techniques to obtain a higher resolution test image prior to recognition.
Abstract: Face recognition performance degrades considerably when the input images are of Low Resolution (LR), as is often the case for images taken by surveillance cameras or from a large distance. In this paper, we propose a novel approach for matching low-resolution probe images with higher resolution gallery images, which are often available during enrollment, using Multidimensional Scaling (MDS). The ideal scenario is when both the probe and gallery images are of high enough resolution to discriminate across different subjects. The proposed method simultaneously embeds the low-resolution probe images and the high-resolution gallery images in a common space such that the distance between them in the transformed space approximates the distance had both the images been of high resolution. The two mappings are learned simultaneously from high-resolution training images using an iterative majorization algorithm. Extensive evaluation of the proposed approach on the Multi-PIE data set with probe image resolution as low as 8 × 6 pixels illustrates the usefulness of the method. We show that the proposed approach improves the matching performance significantly as compared to performing matching in the low-resolution domain or using super-resolution techniques to obtain a higher resolution test image prior to recognition. Experiments on low-resolution surveillance images from the Surveillance Cameras Face Database further highlight the effectiveness of the approach.

129 citations


Proceedings ArticleDOI
16 Jun 2012
TL;DR: A novel technique for figure-ground segmentation, where the goal is to separate all foreground objects in a test image from the background, by transferring segmentation masks from training windows that are visually similar to windows in the test image.
Abstract: We present a novel technique for figure-ground segmentation, where the goal is to separate all foreground objects in a test image from the background. We decompose the test image and all images in a supervised training set into overlapping windows likely to cover foreground objects. The key idea is to transfer segmentation masks from training windows that are visually similar to windows in the test image. These transferred masks are then used to derive the unary potentials of a binary, pairwise energy function defined over the pixels of the test image, which is minimized with standard graph-cuts. This results in a fully automatic segmentation scheme, as opposed to interactive techniques based on similar energy functions. Using windows as support regions for transfer efficiently exploits the training data, as the test image does not need to be globally similar to a training image for the method to work. This makes it possible to compose novel scenes using local parts of training images. Our approach obtains very competitive results on three datasets (PASCAL VOC 2010 segmentation challenge, Weizmann horses, Graz-02).
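
A much-simplified sketch of the window-based mask transfer idea, assuming equal-sized windows and pre-extracted window descriptors, and replacing the graph-cut minimization with a plain threshold on the accumulated foreground votes.

```python
import numpy as np

def transfer_foreground(test_windows, test_feats, train_feats, train_masks, shape, k=3):
    """Accumulate a foreground probability map by copying the masks of the k
    most similar training windows into each test window (a rough stand-in for
    the paper's unary potentials; the pairwise term and graph cut are omitted)."""
    votes = np.zeros(shape, dtype=float)
    hits = np.zeros(shape, dtype=float)
    for (x0, y0, x1, y1), feat in zip(test_windows, test_feats):
        d = np.linalg.norm(train_feats - feat, axis=1)
        for idx in np.argsort(d)[:k]:
            # in general the mask would be resized to the window; equal sizes assumed here
            votes[y0:y1, x0:x1] += train_masks[idx]
            hits[y0:y1, x0:x1] += 1.0
    prob = np.divide(votes, hits, out=np.zeros_like(votes), where=hits > 0)
    return prob > 0.5
```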

128 citations


Journal ArticleDOI
TL;DR: A face recognition algorithm based on simultaneous sparse approximations under varying illumination and pose that has the ability to recognize human faces with high accuracy even when only a single or a very few images per person are provided for training.
Abstract: We present a face recognition algorithm based on simultaneous sparse approximations under varying illumination and pose. A dictionary is learned for each class based on given training examples which minimizes the representation error with a sparseness constraint. A novel test image is projected onto the span of the atoms in each learned dictionary. The resulting residual vectors are then used for classification. To handle variations in lighting conditions and pose, an image relighting technique based on pose-robust albedo estimation is used to generate multiple frontal images of the same person with variable lighting. As a result, the proposed algorithm has the ability to recognize human faces with high accuracy even when only a single or a very few images per person are provided for training. The effectiveness of the proposed method is demonstrated using publicly available databases, and it is shown that this method is efficient and can perform significantly better than many competitive face recognition algorithms.
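
A small sketch of the residual-based classification step, assuming one precomputed dictionary per class (columns are atoms) and using scikit-learn's OMP for the sparse projection; the paper's relighting stage and dictionary learning are not shown.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def classify_by_residual(x, dictionaries, n_nonzero=10):
    """Sparse-code the test vector x against each class dictionary and return
    the class with the smallest reconstruction residual."""
    residuals = []
    for D in dictionaries:                       # D: (n_pixels, n_atoms) for one class
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero, fit_intercept=False)
        omp.fit(D, x)
        residuals.append(np.linalg.norm(x - D @ omp.coef_))
    return int(np.argmin(residuals))
```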

113 citations


Proceedings ArticleDOI
25 Mar 2012
TL;DR: This work proposes a hierarchical approach to age estimation from face images, where face images are divided into various age groups and then a separate regression model is learned for each group.
Abstract: We consider the problem of automatic age estimation from face images. Age estimation is usually formulated as a regression problem relating the facial features and the age variable, and a single regression model is learnt for all ages. We propose a hierarchical approach, where we first divide the face images into various age groups and then learn a separate regression model for each group. Given a test image, we first classify the image into one of the age groups and then use the regression model for that particular group. To improve our classification result, we use many different classifiers and fuse them using the majority rule. Experiments show that our approach outperforms many state-of-the-art regression methods for age estimation.
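
A compact sketch of the hierarchical idea with scikit-learn, using illustrative age-group boundaries, a single random-forest group classifier in place of the paper's fused classifiers, and one SVR per group.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVR

class HierarchicalAgeEstimator:
    """Classify a face into an age group, then regress with that group's model
    (boundaries and models are illustrative choices, not the paper's)."""

    def __init__(self, bins=(0, 20, 40, 60, 120)):
        self.bins = np.asarray(bins)
        self.group_clf = RandomForestClassifier(n_estimators=100, random_state=0)
        self.regressors = {}

    def fit(self, X, ages):
        groups = np.digitize(ages, self.bins[1:-1])      # group index per sample
        self.group_clf.fit(X, groups)
        for g in np.unique(groups):
            self.regressors[g] = SVR(kernel='rbf', C=10.0).fit(X[groups == g], ages[groups == g])
        return self

    def predict(self, X):
        groups = self.group_clf.predict(X)
        return np.array([self.regressors[g].predict(x[None])[0]
                         for g, x in zip(groups, X)])
```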

91 citations


Journal ArticleDOI
TL;DR: The proposed hashing algorithm shows superior robustness and discrimination performance compared with other state-of-the-art algorithms, particularly in the robustness against rotations (of large degrees).
Abstract: In this paper, we propose a robust-hash function based on random Gabor filtering and dithered lattice vector quantization (LVQ). In order to enhance the robustness against rotation manipulations, the conventional Gabor filter is adapted to be rotation invariant, and the rotation-invariant filter is randomized to facilitate secure feature extraction. Particularly, a novel dithered-LVQ-based quantization scheme is proposed for robust hashing. The dithered-LVQ-based quantization scheme is well suited for robust hashing with several desirable features, including better tradeoff between robustness and discrimination, higher randomness, and secrecy, which are validated by analytical and experimental results. The performance of the proposed hashing algorithm is evaluated over a test image database under various content-preserving manipulations. The proposed hashing algorithm shows superior robustness and discrimination performance compared with other state-of-the-art algorithms, particularly in the robustness against rotations (of large degrees).
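
A hedged sketch of a Gabor-based perceptual hash in the same spirit: orientation-averaged Gabor magnitudes approximate rotation invariance, and a median threshold over grid cells stands in for the paper's randomized filtering and dithered lattice vector quantization.

```python
import numpy as np
from skimage.filters import gabor

def gabor_hash(img, frequency=0.2, grid=8, n_orient=4):
    """Coarse perceptual hash: pool orientation-averaged Gabor magnitudes over
    a grid of cells and binarize against the median."""
    img = np.asarray(img, dtype=float)
    mag = np.zeros_like(img)
    for theta in np.linspace(0.0, np.pi, n_orient, endpoint=False):
        re, im = gabor(img, frequency=frequency, theta=theta)
        mag += np.hypot(re, im)                  # orientation-summed magnitude
    h, w = mag.shape
    ch, cw = h // grid, w // grid
    cells = mag[: ch * grid, : cw * grid].reshape(grid, ch, grid, cw).mean(axis=(1, 3))
    return (cells > np.median(cells)).astype(np.uint8).ravel()   # grid*grid bits
```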

Journal ArticleDOI
TL;DR: The results of several experiments show that the proposed algorithm for image cryptosystems provides an efficient and secure approach to real-time image encryption and transmission.
Abstract: We propose a new and efficient method to develop secure image-encryption techniques. The new algorithm combines two techniques: encryption and compression. In this technique, a wavelet transform was used to decompose the image and decorrelate its pixels into approximation and detail components. The more important component (the approximation component) is encrypted using a chaos-based encryption algorithm. This algorithm produces a cipher of the test image that has good diffusion and confusion properties. The remaining components (the detail components) are compressed using a wavelet transform. This proposed algorithm was verified to provide a high security level. A complete specification for the new algorithm is provided. Several test images are used to demonstrate the validity of the proposed algorithm. The results of several experiments show that the proposed algorithm for image cryptosystems provides an efficient and secure approach to real-time image encryption and transmission.
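
A minimal sketch of the decompose-then-encrypt structure the abstract describes, with PyWavelets for the DWT and a logistic-map keystream as a generic chaos-based cipher; the quantization, the specific chaotic system, and the compression of the detail bands are illustrative simplifications, not the paper's.

```python
import numpy as np
import pywt

def logistic_keystream(n, x0=0.3141, r=3.99):
    """Byte keystream from the logistic map (illustrative parameters)."""
    x, out = x0, np.empty(n, dtype=np.uint8)
    for i in range(n):
        x = r * x * (1.0 - x)
        out[i] = int(x * 256) % 256
    return out

def encrypt_image(img):
    """Decompose with a 2-D DWT, XOR-encrypt only the approximation band, and
    return the detail bands untouched for separate compression."""
    cA, (cH, cV, cD) = pywt.dwt2(np.asarray(img, dtype=float), 'haar')
    q = np.clip(cA / 2.0, 0, 255).astype(np.uint8)     # crude 8-bit quantization
    cipher = q ^ logistic_keystream(q.size).reshape(q.shape)
    return cipher, (cH, cV, cD)
```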

Journal ArticleDOI
TL;DR: This work develops a novel data-driven iterative algorithm that combines the best of both generative and discriminative approaches and introduces the notion of a “pull-back” operation that enables it to predict the parameters of the test image using training samples that are not in its neighborhood in the parameter space.
Abstract: Image alignment in the presence of non-rigid distortions is a challenging task. Typically, this involves estimating the parameters of a dense deformation field that warps a distorted image back to its undistorted template. Generative approaches based on parameter optimization such as Lucas-Kanade can get trapped within local minima. On the other hand, discriminative approaches like nearest-neighbor require a large number of training samples that grows exponentially with respect to the dimension of the parameter space, and polynomially with the desired accuracy 1/ε. In this work, we develop a novel data-driven iterative algorithm that combines the best of both generative and discriminative approaches. For this, we introduce the notion of a "pull-back" operation that enables us to predict the parameters of the test image using training samples that are not in its neighborhood (not ε-close) in the parameter space. We prove that our algorithm converges to the global optimum using a significantly lower number of training samples that grows only logarithmically with the desired accuracy. We analyze the behavior of our algorithm extensively using synthetic data and demonstrate successful results on experiments with complex deformations due to water and clothing.

Book ChapterDOI
07 Oct 2012
TL;DR: This work develops a learning framework to jointly consider object classes and their relations in image retrieval with structured object queries --- queries that specify the objects that should be present in the scene, and their spatial relations.
Abstract: We consider image retrieval with structured object queries --- queries that specify the objects that should be present in the scene, and their spatial relations. An example of such queries is "car on the road". Existing image retrieval systems typically consider queries consisting of object classes (i.e. keywords). They train a separate classifier for each object class and combine the output heuristically. In contrast, we develop a learning framework to jointly consider object classes and their relations. Our method considers not only the objects in the query ("car" and "road" in the above example) but also related object categories that can be useful for retrieval. Since we do not have ground-truth labeling of object bounding boxes on the test image, we represent them as latent variables in our model. Our learning method is an extension of the ranking SVM with latent variables, which we call latent ranking SVM. We demonstrate image retrieval and ranking results on a dataset with more than a hundred object classes.

Book ChapterDOI
07 Oct 2012
TL;DR: The proposed model combines the support from all the linking features observed in a test image to infer the most likely joint configuration of all the parts of interest, and generality is shown by applying it without modification to part detection on datasets of animal parts and of facial fiducial points.
Abstract: We present an approach to the detection of parts of highly deformable objects, such as the human body. Instead of using kinematic constraints on relative angles used by most existing approaches for modeling part-to-part relations, we learn and use special observed 'linking' features that support particular pairwise part configurations. In addition to modeling the appearance of individual parts, the current approach adds modeling of the appearance of part-linking, which is shown to provide useful information. For example, configurations of the lower and upper arms are supported by observing corresponding appearances of the elbow or other relevant features. The proposed model combines the support from all the linking features observed in a test image to infer the most likely joint configuration of all the parts of interest. The approach is trained using images with annotated parts, but no a priori known part connections or connection parameters are assumed, and the linking features are discovered automatically during training. We evaluate the performance of the proposed approach on two challenging human body parts detection datasets, and obtain performance comparable, and in some cases superior, to the state-of-the-art. In addition, the approach's generality is shown by applying it without modification to part detection on datasets of animal parts and of facial fiducial points.

Journal ArticleDOI
TL;DR: Extensive experiments on real-world image data sets suggest that the proposed framework is able to predict the label and annotation for testing images successfully, and is computationally efficient and effective for image classification and annotation.
Abstract: In this paper, we propose a novel supervised nonnegative matrix factorization-based framework for both image classification and annotation. The framework consists of two phases: training and prediction. In the training phase, two supervised nonnegative matrix factorizations for image descriptors and annotation terms are combined to identify the latent image bases, and to represent the training images in the bases space. These latent bases can capture the representation of the images in terms of both descriptors and annotation terms. Based on the new representation of training images, classifiers can be learnt and built. In the prediction phase, a test image is first represented by the latent bases via solving a linear least squares problem, and then its class label and annotation can be predicted via the trained classifiers and the proposed annotation mapping model. In the algorithm, we develop a three-block proximal alternating nonnegative least squares algorithm to determine the latent image bases, and show its convergent property. Extensive experiments on real-world image data sets suggest that the proposed framework is able to predict the label and annotation for testing images successfully. Experimental results have also shown that our algorithm is computationally efficient and effective for image classification and annotation.
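
In the prediction phase, the coding step reduces to a non-negative least-squares problem against the learned bases; here is a tiny sketch with SciPy, using random placeholder bases and descriptors.

```python
import numpy as np
from scipy.optimize import nnls

def represent_test_image(x, W):
    """Non-negative least-squares coding of a test descriptor x against the
    learned latent bases W (columns); the trained classifier and annotation
    mapping are then applied to this code."""
    h, _ = nnls(W, x)
    return h

# toy usage: 500-D descriptors, 40 latent bases
rng = np.random.default_rng(2)
W = np.abs(rng.normal(size=(500, 40)))
x = np.abs(rng.normal(size=500))
code = represent_test_image(x, W)
print(code.shape)    # (40,)
```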

Journal ArticleDOI
TL;DR: It was observed that CoC with the proposed modifications yields better results compared with the state-of-the-art Enhanced Principal Component Analysis and Enhanced Subspace Linear Discriminant Analysis.
Abstract: The goal of this paper is to analyse and improve the performance of metrics such as the Coefficient of Correlation (CoC) and the Structural Similarity Index (SSIM) for image recognition in a real-time environment. The main novelties of the method are that it works in an uncontrolled environment and that it does not need to store multiple copies of the same image at different orientations. The values of CoC and SSIM change if images are rotated, flipped, or captured under poorly or highly illuminated conditions. To increase recognition accuracy, the input test image is pre-processed. First, a discrete wavelet transform is applied to recognize images captured under poor illumination and dull lighting conditions. Second, to make the method rotation invariant, the test image is compared against the stored database image without rotation and with rotations in the horizontal, vertical, diagonal, reverse-diagonal, and flipped directions. Image recognition performance is evaluated using the recognition rate and rejection rate. The results indicate that the recognition performance of the correlation coefficient and SSIM improves with rotations and the discrete wavelet transform. It was also observed that CoC with the proposed modifications yields better results than the state-of-the-art Enhanced Principal Component Analysis and Enhanced Subspace Linear Discriminant Analysis. Keywords—Image Recognition, Discrete Wavelet Transforms, Correlation Coefficient, Structural Similarity Index Metrics.
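
A short sketch of the rotation/flip-tolerant correlation comparison described above (square images assumed so that all eight orientations keep the reference's shape; the DWT preprocessing step is omitted).

```python
import numpy as np

def coc(a, b):
    """Pearson correlation coefficient between two equal-sized images."""
    a, b = a.ravel().astype(float), b.ravel().astype(float)
    a -= a.mean(); b -= b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def best_coc(test, reference):
    """Compare the test image against the stored reference under the eight
    rotations, flips, and transposes and keep the best score, so an
    arbitrarily oriented capture can still match."""
    candidates = [test, np.rot90(test, 1), np.rot90(test, 2), np.rot90(test, 3),
                  np.fliplr(test), np.flipud(test), test.T, np.rot90(test.T, 2)]
    return max((coc(c, reference) for c in candidates if c.shape == reference.shape),
               default=-1.0)
```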

Book ChapterDOI
12 Nov 2012
TL;DR: This work presents an approach to generate image descriptions from image annotation and shows that with accurate object and attribute detection, human-like descriptions can be generated.
Abstract: In this paper, we address the problem of automatically generating a description of an image from its annotation. Previous approaches either use computer vision techniques to first determine the labels or exploit available descriptions of the training images to either transfer or compose a new description for the test image. However, none of them report results on the effect of incorrect label detection on the quality of the final descriptions generated. With this motivation, we present an approach to generate image descriptions from image annotation and show that with accurate object and attribute detection, human-like descriptions can be generated. Unlike any previous work, we perform an extensive task-based evaluation to analyze our results.

Journal ArticleDOI
TL;DR: The proposed metric performs comparatively well on all the databases and is significantly improved by combining the mean quality scores from the edge and texture image regions.

Book ChapterDOI
01 Oct 2012
TL;DR: A superpixel based learning framework based on retinal structure priors for glaucoma diagnosis that proposes processing of the fundus images at the superpixel level, which leads to features more descriptive and effective than those employed by pixel-based techniques, while yielding significant computational savings over methods based on sliding windows.
Abstract: We present a superpixel based learning framework based on retinal structure priors for glaucoma diagnosis. In digital fundus photographs, our method automatically localizes the optic cup, which is the primary image component clinically used for identifying glaucoma. This method provides three major contributions. First, it proposes processing of the fundus images at the superpixel level, which leads to features more descriptive and effective than those employed by pixel-based techniques, while yielding significant computational savings over methods based on sliding windows. Second, the classifier learning process does not rely on pre-labeled training samples, but rather the training samples are extracted from the test image itself using structural priors on relative cup and disc positions. Third, we present a classification refinement scheme that utilizes both structural priors and local context. Tested on the ORIGA−light clinical dataset comprised of 650 images, the proposed method achieves a 26.7% non-overlap ratio with manually-labeled ground-truth and a 0.081 absolute cup-to-disc ratio (CDR) error, a simple yet widely used diagnostic measure. This level of accuracy is comparable to or higher than the state-of-the-art technique [1], with a speedup factor of tens or hundreds.

Proceedings ArticleDOI
01 Sep 2012
TL;DR: A novel scale-embedded dictionary-based method that poses the problem of OD localization as one of classification, carried out in a sparse representation framework; the OD was correctly localized in 253 out of 259 images, an accuracy of 97.6%.
Abstract: Automatic eye screening for conditions like diabetic retinopathy critically hinges on detection and localization of Optic disk (OD). In this paper, we present a novel scale-embedded dictionary-based method that poses the problem of OD localization as that of classification, carried out in sparse representation framework. A dictionary is created with manually marked fixed-sized sub-images that contain OD at the center, for multiple scales. For a given test image, all subimages are sparsely represented as a linear combination of OD dictionary elements. A confidence measure indicating the likelihood of the presence of OD is obtained from these coefficients. Red channel and gray intensity images are processed independently, and their respective confidence measures are fused to form a confidence map. A blob detector is run on the confidence map, whose peak response is considered to be at the location of the OD. The proposed method is evaluated on publicly available databases such as DIARETDB0, DIARETDB1 and DRIVE. The OD was correctly localized in 253 out of 259 images, with an average computation time of 3.8 seconds/image and accuracy of 97.6%. Comparisons with two existing techniques are also discussed.

Proceedings ArticleDOI
29 Oct 2012
TL;DR: A novel context-aware classification model based on bilayer sparse representation (BSR) that simultaneously takes the local context and global-local context into account is proposed.
Abstract: In image understanding, the automatic recognition of emotion in an image is becoming important from an applicative viewpoint. Considering the fact that the emotion evoked by an image is not only from its global appearance but also interplays among local regions, we propose a novel context-aware classification model based on bilayer sparse representation (BSR) that simultaneously takes the local context and global-local context into account. The BSR model contains two layers: global sparse representation (GSR) and local sparse representation (LSR). The GSR is to define global similarities between a test image and all training images; while the LSR is to define similarities of local regions' appearances and their co-occurrence between a test image and all training images. The experiments on two data sets demonstrate that our method is effective on affective images classification.

Patent
29 Nov 2012
TL;DR: In this article, the optical flow between a reference image and a currently captured image, in the chronologically captured images, is calculated, and images in an overlap area (area having disparity) between the images are clipped, on the basis of the calculated optical flow.
Abstract: A plurality of images are acquired by panning an image capturing device (10) and performing high-speed continuous shooting. The optical flow between a reference image (previously captured image) and a currently captured image, in the chronologically captured images, is calculated, and images in an overlap area (area having disparity) between the images are clipped, on the basis of the calculated optical flow. A pair of clipped images is stored in a memory (48) as a left image and a right image, respectively. Thereafter, a left panoramic image is synthesized from a plurality of left images stored in the memory (48), and similarly, a right panoramic image is synthesized from a plurality of right images.

Proceedings ArticleDOI
29 Oct 2012
TL;DR: A novel affective image classification system based on bilayer sparse representation (BSR), which demonstrates that the system is effective on image emotion recognition.
Abstract: Automatic image emotion analysis has emerged as a hot topic due to its potential application on high-level image understanding. Considering the fact that the emotion evoked by an image is not only from its global appearance but also interplays among local regions, we propose a novel affective image classification system based on bilayer sparse representation (BSR). The BSR model contains two layers: The global sparse representation (GSR) is to define global similarities between a test image and all the training images; and the local sparse representation (LSR) is to define similarities of local regions' appearances and their co-occurrence between a test image and all the training images. The experiments on real data sets demonstrate that our system is effective on image emotion recognition.

Patent
10 May 2012
TL;DR: In this paper, an initial HDR image is coded and distributed, and a data packet is computed, which has a first and a second data set, each of which has an application marker that relates to the HDR-enhancement images.
Abstract: HDR images are coded and distributed. An initial HDR image is received. Processing the received HDR image creates a JPEG-2000 DCI-compliant coded baseline image and an HDR-enhancement image. The coded baseline image has one or more color components, each of which provides enhancement information that allows reconstruction of an instance of the initial HDR image using the baseline image and the HDR-enhancement images. A data packet is computed, which has a first and a second data set. The first data set relates to the baseline image color components, each of which has an application marker that relates to the HDR-enhancement images. The second data set relates to the HDR-enhancement image. The data packets are sent in a DCI-compliant bit stream.

Book ChapterDOI
31 Oct 2012
TL;DR: The proposed scheme reversibly embeds data into image prediction-errors by using histogram-pair method with the following four thresholds for optimal performance: embedding threshold, fluctuation threshold, left- and right-histogram shrinking thresholds, and different from the previous work, the image gray level histogram shrinking towards the center is not only for avoiding underflow and/or overflow but also for optimum performance.
Abstract: This proposed scheme reversibly embeds data into image prediction-errors by using the histogram-pair method with the following four thresholds for optimal performance: embedding threshold, fluctuation threshold, and left- and right-histogram shrinking thresholds. The embedding threshold is used to select only those prediction-errors whose magnitude does not exceed this threshold for possible reversible data hiding. The fluctuation threshold is used to select only those prediction-errors whose associated neighbor fluctuation does not exceed this threshold for possible reversible data hiding. The left- and right-histogram shrinking thresholds are used to shrink the histogram from the left and right, respectively, by a certain amount for reversible data hiding. Only when all four thresholds are satisfied is the reversible data hiding carried out. Different from our previous work, the image gray-level histogram shrinking towards the center is used not only for avoiding underflow and/or overflow but also for optimum performance. The required bookkeeping data are embedded together with the pure payload for original image recovery. The experimental results on four popularly utilized test images (Lena, Barbara, Baboon, Airplane) and one of the JPEG2000 test images (Woman, whose histogram does not have zero points in the whole range of gray levels and has peaks at both ends) have demonstrated that the proposed scheme outperforms recently published reversible image data hiding schemes in terms of the highest PSNR of the marked image versus the original image at given pure payloads.
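
A single-threshold sketch of histogram-pair embedding in horizontal prediction errors, to make the shift-and-embed mechanics concrete; the paper's fluctuation and left/right histogram-shrinking thresholds, bookkeeping data, and overflow handling are omitted, so pixel values are assumed to stay inside [0, 255].

```python
import numpy as np

def embed_row(row, bits):
    """Embed payload bits into the zero bin of the horizontal prediction-error
    histogram of one image row; positive errors are shifted by one to make room."""
    out, it = row.astype(int).copy(), iter(bits)
    for j in range(1, len(row)):
        e = int(row[j]) - int(row[j - 1])       # predictor = original left neighbor
        if e == 0:
            out[j] = int(row[j - 1]) + next(it, 0)
        elif e > 0:
            out[j] = int(row[j]) + 1
    return out

def extract_row(marked):
    """Inverse of embed_row: recover the payload bits and the original row."""
    rec, bits = marked.astype(int).copy(), []
    for j in range(1, len(marked)):
        e = int(marked[j]) - int(rec[j - 1])    # predictor = already-restored neighbor
        if e in (0, 1):
            bits.append(e)
            rec[j] = rec[j - 1]
        elif e > 1:
            rec[j] = int(marked[j]) - 1
        else:
            rec[j] = int(marked[j])
    return rec, bits
```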

Patent
05 Jan 2012
TL;DR: An image forming device is proposed that can form a good-quality image even when the image forming condition changes, by replacing the density conversion means that converts the luminance of a patch image into density with density conversion means suitable for the new image forming condition.
Abstract: PROBLEM TO BE SOLVED: To provide an image forming device capable of forming a good-quality image even if the image forming condition changes, by replacing the density conversion means for converting the luminance of a patch image into density with density conversion means suitable for the image forming condition when the image forming condition changes. SOLUTION: An image forming device (100) adjusts a γ-LUT (25) of a gamma correction circuit according to density data of a test image formed on a photosensitive drum (1). A CPU (28) selects a conversion table according to image forming conditions such as the laser power of a semiconductor laser (32), the fixing temperature of a fuser (10), the charge amount of the developer, and the like. A luminance-density conversion part (42) converts luminance data of the test image into density data, using the conversion table selected by the CPU (28). The CPU (28) adjusts the contrast potential and the γ-LUT (25), using the density data.

Proceedings ArticleDOI
15 Mar 2012
TL;DR: It is observed that the test image is matched and recognized with respect to the original image, and that the average error is lower than that of the test image without applying the artificial neural network.
Abstract: There are several techniques for image recognition. Among them, applying soft-computing models to digital images is considered an approach that yields better results. The main objective of the present work is to provide a new approach to image recognition using artificial neural networks. Initially, an original gray-scale intensity image is taken for transformation. The input image is corrupted with salt-and-pepper noise. An adaptive median filter is applied to the noisy image so that the noise is removed, and the output is taken as the filtered image. The estimated error and average error of the values in the filtered-image matrix are calculated with reference to the values in the original data matrix, to check that the noise has been properly removed. Each pixel value is then converted from decimal to an 8-bit binary number. Sets of four pixels are concatenated into 32-bit binary numbers and converted back to decimal, producing a new data matrix with a different set of values. This matrix is taken as the original data matrix and saved in a data bank. For recognition, a new test image is taken, and the same steps (salt-and-pepper noise insertion and noise removal with the adaptive median filter) are applied to obtain a test matrix. The average error of the second image with respect to the original image is then calculated from the two generated matrices. If the average error is more than 45%, the images are judged to be different and cannot be matched. If the average error is less than or equal to 45%, the artificial neural network is applied to the test data matrix, with reference to the original data matrix, producing a new matrix for the second (test) image. The total average error is calculated on the data matrix produced after applying the artificial neural network, to check whether proper identification can be made. It has been observed that this average error is lower than that of the test image without the artificial neural network, and that the test image is matched and recognized with respect to the original image.
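
A compact sketch of the noise-corruption, filtering, and average-error matching steps (a plain median filter stands in for the adaptive one, and the neural-network refinement and the 8-bit/32-bit repacking are omitted).

```python
import numpy as np
from scipy.ndimage import median_filter

def add_salt_pepper(img, amount=0.05, rng=np.random.default_rng(0)):
    """Corrupt a gray-scale image with salt-and-pepper noise."""
    noisy = img.copy()
    mask = rng.random(img.shape)
    noisy[mask < amount / 2] = 0
    noisy[mask > 1 - amount / 2] = 255
    return noisy

def average_error(a, b):
    """Mean absolute difference as a percentage of full scale, the matching
    criterion described in the abstract."""
    return 100.0 * np.mean(np.abs(a.astype(float) - b.astype(float))) / 255.0

# toy usage: decide whether a filtered test image matches the stored original
original = np.tile(np.arange(0, 256, 4, dtype=np.uint8), (64, 1))   # 64x64 gradient
test = median_filter(add_salt_pepper(original), size=3)
print('match' if average_error(test, original) <= 45.0 else 'different')
```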

Proceedings ArticleDOI
01 Jan 2012
TL;DR: A completely new approach to the colour constancy problem based on unsupervised learning of an appropriate model for each training surface in training images, which has the advantage of handling multi-illuminant situations, something most current methods cannot do.
Abstract: Exemplar-based learning or, equally, nearest neighbour methods have recently gained interest from researchers in a variety of computer science domains because of the prevalence of large amounts of accessible data and storage capacity. In computer vision, these types of technique have been successful in several problems such as scene recognition, shape matching, image parsing, character recognition and object detection. Applying the concept of exemplar-based learning to the problem of colour constancy seems odd at first glance since, in the first place, similar nearest neighbour images are not usually affected by precisely similar illuminants and, in the second place, gathering a dataset consisting of all possible real-world images, including indoor and outdoor scenes and for all possible illuminant colours and intensities, is indeed impossible. In this paper we instead focus on surfaces in the image and address the colour constancy problem by unsupervised learning of an appropriate model for each training surface in training images. We find nearest neighbour models for each surface in a test image and estimate its illumination based on comparing the statistics of pixels belonging to nearest neighbour surfaces and the target surface. The final illumination estimation results from combining these estimated illuminants over surfaces to generate a unique estimate. The proposed method has the advantage of overcoming multi-illuminant situations, which is not possible for most current methods. The concept proposed here is a completely new approach to the colour constancy problem. We show that it performs very well, for standard datasets, compared to current colour constancy algorithms.

Proceedings Article
Chen Siyuan, Yuan He, Jun Sun, Satoshi Naoi
01 Nov 2012
TL;DR: This paper presents a novel approach for structured document classification by matching the salient feature points between the query image and the reference images, which is robust to diverse training data size, image formats and qualities.
Abstract: Following the recent trend of using low-level image features to classify document images, in this paper we present a novel approach for structured document classification by matching salient feature points between the query image and the reference images. Our method is robust to diverse training data sizes, image formats, and qualities. By matching the feature points, image registration is also available for the query image. Although we aim at the large domain of structured document images, our method already achieves zero error rates in tests on the benchmark NIST tax form databases.
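
A generic sketch of salient-point matching for form classification using OpenCV's ORB and a ratio test; the paper's own feature detector and matching rule may differ, and the registration step is not shown.

```python
import cv2
import numpy as np

def match_score(query, reference):
    """Count good keypoint matches between a query form image and one reference
    template (both grayscale uint8 images)."""
    orb = cv2.ORB_create(500)
    _, d1 = orb.detectAndCompute(query, None)
    _, d2 = orb.detectAndCompute(reference, None)
    if d1 is None or d2 is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    good = 0
    for pair in matcher.knnMatch(d1, d2, k=2):
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good += 1                          # Lowe-style ratio test
    return good

def classify(query, references):
    """Assign the query to the reference class with the most good matches."""
    return int(np.argmax([match_score(query, ref) for ref in references]))
```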