Proceedings ArticleDOI

A hierarchical clustering approach for image datasets

01 Dec 2014-pp 1-6
TL;DR: A clustering algorithm is proposed that groups the images of a dataset semantically without using any background knowledge about either the semantics of the images or the number of clusters formed.
Abstract: Humans analyze images mostly by their semantics, but such semantic clustering of images is one of the difficult tasks in computer vision. A clustering algorithm is proposed in this work to achieve a dataset with images grouped semantically. It does not use any background knowledge about either the semantics of the images or the number of clusters formed. The algorithm is based on the agglomerative method of hierarchical clustering. At each intermediate step, a representative image is chosen to denote a cluster. This image stands for every other image belonging to the cluster, and hence there is some loss of information. This loss is tracked to determine the number of clusters automatically. Experimental results on four datasets of varying sizes are presented, which show the efficiency and effectiveness of the proposed algorithm. The results are also compared with the popular k-means algorithm.
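The abstract gives only the outline of the method, so the following Python sketch is an illustration under stated assumptions and not the authors' implementation. It assumes each image is already a feature vector, uses the medoid as the representative image, measures the information loss as the summed Euclidean distance from every image to its cluster representative, and stops just before the largest single-step jump in that loss. The names `medoid`, `representative_loss`, `agglomerate` and `pick_clustering` are hypothetical.

```python
import numpy as np


def medoid(points):
    """Index (within points) of the item with the smallest total distance to the rest."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return int(np.argmin(dists.sum(axis=1)))


def representative_loss(features, clusters, reps):
    """Assumed loss: summed distance from every item to its cluster representative."""
    return float(sum(
        np.linalg.norm(features[members] - features[rep], axis=1).sum()
        for members, rep in zip(clusters, reps)
    ))


def agglomerate(features):
    """Repeatedly merge the two clusters whose representatives are closest.

    Returns one (clusters, loss) snapshot per cluster count, from n clusters down to 1.
    """
    clusters = [[i] for i in range(len(features))]
    reps = list(range(len(features)))
    history = [([list(c) for c in clusters], 0.0)]
    while len(clusters) > 1:
        rep_feats = features[reps]
        gaps = np.linalg.norm(rep_feats[:, None, :] - rep_feats[None, :, :], axis=-1)
        np.fill_diagonal(gaps, np.inf)  # a cluster cannot merge with itself
        i, j = np.unravel_index(np.argmin(gaps), gaps.shape)
        a, b = int(min(i, j)), int(max(i, j))
        merged = clusters[a] + clusters[b]
        clusters[a] = merged
        reps[a] = merged[medoid(features[merged])]  # new representative image
        del clusters[b]
        del reps[b]
        loss = representative_loss(features, clusters, reps)
        history.append(([list(c) for c in clusters], loss))
    return history


def pick_clustering(history):
    """Assumed stopping rule: keep the clustering just before the largest jump in loss."""
    if len(history) < 2:
        return history[0][0]
    losses = np.array([loss for _, loss in history])
    return history[int(np.argmax(np.diff(losses)))][0]
```

Calling `pick_clustering(agglomerate(features))` on an `(n, d)` NumPy array returns lists of row indices, one list per cluster. The pairwise-distance matrices make this quadratic in memory, so it is suitable only for small illustrative datasets.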
Citations
Journal ArticleDOI
TL;DR: A content-based semantics and image retrieval system for hierarchical image databases is evaluated in terms of the accuracy of the retrieved semantics and precision-recall curves; results on non-hierarchical but categorized image databases further support its efficacy.
Abstract: Proposes automatic semantics and image retrieval system for hierarchical databases. System uses < 1/3 search space to retrieve semantics, and finally similar images. ? 77% semantic retrieval accuracy on ImageNet. Uses ? 4% space to retrieve images. System reports precision of 0.78 @ 20 and 0.67 @ 100 images on categorized WANG. The study explores adequacy of visual signatures set used to represent a semantic. This work presents a content based semantics and image retrieval system for semantically categorized hierarchical image databases. Each module is designed with an aim to develop a system that works closer to human perception. Images are mapped to a multidimensional feature space, where images belonging to a semantic are clustered and indexed to acquire its efficient representation. This helps in handling the existing variability or heterogeneity within this semantic. Adaptive combinations of the obtained depictions are utilized by the branch selection and pruning algorithms to identify some closer semantics and select only a part of the large hierarchical search space for actual search. The search space so obtained is finally used to retrieve desired semantics and similar images corresponding to them. The system is evaluated in terms of accuracy of the retrieved semantics and precision-recall curves. Experiments show promising semantics and image retrieval results on hierarchical image databases. The results reported with non-hierarchical but categorized image databases further prove the efficacy of the proposed system.

20 citations

Book
31 Oct 2018
TL;DR: This chapter provides a concise introduction to the book, by explaining the motivation behind its elaboration and pointing out the need for a comprehensive text on the state, progress made and open questions revolving around large-group decision making research.
Abstract: This chapter provides a concise introduction to the book, by explaining the motivation behind its elaboration and pointing out the need for a comprehensive text on the state, progress made and open questions revolving around large-group decision making research. Some notes are also provided for the potential readership of the book. The chapter concludes with an outline of the book's structure by chapter.

14 citations

Journal ArticleDOI
TL;DR: This work presents a Content-Based Image Retrieval (CBIR) system embedded with a clustering technique to retrieve images similar to a query image; results are presented to show the efficacy of the developed system in retrieving semantically more similar images.

13 citations

Journal ArticleDOI
TL;DR: Various image annotation methods, namely visual content-based and users’ tags-based methods, are analyzed, with a focus on visual content-based techniques, which are one of the dynamic research fields nowadays.
Abstract: In the current era of digital communication, the use of images is growing exponentially since they are one of the best ways of expressing, sharing and memorizing knowledge. In fact, images can be used in various real-world applications, like biology, medical diagnosis, space research, remote sensing, etc. However, finding the most relevant images that meet the users’ needs is a challenging task, especially when the search is performed over gigantic amounts of images. This has led to the emergence of several image retrieval studies during the past two decades. Typically, research studies in this area were focused on the Content-based Image Retrieval (CBIR). However, extensive research have proved that there is a ‘semantic gap’ between the visual information captured by the imaging devices and the image semantics understandable by humans. As an alternative, researchers’ efforts have been oriented towards the Text-based Image Retrieval (TBIR). Indeed, TBIR is a typical method that helps bridge the issue of ‘semantic gap’ between the low-level image features and the high-level image semantics. Its policy consists in associating textual descriptions with the images, which constitute the focus of the research queries later on. In this paper, we analyze various image annotation methods, namely: Visual Content-based and Users’ Tags-based Image Annotation Methods. In particular, we focus on the visual content-based image annotation techniques since they are one of the dynamic research fields nowadays.

7 citations


Cites background from "A hierarchical clustering approach ..."

  • ...[215] for achieving a dataset with images grouped semantically....

    [...]

Dissertation
11 Dec 2015
TL;DR: This thesis focuses on which elements inside a profile picture modify the perception of the context that best suits the picture itself, and models the social context with image features to understand and quantify the influence of each feature.
Abstract: Online communities and social networks are more and more present in everyday life. On these networks, people build a virtual profile and interact with one another for many different purposes: to be in touch with friends, for business, to make new connections, to find a love partner... Online profiles should be coherent with these tasks, starting with the omnipresent profile picture, as different pictures of the same subject can convey very different messages. This thesis focuses on which elements inside a profile picture modify the perception of the context that best suits the picture itself. We define this concept by borrowing the "social context" concept from psychology. Image features considered are both low and high level; while the former are more technical quantities related solely to pixel values (e.g. brightness, contrast), the latter are related to the understanding of the scene depicted in the picture. These elements are supported by research results in psychology (e.g. the influence of clothing or gaze direction in social interactions). These features have been evaluated through crowdsourcing, a relatively new technique that exploits the power of the web to outsource simple tasks. We adopted the same technique to gather social context evaluations, since this is a subjective perception. Then, through different mathematical approaches, we modelled the social context with image features, to understand and quantify the influence of each feature. Results are in line with empirical experience.

5 citations

References
Proceedings ArticleDOI
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei
20 Jun 2009
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Abstract: The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.

49,639 citations


"A hierarchical clustering approach ..." refers background in this paper

  • ...1, belonging to 5 different categories of vegetable semantics from ImageNet [11]....

    [...]

Journal ArticleDOI
TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts are illustrated.
Abstract: Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.

5,744 citations


"A hierarchical clustering approach ..." refers background in this paper

  • ...Clustering algorithms are categorized in several ways [9]....

    [...]

Journal ArticleDOI
TL;DR: Almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation are surveyed, along with the spawning of related subfields and the challenges of adapting existing image retrieval techniques to build systems that can be useful in the real world.
Abstract: We have witnessed great interest and a wealth of promise in content-based image retrieval as an emerging technology. While the last decade laid foundation to such promise, it also paved the way for a large number of new techniques and systems, got many new people involved, and triggered stronger association of weakly related fields. In this article, we survey almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation, and in the process discuss the spawning of related subfields. We also discuss significant challenges involved in the adaptation of existing image retrieval techniques to build systems that can be useful in the real world. In retrospect of what has been achieved so far, we also conjecture what the future may hold for image retrieval research.

3,433 citations


"A hierarchical clustering approach ..." refers background in this paper

  • ...The performance of such systems is greatly affected by the presence of semantic gap, which is excessive for large image collections [3-4]....

    [...]

Journal ArticleDOI
TL;DR: The incremental algorithm is compared experimentally to an earlier batch Bayesian algorithm, as well as to one based on maximum-likelihood, which have comparable classification performance on small training sets, but incremental learning is significantly faster, making real-time learning feasible.

2,597 citations


"A hierarchical clustering approach ..." refers methods in this paper

  • ...The algorithm is also tested on a large dataset Caltech101 with 8677 images spread over 101 object categories [13]....

    [...]

Journal ArticleDOI
TL;DR: This paper implemented and tested the ALIP (Automatic Linguistic Indexing of Pictures) system on a photographic image database of 600 different concepts, each with about 40 training images and demonstrated the good accuracy of the system and its high potential in linguistic indexing of photographic images.
Abstract: Automatic linguistic indexing of pictures is an important but highly challenging problem for researchers in computer vision and content-based image retrieval. In this paper, we introduce a statistical modeling approach to this problem. Categorized images are used to train a dictionary of hundreds of statistical models each representing a concept. Images of any given concept are regarded as instances of a stochastic process that characterizes the concept. To measure the extent of association between an image and the textual description of a concept, the likelihood of the occurrence of the image based on the characterizing stochastic process is computed. A high likelihood indicates a strong association. In our experimental implementation, we focus on a particular group of stochastic processes, that is, the two-dimensional multiresolution hidden Markov models (2D MHMMs). We implemented and tested our ALIP (Automatic Linguistic Indexing of Pictures) system on a photographic image database of 600 different concepts, each with about 40 training images. The system is evaluated quantitatively using more than 4,600 images outside the training database and compared with a random annotation scheme. Experiments have demonstrated the good accuracy of the system and its high potential in linguistic indexing of photographic images.

1,163 citations


"A hierarchical clustering approach ..." refers background or methods in this paper

  • ...The other standard measures used for comparison of clustering quality are Davies Bouldin Index (DB), Separation Measure (S) and Dunn’s Index (DI) [12]....

    [...]

  • ...ZUBUD contains 1005 images for 201 different buildings of Zurich city [10], while WANG contains 100 images in each of the ten categories [12]....

    [...]
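The Davies Bouldin Index and Dunn’s Index named in the excerpt above can be computed as in the following Python sketch. The excerpt does not give the paper's exact formulas, so this uses the common textbook definitions with Euclidean distance: DB averages, over clusters, the worst ratio of summed within-cluster scatter to centroid separation (lower is better), and Dunn divides the smallest distance between points in different clusters by the largest cluster diameter (higher is better). The function names are hypothetical, and the Separation Measure (S) is omitted because its definition is not stated here.

```python
import numpy as np


def davies_bouldin(features, labels):
    """Davies-Bouldin index; needs at least two clusters. Lower is better."""
    ids = np.unique(labels)
    centroids = np.array([features[labels == k].mean(axis=0) for k in ids])
    # Mean distance of each cluster's points to its own centroid.
    scatter = np.array([
        np.linalg.norm(features[labels == k] - c, axis=1).mean()
        for k, c in zip(ids, centroids)
    ])
    separation = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    np.fill_diagonal(separation, np.inf)  # exclude a cluster's ratio with itself
    ratios = (scatter[:, None] + scatter[None, :]) / separation
    return float(ratios.max(axis=1).mean())


def dunn(features, labels):
    """Dunn index (min inter-cluster distance / max diameter). Higher is better.

    Assumes at least two clusters and at least one cluster with two distinct points.
    """
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    return float(dists[~same].min() / dists[same].max())
```

Both functions expect `features` as an `(n, d)` NumPy array and `labels` as a length-`n` NumPy array of cluster ids.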