A hierarchical clustering approach for image datasets

doi:10.1109/ICIINFS.2014.7036504

Proceedings Article•DOI•

01 Dec 2014-pp 1-6

TL;DR: This work proposes a clustering algorithm that groups the images of a dataset semantically without using any background knowledge about either the semantics of the images or the number of clusters formed.

Abstract: Humans analyze images mostly by their semantics, but such semantic clustering of images is one of the difficult tasks in computer vision. A clustering algorithm is proposed in this work to achieve a dataset with images grouped semantically. It does not use any background knowledge about either the semantics of the images or the number of clusters formed. The algorithm is based on the agglomerative method of hierarchical clustering. At each intermediate step, a representative image is chosen to denote a cluster. This image stands for every other image belonging to the cluster, and hence there is some loss of information. This loss is tracked to obtain the number of clusters automatically. Experimental results on four datasets of varying sizes are presented, which show the efficiency and effectiveness of the proposed algorithm. The results are also compared with the popular k-means algorithm.
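
The abstract gives only the outline of the method, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that images are already reduced to feature vectors, that the representative of a cluster is its medoid, that the information loss is the summed distance of the members to that representative, and that the number of clusters is read off just before the largest jump in total loss. The names `medoid`, `cluster_loss`, `agglomerate`, and `pick_partition` are hypothetical.

```python
import numpy as np


def medoid(points):
    """Row index of the member with the smallest summed distance to the others."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return int(np.argmin(d.sum(axis=1)))


def cluster_loss(points):
    """Assumed loss: summed distance from every member to the representative."""
    rep = points[medoid(points)]
    return float(np.linalg.norm(points - rep, axis=1).sum())


def agglomerate(features):
    """Merge clusters bottom-up; record each partition with its total loss."""
    features = np.asarray(features, dtype=float)
    clusters = [[i] for i in range(len(features))]
    history = [([c[:] for c in clusters], 0.0)]
    while len(clusters) > 1:
        # One representative (medoid) per current cluster.
        reps = np.array([features[c][medoid(features[c])] for c in clusters])
        d = np.linalg.norm(reps[:, None, :] - reps[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        # Assumed merge rule: join the two clusters with the closest representatives.
        a, b = np.unravel_index(np.argmin(d), d.shape)
        a, b = int(min(a, b)), int(max(a, b))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        total = sum(cluster_loss(features[c]) for c in clusters)
        history.append(([c[:] for c in clusters], total))
    return history


def pick_partition(history):
    """Assumed stopping rule: the partition just before the largest loss jump."""
    if len(history) < 2:
        return history[0][0]
    losses = [loss for _, loss in history]
    return history[int(np.argmax(np.diff(losses)))][0]


# Toy data: two well-separated groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
partition = pick_partition(agglomerate(X))
print(partition)
```

On this toy input the loss grows slowly while points within a group are merged and jumps sharply when the two groups are forced together, so the rule should stop at two clusters. Real image features and the paper's actual loss definition may behave differently.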

Citations

Journal Article•DOI•

A semantics and image retrieval system for hierarchical image databases

Shreelekha Pandey¹, Pritee Khanna¹, Haruo Yokota²•Institutions (2)

Indian Institute of Information Technology, Design and Manufacturing, Jabalpur¹, Tokyo Institute of Technology²

01 Jul 2016-Information Processing and Management

TL;DR: The system is evaluated in terms of the accuracy of the retrieved semantics and precision-recall curves, and the results reported with non-hierarchical but categorized image databases further prove the efficacy of the proposed system.

Abstract: Proposes automatic semantics and image retrieval system for hierarchical databases. System uses < 1/3 search space to retrieve semantics, and finally similar images. ? 77% semantic retrieval accuracy on ImageNet. Uses ? 4% space to retrieve images. System reports precision of 0.78 @ 20 and 0.67 @ 100 images on categorized WANG. The study explores adequacy of visual signatures set used to represent a semantic. This work presents a content-based semantics and image retrieval system for semantically categorized hierarchical image databases. Each module is designed with the aim of developing a system that works closer to human perception. Images are mapped to a multidimensional feature space, where images belonging to a semantic are clustered and indexed to acquire an efficient representation of it. This helps in handling the existing variability or heterogeneity within the semantic. Adaptive combinations of the obtained depictions are used by the branch selection and pruning algorithms to identify some closer semantics and select only a part of the large hierarchical search space for the actual search. The search space so obtained is finally used to retrieve the desired semantics and the similar images corresponding to them. The system is evaluated in terms of the accuracy of the retrieved semantics and precision-recall curves. Experiments show promising semantics and image retrieval results on hierarchical image databases. The results reported with non-hierarchical but categorized image databases further prove the efficacy of the proposed system.
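
The figures "0.78 @ 20" and "0.67 @ 100" above read as precision at a cutoff k. As a hedged illustration of that measure only (the function name `precision_at_k` and the category labels are made up for this sketch, and the paper's exact evaluation protocol is not given here), precision at k can be computed as the share of the top k retrieved images whose category matches the query:

```python
def precision_at_k(retrieved_labels, query_label, k):
    """Fraction of the top-k retrieved items that share the query's category.

    Divides by k even when fewer than k items were retrieved, which is the
    usual convention; missing results therefore count as misses.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for label in retrieved_labels[:k] if label == query_label)
    return hits / k


# Hypothetical ranked result list for a query image of category "bus".
ranked = ["bus", "bus", "horse", "bus", "beach"]
print(precision_at_k(ranked, "bus", 4))
```

Averaging this value over many query images gives a single figure per cutoff, which is presumably how numbers such as those quoted above are reported.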

20 citations

Book•

Large Group Decision Making: Creating Decision Support Approaches at Scale

Iván Palomares Carrascosa

31 Oct 2018

TL;DR: This chapter provides a concise introduction to the book, by explaining the motivation behind its elaboration and pointing out the need for a comprehensive text on the state, progress made and open questions revolving around large-group decision making research.

Abstract: This chapter provides a concise introduction to the book, by explaining the motivation behind its elaboration and pointing out the need for a comprehensive text on the state, progress made and open questions revolving around large-group decision making research. Some notes are also provided for the potential readership of the book. The chapter closes with an outline of how the book is structured into chapters.

14 citations

Journal Article•DOI•

Content-based image retrieval embedded with agglomerative clustering built on information loss

Shreelekha Pandey¹, Pritee Khanna¹•Institutions (1)

Indian Institute of Information Technology, Design and Manufacturing, Jabalpur¹

01 Aug 2016-Computers & Electrical Engineering

TL;DR: This work presents a Content-Based Image Retrieval (CBIR) system embedded with a clustering technique to retrieve images similar to a query image, and results are presented to show the efficacy of the developed system in retrieving semantically more similar images.

13 citations

Journal Article•DOI•

A review on visual content-based and users’ tags-based image annotation: methods and techniques

Mariam Bouchakwa¹, Yassine Ayadi¹, Ikram Amous¹•Institutions (1)

University of Sfax¹

09 May 2020-Multimedia Tools and Applications

TL;DR: Various image annotation methods, namely visual content-based and users’ tags-based image annotation methods, are analyzed, with a particular focus on visual content-based techniques since they are one of the dynamic research fields nowadays.

Abstract: In the current era of digital communication, the use of images is growing exponentially since they are one of the best ways of expressing, sharing and memorizing knowledge. In fact, images can be used in various real-world applications, like biology, medical diagnosis, space research, remote sensing, etc. However, finding the most relevant images that meet the users’ needs is a challenging task, especially when the search is performed over gigantic amounts of images. This has led to the emergence of several image retrieval studies during the past two decades. Typically, research studies in this area were focused on Content-based Image Retrieval (CBIR). However, extensive research has proved that there is a ‘semantic gap’ between the visual information captured by the imaging devices and the image semantics understandable by humans. As an alternative, researchers’ efforts have been oriented towards Text-based Image Retrieval (TBIR). Indeed, TBIR is a typical method that helps bridge the ‘semantic gap’ between the low-level image features and the high-level image semantics. Its policy consists in associating textual descriptions with the images, which constitute the focus of the research queries later on. In this paper, we analyze various image annotation methods, namely: Visual Content-based and Users’ Tags-based Image Annotation Methods. In particular, we focus on the visual content-based image annotation techniques since they are one of the dynamic research fields nowadays.

7 citations

Cites background from "A hierarchical clustering approach ..."

...[215] for achieving a dataset with images grouped semantically....
[...]

Dissertation•

Influence of image features on face portraits social context interpretation: experimental methods, crowdsourcing based studies and models

Filippo Mazza

11 Dec 2015

TL;DR: This thesis focuses on which elements inside a profile picture modify the perception of the context that best suits the picture itself, and models the social context with image features to understand and quantify the influence of each feature.

Abstract: Online communities and social networks are more and more present in everyday life. On these networks, people build a virtual profile and interact with one another for many different purposes: to be in touch with friends, for business, to make new connections, to find a love partner... Online profiles should be coherent with these tasks, starting with the omnipresent profile picture, as different pictures of the same subject can convey very different messages. This thesis focuses on which elements inside a profile picture modify the perception of the context that best suits the picture itself. We define this concept by borrowing the "social context" concept from psychology. The image features considered are both low and high level; while the former are more technical quantities related solely to pixel values (e.g. brightness, contrast), the latter are related to the understanding of the scene depicted in the picture. These elements are underlined by results of research in psychology (e.g. the influence of clothing or gaze direction in social interactions). These features have been evaluated through crowdsourcing, a relatively new technique that exploits the power of the web to outsource simple tasks. We adopted the same technique to gather social context evaluations, since this is a subjective perception. Then, through different mathematical approaches, we modelled the social context with image features, to understand and quantify the influence of each feature. Results are in line with empirical experience.

5 citations

References

Proceedings Article•DOI•

ImageNet: A large-scale hierarchical image database

Jia Deng¹, Wei Dong¹, Richard Socher¹, Li-Jia Li¹, Kai Li¹, Li Fei-Fei¹•Institutions (1)

Princeton University¹

20 Jun 2009

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

Abstract: The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.

49,639 citations

"A hierarchical clustering approach ..." refers background in this paper

...1, belonging to 5 different categories of vegetable semantics from ImageNet [11]....
[...]

Journal Article•DOI•

Survey of clustering algorithms

Rui Xu¹, Donald C. Wunsch¹•Institutions (1)

Missouri University of Science and Technology¹

01 May 2005-IEEE Transactions on Neural Networks

TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications are illustrated on some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts.

Abstract: Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.

5,744 citations

"A hierarchical clustering approach ..." refers background in this paper

...Clustering algorithms are categorized in several ways [9]....
[...]

Journal Article•DOI•

Image retrieval: Ideas, influences, and trends of the new age

Ritendra Datta¹, Dhiraj Joshi¹, Jia Li¹, James Z. Wang¹•Institutions (1)

Pennsylvania State University¹

08 May 2008-ACM Computing Surveys

TL;DR: Almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation are surveyed, the spawning of related subfields is discussed, and the significant challenges involved in adapting existing image retrieval techniques to build systems that can be useful in the real world are examined.

Abstract: We have witnessed great interest and a wealth of promise in content-based image retrieval as an emerging technology. While the last decade laid foundation to such promise, it also paved the way for a large number of new techniques and systems, got many new people involved, and triggered stronger association of weakly related fields. In this article, we survey almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation, and in the process discuss the spawning of related subfields. We also discuss significant challenges involved in the adaptation of existing image retrieval techniques to build systems that can be useful in the real world. In retrospect of what has been achieved so far, we also conjecture what the future may hold for image retrieval research.

3,433 citations

"A hierarchical clustering approach ..." refers background in this paper

...The performance of such systems is greatly affected by the presence of semantic gap, which is excessive for large image collections [3-4]....
[...]

Journal Article•DOI•

Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories

Li Fei-Fei¹, Rob Fergus², Pietro Perona³•Institutions (3)

Princeton University¹, University of Oxford², California Institute of Technology³

01 Apr 2007-Computer Vision and Image Understanding

TL;DR: The incremental algorithm is compared experimentally to an earlier batch Bayesian algorithm, as well as to one based on maximum likelihood; the incremental and batch versions have comparable classification performance on small training sets, but incremental learning is significantly faster, making real-time learning feasible.

2,597 citations

"A hierarchical clustering approach ..." refers methods in this paper

...The algorithm is also tested on a large dataset Caltech101 with 8677 images spread over 101 object categories [13]....
[...]

Journal Article•DOI•

Automatic Linguistic Indexing of Pictures by a statistical modeling approach

Jia Li¹, James Z. Wang²•Institutions (2)

Pennsylvania State University¹, Penn State College of Information Sciences and Technology²

01 Sep 2003-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The ALIP (Automatic Linguistic Indexing of Pictures) system is implemented and tested on a photographic image database of 600 different concepts, each with about 40 training images, and experiments demonstrate the good accuracy of the system and its high potential in linguistic indexing of photographic images.

Abstract: Automatic linguistic indexing of pictures is an important but highly challenging problem for researchers in computer vision and content-based image retrieval. In this paper, we introduce a statistical modeling approach to this problem. Categorized images are used to train a dictionary of hundreds of statistical models each representing a concept. Images of any given concept are regarded as instances of a stochastic process that characterizes the concept. To measure the extent of association between an image and the textual description of a concept, the likelihood of the occurrence of the image based on the characterizing stochastic process is computed. A high likelihood indicates a strong association. In our experimental implementation, we focus on a particular group of stochastic processes, that is, the two-dimensional multiresolution hidden Markov models (2D MHMMs). We implemented and tested our ALIP (Automatic Linguistic Indexing of Pictures) system on a photographic image database of 600 different concepts, each with about 40 training images. The system is evaluated quantitatively using more than 4,600 images outside the training database and compared with a random annotation scheme. Experiments have demonstrated the good accuracy of the system and its high potential in linguistic indexing of photographic images.

1,163 citations

"A hierarchical clustering approach ..." refers background or methods in this paper

...The other standard measures used for comparison of clustering quality are the Davies-Bouldin Index (DB), Separation Measure (S) and Dunn’s Index (DI) [12]....
[...]
...ZUBUD contains 1005 images for 201 different buildings of Zurich city [10], while WANG contains 100 images in each of the ten categories [12]....
[...]
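
The excerpt above names the Davies-Bouldin Index and Dunn's Index as clustering-quality measures without giving their formulas. The following Python sketch uses the common textbook definitions with Euclidean distance, which may differ in detail from the variants used in the paper; the Separation Measure (S) is left out because its definition is not given here. The function names `davies_bouldin` and `dunn` are hypothetical, and `dunn` assumes at least one cluster has two or more members.

```python
import numpy as np


def davies_bouldin(X, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance.

    Lower values indicate compact, well-separated clusters.
    """
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    ids = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ids])
    # Scatter: mean distance of a cluster's members to its centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == k] - c, axis=1).mean()
        for k, c in zip(ids, cents)
    ])
    sep = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)  # a cluster is never compared with itself
    ratios = (scatter[:, None] + scatter[None, :]) / sep
    return float(ratios.max(axis=1).mean())


def dunn(X, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter.

    Higher values indicate compact, well-separated clusters.
    """
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    same = labels[:, None] == labels[None, :]
    return float(d[~same].min() / d[same].max())


# Toy data: two clusters of two points each, 10 units apart.
X = np.array([[0, 0], [0, 2], [10, 0], [10, 2]], dtype=float)
y = np.array([0, 0, 1, 1])
print(davies_bouldin(X, y), dunn(X, y))
```

The two indices point in opposite directions (lower is better for Davies-Bouldin, higher is better for Dunn), which is worth keeping in mind when comparing a proposed algorithm against k-means on both.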