Understanding bag-of-words model: A statistical framework

doi:10.1007/S13042-010-0001-0

Journal Article

Yin Zhang, +2 more

- 28 Aug 2010 -

International Journal of Machine Learnin...

- Vol. 1, Iss: 1, pp 43-52

TLDR

The paper presents a statistical framework that generalizes the bag-of-words representation: the visual words are generated by a statistical process rather than by a clustering algorithm, while empirical performance remains competitive with clustering-based methods.

Abstract:

The bag-of-words model is one of the most popular representation methods for object categorization. The key idea is to quantize each extracted key point into one of a set of visual words, and then represent each image by a histogram of the visual words. For this purpose, a clustering algorithm (e.g., K-means) is generally used to generate the visual words. Although a number of studies have shown encouraging results for the bag-of-words representation in object categorization, theoretical study of the properties of the bag-of-words model remains almost untouched, possibly because of the difficulty introduced by the heuristic clustering process. In this paper, we present a statistical framework that generalizes the bag-of-words representation. In this framework, the visual words are generated by a statistical process rather than by a clustering algorithm, while the empirical performance is competitive with clustering-based methods. A theoretical analysis based on statistical consistency is presented for the proposed framework. Moreover, based on the framework, we develop two algorithms that do not rely on clustering yet achieve competitive performance in object categorization when compared with clustering-based bag-of-words representations.
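
To make the quantize-then-histogram idea concrete, here is a minimal sketch of the clustering-based baseline the abstract describes. It is not the paper's proposed statistical framework. The function names (`kmeans`, `nearest`, `bow_histogram`) and the toy 2-D "descriptors" are hypothetical, chosen only for illustration; a real pipeline would extract key-point descriptors from images and would likely use an optimized K-means implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, d):
    """Index of the visual word (centroid) closest to descriptor d."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], d))


def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns k centroids (the visual words)."""
    rng = random.Random(seed)
    centroids = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(centroids, d)].append(d)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bow_histogram(codebook, descriptors):
    """Normalized histogram of visual-word assignments for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest(codebook, d)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy data (assumption): two "images", each a list of 2-D descriptors.
image_a = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [5.0, 5.1]]
image_b = [[5.1, 5.0], [4.9, 5.0], [5.0, 4.9], [0.0, 0.0]]

# Learn the visual words from all descriptors, then describe each image.
codebook = kmeans(image_a + image_b, k=2)
hist_a = bow_histogram(codebook, image_a)
hist_b = bow_histogram(codebook, image_b)
```

Each image ends up as a fixed-length vector whose entries sum to 1, regardless of how many key points it contained. With a hand-picked codebook such as `[[0.0, 0.0], [5.0, 5.0]]`, `bow_histogram` maps `image_a` to `[0.75, 0.25]`, since three of its four descriptors lie nearest the first visual word.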
