Topic

Feature hashing

About: Feature hashing is a research topic. Over its lifetime, 993 publications have been published within this topic, receiving 51,462 citations.

Papers

Journal Article•DOI•

Locality sensitive hashing: A comparison of hash function types and querying mechanisms

Loïc Paulevé¹, Hervé Jégou², Laurent Amsaleg³•Institutions (3)

École normale supérieure de Cachan¹, French Institute for Research in Computer Science and Automation², Centre national de la recherche scientifique³

01 Aug 2010-Pattern Recognition Letters

TL;DR: This paper compares several families of space hashing functions in a real setup and reveals that an unstructured quantizer significantly improves the accuracy of LSH, as it closely fits the data in the feature space.
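As a rough illustration of what using a quantizer as a hash function can look like, the sketch below buckets each vector by the index of its nearest centroid and answers a query by returning the contents of the query's bucket. The centroids, points, and function names are hypothetical and hand-picked for the example. In practice the centroids would be learned from the data (for instance with k-means, which is an assumption here, since the summary does not name the method).

```python
# Minimal sketch: a nearest-centroid quantizer used as a hash function.
# Centroids are hand-picked (hypothetical); a real system would learn them.

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (squared L2)."""
    def sq_dist(centroid):
        return sum((p - c) ** 2 for p, c in zip(point, centroid))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


def build_index(points, centroids):
    """Group point ids into buckets keyed by nearest-centroid index."""
    index = {}
    for point_id, point in enumerate(points):
        bucket = nearest_centroid(point, centroids)
        index.setdefault(bucket, []).append(point_id)
    return index


def query(index, centroids, q):
    """Return the ids of all points that share the query's bucket."""
    return index.get(nearest_centroid(q, centroids), [])


centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 0.0), (0.0, 2.0), (9.0, 11.0), (10.0, 8.0)]
index = build_index(points, centroids)
candidates = query(index, centroids, (8.0, 9.0))
print(candidates)
```

Because the buckets follow the centroids rather than a fixed grid, they adapt to wherever the data is dense, which is the intuition behind the "closely fits the data" remark in the summary.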

327 citations

Proceedings Article•DOI•

Multiple feature hashing for real-time large scale near-duplicate video retrieval

Jingkuan Song¹, Yi Yang², Zi Huang¹, Heng Tao Shen¹, Richang Hong³•Institutions (3)

University of Queensland¹, Carnegie Mellon University², Hefei University of Technology³

28 Nov 2011

TL;DR: This paper presents a novel approach, Multiple Feature Hashing (MFH), to tackle both the accuracy and the scalability issues of NDVR and shows that the proposed method outperforms the state-of-the-art techniques in both accuracy and efficiency.

Abstract: Near-duplicate video retrieval (NDVR) has recently attracted a lot of research attention due to the exponential growth of online videos. It helps in many areas, such as copyright protection, video tagging, and online video usage monitoring. Most existing approaches use only a single feature to represent a video for NDVR. However, a single feature is often insufficient to characterize the video content. Besides, while accuracy has been the main concern in the previous literature, the scalability of NDVR algorithms for large scale video datasets has rarely been addressed. In this paper, we present a novel approach, Multiple Feature Hashing (MFH), to tackle both the accuracy and the scalability issues of NDVR. MFH preserves the local structure information of each individual feature and also globally considers the local structures for all the features to learn a group of hash functions which map the video keyframes into the Hamming space and generate a series of binary codes to represent the video dataset. We evaluate our approach on a public video dataset and a large scale video dataset consisting of 132,647 videos, which we collected from YouTube. The experimental results show that the proposed method outperforms the state-of-the-art techniques in both accuracy and efficiency.

324 citations

Proceedings Article•DOI•

Self-taught hashing for fast similarity search

Dell Zhang¹, Jun Wang², Deng Cai³, Jinsong Lu¹•Institutions (3)

Birkbeck, University of London¹, University College London², Zhejiang University³

19 Jul 2010

TL;DR: Self-Taught Hashing (STH) first finds the optimal l-bit binary codes for all documents in the given corpus via unsupervised learning, and then trains l classifiers via supervised learning to predict the l-bit code for any previously unseen query document.

Abstract: The ability to perform fast similarity search at large scale is of great importance to many Information Retrieval (IR) applications. A promising way to accelerate similarity search is semantic hashing, which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes (within a short Hamming distance). Although some recently proposed techniques are able to generate high-quality codes for documents known in advance, obtaining the codes for previously unseen documents remains a very challenging problem. In this paper, we emphasise this issue and propose a novel Self-Taught Hashing (STH) approach to semantic hashing: we first find the optimal l-bit binary codes for all documents in the given corpus via unsupervised learning, and then train l classifiers via supervised learning to predict the l-bit code for any query document unseen before. Our experiments on three real-world text datasets show that the proposed approach using binarised Laplacian Eigenmap (LapEig) and linear Support Vector Machine (SVM) outperforms state-of-the-art techniques significantly.
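To make the "similar codes within a short Hamming distance" idea concrete, here is a minimal sketch of the lookup side only. It assumes the binary codes already exist; the 8-bit codes and document ids below are made up for illustration and are not produced by STH. The learning steps described in the abstract (LapEig for the corpus codes, SVMs for unseen queries) are not shown.

```python
# Minimal sketch: retrieve documents whose binary code lies within a small
# Hamming radius of the query code. Codes are stored as Python ints.
# The codes below are hypothetical, not learned by any hashing method.

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def search(codes: dict, query_code: int, radius: int = 2) -> list:
    """Return doc ids within `radius` bits of the query, nearest first."""
    hits = []
    for doc_id, code in codes.items():
        distance = hamming(code, query_code)
        if distance <= radius:
            hits.append((distance, doc_id))
    return [doc_id for _, doc_id in sorted(hits)]


codes = {
    "doc_a": 0b10110010,
    "doc_b": 0b10110110,
    "doc_c": 0b01001101,
}
results = search(codes, 0b10110011, radius=2)
print(results)  # ['doc_a', 'doc_b']
```

The linear scan is only for clarity. With short codes, a common alternative is to use the code itself as a table key and probe all keys within the radius.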

322 citations

Proceedings Article•DOI•

BitShred: feature hashing malware for scalable triage and semantic analysis

Jiyong Jang¹, David Brumley¹, Shobha Venkataraman²•Institutions (2)

Carnegie Mellon University¹, AT&T Labs²

17 Oct 2011

TL;DR: The key idea behind BitShred is using feature hashing to dramatically reduce the high-dimensional feature spaces that are common in malware analysis, and to mine correlated features between malware families and samples using co-clustering techniques.

Abstract: The sheer volume of new malware found each day is growing at an exponential pace. This growth has created a need for automatic malware triage techniques that determine what malware is similar, what malware is unique, and why. In this paper, we present BitShred, a system for large-scale malware similarity analysis and clustering, and for automatically uncovering semantic inter- and intra-family relationships within clusters. The key idea behind BitShred is using feature hashing to dramatically reduce the high-dimensional feature spaces that are common in malware analysis. Feature hashing also allows us to mine correlated features between malware families and samples using co-clustering techniques. Our evaluation shows that BitShred speeds up typical malware triage tasks by up to 2,365x and uses up to 82x less memory on a single CPU, all with comparable accuracy to previous approaches. We also develop a parallelized version of BitShred, and demonstrate scalability within the Hadoop framework.
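The dimensionality reduction that feature hashing provides can be sketched in a few lines. The example below is not BitShred's implementation: it assumes byte n-grams as features, a fixed-width bit vector as the hashed representation, and Jaccard similarity over the set bits as the comparison. All three choices, along with every function name and parameter, are illustrative assumptions.

```python
# Minimal sketch of feature hashing into a fixed-size bit vector.
# Assumptions (not taken from the abstract): byte n-gram features,
# SHA-256 as the hash, and Jaccard similarity on the resulting bits.
import hashlib


def byte_ngrams(data: bytes, n: int = 4) -> set:
    """All distinct length-n byte substrings of `data`."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def fingerprint(features, n_bits: int = 1024) -> int:
    """Hash each feature to one of n_bits positions and set that bit."""
    bits = 0
    for feature in features:
        digest = hashlib.sha256(feature).digest()
        position = int.from_bytes(digest[:8], "big") % n_bits
        bits |= 1 << position
    return bits


def jaccard(a: int, b: int) -> float:
    """Jaccard similarity of the set bits of two fingerprints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return bin(a & b).count("1") / union


sample_one = fingerprint(byte_ngrams(b"MZ\x90\x00 example payload one"))
sample_two = fingerprint(byte_ngrams(b"MZ\x90\x00 example payload two"))
print(round(jaccard(sample_one, sample_two), 3))
```

However many distinct n-grams a sample contains, its fingerprint never exceeds `n_bits` bits, and comparing two samples costs a few bitwise operations. Hash collisions make the similarity approximate, so `n_bits` trades memory against accuracy.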

314 citations

Proceedings Article•DOI•

Using multiple hash functions to improve IP lookups

Andrei Z. Broder¹, Michael Mitzenmacher²•Institutions (2)

AltaVista¹, Harvard University²

22 Apr 2001

TL;DR: This work describes an approach for obtaining good hash tables based on using multiple hashes of each input key (which is an IP address), which proves extremely suitable in instances where the goal is to have one hash bucket fit into a cache line.

Abstract: High performance Internet routers require a mechanism for very efficient IP address lookups. Some techniques used to this end, such as binary search on levels, need to quickly construct a good hash table for the appropriate IP prefixes. We describe an approach for obtaining good hash tables based on using multiple hashes of each input key (which is an IP address). The methods we describe are fast, simple, scalable, parallelizable, and flexible. In particular, in instances where the goal is to have one hash bucket fit into a cache line, using multiple hashes proves extremely suitable. We provide a general analysis of this hashing technique and specifically discuss its application to binary search on levels.
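One way to picture "multiple hashes of each input key" is a table in which every key has several candidate buckets and is stored in whichever is currently least loaded. The abstract does not spell out the placement rule, so the least-loaded choice below is an assumption, as are the class name, the seeded SHA-256 hashes, and the table sizes.

```python
# Minimal sketch of a multiple-choice hash table: each key hashes to
# n_hashes candidate buckets and is stored in the least-loaded one.
# The placement rule and all names here are illustrative assumptions.
import hashlib


class MultiChoiceTable:
    def __init__(self, n_buckets: int = 64, n_hashes: int = 2):
        self.n_buckets = n_buckets
        self.n_hashes = n_hashes
        self.buckets = [[] for _ in range(n_buckets)]

    def _candidates(self, key: str) -> list:
        """Bucket indices produced by n_hashes differently seeded hashes."""
        indices = []
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            indices.append(int.from_bytes(digest[:8], "big") % self.n_buckets)
        return indices

    def insert(self, key: str, value) -> None:
        candidates = self._candidates(key)
        # Overwrite in place if the key is already stored.
        for idx in candidates:
            for pos, (stored_key, _) in enumerate(self.buckets[idx]):
                if stored_key == key:
                    self.buckets[idx][pos] = (key, value)
                    return
        target = min(candidates, key=lambda idx: len(self.buckets[idx]))
        self.buckets[target].append((key, value))

    def lookup(self, key: str):
        """Check every candidate bucket; return None if the key is absent."""
        for idx in self._candidates(key):
            for stored_key, value in self.buckets[idx]:
                if stored_key == key:
                    return value
        return None

    def max_load(self) -> int:
        return max(len(bucket) for bucket in self.buckets)


table = MultiChoiceTable(n_buckets=64, n_hashes=2)
for i in range(200):
    table.insert(f"10.0.{i // 256}.{i % 256}", f"port-{i % 8}")
print(table.lookup("10.0.0.17"), table.max_load())
```

A lookup inspects at most `n_hashes` buckets. Keeping the fullest bucket small matters when each bucket is meant to fit in a single cache line, which is the case the abstract highlights.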

294 citations

Performance

Metrics

Papers: 1,120

Citations: 57,460

No. of papers in the topic in previous years
Year	Papers
2023	33
2022	89
2021	11
2020	16
2019	16
2018	38
