
Showing papers on "Feature hashing" published in 2021


Posted Content
TL;DR: Transhash, as discussed by the authors, designs a siamese vision-transformer backbone for image feature extraction and adds a dual-stream feature-learning module on top of the transformer to learn discriminative global and local features.
Abstract: Deep Hamming hashing has gained growing popularity in approximate nearest neighbour search for large-scale image retrieval. Until now, deep hashing for image retrieval has been dominated by convolutional neural network architectures, e.g. ResNet [he2016deep]. In this paper, inspired by recent advances in vision transformers, we present Transhash, a pure transformer-based framework for deep hashing learning. Concretely, our framework is composed of two major modules: (1) based on the Vision Transformer (ViT), we design a siamese vision transformer backbone for image feature extraction; to learn fine-grained features, we add a dual-stream feature learning module on top of the transformer to learn discriminative global and local features; (2) we adopt a Bayesian learning scheme with a dynamically constructed similarity matrix to learn compact binary hash codes. The entire framework is jointly trained in an end-to-end manner. To the best of our knowledge, this is the first work to tackle deep hashing learning without convolutional neural networks (CNNs). We perform comprehensive experiments on three widely studied datasets: CIFAR-10, NUS-WIDE and ImageNet. The experiments evidence our superiority over existing state-of-the-art deep hashing methods. Specifically, we achieve 8.2%, 2.6% and 12.7% gains in average mAP for different hash bit lengths on the three public datasets, respectively.
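The abstract does not spell out the hashing objective, so the sketch below is only a generic example of the pairwise Bayesian (likelihood-based) loss that much of the deep hashing literature uses, with a relaxed code matrix `u` and a binary similarity matrix `s` filled with toy values; it is not Transhash's exact formulation.

```python
import numpy as np

def pairwise_bayesian_hash_loss(u, s, quant_weight=0.1):
    """Generic pairwise likelihood loss used by many deep hashing methods.

    u: (n, k) real-valued network outputs (relaxed hash codes)
    s: (n, n) similarity matrix, 1 for similar pairs, 0 otherwise
    """
    theta = 0.5 * (u @ u.T)                      # pairwise inner products
    nll = np.logaddexp(0.0, theta) - s * theta   # log(1 + e^theta) - s*theta
    quant = np.sum((u - np.sign(u)) ** 2)        # push outputs toward {-1, +1}
    return nll.sum() + quant_weight * quant

# toy example: 4 samples, 8-bit codes, samples 0 and 1 marked as similar
u = np.random.randn(4, 8)
s = np.eye(4)
s[0, 1] = s[1, 0] = 1
print(pairwise_bayesian_hash_loss(u, s))
```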

6 citations


Posted Content
TL;DR: In this article, Canonical Correlation Analysis (CCA) is used to design two loss functions for training a neural network such that the correlation between the two views presented to CCA is maximized.
Abstract: In this paper, we propose an approach for learning binary hash codes for image retrieval. Canonical Correlation Analysis (CCA) is used to design two loss functions for training a neural network such that the correlation between the two views presented to CCA is maximized. The first loss maximizes the correlation between the hash centers and the learned hash codes. The second loss maximizes the correlation between the class labels and the classification scores. A novel weighted-mean and thresholding-based hash-center update scheme is proposed for adapting the hash centers in each epoch. The training loss reaches the theoretical lower bound of the proposed loss functions, showing that the correlation coefficients are maximized during training and substantiating the formation of an efficient feature space for image retrieval. The measured mean average precision shows that the proposed approach outperforms other state-of-the-art approaches on both single-labeled and multi-labeled image datasets.
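As an illustration of the correlation-maximization idea (not the authors' exact CCA-derived losses), the sketch below computes the mean per-dimension Pearson correlation between a batch of relaxed hash codes and the hash centers assigned to the same samples; a training loop would maximize this value (i.e. minimize its negative). All sizes are toy values.

```python
import numpy as np

def correlation_score(codes, centers, eps=1e-8):
    """Mean per-dimension Pearson correlation between learned hash codes
    and the hash centers assigned to the same samples; both are (n, k)."""
    x = codes - codes.mean(axis=0)
    y = centers - centers.mean(axis=0)
    corr = (x * y).sum(axis=0) / (
        np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum(axis=0)) + eps)
    return corr.mean()  # maximize during training (minimize its negative)

codes = np.tanh(np.random.randn(32, 16))                                  # relaxed codes from a network
centers = np.sign(np.random.randn(4, 16))[np.random.randint(0, 4, 32)]   # assigned hash centers
print(correlation_score(codes, centers))
```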

4 citations


Journal ArticleDOI
TL;DR: This paper finds, for the first time, that the default vector size used in current feature hashing practice is unnecessarily large; shrinking it reduces memory usage by 70% and increases detection accuracy compared with the state-of-the-art scheme.
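As a rough illustration of the finding (the feature names, sizes, and task here are hypothetical, not the paper's), collision behaviour at different hash-vector sizes can be measured with scikit-learn's FeatureHasher, whose default of 2^20 bins is often far larger than the number of distinct features actually seen:

```python
from sklearn.feature_extraction import FeatureHasher

# hypothetical vocabulary of 5,000 distinct raw feature names
features = [f"feature_{i}" for i in range(5000)]

for n_bins in (2**20, 2**16, 2**12):
    hasher = FeatureHasher(n_features=n_bins, input_type="string")
    X = hasher.transform([[f] for f in features])   # one feature per row
    occupied = len(set(X.nonzero()[1]))             # distinct bins actually used
    collided = len(features) - occupied             # features sharing a bin with another
    print(f"bins={n_bins:>8}  colliding features≈{collided}")
```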

4 citations


Proceedings ArticleDOI
04 Jul 2021
TL;DR: This paper proposes a new feature hashing approach that hashes similar features to the same bin based on a weight (the "weight term") while minimizing harmful collisions; this effectively reduces collisions between dissimilar features and thus improves model performance.
Abstract: Machine learning models usually face a problem when encountering large-scale text datasets. Such datasets produce sparse, high-dimensional features, which are complex or infeasible for learning models to process. Feature hashing is a dimensionality reduction technique commonly used in the pre-processing phase to overcome this problem. However, model performance is negatively affected by the collisions that inherently occur during the hashing process. In this study, we propose a new feature hashing approach that hashes similar features to the same bin based on their weight (the "weight term") while minimizing harmful collisions. The approach effectively reduces collisions between dissimilar features, thus improving model performance. Experiments conducted on binary and multi-class classification datasets with a very high number of sparse features show that the proposed approach achieves competitive performance compared with conventional feature hashing (FH).
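The weighted scheme itself is not detailed in the abstract, but the conventional feature hashing (FH) it is compared against can be sketched as follows; the bin count, tokenizer, and hash function here are illustrative choices, not the paper's.

```python
import hashlib
from collections import Counter

def hash_features(tokens, n_bins=1024):
    """Conventional signed hashing trick: map each token to a bin and a sign,
    and accumulate counts so that colliding tokens partially cancel."""
    vec = [0.0] * n_bins
    for tok, count in Counter(tokens).items():
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:4], "little") % n_bins   # bin index
        sign = 1.0 if digest[4] % 2 == 0 else -1.0            # +/- sign
        vec[idx] += sign * count
    return vec

doc = "the quick brown fox jumps over the lazy dog".split()
x = hash_features(doc, n_bins=32)
print([v for v in x if v != 0.0])
```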

3 citations


Journal Article
Ahmad Berjaoui
TL;DR: This work introduces hash-routed convolutional neural networks: a group of convolutional units where data flows dynamically, providing excellent plasticity thanks to its routed nature while generating stable features through the use of orthogonal feature hashing.
Abstract: Continual learning could shift the machine learning paradigm from data centric to model centric. A continual learning model needs to scale efficiently to handle semantically different datasets, while avoiding unnecessary growth. We introduce hash-routed convolutional neural networks: a group of convolutional units where data flows dynamically. Feature maps are compared using feature hashing and similar data is routed to the same units. A hash-routed network provides excellent plasticity thanks to its routed nature, while generating stable features through the use of orthogonal feature hashing. Each unit evolves separately and new units can be added (to be used only when necessary). Hash-routed networks achieve excellent performance across a variety of typical continual learning benchmarks without storing raw data and train using only gradient descent. Besides providing a continual learning framework for supervised tasks with encouraging results, our model can be used for unsupervised or reinforcement learning.
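As a toy illustration of the routing idea only (the paper's orthogonal feature hashing and unit bookkeeping are not reproduced here), a feature map can be summarized, hashed to a short binary signature with a fixed random projection, and routed to the unit that signature indexes:

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, sig_bits, channels = 4, 8, 512
projection = rng.standard_normal((sig_bits, channels))    # fixed hashing projection

def route(feature_map):
    """Pick a unit for a C x H x W feature map from the binary signature of
    its channel-pooled descriptor (sign of a fixed random projection)."""
    descriptor = feature_map.mean(axis=(1, 2))             # (channels,) summary
    bits = (projection @ descriptor > 0).astype(int)       # binary signature
    bucket = int("".join(map(str, bits)), 2)
    return bucket % n_units                                # index of the chosen unit

fmap = rng.standard_normal((channels, 14, 14))             # toy feature map
print("routed to unit", route(fmap))
```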

1 citation


Posted Content
TL;DR: Wang et al., as discussed by the authors, propose a method for dynamic texture recognition using PDV hashing and dictionary learning on multi-scale volume local binary patterns (PHD-MVLBP). Instead of forming very high-dimensional LBP histogram features, it first uses hash functions to map pixel difference vectors (PDVs) to binary vectors, then forms a dictionary from the derived binary vectors and encodes the PDVs using that dictionary.
Abstract: Spatial-temporal local binary patterns (STLBP) have been widely used in dynamic texture recognition. STLBP often encounters the high-dimension problem, as its dimension increases exponentially, so it can only utilize a small neighborhood. To tackle this problem, we propose a method for dynamic texture recognition using PDV hashing and dictionary learning on multi-scale volume local binary patterns (PHD-MVLBP). Instead of forming very high-dimensional LBP histogram features, it first uses hash functions to map the pixel difference vectors (PDVs) to binary vectors, then forms a dictionary from the derived binary vectors and encodes the PDVs using that dictionary. In this way, the PDVs are mapped to feature vectors of the size of the dictionary, instead of LBP histograms of very high dimension. Such an encoding scheme can effectively extract discriminative information from videos over a much larger neighborhood. Experimental results on two widely used dynamic texture datasets, DynTex++ and UCLA, show the superior performance of the proposed approach over state-of-the-art methods.
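A rough sketch of the hashing step only, assuming the hash functions are realized as random sign projections (the multi-scale sampling, dictionary learning, and encoding stages are omitted):

```python
import numpy as np

rng = np.random.default_rng(1)
n_bits, neighborhood = 16, 26   # e.g. a 3x3x3 spatio-temporal neighborhood minus the center
W = rng.standard_normal((n_bits, neighborhood))   # random projections acting as hash functions

def hash_pdv(pdv):
    """Map a pixel difference vector (PDV) to a short binary vector instead of
    indexing an exponentially large LBP histogram bin."""
    return (W @ pdv > 0).astype(np.uint8)

pdv = rng.standard_normal(neighborhood)           # toy pixel-difference vector
print(hash_pdv(pdv))
```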

1 citation


Posted Content
TL;DR: In this paper, the authors systematically survey existing well-known similarity hash functions to identify suitable ones, and conclude that MinHash and Nilsimsa can be directly incorporated into a similarity-analysis pipeline built on a vector management system.
Abstract: The booming adoption of vector management systems calls for a feasible similarity hash function as a front-end for similarity analysis. In this paper, we systematically survey the existing well-known similarity hash functions to identify the suitable ones. We conclude that MinHash and Nilsimsa can be directly incorporated into a similarity-analysis pipeline built on a vector management system. We then briefly and empirically discuss the performance and drawbacks of these functions, and highlight that MinHash, a variant of SimHash, and feature hashing are the best choices for large-scale similarity analysis with vector management systems.
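MinHash in particular produces fixed-length numeric signatures that slot naturally into a vector management system; a minimal self-contained sketch (signature length and hashing details are illustrative):

```python
import hashlib

def minhash(tokens, n_perm=64):
    """MinHash signature: for each of n_perm seeded hash functions, keep the
    minimum hash value observed over the token set."""
    return [
        min(int.from_bytes(hashlib.md5(f"{seed}:{t}".encode()).digest()[:8], "big")
            for t in tokens)
        for seed in range(n_perm)
    ]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature components estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = set("the quick brown fox jumps over the lazy dog".split())
b = set("the quick brown fox sleeps under the lazy dog".split())
print(estimated_jaccard(minhash(a), minhash(b)))
```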

Journal ArticleDOI
TL;DR: The proposed system uses the hashing trick to deflate the VGG-16 model, and a uniform hash function and a neighborhood function are used to inflate the model at runtime, bridging the gap between such models and Internet of Things edge devices.
Abstract: With each passing year, the compelling need to bring deep learning computational models to the edge grows, as does the disparity in resource demand between these models and Internet of Things edge devices. This article employs an old trick from the book, "deflate and inflate", to bridge this gap. The proposed system uses the hashing trick to deflate the model. A uniform hash function and a neighborhood function are used to inflate the model at runtime. The neighborhood function approximates the original parameter space better than the uniform hash function according to experimental results. Compared to existing techniques for distributing the VGG-16 model over the Fog-Edge platform, our deployment strategy has a 1.7×–7.5× speedup with only 1–4 devices due to decreased memory access and better resource utilization.
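The "deflate and inflate" mechanism is reminiscent of hashing-trick weight sharing (HashedNets-style): store a small bucket array and reconstruct each virtual weight at runtime by hashing its position. The sketch below uses only a uniform hash; the paper's neighborhood function, layer sizes, and hash choice are not reproduced here.

```python
import hashlib
import numpy as np

class HashedLayer:
    """Dense layer whose virtual (n_out, n_in) weight matrix is backed by a much
    smaller bucket array; weights are "inflated" at runtime by hashing positions."""

    def __init__(self, n_in, n_out, n_buckets, seed=0):
        rng = np.random.default_rng(seed)
        self.buckets = rng.standard_normal(n_buckets) * 0.01   # the deflated parameters
        positions = np.arange(n_in * n_out)
        # uniform hash of each weight position into a bucket
        self.lookup = np.array([
            int.from_bytes(hashlib.md5(f"{seed}:{k}".encode()).digest()[:4], "big") % n_buckets
            for k in positions
        ]).reshape(n_out, n_in)

    def forward(self, x):
        W = self.buckets[self.lookup]   # inflate the full weight matrix on the fly
        return W @ x

layer = HashedLayer(n_in=256, n_out=64, n_buckets=1024)   # 16x fewer stored weights
print(layer.forward(np.ones(256)).shape)
```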

Posted Content
TL;DR: In this article, additive feature hashing is proposed: instead of using categorical hash values as indices into a fixed-length vector, the hash values are added directly and converted into high-dimensional numerical vectors, exploiting the "almost orthogonal" property of high-dimensional random vectors.
Abstract: The hashing trick is a machine learning technique used to encode categorical features into a numerical vector representation of pre-defined fixed length. It works by using the categorical hash values as vector indices, and updating the vector values at those indices. Here we discuss a different approach based on additive-hashing and the "almost orthogonal" property of high-dimensional random vectors. That is, we show that additive feature hashing can be performed directly by adding the hash values and converting them into high-dimensional numerical vectors. We show that the performance of additive feature hashing is similar to the hashing trick, and we illustrate the results numerically using synthetic, language recognition, and SMS spam detection data.
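One plausible reading of this construction, as a minimal sketch: derive a seed from each token's hash, use it to generate a pseudo-random high-dimensional vector, and sum the vectors, relying on the near-orthogonality of high-dimensional random vectors so that distinct tokens barely interfere. The dimension, hash function, and data below are illustrative.

```python
import hashlib
import numpy as np

def additive_feature_hash(tokens, dim=1024):
    """Sum one pseudo-random vector per token, seeded by the token's hash value."""
    out = np.zeros(dim)
    for tok in tokens:
        seed = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:4], "big")
        out += np.random.default_rng(seed).standard_normal(dim)
    return out

a = additive_feature_hash("win a free prize now".split())
b = additive_feature_hash("win a free gift now".split())
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(round(float(cosine), 3))   # overlapping tokens yield a high cosine similarity
```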

Posted Content
TL;DR: The byteSteady algorithm, as presented in this paper, assumes that each input comes as a sequence of bytes, represents it by averaging the embedding vectors of byte-level n-grams for a pre-defined set of n, and uses the hashing trick to reduce the number of embedding vectors.
Abstract: This article introduces byteSteady -- a fast model for classification using byte-level n-gram embeddings. byteSteady assumes that each input comes as a sequence of bytes. A representation vector is produced using the averaged embedding vectors of byte-level n-grams, with a pre-defined set of n. The hashing trick is used to reduce the number of embedding vectors. This input representation vector is then fed into a linear classifier. A straightforward application of byteSteady is text classification. We also apply byteSteady to one type of non-language data -- DNA sequences for gene classification. For both problems we achieved competitive classification results against strong baselines, suggesting that byteSteady can be applied to both language and non-language data. Furthermore, we find that simple compression using Huffman coding does not significantly impact the results, which offers an accuracy-speed trade-off previously unexplored in machine learning.
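A rough sketch of the described pipeline; the table sizes, the n-gram set, the untrained weights, and the built-in `hash` function stand in for the paper's actual choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hashed, dim, n_classes = 4096, 64, 2
embeddings = rng.standard_normal((n_hashed, dim)) * 0.1   # hashed embedding table
W = rng.standard_normal((n_classes, dim)) * 0.1           # linear classifier weights

def represent(data: bytes, ngrams=(1, 2, 4)):
    """Average the embeddings of byte-level n-grams; the hashing trick maps each
    n-gram into a fixed-size embedding table."""
    vecs = []
    for n in ngrams:
        for i in range(len(data) - n + 1):
            idx = hash(data[i:i + n]) % n_hashed
            vecs.append(embeddings[idx])
    return np.mean(vecs, axis=0)

scores = W @ represent(b"ACGTACGTAAGT")   # e.g. a short DNA fragment
print(int(scores.argmax()))
```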

Patent
14 Jan 2021
TL;DR: In this article, a packer classification apparatus extracts features based on a section that holds packer information from files and classifies packers using a Deep Neural Network(DNN) for detection of new/variant packers.
Abstract: A packer classification apparatus extracts features based on the section that holds packer information in a file and classifies packers using a Deep Neural Network (DNN) for the detection of new and variant packers. A packer classification apparatus according to an embodiment uses PE section information. The apparatus includes: a collection-classification module that collects a data set and classifies the data by packer type to prepare for model learning; a token hash module that tokenizes the character string obtained after extracting the labels and section names of each sample and combining the section names, and obtains a fixed-size output value using feature hashing; and a type classification module that generates a learning model by training a Deep Neural Network (DNN) algorithm on the data set with the extracted features, and classifies files by packer type using the learning model after extracting features from the files to be classified.
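A toy sketch of the token-hash step described above; the section names, character-level tokenization, and output size are illustrative, not the patent's exact procedure.

```python
from sklearn.feature_extraction import FeatureHasher

# illustrative PE section names extracted from a packed binary
section_names = [".text", ".rsrc", "UPX0", "UPX1"]
tokens = [c for name in section_names for c in name.lower()]   # naive tokenization

hasher = FeatureHasher(n_features=256, input_type="string")
x = hasher.transform([tokens]).toarray()[0]   # fixed-size feature vector for the DNN
print(x.shape, int((x != 0).sum()), "non-zero bins")
```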