Cuckoo feature hashing: Dynamic weight sharing for sparse analytics
Jinyang Gao, Beng Chin Ooi, Yanyan Shen, Wang-Chien Lee +3 more
pp. 2135-2141
TLDR
Experimental results on prediction tasks with hundreds of millions of features demonstrate that CCFH achieves the same level of performance using only 15%-25% of the parameters required by conventional feature hashing.
Abstract
Feature hashing is widely used to process large-scale sparse features for learning predictive models. Collisions inherently occur in the hashing process and hurt model performance. In this paper, we develop a feature hashing scheme called Cuckoo Feature Hashing (CCFH), based on the principle behind cuckoo hashing, a hashing scheme designed to resolve collisions. By providing multiple possible hash locations for each feature, CCFH prevents collisions between predictive features by dynamically hashing them into alternative locations during model training. Experimental results on prediction tasks with hundreds of millions of features demonstrate that CCFH achieves the same level of performance using only 15%-25% of the parameters required by conventional feature hashing.
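The cuckoo-hashing principle the abstract builds on can be illustrated with a minimal sketch. This is illustrative only: the two seeded hash functions, table size, and eviction limit are assumptions, and it shows classic cuckoo placement rather than CCFH's training-time dynamic relocation of predictive features.

```python
import hashlib

def h(key: str, seed: int, size: int) -> int:
    # Two hash functions, derived here from seeds 0 and 1 (an assumption;
    # any pair of independent hash functions would do).
    digest = hashlib.md5(f"{seed}:{key}".encode()).hexdigest()
    return int(digest, 16) % size

def cuckoo_insert(table, key, max_kicks=50):
    """Place `key` in one of its two candidate buckets, evicting and
    relocating existing occupants cuckoo-style when both are taken."""
    size = len(table)
    idx = h(key, 0, size)
    for _ in range(max_kicks):
        if table[idx] is None:
            table[idx] = key
            return True
        # Evict the current occupant and move it to its *other* bucket.
        table[idx], key = key, table[idx]
        alt0, alt1 = h(key, 0, size), h(key, 1, size)
        idx = alt1 if idx == alt0 else alt0
    return False  # table too full; a real implementation would resize

# Each feature gets two possible locations, so colliding features can
# be pushed apart instead of sharing one weight.
table = [None] * 8
for feat in ["user_42", "item_7", "ctx_3"]:
    cuckoo_insert(table, feat)
```

In CCFH the analogous relocation happens during model training, guided by which features are predictive, rather than by simple occupancy as here.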
Citations
Posted Content
Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems
TL;DR: This work proposes mixed dimension embedding layers in which the dimension of a particular embedding vector can depend on the frequency of the item, which drastically reduces the memory requirement for the embedding, while maintaining and sometimes improving the ML performance.
Journal ArticleDOI
Towards Reliable Learning for High Stakes Applications.
TL;DR: This paper proposes an exploratory solution called GALVE (Generative Adversarial Learning with Variance Expansion), which adopts generative adversarial learning to implicitly measure the region where the model achieves good generalization performance, attaining an error rate less than half of that obtained by straightforwardly measuring confidence on the CIFAR-10 and SVHN computer vision tasks.
Posted Content
PANDA: Facilitating Usable AI Development
Jinyang Gao, Wei Wang, Meihui Zhang, Gang Chen, H. V. Jagadish, Guoliang Li, Teck Khim Ng, Beng Chin Ooi, Sheng Wang, Jingren Zhou +9 more
TL;DR: A new perspective on developing AI solutions is taken, and a solution for making AI usable is presented that will enable all subject matter experts (e.g., clinicians) to exploit AI like data scientists.
Proceedings ArticleDOI
A New Feature Hashing Approach Based on Term Weight for Dimensional Reduction
TL;DR: This paper proposes a new feature hashing approach that hashes similar features to the same bin based on their weight, known as the "weight term", while minimizing collisions between dissimilar features, thus improving model performance.
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
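The Adam update rule summarized above can be sketched in its textbook scalar form. This is a minimal illustration with the commonly cited default hyperparameters, not a production optimizer; the function name and list-based parameter layout are assumptions for the sketch.

```python
import math

def adam_step(params, grads, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update over parallel lists of parameters and gradients.

    m, v hold the running first and second moment estimates; t is the
    1-based step count used for bias correction.
    """
    for i, g in enumerate(grads):
        m[i] = b1 * m[i] + (1 - b1) * g        # first-moment (mean) estimate
        v[i] = b2 * v[i] + (1 - b2) * g * g    # second-moment estimate
        m_hat = m[i] / (1 - b1 ** t)           # bias-corrected moments
        v_hat = v[i] / (1 - b2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

# Minimize f(x) = x^2 from x = 5; the gradient is 2x.
x, m, v = [5.0], [0.0], [0.0]
for t in range(1, 201):
    adam_step(x, [2 * x[0]], m, v, t, lr=0.1)
```

The adaptive per-coordinate step (dividing by the root of the second-moment estimate) is what the TL;DR means by "adaptive estimates of lower-order moments".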
Journal ArticleDOI
R: A Language for Data Analysis and Graphics
Ross Ihaka, Robert Gentleman
TL;DR: In this article, the authors discuss their experience designing and implementing a statistical computing language that combines what they felt were useful features from two existing computer languages; they feel the new language provides advantages in portability, computational efficiency, memory management, and scoping.
Proceedings Article
Spark: cluster computing with working sets
TL;DR: Spark can outperform Hadoop by 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time.
Proceedings Article
Locality Preserving Projections
Xiaofei He, Partha Niyogi
TL;DR: These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data set by finding the optimal linear approximations to the eigenfunctions of the Laplace Beltrami operator on the manifold.
Proceedings ArticleDOI
Fisher discriminant analysis with kernels
TL;DR: In this article, a non-linear classification technique based on Fisher's discriminant is proposed and the main ingredient is the kernel trick which allows the efficient computation of Fisher discriminant in feature space.
Related Papers (5)
Compact Structure Hashing via Sparse and Similarity Preserving Embedding
Renzhen Ye, Xuelong Li +1 more