Cuckoo feature hashing: Dynamic weight sharing for sparse analytics
Jinyang Gao, Beng Chin Ooi, Yanyan Shen, Wang-Chien Lee +3 more
pp. 2135-2141
TLDR
Experimental results on prediction tasks with hundreds of millions of features demonstrate that CCFH achieves the same level of performance using only 15%-25% of the parameters required by conventional feature hashing.
Abstract
Feature hashing is widely used to process large-scale sparse features for learning predictive models. Collisions inherently occur in the hashing process and hurt model performance. In this paper, we develop a feature hashing scheme called Cuckoo Feature Hashing (CCFH), based on the principle behind cuckoo hashing, a hashing scheme designed to resolve collisions. By providing multiple possible hash locations for each feature, CCFH prevents collisions between predictive features by dynamically hashing them into alternative locations during model training. Experimental results on prediction tasks with hundreds of millions of features demonstrate that CCFH achieves the same level of performance using only 15%-25% of the parameters required by conventional feature hashing.
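The cuckoo-hashing principle the abstract builds on can be illustrated with a minimal sketch. This is illustrative only: the two seeded hash functions, table size, and eviction limit are assumptions, and it shows classic cuckoo placement rather than CCFH's training-time dynamic relocation of predictive features.

```python
import hashlib

def h(key: str, seed: int, size: int) -> int:
    # Two hash functions, derived here from seeds 0 and 1 (an assumption;
    # any pair of independent hash functions would do).
    digest = hashlib.md5(f"{seed}:{key}".encode()).hexdigest()
    return int(digest, 16) % size

def cuckoo_insert(table, key, max_kicks=50):
    """Place `key` in one of its two candidate buckets, evicting and
    relocating existing occupants cuckoo-style when both are taken."""
    size = len(table)
    idx = h(key, 0, size)
    for _ in range(max_kicks):
        if table[idx] is None:
            table[idx] = key
            return True
        # Evict the current occupant and move it to its *other* bucket.
        table[idx], key = key, table[idx]
        alt0, alt1 = h(key, 0, size), h(key, 1, size)
        idx = alt1 if idx == alt0 else alt0
    return False  # table too full; a real implementation would resize

# Each feature gets two possible locations, so colliding features can
# be pushed apart instead of sharing one weight.
table = [None] * 8
for feat in ["user_42", "item_7", "ctx_3"]:
    cuckoo_insert(table, feat)
```

In CCFH the analogous relocation happens during model training, guided by which features are predictive, rather than by simple occupancy as here.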
Citations
Posted Content
Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems
TL;DR: This work proposes mixed dimension embedding layers in which the dimension of a particular embedding vector can depend on the frequency of the item, which drastically reduces the memory requirement for the embedding, while maintaining and sometimes improving the ML performance.
Journal ArticleDOI
Towards Reliable Learning for High Stakes Applications.
TL;DR: This paper proposes an exploratory solution called GALVE (Generative Adversarial Learning with Variance Expansion), which adopts generative adversarial learning to implicitly measure the region where the model achieves good generalization performance, attaining an error rate less than half of that obtained by straightforwardly measuring confidence on the CIFAR-10 and SVHN computer vision tasks.
Posted Content
PANDA: Facilitating Usable AI Development
Jinyang Gao, Wei Wang, Meihui Zhang, Gang Chen, H. V. Jagadish, Guoliang Li, Teck Khim Ng, Beng Chin Ooi, Sheng Wang, Jingren Zhou +9 more
TL;DR: A new perspective on developing AI solutions is taken, and a solution for making AI usable is presented that will enable all subject matter experts (e.g., clinicians) to exploit AI like data scientists.
Proceedings ArticleDOI
A New Feature Hashing Approach Based on Term Weight for Dimensional Reduction
TL;DR: This paper proposes a new feature hashing approach that hashes similar features to the same bin based on their weight, known as the "weight term", while minimizing collisions between dissimilar features, thus improving model performance.
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
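The Adam update rule summarized above can be sketched in its textbook scalar form. This is a minimal illustration with the commonly cited default hyperparameters, not a production optimizer; the function name and list-based parameter layout are assumptions for the sketch.

```python
import math

def adam_step(params, grads, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update over parallel lists of parameters and gradients.

    m, v hold the running first and second moment estimates; t is the
    1-based step count used for bias correction.
    """
    for i, g in enumerate(grads):
        m[i] = b1 * m[i] + (1 - b1) * g        # first-moment (mean) estimate
        v[i] = b2 * v[i] + (1 - b2) * g * g    # second-moment estimate
        m_hat = m[i] / (1 - b1 ** t)           # bias-corrected moments
        v_hat = v[i] / (1 - b2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

# Minimize f(x) = x^2 from x = 5; the gradient is 2x.
x, m, v = [5.0], [0.0], [0.0]
for t in range(1, 201):
    adam_step(x, [2 * x[0]], m, v, t, lr=0.1)
```

The adaptive per-coordinate step (dividing by the root of the second-moment estimate) is what the TL;DR means by "adaptive estimates of lower-order moments".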
Journal ArticleDOI
R: A Language for Data Analysis and Graphics
Ross Ihaka, Robert Gentleman
TL;DR: In this article, the authors discuss their experience designing and implementing a statistical computing language that combines what they felt were useful features from two existing computer languages; they feel the new language provides advantages in portability, computational efficiency, memory management, and scoping.
Proceedings Article
Spark: cluster computing with working sets
TL;DR: Spark can outperform Hadoop by 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time.
Proceedings Article
Locality Preserving Projections
Xiaofei He, Partha Niyogi
TL;DR: These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data set by finding the optimal linear approximations to the eigenfunctions of the Laplace Beltrami operator on the manifold.
Proceedings ArticleDOI
Fisher discriminant analysis with kernels
TL;DR: In this article, a non-linear classification technique based on Fisher's discriminant is proposed and the main ingredient is the kernel trick which allows the efficient computation of Fisher discriminant in feature space.
Related Papers (5)
Compact Structure Hashing via Sparse and Similarity Preserving Embedding
Renzhen Ye, Xuelong Li +1 more