Proceedings ArticleDOI

GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce

TLDR
GrokNet leverages a multi-task learning approach to train a single computer vision trunk, achieving a 2.1x improvement in exact product match accuracy when compared to the previous state-of-the-art Facebook product recognition system.
Abstract
In this paper, we present GrokNet, a deployed image recognition system for commerce applications. GrokNet leverages a multi-task learning approach to train a single computer vision trunk. We achieve a 2.1x improvement in exact product match accuracy when compared to the previous state-of-the-art Facebook product recognition system. We achieve this by training on 7 datasets across several commerce verticals, using 80 categorical loss functions and 3 embedding losses. We share our experience of combining diverse sources with wide-ranging label semantics and image statistics, including learning from human annotations, user-generated tags, and noisy search engine interaction data. GrokNet has demonstrated gains in production applications and operates at Facebook scale.
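The abstract's core idea, one shared trunk feeding many task-specific loss functions, can be sketched in miniature. This is an illustrative stand-in, not the paper's architecture: a linear projection plays the role of the CNN trunk, and a handful of classification heads stand in for GrokNet's 80 categorical losses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the paper.
D_IN, D_EMB, N_TASKS, N_CLASSES = 64, 16, 3, 5

# Shared "trunk": one projection standing in for the CNN backbone.
W_trunk = rng.normal(scale=0.1, size=(D_IN, D_EMB))

# One classification head per task; every head sees the same embedding.
heads = [rng.normal(scale=0.1, size=(D_EMB, N_CLASSES)) for _ in range(N_TASKS)]

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def multi_task_loss(x, labels_per_task):
    emb = x @ W_trunk                      # shared embedding for all tasks
    losses = [cross_entropy(softmax(emb @ H), y)
              for H, y in zip(heads, labels_per_task)]
    return sum(losses), emb                # one total loss trains one trunk

x = rng.normal(size=(8, D_IN))
labels = [rng.integers(0, N_CLASSES, size=8) for _ in range(N_TASKS)]
total, emb = multi_task_loss(x, labels)
```

The point of the sketch is the shape of the setup: because every per-task loss is a function of the same embedding, gradients from all tasks flow into a single trunk, which is what lets one model serve many commerce verticals.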


Citations
Journal ArticleDOI

Multi-Task Learning for Dense Prediction Tasks: A Survey.

TL;DR: This survey provides a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision, with an explicit emphasis on dense prediction tasks.
Posted Content

Training Vision Transformers for Image Retrieval

TL;DR: In this article, a transformer-based approach for image retrieval is proposed: a transformer generates image descriptors and is trained with a metric learning objective that combines a contrastive loss with a differential entropy regularizer.
Proceedings ArticleDOI

CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval

TL;DR: The CommerceMM model achieves state-of-the-art performance on 7 commerce-related downstream tasks after fine-tuning and proposes another 9 novel cross-modal and cross-pair retrieval tasks, called Omni-Retrieval pre-training.
Proceedings ArticleDOI

Que2Search: Fast and Accurate Query and Document Understanding for Search at Facebook

TL;DR: In this paper, the authors present Que2Search, a deployed query and product understanding system for search. It leverages multi-task and multi-modal learning to train query representations, combining the latest multilingual natural language understanding architectures such as XLM and XLM-R with multi-modal fusion techniques, and achieves over 5% absolute offline relevance improvement and over 4% online engagement gain over state-of-the-art Facebook product understanding systems.
Journal ArticleDOI

Deep multi-task learning for malware image classification

TL;DR: In this paper, a multi-task learning framework is proposed for accurate and fast malware detection via malware image classification; it generates bitmap (BMP) and portable network graphics (PNG) images from malware features, which are then fed to a deep learning classifier.
References
Journal ArticleDOI

Deep learning

TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
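The negative-sampling alternative to the hierarchical softmax mentioned in this TL;DR can be sketched in a few lines. Everything here is illustrative (vocabulary size, embedding dimension, number of negatives); only the loss shape follows the skip-gram negative-sampling formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, K = 100, 8, 5   # vocab size, embedding dim, negatives per positive

W_in = rng.normal(scale=0.1, size=(V, D))    # input (center-word) vectors
W_out = rng.normal(scale=0.1, size=(V, D))   # output (context-word) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair.

    Pushes the true context word's score up and K sampled
    "negative" words' scores down, instead of normalizing
    over the whole vocabulary.
    """
    v = W_in[center]
    pos = -np.log(sigmoid(W_out[context] @ v))
    neg = -np.log(sigmoid(-(W_out[negatives] @ v))).sum()
    return pos + neg

negs = rng.integers(0, V, size=K)
loss = sgns_loss(center=3, context=7, negatives=negs)
```

Each training step touches only K + 1 output vectors rather than all V, which is what makes training on millions of words and phrases tractable.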
Proceedings ArticleDOI

Aggregated Residual Transformations for Deep Neural Networks

TL;DR: ResNeXt is a simple, highly modularized network architecture for image classification, constructed by repeating a building block that aggregates a set of transformations with the same topology.
Proceedings ArticleDOI

Dimensionality Reduction by Learning an Invariant Mapping

TL;DR: This work presents a method - called Dimensionality Reduction by Learning an Invariant Mapping (DrLIM) - for learning a globally coherent nonlinear function that maps the data evenly to the output manifold.
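The pairwise loss at the heart of DrLIM can be written directly from its definition: similar pairs (y = 0) are pulled together, dissimilar pairs (y = 1) are pushed apart until their distance exceeds a margin. The margin value and the toy vectors below are illustrative.

```python
import numpy as np

def contrastive_loss(x1, x2, y, margin=1.0):
    """DrLIM-style contrastive loss: y = 0 for similar, y = 1 for dissimilar.

    L = (1 - y) * 1/2 * d^2  +  y * 1/2 * max(0, margin - d)^2
    where d is the Euclidean distance between the two embeddings.
    """
    d = np.linalg.norm(x1 - x2)
    return (1 - y) * 0.5 * d**2 + y * 0.5 * max(0.0, margin - d)**2

a = np.array([0.0, 0.0])
b = np.array([0.1, 0.0])
sim = contrastive_loss(a, b, y=0)   # ~0.005: close pair, small pull
dis = contrastive_loss(a, b, y=1)   # ~0.405: close pair labeled dissimilar is penalized
```

Because the dissimilar term vanishes once the distance exceeds the margin, the loss shapes the output manifold globally without collapsing it, which is the "invariant mapping" property the title refers to.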