Proceedings ArticleDOI

GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce

TLDR
GrokNet leverages a multi-task learning approach to train a single computer vision trunk, achieving a 2.1x improvement in exact product match accuracy when compared to the previous state-of-the-art Facebook product recognition system.
Abstract
In this paper, we present GrokNet, a deployed image recognition system for commerce applications. GrokNet leverages a multi-task learning approach to train a single computer vision trunk. We achieve a 2.1x improvement in exact product match accuracy when compared to the previous state-of-the-art Facebook product recognition system. We achieve this by training on 7 datasets across several commerce verticals, using 80 categorical loss functions and 3 embedding losses. We share our experience of combining diverse sources with wide-ranging label semantics and image statistics, including learning from human annotations, user-generated tags, and noisy search engine interaction data. GrokNet has demonstrated gains in production applications and operates at Facebook scale.
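The abstract's core idea, one shared trunk feeding many task-specific loss functions, can be sketched in miniature. This is an illustrative stand-in, not the paper's architecture: a linear projection plays the role of the CNN trunk, and a handful of classification heads stand in for GrokNet's 80 categorical losses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the paper.
D_IN, D_EMB, N_TASKS, N_CLASSES = 64, 16, 3, 5

# Shared "trunk": one projection standing in for the CNN backbone.
W_trunk = rng.normal(scale=0.1, size=(D_IN, D_EMB))

# One classification head per task; every head sees the same embedding.
heads = [rng.normal(scale=0.1, size=(D_EMB, N_CLASSES)) for _ in range(N_TASKS)]

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def multi_task_loss(x, labels_per_task):
    emb = x @ W_trunk                      # shared embedding for all tasks
    losses = [cross_entropy(softmax(emb @ H), y)
              for H, y in zip(heads, labels_per_task)]
    return sum(losses), emb                # one total loss trains one trunk

x = rng.normal(size=(8, D_IN))
labels = [rng.integers(0, N_CLASSES, size=8) for _ in range(N_TASKS)]
total, emb = multi_task_loss(x, labels)
```

The point of the sketch is the shape of the setup: because every per-task loss is a function of the same embedding, gradients from all tasks flow into a single trunk, which is what lets one model serve many commerce verticals.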


Citations
Journal ArticleDOI

Multi-Task Learning for Dense Prediction Tasks: A Survey.

TL;DR: This survey provides a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision, with an explicit emphasis on dense prediction tasks.
Posted Content

Training Vision Transformers for Image Retrieval

TL;DR: In this article, a transformer-based approach for image retrieval is proposed: a transformer generates image descriptors and is trained with a metric learning objective that combines a contrastive loss with a differential entropy regularizer.
Proceedings ArticleDOI

CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval

TL;DR: The CommerceMM model achieves state-of-the-art performance on 7 commerce-related downstream tasks after fine-tuning and proposes another 9 novel cross-modal and cross-pair retrieval tasks, called Omni-Retrieval pre-training.
Proceedings ArticleDOI

Que2Search: Fast and Accurate Query and Document Understanding for Search at Facebook

TL;DR: In this paper, the authors present Que2Search, a deployed query and product understanding system for search. It leverages multi-task and multi-modal learning to train query representations, combining the latest multilingual natural language understanding architectures such as XLM and XLM-R with multi-modal fusion techniques, and achieves over 5% absolute offline relevance improvement and over 4% online engagement gain over state-of-the-art Facebook product understanding systems.
Journal ArticleDOI

Deep multi-task learning for malware image classification

TL;DR: In this paper, a multi-task learning framework is proposed for accurate and fast malware detection via malware image classification; it generates bitmap (BMP) and portable network graphics (PNG) images from malware features, which are then fed to a deep learning classifier.
References
Journal ArticleDOI

Deep learning

TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
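The negative-sampling alternative to the hierarchical softmax mentioned in this TL;DR can be sketched in a few lines. Everything here is illustrative (vocabulary size, embedding dimension, number of negatives); only the loss shape follows the skip-gram negative-sampling formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, K = 100, 8, 5   # vocab size, embedding dim, negatives per positive

W_in = rng.normal(scale=0.1, size=(V, D))    # input (center-word) vectors
W_out = rng.normal(scale=0.1, size=(V, D))   # output (context-word) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair.

    Pushes the true context word's score up and K sampled
    "negative" words' scores down, instead of normalizing
    over the whole vocabulary.
    """
    v = W_in[center]
    pos = -np.log(sigmoid(W_out[context] @ v))
    neg = -np.log(sigmoid(-(W_out[negatives] @ v))).sum()
    return pos + neg

negs = rng.integers(0, V, size=K)
loss = sgns_loss(center=3, context=7, negatives=negs)
```

Each training step touches only K + 1 output vectors rather than all V, which is what makes training on millions of words and phrases tractable.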
Proceedings ArticleDOI

Aggregated Residual Transformations for Deep Neural Networks

TL;DR: ResNeXt is a simple, highly modularized network architecture for image classification, constructed by repeating a building block that aggregates a set of transformations with the same topology.
Proceedings ArticleDOI

Dimensionality Reduction by Learning an Invariant Mapping

TL;DR: This work presents a method - called Dimensionality Reduction by Learning an Invariant Mapping (DrLIM) - for learning a globally coherent nonlinear function that maps the data evenly to the output manifold.
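The pairwise loss at the heart of DrLIM can be written directly from its definition: similar pairs (y = 0) are pulled together, dissimilar pairs (y = 1) are pushed apart until their distance exceeds a margin. The margin value and the toy vectors below are illustrative.

```python
import numpy as np

def contrastive_loss(x1, x2, y, margin=1.0):
    """DrLIM-style contrastive loss: y = 0 for similar, y = 1 for dissimilar.

    L = (1 - y) * 1/2 * d^2  +  y * 1/2 * max(0, margin - d)^2
    where d is the Euclidean distance between the two embeddings.
    """
    d = np.linalg.norm(x1 - x2)
    return (1 - y) * 0.5 * d**2 + y * 0.5 * max(0.0, margin - d)**2

a = np.array([0.0, 0.0])
b = np.array([0.1, 0.0])
sim = contrastive_loss(a, b, y=0)   # ~0.005: close pair, small pull
dis = contrastive_loss(a, b, y=1)   # ~0.405: close pair labeled dissimilar is penalized
```

Because the dissimilar term vanishes once the distance exceeds the margin, the loss shapes the output manifold globally without collapsing it, which is the "invariant mapping" property the title refers to.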