Author

Vasiliy Galyuk

Bio: Vasiliy Galyuk is an academic researcher who has contributed to research in the topics Artificial intelligence and Computer science. The author has an h-index of 1 and has co-authored 3 publications receiving 3 citations.

Papers
Proceedings Article
01 Oct 2019
TL;DR: In this article, a composite mini-batch technique is proposed to combine several sampling strategies in one training process; the main idea is to compose mini-batches from several parts and use a different sampling strategy for each part.
Abstract: Mini-batch construction strategy is an important part of deep representation learning. Different strategies have their advantages and limitations. Usually only one of them is selected to create mini-batches for training. However, in many cases a combination of strategies can be more efficient than any single one. In this paper, we propose Composite Mini-Batches, a technique to combine several mini-batch sampling strategies in one training process. The main idea is to compose mini-batches from several parts and use a different sampling strategy for each part. With this kind of mini-batch construction, we combine the advantages and reduce the limitations of the individual mini-batch sampling strategies. We also propose Interpolated Embeddings and Priority Class Sampling as complementary methods to improve the training of face representations. Our experiments on the challenging task of disguised face recognition confirm the advantages of the proposed methods.

5 citations
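
The composition idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the two samplers (`uniform_sampler`, `class_balanced_sampler`), the `per_class` parameter, and the toy label list are all hypothetical choices made only to show how a batch might be assembled from parts that each use their own sampling strategy.

```python
import random
from collections import defaultdict


def uniform_sampler(labels, k, rng):
    """Pick k sample indices uniformly at random, without replacement."""
    return rng.sample(range(len(labels)), k)


def class_balanced_sampler(labels, k, rng, per_class=2):
    """Pick k indices by drawing up to `per_class` samples from random classes."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = list(by_class)
    rng.shuffle(classes)
    picked = []
    for c in classes:
        pool = by_class[c]
        picked.extend(rng.sample(pool, min(per_class, len(pool))))
        if len(picked) >= k:
            break
    return picked[:k]


def compose_mini_batch(labels, parts, rng):
    """Concatenate the outputs of several (sampler, size) parts into one batch.

    Parts are sampled independently, so the same index may appear in two parts.
    """
    batch = []
    for sampler, size in parts:
        batch.extend(sampler(labels, size, rng))
    return batch


# Hypothetical toy dataset: 10 classes with 4 samples each.
labels = [i // 4 for i in range(40)]
rng = random.Random(0)
batch = compose_mini_batch(
    labels,
    [(uniform_sampler, 4), (class_balanced_sampler, 4)],
    rng,
)
```

Here the first four indices come from uniform sampling and the last four from two randomly chosen classes with two samples each; other part sizes or samplers can be swapped in by changing the `parts` list.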

Posted Content
TL;DR: Prototype Memory uses a limited-size memory module for storing recent class prototypes and employs a set of algorithms to update it in an appropriate way; it can be used with various loss functions, hard example mining algorithms and encoder architectures.
Abstract: Face representation learning using datasets with a massive number of identities requires appropriate training methods. The softmax-based approach, currently the state of the art in face recognition, in its usual "full softmax" form is not suitable for datasets with millions of persons. Several methods based on the "sampled softmax" approach were proposed to remove this limitation. These methods, however, have a set of disadvantages. One of them is the problem of "prototype obsolescence": classifier weights (prototypes) of the rarely sampled classes receive too few gradient updates and become outdated and detached from the current encoder state, resulting in incorrect training signals. This problem is especially serious in ultra-large-scale datasets. In this paper, we propose a novel face representation learning model called Prototype Memory, which alleviates this problem and allows training on a dataset of any size. Prototype Memory consists of a limited-size memory module for storing recent class prototypes and employs a set of algorithms to update it in an appropriate way. New class prototypes are generated on the fly using exemplar embeddings in the current mini-batch. These prototypes are enqueued to the memory and used as classifier weights for the usual softmax classification-based training. To prevent obsolescence and keep the memory closely connected to the encoder, prototypes are regularly refreshed, and the oldest ones are dequeued and disposed of. Prototype Memory is computationally efficient and independent of dataset size. It can be used with various loss functions, hard example mining algorithms and encoder architectures. We prove the effectiveness of the proposed model by extensive experiments on popular face recognition benchmarks.

2 citations
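
The enqueue, refresh and dequeue behaviour described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's algorithm: the class name `PrototypeMemorySketch`, the choice of the per-class mean embedding as the prototype, and the rule "refresh means overwrite and move to the newest position" are all assumptions introduced here.

```python
from collections import OrderedDict


class PrototypeMemorySketch:
    """Fixed-capacity store of class prototypes, ordered oldest to newest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # class id -> prototype vector

    def update(self, embeddings, labels):
        """Build prototypes from the current mini-batch and enqueue them."""
        groups = {}
        for emb, label in zip(embeddings, labels):
            groups.setdefault(label, []).append(emb)
        for label, embs in groups.items():
            dim = len(embs[0])
            # Assumption: a prototype is the mean of the class's embeddings.
            proto = [sum(e[d] for e in embs) / len(embs) for d in range(dim)]
            self.store.pop(label, None)  # refresh: drop any stale copy
            self.store[label] = proto    # enqueue as the newest entry
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)  # dequeue the oldest prototype


memory = PrototypeMemorySketch(capacity=2)
memory.update([[1.0, 0.0], [3.0, 0.0]], ["a", "a"])
memory.update([[0.0, 1.0]], ["b"])
memory.update([[0.0, 5.0]], ["c"])  # capacity exceeded, "a" is dequeued
```

Because the store size is bounded by `capacity`, the memory cost in this sketch does not depend on how many classes the dataset contains, which is the property the abstract emphasizes.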

Posted Content
TL;DR: In this paper, a number of diverse subsystems using deep neural networks as feature extractors were applied to speaker verification in the NIST 2021 Speaker Recognition Evaluation, for both fixed and open training conditions.
Abstract: This paper presents a description of STC Ltd. systems submitted to the NIST 2021 Speaker Recognition Evaluation for both fixed and open training conditions. These systems consist of a number of diverse subsystems that use deep neural networks as feature extractors. During the NIST 2021 SRE challenge we focused on training state-of-the-art deep speaker embedding extractors such as ResNets and ECAPA networks with additive angular margin based loss functions. Additionally, inspired by the recent success of wav2vec 2.0 features in automatic speech recognition, we explored the effectiveness of this approach in the speaker verification field. According to our observations, fine-tuning the pretrained large wav2vec 2.0 model provides our best performing systems for the open track condition. Our experiments with wav2vec 2.0 based extractors for the fixed condition showed that unsupervised autoregressive pretraining with the Contrastive Predictive Coding loss opens the door to training powerful transformer-based extractors from raw speech signals. For the video modality we developed our best solution with the RetinaFace face detector and a deep ResNet face embedding extractor trained on large face image datasets. The final results for primary systems were obtained by different configurations of subsystem fusion at the score level followed by score calibration.
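
The abstract above mentions additive angular margin based loss functions. As a rough sketch of the general idea, and not of the submitted systems, the margin is added to the angle between an embedding and its target-class weight before scaling. The function name and the `scale` and `margin` values below are illustrative assumptions.

```python
import math


def additive_angular_margin_logits(cosines, target, scale=30.0, margin=0.2):
    """Turn cosine similarities into logits, penalizing the target class.

    `cosines[j]` is the cosine between the embedding and class j's weight.
    Only the target class gets the additive angular margin.
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding error
        if j == target:
            # cos(theta + m); assumes theta + m stays below pi
            c = math.cos(math.acos(c) + margin)
        logits.append(scale * c)
    return logits


logits = additive_angular_margin_logits([0.8, 0.1], target=0)
```

The resulting logits would then be fed to an ordinary softmax cross-entropy; the margin lowers the target logit, so the embedding has to move closer to its class weight to reach the same loss.
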
Book Chapter
TL;DR: FaceMix is a flexible face-specific data augmentation technique that transfers a local area of one image to another image; it can generate new images within a class, generate images for a class using face data from other classes, or combine these two modes.
Abstract: Augmentation strategies for image recognition based on local image patches have gained widespread popularity. Their main idea is to replace or remove some local regions of the image. The advantage of these methods is that they change part of the image and force the network to pay attention to the less significant parts, which leads to a greater generalization capacity of the network. While these methods work well for image recognition, they do not perform as well for face recognition tasks. The purpose of this work is to create an augmentation specialized for face recognition and free of the shortcomings of previous works. We present FaceMix: a flexible face-specific data augmentation technique that transfers a local area of one image to another image. The method has two operating modes: it can generate new images within a class, and it can generate images for a class using face data from other classes; these two modes can also be combined. FaceMix helps to solve the problems of class imbalance and an insufficient number of images per identity. A feature of this method is that the number of possible artificial images grows quadratically with the number of real images. Experiments on face recognition benchmarks, such as CFP-FP, AgeDB, CALFW, CPLFW, XQLFW, SLLFW, RFW and MegaFace, demonstrate the effectiveness of the proposed method.
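
The core operation named in the abstract above, transferring a local area from one image to another, can be sketched as below. This is a simplified illustration only: the function `transfer_patch`, the rectangular region, and the use of identical coordinates in both images are assumptions, and the abstract does not say how FaceMix actually selects or aligns regions.

```python
def transfer_patch(target, source, top, left, height, width):
    """Return a copy of `target` with one rectangle copied in from `source`.

    Images are lists of rows; both must be at least as large as the rectangle.
    """
    mixed = [row[:] for row in target]  # leave the original image untouched
    for r in range(top, top + height):
        mixed[r][left:left + width] = source[r][left:left + width]
    return mixed


# Hypothetical 4x4 single-channel images.
target = [[0] * 4 for _ in range(4)]
source = [[1] * 4 for _ in range(4)]
mixed = transfer_patch(target, source, top=1, left=1, height=2, width=2)
```

One way to read the quadratic-growth remark: with n real images there are n(n - 1) ordered (target, source) pairs, so the pool of possible mixed images grows roughly with n squared even before the patch location is varied.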

Cited by
Journal Article
TL;DR: Prototype Memory uses a limited-size memory module for storing recent class prototypes and employs a set of algorithms to update it in an appropriate way, preventing prototype obsolescence.
Abstract: Face representation learning using datasets with a massive number of identities requires appropriate training methods. The softmax-based approach, currently the state of the art in face recognition, in its usual "full softmax" form is not suitable for datasets with millions of persons. Several methods based on the "sampled softmax" approach were proposed to remove this limitation. These methods, however, have a set of disadvantages. One of them is the problem of "prototype obsolescence": classifier weights (prototypes) of the rarely sampled classes receive too few gradient updates and become outdated and detached from the current encoder state, resulting in incorrect training signals. This problem is especially serious in ultra-large-scale datasets. In this paper, we propose a novel face representation learning model called Prototype Memory, which alleviates this problem and allows training on a dataset of any size. Prototype Memory consists of a limited-size memory module for storing recent class prototypes and employs a set of algorithms to update it in an appropriate way. New class prototypes are generated on the fly using exemplar embeddings in the current mini-batch. These prototypes are enqueued to the memory and used as classifier weights for softmax classification-based training. To prevent obsolescence and keep the memory closely connected to the encoder, prototypes are regularly refreshed, and the oldest ones are dequeued and disposed of. Prototype Memory is computationally efficient and independent of dataset size. It can be used with various loss functions, hard example mining algorithms and encoder architectures. We prove the effectiveness of the proposed model by extensive experiments on popular face recognition benchmarks.

1 citation

Posted Content
TL;DR: In this article, simultaneous perturbation stochastic approximation (SPSA) is used to optimize the weights of meta-training tasks and improve the performance of the meta-learning pipeline.
Abstract: Meta-learning methods aim to build learning algorithms capable of quickly adapting to new tasks in a low-data regime. One of the main benchmarks for such algorithms is the few-shot learning problem. In this paper we investigate a modification of the standard meta-learning pipeline that takes a multi-task approach during training. The proposed method simultaneously utilizes information from several meta-training tasks in a common loss function. The impact of each of these tasks on the loss function is controlled by a corresponding weight. Proper optimization of these weights can have a big influence on the training of the entire model and might improve quality on test-time tasks. In this work we propose and investigate the use of methods from the family of simultaneous perturbation stochastic approximation (SPSA) approaches for optimizing the weights of meta-training tasks. We have also compared the proposed algorithms with gradient-based methods and found that stochastic approximation demonstrates the largest quality boost at test time. The proposed multi-task modification can be applied to almost all methods that use a meta-learning pipeline. In this paper we study applications of this modification to the Prototypical Networks and Model-Agnostic Meta-Learning algorithms on the CIFAR-FS, FC100, tieredImageNet and miniImageNet few-shot learning benchmarks. In these experiments, the multi-task modification demonstrated improvement over the original methods. The proposed SPSA-Tracking algorithm shows the largest accuracy boost and is competitive with state-of-the-art meta-learning methods. Our code is available online.
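
For readers unfamiliar with SPSA, the sketch below shows one step of the generic textbook form of the method on a toy objective. It is not the paper's SPSA-Tracking algorithm and does not involve a meta-learning pipeline: the function names, the fixed `step_size` and `perturbation` values, and the quadratic `toy_loss` are assumptions made for illustration.

```python
import random


def spsa_step(loss, weights, rng, step_size=0.1, perturbation=0.01):
    """One SPSA update: estimate the gradient from two loss evaluations."""
    # Random +/-1 perturbation applied to all weights simultaneously.
    delta = [rng.choice((-1.0, 1.0)) for _ in weights]
    plus = [w + perturbation * d for w, d in zip(weights, delta)]
    minus = [w - perturbation * d for w, d in zip(weights, delta)]
    diff = loss(plus) - loss(minus)
    # Per-coordinate gradient estimate from the same two evaluations.
    grad = [diff / (2.0 * perturbation * d) for d in delta]
    return [w - step_size * g for w, g in zip(weights, grad)]


def toy_loss(weights):
    """Hypothetical stand-in for a loss that depends on task weights."""
    target = (1.0, -2.0, 0.5)
    return sum((w - t) ** 2 for w, t in zip(weights, target))


rng = random.Random(0)
weights = [0.0, 0.0, 0.0]
initial_loss = toy_loss(weights)
for _ in range(200):
    weights = spsa_step(toy_loss, weights, rng)
final_loss = toy_loss(weights)
```

The appeal of SPSA in a setting like task-weight optimization is that each step needs only two evaluations of the objective regardless of how many weights there are, and no analytic gradient with respect to the weights.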