Audio Enhancement and Synthesis using Generative Adversarial Networks: A Survey

doi:10.5120/IJCA2019918334

Citations

Posted Content

A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications

Jie Gui¹, Zhenan Sun, Yonggang Wen², Dacheng Tao³, Jieping Ye⁴

Southeast University¹, Nanyang Technological University², University of Sydney³, University of Michigan⁴

20 Jan 2020-arXiv: Learning

TL;DR: This paper reviews various GAN methods from the perspectives of algorithms, theory, and applications, and compares the commonalities and differences of these methods.

Abstract: Generative adversarial networks (GANs) have recently become a hot research topic. GANs have been widely studied since 2014, and a large number of algorithms have been proposed. However, there are few comprehensive studies explaining the connections among different GAN variants and how they have evolved. In this paper, we attempt to provide a review of various GAN methods from the perspectives of algorithms, theory, and applications. Firstly, the motivations, mathematical representations, and structures of most GAN algorithms are introduced in detail. Furthermore, GANs have been combined with other machine learning algorithms for specific applications, such as semi-supervised learning, transfer learning, and reinforcement learning. This paper compares the commonalities and differences of these GAN methods. Secondly, theoretical issues related to GANs are investigated. Thirdly, typical applications of GANs in image processing and computer vision, natural language processing, music, speech and audio, the medical field, and data science are illustrated. Finally, future open research problems for GANs are pointed out.

344 citations

Cites background from "Audio Enhancement and Synthesis usi..."

...1) GANs for specific applications: There are surveys of using GANs for specific applications such as image synthesis and editing [5], audio enhancement and synthesis [6]....

Journal Article

A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications

01 Apr 2023-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper reviews the various GAN methods from the perspectives of algorithms, theory, and applications; it introduces the motivations, mathematical representations, and structures of most GAN algorithms in detail and compares their commonalities and differences.

Abstract: Generative adversarial networks (GANs) have recently become a hot research topic; however, they have been studied since 2014, and a large number of algorithms have been proposed. Nevertheless, few comprehensive studies explain the connections among different GAN variants and how they have evolved. In this paper, we attempt to provide a review of the various GAN methods from the perspectives of algorithms, theory, and applications. First, the motivations, mathematical representations, and structures of most GAN algorithms are introduced in detail, and we compare their commonalities and differences. Second, theoretical issues related to GANs are investigated. Finally, typical applications of GANs in image processing and computer vision, natural language processing, music, speech and audio, the medical field, and data science are discussed.

77 citations

Posted Content

Video Generative Adversarial Networks: A Review.

Nuha Aldausari, Arcot Sowmya, Nadine Marcus, Gelareh Mohammadi

04 Nov 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This is among the first papers to review state-of-the-art video GAN models; it also summarizes the main improvements in GANs that were not initially developed for the video domain but have been adopted in multiple video GAN variations.

Abstract: With the increasing interest in content creation in multiple sectors such as media, education, and entertainment, there is an increasing trend in papers that use AI algorithms to generate content such as images, videos, audio, and text. Generative Adversarial Networks (GANs) are one of the promising models that synthesize data samples that are similar to real data samples. While the variations of GAN models, in general, have been covered to some extent in several survey papers, to the best of our knowledge, this is among the first survey papers that review the state-of-the-art video GAN models. This paper first categorizes GAN review papers into general GAN review papers, image GAN review papers, and special-field GAN review papers such as anomaly detection, medical imaging, or cybersecurity. The paper then summarizes the main improvements in GAN frameworks that were not initially developed for the video domain but have been adopted in multiple video GAN variations. Then, a comprehensive review of video GAN models is provided under two main divisions according to the presence or absence of a condition. The conditional models are then further grouped according to the type of condition into audio, text, video, and image. The paper concludes by highlighting the main challenges and limitations of the current video GAN models. A comprehensive list of datasets, applied loss functions, and evaluation metrics is provided in the supplementary material.

20 citations

Additional excerpts

...The field of synthesizing and enhancing audio using GANs architectures has also been reviewed [37]....

Posted Content

Generative Adversarial Networks in Human Emotion Synthesis: A Review

Noushin Hajarolasvadi¹, Miguel Arjona Ramírez, Hasan Demirel

Eastern Mediterranean University¹

28 Oct 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: A comprehensive survey of recent advances in human emotion synthesis by studying available databases, advantages, and disadvantages of the generative models along with the related training strategies considering two principal human communication modalities, namely audio and video.

Abstract: Synthesizing realistic data samples is of great value for both academic and industrial communities. Deep generative models have become an emerging topic in various research areas like computer vision and signal processing. Affective computing, a topic of broad interest in the computer vision community, has been no exception and has benefited from generative models. In fact, affective computing has seen a rapid development of generative models during the last two decades. Applications of such models include but are not limited to emotion recognition and classification, unimodal emotion synthesis, and cross-modal emotion synthesis. As a result, we conducted a review of recent advances in human emotion synthesis by studying available databases, advantages, and disadvantages of the generative models along with the related training strategies considering two principal human communication modalities, namely audio and video. In this context, facial expression synthesis, speech emotion synthesis, and audio-visual (cross-modal) emotion synthesis are reviewed extensively under different application scenarios. Finally, we discuss open research problems to push the boundaries of this research area in future work.

10 citations

Additional excerpts

...enhancement and synthesis [15], image synthesis [16], and text synthesis [17]....

Journal Article

Deep convolutional generative adversarial networks for modeling complex hydrological structures in Monte-Carlo simulation

Qiyu Chen, Zhesi Cui, Gang Liu, Zixiao Yang, Xiaogang Ma

01 May 2022-Journal of Hydrology

TL;DR: The authors propose a method, named MC-GAN, that reconstructs complex hydrological structures by using deep convolutional generative adversarial networks (DCGAN) in the Monte-Carlo simulation process.

Abstract: Characterization of complex subsurface structures is challenging due to the demand to preserve geological realism of the training images in earth and environmental sciences. In this work, we propose a novel method to reconstruct complex hydrological structures by using deep convolutional generative adversarial networks (DCGAN) in the Monte-Carlo simulation process, named MC-GAN. Network architectures for reconstructing both two-dimensional (2D) and three-dimensional (3D) complex spatial structures are provided in this method. We first exploit the robust DCGAN to reproduce abundant and various spatial pattern blocks. Then, we combine the various heterogeneous patterns to reconstruct a complex hydrological structure by using the Monte-Carlo stochastic simulation process. The method is able to represent multiple-scale spatial structures under the premise of using the same generative adversarial network architecture. It not only ensures the simulation efficiency, but also makes the heterogeneous patterns in the realizations more diverse. Three sets of training images were used to test the capability of the proposed method. The experiment results demonstrate that our method can accurately characterize complex heterogeneous spatial structures. At the same time, the trained deep learning model can be reused effectively to generate multiple-scale spatial structures.

10 citations
