Showing papers by "Dolby Laboratories published in 2020"


Proceedings Article
30 Apr 2020
TL;DR: This paper uses an estimate of input complexity to derive an efficient and parameter-free OOD score, which can be seen as a likelihood-ratio test akin to Bayesian model comparison, and finds this score to perform comparably to, or even better than, existing OOD detection approaches across a wide range of data sets, models, model sizes, and complexity estimates.
Abstract: Likelihood-based generative models are a promising resource for detecting out-of-distribution (OOD) inputs that could compromise the robustness or reliability of a machine learning system. However, likelihoods derived from such models have been shown to be problematic for detecting certain types of inputs that significantly differ from training data. In this paper, we posit that this problem is due to the excessive influence that input complexity has on generative models' likelihoods. We report a set of experiments supporting this hypothesis, and use an estimate of input complexity to derive an efficient and parameter-free OOD score, which can be seen as a likelihood-ratio test akin to Bayesian model comparison. We find this score to perform comparably to, or even better than, existing OOD detection approaches across a wide range of data sets, models, and complexity estimates.
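A minimal sketch of the proposed score, assuming a generic lossless compressor (zlib) as the complexity estimate and a pretrained likelihood-based model supplying the negative log-likelihood in bits; the model and values below are placeholders:

```python
import zlib
import numpy as np

def complexity_bits(x_bytes: bytes) -> float:
    """Estimate input complexity L(x) in bits with a lossless compressor."""
    return 8.0 * len(zlib.compress(x_bytes, 9))

def ood_score(nll_bits: float, x_bytes: bytes) -> float:
    """S(x) = -log2 p(x) - L(x): a likelihood-ratio-style OOD score.
    Larger values suggest out-of-distribution inputs."""
    return nll_bits - complexity_bits(x_bytes)

# Example with an 8-bit image; the NLL would come from a pretrained
# likelihood-based generative model (the 3.1 bits/dim here is a placeholder).
img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(ood_score(nll_bits=3.1 * img.size, x_bytes=img.tobytes()))
```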

155 citations


Proceedings ArticleDOI
04 May 2020
TL;DR: A GAN-based coded audio enhancer that operates end-to-end directly on decoded audio samples, eliminating the need to design any manually-crafted frontend and significantly improving the quality of speech and of difficult-to-code applause excerpts.
Abstract: Audio codecs are typically transform-domain based and efficiently code stationary audio signals, but they struggle with speech and with signals containing dense transient events such as applause. Specifically, with these two classes of signals as examples, we demonstrate a technique for restoring audio from coding noise based on generative adversarial networks (GAN). A primary advantage of the proposed GAN-based coded audio enhancer is that the method operates end-to-end directly on decoded audio samples, eliminating the need to design any manually-crafted frontend. Furthermore, the enhancement approach described in this paper can improve the sound quality of low-bitrate coded audio without any modifications to existing standard-compliant encoders. Subjective tests illustrate that the proposed enhancer significantly improves the quality of speech and of difficult-to-code applause excerpts.
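As a rough illustration of such an end-to-end setup, a minimal training-loss sketch; the least-squares GAN formulation and the reconstruction weighting are assumptions for illustration, not details taken from the paper:

```python
import torch
import torch.nn.functional as F

def generator_loss(G, D, decoded, reference, lambda_rec=100.0):
    """Adversarial term pushes enhanced audio toward 'real', while an L1
    term keeps it close to the clean reference; G works directly on
    decoded waveform samples, with no hand-crafted frontend."""
    enhanced = G(decoded)
    score = D(enhanced)
    adv = F.mse_loss(score, torch.ones_like(score))  # least-squares GAN
    return adv + lambda_rec * F.l1_loss(enhanced, reference)

def discriminator_loss(G, D, decoded, reference):
    """Clean reference -> 1, enhanced output -> 0 (least-squares GAN)."""
    real = D(reference)
    fake = D(G(decoded).detach())
    return F.mse_loss(real, torch.ones_like(real)) + \
           F.mse_loss(fake, torch.zeros_like(fake))
```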

31 citations


Posted Content
TL;DR: This work shows that nearest-neighbor interpolation upsamplers can be an alternative to the problematic (but state-of-the-art) transposed and subpixel convolutions, which are prone to introducing tonal artifacts.
Abstract: A number of recent advances in neural audio synthesis rely on upsampling layers, which can introduce undesired artifacts. In computer vision, upsampling artifacts have been studied and are known as checkerboard artifacts (due to their characteristic visual pattern). However, their effect has so far been overlooked in audio processing. Here, we address this gap by studying the problem from the audio signal processing perspective. We first show that the main sources of upsampling artifacts are: (i) the tonal and filtering artifacts introduced by problematic upsampling operators, and (ii) the spectral replicas that emerge while upsampling. We then compare different upsampling layers, showing that nearest-neighbor upsamplers can be an alternative to the problematic (but state-of-the-art) transposed and subpixel convolutions, which are prone to introducing tonal artifacts.
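For intuition, a minimal PyTorch sketch contrasting the two upsampler families discussed; the layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

# Transposed convolution: prone to tonal/checkerboard artifacts, notably
# when the kernel size is not a multiple of the stride.
transposed = nn.ConvTranspose1d(64, 64, kernel_size=16, stride=4, padding=6)

# Alternative studied in the paper: nearest-neighbor interpolation followed
# by a regular convolution, which avoids the tonal artifacts.
nn_upsampler = nn.Sequential(
    nn.Upsample(scale_factor=4, mode='nearest'),
    nn.Conv1d(64, 64, kernel_size=16, padding='same'),
)

x = torch.randn(1, 64, 256)  # (batch, channels, time)
print(transposed(x).shape, nn_upsampler(x).shape)  # both upsample 4x
```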

30 citations


Proceedings ArticleDOI
23 Mar 2020
TL;DR: A new video coding tool in the Versatile Video Coding (VVC) standard named luma mapping with chroma scaling (LMCS) is described, which aims at improving the coding efficiency for standard and high dynamic range video signals by making better use of the range of luma code values allowed at a specified bit depth.
Abstract: This paper describes a new video coding tool in the Versatile Video Coding (VVC) standard named luma mapping with chroma scaling (LMCS). Experimental compression performance results for LMCS and non-normative examples for deriving LMCS parameter values are also provided. LMCS has two main components: 1) a process for mapping input luma code values to a new set of code values for use inside the coding loop; and 2) a luma-dependent process for scaling chroma residual values. The first process, luma mapping, aims at improving the coding efficiency for standard and high dynamic range video signals by making better use of the range of luma code values allowed at a specified bit depth. The second process, chroma scaling, manages relative compression efficiency for the luma and chroma components of the video signal. The luma mapping process of LMCS is applied at the pixel sample level, and is implemented using a piecewise linear model. The chroma scaling process is applied at the chroma block level, and is implemented using a scaling factor derived from reconstructed neighboring luma samples of the chroma block.
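For intuition, a sketch of the first component: a piecewise-linear forward luma mapping realized as a lookup table. LMCS uses a 16-segment piecewise-linear model; the slope values below are invented for illustration and are not normative:

```python
import numpy as np

def build_luma_lut(slopes, bit_depth=10, num_segments=16):
    """Forward luma mapping: a piecewise-linear curve over the input code
    range, one slope per equal-width input segment."""
    max_val = (1 << bit_depth) - 1
    seg_len = (max_val + 1) // num_segments
    lut = np.empty(max_val + 1)
    y = 0.0
    for s in range(num_segments):
        xs = np.arange(seg_len)
        lut[s*seg_len:(s+1)*seg_len] = y + slopes[s] * xs
        y += slopes[s] * seg_len
    return np.clip(np.round(lut), 0, max_val).astype(np.uint16)

# Hypothetical parameter choice: expand mid-tones, compress the extremes.
slopes = [0.6]*4 + [1.3]*8 + [0.6]*4
lut = build_luma_lut(slopes)
mapped = lut[np.array([64, 512, 960])]  # map sample luma code values
```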

23 citations


Posted Content
TL;DR: An empirical study of Conv-TasNet is conducted, and an enhancement to the encoder/decoder based on a (deep) non-linear variant of it is proposed that can improve average SI-SNR performance by more than 1 dB.
Abstract: Conv-TasNet is a recently proposed waveform-based deep neural network that achieves state-of-the-art performance in speech source separation. Its architecture consists of a learnable encoder/decoder and a separator that operates on top of this learned space. Various improvements have been proposed to Conv-TasNet. However, they mostly focus on the separator, leaving its encoder/decoder as a (shallow) linear operator. In this paper, we conduct an empirical study of Conv-TasNet and propose an enhancement to the encoder/decoder that is based on a (deep) non-linear variant of it. In addition, we experiment with the larger and more diverse LibriTTS dataset to investigate the generalization capabilities of the studied models when trained on much more data. We propose a cross-dataset evaluation that includes assessing separations from the WSJ0-2mix, LibriTTS and VCTK databases. Our results show that enhancements to the encoder/decoder can improve average SI-SNR performance by more than 1 dB. Furthermore, we offer insights into the generalization capabilities of Conv-TasNet and the potential value of improvements to the encoder/decoder.
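Since the results are reported in SI-SNR, a small reference implementation of that metric may help; this is the standard definition, not code from the paper:

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio (dB), the separation metric
    reported for Conv-TasNet. Both inputs are 1-D waveforms."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to remove scale dependence.
    s_target = (np.dot(estimate, target) / (np.dot(target, target) + eps)) * target
    e_noise = estimate - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))

# A noisy copy of the target scores a finite value; the target itself, very high.
t = np.random.randn(16000)
print(si_snr(t + 0.1 * np.random.randn(16000), t))
```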

22 citations


Journal ArticleDOI
TL;DR: The goal of this paper is to provide an overview of the CfP responses for the HDR/WCG category, and a description of the specific HDR/WCG technologies submitted to the CfP.
Abstract: The ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group issued a Call for Proposals (CfP) on video compression with capability beyond HEVC in October 2017. The CfP considered three categories of content – Standard Dynamic Range, High Dynamic Range and Wide Colour Gamut (HDR/WCG), and 360° Omni-directional video. As a result of the CfP process, the development of a new video coding standard, named Versatile Video Coding (VVC), was initiated. The goal of this paper is to provide an overview of the CfP responses for the HDR/WCG category. The paper includes a summary of work leading to the development of the CfP, a presentation of the CfP results for the HDR/WCG category, and a description of the specific HDR/WCG technologies submitted to the CfP.

20 citations


Posted Content
TL;DR: This work tackles automatic speech quality assessment with a semi-supervised learning approach, combining available annotations with programmatically generated data, and using 3 different optimization criteria together with 5 complementary auxiliary tasks.
Abstract: Automatic speech quality assessment is an important, transversal task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen recording conditions, and a lack of flexibility of existing approaches. In this work, we tackle these problems with a semi-supervised learning approach, combining available annotations with programmatically generated data, and using 3 different optimization criteria together with 5 complementary auxiliary tasks. Our results show that such a semi-supervised approach can cut the error of existing methods by more than 36%, while providing additional benefits in terms of reusable features or auxiliary outputs. Improvement is further corroborated with an out-of-sample test showing promising generalization capabilities.
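A minimal sketch of how several optimization criteria and auxiliary tasks can be combined into one objective; the task names and weighting below are placeholders, not the paper's actual criteria:

```python
import torch.nn.functional as F

def multitask_quality_loss(outputs, targets, aux_weight=0.1):
    """Main quality-score regression plus auxiliary heads. `outputs` and
    `targets` are dicts of tensors; 'mos' is the main quality target and the
    remaining keys are hypothetical auxiliary tasks (e.g. SNR estimation,
    degradation classification)."""
    main = F.mse_loss(outputs['mos'], targets['mos'])
    aux = sum(F.mse_loss(outputs[k], targets[k]) for k in outputs if k != 'mos')
    return main + aux_weight * aux
```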

19 citations


Proceedings ArticleDOI
04 May 2020
TL;DR: MOVE is presented, a musically-motivated method for accurate and scalable version identification that achieves state-of-the-art performance on two publicly-available benchmark sets by learning scalable embeddings in a Euclidean distance space, using a triplet loss and a hard triplet mining strategy.
Abstract: The version identification (VI) task deals with the automatic detection of recordings that correspond to the same underlying musical piece. Despite many efforts, VI is still an open problem, with much room for improvement, especially with regard to combining accuracy and scalability. In this paper, we present MOVE, a musically-motivated method for accurate and scalable version identification. MOVE achieves state-of-the-art performance on two publicly-available benchmark sets by learning scalable embeddings in a Euclidean distance space, using a triplet loss and a hard triplet mining strategy. It improves over previous work by employing an alternative input representation, and introducing a novel technique for temporal content summarization, a standardized latent space, and a data augmentation strategy specifically designed for VI. In addition to the main results, we perform an ablation study to highlight the importance of our design choices, and study the relation between embedding dimensionality and model performance.
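For reference, a compact version of the loss ingredients named in the abstract: a triplet loss with batch-hard mining in Euclidean space. The mining and margin details are a common formulation, not necessarily MOVE's exact ones:

```python
import torch
import torch.nn.functional as F

def hard_triplet_loss(embeddings, labels, margin=1.0):
    """Batch-hard triplet loss: for each anchor, take the farthest positive
    and the closest negative in Euclidean distance."""
    dist = torch.cdist(embeddings, embeddings)       # pairwise distances
    same = labels[:, None] == labels[None, :]
    pos = dist.masked_fill(~same, float('-inf')).max(dim=1).values
    neg = dist.masked_fill(same, float('inf')).min(dim=1).values
    return F.relu(pos - neg + margin).mean()

# Usage: 8 recordings, 4 underlying pieces (two versions each).
emb = torch.randn(8, 256)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(hard_triplet_loss(emb, labels))
```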

17 citations


Posted Content
TL;DR: This work demonstrates the first approach to learning multitrack mixing conventions from real-world data at the waveform level, without knowledge of the underlying mixing parameters, and generates mixes that outperform baseline approaches.
Abstract: Applications of deep learning to automatic multitrack mixing are largely unexplored. This is partly due to the limited available data, coupled with the fact that such data is relatively unstructured and variable. To address these challenges, we propose a domain-inspired model with a strong inductive bias for the mixing task. We achieve this with the application of pre-trained sub-networks and weight sharing, as well as with a sum/difference stereo loss function. The proposed model can be trained with a limited number of examples, is permutation invariant with respect to the input ordering, and places no limit on the number of input sources. Furthermore, it produces human-readable mixing parameters, allowing users to manually adjust or refine the generated mix. Results from a perceptual evaluation involving audio engineers indicate that our approach generates mixes that outperform baseline approaches. To the best of our knowledge, this work demonstrates the first approach to learning multitrack mixing conventions from real-world data at the waveform level, without knowledge of the underlying mixing parameters.
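The sum/difference stereo loss mentioned above can be sketched in a few lines; the choice of L1 distance on raw waveforms is an assumption for illustration:

```python
import torch

def sum_diff_stereo_loss(pred, target, loss_fn=torch.nn.functional.l1_loss):
    """Sum/difference (mid/side) stereo loss: compare mixes in the mid and
    side domains so that level and stereo-width errors are both penalized.
    pred/target: (batch, 2, samples) stereo waveforms."""
    pred_mid, pred_side = pred[:, 0] + pred[:, 1], pred[:, 0] - pred[:, 1]
    targ_mid, targ_side = target[:, 0] + target[:, 1], target[:, 0] - target[:, 1]
    return loss_fn(pred_mid, targ_mid) + loss_fn(pred_side, targ_side)

pred, target = torch.randn(4, 2, 44100), torch.randn(4, 2, 44100)
print(sum_diff_stereo_loss(pred, target))
```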

14 citations


Journal ArticleDOI
TL;DR: A response to the joint CfP that considers all three categories of video content, with a core codec designed based on the joint exploration model (JEM) reference software.
Abstract: The ITU-T Video Coding Experts Group (VCEG) and ISO/IEC Moving Picture Experts Group (MPEG) issued in October 2017 a joint Call for Proposals (CfP) on video compression with capability beyond HEVC. The joint CfP included three categories of content: standard dynamic range (SDR), high dynamic range and wide color gamut (HDR/WCG), and 360° omni-directional video (360°). This paper describes a response to the joint CfP that considers all three categories of video content. The core codec in the response is designed based on the joint exploration model (JEM) reference software. The key coding tools in the JEM are significantly simplified to reduce both average and worst-case complexity for hardware design with negligible coding performance loss. Furthermore, two additional coding tools are used to further improve coding efficiency. For the HDR and 360° categories, additional coding tools specifically designed to optimize the compression efficiency and subjective quality of that specific content category are included. Further, some SDR coding tools are modified to alleviate subjective quality problems. For the random access configuration, compared to the HEVC test model (HM) anchor, the proposed video codec achieves average luma rate savings of 35.7%, 31.3%, and 33.9% for the SDR, HDR, and 360° categories, respectively.

13 citations


Proceedings ArticleDOI
11 Oct 2020
TL;DR: This work proposes a comparative study of different input representations of melodic or harmonic characteristics of songs, and shows that systems combining melodic and harmonic features drastically outperform those relying on a single input representation.
Abstract: Recent works have addressed the automatic cover detection problem from a metric learning perspective. They employ different input representations, aiming to exploit melodic or harmonic characteristics of songs and yield promising performances. In this work, we propose a comparative study of these different representations and show that systems combining melodic and harmonic features drastically outperform those relying on a single input representation. We illustrate how these features complement each other with both quantitative and qualitative analyses. We finally investigate various fusion schemes and propose methods yielding state-of-the-art performances on two publicly-available large datasets.
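A minimal sketch of one such late-fusion scheme, blending normalized pairwise distance matrices from a melodic and a harmonic system; the normalization and weighting are illustrative choices, not necessarily the paper's best-performing scheme:

```python
import numpy as np

def fuse_distances(d_melodic, d_harmonic, alpha=0.5):
    """Late fusion of two cover-detection systems: min-max normalize each
    pairwise distance matrix, then blend them."""
    def norm(d):
        return (d - d.min()) / (d.max() - d.min() + 1e-12)
    return alpha * norm(d_melodic) + (1 - alpha) * norm(d_harmonic)

# d_fused = fuse_distances(d_mel, d_harm)  # then rank candidates by distance
```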

Proceedings ArticleDOI
01 Nov 2020
TL;DR: In this paper, a joint source-channel rate-distortion (RD) optimization for real-time video transmission is proposed, where the video compression and forward error correction (FEC) options are optimized by looking for the best trade-off between the estimated end-to-end distortion of a video packet and the sum of the number of source bits and FEC bits used to encode that packet.
Abstract: This paper proposes a joint source-channel rate-distortion (RD) optimization for real-time video transmission. The video compression and forward error correction (FEC) options are optimized by looking for the best trade-off between the estimated end-to-end distortion of a video packet and the sum of the number of source bits and FEC bits used to encode that packet. Video coding options include the coding mode and quantization parameter, which are selected for each macroblock. Channel coding options consist of different FEC code rates that provide different levels of protection against the lossy channel. The proposed RD technique adjusts the bit rate to meet a target using a Lagrange multiplier approach. The encoder also uses instantaneous channel state information to improve performance for a varying channel. Conventional RD optimization approaches optimize over the video coding modes only; our approach, which also considers the channel and FEC bits, performs better over both AWGN and Rayleigh fading channels. We also consider an approach to reduce the computational complexity of the proposed RD scheme.
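The core decision rule described, trading estimated end-to-end distortion against source-plus-FEC bits via a Lagrange multiplier, can be sketched as follows; the candidate fields and numbers are hypothetical:

```python
def joint_rd_decision(options, lmbda):
    """Pick the (coding mode, QP, FEC rate) option minimizing the Lagrangian
    J = D_end_to_end + lambda * (R_source + R_fec)."""
    return min(options,
               key=lambda o: o['distortion'] + lmbda * (o['src_bits'] + o['fec_bits']))

# Example: two candidate encodings of one macroblock's packet.
candidates = [
    {'mode': 'inter', 'qp': 30, 'fec_rate': '2/3',
     'distortion': 12.0, 'src_bits': 900, 'fec_bits': 450},
    {'mode': 'intra', 'qp': 34, 'fec_rate': '1/2',
     'distortion': 9.5, 'src_bits': 1200, 'fec_bits': 1200},
]
best = joint_rd_decision(candidates, lmbda=0.004)
```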

Patent
Peng Yin, Taoran Lu, Fangjun Pu, Tao Chen, Walter J. Husak
21 May 2020
TL;DR: In this paper, a decoder parses sequence parameter set (SPS) data from an input coded bitstream to detect that an HDR extension syntax structure is present in the parsed SPS data.
Abstract: In a method to improve the coding efficiency of high-dynamic-range (HDR) images, a decoder parses sequence parameter set (SPS) data from an input coded bitstream to detect that an HDR extension syntax structure is present in the parsed SPS data. It extracts from the HDR extension syntax structure post-processing information that includes one or more of a color space enabled flag, a color enhancement enabled flag, an adaptive reshaping enabled flag, a dynamic range conversion flag, a color correction enabled flag, or an SDR viewable flag. It decodes the input bitstream to generate a preliminary output decoded signal, and generates a second output signal based on the preliminary output signal and the post-processing information.

Journal ArticleDOI
TL;DR: A highly integrated architecture, built around a 7-tap edge-aware selective sparse filter, that can detect and alleviate banding artifacts while preserving true edges and details in an extremely efficient way.
Abstract: When a low or standard dynamic range video is inverse tone mapped to high dynamic range (HDR), there can be banding artifacts in the output HDR video. We design a highly integrated architecture that can detect and alleviate banding artifacts while preserving true edges and details in an extremely efficient way. This is achieved by a 7-tap edge-aware selective sparse filter. Coding artifacts, such as blocking artifacts, can also be reduced by this filter. The filter includes some parameters that depend on the strength of the banding artifacts. A parameter selection mechanism is presented which considers the smoothness of the banding regions and the fidelity of the filtering output. The filter yields significant PSNR gains in regions with artifacts. Subjective tests demonstrate the substantial quality improvement achieved by the proposed filter compared to the quality before filtering. The visual quality provided by the filter is better than or similar to that of algorithms which are far more complex.
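A rough one-dimensional sketch of the idea: widely spaced taps with edge-aware selectivity. The tap spacing, threshold test, and fallback behavior below are assumptions for illustration, not the paper's exact design:

```python
import numpy as np

def sparse_debanding_filter(row, spacing=8, threshold=16):
    """Illustrative 7-tap edge-aware selective sparse filter along one image
    row: taps are widely spaced so they span a band, and averaging is skipped
    (edge-aware selectivity) when any tap deviates strongly from the center."""
    out = row.astype(np.float32).copy()
    offsets = np.arange(-3, 4) * spacing            # 7 sparse taps
    for i in range(3 * spacing, len(row) - 3 * spacing):
        taps = row[i + offsets].astype(np.float32)
        if np.all(np.abs(taps - row[i]) < threshold):  # smooth region only
            out[i] = taps.mean()
    return out

row = np.repeat(np.arange(0, 64, 4), 40).astype(np.uint8)  # banded ramp
smooth = sparse_debanding_filter(row)
```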

Posted Content
TL;DR: This work uses SampleRNN as the generative model and demonstrates that the proposed coding structure provides performance competitive with state-of-the-art source coding tools for specific categories of audio signals.
Abstract: We consider source coding of audio signals with the help of a generative model. We use a construction where a waveform is first quantized, yielding a finite bitrate representation. The waveform is then reconstructed by random sampling from a model conditioned on the quantized waveform. The proposed coding scheme is theoretically analyzed. Using SampleRNN as the generative model, we demonstrate that the proposed coding structure provides performance competitive with state-of-the-art source coding tools for specific categories of audio signals.
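A sketch of the first stage, quantizing the waveform to a finite bitrate representation, using mu-law companding as an illustrative stand-in (the paper's quantizer design may differ). The decoder would then sample from a generative model conditioned on this representation rather than simply expanding the codes:

```python
import numpy as np

def mu_law_quantize(x, bits=8, mu=255.0):
    """Map a waveform in [-1, 1] to integer codes (finite bitrate)."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.round((y + 1.0) * (2**bits - 1) / 2.0).astype(np.int32)

def mu_law_expand(q, bits=8, mu=255.0):
    """Deterministic expansion of the codes; in the proposed scheme, a
    conditional generative model (SampleRNN) samples the reconstruction
    conditioned on the quantized waveform instead."""
    y = 2.0 * q / (2**bits - 1) - 1.0
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

codes = mu_law_quantize(np.sin(np.linspace(0, 20, 16000)))
coarse = mu_law_expand(codes)
```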

Proceedings ArticleDOI
11 Oct 2020
TL;DR: This work proposes to further narrow the gap between accuracy and scalability in version identification systems by employing a set of data distillation techniques that reduce the embedding dimensionality of a pre-trained state-of-the-art model.
Abstract: Version identification systems aim to detect different renditions of the same underlying musical composition (loosely called cover songs). By learning to encode entire recordings into plain vector embeddings, recent systems have made significant progress in bridging the gap between accuracy and scalability, which has been a key challenge for nearly two decades. In this work, we propose to further narrow this gap by employing a set of data distillation techniques that reduce the embedding dimensionality of a pre-trained state-of-the-art model. We compare a wide range of techniques and propose new ones, from classical dimensionality reduction to more sophisticated distillation schemes. With those, we obtain 99% smaller embeddings that, moreover, yield up to a 3% accuracy increase. Such small embeddings can have an important impact on retrieval time, up to the point of making a real-world system practical on a standalone laptop.
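As one concrete instance of the classical end of the techniques compared, PCA can shrink a pre-trained model's embeddings before indexing; the dimensions below are placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

embeddings = np.random.randn(10000, 512)  # stand-in for pre-trained embeddings

# Classical dimensionality reduction: keep a ~98% smaller space.
pca = PCA(n_components=8).fit(embeddings)
distilled = pca.transform(embeddings)

# Version retrieval then runs on the small embeddings.
index = NearestNeighbors(metric='euclidean').fit(distilled)
distances, neighbors = index.kneighbors(distilled[:1], n_neighbors=5)
```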

Book ChapterDOI
TL;DR: In this paper, the authors propose a noise generation (offline) and noise injection (online) method that adapts the magnitude and structure of the noise pattern based on the luma of the quantized pixel and the slope of the inverse tone-mapping function.
Abstract: High Dynamic Range (HDR) imaging is gaining increased attention due to its realistic content, not only for regular displays but also for smartphones. Until sufficient HDR content is distributed, HDR visualization still relies mostly on converting Standard Dynamic Range (SDR) content. SDR images are often quantized, or bit-depth reduced, before SDR-to-HDR conversion, e.g. for video transmission. Quantization can easily lead to banding artifacts. In computing- and/or memory-I/O-limited environments, the traditional solution using spatial neighborhood information is not feasible. Our method includes noise generation (offline) and noise injection (online), and operates on pixels of the quantized image. We vary the magnitude and structure of the noise pattern adaptively based on the luma of the quantized pixel and the slope of the inverse tone-mapping function. Subjective user evaluations confirm the superior performance of our technique.
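An illustrative sketch of the online stage, assuming a pre-generated noise pattern and a per-pixel slope map of the inverse tone-mapping (ITM) curve; the scaling law is an assumption, not the chapter's exact formulation:

```python
import numpy as np

def inject_debanding_noise(quantized, itm_slope, base_amplitude=0.5):
    """Pixelwise noise injection before inverse tone mapping: offline noise
    is scaled per pixel by the ITM slope at that pixel's luma, so regions
    that would band most strongly receive the strongest dither."""
    noise = np.random.uniform(-1, 1, size=quantized.shape)  # offline pattern
    return quantized + base_amplitude * itm_slope * noise

# Example: an 8-bit luma plane and a hypothetical per-pixel slope map.
luma = np.random.randint(0, 256, (4, 6)).astype(np.float32)
slope = np.ones_like(luma) * 1.5
dithered = inject_debanding_noise(luma, slope)
```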

Patent
12 May 2020
TL;DR: In this paper, a target view to a 3D scene depicted by a multiview image is used to select sampled views, and a texture image and a depth image for each selected sampled view are encoded into a multiview video signal.
Abstract: A target view to a 3D scene depicted by a multiview image is determined. The multiview image comprises multiple sampled views. Each sampled view comprises multiple texture images and multiple depth images in multiple image layers. The target view is used to select, from the multiple sampled views of the multiview image, sampled views. A texture image and a depth image for each sampled view in the selected sampled views are encoded into a multiview video signal to be transmitted to a downstream device.

Patent
02 Jan 2020
TL;DR: In this article, a tone-mapping transfer function based on third-order Hermite splines is presented, composed of two spline polynomials determined using three anchor points and three slopes.
Abstract: Methods for mapping an image from a first dynamic range to a second dynamic range are presented. The mapping is based on a function that includes two spline polynomials determined using three anchor points and three slopes. The first anchor point is determined using the black point levels of the input and target output, the second anchor point is determined using the white point levels of the input and target output, and the third anchor point is determined using mid-tones information data for the input and target output. The mid-tones level of the target output is computed adaptively based on an ideal one-to-one mapping and by preserving input contrast in both the blacks and the highlights. An example tone-mapping transfer function based on third-order (cubic) Hermite splines is presented.
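A sketch of the described construction: two cubic Hermite segments joined at a mid-tones anchor, built from three anchor points and three slopes. The Hermite basis is standard; the anchor and slope values below are hypothetical:

```python
import numpy as np

def hermite_segment(x, x0, x1, y0, y1, m0, m1):
    """Evaluate a cubic Hermite spline segment between anchors (x0, y0)
    and (x1, y1) with endpoint slopes m0, m1."""
    h = x1 - x0
    t = (x - x0) / h
    h00 = 2*t**3 - 3*t**2 + 1
    h10 = t**3 - 2*t**2 + t
    h01 = -2*t**3 + 3*t**2
    h11 = t**3 - t**2
    return h00*y0 + h10*h*m0 + h01*y1 + h11*h*m1

def tone_map(x, anchors, slopes):
    """Two-piece tone curve from three anchors (black, mid, white) and
    three slopes, following the patent's construction."""
    (xb, yb), (xm, ym), (xw, yw) = anchors
    mb, mm, mw = slopes
    return np.where(x < xm,
                    hermite_segment(x, xb, xm, yb, ym, mb, mm),
                    hermite_segment(x, xm, xw, ym, yw, mm, mw))

# Map 0.0-1.0 input to a narrower target range (illustrative numbers).
x = np.linspace(0, 1, 5)
y = tone_map(x, anchors=[(0, 0.01), (0.5, 0.4), (1, 0.9)], slopes=[0.2, 1.2, 0.3])
```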

Patent
13 May 2020
TL;DR: In this paper, a method for rendering audio input including divergence metadata for playback in a playback environment comprises creating two additional audio objects associated with the audio object such that respective locations of the two additional objects are evenly spaced from the location of the audio objects, on opposite sides of the location when seen from an intended listener's position in the playback environment.
Abstract: The present document relates to methods and apparatus for rendering input audio for playback in a playback environment. The input audio includes at least one audio object and associated metadata, and the associated metadata indicates at least a location of the audio object. A method for rendering input audio including divergence metadata for playback in a playback environment comprises creating two additional audio objects associated with the audio object such that respective locations of the two additional audio objects are evenly spaced from the location of the audio object, on opposite sides of the location of the audio object when seen from an intended listener's position in the playback environment, determining respective weight factors for application to the audio en.) object and the two additional audio objects, and rendering the audio object and the two additional audio objects to one or more speaker feeds in accordance with the determined weight factors, The present document further relates to methods and apparatus for rendering audio input including extent metadata and/or diffuseness metadata for playback in a playback environment.
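For illustration only, one plausible power-preserving choice of the three weight factors as a function of a divergence value in [0, 1]; the patent's actual weight derivation may differ:

```python
import numpy as np

def divergence_gains(divergence: float):
    """Split one object's energy between the original object and the two
    additional side objects. divergence=0 keeps everything in the original
    object; divergence=1 moves all energy to the two sides."""
    center = np.sqrt(1.0 - divergence)
    side = np.sqrt(divergence / 2.0)
    return center, side, side  # center**2 + 2*side**2 == 1 (power preserved)
```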

Patent
07 Feb 2020
TL;DR: In this article, a standard dynamic range (SDR) image is received and composer metadata is generated for mapping the SDR image to an enhanced dynamic range image, which is encoded in an output SDR video signal.
Abstract: A standard dynamic range (SDR) image is received. Composer metadata is generated for mapping the SDR image to an enhanced dynamic range (EDR) image. The composer metadata specifies a backward reshaping mapping that is generated from SDR-EDR image pairs in a training database. The SDR-EDR image pairs comprise SDR images that do not include the SDR image and EDR images that correspond to the SDR images. The SDR image and the composer metadata are encoded in an output SDR video signal. An EDR display operating with a receiver of the output SDR video signal is caused to render an EDR display image. The EDR display image is derived from a composed EDR image composed from the SDR image based on the composer metadata.

Proceedings ArticleDOI
16 Apr 2020
TL;DR: This paper presents the commercial deployment in thousands of remote small communities, describes the unique experience of maintaining this infrastructure, and presents an extension of the operations support system (OSS) leveraging advanced analytics and machine learning with the goal of optimizing network maintenance while reducing costs.
Abstract: The Internet Para Todos program is working to provide sustainable mobile broadband to 100 M unconnected people in Latin America. In this paper we present our commercial deployment in thousands of remote small communities and describe the unique experience of maintaining this infrastructure. We describe the challenges of managing operations while containing costs in these extreme geographical conditions. We also analyze operational data to understand outage patterns and present typical operational issues in this unique remote community environment. Finally, we present an extension of the operations support system (OSS) leveraging advanced analytics and machine learning with the goal of optimizing network maintenance while reducing costs.

Patent
13 Oct 2020
TL;DR: In this article, a speech synthesizer is trained to generate synthesized audio data that corresponds to words uttered by a source speaker according to speech characteristics of a target speaker using time-stamped phoneme sequences, pitch contour data and speaker identification data.
Abstract: Computer-implemented methods for speech synthesis are provided. A speech synthesizer may be trained to generate synthesized audio data that corresponds to words uttered by a source speaker according to speech characteristics of a target speaker. The speech synthesizer may be trained on time-stamped phoneme sequences, pitch contour data, and speaker identification data. The speech synthesizer may include a voice modeling neural network and a conditioning neural network.

Patent
15 May 2020
TL;DR: In this paper, a beam-steering modulator, an amplitude modulator and a controller are used to project a high-quality version of the image described by the image data.
Abstract: A novel high-efficiency image projection system includes a beam-steering modulator, an amplitude modulator, and a controller. In a particular embodiment, the controller generates beam-steering drive values from image data and uses the beam-steering drive values to drive the beam-steering modulator. Additionally, the controller utilizes the beam-steering drive values to generate a lightfield simulation of a lightfield projected onto the amplitude modulator by the beam-steering modulator. The controller utilizes the lightfield simulation to generate amplitude drive values for driving the amplitude modulator in order to project a high-quality version of the image described by the image data.

Journal ArticleDOI
TL;DR: This work proposes a framework to combine various quality metrics using a full reference approach for High Dynamic Range (HDR) image quality assessment (IQA), and uses the back-tracking-based Sequential Floating Forward Selection technique during training to include a subset of metrics from a list of quality metrics in the model.
Abstract: We propose a framework to combine various quality metrics using a full reference approach for High Dynamic Range (HDR) Image quality assessment (IQA). We combine scores from metrics exclusively designed for different applications such as HDR, Standard Dynamic Range (SDR) and color difference measures, in a non-linear manner using machine learning (ML) approaches with weights determined during an offline training process. We explore various ML techniques and find that support vector machine regression and gradient boosting regression trees are effective. To improve performance and reduce complexity, we use the back-tracking based Sequential Floating Forward Selection technique during training to include a subset of metrics from a list of quality metrics in our model. We evaluate the performance on five publicly available calibrated HDR databases with different types of distortion (including different types of compression, Gaussian noise, gamut mismatch, chromatic distortions and so on) and demonstrate improved performance using our method as compared to several existing IQA metrics. We perform extensive statistical analysis to demonstrate significant improvement over existing approaches and show the generality and robustness of our approach using cross-database validation.
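A minimal sketch of the non-linear combination step with one of the learners the paper finds effective (gradient boosting regression trees); the component-metric scores and subjective targets below are random stand-ins, and the metric-subset selection via Sequential Floating Forward Selection is not shown:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical training data: per-image scores from six component quality
# metrics (HDR, SDR and color-difference measures) plus subjective scores.
metric_scores = np.random.rand(200, 6)
subjective = np.random.rand(200)

# Non-linear combination learned offline; the paper also evaluates SVR.
model = GradientBoostingRegressor().fit(metric_scores, subjective)
predicted_quality = model.predict(metric_scores[:5])
```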

Patent
25 Mar 2020
TL;DR: In this paper, a teleconference among a plurality of client devices arranged in two or more acoustic spaces, each client device having an audio capturing capability and/or an audio rendering capability, is described.
Abstract: Described is a method of hosting a teleconference among a plurality of client devices arranged in two or more acoustic spaces, each client device having an audio capturing capability and/or an audio rendering capability, the method comprising: grouping the plurality of client devices into two or more groups based on their belonging to respective acoustic spaces, receiving first audio streams from the plurality of client devices, generating second audio streams from the first audio streams for rendering by respective client devices among the plurality of client devices, based on the grouping of the plurality of client devices into the two or more groups, and outputting the generated second audio streams to respective client devices. Further described are corresponding computation devices, computer programs, and computer-readable storage media.

Patent
21 Apr 2020
TL;DR: In this paper, a method of processing a sequence of video frames from a camera capturing a writing surface for subsequent transmission to at least one of a remote videoconferencing client and a remote server is presented.
Abstract: A method of processing a sequence of video frames from a camera capturing a writing surface for subsequent transmission to at least one of a remote videoconferencing client and a remote videoconferencing server. The method comprises receiving the sequence of video frames from the camera; and selecting an image area of interest in the video frames, comprising selecting one of a sub-area of the video frames and an entire area of the video frames. The method also comprises, for each current video frame of the sequence of video frames, generating a pen stroke mask by applying adaptive thresholding to the image area of interest. The method also comprises generating an output video frame using the pen stroke mask. Corresponding systems and computer readable media are disclosed.
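A minimal sketch of the adaptive-thresholding step using OpenCV; the block size and offset C are illustrative tuning values, not figures from the patent:

```python
import cv2

def pen_stroke_mask(frame_bgr, block_size=21, C=10):
    """Generate a pen-stroke mask from a whiteboard frame using adaptive
    thresholding: each pixel is compared against a local Gaussian-weighted
    mean, which is robust to uneven lighting on the writing surface."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Dark strokes on a bright surface -> inverted binary mask.
    return cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY_INV, block_size, C)

# mask = pen_stroke_mask(cv2.imread('frame.png'))
```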

Patent
04 Aug 2020
TL;DR: In this paper, a loopback device connects between a first source device and a sink device on a first connection; a second source device connects to the sink devices on a second connection.
Abstract: An apparatus, method and system for connecting High-Definition Multimedia Interface (HDMI) devices. A loopback device connects between a first source device and a sink device on a first connection; a second source device connects to the sink device on a second connection. The loopback device manages the first connection, passes transition-minimized differential signaling (TMDS) or fixed-rate link (FRL) signals through to the sink device, and outputs audio received from the sink device on the audio return channel (ARC) or enhanced audio return channel (eARC). In this manner, audio that originates from any source device may be output without requiring a direct connection to the loopback device.