Showing papers by "Dolby Laboratories published in 2020"


Proceedings Article
30 Apr 2020
TL;DR: This paper uses an estimate of input complexity to derive an efficient and parameter-free OOD score, which can be seen as a likelihood-ratio test akin to Bayesian model comparison, and finds this score to perform comparably to, or even better than, existing OOD detection approaches across a wide range of data sets, models, model sizes, and complexity estimates.
Abstract: Likelihood-based generative models are a promising resource for detecting out-of-distribution (OOD) inputs that could compromise the robustness or reliability of a machine learning system. However, likelihoods derived from such models have been shown to be problematic for detecting certain types of inputs that significantly differ from training data. In this paper, we posit that this problem is due to the excessive influence that input complexity has on generative models' likelihoods. We report a set of experiments supporting this hypothesis, and use an estimate of input complexity to derive an efficient and parameter-free OOD score, which can be seen as a likelihood-ratio test akin to Bayesian model comparison. We find this score to perform comparably to, or even better than, existing OOD detection approaches across a wide range of data sets, models, and complexity estimates.
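A minimal sketch of the proposed score, assuming a generic lossless compressor (zlib) as the complexity estimate and a pretrained likelihood-based model supplying the negative log-likelihood in bits; the model and values below are placeholders:

```python
import zlib
import numpy as np

def complexity_bits(x_bytes: bytes) -> float:
    """Estimate input complexity L(x) in bits with a lossless compressor."""
    return 8.0 * len(zlib.compress(x_bytes, 9))

def ood_score(nll_bits: float, x_bytes: bytes) -> float:
    """S(x) = -log2 p(x) - L(x): a likelihood-ratio-style OOD score.
    Larger values suggest out-of-distribution inputs."""
    return nll_bits - complexity_bits(x_bytes)

# Example with an 8-bit image; the NLL would come from a pretrained
# likelihood-based generative model (the 3.1 bits/dim here is a placeholder).
img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(ood_score(nll_bits=3.1 * img.size, x_bytes=img.tobytes()))
```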

155 citations


Proceedings ArticleDOI
04 May 2020
TL;DR: A GAN-based coded audio enhancer that operates end-to-end directly on decoded audio samples, eliminating the need to design any manually-crafted frontend and significantly improving the quality of speech and of difficult-to-code applause excerpts.
Abstract: Audio codecs are typically transform-domain based and efficiently code stationary audio signals, but they struggle with speech and with signals containing dense transient events such as applause. Specifically, with these two classes of signals as examples, we demonstrate a technique for restoring audio from coding noise based on generative adversarial networks (GAN). A primary advantage of the proposed GAN-based coded audio enhancer is that the method operates end-to-end directly on decoded audio samples, eliminating the need to design any manually-crafted frontend. Furthermore, the enhancement approach described in this paper can improve the sound quality of low-bitrate coded audio without any modifications to existing standard-compliant encoders. Subjective tests illustrate that the proposed enhancer significantly improves the quality of speech and of difficult-to-code applause excerpts.
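As a rough illustration of such an end-to-end setup, a minimal training-loss sketch; the least-squares GAN formulation and the reconstruction weighting are assumptions for illustration, not details taken from the paper:

```python
import torch
import torch.nn.functional as F

def generator_loss(G, D, decoded, reference, lambda_rec=100.0):
    """Adversarial term pushes enhanced audio toward 'real', while an L1
    term keeps it close to the clean reference; G works directly on
    decoded waveform samples, with no hand-crafted frontend."""
    enhanced = G(decoded)
    score = D(enhanced)
    adv = F.mse_loss(score, torch.ones_like(score))  # least-squares GAN
    return adv + lambda_rec * F.l1_loss(enhanced, reference)

def discriminator_loss(G, D, decoded, reference):
    """Clean reference -> 1, enhanced output -> 0 (least-squares GAN)."""
    real = D(reference)
    fake = D(G(decoded).detach())
    return F.mse_loss(real, torch.ones_like(real)) + \
           F.mse_loss(fake, torch.zeros_like(fake))
```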

31 citations


Posted Content
TL;DR: This work shows that nearest-neighbor interpolation upsamplers can be an alternative to the problematic (but state-of-the-art) transposed and subpixel convolutions, which are prone to introducing tonal artifacts.
Abstract: A number of recent advances in neural audio synthesis rely on upsampling layers, which can introduce undesired artifacts. In computer vision, upsampling artifacts have been studied and are known as checkerboard artifacts (due to their characteristic visual pattern). However, their effect has so far been overlooked in audio processing. Here, we address this gap by studying the problem from the audio signal processing perspective. We first show that the main sources of upsampling artifacts are: (i) the tonal and filtering artifacts introduced by problematic upsampling operators, and (ii) the spectral replicas that emerge while upsampling. We then compare different upsampling layers, showing that nearest-neighbor upsamplers can be an alternative to the problematic (but state-of-the-art) transposed and subpixel convolutions, which are prone to introducing tonal artifacts.
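For intuition, a minimal PyTorch sketch contrasting the two upsampler families discussed; the layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

# Transposed convolution: prone to tonal/checkerboard artifacts, notably
# when the kernel size is not a multiple of the stride.
transposed = nn.ConvTranspose1d(64, 64, kernel_size=16, stride=4, padding=6)

# Alternative studied in the paper: nearest-neighbor interpolation followed
# by a regular convolution, which avoids the tonal artifacts.
nn_upsampler = nn.Sequential(
    nn.Upsample(scale_factor=4, mode='nearest'),
    nn.Conv1d(64, 64, kernel_size=16, padding='same'),
)

x = torch.randn(1, 64, 256)  # (batch, channels, time)
print(transposed(x).shape, nn_upsampler(x).shape)  # both upsample 4x
```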

30 citations


Proceedings ArticleDOI
23 Mar 2020
TL;DR: A new video coding tool in the Versatile Video Coding (VVC) standard named luma mapping with chroma scaling (LMCS) is described, which aims at improving the coding efficiency for standard and high dynamic range video signals by making better use of the range of luma code values allowed at a specified bit depth.
Abstract: This paper describes a new video coding tool in the Versatile Video Coding (VVC) standard named luma mapping with chroma scaling (LMCS). Experimental compression performance results for LMCS and non-normative examples for deriving LMCS parameter values are also provided. LMCS has two main components: 1) a process for mapping input luma code values to a new set of code values for use inside the coding loop; and 2) a luma-dependent process for scaling chroma residual values. The first process, luma mapping, aims at improving the coding efficiency for standard and high dynamic range video signals by making better use of the range of luma code values allowed at a specified bit depth. The second process, chroma scaling, manages relative compression efficiency for the luma and chroma components of the video signal. The luma mapping process of LMCS is applied at the pixel sample level, and is implemented using a piecewise linear model. The chroma scaling process is applied at the chroma block level, and is implemented using a scaling factor derived from reconstructed neighboring luma samples of the chroma block.
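For intuition, a sketch of the first component: a piecewise-linear forward luma mapping realized as a lookup table. LMCS uses a 16-segment piecewise-linear model; the slope values below are invented for illustration and are not normative:

```python
import numpy as np

def build_luma_lut(slopes, bit_depth=10, num_segments=16):
    """Forward luma mapping: a piecewise-linear curve over the input code
    range, one slope per equal-width input segment."""
    max_val = (1 << bit_depth) - 1
    seg_len = (max_val + 1) // num_segments
    lut = np.empty(max_val + 1)
    y = 0.0
    for s in range(num_segments):
        xs = np.arange(seg_len)
        lut[s*seg_len:(s+1)*seg_len] = y + slopes[s] * xs
        y += slopes[s] * seg_len
    return np.clip(np.round(lut), 0, max_val).astype(np.uint16)

# Hypothetical parameter choice: expand mid-tones, compress the extremes.
slopes = [0.6]*4 + [1.3]*8 + [0.6]*4
lut = build_luma_lut(slopes)
mapped = lut[np.array([64, 512, 960])]  # map sample luma code values
```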

23 citations


Posted Content
TL;DR: An empirical study of Conv-TasNet is conducted, and an enhancement to the encoder/decoder based on a (deep) non-linear variant of it is proposed that can improve average SI-SNR performance by more than 1 dB.
Abstract: Conv-TasNet is a recently proposed waveform-based deep neural network that achieves state-of-the-art performance in speech source separation. Its architecture consists of a learnable encoder/decoder and a separator that operates on top of this learned space. Various improvements have been proposed to Conv-TasNet. However, they mostly focus on the separator, leaving its encoder/decoder as a (shallow) linear operator. In this paper, we conduct an empirical study of Conv-TasNet and propose an enhancement to the encoder/decoder that is based on a (deep) non-linear variant of it. In addition, we experiment with the larger and more diverse LibriTTS dataset to investigate the generalization capabilities of the studied models when trained on much more data. We propose a cross-dataset evaluation that includes assessing separations from the WSJ0-2mix, LibriTTS and VCTK databases. Our results show that enhancements to the encoder/decoder can improve average SI-SNR performance by more than 1 dB. Furthermore, we offer insights into the generalization capabilities of Conv-TasNet and the potential value of improvements to the encoder/decoder.
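Since the results are reported in SI-SNR, a small reference implementation of that metric may help; this is the standard definition, not code from the paper:

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio (dB), the separation metric
    reported for Conv-TasNet. Both inputs are 1-D waveforms."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to remove scale dependence.
    s_target = (np.dot(estimate, target) / (np.dot(target, target) + eps)) * target
    e_noise = estimate - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))

# A noisy copy of the target scores a finite value; the target itself, very high.
t = np.random.randn(16000)
print(si_snr(t + 0.1 * np.random.randn(16000), t))
```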

22 citations


Journal ArticleDOI
TL;DR: The goal of this paper is to provide an overview of the CfP responses for the HDR/WCG category, and a description of the specific HDR/WCG technologies submitted to the CfP.
Abstract: The ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group issued a Call for Proposals (CfP) on video compression with capability beyond HEVC in October 2017. The CfP considered three categories of content – Standard Dynamic Range, High Dynamic Range and Wide Colour Gamut (HDR/WCG), and 360° Omni-directional video. As a result of the CfP process, the development of a new video coding standard, named Versatile Video Coding (VVC), was initiated. The goal of this paper is to provide an overview of the CfP responses for the HDR/WCG category. The paper includes a summary of work leading to the development of the CfP, a presentation of the CfP results for the HDR/WCG category, and a description of the specific HDR/WCG technologies submitted to the CfP.

20 citations


Posted Content
TL;DR: This work tackles automatic speech quality assessment with a semi-supervised learning approach, combining available annotations with programmatically generated data, and using 3 different optimization criteria together with 5 complementary auxiliary tasks.
Abstract: Automatic speech quality assessment is an important, transversal task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen recording conditions, and a lack of flexibility of existing approaches. In this work, we tackle these problems with a semi-supervised learning approach, combining available annotations with programmatically generated data, and using 3 different optimization criteria together with 5 complementary auxiliary tasks. Our results show that such a semi-supervised approach can cut the error of existing methods by more than 36%, while providing additional benefits in terms of reusable features or auxiliary outputs. Improvement is further corroborated with an out-of-sample test showing promising generalization capabilities.
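A minimal sketch of how several optimization criteria and auxiliary tasks can be combined into one objective; the task names and weighting below are placeholders, not the paper's actual criteria:

```python
import torch.nn.functional as F

def multitask_quality_loss(outputs, targets, aux_weight=0.1):
    """Main quality-score regression plus auxiliary heads. `outputs` and
    `targets` are dicts of tensors; 'mos' is the main quality target and the
    remaining keys are hypothetical auxiliary tasks (e.g. SNR estimation,
    degradation classification)."""
    main = F.mse_loss(outputs['mos'], targets['mos'])
    aux = sum(F.mse_loss(outputs[k], targets[k]) for k in outputs if k != 'mos')
    return main + aux_weight * aux
```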

19 citations


Proceedings ArticleDOI
04 May 2020
TL;DR: MOVE is presented, a musically-motivated method for accurate and scalable version identification that achieves state-of-the-art performance on two publicly-available benchmark sets by learning scalable embeddings in a Euclidean distance space, using a triplet loss and a hard triplet mining strategy.
Abstract: The version identification (VI) task deals with the automatic detection of recordings that correspond to the same underlying musical piece. Despite many efforts, VI is still an open problem, with much room for improvement, especially with regard to combining accuracy and scalability. In this paper, we present MOVE, a musically-motivated method for accurate and scalable version identification. MOVE achieves state-of-the-art performance on two publicly-available benchmark sets by learning scalable embeddings in a Euclidean distance space, using a triplet loss and a hard triplet mining strategy. It improves over previous work by employing an alternative input representation, and introducing a novel technique for temporal content summarization, a standardized latent space, and a data augmentation strategy specifically designed for VI. In addition to the main results, we perform an ablation study to highlight the importance of our design choices, and study the relation between embedding dimensionality and model performance.
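For reference, a compact version of the loss ingredients named in the abstract: a triplet loss with batch-hard mining in Euclidean space. The mining and margin details are a common formulation, not necessarily MOVE's exact ones:

```python
import torch
import torch.nn.functional as F

def hard_triplet_loss(embeddings, labels, margin=1.0):
    """Batch-hard triplet loss: for each anchor, take the farthest positive
    and the closest negative in Euclidean distance."""
    dist = torch.cdist(embeddings, embeddings)       # pairwise distances
    same = labels[:, None] == labels[None, :]
    pos = dist.masked_fill(~same, float('-inf')).max(dim=1).values
    neg = dist.masked_fill(same, float('inf')).min(dim=1).values
    return F.relu(pos - neg + margin).mean()

# Usage: 8 recordings, 4 underlying pieces (two versions each).
emb = torch.randn(8, 256)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(hard_triplet_loss(emb, labels))
```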

17 citations


Posted Content
TL;DR: This work demonstrates the first approach to learning multitrack mixing conventions from real-world data at the waveform level, without knowledge of the underlying mixing parameters, and generates mixes that outperform baseline approaches.
Abstract: Applications of deep learning to automatic multitrack mixing are largely unexplored. This is partly due to the limited available data, coupled with the fact that such data is relatively unstructured and variable. To address these challenges, we propose a domain-inspired model with a strong inductive bias for the mixing task. We achieve this with the application of pre-trained sub-networks and weight sharing, as well as with a sum/difference stereo loss function. The proposed model can be trained with a limited number of examples, is permutation invariant with respect to the input ordering, and places no limit on the number of input sources. Furthermore, it produces human-readable mixing parameters, allowing users to manually adjust or refine the generated mix. Results from a perceptual evaluation involving audio engineers indicate that our approach generates mixes that outperform baseline approaches. To the best of our knowledge, this work demonstrates the first approach to learning multitrack mixing conventions from real-world data at the waveform level, without knowledge of the underlying mixing parameters.
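The sum/difference stereo loss mentioned above can be sketched in a few lines; the choice of L1 distance on raw waveforms is an assumption for illustration:

```python
import torch

def sum_diff_stereo_loss(pred, target, loss_fn=torch.nn.functional.l1_loss):
    """Sum/difference (mid/side) stereo loss: compare mixes in the mid and
    side domains so that level and stereo-width errors are both penalized.
    pred/target: (batch, 2, samples) stereo waveforms."""
    pred_mid, pred_side = pred[:, 0] + pred[:, 1], pred[:, 0] - pred[:, 1]
    targ_mid, targ_side = target[:, 0] + target[:, 1], target[:, 0] - target[:, 1]
    return loss_fn(pred_mid, targ_mid) + loss_fn(pred_side, targ_side)

pred, target = torch.randn(4, 2, 44100), torch.randn(4, 2, 44100)
print(sum_diff_stereo_loss(pred, target))
```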

14 citations


Journal ArticleDOI
TL;DR: A response to the joint CfP that considers all three categories of video content, with a core codec designed based on the joint exploration model (JEM) reference software.
Abstract: The ITU-T Video Coding Experts Group (VCEG) and ISO/IEC Moving Picture Experts Group (MPEG) issued in October 2017 a joint Call for Proposals (CfP) on video compression with capability beyond HEVC. The joint CfP included three categories of content: standard dynamic range (SDR), high dynamic range and wide color gamut (HDR/WCG), and 360° omni-directional video (360°). This paper describes a response to the joint CfP that considers all three categories of video content. The core codec in the response is designed based on the joint exploration model (JEM) reference software. The key coding tools in the JEM are significantly simplified to reduce both average and worst-case complexity for hardware design with negligible coding performance loss. Furthermore, two additional coding tools are used to further improve coding efficiency. For the HDR and 360° categories, additional coding tools specifically designed to optimize the compression efficiency and subjective quality of that specific content category are included. Further, some SDR coding tools are modified to alleviate subjective quality problems. For the random access configuration, compared to the HEVC test model (HM) anchor, the proposed video codec achieves average luma rate savings of 35.7%, 31.3%, and 33.9% for the SDR, HDR, and 360° categories, respectively.

13 citations


Proceedings ArticleDOI
11 Oct 2020
TL;DR: This work proposes a comparative study of different input representations of melodic or harmonic characteristics of songs, and shows that systems combining melodic and harmonic features drastically outperform those relying on a single input representation.
Abstract: Recent works have addressed the automatic cover detection problem from a metric learning perspective. They employ different input representations, aiming to exploit melodic or harmonic characteristics of songs and yield promising performances. In this work, we propose a comparative study of these different representations and show that systems combining melodic and harmonic features drastically outperform those relying on a single input representation. We illustrate how these features complement each other with both quantitative and qualitative analyses. We finally investigate various fusion schemes and propose methods yielding state-of-the-art performances on two publicly-available large datasets.
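A minimal sketch of one such late-fusion scheme, blending normalized pairwise distance matrices from a melodic and a harmonic system; the normalization and weighting are illustrative choices, not necessarily the paper's best-performing scheme:

```python
import numpy as np

def fuse_distances(d_melodic, d_harmonic, alpha=0.5):
    """Late fusion of two cover-detection systems: min-max normalize each
    pairwise distance matrix, then blend them."""
    def norm(d):
        return (d - d.min()) / (d.max() - d.min() + 1e-12)
    return alpha * norm(d_melodic) + (1 - alpha) * norm(d_harmonic)

# d_fused = fuse_distances(d_mel, d_harm)  # then rank candidates by distance
```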

Proceedings ArticleDOI
01 Nov 2020
TL;DR: In this paper, a joint source-channel rate-distortion (RD) optimization for real-time video transmission is proposed, where the video compression and forward error correction (FEC) options are optimized by looking for the best trade-off between the estimated end-to-end distortion of a video packet and the sum of the number of source bits and FEC bits used to encode that packet.
Abstract: This paper proposes a joint source-channel rate-distortion (RD) optimization for real-time video transmission. The video compression and forward error correction (FEC) options are optimized by looking for the best trade-off between the estimated end-to-end distortion of a video packet and the sum of the number of source bits and FEC bits used to encode that packet. Video coding options include the coding mode and quantization parameter, which are selected for each macroblock. Channel coding options consist of different FEC code rates that provide different levels of protection against the lossy channel. The proposed RD technique adjusts the bit rate to meet a target using a Lagrange multiplier approach. The encoder also uses instantaneous channel state information to improve performance for a varying channel. Conventional RD optimization approaches optimize over the video coding modes only; our approach, which also considers the channel and FEC bits, performs better over both AWGN and Rayleigh fading channels. We also consider an approach to reduce the computational complexity of the proposed RD scheme.
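The core decision rule described, trading estimated end-to-end distortion against source-plus-FEC bits via a Lagrange multiplier, can be sketched as follows; the candidate fields and numbers are hypothetical:

```python
def joint_rd_decision(options, lmbda):
    """Pick the (coding mode, QP, FEC rate) option minimizing the Lagrangian
    J = D_end_to_end + lambda * (R_source + R_fec)."""
    return min(options,
               key=lambda o: o['distortion'] + lmbda * (o['src_bits'] + o['fec_bits']))

# Example: two candidate encodings of one macroblock's packet.
candidates = [
    {'mode': 'inter', 'qp': 30, 'fec_rate': '2/3',
     'distortion': 12.0, 'src_bits': 900, 'fec_bits': 450},
    {'mode': 'intra', 'qp': 34, 'fec_rate': '1/2',
     'distortion': 9.5, 'src_bits': 1200, 'fec_bits': 1200},
]
best = joint_rd_decision(candidates, lmbda=0.004)
```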

Patent
Peng Yin, Taoran Lu, Fangjun Pu, Tao Chen, Walter J. Husak
21 May 2020
TL;DR: In this paper, a decoder parses sequence parameter set (SPS) data from an input coded bitstream to detect that an HDR extension syntax structure is present in the parsed SPS data.
Abstract: In a method to improve the coding efficiency of high-dynamic-range (HDR) images, a decoder parses sequence parameter set (SPS) data from an input coded bitstream to detect that an HDR extension syntax structure is present in the parsed SPS data. It extracts from the HDR extension syntax structure post-processing information that includes one or more of a color space enabled flag, a color enhancement enabled flag, an adaptive reshaping enabled flag, a dynamic range conversion flag, a color correction enabled flag, or an SDR viewable flag. It decodes the input bitstream to generate a preliminary output decoded signal, and generates a second output signal based on the preliminary output signal and the post-processing information.

Journal ArticleDOI
TL;DR: A highly integrated architecture, built around a 7-tap edge-aware selective sparse filter, that can detect and alleviate banding artifacts while preserving true edges and details in an extremely efficient way.
Abstract: When a low or standard dynamic range video is inverse tone mapped to high dynamic range (HDR), there can be banding artifacts in the output HDR video. We design a highly integrated architecture that can detect and alleviate banding artifacts while preserving true edges and details in an extremely efficient way. This is achieved by a 7-tap edge-aware selective sparse filter. Coding artifacts, such as blocking artifacts, can also be reduced by this filter. The filter includes some parameters that depend on the strength of the banding artifacts. A parameter selection mechanism is presented which considers the smoothness of the banding regions and the fidelity of the filtering output. The filter yields significant PSNR gains in regions with artifacts. Subjective tests demonstrate the substantial quality improvement achieved by the proposed filter compared to the quality before filtering. The visual quality provided by the filter is better than or similar to that of algorithms which are far more complex.
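A rough one-dimensional sketch of the idea: widely spaced taps with edge-aware selectivity. The tap spacing, threshold test, and fallback behavior below are assumptions for illustration, not the paper's exact design:

```python
import numpy as np

def sparse_debanding_filter(row, spacing=8, threshold=16):
    """Illustrative 7-tap edge-aware selective sparse filter along one image
    row: taps are widely spaced so they span a band, and averaging is skipped
    (edge-aware selectivity) when any tap deviates strongly from the center."""
    out = row.astype(np.float32).copy()
    offsets = np.arange(-3, 4) * spacing            # 7 sparse taps
    for i in range(3 * spacing, len(row) - 3 * spacing):
        taps = row[i + offsets].astype(np.float32)
        if np.all(np.abs(taps - row[i]) < threshold):  # smooth region only
            out[i] = taps.mean()
    return out

row = np.repeat(np.arange(0, 64, 4), 40).astype(np.uint8)  # banded ramp
smooth = sparse_debanding_filter(row)
```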

Posted Content
TL;DR: This work uses SampleRNN as the generative model and demonstrates that the proposed coding structure provides performance competitive with state-of-the-art source coding tools for specific categories of audio signals.
Abstract: We consider source coding of audio signals with the help of a generative model. We use a construction where a waveform is first quantized, yielding a finite bitrate representation. The waveform is then reconstructed by random sampling from a model conditioned on the quantized waveform. The proposed coding scheme is theoretically analyzed. Using SampleRNN as the generative model, we demonstrate that the proposed coding structure provides performance competitive with state-of-the-art source coding tools for specific categories of audio signals.
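A sketch of the first stage, quantizing the waveform to a finite bitrate representation, using mu-law companding as an illustrative stand-in (the paper's quantizer design may differ). The decoder would then sample from a generative model conditioned on this representation rather than simply expanding the codes:

```python
import numpy as np

def mu_law_quantize(x, bits=8, mu=255.0):
    """Map a waveform in [-1, 1] to integer codes (finite bitrate)."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.round((y + 1.0) * (2**bits - 1) / 2.0).astype(np.int32)

def mu_law_expand(q, bits=8, mu=255.0):
    """Deterministic expansion of the codes; in the proposed scheme, a
    conditional generative model (SampleRNN) samples the reconstruction
    conditioned on the quantized waveform instead."""
    y = 2.0 * q / (2**bits - 1) - 1.0
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

codes = mu_law_quantize(np.sin(np.linspace(0, 20, 16000)))
coarse = mu_law_expand(codes)
```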

Proceedings ArticleDOI
11 Oct 2020
TL;DR: This work proposes to further narrow the gap between accuracy and scalability in version identification systems by employing a set of data distillation techniques that reduce the embedding dimensionality of a pre-trained state-of-the-art model.
Abstract: Version identification systems aim to detect different renditions of the same underlying musical composition (loosely called cover songs). By learning to encode entire recordings into plain vector embeddings, recent systems have made significant progress in bridging the gap between accuracy and scalability, which has been a key challenge for nearly two decades. In this work, we propose to further narrow this gap by employing a set of data distillation techniques that reduce the embedding dimensionality of a pre-trained state-of-the-art model. We compare a wide range of techniques and propose new ones, from classical dimensionality reduction to more sophisticated distillation schemes. With those, we obtain 99% smaller embeddings that, moreover, yield up to a 3% accuracy increase. Such small embeddings can have an important impact on retrieval time, up to the point of making a real-world system practical on a standalone laptop.
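As one concrete instance of the classical end of the techniques compared, PCA can shrink a pre-trained model's embeddings before indexing; the dimensions below are placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

embeddings = np.random.randn(10000, 512)  # stand-in for pre-trained embeddings

# Classical dimensionality reduction: keep a ~98% smaller space.
pca = PCA(n_components=8).fit(embeddings)
distilled = pca.transform(embeddings)

# Version retrieval then runs on the small embeddings.
index = NearestNeighbors(metric='euclidean').fit(distilled)
distances, neighbors = index.kneighbors(distilled[:1], n_neighbors=5)
```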

Book ChapterDOI
TL;DR: In this paper, the authors propose a noise generation (offline) and noise injection (online) method that adapts the magnitude and structure of the noise pattern based on the luma of the quantized pixel and the slope of the inverse tone-mapping function.
Abstract: High Dynamic Range (HDR) imaging is gaining increased attention due to its realistic content, not only for regular displays but also for smartphones. Until sufficient HDR content is distributed, HDR visualization still relies mostly on converting Standard Dynamic Range (SDR) content. SDR images are often quantized, or bit-depth reduced, before SDR-to-HDR conversion, e.g. for video transmission. Quantization can easily lead to banding artifacts. In computing- and/or memory-I/O-limited environments, the traditional solution using spatial neighborhood information is not feasible. Our method includes noise generation (offline) and noise injection (online), and operates on pixels of the quantized image. We vary the magnitude and structure of the noise pattern adaptively based on the luma of the quantized pixel and the slope of the inverse tone-mapping function. Subjective user evaluations confirm the superior performance of our technique.
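An illustrative sketch of the online stage, assuming a pre-generated noise pattern and a per-pixel slope map of the inverse tone-mapping (ITM) curve; the scaling law is an assumption, not the chapter's exact formulation:

```python
import numpy as np

def inject_debanding_noise(quantized, itm_slope, base_amplitude=0.5):
    """Pixelwise noise injection before inverse tone mapping: offline noise
    is scaled per pixel by the ITM slope at that pixel's luma, so regions
    that would band most strongly receive the strongest dither."""
    noise = np.random.uniform(-1, 1, size=quantized.shape)  # offline pattern
    return quantized + base_amplitude * itm_slope * noise

# Example: an 8-bit luma plane and a hypothetical per-pixel slope map.
luma = np.random.randint(0, 256, (4, 6)).astype(np.float32)
slope = np.ones_like(luma) * 1.5
dithered = inject_debanding_noise(luma, slope)
```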

Patent
12 May 2020
TL;DR: In this paper, a target view to a 3D scene depicted by a multiview image is used to select sampled views, and a texture image and a depth image for each selected sampled view are encoded into a multiview video signal.
Abstract: A target view to a 3D scene depicted by a multiview image is determined. The multiview image comprises multiple sampled views. Each sampled view comprises multiple texture images and multiple depth images in multiple image layers. The target view is used to select, from the multiple sampled views of the multiview image, sampled views. A texture image and a depth image for each sampled view in the selected sampled views are encoded into a multiview video signal to be transmitted to a downstream device.

Patent
02 Jan 2020
TL;DR: In this article, a tone-mapping transfer function based on third-order Hermite splines is presented, composed of two spline polynomials determined using three anchor points and three slopes.
Abstract: Methods for mapping an image from a first dynamic range to a second dynamic range are presented. The mapping is based on a function that includes two spline polynomials determined using three anchor points and three slopes. The first anchor point is determined using the black point levels of the input and target output, the second anchor point is determined using the white point levels of the input and target output, and the third anchor point is determined using mid-tones information data for the input and target output. The mid-tones level of the target output is computed adaptively based on an ideal one-to-one mapping and by preserving input contrast in both the blacks and the highlights. An example tone-mapping transfer function based on third-order (cubic) Hermite splines is presented.
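A sketch of the described construction: two cubic Hermite segments joined at a mid-tones anchor, built from three anchor points and three slopes. The Hermite basis is standard; the anchor and slope values below are hypothetical:

```python
import numpy as np

def hermite_segment(x, x0, x1, y0, y1, m0, m1):
    """Evaluate a cubic Hermite spline segment between anchors (x0, y0)
    and (x1, y1) with endpoint slopes m0, m1."""
    h = x1 - x0
    t = (x - x0) / h
    h00 = 2*t**3 - 3*t**2 + 1
    h10 = t**3 - 2*t**2 + t
    h01 = -2*t**3 + 3*t**2
    h11 = t**3 - t**2
    return h00*y0 + h10*h*m0 + h01*y1 + h11*h*m1

def tone_map(x, anchors, slopes):
    """Two-piece tone curve from three anchors (black, mid, white) and
    three slopes, following the patent's construction."""
    (xb, yb), (xm, ym), (xw, yw) = anchors
    mb, mm, mw = slopes
    return np.where(x < xm,
                    hermite_segment(x, xb, xm, yb, ym, mb, mm),
                    hermite_segment(x, xm, xw, ym, yw, mm, mw))

# Map 0.0-1.0 input to a narrower target range (illustrative numbers).
x = np.linspace(0, 1, 5)
y = tone_map(x, anchors=[(0, 0.01), (0.5, 0.4), (1, 0.9)], slopes=[0.2, 1.2, 0.3])
```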

Patent
13 May 2020
TL;DR: In this paper, a method for rendering audio input including divergence metadata for playback in a playback environment comprises creating two additional audio objects associated with the audio object such that respective locations of the two additional objects are evenly spaced from the location of the audio objects, on opposite sides of the location when seen from an intended listener's position in the playback environment.
Abstract: The present document relates to methods and apparatus for rendering input audio for playback in a playback environment. The input audio includes at least one audio object and associated metadata, and the associated metadata indicates at least a location of the audio object. A method for rendering input audio including divergence metadata for playback in a playback environment comprises creating two additional audio objects associated with the audio object such that respective locations of the two additional audio objects are evenly spaced from the location of the audio object, on opposite sides of the location of the audio object when seen from an intended listener's position in the playback environment, determining respective weight factors for application to the audio en.) object and the two additional audio objects, and rendering the audio object and the two additional audio objects to one or more speaker feeds in accordance with the determined weight factors, The present document further relates to methods and apparatus for rendering audio input including extent metadata and/or diffuseness metadata for playback in a playback environment.
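For illustration only, one plausible power-preserving choice of the three weight factors as a function of a divergence value in [0, 1]; the patent's actual weight derivation may differ:

```python
import numpy as np

def divergence_gains(divergence: float):
    """Split one object's energy between the original object and the two
    additional side objects. divergence=0 keeps everything in the original
    object; divergence=1 moves all energy to the two sides."""
    center = np.sqrt(1.0 - divergence)
    side = np.sqrt(divergence / 2.0)
    return center, side, side  # center**2 + 2*side**2 == 1 (power preserved)
```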

Patent
07 Feb 2020
TL;DR: In this article, a standard dynamic range (SDR) image is received and composer metadata is generated for mapping the SDR image to an enhanced dynamic range image, which is encoded in an output SDR video signal.
Abstract: A standard dynamic range (SDR) image is received. Composer metadata is generated for mapping the SDR image to an enhanced dynamic range (EDR) image. The composer metadata specifies a backward reshaping mapping that is generated from SDR-EDR image pairs in a training database. The SDR-EDR image pairs comprise SDR images that do not include the SDR image and EDR images that correspond to the SDR images. The SDR image and the composer metadata are encoded in an output SDR video signal. An EDR display operating with a receiver of the output SDR video signal is caused to render an EDR display image. The EDR display image is derived from a composed EDR image composed from the SDR image based on the composer metadata.

Proceedings ArticleDOI
16 Apr 2020
TL;DR: This paper presents the commercial deployment in thousands of remote small communities, describes the unique experience of maintaining this infrastructure, and presents an extension of the operations support system (OSS) leveraging advanced analytics and machine learning with the goal of optimizing network maintenance while reducing costs.
Abstract: The Internet Para Todos program is working to provide sustainable mobile broadband to 100 M unconnected people in Latin America. In this paper we present our commercial deployment in thousands of remote small communities and describe the unique experience of maintaining this infrastructure. We describe the challenges of managing operations while containing costs in these extreme geographical conditions. We also analyze operational data to understand outage patterns and present typical operational issues in this unique remote community environment. Finally, we present an extension of the operations support system (OSS) leveraging advanced analytics and machine learning with the goal of optimizing network maintenance while reducing costs.

Patent
13 Oct 2020
TL;DR: In this article, a speech synthesizer is trained to generate synthesized audio data that corresponds to words uttered by a source speaker according to speech characteristics of a target speaker using time-stamped phoneme sequences, pitch contour data and speaker identification data.
Abstract: Computer-implemented methods for speech synthesis are provided. A speech synthesizer may be trained to generate synthesized audio data that corresponds to words uttered by a source speaker according to speech characteristics of a target speaker. The speech synthesizer may be trained on time-stamped phoneme sequences, pitch contour data, and speaker identification data. The speech synthesizer may include a voice modeling neural network and a conditioning neural network.

Patent
15 May 2020
TL;DR: In this paper, a beam-steering modulator, an amplitude modulator and a controller are used to project a high-quality version of the image described by the image data.
Abstract: A novel high-efficiency image projection system includes a beam-steering modulator, an amplitude modulator, and a controller. In a particular embodiment, the controller generates beam-steering drive values from image data and uses the beam-steering drive values to drive the beam-steering modulator. Additionally, the controller utilizes the beam-steering drive values to generate a lightfield simulation of a lightfield projected onto the amplitude modulator by the beam-steering modulator. The controller utilizes the lightfield simulation to generate amplitude drive values for driving the amplitude modulator in order to project a high-quality version of the image described by the image data.

Journal ArticleDOI
TL;DR: This work proposes a framework to combine various quality metrics using a full reference approach for High Dynamic Range (HDR) image quality assessment (IQA), and uses the back-tracking-based Sequential Floating Forward Selection technique during training to include a subset of metrics from a list of quality metrics in the model.
Abstract: We propose a framework to combine various quality metrics using a full reference approach for High Dynamic Range (HDR) Image quality assessment (IQA). We combine scores from metrics exclusively designed for different applications such as HDR, Standard Dynamic Range (SDR) and color difference measures, in a non-linear manner using machine learning (ML) approaches with weights determined during an offline training process. We explore various ML techniques and find that support vector machine regression and gradient boosting regression trees are effective. To improve performance and reduce complexity, we use the back-tracking based Sequential Floating Forward Selection technique during training to include a subset of metrics from a list of quality metrics in our model. We evaluate the performance on five publicly available calibrated HDR databases with different types of distortion (including different types of compression, Gaussian noise, gamut mismatch, chromatic distortions and so on) and demonstrate improved performance using our method as compared to several existing IQA metrics. We perform extensive statistical analysis to demonstrate significant improvement over existing approaches and show the generality and robustness of our approach using cross-database validation.
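A minimal sketch of the non-linear combination step with one of the learners the paper finds effective (gradient boosting regression trees); the component-metric scores and subjective targets below are random stand-ins, and the metric-subset selection via Sequential Floating Forward Selection is not shown:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical training data: per-image scores from six component quality
# metrics (HDR, SDR and color-difference measures) plus subjective scores.
metric_scores = np.random.rand(200, 6)
subjective = np.random.rand(200)

# Non-linear combination learned offline; the paper also evaluates SVR.
model = GradientBoostingRegressor().fit(metric_scores, subjective)
predicted_quality = model.predict(metric_scores[:5])
```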

Patent
25 Mar 2020
TL;DR: In this paper, a teleconference among a plurality of client devices arranged in two or more acoustic spaces, each client device having an audio capturing capability and/or an audio rendering capability, is described.
Abstract: Described is a method of hosting a teleconference among a plurality of client devices arranged in two or more acoustic spaces, each client device having an audio capturing capability and/or an audio rendering capability, the method comprising: grouping the plurality of client devices into two or more groups based on their belonging to respective acoustic spaces, receiving first audio streams from the plurality of client devices, generating second audio streams from the first audio streams for rendering by respective client devices among the plurality of client devices, based on the grouping of the plurality of client devices into the two or more groups, and outputting the generated second audio streams to respective client devices. Further described are corresponding computation devices, computer programs, and computer-readable storage media.

Patent
21 Apr 2020
TL;DR: In this paper, a method of processing a sequence of video frames from a camera capturing a writing surface for subsequent transmission to at least one of a remote videoconferencing client and a remote server is presented.
Abstract: A method of processing a sequence of video frames from a camera capturing a writing surface for subsequent transmission to at least one of a remote videoconferencing client and a remote videoconferencing server. The method comprises receiving the sequence of video frames from the camera; and selecting an image area of interest in the video frames, comprising selecting one of a sub-area of the video frames and an entire area of the video frames. The method also comprises, for each current video frame of the sequence of video frames, generating a pen stroke mask by applying adaptive thresholding to the image area of interest. The method also comprises generating an output video frame using the pen stroke mask. Corresponding systems and computer readable media are disclosed.
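A minimal sketch of the adaptive-thresholding step using OpenCV; the block size and offset C are illustrative tuning values, not figures from the patent:

```python
import cv2

def pen_stroke_mask(frame_bgr, block_size=21, C=10):
    """Generate a pen-stroke mask from a whiteboard frame using adaptive
    thresholding: each pixel is compared against a local Gaussian-weighted
    mean, which is robust to uneven lighting on the writing surface."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Dark strokes on a bright surface -> inverted binary mask.
    return cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY_INV, block_size, C)

# mask = pen_stroke_mask(cv2.imread('frame.png'))
```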

Patent
04 Aug 2020
TL;DR: In this paper, a loopback device connects between a first source device and a sink device on a first connection; a second source device connects to the sink devices on a second connection.
Abstract: An apparatus, method and system for connecting High-Definition Multimedia Interface (HDMI) devices. A loopback device connects between a first source device and a sink device on a first connection; a second source device connects to the sink device on a second connection. The loopback device manages the first connection, passes transition-minimized differential signaling (TMDS) or fixed-rate link (FRL) signals through to the sink device, and outputs audio received from the sink device on the audio return channel (ARC) or enhanced audio return channel (eARC). In this manner, audio that originates from any source device may be output without requiring a direct connection to the loopback device.