Showing papers by "Sony Broadcast & Professional Research Laboratories" published in 2021

Open Access

Journal Article•DOI•

A Link-Layer Synchronization and Medium Access Control Protocol for Terahertz-Band Communication Networks

Qing Xia¹, Zahed Hossain², Michael J. Medley³, Josep Miquel Jornet⁴•Institutions (4)

Sony Broadcast & Professional Research Laboratories¹, Intel², Air Force Research Laboratory³, Northeastern University⁴

01 Jan 2021-IEEE Transactions on Mobile Computing

TL;DR: The results show that the proposed protocol can maximize the successful packet delivery probability without compromising the achievable throughput in THz-band communication networks.

Abstract: In this paper, a link-layer synchronization and medium access control (MAC) protocol for very-high-speed wireless communication networks in the Terahertz (THz) band is presented. The protocol relies on a receiver-initiated handshake to guarantee synchronization between transmitter and receiver. Two scenarios are considered, namely, a macroscale scenario, where nodes utilize rotating directional antennas to periodically sweep the space while overcoming the distance problem at THz frequencies, and a nanoscale scenario, where nano-devices require energy harvesting systems to operate. Both scenarios are implemented on a centralized and an ad-hoc network architecture. A carrier-based physical layer is considered for the macro-scenario, whereas the physical layer for the nano-scenario is based on a femtosecond-long pulse-based modulation scheme with packet interleaving. The performance of the proposed MAC protocol is analytically investigated in terms of delay, throughput and probability of successful packet delivery, and compared to that of an adapted Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) with and without handshake. The results are validated by means of extensive simulations with ns-3, in which all the necessary THz elements have been implemented. The results show that the proposed protocol can maximize the successful packet delivery probability without compromising the achievable throughput in THz-band communication networks.

34 citations

Proceedings Article•DOI•

ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection

Kazuki Shimada¹, Yuichiro Koyama¹, Naoya Takahashi¹, Shusuke Takahashi¹, Yuki Mitsufuji¹•Institutions (1)

Sony Broadcast & Professional Research Laboratories¹

06 Jun 2021

TL;DR: The authors propose an activity-coupled Cartesian DOA (ACCDOA) representation, which assigns a sound event activity to the length of a corresponding Cartesian DOA vector.

Abstract: Neural-network (NN)-based methods show high performance in sound event localization and detection (SELD). Conventional NN-based methods use two branches for a sound event detection (SED) target and a direction-of-arrival (DOA) target. The two-branch representation with a single network has to decide how to balance the two objectives during optimization. Using two networks dedicated to each task increases system complexity and network size. To address these problems, we propose an activity-coupled Cartesian DOA (ACCDOA) representation, which assigns a sound event activity to the length of a corresponding Cartesian DOA vector. The ACCDOA representation enables us to solve a SELD task with a single target and has two advantages: avoiding the necessity of balancing the objectives and model size increase. In experimental evaluations with the DCASE 2020 Task 3 dataset, the ACCDOA representation outperformed the two-branch representation in SELD metrics with a smaller network size. The ACCDOA-based SELD system also performed better than state-of-the-art SELD systems in terms of localization and location-dependent detection.
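
The coupling of activity and direction described in the abstract can be illustrated with a minimal sketch. It assumes a single class in a single frame, a binary activity, and a decoding threshold of 0.5; the function names and the threshold are hypothetical choices for illustration, not taken from the paper.

```python
import math


def encode_accdoa(active, azimuth_deg, elevation_deg):
    """Build a target vector whose length is the activity (0 or 1)
    and whose direction is the Cartesian DOA."""
    activity = 1.0 if active else 0.0
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    unit = (
        math.cos(el) * math.cos(az),
        math.cos(el) * math.sin(az),
        math.sin(el),
    )
    return tuple(activity * c for c in unit)


def decode_accdoa(vec, threshold=0.5):
    """Recover (is_active, (azimuth_deg, elevation_deg)) from a vector.
    The event counts as active when the vector length exceeds the
    threshold (an assumed value)."""
    norm = math.sqrt(sum(c * c for c in vec))
    if norm <= threshold:
        return False, None
    x, y, z = (c / norm for c in vec)
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    return True, (azimuth, elevation)
```

Because detection is read from the vector length and localization from its direction, a network trained on such targets needs only one regression output per class, which is the single-target property the abstract refers to.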

31 citations

Proceedings Article•DOI•

Densely connected multidilated convolutional networks for dense prediction tasks

Naoya Takahashi¹, Yuki Mitsufuji¹•Institutions (1)

Sony Broadcast & Professional Research Laboratories¹

01 Jun 2021

TL;DR: In this paper, the authors claim the importance of a dense simultaneous modeling of multiresolution representation and propose a novel CNN architecture called densely connected multidilated DenseNet (D3Net).

Abstract: Tasks that involve high-resolution dense prediction require a modeling of both local and global patterns in a large input field. Although the local and global structures often depend on each other and their simultaneous modeling is important, many convolutional neural network (CNN)-based approaches interchange representations in different resolutions only a few times. In this paper, we claim the importance of a dense simultaneous modeling of multiresolution representation and propose a novel CNN architecture called densely connected multidilated DenseNet (D3Net). D3Net involves a novel multidilated convolution that has different dilation factors in a single layer to model different resolutions simultaneously. By combining the multidilated convolution with the DenseNet architecture, D3Net incorporates multiresolution learning with an exponentially growing receptive field in almost all layers, while avoiding the aliasing problem that occurs when we naively incorporate the dilated convolution in DenseNet. Experiments on the image semantic segmentation task using Cityscapes and the audio source separation task using MUSDB18 show that the proposed method has superior performance over state-of-the-art methods.
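
The idea of using different dilation factors within a single layer can be sketched in one dimension. The sketch below assumes that the i-th input group (for example, the i-th skip connection in a dense block) is convolved with dilation 2**i and that the results are summed; this assignment, the zero padding, and the function names are assumptions made for illustration, not details confirmed by the abstract.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1-D dilated convolution
    (implemented as cross-correlation, as in most deep learning code)."""
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def multidilated_conv1d(channel_groups, kernels):
    """Apply dilation 2**i to input group i and sum the results,
    so a single layer sees several resolutions at once."""
    out = [0.0] * len(channel_groups[0])
    for i, (signal, kernel) in enumerate(zip(channel_groups, kernels)):
        for t, value in enumerate(dilated_conv1d(signal, kernel, 2 ** i)):
            out[t] += value
    return out
```

With two impulse inputs and all-ones kernels of length three, the first group spreads the impulse to its immediate neighbours and the second to positions two steps away, which shows how one layer mixes a fine and a coarse receptive field.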

31 citations

Journal Article•DOI•

Technologies for the Crystal LED display system

Goshi Biwa¹, Akiyoshi Aoyagi¹, Masato Doi¹, Katsuhiro Tomoda, Atsushi Yasuda, Hisashi Kadota¹•Institutions (1)

Sony Broadcast & Professional Research Laboratories¹

01 Jun 2021-Journal of The Society for Information Display

27 citations

Proceedings Article•DOI•

All For One And One For All: Improving Music Separation By Bridging Networks

Ryosuke Sawata¹, Stefan Uhlich, Shusuke Takahashi¹, Yuki Mitsufuji¹•Institutions (1)

Sony Broadcast & Professional Research Laboratories¹

06 Jun 2021

TL;DR: In this article, a multi-domain loss (MDL) and two combination schemes are proposed for music separation with deep neural networks, which can be applied to many existing DNN-based separation methods as they are merely loss functions which are only used during training.

Abstract: This paper proposes several improvements for music separation with deep neural networks (DNNs), namely a multi-domain loss (MDL) and two combination schemes. First, by using MDL we take advantage of the frequency and time domain representation of audio signals. Next, we utilize the relationship among instruments by jointly considering them. We do this on the one hand by modifying the network architecture and introducing a CrossNet structure. On the other hand, we consider combinations of instrument estimates by using a new combination loss (CL). MDL and CL can easily be applied to many existing DNN-based separation methods as they are merely loss functions which are only used during training and do not affect the inference step. Experimental results show that the performance of Open-Unmix (UMX), a well-known and state-of-the-art open-source library for music separation, can be improved by utilizing our above schemes. Our modifications of UMX are open-sourced together with this paper.
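
A multi-domain loss of the kind described above can be sketched as the sum of a frequency-domain term and a time-domain term. The sketch assumes mean squared error for both terms, a naive DFT magnitude as the frequency representation, and a hypothetical `time_weight` parameter; the paper's actual loss terms and weighting may differ.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitude spectrum via a naive DFT (adequate for short examples)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_domain_loss(estimate, target, time_weight=1.0):
    """Frequency-domain loss plus a weighted time-domain loss."""
    freq_loss = mse(dft_magnitudes(estimate), dft_magnitudes(target))
    time_loss = mse(estimate, target)
    return freq_loss + time_weight * time_loss
```

A phase-inverted estimate has the same magnitude spectrum as the target, so the frequency term alone is close to zero while the time term is not. This is one intuition for combining the two domains, and since the function is only a training loss it leaves inference untouched, as the abstract notes.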

20 citations

Proceedings Article•DOI•

Streaming Transformer ASR with Blockwise Synchronous Beam Search

Emiru Tsunoo¹, Yosuke Kashiwagi¹, Shinji Watanabe²•Institutions (2)

Sony Broadcast & Professional Research Laboratories¹, Johns Hopkins University²

19 Jan 2021

TL;DR: In this article, a blockwise synchronous beam search algorithm based on blockwise processing of the encoder is proposed to perform streaming E2E Transformer ASR. Encoded feature blocks are synchronously aligned using a block boundary detection technique, in which a reliability score of each predicted hypothesis is evaluated based on the end-of-sequence and repeated tokens in the hypothesis.

Abstract: The Transformer self-attention network has shown promising performance as an alternative to recurrent neural networks in end-to-end (E2E) automatic speech recognition (ASR) systems. However, Transformer has a drawback in that the entire input sequence is required to compute both self-attention and source–target attention. In this paper, we propose a novel blockwise synchronous beam search algorithm based on blockwise processing of encoder to perform streaming E2E Transformer ASR. In the beam search, encoded feature blocks are synchronously aligned using a block boundary detection technique, where a reliability score of each predicted hypothesis is evaluated based on the end-of-sequence and repeated tokens in the hypothesis. Evaluations of the HKUST and AISHELL-1 Mandarin, LibriSpeech English, and CSJ Japanese tasks show that the proposed streaming Transformer algorithm outperforms conventional online approaches, including monotonic chunkwise attention (MoChA), especially when using the knowledge distillation technique. An ablation study indicates that our streaming approach contributes to reducing the response time, and the repetition criterion contributes significantly in certain tasks. Our streaming ASR models achieve comparable or superior performance to batch models and other streaming-based Transformer methods in all tasks considered.

15 citations

Journal Article•DOI•

FlavorGraph: a large-scale food-chemical graph for generating food representations and recommending food pairings.

Donghyeon Park¹, Keonwoo Kim¹, Seoyoon Kim¹, Michael Spranger², Jaewoo Kang¹•Institutions (2)

Korea University¹, Sony Broadcast & Professional Research Laboratories²

13 Jan 2021-Scientific Reports

TL;DR: In this paper, a large-scale food graph, called FlavorGraph, is proposed to better represent foods in dense vectors, which can also be used to predict relations between compounds and foods.

Abstract: Food pairing has not yet been fully pioneered, despite our everyday experience with food and the large amount of food data available on the web. The complementary food pairings discovered thus far were created by the intuition of talented chefs, not by scientific knowledge or statistical learning. We introduce FlavorGraph, which is a large-scale food graph built from relations extracted from a million food recipes and information on 1,561 flavor molecules from food databases. We analyze the chemical and statistical relations of FlavorGraph and apply our graph embedding method to better represent foods in dense vectors. Our graph embedding method is a modification of metapath2vec with an additional chemical property learning layer and quantitatively outperforms other baseline methods in food clustering. Food pairing suggestions made based on the food representations of FlavorGraph help achieve better results than previous works, and the suggestions can also be used to predict relations between compounds and foods. Our research offers a new perspective on not only food pairing techniques but also food science in general.

11 citations

Proceedings Article•DOI•

End-to-End Lyrics Recognition with Voice to Singing Style Transfer

Sakya Basak¹, Shrutina Agarwal¹, Sriram Ganapathy¹, Naoya Takahashi²•Institutions (2)

Indian Institute of Science¹, Sony Broadcast & Professional Research Laboratories²

06 Jun 2021

TL;DR: In this paper, a data augmentation method that converts natural speech to singing voice based on a vocoder-based speech synthesizer is proposed, which performs the voice style conversion by modulating the F0 contour of the natural speech with that of a singing voice.

Abstract: Automatic transcription of monophonic/polyphonic music is a challenging task due to the lack of availability of large amounts of transcribed data. In this paper, we propose a data augmentation method that converts natural speech to singing voice based on a vocoder-based speech synthesizer. This approach, called voice to singing (V2S), performs the voice style conversion by modulating the F0 contour of the natural speech with that of a singing voice. The V2S model-based style transfer can generate good-quality singing voice, thereby enabling the conversion of large corpora of natural speech to singing voice that is useful in building an E2E lyrics transcription system. In our experiments on monophonic singing voice data, the V2S style transfer provides a significant gain (relative improvements of 21%) for the E2E lyrics transcription system. We also discuss additional components like transfer learning and lyrics-based language modeling to improve the performance of the lyrics transcription system.

10 citations

Journal Article•DOI•

Fast Power Density Assessment of 5G Mobile Handset Using Equivalent Currents Method

Wang He¹, Bo Xu², Lucia Scialacqua, Zhinong Ying³, A. Scannavini, Lars J. Foged, Kun Zhao³, Carla Di Paola⁴, Shuai Zhang⁴, Sailing He¹•Institutions (4)

Zhejiang University¹, Royal Institute of Technology², Sony Broadcast & Professional Research Laboratories³, Aalborg University⁴

08 Apr 2021-IEEE Transactions on Antennas and Propagation

TL;DR: The proposed EQC method is a good candidate for fast PD assessment in EMF exposure compliance testing in the mmWave frequency range; its PD results are compared with those computed using full-wave simulations and those measured with a planar near-field (NF) scanning system.

Abstract: As the fifth-generation (5G) mobile communication is utilizing millimeter-wave (mmWave) frequency bands, electromagnetic field (EMF) exposure emitted from a 5G mmWave mobile handset should be evaluated and compliant with the relevant EMF exposure limits in terms of peak spatial-average incident power density (PD). In this work, a fast PD assessment method for a 5G mmWave mobile handset using the equivalent current (EQC) method is proposed. The EQC method utilizes the intermediate-field (IF) data collected by a spherical measurement system to reconstruct the EQCs over a reconstruction surface and then computes the PD in close proximity of the mobile handset with acceptable accuracy. The performance of the proposed method is evaluated using a mmWave mobile handset mock-up equipped with four quasi-Yagi antennas. The assessed PD results are compared with those computed using full-wave simulations and also those measured with a planar near-field (NF) scanning system. In addition, three influencing factors related to the accuracy of the EQC method, namely, the angular resolution, the phase error, and the handset position in the IF measurements, are also analyzed. The proposed method is a good candidate for fast PD assessment of EMF exposure compliance testing in the mmWave frequency range.

10 citations

Proceedings Article•DOI•

Making Punctuation Restoration Robust and Fast with Multi-Task Learning and Knowledge Distillation

Michael Hentschel¹, Emiru Tsunoo¹, Takao Okuda¹•Institutions (1)

Sony Broadcast & Professional Research Laboratories¹

06 Jun 2021

TL;DR: In this paper, a multi-task learning framework is built on ELECTRA, a recently proposed improvement on BERT that has a generator-discriminator structure. The generator is used to inject errors into the training data, which, as the authors' experiments show, improves robustness against speech recognition errors during inference.

Abstract: In punctuation restoration, we try to recover the missing punctuation from automatic speech recognition output to improve understandability. Currently, large pre-trained transformers such as BERT set the benchmark on this task but there are two main drawbacks to these models. First, the pre-training data does not match the output data from speech recognition that contains errors. Second, the large number of model parameters increases inference time. To address the former, we use a multi-task learning framework with ELECTRA, a recently proposed improvement on BERT, that has a generator-discriminator structure. The generator allows us to inject errors into the training data and, as our experiments show, this improves robustness against speech recognition errors during inference. To address the latter, we investigate knowledge distillation and parameter pruning of ELECTRA. In our experiments on the IWSLT 2012 benchmark data, a model with less than 11% the size of BERT achieved better performance while having an 82% faster inference time.

8 citations

Journal Article•DOI•

Olfactory training with Aromastics: olfactory and cognitive effects.

Anna Oleszkiewicz¹, Laura Bottesi¹, Michał Pieniak², Shuji Fujita³, Nadejda Krasteva, Gabriele Nelles, Thomas Hummel¹•Institutions (3)

Dresden University of Technology¹, University of Wrocław², Sony Broadcast & Professional Research Laboratories³

16 Apr 2021-European Archives of Oto-rhino-laryngology

TL;DR: In this paper, the authors examined whether increased frequency of OT leads to better outcomes in both olfactory and cognitive domains, and found that OT performed twice a day was more effective in supporting olfactory rehabilitation and interventions targeted to verbal semantic fluency than OT performed four times a day, even more so in subjects with lower baseline scores.

Abstract: The olfactory system can be successfully rehabilitated with regular, intermittent stimulation during multiple daily exposures to selected sets of odors, i.e., olfactory training (OT). OT has been repeatedly shown to be an effective tool of olfactory performance enhancement. Recent advancements in studies on OT suggest that its beneficial effects exceed olfaction and extend to specific cognitive tasks. So far, studies on OT provided compelling evidence for its effectiveness, but there is still a need to search for an optimal OT protocol. The present study examined whether increased frequency of OT leads to better outcomes in both olfactory and cognitive domains. Fifty-five subjects (28 females; M_age = 58.2 ± 11.3 years; 26 patients with impaired olfaction) were randomly assigned to a standard (twice a day) or intense (four times a day) OT. Olfactory and cognitive measurements were taken before and after OT. OT performed twice a day was more effective in supporting olfactory rehabilitation and interventions targeted to verbal semantic fluency than OT performed four times a day, even more so in subjects with lower baseline scores. OT is effective in supporting olfactory rehabilitation and interventions targeted to verbal semantic fluency. However, it may be prone to a ceiling effect, being efficient in subjects presenting with lower baseline olfactory performance and lower verbal semantic fluency.

Journal Article•DOI•

Model Predictive Path-Following Control of Snake Robots Using an Averaged Model

Hiroaki Fukushima¹, Taro Yanagiya², Yusuke Ota³, Masahiro Katsumoto, Fumitoshi Matsuno¹•Institutions (3)

Kyoto University¹, Sony Broadcast & Professional Research Laboratories², Toyota³

01 Nov 2021-IEEE Transactions on Control Systems and Technology

TL;DR: The authors propose a new simplified model for the control design of snake robots and apply it to a path-following control design using model predictive control (MPC). The MPC design imposes constraints on the change rates of the joint angles and the frequency of joint motions, since the averaged model is derived by assuming that these variables change slowly.

Abstract: We propose a new simplified model for the control design of snake robots and apply it to a path-following control design using model predictive control (MPC). While MPC has an advantage in that inequality constraints can be explicitly considered in control design, most of the previous simplified models are still too complex to apply to MPC since the models include joint angles as time-varying parameters. Thus, we exclude joint angles using the averaging method to construct a simpler model. Another feature of the proposed model is that it can be derived from the original complex model without parameter identification using simulation data and without assuming straight-line movements. In addition to inequality constraints on joint angles and the frequency of joint motions, we impose constraints on the change rates of these variables in our MPC design since the averaged model is derived by assuming that these variables slowly change. Furthermore, we introduce a soft constraint to decrease the effects of approximation error of the simplified model on the control performance. The effectiveness of the control system is verified in both simulations and experiments.

Patent•

Compression encoding device, compression encoding method, decoding device, decoding method and program

Fukui Takao¹•Institutions (1)

Sony Broadcast & Professional Research Laboratories¹

19 May 2021

TL;DR: This patent describes a compressive encoding apparatus, a compressive encoding method, a decoding apparatus, and a decoding method. The disclosure is applicable to, for example, a compressive encoding apparatus that compressively encodes an audio signal.

Abstract: The present disclosure relates to a compressive encoding apparatus, a compressive encoding method, a decoding apparatus, a decoding method, and a program which can provide a lossless compression technology having a higher compression rate. An encoding unit of the compressive encoding apparatus converts M bits of a ΔΣ-modulated digital signal into N bits (M > N) with reference to a first conversion table, and when the M bits are not able to be converted into the N bits with the first conversion table, converts the M bits into the N bits with reference to a second conversion table. When the number of bit patterns of the N bits is P, the first conversion table is a table storing (P-1) number of codes having higher generation frequencies for past bit patterns, and the second conversion table is a table storing (P-1) number of codes having higher generation frequencies for past bit patterns, which follow those of the first conversion table. The present disclosure is applicable to a compressive encoding apparatus that compressively encodes an audio signal, and the like, for example.

Proceedings Article•DOI•

Data Augmentation Methods for End-to-End Speech Recognition on Distant-Talk Scenarios

Emiru Tsunoo¹, Kentaro Shibata, Chaitanya Narisetty², Yosuke Kashiwagi¹, Shinji Watanabe³•Institutions (3)

Sony Broadcast & Professional Research Laboratories¹, Carnegie Mellon University², Johns Hopkins University³

30 Aug 2021

TL;DR: In this paper, the authors investigated data augmentation methods for end-to-end automatic speech recognition (E2E ASR) in distant-talk scenarios, which are suitable tasks for studying robustness against noisy and spontaneous speech.

Abstract: Although end-to-end automatic speech recognition (E2E ASR) has achieved great performance in tasks that have numerous paired data, it is still challenging to make E2E ASR robust against noisy and low-resource conditions. In this study, we investigated data augmentation methods for E2E ASR in distant-talk scenarios. E2E ASR models are trained on the series of CHiME challenge datasets, which are suitable tasks for studying robustness against noisy and spontaneous speech. We propose to use three augmentation methods and their combinations: 1) data augmentation using text-to-speech (TTS) data, 2) cycle-consistent generative adversarial network (Cycle-GAN) augmentation trained to map two different audio characteristics, those of clean speech and of noisy recordings, to match the testing condition, and 3) pseudo-label augmentation provided by the pretrained ASR module for smoothing label distributions. Experimental results using the CHiME-6/CHiME-4 datasets show that each augmentation method individually improves the accuracy on top of the conventional SpecAugment; further improvements are obtained by combining these approaches. We achieved a 4.3% word error rate (WER) reduction, which was more significant than that of SpecAugment, when we combined all three augmentations for the CHiME-6 task.

Proceedings Article•DOI•

Gaussian Kernelized Self-Attention for Long Sequence Data and its Application to CTC-Based Speech Recognition

Yosuke Kashiwagi¹, Emiru Tsunoo¹, Shinji Watanabe²•Institutions (2)

Sony Broadcast & Professional Research Laboratories¹, Johns Hopkins University²

06 Jun 2021

TL;DR: In this article, the authors proposed a new architecture, which is a variant of the Gaussian kernel, which itself is a shift-invariant kernel, to mitigate the length mismatch between the inference and training data.

Abstract: Self-attention (SA) based models have recently achieved significant performance improvements in hybrid and end-to-end automatic speech recognition (ASR) systems owing to their flexible context modeling capability. However, it is also known that the accuracy degrades when applying SA to long sequence data. This is mainly due to the length mismatch between the inference and training data because the training data are usually divided into short segments for efficient training. To mitigate this mismatch, we propose a new architecture, which is a variant of the Gaussian kernel, which itself is a shift-invariant kernel. First, we mathematically demonstrate that self-attention with shared weight parameters for queries and keys is equivalent to a normalized kernel function. By replacing this kernel function with the proposed Gaussian kernel, the architecture becomes completely shift-invariant with the relative position information embedded using a frame indexing technique. The proposed Gaussian kernelized SA was applied to connectionist temporal classification (CTC) based ASR. An experimental evaluation with the Corpus of Spontaneous Japanese (CSJ) and TEDLIUM 3 benchmarks shows that the proposed SA achieves a significant improvement in accuracy (e.g., from 24.0% WER to 6.0% in CSJ) in long sequence data without any windowing techniques.
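
The shift-invariance property mentioned in the abstract can be illustrated with a minimal sketch of attention weights computed from a normalized Gaussian kernel. The sketch assumes that queries and keys are the same vectors (standing in for a shared projection) and uses a hypothetical bandwidth `sigma`; it omits the frame indexing technique and is not the authors' architecture.

```python
import math


def gaussian_attention_weights(vectors, sigma=1.0):
    """Row-normalized Gaussian kernel between every pair of vectors.
    Each row plays the role of one query's attention distribution."""
    weights = []
    for q in vectors:
        scores = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(q, k)) / (2 * sigma ** 2))
            for k in vectors
        ]
        total = sum(scores)
        weights.append([s / total for s in scores])
    return weights
```

Because the kernel depends only on differences between vectors, adding the same offset to every vector leaves the weights unchanged. Dot-product attention does not share this property, which is one way to read the abstract's claim about shift invariance.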

Journal Article•DOI•

PT-symmetric Helmholtz resonator dipoles for sound directivity

Tetsu Magariyachi¹, Helena Arias Casals², Ramon Herrero², Muriel Botey², Kestutis Staliunas•Institutions (2)

Sony Broadcast & Professional Research Laboratories¹, Polytechnic University of Catalonia²

02 Mar 2021-Physical Review B

TL;DR: In this article, a composite non-Hermitian system in acoustics consisting of assemblies of PT-symmetric Helmholtz resonator (HR) dipoles is proposed.

Abstract: Parity-time ($\mathcal{PT}$)-symmetric or, more generally, non-Hermitian systems have opened a new area for unconventional management of waves, with significant applications, especially in optics. However, fewer proposals are found in acoustics, possibly due to the lack of a simple mechanism for coherent gain. In this paper, we propose a composite non-Hermitian system in acoustics consisting of assemblies of $\mathcal{PT}$-symmetric Helmholtz resonator (HR) dipoles. Just as meta-atoms are used as building elements in metamaterials, we propose $\mathcal{PT}$-symmetric dipoles to design non-Hermitian systems intended to engineer complicated directivity fields. We theoretically analyze, numerically confirm, and experimentally show the symmetry breaking in a two-dimensional space of non-Hermitian dipoles consisting of a pair of Helmholtz resonators with different levels of gain and loss. In particular, we explore, as an application, a metastructure to concentrate the sound pressure inside the circular array formed by $\mathcal{PT}$-symmetric dipoles. The proposed HR dipoles may be a convenient composite element for smart control of sound.

Journal Article•DOI•

48-5: Eye-sensing Light Field Display for Spatial Reality Reproduction

Koji Aoyama¹, Kazuki Yokoyama¹, Tomoya Yano¹, Yuji Nakahata¹•Institutions (1)

Sony Broadcast & Professional Research Laboratories¹

01 May 2021

Patent•

Liquid crystal display device and electronic apparatus

Okazaki Tsuyoshi¹•Institutions (1)

Sony Broadcast & Professional Research Laboratories¹

04 Mar 2021

TL;DR: A liquid crystal display device includes a plurality of pixels, each of which includes a first electrode, a second electrode opposed to the first electrode, and a liquid crystal layer between the second electrode and the first electrode.

Abstract: A liquid crystal display device includes: a plurality of pixels each of which includes a first electrode, a second electrode opposed to the first electrode, and a liquid crystal layer between the second electrode and the first electrode; a first region that is provided in each of the plurality of pixels, and has a first optical path length between the first electrode and the second electrode; a second region that is provided in each of the plurality of pixels, and has a second optical path length between the first electrode and the second electrode, the second optical path length being shorter than the first optical path length, the second region being provided with the liquid crystal layer equal in thickness to the liquid crystal layer in the first region; and an optical path length adjusting layer that is provided between the liquid crystal layer and the first electrode in the first region, and fills a difference in level between the first electrode in the second region and the first electrode in the first region.

Posted Content•

Adversarial Attacks for Tabular Data: Application to Fraud Detection and Imbalanced Data

Cartella Francesco, Orlando Anunciação¹, Yuki Funabiki, Daisuke Yamaguchi, Toru Akishita, Elshocht Olivier•Institutions (1)

Sony Broadcast & Professional Research Laboratories¹

20 Jan 2021-arXiv: Cryptography and Security

TL;DR: In this article, the authors illustrate a novel approach to modify and adapt state-of-the-art algorithms to imbalanced tabular data, in the context of fraud detection.

Abstract: Guaranteeing the security of transactional systems is a crucial priority of all institutions that process transactions, in order to protect their businesses against cyberattacks and fraudulent attempts. Adversarial attacks are novel techniques that, other than being proven to be effective to fool image classification models, can also be applied to tabular data. Adversarial attacks aim at producing adversarial examples, in other words, slightly modified inputs that induce the Artificial Intelligence (AI) system to return incorrect outputs that are advantageous for the attacker. In this paper we illustrate a novel approach to modify and adapt state-of-the-art algorithms to imbalanced tabular data, in the context of fraud detection. Experimental results show that the proposed modifications lead to a perfect attack success rate, obtaining adversarial examples that are also less perceptible when analyzed by humans. Moreover, when applied to a real-world production system, the proposed techniques show the possibility of posing a serious threat to the robustness of advanced AI-based fraud detection procedures.

Journal Article•DOI•

RecipeBowl: A Cooking Recommender for Ingredients and Recipes Using Set Transformer

Keonwoo Kim¹, Donghyeon Park¹, Michael Spranger², Kana Maruyama², Jaewoo Kang²•Institutions (2)

Korea University¹, Sony Broadcast & Professional Research Laboratories²

01 Jan 2021-IEEE Access

TL;DR: This paper proposes RecipeBowl, a cooking recommendation system that takes a set of ingredients and cooking tags as input and suggests possible ingredient and recipe choices, using a set encoder and a 2-way decoder for prediction.

Abstract: Countless possibilities of recipe combinations challenge us to determine which additional ingredient goes well with others. In this work, we propose RecipeBowl, a cooking recommendation system that takes a set of ingredients and cooking tags as input and suggests possible ingredient and recipe choices. We formulate a recipe completion task to train RecipeBowl on our constructed dataset, where the model predicts a target ingredient previously eliminated from the original recipe. RecipeBowl consists of a set encoder and a 2-way decoder for prediction. For the set encoder, we utilize the Set Transformer, which builds meaningful set representations. Overall, our model builds a set representation of a leave-one-out recipe and maps it to the ingredient and recipe embedding space. Experimental results demonstrate the effectiveness of our approach. Furthermore, analysis of model predictions and interpretations shows interesting insights related to cooking knowledge.
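
The recipe completion task described in the abstract can be pictured with a short sketch. This is only an illustration of leave-one-out example construction under simple assumptions, not the authors' dataset pipeline: the function name `leave_one_out_examples` and the sample recipe are hypothetical, and cooking tags, the Set Transformer encoder, and the 2-way decoder are all omitted.

```python
def leave_one_out_examples(recipe):
    """Build (remaining_ingredients, target) pairs from one recipe.

    Hypothetical helper: each ingredient is removed in turn and becomes the
    prediction target, while the rest form the unordered input set.
    """
    examples = []
    for i, target in enumerate(recipe):
        remaining = recipe[:i] + recipe[i + 1:]
        # frozenset reflects that the input is a set: order carries no meaning.
        examples.append((frozenset(remaining), target))
    return examples


# Made-up recipe used purely for illustration.
SAMPLE_RECIPE = ["flour", "egg", "milk"]
SAMPLE_EXAMPLES = leave_one_out_examples(SAMPLE_RECIPE)
```

A recipe with n ingredients yields n training pairs, one per eliminated ingredient.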

Proceedings Article•DOI•

Adversarial Attacks on Audio Source Separation

Naoya Takahashi¹, Shota Inoue², Yuki Mitsufuji¹•Institutions (2)

Sony Broadcast & Professional Research Laboratories¹, University of Tsukuba²

06 Jun 2021

TL;DR: In this article, the authors reformulated various adversarial attack methods for the audio source separation problem and intensively investigated them under different attack conditions and target models, and proposed a simple yet effective regularization method to obtain imperceptible adversarial noise while maximizing the impact on separation quality.

Abstract: Despite the excellent performance of neural-network-based audio source separation methods and their wide range of applications, their robustness against intentional attacks has been largely neglected. In this work, we reformulate various adversarial attack methods for the audio source separation problem and intensively investigate them under different attack conditions and target models. We further propose a simple yet effective regularization method to obtain imperceptible adversarial noise while maximizing the impact on separation quality with low computational complexity. Experimental results show that it is possible to largely degrade the separation quality by adding imperceptibly small noise when the noise is crafted for the target model. We also show the robustness of source separation models against a black-box attack. This study provides potentially useful insights for developing content protection methods against the abuse of separated signals and improving the separation performance and robustness.

Journal Article•DOI•

Multi-mode dual-polarised cavity backed patch antenna array for 5G mobile devices

Carla Di Paola¹, Kun Zhao¹, Kun Zhao², Gert Frølund Pedersen¹, Shuai Zhang¹•Institutions (2)

Aalborg University¹, Sony Broadcast & Professional Research Laboratories²

29 Jan 2021-IET Microwaves, Antennas & Propagation

Journal Article•DOI•

Impact of oxygen on band structure at the Ni/GaN interface revealed by hard X-ray photoelectron spectroscopy

Hirotaka Mizushima¹, Ryoji Arai¹, Yuta Inaba¹, Shunsuke Yamashita¹, Yudai Yamaguchi¹, Yuya Kanitani¹, Yoshihiro Kudo¹, Tatsushi Hamaguchi¹, Rintaro Koda¹, Katsunori Yanashima¹, Tadakatsu Ohkubo², Kazuhiro Hono², Shigetaka Tomiya¹•Institutions (2)

Sony Broadcast & Professional Research Laboratories¹, National Institute for Materials Science²

22 Mar 2021-Applied Physics Letters

TL;DR: In this article, the impact of oxygen on the band structure at the Ni/p-type GaN interface was investigated using transmission electron microscopy and three-dimensional atom probe (3DAP) analysis.

Abstract: To investigate the impact of oxygen on the band structure at the Ni/p-type GaN interface, the crystal structure and nanoscale impurity distributions were evaluated using transmission electron microscopy and three-dimensional atom probe (3DAP) analysis, respectively. These measurements revealed that the oxygen region existed approximately 5 nm from the GaN surface and that the oxygen concentration was equal to or higher than the Mg acceptor concentration. The band bending and photoelectron spectrum were then simulated using the Mg and oxygen concentration profiles obtained by 3DAP to consider the impact of the interfacial oxygen donors on the photoelectron spectrum measured using hard X-ray photoelectron spectroscopy (HAXPES). The precise band bending was then determined by fitting the simulated spectrum onto the experimental measurements. This showed that the oxygen donors at the interface modulated the band structure and decreased the energy barrier by at least 0.1 eV, which demonstrates the importance of considering the existence of oxygen at the interface. It is, therefore, essential to use techniques like 3DAP and HAXPES to evaluate both the nanoscale impurity distributions and the resulting band structure to fabricate higher-performance devices.

Patent•

Information processing device and method

Eshima Masashi¹•Institutions (1)

Sony Broadcast & Professional Research Laboratories¹

25 May 2021

TL;DR: An information processing device and method that suppress a reduction in coding efficiency while also suppressing a reduction in subjective quality, by encoding information about a three-dimensional region on the basis of how the visual fields of a plurality of imaging units overlap.

Abstract: The present disclosure relates to an information processing device and an information processing method that enable suppression of a reduction in coding efficiency while suppressing a reduction in subjective quality. Information regarding a three-dimensional region is encoded on the basis of a distribution related to overlapping of visual fields of the three-dimensional region to be imaged by a plurality of imaging units, the distribution being specified by using a parameter relating to overlapping of visual fields that are imaging ranges of the plurality of imaging units. The present disclosure can be applied to, for example, an information processing device, an encoding device, and the like.

Patent•

Transmitting node, receiving node, methods and mobile communications system

Martin Brian Alexander¹, Sharma Vivek¹•Institutions (1)

Sony Broadcast & Professional Research Laboratories¹

16 Feb 2021

TL;DR: A transmitting node that sends protocol data units, formed from one or more service data units, to a receiving node according to an automatic repeat request process, buffering the data for transmission and giving each protocol data unit a sequence number that defines its position in a predetermined order.

Abstract: A transmitting node operating with a mobile communications system comprises transmitter circuitry configured to transmit signals representing protocol data units formed from one or more service data units via a wireless access interface of the mobile communications system to a receiving node of the mobile communications system according to an automatic repeat request process, receiver circuitry configured to receive signals from the receiving node via the wireless access interface, controller circuitry configured to control the transmitter circuitry to transmit the signals and to control the receiver circuitry to receive the signals, and a buffer configured to store data conveyed by or representing the protocol data units for transmission to the receiving node according to the automatic repeat request process, wherein each of the protocol data units has a sequence number defining its position in a predetermined order.

Journal Article•DOI•

A New Primary Protection Method With Received Power-Based 3D Antenna Rotation Range Prediction for Dynamic Spectrum Access

Hiroto Kuriki¹, Keita Onose¹, Ryota Kimura¹, Ryo Sawai¹•Institutions (1)

Sony Broadcast & Professional Research Laboratories¹

17 May 2021-IEEE Access

TL;DR: In this paper, the authors propose a new primary protection method for dynamic spectrum access (DSA), in which a secondary system uses a frequency band assigned to a primary system. The method predicts the range of angle variation of the primary reception station's (PRS) antenna boresight from received signal powers as the primary transmission station (PTS) moves.

Abstract: We propose a new primary protection method for Dynamic Spectrum Access (DSA), in which a secondary system uses a frequency band assigned to a primary system. Following a DSA implementation plan in Japan, we consider a practical scenario where a primary system’s transmission station (PTS) moves along a predefined course or area, and a corresponding primary reception station (PRS) keeps its antenna boresight facing towards the moving PTS. For accurate calculation of interference from the secondary to the primary system in this scenario, the proposed method takes the movement of the PTS into account and predicts the range of angle variation of the PRS’s antenna boresight based on received signal powers from the PTS to the PRS. We conducted computer simulations in three different practical scenarios to demonstrate the effectiveness of the proposed method. The simulation results show that the proposed method can increase the number of available secondary base stations (SBSs) by up to 1.93 times compared to a conventional method.

Proceedings Article•DOI•

Efficient Real-Time Inference in Temporal Convolution Networks

Piyush Khandelwal¹, James Macglashan¹, Peter Wurman¹, Peter Stone²•Institutions (2)

Sony Broadcast & Professional Research Laboratories¹, University of Texas at Austin²

30 May 2021

TL;DR: RT-TCN reuses the output of prior convolution operations to minimize the computational requirements and persistent memory footprint of a TCN during real-time inference, which helps on devices with limited compute and memory, especially when the receptive field is large.

Abstract: It has recently been demonstrated that Temporal Convolution Networks (TCNs) provide state-of-the-art results in many problem domains where the input data is a time series. TCNs typically incorporate information from a long history of inputs (the receptive field) into a single output using many convolution layers. Real-time inference using a trained TCN can be challenging on devices with limited compute and memory, especially if the receptive field is large. This paper introduces the RT-TCN algorithm, which reuses the output of prior convolution operations to minimize the computational requirements and persistent memory footprint of a TCN during real-time inference. We also show that when a TCN is trained using time slices of the input time series, it can be executed continually in real time using RT-TCN. In addition, we provide TCN architecture guidelines that ensure that real-time inference can be performed within memory and computational constraints.
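
The general idea of reusing prior convolution results can be illustrated with a small sketch. This is not the RT-TCN algorithm itself, whose details the abstract does not give; it is a generic streaming evaluation of stacked dilated causal convolutions, assuming single-channel layers with no bias or nonlinearity. The names `StreamingCausalConv`, `batch_causal_conv`, `run_streaming`, `run_batch`, `LAYER_SPECS`, and `SIGNAL` are all hypothetical. Each layer caches only the past inputs it still needs, so a new sample costs one multiply-accumulate pass per layer instead of a recomputation over the whole receptive field.

```python
from collections import deque


class StreamingCausalConv:
    """Single-channel dilated causal convolution, evaluated one sample at a time."""

    def __init__(self, weights, dilation):
        self.weights = list(weights)
        self.dilation = dilation
        span = (len(self.weights) - 1) * dilation
        # Cached past inputs; zeros stand in for left padding before the stream starts.
        self.history = deque([0.0] * span, maxlen=span + 1)

    def step(self, x):
        # Appending evicts the oldest cached sample once the buffer is full.
        self.history.append(x)
        total = 0.0
        for i in range(len(self.weights)):
            # weights[-1] multiplies the newest sample; earlier weights reach back i * dilation steps.
            total += self.weights[-1 - i] * self.history[-1 - i * self.dilation]
        return total


def batch_causal_conv(xs, weights, dilation):
    """Reference implementation: recompute the same layer over the whole sequence."""
    out = []
    for t in range(len(xs)):
        total = 0.0
        for i in range(len(weights)):
            idx = t - i * dilation
            if idx >= 0:  # samples before the start are treated as zero
                total += weights[-1 - i] * xs[idx]
        out.append(total)
    return out


def run_streaming(xs, layer_specs):
    """Feed samples one by one through a stack of streaming layers."""
    layers = [StreamingCausalConv(w, d) for w, d in layer_specs]
    outputs = []
    for x in xs:
        h = x
        for layer in layers:
            h = layer.step(h)
        outputs.append(h)
    return outputs


def run_batch(xs, layer_specs):
    """Apply the same stack layer by layer over the full sequence."""
    hs = list(xs)
    for w, d in layer_specs:
        hs = batch_causal_conv(hs, w, d)
    return hs


# Illustrative stack: kernel size 2 with dilations 1, 2, 4 (receptive field of 8 samples).
LAYER_SPECS = [([0.5, 1.0], 1), ([-1.0, 2.0], 2), ([0.25, 0.75], 4)]
SIGNAL = [float(v) for v in range(1, 13)]
```

Calling `run_streaming(SIGNAL, LAYER_SPECS)` and `run_batch(SIGNAL, LAYER_SPECS)` should give matching outputs, while the streaming version stores only `(kernel_size - 1) * dilation + 1` values per layer.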

Patent•

Display device and electronic apparatus

Suzuki Masaki¹•Institutions (1)

Sony Broadcast & Professional Research Laboratories¹

05 Jan 2021

TL;DR: A display device including a plurality of light emitting sections formed on a substrate and reflectors provided above the light emitting sections in at least a partial region of the display surface, with the lower surfaces of the reflectors reflecting part of the light emitted by the light emitting sections.

Abstract: [Solution] Provided is a display device, including: a plurality of light emitting sections formed on a substrate; and reflectors provided above the light emitting sections with respect to the plurality of light emitting sections positioned in at least a partial region of a display surface, lower surfaces of the reflectors reflecting part of emission light from the light emitting sections. The light emitting sections and the reflectors are arranged in a state in which centers of the reflectors are shifted from centers of luminescence surfaces of the light emitting sections in a plane perpendicular to a stacking direction so that light emitted in a direction other than a desired direction among the emission light from the light emitting sections is reflected.

Patent•

Information processing device, method, program, and multi-camera system

Oryoji Hiroshi¹•Institutions (1)

Sony Broadcast & Professional Research Laboratories¹

05 May 2021

TL;DR: An information processing device that acquires the locations of a plurality of imaging cameras in an imaging space and evaluates the calibration accuracy obtained when a calibration camera is placed in that space, so that a camera location optimizing calibration of a multi-camera system can be found efficiently without trial and error.

Abstract: [Object] To find a camera location for optimizing calibration of a multi-camera system efficiently without trial and error. [Solution] There is provided an information processing device including: an information acquisition unit configured to acquire camera location information indicating locations of a plurality of imaging cameras located in an imaging space; and an evaluation unit configured to evaluate calibration accuracy obtained in a case of locating a calibration camera in the imaging space on a basis of the location of each of the plurality of imaging cameras indicated by the camera location information and a location of the calibration camera.

Patent•

Information processing apparatus and information processing method

Ishihara Atsushi¹, Ishikawa Tsuyoshi, Aga Hiroyuki, Kawasaki Koichi, Nishibe Mitsuru¹, Kusano Yuji•Institutions (1)

Sony Broadcast & Professional Research Laboratories¹

16 Mar 2021

TL;DR: An information processing apparatus whose circuitry acquires information indicating the spatial relationship between a real object and a virtual object and initiates generation of user feedback based on that information; the feedback is displayed as an augmentation of an image captured by an imaging device or of the perceived view of the real world.

Abstract: An information processing apparatus including circuitry configured to acquire information indicating a spatial relationship between a real object and a virtual object, and initiate generation of a user feedback based on the acquired information, the user feedback being displayed to be augmented to a generated image obtained based on capturing by an imaging device, or augmented to a perceived view of the real world, and wherein a characteristic of the user feedback is changed when the spatial relationship between the real object and the virtual object changes.
