Kazuhito Koishida
Researcher at Microsoft
Publications - 54
Citations - 1439
Kazuhito Koishida is an academic researcher at Microsoft. He has contributed to research on topics including encoders and speech coding. The author has an h-index of 18 and has co-authored 51 publications receiving 1,190 citations.
Papers
Journal Article
Text-Independent Speaker Verification Based on Triplet Convolutional Neural Network Embeddings
TL;DR: This study investigates a novel text-independent speaker verification framework based on the triplet loss and a very deep convolutional neural network architecture, in which a fixed-length speaker-discriminative embedding is learned from sparse speech features and used as the feature representation for speaker verification (SV) tasks.
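The triplet objective this summary refers to can be sketched minimally. The following is an illustrative NumPy implementation of a standard triplet hinge loss over embeddings, assuming squared-Euclidean distance and a hypothetical margin value; the paper's exact distance metric and margin are not reproduced here.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on fixed-length embeddings.

    Pulls the anchor toward the positive (same speaker) and pushes it
    away from the negative (different speaker) by at least `margin`.
    The function name and margin are illustrative, not from the paper.
    """
    d_ap = np.sum((anchor - positive) ** 2)  # anchor-positive distance
    d_an = np.sum((anchor - negative) ** 2)  # anchor-negative distance
    return max(d_ap - d_an + margin, 0.0)
```

The loss is zero once the negative is sufficiently farther from the anchor than the positive, so training only focuses on triplets that still violate the margin.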
Patent
Bitstream syntax for multi-process audio decoding
TL;DR: In this article, an audio decoder provides a combination of decoding components, including components implementing baseband decoding, spectral peak decoding, frequency extension decoding, and channel extension decoding techniques, together with a bitstream syntax scheme that permits the various decoding components to extract the appropriate parameters for their respective decoding techniques.
Patent
Sub-band voice codec with multi-stage codebooks and redundant coding
TL;DR: In this article, techniques and tools for coding and decoding audio information are described; for example, redundant coded information for decoding a current frame includes signal history information associated with only a portion of a previous frame.
Posted Content
MMTM: Multimodal Transfer Module for CNN Fusion
TL;DR: A simple neural network module, named the Multimodal Transfer Module (MMTM), leverages knowledge from multiple modalities in convolutional neural networks and improves the recognition accuracy of well-known multimodal networks.
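The fusion idea behind MMTM can be sketched in a squeeze-and-excitation style: pool each modality's feature map to a channel descriptor, mix the descriptors through a shared projection, and re-weight each modality's channels. The NumPy sketch below is a simplified single-example illustration under these assumptions; the weight matrices and function name are hypothetical, not the paper's released code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mmtm_fuse(feat_a, feat_b, W_joint, W_a, W_b):
    """Squeeze-and-excitation style cross-modal fusion (illustrative).

    feat_a: (C_a, H, W) and feat_b: (C_b, H, W) feature maps for one example.
    """
    # Squeeze: global average pool each modality to a channel descriptor.
    z_a = feat_a.mean(axis=(1, 2))                 # shape (C_a,)
    z_b = feat_b.mean(axis=(1, 2))                 # shape (C_b,)
    # Shared bottleneck over the concatenated descriptors.
    joint = np.concatenate([z_a, z_b]) @ W_joint
    # Excitation: per-modality channel gates (2*sigmoid keeps the
    # identity mapping reachable when the gate output is at its midpoint).
    g_a = 2.0 * sigmoid(joint @ W_a)               # shape (C_a,)
    g_b = 2.0 * sigmoid(joint @ W_b)               # shape (C_b,)
    # Recalibrate each modality's channels with its gate.
    return feat_a * g_a[:, None, None], feat_b * g_b[:, None, None]
```

Because the module only rescales channels, it can be dropped between the intermediate layers of two existing unimodal CNN streams without changing their feature-map shapes.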
Patent
LPC-harmonic vocoder with superframe structure
TL;DR: In this paper, an enhanced low-bit-rate parametric voice coder is presented that groups a number of frames from an underlying frame-based vocoder, such as MELP, into a superframe structure.