Kazuhito Koishida

Researcher at Microsoft

Publications - 54
Citations - 1439

Kazuhito Koishida is an academic researcher from Microsoft. The author has contributed to research in topics: Encoder & Speech coding. The author has an h-index of 18, co-authored 51 publications receiving 1190 citations.

Papers
Journal Article

Text-Independent Speaker Verification Based on Triplet Convolutional Neural Network Embeddings

TL;DR: This study investigates a novel text-independent speaker verification (SV) framework based on the triplet loss and a very deep convolutional neural network architecture, in which a fixed-length speaker-discriminative embedding is learned from sparse speech features and used as the feature representation for SV tasks.
Patent

Bitstream syntax for multi-process audio decoding

TL;DR: An audio decoder combines decoding components that implement baseband decoding, spectral peak decoding, frequency extension decoding, and channel extension decoding, together with a bitstream syntax scheme that lets each component extract the parameters appropriate to its decoding technique.
Patent

Sub-band voice codec with multi-stage codebooks and redundant coding

TL;DR: Techniques and tools related to coding and decoding of audio information are described. For example, redundant coded information for decoding a current frame includes signal history information associated with only a portion of a previous frame.
Posted Content

MMTM: Multimodal Transfer Module for CNN Fusion

TL;DR: A simple neural network module, named Multimodal Transfer Module (MMTM), leverages knowledge from multiple modalities in convolutional neural networks and improves the recognition accuracy of well-known multimodal networks.
Patent

LPC-harmonic vocoder with superframe structure

TL;DR: An enhanced low-bit-rate parametric voice coder is presented that groups a number of frames from an underlying frame-based vocoder, such as MELP, into a superframe structure.