Author

Janne Vainio

Bio: Janne Vainio is an academic researcher from Nokia. The author has contributed to research on speech coding and codecs. The author has an h-index of 17 and has co-authored 40 publications receiving 1,269 citations.

Papers
Journal Article
TL;DR: This paper describes the adaptive multirate wideband (AMR-WB) speech codec, selected by the Third Generation Partnership Project (3GPP) for GSM and the third-generation WCDMA mobile communication system to provide wideband speech services.
Abstract: This paper describes the adaptive multirate wideband (AMR-WB) speech codec selected by the Third Generation Partnership Project (3GPP) for GSM and the third-generation WCDMA mobile communication system for providing wideband speech services. The AMR-WB speech codec algorithm was selected in December 2000 and the corresponding specifications were approved in March 2001. The AMR-WB codec was also selected by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) in July 2001 in the standardization activity for wideband speech coding around 16 kb/s, and was approved in January 2002 as Recommendation G.722.2. The adoption of AMR-WB by ITU-T is significant because, for the first time, the same codec has been adopted for both wireless and wireline services. AMR-WB uses an extended audio bandwidth from 50 Hz to 7 kHz and gives superior speech quality and voice naturalness compared to existing second- and third-generation mobile communication systems. The wideband speech service provided by the AMR-WB codec will give mobile communications a speech quality that substantially exceeds even (narrowband) wireline quality. The paper details the AMR-WB standardization history, an algorithmic description including novel techniques for efficient ACELP wideband speech coding, and the subjective quality performance of the codec.

312 citations

Patent
Janne Bergman, Janne Vainio
24 Nov 2010
TL;DR: This patent describes a method for displaying selectable objects on a graphical user interface (GUI), where each selectable object corresponds to data or an application accessible via the GUI.
Abstract: A method including causing, at least in part, display of selectable objects on a graphical user interface, where each of the selectable objects corresponds to data or an application accessible via the graphical user interface. The method further includes causing, at least in part, display of the selectable objects in motion travelling across the graphical user interface based on a category of the selectable object or context-dependent data, and allowing user selection and manipulation of the selectable objects displayed on the graphical user interface.

177 citations
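The abstract above describes selectable objects that travel across the GUI with motion tied to their category while remaining selectable. The following is a minimal Python sketch of that idea, not the patented implementation; it assumes horizontal motion, hypothetical per-category speeds (`SPEED_BY_CATEGORY`), wrap-around at the screen edge, and a rectangular hit test, none of which the abstract specifies.

```python
from dataclasses import dataclass

# Hypothetical per-category speeds in pixels per second. The abstract only says
# motion depends on the object's category or on context-dependent data.
SPEED_BY_CATEGORY = {"contacts": 40.0, "music": 80.0, "apps": 120.0}
DEFAULT_SPEED = 60.0


@dataclass
class SelectableObject:
    label: str        # the data item or application this object stands for
    category: str
    x: float          # horizontal position in pixels
    y: float          # top of the object's lane in pixels
    width: float = 50.0
    height: float = 30.0


def step(objects, dt, screen_width):
    """Move every object right by speed * dt, wrapping at the screen width."""
    for obj in objects:
        speed = SPEED_BY_CATEGORY.get(obj.category, DEFAULT_SPEED)
        obj.x = (obj.x + speed * dt) % screen_width


def select_at(objects, px, py):
    """Return the first moving object under the touch point (px, py), or None."""
    for obj in objects:
        if obj.x <= px <= obj.x + obj.width and obj.y <= py <= obj.y + obj.height:
            return obj
    return None
```

Keeping selection as a plain hit test against current positions means the user can pick an object at any moment of its travel, which is the behaviour the abstract emphasises.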

Patent
28 May 2008
TL;DR: This patent describes a method for an automatic speech recognition system in which the device detects a selection of several erroneous words in displayed text, receives sequentially dictated corrections for them in a single continuous operation (each correction corresponding to at least one selected word), and replaces the erroneous words with the corresponding dictated words, matching corrections to erroneous words in the order those words appear in the text's reading direction.
Abstract: A method including detecting a selection of a plurality of erroneous words in text presented on a display of a device, in an automatic speech recognition system, receiving sequentially dictated corrections for the selected erroneous words in a single, continuous operation where each dictated correction corresponds to at least one of the selected erroneous words, and replacing the plurality of erroneous words with one or more corresponding words of the dictated corrections where each erroneous word is matched with the one or more corresponding words of the dictated corrections in an order the erroneous words appear according to a reading direction of the text.

173 citations
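The key step in the abstract above is that dictated corrections are matched to the selected erroneous words by their position in reading order, not by the order in which the user selected them. A minimal Python sketch, assuming the text is already tokenized into a list in reading order and that each dictated correction maps to exactly one selected word (the abstract also allows one correction to cover several words); the function name is hypothetical.

```python
def apply_dictated_corrections(words, selected, corrections):
    """Replace selected erroneous words with dictated corrections in reading order.

    words: tokens of the displayed text, already in reading order.
    selected: indices of the erroneous words, in whatever order they were tapped.
    corrections: dictated segments in the order spoken, one per selected word;
        a segment may contain several words.
    """
    if len(selected) != len(corrections):
        raise ValueError("expected one dictated correction per selected word")
    # Pair corrections with errors by reading position, not by selection order.
    mapping = dict(zip(sorted(selected), corrections))
    result = []
    for i, word in enumerate(words):
        if i in mapping:
            result.extend(mapping[i].split())  # a correction may be several words
        else:
            result.append(word)
    return result


text = "the quick brown fax jumped over the lazy dug".split()
fixed = apply_dictated_corrections(text, [8, 3], ["fox", "dog"])
```

Here the user taps "dug" before "fax", but because matching follows reading order, "fox" replaces "fax" and "dog" replaces "dug".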

Proceedings Article
21 Apr 1997
TL;DR: The GSM enhanced full rate (EFR) speech codec that has been standardised for the GSM mobile communication system provides wireline quality not only for error-free conditions but also for the most typical error conditions.
Abstract: This paper describes the GSM enhanced full rate (EFR) speech codec that has been standardised for the GSM mobile communication system. The GSM EFR codec was jointly developed by Nokia and the University of Sherbrooke. It provides speech quality at least equivalent to that of a wireline telephony reference (32 kbit/s ADPCM). The EFR codec uses 12.2 kbit/s for speech coding and 10.6 kbit/s for error protection. Speech coding is based on the ACELP (algebraic code-excited linear prediction) algorithm. The codec provides a substantial quality improvement over the existing GSM full rate and half rate codecs. The older GSM codecs lack wireline quality even in error-free channel conditions, while the EFR codec provides wireline quality not only in error-free conditions but also in the most typical error conditions. With the EFR codec, wireline quality is also sustained in the presence of background noise and in tandem connections (mobile-to-mobile calls).

84 citations
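The two rates quoted in the EFR abstract add up to 22.8 kbit/s, the gross bit rate of the GSM full-rate traffic channel. A short check of the per-frame bit budget, assuming GSM's 20 ms speech frame (the frame length is not stated in the abstract):

```python
FRAME_MS = 20  # GSM speech frame duration in milliseconds


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """Bits carried in one frame: kbit/s multiplied by ms gives bits."""
    return round(rate_kbps * frame_ms)


speech_bits = bits_per_frame(12.2)      # ACELP speech parameters
protection_bits = bits_per_frame(10.6)  # channel-coding redundancy
gross_bits = speech_bits + protection_bits
gross_rate_kbps = gross_bits / FRAME_MS
```

This gives 244 speech bits and 212 protection bits, i.e. 456 channel bits per 20 ms frame, so a little over half of the channel capacity carries speech and the rest protects it.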

Patent
16 Feb 2005
TL;DR: This patent describes an encoder (200) comprising an input (201) for inputting frames of an audio signal in a frequency band, at least a first excitation block (206) for performing a first excitation for a speech-like audio signal, and a second excitation block (207) for performing a second excitation for a non-speech-like audio signal.
Abstract: The invention relates to an encoder (200) comprising an input (201) for inputting frames of an audio signal in a frequency band, at least a first excitation block (206) for performing a first excitation for a speech-like audio signal, and a second excitation block (207) for performing a second excitation for a non-speech-like audio signal. The encoder (200) further comprises a filter (300) for dividing the frequency band into a plurality of sub-bands, each having a narrower bandwidth than said frequency band. The encoder (200) also comprises an excitation selection block (203) for selecting one excitation block among said at least first excitation block (206) and said second excitation block (207) for performing the excitation for a frame of the audio signal on the basis of the properties of the audio signal in at least one of said sub-bands. The invention also relates to a device, a system, a method and a storage medium for a computer program.

45 citations
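The encoder above chooses between a speech-like and a non-speech-like excitation block from the signal's properties in one or more sub-bands, but the abstract does not say which properties. The sketch below only illustrates the structure (split into sub-bands, measure, select); it assumes a naive DFT in place of the filter (300), two sub-bands split at a hypothetical 2 kHz, and a hypothetical low-band energy-ratio rule. It is a placeholder criterion, not the one claimed in the patent.

```python
import cmath
import math


def band_energies(frame, sample_rate, edges_hz):
    """One-sided DFT energy of a frame in each sub-band [edges[b], edges[b+1]).

    Uses a naive O(n^2) DFT, which is adequate for short frames.
    """
    n = len(frame)
    energies = [0.0] * (len(edges_hz) - 1)
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for b in range(len(energies)):
            if edges_hz[b] <= freq < edges_hz[b + 1]:
                energies[b] += abs(coeff) ** 2
                break
    return energies


def select_excitation(frame, sample_rate=16000, split_hz=2000.0, threshold=0.8):
    """Return "first" (speech-like block) or "second" (non-speech-like block).

    Hypothetical rule: a frame whose energy is concentrated below split_hz is
    treated as speech-like. Silent frames default to "first".
    """
    edges = [0.0, split_hz, sample_rate / 2 + 1]
    low, high = band_energies(frame, sample_rate, edges)
    total = low + high
    if total == 0.0:
        return "first"
    return "first" if low / total >= threshold else "second"
```

In a real codec the selection would rely on richer sub-band features; the point here is only that the decision is made per frame from sub-band measurements.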


Cited by
Journal Article
TL;DR: This work develops and analyzes low-energy adaptive clustering hierarchy (LEACH), a protocol architecture for microsensor networks that combines the ideas of energy-efficient cluster-based routing and media access together with application-specific data aggregation to achieve good performance in terms of system lifetime, latency, and application-perceived quality.
Abstract: Networking together hundreds or thousands of cheap microsensor nodes allows users to accurately monitor a remote environment by intelligently combining the data from the individual nodes. These networks require robust wireless communication protocols that are energy efficient and provide low latency. We develop and analyze low-energy adaptive clustering hierarchy (LEACH), a protocol architecture for microsensor networks that combines the ideas of energy-efficient cluster-based routing and media access together with application-specific data aggregation to achieve good performance in terms of system lifetime, latency, and application-perceived quality. LEACH includes a new, distributed cluster formation technique that enables self-organization of large numbers of nodes, algorithms for adapting clusters and rotating cluster head positions to evenly distribute the energy load among all the nodes, and techniques to enable distributed signal processing to save communication resources. Our results show that LEACH can improve system lifetime by an order of magnitude compared with general-purpose multihop approaches.

10,296 citations
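The LEACH abstract above mentions rotating cluster-head positions to spread the energy load. A minimal sketch of the randomized election threshold commonly associated with LEACH, T(n) = P / (1 − P · (r mod 1/P)) for nodes that have not yet served as cluster head in the current epoch of 1/P rounds (and 0 otherwise). It assumes 1/P is an integer; the function names and the bookkeeping dictionary are illustrative.

```python
import random


def leach_threshold(p, r, eligible):
    """Cluster-head threshold T(n) for a node in round r.

    p: desired fraction of nodes acting as cluster heads per round.
    eligible: True if the node has not been a cluster head in the current
        epoch of 1/p rounds.
    """
    if not eligible:
        return 0.0
    period = round(1 / p)
    return p / (1 - p * (r % period))


def elect_cluster_heads(node_ids, p, r, last_head_round, rng):
    """One round of election in which each node decides independently.

    last_head_round maps node id -> last round it was a cluster head; it is
    updated in place for nodes elected in this round.
    """
    period = round(1 / p)
    epoch_start = r - (r % period)
    heads = []
    for n in node_ids:
        last = last_head_round.get(n)
        eligible = last is None or last < epoch_start
        if rng.random() < leach_threshold(p, r, eligible):
            heads.append(n)
            last_head_round[n] = r
    return heads
```

Because the threshold rises to 1 in the last round of each epoch, every node serves as cluster head exactly once per epoch, which is what evens out the energy drain.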

Patent
14 Jun 2016
TL;DR: Newness and distinctiveness are claimed in the features of ornamentation shown inside the broken-line circle in the accompanying representation.
Abstract: Newness and distinctiveness is claimed in the features of ornamentation as shown inside the broken line circle in the accompanying representation.

1,500 citations

Patent
11 Jan 2011
TL;DR: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions.
Abstract: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionality powered by external services with which the system can interact.

1,462 citations

Dissertation
01 Jan 2000
TL;DR: This dissertation supports the claim that application-specific protocol architectures achieve the energy and latency efficiency and error robustness needed for wireless networks by developing two systems.
Abstract: In recent years, advances in energy-efficient design and wireless technologies have enabled exciting new applications for wireless devices. These applications span a wide range, including real-time and streaming video and audio delivery, remote monitoring using networked microsensors, personal medical monitoring, and home networking of everyday appliances. While these applications require high performance from the network, they suffer from resource constraints that do not appear in more traditional wired computing environments. In particular, wireless spectrum is scarce, often limiting the bandwidth available to applications and making the channel error-prone, and the nodes are battery-operated, often limiting available energy. My thesis is that this harsh environment with severe resource constraints requires an application-specific protocol architecture, rather than the traditional layered approach, to obtain the best possible performance. This dissertation supports this claim using detailed case studies on microsensor networks and wireless video delivery. The first study develops LEACH (Low-Energy Adaptive Clustering Hierarchy), an architecture for remote microsensor networks that combines the ideas of energy-efficient cluster-based routing and media access together with application-specific data aggregation to achieve good performance in terms of system lifetime, latency, and application-perceived quality. This approach improves system lifetime by an order of magnitude compared to general-purpose approaches when the node energy is limited. The second study develops an unequal error protection scheme for MPEG-4 compressed video delivery that adapts the level of protection applied to portions of a packet to the degree of importance of the corresponding bits. This approach obtains better application-perceived performance than current approaches for the same amount of transmission bandwidth. 
These two systems show that application-specific protocol architectures achieve the energy and latency efficiency and error robustness needed for wireless networks. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

1,253 citations
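The dissertation's second study adapts the level of protection applied to portions of a packet to the importance of the corresponding bits. The abstract gives no allocation rule, so the sketch below is a generic illustration of unequal error protection, not the dissertation's scheme: it assumes hypothetical portions (header, motion, texture) with hypothetical importance weights, and splits a fixed parity budget in proportion to size × importance.

```python
def allocate_protection(portions, parity_budget):
    """Split a fixed parity budget across packet portions by importance.

    portions: list of (name, size_bytes, importance) tuples, importance > 0.
    Returns {name: parity_bytes} proportional to size * importance, using
    largest-remainder rounding so the total equals parity_budget exactly.
    """
    weights = [size * imp for _, size, imp in portions]
    total = sum(weights)
    shares = [parity_budget * w / total for w in weights]
    alloc = [int(s) for s in shares]
    leftover = parity_budget - sum(alloc)
    # Give the remaining bytes to the portions with the largest fractional parts.
    order = sorted(range(len(portions)), key=lambda i: shares[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return {name: a for (name, _, _), a in zip(portions, alloc)}


# Hypothetical packet layout and importance weights.
packet = [("header", 20, 4), ("motion", 100, 2), ("texture", 380, 1)]
parity = allocate_protection(packet, 100)
```

With these weights the header gets 0.6 parity bytes per data byte, motion data 0.3 and texture data about 0.15, so the total redundancy is unchanged but concentrated where errors hurt most.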

Patent
28 Sep 2012
TL;DR: A virtual assistant uses context information to supplement natural language or gestural input from a user; context helps clarify the user's intent, reduces the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input.
Abstract: A virtual assistant uses context information to supplement natural language or gestural input from a user. Context helps to clarify the user's intent and to reduce the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input. Context can include any available information that is usable by the assistant to supplement explicit user input to constrain an information-processing problem and/or to personalize results. Context can be used to constrain solutions during various phases of processing, including, for example, speech recognition, natural language processing, task flow processing, and dialog generation.

593 citations
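The abstract above says context constrains the interpretation problem and reduces the number of candidate interpretations. A minimal Python sketch of that narrowing step, assuming context is a flat key/value dictionary and each candidate interpretation lists the context facts it presupposes; `Interpretation` and `narrow_by_context` are hypothetical names, not the patent's API.

```python
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    intent: str
    # Context facts this reading presupposes, e.g. {"last_person": "Bob"}.
    requires: dict = field(default_factory=dict)


def narrow_by_context(candidates, context):
    """Drop interpretations that contradict the context, then rank the rest.

    A candidate is dropped if any required key is present in the context with
    a different value. Survivors are sorted by how many of their requirements
    the context confirms; ties keep their original order.
    """
    def contradicts(c):
        return any(k in context and context[k] != v for k, v in c.requires.items())

    def support(c):
        return sum(1 for k, v in c.requires.items() if context.get(k) == v)

    survivors = [c for c in candidates if not contradicts(c)]
    return sorted(survivors, key=support, reverse=True)


# "Call him": the context records who was mentioned last.
candidates = [
    Interpretation("call:Bob", {"last_person": "Bob"}),
    Interpretation("call:Carl", {"last_person": "Carl"}),
    Interpretation("web_search:call him"),
]
ranked = narrow_by_context(candidates, {"last_person": "Bob"})
```

Filtering before ranking keeps contradicted readings out entirely, so the assistant only needs to ask for clarification when several consistent candidates remain.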