Author

Urs-Viktor Marti

Bio: Urs-Viktor Marti is an academic researcher from Swisscom. The author has contributed to research in topics: Terrestrial television & Pixel. The author has an h-index of 6 and has co-authored 11 publications receiving 95 citations.

Papers
Proceedings ArticleDOI
01 Oct 2019
TL;DR: In this paper, the authors optimize a deep network-based decoder with a targeted objective function that penalizes images at different semantic levels using the corresponding terms, which results in more realistic textures and sharper edges.
Abstract: By benefiting from perceptual losses, recent studies have significantly improved the performance of the super-resolution task, where a high-resolution image is resolved from its low-resolution counterpart. Although such objective functions generate near-photorealistic results, their capability is limited, since they estimate the reconstruction error for an entire image in the same way, without considering any semantic information. In this paper, we propose a novel method to benefit from perceptual loss in a more objective way. We optimize a deep network-based decoder with a targeted objective function that penalizes images at different semantic levels using the corresponding terms. In particular, the method leverages our OBB (Object, Background and Boundary) labels, generated from segmentation labels, to estimate a suitable perceptual loss for boundaries, while considering texture similarity for backgrounds. We show that our approach results in more realistic textures and sharper edges, and outperforms other state-of-the-art algorithms in terms of both qualitative results on standard benchmarks and results of extensive user studies.

113 citations

Journal ArticleDOI
TL;DR: The authors propose an encoder architecture that extracts and uses semantic information to super-resolve a given image, using multitask learning to train simultaneously for image super-resolution and semantic segmentation.

13 citations

Patent
11 Jun 2003
TL;DR: A method used in a speech-enabled automatic directory system for determining a fallback threshold is described. The method collects speech data corresponding to names uttered by users, runs a speech recognition system over this data, and determines the false acceptance rate for various thresholds in order to choose an adequate fallback threshold.
Abstract: A method used in a speech-enabled automatic directory system for determining a fallback threshold, wherein a fallback decision is taken by said directory system when a metric delivered by a speech recognition system is lower than said threshold, said method comprising: collecting speech data corresponding to names uttered by users, running a speech recognition system over this speech data, determining the false acceptance rate for various thresholds of a metric delivered by said speech recognition system, determining an adequate fallback threshold based on said false acceptance rate.

10 citations
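
The threshold-selection steps listed in the fallback-threshold abstract above can be sketched roughly as follows. This is a minimal illustration, not the patented procedure: the function names, the (score, correct) data layout, the definition of the false acceptance rate as the share of all utterances that are accepted although misrecognized, and the rule "lowest candidate threshold meeting a target rate" are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple

# Each result pairs the recognizer's metric with whether the recognized
# name was correct. This layout is an assumption for illustration.
Result = Tuple[float, bool]


def false_acceptance_rate(results: Sequence[Result], threshold: float) -> float:
    """Share of all utterances that are accepted (score >= threshold)
    even though the recognition result was wrong."""
    if not results:
        raise ValueError("results must not be empty")
    false_accepts = sum(
        1 for score, correct in results if score >= threshold and not correct
    )
    return false_accepts / len(results)


def pick_fallback_threshold(
    results: Sequence[Result],
    candidates: Sequence[float],
    max_far: float,
) -> Optional[float]:
    """Return the lowest candidate threshold whose false acceptance rate
    does not exceed max_far, or None if no candidate qualifies."""
    for threshold in sorted(candidates):
        if false_acceptance_rate(results, threshold) <= max_far:
            return threshold
    return None


# Hypothetical recognition results collected from user utterances.
results: List[Result] = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False),
    (0.70, True), (0.60, False), (0.40, False), (0.30, False),
]
candidates = [0.25, 0.5, 0.75, 1.0]
threshold = pick_fallback_threshold(results, candidates, max_far=0.125)
```

With such a threshold, the directory system would take the fallback decision whenever the recognizer's metric falls below it. Picking the lowest qualifying candidate keeps fallbacks as rare as the target rate allows; a different selection rule would be equally consistent with the abstract.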

Patent
04 Mar 2004
TL;DR: A system for recording and playback of television signals from a plurality of television channels is proposed. It comprises a computer-based controlling central unit, connectible to a telecommunication network, and a plurality of television receivers, connected to the controlling central unit, for receiving the television signals.
Abstract: Proposed is a system for recording and playback of television signals from a plurality of television channels, which comprises a computer-based controlling central unit, connectible to a telecommunication network, and a plurality of television receivers, connected to the controlling central unit, for receiving the television signals in each case on one of the television channels via cable television networks and/or television antennas for terrestrial television broadcasting or satellite television transmission. The system further comprises coding modules, connected to the television receivers, for coding the received television signals in a digital format. The controlling central unit is set up to receive recording instructions from users via the telecommunication network and to store the television signals, coded in digital format, which have been received on the television channel specified by the stored recording instructions, at a time specified by the stored recording instructions. The system further comprises a playback module for transmitting the television signals, stored in digital format, via the telecommunication network, in each case for playback on a terminal of a respective user. The system enables users to have television signals from a plurality of television channels recorded at the same time without it being necessary for them to have a video recorder at their disposal or to operate a video recorder.

8 citations

Patent
09 Sep 2014
TL;DR: A graphical user interface for browsing a list of visual elements displays one visual element, selected as a focus element, in a focus area, and displays one or more of the consecutive visual elements preceding or following the focus element in the list as non-focus elements outside the focus area, at particular display positions.
Abstract: Methods and systems are provided for configuring a graphical user interface that is used for browsing a list of visual elements. The graphical user interface may be configured to display consecutive visual elements, displaying one visual element, selected as a focus element, in a focus area of the graphical user interface, and displaying one or more of the consecutive visual elements, preceding or following the focus element in the list, as non-focus elements outside the focus area, at particular display positions. When focus is moved, a different visual element may be displayed in the focus area and the remaining visual elements may be displayed at rearranged display positions. The display positions may be arranged along two or more presentation lines running through the focus area, with the position of each visual element being representative of a visual element's position in the list with respect to the focus element.

6 citations


Cited by
Patent
02 Feb 2006
TL;DR: A speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user. Under certain conditions, information about the response expected from the user may be available, and it is used to adjust the rejection threshold when the speech input matches the expected response.
Abstract: A speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user. Under certain conditions, information about the response expected from the user may be available. In these situations, the available information about the expected response is used to modify the behavior of the speech recognition system by taking this information into account. The modified behavior of the speech recognition system comprises adjusting the rejection threshold when speech input matches the predetermined expected response.

517 citations
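
The expected-response behaviour described in the abstract above can be illustrated with a small sketch. The abstract only states that the rejection threshold is adjusted when the input matches the expected response; the direction of the adjustment (relaxing the threshold), the numeric values, and the function and parameter names below are assumptions made for the example.

```python
from typing import Optional


def accept_response(
    hypothesis: str,
    confidence: float,
    expected: Optional[str] = None,
    base_threshold: float = 0.6,     # hypothetical default rejection threshold
    relaxed_threshold: float = 0.4,  # hypothetical threshold for expected input
) -> bool:
    """Accept the hypothesis if its confidence reaches the rejection
    threshold. When an expected response is known and the hypothesis
    matches it, the relaxed threshold is used instead of the default."""
    matches_expected = expected is not None and hypothesis == expected
    threshold = relaxed_threshold if matches_expected else base_threshold
    return confidence >= threshold


# The same confidence is accepted only when it matches the expected response.
with_expectation = accept_response("yes", 0.5, expected="yes")
without_expectation = accept_response("yes", 0.5)
mismatch = accept_response("no", 0.5, expected="yes")
```

Here a hypothesis that would normally be rejected at confidence 0.5 is accepted because it agrees with what the system expected to hear, while unexpected input is still held to the default threshold.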

Patent
18 May 2012
TL;DR: A method and apparatus are disclosed that dynamically adjust operational parameters of a text-to-speech engine in a speech-based system in response to one or more environmental conditions, to increase the intelligibility of synthesized speech.
Abstract: A method and apparatus that dynamically adjust operational parameters of a text-to-speech engine in a speech-based system are disclosed. A voice engine or other application of a device provides a mechanism to alter the adjustable operational parameters of the text-to-speech engine. In response to one or more environmental conditions, the adjustable operational parameters of the text-to-speech engine are modified to increase the intelligibility of synthesized speech.

407 citations

Patent
17 Oct 2014
TL;DR: Methods and apparatus are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system, and for adjusting model adaptation based on the resulting error rate estimate.
Abstract: Methods are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. A method for model adaptation for a speech recognition system includes determining an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system. The method may further include adjusting an adaptation, of the model for the word or various models for the various words, based on the error rate. Apparatus are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. An apparatus for model adaptation for a speech recognition system includes a processor adapted to estimate an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system. The apparatus may further include a controller adapted to adjust an adaptation of the model for the word or various models for the various words, based on the error rate.

306 citations

Patent
15 Mar 2013
TL;DR: A communication component modifies production of an audio waveform at determined modification segments to mitigate the effects of a delay in processing and/or receiving a subsequent audio waveform.
Abstract: A communication component modifies production of an audio waveform at determined modification segments to thereby mitigate the effects of a delay in processing and/or receiving a subsequent audio waveform. The audio waveform and/or data associated with the audio waveform are analyzed to identify the modification segments based on characteristics of the audio waveform and/or data associated therewith. The modification segments show where the production of the audio waveform may be modified without substantially affecting the clarity of the sound or audio. In one embodiment, the invention modifies the sound production at the identified modification segments to extend production time and thereby mitigate the effects of delay in receiving and/or processing a subsequent audio waveform for production.

302 citations

Patent
19 Sep 2003
TL;DR: A system that maps media content information to an interactive program guide (400) displayed on a screen is described. It includes, among other things, a memory with logic (351) and a processor (344) configured with the logic to display at least one personal video recording display channel in the interactive program guide.
Abstract: A system (16) that maps media content information (352) to an interactive program guide (400) displayed on a screen (341) includes, among other things, a memory with logic (351), and a processor (344) configured with the logic to display at least one personal video recording display channel in the interactive program guide (400). The processor (344) is further preferably configured with the logic to display media content instance listings (460) in the personal video recording display channel (480) for corresponding media content instance recordings (410).

175 citations