Author

Simon Dobrišek

Bio: Simon Dobrišek is an academic researcher from the University of Ljubljana. The author has contributed to research in topics: Facial recognition system & Computer science. The author has an h-index of 13, co-authored 59 publications receiving 552 citations.


Papers
Journal ArticleDOI
TL;DR: In the latest version of the system, all the modules of the early version are integrated into a user interface that offers basic web-browsing functionality and a mouse-controlled text-to-speech screen-reader function.
Abstract: Blind and visually-impaired people face many problems in interacting with information retrieval systems. State-of-the-art spoken language technology offers potential to overcome many of them. In the mid-nineties our research group decided to develop an information retrieval system suitable for Slovene-speaking blind and visually-impaired people. A voice-driven text-to-speech dialogue system was developed for reading Slovenian texts obtained from the Electronic Information System of the Association of Slovenian Blind and Visually Impaired Persons Societies. The evolution of the system is presented. The early version of the system was designed to deal explicitly with the Electronic Information System, where the available text corpora are stored in a plain text file format without any, or with just some, basic non-standard tagging. Further improvements to the system became possible with the decision to transfer the available corpora to a new web portal dedicated exclusively to blind and visually-impaired users. The text files were reformatted into common HTML/XML pages, which comply with the basic recommendations set by the Web Access Initiative. In the latest version of the system, all the modules of the early version are integrated into a user interface that offers basic web-browsing functionality and a mouse-controlled text-to-speech screen-reader function.

95 citations

Journal ArticleDOI
TL;DR: The developed system represents a feasible solution to emotion recognition that can easily be integrated into various systems, such as humanoid robots, smart surveillance systems and the like.
Abstract: The paper presents a multi-modal emotion recognition system exploiting audio and video (i.e., facial expression) information. The system first processes both sources of information individually to produce corresponding matching scores and then combines the computed matching scores to obtain a classification decision. For the video part of the system, a novel approach to emotion recognition, relying on image-set matching, is developed. The proposed approach avoids the need for detecting and tracking specific facial landmarks throughout the given video sequence, which represents a common source of error in video-based emotion recognition systems, and, therefore, adds robustness to the video processing chain. The audio part of the system, on the other hand, relies on utterance-specific Gaussian Mixture Models (GMMs) adapted from a Universal Background Model (UBM) via the maximum a posteriori probability (MAP) estimation. It improves upon the standard UBM-MAP procedure by exploiting gender information when bu...
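The audio branch described above adapts utterance-specific GMMs from a Universal Background Model via MAP estimation. A minimal NumPy sketch of the classic relevance-factor MAP adaptation of the UBM component means is shown below. This is an illustrative stand-in, not the authors' implementation; the function name, the relevance factor default of 16, and the diagonal-covariance assumption are all choices made here for the sketch.

```python
import numpy as np

def map_adapt_means(ubm_means, ubm_weights, ubm_covars, features, r=16.0):
    """MAP-adapt the mean vectors of a diagonal-covariance GMM (the UBM)
    to the feature vectors of one utterance, using the relevance-factor
    formulation. Returns the adapted (K, D) mean matrix.

    ubm_means:   (K, D) component means
    ubm_weights: (K,)   component weights
    ubm_covars:  (K, D) diagonal covariances
    features:    (N, D) feature frames (e.g. MFCCs) of one utterance
    r:           relevance factor controlling adaptation strength
    """
    # Log-density of each frame under each diagonal Gaussian.
    diff = features[:, None, :] - ubm_means[None, :, :]          # (N, K, D)
    log_gauss = -0.5 * (np.sum(diff**2 / ubm_covars, axis=2)
                        + np.sum(np.log(2 * np.pi * ubm_covars), axis=1))
    log_post = np.log(ubm_weights) + log_gauss                   # (N, K)
    log_post -= log_post.max(axis=1, keepdims=True)              # stabilize
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                      # responsibilities

    n_k = post.sum(axis=0)                                       # (K,) soft counts
    ml_means = (post.T @ features) / np.maximum(n_k[:, None], 1e-10)
    alpha = (n_k / (n_k + r))[:, None]                           # adaptation coeff.
    # Components that saw little data stay close to the UBM means.
    return alpha * ml_means + (1.0 - alpha) * ubm_means
```

Components with large soft counts move toward the utterance statistics, while unobserved components remain at the UBM prior, which is what makes the adapted models robust for short utterances.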

71 citations

Journal ArticleDOI
TL;DR: A comprehensive experimental evaluation of several recent face detectors for their performance with masked-face images is conducted and the usefulness of multiple off-the-shelf deep-learning models for recognizing correct face-mask placement is investigated.
Abstract: The new Coronavirus disease (COVID-19) has seriously affected the world. By the end of November 2020, the global number of new coronavirus cases had already exceeded 60 million and the number of deaths 1,410,378 according to information from the World Health Organization (WHO). To limit the spread of the disease, mandatory face-mask rules are now becoming common in public settings around the world. Additionally, many public service providers require customers to wear face-masks in accordance with predefined rules (e.g., covering both mouth and nose) when using public services. These developments inspired research into automatic (computer-vision-based) techniques for face-mask detection that can help monitor public behavior and contribute towards constraining the COVID-19 pandemic. Although existing research in this area resulted in efficient techniques for face-mask detection, these usually operate under the assumption that modern face detectors provide perfect detection performance (even for masked faces) and that the main goal of the techniques is to detect the presence of face-masks only. In this study, we revisit these common assumptions and explore the following research questions: (i) How well do existing face detectors perform with masked-face images? (ii) Is it possible to detect a proper (regulation-compliant) placement of facial masks? and (iii) How useful are existing face-mask detection techniques for monitoring applications during the COVID-19 pandemic? To answer these and related questions we conduct a comprehensive experimental evaluation of several recent face detectors for their performance with masked-face images. Furthermore, we investigate the usefulness of multiple off-the-shelf deep-learning models for recognizing correct face-mask placement.
Finally, we design a complete pipeline for recognizing whether face-masks are worn correctly or not and compare the performance of the pipeline with standard face-mask detection models from the literature. To facilitate the study, we compile a large dataset of facial images from the publicly available MAFA and Wider Face datasets and annotate it with compliant and non-compliant labels. The annotated dataset, called the Face-Mask-Label Dataset (FMLD), is made publicly available to the research community.
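The pipeline described in the abstract chains a face detector with a per-face mask-placement classifier. A minimal sketch of that two-stage structure, with both models injected as callables so any off-the-shelf detector and classifier can be plugged in, might look as follows. All names here (MaskResult, mask_placement_pipeline, the label strings) are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels

@dataclass
class MaskResult:
    box: Box
    label: str      # "compliant" or "non_compliant"
    score: float    # classifier confidence in [0, 1]

def mask_placement_pipeline(
    image,
    detect_faces: Callable[[object], List[Box]],
    classify_placement: Callable[[object, Box], Tuple[str, float]],
) -> List[MaskResult]:
    """Two-stage check: detect all faces first, then classify each detected
    face region as wearing a mask correctly or not. Detection errors on
    masked faces propagate directly into the final result, which is why the
    study evaluates the detectors separately."""
    results = []
    for box in detect_faces(image):
        label, score = classify_placement(image, box)
        results.append(MaskResult(box=box, label=label, score=score))
    return results
```

Separating the stages makes it possible to benchmark the detector and the placement classifier independently, which mirrors the research questions (i) and (ii) posed in the abstract.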

65 citations

Proceedings ArticleDOI
04 May 2015
TL;DR: A comparison of the two speech synthesis modules integrated in the proposed DROPSY-based approach reveals that both can efficiently de-identify the input speakers while still producing intelligible speech.
Abstract: The paper addresses the problem of speaker (or voice) de-identification by presenting a novel approach for concealing the identity of speakers in their speech. The proposed technique first recognizes the input speech with a diphone recognition system and then transforms the obtained phonetic transcription into the speech of another speaker with a speech synthesis system. Because a Diphone RecOgnition step and a sPeech SYnthesis step are used during the de-identification, we refer to the developed technique as DROPSY. With this approach the acoustical models of the recognition and synthesis modules are completely independent of each other, which ensures the highest level of input speaker de-identification. The proposed DROPSY-based de-identification approach is language-dependent, text-independent and capable of running in real-time due to the relatively simple computing methods used. When designing speaker de-identification technology, two requirements are typically imposed on the de-identification techniques: i) it should not be possible to establish the identity of the speakers based on the de-identified speech, and ii) the processed speech should still sound natural and be intelligible. This paper, therefore, implements the proposed DROPSY-based approach with two different speech synthesis techniques (i.e., with the HMM-based and the diphone TD-PSOLA-based technique). The obtained de-identified speech is evaluated for intelligibility and evaluated in speaker verification experiments with a state-of-the-art (i-vector/PLDA) speaker recognition system. The comparison of both speech synthesis modules integrated in the proposed method reveals that both can efficiently de-identify the input speakers while still producing intelligible speech.
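The de-identified speech above is evaluated in speaker verification experiments: de-identification succeeds when a verification trial against the original speaker's enrollment model fails. A minimal sketch of such a trial, using plain cosine scoring of fixed-length speaker embeddings as a simpler stand-in for the paper's i-vector/PLDA system, could look like this. The function names and the threshold value are illustrative assumptions, not from the paper.

```python
import numpy as np

def cosine_score(enroll_vec, test_vec):
    """Cosine similarity between two fixed-length speaker embeddings
    (e.g. i-vectors): 1.0 for identical directions, 0.0 for orthogonal."""
    a = enroll_vec / np.linalg.norm(enroll_vec)
    b = test_vec / np.linalg.norm(test_vec)
    return float(a @ b)

def verify(enroll_vec, test_vec, threshold=0.5):
    """Accept the claimed identity if the score exceeds the threshold.
    For well de-identified speech, the embedding of the processed signal
    should score below the threshold against the original speaker's
    enrollment, i.e. verification should fail."""
    return cosine_score(enroll_vec, test_vec) >= threshold
```

In the paper's setting, PLDA scoring replaces the raw cosine similarity, but the success criterion is the same: the verification system should no longer link the de-identified speech to the source speaker.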

40 citations

Journal ArticleDOI
TL;DR: This paper presents the Slovene-language spoken resources that were acquired at the Laboratory of Artificial Perception, Systems and Cybernetics (LUKS) at the Faculty of Electrical Engineering, University of Ljubljana over the past ten years.
Abstract: This paper presents the Slovene-language spoken resources that were acquired at the Laboratory of Artificial Perception, Systems and Cybernetics (LUKS) at the Faculty of Electrical Engineering, University of Ljubljana over the past ten years. The resources consist of:
• isolated-spoken-word corpora designed for phonetic research of the Slovene spoken language;
• read-speech corpora from dialogues relating to air flight information;
• isolated-word corpora designed for studying the Slovene spoken diphthongs;
• Slovene diphone corpora used for text-to-speech synthesis systems;
• a weather forecast speech database, as an attempt to capture radio and television broadcast news in the Slovene language; and
• read- and spontaneous-speech corpora used to study the effects of the psychophysical conditions of the speakers on their speech characteristics.
All the resources are accompanied by relevant text transcriptions, lexicons and various segmentation labels. The read-speech corpora relating to the air flight information domain are also annotated prosodically and semantically. The words in the orthographic transcription were automatically tagged for their lemma and morphosyntactic description. Many of the mentioned speech resources are freely available for basic research purposes in speech technology and linguistics. In this paper we describe all the resources in more detail and give a brief description of their use in the spoken language technology products developed at LUKS.

34 citations


Cited by
01 Jan 2004
TL;DR: Comprehensive and up-to-date, this book includes essential topics that either reflect practical significance or are of theoretical importance, and describes numerous important application areas such as image-based rendering and digital libraries.
Abstract: From the Publisher: The accessible presentation of this book gives both a general view of the entire computer vision enterprise and also offers sufficient detail to be able to build useful applications. Users learn techniques that have proven to be useful by first-hand experience and a wide range of mathematical methods. A CD-ROM with every copy of the text contains source code for programming practice, color images, and illustrative movies. Comprehensive and up-to-date, this book includes essential topics that either reflect practical significance or are of theoretical importance. Topics are discussed in substantial and increasing depth. Application surveys describe numerous important application areas such as image-based rendering and digital libraries. Many important algorithms are broken down and illustrated in pseudocode. Appropriate for use by engineers as a comprehensive reference to the computer vision enterprise.

3,627 citations

Patent
11 Jan 2011
TL;DR: In this article, an intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions.
Abstract: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionally powered by external services with which the system can interact.

1,462 citations

Journal ArticleDOI
TL;DR: This first-of-its-kind, comprehensive literature review of the diverse field of affective computing focuses mainly on the use of audio, visual and text information for multimodal affect analysis, and outlines existing methods for fusing information from different modalities.

969 citations