Author
Roberto Gerson de Albuquerque Azevedo
Other affiliations: École Normale Supérieure, Federal University of Maranhão, The Catholic University of America
Bio: Roberto Gerson de Albuquerque Azevedo is an academic researcher from École Polytechnique Fédérale de Lausanne. The author has contributed to research in topics: User interface & Video quality. The author has an h-index of 6 and has co-authored 37 publications receiving 149 citations. Previous affiliations of Roberto Gerson de Albuquerque Azevedo include École Normale Supérieure & Federal University of Maranhão.
Papers
TL;DR: This paper provides a first comprehensive review of the most common visual distortions that affect 360-degree signals undergoing state-of-the-art processing in common applications. The resulting taxonomy is essential as a basis for benchmarking different processing techniques and for the effective design of new algorithms and applications.
Abstract: Omnidirectional (or 360°) images and videos are emergent signals being used in many areas, such as robotics and virtual/augmented reality. In particular, for virtual reality applications, they allow an immersive experience in which the user can interactively navigate through a scene with three degrees of freedom, wearing a head-mounted display. Current approaches for capturing, processing, delivering, and displaying 360° content, however, present many open technical challenges and introduce several types of distortions in the visual signal. Some of the distortions are specific to the nature of 360° images and often differ from those encountered in classical visual communication frameworks. This paper provides a first comprehensive review of the most common visual distortions that alter 360° signals going through the different processing elements of the visual communication pipeline. While their impact on viewers’ visual perception and the immersive experience at large is still unknown—thus, it is an open research topic—this review serves the purpose of proposing a taxonomy of the visual distortions that can be encountered in 360° signals. Their underlying causes in the end-to-end 360° content distribution pipeline are identified. This taxonomy is essential as a basis for comparing different processing techniques, such as visual enhancement, encoding, and streaming strategies, and allowing the effective design of new algorithms and applications. It is also a useful resource for the design of psycho-visual studies aiming to characterize human perception of 360° content in interactive and immersive applications.
66 citations
TL;DR: This paper proposes integrating concepts from the Multimedia and Multimodal Interaction communities in a single high-level programming framework that incorporates user modalities, both user-generated and user-consumed, into declarative programming languages for the specification of interactive multimedia applications.
Abstract: Historically, research in the Multimedia community has focused on output modalities, through studies on timing and multimedia processing. The Multimodal Interaction community, on the other hand, has focused on user-generated modalities, through studies on Multimodal User Interfaces (MUI). In this paper, aiming to assist the development of multimedia applications with MUIs, we propose the integration of concepts from those two communities in a unique high-level programming framework. The framework integrates user modalities, both user-generated (e.g., speech, gestures) and user-consumed (e.g., audiovisual, haptic), in declarative programming languages for the specification of interactive multimedia applications. To illustrate our approach, we instantiate the framework in the NCL (Nested Context Language) multimedia language. NCL is the declarative language for developing interactive applications for Brazilian Digital TV and an ITU-T Recommendation for IPTV services. To help evaluate our approach, we discuss a usage scenario and implement it as an NCL application extended with the proposed multimodal features. We also compare the expressiveness of multimodal NCL against existing multimedia and multimodal languages, for both input and output modalities.
18 citations
TL;DR: A microkernel-based architecture for authoring tools is proposed, where the microkernel is responsible for instantiating the requested extensions (plugins), maintaining the core data model that represents the hypermedia document under development, and notifying changes in this model to plugins interested in them.
Abstract: This paper discusses the importance of non-functional requirements in the design of hypermedia authoring tools, which typically provide multiple graphical abstractions (views). It focuses on creating products and services that operate robustly across a broad range of environments and that take into account the changing needs of their users over time, as they become more familiar with the tool. To meet these non-functional requirements, this paper proposes a microkernel-based architecture for authoring tools, where the microkernel is responsible for instantiating the requested extensions (plugins), maintaining the core data model that represents the hypermedia document under development, and notifying plugins interested in changes to this model. Based on the proposed architecture, a new version of Composer (an NCL authoring tool), rewritten from scratch, is presented. Results from experiments show that the discussed non-functional requirements are adequately met.
15 citations
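The microkernel/plugin notification scheme described in the abstract is essentially an observer pattern over a shared document model. A minimal sketch under that assumption follows; the class and method names are illustrative, not Composer's actual API.

```python
class Microkernel:
    """Sketch of the notification core: the kernel owns the document
    model and broadcasts every change to registered plugins."""

    def __init__(self):
        self.model = {}       # core hypermedia document model
        self.plugins = []     # views/extensions interested in changes

    def register(self, plugin):
        self.plugins.append(plugin)

    def update(self, key, value):
        self.model[key] = value
        for plugin in self.plugins:
            plugin.on_model_changed(key, value)  # push notification


class LoggingPlugin:
    """Toy plugin that records every model change it is notified of."""

    def __init__(self):
        self.events = []

    def on_model_changed(self, key, value):
        self.events.append((key, value))
```

Because plugins only see change notifications, multiple views (textual, structural, layout) can stay synchronized without knowing about each other, which is the decoupling the architecture is after.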
01 Sep 2019
TL;DR: A testbed is presented and a user-focused study is discussed on a scenario in which the user is immersed in 360-degree video content and is stimulated through additional sensory effects; the results indicate that sensory effects can considerably increase the sense of presence of 360-degree videos.
Abstract: Traditionally, most multimedia content has been developed to stimulate two of the human senses, i.e., sight and hearing. Due to recent technological advancements, however, innovative services have been developed that provide more realistic, immersive, and engaging experiences to the audience. Omnidirectional (i.e., 360-degree) video, for instance, is becoming increasingly popular. It allows the viewer to navigate the full 360-degree view of a scene from a specific point. In particular, when consumed through head-mounted displays, 360-degree videos provide increased immersion and sense of presence. The use of multi-sensory effects (e.g., wind, vibration, and scent) has also been explored by recent work; such effects improve the experience by stimulating users' other senses, beyond the audiovisual content. Understanding how these additional multi-sensory effects affect the users' perceived quality of experience (QoE) in 360-degree video, however, is still a largely open research problem. As a step toward better understanding the QoE of immersive sensory experiences, this paper presents a testbed and discusses a user-focused study on a scenario in which the user is immersed in 360-degree video content and is stimulated through additional sensory effects. Quantitative results indicated that the sensory effects can considerably increase the sense of presence of 360-degree videos. Qualitative results provided us with a better view of the limitations of current technologies and interesting insights, such as the users' sense of surprise.
12 citations
29 Jun 2011
TL;DR: A textual approach to hypermedia authoring is presented that uses typographical accessories, such as program visualization, hypertextual navigation, and semi-automatic error correction, and does not impose extra cognitive overload.
Abstract: Authoring tools for hypermedia languages usually provide visual abstractions, which hide the source code from the author, aiming to simplify and accelerate the development process. Among other drawbacks, these abstractions modify or even break the communication process between the author and the language designer, since these languages were designed to be readable and understandable by their target audience. This paper presents a textual approach to hypermedia authoring that does not have these inconveniences, but rather uses typographical accessories, such as program visualization, hypertextual navigation, and semi-automatic error correction. The proposed approach exploits concepts already known to the author and does not impose extra cognitive overload. A use case is presented, namely the NCL Eclipse authoring environment for the Nested Context Language (NCL), the Brazilian Digital TV and ITU-T standard.
11 citations
Cited by
TL;DR: GRiNS is an authoring and presentation environment that can be used to create SMIL-compliant documents and to play SMIL documents created with GRiNS or by hand.
91 citations
TL;DR: The state of the art in adaptive 360° video delivery solutions is presented, considering end-to-end video streaming in general and 360° video delivery in particular.
Abstract: Omnidirectional or 360° video is increasingly being used, mostly due to the latest advancements in immersive Virtual Reality (VR) technology. However, its wide adoption is hindered by higher bandwidth and lower latency requirements than those associated with traditional video content delivery. Researchers have proposed and designed diverse solutions that help support an immersive visual experience of 360° video, primarily when delivered over a dynamic network environment. This paper presents the state of the art in adaptive 360° video delivery solutions, considering end-to-end video streaming in general and then 360° video delivery specifically. Current and emerging solutions for adaptive 360° video streaming, including viewport-independent, viewport-dependent, and tile-based schemes, are presented. Next, solutions for network-assisted unicast and multicast streaming of 360° video content are discussed. Different research challenges for both on-demand and live 360° video streaming are also analyzed. Several proposed standards and technologies and top international research projects are then presented. We demonstrate the ongoing standardization efforts for 360° media services that ensure interoperability and immersive media deployment on a massive scale. Finally, the paper concludes with a discussion about future research opportunities enabled by 360° video.
90 citations
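As an illustration of the tile-based schemes this survey covers, a toy viewport-to-tile mapping over an equirectangular tile grid might look like the following. The function and its parameters are hypothetical; real schemes such as MPEG OMAF's viewport-dependent profiles are considerably more involved.

```python
def select_tiles(yaw_deg: float, fov_deg: float, n_cols: int) -> list:
    """Return the column indices of an equirectangular tile grid that
    overlap a viewport centered at yaw_deg with horizontal FOV fov_deg."""
    tile_w = 360.0 / n_cols          # angular width of one tile column
    half = fov_deg / 2.0
    cols = set()
    a = yaw_deg - half
    while a <= yaw_deg + half:
        cols.add(int((a % 360.0) // tile_w))
        a += tile_w / 2.0            # step small enough to skip no tile
    cols.add(int(((yaw_deg + half) % 360.0) // tile_w))  # right edge
    return sorted(cols)
```

With 8 tile columns and a 90° viewport at yaw 0°, only tiles 0, 1, and 7 need to be fetched at full quality; the remaining five can be streamed at reduced quality or skipped, which is the bandwidth argument behind viewport-dependent and tile-based delivery.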
TL;DR: The experiments on two databases, namely CVIQD2018 and MVAQD databases, demonstrate that the proposed SSP-BOIQA method outperforms the state-of-the-art blind quality assessment methods, and is more consistent with human visual perception.
Abstract: In contrast with traditional images, an omnidirectional image (OI) has a higher resolution and provides the user with an interactive wide field of view. The equirectangular projection (ERP) format, the default for encoding and transmitting omnidirectional visual content, is not well suited to quality assessment of OIs because of serious geometric distortion in the bipolar regions, especially for blind image quality assessment. In this paper, a segmented spherical projection (SSP) based blind omnidirectional image quality assessment (SSP-BOIQA) method is proposed. The OI in ERP format is first converted to SSP format, so as to solve the problem of stretching distortion in the bipolar regions of ERP while retaining the equatorial region of ERP. On the one hand, considering that the bipolar regions of the SSP format are circular, a local/global perceptual feature extraction scheme with a fan-shaped window is proposed for estimating the distortion in the bipolar regions of the OI. On the other hand, the perceptual features of the equatorial region are extracted with a heat map as a weighting factor to reflect users' visual behavior. Then, the features extracted from the OI's bipolar and equatorial regions are pooled to predict the quality of distorted OIs. Experiments on two databases, namely the CVIQD2018 and MVAQD databases, demonstrate that the proposed SSP-BOIQA method outperforms state-of-the-art blind quality assessment methods and is more consistent with human visual perception.
28 citations
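The ERP stretching distortion in the bipolar regions that motivates SSP can be quantified with a back-of-envelope calculation: every ERP row stores the same number of pixels, but the circle of latitude it represents shrinks by the cosine of the latitude. The helper below is a sketch under that geometric assumption; the function name is ours, not from the paper.

```python
import math

def erp_stretch_factor(lat_deg: float) -> float:
    """Horizontal oversampling of the equirectangular projection at a
    given latitude: each ERP row has the same pixel count, but the
    circle of latitude it represents shrinks by cos(latitude)."""
    return 1.0 / math.cos(math.radians(lat_deg))

# no stretching at the equator; 2x oversampling at 60 degrees latitude,
# growing without bound toward the poles
assert abs(erp_stretch_factor(0.0) - 1.0) < 1e-9
assert abs(erp_stretch_factor(60.0) - 2.0) < 1e-9
```

This unbounded oversampling toward the poles is exactly why a format that treats the polar caps separately (as SSP does, with circular bipolar regions) avoids wasting both bits and quality-assessment effort on stretched pixels.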
TL;DR: Projection, compression, and streaming techniques that incorporate the visual features or spherical characteristics of 360-degree video are presented, along with the latest ongoing standardization efforts for an enhanced degree-of-freedom immersive experience.
Abstract: 360-degree video streaming is expected to grow as the next disruptive innovation despite its ultra-high network bandwidth (60–100 Mbps for 6k streaming), ultra-high storage capacity, and ultra-high computation requirements. Video consumers are increasingly interested in the immersive experience rather than conventional television. The visible area (known as the user's viewport) of the video is displayed through a Head-Mounted Display (HMD) at a very high frame rate and high resolution. Delivering whole 360-degree frames in ultra-high resolution to the end user therefore places significant pressure on service providers. This paper surveys 360-degree video streaming by focusing on different paradigms from capture to display. It overviews different projection, compression, and streaming techniques that incorporate either the visual features or the spherical characteristics of 360-degree video. Next, the latest ongoing standardization efforts for an enhanced degree-of-freedom immersive experience are presented. Furthermore, several 360-degree audio technologies and a wide range of immersive applications are discussed. Finally, some significant research challenges and implications in the immersive multimedia environment are presented and explained in detail.
24 citations
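The bandwidth pressure this abstract describes is easy to see with a rough area calculation: a planar viewport covers only a small fraction of the full 360°×180° frame. The sketch below is a back-of-envelope illustration that ignores spherical projection geometry; the helper name is ours.

```python
def viewport_fraction(h_fov_deg: float, v_fov_deg: float) -> float:
    """Rough fraction of a full 360x180-degree frame covered by a
    viewport with the given horizontal and vertical fields of view
    (flat-area approximation; ignores spherical geometry)."""
    return (h_fov_deg * v_fov_deg) / (360.0 * 180.0)

# a typical 90x90-degree HMD viewport covers only 12.5% of the frame,
# which is why viewport-dependent schemes can cut bandwidth so sharply
assert abs(viewport_fraction(90.0, 90.0) - 0.125) < 1e-9
```

Under this approximation, sending only the viewport region at full quality could save on the order of 85–90% of the bits of a full-frame transmission, which matches the motivation for the viewport-dependent techniques surveyed above.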