Author

Riccardo Leonardi

Bio: Riccardo Leonardi is an academic researcher at the University of Brescia. He has contributed to research on topics including search engine indexing and image segmentation, has an h-index of 28, and has co-authored 253 publications receiving 3,180 citations. Previous affiliations of Riccardo Leonardi include Teesside University and the University of Catania.


Papers
Journal ArticleDOI
TL;DR: This work states that the proposed encoder architecture, which combines a block-based transform and interframe predictive coding approach, is well-suited for applications where the video is encoded once and decoded many times, i.e., one-to-many topologies, such as broadcasting or video-on-demand, where the cost of the decoder is more critical than the cost of the encoder.
Abstract: A growing percentage of the world population now uses image and video coding technologies on a regular basis. These technologies are behind the success and quick deployment of services and products such as digital pictures, digital television, DVDs, and Internet video communications. Today's digital video coding paradigm, represented by the ITU-T and MPEG standards, mainly relies on a hybrid of block-based transform and interframe predictive coding approaches. In this coding framework, the encoder architecture has the task of exploiting both the temporal and spatial redundancies present in the video sequence, which is a rather complex exercise. As a consequence, all standard video encoders have a much higher computational complexity than the decoder (typically five to ten times more complex), mainly due to the temporal correlation exploitation tools, notably the motion estimation process. This type of architecture is well-suited for applications where the video is encoded once and decoded many times, i.e., one-to-many topologies, such as broadcasting or video-on-demand, where the cost of the decoder is more critical than the cost of the encoder.

142 citations
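The encoder-side complexity gap described above comes largely from motion estimation. As a rough illustration, here is a minimal sketch of exhaustive block-matching motion estimation; the function names and toy frames are assumptions for illustration, not code from any standard codec.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def best_motion_vector(ref, cur, bx, by, bsize, search):
    """Exhaustively search a +/- `search` pixel window in the reference
    frame for the block that best matches the current block at (bx, by)."""
    cur_block = [row[bx:bx + bsize] for row in cur[by:by + bsize]]
    best, best_cost = (0, 0), float("inf")
    h, w = len(ref), len(ref[0])
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + bsize > w or y + bsize > h:
                continue  # candidate block falls outside the reference frame
            cand = [row[x:x + bsize] for row in ref[y:y + bsize]]
            cost = sad(cur_block, cand)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best, best_cost

# Toy example: the current frame is the reference shifted right by one pixel,
# so the best vector for an interior block points one pixel left.
ref = [[x + 8 * y for x in range(8)] for y in range(8)]
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]
mv, cost = best_motion_vector(ref, cur, bx=2, by=2, bsize=4, search=2)
```

Running this search at every block of every frame is what makes standard encoders several times heavier than their decoders, which merely apply the transmitted vectors.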

Journal ArticleDOI
TL;DR: A semantic indexing algorithm that uses both audio and visual information for salient event detection in soccer is proposed, using camera motion information as a visual cue and "loudness" as an audio descriptor.
Abstract: Content characterization of sport videos is a subject of great interest to researchers working on the analysis of multimedia documents. In this paper, we propose a semantic indexing algorithm which uses both audio and visual information for salient event detection in soccer. The video signal is processed first by extracting low-level visual descriptors directly from an MPEG-2 bit stream. It is assumed that any instance of an event of interest typically affects two consecutive shots and is characterized by a different temporal evolution of the visual descriptors in the two shots. This motivates the introduction of a controlled Markov chain to describe such evolution during an event of interest, with the control input modeling the occurrence of a shot transition. After adequately training different controlled Markov chain models, a list of video segments can be extracted to represent a specific event of interest using the maximum likelihood criterion. To reduce the presence of false alarms, low-level audio descriptors are processed to order the candidate video segments in the list so that those associated to the event of interest are likely to be found in the very first positions. We focus in particular on goal detection, which represents a key event in a soccer game, using camera motion information as a visual cue and the "loudness" as an audio descriptor. The experimental results show the effectiveness of the proposed multimodal approach.

122 citations
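The controlled-Markov-chain scoring at the heart of the abstract above can be sketched as follows; the two-state toy models, their probabilities, and the model names are illustrative assumptions, not the trained models from the paper.

```python
import math

def log_likelihood(states, controls, model):
    """Log-likelihood of a discrete state sequence under a controlled
    Markov chain: the transition matrix used at each step is selected by
    the control input (here, whether a shot transition occurred)."""
    ll = math.log(model["init"][states[0]])
    for t in range(1, len(states)):
        trans = model["trans"][controls[t]]  # control input selects the matrix
        ll += math.log(trans[states[t - 1]][states[t]])
    return ll

# Two toy models over binary descriptor states: an "event" model whose
# dynamics change at shot transitions, and a memoryless background model.
models = {
    "goal": {"init": [0.5, 0.5],
             "trans": {0: [[0.9, 0.1], [0.1, 0.9]],    # within a shot: sticky
                       1: [[0.1, 0.9], [0.9, 0.1]]}},  # at a transition: flips
    "background": {"init": [0.5, 0.5],
                   "trans": {0: [[0.5, 0.5], [0.5, 0.5]],
                             1: [[0.5, 0.5], [0.5, 0.5]]}},
}
states = [0, 0, 1, 1]    # quantized visual descriptor per frame
controls = [0, 0, 1, 0]  # 1 marks a shot transition
best = max(models, key=lambda name: log_likelihood(states, controls, models[name]))
```

Maximum-likelihood selection over such models is what extracts the candidate segments; in the paper, audio descriptors then rank those candidates to push false alarms down the list.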

Journal ArticleDOI
TL;DR: This paper contributes to the identification of the most DVC-friendly application scenarios, highlighting the expected benefits and drawbacks for each studied scenario.
Abstract: Distributed Video Coding (DVC) is a new video coding paradigm based on two major Information Theory results: the Slepian-Wolf and Wyner-Ziv theorems. Recently, practical DVC solutions have been proposed with promising results; however, there is still a need to study in a more systematic way the set of application scenarios for which DVC may bring major advantages. This paper contributes to the identification of the most DVC-friendly application scenarios, highlighting the expected benefits and drawbacks for each studied scenario. This selection is based on a proposed methodology that involves characterizing and clustering the applications according to their most relevant characteristics, and matching them with the main potential DVC benefits.

99 citations

Journal ArticleDOI
TL;DR: The current state-of-the-art in SVC is described, focusing on wavelet-based motion-compensated approaches (WSVC); the individual components designed to address the problem over the years are reviewed, along with how such components are typically combined to achieve meaningful WSVC architectures.
Abstract: Scalable video coding (SVC) differs from traditional single-point approaches mainly because it allows encoding, in a single bit stream, several working points corresponding to different qualities, picture sizes and frame rates. This work describes the current state-of-the-art in SVC, focusing on wavelet-based motion-compensated approaches (WSVC). It reviews the individual components that have been designed to address the problem over the years and how such components are typically combined to achieve meaningful WSVC architectures. Coding schemes which mainly differ in the space-time order in which the wavelet transforms operate are compared here, discussing the strengths and weaknesses of the resulting implementations. An evaluation of the achievable coding performance is provided, considering the reference architectures studied and developed by ISO/MPEG in its exploration of WSVC. The paper also attempts to draw a list of major differences between wavelet-based solutions and the SVC standard jointly targeted by ITU and ISO/MPEG. A major emphasis is devoted to a promising WSVC solution, named STP-tool, which presents architectural similarities with the SVC standard. The paper ends by drawing some evolution trends for WSVC systems and giving insights on video coding applications which could benefit from a wavelet-based approach.

96 citations
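The wavelet transforms that WSVC schemes apply along space and time can be illustrated with their simplest instance, one level of the (unnormalized) Haar transform; this is a minimal sketch with assumed helper names, not code from any WSVC architecture.

```python
def haar_analysis(signal):
    """One Haar analysis level: split an even-length signal into
    low-pass averages and high-pass differences of sample pairs."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high

def haar_synthesis(low, high):
    """Invert one Haar level, reconstructing the original samples exactly."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out

x = [4, 2, 6, 6, 1, 3, 8, 0]          # toy 1-D signal (a row of pixels, say)
low, high = haar_analysis(x)          # low = coarse half-rate approximation
rec = haar_synthesis(low, high)       # perfect reconstruction
```

The scalability property follows from the decomposition itself: transmitting only `low` already yields a half-resolution (or, along the temporal axis, half-frame-rate) approximation, and each discarded `high` band trades quality for rate.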

Journal ArticleDOI
TL;DR: This paper extends previous work by extracting audiovisual and film grammar descriptors and, driven by users' rates on connotative properties, creates a shared framework where movie scenes are placed, compared, and recommended according to connotation.
Abstract: The apparent difficulty in assessing emotions elicited by movies, and the undeniably high variability in subjects' emotional responses to film content, have recently been tackled by exploring film connotative properties: the set of shooting and editing conventions that help in transmitting meaning to the audience. Connotation provides an intermediate representation that exploits the objectivity of audiovisual descriptors to predict the subjective emotional reaction of single users. This is done without needing to record users' physiological signals, and without employing other people's highly variable emotional ratings, but by relying on the intersubjectivity of connotative concepts and on knowledge of the user's reactions to similar stimuli. This paper extends previous work by extracting audiovisual and film grammar descriptors and, driven by users' ratings of connotative properties, creates a shared framework where movie scenes are placed, compared, and recommended according to connotation. We evaluate the potential of the proposed system by asking users to assess the ability of connotation to suggest film content that targets their affective requests.

95 citations


Cited by
01 Jan 1990
TL;DR: An overview of the self-organizing map algorithm, on which the papers in this issue are based, is presented.
Abstract: An overview of the self-organizing map algorithm, on which the papers in this issue are based, is presented in this article.

2,933 citations

Journal ArticleDOI

2,415 citations

Journal ArticleDOI
Tamar Frankel1
TL;DR: The Essay concludes that practitioners theorize and theorists practice, but that they use these intellectual tools differently because the goals and orientations of theorists and practitioners, and the constraints under which they act, differ.
Abstract: Much has been written about theory and practice in the law, and the tension between practitioners and theorists. Judges do not cite theoretical articles often; they rarely "apply" theories to particular cases. These arguments are not revisited. Instead the Essay explores the working and interaction of theory and practice, practitioners and theorists. The Essay starts with a story about solving a legal issue using our intellectual tools - theory, practice, and their progenies: experience and "gut." Next the Essay elaborates on the nature of theory, practice, experience and "gut." The third part of the Essay discusses theories that are helpful to practitioners and those that are less helpful. The Essay concludes that practitioners theorize, and theorists practice. They use these intellectual tools differently because the goals and orientations of theorists and practitioners, and the constraints under which they act, differ. Theory, practice, experience and "gut" help us think, remember, decide and create. They complement each other like the two sides of the same coin: distinct but inseparable.

2,077 citations

Journal ArticleDOI
TL;DR: A new image compression algorithm is proposed, based on independent embedded block coding with optimized truncation of the embedded bit-streams (EBCOT), which lends itself to optimization with respect to psychovisual metrics capable of modeling the spatially varying visual masking phenomenon.
Abstract: A new image compression algorithm is proposed, based on independent embedded block coding with optimized truncation of the embedded bit-streams (EBCOT). The algorithm exhibits state-of-the-art compression performance while producing a bit-stream with a rich set of features, including resolution and SNR scalability together with a "random access" property. The algorithm has modest complexity and is suitable for applications involving remote browsing of large compressed images. The algorithm lends itself to explicit optimization with respect to MSE as well as more realistic psychovisual metrics, capable of modeling the spatially varying visual masking phenomenon.

1,933 citations
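The "optimized truncation" idea can be sketched as a Lagrangian selection over per-block candidate truncation points; the toy (rate, distortion) values below are assumptions for illustration, not output of an actual EBCOT coder.

```python
def truncate_blocks(blocks, lam):
    """For each independently coded block, pick the truncation point that
    minimizes the Lagrangian cost D + lam * R over its candidate
    (rate, distortion) points; a larger lam favors shorter bit-streams."""
    chosen = []
    for points in blocks:
        best = min(points, key=lambda rd: rd[1] + lam * rd[0])
        chosen.append(best)
    return chosen

# Two toy blocks, each with (rate in bytes, distortion) truncation candidates:
# spending more rate on a block's embedded bit-stream lowers its distortion.
blocks = [
    [(0, 100), (4, 30), (8, 5)],
    [(0, 50), (2, 40), (6, 2)],
]
chosen = truncate_blocks(blocks, lam=2.0)
```

Because each block's bit-stream is embedded, sweeping `lam` traces out the rate-distortion curve, which is what enables SNR scalability from a single encoding.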

Journal ArticleDOI
01 Oct 1980

1,565 citations