
Showing papers in "Multimedia Tools and Applications in 2011"


Journal ArticleDOI
TL;DR: The challenges augmented reality faces in each of these applications in moving from the laboratory to industry, as well as the future challenges the authors forecast, are also discussed in this paper.
Abstract: This paper surveys the current state of the art of technology, systems and applications in Augmented Reality. It describes work performed by many different research groups, the purpose behind each new Augmented Reality system, and the difficulties and problems encountered when building some Augmented Reality applications. It surveys the challenges facing mobile augmented reality systems and the requirements for successful mobile systems. This paper summarizes the current applications of Augmented Reality and speculates on future applications and where current research will lead Augmented Reality's development. The challenges augmented reality faces in each of these applications in moving from the laboratory to industry, as well as the future challenges we can forecast, are also discussed. Section 1 gives an introduction to what Augmented Reality is and the motivations for developing this technology. Section 2 discusses Augmented Reality technologies, covering computer vision methods, AR devices, interfaces and systems, and visualization tools. The mobile and wireless systems for Augmented Reality are discussed in Section 3. Four classes of current applications that have been explored are described in Section 4. These applications were chosen because they are the most prominent types of applications encountered in AR research. The future of augmented reality and the challenges it will face are discussed in Section 5.

1,012 citations


Journal ArticleDOI
TL;DR: A database of static images of human faces, taken in an uncontrolled indoor environment using five video surveillance cameras of various qualities, is described to enable robust testing of face recognition algorithms, emphasizing different law enforcement and surveillance use case scenarios.
Abstract: In this paper we describe a database of static images of human faces. Images were taken in an uncontrolled indoor environment using five video surveillance cameras of various qualities. The database contains 4,160 static images (in the visible and infrared spectrum) of 130 subjects. Images from cameras of different quality should mimic real-world conditions and enable robust testing of face recognition algorithms, emphasizing different law enforcement and surveillance use case scenarios. In addition to the database description, this paper also elaborates on possible uses of the database and proposes a testing protocol. A baseline Principal Component Analysis (PCA) face recognition algorithm was tested following the proposed protocol. Other researchers can use these test results as a control algorithm performance score when testing their own algorithms on this dataset. The database is available to the research community through the procedure described at www.scface.org.

483 citations
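The baseline matcher mentioned above can be sketched as a classic eigenface pipeline: project faces onto the top principal components and classify by nearest neighbor. The following is an illustrative sketch with synthetic data, not the authors' implementation or the SCface protocol:

```python
import numpy as np

def pca_train(X, n_components):
    """Learn an eigenface basis from training images.

    X: (n_samples, n_pixels) matrix, one flattened face image per row.
    Returns the mean face and the top principal components (eigenfaces).
    """
    mean = X.mean(axis=0)
    # SVD of the centered data yields the principal components directly.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_project(x, mean, components):
    """Project one flattened image into eigenface space."""
    return components @ (x - mean)

def nearest_neighbor(probe, gallery_feats, labels):
    """Classify a probe by its nearest gallery feature (Euclidean distance)."""
    d = np.linalg.norm(gallery_feats - probe, axis=1)
    return labels[int(np.argmin(d))]
```

In a real evaluation the gallery/probe split, distance metric, and number of components would follow the proposed testing protocol.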


Journal ArticleDOI
TL;DR: This survey is designed for scholars and IT professionals approaching this field, reviewing existing tools and providing a view on the past, the present and the future of digital image forensics.
Abstract: Digital visual media nowadays represent one of the principal means of communication. Lately, the reliability of digital visual information has been questioned, due to the ease of counterfeiting both its origin and its content. Digital image forensics is a brand-new research field which aims at validating the authenticity of images by recovering information about their history. Two main problems are addressed: the identification of the imaging device that captured the image, and the detection of traces of forgeries. Nowadays, thanks to the promising results attained by early studies and to the ever-growing number of applications, digital image forensics represents an appealing investigation domain for many researchers. This survey is designed for scholars and IT professionals approaching this field, reviewing existing tools and providing a view on the past, the present and the future of digital image forensics.

319 citations


Journal ArticleDOI
TL;DR: A new refined definition of soft biometrics is presented, emphasizing the aspect of human compliance, and candidate traits that fit this novel definition are identified.
Abstract: In this work we seek to provide insight on the general topic of soft biometrics. We first present a new refined definition of soft biometrics, emphasizing the aspect of human compliance, and then proceed to identify candidate traits that fit this novel definition. We then address relations between traits and discuss the associated benefits and limitations of these traits. We also consider two novel soft biometric traits, namely weight and color of clothes, and we analyze their reliability. Related promising results on the performance are provided. Finally, we consider a new application, namely human identification carried out solely by a bag of facial, body and accessory soft biometric traits, and as evidence of its practicality, we provide preliminary promising results.

235 citations


Journal ArticleDOI
TL;DR: This paper surveys geo-tagging related research within the context of multimedia and along three dimensions: modalities in which geographical information can be extracted, applications that can benefit from the use of geographical information, and the interplay between modalities and applications.
Abstract: Geo-tagging is a fast-emerging trend in digital photography and community photo sharing. The presence of geographically relevant metadata with images and videos has opened up interesting research avenues within the multimedia and computer vision domains. In this paper, we survey geo-tagging related research within the context of multimedia and along three dimensions: (1) Modalities in which geographical information can be extracted, (2) Applications that can benefit from the use of geographical information, and (3) The interplay between modalities and applications. Our survey will introduce research problems and discuss significant approaches. We will discuss the nature of different modalities and lay out factors that are expected to govern the choices with respect to multimedia and vision applications. Finally, we discuss future research directions in this field.

193 citations


Journal ArticleDOI
TL;DR: This paper surveys the field of event recognition, from interest point detectors and descriptors, to event modelling techniques and knowledge management technologies, and provides an overview of the methods, categorising them according to video production methods and video domains.
Abstract: Research on methods for detection and recognition of events and actions in videos is receiving increasing attention from the scientific community, because of its relevance for many applications, from semantic video indexing to intelligent video surveillance systems and advanced human-computer interaction interfaces. Event detection and recognition requires considering the temporal aspect of video, either at the low level with appropriate features, or at a higher level with models and classifiers that can represent time. In this paper we survey the field of event recognition, from interest point detectors and descriptors, to event modelling techniques and knowledge management technologies. We provide an overview of the methods, categorising them according to video production methods and video domains, and according to the types of events and actions that are typical of these domains.

162 citations


Journal ArticleDOI
TL;DR: A survey on the problems and solutions in Multimedia Data Mining, approached from the following angles: feature extraction, transformation and representation techniques, data mining techniques, and current multimedia data mining systems in various application domains.
Abstract: Advances in multimedia data acquisition and storage technology have led to the growth of very large multimedia databases. Analyzing this huge amount of multimedia data to discover useful knowledge is a challenging problem. This challenge has opened the opportunity for research in Multimedia Data Mining (MDM). Multimedia data mining can be defined as the process of finding interesting patterns from media data such as audio, video, image and text that are not ordinarily accessible by basic queries and associated results. The motivation for doing MDM is to use the discovered patterns to improve decision making. MDM has therefore attracted significant research efforts in developing methods and tools to organize, manage, search and perform domain specific tasks for data from domains such as surveillance, meetings, broadcast news, sports, archives, movies, medical data, as well as personal and online media collections. This paper presents a survey on the problems and solutions in Multimedia Data Mining, approached from the following angles: feature extraction, transformation and representation techniques, data mining techniques, and current multimedia data mining systems in various application domains. We discuss main aspects of feature extraction, transformation and representation techniques. These aspects are: level of feature extraction, feature fusion, features synchronization, feature correlation discovery and accurate representation of multimedia data. Comparison of MDM techniques with state of the art video processing, audio processing and image processing techniques is also provided. Similarly, we compare MDM techniques with the state of the art data mining techniques involving clustering, classification, sequence pattern mining, association rule mining and visualization. We review current multimedia data mining systems in detail, grouping them according to problem formulations and approaches. 
The review includes supervised and unsupervised discovery of events and actions from one or more continuous sequences. We also provide a detailed analysis of what has been achieved and of the remaining gaps on which future research efforts could be focused. We then conclude this survey with a look at open research directions.

122 citations


Journal ArticleDOI
TL;DR: This paper discusses the vision for the future of visual quality assessment research, introduces the area of quality assessment and states its relevance, and describes current standards for gauging algorithmic performance, defining terms that will be used throughout the paper.
Abstract: Creating algorithms capable of predicting the perceived quality of a visual stimulus defines the field of objective visual quality assessment (QA). The field of objective QA has received tremendous attention in the recent past, with many successful algorithms being proposed for this purpose. Our concern here, however, is not with the past; in this paper we discuss our vision for the future of visual quality assessment research. We first introduce the area of quality assessment and state its relevance. We describe current standards for gauging algorithmic performance and define terms that we will use throughout this paper. We then journey through 2D image and video quality assessment. We summarize recent approaches to these problems and discuss in detail our vision for future research on the problems of full-reference and no-reference 2D image and video quality assessment. From there, we move on to the currently popular area of 3D QA. We discuss recent databases, algorithms and 3D quality of experience. This yet-nascent technology provides for tremendous scope in terms of research activities and we summarize each of them. We then move on to more esoteric topics such as algorithmic assessment of aesthetics in natural images and in art. We discuss current research and hypothesize about possible paths to tread. Towards the end of this article, we discuss some other areas of interest including high-definition (HD) quality assessment, immersive environments and so on before summarizing interesting avenues for future work in multimedia (i.e., audio-visual) quality assessment.

119 citations


Journal ArticleDOI
TL;DR: The experimental results suggest that the motion vectors useful for detecting personal highlights varied significantly across viewers; however, the activity in the upper part of the face tended to be more indicative of personal highlights than the activity in the lower part.
Abstract: This paper presents an approach to detect personal highlights in videos based on the analysis of facial activities of the viewer. Our facial activity analysis was based on the motion vectors tracked on twelve key points in the human face. In our approach, the magnitude of the motion vectors represented the degree of a viewer's affective reaction to video contents. We examined 80 facial activity videos recorded for ten participants, each watching eight video clips in various genres. The experimental results suggest that the motion vectors useful for detecting personal highlights varied significantly across viewers. However, it was suggested that the activity in the upper part of the face tended to be more indicative of personal highlights than the activity in the lower part.

109 citations


Journal ArticleDOI
TL;DR: A new audio watermarking algorithm based on singular value decomposition and dither-modulation quantization is presented that is quite robust against attacks including additive white Gaussian noise, MP3 compression, resampling, low-pass filtering, requantization, cropping, echo addition and denoising.
Abstract: Quantization index modulation is one of the best methods for performing blind watermarking, due to its simplicity and good rate-distortion-robustness trade-offs. In this paper, a new audio watermarking algorithm based on singular value decomposition and dither-modulation quantization is presented. The watermark is embedded using dither-modulation quantization of the singular values of the blocks of the host audio signal. The watermark can be blindly extracted without the knowledge of the original audio signal. Subjective and objective tests confirm high imperceptibility achieved by the proposed scheme. Moreover, the scheme is quite robust against attacks including additive white Gaussian noise, MP3 compression, resampling, low-pass filtering, requantization, cropping, echo addition and denoising. The watermark data payload of the algorithm is 196 bps. Performance analysis of the proposed scheme shows low error probability rates.

105 citations
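The embedding primitive described above, dither-modulation quantization of a block's largest singular value, can be sketched as below. This is a simplified illustration of the general SVD/QIM technique, not the paper's exact algorithm; the 8-column block shape and the quantization step are our assumptions:

```python
import numpy as np

def embed_bit(block, bit, step):
    """Embed one watermark bit into an audio block by dither-modulation
    quantization of its largest singular value.

    block: 1-D array of audio samples, reshaped into a small matrix.
    The dither shifts the quantization lattice by step/2 for bit 1.
    """
    M = block.reshape(8, -1)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    d = 0.0 if bit == 0 else step / 2.0
    s[0] = np.round((s[0] - d) / step) * step + d  # snap onto the bit's lattice
    return (U @ np.diag(s) @ Vt).ravel()

def extract_bit(block, step):
    """Blind extraction: decide which lattice the largest singular value
    of the received block is closer to (no original signal needed)."""
    M = block.reshape(8, -1)
    s = np.linalg.svd(M, compute_uv=False)
    r0 = abs(s[0] - np.round(s[0] / step) * step)
    r1 = abs(s[0] - (np.round((s[0] - step / 2) / step) * step + step / 2))
    return 0 if r0 <= r1 else 1
```

The step size trades imperceptibility against robustness: a larger step survives stronger attacks but perturbs the host signal more.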


Journal ArticleDOI
TL;DR: A comprehensive survey of recent research and applications on online georeferenced media; based on the current technical achievements, open research issues and challenges are identified, and directions that can lead to compelling applications are suggested.
Abstract: In recent years, the emergence of georeferenced media, like geotagged photos, on the Internet has opened up a new world of possibilities for geography-related research and applications. Despite its short history, georeferenced media has been attracting attention from several major research communities: Computer Vision, Multimedia, Digital Libraries and KDD. This paper provides a comprehensive survey of recent research and applications on online georeferenced media. Specifically, the survey focuses on four aspects: (1) organizing and browsing georeferenced media resources, (2) mining semantic/social knowledge from georeferenced media, (3) learning landmarks in the world, and (4) estimating the geographic location of a photo. Furthermore, based on the current technical achievements, open research issues and challenges are identified, and directions that can lead to compelling applications are suggested.

Journal ArticleDOI
TL;DR: This paper first analyzes the common limitations of the original PVD and its modified versions, and then proposes a more secure steganography based on a content adaptive scheme that achieves much better security compared with the previous PVD-based methods.
Abstract: Pixel-value differencing (PVD) based steganography is one of the most popular approaches for secret data hiding in the spatial domain. However, based on extensive experiments, we find that some statistical artifacts will inevitably be introduced even with a low embedding capacity in most existing PVD-based algorithms. In this paper, we first analyze the common limitations of the original PVD and its modified versions, and then propose a more secure steganography based on a content adaptive scheme. In our method, a cover image is first partitioned into small squares. Each square is then rotated by a random angle of 0, 90, 180 or 270 degrees. The resulting image is then divided into non-overlapping embedding units of three consecutive pixels, and the middle one is used for data embedding. The number of embedded bits depends on the differences among the three pixels. To preserve the local statistical features, the sort order of the three pixel values will remain the same after data hiding. Furthermore, the new method can adaptively use sharper edge regions for data hiding first, while preserving other smoother regions, by adjusting a parameter. The experimental results evaluated on a large image database show that our method achieves much better security compared with the previous PVD-based methods.
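A heavily simplified sketch of the three-pixel embedding unit described above: the local differences decide how many bits the middle pixel carries, and the decoder recomputes that capacity blindly. The thresholds here are illustrative, and the paper's square rotation and sort-order preservation steps are omitted:

```python
import math

def unit_capacity(a, b, c):
    """Bits the middle pixel of a 3-pixel unit can carry: larger local
    differences (edge regions) permit more bits. The thresholds are
    illustrative, not the paper's exact quantization table."""
    d = min(abs(b - a), abs(c - b))
    return 1 if d < 2 else min(4, int(math.log2(d)))

def embed_unit(a, b, c, bits):
    """Replace the k least-significant bits of the middle pixel.
    (The paper additionally adjusts the result so the sort order of
    the three pixels is preserved; that refinement is omitted here.)"""
    k = len(bits)
    return (b >> k << k) | int(bits, 2)

def extract_unit(a, new_b, c):
    """Blind extraction: the decoder recomputes the capacity from the
    (modified) unit, so the capacity must be stable under embedding."""
    k = unit_capacity(a, new_b, c)
    return format(new_b & ((1 << k) - 1), f'0{k}b')
```

In a full scheme the capacity function must be designed so that embedding cannot change the capacity the decoder recomputes; the simple thresholds above only guarantee that for suitable pixel values.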

Journal ArticleDOI
TL;DR: This paper has focused on investigating the user-perceived experience of olfaction-enhanced multimedia applications, with the aim of discovering the quality evaluation factors that are important from a user's perspective of these applications, and consequently ensuring the continued advancement and success of olfaction-enhanced multimedia applications.
Abstract: Olfaction, or smell, is one of the last challenges which multimedia and multimodal applications have to conquer. Enhancing such applications with olfactory stimuli has the potential to create a richer, more complex user multimedia experience, by heightening the sense of reality and diversifying user interaction modalities. Nonetheless, olfaction-enhanced multimedia still remains a challenging research area. More recently, however, there have been initial signs of olfaction-enhanced applications in multimedia, with olfaction being used towards a variety of goals, including notification alerts, enhancing the sense of reality in immersive applications, and branding, to name but a few. However, as the goal of a multimedia application is to inform and/or entertain users, achieving quality olfaction-enhanced multimedia applications from the users' perspective is vital to the success and continuity of these applications. Accordingly, in this paper we focus on investigating the user-perceived experience of olfaction-enhanced multimedia applications, with the aim of discovering the quality evaluation factors that are important from a user's perspective, and consequently ensuring the continued advancement and success of olfaction-enhanced multimedia applications.

Journal ArticleDOI
TL;DR: This work is concerned with the problem of accurately finding the location where a photo was taken without needing any metadata, that is, solely by its visual content; it shows that the time is right for automating the geo-tagging process and demonstrates how this can work at large scale.
Abstract: New applications are emerging every day exploiting the huge data volume in community photo collections. Most focus on popular subsets, e.g., images containing landmarks or associated with Wikipedia articles. In this work we are concerned with the problem of accurately finding the location where a photo is taken without needing any metadata, that is, solely by its visual content. We also recognize landmarks where applicable, automatically linking them to Wikipedia. We show that the time is right for automating the geo-tagging process, and we show how this can work at large scale. In doing so, we exploit the redundancy of content in popular locations, but unlike most existing solutions, we do not restrict ourselves to landmarks. In other words, we can compactly represent the visual content of thousands of images depicting, e.g., the Parthenon and still retrieve any single, isolated, non-landmark image like a house or graffiti on a wall. Starting from an existing, geo-tagged dataset, we cluster images into sets of different views of the same scene. This is a very efficient, scalable, and fully automated mining process. We then align all views in a set to one reference image and construct a 2D scene map. Our indexing scheme operates directly on scene maps. We evaluate our solution on a challenging one million urban image dataset and provide public access to our service through our online application, VIRaL.

Journal ArticleDOI
Zhenjun Tang1, Shuozhong Wang1, Xinpeng Zhang1, Weimin Wei1, Yan Zhao1 
TL;DR: Under the proposed framework, a hashing scheme using discrete cosine transform (DCT) and non-negative matrix factorization (NMF) is implemented, and experimental results show that the proposed scheme is resistant to normal content-preserving manipulations, and has a very low collision probability.
Abstract: An image hash is a content-based compact representation of an image for applications such as image copy detection, digital watermarking, and image authentication. This paper proposes a lexicographical-structured framework to generate image hashes. The system consists of two parts: dictionary construction and maintenance, and hash generation. The dictionary is a large collection of feature vectors called words, representing characteristics of various image blocks. It is composed of a number of sub-dictionaries, and each sub-dictionary contains many features, the number of which grows as the number of training images increases. The dictionary is used to provide basic building blocks, namely the words, to form the hash. In hash generation, blocks of the input image are represented by features associated with the sub-dictionaries. This is achieved by using a similarity metric to find the most similar feature among the selected features of each sub-dictionary. The corresponding features are combined to produce an intermediate hash. The final hash is obtained by encoding the intermediate hash. Under the proposed framework, we have implemented a hashing scheme using the discrete cosine transform (DCT) and non-negative matrix factorization (NMF). Experimental results show that the proposed scheme is resistant to normal content-preserving manipulations, and has a very low collision probability.
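The block-to-word lookup at the heart of the hash generation described above can be sketched as follows, with low-frequency DCT coefficients standing in for the block features. The NMF stage and the sub-dictionary organization are omitted, and all parameters are illustrative:

```python
import numpy as np

def dct2(block):
    """2-D DCT-II via the orthonormal DCT matrix (no SciPy dependency)."""
    n = block.shape[0]
    k = np.arange(n).reshape(-1, 1)
    C = np.cos(np.pi * (2 * np.arange(n) + 1) * k / (2 * n)) * np.sqrt(2 / n)
    C[0] /= np.sqrt(2)
    return C @ block @ C.T

def block_feature(block, keep=4):
    """Low-frequency DCT coefficients as a compact block descriptor."""
    return dct2(block.astype(float))[:keep, :keep].ravel()

def hash_blocks(blocks, dictionary):
    """Map each image block to the index of its most similar dictionary
    word (Euclidean distance); the index sequence is the intermediate hash."""
    return [int(np.argmin(np.linalg.norm(dictionary - block_feature(b), axis=1)))
            for b in blocks]
```

A final encoding step (e.g. compressing or scrambling the index sequence) would turn this intermediate hash into the transmitted hash.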

Journal ArticleDOI
TL;DR: A probabilistic Bayesian belief network (BBN) method for automatic indexing of excitement clips of sports video sequences and offers a general approach to the automatic tagging of large scale multimedia content with rich semantics.
Abstract: This paper presents a probabilistic Bayesian belief network (BBN) method for automatic indexing of excitement clips of sports video sequences. The excitement clips are extracted from sports video sequences using audio features. The excitement clips comprise multiple subclips corresponding to events such as replays, field views, close-ups of players, close-ups of referees/umpires, spectators, and players' gatherings. The events are detected and classified using a hierarchical classification scheme. The BBN based on observed events is used to assign semantic concept labels to the excitement clips, such as goal, save, and card in soccer video, and wicket and hit in cricket video sequences. The BBN-based indexing results are compared with our previously proposed event-association based approach, and the BBN was found to perform better. The proposed scheme provides a generalizable method for linking low-level video features with high-level semantic concepts. The generic nature of the proposed approach in the sports domain is validated by demonstrating successful indexing of soccer and cricket video excitement clips. The proposed scheme offers a general approach to the automatic tagging of large-scale multimedia content with rich semantics. The collection of labeled excitement clips provides a video summary for highlight browsing, video skimming, indexing and retrieval.
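The concept-labeling step can be illustrated with a minimal two-layer belief network in which detected events are conditionally independent given the concept, a naive-Bayes simplification of the paper's BBN. The concepts, priors and likelihoods below are made up for illustration:

```python
from math import prod

def posterior(prior, likelihoods, observed):
    """P(concept | observed events) for a two-layer belief network where
    event observations are conditionally independent given the concept.

    prior:       {concept: P(concept)}
    likelihoods: {concept: {event: P(event observed | concept)}}
    observed:    set of events detected in the clip
    """
    joint = {
        c: prior[c] * prod(
            likelihoods[c][e] if e in observed else 1.0 - likelihoods[c][e]
            for e in likelihoods[c])
        for c in prior
    }
    z = sum(joint.values())  # normalize over all candidate concepts
    return {c: p / z for c, p in joint.items()}
```

The clip is assigned the concept label with the highest posterior; a full BBN would also model dependencies between events rather than assuming independence.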

Journal ArticleDOI
TL;DR: The when, where, and what of content adaptation is introduced to help classify the content adaptation techniques and to select the appropriate techniques for a given content delivery environment.
Abstract: With the continued increase in the use of smartphones, user expectations of content access have also increased. Most of the content that exists today is not designed for mobile devices. Mobile devices cannot directly access most of this content due to the mismatch between device capabilities and content playback requirements. Content adaptation is an essential tool that bridges the gap between device capabilities and content formats. In this paper we present an overview of content adaptation and survey recent papers on content adaptation for mobile devices. We introduce the when, where, and what of content adaptation to help classify content adaptation techniques and to select the appropriate techniques for a given content delivery environment.

Journal ArticleDOI
TL;DR: This work has designed and developed a system to enable the embedding of selective interactive elements into the original text in appropriate locations, which act as triggers for the video translation into sign language, which significantly simplifies the expansion and availability of additional accessibility functions to web developers.
Abstract: The World Wide Web is becoming increasingly necessary for everybody regardless of age, gender, culture, health and individual disabilities. Unfortunately, there are evidently still problems for some deaf and hard of hearing people trying to use certain web pages. These people require the translation of existing written information into their first language, which can be one of many sign languages. In previous technological solutions, the video window dominates the screen, interfering with the presentation and thereby distracting the general public, who have no need of a bilingual web site. One solution to this problem is the development of transparent sign language videos which appear on the screen on request. Therefore, we have designed and developed a system to enable the embedding of selective interactive elements into the original text in appropriate locations, which act as triggers for the video translation into sign language. When the short video clip terminates, the video window is automatically closed and the original web page is shown. In this way, the system significantly simplifies the expansion and availability of additional accessibility functions to web developers, as it preserves the original web page with the addition of a web layer of sign language video. Quantitative and qualitative evaluation has demonstrated that information presented through a transparent sign language video increases the users' interest in the content of the material by interpreting terms, phrases or sentences, and therefore facilitates the understanding of the material and increases its usefulness for deaf people.

Journal ArticleDOI
TL;DR: This study builds on an interdisciplinary confluence of insights from image processing, data mining, human computer interaction, and sociology to describe the folksonomic features of users, annotations and images.
Abstract: The plethora of social actions and annotations (tags, comments, ratings) from online media sharing Websites and collaborative games have induced a paradigm shift in the research on image semantic interpretation. Social inputs with their added context represent a strong substitute for expert annotations. Novel algorithms have been designed to fuse visual features with noisy social labels and behavioral signals. In this survey, we review nearly 200 representative papers to identify the current trends, challenges as well as opportunities presented by social inputs for research on image semantics. Our study builds on an interdisciplinary confluence of insights from image processing, data mining, human computer interaction, and sociology to describe the folksonomic features of users, annotations and images. Applications are categorized into four types: concept semantics, person identification, location semantics and event semantics. The survey concludes with a summary of principal research directions for the present and the future.

Journal ArticleDOI
TL;DR: This paper proposes a novel high capacity robust audio watermarking algorithm by using the high frequency band of the wavelet decomposition at which the human auditory system (HAS) is not very sensitive to alteration.
Abstract: This paper proposes a novel high-capacity robust audio watermarking algorithm using the high-frequency band of the wavelet decomposition, to whose alteration the human auditory system (HAS) is not very sensitive. The main idea is to divide the high-frequency band into frames and, for embedding, to change the wavelet samples depending on the average of the relevant frame's samples. The experimental results show that the method has a very high capacity (about 11,000 bps), without significant perceptual distortion (ODG in [−1, 0] and SNR about 30 dB), and provides robustness against common audio signal processing such as additive noise, filtering, echo and MPEG compression (MP3).
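The frame-average embedding idea can be sketched with a single-level Haar decomposition: each detail-band (high-frequency) frame is shifted so its mean lands on a lattice point encoding one bit. This is a stand-in for the paper's average-based sample modification; the wavelet, frame size and quantization step are our assumptions:

```python
import numpy as np

def haar_analyze(x):
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation (low band)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail (high band, used for embedding)
    return a, d

def haar_synthesize(a, d):
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def embed(signal, bits, frame=64, step=0.01):
    """Shift each detail-band frame so its mean falls on an even (bit 0)
    or odd (bit 1) multiple of `step`."""
    a, d = haar_analyze(signal)
    d = d.copy()
    for i, bit in enumerate(bits):
        f = d[i * frame:(i + 1) * frame]
        q = np.round(f.mean() / step)
        if int(q) % 2 != bit:
            q += 1
        f += q * step - f.mean()           # move the frame mean onto the target lattice
    return haar_synthesize(a, d)

def extract(signal, n_bits, frame=64, step=0.01):
    """Blind extraction: read each frame mean's lattice parity."""
    _, d = haar_analyze(signal)
    return [int(np.round(d[i * frame:(i + 1) * frame].mean() / step)) % 2
            for i in range(n_bits)]
```

Because the Haar transform is orthogonal, the modified frame means survive the synthesis/analysis round trip exactly; robustness to attacks then depends on the step size.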

Journal ArticleDOI
TL;DR: A comprehensive survey of the technical achievements in the research area of content-based tag processing for social images, covering the research aspects on tag ranking, tag refinement and tag-to-region assignment is provided.
Abstract: Online social media services such as Flickr and Zooomr allow users to share their images with others for social interaction. An important feature of these services is that the users manually annotate their images with freely-chosen tags, which can be used as indexing keywords for image search and other applications. However, since the tags are generally provided by grassroots Internet users, there is still a gap between these tags and the actual content of the images. This deficiency has significantly limited tag-based applications while, on the other hand, posing a new challenge to the multimedia research community. It calls for a series of research efforts for processing these unqualified tags, especially in making use of content analysis techniques to improve the descriptive power of the tags with respect to the image contents. This paper provides a comprehensive survey of the technical achievements in the research area of content-based tag processing for social images, covering the research aspects of tag ranking, tag refinement and tag-to-region assignment. We review the research advances for each topic and present a brief suggestion for future promising directions.

Journal ArticleDOI
TL;DR: An interactive system for recognizing flower images taken by digital cameras is presented, allowing each user to draw an appropriate bounding window that contains the flower region of interest; a flower boundary tracing method is developed to extract the flower region as accurately as possible.
Abstract: In this paper, we present an interactive system for recognizing flower images taken by digital cameras. The proposed system provides an interactive interface allowing each user to draw an appropriate bounding window that contains the flower region of interest. Then, a flower boundary tracing method is developed to extract the flower region as accurately as possible. In addition to the color and shape features of the whole flower region, the color and shape features of the pistil/stamen area are also used to represent the flower characteristics more precisely. Experiments conducted on two distinct databases consisting of 24 species and 102 species have shown that our proposed system outperforms other approaches in terms of recognition rate.

Journal ArticleDOI
TL;DR: A novel self-embedding watermarking scheme is proposed, in which the reference data derived from the most significant bits of host image and the localization dataderived from MSB and reference data are embedded into the least significant bits (LSB) of the cover.
Abstract: A novel self-embedding watermarking scheme is proposed, in which the reference data derived from the most significant bits (MSB) of the host image and the localization data derived from the MSB and reference data are embedded into the least significant bits (LSB) of the cover. At the authentication side, the localization data are used to detect the blocks containing substitute information, while the reference data extracted from other regions and the spatial correlation are exploited to recover the principal content in the tampered area in a pixel-by-pixel manner. In this scheme, the smaller the tampered area, the higher the quality of the recovered content.
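A toy sketch of the MSB-to-LSB self-embedding idea: here a per-pixel parity check stands in for the paper's block-based reference and localization data, so all names and the parity scheme are assumptions for illustration only:

```python
import numpy as np

def embed_parity_watermark(img):
    """Store the parity of each pixel's 5 MSBs in its own LSB.

    Writing the LSB never disturbs the MSBs, so the check data
    survives embedding -- the core trick behind MSB->LSB schemes.
    """
    msb = img >> 3  # keep the 5 most significant bits
    parity = np.unpackbits(msb[..., None], axis=-1).sum(axis=-1) & 1
    return (img & 0xFE) | parity.astype(img.dtype)

def detect_tamper(img):
    """Flag pixels whose stored parity no longer matches their MSBs."""
    msb = img >> 3
    parity = np.unpackbits(msb[..., None], axis=-1).sum(axis=-1) & 1
    return (img & 1) != parity
```

A real scheme would additionally embed downsampled reference content so flagged regions can be recovered, not just localized.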

Journal ArticleDOI
TL;DR: The use of Priority-based Media Delivery for Scalable Video Coding for SVC to overcome link interruptions and channel bitrate reductions in mobile networks by performing a transmission scheduling algorithm that prioritizes media data according to its importance is presented.
Abstract: Media delivery, especially video delivery over mobile channels, may be affected by transmission bitrate variations or temporary link interruptions caused by changes in the channel conditions or the wireless interface. In this paper, we present the use of Priority-based Media Delivery (PMD) for Scalable Video Coding (SVC) to overcome link interruptions and channel bitrate reductions in mobile networks by performing a transmission scheduling algorithm that prioritizes media data according to its importance. The proposed approach comprises a priority-based media pre-buffer to overcome periods of reduced connectivity. The PMD algorithm aims to use the same transmission bitrate and overall buffer size as the traditional streaming approach, yet is more likely to overcome interruptions and reduced-bitrate periods. PMD achieves longer continuous playback than the traditional approach, avoiding disruptions in the video playout and therefore improving the video playback quality. We analyze the use of SVC with PMD in the traditional RTP streaming and adaptive HTTP streaming contexts. We show the benefits of using SVC in terms of received quality during interruptions and re-buffering time, i.e. the time required to fill a desired pre-buffer at the receiver. We present a quality optimization approach for PMD and show results for different interruption/bitrate-reduction scenarios.
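The priority-based pre-buffering idea can be sketched as follows; the tuple layout, unit sizes, and greedy fill are illustrative assumptions, not the actual PMD scheduler:

```python
def schedule_prebuffer(units, capacity):
    """Toy priority-based pre-buffering in the spirit of PMD.

    `units` are (timestamp, layer, size) tuples, where a lower layer is
    more important (SVC base layer = 0).  Instead of buffering in pure
    playback order, the buffer is filled with the most important layers
    first, so an interruption still leaves a decodable base-layer stream
    covering a longer stretch of the timeline.
    """
    buffered, used = [], 0
    # importance first (base layer before enhancement), then playback order
    for ts, layer, size in sorted(units, key=lambda u: (u[1], u[0])):
        if used + size <= capacity:
            buffered.append((ts, layer))
            used += size
    return sorted(buffered)  # hand back in playback order
```

With the same buffer size, a playback-order scheduler would hold full quality for a short prefix; this sketch instead holds base quality for the whole prefetched window.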

Journal ArticleDOI
TL;DR: This paper presents a method to adapt the playback velocity of the video to the temporal information density, so that users can explore the video under controlled cognitive load, and shows its advantages over motion-based measures.
Abstract: Automated video analysis lacks reliability when searching for unknown events in video data. The practical approach is to watch all the recorded video data, if applicable in fast-forward mode. In this paper we present a method to adapt the playback velocity of the video to the temporal information density, so that users can explore the video under controlled cognitive load. The proposed approach can cope with static changes and is robust to video noise. First, we formulate temporal information as a symmetrized Rényi divergence, deriving this measure from signal coding theory. Further, we discuss the animated visualization of accelerated video sequences and propose a physiologically motivated blending approach to cope with arbitrary playback velocities. Finally, we compare the proposed method with current approaches in this field through experiments and a qualitative user study, and show its advantages over motion-based measures.
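The symmetrized Rényi divergence used as the temporal-information measure can be sketched for discrete frame histograms. The order alpha in (0, 1) and the symmetrization by summing both directions are assumptions of this sketch, not necessarily the paper's exact formulation:

```python
import math

def renyi_divergence(p, q, alpha=0.5):
    """Rényi divergence D_alpha(P || Q) for discrete distributions.

    D_alpha = 1/(alpha-1) * log(sum_i p_i^alpha * q_i^(1-alpha)),
    with zero-probability bins skipped (standard for alpha < 1).
    """
    s = sum(pi ** alpha * qi ** (1.0 - alpha)
            for pi, qi in zip(p, q) if pi > 0 and qi > 0)
    return math.log(s) / (alpha - 1.0)

def symmetrized_renyi(p, q, alpha=0.5):
    """Symmetric temporal-information score between two frame histograms."""
    return renyi_divergence(p, q, alpha) + renyi_divergence(q, p, alpha)
```

Applied to histograms of consecutive frames, a high score marks an information-dense moment where playback should slow down, and a score near zero marks static content that can be skimmed quickly.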

Journal ArticleDOI
TL;DR: A video watermarking algorithm which embeds different parts of a single watermark into different shots of a video under the wavelet domain, which is not only less perceptible but also robust against common video processing attacks.
Abstract: In this paper, we propose a video watermarking algorithm which embeds different parts of a single watermark into different shots of a video in the wavelet domain. Based on a Motion Activity Analysis, different regions of the original video are separated into perceptually distinct categories according to motion information and region complexity. The locations of the watermark are thus adjusted adaptively in accordance with the human visual system and signal characteristics, which makes the watermark perceptually invisible and less vulnerable to automated removal. In addition, contrary to traditional methods where the watermark remains at a fixed position on the screen, the watermark moves along with moving objects, so motion artefacts are avoided. The multi-frame extraction strategy ensures that the watermark can be correctly recovered from a very short segment of video. Individual frames extracted from the video also contain watermark information. Experimental results show that the inserted watermark is not only less perceptible but also robust against common video processing attacks.
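A minimal wavelet-domain embedding sketch, using a hand-rolled one-level Haar transform on a 1-D signal. The paper's motion-adaptive, shot-wise scheme is far richer; every function name and the sign-based embedding below are illustrative assumptions:

```python
def haar_dwt(signal):
    """One-level Haar transform: per-pair averages and differences."""
    avg = [(a + b) / 2.0 for a, b in zip(signal[::2], signal[1::2])]
    diff = [(a - b) / 2.0 for a, b in zip(signal[::2], signal[1::2])]
    return avg, diff

def haar_idwt(avg, diff):
    """Exact inverse of haar_dwt."""
    out = []
    for a, d in zip(avg, diff):
        out.extend([a + d, a - d])
    return out

def embed_bit(signal, bit, strength=1.0):
    """Force one detail coefficient's sign to carry the watermark bit."""
    avg, diff = haar_dwt(signal)
    diff[0] = strength if bit else -strength
    return haar_idwt(avg, diff)

def extract_bit(signal):
    """Read the watermark bit back from the detail coefficient's sign."""
    _, diff = haar_dwt(signal)
    return 1 if diff[0] > 0 else 0
```

Embedding in detail (high-frequency) coefficients is what lets perceptual models tune the strength per region, as the motion-adaptive scheme above does.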

Journal ArticleDOI
TL;DR: A weighted network Voronoi diagram is designed, and a high-performance multilevel range search method is proposed that retrieves the set of objects located in a specified region within the search range.
Abstract: Due to the universality and importance of range search query processing in mobile and spatial databases as well as in geographic information systems (GIS), numerous range search algorithms have been proposed in recent years. However, ordinary range search queries focus only on a specific type of point object. For queries that must retrieve objects of interest located in a particular region, ordinary range search cannot produce the expected results. In addition, most existing range search methods need to perform a search on each road segment within the pre-defined range, which decreases their performance. In this paper, we design a weighted network Voronoi diagram and propose a high-performance multilevel range search method that retrieves the set of objects located in a specified region within the search range. The experimental results show that our proposed algorithm runs efficiently and outperforms its main competitor.
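The query semantics, objects within a given network distance, can be sketched with a plain Dijkstra expansion. A weighted network Voronoi diagram would precompute per-object regions to avoid scanning every road segment; this brute-force sketch (with assumed data layouts) only illustrates what the query returns:

```python
import heapq

def network_range_search(graph, objects, source, radius):
    """Objects within network distance `radius` of `source`.

    `graph`: {node: [(neighbor, edge_weight), ...]} -- the road network.
    `objects`: {node: [object_ids]} -- objects placed at nodes for simplicity.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    found = []
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        found.extend(objects.get(u, []))
        for v, w in graph.get(u, []):
            nd = d + w
            if nd <= radius and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return found
```

Because this expands every reachable edge inside the radius, its cost grows with the network density, which is exactly the overhead the precomputed Voronoi structure is meant to remove.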

Journal ArticleDOI
TL;DR: A firmer understanding is envisioned of the broad topic of multimedia quality assessment, of the various sub-disciplines corresponding to different signal types, how these signal types interact in producing an overall user experience, and what directions of research remain to be pursued.
Abstract: We survey recent developments in multimedia signal quality assessment, including image, audio, video, and combined signals. Such an overview is timely given the recent explosion in all-digital sensory entertainment and communication devices pervading the consumer space. Owing to the sensory nature of these signals, perceptual models lie at the heart of multimedia signal quality assessment algorithms. We survey these models and recent competitive algorithms and discuss comparison studies that others have conducted. In this context we also describe existing signal quality assessment databases. We envision that the reader will gain a firmer understanding of the broad topic of multimedia quality assessment, of the various sub-disciplines corresponding to different signal types, of how these signal types interact in producing an overall user experience, and of what directions of research remain to be pursued.

Journal ArticleDOI
TL;DR: An efficient one-pass algorithm for shot boundary detection and a cost-effective anchor shot detection method with search space reduction are proposed, forming a unified scheme for news video story parsing.
Abstract: In this paper, we propose an efficient one-pass algorithm for shot boundary detection and a cost-effective anchor shot detection method with search space reduction, which form a unified scheme for news video story parsing. First, we present the desired requirements for shot boundary detection from the perspective of news video story parsing, and propose a new shot boundary detection method based on singular value decomposition together with a newly developed algorithm, viz. Kernel-ART, which meets all of these requirements. Second, we propose a new anchor shot detection system, viz. MASD, which detects anchor persons cost-effectively by reducing the search space. It consists of a skin color detector, a face detector, and support vector data descriptions with non-negative matrix factorization, applied sequentially. The experimental results, together with a qualitative analysis, illustrate the efficiency of the proposed method.
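The SVD-based shot boundary idea can be sketched on per-frame feature vectors. The rank, threshold, and jump criterion below are illustrative assumptions, not the Kernel-ART algorithm:

```python
import numpy as np

def shot_boundaries(features, rank=3, threshold=0.5):
    """Detect shot boundaries from a (n_frames, dim) feature matrix.

    Following the SVD idea, frames (e.g. color histograms) are projected
    onto the top singular vectors of the centered feature matrix, and a
    boundary is declared wherever consecutive projections jump by more
    than `threshold`.  Returns the indices of the first frames of new shots.
    """
    _, _, vt = np.linalg.svd(features - features.mean(axis=0),
                             full_matrices=False)
    proj = features @ vt[:rank].T          # low-rank frame signatures
    jumps = np.linalg.norm(np.diff(proj, axis=0), axis=1)
    return [i + 1 for i, j in enumerate(jumps) if j > threshold]
```

A one-pass variant, as in the paper, would maintain the decomposition incrementally instead of factorizing the whole matrix at once.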

Journal ArticleDOI
TL;DR: A novel composite model descriptor is presented, which takes into account both visual and geometric characteristics of 3D models, and an original mapping mechanism from low-level model features to high-level semantic concepts based on the user’s retrieval history is proposed.
Abstract: Since object similarity is a subjective matter, the gap between low-level feature representations and high-level semantic concepts is a major problem in content-based 3D model retrieval. This paper presents a novel composite model descriptor, which takes into account both visual and geometric characteristics of 3D models. It also proposes an original mapping mechanism from low-level model features to high-level semantic concepts based on the user's retrieval history, so the method belongs to the family of long-term relevance feedback algorithms for 3D model retrieval. Finally, an effective 3D model retrieval system, "ModelSeek", has been built and implemented with the introduced model descriptor and mapping mechanism. The experimental results show that these approaches not only significantly improve retrieval performance, but also achieve better retrieval effectiveness than state-of-the-art techniques on the publicly available Princeton Shape Benchmark (PSB) under several standard evaluation measures.