
Showing papers in "IEEE MultiMedia in 2016"


Journal ArticleDOI
TL;DR: In discussing the rationale behind the vision for JPEG Pleno and how the new standardization initiative aims to reinvent the future of imaging, the authors review plenoptic representation and its underlying practical implications and challenges in implementing real-world applications with an enhanced quality of experience.
Abstract: In discussing the rationale behind the vision for JPEG Pleno and how the new standardization initiative aims to reinvent the future of imaging, the authors review plenoptic representation and its underlying practical implications and challenges in implementing real-world applications with an enhanced quality of experience.

158 citations


Journal ArticleDOI
TL;DR: As Rosalind W. Picard reflects on the events that moved her from research in the lab to a real-world application, who would have expected that efforts to develop algorithms to perceive multimodal inputs would lead to a wearable that detects signals related to deep brain activation and issues potentially life-saving alerts?
Abstract: As Rosalind W. Picard reflects on the events that moved her from research in the lab to a real-world application, she can't help but think... who would have expected efforts to develop algorithms to perceive multimodal inputs would lead to a wearable that detects signals related to deep brain activation and issues potentially life-saving alerts?

60 citations


Journal ArticleDOI
TL;DR: A collaborative sparse coding framework is proposed that integrates the classifier training process and sparse coding process into a unified collaborative filtering framework that lets more discriminative sparse video representations and classifiers be learned by optimizing the dictionary and classifier jointly.
Abstract: Multiview action recognition has received increasing attention over the past decade. Various approaches have been proposed to extract view-invariant features; among them, self-similarity matrices (SSMs) have shown outstanding performance. However, SSMs become sensitive when there's a very large view change. To make SSMs more robust to viewpoint changes, the authors propose a collaborative sparse coding framework. They integrate the classifier training process and sparse coding process into a unified collaborative filtering framework; this lets more discriminative sparse video representations and classifiers be learned by optimizing the dictionary and classifier jointly. Experimental results demonstrate the effectiveness of the framework.

58 citations


Journal ArticleDOI
TL;DR: A novel image encryption algorithm is designed based on autoblocking and a medical electrocardiography (ECG) signal, using the Wolf algorithm to generate initial conditions for the chaotic maps.
Abstract: A novel image encryption algorithm is designed based on autoblocking and a medical electrocardiography (ECG) signal. The method uses a chaotic logistic map and a generalized Arnold map. To solve deterministic input problems, the method uses an ECG signal and the Wolf algorithm to generate initial conditions for the chaotic maps. Compared with traditional cryptoarchitectures, this system performs the autoblocking diffusion operation only in the encryption process. The keystream is generated by a control parameter produced from the plain-image, making the system secure against chosen-plaintext and known-plaintext attacks. Experimental results show that the proposed algorithm can achieve high security with good performance.

45 citations
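The keystream idea described in the abstract, iterating a chaotic map from a signal-derived seed and diffusing pixel data with the result, can be illustrated with a minimal sketch. The seed value standing in for the ECG-derived initial condition, the map parameter, and the byte-extraction step are all hypothetical here; the actual algorithm additionally couples a generalized Arnold map with autoblocking diffusion.

```python
def logistic_keystream(x0, length, r=3.99):
    """Illustrative chaotic keystream from a logistic map.

    x0 stands in for an initial condition derived from an ECG signal
    (hypothetical here); r is a control parameter chosen in the chaotic
    regime of the map x -> r * x * (1 - x).
    """
    stream = []
    x = x0
    for _ in range(length):
        x = r * x * (1.0 - x)              # iterate the logistic map
        stream.append(int(x * 256) % 256)  # quantize the state to one byte
    return stream

def encrypt_block(pixels, x0):
    # XOR-diffuse a block of pixel bytes with the chaotic keystream
    ks = logistic_keystream(x0, len(pixels))
    return [p ^ k for p, k in zip(pixels, ks)]
```

Because XOR is its own inverse, running `encrypt_block` again with the same seed recovers the original block, which is what makes the seed (here, the ECG-derived value) act as the key.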


Journal ArticleDOI
TL;DR: The authors shed some light on the recent developments of the JPEG committee and discuss both the current status of JPEG XT and its future plans.
Abstract: The Joint Photographic Experts Group recently produced a new standard, JPEG XT (JPEG eXTension). JPEG XT is backward-compatible with legacy JPEG and offers the ability to encode images of higher precision and higher dynamic range, in lossy or lossless modes. Here, the authors shed some light on the recent developments of the JPEG committee and discuss both the current status of JPEG XT and its future plans.

43 citations


Journal ArticleDOI
TL;DR: Experimental results show that this in-loop filter design can significantly improve the compression performance of the High Efficiency Video Coding (HEVC) standard, leading us in a new direction for improving compression efficiency.
Abstract: In-loop filtering has emerged as an essential coding tool since H.264/AVC, due to its delicate design, which reduces different kinds of compression artifacts. However, existing in-loop filters rely only on local image correlations, largely ignoring nonlocal similarities. In this article, the authors explore the design philosophy of in-loop filters and discuss their vision for the future of in-loop filter research by examining the potential of nonlocal similarities. Specifically, the group-based sparse representation, which jointly exploits an image's local and nonlocal self-similarities, lays a novel and meaningful groundwork for in-loop filter design. Hard- and soft-thresholding filtering operations are applied to derive the sparse parameters that are appropriate for compression artifact reduction. Experimental results show that this in-loop filter design can significantly improve the compression performance of the High Efficiency Video Coding (HEVC) standard, leading us in a new direction for improving compression efficiency.

41 citations
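The hard- and soft-thresholding operations named in the abstract are standard sparse-coding primitives for suppressing small (noise-dominated) transform coefficients. A minimal sketch follows; the threshold value and coefficient layout are illustrative, not the HEVC in-loop filter itself.

```python
def hard_threshold(c, t):
    # keep a transform coefficient only if its magnitude exceeds t
    return c if abs(c) > t else 0.0

def soft_threshold(c, t):
    # shrink the coefficient toward zero by t; small ones become zero
    if c > t:
        return c - t
    if c < -t:
        return c + t
    return 0.0

def denoise_coefficients(coeffs, t, mode="soft"):
    # apply the chosen thresholding rule to a list of coefficients
    f = soft_threshold if mode == "soft" else hard_threshold
    return [f(c, t) for c in coeffs]
```

Hard thresholding keeps surviving coefficients unchanged, while soft thresholding also shrinks them, which typically yields smoother reconstructions in artifact-reduction settings.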


Journal ArticleDOI
TL;DR: The authors present a framework to generate incidental word learning tasks via load-based profiles, measured through the involvement load hypothesis, and topic-based profiles obtained from social media, and find that the proposed framework promotes more effective and enjoyable word learning than intentional word learning.
Abstract: Compared to intentional word learning, incidental word learning better motivates learners, integrates development of more language skills, and provides richer contexts. The effectiveness of incidental word learning tasks can also be increased by employing materials that learners are more familiar with or interested in. Here, the authors present a framework to generate incidental word learning tasks via load-based profiles measured through the involvement load hypothesis, and topic-based profiles obtained from social media. They also conduct an experiment on real participants and find that the proposed framework promotes more effective and enjoyable word learning than intentional word learning. This article is part of a special issue on social media for learning.

35 citations


Journal ArticleDOI
TL;DR: This extended guided filtering approach for depth map upsampling outperforms other state-of-the-art approaches by using a high-resolution color image as a guide and applying an onion-peeling filtering procedure that exploits local gradient information of depth images.
Abstract: The authors address the problem of depth map upsampling using a corresponding high-resolution color image. The depth map is captured by low-resolution time-of-flight cameras paired with a high-resolution RGB camera. Inspired by guided image filtering, the proposed method not only uses the structure of the high-resolution color image as guidance, it also exploits local gradient information of depth images to suppress potential texture-copying artifacts. In addition, the authors introduce onion-peel-order filtering that predicts depth values from outside inward in a concentric-layer order, which avoids depth bleeding during the propagation process. Quantitative and qualitative experimental results demonstrate the effectiveness and robustness of this approach over prior depth map upsampling methods.

33 citations
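The guidance idea, weighting low-resolution depth samples by color similarity in the high-resolution image, can be sketched as joint bilateral upsampling. This is a simplified stand-in for illustration, not the authors' onion-peel-order filter; the window size and the two sigmas are assumed values.

```python
import math

def guided_upsample(depth_lr, color_hr, scale, sigma_s=1.0, sigma_c=10.0):
    """Joint bilateral upsampling sketch: each high-res pixel takes a
    weighted average of nearby low-res depth samples, with weights
    combining spatial distance and color similarity in the guide image."""
    H, W = len(color_hr), len(color_hr[0])
    h, w = len(depth_lr), len(depth_lr[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            ci, cj = i / scale, j / scale   # position in low-res grid
            num = den = 0.0
            for a in range(max(0, int(ci) - 1), min(h, int(ci) + 2)):
                for b in range(max(0, int(cj) - 1), min(w, int(cj) + 2)):
                    ds = (a - ci) ** 2 + (b - cj) ** 2
                    # guide-color difference at the corresponding positions
                    dc = (color_hr[i][j]
                          - color_hr[min(H - 1, int(a * scale))]
                                    [min(W - 1, int(b * scale))]) ** 2
                    wgt = (math.exp(-ds / (2 * sigma_s ** 2))
                           * math.exp(-dc / (2 * sigma_c ** 2)))
                    num += wgt * depth_lr[a][b]
                    den += wgt
            out[i][j] = num / den
    return out
```

The color term is what suppresses depth bleeding across object boundaries: samples whose guide color differs from the target pixel contribute little weight.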


Journal ArticleDOI
TL;DR: The authors consider specific key performance indicators (KPIs) related to quality of service and propose using neural networks to automatically classify these KPIs with respect to quality of experience (QoE). Adopting the neural network ensures replicability of QoE estimation regardless of user involvement and simplifies QoE analysis for future communications systems.
Abstract: High data rates are usually envisaged by operators to satisfy subscribers using multimedia services. However, due to the increasing number of tablets, smartphones, and push applications, user needs can also involve low throughput. A new analysis of user satisfaction is necessary--the so-called quality of experience (QoE). The authors consider specific key performance indicators (KPIs) and propose using neural networks to provide an automatic classification among these KPIs (related to quality of service) and QoE. The adoption of the neural network ensures replicability of QoE estimation regardless of user involvement and simplifies QoE analysis for future communications systems.

31 citations
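The mapping from measured KPIs to a QoE class can be illustrated with a single linear unit, the simplest stand-in for the neural network the authors propose. The KPI features (throughput and delay, pre-scaled to comparable ranges), the labels, and the learning parameters are all hypothetical.

```python
def train_perceptron(samples, labels, epochs=100, lr=0.1):
    """Tiny linear classifier mapping KPI vectors to a QoE class.

    samples: (throughput, delay) pairs in hypothetical scaled units;
    labels: +1 for good QoE, -1 for poor QoE.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: nudge the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def classify(w, b, x):
    # sign of the linear score decides the QoE class
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

A real deployment would use a multilayer network over many KPIs, but the principle is the same: learn a decision function from KPI measurements so QoE estimation no longer requires user involvement.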


Journal ArticleDOI
TL;DR: The authors have developed a gaze-based control paradigm to investigate how eye-based interaction techniques can be made precise and fast enough to let disabled people easily interact with multimedia information.
Abstract: The EU-funded MAMEM project (Multimedia Authoring and Management using your Eyes and Mind) aims to propose a framework for natural interaction with multimedia information for users who lack fine motor skills. As part of this project, the authors have developed a gaze-based control paradigm. Here, they outline the challenges of eye-controlled interaction with multimedia information and present initial project results. Their objective is to investigate how eye-based interaction techniques can be made precise and fast enough to let disabled people easily interact with multimedia information.

24 citations


Journal ArticleDOI
TL;DR: This department discusses multimedia hashing and networking and presents one paradigm of leveraging MINets to incorporate both visual and textual information to reach a sensible event coreference resolution.
Abstract: This department discusses multimedia hashing and networking. The authors summarize shallow-learning-based hashing and deep-learning-based hashing. By exploiting successful shallow-learning algorithms, state-of-the-art hashing techniques have been widely used in high-efficiency multimedia storage, indexing, and retrieval, especially in multimedia search applications on smartphone devices. The authors also introduce Multimedia Information Networks (MINets) and present one paradigm of leveraging MINets to incorporate both visual and textual information to reach a sensible event coreference resolution. The goal is to make deep learning practical in realistic multimedia applications.

Journal ArticleDOI
TL;DR: The authors propose the social-sensed multimedia computing paradigm, advocating the organic integration of social network and social media data with multimedia computing tasks and arguing that more researchers in the multimedia community should focus on the user dimension.
Abstract: The authors propose the social-sensed multimedia computing paradigm and advocate for the need to organically integrate social network and social media data with multimedia computing tasks. More researchers in the multimedia community should be focusing on the user dimension to quickly advance this line of research.

Journal ArticleDOI
TL;DR: This study investigated the relationships among the use of functions in computer-supported collaborative learning (CSCL), psychological factors, and learning behaviors related to applying the Community of Inquiry (CoI) framework to increase active interaction among learners.
Abstract: This study investigated, through both formative and practical evaluation, the relationships among the use of functions in computer-supported collaborative learning (CSCL), psychological factors, and learning behaviors related to applying the Community of Inquiry (CoI) framework. The goal was to increase active interaction among learners. In two experiments inside and outside the classroom, the authors examined an online discussion and collected data using questionnaires that assessed perceived psychological factors, as well as communication logs related to the efficacy of CoI. The results of a path analysis showed two points. First, cognitive learning tools support the enhancement of expressive cognitive presence that promotes the perception of CoI as formative evaluation. Second, the frequency of the use of the functions fostered expressive social and cognitive presence (which enhanced the perception of both), perceived contribution, and satisfaction with online discussion. This article is part of a special issue on social media for learning.

Journal ArticleDOI
TL;DR: The authors describe their efforts in the EU Recall project to extract memory cues from multimedia records to augment human memory beyond simple memory prosthetics.
Abstract: Technology has always had a direct impact on what humans remember. In the era of smartphones and wearable devices, people easily capture information, such as pictures and videos, on a daily basis. The so-called "quantified self" movement focuses on using such captured multimedia information, often in combination with additional contextual data (such as GPS traces or social media posts), with the goal of extracting and providing better insights into people's everyday actions (for example, fitness tracking, work productivity, and dieting). However, a more interesting use of such captured data might be to directly support human memory. Here, the authors describe their efforts in the EU Recall project to extract memory cues from multimedia records to augment human memory beyond simple memory prosthetics.

Journal ArticleDOI
TL;DR: This article presents an introduction to visual attention retargeting, its connection to visual saliency, the challenges associated with it, and ideas for how it can be approached.
Abstract: This article presents an introduction to visual attention retargeting, its connection to visual saliency, the challenges associated with it, and ideas for how it can be approached. The difficulty of attention retargeting as a saliency inversion problem lies in the lack of one-to-one mapping between saliency and the image domain, in addition to the possible negative impact of saliency alterations on image aesthetics. A few approaches from recent literature to solve this challenging problem are reviewed, and several suggestions for future development are presented.

Journal ArticleDOI
TL;DR: This paper presents a method to construct such a resume and illustrates the framework with current Semantic Web technologies, such as RDF and SPARQL for representing and querying semantic metadata, showing the benefits of indexing and retrieving multimedia contents without centralizing the contents or their associated metadata.
Abstract: Currently, many multimedia contents are acquired and stored in real time and at different locations. To efficiently retrieve the desired information and avoid centralizing all metadata, we propose to compute a centralized metadata resume, i.e., a concise version of the whole metadata, which locates desired multimedia contents on remote servers. The originality of this resume is that it is automatically constructed from the extracted metadata. In this paper, we present a method to construct such a resume and illustrate our framework with current Semantic Web technologies, such as RDF and SPARQL for representing and querying semantic metadata. Experimental results are provided to show the benefits of indexing and retrieving multimedia contents without centralizing the contents or their associated metadata, and to demonstrate the efficiency of a metadata resume.

Journal ArticleDOI
TL;DR: A novel approach for fast summarization of user-generated videos (UGVs) by selecting a few representative segments based on segment-level semantic and emotional recognition results, which are generally sufficient for a summary.
Abstract: This article introduces a novel approach for fast summarization of user-generated videos (UGVs). Different from other types of videos where the semantic content might vary greatly over time, most UGVs contain only a single shot with relatively consistent high-level semantics and emotional content. Therefore, a few representative segments, which can be selected based on segment-level semantic and emotional recognition results, are generally sufficient for a summary. In addition, due to the poor shooting quality of many UGVs, factors such as camera shaking and lighting conditions are also considered to achieve more pleasant summaries. This article is part of a special issue on quality modeling.

Journal ArticleDOI
TL;DR: In this article, the authors explore open social learner modeling (OSLM), a social extension of open learner modeling (OLM), by embedding visualization of both a learner's own model and other learning peers' models into different parts of the learning content.
Abstract: This article explores open social learner modeling (OSLM)--a social extension of open learner modeling (OLM). A specific implementation of this approach is presented by which learners' self-direction and self-determination in a social e-learning context could be potentially promoted. Unlike previous work, the proposed approach, multifaceted OSLM, lets the system seamlessly and adaptively embed visualization of both a learner's own model and other learning peers' models into different parts of the learning content, for multiple axes of context, at any time during the learning process. It also demonstrates the advantages of visualizing both learners' performance and their contribution to a learning community. An experimental study shows that, contrary to previous research, the richness and complexity of this new approach positively affected the learning experience in terms of perceived effectiveness, efficiency, and satisfaction. This article is part of a special issue on social media for learning.

Journal ArticleDOI
TL;DR: Experimental results on the University of Illinois at Urbana-Champaign Image Sense Discrimination dataset and the Google-MM dataset show that the authors' ensemble fusion model outperforms approaches using only a single modality for disambiguation and retrieval.
Abstract: In this article, the authors identify the correlative and complementary relations among multiple modalities. They then propose a multimodal ensemble fusion model to capture the complementary relation and correlative relation between two modalities (images and text) and explain why this ensemble fusion model works. Experimental results on the University of Illinois at Urbana-Champaign Image Sense Discrimination (UIUC-ISD) dataset and the Google-MM dataset show that their ensemble fusion model outperforms approaches using only a single modality for disambiguation and retrieval. Word sense disambiguation and information retrieval are the use cases they studied to demonstrate the effectiveness of their ensemble fusion model.

Journal ArticleDOI
TL;DR: A fusion method for incomplete traffic data is proposed to estimate traffic state accurately by extracting data correlations and applying incomplete data fusion, implementing the two approaches in parallel.
Abstract: Today, data-driven intelligent transportation systems must address data quality challenges, such as the missing data problem. For example, is it possible to improve the performance of traffic state estimation using incomplete data? In this article, an incomplete traffic data fusing method is proposed to estimate traffic state accurately. It improves missing data estimation by extracting data correlations and applying incomplete data fusion, implementing the two approaches in parallel. The main research focus is on extracting the inherent spatio-temporal correlations of traffic states data from road segments based on a multiple linear regression (MLR) model. Computational experiments for accuracy and efficiency demonstrate that this method can use data correlations to implement accurate and real-time traffic state estimation. This article is part of a special issue on quality modeling.
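The multiple linear regression step, predicting one road segment's traffic state from correlated neighboring segments, can be sketched with ordinary least squares via the normal equations. The feature layout and the data are hypothetical; the paper's full method additionally fuses incomplete data in parallel with this correlation model.

```python
def fit_mlr(X, y):
    """Ordinary least squares for y ~ intercept + w . x,
    solved via the normal equations with Gaussian elimination."""
    rows = [[1.0] + list(x) for x in X]  # prepend intercept column
    p = len(rows[0])
    # A w = b, where A = X^T X and b = X^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    for col in range(p):  # forward elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * p
    for i in range(p - 1, -1, -1):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, p))) / A[i][i]
    return w

def predict(w, x):
    # fill in a missing traffic-state value from correlated segments
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
```

Here each row of X would hold the observed states of correlated segments (or time lags) and y the state of the segment whose reading is missing, so the fitted model imputes the gap.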

Journal ArticleDOI
TL;DR: The following three main topics are examined here: ubiquitous learning via social media services, intelligent tutoring in adaptive e-learning, and multimedia-enabled social learning.
Abstract: The popularity and influence of social media have been continuously expanding worldwide, and a similar trend is visible in educational settings. Social media tools have been reportedly used for a variety of educational purposes and in wide-ranging contexts, bridging formal and informal, as well as individual and collaborative learning. Currently, the trend is toward the integration of social media services with mobile and ubiquitous learning and adaptive educational technologies. To provide an overview of this interplay by reviewing the applicable techniques and relevant works in the area, we supplement this editorial with a tutorial survey. In particular, the following three main topics are examined here: ubiquitous learning via social media services, intelligent tutoring in adaptive e-learning, and multimedia-enabled social learning. The authors also introduce the articles featured in this special issue on social media for learning.

Journal ArticleDOI
TL;DR: Researchers at the Casa Paganini-InfoMus Research Centre work to combine scientific research in information and communications technology with artistic and humanistic research, showing how collaboration with artists informed work on analyzing nonverbal expressive and social behavior and contributed to tools that support both artistic and scientific developments.
Abstract: As art influences science and technology, science and technology can in turn inspire art. Recognizing this mutually beneficial relationship, researchers at the Casa Paganini-InfoMus Research Centre work to combine scientific research in information and communications technology (ICT) with artistic and humanistic research. Here, the authors discuss some of their work, showing how their collaboration with artists informed work on analyzing nonverbal expressive and social behavior and contributed to tools, such as the EyesWeb XMI hardware and software platform, that support both artistic and scientific developments. They also sketch out how art-informed multimedia and multimodal technologies find application beyond the arts, in areas including education, cultural heritage, social inclusion, therapy, rehabilitation, and wellness.

Journal ArticleDOI
TL;DR: A scale-aware spatially guided mapping (SaSGM) model is proposed, which formulates and combines multiple spatial influences of simple edge responses under different levels of detail and is thus more sensitive to image patterns at a large scale.
Abstract: The scale information in images is important for guiding image-filtering configuration. The authors propose a scale-aware spatially guided mapping (SaSGM) model, which formulates and combines multiple spatial influences of simple edge responses under different levels of detail. The SaSGM model is thus more sensitive to image patterns at a large scale. The authors further incorporate the SaSGM into several image processing models, such as detail enhancement and image stylization models. Experiments show that by inheriting the characteristics of the SaSGM, the extended models are able to differentiate image contents in terms of their scales and thus generate more natural or diversified visual effects. This article is part of a special issue on quality modeling.

Journal ArticleDOI
TL;DR: The authors propose an approach to encode contents and build advanced multimodal interfaces that protect intellectual property, using IEEE 1599, an international standard for music description, as a case study.
Abstract: With the advent of the digital age and the spread of portable digital audio players, interest in software and hardware tools that can help producers and distributors enhance and revive their catalogue of music has progressively increased. One of the main concerns of major labels is how to prevent file sharing. An innovative approach that couples reviving catalogues with support for rights management could provide an experience of multimedia content in which users select multiple media streams on the fly in a fully synchronized environment. Because this kind of user experience can't be reconstructed from the single original streams, illegal copying would be intrinsically discouraged. In this article, the authors propose an approach to encode contents and build advanced multimodal interfaces that protect intellectual property. As a case study, they use IEEE 1599, an international standard for music description.

Journal ArticleDOI
TL;DR: The authors present the person-centered multimedia computing approach inspired by assistive and rehabilitative applications, where the emphasis is on understanding the individual user's preferences and expectations toward designing, developing, and deploying effective solutions.
Abstract: Human-centered multimedia computing (HCMC) focuses on a tight engagement of humans in the design, development, and deployment of multimedia solutions. However, people's abilities change over time due to a variety of reasons, including age, context, and geographical location. To address this challenge, the authors recently introduced the concept of person-centered multimedia computing, where the emphasis is on understanding the individual user's preferences and expectations toward designing, developing, and deploying effective solutions. Today's multimedia technology is largely geared toward the "able" population; individuals with disabilities have largely been absent in the design process and thus must adapt themselves (often unsuccessfully) to available solutions. Further, individuals with disabilities have specific and individualized requirements that necessitate a person-centered, adaptive approach to multimedia computing. Here, the authors present the person-centered multimedia computing approach inspired by assistive and rehabilitative applications.

Journal ArticleDOI
TL;DR: The authors propose a method for automatic planogram compliance checking in retail chains that does not require product template images for training; the product layout is extracted from an input image by means of unsupervised recurring pattern detection and matched via graph matching.
Abstract: In this article, the authors propose a novel method for automatic planogram compliance checking in retail chains that doesn't require product template images for training. Product layout is extracted from an input image by means of unsupervised recurring pattern detection and matched via graph matching, with the expected product layout specified by a planogram to measure the level of compliance. A divide-and-conquer strategy is employed to improve the speed. Specifically, the input image is divided into several regions based on the planogram. Recurring patterns are detected in each region, respectively, and then merged together to estimate the product layout.

Journal ArticleDOI
TL;DR: This special issue provides another forum for the authors of the top symposium papers to further present their research results to the community.
Abstract: The wide-ranging applications and big data of ubiquitous multimedia present both unprecedented challenges and unique opportunities for multimedia computing research. This was the main theme of the 2015 IEEE International Symposium on Multimedia (ISM 2015), and this special issue provides another forum for the authors of the top symposium papers to further present their research results to the community.

Journal ArticleDOI
TL;DR: This article proposes an unsupervised method that labels speakers using just the information available in the news video, without external information; it uses face recognition, face clustering, face landmarking, natural language processing tools, and speaker diarization.
Abstract: Identifying the speakers in TV news would help listeners analyze and understand news content, but doing so in news videos is challenging because new faces often appear. Previous research has identified speakers using pretrained faces for TV shows and movies. This article proposes an unsupervised method for labeling speakers using just the information available in the news video, without external information. The proposed framework segments the audio by speaker, parses closed captions for speaker names, identifies who is speaking, and performs optical character recognition for speaker names. The framework uses face recognition, face clustering, face landmarking, natural language processing tools, and speaker diarization. Results indicate 63.6 percent accuracy for identifying speakers on CNN News.

Journal ArticleDOI
TL;DR: Following a feature extraction stage in which spatial domain statistics are utilized as features, a two-stage nonparametric NR-IQA framework is proposed, which requires no training phase and enables prediction of the image distortion type as well as local regions' quality.
Abstract: In this article, the authors explore an alternative way to perform no-reference image quality assessment (NR-IQA). Following a feature extraction stage in which spatial domain statistics are utilized as features, a two-stage nonparametric NR-IQA framework is proposed. This approach requires no training phase, and it enables prediction of the image distortion type as well as local regions' quality, which is not available in most current algorithms. Experimental results on IQA databases show that the proposed framework achieves high correlation to human perception of image quality and delivers competitive performance to state-of-the-art NR-IQA algorithms.
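Spatial-domain statistics for NR-IQA are commonly built from locally normalized luminance (MSCN-style) coefficients, whose distribution deviates from that of pristine images under distortion. A minimal sketch follows, assuming a 3x3 window and a stabilizing constant; the authors' exact feature set may differ.

```python
def mscn_coefficients(img, C=1.0):
    """Mean-subtracted, contrast-normalized (MSCN) coefficients over a
    3x3 neighborhood; C stabilizes the divisor in flat regions."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # 3x3 window, clamped at the image borders
            vals = [img[a][b]
                    for a in range(max(0, i - 1), min(h, i + 2))
                    for b in range(max(0, j - 1), min(w, j + 2))]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals)
            out[i][j] = (img[i][j] - mu) / (var ** 0.5 + C)
    return out
```

Statistics of these coefficients (for instance, their variance or tail behavior per region) can then serve as the features feeding a nonparametric quality predictor, which is also what makes per-region quality maps possible.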

Journal ArticleDOI
TL;DR: A novel framework for automatically creating cinemagraphs from video sequences is presented, with specific emphasis on determining the composition of masks and layers to achieve aesthetically pleasing results.
Abstract: A cinemagraph is a novel medium that infuses a static image with the dynamics of one or several particular image regions. It is in many ways intermediate between a photograph and video, and it has numerous potential applications, such as in the creation of dynamic scenes in computer games and interactive environments. However, creating cinemagraphs is a time-consuming process requiring high proficiency in photo-editing techniques. This article presents a novel framework for automatically creating cinemagraphs from video sequences, with specific emphasis on determining the composition of masks and layers in creating aesthetically pleasing cinemagraphs. Treating video as a spatiotemporal data volume, the problem is considered a type of constrained optimization problem involving the discovery of a connected subgraph in video frames with maximal cumulative interestingness scores. The proposed framework accommodates multiple criteria describing qualities of interest in local image patches based on appearance and motion. Furthermore, the selected regions are not limited to certain shapes--the proposed approach facilitates capturing arbitrary objects. Experiments demonstrate the performance of the proposed approach. The findings of this study provide valuable information regarding various design choices for developing an easy and versatile authoring tool for cinemagraphs.