
Showing papers in "IEEE MultiMedia in 2018"


Journal Article
TL;DR: Originally developed to improve manufacturing processes, digital twins are being redefined as digital replications of living as well as nonliving entities that enable data to be seamlessly transmitted between the physical and virtual worlds.
Abstract: Originally developed to improve manufacturing processes, digital twins are being redefined as digital replications of living as well as nonliving entities that enable data to be seamlessly transmitted between the physical and virtual worlds. Digital twins provide a means to monitor, understand, and optimize the functions of all physical entities and, for humans, to provide continuous feedback that improves quality of life and well-being.

373 citations


Journal Article
TL;DR: A thorough security analysis of a chaotic image encryption algorithm based on autoblocking and electrocardiography, from the viewpoint of modern cryptography, finds that it is vulnerable to a known-plaintext attack.
Abstract: This paper performs a thorough security analysis of a chaotic image encryption algorithm based on autoblocking and electrocardiography from the viewpoint of modern cryptography. The algorithm uses electrocardiography (ECG) signals to generate the initial key for a chaotic system and applies an autoblocking method to divide a plain image into blocks of certain sizes suitable for subsequent encryption. The designers claimed that the proposed algorithm is “strong and flexible enough for practical applications”. We find that it is vulnerable to a known-plaintext attack: given a single known plain-image and its corresponding cipher-image, an adversary can derive a mask image that serves as an equivalent secret key and successfully decrypts other cipher-images encrypted under the same key with a non-negligible probability of 1/256. Using this as a typical counterexample, we summarize some security defects that exist in many image encryption algorithms.

207 citations
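The core idea of the known-plaintext attack summarized above can be illustrated on a deliberately simplified toy cipher. This sketch assumes that encryption reduces to XOR-ing the image with a key-dependent mask that does not depend on the plaintext; it is not the ECG/autoblocking algorithm itself, and whereas the paper's recovered mask works only with probability 1/256, in this toy model it always works. All function names are hypothetical.

```python
import random


def toy_encrypt(plain, key_seed):
    """Toy cipher (assumption): XOR every pixel with a key-derived mask.

    The mask depends only on the key and the image size, never on the
    plaintext, which is exactly the weakness the attack exploits.
    """
    rng = random.Random(key_seed)
    mask = [rng.randrange(256) for _ in range(len(plain))]
    return [p ^ m for p, m in zip(plain, mask)]


def recover_mask(known_plain, known_cipher):
    """Known-plaintext attack: one plain/cipher pair reveals the mask."""
    return [p ^ c for p, c in zip(known_plain, known_cipher)]


def decrypt_with_mask(cipher, mask):
    """Use the recovered mask as an equivalent secret key."""
    return [c ^ m for c, m in zip(cipher, mask)]


if __name__ == "__main__":
    key = 2018                                  # hypothetical secret key
    known_plain = list(range(64))               # image the attacker knows
    mask = recover_mask(known_plain, toy_encrypt(known_plain, key))

    secret = [(7 * i + 3) % 256 for i in range(64)]   # unknown image, same key
    recovered = decrypt_with_mask(toy_encrypt(secret, key), mask)
    print(recovered == secret)                  # True in this toy model
```

The attacker never learns the key itself; the mask is an equivalent key, which is why a plaintext-independent keystream is a classic design defect.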


Journal Article
TL;DR: A new image self-embedding scheme based on optimal iterative block truncation coding and non-uniform watermark sharing is proposed that can achieve better tampering-recovery performance than some state-of-the-art schemes.
Abstract: Self-embedding watermarking can be used for image tampering recovery. In this work, the authors propose a new image self-embedding scheme based on optimal iterative block truncation coding and non-uniform watermark sharing. Experimental results demonstrate that the proposed scheme can achieve better tampering-recovery performance than some state-of-the-art schemes.

80 citations
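The scheme above builds on block truncation coding (BTC). As a point of reference, the following is a minimal sketch of classic moment-preserving BTC (Delp and Mitchell) for a single block; it is not the authors' optimal iterative variant and omits their non-uniform watermark sharing entirely. Each block is reduced to a bitmap plus two reconstruction levels, which is the compact description a self-embedding scheme can hide elsewhere in the image for recovery.

```python
import math


def btc_encode(block):
    """Classic BTC for one block given as a flat list of pixel values.

    Returns (bitmap, low, high): bitmap[i] is 1 if pixel i >= block mean.
    low/high are chosen so the decoded block keeps the original mean and
    variance (moment-preserving BTC).
    """
    m = len(block)
    mean = sum(block) / m
    std = math.sqrt(sum((x - mean) ** 2 for x in block) / m)
    bitmap = [1 if x >= mean else 0 for x in block]
    q = sum(bitmap)                  # number of "high" pixels (always >= 1)
    if q == m:                       # uniform block: one level suffices
        return bitmap, mean, mean
    low = mean - std * math.sqrt(q / (m - q))
    high = mean + std * math.sqrt((m - q) / q)
    return bitmap, low, high


def btc_decode(bitmap, low, high):
    """Rebuild the block from its bitmap and two levels."""
    return [high if b else low for b in bitmap]
```

For an 8-bit 4x4 block this stores 16 bitmap bits plus two levels instead of 128 bits of raw pixels, at the cost of flattening each block to two gray values.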


Journal Article
TL;DR: The design and execution of a series of experiments are reported to quantitatively evaluate HoloLens' performance in head localization, real environment reconstruction, spatial mapping, hologram visualization, and speech recognition.
Abstract: A recently released cutting-edge AR device, Microsoft HoloLens, has attracted considerable attention with its advanced capabilities. In this article, we report the design and execution of a series of experiments to quantitatively evaluate HoloLens' performance in head localization, real environment reconstruction, spatial mapping, hologram visualization, and speech recognition.

78 citations


Journal Article
TL;DR: It is believed that the multimedia community can build on sensing technologies to enable efficient clinical decision-making in mental health care, and innovative multimedia systems can help identify and visualize personalized early-warning signs from complex multimodal signals, which could lead to effective intervention strategies and better preemptive care.
Abstract: Mental health is an urgent global issue. Around 450 million people suffer from serious mental illnesses worldwide, which results in devastating personal outcomes and huge societal burden. Effective symptom monitoring and personalized interventions can significantly improve mental health care across different populations. However, traditional clinical methods often fall short when it comes to real-time monitoring of symptoms. Sensing technologies can address these issues by enabling granular tracking of behavioral, physiological, and social signals relevant to mental health. In this article, we describe how sensing technologies can be used to diagnose and monitor patient states for numerous serious mental illnesses. We also identify current limitations and potential future directions. We believe that the multimedia community can build on sensing technologies to enable efficient clinical decision-making in mental health care. Specifically, innovative multimedia systems can help identify and visualize personalized early-warning signs from complex multimodal signals, which could lead to effective intervention strategies and better preemptive care.

67 citations


Journal Article
TL;DR: This work proposes a progressive search paradigm that reduces the search space through three main strategies: coarse-to-fine search in the feature space, near-to-distant search in the spatial-temporal space, and low-to-high permission search in the security space.
Abstract: As the Internet of Things (IoT) expands, IoT search has tremendous potential applications but also presents many challenges. To address these challenges, we propose a progressive search paradigm that reduces the search space through three main strategies: coarse-to-fine search in the feature space, near-to-distant search in the spatial-temporal space, and low-to-high permission search in the security space. Two case studies from a multimedia-based urban sensing network demonstrate that our approach can significantly improve IoT search speed and accuracy.

51 citations
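The three strategies above can be combined into one search loop. The sketch below is hypothetical: the `Thing` record, its coarse tag, feature vector, and integer permission level are illustrative assumptions rather than the authors' data model, and the nesting order (distance ring outermost, then permission level) is one possible design choice. The search stops as soon as enough matches are found, so distant and high-permission items are examined only when nearer, more open ones are insufficient.

```python
import math
from dataclasses import dataclass


@dataclass
class Thing:
    name: str
    x: float
    y: float
    coarse_tag: str      # cheap, coarse feature (e.g., object category)
    feature: tuple       # expensive, fine-grained feature vector
    permission: int      # access level required to search this item


def progressive_search(things, query_tag, query_feature, origin,
                       radii=(1.0, 5.0, 25.0), max_permission=2,
                       threshold=0.5, k=3):
    """Hypothetical progressive IoT search.

    Near-to-distant: scan ring by ring outward from `origin`.
    Low-to-high: within a ring, raise the permission level step by step.
    Coarse-to-fine: filter by the cheap tag before the feature distance.
    """
    results = []
    inner = 0.0
    for radius in radii:
        for level in range(max_permission + 1):
            for t in things:
                d = math.hypot(t.x - origin[0], t.y - origin[1])
                if not (inner <= d < radius) or t.permission != level:
                    continue
                if t.coarse_tag != query_tag:        # coarse filter
                    continue
                if math.dist(t.feature, query_feature) <= threshold:  # fine check
                    results.append(t.name)
                    if len(results) == k:
                        return results
        inner = radius
    return results
```

Items beyond the last radius or above `max_permission` are never touched, which is where the reduction of the search space comes from in this sketch.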


Journal Article
TL;DR: This paper finds that a novel hyperchaotic image encryption scheme is vulnerable to a chosen-plaintext attack and shows that performing permutation before the vector partition and adding two rounds of crossover diffusion at the end of encryption greatly improves the security of the original scheme.
Abstract: This paper finds that a novel hyperchaotic image encryption scheme is vulnerable to a chosen-plaintext attack. Performing permutation before the vector partition and adding two rounds of crossover diffusion at the end of encryption greatly improves the security of the original scheme.

45 citations
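Adding diffusion at the end of encryption helps because a chosen-plaintext attacker learns little when a single-pixel change spreads across the whole cipher-image. The sketch below uses a generic two-pass (forward, then backward) additive diffusion modulo 256; it is an illustration of that property under our own assumptions, not the paper's crossover diffusion, and the key streams `key_fwd`/`key_bwd` are hypothetical inputs that a real scheme would derive from its hyperchaotic system.

```python
def diffuse(pixels, key_fwd, key_bwd, seed=0):
    """Forward pass chains each pixel to the previous output; the backward
    pass chains from the end, so a change anywhere reaches every output."""
    n = len(pixels)
    fwd = [0] * n
    prev = seed
    for i in range(n):
        prev = (pixels[i] + prev + key_fwd[i]) % 256
        fwd[i] = prev
    out = [0] * n
    nxt = seed
    for i in range(n - 1, -1, -1):
        nxt = (fwd[i] + nxt + key_bwd[i]) % 256
        out[i] = nxt
    return out


def undiffuse(cipher, key_fwd, key_bwd, seed=0):
    """Invert both passes (backward pass first, then forward pass)."""
    n = len(cipher)
    fwd = [0] * n
    for i in range(n):
        nxt = cipher[i + 1] if i + 1 < n else seed
        fwd[i] = (cipher[i] - nxt - key_bwd[i]) % 256
    pixels = [0] * n
    for i in range(n):
        prev = fwd[i - 1] if i > 0 else seed
        pixels[i] = (fwd[i] - prev - key_fwd[i]) % 256
    return pixels
```

With only the forward pass, changing the last pixel would alter just the last cipher value; the backward pass is what carries that change to the start of the image.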


Journal Article
TL;DR: A novel system that involves data capturing and multimodal fusion to extract relevant features, analyze data, and provide useful recommendations for improving the patients' quality of life is described.
Abstract: The analysis of multimodal data collected by innovative imaging sensors, Internet of Things devices, and user interactions can provide smart, automatic remote monitoring of Parkinson's and Alzheimer's patients and reveal valuable insights for early detection and/or prevention of health-related events. This article describes a novel system that involves data capturing and multimodal fusion to extract relevant features, analyze data, and provide useful recommendations. The system gathers signals from diverse sources in health monitoring environments, understands the user behavior and context, and triggers proper actions for improving the patients' quality of life. The system offers a multimodal, multi-patient, versatile approach not present in current developments. It also offers comparable or improved results for detection of abnormal behavior in daily motion. The system was implemented and tested for 10 weeks in real environments involving 18 patients.

40 citations


Journal Article
TL;DR: In this paper, the authors present Rhythm, a platform that combines wearable electronic badges and online applications to capture team-level and network-level interaction patterns in organizations, including conversation time, turn-taking behavior, and the physical proximity of both co-located and distributed members.
Abstract: To understand and manage complex organizations, we must develop tools capable of measuring human social interaction accurately and uniformly. Current technologies that measure face-to-face communication do not measure interaction in a unified manner and often ignore remote interaction, an increasingly common communication modality. In this article we present Rhythm, a platform that combines wearable electronic badges and online applications to capture team-level and network-level interaction patterns in organizations. The platform measures conversation time, turn-taking behavior, and the physical proximity of both co-located and distributed members. Our goal is to empower organizations and researchers to measure formal and informal social interaction across teams, divisions, and locations. We describe two pilot studies that use this platform and discuss how measurement systems like Rhythm may further the fields of computational social science and organizational design.

33 citations
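Two of the measures Rhythm reports, conversation time and turn-taking, can be derived from a log of who spoke when. The sketch below assumes a hypothetical input of `(speaker, start_s, end_s)` speech segments, for example from badge microphones or an online meeting app; it is not Rhythm's actual data format or code.

```python
from collections import Counter, defaultdict


def conversation_stats(segments):
    """segments: iterable of (speaker, start_s, end_s) tuples (hypothetical format).

    Returns per-speaker talk time in seconds and a count of turn
    transitions (previous_speaker, next_speaker).
    """
    talk_time = defaultdict(float)
    turns = Counter()
    last_speaker = None
    for speaker, start, end in sorted(segments, key=lambda s: s[1]):
        talk_time[speaker] += end - start
        if last_speaker is not None and speaker != last_speaker:
            turns[(last_speaker, speaker)] += 1   # floor changed hands
        last_speaker = speaker
    return dict(talk_time), dict(turns)
```

Consecutive segments by the same person count as one continuing turn, so pauses within a monologue do not inflate the turn-taking counts.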


Journal Article
TL;DR: A novel discriminatively trained evaluator network for choosing the best caption among those generated by an ensemble of caption generator networks further improves accuracy.
Abstract: Neural-network-based image and video captioning can be substantially improved by utilizing architectures that make use of special features from the scene context, objects, and locations. A novel discriminatively trained evaluator network for choosing the best caption among those generated by an ensemble of caption generator networks further improves accuracy.

23 citations
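The ensemble-plus-evaluator step described above reduces to "generate several candidates, score each one, keep the best". In this hedged sketch the generators and the evaluator are hypothetical callables; in the paper, the evaluator is a discriminatively trained network, which the toy word-overlap scorer in the usage check only stands in for.

```python
from typing import Callable, Sequence


def select_caption(image_features,
                   generators: Sequence[Callable],
                   evaluator: Callable) -> str:
    """Run every caption generator, then keep the candidate the evaluator
    scores highest for these image features."""
    candidates = [generate(image_features) for generate in generators]
    return max(candidates, key=lambda c: evaluator(image_features, c))
```

Keeping the evaluator separate from the generators means new generators can be added to the ensemble without retraining them jointly.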


Journal Article
TL;DR: The challenges in making VR cameras affordable, portable, reliable, high quality, and user friendly, and the techniques that Kandao, a VR startup company based in China, used to overcome them when designing its Obsidian cameras, are described.
Abstract: To make VR cameras more accessible to the public, devices must be affordable, portable, reliable, high quality, and user friendly. In this article, we describe the challenges in meeting these goals and the techniques that Kandao, a VR startup company based in China, used to overcome them when designing its Obsidian cameras.

Journal Article
TL;DR: An overview of biometrics and its latest progress is presented to further both the understanding of general audiences and policy makers and interdisciplinary research, and to complement earlier articles with updates on recent topics.
Abstract: Although biometrics is widely used in various applications, it still faces many challenges. This article aims to (i) present an overview of biometrics and its latest progress, (ii) further both the understanding of general audiences and policy makers and interdisciplinary research, and (iii) complement earlier articles with updates on recent topics.

Journal Article
TL;DR: The relevance and contribution of new signals in a broader interpretation of multimedia for personal health and how core multimedia research is becoming an important enabler for applications with the potential for significant societal impact are explored.
Abstract: In this article, we explore the relevance and contribution of new signals in a broader interpretation of multimedia for personal health. We present how core multimedia research is becoming an important enabler for applications with the potential for significant societal impact.

Journal Article
TL;DR: The MOVING platform enables its users to improve their information literacy by training them to exploit data mining methods in their daily research tasks.
Abstract: The MOVING platform enables its users to improve their information literacy by training them to exploit data mining methods in their daily research tasks. Its novel integrated working and training environment supports the education of data-savvy information professionals and enables them to address big data and open innovation challenges.

Journal Article
TL;DR: This work proposes to extend HAS with MS-Stream, a pragmatic solution for multiple-source streaming over HTTP that simultaneously utilizes several servers to obtain higher QoE in distributed infrastructures.
Abstract: HTTP Adaptive Streaming (HAS) has become the de facto solution for delivering video over the Internet while increasing end users' Quality of Experience (QoE). We propose to extend HAS with MS-Stream, a pragmatic solution for multiple-source streaming over HTTP that simultaneously utilizes several servers to obtain higher QoE in distributed infrastructures.

Journal Article
Sijie Song, Tao Mei
TL;DR: The future of fashion is being reshaped by multimedia, and researchers are working toward computational fashion, which triggers promising applications and theoretical research in this area.
Abstract: By transforming visual fashion into computational images, multimedia technologies are changing the fashion industry faster than ever before. The huge demand for fashion analytics triggers promising applications and theoretical research in this area. The future of fashion is being reshaped by multimedia, and researchers are working toward computational fashion.

Journal Article
TL;DR: An alternative method to perform high-level multimodal fusion that leverages crossmodal translation by means of symmetrical encoders cast into a bidirectional deep neural network (BiDNN).
Abstract: With the recent resurgence of neural networks and the proliferation of massive amounts of unlabeled multimodal data, recommendation systems and multimodal retrieval systems based on continuous representation spaces and deep learning methods are becoming of great interest. Multimodal representations are typically obtained with autoencoders that reconstruct multimodal data. In this article, we describe an alternative method to perform high-level multimodal fusion that leverages crossmodal translation by means of symmetrical encoders cast into a bidirectional deep neural network (BiDNN). Using the lessons learned from multimodal retrieval, we present a BiDNN-based system that performs video hyperlinking and recommends interesting video segments to a viewer. Results from the TRECVID 2016 video hyperlinking benchmarking initiative show that our method obtained the best score, thus defining the state of the art.

Journal Article
TL;DR: An application framework is presented that implements semi-automatic camera calibration, object extraction, object tracking, and object separation to seamlessly generate high-quality free viewpoint sports videos for handheld devices.
Abstract: Free viewpoint technology makes it possible to view video of sports content from any angle or position, but creating such content is currently a time-consuming process that can prevent real-time delivery. To address this problem, the authors present an application framework that implements semi-automatic camera calibration, object extraction, object tracking, and object separation to seamlessly generate high-quality free viewpoint sports videos for handheld devices.

Journal Article
TL;DR: Emerging trends and attempts by FinTech start-ups to apply AI and multimedia information processing techniques across a wide range of business needs are described.
Abstract: Enthusiasm for artificial intelligence and multimedia information in the financial industry is at an all-time high. Every leader in finance now feels the pressure to answer the question, “What is your AI strategy?” Start-ups are playing a key role in helping the financial sector determine what AI can do and how humans and machines can work together. In this essay, we describe emerging trends and attempts by FinTech start-ups to apply AI and multimedia information processing techniques across a wide range of business needs.

Journal Article
TL;DR: An evaluation of the authors' proposed multimodal method using a job candidate screening system that predicted five personality traits from a short video demonstrates the method's effectiveness.
Abstract: The authors present a novel methodology for analyzing integrated audiovisual signals and language to assess a person's personality. An evaluation of their proposed multimodal method using a job candidate screening system that predicted five personality traits from a short video demonstrates the method's effectiveness.

Journal Article
TL;DR: A new cross-media hashing scheme, multiview cross-media hashing with semantic consistency (MCMHSC), to address the semantic gap between objects by fully exploiting the semantic correlation and complementary information among objects.
Abstract: Cross-media hashing is used to handle both cross-media representation and indexing simultaneously. Most existing methods attempt to bridge the semantic gap by maximizing the correlation of heterogeneous instances describing the same information object. Although these methods guarantee that such instances are close in the commonly shared space, instances describing different objects but the same category may be scattered. We propose a new cross-media hashing scheme, multiview cross-media hashing with semantic consistency (MCMHSC), to address this problem. By fully exploiting the semantic correlation and complementary information among objects, MCMHSC builds discriminative hashing codes. Experiments on two public benchmark datasets show that our proposed scheme achieves comparable or better performance compared to state-of-the-art methods in terms of accuracy and time complexity.
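Whatever hash functions a method like MCMHSC learns, retrieval then works the same way: each modality is mapped to short binary codes in a common Hamming space, and a query from one modality ranks items from another by Hamming distance. In the sketch below, the per-modality projection matrices are hypothetical hand-picked stand-ins for learned hash functions; it shows only the indexing and retrieval step.

```python
def to_code(features, projection):
    """Binary code: 1 where a projected dimension is >= 0, else 0."""
    return tuple(1 if sum(w * x for w, x in zip(row, features)) >= 0 else 0
                 for row in projection)


def hamming(a, b):
    """Number of differing bits between two equal-length codes."""
    return sum(x != y for x, y in zip(a, b))


def cross_media_search(query_code, database, top_k=2):
    """database: list of (item_id, code). Returns the top_k closest ids."""
    ranked = sorted(database, key=lambda entry: hamming(query_code, entry[1]))
    return [item for item, _ in ranked[:top_k]]
```

Because comparisons are bit counts rather than floating-point distances, this step is what makes hashing attractive in terms of time complexity.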

Journal Article
TL;DR: A probabilistic graphical representation called pDisVPL is derived to explain the discriminative mid-level visual part learning problem from a probabilistic point of view, and it demonstrates state-of-the-art performance on image classification benchmarks.
Abstract: This work studies the discriminative mid-level visual part learning problem from a probabilistic point of view and derives a probabilistic graphical representation, called pDisVPL, to explain this learning problem. Extensive experiments on image classification benchmarks demonstrate state-of-the-art performance.

Journal Article
TL;DR: The authors describe four interdisciplinary performance-oriented multimedia projects developed at Bauhaus-Universität Weimar and reflect on the student participants' perspective as well as the lessons they learned as teachers and human-computer interaction researchers.
Abstract: The authors describe four interdisciplinary performance-oriented multimedia projects developed at Bauhaus-Universität Weimar and reflect on the student participants' perspective as well as the lessons they learned as teachers and human-computer interaction researchers.

Journal Article
TL;DR: An overview of visual nonverbal behavior analysis (VNBA) is presented, together with proposed solutions and the research challenges that still need to be addressed adequately, such as fusion of multimodal cues, context estimation, and user privacy protection.
Abstract: Social signal processing (SSP) is a promising automated technology that aims to provide computers with the ability to sense and understand human social behaviors. Representative SSP applications include novel human-computer interaction mechanisms that enhance machine sensitivity to users' emotional and mental states, more engaging games, ambient intelligence systems responsive to social context, and new quantitative psychological evaluation tools for coaching or diagnosis. Based on the cues they adopt, existing SSP methods can be categorized as verbal or nonverbal. Over the last decade, significant progress has been made in visual nonverbal behavior analysis (VNBA). However, several emerging issues, such as fusion of multimodal cues, context estimation, and user privacy protection, still need to be addressed adequately. The authors present an overview of VNBA and describe various research challenges and proposed solutions.

Journal Article
Bo Li, Mingliang Zhou, Yongfei Zhang, Xupeng Lin, Weihan Guo
TL;DR: A highly accurate RC scheme is achieved based on the proposed CTU-level RC model in High Efficiency Video Coding (HEVC) and the validity of the algorithm is experimentally verified.
Abstract: We propose a model parameter estimation scheme for coding tree unit (CTU)-level rate control (RC) in High Efficiency Video Coding (HEVC). Based on the proposed CTU-level RC model, a highly accurate RC scheme is achieved. We experimentally verified the validity of the algorithm.
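For context, CTU-level rate control in HEVC is commonly built on the R-λ model used in the HEVC reference software (HM), where λ = α·bpp^β maps a target bits-per-pixel budget to a Lagrange multiplier, λ is mapped to a QP, and α and β are adapted after each CTU is coded. The sketch below shows that generic baseline, not the authors' proposed parameter estimation scheme; the numeric constants are the commonly cited HM defaults and should be treated as assumptions.

```python
import math

ALPHA0, BETA0 = 3.2003, -1.367         # commonly cited initial values (assumption)
DELTA_ALPHA, DELTA_BETA = 0.1, 0.05    # update step sizes (assumption)


def lambda_from_bpp(alpha, beta, bpp):
    """R-lambda model: lambda = alpha * bpp ** beta."""
    return alpha * bpp ** beta


def qp_from_lambda(lam):
    """Empirical lambda-to-QP mapping, clipped to HEVC's 0..51 QP range."""
    qp = round(4.2005 * math.log(lam) + 13.7122)
    return max(0, min(51, qp))


def update_params(alpha, beta, bpp_real, lambda_real):
    """Adapt alpha/beta after coding a CTU.

    bpp_real: bits per pixel the CTU actually consumed.
    lambda_real: lambda actually used to code it.
    """
    lambda_comp = alpha * bpp_real ** beta          # lambda the model predicts
    err = math.log(lambda_real) - math.log(lambda_comp)
    alpha_new = alpha + DELTA_ALPHA * err * alpha
    beta_new = beta + DELTA_BETA * err * math.log(bpp_real)
    return alpha_new, beta_new
```

Accurate rate control hinges on how well α and β track the content, which is why estimating these model parameters at the CTU level matters.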

Journal Article
TL;DR: This work gives an overview of preventative and precision medicine and the field of deep learning, and points out some existing achievements, positive indications, limitations, and near-future opportunities and impediments.
Abstract: Deep learning has game-changing potential to improve the state of preventative and precision medicine within medical image computing. Here, we first give an overview of preventative and precision medicine and of the field of deep learning. Afterward, we share our perspective on recent research and development activities in both areas and point out some existing achievements, positive indications, limitations, and near-future opportunities and impediments. To flesh out our viewpoints, we draw on examples from our most recent work, which largely stem from radiologic images, but we encourage readers to consult other recent reviews, which include many references that space did not allow us to include. We also assume the reader is broadly familiar with machine learning technologies.

Journal Article
TL;DR: A new mechanism for digital watermarking is proposed that combines a sparsity analysis process with a flexible selection process to embed the watermarks in a carrier image.
Abstract: In this paper, a new mechanism for digital watermarking is proposed. It involves three processes: first, the watermarks are encapsulated into a carrier image; second, a sparsity analysis process is conducted on one component; finally, a flexible selection process is executed to embed the watermarks. Experimental results demonstrate high capacity and strong robustness.

Journal Article
TL;DR: Clustering these networks by their main metrics allows similar musical tracks to be grouped; to show the viability of the approach, results on a dataset of guitar solos are provided.
Abstract: Musical pieces can be modeled as complex networks. This fosters innovative ways to categorize music, paving the way toward novel applications in multimedia domains, such as music didactics, multimedia entertainment, and digital music generation. Clustering these networks by their main metrics allows similar musical tracks to be grouped. To show the viability of the approach, we provide results on a dataset of guitar solos.

Journal Article
TL;DR: This paper fuses latent Dirichlet allocation with three text-based schemes to improve the diversity of retrieved results, and experiments show the advantages of this fusion.
Abstract: To give users a fast overview of a query object that meets their demands, it is important to make the retrieved results as diverse as possible. In this paper, we fuse latent Dirichlet allocation with three text-based schemes to improve the diversity of retrieved results. Experiments demonstrate the advantages of this approach.

Journal Article
TL;DR: The goal of the MMHealth 2017 Workshop on Multimedia for Personal Health and Health Care was to explore the relevance and contribution of multimedia in healthcare and personal health.
Abstract: The goal of the MMHealth 2017 Workshop on Multimedia for Personal Health and Health Care was to explore the relevance and contribution of multimedia in healthcare and personal health. After a successful debut in Amsterdam in 2016, the second workshop at ACM Multimedia 2017 in Mountain View, California, again attracted more than 30 participants, indicating the relevance that this topic has in the multimedia research community.