Showing papers in "IEEE MultiMedia in 2017"


Journal ArticleDOI
TL;DR: It is found that the real scrambling domain--the position-scrambling scope of ISEA's scrambled elements--can be used to support an efficient known or chosen-plaintext attack on it, and it is demonstrated that some advanced multimedia processing techniques can facilitate the cryptanalysis of multimedia encryption algorithms.
Abstract: Position scrambling (permutation) is widely used in multimedia encryption schemes and some international encryption standards, such as the Data Encryption Standard and the Advanced Encryption Standard. In this article, the authors re-evaluate the security of a typical image-scrambling encryption algorithm (ISEA). Using the internal correlation remaining in the cipher image, they disclose important visual information of the corresponding plain image in a ciphertext-only attack scenario. Furthermore, they find that the real scrambling domain--the position-scrambling scope of ISEA's scrambled elements--can be used to support an efficient known or chosen-plaintext attack on it. Detailed experimental results verify these points and demonstrate that some advanced multimedia processing techniques can facilitate the cryptanalysis of multimedia encryption algorithms.

143 citations
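
Why permutation-only encryption is exposed to known or chosen-plaintext attacks can be illustrated with a minimal sketch. It is not ISEA itself and not the authors' attack: the cipher here is a hypothetical pure position scrambler, and the names `scramble`, `recover_permutation`, and `unscramble` are assumptions made for illustration. When the attacker can encrypt one plaintext whose elements are all distinct, the secret permutation can be read off directly.

```python
import random

def scramble(pixels, perm):
    """Permutation-only encryption: output position i takes input position perm[i]."""
    return [pixels[p] for p in perm]

def recover_permutation(known_plain, known_cipher):
    """Recover perm from one plaintext/ciphertext pair whose values are all distinct."""
    position_of = {value: idx for idx, value in enumerate(known_plain)}
    return [position_of[value] for value in known_cipher]

def unscramble(cipher, perm):
    """Invert the scrambling given a (recovered) permutation."""
    plain = [None] * len(cipher)
    for i, p in enumerate(perm):
        plain[p] = cipher[i]
    return plain

rng = random.Random(0)
n = 16

# Secret key of the toy cipher: a random permutation of positions.
secret_perm = list(range(n))
rng.shuffle(secret_perm)

# Chosen plaintext with distinct values reveals the whole permutation.
chosen_plain = list(range(n))
chosen_cipher = scramble(chosen_plain, secret_perm)
recovered = recover_permutation(chosen_plain, chosen_cipher)

# Any other message encrypted under the same key can now be decrypted.
victim_plain = [rng.randrange(256) for _ in range(n)]
victim_cipher = scramble(victim_plain, secret_perm)
decrypted = unscramble(victim_cipher, recovered)
```

Real images contain repeated pixel values, so a single pair usually leaves ambiguities and several pairs are needed; the sketch sidesteps this by choosing distinct values.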


Journal ArticleDOI
TL;DR: A snapshot of research into touch, taste, and smell is provided, carried out at the Sussex Computer Human Interaction Lab at the University of Sussex in Brighton, UK.
Abstract: For decades, the use of vision and audition for interaction dominated the field of human-computer interaction (HCI), despite the fact that nature has provided many more senses for perceiving and interacting with the world. Recently, HCI researchers have started trying to capitalize on touch, taste, and smell when designing interactive tasks, especially in gaming, multimedia, and art environments. Here, the authors provide a snapshot of their research into touch, taste, and smell, carried out at the Sussex Computer Human Interaction (SCHI) Lab at the University of Sussex in Brighton, UK.

71 citations


Journal ArticleDOI
Jingyi Yu
TL;DR: The author argues that light-field technology, a 3D imaging technology that captures the light rays people perceive from different locations and directions, provides a viable path to low-cost, high-quality VR content when combined with computer vision and machine learning.
Abstract: Producing ultra-high-quality content is the next grand challenge for the VR industry. Creating a virtual reality that even the human eye cannot distinguish from the real world will require light-field technology--3D imaging technology that captures the light rays people perceive from different locations and directions. When combined with computer vision and machine learning, light-field technology provides a viable path for producing low-cost, high-quality VR content.

67 citations


Journal ArticleDOI
TL;DR: JPEG celebrates the 25th anniversary of its approval as a standard this year; the article asks where JPEG came from and which fundamental components have given it longevity.
Abstract: JPEG is celebrating the 25th anniversary of its approval as a standard this year. Where did JPEG come from, and what are the fundamental components that have given it longevity?

44 citations


Journal ArticleDOI
TL;DR: Open Symphony is designed to explore audience-performer interaction in live music performances, assisted by digital technology; the authors identify further design challenges around audience sense of control, learnability, and compositional structure.
Abstract: Most contemporary Western performing arts practices restrict creative interactions from audiences. Open Symphony is designed to explore audience-performer interaction in live music performances, assisted by digital technology. Audiences can conduct improvising performers by voting for various musical "modes." Technological components include a web-based mobile application, a visual client displaying generated symbolic scores, and a server service for the exchange of creative data. The interaction model, app, and visualization were designed through an iterative participatory design process. The system was experienced by about 120 audience and performer participants (35 completed surveys) in controlled (lab) and real-world settings. Feedback on usability and user experience was overall positive, and live interactions demonstrate significant levels of audience creative engagement. The authors identified further design challenges around audience sense of control, learnability, and compositional structure. This article is part of a special issue on multimedia for enriched music.

41 citations


Journal ArticleDOI
TL;DR: The Benchmarking Initiative for Multimedia Evaluation (MediaEval) organizes an annual cycle of scientific evaluation tasks that help the research community tackle challenges linked to less widely studied user needs.
Abstract: The Benchmarking Initiative for Multimedia Evaluation (MediaEval) organizes an annual cycle of scientific evaluation tasks in the area of multimedia access and retrieval. The tasks offer scientific challenges to researchers working in diverse areas of multimedia technology. The tasks, which are focused on the social and human aspects of multimedia, help the research community tackle challenges linked to less widely studied user needs. They also support researchers in investigating the diversity of perspectives that naturally arise when users interact with multimedia content. Here, the authors present highlights from the 2016 workshop.

38 citations


Journal ArticleDOI
Abstract: In this survey of academic contributions driving augmented reality’s commercial potential and of the industry trends advancing software and hardware developments and emerging applications, the author offers advice on how startups hoping to leverage these advances can compete against senior tech tycoons.

37 citations


Journal ArticleDOI
TL;DR: The authors introduce a standards-compliant video-encoding scheme that suppresses unnecessary temporal fluctuation in stable background areas of a raw video, improving object-detection performance and lowering bit rates at comparable quality.
Abstract: Many distributed wireless surveillance applications use compressed videos for automatic video analysis tasks. However, the accuracy of object detection--which is essential for video analysis--can be reduced because lossy compression degrades video quality. Current standardized video-encoding schemes can cause temporal fluctuation for encoded blocks in stable background areas of a raw video, which strongly affects object-detection accuracy. To obtain better object-detection performance on compressed videos, the authors introduce a standards-compliant video-encoding scheme that can suppress unnecessary temporal fluctuation in stable background areas. New mode-decision strategies, designed for both intra- and interframes, reduce the temporal fluctuation while maintaining acceptable rate-distortion performance. Experimental results show that, compared with traditional encoding schemes, the proposed scheme improves object-detection performance and results in lower bit rates with comparable quality.

26 citations


Journal ArticleDOI
TL;DR: A selective privacy-preserving method is proposed that adaptively allocates encryption resources according to the privacy weight and execution time of each data package and selects the encryption method with the appropriate complexity and security level for each multimedia data package.
Abstract: With the significant improvements in mobile digital devices and wireless networking technologies, we have witnessed the explosion of multimedia data. Because it is dynamic, vast in volume, and heterogeneous, this data not only evokes various novel data-driven services and applications, but also brings considerable security threats. In this article, the authors focus on privacy leakage issues in multimedia systems and study how to maximize the total privacy weights and upgrade the security level given predefined time and resource constraints. To this end, they propose a selective privacy-preserving method that adaptively allocates encryption resources according to the privacy weight and execution time of each data package. That is, it selects the encryption method with the appropriate complexity and security level for each multimedia data package. It first divides the data randomly into two parts, then performs XOR operations and generates cipher keys in different cloud storages to prevent users’ original information from being attacked by untrusted cloud operators. Extensive simulation results have demonstrated the advantages and superiority of the proposed method over previous schemes. This article is part of a special issue on cybersecurity.

24 citations
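
The split-and-XOR idea mentioned in the abstract can be illustrated with a generic two-share XOR split. This is a sketch under assumptions, not the authors' procedure: the abstract does not specify how the parts or the cipher keys are generated, and the function names `xor_split` and `xor_combine` are hypothetical. Each share on its own is indistinguishable from random bytes, so storing the two shares with different cloud operators keeps either operator from reading the original data.

```python
import secrets

def xor_split(data: bytes):
    """Split data into two shares; either share alone looks uniformly random."""
    share_a = secrets.token_bytes(len(data))            # random pad (first share)
    share_b = bytes(d ^ a for d, a in zip(data, share_a))  # data XOR pad (second share)
    return share_a, share_b

def xor_combine(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine both shares to recover the original data."""
    return bytes(a ^ b for a, b in zip(share_a, share_b))

package = b"multimedia data package"
share_a, share_b = xor_split(package)   # e.g. one share per cloud storage
restored = xor_combine(share_a, share_b)
```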


Journal ArticleDOI
TL;DR: This work discusses aspects of DMI design by focusing on the complexity of the design space and the importance of prototyping cycles, and proposes a new methodology and an associated physical prototyping toolkit, which has building blocks inspired by existing instruments.
Abstract: Digital musical instruments (DMIs) make up a class of devices in which gestural control and sound production are physically decoupled, but digitally mapped. This work discusses aspects of DMI design by focusing on the complexity of the design space and the importance of prototyping cycles. The authors' research questions cover how to provide an initial path for generating DMI ideas and how to reduce the time and effort required to build functional DMI prototypes. To address these questions, they propose a new methodology and an associated physical prototyping toolkit, which has building blocks inspired by existing instruments. Preliminary tests with musicians and DMI designers revealed a strong potential for its use in the development of DMIs, and also uncovered limitations of the current toolkit. This article is part of a special issue on multimedia technologies for enriched music.

21 citations


Journal ArticleDOI
TL;DR: The experimental results show that the proposed VQA mechanism for next-generation (5G) mobile networks can monitor video quality when the network is degraded, while running the assessment at the network edge significantly reduces power consumption at the user equipment.
Abstract: This article proposes a video quality assessment (VQA) mechanism for next-generation (5G) mobile networks, following the small cell deployment architecture. The proposed method uses the structural similarity (SSIM) index as a reduced reference metric and is suitable for implementation as a virtual network function (VNF) within an IT infrastructure located close to the small cell. It enables the in-service monitoring of the delivered video quality, which is a useful tool for mobile network operators to monitor their customers’ satisfaction. An advantage of the proposed method is that the complex and power-consuming VQA process is performed at the edge of the network, and not at the user equipment itself, thus significantly reducing power consumption. The authors used a small cell experimental testbed to implement and evaluate the performance of the proposed method. The experimental results show that the proposed method can monitor the video quality when the network is degraded.
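
For readers unfamiliar with the SSIM index, the following sketch computes it over a single global window. It is a simplification made for illustration: the standard index averages the same formula over local sliding windows, and the article uses SSIM in a reduced-reference setting whose details the abstract does not give. The function name `global_ssim` and the sample values are assumptions.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """SSIM computed over one global window of two equal-length pixel lists."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants with the commonly used K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

reference = [52, 55, 61, 59, 79, 61, 76, 61]
degraded = [v // 16 * 16 for v in reference]   # coarse quantization as a stand-in for loss

score_same = global_ssim(reference, reference)
score_degraded = global_ssim(reference, degraded)
```

Identical signals score 1, and any distortion lowers the score, which is what makes the index usable as a monitoring signal.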

Journal ArticleDOI
TL;DR: The authors provide a first-person outlook on the technical challenges involved in the recording, analysis, archiving, and cloud-based interchange of multimodal string quartet performance data as part of a collaborative research project on ensemble music making, and develop a hosting platform through which multimodal data can be stored, visualized, annotated, and selectively retrieved via a web interface and a dedicated API.
Abstract: The authors provide a first-person outlook on the technical challenges involved in the recording, analysis, archiving, and cloud-based interchange of multimodal string quartet performance data as part of a collaborative research project on ensemble music making. To facilitate the sharing of their own collection of multimodal recordings and extracted descriptors and annotations, they developed a hosting platform through which multimodal data (audio, video, motion capture, and derived signals) can be stored, visualized, annotated, and selectively retrieved via a web interface and a dedicated API. This article offers a twofold contribution: the authors open their collection of enriched multimodal recordings, the Quartet dataset, to the community, and they introduce and enable access to their multimodal data exchange platform and web application, the Repovizz system. This article is part of a special issue on multimedia technologies for enriched music.

Journal ArticleDOI
TL;DR: The authors designed a network function virtualization (NFV)-based virtual cache (vCache) that can dynamically manage video chunks cost-effectively and can intelligently provision resources to guarantee transcoding delays won't affect streaming services.
Abstract: In adaptive bitrate (ABR) streaming, each video must be transcoded into multiple representations. Transcoding and caching videos consume tremendous resources, and only a small percentage of video chunks are frequently requested. Thus a question arises: is it necessary to pre-transcode each video and cache all video chunks? To answer this, the authors designed a network function virtualization (NFV)-based virtual cache (vCache). In vCache, video chunks have two mutually exclusive caching states: physically cached and virtually cached. A physically cached video chunk can be directly read from storage, and it consumes storage resources. A virtually cached video chunk will be transcoded online when being requested, and it consumes computing resources. With NFV, vCache can dynamically manage video chunks cost-effectively and can intelligently provision resources to guarantee transcoding delays won't affect streaming services. Results from experiments show that vCache can greatly reduce operational costs for ABR. This article is part of a special issue on advancing multimedia distribution.
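
The two mutually exclusive caching states can be sketched as a small class. This is a toy model under assumptions, not the authors' system: the class name `VCacheSketch`, the fake `_transcode` step, and the rule that promotes a chunk to physical storage after a fixed number of requests are all hypothetical, since the abstract does not describe vCache's management policy.

```python
class VCacheSketch:
    """Toy model: a chunk is either physically cached or virtually cached."""

    def __init__(self, promote_after=3):
        self.physical = {}        # chunk_id -> stored bytes (costs storage)
        self.virtual = set()      # chunk_ids transcoded on demand (costs compute)
        self.requests = {}        # chunk_id -> number of on-demand requests so far
        self.promote_after = promote_after  # assumed promotion threshold
        self.transcode_count = 0

    def register(self, chunk_id):
        """New chunks start out virtually cached."""
        self.virtual.add(chunk_id)

    def _transcode(self, chunk_id):
        """Stand-in for online transcoding."""
        self.transcode_count += 1
        return f"transcoded:{chunk_id}".encode()

    def request(self, chunk_id):
        if chunk_id in self.physical:
            return self.physical[chunk_id]          # read directly from storage
        if chunk_id not in self.virtual:
            raise KeyError(chunk_id)
        data = self._transcode(chunk_id)            # transcode when requested
        self.requests[chunk_id] = self.requests.get(chunk_id, 0) + 1
        if self.requests[chunk_id] >= self.promote_after:
            # Popular chunk: switch state from virtual to physical.
            self.virtual.discard(chunk_id)
            self.physical[chunk_id] = data
        return data

cache = VCacheSketch(promote_after=2)
chunk = "video42/720p/0001"
cache.register(chunk)
responses = [cache.request(chunk) for _ in range(3)]
```

In this run the first two requests trigger transcoding and the third is served from storage, which shows the storage-versus-compute trade-off the abstract describes.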

Journal ArticleDOI
TL;DR: Deep learning alone will not solve all the problems encountered in industrial robotics, but it will certainly improve the perception capabilities of robotics systems, given its power to recognize complex real-world patterns robustly.
Abstract: The pattern recognition capabilities of deep learning have pushed the limits in various fields—and industrial robotics is no exception. Deep learning alone will not solve all the problems encountered in industrial robotics, but it will certainly improve the perception capabilities of robotics systems, given its power to recognize complex real-world patterns robustly. Here, the author examines robotics applications in deep learning.

Journal ArticleDOI
TL;DR: The authors propose to model the bandwidth allocation problem as a sigmoidal programming problem, more closely representing video traffic, and solve this nonconvex optimization problem using an approximation algorithm.
Abstract: The problem of bandwidth allocation in networks is traditionally solved using distributed rate allocation algorithms under the general framework of network utility maximization (NUM). Despite many advances in solving the computationally intensive flow assignment problem in NUM, the common but unrealistic assumption of concavity of utility functions undermines the performance of existing systems in providing satisfactory quality of experience (QoE) to consumers of video traffic, the utility function of which is not concave, but sigmoidal. The authors propose to model the bandwidth allocation problem as a sigmoidal programming problem, more closely representing video traffic, and solve this nonconvex optimization problem using an approximation algorithm. Their simulation results for video streaming over a range of tree-shaped content delivery networks indicate improvements of at least 60 percent in average utility/QoE and 45 percent in fairness, while using slightly fewer network resources, compared to two representative methods: proportional fair and max-min fair.
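
Why the concavity assumption matters can be seen in a two-user toy example. The sketch below is not the authors' approximation algorithm; the sigmoid parameters, the link capacity, and the function names are illustrative assumptions. With a concave utility, splitting the link equally maximizes total utility, whereas with a sigmoidal utility an equal split can leave both users below the steep part of the curve.

```python
import math

def sigmoid_utility(rate, steepness=2.0, midpoint=3.0):
    """Sigmoidal QoE model: little value below the midpoint rate, saturating above it."""
    return 1.0 / (1.0 + math.exp(-steepness * (rate - midpoint)))

def log_utility(rate):
    """Classic concave utility used in network utility maximization."""
    return math.log(1.0 + rate)

capacity = 4.0  # hypothetical link capacity shared by two users

# Equal split versus giving the whole link to one user.
sig_equal = 2 * sigmoid_utility(capacity / 2)
sig_unequal = sigmoid_utility(capacity) + sigmoid_utility(0.0)

log_equal = 2 * log_utility(capacity / 2)
log_unequal = log_utility(capacity) + log_utility(0.0)
```

Under the concave model the equal split wins, while under the sigmoidal model the unequal allocation yields the higher total, which is why methods that assume concavity can misallocate bandwidth for video traffic.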

Journal ArticleDOI
TL;DR: Empirical tests show that although the responsive websites investigated had acceptable levels of accessibility, they posed numerous usability barriers and triggered intense, negative user emotions.
Abstract: Recent studies show that websites complying with accessibility guidelines can still be ineffective, inefficient, and unpleasant. Compliance with accessibility guidelines does not guarantee blind users' satisfaction when accessing websites. Meanwhile, in recent years, websites have undergone radical changes regarding design, development, and construction. Responsive design is a new trend that has a strong impact on web design. To determine blind users' experience with responsive design, the authors performed empirical tests to investigate the impact of responsive design on the emotions of blind users during web interactions. They measured user emotions by applying the Positive and Negative Affect Schedule (PANAS) instrument. Results show that although the responsive websites investigated had acceptable levels of accessibility, they posed numerous usability barriers and triggered intense, negative user emotions. Furthermore, the average number of negative emotional reactions for blind users was higher in the case of responsive web design than in the case of nonresponsive web design.

Journal ArticleDOI
TL;DR: Security and privacy issues in multimedia crowdsensing are identified, and existing solutions designed to protect both data producers and consumers in multimedia crowdsensing are described.
Abstract: Recent smartphones are equipped with various sensors, such as an accelerometer, GPS, and a gravity sensor, and have high-performance wireless communication capabilities. Through the ubiquitous presence of powerful mobile devices, crowdsensing lets ordinary people collectively gather and share real-time multimedia data. Multimedia crowdsensing has made large-scale participatory sensing viable in a speedy and cost-efficient manner, but it also introduces some security and privacy concerns. Personally identifiable information of participants can be exposed while sharing individually owned sensor data. This article identifies security and privacy issues in multimedia crowdsensing and describes existing solutions that are designed to protect both data producers and consumers in multimedia crowdsensing. This article is part of a special issue on cybersecurity.

Journal ArticleDOI
TL;DR: Yong Rui discusses a panel he attended, which featured various competitions between AI and humans, and considers AI's progress and remaining challenges and its potential to augment human intelligence.
Abstract: Yong Rui discusses a panel he attended, which featured various competitions between AI and humans. Rui considers AI's progress and remaining challenges and its potential to augment human intelligence.

Journal ArticleDOI
TL;DR: The authors analyze the forces driving the evolution of the network architecture, then describe a novel NFV-enabled media cloud system from two alternative perspectives: an end-to-end view and a layered view, and show that the cost of providing Internet media services by NFV-enabled systems could be substantially reduced in practice.
Abstract: Recently, network operators around the world have encountered a paradox. On the one hand, they have made enormous investments to handle the tremendous growth of Internet media services triggered by the ubiquitous penetration of high-speed wireless networks and the increasing popularity of mobile devices. On the other hand, these investments have not delivered the expected revenue, leading to declining profitability. In response to this paradox, network function virtualization (NFV) has been proposed as a revolutionary technology to transform network architecture and operations. This emerging technology opens up significant opportunities to reduce the cost of operating media services. In this article, the authors present a survey on this emerging topic. Specifically, they analyze the forces driving the evolution of the network architecture, then describe a novel NFV-enabled media cloud system from two alternative perspectives: an end-to-end view and a layered view. They then illustrate the architectural changes' technical challenges, which range from optimal resource provision and request routing to automated optimization. Finally, they substantiate the NFV-enabled media cloud platform with proof-of-concept case studies. By demonstrating NFV's flexibility and effectiveness, they show that the cost of providing Internet media services by NFV-enabled systems could be substantially reduced in practice.

Journal ArticleDOI
TL;DR: This article investigates how switching to state-of-the-art NFV products for multimedia content delivery can result in significant energy costs and identifies energy inefficiency in the NFV data plane, which can be exacerbated if not handled properly.
Abstract: Multimedia now accounts for the largest share of all Internet traffic, highlighted by its volume, variety, multicast nature, and QoS constraints. Downstream toward consumers, multimedia traffic can traverse through middleboxes, undergoing additional data processing imposed by content providers and distributors. With the advent of network function virtualization (NFV), middleboxes are progressively embedded in off-the-shelf, general-purpose servers. Despite the benefits, NFV can incur an undue amount of energy consumption during high packet forwarding. In this article, the authors investigate how switching to state-of-the-art NFV products for multimedia content delivery can result in significant energy costs. They identify energy inefficiency in the NFV data plane, which can be exacerbated if not handled properly. They outline a power management framework that considers characteristics of multimedia traffic and exploits CPU frequency scaling to save energy. This article is part of a special issue on advancing multimedia distribution.

Journal ArticleDOI
Ran Wang, Guangquan Xu, Bin Liu, Yan Cao, Xiaohong Li
TL;DR: Experimental results show that the proposed IPS3 scheme outperforms traditional network flow watermarking in both noise filtering and multistream tracing, achieving higher accuracy in anonymous network tracing.
Abstract: To solve the problems of noise interference and multistream tracing in anonymous network tracing, this article proposes an interval packet-size-based spread spectrum (IPS3) network flow watermarking technology, which adopts a new watermarking carrier based on the original direct sequence spread spectrum (DSSS) technology. On one hand, IPS3 solves the problem of multistream tracing through the operation of DSSS for original watermarking. On the other hand, taking the average packet size in a time interval as the watermarking carrier, it solves the problem of network flow watermarking technology being subjected to the network stability by adjusting the carrier size in the process of watermarking modulation. Experimental results show that the proposed IPS3 scheme outperforms the traditional network flow watermarking technology in the above-mentioned two issues: noise filtering and multistream tracing, which can achieve higher accuracy when it is used in anonymous network tracing. This article is part of a special issue on cybersecurity.
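
The direct sequence spread spectrum step that IPS3 builds on can be sketched as follows. This is a generic DSSS illustration, not the IPS3 scheme: it omits the packet-size carrier entirely, and the code length, seeds, noise model, and function names are assumptions. Each watermark bit is multiplied by a pseudo-noise (PN) code, and the detector recovers it by correlating the received signal with the same code.

```python
import random

def make_pn_code(length, seed):
    """Pseudo-noise code of +1/-1 chips shared by embedder and detector."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]

def spread(bits, pn):
    """Map each bit to +1/-1 and multiply it by every chip of the PN code."""
    chips = []
    for bit in bits:
        symbol = 1 if bit else -1
        chips.extend(symbol * c for c in pn)
    return chips

def despread(signal, pn):
    """Correlate each code-length block with the PN code and take the sign."""
    bits = []
    for start in range(0, len(signal), len(pn)):
        block = signal[start:start + len(pn)]
        correlation = sum(s * c for s, c in zip(block, pn))
        bits.append(1 if correlation > 0 else 0)
    return bits

watermark = [1, 0, 1, 1, 0, 0, 1, 0]
pn_code = make_pn_code(length=31, seed=7)

# Simulated channel: bounded additive noise on every chip.
noise_rng = random.Random(1)
received = [chip + noise_rng.uniform(-0.5, 0.5) for chip in spread(watermark, pn_code)]

recovered_bits = despread(received, pn_code)
```

Because the noise on each chip is smaller than the chip amplitude, the correlation always keeps the sign of the embedded bit. Giving each traced flow its own PN code is what lets several watermarked streams be told apart.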

Journal ArticleDOI
Lifeng Sun, Ming Ma, Wen Hu, Haitian Pang, Zhi Wang
TL;DR: The authors perform a large-scale measurement to explore the popularity of geocontent and design solutions for the system's operation, including regional content popularity prediction, region partition, and collaborative content replication.
Abstract: To deliver video traffic with guaranteed quality of experience, providers have been deploying content delivery networks (CDNs) closer to users. CDN providers are also starting to use the storage and network resources of edge network devices for content delivery--that is, crowdsourced CDN. This article envisions a crowdsourced CDN and proposes a set of practical strategies to guide the implementation of this new paradigm. The authors perform a large-scale measurement to explore the popularity of geocontent. They then design solutions for the system's operation, including regional content popularity prediction, region partition, and collaborative content replication. In addition, they propose an auction mechanism to cope with resource competition among multiple content providers, and they demonstrate the performance gain of the design using data-driven simulation.

Journal ArticleDOI
TL;DR: Discrete Cross-Modal Hashing (DCMH) is a novel supervised cross-modal hashing method to learn the binary codes without relaxing them, and it learns binary codes for use as ideal features for classification.
Abstract: Hashing techniques have been widely adopted for cross-modal retrieval due to their low storage cost and fast query speed. Recently, some unimodal hashing methods have tried to directly optimize the objective function with discrete binary constraints. Inspired by these methods, the authors propose a novel supervised cross-modal hashing method called Discrete Cross-Modal Hashing (DCMH) to learn the binary codes without relaxing them. DCMH is formulated through semantic similarity reconstruction, and it learns binary codes for use as ideal features for classification. Furthermore, DCMH alternately updates binary codes for each modality, and its discrete hashing codes are learned efficiently, bit by bit, which is quite promising for large-scale datasets. To evaluate the effectiveness of the proposed discrete optimization, the authors optimize their objective function in a relax-and-threshold manner. Extensive empirical results on both image-text and image-tag datasets demonstrate that DCMH is a significant improvement over previous approaches in terms of training time and retrieval performance.
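
How binary codes support fast cross-modal retrieval can be shown with a toy example. The sketch does not implement DCMH, which learns codes without relaxation; it shows only the relax-and-threshold step that the abstract uses as a comparison point, followed by ranking by Hamming distance. All real-valued vectors, item names, and function names are made up for illustration.

```python
def threshold(relaxed):
    """Relax-and-threshold: keep only the sign of each real-valued entry."""
    return [1 if v >= 0 else 0 for v in relaxed]

def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return sum(x != y for x, y in zip(a, b))

def rank_by_hamming(query_code, database):
    """Return database keys ordered from nearest to farthest code."""
    return sorted(database, key=lambda name: hamming(query_code, database[name]))

# Hypothetical relaxed embeddings for three images, binarized into 4-bit codes.
image_codes = {
    "img_cat": threshold([0.9, -0.2, 0.4, -0.7]),
    "img_car": threshold([-0.8, 0.6, -0.1, 0.3]),
    "img_dog": threshold([0.7, -0.5, -0.3, -0.9]),
}

# A text query hashed into the same code space retrieves images directly.
text_query = threshold([0.8, -0.1, 0.5, -0.6])
ranking = rank_by_hamming(text_query, image_codes)
```

Thresholding discards the magnitude of each entry, which is the quantization loss that discrete optimization methods aim to avoid.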

Journal ArticleDOI
TL;DR: This article explores a different approach: MCS based on word of mouth (WoM), in which crowdworkers, apart from executing tasks, exploit their mobile social networks and/or physical encounters to actively recruit other appropriate individuals to work on the task.
Abstract: By fully exploring various sensing capabilities and multiple wireless interfaces of mobile devices and integrating them with human power and intelligence, mobile crowdsourcing (MCS) is emerging as an effective paradigm for large-scale multimedia-related applications. However, most MCS schemes use a direct mode, in which crowdworkers passively or actively select tasks and contribute without interacting and collaborating with each other; such a mode can hamper some time-constrained crowdsourced tasks. This article explores a different approach: MCS based on word of mouth (WoM), in which crowdworkers, apart from executing tasks, exploit their mobile social networks and/or physical encounters to actively recruit other appropriate individuals to work on the task. The authors describe a WoM-based MCS architecture and typical applications, which they divide into Internet-scale and local scale. They then systematically summarize the main technical challenges, including crowdworker recruitment, incentive design, security and privacy, and data quality control, and they compare typical solutions. Finally, from a systems-level viewpoint, they discuss several practical issues that must be resolved. This article is part of a special issue on cybersecurity.

Journal ArticleDOI
TL;DR: This work proposes a practical network-assisted HAS system where the network elements infer the network link congestion using measurements collected from the client endpoints, and the congestion-level signal is then used by the clients to optimize their video data requests.
Abstract: HTTP Adaptive Streaming (HAS) can efficiently deliver video to multiple heterogeneous users in a fully distributed way. This might, however, lead to unfair bandwidth utilization among HAS users. Therefore, network-assisted HAS systems have been proposed, where network elements operate alongside clients' adaptation logic to improve user satisfaction. However, current solutions rely on the assumption that network elements have full knowledge of the network status, which isn't always realistic. In this work, the authors propose a practical network-assisted HAS system where the network elements infer the network link congestion using measurements collected from the client endpoints. The congestion-level signal is then used by the clients to optimize their video data requests. The authors' novel controller maximizes overall user satisfaction, and the clients share the available bandwidth fairly from a utility perspective, as demonstrated by simulation results obtained on a network simulator.

Journal ArticleDOI
Abstract: This special issue gathers state-of-the-art research on multimedia methods and technologies aimed at enriching music performance, production, and consumption.

Journal ArticleDOI
TL;DR: Susanne Boll recounts her experience as a telepresence robot attending this year’s ACM Conference on Human Factors in Computing Systems (CHI), illustrating the fusion of the state of the art in robotics, long-distance interaction, and shared remote-audio-visual experiences.
Abstract: Susanne Boll recounts her experience as a telepresence robot attending this year’s ACM Conference on Human Factors in Computing Systems (CHI), illustrating the fusion of the state of the art in robotics, long-distance interaction, and shared remote-audio-visual experiences. As a researcher in human computer interaction, yet with an international team facing restricted travel owing to the newly imposed US travel ban, she opted for remote participation at CHI 2017. Furthermore, as a member of the multimedia community, she wanted to explore first-hand how multimedia is at work in the field of telepresence.

Journal ArticleDOI
TL;DR: The article establishes a general framework for mining audience behaviors from change points in TV ratings data, proposes two applications based on it--interactive audience behavior mining tools and popular news topic detection--and shows that the framework can extract valuable knowledge from TV ratings data.
Abstract: TV ratings play an important role in the analysis of advertising, risk management, and social trends. The ratings reflect the interests of audiences, so valuable knowledge could be discovered by analyzing ratings in combination with multimedia content, such as broadcast video and transcripts. This article establishes a general framework for mining audience behaviors. The authors focus on change points in TV ratings data, which reflect the active intentions of users. Meaningful patterns are extracted from a large number of change points by filtering and aggregating the data. The authors propose two applications based on their framework--interactive audience behavior mining tools and popular news topic detection. The results demonstrate that their framework can extract valuable knowledge from TV ratings data.
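
What a change point in ratings data looks like can be illustrated with a naive detector. The abstract does not say how the authors detect, filter, or aggregate change points, so the thresholded first difference below, the function name `change_points`, and the sample ratings are all assumptions.

```python
def change_points(ratings, threshold):
    """Indices where the rating jumps by at least `threshold` from the previous sample."""
    return [
        i for i in range(1, len(ratings))
        if abs(ratings[i] - ratings[i - 1]) >= threshold
    ]

# Hypothetical per-minute ratings (percent of households) for one channel.
ratings = [5.0, 5.1, 5.0, 7.4, 7.5, 7.3, 4.9, 5.0]
points = change_points(ratings, threshold=1.0)
```

In this series the audience surges at index 3 and drops at index 6; aligning such indices with broadcast video and transcripts is the kind of analysis the framework targets.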

Journal ArticleDOI
TL;DR: It is argued that the next grand challenge for the multimedia community will be understanding and formally modeling the flow of life around us, over many modalities and scales, which will require supplementing generic signal/sensor-based retrieval with syntactical, semantic, and pragmatics-based approaches.
Abstract: In reviewing how content-based retrieval has evolved over the years, the authors argue that the next grand challenge for the multimedia community will be understanding and formally modeling the flow of life around us, over many modalities and scales. This will require supplementing generic signal/sensor-based retrieval with syntactical, semantic, and pragmatics-based approaches. To determine whether a data-driven approach supports their observations regarding the evolution of content-based retrieval, the authors examine the types of research reported at the ACM International Conference on Multimedia from 2005 to 2015.

Journal ArticleDOI
TL;DR: This article investigates how the two main actors of the video delivery chain--the CDN operator and the ISP--can benefit from network and server virtualization to negotiate dynamic service-level agreements that reduce CDN capital expenditures and operating expenses, while generating more revenue for the ISP.
Abstract: Today, over-the-top video streaming is gaining a lot of popularity. In this respect, the virtual content delivery network (CDN) is perceived as a key enabler to circumvent the technical challenges faced by content providers to deliver high-quality content over the Internet. Here, the authors investigate how the two main actors of the video delivery chain--the CDN operator and the ISP--can benefit from network and server virtualization to negotiate dynamic service-level agreements that reduce CDN capital expenditures and operating expenses, while generating more revenue for the ISP. First, the authors present a dataset used to simulate dynamic distributed traffic consumption. Second, they discuss the steps required to deploy and operate a virtual CDN deployed on an ISP's network. Furthermore, they present evaluation results of the proposed solution, based on simple models. Lastly, they elaborate on operational parameters that are used to further optimize the solution. This article is part of a special issue on advancing multimedia distribution.