
Showing papers by "Christian Timmerer published in 2022"


Proceedings ArticleDOI
14 Jun 2022
TL;DR: The Video Complexity Analyzer (VCA) project aims to provide an efficient spatial and temporal complexity analysis of each video (segment) which can be used in various applications to find the optimal encoding decisions.
Abstract: For online analysis of the video content complexity in live streaming applications, selecting low-complexity features is critical to ensure low-latency video streaming without disruptions. To this end, for each video (segment), two features, i.e., the average texture energy and the average gradient of the texture energy, are determined. A DCT-based energy function is introduced to determine the block-wise texture of each frame. The spatial and temporal features of the video (segment) are derived from this DCT-based energy function. The Video Complexity Analyzer (VCA) project aims to provide an efficient spatial and temporal complexity analysis of each video (segment) which can be used in various applications to find the optimal encoding decisions. VCA leverages some of the x86 Single Instruction Multiple Data (SIMD) optimizations for Intel CPUs and multi-threading optimizations to achieve increased performance. VCA is an open-source library published under the GNU GPLv3 license. GitHub: https://github.com/cd-athena/VCA Online documentation: https://cd-athena.github.io/VCA/ Website: https://vca.itec.aau.at/
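The two features above can be sketched in a few lines (a simplified illustration, not VCA's optimized implementation: the 32x32 block size and the unweighted sum of absolute AC coefficients are assumptions; VCA's actual energy function weights coefficients by frequency and uses SIMD-optimized DCTs):

```python
import numpy as np
from scipy.fft import dctn

def block_texture_energy(frame, block_size=32):
    """Texture energy per block: sum of absolute AC DCT coefficients
    (simplified sketch of the DCT-based energy function)."""
    h, w = frame.shape
    energies = []
    for y in range(0, h - block_size + 1, block_size):
        for x in range(0, w - block_size + 1, block_size):
            block = frame[y:y + block_size, x:x + block_size].astype(np.float64)
            coeffs = dctn(block, norm='ortho')
            coeffs[0, 0] = 0.0  # drop the DC term: texture is the AC energy
            energies.append(np.abs(coeffs).sum())
    return np.array(energies)

def vca_like_features(frames, block_size=32):
    """E: average texture energy over all blocks and frames;
    h: average gradient of texture energy between consecutive frames."""
    per_frame = [block_texture_energy(f, block_size) for f in frames]
    E = float(np.mean([e.mean() for e in per_frame]))
    h = float(np.mean([np.abs(cur - prev).mean()
                       for prev, cur in zip(per_frame, per_frame[1:])]))
    return E, h
```

Both features only require one DCT pass per block, which is what keeps the analysis cheap enough for live scenarios.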

27 citations


Proceedings ArticleDOI
23 May 2022
TL;DR: An online per-title encoding scheme (OPTE) for live video streaming applications that predicts each target bitrate’s optimal resolution from any pre-defined set of resolutions using Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features for each video segment.
Abstract: Current per-title encoding schemes encode the same video content at various bitrates and spatial resolutions to find an optimized bitrate ladder for each video content in Video on Demand (VoD) applications. However, in live streaming applications, a bitrate ladder with fixed bitrate-resolution pairs is used to avoid the additional latency incurred in finding optimum bitrate-resolution pairs for every video content. This paper introduces an online per-title encoding scheme (OPTE) for live video streaming applications. In this scheme, each target bitrate’s optimal resolution is predicted from any pre-defined set of resolutions using Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features for each video segment. Experimental results show that, on average, OPTE yields bitrate savings of 20.45% and 28.45% to maintain the same PSNR and VMAF, respectively, compared to a fixed bitrate ladder scheme (as adopted in current live streaming deployments) without any noticeable additional latency in streaming.
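The selection step can be sketched as picking, per target bitrate, the resolution that minimizes a complexity-aware penalty (the penalty model, `predicted_penalty`, and all constants below are illustrative assumptions, not OPTE's trained predictor):

```python
import math

def predicted_penalty(bitrate, res, complexity, full_res=(3840, 2160), alpha=1.0):
    """Toy cost model: small resolutions lose detail to upscaling; large
    resolutions starve for bits at low bitrates, scaled by the segment's
    DCT-energy complexity. Constants are illustrative, not from the paper."""
    pixels = res[0] * res[1]
    full_pixels = full_res[0] * full_res[1]
    upscale = math.log(full_pixels / pixels)          # detail lost by downscaling
    compress = alpha * complexity * pixels / bitrate  # too few bits per pixel
    return upscale + compress

def optimal_resolution(bitrate, resolutions, complexity):
    """Pick the resolution from the pre-defined set minimizing the penalty."""
    return min(resolutions, key=lambda r: predicted_penalty(bitrate, r, complexity))
```

The point of the design is that the decision uses only the precomputed low-complexity features, so no trial encodings are needed per segment.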

21 citations




Proceedings ArticleDOI
14 Jun 2022
TL;DR: This paper provides an overview of the open Video Complexity Dataset (VCD) which comprises 500 Ultra High Definition (UHD) resolution test video sequences, characterized by spatial and temporal complexities, rate-distortion complexity, and encoding complexity with the x264 AVC/H.264 and x265 HEVC/ H.265 video encoders.
Abstract: This paper provides an overview of the open Video Complexity Dataset (VCD) which comprises 500 Ultra High Definition (UHD) resolution test video sequences. These sequences are provided at 24 frames per second (fps) and stored online in losslessly encoded 8-bit 4:2:0 format. In this paper, all sequences are characterized by spatial and temporal complexities, rate-distortion complexity, and encoding complexity with the x264 AVC/H.264 and x265 HEVC/H.265 video encoders. The dataset is tailor-made for cutting-edge multimedia applications such as video streaming, two-pass encoding, per-title encoding, scene-cut detection, etc. Evaluations show that the dataset includes diversity in video complexities. Hence, using this dataset is recommended for training and testing video coding applications. All data have been made publicly available as part of the dataset, which can be used for various applications. Online documentation: https://vcd.itec.aau.at. Dataset URL: https://ftp.itec.aau.at/datasets/video-complexity/.

16 citations


Proceedings ArticleDOI
01 Mar 2022
TL;DR: A low-latency pre-processing algorithm named COntent-aware frame Dropping Algorithm (CODA) is proposed to predict the optimized framerate per video segment in streaming scenarios, saving encoding time and improving visual quality at lower bitrates.
Abstract: Ultra High Definition Television (UHDTV) offers a better immersive audiovisual experience than HDTV by improving the aesthetic sense of the content [1]. However, it may lead to an increase of both encoding time complexity and compression artifacts at lower bitrates. To address this challenge, a low-latency pre-processing algorithm named COntent-aware frame Dropping Algorithm (CODA) is proposed to predict the optimized framerate per video segment in streaming scenarios. The optimized framerate $(\hat{f})$ for every video segment at each target bitrate is modelled as an exponential decay (increasing) function whose decay rate is directly proportional to the temporal characteristics $(h)$ [2] [3] of the video and the target bitrate $(b)$, and inversely proportional to the spatial characteristics $(E)$ of the video. The encoding is carried out with the predicted framerate, saving encoding time and improving visual quality at lower bitrates. At the decoder side, the video is upscaled in the temporal domain to the original framerate $(f_{max})$ for display.
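One plausible reading of that model, with `c` as an illustrative tuning constant (the paper's fitted parameters are not reproduced here):

```python
import math

def predicted_framerate(b, h, E, f_max=120.0, c=1.0):
    """CODA-style framerate model (sketch): f_hat approaches f_max faster
    when temporal activity h and target bitrate b are high, and slower
    for spatially complex content (large E). c is an assumed constant."""
    decay_rate = c * h * b / E
    return f_max * (1.0 - math.exp(-decay_rate))
```

This matches the stated proportionalities: the predicted framerate grows monotonically with `h` and `b`, shrinks with `E`, and saturates at `f_max`.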

14 citations




Proceedings ArticleDOI
01 Mar 2022
TL;DR: A hybRId P2P-CDN arcHiTecture for low LatEncy live video stReaming (RICHTER) is introduced, the details of its design are discussed, and directions for future work are given.
Abstract: Content Distribution Networks (CDN) and HTTP Adaptive Streaming (HAS) are considered the principal video delivery technologies over the Internet. Despite the wide usage of these technologies, designing cost-effective, scalable, and flexible architectures that support low latency and high quality live video streaming is still a challenge. To address this issue, we leverage existing works that have combined the characteristics of Peer-to-Peer (P2P) networks and CDN-based systems and introduce a hybrid CDN-P2P live streaming architecture. When dealing with the technical complexity of managing hundreds or thousands of concurrent streams, such hybrid systems can provide low latency and high quality streams by enabling the delivery architecture to switch between the CDN and the P2P modes. However, modern networking paradigms such as Edge Computing, Network Function Virtualization (NFV), and distributed video transcoding have not been extensively employed to design hybrid P2P-CDN streaming systems. To bridge the aforementioned gaps, we introduce a hybRId P2P-CDN arcHiTecture for low LatEncy live video stReaming (RICHTER), discuss the details of its design, and finally outline directions for future work.

9 citations


Journal ArticleDOI
TL;DR: A novel fast approach to optimize Amazon EC2 spot instances and minimize video encoding costs is proposed; the results show that this approach can reduce the encoding costs by at least 15.8% and up to 47.8% when compared to a random selection of EC2 spot instances.
Abstract: HTTP Adaptive Streaming (HAS) of video content is becoming an integral part of the Internet and accounts for most of today’s network traffic. Video compression technology plays a vital role in efficiently utilizing network channels, but encoding videos into multiple representations with selected encoding parameters is a significant challenge. Video encoding is a computationally intensive and time-consuming operation that requires high-performance resources provided by on-premise infrastructures or public clouds. In turn, public clouds, such as Amazon Elastic Compute Cloud (EC2), provide hundreds of computing instances optimized for different purposes and clients’ budgets. Thus, there is a need for algorithms and methods for optimized computing instance selection for specific tasks such as video encoding and transcoding operations. Additionally, the encoding speed directly depends on the selected encoding parameters and the complexity characteristics of video content. In this paper, we first benchmarked the video encoding performance of Amazon EC2 spot instances using multiple x264 codec encoding parameters and video sequences of varying complexity. Then, we proposed a novel fast approach to optimize Amazon EC2 spot instances and minimize video encoding costs. Furthermore, we evaluated how the optimized selection of EC2 spot instances can affect the encoding cost. The results show that our approach, on average, can reduce the encoding costs by at least 15.8% and up to 47.8% when compared to a random selection of EC2 spot instances.
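The selection principle reduces to minimizing price × benchmarked encoding time per instance. A sketch with hypothetical numbers (the instance names are real EC2 types, but all prices and timings below are made up; the paper's approach additionally accounts for encoding parameters and content complexity):

```python
def encoding_cost(spot_price_per_hour, encode_hours):
    """Cost of one encoding job on one instance."""
    return spot_price_per_hour * encode_hours

def cheapest_instance(benchmarks):
    """benchmarks: dict instance_name -> (spot_price_per_hour,
    benchmarked encode_hours for this content/parameter set)."""
    return min(benchmarks, key=lambda name: encoding_cost(*benchmarks[name]))
```

Note that the cheapest hourly rate is not necessarily the cheapest job: a faster, pricier instance can win once the benchmarked encoding time is factored in.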

8 citations


Proceedings ArticleDOI
18 Jul 2022
TL;DR: A perceptually-aware per-title encoding (PPTE) scheme for video streaming applications where optimized bitrate-resolution pairs are predicted online based on Just Noticeable Difference in quality perception to avoid adding perceptually similar representations in the bitrate ladder.
Abstract: In live streaming applications, a fixed set of bitrate-resolution pairs (known as bitrate ladder) is used for simplicity and efficiency to avoid the additional encoding run-time required to find optimum resolution-bitrate pairs for every video content. However, an optimized bitrate ladder may result in (i) decreased storage or delivery costs and/or (ii) increased Quality of Experience (QoE). This paper introduces a perceptually-aware per-title encoding (PPTE) scheme for video streaming applications. In this scheme, optimized bitrate-resolution pairs are predicted online based on Just Noticeable Difference (JND) in quality perception to avoid adding perceptually similar representations in the bitrate ladder. To this end, Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features for each video segment are used. Experimental results show that, on average, PPTE yields bitrate savings of 16.47% and 27.02% to maintain the same PSNR and VMAF, respectively, compared to the reference HTTP Live Streaming (HLS) bitrate ladder without any noticeable additional latency in streaming, accompanied by a 30.69% cumulative decrease in storage space for various representations.
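The JND-based pruning idea can be sketched as follows (the fixed 6-point VMAF step is an illustrative assumption; PPTE predicts the pairs online from DCT-energy features rather than thresholding a pre-built ladder):

```python
def prune_ladder(representations, jnd=6.0):
    """Drop perceptually redundant representations: keep a higher-bitrate
    entry only if its predicted quality exceeds the last kept one by at
    least one JND. representations: list of (bitrate_kbps, predicted_vmaf),
    sorted by ascending bitrate."""
    kept = [representations[0]]
    for rep in representations[1:]:
        if rep[1] - kept[-1][1] >= jnd:
            kept.append(rep)
    return kept
```

Every removed entry is, by construction, within one JND of a cheaper representation, which is where the storage and delivery savings come from.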

7 citations


Proceedings ArticleDOI
14 Jun 2022
TL;DR: This paper provides multiple test assets in the form of a dataset that facilitates the research and development of the stated technologies: Dynamic Adaptive Streaming over HTTP (MPEG-DASH) packaged multimedia assets, encoded with Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), AOMedia Video 1 (AV1), and Versatile Video Coding (VVC).
Abstract: Many applications and online services produce and deliver multimedia traffic over the Internet. Among them are video streaming services with rapidly growing resource demands to provide better quality, such as Ultra High Definition (UHD) 8K content. The HTTP Adaptive Streaming (HAS) technique defines standard baselines for audio-visual content streaming to balance the delivered media quality and minimize defects in streaming sessions. On the other hand, video codec development and standardization drive the progress of such services by introducing efficient algorithms and technologies. Versatile Video Coding (VVC) is one of the latest advancements in video encoding technology that is still not fully optimized and not supported on all available platforms. Such optimization of video codecs and support for more platforms require years of research and development. This paper provides multiple test assets in the form of a dataset that facilitates the research and development of the stated technologies. Our open-source dataset comprises Dynamic Adaptive Streaming over HTTP (MPEG-DASH) packaged multimedia assets, encoded with Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), AOMedia Video 1 (AV1), and VVC. We provide our dataset with resolutions of up to 7680x4320 (8K). Our dataset has a maximum media duration of 322 seconds, and we offer our MPEG-DASH packaged content with two segment lengths, 4 and 8 seconds.

7 citations


Journal ArticleDOI
16 May 2022
TL;DR: A coLlaborative Edge- and SDN-Assisted framework for HTTP aDaptive vidEo stReaming (LEADER), where the SDN controller collects various information items and runs a central optimization model that minimizes the HAS clients’ serving time, subject to the network’s and edge servers’ resource constraints.
Abstract: With the emerging demands of high-definition and low-latency video streams, HTTP Adaptive Streaming (HAS) is considered the principal video delivery technology over the Internet. Network-assisted video streaming schemes, which employ modern networking paradigms, e.g., Software-Defined Networking (SDN), Network Function Virtualization (NFV), and edge computing, have been introduced as promising complementary solutions in the HAS context to improve users’ Quality of Experience (QoE) as well as network utilization. However, the existing network-assisted HAS schemes have not fully used edge collaboration techniques and SDN capabilities for achieving the aforementioned aims. To bridge this gap, this paper introduces a coLlaborative Edge- and SDN-Assisted framework for HTTP aDaptive vidEo stReaming (LEADER). In LEADER, the SDN controller collects various information items and runs a central optimization model that minimizes the HAS clients’ serving time, subject to the network’s and edge servers’ resource constraints. Due to the NP-completeness and impractical overheads of the central optimization model, we propose an online distributed lightweight heuristic approach consisting of two phases that run on the SDN controller and edge servers, respectively. We implement the proposed framework, conduct our experiments on a large-scale testbed including 250 HAS players, and compare its effectiveness with other strategies. The experimental results demonstrate that LEADER outperforms baseline schemes in terms of both users’ QoE and network utilization, by at least 22% and 13%, respectively.
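In the spirit of the serving-time minimization, a greedy capacity-constrained assignment can serve as a sketch (this is not LEADER's actual two-phase heuristic; all names and structures below are hypothetical):

```python
def assign_clients(clients, edges):
    """Greedy sketch: each client goes to the fastest edge server that
    still has enough remaining capacity.
    clients: list of (client_id, demand);
    edges: dict server_name -> {"capacity": ..., "serve_time": ...}."""
    remaining = {name: spec["capacity"] for name, spec in edges.items()}
    plan = {}
    for client_id, demand in clients:
        feasible = [n for n in edges if remaining[n] >= demand]
        best = min(feasible, key=lambda n: edges[n]["serve_time"])
        plan[client_id] = best
        remaining[best] -= demand
    return plan
```

A central solver would optimize all assignments jointly; the greedy pass trades optimality for the low overhead the paper's distributed heuristic is after.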


Proceedings ArticleDOI
01 Mar 2022
TL;DR: This work presents the new features of H3 and QUIC, compares them to those of HTTP/1.1, HTTP/2, and TCP, and shares the latest research findings in this domain.
Abstract: With the introduction of HTTP/3 (H3) and QUIC at its core, there is an expectation of significant improvements in Web-based secure object delivery. As HTTP is a central protocol to the current adaptive streaming methods in all major streaming services, an important question is what H3 will bring to the table for such services. To answer this question, we present the new features of H3 and QUIC, and compare them to those of HTTP/1.1, HTTP/2, and TCP. We also share the latest research findings in this domain.

Proceedings ArticleDOI
01 Mar 2022
TL;DR: Experimental results show that the proposed SR-ABR Net can improve the video quality compared to traditional SR approaches while running in real time, and that the proposed WISH-SR can significantly boost the visual quality of the delivered content while reducing both bandwidth consumption and the number of stalling events.
Abstract: The advancement of hardware capabilities in recent years made it possible to apply deep neural network (DNN) based approaches on mobile devices. This paper introduces a lightweight super-resolution (SR) network, namely SR-ABR Net, deployed at mobile devices to upgrade low-resolution/low-quality videos and a novel adaptive bitrate (ABR) algorithm, namely WISH-SR, that leverages SR networks at the client to improve the video quality depending on the client's context. WISH-SR takes into account mobile device properties, video characteristics, and user preferences. Experimental results show that the proposed SR-ABR Net can improve the video quality compared to traditional SR approaches while running in real time. Moreover, the proposed WISH-SR can significantly boost the visual quality of the delivered content while reducing both bandwidth consumption and the number of stalling events.
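A rough sketch of the client-side decision such an SR-aware ABR embodies (illustrative only: `sr_gain` as an effective-bitrate multiplier and the per-frame real-time check are assumptions, not WISH-SR's actual model):

```python
def choose_bitrate(bandwidth_kbps, ladder, sr_gain, sr_ms_per_frame,
                   frame_budget_ms=33.3):
    """If the device can run the SR network in real time (inference within
    the frame budget), download a lower bitrate whose SR-enhanced quality
    matches the plain throughput-based choice; otherwise fall back."""
    affordable = [b for b in ladder if b <= bandwidth_kbps] or [min(ladder)]
    base = max(affordable)                     # throughput-only decision
    if sr_ms_per_frame <= frame_budget_ms:     # SR is real-time on this device
        candidates = [b for b in affordable if b * sr_gain >= base]
        if candidates:
            return min(candidates)             # save bandwidth, let SR recover quality
    return base
```

This captures the paper's core point: the device context (SR speed) changes which bitrate is worth downloading.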

Proceedings ArticleDOI
04 Dec 2022
TL;DR: In this article, a multi-layer P2P-CDN architecture for live video streaming is proposed, and an online learning approach is presented to solve the time complexity issue of the optimization model.
Abstract: Designing a cost-effective, scalable, and flexible architecture that supports low latency and high quality live video streaming is still a challenge for Over-The-Top (OTT) service providers. To cope with this issue, this paper leverages Peer-to-Peer (P2P), Content Delivery Network (CDN), edge computing, Network Function Virtualization (NFV), and distributed video transcoding paradigms to introduce a hybRId P2P-CDN arcHiTecture for livE video stReaming (RICHTER). We first introduce RICHTER's multi-layer architecture and design an action tree that considers all feasible resources provided by peers, edge, and CDN servers for serving peer requests with minimum latency and maximum quality. We then formulate the problem as an optimization model executed at the edge of the network. We present an Online Learning (OL) approach that leverages an unsupervised Self-Organizing Map (SOM) to (i) alleviate the time complexity issue of the optimization model and (ii) make it a suitable solution for large-scale scenarios, by enabling decisions for groups of requests instead of for single requests. Finally, we implement the RICHTER framework, conduct our experiments on a large-scale cloud-based testbed including 350 HAS players, and compare its effectiveness with baseline systems. The experimental results illustrate that RICHTER outperforms baseline schemes in terms of users' Quality of Experience (QoE), latency, and network utilization, by at least 59%, 39%, and 70%, respectively.
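The request-grouping idea can be sketched with minimal online competitive learning (a full SOM also pulls the winner's grid neighbors toward each sample, which is omitted here; this is not RICHTER's implementation):

```python
import numpy as np

def group_requests(features, n_groups=2, lr=0.5, epochs=30, seed=0):
    """Assign each request to a learned prototype so serving decisions can
    be made per group instead of per request.
    features: (N, d) array of request features (e.g., bitrate, deadline)."""
    rng = np.random.default_rng(seed)
    # initialize prototypes from the data itself
    protos = features[rng.choice(len(features), n_groups, replace=False)].astype(float)
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # decaying learning rate
        for x in features:
            winner = np.argmin(((protos - x) ** 2).sum(axis=1))
            protos[winner] += rate * (x - protos[winner])  # pull winner toward x
    return np.array([int(np.argmin(((protos - x) ** 2).sum(axis=1)))
                     for x in features])
```

Grouping turns one optimization run per request into one per group, which is exactly the complexity reduction the OL approach targets.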

Journal ArticleDOI
TL;DR: In this paper, the authors propose DeepStream, a scalable per-title encoding approach that supports both CPU-only and GPU-available end-users by augmenting the bitrate ladder with lightweight video super-resolution networks.
Abstract: In HTTP Adaptive Streaming (HAS), each video is divided into smaller segments, and each segment is encoded at multiple pre-defined bitrates to construct a bitrate ladder. To optimize bitrate ladders, per-title encoding approaches encode each segment at various bitrates and resolutions to determine the convex hull. From the convex hull, an optimized bitrate ladder is constructed, resulting in an increased Quality of Experience (QoE) for end-users. With the ever-increasing efficiency of deep learning-based video enhancement approaches, they are more and more employed at the client-side to increase the QoE, specifically when GPU capabilities are available. Therefore, scalable approaches are needed to support end-user devices with both CPU and GPU capabilities (denoted as CPU-only and GPU-available end-users, respectively) as a new dimension of a bitrate ladder. To address this need, we propose DeepStream, a scalable content-aware per-title encoding approach to support both CPU-only and GPU-available end-users. (i) To support backward compatibility, DeepStream constructs a bitrate ladder based on any existing per-title encoding approach. Therefore, the video content will be provided for legacy end-user devices with CPU-only capabilities as a base layer (BL). (ii) For high-end end-user devices with GPU capabilities, an enhancement layer (EL) is added on top of the base layer comprising lightweight video super-resolution deep neural networks (DNNs) for each bitrate-resolution pair of the bitrate ladder. A content-aware video super-resolution approach leads to higher video quality, however, at the cost of bitrate overhead. To reduce the bitrate overhead for streaming content-aware video super-resolution DNNs, DeepCABAC, context-adaptive binary arithmetic coding for DNN compression, is used. Furthermore, the similarity among (i) segments within a scene and (ii) frames within a segment is used to reduce the training costs of DNNs.
Experimental results show bitrate savings of 34% and 36% to maintain the same PSNR and VMAF, respectively, for GPU-available end-users, while the CPU-only users get the desired video content as usual.

Book ChapterDOI
12 Jan 2022
TL;DR: MoViDNN as discussed by the authors is an open-source mobile platform to evaluate DNN-based video quality enhancement methods, such as super-resolution, denoising, and deblocking.
Abstract: Deep neural network (DNN) based approaches have been intensively studied to improve video quality thanks to their fast advancement in recent years. These approaches are designed mainly for desktop devices due to their high computational cost. However, with the increasing performance of mobile devices in recent years, it became possible to execute DNN based approaches on mobile devices. Despite having the required computational power, utilizing DNNs to improve the video quality for mobile devices is still an active research area. In this paper, we propose an open-source mobile platform, namely MoViDNN, to evaluate DNN based video quality enhancement methods, such as super-resolution, denoising, and deblocking. Our proposed platform can be used to evaluate the DNN based approaches both objectively and subjectively. For objective evaluation, we report common metrics such as execution time, PSNR, and SSIM. For subjective evaluation, the Mean Opinion Score (MOS) is reported. The proposed platform is available publicly at https://github.com/cd-athena/MoViDNN

TL;DR: In this article, a fast and efficient per-title encoding scheme (Live-PSTR) is proposed, tailor-made for live Ultra High Definition (UHD) High Framerate (HFR) streaming. It includes a pre-processing step in which Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features are used to determine the complexity of each video segment, based on which the optimized encoding resolution and framerate for streaming at every target bitrate are determined.
Abstract: Current per-title encoding schemes encode the same video content at various bitrates and spatial resolutions to find optimal bitrate-resolution pairs (known as bitrate ladder) for each content in Video on Demand (VoD) applications. However, in live streaming applications, a fixed bitrate ladder is used for simplicity to avoid the additional latency of finding optimized bitrate-resolution pairs for every video content. Yet, an optimized bitrate ladder may result in (i) decreased storage or network resources and/or (ii) increased Quality of Experience (QoE). In this paper, a fast and efficient per-title encoding scheme (Live-PSTR) is proposed, tailor-made for live Ultra High Definition (UHD) High Framerate (HFR) streaming. It includes a pre-processing step in which Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features are used to determine the complexity of each video segment, based on which the optimized encoding resolution and framerate for streaming at every target bitrate are determined. Experimental results show that, on average, Live-PSTR yields bitrate savings of 9.46% and 11.99% to maintain the same PSNR and VMAF scores, respectively, compared to the HTTP Live Streaming (HLS) bitrate ladder.

Journal ArticleDOI
10 Oct 2022
TL;DR: In this paper, the authors propose different types of segment prefetching policies and study their costs and benefits, including segment prefetching based on past segment requests, transrating, a Markov prediction model, and machine learning.
Abstract: Segment prefetching is a technique that transmits the next video segments in advance closer to the user to serve content with reduced latency. Due to its location and capabilities, an edge computing node is an ideal component for executing segment prefetching policies and storing/caching the prefetched segments. In this work, we study segment prefetching techniques deployed at the edge computing node for adaptive video streaming. We propose different types of segment prefetching policies and study their costs and benefits, including segment prefetching based on past segment requests, transrating, a Markov prediction model, and machine learning. Besides, we analyze and discuss which segment prefetching policy is better under which circumstances, and the influence of the ABR algorithm and the bitrate ladder on segment prefetching.
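The Markov-prediction policy from the list above can be sketched as a first-order transition table over past requests (a minimal illustration, not the paper's exact model):

```python
from collections import defaultdict

def build_transition_counts(history):
    """Count which quality level tends to follow which in past requests."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(history, history[1:]):
        counts[prev][nxt] += 1
    return counts

def prefetch_prediction(counts, current):
    """Prefetch the quality most often requested after the current one;
    with no history for this state, prefetch the same quality again."""
    followers = counts.get(current)
    if not followers:
        return current
    return max(followers, key=followers.get)
```

The edge node would then fetch (or transrate to) the predicted quality before the client's next request arrives, hiding one round trip.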

Proceedings ArticleDOI
05 Sep 2022
TL;DR: This paper investigates JND for VMAF to facilitate the efficient construction of content-specific bitrate ladders, invalidates existing step size rules of thumb by revealing the large errors they introduce when applied to real-world video content and investigates different factors that influence the actual ΔVMAF step sizes required to obtain correct JNDs.
Abstract: We currently witness the rapidly growing importance of intelligent video streaming quality optimization and reduction of video delivery costs. Per-title encoding, in contrast to a fixed bitrate ladder, shows significant promise to deliver higher quality video streams by addressing the trade-off between compression efficiency and video characteristics such as resolution and frame rate. Selecting encodings with noticeable quality differences in between prevents the construction of an inefficient bitrate ladder that suffers from too-similar quality representations. In this respect, the VMAF metric represents a promising foundation for bitrate laddering, as it currently yields the highest video quality prediction performance. However, the minimum noticeable quality difference, referred to as the just-noticeable difference (JND), has not been properly validated for VMAF yet, with existing sources proposing highly diverse ΔVMAF step sizes ranging from two [1] to six [2]. This paper investigates JND for VMAF to facilitate the efficient construction of content-specific bitrate ladders. Using a publicly available JND multimedia dataset, we invalidate existing step-size rules of thumb by revealing the large errors they introduce when applied to real-world video content. We investigate different factors that influence the actual ΔVMAF step sizes required to obtain correct JNDs and develop a model using content features that more accurately determines the optimal quality step sizes required for properly laddering individual clips.

Journal ArticleDOI
TL;DR: In this paper, a cost-aware adaptive video streaming approach for the Internet of Vehicles (IoV) to deliver video segments requested by mobile users at specified qualities and deadlines is presented.

Journal ArticleDOI
TL;DR: Wang et al. propose a novel light field image compression method that enables viewport scalability, quality scalability, and spatial scalability while keeping compression efficiency high, and that can adapt to the display type, transmission channel, network condition, processing power, and user needs.
Abstract: Light field imaging, which captures both spatial and angular information, improves user immersion by enabling post-capture actions, such as refocusing and changing view perspective. However, light fields represent very large volumes of data with a lot of redundancy that coding methods try to remove. State-of-the-art coding methods indeed usually focus on improving compression efficiency and overlook other important features in light field compression such as scalability. In this paper, we propose a novel light field image compression method that enables (i) viewport scalability, (ii) quality scalability, (iii) spatial scalability, (iv) random access, and (v) uniform quality distribution among viewports, while keeping compression efficiency high. To this end, light fields in each spatial resolution are divided into sequential viewport layers, and viewports in each layer are encoded using the previously encoded viewports. In each viewport layer, the available viewports are used to synthesize intermediate viewports using a video interpolation deep learning network. The synthesized views are used as virtual reference images to enhance the quality of intermediate views. An image super-resolution method is applied to improve the quality of the lower spatial resolution layer. The super-resolved images are also used as virtual reference images to improve the quality of the higher spatial resolution layer. The proposed structure also improves the flexibility of light field streaming, provides random access to the viewports, and increases error resiliency. The experimental results demonstrate that the proposed method achieves a high compression efficiency and it can adapt to the display type, transmission channel, network condition, processing power, and user needs.

Book ChapterDOI
12 Jan 2022
TL;DR: This paper presents ECAS-ML, Edge Assisted Adaptation Scheme for HTTP Adaptive Streaming with Machine Learning, which focuses on managing the trade-off among bitrate, segment switches, and stalls to achieve a higher quality of experience (QoE).
Abstract: As the video streaming traffic in mobile networks is increasing, improving the content delivery process becomes crucial, e.g., by utilizing edge computing support. At an edge node, we can deploy adaptive bitrate (ABR) algorithms with a better understanding of network behavior and access to radio and player metrics. In this work, we present ECAS-ML, Edge Assisted Adaptation Scheme for HTTP Adaptive Streaming with Machine Learning. ECAS-ML focuses on managing the tradeoff among bitrate, segment switches, and stalls to achieve a higher quality of experience (QoE). For that purpose, we use machine learning techniques to analyze radio throughput traces and predict the best parameters of our algorithm to achieve better performance. The results show that ECAS-ML outperforms other client-based and edge-based ABR algorithms.

Journal ArticleDOI
TL;DR: Days of Future Past+ (DoFP+) is introduced, a heuristic algorithm that takes advantage of the features of the latest HTTP version, HTTP/3, to provide a high Quality of Experience (QoE) to viewers, and different download-order strategies for buffered segments are examined to optimize the QoE in limited-resource scenarios.
Abstract: HTTP Adaptive Streaming (HAS) solutions use various adaptive bitrate (ABR) algorithms to select suitable video qualities with the objective of coping with the variations of network connections. HTTP has been evolving with various versions and provides more and more features. Most of the existing ABR algorithms do not significantly benefit from the HTTP development when they are merely supported by the most recent HTTP version. An open research question is “How can new features of the recent HTTP versions be used to enhance the performance of HAS?” To address this question, in this paper, we introduce Days of Future Past+ (DoFP+ for short), a heuristic algorithm that takes advantage of the features of the latest HTTP version, HTTP/3, to provide a high Quality of Experience (QoE) to viewers. DoFP+ leverages HTTP/3 features, including (i) stream multiplexing, (ii) stream priority, and (iii) request cancellation, to upgrade low-quality segments in the player buffer while downloading the next segment. The qualities of those segments are selected based on an objective function and throughput constraints. The objective function takes into account two factors, namely (i) the average bitrate and (ii) the video instability of the considered set of segments. We also examine different download-order strategies for those segments to optimize the QoE in limited-resource scenarios. The experimental results show an improvement in QoE by up to 33%, while the number of stalls and the stall duration for DoFP+ are reduced by 86% and 92%, respectively, compared to state-of-the-art ABR schemes. In addition, DoFP+ saves, on average, up to 16% of downloaded data across all test videos. Also, we find that downloading segments sequentially brings more benefits for retransmissions than concurrent downloads, and that lower-quality segments should be upgraded before others to gain more QoE improvement.
Our source code has been published for reproducibility at https://github.com/cd-athena/DoFP-Plus.

Journal ArticleDOI
18 Jul 2022
TL;DR: Experimental results show that, on average, OPSE yields bitrate savings of up to 48.88% in certain scenes to maintain the same VMAF, compared to the reference HTTP Live Streaming (HLS) bitrate ladder, without any noticeable additional latency in streaming.
Abstract: In live streaming applications, typically a fixed set of bitrate-resolution pairs (known as a bitrate ladder) is used during the entire streaming session in order to avoid the additional latency of finding scene transitions and optimized bitrate-resolution pairs for every video content. However, an optimized bitrate ladder per scene may result in (i) decreased storage or delivery costs and/or (ii) increased Quality of Experience (QoE). This paper introduces an Online Per-Scene Encoding (OPSE) scheme for adaptive HTTP live streaming applications. In this scheme, scene transitions and optimized bitrate-resolution pairs for every scene are predicted using Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features. Experimental results show that, on average, OPSE yields bitrate savings of up to 48.88% in certain scenes to maintain the same VMAF, compared to the reference HTTP Live Streaming (HLS) bitrate ladder, without any noticeable additional latency in streaming.
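A simplified version of such DCT-energy features can be sketched as follows. This is not the exact formulation used in VCA/OPSE (which applies an exponential weighting to the DCT coefficients): here the spatial feature is simply the mean sum of absolute AC coefficients per block, and the temporal feature is the mean absolute difference of consecutive per-frame energies.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II transform matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)


def block_texture_energy(frame: np.ndarray, w: int = 32) -> float:
    """Average DCT-based texture energy: sum of absolute AC coefficients
    per w x w block, averaged over the frame (the DC coefficient is
    excluded, since it carries brightness rather than texture)."""
    d = dct_matrix(w)
    h, wd = (frame.shape[0] // w) * w, (frame.shape[1] // w) * w
    energies = []
    for y in range(0, h, w):
        for x in range(0, wd, w):
            c = d @ frame[y:y + w, x:x + w] @ d.T  # 2-D DCT of the block
            energies.append(np.abs(c).sum() - abs(c[0, 0]))
    return float(np.mean(energies))


def video_features(frames, w: int = 32):
    """Spatial feature = mean per-frame texture energy; temporal feature =
    mean absolute difference between consecutive per-frame energies."""
    e = [block_texture_energy(f.astype(np.float64), w) for f in frames]
    spatial = float(np.mean(e))
    temporal = float(np.mean(np.abs(np.diff(e)))) if len(e) > 1 else 0.0
    return spatial, temporal
```

Both features cost one block-wise DCT per frame, which is why features of this kind are cheap enough for online, per-segment decisions in live streaming.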

Journal ArticleDOI
TL;DR: This paper proposes a frame-wise scheme based on a convolutional neural network for the detection and localization of video transcoding from AVC to HEVC (AVC-HEVC), in which the partition and location information of prediction units (PUs) is used to generate frame-level PU maps that make full use of the local artifacts of PUs.
Abstract: In general, manipulated videos will eventually undergo recompression. Video transcoding occurs when the standard of recompression differs from the prior standard. Therefore, as a special sign of recompression, video transcoding can also be considered evidence of forgery in video forensics. In this paper, we focus on the detection and localization of video transcoding from AVC to HEVC (AVC-HEVC). There are two probable cases of AVC-HEVC transcoding: whole-video transcoding and partial-frame transcoding. However, the existing forensic methods only consider the detection of whole-video transcoding and do not consider partial-frame transcoding localization. In view of this, we propose a frame-wise scheme based on a convolutional neural network. First, we show that the essential difference between AVC-HEVC transcoded and directly encoded HEVC video is reflected in the high-frequency components of the decoded frames. Then, the partition and location information of prediction units (PUs) is introduced to generate frame-level PU maps that make full use of the local artifacts of PUs. Finally, taking the decoded frames and PU maps as inputs, a dual-path network including specific convolutional modules and an adaptive fusion module is proposed. Through it, the artifacts on a single frame can be better extracted, and the transcoded frames can be detected and localized. Coupled with a simple voting strategy, the results of whole-video transcoding detection can be easily obtained. Extensive experiments are conducted to verify the performance. The results show that the proposed scheme outperforms or rivals the state-of-the-art methods in AVC-HEVC transcoding detection and localization.
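The voting step that turns frame-level decisions into a whole-video decision can be illustrated with a minimal sketch; the threshold, vote ratio, and decision rule below are generic assumptions, not the paper's exact strategy.

```python
def whole_video_decision(frame_scores, threshold=0.5, vote_ratio=0.5):
    """Generic voting sketch: a frame is flagged as transcoded if its
    network score exceeds `threshold`; the whole video is declared
    transcoded if more than `vote_ratio` of the frames are flagged.
    Returns (video_is_transcoded, per-frame flags for localization)."""
    flags = [score > threshold for score in frame_scores]
    return sum(flags) / len(flags) > vote_ratio, flags
```

The per-frame flags directly give partial-frame localization, while the aggregate vote gives the whole-video detection result.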

Proceedings ArticleDOI
01 Mar 2022
TL;DR: Experimental results show that the proposed scheme yields significant bitrate savings while maintaining the same quality, compared to the HLS fixed bitrate ladder scheme without any noticeable additional latency in streaming.
Abstract: In live streaming applications, service providers generally use a bitrate ladder with fixed bitrate-resolution pairs instead of optimizing it per title, to avoid the additional latency incurred in finding optimal bitrate-resolution pairs for every video content. This paper introduces an online bitrate ladder construction scheme for live video streaming applications. In this scheme, each target bitrate's optimized resolution is determined from any pre-defined set of resolutions using Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features for each video segment. Experimental results show that, on average, the proposed scheme yields significant bitrate savings while maintaining the same quality, compared to the HLS fixed bitrate ladder scheme, without any noticeable additional latency in streaming.

Journal ArticleDOI
TL;DR: In this article, a cost- and delay-aware light-weight transcoding approach at the edge is proposed, in which the optimal search results are stored as metadata for each bitrate of a video segment and reused at the edge servers to reduce the time and computational resources required for transcoding.
Abstract: The edge computing paradigm brings cloud capabilities close to the clients. Leveraging the edge’s capabilities can improve video streaming services by employing the storage capacity and processing power at the edge for caching and transcoding tasks, respectively, resulting in video streaming services with higher quality and lower latency. In this paper, we propose a Cost- and Delay-aware Light-weight Transcoding approach at the Edge, in the context of HTTP Adaptive Streaming (HAS). The encoding of a video segment requires computationally intensive search processes. The main idea of the proposed approach is to store the optimal search results as metadata for each bitrate of a video segment and reuse them at the edge servers to reduce the time and computational resources required for transcoding. Aiming at minimizing the cost and delay of Video-on-Demand (VoD) services, we formulate the problem of selecting an optimal policy for serving segment requests at the edge server, including (i) storing at the edge server, (ii) transcoding from a higher bitrate at the edge server, and (iii) fetching from the origin or a CDN server, as a Binary Linear Programming (BLP) model. As a result, the proposed approach stores the popular video segments at the edge and serves the unpopular ones by transcoding using metadata or by fetching from the origin/CDN server. In this way, in addition to a significant reduction in bandwidth and storage costs, the transcoding time of a requested segment is remarkably decreased by utilizing its corresponding metadata. Moreover, we prove that the proposed BLP model is NP-hard and propose two heuristic algorithms to mitigate its time complexity. We investigate the performance of the proposed approach in comprehensive scenarios with various video contents, encoding software, encoding settings, and available resources at the edge.
The experimental results show that our approach (i) reduces the transcoding time by up to 97%, (ii) decreases the streaming cost, including storage, computation, and bandwidth costs, by up to 75%, and (iii) reduces the delay by up to 48% compared to state-of-the-art approaches.
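The per-segment policy choice (store vs. transcode vs. fetch) can be illustrated with a greedy sketch. The paper formulates this exactly as a BLP, so the code below is only a simplified relaxation: the cost fields, popularity weight, and single storage budget are hypothetical names for illustration.

```python
def choose_policies(segments, storage_capacity_bytes):
    """Greedy sketch: for each segment representation, compare the cost of
    (i) storing it at the edge, (ii) transcoding it from a higher bitrate
    using precomputed metadata, and (iii) fetching it from the origin/CDN,
    then pick the cheapest feasible option under the edge storage budget.

    segments: list of dicts with (hypothetical) keys
        id, size, popularity, store_cost, transcode_cost, fetch_cost
    Costs are per-request; stored segments consume edge storage.
    """
    def saving_per_byte(s):
        # expected per-byte saving of serving from storage vs. the cheapest
        # alternative, weighted by request popularity
        cheapest_alt = min(s["transcode_cost"], s["fetch_cost"])
        return s["popularity"] * (cheapest_alt - s["store_cost"]) / s["size"]

    decisions, used = {}, 0
    # popular segments with the highest saving per byte are stored first
    for s in sorted(segments, key=saving_per_byte, reverse=True):
        if saving_per_byte(s) > 0 and used + s["size"] <= storage_capacity_bytes:
            decisions[s["id"]] = "store"
            used += s["size"]
        elif s["transcode_cost"] <= s["fetch_cost"]:
            decisions[s["id"]] = "transcode"
        else:
            decisions[s["id"]] = "fetch"
    return decisions
```

The BLP in the paper optimizes all decisions jointly; a greedy pass like this is the kind of heuristic one resorts to when the exact model is too expensive, which is the motivation for the paper's two heuristic algorithms.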

Proceedings ArticleDOI
14 Dec 2022
TL;DR: In this paper, a multilayer and pipeline encoding on the computing continuum (MPEC2) method is proposed to address the key technical challenge of the high cost and computational complexity of video encoding.
Abstract: Video streaming is the dominating traffic in today’s data-sharing world. Media service providers stream video content for their viewers, while worldwide users create and distribute videos using mobile or video system applications that significantly increase the traffic share. We propose a multilayer and pipeline encoding on the computing continuum (MPEC2) method that addresses the key technical challenge of the high cost and computational complexity of video encoding. MPEC2 splits the video encoding into several tasks scheduled on appropriately selected Cloud and Fog computing instance types that satisfy the media service provider and user priorities in terms of time and cost. In the first phase, MPEC2 uses a multilayer resource partitioning method to explore the instance types for encoding a video segment. In the second phase, it distributes the independent segment encoding tasks in a pipeline model on the underlying instances. We evaluate MPEC2 on a federated computing continuum encompassing Amazon Web Services (AWS) EC2 Cloud and Exoscale Fog instances distributed in seven geographical locations. Experimental results show that MPEC2 achieves 24% faster completion time and 60% lower cost for video encoding compared to related resource allocation methods. When compared with baseline methods, MPEC2 yields 40%–50% lower completion time and 5%–60% reduced total cost.
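The pipeline phase can be illustrated with the standard pipelined-schedule recurrence; the stage decomposition and constant per-stage times below are generic assumptions, not MPEC2's actual scheduler.

```python
def pipeline_completion_time(stage_times, n_segments):
    """Completion time of encoding n segments through sequential stages
    (e.g. analyze, encode, package), where stage i of segment s can start
    as soon as both stage i of segment s-1 and stage i-1 of segment s are
    done. For constant per-stage times the closed form is
        sum(stage_times) + (n - 1) * max(stage_times),
    i.e. the bottleneck stage dominates once the pipeline is full."""
    finish = [[0.0] * len(stage_times) for _ in range(n_segments)]
    for s in range(n_segments):
        for i, t in enumerate(stage_times):
            prev_segment = finish[s - 1][i] if s > 0 else 0.0
            prev_stage = finish[s][i - 1] if i > 0 else 0.0
            finish[s][i] = max(prev_segment, prev_stage) + t
    return finish[-1][-1]
```

This is why pipelining independent segment tasks pays off: the total time grows with the slowest stage rather than with the sum of all stages per segment.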

Proceedings ArticleDOI
31 Oct 2022
TL;DR: In this paper, a CMCD-Aware per-Device bitrate LADder construction (CADLAD) is proposed, which leverages the Common Media Client Data (CMCD) standard to tailor the bitrate ladder to each client device.
Abstract: In this paper, we introduce a CMCD-Aware per-Device bitrate LADder construction (CADLAD) that leverages the Common Media Client Data (CMCD) standard to tailor the bitrate ladder to each client device. CADLAD comprises components at both the client and server sides. The client calculates the top bitrate (tb), a CMCD parameter indicating the highest bitrate that can be rendered at the client, and sends it to the server together with its device type and screen resolution. The server delivers a suitable bitrate ladder, whose maximum bitrate and resolution are based on the received CMCD parameters, to the client device with the purpose of providing maximum QoE while minimizing the delivered data. CADLAD has two versions to work in Video on Demand (VoD) and live streaming scenarios. CADLAD is client-agnostic; hence, it can work with any player and ABR algorithm at the client. The experimental results show that CADLAD is able to increase the QoE by 2.6x while saving 71% of delivered data, compared to an existing bitrate ladder of an available video dataset. We implement our idea within CAdViSE, an open-source testbed for reproducibility.
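How a client might attach the tb parameter can be sketched using CMCD's query-argument transmission mode (comma-separated key=value pairs, alphabetically ordered, string values quoted, URL-encoded into a single CMCD query argument). The device-type and screen-resolution keys are not reserved CMCD keys, so they are modeled here as hypothetical custom keys with the hyphenated reverse-DNS prefix that CMCD requires for custom keys.

```python
from typing import Optional
from urllib.parse import quote


def cmcd_query(top_bitrate_kbps: int, custom: Optional[dict] = None) -> str:
    """Serialize CMCD data as a 'CMCD' query argument.

    'tb' (top bitrate, in kbps) is a reserved CMCD key; the other keys in
    `custom` are assumed to be hypothetical custom keys (e.g. device type
    and screen resolution for a CADLAD-style server)."""
    pairs = {"tb": top_bitrate_kbps}
    pairs.update(custom or {})
    # CMCD: keys in alphabetical order; string values quoted, integers bare
    payload = ",".join(
        f'{k}="{v}"' if isinstance(v, str) else f"{k}={v}"
        for k, v in sorted(pairs.items())
    )
    return "CMCD=" + quote(payload)


# Example request URL with hypothetical custom keys:
url = ("https://example.com/video/seg1.m4s?"
       + cmcd_query(6000, {"com.example-dt": "tv",
                           "com.example-sr": "3840x2160"}))
```

Because the data travels as a standard query argument, any server-side component can parse it without player-specific integration, which is what makes a scheme like CADLAD client-agnostic.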