scispace - formally typeset
Search or ask a question
Journal

arXiv: Multimedia 

About: arXiv: Multimedia is an academic journal. The journal publishes majorly in the area(s): Steganography & Digital watermarking. Over the lifetime, 1489 publications have been published receiving 12140 citations.


Papers
More filters
Posted Content
TL;DR: This paper presents the viability of MFCC to extract features and DTW to compare the test patterns and explains why the alignment is important to produce the better performance.
Abstract: — Digital processing of speech signal and voice recognition algorithm is very important for fast and accurate automatic voice recognition technology The voice is a signal of infinite information A direct analysis and synthesizing the complex voice signal is due to too much information contained in the signal Therefore the digital signal processes such as Feature Extraction and Feature Matching are introduced to represent the voice signal Several methods such as Liner Predictive Predictive Coding (LPC), Hidden Markov Model (HMM), Artificial Neural Network (ANN) and etc are evaluated with a view to identify a straight forward and effective method for voice signal The extraction and matching process is implemented right after the Pre Processing or filtering signal is performed The non-parametric method for modelling the human auditory perception system, Mel Frequency Cepstral Coefficients (MFCCs) are utilize as extraction techniques The non linear sequence alignment known as Dynamic Time Warping (DTW) introduced by Sakoe Chiba has been used as features matching techniques Since it’s obvious that the voice signal tends to have different temporal rate, the alignment is important to produce the better performanceThis paper present the viability of MFCC to extract features and DTW to compare the test patterns

846 citations

Journal ArticleDOI
TL;DR: The Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M) as mentioned in this paper is a collection of 100 million media objects, of which approximately 99.2 million are photos and 0.8 million are videos, all of which carry a Creative Commons license.
Abstract: We present the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M), the largest public multimedia collection that has ever been released. The dataset contains a total of 100 million media objects, of which approximately 99.2 million are photos and 0.8 million are videos, all of which carry a Creative Commons license. Each media object in the dataset is represented by several pieces of metadata, e.g. Flickr identifier, owner name, camera, title, tags, geo, media source. The collection provides a comprehensive snapshot of how photos and videos were taken, described, and shared over the years, from the inception of Flickr in 2004 until early 2014. In this article we explain the rationale behind its creation, as well as the implications the dataset has for science, research, engineering, and development. We further present several new challenges in multimedia research that can now be expanded upon with our dataset.

401 citations

Proceedings ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a three-layer metaverse architecture from a macro perspective, containing infrastructure, interaction, and ecosystem, which journey toward both a historical and novel metaverse with a detailed timeline and table of specific attributes.
Abstract: In recent years, the metaverse has attracted enormous attention from around the world with the development of related technologies. The expected metaverse should be a realistic society with more direct and physical interactions, while the concepts of race, gender, and even physical disability would be weakened, which would be highly beneficial for society. However, the development of metaverse is still in its infancy, with great potential for improvement. Regarding metaverse's huge potential, industry has already come forward with advance preparation, accompanied by feverish investment, but there are few discussions about metaverse in academia to scientifically guide its development. In this paper, we highlight the representative applications for social good. Then we propose a three-layer metaverse architecture from a macro perspective, containing infrastructure, interaction, and ecosystem. Moreover, we journey toward both a historical and novel metaverse with a detailed timeline and table of specific attributes. Lastly, we illustrate our implemented blockchain-driven metaverse prototype of a university campus and discuss the prototype design and insights.

347 citations

Posted Content
TL;DR: The rationale behind the creation of the YFCC100M, the largest public multimedia collection that has ever been released, is explained, as well as the implications the dataset has for science, research, engineering, and development.
Abstract: We present the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M), the largest public multimedia collection that has ever been released. The dataset contains a total of 100 million media objects, of which approximately 99.2 million are photos and 0.8 million are videos, all of which carry a Creative Commons license. Each media object in the dataset is represented by several pieces of metadata, e.g. Flickr identifier, owner name, camera, title, tags, geo, media source. The collection provides a comprehensive snapshot of how photos and videos were taken, described, and shared over the years, from the inception of Flickr in 2004 until early 2014. In this article we explain the rationale behind its creation, as well as the implications the dataset has for science, research, engineering, and development. We further present several new challenges in multimedia research that can now be expanded upon with our dataset.

326 citations

Proceedings ArticleDOI
TL;DR: In this article, a viewport-adaptive 360-degree video streaming system is proposed to reduce the bandwidth waste, while still providing an immersive experience, by preparing multiple video representations, which differ not only by their bit-rate, but also by the qualities of different scene regions.
Abstract: The delivery and display of 360-degree videos on Head-Mounted Displays (HMDs) presents many technical challenges. 360-degree videos are ultra high resolution spherical videos, which contain an omnidirectional view of the scene. However only a portion of this scene is displayed on the HMD. Moreover, HMD need to respond in 10 ms to head movements, which prevents the server to send only the displayed video part based on client feedback. To reduce the bandwidth waste, while still providing an immersive experience, a viewport-adaptive 360-degree video streaming system is proposed. The server prepares multiple video representations, which differ not only by their bit-rate, but also by the qualities of different scene regions. The client chooses a representation for the next segment such that its bit-rate fits the available throughput and a full quality region matches its viewing. We investigate the impact of various spherical-to-plane projections and quality arrangements on the video quality displayed to the user, showing that the cube map layout offers the best quality for the given bit-rate budget. An evaluation with a dataset of users navigating 360-degree videos demonstrates that segments need to be short enough to enable frequent view switches.

228 citations

Network Information
Related Journals (5)
Multimedia Tools and Applications
16K papers, 185.7K citations
89% related
IEEE Transactions on Multimedia
4.3K papers, 175.2K citations
88% related
IEEE Transactions on Circuits and Systems for Video Technology
5.4K papers, 311.2K citations
87% related
arXiv: Computer Vision and Pattern Recognition
50K papers, 1.1M citations
86% related
IEEE Transactions on Image Processing
9.2K papers, 868.5K citations
84% related
Performance
Metrics
No. of papers from the Journal in previous years
YearPapers
2021171
2020189
2019184
2018167
2017143
2016137