Journal ArticleDOI

Analysis and Importance of Deep Learning for Video Aesthetic Assessments

28 Feb 2019 - Vol. 5, Iss. 1, pp. 546-554

TL;DR: This paper principally emphasizes deep learning as the basis of automatic video aesthetic assessment, motivated by its growing use in areas such as machine-based reality finding, good over-seeing, and sensory activity recognition.

Abstract: Deep Learning is one of the active research topics attracting a great deal of attention recently. This increase in interest is driven by the many areas in which it is being applied, such as machine-based reality finding, good over-seeing, sensory activity recognition, online learning, advertising, text analysis, and so on. Videos have specific characteristics that make their assessment unique: viewers typically remember, understand, and learn from what they see rather than from what they hear. This paper principally emphasizes deep learning as the basis of automatic video aesthetic assessment.




References
Proceedings Article
03 Dec 2012
TL;DR: State-of-the-art ImageNet performance was achieved by the deep convolutional neural network discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
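To make the layer stack described in this abstract concrete, the following is a minimal sketch of such a network in PyTorch (an assumption; the authors used their own GPU implementation, and the channel widths and kernel sizes below follow a commonly cited configuration rather than their exact code):

import torch
import torch.nn as nn

class DeepConvNetSketch(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        # Five convolutional layers, some followed by max-pooling, with
        # non-saturating ReLU activations throughout.
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        # Three fully-connected layers with dropout to reduce overfitting;
        # the final 1000-way softmax is applied inside the training loss.
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)        # convolutional feature extractor
        x = torch.flatten(x, 1)     # flatten for the fully-connected head
        return self.classifier(x)

scores = DeepConvNetSketch()(torch.randn(1, 3, 224, 224))  # -> shape (1, 1000)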

73,871 citations

Journal ArticleDOI
TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, respectively, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully connected layers we employed a recently developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
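The "dropout" regularization highlighted in this abstract randomly zeroes hidden units during training so that the fully connected layers cannot rely on fragile co-adaptations; at test time all units are active. A minimal illustration (PyTorch is an assumption, not the authors' original implementation):

import torch
import torch.nn as nn

# Dropout zeroes a fraction p of activations during training and rescales the
# survivors by 1/(1-p); in evaluation mode it is a no-op.
fc = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(p=0.5))

fc.train()                                # training mode: roughly half the units are dropped
h_train = fc(torch.randn(8, 4096))

fc.eval()                                 # evaluation mode: dropout is disabled
h_eval = fc(torch.randn(8, 4096))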

12,532 citations

Proceedings ArticleDOI
16 Jun 2012
TL;DR: In this paper, biologically plausible, wide and deep artificial neural network architectures were proposed for tasks such as the recognition of handwritten digits and traffic signs, achieving near-human performance.
Abstract: Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Our biologically plausible, wide and deep artificial neural network architectures can. Small (often minimal) receptive fields of convolutional winner-take-all neurons yield large network depth, resulting in roughly as many sparsely connected neural layers as found in mammals between retina and visual cortex. Only winner neurons are trained. Several deep neural columns become experts on inputs preprocessed in different ways; their predictions are averaged. Graphics cards allow for fast training. On the very competitive MNIST handwriting benchmark, our method is the first to achieve near-human performance. On a traffic sign recognition benchmark it outperforms humans by a factor of two. We also improve the state-of-the-art on a plethora of common image classification benchmarks.
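The multi-column idea described in this abstract, in which several deep "columns" each become experts on differently preprocessed inputs and their predictions are averaged, can be sketched as follows (PyTorch, the tiny column architecture, and the specific preprocessing choices are illustrative assumptions, not the paper's configuration):

import torch
import torch.nn as nn

def make_column(num_classes: int = 10) -> nn.Module:
    # One small convolutional "column"; the paper's columns are much deeper.
    return nn.Sequential(
        nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(64 * 7 * 7, num_classes),
    )

# Each column sees a differently preprocessed copy of the input (identity,
# normalization, and a horizontal flip serve as stand-ins here).
preprocessors = [
    lambda x: x,
    lambda x: (x - x.mean()) / (x.std() + 1e-8),
    lambda x: torch.flip(x, dims=[3]),
]
columns = [make_column() for _ in preprocessors]

def committee_predict(x: torch.Tensor) -> torch.Tensor:
    # Average the softmax outputs of all columns (the committee vote).
    probs = [col(pre(x)).softmax(dim=1) for col, pre in zip(columns, preprocessors)]
    return torch.stack(probs).mean(dim=0)

pred = committee_predict(torch.randn(4, 1, 28, 28))   # -> shape (4, 10)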

3,248 citations

Journal ArticleDOI
TL;DR: This work introduces, analyzes and demonstrates a recursive hierarchical generalization of the widely used hidden Markov models, which is motivated by the complex multi-scale structure which appears in many natural sequences, particularly in language, handwriting and speech.
Abstract: We introduce, analyze and demonstrate a recursive hierarchical generalization of the widely used hidden Markov models, which we name Hierarchical Hidden Markov Models (HHMM). Our model is motivated by the complex multi-scale structure which appears in many natural sequences, particularly in language, handwriting and speech. We seek a systematic unsupervised approach to the modeling of such structures. By extending the standard Baum-Welch (forward-backward) algorithm, we derive an efficient procedure for estimating the model parameters from unlabeled data. We then use the trained model for automatic hierarchical parsing of observation sequences. We describe two applications of our model and its parameter estimation procedure. In the first application we show how to construct hierarchical models of natural English text; in these models different levels of the hierarchy correspond to structures on different length scales in the text. In the second application we demonstrate how HHMMs can be used to automatically identify repeated strokes that represent combinations of letters in cursive handwriting.
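As background for the Baum-Welch extension described in this abstract, here is the standard forward recursion for a flat HMM, which the paper generalizes to the hierarchical case; the small transition and emission matrices are illustrative assumptions:

import numpy as np

def forward(pi, A, B, obs):
    """pi: (S,) initial state probabilities, A: (S, S) transition matrix,
    B: (S, V) emission matrix, obs: list of observation indices.
    Returns P(obs | model) via the forward (alpha) recursion."""
    alpha = pi * B[:, obs[0]]              # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # induction step
    return alpha.sum()                     # termination

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
print(forward(pi, A, B, [0, 1, 1]))        # likelihood of the observation sequence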

975 citations

Book ChapterDOI
07 May 2006
TL;DR: This paper treats the challenge of automatically inferring the aesthetic quality of pictures from their visual content as a machine learning problem, using a peer-rated online photo-sharing website as the data source, and extracts visual features based on the intuition that they can discriminate between aesthetically pleasing and displeasing images.
Abstract: Aesthetics, in the world of art and photography, refers to the principles of the nature and appreciation of beauty. Judging beauty and other aesthetic qualities of photographs is a highly subjective task; hence, there is no unanimously agreed standard for measuring aesthetic value. In spite of the lack of firm rules, certain features in photographic images are believed, by many, to please humans more than certain others. In this paper, we treat the challenge of automatically inferring aesthetic quality of pictures using their visual content as a machine learning problem, with a peer-rated online photo sharing Website as data source. We extract certain visual features based on the intuition that they can discriminate between aesthetically pleasing and displeasing images. Automated classifiers are built using support vector machines and classification trees. Linear regression on polynomial terms of the features is also applied to infer numerical aesthetics ratings. The work attempts to explore the relationship between emotions which pictures arouse in people, and their low-level content. Potential applications include content-based image retrieval and digital photography.
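A hedged sketch of the modeling pipeline described in this abstract: handcrafted visual features feed an SVM classifier for pleasing versus displeasing photos, and a linear regression on polynomial terms of the features infers numerical ratings. The placeholder feature extractor, the random labels, and scikit-learn itself are assumptions for illustration, not the authors' original toolchain:

import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

def extract_visual_features(images):
    # Placeholder for handcrafted features such as brightness, colorfulness,
    # or composition measures; random values stand in for real measurements.
    return np.random.rand(len(images), 8)

images = list(range(100))                      # stand-in for 100 photos
X = extract_visual_features(images)
y_class = np.random.randint(0, 2, size=100)    # peer-rated high/low aesthetics label
y_score = np.random.rand(100) * 7              # numerical aesthetics rating

clf = SVC(kernel="rbf").fit(X, y_class)        # SVM classifier on the features
reg = make_pipeline(PolynomialFeatures(degree=2),
                    LinearRegression()).fit(X, y_score)  # regression on polynomial terms

print(clf.predict(X[:5]), reg.predict(X[:5]))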

918 citations