Journal ArticleDOI

Toward Knowledge as a Service Over Networks: A Deep Learning Model Communication Paradigm

TL;DR: This paper presents a deep learning model communication paradigm based on multiple-model compression, which exploits the redundancy among deep learning models deployed in different application scenarios. The potential and promise of this compression strategy for deep learning model communication are analyzed and demonstrated through a set of experiments.
Abstract: The advent of artificial intelligence and the Internet of Things has driven the transition from big data to big knowledge. Deep learning models, which assimilate knowledge from large-scale data, can be regarded as an alternative but promising modality of knowledge for artificial intelligence services. Yet the compression, storage, and communication of deep learning models towards better knowledge services, especially over networks, pose a set of challenging problems in both industry and academia. This paper presents a deep learning model communication paradigm based on multiple-model compression, which greatly exploits the redundancy among multiple deep learning models in different application scenarios. We analyze the potential and demonstrate the promise of this compression strategy for deep learning model communication through a set of experiments. Moreover, interoperability in deep learning model communication, enabled by the standardization of compact deep learning model representations, is also discussed and envisioned.
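The multiple-model compression idea can be illustrated with a toy sketch. This is an illustrative assumption, not the paper's actual algorithm: several models fine-tuned from one pretrained network share most of their weights, so a shared base can be transmitted once and each model reduced to a coarsely quantized residual.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm): exploit redundancy among
# multiple deep learning models by transmitting a shared "base" weight
# tensor once, plus a quantized per-model residual (delta).

def compress_deltas(models, step=0.05):
    """models: list of weight arrays with identical shape."""
    base = np.mean(models, axis=0)                          # shared part, sent once
    deltas = [np.round((w - base) / step) for w in models]  # coarse quantization
    return base, deltas, step

def reconstruct(base, delta, step):
    return base + delta * step

rng = np.random.default_rng(0)
shared = rng.normal(size=1000)
# Two fine-tuned variants of one pretrained model differ only slightly.
w1 = shared + 0.01 * rng.normal(size=1000)
w2 = shared + 0.01 * rng.normal(size=1000)

base, deltas, step = compress_deltas([w1, w2])
err = np.max(np.abs(reconstruct(base, deltas[0], step) - w1))
print(err <= step / 2 + 1e-12)  # quantization error bounded by half a step
```

Because the residuals are small and concentrated near zero, they compress far better than the full weight tensors would individually, which is the intuition behind exploiting cross-model redundancy.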
Citations
Journal ArticleDOI
Ling-Yu Duan1, Jiaying Liu1, Wenhan Yang1, Tiejun Huang1, Wen Gao1 
TL;DR: In this paper, a new area, Video Coding for Machines (VCM), is proposed to bridge the gap between feature coding for machine vision and video coding for human vision, and the preliminary results have demonstrated the performance and efficiency gains.
Abstract: Video coding, which aims to compress and reconstruct whole frames, and feature compression, which preserves and transmits only the most critical information, stand at two ends of a scale: one offers compactness and efficiency to serve machine vision, while the other retains full fidelity, bowing to human perception. Recent endeavors in imminent trends of video compression, e.g., deep-learning-based coding tools and end-to-end image/video coding, and MPEG-7 compact feature descriptor standards, i.e., Compact Descriptors for Visual Search and Compact Descriptors for Video Analysis, promote sustainable and fast development in their respective directions. In this article, thanks to booming AI technology, e.g., prediction and generation models, we carry out exploration in the new area of Video Coding for Machines (VCM), arising from the emerging MPEG standardization efforts.1 Towards collaborative compression and intelligent analytics, VCM attempts to bridge the gap between feature coding for machine vision and video coding for human vision. Aligning with the rising Analyze-then-Compress instance, Digital Retina, the definition, formulation, and paradigm of VCM are given first. Meanwhile, we systematically review state-of-the-art techniques in video compression and feature compression from the unique perspective of MPEG standardization, which provides the academic and industrial evidence to realize the collaborative compression of video and feature streams in a broad range of AI applications. Finally, we come up with potential VCM solutions, and the preliminary results have demonstrated the performance and efficiency gains. Further directions are discussed as well. 1 https://lists.aau.at/mailman/listinfo/mpeg-vcm

113 citations

Proceedings ArticleDOI
Yuanzhi Liang1, Yalong Bai, Wei Zhang, Xueming Qian1, Li Zhu1, Tao Mei 
01 Feb 2019
TL;DR: The authors propose a new scene graph dataset named Visually-Relevant Relationships Dataset (VrR-VG), based on Visual Genome and constructed by automatically pruning visually-irrelevant relationships.
Abstract: Relationships encode the interactions among individual instances and play a critical role in deep visual scene understanding. Suffering from high predictability given non-visual information, relationship models tend to fit the statistical bias rather than "learning" to infer the relationships from images. To encourage further development in visual relationships, we propose a novel method to mine more valuable relationships by automatically pruning visually-irrelevant relationships. We construct a new scene graph dataset named Visually-Relevant Relationships Dataset (VrR-VG) based on Visual Genome. Compared with existing datasets, the performance gap between learnable and statistical methods is more significant in VrR-VG, and frequency-based analysis no longer works. Moreover, we propose to learn a relationship-aware representation by jointly considering instances, attributes, and relationships. By applying the representation-aware feature learned on VrR-VG, the performances of image captioning and visual question answering are systematically improved, which demonstrates the effectiveness of both our dataset and the feature embedding schema. Both our VrR-VG dataset and representation-aware features will be made publicly available soon.

42 citations

Posted Content
Yuanzhi Liang1, Yalong Bai, Wei Zhang, Xueming Qian1, Li Zhu1, Tao Mei 
TL;DR: By applying the representation-aware feature learned on VrR-VG, the performances of image captioning and visual question answering are systematically improved, which demonstrates the effectiveness of both the dataset and features embedding schema.
Abstract: Relationships encode the interactions among individual instances and play a critical role in deep visual scene understanding. Suffering from high predictability given non-visual information, existing methods tend to fit the statistical bias rather than "learning" to "infer" the relationships from images. To encourage further development in visual relationships, we propose a novel method to automatically mine more valuable relationships by pruning visually-irrelevant ones. We construct a new scene-graph dataset named Visually-Relevant Relationships Dataset (VrR-VG) based on Visual Genome. Compared with existing datasets, the performance gap between learnable and statistical methods is more significant in VrR-VG, and frequency-based analysis no longer works. Moreover, we propose to learn a relationship-aware representation by jointly considering instances, attributes, and relationships. By applying the representation-aware feature learned on VrR-VG, the performances of image captioning and visual question answering are systematically improved by a large margin, which demonstrates the gain from our dataset and the feature embedding schema. VrR-VG is available via this http URL.

23 citations


Cites background from "Toward Knowledge as a Service Over ..."

  • ...Representation Learning: Numerous deep learning methods have been proposed for representation learning with various knowledge [31, 22, 5, 30]....


Journal ArticleDOI
TL;DR: A vision towards a novel visual computing framework, termed digital retina, which aligns high-efficiency sensing models with the emerging Video Coding for Machines (VCM) paradigm, is articulated; extensive experiments validate that it effectively supports video big data analysis and retrieval in the intelligent city system.
Abstract: The ubiquitous camera networks in the city brain system grow at a rapid pace, creating massive amounts of images and videos at a range of spatial-temporal scales and thereby forming the "biggest" big data. However, the sensing system often lags behind the construction of the fast-growing city brain system, in the sense that such exponentially growing data far exceed today's sensing capabilities. Therefore, critical issues arise regarding how to better leverage the existing city brain system and significantly improve city-scale performance in intelligent applications. To tackle these unprecedented challenges, we articulate a vision towards a novel visual computing framework, termed digital retina, which aligns high-efficiency sensing models with the emerging Video Coding for Machines (VCM) paradigm. In particular, the digital retina may consist of video coding, feature coding, model coding, and their joint optimization. The digital retina is biologically inspired, rooted in the widely accepted view that the retina encodes visual information for human perception, while downstream brain areas extract features to disentangle visual objects. Within the digital retina framework, three streams, i.e., the video stream, feature stream, and model stream, work collaboratively over the end-edge-cloud platform. In particular, the compressed video stream serves human vision, the compact feature stream targets machine vision, and the model stream incrementally updates deep learning models to improve the performance of human/machine vision tasks. We have developed a prototype to demonstrate the technical advantages of the digital retina, and extensive experiments have been conducted to validate that it can effectively support video big data analysis and retrieval in the intelligent city system.
In particular, a compression ratio of up to 7000× can be realized for visual data compression while maintaining performance competitive with the pristine signal in a series of visual analysis tasks.

19 citations

Posted Content
Ling-Yu Duan1, Jiaying Liu1, Wenhan Yang1, Tiejun Huang1, Wen Gao1 
TL;DR: The definition, formulation, and paradigm of VCM are given; state-of-the-art techniques in video compression and feature compression are systematically reviewed from the unique perspective of MPEG standardization; and preliminary results demonstrate performance and efficiency gains.
Abstract: Video coding, which aims to compress and reconstruct whole frames, and feature compression, which preserves and transmits only the most critical information, stand at two ends of a scale: one offers compactness and efficiency to serve machine vision, while the other retains full fidelity, bowing to human perception. Recent endeavors in imminent trends of video compression, e.g., deep-learning-based coding tools and end-to-end image/video coding, and MPEG-7 compact feature descriptor standards, i.e., Compact Descriptors for Visual Search and Compact Descriptors for Video Analysis, promote sustainable and fast development in their respective directions. In this paper, thanks to booming AI technology, e.g., prediction and generation models, we carry out exploration in the new area of Video Coding for Machines (VCM), arising from the emerging MPEG standardization efforts. Towards collaborative compression and intelligent analytics, VCM attempts to bridge the gap between feature coding for machine vision and video coding for human vision. Aligning with the rising Analyze-then-Compress instance, Digital Retina, the definition, formulation, and paradigm of VCM are given first. Meanwhile, we systematically review state-of-the-art techniques in video compression and feature compression from the unique perspective of MPEG standardization, which provides the academic and industrial evidence to realize the collaborative compression of video and feature streams in a broad range of AI applications. Finally, we come up with potential VCM solutions, and the preliminary results have demonstrated the performance and efficiency gains. Further directions are discussed as well.

11 citations

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets [40], but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to the ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
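The residual reformulation described above can be sketched in a few lines. This is a minimal NumPy illustration with dense layers standing in for the paper's convolutional layers; the zero initialization is only to show why identity shortcuts help optimization.

```python
import numpy as np

# Minimal sketch of the residual reformulation: instead of learning a target
# mapping H(x) directly, a block learns the residual F(x) = H(x) - x and
# outputs F(x) + x via an identity shortcut.

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    out = relu(x @ w1)   # first "layer" of the residual function F
    out = out @ w2       # second layer (no activation before the add)
    return relu(out + x) # identity shortcut: F(x) + x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
# Zero residual weights make the block an identity mapping, which is why
# very deep stacks of such blocks remain easy to optimize at the start.
w1 = np.zeros((8, 8))
w2 = np.zeros((8, 8))
print(np.allclose(residual_block(x, w1, w2), relu(x)))  # True
```

The key property is that a block can "do nothing" by driving F towards zero, so adding depth cannot easily hurt, whereas an unreferenced stack must learn the identity explicitly.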

123,388 citations

Proceedings Article
03 Dec 2012
TL;DR: A large deep convolutional neural network, consisting of five convolutional layers, some followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieved state-of-the-art performance on ImageNet classification.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
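The dropout regularizer the abstract mentions can be sketched as follows. Note this shows the common "inverted" variant that rescales during training (an assumption for simplicity); the original recipe applied plain dropout at p = 0.5 and rescaled activations at test time instead.

```python
import numpy as np

# Sketch of inverted dropout: randomly zero each unit with probability p
# during training and rescale the survivors so the expected activation is
# unchanged, which lets inference run without any modification.

def dropout(x, p=0.5, training=True, rng=None):
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p   # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)       # rescale to preserve the expectation

rng = np.random.default_rng(0)
x = np.ones((10000,))
y = dropout(x, p=0.5, rng=rng)
print(abs(y.mean() - 1.0) < 0.05)  # expected activation preserved
```

Because each unit must work with random subsets of its neighbors, dropout discourages co-adaptation of features, which is why it reduces overfitting in the large fully-connected layers.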

73,978 citations

Proceedings Article
01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
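The parameter arithmetic behind the small-filter design above is easy to verify. In this sketch the channel count C is an illustrative assumption (equal input and output channels, biases ignored): a stack of two 3x3 layers covers the same 5x5 receptive field as a single 5x5 layer but with fewer parameters and an extra non-linearity.

```python
# Parameter count of a k x k convolution with c_in input and c_out output
# channels (biases ignored for simplicity).
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

C = 64                                      # illustrative channel count
one_5x5 = conv_params(5, C, C)              # 25 * C^2 parameters
two_3x3 = 2 * conv_params(3, C, C)          # 18 * C^2 parameters, same 5x5 RF
print(one_5x5, two_3x3, two_3x3 < one_5x5)  # 102400 73728 True
```

The same trade holds at any channel width: three stacked 3x3 layers likewise replace a 7x7 layer (27C² vs. 49C² parameters), which is what makes depths of 16-19 weight layers affordable.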

49,914 citations

Proceedings ArticleDOI
Jia Deng1, Wei Dong1, Richard Socher1, Li-Jia Li1, Kai Li1, Li Fei-Fei1 
20 Jun 2009
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Abstract: The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.

49,639 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception, a deep convolutional neural network architecture, achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
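The "constant computational budget" claim rests largely on 1x1 dimension-reduction convolutions placed before expensive larger filters. The channel sizes below are illustrative assumptions, not GoogLeNet's actual configuration; the arithmetic shows why the bottleneck is cheaper.

```python
# Parameter count of a k x k convolution (biases ignored).
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

c_in, c_mid, c_out = 256, 64, 256           # illustrative channel sizes

# Direct 5x5 convolution on the full channel depth.
direct = conv_params(5, c_in, c_out)
# 1x1 reduction to c_mid channels, then the 5x5 on the reduced depth.
bottleneck = conv_params(1, c_in, c_mid) + conv_params(5, c_mid, c_out)

print(direct, bottleneck, bottleneck < direct)  # 1638400 425984 True
```

This is the design choice that lets the network grow in both depth and width (multiple parallel filter sizes per module) while keeping the multiply-accumulate budget roughly fixed.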

40,257 citations