Author

Yudong Tao

Bio: Yudong Tao is an academic researcher at the University of Miami. The author has contributed to research on the topics of deep learning and medicine. The author has an h-index of 7 and has co-authored 29 publications receiving 662 citations. Previous affiliations of Yudong Tao include Fudan University and Bascom Palmer Eye Institute.

Papers
Journal Article
TL;DR: A comprehensive review of historical and recent state-of-the-art approaches in visual, audio, and text processing; social network analysis; and natural language processing is presented, followed by the in-depth analysis on pivoting and groundbreaking advances in deep learning applications.
Abstract: The field of machine learning is witnessing its golden era as deep learning slowly becomes the leader in this domain. Deep learning uses multiple layers to represent the abstractions of data to build computational models. Some key enabler deep learning algorithms such as generative adversarial networks, convolutional neural networks, and model transfers have completely changed our perception of information processing. However, there exists an aperture of understanding behind this tremendously fast-paced domain, because it was never previously represented from a multiscope perspective. The lack of core understanding renders these powerful methods as black-box machines that inhibit development at a fundamental level. Moreover, deep learning has repeatedly been perceived as a silver bullet to all stumbling blocks in machine learning, which is far from the truth. This article presents a comprehensive review of historical and recent state-of-the-art approaches in visual, audio, and text processing; social network analysis; and natural language processing, followed by the in-depth analysis on pivoting and groundbreaking advances in deep learning applications. It was also undertaken to review the issues faced in deep learning such as unsupervised learning, black-box models, and online learning and to illustrate how these challenges can be transformed into prolific future research avenues.

824 citations

Proceedings Article
10 Apr 2018
TL;DR: A novel model based on the Convolutional Neural Networks to handle such imbalanced and heterogeneous data and successfully identifies the semantic concepts in these multimedia systems is presented.
Abstract: Many multimedia systems stream real-time visual data continuously for a wide variety of applications. These systems can produce vast amounts of data, but few studies take advantage of the versatile and real-time data. This paper presents a novel model based on the Convolutional Neural Networks (CNNs) to handle such imbalanced and heterogeneous data and successfully identifies the semantic concepts in these multimedia systems. The proposed model can discover the semantic concepts from the data with a skewed distribution using a dynamic sampling technique. The paper also presents a system that can retrieve real-time visual data from heterogeneous cameras, and the run-time environment allows the analysis programs to process the data from thousands of cameras simultaneously. The evaluation results in comparison with several state-of-the-art methods demonstrate the ability and effectiveness of the proposed model on visual data captured by public network cameras.

118 citations

Journal Article
TL;DR: A new multimodal deep learning framework for event detection from videos by leveraging recent advances in deep neural networks and a novel fusion technique is proposed that integrates different data representations in two levels, namely frame-level and video-level.
Abstract: Real-world applications usually encounter data with various modalities, each containing valuable information. To enhance these applications, it is essential to effectively analyze all information extracted from different data modalities, while most existing learning models ignore some data types and only focus on a single modality. This paper presents a new multimodal deep learning framework for event detection from videos by leveraging recent advances in deep neural networks. First, several deep learning models are utilized to extract useful information from multiple modalities. Among these are pre-trained Convolutional Neural Networks (CNNs) for visual and audio feature extraction and a word embedding model for textual analysis. Then, a novel fusion technique is proposed that integrates different data representations in two levels, namely frame-level and video-level. Different from the existing multimodal learning algorithms, the proposed framework can reason about a missing data type using other available data modalities. The proposed framework is applied to a new video dataset containing natural disaster classes. The experimental results illustrate the effectiveness of the proposed framework compared to some single modal deep learning models as well as conventional fusion techniques. Specifically, the final accuracy is improved more than 16% and 7% compared to the best results from single modality and fusion models, respectively.

53 citations

Journal Article
TL;DR: The proposed multimodal framework is evaluated on the collected disaster dataset and compared with several state-of-the-art single modality and fusion techniques and demonstrates the effectiveness of both visual model and fusion model compared to the baseline approaches.
Abstract: The fast and explosive growth of digital data in social media and the World Wide Web has led to numerous opportunities and research activities in multimedia big data. Among them, disaster management applications have attracted a lot of attention in recent years due to their impact on society and government. This study targets content analysis and mining for disaster management. Specifically, a multimedia big data framework based on advanced deep learning techniques is proposed. First, a video dataset of natural disasters is collected from YouTube. Then, two separate deep networks, a temporal audio model and a spatio-temporal visual model, are presented to analyze the audio-visual modalities in video clips effectively. Thereafter, the results of both models are integrated using the proposed fusion model based on the Multiple Correspondence Analysis (MCA) algorithm, which considers the correlations between data modalities and final classes. The proposed multimodal framework is evaluated on the collected disaster dataset and compared with several state-of-the-art single modality and fusion techniques. The results demonstrate the effectiveness of both the visual model and the fusion model compared to the baseline approaches. Specifically, the accuracy of the final multi-class classification using the proposed MCA-based fusion reaches 73% on this challenging dataset.

40 citations

Proceedings Article
08 Jul 2019
TL;DR: SP-ASDNet is proposed which utilizes both convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to classify whether an observer is typical developed (TD) or has ASD, based on the scanpath of the corresponding observer's gaze at the given image.
Abstract: Autism spectrum disorder (ASD) is one of the common diseases that affects the language and even the behavior of the subjects. Because of the large variations in the symptoms and severities of ASD, diagnosis is a challenging problem. Deep neural networks have been widely used and achieve good performance in various applications of visual data analysis. In this paper, we propose SP-ASDNet, which utilizes both convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to classify whether an observer is typically developed (TD) or has ASD, based on the scanpath of the corresponding observer's gaze at the given image. The proposed SP-ASDNet was submitted to the 2019 Saliency4ASD grand challenge and achieves 74.22% accuracy for validation.

33 citations


Cited by
01 Jan 2012

3,692 citations

Journal Article
TL;DR: A comprehensive survey of the recent achievements in this field brought about by deep learning techniques, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.
Abstract: Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images. Deep learning techniques have emerged as a powerful strategy for learning feature representations directly from data and have led to remarkable breakthroughs in the field of generic object detection. Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought about by deep learning techniques. More than 300 research contributions are included in this survey, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics. We finish the survey by identifying promising directions for future research.

1,897 citations

Journal Article
TL;DR: Examination of existing deep learning techniques for addressing class imbalanced data finds that research in this area is very limited, that most existing work focuses on computer vision tasks with convolutional neural networks, and that the effects of big data are rarely considered.
Abstract: The purpose of this study is to examine existing deep learning techniques for addressing class imbalanced data. Effective classification with imbalanced data is an important area of research, as high class imbalance is naturally inherent in many real-world applications, e.g., fraud detection and cancer detection. Moreover, highly imbalanced data poses added difficulty, as most learners will exhibit bias towards the majority class, and in extreme cases, may ignore the minority class altogether. Class imbalance has been studied thoroughly over the last two decades using traditional machine learning models, i.e. non-deep learning. Despite recent advances in deep learning, along with its increasing popularity, very little empirical work in the area of deep learning with class imbalance exists. Having achieved record-breaking performance results in several complex domains, investigating the use of deep neural networks for problems containing high levels of class imbalance is of great interest. Available studies regarding class imbalance and deep learning are surveyed in order to better understand the efficacy of deep learning when applied to class imbalanced data. This survey discusses the implementation details and experimental results for each study, and offers additional insight into their strengths and weaknesses. Several areas of focus include: data complexity, architectures tested, performance interpretation, ease of use, big data application, and generalization to other domains. We have found that research in this area is very limited, that most existing work focuses on computer vision tasks with convolutional neural networks, and that the effects of big data are rarely considered. Several traditional methods for class imbalance, e.g. data sampling and cost-sensitive learning, prove to be applicable in deep learning, while more advanced methods that exploit neural network feature learning abilities show promising results. The survey concludes with a discussion that highlights various gaps in deep learning from class imbalanced data for the purpose of guiding future research.

1,377 citations

Journal Article
TL;DR: In this paper, a comprehensive survey of the most important aspects of DL and including those enhancements recently added to the field is provided, and the challenges and suggested solutions to help researchers understand the existing research gaps.
Abstract: In the last few years, the deep learning (DL) computing paradigm has been deemed the Gold Standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, thus achieving outstanding results on several complex cognitive tasks, matching or even beating those provided by human performance. One of the benefits of DL is the ability to learn from massive amounts of data. The DL field has grown fast in the last few years and it has been extensively used to successfully address a wide range of traditional applications. More importantly, DL has outperformed well-known ML techniques in many domains, e.g., cybersecurity, natural language processing, bioinformatics, robotics and control, and medical information processing, among many others. Although several works have reviewed the state of the art in DL, each of them tackled only one aspect of DL, which leads to an overall lack of knowledge about it. Therefore, in this contribution, we propose using a more holistic approach in order to provide a more suitable starting point from which to develop a full understanding of DL. Specifically, this review attempts to provide a more comprehensive survey of the most important aspects of DL, including those enhancements recently added to the field. In particular, this paper outlines the importance of DL and presents the types of DL techniques and networks. It then presents convolutional neural networks (CNNs), which are the most utilized DL network type, and describes the development of CNN architectures together with their main features, e.g., starting with the AlexNet network and closing with the High-Resolution network (HR.Net). Finally, we further present the challenges and suggested solutions to help researchers understand the existing research gaps. It is followed by a list of the major DL applications. Computational tools including FPGA, GPU, and CPU are summarized along with a description of their influence on DL. The paper ends with the evolution matrix, benchmark datasets, and summary and conclusion.

1,084 citations