scispace - formally typeset
Search or ask a question
Author

Randall Wald

Other affiliations: Atlantic University
Bio: Randall Wald is an academic researcher from Florida Atlantic University. The author has contributed to research in topics: Feature selection & Feature (computer vision). The author has an hindex of 21, co-authored 78 publications receiving 3040 citations. Previous affiliations of Randall Wald include Atlantic University.

Papers published on a yearly basis

Papers
More filters
Journal ArticleDOI
TL;DR: This study explores how Deep Learning can be utilized for addressing some important problems in Big Data Analytics, including extracting complex patterns from massive volumes of data, semantic indexing, data tagging, fast information retrieval, and simplifying discriminative tasks.
Abstract: Big Data Analytics and Deep Learning are two high-focus of data science. Big Data has become important as many organizations both public and private have been collecting massive amounts of domain-specific information, which can contain useful information about problems such as national intelligence, cyber security, fraud detection, marketing, and medical informatics. Companies such as Google and Microsoft are analyzing large volumes of data for business analysis and decisions, impacting existing and future technology. Deep Learning algorithms extract high-level, complex abstractions as data representations through a hierarchical learning process. Complex abstractions are learnt at a given level based on relatively simpler abstractions formulated in the preceding level in the hierarchy. A key benefit of Deep Learning is the analysis and learning of massive amounts of unsupervised data, making it a valuable tool for Big Data Analytics where raw data is largely unlabeled and un-categorized. In the present study, we explore how Deep Learning can be utilized for addressing some important problems in Big Data Analytics, including extracting complex patterns from massive volumes of data, semantic indexing, data tagging, fast information retrieval, and simplifying discriminative tasks. We also investigate some aspects of Deep Learning research that need further exploration to incorporate specific challenges introduced by Big Data Analytics, including streaming data, high-dimensional data, scalability of models, and distributed computing. We conclude by presenting insights into relevant future works by posing some questions, including defining data sampling criteria, domain adaptation modeling, defining criteria for obtaining useful data abstractions, improving semantic indexing, semi-supervised learning, and active learning.

1,827 citations

Journal ArticleDOI
TL;DR: This paper will present recent research using Big Data tools and approaches for the analysis of Health Informatics data gathered at multiple levels, including the molecular, tissue, patient, and population levels.
Abstract: The amount of data produced within Health Informatics has grown to be quite vast, and analysis of this Big Data grants potentially limitless possibilities for knowledge to be gained. In addition, this information can improve the quality of healthcare offered to patients. However, there are a number of issues that arise when dealing with these vast quantities of data, especially how to analyze this data in a reliable manner. The basic goal of Health Informatics is to take in real world medical data from all levels of human existence to help advance our understanding of medicine and medical practice. This paper will present recent research using Big Data tools and approaches for the analysis of Health Informatics data gathered at multiple levels, including the molecular, tissue, patient, and population levels. In addition to gathering data at multiple levels, multiple levels of questions are addressed: human-scale biology, clinical-scale, and epidemic-scale. We will also analyze and examine possible future work for each of these areas, as well as how combining data from each level may provide the most promising approach to gain the most knowledge in Health Informatics.

264 citations

Journal ArticleDOI
TL;DR: Overall, both cyber threat analysis and cyber intelligence could be enhanced by correlating security events across many diverse heterogeneous sources, as well as presenting areas where more research opportunities exist.
Abstract: Intrusion Detection has been heavily studied in both industry and academia, but cybersecurity analysts still desire much more alert accuracy and overall threat analysis in order to secure their systems within cyberspace. Improvements to Intrusion Detection could be achieved by embracing a more comprehensive approach in monitoring security events from many different heterogeneous sources. Correlating security events from heterogeneous sources can grant a more holistic view and greater situational awareness of cyber threats. One problem with this approach is that currently, even a single event source (e.g., network traffic) can experience Big Data challenges when considered alone. Attempts to use more heterogeneous data sources pose an even greater Big Data challenge. Big Data technologies for Intrusion Detection can help solve these Big Heterogeneous Data challenges. In this paper, we review the scope of works considering the problem of heterogeneous data and in particular Big Heterogeneous Data. We discuss the specific issues of Data Fusion, Heterogeneous Intrusion Detection Architectures, and Security Information and Event Management (SIEM) systems, as well as presenting areas where more research opportunities exist. Overall, both cyber threat analysis and cyber intelligence could be enhanced by correlating security events across many diverse heterogeneous sources.

257 citations

Proceedings ArticleDOI
06 Dec 2009
TL;DR: This work considers three filters using classifier performance metrics and six commonly-used filters and all nine filtering techniques are compared and contrasted using five different microarray expression datasets.
Abstract: Feature selection is an important topic in data mining, especially for high dimensional datasets. Filtering techniques in particular have received much attention, but detailed comparisons of their performance is lacking. This work considers three filters using classifier performance metrics and six commonly-used filters. All nine filtering techniques are compared and contrasted using five different microarray expression datasets. In addition, given that these datasets exhibit an imbalance between the number of positive and negative examples, the utilization of sampling techniques in the context of feature selection is examined.

145 citations

Proceedings ArticleDOI
17 Sep 2012
TL;DR: The role of stability in feature selection with DNA microarray data is introduced, various ways of improving feature ranking stability are listed, and feature selection techniques are discussed, specifically explaining ensemble feature ranking and presenting various ensemble featureranking aggregation methods.
Abstract: Feature selection is an important step in data mining and is used in various domains including genetics, medicine, and bioinformatics. Choosing the important features (genes) is essential for the discovery of new knowledge hidden within the genetic code as well as the identification of important biomarkers. Although feature selection methods can help sort through large numbers of genes based on their relevance to the problem at hand, the results generated tend to be unstable and thus cannot be reproduced in other experiments. Relatedly, research interest in the stability of feature ranking methods has grown recently and researchers have produced experimental designs for testing the stability of feature selection, creating new metrics for measuring stability and new techniques designed to improve the stability of the feature selection process. In this paper, we will introduce the role of stability in feature selection with DNA microarray data. We list various ways of improving feature ranking stability, and discuss feature selection techniques, specifically explaining ensemble feature ranking and presenting various ensemble feature ranking aggregation methods. Finally, we discuss experimental procedures such as dataset perturbation, fixed overlap partitioning, and cross validation procedures that help researchers analyze and measure the stability of feature ranking methods. Throughout this work, we investigate current research in the field and discuss possible avenues of continuing such research efforts.

114 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: This work was supported in part by the Royal Society of the UK, the National Natural Science Foundation of China, and the Alexander von Humboldt Foundation of Germany.

2,404 citations

Journal ArticleDOI
TL;DR: A survey of 40 research efforts that employ deep learning techniques, applied to various agricultural and food production challenges indicates that deep learning provides high accuracy, outperforming existing commonly used image processing techniques.

2,100 citations

Journal ArticleDOI
TL;DR: A comprehensive survey on adversarial attacks on deep learning in computer vision can be found in this paper, where the authors review the works that design adversarial attack, analyze the existence of such attacks and propose defenses against them.
Abstract: Deep learning is at the heart of the current rise of artificial intelligence. In the field of computer vision, it has become the workhorse for applications ranging from self-driving cars to surveillance and security. Whereas, deep neural networks have demonstrated phenomenal success (often beyond human capabilities) in solving complex problems, recent studies show that they are vulnerable to adversarial attacks in the form of subtle perturbations to inputs that lead a model to predict incorrect outputs. For images, such perturbations are often too small to be perceptible, yet they completely fool the deep learning models. Adversarial attacks pose a serious threat to the success of deep learning in practice. This fact has recently led to a large influx of contributions in this direction. This paper presents the first comprehensive survey on adversarial attacks on deep learning in computer vision. We review the works that design adversarial attacks, analyze the existence of such attacks and propose defenses against them. To emphasize that adversarial attacks are possible in practical conditions, we separately review the contributions that evaluate adversarial attacks in the real-world scenarios. Finally, drawing on the reviewed literature, we provide a broader outlook of this research direction.

1,542 citations

Journal ArticleDOI
TL;DR: In this paper, label noise consists of mislabeled instances: no additional information is assumed to be available like e.g., confidences on labels.
Abstract: Label noise is an important issue in classification, with many potential negative consequences. For example, the accuracy of predictions may decrease, whereas the complexity of inferred models and the number of necessary training samples may increase. Many works in the literature have been devoted to the study of label noise and the development of techniques to deal with label noise. However, the field lacks a comprehensive survey on the different types of label noise, their consequences and the algorithms that consider label noise. This paper proposes to fill this gap. First, the definitions and sources of label noise are considered and a taxonomy of the types of label noise is proposed. Second, the potential consequences of label noise are discussed. Third, label noise-robust, label noise cleansing, and label noise-tolerant algorithms are reviewed. For each category of approaches, a short discussion is proposed to help the practitioner to choose the most suitable technique in its own particular field of application. Eventually, the design of experiments is also discussed, what may interest the researchers who would like to test their own algorithms. In this paper, label noise consists of mislabeled instances: no additional information is assumed to be available like e.g., confidences on labels.

1,440 citations