Author

Omar Alrawi

Bio: Omar Alrawi is an academic researcher at Georgia Institute of Technology. The author has contributed to research on malware and malware analysis, has an h-index of 12, and has co-authored 23 publications receiving 699 citations. Previous affiliations of Omar Alrawi include Qatar Computing Research Institute and Qatar Foundation.

Papers
Proceedings ArticleDOI
19 May 2019
TL;DR: This work systematizes the literature for home-based IoT using a proposed methodology in order to understand attack techniques, mitigations, and stakeholders, and evaluates a set of devices to augment the systematized literature in order to identify neglected research areas.
Abstract: Home-based IoT devices have a bleak reputation regarding their security practices. On the surface, the insecurities of IoT devices seem to be caused by integration problems that may be addressed by simple measures, but this work finds that to be a naive assumption. The truth is, IoT deployments, at their core, utilize traditional compute systems, such as embedded, mobile, and network. These components have many unexplored challenges, such as the effect of over-privileged mobile applications on embedded devices. Our work proposes a methodology that researchers and practitioners could employ to analyze security properties for home-based IoT devices. We systematize the literature for home-based IoT using this methodology in order to understand attack techniques, mitigations, and stakeholders. Further, we evaluate a set of devices to augment the systematized literature in order to identify neglected research areas. To make this analysis transparent and easier for the community to adapt, we provide a public portal to share our evaluation data and invite the community to contribute their independent findings.

285 citations

Journal ArticleDOI
TL;DR: An evaluation of both AutoMal and MaLabel based on medium-scale and large-scale datasets shows AMAL's effectiveness in accurately characterizing, classifying, and grouping malware samples, and several benchmarks, cost estimates and measurements highlight the merits of AMAL.

177 citations

Book ChapterDOI
10 Jul 2014
TL;DR: The literature lacks any systematic study on validating the performance of antivirus scanners, and the reliability of those labels or detection, and researchers rely on AV labels to establish a baseline of ground truth to compare their detection and classification algorithms.
Abstract: Antivirus scanners are designed to detect malware and, to a lesser extent, to label detections based on a family association. The labeling provided by AV vendors has many applications such as guiding efforts of disinfection and countermeasures, intelligence gathering, and attack attribution, among others. Furthermore, researchers rely on AV labels to establish a baseline of ground truth to compare their detection and classification algorithms. This is done despite many papers pointing out the subtle problem of relying on AV labels. However, the literature lacks any systematic study on validating the performance of antivirus scanners, and the reliability of those labels or detection.

107 citations

Book ChapterDOI
25 Aug 2014
TL;DR: An evaluation of both AutoMal and MaLabel based on medium-scale and large-scale datasets shows AMAL’s effectiveness in accurately characterizing, classifying, and grouping malware samples.
Abstract: This paper introduces AMAL, an operational automated and behavior-based malware analysis and labeling (classification and clustering) system that addresses many limitations and shortcomings of the existing academic and industrial systems. AMAL consists of two sub-systems, AutoMal and MaLabel. AutoMal provides tools to collect low-granularity behavioral artifacts that characterize malware usage of the file system, memory, network, and registry, and does that by running malware samples in virtualized environments. MaLabel, on the other hand, uses those artifacts to create representative features, uses them to build classifiers trained on manually vetted training samples, and uses those classifiers to classify malware samples into families similar in behavior. AutoMal also enables unsupervised learning by implementing multiple clustering algorithms for grouping samples. An evaluation of both AutoMal and MaLabel based on medium-scale (4,000 samples) and large-scale (more than 115,000 samples) datasets, collected and analyzed by AutoMal over 13 months, shows AMAL’s effectiveness in accurately characterizing, classifying, and grouping malware samples. MaLabel achieves a precision of 99.5% and a recall of 99.6% for the classification of certain families, and more than 98% precision and recall for unsupervised clustering. Several benchmarks, cost estimates, and measurements highlight and support the merits and features of AMAL.

66 citations
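
The precision and recall figures quoted in the AMAL abstract above follow the standard per-family definitions. The sketch below only illustrates that arithmetic; the function name `precision_recall` and the toy labels are assumptions made for this example and are not taken from the paper or its datasets.

```python
def precision_recall(true_labels, predicted_labels, family):
    """Compute precision and recall for one family from paired label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    # True positives: samples of the family that were predicted as the family.
    tp = sum(1 for t, p in pairs if t == family and p == family)
    # False positives: other samples that were predicted as the family.
    fp = sum(1 for t, p in pairs if t != family and p == family)
    # False negatives: samples of the family that were predicted as something else.
    fn = sum(1 for t, p in pairs if t == family and p != family)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (hypothetical): three Zeus samples and two non-Zeus samples.
truth = ["zeus", "zeus", "zeus", "other", "other"]
predicted = ["zeus", "zeus", "other", "zeus", "other"]
precision, recall = precision_recall(truth, predicted, "zeus")
print(f"precision={precision:.3f} recall={recall:.3f}")
```

With these toy labels there are two true positives, one false positive, and one false negative, so both values come out to two thirds.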

Journal Article
01 Jan 2013-Scopus
TL;DR: It is shown that artifacts like file system, registry, and network features can be used to identify distinct malware families with high accuracy - in some cases as high as 95 percent.
Abstract: Malware family classification is an age-old problem that many Anti-Virus (AV) companies have tackled. There are two common techniques used for classification: signature-based and behavior-based. Signature-based classification uses a common sequence of bytes that appears in the binary code to identify and detect a family of malware. Behavior-based classification uses artifacts created by malware during execution for identification. In this paper we report on a unique dataset we obtained from our operations and classified with several machine learning techniques using the behavior-based approach. The main class of malware we are interested in classifying is the popular Zeus malware. For its classification we identify 65 features that are unique and robust for identifying malware families. We show that artifacts like file system, registry, and network features can be used to identify distinct malware families with high accuracy - in some cases as high as 95 percent.

63 citations


Cited by
Journal ArticleDOI


08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i —the square root of minus one, which seems an odd beast at that time—an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Book ChapterDOI
19 Sep 2016
TL;DR: This work describes AVclass, an automatic labeling tool that, given the AV labels for a potentially massive number of samples, outputs the most likely family names for each sample, and that implements novel automatic techniques to address 3 key challenges: normalization, removal of generic tokens, and alias detection.
Abstract: Labeling a malicious executable as a variant of a known family is important for security applications such as triage and lineage, and for building reference datasets that are in turn used for evaluating malware clustering and training malware classification approaches. Oftentimes, such labeling is based on labels output by antivirus engines. While AV labels are well known to be inconsistent, there is often no other information available for labeling, so security analysts keep relying on them. However, current approaches for extracting family information from AV labels are manual and inaccurate. In this work, we describe AVclass, an automatic labeling tool that, given the AV labels for a potentially massive number of samples, outputs the most likely family names for each sample. AVclass implements novel automatic techniques to address 3 key challenges: normalization, removal of generic tokens, and alias detection. We have evaluated AVclass on 10 datasets comprising 8.9 M samples, larger than any dataset used by malware clustering and classification works. AVclass leverages labels from any AV engine, e.g., all 99 AV engines seen in VirusTotal, the largest engine set in the literature. AVclass’s clustering achieves F1 measures up to 93.9 on labeled datasets, and clusters are labeled with fine-grained family names commonly used by the AV vendors. We release AVclass to the community.

351 citations
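
The three challenges named in the AVclass abstract above (normalization, removal of generic tokens, and alias detection) can be illustrated with a small plurality vote over AV labels. This is a minimal sketch under stated assumptions, not the AVclass implementation: the token list, the alias map, the engine names, and the function names are all hypothetical and chosen only for this example.

```python
import re
from collections import Counter

# Hypothetical lists for illustration only; not taken from AVclass.
GENERIC_TOKENS = {"trojan", "win32", "generic", "malware", "agent", "variant"}
ALIASES = {"zbot": "zeus"}


def normalize(label):
    """Lowercase a raw AV label and split it into alphanumeric tokens."""
    return [tok for tok in re.split(r"[^a-z0-9]+", label.lower()) if tok]


def family_tokens(label):
    """Drop generic, short, and numeric tokens, then map aliases to one name."""
    kept = []
    for tok in normalize(label):
        if tok in GENERIC_TOKENS or len(tok) < 4 or tok.isdigit():
            continue
        kept.append(ALIASES.get(tok, tok))
    return kept


def likely_family(av_labels):
    """Return the family token named by the most engines, or None if none remain."""
    votes = Counter()
    for label in av_labels.values():
        # Each engine votes at most once per token.
        votes.update(set(family_tokens(label)))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical engine names and labels.
labels = {
    "EngineA": "Trojan.Win32.Zbot.A",
    "EngineB": "Win32/Zeus.Gen",
    "EngineC": "Generic.Malware.12345",
}
print(likely_family(labels))
```

In this toy input two engines agree on the same family once `zbot` is mapped to `zeus`, while the third engine contributes only generic tokens and therefore casts no vote.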

Book ChapterDOI
06 Jul 2017
TL;DR: This work uses existing anti-virus scan results and automation techniques in categorizing a large Android malware dataset into 135 varieties which belong to 71 malware families, and presents detailed documentation of the process used in creating the dataset, including the guidelines for the manual analysis.
Abstract: To build effective malware analysis techniques and to evaluate new detection tools, up-to-date datasets reflecting the current Android malware landscape are essential. For such datasets to be maximally useful, they need to contain reliable and complete information on malware’s behaviors and techniques used in the malicious activities. Such a dataset shall also provide a comprehensive coverage of a large number of types of malware. The Android Malware Genome created circa 2011 has been the only well-labeled and widely studied dataset the research community had easy access to. (As of 12/21/2015 the Genome authors have stopped supporting the dataset sharing due to resource limitation.) But not only is it outdated and no longer represents the current Android malware landscape, it also does not provide as detailed information on malware’s behaviors as needed for research. Thus it is urgent to create a high-quality dataset for Android malware. While existing information sources such as VirusTotal are useful, to obtain the accurate and detailed information for malware behaviors, deep manual analysis is indispensable. In this work we present our approach to preparing a large Android malware dataset for the research community. We leverage existing anti-virus scan results and automation techniques in categorizing our large dataset (containing 24,650 malware app samples) into 135 varieties (based on malware behavioral semantics) which belong to 71 malware families. For each variety, we select three samples as representatives, for a total of 405 malware samples, to conduct in-depth manual analysis. Based on the manual analysis result we generate detailed descriptions of each malware variety’s behaviors and include them in our dataset. We also report our observations on the current landscape of Android malware as depicted in the dataset. Furthermore, we present detailed documentation of the process used in creating the dataset, including the guidelines for the manual analysis. We make our Android malware dataset available to the research community.

342 citations

Journal ArticleDOI
TL;DR: This survey aims at providing an overview on the way machine learning has been used so far in the context of malware analysis in Windows environments, i.e. for the analysis of Portable Executables.

316 citations

Journal ArticleDOI
TL;DR: This survey aims at providing a systematic and detailed overview of machine learning techniques for malware detection, with special emphasis on deep learning approaches.

291 citations