scispace - formally typeset
Search or ask a question

Showing papers by "V. Kamakoti published in 2022"


Proceedings ArticleDOI
17 Oct 2022
TL;DR: RaDaR is an open real-world dataset for run-time behavioral analysis of Windows malware and provides a multi-perspective data collection and labeling of malware activity, to enable an unbiased comparison of different solutions and foster multiple verticals in malware research.
Abstract: Artificial Intelligence techniques on malware run-time behavior have emerged as a promising tool in the arms race against sophisticated and stealthy cyber-attacks. While data of malware run-time features are critical for research and benchmark comparisons, unfortunately, there is a dearth of real-world datasets due to multiple challenges to their collection. The evasive nature of malware, its dependence on connected real-world conditions to execute, and its potential repercussions pose significant challenges for executing malware in laboratory settings. Consequently, prior open datasets rely on isolated virtual sandboxes to run malware, resulting in data that is not representative of malware behavior in the wild. This paper presents RaDaR, an open real-world dataset for run-time behavioral analysis of Windows malware. RaDaR is collected by executing malware on a real-world testbed with Internet connectivity and in a timely manner, thus providing a close-to-real-world representation of malware behavior. To enable an unbiased comparison of different solutions and foster multiple verticals in malware research, RaDaR provides a multi-perspective data collection and labeling of malware activity. The multi-perspective collection provides a comprehensive view of malware activity across the network, operating system (OS), and hardware. On the other hand, the multi-perspective labeling provides four independent perspectives to analyze the same malware, including its methodology, objective, capabilities, and the information it exfiltrates. To date, RaDaR includes 7 million network packets, 11.3 million OS system call traces, and 3.3 million hardware events of 10,434 malware samples having different methodologies (3 classes) and objectives (9 classes), spread across 30 well-known malware families.

Journal ArticleDOI
TL;DR: Snoopy as mentioned in this paper is a framework that performs webpage fingerprinting for a large number of users visiting a website by predicting the variations caused by factors such as header sizes, MTU, and User Agent String that arise from the diversity in browsing contexts.
Abstract: —Internet users are vulnerable to privacy attacks despite the use of encryption. Webpage fingerprinting, an attack that analyzes encrypted traffic, can identify the webpages visited by a user. Recent research works have been successful in demonstrating webpage fingerprinting attacks on individual users, but have been unsuccessful in extending their attack for mass-surveillance. The key challenges in performing mass-scale webpage fingerprinting arises from (i) the sheer number of combinations of user behavior and preferences to account for, and; (ii) the bound on the number of website queries imposed by the defense mechanisms (e.g., DDoS defense) deployed at the website. These constraints preclude the use of conventional data-intensive ML-based techniques. In this work, we propose Snoopy, a first-of-its-kind framework, that performs webpage fingerprinting for a large number of users visiting a website. Snoopy caters to the generalization requirements of mass-surveillance while complying with a bound on the number of website accesses (finite query model) for traffic sample collection. For this, Snoopy uses a feature (i.e., sequence of encrypted resource sizes) that is either unaffected or predictably affected by different browsing contexts (OS, browser, caching, cookie settings). Snoopy uses static analysis techniques to predict the variations caused by factors such as header sizes, MTU, and User Agent String that arise from the diversity in browsing contexts. We show that Snoopy achieves ≈ 90% accuracy when evaluated on most websites, across various browsing contexts. A simple ensemble of Snoopy and an ML-based technique achieves ≈ 97% accuracy while adhering to the finite query model, in cases when Snoopy alone does not perform well.

Journal ArticleDOI
TL;DR: SUNDEW as discussed by the authors uses an ensemble of specialized predictors, each trained with a particular data source (network, OS, and hardware) and tuned for features and requirements of a specific class.
Abstract: —Malware programs are diverse, with varying objectives, functionalities, and threat levels ranging from mere pop-ups to financial losses. Consequently, their run-time footprints across the system differ, impacting the optimal data source (Network, Operating system (OS), Hardware) and features that are instrumental to malware detection. Further, the variations in threat levels of malware classes affect the user requirements for detection. Thus, the optimal tuple of (cid:104) data - source , features , user - requirements (cid:105) is different for each malware class, impacting the state-of-the-art detection solutions that are agnostic to these subtle differences. This paper presents SUNDEW, a framework to detect malware classes using their optimal tuple of (cid:104) data - source , features , user - requirements (cid:105) . SUNDEW uses an ensemble of specialized predictors, each trained with a particular data source (network, OS, and hardware) and tuned for features and requirements of a specific class. While the specialized ensemble with a holistic view across the system improves detection, aggregating the independent conflicting inferences from the different predictors is challenging. SUNDEW resolves such conflicts with a hierarchical aggregation considering the threat-level, noise in the data sources, and prior domain knowledge. We evaluate SUNDEW on a real-world dataset of over 10,000 malware samples from 8 classes. It achieves an F1-Score of one for most classes, with an average of 0.93 and a limited performance overhead of 1 . 5% .