Showing papers by "V. Kamakoti" published in 2022

Open Access

Proceedings Article

RaDaR: A Real-Word Dataset for AI powered Run-time Detection of Cyber-Attacks

Sareena Karapoola, Nikhilesh Singh, Chester Rebeiro, V. Kamakoti

17 Oct 2022

TL;DR: RaDaR is an open real-world dataset for run-time behavioral analysis of Windows malware and provides a multi-perspective data collection and labeling of malware activity, to enable an unbiased comparison of different solutions and foster multiple verticals in malware research.

Abstract: Artificial Intelligence techniques on malware run-time behavior have emerged as a promising tool in the arms race against sophisticated and stealthy cyber-attacks. While data of malware run-time features are critical for research and benchmark comparisons, unfortunately, there is a dearth of real-world datasets due to multiple challenges to their collection. The evasive nature of malware, its dependence on connected real-world conditions to execute, and its potential repercussions pose significant challenges for executing malware in laboratory settings. Consequently, prior open datasets rely on isolated virtual sandboxes to run malware, resulting in data that is not representative of malware behavior in the wild. This paper presents RaDaR, an open real-world dataset for run-time behavioral analysis of Windows malware. RaDaR is collected by executing malware on a real-world testbed with Internet connectivity and in a timely manner, thus providing a close-to-real-world representation of malware behavior. To enable an unbiased comparison of different solutions and foster multiple verticals in malware research, RaDaR provides a multi-perspective data collection and labeling of malware activity. The multi-perspective collection provides a comprehensive view of malware activity across the network, operating system (OS), and hardware. On the other hand, the multi-perspective labeling provides four independent perspectives to analyze the same malware, including its methodology, objective, capabilities, and the information it exfiltrates. To date, RaDaR includes 7 million network packets, 11.3 million OS system call traces, and 3.3 million hardware events of 10,434 malware samples having different methodologies (3 classes) and objectives (9 classes), spread across 30 well-known malware families.

Journal Article

Snoopy: A Webpage Fingerprinting Framework with Finite Query Model for Mass-Surveillance

Gargi Mitra, Prasanna Karthik Vairam, Sandip Saha, Nitin Chandrachoodan, V. Kamakoti

30 May 2022-IEEE Transactions on Dependable and Secure Computing

TL;DR: Snoopy is a framework that performs webpage fingerprinting for a large number of users visiting a website by predicting the variations caused by factors such as header sizes, MTU, and User Agent String that arise from the diversity in browsing contexts.

Abstract: Internet users are vulnerable to privacy attacks despite the use of encryption. Webpage fingerprinting, an attack that analyzes encrypted traffic, can identify the webpages visited by a user. Recent research works have been successful in demonstrating webpage fingerprinting attacks on individual users, but have been unsuccessful in extending their attack for mass-surveillance. The key challenges in performing mass-scale webpage fingerprinting arise from (i) the sheer number of combinations of user behavior and preferences to account for, and (ii) the bound on the number of website queries imposed by the defense mechanisms (e.g., DDoS defense) deployed at the website. These constraints preclude the use of conventional data-intensive ML-based techniques. In this work, we propose Snoopy, a first-of-its-kind framework that performs webpage fingerprinting for a large number of users visiting a website. Snoopy caters to the generalization requirements of mass-surveillance while complying with a bound on the number of website accesses (finite query model) for traffic sample collection. For this, Snoopy uses a feature (i.e., sequence of encrypted resource sizes) that is either unaffected or predictably affected by different browsing contexts (OS, browser, caching, cookie settings). Snoopy uses static analysis techniques to predict the variations caused by factors such as header sizes, MTU, and User Agent String that arise from the diversity in browsing contexts. We show that Snoopy achieves ≈ 90% accuracy when evaluated on most websites, across various browsing contexts. A simple ensemble of Snoopy and an ML-based technique achieves ≈ 97% accuracy while adhering to the finite query model, in cases when Snoopy alone does not perform well.
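The idea of a size-sequence feature that is "predictably affected" by browsing context can be illustrated with a minimal sketch. This is not Snoopy's actual algorithm: the function names, the page profiles, and the header-overhead range below are all hypothetical, and the sketch assumes each observed encrypted resource size equals a known body size plus a context-dependent header overhead that falls within a predicted range.

```python
from typing import Dict, List, Optional, Tuple


def matches(observed: List[int], body_sizes: List[int],
            header_range: Tuple[int, int]) -> bool:
    """Return True if every observed size equals a profiled body size
    plus a header overhead inside the predicted [lo, hi] range."""
    lo, hi = header_range
    if len(observed) != len(body_sizes):
        return False
    return all(b + lo <= o <= b + hi for o, b in zip(observed, body_sizes))


def identify_page(observed: List[int], profiles: Dict[str, List[int]],
                  header_range: Tuple[int, int]) -> Optional[str]:
    """Return the single page whose profile explains the observed
    sequence, or None if zero or several pages match."""
    candidates = [name for name, sizes in profiles.items()
                  if matches(observed, sizes, header_range)]
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical profiles: resource body sizes in bytes, in load order.
PROFILES = {"home": [1200, 5400, 800], "login": [1200, 300]}
# Hypothetical predicted header overhead (bytes) across browsing contexts.
HEADER_RANGE = (200, 260)

result = identify_page([1430, 5620, 1050], PROFILES, HEADER_RANGE)
```

Because the overhead range is predicted rather than learned from many traffic samples, a scheme of this shape needs only one profile per page, which is the property the finite query model demands.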

Journal Article

SUNDEW: An Ensemble of Predictors for Case-Sensitive Detection of Malware

Sareena Karapoola, Nikhilesh Singh, Chester Rebeiro, V. Kamakoti

11 Nov 2022-arXiv.org

TL;DR: SUNDEW uses an ensemble of specialized predictors, each trained with a particular data source (network, OS, and hardware) and tuned for the features and requirements of a specific class.

Abstract: Malware programs are diverse, with varying objectives, functionalities, and threat levels ranging from mere pop-ups to financial losses. Consequently, their run-time footprints across the system differ, impacting the optimal data source (Network, Operating system (OS), Hardware) and features that are instrumental to malware detection. Further, the variations in threat levels of malware classes affect the user requirements for detection. Thus, the optimal tuple of ⟨data-source, features, user-requirements⟩ is different for each malware class, impacting the state-of-the-art detection solutions that are agnostic to these subtle differences. This paper presents SUNDEW, a framework to detect malware classes using their optimal tuple of ⟨data-source, features, user-requirements⟩. SUNDEW uses an ensemble of specialized predictors, each trained with a particular data source (network, OS, and hardware) and tuned for features and requirements of a specific class. While the specialized ensemble with a holistic view across the system improves detection, aggregating the independent conflicting inferences from the different predictors is challenging. SUNDEW resolves such conflicts with a hierarchical aggregation considering the threat-level, noise in the data sources, and prior domain knowledge. We evaluate SUNDEW on a real-world dataset of over 10,000 malware samples from 8 classes. It achieves an F1-Score of one for most classes, with an average of 0.93 and a limited performance overhead of 1.5%.
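One way to picture a threat-aware aggregation over conflicting per-source predictors is the sketch below. It is an illustration only, not SUNDEW's published scheme: the class names, threat ordering, source weights, and thresholds are hypothetical placeholders standing in for "threat-level", "noise in the data sources", and "user requirements" respectively.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Inference:
    malware_class: str   # class the specialized predictor flags
    source: str          # "network", "os", or "hardware"
    confidence: float    # predictor confidence in [0, 1]


# Hypothetical values, chosen only for illustration.
THREAT_ORDER = ["ransomware", "spyware", "adware"]             # high to low threat
SOURCE_WEIGHT = {"network": 0.9, "os": 1.0, "hardware": 0.7}   # noisier source, lower weight
THRESHOLD = {"ransomware": 0.5, "spyware": 0.6, "adware": 0.8} # per-class requirement


def aggregate(inferences: List[Inference]) -> Optional[str]:
    """Walk classes from highest to lowest threat and return the first
    class whose best noise-weighted confidence clears its threshold."""
    for cls in THREAT_ORDER:
        scores = [inf.confidence * SOURCE_WEIGHT[inf.source]
                  for inf in inferences if inf.malware_class == cls]
        if scores and max(scores) >= THRESHOLD[cls]:
            return cls
    return None  # no predictor was confident enough


verdict = aggregate([
    Inference("adware", "network", 0.95),
    Inference("ransomware", "hardware", 0.8),
])
```

Here the high-threat class wins even though the low-threat predictor reports higher raw confidence, which reflects the design choice of ranking conflicts by threat level before confidence.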
