Justin Ma

Researcher at University of California, San Diego

Publications -  11
Citations -  2431

Justin Ma is an academic researcher from the University of California, San Diego. The author has contributed to research on the topics Semantic URL and The Internet. The author has an h-index of 10 and has co-authored 11 publications receiving 2281 citations. Previous affiliations of Justin Ma include the University of California, Berkeley.

Papers
Proceedings Article

Beyond blacklists: learning to detect malicious web sites from suspicious URLs

TL;DR: This paper describes an approach to detecting malicious Web sites based on automated URL classification, using statistical methods to discover the tell-tale lexical and host-based properties of malicious Web site URLs.
Proceedings Article

Identifying suspicious URLs: an application of large-scale online learning

TL;DR: This paper demonstrates that recently developed online algorithms can be as accurate as batch techniques, achieving classification accuracies up to 99% over a balanced data set.
Journal Article

Scalability, fidelity, and containment in the Potemkin virtual honeyfarm

TL;DR: This paper presents a prototype honeyfarm system, called Potemkin, that exploits virtual machines, aggressive memory sharing, and late binding of resources to improve honeypot scalability while still closely emulating the execution behavior of individual Internet hosts.
Journal Article

Learning to detect malicious URLs

TL;DR: This article develops a real-time system for gathering URL features and uses it to train an online classifier that detects malicious Web sites with 99% accuracy over a balanced dataset.
Proceedings Article

Unexpected means of protocol inference

TL;DR: This work analyzes three alternative mechanisms using statistical and structural content models for automatically identifying traffic that uses the same application-layer protocol, relying solely on flow content, and evaluates each mechanism's classification performance using real-world traffic traces from multiple sites.