scispace - formally typeset
Proceedings ArticleDOI

Scalable Graph-based Bug Search for Firmware Images

Reads0
Chats0
TLDR
A new bug search scheme is proposed which addresses the scalability challenge in existing cross-platform bug search techniques and further improves search accuracy, and implemented a bug search engine, Genius, and compared it with state-of-art bug search approaches.
Abstract
Because of rampant security breaches in IoT devices, searching vulnerabilities in massive IoT ecosystems is more crucial than ever. Recent studies have demonstrated that control-flow graph (CFG) based bug search techniques can be effective and accurate in IoT devices across different architectures. However, these CFG-based bug search approaches are far from being scalable to handle an enormous amount of IoT devices in the wild, due to their expensive graph matching overhead. Inspired by rich experience in image and video search, we propose a new bug search scheme which addresses the scalability challenge in existing cross-platform bug search techniques and further improves search accuracy. Unlike existing techniques that directly conduct searches based upon raw features (CFGs) from the binary code, we convert the CFGs into high-level numeric feature vectors. Compared with the CFG feature, high-level numeric feature vectors are more robust to code variation across different architectures, and can easily achieve realtime search by using state-of-the-art hashing techniques. We have implemented a bug search engine, Genius, and compared it with state-of-art bug search approaches. Experimental results show that Genius outperforms baseline approaches for various query loads in terms of speed and accuracy. We also evaluated Genius on a real-world dataset of 33,045 devices which was collected from public sources and our system. The experiment showed that Genius can finish a search within 1 second on average when performed over 8,126 firmware images of 420,558,702 functions. By only looking at the top 50 candidates in the search result, we found 38 potentially vulnerable firmware images across 5 vendors, and confirmed 23 of them by our manual analysis. We also found that it took only 0.1 seconds on average to finish searching for all 154 vulnerabilities in two latest commercial firmware images from D-LINK. 103 of them are potentially vulnerable in these images, and 16 of them were confirmed.

read more

Citations
More filters
Proceedings ArticleDOI

IoT Sentinel Demo: Automated Device-Type Identification for Security Enforcement in IoT

TL;DR: IoT Sentinel is presented, a system capable of automatically identifying the types of devices being connected to an IoT network and enabling enforcement of rules for constraining the communications of vulnerable devices so as to minimize damage resulting from their compromise.
Journal ArticleDOI

Demystifying IoT Security: An Exhaustive Survey on IoT Vulnerabilities and a First Empirical Look on Internet-Scale IoT Exploitations

TL;DR: A unique taxonomy is provided, which sheds the light on IoT vulnerabilities, their attack vectors, impacts on numerous security objectives, attacks which exploit such vulnerabilities, corresponding remediation methodologies and currently offered operational cyber security capabilities to infer and monitor such weaknesses.
Proceedings ArticleDOI

Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection

TL;DR: This work proposes a novel neural network-based approach to compute the embedding, i.e., a numeric vector, based on the control flow graph of each binary function, then shows that Gemini outperforms the state-of-the-art approaches by large margins with respect to similarity detection accuracy.
Proceedings Article

Graph Matching Networks for Learning the Similarity of Graph Structured Objects

TL;DR: This paper proposes a novel Graph Matching Network model that, given a pair of graphs as input, computes a similarity score between them by jointly reasoning on the pair through a new cross-graph attention-based matching mechanism.
Proceedings ArticleDOI

Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization

TL;DR: An assembly code representation learning model that can find and incorporate rich semantic relationships among tokens appearing in assembly code and significantly outperforms existing methods against changes introduced by obfuscation and optimizations is developed.
References
More filters
Book

Networks: An Introduction

Mark Newman
TL;DR: This book brings together for the first time the most important breakthroughs in each of these fields and presents them in a coherent fashion, highlighting the strong interconnections between work in different areas.
Proceedings Article

On Spectral Clustering: Analysis and an algorithm

TL;DR: A simple spectral clustering algorithm that can be implemented using a few lines of Matlab is presented, and tools from matrix perturbation theory are used to analyze the algorithm, and give conditions under which it can be expected to do well.
Proceedings ArticleDOI

Video Google: a text retrieval approach to object matching in videos

TL;DR: An approach to object and scene retrieval which searches for and localizes all the occurrences of a user outlined object in a video, represented by a set of viewpoint invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion.
Proceedings Article

A comparison of event models for naive bayes text classification

TL;DR: It is found that the multi-variate Bernoulli performs well with small vocabulary sizes, but that the multinomial performs usually performs even better at larger vocabulary sizes--providing on average a 27% reduction in error over the multi -variateBernoulli model at any vocabulary size.
Journal ArticleDOI

Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions

TL;DR: An algorithm for the c-approximate nearest neighbor problem in a d-dimensional Euclidean space, achieving query time of O(dn 1c2/+o(1)) and space O(DN + n1+1c2 + o(1) + 1/c2), which almost matches the lower bound for hashing-based algorithm recently obtained.
Related Papers (5)