This thesis explores the use of graph embedding methods for compiler provenance identification. Graph embedding algorithms are widely used to analyze, compare, or distinguish networks and similar structures that are too large to represent visually. Using graph embeddings to address compiler provenance identification is a novel approach: our method applies embedding algorithms to the control flow graphs of binaries. In this document, we explore two aspects of graph embedding: tiered approaches, and alternative embedding representations for analysis. Our results indicate that the method has potential for compiler provenance identification. Experiments show that our approach can distinguish between individual compilers, compiler versions, and compiler flags with above-average accuracy. Future work may explore extracting the significant graph embeddings from our generated model, recreating the generalized graph from those embeddings, and identifying significant structures for manual analysis.
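As a rough illustration of what "applying an embedding algorithm to a control flow graph" can mean, the following sketch turns a CFG into a fixed-length vector using Weisfeiler-Lehman style relabelling with hashed label counts. This is a swapped-in technique chosen for brevity, not the embedding method used in this thesis; the adjacency-dict CFG format, the out-degree starting labels, and the function names `wl_embedding` and `_short_hash` are all hypothetical.

```python
import hashlib
from collections import Counter


def _short_hash(text: str) -> str:
    # Deterministic across runs, unlike Python's built-in hash() on strings.
    return hashlib.sha256(text.encode()).hexdigest()[:16]


def wl_embedding(cfg, iterations=2, dim=16):
    """Embed a CFG given as {block: [successor blocks]} into a vector of length dim.

    Hypothetical sketch: Weisfeiler-Lehman relabelling over successor edges,
    with label counts folded into dim buckets (the "hashing trick").
    Every successor must also appear as a key in cfg.
    """
    # Initial label of each basic block: its out-degree (an assumed choice).
    labels = {block: str(len(succs)) for block, succs in cfg.items()}
    counts = Counter(labels.values())
    for _ in range(iterations):
        # A block's new label combines its old label with the sorted labels
        # of its successors, so structural information spreads each round.
        labels = {
            block: _short_hash(
                labels[block] + "|" + ",".join(sorted(labels[s] for s in succs))
            )
            for block, succs in cfg.items()
        }
        counts.update(labels.values())
    # Fold the label histogram into a fixed number of buckets.
    vec = [0] * dim
    for label, count in counts.items():
        bucket = int(_short_hash(label), 16) % dim
        vec[bucket] += count
    return vec


if __name__ == "__main__":
    # A tiny CFG: an entry block, a self-looping block, and an exit block.
    cfg_a = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
    # The same shape with different block names.
    cfg_b = {"b0": ["b1"], "b1": ["b1", "b2"], "b2": []}
    print(wl_embedding(cfg_a) == wl_embedding(cfg_b))  # True
```

Because the labels depend only on graph structure, CFGs with the same shape map to the same vector regardless of block names. Vectors of this kind could, under these assumptions, be passed to an ordinary classifier to separate binaries by compiler, version, or flags.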
University of Minnesota M.S. thesis. June 2019. Major: Computer Science. Advisor: Peter Peterson. 1 computer file (PDF); vii, 42 pages.
Straumann, Aleksandar N.
Graph Embeddings for the extraction of Compiler Provenance features.
Retrieved from the University of Minnesota Digital Conservancy,
Content distributed via the University of Minnesota's Digital Conservancy may be subject to additional license and use restrictions applied by the depositor.