usenix conference policies
Reverse-Engineering Instruction Encodings
Binary tools such as disassemblers, just-in-time compilers, and executable code rewriters need to have an explicit representation of how machine instructions are encoded. Unfortunately, writing encodings for an entire instruction set by hand is both tedious and error-prone. We describe DERIVE, a tool that extracts bit-level instruction encoding information from assemblers. The user provides DERIVE with assembly-level information about various instructions. DERIVE automatically reverse-engineers the encodings for those instructions from an assembler by feeding it permutations of instructions and analyzing the resulting machine code. DERIVE solves the entire MIPS, SPARC, Alpha, and PowerPC instruction sets, and almost all of the ARM and x86 instruction sets. Its output consists of C declarations that can be used by binary tools. To demonstrate the utility of DERIVE, we have built a code emitter generator that takes DERIVE’s output and produces C macros for code emission, which we have then used to rewrite a Java JIT backend.
author = {Wilson Hsieh and Dawson Engler and Godmar Back},
title = {{Reverse-Engineering} Instruction Encodings},
booktitle = {2001 USENIX Annual Technical Conference (USENIX ATC 01)},
year = {2001},
address = {Boston, MA},
url = {https://www.usenix.org/conference/2001-usenix-annual-technical-conference/reverse-engineering-instruction-encodings},
publisher = {USENIX Association},
month = jun
}
connect with us