Specific instructions on using features of opdis and libopdis.
The supported architectures can be displayed with the --list-architectures option:
bash$ opdis --list-architectures
i386
i386:x86-64
i8086
i386:intel
i386:x86-64:intel
l1om
l1om:intel
plugin
Default architecture is 'i386'
The architecture can then be passed to the -a option:
opdis -a i386:x86-64
The available disassembler options for all supported architectures can be displayed with the --list-disassembler-options option:
bash$ opdis --list-disassembler-options
The following i386/x86-64 specific disassembler options are supported for use
with the -M switch (multiple options should be separated by commas):
  x86-64          Disassemble in 64bit mode
  i386            Disassemble in 32bit mode
  i8086           Disassemble in 16bit mode
  att             Display instruction in AT&T syntax
  intel           Display instruction in Intel syntax
  att-mnemonic    Display instruction in AT&T mnemonic
  intel-mnemonic  Display instruction in Intel mnemonic
  addr64          Assume 64bit address size
  addr32          Assume 32bit address size
  addr16          Assume 16bit address size
  data32          Assume 32bit data size
  data16          Assume 16bit data size
  suffix          Always display instruction suffix in AT&T syntax
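The disassembler options can then be passed to the -O option, delimited by commas. The listing above is generated by libopcodes and so refers to the objdump -M switch; with opdis, use -O instead: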
opdis -O 'intel,x86-64'
The general format for specifying a memory map is:
[target][:offset]@vma[+size]
The :, @, and + characters are delimiters indicating that what follows is an offset, a vma, or a size, respectively. These components may appear in any order, but the target must always be first as it is not delimited. Note that these values are all parsed with strtoul, and so they may appear in any supported base (octal with a '0' prefix, decimal, or hexadecimal with a '0x' prefix).
The target, size, and offset components are all optional. The default value for target is 1, the ID of the first target. The default value for offset is 0. The default value for size is the size of the target.
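The parsing rules above can be sketched in a few lines of C. This is only an illustration of the format as described here, not the parser that opdis itself uses: the struct map_spec type and the parse_number and parse_map_spec functions are hypothetical names, a size of 0 stands in for "the size of the target", and the sketch assumes the VMA is the one required component.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for a parsed memory-map spec (not a libopdis type). */
struct map_spec {
    unsigned long target;  /* 1-based target ID; defaults to 1 */
    unsigned long offset;  /* defaults to 0 */
    unsigned long vma;     /* assumed to be required */
    unsigned long size;    /* 0 here means "use the size of the target" */
};

/* Parse one number with strtoul (base 0, so 0x and 0 prefixes work).
 * Returns 1 and advances *p if a number was present, 0 otherwise. */
static int parse_number(const char **p, unsigned long *val)
{
    char *end;

    if (!isdigit((unsigned char) **p)) {
        return 0;
    }
    *val = strtoul(*p, &end, 0);
    *p = end;
    return 1;
}

/* Parse "[target][:offset]@vma[+size]". Returns 1 on success, 0 on error. */
static int parse_map_spec(const char *spec, struct map_spec *out)
{
    const char *p = spec;
    int have_vma = 0;

    assert(spec != NULL && out != NULL);
    out->target = 1;
    out->offset = 0;
    out->vma = 0;
    out->size = 0;

    /* The target has no delimiter, so it can only appear first. */
    if (*p != '\0' && strchr(":@+", *p) == NULL) {
        if (!parse_number(&p, &out->target)) {
            return 0;
        }
    }

    /* The delimited components may follow in any order. */
    while (*p != '\0') {
        char delim = *p++;

        switch (delim) {
        case ':':  /* an empty offset keeps the default of 0 */
            parse_number(&p, &out->offset);
            break;
        case '@':
            if (!parse_number(&p, &out->vma)) {
                return 0;
            }
            have_vma = 1;
            break;
        case '+':  /* an empty size keeps the default */
            parse_number(&p, &out->size);
            break;
        default:
            return 0;
        }
    }
    return have_vma;
}
```

Because the numbers are read with base 0, a spec such as 1@400000 yields the decimal VMA 400000, not 0x400000; the 0x prefix is needed for hexadecimal values.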
Memory maps are generally used for the following purposes:
Examples
Map the entirety of a.out to VMA 0x400000:
opdis -m @0x400000 --dry-run a.out
Map the entirety of a.out to VMA 0x400000 and the entirety of libc.so.6 to 0x7f626a934000:
opdis -m @0x400000 -m 2@0x7f626a934000 --dry-run a.out /usr/lib/libc.so.6
Map offset 0x1000 of a.out to VMA 0x401000:
opdis -m :0x1000@0x401000 --dry-run a.out
Map buffer 1 to 0x400000 and offset 0x1000 of a.out to VMA 0x401000:
opdis -m 1@0x400000 -m 2:0x1000@0x401000 -b '7f 45 4c 46 02 01 01 00 00' --dry-run a.out
Make buffer 1 and buffer 2 contiguous in memory:
opdis -m 1@0 -m 2@4 -b '2e 2e 74 50' -b '89 e1 31 d2' --dry-run
Make buffer 2 and buffer 1 contiguous in memory:
opdis -m 2@0 -m 1@4 -b '2e 2e 74 50' -b '89 e1 31 d2' --dry-run
Use ldd to determine the load address of a shared library:
bash$ ldd a.out
    linux-vdso.so.1 => (0x00007fffc74cf000)
    libssl.so.0.9.8 => /lib/libssl.so.0.9.8 (0x00007f0340274000)
    libc.so.6 => /lib/libc.so.6 (0x00007f033ff05000)
    libcrypto.so.0.9.8 => /lib/libcrypto.so.0.9.8 (0x00007f033fb7e000)
    libdl.so.2 => /lib/libdl.so.2 (0x00007f033f97a000)
    libz.so.1 => /lib/libz.so.1 (0x00007f033f763000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f03404c2000)
bash$ opdis -m 2@0x7f0340274000 --dry-run a.out /usr/lib/libssl.so
Use readelf to get the load address of the .text section for linear disassembly:
bash$ readelf a.out -S | grep -A 1 text
[14] .text PROGBITS 00000000004005d0 000005d0
00000000000001f8 0000000000000000 AX 0 0 16
bash$ opdis -m :0x5d0@0x4005d0+0x1f8 -l @0x4005d0 a.out
Use readelf to get the executable program segment and program entry point for control-flow disassembly:
bash$ readelf a.out -l | grep -A 1 LOAD | grep -B 1 'R E'
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x000000000000087c 0x000000000000087c R E 200000
bash$ readelf a.out -h | grep Entry
Entry point address: 0x4005d0
bash$ opdis -m :@0x400000+0x87C -c @0x4005d0 a.out
The default handler callback (opdis_default_handler) supports tracking of visited addresses. When the default handler encounters a VMA that has previously been disassembled, it halts disassembly and prevents the display callback from being invoked. This is useful for applications that write the instructions directly to output in their display callback.
Visited-address tracking is disabled by default in order to speed up disassembly, but it can be enabled by allocating an opdis_vma_tree_t in the opdis_t visited_addr field:
o->visited_addr = opdis_vma_tree_init();
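To make the halting behaviour concrete, here is a minimal, self-contained sketch of visited-address tracking. It does not use libopdis: struct visited_set is a hypothetical stand-in for opdis_vma_tree_t (a small sorted array, not a tree), and visited_add and count_displayed are illustrative names. The sketch only models the rule described above: once a VMA repeats, disassembly stops and nothing further is displayed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VISITED 64  /* arbitrary capacity chosen for this sketch */

/* Hypothetical stand-in for opdis_vma_tree_t: a small sorted array. */
struct visited_set {
    unsigned long addr[MAX_VISITED];
    size_t count;
};

/* Record vma. Returns 1 if it was newly added, 0 if it was already
 * present (or if the set is full, which this sketch also treats as halt). */
static int visited_add(struct visited_set *set, unsigned long vma)
{
    size_t lo = 0, hi, i;

    assert(set != NULL);
    hi = set->count;

    /* binary search for the first element that is >= vma */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (set->addr[mid] < vma) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (lo < set->count && set->addr[lo] == vma) {
        return 0;  /* already visited */
    }
    if (set->count == MAX_VISITED) {
        return 0;  /* out of room */
    }

    /* shift the tail up by one and insert in sorted position */
    for (i = set->count; i > lo; i--) {
        set->addr[i] = set->addr[i - 1];
    }
    set->addr[lo] = vma;
    set->count++;
    return 1;
}

/* Walk the VMAs in the order a disassembler would reach them and count
 * how many would be displayed before a revisit halts the walk. */
static size_t count_displayed(const unsigned long *vmas, size_t n)
{
    struct visited_set set = { { 0 }, 0 };
    size_t i;

    for (i = 0; i < n; i++) {
        if (!visited_add(&set, vmas[i])) {
            break;  /* halt: the display step is skipped for this VMA */
        }
    }
    return i;
}
```

For the sequence 0x1000, 0x1002, 0x1005, 0x1000, 0x1008 (a jump back to the start), count_displayed stops at the fourth entry, so three instructions are displayed and 0x1008 is never reached.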
The display callback can be replaced with a function that takes an opdis_insn_t and an optional argument parameter. The only restriction is that the callback must duplicate the opdis_insn_t (using opdis_insn_dupe) when storing it for later use.
This example adds an instruction to an opdis_insn_tree_t and prints a status message:
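static void my_display( const opdis_insn_t * insn, void * arg ) {
    opdis_insn_tree_t tree = (opdis_insn_tree_t) arg;

    /* insn is only valid during the callback, so store a copy */
    opdis_insn_t * i = opdis_insn_dupe( insn );
    opdis_insn_tree_add( tree, i );

    /* cast explicitly so the format specifiers match on any platform */
    printf( "%lu bytes at offset %lX\n", (unsigned long) insn->size,
            (unsigned long) insn->offset );
}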
The following code demonstrates the use of this display callback:
/* allocate an insn tree which will free the insns when deleted */
opdis_insn_tree_t tree = opdis_insn_tree_init( 1 );
opdis_set_display( opdis, my_display, (void *) tree );
/* ... code to operate on tree, e.g. opdis_insn_tree_foreach ... */
opdis_insn_tree_free( tree );
The handler callback is used to determine if disassembly should continue after the current instruction has been processed.
This example checks if a specific instruction mnemonic is encountered; if so, the handler displays the instruction and halts disassembly. The default handler is chained in order to halt on invalid instructions.
struct HANDLER_ARG {
    char halt_mnem[32];
    opdis_t opdis;
};

static int my_handler( const opdis_insn_t * insn, void * arg ) {
    struct HANDLER_ARG * harg = (struct HANDLER_ARG *) arg;

    /* halt disassembly if specified mnemonic is encountered */
    if ( (insn->status & opdis_decode_mnem) &&
         ! strcmp(harg->halt_mnem, insn->mnemonic) ) {
        /* display instruction before halting */
        harg->opdis->display( insn, harg->opdis->display_arg );
        return 0;
    }

    /* invoke default handler to check for invalid and visited addresses */
    return opdis_default_handler( insn, harg->opdis );
}
The following code demonstrates the use of this handler callback:
struct HANDLER_ARG handler_arg;
strncpy( handler_arg.halt_mnem, "ret", 32 );
handler_arg.opdis = o;
opdis_set_handler( o, my_handler, (void *) &handler_arg );
The resolver callback is used to determine the VMA of the branch target of the current instruction. Applications that provide an emulator or VM will need to chain or replace this callback in order to resolve branch targets that are stored in registers or at a memory location.
This example assumes a flat address space and returns the offset component of an absolute address operand:
static opdis_vma_t my_resolver( const opdis_insn_t * insn, void * arg ) {
    /* return the offset component of segment:offset operands */
    if ( (insn->status & opdis_decode_op_flags) &&
         insn->target &&
         insn->target->category == opdis_op_cat_absolute ) {
        return (opdis_vma_t) insn->target->value.abs.offset;
    }

    /* invoke the default resolver to handle immediate values */
    return opdis_default_resolver( insn, NULL );
}
The following code demonstrates the use of this resolver callback:
opdis_set_resolver( o, my_resolver, NULL );
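For the emulator case mentioned at the start of this section, the resolution logic might look like the following sketch. It is deliberately independent of libopdis: every type and name in it (struct sketch_vm, struct sketch_target, sketch_resolve, SKETCH_INVALID_ADDR) is hypothetical. In a real resolver callback the emulator state would presumably be passed through the void * arg parameter, and the target would be read from the instruction's operand metadata.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Everything below is hypothetical scaffolding, not part of libopdis. */
#define SKETCH_INVALID_ADDR (~0UL)

/* One emulated register: its name and current value. */
struct sketch_reg {
    const char *name;
    unsigned long value;
};

/* Minimal emulator state: a register file. */
struct sketch_vm {
    const struct sketch_reg *regs;
    size_t num_regs;
};

enum sketch_target_kind {
    SKETCH_TARGET_IMMEDIATE,  /* e.g. jmp 0x400500 */
    SKETCH_TARGET_REGISTER    /* e.g. jmp *%rax */
};

struct sketch_target {
    enum sketch_target_kind kind;
    unsigned long immediate;  /* used when kind is IMMEDIATE */
    const char *reg_name;     /* used when kind is REGISTER */
};

/* Resolve a branch target to an address. Immediate targets are returned
 * directly; register targets are looked up in the emulator state.
 * Returns SKETCH_INVALID_ADDR if the target cannot be resolved. */
static unsigned long sketch_resolve(const struct sketch_target *target,
                                    const struct sketch_vm *vm)
{
    size_t i;

    assert(target != NULL);
    if (target->kind == SKETCH_TARGET_IMMEDIATE) {
        return target->immediate;
    }
    if (vm == NULL || target->reg_name == NULL) {
        return SKETCH_INVALID_ADDR;
    }
    for (i = 0; i < vm->num_regs; i++) {
        if (strcmp(vm->regs[i].name, target->reg_name) == 0) {
            return vm->regs[i].value;
        }
    }
    return SKETCH_INVALID_ADDR;
}
```

The split mirrors the chaining pattern used by my_resolver: handle the case that needs extra state first, then fall back to the simple case (here, immediates) that a default resolver already covers.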
The decoder callback is used to generate an opdis_insn_t from the list of strings emitted by libopcodes. An incomplete decoder callback (such as the default) will result in metadata not being generated for the instruction (including the list of operands). Control-flow disassembly relies on this metadata, making a functional decoder callback essential to most applications.
This example detects x86 control-flow instructions after invoking opdis_default_decoder to do the basic instruction decoding. It does not chain either of the builtin x86 decoders, and no decoding is done for non-control-flow instructions.
static const char * jcc_insns[] = {
    "ja", "jae", "jb", "jbe", "jc", "jcxz", "jecxz", "jrcxz",
    "je", "jg", "jge", "jl", "jle", "jna", "jnae", "jnb",
    "jnbe", "jnc", "jne", "jng", "jnge", "jnl", "jnle", "jno",
    "jnp", "jns", "jnz", "jo", "jp", "jpe", "js", "jz"
};

static const char * call_insns[] = { "lcall", "call", "callq" };

static const char * jmp_insns[] = { "jmp", "ljmp", "jmpq" };

static const char * ret_insns[] = {
    "ret", "lret", "retq", "retf", "iret", "iretd", "iretq"
};

static void handle_target( opdis_insn_t * out, const char * item ) {
    opdis_op_t * op = out->operands[0];
    op->category = opdis_op_cat_unknown;
    op->flags = opdis_op_flag_x;
    opdis_op_set_ascii( op, item );
    out->target = out->operands[0];
}

static int decode_mnemonic( char ** items, int idx, opdis_insn_t * out ) {
    int i, num;
    const char *item = items[idx];

    /* detect JMP */
    num = (int) sizeof(jmp_insns) / sizeof(char *);
    for ( i = 0; i < num; i++ ) {
        if (! strcmp(jmp_insns[i], item) ) {
            out->category = opdis_insn_cat_cflow;
            out->flags.cflow = opdis_cflow_flag_jmp;
            handle_target( out, items[idx+1] );
            return 1;
        }
    }

    /* detect RET */
    num = (int) sizeof(ret_insns) / sizeof(char *);
    for ( i = 0; i < num; i++ ) {
        if (! strcmp(ret_insns[i], item) ) {
            out->category = opdis_insn_cat_cflow;
            out->flags.cflow = opdis_cflow_flag_ret;
            return 1;
        }
    }

    /* detect branch (call/jcc) */
    num = (int) sizeof(call_insns) / sizeof(char *);
    for ( i = 0; i < num; i++ ) {
        if (! strcmp(call_insns[i], item) ) {
            out->category = opdis_insn_cat_cflow;
            out->flags.cflow = opdis_cflow_flag_call;
            handle_target( out, items[idx+1] );
            return 1;
        }
    }

    num = (int) sizeof(jcc_insns) / sizeof(char *);
    for ( i = 0; i < num; i++ ) {
        if (! strcmp(jcc_insns[i], item) ) {
            out->category = opdis_insn_cat_cflow;
            out->flags.cflow = opdis_cflow_flag_jmpcc;
            handle_target( out, items[idx+1] );
            return 1;
        }
    }

    return 0;
}

static int my_decoder( const opdis_insn_buf_t in, opdis_insn_t * out,
                       const opdis_byte_t * buf, opdis_off_t offset,
                       opdis_vma_t vma, opdis_off_t length, void * arg ) {
    int i, rv;

    /* the default decoder fills ascii, vma, offset, size, and bytes.
       it sets status to opdis_decode_basic. */
    rv = opdis_default_decoder( in, out, buf, offset, vma, length, NULL );
    if (! rv ) {
        return rv;
    }

    for ( i=0; i < in->item_count; i++ ) {
        if ( decode_mnemonic( in->items, i, out ) ) {
            out->status |= (opdis_decode_mnem | opdis_decode_ops |
                            opdis_decode_mnem_flags);
            break;
        }
    }

    return rv;
}
The following code demonstrates the use of this decoder callback:
opdis_set_decoder( o, my_decoder, NULL );