Howtos

Specific instructions on using features of opdis and libopdis.

opdis

Setting the target architecture

The supported architectures can be displayed with the --list-architectures option:

bash$ opdis --list-architectures
        i386
        i386:x86-64
        i8086
        i386:intel
        i386:x86-64:intel
        l1om
        l1om:intel
        plugin
Default architecture is 'i386'

The architecture can then be passed to the -a option:

opdis -a i386:x86-64

Setting disassembler options

The available disassembler options for all supported architectures can be displayed with the --list-disassembler-options option:

bash$ opdis --list-disassembler-options

The following i386/x86-64 specific disassembler options are supported for use
with the -M switch (multiple options should be separated by commas):
  x86-64      Disassemble in 64bit mode
  i386        Disassemble in 32bit mode
  i8086       Disassemble in 16bit mode
  att         Display instruction in AT&T syntax
  intel       Display instruction in Intel syntax
  att-mnemonic
              Display instruction in AT&T mnemonic
  intel-mnemonic
              Display instruction in Intel mnemonic
  addr64      Assume 64bit address size
  addr32      Assume 32bit address size
  addr16      Assume 16bit address size
  data32      Assume 32bit data size
  data16      Assume 16bit data size
  suffix      Always display instruction suffix in AT&T syntax

The disassembler options can then be passed to the -O option, delimited by commas:

opdis -O 'intel,x86-64'

Note:: The reference to the -M switch in this output is generated by libopcodes and should be ignored; opdis takes these options through -O.

Mapping memory regions

The general format for specifying a memory map is:

[target][:offset]@vma[+size]

The :, @, and + characters are delimiters indicating that what follows is an offset, a vma, or a size, respectively. These components may appear in any order, but the target must always be first as it is not delimited. Note that these values are all parsed with strtoul, and so they may appear in any supported base (octal with a '0' prefix, decimal, or hexadecimal with a '0x' prefix).

The target, size, and offset components are all optional. The default value for target is 1, the ID of the first target. The default value for offset is 0. The default value for size is the size of the target.
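
The format and its defaults can be illustrated with a small parser. This is a minimal sketch, not the parser that opdis itself uses: struct mem_map and parse_mem_map are hypothetical names, and the sketch assumes that the vma component is required and that a size of 0 stands for "the size of the target".

```c
#include <stdlib.h>

/* Hypothetical container for a parsed map; not a libopdis type. */
struct mem_map {
        unsigned long target;   /* target ID, defaults to 1 */
        unsigned long offset;   /* offset into the target, defaults to 0 */
        unsigned long vma;      /* address the offset is mapped to */
        unsigned long size;     /* 0 is assumed to mean "size of target" */
        int has_vma;            /* set once an '@' component is seen */
};

/* Parse "[target][:offset]@vma[+size]". Returns 1 on success, 0 on error. */
static int parse_mem_map( const char * spec, struct mem_map * out ) {
        const char * p = spec;
        char * end;

        out->target = 1;
        out->offset = 0;
        out->vma = 0;
        out->size = 0;
        out->has_vma = 0;

        /* the target is not delimited, so it can only appear first */
        if ( *p != ':' && *p != '@' && *p != '+' ) {
                out->target = strtoul( p, &end, 0 );
                if ( end == p ) return 0;
                p = end;
        }

        /* the delimited components may follow in any order */
        while ( *p ) {
                char delim = *p++;
                /* base 0 accepts octal, decimal, and hexadecimal input */
                unsigned long val = strtoul( p, &end, 0 );
                if ( end == p ) return 0;   /* no number after delimiter */

                switch ( delim ) {
                        case ':': out->offset = val; break;
                        case '@': out->vma = val; out->has_vma = 1; break;
                        case '+': out->size = val; break;
                        default: return 0;
                }
                p = end;
        }

        return out->has_vma;
}
```

With this sketch, the string ":0x5d0@0x4005d0+0x1f8" yields target 1, offset 0x5d0, vma 0x4005d0, and size 0x1f8, while "+4@010" yields size 4 and vma 8, because a leading '0' selects octal.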

Memory maps are generally used for the following purposes:

to alter the VMA that appears in the disassembler output
to make multiple targets appear as a contiguous range of memory
to change the order in which targets appear in memory
to mimic the OS load addresses of the target(s)

Examples

Map the entirety of a.out to VMA 0x400000:

opdis -m @0x400000 --dry-run a.out

Map the entirety of a.out to VMA 0x400000 and the entirety of libc.so.6 to 0x7f626a934000:

opdis -m @0x400000 -m 2@0x7f626a934000 --dry-run a.out /usr/lib/libc.so.6

Map offset 0x1000 of a.out to VMA 0x401000:

opdis -m :0x1000@0x401000 --dry-run a.out

Map buffer 1 to 0x400000 and offset 0x1000 of a.out to VMA 0x401000:

opdis -m 1@0x400000 -m 2:0x1000@0x401000 -b '7f 45 4c 46 02 01 01 00 00' --dry-run a.out

Make buffer 1 and buffer 2 contiguous in memory:

opdis -m 1@0 -m 2@4 -b '2e 2e 74 50' -b '89 e1 31 d2' --dry-run

Make buffer 2 and buffer 1 contiguous in memory:

opdis -m 2@0 -m 1@4 -b '2e 2e 74 50' -b '89 e1 31 d2' --dry-run

Use ldd to determine the load address of a shared library:

bash$ ldd a.out 
        linux-vdso.so.1 =>  (0x00007fffc74cf000)
        libssl.so.0.9.8 => /lib/libssl.so.0.9.8 (0x00007f0340274000)
        libc.so.6 => /lib/libc.so.6 (0x00007f033ff05000)
        libcrypto.so.0.9.8 => /lib/libcrypto.so.0.9.8 (0x00007f033fb7e000)
        libdl.so.2 => /lib/libdl.so.2 (0x00007f033f97a000)
        libz.so.1 => /lib/libz.so.1 (0x00007f033f763000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f03404c2000)
bash$ opdis -m 2@0x7f0340274000 --dry-run a.out /lib/libssl.so.0.9.8

Use readelf to get the load address of the .text section for linear disassembly:

bash$ readelf a.out -S | grep -A 1 text
  [14] .text             PROGBITS         00000000004005d0  000005d0
       00000000000001f8  0000000000000000  AX       0     0     16
bash$ opdis -m :0x5d0@0x4005d0+0x1f8 -l @0x4005d0 a.out
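
The relationship that such a map establishes between a VMA and an offset in the target can be sketched in a few lines of C. This is an illustration only: struct map_entry and vma_to_offset are hypothetical names and are not part of libopdis.

```c
/* Hypothetical description of one map entry; not a libopdis type. */
struct map_entry {
        unsigned long offset;   /* offset into the target */
        unsigned long vma;      /* address the offset is mapped to */
        unsigned long size;     /* number of bytes mapped */
};

/* Translate a VMA to a target offset. Returns 1 if the VMA lies inside
 * the mapped range, 0 otherwise (in which case *offset is untouched). */
static int vma_to_offset( const struct map_entry * m, unsigned long vma,
                          unsigned long * offset ) {
        if ( vma < m->vma || vma - m->vma >= m->size ) {
                return 0;
        }
        *offset = m->offset + (vma - m->vma);
        return 1;
}
```

For the .text section above (offset 0x5d0, VMA 0x4005d0, size 0x1f8), VMA 0x4005d0 translates to offset 0x5d0, and the first VMA past the end of the map is 0x4007c8.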

Use readelf to get the executable program segment and program entry point for control-flow disassembly:

bash$ readelf a.out -l | grep -A 1 LOAD | grep -B 1 'R E'
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x000000000000087c 0x000000000000087c  R E    200000
bash$ readelf a.out -h | grep Entry
  Entry point address:               0x4005d0
bash$ opdis -m :@0x400000+0x87C -c @0x4005d0 a.out

libopdis API

Enabling visited-address tracking

The default handler callback (opdis_default_handler) supports tracking of visited addresses. When the default handler encounters a VMA that has previously been disassembled, it halts disassembly and prevents the display callback from being invoked. This is useful for applications that write the instructions directly to output in their display callback.

Visited-address tracking is disabled by default in order to speed up disassembly, but it can be enabled by allocating an opdis_vma_tree_t in the opdis_t visited_addr field:

        o->visited_addr = opdis_vma_tree_init();

Warning:: Visited-address tracking will slow disassembly down considerably. It is recommended that applications instead have their display callback store instructions in an opdis_insn_tree_t and write to output after disassembly has finished; this has the additional benefit of ordering the disassembled instructions by VMA. The display callback can be used in such cases to update a progress display.
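
To show what visited-address tracking involves for each instruction, here is a minimal, self-contained sketch of the idea. It is not the libopdis implementation, which uses opdis_vma_tree_t; struct visited_set and visit are hypothetical names, and a sorted array stands in for the tree.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical visited-address set; libopdis uses opdis_vma_tree_t. */
struct visited_set {
        unsigned long * addrs;  /* kept sorted in ascending order */
        size_t count;
        size_t capacity;
};

/* Returns 1 if vma was not seen before (it is recorded and disassembly
 * may continue), or 0 if it was already visited or allocation failed. */
static int visit( struct visited_set * set, unsigned long vma ) {
        size_t lo = 0, hi = set->count;

        /* binary search for the position where vma belongs */
        while ( lo < hi ) {
                size_t mid = lo + (hi - lo) / 2;
                if ( set->addrs[mid] < vma ) {
                        lo = mid + 1;
                } else {
                        hi = mid;
                }
        }
        if ( lo < set->count && set->addrs[lo] == vma ) {
                return 0;       /* already disassembled: halt */
        }

        /* grow the array when it is full */
        if ( set->count == set->capacity ) {
                size_t cap = set->capacity ? set->capacity * 2 : 16;
                unsigned long * tmp = realloc( set->addrs,
                                               cap * sizeof(*tmp) );
                if (! tmp ) return 0;
                set->addrs = tmp;
                set->capacity = cap;
        }

        /* shift the tail up by one slot to keep the array sorted */
        memmove( &set->addrs[lo + 1], &set->addrs[lo],
                 (set->count - lo) * sizeof(*set->addrs) );
        set->addrs[lo] = vma;
        set->count++;
        return 1;
}
```

Every instruction pays for one lookup and, when the address is new, one insertion, which is the per-instruction cost that the warning above refers to.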

Writing a display callback

The display callback can be replaced with a function that takes an opdis_insn_t and an optional argument parameter. The only restriction is that the callback must duplicate the opdis_insn_t (using opdis_insn_dupe) when storing it for later use.

This example adds an instruction to an opdis_insn_tree_t and prints a status message:

static void my_display( const opdis_insn_t * insn, void * arg ) {
        opdis_insn_tree_t tree = (opdis_insn_tree_t) arg;
        opdis_insn_t * i = opdis_insn_dupe( insn );

        opdis_insn_tree_add( tree, i );

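        /* cast explicitly so the arguments match the format specifiers */
        printf( "%lu bytes at offset %lX\n", (unsigned long) insn->size,
                (unsigned long) insn->offset );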
}

The following code demonstrates the use of this display callback:

        /* allocate an insn tree which will free the insns when deleted */
        opdis_insn_tree_t tree = opdis_insn_tree_init( 1 );

        opdis_set_display( o, my_display, (void *) tree );
        /* ... code to operate on tree, e.g. opdis_insn_tree_foreach ... */

        opdis_insn_tree_free( tree );

Writing a handler callback

The handler callback is used to determine if disassembly should continue after the current instruction has been processed.

This example checks if a specific instruction mnemonic is encountered; if so, the handler displays the instruction and halts disassembly. The default handler is chained in order to halt on invalid instructions.

struct HANDLER_ARG { char halt_mnem[32]; opdis_t opdis; };

static int my_handler( const opdis_insn_t * insn, void * arg ) {
        struct HANDLER_ARG * harg = (struct HANDLER_ARG *) arg;

        /* halt disassembly if specified mnemonic is encountered */
        if ( (insn->status & opdis_decode_mnem) && 
             ! strcmp(harg->halt_mnem, insn->mnemonic) ) {

                /* display instruction before halting */
                harg->opdis->display( insn, harg->opdis->display_arg );
                return 0;
        }

        /* invoke default handler to check for invalid and visited addresses */
        return opdis_default_handler( insn, harg->opdis );
}

The following code demonstrates the use of this handler callback:

        struct HANDLER_ARG handler_arg;
        strncpy( handler_arg.halt_mnem, "ret", 32 );
        handler_arg.opdis = o;
        opdis_set_handler( o, my_handler, (void *) &handler_arg );

Note:: If the handler callback returns 0, the display callback will not be invoked for the current instruction. The handler must invoke the display callback itself if the instruction should be displayed before halting disassembly.

Writing a resolver callback

The resolver callback is used to determine the VMA of the branch target of the current instruction. Applications that provide an emulator or VM will need to chain or replace this callback in order to resolve branch targets that are stored in registers or at a memory location.

This example assumes a flat address space and returns the offset component of an absolute address operand:

static opdis_vma_t my_resolver( const opdis_insn_t * insn, void * arg ) {

        /* return the offset component of segment:offset operands */
        if ( (insn->status & opdis_decode_op_flags) &&
             insn->target && insn->target->category == opdis_op_cat_absolute ) {
                return (opdis_vma_t) insn->target->value.abs.offset;
        }

        /* invoke the default resolver to handle immediate values */
        return opdis_default_resolver( insn, NULL );
}

The following code demonstrates the use of this resolver callback:

        opdis_set_resolver( o, my_resolver, NULL );

Writing a decoder callback

The decoder callback is used to generate an opdis_insn_t from the list of strings emitted by libopcodes. An incomplete decoder callback (such as the default) will result in metadata not being generated for the instruction (including the list of operands). Control-flow disassembly relies on this metadata, making a functional decoder callback essential to most applications.

This example detects x86 control-flow instructions after invoking opdis_default_decoder to do the basic instruction decoding. It does not chain either of the built-in x86 decoders, and no decoding is done for non-control-flow instructions.

static const char * jcc_insns[] = {
        "ja", "jae", "jb", "jbe", "jc", "jcxz", "jecxz", 
        "jrcxz", "je", "jg", "jge", "jl", "jle", "jna", "jnae", "jnb", "jnbe",
        "jnc", "jne", "jng", "jnge", "jnl", "jnle", "jno", "jnp", "jns", "jnz",
        "jo", "jp", "jpe", "js", "jz"
};

static const char * call_insns[] = { "lcall", "call", "callq" };

static const char * jmp_insns[] = { "jmp", "ljmp", "jmpq" };

static const char * ret_insns[] = {
        "ret", "lret", "retq", "retf", "iret", "iretd", "iretq"
};

static void handle_target( opdis_insn_t * out, const char * item ) {
        opdis_op_t * op = out->operands[0];
        op->category = opdis_op_cat_unknown;
        op->flags = opdis_op_flag_x;
        opdis_op_set_ascii( op, item );
        out->target = out->operands[0];
}

static int decode_mnemonic( char ** items, int idx, opdis_insn_t * out ) {
        int i, num;
        const char *item = items[idx];

        /* detect JMP */
        num = (int) (sizeof(jmp_insns) / sizeof(jmp_insns[0]));
        for ( i = 0; i < num; i++ ) {
                if (! strcmp(jmp_insns[i], item) ) {
                        out->category = opdis_insn_cat_cflow;
                        out->flags.cflow = opdis_cflow_flag_jmp;
                        handle_target( out, items[idx+1] );
                        return 1;
                }
        }

        /* detect RET */
        num = (int) (sizeof(ret_insns) / sizeof(ret_insns[0]));
        for ( i = 0; i < num; i++ ) {
                if (! strcmp(ret_insns[i], item) ) {
                        out->category = opdis_insn_cat_cflow;
                        out->flags.cflow = opdis_cflow_flag_ret;
                        return 1;
                }
        }

        /* detect branch (call/jcc) */
        num = (int) (sizeof(call_insns) / sizeof(call_insns[0]));
        for ( i = 0; i < num; i++ ) {
                if (! strcmp(call_insns[i], item) ) {
                        out->category = opdis_insn_cat_cflow;
                        out->flags.cflow = opdis_cflow_flag_call;
                        handle_target( out, items[idx+1] );
                        return 1;
                }
        }
        num = (int) (sizeof(jcc_insns) / sizeof(jcc_insns[0]));
        for ( i = 0; i < num; i++ ) {
                if (! strcmp(jcc_insns[i], item) ) {
                        out->category = opdis_insn_cat_cflow;
                        out->flags.cflow = opdis_cflow_flag_jmpcc;
                        handle_target( out, items[idx+1] );
                        return 1;
                }
        }

        return 0;
}

static int my_decoder( const opdis_insn_buf_t in, opdis_insn_t * out,
                       const opdis_byte_t * buf, opdis_off_t offset,
                       opdis_vma_t vma, opdis_off_t length, void * arg ) {
        int i, rv;

        /* the default decoder fills ascii, vma, offset, size, and bytes,
         * and sets status to opdis_decode_basic. */
        rv = opdis_default_decoder( in, out, buf, offset, vma, length, NULL );
        if (! rv ) {
                return rv;
        }

        for ( i=0; i < in->item_count; i++ ) {
                if ( decode_mnemonic( in->items, i, out ) ) {
                        out->status |= (opdis_decode_mnem | opdis_decode_ops | 
                                        opdis_decode_mnem_flags);
                        break;
                }
        }

        return rv;
}

The following code demonstrates the use of this decoder callback:

        opdis_set_decoder( o, my_decoder, NULL );

Note:: While chaining of the default decoder (opdis_default_decoder) is not required, it is strongly encouraged. The default decoder ensures that all instructions have their status, ascii, vma, offset, size, and bytes fields set reliably.
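
The four lookup loops in decode_mnemonic share one pattern, so the classification step could also be written as a single table lookup. This is a self-contained sketch under stated assumptions: enum cflow_kind, struct mnem_entry, cflow_table, and classify_mnemonic are hypothetical names that stand in for the libopdis flags, and only a few of the conditional jumps are listed to keep the table short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the libopdis control-flow flags. */
enum cflow_kind {
        CFLOW_NONE = 0, CFLOW_JMP, CFLOW_CALL, CFLOW_JMPCC, CFLOW_RET
};

struct mnem_entry {
        const char * mnemonic;
        enum cflow_kind kind;
};

/* One row per mnemonic; the jcc rows are a partial list. */
static const struct mnem_entry cflow_table[] = {
        { "jmp", CFLOW_JMP }, { "ljmp", CFLOW_JMP }, { "jmpq", CFLOW_JMP },
        { "call", CFLOW_CALL }, { "lcall", CFLOW_CALL },
        { "callq", CFLOW_CALL },
        { "ret", CFLOW_RET }, { "lret", CFLOW_RET }, { "retq", CFLOW_RET },
        { "retf", CFLOW_RET }, { "iret", CFLOW_RET },
        { "iretd", CFLOW_RET }, { "iretq", CFLOW_RET },
        { "je", CFLOW_JMPCC }, { "jne", CFLOW_JMPCC },
        { "jz", CFLOW_JMPCC }, { "jnz", CFLOW_JMPCC }
};

/* Return the control-flow kind of a mnemonic, or CFLOW_NONE if the
 * string is not in the table. */
static enum cflow_kind classify_mnemonic( const char * item ) {
        size_t i;
        size_t num = sizeof(cflow_table) / sizeof(cflow_table[0]);

        for ( i = 0; i < num; i++ ) {
                if (! strcmp( cflow_table[i].mnemonic, item ) ) {
                        return cflow_table[i].kind;
                }
        }
        return CFLOW_NONE;
}
```

A decoder built this way would set the category and flags from the returned kind and would call handle_target only for kinds other than CFLOW_RET and CFLOW_NONE.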