decgen - Instruction set decoder generator - V1.41b, 2017/04/15
======


Introduction
============

What is it?
-----------

decgen is a tool for generating instruction set decoders. I.e. you give it a set of files describing the instruction set of a CPU, and as output it produces a C file. Inside that C file will be a function that when given a raw data word as input will determine the instruction that that data word corresponds to. The code will then perform the (user-specified) activity associated with that instruction type; e.g. a disassembler would produce disassembly for the instruction, while an emulator would emulate the behaviour of the instruction.


Why use decgen?
---------------

Using decgen often provides several advantages over using a hand-written instruction set decoder:

* Guaranteed correctness - With all but the simplest of instruction sets it's very easy to accidentally produce a hand-written decoder that is unable to decode certain instructions, or decodes some instructions incorrectly. By using decgen the chances of this occurring are reduced significantly, as decgen will verify your instruction set definitions to ensure that they are complete and unambiguous.
* Ease of extensibility - Finding the right place to edit a hand-written decoder can be tricky due to the typically tree-like flow of the code. With decgen the instruction set definition does not describe the structure of the decoder, thus when you want to add a new definition you can insert it anywhere you like within the file.
* Use-case optimised decoding - If you're writing some code that only wants to deal with a subset of a CPU's full instruction set, then decgen will produce a tailor-made decoder that aborts decoding as soon as it determines that the instruction is of no interest to you.


Usage
-----

Decgen can be used in two primary forms; to verify a set of encodings or to produce a decoder (optionally verifying the encoding set and/or decoder)

decgen -bits=<n> [-D<name>=<val> [-D...]] -e <files...> [-valid] [-incomplete]
       [-v] [-a <files...> [-o|-A] <outfile> [-pre <file>] [-post <file>]
       [-prefix=<prefix>] [-default=<action>] [-name=<name>] [-verify] [-void]
       [-maxmaskbits=<n>] [-fuzzy|-exact] [-graph <file>] [-opt-strings]
       [-maxpreextract=<n>] [-treewalker] [-useargstruct]
       [-readcache|-writecache|-updatecache <file>]]

-bits=<n>          Specify number of bits in each opcode
-D<name>=<val>     Define a macro for use in encoding files
-e <files...>      Specify encoding files
-valid             Disables the ambiguity and completeness checks; specify when
                   processing known-good data in order to speed up decgen.
-incomplete        Indicate that the encoding files describe an incomplete
                   subset of the instruction set (disables completeness checks)
-v                 Enable verbose output
-a <files...>      Specify action files
-o <outfile>       Specify output file
-A <outfile>       Specify output file (append mode)
-pre <file>        Specify file whose content is to be placed at the start of
                   the output file
-post <file>       Specify file whose content is to be placed at the end of the
                   output file
-prefix=<prefix>   Specify prefix used for all functions generated by decgen.
                   Default is decgen__
-default=<action>  Name of default action to invoke if none matches an encoding
-name=<name>       Name of decoder function. Default is DECODER.
-verify            Verify tree decodes correctly (for debugging generator; may
                   be slow!)
-maxmaskbits=<n>   Set maximum number of bits to consider when creating mask
                   nodes (a.k.a. switch statements/jump tables). Default 5.
-fuzzy             For incomplete instruction sets, indicates that the decoder
                   is not required to have any predictable behaviour when asked
                   to decode an opcode for which no definition exists.
-exact             For incomplete instruction sets, indicates that the decoder
                   should perform the specified -default action when it
                   encounters an opcode for which no definition exists.
                   Warning: Selecting this option will significantly increase
                   the amount of time taken to generate a tree.
-graph <file>      Name of file to write decision tree to in GraphML format
-void              Indicate that RETURNTYPE has been #define'd to 'void', and
                   therefore decgen shouldn't attempt to return the value of
                   functions. Useful for supressing warnings/errors generated
                   by fussy compilers.
-opt-strings       Attempt to optimise generated code by merging together
                   action functions if the only difference between two
                   functions are the string literals that they use.
-maxpreextract=<n> Specify the maximum number of fields to pre-extract from
                   opcodes at higher levels of the tree. This is the maximum
                   number that will be passed to any one child function; tune
                   it to control code size/performance on the target platform.
                   Default value is zero.
-treewalker        Instead of emitting the tree as pure code, emit it in a
                   compact binary form instead, along with a tree walker
                   function.
-useargstruct      Place all fields into a structure to avoid leaves having to
                   extract them manually.
-readcache <file>  Read cached tree from <file>.
-writecache <file> Write cached tree to <file>.
-updatecache <file> Use cache file and update it if corrupt/out-of-date.

The 'binaries' folder contains copies of decgen compiled for RISC OS (requires 3.5+, SharedUnixLibrary), Linux (i486), and Windows.


Terminology
-----------

Throughout the rest of this document, and the program source code, the following terminology is used:

* Opcode - A word of data which you're attempting to decode, e.g. 0xE0810002
* Instruction set space - The set containing all possible opcodes. E.g. the instruction set space of an 8-bit CPU would be from 0x00 to 0xFF.
* Instruction - used somewhat interchangeably to refer to a specific opcode or an assembler mnemonic.
* Encoding - A set of rules that describe a subset of the instruction set space. An encoding would typically be used to detect a specific instruction, e.g. ADD Rd,Rn,Rm. Encodings are specified within 'encodingdef' files.
* Field - A contiguous set of bits within an opcode. Fields are typically defined on a per-encoding basis, e.g. to indicate where the Rd, Rn and Rm register numbers are stored within the encoding of an ADD instruction.
* Constraint - An expression that, when evaluated, indicates whether a specific opcode may belong to, or does not belong to, a specific encoding. Constraints where the expression evaluates to a nonzero value ('true') are said to match an opcode; constraints where the expression evaluates to zero ('false') are said to not match an opcode.
* Fixed bits and mask - A simplified constraint. Rather than a generic expression, a 'fixed bits and mask' constraint specifies that in order to be matched, certain bits of an opcode must have certain values.
* Opcode subset - A subset of the instruction set space. The subset is described by a set of constraints. An opcode subset may describe some, none, or all of a particular encoding.
* Action - The thing that should be done once an opcode has been classified. Actions are assigned to encodings, such that if opcode X matches encoding Y, then whenever the decoder is handed opcode X as input the action associated with encoding Y will be performed. Actions are specified as blocks of C code within 'actiondef' files.
* 'Should match' fields - These are best described in terms of their relationship to the ARM instruction set. For some instructions, the CPU ignores the values of certain bits of the instruction. This represents a (perhaps unintentional) waste of space within the instruction set, as there will be two (or more) opcodes that have the exact same behaviour. So in order to provide room for future expansion, ARM decree that these ignored bits should be filled with specific values, to ensure that existing binaries will continue to work as intended should extra decode logic be added to make use of the unused bits. When code generated by decgen decodes an instruction, it also ignores the values of the 'should match' fields. However it does provide the ability to indicate to the action whether the fields were correct or not, allowing the action to act as it sees fit (e.g. a disassembler might warn that the instruction is nonstandard, whereas an emulator might ignore the mismatch, or attempt to emulate the unpredictable behaviour of a specific CPU model)
* Ambiguity - If two encodings are capable of matching against the same opcode then the encodings are said to be ambiguous. decgen does not tolerate ambiguous encodings and will produce an error should it encounter one.
* Completeness - If the set of encodings given to decgen describe the contents of the entire instruction set space, then the encoding set is said to be complete. decgen is capable of working with both complete and incomplete sets of encodings, although complete sets are preferred if you want to get the most from decgen's validation facilities.
* Kin - Used within the decgen source to refer to encoding-action pairs that are similar. One encoding is kin with another if they share the same action, and the fields required by the action are in the same location for each encoding. Detecting kin helps decgen to produce a more optimised decoder.


Encoding files
==============

Encoding files are plain text files containing encoding definitions. Comments begin with a # and extend to the end of the line; everything else (excluding blank lines) decgen will attempt to parse as an encoding definition.

Each encoding definition occupies one line. The line is split into three parts; from left to right these are the encoding, the constraints, and the encoding name. The different parts are seperated by whitespace. For a simple example, here's the encoding for a 3-register ARM ADD instruction:

(cond:4)0000000S(Rn:4)(Rd:4)(shift:5)(type:2)0(Rm:4)	{ne(cond,15)}	ADD_reg
                   |                                         |             |
                Encoding                              Constraint  Encoding name


The encoding
------------

The encoding specifies the values of any fixed bits, the locations, names and sizes of any fields, and the values of any 'should match' fields. Little-endian notation is used; i.e. the leftmost bit in the encoding is the MSB.

* Fixed bits are denoted directly by a '1' or a '0'
* Brackets indicate fields. The following notations are supported:
  (name)           Describes a 1-bit wide field with the name 'name'
  (name:width)     Describes a 'width'-bit wide field with the name 'name'
  (:width)         Describes a 'width'-bit wide anonymous field (i.e. the
                   contents are of no interest to the encoding or the action)
  (bits)           Where 'bits' is a sequence of 0's and 1's, this indicates a
                   'should match' field. The number of bits in the sequence
                   indicates the width of the field
* Individual letters can be used to indicate 1-bit wide fields

Note that all fields are treated as unsigned.


Constraints
-----------

Constraints come after the encoding as a list of expressions, each expression residing within curly braces. The current expression parser is simplistic, thus all expressions must be expressed as if they were a set of nested function calls in C. An encoding can have zero or more constraints.


Encoding name
-------------

The encoding name comes after the constraints. It is used as an identifier when matching encodings with actions.


Thus, from the above example, we can deduce the following:

* The encoding's name is ADD_reg

* There are 7 fields:

  Name     Width   LSB

  cond     4       28
  S        1       20
  Rn       4       16
  Rd       4       12
  shift    5       6
  type     2       5
  Rm       4       0

* The fixed bits and mask indicate that, in order for the encoding to match, bits 21-27 and bit 5 must be zero

* There is one constraint. {ne(cond,15)}. This constraint means that, in order for the encoding to match, the 'cond' field must not equal 15 (decimal).


Identifier names
----------------

Identifier names (i.e. field names or encoding names). both in encoding files and actiondef files, follow the same basic rules as C; the identifier must start with a letter or underscore, and must contain only letters, numbers, and underscores. decgen treats identifier names as case-sensitive.


The 'in' specifier
------------------

You can use the 'in' specifier to indicate that one encoding lies fully within another. For example, the following 'UNDEFINED' encoding is specified as being in the 'VABA_VABAL_A2' encoding:

1111001U1D(size:2)(Vn:4)(Vd:4)0101N0M0(Vm:4)	{ne(size,3)}	VABA_VABAL_A2
1111001U1D(size:2)(Vn:4)(Vd:4)0101N0M0(Vm:4)	{ne(size,3)} {band(Vd,1)}	UNDEFINED in VABA_VABAL_A2

When decgen encounters the above it will first verify that all the opcodes that match 'UNDEFINED' also match 'VABA_VABAL_A2' (unless verification has been disabled via -valid). It will then examine the constraints of UNDEFINED, invert them, and add them to VABA_VABAL_A2, such that VABA_VABAL_A2 no longer matches any of the opcodes that were matched by UNDEFINED. Note that the constraint inversion code isn't particularly smart or efficient, and so over-use of 'in' may significantly degrade the performance of decgen (but the generated code should be just as fast).

The fact that you must manually restrict the child encoding to lie fully within the parent does limit the usefulness of the 'in' specifier. But it also helps to ensure the encodings are correct, and it is also an easy way of placing an encoding in one file and using an encoding in another file to override part of it.

Note that the parent encoding must be defined before the child encoding, and there must only be one instance of an encoding with that name.


Macros
------

Encoding files support the use of simple macros. Macro substitution is performed as a preprocessing step before the parsing of an input line; thus macros can be used to modify the encoding, the clauses, the encoding name, or all three. Macros can have up to 31 parameters.

When invoking decgen, you can declare a parameterless macro as follows:

  -Dfoo=wibble

This creates the macro 'foo'. To use the macro in an encoding file, place the macros name in square brackets, i.e. '[foo]'. With the above definition any instance of '[foo]' in the encoding file will be replaced with the text 'wibble'.

To define a macro which evaluates to a null string, use '-Dfoo='.

Macros which take parameters have the following syntax:

  -Dfoo(a,b,c)=[a][c]

This macro will take three parameters and evaluate to the value of the 'a' and 'c' parameters. I.e. if the text '[foo(red,green,blue)]' was in an encoding file then it would be replaced with 'redblue' the text.

When defining or invoking a macro, you can use \ to escape any characters that would otherwise be illegal. E.g. with the macro '-Dfoo(wibble)=\[wibble]', if you were to place the text '[foo(wibble(a\,b\))]' in an encoding file then it would expand to the text '[wibble(a,b)]'. After a macro is expanded the macro expansion will begin again from the start of the replaced text, thus you can create recursive macros if necessary.

Although macros might not seem like the most useful feature, they can be very useful when dealing with several very similar architecture variants. E.g. the supplied encoding files make use of macros to cope with the different handling of the 'NV' condition code between ARM architecture versions, and to allow the ASIMD and VFP files to be used to describe both the ARM and Thumb instruction sets. Parameterised macros can also make it easy to implement your own constraints for situations where the builtin constraints are insufficient.


Constraints in detail
=====================

As stated above, constraints are currently expressed in the form of a series of nested function calls. Regular arithmetic expressions aren't supported; instead, the following builtin functions must be used:
                                      
Function    Name                      Description
                                      
eq(X,Y)     Equals                    Returns true if X equals Y
ne(X,Y)     Not equals                Returns true if X is not equal to Y
lt(X,Y)     Less than                 Returns true if X is less than Y
gt(X,Y)     Greater than              Returns true if X is greater than Y
le(X,Y)     Less than or equal to     Returns true if X <= Y
ge(X,Y)     Greater than or equal to  Returns true if X >= Y
add(X,Y)    Add                       Returns X + Y
sub(X,Y)    Subtract                  Returns X - Y
mul(X,Y)    Multiply                  Returns X * Y
div(X,Y)    Divide                    Returns X / Y
mod(X,Y)    Modulo                    Returns X % Y
land(X,Y)   Logical AND               Returns X && Y
lor(X,Y)    Logical OR                Returns X || Y
lnot(X)     Logical NOT               Returns !X
leor(X,Y)   Logical EOR               Returns (X!=0)^(Y!=0)
band(X,Y)   Binary AND                Returns X & Y
bor(X,Y)    Binary OR                 Returns X | Y
bnot(X)     Binary NOT                Returns ~X
beor(X,Y)   Binary EOR                Returns X ^ Y
lsl(X,Y)    Logical shift left        Returns X << Y
lsr(X,Y)    Logical shift right       Returns X >> Y
neg(X)      Negate                    Returns -X
bitcount(X) Bit count                 Returns the number of bits in X that are
                                      set
bit(X)      Bit                       Returns 1<<X
bits(HI,LO) Bits                      Returns a value with bits HI..LO set to 1
                                      (all inclusive)
ispow2(X)   Power-of 2 check          Returns true if X is a nonzero power of 2
BadReg(X)   Bad register check        ARM-specific check; returns true if X is
                                      13 or 15.

The result of all constraint functions will be an unsigned 32bit integer or a boolean. decgen ensures that functions are given appropriate inputs (e.g. that the add() function is only given numeric inputs), but performs little else in the way of safety checks. I.e. attempting a logical shift of >=32 bits will have an undefined result, division by zero is likely to fail, and bits() has undefined behaviour if LO > HI, or if HI >= 32

Although decgen is able to automatically simplify some constraints (e.g. splitting {land(A,B)} into two constraints {A} and {B}), simpler constraints are almost always going to be better, both in terms of decgen's performance and in terms of the performance of the generated code.


The 'opcode' constraint
-----------------------

In addition to the above functions, and the encoding's fields, there is the 'opcode' constraint. This allows an arbitrary field (sequence of consecutive bits) to be used as a value within a constraint. Two syntaxes are available:

opcode<BIT>      Extract bit 'BIT' from the opcode
opcode<HI:LO>    Extract the bits between 'HI' and 'LO' (all inclusive)


Actiondef files
===============

Actiondef files are plain text files containing action definitions. The files contain a mix of two languages - the simplistic language decgen uses to define the actions, and C. Within the decgen areas, comments begin with a # and extend to the end of the line. Within the C areas, only minimal parsing is performed by decgen, and therefore C-style comments must be used (and #'s can be used to include preprocessor directives).

Here's an example action definition:

ADD_reg(Rd,Rn,Rm)
{
	printf("ADD R%lu,R%lu,R%lu\n",Rd,Rn,Rm);
	return;
}

This defines an action for the ADD_reg encoding. The action takes the Rd, Rn and Rm fields of the encoding as parameters. Inside the curly braces lies the C code that decgen will emit for that action when it outputs the decoder. Note the return statement; all actions must have explicit return statements for all code paths contained within. The data type returned can be specified by the user, but it must be the same data type for all actions.

For an action definition to be correctly formatted, the action name and the parameter list must be on the same line. If no parameters are needed then empty brackets must be used. The following line must begin with an opening brace, indicating the start of the C code. The contents of subsequent lines are then collected together, until a line beginning with a closing brace is detected. Therefore if you want to use complex if(), case(), etc. statements within the code block, you must make sure you indent the code to avoid placing any closing braces at the start of the line.

Parameters passed to the action will be of type 'const uint32_t'. decgen requires the user to ensure that 'uint32_t' is the type of a 32bit unsigned integer.

When specifying the parameters to pass to the encoding, it is possible to use the colon character to concatenate multiple field names together. This will cause decgen to concatenate the fields together, turning two or more fields into one argument/variable within the action code. For example, the below action definition combines the 'imm4' and 'imm12' fields of an ARM MOVW instruction into one argument to the action, which will be available as the variable 'imm4_imm12' (The colons are replaced with underscores in the variable name). 'imm12' occupies the low bits of the resulting variable.

MOV_imm_A2(cond,imm4:imm12,Rd)
{
	printf("MOVW%s\tR%d,#0x%04X",condition(cond),Rd,imm4_imm12);
	return;
}

Apart from the fields of an encoding, an action can also take an extra, special parameter - 'nonstandard'. This, another const uint32_t, is a result of ANDing the opcode with the encoding's 'should match' fields, and then EORing with the required values of those bits. Thus 'nonstandard' will be zero if the opcode had the correct values for those bits, and nonzero if it did not match. The bits within nonstandard which are 1 correspond to the bits that differ from the required values.


'as if' actions
---------------

'as if' actions allow you to force a particular encoding to behave as if it were another, even if they have fundamentally different field lists. The syntax is as follows:

X as if Y

This will cause encoding 'X' to use action 'Y'. Action 'Y' must have been previously defined, and must map to an encoding with the same name.

Note: 'as if' support hasn't received much testing, so don't be surprised if it doesn't work perfectly! 


Invoking the decoder, and required support code
===============================================

Once you've passed decgen your encoding file(s) and actiondef file(s) and asked it to produce a decoder, you will need to prepend some support code to the C file. This support code must set up the following types and macros:

uint32_t - must be the type of a 32bit unsigned integer
RETURNTYPE - must be the return type of the decoder (or 'void' if nothing is to be returned)
PARAMDECL - a macro that expands to the definition of the parameter list of the decoder. All parameters passed to the decoder will be visible to the actions
PARAMS - a macro that expands to the list of parameters of the decoder.
OPCODE - a macro that expands to the name of the variable that is the opcode to decode. Often this would be one of the parameters inside PARAMS, but there is no reason why it cannot be a global variable instead.

For example, a simple decoder may define the above as follows:

#include <stdint.h> /* for uint32_t */
#define RETURNTYPE void
#define PARAMDECL const uint32_t OPCODE
#define PARAMS OPCODE

Note that, as above, there is no real need to make OPCODE a macro if you are happy with the parameter being called OPCODE.

Your support code will also need to include any prerequisite headers, etc. that are required for the actions to function.

There is one public, non-static entry point to the decoder. This entry point must be used to invoke the decoder. Its signature is as follows:

RETURNTYPE DECODER(PARAMDECL);

Therefore you may also wish to #define DECODER to something more useful, e.g. '#define DECODER disassemble_arm', or use the -name= option to set the name directly, e.g. -name=disassemble_arm. For the above example, this would then result in the decoder function having the signature 'void disassemble_arm(const uint32_t OPCODE)'.


Reserved words
--------------

By default, decgen uses the prefix decgen__ for the names of all the C functions and variables it produces (with the exception of the DECODER, and the parameters used by actions). An alternative prefix can be specified by using the -prefix= option.


Custom builtin function implementations
---------------------------------------

When decgen produces the C code containing the decoder it also outputs implementations of any of the builtin functions that are required by the decoder. In the C source these functions are implemented as #define macros. Each implemenentation is guarded with a #ifndef check. This means that by defining your own version of the macro you can easily prevent the default implementation from being used.

Although you can't sensibly provide an implementation that produces a different result to the original, you can replace implementations with faster versions tailored to the target machine. E.g. on some systems it may be advantageous to replace the bitcount function (which, with default settings, would be implemented by the decgen__func_bitcount macro) with a call to GCC's __builtin_popcount intrinsic.


A working example
=================

In the 'example' folder you'll find an example actiondef file and an example skeleton C program. Combined with some of the encoding defintions from the 'encodings' folder, these produce a simple Thumb instruction set disassembler that, when run, outputs disassembly of all 65536 16-bit Thumb instructions. The output is roughly similar to that of the Debugger_DissasembleThumb SWI provided by RISC OS 5.

To create the working example, perform the following steps:

* Invoke decgen:
  decgen -bits=16 -e encodings/T16_ARMv7 encodings/T16_ARMv7_nEE -v -a example/T16_ARMv7 -pre example/head.c -o example/prog.c -verify
* Invoke your favourite C compiler to compile example/prog.c into an executable


Graph export
============

decgen provides rudimentary support for exporting decision trees as GraphML files. The GraphML files contain minimal yEd markup, so that the files can be loaded into yEd and have the correct node and edge labels displayed. However the files contain no layout information, so for visual inspection of graphs you'll have to use the appropriate layout option in yEd (e.g. hierarchical). Colouring the nodes and edges is also advised!


Using cache files to improve build times
========================================

The -readcache, -writecache and -updatecache options allow decgen to make use of a cache file to improve tree generation time. Using cache files allows decgen to skip the most computationally expensive part of tree generation (working out what the next node should look like), reducing build times for large trees to a fraction of their full time. This is particularly useful once you have all your encodings in place and are working on finalising the behaviour of the actions - using a cache file will allow you to modify the actions or tweak tree optimisation settings without having to deal with the tree being rebuilt from scratch each time.

When -readcache is specified, decgen will attempt to read from the cache file and will fail with an error if the cache can't be read or appears to be invalid.

When -writecache is specified, decgen will attempt to write to the cache file, overwriting the current contents.

-updatecache combines the above two options. If the cache exists decgen will read from it. If the cache file is missing or looks invalid, decgen will restart the tree generation algorithm and write out a fresh cache file in its place.

Note that when a cache file is used, decgen may produce sub-optimal trees if the encodings or actions change significantly from their original form. However the trees that are generated should still be valid.

Currently, you will also have to manually invalidate the cache file if you change the -maxmaskbits option.


Example encodings
=================

In the 'encodings' folder is a set of encoding files covering the ARM, Thumb, Thumb2, and ThumbEE instruction sets. Each file covers the following:

ARMv3		The ARMv2, ARMv2a, and ARMv3 instruction sets. Macros are used
		to select which instruction set variant is used.
ARMv7		The bulk of the ARMv7 instruction set, as described by the
		ARMv7-AR ARM. With the right actiondef files, decoders could
		be produced for ARMv4, ARMv5, ARMv6 or ARMv7.
ARMv7_ASIMD	The file to use alongside 'ARMv7' or 'T32_ARMv7' if you want
		the Advanced SIMD instructions
ARMv7_VFP	The file to use alongside 'ARMv7' or 'T32_VFP' if you want the
		VFP instructions. Note that this file also contains the
		instructions that belong to both the ASIMD and VFP instruction
		sets, whereas ARMv7_ASIMD only contains instructions that
		belong to the SIMD instruction set alone.
ARMv7_nASIMD	The file to use alongside 'ARMv7' or 'T32_ARMv7' if you do not
		want the Advanced SIMD instructions (the encodings within map
		the SIMD instructions to 'UNDEFINED')
ARMv7_nVFP	The file to use alongside 'ARMv7' or 'T32_ARMv7' if you do not
		want the VFP instructions (the encodings within will map the
		instructions to 'UNDEFINED')
FPA		To be used with 'ARMv3', 'ARMv7' or 'T32_ARMv7' if you want the
		FPA instruction set.
T16_ARMv7	The bulk of the 16-bit Thumb instructions
T16_ARMv7_EE	The 16-bit ThumbEE instructions (to be used with 'T16_ARMv7')
T16_ARMv7_nEE	The 16-bit Thumb instructions that are available if the
		processor is not in EE state (to be used with 'T16_ARMv7')
T32_ARMv7	The bulk of the 32-bit Thumb instructions
T32_ARMv7_EE	The file to use alongside 'T32_ARMv7' if you want the ThumbEE
		instructions
T32_ARMv7_nEE	The file to use alongside 'T32_ARMv7' if you want the
		instructions that are available in the non-EE state.
XScale_DSP	To be used with 'ARMv7' in order to detect the XScale DSP
		(coprocessor 0) instructions.

The above encding files must be combined as indicated if you want a 'complete' set of encodings. For example, ARMv7, ARMv7_ASIMD and ARMv7_VFP can be used together to describe the complete ARMv7 instruction set, including the VFP and Advanced SIMD extensions. Note that some files require certain macros to be set correctly; check the top of each file for details.

Note that decgen currently lacks support for variable-length instructions. Therefore if the first halfword of a 32bit Thumb instruction is passed to the T16_ARMv7 encodings, it will match against the 'IS_T32' actions. If a 16bit instruction (padded to 32 bits) is passed to the T32_ARMv7 encodings, then it will match against the 'IS_T16' actions.

Also note that although the ARMv7 encodings will detect all the UNDEFINED instructions, for the most part detection of UNPREDICTABLE instructions is left to the individual actions. This is mainly to keep the encodings files from becoming overcomplicated, especially since an opcode that is unpredictable in one architecture version may be predictable in another, or some users of the decoder may not care about detecting unpredictable instructions at all.

The 'checkall.sh' bash script can be used to verify that the above encoding files are all still valid, or that any changes you have made to decgen haven't introduced bugs to the encoding parsing and verification stage. On a relatively modern x86 PC this will take around fifteen minutes. On a relatively modern ARM machine you can expect it to take around half an hour.


Optimisation options
====================

By default decgen will try to produce a decision tree which is as small as possible. However this has the effect of moving a lot of the bloat into the leaves (i.e. actions); each action function is responsible for extracting the relevant fields from the opcode and performing whatever operations you wish. When developing something like a disassembler which operates on the full instruction set, you'll often find that many actions are very similar to each other. If your compiler isn't smart enough to spot these cases and optimise the code itself, this is where decgen's optimisation options come in.


-opt-strings
------------

This option looks for string literals in the action code sequences and attempts to turn them into function parameters instead. If two or more actions are found to be identical apart from their string literals, then only one C function will be emitted for that action, and any string literals that differ will be turned into arguments. For something like a disassembler, this can have a large impact on the resulting code size.

Note that the C parsing code is very simple - although it should cope OK with comments, the \ character, etc., it won't correctly handle wide character string literals (e.g. L"foo"), and absolutely everything which isn't a string literal (including whitespace!) must match in order for it to consider merging together two actions.


-maxpreextract=<n>
------------------

Ordinarily when decgen generates code for executing an action, that generated code has to extract all the fields from the instruction that the action requires. But this can result in lots of duplicated code if several actions depend on the same fields. By using the '-maxpreextract' option, decgen will attempt to extract some of those common fields at higher levels of the tree, and pass them through as parameters to the leaf/action functions. The <n> parameter controls the maximum number of extra parameters that decgen will try to add to any one leaf function - tune this value to the one that gives the best results for your system.

Note that this option is currently incompatible with the -opt-strings option.


-treewalker
-----------

decgen prefers to generate the tree as pure code, using lots of static/inline functions to implement each node in the decition tree. However this can result in a lot of unwanted bloat - if the tree was to be output as a raw blob of data then it could be made a lot smaller. The -treewalker option is for just this - instead of writing out lots of code, decgen instead outputs a compact binary representation of the tree, and a small tree walker routine to walk the tree and execute the corresponding action.

Note that this option is incompatible with the -maxpreextract option, due to the elimination of all the node functions that make up the tree.


-useargstruct
-------------

Like -maxpreextract, this option tackles the issue of leaf nodes having to extract all the fields from the instruction themselves. However instead of merely shuffling the extraction of some fields further up the tree, this option causes decgen to take the approach of pre-extracting all fields and storing them in a struct. A pointer to this struct is then passed to each node in the tree (or directly to the action functions when combined with -treewalker), greatly reducing the amount of code needed to extract the fields, at the cost of slower execution due to all fields being extracted even though only a few of them are likely to be needed.


How decgen works
================

After parsing the encodings and actions, decgen constructs a decision tree using an algorithm similar to ID3. Each node in the tree can be a 'field' node, a 'constraint' node, or a 'leaf' node. The code generation stage translates each node into a single C function. 'field' nodes use a switch() statement to examine a sequence of consecutive bits of the opcode, and thus typically have 2^N children. 'constraint' nodes evaluate a constraint, and thus have two children. 'leaf' nodes contain code to perform the action associated with an encoding (or more correctly, the action associated with a kin).

Although the tree initially starts out as a 'proper' tree, the tree optimisation process detects and deduplicates identical nodes and subtrees, resulting in a structure that is closer to a DAG (directed acyclic graph) than a tree. The tree optimisation step also removes any redundant nodes which may have been inserted into the tree.


Compiling the source code
=========================

The 'source' folder contains the decgen source code. Any half-decent C++ compiler should be up to the job of compiling the code.


Bugs
====

From time to time decgen may throw an assert, or even worse, crash. This is likely due to a bug. If you find or fix a bug, then please get in touch with me!


History
=======

V1.41b - 2017/04/15
  * Rebuilt RISC OS binary with GCCSDK 4.7.4 release 3, to produce an ARMv8 compatible binary
  * No other changes
V1.41 - 2014/02/07
  * Fixed bug in regular code generator that would result in the wrong paths being taken for many trees
  * Improved action handling so that identical actions will be merged together
  * Tweaked code output so that nodes which only have one reference will be marked as inline functions, to help with compilers that aren't very good at automatic inlining
V1.40 - 2013/11/26
  * Added some options to control optimisation of generated code:
    * '-opt-strings' to optimise leaf nodes which are identical apart from containing differing string literals
    * '-maxpreextract=<n>' to allow fields required by several actions to be extracted further up the tree instead of only in the leaves
    * '-treewalker' to write the tree in a compact data form (with a small tree walker routine) instead of as pure code
    * '-useargstruct' to pre-extract all fields and store them in a structure which is passed down through the tree nodes
  * Added support for tree generation cache files ('-writecache', '-readcache' and '-updatecache' options), which can significantly improve tree generation times when the basic tree structure is the same as before
  * Assorted fixed for the bundled ARM/Thumb encoding files
  * Added support for concatenating fields in action definitions using ':'
V1.35 - 2012/10/20
  * Updated the bundled ARM/Thumb encoding files to bring them in line with issue C of the ARMv7 ARM
  * Changed to use C99 uint32_t/uint8_t data types instead of u32/u8
  * Fixed 'as if' actions to remap all instances of the encoding instead of just the first it finds  
V1.30 - 2011/09/25
  * Fixed a few typos in the supplied encodings
  * Added '-void' option to prevent decgen from attempting to return the value of decoder functions
  * Added hash to end of generated leaf function names, to prevent name clashes where multiple instances of an action are required to cope with differing parameters
  * Optimised node::collect to be roughly O(NlogN) instead of roughly O((N^2)/2)
  * Also fixed node::collect to return all pointers unique to node pointers, instead of just all pointers to pointers to unique nodes
  * Added a second node::collect implementation which just returns pointers to the unique nodes, instead of pointers to unique pointers to nodes
  * Reduced memory usage by making build_tree deduplicate the subtree it's just built if the root node is a mask node
  * Fixed node::shrink to not crash when shrinking a node containing null child pointers
  * Attempted to optimise determine_best_node() and split_on_field() to allow them to early-exit if they think they've found the best result, or to skip exhaustive checks if they think the node being checked can't score better than the current one. However the code is a bit broken and produces different trees as output, so it currently disabled.  
V1.20 - 2010/12/11
  * Fixed code to output all builtin functions required by the code instead of just some of them
  * Fixed the emitted C code for the bitcount function
  * Added #ifndef... guards to emitted builtin functions, so that users can easily replace them with more optimal versions where desireable
  * Made error messages much more useful. Filename and line number is now listed where appropriate. Most errors will list a column number too, but it's likely to be inaccurate if the line contained macros.
  * Upgraded macros to support parameters
  * Used new paramaterised macros to merge together the Thumb2 VFP/ASIMD encoding files with the ARM ones
  * Added FPA support to the ARMv7 & T32_ARMv7 encodings
  * Added encoding file for the XScale DSP coprocessor
  * Added support for 'as if' actions
V1.10 - 2010/05/23
  * Tidied source code a bit
  * Fixed a few mistakes in the supplied encoding definitions
  * Added code to validate the data types of constraint function parameters
  * Fixed typo in lsr macro
  * Added -pre, -post, -prefix=, -name= options
  * Fixed constraint_func destructor to use array delete operator
  * Added lazy evaluation to constraint_func::eval() for land() and lor() functions
  * Changed constraint_func::eval() to use a fixed-size array for storing function parameters instead of using a variable-size one. Using a variable-size array was overkill considering that functions accept at most two parameters, and variable-size arrays/alloca() isn't guaranteed to be any faster than an ordinary malloc()
  * Added code to simplify constraints before they get added to the constraint list for an encoding
  * Added new compute_match() function for calculating whether there is an opcode within the given subset that matches a particular encoding. Function works by beginning with a simple set of constraints and making them increasingly more complex until the constraint set is exhausted or the size of the opcode subset shrinks to zero. When a new constraint is being applied to the set, certain combinations of logical operators within the constraint will be detected and will use more optimised code paths for iterating over the new set and detecting whether it still contains any opcodes.
  * Changed split_on_constraint() and split_on_field() to use compute_match()
  * Fixed node::collect so that the node is only added to the set after its children. This ensures subtrees which have deduplicated children get deduplicated properly themselves.
  * Changed the node_constraint and node_mask constructors so that they no longer use opcode_subset::constrain(const encodingset &e) to shrink the subsets which their children inhabit. opcode_subset::constrain(const encodingset &e) is far too slow to be used in its current form, as it requires every opcode within the set to be enumerated.
  * Added code to shrink the mask area used by mask nodes, to eliminate unnecessary bits from the test
  * Removed unnecessary asserts from compute_gain() and build_tree() now that I've realised that null children can occur when generating trees for any complete set of encoding definitions, not just for fuzzy sets.
  * Added code to check for contradictions in constraints (constraints which are always false)
  * Optimised verification of fuzzy trees
  * Fixed infinite loop when verifying complete trees for 32bit opcodes
  * Optimised verification of complete trees
  * Added support for exporting the decision tree as a GraphML file, with minimal yEd markup so that the files can be viewed properly in yEd.
  * Improved tree shrinking code to swap a child's test with that of its parent if the resulting tree uses fewer nodes
  * All the above optimisations mean that encoding verification now takes approximately 50% of the time that it took in V1.00, and tree generation (for 32bit opcodes) now takes seconds instead of several hours (and likely aborting halfway through due to an inappropriate assert!)
V1.00 - 2010/05/12
  * First released version
  * Expect lots of rough edges, but hopefully not too many bugs!


Legal
=====

decgen is Copyright(c)2014, Jeffrey Lee
Allrightsreserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met: 
    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

Note that the above licence does not apply to the source code that decgen produces as output. The example encoding files, and the example program, are also provided free of any copyright or licensing.


Contact
=======

You can email me@phlamethrower.co.uk, or visit my website:
http://www.quote-egnufeb-quote-greaterthan-colon-hash-comma-underscore-at.info/

- Jeffrey Lee
