- Implementation of a small C11 (2011 ANSI C standard) to MIR compiler
- no optional standard features: variable size arrays, complex, atomic, thread local variables
- support of the following C extensions (many of them can be used for a better
  JIT implementation of dynamic programming languages):
  - `\e` escape sequence
  - binary numbers starting with the `0b` or `0B` prefix
  - macro `__has_include`
  - empty structures, unions, and initializer lists
  - range cases `case <start>...<finish>`
  - zero size arrays
  - statement expressions
  - labels as values (see analogous GNU C extension)
  - register variables (see analogous global register variable extension in GNU C)
  - builtins (some of them are part of the GNU C extensions):
    - `__builtin_expect(<cond>, <expected cond value>)` is used instead of `<cond>` to hint `c2mir` about the expected value. It results in better code generation by placing code related to the expected condition nearby
    - overflow builtins can be used for effective code generation in JITs for interpreters of languages using multi-precision integer numbers, like Ruby or Python:
      - `__builtin_add_overflow(v1,v2,&res)` makes `res=v1+v2` and returns non-zero if overflow occurs
      - `__builtin_sub_overflow(v1,v2,&res)` and `__builtin_mul_overflow(v1,v2,&res)` are analogous to the above builtin but perform subtraction and multiplication
    - builtins for jump calls and returns can be used for fast switching between JITted code and threaded code interpreters:
      - `__builtin_jcall(func)` calls the C function with void result through a direct jump. Such a function should return only by the next builtin
      - `__builtin_jret (addr)` returns from a function called by `__builtin_jcall` to the given address
    - builtins used for generating specialized code based on lazy basic block versioning (see blogpost how to use them):
      - `__builtin_prop_set(var, property_const)` sets the variable property to the given constant
      - `__builtin_prop_eq(var, property_const)` and `__builtin_prop_ne(var, property_const)` compare the current property value of the variable with the given constant and return true if they are correspondingly equal or not equal
- Minimal compiler code dependency. No additional tools (like yacc/flex) are used
- Simplicity of implementation over speed to make code easy to learn and maintain
- Four passes to divide compilation into manageable sub-tasks:
  - Preprocessor pass generating tokens
  - Parsing pass generating AST (Abstract Syntax Tree). To stay as close as possible to the ANSI standard grammar, a manually written PEG parser is used
  - Context pass checking context constraints and augmenting AST
  - Generation pass producing MIR
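Several of the extensions listed above can be sketched together in one small file. This is only an illustration: the function names (`char_class`, `low_nibble_max`, `run`, `checked_add`) and the toy opcodes are hypothetical, and the code relies solely on extensions named in the list (range cases, `0b` literals, statement expressions, labels as values, `__builtin_expect`, `__builtin_add_overflow`). Since these are also GNU C extensions, the same file should be accepted by GCC-compatible compilers.

```c
#include <limits.h>
#include <stdio.h>

/* Range cases: classify a character with `case <start>...<finish>`. */
static int char_class (int c) {
  switch (c) {
  case '0' ... '9': return 1; /* digit */
  case 'a' ... 'z':
  case 'A' ... 'Z': return 2; /* letter */
  default: return 0;          /* anything else */
  }
}

/* Binary literal (0b prefix) combined with a statement expression:
   returns the larger of the low 4 bits of a and b. */
static int low_nibble_max (int a, int b) {
  int mask = 0b1111;
  return ({
    int x = a & mask, y = b & mask;
    x > y ? x : y;
  });
}

/* Labels as values: a toy threaded-code interpreter.
   Hypothetical opcodes: 0 = increment, 1 = double, 2 = halt. */
static int run (const unsigned char *code) {
  void *dispatch[] = {&&op_inc, &&op_dbl, &&op_halt};
  int acc = 0;

  goto *dispatch[*code++];
op_inc:
  acc++;
  goto *dispatch[*code++];
op_dbl:
  acc *= 2;
  goto *dispatch[*code++];
op_halt:
  return acc;
}

/* Overflow builtin: returns 1 and stores the sum in *res when the
   addition fits in int, returns 0 on overflow. An interpreter would
   switch to its multi-precision representation on the 0 path.
   __builtin_expect marks overflow as the unlikely case. */
static int checked_add (int a, int b, int *res) {
  if (__builtin_expect (__builtin_add_overflow (a, b, res), 0)) return 0;
  return 1;
}
```

Adding a `main` that calls these functions gives a program that can be compiled and executed in the ways described later in this document.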
The C to MIR compiler can be used as a library and made a part of your code. It can also be used as a separate program, like a usual C compiler.
To recognize compilation by the C-to-MIR compiler, the compiler-specific
macros `__mirc__` and `__MIRC__`, both defined as 1, can be used.
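A minimal sketch of such a check is shown here. The helper names `compiler_name` and `print_compiler` are hypothetical. `__mirc__` is tested first because this document does not say whether the C-to-MIR compiler also defines `__GNUC__`.

```c
#include <stdio.h>

/* Report which compiler translated this file. */
static const char *compiler_name (void) {
#if defined(__mirc__)
  return "c2mir"; /* compiled by the C-to-MIR compiler */
#elif defined(__GNUC__)
  return "GNU-compatible compiler";
#else
  return "other C compiler";
#endif
}

/* Print the result of the compile-time check. */
static void print_compiler (void) {
  printf ("compiled by: %s\n", compiler_name ());
}
```

The same `#if defined(__mirc__)` test can guard code that depends on builtins available only in the C-to-MIR compiler.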
Additional information about the C-to-MIR compiler can be found in this blog post.
The project makefile builds the program `c2m`, which can compile C and
MIR files given on the command line and produce MIR code or execute
it:
- The compiler `c2m` has options `-E`, `-c`, `-S`, and `-o` like other C compilers:
  - `-E` stops the compiler after preprocessing and outputs the preprocessed file to standard output or to the file given after option `-o`
  - `-S` stops the compiler after generation of MIR code and outputs MIR textual representations of C source files and binary MIR files with suffix `.bmir`
  - `-c` also stops the compiler after generation of MIR code and outputs MIR binary representations of C source files and textual MIR files with suffix `.mir`
  - Output files for options `-S` and `-c` are created in the current directory and named after the source files, with suffix `.mir` and `.bmir` correspondingly
  - If you have one source file, you can also use option `-o` to set the output file
- You can give C source on the command line by using option `-s` followed by a string containing the C source
- You can read C source from the standard input by using option `-i`
- If options `-E`, `-c`, or `-S` are not given, all generated MIR code is linked and checked for the presence of function `main`. The whole generated code is output as binary MIR file `a.bmir` or as the file given by option `-o`
- Instead of outputting the linked file, you can execute the program by using options `-ei`, `-eg`, `-el`, or `-eb`:
  - `-ei` means execution of the code by the MIR interpreter
  - `-eg` means execution of machine code generated by the MIR-generator. The MIR-generator processes all MIR code first, before execution starts
  - `-el` means lazy code generation. It is analogous to `-eg`, but function code is generated on the first call of the function, so machine code is never generated for functions that are never called
  - `-eb` means lazy BB (basic block) code generation. It is analogous to `-el`, but BB code is generated on the first execution of the BB
  - Command line arguments after option `-ei`, `-eg`, `-el`, or `-eb` are not processed by the C to MIR compiler. Such arguments are passed to the generated and executed MIR program
  - The executed program can use functions from libraries `libc` and `libm`. They are always available
    - Option `-lxxx` makes library `libxxx` available for the program execution
    - Option `-Lxxx` adds library directory `xxx` to search for libraries given by options `-lxxx`. The search starts with the standard library directory and continues in directories given by preceding `-L` options in their order on the command line
- To generate a stand-alone executable, see the description of utility `b2ctab` in directory `mir-utils`
- Options `-D` and `-U` are analogous to the ones used in other C compilers for macro manipulations on the command line
- Option `-I` to add an include directory is analogous to other C compilers
- Option `-fpreprocessed` means skipping the preprocessor for C files
- Option `-fsyntax-only` means stopping after parsing and semantic checking of C files without MIR code generation
- Option `-w` means switching off reporting of all warnings
- Option `-pedantic` is used for stricter diagnostics about C standard conformance. It might be useful as C2MIR implements some GCC extensions of C
- Option `-O<n>` is used to set the MIR-generator optimization level. The optimization levels are described in the documentation for MIR generator API function `MIR_gen_set_optimize_level`
- Option `-dg[<level>]` is used for debugging the MIR-generator. It dumps debug information about the MIR-generator's work to `stderr` according to the debug level. If the level is omitted, the maximal level is used
- Besides C files, MIR textual files with suffix `.mir` and MIR binary files with suffix `.bmir` can be given on the command line. In this case these MIR files are read and added to the generated MIR code
- Simple examples of the compiler usage and execution of a C program:
c2m -c part1.c && c2m -S part2.c && c2m part1.bmir part2.mir -eg # variant 1
c2m part1.c part2.c && c2m a.bmir -eg # variant 2
c2m part1.c part2.c -eg # variant 3
The compiler can be used as a library and can be made a part of your
program. It can take C code from a file or memory. All the compiler
code is contained in file `c2mir.c`. Its interface is described in
file `c2mir.h`:
- Function `c2mir_init (MIR_context_t ctx)` initializes the compiler to generate MIR code in context `ctx`
- Function `c2mir_finish (MIR_context_t ctx)` finishes the compiler's work in context `ctx`. It frees some common memory used by the compiler working in context `ctx`
- Function `c2mir_compile (MIR_context_t ctx, struct c2mir_options *ops, int (*getc_func) (void *), void *getc_data, const char *source_name, FILE *output_file)` compiles one C code file. The function returns true (non-zero) in case of successful compilation. It frees all memory used to compile the file, so you can compile many files in the same context without program memory growth. Function `getc_func` provides access to the compiled C code, which can be in a file or memory. The function gets `getc_data` as its argument on every call. The name of the source file used for diagnostics is given by parameter `source_name`. Parameter `output_file` is analogous to the one given by option `-o` of `c2m`. Parameter `ops` is a pointer to a structure defining the compiler options:
  - Member `message_file` defines where to report errors and warnings. If its value is NULL, there will be no output
  - Members `macro_commands_num` and `macro_commands` direct the compiler as options `-D` and `-U` of `c2m`
  - Members `include_dirs_num` and `include_dirs` direct the compiler as options `-I`
  - Members `debug_p`, `verbose_p`, `ignore_warnings_p`, `no_prepro_p`, `prepro_only_p`, `syntax_only_p`, `pedantic_p`, `asm_p`, and `object_p` direct the compiler as options `-d`, `-v`, `-w`, `-fpreprocessed`, `-E`, `-fsyntax-only`, `-pedantic`, `-S`, and `-c` of `c2m`. If all values of `prepro_only_p`, `syntax_only_p`, `asm_p`, and `object_p` are zero, there will be no output files; only the generated MIR module will be kept in the memory of context `ctx`
  - Member `module_num` defines the index in the generated MIR module name (if there is any)
- Member