CLI Back-End

Latest News

2007-01-09

Added documentation about the back-end internal structure.

2006-09-07

Creation of st/cli branch.

Introduction

CLI (Common Language Infrastructure) is a framework that defines a platform-independent format for executables and a run-time environment for executing applications. The framework has been standardized by the European Computer Manufacturers Association (ECMA-335) and by the International Organization for Standardization (ISO/IEC 23271:2006). CLI executables are encoded in the Common Intermediate Language (CIL), a stack-based bytecode language. The CLI framework is designed to support several programming languages with different abstraction levels, from object-oriented managed languages to low-level languages with no managed execution at all.

The purpose of this project is to develop a GCC back-end that produces CLI-compliant binaries. The initial focus is on the C language (more precisely, C99); C++ is likely to be considered in the future, as well as any other language for which there is interest in a CLI back-end.

The implementation currently resides in the st/cli branch.

Contributing

Check out the st/cli branch following the instructions found in the SVN documentation.

Since this is a branch, the usual maintainer rules do not apply. The branch is maintained by Roberto Costa. Anyone may check in to the branch, provided that the change is coordinated with the branch maintainer and that the usual contribution and testing rules are followed. The branch is still under heavy development, and merges into the mainline are not planned yet.

Structure of the back-end

Unlike a typical GCC back-end, the CLI back-end stops the compilation flow at the end of the middle-end passes and, without going through any RTL pass, emits CIL bytecode directly from the GIMPLE representation. RTL is not a convenient representation from which to emit CLI code, while GIMPLE is much better suited to this purpose.

CIL bytecode is much higher-level than processor machine code. For instance, there is no concept of registers or of a stack frame; instructions operate on an unbounded set of local variables (which closely match the local variables of the source language) and on the elements on top of an evaluation stack. In addition, CIL bytecode is strongly typed and requires high-level data type information that is not preserved in RTL.
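To make the evaluation-stack model concrete, here is a minimal sketch of a toy interpreter. It is not part of the back-end: the opcode names mirror CIL mnemonics (ldc.i4, ldloc, stloc, add, mul), but the stk_ types, the instruction encoding and the fixed stack depth are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of a CIL-like machine. Opcode names mirror CIL
   mnemonics; the encoding and the interpreter are illustrative only. */
enum stk_opcode { STK_LDC_I4, STK_LDLOC, STK_STLOC, STK_ADD, STK_MUL };

struct stk_insn {
    enum stk_opcode op;
    int32_t arg; /* constant for LDC_I4, local index for LDLOC/STLOC */
};

#define STK_STACK_MAX 16

/* Runs `count` instructions over `locals` and returns the value left on
   top of the evaluation stack. There are no registers: every operand is
   either a local variable or a stack slot. */
int32_t stk_run(const struct stk_insn *code, size_t count, int32_t *locals)
{
    int32_t stack[STK_STACK_MAX];
    size_t sp = 0;

    for (size_t i = 0; i < count; i++) {
        switch (code[i].op) {
        case STK_LDC_I4: /* push a constant */
            assert(sp < STK_STACK_MAX);
            stack[sp++] = code[i].arg;
            break;
        case STK_LDLOC: /* push the value of a local */
            assert(sp < STK_STACK_MAX);
            stack[sp++] = locals[code[i].arg];
            break;
        case STK_STLOC: /* pop the top of the stack into a local */
            assert(sp >= 1);
            locals[code[i].arg] = stack[--sp];
            break;
        case STK_ADD: /* pop two operands, push their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] = stack[sp - 1] + stack[sp];
            break;
        case STK_MUL: /* pop two operands, push their product */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] = stack[sp - 1] * stack[sp];
            break;
        }
    }
    assert(sp >= 1);
    return stack[sp - 1];
}
```

With locals 0 and 1 holding 3 and 4, the sequence ldloc 0, ldloc 1, add, ldc.i4 2, mul leaves (3 + 4) * 2 on the stack. Real CIL additionally carries type information for every local and stack slot, which this sketch ignores.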

Target machine model

As with existing GCC back-ends, CLI is treated as a genuine target machine and, as such, it follows the GCC policy on the organization of back-end specific files.

Unfortunately, it is not feasible to define a single CLI target machine. The reason is that, for languages with unmanaged data such as C and C++, the pointer size of the target machine must be known at compile time. Therefore, separate 32-bit and 64-bit CLI targets are defined, namely cil32 and cil64. CLI binaries compiled for cil32 are not guaranteed to work on 64-bit machines and vice versa. Current work focuses on the cil32 target, but the differences between the two are minimal.
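The dependency on pointer size can be illustrated with a small sketch. The struct and the helper names (lay_node, lay_offset_after_pointer, lay_host_value_offset) are hypothetical, and the layout rule is the usual "round up to the field's alignment" convention rather than anything specific to cil32 or cil64.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record mixing a pointer with plain data, as C code often does. */
struct lay_node {
    struct lay_node *next;
    int value;
};

/* Offset of a field that follows one pointer, given the pointer size and the
   field alignment in bytes. Illustrative layout rule only. */
size_t lay_offset_after_pointer(size_t pointer_size, size_t field_align)
{
    assert(field_align != 0);
    /* Round the pointer size up to the next multiple of the alignment. */
    return (pointer_size + field_align - 1) / field_align * field_align;
}

/* Offset that the host compiler has baked in for lay_node.value. */
size_t lay_host_value_offset(void)
{
    return offsetof(struct lay_node, value);
}
```

Under this rule a 4-byte field after the pointer sits at offset 4 with 32-bit pointers and at offset 8 with 64-bit pointers. Because a C compiler folds such offsets and sizeof results into constants in the generated code, the choice has to be made when compiling, not when the binary is loaded.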

Since cil32 is the target machine, the machine model description is located in files config/cil32/cil32.*. This is an overview of that description:

CIL simplification pass

Though most GIMPLE tree codes closely match what is representable in CIL, some simply do not. Those codes could still be expressed in CIL bytecode by a CIL-emission pass; however, the required transformations (for example, those that generate new local temporary variables or modify the control-flow graph or types) are much harder to perform at CIL emission time than directly on GIMPLE expressions.

Pass simpcil (file config/cil32/tree-simp-cil.c) is in charge of performing such transformations. The input is any code in GIMPLE form; the output is still valid GIMPLE, but it contains only constructs for which CIL emission is straightforward. Such a constrained GIMPLE format is referred to as "CIL simplified" GIMPLE throughout this documentation.

The pass is currently performed just once, after leaving SSA form and immediately before the CIL emission. This is not a constraint; the only requirement is that the CIL emission is immediately preceded by a run of simpcil. The simpcil pass is designed to be idempotent, so it is perfectly fine to insert additional earlier runs in the compilation flow. Given its current position in the list of passes, simpcil does not yet support SSA form, though this is planned.
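The idempotence property can be sketched on a toy intermediate representation. Everything below is hypothetical: the simp_ types, the arena and the choice of lowering a "maximum" node into a comparison plus a conditional are assumptions made for illustration, not a description of what tree-simp-cil.c does.

```c
#include <assert.h>

/* Hypothetical toy IR: nodes live in a fixed arena and refer to their
   operands by index (-1 when an operand slot is unused). */
enum simp_code { SIMP_VAR, SIMP_GT, SIMP_MAX, SIMP_COND };

struct simp_expr {
    enum simp_code code;
    int op[3];
};

#define SIMP_ARENA_MAX 32

struct simp_arena {
    struct simp_expr nodes[SIMP_ARENA_MAX];
    int used;
};

/* Appends a node to the arena and returns its index. */
int simp_new(struct simp_arena *a, enum simp_code code, int op0, int op1, int op2)
{
    assert(a->used < SIMP_ARENA_MAX);
    struct simp_expr *e = &a->nodes[a->used];
    e->code = code;
    e->op[0] = op0;
    e->op[1] = op1;
    e->op[2] = op2;
    return a->used++;
}

/* Rewrites every SIMP_MAX(x, y) in place into SIMP_COND(SIMP_GT(x, y), x, y)
   and returns how many nodes were rewritten. The output contains no SIMP_MAX
   node, so a second run finds nothing to do: the pass is idempotent. */
int simp_lower_max(struct simp_arena *a)
{
    int rewritten = 0;
    int initial = a->used; /* nodes appended below are SIMP_GT, never SIMP_MAX */

    for (int i = 0; i < initial; i++) {
        if (a->nodes[i].code != SIMP_MAX)
            continue;
        int lhs = a->nodes[i].op[0];
        int rhs = a->nodes[i].op[1];
        int cmp = simp_new(a, SIMP_GT, lhs, rhs, -1);
        a->nodes[i].code = SIMP_COND;
        a->nodes[i].op[0] = cmp;
        a->nodes[i].op[1] = lhs;
        a->nodes[i].op[2] = rhs;
        rewritten++;
    }
    return rewritten;
}
```

Running simp_lower_max twice on the same arena rewrites nodes only the first time, which is the property that makes it safe to schedule extra runs of such a pass earlier in the pipeline.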

This is a non-exhaustive list of simpcil transformations:

CIL emission pass

Pass cil (file config/cil32/gen-cil.c) receives CIL-simplified GIMPLE as input and produces a CLI assembly file as output. It is the final pass of the compilation flow.

Before the actual emission, cil currently merges GIMPLE expressions in an attempt to eliminate local variables. Eliminating such variables improves both the performance and the size of the generated code (each useless local variable results in an avoidable pair of stloc and ldloc CIL opcodes). The resulting code is no longer valid GIMPLE; this is fine because the code stays in this form only within the pass. This is conceptually (perhaps not only conceptually) similar to what the out-of-ssa pass does; out-of-ssa may even be more powerful at this, since it operates on SSA form. It may be worthwhile to move the simpcil pass before out-of-ssa and to avoid any variable elimination in cil. This is still to be evaluated.
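The cost of a useless temporary can be shown with a small sketch. Note the swapped-in technique: the cil pass merges GIMPLE expressions before emission, whereas the code below is a peephole over an already emitted, branch-free instruction list, chosen only because it makes the avoidable stloc/ldloc pair visible. The peep_ names and the instruction encoding are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction list; opcode names mirror CIL mnemonics. */
enum peep_opcode { PEEP_LDLOC, PEEP_STLOC, PEEP_ADD, PEEP_MUL };

struct peep_insn {
    enum peep_opcode op;
    int local; /* local index for LDLOC/STLOC, unused otherwise */
};

#define PEEP_MAX_LOCALS 32

/* Removes every adjacent "stloc n; ldloc n" pair whose local n is loaded
   nowhere else, compacting `code` in place. Returns the new length.
   Assumes straight-line code: no instruction is a branch target. */
size_t peep_drop_temporaries(struct peep_insn *code, size_t count)
{
    size_t loads[PEEP_MAX_LOCALS] = {0};
    size_t out = 0;

    /* First pass: count how many times each local is loaded. */
    for (size_t i = 0; i < count; i++) {
        if (code[i].op == PEEP_LDLOC || code[i].op == PEEP_STLOC)
            assert(code[i].local >= 0 && code[i].local < PEEP_MAX_LOCALS);
        if (code[i].op == PEEP_LDLOC)
            loads[code[i].local]++;
    }

    /* Second pass: skip store/load pairs of single-use temporaries. The
       value simply stays on the evaluation stack instead. */
    for (size_t i = 0; i < count; ) {
        if (i + 1 < count
            && code[i].op == PEEP_STLOC
            && code[i + 1].op == PEEP_LDLOC
            && code[i].local == code[i + 1].local
            && loads[code[i].local] == 1) {
            i += 2;
            continue;
        }
        code[out++] = code[i++];
    }
    return out;
}
```

For a statement pair such as t = a + b; r = t * c, a naive emission produces eight instructions; dropping the store and reload of t leaves six, with the sum consumed directly from the stack by mul.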

Here is an overview of how the cil pass handles some GIMPLE constructs. Constructs for which the emission is straightforward are omitted.

Readings

[1] ECMA, Common Language Infrastructure (CLI), 4th edition, June 2006.
[2] John Gough, Compiling for the .NET Common Language Runtime (CLR), Prentice Hall, ISBN 0-13-062296-6.
[3] Serge Lidin, Inside Microsoft .NET IL Assembler, Microsoft Press, ISBN 073561547.