cmd/compile contains the main packages that form the Go compiler. The compiler may be logically split into four phases, which we will briefly describe alongside the list of packages that contain their code.
You may sometimes hear the terms "front-end" and "back-end" when referring to the compiler. Roughly speaking, these translate to the first two and last two phases we are going to list here. A third term, "middle-end", often refers to much of the work that happens in the second phase.
Note that the go/* family of packages, such as go/parser and go/types, are mostly unused by the compiler. Since the compiler was initially written in C, the go/* packages were developed to enable writing tools working with Go code, such as gofmt and vet. However, over time the compiler's internal APIs have slowly evolved to be more familiar to users of the go/* packages.
It should be clarified that the name "gc" stands for "Go compiler", and has little to do with uppercase "GC", which stands for garbage collection.
cmd/compile/internal/syntax (lexer, parser, syntax tree)
In the first phase of compilation, source code is tokenized (lexical analysis), parsed (syntax analysis), and a syntax tree is constructed for each source file.
Each syntax tree is an exact representation of the respective source file, with nodes corresponding to the various elements of the source such as expressions, declarations, and statements. The syntax tree also includes position information which is used for error reporting and the creation of debugging information.
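As a rough illustration of what a syntax tree with position information looks like, the sketch below parses a small source string and reports where each function declaration starts. It uses the public go/parser, go/ast, and go/token packages as a stand-in, because cmd/compile/internal/syntax cannot be imported from outside the compiler; the helper name funcPositions and the sample source are assumptions made for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a tiny sample file used only for this illustration.
const src = `package demo

func add(a, b int) int {
	return a + b
}
`

// funcPositions (a hypothetical helper) parses source and returns one
// "name@line:col" entry per function declaration found in the tree.
func funcPositions(source string) ([]string, error) {
	fset := token.NewFileSet() // records position information for all nodes
	file, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	// Walk every node in the tree and pick out function declarations.
	ast.Inspect(file, func(n ast.Node) bool {
		if fn, ok := n.(*ast.FuncDecl); ok {
			pos := fset.Position(fn.Pos())
			out = append(out, fmt.Sprintf("%s@%d:%d", fn.Name.Name, pos.Line, pos.Column))
		}
		return true
	})
	return out, nil
}

func main() {
	decls, err := funcPositions(src)
	if err != nil {
		panic(err)
	}
	for _, d := range decls {
		fmt.Println(d)
	}
}
```

The compiler's own syntax package builds a comparable tree with its own node types; the example only shows the general shape of the idea.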
cmd/compile/internal/types2 (type checking)
The types2 package is a port of go/types to use the syntax package's AST instead of go/ast.
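Since types2 is described as a port of go/types, the public package gives a reasonable feel for what type checking produces. The sketch below type-checks a small file and looks up the type assigned to a package-level name. The helper typeOfDecl and the sample sources are assumptions for illustration; types2 itself is internal to the compiler and is not used here.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// goodSrc type-checks cleanly; badSrc contains a deliberate type error.
const goodSrc = `package demo

var x = 1 + 2.5

func half(n int) float64 { return float64(n) / 2 }
`

const badSrc = `package demo

var s string = 1
`

// typeOfDecl (a hypothetical helper) parses and type-checks source, then
// returns the type of the package-level object called name.
func typeOfDecl(source, name string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return "", err
	}
	// The sample sources import nothing, so no Importer is configured.
	conf := types.Config{}
	pkg, err := conf.Check("demo", fset, []*ast.File{file}, nil)
	if err != nil {
		return "", err
	}
	obj := pkg.Scope().Lookup(name)
	if obj == nil {
		return "", fmt.Errorf("%s not declared", name)
	}
	return obj.Type().String(), nil
}

func main() {
	for _, name := range []string{"x", "half"} {
		t, err := typeOfDecl(goodSrc, name)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s has type %s\n", name, t)
	}
	if _, err := typeOfDecl(badSrc, "s"); err != nil {
		fmt.Println("type error:", err)
	}
}
```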
cmd/compile/internal/types
(compiler types)cmd/compile/internal/ir
(compiler AST)cmd/compile/internal/typecheck
(AST transformations)cmd/compile/internal/noder
(create compiler AST)The compiler middle end uses its own AST definition and representation of Go types carried over from when it was written in C. All of its code is written in terms of these, so the next step after type checking is to convert the syntax and types2 representations to ir and types. This process is referred to as "noding."
There are currently two noding implementations:
irgen (aka "-G=3" or sometimes "noder2") is the implementation used starting with Go 1.18, and
Unified IR is another, in-development implementation (enabled with GOEXPERIMENT=unified), which also implements import/export and inlining.
Up through Go 1.18, there was a third noding implementation (just "noder" or "-G=0"), which directly converted the pre-type-checked syntax representation into IR and then invoked package typecheck's type checker. This implementation was removed after Go 1.18, so now package typecheck is only used for IR transformations.
cmd/compile/internal/deadcode (dead code elimination)
cmd/compile/internal/inline (function call inlining)
cmd/compile/internal/devirtualize (devirtualization of known interface method calls)
cmd/compile/internal/escape (escape analysis)
Several optimization passes are performed on the IR representation: dead code elimination, (early) devirtualization, function call inlining, and escape analysis.
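To see these passes at work, a small program with a few contrasting functions can be built with go build -gcflags=-m, which asks the compiler to print its optimization decisions. The program below is only a sketch: the function names are made up for this example, and the comments describe what these passes would plausibly decide rather than quoting actual compiler output, which varies between Go versions.

```go
package main

import "fmt"

// square is a small leaf function, the kind of call the inliner
// would typically replace with its body at the call site.
func square(x int) int {
	return x * x
}

// newCounter returns the address of a local variable, so escape
// analysis has to conclude that c outlives the call.
func newCounter() *int {
	c := 0
	return &c
}

// sumLocal only uses buf inside the function, so nothing forces
// buf to outlive the call.
func sumLocal() int {
	buf := [4]int{1, 2, 3, 4}
	total := 0
	for _, v := range buf {
		total += v
	}
	return total
}

func main() {
	p := newCounter()
	*p += square(3)
	fmt.Println(*p, sumLocal())
}
```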
cmd/compile/internal/walk (order of evaluation, desugaring)
The final pass over the IR representation is "walk," which serves two purposes:
It decomposes complex statements into individual, simpler statements, introducing temporary variables and respecting order of evaluation. This step is also referred to as "order."
It desugars higher-level Go constructs into more primitive ones. For example, switch statements are turned into binary search or jump tables, and operations on maps and channels are replaced with runtime calls.
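The switch desugaring mentioned above can be pictured with a hand-written equivalent. The sketch below pairs an ordinary switch with a manually written binary search over the same constants. It illustrates the idea only and is not the code that walk actually emits; both function names are hypothetical.

```go
package main

import "fmt"

// nameSwitch is the source-level form: a switch over sorted constants.
func nameSwitch(n int) string {
	switch n {
	case 1:
		return "one"
	case 3:
		return "three"
	case 5:
		return "five"
	case 7:
		return "seven"
	default:
		return "other"
	}
}

// nameSearch is a hand-written approximation of a binary-search lowering:
// first split the cases around the middle value, then test for equality.
func nameSearch(n int) string {
	if n <= 3 {
		if n == 1 {
			return "one"
		}
		if n == 3 {
			return "three"
		}
	} else {
		if n == 5 {
			return "five"
		}
		if n == 7 {
			return "seven"
		}
	}
	return "other"
}

func main() {
	for n := 0; n <= 8; n++ {
		fmt.Println(n, nameSwitch(n), nameSearch(n))
	}
}
```

With four cases the saving is negligible; the split matters for switches with many cases, where a linear chain of comparisons would grow with the number of cases.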
cmd/compile/internal/ssa (SSA passes and rules)
cmd/compile/internal/ssagen (converting IR to SSA)
In this phase, IR is converted into Static Single Assignment (SSA) form, a lower-level intermediate representation with specific properties that make it easier to implement optimizations and to eventually generate machine code from it.
During this conversion, function intrinsics are applied. These are special functions that the compiler has been taught to replace with heavily optimized code on a case-by-case basis.
Certain nodes are also lowered into simpler components during the AST to SSA conversion, so that the rest of the compiler can work with them. For instance, the copy builtin is replaced by memory moves, and range loops are rewritten into for loops. Some of these currently happen before the conversion to SSA due to historical reasons, but the long-term plan is to move all of them here.
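The range-loop rewrite can be sketched at the source level. Below, sumRange uses range and sumLowered is a hand-written approximation of the plain for loop it could be turned into, with the range expression and its length evaluated once up front. The exact form the compiler produces is not shown in this document, so treat the second function as an assumption for illustration.

```go
package main

import "fmt"

// sumRange is the source-level form using a range loop.
func sumRange(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// sumLowered approximates the lowered form: an index-based for loop
// over a copy of the slice header with a precomputed length.
func sumLowered(xs []int) int {
	total := 0
	tmp := xs     // the range expression is evaluated exactly once
	n := len(tmp) // the length is read once, before the loop starts
	for i := 0; i < n; i++ {
		x := tmp[i]
		total += x
	}
	return total
}

func main() {
	xs := []int{3, 1, 4, 1, 5, 9}
	fmt.Println(sumRange(xs), sumLowered(xs))
}
```

To inspect the real SSA for a function, setting the GOSSAFUNC environment variable to the function's name during a build makes the compiler write an ssa.html report covering each pass.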
Then, a series of machine-independent passes and rules are applied. These do not concern any single computer architecture, and thus run on all GOARCH variants. These passes include dead code elimination, removal of unneeded nil checks, and removal of unused branches. The generic rewrite rules mainly concern expressions, such as replacing some expressions with constant values, and optimizing multiplications and float operations.
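The flavor of such expression rewrite rules can be shown with a toy rewriter over a tiny integer expression tree. Everything below (the Expr types and the rewrite function) is invented for this sketch and is unrelated to the compiler's real SSA values or rule files; it only demonstrates constant folding and a few algebraic simplifications.

```go
package main

import "fmt"

// Expr is a toy integer expression: a constant, a variable, or a binary op.
type Expr interface{ String() string }

type Const struct{ Val int }
type Var struct{ Name string }
type BinOp struct {
	Op   byte // '+' or '*'
	L, R Expr
}

func (c Const) String() string { return fmt.Sprint(c.Val) }
func (v Var) String() string   { return v.Name }
func (b BinOp) String() string { return fmt.Sprintf("(%s %c %s)", b.L, b.Op, b.R) }

// rewrite applies a handful of rules bottom-up until none match.
func rewrite(e Expr) Expr {
	b, ok := e.(BinOp)
	if !ok {
		return e // constants and variables are already in simplest form
	}
	l, r := rewrite(b.L), rewrite(b.R)
	lc, lok := l.(Const)
	rc, rok := r.(Const)
	switch {
	case lok && rok && b.Op == '+': // constant folding
		return Const{lc.Val + rc.Val}
	case lok && rok && b.Op == '*': // constant folding
		return Const{lc.Val * rc.Val}
	case rok && b.Op == '*' && rc.Val == 1: // x * 1 => x
		return l
	case rok && b.Op == '*' && rc.Val == 0: // x * 0 => 0
		return Const{0}
	case rok && b.Op == '+' && rc.Val == 0: // x + 0 => x
		return l
	}
	return BinOp{b.Op, l, r}
}

func main() {
	e := BinOp{'+', Var{"y"}, BinOp{'*', Const{2}, Const{3}}}
	fmt.Println(e, "=>", rewrite(e))
}
```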
cmd/compile/internal/ssa (SSA lowering and arch-specific passes)
cmd/internal/obj (machine code generation)
The machine-dependent phase of the compiler begins with the "lower" pass, which rewrites generic values into their machine-specific variants. For example, on amd64 memory operands are possible, so many load-store operations may be combined.
Note that the lower pass runs all machine-specific rewrite rules, and thus it currently applies lots of optimizations too.
Once the SSA has been "lowered" and is more specific to the target architecture, the final code optimization passes are run. This includes yet another dead code elimination pass, moving values closer to their uses, the removal of local variables that are never read from, and register allocation.
Other important pieces of work done as part of this step include stack frame layout, which assigns stack offsets to local variables, and pointer liveness analysis, which computes which on-stack pointers are live at each GC safe point.
At the end of the SSA generation phase, Go functions have been transformed into a series of obj.Prog instructions. These are passed to the assembler (cmd/internal/obj), which turns them into machine code and writes out the final object file. The object file will also contain reflect data, export data, and debugging information.
To dig deeper into how the SSA package works, including its passes and rules, head to cmd/compile/internal/ssa/README.md.