Welcome to Mesa's GLSL compiler.  A brief overview of how things flow:

1) A lex and yacc-based preprocessor takes the incoming shader string
and produces a new string containing the preprocessed shader.  This
takes care of things like #if, #ifdef, #define, and preprocessor macro
invocations.  Note that #version, #extension, and some others are
passed straight through.  See glcpp/*.

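As a rough illustration of what this stage does (not how glcpp is
implemented, which is a lex/yacc grammar), here is a minimal Python
sketch.  The preprocess() function is hypothetical and assumes that
only object-like #define and #ifdef/#ifndef/#else/#endif need to be
handled, one directive per line:

```python
import re

def preprocess(source):
    """Toy preprocessor: string in, preprocessed string out."""
    macros = {}
    active = [True]   # stack: is the current conditional region emitted?
    out = []
    for line in source.splitlines():
        words = line.split()
        directive = words[0] if words and words[0].startswith("#") else None
        if directive in ("#ifdef", "#ifndef"):
            defined = words[1] in macros
            taken = defined if directive == "#ifdef" else not defined
            active.append(active[-1] and taken)
        elif directive == "#else":
            taken = active.pop()
            active.append(active[-1] and not taken)
        elif directive == "#endif":
            active.pop()
        elif not active[-1]:
            continue                      # inside a skipped region
        elif directive == "#define":
            macros[words[1]] = " ".join(words[2:])
        elif directive in ("#version", "#extension"):
            out.append(line)              # passed straight through
        else:
            # Expand macro invocations in ordinary source lines.
            out.append(re.sub(r"[A-Za-z_]\w*",
                              lambda m: macros.get(m.group(0), m.group(0)),
                              line))
    return "\n".join(out)

shader = "\n".join([
    "#version 120",
    "#define SCALE 2.0",
    "#ifdef SCALE",
    "float s = SCALE;",
    "#else",
    "float s = 1.0;",
    "#endif",
])
print(preprocess(shader))
```

The #version line survives untouched while the #define and the
conditional are consumed, which is the behavior described above.
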
2) A lex and yacc-based parser takes the preprocessed string and
generates the AST (abstract syntax tree).  Almost no checking is
performed in this stage.  See glsl_lexer.ll and glsl_parser.yy.

3) The AST is converted to "HIR".  This is the intermediate
representation of the compiler.  Constructors are generated, function
calls are resolved to particular function signatures, and all the
semantic checking is performed.  See ast_*.cpp for the conversion, and
ir.h for the IR structures.

4) The driver (Mesa, or main.cpp for the standalone binary) performs
optimizations.  These include copy propagation, dead code elimination,
constant folding, and others.  Generally the driver will call
optimizations in a loop, as each may open up opportunities for other
optimizations to do additional work.  See most files called ir_*.cpp.

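The "call optimizations in a loop" idea can be sketched as follows.
This is a toy in Python, not the driver's code: the IR here is a flat
list of (dest, op, args) tuples in which every variable is assigned
exactly once, and all function names are hypothetical.  Each pass
reports whether it made progress, and the loop stops when none did:

```python
def propagate_constants(instrs):
    """Replace uses of variables that are known to hold a constant."""
    known, out, progress = {}, [], False
    for dest, op, args in instrs:
        new_args = tuple(known.get(a, a) if isinstance(a, str) else a
                         for a in args)
        progress = progress or new_args != args
        if op == "mov" and not isinstance(new_args[0], str):
            known[dest] = new_args[0]
        out.append((dest, op, new_args))
    return out, progress

def fold_constants(instrs):
    """Evaluate add/mul whose operands are all constants."""
    ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    out, progress = [], False
    for dest, op, args in instrs:
        if op in ops and not any(isinstance(a, str) for a in args):
            out.append((dest, "mov", (ops[op](*args),)))
            progress = True
        else:
            out.append((dest, op, args))
    return out, progress

def eliminate_dead_code(instrs, outputs):
    """Drop assignments whose destination is never read."""
    used = set(outputs)
    for _, _, args in instrs:
        used.update(a for a in args if isinstance(a, str))
    out = [i for i in instrs if i[0] in used]
    return out, len(out) != len(instrs)

def optimize(instrs, outputs):
    progress = True
    while progress:                       # run until nothing changes
        progress = False
        passes = (propagate_constants, fold_constants,
                  lambda i: eliminate_dead_code(i, outputs))
        for run_pass in passes:
            instrs, changed = run_pass(instrs)
            progress = progress or changed
    return instrs

program = [
    ("a", "mov", (2.0,)),
    ("b", "mul", ("a", 3.0)),
    ("c", "add", ("b", 1.0)),
    ("out", "mov", ("c",)),
]
print(optimize(program, {"out"}))   # [('out', 'mov', (7.0,))]
```

No single pass gets this result alone: propagation exposes a fold,
the fold exposes more propagation, and each round leaves dead
assignments behind for elimination.
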
5) Linking is performed.  This checks that the outputs of the vertex
shader match the inputs of the fragment shader, and assigns locations
to uniforms, attributes, and varyings.  See linker.cpp.

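A minimal sketch of that interface check, in Python.  This is an
illustration only: link_varyings() is a hypothetical name, shaders are
reduced to name-to-type dictionaries, and the real rules in linker.cpp
cover far more cases (qualifiers, arrays, built-ins, and so on):

```python
def link_varyings(vs_outputs, fs_inputs):
    """Both arguments map a varying name to its GLSL type name."""
    errors = []
    for name, fs_type in sorted(fs_inputs.items()):
        if name not in vs_outputs:
            errors.append(f"{name}: read by fragment shader, "
                          f"not written by vertex shader")
        elif vs_outputs[name] != fs_type:
            errors.append(f"{name}: {vs_outputs[name]} in vertex shader, "
                          f"{fs_type} in fragment shader")
    locations = {}
    if not errors:
        # Assumed policy for the sketch: one slot per varying, by name.
        locations = {name: slot
                     for slot, name in enumerate(sorted(fs_inputs))}
    return errors, locations

errors, locations = link_varyings(
    {"normal": "vec3", "uv": "vec2"},
    {"uv": "vec2", "normal": "vec3"},
)
print(errors, locations)   # [] {'normal': 0, 'uv': 1}
```
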
6) The driver may perform additional optimization at this point, as
for example dead code elimination previously couldn't remove functions
or global variable usage when we didn't know what other code would be
linked in.

7) The driver performs code generation out of the IR, taking a linked
shader program and producing a compiled program for each stage.  See
../mesa/program/ir_to_mesa.cpp for Mesa IR code generation.

FAQ:

Q: What is HIR versus IR versus LIR?

A: The idea behind the naming was that ast_to_hir would produce a
high-level IR ("HIR"), with things like matrix operations, structure
assignments, etc., present.  A series of lowering passes would occur
that do things like break matrix multiplication into a series of dot
products/MADs, make structure assignment be a series of assignments of
components, flatten if statements into conditional moves, and such,
producing a low-level IR ("LIR").

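To make "break matrix multiplication into a series of dot
products/MADs" concrete, here is a small numeric sketch in Python.
It is not a lowering pass over the IR; it only shows that the two
decompositions of mat * vec agree.  Matrices are assumed to be stored
as a list of columns, and the helper names are hypothetical:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mad(a, s, c):
    """Component-wise multiply-add: vector a times scalar s, plus c."""
    return [x * s + y for x, y in zip(a, c)]

def mat_vec_as_dots(columns, vec):
    # One dot product per matrix row.
    return [dot(row, vec) for row in zip(*columns)]

def mat_vec_as_mads(columns, vec):
    # One multiply-add per matrix column, accumulating the result.
    acc = [0.0] * len(columns[0])
    for column, scalar in zip(columns, vec):
        acc = mad(column, scalar, acc)
    return acc

m = [[1.0, 2.0], [3.0, 4.0]]   # columns (1, 2) and (3, 4)
v = [5.0, 6.0]
print(mat_vec_as_dots(m, v), mat_vec_as_mads(m, v))
# [23.0, 34.0] [23.0, 34.0]
```

Which form a backend prefers depends on whether it has a native dot
product or a native MAD.
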
However, it now appears that each driver will have different
requirements for an LIR.  A 915-generation chipset wants all functions
inlined, all loops unrolled, all ifs flattened, no variable array
accesses, and matrix multiplication broken down.  The Mesa IR backend
for swrast would like matrices and structure assignment broken down,
but it can support function calls and dynamic branching.  A 965 vertex
shader IR backend could potentially even handle some matrix operations
without breaking them down, but the 965 fragment shader IR backend
would want to break (almost) all operations down channel-wise and
perform optimization on that.  As a result, there's no single
low-level IR that will make everyone happy.  So that usage has fallen
out of favor, and each driver will perform a series of lowering passes
to take the HIR down to whatever restrictions it wants to impose
before doing codegen.

Q: How is the IR structured?

A: The best way to get started seeing it would be to run the
standalone compiler against a shader:

./glsl_compiler --dump-lir \
	~/src/piglit/tests/shaders/glsl-orangebook-ch06-bump.frag

So for example one of the ir_instructions in main() contains:

(assign (constant bool (1)) (var_ref litColor)  (expression vec3 *
 (var_ref SurfaceColor) (var_ref __retval) ) )

Or more visually:
                     (assign)
                 /       |        \
        (var_ref)  (expression *)  (constant bool 1)
         /          /           \
(litColor)      (var_ref)    (var_ref)
                  /                  \
           (SurfaceColor)          (__retval)

which came from:

litColor = SurfaceColor * max(dot(normDelta, LightDir), 0.0);

(the max call is not represented in this expression tree, as it was a
function call that got inlined but not brought into this expression
tree)

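For poking at a dump like the one above outside the compiler, the
printed form is easy to read back into nested lists.  The following
Python sketch is not Mesa's ir_reader; parse_sexpr() and
referenced_variables() are hypothetical helpers that assume
well-formed, balanced input:

```python
import re

def parse_sexpr(text):
    """Turn '(a (b c))' into ['a', ['b', 'c']]."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1                 # consume the closing paren
        return node

    return read()

def referenced_variables(node):
    """List the names under every (var_ref NAME) in the tree."""
    if isinstance(node, str):
        return []
    if node and node[0] == "var_ref":
        return [node[1]]
    return [name for child in node
            for name in referenced_variables(child)]

dump = ("(assign (constant bool (1)) (var_ref litColor)  "
        "(expression vec3 * (var_ref SurfaceColor) (var_ref __retval) ) )")
tree = parse_sexpr(dump)
print(tree[0])                      # assign
print(referenced_variables(tree))   # ['litColor', 'SurfaceColor', '__retval']
```
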
Each of those nodes is a subclass of ir_instruction.  A particular
ir_instruction instance may only appear once in the whole IR tree with
the exception of ir_variables, which appear once as variable
declarations:

(declare () vec3 normDelta)

and multiple times as the targets of variable dereferences:
...
(assign (constant bool (1)) (var_ref __retval) (expression float dot
 (var_ref normDelta) (var_ref LightDir) ) )
...
(assign (constant bool (1)) (var_ref __retval) (expression vec3 -
 (var_ref LightDir) (expression vec3 * (constant float (2.000000))
 (expression vec3 * (expression float dot (var_ref normDelta) (var_ref
 LightDir) ) (var_ref normDelta) ) ) ) )
...

Each node has a type.  Expressions may involve several different types:
(declare (uniform ) mat4 gl_ModelViewMatrix)
(assign (constant bool (1)) (var_ref constructor_tmp) (expression
 vec4 * (var_ref gl_ModelViewMatrix) (var_ref gl_Vertex) ) )

An expression tree can be arbitrarily deep, and the compiler tries to
keep them structured like that so that things like algebraic
optimizations ((color * 1.0 == color) and ((mat1 * mat2) * vec == mat1
* (mat2 * vec))) or recognizing operation patterns for code generation
(vec1 * vec2 + vec3 == mad(vec1, vec2, vec3)) are easier.  This comes
at the expense of additional trickery in implementing some
optimizations like CSE where one must navigate an expression tree.

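Here is a small sketch of why deep expression trees make those two
rewrites easy.  It is written in Python over nested tuples rather
than ir_expression nodes, handles only binary "*" and "+", and
simplify() is a hypothetical name:

```python
def simplify(expr):
    """Bottom-up rewrite: x * 1.0 -> x, then a * b + c -> mad(a, b, c)."""
    if not isinstance(expr, tuple):
        return expr                      # variable name or constant
    op, *args = expr
    args = [simplify(a) for a in args]
    if op == "*":
        if args[1] == 1.0:
            return args[0]
        if args[0] == 1.0:
            return args[1]
    if op == "+":
        for mul, addend in ((args[0], args[1]), (args[1], args[0])):
            if isinstance(mul, tuple) and mul[0] == "*":
                return ("mad", mul[1], mul[2], addend)
    return (op, *args)

tree = ("+", ("*", "vec1", ("*", "vec2", 1.0)), "vec3")
print(simplify(tree))   # ('mad', 'vec1', 'vec2', 'vec3')
```

Because the multiply is still a child of the add, the pattern is
visible in a single node visit; had the multiply been split into a
separate assignment to a temporary, the pass would need to track
where that temporary came from.
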
Q: Why no SSA representation?

A: Converting an IR tree to SSA form makes dead code elimination,
common subexpression elimination, and many other optimizations much
easier.  However, in our primarily vector-based language, there are
some major questions as to how it would work.  Do we do SSA on the
scalar or vector level?  If we do it at the vector level, we're going
to end up with many different versions of the variable when
encountering code like:

(assign (constant bool (1)) (swiz x (var_ref __retval) ) (var_ref a) )
(assign (constant bool (1)) (swiz y (var_ref __retval) ) (var_ref b) )
(assign (constant bool (1)) (swiz z (var_ref __retval) ) (var_ref c) )

If every masked update of a component relies on the previous value of
the variable, then we're probably going to be quite limited in our
dead code elimination wins, and recognizing common expressions may
just not happen.  On the other hand, if we operate channel-wise, then
we'll be prone to optimizing the operation on one of the channels at
the expense of making its instruction flow different from the other
channels, and a vector-based GPU would end up with worse code than if
we didn't optimize operations on that channel!

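The vector-level problem can be shown with a few lines of Python.
This is only a sketch of naive SSA renaming, not something the
compiler does; vector_ssa() and the write_mask() operation it prints
are hypothetical:

```python
def vector_ssa(writes):
    """writes: list of (variable, component, source) masked updates."""
    version = {}
    renamed = []
    for var, component, source in writes:
        old = version.get(var, 0)
        version[var] = old + 1
        # Each new version reads the previous one to keep the
        # components that were not written.
        renamed.append(f"{var}_{old + 1} = "
                       f"write_mask({var}_{old}, {component}, {source})")
    return renamed

for line in vector_ssa([("__retval", "x", "a"),
                        ("__retval", "y", "b"),
                        ("__retval", "z", "c")]):
    print(line)
```

Three component writes yield three versions of __retval, each
depending on the one before it, so none of them looks dead on its own.
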
Once again, it appears that our optimization requirements are driven
significantly by the target architecture.  For now, targeting the Mesa
IR backend, SSA does not appear to be that important to producing
excellent code, but we do expect to do some SSA-based optimizations
for the 965 fragment shader backend when that is developed.

Q: How should I expand instructions that take multiple backend instructions?

A: Sometimes you'll have to do the expansion in your code generation.
However, in many cases you'll want to do a pass over the IR to convert
non-native instructions to a series of native instructions.  For
example, for the Mesa backend we have ir_div_to_mul_rcp.cpp because
Mesa IR (like many hardware backends) has only a reciprocal
instruction, not a divide.  Implementing non-native instructions this
way gives the chance for constant folding to occur, so (a / 2.0)
becomes (a * 0.5) after codegen instead of (a * (1.0 / 2.0)).

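The divide example can be sketched in a few lines of Python over
nested tuples.  This is not the code in ir_div_to_mul_rcp.cpp; the
lower_div() and fold() names and the tuple encoding are assumptions
made for the illustration:

```python
def lower_div(expr):
    """Rewrite (a / b) as (a * rcp(b)) throughout the tree."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [lower_div(a) for a in args]
    if op == "/":
        return ("*", args[0], ("rcp", args[1]))
    return (op, *args)

def fold(expr):
    """Constant-fold rcp of a literal."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [fold(a) for a in args]
    if op == "rcp" and isinstance(args[0], float):
        return 1.0 / args[0]
    return (op, *args)

lowered = lower_div(("/", "a", 2.0))
print(lowered)         # ('*', 'a', ('rcp', 2.0))
print(fold(lowered))   # ('*', 'a', 0.5)
```

Doing the lowering on the IR, before codegen, is what lets the
ordinary constant folder see the rcp of a constant.
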
Q: How should I handle my special hardware instructions with respect to IR?

A: Our current theory is that if multiple targets have an instruction for
some operation, then we should probably be able to represent that in
the IR.  Generally this is in the form of an ir_{bin,un}op expression
type.  For example, we initially implemented fract() using (a -
floor(a)), but both 945 and 965 have instructions to give that result,
and it would also simplify the implementation of mod(), so
ir_unop_fract was added.  The following areas need updating to add a
new expression type:

ir.h (new enum)
ir.cpp:operator_strs (used for ir_reader)
ir_constant_expression.cpp (you probably want to be able to constant fold)
ir_validate.cpp (check users have the right types)

You may also need to update the backends if they will see the new expr type:

../mesa/program/ir_to_mesa.cpp

You can then use the new expression from builtins (if all backends
would rather see it), or scan the IR and convert to use your new
expression type (see ir_mod_to_floor, for example).

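As a numeric check of why a fract operation simplifies mod(), here
is a short Python sketch.  It uses the GLSL definition
mod(x, y) = x - y * floor(x / y); the function names are hypothetical
and the inputs are chosen to be exactly representable in floating
point:

```python
import math

def fract(a):
    return a - math.floor(a)

def mod_via_floor(x, y):
    return x - y * math.floor(x / y)

def mod_via_fract(x, y):
    # Same value, expressed with a single fract instead of floor.
    return y * fract(x / y)

print(mod_via_floor(5.5, 2.0), mod_via_fract(5.5, 2.0))     # 1.5 1.5
print(mod_via_floor(-1.5, 2.0), mod_via_fract(-1.5, 2.0))   # 0.5 0.5
```

On hardware with a native fract instruction, the second form needs
one fewer arithmetic operation.
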
Q: How is memory management handled in the compiler?

A: The hierarchical memory allocator "talloc" developed for the Samba
project is used, so that things like optimization passes don't have to
worry about their garbage collection so much.  It has a few nice
features, including low performance overhead and good debugging
support that's trivially available.

Generally, each stage of the compile creates a talloc context and
allocates its memory out of that or children of it.  At the end of the
stage, the pieces still live are stolen to a new context and the old
one freed, or the whole context is kept for use by the next stage.

For IR transformations, a temporary context is used, then at the end
of all transformations, reparent_ir reparents all live nodes under the
shader's IR list, and the old context full of dead nodes is freed.
When developing a single IR transformation pass, this means that you
want to allocate instruction nodes out of the temporary context, so if
it becomes dead it doesn't live on as the child of a live node.  At
the moment, optimization passes aren't passed that temporary context,
so they find it by calling talloc_parent() on a nearby IR node.  The
talloc_parent() call is expensive, so many passes will cache the
result of the first talloc_parent().  Cleaning up all the optimization
passes to take a context argument and not call talloc_parent() is left
as an exercise.

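The steal-then-free pattern can be modeled in a few lines of Python.
This is a toy model of hierarchical ownership in the spirit of
talloc_steal(), not the talloc API; the Context class and its methods
are hypothetical:

```python
class Context:
    """A node in an ownership tree; freeing it frees its children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = None
        self.children = []
        self.freed = False
        if parent is not None:
            parent.steal(self)

    def steal(self, child):
        """Move child (and everything under it) to this context."""
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)
        return child

    def free(self):
        for child in list(self.children):
            child.free()
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None
        self.freed = True

temporary = Context("temporary")
shader_ir = Context("shader IR list")
live = Context("live node", temporary)
dead = Context("dead node", temporary)

shader_ir.steal(live)   # reparent what is still referenced
temporary.free()        # everything left behind goes away at once
print(live.freed, dead.freed)   # False True
```

No pass has to track the dead node individually; it is released
simply because nobody moved it out of the temporary context.
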
Q: What is the file naming convention in this directory?

A: Initially, there really wasn't one.  We have since adopted one:

 - Files that implement code lowering passes should be named lower_*
   (e.g., lower_builtins.cpp).
 - Files that implement optimization passes should be named opt_*.
 - Files that implement a class that is used throughout the code should
   take the name of that class (e.g., ir_hierarchical_visitor.cpp).
 - Files that contain code not fitting in one of the previous
   categories should have a sensible name (e.g., glsl_parser.yy).