=====================
Adreno Five Microcode
=====================

.. contents::

.. _afuc-introduction:

Introduction
============

Adreno GPUs prior to 6xx use two micro-controllers to parse the command-stream,
set up the hardware for draws (or compute jobs), and do various GPU
housekeeping.  They are relatively simple (basically glorified
register writers) and basically all their state is in a collection
of registers.  I.e., there is no stack, and no memory assigned to
them; any global state like which bank of context registers is to
be used in the next draw is stored in a register.

The setup is similar to radeon; in fact Adreno 2xx through 4xx used
basically the same instruction set as r600.  There is a "PFP"
(Prefetch Parser) and "ME" (Micro Engine, also confusingly referred
to as "PM4").  These make up the "CP" ("Command Parser").  The
PFP runs ahead of the ME, with some PM4 packets handled entirely
in the PFP.  Between the PFP and ME is a FIFO ("MEQ").  In the
generations prior to Adreno 5xx, the PFP and ME had different
instruction sets.

Starting with Adreno 5xx, a new microcontroller with a unified
instruction set was introduced, although the overall architecture
and purpose of the two microcontrollers remains the same.

For lack of a better name, this new instruction set is called
"Adreno Five MicroCode" or "afuc".  (No idea what Qualcomm calls
it internally.)

With Adreno 6xx, the separate PFP and ME are replaced with a single
SQE microcontroller using the same instruction set as 5xx.

.. _afuc-overview:

Instruction Set Overview
========================

32-bit instruction set with basic arithmetic ops that can take
either two source registers or one src and a 16b immediate.

32 registers, although some are special purpose:

- ``$00`` - always reads zero, otherwise seems to be the PC
- ``$01`` - current PM4 packet header
- ``$1c`` - alias ``$rem``, remaining data in packet
- ``$1d`` - alias ``$addr``
- ``$1f`` - alias ``$data``

Branch instructions have a delay slot, so the following instruction
is always executed regardless of whether the branch is taken or not.
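
The effect of the delay slot on execution order can be illustrated with a
toy interpreter.  This is a minimal Python sketch, not afuc code: the ``run``
function, the ``("nop", None)``/``("jump", target)`` tuple format and the use
of absolute jump targets are all assumptions made purely for illustration.

```python
def run(program):
    """Run a toy program and return the indices of executed instructions.

    Hypothetical model: a "jump" only takes effect after the instruction
    that follows it (the delay slot) has executed.
    """
    trace = []
    pc = 0
    pending = None  # jump target waiting for the delay slot to complete
    while pc < len(program):
        op, arg = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending is not None:
            # The instruction just executed was the delay slot.
            next_pc = pending
            pending = None
        if op == "jump":
            pending = arg
        pc = next_pc
    return trace


program = [
    ("nop", None),   # 0
    ("jump", 4),     # 1
    ("nop", None),   # 2: delay slot, still executed
    ("nop", None),   # 3: skipped
    ("nop", None),   # 4: jump target
]
print(run(program))
```

In this model instruction 2 runs even though the jump at index 1 is taken,
and instruction 3 is the first one actually skipped.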


.. _afuc-alu:

ALU Instructions
================

The following instructions are available:

- ``add``   - add
- ``addhi`` - add + carry (for upper 32b of 64b value)
- ``sub``   - subtract
- ``subhi`` - subtract + carry (for upper 32b of 64b value)
- ``and``   - bitwise AND
- ``or``    - bitwise OR
- ``xor``   - bitwise XOR
- ``not``   - bitwise NOT (no src1)
- ``shl``   - shift-left
- ``ushr``  - unsigned shift-right
- ``ishr``  - signed shift-right
- ``rot``   - rotate-left (like shift-left with wrap-around)
- ``mul8``  - multiply low 8b of two src
- ``min``   - minimum
- ``max``   - maximum
- ``cmp``   - compare two values

The ALU instructions can take either two src registers, or a src
plus 16b immediate as 2nd src, e.g.::

  add $dst, $src, 0x1234   ; src2 is immed
  add $dst, $src1, $src2   ; src2 is reg

The ``not`` instruction only takes a single source::

  not $dst, $src
  not $dst, 0x1234
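
The ``add``/``addhi`` pairing listed above is the usual way to build a 64b
addition out of 32b operations.  The arithmetic can be sketched in Python as
follows; ``add64`` is a hypothetical helper, and the assumption that
``addhi`` consumes the carry produced by the immediately preceding ``add`` is
an inference from the instruction descriptions, not something confirmed here.

```python
MASK32 = 0xFFFFFFFF


def add64(a_lo, a_hi, b_lo, b_hi):
    """Add two 64b values given as (lo, hi) 32b halves.

    Mirrors an assumed `add` (low halves) followed by `addhi`
    (high halves plus carry-out of the low add).
    """
    lo_full = a_lo + b_lo
    carry = lo_full >> 32                      # 1 if the low add overflowed
    hi = (a_hi + b_hi + carry) & MASK32        # what addhi would produce
    return lo_full & MASK32, hi


lo, hi = add64(0xFFFFFFFF, 0x00000000, 0x00000001, 0x00000000)
print(hex(lo), hex(hi))
```

``subhi`` would presumably handle the borrow for 64b subtraction in the same
way.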

.. _afuc-alu-cmp:

The ``cmp`` instruction returns:

- ``0x00`` if src1 > src2
- ``0x2b`` if src1 == src2
- ``0x1e`` if src1 < src2

See explanation in :ref:`afuc-branch`.


.. _afuc-branch:

Branch Instructions
===================

The following branch/jump instructions are available:

- ``brne`` - branch if not equal (or bit not set)
- ``breq`` - branch if equal (or bit set)
- ``jump`` - unconditional jump

Both ``brne`` and ``breq`` have two forms, comparing the src register
against either a small immediate (up to 5 bits) or a specific bit::

  breq $src, b3, #somelabel  ; branch if src & (1 << 3)
  breq $src, 0x3, #somelabel ; branch if src == 3

The branch instructions are encoded with a 16b relative offset.
Since ``$00`` always reads back zero, it can be used to construct
an unconditional relative jump.

The :ref:`cmp <afuc-alu-cmp>` instruction can be paired with the
bit-test variants of ``brne``/``breq`` to implement gt/ge/lt/le,
due to the bit pattern it returns, for example::

  cmp $04, $02, $03
  breq $04, b1, #somelabel

will branch if ``$02`` is less than or equal to ``$03``.
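
Which bit to test for each relation follows directly from the three result
values of ``cmp``.  The Python sketch below works this out; the helper names
are hypothetical, and operands are treated as plain integers since the
signedness of the comparison is not specified here.

```python
# Result values documented for the cmp instruction.
CMP_GT, CMP_EQ, CMP_LT = 0x00, 0x2B, 0x1E


def cmp(src1, src2):
    """Model of cmp: return the documented bit pattern."""
    if src1 > src2:
        return CMP_GT
    if src1 == src2:
        return CMP_EQ
    return CMP_LT


def bit_set(value, n):
    """Model of the bit test used by breq (bit set) / brne (bit not set)."""
    return bool(value & (1 << n))


# Relations expressed as a bit test on the cmp result.
def le(a, b): return bit_set(cmp(a, b), 1)       # breq $res, b1
def gt(a, b): return not bit_set(cmp(a, b), 1)   # brne $res, b1
def lt(a, b): return bit_set(cmp(a, b), 2)       # breq $res, b2
def ge(a, b): return not bit_set(cmp(a, b), 2)   # brne $res, b2
def eq(a, b): return bit_set(cmp(a, b), 0)       # breq $res, b0
```

Bit 1 is set in both ``0x2b`` and ``0x1e`` but not in ``0x00``, which is why
testing ``b1`` gives less-than-or-equal.  Bit 2 is set only in ``0x1e``, so
it distinguishes strictly-less-than.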


.. _afuc-call:

Call/Return
===========

Simple subroutines can be implemented with ``call``/``ret``.  The
jump instruction encodes a fixed offset.

  TODO not sure how many levels deep function calls can be nested.
  There isn't really a stack.  Definitely seems to be multiple
  levels of fxn call, see in PFP: CP_CONTEXT_SWITCH_YIELD -> f13 ->
  f22.


.. _afuc-control:

Config Instructions
===================

These seem to read/write config state in other parts of CP.  In at
least some cases I expect these map to CP registers (but possibly
not directly??).

- ``cread $dst, [$off + addr], flags``
- ``cwrite $src, [$off + addr], flags``

In cases where no offset is needed, ``$00`` is frequently used as
the offset.

For example, the following sequence sets the ``CP_IBn_BASE_LO``/``HI``/``BUFSIZE``
state for an indirect buffer::

  ; load CP_INDIRECT_BUFFER parameters from cmdstream:
  mov $02, $data   ; low 32b of IB target address
  mov $03, $data   ; high 32b of IB target
  mov $04, $data   ; IB size in dwords

  ; sanity check # of dwords:
  breq $04, 0x0, #l23 (#69, 04a2)

  ; this seems something to do with figuring out whether
  ; we are going from RB->IB1 or IB1->IB2 (i.e. so the
  ; below cwrite instructions update either
  ; CP_IB1_BASE_LO/HI/BUFSIZE or CP_IB2_BASE_LO/HI/BUFSIZE)
  and $05, $18, 0x0003
  shl $05, $05, 0x0002

  ; update CP_IBn_BASE_LO/HI/BUFSIZE:
  cwrite $02, [$05 + 0x0b0], 0x8
  cwrite $03, [$05 + 0x0b1], 0x8
  cwrite $04, [$05 + 0x0b2], 0x8
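
The offset arithmetic in the listing above can be checked with a few lines
of Python.  This is only a sketch: ``ib_config_addrs`` is a hypothetical
helper, it assumes the ``[$off + addr]`` operand is a plain sum, and it makes
no claim about which value of ``$18`` corresponds to IB1 or IB2.

```python
# Config addresses taken from the cwrite operands in the listing.
IB_BASE_LO, IB_BASE_HI, IB_BUFSIZE = 0x0B0, 0x0B1, 0x0B2


def ib_config_addrs(reg18):
    """Return the three config addresses the cwrites would target.

    Mirrors:  and $05, $18, 0x0003
              shl $05, $05, 0x0002
    followed by cwrite to [$05 + 0x0b0..0x0b2] (assumed to be a simple add).
    """
    offset = (reg18 & 0x0003) << 2
    return [offset + addr for addr in (IB_BASE_LO, IB_BASE_HI, IB_BUFSIZE)]


for level in range(4):
    print(level, [hex(a) for a in ib_config_addrs(level)])
```

Under these assumptions each step of the low two bits of ``$18`` moves the
target by four config slots, so consecutive groups of three writes never
overlap.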



.. _afuc-reg-access:

Register Access
===============

The special registers ``$addr`` and ``$data`` can be used to write GPU
registers.  For example, to write::

  mov $addr, CP_SCRATCH_REG[0x2] ; set register to write
  mov $data, $03                 ; CP_SCRATCH_REG[0x2]
  mov $data, $04                 ; CP_SCRATCH_REG[0x3]
  ...

Subsequent writes to ``$data`` will increment the address of the register
to write, so a sequence of consecutive registers can be written.

To read::

  mov $addr, CP_SCRATCH_REG[0x2]
  mov $03, $addr
  mov $04, $addr
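
The auto-incrementing write behaviour of ``$addr``/``$data`` can be modelled
in a few lines of Python.  The ``RegisterPort`` class and the
``SCRATCH_BASE`` offset are hypothetical and exist only for this sketch; only
the write path is modelled, since that is the behaviour described above.

```python
SCRATCH_BASE = 0x100  # hypothetical register offset, for illustration only


class RegisterPort:
    """Toy model of the $addr/$data register-write mechanism."""

    def __init__(self):
        self.addr = 0
        self.regs = {}

    def mov_addr(self, reg_offset):
        """Model of `mov $addr, <reg>`: select the register to write next."""
        self.addr = reg_offset

    def mov_data(self, value):
        """Model of `mov $data, <value>`: write, then advance the address."""
        self.regs[self.addr] = value
        self.addr += 1


port = RegisterPort()
port.mov_addr(SCRATCH_BASE + 2)
port.mov_data(0xAAAA)   # lands in scratch register 2
port.mov_data(0xBBBB)   # lands in scratch register 3
print(port.regs)
```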

Many registers that are updated frequently have two banks, so they can be
updated without stalling for the previous draw to finish.  These banks are
arranged so bit 11 is zero for bank 0 and 1 for bank 1.  The ME fw (at
least the version I'm looking at) stores this in ``$17``, so to update
these registers from ME::

  or $addr, $17, VFD_INDEX_OFFSET
  mov $data, $03
  ...
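
The bank selection amounts to OR-ing a bank bit into the register offset.
A minimal Python sketch, assuming ``$17`` holds either ``0`` or ``1 << 11``;
``banked_addr`` and the example offset ``0x123`` are hypothetical:

```python
BANK_BIT = 11  # bit 11 selects the bank, per the description above


def banked_addr(bank, reg_offset):
    """Return the offset of `reg_offset` in the given bank (0 or 1).

    Mirrors `or $addr, $17, <reg>` with $17 assumed to hold bank << 11.
    """
    if bank not in (0, 1):
        raise ValueError("only two banks exist")
    return (bank << BANK_BIT) | reg_offset


print(hex(banked_addr(0, 0x123)), hex(banked_addr(1, 0x123)))
```

This only works as an OR because the base offsets of banked registers have
bit 11 clear.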

Note that PFP doesn't seem to use this approach; instead it does something
like::

  mov $0c, CP_SCRATCH_REG[0x7]
  mov $02, 0x789a   ; value
  cwrite $0c, [$00 + 0x010], 0x8
  cwrite $02, [$00 + 0x011], 0x8

Like with the ``$addr``/``$data`` approach, the destination register address
increments on each write.

.. _afuc-mem:

Memory Access
=============

There are no load/store instructions, as such.  The microcontrollers
have only indirect memory access via GPU registers.  There are two
possible mechanisms.

Read/Write via CP_NRT Registers
-------------------------------

This seems to be only used by ME.  If PFP were also using it, they would
race with each other.  It seems to be primarily used for small reads.

- ``CP_ME_NRT_ADDR_LO``/``_HI`` - write to set the address to read or write
- ``CP_ME_NRT_DATA`` - write to trigger write to address in ``CP_ME_NRT_ADDR``

The address register increments with successive reads or writes.

Memory Write example::

  ; store 64b value in $04+$05 to 64b address in $02+$03
  mov $addr, CP_ME_NRT_ADDR_LO
  mov $data, $02
  mov $data, $03
  mov $addr, CP_ME_NRT_DATA
  mov $data, $04
  mov $data, $05

Memory Read example::

  ; load 64b value from address in $02+$03 into $04+$05
  mov $addr, CP_ME_NRT_ADDR_LO
  mov $data, $02
  mov $data, $03
  mov $04, $addr
  mov $05, $addr


Read via Control Instructions
-----------------------------

This is used by PFP whenever it needs to read memory.  Also seems to be
used by ME for streaming reads (larger amounts of data).  The DMA access
seems to be done by ROQ.

  TODO might also be possible for write access

  TODO some of the control commands might be synchronizing access
  between PFP and ME??

An example from ``CP_DRAW_INDIRECT`` packet handler::

  mov $07, 0x0004  ; # of dwords to read from draw-indirect buffer
  ; load address of indirect buffer from cmdstream:
  cwrite $data, [$00 + 0x0b8], 0x8
  cwrite $data, [$00 + 0x0b9], 0x8
  ; set # of dwords to read:
  cwrite $07, [$00 + 0x0ba], 0x8
  ...
  ; read parameters from draw-indirect buffer:
  mov $09, $addr
  mov $07, $addr
  cread $12, [$00 + 0x040], 0x8
  ; the start parameter gets written into MEQ, which ME writes
  ; to VFD_INDEX_OFFSET register:
  mov $data, $addr


A6XX NOTES
==========

The ``$14`` register holds global flags set by::

  CP_SKIP_IB2_ENABLE_LOCAL - b8
  CP_SKIP_IB2_ENABLE_GLOBAL - b9
  CP_SET_MARKER
    MODE=GMEM - sets b15
    MODE=BLIT2D - clears b15, b12, b7
  CP_SET_MODE - b29+b30
  CP_SET_VISIBILITY_OVERRIDE - b11, b21, b30?
  CP_SET_DRAW_STATE - checks b29+b30

  CP_COND_REG_EXEC - checks b10, which should be predicate flag?
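
When staring at a dumped ``$14`` value it can help to decode the bits listed
above mechanically.  A minimal Python sketch follows; ``FLAG_BITS`` and
``decode_flags`` are hypothetical names, only the single-bit entries from the
list are included, and the meaning of ``b10`` is as uncertain here as it is
in the notes.

```python
# Bit positions copied from the notes above; labels are informal.
FLAG_BITS = {
    8: "CP_SKIP_IB2_ENABLE_LOCAL",
    9: "CP_SKIP_IB2_ENABLE_GLOBAL",
    10: "CP_COND_REG_EXEC predicate (?)",
    15: "CP_SET_MARKER MODE=GMEM",
}


def decode_flags(reg14):
    """Return the labels of the known flag bits set in a $14 value."""
    return [name for bit, name in sorted(FLAG_BITS.items())
            if reg14 & (1 << bit)]


print(decode_flags((1 << 8) | (1 << 15)))
```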