=====================
Adreno Five Microcode
=====================

.. contents::

.. _afuc-introduction:

Introduction
============

Adreno GPUs prior to 6xx use two micro-controllers to parse the command-stream,
set up the hardware for draws (or compute jobs), and do various GPU
housekeeping.  They are relatively simple (basically glorified
register writers) and basically all their state is in a collection
of registers.  I.e. there is no stack, and no memory assigned to
them; any global state like which bank of context registers is to
be used in the next draw is stored in a register.

The setup is similar to radeon; in fact Adreno 2xx through 4xx used
basically the same instruction set as r600.  There is a "PFP"
(Prefetch Parser) and "ME" (Micro Engine, also confusingly referred
to as "PM4").  These make up the "CP" ("Command Parser").  The
PFP runs ahead of the ME, with some PM4 packets handled entirely
in the PFP.  Between the PFP and ME is a FIFO ("MEQ").  In the
generations prior to Adreno 5xx, the PFP and ME had different
instruction sets.

Starting with Adreno 5xx, a new microcontroller with a unified
instruction set was introduced, although the overall architecture
and purpose of the two microcontrollers remains the same.

For lack of a better name, this new instruction set is called
"Adreno Five MicroCode" or "afuc".  (No idea what Qualcomm calls
it internally.)

With Adreno 6xx, the separate PFP and ME are replaced with a single
SQE microcontroller using the same instruction set as 5xx.

.. _afuc-overview:

Instruction Set Overview
========================

32-bit instruction set with basic arithmetic ops that can take
either two source registers or one src and a 16b immediate.

32 registers, although some are special purpose:

- ``$00`` - always reads zero, otherwise seems to be the PC
- ``$01`` - current PM4 packet header
- ``$1c`` - alias ``$rem``, remaining data in packet
- ``$1d`` - alias ``$addr``
- ``$1f`` - alias ``$data``

Branch instructions have a delay slot, so the following instruction
is always executed regardless of whether the branch is taken or not.


.. _afuc-alu:

ALU Instructions
================

The following instructions are available:

- ``add``   - add
- ``addhi`` - add + carry (for upper 32b of 64b value)
- ``sub``   - subtract
- ``subhi`` - subtract + carry (for upper 32b of 64b value)
- ``and``   - bitwise AND
- ``or``    - bitwise OR
- ``xor``   - bitwise XOR
- ``not``   - bitwise NOT (no src1)
- ``shl``   - shift-left
- ``ushr``  - unsigned shift-right
- ``ishr``  - signed shift-right
- ``rot``   - rotate-left (like shift-left with wrap-around)
- ``mul8``  - multiply low 8b of two src
- ``min``   - minimum
- ``max``   - maximum
- ``cmp``   - compare two values

The ALU instructions can take either two src registers, or a src
plus 16b immediate as 2nd src, ex::

   add $dst, $src, 0x1234   ; src2 is immed
   add $dst, $src1, $src2   ; src2 is reg

The ``not`` instruction only takes a single source::

   not $dst, $src
   not $dst, 0x1234

.. _afuc-alu-cmp:

The ``cmp`` instruction returns:

- ``0x00`` if src1 > src2
- ``0x2b`` if src1 == src2
- ``0x1e`` if src1 < src2

See explanation in :ref:`afuc-branch`


.. _afuc-branch:

Branch Instructions
===================

The following branch/jump instructions are available:

- ``brne`` - branch if not equal (or bit not set)
- ``breq`` - branch if equal (or bit set)
- ``jump`` - unconditional jump

Both ``brne`` and ``breq`` have two forms, comparing the src register
against either a small immediate (up to 5 bits) or a specific bit::

   breq $src, b3, #somelabel  ; branch if src & (1 << 3)
   breq $src, 0x3, #somelabel ; branch if src == 3

The branch instructions are encoded with a 16b relative offset.
Since ``$00`` always reads back zero, it can be used to construct
an unconditional relative jump.
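
The ``cmp`` result values listed earlier look like they were chosen so that
a single bit test answers an ordering question.  The following Python sketch
models that idea; it is not taken from any firmware.  It assumes an unsigned
32-bit comparison (the document does not say whether ``cmp`` is signed), and
the helper names are made up for illustration.  Only the ``b1`` test appears
in this document; the ``b2`` tests are derived from the bit patterns alone.

```python
MASK32 = 0xFFFFFFFF

# Result patterns documented for ``cmp``.
CMP_GT = 0x00  # src1 >  src2
CMP_EQ = 0x2B  # src1 == src2  (bits 0, 1, 3, 5)
CMP_LT = 0x1E  # src1 <  src2  (bits 1, 2, 3, 4)


def cmp(src1: int, src2: int) -> int:
    """Model of ``cmp``; ASSUMPTION: operands compare as unsigned 32-bit."""
    a, b = src1 & MASK32, src2 & MASK32
    if a > b:
        return CMP_GT
    if a == b:
        return CMP_EQ
    return CMP_LT


def bit_set(value: int, bit: int) -> bool:
    """Condition tested by ``breq $src, bN``; ``brne`` tests the inverse."""
    return bool(value & (1 << bit))


def less_or_equal(a: int, b: int) -> bool:
    # cmp $tmp, a, b ; breq $tmp, b1  (bit 1 is set for both == and <)
    return bit_set(cmp(a, b), 1)


def greater(a: int, b: int) -> bool:
    # cmp $tmp, a, b ; brne $tmp, b1
    return not bit_set(cmp(a, b), 1)


def less(a: int, b: int) -> bool:
    # cmp $tmp, a, b ; breq $tmp, b2  (bit 2 is set only for <)
    return bit_set(cmp(a, b), 2)


def greater_or_equal(a: int, b: int) -> bool:
    # cmp $tmp, a, b ; brne $tmp, b2
    return not bit_set(cmp(a, b), 2)
```

Because bit 1 is set in both ``0x2b`` and ``0x1e`` but clear in ``0x00``,
one ``cmp`` followed by one bit-test branch is enough for each of the four
ordering predicates.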

The :ref:`cmp <afuc-alu-cmp>` instruction can be paired with the
bit-test variants of ``brne``/``breq`` to implement gt/ge/lt/le,
due to the bit pattern it returns, for example::

   cmp $04, $02, $03
   breq $04, b1, #somelabel

will branch if ``$02`` is less than or equal to ``$03``.


.. _afuc-call:

Call/Return
===========

Simple subroutines can be implemented with ``call``/``ret``.  The
jump instruction encodes a fixed offset.

  TODO not sure how many levels deep function calls can be nested.
  There isn't really a stack.  Definitely seems to be multiple
  levels of fxn call, see in PFP: CP_CONTEXT_SWITCH_YIELD -> f13 ->
  f22.


.. _afuc-control:

Config Instructions
===================

These seem to read/write config state in other parts of CP.  In at
least some cases I expect these map to CP registers (but possibly
not directly??)

- ``cread $dst, [$off + addr], flags``
- ``cwrite $src, [$off + addr], flags``

In cases where no offset is needed, ``$00`` is frequently used as
the offset.

For example, the following sequence sets ``CP_IBn_BASE_LO``/``HI``/``BUFSIZE``
from the parameters of a ``CP_INDIRECT_BUFFER`` packet::

   ; load CP_INDIRECT_BUFFER parameters from cmdstream:
   mov $02, $data   ; low 32b of IB target address
   mov $03, $data   ; high 32b of IB target
   mov $04, $data   ; IB size in dwords

   ; sanity check # of dwords:
   breq $04, 0x0, #l23 (#69, 04a2)

   ; this seems something to do with figuring out whether
   ; we are going from RB->IB1 or IB1->IB2 (ie. so the
   ; below cwrite instructions update either
   ; CP_IB1_BASE_LO/HI/BUFSIZE or CP_IB2_BASE_LO/HI/BUFSIZE)
   and $05, $18, 0x0003
   shl $05, $05, 0x0002

   ; update CP_IBn_BASE_LO/HI/BUFSIZE:
   cwrite $02, [$05 + 0x0b0], 0x8
   cwrite $03, [$05 + 0x0b1], 0x8
   cwrite $04, [$05 + 0x0b2], 0x8



.. _afuc-reg-access:

Register Access
===============

The special registers ``$addr`` and ``$data`` can be used to write GPU
registers, for example, to write::

   mov $addr, CP_SCRATCH_REG[0x2] ; set register to write
   mov $data, $03                 ; CP_SCRATCH_REG[0x2]
   mov $data, $04                 ; CP_SCRATCH_REG[0x3]
   ...

Subsequent writes to ``$data`` will increment the address of the register
to write, so a sequence of consecutive registers can be written.

To read::

   mov $addr, CP_SCRATCH_REG[0x2]
   mov $03, $addr
   mov $04, $addr

Many registers that are updated frequently have two banks, so they can be
updated without stalling for the previous draw to finish.  These banks are
arranged so bit 11 is zero for bank 0 and 1 for bank 1.  The ME fw (at
least the version I'm looking at) stores this in ``$17``, so to update
these registers from ME::

   or $addr, $17, VFD_INDEX_OFFSET
   mov $data, $03
   ...
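
The auto-incrementing ``$addr``/``$data`` write path and the bank-select bit
can be pictured with a small Python model.  This is only a sketch of the
behaviour described above, not of the real hardware: the class and function
names are hypothetical, ``EXAMPLE_REG`` is a placeholder offset rather than a
real register address, and ``$17`` is assumed to hold either ``0`` or
``1 << 11``.

```python
BANK_BIT = 11  # per the text: bit 11 is 0 for bank 0 and 1 for bank 1


class RegisterPort:
    """Toy model of the $addr/$data register-write path."""

    def __init__(self):
        self.regs = {}  # register offset -> last value written
        self.addr = 0

    def set_addr(self, offset: int) -> None:
        # mov $addr, <offset>
        self.addr = offset

    def write_data(self, value: int) -> None:
        # mov $data, <value>: write the current register, then advance
        # so the next write lands on the following register.
        self.regs[self.addr] = value & 0xFFFFFFFF
        self.addr += 1


def banked(offset: int, bank_flag: int) -> int:
    """or $addr, $17, <offset>; ASSUMPTION: $17 is 0 or 1 << BANK_BIT."""
    return offset | bank_flag


EXAMPLE_REG = 0x100  # hypothetical offset, for illustration only

port = RegisterPort()
port.set_addr(banked(EXAMPLE_REG, 1 << BANK_BIT))  # select bank 1
port.write_data(0x11)  # lands at EXAMPLE_REG in bank 1
port.write_data(0x22)  # lands at EXAMPLE_REG + 1 in bank 1
```

Keeping the bank flag pre-shifted in a register means selecting the bank
costs a single ``or`` per register sequence.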

Note that PFP doesn't seem to use this approach, instead it does something
like::

   mov $0c, CP_SCRATCH_REG[0x7]
   mov $02, 0x789a   ; value
   cwrite $0c, [$00 + 0x010], 0x8
   cwrite $02, [$00 + 0x011], 0x8

Like with the ``$addr``/``$data`` approach, the destination register address
increments on each write.

.. _afuc-mem:

Memory Access
=============

There are no load/store instructions, as such.  The microcontrollers
have only indirect memory access via GPU registers.  There are two
possible mechanisms.

Read/Write via CP_NRT Registers
-------------------------------

This seems to be only used by ME.  If PFP were also using it, they would
race with each other.  It seems to be primarily used for small reads.

- ``CP_ME_NRT_ADDR_LO``/``_HI`` - write to set the address to read or write
- ``CP_ME_NRT_DATA`` - write to trigger write to address in ``CP_ME_NRT_ADDR``

The address register increments with successive reads or writes.
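
As a rough mental model of this address/data handshake, here is a short
Python sketch.  It is a guess at the observable behaviour, not a description
of the hardware: the class name is hypothetical, and the address is assumed
to advance by one dword (4 bytes) per access, which the text above does not
state.

```python
class NrtPort:
    """Toy model of CP_ME_NRT_ADDR_LO/_HI and CP_ME_NRT_DATA."""

    def __init__(self):
        self.memory = {}  # byte address -> 32-bit value
        self.addr = 0

    def write_addr(self, lo: int, hi: int) -> None:
        # Two consecutive register writes: ADDR_LO, then ADDR_HI.
        self.addr = ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)

    def write_data(self, value: int) -> None:
        # A write to the DATA register stores to memory at the current
        # address. ASSUMPTION: the address then advances by 4 bytes.
        self.memory[self.addr] = value & 0xFFFFFFFF
        self.addr += 4

    def read_data(self) -> int:
        # A read returns memory at the current address, then advances.
        value = self.memory.get(self.addr, 0)
        self.addr += 4
        return value


nrt = NrtPort()

# Store a 64b value as two dwords at a 64b address.
nrt.write_addr(lo=0x1000, hi=0x1)
nrt.write_data(0xDEADBEEF)
nrt.write_data(0x12345678)

# Load it back from the same address.
nrt.write_addr(lo=0x1000, hi=0x1)
low, high = nrt.read_data(), nrt.read_data()
```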

Memory Write example::

   ; store 64b value in $04+$05 to 64b address in $02+$03
   mov $addr, CP_ME_NRT_ADDR_LO
   mov $data, $02
   mov $data, $03
   mov $addr, CP_ME_NRT_DATA
   mov $data, $04
   mov $data, $05

Memory Read example::

   ; load 64b value from address in $02+$03 into $04+$05
   mov $addr, CP_ME_NRT_ADDR_LO
   mov $data, $02
   mov $data, $03
   mov $04, $addr
   mov $05, $addr


Read via Control Instructions
-----------------------------

This is used by PFP whenever it needs to read memory.  Also seems to be
used by ME for streaming reads (larger amounts of data).  The DMA access
seems to be done by ROQ.

  TODO might also be possible for write access

  TODO some of the control commands might be synchronizing access
  between PFP and ME??

An example from ``CP_DRAW_INDIRECT`` packet handler::

   mov $07, 0x0004  ; # of dwords to read from draw-indirect buffer
   ; load address of indirect buffer from cmdstream:
   cwrite $data, [$00 + 0x0b8], 0x8
   cwrite $data, [$00 + 0x0b9], 0x8
   ; set # of dwords to read:
   cwrite $07, [$00 + 0x0ba], 0x8
   ...
   ; read parameters from draw-indirect buffer:
   mov $09, $addr
   mov $07, $addr
   cread $12, [$00 + 0x040], 0x8
   ; the start parameter gets written into MEQ, which ME writes
   ; to VFD_INDEX_OFFSET register:
   mov $data, $addr


A6XX NOTES
==========

The ``$14`` register holds global flags set by::

   CP_SKIP_IB2_ENABLE_LOCAL - b8
   CP_SKIP_IB2_ENABLE_GLOBAL - b9
   CP_SET_MARKER
      MODE=GMEM - sets b15
      MODE=BLIT2D - clears b15, b12, b7
   CP_SET_MODE - b29+b30
   CP_SET_VISIBILITY_OVERRIDE - b11, b21, b30?
   CP_SET_DRAW_STATE - checks b29+b30

   CP_COND_REG_EXEC - checks b10, which should be predicate flag?