110
1
1
xxxxxxxxx
00
00000
LoaD Global
{SY}{JP}{NAME}.{TYPE} {TYPE_HALF}{DST}, g[{SRC1}{OFF}], {SIZE}
0
LoaD Global
{SY}{JP}{NAME}.{TYPE} {TYPE_HALF}{DST}, g[{SRC1}+({SRC2}{OFF})<<{SRC2_BYTE_SHIFT}], {SIZE}
{SY}{JP}{NAME}.{TYPE} {TYPE_HALF}{DST}, g[{SRC1}+{SRC2}<<{SRC2_BYTE_SHIFT}{OFF}<<2], {SIZE}
{SRC2_ADD_DWORD_SHIFT} > 0
0
1
{SRC2_ADD_DWORD_SHIFT} + 2
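A minimal sketch of how the derived `SRC2_BYTE_SHIFT` value and the two display forms above could fit together. The function names are hypothetical. The sketch assumes that the `{SRC2_ADD_DWORD_SHIFT} > 0` condition selects the second display form, and that `{OFF}` is already rendered as a signed string such as `+4`.

```python
def src2_byte_shift(src2_add_dword_shift: int) -> int:
    # Derived field: the dword shift plus 2, since one dword is 4 bytes (2**2).
    return src2_add_dword_shift + 2


def format_global_address(src1: str, src2: str, off: str,
                          src2_add_dword_shift: int) -> str:
    # Hypothetical helper: fills in only the g[...] part of the templates.
    shift = src2_byte_shift(src2_add_dword_shift)
    if src2_add_dword_shift > 0:
        # Assumed override form: offset shown separately with a fixed <<2.
        return f"g[{src1}+{src2}<<{shift}{off}<<2]"
    # Default form: offset is grouped with SRC2 before the shift.
    return f"g[{src1}+({src2}{off})<<{shift}]"
```

With a dword shift of 0 the byte shift is 2, so both forms scale by at least one dword.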
x
xxxxxxxx
1x
x
00011
STore Global
{SY}{JP}{NAME}.{TYPE} g[{SRC1}{OFF}], {TYPE_HALF}{SRC3}, {SIZE}
({OFF_HI} << 8) | {OFF_LO}
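The derived offset expression above can be sketched as follows. The function name is hypothetical, and the 8-bit width of `OFF_LO` is an assumption inferred from the shift amount. Any sign handling of the result is not modelled.

```python
def join_offset(off_hi: int, off_lo: int) -> int:
    # Assumption: OFF_LO is 8 bits wide, so OFF_HI supplies bits 8 and up.
    if not 0 <= off_lo <= 0xFF:
        raise ValueError("OFF_LO is assumed to fit in 8 bits")
    if off_hi < 0:
        raise ValueError("OFF_HI must be a non-negative field value")
    return (off_hi << 8) | off_lo
```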
0
STore Global
{SY}{JP}{NAME}.{TYPE} g[{SRC1}+({SRC2}{OFF})<<{DST_BYTE_SHIFT}], {TYPE_HALF}{SRC3}, {SIZE}
{SY}{JP}{NAME}.{TYPE} g[{SRC1}+{SRC2}<<{DST_BYTE_SHIFT}{OFF}<<2], {TYPE_HALF}{SRC3}, {SIZE}
{SRC2_ADD_DWORD_SHIFT} > 0
{SRC2_ADD_DWORD_SHIFT} + 2
0
1
1
x
1
xxxxxxxxx
xx
LoaD Local
{SY}{JP}{NAME}.{TYPE} {DST}, l[{SRC}{OFF}], {SIZE}
00001
LoaD Private
{SY}{JP}{NAME}.{TYPE} {DST}, p[{SRC}{OFF}], {SIZE}
00010
LoaD Local (variant used for passing data between geometry stages)
{SY}{JP}{NAME}.{TYPE} {DST}, l[{SRC}{OFF}], {SIZE}
01010
LoaD Local Varying - read directly from varying storage
{SY}{JP}{NAME}.{TYPE} {DST}, l[{OFF}], {SIZE}
0
xxxxxxxx
11
xxxxxxxxx
xx
11111
({OFF_HI} << 8) | {OFF_LO}
xxxxxxxxx
1
1
xx
STore Local
{SY}{JP}{NAME}.{TYPE} l[{DST}{OFF}], {SRC}, {SIZE}
x
00100
STore Private
{SY}{JP}{NAME}.{TYPE} p[{DST}{OFF}], {SRC}, {SIZE}
0
00101
STore Local (variant used for passing data between geometry stages)
{SY}{JP}{NAME}.{TYPE} l[{DST}{OFF}], {SRC}, {SIZE}
x
01011
{OFFSET}
0
a1.x{OFFSET}
1
Encoding for the stc destination, which is either a constant
offset or an offset relative to a1.x.
STore Const - used in the shader prolog (between shps and shpe)
to store "uniform folded" values into the CONST file.
NOTE: the TYPE field actually seems to be set to different
values (i.e. f32 vs u32), but it seems that only the size (16b vs
32b) matters. Setting a 16-bit type (f16, u16, or s16) doesn't
cause any promotion to 32-bit; instead, the 16-bit sources are
stored one after the other, starting with the low half of the
constant. So e.g. "stc.f16 c[1], hr0.x, 1" copies hr0.x to the
bottom half of c0.y. There seems to be no way to set just the
upper half. In any case, the blob seems to only use the 32-bit
versions.
The blob disassembly doesn't include the type, but we still
display it so that we can preserve the different values the blob
sets when round-tripping.
NOTE: this conflicts with stgb from earlier gens
{SY}{JP}{NAME}.{TYPE} c[{DST}], {SRC}, {SIZE}
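The 16-bit stc behaviour described in the note can be illustrated with a small model. This is only a sketch: the const file is modelled as a flat list of 16-bit halves (low half first, two per 32-bit component), the function name is hypothetical, and leaving untouched halves unchanged is an assumption rather than observed hardware behaviour.

```python
def stc_store_16bit(const_halves: list, dst: int, sources: list) -> list:
    # dst indexes 32-bit const components, so c[1] is c0.y.
    # Each component holds two 16-bit halves: index 2*dst is the low half.
    start = 2 * dst
    if start + len(sources) > len(const_halves):
        raise IndexError("store runs past the end of the modelled const file")
    # Sources are packed one after the other, starting at the low half.
    for i, value in enumerate(sources):
        const_halves[start + i] = value & 0xFFFF
    return const_halves
```

Storing a single value at `dst=1` writes only index 2 in this model, the low half of c0.y, which matches the `stc.f16 c[1], hr0.x, 1` example.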
x
xxxxxxxxxxxxxx
1
xxxxx
xxxxxxxx
xx
11100
{SY}{JP}{NAME}.{TYPE}.{D}d {DST}, g[{SSBO}]
x
xxxxxxxx
x
xx
xxxxxxxx
x
x
xxxxxxxx
0
x
01111
x
{SY}{JP}{NAME}.{TYPED}.{D}d.{TYPE}.{TYPE_SIZE} {DST}, g[{SSBO}], {SRC1}, {SRC2}
xxxxxxxx
x
11011
1
11011
x
00110
1
{SY}{JP}{NAME}.{TYPED}.{D}d.{TYPE}.{TYPE_SIZE} g[{SSBO}], {SRC1}, {SRC2}, {SRC3}
xxxxxxxxx
0
1
11100
11101
11100
11101
Base for atomic instructions (probably mostly a4xx+, since
a3xx didn't have real image/SSBO support; it was all just global).
Still used as of a6xx for local.
NOTE: the existing disasm and asm parser expect atomic inc/dec
to still have an extra src. For now, match that.
{SY}{JP}{NAME}.{TYPED}.{D}d.{TYPE}.{TYPE_SIZE}.l {DST}, l[{SRC1}], {SRC2}
x
1
00000000
00000000
0
0
10000
10001
10010
10011
10100
10101
10110
10111
11000
11001
11010
Pre-a6xx atomics for Image/SSBO
{SY}{JP}{NAME}.{TYPED}.{D}d.{TYPE}.{TYPE_SIZE}.g {DST}, g[{SSBO}], {SRC1}, {SRC2}, {SRC3}
1
0
10000
10001
10010
10011
10100
10101
10110
10111
11000
11001
11010
1
10000
10001
10010
10011
10100
10101
10110
10111
11000
11001
11010
a6xx+ global atomics which take iova in SRC1
{SY}{JP}{NAME}.{TYPED}.{D}d.{TYPE}.{TYPE_SIZE}.g {DST}, {SRC1}, {SRC2}
1
00000000
00000000
1
0
10000
10001
10010
10011
10100
10101
10110
10111
11000
11001
11010
Base for new instruction encoding that started being used
with a6xx for instructions supporting bindless mode.
00
0
00000
x
x
011110
1xx
x11
ldc.k copies a series of UBO values to constants. In other
words, it acts the same as a series of ldc followed by stc. It's
also similar to a CP_LOAD_STATE with a UBO source, but executed
in the shader.
Like CP_LOAD_STATE, the UBO offset and const file offset must each
be a multiple of 4 vec4s, but any number of vec4s can be loaded.
The UBO descriptor and offset are the same as for a normal ldc.
The const file offset is specified in a1.x, in units of
components, and the number of vec4s to copy is specified in
LOAD_SIZE.
{SY}{JP}ldc.{LOAD_SIZE}.k.{MODE}{BASE} c[a1.x], {SRC1}, {SRC2}
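The copy that ldc.k performs can be sketched as a plain array copy. This is a model under stated assumptions, not the hardware behaviour: the UBO and the const file are flat lists of components, the UBO offset is taken in vec4 units, and the function and parameter names are hypothetical.

```python
VEC4 = 4         # components per vec4
ALIGN_VEC4 = 4   # offsets must be a multiple of 4 vec4s


def ldc_k(ubo: list, const_file: list, ubo_offset_vec4: int,
          a1_x: int, load_size: int) -> list:
    # Assumption: the UBO offset is expressed in vec4s.
    if ubo_offset_vec4 % ALIGN_VEC4:
        raise ValueError("UBO offset must be a multiple of 4 vec4s")
    # a1.x is in components, so 4 vec4s is 16 components.
    if a1_x % (ALIGN_VEC4 * VEC4):
        raise ValueError("const offset must be a multiple of 4 vec4s")
    count = load_size * VEC4          # LOAD_SIZE counts vec4s
    src = ubo_offset_vec4 * VEC4
    if src + count > len(ubo) or a1_x + count > len(const_file):
        raise IndexError("copy runs past the end of a modelled buffer")
    const_file[a1_x:a1_x + count] = ubo[src:src + count]
    return const_file
```

Only the two offsets carry the alignment requirement here; `load_size` can be any number of vec4s, as the description states.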
xx
11
LoaD Constant - UBO load
{SY}{JP}{NAME}.offset{OFFSET}.{TYPE_SIZE}.{MODE}{BASE} {DST}, {SRC1}, {SRC2}
10
GET Shader Processor ID?
{SY}{JP}{NAME}.{TYPE} {DST}
0
xx
x
100100
x1xx
xxxxxxxx
xxxxxxxx
1x
GET Wavefront ID
{SY}{JP}{NAME}.{TYPE} {DST}
0
xx
x
100101
x1xx
xxxxxxxx
xxxxxxxx
1x
GET Fiber ID (gl_SubgroupID)
{SY}{JP}{NAME}.{TYPE} {DST}
0
xx
x
100110
11xx
xxxxxxxx
xxxxxxxx
1x
RESourceINFO - returns image/ssbo dimensions (3 components)
{SY}{JP}{NAME}.{TYPED}.{D}d.{TYPE}.{TYPE_SIZE}.{MODE}{BASE} {DST}, {SSBO}
0
001111
0110
xxxxxxxx
1x
IBO (i.e. Image/SSBO) instructions
{SY}{JP}{NAME}.{TYPED}.{D}d.{TYPE}.{TYPE_SIZE}.{MODE}{BASE} {TYPE_HALF}{SRC1}, {SRC2}, {SSBO}
0110
STore IBo
0
011101
10
LoaD IBo
x
000110
10
x
010000
11
x
010001
11
x
010010
11
x
010101
11
x
010110
11
x
010111
11
x
011000
11
x
011001
11
x
011010
11
{D_MINUS_ONE} + 1
{TYPE_SIZE_MINUS_ONE} + 1
{LOAD_SIZE_MINUS_ONE} + 1
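The three derived expressions above share one pattern: the encoded field stores the value minus one, so the displayed value is the raw field plus one. A minimal sketch, with hypothetical helper names:

```python
def decode_minus_one(raw: int) -> int:
    # Displayed value (D, TYPE_SIZE, LOAD_SIZE) from a *_MINUS_ONE field.
    if raw < 0:
        raise ValueError("field values are non-negative")
    return raw + 1


def encode_minus_one(value: int) -> int:
    # Inverse mapping: the smallest representable value is 1.
    if value < 1:
        raise ValueError("value must be at least 1")
    return value - 1
```

With this scheme an n-bit field covers 1 through 2**n rather than 0 through 2**n - 1.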
{TYPED}
typed
untyped
{BINDLESS}
.base{BASE}
Source value that can be either an immediate or a GPR
{SRC_IM}
{IMMED}
r{GPR}.{SWIZ}
{MODE} == 0
Source mode for "new" a6xx+ instruction encodings
Immediate index.
Index from a uniform register (i.e. does not depend on flow control)
Index from a non-uniform register (i.e. potentially depends on flow control)