The BASE field is actually split across BASE_LO and BASE_HI,
but '.baseN' should only appear in the bindless case. The
easiest way to accomplish that is by splitting it out into a
bitset. We just arbitrarily map this to BASE_LO.
{BINDLESS}
.base{BASE}
({BASE_HI} * 2) | {BASE_LO}
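As a rough illustration of the expression above, the following Python sketch recombines the two halves of BASE. The function name is hypothetical, and since the field widths are not stated here, no range checking is attempted.

```python
def combine_base(base_hi: int, base_lo: int) -> int:
    """Mirror the expression ({BASE_HI} * 2) | {BASE_LO}."""
    # BASE_LO supplies bit 0; BASE_HI is moved up by one bit position.
    return (base_hi * 2) | base_lo
```

For example, `combine_base(1, 0)` gives 2, which would be displayed as `.base2`.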
The "normal" case, i.e. not s2en (indirect) and/or bindless
{SY}{JP}{NAME}{3D}{A}{O}{P}{S} {TYPE}({WRMASK}){DST_HALF}{DST}{SRC1}{SRC2}{SAMP}{TEX}
0x
0
101
{S2EN_BINDLESS}
The s2en (indirect) or bindless case
{SY}{JP}{NAME}{3D}{A}{O}{P}{S}{S2EN}{UNIFORM}{NONUNIFORM}{BASE} {TYPE}({WRMASK}){DST_HALF}{DST}{SRC1}{SRC2}{SRC3}{A1}
00
00000
00001
00010
00011
00100
00101
00110
00111
01000
01001
01010
01011
01100
GETINFO returns 4 values, in .xyzw:
x: A value associated with the channel type, i.e. OpenCL's
get_image_channel_data_type:
The values below were reverse-engineered on A420 and confirmed
against the blob's headers.
8_SNORM: 0 (CLK_SNORM_INT8)
16_SNORM: 1 (CLK_SNORM_INT16)
8_UNORM: 2 (CLK_UNORM_INT8)
16_UNORM: 3 (CLK_UNORM_INT16)
5_6_5_UNORM: 4 (CLK_UNORM_SHORT_565)
5_5_5_1_UNORM: 5 (CLK_UNORM_SHORT_555)
10_10_10_2_UNORM: 6 (CLK_UNORM_INT_101010, CLK_UNORM_SHORT_101010)
8_SINT: 7 (CLK_SIGNED_INT8)
16_SINT: 8 (CLK_SIGNED_INT16)
32_SINT: 9 (CLK_SIGNED_INT32)
8_UINT: 10 (CLK_UNSIGNED_INT8)
16_UINT: 11 (CLK_UNSIGNED_INT16)
32_UINT: 12 (CLK_UNSIGNED_INT32)
16_FLOAT: 13 (CLK_HALF_FLOAT)
32_FLOAT: 14 (CLK_FLOAT)
9_9_9_E5_FLOAT: 15 (CLK_FLOAT_10F_11F_11F)
11_11_10_FLOAT: 15 (CLK_FLOAT_10F_11F_11F)
10_10_10_2_UINT: 16 (CLK_UNSIGNED_SHORT_101010)
4_4_4_4_UNORM: 17 (CLK_UNORM_INT4)
X8Z24_UNORM: 18 (CLK_UNORM_INT32)
y: A value associated with the number of components
and swizzle, i.e. OpenCL's get_image_channel_order:
The values below were largely taken from the blob's headers.
A3xx/A4xx:
0: CLK_A
1: CLK_R
2: CLK_Rx
3: CLK_RG
4: CLK_RGx
5: CLK_RA
6: CLK_RGB
7: CLK_RGBx
8: CLK_RGBA
9: CLK_ARGB
10: CLK_BGRA
11: CLK_LUMINANCE
12: CLK_INTENSITY
13: CLK_ABGR
14: CLK_BGR
15: CLK_sRGB
16: CLK_sRGBA
17: CLK_DEPTH
A5xx/A6xx:
0: CLK_A
1: CLK_R
2: CLK_RX
3: CLK_RG
4: CLK_RGX
5: CLK_RA
6: CLK_RGB
7: CLK_RGBX
8: CLK_RGBA
9: CLK_ARGB
10: CLK_BGRA
11: CLK_INTENSITY
12: CLK_LUMINANCE
13: CLK_ABGR
14: CLK_DEPTH
15: CLK_sRGB
16: CLK_sRGBx
17: CLK_sRGBA
18: CLK_sBGRA
19: CLK_sARGB
20: CLK_sABGR
21: CLK_BGR
z: Number of levels
w: Number of samples
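To make the .xyzw layout concrete, here is a minimal Python sketch that turns a GETINFO result into readable names. The tables are copied from the lists above (only the A5xx/A6xx channel order is included); the function name, the dictionary keys, and the "unknown" fallback are illustrative assumptions, not part of any driver API.

```python
# GETINFO .x values, copied from the channel data type list above.
CHANNEL_DATA_TYPE = {
    0: "CLK_SNORM_INT8",
    1: "CLK_SNORM_INT16",
    2: "CLK_UNORM_INT8",
    3: "CLK_UNORM_INT16",
    4: "CLK_UNORM_SHORT_565",
    5: "CLK_UNORM_SHORT_555",
    6: "CLK_UNORM_INT_101010",  # also listed as CLK_UNORM_SHORT_101010
    7: "CLK_SIGNED_INT8",
    8: "CLK_SIGNED_INT16",
    9: "CLK_SIGNED_INT32",
    10: "CLK_UNSIGNED_INT8",
    11: "CLK_UNSIGNED_INT16",
    12: "CLK_UNSIGNED_INT32",
    13: "CLK_HALF_FLOAT",
    14: "CLK_FLOAT",
    15: "CLK_FLOAT_10F_11F_11F",
    16: "CLK_UNSIGNED_SHORT_101010",
    17: "CLK_UNORM_INT4",
    18: "CLK_UNORM_INT32",
}

# GETINFO .y values for A5xx/A6xx; the list index is the returned value.
CHANNEL_ORDER_A5XX_A6XX = [
    "CLK_A", "CLK_R", "CLK_RX", "CLK_RG", "CLK_RGX", "CLK_RA",
    "CLK_RGB", "CLK_RGBX", "CLK_RGBA", "CLK_ARGB", "CLK_BGRA",
    "CLK_INTENSITY", "CLK_LUMINANCE", "CLK_ABGR", "CLK_DEPTH",
    "CLK_sRGB", "CLK_sRGBx", "CLK_sRGBA", "CLK_sBGRA", "CLK_sARGB",
    "CLK_sABGR", "CLK_BGR",
]


def describe_getinfo(x: int, y: int, z: int, w: int) -> dict:
    """Name the four GETINFO components (hypothetical helper)."""
    if 0 <= y < len(CHANNEL_ORDER_A5XX_A6XX):
        order = CHANNEL_ORDER_A5XX_A6XX[y]
    else:
        order = "unknown"
    return {
        "channel_data_type": CHANNEL_DATA_TYPE.get(x, "unknown"),
        "channel_order": order,
        "levels": z,
        "samples": w,
    }
```

Note that A3xx/A4xx use a different numbering for .y, so a real decoder would need to pick the table by GPU generation.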
01101
01110
01111
10000
10001
10010
10011
10100
10101
10110
10111
11000
11001
11010
11011
The subgroup is divided into (subgroup_size / CLUSTER_SIZE)
clusters. For each cluster, brcst.active.w does the following:
Given a cluster of fibers f_0, f_1, ..., f_{CLUSTER_SIZE-1}, brcst
broadcasts the SRC value from fiber f_{CLUSTER_SIZE/2-1}
to fibers f_{CLUSTER_SIZE/2}, ..., f_{CLUSTER_SIZE-1}. The DST reg
in the other fibers is unaffected. If fiber f_{CLUSTER_SIZE/2-1} is
inactive, the value to broadcast is taken from the nearest lower
active fiber, checking f_{CLUSTER_SIZE/2-2}, f_{CLUSTER_SIZE/2-3}, ...
in turn. If all fibers f_0, f_1, ..., f_{CLUSTER_SIZE/2-1} are inactive,
the DST reg remains unchanged for all fibers.
This instruction is needed to implement arithmetic subgroup
operations with prefix sum (https://en.wikipedia.org/wiki/Prefix_sum).
For brcst.active.w8 without inactive fibers:
Fiber | 0 1 2 3 4 5 6 7 | 8 9 10 11 12 13 14 15
SRC | s0 s1 s2 s3 ... s7 | s8 ... s11 ... s15
DST_before | d0 d1 ... d7 | d8 ... d15
DST_after | d0 d1 d2 d3 s3 s3 s3 s3 | d8 ... d11 s11 s11 s11 s11
If fibers 2 and 3 are inactive:
Fiber | 0 1 X X 4 5 6 7 | ...
SRC | s0 s1 X X ... s7 | ...
DST_before | d0 d1 ... d7 | ...
DST_after | d0 d1 X X s1 s1 s1 s1 | ...
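The behaviour described above can be modelled in a few lines of Python. This is only a sketch of the documented semantics, not of the hardware: the function name and the list-based representation are made up for illustration, the subgroup size is assumed to be a multiple of the cluster size, and the choice to leave DST untouched in inactive upper-half fibers is an assumption (the text above only covers inactive fibers in the lower half).

```python
def brcst_active(src, dst, active, cluster_size):
    """Model brcst.active.w<cluster_size> over one subgroup.

    src, dst: per-fiber register values; active: per-fiber booleans.
    Returns the DST values after the instruction.
    """
    out = list(dst)
    half = cluster_size // 2
    for base in range(0, len(src), cluster_size):
        # Search downwards from f_{CLUSTER_SIZE/2-1} for an active fiber.
        source_fiber = None
        for i in range(base + half - 1, base - 1, -1):
            if active[i]:
                source_fiber = i
                break
        if source_fiber is None:
            # The whole lower half is inactive: DST stays unchanged.
            continue
        for i in range(base + half, base + cluster_size):
            # Assumption: inactive fibers in the upper half are not written.
            if active[i]:
                out[i] = src[source_fiber]
    return out
```

Running it on the two examples above (16 fibers, cluster size 8) gives `s3`/`s11` in the upper halves when every fiber is active, and `s1` in fibers 4 to 7 when fibers 2 and 3 are inactive.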
{SY}{JP}{NAME}.w{CLUSTER_SIZE} {TYPE}({WRMASK}){DST_HALF}{DST}{SRC1}
111110
2 << {W}
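The cluster size shown in the `.w{CLUSTER_SIZE}` suffix is derived from the W field by the expression above. A one-line Python sketch (the function name is hypothetical, and the width of W is not stated here, so no upper bound is enforced):

```python
def cluster_size_from_w(w: int) -> int:
    """Mirror the expression 2 << {W}."""
    return 2 << w
```

Under this expression W = 0 gives a cluster size of 2, and W = 2 gives the `.w8` form used in the examples above.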
{SY}{JP}{NAME} {TYPE}({WRMASK}){DST_HALF}{DST}{SRC1}{SRC2}
111111
subgroupQuadBroadcast
00
subgroupQuadSwapHorizontal
01
subgroupQuadSwapVertical
10
subgroupQuadSwapDiagonal
11
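The four encodings above are named after the GLSL subgroup quad operations. As a sketch of what those names mean, assuming the hardware follows the GLSL semantics and that the four fibers of a quad are numbered 0 to 3 in the usual 2x2 order (both assumptions, as is every identifier below), a quad can be modelled like this:

```python
# Two-bit encodings as listed above.
QUAD_BROADCAST = 0b00
QUAD_SWAP_HORIZONTAL = 0b01
QUAD_SWAP_VERTICAL = 0b10
QUAD_SWAP_DIAGONAL = 0b11

# XOR mask applied to the quad-local fiber index for each swap,
# following the GLSL definitions of the swap operations.
SWAP_MASK = {
    QUAD_SWAP_HORIZONTAL: 1,
    QUAD_SWAP_VERTICAL: 2,
    QUAD_SWAP_DIAGONAL: 3,
}


def quad_shuffle(op, values, index=0):
    """Apply one quad operation to the four values of a quad.

    `index` is only used by the broadcast and selects the source fiber.
    """
    if len(values) != 4:
        raise ValueError("a quad has exactly four fibers")
    if op == QUAD_BROADCAST:
        return [values[index]] * 4
    mask = SWAP_MASK[op]
    return [values[i ^ mask] for i in range(4)]
```

For instance, a horizontal swap of `["a", "b", "c", "d"]` yields `["b", "a", "d", "c"]`.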
{NUM_SRC} > 0
, {HALF}{SRC}
00000000
{O} || ({NUM_SRC} > 1)
, {HALF}{SRC}
00000000
{HAS_SAMP}
, s#{SAMP}
0000
The s2en (indirect) / bindless case with a1.x has an 8-bit samp
{HAS_SAMP}
, s#{SAMP}
00000000
{HAS_TEX}
, t#{TEX}
0000000
The s2en (indirect) / bindless case only has a 4-bit tex
{HAS_TEX}
, t#{TEX}
0000
{HAS_TYPE}
({TYPE})
We don't actually display this enum, but it is useful to
document the various cases.
TODO: we should probably have an option for uniforms w/out
display strings, but which have 'C' names that can be used
to generate a header that the compiler can use.
0 (CAT5_UNIFORM):
Use traditional GL binding model, get texture and sampler index
from src3 which is presumed to be uniform on a4xx+ (a3xx doesn't
have the other modes, but does handle non-uniform indexing).
1 (CAT5_BINDLESS_A1_UNIFORM):
The sampler base comes from the low 3 bits of a1.x, and the sampler
and texture index come from src3 which is presumed to be uniform.
2 (CAT5_BINDLESS_NONUNIFORM):
The texture and sampler share the same base, and the sampler and
texture index come from src3 which is *not* presumed to be uniform.
3 (CAT5_BINDLESS_A1_NONUNIFORM):
The sampler base comes from the low 3 bits of a1.x, and the sampler
and texture index come from src3 which is *not* presumed to be
uniform.
4 (CAT5_NONUNIFORM):
Use traditional GL binding model, get texture and sampler index
from src3 which is *not* presumed to be uniform.
5 (CAT5_BINDLESS_UNIFORM):
The texture and sampler share the same base, and the sampler and
texture index come from src3 which is presumed to be uniform.
6 (CAT5_BINDLESS_IMM):
The texture and sampler share the same base, get sampler index from low
4 bits of src3 and texture index from high 4 bits.
7 (CAT5_BINDLESS_A1_IMM):
The sampler base comes from the low 3 bits of a1.x, and the texture
index comes from the next 8 bits of a1.x. The sampler index is an
immediate in src3.
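The bit layouts mentioned in the last two descriptions can be sketched in Python as follows. The helper names are hypothetical, and the code assumes that src3 is treated as an 8-bit value in the immediate case (which the "low 4 bits / high 4 bits" wording suggests but does not state outright).

```python
def decode_a1x(a1x: int):
    """Split a1.x into (sampler_base, texture_index).

    Low 3 bits: sampler base. Next 8 bits: texture index.
    """
    sampler_base = a1x & 0x7
    texture_index = (a1x >> 3) & 0xFF
    return sampler_base, texture_index


def decode_src3_imm(src3: int):
    """Split an immediate src3 into (sampler_index, texture_index).

    Low 4 bits: sampler index. High 4 bits: texture index.
    Assumes src3 is an 8-bit value.
    """
    sampler_index = src3 & 0xF
    texture_index = (src3 >> 4) & 0xF
    return sampler_index, texture_index
```

For example, `decode_src3_imm(0x3C)` returns `(12, 3)`: sampler 12 and texture 3.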
{DESC_MODE} < 6 /* CAT5_BINDLESS_IMM */
({DESC_MODE} == 1) /* CAT5_BINDLESS_A1_UNIFORM */ ||
({DESC_MODE} == 2) /* CAT5_BINDLESS_NONUNIFORM */ ||
({DESC_MODE} == 3) /* CAT5_BINDLESS_A1_NONUNIFORM */ ||
({DESC_MODE} == 5) /* CAT5_BINDLESS_UNIFORM */ ||
({DESC_MODE} == 6) /* CAT5_BINDLESS_IMM */ ||
({DESC_MODE} == 7) /* CAT5_BINDLESS_A1_IMM */
({DESC_MODE} == 1) /* CAT5_BINDLESS_A1_UNIFORM */ ||
({DESC_MODE} == 3) /* CAT5_BINDLESS_A1_NONUNIFORM */ ||
({DESC_MODE} == 7) /* CAT5_BINDLESS_A1_IMM */
({DESC_MODE} == 0) /* CAT5_UNIFORM */ ||
({DESC_MODE} == 1) /* CAT5_BINDLESS_A1_UNIFORM */ ||
({DESC_MODE} == 5) /* CAT5_BINDLESS_UNIFORM */
({DESC_MODE} == 2) /* CAT5_BINDLESS_NONUNIFORM */ ||
({DESC_MODE} == 3) /* CAT5_BINDLESS_A1_NONUNIFORM */ ||
({DESC_MODE} == 4) /* CAT5_NONUNIFORM */
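The five expressions above classify DESC_MODE in different ways. A Python sketch of the same predicates is given below; the enum values and names come from the comments in the expressions, while the class name and the predicate names are guesses based on which modes each expression selects, since the expressions are not named here.

```python
from enum import IntEnum


class Cat5DescMode(IntEnum):
    CAT5_UNIFORM = 0
    CAT5_BINDLESS_A1_UNIFORM = 1
    CAT5_BINDLESS_NONUNIFORM = 2
    CAT5_BINDLESS_A1_NONUNIFORM = 3
    CAT5_NONUNIFORM = 4
    CAT5_BINDLESS_UNIFORM = 5
    CAT5_BINDLESS_IMM = 6
    CAT5_BINDLESS_A1_IMM = 7


def is_indirect(mode):
    # {DESC_MODE} < 6: every mode except the two immediate ones.
    return mode < Cat5DescMode.CAT5_BINDLESS_IMM


def is_bindless(mode):
    # Modes 1, 2, 3, 5, 6, 7: everything with BINDLESS in its name.
    return mode in (1, 2, 3, 5, 6, 7)


def uses_a1(mode):
    # Modes 1, 3, 7: everything with A1 in its name.
    return mode in (1, 3, 7)


def is_uniform(mode):
    # Modes 0, 1, 5.
    return mode in (0, 1, 5)


def is_nonuniform(mode):
    # Modes 2, 3, 4.
    return mode in (2, 3, 4)
```

One detail this makes easy to check: the two immediate modes (6 and 7) are neither uniform nor nonuniform under these expressions.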
Bindless/indirect src3, which can be either a GPR or samp/tex
, {SRC_HALF}{SRC}
!{BINDLESS}
In the case that a1.x is used, all 8 bits encode the sampler
{SAMP}
{SAMP}{TEX}