# Notes on opcodes

_Notes mainly by Connor Abbott extracted from the disassembler_

LOG_FREXPM:

    // From the ARM patent US20160364209A1:
    // "Decompose v (the input) into numbers x1 and s such that v = x1 * 2^s,
    // and x1 is a floating point value in a predetermined range where the
    // value 1 is within the range and not at one extremity of the range (e.g.
    // choose a range where 1 is towards middle of range)."
    //
    // This computes x1.

FRCP_FREXPM:

    // Given a floating point number m * 2^e, returns m * 2^{-1}. This is
    // exactly the same as the mantissa part of frexp().

FSQRT_FREXPM:

    // Given a floating point number m * 2^e, returns m * 2^{-2} if e is even,
    // and m * 2^{-1} if e is odd. In other words, scales by powers of 4 until
    // within the range [0.25, 1). Used for square-root and reciprocal
    // square-root.

FRCP_FREXPE:

    // Given a floating point number m * 2^e, computes -e - 1 as an integer.
    // Zero and infinity/NaN return 0.

FSQRT_FREXPE:

    // Computes floor(e/2) + 1.
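
The FRCP_FREXPM / FRCP_FREXPE and FSQRT_FREXPM / FSQRT_FREXPE pairs above can be read as an argument reduction: split the input into a mantissa in a fixed range plus an integer exponent, evaluate the function on the mantissa, then scale by a power of two. The following is a minimal Python sketch of that arithmetic only. It assumes positive, normalized inputs, uses Python doubles rather than 32-bit floats, and the helper names `rcp_via_frexp` and `sqrt_via_frexp` are hypothetical; it does not claim to reproduce the hardware bit-for-bit.

```python
import math


def frcp_frexpm(x: float) -> float:
    """m * 2^-1, i.e. the frexp() mantissa, in [0.5, 1) for positive finite x."""
    mant, _ = math.frexp(x)
    return mant


def frcp_frexpe(x: float) -> int:
    """-e - 1 as an integer; zero, infinity and NaN return 0 as the notes state."""
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return 0
    _, exp = math.frexp(x)  # math.frexp reports exp = e + 1
    return -exp


def fsqrt_frexpm(x: float) -> float:
    """m * 2^-2 if e is even, m * 2^-1 if e is odd; lands in [0.25, 1)."""
    mant, exp = math.frexp(x)  # mant = m * 2^-1, exp = e + 1
    e = exp - 1
    return mant / 2 if e % 2 == 0 else mant


def fsqrt_frexpe(x: float) -> int:
    """floor(e / 2) + 1 as an integer."""
    _, exp = math.frexp(x)
    return (exp - 1) // 2 + 1


def rcp_via_frexp(x: float) -> float:
    """Hypothetical helper: 1/x = (1 / mantissa) * 2^exponent."""
    return math.ldexp(1.0 / frcp_frexpm(x), frcp_frexpe(x))


def sqrt_via_frexp(x: float) -> float:
    """Hypothetical helper: sqrt(x) = sqrt(mantissa) * 2^exponent."""
    return math.ldexp(math.sqrt(fsqrt_frexpm(x)), fsqrt_frexpe(x))


if __name__ == "__main__":
    for value in (0.1, 1.0, 6.0, 8.0):
        print(value, rcp_via_frexp(value), sqrt_via_frexp(value))
```

Scaling the square-root mantissa by powers of 4 keeps the exponent halving exact, which is why FSQRT_FREXPM distinguishes even and odd e.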

FRSQ_FREXPE:

    // Given a floating point number m * 2^e, computes -floor(e/2) - 1 as an
    // integer.

LSHIFT_ADD_LOW32:

    // These instructions in the FMA slot, together with LSHIFT_ADD_HIGH32.i32
    // in the ADD slot, allow one to do a 64-bit addition with an extra small
    // shift on one of the sources. There are three possible scenarios:
    //
    // 1) Full 64-bit addition. Do:
    //    out.x = LSHIFT_ADD_LOW32.i64 src1.x, src2.x, shift
    //    out.y = LSHIFT_ADD_HIGH32.i32 src1.y, src2.y
    //
    // The shift amount is applied to src2 before adding. The shift amount, and
    // any extra bits from src2 plus the overflow bit, are sent directly from
    // FMA to ADD instead of being passed explicitly. Hence, these two must be
    // bundled together into the same instruction.
    //
    // 2) Add a 64-bit value src1 to a zero-extended 32-bit value src2. Do:
    //    out.x = LSHIFT_ADD_LOW32.u32 src1.x, src2, shift
    //    out.y = LSHIFT_ADD_HIGH32.i32 src1.x, 0
    //
    // Note that in this case, the second argument to LSHIFT_ADD_HIGH32 is
    // ignored, so it can actually be anything. As before, the shift is applied
    // to src2 before adding.
    //
    // 3) Add a 64-bit value to a sign-extended 32-bit value src2.
    //    Do:
    //    out.x = LSHIFT_ADD_LOW32.i32 src1.x, src2, shift
    //    out.y = LSHIFT_ADD_HIGH32.i32 src1.x, 0
    //
    // The only difference is the .i32 instead of .u32. Otherwise, this is
    // exactly the same as before.
    //
    // In all these instructions, the shift amount is stored where the third
    // source would be, so the shift has to be a small immediate from 0 to 7.
    // This is fine for the expected use-case of these instructions, which is
    // manipulating 64-bit pointers.
    //
    // These instructions can also be combined with various load/store
    // instructions which normally take a 64-bit pointer in order to add a
    // 32-bit or 64-bit offset to the pointer before doing the operation,
    // optionally shifting the offset. The load/store op implicitly does
    // LSHIFT_ADD_HIGH32.i32 internally. Letting ptr be the pointer, and offset
    // the desired offset, the cases go as follows:
    //
    // 1) Add a 64-bit offset:
    //    LSHIFT_ADD_LOW32.i64 ptr.x, offset.x, shift
    //    ld_st_op ptr.y, offset.y, ...
    //
    // Note that the output of LSHIFT_ADD_LOW32.i64 is not used, instead being
    // implicitly sent to the load/store op to serve as the low 32 bits of the
    // pointer.
    //
    // 2) Add a 32-bit unsigned offset:
    //    temp = LSHIFT_ADD_LOW32.u32 ptr.x, offset, shift
    //    ld_st_op temp, ptr.y, ...
    //
    // Now, the low 32 bits of (offset << shift) + ptr are passed explicitly to
    // the ld_st_op, to match the case where there is no offset and ld_st_op is
    // called directly.
    //
    // 3) Add a 32-bit signed offset:
    //    temp = LSHIFT_ADD_LOW32.i32 ptr.x, offset, shift
    //    ld_st_op temp, ptr.y, ...
    //
    // Again, the same as the unsigned case except that the offset is
    // sign-extended.

---

ADD ops..

F16_TO_F32.X: // take the low 16 bits, and expand it to a 32-bit float
F16_TO_F32.Y: // take the high 16 bits, and expand it to a 32-bit float

MOV:

    // Logically, this should be SWZ.XY, but that's equivalent to a move, and
    // this seems to be the canonical way the blob generates a MOV.

FRCP_FREXPM:

    // Given a floating point number m * 2^e, returns m * 2^{-1}.
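
The F16_TO_F32.X and F16_TO_F32.Y notes above describe unpacking one half of a 32-bit word. A small Python sketch of that behaviour follows, assuming the 16-bit halves are IEEE 754 binary16 values (the notes do not state the encoding explicitly). The function names are hypothetical and simply mirror the opcode names.

```python
import struct


def f16_to_f32_x(word: int) -> float:
    """Expand the low 16 bits of a 32-bit word, read as a binary16 float."""
    half_bits = word & 0xFFFF
    return struct.unpack("<e", struct.pack("<H", half_bits))[0]


def f16_to_f32_y(word: int) -> float:
    """Expand the high 16 bits of a 32-bit word, read as a binary16 float."""
    half_bits = (word >> 16) & 0xFFFF
    return struct.unpack("<e", struct.pack("<H", half_bits))[0]


if __name__ == "__main__":
    # 0x3C00 encodes 1.0 and 0x4000 encodes 2.0 in binary16.
    packed = 0x40003C00
    print(f16_to_f32_x(packed), f16_to_f32_y(packed))
```

Python returns a double here, but every binary16 value is exactly representable as a 32-bit float, so the numeric result matches what a widening conversion would produce.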

FLOG_FREXPE:

    // From the ARM patent US20160364209A1:
    // "Decompose v (the input) into numbers x1 and s such that v = x1 * 2^s,
    // and x1 is a floating point value in a predetermined range where the
    // value 1 is within the range and not at one extremity of the range (e.g.
    // choose a range where 1 is towards middle of range)."
    //
    // This computes s.

LD_UBO.v4i32:

    // src0 = offset, src1 = binding

FRCP_FAST.f32:

    // *_FAST does not exist on G71 (added to G51, G72, and everything after)

FRCP_TABLE:

    // Given a floating point number m * 2^e, produces a table-based
    // approximation of 2/m using the top 17 bits. Includes special cases for
    // infinity, NaN, and zero, and copies the sign bit.

FRCP_FAST.f16.X:

    // Exists on G71

FRSQ_TABLE:

    // A similar table for inverse square root, using the high 17 bits of the
    // mantissa as well as the low bit of the exponent.

FRCP_APPROX:

    // Used in the argument reduction for log. Given a floating-point number
    // m * 2^e, uses the top 4 bits of m to produce an approximation to 1/m
    // with the exponent forced to 0 and only the top 5 bits nonzero.
    // 0, infinity, and NaN all return 1.0.
    // See the ARM patent for more information.

MUX:

    // For each bit i, return src2[i] ? src0[i] : src1[i]. In other words, this
    // is the same as (src2 & src0) | (~src2 & src1).

ST_VAR:

    // store a varying given the address and datatype from LD_VAR_ADDR

LD_VAR_ADDR:

    // Compute varying address and datatype (for storing in the vertex shader),
    // and store the vec3 result in the data register. The result is passed as
    // the 3 normal arguments to ST_VAR.

DISCARD:

    // Conditional discards (discard_if) in NIR. Compares the first two
    // sources and discards if the result is true.

ATEST.f32:

    // Implements alpha-to-coverage, as well as possibly the late depth and
    // stencil tests. The first source is the existing sample mask in R60
    // (possibly modified by gl_SampleMask), and the second source is the alpha
    // value. The sample mask is written right away based on the
    // alpha-to-coverage result using the normal register write mechanism,
    // since that doesn't need to read from any memory, and then written again
    // later based on the result of the stencil and depth tests using the
    // special register.

BLEND:

    // This takes the sample coverage mask (computed by ATEST above) as a
    // regular argument, in addition to the vec4 color in the special register.
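
The LSHIFT_ADD_LOW32 / LSHIFT_ADD_HIGH32 pairing described in the FMA notes earlier can be modelled as a 64-bit addition split into two 32-bit halves. The sketch below is an arithmetic model only, not a description of the real datapath: the Python signatures, the `mode` strings, and the single `carry` integer standing in for the bits passed from FMA to ADD are all assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def lshift_add_low32(src1_lo: int, src2_lo: int, shift: int) -> tuple[int, int]:
    """Model of the FMA half: low 32 bits of the sum, plus the bits that spill over."""
    if not 0 <= shift <= 7:
        raise ValueError("shift must be a small immediate from 0 to 7")
    total = src1_lo + (src2_lo << shift)
    # 'carry' holds both the bits shifted out of src2 and the overflow bit.
    return total & MASK32, total >> 32


def lshift_add_high32(src1_hi: int, src2_hi: int, shift: int, carry: int) -> int:
    """Model of the ADD half: high 32 bits, consuming the spill-over from FMA."""
    return (src1_hi + (src2_hi << shift) + carry) & MASK32


def split_src2(src2: int, mode: str) -> tuple[int, int]:
    """Return (low, high) words of src2 for the .i64, .u32 and .i32 variants."""
    lo = src2 & MASK32
    if mode == "i64":
        return lo, (src2 >> 32) & MASK32
    if mode == "u32":
        return lo, 0  # zero-extended
    if mode == "i32":
        return lo, MASK32 if lo & 0x80000000 else 0  # sign-extended
    raise ValueError(f"unknown mode: {mode}")


def lshift_add_64(src1: int, src2: int, shift: int, mode: str) -> int:
    """Compute (src1 + (src2 << shift)) mod 2^64 from the two 32-bit halves."""
    src2_lo, src2_hi = split_src2(src2, mode)
    out_lo, carry = lshift_add_low32(src1 & MASK32, src2_lo, shift)
    out_hi = lshift_add_high32((src1 >> 32) & MASK32, src2_hi, shift, carry)
    return (out_hi << 32) | out_lo


if __name__ == "__main__":
    ptr = 0x00000001FFFFFFF0
    # Index 4 scaled by 4 bytes (shift = 2) crosses the 32-bit boundary.
    print(hex(lshift_add_64(ptr, 4, 2, "u32")))
```

Passing the spill-over bits directly between the two halves is what lets the pointer-plus-scaled-offset computation fit in one bundled instruction, which matches the stated use case of manipulating 64-bit pointers.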