Name

    INTEL_shader_integer_functions2

Name Strings

    GL_INTEL_shader_integer_functions2

Contact

    Ian Romanick <ian.d.romanick@intel.com>

Contributors


Status

    In progress

Version

    Last Modification Date: 11/25/2019
    Revision: 5

Number

    OpenGL Extension #547
    OpenGL ES Extension #323

Dependencies

    This extension is written against the OpenGL 4.6 (Core Profile)
    Specification.

    This extension is written against Version 4.60 (Revision 03) of the OpenGL
    Shading Language Specification.

    GLSL 1.30 (OpenGL), GLSL ES 3.00 (OpenGL ES), or EXT_gpu_shader4 (OpenGL)
    is required.

    This extension interacts with ARB_gpu_shader_int64.

    This extension interacts with AMD_gpu_shader_int16.

    This extension interacts with OpenGL 4.6 and ARB_gl_spirv.

    This extension interacts with EXT_shader_explicit_arithmetic_types.

Overview

    OpenCL and other GPU programming environments provide a number of useful
    functions operating on integer data.  Many of these functions are
    supported by specialized instructions on various GPUs.  Correct GLSL
    implementations for some of these functions are non-trivial.  Recognizing
    open-coded versions of these functions is often impractical.  As a result,
    potential performance improvements go unrealized.

    This extension makes available a number of functions that have specialized
    instruction support on Intel GPUs.

New Procedures and Functions

    None

New Tokens

    None

IP Status

    No known IP claims.

Modifications to the OpenGL Shading Language Specification, Version 4.60

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_INTEL_shader_integer_functions2 : <behavior>

    where <behavior> is as specified in section 3.3.

    A new preprocessor #define is added to the OpenGL Shading Language:

      #define GL_INTEL_shader_integer_functions2        1

Additions to Chapter 8 of the OpenGL Shading Language Specification
(Built-in Functions)

    Modify Section 8.8, Integer Functions

    (add new rows after the existing "findMSB" table row, p. 161)

    genUType countLeadingZeros(genUType value)

    Returns the number of leading 0-bits, starting at the most significant bit,
    in the binary representation of value.  If value is zero, the size in bits
    of the type of value or component type of value (if value is a vector) will
    be returned.


    genUType countTrailingZeros(genUType value)

    Returns the number of trailing 0-bits, starting at the least significant
    bit, in the binary representation of value.  If value is zero, the size in
    bits of the type of value or component type of value (if value is a vector)
    will be returned.
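
    Non-normative illustration.  The zero-input behavior described above can
    be modeled on the CPU.  The sketch below is written in C rather than GLSL
    so that it can be checked outside a shader; it is not part of the
    extension, covers only the scalar 32-bit uint case, and the names
    clz32_model and ctz32_model are hypothetical.

```c
#include <stdint.h>

/* Hypothetical CPU-side model of countLeadingZeros for a 32-bit uint.
 * Scans down from the most significant bit; returns 32 when value is zero. */
uint32_t clz32_model(uint32_t value)
{
    uint32_t count = 0;
    for (uint32_t bit = UINT32_C(1) << 31; bit != 0 && (value & bit) == 0; bit >>= 1)
        count++;
    return count;
}

/* Hypothetical CPU-side model of countTrailingZeros for a 32-bit uint.
 * Scans up from the least significant bit; returns 32 when value is zero. */
uint32_t ctz32_model(uint32_t value)
{
    uint32_t count = 0;
    for (uint32_t bit = 1; bit != 0 && (value & bit) == 0; bit <<= 1)
        count++;
    return count;
}
```

    Both loops stop either at the first set bit or after all 32 positions
    have been examined, which is what yields the result of 32 for a zero
    input.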


    genUType absoluteDifference(genUType x, genUType y)
    genUType absoluteDifference(genIType x, genIType y)
    genU64Type absoluteDifference(genU64Type x, genU64Type y)
    genU64Type absoluteDifference(genI64Type x, genI64Type y)
    genU16Type absoluteDifference(genU16Type x, genU16Type y)
    genU16Type absoluteDifference(genI16Type x, genI16Type y)

    Returns |x - y| clamped to the range of the return type (instead of modulo
    overflowing).  Note: the return type of each of these functions is an
    unsigned type of the same bit-size and vector element count.


    genUType addSaturate(genUType x, genUType y)
    genIType addSaturate(genIType x, genIType y)
    genU64Type addSaturate(genU64Type x, genU64Type y)
    genI64Type addSaturate(genI64Type x, genI64Type y)
    genU16Type addSaturate(genU16Type x, genU16Type y)
    genI16Type addSaturate(genI16Type x, genI16Type y)

    Returns x + y clamped to the range of the type of x (instead of modulo
    overflowing).
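
    Non-normative illustration.  The clamping behavior of absoluteDifference
    and addSaturate can be modeled on the CPU by widening before the
    arithmetic.  This is a minimal C sketch for the scalar 32-bit overloads
    only; it is not how any implementation is required to work, and the
    function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of absoluteDifference for 32-bit signed inputs.
 * The result is unsigned, so |INT32_MIN - INT32_MAX| = 2^32 - 1 still fits. */
uint32_t abs_diff_i32_model(int32_t x, int32_t y)
{
    /* Widen first so the subtraction itself cannot overflow. */
    int64_t d = (int64_t)x - (int64_t)y;
    return (uint32_t)(d < 0 ? -d : d);
}

/* Hypothetical model of addSaturate for 32-bit unsigned inputs. */
uint32_t add_sat_u32_model(uint32_t x, uint32_t y)
{
    uint32_t sum = x + y;              /* wraps modulo 2^32 on overflow */
    return sum < x ? UINT32_MAX : sum; /* a wrapped sum is smaller than x */
}

/* Hypothetical model of addSaturate for 32-bit signed inputs. */
int32_t add_sat_i32_model(int32_t x, int32_t y)
{
    int64_t sum = (int64_t)x + (int64_t)y;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```

    For example, add_sat_i32_model(INT32_MAX, 1) stays at INT32_MAX, where a
    plain modulo addition would wrap around to INT32_MIN.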


    genUType average(genUType x, genUType y)
    genIType average(genIType x, genIType y)
    genU64Type average(genU64Type x, genU64Type y)
    genI64Type average(genI64Type x, genI64Type y)
    genU16Type average(genU16Type x, genU16Type y)
    genI16Type average(genI16Type x, genI16Type y)

    Returns (x+y) >> 1.  The intermediate sum does not modulo overflow.


    genUType averageRounded(genUType x, genUType y)
    genIType averageRounded(genIType x, genIType y)
    genU64Type averageRounded(genU64Type x, genU64Type y)
    genI64Type averageRounded(genI64Type x, genI64Type y)
    genU16Type averageRounded(genU16Type x, genU16Type y)
    genI16Type averageRounded(genI16Type x, genI16Type y)

    Returns (x+y+1) >> 1.  The intermediate sum does not modulo overflow.


    genUType subtractSaturate(genUType x, genUType y)
    genIType subtractSaturate(genIType x, genIType y)
    genU64Type subtractSaturate(genU64Type x, genU64Type y)
    genI64Type subtractSaturate(genI64Type x, genI64Type y)
    genU16Type subtractSaturate(genU16Type x, genU16Type y)
    genI16Type subtractSaturate(genI16Type x, genI16Type y)

    Returns x - y clamped to the range of the type of x (instead of modulo
    overflowing).
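
    Non-normative illustration.  The "does not modulo overflow" wording for
    average and averageRounded, and the clamping of subtractSaturate, can be
    modeled on the CPU with a wider intermediate.  This is a minimal C sketch
    covering only scalar 32-bit cases; the function names are hypothetical and
    nothing here is mandated by the extension.

```c
#include <stdint.h>

/* Hypothetical model of average for 32-bit unsigned inputs: (x + y) >> 1
 * with the sum formed in 64 bits so that it cannot wrap. */
uint32_t average_u32_model(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x + (uint64_t)y) >> 1);
}

/* Hypothetical model of averageRounded: (x + y + 1) >> 1 without wrapping. */
uint32_t average_rounded_u32_model(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x + (uint64_t)y + 1u) >> 1);
}

/* Hypothetical model of subtractSaturate for 32-bit unsigned inputs. */
uint32_t sub_sat_u32_model(uint32_t x, uint32_t y)
{
    return x > y ? x - y : 0u; /* clamp at zero instead of wrapping */
}

/* Hypothetical model of subtractSaturate for 32-bit signed inputs. */
int32_t sub_sat_i32_model(int32_t x, int32_t y)
{
    int64_t diff = (int64_t)x - (int64_t)y;
    if (diff > INT32_MAX) return INT32_MAX;
    if (diff < INT32_MIN) return INT32_MIN;
    return (int32_t)diff;
}
```

    With these models, average_rounded_u32_model(x, y) - ((x ^ y) & 1) equals
    average_u32_model(x, y), which is the relationship cited in Issue 6 of
    this extension.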


    genUType multiply32x16(genUType x_32_bits, genUType y_16_bits)
    genIType multiply32x16(genIType x_32_bits, genIType y_16_bits)
    genUType multiply32x16(genUType x_32_bits, genU16Type y_16_bits)
    genIType multiply32x16(genIType x_32_bits, genI16Type y_16_bits)

    Returns x * y, where only the (possibly sign-extended) low 16 bits of y
    are used.  In cases where one of the signed operands is known to be in the
    range [-2^15, (2^15)-1], or one of the unsigned operands is known to be in
    the range [0, (2^16)-1], this may provide a higher performance multiply.

Interactions with OpenGL 4.6 and ARB_gl_spirv

    If OpenGL 4.6 or ARB_gl_spirv is supported, then
    SPV_INTEL_shader_integer_functions2 must also be supported.

    The IntegerFunctions2INTEL capability is available whenever the
    implementation supports INTEL_shader_integer_functions2.

Interactions with ARB_gpu_shader_int64 and EXT_shader_explicit_arithmetic_types_int64

    If the shader enables only INTEL_shader_integer_functions2 but not
    ARB_gpu_shader_int64 or EXT_shader_explicit_arithmetic_types_int64,
    remove all function overloads that have either genU64Type or genI64Type
    parameters.

Interactions with AMD_gpu_shader_int16 and EXT_shader_explicit_arithmetic_types_int16

    If the shader enables only INTEL_shader_integer_functions2 but not
    AMD_gpu_shader_int16 or EXT_shader_explicit_arithmetic_types_int16,
    remove all function overloads that have either genU16Type or genI16Type
    parameters.

Issues

    1) What should this extension be called?

    RESOLVED: There already exists a MESA_shader_integer_functions extension,
    so this is called INTEL_shader_integer_functions2 to prevent confusion.

    2) How does countLeadingZeros differ from findMSB?

    RESOLVED: countLeadingZeros is only defined for unsigned types, and it is
    equivalent to 32-(findMSB(x)+1).  This corresponds to the clz() function in
    OpenCL and the LZD (leading zero detection) instruction on Intel GPUs.

    3) How does countTrailingZeros differ from findLSB?

    RESOLVED: countTrailingZeros is equivalent to min(genUType(findLSB(x)),
    32).  This corresponds to the ctz() function in OpenCL.

    4) Should 64-bit versions of countLeadingZeros and countTrailingZeros be
    provided?

    RESOLVED: NO.
    OpenCL has 64-bit versions of clz() and ctz(), but OpenGL
    does not have 64-bit versions of findMSB() or findLSB() even when
    ARB_gpu_shader_int64 is supported.  The instructions used to implement
    countLeadingZeros and countTrailingZeros do not natively support 64-bit
    operands.

    The implementation of 64-bit countLeadingZeros() would be 5 instructions,
    and the implementation of 64-bit countTrailingZeros() would be 7
    instructions.  Neither of these is better than an application developer
    could achieve in GLSL:

        uint countLeadingZeros(uint64_t value)
        {
            // v.x holds the low 32 bits, v.y the high 32 bits.
            uvec2 v = unpackUint2x32(value);

            return v.y == 0u
                ? 32u + countLeadingZeros(v.x) : countLeadingZeros(v.y);
        }

        uint countTrailingZeros(uint64_t value)
        {
            // v.x holds the low 32 bits, v.y the high 32 bits.
            uvec2 v = unpackUint2x32(value);

            return v.x == 0u
                ? 32u + countTrailingZeros(v.y) : countTrailingZeros(v.x);
        }

    5) Should 64-bit versions of the arithmetic functions be provided?

    RESOLVED: NO.  Since recent generations of Intel GPUs have removed
    hardware support for 64-bit integer arithmetic, there doesn't seem to be
    much value in providing 64-bit arithmetic functions.

    6) Should this extension include average()?

    RESOLVED: YES.  average() corresponds to hadd() in OpenCL, and
    averageRounded() corresponds to rhadd() in OpenCL.

    averageRounded() corresponds to the AVG instruction on Intel GPUs.
    average(), on the other hand, does not correspond to a single instruction.
    The signed and unsigned versions may have slightly different
    implementations depending on the specific GPU.  In the worst case, the
    implementation is 4 instructions (e.g., averageRounded(x, y) - ((x ^ y) &
    1)), and in the best case it is 3 instructions.

Revision History

    Rev  Date         Author    Changes
    ---  -----------  --------  ---------------------------------------------
      1  04-Sep-2018  idr       Initial version.
      2  19-Sep-2018  idr       Add interactions with AMD_gpu_shader_int16.
      3  22-Jan-2019  idr       Add interactions with EXT_shader_explicit_arithmetic_types.
      4  14-Nov-2019  idr       Resolve issue #1 and issue #5.
      5  25-Nov-2019  idr       Fix a bunch of typos noticed by @cmarcelo.