Name

    INTEL_shader_integer_functions2

Name Strings

    GL_INTEL_shader_integer_functions2

Contact

    Ian Romanick <ian.d.romanick@intel.com>

Contributors


Status

    In progress

Version

    Last Modification Date: 11/25/2019
    Revision: 5

Number

    OpenGL Extension #547
    OpenGL ES Extension #323

Dependencies

    This extension is written against the OpenGL 4.6 (Core Profile)
    Specification.

    This extension is written against Version 4.60 (Revision 03) of the OpenGL
    Shading Language Specification.

    GLSL 1.30 (OpenGL), GLSL ES 3.00 (OpenGL ES), or EXT_gpu_shader4 (OpenGL)
    is required.

    This extension interacts with ARB_gpu_shader_int64.

    This extension interacts with AMD_gpu_shader_int16.

    This extension interacts with OpenGL 4.6 and ARB_gl_spirv.

    This extension interacts with EXT_shader_explicit_arithmetic_types.

Overview

    OpenCL and other GPU programming environments provide a number of useful
    functions operating on integer data.  Many of these functions are
    supported by specialized instructions on various GPUs.  Correct GLSL
    implementations for some of these functions are non-trivial.  Recognizing
    open-coded versions of these functions is often impractical.  As a result,
    potential performance improvements go unrealized.

    This extension makes available a number of functions that have specialized
    instruction support on Intel GPUs.

New Procedures and Functions

    None

New Tokens

    None

IP Status

    No known IP claims.

Modifications to the OpenGL Shading Language Specification, Version 4.60

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_INTEL_shader_integer_functions2 : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_INTEL_shader_integer_functions2        1

Additions to Chapter 8 of the OpenGL Shading Language Specification
(Built-in Functions)

    Modify Section 8.8, Integer Functions

    (add new rows after the existing "findMSB" table row, p. 161)

    genUType countLeadingZeros(genUType value)

    Returns the number of leading 0-bits, starting at the most significant
    bit, in the binary representation of value.  If value is zero, the size in
    bits of the type of value (or of the component type of value, if value is
    a vector) will be returned.


    genUType countTrailingZeros(genUType value)

    Returns the number of trailing 0-bits, starting at the least significant
    bit, in the binary representation of value.  If value is zero, the size in
    bits of the type of value (or of the component type of value, if value is
    a vector) will be returned.


    genUType absoluteDifference(genUType x, genUType y)
    genUType absoluteDifference(genIType x, genIType y)
    genU64Type absoluteDifference(genU64Type x, genU64Type y)
    genU64Type absoluteDifference(genI64Type x, genI64Type y)
    genU16Type absoluteDifference(genU16Type x, genU16Type y)
    genU16Type absoluteDifference(genI16Type x, genI16Type y)

    Returns |x - y| clamped to the range of the return type (instead of modulo
    overflowing).  Note: the return type of each of these functions is an
    unsigned type of the same bit-size and vector element count.
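
    As a non-normative illustration, the scalar 32-bit behavior can be
    modeled on the host in C.  The helper names below are hypothetical and are
    not part of this extension.  The sketch shows why the return type is
    unsigned: the largest signed difference, INT32_MAX - INT32_MIN, equals
    2^32 - 1, which only fits in an unsigned 32-bit value.

```c
#include <stdint.h>

/* Hypothetical host-side model of absoluteDifference(uint, uint). */
static uint32_t abs_diff_u32(uint32_t x, uint32_t y)
{
    /* Always subtract the smaller value from the larger one. */
    return x > y ? x - y : y - x;
}

/* Hypothetical host-side model of absoluteDifference(int, int) -> uint. */
static uint32_t abs_diff_i32(int32_t x, int32_t y)
{
    /* Compare as signed, subtract as unsigned.  The unsigned subtraction
     * is exact modulo 2^32, and the true difference is always below 2^32. */
    return x > y ? (uint32_t)x - (uint32_t)y
                 : (uint32_t)y - (uint32_t)x;
}
```

    With this model, abs_diff_i32(INT32_MIN, INT32_MAX) yields UINT32_MAX
    rather than a wrapped or negative value.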


    genUType addSaturate(genUType x, genUType y)
    genIType addSaturate(genIType x, genIType y)
    genU64Type addSaturate(genU64Type x, genU64Type y)
    genI64Type addSaturate(genI64Type x, genI64Type y)
    genU16Type addSaturate(genU16Type x, genU16Type y)
    genI16Type addSaturate(genI16Type x, genI16Type y)

    Returns x + y clamped to the range of the type of x (instead of modulo
    overflowing).
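
    The clamping behavior can be sketched in C for the scalar 32-bit case.
    This is an illustrative host-side model with hypothetical helper names,
    not a description of how any implementation computes the result.

```c
#include <stdint.h>

/* Hypothetical model of addSaturate(uint, uint). */
static uint32_t add_sat_u32(uint32_t x, uint32_t y)
{
    uint32_t sum = x + y;               /* wraps modulo 2^32 */
    return sum < x ? UINT32_MAX : sum;  /* a wrapped sum is smaller than x */
}

/* Hypothetical model of addSaturate(int, int). */
static int32_t add_sat_i32(int32_t x, int32_t y)
{
    /* A 64-bit intermediate cannot overflow for 32-bit inputs. */
    int64_t sum = (int64_t)x + (int64_t)y;

    if (sum > INT32_MAX)
        return INT32_MAX;
    if (sum < INT32_MIN)
        return INT32_MIN;
    return (int32_t)sum;
}
```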


    genUType average(genUType x, genUType y)
    genIType average(genIType x, genIType y)
    genU64Type average(genU64Type x, genU64Type y)
    genI64Type average(genI64Type x, genI64Type y)
    genU16Type average(genU16Type x, genU16Type y)
    genI16Type average(genI16Type x, genI16Type y)

    Returns (x+y) >> 1.  The intermediate sum does not modulo overflow.


    genUType averageRounded(genUType x, genUType y)
    genIType averageRounded(genIType x, genIType y)
    genU64Type averageRounded(genU64Type x, genU64Type y)
    genI64Type averageRounded(genI64Type x, genI64Type y)
    genU16Type averageRounded(genU16Type x, genU16Type y)
    genI16Type averageRounded(genI16Type x, genI16Type y)

    Returns (x+y+1) >> 1.  The intermediate sum does not modulo overflow.
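
    One way to picture "does not modulo overflow" is to form the sum in a
    wider type.  The C sketch below models the scalar 32-bit case with
    hypothetical helper names.  It assumes that >> on a signed value is an
    arithmetic shift, as in GLSL, so the signed results round toward negative
    infinity; the helper floor_half() spells that out portably because C
    leaves right shifts of negative values implementation-defined.

```c
#include <stdint.h>

/* floor(s / 2) for any sign; matches an arithmetic shift right by one. */
static int64_t floor_half(int64_t s)
{
    return s / 2 - (s % 2 < 0);
}

/* Hypothetical models of average() and averageRounded() for uint. */
static uint32_t average_u32(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x + y) >> 1);
}

static uint32_t average_rounded_u32(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x + y + 1u) >> 1);
}

/* Hypothetical models of average() and averageRounded() for int. */
static int32_t average_i32(int32_t x, int32_t y)
{
    return (int32_t)floor_half((int64_t)x + y);
}

static int32_t average_rounded_i32(int32_t x, int32_t y)
{
    return (int32_t)floor_half((int64_t)x + y + 1);
}
```

    A naive 32-bit (x + y) >> 1 would return 0x7FFFFFFF for two inputs of
    0xFFFFFFFF, whereas the widened form returns 0xFFFFFFFF.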


    genUType subtractSaturate(genUType x, genUType y)
    genIType subtractSaturate(genIType x, genIType y)
    genU64Type subtractSaturate(genU64Type x, genU64Type y)
    genI64Type subtractSaturate(genI64Type x, genI64Type y)
    genU16Type subtractSaturate(genU16Type x, genU16Type y)
    genI16Type subtractSaturate(genI16Type x, genI16Type y)

    Returns x - y clamped to the range of the type of x (instead of modulo
    overflowing).
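
    A matching host-side sketch in C for the scalar 32-bit case follows.  The
    helper names are hypothetical.  For unsigned operands the only clamp is
    at zero; for signed operands the result can exceed either end of the
    range.

```c
#include <stdint.h>

/* Hypothetical model of subtractSaturate(uint, uint). */
static uint32_t sub_sat_u32(uint32_t x, uint32_t y)
{
    /* An unsigned difference can only underflow, so clamp at zero. */
    return x > y ? x - y : 0u;
}

/* Hypothetical model of subtractSaturate(int, int). */
static int32_t sub_sat_i32(int32_t x, int32_t y)
{
    /* A 64-bit intermediate cannot overflow for 32-bit inputs. */
    int64_t diff = (int64_t)x - (int64_t)y;

    if (diff > INT32_MAX)
        return INT32_MAX;
    if (diff < INT32_MIN)
        return INT32_MIN;
    return (int32_t)diff;
}
```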


    genUType multiply32x16(genUType x_32_bits, genUType y_16_bits)
    genIType multiply32x16(genIType x_32_bits, genIType y_16_bits)
    genUType multiply32x16(genUType x_32_bits, genU16Type y_16_bits)
    genIType multiply32x16(genIType x_32_bits, genI16Type y_16_bits)

    Returns x * y, where only the (possibly sign-extended) low 16 bits of y
    are used.  In cases where one of the signed operands is known to be in the
    range [-2^15, (2^15)-1], or one of the unsigned operands is known to be in
    the range [0, (2^16)-1], this may provide a higher performance multiply.
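
    The treatment of the second operand can be sketched in C for the scalar
    case where both parameters are 32 bits wide.  The helper names are
    hypothetical, and the sketch assumes the product keeps the low 32 bits of
    the result, as an ordinary GLSL integer multiply does.

```c
#include <stdint.h>

/* Reinterpret 32 bits as a two's complement signed value, portably. */
static int32_t to_i32(uint32_t u)
{
    return u <= (uint32_t)INT32_MAX ? (int32_t)u : -(int32_t)(~u) - 1;
}

/* Hypothetical model of multiply32x16(uint, uint). */
static uint32_t mul32x16_u32(uint32_t x, uint32_t y)
{
    /* Only the low 16 bits of y take part; they are zero-extended. */
    return x * (y & 0xFFFFu);
}

/* Hypothetical model of multiply32x16(int, int). */
static int32_t mul32x16_i32(int32_t x, int32_t y)
{
    /* Take the low 16 bits of y and sign-extend them to 32 bits. */
    uint32_t low = (uint32_t)y & 0xFFFFu;
    int32_t y16 = (int32_t)(low ^ 0x8000u) - 0x8000;

    /* Multiply modulo 2^32, then reinterpret the bits as signed. */
    return to_i32((uint32_t)x * (uint32_t)y16);
}
```

    Under this model mul32x16_i32(7, 0xFFFF) is -7, because the low 16 bits
    of the second operand sign-extend to -1.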

Interactions with OpenGL 4.6 and ARB_gl_spirv

    If OpenGL 4.6 or ARB_gl_spirv is supported, then
    SPV_INTEL_shader_integer_functions2 must also be supported.

    The IntegerFunctions2INTEL capability is available whenever the
    implementation supports INTEL_shader_integer_functions2.

Interactions with ARB_gpu_shader_int64 and EXT_shader_explicit_arithmetic_types_int64

    If the shader enables only INTEL_shader_integer_functions2 but not
    ARB_gpu_shader_int64 or EXT_shader_explicit_arithmetic_types_int64,
    remove all function overloads that have either genU64Type or genI64Type
    parameters.

Interactions with AMD_gpu_shader_int16 and EXT_shader_explicit_arithmetic_types_int16

    If the shader enables only INTEL_shader_integer_functions2 but not
    AMD_gpu_shader_int16 or EXT_shader_explicit_arithmetic_types_int16,
    remove all function overloads that have either genU16Type or genI16Type
    parameters.

Issues

    1) What should this extension be called?

    RESOLVED: There already exists a MESA_shader_integer_functions extension,
    so this is called INTEL_shader_integer_functions2 to prevent confusion.

    2) How does countLeadingZeros differ from findMSB?

    RESOLVED: countLeadingZeros is only defined for unsigned types, and it is
    equivalent to 32-(findMSB(x)+1).  This corresponds to the clz() function
    in OpenCL and the LZD (leading zero detection) instruction on Intel GPUs.
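
    The stated equivalence can be checked with a small host-side model in C.
    Both helpers are hypothetical stand-ins for the GLSL built-ins on a scalar
    uint; find_msb_u32() returns -1 for zero, as findMSB does, which is what
    makes the formula yield 32 for a zero input.

```c
#include <stdint.h>

/* Model of findMSB(uint): index of the highest set bit, or -1 for zero. */
static int find_msb_u32(uint32_t v)
{
    int msb = -1;

    while (v != 0u) {
        v >>= 1;
        msb++;
    }
    return msb;
}

/* Model of countLeadingZeros(uint): scan down from bit 31. */
static uint32_t count_leading_zeros_u32(uint32_t v)
{
    uint32_t n = 0u;
    uint32_t mask;

    for (mask = 0x80000000u; mask != 0u && (v & mask) == 0u; mask >>= 1)
        n++;
    return n;
}

/* The formula from the resolution above. */
static uint32_t clz_via_find_msb(uint32_t v)
{
    return (uint32_t)(32 - (find_msb_u32(v) + 1));
}
```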

    3) How does countTrailingZeros differ from findLSB?

    RESOLVED: countTrailingZeros is equivalent to min(genUType(findLSB(x)),
    32).  This corresponds to the ctz() function in OpenCL.
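
    The min() in this formula only matters for a zero input.  The C sketch
    below, using hypothetical helper names for a scalar uint, shows the
    mechanism: findLSB returns -1 for zero, the conversion to unsigned turns
    that into 0xFFFFFFFF, and the clamp brings it back to 32.

```c
#include <stdint.h>

/* Model of findLSB(uint): index of the lowest set bit, or -1 for zero. */
static int find_lsb_u32(uint32_t v)
{
    int lsb = 0;

    if (v == 0u)
        return -1;
    while ((v & 1u) == 0u) {
        v >>= 1;
        lsb++;
    }
    return lsb;
}

/* The formula from the resolution above: min(uint(findLSB(x)), 32). */
static uint32_t ctz_via_find_lsb(uint32_t v)
{
    uint32_t lsb = (uint32_t)find_lsb_u32(v);  /* -1 becomes 0xFFFFFFFF */

    return lsb < 32u ? lsb : 32u;
}
```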

    4) Should 64-bit versions of countLeadingZeros and countTrailingZeros be
    provided?

    RESOLVED: NO.  OpenCL has 64-bit versions of clz() and ctz(), but OpenGL
    does not have 64-bit versions of findMSB() or findLSB() even when
    ARB_gpu_shader_int64 is supported.  The instructions used to implement
    countLeadingZeros and countTrailingZeros do not natively support 64-bit
    operands.

    The implementation of 64-bit countLeadingZeros() would be 5 instructions,
    and the implementation of 64-bit countTrailingZeros() would be 7
    instructions.  Neither of these is better than an application developer
    could achieve in GLSL:

        uint countLeadingZeros(uint64_t value)
        {
            uvec2 v = unpackUint2x32(value);

            return v.y == 0
                ? 32 + countLeadingZeros(v.x) : countLeadingZeros(v.y);
        }

        uint countTrailingZeros(uint64_t value)
        {
            uvec2 v = unpackUint2x32(value);

            return v.x == 0
                ? 32 + countTrailingZeros(v.y) : countTrailingZeros(v.x);
        }

    5) Should 64-bit versions of the arithmetic functions be provided?

    RESOLVED: NO.  Since recent generations of Intel GPUs have removed
    hardware support for 64-bit integer arithmetic, there doesn't seem to be
    much value in providing 64-bit arithmetic functions.

    6) Should this extension include average()?

    RESOLVED: YES.  average() corresponds to hadd() in OpenCL, and
    averageRounded() corresponds to rhadd() in OpenCL.

    averageRounded() corresponds to the AVG instruction on Intel GPUs.
    average(), on the other hand, does not correspond to a single instruction.
    The signed and unsigned versions may have slightly different
    implementations depending on the specific GPU.  In the worst case, the
    implementation is 4 instructions (e.g., averageRounded(x, y) - ((x ^ y) &
    1)), and in the best case it is 3 instructions.
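
    The correction term in that expression can be illustrated in C for scalar
    unsigned operands.  The helper names are hypothetical, and rhadd_u32() is
    only a stand-in for a rounding-average operation; it is not a claim about
    how the AVG instruction is implemented.  Because x + y and x ^ y have the
    same parity, the rounded and truncated averages differ by exactly
    (x ^ y) & 1.

```c
#include <stdint.h>

/* Stand-in for averageRounded(uint, uint) without a wider intermediate.
 * Uses x + y == 2 * (x | y) - (x ^ y), so the rounded-up half of the sum
 * is (x | y) - ((x ^ y) >> 1). */
static uint32_t rhadd_u32(uint32_t x, uint32_t y)
{
    return (x | y) - ((x ^ y) >> 1);
}

/* average(uint, uint) built from the rounded form, as in the text above:
 * subtract one when the sum is odd. */
static uint32_t hadd_from_rhadd_u32(uint32_t x, uint32_t y)
{
    return rhadd_u32(x, y) - ((x ^ y) & 1u);
}
```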

Revision History

    Rev  Date         Author    Changes
    ---  -----------  --------  ---------------------------------------------
      1  04-Sep-2018  idr       Initial version.
      2  19-Sep-2018  idr       Add interactions with AMD_gpu_shader_int16.
      3  22-Jan-2019  idr       Add interactions with EXT_shader_explicit_arithmetic_types.
      4  14-Nov-2019  idr       Resolve issue #1 and issue #5.
      5  25-Nov-2019  idr       Fix a bunch of typos noticed by @cmarcelo.