Name

    INTEL_shader_integer_functions2

Name Strings

    GL_INTEL_shader_integer_functions2

Contact

    Ian Romanick <ian.d.romanick@intel.com>

Contributors


Status

    In progress

Version

    Last Modification Date: 11/25/2019
    Revision: 5

Number

    OpenGL Extension #547
    OpenGL ES Extension #323

Dependencies

    This extension is written against the OpenGL 4.6 (Core Profile)
    Specification.

    This extension is written against Version 4.60 (Revision 03) of the OpenGL
    Shading Language Specification.

    GLSL 1.30 (OpenGL), GLSL ES 3.00 (OpenGL ES), or EXT_gpu_shader4 (OpenGL)
    is required.

    This extension interacts with ARB_gpu_shader_int64.

    This extension interacts with AMD_gpu_shader_int16.

    This extension interacts with OpenGL 4.6 and ARB_gl_spirv.

    This extension interacts with EXT_shader_explicit_arithmetic_types.

Overview

    OpenCL and other GPU programming environments provide a number of useful
    functions operating on integer data.  Many of these functions are
    supported by specialized instructions on various GPUs.  Correct GLSL
    implementations for some of these functions are non-trivial.  Recognizing
    open-coded versions of these functions is often impractical.  As a result,
    potential performance improvements go unrealized.

    This extension makes available a number of functions that have specialized
    instruction support on Intel GPUs.

New Procedures and Functions

    None

New Tokens

    None

IP Status

    No known IP claims.

Modifications to the OpenGL Shading Language Specification, Version 4.60

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_INTEL_shader_integer_functions2 : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_INTEL_shader_integer_functions2        1

Additions to Chapter 8 of the OpenGL Shading Language Specification
(Built-in Functions)

    Modify Section 8.8, Integer Functions

    (add new rows after the existing "findMSB" table row, p. 161)

    genUType countLeadingZeros(genUType value)

    Returns the number of leading 0-bits, starting at the most significant
    bit, in the binary representation of value.  If value is zero, the size
    in bits of the type of value or component type of value (if value is a
    vector) will be returned.
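
    As an informal illustration, the C sketch below models the result of
    countLeadingZeros() for a single 32-bit uint component, including the
    zero case.  The name clz32 is hypothetical, and the loop only describes
    the expected values; it is not meant to reflect how a GPU computes them.

```c
#include <stdint.h>

/* Reference model of countLeadingZeros() for one 32-bit uint component.
 * Returns 32 when value is zero. */
static uint32_t clz32(uint32_t value)
{
    uint32_t count = 0;

    /* Scan from the most significant bit down to the first 1-bit. */
    for (uint32_t mask = 0x80000000u; mask != 0; mask >>= 1) {
        if (value & mask)
            break;
        count++;
    }
    return count;
}
```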


    genUType countTrailingZeros(genUType value)

    Returns the number of trailing 0-bits, starting at the least significant
    bit, in the binary representation of value.  If value is zero, the size
    in bits of the type of value or component type of value (if value is a
    vector) will be returned.
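
    A similar C model of countTrailingZeros() for one 32-bit uint component
    follows.  It also returns 32 for a zero input.  The name ctz32 is
    hypothetical.

```c
#include <stdint.h>

/* Reference model of countTrailingZeros() for one 32-bit uint component.
 * Returns 32 when value is zero. */
static uint32_t ctz32(uint32_t value)
{
    uint32_t count = 0;

    /* Scan from the least significant bit up to the first 1-bit. */
    for (uint32_t mask = 1u; mask != 0; mask <<= 1) {
        if (value & mask)
            break;
        count++;
    }
    return count;
}
```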


    genUType absoluteDifference(genUType x, genUType y)
    genUType absoluteDifference(genIType x, genIType y)
    genU64Type absoluteDifference(genU64Type x, genU64Type y)
    genU64Type absoluteDifference(genI64Type x, genI64Type y)
    genU16Type absoluteDifference(genU16Type x, genU16Type y)
    genU16Type absoluteDifference(genI16Type x, genI16Type y)

    Returns |x - y| clamped to the range of the return type (instead of modulo
    overflowing).  Note: the return type of each of these functions is an
    unsigned type of the same bit-size and vector element count.
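
    This C sketch models the two 32-bit scalar overloads of
    absoluteDifference().  The names absoluteDifferenceU32 and
    absoluteDifferenceI32 are hypothetical.  The other widths presumably
    follow the same rule per component.  For two 32-bit operands, |x - y|
    never exceeds 2^32 - 1, so the unsigned return type can always hold the
    exact result.

```c
#include <stdint.h>

/* Unsigned overload: the larger operand minus the smaller one. */
static uint32_t absoluteDifferenceU32(uint32_t x, uint32_t y)
{
    return x >= y ? x - y : y - x;
}

/* Signed overload: once its sign is dropped, the true difference of two
 * int32 values lies in [0, 2^32 - 1] and always fits in uint32.
 * Subtracting in unsigned (modulo 2^32) arithmetic after ordering the
 * operands yields that exact value without signed overflow. */
static uint32_t absoluteDifferenceI32(int32_t x, int32_t y)
{
    return x >= y ? (uint32_t)x - (uint32_t)y : (uint32_t)y - (uint32_t)x;
}
```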


    genUType addSaturate(genUType x, genUType y)
    genIType addSaturate(genIType x, genIType y)
    genU64Type addSaturate(genU64Type x, genU64Type y)
    genI64Type addSaturate(genI64Type x, genI64Type y)
    genU16Type addSaturate(genU16Type x, genU16Type y)
    genI16Type addSaturate(genI16Type x, genI16Type y)

    Returns x + y clamped to the range of the type of x (instead of modulo
    overflowing).
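
    Below is a minimal C sketch of the 32-bit scalar addSaturate() overloads.
    The names are hypothetical.  The unsigned model detects wrap-around.  The
    signed model forms the exact sum in 64 bits and then clamps it to the
    int32 range.

```c
#include <stdint.h>

static uint32_t addSaturateU32(uint32_t x, uint32_t y)
{
    uint32_t sum = x + y;               /* wraps modulo 2^32 */

    return sum < x ? UINT32_MAX : sum;  /* a wrapped sum means overflow */
}

static int32_t addSaturateI32(int32_t x, int32_t y)
{
    int64_t sum = (int64_t)x + y;       /* exact: cannot overflow 64 bits */

    if (sum > INT32_MAX)
        return INT32_MAX;
    if (sum < INT32_MIN)
        return INT32_MIN;
    return (int32_t)sum;
}
```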


    genUType average(genUType x, genUType y)
    genIType average(genIType x, genIType y)
    genU64Type average(genU64Type x, genU64Type y)
    genI64Type average(genI64Type x, genI64Type y)
    genU16Type average(genU16Type x, genU16Type y)
    genI16Type average(genI16Type x, genI16Type y)

    Returns (x+y) >> 1.  The intermediate sum does not modulo overflow.
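
    The following C sketch models the 32-bit scalar average() overloads.  The
    names are hypothetical.  A 64-bit intermediate stands in for the
    non-overflowing sum.  GLSL's >> on signed integers sign-extends, so
    (x+y) >> 1 rounds toward negative infinity.  C's integer division
    truncates toward zero, so the signed model subtracts one for odd negative
    sums.

```c
#include <stdint.h>

static uint32_t averageU32(uint32_t x, uint32_t y)
{
    uint64_t sum = (uint64_t)x + y;     /* 33-bit sum cannot wrap */

    return (uint32_t)(sum >> 1);
}

static int32_t averageI32(int32_t x, int32_t y)
{
    int64_t sum = (int64_t)x + y;

    /* Floor division by 2: correct C's truncation for odd negative sums. */
    return (int32_t)(sum / 2 - (sum % 2 < 0));
}
```

    For example, averageU32(0xFFFFFFFFu, 0xFFFFFFFFu) is 0xFFFFFFFF.  A plain
    32-bit (x + y) >> 1 would wrap and give 0x7FFFFFFF instead.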


    genUType averageRounded(genUType x, genUType y)
    genIType averageRounded(genIType x, genIType y)
    genU64Type averageRounded(genU64Type x, genU64Type y)
    genI64Type averageRounded(genI64Type x, genI64Type y)
    genU16Type averageRounded(genU16Type x, genU16Type y)
    genI16Type averageRounded(genI16Type x, genI16Type y)

    Returns (x+y+1) >> 1.  The intermediate sum does not modulo overflow.
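
    The same approach models the 32-bit scalar averageRounded() overloads.
    The names are hypothetical, and a 64-bit intermediate again stands in for
    the non-overflowing sum.  The signed model reproduces GLSL's
    sign-extending >> as floor division.

```c
#include <stdint.h>

static uint32_t averageRoundedU32(uint32_t x, uint32_t y)
{
    uint64_t sum = (uint64_t)x + y + 1u;    /* cannot wrap in 64 bits */

    return (uint32_t)(sum >> 1);
}

static int32_t averageRoundedI32(int32_t x, int32_t y)
{
    int64_t sum = (int64_t)x + y + 1;

    /* Floor division by 2, matching an arithmetic right shift. */
    return (int32_t)(sum / 2 - (sum % 2 < 0));
}
```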


    genUType subtractSaturate(genUType x, genUType y)
    genIType subtractSaturate(genIType x, genIType y)
    genU64Type subtractSaturate(genU64Type x, genU64Type y)
    genI64Type subtractSaturate(genI64Type x, genI64Type y)
    genU16Type subtractSaturate(genU16Type x, genU16Type y)
    genI16Type subtractSaturate(genI16Type x, genI16Type y)

    Returns x - y clamped to the range of the type of x (instead of modulo
    overflowing).
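
    Here is a C sketch of the 32-bit scalar subtractSaturate() overloads.  The
    names are hypothetical.  For unsigned operands the result clamps at 0.
    For signed operands the exact difference is formed in 64 bits and then
    clamped.

```c
#include <stdint.h>

static uint32_t subtractSaturateU32(uint32_t x, uint32_t y)
{
    return x > y ? x - y : 0u;          /* clamp at the bottom of the range */
}

static int32_t subtractSaturateI32(int32_t x, int32_t y)
{
    int64_t diff = (int64_t)x - y;      /* exact: cannot overflow 64 bits */

    if (diff > INT32_MAX)
        return INT32_MAX;
    if (diff < INT32_MIN)
        return INT32_MIN;
    return (int32_t)diff;
}
```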


    genUType multiply32x16(genUType x_32_bits, genUType y_16_bits)
    genIType multiply32x16(genIType x_32_bits, genIType y_16_bits)
    genUType multiply32x16(genUType x_32_bits, genU16Type y_16_bits)
    genIType multiply32x16(genIType x_32_bits, genI16Type y_16_bits)

    Returns x * y, where only the (possibly sign-extended) low 16 bits of y
    are used.  In cases where one of the signed operands is known to be in
    the range [-2^15, (2^15)-1], or one of the unsigned operands is known to
    be in the range [0, (2^16)-1], this may provide a higher-performance
    multiply.
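
    The C sketch below models the two overloads whose y operand is a 32-bit
    type.  The names are hypothetical.  It assumes the product wraps to
    32 bits like an ordinary GLSL integer multiply.  The text above does not
    state this explicitly.

```c
#include <stdint.h>

/* Unsigned overload: only the low 16 bits of y participate. */
static uint32_t multiply32x16U(uint32_t x_32_bits, uint32_t y_16_bits)
{
    uint32_t y = y_16_bits & 0xFFFFu;

    return x_32_bits * y;               /* unsigned multiply wraps mod 2^32 */
}

/* Signed overload: the low 16 bits of y are sign-extended first.  The
 * product is formed in unsigned arithmetic to obtain the low 32 bits
 * without signed-overflow undefined behavior; the final conversion back
 * to int32_t assumes the usual two's-complement wrapping. */
static int32_t multiply32x16I(int32_t x_32_bits, int32_t y_16_bits)
{
    uint32_t low = (uint32_t)y_16_bits & 0xFFFFu;
    int32_t y = (int32_t)(low ^ 0x8000u) - 0x8000;  /* sign-extend bit 15 */

    return (int32_t)((uint32_t)x_32_bits * (uint32_t)y);
}
```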

Interactions with OpenGL 4.6 and ARB_gl_spirv

    If OpenGL 4.6 or ARB_gl_spirv is supported, then
    SPV_INTEL_shader_integer_functions2 must also be supported.

    The IntegerFunctions2INTEL capability is available whenever the
    implementation supports INTEL_shader_integer_functions2.

Interactions with ARB_gpu_shader_int64 and EXT_shader_explicit_arithmetic_types_int64

    If the shader enables only INTEL_shader_integer_functions2 but not
    ARB_gpu_shader_int64 or EXT_shader_explicit_arithmetic_types_int64,
    remove all function overloads that have either genU64Type or genI64Type
    parameters.

Interactions with AMD_gpu_shader_int16 and EXT_shader_explicit_arithmetic_types_int16

    If the shader enables only INTEL_shader_integer_functions2 but not
    AMD_gpu_shader_int16 or EXT_shader_explicit_arithmetic_types_int16,
    remove all function overloads that have either genU16Type or genI16Type
    parameters.

Issues

    1) What should this extension be called?

    RESOLVED: There already exists a MESA_shader_integer_functions extension,
    so this is called INTEL_shader_integer_functions2 to prevent confusion.

    2) How does countLeadingZeros differ from findMSB?

    RESOLVED: countLeadingZeros is only defined for unsigned types, and it is
    equivalent to 32-(findMSB(x)+1).  This corresponds to the clz() function
    in OpenCL and the LZD (leading zero detection) instruction on Intel GPUs.
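
    To make the identity concrete, the following C sketch models findMSB()
    for a uint operand and derives countLeadingZeros() from it.  Like the
    GLSL built-in, the model returns -1 for zero.  Both C functions are
    illustrative stand-ins, not GLSL built-ins.

```c
#include <stdint.h>

/* Model of GLSL findMSB() for a uint operand: the index of the highest
 * set bit, or -1 if value is zero. */
static int findMSB(uint32_t value)
{
    for (int bit = 31; bit >= 0; bit--)
        if (value & (1u << bit))
            return bit;
    return -1;
}

/* countLeadingZeros() expressed as 32 - (findMSB(x) + 1).  A zero input
 * gives findMSB() == -1 and therefore a count of 32. */
static uint32_t clzViaFindMSB(uint32_t value)
{
    return (uint32_t)(32 - (findMSB(value) + 1));
}
```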

    3) How does countTrailingZeros differ from findLSB?

    RESOLVED: countTrailingZeros is equivalent to
    min(genUType(findLSB(x)), 32).  This corresponds to the ctz() function in
    OpenCL.
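
    The C sketch below models findLSB() for a uint operand and applies the
    min() formulation.  The model returns -1 for zero, and both function
    names are illustrative.  Converting -1 to an unsigned 32-bit value yields
    0xFFFFFFFF, which is why min() maps the zero case to 32.

```c
#include <stdint.h>

/* Model of GLSL findLSB() for a uint operand: the index of the lowest
 * set bit, or -1 if value is zero. */
static int findLSB(uint32_t value)
{
    for (int bit = 0; bit < 32; bit++)
        if (value & (1u << bit))
            return bit;
    return -1;
}

/* min(genUType(findLSB(x)), 32): the unsigned conversion turns -1 into
 * 0xFFFFFFFF, so the minimum selects 32 for a zero input. */
static uint32_t ctzViaFindLSB(uint32_t value)
{
    uint32_t lsb = (uint32_t)findLSB(value);

    return lsb < 32u ? lsb : 32u;
}
```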

    4) Should 64-bit versions of countLeadingZeros and countTrailingZeros be
    provided?

    RESOLVED: NO.  OpenCL has 64-bit versions of clz() and ctz(), but OpenGL
    does not have 64-bit versions of findMSB() or findLSB() even when
    ARB_gpu_shader_int64 is supported.  The instructions used to implement
    countLeadingZeros and countTrailingZeros do not natively support 64-bit
    operands.

    The implementation of 64-bit countLeadingZeros() would be 5 instructions,
    and the implementation of 64-bit countTrailingZeros() would be 7
    instructions.  Neither of these is better than what an application
    developer could achieve in GLSL:

        uint countLeadingZeros(uint64_t value)
        {
            // unpackUint2x32() places the low 32 bits in .x and the high
            // 32 bits in .y.
            uvec2 v = unpackUint2x32(value);

            // A zero high half contributes all 32 of its bits as leading
            // zeros; the count then continues into the low half.
            return v.y == 0u
                ? 32u + countLeadingZeros(v.x) : countLeadingZeros(v.y);
        }

        uint countTrailingZeros(uint64_t value)
        {
            uvec2 v = unpackUint2x32(value);

            // A zero low half contributes all 32 of its bits as trailing
            // zeros; the count then continues into the high half.
            return v.x == 0u
                ? 32u + countTrailingZeros(v.y) : countTrailingZeros(v.x);
        }

    5) Should 64-bit versions of the arithmetic functions be provided?

    RESOLVED: NO.  Since recent generations of Intel GPUs have removed
    hardware support for 64-bit integer arithmetic, there doesn't seem to be
    much value in providing 64-bit arithmetic functions.

    6) Should this extension include average()?

    RESOLVED: YES.  average() corresponds to hadd() in OpenCL, and
    averageRounded() corresponds to rhadd() in OpenCL.

    averageRounded() corresponds to the AVG instruction on Intel GPUs.
    average(), on the other hand, does not correspond to a single instruction.
    The signed and unsigned versions may have slightly different
    implementations depending on the specific GPU.  In the worst case, the
    implementation is 4 instructions (e.g., averageRounded(x, y) -
    ((x ^ y) & 1)), and in the best case it is 3 instructions.
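
    The C sketch below checks the 4-instruction formulation of average() in
    terms of averageRounded().  It covers only the 32-bit unsigned case, and
    its names are hypothetical.  The two results differ by one exactly when
    x + y is odd, and the low bit of x + y always equals the low bit of
    x ^ y.

```c
#include <stdint.h>

/* (x + y + 1) >> 1 without wrapping, as averageRounded() specifies. */
static uint32_t averageRoundedU32(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x + y + 1u) >> 1);
}

/* average(x, y) == averageRounded(x, y) - ((x ^ y) & 1): subtract one
 * only when x + y is odd, i.e. when the rounding actually rounded up. */
static uint32_t averageViaRounded(uint32_t x, uint32_t y)
{
    return averageRoundedU32(x, y) - ((x ^ y) & 1u);
}
```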

Revision History

    Rev  Date         Author    Changes
    ---  -----------  --------  ---------------------------------------------
      1  04-Sep-2018  idr       Initial version.
      2  19-Sep-2018  idr       Add interactions with AMD_gpu_shader_int16.
      3  22-Jan-2019  idr       Add interactions with EXT_shader_explicit_arithmetic_types.
      4  14-Nov-2019  idr       Resolve issue #1 and issue #5.
      5  25-Nov-2019  idr       Fix a bunch of typos noticed by @cmarcelo.