Name

    NV_gpu_program5

Name Strings

    GL_NV_gpu_program5
    GL_NV_gpu_program_fp64

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified Date:         05/25/2022
    NVIDIA Revision:            8

Number

    388

Dependencies

    OpenGL 2.0 is required.

    This extension is written against the OpenGL 3.0 specification.

    NV_gpu_program4 and NV_gpu_program4_1 are required.

    NV_shader_buffer_load is required.

    NV_shader_buffer_store is required.

    This extension is written against and interacts with the NV_gpu_program4,
    NV_vertex_program4, NV_geometry_program4, and NV_fragment_program4
    specifications.

    This extension interacts with NV_tessellation_program5.

    This extension interacts with ARB_transform_feedback3.

    This extension interacts trivially with NV_shader_buffer_load.

    This extension interacts trivially with NV_shader_buffer_store.

    This extension interacts trivially with NV_parameter_buffer_object2.

    This extension interacts trivially with OpenGL 3.3, ARB_texture_swizzle,
    and EXT_texture_swizzle.

    This extension interacts trivially with ARB_blend_func_extended.

    This extension interacts trivially with EXT_shader_image_load_store.

    This extension interacts trivially with ARB_shader_subroutine.

    If the 64-bit floating-point portion of this extension is not supported,
    "GL_NV_gpu_program_fp64" will not be found in the extension string.
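
    Applications can detect the optional fp64 support by looking for
    "GL_NV_gpu_program_fp64" as a whole token in the extension string.  The
    C sketch below is a hypothetical helper (has_extension is not a GL
    entry point); it assumes the caller has already obtained the
    space-separated extension list.  A bare strstr() is avoided because one
    extension name can be a prefix of another, as with GL_NV_gpu_program5
    and GL_NV_gpu_program5_mem_extended.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if `name` occurs in `extensions` as a
 * complete, space-delimited token.  Plain strstr() would also report a
 * match for a longer name that merely begins with `name`. */
bool has_extension(const char *extensions, const char *name)
{
    assert(extensions != NULL && name != NULL);

    size_t len = strlen(name);
    if (len == 0)
        return false;

    const char *p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == extensions) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
        p += len; /* partial match only; keep scanning after it */
    }
    return false;
}
```

    On a compatibility context the list could come from
    (const char *)glGetString(GL_EXTENSIONS); on a core profile context,
    names are instead enumerated one at a time with
    glGetStringi(GL_EXTENSIONS, i) and can be compared with strcmp().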

Overview

    This specification documents the common instruction set and basic
    functionality provided by NVIDIA's 5th generation of assembly instruction
    sets supporting programmable graphics pipeline stages.

    The instruction set builds upon the basic framework provided by the
    ARB_vertex_program and ARB_fragment_program extensions to expose
    considerably more capable hardware.  In addition to new capabilities for
    vertex and fragment programs, this extension provides new functionality
    for geometry programs as originally described in the NV_geometry_program4
    specification, and serves as the basis for the new tessellation control
    and evaluation programs described in the NV_tessellation_program5
    extension.

    Programs using the functionality provided by this extension should begin
    with the program headers "!!NVvp5.0" (vertex programs), "!!NVtcp5.0"
    (tessellation control programs), "!!NVtep5.0" (tessellation evaluation
    programs), "!!NVgp5.0" (geometry programs), and "!!NVfp5.0" (fragment
    programs).
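
    Each header must appear at the very start of the program string, since
    parsing begins with the character immediately following it.  A
    hypothetical C helper that classifies a program string by its header,
    for example before an application chooses which program target to load
    it into, might look like the following; the enum and function names are
    illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Program types selected by the NV_gpu_program5 header strings. */
enum program_kind {
    PROGRAM_UNKNOWN,
    PROGRAM_VERTEX,       /* "!!NVvp5.0"  */
    PROGRAM_TESS_CONTROL, /* "!!NVtcp5.0" */
    PROGRAM_TESS_EVAL,    /* "!!NVtep5.0" */
    PROGRAM_GEOMETRY,     /* "!!NVgp5.0"  */
    PROGRAM_FRAGMENT      /* "!!NVfp5.0"  */
};

/* Hypothetical helper: classifies a program string by its header, which
 * must be the first characters of the string; anything else (including
 * leading whitespace or an older "5.0"-less header) is PROGRAM_UNKNOWN. */
enum program_kind program_kind_from_header(const char *src)
{
    static const struct {
        const char *header;
        enum program_kind kind;
    } headers[] = {
        { "!!NVvp5.0",  PROGRAM_VERTEX },
        { "!!NVtcp5.0", PROGRAM_TESS_CONTROL },
        { "!!NVtep5.0", PROGRAM_TESS_EVAL },
        { "!!NVgp5.0",  PROGRAM_GEOMETRY },
        { "!!NVfp5.0",  PROGRAM_FRAGMENT },
    };

    assert(src != NULL);
    for (size_t i = 0; i < sizeof headers / sizeof headers[0]; i++) {
        if (strncmp(src, headers[i].header, strlen(headers[i].header)) == 0)
            return headers[i].kind;
    }
    return PROGRAM_UNKNOWN;
}
```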

    This extension provides a variety of new features, including:

      * support for 64-bit integer operations;

      * the ability to dynamically index into an array of texture units or
        program parameter buffers;

      * extending texel offset support to allow loading texel offsets from
        regular integer operands computed at run-time, instead of requiring
        that the offsets be constants encoded in texture instructions;

      * extending TXG (texture gather) support to return the 2x2 footprint
        from any component of the texture image instead of always returning
        the first (x) component;

      * extending TXG to support shadow comparisons in conjunction with a
        depth texture, via the SHADOW* targets;

      * further extending texture gather support to provide a new opcode
        (TXGO) that applies a separate texel offset vector to each of the four
        samples returned by the instruction;

      * bit manipulation instructions, including ones to find the position of
        the most or least significant set bit, bitfield insertion and
        extraction, and bit reversal;

      * a general data conversion instruction (CVT) supporting conversion
        between any two data types supported by this extension; and

      * new instructions to compute the composite of a set of boolean
        conditions across a group of shader threads.
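
    A plain C model of what the bit manipulation and thread-group vote
    operations compute on 32-bit values may help when reading the opcode
    descriptions later in this specification.  It roughly corresponds to
    BTFL, BTFM, BFR, BFE, BFI, TGALL, TGANY, and TGEQ from the grammar, but
    it is a sketch, not the normative definition: the function names are
    hypothetical, returning -1 for a zero input follows the GLSL
    findLSB/findMSB convention rather than anything stated in this section,
    and operand order, signed extraction, out-of-range offsets, and how the
    GPU forms the thread group are left to the opcode descriptions.  The
    votes are modeled over a plain array of per-thread booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Position of the least significant set bit, or -1 if x is zero. */
int find_lsb(uint32_t x)
{
    if (x == 0)
        return -1;
    int n = 0;
    for (; (x & 1u) == 0; x >>= 1)
        n++;
    return n;
}

/* Position of the most significant set bit, or -1 if x is zero. */
int find_msb(uint32_t x)
{
    int n = -1;
    for (; x != 0; x >>= 1)
        n++;
    return n;
}

/* Reverses the order of all 32 bits of x. */
uint32_t bit_reverse(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++, x >>= 1)
        r = (r << 1) | (x & 1u);
    return r;
}

/* Mask with the low `width` bits set (width 0..32). */
static uint32_t low_mask(unsigned width)
{
    return width >= 32 ? 0xFFFFFFFFu : (1u << width) - 1u;
}

/* Unsigned extract of `width` bits of x starting at bit `offset`. */
uint32_t bitfield_extract(uint32_t x, unsigned offset, unsigned width)
{
    assert(offset + width <= 32);
    if (width == 0)
        return 0;
    return (x >> offset) & low_mask(width);
}

/* Replaces `width` bits of base, starting at `offset`, with the low bits
 * of insert. */
uint32_t bitfield_insert(uint32_t base, uint32_t insert,
                         unsigned offset, unsigned width)
{
    assert(offset + width <= 32);
    if (width == 0)
        return base;
    uint32_t mask = low_mask(width) << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}

/* Votes over one condition per thread: all true, any true, all equal. */
bool vote_all(const bool *cond, int count)
{
    for (int i = 0; i < count; i++)
        if (!cond[i])
            return false;
    return true;
}

bool vote_any(const bool *cond, int count)
{
    for (int i = 0; i < count; i++)
        if (cond[i])
            return true;
    return false;
}

bool vote_eq(const bool *cond, int count)
{
    return vote_all(cond, count) || !vote_any(cond, count);
}
```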

    This extension also provides some new capabilities for individual program
    types, including:

      * support for instanced geometry programs, where a geometry program may
        be run multiple times for each primitive;

      * support for emitting vertices in a geometry program where each vertex
        emitted may be directed at a specified vertex stream and captured
        using the ARB_transform_feedback3 extension;

      * support for interpolating an attribute at a programmable offset
        relative to the pixel center (IPAO), at a programmable sample number
        (IPAS), or at the fragment's centroid location (IPAC) in a fragment
        program;

      * support for reading a mask of covered samples in a fragment program;

      * support for reading a point sprite coordinate directly in a fragment
        program, without overriding a texture coordinate;

      * support for reading patch primitives and per-patch attributes
        (introduced by ARB_tessellation_shader) in a geometry program; and

      * support for multiple output vectors for a single color output in a
        fragment program (as used by ARB_blend_func_extended).

    This extension also provides optional support for 64-bit-per-component
    variables and 64-bit floating-point arithmetic.  These features are
    supported if and only if "GL_NV_gpu_program_fp64" is found in the
    extension string.

    This extension incorporates the memory access operations from the
    NV_shader_buffer_load and NV_parameter_buffer_object2 extensions,
    originally built as add-ons to NV_gpu_program4.  It also provides the
    following new capabilities:

      * support for these features without requiring a separate OPTION
        keyword;

      * support for indexing into an array of constant buffers using the LDC
        opcode added by NV_parameter_buffer_object2;

      * support for storing into buffer objects at a specified GPU address
        using the STORE opcode, and allowing applications to create
        READ_WRITE and WRITE_ONLY mappings when making a buffer object
        resident using the API mechanisms in the NV_shader_buffer_store
        extension;

      * storage instruction modifiers to allow loading and storing 64-bit
        component values;

      * support for atomic memory transactions using the ATOM opcode, where
        the instruction atomically reads the memory pointed to by a pointer,
        performs a specified computation, stores the results of that
        computation, and returns the original value read;

      * support for memory barrier transactions using the MEMBAR opcode, which
        ensures that all memory stores issued prior to the opcode complete
        prior to any subsequent memory transactions; and

      * a fragment program option to specify that depth and stencil tests are
        performed prior to fragment program execution.
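
    The ATOM contract described above (read, compute, store, and return the
    original value) is the same one C11 read-modify-write atomics follow.
    As a host-side analogy only, the sketch below models the ADD, EXCH,
    CSWAP, and MAX forms on a 32-bit unsigned location.  The function names
    are hypothetical, and GPU-side details such as the operand layout of
    ATOM, signed and 64-bit variants, and the GPU memory model are not
    represented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* ADD: adds `value` and returns the value memory held beforehand. */
uint32_t atom_add(_Atomic uint32_t *addr, uint32_t value)
{
    assert(addr != NULL);
    return atomic_fetch_add(addr, value);
}

/* EXCH: stores `value` and returns the previous contents. */
uint32_t atom_exch(_Atomic uint32_t *addr, uint32_t value)
{
    assert(addr != NULL);
    return atomic_exchange(addr, value);
}

/* CSWAP: stores `desired` only if memory equals `expected`; in either case
 * returns the value that was read. */
uint32_t atom_cswap(_Atomic uint32_t *addr, uint32_t expected,
                    uint32_t desired)
{
    assert(addr != NULL);
    uint32_t observed = expected;
    /* On failure the current contents are written back into `observed`;
     * on success `observed` already equals the value that was replaced. */
    atomic_compare_exchange_strong(addr, &observed, desired);
    return observed;
}

/* MAX (unsigned): C11 has no fetch-max, so emulate it with a CAS loop. */
uint32_t atom_max(_Atomic uint32_t *addr, uint32_t value)
{
    assert(addr != NULL);
    uint32_t old = atomic_load(addr);
    while (old < value && !atomic_compare_exchange_weak(addr, &old, value)) {
        /* `old` was refreshed by the failed exchange; retry. */
    }
    return old;
}
```

    MEMBAR has no counterpart in this sketch; the closest host-side analogy
    is a C11 atomic_thread_fence(), but the ordering guarantee on the GPU is
    the one stated for MEMBAR, not the C11 one.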

    Additionally, the assembly program languages supported by this extension
    include support for reading, writing, and performing atomic memory
    operations on texture image data using the opcodes and mechanisms
    documented in the "Dependencies on NV_gpu_program5" section of the
    EXT_shader_image_load_store extension.

New Procedures and Functions

    None.

New Tokens

    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
    GetFloatv, and GetDoublev:

        MAX_GEOMETRY_PROGRAM_INVOCATIONS_NV             0x8E5A
        MIN_FRAGMENT_INTERPOLATION_OFFSET_NV            0x8E5B
        MAX_FRAGMENT_INTERPOLATION_OFFSET_NV            0x8E5C
        FRAGMENT_PROGRAM_INTERPOLATION_OFFSET_BITS_NV   0x8E5D
        MIN_PROGRAM_TEXTURE_GATHER_OFFSET_NV            0x8E5E
        MAX_PROGRAM_TEXTURE_GATHER_OFFSET_NV            0x8E5F


Additions to Chapter 2 of the OpenGL 3.0 Specification (OpenGL Operation)

    Modify Section 2.X.2 of NV_fragment_program4, Program Grammar

    (modify the section, updating the program header string for the extended
     instruction set)

    Fragment programs are required to begin with the header string
    "!!NVfp5.0".  This header string identifies the subsequent program body as
    being a fragment program and indicates that it should be parsed according
    to the base NV_gpu_program5 grammar plus the additions below.  Program
    string parsing begins with the character immediately following the header
    string.

    (add/change the following rules to the NV_fragment_program4 and
     NV_gpu_program5 base grammars)

    <SpecialInstruction>    ::= "IPAC" <opModifiers> <instResult> ","
                                <instOperandV>
                              | "IPAO" <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandV>
                              | "IPAS" <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandS>

    <interpModifier>        ::= "SAMPLE"

    <attribBasic>           ::= <fragPrefix> "sampleid"
                              | <fragPrefix> "samplemask"
                              | <fragPrefix> "pointcoord"

    <resultBasic>           ::= <resPrefix> "color" <resultOptColorNum>
                                <resultOptColorType>
                              | <resPrefix> "samplemask"

    <resultOptColorType>    ::= ""
                              | "." <colorType>


    Modify Section 2.X.2 of NV_geometry_program4, Program Grammar

    (modify the section, updating the program header string for the extended
     instruction set)

    Geometry programs are required to begin with the header string
    "!!NVgp5.0".  This header string identifies the subsequent program body as
    being a geometry program and indicates that it should be parsed according
    to the base NV_gpu_program5 grammar plus the additions below.  Program
    string parsing begins with the character immediately following the header
    string.

    (add the following rules to the NV_geometry_program4 and NV_gpu_program5
     base grammars)

    <declaration>           ::= "INVOCATIONS" <int>

    <declPrimInType>        ::= "PATCHES"

    <SpecialInstruction>    ::= "EMITS" <instOperandS>

    <attribBasic>           ::= <primPrefix> "invocation"
                              | <primPrefix> "vertexcount"
                              | <attribTessOuter> <optArrayMemAbs>
                              | <attribTessInner> <optArrayMemAbs>
                              | <attribPatchGeneric> <optArrayMemAbs>

    <attribMulti>           ::= <attribTessOuter> <arrayRange>
                              | <attribTessInner> <arrayRange>
                              | <attribPatchGeneric> <arrayRange>

    <attribTessOuter>       ::= <primPrefix> "." "tessouter"

    <attribTessInner>       ::= <primPrefix> "." "tessinner"

    <attribPatchGeneric>    ::= <primPrefix> "." "patch" "." "attrib"


    Modify Section 2.X.2 of NV_vertex_program4, Program Grammar

    (modify the section, updating the program header string for the extended
     instruction set)

    Vertex programs are required to begin with the header string "!!NVvp5.0".
    This header string identifies the subsequent program body as being a
    vertex program and indicates that it should be parsed according to the
    base NV_gpu_program5 grammar plus the additions below.  Program string
    parsing begins with the character immediately following the header string.


    Modify Section 2.X.2 of NV_gpu_program4, Program Grammar

    (add the following grammar rules to the NV_gpu_program4 base grammar;
     additional grammar rules usable for assembly programs are documented in
     the EXT_shader_image_load_store and ARB_shader_subroutine specifications)

    <instruction>           ::= <MemInstruction>

    <MemInstruction>        ::= <ATOMop_instruction>
                              | <STOREop_instruction>
                              | <MEMBARop_instruction>

    <VECTORop>              ::= "BFR"
                              | "BTC"
                              | "BTFL"
                              | "BTFM"
                              | "PK64"
                              | "LDC"
                              | "CVT"
                              | "TGALL"
                              | "TGANY"
                              | "TGEQ"
                              | "UP64"

    <SCALARop>              ::= "LOAD"

    <BINop>                 ::= "BFE"

    <TRIop>                 ::= "BFI"

    <TEXop_instruction>     ::= <TEXop> <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandV> ","
                                <texAccess>

    <TEXop>                 ::= "TXG"
                              | "LOD"

    <TXDop>                 ::= "TXGO"

    <ATOMop_instruction>    ::= <ATOMop> <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandS>

    <ATOMop>                ::= "ATOM"

    <STOREop_instruction>   ::= <STOREop> <opModifiers> <instOperandV> ","
                                <instOperandS>

    <STOREop>               ::= "STORE"

    <MEMBARop_instruction>  ::= <MEMBARop> <opModifiers>

    <MEMBARop>              ::= "MEMBAR"

    <opModifier>            ::= "F16"
                              | "F32"
                              | "F64"
                              | "F32X2"
                              | "F32X4"
                              | "F64X2"
                              | "F64X4"
                              | "S8"
                              | "S16"
                              | "S32"
                              | "S32X2"
                              | "S32X4"
                              | "S64"
                              | "S64X2"
                              | "S64X4"
                              | "U8"
                              | "U16"
                              | "U32"
                              | "U32X2"
                              | "U32X4"
                              | "U64"
                              | "U64X2"
                              | "U64X4"
                              | "ADD"
                              | "MIN"
                              | "MAX"
                              | "IWRAP"
                              | "DWRAP"
                              | "AND"
                              | "OR"
                              | "XOR"
                              | "EXCH"
                              | "CSWAP"
                              | "COH"
                              | "ROUND"
                              | "CEIL"
                              | "FLR"
                              | "TRUNC"
                              | "PREC"
                              | "VOL"

    <texAccess>             ::= <textureUseS> "," <texTarget> <optTexOffset>
                              | <textureUseV> "," <texTarget> <optTexOffset>

    <texTarget>             ::= "ARRAYCUBE"
                              | "SHADOWARRAYCUBE"

    <optTexOffset>          ::= /* empty */
                              | <texOffset>

    <texOffset>             ::= "offset" "(" <instOperandV> ")"

    <namingStatement>       ::= <TEXTURE_statement>

    <BUFFER_statement>      ::= <bufferDeclType> <establishName>
                                <optArraySize> <optArraySize> "="
                                <bufferMultInit>

    <bufferDeclType>        ::= "CBUFFER"

    <TEXTURE_statement>     ::= "TEXTURE" <establishName> <texSingleInit>
                              | "TEXTURE" <establishName> <optArraySize>
                                <texMultipleInit>

    <texSingleInit>         ::= "=" <textureUseDS>

    <texMultipleInit>       ::= "=" "{" <texItemList> "}"

    <texItemList>           ::= <textureUseDM>
                              | <textureUseDM> "," <texItemList>

    <bufferBinding>         ::= "program" "." "buffer" <arrayRange>

    <textureUseS>           ::= <textureUseV> <texImageUnitComp>

    <textureUseV>           ::= <texImageUnit>
                              | <texVarName> <optArrayMem>

    <textureUseDS>          ::= "texture" <arrayMemAbs>

    <textureUseDM>          ::= <textureUseDS>
                              | "texture" <arrayRange>

    <texImageUnitComp>      ::= <scalarSuffix>
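
    The data-type entries among the <opModifier> rules follow a regular
    naming pattern: a type letter (F, S, or U), an element size in bits, and
    an optional X2 or X4 vector count.  Reading the names that way is an
    interpretation of the list above, not text from this section.  A
    hypothetical C helper built on that reading, which could be used to size
    host-side data accessed by LOAD, STORE, or ATOM, is sketched below; it
    accepts only the combinations that actually appear in the grammar.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: number of bytes named by a data-type storage
 * modifier such as "U8", "F32X4", or "S64X2", or 0 for anything else
 * (including non-type modifiers such as "ADD", "COH", or "FLR"). */
unsigned storage_modifier_bytes(const char *mod)
{
    assert(mod != NULL);

    char type = mod[0];
    if (type != 'F' && type != 'S' && type != 'U')
        return 0;

    /* Element size in bits; stop early so long inputs cannot overflow. */
    unsigned bits = 0;
    const char *p = mod + 1;
    while (bits <= 64 && *p >= '0' && *p <= '9')
        bits = bits * 10 + (unsigned)(*p++ - '0');

    if (bits != 8 && bits != 16 && bits != 32 && bits != 64)
        return 0;
    if (type == 'F' && bits == 8)
        return 0; /* the grammar lists no F8 */

    /* Optional vector count; the grammar has X2/X4 only for 32/64 bits. */
    unsigned count = 1;
    if (*p == 'X' && (bits == 32 || bits == 64)) {
        if (strcmp(p, "X2") == 0)
            count = 2;
        else if (strcmp(p, "X4") == 0)
            count = 4;
        else
            return 0;
    } else if (*p != '\0') {
        return 0;
    }
    return bits / 8 * count;
}
```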


    Modify Section 2.X.3.1, Program Variable Types

    (IGNORE if GL_NV_gpu_program_fp64 is not found in the extension string.
     Otherwise modify storage size modifiers to guarantee that "LONG"
     variables are at least 64 bits in size.)

    Explicitly declared variables may optionally have one storage size
    modifier.  Variables declared as "SHORT" will be represented using at
    least 16 bits per component.  "SHORT" floating-point values will have at
    least 5 bits of exponent and 10 bits of mantissa.  Variables declared as
    "LONG" will be represented with at least 64 bits per component.  "LONG"
    floating-point values will have at least 11 bits of exponent and 52 bits
    of mantissa.  If no size modifier is provided, the GL will automatically
    select component sizes.  Implementations are not required to support more
    than one component size, so "SHORT", "LONG", and the default could all
    refer to the same component size.  The "LONG" modifier is supported only
    for declarations of temporary variables ("TEMP") and, in vertex programs,
    attribute variables ("ATTRIB").  The "SHORT" modifier is supported only
    for declarations of temporary variables and result variables ("OUTPUT").
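
    The minimum sizes above match the IEEE 754 binary16 layout for "SHORT"
    (1 sign, 5 exponent, and 10 mantissa bits) and the binary64 layout for
    "LONG" (1 sign, 11 exponent, and 52 mantissa bits).  This section only
    sets minimums and does not require IEEE encodings, so the C sketch below
    is a host-side illustration: it checks at compile time that the
    platform's double meets the "LONG" minimum and decodes a binary16 value.
    half_to_float and pow2i are hypothetical helpers.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdint.h>

/* "LONG" needs >= 11 exponent and >= 52 mantissa bits, which IEEE 754
 * binary64 provides.  DBL_MANT_DIG counts the implicit leading bit. */
_Static_assert(DBL_MANT_DIG >= 53, "double lacks a 52-bit stored mantissa");
_Static_assert(DBL_MAX_EXP >= 1024, "double lacks an 11-bit exponent range");

/* Exact power of two for the small exponents used by half_to_float. */
static float pow2i(int n)
{
    assert(n >= -24 && n <= 5); /* range needed for binary16 */
    float r = 1.0f;
    for (; n > 0; n--)
        r *= 2.0f;
    for (; n < 0; n++)
        r *= 0.5f;
    return r;
}

/* Decodes IEEE 754 binary16 (1 sign, 5 exponent, 10 mantissa bits), the
 * smallest layout that satisfies the "SHORT" floating-point minimum. */
float half_to_float(uint16_t h)
{
    unsigned sign = (h >> 15) & 0x1u;
    unsigned exponent = (h >> 10) & 0x1Fu;
    unsigned mantissa = h & 0x3FFu;
    float value;

    if (exponent == 0)            /* zero or subnormal: m * 2^-24 */
        value = (float)mantissa * pow2i(-24);
    else if (exponent == 31)      /* infinity or NaN */
        value = mantissa != 0 ? NAN : INFINITY;
    else                          /* normal: (1024 + m) * 2^(e - 25) */
        value = (float)(mantissa | 0x400u) * pow2i((int)exponent - 25);

    return sign ? -value : value;
}
```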


    Modify Section 2.X.3.2 of the NV_fragment_program4 specification, Program
    Attribute Variables.

    (Add table entries and relevant text describing the fragment program
     input sample mask and point sprite coordinate variables.)

      Fragment Attribute Binding  Components  Underlying State
      --------------------------  ----------  ----------------------------
      fragment.samplemask         (m,-,-,-)   fragment coverage mask
      fragment.pointcoord         (s,t,-,-)   fragment point sprite coordinate

    If a fragment attribute binding matches "fragment.samplemask", the "x"
    component is filled with a coverage mask indicating the set of samples
    covered by this fragment.  The coverage mask is a bitfield, where bit <n>
    is one if sample number <n> is covered and zero otherwise.  If
    multisample buffers are not available (SAMPLE_BUFFERS is zero), bit zero
    indicates whether the center of the pixel corresponding to the fragment
    is covered.
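
    Because the mask is an ordinary integer bitfield, testing or counting
    covered samples is simple bit arithmetic.  The host-side C sketch below
    illustrates it; the function names are hypothetical, the mask is assumed
    to fit in 32 bits, and `samples` is assumed to be the framebuffer's
    SAMPLES value, or 1 when SAMPLE_BUFFERS is zero and only bit zero is
    meaningful.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if bit n of the coverage mask (sample number n) is set. */
bool sample_covered(uint32_t mask, unsigned n)
{
    assert(n < 32);
    return (mask >> n) & 1u;
}

/* Number of set bits, i.e. the number of covered samples. */
unsigned covered_sample_count(uint32_t mask)
{
    unsigned count = 0;
    while (mask != 0) {
        mask &= mask - 1u; /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Fraction of the pixel's `samples` samples covered by the fragment;
 * bits at or above `samples` are ignored. */
float coverage_fraction(uint32_t mask, unsigned samples)
{
    assert(samples >= 1 && samples <= 32);
    uint32_t valid = samples == 32 ? 0xFFFFFFFFu : (1u << samples) - 1u;
    return (float)covered_sample_count(mask & valid) / (float)samples;
}
```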

    If a fragment attribute binding matches "fragment.pointcoord", the "x" and
    "y" components are filled with the s and t point sprite coordinates
    (section 3.3.1), respectively.  The "z" and "w" components are undefined.
    If the fragment is generated by any primitive other than a point, or if
    point sprites are disabled, all four components of the binding are
    undefined.

    Modify Section 2.X.3.2 of the NV_geometry_program4 specification, Program
    Attribute Variables.

    (Add a table entry and relevant text describing the geometry program
     invocation attribute and per-patch attributes.)

      Geometry Vertex Binding         Components  Description
      -----------------------------   ----------  ----------------------------
      ...
      primitive.invocation            (id,-,-,-)  geometry program invocation
      primitive.tessouter[n]          (x,-,-,-)   outer tess. level n
      primitive.tessinner[n]          (x,-,-,-)   inner tess. level n
      primitive.patch.attrib[n]       (x,y,z,w)   generic patch attribute n
      primitive.tessouter[n..o]       (x,-,-,-)   outer tess. levels n to o
      primitive.tessinner[n..o]       (x,-,-,-)   inner tess. levels n to o
      primitive.patch.attrib[n..o]    (x,y,z,w)   generic patch attrib n to o
      primitive.vertexcount           (c,-,-,-)   vertices in primitive

    ...

    If a geometry attribute binding matches "primitive.invocation", the "x"
    component is filled with an integer giving the number of previous
    invocations of the geometry program on the primitive being processed.  If
    the geometry program is invoked only once per primitive (default), this
    component will always be zero.  If the program is invoked multiple times
    (via the INVOCATIONS declaration), the component will be zero on the first
    invocation, one on the second, and so forth.  The "y", "z", and "w"
    components of the variable are always undefined.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    If an attribute binding matches "primitive.tessouter[n]", the "x"
5bd8deadSopenharmony_ci    component is filled with the per-patch outer tessellation level numbered
5bd8deadSopenharmony_ci    <n> of the input patch.  <n> must be less than four.  The "y", "z", and
5bd8deadSopenharmony_ci    "w" components are always undefined.  A program will fail to load if this
5bd8deadSopenharmony_ci    attribute binding is used and the input primitive type is not PATCHES.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    If an attribute binding matches "primitive.tessinner[n]", the "x"
5bd8deadSopenharmony_ci    component is filled with the per-patch inner tessellation level numbered
5bd8deadSopenharmony_ci    <n> of the input patch.  <n> must be less than two.  The "y", "z", and "w"
5bd8deadSopenharmony_ci    components are always undefined.  A program will fail to load if this
5bd8deadSopenharmony_ci    attribute binding is used and the input primitive type is not PATCHES.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    If an attribute binding matches "primitive.patch.attrib[n]", the "x", "y",
5bd8deadSopenharmony_ci    "z", and "w" components are filled with the corresponding components of
5bd8deadSopenharmony_ci    the per-patch generic attribute numbered <n> of the input patch.  A
5bd8deadSopenharmony_ci    program will fail to load if this attribute binding is used and the input
5bd8deadSopenharmony_ci    primitive type is not PATCHES.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    If an attribute binding matches "primitive.tessouter[n..o]",
5bd8deadSopenharmony_ci    "primitive.tessinner[n..o]", or "primitive.patch.attrib[n..o]", a sequence
5bd8deadSopenharmony_ci    of 1+<o>-<n> outer tessellation level, inner tessellation level, or
5bd8deadSopenharmony_ci    per-patch generic attribute bindings is created.  For per-patch generic
5bd8deadSopenharmony_ci    attribute bindings, it is as though the sequence
5bd8deadSopenharmony_ci    "primitive.patch.attrib[n], primitive.patch.attrib[n+1], ...
5bd8deadSopenharmony_ci    primitive.patch.attrib[o]" were specified.  These bindings are available
5bd8deadSopenharmony_ci    only in explicit declarations of array variables.  A program will fail to
5bd8deadSopenharmony_ci    load if <n> is greater than <o> or the input primitive type is not
5bd8deadSopenharmony_ci    PATCHES.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    If a geometry attribute binding matches "primitive.vertexcount", the "x"
5bd8deadSopenharmony_ci    component is filled with the number of vertices in the input primitive
5bd8deadSopenharmony_ci    being processed.  The "y", "z", and "w" components of the variable are
5bd8deadSopenharmony_ci    always undefined.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Modify Section 2.X.3.5, Program Results
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    (modify Table X.X)
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      Binding                        Components  Description
5bd8deadSopenharmony_ci      -----------------------------  ----------  ----------------------------
5bd8deadSopenharmony_ci      result.color[n].primary        (r,g,b,a)   primary color n (SRC_COLOR)
5bd8deadSopenharmony_ci      result.color[n].secondary      (r,g,b,a)   secondary color n (SRC1_COLOR)
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      Table X.X:  Fragment Result Variable Bindings. Components labeled "*"
5bd8deadSopenharmony_ci      are unused. "[n]" is optional -- color <n> is used if specified; color
5bd8deadSopenharmony_ci      0 is used otherwise.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    (add after third paragraph)
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    If a result variable binding matches "result.color[n].primary" or
5bd8deadSopenharmony_ci    "result.color[n].secondary" and the ARB_blend_func_extended option is
5bd8deadSopenharmony_ci    specified, updates to the "x", "y", "z", and "w" components of these color
5bd8deadSopenharmony_ci    result variables modify the "r", "g", "b", and "a" components of the
5bd8deadSopenharmony_ci    SRC_COLOR and SRC1_COLOR color outputs, respectively, for the fragment
5bd8deadSopenharmony_ci    output color numbered <n>.  If the ARB_blend_func_extended program option
5bd8deadSopenharmony_ci    is not specified, the "result.color[n].primary" and
5bd8deadSopenharmony_ci    "result.color[n].secondary" bindings are unavailable.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Modify Section 2.X.3.6, Program Parameter Buffers
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    (modify the description of parameter buffer arrays to require that all
5bd8deadSopenharmony_ci    bindings in an array declaration must use the same single buffer *or*
5bd8deadSopenharmony_ci    buffer range)
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    ...  Program parameter buffer variables may be declared as arrays, but all
5bd8deadSopenharmony_ci    bindings assigned to the array must use the same binding point or binding
5bd8deadSopenharmony_ci    point range, and must increase consecutively.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    (add to the end of the section)
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    In explicit variable declarations, the bindings in Table X.12.1 of the
5bd8deadSopenharmony_ci    form "program.buffer[a..b]" may also be used, and indicate the variable
5bd8deadSopenharmony_ci    spans multiple buffer binding points.  Such variables must be accessed as
5bd8deadSopenharmony_ci    arrays, with the first index specifying an offset into the range of
5bd8deadSopenharmony_ci    buffer object binding points.  A buffer index of zero identifies binding
5bd8deadSopenharmony_ci    point <a>; an index of <b>-<a> identifies binding point <b>.  If such a
5bd8deadSopenharmony_ci    variable is declared as an array, a second index must be provided to
5bd8deadSopenharmony_ci    identify the individual array element.  A program will fail to compile if
5bd8deadSopenharmony_ci    such bindings are used when <a> or <b> is negative or greater than or
5bd8deadSopenharmony_ci    equal to the number of buffer binding points supported for the program
5bd8deadSopenharmony_ci    type, or if <a> is greater than <b>.  The bindings in Table X.12.1 may not
5bd8deadSopenharmony_ci    be used in implicit variable declarations.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      Binding                        Components  Underlying State
5bd8deadSopenharmony_ci      -----------------------------  ----------  -----------------------------
5bd8deadSopenharmony_ci      program.buffer[a..b][c]        (x,x,x,x)   program parameter buffers a
5bd8deadSopenharmony_ci                                                   through b, element c
5bd8deadSopenharmony_ci      program.buffer[a..b][c..d]     (x,x,x,x)   program parameter buffers a
5bd8deadSopenharmony_ci                                                   through b, elements c
5bd8deadSopenharmony_ci                                                   through d
5bd8deadSopenharmony_ci      program.buffer[a..b]           (x,x,x,x)   program parameter buffers a
5bd8deadSopenharmony_ci                                                   through b, all elements
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      Table X.12.1:  Program Parameter Buffer Array Bindings.  <a> and <b>
5bd8deadSopenharmony_ci      indicate buffer numbers, <c> and <d> indicate individual elements.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    When bindings beginning with "program.buffer[a..b]" are used in a variable
5bd8deadSopenharmony_ci    declaration, they behave identically to the corresponding bindings
5bd8deadSopenharmony_ci    beginning with "program.buffer[a]", except that the variable is filled
5bd8deadSopenharmony_ci    with a separate set of values for each buffer binding point from <a> to
5bd8deadSopenharmony_ci    <b> inclusive.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    (add new section after Section 2.X.3.7, Program Condition Code Registers
5bd8deadSopenharmony_ci    and renumber subsequent sections accordingly)
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Section 2.X.3.8, Program Texture Variables
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Program texture variables are used as constants during program execution
5bd8deadSopenharmony_ci    and refer to the texture objects bound to one or more texture image units.
5bd8deadSopenharmony_ci    All texture variables have associated bindings and are read-only during
5bd8deadSopenharmony_ci    program execution.  Texture variables retain their values across program
5bd8deadSopenharmony_ci    invocations, and the set of texture image units to which they refer is
5bd8deadSopenharmony_ci    constant.  The texture object a variable refers to may be changed by
5bd8deadSopenharmony_ci    binding a new texture object to the appropriate target of the
5bd8deadSopenharmony_ci    corresponding texture image unit.  Texture variables may only be used to
5bd8deadSopenharmony_ci    identify a texture object in texture instructions, and may not be used as
5bd8deadSopenharmony_ci    operands in any other instruction.  Texture variables may be declared
5bd8deadSopenharmony_ci    explicitly via the <TEXTURE_statement> grammar rule, or implicitly by
5bd8deadSopenharmony_ci    using a texture image unit binding in an instruction.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Texture variables may be declared as arrays, but the list of texture
5bd8deadSopenharmony_ci    image units assigned to the array must increase consecutively.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Texture variables identify only a texture image unit; the corresponding
5bd8deadSopenharmony_ci    texture target (e.g., 1D, 2D, CUBE) and texture object are identified by
5bd8deadSopenharmony_ci    the <texTarget> grammar rule in instructions using the texture variable.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      Binding          Components  Underlying State
5bd8deadSopenharmony_ci      ---------------  ----------  ------------------------------------------
5bd8deadSopenharmony_ci      texture[a]           x      texture object bound to image unit a
5bd8deadSopenharmony_ci      texture[a..b]        x      texture objects bound to image units a
5bd8deadSopenharmony_ci                                     through b
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      Table X.12.2:  Texture Image Unit Bindings.  <a> and <b> indicate
5bd8deadSopenharmony_ci      texture image unit numbers.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    If a texture binding matches "texture[a]", the texture variable is filled
5bd8deadSopenharmony_ci    with a single integer referring to texture image unit <a>.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    If a texture binding matches "texture[a..b]", the texture variable is
5bd8deadSopenharmony_ci    filled with an array of integers referring to texture image units <a>
5bd8deadSopenharmony_ci    through <b>, inclusive.  A program will fail to compile if <a> or <b> is
5bd8deadSopenharmony_ci    negative or greater than or equal to the number of texture image units
5bd8deadSopenharmony_ci    supported, or if <a> is greater than <b>.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Modify Section 2.X.4, Program Execution Environment
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    (Update the instruction set table to include new columns to indicate the
5bd8deadSopenharmony_ci     first ISA supporting the instruction, and to indicate whether the
5bd8deadSopenharmony_ci     instruction supports 64-bit floating-point modifiers.)
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      Instr-      Modifiers
5bd8deadSopenharmony_ci      uction  V  F I C S H D  Out Inputs    Description
5bd8deadSopenharmony_ci      ------- -- - - - - - -  --- --------  --------------------------------
5bd8deadSopenharmony_ci      ABS     40 6 6 X X X F  v   v         absolute value
5bd8deadSopenharmony_ci      ADD     40 6 6 X X X F  v   v,v       add
5bd8deadSopenharmony_ci      AND     40 - 6 X - - S  v   v,v       bitwise and
5bd8deadSopenharmony_ci      ATOM    50 - - X - - -  s   v,su      atomic memory transaction
5bd8deadSopenharmony_ci      BFE     50 - X X - - S  v   v,v       bitfield extract
5bd8deadSopenharmony_ci      BFI     50 - X X - - S  v   v,v,v     bitfield insert
5bd8deadSopenharmony_ci      BFR     50 - X X - - S  v   v         bitfield reverse
5bd8deadSopenharmony_ci      BRK     40 - - - - - -  -   c         break out of loop instruction
5bd8deadSopenharmony_ci      BTC     50 - X X - - S  v   v         bit count
5bd8deadSopenharmony_ci      BTFL    50 - X X - - S  v   v         find least significant bit
5bd8deadSopenharmony_ci      BTFM    50 - X X - - S  v   v         find most significant bit
5bd8deadSopenharmony_ci      CAL     40 - - - - - -  -   c         subroutine call
5bd8deadSopenharmony_ci      CEIL    40 6 6 X X X F  v   vf        ceiling
5bd8deadSopenharmony_ci      CMP     40 6 6 X X X F  v   v,v,v     compare
5bd8deadSopenharmony_ci      CONT    40 - - - - - -  -   c         continue with next loop iteration
5bd8deadSopenharmony_ci      COS     40 X - X X X F  s   s         cosine with reduction to [-PI,PI]
5bd8deadSopenharmony_ci      CVT     50 - - X X - F  v   v         general data type conversion
5bd8deadSopenharmony_ci      DDX     40 X - X X X F  v   v         derivative relative to X (fp-only)
5bd8deadSopenharmony_ci      DDY     40 X - X X X F  v   v         derivative relative to Y (fp-only)
5bd8deadSopenharmony_ci      DIV     40 6 6 X X X F  v   v,s       divide vector components by scalar
5bd8deadSopenharmony_ci      DP2     40 X - X X X F  s   v,v       2-component dot product
5bd8deadSopenharmony_ci      DP2A    40 X - X X X F  s   v,v,v     2-comp. dot product w/scalar add
5bd8deadSopenharmony_ci      DP3     40 X - X X X F  s   v,v       3-component dot product
5bd8deadSopenharmony_ci      DP4     40 X - X X X F  s   v,v       4-component dot product
5bd8deadSopenharmony_ci      DPH     40 X - X X X F  s   v,v       homogeneous dot product
5bd8deadSopenharmony_ci      DST     40 X - X X X F  v   v,v       distance vector
5bd8deadSopenharmony_ci      ELSE    40 - - - - - -  -   -         start if test else block
5bd8deadSopenharmony_ci      EMIT    40 - - - - - -  -   -         emit vertex stream 0 (gp-only)
5bd8deadSopenharmony_ci      EMITS   50 - X - - - S  -   s         emit vertex to stream (gp-only)
5bd8deadSopenharmony_ci      ENDIF   40 - - - - - -  -   -         end if test block
5bd8deadSopenharmony_ci      ENDPRIM 40 - - - - - -  -   -         end of primitive (gp-only)
5bd8deadSopenharmony_ci      ENDREP  40 - - - - - -  -   -         end of repeat block
5bd8deadSopenharmony_ci      EX2     40 X - X X X F  s   s         exponential base 2
5bd8deadSopenharmony_ci      FLR     40 6 6 X X X F  v   vf        floor
5bd8deadSopenharmony_ci      FRC     40 6 - X X X F  v   v         fraction
5bd8deadSopenharmony_ci      I2F     40 - 6 X - - S  vf  v         integer to float
5bd8deadSopenharmony_ci      IF      40 - - - - - -  -   c         start of if test block
5bd8deadSopenharmony_ci      IPAC    50 X - X X - F  v   v         interpolate at centroid (fp-only)
5bd8deadSopenharmony_ci      IPAO    50 X - X X - F  v   v,v       interpolate w/offset (fp-only)
5bd8deadSopenharmony_ci      IPAS    50 X - X X - F  v   v,su      interpolate at sample (fp-only)
5bd8deadSopenharmony_ci      KIL     40 X X - - X F  -   vc        kill fragment
5bd8deadSopenharmony_ci      LDC     40 - - X X - F  v   v         load from constant buffer
5bd8deadSopenharmony_ci      LG2     40 X - X X X F  s   s         logarithm base 2
5bd8deadSopenharmony_ci      LIT     40 X - X X X F  v   v         compute lighting coefficients
5bd8deadSopenharmony_ci      LOAD    40 - - X X - F  v   su        global load
5bd8deadSopenharmony_ci      LOD     41 X - X X - F  v   vf,t      compute texture LOD
5bd8deadSopenharmony_ci      LRP     40 X - X X X F  v   v,v,v     linear interpolation
5bd8deadSopenharmony_ci      MAD     40 6 6 X X X F  v   v,v,v     multiply and add
5bd8deadSopenharmony_ci      MAX     40 6 6 X X X F  v   v,v       maximum
5bd8deadSopenharmony_ci      MEMBAR  50 - - - - - -  -   -         memory barrier
5bd8deadSopenharmony_ci      MIN     40 6 6 X X X F  v   v,v       minimum
5bd8deadSopenharmony_ci      MOD     40 - 6 X - - S  v   v,s       modulus vector components by scalar
5bd8deadSopenharmony_ci      MOV     40 6 6 X X X F  v   v         move
5bd8deadSopenharmony_ci      MUL     40 6 6 X X X F  v   v,v       multiply
5bd8deadSopenharmony_ci      NOT     40 - 6 X - - S  v   v         bitwise not
5bd8deadSopenharmony_ci      NRM     40 X - X X X F  v   v         normalize 3-component vector
5bd8deadSopenharmony_ci      OR      40 - 6 X - - S  v   v,v       bitwise or
5bd8deadSopenharmony_ci      PK2H    40 X X - - - F  s   vf        pack two 16-bit floats
5bd8deadSopenharmony_ci      PK2US   40 X X - - - F  s   vf        pack two floats as unsigned 16-bit
5bd8deadSopenharmony_ci      PK4B    40 X X - - - F  s   vf        pack four floats as signed 8-bit
5bd8deadSopenharmony_ci      PK4UB   40 X X - - - F  s   vf        pack four floats as unsigned 8-bit
5bd8deadSopenharmony_ci      PK64    50 X X - - - F  v   v         pack 4x32-bit vectors to 2x64
5bd8deadSopenharmony_ci      POW     40 X - X X X F  s   s,s       exponentiate
5bd8deadSopenharmony_ci      RCC     40 X - X X X F  s   s         reciprocal (clamped)
5bd8deadSopenharmony_ci      RCP     40 6 - X X X F  s   s         reciprocal
5bd8deadSopenharmony_ci      REP     40 6 6 - - X F  -   v         start of repeat block
5bd8deadSopenharmony_ci      RET     40 - - - - - -  -   c         subroutine return
5bd8deadSopenharmony_ci      RFL     40 X - X X X F  v   v,v       reflection vector
5bd8deadSopenharmony_ci      ROUND   40 6 6 X X X F  v   vf        round to nearest integer
5bd8deadSopenharmony_ci      RSQ     40 6 - X X X F  s   s         reciprocal square root
5bd8deadSopenharmony_ci      SAD     40 - 6 X - - S  vu  v,v,vu    sum of absolute differences
5bd8deadSopenharmony_ci      SCS     40 X - X X X F  v   s         sine/cosine without reduction
5bd8deadSopenharmony_ci      SEQ     40 6 6 X X X F  v   v,v       set on equal
5bd8deadSopenharmony_ci      SFL     40 6 6 X X X F  v   v,v       set on false
5bd8deadSopenharmony_ci      SGE     40 6 6 X X X F  v   v,v       set on greater than or equal
5bd8deadSopenharmony_ci      SGT     40 6 6 X X X F  v   v,v       set on greater than
5bd8deadSopenharmony_ci      SHL     40 - 6 X - - S  v   v,s       shift left
5bd8deadSopenharmony_ci      SHR     40 - 6 X - - S  v   v,s       shift right
5bd8deadSopenharmony_ci      SIN     40 X - X X X F  s   s         sine with reduction to [-PI,PI]
5bd8deadSopenharmony_ci      SLE     40 6 6 X X X F  v   v,v       set on less than or equal
5bd8deadSopenharmony_ci      SLT     40 6 6 X X X F  v   v,v       set on less than
5bd8deadSopenharmony_ci      SNE     40 6 6 X X X F  v   v,v       set on not equal
5bd8deadSopenharmony_ci      SSG     40 6 - X X X F  v   v         set sign
5bd8deadSopenharmony_ci      STORE   50 - - - - - -  -   v,su      global store
5bd8deadSopenharmony_ci      STR     40 6 6 X X X F  v   v,v       set on true
5bd8deadSopenharmony_ci      SUB     40 6 6 X X X F  v   v,v       subtract
5bd8deadSopenharmony_ci      SWZ     40 X - X X X F  v   v         extended swizzle
5bd8deadSopenharmony_ci      TEX     40 X X X X - F  v   vf,t      texture sample
5bd8deadSopenharmony_ci      TGALL   50 X X X X - F  v   v         test all non-zero in thread group
5bd8deadSopenharmony_ci      TGANY   50 X X X X - F  v   v         test any non-zero in thread group
5bd8deadSopenharmony_ci      TGEQ    50 X X X X - F  v   v         test all equal in thread group
5bd8deadSopenharmony_ci      TRUNC   40 6 6 X X X F  v   vf        truncate (round toward zero)
5bd8deadSopenharmony_ci      TXB     40 X X X X - F  v   vf,t      texture sample with bias
5bd8deadSopenharmony_ci      TXD     40 X X X X - F  v vf,vf,vf,t  texture sample w/partials
5bd8deadSopenharmony_ci      TXF     40 X X X X - F  v   vs,t      texel fetch
5bd8deadSopenharmony_ci      TXFMS   40 X X X X - F  v   vs,t      multisample texel fetch
5bd8deadSopenharmony_ci      TXG     41 X X X X - F  v   vf,t      texture gather
5bd8deadSopenharmony_ci      TXGO    50 X X X X - F  v vf,vs,vs,t  texture gather w/per-texel offsets
5bd8deadSopenharmony_ci      TXL     40 X X X X - F  v   vf,t      texture sample w/LOD
5bd8deadSopenharmony_ci      TXP     40 X X X X - F  v   vf,t      texture sample w/projection
5bd8deadSopenharmony_ci      TXQ     40 - - - - - S  vs  vs,t      texture info query
5bd8deadSopenharmony_ci      UP2H    40 X X X X - F  vf  s         unpack two 16-bit floats
5bd8deadSopenharmony_ci      UP2US   40 X X X X - F  vf  s         unpack two unsigned 16-bit integers
5bd8deadSopenharmony_ci      UP4B    40 X X X X - F  vf  s         unpack four signed 8-bit integers
5bd8deadSopenharmony_ci      UP4UB   40 X X X X - F  vf  s         unpack four unsigned 8-bit integers
5bd8deadSopenharmony_ci      UP64    50 X X X X - F  v   v         unpack 2x64 vectors to 4x32
5bd8deadSopenharmony_ci      X2D     40 X - X X X F  v   v,v,v     2D coordinate transformation
5bd8deadSopenharmony_ci      XOR     40 - 6 X - - S  v   v,v       exclusive or
5bd8deadSopenharmony_ci      XPD     40 X - X X X F  v   v,v       cross product
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci          Table X.13:  Summary of NV_gpu_program5 instructions.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      The "V" column indicates the first assembly language in the
5bd8deadSopenharmony_ci      NV_gpu_program4 family (if any) supporting the opcode.  "40", "41", and
5bd8deadSopenharmony_ci      "50" indicate NV_gpu_program4, NV_gpu_program4_1, and NV_gpu_program5,
5bd8deadSopenharmony_ci      respectively.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      The "Modifiers" columns specify the set of modifiers allowed for the
5bd8deadSopenharmony_ci      instruction:
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci        F = floating-point data type modifiers
5bd8deadSopenharmony_ci        I = signed and unsigned integer data type modifiers
5bd8deadSopenharmony_ci        C = condition code update modifiers
5bd8deadSopenharmony_ci        S = clamping (saturation) modifiers
5bd8deadSopenharmony_ci        H = half-precision float data type suffix
5bd8deadSopenharmony_ci        D = default data type modifier (F, U, or S)
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      For the "F" and "I" columns, an "X" indicates support for both unsized
5bd8deadSopenharmony_ci      type modifiers and sized type modifiers with fewer than 64 bits.  A "6"
5bd8deadSopenharmony_ci      indicates support for all modifiers, including 64-bit versions (when
5bd8deadSopenharmony_ci      supported).
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      The input and output columns describe the formats of the operands and
5bd8deadSopenharmony_ci      results of the instruction.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci        v:  4-component vector (data type is inherited from operation)
5bd8deadSopenharmony_ci        vf: 4-component vector (data type is always floating-point)
5bd8deadSopenharmony_ci        vs: 4-component vector (data type is always signed integer)
5bd8deadSopenharmony_ci        vu: 4-component vector (data type is always unsigned integer)
5bd8deadSopenharmony_ci        s:  scalar (replicated if written to a vector destination;
5bd8deadSopenharmony_ci                    data type is inherited from operation)
5bd8deadSopenharmony_ci        su: scalar (data type is always unsigned integer)
5bd8deadSopenharmony_ci        c:  condition code test result (e.g., "EQ", "GT1.x")
5bd8deadSopenharmony_ci        vc: 4-component vector or condition code test
5bd8deadSopenharmony_ci        t:  texture
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      Instructions labeled "fp-only" and "gp-only" are supported only for
5bd8deadSopenharmony_ci      fragment and geometry programs, respectively.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Modify Section 2.X.4.1, Program Instruction Modifiers
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    (Update the discussion of instruction precision modifiers.  If
5bd8deadSopenharmony_ci     GL_NV_gpu_program_fp64 is not found in the extension string, the "F64"
5bd8deadSopenharmony_ci     instruction modifier described below is not supported.)
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    (add to Table X.14 of the NV_gpu_program4 specification.)
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      Modifier  Description
5bd8deadSopenharmony_ci      --------  ---------------------------------------------------
5bd8deadSopenharmony_ci      F         Floating-point operation
5bd8deadSopenharmony_ci      U         Fixed-point operation, unsigned operands
5bd8deadSopenharmony_ci      S         Fixed-point operation, signed operands
5bd8deadSopenharmony_ci      ...
5bd8deadSopenharmony_ci      F32       Floating-point operation, 32-bit precision or
5bd8deadSopenharmony_ci                  access one 32-bit floating-point value
5bd8deadSopenharmony_ci      F64       Floating-point operation, 64-bit precision or
5bd8deadSopenharmony_ci                  access one 64-bit floating-point value
5bd8deadSopenharmony_ci      S32       Fixed-point operation, signed 32-bit operands or
5bd8deadSopenharmony_ci                  access one 32-bit signed integer value
5bd8deadSopenharmony_ci      S64       Fixed-point operation, signed 64-bit operands or
5bd8deadSopenharmony_ci                  access one 64-bit signed integer value
5bd8deadSopenharmony_ci      U32       Fixed-point operation, unsigned 32-bit operands or
5bd8deadSopenharmony_ci                  access one 32-bit unsigned integer value
5bd8deadSopenharmony_ci      U64       Fixed-point operation, unsigned 64-bit operands or
5bd8deadSopenharmony_ci                  access one 64-bit unsigned integer value
5bd8deadSopenharmony_ci      ...
5bd8deadSopenharmony_ci      F32X2     Access two 32-bit floating-point values
5bd8deadSopenharmony_ci      F32X4     Access four 32-bit floating-point values
5bd8deadSopenharmony_ci      F64X2     Access two 64-bit floating-point values
5bd8deadSopenharmony_ci      F64X4     Access four 64-bit floating-point values
5bd8deadSopenharmony_ci      S8        Access one 8-bit signed integer value
5bd8deadSopenharmony_ci      S16       Access one 16-bit signed integer value
5bd8deadSopenharmony_ci      S32X2     Access two 32-bit signed integer values
5bd8deadSopenharmony_ci      S32X4     Access four 32-bit signed integer values
5bd8deadSopenharmony_ci      S64       Access one 64-bit signed integer value
5bd8deadSopenharmony_ci      S64X2     Access two 64-bit signed integer values
5bd8deadSopenharmony_ci      S64X4     Access four 64-bit signed integer values
5bd8deadSopenharmony_ci      U8        Access one 8-bit unsigned integer value
5bd8deadSopenharmony_ci      U16       Access one 16-bit unsigned integer value
5bd8deadSopenharmony_ci      U32       Access one 32-bit unsigned integer value
5bd8deadSopenharmony_ci      U32X2     Access two 32-bit unsigned integer values
5bd8deadSopenharmony_ci      U32X4     Access four 32-bit unsigned integer values
5bd8deadSopenharmony_ci      U64       Access one 64-bit unsigned integer value
5bd8deadSopenharmony_ci      U64X2     Access two 64-bit unsigned integer values
5bd8deadSopenharmony_ci      U64X4     Access four 64-bit unsigned integer values
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      ADD       Perform add operation for ATOM
5bd8deadSopenharmony_ci      MIN       Perform minimum operation for ATOM
5bd8deadSopenharmony_ci      MAX       Perform maximum operation for ATOM
5bd8deadSopenharmony_ci      IWRAP     Perform wrapping increment for ATOM
5bd8deadSopenharmony_ci      DWRAP     Perform wrapping decrement for ATOM
5bd8deadSopenharmony_ci      AND       Perform logical AND operation for ATOM
5bd8deadSopenharmony_ci      OR        Perform logical OR operation for ATOM
5bd8deadSopenharmony_ci      XOR       Perform logical XOR operation for ATOM
5bd8deadSopenharmony_ci      EXCH      Perform exchange operation for ATOM
5bd8deadSopenharmony_ci      CSWAP     Perform compare-and-swap operation for ATOM
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      COH       Make LOAD and STORE operations use coherent caching
5bd8deadSopenharmony_ci      VOL       Make LOAD and STORE operations treat memory as volatile
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      PREC      Instruction results should be precise
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      ROUND     Inexact conversion results round to nearest value (even)
5bd8deadSopenharmony_ci      CEIL      Inexact conversion results round to larger value
5bd8deadSopenharmony_ci      FLR       Inexact conversion results round to smaller value
5bd8deadSopenharmony_ci      TRUNC     Inexact conversion results round to value closest to zero
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    "F", "U", and "S" modifiers are base data type modifiers and specify that
5bd8deadSopenharmony_ci    the instruction should operate on floating-point, unsigned integer, or
5bd8deadSopenharmony_ci    signed integer values, respectively.  For example, "ADD.F", "ADD.U", and
5bd8deadSopenharmony_ci    "ADD.S" specify component-wise addition of floating-point, unsigned
5bd8deadSopenharmony_ci    integer, or signed integer vectors, respectively.  While these modifiers
5bd8deadSopenharmony_ci    specify a data type, they do not specify an exact precision at which the
5bd8deadSopenharmony_ci    operation is performed.  Floating-point and fixed-point operations will
5bd8deadSopenharmony_ci    typically be carried out at 32-bit precision, unless otherwise described
5bd8deadSopenharmony_ci    in the instruction documentation or overridden by the precision modifiers.
5bd8deadSopenharmony_ci    If all operands are represented with less than 32-bit precision (e.g.,
5bd8deadSopenharmony_ci    variables with the "SHORT" component size modifier), operations may be
5bd8deadSopenharmony_ci    carried out at a precision no less than the precision of the largest
5bd8deadSopenharmony_ci    operand used by the instruction.  For some instructions, the data types of
5bd8deadSopenharmony_ci    some operands or of the result are fixed; in these cases, the data type
5bd8deadSopenharmony_ci    modifier specifies the data type of the remaining values.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Operands represented with fewer bits than used to perform the instruction
5bd8deadSopenharmony_ci    will be promoted to a larger data type.  Signed integer operands will be
5bd8deadSopenharmony_ci    sign-extended, where the most significant bits are filled with ones if the
5bd8deadSopenharmony_ci    operand is negative and zero otherwise.  Unsigned integer operands will be
5bd8deadSopenharmony_ci    zero-extended, where the most significant bits are always filled with
5bd8deadSopenharmony_ci    zeroes.  Operands represented with more bits than used to perform the
5bd8deadSopenharmony_ci    instruction will be converted to lower precision.  Floating-point
5bd8deadSopenharmony_ci    overflows result in IEEE infinity encodings; integer overflows result in
5bd8deadSopenharmony_ci    the truncation of the most significant bits.
5bd8deadSopenharmony_ci
    For arithmetic operations, the "F32", "F64", "U32", "U64", "S32", and
    "S64" modifiers are precision-specific data type modifiers that specify
    that floating-point, unsigned integer, or signed integer operations be
    carried out with an internal precision of no less than 32 or 64 bits per
    component, respectively.  The "F64", "U64", and "S64" modifiers are
    supported on only a subset of instructions, as documented in the
    instruction table.  The base data type of the instruction is trivially
    derived from the precision-specific data type modifier, and an instruction
    may not specify both base and precision-specific data type modifiers.

    ...

    "SAT" and "SSAT" are clamping modifiers that generally specify that the
    floating-point components of the instruction result should be clamped to
    [0,1] or [-1,1], respectively, before updating the condition code and the
    destination variable.  If no clamping suffix is specified, unclamped
    results will be used for condition code updates (if any) and destination
    variable writes.  Clamping modifiers are not supported on instructions
    that do not produce floating-point results, with one exception.

    ...

    For load and store operations, the "F32", "F32X2", "F32X4", "F64",
    "F64X2", "F64X4", "S8", "S16", "S32", "S32X2", "S32X4", "S64", "S64X2",
    "S64X4", "U8", "U16", "U32", "U32X2", "U32X4", "U64", "U64X2", and "U64X4"
    storage modifiers control how data are loaded from or stored to memory.
    Storage modifiers are supported by the ATOM, LDC, LOAD, and STORE
    instructions and are covered in more detail in the descriptions of these
    instructions.  These instructions must specify exactly one of these
    modifiers, and may not specify any of the base data type modifiers (F,U,S)
    described above.  The base data types of the result vector of a load
    instruction or the first operand of a store instruction are trivially
    derived from the storage modifier.

    For atomic memory operations performed by the ATOM instruction, the "ADD",
    "MIN", "MAX", "IWRAP", "DWRAP", "AND", "OR", "XOR", "EXCH", and "CSWAP"
    modifiers specify the operation to perform on the memory being accessed,
    and are described in more detail in the description of this instruction.

    For load and store operations, the "COH" modifier controls whether the
    operation uses a coherent level of the cache hierarchy, as described in
    Section 2.X.4.5.

    For load and store operations, the "VOL" modifier controls whether the
    operation treats the memory being read or written as volatile.
    Instructions modified with "VOL" will always read or write the underlying
    memory, whether or not previous or subsequent loads and stores access the
    same memory.

    For arithmetic and logical operations, the "PREC" modifier controls
    whether the instruction result should be treated as precise.  For
    instructions not qualified with ".PREC", the implementation may rearrange
    the computations specified by the program instructions to execute more
    efficiently, even if doing so may generate slightly different results in
    some cases.  For example, an implementation may combine a MUL instruction
    with a dependent ADD instruction and generate code to execute a MAD
    (multiply-add) instruction instead.  The difference in rounding may
    produce unacceptable artifacts for some algorithms.  When ".PREC" is
    specified, the instruction will be executed in a manner that always
    generates the same result regardless of the program instructions that
    precede or follow the instruction.  Note that a ".PREC" modifier does not
    affect the processing of any other instruction.  For example, tagging an
    instruction with ".PREC" does not mean that the instructions used to
    generate the instruction's operands will be treated as precise unless
    those instructions are also qualified with ".PREC".

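    A small C sketch shows how combining a MUL with a dependent ADD can change
    a result, assuming the combined instruction rounds only once, as a fused
    multiply-add does; whether a particular GPU's MAD rounds this way is not
    stated here.  The function names are hypothetical, and the single-rounding
    path is emulated in double precision, which is exact only for inputs like
    the ones in the usage note.

```c
      /* MUL followed by a dependent ADD: the product is rounded to float
         before the addition.  The volatile store keeps a compiler from
         contracting the two operations into one. */
      static float mul_then_add(float a, float b, float c)
      {
          volatile float product = a * b;    /* first rounding  */
          return product + c;                /* second rounding */
      }

      /* A multiply-add with a single final rounding, emulated in double.
         The product of two floats is exact in double (at most 48
         significant bits), and for the inputs in the usage note the sum is
         exact as well, so the only rounding is the final conversion to
         float.  This is not a general-purpose replacement for fmaf(). */
      static float mad_single_rounding(float a, float b, float c)
      {
          return (float)((double)a * (double)b + (double)c);
      }
```

    With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact value of a*b + c
    is 2^-24.  mul_then_add() rounds the product to 1 + 2^-11 and returns
    zero, whereas mad_single_rounding() returns 2^-24.
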
    For the CVT (data type conversion) instruction, the "F16", "F32", "F64",
    "S8", "S16", "S32", "S64", "U8", "U16", "U32", and "U64" storage modifiers
    specify the data type of the vector operand and the converted result.  Two
    storage modifiers must be provided, which specify the data type of the
    result and the operand, respectively.

    For the CVT (data type conversion) instruction, the "ROUND", "CEIL",
    "FLR", and "TRUNC" modifiers specify how to round converted results that
    are not directly representable using the data type of the result.

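    The four rounding modifiers can be sketched in C for a conversion from
    F32 to S32.  This sketch assumes a finite input within the S32 range and
    assumes that ROUND selects the nearest integer with exact halves going to
    the even neighbor; the CVT instruction description governs the actual tie
    and out-of-range behavior.  The function names are hypothetical.

```c
      #include <stdint.h>

      /* TRUNC: round toward zero (C's float-to-int conversion does this). */
      static int32_t cvt_trunc(float x)
      {
          return (int32_t)x;
      }

      /* FLR: round toward negative infinity. */
      static int32_t cvt_flr(float x)
      {
          int32_t t = (int32_t)x;
          return ((float)t > x) ? t - 1 : t;
      }

      /* CEIL: round toward positive infinity. */
      static int32_t cvt_ceil(float x)
      {
          int32_t t = (int32_t)x;
          return ((float)t < x) ? t + 1 : t;
      }

      /* ROUND: nearest integer; exact halves go to the even neighbor
         (an assumption of this sketch). */
      static int32_t cvt_round(float x)
      {
          int32_t f = cvt_flr(x);
          float frac = x - (float)f;          /* in [0, 1) */
          if (frac > 0.5f) return f + 1;
          if (frac < 0.5f) return f;
          return (f % 2 == 0) ? f : f + 1;
      }
```

    For -2.5, TRUNC gives -2, FLR gives -3, CEIL gives -2, and ROUND (under
    the ties-to-even assumption) gives -2.
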

    Modify Section 2.X.4.4, Program Texture Access

    (Extend the language describing the operation of texel offsets to cover
     the new capability to load texel offsets from a register.  Otherwise,
     this functionality is unchanged from previous extensions.)

    <offset> is a 3-component signed integer vector, which can be specified
    using constants embedded in the texture instruction according to the
    <texOffsetImmed> grammar rule, or taken from a vector operand according to
    the <texOffsetVar> grammar rule.  The three components of the offset
    vector are added to the computed <u>, <v>, and <w> texel locations prior
    to sampling.  When using a constant offset, one, two, or three components
    may be specified in the instruction; if fewer than three are specified,
    the remaining offset components are zero.  If no offsets are specified,
    all three components of the offset are treated as zero.  A limited range
    of offset values is supported; the minimum and maximum <texOffset> values
    are implementation-dependent and given by MIN_PROGRAM_TEXEL_OFFSET_EXT and
    MAX_PROGRAM_TEXEL_OFFSET_EXT, respectively.  A program will fail to load:

      * if the texture target specified in the instruction is 1D, ARRAY1D,
        SHADOW1D, or SHADOWARRAY1D, and the second or third component of a
        constant offset vector is non-zero;

      * if the texture target specified in the instruction is 2D, RECT,
        ARRAY2D, SHADOW2D, SHADOWRECT, or SHADOWARRAY2D, and the third
        component of a constant offset vector is non-zero;

      * if the texture target is CUBE, SHADOWCUBE, ARRAYCUBE, or
        SHADOWARRAYCUBE, and any component of a constant offset vector is
        non-zero -- texel offsets are not supported for cube map or buffer
        textures;

      * if any component of the constant offset vector of a TXGO instruction
        is non-zero -- non-constant offsets are provided in separate operands;

      * if any component of a constant offset vector is less than
        MIN_PROGRAM_TEXEL_OFFSET_EXT or greater than
        MAX_PROGRAM_TEXEL_OFFSET_EXT;

      * if a TXD or TXGO instruction specifies a non-constant texel offset
        according to the <texOffsetVar> grammar rule; or

      * if any instruction specifies a non-constant texel offset according
        to the <texOffsetVar> grammar rule and the texture target is CUBE,
        SHADOWCUBE, ARRAYCUBE, or SHADOWARRAYCUBE.

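    The target-dependent and range checks on constant offsets listed above
    can be sketched as a load-time validation function.  The enumeration and
    function name are hypothetical; <min_off> and <max_off> stand in for the
    values of MIN_PROGRAM_TEXEL_OFFSET_EXT and MAX_PROGRAM_TEXEL_OFFSET_EXT,
    and the TXD- and TXGO-specific rules are not modelled.

```c
      #include <stdbool.h>

      /* Hypothetical encoding of the texture targets named above. */
      typedef enum {
          T_1D, T_ARRAY1D, T_SHADOW1D, T_SHADOWARRAY1D,
          T_2D, T_RECT, T_ARRAY2D, T_SHADOW2D, T_SHADOWRECT, T_SHADOWARRAY2D,
          T_3D,
          T_CUBE, T_SHADOWCUBE, T_ARRAYCUBE, T_SHADOWARRAYCUBE
      } TexTarget;

      /* Returns false if a constant offset would make the program fail to
         load under the target and range rules above. */
      static bool constant_offset_ok(TexTarget target, const int off[3],
                                     int min_off, int max_off)
      {
          for (int i = 0; i < 3; i++)
              if (off[i] < min_off || off[i] > max_off)
                  return false;

          switch (target) {
          case T_1D: case T_ARRAY1D: case T_SHADOW1D: case T_SHADOWARRAY1D:
              return off[1] == 0 && off[2] == 0;   /* only X allowed */
          case T_2D: case T_RECT: case T_ARRAY2D:
          case T_SHADOW2D: case T_SHADOWRECT: case T_SHADOWARRAY2D:
              return off[2] == 0;                  /* X and Y allowed */
          case T_CUBE: case T_SHADOWCUBE:
          case T_ARRAYCUBE: case T_SHADOWARRAYCUBE:
              return off[0] == 0 && off[1] == 0 && off[2] == 0;
          case T_3D:
          default:
              return true;                         /* X, Y, and Z allowed */
          }
      }
```

    With example limits of -8 and 7, the offset (0,1,0) is rejected for a 1D
    target but accepted for a 2D target, and (1,0,0) is rejected for any cube
    map target.
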
    The implementation-dependent minimum and maximum texel offset values also
    apply to texel offsets taken from a vector operand, but out-of-bounds or
    invalid component values will not prevent program loading, since the
    offsets may not be computed until the program is executed.  Components of
    the vector operand not needed for the texture target are ignored.  The W
    component of the offset vector is always ignored; the Z component of the
    offset vector is ignored unless the target is 3D; the Y component is
    ignored if the target is 1D, ARRAY1D, SHADOW1D, or SHADOWARRAY1D.  If the
    value of any non-ignored component of the vector operand is outside the
    implementation-dependent limits, the results of the texture lookup are
    undefined.  For all instructions except TXGO, the limits are
    MIN_PROGRAM_TEXEL_OFFSET_EXT and MAX_PROGRAM_TEXEL_OFFSET_EXT.  For the
    TXGO instruction, the limits are MIN_PROGRAM_TEXTURE_GATHER_OFFSET_NV and
    MAX_PROGRAM_TEXTURE_GATHER_OFFSET_NV.

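    The execution-time handling of an offset taken from a vector operand can
    be sketched as follows.  The enumeration and function are hypothetical;
    for TXGO, a caller would pass the values of
    MIN_PROGRAM_TEXTURE_GATHER_OFFSET_NV and
    MAX_PROGRAM_TEXTURE_GATHER_OFFSET_NV as the limits instead of the
    texel-offset limits.

```c
      #include <stdbool.h>

      /* Hypothetical grouping of targets by how many offset components
         they use. */
      typedef enum {
          OFFSET_X,     /* 1D, ARRAY1D, SHADOW1D, SHADOWARRAY1D      */
          OFFSET_XY,    /* 2D, RECT, ARRAY2D, and their SHADOW forms  */
          OFFSET_XYZ    /* 3D                                         */
      } OffsetUse;

      /* Copies the offset components the lookup uses into out[0..2] and
         zeroes the ignored ones (W is never used).  Returns false if a used
         component is outside [min_off, max_off], in which case the texture
         lookup result would be undefined. */
      static bool effective_offset(OffsetUse use, const int operand[4],
                                   int min_off, int max_off, int out[3])
      {
          int used = (use == OFFSET_X) ? 1 : (use == OFFSET_XY) ? 2 : 3;
          bool in_range = true;
          for (int i = 0; i < 3; i++) {
              out[i] = (i < used) ? operand[i] : 0;
              if (i < used && (operand[i] < min_off || operand[i] > max_off))
                  in_range = false;
          }
          return in_range;
      }
```

    An out-of-range value in an ignored component (for example, a large Y
    component with a 1D target) does not affect the result.
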


    (Modify language describing how the check for using multiple targets on a
     single texture image unit works, to account for texture array variables
     where a single instruction may access one of multiple textures and the
     texture used is not known when the program is loaded.)

    A program will fail to load if it attempts to sample from multiple texture
    targets (including the SHADOW pseudo-targets) on the same texture image
    unit.  For example, a program containing any two of the following
    instructions will fail to load:

      TEX out, coord, texture[0], 1D;
      TEX out, coord, texture[0], 2D;
      TEX out, coord, texture[0], ARRAY2D;
      TEX out, coord, texture[0], SHADOW2D;
      TEX out, coord, texture[0], 3D;

    For the purposes of this test, sampling using a texture variable declared
    as an array is treated as though all texture image units bound to the
    variable were accessed.  A program containing the following
    instructions would fail to load:

      TEXTURE textures[] = { texture[0..3] };
      TEX out, coord, textures[2], 2D;     # acts as if all textures are used
      TEX out, coord, texture[1], 3D;

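    The load-time target check, including the rule that an indexed texture
    array variable counts as using every unit bound to it, can be sketched in
    C.  The structure, the unit count, and the integer target codes are
    hypothetical, and array bindings are assumed to cover consecutive units.

```c
      #include <stdbool.h>

      #define NUM_UNITS 32   /* hypothetical number of texture image units */

      /* One sampling instruction.  A plain texture[n] reference covers one
         unit; an indexed texture array variable covers every unit bound to
         the array, whatever index the instruction uses. */
      typedef struct {
          int first_unit;    /* first texture image unit covered        */
          int unit_count;    /* number of consecutive units covered     */
          int target;        /* non-negative code; SHADOW forms distinct */
      } TexUse;

      /* Returns false if any unit would be sampled with two different
         targets, i.e. the program would fail to load.  Assumes every unit
         index is below NUM_UNITS. */
      static bool targets_consistent(const TexUse *uses, int n)
      {
          int seen[NUM_UNITS];
          for (int u = 0; u < NUM_UNITS; u++)
              seen[u] = -1;                       /* unit not used yet */

          for (int i = 0; i < n; i++) {
              int end = uses[i].first_unit + uses[i].unit_count;
              for (int u = uses[i].first_unit; u < end; u++) {
                  if (seen[u] >= 0 && seen[u] != uses[i].target)
                      return false;
                  seen[u] = uses[i].target;
              }
          }
          return true;
      }
```

    Encoding the example above with 2 for 2D and 3 for 3D, the uses
    {0, 4, 2} (textures[2], covering texture[0..3]) and {1, 1, 3}
    (texture[1]) conflict on unit 1, so the check fails.
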
    (Add language describing texture gather component selection)

    The TXG and TXGO instructions provide the ability to assemble a
    four-component vector by taking the value of a single component of a
    multi-component texture from each of four texels.  The component selected
    is identified by the <texImageUnitComp> grammar rule.  Component selection
    is not supported for any other instruction, and a program will fail to
    load if <texImageUnitComp> is matched for any texture instruction other
    than TXG or TXGO.

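    Component selection for TXG and TXGO can be sketched as below.  Which
    four texels are fetched, and the order in which they populate the result,
    are defined by the instruction descriptions rather than here; this
    hypothetical helper simply takes the four texels as input.

```c
      /* A texel with up to four components, stored in x, y, z, w order. */
      typedef struct { float c[4]; } Texel;

      /* Builds a four-component result from component `comp` (0 = x ...
         3 = w) of each of four texels, in the order they are supplied. */
      static void gather_component(const Texel texels[4], int comp,
                                   float result[4])
      {
          for (int i = 0; i < 4; i++)
              result[i] = texels[i].c[comp];
      }
```
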


    Add New Section 2.X.4.5, Program Memory Access

    Programs may load from or store to buffer object memory via the ATOM
    (atomic global memory operation), LDC (load constant), LOAD (global load),
    and STORE (global store) instructions.

    Load instructions read 8, 16, 32, 64, 128, or 256 bits of data from a
    source address to produce a four-component vector, according to the
    storage modifier specified with the instruction.  The storage modifier has
    three parts:

      - a base data type, "F", "S", or "U", specifying that the instruction
        fetches floating-point, signed integer, or unsigned integer values,
        respectively;

      - a component size, specifying that the components fetched by the
        instruction have 8, 16, 32, or 64 bits; and

      - an optional component count, where "X2" and "X4" indicate that two or
        four components be fetched, and no count indicates a single component
        fetch.

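    The three-part structure of a storage modifier can be illustrated with a
    small parser.  The type and function names are hypothetical; the table of
    accepted strings is the list of load and store storage modifiers given
    earlier in this section.

```c
      #include <stdbool.h>
      #include <string.h>

      typedef struct {
          char base;      /* 'F', 'S', or 'U'   */
          int  bits;      /* 8, 16, 32, or 64   */
          int  count;     /* 1, 2, or 4         */
      } StorageModifier;

      /* Splits a modifier such as "U32X4" into its three parts.  Returns
         false for strings that are not load/store storage modifiers. */
      static bool parse_storage_modifier(const char *s, StorageModifier *m)
      {
          static const char *const valid[] = {
              "F32", "F32X2", "F32X4", "F64", "F64X2", "F64X4",
              "S8", "S16", "S32", "S32X2", "S32X4", "S64", "S64X2", "S64X4",
              "U8", "U16", "U32", "U32X2", "U32X4", "U64", "U64X2", "U64X4"
          };
          bool known = false;
          for (size_t i = 0; i < sizeof valid / sizeof valid[0]; i++)
              if (strcmp(s, valid[i]) == 0)
                  known = true;
          if (!known)
              return false;

          m->base = s[0];                      /* base data type   */
          m->bits = 0;
          const char *p = s + 1;
          while (*p >= '0' && *p <= '9')       /* component size   */
              m->bits = m->bits * 10 + (*p++ - '0');
          m->count = (*p == 'X') ? p[1] - '0' : 1;   /* component count */
          return true;
      }

      /* Total bits read or written: 8 through 256. */
      static int storage_bits(const StorageModifier *m)
      {
          return m->bits * m->count;
      }
```

    For example, "U32X4" yields base 'U', 32-bit components, and a count of
    four, for a 128-bit access; "F16" is rejected because it is accepted only
    by CVT.
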
    When the storage modifier specifies that fewer than four components should
    be fetched, remaining components are filled with zeroes.  When performing
    an atomic memory operation (ATOM) or a global load (LOAD), the GPU address
    is specified as an instruction operand.  When performing a constant buffer
    load (LDC), the GPU address is derived by adding the base address of the
    bound buffer object to an offset specified as an instruction operand.
    Given a GPU address <address> and a storage modifier <modifier>, the
    memory load can be described by the following code:

      result_t_vec BufferMemoryLoad(char *address, OpModifier modifier)
      {
        result_t_vec result = { 0, 0, 0, 0 };
        switch (modifier) {
        case F32:
            result.x = ((float32_t *)address)[0];
            break;
        case F32X2:
            result.x = ((float32_t *)address)[0];
            result.y = ((float32_t *)address)[1];
            break;
        case F32X4:
            result.x = ((float32_t *)address)[0];
            result.y = ((float32_t *)address)[1];
            result.z = ((float32_t *)address)[2];
            result.w = ((float32_t *)address)[3];
            break;
        case F64:
            result.x = ((float64_t *)address)[0];
            break;
        case F64X2:
            result.x = ((float64_t *)address)[0];
            result.y = ((float64_t *)address)[1];
            break;
        case F64X4:
            result.x = ((float64_t *)address)[0];
            result.y = ((float64_t *)address)[1];
            result.z = ((float64_t *)address)[2];
            result.w = ((float64_t *)address)[3];
            break;
        case S8:
            result.x = ((int8_t *)address)[0];
            break;
        case S16:
            result.x = ((int16_t *)address)[0];
            break;
        case S32:
            result.x = ((int32_t *)address)[0];
            break;
        case S32X2:
            result.x = ((int32_t *)address)[0];
            result.y = ((int32_t *)address)[1];
            break;
        case S32X4:
            result.x = ((int32_t *)address)[0];
            result.y = ((int32_t *)address)[1];
            result.z = ((int32_t *)address)[2];
            result.w = ((int32_t *)address)[3];
            break;
        case S64:
            result.x = ((int64_t *)address)[0];
            break;
        case S64X2:
            result.x = ((int64_t *)address)[0];
            result.y = ((int64_t *)address)[1];
            break;
        case S64X4:
            result.x = ((int64_t *)address)[0];
            result.y = ((int64_t *)address)[1];
            result.z = ((int64_t *)address)[2];
            result.w = ((int64_t *)address)[3];
            break;
        case U8:
            result.x = ((uint8_t *)address)[0];
            break;
        case U16:
            result.x = ((uint16_t *)address)[0];
            break;
        case U32:
            result.x = ((uint32_t *)address)[0];
            break;
        case U32X2:
            result.x = ((uint32_t *)address)[0];
            result.y = ((uint32_t *)address)[1];
            break;
        case U32X4:
            result.x = ((uint32_t *)address)[0];
            result.y = ((uint32_t *)address)[1];
            result.z = ((uint32_t *)address)[2];
            result.w = ((uint32_t *)address)[3];
            break;
        case U64:
            result.x = ((uint64_t *)address)[0];
            break;
        case U64X2:
            result.x = ((uint64_t *)address)[0];
            result.y = ((uint64_t *)address)[1];
            break;
        case U64X4:
            result.x = ((uint64_t *)address)[0];
            result.y = ((uint64_t *)address)[1];
            result.z = ((uint64_t *)address)[2];
            result.w = ((uint64_t *)address)[3];
            break;
        }
        return result;
      }

    Store instructions write the contents of a four-component vector operand
    into 8, 16, 32, 64, 128, or 256 bits, according to the storage modifier
    specified with the instruction.  The storage modifiers supported by stores
    are identical to those supported for loads.  Given a GPU address
    <address>, a vector operand <operand> containing the data to be stored,
    and a storage modifier <modifier>, the memory store can be described by
    the following code:

      void BufferMemoryStore(char *address, operand_t_vec operand,
                             OpModifier modifier)
      {
        switch (modifier) {
        case F32:
            ((float32_t *)address)[0] = operand.x;
            break;
        case F32X2:
            ((float32_t *)address)[0] = operand.x;
            ((float32_t *)address)[1] = operand.y;
            break;
        case F32X4:
            ((float32_t *)address)[0] = operand.x;
            ((float32_t *)address)[1] = operand.y;
            ((float32_t *)address)[2] = operand.z;
            ((float32_t *)address)[3] = operand.w;
            break;
        case F64:
            ((float64_t *)address)[0] = operand.x;
            break;
        case F64X2:
            ((float64_t *)address)[0] = operand.x;
            ((float64_t *)address)[1] = operand.y;
            break;
        case F64X4:
            ((float64_t *)address)[0] = operand.x;
            ((float64_t *)address)[1] = operand.y;
            ((float64_t *)address)[2] = operand.z;
            ((float64_t *)address)[3] = operand.w;
            break;
        case S8:
            ((int8_t *)address)[0] = operand.x;
            break;
        case S16:
            ((int16_t *)address)[0] = operand.x;
            break;
        case S32:
            ((int32_t *)address)[0] = operand.x;
            break;
        case S32X2:
            ((int32_t *)address)[0] = operand.x;
            ((int32_t *)address)[1] = operand.y;
            break;
        case S32X4:
            ((int32_t *)address)[0] = operand.x;
            ((int32_t *)address)[1] = operand.y;
            ((int32_t *)address)[2] = operand.z;
            ((int32_t *)address)[3] = operand.w;
            break;
        case S64:
            ((int64_t *)address)[0] = operand.x;
            break;
        case S64X2:
            ((int64_t *)address)[0] = operand.x;
            ((int64_t *)address)[1] = operand.y;
            break;
        case S64X4:
            ((int64_t *)address)[0] = operand.x;
            ((int64_t *)address)[1] = operand.y;
            ((int64_t *)address)[2] = operand.z;
            ((int64_t *)address)[3] = operand.w;
            break;
        case U8:
            ((uint8_t *)address)[0] = operand.x;
            break;
        case U16:
            ((uint16_t *)address)[0] = operand.x;
            break;
        case U32:
            ((uint32_t *)address)[0] = operand.x;
            break;
        case U32X2:
            ((uint32_t *)address)[0] = operand.x;
            ((uint32_t *)address)[1] = operand.y;
            break;
        case U32X4:
            ((uint32_t *)address)[0] = operand.x;
            ((uint32_t *)address)[1] = operand.y;
            ((uint32_t *)address)[2] = operand.z;
            ((uint32_t *)address)[3] = operand.w;
            break;
        case U64:
            ((uint64_t *)address)[0] = operand.x;
            break;
        case U64X2:
            ((uint64_t *)address)[0] = operand.x;
            ((uint64_t *)address)[1] = operand.y;
            break;
        case U64X4:
            ((uint64_t *)address)[0] = operand.x;
            ((uint64_t *)address)[1] = operand.y;
            ((uint64_t *)address)[2] = operand.z;
            ((uint64_t *)address)[3] = operand.w;
            break;
        }
      }

    If a global load or store accesses a memory address that does not
    correspond to a buffer object made resident by MakeBufferResidentNV, the
    results of the operation are undefined and may produce a fault resulting
    in application termination.  If a load accesses a buffer object made
    resident with an <access> parameter of WRITE_ONLY, or if a store accesses
    a buffer object made resident with an <access> parameter of READ_ONLY, the
    results of the operation are also undefined and may lead to application
    termination.

    The address used for global memory loads or stores, or the offset used for
    constant buffer loads, must be aligned to the fetch size corresponding to
    the storage opcode modifier.  For S8 and U8, the address or offset has no
    alignment requirements.  For S16 and U16, it must be a multiple of two
    basic machine units.  For F32, S32, and U32, it must be a multiple of
    four.  For F32X2, F64, S32X2, S64, U32X2, and U64, it must be a multiple
    of eight.  For F32X4, F64X2, S32X4, S64X2, U32X4, and U64X2, it must be a
    multiple of sixteen.  For F64X4, S64X4, and U64X4, it must be a multiple
    of thirty-two.  If an address or offset is not correctly aligned, the
    values returned by a buffer memory load will be undefined, and the effects
    of a buffer memory store will also be undefined.

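    Every alignment listed above equals the total access size in basic
    machine units (component size in bytes times component count), so the
    rule can be sketched as follows.  The function names are hypothetical.

```c
      #include <stdbool.h>
      #include <stdint.h>

      /* Required alignment in basic machine units (bytes): the total fetch
         size, e.g. 1 for U8, 8 for F32X2, and 32 for U64X4. */
      static uint64_t required_alignment(int component_bits, int count)
      {
          return (uint64_t)(component_bits / 8) * (uint64_t)count;
      }

      /* True if a global-memory address or constant-buffer offset is
         suitably aligned; misaligned accesses give undefined results. */
      static bool is_aligned(uint64_t address, int component_bits, int count)
      {
          return address % required_alignment(component_bits, count) == 0;
      }
```

    For example, address 0x1004 is valid for an F32 or U32 access but not
    for F32X2 or U64, which require a multiple of eight.
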
    Global and image memory accesses in assembly programs are weakly ordered
    and may require synchronization relative to other operations in the OpenGL
    pipeline.  The ordering and synchronization mechanisms described in
    Section 2.14.X (of the EXT_shader_image_load_store extension
    specification) for shaders using the OpenGL Shading Language apply equally
    to loads, stores, and atomics performed in assembly programs.


    Modify Section 2.X.6.Y of the NV_fragment_program4 specification

    (add new option section)

    + Early Per-Fragment Tests (NV_early_fragment_tests)

    If a fragment program specifies the "NV_early_fragment_tests" option, the
    depth and stencil tests will be performed prior to fragment program
    invocation, as described in Section 3.X.


    Modify Section 2.X.7.Y of the NV_geometry_program4 specification

    (Simply add the new input primitive type "PATCHES" to the list of tokens
     allowed by the "PRIMITIVE_IN" declaration.)

    - Input Primitive Type (PRIMITIVE_IN)

    The PRIMITIVE_IN statement declares the type of primitives seen by a
    geometry program.  The single argument must be one of "POINTS", "LINES",
    "LINES_ADJACENCY", "TRIANGLES", "TRIANGLES_ADJACENCY", or "PATCHES".


    (Add a new optional program declaration to declare a geometry shader that
     is run <N> times per primitive.)

    Geometry programs support three types of mandatory declaration statements,
    as described below.  Each of the three must be included exactly once in
    the geometry program.

    ...

    Geometry programs also support one optional declaration statement.

    - Program Invocation Count (INVOCATIONS)

    The INVOCATIONS statement declares the number of times the geometry
    program is run on each primitive processed.  The single argument must be a
    positive integer less than or equal to the value of the
    implementation-dependent limit MAX_GEOMETRY_PROGRAM_INVOCATIONS_NV.  Each
    invocation of the geometry program will have the same inputs and outputs
    except for the built-in input variable "primitive.invocation".  This
    variable will be an integer between 0 and <n>-1, where <n> is the declared
    number of invocations.  If omitted, the program invocation count is one.

5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Section 2.X.8.Z, ATOM:  Atomic Global Memory Operation
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The ATOM instruction performs an atomic global memory operation by reading
5bd8deadSopenharmony_ci    from memory at the address specified by the second unsigned integer scalar
5bd8deadSopenharmony_ci    operand, computing a new value based on the value read from memory and the
5bd8deadSopenharmony_ci    first (vector) operand, and then writing the result back to the same
5bd8deadSopenharmony_ci    memory address.  The memory transaction is atomic, guaranteeing that no
5bd8deadSopenharmony_ci    other write to the memory accessed will occur between the time it is read
5bd8deadSopenharmony_ci    and written by the ATOM instruction.  The result of the ATOM instruction
5bd8deadSopenharmony_ci    is the scalar value read from memory.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The ATOM instruction has two required instruction modifiers.  The atomic
5bd8deadSopenharmony_ci    modifier specifies the type of operation to be performed.  The storage
5bd8deadSopenharmony_ci    modifier specifies the size and data type of the operand read from memory
5bd8deadSopenharmony_ci    and the base data type of the operation used to compute the value to be
5bd8deadSopenharmony_ci    written to memory.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      atomic     storage
5bd8deadSopenharmony_ci      modifier   modifiers            operation
5bd8deadSopenharmony_ci      --------   ------------------   --------------------------------------
5bd8deadSopenharmony_ci       ADD       U32, S32, U64        compute a sum
5bd8deadSopenharmony_ci       MIN       U32, S32             compute minimum
5bd8deadSopenharmony_ci       MAX       U32, S32             compute maximum
5bd8deadSopenharmony_ci       IWRAP     U32                  increment memory, wrapping at operand
5bd8deadSopenharmony_ci       DWRAP     U32                  decrement memory, wrapping at operand
5bd8deadSopenharmony_ci       AND       U32, S32             compute bit-wise AND
5bd8deadSopenharmony_ci       OR        U32, S32             compute bit-wise OR
5bd8deadSopenharmony_ci       XOR       U32, S32             compute bit-wise XOR
5bd8deadSopenharmony_ci       EXCH      U32, S32, U64        exchange memory with operand
5bd8deadSopenharmony_ci       CSWAP     U32, S32, U64        compare-and-swap
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci     Table X.Y, Supported atomic and storage modifiers for the ATOM
5bd8deadSopenharmony_ci     instruction.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Not all storage modifiers are supported by ATOM, and the set of modifiers
5bd8deadSopenharmony_ci    allowed for any given instruction depends on the atomic modifier
5bd8deadSopenharmony_ci    specified.  Table X.Y enumerates the set of atomic modifiers supported by
5bd8deadSopenharmony_ci    the ATOM instruction, and the storage modifiers allowed for each.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
5bd8deadSopenharmony_ci      address = ScalarLoad(op1);
5bd8deadSopenharmony_ci      result = BufferMemoryLoad(address, storageModifier);
5bd8deadSopenharmony_ci      switch (atomicModifier) {
5bd8deadSopenharmony_ci      case ADD:
5bd8deadSopenharmony_ci        writeval = tmp0.x + result;
5bd8deadSopenharmony_ci        break;
5bd8deadSopenharmony_ci      case MIN:
5bd8deadSopenharmony_ci        writeval = min(tmp0.x, result);
5bd8deadSopenharmony_ci        break;
5bd8deadSopenharmony_ci      case MAX:
5bd8deadSopenharmony_ci        writeval = max(tmp0.x, result);
5bd8deadSopenharmony_ci        break;
5bd8deadSopenharmony_ci      case IWRAP:
5bd8deadSopenharmony_ci        writeval = (result >= tmp0.x) ? 0 : result+1;
5bd8deadSopenharmony_ci        break;
5bd8deadSopenharmony_ci      case DWRAP:
5bd8deadSopenharmony_ci        writeval = (result == 0 || result > tmp0.x) ? tmp0.x : result-1;
5bd8deadSopenharmony_ci        break;
5bd8deadSopenharmony_ci      case AND:
5bd8deadSopenharmony_ci        writeval = tmp0.x & result;
5bd8deadSopenharmony_ci        break;
5bd8deadSopenharmony_ci      case OR:
5bd8deadSopenharmony_ci        writeval = tmp0.x | result;
5bd8deadSopenharmony_ci        break;
5bd8deadSopenharmony_ci      case XOR:
5bd8deadSopenharmony_ci        writeval = tmp0.x ^ result;
5bd8deadSopenharmony_ci        break;
5bd8deadSopenharmony_ci      case EXCH:
5bd8deadSopenharmony_ci        writeval = tmp0.x;
5bd8deadSopenharmony_ci        break;
5bd8deadSopenharmony_ci      case CSWAP:
5bd8deadSopenharmony_ci        if (result == tmp0.x) {
5bd8deadSopenharmony_ci          writeval = tmp0.y;
5bd8deadSopenharmony_ci        } else {
5bd8deadSopenharmony_ci          return result;  // no memory store
5bd8deadSopenharmony_ci        }
5bd8deadSopenharmony_ci        break;
5bd8deadSopenharmony_ci      }
5bd8deadSopenharmony_ci      BufferMemoryStore(address, writeval, storageModifier);
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    ATOM performs a scalar atomic operation.  The <y>, <z>, and <w> components
5bd8deadSopenharmony_ci    of the result vector are undefined.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    ATOM supports no base data type modifiers, but requires exactly one
5bd8deadSopenharmony_ci    storage modifier.  The base data types of the result vector, and the first
5bd8deadSopenharmony_ci    (vector) operand are derived from the storage modifier.  The second
5bd8deadSopenharmony_ci    operand is always interpreted as a scalar unsigned integer.
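    The following C sketch models the effect of one ATOM invocation with the
    U32 storage modifier. It is single-threaded and therefore not actually
    atomic; the AtomOp enum and atom_u32() are hypothetical names used only
    for illustration. Like ATOM, it returns the value originally read from
    memory, and it makes the IWRAP/DWRAP wrapping rules concrete.

```c
#include <stdint.h>

/* Hypothetical names for the ATOM atomic modifiers (U32 storage only). */
typedef enum {
  ATOM_ADD, ATOM_MIN, ATOM_MAX, ATOM_IWRAP, ATOM_DWRAP,
  ATOM_AND, ATOM_OR, ATOM_XOR, ATOM_EXCH, ATOM_CSWAP
} AtomOp;

/* Single-threaded model of "ATOM.<op>.U32": reads *mem, computes a new value
 * from the operand's x component (and y for CSWAP), stores it, and returns
 * the value read.  Hardware does the read-modify-write atomically; this
 * sketch does not. */
uint32_t atom_u32(AtomOp op, uint32_t *mem, uint32_t x, uint32_t y)
{
  uint32_t result = *mem;                 /* value read from memory */
  uint32_t writeval;

  switch (op) {
  case ATOM_ADD:   writeval = x + result;                     break; /* mod 2^32 */
  case ATOM_MIN:   writeval = (x < result) ? x : result;      break;
  case ATOM_MAX:   writeval = (x > result) ? x : result;      break;
  case ATOM_IWRAP: writeval = (result >= x) ? 0 : result + 1; break;
  case ATOM_DWRAP: writeval = (result == 0 || result > x) ? x : result - 1; break;
  case ATOM_AND:   writeval = x & result; break;
  case ATOM_OR:    writeval = x | result; break;
  case ATOM_XOR:   writeval = x ^ result; break;
  case ATOM_EXCH:  writeval = x;          break;
  case ATOM_CSWAP:
    if (result != x)
      return result;                      /* compare failed: no store */
    writeval = y;
    break;
  default:
    return result;
  }
  *mem = writeval;
  return result;
}
```

    For example, repeatedly applying ATOM_IWRAP with x = 3 to a counter that
    starts at 0 returns 0, 1, 2, 3, 0, ..., the usual pattern for indexing a
    four-entry ring buffer.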
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Section 2.X.8.Z, BFE:  Bitfield Extract
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The BFE instruction performs a component-wise bit extraction of the
5bd8deadSopenharmony_ci    second vector operand to yield a result vector.  For
5bd8deadSopenharmony_ci    each component, the number of bits extracted is given by the x component
5bd8deadSopenharmony_ci    of the first vector operand, and the bit number of the least significant
5bd8deadSopenharmony_ci    bit extracted is given by the y component of the first vector operand.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
5bd8deadSopenharmony_ci      tmp1 = VectorLoad(op1);
5bd8deadSopenharmony_ci      result.x = BitfieldExtract(tmp0.x, tmp0.y, tmp1.x);
5bd8deadSopenharmony_ci      result.y = BitfieldExtract(tmp0.x, tmp0.y, tmp1.y);
5bd8deadSopenharmony_ci      result.z = BitfieldExtract(tmp0.x, tmp0.y, tmp1.z);
5bd8deadSopenharmony_ci      result.w = BitfieldExtract(tmp0.x, tmp0.y, tmp1.w);
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    If the number of bits to extract is zero, zero is returned.  The results
5bd8deadSopenharmony_ci    of bitfield extraction are undefined
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      * if the number of bits to extract or the starting offset is negative,
5bd8deadSopenharmony_ci      * if the sum of the number of bits to extract and the starting offset
5bd8deadSopenharmony_ci        is greater than the total number of bits in the operand/result, or
5bd8deadSopenharmony_ci      * if the starting offset is greater than or equal to the total number of
5bd8deadSopenharmony_ci        bits in the operand/result.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      Type BitfieldExtract(Type bits, Type offset, Type value)
5bd8deadSopenharmony_ci      {
5bd8deadSopenharmony_ci        if (bits < 0 || offset < 0 || offset >= TotalBits(Type) ||
5bd8deadSopenharmony_ci            bits + offset > TotalBits(Type)) {
5bd8deadSopenharmony_ci          /* result undefined */
5bd8deadSopenharmony_ci        } else if (bits == 0) {
5bd8deadSopenharmony_ci          return 0;
5bd8deadSopenharmony_ci        } else {
5bd8deadSopenharmony_ci          return (value << (TotalBits(Type) - (bits+offset))) >>
5bd8deadSopenharmony_ci                   (TotalBits(Type) - bits);
5bd8deadSopenharmony_ci        }
5bd8deadSopenharmony_ci      }
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    BFE supports only signed and unsigned integer data type modifiers.  For
5bd8deadSopenharmony_ci    signed integer data types, the extracted value is sign-extended (i.e.,
5bd8deadSopenharmony_ci    filled with ones if the most significant bit extracted is one and filled
5bd8deadSopenharmony_ci    with zeroes otherwise).  For unsigned integer data types, the extracted
5bd8deadSopenharmony_ci    value is zero-extended.
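    A C sketch of BitfieldExtract() for 32-bit components follows. The
    function names are hypothetical, and returning zero for the undefined
    cases is an arbitrary choice of this sketch, not behavior the
    specification guarantees. The signed variant sign-extends explicitly
    instead of relying on an arithmetic right shift of a negative value,
    which C leaves implementation-defined.

```c
#include <stdint.h>

/* BFE.U32: extract `bits` bits starting at bit `offset`, zero-extended. */
uint32_t bfe_u32(int32_t bits, int32_t offset, uint32_t value)
{
  if (bits < 0 || offset < 0 || offset >= 32 || bits + offset > 32)
    return 0;                     /* undefined in the spec; sketch returns 0 */
  if (bits == 0)
    return 0;
  /* Shift the field to the top, then back down to bit 0. */
  return (value << (32 - (bits + offset))) >> (32 - bits);
}

/* BFE.S32: same extraction, then copy the field's top bit upward. */
int32_t bfe_s32(int32_t bits, int32_t offset, int32_t value)
{
  uint32_t field = bfe_u32(bits, offset, (uint32_t)value);
  if (bits > 0 && bits < 32 && ((field >> (bits - 1)) & 1u))
    field |= ~0u << bits;         /* fill with ones above the field */
  return (int32_t)field;
}
```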
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Section 2.X.8.Z, BFI:  Bitfield Insert
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The BFI instruction performs a component-wise bitfield insertion of the
5bd8deadSopenharmony_ci    second vector operand into the third vector operand to yield a result
5bd8deadSopenharmony_ci    vector.  For each component, the <n> least significant bits are extracted
5bd8deadSopenharmony_ci    from the corresponding component of the second vector operand, where <n>
5bd8deadSopenharmony_ci    is given by the x component of the first vector operand.  Those bits are
5bd8deadSopenharmony_ci    merged into the corresponding component of the third vector operand,
5bd8deadSopenharmony_ci    replacing bits <b> through <b>+<n>-1, to produce the result.  The bit
5bd8deadSopenharmony_ci    offset <b> is specified by the y component of the first operand.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
5bd8deadSopenharmony_ci      tmp1 = VectorLoad(op1);
5bd8deadSopenharmony_ci      tmp2 = VectorLoad(op2);
5bd8deadSopenharmony_ci      result.x = BitfieldInsert(tmp0.x, tmp0.y, tmp1.x, tmp2.x);
5bd8deadSopenharmony_ci      result.y = BitfieldInsert(tmp0.x, tmp0.y, tmp1.y, tmp2.y);
5bd8deadSopenharmony_ci      result.z = BitfieldInsert(tmp0.x, tmp0.y, tmp1.z, tmp2.z);
5bd8deadSopenharmony_ci      result.w = BitfieldInsert(tmp0.x, tmp0.y, tmp1.w, tmp2.w);
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The results of bitfield insertion are undefined
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      * if the number of bits to insert or the starting offset is negative,
5bd8deadSopenharmony_ci      * if the sum of the number of bits to insert and the starting offset
5bd8deadSopenharmony_ci        is greater than the total number of bits in the operand/result, or
5bd8deadSopenharmony_ci      * if the starting offset is greater than or equal to the total number of
5bd8deadSopenharmony_ci        bits in the operand/result.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      Type BitfieldInsert(Type bits, Type offset, Type src, Type dst)
5bd8deadSopenharmony_ci      {
5bd8deadSopenharmony_ci        if (bits < 0 || offset < 0 || offset >= TotalBits(Type) ||
5bd8deadSopenharmony_ci            bits + offset > TotalBits(Type)) {
5bd8deadSopenharmony_ci          /* result undefined */
5bd8deadSopenharmony_ci        } else if (bits == TotalBits(Type)) {
5bd8deadSopenharmony_ci          return src;
5bd8deadSopenharmony_ci        } else {
5bd8deadSopenharmony_ci          Type mask = ((1 << bits) - 1) << offset;
5bd8deadSopenharmony_ci          return ((src << offset) & mask) | (dst & (~mask));
5bd8deadSopenharmony_ci        }
5bd8deadSopenharmony_ci      }
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    BFI supports only signed and unsigned integer data type modifiers.  If no
5bd8deadSopenharmony_ci    type modifier is specified, the operand and result vectors are treated as
5bd8deadSopenharmony_ci    signed integers.
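    The BitfieldInsert() pseudocode translates directly to C for 32-bit
    unsigned components, as sketched below. The name bfi_u32() is
    hypothetical, and returning <dst> unchanged for the undefined cases is a
    choice of this sketch only. The bits == 32 case is handled separately
    because shifting a 32-bit value by 32 is undefined in C.

```c
#include <stdint.h>

/* BFI.U32: replace bits [offset, offset+bits-1] of dst with the low `bits`
 * bits of src. */
uint32_t bfi_u32(int32_t bits, int32_t offset, uint32_t src, uint32_t dst)
{
  if (bits < 0 || offset < 0 || offset >= 32 || bits + offset > 32)
    return dst;                   /* undefined in the spec; sketch keeps dst */
  if (bits == 32)
    return src;                   /* offset is necessarily 0 here */
  uint32_t mask = ((1u << bits) - 1u) << offset;
  return ((src << offset) & mask) | (dst & ~mask);
}
```

    For instance, bfi_u32(8, 8, 0xAB, 0x11223344) replaces the second-lowest
    byte and yields 0x1122AB44; extracting 8 bits at offset 8 from that
    result with BFE recovers 0xAB.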
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Section 2.X.8.Z, BFR:  Bitfield Reverse
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The BFR instruction performs a component-wise bit reversal of the single
5bd8deadSopenharmony_ci    vector operand to produce a result vector.  Bit reversal is performed by
5bd8deadSopenharmony_ci    exchanging the most and least significant bits, the second-most and
5bd8deadSopenharmony_ci    second-least significant bits, and so on.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
5bd8deadSopenharmony_ci      result.x = BitReverse(tmp0.x);
5bd8deadSopenharmony_ci      result.y = BitReverse(tmp0.y);
5bd8deadSopenharmony_ci      result.z = BitReverse(tmp0.z);
5bd8deadSopenharmony_ci      result.w = BitReverse(tmp0.w);
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    BFR supports only signed and unsigned integer data type modifiers.  If no
5bd8deadSopenharmony_ci    type modifier is specified, the operand and result vectors are treated as
5bd8deadSopenharmony_ci    signed integers.
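    One way to compute BitReverse() for a 32-bit component is the common
    divide-and-conquer swap shown below; the specification defines only the
    result, not how it is computed, and bfr_u32() is a hypothetical name.
    Because reversal only permutes bits, the same bit pattern results for
    signed and unsigned operands.

```c
#include <stdint.h>

/* Reverse bit order by swapping ever-larger groups: single bits, pairs,
 * nibbles, bytes, and finally the two 16-bit halves. */
uint32_t bfr_u32(uint32_t v)
{
  v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1);
  v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);
  v = ((v >> 4) & 0x0F0F0F0Fu) | ((v & 0x0F0F0F0Fu) << 4);
  v = ((v >> 8) & 0x00FF00FFu) | ((v & 0x00FF00FFu) << 8);
  v = (v >> 16) | (v << 16);
  return v;
}
```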
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Section 2.X.8.Z, BTC:  Bit Count
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The BTC instruction performs a component-wise bit count of the single
5bd8deadSopenharmony_ci    source vector to yield a result vector.  Each component of the result
5bd8deadSopenharmony_ci    vector contains the number of one bits in the corresponding component of
5bd8deadSopenharmony_ci    the source vector.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
5bd8deadSopenharmony_ci      result.x = BitCount(tmp0.x);
5bd8deadSopenharmony_ci      result.y = BitCount(tmp0.y);
5bd8deadSopenharmony_ci      result.z = BitCount(tmp0.z);
5bd8deadSopenharmony_ci      result.w = BitCount(tmp0.w);
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    BTC supports only signed and unsigned integer data type modifiers.  If no
5bd8deadSopenharmony_ci    type modifier is specified, the operand and result vectors are treated as
5bd8deadSopenharmony_ci    signed integers.
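    BitCount() can be sketched in C by repeatedly clearing the lowest set
    bit (Kernighan's method), counting how many times that is possible. The
    name btc_u32() is hypothetical; it counts the bits of the 32-bit pattern,
    so a signed -1 (all ones) yields 32.

```c
#include <stdint.h>

/* Number of one bits in v; each iteration clears the lowest set bit. */
int32_t btc_u32(uint32_t v)
{
  int32_t count = 0;
  while (v != 0) {
    v &= v - 1u;
    count++;
  }
  return count;
}
```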
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Section 2.X.8.Z, BTFL:  Find Least Significant Bit
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The BTFL instruction searches for the least significant one bit in each
5bd8deadSopenharmony_ci    component of the single source vector, yielding a result vector comprising
5bd8deadSopenharmony_ci    the bit number of the located bit for each component.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
5bd8deadSopenharmony_ci      result.x = FindLSB(tmp0.x);
5bd8deadSopenharmony_ci      result.y = FindLSB(tmp0.y);
5bd8deadSopenharmony_ci      result.z = FindLSB(tmp0.z);
5bd8deadSopenharmony_ci      result.w = FindLSB(tmp0.w);
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    BTFL supports only signed and unsigned integer data type modifiers.  For
5bd8deadSopenharmony_ci    unsigned integer data types, the search will yield the bit number of the
5bd8deadSopenharmony_ci    least significant one bit in each component, or the maximum integer (all
5bd8deadSopenharmony_ci    bits are ones) if the source vector component is zero.  For signed data
5bd8deadSopenharmony_ci    types, the search will yield the bit number of the least significant one
5bd8deadSopenharmony_ci    bit in each component, or -1 if the source vector component is zero.  If
5bd8deadSopenharmony_ci    no type modifier is specified, the operand and result vectors are treated
5bd8deadSopenharmony_ci    as signed integers.
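    A minimal C sketch of FindLSB() for 32-bit components is shown below
    (btfl_32() is a hypothetical name). A single function covers both data
    types: the -1 returned for a zero input is 0xFFFFFFFF when viewed as an
    unsigned integer, which is the "all bits are ones" value described above.

```c
#include <stdint.h>

/* Bit number of the least significant one bit, or -1 if v is zero. */
int32_t btfl_32(uint32_t v)
{
  if (v == 0)
    return -1;
  int32_t bit = 0;
  while ((v & 1u) == 0) {
    v >>= 1;
    bit++;
  }
  return bit;
}
```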
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Section 2.X.8.Z, BTFM:  Find Most Significant Bit
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The BTFM instruction searches for the most significant bit of each
5bd8deadSopenharmony_ci    component of the single source vector, yielding a result vector comprising
5bd8deadSopenharmony_ci    the bit number of the located bit for each component.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
5bd8deadSopenharmony_ci      result.x = FindMSB(tmp0.x);
5bd8deadSopenharmony_ci      result.y = FindMSB(tmp0.y);
5bd8deadSopenharmony_ci      result.z = FindMSB(tmp0.z);
5bd8deadSopenharmony_ci      result.w = FindMSB(tmp0.w);
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    BTFM supports only signed and unsigned integer data type modifiers.  For
5bd8deadSopenharmony_ci    unsigned integer data types, the search will yield the bit number of the
5bd8deadSopenharmony_ci    most significant one bit in each component, or the maximum integer (all
5bd8deadSopenharmony_ci    bits are ones) if the source vector component is zero.  For signed data
5bd8deadSopenharmony_ci    types, the search will yield the bit number of the most significant one
5bd8deadSopenharmony_ci    bit if the source value is positive, the bit number of the most
5bd8deadSopenharmony_ci    significant zero bit if the source value is negative, or -1 if the source
5bd8deadSopenharmony_ci    value is zero.  If no type modifier is specified, the operand and result
5bd8deadSopenharmony_ci    vectors are treated as signed integers.
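    The signed and unsigned rules for FindMSB() can be sketched in C as
    follows; the function names are hypothetical. For a negative signed
    input, searching for the most significant zero bit is the same as
    searching the bitwise complement for a one bit. Note that in this sketch
    a signed -1 (no zero bits at all) also yields -1; that mirrors GLSL
    findMSB(), but the text above does not spell out that case.

```c
#include <stdint.h>

/* BTFM.U32: bit number of the most significant one bit, or -1 (all ones
 * when viewed as U32) if v is zero. */
int32_t btfm_u32(uint32_t v)
{
  int32_t bit = -1;
  while (v != 0) {
    v >>= 1;
    bit++;
  }
  return bit;
}

/* BTFM.S32: for negative values, find the most significant zero bit by
 * searching the complement for a one bit. */
int32_t btfm_s32(int32_t v)
{
  uint32_t u = (uint32_t)v;
  return btfm_u32(v < 0 ? ~u : u);
}
```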
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Section 2.X.8.Z, CVT:  Data Type Conversion
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The CVT instruction converts each component of the single source vector
5bd8deadSopenharmony_ci    from one specified data type to another to yield a result vector.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
5bd8deadSopenharmony_ci      result = DataTypeConvert(tmp0);
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The CVT instruction requires two storage modifiers.  The first specifies
5bd8deadSopenharmony_ci    the data type of the result components; the second specifies the data type
5bd8deadSopenharmony_ci    of the operand components.  The supported storage modifiers are F16, F32,
5bd8deadSopenharmony_ci    F64, S8, S16, S32, S64, U8, U16, U32, and U64.  A storage modifier of
5bd8deadSopenharmony_ci    "F16" indicates a source or destination that is treated as having a
5bd8deadSopenharmony_ci    floating-point type, but whose sixteen least significant bits describe a
5bd8deadSopenharmony_ci    16-bit floating-point value using the encoding provided in Section 2.1.2.
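    As an illustration of the F16 encoding (1 sign bit, 5 exponent bits with
    a bias of 15, and 10 mantissa bits, per Section 2.1.2), the C sketch
    below decodes the sixteen least significant bits of a 32-bit word into a
    float. f16_to_f32() and pow2_scale() are hypothetical helpers; only the
    NAN and INFINITY macros are taken from <math.h>.

```c
#include <math.h>    /* NAN and INFINITY macros only; no libm calls */
#include <stdint.h>

/* Exact power of two for the small exponents used below. */
static float pow2_scale(int e)
{
  float r = 1.0f;
  for (; e > 0; e--) r *= 2.0f;
  for (; e < 0; e++) r *= 0.5f;
  return r;
}

/* Interpret the 16 least significant bits of `bits` as a half-precision
 * value; the upper 16 bits are not examined. */
float f16_to_f32(uint32_t bits)
{
  uint32_t h = bits & 0xFFFFu;
  int sign = (int)((h >> 15) & 1u);
  int exponent = (int)((h >> 10) & 0x1Fu);
  int mantissa = (int)(h & 0x3FFu);
  float magnitude;

  if (exponent == 0)            /* zero or denormal: m * 2^-24 */
    magnitude = (float)mantissa * pow2_scale(-24);
  else if (exponent == 31)      /* infinity (m == 0) or NaN */
    magnitude = mantissa ? NAN : INFINITY;
  else                          /* normal: (1024 + m) * 2^(e - 25) */
    magnitude = (float)(mantissa | 0x400) * pow2_scale(exponent - 25);
  return sign ? -magnitude : magnitude;
}
```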
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    If the component size of the source register doesn't match the size of the
5bd8deadSopenharmony_ci    specified operand data type, the source register components are first
5bd8deadSopenharmony_ci    interpreted as a value with the same base data type as the operand and
5bd8deadSopenharmony_ci    converted to the operand data type.  The operand components are then
5bd8deadSopenharmony_ci    converted to the result data type.  Finally, if the component size of the
5bd8deadSopenharmony_ci    destination register doesn't match the specified result data type, the
5bd8deadSopenharmony_ci    result components are converted to values of the same base data type with
5bd8deadSopenharmony_ci    a size matching the result register's component size.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Data type conversion is performed by first converting the source
5bd8deadSopenharmony_ci    components to an infinite-precision value of the destination data type,
5bd8deadSopenharmony_ci    and then converting to the result data type.  When converting between
5bd8deadSopenharmony_ci    floating-point and integer values, integer values are never interpreted as
5bd8deadSopenharmony_ci    being normalized to [0,1] or [-1,+1].  Converting the floating-point
5bd8deadSopenharmony_ci    special values -INF, +INF, and NaN to integers will yield undefined
5bd8deadSopenharmony_ci    results.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    When converting from a non-integral floating-point value to an integer,
5bd8deadSopenharmony_ci    one of the two integers closest in value to the floating-point value is
5bd8deadSopenharmony_ci    chosen according to the rounding instruction modifier.  If "CEIL" or
5bd8deadSopenharmony_ci    "FLR" is specified, the larger or smaller value, respectively, is chosen.
5bd8deadSopenharmony_ci    If "TRUNC" is specified, the value nearest to zero is chosen.  If "ROUND"
5bd8deadSopenharmony_ci    is specified and one integer is nearer in value to the original
5bd8deadSopenharmony_ci    floating-point value, it is chosen; otherwise, the even integer is chosen.
5bd8deadSopenharmony_ci    "ROUND" is used if no rounding modifier is specified.
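    The four rounding modifiers can be sketched in C as below. The enum and
    cvt_round() are hypothetical names, and the sketch assumes a finite input
    with magnitude below 2^53 so that every intermediate is exact in double
    precision. Note that "ROUND" breaks ties toward the even integer, unlike
    the C library round() function, which rounds halfway cases away from
    zero.

```c
#include <stdint.h>

/* Hypothetical names for CVT's rounding modifiers. */
typedef enum { RND_ROUND, RND_CEIL, RND_FLR, RND_TRUNC } RoundMode;

/* Pick one of the two integers nearest to x according to `mode`.
 * Assumes x is finite and |x| < 2^53. */
int64_t cvt_round(double x, RoundMode mode)
{
  int64_t t = (int64_t)x;                 /* C casts truncate toward zero */
  int64_t f = (t > x) ? t - 1 : t;        /* floor */
  int64_t c = (t < x) ? t + 1 : t;        /* ceiling */
  double frac = x - (double)f;            /* exact, in [0, 1) */

  switch (mode) {
  case RND_CEIL:  return c;
  case RND_FLR:   return f;
  case RND_TRUNC: return t;
  case RND_ROUND:
  default:
    if (frac < 0.5) return f;
    if (frac > 0.5) return f + 1;
    return (f % 2 == 0) ? f : f + 1;      /* tie: choose the even integer */
  }
}
```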
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    When converting from the infinite-precision intermediate value to the
5bd8deadSopenharmony_ci    destination data type:
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      * Floating-point values not exactly representable in the destination
5bd8deadSopenharmony_ci        data type are rounded to one of the two nearest values in the
5bd8deadSopenharmony_ci        destination type according to the rounding modifier.  Note that the
5bd8deadSopenharmony_ci        results of float-to-float conversion are not automatically rounded to
5bd8deadSopenharmony_ci        integer values, even if a rounding modifier such as CEIL or FLR is
5bd8deadSopenharmony_ci        specified.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      * Integer values are clamped to the closest value representable in the
5bd8deadSopenharmony_ci        result data type if the "SAT" (saturation) modifier is specified.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      * Integer values drop the most significant bits if the "SAT" modifier is
5bd8deadSopenharmony_ci        not specified.
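    The difference between saturating and non-saturating integer narrowing
    can be sketched in C for S32 operands converted to U8 and S8 results.
    The helper names and the `sat` flag are hypothetical stand-ins for the
    "SAT" modifier; the sketch covers only these two type pairs.

```c
#include <stdint.h>

/* CVT.U8.S32: clamp to [0, 255] with SAT, otherwise keep the low 8 bits. */
uint8_t cvt_u8_s32(int32_t v, int sat)
{
  if (sat) {
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
  }
  return (uint8_t)((uint32_t)v & 0xFFu);  /* drop the most significant bits */
}

/* CVT.S8.S32: clamp to [-128, 127] with SAT, otherwise keep the low 8 bits
 * and reinterpret them as a two's complement value. */
int8_t cvt_s8_s32(int32_t v, int sat)
{
  if (sat) {
    if (v < -128) return -128;
    if (v > 127)  return 127;
    return (int8_t)v;
  }
  int low = (int)((uint32_t)v & 0xFFu);
  return (int8_t)(low >= 128 ? low - 256 : low);
}
```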
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Negation and absolute value operators are not supported on the source
5bd8deadSopenharmony_ci    operand; a program using such operators will fail to compile.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    CVT supports no data type modifiers; the type of the operand and result
5bd8deadSopenharmony_ci    vectors is fully specified by the required storage modifiers.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Section 2.X.8.Z, EMIT:  Emit Vertex
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    (Modify the description of the EMIT opcode to deal with the interaction
5bd8deadSopenharmony_ci     with multiple vertex streams added by ARB_transform_feedback3.  For more
5bd8deadSopenharmony_ci     information on vertex streams, see ARB_transform_feedback3.)
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The EMIT instruction emits a new vertex to be added to the current output
5bd8deadSopenharmony_ci    primitive for vertex stream zero.  The attributes of the emitted vertex
5bd8deadSopenharmony_ci    are given by the current values of the vertex result variables.  After the
5bd8deadSopenharmony_ci    EMIT instruction completes, a new vertex is started and all result
5bd8deadSopenharmony_ci    variables become undefined.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Section 2.X.8.Z, EMITS:  Emit Vertex to Stream
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    (Add new geometry program opcode; the EMITS instruction is not supported
5bd8deadSopenharmony_ci     for any other program types.  For more information on vertex streams, see
5bd8deadSopenharmony_ci     ARB_transform_feedback3.)
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The EMITS instruction emits a new vertex to be added to the current output
5bd8deadSopenharmony_ci    primitive for the vertex stream specified by the single signed integer
5bd8deadSopenharmony_ci    scalar operand.  The attributes of the emitted vertex are given by the
5bd8deadSopenharmony_ci    current values of the vertex result variables.  After the EMITS
5bd8deadSopenharmony_ci    instruction completes, a new vertex is started and all result variables
5bd8deadSopenharmony_ci    become undefined.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    If the specified stream is negative or greater than or equal to the
5bd8deadSopenharmony_ci    implementation-dependent number of vertex streams
5bd8deadSopenharmony_ci    (MAX_VERTEX_STREAMS_NV), the results of the instruction are undefined.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Section 2.X.8.Z, IPAC:  Interpolate at Centroid
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The IPAC instruction generates a result vector by evaluating the fragment
5bd8deadSopenharmony_ci    attribute named by the single vector operand at the centroid location.
5bd8deadSopenharmony_ci    The result vector would be identical to the value obtained by a MOV
5bd8deadSopenharmony_ci    instruction if the attribute variable were declared using the CENTROID
5bd8deadSopenharmony_ci    modifier.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    When interpolating an attribute variable with this instruction, the
5bd8deadSopenharmony_ci    CENTROID and SAMPLE attribute variable modifiers are ignored.  The FLAT
5bd8deadSopenharmony_ci    and NOPERSPECTIVE variable modifiers operate normally.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci     tmp0 = Interpolate(op0, x_pixel + x_centroid, y_pixel + y_centroid);
5bd8deadSopenharmony_ci     result = tmp0;
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    IPAC supports only floating-point data type modifiers.  A program will
5bd8deadSopenharmony_ci    fail to load if it contains an IPAC instruction whose single operand is
5bd8deadSopenharmony_ci    not a fragment program attribute variable, or whose operand matches the
5bd8deadSopenharmony_ci    "fragment.facing" or "primitive.id" binding.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Section 2.X.8.Z, IPAO:  Interpolate with Offset
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The IPAO instruction generates a result vector by evaluating the fragment
5bd8deadSopenharmony_ci    attribute named by the first vector operand at an offset from the pixel
5bd8deadSopenharmony_ci    center given by the x and y components of the second vector operand.  The
5bd8deadSopenharmony_ci    z and w components of the second vector operand are ignored.  The (x,y)
5bd8deadSopenharmony_ci    position used for interpolating the attribute variable is obtained by
5bd8deadSopenharmony_ci    adding the (x,y) offsets in the second vector operand to the (x,y)
5bd8deadSopenharmony_ci    position of the pixel center.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The range of offsets supported by the IPAO instruction is
5bd8deadSopenharmony_ci    implementation-dependent.  The position used to interpolate the attribute
5bd8deadSopenharmony_ci    variable is undefined if the x or y component of the second operand is
5bd8deadSopenharmony_ci    less than MIN_FRAGMENT_INTERPOLATION_OFFSET_NV or greater than
5bd8deadSopenharmony_ci    MAX_FRAGMENT_INTERPOLATION_OFFSET_NV.  Additionally, the granularity of
5bd8deadSopenharmony_ci    offsets may be limited.  The (x,y) value may be snapped to a fixed
5bd8deadSopenharmony_ci    sub-pixel grid with the number of sub-pixel bits given by
5bd8deadSopenharmony_ci    FRAGMENT_PROGRAM_INTERPOLATION_OFFSET_BITS_NV.
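    To make the range and granularity rules concrete, the C sketch below
    checks one offset component against a supported range and snaps it to a
    grid of 1/2^<bits> pixel. The three limits are hypothetical values chosen
    for illustration (real values must be queried), and snapping toward
    negative infinity is an assumption; the text above says only that
    offsets "may be snapped".

```c
#include <stdbool.h>

/* Hypothetical limits standing in for MIN/MAX_FRAGMENT_INTERPOLATION_OFFSET_NV
 * and FRAGMENT_PROGRAM_INTERPOLATION_OFFSET_BITS_NV. */
#define MIN_OFFSET   (-0.5f)
#define MAX_OFFSET   (0.4375f)
#define OFFSET_BITS  4

/* Returns false when the offset is outside the supported range (where the
 * interpolation position is undefined); otherwise stores the offset snapped
 * down to a multiple of 1/2^OFFSET_BITS pixel. */
bool snap_offset(float offset, float *snapped)
{
  if (offset < MIN_OFFSET || offset > MAX_OFFSET)
    return false;
  float scale = (float)(1 << OFFSET_BITS);
  float scaled = offset * scale;
  int cell = (int)scaled;                 /* truncates toward zero */
  if ((float)cell > scaled)
    cell -= 1;                            /* turn truncation into floor */
  *snapped = (float)cell / scale;
  return true;
}
```

    With four sub-pixel bits, an offset of 0.3 would resolve to 0.25 and an
    offset of -0.3 to -0.3125 under this assumed snapping rule.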
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    When interpolating an attribute variable with this instruction, the
5bd8deadSopenharmony_ci    CENTROID and SAMPLE attribute variable modifiers are ignored.  The FLAT
5bd8deadSopenharmony_ci    and NOPERSPECTIVE variable modifiers operate normally.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci     tmp1 = VectorLoad(op1);
5bd8deadSopenharmony_ci     tmp0 = Interpolate(op0, x_pixel + tmp1.x, y_pixel + tmp1.y);
5bd8deadSopenharmony_ci     result = tmp0;
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    IPAO supports only floating-point data type modifiers.  A program will
5bd8deadSopenharmony_ci    fail to load if it contains an IPAO instruction whose first operand is not
5bd8deadSopenharmony_ci    a fragment program attribute variable, or whose first operand matches the
5bd8deadSopenharmony_ci    "fragment.facing" or "primitive.id" binding.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Section 2.X.8.Z, IPAS:  Interpolate at Sample Location
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The IPAS instruction generates a result vector by evaluating the fragment
5bd8deadSopenharmony_ci    attribute named by the first vector operand at the location of the
5bd8deadSopenharmony_ci    pixel's sample whose sample number is given by the second integer scalar
5bd8deadSopenharmony_ci    operand.  If multisample buffers are not available (SAMPLE_BUFFERS is
5bd8deadSopenharmony_ci    zero), the attribute will be evaluated at the pixel center.  If the sample
5bd8deadSopenharmony_ci    number given by the second operand does not exist, the position used to
5bd8deadSopenharmony_ci    interpolate the attribute is undefined.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    When interpolating an attribute variable with this instruction, the
5bd8deadSopenharmony_ci    CENTROID and SAMPLE attribute variable modifiers are ignored.  The FLAT
5bd8deadSopenharmony_ci    and NOPERSPECTIVE variable modifiers operate normally.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci     sample = ScalarLoad(op1);
5bd8deadSopenharmony_ci     tmp1 = SampleOffset(sample);
5bd8deadSopenharmony_ci     tmp0 = Interpolate(op0, x_pixel + tmp1.x, y_pixel + tmp1.y);
5bd8deadSopenharmony_ci     result = tmp0;
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    IPAS supports only floating-point data type modifiers.  A program will
5bd8deadSopenharmony_ci    fail to load if it contains an IPAS instruction whose first operand is not
5bd8deadSopenharmony_ci    a fragment program attribute variable, or whose first operand matches the
5bd8deadSopenharmony_ci    "fragment.facing" or "primitive.id" binding.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Section 2.X.8.Z, LDC:  Load from Constant Buffer
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The LDC instruction loads a vector operand from a buffer object to yield a
5bd8deadSopenharmony_ci    result vector.  The operand used for the LDC instruction must correspond
5bd8deadSopenharmony_ci    to a parameter buffer variable declared using the "CBUFFER" statement; a
5bd8deadSopenharmony_ci    program will fail to load if any other type of operand is used in an LDC
5bd8deadSopenharmony_ci    instruction.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      result = BufferMemoryLoad(&op0, storageModifier);
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    A base operand vector is fetched from memory as described in Section
5bd8deadSopenharmony_ci    2.X.4.5, with the GPU address derived from the binding corresponding to
5bd8deadSopenharmony_ci    the operand.  A final operand vector is derived from the base operand
5bd8deadSopenharmony_ci    vector by applying swizzle, negation, and absolute value operand modifiers
5bd8deadSopenharmony_ci    as described in Section 2.X.4.2.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The amount of memory in any given buffer object binding accessible by the
5bd8deadSopenharmony_ci    LDC instruction may be limited.  If any component fetched by the LDC
5bd8deadSopenharmony_ci    instruction extends 4*<n> or more basic machine units from the beginning
5bd8deadSopenharmony_ci    of the buffer object binding, where <n> is the implementation-dependent
5bd8deadSopenharmony_ci    constant MAX_PROGRAM_PARAMETER_BUFFER_SIZE_NV, the value fetched for that
5bd8deadSopenharmony_ci    component will be undefined.
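
    As an illustration only, the C sketch below tests whether a fetched
    component falls inside the accessible range, reading "extends 4*<n> or
    more basic machine units" as "has any byte at offset 4*<n> or beyond".
    It assumes one basic machine unit is one byte; the function and parameter
    names are hypothetical, and <n> stands for the value reported for
    MAX_PROGRAM_PARAMETER_BUFFER_SIZE_NV.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a component of `size` bytes starting at
 * byte `offset` within the buffer binding lies entirely below 4 * n bytes,
 * where n is MAX_PROGRAM_PARAMETER_BUFFER_SIZE_NV.  Assumes one basic
 * machine unit is one byte.  When this returns false, the value LDC
 * fetches for that component is undefined. */
static bool ldc_component_defined(uint64_t offset, uint64_t size, uint64_t n)
{
    assert(size > 0);
    const uint64_t limit = 4 * n;   /* accessible bytes in the binding */
    /* Compare without computing offset + size, which could overflow. */
    return offset < limit && size <= limit - offset;
}
```

    With n = 16, for example, 64 bytes are accessible: a 4-byte component at
    offset 60 is defined, while an 8-byte component at the same offset is not.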

    LDC supports no base data type modifiers, but requires exactly one storage
    modifier.  The base data types of the operand and result vectors are
    derived from the storage modifier.


    Section 2.X.8.Z, LOAD:  Global Load

    The LOAD instruction generates a result vector by reading an address from
    the single unsigned integer scalar operand and fetching data from buffer
    object memory, as described in Section 2.X.4.5.

      address = ScalarLoad(op0);
      result = BufferMemoryLoad(address, storageModifier);

    LOAD supports no base data type modifiers, but requires exactly one
    storage modifier.  The base data type of the result vector is derived from
    the storage modifier.  The single scalar operand is always interpreted as
    an unsigned integer.


    Section 2.X.8.Z, MEMBAR:  Memory Barrier

    The MEMBAR instruction synchronizes memory transactions to ensure that
    memory transactions resulting from any instruction executed by the thread
    prior to the MEMBAR instruction complete prior to any memory transactions
    issued after the instruction.

    MEMBAR has no operands and generates no result.


    Section 2.X.8.Z, PK64:  Pack 64-Bit Component

    The PK64 instruction reads the four components of the single vector
    operand as 32-bit values, packs the bit representations of these into a
    pair of 64-bit values, and replicates those to produce a four-component
    result vector.  The "x" and "y" components of the operand are packed to
    produce the "x" and "z" components of the result vector; the "z" and "w"
    components of the operand are packed to produce the "y" and "w" components
    of the result vector.  The PK64 instruction can be reversed by the UP64
    instruction below.

    This instruction is intended to allow a program to reconstruct 64-bit
    integer or floating-point values generated by the application but passed
    to the GL as two 32-bit values taken from adjacent words in memory.  The
    ability to use this technique depends on how the 64-bit value is stored in
    memory.  For "little-endian" processors, the first 32-bit value holds the
    least significant 32 bits of the 64-bit value.  For "big-endian"
    processors, the first 32-bit value holds the most significant 32 bits of
    the 64-bit value.  This reconstruction assumes that the first 32-bit word
    comes from the "x" component of the operand and the second 32-bit word
    comes from the "y" component.  The method used to construct a 64-bit value
    from a pair of 32-bit values depends on the processor type.

      tmp = VectorLoad(op0);

      if (underlying system is little-endian) {
        result.x = RawBits(tmp.x) | (RawBits(tmp.y) << 32);
        result.y = RawBits(tmp.z) | (RawBits(tmp.w) << 32);
        result.z = RawBits(tmp.x) | (RawBits(tmp.y) << 32);
        result.w = RawBits(tmp.z) | (RawBits(tmp.w) << 32);
      } else {
        result.x = RawBits(tmp.y) | (RawBits(tmp.x) << 32);
        result.y = RawBits(tmp.w) | (RawBits(tmp.z) << 32);
        result.z = RawBits(tmp.y) | (RawBits(tmp.x) << 32);
        result.w = RawBits(tmp.w) | (RawBits(tmp.z) << 32);
      }

    PK64 supports integer and floating-point data type modifiers, which
    specify the base data type of the operand and result.  The single vector
    operand is always treated as having 32-bit components, and the result is
    treated as a vector with 64-bit components.  The encoding performed by
    PK64 can be reversed using the UP64 instruction.

    A program will fail to load if it contains a PK64 instruction that writes
    its results to a variable not declared as "LONG".
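
    The following C sketch models the PK64 pseudocode on a host CPU,
    treating each operand component as a raw 32-bit word.  The function
    names are hypothetical.  It can be used to check the property described
    above: copying a 64-bit value into two adjacent 32-bit words and packing
    them with the host's byte order reproduces the original value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative host-side model of PK64; all names are hypothetical.
 * Combine two 32-bit words, in the order they appear in memory, into one
 * 64-bit value as in the PK64 pseudocode. */
static uint64_t pk64_pair(uint32_t first, uint32_t second, bool little_endian)
{
    if (little_endian) {
        return (uint64_t)first | ((uint64_t)second << 32);
    }
    return (uint64_t)second | ((uint64_t)first << 32);
}

/* PK64 on a four-component operand: (x, y) feeds result.x and result.z;
 * (z, w) feeds result.y and result.w. */
static void pk64(const uint32_t op[4], uint64_t result[4], bool little_endian)
{
    result[0] = pk64_pair(op[0], op[1], little_endian);
    result[1] = pk64_pair(op[2], op[3], little_endian);
    result[2] = result[0];
    result[3] = result[1];
}

/* Report the byte order of the machine running this sketch. */
static bool host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}
```

    For example, splitting 0x0123456789ABCDEF into two words with memcpy and
    passing them as the "x" and "y" components to pk64() with the host's byte
    order yields 0x0123456789ABCDEF in result[0] and result[2].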


    Section 2.X.8.Z, STORE:  Global Store

    The STORE instruction reads an address from the second unsigned integer
    scalar operand and writes the contents of the first vector operand to
    buffer object memory at that address, as described in Section 2.X.4.5.
    This instruction generates no result.

      tmp0 = VectorLoad(op0);
      address = ScalarLoad(op1);
      BufferMemoryStore(address, tmp0, storageModifier);

    STORE supports no base data type modifiers, but requires exactly one
    storage modifier.  The base data type of the vector components of the
    first operand is derived from the storage modifier.  The second operand is
    always interpreted as an unsigned integer scalar.


    Section 2.X.8.Z, TEX:  Texture Sample

    (Modify the instruction pseudo-code to account for texel offsets that no
     longer need to be immediate arguments.)

      tmp = VectorLoad(op0);
      if (instruction has variable texel offset) {
        itmp = VectorLoad(op1);
      } else {
        itmp = instruction.texelOffset;
      }
      ddx = ComputePartialsX(tmp);
      ddy = ComputePartialsY(tmp);
      lambda = ComputeLOD(ddx, ddy);
      result = TextureSample(tmp, lambda, ddx, ddy, itmp);


    Section 2.X.8.Z, TGALL:  Test for All Non-Zero in a Thread Group

    The TGALL instruction produces a result vector by reading a vector operand
    for each active thread in the current thread group and comparing each
    component to zero.  A result vector component contains a TRUE value
    (described below) if the value of the corresponding component in the
    operand vector is non-zero for all active threads, and a FALSE value
    otherwise.

    An implementation may choose to arrange program threads into thread
    groups, and execute an instruction simultaneously for each thread in the
    group.  If the TGALL instruction is contained inside conditional flow
    control blocks and not all threads in the group execute the instruction,
    the operand values for threads not executing the instruction have no
    bearing on the value returned.  The method used to arrange threads into
    groups is undefined.

      tmp = VectorLoad(op0);
      result = { TRUE, TRUE, TRUE, TRUE };
      for (all active threads) {
        if ([thread]tmp.x == 0) result.x = FALSE;
        if ([thread]tmp.y == 0) result.y = FALSE;
        if ([thread]tmp.z == 0) result.z = FALSE;
        if ([thread]tmp.w == 0) result.w = FALSE;
      }

    TGALL supports all data type modifiers.  For floating-point data types,
    the TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
    integer data types, the TRUE value is the maximum integer value (all bits
    are ones) and the FALSE value is zero.
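
    A minimal C model of one TGALL result component is sketched below.  It
    assumes, purely for illustration, a thread group of at most 32 threads
    whose participation is described by a hypothetical bit mask (bit t set
    when thread t executes the instruction), since the grouping method itself
    is undefined.  The result uses the signed integer encoding described
    above.

```c
#include <assert.h>
#include <stdint.h>

/* One component of TGALL over a hypothetical group of up to 32 threads.
 * Bit t of active_mask is set if thread t executes the instruction; values
 * held by other threads are ignored.  Returns -1 (TRUE) or 0 (FALSE). */
static int32_t tgall_component(const int32_t *values, int nthreads,
                               uint32_t active_mask)
{
    assert(nthreads >= 0 && nthreads <= 32);
    int32_t result = -1;                        /* start as TRUE */
    for (int t = 0; t < nthreads; ++t) {
        if (((active_mask >> t) & 1u) && values[t] == 0) {
            result = 0;                         /* an active zero -> FALSE */
        }
    }
    return result;
}
```

    With values {3, 0, 7, 1}, the result is FALSE when all four threads are
    active but TRUE when thread 1, the only zero, is inactive (mask 0xD).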


    Section 2.X.8.Z, TGANY:  Test for Any Non-Zero in a Thread Group

    The TGANY instruction produces a result vector by reading a vector operand
    for each active thread in the current thread group and comparing each
    component to zero.  A result vector component contains a TRUE value
    (described below) if the value of the corresponding component in the
    operand vector is non-zero for any active thread, and a FALSE value
    otherwise.

    An implementation may choose to arrange program threads into thread
    groups, and execute an instruction simultaneously for each thread in the
    group.  If the TGANY instruction is contained inside conditional flow
    control blocks and not all threads in the group execute the instruction,
    the operand values for threads not executing the instruction have no
    bearing on the value returned.  The method used to arrange threads into
    groups is undefined.

      tmp = VectorLoad(op0);
      result = { FALSE, FALSE, FALSE, FALSE };
      for (all active threads) {
        if ([thread]tmp.x != 0) result.x = TRUE;
        if ([thread]tmp.y != 0) result.y = TRUE;
        if ([thread]tmp.z != 0) result.z = TRUE;
        if ([thread]tmp.w != 0) result.w = TRUE;
      }

    TGANY supports all data type modifiers.  For floating-point data types,
    the TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
    integer data types, the TRUE value is the maximum integer value (all bits
    are ones) and the FALSE value is zero.
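
    The sketch below models one TGANY result component in C and then encodes
    the outcome for each of the three data type modifiers, using the TRUE and
    FALSE values listed above.  The 32-thread group and the active-thread bit
    mask (bit t set when thread t executes the instruction) are illustrative
    assumptions, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One component of TGANY over a hypothetical group of up to 32 threads:
 * true if any thread whose bit is set in active_mask holds a non-zero
 * value.  Inactive threads are ignored. */
static bool tgany_component(const float *values, int nthreads,
                            uint32_t active_mask)
{
    assert(nthreads >= 0 && nthreads <= 32);
    for (int t = 0; t < nthreads; ++t) {
        if (((active_mask >> t) & 1u) && values[t] != 0.0f) {
            return true;
        }
    }
    return false;
}

/* TRUE/FALSE encodings for the floating-point, signed integer, and
 * unsigned integer data type modifiers. */
static float    encode_f32(bool b) { return b ? 1.0f : 0.0f; }
static int32_t  encode_s32(bool b) { return b ? -1 : 0; }
static uint32_t encode_u32(bool b) { return b ? UINT32_MAX : 0u; }
```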


    Section 2.X.8.Z, TGEQ:  Test for All Equal Values in a Thread Group

    The TGEQ instruction produces a result vector by reading a vector operand
    for each active thread in the current thread group and comparing each
    component to zero.  A result vector component contains a TRUE value
    (described below) if the corresponding component in the operand vector is
    either zero for all active threads or non-zero for all active threads,
    and a FALSE value otherwise.

    An implementation may choose to arrange program threads into thread
    groups, and execute an instruction simultaneously for each thread in the
    group.  If the TGEQ instruction is contained inside conditional flow
    control blocks and not all threads in the group execute the instruction,
    the operand values for threads not executing the instruction have no
    bearing on the value returned.  The method used to arrange threads into
    groups is undefined.

      tmp = VectorLoad(op0);
      tgall = { TRUE, TRUE, TRUE, TRUE };
      tgany = { FALSE, FALSE, FALSE, FALSE };
      for (all active threads) {
        if ([thread]tmp.x == 0) tgall.x = FALSE; else tgany.x = TRUE;
        if ([thread]tmp.y == 0) tgall.y = FALSE; else tgany.y = TRUE;
        if ([thread]tmp.z == 0) tgall.z = FALSE; else tgany.z = TRUE;
        if ([thread]tmp.w == 0) tgall.w = FALSE; else tgany.w = TRUE;
      }
      result.x = (tgall.x == tgany.x) ? TRUE : FALSE;
      result.y = (tgall.y == tgany.y) ? TRUE : FALSE;
      result.z = (tgall.z == tgany.z) ? TRUE : FALSE;
      result.w = (tgall.w == tgany.w) ? TRUE : FALSE;

    TGEQ supports all data type modifiers.  For floating-point data types, the
    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
    integer data types, the TRUE value is the maximum integer value (all bits
    are ones) and the FALSE value is zero.
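
    Because the pseudocode above only records whether each active value is
    zero or non-zero, TGEQ reports TRUE for two different non-zero values.
    The C sketch below follows that pseudocode for one component.  The
    32-thread group, the active-thread bit mask (bit t set when thread t
    executes the instruction), and the function name are illustrative
    assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one TGEQ component, following the tgall/tgany
 * pseudocode: TRUE when the active threads agree on whether the value is
 * zero. */
static bool tgeq_component(const int32_t *values, int nthreads,
                           uint32_t active_mask)
{
    assert(nthreads >= 0 && nthreads <= 32);
    bool tgall = true;    /* every active value seen so far is non-zero */
    bool tgany = false;   /* some active value seen so far is non-zero */
    for (int t = 0; t < nthreads; ++t) {
        if (!((active_mask >> t) & 1u)) {
            continue;     /* inactive threads do not participate */
        }
        if (values[t] == 0) {
            tgall = false;
        } else {
            tgany = true;
        }
    }
    return tgall == tgany;
}
```

    For example, values {1, 2} give TRUE, {0, 3} gives FALSE, and {0, 3}
    with only thread 0 active gives TRUE.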


    Section 2.X.8.Z, TXB:  Texture Sample with Bias

    (Modify the instruction pseudo-code to account for texel offsets that no
     longer need to be immediate arguments.)

      tmp = VectorLoad(op0);
      if (instruction has variable texel offset) {
        itmp = VectorLoad(op1);
      } else {
        itmp = instruction.texelOffset;
      }
      ddx = ComputePartialsX(tmp);
      ddy = ComputePartialsY(tmp);
      lambda = ComputeLOD(ddx, ddy);
      result = TextureSample(tmp, lambda + tmp.w, ddx, ddy, itmp);


    Section 2.X.8.Z, TXG:  Texture Gather

    (Update the TXG opcode description from the NV_gpu_program4_1
     specification.  This version adds two capabilities:  any component of a
     multi-component texture can be selected by appending a component name to
     the texture variable that identifies the texture unit, and depth
     comparisons are supported if a SHADOW target is specified.)

    The TXG instruction takes the four components of a single floating-point
    vector operand as a texture coordinate, determines a set of four texels to
    sample from the base level of detail of the specified texture image, and
    returns one component from each texel in a four-component result vector.
    To determine the four texels to sample, the minification and magnification
    filters are ignored and the rules for LINEAR filtering are applied to the
    base level of the texture image to determine the texels T_i0_j1, T_i1_j1,
    T_i1_j0, and T_i0_j0, as defined in equations 3.23 through 3.25.  The
    texels are then converted to texture source colors (Rs,Gs,Bs,As) according
    to table 3.21, followed by application of the texture swizzle as described
    in section 3.8.13.  A four-component vector is returned by taking one of
    the four components of the swizzled texture source colors from each of the
    four selected texels.  The component is selected using the
    <texImageUnitComp> grammar rule, by adding a scalar suffix
    (".x", ".y", ".z", ".w") to the identified texture; if no scalar suffix
    is provided, the first component is selected.

    TXG only operates on 2D, SHADOW2D, CUBE, SHADOWCUBE, ARRAY2D,
    SHADOWARRAY2D, ARRAYCUBE, SHADOWARRAYCUBE, RECT, and SHADOWRECT texture
    targets; a program will fail to compile if any other texture target is
    used.

    When using a "SHADOW" texture target, component selection is ignored.
    Instead, depth comparisons are performed on the depth values for each of
    the four selected texels, and 0/1 values are returned based on the results
    of the comparison.

    As with other texture accesses, the results of a texture gather operation
    are undefined if the texture target in the instruction is incompatible
    with the selected texture's base internal format and depth compare mode.

      tmp = VectorLoad(op0);
      ddx = (0,0,0);
      ddy = (0,0,0);
      lambda = 0;
      if (instruction has variable texel offset) {
        itmp = VectorLoad(op1);
      } else {
        itmp = instruction.texelOffset;
      }
      result.x = TextureSample_i0j1(tmp, lambda, ddx, ddy, itmp).<comp>;
      result.y = TextureSample_i1j1(tmp, lambda, ddx, ddy, itmp).<comp>;
      result.z = TextureSample_i1j0(tmp, lambda, ddx, ddy, itmp).<comp>;
      result.w = TextureSample_i0j0(tmp, lambda, ddx, ddy, itmp).<comp>;

    In this pseudocode, "<comp>" refers to the texel component selected by the
    <texImageUnitComp> grammar rule, as described above.

    TXG supports all three data type modifiers.  The single operand is always
    treated as a floating-point vector; the results are interpreted according
    to the data type modifier.
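
    The C sketch below shows which texels a TXG instruction gathers for a
    2D texture and in which order they land in the result.  It assumes
    normalized coordinates, no texel offset, and REPEAT wrapping in both
    directions, and uses the LINEAR filtering rules i0 = wrap(floor(u - 1/2))
    and i1 = wrap(floor(u - 1/2) + 1) with u = s * width (and likewise j0 and
    j1 from t and height).  The type and function names are hypothetical.

```c
#include <assert.h>

/* Integer texel coordinates (hypothetical helper type). */
typedef struct { int i, j; } texel;

/* floor() for values within int range, avoiding a libm dependency. */
static int floor_to_int(float x)
{
    int i = (int)x;                 /* truncates toward zero */
    return (x < (float)i) ? i - 1 : i;
}

/* REPEAT wrap mode (assumed): map any integer coordinate into [0, size). */
static int wrap_repeat(int c, int size)
{
    int m = c % size;
    return (m < 0) ? m + size : m;
}

/* Texels gathered by TXG for coordinates (s, t) on a width x height
 * texture, with no texel offset; out[0..3] map to result.x, .y, .z, .w. */
static void txg_texels(float s, float t, int width, int height, texel out[4])
{
    assert(width > 0 && height > 0);
    int iu = floor_to_int(s * (float)width - 0.5f);
    int jv = floor_to_int(t * (float)height - 0.5f);
    int i0 = wrap_repeat(iu, width),  i1 = wrap_repeat(iu + 1, width);
    int j0 = wrap_repeat(jv, height), j1 = wrap_repeat(jv + 1, height);
    out[0] = (texel){ i0, j1 };     /* result.x <- T_i0_j1 */
    out[1] = (texel){ i1, j1 };     /* result.y <- T_i1_j1 */
    out[2] = (texel){ i1, j0 };     /* result.z <- T_i1_j0 */
    out[3] = (texel){ i0, j0 };     /* result.w <- T_i0_j0 */
}
```

    At the center of a 4x4 texture (s = t = 0.5), u - 1/2 = 1.5, so the
    gathered texels are (1,2), (2,2), (2,1), and (1,1); at s = t = 0, REPEAT
    wrapping makes i0 and j0 equal to 3.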


    Section 2.X.8.Z, TXGO:  Texture Gather with Per-Texel Offsets

    Like the TXG instruction, the TXGO instruction takes the four components
    of its first floating-point vector operand as a texture coordinate,
    determines a set of four texels to sample from the base level of detail of
    the specified texture image, and returns one component from each texel in
    a four-component result vector.  The second and third vector operands are
    taken as signed four-component integer vectors providing the x and y
    components of the offsets, respectively, used to determine the location of
    each of the four texels.  To determine the four texels to sample, each of
    the four independent offsets is used in conjunction with the specified
    texture coordinate to select a texel.  The minification and magnification
    filters are ignored and the rules for LINEAR filtering are used to select
    the texel T_i0_j0, as defined in equations 3.23 through 3.25, from the
    base level of the texture image.  The texels are then converted to texture
    source colors (Rs,Gs,Bs,As) according to table 3.21, followed by
    application of the texture swizzle as described in section 3.8.13.  A
    four-component vector is returned by taking one of the four components
    of the swizzled texture source colors from each of the four selected
    texels.  The component is selected using the <texImageUnitComp> grammar
    rule, by adding a scalar suffix (".x", ".y", ".z", ".w") to the identified
    texture; if no scalar suffix is provided, the first component is selected.

    TXGO only operates on 2D, SHADOW2D, ARRAY2D, SHADOWARRAY2D, RECT, and
    SHADOWRECT texture targets; a program will fail to compile if any other
    texture target is used.

    When using a "SHADOW" texture target, component selection is ignored.
    Instead, depth comparisons are performed on the depth values for each of
    the four selected texels, and 0/1 values are returned based on the results
    of the comparison.

    As with other texture accesses, the results of a texture gather operation
    are undefined if the texture target in the instruction is incompatible
    with the selected texture's base internal format and depth compare mode.

      tmp = VectorLoad(op0);
      itmp1 = VectorLoad(op1);
      itmp2 = VectorLoad(op2);
      ddx = (0,0,0);
      ddy = (0,0,0);
      lambda = 0;
      itmp = (itmp1.x, itmp2.x);
      result.x = TextureSample_i0j0(tmp, lambda, ddx, ddy, itmp).<comp>;
      itmp = (itmp1.y, itmp2.y);
      result.y = TextureSample_i0j0(tmp, lambda, ddx, ddy, itmp).<comp>;
      itmp = (itmp1.z, itmp2.z);
      result.z = TextureSample_i0j0(tmp, lambda, ddx, ddy, itmp).<comp>;
      itmp = (itmp1.w, itmp2.w);
      result.w = TextureSample_i0j0(tmp, lambda, ddx, ddy, itmp).<comp>;

    In this pseudocode, "<comp>" refers to the texel component selected by the
    <texImageUnitComp> grammar rule, as described above.

    If TEXTURE_WRAP_S or TEXTURE_WRAP_T is either CLAMP or MIRROR_CLAMP_EXT,
    the results of the TXGO instruction are undefined.

    Note:  The TXG instruction is equivalent to the TXGO instruction with X
    and Y offset vectors of (0,1,1,0) and (0,0,-1,-1), respectively.

    TXGO supports all three data type modifiers.  The first operand is always
    treated as a floating-point vector and the second and third operands are
    always treated as signed integer vectors; the results are interpreted
    according to the data type modifier.
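
    The C sketch below illustrates how TXGO applies a separate integer offset
    to each of the four texel fetches for a 2D texture.  It assumes, as for
    other texel offsets, that each offset is added to the unnormalized
    coordinate u before the LINEAR rules are applied; because the offsets are
    integers, floor(u + o - 1/2) equals floor(u - 1/2) + o, so each offset
    can be added to the unwrapped T_i0_j0 coordinate before wrapping.  It
    also assumes normalized coordinates and REPEAT wrapping (CLAMP and
    MIRROR_CLAMP_EXT give undefined results); the type and function names
    are hypothetical.

```c
#include <assert.h>

/* Integer texel coordinates (hypothetical helper type). */
typedef struct { int i, j; } texel;

/* floor() for values within int range, avoiding a libm dependency. */
static int floor_to_int(float x)
{
    int i = (int)x;                 /* truncates toward zero */
    return (x < (float)i) ? i - 1 : i;
}

/* REPEAT wrap mode (assumed): map any integer coordinate into [0, size). */
static int wrap_repeat(int c, int size)
{
    int m = c % size;
    return (m < 0) ? m + size : m;
}

/* Texels fetched by TXGO: result component k uses the T_i0_j0 texel
 * selected after offsetting the coordinate by (off_x[k], off_y[k]). */
static void txgo_texels(float s, float t, int width, int height,
                        const int off_x[4], const int off_y[4], texel out[4])
{
    assert(width > 0 && height > 0);
    int iu = floor_to_int(s * (float)width - 0.5f);
    int jv = floor_to_int(t * (float)height - 0.5f);
    for (int k = 0; k < 4; ++k) {
        out[k].i = wrap_repeat(iu + off_x[k], width);
        out[k].j = wrap_repeat(jv + off_y[k], height);
    }
}
```

    At the center of a 4x4 texture the unoffset T_i0_j0 texel is (1,1), so
    X offsets (-1,2,0,3) and Y offsets (0,0,-2,1) select (0,1), (3,1),
    (1,3), and (0,2) after wrapping.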


    Section 2.X.8.Z, TXL:  Texture Sample with LOD

    (Modify the instruction pseudo-code to account for texel offsets that no
     longer need to be immediate arguments.)

      tmp = VectorLoad(op0);
      if (instruction has variable texel offset) {
        itmp = VectorLoad(op1);
      } else {
        itmp = instruction.texelOffset;
      }
      ddx = (0,0,0);
      ddy = (0,0,0);
      result = TextureSample(tmp, tmp.w, ddx, ddy, itmp);


    Section 2.X.8.Z, TXP:  Texture Sample with Projection

    (Modify the instruction pseudo-code to account for texel offsets that no
     longer need to be immediate arguments.)

      tmp0 = VectorLoad(op0);
      tmp0.x = tmp0.x / tmp0.w;
      tmp0.y = tmp0.y / tmp0.w;
      tmp0.z = tmp0.z / tmp0.w;
      if (instruction has variable texel offset) {
        itmp = VectorLoad(op1);
      } else {
        itmp = instruction.texelOffset;
      }
      ddx = ComputePartialsX(tmp0);
      ddy = ComputePartialsY(tmp0);
      lambda = ComputeLOD(ddx, ddy);
      result = TextureSample(tmp0, lambda, ddx, ddy, itmp);


    Section 2.X.8.Z, UP64:  Unpack 64-Bit Component

    The UP64 instruction produces a vector result with 32-bit components by
    unpacking the bits of the "x" and "y" components of a 64-bit vector
    operand.  The "x" component of the operand is unpacked to produce the "x"
    and "y" components of the result vector; the "y" component is unpacked to
    produce the "z" and "w" components of the result vector.

    This instruction is intended to allow a program to pass 64-bit integer or
    floating-point values to an application using two 32-bit values stored in
    adjacent words in memory, which will be read by the application as single
    64-bit values.  The ability to use this technique depends on how the
    64-bit value is stored in memory.  For "little-endian" processors, the
    first 32-bit value holds the least significant 32 bits of the 64-bit
    value.  For "big-endian" processors, the first 32-bit value holds the most
    significant 32 bits of the 64-bit value.  This unpacking assumes that the
    first 32-bit word of each pair is written to the "x" (or "z") component of
    the result and the second 32-bit word to the "y" (or "w") component.  The
    method used to unpack a 64-bit value into a pair of 32-bit values depends
    on the processor type.

      tmp = VectorLoad(op0);
      if (underlying system is little-endian) {
        result.x = (RawBits(tmp.x) >>  0) & 0xFFFFFFFF;
        result.y = (RawBits(tmp.x) >> 32) & 0xFFFFFFFF;
        result.z = (RawBits(tmp.y) >>  0) & 0xFFFFFFFF;
        result.w = (RawBits(tmp.y) >> 32) & 0xFFFFFFFF;
      } else {
        result.x = (RawBits(tmp.x) >> 32) & 0xFFFFFFFF;
        result.y = (RawBits(tmp.x) >>  0) & 0xFFFFFFFF;
        result.z = (RawBits(tmp.y) >> 32) & 0xFFFFFFFF;
        result.w = (RawBits(tmp.y) >>  0) & 0xFFFFFFFF;
      }

    UP64 supports integer and floating-point data type modifiers, which
    specify the base data type of the operand and result.  The single operand
    vector always has 64-bit components.  The result is treated as a vector
    with 32-bit components.  The encoding performed by UP64 can be reversed
    using the PK64 instruction.

    A program will fail to load if it contains a UP64 instruction whose
    operand is a variable not declared as "LONG".
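
    As a host-side illustration, the C sketch below follows the UP64
    pseudocode and checks the intended use: writing the unpacked words to
    adjacent memory locations lets the application read them back as the
    original 64-bit values.  The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative host-side model of UP64; all names are hypothetical.
 * Split one 64-bit value into the two 32-bit words UP64 produces, in the
 * order they are meant to be stored in memory. */
static void up64_pair(uint64_t v, bool little_endian,
                      uint32_t *first, uint32_t *second)
{
    uint32_t lo = (uint32_t)(v & 0xFFFFFFFFu);
    uint32_t hi = (uint32_t)(v >> 32);
    *first  = little_endian ? lo : hi;
    *second = little_endian ? hi : lo;
}

/* UP64 on a 64-bit operand: op[0] ("x") fills result.x and result.y,
 * op[1] ("y") fills result.z and result.w. */
static void up64(const uint64_t op[2], uint32_t result[4], bool little_endian)
{
    up64_pair(op[0], little_endian, &result[0], &result[1]);
    up64_pair(op[1], little_endian, &result[2], &result[3]);
}

/* Report the byte order of the machine running this sketch. */
static bool host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}
```

    Copying the four resulting words into two uint64_t values with memcpy,
    as an application reading the buffer would, recovers both operand
    components when the host's byte order is used.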
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Modify Section 2.14.6.1 of the NV_geometry_program4 specification,
5bd8deadSopenharmony_ci    Geometry Program Input Primitives
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    (add patches to the list of supported input primitive types)
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    The supported input primitive types are: ...
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Patches (PATCHES)
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci    Geometry programs that operate on patches are valid only for the
5bd8deadSopenharmony_ci    PATCHES_NV primitive type.  There are a variable number of vertices
5bd8deadSopenharmony_ci    available for each program invocation, depending on the number of input
5bd8deadSopenharmony_ci    vertices in the primitive itself.  For a patch with <n> vertices,
5bd8deadSopenharmony_ci    "vertex[0]" refers to the first vertex of the patch, and "vertex[<n>-1]"
5bd8deadSopenharmony_ci    refers to the last vertex.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci
    Modify Section 2.14.6.2 of the NV_geometry_program4 specification,
    Geometry Program Output Primitives

    (Add a new paragraph at the end of the section limiting the use of the
     EMITS opcode to geometry programs with a POINTS output primitive type.
     This limitation may be removed in future specifications.)

    Geometry programs may write to multiple vertex streams only if the
    specified output primitive type is POINTS.  A program will fail to load if
    it contains an EMITS instruction and the output primitive type specified
    by the PRIMITIVE_OUT declaration is not POINTS.

    Modify Section 2.14.6.4 of the NV_geometry_program4 specification,
    Geometry Program Output Limits

    (Modify the limitation on the total number of components emitted by a
     geometry program from NV_gpu_program4 to be per-invocation.  For
     example, if that limit is 4096 and a program has 16 invocations, each
     of the 16 program invocations can emit up to 4096 total components.)

    There are two implementation-dependent limits on the total number of
    vertices that each invocation of a program can emit.  First, the vertex
    limit may not exceed the value of MAX_PROGRAM_OUTPUT_VERTICES_NV.  Second,
    the product of the vertex limit and the number of result variable
    components written by the program (PROGRAM_RESULT_COMPONENTS_NV, as
    described in section 2.X.3.5 of NV_gpu_program4) may not exceed the value
    of MAX_PROGRAM_TOTAL_OUTPUT_COMPONENTS_NV.  A geometry program will fail
    to load if its maximum vertex count or maximum total component count
    exceeds the corresponding implementation-dependent limit.  The limits may
    be queried by calling GetProgramiv with a <target> of
    GEOMETRY_PROGRAM_NV.  Note that the maximum number of vertices that a
    geometry program can emit may be much lower than
    MAX_PROGRAM_OUTPUT_VERTICES_NV if the program writes a large number of
    result variable components.  If a geometry program has multiple
    invocations (via the "INVOCATIONS" declaration), the program will load
    successfully as long as no single invocation exceeds the total component
    count limit, even if the total output of all invocations combined exceeds
    the limit.
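
    The per-invocation check described above can be sketched in C.  This is
    a minimal illustration under stated assumptions, not driver code: the
    two limits are passed in as plain integers standing in for
    MAX_PROGRAM_OUTPUT_VERTICES_NV and MAX_PROGRAM_TOTAL_OUTPUT_COMPONENTS_NV
    (an application would query them with GetProgramiv), and the function
    names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Largest number of vertices one invocation may emit when it writes
   'components' result components (PROGRAM_RESULT_COMPONENTS_NV).
   Both limits are caller-supplied stand-ins for the queried values. */
static int max_emittable_vertices(int max_output_vertices,
                                  int max_total_components,
                                  int components)
{
    assert(components > 0);
    int by_components = max_total_components / components;
    return by_components < max_output_vertices ? by_components
                                               : max_output_vertices;
}

/* Load-time rule: both limits apply to a single invocation, so the
   INVOCATIONS count does not appear in the check. */
static bool geometry_program_loads(int vertex_limit, int components,
                                   int max_output_vertices,
                                   int max_total_components)
{
    return vertex_limit <= max_output_vertices &&
           vertex_limit * components <= max_total_components;
}
```

    With hypothetical limits of 1024 vertices and 4096 components, a program
    writing 32 result components can emit at most 128 vertices per
    invocation, far fewer than the vertex limit alone would allow.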


Additions to Chapter 3 of the OpenGL 3.0 Specification (Rasterization)

    Modify Section 3.X, Early Per-Fragment Tests, as documented in the
    EXT_shader_image_load_store specification

    (add a new paragraph at the end of the section, describing how early
     fragment tests work when assembly fragment programs are active)

    If an assembly fragment program is active, early depth tests are
    considered enabled if and only if the fragment program source included the
    NV_early_fragment_tests option.


    Add to Section 3.11.4.5 of ARB_fragment_program (Fragment Program):

    Section 3.11.4.5.3, ARB_blend_func_extended Option

    If a fragment program specifies the "ARB_blend_func_extended" option, dual
    source color outputs as described in ARB_blend_func_extended are made
    available through the use of the "result.color[n].primary" and
    "result.color[n].secondary" result bindings, corresponding to SRC_COLOR
    and SRC1_COLOR, respectively, for the fragment color output numbered <n>.


Additions to Chapter 4 of the OpenGL 3.0 Specification (Per-Fragment
Operations and the Frame Buffer)

    Modify Section 4.4.3, Rendering When an Image of a Bound Texture Object
    is Also Attached to the Framebuffer, p. 288

    (Replace the complicated set of conditions with the following)

    Specifically, the values of rendered fragments are undefined if any
    shader stage fetches texels from a given mipmap level, cubemap face, and
    array layer of a texture if that same mipmap level, cubemap face, and
    array layer of the texture can be written to via fragment shader outputs,
    even if the reads and writes are not in the same Draw call. However, an
    application can insert MemoryBarrier(TEXTURE_FETCH_BARRIER_BIT_NV) between
    Draw calls that have such read/write hazards in order to guarantee that
    writes have completed and caches have been invalidated, as described in
    section 2.20.X.


Additions to Chapter 5 of the OpenGL 3.0 Specification (Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 3.0 Specification (State and
State Requests)

    None.

Additions to Appendix A of the OpenGL 3.0 Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

GLX Protocol

    None.

Errors

    None, other than new conditions by which a program string would fail to
    load.

New State

    None.


New Implementation Dependent State

                                                             Minimum
    Get Value                         Type  Get Command       Value   Description           Sec.    Attrib
    --------------------------------  ----  ---------------  -------  --------------------- ------- ------
    MAX_GEOMETRY_PROGRAM_              Z+   GetIntegerv        32     Maximum number of GP  2.X.6.Y  -
      INVOCATIONS_NV                                                  invocations per prim.
    MIN_FRAGMENT_INTERPOLATION_        R    GetFloatv        -0.5     Max. negative offset  2.X.8.Z  -
      OFFSET_NV                                                       for IPAO instruction.
    MAX_FRAGMENT_INTERPOLATION_        R    GetFloatv        +0.5     Max. positive offset  2.X.8.Z  -
      OFFSET_NV                                                       for IPAO instruction.
    FRAGMENT_PROGRAM_INTERPOLATION_    Z+   GetIntegerv         4     Subpixel bit count    2.X.8.Z  -
      OFFSET_BITS_NV                                                  for IPAO instruction.


Dependencies on NV_gpu_program4, NV_vertex_program4, NV_geometry_program4, and
NV_fragment_program4

    This extension is written against the NV_gpu_program4 family of
    extensions, and introduces new instruction set features and inputs/outputs
    described here.  These features are available only if the extension is
    supported and the appropriate program header string is used ("!!NVvp5.0"
    for vertex programs, "!!NVgp5.0" for geometry programs, and "!!NVfp5.0"
    for fragment programs).  When loading a program with an older header
    (e.g., "!!NVvp4.0"), the instruction set features described in this
    extension are not available.  The features in this extension build upon
    those documented in full in NV_gpu_program4.

Dependencies on NV_tessellation_program5

    This extension provides the basic assembly instruction set constructs for
    tessellation programs.  If this extension is supported, tessellation
    control and evaluation programs are supported, as described in the
    NV_tessellation_program5 specification.  There is no separate extension
    string for tessellation programs; such support is implied by this
    extension.

Dependencies on ARB_transform_feedback3

    The concept of multiple vertex streams emitted by a geometry shader is
    introduced by ARB_transform_feedback3, as is the description of how they
    operate and implementation-dependent limits on the number of streams.
    This extension simply provides a mechanism to emit a vertex to more than
    one stream.  If ARB_transform_feedback3 is not supported, language
    describing the EMITS opcode and the restriction on PRIMITIVE_OUT when
    EMITS is used should be removed.

Dependencies on NV_shader_buffer_load

    The programmability functionality provided by NV_shader_buffer_load is
    also incorporated by this extension.  Any assembly program using a program
    header corresponding to this or any subsequent extension (e.g.,
    "!!NVfp5.0") may use the LOAD opcode without needing to declare "OPTION
    NV_shader_buffer_load".

    NV_shader_buffer_load is required by this extension, which means that the
    API mechanisms documented there allowing applications to make a buffer
    resident and query its GPU address are available to any applications using
    this extension.

    In addition to the basic functionality in NV_shader_buffer_load, this
    extension provides the ability to load 64-bit integers and floating-point
    values using the "S64", "S64X2", "S64X4", "U64", "U64X2", "U64X4", "F64",
    "F64X2", and "F64X4" opcode modifiers.

Dependencies on NV_shader_buffer_store

    This extension provides assembly programmability support for
    NV_shader_buffer_store, which provides the API mechanisms allowing buffer
    objects to be stored to.  NV_shader_buffer_store does not have a separate
    extension string entry, and will always be supported if this extension is
    present.

Dependencies on NV_parameter_buffer_object2

    The programmability functionality provided by NV_parameter_buffer_object2
    is also incorporated by this extension.  Any assembly program using a
    program header corresponding to this or any subsequent extension (e.g.,
    "!!NVfp5.0") may use the LDC opcode without needing to declare "OPTION
    NV_parameter_buffer_object2".

    In addition to the basic functionality in NV_parameter_buffer_object2,
    this extension provides the ability to load 64-bit integers and
    floating-point values using the "S64", "S64X2", "S64X4", "U64", "U64X2",
    "U64X4", "F64", "F64X2", and "F64X4" opcode modifiers.

Dependencies on OpenGL 3.3, ARB_texture_swizzle, and EXT_texture_swizzle

    If none of OpenGL 3.3, ARB_texture_swizzle, or EXT_texture_swizzle is
    supported, remove the swizzling step from the definition of TXG and TXGO.

Dependencies on ARB_blend_func_extended

    If ARB_blend_func_extended is not supported, references to the dual source
    color output bindings (result.color.primary and result.color.secondary)
    should be removed.

Dependencies on EXT_shader_image_load_store

    EXT_shader_image_load_store provides OpenGL Shading Language mechanisms to
    load from and store to buffer and texture image memory, including spec
    language describing memory access ordering and synchronization, an API
    command (MemoryBarrierEXT) controlling synchronization of memory
    operations, and spec language describing early fragment tests that can be
    enabled via GLSL fragment shader source.  These sections of the
    EXT_shader_image_load_store specification apply equally to the assembly
    program memory accesses provided by this extension.  If
    EXT_shader_image_load_store is not supported, the sections of that
    specification describing these features should be considered to be added
    to this extension.

    EXT_shader_image_load_store additionally provides and documents assembly
    language support for image loads, stores, and atomics as described in the
    "Dependencies on NV_gpu_program5" section of EXT_shader_image_load_store.
    The features described there are automatically supported for all
    NV_gpu_program5 assembly programs without requiring any additional
    "OPTION" line.

Dependencies on ARB_shader_subroutine

    ARB_shader_subroutine provides and documents assembly language support for
    subroutines as described in the "Dependencies on NV_gpu_program5" section
    of ARB_shader_subroutine.  The features described there are automatically
    supported for all NV_gpu_program5 assembly programs without requiring any
    additional "OPTION" line.


Issues

    (1) Are there any restrictions or performance concerns involving the
        support for indexing textures or parameter buffers?

      RESOLVED:  There are no significant functional limitations.  Textures
      and parameter buffers accessed with an index must be declared as arrays,
      so the assembler knows which textures might be accessed this way.
      Additionally, accessing an array of textures or parameter buffers with
      an out-of-bounds index will yield undefined results.

      In particular, there is no limitation on the values used for indexing;
      they are not required to be true constants and are not required to have
      the same value for all vertices/fragments in a primitive.  However,
      divergent texture or parameter buffer indices may reduce performance.
      We expect that GPU implementations of this extension will run multiple
      program threads in parallel (SIMD).  If different threads in a thread
      group have different indices, it will be necessary to do lookups in more
      than one texture at once.  This is likely to result in some thread
      serialization.  We expect that indexed texture or parameter buffer
      access where all indices in a thread group match will perform
      identically to non-indexed accesses.
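
      The serialization argument can be made concrete with a toy cost model.
      This is an assumption for illustration only, not a description of any
      real GPU: it supposes that a thread group performs one lookup pass per
      distinct index among its threads, so uniform indices cost one pass and
      fully divergent indices cost one pass per thread.  The function name is
      hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical): the number of serialized lookup passes a
   thread group needs, taken as the number of distinct texture or
   parameter buffer indices used by its threads. */
static size_t lookup_passes(const int *indices, size_t thread_count)
{
    assert(indices != NULL || thread_count == 0);
    size_t distinct = 0;
    for (size_t i = 0; i < thread_count; ++i) {
        int seen = 0;
        /* Count indices[i] only if no earlier thread used it. */
        for (size_t j = 0; j < i; ++j) {
            if (indices[j] == indices[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            ++distinct;
        }
    }
    return distinct;
}
```

      Under this model, a four-thread group using indices {2, 2, 2, 2} needs
      one pass, matching the expectation that uniform indices behave like
      non-indexed access, while {0, 1, 2, 3} needs four.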

    (2) Which texture instructions support programmable texel offsets, and
        what offset limits apply?

      RESOLVED:  Most texture instructions (TEX, TXB, TXF, TXG, TXL, TXP)
      support both constant texel offsets as provided by NV_gpu_program4 and
      programmable texel offsets.  TXD supports only constant offsets.  TXGO
      does not support non-zero or programmable offsets in the texture portion
      of the instruction, but provides full support for programmable offsets
      via two of the three vector arguments in the regular instruction.

      For example,

        TEX result, coord, texture[0], 2D, (-1,-1);

      uses the NV_gpu_program4 mechanism to apply a constant texel offset of
      (-1,-1) to the texture coordinates.  With programmable offsets, the
      following code applies the same offset; the offset vector holds signed
      integers, so it is declared and written as integer data:

        INT TEMP offxy;
        MOV.S offxy, {-1, -1};
        TEX result, coord, texture[0], 2D, offset(offxy);

      Of course, the programmable form allows the offsets to be computed in
      the program and does not require constant values.

      For most texture instructions, the range of allowable offsets is
      [MIN_PROGRAM_TEXEL_OFFSET_EXT, MAX_PROGRAM_TEXEL_OFFSET_EXT] for both
      constant and programmable texel offsets.  Constant offsets can be
      checked when the program is loaded, and out-of-bounds offsets cause the
      program to fail to load.  Programmable offsets cannot be range-checked
      at load time; out-of-bounds programmable offsets produce undefined
      results.

      Additionally, the new TXGO instruction has a separate (likely larger)
      allowable offset range, [MIN_PROGRAM_TEXTURE_GATHER_OFFSET_NV,
      MAX_PROGRAM_TEXTURE_GATHER_OFFSET_NV], that applies to the offset
      vectors passed in its second and third operands.

      In the initial implementation of this extension, the range limits are
      [-8,+7] for most instructions and [-32,+31] for TXGO.
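
      The range rule can be sketched as a small C check.  The limits are
      hard-coded to the initial implementation's values quoted above; a real
      application would query MIN/MAX_PROGRAM_TEXEL_OFFSET_EXT and
      MIN/MAX_PROGRAM_TEXTURE_GATHER_OFFSET_NV instead.  The enum and function
      names are hypothetical.  For constant offsets this is the check that
      makes a program fail to load; programmable offsets get no such check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the instruction using the offset. */
enum offset_kind { OFFSET_TEXEL, OFFSET_GATHER_TXGO };

/* Initial-implementation limits: [-8,+7] in general, [-32,+31] for TXGO. */
static void offset_range(enum offset_kind kind, int *lo, int *hi)
{
    if (kind == OFFSET_GATHER_TXGO) {
        *lo = -32;
        *hi = 31;
    } else {
        *lo = -8;
        *hi = 7;
    }
}

/* True if every component of a 1- to 3-component offset lies within
   the allowed range for the given kind of instruction. */
static bool offset_in_range(enum offset_kind kind, const int *offset,
                            int dims)
{
    int lo, hi;
    assert(offset != 0 && dims >= 1 && dims <= 3);
    offset_range(kind, &lo, &hi);
    for (int i = 0; i < dims; ++i) {
        if (offset[i] < lo || offset[i] > hi) {
            return false;
        }
    }
    return true;
}
```

      For example, the offset (-1,-1) used in the TEX example is within
      [-8,+7], while (-32,+31) is acceptable only for TXGO.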

    (3) What is TXGO (texture gather with separate offsets) good for?

      RESOLVED:  TXGO allows for efficiently sampling a single-component
      texture with a variety of offsets that need not be contiguous.

      For example, a shadow mapping algorithm using a high-resolution shadow
      map may have pixels whose footprint covers a large number of texels in
      the shadow map.  Such pixels could do a single lookup into a
      lower-resolution texture (using mipmapping), but quality problems will
      arise.  Alternatively, a shader could perform a large number of texture
      lookups using either NEAREST or LINEAR filtering from the
      high-resolution texture.  NEAREST filtering will require a separate
      lookup for each texel accessed; LINEAR filtering may require somewhat
      fewer lookups, but all accesses cover a 2x2 portion of the texture.  The
      TXG instruction added to NV_gpu_program4_1 allows a 2x2 block of texels
      to be returned in a single instruction in case the program wants to do
      something other than linear filtering with the samples.  TXGO allows a
      program to do semi-random sampling of the texture without requiring
      that each sample cover a 2x2 block of texels.  For example, the TXGO
      instruction would allow a program to sample the four texels A, H, J,
      and O from the 4x4 block depicted below:

        TXGO result, coord, {-1,+2,0,+1}, {-1,0,+1,+2}, texture[0], 2D;

      The "equivalent" TXG instruction would only sample the four center
      texels F, G, J, and K:

        TXG result, coord, texture[0], 2D;

      All sixteen texels of the footprint could be sampled with four TXG
      instructions,

        TXG result0, coord, texture[0], 2D, (-1,-1);
        TXG result1, coord, texture[0], 2D, (-1,+1);
        TXG result2, coord, texture[0], 2D, (+1,-1);
        TXG result3, coord, texture[0], 2D, (+1,+1);

      but accessing a smaller number of samples spread across the footprint
      with fewer instructions may produce results that are good enough.

      The figure here depicts a texture with texel (0,0) shown in the
      upper-left corner.  If you insist on a lower-left origin, please look at
      this figure while standing on your head.

       (0,0) +-+-+-+-+
             |A|B|C|D|
             +-+-+-+-+
             |E|F|G|H|
             +-+-+-+-+
             |I|J|K|L|
             +-+-+-+-+
             |M|N|O|P|
             +-+-+-+-+ (4,4)
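
      The texel selection in this example can be checked with a small CPU
      model of the 4x4 block.  It assumes, consistently with the example
      above, that the sample coordinate sits at the shared corner of F, G, J,
      and K; that TXG returns the 2x2 block whose upper-left texel is (1,1),
      shifted by any constant offset; and that TXGO returns, for each of its
      four (x,y) offset pairs, the upper-left texel of the correspondingly
      shifted block.  The component order is simplified and does not follow
      the spec's gather order, and the function names are hypothetical.

```c
#include <assert.h>

/* The 4x4 block from the figure, indexed [y][x], (0,0) at upper left. */
static const char texels[4][4] = {
    {'A', 'B', 'C', 'D'},
    {'E', 'F', 'G', 'H'},
    {'I', 'J', 'K', 'L'},
    {'M', 'N', 'O', 'P'},
};

/* Upper-left texel (F) of the unshifted 2x2 footprint. */
enum { BASE_X = 1, BASE_Y = 1 };

/* TXG model: the 2x2 block shifted by one constant offset, written in
   row-major order (a simplification of the real component order). */
static void txg_model(int dx, int dy, char out[4])
{
    int x = BASE_X + dx, y = BASE_Y + dy;
    assert(x >= 0 && x + 1 < 4 && y >= 0 && y + 1 < 4);
    out[0] = texels[y][x];
    out[1] = texels[y][x + 1];
    out[2] = texels[y + 1][x];
    out[3] = texels[y + 1][x + 1];
}

/* TXGO model: one texel per (dx[i], dy[i]) pair, each the upper-left
   texel of the footprint shifted by that pair. */
static void txgo_model(const int dx[4], const int dy[4], char out[4])
{
    for (int i = 0; i < 4; ++i) {
        int x = BASE_X + dx[i], y = BASE_Y + dy[i];
        assert(x >= 0 && x < 4 && y >= 0 && y < 4);
        out[i] = texels[y][x];
    }
}
```

      Under these assumptions the TXGO offsets above yield A, H, J, and O,
      the unshifted TXG yields F, G, J, and K, and the four offset TXG calls
      together cover all sixteen texels.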

    (4) Why are the results of TXGO (texture gather with separate offsets)
        undefined if the wrap mode is CLAMP or MIRROR_CLAMP_EXT?

      RESOLVED:  The CLAMP and MIRROR_CLAMP_EXT wrap modes are fairly
      different from other wrap modes.  After adding any instruction offsets,
      the spec says to pre-clamp the (u,v) coordinates to [0,texture_size]
      before generating the footprint.  If such clamping occurs on one edge
      for a normal texture filtering operation, the footprint ends up being
      half border texels, half edge texels, and the clamping effectively
      forces the interpolation weights used for texture filtering to 50/50.
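
      The half-border, half-edge footprint can be seen in one dimension with
      the usual LINEAR filtering arithmetic of the core GL specification:
      u = s * width, i0 = floor(u - 0.5), i1 = i0 + 1, and a weight equal to
      the fractional part of (u - 0.5).  The sketch below applies the CLAMP
      pre-clamp of u to [0, width] first; the struct and function names are
      hypothetical, and an index of -1 or width denotes a border texel.

```c
#include <assert.h>

/* One-dimensional LINEAR footprint: two texel indices and the weight
   applied to i1.  Index -1 or 'width' denotes a border texel. */
struct footprint {
    int i0;
    int i1;
    double alpha;
};

/* floor() without linking libm; adequate for the small values here. */
static double floor_small(double x)
{
    double t = (double)(long)x;   /* truncates toward zero */
    return t > x ? t - 1.0 : t;
}

/* CLAMP wrap mode: pre-clamp u = s * width to [0, width], then form
   i0 = floor(u - 0.5), i1 = i0 + 1, alpha = frac(u - 0.5). */
static struct footprint clamp_linear_footprint(double s, int width)
{
    struct footprint fp;
    double u, base;

    assert(width > 0);
    u = s * width;
    if (u < 0.0) {
        u = 0.0;
    } else if (u > (double)width) {
        u = (double)width;
    }
    base = floor_small(u - 0.5);
    fp.i0 = (int)base;
    fp.i1 = fp.i0 + 1;
    fp.alpha = (u - 0.5) - base;
    return fp;
}
```

      For a hypothetical 8-texel-wide texture, any s at or below 0 gives the
      footprint (-1, 0) and any s at or above 1 gives (7, 8), each with a
      weight of 0.5: one border texel, one edge texel, equal weights.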

      We expect the TXG instruction to be used in cases where an application
      may want to do custom filtering, and is in control of its own filtering
      weights.  Coordinate clamping as above will affect the footprint used
      for filtering, but not the weights.  In the NV_gpu_program4_1 spec, we
      defined the TXG/CLAMP combination to simply return the "normal"
      footprint produced after the pre-clamp operation above.  Any adjustment
      of weights due to clamping is the responsibility of the application.  We
      don't expect this to be a common operation, because CLAMP_TO_EDGE or
      CLAMP_TO_BORDER are much more sensible wrap modes.

      The hardware implementing TXGO is anticipated to extract all four
      samples in a single pass.  However, the spec language is defined for
      simplicity to perform four separate "gather" operations with the four
      provided offsets, extract a single sample from each, and combine the
      four samples into a vector.  This would require four separate pre-clamp
      operations, which was deemed too costly to implement in hardware for a
      wrap mode that doesn't work well with texture gather operations.  Even
      if such hardware were built, it still wouldn't obtain a footprint
      resembling the half-border, half-edge footprint for simple TXGO offsets
      -- that would require different per-texel clamping rules for the four
      samples.  We chose to leave the results of this operation undefined.

    (5) Should double-precision floating-point support be required or
        optional?  If optional, how?

      RESOLVED:  Double-precision floating-point support will be optional in
      case low-end GPUs supporting the remainder of these instruction features
      choose to cut costs by removing the silicon necessary to implement
      64-bit floating-point arithmetic.

5bd8deadSopenharmony_ci    (6) While this extension supports double-precision computation, how can
5bd8deadSopenharmony_ci        you provide high-precision inputs and outputs to the GPU programs?
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      RESOLVED:  The underlying hardware implementing this extension does not
5bd8deadSopenharmony_ci      provide full support for 64-bit floats, even though DOUBLE is a standard
5bd8deadSopenharmony_ci      data type provided by the GL.  For example, when specifying a vertex
5bd8deadSopenharmony_ci      array with a data type of DOUBLE, the vertex attribute components will
5bd8deadSopenharmony_ci      end up being converted to 32-bit floats (FLOAT) by the driver before
5bd8deadSopenharmony_ci      being passed to the hardware, and the extra precision in the original
5bd8deadSopenharmony_ci      64-bit float values will be lost.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      For vertex attributes, the EXT_vertex_attrib_64bit and
5bd8deadSopenharmony_ci      NV_vertex_attrib_integer_64bit extensions provide the ability to specify
5bd8deadSopenharmony_ci      64-bit vertex attribute components using the VertexAttribL* and
5bd8deadSopenharmony_ci      VertexAttribLPointer APIs.  Such attributes can be read in a vertex
5bd8deadSopenharmony_ci      program using a "LONG ATTRIB" declaration:
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci        LONG ATTRIB vector64;
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      The LONG modifier can only be used vertex program inputs, and can not be
5bd8deadSopenharmony_ci      used for inputs of any program type or outputs of any program type.
5bd8deadSopenharmony_ci
5bd8deadSopenharmony_ci      For other cases, this extension provides the PK64 and UP64 instructions
5bd8deadSopenharmony_ci      that provide a mechanism to pass 64-bit components using consecutive
5bd8deadSopenharmony_ci      32-bit components.  For example, a 3-component vector with 64-bit
5bd8deadSopenharmony_ci      components can be passed to a vertex shader using multiple vertex
5bd8deadSopenharmony_ci      attributes without using the VertexAttribL APIs with the following code:

        /* Pass the X/Y components in vertex attribute 0 (X/Y/Z/W).  Use
           stride to skip over Z. */
        glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 3*sizeof(GLdouble),
                              (GLdouble *) buffer);

        /* Pass the Z components in vertex attribute 1 (X/Y).  Use stride to
           skip over original X/Y components. */
        glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 3*sizeof(GLdouble),
                              (GLdouble *) buffer + 2);

      In this example, the vertex program would use the PK64 instruction to
      reconstruct the 64-bit value for each component as follows:

        LONG TEMP reconstructed;
        PK64 reconstructed.xy, vertex.attrib[0];
        PK64 reconstructed.z,  vertex.attrib[1];

      A similar technique can be used to pass 64-bit values computed by a GPU
      program, using transform feedback or writes to a color buffer.  The UP64
      instruction would be used to convert the 64-bit computed value into two
      32-bit values, which would be written to adjacent components.
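
      As a CPU-side illustration only, the C sketch below shows the
      bit-level split this technique relies on: the 8 bytes of a double are
      viewed as two consecutive 32-bit words, which are the same words that
      glVertexAttribPointer(..., GL_FLOAT, GL_FALSE, ...) reads from the
      GLdouble array in the example above.  The names split_double() and
      join_words() are hypothetical, and the sketch is not a description of
      how PK64 and UP64 are implemented in hardware.

```c
        #include <stdint.h>
        #include <string.h>

        /* The technique assumes a double is exactly two 32-bit words. */
        _Static_assert(sizeof(double) == 2 * sizeof(uint32_t),
                       "double is expected to occupy 64 bits");

        /* UP64-like split: copy the raw bits of one double into two 32-bit
           words in memory order.  No numeric conversion takes place. */
        static void split_double(double value, uint32_t words[2])
        {
            memcpy(words, &value, sizeof value);
        }

        /* PK64-like join: rebuild the double from the same two words. */
        static double join_words(const uint32_t words[2])
        {
            double value;
            memcpy(&value, words, sizeof value);
            return value;
        }
```

      For example, split_double(1.0, w) yields one word equal to 0x3FF00000
      and one equal to 0; which of w[0] and w[1] holds the high word depends
      on the host's byte order.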

      Note also that the original hardware implementation of this extension
      does not support interpolation of 64-bit floating-point values.  If an
      application desires to pass a 64-bit floating-point value from a vertex
      or geometry program to a fragment program, and doesn't require
      interpolation, the PK64/UP64 techniques can be combined.  For example,
      the vertex shader could unpack a 3-component vector with 64-bit
      components into a four-component and a two-component 32-bit vector:

        LONG TEMP result64;
        RESULT result32[2] = { result.attrib[0..1] };
        UP64 result32[0],    result64.xyxy;
        UP64 result32[1].xy, result64.z;

      The fragment program would read and reconstruct using PK64:

        LONG TEMP input64;
        FLAT ATTRIB input32[2] = { fragment.attrib[0..1] };
        PK64 input64.xy, input32[0];
        PK64 input64.z,  input32[1];

      Note that such inputs must be declared as "FLAT" in the fragment program
      to prevent the hardware from trying to do floating-point interpolation
      on the separate 32-bit halves of the value being passed.  Such
      interpolation would produce meaningless results.

    (7) What are instanced geometry programs useful for?

      RESOLVED:  Instanced geometry programs allow geometry programs that
      perform regular operations to run more efficiently.

      Consider a simple example of an algorithm that uses geometry programs to
      render primitives to a cube map in a single pass.  Without instanced
      geometry programs, the geometry program to render triangles to the cube
      map would do something like:

        for (face = 0; face < 6; face++) {
          for (vertex = 0; vertex < 3; vertex++) {
            project vertex <vertex> onto face <face>, output position
            compute/copy attributes of emitted <vertex> to outputs
            output <face> to result.layer
            emit the projected vertex
          }
          end the primitive (next triangle)
        }

      This algorithm would output 18 vertices per input triangle, three for
      each cube face.  The six triangles emitted would be rasterized, one per
      face.  Geometry programs that emit a large number of attributes have
      often posed performance challenges, since all the attributes must be
      stored somewhere until the emitted primitives are processed.  Large
      storage requirements may limit the number of threads that can be run in
      parallel and reduce overall performance.

      Instanced geometry programs allow this example to be restructured to run
      with six separate threads, one per face.  Each thread projects the
      triangle to only a single face (identified by the invocation number) and
      emits only 3 vertices.  The reduced storage requirements allow more
      geometry program threads to be run in parallel, with greater overall
      efficiency.

      Additionally, the total number of attributes that can be emitted by a
      single geometry program invocation is limited.  However, for instanced
      geometry programs, that limit applies to each of <N> program
      invocations, which allows for a larger total output.  For example, if
      the GL implementation supports only 1024 components of output per
      program invocation, the 18-vertex algorithm above could emit no more
      than 56 components per vertex.  The same algorithm implemented as a
      3-vertex 6-invocation geometry program could theoretically allow for
      341 components per vertex.
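
      The figures above come from integer division of the per-invocation
      output limit by the number of vertices one invocation emits.  A small
      C sketch of that arithmetic, using the hypothetical 1024-component
      limit from the example and assuming every emitted vertex carries the
      same number of components (the function names are illustrative only):

```c
        /* Components available per vertex when <limit> components of output
           are shared by <vertices> vertices emitted from one invocation. */
        static unsigned components_per_vertex(unsigned limit, unsigned vertices)
        {
            return limit / vertices;   /* integer division rounds down */
        }

        /* Non-instanced cube-map program: 6 faces x 3 vertices = 18 vertices
           from a single invocation. */
        static unsigned budget_single_invocation(unsigned limit)
        {
            return components_per_vertex(limit, 6u * 3u);
        }

        /* Instanced version: each of the 6 invocations emits 3 vertices. */
        static unsigned budget_per_instanced_invocation(unsigned limit)
        {
            return components_per_vertex(limit, 3u);
        }
```

      With a limit of 1024, budget_single_invocation() returns 56 and
      budget_per_instanced_invocation() returns 341.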

    (8) What are the special interpolation opcodes (IPAC, IPAO, IPAS) good
        for, and how do they work?

      RESOLVED:  The interpolation opcodes allow programs to control the
      frequency and location at which fragment inputs are sampled.  Previous
      extensions provided some control, but it was more limited.
      NV_gpu_program4 had an interpolation modifier (CENTROID) that allowed
      attributes to be sampled inside the primitive, but that was a
      per-attribute modifier -- you could only sample any given attribute at
      one location.  NV_gpu_program4_1 added a new interpolation modifier
      (SAMPLE) that directed that fragment programs be run once per sample,
      and that the specified attributes be interpolated at the sample
      location.  Per-sample interpolation can produce higher quality, but the
      performance cost is significant since more fragment program invocations
      are required.

      This extension provides additional control over interpolation, and
      allows programs to interpolate attributes at different locations without
      necessarily requiring the performance hit of per-sample invocation.

      The IPAC instruction allows an attribute to be sampled at the centroid
      location, while still allowing the same attribute to be sampled
      elsewhere.  The IPAS instruction allows the attribute to be sampled at a
      numbered sample location, as per-sample interpolation would do.
      Multiple IPAS instructions with different sample numbers allow a program
      to sample an attribute at multiple sample points in the pixel and then
      combine the samples in a programmable manner, which may allow for higher
      quality than simply interpolating at a single representative point in
      the pixel.  The IPAO instruction allows the attribute to be sampled at
      an arbitrary (x,y) offset relative to the pixel center.  The range of
      supported (x,y) values is limited, and the limits in the initial
      implementation are not large enough to permit sampling the attribute
      outside the pixel.

      Note that previous instruction sets allowed shaders to approximate
      IPAC, IPAS, and IPAO with a sequence such as:

        # <offset>.xy is assumed to hold the desired (x,y) sample offset.
        TEMP ddx, ddy, offset, interp;
        MOV interp, fragment.attrib[0];          # start with center
        DDX ddx, fragment.attrib[0];
        MAD interp, offset.x, ddx, interp;       # add offset.x * dA/dx
        DDY ddy, fragment.attrib[0];
        MAD interp, offset.y, ddy, interp;       # add offset.y * dA/dy

      However, this method does not apply perspective correction.  The quality
      of the results may be unacceptable, particularly for primitives that are
      nearly perpendicular to the screen.
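
      The effect of the missing perspective correction can be seen with a
      small C model of a single scanline.  It assumes, as perspective-correct
      interpolation does, that A/w and 1/w vary linearly in screen space; the
      Scanline structure and both functions are hypothetical.
      interp_perspective() evaluates the attribute exactly at any x, while
      interp_ddx_emulated() mimics the DDX/MAD sequence above by differencing
      two neighboring pixels and extrapolating linearly.

```c
        #include <assert.h>

        /* Hypothetical scanline: A/w and 1/w are linear in screen-space x. */
        typedef struct {
            double a_over_w0, a_over_w_dx;  /* A/w at x = 0, change per pixel */
            double inv_w0, inv_w_dx;        /* 1/w at x = 0, change per pixel */
        } Scanline;

        /* Perspective-correct attribute value at screen position x. */
        static double interp_perspective(const Scanline *s, double x)
        {
            double inv_w = s->inv_w0 + s->inv_w_dx * x;
            assert(inv_w > 0.0);            /* point must be in front of the eye */
            return (s->a_over_w0 + s->a_over_w_dx * x) / inv_w;
        }

        /* Emulation in the style of the DDX/MAD sequence: difference the value
           at pixel x with its neighbor at x + 1, then extrapolate linearly
           from the pixel center by <offset>. */
        static double interp_ddx_emulated(const Scanline *s, double x,
                                          double offset)
        {
            double center = interp_perspective(s, x);
            double ddx = interp_perspective(s, x + 1.0) - center;
            return center + offset * ddx;
        }
```

      When 1/w is constant along the scanline, the two functions agree.  With
      A/w = x and 1/w = 1 + 0.5x, sampling pixel 1 at offset 0.5 gives about
      0.857 when evaluated exactly but about 0.833 from the emulation.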

      The semantics of the first operand of these instructions differ from
      those of normal assembly instructions.  Operands are normally evaluated
      by loading the value of the corresponding variable and applying any
      swizzle/negation/absolute value modifier before the instruction is
      executed.  In the IPAC/IPAO/IPAS instructions, the value of the
      attribute is evaluated by the instruction itself.  Swizzles, negation,
      and absolute value modifiers are still allowed, and are applied after
      the attribute values are interpolated.

    (9) When using a program that issues global stores (via the STORE
        instruction), what amount of execution ordering is guaranteed?  How
        can an application ensure that writes executed in a shader have
        completed and will be visible to other operations using the buffer
        object in question?

      RESOLVED:  There are very few automatic guarantees for potential
      write/read or write/write conflicts.  Program invocations generally run
      in arbitrary order, and applications can't rely on read/write order to
      match primitive order.

      To get consistent results when buffers are read and written using
      multiple pipeline stages, manual synchronization using the
      MemoryBarrierEXT() API documented in EXT_shader_image_load_store or some
      other synchronization primitive is necessary.

    (10) Unlike most other shader features, the STORE opcode allows for
         externally-visible side effects from executing a program.  How does
         this capability interact with other features of the GL?

      RESOLVED:  First, some GL implementations support a variety of "early Z"
      optimizations designed to minimize unnecessary fragment processing work,
      such as executing an expensive fragment program on a fragment that will
      eventually fail the depth test.  Such optimizations have been valid
      because fragment programs had no side effects.  That is no longer the
      case, and such optimizations may not be employed if the fragment program
      performs a global store.  However, we provide a new "early depth and
      stencil test" enable that allows applications to deterministically
      control depth and stencil testing.  If enabled, depth and stencil
      testing are always performed prior to fragment program execution, and
      fragment programs will never be run on fragments that fail any of these
      tests.

      Second, we are permitting global stores in all program types; however,
      the number of program invocations is not well-defined for some program
      types.  For example, a GL implementation may choose to combine multiple
      instances of identical vertices (e.g., duplicate indices in
      DrawElements, immediate-mode vertices with identical data) into a
      single vertex program invocation, or it may run a vertex program on each
      separately.  Similarly, the tessellation primitive generator will
      generate independent primitives with duplicated vertices, which may or
      may not be combined for tessellation evaluation program execution.
      Fragment program execution also has several issues described in more
      detail below.

    (11) What issues arise when running fragment programs doing global stores?

      RESOLVED:  The order of per-fragment operations in the existing OpenGL
      3.0 specification can be fairly loose, because previously-defined
      fragment programs, shaders, and fixed-function fragment processing had
      no side effects.  With side effects, the order of operations must be
      defined more tightly.  In particular, the pixel ownership and scissor
      tests are specified to be performed prior to fragment program execution,
      and we provide an option to perform depth and stencil tests early as
      well.

      OpenGL implementations sometimes run fragment programs on "helper"
      pixels that have no coverage in order to be able to compute meaningful
      partial derivatives for fragment program instructions (DDX, DDY) or
      automatic level-of-detail calculation for texturing.  In this approach,
      derivatives are approximated by computing the difference in a quantity
      computed for a given fragment at (x,y) and a fragment at a neighboring
      pixel.  When a fragment program is executed on a "helper" pixel, global
      stores have no effect.  Helper pixels aren't explicitly mentioned in the
      spec body; instead, the spec simply assumes that partial derivatives can
      be obtained.

      If a fragment program contains a KIL instruction, compilers may not
      reorder code such that an ATOM or STORE instruction is executed before a
      KIL instruction that logically precedes it in flow control.  Once a
      fragment is killed, subsequent atomics or stores should never be
      executed.

      Multisample rasterization poses several issues for fragment programs
      with global stores.  The number of times a fragment program is executed
      for multisample rendering is not fully specified, which gives
      implementations a number of different choices -- pure multisample (only
      runs once), pure supersample (runs once per covered sample), or modes in
      between.  There are some ways for an application to indirectly control
      the behavior -- for example, fragment programs specifying per-sample
      attribute interpolation are guaranteed to run once per covered sample.

      Note that when rendering to a multisample buffer, a pair of adjacent
      triangles may cause a fragment program to be executed more than once at
      a given (x,y) with different sets of samples covered.  This can also
      occur in the interior of a quadrilateral or polygon primitive.
      Implementations are permitted to split quads and polygons with >3
      vertices into triangles, creating interior edges that split a pixel.

    (12) What happens if early fragment tests are enabled, the early depth
         test passes, and a fragment program that computes a new depth value
         is executed?

      RESOLVED:  The depth value produced by the fragment program has no
      effect if early fragment tests are enabled.  The depth value computed by
      a fragment program is used only by the post-fragment program stencil and
      depth tests, and those tests have no effect when early depth testing is
      enabled.

    (13) How do early fragment tests interact with occlusion queries?

      RESOLVED:  When early fragment tests are enabled, sample counting for
      occlusion queries also happens prior to fragment program execution.
      Enabling early fragment tests can change the overall sample count,
      because samples killed by alpha test and alpha to coverage will still be
      counted if early fragment tests are enabled.

    (14) What happens if a program performs a global store to a GPU address
         corresponding to a read-only buffer mapping?  What if it performs a
         global read to a write-only mapping?

      RESOLVED:  Implementations may choose to implement full memory
      protection, in which case accesses using the wrong type of memory
      mapping will fault and lead to termination of the application.

      However, full memory protection is not required in this extension --
      implementations may choose to substitute a read-write mapping in place
      of a read-only or write-only mapping.  As a result, we specify the
      result of such invalid loads and stores to be undefined.

      Note that if a program erroneously writes to nominally read-only
      mappings, the results may be unpredictable.  If the implementation
      substitutes a read-write mapping, such invalid writes are likely to
      proceed normally.  However, if the application later makes a buffer
      object non-resident and the memory manager of the GL implementation
      needs to move the buffer, the GL may assume that the contents of the
      buffer have not been modified and thus discard the new values written by
      the (invalid) global store instructions.

    (15) What performance considerations apply to atomics?

      RESOLVED:  Atomics can be useful for operations like locking, or for
      maintaining counters.  Note that high-performance GPUs may have hundreds
      of program threads in flight at once, and may also have some SIMD
      characteristics (where threads are grouped and run as a unit).  Using
      ATOM instructions with a single memory address to implement a critical
      section will result in serial execution -- only one of the hundreds of
      threads can execute code in the critical section at a time.

      When a global operation would be done under a lock, it may be possible
      to improve performance if the algorithm can be parallelized to have
      multiple critical sections.  For example, an application could allocate
      an array of shared resources, each protected by its own lock, and use
      the LSBs of the primitive ID or some function of the screen-space (x,y)
      to determine which resource in the array to use.
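
      A CPU analogue of this lock-striping idea, written with C11 atomics
      rather than ATOM instructions, is sketched below.  NUM_LOCKS, the locks
      and resources arrays, lock_index(), and add_to_resource() are all
      hypothetical.  The lock is selected from the low bits of a primitive ID,
      as the text suggests, so work on different primitives usually contends
      for different locks.  This illustrates the striping pattern only and
      is not GPU program code.

```c
        #include <stdatomic.h>
        #include <stdint.h>

        #define NUM_LOCKS 64u   /* hypothetical; a power of two */

        /* Static atomics are zero-initialized to a valid state: 0 = unlocked. */
        static atomic_uint locks[NUM_LOCKS];
        static unsigned long resources[NUM_LOCKS];  /* one resource per lock */

        /* Use the low bits of the primitive ID to pick a lock/resource pair. */
        static unsigned lock_index(uint32_t primitive_id)
        {
            return primitive_id & (NUM_LOCKS - 1u);
        }

        static void add_to_resource(uint32_t primitive_id, unsigned long amount)
        {
            unsigned i = lock_index(primitive_id);

            /* Acquire: swap in 1 until the previous value was 0 (unlocked). */
            while (atomic_exchange_explicit(&locks[i], 1u,
                                            memory_order_acquire) != 0u)
                ;   /* another thread holds this lock; spin */

            resources[i] += amount;   /* critical section for slot i only */

            /* Release: publish the update and unlock. */
            atomic_store_explicit(&locks[i], 0u, memory_order_release);
        }
```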

    (16) The atomic instruction ATOM returns the old contents of memory into
         the result register.  Should we provide a version of this opcode
         that doesn't return a value?

      RESOLVED:  No.  In theory, atomics that don't return any values can
      perform better (because the program may not need to allocate resources
      to hold a result or wait for the result).  However, a new opcode isn't
      required to obtain this behavior -- a compiler can recognize that the
      result of an ATOM instruction is written to a "dummy" temporary that
      isn't read by subsequent instructions:

        TEMP junk;
        ATOM.ADD.U32 junk, address, 1;

      The compiler can also recognize that the result will always be discarded
      if a conditional write mask of "(FL)" is used:

        ATOM.ADD.U32 not_junk (FL), address, 1;

    (17) How do we ensure that memory accesses made by multiple program
         invocations of possibly different types are coherent?

      RESOLVED:  Atomic instructions allow program invocations to coordinate
      using shared global memory addresses.  However, memory transactions,
      including atomics, are not guaranteed to land in the order specified in
      the program; they may be reordered by the compiler, cached in different
      memory hierarchies, and stored in a distributed memory system where
      later stores to one "partition" might be completed prior to earlier
      stores to another.  The MEMBAR instruction helps control memory
      transaction ordering by ensuring that all memory transactions prior to
      the barrier complete before any after the barrier.  Additionally, the
      ".COH" modifier ensures that memory transactions using the modifier are
      cached coherently and will be visible to other shader invocations.
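
      The ordering role of MEMBAR resembles that of a memory fence in C11.
      The sketch below is only a CPU analogue with hypothetical names: the
      producer writes ordinary data, issues a release fence, and then sets a
      flag; a consumer that observes the flag issues an acquire fence before
      reading the data, so it is guaranteed to see the value stored before
      the producer's fence.  Without the fences, the data store and the flag
      store could become visible to another thread in either order.

```c
        #include <stdatomic.h>
        #include <stdbool.h>

        static int payload;          /* ordinary, non-atomic shared data */
        static atomic_bool ready;    /* flag that publishes <payload> */

        static void producer(int value)
        {
            payload = value;
            /* Barrier: the payload store is ordered before the flag store. */
            atomic_thread_fence(memory_order_release);
            atomic_store_explicit(&ready, true, memory_order_relaxed);
        }

        static bool consumer(int *out)
        {
            if (!atomic_load_explicit(&ready, memory_order_relaxed))
                return false;        /* nothing published yet */
            /* Barrier: the flag load is ordered before the payload load. */
            atomic_thread_fence(memory_order_acquire);
            *out = payload;
            return true;
        }
```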

    (18) How do the TXG and TXGO opcodes work with sRGB textures?

      RESOLVED:  Gamma correction is applied to the texture source color
      before "gathering" and hence applies to all four components, unless
      the texture swizzle of the selected component is ALPHA, in which case
      no gamma correction is applied.

    (19) How can render-to-texture algorithms take advantage of
         MemoryBarrierEXT, nominally provided for global memory transactions?

      RESOLVED:  Many algorithms use render-to-texture (RTT) to ping-pong
      between two allocations, using the result of one rendering pass as the
      input to the next.  Existing mechanisms require expensive FBO binds,
      DrawBuffer changes, or FBO attachment changes to safely swap the render
      target and texture.  With memory barriers, layered geometry shader
      rendering, and texture arrays, an application can very cheaply
      ping-pong between two layers of a single texture, for example:

        X = 0;
        // Bind the array texture to a texture unit
        // Attach the array texture to an FBO using FramebufferTextureARB
        while (!done) {
          // Pass X in a constant, vertex attrib, etc.
          Draw -
            Texturing from layer X;
            Writing gl_Layer = 1 - X in the geometry shader;

          MemoryBarrierEXT(TEXTURE_FETCH_BARRIER_BIT_EXT);
          X = 1 - X;
        }

      However, be warned that this requires geometry shaders and hence adds
      the overhead that all geometry must pass through an additional program
      stage, so an application using large amounts of geometry could become
      geometry-limited or more shader-limited.

    (20) What is the ".PREC" instruction modifier good for?

      RESOLVED:  ".PREC" provides invariance guarantees that are useful for
      certain algorithms.  Using ".PREC", it is possible to ensure that an
      algorithm can be written to produce identical results on subtly
      different inputs.  For example, the order of vertices visible to a
      geometry or tessellation shader used to subdivide primitive edges might
      present an edge shared between two primitives in one direction for one
      primitive and the other direction for the adjacent primitive.  Even if
      the weights are identical in the two cases, there may be cracking if the
      computations are being done in an order-dependent manner.  If the
      position of a new vertex were evaluated with limited-precision
      floating-point math using the code below, it's not necessarily the case
      that we will get the same result for inputs (a,b,c) and (c,b,a):

          ADD result, a, b;
          ADD result, result, c;

      There are two problems with this code:  the rounding errors may be
      different and the implementation is free to rearrange the computation
      order.  The code can be rewritten as follows with ".PREC" and a
      symmetric evaluation order to ensure an identical result when the
      inputs are reversed:

          ADD result, a, c;
          ADD.PREC result, result, b;

      Note that in this example, the first instruction doesn't need the
      ".PREC" qualifier because the second instruction requires that the
      implementation compute <a>+<c>, which will be done reliably if <a> and
      <c> are inputs.  If <a> and <c> were results of other computations, the
      first add and possibly the dependent computations may also need to be
      tagged with ".PREC" to ensure reliable results.
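
      The same ordering problem can be reproduced in C, where IEEE-754
      double addition is commutative but not associative.  The two
      hypothetical functions below mirror the two ADD sequences: reversing
      the inputs of the first can change the result, while the second always
      adds <a> and <c> first, and a+c equals c+a exactly.  Like ".PREC", this
      relies on the compiler not reassociating the additions, which C
      compilers honor unless options such as GCC's -ffast-math are used; it
      also assumes FLT_EVAL_METHOD is 0, as on typical x86-64 and AArch64
      builds.

```c
        /* Mirrors "ADD result, a, b; ADD result, result, c;". */
        static double sum_in_order(double a, double b, double c)
        {
            return (a + b) + c;
        }

        /* Mirrors "ADD result, a, c; ADD.PREC result, result, b;".  Swapping
           a and c does not change a + c, so the final sum is identical. */
        static double sum_symmetric(double a, double b, double c)
        {
            return (a + c) + b;
        }
```

      With round-to-nearest doubles, sum_in_order(0.1, 0.2, 0.3) yields
      0.6000000000000001 while sum_in_order(0.3, 0.2, 0.1) yields 0.6; the
      symmetric version returns the same value for both orderings.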

      The ".PREC" modifier will disable certain optimizations and thus carries
      a performance cost.

    (21) What are the TGALL, TGANY, TGEQ instructions good for?

      RESOLVED:  If an implementation performs SIMD thread execution,
      divergent branching may result in reduced performance if the "if" and
      "else" blocks of an "if" statement are executed sequentially.  For
      example, an algorithm may have both a "fast path" that performs a
      computation quickly for a subset of all cases and a "slow path" that
      performs the computation more slowly but correctly for all cases.  When
      performing SIMD execution, code like the following:

        SNE.S.CC cc.x, condition.x, 0;
        IF NE.x;
          # do fast path
        ELSE;
          # do slow path
        ENDIF;

      may end up executing *both* the fast and slow paths for a SIMD thread
      group if <condition> diverges, and may execute more slowly than simply
      executing the slow path unconditionally.  These instructions allow code
      like:

        # Condition code matches NE if and only if condition.x is non-zero
        # for all threads.
        TGALL.S.CC cc.x, condition.x;
        IF NE.x;
          # do fast path
        ELSE;
          # do slow path
        ENDIF;

      that executes the fast path if and only if it can be used for *all*
      threads in the group.  For thread groups where <condition> diverges,
      this algorithm would unconditionally run the slow path, but would never
      run both in sequence.
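
      The decision that TGALL enables can be modeled on the CPU by treating a
      SIMD thread group as an array of per-thread condition values.  This C
      sketch is hypothetical: group_all() corresponds to the TGALL test
      described above, and group_any() is an assumed analogue of TGANY based
      on its name.  choose_path() makes one uniform decision for the whole
      group, so a group never executes both paths.

```c
        #include <stdbool.h>
        #include <stddef.h>

        /* cond[i] is the condition value of thread i in one SIMD group. */
        static bool group_all(const int *cond, size_t n)   /* TGALL-like */
        {
            for (size_t i = 0; i < n; i++)
                if (cond[i] == 0)
                    return false;
            return true;
        }

        static bool group_any(const int *cond, size_t n)   /* TGANY-like (assumed) */
        {
            for (size_t i = 0; i < n; i++)
                if (cond[i] != 0)
                    return true;
            return false;
        }

        typedef enum { FAST_PATH, SLOW_PATH } Path;

        /* One decision for the whole group: the fast path is taken only when
           it is valid for every thread, so a group never runs both paths. */
        static Path choose_path(const int *cond, size_t n)
        {
            return group_all(cond, n) ? FAST_PATH : SLOW_PATH;
        }
```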


Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
     8    05/25/22  shqxu     Fix use of a removed function
                              MemoryBarrierNV.

     7    09/11/14  pbrown    Minor typo fixes.

     6    07/04/13  pbrown    Add missing language describing the
                              <texImageUnitComp> grammar rule for component
                              selection in TXG and TXGO instructions.

     5    09/23/10  pbrown    Add missing constants for {MIN,MAX}_PROGRAM_
                              TEXTURE_GATHER_OFFSET_NV (same as ARB/core).
                              Add missing description for "su" in the opcode
                              table; fix a couple operand order bugs for
                              STORE.

     4    06/22/10  pbrown    Specify that the y/z/w components of the ATOM
                              results are undefined, as is the case with
                              ATOMIM from EXT_shader_image_load_store.

     3    04/13/10  pbrown    Remove F32 support from ATOM.ADD.

     2    03/22/10  pbrown    Various wording updates to the spec overview,
                              dependencies, issues, and body.  Remove various
                              spec language that has been refactored into the
                              EXT_shader_image_load_store specification.

     1              pbrown    Internal revisions.