Name

    NV_gpu_program5

Name Strings

    GL_NV_gpu_program5
    GL_NV_gpu_program_fp64

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified Date:         05/25/2022
    NVIDIA Revision:            8

Number

    388

Dependencies

    OpenGL 2.0 is required.

    This extension is written against the OpenGL 3.0 specification.

    NV_gpu_program4 and NV_gpu_program4_1 are required.

    NV_shader_buffer_load is required.

    NV_shader_buffer_store is required.

    This extension is written against and interacts with the NV_gpu_program4,
    NV_vertex_program4, NV_geometry_program4, and NV_fragment_program4
    specifications.

    This extension interacts with NV_tessellation_program5.

    This extension interacts with ARB_transform_feedback3.

    This extension interacts trivially with NV_shader_buffer_load.

    This extension interacts trivially with NV_shader_buffer_store.

    This extension interacts trivially with NV_parameter_buffer_object2.

    This extension interacts trivially with OpenGL 3.3, ARB_texture_swizzle,
    and EXT_texture_swizzle.

    This extension interacts trivially with ARB_blend_func_extended.

    This extension interacts trivially with EXT_shader_image_load_store.

    This extension interacts trivially with ARB_shader_subroutine.

    If the 64-bit floating-point portion of this extension is not supported,
    "GL_NV_gpu_program_fp64" will not be found in the extension string.
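
    As an illustration of the extension-string test described above, the
    following C sketch checks whether a name appears as a complete,
    space-delimited token.  The function names has_extension and
    has_gpu_program_fp64 are hypothetical, and the sketch assumes the
    application has already obtained a non-NULL, space-separated extension
    string.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a complete space-delimited token in
 * `ext_string`.  A bare strstr() is not sufficient, because one extension
 * name can be a prefix of another.  Both arguments are assumed non-NULL. */
bool has_extension(const char *ext_string, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = ext_string;

    if (name_len == 0)
        return false;

    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == ext_string) || (p[-1] == ' ');
        bool ends_token = (p[name_len] == ' ') || (p[name_len] == '\0');
        if (starts_token && ends_token)
            return true;
        p += name_len;  /* keep searching after this partial match */
    }
    return false;
}

/* Hypothetical helper: true when the optional 64-bit floating-point
 * portion of the extension is advertised. */
bool has_gpu_program_fp64(const char *ext_string)
{
    return has_extension(ext_string, "GL_NV_gpu_program_fp64");
}
```

    In a compatibility context the string would typically come from
    glGetString(GL_EXTENSIONS); how the string is obtained is outside the
    scope of this sketch.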

Overview

    This specification documents the common instruction set and basic
    functionality provided by NVIDIA's 5th generation of assembly instruction
    sets supporting programmable graphics pipeline stages.

    The instruction set builds upon the basic framework provided by the
    ARB_vertex_program and ARB_fragment_program extensions to expose
    considerably more capable hardware.  In addition to new capabilities for
    vertex and fragment programs, this extension provides new functionality
    for geometry programs as originally described in the NV_geometry_program4
    specification, and serves as the basis for the new tessellation control
    and evaluation programs described in the NV_tessellation_program5
    extension.

    Programs using the functionality provided by this extension should begin
    with the program headers "!!NVvp5.0" (vertex programs), "!!NVtcp5.0"
    (tessellation control programs), "!!NVtep5.0" (tessellation evaluation
    programs), "!!NVgp5.0" (geometry programs), and "!!NVfp5.0" (fragment
    programs).
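
    The mapping from header string to program type listed above can be
    sketched in C as follows.  The ProgramStage enumeration and the
    classify_program function are hypothetical names introduced only for
    this illustration; they are not part of the extension or of any GL API.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    STAGE_UNKNOWN = 0,
    STAGE_VERTEX,
    STAGE_TESS_CONTROL,
    STAGE_TESS_EVALUATION,
    STAGE_GEOMETRY,
    STAGE_FRAGMENT
} ProgramStage;

/* Classifies a program string by its leading header.  Returns
 * STAGE_UNKNOWN if the string does not begin with one of the five
 * headers named in the overview.  `program` is assumed non-NULL. */
ProgramStage classify_program(const char *program)
{
    static const struct {
        const char *header;
        ProgramStage stage;
    } headers[] = {
        { "!!NVvp5.0",  STAGE_VERTEX },
        { "!!NVtcp5.0", STAGE_TESS_CONTROL },
        { "!!NVtep5.0", STAGE_TESS_EVALUATION },
        { "!!NVgp5.0",  STAGE_GEOMETRY },
        { "!!NVfp5.0",  STAGE_FRAGMENT },
    };
    size_t i;

    for (i = 0; i < sizeof(headers) / sizeof(headers[0]); i++) {
        size_t len = strlen(headers[i].header);
        if (strncmp(program, headers[i].header, len) == 0)
            return headers[i].stage;
    }
    return STAGE_UNKNOWN;
}
```

    A tool that loads programs from files could use such a check to pick
    the program target before handing the string to the GL.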

    This extension provides a variety of new features, including:

      * support for 64-bit integer operations;

      * the ability to dynamically index into an array of texture units or
        program parameter buffers;

      * extending texel offset support to allow loading texel offsets from
        regular integer operands computed at run-time, instead of requiring
        that the offsets be constants encoded in texture instructions;

      * extending TXG (texture gather) support to return the 2x2 footprint
        from any component of the texture image instead of always returning
        the first (x) component;

      * extending TXG to support shadow comparisons in conjunction with a
        depth texture, via the SHADOW* targets;

      * further extending texture gather support to provide a new opcode
        (TXGO) that applies a separate texel offset vector to each of the four
        samples returned by the instruction;

      * bit manipulation instructions, including ones to find the position of
        the most or least significant set bit, bitfield insertion and
        extraction, and bit reversal;

      * a general data conversion instruction (CVT) supporting conversion
        between any two data types supported by this extension; and

      * new instructions to compute the composite of a set of boolean
        conditions across a group of shader threads.
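
    For readers unfamiliar with the bit manipulation operations named in the
    list above, the following C sketch shows one conventional definition of
    each on 32-bit values.  All function names are hypothetical.  The exact
    operand layout and edge-case behavior of the corresponding assembly
    instructions are defined by their instruction descriptions and are not
    reproduced here, so this should be read as a conceptual model only.

```c
#include <stdint.h>

/* Position of the most significant set bit, or -1 if no bit is set. */
int find_msb(uint32_t v)
{
    int pos = -1;
    while (v != 0) {
        v >>= 1;
        pos++;
    }
    return pos;
}

/* Position of the least significant set bit, or -1 if no bit is set. */
int find_lsb(uint32_t v)
{
    int pos = 0;
    if (v == 0)
        return -1;
    while ((v & 1u) == 0) {
        v >>= 1;
        pos++;
    }
    return pos;
}

/* Reverses the order of the 32 bits in v. */
uint32_t bit_reverse(uint32_t v)
{
    uint32_t r = 0;
    int i;
    for (i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* Mask with the low `width` bits set; widths of 32 or more give all ones. */
static uint32_t low_mask(unsigned width)
{
    return (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* Extracts `width` bits of v starting at bit `offset` (offset < 32 is
 * assumed); the result is zero-extended. */
uint32_t bitfield_extract(uint32_t v, unsigned offset, unsigned width)
{
    return (v >> offset) & low_mask(width);
}

/* Replaces `width` bits of `base`, starting at bit `offset` (offset < 32
 * is assumed), with the low bits of `insert`. */
uint32_t bitfield_insert(uint32_t base, uint32_t insert,
                         unsigned offset, unsigned width)
{
    uint32_t mask = low_mask(width) << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}
```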

    This extension also provides some new capabilities for individual program
    types, including:

      * support for instanced geometry programs, where a geometry program may
        be run multiple times for each primitive;

      * support for emitting vertices in a geometry program where each vertex
        emitted may be directed at a specified vertex stream and captured
        using the ARB_transform_feedback3 extension;

      * support for interpolating an attribute at a programmable offset
        relative to the pixel center (IPAO), at a programmable sample number
        (IPAS), or at the fragment's centroid location (IPAC) in a fragment
        program;

      * support for reading a mask of covered samples in a fragment program;

      * support for reading a point sprite coordinate directly in a fragment
        program, without overriding a texture coordinate;

      * support for reading patch primitives and per-patch attributes
        (introduced by ARB_tessellation_shader) in a geometry program; and

      * support for multiple output vectors for a single color output in a
        fragment program (as used by ARB_blend_func_extended).

    This extension also provides optional support for 64-bit-per-component
    variables and 64-bit floating-point arithmetic.  These features are
    supported if and only if "GL_NV_gpu_program_fp64" is found in the
    extension string.

    This extension incorporates the memory access operations from the
    NV_shader_buffer_load and NV_parameter_buffer_object2 extensions,
    originally built as add-ons to NV_gpu_program4.  It also provides the
    following new capabilities:

      * support for these features without requiring a separate OPTION
        keyword;

      * support for indexing into an array of constant buffers using the LDC
        opcode added by NV_parameter_buffer_object2;

      * support for storing into buffer objects at a specified GPU address
        using the STORE opcode, and allowing applications to create
        READ_WRITE and WRITE_ONLY mappings when making a buffer object
        resident using the API mechanisms in the NV_shader_buffer_store
        extension;

      * storage instruction modifiers to allow loading and storing 64-bit
        component values;

      * support for atomic memory transactions using the ATOM opcode, where
        the instruction atomically reads the memory pointed to by a pointer,
        performs a specified computation, stores the results of that
        computation, and returns the original value read;

      * support for memory barrier transactions using the MEMBAR opcode, which
        ensures that all memory stores issued prior to the opcode complete
        prior to any subsequent memory transactions; and

      * a fragment program option to specify that depth and stencil tests are
        performed prior to fragment program execution.
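
    The read-compute-store-return behavior described for ATOM above can be
    modeled on the CPU with C11 atomics, as in the sketch below.  This is an
    analogy only: it operates on host memory rather than GPU addresses, the
    function names are hypothetical, and pairing the operations with the
    ADD, EXCH, and CSWAP modifier names from the grammar is an assumption of
    this sketch rather than something stated in the overview.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomic add: adds `operand` to *addr and returns the value that was in
 * memory before the addition. */
uint32_t atom_add(_Atomic uint32_t *addr, uint32_t operand)
{
    return atomic_fetch_add(addr, operand);
}

/* Atomic exchange: stores `operand` and returns the previous value. */
uint32_t atom_exch(_Atomic uint32_t *addr, uint32_t operand)
{
    return atomic_exchange(addr, operand);
}

/* Compare-and-swap: stores `new_value` only if memory currently holds
 * `compare`.  In both cases the value originally read is returned. */
uint32_t atom_cswap(_Atomic uint32_t *addr, uint32_t compare,
                    uint32_t new_value)
{
    uint32_t expected = compare;
    /* On failure, `expected` is overwritten with the value found in
     * memory; on success it still equals the original value. */
    atomic_compare_exchange_strong(addr, &expected, new_value);
    return expected;
}
```

    In every case the caller receives the value that was in memory before
    the update, which matches the "returns the original value read" wording
    above.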

    Additionally, the assembly program languages supported by this extension
    include support for reading, writing, and performing atomic memory
    operations on texture image data using the opcodes and mechanisms
    documented in the "Dependencies on NV_gpu_program5" section of the
    EXT_shader_image_load_store extension.

New Procedures and Functions

    None.

New Tokens

    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
    GetFloatv, and GetDoublev:

        MAX_GEOMETRY_PROGRAM_INVOCATIONS_NV             0x8E5A
        MIN_FRAGMENT_INTERPOLATION_OFFSET_NV            0x8E5B
        MAX_FRAGMENT_INTERPOLATION_OFFSET_NV            0x8E5C
        FRAGMENT_PROGRAM_INTERPOLATION_OFFSET_BITS_NV   0x8E5D
        MIN_PROGRAM_TEXTURE_GATHER_OFFSET_NV            0x8E5E
        MAX_PROGRAM_TEXTURE_GATHER_OFFSET_NV            0x8E5F
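
    For applications built against headers that predate this extension, the
    token values above could be supplied as in the following C sketch.  The
    GL_-prefixed macro names follow the usual GL header convention; the
    gpu_program5_token_name function is a hypothetical debugging helper and
    not part of any API.

```c
#include <stddef.h>

#ifndef GL_MAX_GEOMETRY_PROGRAM_INVOCATIONS_NV
#define GL_MAX_GEOMETRY_PROGRAM_INVOCATIONS_NV           0x8E5A
#endif
#ifndef GL_MIN_FRAGMENT_INTERPOLATION_OFFSET_NV
#define GL_MIN_FRAGMENT_INTERPOLATION_OFFSET_NV          0x8E5B
#endif
#ifndef GL_MAX_FRAGMENT_INTERPOLATION_OFFSET_NV
#define GL_MAX_FRAGMENT_INTERPOLATION_OFFSET_NV          0x8E5C
#endif
#ifndef GL_FRAGMENT_PROGRAM_INTERPOLATION_OFFSET_BITS_NV
#define GL_FRAGMENT_PROGRAM_INTERPOLATION_OFFSET_BITS_NV 0x8E5D
#endif
#ifndef GL_MIN_PROGRAM_TEXTURE_GATHER_OFFSET_NV
#define GL_MIN_PROGRAM_TEXTURE_GATHER_OFFSET_NV          0x8E5E
#endif
#ifndef GL_MAX_PROGRAM_TEXTURE_GATHER_OFFSET_NV
#define GL_MAX_PROGRAM_TEXTURE_GATHER_OFFSET_NV          0x8E5F
#endif

/* Hypothetical helper: returns the token name for one of the <pname>
 * values added by this extension, or NULL for any other value. */
const char *gpu_program5_token_name(unsigned int pname)
{
    switch (pname) {
    case GL_MAX_GEOMETRY_PROGRAM_INVOCATIONS_NV:
        return "MAX_GEOMETRY_PROGRAM_INVOCATIONS_NV";
    case GL_MIN_FRAGMENT_INTERPOLATION_OFFSET_NV:
        return "MIN_FRAGMENT_INTERPOLATION_OFFSET_NV";
    case GL_MAX_FRAGMENT_INTERPOLATION_OFFSET_NV:
        return "MAX_FRAGMENT_INTERPOLATION_OFFSET_NV";
    case GL_FRAGMENT_PROGRAM_INTERPOLATION_OFFSET_BITS_NV:
        return "FRAGMENT_PROGRAM_INTERPOLATION_OFFSET_BITS_NV";
    case GL_MIN_PROGRAM_TEXTURE_GATHER_OFFSET_NV:
        return "MIN_PROGRAM_TEXTURE_GATHER_OFFSET_NV";
    case GL_MAX_PROGRAM_TEXTURE_GATHER_OFFSET_NV:
        return "MAX_PROGRAM_TEXTURE_GATHER_OFFSET_NV";
    default:
        return NULL;
    }
}
```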


Additions to Chapter 2 of the OpenGL 3.0 Specification (OpenGL Operation)

    Modify Section 2.X.2 of NV_fragment_program4, Program Grammar

    (modify the section, updating the program header string for the extended
     instruction set)

    Fragment programs are required to begin with the header string
    "!!NVfp5.0".  This header string identifies the subsequent program body as
    being a fragment program and indicates that it should be parsed according
    to the base NV_gpu_program5 grammar plus the additions below.  Program
    string parsing begins with the character immediately following the header
    string.

    (add/change the following rules to the NV_fragment_program4 and
     NV_gpu_program5 base grammars)

    <SpecialInstruction>    ::= "IPAC" <opModifiers> <instResult> ","
                                <instOperandV>
                              | "IPAO" <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandV>
                              | "IPAS" <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandS>

    <interpModifier>        ::= "SAMPLE"

    <attribBasic>           ::= <fragPrefix> "sampleid"
                              | <fragPrefix> "samplemask"
                              | <fragPrefix> "pointcoord"

    <resultBasic>           ::= <resPrefix> "color" <resultOptColorNum>
                                <resultOptColorType>
                              | <resPrefix> "samplemask"

    <resultOptColorType>    ::= ""
                              | "." <colorType>


    Modify Section 2.X.2 of NV_geometry_program4, Program Grammar

    (modify the section, updating the program header string for the extended
     instruction set)

    Geometry programs are required to begin with the header string
    "!!NVgp5.0".  This header string identifies the subsequent program body as
    being a geometry program and indicates that it should be parsed according
    to the base NV_gpu_program5 grammar plus the additions below.  Program
    string parsing begins with the character immediately following the header
    string.

    (add the following rules to the NV_geometry_program4 and NV_gpu_program5
     base grammars)

    <declaration>           ::= "INVOCATIONS" <int>

    <declPrimInType>        ::= "PATCHES"

    <SpecialInstruction>    ::= "EMITS" <instOperandS>

    <attribBasic>           ::= <primPrefix> "invocation"
                              | <primPrefix> "vertexcount"
                              | <attribTessOuter> <optArrayMemAbs>
                              | <attribTessInner> <optArrayMemAbs>
                              | <attribPatchGeneric> <optArrayMemAbs>

    <attribMulti>           ::= <attribTessOuter> <arrayRange>
                              | <attribTessInner> <arrayRange>
                              | <attribPatchGeneric> <arrayRange>

    <attribTessOuter>       ::= <primPrefix> "." "tessouter"

    <attribTessInner>       ::= <primPrefix> "." "tessinner"

    <attribPatchGeneric>    ::= <primPrefix> "." "patch" "." "attrib"


    Modify Section 2.X.2 of NV_vertex_program4, Program Grammar

    (modify the section, updating the program header string for the extended
     instruction set)

    Vertex programs are required to begin with the header string "!!NVvp5.0".
    This header string identifies the subsequent program body as being a
    vertex program and indicates that it should be parsed according to the
    base NV_gpu_program5 grammar plus the additions below.  Program string
    parsing begins with the character immediately following the header string.


    Modify Section 2.X.2 of NV_gpu_program4, Program Grammar

    (add the following grammar rules to the NV_gpu_program4 base grammar;
     additional grammar rules usable for assembly programs are documented in
     the EXT_shader_image_load_store and ARB_shader_subroutine specifications)

    <instruction>           ::= <MemInstruction>

    <MemInstruction>        ::= <ATOMop_instruction>
                              | <STOREop_instruction>
                              | <MEMBARop_instruction>

    <VECTORop>              ::= "BFR"
                              | "BTC"
                              | "BTFL"
                              | "BTFM"
                              | "PK64"
                              | "LDC"
                              | "CVT"
                              | "TGALL"
                              | "TGANY"
                              | "TGEQ"
                              | "UP64"

    <SCALARop>              ::= "LOAD"

    <BINop>                 ::= "BFE"

    <TRIop>                 ::= "BFI"

    <TEXop_instruction>     ::= <TEXop> <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandV> ","
                                <texAccess>

    <TEXop>                 ::= "TXG"
                              | "LOD"

    <TXDop>                 ::= "TXGO"

    <ATOMop_instruction>    ::= <ATOMop> <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandS>

    <ATOMop>                ::= "ATOM"

    <STOREop_instruction>   ::= <STOREop> <opModifiers> <instOperandV> ","
                                <instOperandS>

    <STOREop>               ::= "STORE"

    <MEMBARop_instruction>  ::= <MEMBARop> <opModifiers>

    <MEMBARop>              ::= "MEMBAR"

    <opModifier>            ::= "F16"
                              | "F32"
                              | "F64"
                              | "F32X2"
                              | "F32X4"
                              | "F64X2"
                              | "F64X4"
                              | "S8"
                              | "S16"
                              | "S32"
                              | "S32X2"
                              | "S32X4"
                              | "S64"
                              | "S64X2"
                              | "S64X4"
                              | "U8"
                              | "U16"
                              | "U32"
                              | "U32X2"
                              | "U32X4"
                              | "U64"
                              | "U64X2"
                              | "U64X4"
                              | "ADD"
                              | "MIN"
                              | "MAX"
                              | "IWRAP"
                              | "DWRAP"
                              | "AND"
                              | "OR"
                              | "XOR"
                              | "EXCH"
                              | "CSWAP"
                              | "COH"
                              | "ROUND"
                              | "CEIL"
                              | "FLR"
                              | "TRUNC"
                              | "PREC"
                              | "VOL"

    <texAccess>             ::= <textureUseS> "," <texTarget> <optTexOffset>
                              | <textureUseV> "," <texTarget> <optTexOffset>

    <texTarget>             ::= "ARRAYCUBE"
                              | "SHADOWARRAYCUBE"

    <optTexOffset>          ::= /* empty */
                              | <texOffset>

    <texOffset>             ::= "offset" "(" <instOperandV> ")"

    <namingStatement>       ::= <TEXTURE_statement>

    <BUFFER_statement>      ::= <bufferDeclType> <establishName>
                                <optArraySize> <optArraySize> "="
                                <bufferMultInit>

    <bufferDeclType>        ::= "CBUFFER"

    <TEXTURE_statement>     ::= "TEXTURE" <establishName> <texSingleInit>
                              | "TEXTURE" <establishName> <optArraySize>
                                <texMultipleInit>

    <texSingleInit>         ::= "=" <textureUseDS>

    <texMultipleInit>       ::= "=" "{" <texItemList> "}"

    <texItemList>           ::= <textureUseDM>
                              | <textureUseDM> "," <texItemList>

    <bufferBinding>         ::= "program" "." "buffer" <arrayRange>

    <textureUseS>           ::= <textureUseV> <texImageUnitComp>

    <textureUseV>           ::= <texImageUnit>
                              | <texVarName> <optArrayMem>

    <textureUseDS>          ::= "texture" <arrayMemAbs>

    <textureUseDM>          ::= <textureUseDS>
                              | "texture" <arrayRange>

    <texImageUnitComp>      ::= <scalarSuffix>


    Modify Section 2.X.3.1, Program Variable Types

    (IGNORE if GL_NV_gpu_program_fp64 is not found in the extension string.
     Otherwise modify storage size modifiers to guarantee that "LONG"
     variables are at least 64 bits in size.)

    Explicitly declared variables may optionally have one storage size
    modifier.  Variables declared as "SHORT" will be represented using at
    least 16 bits per component.  "SHORT" floating-point values will have at
    least 5 bits of exponent and 10 bits of mantissa.  Variables declared as
    "LONG" will be represented with at least 64 bits per component.  "LONG"
    floating-point values will have at least 11 bits of exponent and 52 bits
    of mantissa.  If no size modifier is provided, the GL will automatically
    select component sizes.  Implementations are not required to support more
    than one component size, so "SHORT", "LONG", and the default could all
    refer to the same component size.  The "LONG" modifier is supported only
    for declarations of temporary variables ("TEMP"), and attribute variables
    ("ATTRIB") in vertex programs.  The "SHORT" modifier is supported only
    for declarations of temporary variables and result variables ("OUTPUT").
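
    The minimum bit counts given above coincide with the IEEE 754 binary16
    layout (1 sign, 5 exponent, 10 mantissa bits) and binary64 layout (1
    sign, 11 exponent, 52 mantissa bits).  The following C sketch splits
    values into those fields.  It assumes that the host "double" type is
    IEEE 754 binary64 and that a conforming implementation uses exactly the
    minimum sizes; the text above only requires "at least" these sizes.  The
    type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 11 bits, biased by 1023 */
    uint64_t mantissa;  /* 52 stored fraction bits */
} Fp64Fields;

typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 5 bits, biased by 15 */
    unsigned mantissa;  /* 10 stored fraction bits */
} Fp16Fields;

/* Splits a double into its fields, assuming IEEE 754 binary64. */
Fp64Fields fp64_fields(double value)
{
    uint64_t bits;
    Fp64Fields f;

    memcpy(&bits, &value, sizeof bits);  /* well-defined type punning */
    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FFu);
    f.mantissa = bits & 0xFFFFFFFFFFFFFull;  /* low 52 bits */
    return f;
}

/* Splits a raw 16-bit pattern into binary16 fields. */
Fp16Fields fp16_fields(uint16_t bits)
{
    Fp16Fields f;

    f.sign = (unsigned)(bits >> 15);
    f.exponent = (unsigned)((bits >> 10) & 0x1Fu);
    f.mantissa = (unsigned)(bits & 0x3FFu);
    return f;
}
```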


    Modify Section 2.X.3.2 of the NV_fragment_program4 specification, Program
    Attribute Variables.

    (Add table entries and relevant text describing the fragment program
     input sample mask and point sprite coordinate variables.)

      Fragment Attribute Binding  Components  Underlying State
      --------------------------  ----------  --------------------------------
      fragment.samplemask         (m,-,-,-)   fragment coverage mask
      fragment.pointcoord         (s,t,-,-)   fragment point sprite coordinate

    If a fragment attribute binding matches "fragment.samplemask", the "x"
    component is filled with a coverage mask indicating the set of samples
    covered by this fragment.  The coverage mask is a bitfield, where bit <n>
    is one if the sample number <n> is covered and zero otherwise.  If
    multisample buffers are not available (SAMPLE_BUFFERS is zero), bit zero
    indicates if the center of the pixel corresponding to the fragment is
    covered.
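
    The bitfield interpretation of the coverage mask can be sketched in C as
    follows.  The function names are hypothetical, and the sketch assumes
    the mask has been copied into a 32-bit unsigned integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if sample number n is covered, i.e. bit n of the mask is one.
 * Sample numbers outside the 32-bit mask are reported as not covered. */
bool sample_is_covered(uint32_t coverage_mask, unsigned n)
{
    return n < 32 && ((coverage_mask >> n) & 1u) != 0;
}

/* Number of samples covered by the fragment. */
unsigned covered_sample_count(uint32_t coverage_mask)
{
    unsigned count = 0;
    while (coverage_mask != 0) {
        coverage_mask &= coverage_mask - 1u;  /* clear lowest set bit */
        count++;
    }
    return count;
}
```

    For a mask of 0x5, samples 0 and 2 are covered and sample 1 is not.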
4685bd8deadSopenharmony_ci
    If a fragment attribute binding matches "fragment.pointcoord", the "x" and
    "y" components are filled with the s and t point sprite coordinates
    (section 3.3.1), respectively.  The "z" and "w" components are undefined.
    If the fragment is generated by any primitive other than a point, or if
    point sprites are disabled, all four components of the binding are
    undefined.

    Modify Section 2.X.3.2 of the NV_geometry_program4 specification, Program
    Attribute Variables.

    (Add a table entry and relevant text describing the geometry program
    invocation attribute and per-patch attributes.)

      Geometry Vertex Binding         Components  Description
      -----------------------------   ----------  ----------------------------
      ...
      primitive.invocation            (id,-,-,-)  geometry program invocation
      primitive.tessouter[n]          (x,-,-,-)   outer tess. level n
      primitive.tessinner[n]          (x,-,-,-)   inner tess. level n
      primitive.patch.attrib[n]       (x,y,z,w)   generic patch attribute n
      primitive.tessouter[n..o]       (x,-,-,-)   outer tess. levels n to o
      primitive.tessinner[n..o]       (x,-,-,-)   inner tess. levels n to o
      primitive.patch.attrib[n..o]    (x,y,z,w)   generic patch attrib n to o
      primitive.vertexcount           (c,-,-,-)   vertices in primitive

    ...

    If a geometry attribute binding matches "primitive.invocation", the "x"
    component is filled with an integer giving the number of previous
    invocations of the geometry program on the primitive being processed.  If
    the geometry program is invoked only once per primitive (default), this
    component will always be zero.  If the program is invoked multiple times
    (via the INVOCATIONS declaration), the component will be zero on the first
    invocation, one on the second, and so forth.  The "y", "z", and "w"
    components of the variable are always undefined.

    If an attribute binding matches "primitive.tessouter[n]", the "x"
    component is filled with the per-patch outer tessellation level numbered
    <n> of the input patch.  <n> must be less than four.  The "y", "z", and
    "w" components are always undefined.  A program will fail to load if this
    attribute binding is used and the input primitive type is not PATCHES.

    If an attribute binding matches "primitive.tessinner[n]", the "x"
    component is filled with the per-patch inner tessellation level numbered
    <n> of the input patch.  <n> must be less than two.  The "y", "z", and "w"
    components are always undefined.  A program will fail to load if this
    attribute binding is used and the input primitive type is not PATCHES.

    If an attribute binding matches "primitive.patch.attrib[n]", the "x", "y",
    "z", and "w" components are filled with the corresponding components of
    the per-patch generic attribute numbered <n> of the input patch.  A
    program will fail to load if this attribute binding is used and the input
    primitive type is not PATCHES.

    If an attribute binding matches "primitive.tessouter[n..o]",
    "primitive.tessinner[n..o]", or "primitive.patch.attrib[n..o]", a sequence
    of 1+<o>-<n> outer tessellation level, inner tessellation level, or
    per-patch generic attribute bindings is created.  For per-patch generic
    attribute bindings, it is as though the sequence
    "primitive.patch.attrib[n], primitive.patch.attrib[n+1], ...
    primitive.patch.attrib[o]" were specified.  These bindings are available
    only in explicit declarations of array variables.  A program will fail to
    load if <n> is greater than <o> or the input primitive type is not
    PATCHES.

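    The expansion of a range binding into 1+<o>-<n> individual bindings can
    be sketched in Python as follows.  This is only a model of the rule
    stated above; the function name, the input_primitive parameter, and the
    use of ValueError to stand in for a program load failure are all
    hypothetical.

```python
def expand_patch_attrib_range(n, o, input_primitive="PATCHES"):
    """Expand "primitive.patch.attrib[n..o]" into its individual bindings.

    ValueError stands in for "the program will fail to load"; it is an
    illustrative choice, not behavior defined by the extension.
    """
    if input_primitive != "PATCHES":
        raise ValueError("input primitive type is not PATCHES")
    if n > o:
        raise ValueError("<n> is greater than <o>")
    # The range is inclusive, so 1 + o - n bindings are produced.
    return ["primitive.patch.attrib[%d]" % i for i in range(n, o + 1)]


print(expand_patch_attrib_range(2, 4))
```

    For the range [2..4], the sketch yields three bindings, matching the
    count 1+4-2.
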
    If a geometry attribute binding matches "primitive.vertexcount", the "x"
    component is filled with the number of vertices in the input primitive
    being processed.  The "y", "z", and "w" components of the variable are
    always undefined.


    Modify Section 2.X.3.5, Program Results

    (modify Table X.X)

      Binding                        Components  Description
      -----------------------------  ----------  ----------------------------
      result.color[n].primary        (r,g,b,a)   primary color n (SRC_COLOR)
      result.color[n].secondary      (r,g,b,a)   secondary color n (SRC1_COLOR)

      Table X.X:  Fragment Result Variable Bindings. Components labeled "*"
      are unused. "[n]" is optional -- color <n> is used if specified; color
      0 is used otherwise.

    (add after third paragraph)

    If a result variable binding matches "result.color[n].primary" or
    "result.color[n].secondary" and the ARB_blend_func_extended option is
    specified, updates to the "x", "y", "z", and "w" components of these color
    result variables modify the "r", "g", "b", and "a" components of the
    SRC_COLOR and SRC1_COLOR color outputs, respectively, for the fragment
    output color numbered <n>.  If the ARB_blend_func_extended program option
    is not specified, the "result.color[n].primary" and
    "result.color[n].secondary" bindings are unavailable.


    Modify Section 2.X.3.6, Program Parameter Buffers

    (modify the description of parameter buffer arrays to require that all
    bindings in an array declaration must use the same single buffer *or*
    buffer range)

    ...  Program parameter buffer variables may be declared as arrays, but all
    bindings assigned to the array must use the same binding point or binding
    point range, and must increase consecutively.

    (add to the end of the section)

    In explicit variable declarations, the bindings in Table X.12.1 of the
    form "program.buffer[a..b]" may also be used, and indicate that the
    variable spans multiple buffer binding points.  Such variables must be
    accessed as arrays, with the first index specifying an offset into the
    range of buffer object binding points.  A buffer index of zero identifies
    binding point <a>; an index of <b>-<a> identifies binding point <b>.  If
    such a variable is declared as an array, a second index must be provided
    to identify the individual array element.  A program will fail to compile
    if such bindings are used when <a> or <b> is negative or greater than or
    equal to the number of buffer binding points supported for the program
    type, or if <a> is greater than <b>.  The bindings in Table X.12.1 may not
    be used in implicit variable declarations.

      Binding                        Components  Underlying State
      -----------------------------  ----------  -----------------------------
      program.buffer[a..b][c]        (x,x,x,x)   program parameter buffers a
                                                   through b, element c
      program.buffer[a..b][c..d]     (x,x,x,x)   program parameter buffers a
                                                   through b, elements c
                                                   through d
      program.buffer[a..b]           (x,x,x,x)   program parameter buffers a
                                                   through b, all elements

      Table X.12.1:  Program Parameter Buffer Array Bindings.  <a> and <b>
      indicate buffer numbers, <c> and <d> indicate individual elements.

    When bindings beginning with "program.buffer[a..b]" are used in a variable
    declaration, they behave identically to corresponding bindings beginning
    with "program.buffer[a]", except that the variable is filled with a
    separate set of values for each buffer binding point from <a> to <b>
    inclusive.

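    The mapping from the first array index of a "program.buffer[a..b]"
    variable to a buffer binding point can be sketched in Python as shown
    below.  The sketch assumes the range <a>..<b> is inclusive, so that the
    last valid index is <b>-<a>.  The function name and the max_bindings
    parameter are hypothetical, and raising IndexError for an out-of-range
    index is an assumption, since this section does not say how such an
    access behaves.

```python
def buffer_binding_point(a, b, index, max_bindings):
    """Map a first-level array index to a buffer binding point.

    max_bindings is a hypothetical stand-in for the number of buffer
    binding points supported for the program type.
    """
    # Conditions under which the program would fail to compile.
    if a < 0 or b < 0 or a >= max_bindings or b >= max_bindings or a > b:
        raise ValueError("invalid program.buffer[a..b] range")
    # Assumption: indices outside 0..(b - a) are rejected.
    if not 0 <= index <= b - a:
        raise IndexError("buffer index outside the declared range")
    # Index zero identifies binding point <a>.
    return a + index


print(buffer_binding_point(2, 5, 0, 8))  # -> 2
print(buffer_binding_point(2, 5, 3, 8))  # -> 5
```
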
    (add new section after Section 2.X.3.7, Program Condition Code Registers
    and renumber subsequent sections accordingly)

    Section 2.X.3.8, Program Texture Variables

    Program texture variables are used as constants during program execution
    and refer to the texture objects bound to one or more texture image units.
    All texture variables have associated bindings and are read-only during
    program execution.  Texture variables retain their values across program
    invocations, and the set of texture image units to which they refer is
    constant.  The texture object a variable refers to may be changed by
    binding a new texture object to the appropriate target of the
    corresponding texture image unit.  Texture variables may only be used to
    identify a texture object in texture instructions, and may not be used as
    operands in any other instruction.  Texture variables may be declared
    explicitly via the <TEXTURE_statement> grammar rule, or implicitly by
    using a texture image unit binding in an instruction.

    Texture variables may be declared as arrays, but the list of texture
    image units assigned to the array must increase consecutively.

    Texture variables identify only a texture image unit; the corresponding
    texture target (e.g., 1D, 2D, CUBE) and texture object are identified by
    the <texTarget> grammar rule in instructions using the texture variable.

      Binding          Components  Underlying State
      ---------------  ----------  ------------------------------------------
      texture[a]       x           texture object bound to image unit a
      texture[a..b]    x           texture objects bound to image units a
                                     through b

      Table X.12.2:  Texture Image Unit Bindings.  <a> and <b> indicate
      texture image unit numbers.

    If a texture binding matches "texture[a]", the texture variable is filled
    with a single integer referring to texture image unit <a>.

    If a texture binding matches "texture[a..b]", the texture variable is
    filled with an array of integers referring to texture image units <a>
    through <b>, inclusive.  A program will fail to compile if <a> or <b> is
    negative or greater than or equal to the number of texture image units
    supported, or if <a> is greater than <b>.


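    The two texture binding forms and their range checks can be modeled with
    the following Python sketch.  The function name, the max_units parameter,
    and the regular expression are hypothetical; the expression is an
    illustrative simplification and not the grammar of the assembly
    language.  For uniformity the sketch returns a list in both cases, even
    though "texture[a]" refers to a single image unit.

```python
import re

# Matches "texture[a]" or "texture[a..b]" with non-negative decimal indices.
_TEXTURE_BINDING = re.compile(r"^texture\[(\d+)(?:\.\.(\d+))?\]$")


def texture_units(binding, max_units):
    """Return the texture image unit numbers a texture binding refers to.

    max_units is a hypothetical stand-in for the number of texture image
    units supported.  ValueError stands in for a compile failure.
    """
    match = _TEXTURE_BINDING.match(binding)
    if match is None:
        # Negative indices also end up here, since "\d+" rejects a sign.
        raise ValueError("not a recognized texture binding")
    a = int(match.group(1))
    b = int(match.group(2)) if match.group(2) is not None else a
    if a >= max_units or b >= max_units or a > b:
        raise ValueError("texture image unit range is invalid")
    return list(range(a, b + 1))


print(texture_units("texture[3]", 16))     # -> [3]
print(texture_units("texture[2..5]", 16))  # -> [2, 3, 4, 5]
```
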
    Modify Section 2.X.4, Program Execution Environment

    (Update the instruction set table to include new columns to indicate the
     first ISA supporting the instruction, and to indicate whether the
     instruction supports 64-bit floating-point modifiers.)

      Instr-      Modifiers
      uction  V  F I C S H D  Out Inputs    Description
      ------- -- - - - - - -  --- --------  --------------------------------
      ABS     40 6 6 X X X F  v   v         absolute value
      ADD     40 6 6 X X X F  v   v,v       add
      AND     40 - 6 X - - S  v   v,v       bitwise and
      ATOM    50 - - X - - -  s   v,su      atomic memory transaction
      BFE     50 - X X - - S  v   v,v       bitfield extract
      BFI     50 - X X - - S  v   v,v,v     bitfield insert
      BFR     50 - X X - - S  v   v         bitfield reverse
      BRK     40 - - - - - -  -   c         break out of loop instruction
      BTC     50 - X X - - S  v   v         bit count
      BTFL    50 - X X - - S  v   v         find least significant bit
      BTFM    50 - X X - - S  v   v         find most significant bit
      CAL     40 - - - - - -  -   c         subroutine call
      CEIL    40 6 6 X X X F  v   vf        ceiling
      CMP     40 6 6 X X X F  v   v,v,v     compare
      CONT    40 - - - - - -  -   c         continue with next loop iteration
      COS     40 X - X X X F  s   s         cosine with reduction to [-PI,PI]
      CVT     50 - - X X - F  v   v         general data type conversion
      DDX     40 X - X X X F  v   v         derivative relative to X (fp-only)
      DDY     40 X - X X X F  v   v         derivative relative to Y (fp-only)
      DIV     40 6 6 X X X F  v   v,s       divide vector components by scalar
      DP2     40 X - X X X F  s   v,v       2-component dot product
      DP2A    40 X - X X X F  s   v,v,v     2-comp. dot product w/scalar add
      DP3     40 X - X X X F  s   v,v       3-component dot product
      DP4     40 X - X X X F  s   v,v       4-component dot product
      DPH     40 X - X X X F  s   v,v       homogeneous dot product
      DST     40 X - X X X F  v   v,v       distance vector
      ELSE    40 - - - - - -  -   -         start if test else block
      EMIT    40 - - - - - -  -   -         emit vertex stream 0 (gp-only)
      EMITS   50 - X - - - S  -   s         emit vertex to stream (gp-only)
      ENDIF   40 - - - - - -  -   -         end if test block
      ENDPRIM 40 - - - - - -  -   -         end of primitive (gp-only)
      ENDREP  40 - - - - - -  -   -         end of repeat block
      EX2     40 X - X X X F  s   s         exponential base 2
      FLR     40 6 6 X X X F  v   vf        floor
      FRC     40 6 - X X X F  v   v         fraction
      I2F     40 - 6 X - - S  vf  v         integer to float
      IF      40 - - - - - -  -   c         start of if test block
      IPAC    50 X - X X - F  v   v         interpolate at centroid (fp-only)
      IPAO    50 X - X X - F  v   v,v       interpolate w/offset (fp-only)
      IPAS    50 X - X X - F  v   v,su      interpolate at sample (fp-only)
      KIL     40 X X - - X F  -   vc        kill fragment
      LDC     40 - - X X - F  v   v         load from constant buffer
      LG2     40 X - X X X F  s   s         logarithm base 2
      LIT     40 X - X X X F  v   v         compute lighting coefficients
      LOAD    40 - - X X - F  v   su        global load
      LOD     41 X - X X - F  v   vf,t      compute texture LOD
      LRP     40 X - X X X F  v   v,v,v     linear interpolation
      MAD     40 6 6 X X X F  v   v,v,v     multiply and add
      MAX     40 6 6 X X X F  v   v,v       maximum
      MEMBAR  50 - - - - - -  -   -         memory barrier
      MIN     40 6 6 X X X F  v   v,v       minimum
      MOD     40 - 6 X - - S  v   v,s       modulus vector components by scalar
      MOV     40 6 6 X X X F  v   v         move
      MUL     40 6 6 X X X F  v   v,v       multiply
      NOT     40 - 6 X - - S  v   v         bitwise not
      NRM     40 X - X X X F  v   v         normalize 3-component vector
      OR      40 - 6 X - - S  v   v,v       bitwise or
      PK2H    40 X X - - - F  s   vf        pack two 16-bit floats
      PK2US   40 X X - - - F  s   vf        pack two floats as unsigned 16-bit
      PK4B    40 X X - - - F  s   vf        pack four floats as signed 8-bit
      PK4UB   40 X X - - - F  s   vf        pack four floats as unsigned 8-bit
      PK64    50 X X - - - F  v   v         pack 4x32-bit vectors to 2x64
      POW     40 X - X X X F  s   s,s       exponentiate
      RCC     40 X - X X X F  s   s         reciprocal (clamped)
      RCP     40 6 - X X X F  s   s         reciprocal
      REP     40 6 6 - - X F  -   v         start of repeat block
      RET     40 - - - - - -  -   c         subroutine return
      RFL     40 X - X X X F  v   v,v       reflection vector
      ROUND   40 6 6 X X X F  v   vf        round to nearest integer
      RSQ     40 6 - X X X F  s   s         reciprocal square root
      SAD     40 - 6 X - - S  vu  v,v,vu    sum of absolute differences
      SCS     40 X - X X X F  v   s         sine/cosine without reduction
      SEQ     40 6 6 X X X F  v   v,v       set on equal
      SFL     40 6 6 X X X F  v   v,v       set on false
      SGE     40 6 6 X X X F  v   v,v       set on greater than or equal
      SGT     40 6 6 X X X F  v   v,v       set on greater than
      SHL     40 - 6 X - - S  v   v,s       shift left
      SHR     40 - 6 X - - S  v   v,s       shift right
      SIN     40 X - X X X F  s   s         sine with reduction to [-PI,PI]
      SLE     40 6 6 X X X F  v   v,v       set on less than or equal
      SLT     40 6 6 X X X F  v   v,v       set on less than
      SNE     40 6 6 X X X F  v   v,v       set on not equal
      SSG     40 6 - X X X F  v   v         set sign
      STORE   50 - - - - - -  -   v,su      global store
      STR     40 6 6 X X X F  v   v,v       set on true
      SUB     40 6 6 X X X F  v   v,v       subtract
      SWZ     40 X - X X X F  v   v         extended swizzle
      TEX     40 X X X X - F  v   vf,t      texture sample
      TGALL   50 X X X X - F  v   v         test all non-zero in thread group
      TGANY   50 X X X X - F  v   v         test any non-zero in thread group
      TGEQ    50 X X X X - F  v   v         test all equal in thread group
      TRUNC   40 6 6 X X X F  v   vf        truncate (round toward zero)
      TXB     40 X X X X - F  v   vf,t      texture sample with bias
      TXD     40 X X X X - F  v vf,vf,vf,t  texture sample w/partials
      TXF     40 X X X X - F  v   vs,t      texel fetch
      TXFMS   40 X X X X - F  v   vs,t      multisample texel fetch
      TXG     41 X X X X - F  v   vf,t      texture gather
      TXGO    50 X X X X - F  v vf,vs,vs,t  texture gather w/per-texel offsets
      TXL     40 X X X X - F  v   vf,t      texture sample w/LOD
      TXP     40 X X X X - F  v   vf,t      texture sample w/projection
      TXQ     40 - - - - - S  vs  vs,t      texture info query
      UP2H    40 X X X X - F  vf  s         unpack two 16-bit floats
      UP2US   40 X X X X - F  vf  s         unpack two unsigned 16-bit integers
      UP4B    40 X X X X - F  vf  s         unpack four signed 8-bit integers
      UP4UB   40 X X X X - F  vf  s         unpack four unsigned 8-bit integers
      UP64    50 X X X X - F  v   v         unpack 2x64 vectors to 4x32
      X2D     40 X - X X X F  v   v,v,v     2D coordinate transformation
      XOR     40 - 6 X - - S  v   v,v       exclusive or
      XPD     40 X - X X X F  v   v,v       cross product

          Table X.13:  Summary of NV_gpu_program5 instructions.

      The "V" column indicates the first assembly language in the
      NV_gpu_program4 family (if any) supporting the opcode.  "40", "41", and
      "50" indicate NV_gpu_program4, NV_gpu_program4_1, and NV_gpu_program5,
      respectively.

      The "Modifiers" columns specify the set of modifiers allowed for the
      instruction:

        F = floating-point data type modifiers
        I = signed and unsigned integer data type modifiers
        C = condition code update modifiers
        S = clamping (saturation) modifiers
        H = half-precision float data type suffix
        D = default data type modifier (F, U, or S)

      For the "F" and "I" columns, an "X" indicates support for both unsized
      type modifiers and sized type modifiers with fewer than 64 bits.  A "6"
      indicates support for all modifiers, including 64-bit versions (when
      supported).

      The input and output columns describe the formats of the operands and
      results of the instruction.

        v:  4-component vector (data type is inherited from operation)
        vf: 4-component vector (data type is always floating-point)
        vs: 4-component vector (data type is always signed integer)
        vu: 4-component vector (data type is always unsigned integer)
        s:  scalar (replicated if written to a vector destination;
                    data type is inherited from operation)
        su: scalar (data type is always unsigned integer)
        c:  condition code test result (e.g., "EQ", "GT1.x")
        vc: 4-component vector or condition code test
        t:  texture

      Instructions labeled "fp-only" and "gp-only" are supported only for
      fragment and geometry programs, respectively.


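    To show how the columns of the instruction summary table are read, the
    following Python sketch splits one table row into named fields and checks
    whether the "F" column permits 64-bit floating-point modifiers.  The
    class, field, and function names are hypothetical, and the sketch assumes
    rows are whitespace-separated exactly as printed in the table.

```python
from typing import List, NamedTuple


class InstructionRow(NamedTuple):
    opcode: str
    first_isa: str      # "V" column: "40", "41", or "50"
    float_mods: str     # "F" column: "6", "X", or "-"
    int_mods: str       # "I" column: "6", "X", or "-"
    cc_update: bool     # "C" column
    saturate: bool      # "S" column
    half: bool          # "H" column
    default_type: str   # "D" column as printed in the table
    output: str         # "Out" column, "-" if there is no result
    inputs: List[str]   # "Inputs" column split on commas
    description: str


def parse_row(line: str) -> InstructionRow:
    """Split one row of the instruction summary table into named fields."""
    fields = line.split(None, 10)
    if len(fields) != 11:
        raise ValueError("expected 11 whitespace-separated fields")
    opcode, v, f, i, c, s, h, d, out, ins, desc = fields
    return InstructionRow(
        opcode, v, f, i, c == "X", s == "X", h == "X", d, out,
        [] if ins == "-" else ins.split(","), desc.strip())


def allows_f64(row: InstructionRow) -> bool:
    """True if the "F" column is "6", i.e. all float modifiers are allowed."""
    return row.float_mods == "6"


add = parse_row("ADD     40 6 6 X X X F  v   v,v       add")
cos = parse_row("COS     40 X - X X X F  s   s         cosine with reduction")
print(add.opcode, allows_f64(add))  # -> ADD True
print(cos.opcode, allows_f64(cos))  # -> COS False
```
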
    Modify Section 2.X.4.1, Program Instruction Modifiers

    (Update the discussion of instruction precision modifiers.  If
     GL_NV_gpu_program_fp64 is not found in the extension string, the "F64"
     instruction modifier described below is not supported.)

    (add to Table X.14 of the NV_gpu_program4 specification.)

      Modifier  Description
      --------  ---------------------------------------------------
      F         Floating-point operation
      U         Fixed-point operation, unsigned operands
      S         Fixed-point operation, signed operands
      ...
      F32       Floating-point operation, 32-bit precision or
                  access one 32-bit floating-point value
      F64       Floating-point operation, 64-bit precision or
                  access one 64-bit floating-point value
      S32       Fixed-point operation, signed 32-bit operands or
                  access one 32-bit signed integer value
      S64       Fixed-point operation, signed 64-bit operands or
                  access one 64-bit signed integer value
      U32       Fixed-point operation, unsigned 32-bit operands or
                  access one 32-bit unsigned integer value
      U64       Fixed-point operation, unsigned 64-bit operands or
                  access one 64-bit unsigned integer value
      ...
      F32X2     Access two 32-bit floating-point values
      F32X4     Access four 32-bit floating-point values
      F64X2     Access two 64-bit floating-point values
      F64X4     Access four 64-bit floating-point values
      S8        Access one 8-bit signed integer value
      S16       Access one 16-bit signed integer value
      S32X2     Access two 32-bit signed integer values
      S32X4     Access four 32-bit signed integer values
      S64       Access one 64-bit signed integer value
      S64X2     Access two 64-bit signed integer values
      S64X4     Access four 64-bit signed integer values
      U8        Access one 8-bit unsigned integer value
      U16       Access one 16-bit unsigned integer value
      U32       Access one 32-bit unsigned integer value
      U32X2     Access two 32-bit unsigned integer values
      U32X4     Access four 32-bit unsigned integer values
      U64       Access one 64-bit unsigned integer value
      U64X2     Access two 64-bit unsigned integer values
      U64X4     Access four 64-bit unsigned integer values

      ADD       Perform add operation for ATOM
      MIN       Perform minimum operation for ATOM
      MAX       Perform maximum operation for ATOM
      IWRAP     Perform wrapping increment for ATOM
      DWRAP     Perform wrapping decrement for ATOM
      AND       Perform logical AND operation for ATOM
      OR        Perform logical OR operation for ATOM
      XOR       Perform logical XOR operation for ATOM
      EXCH      Perform exchange operation for ATOM
      CSWAP     Perform compare-and-swap operation for ATOM

      COH       Make LOAD and STORE operations use coherent caching
      VOL       Make LOAD and STORE operations treat memory as volatile

      PREC      Instruction results should be precise

      ROUND     Inexact conversion results round to nearest value (even)
      CEIL      Inexact conversion results round to larger value
      FLR       Inexact conversion results round to smaller value
      TRUNC     Inexact conversion results round to value closest to zero


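    The four rounding modifiers listed at the end of the table above can be
    modeled for a float-to-integer conversion with the Python sketch below.
    It only illustrates the one-line descriptions in the table; the function
    name is hypothetical, and the sketch does not attempt to model range
    clamping or any other conversion behavior.

```python
import math


def convert_inexact(value, mode):
    """Round a float to an integer according to a rounding modifier name."""
    if mode == "ROUND":
        # Python's round() rounds exact halves to the nearest even integer.
        return round(value)
    if mode == "CEIL":
        return math.ceil(value)    # round to larger value
    if mode == "FLR":
        return math.floor(value)   # round to smaller value
    if mode == "TRUNC":
        return math.trunc(value)   # round to value closest to zero
    raise ValueError("unknown rounding modifier: " + mode)


for mode in ("ROUND", "CEIL", "FLR", "TRUNC"):
    print(mode, convert_inexact(2.5, mode), convert_inexact(-2.5, mode))
```

    For the inputs 2.5 and -2.5, the sketch yields 2 and -2 for ROUND, 3 and
    -2 for CEIL, 2 and -3 for FLR, and 2 and -2 for TRUNC.
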
    "F", "U", and "S" modifiers are base data type modifiers and specify that
    the instruction should operate on floating-point, unsigned integer, or
    signed integer values, respectively.  For example, "ADD.F", "ADD.U", and
    "ADD.S" specify component-wise addition of floating-point, unsigned
    integer, or signed integer vectors, respectively.  While these modifiers
    specify a data type, they do not specify an exact precision at which the
    operation is performed.  Floating-point and fixed-point operations will
    typically be carried out at 32-bit precision, unless otherwise described
    in the instruction documentation or overridden by the precision modifiers.
    If all operands are represented with less than 32-bit precision (e.g.,
    variables with the "SHORT" component size modifier), operations may be
    carried out at a precision no less than the precision of the largest
    operand used by the instruction.  For some instructions, the data type of
    some operands or the result is fixed; in these cases, the data type
    modifier specifies the data type of the remaining values.

8955bd8deadSopenharmony_ci    Operands represented with fewer bits than used to perform the instruction
8965bd8deadSopenharmony_ci    will be promoted to a larger data type.  Signed integer operands will be
8975bd8deadSopenharmony_ci    sign-extended, where the most significant bits are filled with ones if the
8985bd8deadSopenharmony_ci    operand is negative and zero otherwise.  Unsigned integer operands will be
8995bd8deadSopenharmony_ci    zero-extended, where the most significant bits are always filled with
9005bd8deadSopenharmony_ci    zeroes.  Operands represented with more bits than used to perform the
9015bd8deadSopenharmony_ci    instruction will be converted to lower precision.  Floating-point
9025bd8deadSopenharmony_ci    overflows result in IEEE infinity encodings; integer overflows result in
9035bd8deadSopenharmony_ci    the truncation of the most significant bits.
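
    As a non-normative illustration of the promotion and narrowing rules
    above, the following C sketch models an 8-bit operand used by a 32-bit
    operation.  The helper names (promote_s8, promote_u8, narrow_u8) are
    hypothetical and are not defined by this extension.

```c
#include <stdint.h>

/* Sign-extend an 8-bit operand to 32 bits: the upper 24 bits become
   ones if the operand is negative and zeroes otherwise. */
static int32_t promote_s8(uint8_t bits)
{
    return (bits & 0x80u) ? (int32_t)bits - 256 : (int32_t)bits;
}

/* Zero-extend an 8-bit operand to 32 bits: the upper 24 bits are
   always zeroes. */
static uint32_t promote_u8(uint8_t bits)
{
    return (uint32_t)bits;
}

/* Narrow a 32-bit integer result to 8 bits: on overflow the most
   significant bits are simply discarded. */
static uint8_t narrow_u8(uint32_t value)
{
    return (uint8_t)(value & 0xFFu);
}
```

    For the bit pattern 0xF0, promote_s8 yields -16 and promote_u8 yields
    240; narrow_u8(0x1F0) yields 0xF0.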

    For arithmetic operations, the "F32", "F64", "U32", "U64", "S32", and
    "S64" modifiers are precision-specific data type modifiers that specify
    that floating-point, unsigned integer, or signed integer operations be
    carried out with an internal precision of no less than 32 or 64 bits per
    component, respectively.  The "F64", "U64", and "S64" modifiers are
    supported on only a subset of instructions, as documented in the
    instruction table.  The base data type of the instruction is trivially
    derived from a precision-specific data type modifier, and an instruction
    may not specify both base and precision-specific data type modifiers.

    ...

    "SAT" and "SSAT" are clamping modifiers that generally specify that the
    floating-point components of the instruction result should be clamped to
    [0,1] or [-1,1], respectively, before updating the condition code and the
    destination variable.  If no clamping suffix is specified, unclamped
    results will be used for condition code updates (if any) and destination
    variable writes.  Clamping modifiers are not supported on instructions
    that do not produce floating-point results, with one exception.
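
    The two clamping ranges can be sketched per component in C as follows.
    This is a non-normative illustration; the function names clamp_sat and
    clamp_ssat are hypothetical, and NaN inputs (which this sketch passes
    through unchanged) are not addressed by the text above.

```c
/* "SAT": clamp a floating-point result component to [0,1]. */
static float clamp_sat(float x)
{
    if (x < 0.0f) return 0.0f;
    if (x > 1.0f) return 1.0f;
    return x;
}

/* "SSAT": clamp a floating-point result component to [-1,1]. */
static float clamp_ssat(float x)
{
    if (x < -1.0f) return -1.0f;
    if (x > 1.0f)  return 1.0f;
    return x;
}
```

    A result of -0.25 becomes 0.0 under "SAT" but is left unchanged under
    "SSAT"; a result of 1.5 becomes 1.0 under either modifier.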

    ...

    For load and store operations, the "F32", "F32X2", "F32X4", "F64",
    "F64X2", "F64X4", "S8", "S16", "S32", "S32X2", "S32X4", "S64", "S64X2",
    "S64X4", "U8", "U16", "U32", "U32X2", "U32X4", "U64", "U64X2", and "U64X4"
    storage modifiers control how data are loaded from or stored to memory.
    Storage modifiers are supported by the ATOM, LDC, LOAD, and STORE
    instructions and are covered in more detail in the descriptions of these
    instructions.  These instructions must specify exactly one of these
    modifiers, and may not specify any of the base data type modifiers (F,U,S)
    described above.  The base data types of the result vector of a load
    instruction or the first operand of a store instruction are trivially
    derived from the storage modifier.

    For atomic memory operations performed by the ATOM instruction, the "ADD",
    "MIN", "MAX", "IWRAP", "DWRAP", "AND", "OR", "XOR", "EXCH", and "CSWAP"
    modifiers specify the operation to perform on the memory being accessed,
    and are described in more detail in the description of this instruction.

    For load and store operations, the "COH" modifier controls whether the
    operation uses a coherent level of the cache hierarchy, as described in
    Section 2.X.4.5.

    For load and store operations, the "VOL" modifier controls whether the
    operation treats the memory being read or written as volatile.
    Instructions modified with "VOL" will always read or write the underlying
    memory, whether or not previous or subsequent loads and stores access the
    same memory.

    For arithmetic and logical operations, the "PREC" modifier controls
    whether the instruction result should be treated as precise.  For
    instructions not qualified with ".PREC", the implementation may rearrange
    the computations specified by the program instructions to execute more
    efficiently, even if it may generate slightly different results in some
    cases.  For example, an implementation may combine a MUL instruction with
    a dependent ADD instruction and generate code to execute a MAD
    (multiply-add) instruction instead.  The difference in rounding may
    produce unacceptable artifacts for some algorithms.  When ".PREC" is
    specified, the instruction will be executed in a manner that always
    generates the same result regardless of the program instructions that
    precede or follow the instruction.  Note that a ".PREC" modifier does not
    affect the processing of any other instruction.  For example, tagging an
    instruction with ".PREC" does not mean that the instructions used to
    generate the instruction's operands will be treated as precise unless
    those instructions are also qualified with ".PREC".
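
    The rounding difference between a separate MUL and ADD and a combined
    multiply-add can be illustrated with the following non-normative C
    sketch.  The function names are hypothetical.  The combined form is
    modeled here by evaluating in double precision and rounding to float
    once; this is an assumption made for illustration and is not a
    description of how any implementation executes MAD.

```c
/* Two instructions: the product is rounded to 32-bit float before the
   add.  The volatile store forces that intermediate rounding and keeps
   the compiler from fusing the two operations. */
static float mul_then_add(float a, float b, float c)
{
    volatile float product = a * b;
    return product + c;
}

/* One combined operation, modeled with a single final rounding: the
   product of two floats is exact in double precision, so low-order
   product bits survive until the addition. */
static float combined_mul_add(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}
```

    With a = b = 1 + 2^-12 and c = -(1 + 2^-11), mul_then_add returns 0
    because the 2^-24 term of the product is lost when the product is
    rounded, while combined_mul_add returns 2^-24.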

    For the CVT (data type conversion) instruction, the "F16", "F32", "F64",
    "S8", "S16", "S32", "S64", "U8", "U16", "U32", and "U64" storage modifiers
    specify the data type of the vector operand and the converted result.  Two
    storage modifiers must be provided, which specify the data type of the
    result and the operand, respectively.

    For the CVT (data type conversion) instruction, the "ROUND", "CEIL",
    "FLR", and "TRUNC" modifiers specify how to round converted results that
    are not directly representable using the data type of the result.
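
    The four rounding modifiers can be sketched for a float-to-integer
    conversion as follows.  This C code is a non-normative illustration: the
    RoundMode enumeration and convert_f32_to_s32 are hypothetical names, and
    the sketch assumes inputs of modest magnitude (no NaN, infinity, or
    values outside the 32-bit integer range).

```c
#include <stdint.h>

typedef enum { RM_ROUND, RM_CEIL, RM_FLR, RM_TRUNC } RoundMode;

static int32_t convert_f32_to_s32(float x, RoundMode mode)
{
    int32_t t  = (int32_t)x;               /* C casts truncate toward zero */
    float   tf = (float)t;
    int32_t lo = (x < tf) ? t - 1 : t;     /* largest integer <= x  */
    int32_t hi = (x > tf) ? t + 1 : t;     /* smallest integer >= x */

    switch (mode) {
    case RM_TRUNC:
        return t;                          /* value closest to zero */
    case RM_FLR:
        return lo;                         /* smaller value */
    case RM_CEIL:
        return hi;                         /* larger value */
    case RM_ROUND: {
        float frac = x - (float)lo;        /* in [0,1) */
        if (frac > 0.5f) return lo + 1;
        if (frac < 0.5f) return lo;
        return (lo % 2 == 0) ? lo : lo + 1; /* tie: pick the even value */
    }
    }
    return t;
}
```

    For an operand of -2.5, this sketch returns -2 for ROUND (tie to even),
    -2 for CEIL, -3 for FLR, and -2 for TRUNC.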


    Modify Section 2.X.4.4, Program Texture Access

    (Extend the language describing the operation of texel offsets to cover
     the new capability to load texel offsets from a register.  Otherwise,
     this functionality is unchanged from previous extensions.)

    <offset> is a 3-component signed integer vector, which can be specified
    using constants embedded in the texture instruction according to the
    <texOffsetImmed> grammar rule, or taken from a vector operand according to
    the <texOffsetVar> grammar rule.  The three components of the offset
    vector are added to the computed <u>, <v>, and <w> texel locations prior
    to sampling.  When using a constant offset, one, two, or three components
    may be specified in the instruction; if fewer than three are specified,
    the remaining offset components are zero.  If no offsets are specified,
    all three components of the offset are treated as zero.  A limited range
    of offset values is supported; the minimum and maximum <texOffset> values
    are implementation-dependent and given by MIN_PROGRAM_TEXEL_OFFSET_EXT and
    MAX_PROGRAM_TEXEL_OFFSET_EXT, respectively.  A program will fail to load:

      * if the texture target specified in the instruction is 1D, ARRAY1D,
        SHADOW1D, or SHADOWARRAY1D, and the second or third component of a
        constant offset vector is non-zero;

      * if the texture target specified in the instruction is 2D, RECT,
        ARRAY2D, SHADOW2D, SHADOWRECT, or SHADOWARRAY2D, and the third
        component of a constant offset vector is non-zero;

      * if the texture target is CUBE, SHADOWCUBE, ARRAYCUBE, or
        SHADOWARRAYCUBE, and any component of a constant offset vector is
        non-zero -- texel offsets are not supported for cube map or buffer
        textures;

      * if any component of the constant offset vector of a TXGO instruction
        is non-zero -- non-constant offsets are provided in separate operands;

      * if any component of a constant offset vector is less than
        MIN_PROGRAM_TEXEL_OFFSET_EXT or greater than
        MAX_PROGRAM_TEXEL_OFFSET_EXT;

      * if a TXD or TXGO instruction specifies a non-constant texel offset
        according to the <texOffsetVar> grammar rule; or

      * if any instruction specifies a non-constant texel offset according
        to the <texOffsetVar> grammar rule and the texture target is CUBE,
        SHADOWCUBE, ARRAYCUBE, or SHADOWARRAYCUBE.
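
    The load-time checks on constant offsets listed above can be sketched in
    C as follows.  This is a non-normative illustration: TargetClass and
    const_offset_ok are hypothetical names, texture targets are grouped by
    the classes used in the list, and the <min_off> and <max_off> parameters
    stand in for the queried values of MIN_PROGRAM_TEXEL_OFFSET_EXT and
    MAX_PROGRAM_TEXEL_OFFSET_EXT.

```c
#include <stdbool.h>

typedef enum {
    CLASS_1D,    /* 1D, ARRAY1D, SHADOW1D, SHADOWARRAY1D */
    CLASS_2D,    /* 2D, RECT, ARRAY2D, SHADOW2D, SHADOWRECT, SHADOWARRAY2D */
    CLASS_3D,    /* 3D */
    CLASS_CUBE   /* CUBE, SHADOWCUBE, ARRAYCUBE, SHADOWARRAYCUBE */
} TargetClass;

/* Returns true if a constant offset passes the load-time checks. */
static bool const_offset_ok(TargetClass cls, const int offset[3],
                            bool is_txgo, int min_off, int max_off)
{
    int i;

    /* Components that have no meaning for the target must be zero. */
    if (cls == CLASS_1D && (offset[1] != 0 || offset[2] != 0)) return false;
    if (cls == CLASS_2D && offset[2] != 0) return false;

    /* Cube targets and TXGO accept no non-zero constant offset at all. */
    if (cls == CLASS_CUBE || is_txgo) {
        for (i = 0; i < 3; i++)
            if (offset[i] != 0) return false;
    }

    /* Every component must lie within the implementation limits. */
    for (i = 0; i < 3; i++)
        if (offset[i] < min_off || offset[i] > max_off) return false;

    return true;
}
```

    With hypothetical limits of -8 and 7, the constant offset (1,1,0) is
    accepted for a 2D target but rejected for a 1D target, and (9,0,0) is
    rejected for every target.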

    The implementation-dependent minimum and maximum texel offset values apply
    to texel offsets taken from a vector operand, but out-of-bounds or
    invalid component values will not prevent program loading since the
    offsets may not be computed until the program is executed.  Components of
    the vector operand not needed for the texture target are ignored.  The W
    component of the offset vector is always ignored; the Z component of the
    offset vector is ignored unless the target is 3D; the Y component is
    ignored if the target is 1D, ARRAY1D, SHADOW1D, or SHADOWARRAY1D.  If the
    value of any non-ignored component of the vector operand is outside
    implementation-dependent limits, the results of the texture lookup are
    undefined.  For all instructions except TXGO, the limits are
    MIN_PROGRAM_TEXEL_OFFSET_EXT and MAX_PROGRAM_TEXEL_OFFSET_EXT.  For the
    TXGO instruction, the limits are MIN_PROGRAM_TEXTURE_GATHER_OFFSET_NV and
    MAX_PROGRAM_TEXTURE_GATHER_OFFSET_NV.


    (Modify language describing how the check for using multiple targets on a
     single texture image unit works, to account for texture array variables
     where a single instruction may access one of multiple textures and the
     texture used is not known when the program is loaded.)

    A program will fail to load if it attempts to sample from multiple texture
    targets (including the SHADOW pseudo-targets) on the same texture image
    unit.  For example, a program containing any two of the following
    instructions will fail to load:

      TEX out, coord, texture[0], 1D;
      TEX out, coord, texture[0], 2D;
      TEX out, coord, texture[0], ARRAY2D;
      TEX out, coord, texture[0], SHADOW2D;
      TEX out, coord, texture[0], 3D;

    For the purposes of this test, sampling using a texture variable declared
    as an array is treated as though all texture image units bound to the
    variable were accessed.  A program containing the following
    instructions would fail to load:

      TEXTURE textures[] = { texture[0..3] };
      TEX out, coord, textures[2], 2D;     # acts as if all textures are used
      TEX out, coord, texture[1], 3D;
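
    One way to picture this test is a per-unit record of the target seen so
    far, as in the following non-normative C sketch.  All names (Target,
    UnitTargets, record_access, MAX_UNITS) are hypothetical, and the unit
    count of 32 is an arbitrary value chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_UNITS 32

typedef enum { TGT_NONE = 0, TGT_1D, TGT_2D, TGT_3D,
               TGT_ARRAY2D, TGT_SHADOW2D } Target;

/* Target used so far on each texture image unit; TGT_NONE if unused. */
typedef struct { Target used[MAX_UNITS]; } UnitTargets;

/* Record a sample from image units first..last (inclusive) using the
   given target.  An access through a texture array variable passes the
   full range of units bound to the variable.  Returns false if any unit
   in the range was already used with a different target. */
static bool record_access(UnitTargets *ut, int first, int last, Target target)
{
    int u;

    assert(first >= 0 && first <= last && last < MAX_UNITS);
    for (u = first; u <= last; u++) {
        if (ut->used[u] != TGT_NONE && ut->used[u] != target)
            return false;
        ut->used[u] = target;
    }
    return true;
}
```

    For the example above, the access through textures[2] is recorded as
    record_access(&ut, 0, 3, TGT_2D), after which
    record_access(&ut, 1, 1, TGT_3D) reports a conflict.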

    (Add language describing texture gather component selection)

    The TXG and TXGO instructions provide the ability to assemble a
    four-component vector by taking the value of a single component of a
    multi-component texture from each of four texels.  The component selected
    is identified by the <texImageUnitComp> grammar rule.  Component selection
    is not supported for any other instruction, and a program will fail to
    load if <texImageUnitComp> is matched for any texture instruction other
    than TXG or TXGO.
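
    The assembly of the gathered vector can be sketched in C as follows.
    This is a non-normative illustration with hypothetical names (Texel,
    Vec4, gather_component).  The sketch takes the four texels in the order
    in which they map to the result components; how the four texels are
    chosen and ordered is not covered by the text above.

```c
typedef struct { float c[4]; } Texel;      /* c[0..3] = x, y, z, w */
typedef struct { float x, y, z, w; } Vec4;

/* Build a four-component result from component <comp> (0..3) of each
   of four texels. */
static Vec4 gather_component(const Texel texels[4], int comp)
{
    Vec4 result;

    result.x = texels[0].c[comp];
    result.y = texels[1].c[comp];
    result.z = texels[2].c[comp];
    result.w = texels[3].c[comp];
    return result;
}
```

    Selecting component 1 from texels whose values are (1,2,3,4),
    (5,6,7,8), (9,10,11,12), and (13,14,15,16) produces (2,6,10,14).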


    Add New Section 2.X.4.5, Program Memory Access

    Programs may load from or store to buffer object memory via the ATOM
    (atomic global memory operation), LDC (load constant), LOAD (global load),
    and STORE (global store) instructions.

    Load instructions read 8, 16, 32, 64, 128, or 256 bits of data from a
    source address to produce a four-component vector, according to the
    storage modifier specified with the instruction.  The storage modifier has
    three parts:

      - a base data type, "F", "S", or "U", specifying that the instruction
        fetches floating-point, signed integer, or unsigned integer values,
        respectively;

      - a component size, specifying that the components fetched by the
        instruction have 8, 16, 32, or 64 bits; and

      - an optional component count, where "X2" and "X4" indicate that two or
        four components be fetched, and no count indicates a single component
        fetch.
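
    The three-part structure of a storage modifier can be sketched with the
    following non-normative C parser.  StorageModifier,
    parse_storage_modifier, and total_bits are hypothetical names.  The
    sketch accepts only the combinations listed for load and store
    operations in Section 2.X.4.1 (for example, it rejects "F16" and
    "U8X2").

```c
typedef struct {
    char base;   /* 'F', 'S', or 'U' */
    int  bits;   /* component size: 8, 16, 32, or 64 */
    int  count;  /* component count: 1, 2, or 4 */
} StorageModifier;

/* Parse a modifier name such as "F32X4" or "U8".  Returns 1 on success
   and 0 if the name is not a load/store storage modifier. */
static int parse_storage_modifier(const char *s, StorageModifier *out)
{
    int bits = 0;

    /* Part 1: base data type. */
    if (*s != 'F' && *s != 'S' && *s != 'U')
        return 0;
    out->base = *s++;

    /* Part 2: component size in bits (no leading zero). */
    if (*s == '0')
        return 0;
    while (*s >= '0' && *s <= '9') {
        bits = bits * 10 + (*s++ - '0');
        if (bits > 64)
            return 0;
    }
    if (bits != 8 && bits != 16 && bits != 32 && bits != 64)
        return 0;
    out->bits = bits;

    /* Part 3: optional component count. */
    out->count = 1;
    if (*s == 'X') {
        if (s[1] != '2' && s[1] != '4')
            return 0;
        out->count = s[1] - '0';
        s += 2;
    }

    /* 8- and 16-bit sizes exist only as single integer components. */
    if (bits < 32 && (out->base == 'F' || out->count > 1))
        return 0;

    return *s == '\0';
}

/* Total number of bits read or written for the modifier. */
static int total_bits(const StorageModifier *m)
{
    return m->bits * m->count;
}
```

    Parsing "F32X4" gives base 'F', size 32, and count 4, for a total of 128
    bits; "U8" totals 8 bits and "S64X4" totals 256 bits.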

    When the storage modifier specifies that fewer than four components should
    be fetched, remaining components are filled with zeroes.  When performing
    an atomic memory operation (ATOM) or a global load (LOAD), the GPU address
    is specified as an instruction operand.  When performing a constant buffer
    load (LDC), the GPU address is derived by adding the base address of the
    bound buffer object to an offset specified as an instruction operand.
    Given a GPU address <address> and a storage modifier <modifier>, the
    memory load can be described by the following code:

      result_t_vec BufferMemoryLoad(char *address, OpModifier modifier)
      {
        result_t_vec result = { 0, 0, 0, 0 };
        switch (modifier) {
        case F32:
            result.x = ((float32_t *)address)[0];
            break;
        case F32X2:
            result.x = ((float32_t *)address)[0];
            result.y = ((float32_t *)address)[1];
            break;
        case F32X4:
            result.x = ((float32_t *)address)[0];
            result.y = ((float32_t *)address)[1];
            result.z = ((float32_t *)address)[2];
            result.w = ((float32_t *)address)[3];
            break;
        case F64:
            result.x = ((float64_t *)address)[0];
            break;
        case F64X2:
            result.x = ((float64_t *)address)[0];
            result.y = ((float64_t *)address)[1];
            break;
        case F64X4:
            result.x = ((float64_t *)address)[0];
            result.y = ((float64_t *)address)[1];
            result.z = ((float64_t *)address)[2];
            result.w = ((float64_t *)address)[3];
            break;
        case S8:
            result.x = ((int8_t *)address)[0];
            break;
        case S16:
            result.x = ((int16_t *)address)[0];
            break;
        case S32:
            result.x = ((int32_t *)address)[0];
            break;
        case S32X2:
            result.x = ((int32_t *)address)[0];
            result.y = ((int32_t *)address)[1];
            break;
        case S32X4:
            result.x = ((int32_t *)address)[0];
            result.y = ((int32_t *)address)[1];
            result.z = ((int32_t *)address)[2];
            result.w = ((int32_t *)address)[3];
            break;
        case S64:
            result.x = ((int64_t *)address)[0];
            break;
        case S64X2:
            result.x = ((int64_t *)address)[0];
            result.y = ((int64_t *)address)[1];
            break;
        case S64X4:
            result.x = ((int64_t *)address)[0];
            result.y = ((int64_t *)address)[1];
            result.z = ((int64_t *)address)[2];
            result.w = ((int64_t *)address)[3];
            break;
        case U8:
            result.x = ((uint8_t *)address)[0];
            break;
        case U16:
            result.x = ((uint16_t *)address)[0];
            break;
        case U32:
            result.x = ((uint32_t *)address)[0];
            break;
        case U32X2:
            result.x = ((uint32_t *)address)[0];
            result.y = ((uint32_t *)address)[1];
            break;
        case U32X4:
            result.x = ((uint32_t *)address)[0];
            result.y = ((uint32_t *)address)[1];
            result.z = ((uint32_t *)address)[2];
            result.w = ((uint32_t *)address)[3];
            break;
        case U64:
            result.x = ((uint64_t *)address)[0];
            break;
        case U64X2:
            result.x = ((uint64_t *)address)[0];
            result.y = ((uint64_t *)address)[1];
            break;
        case U64X4:
            result.x = ((uint64_t *)address)[0];
            result.y = ((uint64_t *)address)[1];
            result.z = ((uint64_t *)address)[2];
            result.w = ((uint64_t *)address)[3];
            break;
        }
        return result;
      }

    Store instructions write the contents of a four-component vector operand
    into 8, 16, 32, 64, 128, or 256 bits, according to the storage modifier
    specified with the instruction.  The storage modifiers supported by stores
    are identical to those supported for loads.  Given a GPU address
    <address>, a vector operand <operand> containing the data to be stored,
    and a storage modifier <modifier>, the memory store can be described by
    the following code:

      void BufferMemoryStore(char *address, operand_t_vec operand,
                             OpModifier modifier)
      {
        switch (modifier) {
        case F32:
            ((float32_t *)address)[0] = operand.x;
            break;
        case F32X2:
            ((float32_t *)address)[0] = operand.x;
            ((float32_t *)address)[1] = operand.y;
            break;
        case F32X4:
            ((float32_t *)address)[0] = operand.x;
            ((float32_t *)address)[1] = operand.y;
            ((float32_t *)address)[2] = operand.z;
            ((float32_t *)address)[3] = operand.w;
            break;
        case F64:
            ((float64_t *)address)[0] = operand.x;
            break;
        case F64X2:
            ((float64_t *)address)[0] = operand.x;
            ((float64_t *)address)[1] = operand.y;
            break;
        case F64X4:
            ((float64_t *)address)[0] = operand.x;
            ((float64_t *)address)[1] = operand.y;
            ((float64_t *)address)[2] = operand.z;
            ((float64_t *)address)[3] = operand.w;
            break;
        case S8:
            ((int8_t *)address)[0] = operand.x;
            break;
        case S16:
            ((int16_t *)address)[0] = operand.x;
            break;
        case S32:
            ((int32_t *)address)[0] = operand.x;
            break;
        case S32X2:
            ((int32_t *)address)[0] = operand.x;
            ((int32_t *)address)[1] = operand.y;
            break;
        case S32X4:
            ((int32_t *)address)[0] = operand.x;
            ((int32_t *)address)[1] = operand.y;
            ((int32_t *)address)[2] = operand.z;
            ((int32_t *)address)[3] = operand.w;
            break;
        case S64:
            ((int64_t *)address)[0] = operand.x;
            break;
        case S64X2:
            ((int64_t *)address)[0] = operand.x;
            ((int64_t *)address)[1] = operand.y;
            break;
        case S64X4:
            ((int64_t *)address)[0] = operand.x;
            ((int64_t *)address)[1] = operand.y;
            ((int64_t *)address)[2] = operand.z;
            ((int64_t *)address)[3] = operand.w;
            break;
        case U8:
            ((uint8_t *)address)[0] = operand.x;
            break;
        case U16:
            ((uint16_t *)address)[0] = operand.x;
            break;
        case U32:
            ((uint32_t *)address)[0] = operand.x;
            break;
        case U32X2:
            ((uint32_t *)address)[0] = operand.x;
            ((uint32_t *)address)[1] = operand.y;
            break;
        case U32X4:
            ((uint32_t *)address)[0] = operand.x;
            ((uint32_t *)address)[1] = operand.y;
            ((uint32_t *)address)[2] = operand.z;
            ((uint32_t *)address)[3] = operand.w;
            break;
        case U64:
            ((uint64_t *)address)[0] = operand.x;
            break;
        case U64X2:
            ((uint64_t *)address)[0] = operand.x;
            ((uint64_t *)address)[1] = operand.y;
            break;
        case U64X4:
            ((uint64_t *)address)[0] = operand.x;
            ((uint64_t *)address)[1] = operand.y;
            ((uint64_t *)address)[2] = operand.z;
            ((uint64_t *)address)[3] = operand.w;
            break;
        }
      }
13135bd8deadSopenharmony_ci
13145bd8deadSopenharmony_ci    If a global load or store accesses a memory address that does not
13155bd8deadSopenharmony_ci    correspond to a buffer object made resident by MakeBufferResidentNV, the
13165bd8deadSopenharmony_ci    results of the operation are undefined and may produce a fault resulting
13175bd8deadSopenharmony_ci    in application termination.  If a load accesses a buffer object made
13185bd8deadSopenharmony_ci    resident with an <access> parameter of WRITE_ONLY, or if a store accesses
13195bd8deadSopenharmony_ci    a buffer object made resident with an <access> parameter of READ_ONLY, the
13205bd8deadSopenharmony_ci    results of the operation are also undefined and may lead to application
13215bd8deadSopenharmony_ci    termination.
13225bd8deadSopenharmony_ci
13235bd8deadSopenharmony_ci    The address used for global memory loads or stores or offset used for
13245bd8deadSopenharmony_ci    constant buffer loads must be aligned to the fetch size corresponding to
13255bd8deadSopenharmony_ci    the storage opcode modifier.  For S8 and U8, the offset has no alignment
13265bd8deadSopenharmony_ci    requirements.  For S16 and U16, the offset must be a multiple of two basic
13275bd8deadSopenharmony_ci    machine units.  For F32, S32, and U32, the offset must be a multiple of
13285bd8deadSopenharmony_ci    four.  For F32X2, F64, S32X2, S64, U32X2, and U64, the offset must be a
13295bd8deadSopenharmony_ci    multiple of eight.  For F32X4, F64X2, S32X4, S64X2, U32X4, and U64X2, the
13305bd8deadSopenharmony_ci    offset must be a multiple of sixteen.  For F64X4, S64X4, and U64X4, the
13315bd8deadSopenharmony_ci    offset must be a multiple of thirty-two.  If an offset is not correctly
13325bd8deadSopenharmony_ci    aligned, the values returned by a buffer memory load will be undefined,
13335bd8deadSopenharmony_ci    and the effects of a buffer memory store will also be undefined.
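
    As an illustration of the alignment rules above, the following C sketch
    maps each storage modifier to its required alignment in basic machine
    units.  The StorageModifier enumeration and both function names are
    hypothetical and are not part of this extension; an implementation is
    not required to detect misaligned accesses at all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enumeration; the names mirror the storage modifiers in the text. */
typedef enum {
    SM_S8, SM_U8, SM_S16, SM_U16, SM_F32, SM_S32, SM_U32,
    SM_F32X2, SM_F64, SM_S32X2, SM_S64, SM_U32X2, SM_U64,
    SM_F32X4, SM_F64X2, SM_S32X4, SM_S64X2, SM_U32X4, SM_U64X2,
    SM_F64X4, SM_S64X4, SM_U64X4
} StorageModifier;

/* Required alignment, in basic machine units, for a storage modifier. */
size_t required_alignment(StorageModifier m)
{
    switch (m) {
    case SM_S8: case SM_U8:
        return 1;   /* no alignment requirement */
    case SM_S16: case SM_U16:
        return 2;
    case SM_F32: case SM_S32: case SM_U32:
        return 4;
    case SM_F32X2: case SM_F64: case SM_S32X2:
    case SM_S64: case SM_U32X2: case SM_U64:
        return 8;
    case SM_F32X4: case SM_F64X2: case SM_S32X4:
    case SM_S64X2: case SM_U32X4: case SM_U64X2:
        return 16;
    case SM_F64X4: case SM_S64X4: case SM_U64X4:
        return 32;
    }
    return 1;       /* not reached for valid enumerants */
}

/* True if an address or offset satisfies the alignment rule for <m>. */
bool is_aligned(uint64_t address, StorageModifier m)
{
    return (address % required_alignment(m)) == 0;
}
```

    For example, an offset of 24 is acceptable for U64 but not for U64X2,
    which requires a multiple of sixteen.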
13345bd8deadSopenharmony_ci
13355bd8deadSopenharmony_ci    Global and image memory accesses in assembly programs are weakly ordered
13365bd8deadSopenharmony_ci    and may require synchronization relative to other operations in the OpenGL
    pipeline.  The ordering and synchronization mechanisms described in
13385bd8deadSopenharmony_ci    Section 2.14.X (of the EXT_shader_image_load_store extension
13395bd8deadSopenharmony_ci    specification) for shaders using the OpenGL Shading Language apply equally
13405bd8deadSopenharmony_ci    to loads, stores, and atomics performed in assembly programs.
13415bd8deadSopenharmony_ci
13425bd8deadSopenharmony_ci
13435bd8deadSopenharmony_ci    Modify Section 2.X.6.Y of the NV_fragment_program4 specification
13445bd8deadSopenharmony_ci
13455bd8deadSopenharmony_ci    (add new option section)
13465bd8deadSopenharmony_ci
13475bd8deadSopenharmony_ci    + Early Per-Fragment Tests (NV_early_fragment_tests)
13485bd8deadSopenharmony_ci
13495bd8deadSopenharmony_ci    If a fragment program specifies the "NV_early_fragment_tests" option, the
13505bd8deadSopenharmony_ci    depth and stencil tests will be performed prior to fragment program
13515bd8deadSopenharmony_ci    invocation, as described in Section 3.X.
13525bd8deadSopenharmony_ci    
13535bd8deadSopenharmony_ci
13545bd8deadSopenharmony_ci    Modify Section 2.X.7.Y of the NV_geometry_program4 specification
13555bd8deadSopenharmony_ci
13565bd8deadSopenharmony_ci    (Simply add the new input primitive type "PATCHES" to the list of tokens
13575bd8deadSopenharmony_ci     allowed by the "PRIMITIVE_IN" declaration.)
13585bd8deadSopenharmony_ci
13595bd8deadSopenharmony_ci    - Input Primitive Type (PRIMITIVE_IN)
13605bd8deadSopenharmony_ci
13615bd8deadSopenharmony_ci    The PRIMITIVE_IN statement declares the type of primitives seen by a
13625bd8deadSopenharmony_ci    geometry program.  The single argument must be one of "POINTS", "LINES",
13635bd8deadSopenharmony_ci    "LINES_ADJACENCY", "TRIANGLES", "TRIANGLES_ADJACENCY", or "PATCHES".
13645bd8deadSopenharmony_ci    
13655bd8deadSopenharmony_ci
13665bd8deadSopenharmony_ci    (Add a new optional program declaration to declare a geometry shader that
13675bd8deadSopenharmony_ci     is run <N> times per primitive.)
13685bd8deadSopenharmony_ci
13695bd8deadSopenharmony_ci    Geometry programs support three types of mandatory declaration statements,
13705bd8deadSopenharmony_ci    as described below.  Each of the three must be included exactly once in
13715bd8deadSopenharmony_ci    the geometry program.
13725bd8deadSopenharmony_ci    
13735bd8deadSopenharmony_ci    ...
13745bd8deadSopenharmony_ci
13755bd8deadSopenharmony_ci    Geometry programs also support one optional declaration statement.
13765bd8deadSopenharmony_ci
13775bd8deadSopenharmony_ci    - Program Invocation Count (INVOCATIONS)
13785bd8deadSopenharmony_ci
13795bd8deadSopenharmony_ci    The INVOCATIONS statement declares the number of times the geometry
13805bd8deadSopenharmony_ci    program is run on each primitive processed.  The single argument must be a
13815bd8deadSopenharmony_ci    positive integer less than or equal to the value of the
13825bd8deadSopenharmony_ci    implementation-dependent limit MAX_GEOMETRY_PROGRAM_INVOCATIONS_NV.  Each
13835bd8deadSopenharmony_ci    invocation of the geometry program will have the same inputs and outputs
13845bd8deadSopenharmony_ci    except for the built-in input variable "primitive.invocation".  This
13855bd8deadSopenharmony_ci    variable will be an integer between 0 and <n>-1, where <n> is the declared
13865bd8deadSopenharmony_ci    number of invocations.  If omitted, the program invocation count is one.
13875bd8deadSopenharmony_ci
13885bd8deadSopenharmony_ci
13895bd8deadSopenharmony_ci    Section 2.X.8.Z, ATOM:  Atomic Global Memory Operation
13905bd8deadSopenharmony_ci
13915bd8deadSopenharmony_ci    The ATOM instruction performs an atomic global memory operation by reading
13925bd8deadSopenharmony_ci    from memory at the address specified by the second unsigned integer scalar
13935bd8deadSopenharmony_ci    operand, computing a new value based on the value read from memory and the
13945bd8deadSopenharmony_ci    first (vector) operand, and then writing the result back to the same
13955bd8deadSopenharmony_ci    memory address.  The memory transaction is atomic, guaranteeing that no
13965bd8deadSopenharmony_ci    other write to the memory accessed will occur between the time it is read
13975bd8deadSopenharmony_ci    and written by the ATOM instruction.  The result of the ATOM instruction
13985bd8deadSopenharmony_ci    is the scalar value read from memory.
13995bd8deadSopenharmony_ci
14005bd8deadSopenharmony_ci    The ATOM instruction has two required instruction modifiers.  The atomic
14015bd8deadSopenharmony_ci    modifier specifies the type of operation to be performed.  The storage
14025bd8deadSopenharmony_ci    modifier specifies the size and data type of the operand read from memory
14035bd8deadSopenharmony_ci    and the base data type of the operation used to compute the value to be
14045bd8deadSopenharmony_ci    written to memory.
14055bd8deadSopenharmony_ci
14065bd8deadSopenharmony_ci      atomic     storage
14075bd8deadSopenharmony_ci      modifier   modifiers            operation
14085bd8deadSopenharmony_ci      --------   ------------------   --------------------------------------
14095bd8deadSopenharmony_ci       ADD       U32, S32, U64        compute a sum
14105bd8deadSopenharmony_ci       MIN       U32, S32             compute minimum
14115bd8deadSopenharmony_ci       MAX       U32, S32             compute maximum
14125bd8deadSopenharmony_ci       IWRAP     U32                  increment memory, wrapping at operand
14135bd8deadSopenharmony_ci       DWRAP     U32                  decrement memory, wrapping at operand
14145bd8deadSopenharmony_ci       AND       U32, S32             compute bit-wise AND
14155bd8deadSopenharmony_ci       OR        U32, S32             compute bit-wise OR
14165bd8deadSopenharmony_ci       XOR       U32, S32             compute bit-wise XOR
14175bd8deadSopenharmony_ci       EXCH      U32, S32, U64        exchange memory with operand
14185bd8deadSopenharmony_ci       CSWAP     U32, S32, U64        compare-and-swap
14195bd8deadSopenharmony_ci
14205bd8deadSopenharmony_ci     Table X.Y, Supported atomic and storage modifiers for the ATOM
14215bd8deadSopenharmony_ci     instruction.
14225bd8deadSopenharmony_ci
14235bd8deadSopenharmony_ci    Not all storage modifiers are supported by ATOM, and the set of modifiers
14245bd8deadSopenharmony_ci    allowed for any given instruction depends on the atomic modifier
14255bd8deadSopenharmony_ci    specified.  Table X.Y enumerates the set of atomic modifiers supported by
14265bd8deadSopenharmony_ci    the ATOM instruction, and the storage modifiers allowed for each.
14275bd8deadSopenharmony_ci
14285bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
14295bd8deadSopenharmony_ci      address = ScalarLoad(op1);
14305bd8deadSopenharmony_ci      result = BufferMemoryLoad(address, storageModifier);
14315bd8deadSopenharmony_ci      switch (atomicModifier) {
14325bd8deadSopenharmony_ci      case ADD:
14335bd8deadSopenharmony_ci        writeval = tmp0.x + result;
14345bd8deadSopenharmony_ci        break;
14355bd8deadSopenharmony_ci      case MIN:
14365bd8deadSopenharmony_ci        writeval = min(tmp0.x, result);
14375bd8deadSopenharmony_ci        break;
14385bd8deadSopenharmony_ci      case MAX:
14395bd8deadSopenharmony_ci        writeval = max(tmp0.x, result);
14405bd8deadSopenharmony_ci        break;
14415bd8deadSopenharmony_ci      case IWRAP:
14425bd8deadSopenharmony_ci        writeval = (result >= tmp0.x) ? 0 : result+1; 
14435bd8deadSopenharmony_ci        break;
14445bd8deadSopenharmony_ci      case DWRAP:
14455bd8deadSopenharmony_ci        writeval = (result == 0 || result > tmp0.x) ? tmp0.x : result-1;
14465bd8deadSopenharmony_ci        break;
14475bd8deadSopenharmony_ci      case AND:
14485bd8deadSopenharmony_ci        writeval = tmp0.x & result;
14495bd8deadSopenharmony_ci        break;
14505bd8deadSopenharmony_ci      case OR:
14515bd8deadSopenharmony_ci        writeval = tmp0.x | result;
14525bd8deadSopenharmony_ci        break;
14535bd8deadSopenharmony_ci      case XOR:
14545bd8deadSopenharmony_ci        writeval = tmp0.x ^ result;
14555bd8deadSopenharmony_ci        break;
      case EXCH:
        writeval = tmp0.x;
        break;
14585bd8deadSopenharmony_ci      case CSWAP:
14595bd8deadSopenharmony_ci        if (result == tmp0.x) {
14605bd8deadSopenharmony_ci          writeval = tmp0.y;
14615bd8deadSopenharmony_ci        } else {
14625bd8deadSopenharmony_ci          return result;  // no memory store
14635bd8deadSopenharmony_ci        }
14645bd8deadSopenharmony_ci        break;
14655bd8deadSopenharmony_ci      }
14665bd8deadSopenharmony_ci      BufferMemoryStore(address, writeval, storageModifier);
14675bd8deadSopenharmony_ci
14685bd8deadSopenharmony_ci    ATOM performs a scalar atomic operation.  The <y>, <z>, and <w> components
14695bd8deadSopenharmony_ci    of the result vector are undefined.
14705bd8deadSopenharmony_ci      
    ATOM supports no base data type modifiers, but requires exactly one
    storage modifier.  The base data types of the result vector and the first
    (vector) operand are derived from the storage modifier.  The second
    operand is always interpreted as a scalar unsigned integer.
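
    The IWRAP, DWRAP, and CSWAP cases above can be modeled for the U32
    storage modifier with the C sketch below.  The function names are
    hypothetical, and the code is a single-threaded model only:  it makes no
    attempt to provide the atomicity that ATOM guarantees.

```c
#include <stdint.h>

/* Value written by IWRAP: increment memory, wrapping to zero at <wrap>. */
uint32_t atom_iwrap_u32(uint32_t mem, uint32_t wrap)
{
    return (mem >= wrap) ? 0u : mem + 1u;
}

/* Value written by DWRAP: decrement memory, wrapping to <wrap> at zero
 * (or when memory already exceeds <wrap>). */
uint32_t atom_dwrap_u32(uint32_t mem, uint32_t wrap)
{
    return (mem == 0u || mem > wrap) ? wrap : mem - 1u;
}

/* CSWAP: store <swap> only if memory equals <compare>.  The value read
 * from memory is returned in either case, as with every ATOM operation. */
uint32_t atom_cswap_u32(uint32_t *mem, uint32_t compare, uint32_t swap)
{
    uint32_t old = *mem;
    if (old == compare) {
        *mem = swap;
    }
    return old;
}
```

    With a wrap operand of 3, repeated IWRAP operations cycle memory
    through 0, 1, 2, 3, 0, and so on, while repeated DWRAP operations cycle
    through 3, 2, 1, 0, 3.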
14755bd8deadSopenharmony_ci
14765bd8deadSopenharmony_ci
14775bd8deadSopenharmony_ci    Section 2.X.8.Z, BFE:  Bitfield Extract
14785bd8deadSopenharmony_ci
    The BFE instruction performs a component-wise bit extraction of the
    second vector operand to yield a result vector.  For each component, the
    number of bits extracted is given by the x component of the first vector
    operand, and the bit number of the least significant bit extracted is
    given by the y component of the first vector operand.
14845bd8deadSopenharmony_ci
14855bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
14865bd8deadSopenharmony_ci      tmp1 = VectorLoad(op1);
14875bd8deadSopenharmony_ci      result.x = BitfieldExtract(tmp0.x, tmp0.y, tmp1.x);
14885bd8deadSopenharmony_ci      result.y = BitfieldExtract(tmp0.x, tmp0.y, tmp1.y);
14895bd8deadSopenharmony_ci      result.z = BitfieldExtract(tmp0.x, tmp0.y, tmp1.z);
14905bd8deadSopenharmony_ci      result.w = BitfieldExtract(tmp0.x, tmp0.y, tmp1.w);
14915bd8deadSopenharmony_ci
14925bd8deadSopenharmony_ci    If the number of bits to extract is zero, zero is returned.  The results
14935bd8deadSopenharmony_ci    of bitfield extraction are undefined
14945bd8deadSopenharmony_ci
14955bd8deadSopenharmony_ci      * if the number of bits to extract or the starting offset is negative,
14965bd8deadSopenharmony_ci      * if the sum of the number of bits to extract and the starting offset
14975bd8deadSopenharmony_ci        is greater than the total number of bits in the operand/result, or
14985bd8deadSopenharmony_ci      * if the starting offset is greater than or equal to the total number of
14995bd8deadSopenharmony_ci        bits in the operand/result.
15005bd8deadSopenharmony_ci
15015bd8deadSopenharmony_ci      Type BitfieldExtract(Type bits, Type offset, Type value)
15025bd8deadSopenharmony_ci      {
15035bd8deadSopenharmony_ci        if (bits < 0 || offset < 0 || offset >= TotalBits(Type) ||
15045bd8deadSopenharmony_ci            bits + offset > TotalBits(Type)) {
15055bd8deadSopenharmony_ci          /* result undefined */
15065bd8deadSopenharmony_ci        } else if (bits == 0) {
15075bd8deadSopenharmony_ci          return 0;
15085bd8deadSopenharmony_ci        } else {
15095bd8deadSopenharmony_ci          return (value << (TotalBits(Type) - (bits+offset))) >>
                   (TotalBits(Type) - bits);
15115bd8deadSopenharmony_ci        }
15125bd8deadSopenharmony_ci      }
15135bd8deadSopenharmony_ci
15145bd8deadSopenharmony_ci    BFE supports only signed and unsigned integer data type modifiers.  For
15155bd8deadSopenharmony_ci    signed integer data types, the extracted value is sign-extended (i.e.,
15165bd8deadSopenharmony_ci    filled with ones if the most significant bit extracted is one and filled
15175bd8deadSopenharmony_ci    with zeroes otherwise).  For unsigned integer data types, the extracted
15185bd8deadSopenharmony_ci    value is zero-extended.
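
    A concrete 32-bit version of BitfieldExtract might look like the C
    sketch below.  The function names are hypothetical.  The bit count and
    offset are taken as unsigned values, so the negative cases cannot arise;
    for the remaining undefined cases the sketch arbitrarily returns zero,
    which an implementation is not required to do.  The final conversion to
    int32_t assumes the usual two's complement behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if <bits> and <offset> select a field lying entirely in 32 bits. */
bool bfe_args_defined(uint32_t bits, uint32_t offset)
{
    return offset < 32u && bits <= 32u - offset;
}

/* Unsigned extraction: the result is zero-extended. */
uint32_t bfe_u32(uint32_t bits, uint32_t offset, uint32_t value)
{
    if (!bfe_args_defined(bits, offset) || bits == 0u) {
        return 0u;  /* zero bits, or undefined by the spec (arbitrary) */
    }
    uint32_t field = value >> offset;
    if (bits < 32u) {
        field &= (1u << bits) - 1u;
    }
    return field;
}

/* Signed extraction: the result is sign-extended from the top field bit. */
int32_t bfe_s32(uint32_t bits, uint32_t offset, int32_t value)
{
    if (!bfe_args_defined(bits, offset) || bits == 0u) {
        return 0;   /* zero bits, or undefined by the spec (arbitrary) */
    }
    uint32_t field = bfe_u32(bits, offset, (uint32_t)value);
    uint32_t sign = 1u << (bits - 1u);
    /* (field ^ sign) - sign replicates the field's top bit upward. */
    return (int32_t)((field ^ sign) - sign);
}
```

    Extracting four bits at offset 12 from 0x0000F000 yields 15 with the
    unsigned form and -1 with the signed form.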
15195bd8deadSopenharmony_ci
15205bd8deadSopenharmony_ci
15215bd8deadSopenharmony_ci    Section 2.X.8.Z, BFI:  Bitfield Insert
15225bd8deadSopenharmony_ci
15235bd8deadSopenharmony_ci    The BFI instruction performs a component-wise bitfield insertion of the
15245bd8deadSopenharmony_ci    second vector operand into the third vector operand to yield a result
15255bd8deadSopenharmony_ci    vector.  For each component, the <n> least significant bits are extracted
15265bd8deadSopenharmony_ci    from the corresponding component of the second vector operand, where <n>
15275bd8deadSopenharmony_ci    is given by the x component of the first vector operand.  Those bits are
15285bd8deadSopenharmony_ci    merged into the corresponding component of the third vector operand,
15295bd8deadSopenharmony_ci    replacing bits <b> through <b>+<n>-1, to produce the result.  The bit
15305bd8deadSopenharmony_ci    offset <b> is specified by the y component of the first operand.
15315bd8deadSopenharmony_ci
15325bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
15335bd8deadSopenharmony_ci      tmp1 = VectorLoad(op1);
15345bd8deadSopenharmony_ci      tmp2 = VectorLoad(op2);
      result.x = BitfieldInsert(tmp0.x, tmp0.y, tmp1.x, tmp2.x);
      result.y = BitfieldInsert(tmp0.x, tmp0.y, tmp1.y, tmp2.y);
      result.z = BitfieldInsert(tmp0.x, tmp0.y, tmp1.z, tmp2.z);
      result.w = BitfieldInsert(tmp0.x, tmp0.y, tmp1.w, tmp2.w);
15395bd8deadSopenharmony_ci
15405bd8deadSopenharmony_ci    The results of bitfield insertion are undefined
15415bd8deadSopenharmony_ci
15425bd8deadSopenharmony_ci      * if the number of bits to insert or the starting offset is negative,
15435bd8deadSopenharmony_ci      * if the sum of the number of bits to insert and the starting offset
15445bd8deadSopenharmony_ci        is greater than the total number of bits in the operand/result, or
15455bd8deadSopenharmony_ci      * if the starting offset is greater than or equal to the total number of
15465bd8deadSopenharmony_ci        bits in the operand/result.
15475bd8deadSopenharmony_ci
15485bd8deadSopenharmony_ci      Type BitfieldInsert(Type bits, Type offset, Type src, Type dst)
15495bd8deadSopenharmony_ci      {
        if (bits < 0 || offset < 0 || offset >= TotalBits(Type) ||
15515bd8deadSopenharmony_ci            bits + offset > TotalBits(Type)) {
15525bd8deadSopenharmony_ci          /* result undefined */
15535bd8deadSopenharmony_ci        } else if (bits == TotalBits(Type)) {
15545bd8deadSopenharmony_ci          return src;
15555bd8deadSopenharmony_ci        } else {
15565bd8deadSopenharmony_ci          Type mask = ((1 << bits) - 1) << offset;
15575bd8deadSopenharmony_ci          return ((src << offset) & mask) | (dst & (~mask));
15585bd8deadSopenharmony_ci        }
15595bd8deadSopenharmony_ci      }    
15605bd8deadSopenharmony_ci
15615bd8deadSopenharmony_ci    BFI supports only signed and unsigned integer data type modifiers.  If no
15625bd8deadSopenharmony_ci    type modifier is specified, the operand and result vectors are treated as
15635bd8deadSopenharmony_ci    signed integers.
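
    The BitfieldInsert pseudocode above can be written for 32-bit unsigned
    values as in the following C sketch.  The function name is hypothetical,
    and returning <dst> unchanged for the undefined cases is an arbitrary
    choice made only so that the sketch never executes an out-of-range
    shift.

```c
#include <stdint.h>

/* Insert the <bits> least significant bits of <src> into <dst>,
 * replacing bits <offset> through <offset>+<bits>-1. */
uint32_t bfi_u32(uint32_t bits, uint32_t offset, uint32_t src, uint32_t dst)
{
    if (offset >= 32u || bits > 32u - offset) {
        return dst;     /* undefined by the spec; arbitrary choice here */
    }
    if (bits == 32u) {
        return src;     /* avoids shifting by the full register width */
    }
    uint32_t mask = ((1u << bits) - 1u) << offset;
    return ((src << offset) & mask) | (dst & ~mask);
}
```

    Inserting the eight low bits of 0xAB at offset 8 into 0xFFFFFFFF gives
    0xFFFFABFF.  Inserting zero bits leaves the third operand unchanged.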
15645bd8deadSopenharmony_ci
15655bd8deadSopenharmony_ci
15665bd8deadSopenharmony_ci    Section 2.X.8.Z, BFR:  Bitfield Reverse
15675bd8deadSopenharmony_ci
15685bd8deadSopenharmony_ci    The BFR instruction performs a component-wise bit reversal of the single
15695bd8deadSopenharmony_ci    vector operand to produce a result vector.  Bit reversal is performed by
15705bd8deadSopenharmony_ci    exchanging the most and least significant bits, the second-most and
15715bd8deadSopenharmony_ci    second-least significant bits, and so on. 
15725bd8deadSopenharmony_ci
15735bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
15745bd8deadSopenharmony_ci      result.x = BitReverse(tmp0.x);
15755bd8deadSopenharmony_ci      result.y = BitReverse(tmp0.y);
15765bd8deadSopenharmony_ci      result.z = BitReverse(tmp0.z);
15775bd8deadSopenharmony_ci      result.w = BitReverse(tmp0.w);
15785bd8deadSopenharmony_ci
15795bd8deadSopenharmony_ci    BFR supports only signed and unsigned integer data type modifiers.  If no
15805bd8deadSopenharmony_ci    type modifier is specified, the operand and result vectors are treated as
15815bd8deadSopenharmony_ci    signed integers.
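
    One straightforward way to realize BitReverse for a 32-bit component is
    shown in the C sketch below.  The function name is hypothetical, and
    hardware would normally use a dedicated instruction rather than a loop.

```c
#include <stdint.h>

/* Reverse the order of the 32 bits in <v>. */
uint32_t bit_reverse_u32(uint32_t v)
{
    uint32_t r = 0u;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (v & 1u);    /* move the current low bit of v into r */
        v >>= 1;
    }
    return r;
}
```

    Applying the operation twice returns the original value.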
15825bd8deadSopenharmony_ci
15835bd8deadSopenharmony_ci
15845bd8deadSopenharmony_ci    Section 2.X.8.Z, BTC:  Bit Count
15855bd8deadSopenharmony_ci
15865bd8deadSopenharmony_ci    The BTC instruction performs a component-wise bit count of the single
15875bd8deadSopenharmony_ci    source vector to yield a result vector.  Each component of the result
15885bd8deadSopenharmony_ci    vector contains the number of one bits in the corresponding component of
15895bd8deadSopenharmony_ci    the source vector.
15905bd8deadSopenharmony_ci
15915bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
15925bd8deadSopenharmony_ci      result.x = BitCount(tmp0.x);
15935bd8deadSopenharmony_ci      result.y = BitCount(tmp0.y);
15945bd8deadSopenharmony_ci      result.z = BitCount(tmp0.z);
15955bd8deadSopenharmony_ci      result.w = BitCount(tmp0.w);
15965bd8deadSopenharmony_ci
    BTC supports only signed and unsigned integer data type modifiers.  If no
    type modifier is specified, the operand and result vectors are treated as
    signed integers.
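
    As a minimal sketch, BitCount for a 32-bit component can be computed in
    C by clearing the lowest one bit until none remain.  The function name
    is hypothetical.

```c
#include <stdint.h>

/* Number of one bits in <v>. */
uint32_t bit_count_u32(uint32_t v)
{
    uint32_t n = 0u;
    while (v != 0u) {
        v &= v - 1u;    /* clears the least significant one bit */
        ++n;
    }
    return n;
}
```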
16005bd8deadSopenharmony_ci
16015bd8deadSopenharmony_ci
16025bd8deadSopenharmony_ci    Section 2.X.8.Z, BTFL:  Find Least Significant Bit
16035bd8deadSopenharmony_ci
    The BTFL instruction searches for the least significant one bit of each
    component of the single source vector, yielding a result vector comprising
    the bit number of the located bit for each component.
16075bd8deadSopenharmony_ci
16085bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
16095bd8deadSopenharmony_ci      result.x = FindLSB(tmp0.x);
16105bd8deadSopenharmony_ci      result.y = FindLSB(tmp0.y);
16115bd8deadSopenharmony_ci      result.z = FindLSB(tmp0.z);
16125bd8deadSopenharmony_ci      result.w = FindLSB(tmp0.w);
16135bd8deadSopenharmony_ci
16145bd8deadSopenharmony_ci    BTFL supports only signed and unsigned integer data type modifiers.  For
16155bd8deadSopenharmony_ci    unsigned integer data types, the search will yield the bit number of the
16165bd8deadSopenharmony_ci    least significant one bit in each component, or the maximum integer (all
16175bd8deadSopenharmony_ci    bits are ones) if the source vector component is zero.  For signed data
16185bd8deadSopenharmony_ci    types, the search will yield the bit number of the least significant one
    bit in each component, or -1 if the source vector component is zero.  If
    no type modifier is specified, the operand and result vectors are treated
    as signed integers.
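
    The signed and unsigned forms of FindLSB differ only in how a zero
    source is reported, as the following C sketch for 32-bit components
    illustrates.  The function names are hypothetical.

```c
#include <stdint.h>

/* Unsigned form: bit number of the lowest one bit, or all ones for zero. */
uint32_t find_lsb_u32(uint32_t v)
{
    if (v == 0u) {
        return UINT32_MAX;
    }
    uint32_t bit = 0u;
    while ((v & 1u) == 0u) {
        v >>= 1;
        ++bit;
    }
    return bit;
}

/* Signed form: bit number of the lowest one bit, or -1 for zero. */
int32_t find_lsb_s32(int32_t v)
{
    return (v == 0) ? -1 : (int32_t)find_lsb_u32((uint32_t)v);
}
```

    The two zero results have the same bit pattern; only the interpretation
    of the result register differs.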
16225bd8deadSopenharmony_ci
16235bd8deadSopenharmony_ci
16245bd8deadSopenharmony_ci    Section 2.X.8.Z, BTFM:  Find Most Significant Bit
16255bd8deadSopenharmony_ci
16265bd8deadSopenharmony_ci    The BTFM instruction searches for the most significant bit of each
16275bd8deadSopenharmony_ci    component of the single source vector, yielding a result vector comprising
16285bd8deadSopenharmony_ci    the bit number of the located bit for each component.  
16295bd8deadSopenharmony_ci
16305bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
16315bd8deadSopenharmony_ci      result.x = FindMSB(tmp0.x);
16325bd8deadSopenharmony_ci      result.y = FindMSB(tmp0.y);
16335bd8deadSopenharmony_ci      result.z = FindMSB(tmp0.z);
16345bd8deadSopenharmony_ci      result.w = FindMSB(tmp0.w);
16355bd8deadSopenharmony_ci
    BTFM supports only signed and unsigned integer data type modifiers.  For
    unsigned integer data types, the search will yield the bit number of the
    most significant one bit in each component, or the maximum integer (all
    bits are ones) if the source vector component is zero.  For signed data
    types, the search will yield the bit number of the most significant one
    bit if the source value is positive, the bit number of the most
    significant zero bit if the source value is negative, or -1 if the source
    value is zero.  If no type modifier is specified, the operand and result
    vectors are treated as signed integers.
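
    The following C sketch illustrates FindMSB for 32-bit components.  The
    function names are hypothetical.  The text above does not say what the
    signed form returns for a source of -1, which has no zero bit; the
    sketch assumes -1, matching the behavior of the findMSB built-in in the
    OpenGL Shading Language.

```c
#include <stdint.h>

/* Unsigned form: bit number of the highest one bit, or all ones for zero. */
uint32_t find_msb_u32(uint32_t v)
{
    if (v == 0u) {
        return UINT32_MAX;
    }
    uint32_t bit = 31u;
    while ((v & 0x80000000u) == 0u) {
        v <<= 1;
        --bit;
    }
    return bit;
}

/* Signed form: highest one bit for positive values, highest zero bit for
 * negative values, and -1 for zero (and, by assumption, for -1). */
int32_t find_msb_s32(int32_t v)
{
    /* The highest zero bit of a negative value is the highest one bit of
     * its bitwise complement. */
    uint32_t u = (v < 0) ? ~(uint32_t)v : (uint32_t)v;
    return (u == 0u) ? -1 : (int32_t)find_msb_u32(u);
}
```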
16455bd8deadSopenharmony_ci
16465bd8deadSopenharmony_ci
16475bd8deadSopenharmony_ci    Section 2.X.8.Z, CVT:  Data Type Conversion
16485bd8deadSopenharmony_ci
16495bd8deadSopenharmony_ci    The CVT instruction converts each component of the single source vector
16505bd8deadSopenharmony_ci    from one specified data type to another to yield a result vector.
16515bd8deadSopenharmony_ci
16525bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
16535bd8deadSopenharmony_ci      result = DataTypeConvert(tmp0);
16545bd8deadSopenharmony_ci
16555bd8deadSopenharmony_ci    The CVT instruction requires two storage modifiers.  The first specifies
16565bd8deadSopenharmony_ci    the data type of the result components; the second specifies the data type
16575bd8deadSopenharmony_ci    of the operand components.  The supported storage modifiers are F16, F32,
16585bd8deadSopenharmony_ci    F64, S8, S16, S32, S64, U8, U16, U32, and U64.  A storage modifier of
16595bd8deadSopenharmony_ci    "F16" indicates a source or destination that is treated as having a
16605bd8deadSopenharmony_ci    floating-point type, but whose sixteen least significant bits describe a
16615bd8deadSopenharmony_ci    16-bit floating-point value using the encoding provided in Section 2.1.2.
16625bd8deadSopenharmony_ci
16635bd8deadSopenharmony_ci    If the component size of the source register doesn't match the size of the
16645bd8deadSopenharmony_ci    specified operand data type, the source register components are first
16655bd8deadSopenharmony_ci    interpreted as a value with the same base data type as the operand and
16665bd8deadSopenharmony_ci    converted to the operand data type.  The operand components are then
16675bd8deadSopenharmony_ci    converted to the result data type.  Finally, if the component size of the
16685bd8deadSopenharmony_ci    destination register doesn't match the specified result data type, the
16695bd8deadSopenharmony_ci    result components are converted to values of the same base data type with
16705bd8deadSopenharmony_ci    a size matching the result register's component size.
16715bd8deadSopenharmony_ci
16725bd8deadSopenharmony_ci    Data type conversion is performed by first converting the source
16735bd8deadSopenharmony_ci    components to an infinite-precision value of the destination data type,
16745bd8deadSopenharmony_ci    and then converting to the result data type.  When converting between
16755bd8deadSopenharmony_ci    floating-point and integer values, integer values are never interpreted as
16765bd8deadSopenharmony_ci    being normalized to [0,1] or [-1,+1].  Converting the floating-point
16775bd8deadSopenharmony_ci    special values -INF, +INF, and NaN to integers will yield undefined
16785bd8deadSopenharmony_ci    results.
16795bd8deadSopenharmony_ci
    When converting from a non-integral floating-point value to an integer,
    one of the two integers closest in value to the floating-point value is
    chosen according to the rounding instruction modifier.  If "CEIL" or "FLR"
    is specified, the larger or smaller value, respectively, is chosen.  If
    "TRUNC" is specified, the value nearest to zero is chosen.  If "ROUND" is
    specified and one integer is nearer in value to the original
    floating-point value, it is chosen; otherwise, the even integer is chosen.
    "ROUND" is used if no rounding modifier is specified.
16885bd8deadSopenharmony_ci
16895bd8deadSopenharmony_ci    When converting from the infinite-precision intermediate value to the
16905bd8deadSopenharmony_ci    destination data type:
16915bd8deadSopenharmony_ci
      * Floating-point values not exactly representable in the destination
        data type are rounded to one of the two nearest values in the
        destination type according to the rounding modifier.  Note that the
        results of float-to-float conversion are not automatically rounded
        to integer values, even if a rounding modifier such as CEIL or FLR
        is specified.
16975bd8deadSopenharmony_ci
16985bd8deadSopenharmony_ci      * Integer values are clamped to the closest value representable in the
16995bd8deadSopenharmony_ci        result data type if the "SAT" (saturation) modifier is specified.
17005bd8deadSopenharmony_ci
17015bd8deadSopenharmony_ci      * Integer values drop the most significant bits if the "SAT" modifier is
17025bd8deadSopenharmony_ci        not specified.
17035bd8deadSopenharmony_ci
17045bd8deadSopenharmony_ci    Negation and absolute value operators are not supported on the source
17055bd8deadSopenharmony_ci    operand; a program using such operators will fail to compile.
17065bd8deadSopenharmony_ci
17075bd8deadSopenharmony_ci    CVT supports no data type modifiers; the type of the operand and result
17085bd8deadSopenharmony_ci    vectors is fully specified by the required storage modifiers.
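
    The float-to-integer rules above can be sketched in C as a two-step
    process:  choose an integer according to the rounding modifier, then
    either clamp it or drop its most significant bits depending on "SAT".
    The enumeration and function names below are hypothetical, S8 is used as
    the example result type, and the source is assumed to be finite and
    within the range of a 64-bit integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the four rounding modifiers. */
typedef enum { RND_CEIL, RND_FLR, RND_TRUNC, RND_ROUND } RoundingModifier;

/* Choose one of the two integers nearest to <x>. */
int64_t round_to_integer(double x, RoundingModifier mode)
{
    int64_t t = (int64_t)x;         /* C conversion truncates toward zero */
    double frac = x - (double)t;    /* same sign as x, magnitude below 1 */
    double mag = (frac < 0.0) ? -frac : frac;
    int64_t away = (frac < 0.0) ? t - 1 : t + 1;

    switch (mode) {
    case RND_TRUNC:
        return t;
    case RND_FLR:
        return (frac < 0.0) ? t - 1 : t;
    case RND_CEIL:
        return (frac > 0.0) ? t + 1 : t;
    case RND_ROUND:
    default:
        if (mag > 0.5) return away;
        if (mag < 0.5) return t;
        return (t % 2 == 0) ? t : away;     /* tie: choose the even integer */
    }
}

/* Convert to S8, clamping with SAT and dropping high bits without it. */
int8_t integer_to_s8(int64_t v, bool sat)
{
    if (sat) {
        if (v > INT8_MAX) return INT8_MAX;
        if (v < INT8_MIN) return INT8_MIN;
        return (int8_t)v;
    }
    /* Keep the low eight bits; the final cast assumes two's complement. */
    return (int8_t)(uint8_t)(v & 0xFF);
}
```

    Under these assumptions, 2.5 and 3.5 convert to 2 and 4 with "ROUND",
    and the integer 300 converts to 127 with "SAT" and to 44 without it.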
17095bd8deadSopenharmony_ci
17105bd8deadSopenharmony_ci
17115bd8deadSopenharmony_ci    Section 2.X.8.Z, EMIT:  Emit Vertex
17125bd8deadSopenharmony_ci
17135bd8deadSopenharmony_ci    (Modify the description of the EMIT opcode to deal with the interaction
17145bd8deadSopenharmony_ci     with multiple vertex streams added by ARB_transform_feedback3.  For more
17155bd8deadSopenharmony_ci     information on vertex streams, see ARB_transform_feedback3.)
17165bd8deadSopenharmony_ci
17175bd8deadSopenharmony_ci    The EMIT instruction emits a new vertex to be added to the current output
17185bd8deadSopenharmony_ci    primitive for vertex stream zero.  The attributes of the emitted vertex
17195bd8deadSopenharmony_ci    are given by the current values of the vertex result variables.  After the
17205bd8deadSopenharmony_ci    EMIT instruction completes, a new vertex is started and all result
17215bd8deadSopenharmony_ci    variables become undefined.
17225bd8deadSopenharmony_ci
17235bd8deadSopenharmony_ci
17245bd8deadSopenharmony_ci    Section 2.X.8.Z, EMITS:  Emit Vertex to Stream
17255bd8deadSopenharmony_ci
17265bd8deadSopenharmony_ci    (Add new geometry program opcode; the EMITS instruction is not supported
17275bd8deadSopenharmony_ci     for any other program types.  For more information on vertex streams, see
17285bd8deadSopenharmony_ci     ARB_transform_feedback3.)
17295bd8deadSopenharmony_ci
17305bd8deadSopenharmony_ci    The EMITS instruction emits a new vertex to be added to the current output
17315bd8deadSopenharmony_ci    primitive for the vertex stream specified by the single signed integer
17325bd8deadSopenharmony_ci    scalar operand.  The attributes of the emitted vertex are given by the
17335bd8deadSopenharmony_ci    current values of the vertex result variables.  After the EMITS
17345bd8deadSopenharmony_ci    instruction completes, a new vertex is started and all result variables
17355bd8deadSopenharmony_ci    become undefined.
17365bd8deadSopenharmony_ci
17375bd8deadSopenharmony_ci    If the specified stream is negative or greater than or equal to the
17385bd8deadSopenharmony_ci    implementation-dependent number of vertex streams
17395bd8deadSopenharmony_ci    (MAX_VERTEX_STREAMS_NV), the results of the instruction are undefined.
17405bd8deadSopenharmony_ci
17415bd8deadSopenharmony_ci
17425bd8deadSopenharmony_ci    Section 2.X.8.Z, IPAC:  Interpolate at Centroid
17435bd8deadSopenharmony_ci
17445bd8deadSopenharmony_ci    The IPAC instruction generates a result vector by evaluating the fragment
17455bd8deadSopenharmony_ci    attribute named by the single vector operand at the centroid location.
17465bd8deadSopenharmony_ci    The result vector would be identical to the value obtained by a MOV
17475bd8deadSopenharmony_ci    instruction if the attribute variable were declared using the CENTROID
17485bd8deadSopenharmony_ci    modifier.  
17495bd8deadSopenharmony_ci
17505bd8deadSopenharmony_ci    When interpolating an attribute variable with this instruction, the
17515bd8deadSopenharmony_ci    CENTROID and SAMPLE attribute variable modifiers are ignored.  The FLAT
17525bd8deadSopenharmony_ci    and NOPERSPECTIVE variable modifiers operate normally.
17535bd8deadSopenharmony_ci
     tmp0 = Interpolate(op0, x_pixel + x_centroid, y_pixel + y_centroid);
17555bd8deadSopenharmony_ci     result = tmp0;
17565bd8deadSopenharmony_ci
17575bd8deadSopenharmony_ci    IPAC supports only floating-point data type modifiers.  A program will
17585bd8deadSopenharmony_ci    fail to load if it contains an IPAC instruction whose single operand is
17595bd8deadSopenharmony_ci    not a fragment program attribute variable or matches the "fragment.facing"
17605bd8deadSopenharmony_ci    or "primitive.id" binding.
17615bd8deadSopenharmony_ci
17625bd8deadSopenharmony_ci
17635bd8deadSopenharmony_ci    Section 2.X.8.Z, IPAO:  Interpolate with Offset
17645bd8deadSopenharmony_ci
17655bd8deadSopenharmony_ci    The IPAO instruction generates a result vector by evaluating the fragment
    attribute named by the first vector operand at an offset from the pixel
17675bd8deadSopenharmony_ci    center given by the x and y components of the second vector operand.  The
17685bd8deadSopenharmony_ci    z and w components of the second vector operand are ignored.  The (x,y)
17695bd8deadSopenharmony_ci    position used for interpolating the attribute variable is obtained by
17705bd8deadSopenharmony_ci    adding the (x,y) offsets in the second vector operand to the (x,y)
17715bd8deadSopenharmony_ci    position of the pixel center.  
17725bd8deadSopenharmony_ci
17735bd8deadSopenharmony_ci    The range of offsets supported by the IPAO instruction is
17745bd8deadSopenharmony_ci    implementation-dependent.  The position used to interpolate the attribute
17755bd8deadSopenharmony_ci    variable is undefined if the x or y component of the second operand is
17765bd8deadSopenharmony_ci    less than MIN_FRAGMENT_INTERPOLATION_OFFSET_NV or greater than
17775bd8deadSopenharmony_ci    MAX_FRAGMENT_INTERPOLATION_OFFSET_NV.  Additionally, the granularity of
17785bd8deadSopenharmony_ci    offsets may be limited.  The (x,y) value may be snapped to a fixed
17795bd8deadSopenharmony_ci    sub-pixel grid with the number of subpixel bits given by
17805bd8deadSopenharmony_ci    FRAGMENT_PROGRAM_INTERPOLATION_OFFSET_BITS_NV.
17815bd8deadSopenharmony_ci
17825bd8deadSopenharmony_ci    When interpolating an attribute variable with this instruction, the
17835bd8deadSopenharmony_ci    CENTROID and SAMPLE attribute variable modifiers are ignored.  The FLAT
17845bd8deadSopenharmony_ci    and NOPERSPECTIVE variable modifiers operate normally.
17855bd8deadSopenharmony_ci
17865bd8deadSopenharmony_ci     tmp1 = VectorLoad(op1);
     tmp0 = Interpolate(op0, x_pixel + tmp1.x, y_pixel + tmp1.y);
17885bd8deadSopenharmony_ci     result = tmp0;
17895bd8deadSopenharmony_ci
17905bd8deadSopenharmony_ci    IPAO supports only floating-point data type modifiers.  A program will
17915bd8deadSopenharmony_ci    fail to load if it contains an IPAO instruction whose first operand is not
17925bd8deadSopenharmony_ci    a fragment program attribute variable or matches the "fragment.facing" or
17935bd8deadSopenharmony_ci    "primitive.id" binding.
17945bd8deadSopenharmony_ci
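    The range and granularity limits on IPAO offsets can be illustrated with
    a small host-side sketch.  It is not part of this specification:  the
    function name, the round-to-nearest snapping rule, and the limit values
    used when calling it are assumptions made for illustration.  An
    implementation may snap differently, and out-of-range offsets yield an
    undefined position rather than an error.

```python
import math

def snap_interpolation_offset(offset, min_offset, max_offset, subpixel_bits):
    """Model how one IPAO offset component might be range-checked and snapped.

    min_offset / max_offset stand in for
    MIN/MAX_FRAGMENT_INTERPOLATION_OFFSET_NV, and subpixel_bits for
    FRAGMENT_PROGRAM_INTERPOLATION_OFFSET_BITS_NV.
    Returns None when the offset is outside the supported range, where the
    interpolation position is undefined.
    """
    if offset < min_offset or offset > max_offset:
        return None
    steps = 1 << subpixel_bits  # grid positions per pixel
    # Assumption: round to the nearest grid position.  The text above only
    # says the value "may be snapped" to the grid.
    return math.floor(offset * steps + 0.5) / steps
```

    With four sub-pixel bits, for example, an offset of 0.30 would land on
    the 5/16 grid position under this rounding assumption.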
17955bd8deadSopenharmony_ci
17965bd8deadSopenharmony_ci    Section 2.X.8.Z, IPAS:  Interpolate at Sample Location
17975bd8deadSopenharmony_ci
17985bd8deadSopenharmony_ci    The IPAS instruction generates a result vector by evaluating the fragment
    attribute named by the first vector operand at the location of the
18005bd8deadSopenharmony_ci    pixel's sample whose sample number is given by the second integer scalar
18015bd8deadSopenharmony_ci    operand.  If multisample buffers are not available (SAMPLE_BUFFERS is
18025bd8deadSopenharmony_ci    zero), the attribute will be evaluated at the pixel center.  If the sample
18035bd8deadSopenharmony_ci    number given by the second operand does not exist, the position used to
18045bd8deadSopenharmony_ci    interpolate the attribute is undefined.
18055bd8deadSopenharmony_ci
18065bd8deadSopenharmony_ci    When interpolating an attribute variable with this instruction, the
18075bd8deadSopenharmony_ci    CENTROID and SAMPLE attribute variable modifiers are ignored.  The FLAT
18085bd8deadSopenharmony_ci    and NOPERSPECTIVE variable modifiers operate normally.
18095bd8deadSopenharmony_ci
18105bd8deadSopenharmony_ci     sample = ScalarLoad(op1);
18115bd8deadSopenharmony_ci     tmp1 = SampleOffset(sample);
     tmp0 = Interpolate(op0, x_pixel + tmp1.x, y_pixel + tmp1.y);
18135bd8deadSopenharmony_ci     result = tmp0;
18145bd8deadSopenharmony_ci
18155bd8deadSopenharmony_ci    IPAS supports only floating-point data type modifiers.  A program will
    fail to load if it contains an IPAS instruction whose first operand is not
18175bd8deadSopenharmony_ci    a fragment program attribute variable or matches the "fragment.facing" or
18185bd8deadSopenharmony_ci    "primitive.id" binding.
18195bd8deadSopenharmony_ci
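    The IPAS pseudocode can be modeled on the host as follows.  This is a
    sketch under stated assumptions, not the GL's interpolation method:  it
    handles only an attribute that is linear in window space (as with
    NOPERSPECTIVE), and the SAMPLE_OFFSETS table is hypothetical, since real
    sample locations are implementation-dependent.

```python
def interpolate(plane, x, y):
    """Evaluate a screen-linear attribute a(x, y) = a0 + ax*x + ay*y."""
    a0, ax, ay = plane
    return a0 + ax * x + ay * y

# Hypothetical sample offsets relative to the pixel center.
SAMPLE_OFFSETS = [(-0.25, -0.25), (0.25, 0.25)]

def ipas(plane, x_pixel, y_pixel, sample, sample_buffers=1):
    """Model IPAS: interpolate at a sample location, or at the pixel center
    when multisample buffers are not available (sample_buffers == 0).

    A sample number that does not exist gives an undefined position in the
    specification; this sketch does not attempt to model that case.
    """
    if sample_buffers == 0:
        return interpolate(plane, x_pixel, y_pixel)
    dx, dy = SAMPLE_OFFSETS[sample]
    return interpolate(plane, x_pixel + dx, y_pixel + dy)
```

    IPAC and IPAO follow the same pattern, with the centroid offset or the
    program-supplied offset taking the place of the sample offset.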
18205bd8deadSopenharmony_ci
18215bd8deadSopenharmony_ci    Section 2.X.8.Z, LDC:  Load from Constant Buffer
18225bd8deadSopenharmony_ci
18235bd8deadSopenharmony_ci    The LDC instruction loads a vector operand from a buffer object to yield a
18245bd8deadSopenharmony_ci    result vector.  The operand used for the LDC instruction must correspond
18255bd8deadSopenharmony_ci    to a parameter buffer variable declared using the "CBUFFER" statement; a
18265bd8deadSopenharmony_ci    program will fail to load if any other type of operand is used in an LDC
18275bd8deadSopenharmony_ci    instruction.
18285bd8deadSopenharmony_ci
18295bd8deadSopenharmony_ci      result = BufferMemoryLoad(&op0, storageModifier);
18305bd8deadSopenharmony_ci
18315bd8deadSopenharmony_ci    A base operand vector is fetched from memory as described in Section
18325bd8deadSopenharmony_ci    2.X.4.5, with the GPU address derived from the binding corresponding to
18335bd8deadSopenharmony_ci    the operand.  A final operand vector is derived from the base operand
18345bd8deadSopenharmony_ci    vector by applying swizzle, negation, and absolute value operand modifiers
18355bd8deadSopenharmony_ci    as described in Section 2.X.4.2.
18365bd8deadSopenharmony_ci
18375bd8deadSopenharmony_ci    The amount of memory in any given buffer object binding accessible by the
18385bd8deadSopenharmony_ci    LDC instruction may be limited.  If any component fetched by the LDC
18395bd8deadSopenharmony_ci    instruction extends 4*<n> or more basic machine units from the beginning
18405bd8deadSopenharmony_ci    of the buffer object binding, where <n> is the implementation-dependent
18415bd8deadSopenharmony_ci    constant MAX_PROGRAM_PARAMETER_BUFFER_SIZE_NV, the value fetched for that
18425bd8deadSopenharmony_ci    component will be undefined.
18435bd8deadSopenharmony_ci
18445bd8deadSopenharmony_ci    LDC supports no base data type modifiers, but requires exactly one storage
18455bd8deadSopenharmony_ci    modifier.  The base data types of the operand and result vectors are
18465bd8deadSopenharmony_ci    derived from the storage modifier.
18475bd8deadSopenharmony_ci
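    The accessibility limit for LDC can be expressed as a simple bounds
    check.  The sketch below is one reading of the rule above, under the
    assumption that a component occupying basic machine units
    [offset, offset + size) is defined only if it ends at or before 4*<n>.
    The function and parameter names are illustrative.

```python
def ldc_component_defined(byte_offset, component_size, max_param_buffer_size):
    """Return True if a component fetched by LDC lies entirely within the
    accessible part of the buffer object binding.

    byte_offset is the component's first basic machine unit relative to the
    start of the binding; max_param_buffer_size stands in for
    MAX_PROGRAM_PARAMETER_BUFFER_SIZE_NV (<n> in the text above).
    """
    limit = 4 * max_param_buffer_size
    # Assumed reading: the value is undefined if any part of the component
    # lies at offset `limit` or beyond.
    return byte_offset + component_size <= limit
```

    For a hypothetical limit of <n> = 16384, a 4-byte component starting at
    offset 65532 is the last one this check treats as defined.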
18485bd8deadSopenharmony_ci
18495bd8deadSopenharmony_ci    Section 2.X.8.Z, LOAD:  Global Load
18505bd8deadSopenharmony_ci
18515bd8deadSopenharmony_ci    The LOAD instruction generates a result vector by reading an address from
18525bd8deadSopenharmony_ci    the single unsigned integer scalar operand and fetching data from buffer
18535bd8deadSopenharmony_ci    object memory, as described in Section 2.X.4.5.
18545bd8deadSopenharmony_ci
18555bd8deadSopenharmony_ci      address = ScalarLoad(op0);
18565bd8deadSopenharmony_ci      result = BufferMemoryLoad(address, storageModifier);
18575bd8deadSopenharmony_ci
18585bd8deadSopenharmony_ci    LOAD supports no base data type modifiers, but requires exactly one
18595bd8deadSopenharmony_ci    storage modifier.  The base data type of the result vector is derived from
18605bd8deadSopenharmony_ci    the storage modifier.  The single scalar operand is always interpreted as
18615bd8deadSopenharmony_ci    an unsigned integer.
18625bd8deadSopenharmony_ci
18635bd8deadSopenharmony_ci
18645bd8deadSopenharmony_ci    Section 2.X.8.Z, MEMBAR:  Memory Barrier
18655bd8deadSopenharmony_ci
18665bd8deadSopenharmony_ci    The MEMBAR instruction synchronizes memory transactions to ensure that
18675bd8deadSopenharmony_ci    memory transactions resulting from any instruction executed by the thread
18685bd8deadSopenharmony_ci    prior to the MEMBAR instruction complete prior to any memory transactions
18695bd8deadSopenharmony_ci    issued after the instruction.
18705bd8deadSopenharmony_ci
18715bd8deadSopenharmony_ci    MEMBAR has no operands and generates no result.
18725bd8deadSopenharmony_ci
18735bd8deadSopenharmony_ci
18745bd8deadSopenharmony_ci    Section 2.X.8.Z, PK64:  Pack 64-Bit Component
18755bd8deadSopenharmony_ci
18765bd8deadSopenharmony_ci    The PK64 instruction reads the four components of the single vector
18775bd8deadSopenharmony_ci    operand as 32-bit values, packs the bit representations of these into a
18785bd8deadSopenharmony_ci    pair of 64-bit values, and replicates those to produce a four-component
18795bd8deadSopenharmony_ci    result vector.  The "x" and "y" components of the operand are packed to
18805bd8deadSopenharmony_ci    produce the "x" and "z" components of the result vector; the "z" and "w"
18815bd8deadSopenharmony_ci    components of the operand are packed to produce the "y" and "w" components
18825bd8deadSopenharmony_ci    of the result vector.  The PK64 instruction can be reversed by the UP64
18835bd8deadSopenharmony_ci    instruction below.
18845bd8deadSopenharmony_ci
18855bd8deadSopenharmony_ci    This instruction is intended to allow a program to reconstruct 64-bit
18865bd8deadSopenharmony_ci    integer or floating-point values generated by the application but passed
18875bd8deadSopenharmony_ci    to the GL as two 32-bit values taken from adjacent words in memory.  The
18885bd8deadSopenharmony_ci    ability to use this technique depends on how the 64-bit value is stored in
    memory.  For "little-endian" processors, the first 32-bit value holds the
    least significant 32 bits of the 64-bit value.  For "big-endian"
18915bd8deadSopenharmony_ci    processors, the first 32-bit value holds the most significant 32 bits of
18925bd8deadSopenharmony_ci    the 64-bit value.  This reconstruction assumes that the first 32-bit word
18935bd8deadSopenharmony_ci    comes from the x component of the operand and the second 32-bit word comes
18945bd8deadSopenharmony_ci    from the y component.  The method used to construct a 64-bit value from a
18955bd8deadSopenharmony_ci    pair of 32-bit values depends on the processor type.
18965bd8deadSopenharmony_ci
18975bd8deadSopenharmony_ci      tmp = VectorLoad(op0);
18985bd8deadSopenharmony_ci
18995bd8deadSopenharmony_ci      if (underlying system is little-endian) {
19005bd8deadSopenharmony_ci        result.x = RawBits(tmp.x) | (RawBits(tmp.y) << 32);
19015bd8deadSopenharmony_ci        result.y = RawBits(tmp.z) | (RawBits(tmp.w) << 32);
19025bd8deadSopenharmony_ci        result.z = RawBits(tmp.x) | (RawBits(tmp.y) << 32);
19035bd8deadSopenharmony_ci        result.w = RawBits(tmp.z) | (RawBits(tmp.w) << 32);
19045bd8deadSopenharmony_ci      } else {
19055bd8deadSopenharmony_ci        result.x = RawBits(tmp.y) | (RawBits(tmp.x) << 32);
19065bd8deadSopenharmony_ci        result.y = RawBits(tmp.w) | (RawBits(tmp.z) << 32);
19075bd8deadSopenharmony_ci        result.z = RawBits(tmp.y) | (RawBits(tmp.x) << 32);
19085bd8deadSopenharmony_ci        result.w = RawBits(tmp.w) | (RawBits(tmp.z) << 32);
19095bd8deadSopenharmony_ci      }
19105bd8deadSopenharmony_ci
19115bd8deadSopenharmony_ci    PK64 supports integer and floating-point data type modifiers, which
19125bd8deadSopenharmony_ci    specify the base data type of the operand and result.  The single vector
19135bd8deadSopenharmony_ci    operand is always treated as having 32-bit components, and the result is
19145bd8deadSopenharmony_ci    treated as a vector with 64-bit components.  The encoding performed by
19155bd8deadSopenharmony_ci    PK64 can be reversed using the UP64 instruction.
19165bd8deadSopenharmony_ci
19175bd8deadSopenharmony_ci    A program will fail to load if it contains a PK64 instruction that writes
19185bd8deadSopenharmony_ci    its results to a variable not declared as "LONG".
19195bd8deadSopenharmony_ci
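    The PK64 pseudocode can be mirrored on the host to check how an
    application should split a 64-bit value into two 32-bit words.  The
    sketch below is illustrative only; the function names are not part of
    any API, and the operand components are modeled as unsigned 32-bit
    integers holding the raw bits.

```python
import struct

def pk64(op, little_endian=True):
    """Pack four 32-bit words (x, y, z, w) into the PK64 result vector.

    Returns four 64-bit unsigned integers (x, y, z, w) with x == z and
    y == w, mirroring the pseudocode above.
    """
    x, y, z, w = (v & 0xFFFFFFFF for v in op)
    if little_endian:
        xy, zw = x | (y << 32), z | (w << 32)
    else:
        xy, zw = y | (x << 32), w | (z << 32)
    return (xy, zw, xy, zw)

def double_from_bits(bits):
    """Reinterpret a 64-bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

    On a little-endian system, the double 1.0 (bit pattern
    0x3FF0000000000000) would be passed as the words 0x00000000 and
    0x3FF00000 in the x and y components, respectively.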
19205bd8deadSopenharmony_ci
19215bd8deadSopenharmony_ci    Section 2.X.8.Z, STORE:  Global Store
19225bd8deadSopenharmony_ci
19235bd8deadSopenharmony_ci    The STORE instruction reads an address from the second unsigned integer
19245bd8deadSopenharmony_ci    scalar operand and writes the contents of the first vector operand to
19255bd8deadSopenharmony_ci    buffer object memory at that address, as described in Section 2.X.4.5.
19265bd8deadSopenharmony_ci    This instruction generates no result.
19275bd8deadSopenharmony_ci
19285bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
19295bd8deadSopenharmony_ci      address = ScalarLoad(op1);
19305bd8deadSopenharmony_ci      BufferMemoryStore(address, tmp0, storageModifier);
19315bd8deadSopenharmony_ci
19325bd8deadSopenharmony_ci    STORE supports no base data type modifiers, but requires exactly one
19335bd8deadSopenharmony_ci    storage modifier.  The base data type of the vector components of the
19345bd8deadSopenharmony_ci    first operand is derived from the storage modifier.  The second operand is
19355bd8deadSopenharmony_ci    always interpreted as an unsigned integer scalar.
19365bd8deadSopenharmony_ci
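    The relationship between the storage modifier and the data read or
    written by LOAD and STORE can be sketched with a byte array standing in
    for buffer object memory.  Everything here is an assumption made for
    illustration:  the modifier names in FORMATS are an illustrative subset,
    a little-endian layout is assumed, and "address" is simply an index into
    the byte array rather than a GPU address.

```python
import struct

# Illustrative subset of storage modifiers mapped to little-endian struct
# formats; names and layouts are assumptions for this sketch.
FORMATS = {"F32X4": "<4f", "U32X4": "<4I", "S32X4": "<4i"}

def buffer_memory_store(memory, address, vector, storage_modifier):
    """Model STORE: write a four-component vector at `address`."""
    struct.pack_into(FORMATS[storage_modifier], memory, address, *vector)

def buffer_memory_load(memory, address, storage_modifier):
    """Model LOAD: read a four-component vector from `address`; the
    storage modifier alone decides how the bytes are interpreted."""
    return struct.unpack_from(FORMATS[storage_modifier], memory, address)
```

    Loading the same address with a different storage modifier reinterprets
    the stored bits; no conversion is implied by this sketch.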
19375bd8deadSopenharmony_ci
19385bd8deadSopenharmony_ci    Section 2.X.8.Z, TEX:  Texture Sample
19395bd8deadSopenharmony_ci
    (Modify the instruction pseudo-code to account for texel offsets no
     longer needing to be immediate arguments.)
19425bd8deadSopenharmony_ci
19435bd8deadSopenharmony_ci      tmp = VectorLoad(op0);
19445bd8deadSopenharmony_ci      if (instruction has variable texel offset) {
19455bd8deadSopenharmony_ci        itmp = VectorLoad(op1);
19465bd8deadSopenharmony_ci      } else {
19475bd8deadSopenharmony_ci        itmp = instruction.texelOffset;
19485bd8deadSopenharmony_ci      }
19495bd8deadSopenharmony_ci      ddx = ComputePartialsX(tmp);
19505bd8deadSopenharmony_ci      ddy = ComputePartialsY(tmp);
19515bd8deadSopenharmony_ci      lambda = ComputeLOD(ddx, ddy);
19525bd8deadSopenharmony_ci      result = TextureSample(tmp, lambda, ddx, ddy, itmp);
19535bd8deadSopenharmony_ci
19545bd8deadSopenharmony_ci
19555bd8deadSopenharmony_ci    Section 2.X.8.Z, TGALL:  Test for All Non-Zero in a Thread Group
19565bd8deadSopenharmony_ci
19575bd8deadSopenharmony_ci    The TGALL instruction produces a result vector by reading a vector operand
19585bd8deadSopenharmony_ci    for each active thread in the current thread group and comparing each
19595bd8deadSopenharmony_ci    component to zero.  A result vector component contains a TRUE value
19605bd8deadSopenharmony_ci    (described below) if the value of the corresponding component in the
19615bd8deadSopenharmony_ci    operand vector is non-zero for all active threads, and a FALSE value
19625bd8deadSopenharmony_ci    otherwise.
19635bd8deadSopenharmony_ci
    An implementation may choose to arrange program threads into thread
19655bd8deadSopenharmony_ci    groups, and execute an instruction simultaneously for each thread in the
19665bd8deadSopenharmony_ci    group.  If the TGALL instruction is contained inside conditional flow
19675bd8deadSopenharmony_ci    control blocks and not all threads in the group execute the instruction,
19685bd8deadSopenharmony_ci    the operand values for threads not executing the instruction have no
19695bd8deadSopenharmony_ci    bearing on the value returned.  The method used to arrange threads into
19705bd8deadSopenharmony_ci    groups is undefined.
19715bd8deadSopenharmony_ci
19725bd8deadSopenharmony_ci      tmp = VectorLoad(op0);
19735bd8deadSopenharmony_ci      result = { TRUE, TRUE, TRUE, TRUE };
19745bd8deadSopenharmony_ci      for (all active threads) {
19755bd8deadSopenharmony_ci        if ([thread]tmp.x == 0) result.x = FALSE;
19765bd8deadSopenharmony_ci        if ([thread]tmp.y == 0) result.y = FALSE;
19775bd8deadSopenharmony_ci        if ([thread]tmp.z == 0) result.z = FALSE;
19785bd8deadSopenharmony_ci        if ([thread]tmp.w == 0) result.w = FALSE;
19795bd8deadSopenharmony_ci      }
19805bd8deadSopenharmony_ci
19815bd8deadSopenharmony_ci    TGALL supports all data type modifiers.  For floating-point data types,
19825bd8deadSopenharmony_ci    the TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
19835bd8deadSopenharmony_ci    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
19845bd8deadSopenharmony_ci    integer data types, the TRUE value is the maximum integer value (all bits
19855bd8deadSopenharmony_ci    are ones) and the FALSE value is zero.
19865bd8deadSopenharmony_ci     
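    The effect of inactive threads on TGALL can be seen in a small host-side
    model.  This is a sketch, not an implementation:  the thread group is
    represented as a list of operand vectors plus a hypothetical per-thread
    "active" flag, and results are Python booleans before encoding as
    TRUE/FALSE values.

```python
def tgall(operands, active):
    """Model TGALL for one thread group.

    operands: one 4-component vector per thread in the group.
    active:   one bool per thread; inactive threads are ignored.
    Returns a 4-tuple of bools (TRUE/FALSE before data type encoding).
    """
    result = [True, True, True, True]
    for vec, is_active in zip(operands, active):
        if not is_active:
            continue  # operand values of inactive threads have no bearing
        for c in range(4):
            if vec[c] == 0:
                result[c] = False
    return tuple(result)
```

    In this model, a thread that skipped the instruction because of
    divergent flow control cannot turn a component FALSE, even if its
    operand is zero.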
19875bd8deadSopenharmony_ci
19885bd8deadSopenharmony_ci    Section 2.X.8.Z, TGANY:  Test for Any Non-Zero in a Thread Group
19895bd8deadSopenharmony_ci
19905bd8deadSopenharmony_ci    The TGANY instruction produces a result vector by reading a vector operand
19915bd8deadSopenharmony_ci    for each active thread in the current thread group and comparing each
19925bd8deadSopenharmony_ci    component to zero.  A result vector component contains a TRUE value
19935bd8deadSopenharmony_ci    (described below) if the value of the corresponding component in the
19945bd8deadSopenharmony_ci    operand vector is non-zero for any active thread, and a FALSE value
19955bd8deadSopenharmony_ci    otherwise.
19965bd8deadSopenharmony_ci
    An implementation may choose to arrange program threads into thread
19985bd8deadSopenharmony_ci    groups, and execute an instruction simultaneously for each thread in the
19995bd8deadSopenharmony_ci    group.  If the TGANY instruction is contained inside conditional flow
20005bd8deadSopenharmony_ci    control blocks and not all threads in the group execute the instruction,
20015bd8deadSopenharmony_ci    the operand values for threads not executing the instruction have no
20025bd8deadSopenharmony_ci    bearing on the value returned.  The method used to arrange threads into
20035bd8deadSopenharmony_ci    groups is undefined.
20045bd8deadSopenharmony_ci
20055bd8deadSopenharmony_ci      tmp = VectorLoad(op0);
20065bd8deadSopenharmony_ci      result = { FALSE, FALSE, FALSE, FALSE };
20075bd8deadSopenharmony_ci      for (all active threads) {
20085bd8deadSopenharmony_ci        if ([thread]tmp.x != 0) result.x = TRUE;
20095bd8deadSopenharmony_ci        if ([thread]tmp.y != 0) result.y = TRUE;
20105bd8deadSopenharmony_ci        if ([thread]tmp.z != 0) result.z = TRUE;
20115bd8deadSopenharmony_ci        if ([thread]tmp.w != 0) result.w = TRUE;
20125bd8deadSopenharmony_ci      }
20135bd8deadSopenharmony_ci
20145bd8deadSopenharmony_ci    TGANY supports all data type modifiers.  For floating-point data types,
20155bd8deadSopenharmony_ci    the TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
20165bd8deadSopenharmony_ci    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
20175bd8deadSopenharmony_ci    integer data types, the TRUE value is the maximum integer value (all bits
20185bd8deadSopenharmony_ci    are ones) and the FALSE value is zero.
20195bd8deadSopenharmony_ci
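    The TRUE and FALSE encodings described above can be sketched together
    with a host-side model of TGANY.  The names are illustrative; the data
    type is passed as "F", "S", or "U", and 32-bit components are assumed
    for the unsigned all-ones value.

```python
def tgany(operands, active):
    """Model TGANY: TRUE where any active thread has a non-zero component."""
    result = [False, False, False, False]
    for vec, is_active in zip(operands, active):
        if not is_active:
            continue
        for c in range(4):
            if vec[c] != 0:
                result[c] = True
    return tuple(result)

def encode_bool(value, data_type):
    """Encode TRUE/FALSE for a floating-point ("F"), signed ("S"), or
    unsigned ("U") result.  Assumes 32-bit components for "U"."""
    if data_type == "F":
        return 1.0 if value else 0.0
    if data_type == "S":
        return -1 if value else 0
    if data_type == "U":
        return 0xFFFFFFFF if value else 0
    raise ValueError("unknown data type: " + data_type)
```

    The same encoding applies to the results of TGALL and TGEQ.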
20205bd8deadSopenharmony_ci
20215bd8deadSopenharmony_ci    Section 2.X.8.Z, TGEQ:  Test for All Equal Values in a Thread Group
20225bd8deadSopenharmony_ci
    The TGEQ instruction produces a result vector by reading a vector operand
    for each active thread in the current thread group and comparing each
    component to zero.  A result vector component contains a TRUE value
    (described below) if the outcome of that comparison for the corresponding
    operand component is the same for all active threads (that is, the
    component is zero in every active thread or non-zero in every active
    thread, as in the pseudocode below), and a FALSE value otherwise.
20295bd8deadSopenharmony_ci
    An implementation may choose to arrange program threads into thread
20315bd8deadSopenharmony_ci    groups, and execute an instruction simultaneously for each thread in the
20325bd8deadSopenharmony_ci    group.  If the TGEQ instruction is contained inside conditional flow
20335bd8deadSopenharmony_ci    control blocks and not all threads in the group execute the instruction,
20345bd8deadSopenharmony_ci    the operand values for threads not executing the instruction have no
20355bd8deadSopenharmony_ci    bearing on the value returned.  The method used to arrange threads into
20365bd8deadSopenharmony_ci    groups is undefined.
20375bd8deadSopenharmony_ci
20385bd8deadSopenharmony_ci      tmp = VectorLoad(op0);
20395bd8deadSopenharmony_ci      tgall = { TRUE, TRUE, TRUE, TRUE };
20405bd8deadSopenharmony_ci      tgany = { FALSE, FALSE, FALSE, FALSE };
20415bd8deadSopenharmony_ci      for (all active threads) {
20425bd8deadSopenharmony_ci        if ([thread]tmp.x == 0) tgall.x = FALSE; else tgany.x = TRUE;
20435bd8deadSopenharmony_ci        if ([thread]tmp.y == 0) tgall.y = FALSE; else tgany.y = TRUE;
20445bd8deadSopenharmony_ci        if ([thread]tmp.z == 0) tgall.z = FALSE; else tgany.z = TRUE;
20455bd8deadSopenharmony_ci        if ([thread]tmp.w == 0) tgall.w = FALSE; else tgany.w = TRUE;
20465bd8deadSopenharmony_ci      }
20475bd8deadSopenharmony_ci      result.x = (tgall.x == tgany.x) ? TRUE : FALSE;
20485bd8deadSopenharmony_ci      result.y = (tgall.y == tgany.y) ? TRUE : FALSE;
20495bd8deadSopenharmony_ci      result.z = (tgall.z == tgany.z) ? TRUE : FALSE;
20505bd8deadSopenharmony_ci      result.w = (tgall.w == tgany.w) ? TRUE : FALSE;
20515bd8deadSopenharmony_ci
20525bd8deadSopenharmony_ci    TGEQ supports all data type modifiers.  For floating-point data types, the
20535bd8deadSopenharmony_ci    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
20545bd8deadSopenharmony_ci    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
20555bd8deadSopenharmony_ci    integer data types, the TRUE value is the maximum integer value (all bits
20565bd8deadSopenharmony_ci    are ones) and the FALSE value is zero.
20575bd8deadSopenharmony_ci
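    The TGEQ pseudocode reduces to comparing the "all non-zero" and "any
    non-zero" results per component.  The host-side sketch below mirrors
    that; the representation of the thread group (a list of vectors plus a
    hypothetical per-thread "active" flag) is an assumption made for
    illustration.

```python
def tgeq(operands, active):
    """Model TGEQ: TRUE where every active thread agrees on whether the
    component is zero or non-zero."""
    result = []
    for c in range(4):
        flags = [vec[c] != 0 for vec, on in zip(operands, active) if on]
        # Mirrors result = (tgall == tgany) from the pseudocode.
        result.append(all(flags) == any(flags))
    return tuple(result)
```

    Note that, following the pseudocode, two threads holding different
    non-zero values (for example 3 and 7) still produce TRUE for that
    component.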
20585bd8deadSopenharmony_ci
20595bd8deadSopenharmony_ci    Section 2.X.8.Z, TXB:  Texture Sample with Bias
20605bd8deadSopenharmony_ci
    (Modify the instruction pseudo-code to account for texel offsets no
     longer needing to be immediate arguments.)
20635bd8deadSopenharmony_ci
20645bd8deadSopenharmony_ci      tmp = VectorLoad(op0);
20655bd8deadSopenharmony_ci      if (instruction has variable texel offset) {
20665bd8deadSopenharmony_ci        itmp = VectorLoad(op1);
20675bd8deadSopenharmony_ci      } else {
20685bd8deadSopenharmony_ci        itmp = instruction.texelOffset;
20695bd8deadSopenharmony_ci      }
20705bd8deadSopenharmony_ci      ddx = ComputePartialsX(tmp);
20715bd8deadSopenharmony_ci      ddy = ComputePartialsY(tmp);
20725bd8deadSopenharmony_ci      lambda = ComputeLOD(ddx, ddy);
20735bd8deadSopenharmony_ci      result = TextureSample(tmp, lambda + tmp.w, ddx, ddy, itmp);
20745bd8deadSopenharmony_ci
20755bd8deadSopenharmony_ci    Section 2.X.8.Z, TXG:  Texture Gather
20765bd8deadSopenharmony_ci
    (Update the TXG opcode description from the NV_gpu_program4_1
     specification.  This version adds two capabilities:  any component of a
     multi-component texture can be selected by tacking on a component name
     to the texture variable passed to identify the texture unit, and depth
     compares are supported if a SHADOW target is specified.)
20825bd8deadSopenharmony_ci
20835bd8deadSopenharmony_ci    The TXG instruction takes the four components of a single floating-point
20845bd8deadSopenharmony_ci    vector operand as a texture coordinate, determines a set of four texels to
20855bd8deadSopenharmony_ci    sample from the base level of detail of the specified texture image, and
20865bd8deadSopenharmony_ci    returns one component from each texel in a four-component result vector.
20875bd8deadSopenharmony_ci    To determine the four texels to sample, the minification and magnification
    filters are ignored and the rules for LINEAR filtering are applied to the
20895bd8deadSopenharmony_ci    base level of the texture image to determine the texels T_i0_j1, T_i1_j1,
20905bd8deadSopenharmony_ci    T_i1_j0, and T_i0_j0, as defined in equations 3.23 through 3.25. The 
20915bd8deadSopenharmony_ci    texels are then converted to texture source colors (Rs,Gs,Bs,As) according 
20925bd8deadSopenharmony_ci    to table 3.21, followed by application of the texture swizzle as described 
20935bd8deadSopenharmony_ci    in section 3.8.13.  A four-component vector is returned by taking one of 
20945bd8deadSopenharmony_ci    the four components of the swizzled texture source colors from each of the 
20955bd8deadSopenharmony_ci    four selected texels.  The component is selected using the 
20965bd8deadSopenharmony_ci    <texImageUnitComp> grammar rule, by adding a scalar suffix 
20975bd8deadSopenharmony_ci    (".x", ".y", ".z", ".w") to the identified texture; if no scalar suffix 
20985bd8deadSopenharmony_ci    is provided, the first component is selected.
20995bd8deadSopenharmony_ci
21005bd8deadSopenharmony_ci    TXG only operates on 2D, SHADOW2D, CUBE, SHADOWCUBE, ARRAY2D,
21015bd8deadSopenharmony_ci    SHADOWARRAY2D, ARRAYCUBE, SHADOWARRAYCUBE, RECT, and SHADOWRECT texture
21025bd8deadSopenharmony_ci    targets; a program will fail to compile if any other texture target is
21035bd8deadSopenharmony_ci    used.
21045bd8deadSopenharmony_ci
21055bd8deadSopenharmony_ci    When using a "SHADOW" texture target, component selection is ignored.
21065bd8deadSopenharmony_ci    Instead, depth comparisons are performed on the depth values for each of
21075bd8deadSopenharmony_ci    the four selected texels, and 0/1 values are returned based on the results
21085bd8deadSopenharmony_ci    of the comparison.  
21095bd8deadSopenharmony_ci
21105bd8deadSopenharmony_ci    As with other texture accesses, the results of a texture gather operation
21115bd8deadSopenharmony_ci    are undefined if the texture target in the instruction is incompatible
21125bd8deadSopenharmony_ci    with the selected texture's base internal format and depth compare mode.
21135bd8deadSopenharmony_ci
21145bd8deadSopenharmony_ci      tmp = VectorLoad(op0);
21155bd8deadSopenharmony_ci      ddx = (0,0,0);
21165bd8deadSopenharmony_ci      ddy = (0,0,0);
21175bd8deadSopenharmony_ci      lambda = 0;
21185bd8deadSopenharmony_ci      if (instruction has variable texel offset) {
21195bd8deadSopenharmony_ci        itmp = VectorLoad(op1);
21205bd8deadSopenharmony_ci      } else {
21215bd8deadSopenharmony_ci        itmp = instruction.texelOffset;
21225bd8deadSopenharmony_ci      }
21235bd8deadSopenharmony_ci      result.x = TextureSample_i0j1(tmp, lambda, ddx, ddy, itmp).<comp>;
21245bd8deadSopenharmony_ci      result.y = TextureSample_i1j1(tmp, lambda, ddx, ddy, itmp).<comp>;
21255bd8deadSopenharmony_ci      result.z = TextureSample_i1j0(tmp, lambda, ddx, ddy, itmp).<comp>;
21265bd8deadSopenharmony_ci      result.w = TextureSample_i0j0(tmp, lambda, ddx, ddy, itmp).<comp>;
21275bd8deadSopenharmony_ci
21285bd8deadSopenharmony_ci    In this pseudocode, "<comp>" refers to the texel component selected by the
21295bd8deadSopenharmony_ci    <texImageUnitComp> grammar rule, as described above.
21305bd8deadSopenharmony_ci
21315bd8deadSopenharmony_ci    TXG supports all three data type modifiers.  The single operand is always
21325bd8deadSopenharmony_ci    treated as a floating-point vector; the results are interpreted according
21335bd8deadSopenharmony_ci    to the data type modifier.
21345bd8deadSopenharmony_ci
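    The texel selection and result ordering for TXG can be sketched on the
    host for a 2D texture.  This is an illustration under stated
    assumptions, not the GL's sampling path:  REPEAT wrapping is assumed on
    both axes, no texel offset is applied, SHADOW targets are not modeled,
    and the texture is a plain list of rows of already-swizzled
    four-component texels.

```python
import math

def txg(texture, s, t, comp=0):
    """Gather one component from the 2x2 LINEAR footprint at (s, t).

    texture: base-level image indexed as texture[j][i], each texel a
             4-component tuple.
    comp:    index of the selected component (0 for ".x", the default).
    """
    height, width = len(texture), len(texture[0])
    u, v = s * width, t * height
    i0 = math.floor(u - 0.5) % width    # assumption: REPEAT wrap
    j0 = math.floor(v - 0.5) % height
    i1, j1 = (i0 + 1) % width, (j0 + 1) % height
    # Result order from the pseudocode: (i0,j1), (i1,j1), (i1,j0), (i0,j0).
    return (texture[j1][i0][comp], texture[j1][i1][comp],
            texture[j0][i1][comp], texture[j0][i0][comp])
```

    Sampling a 2x2 texture at (0.5, 0.5) returns one component from each of
    its four texels, ordered as in the pseudocode above.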
21355bd8deadSopenharmony_ci
21365bd8deadSopenharmony_ci    Section 2.X.8.Z, TXGO:  Texture Gather with Per-Texel Offsets
21375bd8deadSopenharmony_ci
21385bd8deadSopenharmony_ci    Like the TXG instruction, the TXGO instruction takes the four components
21395bd8deadSopenharmony_ci    of its first floating-point vector operand as a texture coordinate,
21405bd8deadSopenharmony_ci    determines a set of four texels to sample from the base level of detail of
21415bd8deadSopenharmony_ci    the specified texture image, and returns one component from each texel in
21425bd8deadSopenharmony_ci    a four-component result vector.  The second and third vector operands are
21435bd8deadSopenharmony_ci    taken as signed four-component integer vectors providing the x and y
21445bd8deadSopenharmony_ci    components of the offsets, respectively, used to determine the location of
21455bd8deadSopenharmony_ci    each of the four texels.  To determine the four texels to sample, each of
21465bd8deadSopenharmony_ci    the four independent offsets is used in conjunction with the specified
21475bd8deadSopenharmony_ci    texture coordinate to select a texel.  The minification and magnification
21485bd8deadSopenharmony_ci    filters are ignored and the rules for LINEAR filtering are used to select
21495bd8deadSopenharmony_ci    the texel T_i0_j0, as defined in equations 3.23 through 3.25, from the
21505bd8deadSopenharmony_ci    base level of the texture image.  The texels are then converted to texture 
21515bd8deadSopenharmony_ci    source colors (Rs,Gs,Bs,As) according to table 3.21, followed by 
21525bd8deadSopenharmony_ci    application of the texture swizzle as described in section 3.8.13.  A 
21535bd8deadSopenharmony_ci    four-component vector is returned by taking one of the four components 
21545bd8deadSopenharmony_ci    of the swizzled texture source colors from each of the four selected 
21555bd8deadSopenharmony_ci    texels.  The component is selected using the <texImageUnitComp> grammar 
21565bd8deadSopenharmony_ci    rule, by adding a scalar suffix (".x", ".y", ".z", ".w") to the identified 
21575bd8deadSopenharmony_ci    texture; if no scalar suffix is provided, the first component is selected.
21585bd8deadSopenharmony_ci
21595bd8deadSopenharmony_ci    TXGO only operates on 2D, SHADOW2D, ARRAY2D, SHADOWARRAY2D, RECT, and
21605bd8deadSopenharmony_ci    SHADOWRECT texture targets; a program will fail to compile if any other
21615bd8deadSopenharmony_ci    texture target is used.  
21625bd8deadSopenharmony_ci
21635bd8deadSopenharmony_ci    When using a "SHADOW" texture target, component selection is ignored.
21645bd8deadSopenharmony_ci    Instead, depth comparisons are performed on the depth values for each of
21655bd8deadSopenharmony_ci    the four selected texels, and 0/1 values are returned based on the results
21665bd8deadSopenharmony_ci    of the comparison.  
21675bd8deadSopenharmony_ci
21685bd8deadSopenharmony_ci    As with other texture accesses, the results of a texture gather operation
21695bd8deadSopenharmony_ci    are undefined if the texture target in the instruction is incompatible
21705bd8deadSopenharmony_ci    with the selected texture's base internal format and depth compare mode.
21715bd8deadSopenharmony_ci
21725bd8deadSopenharmony_ci      tmp = VectorLoad(op0);
21735bd8deadSopenharmony_ci      itmp1 = VectorLoad(op1);
21745bd8deadSopenharmony_ci      itmp2 = VectorLoad(op2);
21755bd8deadSopenharmony_ci      ddx = (0,0,0);
21765bd8deadSopenharmony_ci      ddy = (0,0,0);
21775bd8deadSopenharmony_ci      lambda = 0;
      itmp = (itmp1.x, itmp2.x);
      result.x = TextureSample_i0j0(tmp, lambda, ddx, ddy, itmp).<comp>;
      itmp = (itmp1.y, itmp2.y);
      result.y = TextureSample_i0j0(tmp, lambda, ddx, ddy, itmp).<comp>;
      itmp = (itmp1.z, itmp2.z);
      result.z = TextureSample_i0j0(tmp, lambda, ddx, ddy, itmp).<comp>;
      itmp = (itmp1.w, itmp2.w);
      result.w = TextureSample_i0j0(tmp, lambda, ddx, ddy, itmp).<comp>;
21865bd8deadSopenharmony_ci
21875bd8deadSopenharmony_ci    In this pseudocode, "<comp>" refers to the texel component selected by the
21885bd8deadSopenharmony_ci    <texImageUnitComp> grammar rule, as described above.
21895bd8deadSopenharmony_ci
    If TEXTURE_WRAP_S or TEXTURE_WRAP_T is either CLAMP or MIRROR_CLAMP_EXT,
    the results of the TXGO instruction are undefined.
21925bd8deadSopenharmony_ci
21935bd8deadSopenharmony_ci    Note:  The TXG instruction is equivalent to the TXGO instruction with X
21945bd8deadSopenharmony_ci    and Y offset vectors of (0,1,1,0) and (0,0,-1,-1), respectively.
21955bd8deadSopenharmony_ci
21965bd8deadSopenharmony_ci    TXGO supports all three data type modifiers.  The first operand is always
21975bd8deadSopenharmony_ci    treated as a floating-point vector and the second and third operands are
    always treated as signed integer vectors; the results are interpreted
21995bd8deadSopenharmony_ci    according to the data type modifier.
22005bd8deadSopenharmony_ci
22015bd8deadSopenharmony_ci
22025bd8deadSopenharmony_ci    Section 2.X.8.Z, TXL:  Texture Sample with LOD
22035bd8deadSopenharmony_ci
    (Modify the instruction pseudo-code to account for texel offsets no
     longer needing to be immediate arguments.)
22065bd8deadSopenharmony_ci
22075bd8deadSopenharmony_ci      tmp = VectorLoad(op0);
22085bd8deadSopenharmony_ci      if (instruction has variable texel offset) {
22095bd8deadSopenharmony_ci        itmp = VectorLoad(op1);
22105bd8deadSopenharmony_ci      } else {
22115bd8deadSopenharmony_ci        itmp = instruction.texelOffset;
22125bd8deadSopenharmony_ci      }
22135bd8deadSopenharmony_ci      ddx = (0,0,0);
22145bd8deadSopenharmony_ci      ddy = (0,0,0);
22155bd8deadSopenharmony_ci      result = TextureSample(tmp, tmp.w, ddx, ddy, itmp);
22165bd8deadSopenharmony_ci
22175bd8deadSopenharmony_ci
22185bd8deadSopenharmony_ci    Section 2.X.8.Z, TXP:  Texture Sample with Projection
22195bd8deadSopenharmony_ci
    (Modify the instruction pseudo-code to account for texel offsets no
     longer needing to be immediate arguments.)
22225bd8deadSopenharmony_ci
22235bd8deadSopenharmony_ci      tmp0 = VectorLoad(op0);
22245bd8deadSopenharmony_ci      tmp0.x = tmp0.x / tmp0.w;
22255bd8deadSopenharmony_ci      tmp0.y = tmp0.y / tmp0.w;
22265bd8deadSopenharmony_ci      tmp0.z = tmp0.z / tmp0.w;
22275bd8deadSopenharmony_ci      if (instruction has variable texel offset) {
22285bd8deadSopenharmony_ci        itmp = VectorLoad(op1);
22295bd8deadSopenharmony_ci      } else {
22305bd8deadSopenharmony_ci        itmp = instruction.texelOffset;
22315bd8deadSopenharmony_ci      }
      ddx = ComputePartialsX(tmp0);
      ddy = ComputePartialsY(tmp0);
      lambda = ComputeLOD(ddx, ddy);
      result = TextureSample(tmp0, lambda, ddx, ddy, itmp);
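
    As an illustration only, the two steps that precede the lookup in the
    TXP pseudo-code (the projective divide and the choice between a
    programmable and an immediate texel offset) can be sketched in C.  The
    function names and the array representations of the operands are
    assumptions made for this example, and the sketch does not model the
    derivative, LOD, or sampling steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Projective divide applied by TXP to the coordinate operand:  x, y, and z
 * are divided by w, and w itself is passed through unchanged. */
void txp_project(const float in[4], float out[4])
{
    assert(in != NULL && out != NULL);
    out[0] = in[0] / in[3];
    out[1] = in[1] / in[3];
    out[2] = in[2] / in[3];
    out[3] = in[3];
}

/* Offset selection shared by the TXL and TXP pseudo-code:  use the offset
 * loaded from the instruction operand when the instruction has a variable
 * texel offset, and the immediate offset otherwise. */
const int *select_texel_offset(bool has_variable_offset,
                               const int *operand_offset,
                               const int *immediate_offset)
{
    assert(operand_offset != NULL && immediate_offset != NULL);
    return has_variable_offset ? operand_offset : immediate_offset;
}
```

    For a coordinate of (2, 4, 6, 2), the projected coordinate is
    (1, 2, 3, 2).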
22365bd8deadSopenharmony_ci
22375bd8deadSopenharmony_ci        
22385bd8deadSopenharmony_ci    Section 2.X.8.Z, UP64:  Unpack 64-bit Component
22395bd8deadSopenharmony_ci
22405bd8deadSopenharmony_ci    The UP64 instruction produces a vector result with 32-bit components by
22415bd8deadSopenharmony_ci    unpacking the bits of the "x" and "y" components of a 64-bit vector
22425bd8deadSopenharmony_ci    operand.  The "x" component of the operand is unpacked to produce the "x"
22435bd8deadSopenharmony_ci    and "y" components of the result vector; the "y" component is unpacked to
22445bd8deadSopenharmony_ci    produce the "z" and "w" components of the result vector.
22455bd8deadSopenharmony_ci
22465bd8deadSopenharmony_ci    This instruction is intended to allow a program to pass 64-bit integer or
22475bd8deadSopenharmony_ci    floating-point values to an application using two 32-bit values stored in
22485bd8deadSopenharmony_ci    adjacent words in memory, which will be read by the application as single
22495bd8deadSopenharmony_ci    64-bit values.  The ability to use this technique depends on how the
22505bd8deadSopenharmony_ci    64-bit value is stored in memory.  For "little-endian" processors, the
    first 32-bit value holds the least significant 32 bits of the 64-bit
    value.  For "big-endian" processors, the first 32-bit value
22535bd8deadSopenharmony_ci    holds the most significant 32 bits of the 64-bit value.  This
22545bd8deadSopenharmony_ci    reconstruction assumes that the first 32-bit word comes from the "x"
22555bd8deadSopenharmony_ci    component of the operand and the second 32-bit word comes from the "y"
22565bd8deadSopenharmony_ci    component.  The method used to unpack a 64-bit value into a pair of 32-bit
22575bd8deadSopenharmony_ci    values depends on the processor type.
22585bd8deadSopenharmony_ci
22595bd8deadSopenharmony_ci      tmp = VectorLoad(op0);
22605bd8deadSopenharmony_ci      if (underlying system is little-endian) {
22615bd8deadSopenharmony_ci        result.x = (RawBits(tmp.x) >>  0) & 0xFFFFFFFF;
22625bd8deadSopenharmony_ci        result.y = (RawBits(tmp.x) >> 32) & 0xFFFFFFFF;
22635bd8deadSopenharmony_ci        result.z = (RawBits(tmp.y) >>  0) & 0xFFFFFFFF;
22645bd8deadSopenharmony_ci        result.w = (RawBits(tmp.y) >> 32) & 0xFFFFFFFF;
22655bd8deadSopenharmony_ci      } else {
22665bd8deadSopenharmony_ci        result.x = (RawBits(tmp.x) >> 32) & 0xFFFFFFFF;
22675bd8deadSopenharmony_ci        result.y = (RawBits(tmp.x) >>  0) & 0xFFFFFFFF;
22685bd8deadSopenharmony_ci        result.z = (RawBits(tmp.y) >> 32) & 0xFFFFFFFF;
22695bd8deadSopenharmony_ci        result.w = (RawBits(tmp.y) >>  0) & 0xFFFFFFFF;
22705bd8deadSopenharmony_ci      }
22715bd8deadSopenharmony_ci
22725bd8deadSopenharmony_ci    UP64 supports integer and floating-point data type modifiers, which
22735bd8deadSopenharmony_ci    specify the base data type of the operand and result.  The single operand
22745bd8deadSopenharmony_ci    vector always has 64-bit components.  The result is treated as a vector
22755bd8deadSopenharmony_ci    with 32-bit components.  The encoding performed by UP64 can be reversed
22765bd8deadSopenharmony_ci    using the PK64 instruction.
22775bd8deadSopenharmony_ci
22785bd8deadSopenharmony_ci    A program will fail to load if it contains a UP64 instruction whose
22795bd8deadSopenharmony_ci    operand is a variable not declared as "LONG".
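
    As an illustration only, the per-component behavior of UP64 and its
    inverse can be sketched in C.  The function names are hypothetical, and
    endianness is passed as an explicit flag rather than detected, so that
    both branches of the pseudo-code can be exercised on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Split the raw bits of one 64-bit component into two 32-bit words, in the
 * order they would occupy adjacent words of memory on the given system. */
void up64_component(uint64_t bits, bool little_endian, uint32_t out[2])
{
    assert(out != NULL);
    uint32_t lo = (uint32_t)(bits & 0xFFFFFFFFu);
    uint32_t hi = (uint32_t)(bits >> 32);
    out[0] = little_endian ? lo : hi;
    out[1] = little_endian ? hi : lo;
}

/* Reverse of up64_component:  rebuild the 64-bit value from two words. */
uint64_t pk64_component(const uint32_t in[2], bool little_endian)
{
    assert(in != NULL);
    uint64_t lo = little_endian ? in[0] : in[1];
    uint64_t hi = little_endian ? in[1] : in[0];
    return (hi << 32) | lo;
}
```

    Applying up64_component to the "x" and then the "y" component of the
    operand corresponds to the (x, y) and (z, w) result pairs in the
    pseudo-code above.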
22805bd8deadSopenharmony_ci
22815bd8deadSopenharmony_ci
22825bd8deadSopenharmony_ci    Modify Section 2.14.6.1 of the NV_geometry_program4 specification,
22835bd8deadSopenharmony_ci    Geometry Program Input Primitives
22845bd8deadSopenharmony_ci
22855bd8deadSopenharmony_ci    (add patches to the list of supported input primitive types)
22865bd8deadSopenharmony_ci
22875bd8deadSopenharmony_ci    The supported input primitive types are: ...
22885bd8deadSopenharmony_ci
22895bd8deadSopenharmony_ci    Patches (PATCHES)
22905bd8deadSopenharmony_ci
22915bd8deadSopenharmony_ci    Geometry programs that operate on patches are valid only for the
22925bd8deadSopenharmony_ci    PATCHES_NV primitive type.  There are a variable number of vertices
22935bd8deadSopenharmony_ci    available for each program invocation, depending on the number of input
22945bd8deadSopenharmony_ci    vertices in the primitive itself.  For a patch with <n> vertices,
22955bd8deadSopenharmony_ci    "vertex[0]" refers to the first vertex of the patch, and "vertex[<n>-1]"
22965bd8deadSopenharmony_ci    refers to the last vertex.
22975bd8deadSopenharmony_ci
22985bd8deadSopenharmony_ci    
22995bd8deadSopenharmony_ci    Modify Section 2.14.6.2 of the NV_geometry_program4 specification,
23005bd8deadSopenharmony_ci    Geometry Program Output Primitives
23015bd8deadSopenharmony_ci
    (Add a new paragraph at the end of the section, limiting the use of the
     EMITS opcode to geometry programs with a POINTS output primitive type.
     This limitation may be removed in future specifications.)
23055bd8deadSopenharmony_ci
23065bd8deadSopenharmony_ci    Geometry programs may write to multiple vertex streams only if the
23075bd8deadSopenharmony_ci    specified output primitive type is POINTS.  A program will fail to load if
    it contains an EMITS instruction and the output primitive type specified
23095bd8deadSopenharmony_ci    by the PRIMITIVE_OUT declaration is not POINTS.
23105bd8deadSopenharmony_ci
23115bd8deadSopenharmony_ci    Modify Section 2.14.6.4 of the NV_geometry_program4 specification,
23125bd8deadSopenharmony_ci    Geometry Program Output Limits
23135bd8deadSopenharmony_ci
    (Modify the limitation on the total number of components emitted by a
     geometry program from NV_gpu_program4 to be per-invocation.  If that
     limit is 4096 and a program has 16 invocations, each of the 16 program
     invocations can emit up to 4096 total components.)
23185bd8deadSopenharmony_ci
23195bd8deadSopenharmony_ci    There are two implementation-dependent limits that limit the total number
23205bd8deadSopenharmony_ci    of vertices that each invocation of a program can emit.  First, the vertex
23215bd8deadSopenharmony_ci    limit may not exceed the value of MAX_PROGRAM_OUTPUT_VERTICES_NV.  Second,
    the product of the vertex limit and the number of result variable components
23235bd8deadSopenharmony_ci    written by the program (PROGRAM_RESULT_COMPONENTS_NV, as described in
23245bd8deadSopenharmony_ci    section 2.X.3.5 of NV_gpu_program4) may not exceed the value of
23255bd8deadSopenharmony_ci    MAX_PROGRAM_TOTAL_OUTPUT_COMPONENTS_NV.  A geometry program will fail to
23265bd8deadSopenharmony_ci    load if its maximum vertex count or maximum total component count exceeds
23275bd8deadSopenharmony_ci    the implementation-dependent limit.  The limits may be queried by calling
23285bd8deadSopenharmony_ci    GetProgramiv with a <target> of GEOMETRY_PROGRAM_NV.  Note that the
23295bd8deadSopenharmony_ci    maximum number of vertices that a geometry program can emit may be much
23305bd8deadSopenharmony_ci    lower than MAX_PROGRAM_OUTPUT_VERTICES_NV if the program writes a large
23315bd8deadSopenharmony_ci    number of result variable components.  If a geometry program has multiple
23325bd8deadSopenharmony_ci    invocations (via the "INVOCATIONS" declaration), the program will load
23335bd8deadSopenharmony_ci    successfully as long as no single invocation exceeds the total component
23345bd8deadSopenharmony_ci    count limit, even if the total output of all invocations combined exceeds
23355bd8deadSopenharmony_ci    the limit.
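
    As an illustration only, the two per-invocation limits can be expressed
    as a small C check.  The function names are hypothetical, and the limit
    values are plain parameters here; a real application would obtain them
    by querying the implementation as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a single invocation's declared output fits both limits:
 * the vertex limit itself, and the product of the vertex limit and the
 * number of result components written by the program. */
bool gp_output_fits(int vertex_limit, int result_components,
                    int max_output_vertices, int max_total_components)
{
    assert(vertex_limit >= 0 && result_components >= 0);
    if (vertex_limit > max_output_vertices)
        return false;
    /* Widen before multiplying so the product cannot overflow an int. */
    return (long long)vertex_limit * result_components
           <= (long long)max_total_components;
}

/* Largest vertex limit that still fits for a given component count. */
int gp_max_vertices(int result_components,
                    int max_output_vertices, int max_total_components)
{
    assert(result_components > 0);
    int by_components = max_total_components / result_components;
    return by_components < max_output_vertices ? by_components
                                               : max_output_vertices;
}
```

    With a total component limit of 4096 and a hypothetical vertex limit of
    1024, a program writing 32 result components can emit at most 128
    vertices per invocation.  The number of invocations does not enter the
    check, since each invocation is measured against the limits separately.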
23365bd8deadSopenharmony_ci
23375bd8deadSopenharmony_ci
23385bd8deadSopenharmony_ciAdditions to Chapter 3 of the OpenGL 3.0 Specification (Rasterization)
23395bd8deadSopenharmony_ci
23405bd8deadSopenharmony_ci    Modify Section 3.X, Early Per-Fragment Tests, as documented in the
23415bd8deadSopenharmony_ci    EXT_shader_image_load_store specification
23425bd8deadSopenharmony_ci
    (add a new paragraph at the end of the section, describing how early
     fragment tests work when assembly fragment programs are active)
23455bd8deadSopenharmony_ci
23465bd8deadSopenharmony_ci    If an assembly fragment program is active, early depth tests are
23475bd8deadSopenharmony_ci    considered enabled if and only if the fragment program source included the
23485bd8deadSopenharmony_ci    NV_early_fragment_tests option.
23495bd8deadSopenharmony_ci
23505bd8deadSopenharmony_ci
23515bd8deadSopenharmony_ci   Add to Section 3.11.4.5 of ARB_fragment_program (Fragment Program):
23525bd8deadSopenharmony_ci
23535bd8deadSopenharmony_ci   Section 3.11.4.5.3, ARB_blend_func_extended Option
23545bd8deadSopenharmony_ci
23555bd8deadSopenharmony_ci   If a fragment program specifies the "ARB_blend_func_extended" option, dual 
23565bd8deadSopenharmony_ci   source color outputs as described in ARB_blend_func_extended are made 
23575bd8deadSopenharmony_ci   available through the use of the "result.color[n].primary" and
23585bd8deadSopenharmony_ci   "result.color[n].secondary" result bindings, corresponding to SRC_COLOR
23595bd8deadSopenharmony_ci   and SRC1_COLOR, respectively, for the fragment color output numbered <n>.
23605bd8deadSopenharmony_ci
23615bd8deadSopenharmony_ci
23625bd8deadSopenharmony_ciAdditions to Chapter 4 of the OpenGL 3.0 Specification (Per-Fragment
23635bd8deadSopenharmony_ciOperations and the Frame Buffer)
23645bd8deadSopenharmony_ci
23655bd8deadSopenharmony_ci    Modify Section 4.4.3, Rendering When an Image of a Bound Texture Object
23665bd8deadSopenharmony_ci    is Also Attached to the Framebuffer, p. 288
23675bd8deadSopenharmony_ci
23685bd8deadSopenharmony_ci    (Replace the complicated set of conditions with the following)
23695bd8deadSopenharmony_ci
23705bd8deadSopenharmony_ci    Specifically, the values of rendered fragments are undefined if any 
23715bd8deadSopenharmony_ci    shader stage fetches texels from a given mipmap level, cubemap face, and 
23725bd8deadSopenharmony_ci    array layer of a texture if that same mipmap level, cubemap face, and 
23735bd8deadSopenharmony_ci    array layer of the texture can be written to via fragment shader outputs, 
23745bd8deadSopenharmony_ci    even if the reads and writes are not in the same Draw call. However, an 
23755bd8deadSopenharmony_ci    application can insert MemoryBarrier(TEXTURE_FETCH_BARRIER_BIT_NV) between
23765bd8deadSopenharmony_ci    Draw calls that have such read/write hazards in order to guarantee that 
23775bd8deadSopenharmony_ci    writes have completed and caches have been invalidated, as described in 
23785bd8deadSopenharmony_ci    section 2.20.X.
23795bd8deadSopenharmony_ci
23805bd8deadSopenharmony_ci
23815bd8deadSopenharmony_ciAdditions to Chapter 5 of the OpenGL 3.0 Specification (Special Functions)
23825bd8deadSopenharmony_ci
23835bd8deadSopenharmony_ci    None.
23845bd8deadSopenharmony_ci
23855bd8deadSopenharmony_ciAdditions to Chapter 6 of the OpenGL 3.0 Specification (State and
23865bd8deadSopenharmony_ciState Requests)
23875bd8deadSopenharmony_ci
23885bd8deadSopenharmony_ci    None.
23895bd8deadSopenharmony_ci
23905bd8deadSopenharmony_ciAdditions to Appendix A of the OpenGL 3.0 Specification (Invariance)
23915bd8deadSopenharmony_ci
23925bd8deadSopenharmony_ci    None.
23935bd8deadSopenharmony_ci
23945bd8deadSopenharmony_ciAdditions to the AGL/GLX/WGL Specifications
23955bd8deadSopenharmony_ci
23965bd8deadSopenharmony_ci    None.
23975bd8deadSopenharmony_ci
23985bd8deadSopenharmony_ciGLX Protocol
23995bd8deadSopenharmony_ci
24005bd8deadSopenharmony_ci    None.
24015bd8deadSopenharmony_ci
24025bd8deadSopenharmony_ciErrors
24035bd8deadSopenharmony_ci
24045bd8deadSopenharmony_ci    None, other than new conditions by which a program string would fail to
24055bd8deadSopenharmony_ci    load.
24065bd8deadSopenharmony_ci
24075bd8deadSopenharmony_ciNew State
24085bd8deadSopenharmony_ci
24095bd8deadSopenharmony_ci    None.
24105bd8deadSopenharmony_ci
24115bd8deadSopenharmony_ci
24125bd8deadSopenharmony_ciNew Implementation Dependent State
24135bd8deadSopenharmony_ci
24145bd8deadSopenharmony_ci                                                             Minimum
24155bd8deadSopenharmony_ci    Get Value                         Type  Get Command       Value   Description           Sec.   Attrib
24165bd8deadSopenharmony_ci    --------------------------------  ----  ---------------  -------  --------------------- ------ ------
24175bd8deadSopenharmony_ci    MAX_GEOMETRY_PROGRAM_              Z+   GetIntegerv        32     Maximum number of GP  2.X.6.Y  -
24185bd8deadSopenharmony_ci      INVOCATIONS_NV                                                  invocations per prim.
24195bd8deadSopenharmony_ci    MIN_FRAGMENT_INTERPOLATION_        R    GetFloatv        -0.5     Max. negative offset  2.X.8.Z  -
24205bd8deadSopenharmony_ci      OFFSET_NV                                                       for IPAO instruction.
24215bd8deadSopenharmony_ci    MAX_FRAGMENT_INTERPOLATION_        R    GetFloatv         +0.5    Max. positive offset  2.X.8.Z  -
24225bd8deadSopenharmony_ci      OFFSET_NV                                                       for IPAO instruction.
24235bd8deadSopenharmony_ci    FRAGMENT_PROGRAM_INTERPOLATION_    Z+   GetIntegerv         4     Subpixel bit count    2.X.8.Z  -
24245bd8deadSopenharmony_ci      OFFSET_BITS_NV                                                  for IPAO instruction
24255bd8deadSopenharmony_ci
24265bd8deadSopenharmony_ci
24275bd8deadSopenharmony_ciDependencies on NV_gpu_program4, NV_vertex_program4, NV_geometry_program4, and
24285bd8deadSopenharmony_ciNV_fragment_program4
24295bd8deadSopenharmony_ci
24305bd8deadSopenharmony_ci    This extension is written against the NV_gpu_program4 family of
24315bd8deadSopenharmony_ci    extensions, and introduces new instruction set features and inputs/outputs
24325bd8deadSopenharmony_ci    described here.  These features are available only if the extension is
24335bd8deadSopenharmony_ci    supported and the appropriate program header string is used ("!!NVvp5.0"
24345bd8deadSopenharmony_ci    for vertex programs, "!!NVgp5.0" for geometry programs, and "!!NVfp5.0"
    for fragment programs).  When loading a program with an older header (e.g.,
24365bd8deadSopenharmony_ci    "!!NVvp4.0"), the instruction set features described in this extension are
24375bd8deadSopenharmony_ci    not available.  The features in this extension build upon those documented
24385bd8deadSopenharmony_ci    in full in NV_gpu_program4.
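
    As an illustration only, an application-side helper that tells whether
    a program string opts in to these features might look like the
    following C sketch.  The function name is hypothetical, and the sketch
    recognizes only the three header strings named in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the program string begins with one of the version 5.0
 * headers listed above for vertex, geometry, or fragment programs. */
bool uses_gpu_program5_header(const char *program)
{
    static const char *const headers[] = {
        "!!NVvp5.0", "!!NVgp5.0", "!!NVfp5.0"
    };
    assert(program != NULL);
    for (size_t i = 0; i < sizeof headers / sizeof headers[0]; ++i) {
        if (strncmp(program, headers[i], strlen(headers[i])) == 0)
            return true;
    }
    return false;
}
```

    A program beginning with "!!NVvp4.0" is reported as not using the new
    instruction set features, matching the rule above.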
24395bd8deadSopenharmony_ci
24405bd8deadSopenharmony_ciDependencies on NV_tessellation_program5
24415bd8deadSopenharmony_ci
24425bd8deadSopenharmony_ci    This extension provides the basic assembly instruction set constructs for
24435bd8deadSopenharmony_ci    tessellation programs.  If this extension is supported, tessellation
24445bd8deadSopenharmony_ci    control and evaluation programs are supported, as described in the
24455bd8deadSopenharmony_ci    NV_tessellation_program5 specification.  There is no separate extension
24465bd8deadSopenharmony_ci    string for tessellation programs; such support is implied by this
24475bd8deadSopenharmony_ci    extension.
24485bd8deadSopenharmony_ci
24495bd8deadSopenharmony_ciDependencies on ARB_transform_feedback3
24505bd8deadSopenharmony_ci
24515bd8deadSopenharmony_ci    The concept of multiple vertex streams emitted by a geometry shader is
24525bd8deadSopenharmony_ci    introduced by ARB_transform_feedback3, as is the description of how they
24535bd8deadSopenharmony_ci    operate and implementation-dependent limits on the number of streams.
24545bd8deadSopenharmony_ci    This extension simply provides a mechanism to emit a vertex to more than
24555bd8deadSopenharmony_ci    one stream.  If ARB_transform_feedback3 is not supported, language
24565bd8deadSopenharmony_ci    describing the EMITS opcode and the restriction on PRIMITIVE_OUT when
24575bd8deadSopenharmony_ci    EMITS is used should be removed.
24585bd8deadSopenharmony_ci
24595bd8deadSopenharmony_ciDependencies on NV_shader_buffer_load
24605bd8deadSopenharmony_ci
24615bd8deadSopenharmony_ci    The programmability functionality provided by NV_shader_buffer_load is
24625bd8deadSopenharmony_ci    also incorporated by this extension.  Any assembly program using a program
24635bd8deadSopenharmony_ci    header corresponding to this or any subsequent extension (e.g.,
24645bd8deadSopenharmony_ci    "!!NVfp5.0") may use the LOAD opcode without needing to declare "OPTION
24655bd8deadSopenharmony_ci    NV_shader_buffer_load".
24665bd8deadSopenharmony_ci
24675bd8deadSopenharmony_ci    NV_shader_buffer_load is required by this extension, which means that the
24685bd8deadSopenharmony_ci    API mechanisms documented there allowing applications to make a buffer
24695bd8deadSopenharmony_ci    resident and query its GPU address are available to any applications using
24705bd8deadSopenharmony_ci    this extension.
24715bd8deadSopenharmony_ci
24725bd8deadSopenharmony_ci    In addition to the basic functionality in NV_shader_buffer_load, this
24735bd8deadSopenharmony_ci    extension provides the ability to load 64-bit integers and floating-point
24745bd8deadSopenharmony_ci    values using the "S64", "S64X2", "S64X4", "U64", "U64X2", "U64X4", "F64",
24755bd8deadSopenharmony_ci    "F64X2", and "F64X4" opcode modifiers.
24765bd8deadSopenharmony_ci
24775bd8deadSopenharmony_ciDependencies on NV_shader_buffer_store
24785bd8deadSopenharmony_ci
    This extension provides assembly programmability support for
    NV_shader_buffer_store, which provides the API mechanisms allowing buffer
    objects to be stored to.  NV_shader_buffer_store does not have a separate
24825bd8deadSopenharmony_ci    extension string entry, and will always be supported if this extension is
24835bd8deadSopenharmony_ci    present.
24845bd8deadSopenharmony_ci
24855bd8deadSopenharmony_ciDependencies on NV_parameter_buffer_object2
24865bd8deadSopenharmony_ci
24875bd8deadSopenharmony_ci    The programmability functionality provided by NV_parameter_buffer_object2
24885bd8deadSopenharmony_ci    is also incorporated by this extension.  Any assembly program using a
24895bd8deadSopenharmony_ci    program header corresponding to this or any subsequent extension (e.g.,
24905bd8deadSopenharmony_ci    "!!NVfp5.0") may use the LDC opcode without needing to declare "OPTION
24915bd8deadSopenharmony_ci    NV_parameter_buffer_object2".
24925bd8deadSopenharmony_ci
24935bd8deadSopenharmony_ci    In addition to the basic functionality in NV_parameter_buffer_object2,
24945bd8deadSopenharmony_ci    this extension provides the ability to load 64-bit integers and
24955bd8deadSopenharmony_ci    floating-point values using the "S64", "S64X2", "S64X4", "U64", "U64X2",
24965bd8deadSopenharmony_ci    "U64X4", "F64", "F64X2", and "F64X4" opcode modifiers.
24975bd8deadSopenharmony_ci
24985bd8deadSopenharmony_ciDependencies on OpenGL 3.3, ARB_texture_swizzle, and EXT_texture_swizzle
24995bd8deadSopenharmony_ci
25005bd8deadSopenharmony_ci    If OpenGL 3.3, ARB_texture_swizzle, and EXT_texture_swizzle are not
25015bd8deadSopenharmony_ci    supported, remove the swizzling step from the definition of TXG and TXGO.
25025bd8deadSopenharmony_ci
25035bd8deadSopenharmony_ciDependencies on ARB_blend_func_extended
25045bd8deadSopenharmony_ci
25055bd8deadSopenharmony_ci    If ARB_blend_func_extended is not supported, references to the dual source
25065bd8deadSopenharmony_ci    color output bindings (result.color.primary and result.color.secondary)
25075bd8deadSopenharmony_ci    should be removed.
25085bd8deadSopenharmony_ci
25095bd8deadSopenharmony_ciDependencies on EXT_shader_image_load_store
25105bd8deadSopenharmony_ci
25115bd8deadSopenharmony_ci    EXT_shader_image_load_store provides OpenGL Shading Language mechanisms to
25125bd8deadSopenharmony_ci    load/store to buffer and texture image memory, including spec language
25135bd8deadSopenharmony_ci    describing memory access ordering and synchronization, a built-in function
25145bd8deadSopenharmony_ci    (MemoryBarrierEXT) controlling synchronization of memory operations, and
25155bd8deadSopenharmony_ci    spec language describing early fragment tests that can be enabled via GLSL
25165bd8deadSopenharmony_ci    fragment shader source.  These sections of the EXT_shader_image_load_store
25175bd8deadSopenharmony_ci    specification apply equally to the assembly program memory accesses
25185bd8deadSopenharmony_ci    provided by this extension.  If EXT_shader_image_load_store is not
25195bd8deadSopenharmony_ci    supported, the sections of that specification describing these features
25205bd8deadSopenharmony_ci    should be considered to be added to this extension.
25215bd8deadSopenharmony_ci
25225bd8deadSopenharmony_ci    EXT_shader_image_load_store additionally provides and documents assembly
25235bd8deadSopenharmony_ci    language support for image loads, stores, and atomics as described in the
25245bd8deadSopenharmony_ci    "Dependencies on NV_gpu_program5" section of EXT_shader_image_load_store.
25255bd8deadSopenharmony_ci    The features described there are automatically supported for all
25265bd8deadSopenharmony_ci    NV_gpu_program5 assembly programs without requiring any additional
25275bd8deadSopenharmony_ci    "OPTION" line.
25285bd8deadSopenharmony_ci
25295bd8deadSopenharmony_ciDependencies on ARB_shader_subroutine
25305bd8deadSopenharmony_ci
25315bd8deadSopenharmony_ci    ARB_shader_subroutine provides and documents assembly language support for
25325bd8deadSopenharmony_ci    subroutines as described in the "Dependencies on NV_gpu_program5" section
25335bd8deadSopenharmony_ci    of ARB_shader_subroutine.  The features described there are automatically
25345bd8deadSopenharmony_ci    supported for all NV_gpu_program5 assembly programs without requiring any
25355bd8deadSopenharmony_ci    additional "OPTION" line.
25365bd8deadSopenharmony_ci
25375bd8deadSopenharmony_ci
25385bd8deadSopenharmony_ciIssues
25395bd8deadSopenharmony_ci
25405bd8deadSopenharmony_ci    (1) Are there any restrictions or performance concerns involving the
25415bd8deadSopenharmony_ci        support for indexing textures or parameter buffers?
25425bd8deadSopenharmony_ci
25435bd8deadSopenharmony_ci      RESOLVED:  There are no significant functional limitations.  Textures
25445bd8deadSopenharmony_ci      and parameter buffers accessed with an index must be declared as arrays,
25455bd8deadSopenharmony_ci      so the assembler knows which textures might be accessed this way.
25465bd8deadSopenharmony_ci      Additionally, accessing an array of textures or parameter buffers with
25475bd8deadSopenharmony_ci      an out-of-bounds index will yield undefined results.  
25485bd8deadSopenharmony_ci
25495bd8deadSopenharmony_ci      In particular, there is no limitation on the values used for indexing --
25505bd8deadSopenharmony_ci      they are not required to be true constants and are not required to have
25515bd8deadSopenharmony_ci      the same value for all vertices/fragments in a primitive.  However,
25525bd8deadSopenharmony_ci      using divergent texture or parameter buffer indices may have performance
25535bd8deadSopenharmony_ci      concerns.  We expect that GPU implementations of this extension will run
25545bd8deadSopenharmony_ci      multiple program threads in parallel (SIMD).  If different threads in a
25555bd8deadSopenharmony_ci      thread group have different indices, it will be necessary to do lookups
25565bd8deadSopenharmony_ci      in more than one texture at once.  This is likely to result in some
25575bd8deadSopenharmony_ci      thread serialization.  We expect that indexed texture or parameter
25585bd8deadSopenharmony_ci      buffer access where all indices in a thread group match will perform
25595bd8deadSopenharmony_ci      identically to non-indexed accesses.
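
      As a rough illustration only, divergence within a thread group can be
      measured by counting the distinct indices its threads use.  The C
      sketch below is a hypothetical model, not a description of any
      implementation:  this specification does not define how or whether
      lookups are serialized, and the function name is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the distinct texture or parameter buffer indices used by the
 * threads of one group.  A count of 1 means all indices match. */
size_t count_distinct_indices(const int *indices, size_t n)
{
    assert(indices != NULL || n == 0);
    size_t distinct = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Look for an earlier thread that used the same index. */
        size_t k = 0;
        while (k < i && indices[k] != indices[i])
            ++k;
        if (k == i)
            ++distinct;   /* first occurrence of this index */
    }
    return distinct;
}
```

      A group whose threads all use index 3 yields a count of 1, the case
      expected to perform like a non-indexed access; a group using indices
      0, 1, 0, and 2 yields a count of 3.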
25605bd8deadSopenharmony_ci
25615bd8deadSopenharmony_ci    (2) Which texture instructions support programmable texel offsets, and
25625bd8deadSopenharmony_ci        what offset limits apply?
25635bd8deadSopenharmony_ci
25645bd8deadSopenharmony_ci      RESOLVED:  Most texture instructions (TEX, TXB, TXF, TXG, TXL, TXP)
25655bd8deadSopenharmony_ci      support both constant texel offsets as provided by NV_gpu_program4 and
25665bd8deadSopenharmony_ci      programmable texel offsets.  TXD supports only constant offsets.  TXGO
25675bd8deadSopenharmony_ci      does not support non-zero or programmable offsets in the texture portion
25685bd8deadSopenharmony_ci      of the instruction, but provides full support for programmable offsets
25695bd8deadSopenharmony_ci      via two of the three vector arguments in the regular instruction.
25705bd8deadSopenharmony_ci
25715bd8deadSopenharmony_ci      For example,
25725bd8deadSopenharmony_ci
25735bd8deadSopenharmony_ci        TEX result, coord, texture[0], 2D, (-1,-1);
25745bd8deadSopenharmony_ci
      uses the NV_gpu_program4 mechanism to apply a constant texel offset of
25765bd8deadSopenharmony_ci      (-1,-1) to the texture coordinates.  With programmable offsets, the
25775bd8deadSopenharmony_ci      following code applies the same offset.
25785bd8deadSopenharmony_ci
25795bd8deadSopenharmony_ci        TEMP offxy;
25805bd8deadSopenharmony_ci        MOV offxy, {-1, -1};
        TEX result, coord, texture[0], 2D, offset(offxy);
25825bd8deadSopenharmony_ci
25835bd8deadSopenharmony_ci      Of course, the programmable form allows the offsets to be computed in
25845bd8deadSopenharmony_ci      the program and does not require constant values.
25855bd8deadSopenharmony_ci
25865bd8deadSopenharmony_ci      For most texture instructions, the range of allowable offsets is
25875bd8deadSopenharmony_ci      [MIN_PROGRAM_TEXEL_OFFSET_EXT, MAX_PROGRAM_TEXEL_OFFSET_EXT] for both
25885bd8deadSopenharmony_ci      constant and programmable texel offsets.  Constant offsets can be
25895bd8deadSopenharmony_ci      checked when the program is loaded, and out-of-bounds offsets cause the
      program to fail to load.  Programmable offsets cannot have a
25915bd8deadSopenharmony_ci      load-time range check; out-of-bounds offsets produce undefined results.
25925bd8deadSopenharmony_ci
25935bd8deadSopenharmony_ci      Additionally, the new TXGO instruction has a separate (likely larger)
25945bd8deadSopenharmony_ci      allowable offset range, [MIN_PROGRAM_TEXTURE_GATHER_OFFSET_NV,
25955bd8deadSopenharmony_ci      MAX_PROGRAM_TEXTURE_GATHER_OFFSET_NV], that applies to the offset
25965bd8deadSopenharmony_ci      vectors passed in its second and third operand.
25975bd8deadSopenharmony_ci
25985bd8deadSopenharmony_ci      In the initial implementation of this extension, the range limits are
25995bd8deadSopenharmony_ci      [-8,+7] for most instructions and [-32,+31] for TXGO.
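
      As an illustration only, the load-time check on constant offsets can
      be sketched in C.  The type and function names are hypothetical, and
      the ranges are passed in as parameters; the values quoted above for
      the initial implementation are used only in the usage note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Inclusive range of allowable texel offsets. */
typedef struct {
    int min;
    int max;
} OffsetRange;

/* Returns true if every offset component lies inside the range.  For a
 * constant offset, a false result corresponds to a program that fails to
 * load; programmable offsets get no such check. */
bool texel_offset_in_range(const int *offset, int count, OffsetRange range)
{
    assert(offset != NULL && count >= 0);
    for (int i = 0; i < count; ++i) {
        if (offset[i] < range.min || offset[i] > range.max)
            return false;
    }
    return true;
}
```

      With the initial ranges, the offset (-1,-1) passes the [-8,+7] check,
      while an offset component of +16 passes only the [-32,+31] check that
      applies to TXGO.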
26005bd8deadSopenharmony_ci
26015bd8deadSopenharmony_ci    (3) What is TXGO (texture gather with separate offsets) good for?
26025bd8deadSopenharmony_ci
26035bd8deadSopenharmony_ci      RESOLVED:  TXGO allows for efficiently sampling a single-component
26045bd8deadSopenharmony_ci      texture with a variety of offsets that need not be contiguous.
26055bd8deadSopenharmony_ci
26065bd8deadSopenharmony_ci      For example, a shadow mapping algorithm using a high-resolution shadow
      map may have pixels whose footprint covers a large number of texels in
26085bd8deadSopenharmony_ci      the shadow map.  Such pixels could do a single lookup into a
26095bd8deadSopenharmony_ci      lower-resolution texture (using mipmapping), but quality problems will
26105bd8deadSopenharmony_ci      arise.  Alternately, a shader could perform a large number of texture
26115bd8deadSopenharmony_ci      lookups using either NEAREST or LINEAR filtering from the
26125bd8deadSopenharmony_ci      high-resolution texture.  NEAREST filtering will require a separate
26135bd8deadSopenharmony_ci      lookup for each texel accessed; LINEAR filtering may require somewhat
26145bd8deadSopenharmony_ci      fewer lookups, but all accesses cover a 2x2 portion of the texture.  The
26155bd8deadSopenharmony_ci      TXG instruction added to NV_gpu_program4_1 allows a 2x2 block of texels
26165bd8deadSopenharmony_ci      to be returned in a single instruction in case the program wants to do
26175bd8deadSopenharmony_ci      something other than linear filtering with the samples.  The TXGO allows
26185bd8deadSopenharmony_ci      a program to do semi-random sampling of the texture without requiring
26195bd8deadSopenharmony_ci      that each sample cover a 2x2 block of texels.  For example, the TXGO
      instruction would allow a program to sample the four texels A, H, J,
      and O from the 4x4 block depicted below:
26225bd8deadSopenharmony_ci
26235bd8deadSopenharmony_ci        TXGO result, coord, {-1,+2,0,+1}, {-1,0,+1,+2}, texture[0], 2D;
26245bd8deadSopenharmony_ci
      The "equivalent" TXG instruction would only sample the four center
      texels F, G, J, and K:
26275bd8deadSopenharmony_ci
26285bd8deadSopenharmony_ci        TXG result, coord, texture[0], 2D;
26295bd8deadSopenharmony_ci
26305bd8deadSopenharmony_ci      All sixteen texels of the footprint could be sampled with four TXG
26315bd8deadSopenharmony_ci      instructions,
26325bd8deadSopenharmony_ci
26335bd8deadSopenharmony_ci        TXG result0, coord, texture[0], 2D, (-1,-1);
26345bd8deadSopenharmony_ci        TXG result1, coord, texture[0], 2D, (-1,+1);
26355bd8deadSopenharmony_ci        TXG result2, coord, texture[0], 2D, (+1,-1);
26365bd8deadSopenharmony_ci        TXG result3, coord, texture[0], 2D, (+1,+1);
26375bd8deadSopenharmony_ci
26385bd8deadSopenharmony_ci      but accessing a smaller number of samples spread across the footprint
26395bd8deadSopenharmony_ci      with fewer instructions may produce results that are good enough.
26405bd8deadSopenharmony_ci
26415bd8deadSopenharmony_ci      The figure here depicts a texture with texel (0,0) shown in the
26425bd8deadSopenharmony_ci      upper-left corner.  If you insist on a lower-left origin, please look at
26435bd8deadSopenharmony_ci      this figure while standing on your head.
26445bd8deadSopenharmony_ci
26455bd8deadSopenharmony_ci       (0,0) +-+-+-+-+
26465bd8deadSopenharmony_ci             |A|B|C|D|
26475bd8deadSopenharmony_ci             +-+-+-+-+
26485bd8deadSopenharmony_ci             |E|F|G|H|
26495bd8deadSopenharmony_ci             +-+-+-+-+
26505bd8deadSopenharmony_ci             |I|J|K|L|
26515bd8deadSopenharmony_ci             +-+-+-+-+
26525bd8deadSopenharmony_ci             |M|N|O|P|
26535bd8deadSopenharmony_ci             +-+-+-+-+ (4,4)
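
      As a rough illustration of how the TXGO offsets in the example above
      select texels from the figure, the following C sketch maps each (x,y)
      offset pair to a letter in the 4x4 block.  The helper name txgo_pick
      and the choice of texel F at (1,1) as the zero-offset texel are
      assumptions made for this illustration (F is the upper-left texel of
      the F/G/J/K block that TXG returns); the sketch is not part of the
      extension.

```c
#include <stddef.h>

/* 4x4 texel block from the figure, row-major; texel (0,0) = 'A' is at the
   upper left. */
static const char TEXELS[4][5] = { "ABCD", "EFGH", "IJKL", "MNOP" };

/* Hypothetical helper: writes the four texel letters selected by the TXGO
   offset vectors into out[0..3] and NUL-terminates out[4].
   (base_x, base_y) is the texel that a zero offset would select (assumed).
   Returns 0 on success, -1 if any offset leaves the 4x4 block. */
static int txgo_pick(int base_x, int base_y,
                     const int off_x[4], const int off_y[4], char out[5])
{
    for (size_t i = 0; i < 4; i++) {
        int x = base_x + off_x[i];
        int y = base_y + off_y[i];
        if (x < 0 || x > 3 || y < 0 || y > 3)
            return -1;
        out[i] = TEXELS[y][x];
    }
    out[4] = '\0';
    return 0;
}
```

      With base texel (1,1) and the offset vectors {-1,+2,0,+1} and
      {-1,0,+1,+2}, the sketch selects texels (0,0), (3,1), (1,2), and
      (2,3), which are A, H, J, and O in the figure.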
26545bd8deadSopenharmony_ci
26555bd8deadSopenharmony_ci    (4) Why are the results of TXGO (texture gather with separate offsets)
26565bd8deadSopenharmony_ci        undefined if the wrap mode is CLAMP or MIRROR_CLAMP_EXT?
26575bd8deadSopenharmony_ci
26585bd8deadSopenharmony_ci      RESOLVED:  The CLAMP and MIRROR_CLAMP_EXT wrap modes are fairly
26595bd8deadSopenharmony_ci      different from other wrap modes.  After adding any instruction offsets,
26605bd8deadSopenharmony_ci      the spec says to pre-clamp the (u,v) coordinates to [0,texture_size]
26615bd8deadSopenharmony_ci      before generating the footprint.  If such clamping occurs on one edge
26625bd8deadSopenharmony_ci      for a normal texture filtering operation, the footprint ends up being
26635bd8deadSopenharmony_ci      half border texels, half edge texels, and the clamping effectively
26645bd8deadSopenharmony_ci      forces the interpolation weights used for texture filtering to 50/50.
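
      The 50/50 weighting described above can be seen in a one-dimensional
      C sketch of LINEAR filtering with the CLAMP pre-clamp.  The type
      Footprint1D and the function clamp_footprint are hypothetical names
      used only for this illustration; the sketch assumes the usual
      "u - 0.5" texel-center convention and treats texel indices -1 and
      <size> as border texels.

```c
/* One-dimensional LINEAR footprint: two taps and their weights. */
typedef struct {
    int    i0, i1;   /* texel indices; -1 or size denotes a border texel */
    double w0, w1;   /* filter weights applied to i0 and i1 */
} Footprint1D;

/* Hypothetical sketch of CLAMP wrap handling for an unnormalized
   coordinate u on a texture of the given size. */
static Footprint1D clamp_footprint(double u, int size)
{
    Footprint1D f;

    /* Pre-clamp the coordinate to [0, size] before building the footprint. */
    if (u < 0.0)
        u = 0.0;
    if (u > (double)size)
        u = (double)size;

    /* Shift to texel centers and take the floor (valid for negatives). */
    double s = u - 0.5;
    int i0 = (int)s;
    if ((double)i0 > s)
        i0--;

    f.i0 = i0;
    f.i1 = i0 + 1;
    f.w1 = s - (double)i0;   /* fractional part */
    f.w0 = 1.0 - f.w1;
    return f;
}
```

      Any coordinate clamped to 0 yields taps -1 (border) and 0 (edge) with
      weights 0.5 and 0.5, regardless of how far outside the texture the
      original coordinate was.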
26655bd8deadSopenharmony_ci
26665bd8deadSopenharmony_ci      We expect the TXG instruction to be used in cases where an application
26675bd8deadSopenharmony_ci      may want to do custom filtering, and is in control of its own filtering
26685bd8deadSopenharmony_ci      weights.  Coordinate clamping as above will affect the footprint used
26695bd8deadSopenharmony_ci      for filtering, but not the weights.  In the NV_gpu_program4_1 spec, we
26705bd8deadSopenharmony_ci      defined the TXG/CLAMP combination to simply return the "normal"
26715bd8deadSopenharmony_ci      footprint produced after the pre-clamp operation above.  Any adjustment
26725bd8deadSopenharmony_ci      of weights due to clamping is the responsibility of the application.  We
26735bd8deadSopenharmony_ci      don't expect this to be a common operation, because CLAMP_TO_EDGE or
26745bd8deadSopenharmony_ci      CLAMP_TO_BORDER are much more sensible wrap modes.
26755bd8deadSopenharmony_ci
26765bd8deadSopenharmony_ci      The hardware implementing TXGO is anticipated to extract all four
26775bd8deadSopenharmony_ci      samples in a single pass.  However, the spec language is defined for
26785bd8deadSopenharmony_ci      simplicity to perform four separate "gather" operations with the four
26795bd8deadSopenharmony_ci      provided offsets, extract a single sample from each, and combine the
26805bd8deadSopenharmony_ci      four samples into a vector.  This would require four separate pre-clamp
26815bd8deadSopenharmony_ci      operations, which was deemed too costly to implement in hardware for a
26825bd8deadSopenharmony_ci      wrap mode that doesn't work well with texture gather operations.  Even
26835bd8deadSopenharmony_ci      if such hardware were built, it still wouldn't obtain a footprint
26845bd8deadSopenharmony_ci      resembling the half-border, half-edge footprint for simple TXGO offsets
26855bd8deadSopenharmony_ci      -- that would require different per-texel clamping rules for the four
26865bd8deadSopenharmony_ci      samples.  We chose to leave the results of this operation undefined.
26875bd8deadSopenharmony_ci
26885bd8deadSopenharmony_ci    (5) Should double-precision floating-point support be required or
26895bd8deadSopenharmony_ci        optional?  If optional, how?
26905bd8deadSopenharmony_ci
26915bd8deadSopenharmony_ci      RESOLVED:  Double-precision floating-point support will be optional in
26925bd8deadSopenharmony_ci      case low-end GPUs supporting the remainder of these instruction features
26935bd8deadSopenharmony_ci      choose to cut costs by removing the silicon necessary to implement
26945bd8deadSopenharmony_ci      64-bit floating-point arithmetic.
26955bd8deadSopenharmony_ci
26965bd8deadSopenharmony_ci    (6) While this extension supports double-precision computation, how can
26975bd8deadSopenharmony_ci        you provide high-precision inputs and outputs to the GPU programs?
26985bd8deadSopenharmony_ci
26995bd8deadSopenharmony_ci      RESOLVED:  The underlying hardware implementing this extension does not
27005bd8deadSopenharmony_ci      provide full support for 64-bit floats, even though DOUBLE is a standard
27015bd8deadSopenharmony_ci      data type provided by the GL.  For example, when specifying a vertex
27025bd8deadSopenharmony_ci      array with a data type of DOUBLE, the vertex attribute components will
27035bd8deadSopenharmony_ci      end up being converted to 32-bit floats (FLOAT) by the driver before
27045bd8deadSopenharmony_ci      being passed to the hardware, and the extra precision in the original
27055bd8deadSopenharmony_ci      64-bit float values will be lost.
27065bd8deadSopenharmony_ci
27075bd8deadSopenharmony_ci      For vertex attributes, the EXT_vertex_attrib_64bit and
27085bd8deadSopenharmony_ci      NV_vertex_attrib_integer_64bit extensions provide the ability to specify
27095bd8deadSopenharmony_ci      64-bit vertex attribute components using the VertexAttribL* and
27105bd8deadSopenharmony_ci      VertexAttribLPointer APIs.  Such attributes can be read in a vertex
27115bd8deadSopenharmony_ci      program using a "LONG ATTRIB" declaration:
27125bd8deadSopenharmony_ci
27135bd8deadSopenharmony_ci        LONG ATTRIB vector64;
27145bd8deadSopenharmony_ci
      The LONG modifier can only be used for vertex program inputs, and
      cannot be used for inputs of any other program type or for outputs of
      any program type.
27175bd8deadSopenharmony_ci
27185bd8deadSopenharmony_ci      For other cases, this extension provides the PK64 and UP64 instructions
27195bd8deadSopenharmony_ci      that provide a mechanism to pass 64-bit components using consecutive
27205bd8deadSopenharmony_ci      32-bit components.  For example, a 3-component vector with 64-bit
27215bd8deadSopenharmony_ci      components can be passed to a vertex shader using multiple vertex
27225bd8deadSopenharmony_ci      attributes without using the VertexAttribL APIs with the following code:
27235bd8deadSopenharmony_ci
27245bd8deadSopenharmony_ci        /* Pass the X/Y components in vertex attribute 0 (X/Y/Z/W).  Use 
27255bd8deadSopenharmony_ci           stride to skip over Z. */
27265bd8deadSopenharmony_ci        glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 3*sizeof(GLdouble), 
27275bd8deadSopenharmony_ci                              (GLdouble *) buffer);
27285bd8deadSopenharmony_ci
27295bd8deadSopenharmony_ci        /* Pass the Z components in vertex attribute 1 (X/Y).  Use stride to
27305bd8deadSopenharmony_ci           skip over original X/Y components. */
27315bd8deadSopenharmony_ci        glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 3*sizeof(GLdouble), 
27325bd8deadSopenharmony_ci                              (GLdouble *) buffer + 2);
27335bd8deadSopenharmony_ci
27345bd8deadSopenharmony_ci      In this example, the vertex program would use the PK64 instruction to
27355bd8deadSopenharmony_ci      reconstruct the 64-bit value for each component as follows:
27365bd8deadSopenharmony_ci
27375bd8deadSopenharmony_ci        LONG TEMP reconstructed;
27385bd8deadSopenharmony_ci        PK64 reconstructed.xy, vertex.attrib[0];
27395bd8deadSopenharmony_ci        PK64 reconstructed.z,  vertex.attrib[1];
27405bd8deadSopenharmony_ci
27415bd8deadSopenharmony_ci      A similar technique can be used to pass 64-bit values computed by a GPU
27425bd8deadSopenharmony_ci      program, using transform feedback or writes to a color buffer.  The UP64
27435bd8deadSopenharmony_ci      instruction would be used to convert the 64-bit computed value into two
27445bd8deadSopenharmony_ci      32-bit values, which would be written to adjacent components.
27455bd8deadSopenharmony_ci
27465bd8deadSopenharmony_ci      Note also that the original hardware implementation of this extension
27475bd8deadSopenharmony_ci      does not support interpolation of 64-bit floating-point values.  If an
27485bd8deadSopenharmony_ci      application desires to pass a 64-bit floating-point value from a vertex
27495bd8deadSopenharmony_ci      or geometry program to a fragment program, and doesn't require
27505bd8deadSopenharmony_ci      interpolation, the PK64/UP64 techniques can be combined.  For example,
27515bd8deadSopenharmony_ci      the vertex shader could unpack a 3-component vector with 64-bit
27525bd8deadSopenharmony_ci      components into a four-component and a two-component 32-bit vector:
27535bd8deadSopenharmony_ci
27545bd8deadSopenharmony_ci        LONG TEMP result64;
        OUTPUT result32[2] = { result.attrib[0..1] };
27565bd8deadSopenharmony_ci        UP64 result32[0],    result64.xyxy;
27575bd8deadSopenharmony_ci        UP64 result32[1].xy, result64.z;
27585bd8deadSopenharmony_ci
27595bd8deadSopenharmony_ci      The fragment program would read and reconstruct using PK64:
27605bd8deadSopenharmony_ci
27615bd8deadSopenharmony_ci        LONG TEMP input64;
        FLAT ATTRIB input32[2] = { fragment.attrib[0..1] };
27635bd8deadSopenharmony_ci        PK64 input64.xy, input32[0];
27645bd8deadSopenharmony_ci        PK64 input64.z,  input32[1];
27655bd8deadSopenharmony_ci
27665bd8deadSopenharmony_ci      Note that such inputs must be declared as "FLAT" in the fragment program
27675bd8deadSopenharmony_ci      to prevent the hardware from trying to do floating-point interpolation
27685bd8deadSopenharmony_ci      on the separate 32-bit halves of the value being passed.  Such
27695bd8deadSopenharmony_ci      interpolation would produce complete garbage.
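
      The bit-level round trip performed by UP64 and PK64 can be modeled on
      the CPU with the C sketch below.  The function names unpack64 and
      pack64 are hypothetical, and the sketch assumes IEEE 754 doubles with
      the low 32 bits placed in the first 32-bit component and the high 32
      bits in the second; consult the instruction descriptions for the
      authoritative component ordering.

```c
#include <stdint.h>
#include <string.h>

/* UP64 analogue (assumed ordering): split the raw bits of one 64-bit
   value into two 32-bit words, low word first. */
static void unpack64(double value, uint32_t out[2])
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    out[0] = (uint32_t)(bits & 0xFFFFFFFFu);
    out[1] = (uint32_t)(bits >> 32);
}

/* PK64 analogue (assumed ordering): rebuild the 64-bit value from two
   32-bit words, low word first. */
static double pack64(const uint32_t in[2])
{
    uint64_t bits = (uint64_t)in[0] | ((uint64_t)in[1] << 32);
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

      Because the two halves are raw bit patterns rather than meaningful
      32-bit floats, they survive only if they are copied unmodified, which
      is why the fragment program inputs above are declared FLAT.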
27705bd8deadSopenharmony_ci
27715bd8deadSopenharmony_ci    (7) What are instanced geometry programs useful for?
27725bd8deadSopenharmony_ci
27735bd8deadSopenharmony_ci      RESOLVED:  Instanced geometry programs allow geometry programs that
27745bd8deadSopenharmony_ci      perform regular operations to run more efficiently.
27755bd8deadSopenharmony_ci
27765bd8deadSopenharmony_ci      Consider a simple example of an algorithm that uses geometry programs to
27775bd8deadSopenharmony_ci      render primitives to a cube map in a single pass.  Without instanced
27785bd8deadSopenharmony_ci      geometry programs, the geometry program to render triangles to the cube
27795bd8deadSopenharmony_ci      map would do something like:
27805bd8deadSopenharmony_ci
27815bd8deadSopenharmony_ci        for (face = 0; face < 6; face++) {
27825bd8deadSopenharmony_ci          for (vertex = 0; vertex < 3; vertex++) {
27835bd8deadSopenharmony_ci            project vertex <vertex> onto face <face>, output position
27845bd8deadSopenharmony_ci            compute/copy attributes of emitted <vertex> to outputs
27855bd8deadSopenharmony_ci            output <face> to result.layer
27865bd8deadSopenharmony_ci            emit the projected vertex
27875bd8deadSopenharmony_ci          }
27885bd8deadSopenharmony_ci          end the primitive (next triangle)
27895bd8deadSopenharmony_ci        }
27905bd8deadSopenharmony_ci
27915bd8deadSopenharmony_ci      This algorithm would output 18 vertices per input triangle, three for
27925bd8deadSopenharmony_ci      each cube face.  The six triangles emitted would be rasterized, one per
27935bd8deadSopenharmony_ci      face.  Geometry programs that emit a large number of attributes have
27945bd8deadSopenharmony_ci      often posed performance challenges, since all the attributes must be
      stored somewhere until the emitted primitives are processed.  Large storage
27965bd8deadSopenharmony_ci      requirements may limit the number of threads that can be run in parallel
27975bd8deadSopenharmony_ci      and reduce overall performance.
27985bd8deadSopenharmony_ci
27995bd8deadSopenharmony_ci      Instanced geometry programs allow this example to be restructured to run
28005bd8deadSopenharmony_ci      with six separate threads, one per face.  Each thread projects the
28015bd8deadSopenharmony_ci      triangle to only a single face (identified by the invocation number) and
28025bd8deadSopenharmony_ci      emits only 3 vertices.  The reduced storage requirements allow more
28035bd8deadSopenharmony_ci      geometry program threads to be run in parallel, with greater overall
28045bd8deadSopenharmony_ci      efficiency.
28055bd8deadSopenharmony_ci
28065bd8deadSopenharmony_ci      Additionally, the total number of attributes that can be emitted by a
28075bd8deadSopenharmony_ci      single geometry program invocation is limited.  However, for instanced
28085bd8deadSopenharmony_ci      geometry shaders, that limit applies to each of <N> program invocations
28095bd8deadSopenharmony_ci      which allows for a larger total output.  For example, if the GL
28105bd8deadSopenharmony_ci      implementation supports only 1024 components of output per program
28115bd8deadSopenharmony_ci      invocation, the 18-vertex algorithm above could emit no more than 56
28125bd8deadSopenharmony_ci      components per vertex.  The same algorithm implemented as a 3-vertex
28135bd8deadSopenharmony_ci      6-invocation geometry program could theoretically allow for 341
28145bd8deadSopenharmony_ci      components per vertex.
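
      The per-vertex budgets quoted above follow from integer division of
      the per-invocation output limit by the number of vertices each
      invocation emits.  The C sketch below reproduces that arithmetic; the
      function names are hypothetical and the 1024-component limit is the
      example figure from the text, not a queried implementation limit.

```c
/* Vertices emitted by each geometry program invocation when <faces> cube
   faces of one triangle are split evenly across <invocations> threads. */
static unsigned vertices_per_invocation(unsigned faces,
                                        unsigned vertices_per_triangle,
                                        unsigned invocations)
{
    return (faces * vertices_per_triangle) / invocations;
}

/* Largest whole number of components each vertex may carry without
   exceeding the per-invocation output limit. */
static unsigned max_components_per_vertex(unsigned component_limit,
                                          unsigned vertices)
{
    return component_limit / vertices;
}
```

      For the cube map example, one invocation emits 18 vertices and is
      limited to 1024 / 18 = 56 components per vertex, while six
      invocations emit 3 vertices each and allow 1024 / 3 = 341.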
28155bd8deadSopenharmony_ci
28165bd8deadSopenharmony_ci    (8) What are the special interpolation opcodes (IPAC, IPAO, IPAS) good
28175bd8deadSopenharmony_ci        for, and how do they work?
28185bd8deadSopenharmony_ci
28195bd8deadSopenharmony_ci      RESOLVED:  The interpolation opcodes allow programs to control the
      frequency and location at which fragment inputs are sampled.  Previous
      extensions provided some control, but it was more limited.
      NV_gpu_program4 had an interpolation modifier (CENTROID)
28235bd8deadSopenharmony_ci      that allowed attributes to be sampled inside the primitive, but that was
28245bd8deadSopenharmony_ci      a per-attribute modifier -- you could only sample any given attribute at
28255bd8deadSopenharmony_ci      one location.  NV_gpu_program4_1 added a new interpolation modifier
28265bd8deadSopenharmony_ci      (SAMPLE) that directed that fragment programs be run once per sample,
28275bd8deadSopenharmony_ci      and that the specified attributes be interpolated at the sample
28285bd8deadSopenharmony_ci      location.  Per-sample interpolation can produce higher quality, but the
28295bd8deadSopenharmony_ci      performance cost is significant since more fragment program invocations
28305bd8deadSopenharmony_ci      are required.
28315bd8deadSopenharmony_ci
28325bd8deadSopenharmony_ci      This extension provides additional control over interpolation, and
28335bd8deadSopenharmony_ci      allows programs to interpolate attributes at different locations without
28345bd8deadSopenharmony_ci      necessarily requiring the performance hit of per-sample invocation.
28355bd8deadSopenharmony_ci
28365bd8deadSopenharmony_ci      The IPAC instruction allows an attribute to be sampled at the centroid
28375bd8deadSopenharmony_ci      location, while still allowing the same attribute to be sampled
28385bd8deadSopenharmony_ci      elsewhere.  The IPAS instruction allows the attribute to be sampled at a
      numbered sample location, as per-sample interpolation would do.  Multiple
      IPAS instructions with different sample numbers allow a program to
28415bd8deadSopenharmony_ci      sample an attribute at multiple sample points in the pixel and then
28425bd8deadSopenharmony_ci      combine the samples in a programmable manner, which may allow for higher
28435bd8deadSopenharmony_ci      quality than simply interpolating at a single representative point in
28445bd8deadSopenharmony_ci      the pixel.  The IPAO instruction allows the attribute to be sampled at
28455bd8deadSopenharmony_ci      an arbitrary (x,y) offset relative to the pixel center.  The range of
28465bd8deadSopenharmony_ci      supported (x,y) values is limited, and the limits in the initial
28475bd8deadSopenharmony_ci      implementation are not large enough to permit sampling the attribute
28485bd8deadSopenharmony_ci      outside the pixel.
28495bd8deadSopenharmony_ci
28505bd8deadSopenharmony_ci      Note that previous instruction sets allowed shaders to fake IPAC,
28515bd8deadSopenharmony_ci      IPAS, and IPAO by a sequence such as:
28525bd8deadSopenharmony_ci
28535bd8deadSopenharmony_ci        TEMP ddx, ddy, offset, interp;
28545bd8deadSopenharmony_ci        MOV interp, fragment.attrib[0];          # start with center
28555bd8deadSopenharmony_ci        DDX ddx, fragment.attrib[0];
28565bd8deadSopenharmony_ci        MAD interp, offset.x, ddx, interp;       # add offset.x * dA/dx
        DDY ddy, fragment.attrib[0];
28585bd8deadSopenharmony_ci        MAD interp, offset.y, ddy, interp;       # add offset.y * dA/dy
28595bd8deadSopenharmony_ci
28605bd8deadSopenharmony_ci      However, this method does not apply perspective correction.  The quality
28615bd8deadSopenharmony_ci      of the results may be unacceptable, particularly for primitives that are
28625bd8deadSopenharmony_ci      nearly perpendicular to the screen.
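
      The size of the error introduced by the derivative-based emulation
      can be estimated with the C sketch below, which compares a
      perspective-correct interpolant along a one-dimensional screen-space
      span with the "center value plus offset times derivative"
      approximation.  The function names, the parameterization by t, and
      the use of a simple forward difference to stand in for DDX are
      assumptions made for this illustration only.

```c
/* Perspective-correct interpolation of an attribute between two vertices
   with values a0, a1 and clip-space w values w0, w1, at screen-space
   parameter t in [0,1]. */
static double persp_interp(double a0, double w0, double a1, double w1,
                           double t)
{
    double num = (1.0 - t) * a0 / w0 + t * a1 / w1;
    double den = (1.0 - t) / w0 + t / w1;
    return num / den;
}

/* Error of the DDX-style emulation.  dt is the screen-space distance
   between adjacent pixels and offset is expressed in pixels.  Returns
   (emulated value) - (perspective-correct value at the offset location). */
static double fake_interp_error(double a0, double w0, double a1, double w1,
                                double t, double dt, double offset)
{
    double center = persp_interp(a0, w0, a1, w1, t);
    /* Forward difference to the neighboring pixel, standing in for DDX. */
    double ddx = persp_interp(a0, w0, a1, w1, t + dt) - center;
    double emulated = center + offset * ddx;
    double exact = persp_interp(a0, w0, a1, w1, t + offset * dt);
    return emulated - exact;
}
```

      When w0 equals w1 the interpolant is linear in t and the emulation is
      exact up to rounding; the error grows as the ratio between w0 and w1
      increases.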
28635bd8deadSopenharmony_ci
28645bd8deadSopenharmony_ci      The semantics of the first operand of these instructions is different
28655bd8deadSopenharmony_ci      from normal assembly instructions.  Operands are normally evaluated by
28665bd8deadSopenharmony_ci      loading the value of the corresponding variable and applying any
28675bd8deadSopenharmony_ci      swizzle/negation/absolute value modifier before the instruction is
28685bd8deadSopenharmony_ci      executed.  In the IPAC/IPAO/IPAS instructions, the value of the
28695bd8deadSopenharmony_ci      attribute is evaluated by the instruction itself.  Swizzles, negation,
28705bd8deadSopenharmony_ci      and absolute value modifiers are still allowed, and are applied after
28715bd8deadSopenharmony_ci      the attribute values are interpolated.
28725bd8deadSopenharmony_ci
28735bd8deadSopenharmony_ci    (9) When using a program that issues global stores (via the STORE
28745bd8deadSopenharmony_ci        instruction), what amount of execution ordering is guaranteed?  How
28755bd8deadSopenharmony_ci        can an application ensure that writes executed in a shader have
28765bd8deadSopenharmony_ci        completed and will be visible to other operations using the buffer
28775bd8deadSopenharmony_ci        object in question?
28785bd8deadSopenharmony_ci
28795bd8deadSopenharmony_ci      RESOLVED:  There are very few automatic guarantees for potential
      write/read or write/write conflicts.  Program invocations will
      generally run in arbitrary order, and applications can't rely on
28825bd8deadSopenharmony_ci      read/write order to match primitive order.
28835bd8deadSopenharmony_ci
28845bd8deadSopenharmony_ci      To get consistent results when buffers are read and written using
28855bd8deadSopenharmony_ci      multiple pipeline stages, manual synchronization using the
28865bd8deadSopenharmony_ci      MemoryBarrierEXT() API documented in EXT_shader_image_load_store or some
28875bd8deadSopenharmony_ci      other synchronization primitive is necessary.
28885bd8deadSopenharmony_ci
28895bd8deadSopenharmony_ci    (10) Unlike most other shader features, the STORE opcode allows for
28905bd8deadSopenharmony_ci         externally-visible side effects from executing a program.  How does
28915bd8deadSopenharmony_ci         this capability interact with other features of the GL?
28925bd8deadSopenharmony_ci
28935bd8deadSopenharmony_ci      RESOLVED:  First, some GL implementations support a variety of "early Z"
28945bd8deadSopenharmony_ci      optimizations designed to minimize unnecessary fragment processing work,
28955bd8deadSopenharmony_ci      such as executing an expensive fragment program on a fragment that will
28965bd8deadSopenharmony_ci      eventually fail the depth test.  Such optimizations have been valid
28975bd8deadSopenharmony_ci      because fragment programs had no side effects.  That is no longer the
28985bd8deadSopenharmony_ci      case, and such optimizations may not be employed if the fragment program
28995bd8deadSopenharmony_ci      performs a global store.  However, we provide a new "early depth and
29005bd8deadSopenharmony_ci      stencil test" enable that allows applications to deterministically
      control depth and stencil testing.  If enabled, depth and stencil tests
      are always performed prior to fragment program execution.  Fragment
      programs will never be run on fragments that fail any of these tests.
29045bd8deadSopenharmony_ci
29055bd8deadSopenharmony_ci      Second, we are permitting global stores in all program types; however,
29065bd8deadSopenharmony_ci      the number of program invocations is not well-defined for some program
29075bd8deadSopenharmony_ci      types.  For example, a GL implementation may choose to combine multiple
29085bd8deadSopenharmony_ci      instances of identical vertices (e.g., duplicate indices in
29095bd8deadSopenharmony_ci      DrawElements, immediate-mode vertices with identical data) into one
29105bd8deadSopenharmony_ci      single vertex program invocation, or it may run a vertex program on each
29115bd8deadSopenharmony_ci      separately.  Similarly, the tessellation primitive generator will
29125bd8deadSopenharmony_ci      generate independent primitives with duplicated vertices, which may or
29135bd8deadSopenharmony_ci      may not be combined for tessellation evaluation program execution.
29145bd8deadSopenharmony_ci      Fragment program execution also has several issues described in more
29155bd8deadSopenharmony_ci      detail below.
29165bd8deadSopenharmony_ci
29175bd8deadSopenharmony_ci    (11) What issues arise when running fragment programs doing global stores?
29185bd8deadSopenharmony_ci
29195bd8deadSopenharmony_ci      RESOLVED:  The order of per-fragment operations in the existing OpenGL
29205bd8deadSopenharmony_ci      3.0 specification can be fairly loose, because previously-defined
29215bd8deadSopenharmony_ci      fragment programs, shaders, and fixed-function fragment processing had
29225bd8deadSopenharmony_ci      no side effects.  With side effects, the order of operations must be
29235bd8deadSopenharmony_ci      defined more tightly.  In particular, the pixel ownership and scissor
29245bd8deadSopenharmony_ci      tests are specified to be performed prior to fragment program execution,
29255bd8deadSopenharmony_ci      and we provide an option to perform depth and stencil tests early as
29265bd8deadSopenharmony_ci      well.
29275bd8deadSopenharmony_ci
29285bd8deadSopenharmony_ci      OpenGL implementations sometimes run fragment programs on "helper"
29295bd8deadSopenharmony_ci      pixels that have no coverage in order to be able to compute sane partial
      derivatives for fragment program instructions (DDX, DDY) or automatic
29315bd8deadSopenharmony_ci      level-of-detail calculation for texturing.  In this approach,
29325bd8deadSopenharmony_ci      derivatives are approximated by computing the difference in a quantity
29335bd8deadSopenharmony_ci      computed for a given fragment at (x,y) and a fragment at a neighboring
29345bd8deadSopenharmony_ci      pixel.  When a fragment program is executed on a "helper" pixel, global
29355bd8deadSopenharmony_ci      stores have no effect.  Helper pixels aren't explicitly mentioned in the
29365bd8deadSopenharmony_ci      spec body; instead, partial derivatives are obtained by magic.
29375bd8deadSopenharmony_ci
      If a fragment program contains a KIL instruction, compilers may not
      reorder code such that an ATOM or STORE instruction is executed before a
      KIL instruction that logically precedes it in flow control.  Once a fragment
29415bd8deadSopenharmony_ci      is killed, subsequent atomics or stores should never be executed.
29425bd8deadSopenharmony_ci
29435bd8deadSopenharmony_ci      Multisample rasterization poses several issues for fragment programs
29445bd8deadSopenharmony_ci      with global stores.  The number of times a fragment program is executed
29455bd8deadSopenharmony_ci      for multisample rendering is not fully specified, which gives
29465bd8deadSopenharmony_ci      implementations a number of different choices -- pure multisample (only
29475bd8deadSopenharmony_ci      runs once), pure supersample (runs once per covered sample), or modes in
29485bd8deadSopenharmony_ci      between.  There are some ways for an application to indirectly control
29495bd8deadSopenharmony_ci      the behavior -- for example, fragment programs specifying per-sample
29505bd8deadSopenharmony_ci      attribute interpolation are guaranteed to run once per covered sample.
29515bd8deadSopenharmony_ci
29525bd8deadSopenharmony_ci      Note that when rendering to a multisample buffer, a pair of adjacent
29535bd8deadSopenharmony_ci      triangles may cause a fragment program to be executed more than once at
29545bd8deadSopenharmony_ci      a given (x,y) with different sets of samples covered.  This can also
29555bd8deadSopenharmony_ci      occur in the interior of a quadrilateral or polygon primitive.
29565bd8deadSopenharmony_ci      Implementations are permitted to split quads and polygons with >3
29575bd8deadSopenharmony_ci      vertices into triangles, creating interior edges that split a pixel.
29585bd8deadSopenharmony_ci
29595bd8deadSopenharmony_ci    (12) What happens if early fragment tests are enabled, the early depth
29605bd8deadSopenharmony_ci         test passes, and a fragment program that computes a new depth value
29615bd8deadSopenharmony_ci         is executed?
29625bd8deadSopenharmony_ci
29635bd8deadSopenharmony_ci      RESOLVED:  The depth value produced by the fragment program has no
29645bd8deadSopenharmony_ci      effect if early fragment tests are enabled.  The depth value computed by
29655bd8deadSopenharmony_ci      a fragment program is used only by the post-fragment program stencil and
29665bd8deadSopenharmony_ci      depth tests, and those tests always have no effect when early depth
29675bd8deadSopenharmony_ci      testing is enabled.
29685bd8deadSopenharmony_ci
29695bd8deadSopenharmony_ci    (13) How do early fragment tests interact with occlusion queries?
29705bd8deadSopenharmony_ci
29715bd8deadSopenharmony_ci      RESOLVED:  When early fragment tests are enabled, sample counting for
29725bd8deadSopenharmony_ci      occlusion queries also happens prior to fragment program execution.
29735bd8deadSopenharmony_ci      Enabling early fragment tests can change the overall sample count,
29745bd8deadSopenharmony_ci      because samples killed by alpha test and alpha to coverage will still be
29755bd8deadSopenharmony_ci      counted if early fragment tests are enabled.
29765bd8deadSopenharmony_ci
29775bd8deadSopenharmony_ci    (14) What happens if a program performs a global store to a GPU address
29785bd8deadSopenharmony_ci         corresponding to a read-only buffer mapping?  What if it performs a
29795bd8deadSopenharmony_ci         global read to a write-only mapping?
29805bd8deadSopenharmony_ci
      RESOLVED:  Implementations may choose to implement full memory protection,
29825bd8deadSopenharmony_ci      in which case accesses using the wrong type of memory mapping will fault
29835bd8deadSopenharmony_ci      and lead to termination of the application.  
29845bd8deadSopenharmony_ci
29855bd8deadSopenharmony_ci      However, full memory protection is not required in this extension --
29865bd8deadSopenharmony_ci      implementations may choose to substitute a read-write mapping in place
29875bd8deadSopenharmony_ci      of a read-only or write-only mapping.  As a result, we specify the
29885bd8deadSopenharmony_ci      result of such invalid loads and stores to be undefined.  
29895bd8deadSopenharmony_ci
29905bd8deadSopenharmony_ci      Note that if a program erroneously writes to nominally read-only
29915bd8deadSopenharmony_ci      mappings, the results may be weird.  If the implementation substitutes a
29925bd8deadSopenharmony_ci      read-write mapping, such invalid writes are likely to proceed normally.
29935bd8deadSopenharmony_ci      However, if the application later makes a buffer object non-resident and
29945bd8deadSopenharmony_ci      the memory manager of the GL implementation needs to move the buffer,
29955bd8deadSopenharmony_ci      the GL may assume that the contents of the buffer have not been modified
29965bd8deadSopenharmony_ci      and thus discard the new values written by the (invalid) global store
29975bd8deadSopenharmony_ci      instructions.
29985bd8deadSopenharmony_ci
29995bd8deadSopenharmony_ci    (15) What performance considerations apply to atomics?
30005bd8deadSopenharmony_ci
30015bd8deadSopenharmony_ci      RESOLVED:  Atomics can be useful for operations like locking, or for
30025bd8deadSopenharmony_ci      maintaining counters.  Note that high-performance GPUs may have hundreds
30035bd8deadSopenharmony_ci      of program threads in flight at once, and may also have some SIMD
30045bd8deadSopenharmony_ci      characteristics (where threads are grouped and run as a unit).  Using
30055bd8deadSopenharmony_ci      ATOM instructions with a single memory address to implement a critical
30065bd8deadSopenharmony_ci      section will result in serial execution -- only one of the hundreds of
30075bd8deadSopenharmony_ci      threads can execute code in the critical section at a time.  
30085bd8deadSopenharmony_ci
30095bd8deadSopenharmony_ci      When a global operation would be done under a lock, it may be possible
30105bd8deadSopenharmony_ci      to improve performance if the algorithm can be parallelized to have
30115bd8deadSopenharmony_ci      multiple critical sections.  For example, an application could allocate
30125bd8deadSopenharmony_ci      an array of shared resources, each protected by its own lock, and use
30135bd8deadSopenharmony_ci      the LSBs of the primitive ID or some function of the screen-space (x,y)
30145bd8deadSopenharmony_ci      to determine which resource in the array to use.
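
      The slot selection described above can be sketched in C as follows.
      This is only an illustration, not part of the extension:  NUM_LOCKS and
      both function names are hypothetical, and the slot count is assumed to
      be a power of two so that low-order bits can be used directly as an
      array index.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical number of independently locked resources; assumed to be a
   power of two so the LSBs of an ID can index the array directly. */
#define NUM_LOCKS 16u

/* Select a lock slot from the LSBs of the primitive ID. */
static uint32_t lock_index_from_primitive(uint32_t primitive_id)
{
    return primitive_id & (NUM_LOCKS - 1u);
}

/* Select a lock slot from screen-space (x,y):  the position of the pixel
   within a 4x4 tile picks one of the 16 slots, so neighboring pixels
   contend for different locks. */
static uint32_t lock_index_from_xy(uint32_t x, uint32_t y)
{
    uint32_t slot = ((y & 3u) << 2) | (x & 3u);
    assert(slot < NUM_LOCKS);
    return slot;
}
```

      Each slot would hold its own lock word and its own partial result; a
      final pass would combine the partial results.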
30155bd8deadSopenharmony_ci
30165bd8deadSopenharmony_ci    (16) The atomic instruction ATOM returns the old contents of memory into
         the result register.  Should we provide a version of this opcode
30185bd8deadSopenharmony_ci         that doesn't return a value?
30195bd8deadSopenharmony_ci
30205bd8deadSopenharmony_ci      RESOLVED:  No.  In theory, atomics that don't return any values can
30215bd8deadSopenharmony_ci      perform better (because the program may not need to allocate resources
      to hold a result or wait for the result).  However, a new opcode isn't
30235bd8deadSopenharmony_ci      required to obtain this behavior -- a compiler can recognize that the
30245bd8deadSopenharmony_ci      result of an ATOM instruction is written to a "dummy" temporary that
30255bd8deadSopenharmony_ci      isn't read by subsequent instructions:
30265bd8deadSopenharmony_ci
30275bd8deadSopenharmony_ci        TEMP junk;
30285bd8deadSopenharmony_ci        ATOM.ADD.U32 junk, address, 1;
30295bd8deadSopenharmony_ci
30305bd8deadSopenharmony_ci      The compiler can also recognize that the result will always be discarded
30315bd8deadSopenharmony_ci      if a conditional write mask of "(FL)" is used.
30325bd8deadSopenharmony_ci
30335bd8deadSopenharmony_ci        ATOM.ADD.U32 not_junk (FL), address, 1;
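
      A rough CPU-side analog, assuming C11 <stdatomic.h>, is shown below.
      It is not program assembly; the counter and function names are
      hypothetical.  The same fetch-and-add call is used in both functions,
      and the second simply ignores the returned old value, which is the
      pattern a compiler can detect.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared counter standing in for a global memory address. */
static _Atomic uint32_t counter;

/* Increment and return the old value, as when the ATOM result is read. */
static uint32_t increment_and_fetch_old(void)
{
    return atomic_fetch_add(&counter, 1u);
}

/* Increment and discard the old value, as with the "junk" temporary. */
static void increment_only(void)
{
    (void)atomic_fetch_add(&counter, 1u);
}
```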
30345bd8deadSopenharmony_ci
    (17) How do we ensure that memory accesses made by multiple program
30365bd8deadSopenharmony_ci         invocations of possibly different types are coherent?
30375bd8deadSopenharmony_ci
30385bd8deadSopenharmony_ci      RESOLVED:  Atomic instructions allow program invocations to coordinate
30395bd8deadSopenharmony_ci      using shared global memory addresses.  However, memory transactions,
30405bd8deadSopenharmony_ci      including atomics, are not guaranteed to land in the order specified in
30415bd8deadSopenharmony_ci      the program; they may be reordered by the compiler, cached in different
30425bd8deadSopenharmony_ci      memory hierarchies, and stored in a distributed memory system where
30435bd8deadSopenharmony_ci      later stores to one "partition" might be completed prior to earlier
      stores to another.  The MEMBAR instruction helps control memory
      transaction ordering by ensuring that all memory transactions issued
      before the barrier complete before any issued after it.  Additionally,
      the ".COH" modifier ensures that memory transactions using the modifier
      are cached coherently and will be visible to other shader invocations.
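
      The ordering problem has a familiar CPU-side analog.  The sketch below
      assumes C11 <stdatomic.h> and uses hypothetical names (payload, ready,
      publish, try_consume); the fences play a role loosely comparable to
      MEMBAR and do not model the exact semantics of this extension.  The
      producer writes its data before raising a flag, and the fence keeps the
      two stores from being observed out of order.

```c
#include <stdatomic.h>
#include <stdint.h>

static uint32_t payload;     /* ordinary data written by the producer  */
static atomic_uint ready;    /* flag raised only after payload is set  */

/* Producer: store the data, then a barrier, then raise the flag. */
static void publish(uint32_t value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1u, memory_order_relaxed);
}

/* Consumer: returns -1 if nothing has been published yet; otherwise a
   barrier orders the flag read before the data read. */
static int64_t try_consume(void)
{
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0u)
        return -1;
    atomic_thread_fence(memory_order_acquire);
    return (int64_t)payload;
}
```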
30495bd8deadSopenharmony_ci
30505bd8deadSopenharmony_ci    (18) How do the TXG and TXGO opcodes work with sRGB textures?
30515bd8deadSopenharmony_ci
      RESOLVED:  Gamma-correction is applied to the texture source color
      before "gathering" and hence applies to all four components, unless
      the texture swizzle of the selected component is ALPHA, in which case
      no gamma-correction is applied.
30565bd8deadSopenharmony_ci
30575bd8deadSopenharmony_ci    (19) How can render-to-texture algorithms take advantage of
30585bd8deadSopenharmony_ci         MemoryBarrierEXT, nominally provided for global memory transactions?
30595bd8deadSopenharmony_ci
30605bd8deadSopenharmony_ci      RESOLVED: Many algorithms use RTT to ping-pong between two allocations,
30615bd8deadSopenharmony_ci      using the result of one rendering pass as the input to the next. 
30625bd8deadSopenharmony_ci      Existing mechanisms require expensive FBO Binds, DrawBuffer changes, or
30635bd8deadSopenharmony_ci      FBO attachment changes to safely swap the render target and texture. With
      memory barriers, layered geometry shader rendering, and texture arrays,
      an application can very cheaply ping-pong between two layers of a single
      texture.  For example:
30675bd8deadSopenharmony_ci
30685bd8deadSopenharmony_ci        X = 0;
30695bd8deadSopenharmony_ci        // Bind the array texture to a texture unit
30705bd8deadSopenharmony_ci        // Attach the array texture to an FBO using FramebufferTextureARB
30715bd8deadSopenharmony_ci        while (!done) {
30725bd8deadSopenharmony_ci          // Stuff X in a constant, vertex attrib, etc.
          // Draw, texturing from layer X and writing
          // gl_Layer = 1 - X in the geometry shader.

30775bd8deadSopenharmony_ci          MemoryBarrierEXT(TEXTURE_FETCH_BARRIER_BIT_NV);
30785bd8deadSopenharmony_ci          X = 1 - X;
30795bd8deadSopenharmony_ci        }
30805bd8deadSopenharmony_ci
30815bd8deadSopenharmony_ci      However, be warned that this requires geometry shaders and hence adds 
30825bd8deadSopenharmony_ci      the overhead that all geometry must pass through an additional program
30835bd8deadSopenharmony_ci      stage, so an application using large amounts of geometry could become 
30845bd8deadSopenharmony_ci      geometry-limited or more shader-limited.
30855bd8deadSopenharmony_ci
30865bd8deadSopenharmony_ci    (20) What is the ".PREC" instruction modifier good for?
30875bd8deadSopenharmony_ci
      RESOLVED:  ".PREC" provides some invariance guarantees that are useful
      for certain algorithms.  Using ".PREC", it is possible to ensure that an
30905bd8deadSopenharmony_ci      algorithm can be written to produce identical results on subtly
30915bd8deadSopenharmony_ci      different inputs.  For example, the order of vertices visible to a
30925bd8deadSopenharmony_ci      geometry or tessellation shader used to subdivide primitive edges might
30935bd8deadSopenharmony_ci      present an edge shared between two primitives in one direction for one
30945bd8deadSopenharmony_ci      primitive and the other direction for the adjacent primitive.  Even if
30955bd8deadSopenharmony_ci      the weights are identical in the two cases, there may be cracking if the
      computations are being done in an order-dependent manner.  If the
      position of a new vertex were evaluated with limited-precision
      floating-point math, we would not necessarily get the same result for
      inputs (a,b,c) and (c,b,a) in the following code:
31015bd8deadSopenharmony_ci
31025bd8deadSopenharmony_ci          ADD result, a, b;
31035bd8deadSopenharmony_ci          ADD result, result, c;
31045bd8deadSopenharmony_ci
31055bd8deadSopenharmony_ci      There are two problems with this code:  the rounding errors will be
31065bd8deadSopenharmony_ci      different and the implementation is free to rearrange the computation
      order.  The code can be rewritten as follows with ".PREC" and a
      symmetric evaluation order to ensure an identical result when the
      inputs are reversed:
31105bd8deadSopenharmony_ci
31115bd8deadSopenharmony_ci          ADD result, a, c;
31125bd8deadSopenharmony_ci          ADD.PREC result, result, b;
31135bd8deadSopenharmony_ci
31145bd8deadSopenharmony_ci      Note that in this example, the first instruction doesn't need the
31155bd8deadSopenharmony_ci      ".PREC" qualifier because the second instruction requires that the
31165bd8deadSopenharmony_ci      implementation compute <a>+<c>, which will be done reliably if <a> and
31175bd8deadSopenharmony_ci      <c> are inputs.  If <a> and <c> were results of other computations, the
31185bd8deadSopenharmony_ci      first add and possibly the dependent computations may also need to be
31195bd8deadSopenharmony_ci      tagged with ".PREC" to ensure reliable results.
31205bd8deadSopenharmony_ci
      The ".PREC" modifier will disable certain optimizations and thus carries
      a performance cost.
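
      The order dependence is easy to reproduce on a CPU.  The following C
      sketch is an illustration only; the function names are hypothetical,
      and "volatile" is used here merely to force each intermediate sum to be
      rounded to single precision, standing in for the guarantee that ".PREC"
      provides in a program.

```c
/* Order-dependent evaluation:  (a + b) + c. */
static float sum_ordered(float a, float b, float c)
{
    volatile float t = a + b;   /* rounded to float before the next add */
    volatile float r = t + c;
    return r;
}

/* Evaluation that is symmetric in a and c:  (a + c) + b.  Because a + c
   equals c + a exactly, swapping a and c cannot change the result. */
static float sum_symmetric(float a, float b, float c)
{
    volatile float t = a + c;
    volatile float r = t + b;
    return r;
}
```

      With a = 1e8, b = -1e8, and c = 1, sum_ordered returns 1 for (a,b,c)
      but 0 for (c,b,a), because 1 - 1e8 rounds to -1e8 in single precision.
      sum_symmetric returns 0 for both orderings.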
31235bd8deadSopenharmony_ci
31245bd8deadSopenharmony_ci    (21) What are the TGALL, TGANY, TGEQ instructions good for?
31255bd8deadSopenharmony_ci
31265bd8deadSopenharmony_ci      RESOLVED:  If an implementation performs SIMD thread execution,
31275bd8deadSopenharmony_ci      divergent branching may result in reduced performance if the "if" and
31285bd8deadSopenharmony_ci      "else" blocks of an "if" statement are executed sequentially.  For
31295bd8deadSopenharmony_ci      example, an algorithm may have both a "fast path" that performs a
      computation quickly for a subset of all cases and a "slow path" that
      performs a computation slowly but correctly.  When performing SIMD
31325bd8deadSopenharmony_ci      execution, code like the following:
31335bd8deadSopenharmony_ci
31345bd8deadSopenharmony_ci        SNE.S.CC cc.x, condition.x;
31355bd8deadSopenharmony_ci        IF NE.x;
31365bd8deadSopenharmony_ci          # do fast path
31375bd8deadSopenharmony_ci        ELSE;
31385bd8deadSopenharmony_ci          # do slow path
31395bd8deadSopenharmony_ci        ENDIF;
31405bd8deadSopenharmony_ci
31415bd8deadSopenharmony_ci      may end up executing *both* the fast and slow paths for a SIMD thread
31425bd8deadSopenharmony_ci      group if <condition> diverges, and may execute more slowly than simply
31435bd8deadSopenharmony_ci      executing the slow path unconditionally.  These instructions allow code
31445bd8deadSopenharmony_ci      like:
31455bd8deadSopenharmony_ci
31465bd8deadSopenharmony_ci        # Condition code matches NE if and only if condition.x is non-zero 
31475bd8deadSopenharmony_ci        # for all threads.
31485bd8deadSopenharmony_ci        TGALL.S.CC cc.x, condition.x;
31495bd8deadSopenharmony_ci        IF NE.x;
31505bd8deadSopenharmony_ci          # do fast path
31515bd8deadSopenharmony_ci        ELSE;
31525bd8deadSopenharmony_ci          # do slow path
31535bd8deadSopenharmony_ci        ENDIF;
31545bd8deadSopenharmony_ci
31555bd8deadSopenharmony_ci      that executes the fast path if and only if it can be used for *all*
31565bd8deadSopenharmony_ci      threads in the group.  For thread groups where <condition> diverges,
31575bd8deadSopenharmony_ci      this algorithm would unconditionally run the slow path, but would never
31585bd8deadSopenharmony_ci      run both in sequence.
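
      The three group tests can be modeled on a CPU as reductions over the
      per-thread condition values.  The C sketch below is a simplified model
      under stated assumptions, not the hardware behavior:  the thread group
      is represented as a plain array, the function and array names are
      hypothetical, and TGEQ is assumed to be true when the condition is
      either non-zero for all threads or zero for all threads.

```c
#include <stddef.h>

/* Hypothetical sample groups of four threads each. */
static const int group_all_true[4]  = { 1, 1, 1, 1 };
static const int group_all_false[4] = { 0, 0, 0, 0 };
static const int group_divergent[4] = { 1, 0, 1, 1 };

/* TGALL model:  non-zero only if cond is non-zero for every thread. */
static int tg_all(const int *cond, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (cond[i] == 0)
            return 0;
    return 1;
}

/* TGANY model:  non-zero if cond is non-zero for at least one thread. */
static int tg_any(const int *cond, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (cond[i] != 0)
            return 1;
    return 0;
}

/* TGEQ model:  non-zero if every thread agrees (all true or all false). */
static int tg_eq(const int *cond, size_t n)
{
    return tg_all(cond, n) || !tg_any(cond, n);
}
```

      In this model, a divergent group fails tg_all and so takes the slow
      path for every thread, matching the behavior described above.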
31595bd8deadSopenharmony_ci
31605bd8deadSopenharmony_ci
31615bd8deadSopenharmony_ciRevision History
31625bd8deadSopenharmony_ci
31635bd8deadSopenharmony_ci    Rev.    Date    Author    Changes
31645bd8deadSopenharmony_ci    ----  --------  --------  -----------------------------------------
31655bd8deadSopenharmony_ci     8    05/25/22  shqxu     Fix use of a removed function
31665bd8deadSopenharmony_ci                              MemoryBarrierNV.
31675bd8deadSopenharmony_ci
31685bd8deadSopenharmony_ci     7    09/11/14  pbrown    Minor typo fixes.
31695bd8deadSopenharmony_ci
31705bd8deadSopenharmony_ci     6    07/04/13  pbrown    Add missing language describing the
31715bd8deadSopenharmony_ci                              <texImageUnitComp> grammar rule for component
31725bd8deadSopenharmony_ci                              selection in TXG and TXGO instructions.
31735bd8deadSopenharmony_ci
31745bd8deadSopenharmony_ci     5    09/23/10  pbrown    Add missing constants for {MIN,MAX}_PROGRAM_
31755bd8deadSopenharmony_ci                              TEXTURE_GATHER_OFFSET_NV (same as ARB/core).
31765bd8deadSopenharmony_ci                              Add missing description for "su" in the opcode
31775bd8deadSopenharmony_ci                              table; fix a couple operand order bugs for
31785bd8deadSopenharmony_ci                              STORE.
31795bd8deadSopenharmony_ci
31805bd8deadSopenharmony_ci     4    06/22/10  pbrown    Specify that the y/z/w component of the ATOM
31815bd8deadSopenharmony_ci                              results are undefined, as is the case with
31825bd8deadSopenharmony_ci                              ATOMIM from EXT_shader_image_load_store.
31835bd8deadSopenharmony_ci
31845bd8deadSopenharmony_ci     3    04/13/10  pbrown    Remove F32 support from ATOM.ADD.
31855bd8deadSopenharmony_ci
31865bd8deadSopenharmony_ci     2    03/22/10  pbrown    Various wording updates to the spec overview, 
31875bd8deadSopenharmony_ci                              dependencies, issues, and body.  Remove various
31885bd8deadSopenharmony_ci                              spec language that has been refactored into the
31895bd8deadSopenharmony_ci                              EXT_shader_image_load_store specification.
31905bd8deadSopenharmony_ci
31915bd8deadSopenharmony_ci     1              pbrown    Internal revisions.