Name

    NV_parameter_buffer_object2

Name Strings

    GL_NV_parameter_buffer_object2

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping (July 2009, Release 190)

Version

    Last Modified Date:         09/09/09
    NVIDIA Revision:            2

Number

    378

Dependencies

    OpenGL 2.0 is required.

    NV_gpu_program4 is required.

    NV_parameter_buffer_object is required.

    This extension is written against the NV_gpu_program4 specification.

    NV_shader_buffer_load trivially affects the definition of this extension.

Overview

    This extension builds on the NV_parameter_buffer_object extension to
    provide additional flexibility in sourcing data from buffer objects.

    The original NV_parameter_buffer_object (PaBO) extension provided the
    ability to bind buffer objects to a set of numbered binding points and
    access them in assembly programs as though they were arrays of 32-bit
    scalars (via the BUFFER variable type) or arrays of four-component vectors
    with 32-bit scalar components (via the BUFFER4 variable type).  However,
    the functionality it provided had some significant limits on flexibility.
    Since any given buffer binding point could be used either as a BUFFER or
    as a BUFFER4, but not both, programs couldn't do both 32- and 128-bit
    fetches from a single binding point.  Additionally, no support was
    provided for 8-, 16-, or 64-bit fetches, though they could be emulated
    using larger loads, with bitfield operations and/or write masking to put
    components in the right places.  Indexing was supported, but strides were
    limited to 4- and 16-byte multiples, depending on whether BUFFER or
    BUFFER4 was used.

    This new extension provides the buffer variable declaration type CBUFFER
    to specify a buffer that is treated as an array of bytes, rather than an
    array of words or vectors.  The LDC instruction allows programs to extract
    a vector of data from a CBUFFER variable, using a size and component count
    specified in the opcode modifier.  1-, 2-, and 4-component fetches are
    supported.  The LDC instruction supports byte offsets using normal array
    indexing mechanisms; both run-time and immediate offsets are supported.
    Offsets used for a buffer object fetch are required to be aligned to the
    size of the fetch (1, 2, 4, 8, or 16 bytes).

New Procedures and Functions

    None.

New Tokens

    None.

Additions to Chapter 2 of the OpenGL 3.0 Specification (OpenGL Operation)

    (All modifications are relative to Section 2.X, GPU Programs, from the
     NV_gpu_program4 specification.)

    Modify Section 2.X.2, Program Grammar

    (add after the long list of grammar rules) If a program specifies the
    NV_parameter_buffer_object2 program option, the following rules are added
    to the NV_gpu_program4 base program grammar:

    <VECTORop>              ::= "LDC"

    <opModifier>            ::= "F32"
                              | "F32X2"
                              | "F32X4"
                              | "S8"
                              | "S16"
                              | "S32"
                              | "S32X2"
                              | "S32X4"
                              | "U8"
                              | "U16"
                              | "U32"
                              | "U32X2"
                              | "U32X4"

    <bufferDeclType>        ::= "CBUFFER"


    Modify Section 2.X.3.6, Program Parameter Buffers

    (modify the paragraph describing the different types of parameter buffer
    variable declarations to include support for "CBUFFER".)

    Program parameter buffer variables are treated as an array of
    single-component words if the <bufferDeclType> grammar rule matches
    "BUFFER" or as an array of four-component vectors if it matches "BUFFER4".
    Program parameter buffers may also be declared as an array of basic
    machine units from which data can be extracted using the LDC (load
    constant) instruction, if <bufferDeclType> matches "CBUFFER".  Parameter
    buffer variables declared using "CBUFFER" may not be used as an operand in
    any instruction other than LDC, while "BUFFER" and "BUFFER4" variables may
    not be used with LDC.  A program will fail to load if a variable declared
    as "BUFFER" and another variable declared as "BUFFER4" use the same buffer
    binding point.  There is no limitation on the use of "CBUFFER" variables
    in conjunction with "BUFFER" or "BUFFER4" variables using the same buffer
    binding point.
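    The binding-point rule above amounts to a pairwise check over the
    parameter buffer declarations in a program.  The following C sketch is
    illustrative only: `BufferDeclType`, `BufferDecl`, and
    `buffer_decls_valid` are hypothetical names, not part of any OpenGL API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical enum naming the three parameter buffer declaration types. */
typedef enum { DECL_CBUFFER, DECL_BUFFER, DECL_BUFFER4 } BufferDeclType;

/* One parameter buffer variable declaration: its type and binding point <a>. */
typedef struct {
    BufferDeclType type;
    unsigned binding;
} BufferDecl;

/* Returns false if any binding point is shared by a BUFFER and a BUFFER4
 * variable, the only combination the rule rejects.  CBUFFER declarations
 * never cause a conflict, whatever they share a binding point with. */
static bool buffer_decls_valid(const BufferDecl *decls, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        for (size_t j = i + 1; j < count; j++) {
            if (decls[i].binding != decls[j].binding)
                continue;
            BufferDeclType a = decls[i].type, b = decls[j].type;
            if ((a == DECL_BUFFER && b == DECL_BUFFER4) ||
                (a == DECL_BUFFER4 && b == DECL_BUFFER))
                return false;
        }
    }
    return true;
}
```

    A CBUFFER and a BUFFER (or BUFFER4) variable may therefore alias the same
    binding point, which lets a program mix byte-addressed LDC fetches with
    ordinary word or vector operands on one buffer.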

    (modify/restructure the paragraph describing basic program parameter
     bindings to handle the byte bindings provided by "CBUFFER" variables)

    If a program parameter buffer binding matches "program.buffer[a][b]", the
    program parameter variable corresponds to element <b> of the buffer object
    bound to binding point <a>.  Each element of the bound buffer object is
    treated as:

      * a single basic machine unit of data, if the variable is declared using
        "CBUFFER";

      * a single word of data that can hold an integer or floating-point
        value, if the variable is declared as "BUFFER"; or

      * four words of data that can hold integer or floating-point values, if
        the variable is declared as "BUFFER4".

    When a binding corresponding to a "BUFFER" variable is used as an operand,
    the selected word is broadcast to all four components of the variable.
    When a binding corresponding to a "BUFFER4" variable is used as an
    operand, the four components of the selected buffer element are loaded
    into the variable.  A binding corresponding to a "CBUFFER" variable may be
    used only in the LDC instruction, and will be used there as a pointer to
    extract operand values from buffer memory.  If no buffer object is bound
    to binding point <a>, or the bound buffer object is not large enough to
    hold element <b>, the values used are undefined.  The binding point <a>
    must be a nonnegative integer constant.
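    To make the element sizes concrete, the following C sketch maps an index
    <b> to a byte offset from the start of binding point <a>, assuming 8-bit
    basic machine units and the 32-bit words used by BUFFER and BUFFER4.
    `BufferDeclType` and `element_byte_offset` are hypothetical names
    introduced only for illustration.

```c
#include <stddef.h>

/* Hypothetical enum naming the three parameter buffer declaration types. */
typedef enum { DECL_CBUFFER, DECL_BUFFER, DECL_BUFFER4 } BufferDeclType;

/* Byte offset of element <b> from the start of the binding:
 * CBUFFER elements are single bytes (basic machine units),
 * BUFFER elements are 32-bit (4-byte) words, and
 * BUFFER4 elements are four-word (16-byte) vectors. */
static size_t element_byte_offset(BufferDeclType decl, size_t b)
{
    switch (decl) {
    case DECL_CBUFFER: return b;
    case DECL_BUFFER:  return b * 4;
    case DECL_BUFFER4: return b * 16;
    }
    return 0; /* not reached for valid enum values */
}
```

    With <b> = 12 this yields 12, 48, and 192 bytes for CBUFFER, BUFFER, and
    BUFFER4 respectively, the same figures Issue (6) uses to motivate the
    separate CBUFFER declaration type.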


    Modify Section 2.X.4, Program Execution Environment

    (Add to the set of opcodes in Table X.13)

                  Modifiers
      Instruction F I C S H D  Out Inputs    Description
      ----------- - - - - - -  --- --------  --------------------------------
      LDC         X X X X - F  v   v         load from constant buffer


    Modify Section 2.X.4.1, Program Instruction Modifiers

    (Add to Table X.14, Instruction Modifiers, and to the corresponding
    description following the table)

      Modifier  Description
      --------  -----------------------------------------------
      F32       Access one 32-bit floating-point value
      F32X2     Access two 32-bit floating-point values
      F32X4     Access four 32-bit floating-point values
      S8        Access one 8-bit signed integer value
      S16       Access one 16-bit signed integer value
      S32       Access one 32-bit signed integer value
      S32X2     Access two 32-bit signed integer values
      S32X4     Access four 32-bit signed integer values
      U8        Access one 8-bit unsigned integer value
      U16       Access one 16-bit unsigned integer value
      U32       Access one 32-bit unsigned integer value
      U32X2     Access two 32-bit unsigned integer values
      U32X4     Access four 32-bit unsigned integer values

    For memory load operations, the "F32", "F32X2", "F32X4", "S8", "S16",
    "S32", "S32X2", "S32X4", "U8", "U16", "U32", "U32X2", and "U32X4" storage
    modifiers control how data are loaded from memory.  Storage modifiers are
    supported by the LDC and LOAD instructions and are covered in more detail
    in the descriptions of these instructions.  These instructions must
    specify exactly one of these modifiers, and may not specify any of the
    base data type modifiers (F, U, S) described above.  The base data type of
    the result vector of a LOAD or LDC instruction is trivially derived from
    the storage modifier.


    Add New Section 2.X.4.5, Program Memory Access

    Programs may load from buffer object memory via the LDC (load constant)
    and LOAD (global load) instructions.

    Load instructions read 8, 16, 32, 64, or 128 bits of data from a source
    address to produce a four-component vector, according to the storage
    modifier specified with the instruction.  The storage modifier has three
    parts:

      - a base data type, "F", "S", or "U", specifying that the instruction
        fetches floating-point, signed integer, or unsigned integer values,
        respectively;

      - a component size, specifying that the components fetched by the
        instruction have 8, 16, or 32 bits; and

      - an optional component count, where "X2" and "X4" indicate that two or
        four components are fetched, and no count indicates a single
        component fetch.
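    The three-part structure of a storage modifier can be illustrated by
    decoding the modifier names of Table X.14.  This C sketch is hypothetical
    (`StorageModifier`, `parse_storage_modifier`, and `fetch_bytes` are not
    part of any API); it shows how the base type, component size, and
    component count together determine the 1-, 2-, 4-, 8-, or 16-byte fetch
    size.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical decoded form of a storage modifier such as "S32X2". */
typedef struct {
    char base_type;   /* 'F', 'S', or 'U' */
    unsigned bits;    /* component size: 8, 16, or 32 */
    unsigned count;   /* component count: 1, 2, or 4 */
} StorageModifier;

/* Decodes one of the thirteen modifiers listed in Table X.14;
 * returns false for any other string (for example "F16"). */
static bool parse_storage_modifier(const char *s, StorageModifier *out)
{
    static const char *const names[] = {
        "F32", "F32X2", "F32X4", "S8", "S16", "S32", "S32X2", "S32X4",
        "U8", "U16", "U32", "U32X2", "U32X4"
    };
    bool known = false;
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (strcmp(s, names[i]) == 0) {
            known = true;
            break;
        }
    }
    if (!known)
        return false;

    out->base_type = s[0];
    /* Component size: the digits following the base type letter. */
    out->bits = (s[1] == '8') ? 8 : (s[1] == '1') ? 16 : 32;
    /* Optional "X2"/"X4" suffix; no suffix means a single component. */
    const char *x = strchr(s, 'X');
    out->count = x ? (unsigned)(x[1] - '0') : 1;
    return true;
}

/* Total bytes read by one fetch: 1, 2, 4, 8, or 16. */
static unsigned fetch_bytes(const StorageModifier *m)
{
    return m->bits / 8 * m->count;
}
```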

    When the storage modifier specifies that fewer than four components should
    be fetched, remaining components are filled with zeroes.  When performing
    a global load (LOAD), the GPU address is specified as an instruction
    operand.  When performing a constant buffer load (LDC), the GPU address is
    derived by adding the base address of the bound buffer object to an offset
    specified as an instruction operand.  Given a GPU address <address> and a
    storage modifier <modifier>, the memory load can be described by the
    following code:

      result_t_vec BufferMemoryLoad(char *address, OpModifier modifier)
      {
        result_t_vec result = { 0, 0, 0, 0 };
        switch (modifier) {
        case F32:
            result.x = ((float32_t *)address)[0];
            break;
        case F32X2:
            result.x = ((float32_t *)address)[0];
            result.y = ((float32_t *)address)[1];
            break;
        case F32X4:
            result.x = ((float32_t *)address)[0];
            result.y = ((float32_t *)address)[1];
            result.z = ((float32_t *)address)[2];
            result.w = ((float32_t *)address)[3];
            break;
        case S8:
            result.x = ((int8_t *)address)[0];
            break;
        case S16:
            result.x = ((int16_t *)address)[0];
            break;
        case S32:
            result.x = ((int32_t *)address)[0];
            break;
        case S32X2:
            result.x = ((int32_t *)address)[0];
            result.y = ((int32_t *)address)[1];
            break;
        case S32X4:
            result.x = ((int32_t *)address)[0];
            result.y = ((int32_t *)address)[1];
            result.z = ((int32_t *)address)[2];
            result.w = ((int32_t *)address)[3];
            break;
        case U8:
            result.x = ((uint8_t *)address)[0];
            break;
        case U16:
            result.x = ((uint16_t *)address)[0];
            break;
        case U32:
            result.x = ((uint32_t *)address)[0];
            break;
        case U32X2:
            result.x = ((uint32_t *)address)[0];
            result.y = ((uint32_t *)address)[1];
            break;
        case U32X4:
            result.x = ((uint32_t *)address)[0];
            result.y = ((uint32_t *)address)[1];
            result.z = ((uint32_t *)address)[2];
            result.w = ((uint32_t *)address)[3];
            break;
        }
        return result;
      }
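    The pseudocode above relies on types that standard C does not define
    (`result_t_vec`, `OpModifier`, `float32_t`) and on direct pointer casts.
    One way to render it as compilable C99 is sketched below.  `ResultVec`,
    `Component`, and `buffer_memory_load` are hypothetical names, a 32-bit
    IEEE `float` is assumed, and `memcpy` replaces the casts so that
    unaligned or type-punned host reads stay well defined.  It models only the
    data movement, not GPU address generation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the pseudocode's types.  Each result component
 * holds 32 bits; the storage modifier decides whether they are read as a
 * float, a signed integer, or an unsigned integer. */
typedef enum {
    F32, F32X2, F32X4, S8, S16, S32, S32X2, S32X4,
    U8, U16, U32, U32X2, U32X4
} OpModifier;

typedef union { float f; int32_t s; uint32_t u; } Component;
typedef struct { Component x, y, z, w; } ResultVec;

static ResultVec buffer_memory_load(const unsigned char *address,
                                    OpModifier modifier)
{
    ResultVec r;
    memset(&r, 0, sizeof r);   /* components not fetched remain zero */
    Component *c[4] = { &r.x, &r.y, &r.z, &r.w };

    switch (modifier) {
    /* 8- and 16-bit fetches widen to 32 bits, sign- or zero-extended. */
    case S8:  { int8_t v;   memcpy(&v, address, 1); r.x.s = v; break; }
    case S16: { int16_t v;  memcpy(&v, address, 2); r.x.s = v; break; }
    case U8:  { uint8_t v;  memcpy(&v, address, 1); r.x.u = v; break; }
    case U16: { uint16_t v; memcpy(&v, address, 2); r.x.u = v; break; }
    default: {
        /* The remaining modifiers fetch one, two, or four 32-bit
         * components; copying raw bits works for F, S, and U alike. */
        int n = (modifier == F32X2 || modifier == S32X2 || modifier == U32X2) ? 2
              : (modifier == F32X4 || modifier == S32X4 || modifier == U32X4) ? 4
              : 1;
        for (int i = 0; i < n; i++)
            memcpy(c[i], address + 4 * i, 4);
        break;
    }
    }
    return r;
}
```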

    The offset used for constant buffer loads must be aligned to the fetch
    size corresponding to the storage modifier.  For S8 and U8, the offset
    has no alignment requirements.  For S16 and U16, the offset must be a
    multiple of two basic machine units.  For F32, S32, and U32, the offset
    must be a multiple of four.  For F32X2, S32X2, and U32X2, the offset must
    be a multiple of eight.  For F32X4, S32X4, and U32X4, the offset must be a
    multiple of sixteen.  If an offset is not correctly aligned, the values
    returned by a constant buffer load will be undefined.
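    Because every fetch size is a power of two and the required alignment
    equals the fetch size, the rule reduces to a single remainder test.  A
    minimal C sketch, where `ldc_offset_aligned` is a hypothetical helper and
    <fetch_bytes> is the total fetch size implied by the storage modifier:

```c
#include <stdbool.h>
#include <stdint.h>

/* fetch_bytes: 1 (S8/U8), 2 (S16/U16), 4 (F32/S32/U32),
 * 8 (the X2 variants), or 16 (the X4 variants).
 * Returns true when an LDC offset meets the alignment rule; a misaligned
 * offset is not an error, it just produces undefined values. */
static bool ldc_offset_aligned(uint32_t offset, uint32_t fetch_bytes)
{
    return offset % fetch_bytes == 0;
}
```

    For example, offset 35 is acceptable for U8 but not for U16, and offset 44
    is acceptable for S32 but not for U32X2.  Since the sizes are powers of
    two, `(offset & (fetch_bytes - 1)) == 0` is an equivalent test.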


    Modify Section 2.X.6, Program Options

    + Extended Parameter Buffer Object Support (NV_parameter_buffer_object2)

    If a program specifies the "NV_parameter_buffer_object2" option, it may
    use the CBUFFER statement to declare program parameter buffer variables
    and the LDC instruction to load data from parameter buffer variables using
    arbitrary byte offsets, subject to the alignment requirements of Section
    2.X.4.5.


    Modify Section 2.X.8, Program Instruction Set

    Section 2.X.8.Z, LDC:  Load from Constant Buffer

    The LDC instruction loads a vector operand from a buffer object to yield a
    result vector.  The operand used for the LDC instruction must correspond
    to a parameter buffer variable declared using the "CBUFFER" statement; a
    program will fail to load if any other type of operand is used in an LDC
    instruction.

      result = BufferMemoryLoad(&op0, storageModifier);

    A base operand vector is fetched from memory as described in Section
    2.X.4.5, with the GPU address derived from the binding corresponding to
    the operand.  A final operand vector is derived from the base operand
    vector by applying swizzle, negation, and absolute value operand modifiers
    as described in Section 2.X.4.2.

    The amount of memory in any given buffer object binding accessible by the
    LDC instruction may be limited.  If any component fetched by the LDC
    instruction extends 4*<n> or more basic machine units from the beginning
    of the buffer object binding, where <n> is the implementation-dependent
    constant MAX_PROGRAM_PARAMETER_BUFFER_SIZE_NV, the value fetched for that
    component will be undefined.
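    The size limit can be sketched as a per-component bounds test.  This C
    sketch reads the rule the way Issue (2) describes it: a component is
    defined only if all of its bytes lie within the first 4*<n> bytes of the
    binding.  That reading and the helper name `ldc_component_defined` are
    assumptions of this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* n is MAX_PROGRAM_PARAMETER_BUFFER_SIZE_NV, counted in 32-bit words, so
 * the accessible window is the first 4*n bytes of the binding.  Under the
 * reading assumed here, a component starting at byte component_offset and
 * spanning component_bytes bytes is defined only if its last byte falls
 * inside that window. */
static bool ldc_component_defined(uint64_t component_offset,
                                  uint32_t component_bytes, uint32_t n)
{
    return component_offset + component_bytes <= 4ull * n;
}
```

    With <n> = 16384 (the 64KB limit cited in Issue (2)), a 4-byte component
    at offset 65532 is still defined, while one at offset 65536 is not.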

    LDC supports no base data type modifiers, but requires exactly one storage
    modifier.  The base data types of the operand and result vectors are
    derived from the storage modifier.


Additions to Chapter 3 of the OpenGL 3.0 Specification (Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 3.0 Specification (Per-Fragment
Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 3.0 Specification (Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 3.0 Specification (State and
State Requests)

    None.

Additions to Appendix A of the OpenGL 3.0 Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Errors

    No new errors.

Dependencies on NV_shader_buffer_load

    If NV_shader_buffer_load (or equivalent functionality) is not supported,
    references to the "LOAD" opcode in the description of the opcode modifiers
    for "LDC" should be removed.

New State

    None.

New Implementation Dependent State

    None.

Issues

    (1) What sort of alignment requirements, if any, should be imposed on the
        operand provided to the LDC instruction?

      RESOLVED:  The offset of the operand must be aligned according to the
      size of the fetch.  For 1-, 2-, and 4-component fetches, the offset must
      be a multiple of <N>, 2*<N>, and 4*<N>, where <N> is the size in bytes
      of the components being fetched.

    (2) NV_parameter_buffer_object provides an implementation-dependent limit
        on the portion of a buffer object that may be fetched via BUFFER and
        BUFFER4 variables.  Should the same limit apply to the LDC
        instruction?

      RESOLVED:  Yes.  On currently shipping NVIDIA GPUs, the maximum program
      parameter buffer size is 16384 32-bit words, or 64KB.  Buffers larger
      than 64KB may be used, but any fetches accessing memory beyond the first
      64KB of a buffer binding will return undefined values.

    (3) Should we support fetches of 3-component vectors?  If so, what should
        be the minimum alignment for the specified offset?

      RESOLVED:  No, we'll leave 3-component vectors out of this extension.
      This limitation can be worked around either by doing three separate
      single-component fetches or by doing a four-component fetch with an
      appropriate write mask.  The former approach supports indexing in a
      tightly packed array of 3-component vectors; the latter would require
      that array elements be padded to four components.
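      The first workaround can be modeled on the host side in C.  In this
      sketch (`Vec3` and `load_packed_vec3` are hypothetical names, and a
      32-bit IEEE `float` is assumed), element <i> of a tightly packed array
      starts at byte 12*<i>, and each component is read with its own 4-byte
      fetch at 12*<i> + 4*<c>, which always satisfies the 4-byte F32
      alignment rule.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float x, y, z; } Vec3;

/* Three single-component, F32-style fetches from a tightly packed array:
 * element i occupies bytes [12*i, 12*i + 11], and every component offset
 * 12*i + 4*c is a multiple of four. */
static Vec3 load_packed_vec3(const unsigned char *buffer, uint32_t i)
{
    Vec3 v;
    uint32_t base = 12u * i;
    memcpy(&v.x, buffer + base + 0, 4);
    memcpy(&v.y, buffer + base + 4, 4);
    memcpy(&v.z, buffer + base + 8, 4);
    return v;
}
```

      An F32X4 fetch, by contrast, needs a 16-byte-aligned offset, and 12*<i>
      is a multiple of 16 only when <i> is a multiple of four; that is why the
      write-mask approach requires elements padded to 16 bytes.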

    (4) Should we support fetches of 8- and 16-bit components?

      RESOLVED:  Yes, we will support fetches of 8- and 16-bit signed and
      unsigned integers.

      Fetches of vectors of 8- and 16-bit integers are not supported but may
      be emulated by performing shift/mask operations on the results of 32-bit
      fetches.
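      The shift/mask emulation can be sketched in C as operations on the
      32-bit value returned by a U32 fetch.  `unpack_u8x4` and `unpack_s16x2`
      are hypothetical helpers, and the sketch assumes little-endian storage,
      so that the byte at the lowest address lands in the least significant
      bits of the fetched word.

```c
#include <stdint.h>

/* Four unsigned 8-bit values from one 32-bit word: byte k comes from
 * bits [8k, 8k+7] (lowest address first under little-endian storage). */
static void unpack_u8x4(uint32_t word, uint32_t out[4])
{
    for (int k = 0; k < 4; k++)
        out[k] = (word >> (8 * k)) & 0xFFu;
}

/* Two signed 16-bit values from one 32-bit word, each sign-extended. */
static void unpack_s16x2(uint32_t word, int32_t out[2])
{
    for (int k = 0; k < 2; k++) {
        uint32_t h = (word >> (16 * k)) & 0xFFFFu;
        out[k] = (int32_t)(h ^ 0x8000u) - 0x8000;   /* sign-extend 16 -> 32 */
    }
}
```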

      Fetches of 16-bit floating-point values, or floating-point vectors
      thereof, are not supported.  A single fp16 fetch may be emulated using a
      16-bit unsigned integer fetch and the UP2H instruction to convert the 16
      LSBs of the fetch to a floating-point value.  The encoding of 16-bit
      floating-point values is described in section 2.1.2 of the OpenGL 3.0
      specification.
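      The conversion applied to the 16 LSBs can be modeled in C as below.
      `half_bits_to_float` is a hypothetical helper, not the UP2H
      implementation; the sketch assumes the 1-bit sign, 5-bit exponent
      (bias 15), 10-bit mantissa layout of 16-bit floating-point values,
      including denormals, infinities, and NaNs, and builds the equivalent
      32-bit float bit pattern directly.

```c
#include <stdint.h>
#include <string.h>

static float half_bits_to_float(uint32_t bits)
{
    uint32_t h = bits & 0xFFFFu;           /* only the 16 LSBs are used */
    uint32_t sign = (h >> 15) & 1u;
    int exp = (int)((h >> 10) & 0x1Fu);    /* 5-bit exponent, bias 15 */
    uint32_t mant = h & 0x3FFu;            /* 10-bit mantissa */
    uint32_t f;

    if (exp == 0x1F) {
        /* Infinity or NaN: all-ones float exponent, mantissa carried over. */
        f = (sign << 31) | (0xFFu << 23) | (mant << 13);
    } else if (exp == 0 && mant == 0) {
        f = sign << 31;                    /* signed zero */
    } else {
        if (exp == 0) {
            /* Denormal: shift until the implicit leading 1 appears. */
            exp = 1;
            while ((mant & 0x400u) == 0) {
                mant <<= 1;
                exp--;
            }
            mant &= 0x3FFu;
        }
        /* Rebias the exponent from 15 to 127; widen mantissa 10 -> 23 bits. */
        f = (sign << 31) | ((uint32_t)(exp + 112) << 23) | (mant << 13);
    }

    float out;
    memcpy(&out, &f, sizeof out);
    return out;
}
```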
4235bd8deadSopenharmony_ci
4245bd8deadSopenharmony_ci    (5) Should we support fetches of 64-bit components?
4255bd8deadSopenharmony_ci
4265bd8deadSopenharmony_ci      RESOLVED:  No; the instruction set provided by NV_gpu_program4 does not
4275bd8deadSopenharmony_ci      support 64-bit components anywhere.  If future instructions support
4285bd8deadSopenharmony_ci      64-bit components, this restriction should be removed.
4295bd8deadSopenharmony_ci
4305bd8deadSopenharmony_ci    (6) How should the operands of the LDC instruction should be specified?
4315bd8deadSopenharmony_ci
4325bd8deadSopenharmony_ci      RESOLVED:  We will create a new type of buffer variable ("CBUFFER"),
4335bd8deadSopenharmony_ci      which defines an array of bytes to be fetched form.  The type of fetch
4345bd8deadSopenharmony_ci      to perform is specified by a storage modifier (as in
4355bd8deadSopenharmony_ci      NV_shader_buffer_load).  An offset relative to the buffer binding (in
4365bd8deadSopenharmony_ci      bytes) may be specified using normal array indexing syntax, and an index
4375bd8deadSopenharmony_ci      computed at run-time is supported.
4385bd8deadSopenharmony_ci
4395bd8deadSopenharmony_ci      Some examples:
4405bd8deadSopenharmony_ci
4415bd8deadSopenharmony_ci        CBUFFER buffer[] = { program.buffer[0] };
4425bd8deadSopenharmony_ci        TEMP      i;
4435bd8deadSopenharmony_ci        MOV.S     i, 32;                  # computed offset of 32B
4445bd8deadSopenharmony_ci        LDC.F32   result, buffer[12];     # (x,0,0,0) from bytes 12..15
4455bd8deadSopenharmony_ci        LDC.F32X4 result, buffer[16];     # (x,y,z,w) from bytes 16..31
4465bd8deadSopenharmony_ci        LDC.U8    result, buffer[i.x+3];  # (x,0,0,0) from byte 35
4475bd8deadSopenharmony_ci        LDC.S32   result, buffer[i.x+12]; # (x,0,0,0) from bytes 44..47
4485bd8deadSopenharmony_ci        LDC.U32X2 result, buffer[i.x+8];  # (x,y,0,0) from bytes 40..47
4495bd8deadSopenharmony_ci        LDC.S16   result, buffer[i.x+2];  # (x,0,0,0) from bytes 34..35
4505bd8deadSopenharmony_ci
4515bd8deadSopenharmony_ci      We chose to provide the new buffer variable type (CBUFFER) rather than
4525bd8deadSopenharmony_ci      reusing BUFFER or BUFFER4.  For CBUFFER variables, "buffer[12]"
4535bd8deadSopenharmony_ci      unambiguously specifies a 12-byte offset.  For BUFFER or BUFFER4
4545bd8deadSopenharmony_ci      variables, an operand of "buffer[12]" already has an existing meaning,
4555bd8deadSopenharmony_ci      implying an offset of 12 words or vectors, which would be 48 or 192
4565bd8deadSopenharmony_ci      bytes, respectively.  Because we want to be able to fetch 8-, and 16-bit
4575bd8deadSopenharmony_ci      units, having an offset multiplied by four doesn't make sense.  We could
4585bd8deadSopenharmony_ci      have had LDC simply ignore the type of binding and always interpret an
4595bd8deadSopenharmony_ci      index as a byte offset, but chose the new declaration type to avoid
4605bd8deadSopenharmony_ci      confusion.
4615bd8deadSopenharmony_ci        
4625bd8deadSopenharmony_ci      We also considered an approach where the buffer and offset were
4635bd8deadSopenharmony_ci      specified in separate operands.  That would be similar to texture, where
4645bd8deadSopenharmony_ci      the coordinates and texture are specified separately.  The first operand
4655bd8deadSopenharmony_ci      would have been interpreted as a unsigned scalar specifying a byte
4665bd8deadSopenharmony_ci      offset, the second operand would have specified a buffer variable
4675bd8deadSopenharmony_ci      binding, and a pointer would be obtained by adding the two
4685bd8deadSopenharmony_ci      operands. This would have looked something like:
4695bd8deadSopenharmony_ci
4705bd8deadSopenharmony_ci        BUFFER buffer[] = { program.buffer[0] };
4715bd8deadSopenharmony_ci        LDC.S32X2 result, offset.x, buffer;

      We chose not to implement this approach mainly because this syntax would
      require specifying a new type of instruction; the syntax we adopted
      simply reuses existing vector operand and indexing mechanisms.
      Additionally, the syntax in this extension provides immediate offsets
      for "free", which the offset-buffer syntax would not support directly
      without additional new syntax.  For example, to load a structure with a
      pair of two-component vectors using offset-buffer syntax, you would have
      to do something like:

        BUFFER buffer[] = { program.buffer[0] };
        TEMP offset;
        LDC.S32X2 result1, offset.x, buffer;
        ADD.U offset.x, offset.x, 8;            # bump offset to second vector
        LDC.S32X2 result2, offset.x, buffer;

    (7) How should the fetches in the LDC instruction interact with other
        operand modifiers (swizzle, absolute value, negation)?  With result
        modifiers (condition codes, saturation)?

      RESOLVED:  These features will be orthogonal.  When any of these
      modifiers is specified, the base data type to which it applies comes
      from the storage modifier of the LDC instruction.

      The LDC instruction is defined to produce a "base operand vector" from a
      memory fetch.  This isn't particularly different from normal operands,
      where a base operand vector is derived from the binding corresponding to
      the operand.  In both cases, the components of this vector are swizzled
      and have optional absolute value and negation operations performed to
      produce the final vector operand.

      If condition code operations or saturation are specified for the result
      vector, these operations are performed using the data type implied by
      the LDC storage modifier.
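
      The operand handling described above can be modeled in C.  This is a
      sketch, not a description of any implementation.  It assumes an S32X4
      fetch and uses hypothetical type and function names.  It applies the
      modifiers in the order listed above (swizzle, then absolute value, then
      negation), using the signed-integer type implied by the S32 storage
      modifier:

        ```c
        #include <assert.h>
        #include <stdbool.h>
        #include <stdint.h>

        /* Base operand vector from a hypothetical four-component S32 fetch. */
        typedef struct { int32_t c[4]; } IVec4;

        /* Swizzle the base vector, then optionally take the absolute value
         * and negate each component, all in signed 32-bit arithmetic.
         * INT32_MIN is not handled by this sketch. */
        static IVec4 apply_operand_modifiers(IVec4 base, const int swizzle[4],
                                             bool abs_value, bool negate)
        {
            IVec4 out;
            for (int i = 0; i < 4; i++) {
                assert(swizzle[i] >= 0 && swizzle[i] < 4);
                int32_t v = base.c[swizzle[i]];
                if (abs_value && v < 0)
                    v = -v;
                if (negate)
                    v = -v;
                out.c[i] = v;
            }
            return out;
        }
        ```

      Because the fetched data flows through the same swizzle, absolute value,
      and negation steps as any other base operand vector, no LDC-specific
      modifier rules are needed.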

    (8) What happens if a non-zero base offset is specified for a CBUFFER
        variable?

      RESOLVED:  A subset of the bytes in a buffer object can be specified
      using range syntax like the following:

        CBUFFER buffer[] = { program.buffer[0][16..31] };

      The sub-range need not start at the beginning of the buffer object; in
      the example above, it starts 16 bytes into the buffer.  When accessing a
      parameter buffer variable corresponding to such a sub-range, an array
      index is relative to the base of the sub-range.  So the offset of the
      sub-range is effectively added to the index used for the LDC operand:

        LDC.F32   result, buffer[12];     # (x,0,0,0) from bytes 28..31
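
      The address arithmetic in this example can be sketched in C.  The
      ByteRange type and function names are hypothetical, and the sketch only
      computes which buffer-object bytes a fetch reads.  It does not model
      what happens when a fetch falls outside the sub-range.

        ```c
        #include <assert.h>
        #include <stdint.h>

        /* A CBUFFER binding restricted to bytes first..last of a buffer
         * object; program.buffer[0][16..31] gives first = 16, last = 31. */
        typedef struct { uint32_t first, last; } ByteRange;

        /* Buffer-object byte where a fetch from "buffer[index]" begins:
         * the index is relative to the start of the sub-range. */
        static uint32_t fetch_start(ByteRange range, uint32_t index)
        {
            return range.first + index;
        }

        /* Last buffer-object byte read by a fetch of fetch_size bytes. */
        static uint32_t fetch_end(ByteRange range, uint32_t index,
                                  uint32_t fetch_size)
        {
            assert(fetch_size > 0);
            return fetch_start(range, index) + fetch_size - 1;
        }
        ```

      For the LDC.F32 fetch above (4 bytes at index 12), this gives bytes
      16 + 12 = 28 through 31, which is the last byte of the sub-range.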

    (9) What happens if a non-array CBUFFER variable is used?

      RESOLVED:  A non-array variable may be used with LDC.  However, array
      indexing isn't supported with non-array variables, so all LDC loads
      using that variable will fetch using the same base address.

        CBUFFER buffer = program.buffer[0][32];
        LDC.U8    result, buffer;     # (x,0,0,0) from byte 32
        LDC.S16   result, buffer;     # (x,0,0,0) from bytes 32..33
        LDC.F32   result, buffer;     # (x,0,0,0) from bytes 32..35
        LDC.F32X4 result, buffer;     # (x,y,z,w) from bytes 32..47
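
      The byte ranges in these comments follow from the fetch size of each
      storage modifier.  The sketch below models that relationship in C.  It
      covers only the hypothetical subset of modifiers used above, and its
      names are illustrative only.

        ```c
        #include <assert.h>
        #include <stddef.h>

        /* Hypothetical subset of LDC storage modifiers from the example. */
        typedef enum { LDC_U8, LDC_S16, LDC_F32, LDC_F32X4 } StorageModifier;

        /* Number of bytes read by one fetch with the given modifier. */
        static size_t fetch_bytes(StorageModifier m)
        {
            switch (m) {
            case LDC_U8:    return 1;
            case LDC_S16:   return 2;
            case LDC_F32:   return 4;
            case LDC_F32X4: return 16;
            }
            assert(0 && "unknown storage modifier");
            return 0;
        }

        /* Result components filled from memory; any others are zero. */
        static int fetch_components(StorageModifier m)
        {
            return m == LDC_F32X4 ? 4 : 1;
        }

        /* Inclusive byte range read from a non-array CBUFFER variable bound
         * at byte offset base.  No index is involved, so every fetch starts
         * at base regardless of the storage modifier. */
        static void fetch_range(StorageModifier m, size_t base,
                                size_t *first, size_t *last)
        {
            *first = base;
            *last = base + fetch_bytes(m) - 1;
        }
        ```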

    (10) Should single-component fetches from LDC smear their results across
         all four components of the result vector, to allow packing multiple
         scalars into a single vector?

      RESOLVED:  No.  However, swizzle suffixes on the operand provide this
      capability for free.  For example, suppose you want to fetch four
      scalars from a buffer and pack the results into a single temporary
      vector.  The swizzle syntax lets you do this by smearing the fetched
      component (always returned in "x") into the other components:

        CBUFFER buffer[] = { program.buffer[0] };
        LDC.F32 temp.x, buffer[16];
        LDC.F32 temp.y, buffer[28].x;
        LDC.F32 temp.z, buffer[32].x;
        LDC.F32 temp.w, buffer[40].x;
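
      The packing sequence above can be traced with a C model of the three
      steps involved.  A single-component fetch yields (x,0,0,0), a ".x"
      swizzle replicates x, and a one-component write mask stores one
      component.  The type and function names are hypothetical.

        ```c
        #include <assert.h>

        typedef struct { float c[4]; } Vec4;

        /* Result of a single-component LDC.F32 fetch: (x, 0, 0, 0). */
        static Vec4 ldc_f32_result(float x)
        {
            Vec4 v = {{x, 0.0f, 0.0f, 0.0f}};
            return v;
        }

        /* A ".x" operand swizzle replicates x into all four components. */
        static Vec4 smear_x(Vec4 v)
        {
            Vec4 out = {{v.c[0], v.c[0], v.c[0], v.c[0]}};
            return out;
        }

        /* Write only component comp of src into dst, as a one-component
         * write mask such as ".y" does. */
        static void write_masked(Vec4 *dst, Vec4 src, int comp)
        {
            assert(comp >= 0 && comp < 4);
            dst->c[comp] = src.c[comp];
        }
        ```

      Without the ".x" swizzle, writing the y, z, or w component would store
      the zero that the fetch leaves in that component, not the fetched
      value.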

Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
     1              pbrown    Internal revisions.
     2    09/09/09  mjk       Assigned number