Name

    NV_parameter_buffer_object2

Name Strings

    GL_NV_parameter_buffer_object2

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping (July 2009, Release 190)

Version

    Last Modified Date:         09/09/09
    NVIDIA Revision:            2

Number

    378

Dependencies

    OpenGL 2.0 is required.

    NV_gpu_program4 is required.

    NV_parameter_buffer_object is required.

    This extension is written against the NV_gpu_program4 specification.

    NV_shader_buffer_load trivially affects the definition of this extension.

Overview

    This extension builds on the NV_parameter_buffer_object extension to
    provide additional flexibility in sourcing data from buffer objects.

    The original NV_parameter_buffer_object (PaBO) extension provided the
    ability to bind buffer objects to a set of numbered binding points and
    access them in assembly programs as though they were arrays of 32-bit
    scalars (via the BUFFER variable type) or arrays of four-component vectors
    with 32-bit scalar components (via the BUFFER4 variable type).  However,
    the functionality it provided had some significant limits on flexibility.
    Since any given buffer binding point could be used either as a BUFFER or
    BUFFER4, but not both, programs couldn't do both 32- and 128-bit fetches
    from a single binding point.  Additionally, no support was provided for
    8-, 16-, or 64-bit fetches, though they could be emulated using larger
    loads, with bitfield operations and/or write masking to put components in
    the right places.  Indexing was supported, but strides were limited to 4-
    and 16-byte multiples, depending on whether BUFFER or BUFFER4 was used.
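
    The following C sketch illustrates the emulation described above.  It
    extracts one byte from the 32-bit words that a BUFFER variable exposes,
    using a shift and a mask.  It is a host-side model only, not program
    assembly.  The helper name EmulateU8Fetch is hypothetical, and the sketch
    assumes little-endian byte order within each word.

```c
#include <stdint.h>

/* Hypothetical helper: emulate an unsigned 8-bit fetch at byte offset
 * byteOffset using only aligned 32-bit word fetches, as a BUFFER variable
 * allows.  Assumes little-endian byte order within each word. */
static uint32_t EmulateU8Fetch(const uint32_t *words, uint32_t byteOffset)
{
    uint32_t word  = words[byteOffset / 4];   /* 32-bit fetch of the word */
    uint32_t shift = (byteOffset % 4) * 8;    /* bit position of the byte */
    return (word >> shift) & 0xFFu;           /* mask off the other bytes */
}
```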

    This new extension provides the buffer variable declaration type CBUFFER
    to specify a buffer that is treated as an array of bytes, rather than an
    array of words or vectors.  The LDC instruction allows programs to extract
    a vector of data from a CBUFFER variable, using a size and component count
    specified in the opcode modifier.  1-, 2-, and 4-component fetches are
    supported.  The LDC instruction supports byte offsets using normal array
    indexing mechanisms; both run-time and immediate offsets are supported.
    Offsets used for a buffer object fetch are required to be aligned to the
    size of the fetch (1, 2, 4, 8, or 16 bytes).
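
    CBUFFER indices are byte offsets, so elements of a tightly packed array
    of structures can be addressed with any stride, not only 4- or 16-byte
    multiples.  The C sketch below computes such an offset.  The term
    index * stride is what a program would compute at run time (for example,
    into i.x), and fieldOffset plays the role of the immediate part of an
    operand such as "buffer[i.x+4]".  The function name is hypothetical, and
    the 6-byte structure in the example (three 16-bit fields) is an
    assumption chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: byte offset of a field inside element `index` of a
 * tightly packed array of structures with an arbitrary byte `stride`.
 * index * stride is the run-time part; fieldOffset is the immediate part. */
static uint32_t ElementFieldOffset(uint32_t index, uint32_t stride,
                                   uint32_t fieldOffset)
{
    return index * stride + fieldOffset;
}
```

    In the example, the third field of element 3 sits at byte 22.  That
    offset satisfies the 2-byte alignment of a 16-bit fetch but not the
    4-byte alignment of a 32-bit fetch.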

New Procedures and Functions

    None.

New Tokens

    None.

Additions to Chapter 2 of the OpenGL 3.0 Specification (OpenGL Operation)

    (All modifications are relative to Section 2.X, GPU Programs, from the
     NV_gpu_program4 specification.)

    Modify Section 2.X.2, Program Grammar

    (add after the long list of grammar rules) If a program specifies the
    NV_parameter_buffer_object2 program option, the following rules are added
    to the NV_gpu_program4 base program grammar:

    <VECTORop>              ::= "LDC"

    <opModifier>            ::= "F32"
                              | "F32X2"
                              | "F32X4"
                              | "S8"
                              | "S16"
                              | "S32"
                              | "S32X2"
                              | "S32X4"
                              | "U8"
                              | "U16"
                              | "U32"
                              | "U32X2"
                              | "U32X4"

    <bufferDeclType>        ::= "CBUFFER"


    Modify Section 2.X.3.6, Program Parameter Buffers

    (modify the paragraph describing the different types of parameter buffer
    variable declarations to include support for "CBUFFER".)

    Program parameter buffer variables are treated as an array of
    single-component words if the <bufferDeclType> grammar rule matches
    "BUFFER" or as an array of four-component vectors if it matches "BUFFER4".
    Program parameter buffers may also be declared as an array of basic
    machine units from which data can be extracted using the LDC (load
    constant) instruction, if <bufferDeclType> matches "CBUFFER".  Parameter
    buffer variables declared using "CBUFFER" may not be used as an operand in
    any instruction other than LDC, while "BUFFER" and "BUFFER4" variables may
    not be used with LDC.  A program will fail to load if a variable declared
    as "BUFFER" and another variable declared as "BUFFER4" use the same buffer
    binding point.  There is no limitation on the use of "CBUFFER" variables
    in conjunction with "BUFFER" or "BUFFER4" variables using the same buffer
    binding point.
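
    The binding-point rule above can be checked mechanically.  The C sketch
    below is a hypothetical validator.  BufferDecl, BufferDeclType, and
    BufferDeclsAreCompatible are illustrative names, not part of any driver
    or GL API.  The validator rejects a program only when a "BUFFER"
    declaration and a "BUFFER4" declaration share a binding point.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { DECL_BUFFER, DECL_BUFFER4, DECL_CBUFFER } BufferDeclType;

typedef struct {
    BufferDeclType type;
    unsigned       binding;   /* value of <a> in program.buffer[a] */
} BufferDecl;

/* Returns false if any BUFFER and BUFFER4 declarations share a binding
 * point (the program would fail to load); CBUFFER may share with either. */
static bool BufferDeclsAreCompatible(const BufferDecl *decls, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        for (size_t j = i + 1; j < count; j++) {
            if (decls[i].binding != decls[j].binding)
                continue;
            bool mixed =
                (decls[i].type == DECL_BUFFER  && decls[j].type == DECL_BUFFER4) ||
                (decls[i].type == DECL_BUFFER4 && decls[j].type == DECL_BUFFER);
            if (mixed)
                return false;
        }
    }
    return true;
}
```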

    (modify/restructure the paragraph describing basic program parameter
     bindings to handle the byte bindings provided by "CBUFFER" variables)

    If a program parameter buffer binding matches "program.buffer[a][b]", the
    program parameter variable corresponds to element <b> of the buffer object
    bound to binding point <a>.  Each element of the bound buffer object is
    treated as:

      * a single basic machine unit of data, if the variable is declared using
        "CBUFFER";

      * a single word of data that can hold an integer or floating-point
        value, if the variable is declared as "BUFFER"; or

      * four words of data that can hold integer or floating-point values, if
        the variable is declared as "BUFFER4".

    When a binding corresponding to a "BUFFER" variable is used as an operand,
    the selected word is broadcast to all four components of the variable.
    When a binding corresponding to a "BUFFER4" variable is used as an
    operand, the four components of the selected buffer element are loaded
    into the variable.  A binding corresponding to a "CBUFFER" variable may be
    used only in the LDC instruction, and will be used there as a pointer to
    extract operand values from buffer memory.  If no buffer object is bound
    to binding point <a>, or the bound buffer object is not large enough to
    hold element <b>, the values used are undefined.  The binding point <a>
    must be a nonnegative integer constant.
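
    The element size determines how an index <b> maps to a byte offset within
    the binding.  The following is a minimal C sketch.  It assumes a basic
    machine unit of one byte and a 32-bit word, and the type and function
    names are hypothetical.

```c
#include <stdint.h>

typedef enum { DECL_BUFFER, DECL_BUFFER4, DECL_CBUFFER } BufferDeclType;

/* Size in bytes of one element <b> for each declaration type. */
static uint32_t ElementSize(BufferDeclType type)
{
    switch (type) {
    case DECL_CBUFFER: return 1;    /* one basic machine unit */
    case DECL_BUFFER:  return 4;    /* one 32-bit word */
    case DECL_BUFFER4: return 16;   /* four 32-bit words */
    }
    return 0;
}

/* Byte offset of element <b> from the start of the binding. */
static uint32_t ElementByteOffset(BufferDeclType type, uint32_t b)
{
    return b * ElementSize(type);
}
```

    For the same index 12, CBUFFER, BUFFER, and BUFFER4 variables address
    byte offsets 12, 48, and 192, respectively.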


    Modify Section 2.X.4, Program Execution Environment

    (Add to the set of opcodes in Table X.13)

                  Modifiers
      Instruction F I C S H D  Out Inputs    Description
      ----------- - - - - - -  --- --------  --------------------------------
      LDC         X X X X - F  v   v         load from constant buffer


    Modify Section 2.X.4.1, Program Instruction Modifiers

    (Add to Table X.14, Instruction Modifiers, and to the corresponding
    description following the table)

      Modifier  Description
      --------  -----------------------------------------------
      F32       Access one 32-bit floating-point value
      F32X2     Access two 32-bit floating-point values
      F32X4     Access four 32-bit floating-point values
      S8        Access one 8-bit signed integer value
      S16       Access one 16-bit signed integer value
      S32       Access one 32-bit signed integer value
      S32X2     Access two 32-bit signed integer values
      S32X4     Access four 32-bit signed integer values
      U8        Access one 8-bit unsigned integer value
      U16       Access one 16-bit unsigned integer value
      U32       Access one 32-bit unsigned integer value
      U32X2     Access two 32-bit unsigned integer values
      U32X4     Access four 32-bit unsigned integer values

    For memory load operations, the "F32", "F32X2", "F32X4", "S8", "S16",
    "S32", "S32X2", "S32X4", "U8", "U16", "U32", "U32X2", and "U32X4" storage
    modifiers control how data are loaded from memory.  Storage modifiers are
    supported by the LDC and LOAD instructions and are covered in more detail
    in the descriptions of these instructions.  These instructions must
    specify exactly one of these modifiers, and may not specify any of the
    base data type modifiers (F, U, S) described above.  The base data type of
    the result vector of a LOAD or LDC instruction is trivially derived from
    the storage modifier.


    Add New Section 2.X.4.5, Program Memory Access

    Programs may load from buffer object memory via the LDC (load constant)
    and LOAD (global load) instructions.

    Load instructions read 8, 16, 32, 64, or 128 bits of data from a source
    address to produce a four-component vector, according to the storage
    modifier specified with the instruction.  The storage modifier has three
    parts:

      - a base data type, "F", "S", or "U", specifying that the instruction
        fetches floating-point, signed integer, or unsigned integer values,
        respectively;

      - a component size, specifying that the components fetched by the
        instruction have 8, 16, or 32 bits; and

      - an optional component count, where "X2" and "X4" indicate that two or
        four components are fetched, and no count indicates a single component
        fetch.
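
    The three parts of a storage modifier can be recovered from its name.
    The C sketch below uses the hypothetical names StorageModifier and
    ParseStorageModifier.  It accepts only the thirteen modifiers listed in
    the grammar, so names such as "S8X2" or "F16" are rejected even though
    they would otherwise fit the naming pattern.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char     baseType;        /* 'F', 'S', or 'U' */
    unsigned componentBits;   /* 8, 16, or 32 */
    unsigned componentCount;  /* 1, 2, or 4 */
} StorageModifier;

/* Splits a storage modifier such as "F32X4" or "U8" into its three parts.
 * Returns false for any name not in the <opModifier> grammar rule. */
static bool ParseStorageModifier(const char *text, StorageModifier *out)
{
    static const char *const valid[] = {
        "F32", "F32X2", "F32X4", "S8", "S16", "S32", "S32X2", "S32X4",
        "U8",  "U16",   "U32",   "U32X2", "U32X4"
    };
    bool known = false;
    for (size_t i = 0; i < sizeof valid / sizeof valid[0]; i++)
        if (strcmp(text, valid[i]) == 0)
            known = true;
    if (!known)
        return false;

    char *rest;
    out->baseType = text[0];                                   /* F/S/U  */
    out->componentBits = (unsigned)strtoul(text + 1, &rest, 10); /* size */
    out->componentCount =                                      /* X2/X4  */
        (*rest == 'X') ? (unsigned)strtoul(rest + 1, NULL, 10) : 1;
    return true;
}
```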

    When the storage modifier specifies that fewer than four components should
    be fetched, remaining components are filled with zeroes.  When performing
    a global load (LOAD), the GPU address is specified as an instruction
    operand.  When performing a constant buffer load (LDC), the GPU address is
    derived by adding the base address of the bound buffer object to an offset
    specified as an instruction operand.  Given a GPU address <address> and a
    storage modifier <modifier>, the memory load can be described by the
    following code:

      result_t_vec BufferMemoryLoad(char *address, OpModifier modifier)
      {
        result_t_vec result = { 0, 0, 0, 0 };
        switch (modifier) {
        case F32:
            result.x = ((float32_t *)address)[0];
            break;
        case F32X2:
            result.x = ((float32_t *)address)[0];
            result.y = ((float32_t *)address)[1];
            break;
        case F32X4:
            result.x = ((float32_t *)address)[0];
            result.y = ((float32_t *)address)[1];
            result.z = ((float32_t *)address)[2];
            result.w = ((float32_t *)address)[3];
            break;
        case S8:
            result.x = ((int8_t *)address)[0];
            break;
        case S16:
            result.x = ((int16_t *)address)[0];
            break;
        case S32:
            result.x = ((int32_t *)address)[0];
            break;
        case S32X2:
            result.x = ((int32_t *)address)[0];
            result.y = ((int32_t *)address)[1];
            break;
        case S32X4:
            result.x = ((int32_t *)address)[0];
            result.y = ((int32_t *)address)[1];
            result.z = ((int32_t *)address)[2];
            result.w = ((int32_t *)address)[3];
            break;
        case U8:
            result.x = ((uint8_t *)address)[0];
            break;
        case U16:
            result.x = ((uint16_t *)address)[0];
            break;
        case U32:
            result.x = ((uint32_t *)address)[0];
            break;
        case U32X2:
            result.x = ((uint32_t *)address)[0];
            result.y = ((uint32_t *)address)[1];
            break;
        case U32X4:
            result.x = ((uint32_t *)address)[0];
            result.y = ((uint32_t *)address)[1];
            result.z = ((uint32_t *)address)[2];
            result.w = ((uint32_t *)address)[3];
            break;
        }
        return result;
      }
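
    The code above is specification pseudocode.  result_t_vec, float32_t,
    and OpModifier are not defined, and in ISO C the pointer casts would be
    undefined behavior for misaligned or differently typed memory.  A
    compilable C sketch of the same behavior follows.  It uses an assumed
    ResultVec union in place of result_t_vec, assumes the buffer uses the
    host's byte order, and copies with memcpy so that each load is
    well-defined C.

```c
#include <stdint.h>
#include <string.h>

typedef enum {
    F32, F32X2, F32X4, S8, S16, S32, S32X2, S32X4,
    U8, U16, U32, U32X2, U32X4
} OpModifier;

/* Assumed result type; the valid member follows from the modifier. */
typedef union {
    float    f[4];
    int32_t  s[4];
    uint32_t u[4];
} ResultVec;

static ResultVec BufferMemoryLoad(const unsigned char *address,
                                  OpModifier modifier)
{
    ResultVec result;
    memset(&result, 0, sizeof result);  /* unfetched components are zero */
    switch (modifier) {
    case F32:   memcpy(result.f, address, 4);  break;
    case F32X2: memcpy(result.f, address, 8);  break;
    case F32X4: memcpy(result.f, address, 16); break;
    case S8:  { int8_t v;  memcpy(&v, address, 1); result.s[0] = v; break; }
    case S16: { int16_t v; memcpy(&v, address, 2); result.s[0] = v; break; }
    case S32:   memcpy(result.s, address, 4);  break;
    case S32X2: memcpy(result.s, address, 8);  break;
    case S32X4: memcpy(result.s, address, 16); break;
    case U8:  { uint8_t v;  memcpy(&v, address, 1); result.u[0] = v; break; }
    case U16: { uint16_t v; memcpy(&v, address, 2); result.u[0] = v; break; }
    case U32:   memcpy(result.u, address, 4);  break;
    case U32X2: memcpy(result.u, address, 8);  break;
    case U32X4: memcpy(result.u, address, 16); break;
    }
    return result;
}
```

    Assigning the int8_t and int16_t temporaries to a 32-bit member
    sign-extends S8 and S16 fetches, while U8 and U16 fetches are
    zero-extended, as in the pseudocode.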

    The offset used for the constant buffer loads must be aligned to the fetch
    size corresponding to the storage opcode modifier.  For S8 and U8, the
    offset has no alignment requirements.  For S16 and U16, the offset must be
    a multiple of two basic machine units.  For F32, S32, and U32, the offset
    must be a multiple of four.  For F32X2, S32X2, and U32X2, the offset must
    be a multiple of eight.  For F32X4, S32X4, and U32X4, the offset must be a
    multiple of sixteen.  If an offset is not correctly aligned, the values
    returned by a constant buffer load will be undefined.
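
    The alignment rule reduces to "offset modulo fetch size equals zero",
    where the fetch size is the component size times the component count.
    The following is a C sketch with hypothetical function names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    F32, F32X2, F32X4, S8, S16, S32, S32X2, S32X4,
    U8, U16, U32, U32X2, U32X4
} OpModifier;

/* Fetch size in bytes; LDC offsets must be a multiple of this value. */
static uint32_t FetchSize(OpModifier m)
{
    switch (m) {
    case S8:    case U8:                  return 1;
    case S16:   case U16:                 return 2;
    case F32:   case S32:   case U32:     return 4;
    case F32X2: case S32X2: case U32X2:   return 8;
    case F32X4: case S32X4: case U32X4:   return 16;
    }
    return 0;
}

/* True if a byte offset meets the alignment rule for this modifier. */
static bool LdcOffsetIsAligned(OpModifier m, uint32_t offset)
{
    return offset % FetchSize(m) == 0;
}
```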


    Modify Section 2.X.6, Program Options

    + Extended Parameter Buffer Object Support (NV_parameter_buffer_object2)

    If a program specifies the "NV_parameter_buffer_object2" option, it may
    use the CBUFFER statement to declare program parameter buffer variables
    and the LDC instruction to load data from parameter buffer variables using
    arbitrary byte offsets, subject to the alignment requirements of Section
    2.X.4.5.


    Modify Section 2.X.8, Program Instruction Set

    Section 2.X.8.Z, LDC:  Load from Constant Buffer

    The LDC instruction loads a vector operand from a buffer object to yield a
    result vector.  The operand used for the LDC instruction must correspond
    to a parameter buffer variable declared using the "CBUFFER" statement; a
    program will fail to load if any other type of operand is used in an LDC
    instruction.

      result = BufferMemoryLoad(&op0, storageModifier);

    A base operand vector is fetched from memory as described in Section
    2.X.4.5, with the GPU address derived from the binding corresponding to
    the operand.  A final operand vector is derived from the base operand
    vector by applying swizzle, negation, and absolute value operand modifiers
    as described in Section 2.X.4.2.
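
    The operand-modifier step can be modeled in C.  This sketch assumes
    floating-point data, such as the result of an F32X2 or F32X4 fetch.  It
    applies the swizzle first, then absolute value, then negation; that
    ordering is an assumption made here for an operand that requests all
    three.  Vec4 and ApplyOperandModifiers are hypothetical names.

```c
/* Hypothetical four-component float vector. */
typedef struct { float v[4]; } Vec4;

/* Applies operand modifiers to a base operand vector: swizzle first, then
 * absolute value (if takeAbs), then negation (if negate).  swizzle[i] names
 * the source component (0=x, 1=y, 2=z, 3=w) for result component i. */
static Vec4 ApplyOperandModifiers(Vec4 base, const int swizzle[4],
                                  int takeAbs, int negate)
{
    Vec4 out;
    for (int i = 0; i < 4; i++) {
        float c = base.v[swizzle[i]];
        if (takeAbs && c < 0.0f)
            c = -c;                 /* absolute value */
        if (negate)
            c = -c;                 /* negation */
        out.v[i] = c;
    }
    return out;
}
```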

    The amount of memory in any given buffer object binding accessible by the
    LDC instruction may be limited.  If any component fetched by the LDC
    instruction extends 4*<n> or more basic machine units from the beginning
    of the buffer object binding, where <n> is the implementation-dependent
    constant MAX_PROGRAM_PARAMETER_BUFFER_SIZE_NV, the value fetched for that
    component will be undefined.
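
    Components are fetched contiguously, so no component crosses the limit
    exactly when the last byte of the fetch lies below 4*<n>.  The C sketch
    below checks that condition; the function name is hypothetical.  The
    value 16384 used in the example is the limit that Issue (2) gives for
    NVIDIA GPUs shipping when this extension was written; it is not a value
    to rely on in general.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if every byte of a fetch of fetchSize bytes at offset (relative to
 * the start of the binding) lies below 4 * maxBufferSizeWords bytes.
 * 64-bit arithmetic avoids overflow near the top of the 32-bit range. */
static bool LdcFetchWithinLimit(uint64_t offset, uint32_t fetchSize,
                                uint32_t maxBufferSizeWords)
{
    uint64_t limitBytes = 4ull * maxBufferSizeWords;
    return offset + fetchSize <= limitBytes;
}
```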

    LDC supports no base data type modifiers, but requires exactly one storage
    modifier.  The base data types of the operand and result vectors are
    derived from the storage modifier.


Additions to Chapter 3 of the OpenGL 3.0 Specification (Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 3.0 Specification (Per-Fragment
Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 3.0 Specification (Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 3.0 Specification (State and
State Requests)

    None.

Additions to Appendix A of the OpenGL 3.0 Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Errors

    No new errors.

Dependencies on NV_shader_buffer_load

    If NV_shader_buffer_load (or equivalent functionality) is not supported,
    references to the "LOAD" opcode in the description of the opcode modifiers
    for "LDC" should be removed.

New State

    None.

New Implementation Dependent State

    None.

Issues

    (1) What sort of alignment requirements, if any, should be imposed on the
        operand provided to the LDC instruction?

      RESOLVED:  The offset of the operand must be aligned according to the
      size of the fetch.  For 1-, 2-, and 4-component fetches, the offset must
      be a multiple of <N>, 2*<N>, and 4*<N>, respectively, where <N> is the
      size in bytes of the components being fetched.

    (2) NV_parameter_buffer_object provides an implementation-dependent limit
        on the portion of a buffer object that may be fetched via BUFFER and
        BUFFER4 variables.  Should the same limit apply to the LDC
        instruction?

      RESOLVED:  Yes.  On currently shipping NVIDIA GPUs, the maximum program
      parameter buffer size is 16384 32-bit words, or 64KB.  Buffers larger
      than 64KB may be used, but any fetches accessing memory beyond the first
      64KB of a buffer binding will return undefined values.

    (3) Should we support fetches of 3-component vectors?  If so, what should
        be the minimum alignment for the specified offset?

      RESOLVED:  No, we'll leave 3-component vectors out of this extension.
      This limitation can be worked around either by doing three separate
      single-component fetches or by doing a four-component fetch with an
      appropriate write mask.  The former approach supports indexing in a
      tightly packed array of 3-component vectors; the latter would require
      that array elements be padded to four components.
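
      A C sketch can model the two workarounds on the host for comparison.
      LoadVec3Packed performs three 4-byte fetches at a 12-byte stride.
      LoadVec3Padded performs one 16-byte fetch at a 16-byte stride and
      keeps only x, y, and z, standing in for an ".xyz" write mask.  Both
      names are hypothetical, and the sketch assumes the destination starts
      as zero.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float x, y, z, w; } Vec4;   /* hypothetical result type */

/* Tightly packed float[3] elements (12-byte stride): three single-float
 * fetches, each at a 4-byte-aligned offset. */
static Vec4 LoadVec3Packed(const unsigned char *buf, uint32_t index)
{
    Vec4 r = { 0.0f, 0.0f, 0.0f, 0.0f };
    uint32_t base = index * 12;
    memcpy(&r.x, buf + base + 0, sizeof r.x);
    memcpy(&r.y, buf + base + 4, sizeof r.y);
    memcpy(&r.z, buf + base + 8, sizeof r.z);
    return r;
}

/* Elements padded to four floats (16-byte stride): one four-float fetch,
 * then only x, y, and z are written, like an .xyz write mask; w keeps the
 * destination's previous value (zero here). */
static Vec4 LoadVec3Padded(const unsigned char *buf, uint32_t index)
{
    Vec4 r = { 0.0f, 0.0f, 0.0f, 0.0f };
    float fetched[4];
    memcpy(fetched, buf + index * 16, sizeof fetched);
    r.x = fetched[0];
    r.y = fetched[1];
    r.z = fetched[2];
    return r;
}
```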

    (4) Should we support fetches of 8- and 16-bit components?

      RESOLVED:  Yes, we will support fetches of 8- and 16-bit signed and
      unsigned integers.

      Fetches of vectors of 8- and 16-bit integers are not supported but may
      be emulated by performing shift/mask operations on the results of 32-bit
      fetches.
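
      For example, two signed 16-bit values packed into one 32-bit word can
      be recovered from a U32 fetch with a shift, a mask, and a sign
      extension.  The following C sketch assumes the first value occupies
      the 16 least significant bits.  UnpackS16x2 is a hypothetical name.

```c
#include <stdint.h>

/* Splits a 32-bit word (e.g. the x component of an LDC.U32 result) into
 * two signed 16-bit values, first value in the 16 LSBs. */
static void UnpackS16x2(uint32_t word, int32_t out[2])
{
    for (int i = 0; i < 2; i++) {
        uint32_t bits = (word >> (16 * i)) & 0xFFFFu;   /* shift + mask */
        out[i] = (int32_t)(bits ^ 0x8000u) - 0x8000;    /* sign-extend  */
    }
}
```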

      Fetches of 16-bit floating-point values, or floating-point vectors
      thereof, are not supported.  A single fp16 fetch may be emulated using a
      16-bit unsigned integer fetch and the UP2H instruction to convert the 16
      LSBs of the fetch to a floating-point value.  The encoding of 16-bit
      floating-point values is described in section 2.1.2 of the OpenGL 3.0
      specification.
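
      For reference, the fp16 decoding that this emulation relies on can be
      written out in C.  The sketch below follows the usual half-precision
      layout: 1 sign bit, 5 exponent bits with a bias of 15, and 10 mantissa
      bits.  It handles denormals, infinities, and NaNs.  HalfToFloat is a
      hypothetical name, and a GPU's handling of denormals in UP2H may
      differ from this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Converts the 16 LSBs of bits (e.g. from an LDC.U16 fetch) from half
 * precision to a 32-bit float by rebuilding the single-precision bits. */
static float HalfToFloat(uint32_t bits)
{
    uint32_t sign = (bits >> 15) & 0x1u;
    uint32_t expo = (bits >> 10) & 0x1Fu;
    uint32_t mant = bits & 0x3FFu;
    uint32_t out;

    if (expo == 0) {
        if (mant == 0) {
            out = sign << 31;                        /* signed zero */
        } else {
            expo = 127 - 15 + 1;                     /* denormal: normalize */
            while ((mant & 0x400u) == 0) {
                mant <<= 1;
                expo--;
            }
            mant &= 0x3FFu;
            out = (sign << 31) | (expo << 23) | (mant << 13);
        }
    } else if (expo == 31) {
        out = (sign << 31) | 0x7F800000u | (mant << 13);  /* Inf or NaN */
    } else {
        out = (sign << 31) | ((expo + 127 - 15) << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &out, sizeof f);
    return f;
}
```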

    (5) Should we support fetches of 64-bit components?

      RESOLVED:  No; the instruction set provided by NV_gpu_program4 does not
      support 64-bit components anywhere.  If future instructions support
      64-bit components, this restriction should be removed.

    (6) How should the operands of the LDC instruction be specified?

      RESOLVED:  We will create a new type of buffer variable ("CBUFFER"),
      which defines an array of bytes to be fetched from.  The type of fetch
      to perform is specified by a storage modifier (as in
      NV_shader_buffer_load).  An offset relative to the buffer binding (in
      bytes) may be specified using normal array indexing syntax, and an index
      computed at run-time is supported.

      Some examples:

        CBUFFER buffer[] = { program.buffer[0] };
        TEMP      i;
        MOV.S     i, 32;                  # computed offset of 32B
        LDC.F32   result, buffer[12];     # (x,0,0,0) from bytes 12..15
        LDC.F32X4 result, buffer[16];     # (x,y,z,w) from bytes 16..31
        LDC.U8    result, buffer[i.x+3];  # (x,0,0,0) from byte 35
        LDC.S32   result, buffer[i.x+12]; # (x,0,0,0) from bytes 44..47
        LDC.U32X2 result, buffer[i.x+8];  # (x,y,0,0) from bytes 40..47
        LDC.S16   result, buffer[i.x+2];  # (x,0,0,0) from bytes 34..35

      We chose to provide the new buffer variable type (CBUFFER) rather than
      reusing BUFFER or BUFFER4.  For CBUFFER variables, "buffer[12]"
      unambiguously specifies a 12-byte offset.  For BUFFER or BUFFER4
      variables, an operand of "buffer[12]" already has an existing meaning,
      implying an offset of 12 words or vectors, which would be 48 or 192
      bytes, respectively.  Because we want to be able to fetch 8- and 16-bit
      units, having the index scaled by four or sixteen doesn't make sense.
      We could have had LDC simply ignore the type of binding and always
      interpret an index as a byte offset, but chose the new declaration type
      to avoid confusion.

      We also considered an approach where the buffer and offset were
      specified in separate operands.  That would be similar to texture
      instructions, where the coordinates and texture are specified
      separately.  The first operand would have been interpreted as an
      unsigned scalar specifying a byte offset, the second operand would have
      specified a buffer variable binding, and a pointer would be obtained by
      adding the two operands.  This would have looked something like:

        BUFFER buffer[] = { program.buffer[0] };
        LDC.S32X2 result, offset.x, buffer;

      We chose not to implement this approach mainly because this syntax would
      require specifying a new type of instruction; the syntax we adopted
      simply reuses existing vector operand and indexing mechanisms.
      Additionally, the syntax in this extension provides immediate offsets
      for "free", which the offset-buffer syntax would not support directly
      without additional new syntax.  For example, to load a structure with a
      pair of two-component vectors using offset-buffer syntax, you would have
      to do something like:

        BUFFER buffer[] = { program.buffer[0] };
        TEMP offset;
        LDC.S32X2 result1, offset.x, buffer;
        ADD.U offset.x, offset.x, 8;            # bump offset to second vector
        LDC.S32X2 result2, offset.x, buffer;

5bd8deadSopenharmony_ci    (7) How should the fetches in the LDC instruction interact with other
5bd8deadSopenharmony_ci        operand modifiers (swizzle, absolute value, negation)?  With result
5bd8deadSopenharmony_ci        modifiers (condition codes, saturation)?
5bd8deadSopenharmony_ci
      RESOLVED:  These features will be orthogonal.  When any of these
      modifiers is specified, the base data type to which it applies comes
      from the storage modifier of the LDC instruction.

      The LDC instruction is defined to produce a "base operand vector" from a
      memory fetch.  This isn't particularly different from normal operands,
      where a base operand vector is derived from the binding corresponding to
      the operand.  In both cases, the components of this vector are swizzled,
      and optional absolute value and negation operations are performed, to
      produce the final vector operand.

      If condition code operations or saturation are specified for the result
      vector, these operations are performed using the data type implied by
      the storage modifier, just as for the operand modifiers.
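
      The following Python sketch models the order of operations described
      above.  The helper names are hypothetical, and the format table covers
      only the two storage modifiers used here.  Little-endian layout is
      assumed.  The storage modifier selects how the fetched bytes are
      interpreted, and the swizzle, absolute value, and negation then act on
      values of that type.  Absolute value is applied before negation, which
      corresponds to a negated absolute-value operand.

```python
import struct

# Hypothetical map from LDC storage modifier to a struct format.  The
# modifier decides whether the base operand vector holds floats or integers.
FORMATS = {"F32X4": "<4f", "S32X4": "<4i"}
COMPONENT = {"x": 0, "y": 1, "z": 2, "w": 3}


def base_operand_vector(buf: bytes, byte_offset: int, modifier: str) -> tuple:
    """Model the memory fetch that produces LDC's base operand vector."""
    return struct.unpack_from(FORMATS[modifier], buf, byte_offset)


def final_operand(base: tuple, swizzle: str = "xyzw",
                  absolute: bool = False, negate: bool = False) -> tuple:
    """Apply swizzle, then optional absolute value, then optional negation."""
    out = [base[COMPONENT[c]] for c in swizzle]
    if absolute:
        out = [abs(v) for v in out]
    if negate:
        out = [-v for v in out]
    return tuple(out)


fbuf = struct.pack("<4f", 1.5, -2.0, 0.25, -8.0)
ibuf = struct.pack("<4i", 7, -3, 0, -1)

print(final_operand(base_operand_vector(fbuf, 0, "F32X4"), "wzyx",
                    absolute=True, negate=True))  # (-8.0, -0.25, -2.0, -1.5)
print(final_operand(base_operand_vector(ibuf, 0, "S32X4"), "yyyy",
                    absolute=True))               # (3, 3, 3, 3)
```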

    (8) What happens if a non-zero base offset is specified for a CBUFFER
        variable?

      RESOLVED:  A subset of the bytes in a buffer object can be specified
      using range syntax like the following:

        CBUFFER buffer[] = { program.buffer[0][16..31] };

      The sub-range need not start at the beginning of the buffer object; in
      the example above, it starts 16 bytes into the buffer.  When accessing a
      parameter buffer variable corresponding to such a sub-range, an array
      index is relative to the base of the sub-range.  So the offset of the
      sub-range is effectively added to the index used for the LDC operand:

        LDC.F32   result, buffer[12];     # (x,0,0,0) from bytes 28..31
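
      The address arithmetic can be sketched in Python as follows.  This is a
      hypothetical model, not an API.  ldc_f32 takes the byte offset at which
      the sub-range begins (16 for program.buffer[0][16..31]) and an index
      relative to it.  It assumes a little-endian float and does not model
      what happens when an access falls outside the sub-range.

```python
import struct


def ldc_f32(data: bytes, range_base: int, index: int):
    """Hypothetical model of LDC.F32 on a CBUFFER bound to a sub-range.

    The index is relative to the start of the sub-range, so the sub-range's
    byte offset is added to it to form the address in the buffer object.
    Returns the (x,0,0,0) result and the inclusive byte range read.
    """
    address = range_base + index
    (x,) = struct.unpack_from("<f", data, address)
    return (x, 0.0, 0.0, 0.0), (address, address + 3)


# A 32-byte buffer object with 3.0 stored in bytes 28..31.
data = bytearray(32)
struct.pack_into("<f", data, 28, 3.0)

result, byte_span = ldc_f32(bytes(data), 16, 12)
print(result, byte_span)  # (3.0, 0.0, 0.0, 0.0) (28, 31)
```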

    (9) What happens if a non-array CBUFFER variable is used?

      RESOLVED:  A non-array variable may be used with LDC.  However, array
      indexing isn't supported with non-array variables, so all LDC loads
      using that variable will fetch using the same base address.

        CBUFFER bufferElement = program.buffer[0][32];
        LDC.U8    result, bufferElement;     # (x,0,0,0) from byte 32
        LDC.S16   result, bufferElement;     # (x,0,0,0) from bytes 32..33
        LDC.F32   result, bufferElement;     # (x,0,0,0) from bytes 32..35
        LDC.F32X4 result, bufferElement;     # (x,y,z,w) from bytes 32..47
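
      The following Python sketch shows how the storage modifier alone
      decides how many bytes each of these loads consumes, starting from the
      fixed address 32.  The ldc helper and its format table are
      hypothetical, and little-endian layout is assumed.  Unfetched
      components are zero-filled to match the (x,0,0,0) comments above.

```python
import struct

# Hypothetical table: storage modifier -> (struct format, component count).
FORMATS = {
    "U8": ("<B", 1),
    "S16": ("<h", 1),
    "F32": ("<f", 1),
    "F32X4": ("<4f", 4),
}


def ldc(data: bytes, address: int, modifier: str):
    """Model a non-indexed LDC: every load starts at the same address.

    Returns the four-component result (unfetched components zero-filled)
    and the inclusive range of bytes that were read.
    """
    fmt, count = FORMATS[modifier]
    comps = struct.unpack_from(fmt, data, address)
    last = address + struct.calcsize(fmt) - 1
    return comps + (0,) * (4 - count), (address, last)


data = bytes(range(48))  # byte i holds the value i
for modifier in ("U8", "S16", "F32", "F32X4"):
    print(modifier, ldc(data, 32, modifier)[1])
# U8 (32, 32)
# S16 (32, 33)
# F32 (32, 35)
# F32X4 (32, 47)
```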

    (10) Should single-component fetches from LDC smear their results across
         all four components of the result vector, to allow packing multiple
         non-vectors into a single vector?

      RESOLVED:  No.  However, swizzle suffixes on the operand will provide
      this capability for free.  For example, let's say you wanted to fetch
      four scalars from a buffer and pack the results into a single temporary
      vector.  The swizzle syntax lets you do this by smearing the real
      component (always fetched in "x") into the other components:

        CBUFFER buffer[] = { program.buffer[0] };
        LDC.F32 temp.x, buffer[16];
        LDC.F32 temp.y, buffer[28].x;
        LDC.F32 temp.z, buffer[32].x;
        LDC.F32 temp.w, buffer[40].x;
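
      The effect of the ".x" swizzle and the write masks can be modeled in
      Python.  The helpers are hypothetical and the little-endian float
      layout is assumed.  The (x,0,0,0) result of a single-component fetch
      follows the comments in issue (9).  The final write shows why the
      swizzle matters: without it, a write to temp.y would take the
      zero-filled y component.

```python
import struct

COMPONENT = {"x": 0, "y": 1, "z": 2, "w": 3}


def ldc_f32(data: bytes, address: int) -> tuple:
    """Hypothetical LDC.F32 model: the fetched value lands in x only."""
    (x,) = struct.unpack_from("<f", data, address)
    return (x, 0.0, 0.0, 0.0)


def smear_x(vec: tuple) -> tuple:
    """Model the ".x" operand swizzle (equivalent to ".xxxx")."""
    return (vec[0],) * 4


def masked_write(dest: list, src: tuple, mask: str) -> list:
    """Model a destination write mask: only the named components change."""
    out = list(dest)
    for c in mask:
        out[COMPONENT[c]] = src[COMPONENT[c]]
    return out


data = bytearray(44)
for address, value in ((16, 1.0), (28, 2.0), (32, 3.0), (40, 4.0)):
    struct.pack_into("<f", data, address, value)
data = bytes(data)

temp = [0.0] * 4
temp = masked_write(temp, ldc_f32(data, 16), "x")
temp = masked_write(temp, smear_x(ldc_f32(data, 28)), "y")
temp = masked_write(temp, smear_x(ldc_f32(data, 32)), "z")
temp = masked_write(temp, smear_x(ldc_f32(data, 40)), "w")
print(temp)  # [1.0, 2.0, 3.0, 4.0]

unswizzled = masked_write([0.0] * 4, ldc_f32(data, 28), "y")
print(unswizzled)  # [0.0, 0.0, 0.0, 0.0]
```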


Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
     1              pbrown    Internal revisions.
     2    09/09/09  mjk       Assigned number