Name

    NV_gpu_program5_mem_extended

Name Strings

    GL_NV_gpu_program5_mem_extended

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified Date:         October 30, 2012
    NVIDIA Revision:            1

Number

    OpenGL Extension #434

Dependencies

    NV_gpu_program5 is required.

    This extension is written against the NV_gpu_program5 extension
    specification, which itself is written against the NV_gpu_program4 and
    OpenGL 2.0 Specifications.

    This extension interacts trivially with EXT_shader_image_load_store,
    NV_shader_storage_buffer_object, and NV_compute_program5.

Overview

    This extension provides a new set of storage modifiers that can be used by
    NV_gpu_program5 assembly program instructions loading from or storing to
    various forms of GPU memory.  In particular, we provide support for loads
    and stores using the storage modifiers:

        .F16X2  .F16X4  .F16    (for 16-bit floating-point scalars/vectors)
        .S8X2   .S8X4           (for 8-bit signed integer vectors)
        .S16X2  .S16X4          (for 16-bit signed integer vectors)
        .U8X2   .U8X4           (for 8-bit unsigned integer vectors)
        .U16X2  .U16X4          (for 16-bit unsigned integer vectors)

    These modifiers are allowed for the following load/store instructions:

        LDC             Load from constant buffer

        LOAD            Global load
        STORE           Global store

        LOADIM          Image load (via EXT_shader_image_load_store)
        STOREIM         Image store (via EXT_shader_image_load_store)

        LDB             Load from storage buffer (via
                          NV_shader_storage_buffer_object)
        STB             Store to storage buffer (via
                          NV_shader_storage_buffer_object)

        LDS             Load from shared memory (via NV_compute_program5)
        STS             Store to shared memory (via NV_compute_program5)

    For assembly programs prior to this extension, it was necessary to access
    memory using packed types and then unpack with additional shader
    instructions.
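
    As an illustration of the unpacking step described above, the following C
    sketch models what a program had to do without these modifiers:  fetch one
    32-bit word (for example with the existing U32 modifier) and split it into
    four 8-bit components using shifts and masks.  The helper name
    unpack_u8x4 and the choice of placing component x in the low-order byte
    are assumptions made for this example, not part of this specification.

```c
#include <stdint.h>

/* Hypothetical helper: split one packed 32-bit word into four unsigned
 * 8-bit components.  Assumes (for illustration only) that component x
 * occupies the low-order byte of the word. */
static void unpack_u8x4(uint32_t packed, uint32_t out[4])
{
    out[0] = (packed >>  0) & 0xFFu;   /* x */
    out[1] = (packed >>  8) & 0xFFu;   /* y */
    out[2] = (packed >> 16) & 0xFFu;   /* z */
    out[3] = (packed >> 24) & 0xFFu;   /* w */
}
```

    With the U8X4 modifier, a single load yields the four components
    directly, so a shift-and-mask sequence of this kind is no longer needed
    in the program.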

    Similar capabilities have already been provided in the OpenGL Shading
    Language (GLSL) via the NV_gpu_shader5 extension, using the extended data
    types provided there (e.g., "float16_t", "u8vec4", "i16vec2").

New Procedures and Functions

    None.

New Tokens

    None.

Additions to Chapter 2 of the OpenGL 2.0 Specification (OpenGL Operation)

    (All modifications are relative to Section 2.X, GPU Programs, from the
     NV_gpu_program4 specification.)

    Modify Section 2.X.2, Program Grammar

    (add after the long list of grammar rules) If a program specifies the
    NV_gpu_program5_mem_extended program option, the following rules are added
    to the NV_gpu_program5 base program grammar:

    <opModifier>            ::= "F16X2"
                              | "F16X4"
                              | "S8X2"
                              | "S8X4"
                              | "S16X2"
                              | "S16X4"
                              | "U8X2"
                              | "U8X4"
                              | "U16X2"
                              | "U16X4"

    (Note:  This extension also provides new capabilities for the "F16"
     modifier.  Since it was already supported in NV_gpu_program5, it isn't
     being added to the grammar here.)


    Modify Section 2.X.4.1, Program Instruction Modifiers

    (add to Table X.14 of the NV_gpu_program4 specification.)

      Modifier  Description
      --------  ---------------------------------------------------
      F16       Convert to or from one 16-bit floating-point value,
                or access one 16-bit floating-point value

      F16X2     Access two 16-bit floating-point values
      F16X4     Access four 16-bit floating-point values
      S8X2      Access two 8-bit signed integer values
      S8X4      Access four 8-bit signed integer values
      S16X2     Access two 16-bit signed integer values
      S16X4     Access four 16-bit signed integer values
      U8X2      Access two 8-bit unsigned integer values
      U8X4      Access four 8-bit unsigned integer values
      U16X2     Access two 16-bit unsigned integer values
      U16X4     Access four 16-bit unsigned integer values
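
    To make the F16 entries more concrete, the following C sketch widens one
    16-bit floating-point bit pattern to a 32-bit float, which is the kind of
    conversion an F16 access implies when the value is held in a 32-bit
    component.  The layout assumed here (1 sign bit, 5 exponent bits, 10
    mantissa bits) is the usual half-precision encoding; the function name
    half_to_float is hypothetical, and actual hardware is expected to perform
    this conversion directly rather than with code like this.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a half-precision bit pattern to float. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                       /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit bit appears. */
            exp = 127 - 15 + 1;
            while ((man & 0x400u) == 0) {
                man <<= 1;
                exp--;
            }
            man &= 0x3FFu;
            bits = sign | (exp << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);   /* infinity or NaN */
    } else {
        /* Normal value: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

    For example, the bit pattern 0x3C00 has exponent field 15 and a zero
    mantissa, so it widens to 1.0.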

    (modify discussion of storage modifiers for load and store operations,
     adding the entries added to the table above)

    For load and store operations, the "F32", "F32X2", "F32X4", "F64",
    "F64X2", "F64X4", "S8", "S8X2", "S8X4", "S16", "S16X2", "S16X4", "S32",
    "S32X2", "S32X4", "S64", "S64X2", "S64X4", "U8", "U8X2", "U8X4", "U16",
    "U16X2", "U16X4", "U32", "U32X2", "U32X4", "U64", "U64X2", "U64X4", "F16",
    "F16X2", and "F16X4" storage modifiers control how data are loaded from or
    stored to memory. ...


    Modify Section 2.X.4.5, Program Memory Access, from NV_gpu_program5

    (update pseudocode for BufferMemoryLoad)

      result_t_vec BufferMemoryLoad(char *address, OpModifier modifier)
      {
        result_t_vec result = { 0, 0, 0, 0 };
        switch (modifier) {

        /* Existing cases and code from NV_gpu_program5 unchanged. */

        case F16:
            result.x = ((float16_t *)address)[0];
            break;
        case F16X2:
            result.x = ((float16_t *)address)[0];
            result.y = ((float16_t *)address)[1];
            break;
        case F16X4:
            result.x = ((float16_t *)address)[0];
            result.y = ((float16_t *)address)[1];
            result.z = ((float16_t *)address)[2];
            result.w = ((float16_t *)address)[3];
            break;
        case S8X2:
            result.x = ((int8_t *)address)[0];
            result.y = ((int8_t *)address)[1];
            break;
        case S8X4:
            result.x = ((int8_t *)address)[0];
            result.y = ((int8_t *)address)[1];
            result.z = ((int8_t *)address)[2];
            result.w = ((int8_t *)address)[3];
            break;
        case S16X2:
            result.x = ((int16_t *)address)[0];
            result.y = ((int16_t *)address)[1];
            break;
        case S16X4:
            result.x = ((int16_t *)address)[0];
            result.y = ((int16_t *)address)[1];
            result.z = ((int16_t *)address)[2];
            result.w = ((int16_t *)address)[3];
            break;
        case U8X2:
            result.x = ((uint8_t *)address)[0];
            result.y = ((uint8_t *)address)[1];
            break;
        case U8X4:
            result.x = ((uint8_t *)address)[0];
            result.y = ((uint8_t *)address)[1];
            result.z = ((uint8_t *)address)[2];
            result.w = ((uint8_t *)address)[3];
            break;
        case U16X2:
            result.x = ((uint16_t *)address)[0];
            result.y = ((uint16_t *)address)[1];
            break;
        case U16X4:
            result.x = ((uint16_t *)address)[0];
            result.y = ((uint16_t *)address)[1];
            result.z = ((uint16_t *)address)[2];
            result.w = ((uint16_t *)address)[3];
            break;
        }
        return result;
      }
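
    The load cases above can be modeled on a CPU roughly as follows.  This is
    a minimal sketch, assuming integer result components are 32 bits wide;
    the ivec4 type and the load_s8x4 and load_u16x2 names are hypothetical
    stand-ins for result_t_vec and two of the switch cases.  memcpy is used
    in place of the pointer casts only to stay within C aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for result_t_vec with 32-bit integer components. */
typedef struct { int32_t x, y, z, w; } ivec4;

/* Sketch of the S8X4 case: read four signed bytes and sign-extend each
 * one into a 32-bit component. */
static ivec4 load_s8x4(const unsigned char *address)
{
    int8_t raw[4];
    ivec4 result;
    memcpy(raw, address, sizeof raw);
    result.x = raw[0];
    result.y = raw[1];
    result.z = raw[2];
    result.w = raw[3];
    return result;
}

/* Sketch of the U16X2 case: read two unsigned 16-bit values and
 * zero-extend them; z and w keep their initial value of zero. */
static ivec4 load_u16x2(const unsigned char *address)
{
    uint16_t raw[2];
    ivec4 result = { 0, 0, 0, 0 };
    memcpy(raw, address, sizeof raw);
    result.x = raw[0];
    result.y = raw[1];
    return result;
}
```

    The difference between the signed and unsigned cases is only in how each
    narrow value is widened:  a stored byte of 0xFF becomes -1 under S8X4,
    while a stored 16-bit value of 0xFFFF becomes 65535 under U16X2.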

    (update pseudocode for BufferMemoryStore)

      void BufferMemoryStore(char *address, operand_t_vec operand,
                             OpModifier modifier)
      {
        switch (modifier) {

        /* Existing cases and code from NV_gpu_program5 unchanged. */

        case F16:
            ((float16_t *)address)[0] = operand.x;
            break;
        case F16X2:
            ((float16_t *)address)[0] = operand.x;
            ((float16_t *)address)[1] = operand.y;
            break;
        case F16X4:
            ((float16_t *)address)[0] = operand.x;
            ((float16_t *)address)[1] = operand.y;
            ((float16_t *)address)[2] = operand.z;
            ((float16_t *)address)[3] = operand.w;
            break;
        case S8X2:
            ((int8_t *)address)[0] = operand.x;
            ((int8_t *)address)[1] = operand.y;
            break;
        case S8X4:
            ((int8_t *)address)[0] = operand.x;
            ((int8_t *)address)[1] = operand.y;
            ((int8_t *)address)[2] = operand.z;
            ((int8_t *)address)[3] = operand.w;
            break;
        case S16X2:
            ((int16_t *)address)[0] = operand.x;
            ((int16_t *)address)[1] = operand.y;
            break;
        case S16X4:
            ((int16_t *)address)[0] = operand.x;
            ((int16_t *)address)[1] = operand.y;
            ((int16_t *)address)[2] = operand.z;
            ((int16_t *)address)[3] = operand.w;
            break;
        case U8X2:
            ((uint8_t *)address)[0] = operand.x;
            ((uint8_t *)address)[1] = operand.y;
            break;
        case U8X4:
            ((uint8_t *)address)[0] = operand.x;
            ((uint8_t *)address)[1] = operand.y;
            ((uint8_t *)address)[2] = operand.z;
            ((uint8_t *)address)[3] = operand.w;
            break;
        case U16X2:
            ((uint16_t *)address)[0] = operand.x;
            ((uint16_t *)address)[1] = operand.y;
            break;
        case U16X4:
            ((uint16_t *)address)[0] = operand.x;
            ((uint16_t *)address)[1] = operand.y;
            ((uint16_t *)address)[2] = operand.z;
            ((uint16_t *)address)[3] = operand.w;
            break;
        }
      }
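
    A CPU-side model of one store case might look like the sketch below.  It
    assumes 32-bit unsigned operand components and assumes that narrowing
    follows the C conversion rules the pseudocode implies, so only the low 8
    bits of each component reach memory.  The uvec4 type and the store_u8x4
    name are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for operand_t_vec with 32-bit unsigned components. */
typedef struct { uint32_t x, y, z, w; } uvec4;

/* Sketch of the U8X4 case: narrow each component to 8 bits (keeping the
 * low-order bits, as C unsigned conversion does) and write four bytes. */
static void store_u8x4(unsigned char *address, uvec4 operand)
{
    uint8_t raw[4];
    raw[0] = (uint8_t)operand.x;
    raw[1] = (uint8_t)operand.y;
    raw[2] = (uint8_t)operand.z;
    raw[3] = (uint8_t)operand.w;
    memcpy(address, raw, sizeof raw);
}
```

    In this model a component value of 0x101 is written as the byte 0x01; a
    program that needs clamping rather than wrapping would have to clamp
    before the store.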

    (modify paragraph to indicate the alignment requirement for new storage
    modifiers) The address used for global memory loads or stores or offset
    used for constant buffer loads must be aligned to the fetch size
    corresponding to the storage opcode modifier.  For S8 and U8, the offset
    has no alignment requirements.  For F16, S8X2, S16, U8X2, and U16, the
    offset must be a multiple of two basic machine units.  For F32, S32, U32,
    F16X2, S16X2, U16X2, S8X4, and U8X4, the offset must be a multiple of
    four.  For F32X2, F64, S32X2, S64, U32X2, U64, F16X4, S16X4, and U16X4,
    the offset must be a multiple of eight.  ...  If an offset is not
    correctly aligned, the values returned by a buffer memory load will be
    undefined, and the effects of a buffer memory store will also be
    undefined.
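
    The alignment rule can be expressed as a small lookup plus a modulo test.
    The C sketch below covers only the modifiers named in the paragraph above
    and leaves out the ones elided there.  The enum, the function names, and
    the placement of F16X4 in the eight-byte group (inferred from its fetch
    size of four 16-bit values) are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enumeration of the storage modifiers named above. */
typedef enum {
    MOD_S8, MOD_U8,
    MOD_F16, MOD_S8X2, MOD_S16, MOD_U8X2, MOD_U16,
    MOD_F32, MOD_S32, MOD_U32, MOD_F16X2, MOD_S16X2, MOD_U16X2,
    MOD_S8X4, MOD_U8X4,
    MOD_F32X2, MOD_F64, MOD_S32X2, MOD_S64, MOD_U32X2, MOD_U64,
    MOD_F16X4, MOD_S16X4, MOD_U16X4
} StorageModifier;

/* Required alignment, in basic machine units, for each modifier. */
static size_t required_alignment(StorageModifier m)
{
    switch (m) {
    case MOD_S8: case MOD_U8:
        return 1;                       /* no alignment requirement */
    case MOD_F16: case MOD_S8X2: case MOD_S16: case MOD_U8X2: case MOD_U16:
        return 2;
    case MOD_F32: case MOD_S32: case MOD_U32: case MOD_F16X2:
    case MOD_S16X2: case MOD_U16X2: case MOD_S8X4: case MOD_U8X4:
        return 4;
    default:
        return 8;                       /* all remaining enumerators */
    }
}

/* Returns nonzero if the offset satisfies the modifier's alignment. */
static int offset_is_aligned(uint64_t offset, StorageModifier m)
{
    return offset % required_alignment(m) == 0;
}
```

    For instance, an offset of 6 is acceptable for S8X2 (a multiple of two)
    but not for U8X4 (not a multiple of four).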


    Modify Section 2.X.6, Program Options

    + Extended Memory Format Support (NV_gpu_program5_mem_extended)

    If a program specifies the "NV_gpu_program5_mem_extended" option, it may
    use the "F16", "F16X2", "F16X4", "S8X2", "S8X4", "S16X2", "S16X4", "U8X2",
    "U8X4", "U16X2", and "U16X4" storage modifiers on instructions loading
    values from memory or storing values to memory (LDC, LOAD, STORE, LOADIM,
    STOREIM, LDB, STB, LDS, STS).


Additions to Chapter 3 of the OpenGL 2.0 Specification (Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 2.0 Specification (Per-Fragment
Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 2.0 Specification (Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 2.0 Specification (State and
State Requests)

    None.

Additions to Appendix A of the OpenGL 2.0 Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Dependencies on EXT_shader_image_load_store, NV_shader_storage_buffer_object,
and NV_compute_program5

    If EXT_shader_image_load_store is not supported, references to the LOADIM
    and STOREIM opcodes should be removed.

    If NV_shader_storage_buffer_object is not supported, references to the LDB
    and STB opcodes should be removed.

    If NV_compute_program5 is not supported, references to the LDS and STS
    opcodes should be removed.

Errors

    None.

New State

    None.

New Implementation Dependent State

    None.

Issues

    (1) Should this extension have its own extension string entry, or should
        its existence be inferred from the NV_gpu_program5 extension or some
        other extension?

      RESOLVED:  Provide a separate extension string entry, since this
      functionality was added after NV_gpu_program5 was published and may not
      be available on older drivers supporting NV_gpu_program5.

Revision History

    Revision 1, October 30, 2012 (pbrown):  Initial revision.