Name

    NV_gpu_program5_mem_extended

Name Strings

    GL_NV_gpu_program5_mem_extended

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified Date:         October 30, 2012
    NVIDIA Revision:            1

Number

    OpenGL Extension #434

Dependencies

    NV_gpu_program5 is required.

    This extension is written against the NV_gpu_program5 extension
    specification, which itself is written against the NV_gpu_program4 and
    OpenGL 2.0 Specifications.

    This extension interacts trivially with EXT_shader_image_load_store,
    NV_shader_storage_buffer_object, and NV_compute_program5.

Overview

    This extension provides a new set of storage modifiers that can be used by
    NV_gpu_program5 assembly program instructions loading from or storing to
    various forms of GPU memory.  In particular, we provide support for loads
    and stores using the storage modifiers:

      .F16X2 .F16X4 .F16      (for 16-bit floating-point scalars/vectors)
      .S8X2 .S8X4             (for 8-bit signed integer vectors)
      .S16X2 .S16X4           (for 16-bit signed integer vectors)
      .U8X2 .U8X4             (for 8-bit unsigned integer vectors)
      .U16X2 .U16X4           (for 16-bit unsigned integer vectors)

    These modifiers are allowed for the following load/store instructions:

      LDC       Load from constant buffer

      LOAD      Global load
      STORE     Global store

      LOADIM    Image load (via EXT_shader_image_load_store)
      STOREIM   Image store (via EXT_shader_image_load_store)

      LDB       Load from storage buffer (via
                  NV_shader_storage_buffer_object)
      STB       Store to storage buffer (via
                  NV_shader_storage_buffer_object)

      LDS       Load from shared memory (via NV_compute_program5)
      STS       Store to shared memory (via NV_compute_program5)

    For assembly programs prior to this extension, it was necessary to access
    memory using packed types and then unpack with additional shader
    instructions.
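
    As an illustration of the extra unpacking work described above, the C
    sketch below contrasts unpacking a packed 32-bit word with a direct
    four-component byte load of the kind U8X4 provides.  It is not part of
    the specification: the helper names are hypothetical, and it assumes
    component 0 occupies the least significant byte of the packed word.

```c
#include <stdint.h>

/* Hypothetical helper: the pre-extension approach of loading one packed
 * 32-bit word and extracting four unsigned 8-bit components.
 * Assumption: component 0 is stored in the least significant byte. */
static void unpack_u8x4(uint32_t packed, uint32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (packed >> (8 * i)) & 0xFFu;
    }
}

/* Same extraction, but sign-extending each byte (S8X4-style). */
static void unpack_s8x4(uint32_t packed, int32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        int32_t byte = (int32_t)((packed >> (8 * i)) & 0xFFu);
        out[i] = (byte ^ 0x80) - 0x80; /* sign-extend from bit 7 */
    }
}

/* Hypothetical model of a direct U8X4 load: read four consecutive bytes. */
static void load_u8x4(const uint8_t *address, uint32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = address[i];
    }
}
```

    With the new storage modifiers, the separate shift-and-mask steps in the
    first two helpers are no longer needed in the assembly program.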

    Similar capabilities have already been provided in the OpenGL Shading
    Language (GLSL) via the NV_gpu_shader5 extension, using the extended data
    types provided there (e.g., "float16_t", "u8vec4", "i16vec2").

New Procedures and Functions

    None.

New Tokens

    None.

Additions to Chapter 2 of the OpenGL 2.0 Specification (OpenGL Operation)

    (All modifications are relative to Section 2.X, GPU Programs, from the
    NV_gpu_program4 specification.)

    Modify Section 2.X.2, Program Grammar

    (add after the long list of grammar rules) If a program specifies the
    NV_gpu_program5_mem_extended program option, the following rules are added
    to the NV_gpu_program5 base program grammar:

      <opModifier>            ::= "F16X2"
                                | "F16X4"
                                | "S8X2"
                                | "S8X4"
                                | "S16X2"
                                | "S16X4"
                                | "U8X2"
                                | "U8X4"
                                | "U16X2"
                                | "U16X4"

    (Note:  This extension also provides new capabilities for the "F16"
    modifier.  Since it was already supported in NV_gpu_program5, it isn't
    being added to the grammar here.)
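
    The hypothetical C helper below sketches how an assembler might decide
    whether a storage modifier used on a load or store depends on the
    NV_gpu_program5_mem_extended option.  It simply mirrors the list given
    in Section 2.X.6, including "F16", whose use on loads and stores is the
    new capability mentioned in the note above.  The table and function
    names are illustrative assumptions, not part of any driver API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: storage modifiers that the
 * NV_gpu_program5_mem_extended option enables on load/store
 * instructions (LDC, LOAD, STORE, LOADIM, STOREIM, LDB, STB, LDS, STS). */
static const char *const kMemExtendedModifiers[] = {
    "F16",  "F16X2", "F16X4",
    "S8X2", "S8X4",  "S16X2", "S16X4",
    "U8X2", "U8X4",  "U16X2", "U16X4",
};

/* Returns 1 if a load/store using `modifier` requires the option,
 * 0 otherwise (for example, modifiers already accepted by
 * NV_gpu_program5 such as "U8" or "F32X4"). */
static int requires_mem_extended(const char *modifier)
{
    size_t n = sizeof kMemExtendedModifiers / sizeof kMemExtendedModifiers[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(modifier, kMemExtendedModifiers[i]) == 0) {
            return 1;
        }
    }
    return 0;
}
```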


    Modify Section 2.X.4.1, Program Instruction Modifiers

    (add to Table X.14 of the NV_gpu_program4 specification.)

      Modifier  Description
      --------  ---------------------------------------------------
      F16       Convert to or from one 16-bit floating-point value,
                  or access one 16-bit floating-point value

      F16X2     Access two 16-bit floating-point values
      F16X4     Access four 16-bit floating-point values
      S8X2      Access two 8-bit signed integer values
      S8X4      Access four 8-bit signed integer values
      S16X2     Access two 16-bit signed integer values
      S16X4     Access four 16-bit signed integer values
      U8X2      Access two 8-bit unsigned integer values
      U8X4      Access four 8-bit unsigned integer values
      U16X2     Access two 16-bit unsigned integer values
      U16X4     Access four 16-bit unsigned integer values

    (modify discussion of storage modifiers for load and store operations,
    adding the entries added to the table above)

    For load and store operations, the "F32", "F32X2", "F32X4", "F64",
    "F64X2", "F64X4", "S8", "S8X2", "S8X4", "S16", "S16X2", "S16X4", "S32",
    "S32X2", "S32X4", "S64", "S64X2", "S64X4", "U8", "U8X2", "U8X4", "U16",
    "U16X2", "U16X4", "U32", "U32X2", "U32X4", "U64", "U64X2", "U64X4", "F16",
    "F16X2", and "F16X4" storage modifiers control how data are loaded from or
    stored to memory.  ...


    Modify Section 2.X.4.5, Program Memory Access, from NV_gpu_program5

    (update pseudocode for BufferMemoryLoad)

      result_t_vec BufferMemoryLoad(char *address, OpModifier modifier)
      {
        result_t_vec result = { 0, 0, 0, 0 };
        switch (modifier) {

        /* Existing cases and code from NV_gpu_program5 unchanged. */

        case F16:
          result.x = ((float16_t *)address)[0];
          break;
        case F16X2:
          result.x = ((float16_t *)address)[0];
          result.y = ((float16_t *)address)[1];
          break;
        case F16X4:
          result.x = ((float16_t *)address)[0];
          result.y = ((float16_t *)address)[1];
          result.z = ((float16_t *)address)[2];
          result.w = ((float16_t *)address)[3];
          break;
        case S8X2:
          result.x = ((int8_t *)address)[0];
          result.y = ((int8_t *)address)[1];
          break;
        case S8X4:
          result.x = ((int8_t *)address)[0];
          result.y = ((int8_t *)address)[1];
          result.z = ((int8_t *)address)[2];
          result.w = ((int8_t *)address)[3];
          break;
        case S16X2:
          result.x = ((int16_t *)address)[0];
          result.y = ((int16_t *)address)[1];
          break;
        case S16X4:
          result.x = ((int16_t *)address)[0];
          result.y = ((int16_t *)address)[1];
          result.z = ((int16_t *)address)[2];
          result.w = ((int16_t *)address)[3];
          break;
        case U8X2:
          result.x = ((uint8_t *)address)[0];
          result.y = ((uint8_t *)address)[1];
          break;
        case U8X4:
          result.x = ((uint8_t *)address)[0];
          result.y = ((uint8_t *)address)[1];
          result.z = ((uint8_t *)address)[2];
          result.w = ((uint8_t *)address)[3];
          break;
        case U16X2:
          result.x = ((uint16_t *)address)[0];
          result.y = ((uint16_t *)address)[1];
          break;
        case U16X4:
          result.x = ((uint16_t *)address)[0];
          result.y = ((uint16_t *)address)[1];
          result.z = ((uint16_t *)address)[2];
          result.w = ((uint16_t *)address)[3];
          break;
        }
        return result;
      }

    (update pseudocode for BufferMemoryStore)

      void BufferMemoryStore(char *address, operand_t_vec operand,
                             OpModifier modifier)
      {
        switch (modifier) {

        /* Existing cases and code from NV_gpu_program5 unchanged. */

        case F16:
          ((float16_t *)address)[0] = operand.x;
          break;
        case F16X2:
          ((float16_t *)address)[0] = operand.x;
          ((float16_t *)address)[1] = operand.y;
          break;
        case F16X4:
          ((float16_t *)address)[0] = operand.x;
          ((float16_t *)address)[1] = operand.y;
          ((float16_t *)address)[2] = operand.z;
          ((float16_t *)address)[3] = operand.w;
          break;
        case S8X2:
          ((int8_t *)address)[0] = operand.x;
          ((int8_t *)address)[1] = operand.y;
          break;
        case S8X4:
          ((int8_t *)address)[0] = operand.x;
          ((int8_t *)address)[1] = operand.y;
          ((int8_t *)address)[2] = operand.z;
          ((int8_t *)address)[3] = operand.w;
          break;
        case S16X2:
          ((int16_t *)address)[0] = operand.x;
          ((int16_t *)address)[1] = operand.y;
          break;
        case S16X4:
          ((int16_t *)address)[0] = operand.x;
          ((int16_t *)address)[1] = operand.y;
          ((int16_t *)address)[2] = operand.z;
          ((int16_t *)address)[3] = operand.w;
          break;
        case U8X2:
          ((uint8_t *)address)[0] = operand.x;
          ((uint8_t *)address)[1] = operand.y;
          break;
        case U8X4:
          ((uint8_t *)address)[0] = operand.x;
          ((uint8_t *)address)[1] = operand.y;
          ((uint8_t *)address)[2] = operand.z;
          ((uint8_t *)address)[3] = operand.w;
          break;
        case U16X2:
          ((uint16_t *)address)[0] = operand.x;
          ((uint16_t *)address)[1] = operand.y;
          break;
        case U16X4:
          ((uint16_t *)address)[0] = operand.x;
          ((uint16_t *)address)[1] = operand.y;
          ((uint16_t *)address)[2] = operand.z;
          ((uint16_t *)address)[3] = operand.w;
          break;
        }
      }

    (modify paragraph to indicate the alignment requirement for new storage
    modifiers)  The address used for global memory loads or stores, or the
    offset used for constant buffer loads, must be aligned to the fetch size
    corresponding to the storage opcode modifier.  For S8 and U8, the offset
    has no alignment requirements.  For F16, S8X2, S16, U8X2, and U16, the
    offset must be a multiple of two basic machine units.  For F32, S32, U32,
    F16X2, S16X2, U16X2, S8X4, and U8X4, the offset must be a multiple of
    four.  For F32X2, F64, S32X2, S64, U32X2, U64, F16X4, S16X4, and U16X4,
    the offset must be a multiple of eight.  ...  If an offset is not
    correctly aligned, the values returned by a buffer memory load will be
    undefined, and the effects of a buffer memory store will also be
    undefined.
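
    The alignment rule above amounts to requiring that the offset be a
    multiple of the fetch size, i.e., the size of one component times the
    number of components.  The C sketch below encodes only the modifiers
    named explicitly in that paragraph; the elided cases are deliberately
    left out.  The struct, table, and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of a storage modifier. */
typedef struct {
    const char *name;
    size_t component_bytes; /* size of one component in basic machine units */
    size_t components;      /* number of components accessed */
} StorageModifier;

/* Only the modifiers listed explicitly in the alignment paragraph. */
static const StorageModifier kModifiers[] = {
    {"S8", 1, 1},    {"U8", 1, 1},
    {"F16", 2, 1},   {"S16", 2, 1},   {"U16", 2, 1},
    {"S8X2", 1, 2},  {"U8X2", 1, 2},
    {"F32", 4, 1},   {"S32", 4, 1},   {"U32", 4, 1},
    {"F16X2", 2, 2}, {"S16X2", 2, 2}, {"U16X2", 2, 2},
    {"S8X4", 1, 4},  {"U8X4", 1, 4},
    {"F32X2", 4, 2}, {"S32X2", 4, 2}, {"U32X2", 4, 2},
    {"F64", 8, 1},   {"S64", 8, 1},   {"U64", 8, 1},
    {"F16X4", 2, 4}, {"S16X4", 2, 4}, {"U16X4", 2, 4},
};

/* Returns the required alignment (fetch size) in basic machine units,
 * or 0 if the modifier is not in the table above. */
static size_t required_alignment(const char *name)
{
    size_t n = sizeof kModifiers / sizeof kModifiers[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, kModifiers[i].name) == 0) {
            return kModifiers[i].component_bytes * kModifiers[i].components;
        }
    }
    return 0;
}

/* Returns 1 if `offset` satisfies the alignment rule for `name`;
 * unknown modifiers are reported as not aligned. */
static int is_aligned(const char *name, uint64_t offset)
{
    size_t align = required_alignment(name);
    return align != 0 && offset % align == 0;
}
```

    For example, a U16X2 access at offset 6 is misaligned (the fetch size is
    four), so per the text above its result would be undefined.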


    Modify Section 2.X.6, Program Options

    + Extended Memory Format Support (NV_gpu_program5_mem_extended)

    If a program specifies the "NV_gpu_program5_mem_extended" option, it may
    use the "F16", "F16X2", "F16X4", "S8X2", "S8X4", "S16X2", "S16X4", "U8X2",
    "U8X4", "U16X2", and "U16X4" storage modifiers on instructions loading
    values from memory or storing values to memory (LDC, LOAD, STORE, LOADIM,
    STOREIM, LDB, STB, LDS, STS).


Additions to Chapter 3 of the OpenGL 2.0 Specification (Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 2.0 Specification (Per-Fragment
Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 2.0 Specification (Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 2.0 Specification (State and
State Requests)

    None.

Additions to Appendix A of the OpenGL 2.0 Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Dependencies on EXT_shader_image_load_store, NV_shader_storage_buffer_object,
and NV_compute_program5

    If EXT_shader_image_load_store is not supported, references to the LOADIM
    and STOREIM opcodes should be removed.

    If NV_shader_storage_buffer_object is not supported, references to the LDB
    and STB opcodes should be removed.

    If NV_compute_program5 is not supported, references to the LDS and STS
    opcodes should be removed.

Errors

    None.

New State

    None.

New Implementation Dependent State

    None.

Issues

    (1) Should this extension have its own extension string entry, or should
        its existence be inferred from the NV_gpu_program5 extension or some
        other extension?

      RESOLVED:  Provide a separate extension string entry, since this
      functionality was added after NV_gpu_program5 was published and may not
      be available on older drivers supporting NV_gpu_program5.

Revision History

    Revision 1, October 30, 2012 (pbrown):  Initial revision.