Name

    NV_parameter_buffer_object2

Name Strings

    GL_NV_parameter_buffer_object2

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping (July 2009, Release 190)

Version

    Last Modified Date: 09/09/09
    NVIDIA Revision: 2

Number

    378

Dependencies

    OpenGL 2.0 is required.

    NV_gpu_program4 is required.

    NV_parameter_buffer_object is required.

    This extension is written against the NV_gpu_program4 specification.

    NV_shader_buffer_load trivially affects the definition of this extension.

Overview

    This extension builds on the NV_parameter_buffer_object extension to
    provide additional flexibility in sourcing data from buffer objects.

    The original NV_parameter_buffer_object (PaBO) extension provided the
    ability to bind buffer objects to a set of numbered binding points and
    access them in assembly programs as though they were arrays of 32-bit
    scalars (via the BUFFER variable type) or arrays of four-component vectors
    with 32-bit scalar components (via the BUFFER4 variable type).  However,
    the functionality it provided had some significant limits on flexibility.
    Since any given buffer binding point could be used either as a BUFFER or
    BUFFER4, but not both, programs couldn't do both 32- and 128-bit fetches
    from a single binding point.  Additionally, no support was provided for
    8-, 16-, or 64-bit fetches, though they could be emulated using larger
    loads, with bitfield operations and/or write masking to put components in
    the right places.  Indexing was supported, but strides were limited to
    4- and 16-byte multiples, depending on whether BUFFER or BUFFER4 was used.

    This new extension provides the buffer variable declaration type CBUFFER
    to specify a buffer that is treated as an array of bytes, rather than an
    array of words or vectors.  The LDC instruction allows programs to extract
    a vector of data from a CBUFFER variable, using a size and component count
    specified in the opcode modifier.  1-, 2-, and 4-component fetches are
    supported.
    The LDC instruction supports byte offsets using normal array
    indexing mechanisms; both run-time and immediate offsets are supported.
    Offsets used for a buffer object fetch are required to be aligned to the
    size of the fetch (1, 2, 4, 8, or 16 bytes).

New Procedures and Functions

    None.

New Tokens

    None.

Additions to Chapter 2 of the OpenGL 3.0 Specification (OpenGL Operation)

    (All modifications are relative to Section 2.X, GPU Programs, from the
    NV_gpu_program4 specification.)

    Modify Section 2.X.2, Program Grammar

    (add after the long list of grammar rules) If a program specifies the
    NV_parameter_buffer_object2 program option, the following rules are added
    to the NV_gpu_program4 base program grammar:

      <VECTORop>              ::= "LDC"

      <opModifier>            ::= "F32"
                                | "F32X2"
                                | "F32X4"
                                | "S8"
                                | "S16"
                                | "S32"
                                | "S32X2"
                                | "S32X4"
                                | "U8"
                                | "U16"
                                | "U32"
                                | "U32X2"
                                | "U32X4"

      <bufferDeclType>        ::= "CBUFFER"
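
    As an illustration only, each <opModifier> token added above decomposes
    into the three parts described in Section 2.X.4.5: a base data type, a
    component size, and a component count.  The C sketch below decodes a
    token into those parts.  The StorageModifier type and the
    parse_storage_modifier() helper are hypothetical names, not part of any
    GL or assembler API.

```c
#include <string.h>

/* Hypothetical decoded form of an LDC/LOAD storage modifier. */
typedef struct {
    char base_type; /* 'F', 'S', or 'U' */
    int bits;       /* component size in bits: 8, 16, or 32 */
    int count;      /* components fetched: 1, 2, or 4 */
} StorageModifier;

/* Decodes one of the thirteen storage modifier tokens from the grammar.
   Returns 1 and fills *out on success, or 0 if the token is not valid. */
static int parse_storage_modifier(const char *tok, StorageModifier *out)
{
    static const char *const valid[] = {
        "F32", "F32X2", "F32X4",
        "S8",  "S16",   "S32",   "S32X2", "S32X4",
        "U8",  "U16",   "U32",   "U32X2", "U32X4",
    };
    for (size_t i = 0; i < sizeof valid / sizeof valid[0]; i++) {
        if (strcmp(tok, valid[i]) != 0)
            continue;
        const char *x = strchr(tok, 'X');   /* "X2"/"X4" component count */
        out->base_type = tok[0];
        /* the character after the base type starts the size: "8", "16", "32" */
        out->bits = (tok[1] == '8') ? 8 : (tok[1] == '1') ? 16 : 32;
        out->count = x ? x[1] - '0' : 1;
        return 1;
    }
    return 0;
}
```

    For example, "S32X2" decodes to base type 'S', 32-bit components, and a
    count of two.  A token outside the grammar, such as "F16", is rejected.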


    Modify Section 2.X.3.6, Program Parameter Buffers

    (modify the paragraph describing the different types of parameter buffer
    variable declarations to include support for "CBUFFER".)

    Program parameter buffer variables are treated as an array of
    single-component words if the <bufferDeclType> grammar rule matches
    "BUFFER" or as an array of four-component vectors if it matches "BUFFER4".
    Program parameter buffers may also be declared as an array of basic
    machine units from which data can be extracted using the LDC (load
    constant) instruction, if <bufferDeclType> matches "CBUFFER".  Parameter
    buffer variables declared using "CBUFFER" may not be used as an operand in
    any instruction other than LDC, while "BUFFER" and "BUFFER4" variables may
    not be used with LDC.  A program will fail to load if a variable declared
    as "BUFFER" and another variable declared as "BUFFER4" use the same buffer
    binding point.  There is no limitation on the use of "CBUFFER" variables
    in conjunction with "BUFFER" or "BUFFER4" variables using the same buffer
    binding point.
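
    The binding-point rule in the paragraph above amounts to a simple
    load-time check.  The C sketch below assumes the declarations are
    available as a list of (type, binding point) pairs.  BufferDecl and
    buffer_decls_compatible() are hypothetical names.  The check rejects only
    a BUFFER/BUFFER4 pair that shares a binding point; CBUFFER declarations
    never conflict.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a parameter buffer variable declaration. */
typedef enum { DECL_BUFFER, DECL_BUFFER4, DECL_CBUFFER } BufferDeclType;

typedef struct {
    BufferDeclType type;
    unsigned binding; /* binding point <a> from program.buffer[a] */
} BufferDecl;

/* Returns false if any binding point is shared by a "BUFFER" and a
   "BUFFER4" variable; "CBUFFER" variables never cause a conflict. */
static bool buffer_decls_compatible(const BufferDecl *decls, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (decls[i].binding != decls[j].binding)
                continue;
            BufferDeclType a = decls[i].type, b = decls[j].type;
            if ((a == DECL_BUFFER && b == DECL_BUFFER4) ||
                (a == DECL_BUFFER4 && b == DECL_BUFFER))
                return false;
        }
    }
    return true;
}
```

    For example, BUFFER and CBUFFER declarations that share binding point 0
    pass the check.  BUFFER and BUFFER4 declarations that share binding
    point 2 fail it.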

    (modify/restructure the paragraph describing basic program parameter
    bindings to handle the byte bindings provided by "CBUFFER" variables)

    If a program parameter buffer binding matches "program.buffer[a][b]", the
    program parameter variable corresponds to element <b> of the buffer object
    bound to binding point <a>.  Each element of the bound buffer object is
    treated as:

      * a single basic machine unit of data, if the variable is declared using
        "CBUFFER";

      * a single word of data that can hold an integer or floating-point
        value, if the variable is declared as "BUFFER"; or

      * four words of data that can hold integer or floating-point values, if
        the variable is declared as "BUFFER4".

    When a binding corresponding to a "BUFFER" variable is used as an operand,
    the selected word is broadcast to all four components of the variable.
    When a binding corresponding to a "BUFFER4" variable is used as an
    operand, the four components of the selected buffer element are loaded
    into the variable.  A binding corresponding to a "CBUFFER" variable may be
    used only in the LDC instruction, and will be used there as a pointer to
    extract operand values from buffer memory.
    If no buffer object is bound to binding point <a>, or the bound buffer
    object is not large enough to hold element <b>, the values used are
    undefined.  The binding point <a> must be a nonnegative integer constant.


    Modify Section 2.X.4, Program Execution Environment

    (Add to the set of opcodes in Table X.13)

                  Modifiers
      Instruction F I C S H D  Out  Inputs    Description
      ----------- - - - - - -  ---  --------  --------------------------------
      LDC         X X X X - F  v    v         load from constant buffer


    Modify Section 2.X.4.1, Program Instruction Modifiers

    (Add to Table X.14, Instruction Modifiers, and to the corresponding
    description following the table)

      Modifier  Description
      --------  -----------------------------------------------
      F32       Access one 32-bit floating-point value
      F32X2     Access two 32-bit floating-point values
      F32X4     Access four 32-bit floating-point values
      S8        Access one 8-bit signed integer value
      S16       Access one 16-bit signed integer value
      S32       Access one 32-bit signed integer value
      S32X2     Access two 32-bit signed integer values
      S32X4     Access four 32-bit signed integer values
      U8        Access one 8-bit unsigned integer value
      U16       Access one 16-bit unsigned integer value
      U32       Access one 32-bit unsigned integer value
      U32X2     Access two 32-bit unsigned integer values
      U32X4     Access four 32-bit unsigned integer values

    For memory load operations, the "F32", "F32X2", "F32X4", "S8", "S16",
    "S32", "S32X2", "S32X4", "U8", "U16", "U32", "U32X2", and "U32X4" storage
    modifiers control how data are loaded from memory.  Storage modifiers are
    supported by the LDC and LOAD instructions and are covered in more detail
    in the descriptions of these instructions.  These instructions must
    specify exactly one of these modifiers, and may not specify any of the
    base data type modifiers (F, U, S) described above.  The base data type of
    the result vector of a LOAD or LDC instruction is trivially derived from
    the storage modifier.


    Add New Section 2.X.4.5, Program Memory Access

    Programs may load from buffer object memory via the LDC (load constant)
    and LOAD (global load) instructions.

    Load instructions read 8, 16, 32, 64, or 128 bits of data from a source
    address to produce a four-component vector, according to the storage
    modifier specified with the instruction.
    The storage modifier has three parts:

      - a base data type, "F", "S", or "U", specifying that the instruction
        fetches floating-point, signed integer, or unsigned integer values,
        respectively;

      - a component size, specifying that the components fetched by the
        instruction have 8, 16, or 32 bits; and

      - an optional component count, where "X2" and "X4" indicate that two or
        four components are fetched, and no count indicates a single
        component fetch.

    When the storage modifier specifies that fewer than four components should
    be fetched, remaining components are filled with zeroes.  When performing
    a global load (LOAD), the GPU address is specified as an instruction
    operand.  When performing a constant buffer load (LDC), the GPU address is
    derived by adding the base address of the bound buffer object to an offset
    specified as an instruction operand.
    Given a GPU address <address> and a storage modifier <modifier>, the
    memory load can be described by the following code:

      result_t_vec BufferMemoryLoad(char *address, OpModifier modifier)
      {
        result_t_vec result = { 0, 0, 0, 0 };
        switch (modifier) {
        case F32:
          result.x = ((float32_t *)address)[0];
          break;
        case F32X2:
          result.x = ((float32_t *)address)[0];
          result.y = ((float32_t *)address)[1];
          break;
        case F32X4:
          result.x = ((float32_t *)address)[0];
          result.y = ((float32_t *)address)[1];
          result.z = ((float32_t *)address)[2];
          result.w = ((float32_t *)address)[3];
          break;
        case S8:
          result.x = ((int8_t *)address)[0];
          break;
        case S16:
          result.x = ((int16_t *)address)[0];
          break;
        case S32:
          result.x = ((int32_t *)address)[0];
          break;
        case S32X2:
          result.x = ((int32_t *)address)[0];
          result.y = ((int32_t *)address)[1];
          break;
        case S32X4:
          result.x = ((int32_t *)address)[0];
          result.y = ((int32_t *)address)[1];
          result.z = ((int32_t *)address)[2];
          result.w = ((int32_t *)address)[3];
          break;
        case U8:
          result.x = ((uint8_t *)address)[0];
          break;
        case U16:
          result.x = ((uint16_t *)address)[0];
          break;
        case U32:
          result.x = ((uint32_t *)address)[0];
          break;
        case U32X2:
          result.x = ((uint32_t *)address)[0];
          result.y = ((uint32_t *)address)[1];
          break;
        case U32X4:
          result.x = ((uint32_t *)address)[0];
          result.y = ((uint32_t *)address)[1];
          result.z = ((uint32_t *)address)[2];
          result.w = ((uint32_t *)address)[3];
          break;
        }
        return result;
      }

    The offset used for constant buffer loads must be aligned to the fetch
    size corresponding to the storage modifier.  For S8 and U8, the offset
    has no alignment requirements.  For S16 and U16, the offset must be a
    multiple of two basic machine units.  For F32, S32, and U32, the offset
    must be a multiple of four.  For F32X2, S32X2, and U32X2, the offset must
    be a multiple of eight.  For F32X4, S32X4, and U32X4, the offset must be a
    multiple of sixteen.  If an offset is not correctly aligned, the values
    returned by a constant buffer load will be undefined.
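
    The alignment rule maps each storage modifier to a fetch size of 1, 2, 4,
    8, or 16 bytes and requires the offset to be a multiple of that size.
    The following C sketch restates the rule.  fetch_size() and
    ldc_offset_aligned() are hypothetical helpers.  They only detect a
    misaligned offset: the values such a fetch returns are undefined, and no
    error is generated.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Fetch size in basic machine units (bytes) for a storage modifier token,
   or 0 if the token is not one of the modifiers in Table X.14. */
static uint32_t fetch_size(const char *modifier)
{
    static const struct { const char *name; uint32_t size; } table[] = {
        {"S8", 1},     {"U8", 1},
        {"S16", 2},    {"U16", 2},
        {"F32", 4},    {"S32", 4},    {"U32", 4},
        {"F32X2", 8},  {"S32X2", 8},  {"U32X2", 8},
        {"F32X4", 16}, {"S32X4", 16}, {"U32X4", 16},
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(modifier, table[i].name) == 0)
            return table[i].size;
    return 0;
}

/* True if an LDC byte offset meets the alignment rule for `modifier`.
   A misaligned fetch returns undefined values; this only reports it. */
static bool ldc_offset_aligned(const char *modifier, uint32_t offset)
{
    uint32_t size = fetch_size(modifier);
    return size != 0 && offset % size == 0;
}
```

    For example, an offset of 40 is valid for "U32X2" (a multiple of 8) but
    not for "F32X4" (not a multiple of 16).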


    Modify Section 2.X.6, Program Options

    + Extended Parameter Buffer Object Support (NV_parameter_buffer_object2)

    If a program specifies the "NV_parameter_buffer_object2" option, it may
    use the CBUFFER statement to declare program parameter buffer variables
    and the LDC instruction to load data from parameter buffer variables using
    byte offsets.


    Modify Section 2.X.8, Program Instruction Set

    Section 2.X.8.Z, LDC:  Load from Constant Buffer

    The LDC instruction loads a vector operand from a buffer object to yield a
    result vector.  The operand used for the LDC instruction must correspond
    to a parameter buffer variable declared using the "CBUFFER" statement; a
    program will fail to load if any other type of operand is used in an LDC
    instruction.

      result = BufferMemoryLoad(&op0, storageModifier);

    A base operand vector is fetched from memory as described in Section
    2.X.4.5, with the GPU address derived from the binding corresponding to
    the operand.  A final operand vector is derived from the base operand
    vector by applying swizzle, negation, and absolute value operand modifiers
    as described in Section 2.X.4.2.
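
    To make the address computation concrete, the following C sketch
    emulates an LDC fetch on the host.  A byte array stands in for the bound
    buffer object, and the GPU address is modeled as that array's base plus
    a byte offset.  BufferMemoryLoad() is re-expressed with memcpy() to avoid
    unaligned or type-punned pointer reads, and a swizzle is then applied to
    the base operand vector.  The Vec4 union, emulate_ldc(), and swizzle()
    are hypothetical names.  The sketch assumes host byte order and only the
    modifier combinations allowed by the grammar.  It omits negation and
    absolute value, and it does not model the undefined results of
    misaligned or out-of-range fetches.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical four-component result vector; the meaningful member depends
   on the base data type of the storage modifier. */
typedef union {
    float f[4];
    int32_t s[4];
    uint32_t u[4];
} Vec4;

/* Emulates BufferMemoryLoad() at address buffer_base + offset.
   type is 'F', 'S', or 'U'; bits is 8, 16, or 32; count is 1, 2, or 4.
   Callers must pass a combination allowed by the grammar (no 'F' with
   8 or 16 bits). memcpy() avoids unaligned and type-punned pointer reads. */
static Vec4 emulate_ldc(const unsigned char *buffer_base, uint32_t offset,
                        char type, unsigned bits, unsigned count)
{
    Vec4 r;
    memset(&r, 0, sizeof r);             /* unfetched components are zero */
    const unsigned char *addr = buffer_base + offset;
    for (unsigned c = 0; c < count; c++) {
        const unsigned char *p = addr + c * (bits / 8);
        if (bits == 8 && type == 'S') {
            int8_t v; memcpy(&v, p, 1); r.s[c] = v;    /* sign-extend */
        } else if (bits == 8) {
            uint8_t v; memcpy(&v, p, 1); r.u[c] = v;   /* zero-extend */
        } else if (bits == 16 && type == 'S') {
            int16_t v; memcpy(&v, p, 2); r.s[c] = v;
        } else if (bits == 16) {
            uint16_t v; memcpy(&v, p, 2); r.u[c] = v;
        } else {
            memcpy(&r.u[c], p, 4);       /* 32-bit F, S, U: copy raw bits */
        }
    }
    return r;
}

/* Applies an operand swizzle; sw holds four characters from x, y, z, w. */
static Vec4 swizzle(Vec4 v, const char *sw)
{
    Vec4 r;
    for (int i = 0; i < 4; i++) {
        int src = (sw[i] == 'x') ? 0 : (sw[i] == 'y') ? 1
                : (sw[i] == 'z') ? 2 : 3;
        r.u[i] = v.u[src];
    }
    return r;
}
```

    Applying the "xxxx" swizzle to a single-component F32 fetch smears the
    fetched value into all four components.  Issue (10) relies on this
    behavior.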

    The amount of memory in any given buffer object binding accessible by the
    LDC instruction may be limited.  If any component fetched by the LDC
    instruction extends 4*<n> or more basic machine units from the beginning
    of the buffer object binding, where <n> is the implementation-dependent
    constant MAX_PROGRAM_PARAMETER_BUFFER_SIZE_NV, the value fetched for that
    component will be undefined.

    LDC supports no base data type modifiers, but requires exactly one storage
    modifier.  The base data types of the operand and result vectors are
    derived from the storage modifier.


Additions to Chapter 3 of the OpenGL 3.0 Specification (Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 3.0 Specification (Per-Fragment
Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 3.0 Specification (Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 3.0 Specification (State and
State Requests)

    None.

Additions to Appendix A of the OpenGL 3.0 Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Errors

    No new errors.

Dependencies on NV_shader_buffer_load

    If NV_shader_buffer_load (or equivalent functionality) is not supported,
    references to the "LOAD" opcode in the description of the opcode modifiers
    for "LDC" should be removed.

New State

    None.

New Implementation Dependent State

    None.

Issues

    (1) What sort of alignment requirements, if any, should be imposed on the
        operand provided to the LDC instruction?

      RESOLVED:  The offset of the operand must be aligned according to the
      size of the fetch.  For 1-, 2-, and 4-component fetches, the offset must
      be a multiple of <N>, 2*<N>, and 4*<N>, respectively, where <N> is the
      size in bytes of the components being fetched.

    (2) NV_parameter_buffer_object provides an implementation-dependent limit
        on the portion of a buffer object that may be fetched via BUFFER and
        BUFFER4 variables.  Should the same limits apply to the LDC
        instruction?

      RESOLVED:  Yes.
      On currently shipping NVIDIA GPUs, the maximum program parameter
      buffer size is 16384 32-bit words (64KB).  Buffers larger than 64KB may
      be used, but any fetches accessing memory beyond the first 64KB of a
      buffer binding will return undefined values.

    (3) Should we support fetches of 3-component vectors?  If so, what should
        be the minimum alignment for the specified offset?

      RESOLVED:  No, we'll leave 3-component vectors out of this extension.
      This limitation can be worked around either by doing three separate
      single-component fetches or by doing a four-component fetch with an
      appropriate write mask.  The former approach supports indexing in a
      tightly packed array of 3-component vectors; the latter would require
      that array elements be padded to four components.

    (4) Should we support fetches of 8- and 16-bit components?

      RESOLVED:  Yes, we will support fetches of 8- and 16-bit signed and
      unsigned integers.

      Fetches of vectors of 8- and 16-bit integers are not supported but may
      be emulated by performing shift/mask operations on the results of 32-bit
      fetches.

      Fetches of 16-bit floating-point values, or floating-point vectors
      thereof, are not supported.
      A single fp16 fetch may be emulated using a 16-bit unsigned integer
      fetch and the UP2H instruction to convert the 16 LSBs of the fetch to a
      floating-point value.  The encoding of 16-bit floating-point values is
      described in Section 2.1.2 of the OpenGL 3.0 specification.

    (5) Should we support fetches of 64-bit components?

      RESOLVED:  No; the instruction set provided by NV_gpu_program4 does not
      support 64-bit components anywhere.  If future instructions support
      64-bit components, this restriction should be removed.

    (6) How should the operands of the LDC instruction be specified?

      RESOLVED:  We will create a new type of buffer variable ("CBUFFER"),
      which defines an array of bytes to be fetched from.  The type of fetch
      to perform is specified by a storage modifier (as in
      NV_shader_buffer_load).  An offset relative to the buffer binding (in
      bytes) may be specified using normal array indexing syntax, and an index
      computed at run-time is supported.
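
      The index arithmetic behind this choice can be sketched as follows.
      This hypothetical C function only restates the arithmetic discussed for
      this issue.  An index into a CBUFFER variable is a byte offset, while
      the same index into a BUFFER or BUFFER4 variable selects a 4-byte word
      or a 16-byte vector.

```c
#include <stdint.h>

/* Hypothetical declaration kinds for a parameter buffer variable. */
typedef enum { DECL_BUFFER, DECL_BUFFER4, DECL_CBUFFER } BufferDeclType;

/* Byte offset, relative to the binding, selected by the operand buffer[index]. */
static uint32_t index_to_byte_offset(BufferDeclType type, uint32_t index)
{
    switch (type) {
    case DECL_CBUFFER: return index;        /* one basic machine unit per element */
    case DECL_BUFFER:  return index * 4u;   /* one 32-bit word per element */
    case DECL_BUFFER4: return index * 16u;  /* four 32-bit words per element */
    }
    return 0;                               /* not reached for valid enum values */
}
```

      With an index of 12, the three declaration types select byte offsets
      12, 48, and 192, respectively.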

      Some examples:

        CBUFFER buffer[] = { program.buffer[0] };
        TEMP i;
        MOV.S i, 32;                      # computed offset of 32 bytes
        LDC.F32 result, buffer[12];       # (x,0,0,0) from bytes 12..15
        LDC.F32X4 result, buffer[16];     # (x,y,z,w) from bytes 16..31
        LDC.U8 result, buffer[i.x+3];     # (x,0,0,0) from byte 35
        LDC.S32 result, buffer[i.x+12];   # (x,0,0,0) from bytes 44..47
        LDC.U32X2 result, buffer[i.x+8];  # (x,y,0,0) from bytes 40..47
        LDC.S16 result, buffer[i.x+2];    # (x,0,0,0) from bytes 34..35

      We chose to provide the new buffer variable type (CBUFFER) rather than
      reusing BUFFER or BUFFER4.  For CBUFFER variables, "buffer[12]"
      unambiguously specifies a 12-byte offset.  For BUFFER or BUFFER4
      variables, an operand of "buffer[12]" already has an existing meaning,
      implying an offset of 12 words or vectors, which would be 48 or 192
      bytes, respectively.  Because we want to be able to fetch 8- and 16-bit
      units, having the index scaled by four or sixteen doesn't make sense.
      We could have had LDC simply ignore the type of binding and always
      interpret an index as a byte offset, but chose the new declaration type
      to avoid confusion.

      We also considered an approach where the buffer and offset were
      specified in separate operands.
      That would be similar to texture instructions, where the coordinates
      and texture are specified separately.  The first operand would have been
      interpreted as an unsigned scalar specifying a byte offset, the second
      operand would have specified a buffer variable binding, and a pointer
      would be obtained by adding the two operands.  This would have looked
      something like:

        BUFFER buffer[] = { program.buffer[0] };
        LDC.S32X2 result, offset.x, buffer;

      We chose not to implement this approach mainly because this syntax would
      require specifying a new type of instruction; the syntax we adopted
      simply reuses existing vector operand and indexing mechanisms.
      Additionally, the syntax in this extension provides immediate offsets
      for "free", which the offset-buffer syntax would not support directly
      without additional new syntax.  For example, to load a structure with a
      pair of two-component vectors using offset-buffer syntax, you would have
      to do something like:

        BUFFER buffer[] = { program.buffer[0] };
        TEMP offset;
        LDC.S32X2 result1, offset.x, buffer;
        ADD.U offset.x, offset.x, 8;      # bump offset to second vector
        LDC.S32X2 result2, offset.x, buffer;

    (7) How should the fetches in the LDC instruction interact with other
        operand modifiers (swizzle, absolute value, negation)?
        With result modifiers (condition codes, saturation)?

      RESOLVED:  These features will be orthogonal.  When any of these
      modifiers is specified, the base data type to which it applies comes
      from the storage modifier of the LDC instruction.

      The LDC instruction is defined to produce a "base operand vector" from a
      memory fetch.  This isn't particularly different from normal operands,
      where a base operand vector is derived from the binding corresponding to
      the operand.  In both cases, the components of this vector are swizzled
      and have optional absolute value and negation operations performed to
      produce a final vector operand.

      If condition code operations or saturation are specified for the result
      vector, these operations are performed using the data type derived from
      the storage modifier.

    (8) What happens if a non-zero base offset is specified for a CBUFFER
        variable?

      RESOLVED:  A subset of the bytes in a buffer object can be specified
      using range syntax like the following:

        CBUFFER buffer[] = { program.buffer[0][16..31] };

      The sub-range need not start at the beginning of the buffer object; in
      the example above, it starts 16 bytes into the buffer.
      When accessing a parameter buffer variable corresponding to such a
      sub-range, an array index is relative to the base of the sub-range, so
      the offset of the sub-range is effectively added to the index used for
      the LDC operand:

        LDC.F32 result, buffer[12];       # (x,0,0,0) from bytes 28..31

    (9) What happens if a non-array CBUFFER variable is used?

      RESOLVED:  A non-array variable may be used with LDC.  However, array
      indexing isn't supported with non-array variables, so all LDC loads
      using that variable will fetch using the same base address.

        CBUFFER bufferElement = program.buffer[0][32];
        LDC.U8 result, bufferElement;     # (x,0,0,0) from byte 32
        LDC.S16 result, bufferElement;    # (x,0,0,0) from bytes 32..33
        LDC.F32 result, bufferElement;    # (x,0,0,0) from bytes 32..35
        LDC.F32X4 result, bufferElement;  # (x,y,z,w) from bytes 32..47

    (10) Should single-component fetches from LDC smear their results across
         all four components of the result vector, to allow packing multiple
         scalars into a single vector?

      RESOLVED:  No.  However, swizzle suffixes on the operand will provide
      this capability for free.  For example, let's say you wanted to fetch
      four scalars from a buffer and pack the results into a single temporary
      vector.
      The swizzle syntax lets you do this by smearing the fetched value
      (always placed in "x") into the other components:

        CBUFFER buffer[] = { program.buffer[0] };
        LDC.F32 temp.x, buffer[16];
        LDC.F32 temp.y, buffer[28].x;
        LDC.F32 temp.z, buffer[32].x;
        LDC.F32 temp.w, buffer[40].x;


Revision History

    Rev.  Date      Author    Changes
    ----  --------  --------  -----------------------------------------
     1              pbrown    Internal revisions.
     2    09/09/09  mjk       Assigned number.