Name

    NV_shader_buffer_store

Name Strings

    none (implied by GL_NV_gpu_program5 or GL_NV_gpu_shader5)

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified Date:         May 25, 2022
    NVIDIA Revision:            6

Number

    390

Dependencies

    OpenGL 3.0 and GLSL 1.30 are required.

    This extension is written against the OpenGL 3.2 (Compatibility Profile)
    specification, dated July 24, 2009.

    This extension is written against version 1.50.09 of the OpenGL Shading
    Language Specification.

    NV_shader_buffer_load is required.

    NV_gpu_program5 and/or NV_gpu_shader5 is required.

    This extension interacts with EXT_shader_image_load_store.

    This extension interacts with NV_gpu_shader5.

    This extension interacts with NV_gpu_program5.

    This extension interacts with GLSL 4.30, ARB_shader_storage_buffer_object,
    and ARB_compute_shader.

    This extension interacts with OpenGL 4.2.

Overview

    This extension builds upon the mechanisms added by the
    NV_shader_buffer_load extension to allow shaders to perform random-access
    reads to buffer object memory without using dedicated buffer object
    binding points.  Instead, that extension allows an application to make a
    buffer object resident, query a GPU address (pointer) for the buffer
    object, and then use that address as a pointer in shader code.  This
    approach allows shaders to access a large number of buffer objects without
    needing to repeatedly bind buffers to a limited number of
    fixed-functionality binding points.

    This extension lifts the restriction from NV_shader_buffer_load that
    disallows writes.  In particular, the MakeBufferResidentNV function now
    allows READ_WRITE and WRITE_ONLY access modes, and the shading language is
    extended to allow shaders to write through (GPU address) pointers.
    Additionally, the extension provides built-in functions to perform atomic
    memory transactions to buffer object memory.

    As with the shader writes provided by the EXT_shader_image_load_store
    extension, writes to buffer object memory using this extension are weakly
    ordered to allow for parallel or distributed shader execution.  The
    EXT_shader_image_load_store extension provides mechanisms allowing for
    finer control of memory transaction order, and those mechanisms apply
    equally to buffer object stores using this extension.


New Procedures and Functions

    None.

New Tokens

    Accepted by the <barriers> parameter of MemoryBarrierEXT:

        SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV             0x00000010

    Accepted by the <access> parameter of MakeBufferResidentNV:

        READ_WRITE
        WRITE_ONLY


Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)

    Modify Section 2.9, Buffer Objects, p. 46

    (extend the language inserted by NV_shader_buffer_load in its "Append to
    Section 2.9 (p. 45)" to allow READ_WRITE and WRITE_ONLY mappings)

    The data store of a buffer object may be made accessible to the GL
    via shader buffer loads and stores by calling:

        void MakeBufferResidentNV(enum target, enum access);

    <access> may be READ_ONLY, READ_WRITE, or WRITE_ONLY.  If a shader loads
    from a buffer with WRITE_ONLY <access> or stores to a buffer with
    READ_ONLY <access>, the results of that shader operation are undefined and
    may lead to application termination.  <target> may be any of the buffer
    targets accepted by BindBuffer.

    The data store of a buffer object may be made inaccessible to the GL
    via shader buffer loads and stores by calling:

        void MakeBufferNonResidentNV(enum target);


    Modify "Section 2.20.X, Shader Memory Access" introduced by the
    NV_shader_buffer_load specification, to reflect that shaders may store to
    buffer object memory.

    (first paragraph) Shaders may load from or store to buffer object memory
    by dereferencing pointer variables.  ...

    (second paragraph) When a shader dereferences a pointer variable, data are
    read from or written to buffer object memory according to the following
    rules:

    (modify the paragraph after the end of the alignment and stride rules,
    allowing for writes, and also providing rules forbidding reads from
    WRITE_ONLY mappings or vice-versa) If a shader reads or writes to a GPU
    memory address that does not correspond to a buffer object made resident
    by MakeBufferResidentNV, the results of the operation are undefined and
    may result in application termination.  If a shader reads from a buffer
    object made resident with an <access> parameter of WRITE_ONLY, or writes
    to a buffer object made resident with an <access> parameter of READ_ONLY,
    the results of the operation are also undefined and may lead to
    application termination.

    Incorporate the contents of "Section 2.14.X, Shader Memory Access" from
    the EXT_shader_image_load_store specification into the same "Shader Memory
    Access" section, with the following edits.

    (modify first paragraph to reference pointers) Shaders may perform
    random-access reads and writes to texture or buffer object memory using
    pointers or with built-in image load, store, and atomic functions, as
    described in the OpenGL Shading Language Specification.  ...

    (add to list of bits in <barriers> in MemoryBarrierEXT)

    - SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV:  Memory accesses using pointers and
        assembly program global loads, stores, and atomics issued after the
        barrier will reflect data written by shaders prior to the barrier.
        Additionally, memory writes using pointers issued after the barrier
        will not execute until memory accesses (loads, stores, texture
        fetches, vertex fetches, etc.) initiated prior to the barrier complete.

    (modify second paragraph after the list of <barriers> bits) To allow for
    independent shader threads to communicate by reads and writes to a common
    memory address, pointers and image variables in the OpenGL shading
    language may be declared as "coherent".  Buffer object or texture memory
    accessed through such variables may be cached only if...

    (add to the coherency guidelines)

    - Data written using pointers in one rendering pass and read by the shader
      in a later pass need not use coherent variables or memoryBarrier().
      Calling MemoryBarrierEXT() with the SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV
      set in <barriers> between passes is necessary.


Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)

    None.


Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)

    None.


Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
(Special Functions)

    None.


Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
(State and State Requests)

    None.


Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

GLX Protocol

    None.


Additions to the OpenGL Shading Language Specification, Version 1.50 (Revision
09)

    Modify Section 4.3.X, Memory Access Qualifiers, as added by
    EXT_shader_image_load_store

    (modify second paragraph) Memory accesses to image and pointer variables
    declared using the "coherent" storage qualifier are performed coherently
    with similar accesses from other shader threads.  ...

    (modify fourth paragraph) Memory accesses to image and pointer variables
    declared using the "volatile" storage qualifier must treat the underlying
    memory as though it could be read or written at any point during shader
    execution by some source other than the executing thread.  ...

    (modify fifth paragraph) Memory accesses to image and pointer variables
    declared using the "restrict" storage qualifier may be compiled assuming
    that the variable used to perform the memory access is the only way to
    access the underlying memory using the shader stage in question.  ...

    (modify sixth paragraph) Memory accesses to image and pointer variables
    declared using the "const" storage qualifier may only read the underlying
    memory, which is treated as read-only.  ...

    (insert after seventh paragraph)

    In pointer variable declarations, the "coherent", "volatile", "restrict",
    and "const" qualifiers can be positioned anywhere in the declaration, and
    may qualify either a pointer or the underlying data being pointed to,
    depending on their position in the declaration.  Each qualifier to the
    right of the basic data type in a declaration is considered to apply to
    whatever type is found immediately to its left; qualifiers to the left of
    the basic type are considered to apply to that basic type.  To interpret
    the meaning of qualifiers in pointer declarations, it is useful to read
    the declaration from right to left as in the following examples.

      int * * const a;     // a is a constant pointer to a pointer to int
      int * volatile * b;  // b is a pointer to a volatile pointer to int
      int const * * c;     // c is a pointer to a pointer to a constant int
      const int * * d;     // d is like c
      int const * const *  // e is a constant pointer to a constant pointer
        const e;           //   to a constant int

    For pointer types, the "restrict" qualifier can be used to qualify
    pointers, but not non-pointer types being pointed to.

      int * restrict a;    // a is a restricted pointer to int
      int restrict * b;    // b qualifies "int" as restricted - illegal

    (modify eighth paragraph) The "coherent", "volatile", and "restrict"
    storage qualifiers may only be used on image and pointer variables, and
    may not be used on variables of any other type.  ...

    (modify last paragraph) The values of image and pointer variables
    qualified with "coherent", "volatile", "restrict", or "const" may not be
    assigned to function parameters or l-values lacking such qualifiers.

    (add examples for the last paragraph)

      int volatile * var1;
      int * var2;
      int * restrict var3;
      var1 = var2;              // OK, adding "volatile" is allowed
      var2 = var3;              // illegal, stripping "restrict" is not


    Modify Section 5.X, Pointer Operations, as added by NV_shader_buffer_load

    (modify second paragraph, allowing storing through pointers) The pointer
    dereference operator ...  The result of a pointer dereference may be used
    as the left-hand side of an assignment.


    Modify Section 8.Y, Shader Memory Functions, as added by
    EXT_shader_image_load_store

    (modify first paragraph) Shaders of all types may read and write the
    contents of textures and buffer objects using pointers and image
    variables.  ...

    (modify description of memoryBarrier) memoryBarrier() can be used to
    control the ordering of memory transactions issued by a shader thread.
    When called, it will wait on the completion of all memory accesses
    resulting from the use of pointers and image variables prior to calling
    the function.  ...

    (add the following paragraphs to the end of the section)

    If multiple threads need to atomically access shared memory addresses
    using pointers, they may do so using the following built-in functions.
    The following atomic memory access functions allow a shader thread to
    read, modify, and write an address in memory in a manner that guarantees
    that no other shader thread can modify the memory between the read and the
    write.  All of these functions read a single data element from memory,
    compute a new value based on the value read from memory and one or more
    other values passed to the function, and write the result back to the
    same memory address.  The value returned to the caller is always the data
    element originally read from memory.

    Syntax:

      uint atomicAdd(uint *address, uint data);
      int atomicAdd(int *address, int data);
      uint64_t atomicAdd(uint64_t *address, uint64_t data);

      uint atomicMin(uint *address, uint data);
      int atomicMin(int *address, int data);

      uint atomicMax(uint *address, uint data);
      int atomicMax(int *address, int data);

      uint atomicIncWrap(uint *address, uint wrap);

      uint atomicDecWrap(uint *address, uint wrap);

      uint atomicAnd(uint *address, uint data);
      int atomicAnd(int *address, int data);

      uint atomicOr(uint *address, uint data);
      int atomicOr(int *address, int data);

      uint atomicXor(uint *address, uint data);
      int atomicXor(int *address, int data);

      uint atomicExchange(uint *address, uint data);
      int atomicExchange(int *address, int data);
      uint64_t atomicExchange(uint64_t *address, uint64_t data);

      uint atomicCompSwap(uint *address, uint compare, uint data);
      int atomicCompSwap(int *address, int compare, int data);
      uint64_t atomicCompSwap(uint64_t *address, uint64_t compare,
                              uint64_t data);

    Description:

    atomicAdd() computes the new value written to <address> by adding the
    value of <data> to the contents of <address>.  This function supports 32-
    and 64-bit unsigned integer operands, and 32-bit signed integer operands.

    atomicMin() computes the new value written to <address> by taking the
    minimum of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicMax() computes the new value written to <address> by taking the
    maximum of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicIncWrap() computes the new value written to <address> by adding one
    to the contents of <address>, and then forcing the result to zero if and
    only if the incremented value is greater than or equal to <wrap>.  This
    function supports only 32-bit unsigned integer operands.

    atomicDecWrap() computes the new value written to <address> by subtracting
    one from the contents of <address>, and then forcing the result to
    <wrap>-1 if the original value read from <address> was either zero or
    greater than <wrap>.  This function supports only 32-bit unsigned integer
    operands.

    atomicAnd() computes the new value written to <address> by performing a
    bitwise and of the value of <data> and the contents of <address>.
    This function supports 32-bit signed and unsigned integer operands.

    atomicOr() computes the new value written to <address> by performing a
    bitwise or of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicXor() computes the new value written to <address> by performing a
    bitwise exclusive or of the value of <data> and the contents of <address>.
    This function supports 32-bit signed and unsigned integer operands.

    atomicExchange() uses the value of <data> as the value written to
    <address>.  This function supports 32- and 64-bit unsigned integer
    operands and 32-bit signed integer operands.

    atomicCompSwap() compares the value of <compare> and the contents of
    <address>.  If the values are equal, <data> is written to <address>;
    otherwise, the original contents of <address> are preserved.  This
    function supports 32- and 64-bit unsigned integer operands and 32-bit
    signed integer operands.


    Modify Section 9, Shading Language Grammar, p. 105

    !!! TBD:  Add grammar constructs for memory access qualifiers, allowing
        memory access qualifiers before or after the type and the "*"
        characters indicating pointers in a variable declaration.
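
    (informal illustration, not part of the specification language) The
    wrap rules given for atomicIncWrap() and atomicDecWrap() in Section 8.Y
    above can be easier to follow as plain arithmetic.  The sketch below is a
    single-threaded C model of only the "compute a new value" step, written
    directly from the wording of the descriptions.  It is not atomic and
    does not reflect how any implementation performs the operation.  The
    names inc_wrap_new_value, dec_wrap_new_value, and wrap_model_self_check
    are hypothetical helpers introduced for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model: the value that atomicIncWrap() would store.
 * Adds one to the original contents, then forces the result to zero if the
 * incremented value is greater than or equal to wrap.
 * This models the arithmetic only; it is NOT an atomic operation. */
static uint32_t inc_wrap_new_value(uint32_t old, uint32_t wrap)
{
    uint32_t incremented = old + 1u;   /* unsigned add, modulo 2^32 */
    return (incremented >= wrap) ? 0u : incremented;
}

/* Hypothetical reference model: the value that atomicDecWrap() would store.
 * Subtracts one from the original contents, but forces the result to
 * wrap - 1 if the original value was zero or greater than wrap, following
 * the wording of the description in this extension. */
static uint32_t dec_wrap_new_value(uint32_t old, uint32_t wrap)
{
    if (old == 0u || old > wrap)
        return wrap - 1u;
    return old - 1u;
}

/* Walks a counter through one full cycle in each direction with wrap = 4.
 * The built-ins themselves return the ORIGINAL value to the caller; only
 * the stored value is modeled here. */
static void wrap_model_self_check(void)
{
    uint32_t counter = 0u;

    /* Incrementing cycles through 0, 1, 2, 3 and back to 0. */
    counter = inc_wrap_new_value(counter, 4u); assert(counter == 1u);
    counter = inc_wrap_new_value(counter, 4u); assert(counter == 2u);
    counter = inc_wrap_new_value(counter, 4u); assert(counter == 3u);
    counter = inc_wrap_new_value(counter, 4u); assert(counter == 0u);

    /* Decrementing from zero jumps to wrap - 1, then counts down. */
    counter = dec_wrap_new_value(counter, 4u); assert(counter == 3u);
    counter = dec_wrap_new_value(counter, 4u); assert(counter == 2u);
}
```

    With <wrap> set to the number of slots in a ring buffer, a counter updated
    this way stays within the range of valid slot indices, which is one
    plausible use of these functions.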


Dependencies on EXT_shader_image_load_store

    This specification incorporates the memory access ordering and
    synchronization discussion from EXT_shader_image_load_store verbatim.

    If EXT_shader_image_load_store is not supported, this spec should be
    construed to introduce:

      * the shader memory access language from that specification, including
        the MemoryBarrierEXT() command and the tokens accepted by <barriers>
        from that specification;

      * the memoryBarrier() function to the OpenGL shading language
        specification; and

      * the capability and spec language allowing applications to enable early
        depth tests.

Dependencies on NV_gpu_shader5

    This specification requires either NV_gpu_shader5 or NV_gpu_program5.

    If NV_gpu_shader5 is supported, use of the new shading language features
    described in this extension requires

      #extension GL_NV_gpu_shader5 : enable

    If NV_gpu_shader5 is not supported, modifications to the OpenGL Shading
    Language Specification should be removed.

Dependencies on NV_gpu_program5

    If NV_gpu_program5 is supported, the extension provides support for stores
    and atomic memory transactions to buffer object memory.  Stores are
    provided by the STORE opcode; atomics are provided by the ATOM opcode.  No
    "OPTION" line is required for these features, which are implied by
    NV_gpu_program5 program headers such as "!!NVfp5.0".  The operation of
    these opcodes is described in the NV_gpu_program5 extension specification.

    Note that NV_gpu_program5 also supports the LOAD opcode originally
    added by NV_shader_buffer_load and the MEMBAR opcode originally
    provided by EXT_shader_image_load_store.

Dependencies on GLSL 4.30, ARB_shader_storage_buffer_object, and
ARB_compute_shader

    If GLSL 4.30 is supported, add the following atomic memory functions to
    section 8.11 (Atomic Memory Functions) of the GLSL 4.30 specification:

      uint atomicIncWrap(inout uint mem, uint wrap);
      uint atomicDecWrap(inout uint mem, uint wrap);

    with the following documentation

      atomicIncWrap() computes the new value written to <mem> by adding one to
      the contents of <mem>, and then forcing the result to zero if and only
      if the incremented value is greater than or equal to <wrap>.
      This function supports only 32-bit unsigned integer operands.

      atomicDecWrap() computes the new value written to <mem> by subtracting
      one from the contents of <mem>, and then forcing the result to <wrap>-1
      if the original value read from <mem> was either zero or greater than
      <wrap>.  This function supports only 32-bit unsigned integer operands.

    Additionally, add the following functions to the section:

      uint64_t atomicAdd(inout uint64_t mem, uint64_t data);
      uint64_t atomicExchange(inout uint64_t mem, uint64_t data);
      uint64_t atomicCompSwap(inout uint64_t mem, uint64_t compare,
                              uint64_t data);

    If ARB_shader_storage_buffer_object or ARB_compute_shader is supported,
    make similar edits to the functions documented in the
    ARB_shader_storage_buffer_object extension.

    These functions are available if and only if GL_NV_gpu_shader5 is enabled
    via the "#extension" directive.

Dependencies on OpenGL 4.2

    If OpenGL 4.2 is supported, MemoryBarrierEXT can be replaced with the
    equivalent core function MemoryBarrier.


Errors

    None.

New State

    None.

Issues

    (1) Does MAX_SHADER_BUFFER_ADDRESS_NV still apply?

      RESOLVED:  The primary reason for this limitation to exist was the lack
      of 64-bit integer support in shaders (see issue 15 of
      NV_shader_buffer_load).  Given that this extension is being released at
      the same time as NV_gpu_shader5 which adds 64-bit integer support, it
      is expected that this maximum address will match the maximum address
      supported by the GPU's address space, or will be equal to "~0ULL"
      indicating that any GPU address returned by the GL will be usable in a
      shader.

    (2) What qualifiers should be supported on pointer variables, and how can
        they be used in declarations?

      RESOLVED:  We will support the qualifiers "coherent", "volatile",
      "restrict", and "const" to be used in pointer declarations.  "coherent"
      is taken from EXT_shader_image_load_store and is used to ensure that
      memory accesses from different shader threads are cached coherently
      (i.e., will be able to see each other when complete).  "volatile" and
      "const" behave as in C.

      "restrict" behaves as in the C99 standard, and can be used to indicate
      that no other pointer points to the same underlying data.
      This permits optimizations that would otherwise be impossible if the
      compiler has to assume that a pair of pointers might end up pointing to
      the same data.  For example, in standard C/C++, a code sequence like:

        int *a, *b;
        a[0] = b[0] + b[0];
        a[1] = b[0] + b[1];
        a[2] = b[0] + b[2];

      would need to reload b[0] for each assignment because a[0] or a[1]
      might point at the same data as b[0].  With restrict, the compiler can
      assume that b[0] is not modified by any of the instructions and load it
      just once.

    (3) What amount of automatic synchronization is provided for buffer object
        writes through pointers?

      RESOLVED:  Use of MemoryBarrierEXT() is required, and there is no
      automatic synchronization when buffers are bound or unbound.  With
      resident buffers, there are no well-defined binding points in the first
      place -- all resident buffers are effectively "bound".

      Implicit synchronization is difficult, as it might require some
      combination of:

        - tracking which buffers might be written (randomly) in the shader
          itself;

        - assuming that if a shader that performs writes is executed, all
          bytes of all resident buffers could be modified and thus must be
          treated as dirty;

        - idling at the end of each primitive or draw call, so that the
          results of all previous commands are complete.

      Since normal OpenGL operation is pipelined, idling would result in a
      significant performance impact since pipelining would otherwise allow
      fragment shader execution for draw call N while simultaneously
      performing vertex shader execution for draw call N+1.


Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
     6    05/25/22  shqxu     Update to address removal of function MemoryBarrierNV
                              and replace with MemoryBarrierEXT.  Add interaction
                              with OpenGL 4.2 supporting MemoryBarrier.

     5    08/13/12  pbrown    Add interaction with OpenGL 4.3 (and related ARB
                              extensions) supporting atomic{Inc,Dec}Wrap and
                              64-bit unsigned integer atomics to shared and
                              shader storage buffer memory.

     4    04/13/10  pbrown    Remove the floating-point version of atomicAdd().

     3    03/23/10  pbrown    Minor cleanups to the dependency sections.
                              Fixed obsolete extension names.  Add an issue
                              on synchronization.

     2    03/16/10  pbrown    Updated memory access qualifiers section
                              (volatile, coherent, restrict, const) for
                              pointers.  Added language to document how
                              these qualifiers work in possibly complicated
                              expressions.

     1              pbrown    Internal revisions.