Name

    NV_shader_buffer_store

Name Strings

    none (implied by GL_NV_gpu_program5 or GL_NV_gpu_shader5)

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified Date:         May 25, 2022
    NVIDIA Revision:            6

Number

    390

Dependencies

    OpenGL 3.0 and GLSL 1.30 are required.

    This extension is written against the OpenGL 3.2 (Compatibility Profile)
    specification, dated July 24, 2009.

    This extension is written against version 1.50.09 of the OpenGL Shading
    Language Specification.

    NV_shader_buffer_load is required.

    NV_gpu_program5 and/or NV_gpu_shader5 is required.

    This extension interacts with EXT_shader_image_load_store.

    This extension interacts with NV_gpu_shader5.

    This extension interacts with NV_gpu_program5.

    This extension interacts with GLSL 4.30, ARB_shader_storage_buffer_object,
    and ARB_compute_shader.

    This extension interacts with OpenGL 4.2.

Overview

    This extension builds upon the mechanisms added by the
    NV_shader_buffer_load extension to allow shaders to perform random-access
    reads to buffer object memory without using dedicated buffer object
    binding points.  Instead, that extension allows an application to make a
    buffer object resident, query a GPU address (pointer) for the buffer
    object, and then use that address as a pointer in shader code.  This
    approach allows shaders to access a large number of buffer objects without
    needing to repeatedly bind buffers to a limited number of
    fixed-functionality binding points.

    This extension lifts the restriction from NV_shader_buffer_load that
    disallows writes.  In particular, the MakeBufferResidentNV function now
    allows READ_WRITE and WRITE_ONLY access modes, and the shading language is
    extended to allow shaders to write through (GPU address) pointers.
    Additionally, the extension provides built-in functions to perform atomic
    memory transactions to buffer object memory.

    As with the shader writes provided by the EXT_shader_image_load_store
    extension, writes to buffer object memory using this extension are weakly
    ordered to allow for parallel or distributed shader execution.  The
    EXT_shader_image_load_store extension provides mechanisms allowing for
    finer control of memory transaction order, and those mechanisms apply
    equally to buffer object stores using this extension.


New Procedures and Functions

    None.

New Tokens

    Accepted by the <barriers> parameter of MemoryBarrierEXT:

        SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV             0x00000010

    Accepted by the <access> parameter of MakeBufferResidentNV:

        READ_WRITE
        WRITE_ONLY


Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)

    Modify Section 2.9, Buffer Objects, p. 46

    (extend the language inserted by NV_shader_buffer_load in its "Append to
     Section 2.9 (p. 45)" edit to allow READ_WRITE and WRITE_ONLY mappings)

    The data store of a buffer object may be made accessible to the GL
    via shader buffer loads and stores by calling:

        void MakeBufferResidentNV(enum target, enum access);

    <access> may be READ_ONLY, READ_WRITE, or WRITE_ONLY.  If a shader loads
    from a buffer with WRITE_ONLY <access> or stores to a buffer with
    READ_ONLY <access>, the results of that shader operation are undefined and
    may lead to application termination.  <target> may be any of the buffer
    targets accepted by BindBuffer.

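    As an informal illustration of the access rules above (not part of the
    specification language), the following C sketch models which shader
    operations have defined results for each <access> value.  The helper
    names shader_load_defined and shader_store_defined are hypothetical; the
    numeric values are the standard GL enumerants for READ_ONLY, WRITE_ONLY,
    and READ_WRITE.

```c
#include <stdbool.h>

/* Standard OpenGL enumerant values accepted by the <access> parameter. */
#define GL_READ_ONLY  0x88B8
#define GL_WRITE_ONLY 0x88B9
#define GL_READ_WRITE 0x88BA

/* Hypothetical helper: true if a shader load from a buffer made resident
 * with <access> has defined results. */
static bool shader_load_defined(unsigned int access)
{
    return access == GL_READ_ONLY || access == GL_READ_WRITE;
}

/* Hypothetical helper: true if a shader store to a buffer made resident
 * with <access> has defined results. */
static bool shader_store_defined(unsigned int access)
{
    return access == GL_WRITE_ONLY || access == GL_READ_WRITE;
}
```

    Only READ_WRITE leaves both loads and stores defined, so it is the
    natural choice for buffers that shaders both read and update.
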
    The data store of a buffer object may be made inaccessible to the GL
    via shader buffer loads and stores by calling:

        void MakeBufferNonResidentNV(enum target);


    Modify "Section 2.20.X, Shader Memory Access" introduced by the
    NV_shader_buffer_load specification, to reflect that shaders may store to
    buffer object memory.

    (first paragraph) Shaders may load from or store to buffer object memory
    by dereferencing pointer variables.  ...

    (second paragraph) When a shader dereferences a pointer variable, data are
    read from or written to buffer object memory according to the following
    rules:

    (modify the paragraph after the end of the alignment and stride rules,
    allowing for writes, and also providing rules forbidding reads to
    WRITE_ONLY mappings or vice-versa) If a shader reads or writes to a GPU
    memory address that does not correspond to a buffer object made resident
    by MakeBufferResidentNV, the results of the operation are undefined and
    may result in application termination.  If a shader reads from a buffer
    object made resident with an <access> parameter of WRITE_ONLY, or writes
    to a buffer object made resident with an <access> parameter of READ_ONLY,
    the results of the operation are also undefined and may lead to
    application termination.

    Incorporate the contents of "Section 2.14.X, Shader Memory Access" from
    the EXT_shader_image_load_store specification into the same "Shader Memory
    Access" section, with the following edits.

    (modify first paragraph to reference pointers) Shaders may perform
    random-access reads and writes to texture or buffer object memory using
    pointers or with built-in image load, store, and atomic functions, as
    described in the OpenGL Shading Language Specification.  ...

    (add to list of bits in <barriers> in MemoryBarrierEXT)

    - SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV:  Memory accesses using pointers and
        assembly program global loads, stores, and atomics issued after the
        barrier will reflect data written by shaders prior to the barrier.
        Additionally, memory writes using pointers issued after the barrier
        will not execute until memory accesses (loads, stores, texture
        fetches, vertex fetches, etc.) initiated prior to the barrier complete.

    (modify second paragraph after the list of <barriers> bits) To allow for
    independent shader threads to communicate by reads and writes to a common
    memory address, pointers and image variables in the OpenGL shading
    language may be declared as "coherent".  Buffer object or texture memory
    accessed through such variables may be cached only if...

    (add to the coherency guidelines)

    - Data written using pointers in one rendering pass and read by the shader
      in a later pass need not use coherent variables or memoryBarrier().
      However, calling MemoryBarrierEXT() with the
      SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV set in <barriers> between passes is
      necessary.

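    The pass-to-pass guideline above can be sketched in C as follows.  This
    is only an illustration of call ordering under stated assumptions:
    render_two_passes, the callback types, and the stub_* stand-ins are
    hypothetical names that record call order so the sketch runs without a
    GL context.  A real application would issue its own draw calls and call
    MemoryBarrierEXT (or MemoryBarrier in OpenGL 4.2) in place of the stubs.

```c
/* Token value from the New Tokens section of this extension. */
#define SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV 0x00000010u

/* Hypothetical callback types standing in for application draw code and
 * for the MemoryBarrierEXT entry point. */
typedef void (*pass_fn)(void);
typedef void (*memory_barrier_fn)(unsigned int barriers);

/* Run a pass that stores through pointers, issue the barrier, then run a
 * pass that loads the stored data through pointers. */
static void render_two_passes(pass_fn write_pass, pass_fn read_pass,
                              memory_barrier_fn memory_barrier)
{
    write_pass();
    memory_barrier(SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV);
    read_pass();
}

/* Stand-ins that record the order of calls instead of touching the GL. */
enum { STEP_WRITE = 1, STEP_BARRIER = 2, STEP_READ = 3 };
static int steps[3];
static int step_count = 0;
static unsigned int last_barriers = 0;

static void stub_write_pass(void) { steps[step_count++] = STEP_WRITE; }
static void stub_read_pass(void)  { steps[step_count++] = STEP_READ; }
static void stub_memory_barrier(unsigned int barriers)
{
    last_barriers = barriers;
    steps[step_count++] = STEP_BARRIER;
}
```

    Calling render_two_passes(stub_write_pass, stub_read_pass,
    stub_memory_barrier) records the write pass, the barrier, and the read
    pass in that order.
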

Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)

    None.


Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)

    None.


Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
(Special Functions)

    None.


Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
(State and State Requests)

    None.


Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

GLX Protocol

    None.


Additions to the OpenGL Shading Language Specification, Version 1.50 (Revision
09)

    Modify Section 4.3.X, Memory Access Qualifiers, as added by
    EXT_shader_image_load_store

    (modify second paragraph) Memory accesses to image and pointer variables
    declared using the "coherent" storage qualifier are performed coherently
    with similar accesses from other shader threads.  ...

    (modify fourth paragraph) Memory accesses to image and pointer variables
    declared using the "volatile" storage qualifier must treat the underlying
    memory as though it could be read or written at any point during shader
    execution by some source other than the executing thread.  ...

    (modify fifth paragraph) Memory accesses to image and pointer variables
    declared using the "restrict" storage qualifier may be compiled assuming
    that the variable used to perform the memory access is the only way to
    access the underlying memory using the shader stage in question.  ...

    (modify sixth paragraph) Memory accesses to image and pointer variables
    declared using the "const" storage qualifier may only read the underlying
    memory, which is treated as read-only.  ...

    (insert after seventh paragraph)

    In pointer variable declarations, the "coherent", "volatile", "restrict",
    and "const" qualifiers can be positioned anywhere in the declaration, and
    may qualify either a pointer or the underlying data being pointed to,
    depending on their position in the declaration.  Each qualifier to the
    right of the basic data type in a declaration is considered to apply to
    whatever type is found immediately to its left; qualifiers to the left of
    the basic type are considered to apply to that basic type.  To interpret
    the meaning of qualifiers in pointer declarations, it is useful to read
    the declaration from right to left as in the following examples.

      int * * const a;     // a is a constant pointer to a pointer to int
      int * volatile * b;  // b is a pointer to a volatile pointer to int
      int const * * c;     // c is a pointer to a pointer to a constant int
      const int * * d;     // d is like c
      int const * const *  // e is a constant pointer to a constant pointer
       const e;            //   to a constant int

    For pointer types, the "restrict" qualifier can be used to qualify
    pointers, but not non-pointer types being pointed to.

      int * restrict a;    // a is a restricted pointer to int
      int restrict * b;    // b qualifies "int" as restricted - illegal

    (modify eighth paragraph) The "coherent", "volatile", and "restrict"
    storage qualifiers may only be used on image and pointer variables, and
    may not be used on variables of any other type.  ...

    (modify last paragraph) The values of image and pointer variables
    qualified with "coherent", "volatile", "restrict", or "const" may not be
    assigned to function parameters or l-values lacking such qualifiers.

    (add examples for the last paragraph)

      int volatile * var1;
      int * var2;
      int * restrict var3;
      var1 = var2;              // OK, adding "volatile" is allowed
      var2 = var3;              // illegal, stripping "restrict" is not


    Modify Section 5.X, Pointer Operations, as added by NV_shader_buffer_load

    (modify second paragraph, allowing storing through pointers) The pointer
    dereference operator ...  The result of a pointer dereference may be used
    as the left-hand side of an assignment.


    Modify Section 8.Y, Shader Memory Functions, as added by
    EXT_shader_image_load_store

    (modify first paragraph) Shaders of all types may read and write the
    contents of textures and buffer objects using pointers and image
    variables.  ...

    (modify description of memoryBarrier) memoryBarrier() can be used to
    control the ordering of memory transactions issued by a shader thread.
    When called, it will wait on the completion of all memory accesses
    resulting from the use of pointers and image variables prior to calling
    the function.  ...

    (add the following paragraphs to the end of the section)

    If multiple threads need to atomically access shared memory addresses
    using pointers, they may do so using the following built-in functions.
    These atomic memory access functions allow a shader thread to read,
    modify, and write an address in memory in a manner that guarantees that
    no other shader thread can modify the memory between the read and the
    write.  All of these functions read a single data element from memory,
    compute a new value based on the value read from memory and one or more
    other values passed to the function, and write the result back to the
    same memory address.  The value returned to the caller is always the data
    element originally read from memory.

    Syntax:

      uint      atomicAdd(uint *address, uint data);
      int       atomicAdd(int *address, int data);
      uint64_t  atomicAdd(uint64_t *address, uint64_t data);

      uint      atomicMin(uint *address, uint data);
      int       atomicMin(int *address, int data);

      uint      atomicMax(uint *address, uint data);
      int       atomicMax(int *address, int data);

      uint      atomicIncWrap(uint *address, uint wrap);

      uint      atomicDecWrap(uint *address, uint wrap);

      uint      atomicAnd(uint *address, uint data);
      int       atomicAnd(int *address, int data);

      uint      atomicOr(uint *address, uint data);
      int       atomicOr(int *address, int data);

      uint      atomicXor(uint *address, uint data);
      int       atomicXor(int *address, int data);

      uint      atomicExchange(uint *address, uint data);
      int       atomicExchange(int *address, int data);
      uint64_t  atomicExchange(uint64_t *address, uint64_t data);

      uint      atomicCompSwap(uint *address, uint compare, uint data);
      int       atomicCompSwap(int *address, int compare, int data);
      uint64_t  atomicCompSwap(uint64_t *address, uint64_t compare,
                               uint64_t data);

    Description:

    atomicAdd() computes the new value written to <address> by adding the
    value of <data> to the contents of <address>.  This function supports 32-
    and 64-bit unsigned integer operands, and 32-bit signed integer operands.

    atomicMin() computes the new value written to <address> by taking the
    minimum of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicMax() computes the new value written to <address> by taking the
    maximum of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicIncWrap() computes the new value written to <address> by adding one
    to the contents of <address>, and then forcing the result to zero if and
    only if the incremented value is greater than or equal to <wrap>.  This
    function supports only 32-bit unsigned integer operands.

    atomicDecWrap() computes the new value written to <address> by subtracting
    one from the contents of <address>, and then forcing the result to
    <wrap>-1 if the original value read from <address> was either zero or
    greater than <wrap>.  This function supports only 32-bit unsigned integer
    operands.

    atomicAnd() computes the new value written to <address> by performing a
    bitwise and of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicOr() computes the new value written to <address> by performing a
    bitwise or of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicXor() computes the new value written to <address> by performing a
    bitwise exclusive or of the value of <data> and the contents of <address>.
    This function supports 32-bit signed and unsigned integer operands.

    atomicExchange() uses the value of <data> as the value written to
    <address>.  This function supports 32- and 64-bit unsigned integer
    operands and 32-bit signed integer operands.

    atomicCompSwap() compares the value of <compare> and the contents of
    <address>.  If the values are equal, <data> is written to <address>;
    otherwise, the original contents of <address> are preserved.  This
    function supports 32- and 64-bit unsigned integer operands and 32-bit
    signed integer operands.

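    As a non-normative aid, the value semantics described above can be
    modeled on the CPU in plain C.  The model_* names below are hypothetical,
    and the functions are ordinary single-threaded code: they show what is
    written and what is returned, not the atomicity the GPU provides.  The
    last function illustrates the common pattern of building an operation
    that has no built-in (here a multiply, chosen only as an example) from a
    compare-and-swap retry loop.

```c
#include <stdint.h>

/* Value semantics of atomicAdd(): write old + data, return old. */
static uint32_t model_atomic_add(uint32_t *address, uint32_t data)
{
    uint32_t original = *address;
    *address = original + data;
    return original;
}

/* Value semantics of atomicCompSwap(): write <data> only if the current
 * contents equal <compare>; always return the original contents. */
static uint32_t model_atomic_comp_swap(uint32_t *address, uint32_t compare,
                                       uint32_t data)
{
    uint32_t original = *address;
    if (original == compare) {
        *address = data;
    }
    return original;
}

/* Hypothetical read-modify-write operation built from compare-and-swap:
 * retry until the value the swap observed matches the value the new
 * result was computed from.  Returns the original contents. */
static uint32_t model_atomic_mul_via_cas(uint32_t *address, uint32_t factor)
{
    uint32_t expected = *address;
    for (;;) {
        uint32_t seen = model_atomic_comp_swap(address, expected,
                                               expected * factor);
        if (seen == expected) {
            return seen;        /* swap succeeded */
        }
        expected = seen;        /* another writer intervened; retry */
    }
}
```

    In this single-threaded model the retry loop always succeeds on its
    first iteration; on real hardware the loop repeats whenever another
    thread modifies the address between the read and the swap.
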

    Modify Section 9, Shading Language Grammar, p. 105

    !!! TBD:  Add grammar constructs for memory access qualifiers, allowing
        memory access qualifiers before or after the type and the "*"
        characters indicating pointers in a variable declaration.


Dependencies on EXT_shader_image_load_store

    This specification incorporates the memory access ordering and
    synchronization discussion from EXT_shader_image_load_store verbatim.

    If EXT_shader_image_load_store is not supported, this spec should be
    construed to introduce:

      * the shader memory access language from that specification, including
        the MemoryBarrierEXT() command and the tokens accepted by <barriers>
        from that specification;

      * the memoryBarrier() function to the OpenGL shading language
        specification; and

      * the capability and spec language allowing applications to enable early
        depth tests.

Dependencies on NV_gpu_shader5

    This specification requires either NV_gpu_shader5 or NV_gpu_program5.

    If NV_gpu_shader5 is supported, use of the new shading language features
    described in this extension requires:

      #extension GL_NV_gpu_shader5 : enable

    If NV_gpu_shader5 is not supported, modifications to the OpenGL Shading
    Language Specification should be removed.

Dependencies on NV_gpu_program5

    If NV_gpu_program5 is supported, the extension provides support for stores
    and atomic memory transactions to buffer object memory.  Stores are
    provided by the STORE opcode; atomics are provided by the ATOM opcode.  No
    "OPTION" line is required for these features, which are implied by
    NV_gpu_program5 program headers such as "!!NVfp5.0".  The operation of
    these opcodes is described in the NV_gpu_program5 extension specification.

    Note that NV_gpu_program5 also supports the LOAD opcode originally added
    by NV_shader_buffer_load and the MEMBAR opcode originally provided by
    EXT_shader_image_load_store.

Dependencies on GLSL 4.30, ARB_shader_storage_buffer_object, and
ARB_compute_shader

    If GLSL 4.30 is supported, add the following atomic memory functions to
    section 8.11 (Atomic Memory Functions) of the GLSL 4.30 specification:

      uint atomicIncWrap(inout uint mem, uint wrap);
      uint atomicDecWrap(inout uint mem, uint wrap);

    with the following documentation:

      atomicIncWrap() computes the new value written to <mem> by adding one to
      the contents of <mem>, and then forcing the result to zero if and only
      if the incremented value is greater than or equal to <wrap>.  This
      function supports only 32-bit unsigned integer operands.

      atomicDecWrap() computes the new value written to <mem> by subtracting
      one from the contents of <mem>, and then forcing the result to <wrap>-1
      if the original value read from <mem> was either zero or greater than
      <wrap>.  This function supports only 32-bit unsigned integer operands.

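    The wrap rules above are easy to misread, so here is a non-normative C
    model that follows the wording literally.  The model_* names are
    hypothetical, and the code is single-threaded: it shows the value written
    and the value returned, not the atomicity provided by the GPU.

```c
#include <stdint.h>

/* Value semantics of atomicIncWrap(): add one, then force the result to
 * zero if the incremented value is >= wrap.  Returns the original value. */
static uint32_t model_atomic_inc_wrap(uint32_t *mem, uint32_t wrap)
{
    uint32_t original = *mem;
    uint32_t incremented = original + 1u;
    *mem = (incremented >= wrap) ? 0u : incremented;
    return original;
}

/* Value semantics of atomicDecWrap(): subtract one, but force the result
 * to wrap - 1 if the original value was zero or greater than wrap.
 * Returns the original value. */
static uint32_t model_atomic_dec_wrap(uint32_t *mem, uint32_t wrap)
{
    uint32_t original = *mem;
    *mem = (original == 0u || original > wrap) ? wrap - 1u : original - 1u;
    return original;
}
```

    With <wrap> set to 4, repeated model_atomic_inc_wrap() calls cycle the
    stored value through 0, 1, 2, 3, which is the behavior one would want
    for a ring-buffer index.
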
    Additionally, add the following functions to the section:

      uint64_t atomicAdd(inout uint64_t mem, uint64_t data);
      uint64_t atomicExchange(inout uint64_t mem, uint64_t data);
      uint64_t atomicCompSwap(inout uint64_t mem, uint64_t compare,
                              uint64_t data);

    If ARB_shader_storage_buffer_object or ARB_compute_shader is supported,
    make similar edits to the functions documented in the
    ARB_shader_storage_buffer_object extension.

    These functions are available if and only if GL_NV_gpu_shader5 is enabled
    via the "#extension" directive.

Dependencies on OpenGL 4.2

    If OpenGL 4.2 is supported, MemoryBarrierEXT can be replaced with the
    equivalent core function MemoryBarrier.


4855bd8deadSopenharmony_ciErrors
4865bd8deadSopenharmony_ci
4875bd8deadSopenharmony_ci    None
4885bd8deadSopenharmony_ci
4895bd8deadSopenharmony_ciNew State
4905bd8deadSopenharmony_ci
4915bd8deadSopenharmony_ci    None.

Issues

    (1) Does MAX_SHADER_BUFFER_ADDRESS_NV still apply?

      RESOLVED:  The primary reason for this limitation to exist was the lack
      of 64-bit integer support in shaders (see issue 15 of
      NV_shader_buffer_load).  Given that this extension is being released at
      the same time as NV_gpu_shader5, which adds 64-bit integer support, it
      is expected that this maximum address will match the maximum address
      supported by the GPU's address space, or will be equal to "~0ULL",
      indicating that any GPU address returned by the GL will be usable in a
      shader.

    (2) What qualifiers should be supported on pointer variables, and how can
        they be used in declarations?

      RESOLVED:  We will support the qualifiers "coherent", "volatile",
      "restrict", and "const" to be used in pointer declarations.  "coherent"
      is taken from EXT_shader_image_load_store and is used to ensure that
      memory accesses from different shader threads are cached coherently
      (i.e., will be able to see each other when complete).  "volatile" and
      "const" behave as in C.

      "restrict" behaves as in the C99 standard, and can be used to indicate
      that no other pointer points to the same underlying data.  This permits
      optimizations that would otherwise be impossible if the compiler has to
      assume that a pair of pointers might end up pointing to the same data.
      For example, in standard C/C++, code like:

        int *a, *b;
        a[0] = b[0] + b[0];
        a[1] = b[0] + b[1];
        a[2] = b[0] + b[2];

      would need to reload b[0] for each assignment because a[0] or a[1]
      might point at the same data as b[0].  With restrict, the compiler can
      assume that b[0] is not modified by any of the instructions and load it
      just once.
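
      The aliasing argument above can be made concrete with a small C99
      sketch.  All three function names are hypothetical.  sum_reload() is
      the unqualified code, sum_restrict() adds the "restrict" promise, and
      sum_hoisted() writes out by hand the single load of b[0] that a
      compiler would be permitted to perform for sum_restrict().  The hoisted
      form is kept unqualified so that it can be called with overlapping
      pointers without undefined behavior.

```c
/* Unqualified version: a store through a may change b[0], so the value
 * of b[0] has to be read again for each assignment. */
static void sum_reload(int *a, const int *b)
{
    a[0] = b[0] + b[0];
    a[1] = b[0] + b[1];
    a[2] = b[0] + b[2];
}

/* Same body with restrict: the caller promises a and b do not overlap,
 * so the compiler may load b[0] once.  Calling this with overlapping
 * pointers is undefined behavior. */
static void sum_restrict(int *restrict a, const int *restrict b)
{
    a[0] = b[0] + b[0];
    a[1] = b[0] + b[1];
    a[2] = b[0] + b[2];
}

/* Hand-written model of the transformation restrict permits:
 * b[0] is loaded a single time before any store. */
static void sum_hoisted(int *a, const int *b)
{
    const int b0 = b[0];

    a[0] = b0 + b0;
    a[1] = b0 + b[1];
    a[2] = b0 + b[2];
}
```

      With separate input {1, 2, 3} and output arrays, all three functions
      produce {2, 3, 4}.  If the same array {1, 2, 3} is passed as both a and
      b, sum_reload() produces {2, 4, 5} while sum_hoisted() produces
      {2, 3, 4}, which is why the single load is only legal once aliasing has
      been ruled out.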

    (3) What amount of automatic synchronization is provided for buffer object
        writes through pointers?

      RESOLVED:  Use of MemoryBarrierEXT() is required, and there is no
      automatic synchronization when buffers are bound or unbound.  With
      resident buffers, there are no well-defined binding points in the first
      place -- all resident buffers are effectively "bound".

      Implicit synchronization is difficult, as it might require some
      combination of:

        - tracking which buffers might be written (randomly) in the shader
          itself;

        - assuming that if a shader that performs writes is executed, all
          bytes of all resident buffers could be modified and thus must be
          treated as dirty;

        - idling at the end of each primitive or draw call, so that the
          results of all previous commands are complete.

      Since normal OpenGL operation is pipelined, idling would have a
      significant performance impact:  pipelining would otherwise allow
      fragment shader execution for draw call N to overlap with vertex
      shader execution for draw call N+1.


Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
     6    05/25/22  shqxu     Update to address removal of function
                              MemoryBarrierNV and replace with
                              MemoryBarrierEXT.  Add interaction with
                              OpenGL 4.2 supporting MemoryBarrier.

     5    08/13/12  pbrown    Add interaction with OpenGL 4.3 (and related ARB
                              extensions) supporting atomic{Inc,Dec}Wrap and
                              64-bit unsigned integer atomics to shared and
                              shader storage buffer memory.

     4    04/13/10  pbrown    Remove the floating-point version of atomicAdd().

     3    03/23/10  pbrown    Minor cleanups to the dependency sections.
                              Fixed obsolete extension names.  Add an issue
                              on synchronization.

     2    03/16/10  pbrown    Updated memory access qualifiers section
                              (volatile, coherent, restrict, const) for
                              pointers.  Added language to document how
                              these qualifiers work in possibly complicated
                              expressions.

     1              pbrown    Internal revisions.