Name

    ARB_compute_shader

Name Strings

    GL_ARB_compute_shader

Contact

    Graham Sellers, AMD (graham.sellers 'at' amd.com)

Contributors

    Pat Brown, NVIDIA
    Daniel Koch, TransGaming
    John Kessenich
    Members of the ARB working group

Notice

    Copyright (c) 2012-2014 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

Specification Update Policy

    Khronos-approved extension specifications are updated in response to
    issues and bugs prioritized by the Khronos OpenGL Working Group. For
    extensions which have been promoted to a core Specification, fixes will
    first appear in the latest version of that core Specification, and will
    eventually be backported to the extension document. This policy is
    described in more detail at
        https://www.khronos.org/registry/OpenGL/docs/update_policy.php

Status

    Complete.
    Approved by the ARB on 2012/06/12.

Version

    Last Modified Date: December 10, 2018
    Revision: 28

Number

    ARB Extension #122

Dependencies

    OpenGL 4.2 is required.

    This extension is written based on the wording of the OpenGL 4.2 (Core
    Profile) specification, and on the wording of the OpenGL Shading Language
    (GLSL) Specification, version 4.20.

    This extension interacts with OpenGL 4.3 and
    ARB_shader_storage_buffer_object.

    This extension interacts with NV_vertex_buffer_unified_memory.

Overview

    Recent graphics hardware has become extremely powerful and a strong desire
    to harness this power for work (both graphics and non-graphics) that does
    not fit the traditional graphics pipeline well has emerged. To address
    this, this extension adds a new single-stage program type known as a
    compute program. This program may contain one or more compute shaders
    which may be launched in a manner that is essentially stateless. This allows
    arbitrary workloads to be sent to the graphics hardware with minimal
    disturbance to the GL state machine.

    In most respects, a compute program is identical to a traditional OpenGL
    program object, with similar status, uniforms, and other such properties.
    It has access to many of the same resources as fragment and other shader
    types, such as textures, image variables, atomic counters, and so on.
    However, it has no predefined inputs nor any fixed-function outputs. It
    cannot be part of a pipeline and its visible side effects are through its
    actions on images and atomic counters.

    OpenCL is another solution for using graphics processors as generalized
    compute devices. This extension addresses a different need. For example,
    OpenCL is designed to be usable on a wide range of devices, from CPUs,
    GPUs, and DSPs through to FPGAs. While one could implement GL on these
    types of devices, the target here is clearly GPUs. Another difference is
    that OpenCL is more full featured and includes features such as multiple
    devices, asynchronous queues and strict IEEE semantics for floating point
    operations. This extension follows the semantics of OpenGL - implicitly
    synchronous, in-order operation with single-device, single queue
    logical architecture and somewhat more relaxed numerical precision
    requirements. Although not as feature rich, this extension offers several
    advantages for applications that can tolerate the omission of these
    features. Compute shaders are written in GLSL, for example, so code may
    be shared between compute and other shader types. Objects are created and
    owned by the same context as the rest of the GL, and therefore no
    interoperability API is required and objects may be freely used by both
    compute and graphics simultaneously without acquire-release semantics or
    object type translation.

New Procedures and Functions

        void DispatchCompute(uint num_groups_x,
                             uint num_groups_y,
                             uint num_groups_z);

        void DispatchComputeIndirect(intptr indirect);

New Tokens

    Accepted by the <type> parameter of CreateShader and returned in the
    <params> parameter by GetShaderiv:

        COMPUTE_SHADER                                  0x91B9

    Accepted by the <pname> parameter of GetIntegerv, GetBooleanv, GetFloatv,
    GetDoublev and GetInteger64v:

        MAX_COMPUTE_UNIFORM_BLOCKS                      0x91BB
        MAX_COMPUTE_TEXTURE_IMAGE_UNITS                 0x91BC
        MAX_COMPUTE_IMAGE_UNIFORMS                      0x91BD
        MAX_COMPUTE_SHARED_MEMORY_SIZE                  0x8262
        MAX_COMPUTE_UNIFORM_COMPONENTS                  0x8263
        MAX_COMPUTE_ATOMIC_COUNTER_BUFFERS              0x8264
        MAX_COMPUTE_ATOMIC_COUNTERS                     0x8265
        MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS         0x8266
        MAX_COMPUTE_WORK_GROUP_INVOCATIONS              0x90EB

    Accepted by the <pname> parameter of GetIntegeri_v, GetBooleani_v,
    GetFloati_v, GetDoublei_v and GetInteger64i_v:

        MAX_COMPUTE_WORK_GROUP_COUNT                    0x91BE
        MAX_COMPUTE_WORK_GROUP_SIZE                     0x91BF

    Accepted by the <pname> parameter of GetProgramiv:

        COMPUTE_WORK_GROUP_SIZE                         0x8267

    Accepted by the <pname> parameter of GetActiveUniformBlockiv:

        UNIFORM_BLOCK_REFERENCED_BY_COMPUTE_SHADER      0x90EC

    Accepted by the <pname> parameter of GetActiveAtomicCounterBufferiv:

        ATOMIC_COUNTER_BUFFER_REFERENCED_BY_COMPUTE_SHADER  0x90ED

    Accepted by the <target> parameters of BindBuffer, BufferData,
    BufferSubData, MapBuffer, UnmapBuffer, GetBufferSubData, and
    GetBufferPointerv:

        DISPATCH_INDIRECT_BUFFER                        0x90EE

    Accepted by the <value> parameter of GetIntegerv, GetBooleanv,
    GetInteger64v, GetFloatv, and GetDoublev:

        DISPATCH_INDIRECT_BUFFER_BINDING                0x90EF

    Accepted by the <stages> parameter of UseProgramStages:

        COMPUTE_SHADER_BIT                              0x00000020

Additions to Chapter 2 of the OpenGL 4.2 (Core Profile) Specification
(OpenGL Operation)

    In section 2.9.1, "Creating and Binding Buffer Objects", add to table 2.8
    (p.43):

                                                                Described
      Target name                 Purpose                     in section(s)
      -----------------------     -------------------------  ---------------
      DISPATCH_INDIRECT_BUFFER    Indirect compute dispatch       5.5
                                  commands

    Add to the end of section 2.9.8, "Indirect Commands In Buffer Objects"
    (p. 53):

    Arguments to the DispatchComputeIndirect command are stored in buffer
    objects as a group of three unsigned integers.

    A buffer object is bound to DISPATCH_INDIRECT_BUFFER by calling BindBuffer
    with target set to DISPATCH_INDIRECT_BUFFER, and buffer set to the name of
    the buffer object. If no corresponding buffer object exists, one is
    initialized as defined in section 2.9.

    DispatchComputeIndirect sources its arguments from the buffer object whose
    name is bound to DISPATCH_INDIRECT_BUFFER, using the <indirect> parameter as
    an offset into the buffer object in the same fashion as described in
    section 2.9.6. An INVALID_OPERATION error is generated if this command
    sources data beyond the end of the buffer object, if zero is bound to
    DISPATCH_INDIRECT_BUFFER, or if <indirect> is less than zero or not a
    multiple of the size, in basic machine units, of uint.
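
    As an illustration only, the argument layout and the offset rules above
    might be mirrored on the application side as in the following C sketch.
    The struct name DispatchIndirectCommand and the helper
    dispatch_indirect_offset_ok are hypothetical (the extension defines
    neither), and the sketch assumes uint occupies four basic machine units.
    It does not model the "zero is bound" case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: the text only says "a group of three unsigned integers". */
typedef struct DispatchIndirectCommand {
    uint32_t num_groups_x;
    uint32_t num_groups_y;
    uint32_t num_groups_z;
} DispatchIndirectCommand;

/* Returns 1 if a command read at byte offset <indirect> from a buffer of
 * buffer_size bytes would satisfy the offset rules quoted above, else 0.
 * This is an application-side pre-check, not part of the GL API. */
static int dispatch_indirect_offset_ok(int64_t indirect, size_t buffer_size)
{
    if (indirect < 0)
        return 0;                                  /* negative offset */
    if ((uint64_t)indirect % sizeof(uint32_t) != 0)
        return 0;                                  /* not uint-aligned */
    if ((uint64_t)indirect > buffer_size)
        return 0;                                  /* starts past the end */
    /* All three uints must lie inside the buffer. */
    return buffer_size - (size_t)indirect >= sizeof(DispatchIndirectCommand);
}
```

    For example, with a 64-byte buffer an offset of 52 passes this check
    (52 + 12 = 64), while offsets of 2, 56 or -4 do not.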

    In section 2.11, "Vertex Shaders", modify the introductory text on shaders
    to include compute shaders (second paragraph, p. 56):

    In addition to vertex shaders, tessellation control..., geometry shaders,
    fragment shaders, and compute shaders can be created, compiled, and linked
    into program objects.  ....  (section 3.10).  Compute shaders perform
    general computations for dispatched arrays of shader invocations (section
    5.5), but do not operate on primitives processed by the other shader
    types. ...

    In section 2.11.3, "Program Objects", add to the reasons that LinkProgram
    may fail, p. 61:

        * The program object contains objects to form a compute shader (see
          section 5.5) and objects to form any other type of shader.
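
    The link rule above can be pictured with a small application-side check,
    sketched here in C. The function shader_mix_is_linkable is hypothetical
    and only models this one failure reason. COMPUTE_SHADER takes the value
    listed under New Tokens. The vertex and fragment values are the usual
    OpenGL enumerants and are included only to make the example
    self-contained.

```c
#include <assert.h>
#include <stddef.h>

#define GL_FRAGMENT_SHADER 0x8B30
#define GL_VERTEX_SHADER   0x8B31
#define GL_COMPUTE_SHADER  0x91B9

/* Returns 1 if the set of attached shader types does not combine a compute
 * shader with any other shader type, 0 if it does. Other link failure
 * reasons are deliberately not modelled. */
static int shader_mix_is_linkable(const unsigned *types, size_t count)
{
    int has_compute = 0;
    int has_other = 0;

    for (size_t i = 0; i < count; ++i) {
        if (types[i] == GL_COMPUTE_SHADER)
            has_compute = 1;
        else
            has_other = 1;
    }
    return !(has_compute && has_other);
}
```

    A program with only compute shaders, or with only graphics shaders,
    passes this check. A program mixing the two does not.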

    In section 2.11.3, modify the description of active programs (last
    paragraph, p. 61, first paragraph, p. 62):

    ... geometry shader stages, those stages are ignored.  If there is no
    active program for the compute shader stage, compute dispatches will
    generate an error.  The active program for the compute shader stage has no
    effect on the processing of vertices, geometric primitives, and fragments,
    and the active program for all other shader stages has no effect on
    compute dispatches.

    In section 2.11.4, "Program Pipeline Objects", modify the description of
    UseProgramStages, p. 65:

    The executables in a program object... becomes current.  These stages may
    include vertex, tessellation control, tessellation evaluation, geometry,
    fragment, or compute, indicated by VERTEX_SHADER_BIT,
    TESS_CONTROL_SHADER_BIT, TESS_EVALUATION_SHADER_BIT, GEOMETRY_SHADER_BIT,
    FRAGMENT_SHADER_BIT, or COMPUTE_SHADER_BIT, respectively. ...

    In the unnumbered "Validation" section of section 2.11.12 "Shader
    Execution", modify the list of validation errors, pp. 112-113:

    This error is generated by any command that transfers vertices to the GL
    or launches compute work if:

      * (last bullet, p. 112) One program object is active... first program
        object was active.  The active compute shader is ignored for the
        purposes of this test.

      * (2nd bullet, p. 113) There is no current program specified by
        UseProgram, there is a current program pipeline object, and the
        current program for any shader stage has been relinked since...

      * (3rd bullet, p. 113) Any two active samplers in the set of active
        program objects are of different types but refer to the same texture
        image unit.

      * (4th bullet, p. 113) The sum of the number of active samplers for each
        active program exceeds the maximum number of texture image units
        allowed.

    Modify the paragraph describing ValidateProgram, p. 113:

    ... If validation succeeded, ... set to FALSE.  If validation succeeded,
    no INVALID_OPERATION validation error will be generated if <program> were
    made current via UseProgram, given the current state.  If validation
    failed, such errors will be generated under the current state.

    Modify the paragraph describing ValidateProgramPipeline, p. 114:

    ... can be queried with GetProgramPipelineiv (see section 6.1.12).  If
    validation succeeded, no INVALID_OPERATION validation error will be
    generated if <pipeline> were bound and no program were made current via
    UseProgram, given the current state.  If validation failed, such errors
    will be generated under the current state.

    In subsection 2.11.12, "Shader Execution":

        Add to the list of implementation dependent constants under the
    "Texture Access" sub-heading:

        MAX_COMPUTE_TEXTURE_IMAGE_UNITS (for compute shaders),

        Add to the list of implementation dependent constants under the "Atomic
    Counter Access" sub-heading:

        MAX_COMPUTE_ATOMIC_COUNTERS (for compute shaders),

        Add to the list of implementation dependent constants under the "Image
    Access" sub-heading:

        MAX_COMPUTE_IMAGE_UNIFORMS (for compute shaders),

    In section 2.16, "Conditional Rendering", modify the sentence describing
    conditional rendering, starting with "In this case"...

    In this case, all drawing commands (see section 2.8.3), as well as
    Clear and ClearBuffer* (see section 4.2.3), and compute dispatch
    through DispatchCompute* (see section 5.5), have no effect.

    In the "Shared Memory Access Synchronization" subsection of section
    2.11.13, "Shader Memory Access", modify the description of
    COMMAND_BARRIER_BIT (p. 118):

      * COMMAND_BARRIER_BIT:  Command data sourced from buffer objects by
        Draw*Indirect and DispatchComputeIndirect commands ... The buffer
        objects affected by this bit are derived from the DRAW_INDIRECT_BUFFER
        and DISPATCH_INDIRECT_BUFFER bindings.

    In subsection 2.17.7, "Uniform Variables", replace the paragraph beginning
    "If <pname> is UNIFORM_BLOCK_REFERENCED_BY_VERTEX_SHADER,"... with:

        If <pname> is UNIFORM_BLOCK_REFERENCED_BY_VERTEX_SHADER,
    UNIFORM_BLOCK_REFERENCED_BY_TESS_CONTROL_SHADER,
    UNIFORM_BLOCK_REFERENCED_BY_TESS_EVALUATION_SHADER,
    UNIFORM_BLOCK_REFERENCED_BY_GEOMETRY_SHADER,
    UNIFORM_BLOCK_REFERENCED_BY_FRAGMENT_SHADER or
    UNIFORM_BLOCK_REFERENCED_BY_COMPUTE_SHADER, then a boolean value indicating
    whether the uniform block identified by uniformBlockIndex is referenced
    by the vertex, tessellation control, tessellation evaluation, geometry,
    fragment or compute programming stages of <program>, respectively, is
    returned.

    Also in subsection 2.17.7, "Uniform Variables", replace the paragraph
    beginning, "If <pname> is ATOMIC_COUNTER_BUFFER_REFERENCED_BY_VERTEX_SHADER"
    on p.80 with:

        If <pname> is ATOMIC_COUNTER_BUFFER_REFERENCED_BY_VERTEX_SHADER,
    ATOMIC_COUNTER_BUFFER_REFERENCED_BY_TESS_CONTROL_SHADER,
    ATOMIC_COUNTER_BUFFER_REFERENCED_BY_TESS_EVALUATION_SHADER,
    ATOMIC_COUNTER_BUFFER_REFERENCED_BY_GEOMETRY_SHADER,
    ATOMIC_COUNTER_BUFFER_REFERENCED_BY_FRAGMENT_SHADER or
    ATOMIC_COUNTER_BUFFER_REFERENCED_BY_COMPUTE_SHADER, then a single boolean
    value indicating whether the atomic counter buffer identified by
    bufferIndex is referenced by the vertex, tessellation control, tessellation
    evaluation, geometry, fragment or compute programming stages of
    <program>, respectively, is returned.

    Under the sub-heading "Uniform Blocks" in subsection 2.11.17, replace the
    sentence beginning "The limits for vertex, tessellation ..." on p.92
    with:

        The limits for vertex, tessellation, geometry, fragment and compute
    shaders can be obtained by calling GetIntegerv with <pname> set to
    MAX_VERTEX_UNIFORM_BLOCKS, MAX_TESS_CONTROL_UNIFORM_BLOCKS,
    MAX_TESS_EVALUATION_UNIFORM_BLOCKS, MAX_GEOMETRY_UNIFORM_BLOCKS,
    MAX_FRAGMENT_UNIFORM_BLOCKS and MAX_COMPUTE_UNIFORM_BLOCKS, respectively.

    Under the sub-heading "Atomic Counter Buffers" in subsection 2.11.17,
    replace the sentence beginning "The limits for vertex, geometry, ..."
    on p.96 with:

        The limits for vertex, tessellation, geometry, fragment and compute
    shaders can be obtained by calling GetIntegerv with <pname> set to
    MAX_VERTEX_ATOMIC_COUNTER_BUFFERS, MAX_TESS_CONTROL_ATOMIC_COUNTER_BUFFERS,
    MAX_TESS_EVALUATION_ATOMIC_COUNTER_BUFFERS,
    MAX_GEOMETRY_ATOMIC_COUNTER_BUFFERS, MAX_FRAGMENT_ATOMIC_COUNTER_BUFFERS and
    MAX_COMPUTE_ATOMIC_COUNTER_BUFFERS, respectively.

Additions to Chapter 3 of the OpenGL 4.2 (Core Profile) Specification
(Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 4.2 (Core Profile) Specification
(Per-Fragment Operations and the Framebuffer)

    None.

Additions to Chapter 5 of the OpenGL 4.2 (Core Profile) Specification
(Special Functions)

    Add Section 5.5, "Compute Shaders"

        In addition to graphics-oriented shading operations such as vertex,
    tessellation, geometry and fragment shading, generic computation may be
    performed by the GL through the use of compute shaders. The compute pipeline
    is a form of single-stage machine that runs generic shaders. Compute shaders
    are created as described in section 2.11.1 using a <type> parameter of
    COMPUTE_SHADER. They are attached to and used in program objects as
    described in section 2.11.3.

        Compute workloads are formed from groups of work items called
    _workgroups_ and processed by the executable code for a compute program.
    A workgroup is a collection of shader invocations that execute the same code,
    potentially in parallel. An invocation within a workgroup may share data
    with other members of the same workgroup through shared variables and
    issue memory and control barriers to synchronize with other members of the
    same workgroup.  One or more workgroups are launched by calling:

        void DispatchCompute(uint num_groups_x,
                             uint num_groups_y,
                             uint num_groups_z);

        Each workgroup is processed by the active program object for the
    compute shader stage.  The error INVALID_OPERATION will be generated if
    there is no active program object for the compute shader stage.  The
    active program for the compute shader stage will be determined in the same
    manner as the active program for other pipeline stages, as described in
    section 2.11.3.  While the individual shader invocations within a
    workgroup are executed as a unit, workgroups are executed completely
    independently and in unspecified order.

        <num_groups_x>, <num_groups_y> and <num_groups_z> specify the number of
    workgroups that will be dispatched in the X, Y and Z dimensions,
    respectively. The builtin vector variable gl_NumWorkGroups will be
    initialized with the contents of the <num_groups_x>, <num_groups_y> and
    <num_groups_z> parameters. The maximum number of workgroups that may be
    dispatched at one time may be determined by calling GetIntegeri_v with
    <pname> set to MAX_COMPUTE_WORK_GROUP_COUNT and <index> set to zero, one,
    or two, representing the X, Y, and Z dimensions, respectively. The
    values of <num_groups_x>, <num_groups_y> and <num_groups_z> must each
    be less than or equal to the maximum workgroup count for the corresponding
    dimension, otherwise an INVALID_VALUE error is generated. If the workgroup
    count in any dimension is zero, no workgroups are dispatched.
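
    A common way to choose the three group counts is to divide the problem
    size by the workgroup size and round up. The following C sketch shows that
    arithmetic together with the per-dimension limit check described above.
    Both helper names are hypothetical. The max_count array stands for the
    values an application would have read back with GetIntegeri_v, and the
    number 65535 used in the note below is only a placeholder.

```c
#include <assert.h>
#include <stdint.h>

/* Number of workgroups needed to cover <items> work items when each
 * workgroup handles <local_size> of them (ceiling division). */
static uint32_t groups_needed(uint32_t items, uint32_t local_size)
{
    assert(local_size > 0);
    return items / local_size + (items % local_size != 0);
}

/* Returns 1 if every requested count is within the per-dimension maximum
 * (as queried via MAX_COMPUTE_WORK_GROUP_COUNT), 0 otherwise. */
static int group_counts_ok(const uint32_t num_groups[3],
                           const uint32_t max_count[3])
{
    for (int i = 0; i < 3; ++i) {
        if (num_groups[i] > max_count[i])
            return 0;
    }
    return 1;
}
```

    For instance, 1000 items with a workgroup size of 64 need 16 workgroups
    in X. Against a placeholder limit of 65535 per dimension, the counts
    {16, 1, 1} pass the check and {70000, 1, 1} do not.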

        The workgroup size in each dimension is specified at compile time
    using an input layout qualifier in one or more of the compute shaders
    attached to the program (see Section 4 of the OpenGL Shading Language
    Specification). After the program has been linked, the workgroup size
    of the program may be retrieved by calling GetProgramiv with <pname> set to
    COMPUTE_WORK_GROUP_SIZE. This will return an array of three integers
    containing the workgroup size of the compute program as specified by
    its input layout qualifier(s). If <program> is the name of a program that
    has not been successfully linked, or is the name of a linked program object
    that contains no compute shaders, then an INVALID_OPERATION error is
    generated.

        The maximum size of a workgroup may be determined by calling
    GetIntegeri_v with <pname> set to MAX_COMPUTE_WORK_GROUP_SIZE
    and <index> set to 0, 1, or 2 to retrieve the maximum workgroup size in
    the X, Y and Z dimension, respectively. Furthermore, the maximum number of
    invocations in a single workgroup (i.e., the product of the three
    dimensions) may be determined by calling GetIntegerv with <pname> set to
    MAX_COMPUTE_WORK_GROUP_INVOCATIONS.
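
    The two limits above apply together: each dimension has its own maximum,
    and the product of the three dimensions has a separate maximum. A minimal
    C sketch of that combined check follows. The function name
    work_group_size_ok is hypothetical, and the limit values in the note
    below are placeholders rather than values guaranteed by any
    implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a workgroup size (for example the one a shader declares with
 * an input layout qualifier) fits both the per-dimension limits
 * (MAX_COMPUTE_WORK_GROUP_SIZE) and the total invocation limit
 * (MAX_COMPUTE_WORK_GROUP_INVOCATIONS), 0 otherwise. */
static int work_group_size_ok(const uint32_t size[3],
                              const uint32_t max_size[3],
                              uint32_t max_invocations)
{
    uint64_t product = 1;

    for (int i = 0; i < 3; ++i) {
        if (size[i] > max_size[i])
            return 0;
        product *= size[i];
        /* Stop early so the running product cannot overflow 64 bits. */
        if (product > max_invocations)
            return 0;
    }
    return 1;
}
```

    With placeholder limits of {1024, 1024, 64} per dimension and 1024
    invocations in total, a size of {16, 16, 4} is accepted, while
    {32, 32, 2} is rejected by the product limit even though each dimension
    is individually within range.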

        The command

        void DispatchComputeIndirect(intptr indirect);

    is equivalent (assuming no errors are generated) to calling
    DispatchCompute with <num_groups_x>, <num_groups_y> and <num_groups_z>
    initialized with the three uint values contained in the buffer currently
    bound to the DISPATCH_INDIRECT_BUFFER binding at an offset, in basic
    machine units, specified by <indirect>.  The error INVALID_VALUE is
    generated if <indirect> is less than zero or is not a multiple of four.
    The error INVALID_OPERATION is generated if no buffer is bound to
    DISPATCH_INDIRECT_BUFFER, if the command would source data beyond the end
    of the buffer object, or if there is no active program for the compute
    shader stage.  If any of <num_groups_x>, <num_groups_y> or <num_groups_z>
    is greater than MAX_COMPUTE_WORK_GROUP_COUNT for the corresponding
    dimension then the results are undefined.
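
    Because over-limit counts read from the buffer give undefined results
    rather than an error, an application that fills the indirect buffer from
    the CPU may want to validate the counts before storing them. The C sketch
    below writes one command into a CPU-side staging copy of the buffer. The
    function write_dispatch_command and its parameters are hypothetical, and
    uploading the staging memory to the buffer object (for example with
    BufferSubData) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stores {x, y, z} group counts at byte <offset> of a CPU-side staging copy
 * of the indirect buffer. Returns 1 on success, 0 if the command is
 * rejected; on rejection the staging memory is left untouched. */
static int write_dispatch_command(unsigned char *staging, size_t staging_size,
                                  size_t offset,
                                  const uint32_t num_groups[3],
                                  const uint32_t max_count[3])
{
    const size_t command_size = 3 * sizeof(uint32_t);

    assert(staging != NULL);

    /* The offset must be uint-aligned and the command must fit. */
    if (offset % sizeof(uint32_t) != 0)
        return 0;
    if (offset > staging_size || staging_size - offset < command_size)
        return 0;

    /* Over-limit counts are undefined for indirect dispatch, so they are
     * refused here instead of being written. */
    for (int i = 0; i < 3; ++i) {
        if (num_groups[i] > max_count[i])
            return 0;
    }

    memcpy(staging + offset, num_groups, command_size);
    return 1;
}
```

    The byte offset used here is the same value that would later be passed as
    <indirect> once the staging memory has been copied into the buffer bound
    to DISPATCH_INDIRECT_BUFFER.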
4395bd8deadSopenharmony_ci
4405bd8deadSopenharmony_ci    Add Subsection 5.5.1, "Compute Shader Variables"
4415bd8deadSopenharmony_ci
        Compute shaders can access variables belonging to the current program
    object. The amount of storage in the default uniform block accessed by a
    compute shader is specified by the value of the implementation-dependent
    constant MAX_COMPUTE_UNIFORM_COMPONENTS. The total amount of
    combined storage available for uniform variables in all uniform blocks
    accessed by a compute shader (including the default uniform block) is
    specified by the implementation-dependent constant
    MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS.
4505bd8deadSopenharmony_ci
        There is a limit to the total size of all variables declared as
    <shared> in a single program object. This limit, expressed in basic
    machine units, may be queried as the value of
    MAX_COMPUTE_SHARED_MEMORY_SIZE.
4555bd8deadSopenharmony_ci
4565bd8deadSopenharmony_ciAdditions to Chapter 6 of the OpenGL 4.2 (Core Profile) Specification
4575bd8deadSopenharmony_ci(State and State Requests)
4585bd8deadSopenharmony_ci
4595bd8deadSopenharmony_ci    None.
4605bd8deadSopenharmony_ci
4615bd8deadSopenharmony_ciAdditions to Chapter 2 of the OpenGL Shading Language Specification, Version
4625bd8deadSopenharmony_ci4.20 (Overview of OpenGL Shading)
4635bd8deadSopenharmony_ci
4645bd8deadSopenharmony_ci    Replace the last sentence of the first paragraph of the overview with
4655bd8deadSopenharmony_ci    the following:
4665bd8deadSopenharmony_ci
4675bd8deadSopenharmony_ci    "Currently, these processors are the vertex, tessellation control,
4685bd8deadSopenharmony_ci     tessellation evaluation, geometry, fragment, and compute processors."
4695bd8deadSopenharmony_ci
4705bd8deadSopenharmony_ci    Replace the last sentence of the second paragraph of the overview with
4715bd8deadSopenharmony_ci    the following:
4725bd8deadSopenharmony_ci
4735bd8deadSopenharmony_ci    "The specific languages will be referred to by the name of the processor
4745bd8deadSopenharmony_ci     they target: vertex, tessellation control, tessellation evaluation,
4755bd8deadSopenharmony_ci     geometry, fragment, or compute."
4765bd8deadSopenharmony_ci
4775bd8deadSopenharmony_ci    Add a new Section 2.6 titled "Compute Processor" with the following text:
4785bd8deadSopenharmony_ci
4795bd8deadSopenharmony_ci    "The <compute processor> is a programmable unit that operates independently
4805bd8deadSopenharmony_ci    from the other shader processors. Compilation units written in the OpenGL
4815bd8deadSopenharmony_ci    Shading Language to run on this processor are called <compute shaders>.
4825bd8deadSopenharmony_ci    When a complete set of compute shaders are compiled and linked, they
4835bd8deadSopenharmony_ci    result in a <compute shader executable> that runs on the compute processor.
4845bd8deadSopenharmony_ci
4855bd8deadSopenharmony_ci    A compute shader has access to many of the same resources as fragment and
4865bd8deadSopenharmony_ci    other shader processors, such as textures, buffers, image variables,
4875bd8deadSopenharmony_ci    atomic counters, and so on. It does not have any predefined inputs
4885bd8deadSopenharmony_ci    nor any fixed-function outputs.  It is not part of the graphics pipeline
4895bd8deadSopenharmony_ci    and its visible side effects are through actions on images, storage
4905bd8deadSopenharmony_ci    buffers, and atomic counters.
4915bd8deadSopenharmony_ci
    A compute shader operates on a group of work items called a workgroup.
    A workgroup is a collection of shader invocations that execute the same
    code, potentially in parallel. An invocation within a workgroup may share
    data with other members of the same workgroup through shared variables
    and issue memory and control barriers to synchronize with other members
    of the same workgroup."
4975bd8deadSopenharmony_ci
4985bd8deadSopenharmony_ciAdditions to Chapter 4 of the OpenGL Shading Language Specification, Version
4995bd8deadSopenharmony_ci4.20 (Variables and Types)
5005bd8deadSopenharmony_ci
5015bd8deadSopenharmony_ci    Modify section 4.4.1, second paragraph from
5025bd8deadSopenharmony_ci
5035bd8deadSopenharmony_ci    "All shaders allow input layout qualifiers on input variable declarations."
5045bd8deadSopenharmony_ci
5055bd8deadSopenharmony_ci    to
5065bd8deadSopenharmony_ci
5075bd8deadSopenharmony_ci    "All shaders, except compute shaders, allow input layout location qualifiers on
5085bd8deadSopenharmony_ci     input variable declarations."
5095bd8deadSopenharmony_ci
5105bd8deadSopenharmony_ci    Modify Section 4.3. Add to the table at the start of Section 4.3:
5115bd8deadSopenharmony_ci
5125bd8deadSopenharmony_ci    +-------------------+-----------------------------------------------------------+
5135bd8deadSopenharmony_ci    | Storage Qualifier | Meaning                                                   |
5145bd8deadSopenharmony_ci    +-------------------+-----------------------------------------------------------+
5155bd8deadSopenharmony_ci    | <shared>          | variable storage is shared across all work items in a     |
5165bd8deadSopenharmony_ci    |                   | workgroup for compute shaders                             |
5175bd8deadSopenharmony_ci    +-------------------+-----------------------------------------------------------+
5185bd8deadSopenharmony_ci
5195bd8deadSopenharmony_ci    Add the following paragraph to Section 4.3.4, "Input Variables"
5205bd8deadSopenharmony_ci
5215bd8deadSopenharmony_ci        Compute shaders do not permit user-defined input variables and do not
5225bd8deadSopenharmony_ci    form a formal interface with any other shader stage. See section 7.1
5235bd8deadSopenharmony_ci    for a description of built-in compute shader input variables. All other
5245bd8deadSopenharmony_ci    input to a compute shader is retrieved explicitly through image loads,
5255bd8deadSopenharmony_ci    texture fetches, loads from uniforms or uniform buffers, or other user
5265bd8deadSopenharmony_ci    supplied code. Redeclaration of built-in input variables in compute
5275bd8deadSopenharmony_ci    shaders is not permitted.
5285bd8deadSopenharmony_ci
5295bd8deadSopenharmony_ci    Add the following paragraph to Section 4.3.6, "Output Variables"
5305bd8deadSopenharmony_ci
        Compute shaders have no built-in output variables, do not support
    user-defined output variables, and do not form a formal interface with any
    other shader stage. All outputs from a compute shader take the form of
    side effects such as image stores and operations on atomic counters.
5355bd8deadSopenharmony_ci
5365bd8deadSopenharmony_ci    Add Section 4.3.7, "Shared", renumber subsequent sections
5375bd8deadSopenharmony_ci
5385bd8deadSopenharmony_ci        The <shared> qualifier is used to declare variables that have storage
5395bd8deadSopenharmony_ci    shared between all work items of a compute shader workgroup.
5405bd8deadSopenharmony_ci    Variables declared as <shared> may only be used in compute shaders
5415bd8deadSopenharmony_ci    (see Section 5.5, "Compute Shaders"). Shared variables are implicitly
5425bd8deadSopenharmony_ci    coherent. That is, writes to shared variables from one shader invocation
5435bd8deadSopenharmony_ci    will eventually be seen by other invocations within the same workgroup.
5445bd8deadSopenharmony_ci
        Variables declared as <shared> may not have initializers and their
    contents are undefined at the beginning of shader execution. Any data
    written to <shared> variables will be visible to other invocations
    executing the same shader within the same workgroup. The order of
    execution with regard to reads and writes to the same <shared> variables
    by different invocations of a shader is not defined. In order to achieve
    ordering with respect to reads and writes to <shared> variables, memory
    barriers must be employed using the barrier() function (see Section 8.15).
5535bd8deadSopenharmony_ci
        There is a limit to the total size of all variables declared as
    <shared> in a single program object. This limit, expressed in basic
    machine units, may be determined by using the OpenGL API to query the
    value of MAX_COMPUTE_SHARED_MEMORY_SIZE.
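
    The write, barrier, read pattern described for <shared> variables can be
    modeled on the CPU as two sequential phases. The C sketch below is only
    an analogy under stated assumptions: WG_SIZE and simulate_workgroup are
    hypothetical names, the loops stand in for parallel invocations, and the
    gap between the loops stands in for barrier().

```c
#include <assert.h>
#include <stdint.h>

#define WG_SIZE 8u  /* hypothetical local_size_x of the modeled shader */

/* Models one workgroup: out[i] receives the value that invocation
 * (i + 1) % WG_SIZE staged in shared storage. */
void simulate_workgroup(const uint32_t in[WG_SIZE], uint32_t out[WG_SIZE])
{
    uint32_t shared_mem[WG_SIZE];  /* no initializer: contents start undefined */

    assert(in != 0 && out != 0);

    /* Phase 1: each invocation writes only its own slot. */
    for (uint32_t i = 0; i < WG_SIZE; ++i)
        shared_mem[i] = in[i];

    /* barrier(): modeled by completing phase 1 for every invocation
     * before any invocation begins phase 2. */

    /* Phase 2: each invocation reads a slot written by a different one. */
    for (uint32_t i = 0; i < WG_SIZE; ++i)
        out[i] = shared_mem[(i + 1u) % WG_SIZE];
}
```

    Without the phase separation, an invocation in phase 2 could read a slot
    that its neighbor has not yet written, which is the undefined ordering
    the text above warns about.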
5585bd8deadSopenharmony_ci
5595bd8deadSopenharmony_ci    Add Section 4.4.1.4, "Compute-Shader Inputs"
5605bd8deadSopenharmony_ci
5615bd8deadSopenharmony_ci    There are no layout location qualifiers for compute shader inputs.
5625bd8deadSopenharmony_ci
5635bd8deadSopenharmony_ci    Layout qualifier identifiers for compute shader inputs are the workgroup
5645bd8deadSopenharmony_ci    size qualifiers:
5655bd8deadSopenharmony_ci
5665bd8deadSopenharmony_ci        layout-qualifier-id
5675bd8deadSopenharmony_ci            local_size_x = integer-constant
5685bd8deadSopenharmony_ci            local_size_y = integer-constant
5695bd8deadSopenharmony_ci            local_size_z = integer-constant
5705bd8deadSopenharmony_ci
5715bd8deadSopenharmony_ci    <local_size_x>, <local_size_y>, and <local_size_z> are used to define the
5725bd8deadSopenharmony_ci    local size of the kernel defined by the compute shader in the first,
5735bd8deadSopenharmony_ci    second, and third dimension, respectively. The default size in each
5745bd8deadSopenharmony_ci    dimension is 1. If a shader does not specify a size for one of the
5755bd8deadSopenharmony_ci    dimensions, that dimension will have a size of 1.
5765bd8deadSopenharmony_ci
5775bd8deadSopenharmony_ci    For example, the following declaration in a compute shader
5785bd8deadSopenharmony_ci
5795bd8deadSopenharmony_ci        layout (local_size_x = 32, local_size_y = 32) in;
5805bd8deadSopenharmony_ci
    is used to declare a two-dimensional compute shader with a local size of
    32 x 32 elements, which is equivalent to a three-dimensional compute
    shader where the third dimension is one element deep.
5845bd8deadSopenharmony_ci
5855bd8deadSopenharmony_ci    As another example, the declaration
5865bd8deadSopenharmony_ci
5875bd8deadSopenharmony_ci        layout (local_size_x = 8) in;
5885bd8deadSopenharmony_ci
5895bd8deadSopenharmony_ci    effectively specifies that a one-dimensional compute shader is being
5905bd8deadSopenharmony_ci    compiled, and its size is 8 elements.
5915bd8deadSopenharmony_ci
5925bd8deadSopenharmony_ci        If the local size of the shader in any dimension is greater than the
5935bd8deadSopenharmony_ci    maximum size supported by the implementation for that dimension, a
5945bd8deadSopenharmony_ci    compile-time error results. Also, if such a layout qualifier is declared more
5955bd8deadSopenharmony_ci    than once in the same shader, all those declarations must indicate the same
5965bd8deadSopenharmony_ci    workgroup size; otherwise a compile-time error results. If multiple compute
5975bd8deadSopenharmony_ci    shaders attached to a single program object declare the workgroup size,
5985bd8deadSopenharmony_ci    the declarations must be identical; otherwise a link-time error results.
5995bd8deadSopenharmony_ci    Furthermore, if a program object contains any compute shaders, at
6005bd8deadSopenharmony_ci    least one must contain an input layout qualifier specifying the
6015bd8deadSopenharmony_ci    workgroup sizes of the program, or a link-time error will occur.
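
    The defaulting and per-dimension limit rules above can be sketched in C
    as follows. This is an illustration only: resolve_local_size is a
    hypothetical helper, a requested size of 0 is used here to mean "not
    specified", and the limits are assumed to come from a query of
    MAX_COMPUTE_WORK_GROUP_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* requested[i] == 0 means the matching local_size_* qualifier was omitted
 * (a convention of this sketch, not of the shading language).
 * max_size[i] is the assumed per-dimension limit of the implementation.
 * Returns 1 and fills resolved[] on success, 0 if a dimension exceeds its
 * limit (the case the text treats as a compile-time error). */
int resolve_local_size(const uint32_t requested[3],
                       const uint32_t max_size[3],
                       uint32_t resolved[3])
{
    assert(requested != 0 && max_size != 0 && resolved != 0);

    for (int i = 0; i < 3; ++i) {
        /* Unspecified dimensions default to a size of 1. */
        resolved[i] = requested[i] ? requested[i] : 1u;
        if (resolved[i] > max_size[i])
            return 0;
    }
    return 1;
}
```

    With limits of { 1024, 1024, 64 }, a request of { 32, 32, 0 } resolves to
    { 32, 32, 1 }, matching the first declaration example above, while
    { 1, 1, 128 } is rejected.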
6025bd8deadSopenharmony_ci
6035bd8deadSopenharmony_ciAdditions to Chapter 7 of the OpenGL Shading Language Specification, Version
6045bd8deadSopenharmony_ci4.20 (Built-in Variables)
6055bd8deadSopenharmony_ci
6065bd8deadSopenharmony_ci    Add to the start of Section 7.1, "Built-In Language Variables", before the
6075bd8deadSopenharmony_ci    description of the vertex language built-in variables:
6085bd8deadSopenharmony_ci
6095bd8deadSopenharmony_ci        In the compute language, the built-in variables are declared as follows:
6105bd8deadSopenharmony_ci
6115bd8deadSopenharmony_ci        // workgroup dimensions
6125bd8deadSopenharmony_ci        in    uvec3 gl_NumWorkGroups;
6135bd8deadSopenharmony_ci        const uvec3 gl_WorkGroupSize;
6145bd8deadSopenharmony_ci
6155bd8deadSopenharmony_ci        // workgroup and invocation IDs
6165bd8deadSopenharmony_ci        in    uvec3 gl_WorkGroupID;
6175bd8deadSopenharmony_ci        in    uvec3 gl_LocalInvocationID;
6185bd8deadSopenharmony_ci
6195bd8deadSopenharmony_ci        // derived variables
6205bd8deadSopenharmony_ci        in    uvec3 gl_GlobalInvocationID;
6215bd8deadSopenharmony_ci        in    uint  gl_LocalInvocationIndex;
6225bd8deadSopenharmony_ci
    Add to the end of Section 7.1, before Section 7.1.1:
6245bd8deadSopenharmony_ci
        The built-in variable <gl_NumWorkGroups> is a compute-shader input
    variable containing the number of workgroups in each dimension of the
    dispatch that will execute the compute shader. Its content is equal to
    the values specified in the <num_groups_x>, <num_groups_y>, and
    <num_groups_z> parameters passed to the DispatchCompute API entry point.
6315bd8deadSopenharmony_ci
6325bd8deadSopenharmony_ci        The built-in constant <gl_WorkGroupSize> is a compute-shader constant
6335bd8deadSopenharmony_ci    containing the workgroup size of the shader. The size of the workgroup
6345bd8deadSopenharmony_ci    in the X, Y, and Z dimensions is stored in the x, y, and z components.
6355bd8deadSopenharmony_ci    The values stored in <gl_WorkGroupSize> match those specified in the
6365bd8deadSopenharmony_ci    required <local_size_x>, <local_size_y>, and <local_size_z> layout
6375bd8deadSopenharmony_ci    qualifiers for the current shader. This value is constant so that
6385bd8deadSopenharmony_ci    it can be used to size arrays of memory that can be shared within
6395bd8deadSopenharmony_ci    the workgroup.
6405bd8deadSopenharmony_ci
6415bd8deadSopenharmony_ci        The built-in variable <gl_WorkGroupID> is a compute-shader input
6425bd8deadSopenharmony_ci    variable containing the 3-dimensional index of the global workgroup
6435bd8deadSopenharmony_ci    that the current invocation is executing in. The possible values range
6445bd8deadSopenharmony_ci    across the parameters passed into DispatchCompute, i.e., from (0, 0, 0) to
6455bd8deadSopenharmony_ci    (gl_NumWorkGroups.x - 1, gl_NumWorkGroups.y - 1, gl_NumWorkGroups.z - 1).
6465bd8deadSopenharmony_ci
        The built-in variable <gl_LocalInvocationID> is a compute-shader input
    variable containing the 3-dimensional index of the current invocation
    within the workgroup that it is executing in. The possible values for
    this variable range across the workgroup size, i.e., from (0, 0, 0) to
    (gl_WorkGroupSize.x - 1, gl_WorkGroupSize.y - 1, gl_WorkGroupSize.z - 1).
6535bd8deadSopenharmony_ci
6545bd8deadSopenharmony_ci        The built-in variable <gl_GlobalInvocationID> is a compute shader input
6555bd8deadSopenharmony_ci    variable containing the global index of the current work item.  This
6565bd8deadSopenharmony_ci    value uniquely identifies this invocation from all other invocations
6575bd8deadSopenharmony_ci    across all workgroups initiated by the current
6585bd8deadSopenharmony_ci    DispatchCompute call.  This is computed as:
6595bd8deadSopenharmony_ci
6605bd8deadSopenharmony_ci        gl_GlobalInvocationID =
6615bd8deadSopenharmony_ci            gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID.
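
    The component-wise arithmetic can be checked on the CPU with a small C
    sketch. UVec3 and global_invocation_id are hypothetical stand-ins for the
    GLSL uvec3 type and the built-in variable; they are not part of any API.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t x, y, z; } UVec3;  /* stand-in for GLSL uvec3 */

/* Component-wise: work_group_id * work_group_size + local_invocation_id. */
UVec3 global_invocation_id(UVec3 work_group_id,
                           UVec3 work_group_size,
                           UVec3 local_invocation_id)
{
    UVec3 g;

    /* A local ID always lies inside the workgroup size. */
    assert(local_invocation_id.x < work_group_size.x);
    assert(local_invocation_id.y < work_group_size.y);
    assert(local_invocation_id.z < work_group_size.z);

    g.x = work_group_id.x * work_group_size.x + local_invocation_id.x;
    g.y = work_group_id.y * work_group_size.y + local_invocation_id.y;
    g.z = work_group_id.z * work_group_size.z + local_invocation_id.z;
    return g;
}
```

    For a 32 x 32 x 1 workgroup, invocation (5, 7, 0) of workgroup (2, 0, 1)
    has the global ID (69, 7, 1).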
6625bd8deadSopenharmony_ci
        The built-in variable <gl_LocalInvocationIndex> is a compute-shader
    input variable that contains the 1-dimensional representation of
    gl_LocalInvocationID. This is useful for identifying a unique region of
    shared memory within the workgroup for this invocation to use. This is
    computed as:
6685bd8deadSopenharmony_ci
6695bd8deadSopenharmony_ci        gl_LocalInvocationIndex =
6705bd8deadSopenharmony_ci            gl_LocalInvocationID.z * gl_WorkGroupSize.x * gl_WorkGroupSize.y +
6715bd8deadSopenharmony_ci            gl_LocalInvocationID.y * gl_WorkGroupSize.x +
6725bd8deadSopenharmony_ci            gl_LocalInvocationID.x;
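
    The flattening can be reproduced on the CPU to confirm that every
    invocation in a workgroup receives a distinct index. In the C sketch
    below, UVec3 and local_invocation_index are hypothetical names used for
    illustration only.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t x, y, z; } UVec3;  /* stand-in for GLSL uvec3 */

/* Flattens a 3-dimensional local ID with x varying fastest, then y, then z. */
uint32_t local_invocation_index(UVec3 local_invocation_id,
                                UVec3 work_group_size)
{
    assert(local_invocation_id.x < work_group_size.x);
    assert(local_invocation_id.y < work_group_size.y);
    assert(local_invocation_id.z < work_group_size.z);

    return local_invocation_id.z * work_group_size.x * work_group_size.y +
           local_invocation_id.y * work_group_size.x +
           local_invocation_id.x;
}
```

    For a 4 x 2 x 3 workgroup the indices run from 0 for (0, 0, 0) to 23 for
    (3, 1, 2), so an invocation can use its index directly as a slot in a
    shared array of 24 elements.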
6735bd8deadSopenharmony_ci
6745bd8deadSopenharmony_ci    Add to the list of built-in constants in Section 7.3:
6755bd8deadSopenharmony_ci
6765bd8deadSopenharmony_ci        const ivec3 gl_MaxComputeWorkGroupCount = { 65535, 65535, 65535 };
6775bd8deadSopenharmony_ci        const ivec3 gl_MaxComputeWorkGroupSize = { 1024, 1024, 64 };
6785bd8deadSopenharmony_ci        const int gl_MaxComputeUniformComponents = 512;
6795bd8deadSopenharmony_ci        const int gl_MaxComputeTextureImageUnits = 16;
6805bd8deadSopenharmony_ci        const int gl_MaxComputeImageUniforms = 8;
6815bd8deadSopenharmony_ci        const int gl_MaxComputeAtomicCounters = 8;
6825bd8deadSopenharmony_ci        const int gl_MaxComputeAtomicCounterBuffers = 1;
6835bd8deadSopenharmony_ci
Additions to Chapter 8 of the OpenGL Shading Language Specification, Version
4.20 (Built-in Functions)
6865bd8deadSopenharmony_ci
6875bd8deadSopenharmony_ci    Insert "Atomic Memory Functions" section after Section 8.10, Atomic
6885bd8deadSopenharmony_ci    Counter Functions (p. 149).  Atomic memory operations are supported on
6895bd8deadSopenharmony_ci    shared variables; the set of operations and their definitions are similar
6905bd8deadSopenharmony_ci    to those for the imageAtomic*() functions.  These functions are fully
6915bd8deadSopenharmony_ci    documented in the ARB_shader_storage_buffer_object extension (see
6925bd8deadSopenharmony_ci    dependencies).
6935bd8deadSopenharmony_ci
6945bd8deadSopenharmony_ci    Modify the first paragraph of Section 8.15, "Shader Invocation Control
6955bd8deadSopenharmony_ci    Functions" to read:
6965bd8deadSopenharmony_ci
6975bd8deadSopenharmony_ci        The shader invocation control function is only available in tessellation
6985bd8deadSopenharmony_ci    control shaders and compute shaders. It is used to control the relative
6995bd8deadSopenharmony_ci    execution order of multiple shader invocations used to process a patch
7005bd8deadSopenharmony_ci    (in the case of tessellation control shaders) or a workgroup (in the
7015bd8deadSopenharmony_ci    case of compute shaders), which are otherwise executed with an undefined
7025bd8deadSopenharmony_ci    order.
7035bd8deadSopenharmony_ci
7045bd8deadSopenharmony_ci    +----------------+--------------------------------------------------------------------------+
7055bd8deadSopenharmony_ci    | Syntax         | Description                                                              |
7065bd8deadSopenharmony_ci    +----------------+--------------------------------------------------------------------------+
7075bd8deadSopenharmony_ci    | barrier        | For any given static instance of barrier() appearing in a tessellation   |
7085bd8deadSopenharmony_ci    |                | control shader or compute shader, all invocations for a single patch     |
7095bd8deadSopenharmony_ci    |                | or workgroup, respectively, must enter it before any will continue       |
7105bd8deadSopenharmony_ci    |                | beyond it.                                                               |
7115bd8deadSopenharmony_ci    +----------------+--------------------------------------------------------------------------+
7125bd8deadSopenharmony_ci
7135bd8deadSopenharmony_ci    Modify the second paragraph as follows:
7145bd8deadSopenharmony_ci
    ... Because invocations may execute in an undefined order between these
    barrier calls, the values of a per-vertex or per-patch output variable in
    a tessellation control shader or shared variables for compute shaders
    will be undefined in a number of cases enumerated in Section 4.3.6 "Output
    Variables" (for tessellation control shaders) and Section 4.3.7 "Shared
    Variables" (for compute shaders).
7215bd8deadSopenharmony_ci
7225bd8deadSopenharmony_ci    Replace the third paragraph with the following:
7235bd8deadSopenharmony_ci
7245bd8deadSopenharmony_ci    For tessellation control shaders, the barrier() function may only be
7255bd8deadSopenharmony_ci    placed inside the function main() of the tessellation control shader and
7265bd8deadSopenharmony_ci    may not be called within any control flow. Barriers are also disallowed
7275bd8deadSopenharmony_ci    after a return statement in the function main(). Any such misplaced
7285bd8deadSopenharmony_ci    barriers result in a compile-time error.
7295bd8deadSopenharmony_ci
7305bd8deadSopenharmony_ci    For compute shaders, the barrier() function may be placed within flow
7315bd8deadSopenharmony_ci    control, but that flow control must be uniform flow control. That is, all
7325bd8deadSopenharmony_ci    the controlling expressions that lead to execution of the barrier must be
7335bd8deadSopenharmony_ci    dynamically uniform expressions. This ensures that if any shader
7345bd8deadSopenharmony_ci    invocation enters a conditional statement, then all invocations will enter
7355bd8deadSopenharmony_ci    it. While compilers are encouraged to give warnings if they can detect
7365bd8deadSopenharmony_ci    this might not happen, compilers cannot completely determine this. Hence,
7375bd8deadSopenharmony_ci    it is the author's responsibility to ensure barrier() only exists inside
7385bd8deadSopenharmony_ci    uniform flow control. Otherwise, some shader invocations will stall
7395bd8deadSopenharmony_ci    indefinitely, waiting for a barrier that is never reached by other
7405bd8deadSopenharmony_ci    invocations.
7415bd8deadSopenharmony_ci
    Modify the table of memory control functions on p. 160:
7435bd8deadSopenharmony_ci
7445bd8deadSopenharmony_ci    +-----------------------------------+----------------------------------------------------------------------------------------+
7455bd8deadSopenharmony_ci    | Syntax                            | Description                                                                            |
7465bd8deadSopenharmony_ci    +-----------------------------------+----------------------------------------------------------------------------------------+
7475bd8deadSopenharmony_ci    | void memoryBarrier()              | Control the ordering of all memory transactions issued by a single shader invocation.  |
7485bd8deadSopenharmony_ci    +-----------------------------------+----------------------------------------------------------------------------------------+
7495bd8deadSopenharmony_ci    | void memoryBarrierAtomicCounter() | Control the ordering of accesses to atomic counter variables issued by a single shader |
7505bd8deadSopenharmony_ci    |                                   | invocation.                                                                            |
7515bd8deadSopenharmony_ci    +-----------------------------------+----------------------------------------------------------------------------------------+
7525bd8deadSopenharmony_ci    | void memoryBarrierBuffer()        | Control the ordering of memory transactions to buffer variables issued within a        |
7535bd8deadSopenharmony_ci    |                                   | single shader invocation.                                                              |
7545bd8deadSopenharmony_ci    +-----------------------------------+----------------------------------------------------------------------------------------+
7555bd8deadSopenharmony_ci    | void memoryBarrierImage()         | Control the ordering of memory transactions to images issued within a single shader    |
7565bd8deadSopenharmony_ci    |                                   | invocation.                                                                            |
7575bd8deadSopenharmony_ci    +-----------------------------------+----------------------------------------------------------------------------------------+
7585bd8deadSopenharmony_ci    | void memoryBarrierShared()        | Control the ordering of memory transactions to shared variables issued within a single |
7595bd8deadSopenharmony_ci    |                                   | shader invocation.                                                                     |
7605bd8deadSopenharmony_ci    |                                   | Only available in compute shaders.                                                     |
7615bd8deadSopenharmony_ci    +-----------------------------------+----------------------------------------------------------------------------------------+
7625bd8deadSopenharmony_ci    | void groupMemoryBarrier()         | Control the ordering of all memory transactions issued within a single shader          |
7635bd8deadSopenharmony_ci    |                                   | invocation, as viewed by other invocations in the same workgroup.                      |
7645bd8deadSopenharmony_ci    |                                   | Only available in compute shaders.                                                     |
7655bd8deadSopenharmony_ci    +-----------------------------------+----------------------------------------------------------------------------------------+
7665bd8deadSopenharmony_ci
7675bd8deadSopenharmony_ci    Modify the subsequent paragraph as follows:
7685bd8deadSopenharmony_ci
7695bd8deadSopenharmony_ci    The memory barrier built-in functions can be used to order reads and
7705bd8deadSopenharmony_ci    writes to variables stored in memory accessible to other shader
7715bd8deadSopenharmony_ci    invocations.  When called, these functions will wait for the completion of
7725bd8deadSopenharmony_ci    all reads and writes previously performed by the caller that access
7735bd8deadSopenharmony_ci    selected variable types, and then return with no other effect.  The
7745bd8deadSopenharmony_ci    built-in functions memoryBarrierAtomicCounter(), memoryBarrierBuffer(),
7755bd8deadSopenharmony_ci    memoryBarrierImage(), and memoryBarrierShared() wait for the completion of
7765bd8deadSopenharmony_ci    accesses to atomic counter, buffer, image, and shared variables,
7775bd8deadSopenharmony_ci    respectively.  The built-in functions memoryBarrier() and
7785bd8deadSopenharmony_ci    groupMemoryBarrier() wait for the completion of accesses to all of the
7795bd8deadSopenharmony_ci    above variable types.  The functions memoryBarrierShared() and
7805bd8deadSopenharmony_ci    groupMemoryBarrier() are available only in compute shaders; the other
7815bd8deadSopenharmony_ci    functions are available in all shader types.
7825bd8deadSopenharmony_ci
7835bd8deadSopenharmony_ci    When these functions return, any memory stores performed using coherent
7845bd8deadSopenharmony_ci    variables prior to the call will be visible to any future coherent access
7855bd8deadSopenharmony_ci    to the same memory performed by any other shader invocation.  In
7865bd8deadSopenharmony_ci    particular, the values written this way in one shader stage are guaranteed
7875bd8deadSopenharmony_ci    to be visible to coherent memory accesses performed by shader invocations
7885bd8deadSopenharmony_ci    in subsequent stages when those invocations were triggered by the
7895bd8deadSopenharmony_ci    execution of the original shader invocation (e.g., fragment shader
7905bd8deadSopenharmony_ci    invocations for a primitive resulting from a particular geometry shader
7915bd8deadSopenharmony_ci    invocation).
7925bd8deadSopenharmony_ci
7935bd8deadSopenharmony_ci    Additionally, memory barrier functions order stores performed by the
7945bd8deadSopenharmony_ci    calling invocation, as observed by other shader invocations.  Without
7955bd8deadSopenharmony_ci    memory barriers, if one shader invocation performs two stores to coherent
7965bd8deadSopenharmony_ci    variables, a second shader invocation might see the values written by the
7975bd8deadSopenharmony_ci    second store prior to seeing those written by the first.  However, if the
7985bd8deadSopenharmony_ci    first shader invocation calls a memory barrier function between the two
7995bd8deadSopenharmony_ci    stores, selected other shader invocations will never see the results of
8005bd8deadSopenharmony_ci    the second store before seeing those of the first.  When using the
8015bd8deadSopenharmony_ci    function groupMemoryBarrier(), this ordering guarantee applies only to
8025bd8deadSopenharmony_ci    other shader invocations in the same compute shader workgroup; all other
8035bd8deadSopenharmony_ci    memory barrier functions provide the guarantee to all other shader
8045bd8deadSopenharmony_ci    invocations.  No memory barrier is required to guarantee the order of
8055bd8deadSopenharmony_ci    memory stores as observed by the invocation performing the stores; an
8065bd8deadSopenharmony_ci    invocation reading from a variable that it previously wrote will always
8075bd8deadSopenharmony_ci    see the most recently written value unless another shader invocation also
8085bd8deadSopenharmony_ci    wrote to the same memory.
8095bd8deadSopenharmony_ci
8105bd8deadSopenharmony_ciDependencies on OpenGL 4.3 and ARB_shader_storage_buffer_object
8115bd8deadSopenharmony_ci
8125bd8deadSopenharmony_ci    If OpenGL 4.3 and ARB_shader_storage_buffer_object are not supported, the
8135bd8deadSopenharmony_ci    spec language adding the built-in functions atomicAdd(), atomicMin(),
8145bd8deadSopenharmony_ci    atomicMax(), atomicAnd(), atomicOr(), atomicXor(), atomicExchange(), and
8155bd8deadSopenharmony_ci    atomicCompSwap() should be considered to be incorporated into this
8165bd8deadSopenharmony_ci    extension as-is, except that buffer variables will not be supported and
8175bd8deadSopenharmony_ci    thus cannot be used with these functions.  No "#extension" directive is
8185bd8deadSopenharmony_ci    necessary to use these functions in compute shaders.
8195bd8deadSopenharmony_ci
8205bd8deadSopenharmony_ci    If OpenGL 4.3 and ARB_shader_storage_buffer_object are not supported,
8215bd8deadSopenharmony_ci    references to the GLSL built-in function memoryBarrierBuffer() should be
8225bd8deadSopenharmony_ci    removed.
8235bd8deadSopenharmony_ci
8245bd8deadSopenharmony_ciDependencies on NV_vertex_buffer_unified_memory
8255bd8deadSopenharmony_ci
    If NV_vertex_buffer_unified_memory is supported, a new buffer address
    range and enable are provided to permit the use of
    DispatchComputeIndirect with a resident buffer object without requiring
    that it be bound to the DISPATCH_INDIRECT_BUFFER target.  The following
    additional edits apply:
8315bd8deadSopenharmony_ci
    Accepted by the <target> parameter of GetBufferParameterui64vNV:
8335bd8deadSopenharmony_ci
8345bd8deadSopenharmony_ci        DISPATCH_INDIRECT_BUFFER                        (defined above)
8355bd8deadSopenharmony_ci
8365bd8deadSopenharmony_ci    Accepted by the <cap> parameter of Disable, Enable, and IsEnabled, and by
8375bd8deadSopenharmony_ci    the <pname> parameter of GetIntegerv, GetBooleanv, GetFloatv, GetDoublev
8385bd8deadSopenharmony_ci    and GetInteger64v:
8395bd8deadSopenharmony_ci
8405bd8deadSopenharmony_ci        DISPATCH_INDIRECT_UNIFIED_NV                    0x90FD
8415bd8deadSopenharmony_ci
8425bd8deadSopenharmony_ci    Accepted by the <pname> parameter of BufferAddressRangeNV
8435bd8deadSopenharmony_ci    and the <value> parameter of GetIntegerui64vNV:
8445bd8deadSopenharmony_ci
8455bd8deadSopenharmony_ci        DISPATCH_INDIRECT_ADDRESS_NV                    0x90FE
8465bd8deadSopenharmony_ci
8475bd8deadSopenharmony_ci    Accepted by the <value> parameter of GetIntegerv:
8485bd8deadSopenharmony_ci
8495bd8deadSopenharmony_ci        DISPATCH_INDIRECT_LENGTH_NV                     0x90FF
8505bd8deadSopenharmony_ci
8515bd8deadSopenharmony_ci    Add to the end of Section 5.5, after discussion of
8525bd8deadSopenharmony_ci    DispatchComputeIndirect:
8535bd8deadSopenharmony_ci
8545bd8deadSopenharmony_ci    If DISPATCH_INDIRECT_UNIFIED_NV is enabled, DispatchComputeIndirect does
8555bd8deadSopenharmony_ci    not use the buffer bound to DISPATCH_INDIRECT_BUFFER.  Instead, it sources
8565bd8deadSopenharmony_ci    its arguments from the GPU address range specified by calling
8575bd8deadSopenharmony_ci    BufferAddressRangeNV with a <pname> of DISPATCH_INDIRECT_ADDRESS_NV and an
8585bd8deadSopenharmony_ci    <index> of zero.  The address is obtained by adding the <indirect>
8595bd8deadSopenharmony_ci    parameter to the base address of the range, specified by the <address>
8605bd8deadSopenharmony_ci    parameter of BufferAddressRangeNV.  If the command sources data outside
8615bd8deadSopenharmony_ci    the specified address range, the error INVALID_OPERATION will be
8625bd8deadSopenharmony_ci    generated.  The DISPATCH_INDIRECT_BUFFER binding will be ignored in this
8635bd8deadSopenharmony_ci    case, and no errors will be generated due to the use of this binding.  The
8645bd8deadSopenharmony_ci    error INVALID_VALUE will still be generated if <indirect> is negative.  No
8655bd8deadSopenharmony_ci    INVALID_VALUE error will be generated if <indirect> is not a multiple of
8665bd8deadSopenharmony_ci    four, but INVALID_OPERATION will be generated if the effective address is
8675bd8deadSopenharmony_ci    not a multiple of four.  If the indirect dispatch address range does not
8685bd8deadSopenharmony_ci    belong to a buffer object that is resident at the time of the
8695bd8deadSopenharmony_ci    DispatchComputeIndirect call, undefined results, possibly including
8705bd8deadSopenharmony_ci    program termination, may occur.
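
    As an illustration only, the argument checks described above can be
    modeled in plain C.  The type and function names below are hypothetical,
    the 12-byte argument size assumes that the command sources three 32-bit
    unsigned group counts, the order of the two INVALID_OPERATION checks is
    arbitrary, and buffer residency is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result codes standing in for the GL error values. */
typedef enum {
    UNIFIED_OK,
    UNIFIED_INVALID_VALUE,
    UNIFIED_INVALID_OPERATION
} unified_result;

/* Assumption: the command sources three 32-bit unsigned group counts. */
#define DISPATCH_ARGS_SIZE 12u

/*
 * Models the argument checks applied when DISPATCH_INDIRECT_UNIFIED_NV is
 * enabled.  base and length describe the range passed to
 * BufferAddressRangeNV; indirect is the DispatchComputeIndirect parameter.
 */
unified_result check_unified_dispatch(uint64_t base, uint64_t length,
                                      int64_t indirect)
{
    /* Precondition of this sketch: the address range does not wrap. */
    assert(length <= UINT64_MAX - base);

    if (indirect < 0)
        return UNIFIED_INVALID_VALUE;

    uint64_t offset = (uint64_t)indirect;

    /* The whole argument block must lie inside the address range. */
    if (length < DISPATCH_ARGS_SIZE || offset > length - DISPATCH_ARGS_SIZE)
        return UNIFIED_INVALID_OPERATION;

    /* Only the effective address, not indirect itself, must be aligned. */
    if ((base + offset) % 4u != 0)
        return UNIFIED_INVALID_OPERATION;

    return UNIFIED_OK;
}
```

    Note that an <indirect> value of 2 is accepted by this model when the
    base address is 2 modulo 4, because only the sum is tested for alignment.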

    Add the following to the "Compute Dispatch State" table defined in this
    extension:

    Get Value                           Type    Get Command         Initial Value   Sec     Attribute
    ---------                           ----    -----------         -------------   ---     ---------
    DISPATCH_INDIRECT_UNIFIED_NV         B      IsEnabled               FALSE       5.5     none
    DISPATCH_INDIRECT_ADDRESS_NV        Z64+    GetIntegerui64vNV         0         5.5     none
    DISPATCH_INDIRECT_LENGTH_NV          Z+     GetIntegerv               0         5.5     none

Errors

    INVALID_OPERATION is generated by DispatchCompute or
    DispatchComputeIndirect if there is no active program for the compute
    shader stage.

    INVALID_VALUE is generated by DispatchCompute if any of <num_groups_x>,
    <num_groups_y> or <num_groups_z> is greater than the value of
    MAX_COMPUTE_WORK_GROUP_COUNT for the corresponding dimension.

    INVALID_VALUE is generated by DispatchComputeIndirect if <indirect> is
    less than zero or not a multiple of four.

    INVALID_OPERATION is generated by DispatchComputeIndirect if no buffer is
    bound to DISPATCH_INDIRECT_BUFFER or if the command would source data
    beyond the end of the bound buffer object.

    INVALID_OPERATION is generated by GetProgramiv if <pname> is
    COMPUTE_WORK_GROUP_SIZE and either the program has not been linked
    successfully, or has been linked but contains no compute shaders.

    LinkProgram will fail if <program> contains a combination of compute and
    non-compute shaders.
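
    The dispatch-time errors listed above can be summarized with a small C
    model.  This is a sketch under stated assumptions, not implementation
    code: the names are hypothetical, the order in which the conditions are
    tested is a choice made here (the list above does not define a
    precedence), and the 12-byte argument size assumes three 32-bit unsigned
    group counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for NO_ERROR, INVALID_VALUE, INVALID_OPERATION. */
typedef enum { ERR_NONE, ERR_INVALID_VALUE, ERR_INVALID_OPERATION } error_model;

/* Models the checks made by DispatchCompute. */
error_model check_dispatch(int has_active_compute_program,
                           const uint32_t num_groups[3],
                           const uint32_t max_group_count[3])
{
    assert(num_groups != NULL && max_group_count != NULL);

    if (!has_active_compute_program)
        return ERR_INVALID_OPERATION;

    /* Each count is compared with the limit for its own dimension. */
    for (int i = 0; i < 3; ++i)
        if (num_groups[i] > max_group_count[i])
            return ERR_INVALID_VALUE;

    return ERR_NONE;
}

/* Models the checks made by DispatchComputeIndirect (non-unified path). */
error_model check_dispatch_indirect(int has_active_compute_program,
                                    int has_bound_buffer,
                                    uint64_t buffer_size,
                                    int64_t indirect)
{
    if (!has_active_compute_program)
        return ERR_INVALID_OPERATION;

    if (indirect < 0 || indirect % 4 != 0)
        return ERR_INVALID_VALUE;

    if (!has_bound_buffer)
        return ERR_INVALID_OPERATION;

    /* Assumption: 12 bytes of arguments are read starting at indirect. */
    if (buffer_size < 12 || (uint64_t)indirect > buffer_size - 12)
        return ERR_INVALID_OPERATION;

    return ERR_NONE;
}
```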

New State

    None.

New Implementation Dependent State

    Add to Table 6.31, "Program Pipeline Object State"

    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+
    | Get Value                                          | Type      | Get Command             | Initial Value | Description                                                           | Sec.    |
    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+
    | COMPUTE_SHADER                                     | Z+        | GetProgramPipelineiv    | 0             | Name of current compute shader program object                         | 2.11.4  |
    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+

    Add to Table 6.32, "Program Object State"

    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+
    | Get Value                                          | Type      | Get Command             | Initial Value | Description                                                           | Sec.    |
    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+
    | COMPUTE_WORK_GROUP_SIZE                            | 3 x Z+    | GetProgramiv            | { 0, ... }    | Workgroup size of a linked compute program                            | 5.5     |
    | UNIFORM_BLOCK_REFERENCED_BY_COMPUTE_SHADER         | B         | GetActiveUniformBlockiv | FALSE         | True if uniform block is referenced by the compute stage              | 2.17.7  |
    | ATOMIC_COUNTER_BUFFER_REFERENCED_BY_COMPUTE_SHADER | B         | GetActiveAtomicCounter- | FALSE         | AACB has a counter used by compute shaders                            | 2.17.7  |
    |                                                    |           |   Bufferiv              |               |                                                                       |         |
    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+

    Insert new table named "Compute Dispatch State", after Table 6.46 "Hints":

    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+
    | Get Value                                          | Type      | Get Command             | Initial Value | Description                                                           | Sec.    |
    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+
    | DISPATCH_INDIRECT_BUFFER_BINDING                   | Z+        | GetIntegerv             | 0             | Indirect dispatch buffer binding                                      | 5.5     |
    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+

    Insert Table 6.50, "Implementation Dependent Compute Shader Limits",
    renumber subsequent tables.

    +-----------------------------------------+-----------+---------------+---------------------+-----------------------------------------------------------------------+---------+
    | Get Value                               | Type      | Get Command   | Minimum Value       | Description                                                           | Sec.    |
    +-----------------------------------------+-----------+---------------+---------------------+-----------------------------------------------------------------------+---------+
    | MAX_COMPUTE_WORK_GROUP_COUNT            | 3 x Z+    | GetIntegeri_v | 65535               | Maximum number of workgroups that may be dispatched by a single       | 5.5     |
    |                                         |           |               |                     | dispatch command (per dimension)                                      |         |
    | MAX_COMPUTE_WORK_GROUP_SIZE             | 3 x Z+    | GetIntegeri_v | 1024 (x, y), 64 (z) | Maximum local size of a compute workgroup (per dimension)             | 5.5     |
    | MAX_COMPUTE_WORK_GROUP_INVOCATIONS      | Z+        | GetIntegerv   | 1024                | Maximum total compute shader invocations in a single workgroup        | 5.5     |
    | MAX_COMPUTE_UNIFORM_BLOCKS              | Z+        | GetIntegerv   | 12                  | Maximum number of uniform blocks per compute program                  | 2.11.7  |
    | MAX_COMPUTE_TEXTURE_IMAGE_UNITS         | Z+        | GetIntegerv   | 16                  | Maximum number of texture image units accessible by a compute shader  | 2.11.12 |
    | MAX_COMPUTE_ATOMIC_COUNTER_BUFFERS      | Z+        | GetIntegerv   | 8                   | Number of atomic counter buffers accessed by a compute shader         | 2.11.17 |
    | MAX_COMPUTE_ATOMIC_COUNTERS             | Z+        | GetIntegerv   | 8                   | Number of atomic counters accessed by a compute shader                | 2.11.12 |
    | MAX_COMPUTE_SHARED_MEMORY_SIZE          | Z+        | GetIntegerv   | 32768               | Maximum total storage size of all variables declared as <shared> in   |         |
    |                                         |           |               |                     | all compute shaders linked into a single program object               |         |
    | MAX_COMPUTE_UNIFORM_COMPONENTS          | Z+        | GetIntegerv   | 512                 | Number of components for compute shader uniform variables             | 5.5.1   |
    | MAX_COMPUTE_IMAGE_UNIFORMS              | Z+        | GetIntegerv   | 8                   | Number of image variables in compute shaders                          | 2.11.12 |
    | MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS | Z+        | GetIntegerv   | *                   | Number of words for compute shader uniform variables in all uniform   | 5.5.1   |
    |                                         |           |               |                     | blocks, including the default                                         |         |
    +-----------------------------------------+-----------+---------------+---------------------+-----------------------------------------------------------------------+---------+

    Modify Table 6.55, increasing the following minimum values:

           MAX_COMBINED_TEXTURE_IMAGE_UNITS     96 (6*16), was 80
           MAX_UNIFORM_BUFFER_BINDINGS          72 (6*12), was 60

Issues

    1) Should <shared> variables be usable only in compute shaders, or in other
       stages too?

       RESOLVED:  Support only in compute shaders.  While some hardware may be
       able to support shared variables in shader stages other than compute,
       it is difficult to clearly define what the semantics are as far as
       sharing. For example, what is the equivalent for a workgroup for
       vertex shaders?

    2) Can we expose atomics on <shared> variables?

       RESOLVED:  Yes.  The existing atomics in OpenGL 4.2 (via image
       variables) don't map well to the <shared> declaration.  Instead, we've
       defined new atomic functions that take a variable as a first input.
       These functions are specified in the ARB_shader_storage_buffer_object
       extension and are incorporated into this extension via the interaction
       described above.  We could have also chosen to define operators +=, &=,
       etc. to be atomic when applied to <shared> variables, but shaders may
       want to use such variables in cases where atomic access (and the
       related overhead) is not required.

    3) Should the local size and dimensions of the workgroup be specified at
       compile time? What are the default local dimensions?

       RESOLVED: Dimension is always 3 and a workgroup size declaration is
       compulsory at compile time. There is no default. The value used is
       queriable.  To use a 1- or 2-dimensional workgroup, the extra
       dimension(s) can be set to 1.
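
       For illustration, a declared local size can be checked against the
       per-dimension and total-invocation limits from the "Implementation
       Dependent Compute Shader Limits" table with a few lines of C.  The
       function and constant names are hypothetical, and treating a
       zero-sized dimension as invalid is an assumption of this sketch.  The
       constants hold the minimum values the table guarantees; a real
       application would query the actual limits instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimum values guaranteed for MAX_COMPUTE_WORK_GROUP_SIZE (x, y, z) and
 * MAX_COMPUTE_WORK_GROUP_INVOCATIONS. */
static const uint32_t MIN_MAX_WORK_GROUP_SIZE[3] = {1024, 1024, 64};
static const uint32_t MIN_MAX_WORK_GROUP_INVOCATIONS = 1024;

/* Returns 1 if the three-dimensional local size respects both limits,
 * 0 otherwise.  Unused dimensions are expected to be 1. */
int local_size_fits(const uint32_t local_size[3],
                    const uint32_t max_size[3],
                    uint32_t max_invocations)
{
    assert(local_size != NULL && max_size != NULL);

    uint64_t total = 1;
    for (int i = 0; i < 3; ++i) {
        /* Assumption: a zero-sized dimension is rejected. */
        if (local_size[i] == 0 || local_size[i] > max_size[i])
            return 0;
        total *= local_size[i];
        /* Checking after each step keeps the product within 64 bits. */
        if (total > max_invocations)
            return 0;
    }
    return 1;
}
```

       A 1-dimensional workgroup of 64 invocations is expressed as
       { 64, 1, 1 }, while { 32, 32, 2 } is rejected because it exceeds the
       total-invocation limit even though each dimension is within range.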

    4) Do we need the local_work_size parameter in dispatch if the local size
       may be specified at compile time in the shader?

       RESOLVED: The specification of the workgroup size is now mandatory in
       the shader source at compile time and the local_work_size may no longer
       be specified at dispatch time.

    5) How do multiple shaders attached to a single program object work?

       RESOLVED:  Just as with any other shader stage. Exactly one of the
       shaders must provide the 'main' entry point. All shaders attached to a
       program object effectively get compiled into a single, large program at
       link time.  The program is dispatched as one big entity. Über shader
       type functionality can be achieved through the use of subroutine
       uniforms, which also work exactly as for other shader stages.

    6) Should compute dispatch honor conditional rendering?

       RESOLVED: Yes, it does honor conditional rendering.

    7) Is it possible to pass compute programs to UseProgram, etc.?

       RESOLVED: Yes, compute programs can be made current via UseProgram and
       can be made current in a program pipeline object via UseProgramStages.
       Note that a compute program must be linked with PROGRAM_SEPARABLE set
       to TRUE to be passed to UseProgramStages, even though the compute
       pipeline has only a single shader stage.

       The active compute program that will be used by DispatchCompute will be
       determined in the same manner as the active program for any other
       program stage:

         * If there is a current program specified via UseProgram, that
           program is considered current for all stages, including compute.

         * Otherwise, if there is a current program pipeline object, the
           program current for the compute stage of the pipeline object is
           considered current for the compute stage.

         * If neither of the former apply, no program is current for the
           compute stage.

       The program that is current for the compute stage is considered to be
       active if and only if it has a compute shader executable.  For example,
       if a non-compute program is made current via UseProgram, it will also
       be considered "current" for the compute stage, but won't be considered
       active.

       When using program pipeline objects, it's possible to switch between
       graphics and compute work without switching programs.  For example, in:

         glBindProgramPipeline(pipeline);
         glUseProgramStages(pipeline, GL_VERTEX_SHADER_BIT, programA);
         glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, programB);
         glUseProgramStages(pipeline, GL_COMPUTE_SHADER_BIT, programC);
         glDrawArrays(GL_TRIANGLES, 0, 900);
         glDispatchCompute(5, 5, 5);

       the triangles will be processed by programA and programB, while the
       compute dispatch will be processed by programC.  Similarly,

         glUseProgramStages(pipeline, ~GL_COMPUTE_SHADER_BIT, programAB);
         glUseProgramStages(pipeline, GL_COMPUTE_SHADER_BIT, programC);
         glDrawArrays(GL_TRIANGLES, 0, 900);
         glDispatchCompute(5, 5, 5);

       will have the triangles processed by the multi-stage programAB.
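
       The "current" versus "active" rules in this issue can be modeled
       without any GL calls.  The following C sketch is illustrative only:
       the struct and function names are hypothetical, and a program is
       reduced to the two properties this issue depends on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a linked program object. */
typedef struct {
    int separable;               /* linked with PROGRAM_SEPARABLE set to TRUE */
    int has_compute_executable;  /* link produced a compute shader executable */
} program_model;

/* Hypothetical model of the compute stage of a program pipeline object. */
typedef struct {
    const program_model *compute_stage;  /* NULL if no program is attached */
} pipeline_model;

/* Models UseProgramStages(pipeline, GL_COMPUTE_SHADER_BIT, program).
 * Returns 1 on success, 0 where a non-separable program is rejected. */
int use_compute_stage(pipeline_model *pipeline, const program_model *program)
{
    assert(pipeline != NULL);
    if (program != NULL && !program->separable)
        return 0;
    pipeline->compute_stage = program;
    return 1;
}

/* Program that is current for the compute stage, or NULL if none. */
const program_model *current_compute_program(const program_model *use_program,
                                             const pipeline_model *bound_pipeline)
{
    if (use_program != NULL)
        return use_program;                    /* UseProgram wins */
    if (bound_pipeline != NULL)
        return bound_pipeline->compute_stage;  /* else the bound pipeline */
    return NULL;                               /* else nothing is current */
}

/* The current program is active only if it has a compute executable. */
const program_model *active_compute_program(const program_model *use_program,
                                            const pipeline_model *bound_pipeline)
{
    const program_model *p =
        current_compute_program(use_program, bound_pipeline);
    return (p != NULL && p->has_compute_executable) ? p : NULL;
}
```

       In this model, making a graphics-only program current via UseProgram
       hides a compute program attached to the bound pipeline, so
       active_compute_program() returns NULL and a dispatch would fail.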

    8) What happens if you try to dispatch with no active compute program?

       RESOLVED:  An INVALID_OPERATION error is generated if there is no
       active program for the compute shader stage.

    9) Should we increase minimums on certain replicated state bindings
       (texture image units, uniform buffer bindings) to reflect the addition
       of a sixth shader stage?

       RESOLVED:  Yes, for MAX_COMBINED_TEXTURE_IMAGE_UNITS and
       MAX_UNIFORM_BUFFER_BINDINGS.  These limits permit applications to
       statically partition the shared set of texture bindings into six
       separate sets, one per shader stage.

       The limit MAX_COMBINED_UNIFORM_BLOCKS is not increased, because it
       reflects the sum of the number of uniform blocks used in each stage of
       a single program.  Since no single program can have more than five
       stages, this limit doesn't need to be increased.
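
       One way an application might use the increased minimums is to give
       each shader stage a fixed slice of the combined binding space.  The
       sketch below is one possible scheme, not something this extension
       mandates: the stage ordering and all names are hypothetical, and the
       slice sizes (16 and 12) are the per-stage minimums behind the 6*16
       and 6*12 figures.

```c
#include <assert.h>

enum { STAGE_COUNT = 6, UNITS_PER_STAGE = 16, BLOCKS_PER_STAGE = 12 };

/* Hypothetical ordering of the six shader stages. */
typedef enum {
    STAGE_VERTEX,
    STAGE_TESS_CONTROL,
    STAGE_TESS_EVALUATION,
    STAGE_GEOMETRY,
    STAGE_FRAGMENT,
    STAGE_COMPUTE
} shader_stage;

/* Maps a per-stage texture unit to an index in the combined range 0..95. */
int combined_texture_unit(shader_stage stage, int local_unit)
{
    assert(local_unit >= 0 && local_unit < UNITS_PER_STAGE);
    return (int)stage * UNITS_PER_STAGE + local_unit;
}

/* Maps a per-stage uniform block slot to a binding in the range 0..71. */
int uniform_buffer_binding(shader_stage stage, int local_binding)
{
    assert(local_binding >= 0 && local_binding < BLOCKS_PER_STAGE);
    return (int)stage * BLOCKS_PER_STAGE + local_binding;
}
```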

    10) How do the shader built-in variables relate to DirectCompute's
        built-in system values (SV_*)?

        OpenGL Compute             DirectCompute
        --------------------------------------------------
        gl_NumWorkGroups           --
        gl_WorkGroupSize           --
        gl_WorkGroupID             SV_GroupID
        gl_LocalInvocationID       SV_GroupThreadID
        gl_GlobalInvocationID      SV_DispatchThreadID
        gl_LocalInvocationIndex    SV_GroupIndex

    11) How does "program validation" (checking the active programs against
        the current state) apply to DispatchCompute?

      RESOLVED:  The same program validation logic will be applied to both
      graphics primitives (e.g., DrawArrays) and compute dispatches.
      Conditions that will cause validation errors for graphics primitives
      will also cause validation errors for compute dispatch, even if the
      conditions wouldn't otherwise affect compute, for example:

        * Mis-configured program pipeline objects (e.g., inserting a geometry
          program A between the linked vertex and fragment shaders of
          program B).

        * A graphics program has a vertex shader that uses a 2D texture from
          texture image unit 0 and a fragment shader that uses a 3D texture
          from texture image unit 0.

      Similarly, validation errors specific to the compute shader executable
      (e.g., using different targets on a single texture image unit in a
      compute program) will generate validation errors for graphics Draw*
      calls.

      We chose to specify this behavior for several reasons.  First, using the
      same logic in both places ensures a single result for ValidateProgram
      and ValidateProgramPipeline (a single VALIDATE_STATUS value wouldn't be
      good enough if the result could be different for compute and graphics).
      Additionally, a single test allows implementations to set up state and
      perform validation tests for compute and graphics operations at the same
      time, without requiring additional irregular graphics- or
      compute-specific logic.
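
      As a rough illustration of the "single test" idea, the texture-unit
      conflict used as an example above can be checked with one function
      that receives the sampler uses of every active program, graphics and
      compute alike.  This C sketch covers only that one condition, and all
      names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of sampler targets. */
typedef enum { TARGET_2D, TARGET_3D, TARGET_CUBE_MAP } sampler_target;

/* One sampler use, collected from any active stage (graphics or compute). */
typedef struct {
    int unit;               /* texture image unit the sampler refers to */
    sampler_target target;  /* target implied by the sampler type */
} sampler_use;

/* Returns 1 if no texture image unit is used with two different targets,
 * 0 if such a conflict exists. */
int sampler_units_consistent(const sampler_use *uses, size_t count)
{
    assert(uses != NULL || count == 0);

    for (size_t i = 0; i < count; ++i)
        for (size_t j = i + 1; j < count; ++j)
            if (uses[i].unit == uses[j].unit &&
                uses[i].target != uses[j].target)
                return 0;
    return 1;
}
```

      Because the same list feeds the check before both Draw* and
      DispatchCompute in this model, a conflict introduced by a graphics
      stage makes the dispatch fail validation as well, and vice versa.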

    12) We specify an INVALID_OPERATION error for DispatchCompute when there
        is no active program on the compute stage.  Should we specify similar
        errors for Draw* calls if the current program specified by UseProgram
        is a compute program?

      RESOLVED:  Not in the current spec.  If a compute program is made
      current with UseProgram, there will be no active program for either the
      vertex or fragment stages.  In this case, the results of vertex and
      fragment processing are undefined, but no error is generated.  This
      behavior is already specified in unextended OpenGL 4.2.

      We don't generate errors in this case for several reasons:

        * For the compatibility profile, fixed-function vertex and fragment
          processing is available, and INVALID_OPERATION wouldn't make sense
          there.

        * Even in the core profile, there are cases where no active fragment
          shader is needed (e.g., primitives with RASTERIZER_DISCARD enabled).

      While there is no case where having only a compute program makes sense,
      at least in the core profile, we chose to keep the same undefined
      behavior that's already in place.

    13) Should we provide any additional support extending the memoryBarrier()
        GLSL built-in function provided by ARB_shader_image_load_store and
        GLSL 4.20?

      RESOLVED:  Yes.  The memoryBarrier() function provided by GLSL 4.20
      requires (a) synchronizing all memory transactions that might be visible
      to other shader invocations and (b) ordering memory transactions so that
      all other shader invocations never see stores issued after the barrier
      before seeing stores issued before the barrier.  Hardware
      implementations of GLSL 4.20 may have a high degree of parallelism,
      where the memory subsystem servicing shader loads and stores may have
      multiple independent sub-units, and where the shader invocations
      themselves may be executed in parallel on many shader cores.  The
      memoryBarrier() command may be fairly heavyweight, requiring
      synchronization with all memory sub-units and shader cores.

      We provide new functions in two different directions that might serve as
      lighter-weight alternatives to memoryBarrier().  In particular, we
      provide four new functions

        void memoryBarrierAtomicCounter();
        void memoryBarrierBuffer();
        void memoryBarrierImage();
        void memoryBarrierShared();

      that order transactions of only a specific memory type and might require
      synchronization with fewer sub-units of the memory subsystem, and a new
      function:

        void groupMemoryBarrier();

      that only orders transactions as viewed by other threads in the same
      workgroup, which might not require synchronization with other shader
      cores.  Since shared memory is only accessible to threads within a
      single workgroup, memoryBarrierShared() also only requires
      synchronization with other threads in the same workgroup.

11875bd8deadSopenharmony_ciRevision History
11885bd8deadSopenharmony_ci
11895bd8deadSopenharmony_ci    Rev.    Date    Author    Changes
11905bd8deadSopenharmony_ci    ----  --------  --------- -----------------------------------------
11915bd8deadSopenharmony_ci    28    12/10/18  Jon Leech Use 'workgroup' consistently throughout (Bug
11925bd8deadSopenharmony_ci                              11723, internal API issue 87).
11935bd8deadSopenharmony_ci    27    07/24/14  Jon Leech Change value of GLSL limit
11945bd8deadSopenharmony_ci                              gl_MaxComputeUniformComponents to 512 for
11955bd8deadSopenharmony_ci                              consistency with the API (Bug 12370).
11965bd8deadSopenharmony_ci    26    01/30/14  Jon Leech Add table 6.31 COMPUTE_SHADER entry for
11975bd8deadSopenharmony_ci                              program pipeline objects (Bug 11539).
11985bd8deadSopenharmony_ci    25    10/23/12  pbrown    Remove the restriction forbidding the use of
11995bd8deadSopenharmony_ci                              barrier() inside potentially divergent flow
12005bd8deadSopenharmony_ci                              control.  Instead, we will allow barrier() to
12015bd8deadSopenharmony_ci                              be executed anywhere, but specify undefined
12025bd8deadSopenharmony_ci                              results (including hangs or program termination)
12035bd8deadSopenharmony_ci                              if the flow control is divergent (bug 9367).
12045bd8deadSopenharmony_ci    24    07/01/12  Jon Leech Fix typo (bug 8984).
12055bd8deadSopenharmony_ci    23    06/28/12  johnk     Remove two other references to "thread", add
12065bd8deadSopenharmony_ci                              "Only available in compute shaders" to the table
12075bd8deadSopenharmony_ci                              for memoryBarrierShared() and groupMemoryBarrier(),
12085bd8deadSopenharmony_ci                              fixed a typo.
    22    06/22/12  pbrown    Add a new built-in memoryBarrierBuffer() as an
                              interaction with ARB_shader_storage_buffer_object.
                              Add a new built-in groupMemoryBarrier() that
                              orders memory transactions only as observed by
                              other shader invocations in the same work group.
                              Enhance the description of the GLSL memory
                              barrier functions.  Add issue 13 about the new
                              memory barrier functions added in this extension
                              (bug 9199).  Mark issues 11 and 12 as resolved.
                              Add NV_vertex_buffer_unified_memory interaction
                              allowing DispatchComputeIndirect to read its
                              arguments from any resident buffer object
                              instead of the single bound indirect dispatch
                              buffer.
    21    06/21/12  gsellers  Clarify that there are no built-in inputs or
                              outputs in compute shaders (bug 9200).
    20    06/21/12  gsellers  Throw INVALID_OPERATION if querying
                              COMPUTE_WORK_GROUP_SIZE from unlinked program or
                              program with no compute shader (bug 9117).
    19    06/18/12  pbrown    DispatchComputeIndirect throws INVALID_VALUE
                              if <indirect> is negative or misaligned (bug
                              9181).
    18    06/17/12  pbrown    Clarify that compute-only programs can be used
                              by both UseProgram and UseProgramStages, and add
                              a COMPUTE_SHADER_BIT for UseProgramStages (bug
                              9155).  Specify that validation errors checking
                              programs against each other and the GL state
                              apply equally to graphics primitives (Draw*) and
                              compute dispatches.  Update issue 7; add new
                              issues 11 and 12.  Clarify that compute shader
                              invocations in a workgroup are run "potentially
                              in parallel", but not "in lockstep" (bug 9151).
                              Other minor wording improvements.
    17    06/15/12  johnk     Don't allow location layout qualifiers for
                              compute shader inputs.
    16    06/15/12  johnk     In the intro material, allow work groups to
                              only potentially execute in parallel, and use
                              control barriers to synchronize.  Other minor
                              fixes.
    15    06/15/12  dgkoch    Added Additions to Ch.2 of Shading Language.
                              Renamed shader built-in variables, explained
                              them better, made them uvec3 instead of int[3].
                              Added derived shading language variables.
                              Renamed and changed built-in constants for
                              consistency with the variables. Removed
                              gl_MaxComputeWorkDimensions since it is no
                              longer necessary. Renamed API constants to
                              be consistent with shading language terminology.
                              Remove a few rogue references to variable
                              number of dispatch arguments. Added Issue 10.
                              (bugs 9151, 9167)
    14    06/14/12  pbrown    Modify DispatchComputeIndirect to accept an
                              "intptr"-typed offset instead of a "void *",
                              since it doesn't accept pointers to client memory.
                              Modify DispatchComputeIndirect to use a new
                              buffer binding (DISPATCH_INDIRECT_BUFFER)
                              instead of sharing the binding used by
                              Draw*Indirect.  Add missing entries in the "New
                              Tokens" section and assign values.  Update
                              documentation of COMMAND_BARRIER_BIT to reflect
                              the new dispatch indirect binding.  Document
                              DispatchComputeIndirect errors for offsets that
                              are negative, misaligned, or run off the end of
                              the bound buffer.  Increase minimums for
                              combined texture image units and uniform buffer
                              bindings to reflect the new stage.  Update
                              various issues, add new issue 9 (bug 9130).
    13    06/14/12  Jon Leech Copy description of MAX_COMPUTE_SHARED_MEMORY_SIZE
                              into API spec from GLSL spec (bug 9069).
    12    05/14/12  pbrown    Add interaction with
                              ARB_shader_storage_buffer_object. The built-in
                              functions provided there for atomic memory
                              operations on buffer variables are also
                              supported for the shared variables provided
                              here.  The functions themselves are documented
                              fully in the other specification.
    11    05/14/12  johnk     Keep the previous logical contents of the last
                              paragraph of the memory shader control functions.
    10    04/26/12  gsellers  Count max compute shared variable size in bytes.
                              Make shared variables implicitly coherent.
                              Add MAX_COMPUTE_UNIFORM_COMPONENTS.
                              Clean up MAX_COMPUTE_IMAGE_UNIFORMS.
     9    04/25/12  gsellers  Add UNIFORM_BLOCK_REFERENCED_BY_COMPUTE_SHADER and
                              ATOMIC_COUNTER_BUFFER_REFERENCED_BY_COMPUTE_SHADER.
                              Remove <program> from dispatch APIs.  Add
                              memoryBarrier{Image,Shared,AtomicCounter}().
     8    04/05/12  gsellers  Remove ARB suffixes.
     7    02/02/12  gsellers  Require OpenGL 4.2.
                              Add issue 8.
                              Up various minimums.
                              Remove variable dimensionality.
     6    01/24/12  gsellers  Require OpenGL 3.0.
                              Incorporate feedback from bmerry.
                              Add compute shader constants to sec. 7.7.
                              Add modifications to sec. 8.15 of the GLSL spec.
                              Add issue 7.
     5    01/20/12  gsellers  Make compute dispatch honor conditional
                              rendering.  Add indirect dispatch.
                              Change 'global work size' to 'num work groups',
                              make global size in multiples of work group size.
     4    01/10/12  gsellers  Fix typos and other small corrections.
                              Make specification of work group size at compile
                              time compulsory.
                              Add COMPUTE_WORK_DIMENSION_ARB and
                              COMPUTE_LOCAL_WORK_SIZE_ARB queries.
                              Add issue (5), resolve issues (3) and (4).
     3    01/09/12  gsellers  Change from AMD to ARB.
                              Update to be relative to OpenGL 4.2 (+GLSL 4.20).
                              Add <shared> variables.
                              Add issues (1) - (4).
                              Add link failure for programs that contain
                              compute and non-compute shaders.
     2    06/10/11  gsellers  Add error behavior.
                              Shading language changes.
                              Add global_offset parameter.
                              Add implementation dependent limits.
     1    09/24/10  gsellers  Initial revision