Name

    ARB_compute_shader

Name Strings

    GL_ARB_compute_shader

Contact

    Graham Sellers, AMD (graham.sellers 'at' amd.com)

Contributors

    Pat Brown, NVIDIA
    Daniel Koch, TransGaming
    John Kessenich
    Members of the ARB working group

Notice

    Copyright (c) 2012-2014 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

Specification Update Policy

    Khronos-approved extension specifications are updated in response to
    issues and bugs prioritized by the Khronos OpenGL Working Group. For
    extensions which have been promoted to a core Specification, fixes will
    first appear in the latest version of that core Specification, and will
    eventually be backported to the extension document. This policy is
    described in more detail at
        https://www.khronos.org/registry/OpenGL/docs/update_policy.php

Status

    Complete.
    Approved by the ARB on 2012/06/12.

Version

    Last Modified Date: December 10, 2018
    Revision: 28

Number

    ARB Extension #122

Dependencies

    OpenGL 4.2 is required.

    This extension is written based on the wording of the OpenGL 4.2 (Core
    Profile) specification, and on the wording of the OpenGL Shading Language
    (GLSL) Specification, version 4.20.

    This extension interacts with OpenGL 4.3 and
    ARB_shader_storage_buffer_object.

    This extension interacts with NV_vertex_buffer_unified_memory.

Overview

    Recent graphics hardware has become extremely powerful, and a strong
    desire has emerged to harness this power for work (both graphics and
    non-graphics) that does not fit the traditional graphics pipeline well.
    To address this, this extension adds a new single-stage program type
    known as a compute program. This program may contain one or more compute
    shaders which may be launched in a manner that is essentially stateless.
    This allows arbitrary workloads to be sent to the graphics hardware with
    minimal disturbance to the GL state machine.

    In most respects, a compute program is identical to a traditional OpenGL
    program object, with similar status, uniforms, and other such properties.
    It has access to many of the same resources as fragment and other shader
    types, such as textures, image variables, atomic counters, and so on.
    However, it has no predefined inputs nor any fixed-function outputs. It
    cannot be part of a pipeline and its visible side effects are through its
    actions on images and atomic counters.

    OpenCL is another solution for using graphics processors as generalized
    compute devices. This extension addresses a different need. For example,
    OpenCL is designed to be usable on a wide range of devices ranging from
    CPUs, GPUs, and DSPs through to FPGAs. While one could implement GL on these
    types of devices, the target here is clearly GPUs. Another difference is
    that OpenCL is more full featured and includes features such as multiple
    devices, asynchronous queues and strict IEEE semantics for floating point
    operations. This extension follows the semantics of OpenGL - implicitly
    synchronous, in-order operation with single-device, single queue
    logical architecture and somewhat more relaxed numerical precision
    requirements. Although not as feature rich, this extension offers several
    advantages for applications that can tolerate the omission of these
    features.
    Compute shaders are written in GLSL, for example, and so code may
    be shared between compute and other shader types. Objects are created and
    owned by the same context as the rest of the GL, and therefore no
    interoperability API is required and objects may be freely used by both
    compute and graphics simultaneously without acquire-release semantics or
    object type translation.

New Procedures and Functions

    void DispatchCompute(uint num_groups_x,
                         uint num_groups_y,
                         uint num_groups_z);

    void DispatchComputeIndirect(intptr indirect);

New Tokens

    Accepted by the <type> parameter of CreateShader and returned in the
    <params> parameter by GetShaderiv:

        COMPUTE_SHADER                                      0x91B9

    Accepted by the <pname> parameter of GetIntegerv, GetBooleanv, GetFloatv,
    GetDoublev and GetInteger64v:

        MAX_COMPUTE_UNIFORM_BLOCKS                          0x91BB
        MAX_COMPUTE_TEXTURE_IMAGE_UNITS                     0x91BC
        MAX_COMPUTE_IMAGE_UNIFORMS                          0x91BD
        MAX_COMPUTE_SHARED_MEMORY_SIZE                      0x8262
        MAX_COMPUTE_UNIFORM_COMPONENTS                      0x8263
        MAX_COMPUTE_ATOMIC_COUNTER_BUFFERS                  0x8264
        MAX_COMPUTE_ATOMIC_COUNTERS                         0x8265
        MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS             0x8266
        MAX_COMPUTE_WORK_GROUP_INVOCATIONS                  0x90EB

    Accepted by the <pname> parameter of GetIntegeri_v, GetBooleani_v,
    GetFloati_v, GetDoublei_v and GetInteger64i_v:

        MAX_COMPUTE_WORK_GROUP_COUNT                        0x91BE
        MAX_COMPUTE_WORK_GROUP_SIZE                         0x91BF

    Accepted by the <pname> parameter of GetProgramiv:

        COMPUTE_WORK_GROUP_SIZE                             0x8267

    Accepted by the <pname> parameter of GetActiveUniformBlockiv:

        UNIFORM_BLOCK_REFERENCED_BY_COMPUTE_SHADER          0x90EC

    Accepted by the <pname> parameter of GetActiveAtomicCounterBufferiv:

        ATOMIC_COUNTER_BUFFER_REFERENCED_BY_COMPUTE_SHADER  0x90ED

    Accepted by the <target> parameters of BindBuffer, BufferData,
    BufferSubData, MapBuffer, UnmapBuffer, GetBufferSubData, and
    GetBufferPointerv:

        DISPATCH_INDIRECT_BUFFER                            0x90EE

    Accepted by the <value> parameter of GetIntegerv, GetBooleanv,
    GetInteger64v, GetFloatv, and GetDoublev:

        DISPATCH_INDIRECT_BUFFER_BINDING                    0x90EF

    Accepted by the <stages> parameter of UseProgramStages:

        COMPUTE_SHADER_BIT                                  0x00000020

Additions to Chapter 2 of the OpenGL 4.2 (Core Profile) Specification
(OpenGL Operation)

    In section 2.9.1, "Creating and Binding Buffer Objects", add to table 2.8
    (p.43):

                                                            Described
    Target name               Purpose                       in section(s)
    ------------------------  -------------------------     ---------------
    DISPATCH_INDIRECT_BUFFER  Indirect compute dispatch     5.5
                              commands

    Add to the end of section 2.9.8, "Indirect Commands In Buffer Objects"
    (p. 53):

    Arguments to the DispatchComputeIndirect command are stored in buffer
    objects as a group of three unsigned integers.

    A buffer object is bound to DISPATCH_INDIRECT_BUFFER by calling BindBuffer
    with target set to DISPATCH_INDIRECT_BUFFER, and buffer set to the name of
    the buffer object. If no corresponding buffer object exists, one is
    initialized as defined in section 2.9.

    DispatchComputeIndirect sources its arguments from the buffer object whose
    name is bound to DISPATCH_INDIRECT_BUFFER, using the <indirect> parameter as
    an offset into the buffer object in the same fashion as described in
    section 2.9.6. An INVALID_OPERATION error is generated if this command
    sources data beyond the end of the buffer object, if zero is bound to
    DISPATCH_INDIRECT_BUFFER, or if <indirect> is less than zero or not a
    multiple of the size, in basic machine units, of uint.
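
    As an illustration only (not part of the specification text), the
    argument layout and the offset rules above can be sketched in C. The
    names DispatchIndirectCommand and dispatch_indirect_offset_ok are
    hypothetical, and a 4-byte uint is assumed. The sketch models only the
    offset and bounds checks; it does not model the buffer binding check or
    which error code the GL reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: the three unsigned integers read from the buffer,
 * in the order num_groups_x, num_groups_y, num_groups_z. */
typedef struct {
    uint32_t num_groups_x;
    uint32_t num_groups_y;
    uint32_t num_groups_z;
} DispatchIndirectCommand;

/* Assumption of this sketch: the three values are tightly packed. */
static_assert(sizeof(DispatchIndirectCommand) == 3 * sizeof(uint32_t),
              "indirect command must be three packed uints");

/* Hypothetical helper: returns 1 if <indirect> is a non-negative multiple
 * of the size of uint and the command fits inside a buffer of buffer_size
 * basic machine units, 0 otherwise. */
static int dispatch_indirect_offset_ok(ptrdiff_t indirect, size_t buffer_size)
{
    if (indirect < 0)
        return 0;
    if ((size_t)indirect % sizeof(uint32_t) != 0)
        return 0;
    if (buffer_size < sizeof(DispatchIndirectCommand))
        return 0;
    return (size_t)indirect <= buffer_size - sizeof(DispatchIndirectCommand);
}
```

    An application could run such a check before issuing the command, since
    the values in the buffer may have been written by the GPU and the offset
    is the only part the CPU side controls directly.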

    In section 2.11, "Vertex Shaders", modify the introductory text on shaders
    to include compute shaders (second paragraph, p. 56):

    In addition to vertex shaders, tessellation control..., geometry shaders,
    fragment shaders, and compute shaders can be created, compiled, and linked
    into program objects. .... (section 3.10). Compute shaders perform
    general computations for dispatched arrays of shader invocations (section
    5.5), but do not operate on primitives processed by the other shader
    types. ...

    In section 2.11.3, "Program Objects", add to the reasons that LinkProgram
    may fail, p. 61:

    * The program object contains objects to form a compute shader (see
      section 5.5) and objects to form any other type of shader.

    In section 2.11.3, modify the description of active programs (last
    paragraph, p. 61, first paragraph, p. 62):

    ... geometry shader stages, those stages are ignored. If there is no
    active program for the compute shader stage, compute dispatches will
    generate an error. The active program for the compute shader stage has no
    effect on the processing of vertices, geometric primitives, and fragments,
    and the active program for all other shader stages has no effect on
    compute dispatches.
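
    As an illustration only, the LinkProgram rule added to section 2.11.3 (a
    program may not combine compute shader objects with objects for any other
    shader type) can be sketched as a bitmask test. This is a minimal sketch
    under stated assumptions: the helper stage_mix_may_link is hypothetical,
    and the *_SHADER_BIT values, which the GL defines for UseProgramStages,
    are reused here merely to describe which shader types a program contains.
    The GL exposes no such function; the rule is enforced inside LinkProgram.

```c
#include <assert.h>

/* Stage bits as defined for UseProgramStages; used here only as a compact
 * description of the shader types attached to a program (an assumption of
 * this sketch, not something the GL does). */
#define VERTEX_SHADER_BIT          0x00000001u
#define FRAGMENT_SHADER_BIT        0x00000002u
#define GEOMETRY_SHADER_BIT        0x00000004u
#define TESS_CONTROL_SHADER_BIT    0x00000008u
#define TESS_EVALUATION_SHADER_BIT 0x00000010u
#define COMPUTE_SHADER_BIT         0x00000020u

/* Hypothetical helper: returns 1 if the set of attached stages does not
 * violate the compute-exclusivity rule, 0 if it does. Other link
 * requirements are not modeled. */
static int stage_mix_may_link(unsigned int stages)
{
    assert(stages != 0u); /* sketch assumes at least one attached stage */
    if ((stages & COMPUTE_SHADER_BIT) == 0u)
        return 1;         /* no compute shader: rule does not apply */
    return (stages & ~COMPUTE_SHADER_BIT) == 0u;
}
```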

    In section 2.11.4, "Program Pipeline Objects", modify the description of
    UseProgramStages, p. 65:

    The executables in a program object... becomes current. These stages may
    include vertex, tessellation control, tessellation evaluation, geometry,
    fragment, or compute, indicated by VERTEX_SHADER_BIT,
    TESS_CONTROL_SHADER_BIT, TESS_EVALUATION_SHADER_BIT, GEOMETRY_SHADER_BIT,
    FRAGMENT_SHADER_BIT, or COMPUTE_SHADER_BIT, respectively. ...

    In the unnumbered "Validation" section of section 2.11.12 "Shader
    Execution", modify the list of validation errors, pp. 112-113:

    This error is generated by any command that transfers vertices to the GL
    or launches compute work if:

    * (last bullet, p. 112) One program object is active... first program
      object was active. The active compute shader is ignored for the
      purposes of this test.

    * (2nd bullet, p. 113) There is no current program specified by
      UseProgram, there is a current program pipeline object, and the
      current program for any shader stage has been relinked since...

    * (3rd bullet, p. 113) Any two active samplers in the set of active
      program objects are of different types but refer to the same texture
      image unit.

    * (4th bullet, p. 113) The sum of the number of active samplers for each
      active program exceeds the maximum number of texture image units
      allowed.

    Modify the paragraph describing ValidateProgram, p. 113:

    ... If validation succeeded, ... set to FALSE. If validation succeeded,
    no INVALID_OPERATION validation error will be generated if <program> were
    made current via UseProgram, given the current state. If validation
    failed, such errors will be generated under the current state.

    Modify the paragraph describing ValidateProgramPipeline, p. 114:

    ... can be queried with GetProgramPipelineiv (see section 6.1.12). If
    validation succeeded, no INVALID_OPERATION validation error will be
    generated if <pipeline> were bound and no program were made current via
    UseProgram, given the current state. If validation failed, such errors
    will be generated under the current state.

    In subsection 2.11.12, "Shader Execution":

        Add to the list of implementation dependent constants under the
        "Texture Access" sub-heading:

            MAX_COMPUTE_TEXTURE_IMAGE_UNITS (for compute shaders),

        Add to the list of implementation dependent constants under the "Atomic
        Counter Access" sub-heading:

            MAX_COMPUTE_ATOMIC_COUNTERS (for compute shaders),

        Add to the list of implementation dependent constants under the "Image
        Access" sub-heading:

            MAX_COMPUTE_IMAGE_UNIFORMS (for compute shaders),

    In section 2.16, "Conditional Rendering", modify the sentence describing
    conditional rendering, starting with "In this case"...

    In this case, all drawing commands (see section 2.8.3), as well as
    Clear and ClearBuffer* (see section 4.2.3), and compute dispatch
    through DispatchCompute* (see section 5.5), have no effect.

    In the "Shared Memory Access Synchronization" subsection of section
    2.11.13, "Shader Memory Access", modify the description of
    COMMAND_BARRIER_BIT (p. 118):

    * COMMAND_BARRIER_BIT: Command data sourced from buffer objects by
      Draw*Indirect and DispatchComputeIndirect commands ...
      The buffer objects affected by this bit are derived from the
      DRAW_INDIRECT_BUFFER and DISPATCH_INDIRECT_BUFFER bindings.

    In subsection 2.17.7, "Uniform Variables", replace the paragraph beginning
    "If <pname> is UNIFORM_BLOCK_REFERENCED_BY_VERTEX_SHADER,"... with:

    If <pname> is UNIFORM_BLOCK_REFERENCED_BY_VERTEX_SHADER,
    UNIFORM_BLOCK_REFERENCED_BY_TESS_CONTROL_SHADER,
    UNIFORM_BLOCK_REFERENCED_BY_TESS_EVALUATION_SHADER,
    UNIFORM_BLOCK_REFERENCED_BY_GEOMETRY_SHADER,
    UNIFORM_BLOCK_REFERENCED_BY_FRAGMENT_SHADER or
    UNIFORM_BLOCK_REFERENCED_BY_COMPUTE_SHADER, then a boolean value indicating
    whether the uniform block identified by uniformBlockIndex is referenced
    by the vertex, tessellation control, tessellation evaluation, geometry,
    fragment or compute programming stages of <program>, respectively, is
    returned.

    Also in subsection 2.17.7, "Uniform Variables", replace the paragraph
    beginning, "If <pname> is ATOMIC_COUNTER_BUFFER_REFERENCED_BY_VERTEX_SHADER"
    on p.80 with:

    If <pname> is ATOMIC_COUNTER_BUFFER_REFERENCED_BY_VERTEX_SHADER,
    ATOMIC_COUNTER_BUFFER_REFERENCED_BY_TESS_CONTROL_SHADER,
    ATOMIC_COUNTER_BUFFER_REFERENCED_BY_TESS_EVALUATION_SHADER,
    ATOMIC_COUNTER_BUFFER_REFERENCED_BY_GEOMETRY_SHADER,
    ATOMIC_COUNTER_BUFFER_REFERENCED_BY_FRAGMENT_SHADER or
    ATOMIC_COUNTER_BUFFER_REFERENCED_BY_COMPUTE_SHADER, then a single boolean
    value indicating whether the atomic counter buffer identified by
    bufferIndex is referenced by the vertex, tessellation control, tessellation
    evaluation, geometry, fragment or compute programming stages of
    <program>, respectively, is returned.

    Under the sub-heading "Uniform Blocks" in subsection 2.11.17, replace the
    sentence beginning "The limits for vertex, tessellation ..." on p.92
    with:

    The limits for vertex, tessellation, geometry, fragment and compute
    shaders can be obtained by calling GetIntegerv with <pname> set to
    MAX_VERTEX_UNIFORM_BLOCKS, MAX_TESS_CONTROL_UNIFORM_BLOCKS,
    MAX_TESS_EVALUATION_UNIFORM_BLOCKS, MAX_GEOMETRY_UNIFORM_BLOCKS,
    MAX_FRAGMENT_UNIFORM_BLOCKS and MAX_COMPUTE_UNIFORM_BLOCKS, respectively.

    Under the sub-heading "Atomic Counter Buffers" in subsection 2.11.17,
    replace the sentence beginning "The limits for vertex, geometry, ..."
    on p.96 with:

    The limits for vertex, tessellation, geometry, fragment and compute
    shaders can be obtained by calling GetIntegerv with <pname> set to
    MAX_VERTEX_ATOMIC_COUNTER_BUFFERS, MAX_TESS_CONTROL_ATOMIC_COUNTER_BUFFERS,
    MAX_TESS_EVALUATION_ATOMIC_COUNTER_BUFFERS,
    MAX_GEOMETRY_ATOMIC_COUNTER_BUFFERS, MAX_FRAGMENT_ATOMIC_COUNTER_BUFFERS and
    MAX_COMPUTE_ATOMIC_COUNTER_BUFFERS, respectively.

Additions to Chapter 3 of the OpenGL 4.2 (Core Profile) Specification
(Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 4.2 (Core Profile) Specification
(Per-Fragment Operations and the Framebuffer)

    None.

Additions to Chapter 5 of the OpenGL 4.2 (Core Profile) Specification
(Special Functions)

    Add Section 5.5, "Compute Shaders"

    In addition to graphics-oriented shading operations such as vertex,
    tessellation, geometry and fragment shading, generic computation may be
    performed by the GL through the use of compute shaders. The compute pipeline
    is a form of single-stage machine that runs generic shaders.
    Compute shaders are created as described in section 2.11.1 using a <type>
    parameter of COMPUTE_SHADER. They are attached to and used in program
    objects as described in section 2.11.3.

    Compute workloads are formed from groups of work items called
    _workgroups_ and processed by the executable code for a compute program.
    A workgroup is a collection of shader invocations that execute the same code,
    potentially in parallel. An invocation within a workgroup may share data
    with other members of the same workgroup through shared variables and
    issue memory and control barriers to synchronize with other members of the
    same workgroup. One or more workgroups are launched by calling:

        void DispatchCompute(uint num_groups_x,
                             uint num_groups_y,
                             uint num_groups_z);

    Each workgroup is processed by the active program object for the
    compute shader stage. The error INVALID_OPERATION will be generated if
    there is no active program object for the compute shader stage. The
    active program for the compute shader stage will be determined in the same
    manner as the active program for other pipeline stages, as described in
    section 2.11.3. While the individual shader invocations within a
    workgroup are executed as a unit, workgroups are executed completely
    independently and in unspecified order.
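
    As an illustration only, an application that wants one invocation per
    work item typically derives the DispatchCompute arguments by rounding
    up. The sketch below assumes a one-dimensional problem and a workgroup
    size in X of local_size invocations, fixed in the shader through its
    input layout qualifier; the helper name workgroups_to_cover is
    hypothetical and not part of the GL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest number of workgroups, each containing
 * local_size invocations, whose invocations cover item_count work items.
 * Written without (item_count + local_size - 1) to avoid overflow. */
static uint32_t workgroups_to_cover(uint32_t item_count, uint32_t local_size)
{
    assert(local_size > 0u); /* a workgroup has at least one invocation */
    return item_count / local_size
         + (item_count % local_size != 0u ? 1u : 0u);
}
```

    With the usual C binding this might be used as
    glDispatchCompute(workgroups_to_cover(n, 64), 1, 1). When n is not a
    multiple of the workgroup size, the last workgroup contains surplus
    invocations, so the shader is expected to skip indices at or beyond n.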

    <num_groups_x>, <num_groups_y> and <num_groups_z> specify the number of
    workgroups that will be dispatched in the X, Y and Z dimensions,
    respectively. The builtin vector variable gl_NumWorkGroups will be
    initialized with the contents of the <num_groups_x>, <num_groups_y> and
    <num_groups_z> parameters. The maximum number of workgroups that may be
    dispatched at one time may be determined by calling GetIntegeri_v with
    <pname> set to MAX_COMPUTE_WORK_GROUP_COUNT and <index> set to zero, one,
    or two, representing the X, Y, and Z dimensions, respectively. The
    values of <num_groups_x>, <num_groups_y> and <num_groups_z> must each
    be less than or equal to the maximum workgroup count for the corresponding
    dimension, otherwise an INVALID_VALUE error is generated. If the workgroup
    count in any dimension is zero, no workgroups are dispatched.

    The workgroup size in each dimension is specified at compile time
    using an input layout qualifier in one or more of the compute shaders
    attached to the program (see Section 4 of the OpenGL Shading Language
    Specification). After the program has been linked, the workgroup size
    of the program may be retrieved by calling GetProgramiv with <pname> set to
    COMPUTE_WORK_GROUP_SIZE. This will return an array of three integers
    containing the workgroup size of the compute program as specified by
    its input layout qualifier(s).
    If <program> is the name of a program that has not been successfully
    linked, or is the name of a linked program object that contains no
    compute shaders, then an INVALID_OPERATION error is generated.

    The maximum size of a workgroup may be determined by calling
    GetIntegeri_v with <pname> set to MAX_COMPUTE_WORK_GROUP_SIZE
    and <index> set to 0, 1, or 2 to retrieve the maximum work size in the
    X, Y and Z dimension, respectively. Furthermore, the maximum number of
    invocations in a single workgroup (i.e., the product of the three
    dimensions) may be determined by calling GetIntegerv with <pname> set to
    MAX_COMPUTE_WORK_GROUP_INVOCATIONS.

    The command

        void DispatchComputeIndirect(intptr indirect);

    is equivalent (assuming no errors are generated) to calling
    DispatchCompute with <num_groups_x>, <num_groups_y> and <num_groups_z>
    initialized with the three uint values contained in the buffer currently
    bound to the DISPATCH_INDIRECT_BUFFER binding at an offset, in basic
    machine units, specified by <indirect>. The error INVALID_VALUE is
    generated if <indirect> is less than zero or is not a multiple of four.
    The error INVALID_OPERATION is generated if no buffer is bound to
    DISPATCH_INDIRECT_BUFFER, if the command would source data beyond the end
    of the buffer object, or if there is no active program for the compute
    shader stage. If any of <num_groups_x>, <num_groups_y> or <num_groups_z>
    is greater than MAX_COMPUTE_WORK_GROUP_COUNT for the corresponding
    dimension then the results are undefined.

    Add Subsection 5.5.1, "Compute Shader Variables"

    Compute shaders can access variables belonging to the current program
    object. The amount of storage in the default uniform block accessed by a
    compute shader is specified by the value of the implementation dependent
    constant MAX_COMPUTE_UNIFORM_COMPONENTS. The total amount of
    combined storage available for uniform variables in all uniform blocks
    accessed by a compute shader (including the default uniform block) is
    specified by the implementation dependent constant
    MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS.

    There is a limit to the total size of all variables declared as
    <shared> in a single program object. This limit, expressed in
    basic machine units, may be queried as the value of
    MAX_COMPUTE_SHARED_MEMORY_SIZE.

Additions to Chapter 6 of the OpenGL 4.2 (Core Profile) Specification
(State and State Requests)

    None.

Additions to Chapter 2 of the OpenGL Shading Language Specification, Version
4.20 (Overview of OpenGL Shading)

    Replace the last sentence of the first paragraph of the overview with
    the following:

    "Currently, these processors are the vertex, tessellation control,
    tessellation evaluation, geometry, fragment, and compute processors."

    Replace the last sentence of the second paragraph of the overview with
    the following:

    "The specific languages will be referred to by the name of the processor
    they target: vertex, tessellation control, tessellation evaluation,
    geometry, fragment, or compute."

    Add a new Section 2.6 titled "Compute Processor" with the following text:

    "The <compute processor> is a programmable unit that operates independently
    from the other shader processors. Compilation units written in the OpenGL
    Shading Language to run on this processor are called <compute shaders>.
    When a complete set of compute shaders are compiled and linked, they
    result in a <compute shader executable> that runs on the compute processor.

    A compute shader has access to many of the same resources as fragment and
    other shader processors, such as textures, buffers, image variables,
    atomic counters, and so on.
    It does not have any predefined inputs
    nor any fixed-function outputs. It is not part of the graphics pipeline
    and its visible side effects are through actions on images, storage
    buffers, and atomic counters.

    A compute shader operates on a group of work items called a workgroup.
    A workgroup is a collection of shader invocations that execute the same
    code, potentially in parallel. An invocation within a workgroup may share
    data with other members of the same workgroup through shared variables
    and issue memory and control barriers to synchronize with other members
    of the same workgroup."

Additions to Chapter 4 of the OpenGL Shading Language Specification, Version
4.20 (Variables and Types)

    Modify section 4.4.1, second paragraph from

    "All shaders allow input layout qualifiers on input variable declarations."

    to

    "All shaders, except compute shaders, allow input layout location
    qualifiers on input variable declarations."

    Modify Section 4.3.
    Add to the table at the start of Section 4.3:

    +-------------------+-----------------------------------------------------------+
    | Storage Qualifier | Meaning                                                   |
    +-------------------+-----------------------------------------------------------+
    | <shared>          | variable storage is shared across all work items in a     |
    |                   | workgroup for compute shaders                             |
    +-------------------+-----------------------------------------------------------+

    Add the following paragraph to Section 4.3.4, "Input Variables"

    Compute shaders do not permit user-defined input variables and do not
    form a formal interface with any other shader stage. See section 7.1
    for a description of built-in compute shader input variables. All other
    input to a compute shader is retrieved explicitly through image loads,
    texture fetches, loads from uniforms or uniform buffers, or other user
    supplied code. Redeclaration of built-in input variables in compute
    shaders is not permitted.

    Add the following paragraph to Section 4.3.6, "Output Variables"

    Compute shaders have no built-in output variables, do not support
    user-defined output variables and do not form a formal interface with any
    other shader stage. All outputs from a compute shader take the form of
    side effects such as image stores and operations on atomic counters.

    Add Section 4.3.7, "Shared", renumber subsequent sections

    The <shared> qualifier is used to declare variables that have storage
    shared between all work items of a compute shader workgroup.
    Variables declared as <shared> may only be used in compute shaders
    (see Section 5.5, "Compute Shaders"). Shared variables are implicitly
    coherent. That is, writes to shared variables from one shader invocation
    will eventually be seen by other invocations within the same workgroup.

    Variables declared as <shared> may not have initializers and their
    contents are undefined at the beginning of shader execution. Any data
    written to <shared> variables will be visible to other invocations
    executing the same shader within the same workgroup. Order of execution
    with regard to reads and writes to the same <shared> variables by
    different invocations of a shader is not defined. In order to achieve
    ordering with respect to reads and writes to <shared> variables, memory
    barriers must be employed using the barrier() function (see Section 8.15).

    There is a limit to the total size of all variables declared as
    <shared> in a single program object. This limit, expressed in
    basic machine units, may be determined by using the OpenGL API to query
    the value of MAX_COMPUTE_SHARED_MEMORY_SIZE.
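
The shared-storage limit described above amounts to a simple budget check. The following C sketch adds up the sizes of a program's <shared> variables and compares the total against a limit supplied by the caller. The `SharedVar` type and both function names are hypothetical, element sizes are assumed to be given in basic machine units, and any padding or alignment an implementation may add is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one <shared> variable: the size of one
 * element in basic machine units and the number of elements. */
typedef struct {
    size_t element_size;
    size_t count;
} SharedVar;

/* Sum the storage of all shared variables declared in a program. */
size_t total_shared_size(const SharedVar *vars, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += vars[i].element_size * vars[i].count;
    return total;
}

/* Returns 1 when the total fits within the queried limit, else 0. */
int fits_shared_limit(const SharedVar *vars, size_t n, size_t limit)
{
    return total_shared_size(vars, n) <= limit;
}

/* A 32 x 32 tile of 4-unit floats plus a 256-entry table of 4-unit uints. */
void example_shared_budget(void)
{
    SharedVar vars[2] = { {4, 32 * 32}, {4, 256} };
    assert(total_shared_size(vars, 2) == 5120);
}
```

In a real application the `limit` argument would be the value queried for MAX_COMPUTE_SHARED_MEMORY_SIZE. The sizes used in the example are illustrative only.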

    Add Section 4.4.1.4, "Compute-Shader Inputs"

    There are no layout location qualifiers for compute shader inputs.

    Layout qualifier identifiers for compute shader inputs are the workgroup
    size qualifiers:

        layout-qualifier-id
            local_size_x = integer-constant
            local_size_y = integer-constant
            local_size_z = integer-constant

    <local_size_x>, <local_size_y>, and <local_size_z> are used to define the
    local size of the kernel defined by the compute shader in the first,
    second, and third dimension, respectively. The default size in each
    dimension is 1. If a shader does not specify a size for one of the
    dimensions, that dimension will have a size of 1.

    For example, the following declaration in a compute shader

        layout (local_size_x = 32, local_size_y = 32) in;

    declares a two-dimensional compute shader with a local size of
    32 x 32 elements, which is equivalent to a three-dimensional compute
    shader whose third dimension is one element deep.

    As another example, the declaration

        layout (local_size_x = 8) in;

    effectively specifies that a one-dimensional compute shader is being
    compiled, and its size is 8 elements.
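
The defaulting rule for the two declarations above can be worked through in C. This is a minimal sketch: the `LocalSize` type and both function names are hypothetical, and a stored value of 0 is used here to mean "not specified in the shader", which is a convention of this sketch rather than anything GLSL defines.

```c
#include <assert.h>

/* Hypothetical model of the local_size_* layout qualifiers of one shader.
 * A value of 0 stands for "not specified" (an assumption of this sketch). */
typedef struct {
    unsigned x, y, z;
} LocalSize;

/* Apply the rule from the text: an unspecified dimension has size 1. */
LocalSize resolve_local_size(LocalSize declared)
{
    LocalSize r;
    r.x = declared.x ? declared.x : 1u;
    r.y = declared.y ? declared.y : 1u;
    r.z = declared.z ? declared.z : 1u;
    return r;
}

/* Number of shader invocations in one workgroup. */
unsigned invocations_per_workgroup(LocalSize s)
{
    return s.x * s.y * s.z;
}

/* Mirrors the two declarations shown above. */
void example_local_sizes(void)
{
    LocalSize a = resolve_local_size((LocalSize){32u, 32u, 0u});
    LocalSize b = resolve_local_size((LocalSize){8u, 0u, 0u});
    assert(a.z == 1u && invocations_per_workgroup(a) == 1024u);
    assert(b.y == 1u && b.z == 1u && invocations_per_workgroup(b) == 8u);
}
```

The first declaration therefore yields 32 * 32 * 1 = 1024 invocations per workgroup and the second yields 8.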

    If the local size of the shader in any dimension is greater than the
    maximum size supported by the implementation for that dimension, a
    compile-time error results. Also, if such a layout qualifier is declared more
    than once in the same shader, all those declarations must indicate the same
    workgroup size; otherwise a compile-time error results. If multiple compute
    shaders attached to a single program object declare the workgroup size,
    the declarations must be identical; otherwise a link-time error results.
    Furthermore, if a program object contains any compute shaders, at
    least one must contain an input layout qualifier specifying the
    workgroup sizes of the program, or a link-time error will occur.
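
The compile-time and link-time rules above can be summarized as two checks. The C sketch below is illustrative only: `ShaderDecl`, `LinkResult`, `exceeds_limit` and `check_link` are hypothetical names, the per-dimension maximum is passed in by the caller, and `check_link` assumes the program contains at least one compute shader.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned x, y, z; } LocalSize;

typedef enum { LINK_OK, LINK_ERR_MISMATCH, LINK_ERR_NO_SIZE } LinkResult;

/* One compute shader attached to a program. has_size is 0 when the shader
 * contains no local_size_* layout declaration. */
typedef struct {
    int has_size;
    LocalSize size;
} ShaderDecl;

/* Compile-time rule: returns 1 if any dimension exceeds the maximum. */
int exceeds_limit(LocalSize s, LocalSize max)
{
    return s.x > max.x || s.y > max.y || s.z > max.z;
}

/* Link-time rules: all declared sizes must match, and at least one
 * attached compute shader must declare a size. */
LinkResult check_link(const ShaderDecl *shaders, size_t count)
{
    int found = 0;
    LocalSize first = {0u, 0u, 0u};
    for (size_t i = 0; i < count; ++i) {
        if (!shaders[i].has_size)
            continue;
        if (!found) {
            first = shaders[i].size;
            found = 1;
        } else if (shaders[i].size.x != first.x ||
                   shaders[i].size.y != first.y ||
                   shaders[i].size.z != first.z) {
            return LINK_ERR_MISMATCH;
        }
    }
    return found ? LINK_OK : LINK_ERR_NO_SIZE;
}

/* Two shaders, only one of which declares a size: the link succeeds. */
void example_link_check(void)
{
    ShaderDecl shaders[2] = { {1, {8u, 8u, 1u}}, {0, {0u, 0u, 0u}} };
    assert(check_link(shaders, 2) == LINK_OK);
}
```

The same-shader redeclaration rule is the same comparison applied within a single compilation unit, where a mismatch is reported at compile time instead of link time.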

Additions to Chapter 7 of the OpenGL Shading Language Specification, Version
4.20 (Built-in Variables)

    Add to the start of Section 7.1, "Built-In Language Variables", before the
    description of the vertex language built-in variables:

    In the compute language, the built-in variables are declared as follows:

        // workgroup dimensions
        in    uvec3 gl_NumWorkGroups;
        const uvec3 gl_WorkGroupSize;

        // workgroup and invocation IDs
        in    uvec3 gl_WorkGroupID;
        in    uvec3 gl_LocalInvocationID;

        // derived variables
        in    uvec3 gl_GlobalInvocationID;
        in    uint  gl_LocalInvocationIndex;

    Add to the end of Section 7.1, before Section 7.1.1:

    The built-in variable <gl_NumWorkGroups> is a compute-shader input
    variable containing the number of workgroups in each dimension of the
    dispatch that will execute the compute shader.
    Its content is equal to the values specified in the <num_groups_x>,
    <num_groups_y>, and <num_groups_z> parameters passed to the
    DispatchCompute API entry point.

    The built-in constant <gl_WorkGroupSize> is a compute-shader constant
    containing the workgroup size of the shader.
    The size of the workgroup
    in the X, Y, and Z dimensions is stored in the x, y, and z components.
    The values stored in <gl_WorkGroupSize> match those specified in the
    required <local_size_x>, <local_size_y>, and <local_size_z> layout
    qualifiers for the current shader. This value is constant so that
    it can be used to size arrays of memory that can be shared within
    the workgroup.

    The built-in variable <gl_WorkGroupID> is a compute-shader input
    variable containing the 3-dimensional index of the global workgroup
    that the current invocation is executing in. The possible values range
    across the parameters passed into DispatchCompute, i.e., from (0, 0, 0) to
    (gl_NumWorkGroups.x - 1, gl_NumWorkGroups.y - 1, gl_NumWorkGroups.z - 1).

    The built-in variable <gl_LocalInvocationID> is a compute-shader input
    variable containing the 3-dimensional index of the current invocation
    within the workgroup that it is executing in.
    The possible values for this variable range across the workgroup
    size, i.e., from (0, 0, 0) to (gl_WorkGroupSize.x - 1,
    gl_WorkGroupSize.y - 1, gl_WorkGroupSize.z - 1).

    The built-in variable <gl_GlobalInvocationID> is a compute shader input
    variable containing the global index of the current work item.
    This
    value uniquely identifies this invocation from all other invocations
    across all workgroups initiated by the current
    DispatchCompute call. This is computed as:

        gl_GlobalInvocationID =
            gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID.

    The built-in variable <gl_LocalInvocationIndex> is a compute shader
    input variable that contains the 1-dimensional representation of the
    gl_LocalInvocationID. This is useful for identifying a
    unique region of shared memory within the workgroup for this
    invocation to use. This is computed as:

        gl_LocalInvocationIndex =
            gl_LocalInvocationID.z * gl_WorkGroupSize.x * gl_WorkGroupSize.y +
            gl_LocalInvocationID.y * gl_WorkGroupSize.x +
            gl_LocalInvocationID.x;

    Add to the list of built-in constants in Section 7.3:

        const ivec3 gl_MaxComputeWorkGroupCount = { 65535, 65535, 65535 };
        const ivec3 gl_MaxComputeWorkGroupSize = { 1024, 1024, 64 };
        const int gl_MaxComputeUniformComponents = 512;
        const int gl_MaxComputeTextureImageUnits = 16;
        const int gl_MaxComputeImageUniforms = 8;
        const int gl_MaxComputeAtomicCounters = 8;
        const int gl_MaxComputeAtomicCounterBuffers = 1;

Additions to Chapter 8 of the OpenGL Shading Language Specification, Version
4.20 (Built-in Functions)

    Insert "Atomic Memory Functions" section after Section 8.10, Atomic
    Counter Functions (p. 149). Atomic memory operations are supported on
    shared variables; the set of operations and their definitions are similar
    to those for the imageAtomic*() functions. These functions are fully
    documented in the ARB_shader_storage_buffer_object extension (see
    dependencies).

    Modify the first paragraph of Section 8.15, "Shader Invocation Control
    Functions" to read:

    The shader invocation control function is only available in tessellation
    control shaders and compute shaders. It is used to control the relative
    execution order of multiple shader invocations used to process a patch
    (in the case of tessellation control shaders) or a workgroup (in the
    case of compute shaders), which are otherwise executed with an undefined
    order.

    +----------------+--------------------------------------------------------------------------+
    | Syntax         | Description                                                              |
    +----------------+--------------------------------------------------------------------------+
    | barrier        | For any given static instance of barrier() appearing in a tessellation   |
    |                | control shader or compute shader, all invocations for a single patch     |
    |                | or workgroup, respectively, must enter it before any will continue       |
    |                | beyond it.                                                               |
    +----------------+--------------------------------------------------------------------------+

    Modify the second paragraph as follows:

    ... Because invocations may execute in an undefined order between these
    barrier calls, the values of a per-vertex or per-patch output variable in
    a tessellation control shader or shared variables for compute shaders
    will be undefined in a number of cases enumerated in Section 4.3.6 "Output
    Variables" (for tessellation control shaders) and Section 4.3.7 "Shared"
    (for compute shaders).

    Replace the third paragraph with the following:

    For tessellation control shaders, the barrier() function may only be
    placed inside the function main() of the tessellation control shader and
    may not be called within any control flow. Barriers are also disallowed
    after a return statement in the function main().
    Any such misplaced
    barriers result in a compile-time error.

    For compute shaders, the barrier() function may be placed within flow
    control, but that flow control must be uniform flow control. That is, all
    the controlling expressions that lead to execution of the barrier must be
    dynamically uniform expressions. This ensures that if any shader
    invocation enters a conditional statement, then all invocations will enter
    it. While compilers are encouraged to give warnings if they can detect
    this might not happen, compilers cannot completely determine this. Hence,
    it is the author's responsibility to ensure barrier() only exists inside
    uniform flow control. Otherwise, some shader invocations will stall
    indefinitely, waiting for a barrier that is never reached by other
    invocations.

    Modify the table of memory control functions on p.160,

    +-----------------------------------+----------------------------------------------------------------------------------------+
    | Syntax                            | Description                                                                            |
    +-----------------------------------+----------------------------------------------------------------------------------------+
    | void memoryBarrier()              | Control the ordering of all memory transactions issued by a single shader invocation.  |
    +-----------------------------------+----------------------------------------------------------------------------------------+
    | void memoryBarrierAtomicCounter() | Control the ordering of accesses to atomic counter variables issued by a single shader |
    |                                   | invocation.                                                                            |
    +-----------------------------------+----------------------------------------------------------------------------------------+
    | void memoryBarrierBuffer()        | Control the ordering of memory transactions to buffer variables issued within a        |
    |                                   | single shader invocation.                                                              |
    +-----------------------------------+----------------------------------------------------------------------------------------+
    | void memoryBarrierImage()         | Control the ordering of memory transactions to images issued within a single shader    |
    |                                   | invocation.                                                                            |
    +-----------------------------------+----------------------------------------------------------------------------------------+
    | void memoryBarrierShared()        | Control the ordering of memory transactions to shared variables issued within a single |
    |                                   | shader invocation.                                                                     |
    |                                   | Only available in compute shaders.                                                     |
    +-----------------------------------+----------------------------------------------------------------------------------------+
    | void groupMemoryBarrier()         | Control the ordering of all memory transactions issued within a single shader          |
    |                                   | invocation, as viewed by other invocations in the same workgroup.                      |
    |                                   | Only available in compute shaders.                                                     |
    +-----------------------------------+----------------------------------------------------------------------------------------+

    Modify the subsequent paragraph as follows:

    The memory barrier built-in functions can be used to order reads and
    writes to variables stored in memory accessible to other shader
    invocations. When called, these functions will wait for the completion of
    all reads and writes previously performed by the caller that access
    selected variable types, and then return with no other effect. The
    built-in functions memoryBarrierAtomicCounter(), memoryBarrierBuffer(),
    memoryBarrierImage(), and memoryBarrierShared() wait for the completion of
    accesses to atomic counter, buffer, image, and shared variables,
    respectively. The built-in functions memoryBarrier() and
    groupMemoryBarrier() wait for the completion of accesses to all of the
    above variable types. The functions memoryBarrierShared() and
    groupMemoryBarrier() are available only in compute shaders; the other
    functions are available in all shader types.

    When these functions return, any memory stores performed using coherent
    variables prior to the call will be visible to any future coherent access
    to the same memory performed by any other shader invocation.
    In
    particular, the values written this way in one shader stage are guaranteed
    to be visible to coherent memory accesses performed by shader invocations
    in subsequent stages when those invocations were triggered by the
    execution of the original shader invocation (e.g., fragment shader
    invocations for a primitive resulting from a particular geometry shader
    invocation).

    Additionally, memory barrier functions order stores performed by the
    calling invocation, as observed by other shader invocations. Without
    memory barriers, if one shader invocation performs two stores to coherent
    variables, a second shader invocation might see the values written by the
    second store prior to seeing those written by the first. However, if the
    first shader invocation calls a memory barrier function between the two
    stores, selected other shader invocations will never see the results of
    the second store before seeing those of the first. When using the
    function groupMemoryBarrier(), this ordering guarantee applies only to
    other shader invocations in the same compute shader workgroup; all other
    memory barrier functions provide the guarantee to all other shader
    invocations.
    No memory barrier is required to guarantee the order of
    memory stores as observed by the invocation performing the stores; an
    invocation reading from a variable that it previously wrote will always
    see the most recently written value unless another shader invocation also
    wrote to the same memory.

Dependencies on OpenGL 4.3 and ARB_shader_storage_buffer_object

    If OpenGL 4.3 and ARB_shader_storage_buffer_object are not supported, the
    spec language adding the built-in functions atomicAdd(), atomicMin(),
    atomicMax(), atomicAnd(), atomicOr(), atomicXor(), atomicExchange(), and
    atomicCompSwap() should be considered to be incorporated into this
    extension as-is, except that buffer variables will not be supported and
    thus cannot be used with these functions. No "#extension" directive is
    necessary to use these functions in compute shaders.

    If OpenGL 4.3 and ARB_shader_storage_buffer_object are not supported,
    references to the GLSL built-in function memoryBarrierBuffer() should be
    removed.

Dependencies on NV_vertex_buffer_unified_memory

    If NV_vertex_buffer_unified_memory is supported, a new buffer address
    range and enable are provided to permit the use of
    DispatchComputeIndirect with a resident buffer object without requiring
    that it be bound to the DISPATCH_INDIRECT_BUFFER target.
    The following
    additional edits apply:

    Accepted by the <cap> parameter of GetBufferParameterui64vNV:

        DISPATCH_INDIRECT_BUFFER (defined above)

    Accepted by the <cap> parameter of Disable, Enable, and IsEnabled, and by
    the <pname> parameter of GetIntegerv, GetBooleanv, GetFloatv, GetDoublev
    and GetInteger64v:

        DISPATCH_INDIRECT_UNIFIED_NV                    0x90FD

    Accepted by the <pname> parameter of BufferAddressRangeNV
    and the <value> parameter of GetIntegerui64vNV:

        DISPATCH_INDIRECT_ADDRESS_NV                    0x90FE

    Accepted by the <value> parameter of GetIntegerv:

        DISPATCH_INDIRECT_LENGTH_NV                     0x90FF

    Add to the end of Section 5.5, after discussion of
    DispatchComputeIndirect:

    If DISPATCH_INDIRECT_UNIFIED_NV is enabled, DispatchComputeIndirect does
    not use the buffer bound to DISPATCH_INDIRECT_BUFFER. Instead, it sources
    its arguments from the GPU address range specified by calling
    BufferAddressRangeNV with a <pname> of DISPATCH_INDIRECT_ADDRESS_NV and an
    <index> of zero. The address is obtained by adding the <indirect>
    parameter to the base address of the range, specified by the <address>
    parameter of BufferAddressRangeNV.
    If the command sources data outside
    the specified address range, the error INVALID_OPERATION will be
    generated. The DISPATCH_INDIRECT_BUFFER binding will be ignored in this
    case, and no errors will be generated due to the use of this binding. The
    error INVALID_VALUE will still be generated if <indirect> is negative. No
    INVALID_VALUE error will be generated if <indirect> is not a multiple of
    four, but INVALID_OPERATION will be generated if the effective address is
    not a multiple of four. If the indirect dispatch address range does not
    belong to a buffer object that is resident at the time of the
    DispatchComputeIndirect call, undefined results, possibly including
    program termination, may occur.

    Add the following to the "Compute Dispatch State" table defined in this
    extension:

        Get Value                       Type   Get Command         Initial Value   Sec   Attribute
        ---------                       ----   -----------         -------------   ---   ---------
        DISPATCH_INDIRECT_UNIFIED_NV    B      IsEnabled           FALSE           5.5   none
        DISPATCH_INDIRECT_ADDRESS_NV    Z64+   GetIntegerui64vNV   0               5.5   none
        DISPATCH_INDIRECT_LENGTH_NV     Z+     GetIntegerv         0               5.5   none

Errors

    INVALID_OPERATION is generated by DispatchCompute or
    DispatchComputeIndirect if there is no active program for the compute
    shader stage.

    INVALID_VALUE is generated by DispatchCompute if any of <num_groups_x>,
    <num_groups_y> or <num_groups_z> is greater than the value of
    MAX_COMPUTE_WORK_GROUP_COUNT for the corresponding dimension.

    INVALID_VALUE is generated by DispatchComputeIndirect if <indirect> is
    less than zero or not a multiple of four.

    INVALID_OPERATION is generated by DispatchComputeIndirect if no buffer is
    bound to DISPATCH_INDIRECT_BUFFER or if the command would source data
    beyond the end of the bound buffer object.

    INVALID_OPERATION is generated by GetProgramiv if <pname> is
    COMPUTE_WORK_GROUP_SIZE and either the program has not been linked
    successfully, or has been linked but contains no compute shaders.

    LinkProgram will fail if <program> contains a combination of compute and
    non-compute shaders.

New State

    None.

New Implementation Dependent State

    Add to Table 6.31, "Program Pipeline Object State"

    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+
    | Get Value                                          | Type      | Get Command             | Initial Value | Description                                                           | Sec.    |
    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+
    | COMPUTE_SHADER                                     | Z+        | GetProgramPipelineiv    | 0             | Name of current compute shader program object                         | 2.11.4  |
    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+

    Add to Table 6.32, "Program Object State"

    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+
    | Get Value                                          | Type      | Get Command             | Initial Value | Description                                                           | Sec.    |
    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+
    | COMPUTE_WORK_GROUP_SIZE                            | 3 x Z+    | GetProgramiv            | { 0, ... }    | Workgroup size of a linked compute program                            | 5.5     |
    | UNIFORM_BLOCK_REFERENCED_BY_COMPUTE_SHADER         | B         | GetActiveUniformBlockiv | FALSE         | True if uniform block is referenced by the compute stage              | 2.17.7  |
    | ATOMIC_COUNTER_BUFFER_REFERENCED_BY_COMPUTE_SHADER | B         | GetActiveAtomicCounter- | FALSE         | AACB has a counter used by compute shaders                            | 2.17.7  |
    |                                                    |           | Bufferiv                |               |                                                                       |         |
    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+

    Insert new table named "Compute Dispatch State", after Table 6.46 "Hints":

    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+
    | Get Value                                          | Type      | Get Command             | Initial Value | Description                                                           | Sec.    |
    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+
    | DISPATCH_INDIRECT_BUFFER_BINDING                   | Z+        | GetIntegerv             | 0             | Indirect dispatch buffer binding                                      | 5.5     |
    +----------------------------------------------------+-----------+-------------------------+---------------+-----------------------------------------------------------------------+---------+

    Insert Table 6.50, "Implementation Dependent Compute Shader Limits",
    renumber subsequent tables.

    +-----------------------------------------+-----------+---------------+---------------------+-----------------------------------------------------------------------+---------+
    | Get Value                               | Type      | Get Command   | Minimum Value       | Description                                                           | Sec.    |
    +-----------------------------------------+-----------+---------------+---------------------+-----------------------------------------------------------------------+---------+
    | MAX_COMPUTE_WORK_GROUP_COUNT            | 3 x Z+    | GetIntegeri_v | 65535               | Maximum number of workgroups that may be dispatched by a single       | 5.5     |
    |                                         |           |               |                     | dispatch command (per dimension)                                      |         |
    | MAX_COMPUTE_WORK_GROUP_SIZE             | 3 x Z+    | GetIntegeri_v | 1024 (x, y), 64 (z) | Maximum local size of a compute workgroup (per dimension)             | 5.5     |
    | MAX_COMPUTE_WORK_GROUP_INVOCATIONS      | Z+        | GetIntegerv   | 1024                | Maximum total compute shader invocations in a single workgroup        | 5.5     |
    | MAX_COMPUTE_UNIFORM_BLOCKS              | Z+        | GetIntegerv   | 12                  | Maximum number of uniform blocks per compute program                  | 2.11.7  |
    | MAX_COMPUTE_TEXTURE_IMAGE_UNITS         | Z+        | GetIntegerv   | 16                  | Maximum number of texture image units accessible by a compute shader  | 2.11.12 |
    | MAX_COMPUTE_ATOMIC_COUNTER_BUFFERS      | Z+        | GetIntegerv   | 8                   | Number of atomic counter buffers accessed by a compute shader         | 2.11.17 |
    | MAX_COMPUTE_ATOMIC_COUNTERS             | Z+        | GetIntegerv   | 8                   | Number of atomic counters accessed by a compute shader                | 2.11.12 |
    | MAX_COMPUTE_SHARED_MEMORY_SIZE          | Z+        | GetIntegerv   | 32768               | Maximum total storage size of all variables declared as <shared> in   |         |
    |                                         |           |               |                     | all compute shaders linked into a single program object               |         |
    | MAX_COMPUTE_UNIFORM_COMPONENTS          | Z+        | GetIntegerv   | 512                 | Number of components for compute shader uniform variables             | 5.5.1   |
    | MAX_COMPUTE_IMAGE_UNIFORMS              | Z+        | GetIntegerv   | 8                   | Number of image variables in compute shaders                          | 2.11.12 |
    | MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS | Z+        | GetIntegerv   | *                   | Number of words for compute shader uniform variables in all uniform   | 5.5.1   |
    |                                         |           |               |                     | blocks, including the default                                         |         |
    +-----------------------------------------+-----------+---------------+---------------------+-----------------------------------------------------------------------+---------+

    Modify Table 6.55, increasing the following minimum values:

        MAX_COMBINED_TEXTURE_IMAGE_UNITS    96 (6*16), was 80
        MAX_UNIFORM_BUFFER_BINDINGS         72 (6*12), was 60

Issues

    1) Should <shared> variables be usable only in compute shaders, or in other
       stages too?

       RESOLVED: Support only in compute shaders. While some hardware may be
       able to support shared variables in shader stages other than compute,
       it is difficult to clearly define what the semantics are as far as
       sharing. For example, what is the equivalent for a workgroup for
       vertex shaders?

    2) Can we expose atomics on <shared> variables?

       RESOLVED: Yes.
       The existing atomics in OpenGL 4.2 (via image
       variables) don't map well to the <shared> declaration. Instead, we've
       defined new atomic functions that take a variable as a first input.
       These functions are specified in the ARB_shader_storage_buffer_object
       extension and are incorporated into this extension via the interaction
       described above. We could have also chosen to define operators +=, &=,
       etc. to be atomic when applied to <shared> variables, but shaders may
       want to use such variables in cases where atomic access (and the
       related overhead) is not required.

    3) Should the local size and dimensions of the workgroup be specified at
       compile time? What are the default local dimensions?

       RESOLVED: Dimension is always 3 and a workgroup size declaration is
       compulsory at compile time. There is no default. The value used is
       queryable. To use a 1- or 2-dimensional workgroup, the extra
       dimension(s) can be set to 1.

    4) Do we need the local_work_size parameter in dispatch if the local size
       may be specified at compile time in the shader?

       RESOLVED: The specification of the workgroup size is now mandatory in
       the shader source at compile time and the local_work_size may no longer
       be specified at dispatch time.

    5) How do multiple shaders attached to a single program object work?

       RESOLVED: Just as with any other shader stage. Exactly one of the
       shaders must provide the 'main' entry point. All shaders attached to a
       program object effectively get compiled into a single, large program at
       link time. The program is dispatched as one big entity. Über shader
       type functionality can be achieved through the use of subroutine
       uniforms, which also work exactly as for other shader stages.

    6) Should compute dispatch honor conditional rendering?

       RESOLVED: Yes, it does honor conditional rendering.

    7) Is it possible to pass compute programs to UseProgram, etc.?

       RESOLVED: Yes, compute programs can be made current via UseProgram and
       can be made current in a program pipeline object via UseProgramStages.
       Note that a compute program must be linked with PROGRAM_SEPARABLE set
       to TRUE to be passed to UseProgramStages, even though the compute
       pipeline has only a single shader stage.

       The active compute program that will be used by DispatchCompute will be
       determined in the same manner as the active program for any other
       program stage:

         * If there is a current program specified via UseProgram, that
           program is considered current for all stages, including compute.

         * Otherwise, if there is a current program pipeline object, the
           program current for the compute stage of the pipeline object is
           considered current for the compute stage.

         * If neither of the above applies, no program is current for the
           compute stage.

       The program that is current for the compute stage is considered to be
       active if and only if it has a compute shader executable. For example,
       if a non-compute program is made current via UseProgram, it will also
       be considered "current" for the compute stage, but won't be considered
       active.

       When using program pipeline objects, it's possible to switch between
       graphics and compute work without switching programs. For example, in:

         glBindProgramPipeline(pipeline);
         glUseProgramStages(pipeline, GL_VERTEX_SHADER_BIT, programA);
         glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, programB);
         glUseProgramStages(pipeline, GL_COMPUTE_SHADER_BIT, programC);
         glDrawArrays(GL_TRIANGLES, 0, 900);
         glDispatchCompute(5, 5, 5);

       the triangles will be processed by programA and programB, while the
       compute dispatch will be processed by programC.
       Similarly,

         glUseProgramStages(pipeline, ~GL_COMPUTE_SHADER_BIT, programAB);
         glUseProgramStages(pipeline, GL_COMPUTE_SHADER_BIT, programC);
         glDrawArrays(GL_TRIANGLES, 0, 900);
         glDispatchCompute(5, 5, 5);

       will have the triangles processed by the multi-stage programAB.

    8) What happens if you try to dispatch with no active compute program?

       RESOLVED: An INVALID_OPERATION error is generated if there is no
       active program for the compute shader stage.

    9) Should we increase minimums on certain replicated state bindings
       (texture image units, uniform buffer bindings) to reflect the addition
       of a sixth shader stage?

       RESOLVED: Yes, for MAX_COMBINED_TEXTURE_IMAGE_UNITS and
       MAX_UNIFORM_BUFFER_BINDINGS. These limits permit applications to
       statically partition the shared set of texture bindings into six
       separate sets, one per shader stage.

       The limit MAX_COMBINED_UNIFORM_BLOCKS is not increased, because it
       reflects the sum of the number of uniform blocks used in each stage of
       a single program. Since no single program can have more than five
       stages, this limit doesn't need to be increased.

    10) How do the shader built-in variables relate to DirectCompute's
        built-in system values (SV_*)?

        OpenGL Compute              DirectCompute
        --------------------------------------------------
        gl_NumWorkGroups            --
        gl_WorkGroupSize            --
        gl_WorkGroupID              SV_GroupID
        gl_LocalInvocationID        SV_GroupThreadID
        gl_GlobalInvocationID       SV_DispatchThreadID
        gl_LocalInvocationIndex     SV_GroupIndex

    11) How does "program validation" (checking the active programs against
        the current state) apply to DispatchCompute?

        RESOLVED: The same program validation logic will be applied to both
        graphics primitives (e.g., DrawArrays) and compute dispatches.
        Conditions that will cause validation errors for graphics primitives
        will also cause validation errors for compute dispatch, even if the
        conditions wouldn't otherwise affect compute, for example:

          * Mis-configured program pipeline objects (e.g., inserting a geometry
            program A between the linked vertex and fragment shaders of
            program B).

          * A graphics program has a vertex shader that uses a 2D texture from
            texture image unit 0 and a fragment shader that uses a 3D texture
            from texture image unit 0.

        Similarly, validation errors specific to the compute shader executable
        (e.g., using different targets on a single texture image unit in a
        compute program) will generate validation errors for graphics Draw*
        calls.

        We chose to specify this behavior for several reasons. First, using the
        same logic in both places ensures a single result for ValidateProgram
        and ValidateProgramPipeline (a single VALIDATE_STATUS value wouldn't be
        good enough if the result could be different for compute and graphics).
        Additionally, a single test allows implementations to set up state and
        perform validation tests for compute and graphics operations at the same
        time, without requiring additional irregular graphics- or
        compute-specific logic.

    12) We specify an INVALID_OPERATION error for DispatchCompute when there
        is no active program on the compute stage. Should we specify similar
        errors for Draw* calls if the current program specified by UseProgram
        is a compute program?

        RESOLVED: Not in the current spec. If a compute program is made
        current with UseProgram, there will be no active program for either the
        vertex or fragment stages. In this case, the results of vertex and
        fragment processing are undefined, but no error is generated. This
        behavior is already specified in unextended OpenGL 4.2.

        We don't generate errors in this case for several reasons:

          * For the compatibility profile, fixed-function vertex and fragment
            processing is available, and INVALID_OPERATION wouldn't make sense
            there.

          * Even in the core profile, there are cases where no active fragment
            shader is needed (e.g., primitives with RASTERIZER_DISCARD enabled).

        While there is no case where having only a compute program makes sense,
        at least in the core profile, we chose to keep the same undefined
        behavior that's already in place.

    13) Should we provide any additional support extending the memoryBarrier()
        GLSL built-in function provided by ARB_shader_image_load_store and
        GLSL 4.20?

        RESOLVED: Yes. The memoryBarrier() function provided by GLSL 4.20
        requires (a) synchronizing all memory transactions that might be visible
        to other shader invocations and (b) ordering memory transactions so that
        all other shader invocations never see stores issued after the barrier
        before seeing stores issued before the barrier.
        Hardware implementations of GLSL 4.20 may have a high degree of
        parallelism, where the memory subsystem servicing shader loads and
        stores may have multiple independent sub-units, and where the shader
        invocations themselves may be executed in parallel on many shader
        cores. The memoryBarrier() command may be fairly heavyweight, requiring
        synchronization with all memory sub-units and shader cores.

        We provide new functions in two different directions that might serve as
        lighter-weight alternatives to memoryBarrier(). In particular, we
        provide four new functions

          void memoryBarrierAtomicCounter();
          void memoryBarrierBuffer();
          void memoryBarrierImage();
          void memoryBarrierShared();

        that order transactions of only a specific memory type and might require
        synchronization with fewer sub-units of the memory subsystem, and a new
        function:

          void groupMemoryBarrier();

        that only orders transactions as viewed by other threads in the same
        workgroup, which might not require synchronization with other shader
        cores. Since shared memory is only accessible to threads within a single
        workgroup, memoryBarrierShared() also only requires synchronization with
        other threads in the same workgroup.
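
    As an illustration of the built-in variables compared in issue 10, the
    following CPU-side sketch computes the values an invocation would
    observe for gl_GlobalInvocationID and gl_LocalInvocationIndex from its
    workgroup ID, workgroup size and local ID. The uvec3 struct and both
    function names are hypothetical helpers that merely mirror the derived
    variable definitions in the shading language; they are not part of any
    API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU-side stand-in for the GLSL uvec3 type. */
typedef struct { uint32_t x, y, z; } uvec3;

/* Mirrors: gl_GlobalInvocationID =
 *          gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID */
uvec3 global_invocation_id(uvec3 group_id, uvec3 group_size, uvec3 local_id)
{
    uvec3 g;

    /* A local ID is always smaller than the workgroup size per dimension. */
    assert(local_id.x < group_size.x);
    assert(local_id.y < group_size.y);
    assert(local_id.z < group_size.z);

    g.x = group_id.x * group_size.x + local_id.x;
    g.y = group_id.y * group_size.y + local_id.y;
    g.z = group_id.z * group_size.z + local_id.z;
    return g;
}

/* Mirrors: gl_LocalInvocationIndex =
 *          gl_LocalInvocationID.z * gl_WorkGroupSize.x * gl_WorkGroupSize.y +
 *          gl_LocalInvocationID.y * gl_WorkGroupSize.x +
 *          gl_LocalInvocationID.x */
uint32_t local_invocation_index(uvec3 group_size, uvec3 local_id)
{
    return local_id.z * group_size.x * group_size.y
         + local_id.y * group_size.x
         + local_id.x;
}
```

    For a 2-dimensional workgroup, the z size is set to 1 as described in
    issue 3, so the z term of the flattened index is always zero.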

Revision History

    Rev.  Date      Author     Changes
    ----  --------  ---------  -----------------------------------------
      28  12/10/18  Jon Leech  Use 'workgroup' consistently throughout (Bug
                               11723, internal API issue 87).
      27  07/24/14  Jon Leech  Change value of GLSL limit
                               gl_MaxComputeUniformComponents to 512 for
                               consistency with the API (Bug 12370).
      26  01/30/14  Jon Leech  Add table 6.31 COMPUTE_SHADER entry for
                               program pipeline objects (Bug 11539).
      25  10/23/12  pbrown     Remove the restriction forbidding the use of
                               barrier() inside potentially divergent flow
                               control. Instead, we will allow barrier() to
                               be executed anywhere, but specify undefined
                               results (including hangs or program termination)
                               if the flow control is divergent (bug 9367).
      24  07/01/12  Jon Leech  Fix typo (bug 8984).
      23  06/28/12  johnk      Remove two other references to "thread", add
                               "Only available in compute shaders" to the table
                               for memoryBarrierShared() and groupMemoryBarrier(),
                               fixed a typo.
      22  06/22/12  pbrown     Add a new built-in memoryBarrierBuffer() as an
                               interaction with
                               ARB_shader_storage_buffer_object. Add
                               a new built-in groupMemoryBarrier() that orders
                               memory transactions only as observed by other
                               shader invocations in the same work group.
                               Enhance the description of the GLSL memory
                               barrier functions. Add issue 13 about the new
                               memory barrier functions added in this extension
                               (bug 9199). Mark issues 11 and 12 as resolved.
                               Add NV_vertex_buffer_unified_memory interaction
                               allowing DispatchComputeIndirect to read its
                               arguments from any resident buffer object
                               instead of the single bound indirect dispatch
                               buffer.
      21  06/21/12  gsellers   Clarify that there are no built-in inputs or
                               outputs in compute shaders (bug 9200).
      20  06/21/12  gsellers   Throw INVALID_OPERATION if querying
                               COMPUTE_WORK_GROUP_SIZE from unlinked program or
                               program with no compute shader (bug 9117).
      19  06/18/12  pbrown     DispatchComputeIndirect throws INVALID_VALUE
                               if <indirect> is negative or misaligned (bug
                               9181).
      18  06/17/12  pbrown     Clarify that compute-only programs can be used
                               by both UseProgram and UseProgramStages, and add
                               a COMPUTE_SHADER_BIT for UseProgramStages (bug
                               9155). Specify that validation errors checking
                               programs against each other and the GL state
                               apply equally to graphics primitives (Draw*) and
                               compute dispatches. Update issue 7; add new
                               issues 11 and 12. Clarify that compute shader
                               invocations in a workgroup are run "potentially
                               in parallel", but not "in lockstep" (bug 9151).
                               Other minor wording improvements.
      17  06/15/12  johnk      Don't allow location layout qualifiers for
                               compute shader inputs.
      16  06/15/12  johnk      In the intro material, allow work groups to
                               only potentially execute in parallel, and use
                               control barriers to synchronize. Other minor
                               fixes.
      15  06/15/12  dgkoch     Added Additions to Ch.2 of Shading Language.
                               Renamed shader built-in variables, explained
                               them better, made them uvec3 instead of int[3].
                               Added derived shading language variables.
                               Renamed and changed built-in constants for
                               consistency with the variables. Removed
                               gl_MaxComputeWorkDimensions since it is no
                               longer necessary. Renamed API constants to
                               be consistent with shading language terminology.
                               Remove a few rogue references to variable
                               number of dispatch arguments. Added Issue 10.
                               (bugs 9151, 9167)
      14  06/14/12  pbrown     Modify DispatchComputeIndirect to accept an
                               "intptr"-typed offset instead of a "void *",
                               since it doesn't accept pointers to client memory.
                               Modify DispatchComputeIndirect to use a new
                               buffer binding (DISPATCH_INDIRECT_BUFFER)
                               instead of sharing the binding used by
                               Draw*Indirect. Add missing entries in the "New
                               Tokens" section and assign values.
                               Update
                               documentation of COMMAND_BARRIER_BIT to reflect
                               the new dispatch indirect binding. Document
                               DispatchComputeIndirect errors for offsets that
                               are negative, misaligned, or run off the end of
                               the bound buffer. Increase minimums for
                               combined texture image units and uniform buffer
                               bindings to reflect the new stage. Update
                               various issues, add new issue 9 (bug 9130).
      13  06/14/12  Jon Leech  Copy description of MAX_COMPUTE_SHARED_MEMORY_SIZE
                               into API spec from GLSL spec (bug 9069).
      12  05/14/12  pbrown     Add interaction with
                               ARB_shader_storage_buffer_object. The built-in
                               functions provided there
                               for atomic memory operations on buffer variables
                               are also supported for the shared variables
                               provided here. The functions themselves are
                               documented fully in the other specification.
      11  05/14/12  johnk      Keep the previous logical contents of the last
                               paragraph of the memory shader control functions.
      10  04/26/12  gsellers   Count max compute shared variable size in bytes.
                               Make shared variables implicitly coherent.
                               Add MAX_COMPUTE_UNIFORM_COMPONENTS.
                               Clean up MAX_COMPUTE_IMAGE_UNIFORMS.
       9  04/25/12  gsellers   Add UNIFORM_BLOCK_REFERENCED_BY_COMPUTE_SHADER
                               and
                               ATOMIC_COUNTER_BUFFER_REFERENCED_BY_COMPUTE_SHADER.
                               Remove <program> from dispatch
                               APIs.
                               Add memoryBarrier{Image,Shared,AtomicCounter}().
       8  04/05/12  gsellers   Remove ARB suffixes.
       7  02/02/12  gsellers   Require OpenGL 4.2.
                               Add issue 8.
                               Up various minimums.
                               Remove variable dimensionality.
       6  01/24/12  gsellers   Require OpenGL 3.0.
                               Incorporate feedback from bmerry.
                               Add compute shader constants to sec. 7.7.
                               Add modifications to sec. 8.15 of the GLSL spec.
                               Add issue 7.
       5  01/20/12  gsellers   Make compute dispatch honor conditional
                               rendering. Add indirect dispatch.
                               Change 'global work size' to 'num work groups',
                               make global size in multiples of work group size.
       4  01/10/12  gsellers   Fix typos and other small corrections.
                               Make specification of work group size at compile
                               time compulsory.
                               Add COMPUTE_WORK_DIMENSION_ARB and
                               COMPUTE_LOCAL_WORK_SIZE_ARB queries.
                               Add issue (5), resolve issues (3) and (4).
       3  01/09/12  gsellers   Change from AMD to ARB.
                               Update to be relative to OpenGL 4.2 (+GLSL 4.20).
                               Add <shared> variables.
                               Add issues (1) - (4).
                               Add link failure for programs that contain
                               compute and non-compute shaders.
       2  06/10/11  gsellers   Add error behavior.
                               Shading language changes.
                               Add global_offset parameter.
                               Add implementation dependent limits.
       1  09/24/10  gsellers   Initial revision