Name

    ARB_fragment_shader_interlock

Name Strings

    GL_ARB_fragment_shader_interlock

Contact

    Slawomir Grajewski, Intel (slawomir.grajewski 'at' intel.com)

Contributors

    Contributors to INTEL_fragment_shader_ordering
    Contributors to NV_fragment_shader_interlock

Notice

    Copyright (c) 2015 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

Specification Update Policy

    Khronos-approved extension specifications are updated in response to
    issues and bugs prioritized by the Khronos OpenGL Working Group. For
    extensions which have been promoted to a core Specification, fixes will
    first appear in the latest version of that core Specification, and will
    eventually be backported to the extension document. This policy is
    described in more detail at
        https://www.khronos.org/registry/OpenGL/docs/update_policy.php

Status

    Complete. Approved by the ARB on June 26, 2015.
    Ratified by the Khronos Board of Promoters on August 7, 2015.

Version

    Last Modified Date: May 7, 2015
    Revision: 2

Number

    ARB Extension #177

Dependencies

    This extension is written against the OpenGL 4.5 (Core Profile)
    Specification.

    This extension is written against version 4.50 (revision 5) of the OpenGL
    Shading Language Specification.

    OpenGL 4.2 or ARB_shader_image_load_store is required; GLSL 4.20 is
    required.

Overview

    In unextended OpenGL 4.5, applications may produce a
    large number of fragment shader invocations that perform loads and
    stores to memory using image uniforms, atomic counter uniforms,
    buffer variables, or pointers. The order in which loads and stores
    to common addresses are performed by different fragment shader
    invocations is largely undefined.
    For algorithms that use shader writes and touch the same pixels more
    than once, one or more of the following techniques may be required to
    ensure proper execution ordering:

      * inserting Finish or WaitSync commands to drain the pipeline between
        different "passes" or "layers";

      * using only atomic memory operations to write to shader memory (which
        may be relatively slow and limits how memory may be updated); or

      * injecting spin loops into shaders to prevent multiple shader
        invocations from touching the same memory concurrently.

    This extension provides new GLSL built-in functions
    beginInvocationInterlockARB() and endInvocationInterlockARB() that delimit
    a critical section of fragment shader code. For pairs of shader
    invocations with "overlapping" coverage in a given pixel, the OpenGL
    implementation will guarantee that the critical section of the fragment
    shader will be executed for only one fragment at a time.

    There are four different interlock modes supported by this extension,
    which are identified by layout qualifiers. The qualifiers
    "pixel_interlock_ordered" and "pixel_interlock_unordered" provide mutual
    exclusion in the critical section for any pair of fragments corresponding
    to the same pixel.
    When using multisampling, the qualifiers
    "sample_interlock_ordered" and "sample_interlock_unordered" only provide
    mutual exclusion for pairs of fragments that both cover at least one
    common sample in the same pixel; these are recommended for performance if
    shaders use per-sample data structures.

    Additionally, when the "pixel_interlock_ordered" or
    "sample_interlock_ordered" layout qualifier is used, the interlock also
    guarantees that the critical section for multiple shader invocations with
    "overlapping" coverage will be executed in the order in which the
    primitives were processed by the GL. Such a guarantee is useful for
    applications like blending in the fragment shader, where an application
    requires fragment values to be composited in the framebuffer in
    primitive order.

    This extension can be useful for algorithms that need to access per-pixel
    data structures via shader loads and stores. Using this extension, such
    algorithms can access these data structures in the critical section
    without worrying about other invocations for the same pixel accessing the
    data structures concurrently. Additionally, the ordering guarantees are
    useful for cases where the API ordering of fragments is meaningful.
    For example, applications may be able to execute programmable blending
    operations in the fragment shader, where the destination buffer is read
    via image loads and the final value is written via image stores.

New Procedures and Functions

    None.

New Tokens

    None.

Modifications to the OpenGL Shading Language Specification, Version 4.50

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_ARB_fragment_shader_interlock : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_ARB_fragment_shader_interlock 1

    Modify Section 4.4.1.3, Fragment Shader Inputs (p. 63)

    (add to the list of layout qualifiers containing "early_fragment_tests",
    p. 63, and modify the surrounding language to reflect that multiple
    layout qualifiers are supported on "in")

      layout-qualifier-id
        pixel_interlock_ordered
        pixel_interlock_unordered
        sample_interlock_ordered
        sample_interlock_unordered

    (add to the end of the section, p. 63)

    The identifiers "pixel_interlock_ordered", "pixel_interlock_unordered",
    "sample_interlock_ordered", and "sample_interlock_unordered" control the
    ordering of the execution of shader invocations between calls to the
    built-in functions beginInvocationInterlockARB() and
    endInvocationInterlockARB(), as described in section 8.13.3. A
    compile or link error will be generated if more than one of these layout
    qualifiers is specified in shader code. If a program containing a
    fragment shader includes none of these layout qualifiers, it is as
    though "pixel_interlock_ordered" were specified.

    Add to the end of Section 8.13, Fragment Processing Functions (p. 170)

    8.13.3, Fragment Shader Execution Ordering Functions

    By default, fragment shader invocations are generally executed in
    undefined order. Multiple fragment shader invocations may be executed
    concurrently, including multiple invocations corresponding to a single
    pixel.
    Additionally, fragment shader invocations for a single pixel might
    not be processed in the order in which the primitives generating the
    fragments were specified in the OpenGL API.

    The paired functions beginInvocationInterlockARB() and
    endInvocationInterlockARB() allow shaders to specify a critical section,
    inside which stronger execution ordering is guaranteed. When using the
    "pixel_interlock_ordered" or "pixel_interlock_unordered" qualifier,
    ordering guarantees are provided for any pair of fragment shader
    invocations X and Y triggered by fragments A and B corresponding to the
    same pixel. When using the "sample_interlock_ordered" or
    "sample_interlock_unordered" qualifier, ordering guarantees are provided
    for any pair of fragment shader invocations X and Y triggered by fragments
    A and B that correspond to the same pixel, where at least one sample of
    the pixel is covered by both fragments. No ordering guarantees are
    provided for pairs of fragment shader invocations corresponding to
    different pixels. Additionally, no ordering guarantees are provided for
    pairs of fragment shader invocations corresponding to the same fragment.
    When multisampling is enabled and the framebuffer has sample buffers,
    multiple fragment shader invocations may result from a single fragment due
    to the use of the "sample" auxiliary storage qualifier, OpenGL API
    commands forcing multiple shader invocations per fragment, or for other
    implementation-dependent reasons.

    When using the "pixel_interlock_unordered" or "sample_interlock_unordered"
    qualifier, the interlock will ensure that the critical sections of
    fragment shader invocations X and Y with overlapping coverage will never
    execute concurrently. That is, invocation X is guaranteed to complete its
    call to endInvocationInterlockARB() before invocation Y completes its call
    to beginInvocationInterlockARB(), or vice versa.

    When using the "pixel_interlock_ordered" or "sample_interlock_ordered"
    layout qualifier, the critical sections of invocations X and Y with
    overlapping coverage will be executed in a specific order, based on the
    relative order assigned to their fragments A and B. If fragment A is
    considered to precede fragment B, the critical section of invocation X is
    guaranteed to complete before the critical section of invocation Y begins.
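
    The following is a minimal sketch of the ordered case applied to
    programmable "over" blending. It is not part of this specification. It
    assumes GLSL 4.50, a hypothetical destination image "destImage" that the
    application binds to image unit 0 with format RGBA8 (for example, with
    BindImageTexture), and a hypothetical input "srcColor". It also assumes
    the application disables color writes, because the shader declares no
    outputs. Because "pixel_interlock_ordered" is used, the load, blend, and
    store for overlapping fragments should occur in primitive order.

```glsl
#version 450
#extension GL_ARB_fragment_shader_interlock : require

// Ordered interlock: critical sections of fragments covering the same pixel
// execute one at a time, in the order their primitives were processed.
layout(pixel_interlock_ordered) in;

// Hypothetical destination color image (image unit 0, format RGBA8).
// "coherent" makes stores from one invocation's critical section visible
// to loads performed in a later invocation's critical section.
layout(binding = 0, rgba8) coherent uniform image2D destImage;

// Hypothetical interpolated source color (not premultiplied).
in vec4 srcColor;

void main()
{
    ivec2 pixel = ivec2(gl_FragCoord.xy);

    // Called exactly once, directly in main(), outside any flow control.
    beginInvocationInterlockARB();

    vec4 dst = imageLoad(destImage, pixel);
    // Same formula as glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)
    // applied to all four channels.
    vec4 result = srcColor * srcColor.a + dst * (1.0 - srcColor.a);
    imageStore(destImage, pixel, result);

    endInvocationInterlockARB();
}
```

    Replacing the qualifier with "pixel_interlock_unordered" would still
    prevent torn read-modify-write updates. However, because "over" blending
    is not commutative, the result could then depend on the order in which
    fragments happened to reach the critical section.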

    When a pair of fragments A and B have overlapping coverage, fragment A is
    considered to precede fragment B if

      * the OpenGL API command producing fragment A was called prior to the
        command producing B, or

      * the point, line, triangle, [[compatibility profile: quadrilateral,
        polygon,]] or patch primitive producing fragment A appears earlier in
        the same strip, loop, fan, or independent primitive list producing
        fragment B.

    When [[compatibility profile: decomposing quadrilateral or polygon
    primitives or]] tessellating a single patch primitive, multiple
    primitives may be generated in an undefined implementation-dependent
    order. When fragments A and B are generated from such unordered
    primitives, their ordering is also implementation-dependent.

    If fragment shader invocation X completes its critical section before
    invocation Y begins its critical section, all stores to memory performed
    in the critical section of invocation X using a pointer, image uniform,
    atomic counter uniform, or buffer variable qualified by "coherent" are
    guaranteed to be visible to any reads of the same types of variable
    performed in the critical section of invocation Y.

    If multisampling is disabled, or if the framebuffer does not include
    sample buffers, fragment coverage is computed per-pixel.
    In this case, the "sample_interlock_ordered" and
    "sample_interlock_unordered" layout qualifiers are treated as
    "pixel_interlock_ordered" and "pixel_interlock_unordered", respectively.

    Syntax:

      void beginInvocationInterlockARB(void);
      void endInvocationInterlockARB(void);

    Description:

    The functions beginInvocationInterlockARB() and
    endInvocationInterlockARB() may only be placed inside the function main()
    of a fragment shader and may not be called within any flow control. These
    functions may not be called after a return statement in the function
    main(), but may be called after a discard statement. A compile- or
    link-time error will be generated if main() calls either function more
    than once, contains a call to one function without a matching call to the
    other, or calls endInvocationInterlockARB() before calling
    beginInvocationInterlockARB().

Additions to the AGL/GLX/WGL Specifications

    None.

Errors

    None.

New State

    None.

New Implementation Dependent State

    None.

Issues

    (1) When using multisampling, the OpenGL specification permits
        multiple fragment shader invocations to be generated for a single
        fragment. For example, the "sample" auxiliary storage qualifier or
        the MinSampleShading() OpenGL API command can be used to force
        per-sample shading. What execution ordering guarantees are provided
        between fragment shader invocations generated from the same fragment?

      RESOLVED: We don't provide any ordering guarantees in this extension.
      This implies that when using multisampling, there is no guarantee that
      two fragment shader invocations for the same fragment won't be executing
      their critical sections concurrently. This could cause problems for
      algorithms sharing data structures between all the samples of a pixel
      unless accesses to these data structures are performed atomically.

      When using per-sample shading, the interlock we provide *does* guarantee
      that no two invocations corresponding to the same sample execute the
      critical section concurrently. If a separate set of data structures is
      provided for each sample, no conflicts should occur within the critical
      section.
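
      The following is a minimal sketch of the per-sample approach. It is not
      part of this specification. It assumes GLSL 4.50 and a hypothetical
      multisample image "destImageMS" at image unit 0 with format RGBA8. That
      image must have at least as many samples as the framebuffer, which
      requires the implementation's MAX_IMAGE_SAMPLES limit to permit
      multisample images. It also assumes a hypothetical input "srcColor".
      Statically reading gl_SampleID forces per-sample shading, so each
      invocation touches only its own sample's storage.

      With "sample_interlock_ordered", any two invocations whose fragments
      share a covered sample are ordered. That includes every pair that can
      touch the same sample's storage. Invocations of the same fragment remain
      unordered, but they access disjoint samples.

```glsl
#version 450
#extension GL_ARB_fragment_shader_interlock : require

// Sample interlock: invocations are ordered only when their fragments
// share at least one covered sample.
layout(sample_interlock_ordered) in;

// Hypothetical per-sample destination image (image unit 0, format RGBA8)
// with at least as many samples as the framebuffer.
layout(binding = 0, rgba8) coherent uniform image2DMS destImageMS;

// Hypothetical interpolated source color (not premultiplied).
in vec4 srcColor;

void main()
{
    ivec2 pixel = ivec2(gl_FragCoord.xy);

    // Static use of gl_SampleID forces one invocation per covered sample,
    // so each invocation owns exactly one sample's storage.
    int sampleIndex = gl_SampleID;

    beginInvocationInterlockARB();

    vec4 dst = imageLoad(destImageMS, pixel, sampleIndex);
    vec4 result = srcColor * srcColor.a + dst * (1.0 - srcColor.a);
    imageStore(destImageMS, pixel, sampleIndex, result);

    endInvocationInterlockARB();
}
```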

      Note that in addition to the per-sample shading options in the shading
      language and API, implementations may provide multisample antialiasing
      modes where the implementation can't simply run the fragment shader once
      and broadcast results to a large set of covered samples.

    (2) What performance differences are expected between shaders using the
        "pixel" and "sample" layout qualifier variants in this extension (e.g.,
        "pixel_interlock_ordered" and "sample_interlock_ordered")?

      RESOLVED: We expect that shaders using "sample" qualifiers may have
      higher performance, since the implementation need not order pairs of
      fragments that touch the same pixel with "complementary" coverage. Such
      situations are fairly common: when two adjacent triangles combine to
      cover a given pixel, two fragments will be generated for the pixel but
      no sample will be covered by both. When using "sample" qualifiers, the
      invocations for both fragments can run concurrently. When using "pixel"
      qualifiers, the critical section for one fragment must wait until the
      critical section for the other fragment completes.

    (3) What performance differences are expected between shaders using the
        "ordered" and "unordered" layout qualifier variants in this extension
        (e.g., "pixel_interlock_ordered" and "pixel_interlock_unordered")?

      RESOLVED: We expect that shaders using "unordered" may have higher
      performance, since the critical section implementation doesn't need to
      ensure that all previous invocations with overlapping coverage have
      completed their critical sections. Some algorithms (e.g., building data
      structures in order-independent transparency algorithms) will require
      mutual exclusion when updating per-pixel data structures, but do not
      require that shaders execute in a specific ordering.

    (4) Are fragment shaders using this extension allowed to write outputs?
        If so, is there any guarantee on the order in which such outputs are
        written to the framebuffer?

      RESOLVED: Yes, fragment shaders with critical sections may still write
      outputs. If fragment shader outputs are written, they are stored or
      blended into the framebuffer in API order, as is the case for fragment
      shaders not using this extension.

    (5) What considerations apply when using this extension to implement a
        programmable form of conventional blending using image stores?

      RESOLVED: Per-fragment operations performed in the pipeline following
      fragment shader execution obviously have no effect on image stores
      executing during fragment shader execution.
      In particular, multisample operations such as broadcasting a single
      fragment output to multiple samples or modifying the coverage with
      alpha-to-coverage or a shader coverage mask output value have no
      effect. Fragments cannot be killed before fragment shader blending
      using the fixed-function alpha test or using the depth test with a Z
      value produced by the shader. Fixed-function depth and stencil tests
      normally run after the fragment shader and so will not kill fragments
      before their image stores, but those tests can be moved ahead of
      fragment shader execution using the layout qualifier
      "early_fragment_tests". Any fixed-function operations that must be
      handled before programmable blending and that "early_fragment_tests"
      cannot move ahead of the shader would need to be emulated in the
      shader.

      Note also that performing blend computations in the shader is not
      guaranteed to produce results that are bit-identical to those produced
      by fixed-function blending hardware, even if mathematically equivalent
      algorithms are used.

    (6) For operations accessing shared per-pixel data structures in the
        critical section, what operations (if any) must be performed in shader
        code to ensure that stores from one shader invocation are visible to
        the next?

      RESOLVED: The "coherent" qualifier is required in the declaration of
      the shared data structures to ensure that writes performed by one
      invocation are visible to reads performed by another invocation.

      In shaders that don't use the interlock, "coherent" is not sufficient as
      there is no guarantee of the ordering of fragment shader invocations;
      even if invocation A can see the values written by another invocation B,
      there is no general guarantee that invocation A's read will be performed
      before invocation B's write. The built-in function memoryBarrier() can
      be used to generate a weak ordering by which threads can communicate,
      but it doesn't order memory transactions between two separate
      invocations. With the interlock, execution ordering between two threads
      from the same pixel is well-defined as long as the loads and stores are
      performed inside the critical section, and the use of "coherent" ensures
      that stores done by one invocation are visible to other invocations.

    (7) Should we provide an explicit mechanism for shaders to indicate a
        critical section? Or should we just automatically infer a critical
        section by analyzing shader code? Or should we just wrap the entire
        fragment shader in a critical section?

      RESOLVED: Provide an explicit critical section.
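
      The following sketch illustrates an explicit and deliberately small
      critical section. It is not part of this specification. It assumes
      GLSL 4.50, a hypothetical accumulation image "accumImage" at image unit
      0 with format RGBA32F, hypothetical inputs "surfaceNormal" and
      "baseColor", and a hypothetical helper shade(). The shading work
      touches no shared memory, so it runs before
      beginInvocationInterlockARB() and only the per-pixel read-modify-write
      is serialized. Because the accumulation is commutative,
      "pixel_interlock_unordered" is sufficient, as discussed in issue (3).
      Results may still differ in the last bits between runs because
      floating-point addition is not associative.

```glsl
#version 450
#extension GL_ARB_fragment_shader_interlock : require

// Unordered interlock: mutual exclusion without primitive ordering, which is
// enough because the accumulation below is commutative.
layout(pixel_interlock_unordered) in;

// Hypothetical accumulator (image unit 0, format RGBA32F):
// rgb = sum of alpha-weighted colors, a = sum of alphas.
layout(binding = 0, rgba32f) coherent uniform image2D accumImage;

// Hypothetical shading inputs.
in vec3 surfaceNormal;
in vec4 baseColor;

// Hypothetical stand-in for expensive shading that touches no shared memory.
vec4 shade(vec3 n, vec4 c)
{
    float diffuse = max(dot(normalize(n), vec3(0.0, 0.0, 1.0)), 0.0);
    return vec4(c.rgb * diffuse, c.a);
}

void main()
{
    // Outside the critical section: may overlap freely with other
    // invocations for the same pixel.
    vec4 color = shade(surfaceNormal, baseColor);
    ivec2 pixel = ivec2(gl_FragCoord.xy);

    beginInvocationInterlockARB();

    // GLSL 4.50 provides image atomic add only for int and uint data, so this
    // vec4 float update relies on the interlock for exclusive access.
    vec4 acc = imageLoad(accumImage, pixel);
    imageStore(accumImage, pixel, acc + vec4(color.rgb * color.a, color.a));

    endInvocationInterlockARB();
}
```

      A later resolve pass (not shown) could divide the accumulated rgb by the
      accumulated alpha to obtain an alpha-weighted average color per pixel.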

      We definitely don't want to wrap the entire shader in a critical section
      when a smaller section will suffice. Doing so would hold off the
      execution of any other fragment shader invocation with the same (x,y)
      for the entire (potentially long) life of the fragment shader. Hardware
      would need to track a large number of fragments awaiting execution, and
      may be so backed up that further fragments will be blocked even if they
      don't overlap with any fragments currently executing. Providing a
      smaller critical section reduces the amount of time other fragments are
      blocked and allows implementations to perform useful work for
      conflicting fragments before they hit the critical section.

      While a compiler could analyze the code and wrap a critical section
      around all memory accesses, it may be difficult to determine which
      accesses actually require mutual exclusion and ordering, and which
      accesses are safe to do with no protection. Requiring shaders to
      explicitly identify a critical section doesn't seem overwhelmingly
      burdensome, and allows applications to exclude memory accesses that they
      know to be "safe".

    (8) What restrictions should be imposed on the use of the
        beginInvocationInterlockARB() and endInvocationInterlockARB() functions
        delimiting a critical section?

      RESOLVED: We impose restrictions similar to those on the barrier()
      built-in function in tessellation control shaders to ensure that any
      shader using this functionality has a single critical section that can
      be easily identified during compilation. In particular, we require that
      these functions be called in main() and don't permit them to be called
      in conditional flow control.

      These restrictions ensure that there is always exactly one call to the
      "begin" and "end" functions in a predictable location in the compiled
      shader code, and ensure that the compiler and hardware don't have to
      deal with unusual cases (like entering a critical section and never
      leaving, leaving a critical section without entering it, or trying to
      enter a critical section more than once).

Revision History

    Rev.  Date      Author       Changes
    ----  --------  -----------  -----------------------------------------
      1   04/01/15  S.Grajewski  Initial version merging
                                 INTEL_fragment_shader_ordering with
                                 NV_fragment_shader_interlock

      2   05/07/15  S.Grajewski  Built-in functions
                                 beginInvocationInterlockARB() and
                                 endInvocationInterlockARB() now have ARB
                                 suffixes.