Name

    ARB_fragment_shader_interlock

Name Strings

    GL_ARB_fragment_shader_interlock

Contact

    Slawomir Grajewski, Intel  (slawomir.grajewski 'at' intel.com)

Contributors

    Contributors to INTEL_fragment_shader_ordering
    Contributors to NV_fragment_shader_interlock

Notice

    Copyright (c) 2015 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

Specification Update Policy

    Khronos-approved extension specifications are updated in response to
    issues and bugs prioritized by the Khronos OpenGL Working Group. For
    extensions which have been promoted to a core Specification, fixes will
    first appear in the latest version of that core Specification, and will
    eventually be backported to the extension document. This policy is
    described in more detail at
        https://www.khronos.org/registry/OpenGL/docs/update_policy.php

Status

    Complete. Approved by the ARB on June 26, 2015.
    Ratified by the Khronos Board of Promoters on August 7, 2015.

Version

    Last Modified Date:        May 7, 2015
    Revision:                  2

Number

    ARB Extension #177

Dependencies

    This extension is written against the OpenGL 4.5 (Core Profile)
    Specification.

    This extension is written against version 4.50 (revision 5) of the OpenGL
    Shading Language Specification.

    OpenGL 4.2 or ARB_shader_image_load_store is required; GLSL 4.20 is
    required.

Overview

    In unextended OpenGL 4.5, applications may produce a
    large number of fragment shader invocations that perform loads and
    stores to memory using image uniforms, atomic counter uniforms,
    buffer variables, or pointers. The order in which loads and stores
    to common addresses are performed by different fragment shader
    invocations is largely undefined.  For algorithms that use shader
    writes and touch the same pixels more than once, one or more of the
    following techniques may be required to ensure proper execution ordering:

      * inserting Finish or WaitSync commands to drain the pipeline between
        different "passes" or "layers";

      * using only atomic memory operations to write to shader memory (which
        may be relatively slow and limits how memory may be updated); or

      * injecting spin loops into shaders to prevent multiple shader
        invocations from touching the same memory concurrently.

    This extension provides new GLSL built-in functions
    beginInvocationInterlockARB() and endInvocationInterlockARB() that delimit
    a critical section of fragment shader code.  For pairs of shader
    invocations with "overlapping" coverage in a given pixel, the OpenGL
    implementation will guarantee that the critical section of the fragment
    shader will be executed for only one fragment at a time.

    There are four different interlock modes supported by this extension,
    which are identified by layout qualifiers.  The qualifiers
    "pixel_interlock_ordered" and "pixel_interlock_unordered" provide mutual
    exclusion in the critical section for any pair of fragments corresponding
    to the same pixel.  When using multisampling, the qualifiers
    "sample_interlock_ordered" and "sample_interlock_unordered" only provide
    mutual exclusion for pairs of fragments that both cover at least one
    common sample in the same pixel; these are recommended for performance if
    shaders use per-sample data structures.

    Additionally, when the "pixel_interlock_ordered" or
    "sample_interlock_ordered" layout qualifier is used, the interlock also
    guarantees that the critical section for multiple shader invocations with
    "overlapping" coverage will be executed in the order in which the
    primitives were processed by the GL.  Such a guarantee is useful for
    applications like blending in the fragment shader, where an application
    requires fragment values to be composited in the framebuffer in
    primitive order.

    This extension can be useful for algorithms that need to access per-pixel
    data structures via shader loads and stores.  Such algorithms using this
    extension can access such data structures in the critical section without
    worrying about other invocations for the same pixel accessing the data
    structures concurrently.  Additionally, the ordering guarantees are useful
    for cases where the API ordering of fragments is meaningful.  For example,
    applications may be able to execute programmable blending operations in
    the fragment shader, where the destination buffer is read via image loads
    and the final value is written via image stores.

New Procedures and Functions

    None.

New Tokens

    None.

Modifications to the OpenGL Shading Language Specification, Version 4.50

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_ARB_fragment_shader_interlock : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_ARB_fragment_shader_interlock           1

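    As a non-normative illustration, a shader might enable the extension and
    test the preprocessor define as sketched in the fragment below.  The
    choice of "enable" (rather than "require") and the contents of the two
    branches are assumptions made only for the sake of the example.

      #version 450
      #extension GL_ARB_fragment_shader_interlock : enable

      #ifdef GL_ARB_fragment_shader_interlock
          // beginInvocationInterlockARB() and endInvocationInterlockARB()
          // may be used in this shader.
      #else
          // The extension is unavailable; the application would need one of
          // the techniques listed in the Overview instead.
      #endif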

    Modify Section 4.4.1.3, Fragment Shader Inputs (p. 63)

    (add to the list of layout qualifiers containing "early_fragment_tests",
     p. 63, and modify the surrounding language to reflect that multiple
     layout qualifiers are supported on "in")

      layout-qualifier-id
        pixel_interlock_ordered
        pixel_interlock_unordered
        sample_interlock_ordered
        sample_interlock_unordered

    (add to the end of the section, p. 63)

    The identifiers "pixel_interlock_ordered", "pixel_interlock_unordered",
    "sample_interlock_ordered", and "sample_interlock_unordered" control the
    ordering of the execution of shader invocations between calls to the
    built-in functions beginInvocationInterlockARB() and
    endInvocationInterlockARB(), as described in section 8.13.3. A
    compile or link error will be generated if more than one of these layout
    qualifiers is specified in shader code. If a program containing a
    fragment shader includes none of these layout qualifiers, it is as
    though "pixel_interlock_ordered" were specified.
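
    As a non-normative illustration, the interlock mode is selected with a
    layout qualifier on "in" at global scope.  The declaration below assumes
    a shader that wants mutual exclusion without primitive ordering:

      layout(pixel_interlock_unordered) in;

    Adding a second, different interlock qualifier to the same shader code,
    for example

      layout(sample_interlock_ordered) in;   // conflicts with the above

    would be expected to produce the compile or link error described above.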

    Add to the end of Section 8.13, Fragment Processing Functions (p. 170)

    8.13.3, Fragment Shader Execution Ordering Functions

    By default, fragment shader invocations are generally executed in
    undefined order. Multiple fragment shader invocations may be executed
    concurrently, including multiple invocations corresponding to a single
    pixel. Additionally, fragment shader invocations for a single pixel might
    not be processed in the order in which the primitives generating the
    fragments were specified in the OpenGL API.

    The paired functions beginInvocationInterlockARB() and
    endInvocationInterlockARB() allow shaders to specify a critical section,
    inside which stronger execution ordering is guaranteed.  When using the
    "pixel_interlock_ordered" or "pixel_interlock_unordered" qualifier,
    ordering guarantees are provided for any pair of fragment shader
    invocations X and Y triggered by fragments A and B corresponding to the
    same pixel. When using the "sample_interlock_ordered" or
    "sample_interlock_unordered" qualifier, ordering guarantees are provided
    for any pair of fragment shader invocations X and Y triggered by fragments
    A and B that correspond to the same pixel, where at least one sample of
    the pixel is covered by both fragments. No ordering guarantees are
    provided for pairs of fragment shader invocations corresponding to
    different pixels. Additionally, no ordering guarantees are provided for
    pairs of fragment shader invocations corresponding to the same fragment.
    When multisampling is enabled and the framebuffer has sample buffers,
    multiple fragment shader invocations may result from a single fragment due
    to the use of the "sample" auxiliary storage qualifier, OpenGL API
    commands forcing multiple shader invocations per fragment, or for other
    implementation-dependent reasons.

    When using the "pixel_interlock_unordered" or "sample_interlock_unordered"
    qualifier, the interlock will ensure that the critical sections of
    fragment shader invocations X and Y with overlapping coverage will never
    execute concurrently. That is, invocation X is guaranteed to complete its
    call to endInvocationInterlockARB() before invocation Y completes its call
    to beginInvocationInterlockARB(), or vice versa.

    When using the "pixel_interlock_ordered" or "sample_interlock_ordered"
    layout qualifier, the critical sections of invocations X and Y with
    overlapping coverage will be executed in a specific order, based on the
    relative order assigned to their fragments A and B.  If fragment A is
    considered to precede fragment B, the critical section of invocation X is
    guaranteed to complete before the critical section of invocation Y begins.
    When a pair of fragments A and B have overlapping coverage, fragment A is
    considered to precede fragment B if

      * the OpenGL API command producing fragment A was called prior to the
        command producing B, or

      * the point, line, triangle, [[compatibility profile: quadrilateral,
        polygon,]] or patch primitive producing fragment A appears earlier in
        the same strip, loop, fan, or independent primitive list producing
        fragment B.

    When [[compatibility profile: decomposing quadrilateral or polygon
    primitives or]] tessellating a single patch primitive, multiple
    primitives may be generated in an undefined implementation-dependent
    order.  When fragments A and B are generated from such unordered
    primitives, their ordering is also implementation-dependent.

    If fragment shader invocation X completes its critical section before
    fragment shader invocation Y begins its critical section, all stores to
    memory performed in the critical section of invocation X using a pointer,
    image uniform, atomic counter uniform, or buffer variable qualified by
    "coherent" are guaranteed to be visible to any reads of the same types of
    variable performed in the critical section of invocation Y.
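
    The following non-normative sketch shows a critical section that relies
    on these guarantees to blend in the fragment shader.  The names
    "destImage" and "srcColor", the binding point, the rgba8 format, and the
    premultiplied-alpha "over" operator are assumptions chosen for
    illustration; they are not defined by this extension.

      #version 450
      #extension GL_ARB_fragment_shader_interlock : require

      // Ordered mode: critical sections run in primitive order per pixel.
      layout(pixel_interlock_ordered) in;

      // "coherent" makes stores from one invocation visible to the next.
      layout(binding = 0, rgba8) coherent uniform image2D destImage;

      in vec4 srcColor;   // hypothetical premultiplied source color

      void main()
      {
          ivec2 coord = ivec2(gl_FragCoord.xy);

          beginInvocationInterlockARB();
          // Read-modify-write of the destination; no other invocation for
          // this pixel executes its critical section concurrently.
          vec4 dst = imageLoad(destImage, coord);
          vec4 result = srcColor + dst * (1.0 - srcColor.a);
          imageStore(destImage, coord, result);
          endInvocationInterlockARB();
      }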

    If multisampling is disabled, or if the framebuffer does not include
    sample buffers, fragment coverage is computed per-pixel. In this case,
    the "sample_interlock_ordered" or "sample_interlock_unordered" layout
    qualifiers are treated as "pixel_interlock_ordered" or
    "pixel_interlock_unordered", respectively.

      Syntax:

        void beginInvocationInterlockARB(void);
        void endInvocationInterlockARB(void);

      Description:

    The functions beginInvocationInterlockARB() and
    endInvocationInterlockARB() may only be placed inside the function main()
    of a fragment shader and may not be called within any flow control.  These
    functions may not be called after a return statement in the function
    main(), but may be called after a discard statement.  A compile- or
    link-time error will be generated if main() calls either function more
    than once, contains a call to one function without a matching call to the
    other, or calls endInvocationInterlockARB() before calling
    beginInvocationInterlockARB().
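
    As a non-normative illustration of these placement rules, the sketch
    below shows one permitted arrangement, with forms that would be rejected
    listed in comments.  The names "hitCount" and "alpha", the binding point,
    and the threshold values are hypothetical.

      #version 450
      #extension GL_ARB_fragment_shader_interlock : require

      layout(pixel_interlock_unordered) in;
      layout(binding = 0, r32ui) coherent uniform uimage2D hitCount;

      in float alpha;

      void main()
      {
          if (alpha < 0.01)
              discard;                    // permitted before the calls

          beginInvocationInterlockARB();  // top level of main(), called once
          ivec2 coord = ivec2(gl_FragCoord.xy);
          uint n = imageLoad(hitCount, coord).x;
          imageStore(hitCount, coord, uvec4(n + 1u, 0u, 0u, 0u));
          endInvocationInterlockARB();    // matching call, after "begin"

          // Not permitted (each would generate an error):
          //   if (alpha > 0.5) beginInvocationInterlockARB();  // flow control
          //   calling either function from a function other than main()
          //   a second call to beginInvocationInterlockARB() in main()
      }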

Additions to the AGL/GLX/WGL Specifications

    None.

Errors

    None.

New State

    None.

New Implementation Dependent State

    None.

Issues

    (1) When using multisampling, the OpenGL specification permits
        multiple fragment shader invocations to be generated for a single
        fragment.  For example, per-sample shading using the "sample"
        auxiliary storage qualifier or the MinSampleShading() OpenGL API command
        can be used to force per-sample shading.  What execution ordering
        guarantees are provided between fragment shader invocations generated
        from the same fragment?

      RESOLVED:  We don't provide any ordering guarantees in this extension.
      This implies that when using multisampling, there is no guarantee that
      two fragment shader invocations for the same fragment won't be executing
      their critical sections concurrently.  This could cause problems for
      algorithms sharing data structures between all the samples of a pixel
      unless accesses to these data structures are performed atomically.

      When using per-sample shading, the interlock we provide *does* guarantee
      that no two invocations corresponding to the same sample execute the
      critical section concurrently.  If a separate set of data structures is
      provided for each sample, no conflicts should occur within the critical
      section.

      Note that in addition to the per-sample shading options in the shading
      language and API, implementations may provide multisample antialiasing
      modes where the implementation can't simply run the fragment shader once
      and broadcast results to a large set of covered samples.
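
      As a non-normative sketch of the "separate set of data structures for
      each sample" approach, the shader below keeps one texel per sample in a
      multisample image and indexes it with gl_SampleID.  The names
      "sampleColors" and "srcColor", the binding point, the format, and the
      blend operator are assumptions made for illustration.

        #version 450
        #extension GL_ARB_fragment_shader_interlock : require

        layout(sample_interlock_ordered) in;
        layout(binding = 0, rgba8) coherent uniform image2DMS sampleColors;

        in vec4 srcColor;   // hypothetical premultiplied source color

        void main()
        {
            ivec2 coord = ivec2(gl_FragCoord.xy);

            beginInvocationInterlockARB();
            // Any static use of gl_SampleID causes per-sample evaluation,
            // so each invocation touches only its own sample's texel.
            vec4 dst = imageLoad(sampleColors, coord, gl_SampleID);
            vec4 result = srcColor + dst * (1.0 - srcColor.a);
            imageStore(sampleColors, coord, gl_SampleID, result);
            endInvocationInterlockARB();
        }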

    (2) What performance differences are expected between shaders using the
       "pixel" and "sample" layout qualifier variants in this extension (e.g.,
       "pixel_interlock_ordered" and "sample_interlock_ordered")?

      RESOLVED:  We expect that shaders using "sample" qualifiers may have
      higher performance, since the implementation need not order pairs of
      fragments that touch the same pixel with "complementary" coverage.  Such
      situations are fairly common:  when two adjacent triangles combine to
      cover a given pixel, two fragments will be generated for the pixel but
      no sample will be covered by both.  When using "sample" qualifiers, the
      invocations for both fragments can run concurrently.  When using "pixel"
      qualifiers, the critical section for one fragment must wait until the
      critical section for the other fragment completes.

    (3) What performance differences are expected between shaders using the
       "ordered" and "unordered" layout qualifier variants in this extension
       (e.g., "pixel_interlock_ordered" and "pixel_interlock_unordered")?

      RESOLVED:  We expect that shaders using "unordered" may have higher
      performance, since the critical section implementation doesn't need to
      ensure that all previous invocations with overlapping coverage have
      completed their critical sections.  Some algorithms (e.g., building data
      structures in order-independent transparency algorithms) will require
      mutual exclusion when updating per-pixel data structures, but do not
      require that shaders execute in a specific ordering.
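
      As a non-normative sketch of such an algorithm, the shader below
      appends each fragment's color to a small per-pixel array and needs
      only mutual exclusion, not ordering.  The names "fragCount",
      "fragColors", "color", and "MAX_FRAGS", the binding points, and the
      formats are hypothetical, and the array texture is assumed to have at
      least MAX_FRAGS layers.

        #version 450
        #extension GL_ARB_fragment_shader_interlock : require

        layout(pixel_interlock_unordered) in;
        layout(binding = 0, r32ui) coherent uniform uimage2D fragCount;
        layout(binding = 1, rgba16f) coherent uniform image2DArray fragColors;

        const uint MAX_FRAGS = 8u;

        in vec4 color;

        void main()
        {
            ivec2 coord = ivec2(gl_FragCoord.xy);

            beginInvocationInterlockARB();
            // Flow control inside the critical section is fine; only the
            // begin/end calls themselves must be outside flow control.
            uint n = imageLoad(fragCount, coord).x;
            if (n < MAX_FRAGS) {
                imageStore(fragColors, ivec3(coord, int(n)), color);
                imageStore(fragCount, coord, uvec4(n + 1u, 0u, 0u, 0u));
            }
            endInvocationInterlockARB();
        }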

    (4) Are fragment shaders using this extension allowed to write outputs?
        If so, is there any guarantee on the order in which such outputs are
        written to the framebuffer?

      RESOLVED:  Yes, fragment shaders with critical sections may still write
      outputs.  If fragment shader outputs are written, they are stored or
      blended into the framebuffer in API order, as is the case for fragment
      shaders not using this extension.

    (5) What considerations apply when using this extension to implement a
        programmable form of conventional blending using image stores?

      RESOLVED:  Per-fragment operations performed in the pipeline following
      fragment shader execution obviously have no effect on image stores
      executing during fragment shader execution.  In particular, multisample
      operations such as broadcasting a single fragment output to multiple
      samples or modifying the coverage with alpha-to-coverage or a shader
      coverage mask output value have no effect.  Fragments cannot be killed
      before fragment shader blending using the fixed-function alpha test or
      using the depth test with a Z value produced by the shader.  Fragments
      will normally not be killed by fixed-function depth or stencil tests,
      but those tests can be enabled before fragment shader invocations using
      the layout qualifier "early_fragment_tests".  Any required
      fixed-function features that need to be handled before programmable
      blending that aren't enabled by "early_fragment_tests" would need to be
      emulated in the shader.

      Note also that performing blend computations in the shader is not
      guaranteed to produce results that are bit-identical to those produced
      by fixed-function blending hardware, even if mathematically equivalent
      algorithms are used.
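
      As a non-normative sketch, a shader performing programmable blending
      might request early depth and stencil testing alongside the interlock
      mode using two separate declarations on "in".  Whether early tests are
      appropriate depends on the application and is an assumption here.

        layout(early_fragment_tests) in;     // depth/stencil tests run first
        layout(pixel_interlock_ordered) in;  // critical sections in API order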

    (6) For operations accessing shared per-pixel data structures in the
        critical section, what operations (if any) must be performed in shader
        code to ensure that stores from one shader invocation are visible to
        the next?

      RESOLVED:  The "coherent" qualifier is required in the declaration of
      the shared data structures to ensure that writes performed by one
      invocation are visible to reads performed by another invocation.

      In shaders that don't use the interlock, "coherent" is not sufficient as
      there is no guarantee of the ordering of fragment shader invocations --
      even if invocation A can see the values written by another invocation B,
      there is no general guarantee about whether invocation A's read will be
      performed before or after invocation B's write.  The built-in function
      memoryBarrier() can be used to generate a weak ordering by which threads
      can communicate, but it doesn't order memory transactions between two
      separate invocations.  With the interlock, execution ordering between
      two threads from the same pixel is well-defined as long as the loads and
      stores are performed inside the critical section, and the use of
      "coherent" ensures that stores done by one invocation are visible to
      other invocations.

    (7) Should we provide an explicit mechanism for shaders to indicate a
        critical section?  Or should we just automatically infer a critical
        section by analyzing shader code?  Or should we just wrap the entire
        fragment shader in a critical section?

      RESOLVED:  Provide an explicit critical section.

      We definitely don't want to wrap the entire shader in a critical section
      when a smaller section will suffice.  Doing so would hold off the
      execution of any other fragment shader invocation with the same (x,y)
      for the entire (potentially long) life of the fragment shader.  Hardware
      would need to track a large number of fragments awaiting execution, and
      may be so backed up that further fragments will be blocked even if they
      don't overlap with any fragments currently executing.  Providing a
      smaller critical section reduces the amount of time other fragments are
      blocked and allows implementations to perform useful work for
      conflicting fragments before they hit the critical section.

      While a compiler could analyze the code and wrap a critical section
      around all memory accesses, it may be difficult to determine which
      accesses actually require mutual exclusion and ordering, and which
      accesses are safe to do with no protection.  Requiring shaders to
      explicitly identify a critical section doesn't seem overwhelmingly
      burdensome, and allows applications to exclude memory accesses that they
      know to be "safe".

    (8) What restrictions should be imposed on the use of the
        beginInvocationInterlockARB() and endInvocationInterlockARB() functions
        delimiting a critical section?

      RESOLVED:  We impose restrictions similar to those on the barrier()
      built-in function in tessellation control shaders to ensure that any
      shader using this functionality has a single critical section that can
      be easily identified during compilation.  In particular, we require that
      these functions be called in main() and don't permit them to be called
      in conditional flow control.

      These restrictions ensure that there is always exactly one call to the
      "begin" and "end" functions in a predictable location in the compiled
      shader code, and ensure that the compiler and hardware don't have to
      deal with unusual cases (like entering a critical section and never
      leaving, leaving a critical section without entering it, or trying to
      enter a critical section more than once).

Revision History

    Rev.    Date    Author       Changes
    ----  --------  --------     -----------------------------------------
     1    04/01/15  S.Grajewski  Initial version merging
                                 INTEL_fragment_shader_ordering with
                                 NV_fragment_shader_interlock

     2    05/07/15  S.Grajewski  Built-in functions
                                 beginInvocationInterlockARB() and
                                 endInvocationInterlockARB() now have ARB
                                 suffixes.