Name

    NV_fragment_shader_interlock

Name Strings

    GL_NV_fragment_shader_interlock

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Contributors

    Jeff Bolz, NVIDIA Corporation
    Mathias Heyer, NVIDIA Corporation

Status

    Shipping

Version

    Last Modified Date:         March 27, 2015
    NVIDIA Revision:            2

Number

    OpenGL Extension #468
    OpenGL ES Extension #230

Dependencies

    This extension is written against the OpenGL 4.3 Specification
    (Compatibility Profile, dated February 14, 2013) and the OpenGL ES
    3.1.0 Specification (dated March 17, 2014).

    This extension is written against the OpenGL Shading Language
    Specification (version 4.30, revision 8) and the OpenGL ES Shading
    Language Specification (version 3.10, revision 2).

    OpenGL 4.3 and GLSL 4.30 are required in an OpenGL implementation.
    OpenGL ES 3.1 and GLSL ES 3.10 are required in an OpenGL ES implementation.

    This extension interacts with NV_shader_buffer_load and
    NV_shader_buffer_store.

    This extension interacts with NV_gpu_program4 and NV_gpu_program5.

    This extension interacts with EXT_tessellation_shader.

    This extension interacts with OES_sample_shading.

    This extension interacts with OES_shader_multisample_interpolation.

    This extension interacts with OES_shader_image_atomic.

Overview

    In unextended OpenGL 4.3 or OpenGL ES 3.1, applications may produce a
    large number of fragment shader invocations that perform loads and
    stores to memory using image uniforms, atomic counter uniforms,
    buffer variables, or pointers. The order in which loads and stores
    to common addresses are performed by different fragment shader
    invocations is largely undefined.  For algorithms that use shader
    writes and touch the same pixels more than once, one or more of the
    following techniques may be required to ensure proper execution ordering:

      * inserting Finish or WaitSync commands to drain the pipeline between
        different "passes" or "layers";

      * using only atomic memory operations to write to shader memory (which
        may be relatively slow and limits how memory may be updated); or

      * injecting spin loops into shaders to prevent multiple shader
        invocations from touching the same memory concurrently.

    This extension provides new GLSL built-in functions
    beginInvocationInterlockNV() and endInvocationInterlockNV() that delimit a
    critical section of fragment shader code.  For pairs of shader invocations
    with "overlapping" coverage in a given pixel, the OpenGL implementation
    will guarantee that the critical section of the fragment shader will be
    executed for only one fragment at a time.

    There are four different interlock modes supported by this extension,
    which are identified by layout qualifiers.  The qualifiers
    "pixel_interlock_ordered" and "pixel_interlock_unordered" provide mutual
    exclusion in the critical section for any pair of fragments corresponding
    to the same pixel.  When using multisampling, the qualifiers
    "sample_interlock_ordered" and "sample_interlock_unordered" only provide
    mutual exclusion for pairs of fragments that both cover at least one
    common sample in the same pixel; these are recommended for performance if
    shaders use per-sample data structures.

    Additionally, when the "pixel_interlock_ordered" or
    "sample_interlock_ordered" layout qualifier is used, the interlock also
    guarantees that the critical section for multiple shader invocations with
    "overlapping" coverage will be executed in the order in which the
    primitives were processed by the GL.  Such a guarantee is useful for
    applications like blending in the fragment shader, where an application
    requires fragment values to be composited in the framebuffer in
    primitive order.

    This extension can be useful for algorithms that need to access per-pixel
    data structures via shader loads and stores.  Such algorithms using this
    extension can access such data structures in the critical section without
    worrying about other invocations for the same pixel accessing the data
    structures concurrently.  Additionally, the ordering guarantees are useful
    for cases where the API ordering of fragments is meaningful.  For example,
    applications may be able to execute programmable blending operations in
    the fragment shader, where the destination buffer is read via image loads
    and the final value is written via image stores.
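
    The programmable blending case described above might look like the
    following fragment shader.  This is a minimal sketch for illustration
    only and is not part of the extension language.  The image name
    "dstImage", its binding and format, and the input "srcColor" are
    assumptions made for this example.

      #version 430
      #extension GL_NV_fragment_shader_interlock : require

      // Critical sections for the same pixel run in primitive order.
      layout(pixel_interlock_ordered) in;

      // Hypothetical destination image; binding and format are assumed.
      layout(binding = 0, rgba8) coherent uniform image2D dstImage;

      in vec4 srcColor;   // hypothetical non-premultiplied source color

      void main()
      {
          ivec2 coord = ivec2(gl_FragCoord.xy);

          beginInvocationInterlockNV();
          // Read-modify-write of the destination, protected from other
          // invocations that touch the same pixel.
          vec4 dst = imageLoad(dstImage, coord);
          vec3 rgb = mix(dst.rgb, srcColor.rgb, srcColor.a);
          float a  = srcColor.a + dst.a * (1.0 - srcColor.a);
          imageStore(dstImage, coord, vec4(rgb, a));
          endInvocationInterlockNV();
      }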

New Procedures and Functions

    None.

New Tokens

    None.

Modifications to the OpenGL 4.3 Specification (Compatibility Profile)

    None.

Modifications to the OpenGL Shading Language Specification, Version 4.30

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_NV_fragment_shader_interlock : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_NV_fragment_shader_interlock           1


    Modify Section 4.4.1.3, Fragment Shader Inputs (p. 58)

    (add to the list of layout qualifiers containing "early_fragment_tests",
     p. 59, and modify the surrounding language to reflect that multiple
     layout qualifiers are supported on "in")

      layout-qualifier-id
        pixel_interlock_ordered
        pixel_interlock_unordered
        sample_interlock_ordered
        sample_interlock_unordered

    (add to the end of the section, p. 59)

    The identifiers "pixel_interlock_ordered", "pixel_interlock_unordered",
    "sample_interlock_ordered", and "sample_interlock_unordered" control the
    ordering of the execution of shader invocations between calls to the
    built-in functions beginInvocationInterlockNV() and
    endInvocationInterlockNV(), as described in section 8.13.3. A
    compile or link error will be generated if more than one of these layout
    qualifiers is specified in shader code. If a program containing a
    fragment shader includes none of these layout qualifiers, it is as
    though "pixel_interlock_ordered" were specified.
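
    As a short illustrative sketch (not normative), a fragment shader could
    select an interlock mode as follows.  The choice of
    "sample_interlock_unordered" and the pairing with "early_fragment_tests"
    are arbitrary choices made for this example.

      #version 430
      #extension GL_NV_fragment_shader_interlock : require

      // Exactly one interlock qualifier may be specified.
      layout(sample_interlock_unordered) in;

      // Other "in" layout qualifiers may be declared alongside it.
      layout(early_fragment_tests) in;

      // Declaring a second interlock qualifier, for example
      //   layout(pixel_interlock_ordered) in;
      // would produce a compile or link error.

      void main()
      {
          beginInvocationInterlockNV();
          endInvocationInterlockNV();
      }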

    Add to the end of Section 8.13, Fragment Processing Functions (p. 168)

    8.13.3, Fragment Shader Execution Ordering Functions

    By default, fragment shader invocations are generally executed in
    undefined order. Multiple fragment shader invocations may be executed
    concurrently, including multiple invocations corresponding to a single
    pixel. Additionally, fragment shader invocations for a single pixel might
    not be processed in the order in which the primitives generating the
    fragments were specified in the OpenGL API.

    The paired functions beginInvocationInterlockNV() and
    endInvocationInterlockNV() allow shaders to specify a critical section,
    inside which stronger execution ordering is guaranteed.  When using the
    "pixel_interlock_ordered" or "pixel_interlock_unordered" qualifier,
    ordering guarantees are provided for any pair of fragment shader
    invocations X and Y triggered by fragments A and B corresponding to the
    same pixel. When using the "sample_interlock_ordered" or
    "sample_interlock_unordered" qualifier, ordering guarantees are provided
    for any pair of fragment shader invocations X and Y triggered by fragments
    A and B that correspond to the same pixel, where at least one sample of
    the pixel is covered by both fragments. No ordering guarantees are
    provided for pairs of fragment shader invocations corresponding to
    different pixels. Additionally, no ordering guarantees are provided for
    pairs of fragment shader invocations corresponding to the same fragment.
    When multisampling is enabled and the framebuffer has sample buffers,
    multiple fragment shader invocations may result from a single fragment due
    to the use of the "sample" auxiliary storage qualifier, OpenGL API
    commands forcing multiple shader invocations per fragment, or for other
    implementation-dependent reasons.

    When using the "pixel_interlock_unordered" or "sample_interlock_unordered"
    qualifier, the interlock will ensure that the critical sections of
    fragment shader invocations X and Y with overlapping coverage will never
    execute concurrently. That is, invocation X is guaranteed to complete its
    call to endInvocationInterlockNV() before invocation Y completes its call
    to beginInvocationInterlockNV(), or vice versa.

    When using the "pixel_interlock_ordered" or "sample_interlock_ordered"
    layout qualifier, the critical sections of invocations X and Y with
    overlapping coverage will be executed in a specific order, based on the
    relative order assigned to their fragments A and B.  If fragment A is
    considered to precede fragment B, the critical section of invocation X is
    guaranteed to complete before the critical section of invocation Y begins.
    When a pair of fragments A and B have overlapping coverage, fragment A is
    considered to precede fragment B if

      * the OpenGL API command producing fragment A was called prior to the
        command producing B, or

      * the point, line, triangle, [[compatibility profile: quadrilateral,
        polygon,]] or patch primitive producing fragment A appears earlier in
        the same strip, loop, fan, or independent primitive list producing
        fragment B.

    When [[compatibility profile: decomposing quadrilateral or polygon
    primitives or]] tessellating a single patch primitive, multiple
    primitives may be generated in an undefined implementation-dependent
    order.  When fragments A and B are generated from such unordered
    primitives, their ordering is also implementation-dependent.

    If fragment shader X completes its critical section before fragment shader
    Y begins its critical section, all stores to memory performed in the
    critical section of invocation X using a pointer, image uniform, atomic
    counter uniform, or buffer variable qualified by "coherent" are guaranteed
    to be visible to any reads of the same types of variable performed in the
    critical section of invocation Y.
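
    The visibility rule above can be illustrated with a buffer variable.
    The following is a sketch under stated assumptions, not normative
    language:  the block name "PixelData", the array "hits", and the uniform
    "viewportWidth" are hypothetical, and the application is assumed to size
    the buffer with one entry per pixel and to render without multiple
    invocations per fragment.

      #version 430
      #extension GL_NV_fragment_shader_interlock : require

      layout(pixel_interlock_unordered) in;

      uniform int viewportWidth;   // hypothetical; set by the application

      // "coherent" is what makes stores from one critical section visible
      // to loads in a later critical section for the same pixel.
      layout(std430, binding = 0) coherent buffer PixelData {
          uint hits[];             // hypothetical: one counter per pixel
      };

      void main()
      {
          uint index = uint(gl_FragCoord.y) * uint(viewportWidth)
                     + uint(gl_FragCoord.x);

          beginInvocationInterlockNV();
          // Non-atomic read-modify-write; safe only because invocations
          // for this pixel cannot execute this section concurrently.
          hits[index] = hits[index] + 1u;
          endInvocationInterlockNV();
      }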

    If multisampling is disabled, or if the framebuffer does not include
    sample buffers, fragment coverage is computed per-pixel. In this case,
    the "sample_interlock_ordered" or "sample_interlock_unordered" layout
    qualifiers are treated as "pixel_interlock_ordered" or
    "pixel_interlock_unordered", respectively.


      Syntax:

        void beginInvocationInterlockNV(void);
        void endInvocationInterlockNV(void);

      Description:

    The beginInvocationInterlockNV() and endInvocationInterlockNV() functions
    may only be placed inside the function main() of a fragment shader and
    may not be called within any flow control.  These functions may not be
    called after a return statement in the function main(), but may be called
    after a discard statement.  A compile- or link-time error will be
    generated if main() calls either function more than once, contains a call
    to one function without a matching call to the other, or calls
    endInvocationInterlockNV() before calling beginInvocationInterlockNV().
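
    The placement rules above are summarized in the following sketch, which
    is illustrative only.  The image "coverageCount" and the input "opacity"
    are hypothetical names introduced for this example.

      #version 430
      #extension GL_NV_fragment_shader_interlock : require

      layout(pixel_interlock_ordered) in;

      // Hypothetical per-pixel counter image.
      layout(binding = 0, r32ui) coherent uniform uimage2D coverageCount;

      in float opacity;   // hypothetical input

      void main()
      {
          // A discard statement may precede the interlock calls.
          if (opacity <= 0.0) {
              discard;
          }

          ivec2 coord = ivec2(gl_FragCoord.xy);

          // Each function is called exactly once, at the top level of
          // main(), with the begin call first.
          beginInvocationInterlockNV();
          uint n = imageLoad(coverageCount, coord).x;
          imageStore(coverageCount, coord, uvec4(n + 1u));
          endInvocationInterlockNV();

          // Placements that the rules above do not permit:
          //   - either call inside an if, loop, or switch body
          //   - either call inside a function other than main()
          //   - either call after a return statement in main()
          //   - a second call to either function, or an unmatched call
          //   - endInvocationInterlockNV() before
          //     beginInvocationInterlockNV()
      }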

Additions to the AGL/GLX/WGL Specifications

    None.

Errors

    None.

New State

    None.

New Implementation Dependent State

    None.

Interactions with OpenGL ES 3.1

    Disabling multisample rasterization is not available on OpenGL ES;
    it is always enabled.


Dependencies on EXT_tessellation_shader

    If this extension is implemented on OpenGL ES and EXT_tessellation_shader
    is not supported, remove language referring to tessellation of patch
    primitives.


Dependencies on OES_sample_shading

    If this extension is implemented on OpenGL ES and OES_sample_shading
    is not supported, remove references to per-sample shading via
    MinSampleShading[OES]().


Dependencies on OES_shader_image_atomic

    If this extension is implemented on OpenGL ES and OES_shader_image_atomic
    is not supported, disregard language referring to atomic memory operations.


Dependencies on OES_shader_multisample_interpolation

    If this extension is implemented on OpenGL ES and
    OES_shader_multisample_interpolation is not supported, ignore language
    about the "sample" auxiliary storage qualifier.


Dependencies on NV_shader_buffer_load and NV_shader_buffer_store

    If NV_shader_buffer_load and NV_shader_buffer_store are not supported,
    references to ordering memory accesses using pointers should be deleted.


Dependencies on NV_gpu_program4 and NV_fragment_program4

    Modify Section 2.X.2, Program Grammar, of the NV_fragment_program4
    specification (which modifies the NV_gpu_program4 base grammar)

      <SpecialInstruction>    ::= "FSIB"
                                | "FSIE"


    Modify Section 2.X.4, Program Execution Environment

    (add to the opcode table)

                  Modifiers
      Instruction F I C S H D  Out Inputs    Description
      ----------- - - - - - -  --- --------  --------------------------------
      FSIB        - - - - - -  -   -         begin fragment shader interlock
      FSIE        - - - - - -  -   -         end fragment shader interlock


    Modify Section 2.X.6, Program Options

    + Fragment Shader Interlock (NV_pixel_interlock_ordered,
      NV_pixel_interlock_unordered, NV_sample_interlock_ordered, and
      NV_sample_interlock_unordered)

    If a fragment program specifies the "NV_pixel_interlock_ordered",
    "NV_pixel_interlock_unordered", "NV_sample_interlock_ordered", or
    "NV_sample_interlock_unordered" option, it will configure a critical
    section using the FSIB (fragment shader interlock begin) and FSIE
    (fragment shader interlock end) opcodes.  The execution of the critical
    sections will be ordered for pairs of program invocations corresponding to
    the same pixel, as described in Section 8.13.3 of the OpenGL Shading
    Language Specification, where the four options are considered to specify
    layout qualifiers with names matching the program options.

    A program will fail to load if it specifies more than one of these program
    options, if it specifies exactly one of these options but does not contain
    exactly one FSIB instruction and one FSIE instruction, or if it contains
    an FSIB or FSIE instruction without specifying any of these options.


    Add the following subsections to section 2.X.8, Program Instruction Set


    Section 2.X.8.Z, FSIB:  Fragment Shader Interlock Begin

    The FSIB instruction specifies the beginning of a critical section in a
    fragment program, where execution of the critical section is ordered
    relative to other fragments.  This instruction has no other effect.

    The FSIB instruction is not allowed in arbitrary locations in a program.
    A program will fail to load if it includes an FSIB instruction inside an
    IF/ELSE/ENDIF block, inside a REP/ENDREP block, or inside any subroutine
    block other than the one labeled "main".  Additionally, a program will
    fail to load if it contains more than one FSIB instruction, or if its one
    FSIB instruction is not followed by an FSIE instruction.

    FSIB has no operands and generates no result.


    Section 2.X.8.Z, FSIE:  Fragment Shader Interlock End

    The FSIE instruction specifies the end of a critical section in a fragment
    program, where execution of the critical section is ordered relative to
    other fragments.  This instruction has no other effect.

    The FSIE instruction is not allowed in arbitrary locations in a program.
    A program will fail to load if it includes an FSIE instruction inside an
    IF/ELSE/ENDIF block, inside a REP/ENDREP block, or inside any subroutine
    block other than the one labeled "main".  Additionally, a program will
    fail to load if it contains more than one FSIE instruction, or if its one
    FSIE instruction is not preceded by an FSIB instruction.

    FSIE has no operands and generates no result.

Issues

    (1) What should this extension be called?

      RESOLVED:  NV_fragment_shader_interlock.  The
      beginInvocationInterlockNV() and endInvocationInterlockNV() commands
      identify a critical section during which other invocations with
      overlapping coverage are locked out until the critical section
      completes.

    (2) When using multisampling, the OpenGL specification permits
        multiple fragment shader invocations to be generated for a single
        fragment.  For example, per-sample shading using the "sample"
        auxiliary storage qualifier or the MinSampleShading() OpenGL API command
        can be used to force per-sample shading.  What execution ordering
        guarantees are provided between fragment shader invocations generated
        from the same fragment?

      RESOLVED:  We don't provide any ordering guarantees in this extension.
      This implies that when using multisampling, there is no guarantee that
      two fragment shader invocations for the same fragment won't be executing
      their critical sections concurrently.  This could cause problems for
      algorithms sharing data structures between all the samples of a pixel
      unless accesses to these data structures are performed atomically.

      When using per-sample shading, the interlock we provide *does* guarantee
      that no two invocations corresponding to the same sample execute the
      critical section concurrently.  If a separate set of data structures is
      provided for each sample, no conflicts should occur within the critical
      section.
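
      A per-sample data structure of the kind described above might be
      sketched as follows.  This is an illustration under assumptions, not
      part of the resolution:  the multisample image "sampleColors" and the
      input "srcColor" are hypothetical, and the shader targets GLSL 4.30,
      where image2DMS is available.

        #version 430
        #extension GL_NV_fragment_shader_interlock : require

        layout(sample_interlock_ordered) in;

        // Hypothetical image holding one texel per sample.
        layout(binding = 0, rgba8) coherent uniform image2DMS sampleColors;

        in vec4 srcColor;   // hypothetical premultiplied source color

        void main()
        {
            ivec2 coord = ivec2(gl_FragCoord.xy);

            // Any static use of gl_SampleID causes the shader to be
            // evaluated per sample.
            int s = gl_SampleID;

            beginInvocationInterlockNV();
            // Each invocation touches only the storage for its own sample,
            // so invocations from the same fragment do not conflict.
            vec4 dst = imageLoad(sampleColors, coord, s);
            imageStore(sampleColors, coord, s,
                       srcColor + dst * (1.0 - srcColor.a));
            endInvocationInterlockNV();
        }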

      Note that in addition to the per-sample shading options in the shading
      language and API, implementations may provide multisample antialiasing
      modes where the implementation can't simply run the fragment shader once
      and broadcast results to a large set of covered samples.

    (3) What performance differences are expected between shaders using the
       "pixel" and "sample" layout qualifier variants in this extension (e.g.,
       "pixel_interlock_ordered" and "sample_interlock_ordered")?

      RESOLVED:  We expect that shaders using "sample" qualifiers may have
      higher performance, since the implementation need not order pairs of
      fragments that touch the same pixel with "complementary" coverage.  Such
      situations are fairly common:  when two adjacent triangles combine to
      cover a given pixel, two fragments will be generated for the pixel but
      no sample will be covered by both.  When using "sample" qualifiers, the
      invocations for both fragments can run concurrently.  When using "pixel"
      qualifiers, the critical section for one fragment must wait until the
      critical section for the other fragment completes.

    (4) What performance differences are expected between shaders using the
       "ordered" and "unordered" layout qualifier variants in this extension
       (e.g., "pixel_interlock_ordered" and "pixel_interlock_unordered")?

      RESOLVED:  We expect that shaders using "unordered" may have higher
      performance, since the critical section implementation doesn't need to
      ensure that all previous invocations with overlapping coverage have
      completed their critical sections.  Some algorithms (e.g., building data
      structures in order-independent transparency algorithms) will require
      mutual exclusion when updating per-pixel data structures, but do not
      require that shaders execute in a specific ordering.
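
      As a rough sketch of the order-independent transparency case mentioned
      above, a shader could append fragments to a fixed-capacity per-pixel
      list using only mutual exclusion.  Everything named here is an
      assumption made for illustration:  the capacity MAX_FRAGMENTS, the
      image "fragmentCount", the block "FragmentLists", the struct
      "FragmentNode", the uniform "viewportWidth", and the input "srcColor".
      The sketch also assumes one invocation per fragment.

        #version 430
        #extension GL_NV_fragment_shader_interlock : require

        // Mutual exclusion only; append order does not matter because a
        // later pass is assumed to sort each list.
        layout(pixel_interlock_unordered) in;

        const uint MAX_FRAGMENTS = 8u;   // hypothetical list capacity

        uniform int viewportWidth;       // hypothetical

        struct FragmentNode {
            vec4  color;
            float depth;
        };

        layout(binding = 0, r32ui) coherent uniform uimage2D fragmentCount;

        layout(std430, binding = 1) coherent buffer FragmentLists {
            FragmentNode nodes[];        // MAX_FRAGMENTS entries per pixel
        };

        in vec4 srcColor;                // hypothetical

        void main()
        {
            ivec2 coord = ivec2(gl_FragCoord.xy);
            uint base = (uint(coord.y) * uint(viewportWidth)
                         + uint(coord.x)) * MAX_FRAGMENTS;

            beginInvocationInterlockNV();
            uint count = imageLoad(fragmentCount, coord).x;
            if (count < MAX_FRAGMENTS) {
                nodes[base + count].color = srcColor;
                nodes[base + count].depth = gl_FragCoord.z;
                imageStore(fragmentCount, coord, uvec4(count + 1u));
            }
            endInvocationInterlockNV();
        }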
4475bd8deadSopenharmony_ci
4485bd8deadSopenharmony_ci    (5) Are fragment shaders using this extension allowed to write outputs?
4495bd8deadSopenharmony_ci        If so, is there any guarantee on the order in which such outputs are
4505bd8deadSopenharmony_ci        written to the framebuffer?
4515bd8deadSopenharmony_ci
4525bd8deadSopenharmony_ci      RESOLVED:  Yes, fragment shaders with critical sections may still write
4535bd8deadSopenharmony_ci      outputs.  If fragment shader outputs are written, they are stored or
4545bd8deadSopenharmony_ci      blended into the framebuffer in API order, as is the case for fragment
4555bd8deadSopenharmony_ci      shaders not using this extension.

    (6) What considerations apply when using this extension to implement a
        programmable form of conventional blending using image stores?

      RESOLVED:  Per-fragment operations performed in the pipeline following
      fragment shader execution have no effect on image stores executing
      during fragment shader execution.  In particular, multisample
      operations such as broadcasting a single fragment output to multiple
      samples or modifying the coverage with alpha-to-coverage or a shader
      coverage mask output value have no effect.  Fragments cannot be killed
      before fragment shader blending using the fixed-function alpha test or
      using the depth test with a Z value produced by the shader.  Fragments
      will normally not be killed by fixed-function depth or stencil tests,
      but those tests can be enabled before fragment shader invocations using
      the layout qualifier "early_fragment_tests".  Any required
      fixed-function features that need to be handled before programmable
      blending that aren't enabled by "early_fragment_tests" would need to be
      emulated in the shader.

      Note also that blend computations performed in the shader are not
      guaranteed to produce results that are bit-identical to those produced
      by fixed-function blending hardware, even if mathematically equivalent
      algorithms are used.
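
      The considerations above can be sketched as follows.  This is only an
      illustrative example, assuming a non-multisampled rgba8 image named
      colorImage bound in place of a color attachment and an input named
      vColor; neither name comes from this extension.  The blend equation
      is intended to match, mathematically, fixed-function blending with
      source factor SRC_ALPHA and destination factor ONE_MINUS_SRC_ALPHA.

```glsl
#version 430
#extension GL_NV_fragment_shader_interlock : require

// Run the fixed-function depth and stencil tests before the shader so
// that failing fragments never reach the image store.
layout(early_fragment_tests) in;

// Blend in primitive order, as fixed-function blending would.
layout(pixel_interlock_ordered) in;

// Hypothetical image standing in for the color buffer.
layout(binding = 0, rgba8) coherent uniform image2D colorImage;

in vec4 vColor;   // hypothetical input from the previous stage

void main()
{
    ivec2 p = ivec2(gl_FragCoord.xy);

    beginInvocationInterlockNV();
    vec4 dst = imageLoad(colorImage, p);
    // src * src.a + dst * (1 - src.a), applied to all four components.
    vec4 blended = mix(dst, vColor, vColor.a);
    imageStore(colorImage, p, blended);
    endInvocationInterlockNV();
}
```

      Features such as alpha test or alpha-to-coverage are not applied to
      the image store in this sketch and would have to be emulated in the
      shader if an application relies on them.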

    (7) For operations accessing shared per-pixel data structures in the
        critical section, what operations (if any) must be performed in shader
        code to ensure that stores from one shader invocation are visible to
        the next?

      RESOLVED:  The "coherent" qualifier is required in the declaration of
      the shared data structures to ensure that writes performed by one
      invocation are visible to reads performed by another invocation.

      In shaders that don't use the interlock, "coherent" is not sufficient as
      there is no guarantee of the ordering of fragment shader invocations --
      even if invocation A can see the values written by another invocation B,
      there is no general guarantee about whether invocation A's read will be
      performed before or after invocation B's write.  The built-in function
      memoryBarrier() can be used to generate a weak ordering by which threads
      can communicate, but it doesn't order memory transactions between two
      separate invocations.  With the interlock, execution ordering between
      two threads from the same pixel is well-defined as long as the loads and
      stores are performed inside the critical section, and the use of
      "coherent" ensures that stores done by one invocation are visible to
      other invocations.
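
      A minimal sketch of the combination described above, with "coherent"
      on the declaration and all loads and stores inside the critical
      section, might look like the following.  The image name nearestDepth
      is an assumption for this example, and the image is assumed to have
      been cleared to 1.0 before rendering.

```glsl
#version 430
#extension GL_NV_fragment_shader_interlock : require

// Taking a minimum does not depend on primitive order, so mutual
// exclusion alone is requested.
layout(pixel_interlock_unordered) in;

// "coherent" makes a store by one invocation visible to a later load by
// another invocation; the interlock supplies the execution ordering.
layout(binding = 0, r32f) coherent uniform image2D nearestDepth;  // hypothetical

void main()
{
    ivec2 p = ivec2(gl_FragCoord.xy);

    beginInvocationInterlockNV();
    // Both the load and the store sit inside the critical section.
    float prev = imageLoad(nearestDepth, p).x;
    imageStore(nearestDepth, p, vec4(min(prev, gl_FragCoord.z)));
    endInvocationInterlockNV();
}
```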

    (8) Should we provide an explicit mechanism for shaders to indicate a
        critical section?  Or should we just automatically infer a critical
        section by analyzing shader code?  Or should we just wrap the entire
        fragment shader in a critical section?

      RESOLVED:  Provide an explicit critical section.

      We definitely don't want to wrap the entire shader in a critical section
      when a smaller section will suffice.  Doing so would hold off the
      execution of any other fragment shader invocation with the same (x,y)
      for the entire (potentially long) life of the fragment shader.  Hardware
      would need to track a large number of fragments awaiting execution, and
      may be so backed up that further fragments will be blocked even if they
      don't overlap with any fragments currently executing.  Providing a
      smaller critical section reduces the amount of time other fragments are
      blocked and allows implementations to perform useful work for
      conflicting fragments before they hit the critical section.

      While a compiler could analyze the code and wrap a critical section
      around all memory accesses, it may be difficult to determine which
      accesses actually require mutual exclusion and ordering, and which
      accesses are safe to do with no protection.  Requiring shaders to
      explicitly identify a critical section doesn't seem overwhelmingly
      burdensome, and allows applications to exclude memory accesses that
      they know to be "safe".
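
      To illustrate why a small, explicit critical section is preferable,
      the sketch below performs its texture fetch and shading before
      entering the critical section and holds the interlock only for the
      read-modify-write of shared data.  The names accumImage, uAlbedo,
      vTexCoord, vNormal, and fragColor, and the fixed light direction,
      are assumptions made for this example.

```glsl
#version 430
#extension GL_NV_fragment_shader_interlock : require

layout(pixel_interlock_ordered) in;

// Hypothetical shared per-pixel accumulation image.
layout(binding = 0, rgba16f) coherent uniform image2D accumImage;

uniform sampler2D uAlbedo;   // hypothetical texture

in vec2 vTexCoord;           // hypothetical inputs
in vec3 vNormal;

layout(location = 0) out vec4 fragColor;

void main()
{
    // Potentially long work that touches no shared per-pixel data stays
    // outside the critical section, so it can overlap with other
    // invocations covering the same pixel.
    vec3  albedo = texture(uAlbedo, vTexCoord).rgb;
    float ndotl  = max(dot(normalize(vNormal), vec3(0.0, 0.0, 1.0)), 0.0);
    vec4  shaded = vec4(albedo * ndotl, 1.0);

    ivec2 p = ivec2(gl_FragCoord.xy);

    // Only the shared read-modify-write is protected.
    beginInvocationInterlockNV();
    vec4 sum = imageLoad(accumImage, p);
    imageStore(accumImage, p, sum + shaded);
    endInvocationInterlockNV();

    // Ordinary outputs may still be written; they reach the framebuffer
    // in API order as usual.
    fragColor = shaded;
}
```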

    (9) What restrictions should be imposed on the use of the
        beginInvocationInterlockNV() and endInvocationInterlockNV() functions
        delimiting a critical section?

      RESOLVED:  We impose restrictions similar to those on the barrier()
      built-in function in tessellation control shaders to ensure that any
      shader using this functionality has a single critical section that can
      be easily identified during compilation.  In particular, we require that
      these functions be called in main() and don't permit them to be called
      in conditional flow control.

      These restrictions ensure that there is always exactly one call to the
      "begin" and "end" functions in a predictable location in the compiled
      shader code, and ensure that the compiler and hardware don't have to
      deal with unusual cases (like entering a critical section and never
      leaving, leaving a critical section without entering it, or trying to
      enter a critical section more than once).
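
      The placement rules above can be sketched as follows.  The disallowed
      forms appear only in comments.  The image name hitCount and the input
      vAlpha are assumptions for this example.  The sketch also assumes
      that the restrictions apply to the "begin" and "end" calls
      themselves, so that conditional code between them is acceptable.

```glsl
#version 430
#extension GL_NV_fragment_shader_interlock : require

layout(pixel_interlock_ordered) in;

layout(binding = 0, r32ui) coherent uniform uimage2D hitCount;  // hypothetical

in float vAlpha;   // hypothetical input

void main()
{
    ivec2 p = ivec2(gl_FragCoord.xy);

    // Not permitted: calling the functions in conditional flow control,
    //     if (vAlpha > 0.0) { beginInvocationInterlockNV(); ... }
    // Not permitted: calling them from a function other than main().

    // Exactly one "begin" and one "end", unconditionally, in main().
    beginInvocationInterlockNV();
    if (vAlpha > 0.0) {
        // The condition is placed inside the critical section instead
        // of around the begin/end calls.
        uint n = imageLoad(hitCount, p).x;
        imageStore(hitCount, p, uvec4(n + 1u));
    }
    endInvocationInterlockNV();
}
```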

Revision History

    Revision 2, 2015/03/27
      - Add ES interactions

    Revision 1
      - Internal revisions