Name

    NV_fragment_shader_interlock

Name Strings

    GL_NV_fragment_shader_interlock

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Contributors

    Jeff Bolz, NVIDIA Corporation
    Mathias Heyer, NVIDIA Corporation

Status

    Shipping

Version

    Last Modified Date:         March 27, 2015
    NVIDIA Revision:            2

Number

    OpenGL Extension #468
    OpenGL ES Extension #230

Dependencies

    This extension is written against the OpenGL 4.3 Specification
    (Compatibility Profile, dated February 14, 2013) and the OpenGL ES
    3.1.0 Specification (dated March 17, 2014).

    This extension is written against the OpenGL Shading Language
    Specification (version 4.30, revision 8) and the OpenGL ES Shading
    Language Specification (version 3.10, revision 2).

    OpenGL 4.3 and GLSL 4.30 are required in an OpenGL implementation.
    OpenGL ES 3.1 and GLSL ES 3.10 are required in an OpenGL ES
    implementation.

    This extension interacts with NV_shader_buffer_load and
    NV_shader_buffer_store.

    This extension interacts with NV_gpu_program4 and NV_gpu_program5.

    This extension interacts with EXT_tessellation_shader.

    This extension interacts with OES_sample_shading.

    This extension interacts with OES_shader_multisample_interpolation.

    This extension interacts with OES_shader_image_atomic.

Overview

    In unextended OpenGL 4.3 or OpenGL ES 3.1, applications may produce a
    large number of fragment shader invocations that perform loads and
    stores to memory using image uniforms, atomic counter uniforms,
    buffer variables, or pointers.  The order in which loads and stores
    to common addresses are performed by different fragment shader
    invocations is largely undefined.
    For algorithms that use shader writes and touch the same pixels more
    than once, one or more of the following techniques may be required to
    ensure proper execution ordering:

      * inserting Finish or WaitSync commands to drain the pipeline between
        different "passes" or "layers";

      * using only atomic memory operations to write to shader memory (which
        may be relatively slow and limits how memory may be updated); or

      * injecting spin loops into shaders to prevent multiple shader
        invocations from touching the same memory concurrently.

    This extension provides new GLSL built-in functions
    beginInvocationInterlockNV() and endInvocationInterlockNV() that delimit
    a critical section of fragment shader code.  For pairs of shader
    invocations with "overlapping" coverage in a given pixel, the OpenGL
    implementation will guarantee that the critical section of the fragment
    shader will be executed for only one fragment at a time.

    There are four different interlock modes supported by this extension,
    which are identified by layout qualifiers.  The qualifiers
    "pixel_interlock_ordered" and "pixel_interlock_unordered" provide mutual
    exclusion in the critical section for any pair of fragments
    corresponding to the same pixel.
    When using multisampling, the qualifiers "sample_interlock_ordered" and
    "sample_interlock_unordered" only provide mutual exclusion for pairs of
    fragments that both cover at least one common sample in the same pixel;
    these are recommended for performance if shaders use per-sample data
    structures.

    Additionally, when the "pixel_interlock_ordered" or
    "sample_interlock_ordered" layout qualifier is used, the interlock also
    guarantees that the critical section for multiple shader invocations
    with "overlapping" coverage will be executed in the order in which the
    primitives were processed by the GL.  Such a guarantee is useful for
    applications like blending in the fragment shader, where an application
    requires fragment values to be composited into the framebuffer in
    primitive order.

    This extension can be useful for algorithms that need to access
    per-pixel data structures via shader loads and stores.  With this
    extension, such algorithms can access these data structures in the
    critical section without worrying about other invocations for the same
    pixel accessing them concurrently.  Additionally, the ordering
    guarantees are useful for cases where the API ordering of fragments is
    meaningful.
    For example, applications may be able to execute programmable blending
    operations in the fragment shader, where the destination buffer is read
    via image loads and the final value is written via image stores.

New Procedures and Functions

    None.

New Tokens

    None.

Modifications to the OpenGL 4.3 Specification (Compatibility Profile)

    None.

Modifications to the OpenGL Shading Language Specification, Version 4.30

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_NV_fragment_shader_interlock : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_NV_fragment_shader_interlock           1


    Modify Section 4.4.1.3, Fragment Shader Inputs (p. 58)

    (add to the list of layout qualifiers containing "early_fragment_tests",
     p. 59, and modify the surrounding language to reflect that multiple
     layout qualifiers are supported on "in")

      layout-qualifier-id
        pixel_interlock_ordered
        pixel_interlock_unordered
        sample_interlock_ordered
        sample_interlock_unordered

    (add to the end of the section, p. 59)

    The identifiers "pixel_interlock_ordered", "pixel_interlock_unordered",
    "sample_interlock_ordered", and "sample_interlock_unordered" control the
    ordering of the execution of shader invocations between calls to the
    built-in functions beginInvocationInterlockNV() and
    endInvocationInterlockNV(), as described in section 8.13.3.  A compile
    or link error will be generated if more than one of these layout
    qualifiers is specified in shader code.  If a program containing a
    fragment shader includes none of these layout qualifiers, it is as
    though "pixel_interlock_ordered" were specified.

    Add to the end of Section 8.13, Fragment Processing Functions (p. 168)

    8.13.3, Fragment Shader Execution Ordering Functions

    By default, fragment shader invocations are generally executed in
    undefined order.  Multiple fragment shader invocations may be executed
    concurrently, including multiple invocations corresponding to a single
    pixel.
    Additionally, fragment shader invocations for a single pixel might not
    be processed in the order in which the primitives generating the
    fragments were specified in the OpenGL API.

    The paired functions beginInvocationInterlockNV() and
    endInvocationInterlockNV() allow shaders to specify a critical section,
    inside which stronger execution ordering is guaranteed.  When using the
    "pixel_interlock_ordered" or "pixel_interlock_unordered" qualifier,
    ordering guarantees are provided for any pair of fragment shader
    invocations X and Y triggered by fragments A and B corresponding to the
    same pixel.  When using the "sample_interlock_ordered" or
    "sample_interlock_unordered" qualifier, ordering guarantees are provided
    for any pair of fragment shader invocations X and Y triggered by
    fragments A and B that correspond to the same pixel, where at least one
    sample of the pixel is covered by both fragments.  No ordering
    guarantees are provided for pairs of fragment shader invocations
    corresponding to different pixels.  Additionally, no ordering guarantees
    are provided for pairs of fragment shader invocations corresponding to
    the same fragment.
    When multisampling is enabled and the framebuffer has sample buffers,
    multiple fragment shader invocations may result from a single fragment
    due to the use of the "sample" auxiliary storage qualifier, OpenGL API
    commands forcing multiple shader invocations per fragment, or other
    implementation-dependent reasons.

    When using the "pixel_interlock_unordered" or
    "sample_interlock_unordered" qualifier, the interlock will ensure that
    the critical sections of fragment shader invocations X and Y with
    overlapping coverage will never execute concurrently.  That is,
    invocation X is guaranteed to complete its call to
    endInvocationInterlockNV() before invocation Y completes its call to
    beginInvocationInterlockNV(), or vice versa.

    When using the "pixel_interlock_ordered" or "sample_interlock_ordered"
    layout qualifier, the critical sections of invocations X and Y with
    overlapping coverage will be executed in a specific order, based on the
    relative order assigned to their fragments A and B.  If fragment A is
    considered to precede fragment B, the critical section of invocation X
    is guaranteed to complete before the critical section of invocation Y
    begins.
    When a pair of fragments A and B have overlapping coverage, fragment A
    is considered to precede fragment B if

      * the OpenGL API command producing fragment A was called prior to the
        command producing fragment B, or

      * the point, line, triangle, [[compatibility profile: quadrilateral,
        polygon,]] or patch primitive producing fragment A appears earlier in
        the same strip, loop, fan, or independent primitive list producing
        fragment B.

    When [[compatibility profile: decomposing quadrilateral or polygon
    primitives or]] tessellating a single patch primitive, multiple
    primitives may be generated in an undefined implementation-dependent
    order.  When fragments A and B are generated from such unordered
    primitives, their ordering is also implementation-dependent.

    If fragment shader invocation X completes its critical section before
    invocation Y begins its critical section, all stores to memory performed
    in the critical section of invocation X using a pointer, image uniform,
    atomic counter uniform, or buffer variable qualified by "coherent" are
    guaranteed to be visible to any reads of the same types of variable
    performed in the critical section of invocation Y.

    If multisampling is disabled, or if the framebuffer does not include
    sample buffers, fragment coverage is computed per-pixel.
    In this case, the "sample_interlock_ordered" or
    "sample_interlock_unordered" layout qualifiers are treated as
    "pixel_interlock_ordered" or "pixel_interlock_unordered", respectively.


    Syntax:

      void beginInvocationInterlockNV(void);
      void endInvocationInterlockNV(void);

    Description:

    The functions beginInvocationInterlockNV() and
    endInvocationInterlockNV() may only be called inside the function main()
    of a fragment shader and may not be called within any flow control.
    These functions may not be called after a return statement in the
    function main(), but may be called after a discard statement.  A
    compile- or link-time error will be generated if main() calls either
    function more than once, contains a call to one function without a
    matching call to the other, or calls endInvocationInterlockNV() before
    calling beginInvocationInterlockNV().

Additions to the AGL/GLX/WGL Specifications

    None.

Errors

    None.

New State

    None.

New Implementation Dependent State

    None.

Interactions with OpenGL ES 3.1

    Disabling multisample rasterization is not available on OpenGL ES;
    it is always enabled.


Dependencies on EXT_tessellation_shader

    If this extension is implemented on OpenGL ES and EXT_tessellation_shader
    is not supported, remove language referring to tessellation of patch
    primitives.


Dependencies on OES_sample_shading

    If this extension is implemented on OpenGL ES and OES_sample_shading
    is not supported, remove references to per-sample shading via
    MinSampleShading[OES]().


Dependencies on OES_shader_image_atomic

    If this extension is implemented on OpenGL ES and OES_shader_image_atomic
    is not supported, disregard language referring to atomic memory
    operations.


Dependencies on OES_shader_multisample_interpolation

    If this extension is implemented on OpenGL ES and
    OES_shader_multisample_interpolation is not supported, ignore language
    about the "sample" auxiliary storage qualifier.


Dependencies on NV_shader_buffer_load and NV_shader_buffer_store

    If NV_shader_buffer_load and NV_shader_buffer_store are not supported,
    references to ordering memory accesses using pointers should be deleted.


Dependencies on NV_gpu_program4 and NV_fragment_program4

    Modify Section 2.X.2, Program Grammar, of the NV_fragment_program4
    specification (which modifies the NV_gpu_program4 base grammar)

      <SpecialInstruction>    ::= "FSIB"
                                | "FSIE"


    Modify Section 2.X.4, Program Execution Environment

    (add to the opcode table)

                  Modifiers
      Instruction F I C S H D  Out Inputs    Description
      ----------- - - - - - -  --- --------  --------------------------------
      FSIB        - - - - - -  -   -         begin fragment shader interlock
      FSIE        - - - - - -  -   -         end fragment shader interlock


    Modify Section 2.X.6, Program Options

    + Fragment Shader Interlock (NV_pixel_interlock_ordered,
      NV_pixel_interlock_unordered, NV_sample_interlock_ordered, and
      NV_sample_interlock_unordered)

    If a fragment program specifies the "NV_pixel_interlock_ordered",
    "NV_pixel_interlock_unordered", "NV_sample_interlock_ordered", or
    "NV_sample_interlock_unordered" option, it will configure a critical
    section using the FSIB (fragment shader interlock begin) and FSIE
    (fragment shader interlock end) opcodes.  The execution of the critical
    sections will be ordered for pairs of program invocations corresponding
    to the same pixel, as described in Section 8.13.3 of the OpenGL Shading
    Language Specification, where each option is treated as the layout
    qualifier whose name matches the option name without its "NV_" prefix.

    A program will fail to load if it specifies more than one of these
    program options, if it specifies exactly one of these options but does
    not contain exactly one FSIB instruction and one FSIE instruction, or if
    it contains an FSIB or FSIE instruction without specifying any of these
    options.


    Add the following subsections to section 2.X.8, Program Instruction Set


    Section 2.X.8.Z, FSIB:  Fragment Shader Interlock Begin

    The FSIB instruction specifies the beginning of a critical section in a
    fragment program, where execution of the critical section is ordered
    relative to other fragments.  This instruction has no other effect.

    The FSIB instruction is not allowed in arbitrary locations in a program.
    A program will fail to load if it includes an FSIB instruction inside an
    IF/ELSE/ENDIF block, inside a REP/ENDREP block, or inside any subroutine
    block other than the one labeled "main".  Additionally, a program will
    fail to load if it contains more than one FSIB instruction, or if its one
    FSIB instruction is not followed by an FSIE instruction.

    FSIB has no operands and generates no result.


    Section 2.X.8.Z, FSIE:  Fragment Shader Interlock End

    The FSIE instruction specifies the end of a critical section in a
    fragment program, where execution of the critical section is ordered
    relative to other fragments.  This instruction has no other effect.

    The FSIE instruction is not allowed in arbitrary locations in a program.
    A program will fail to load if it includes an FSIE instruction inside an
    IF/ELSE/ENDIF block, inside a REP/ENDREP block, or inside any subroutine
    block other than the one labeled "main".  Additionally, a program will
    fail to load if it contains more than one FSIE instruction, or if its one
    FSIE instruction is not preceded by an FSIB instruction.

    FSIE has no operands and generates no result.

Issues

    (1) What should this extension be called?

      RESOLVED:  NV_fragment_shader_interlock.
      The beginInvocationInterlockNV() and endInvocationInterlockNV()
      functions identify a critical section during which other invocations
      with overlapping coverage are locked out until the critical section
      completes.

    (2) When using multisampling, the OpenGL specification permits
        multiple fragment shader invocations to be generated for a single
        fragment.  For example, the "sample" auxiliary storage qualifier or
        the MinSampleShading() OpenGL API command can be used to force
        per-sample shading.  What execution ordering guarantees are provided
        between fragment shader invocations generated from the same
        fragment?

      RESOLVED:  We don't provide any ordering guarantees in this extension.
      This implies that when using multisampling, there is no guarantee that
      two fragment shader invocations for the same fragment won't be
      executing their critical sections concurrently.  This could cause
      problems for algorithms sharing data structures between all the
      samples of a pixel unless accesses to these data structures are
      performed atomically.

      When using per-sample shading, the interlock we provide *does*
      guarantee that no two invocations corresponding to the same sample
      execute the critical section concurrently.
      If a separate set of data structures is provided for each sample, no
      conflicts should occur within the critical section.

      Note that in addition to the per-sample shading options in the shading
      language and API, implementations may provide multisample antialiasing
      modes in which the implementation can't simply run the fragment shader
      once and broadcast the results to a large set of covered samples; such
      modes may also generate multiple invocations for a single fragment.

    (3) What performance differences are expected between shaders using the
        "pixel" and "sample" layout qualifier variants in this extension
        (e.g., "pixel_interlock_ordered" and "sample_interlock_ordered")?

      RESOLVED:  We expect that shaders using "sample" qualifiers may have
      higher performance, since the implementation need not order pairs of
      fragments that touch the same pixel with "complementary" coverage.
      Such situations are fairly common:  when two adjacent triangles combine
      to cover a given pixel, two fragments will be generated for the pixel
      but no sample will be covered by both.  When using "sample" qualifiers,
      the invocations for both fragments can run concurrently.  When using
      "pixel" qualifiers, the critical section for one fragment must wait
      until the critical section for the other fragment completes.
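
      The per-sample approach described in issues (2) and (3) might look
      like the following GLSL sketch.  It assumes a hypothetical
      application-managed image array, uPerSampleColor, with one array
      layer per sample, and a hypothetical premultiplied-alpha input,
      vColor; neither is defined by this extension.  Reading gl_SampleID
      forces per-sample shading, so each invocation touches only its own
      layer and the sketch does not rely on any ordering between
      invocations generated from the same fragment.

```glsl
#version 430
#extension GL_NV_fragment_shader_interlock : require

// Only fragments that share at least one covered sample are serialized.
layout(sample_interlock_ordered) in;

// Hypothetical per-sample accumulation buffer: array layer N holds sample N.
layout(binding = 0, rgba8) coherent uniform image2DArray uPerSampleColor;

// Hypothetical interpolated color with premultiplied alpha.
in vec4 vColor;

void main()
{
    // Any static use of gl_SampleID makes the shader run once per sample,
    // so each invocation reads and writes only its own layer.
    ivec3 coord = ivec3(ivec2(gl_FragCoord.xy), gl_SampleID);
    vec4 src = vColor;

    beginInvocationInterlockNV();
    vec4 dst = imageLoad(uPerSampleColor, coord);
    // Premultiplied "source over destination" composite, in primitive order.
    imageStore(uPerSampleColor, coord, src + dst * (1.0 - src.a));
    endInvocationInterlockNV();
}
```

      With "sample_interlock_ordered", two fragments with complementary
      coverage of the same pixel do not wait on each other.  Switching to
      "pixel_interlock_ordered" would produce the same results here, since
      non-overlapping fragments write disjoint layers, but would serialize
      those fragments unnecessarily.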

    (4) What performance differences are expected between shaders using the
        "ordered" and "unordered" layout qualifier variants in this extension
        (e.g., "pixel_interlock_ordered" and "pixel_interlock_unordered")?

      RESOLVED:  We expect that shaders using "unordered" may have higher
      performance, since the critical section implementation doesn't need to
      ensure that all previous invocations with overlapping coverage have
      completed their critical sections.  Some algorithms (e.g., building
      data structures in order-independent transparency algorithms) will
      require mutual exclusion when updating per-pixel data structures, but
      do not require that shaders execute in a specific order.

    (5) Are fragment shaders using this extension allowed to write outputs?
        If so, is there any guarantee on the order in which such outputs are
        written to the framebuffer?

      RESOLVED:  Yes, fragment shaders with critical sections may still
      write outputs.  If fragment shader outputs are written, they are stored
      or blended into the framebuffer in API order, as is the case for
      fragment shaders not using this extension.

    (6) What considerations apply when using this extension to implement a
        programmable form of conventional blending using image stores?

      RESOLVED:  Per-fragment operations performed in the pipeline following
      fragment shader execution obviously have no effect on image stores
      executing during fragment shader execution.  In particular, multisample
      operations such as broadcasting a single fragment output to multiple
      samples or modifying the coverage with alpha-to-coverage or a shader
      coverage mask output value have no effect.  Fragments cannot be killed
      before fragment shader blending by the fixed-function alpha test or by
      a depth test using a Z value produced by the shader.  Fixed-function
      depth and stencil tests normally do not kill fragments before fragment
      shader blending either, but those tests can be performed before
      fragment shader invocation by using the layout qualifier
      "early_fragment_tests".  Any other fixed-function per-fragment
      operation that must take effect before programmable blending, and that
      is not moved ahead of the shader by "early_fragment_tests", would need
      to be emulated in the shader.

      Note also that blend computations performed in the shader are not
      guaranteed to produce results that are bit-identical to those produced
      by fixed-function blending hardware, even if mathematically equivalent
      algorithms are used.
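
      A minimal GLSL sketch of the approach in issue (6), assuming a
      single-sampled color buffer bound as the hypothetical image
      uColorBuffer, a hypothetical uAlphaRef uniform, and a hypothetical
      non-premultiplied input vColor.  Depth and stencil tests are moved
      ahead of the shader with "early_fragment_tests", the alpha test is
      emulated with discard before the critical section, the image is
      declared "coherent" as issue (7) requires, and only the
      read-modify-write of the destination sits inside the interlock.  The
      overlay blend is an arbitrary mode chosen for illustration.

```glsl
#version 430
#extension GL_NV_fragment_shader_interlock : require

// Depth/stencil tests run before the shader, and critical sections for the
// same pixel execute in primitive order.
layout(early_fragment_tests, pixel_interlock_ordered) in;

// Hypothetical color buffer bound as an image.  "coherent" makes stores from
// one invocation's critical section visible to the next one (issue 7).
layout(binding = 0, rgba8) coherent uniform image2D uColorBuffer;

uniform float uAlphaRef;   // hypothetical alpha-test reference value
in vec4 vColor;            // hypothetical non-premultiplied source color

void main()
{
    vec4 src = vColor;

    // Emulated alpha test.  The interlock calls themselves must not be
    // placed inside flow control, so the test stays outside them.
    if (src.a < uAlphaRef) {
        discard;
    }

    ivec2 pixel = ivec2(gl_FragCoord.xy);

    beginInvocationInterlockNV();
    vec4 dst = imageLoad(uColorBuffer, pixel);
    // Overlay blend, chosen only as an example of a custom mode:
    // 2*d*s where d < 0.5, otherwise 1 - 2*(1-d)*(1-s).
    vec3 overlay = mix(2.0 * dst.rgb * src.rgb,
                       1.0 - 2.0 * (1.0 - dst.rgb) * (1.0 - src.rgb),
                       step(0.5, dst.rgb));
    vec3 rgb = mix(dst.rgb, overlay, src.a);
    float alpha = src.a + dst.a * (1.0 - src.a);
    imageStore(uColorBuffer, pixel, vec4(rgb, alpha));
    endInvocationInterlockNV();
}
```

      One caveat, as we understand the OpenGL 4.3 early fragment test
      rules: depth and stencil buffer updates from the early tests happen
      before the shader runs, so the emulated alpha test cannot undo them.
      A pass that needs both may have to disable depth writes.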

    (7) For operations accessing shared per-pixel data structures in the
        critical section, what operations (if any) must be performed in shader
        code to ensure that stores from one shader invocation are visible to
        the next?

      RESOLVED: The "coherent" qualifier is required in the declaration of
      the shared data structures to ensure that writes performed by one
      invocation are visible to reads performed by another invocation.

      In shaders that don't use the interlock, "coherent" is not sufficient,
      as there is no guarantee of the ordering of fragment shader
      invocations: even if invocation A can see the values written by
      another invocation B, there is no general guarantee whether invocation
      A's read will be performed before or after invocation B's write. The
      built-in function memoryBarrier() can be used to establish a weak
      ordering by which threads can communicate, but it doesn't order memory
      transactions between two separate invocations. With the interlock,
      execution ordering between two threads from the same pixel is
      well-defined as long as the loads and stores are performed inside the
      critical section, and the use of "coherent" ensures that stores done
      by one invocation are visible to other invocations.

    (8) Should we provide an explicit mechanism for shaders to indicate a
        critical section? Or should we just automatically infer a critical
        section by analyzing shader code? Or should we just wrap the entire
        fragment shader in a critical section?

      RESOLVED: Provide an explicit critical section.

      We definitely don't want to wrap the entire shader in a critical section
      when a smaller section will suffice. Doing so would hold off the
      execution of any other fragment shader invocation with the same (x,y)
      for the entire (potentially long) life of the fragment shader. Hardware
      would need to track a large number of fragments awaiting execution, and
      may be so backed up that further fragments will be blocked even if they
      don't overlap with any fragments currently executing. Providing a
      smaller critical section reduces the amount of time other fragments are
      blocked and allows implementations to perform useful work for
      conflicting fragments before they reach the critical section.

      While a compiler could analyze the code and wrap a critical section
      around all memory accesses, it may be difficult to determine which
      accesses actually require mutual exclusion and ordering, and which
      accesses are safe to do with no protection. Requiring shaders to
      explicitly identify a critical section doesn't seem overwhelmingly
      burdensome, and allows applications to exclude memory accesses that
      they know to be "safe".
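      A second minimal GLSL sketch, again under stated assumptions rather
      than as a definitive implementation, shows a deliberately small
      explicit critical section for the data-structure case from issue (4).
      It keeps the K nearest fragments per pixel in a depth-sorted shader
      storage buffer. Because the resulting set of K nearest depths does not
      depend on insertion order (ties aside), only mutual exclusion is
      needed, so "pixel_interlock_unordered" is used. The buffer is declared
      "coherent" as issue (7) requires, and shading is done before the
      critical section. The names K, Layer, Layers, viewportWidth, and
      shadedColor are hypothetical; the application is assumed to size the
      buffer to K entries per pixel and clear every depth to 1.0 first.

```glsl
#version 430
#extension GL_NV_fragment_shader_interlock : require

// Mutual exclusion without ordering: inserting into a sorted per-pixel
// list yields the same K nearest depths in any order (ties aside).
layout(pixel_interlock_unordered) in;

const int K = 4;  // hypothetical number of layers kept per pixel

struct Layer {
    float depth;
    uint  color;  // RGBA8 packed with packUnorm4x8()
};

// Hypothetical per-pixel storage: K layers per pixel, sorted by depth and
// cleared by the application to depth = 1.0 before drawing.
layout(std430, binding = 0) coherent buffer Layers {
    Layer layers[];
};

uniform int viewportWidth;  // hypothetical, set by the application
in vec4 shadedColor;         // hypothetical interpolated input

void main()
{
    // Shading and address computation happen outside the critical
    // section, so they can overlap with other fragments at this pixel.
    uint  packedColor = packUnorm4x8(shadedColor);
    float depth = gl_FragCoord.z;
    int   base = K * (int(gl_FragCoord.y) * viewportWidth +
                      int(gl_FragCoord.x));

    beginInvocationInterlockNV();
    // Insertion sort: place the new layer and carry each displaced layer
    // down the list; the farthest layer falls off the end when full.
    for (int i = 0; i < K; i++) {
        Layer cur = layers[base + i];
        if (depth < cur.depth) {
            layers[base + i] = Layer(depth, packedColor);
            depth = cur.depth;
            packedColor = cur.color;
        }
    }
    endInvocationInterlockNV();
}
```

      A separate resolve pass (not shown) would then composite the K layers
      of each pixel in depth order.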

    (9) What restrictions should be imposed on the use of the
        beginInvocationInterlockNV() and endInvocationInterlockNV() functions
        delimiting a critical section?

      RESOLVED: We impose restrictions similar to those on the barrier()
      built-in function in tessellation control shaders to ensure that any
      shader using this functionality has a single critical section that can
      be easily identified during compilation. In particular, we require that
      these functions be called in main() and don't permit them to be called
      in conditional flow control.

      These restrictions ensure that there is always exactly one call to the
      "begin" and "end" functions in a predictable location in the compiled
      shader code, and ensure that the compiler and hardware don't have to
      deal with unusual cases (like entering a critical section and never
      leaving, leaving a critical section without entering it, or trying to
      enter a critical section more than once).

Revision History

    Revision 2, 2015/03/27
      - Add ES interactions

    Revision 1
      - Internal revisions