Name

    NV_shader_thread_group

Name Strings

    GL_NV_shader_thread_group

Contributors

    Jeannot Breton, NVIDIA
    Pat Brown, NVIDIA
    Eric Werness, NVIDIA
    Mark Kilgard, NVIDIA

Contact

    Jeannot Breton, NVIDIA Corporation (jbreton 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified Date:         7/21/2015
    NVIDIA Revision:            4

Number

    OpenGL Extension #447

Dependencies

    This extension is written against the OpenGL 4.3 (Compatibility Profile)
    Specification.

    This extension is written against version 4.30 (revision 07) of the OpenGL
    Shading Language Specification.

    OpenGL 4.3 and GLSL 4.30 are required.

    This extension interacts with NV_gpu_program5.

    This extension interacts with NV_compute_program5.

    This extension interacts with NV_tessellation_program5.

Overview

    Implementations of the OpenGL Shading Language may, but are not required
    to, run multiple shader threads for a single stage as a SIMD thread group,
    where individual execution threads are assigned to thread groups in an
    undefined, implementation-dependent order.  This extension adds a set of
    new OpenGL Shading Language features to query thread state and to share
    data between fragments within a 2x2 pixel quad.

    More specifically, the following functionality is added:

    *   New uniform variables and tokens to query the number of threads in a
        warp, the number of warps running on an SM (streaming multiprocessor),
        and the number of SMs on the GPU.

    *   New shader inputs to query the thread id, the warp id, and the SM id.

    *   A new shader input to query whether a fragment shader thread is a
        helper thread.

    *   New shader built-in functions to query the state of a Boolean condition
        over all threads in a thread group.

    *   New shader built-in functions to query which threads are active within
        a thread group.

    *   New fragment shader built-in functions to share data between fragments
        within a 2x2 pixel quad.

    Shaders using the new functionality provided by this extension should
    enable it via the construct

        #extension GL_NV_shader_thread_group : require     (or enable)

    This extension also specifies some modifications to the program assembly
    language to support the thread state query and thread data sharing
    functionality.

    Note that in this extension specification, "warp" and "thread group" have
    the same meaning.  A warp is a group of threads that are executed in
    lockstep.  Each thread in a warp executes the same instruction of a
    program, but on different data.

New Procedures and Functions

    None


New Tokens

    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
    GetFloatv, and GetDoublev:

        WARP_SIZE_NV                                    0x9339
        WARPS_PER_SM_NV                                 0x933A
        SM_COUNT_NV                                     0x933B


Modifications to The OpenGL Shading Language Specification, Version 4.30
(Revision 07)

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_NV_shader_thread_group : <behavior>

    where <behavior> is as specified in section 3.3.

    A new preprocessor #define is added to the OpenGL Shading Language:

      #define GL_NV_shader_thread_group         1

    Modify Section 7.1, Built-In Language Variables, p. 110

    (Add to the list of built-in variables for the compute, vertex, geometry,
     tessellation control, tessellation evaluation and fragment languages)

        in uint  gl_ThreadInWarpNV;
        in uint  gl_ThreadEqMaskNV;
        in uint  gl_ThreadGeMaskNV;
        in uint  gl_ThreadGtMaskNV;
        in uint  gl_ThreadLeMaskNV;
        in uint  gl_ThreadLtMaskNV;
        in uint  gl_WarpIDNV;
        in uint  gl_SMIDNV;

    (Add to the list of built-in variables for the fragment language)

        in bool  gl_HelperThreadNV;

    (Add these paragraphs at the end of this section)

    The variable gl_ThreadInWarpNV holds the id of the thread within the thread
    group (or warp).  This variable is in the range 0 to gl_WarpSizeNV-1, where
    gl_WarpSizeNV is the total number of threads in a warp.

    The variable gl_ThreadEqMaskNV is a bitfield in which only the bit equal to
    the current thread id is set.  The variable gl_ThreadGeMaskNV is a bitfield
    in which the bits greater than or equal to the current thread id are set.
    The variable gl_ThreadGtMaskNV is a bitfield in which the bits greater than
    the current thread id are set.  The variable gl_ThreadLeMaskNV is a
    bitfield in which the bits less than or equal to the current thread id are
    set.  The variable gl_ThreadLtMaskNV is a bitfield in which the bits less
    than the current thread id are set.

    The values of gl_ThreadEqMaskNV, gl_ThreadGeMaskNV, gl_ThreadGtMaskNV,
    gl_ThreadLeMaskNV, and gl_ThreadLtMaskNV are derived from the value of
    gl_ThreadInWarpNV using simple bit-shift arithmetic; they do not take into
    account the thread group active mask.  For example, if the application
    wants a bitfield in which the bits less than or equal to the current thread
    id are set only for active threads, gl_ThreadLeMaskNV must be ANDed with
    the thread group active mask (as returned by activeThreadsNV()).
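
    As an illustration only, the C sketch below models on the CPU how the five
    mask variables can be derived from gl_ThreadInWarpNV with shift
    arithmetic, assuming a 32-thread warp (matching the 32-bit masks used
    elsewhere in this specification).  The names lane_masks and active_le_mask
    are hypothetical helpers, not part of the extension; active_mask stands in
    for the value returned by activeThreadsNV().

```c
#include <assert.h>
#include <stdint.h>

/* CPU model of the per-thread masks, assuming a 32-thread warp. */
typedef struct {
    uint32_t eq, ge, gt, le, lt;
} LaneMasks;

static LaneMasks lane_masks(uint32_t lane)
{
    assert(lane < 32u);          /* gl_ThreadInWarpNV is 0..31 here   */
    LaneMasks m;
    m.eq = 1u << lane;           /* only this lane's bit              */
    m.lt = m.eq - 1u;            /* all bits below this lane          */
    m.le = m.lt | m.eq;          /* bits below and including the lane */
    m.ge = ~m.lt;                /* this lane and everything above    */
    m.gt = ~m.le;                /* strictly above this lane          */
    return m;
}

/* The masks ignore the active mask; AND with it (here a stand-in for
 * activeThreadsNV()) to keep only active lanes. */
static uint32_t active_le_mask(uint32_t lane, uint32_t active_mask)
{
    return lane_masks(lane).le & active_mask;
}
```

    Computing lt as eq - 1 and deriving the others by OR and NOT avoids a
    shift by 32, which is undefined in C for lane 31.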

    The variable gl_WarpIDNV holds the warp id of the executing thread.  This
    variable is in the range 0 to gl_WarpsPerSMNV-1, where gl_WarpsPerSMNV is
    the maximum number of warps executing on an SM.

    The variable gl_SMIDNV holds the SM id of the executing thread.  This
    variable is in the range 0 to gl_SMCountNV-1, where gl_SMCountNV is the
    number of SMs on the GPU.
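
    One possible use of these two ids is indexing per-warp scratch storage.
    The C sketch below is a hypothetical helper, not part of the extension,
    that flattens the pair (gl_SMIDNV, gl_WarpIDNV) into a slot index.  It
    assumes that warps resident at the same time have distinct (SM id, warp
    id) pairs, so a slot is unique only among concurrently running warps and
    is reused over time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map (gl_SMIDNV, gl_WarpIDNV) to a slot in
 * [0, sm_count * warps_per_sm), e.g. to index a per-warp buffer. */
static uint32_t warp_slot(uint32_t sm_id, uint32_t warp_id,
                          uint32_t warps_per_sm, uint32_t sm_count)
{
    assert(sm_id < sm_count);        /* gl_SMIDNV < gl_SMCountNV       */
    assert(warp_id < warps_per_sm);  /* gl_WarpIDNV < gl_WarpsPerSMNV  */
    return sm_id * warps_per_sm + warp_id;
}

/* Slots needed: one per warp that can be resident at the same time. */
static uint32_t warp_slot_count(uint32_t warps_per_sm, uint32_t sm_count)
{
    return warps_per_sm * sm_count;
}
```

    For example, with hypothetical values gl_WarpsPerSMNV = 64 and
    gl_SMCountNV = 16, a buffer of 1024 slots suffices, and warp 5 on SM 2
    maps to slot 133.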

    The variable gl_HelperThreadNV specifies whether the current thread is a
    helper thread.  In implementations supporting this extension, fragment
    shader invocations may be arranged in SIMD thread groups of 2x2 fragments
    called "quads".  When a fragment shader instruction is executed on a quad,
    some fragments within the quad may execute the instruction even if they
    are not covered by the primitive.  Those threads are called helper
    threads.  Their outputs are discarded and they do not execute global store
    functions, but the intermediate values they compute can still be used by
    thread group sharing functions or by fragment derivative functions such as
    dFdx and dFdy.


    Modify Section 7.4, Built-In Uniform State, p. 125

    (Add to the list of built-in uniform variable declarations)

        uniform uint  gl_WarpSizeNV;
        uniform uint  gl_WarpsPerSMNV;
        uniform uint  gl_SMCountNV;

    (Add this paragraph at the end of this section)

    The variable gl_WarpSizeNV is the total number of threads in a warp.  The
    variable gl_WarpsPerSMNV is the maximum number of warps executing on an SM.
    The variable gl_SMCountNV is the number of SMs on the GPU.


    Modify Section 8.3, Common Functions, p. 133

    (add a function to query which threads are active within a thread group)

    Syntax:

      uint  activeThreadsNV(void)

    In the value returned by activeThreadsNV(), bit <N> is set to 1 if the
    corresponding thread in the SIMD thread group is executing the call to
    activeThreadsNV() and 0 otherwise.  A bit in the return value may be set
    to zero because of conditional flow control (e.g., returning from a
    function, or executing the "else" part of an "if" statement) or because
    the SIMD thread group was dispatched without a full complement of threads.
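
    The C sketch below models the two sources of zero bits described above,
    under stated assumptions: a 32-thread warp, and a simple divergence model
    in which each side of an if/else runs with exactly the active lanes that
    took it.  Actual scheduling is implementation-dependent, and the helper
    names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: active mask of a warp dispatched with only
 * thread_count threads (1..32); lanes >= thread_count are inactive. */
static uint32_t dispatch_mask(uint32_t thread_count)
{
    assert(thread_count >= 1u && thread_count <= 32u);
    return thread_count == 32u ? 0xFFFFFFFFu : (1u << thread_count) - 1u;
}

/* Masks activeThreadsNV() would report on each side of an if/else,
 * given the mask before the branch and a per-lane condition mask
 * (bit N set if lane N's condition is true). */
static uint32_t if_branch_mask(uint32_t active, uint32_t cond)
{
    return active & cond;
}

static uint32_t else_branch_mask(uint32_t active, uint32_t cond)
{
    return active & ~cond;
}
```

    For instance, a warp dispatched with 8 threads has mask 0xFF; if lanes 0
    to 3 take the "if" branch, this model gives 0x0F inside the "if" part and
    0xF0 inside the "else" part.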

    (add a function to query the state of a Boolean condition over all the
    threads in a thread group)

    Syntax:

      uint  ballotThreadNV(bool value)

    The function ballotThreadNV() computes a 32-bit bitfield.  It evaluates
    the condition <value> in each active thread of a thread group and sets to
    1 each bit whose corresponding thread has a true condition.  Bits for
    threads with a false condition are set to 0.  Bits for inactive threads
    are also set to 0.  The active thread mask can be queried by calling the
    function activeThreadsNV().
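
    A common pattern combines ballotThreadNV() with gl_ThreadLtMaskNV and
    GLSL's bitCount() to give each voting thread a dense output index (stream
    compaction).  The C sketch below models that pattern on the CPU for a
    32-thread warp; ballot, popcount32, and compact_index are hypothetical
    helpers, and the compaction idiom is an illustrative use rather than
    something this specification defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPU model of ballotThreadNV(): bit N is set only if lane N is active
 * and its condition is true. */
static uint32_t ballot(const bool value[32], uint32_t active_mask)
{
    uint32_t result = 0u;
    for (uint32_t lane = 0u; lane < 32u; ++lane) {
        if (((active_mask >> lane) & 1u) && value[lane])
            result |= 1u << lane;
    }
    return result;
}

/* Portable population count (GLSL provides bitCount() for this). */
static uint32_t popcount32(uint32_t x)
{
    uint32_t n = 0u;
    while (x) {
        x &= x - 1u;   /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* Output slot of a voting lane = number of voting lanes below it,
 * i.e. bitCount(ballot & gl_ThreadLtMaskNV). */
static uint32_t compact_index(uint32_t ballot_mask, uint32_t lane)
{
    assert(lane < 32u);
    uint32_t lt_mask = (1u << lane) - 1u;   /* gl_ThreadLtMaskNV */
    return popcount32(ballot_mask & lt_mask);
}
```

    In a shader, the corresponding expression would be
    bitCount(ballotThreadNV(keep) & gl_ThreadLtMaskNV), where keep is a
    hypothetical per-thread condition.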

    (add functions to share data between fragments in a quad)

    Syntax:

        float  quadSwizzle0NV(float swizzledValue, [float unswizzledValue])
        vec2   quadSwizzle0NV(vec2  swizzledValue, [vec2  unswizzledValue])
        vec3   quadSwizzle0NV(vec3  swizzledValue, [vec3  unswizzledValue])
        vec4   quadSwizzle0NV(vec4  swizzledValue, [vec4  unswizzledValue])

        float  quadSwizzle1NV(float swizzledValue, [float unswizzledValue])
        vec2   quadSwizzle1NV(vec2  swizzledValue, [vec2  unswizzledValue])
        vec3   quadSwizzle1NV(vec3  swizzledValue, [vec3  unswizzledValue])
        vec4   quadSwizzle1NV(vec4  swizzledValue, [vec4  unswizzledValue])

        float  quadSwizzle2NV(float swizzledValue, [float unswizzledValue])
        vec2   quadSwizzle2NV(vec2  swizzledValue, [vec2  unswizzledValue])
        vec3   quadSwizzle2NV(vec3  swizzledValue, [vec3  unswizzledValue])
        vec4   quadSwizzle2NV(vec4  swizzledValue, [vec4  unswizzledValue])

        float  quadSwizzle3NV(float swizzledValue, [float unswizzledValue])
        vec2   quadSwizzle3NV(vec2  swizzledValue, [vec2  unswizzledValue])
        vec3   quadSwizzle3NV(vec3  swizzledValue, [vec3  unswizzledValue])
        vec4   quadSwizzle3NV(vec4  swizzledValue, [vec4  unswizzledValue])

        float  quadSwizzleXNV(float swizzledValue, [float unswizzledValue])
        vec2   quadSwizzleXNV(vec2  swizzledValue, [vec2  unswizzledValue])
        vec3   quadSwizzleXNV(vec3  swizzledValue, [vec3  unswizzledValue])
        vec4   quadSwizzleXNV(vec4  swizzledValue, [vec4  unswizzledValue])

        float  quadSwizzleYNV(float swizzledValue, [float unswizzledValue])
        vec2   quadSwizzleYNV(vec2  swizzledValue, [vec2  unswizzledValue])
        vec3   quadSwizzleYNV(vec3  swizzledValue, [vec3  unswizzledValue])
        vec4   quadSwizzleYNV(vec4  swizzledValue, [vec4  unswizzledValue])

    In implementations supporting this extension, if a primitive covers a
    fragment at (x,y), its fragment shader invocation will be arranged in a
    SIMD thread group with fragment shader invocations corresponding to three
    neighboring pixels.  These four invocations are arranged in a 2x2 grid,
    called a "quad".  If the neighbors of a fragment are not covered by the
    primitive, fragment shader invocations are still generated for them.  The
    implementation may compute differences between values in these threads to
    estimate derivatives for dFdx(), dFdy(), and for texture lookups with
    automatic LOD calculations.

    The position of each fragment within its quad depends on the type of
    render target.

    When rendering to a window, fragments within a quad follow this pattern:

        ---------------------------------------------------
        | gl_ThreadInWarpNV 4N+0 | gl_ThreadInWarpNV 4N+1 |
        |     pixel (X+0,Y+1)    |     pixel (X+1,Y+1)    |
        ---------------------------------------------------
        | gl_ThreadInWarpNV 4N+2 | gl_ThreadInWarpNV 4N+3 |
        |     pixel (X+0,Y+0)    |     pixel (X+1,Y+0)    |
        ---------------------------------------------------

    When rendering to a framebuffer object, fragments within a quad follow this
    pattern:

        ---------------------------------------------------
        | gl_ThreadInWarpNV 4N+2 | gl_ThreadInWarpNV 4N+3 |
        |     pixel (X+0,Y+1)    |     pixel (X+1,Y+1)    |
        ---------------------------------------------------
        | gl_ThreadInWarpNV 4N+0 | gl_ThreadInWarpNV 4N+1 |
        |     pixel (X+0,Y+0)    |     pixel (X+1,Y+0)    |
        ---------------------------------------------------
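
    Reading the two diagrams together, only gl_ThreadInWarpNV modulo 4
    determines a fragment's position in its quad: the low bit selects the
    column in both layouts, while the row assignment is flipped between
    window and framebuffer-object rendering.  The C sketch below encodes that
    reading; quad_offset and its is_fbo flag are hypothetical names, and it
    assumes quads occupy aligned groups of four lanes, as the 4N+k labels
    suggest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of a fragment inside its 2x2 quad, relative to pixel (X,Y). */
typedef struct { uint32_t dx, dy; } QuadOffset;

/* Hypothetical helper: derive the offset from gl_ThreadInWarpNV using
 * the two tabulated layouts. Only lane % 4 matters. */
static QuadOffset quad_offset(uint32_t thread_in_warp, bool is_fbo)
{
    uint32_t q = thread_in_warp & 3u;  /* position 0..3 inside the quad */
    QuadOffset o;
    o.dx = q & 1u;                     /* 4N+0/4N+2 left, 4N+1/4N+3 right */
    o.dy = is_fbo ? (q >> 1)           /* FBO: 4N+0/4N+1 on row Y+0      */
                  : 1u - (q >> 1);     /* window: 4N+0/4N+1 on row Y+1   */
    return o;
}
```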

    There are six quadSwizzle functions that allow fragments within a quad to
    exchange data.  Each of them reads a floating-point operand
    <swizzledValue>, which can come from any fragment in the quad.  An
    optional floating-point operand <unswizzledValue>, which comes from the
    current fragment, can be added to <swizzledValue>.  The only difference
    between these functions is the location within the 2x2 pixel quad from
    which they read the <swizzledValue> operand.  In the descriptions below,
    fragments 0 through 3 are the quad positions labeled 4N+0 through 4N+3 in
    the diagrams above, and "thread N" denotes the current fragment.

    quadSwizzle0NV reads the <swizzledValue> operand from fragment 0:

        result[thread N] = swizzledValue[thread 0] + unswizzledValue[thread N]

    quadSwizzle1NV reads the <swizzledValue> operand from fragment 1:

        result[thread N] = swizzledValue[thread 1] + unswizzledValue[thread N]

    quadSwizzle2NV reads the <swizzledValue> operand from fragment 2:

        result[thread N] = swizzledValue[thread 2] + unswizzledValue[thread N]

    quadSwizzle3NV reads the <swizzledValue> operand from fragment 3:

        result[thread N] = swizzledValue[thread 3] + unswizzledValue[thread N]

    quadSwizzleXNV reads the <swizzledValue> operand for each fragment from
    its neighbor in X:

        result[thread 0] = swizzledValue[thread 1] + unswizzledValue[thread 0]
        result[thread 1] = swizzledValue[thread 0] + unswizzledValue[thread 1]
        result[thread 2] = swizzledValue[thread 3] + unswizzledValue[thread 2]
        result[thread 3] = swizzledValue[thread 2] + unswizzledValue[thread 3]

    quadSwizzleYNV reads the <swizzledValue> operand for each fragment from
    its neighbor in Y:

        result[thread 0] = swizzledValue[thread 2] + unswizzledValue[thread 0]
        result[thread 1] = swizzledValue[thread 3] + unswizzledValue[thread 1]
        result[thread 2] = swizzledValue[thread 0] + unswizzledValue[thread 2]
        result[thread 3] = swizzledValue[thread 1] + unswizzledValue[thread 3]

    If any thread in a 2x2 pixel quad is inactive, the quad is divergent.  In
    this case, the quadSwizzle functions return 0 for all fragments in the
    quad.
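
    The C sketch below models the six quadSwizzle functions over a single
    quad, as a CPU illustration rather than a description of the hardware.
    It assumes that fragment k is the lane labeled 4N+k in the diagrams above
    and that omitting <unswizzledValue> behaves like adding 0, and it applies
    the divergence rule by returning 0 for every fragment when any lane is
    inactive.  All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which quad member each swizzle reads <swizzledValue> from. */
typedef enum { QSWZ_0, QSWZ_1, QSWZ_2, QSWZ_3, QSWZ_X, QSWZ_Y } QuadSwizzleOp;

static unsigned source_fragment(QuadSwizzleOp op, unsigned self)
{
    switch (op) {
    case QSWZ_X: return self ^ 1u;      /* 0<->1, 2<->3 */
    case QSWZ_Y: return self ^ 2u;      /* 0<->2, 1<->3 */
    default:     return (unsigned)op;   /* QSWZ_0..QSWZ_3: fixed source */
    }
}

/* CPU model of one quadSwizzle*NV call over a quad. unswizzled may be
 * NULL to model omitting the optional operand (assumed to add 0). */
static void quad_swizzle(QuadSwizzleOp op, const float swizzled[4],
                         const float *unswizzled, const bool active[4],
                         float result[4])
{
    bool divergent = !(active[0] && active[1] && active[2] && active[3]);
    for (unsigned n = 0u; n < 4u; ++n) {
        if (divergent) {            /* divergent quad: every result is 0 */
            result[n] = 0.0f;
            continue;
        }
        float extra = unswizzled ? unswizzled[n] : 0.0f;
        result[n] = swizzled[source_fragment(op, n)] + extra;
    }
}
```

    Passing the negated value as <unswizzledValue>, as in
    quadSwizzleXNV(v, -v), yields the neighbor's value minus the current one,
    a horizontal difference similar in spirit to what dFdx() estimates; its
    sign depends on which column the fragment occupies.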


Dependencies on NV_gpu_program5

    If NV_gpu_program5 is supported and "OPTION NV_shader_thread_group" is
    specified in an assembly program, the following edits are made to extend
    the assembly programming model documented in the NV_gpu_program4 extension
    and extended by NV_gpu_program5.

    If NV_gpu_program5 is not supported, or if "OPTION NV_shader_thread_group"
    is not specified in an assembly program, the contents of this dependencies
    section should be ignored.

    Modify Section 2.X.2, Program Grammar

    (add the following rules to the NV_gpu_program4 and
     NV_gpu_program5 base grammars)

    <VECTORop>              ::= "TGBALLOT"

    <stateSingleItem>       ::= "state" "." <stateThreadItem>

    <stateThreadItem>       ::= "thread" "." <stateThreadProperty>

    <stateThreadProperty>   ::= "warpsize"
                              | "warpspersm"
                              | "smcount"

    (add/change the following rules to the NV_fragment_program4 and
     NV_gpu_program5 base grammars)

    <VECTORop>              ::= "QSWZ0"
                              | "QSWZ1"
                              | "QSWZ2"
                              | "QSWZ3"
                              | "QSWZX"
                              | "QSWZY"

    <attribBasic>           ::= <fragPrefix> "threadid"
                              | <fragPrefix> "threadeqmask"
                              | <fragPrefix> "threadltmask"
                              | <fragPrefix> "threadlemask"
                              | <fragPrefix> "threadgtmask"
                              | <fragPrefix> "threadgemask"
                              | <fragPrefix> "warpid"
                              | <fragPrefix> "smid"
                              | <fragPrefix> "helperthread"

    (add/change the following rules to the NV_vertex_program4 and
     NV_gpu_program5 base grammars)

    <attribBasic>           ::= <vtxPrefix> "threadid"
                              | <vtxPrefix> "threadeqmask"
                              | <vtxPrefix> "threadltmask"
                              | <vtxPrefix> "threadlemask"
                              | <vtxPrefix> "threadgtmask"
                              | <vtxPrefix> "threadgemask"
                              | <vtxPrefix> "warpid"
                              | <vtxPrefix> "smid"

    (add/change the following rules to the NV_geometry_program4 and
     NV_gpu_program5 base grammars)

    <attribBasic>           ::= <primPrefix> "threadid"
                              | <primPrefix> "threadeqmask"
                              | <primPrefix> "threadltmask"
                              | <primPrefix> "threadlemask"
                              | <primPrefix> "threadgtmask"
                              | <primPrefix> "threadgemask"
                              | <primPrefix> "warpid"
                              | <primPrefix> "smid"

    Modify Section 2.X.3.2 of the NV_gpu_program4 specification, Program
    Attribute Variables.

    (Add the table entries and relevant text describing the fragment program
     input variables used to query thread state.)

      Fragment Attribute Binding  Components  Underlying State
      --------------------------  ----------  ----------------------------
      ...
      fragment.threadid           (id,-,-,-)  id of the current thread
      fragment.threadeqmask       (m,-,-,-)   mask with the current thread
      fragment.threadltmask       (m,-,-,-)   mask with lower thread
      fragment.threadlemask       (m,-,-,-)   mask with lower or equal thread
      fragment.threadgtmask       (m,-,-,-)   mask with greater thread
      fragment.threadgemask       (m,-,-,-)   mask with greater or equal thread
      fragment.warpid             (id,-,-,-)  warp id of the current thread
      fragment.smid               (id,-,-,-)  SM id of the current thread
      fragment.helperthread       (k,-,-,-)   current thread is a helper thread
      ...

    If a fragment attribute binding matches "fragment.threadid", the "x"
    component is filled with the thread id of the current thread.  The thread
    id is an unsigned integer in the range 0 to 31.

    If a fragment attribute binding matches "fragment.threadeqmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which only
    the bit equal to the current thread id is set.

    If a fragment attribute binding matches "fragment.threadltmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bits less than the current thread id are set.

    If a fragment attribute binding matches "fragment.threadlemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bits less than or equal to the current thread id are set.

    If a fragment attribute binding matches "fragment.threadgtmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bits greater than the current thread id are set.

    If a fragment attribute binding matches "fragment.threadgemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bits greater than or equal to the current thread id are set.

    If a fragment attribute binding matches "fragment.warpid", the "x"
    component is filled with the warp id of the current thread.  The warp id is
    an unsigned integer whose range is hardware-dependent.

    If a fragment attribute binding matches "fragment.smid", the "x" component
    is filled with the SM id of the current thread.  The SM id is an unsigned
    integer whose range is hardware-dependent.

    If a fragment attribute binding matches "fragment.helperthread", the "x"
    component is an integer value equal to -1 when the current thread is a
    helper thread and 0 otherwise.  In implementations supporting this
    extension, fragment program invocations may be arranged in SIMD thread
    groups of 2x2 fragments called "quads".  When a fragment program
    instruction is executed on a quad, some fragments within the quad may
    execute the instruction even if they are not covered by the primitive.
    Those threads are called helper threads.  Their outputs are discarded and
    they do not execute global store instructions, but the intermediate values
    they compute can still be used by thread group sharing instructions or by
    fragment derivative instructions such as DDX and DDY.
4785bd8deadSopenharmony_ci    
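    The following C sketch models why helper threads matter for derivatives.
    It assumes a quad layout with fragments 0 and 1 on the top row and 2 and 3
    on the bottom row, and uses a per-row horizontal difference as one
    plausible DDX formulation.  The Quad type and all functions are
    hypothetical and do not describe how any particular implementation
    schedules quads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 2x2 quad, ASSUMING the layout
 *     0 1
 *     2 3
 * covered[i] is nonzero when fragment i is covered by the primitive;
 * uncovered fragments run as helper threads. */
typedef struct {
    float value[4]; /* value computed by every thread, helpers included */
    int covered[4];
} Quad;

/* Mirrors the "fragment.helperthread" encoding: -1 for a helper, else 0. */
static int32_t helper_flag(const Quad *q, int i)
{
    assert(i >= 0 && i < 4);
    return q->covered[i] ? 0 : -1;
}

/* Horizontal difference for the row containing fragment i; it reads both
 * fragments of the row, even if one of them is a helper thread. */
static float quad_ddx(const Quad *q, int i)
{
    assert(i >= 0 && i < 4);
    int left = i & ~1; /* fragment 0 or 2 */
    return q->value[left + 1] - q->value[left];
}

/* Stores results only for covered fragments (helper outputs are discarded)
 * and returns how many fragments were written. */
static int write_outputs(const Quad *q, float out[4])
{
    int written = 0;
    for (int i = 0; i < 4; ++i) {
        if (helper_flag(q, i) == 0) {
            out[i] = quad_ddx(q, i);
            ++written;
        }
    }
    return written;
}
```

    For example, if only fragment 1 is uncovered, write_outputs stores results
    for fragments 0, 2 and 3, and the result for fragment 0 still depends on
    the value that fragment 1 computed as a helper thread.
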
    (Add the table entries and relevant text describing the vertex program
     attribute variables used to query thread state.)

      Vertex Attribute Binding  Components  Underlying State
      ------------------------  ----------  ----------------------------
      ...
      vertex.threadid           (id,-,-,-)  id of the current thread
      vertex.threadeqmask       (m,-,-,-)   mask with the current thread
      vertex.threadltmask       (m,-,-,-)   mask with lower thread
      vertex.threadlemask       (m,-,-,-)   mask with lower or equal thread
      vertex.threadgtmask       (m,-,-,-)   mask with greater thread
      vertex.threadgemask       (m,-,-,-)   mask with greater or equal thread
      vertex.warpid             (id,-,-,-)  warp id of the current thread
      vertex.smid               (id,-,-,-)  SM id of the current thread
      ...

    If a vertex attribute binding matches "vertex.threadid", the "x" component
    is filled with the thread id of the current thread.  The thread id is an
    unsigned integer in the range 0 to 31.

    If a vertex attribute binding matches "vertex.threadeqmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bit whose position equals the current thread id is set.

    If a vertex attribute binding matches "vertex.threadltmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bits at positions lower than the current thread id are set.

    If a vertex attribute binding matches "vertex.threadlemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bits at positions lower than or equal to the current thread id are set.

    If a vertex attribute binding matches "vertex.threadgtmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bits at positions greater than the current thread id are set.

    If a vertex attribute binding matches "vertex.threadgemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bits at positions greater than or equal to the current thread id are set.

    If a vertex attribute binding matches "vertex.warpid", the "x" component is
    filled with the warp id of the current thread.  The warp id is an unsigned
    integer whose range is hardware-dependent.

    If a vertex attribute binding matches "vertex.smid", the "x" component
    is filled with the SM id of the current thread.  The SM id is an unsigned
    integer whose range is hardware-dependent.


    (Add the table entries and relevant text describing the geometry program
     attribute variables used to query thread state.)

      Geometry Attribute Binding  Components  Underlying State
      --------------------------  ----------  ----------------------------
      ...
      primitive.threadid          (id,-,-,-)  id of the current thread
      primitive.threadeqmask      (m,-,-,-)   mask with the current thread
      primitive.threadltmask      (m,-,-,-)   mask with lower thread
      primitive.threadlemask      (m,-,-,-)   mask with lower or equal thread
      primitive.threadgtmask      (m,-,-,-)   mask with greater thread
      primitive.threadgemask      (m,-,-,-)   mask with greater or equal thread
      primitive.warpid            (id,-,-,-)  warp id of the current thread
      primitive.smid              (id,-,-,-)  SM id of the current thread
      ...

    If a geometry attribute binding matches "primitive.threadid", the "x"
    component is filled with the thread id of the current thread.  The thread
    id is an unsigned integer in the range 0 to 31.

    If a geometry attribute binding matches "primitive.threadeqmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bit whose position equals the current thread id is set.

    If a geometry attribute binding matches "primitive.threadltmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bits at positions lower than the current thread id are set.

    If a geometry attribute binding matches "primitive.threadlemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bits at positions lower than or equal to the current thread id are set.

    If a geometry attribute binding matches "primitive.threadgtmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bits at positions greater than the current thread id are set.

    If a geometry attribute binding matches "primitive.threadgemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bits at positions greater than or equal to the current thread id are set.

    If a geometry attribute binding matches "primitive.warpid", the "x"
    component is filled with the warp id of the current thread.  The warp id is
    an unsigned integer whose range is hardware-dependent.

    If a geometry attribute binding matches "primitive.smid", the "x" component
    is filled with the SM id of the current thread.  The SM id is an unsigned
    integer whose range is hardware-dependent.


    (add the following subsection to section 2.X.3.3, Parameters)

    Thread Group Property Bindings

      Binding                        Components  Underlying State
      -----------------------------  ----------  ----------------------------
      state.thread.warpsize          (x,-,-,-)   total number of threads in a
                                                 warp
      state.thread.warpspersm        (x,-,-,-)   maximum number of warps
                                                 executing on an SM
      state.thread.smcount           (x,-,-,-)   number of SMs on the GPU

    If a program parameter binding matches "state.thread.warpsize", the "x"
    component of the program parameter variable is filled with an integer value
    indicating the total number of threads in a warp.  The "y", "z", and "w"
    components are undefined.

    If a program parameter binding matches "state.thread.warpspersm", the "x"
    component of the program parameter variable is filled with an integer value
    indicating the maximum number of warps executing on an SM.  The "y", "z",
    and "w" components are undefined.

    If a program parameter binding matches "state.thread.smcount", the "x"
    component of the program parameter variable is filled with an integer value
    indicating the number of SMs on the GPU.  The "y", "z", and "w" components
    are undefined.


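    As an illustration of how the three properties combine, the C sketch
    below computes an upper bound on the number of concurrently executing
    threads and a per-warp slot index for scratch storage.  The struct and
    function names are hypothetical, and warp_slot ASSUMES that warp ids lie
    in [0, warpspersm) and SM ids in [0, smcount); this extension does not
    guarantee either, since both ranges are hardware-dependent.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the values read from state.thread.warpsize,
 * state.thread.warpspersm and state.thread.smcount. */
typedef struct {
    uint32_t warp_size;
    uint32_t warps_per_sm;
    uint32_t sm_count;
} ThreadGroupProps;

/* Upper bound on the number of threads executing at the same time. */
static uint64_t max_resident_threads(ThreadGroupProps p)
{
    return (uint64_t)p.warp_size * p.warps_per_sm * p.sm_count;
}

/* Hypothetical per-warp scratch slot.  ASSUMES warpid < warps_per_sm and
 * smid < sm_count, which the spec leaves hardware-dependent. */
static uint32_t warp_slot(ThreadGroupProps p, uint32_t smid, uint32_t warpid)
{
    assert(smid < p.sm_count && warpid < p.warps_per_sm);
    return smid * p.warps_per_sm + warpid;
}
```

    With a warp size of 32, 64 warps per SM and 16 SMs (arbitrary values, not
    taken from any particular GPU), the bound is 32768 threads.
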
    Modify Section 2.X.4, Program Execution Environment

    (Add the table entries and relevant text describing the program
     instruction used to query thread conditions.)

      Instr-      Modifiers
      uction   V  F I C S H D  Out Inputs    Description
      -------  -- - - - - - -  --- --------  --------------------------------
      ...
      TGBALLOT 50 X X X X - - F  vu  v        query a boolean in thread group
      ...


    (Add the table entries and relevant text describing the fragment program
     instructions used to exchange data between threads.)

      Instr-      Modifiers
      uction   V  F I C S H D  Out Inputs    Description
      -------  -- - - - - - -  --- --------  --------------------------------
      ...
      QSWZ0    50 X - - - - - F  v   v,v      add fragment 0 in a quad
      QSWZ1    50 X - - - - - F  v   v,v      add fragment 1 in a quad
      QSWZ2    50 X - - - - - F  v   v,v      add fragment 2 in a quad
      QSWZ3    50 X - - - - - F  v   v,v      add fragment 3 in a quad
      QSWZX    50 X - - - - - F  v   v,v      add fragments horizontally
      QSWZY    50 X - - - - - F  v   v,v      add fragments vertically
      ...


    (Add to "Section 2.X.6, Program Options" of the NV_gpu_program4 extension,
     as extended by NV_gpu_program5)

    + Shader thread group (NV_shader_thread_group)

    If a fragment program specifies the "NV_shader_thread_group" option, it
    may use the "fragment.threadid", "fragment.threadeqmask",
    "fragment.threadltmask", "fragment.threadlemask", "fragment.threadgtmask",
    "fragment.threadgemask", "fragment.warpid", "fragment.smid",
    "fragment.helperthread", "state.thread.warpsize", "state.thread.warpspersm"
    and "state.thread.smcount" bindings.  It may also use the "TGBALLOT",
    "QSWZ0", "QSWZ1", "QSWZ2", "QSWZ3", "QSWZX" and "QSWZY" instructions.  If
    this option is not specified, a program will fail to compile if it uses
    those instructions or bindings.

    If a vertex program specifies the "NV_shader_thread_group" option, it may
    use the "vertex.threadid", "vertex.threadeqmask", "vertex.threadltmask",
    "vertex.threadlemask", "vertex.threadgtmask", "vertex.threadgemask",
    "vertex.warpid", "vertex.smid", "state.thread.warpsize",
    "state.thread.warpspersm" and "state.thread.smcount" bindings.  It may also
    use the "TGBALLOT" instruction.  If this option is not specified, a program
    will fail to compile if it uses those instructions or bindings.

    If a geometry program specifies the "NV_shader_thread_group" option, it
    may use the "primitive.threadid", "primitive.threadeqmask",
    "primitive.threadltmask", "primitive.threadlemask",
    "primitive.threadgtmask", "primitive.threadgemask", "primitive.warpid",
    "primitive.smid", "state.thread.warpsize", "state.thread.warpspersm" and
    "state.thread.smcount" bindings.  It may also use the "TGBALLOT"
    instruction.  If this option is not specified, a program will fail to
    compile if it uses those instructions or bindings.

    Section 2.X.8.Z, QSWZ0:  add fragment 0 data to all fragments in a quad

    The QSWZ0 instruction produces a floating-point result by adding the
    first operand, a floating-point value from fragment 0, to the second
    operand, another floating-point value from the current fragment.

    quadSwizzle0NV is the GLSL function that implements the same functionality
    as the QSWZ0 assembly instruction.  Section 8.3 of the OpenGL Shading
    Language Specification describes quadSwizzle0NV in more detail, and that
    information also applies to QSWZ0.


    Section 2.X.8.Z, QSWZ1:  add fragment 1 data to all fragments in a quad

    The QSWZ1 instruction produces a floating-point result by adding the
    first operand, a floating-point value from fragment 1, to the second
    operand, another floating-point value from the current fragment.

    quadSwizzle1NV is the GLSL function that implements the same functionality
    as the QSWZ1 assembly instruction.  Section 8.3 of the OpenGL Shading
    Language Specification describes quadSwizzle1NV in more detail, and that
    information also applies to QSWZ1.


    Section 2.X.8.Z, QSWZ2:  add fragment 2 data to all fragments in a quad

    The QSWZ2 instruction produces a floating-point result by adding the
    first operand, a floating-point value from fragment 2, to the second
    operand, another floating-point value from the current fragment.

    quadSwizzle2NV is the GLSL function that implements the same functionality
    as the QSWZ2 assembly instruction.  Section 8.3 of the OpenGL Shading
    Language Specification describes quadSwizzle2NV in more detail, and that
    information also applies to QSWZ2.


    Section 2.X.8.Z, QSWZ3:  add fragment 3 data to all fragments in a quad

    The QSWZ3 instruction produces a floating-point result by adding the
    first operand, a floating-point value from fragment 3, to the second
    operand, another floating-point value from the current fragment.

    quadSwizzle3NV is the GLSL function that implements the same functionality
    as the QSWZ3 assembly instruction.  Section 8.3 of the OpenGL Shading
    Language Specification describes quadSwizzle3NV in more detail, and that
    information also applies to QSWZ3.


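    The per-fragment arithmetic of QSWZ0 through QSWZ3 can be modeled for a
    single component as follows.  In this hypothetical C sketch, op0[i] and
    op1[i] hold the operand values of fragment i of one quad and n selects the
    source fragment; it models only the addition described above, not
    helper-thread or inactive-fragment handling, for which the GLSL
    specification is authoritative.

```c
#include <assert.h>

/* Hypothetical one-component model of QSWZ0..QSWZ3 (n = 0..3) on one quad:
 * every fragment i adds fragment n's first operand to its own second
 * operand. */
static void quad_swizzle_n(int n, const float op0[4], const float op1[4],
                           float result[4])
{
    assert(n >= 0 && n < 4);
    for (int i = 0; i < 4; ++i)
        result[i] = op0[n] + op1[i];
}
```

    Passing a second operand of zero turns the operation into a broadcast:
    every fragment of the quad receives fragment n's first operand.
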
    Section 2.X.8.Z, QSWZX:  add fragments in a quad horizontally

    The QSWZX instruction produces a floating-point result by adding the
    first operand, a floating-point value from the horizontal neighbor of the
    current fragment within the quad, to the second operand, another
    floating-point value from the current fragment.

    quadSwizzleXNV is the GLSL function that implements the same functionality
    as the QSWZX assembly instruction.  Section 8.3 of the OpenGL Shading
    Language Specification describes quadSwizzleXNV in more detail, and that
    information also applies to QSWZX.


    Section 2.X.8.Z, QSWZY:  add fragments in a quad vertically

    The QSWZY instruction produces a floating-point result by adding the
    first operand, a floating-point value from the vertical neighbor of the
    current fragment within the quad, to the second operand, another
    floating-point value from the current fragment.

    quadSwizzleYNV is the GLSL function that implements the same functionality
    as the QSWZY assembly instruction.  Section 8.3 of the OpenGL Shading
    Language Specification describes quadSwizzleYNV in more detail, and that
    information also applies to QSWZY.


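    A similar C sketch for QSWZX and QSWZY follows.  It ASSUMES a quad layout
    with fragments 0 and 1 on the top row and 2 and 3 on the bottom row, so
    the horizontal neighbor of fragment i is i ^ 1 and the vertical neighbor
    is i ^ 2; section 8.3 of the OpenGL Shading Language Specification remains
    the normative description.  The function names are hypothetical.

```c
#include <assert.h>

/* ASSUMED quad layout:
 *     0 1
 *     2 3
 * horizontal neighbor of fragment i = i ^ 1, vertical neighbor = i ^ 2. */
static void quad_swizzle_x(const float op0[4], const float op1[4], float r[4])
{
    for (int i = 0; i < 4; ++i)
        r[i] = op0[i ^ 1] + op1[i];
}

static void quad_swizzle_y(const float op0[4], const float op1[4], float r[4])
{
    for (int i = 0; i < 4; ++i)
        r[i] = op0[i ^ 2] + op1[i];
}

/* Sum of one value over the whole quad, delivered to every fragment:
 * a horizontal pass followed by a vertical pass. */
static void quad_sum(const float v[4], float r[4])
{
    float row[4];
    quad_swizzle_x(v, v, row);   /* row[i] = v[i ^ 1] + v[i]   */
    quad_swizzle_y(row, row, r); /* r[i] = row[i ^ 2] + row[i] */
}
```

    Chaining the two operations, as quad_sum does, gives every fragment the
    sum of a value over the whole quad: the values {1, 2, 3, 4} become
    {10, 10, 10, 10}.
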
    Section 2.X.8.Z, TGBALLOT:  query a boolean condition over a thread group

    The TGBALLOT instruction produces a result vector by reading a vector
    operand for each active thread in the current thread group and comparing
    each component to zero.  Each result vector component contains an integer
    bitmask value (described below) in which the bit for a thread is set if
    the corresponding operand component is non-zero for that thread, and
    cleared otherwise.

    When the instruction is executed inside a conditional control flow block,
    or when a thread group cannot be completely filled, only a subset of the
    threads in the thread group is active and executes the TGBALLOT
    instruction.  The bits corresponding to inactive threads are set to 0.
    The active thread mask can therefore be queried by executing TGBALLOT
    with a constant operand of 1.

      tmp = VectorLoad(op0);
      result = { 0, 0, 0, 0 };
      for (all active threads) {
        if ([thread]tmp.x != 0) result.x |= 1 << thread;
        if ([thread]tmp.y != 0) result.y |= 1 << thread;
        if ([thread]tmp.z != 0) result.z |= 1 << thread;
        if ([thread]tmp.w != 0) result.w |= 1 << thread;
      }

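    The C sketch below simulates one component of TGBALLOT over a 32-thread
    group, including the active-mask query described above, and combines the
    ballot with a threadltmask value to compute a stream-compaction index (the
    number of lower-numbered threads whose predicate is also true).  The
    function names and the explicit active_mask parameter are hypothetical
    modeling choices, not part of this extension.

```c
#include <assert.h>
#include <stdint.h>

#define GROUP_SIZE 32 /* assumed thread-group size (thread ids 0..31) */

/* Simulated single-component TGBALLOT: bit t of the result is set when
 * thread t is active and its operand is non-zero. */
static uint32_t tgballot(const int32_t op[GROUP_SIZE], uint32_t active_mask)
{
    uint32_t result = 0;
    for (int t = 0; t < GROUP_SIZE; ++t) {
        if (((active_mask >> t) & 1u) && op[t] != 0)
            result |= (uint32_t)1u << t;
    }
    return result;
}

/* Portable population count: clears the lowest set bit each iteration. */
static uint32_t popcount32(uint32_t x)
{
    uint32_t n = 0;
    for (; x != 0; x &= x - 1u)
        ++n;
    return n;
}

/* Hypothetical compaction index for thread tid: counts the set ballot bits
 * below tid by masking with the value threadltmask would hold. */
static uint32_t compaction_index(uint32_t ballot, uint32_t tid)
{
    assert(tid < GROUP_SIZE);
    uint32_t ltmask = ((uint32_t)1u << tid) - 1u;
    return popcount32(ballot & ltmask);
}
```

    For example, if threads 0 to 15 are active and the predicate is true for
    every third thread, the ballot is 0x9249 and thread 9 obtains index 3.
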
Dependencies on NV_tessellation_program5

    If NV_tessellation_program5 is supported and
    "OPTION NV_shader_thread_group" is specified in an assembly program, the
    following edits are made to extend the assembly programming model
    documented in the NV_gpu_program4 extension and extended by NV_gpu_program5
    and NV_tessellation_program5.

    If NV_tessellation_program5 is not supported, or if
    "OPTION NV_shader_thread_group" is not specified in an assembly program,
    the contents of this dependencies section should be ignored.


    Modify Section 2.X.2, Program Grammar

    (add/change the following rules to the NV_gpu_program5 base grammars for
     tessellation control programs)

    <attribBasic>           ::= <primPrefix> "threadid"
                              | <primPrefix> "threadeqmask"
                              | <primPrefix> "threadltmask"
                              | <primPrefix> "threadlemask"
                              | <primPrefix> "threadgtmask"
                              | <primPrefix> "threadgemask"
                              | <primPrefix> "warpid"
                              | <primPrefix> "smid"

    (add/change the following rules to the NV_gpu_program5 base grammars for
     tessellation evaluation programs)

    <attribBasic>           ::= <primPrefix> "threadid"
                              | <primPrefix> "threadeqmask"
                              | <primPrefix> "threadltmask"
                              | <primPrefix> "threadlemask"
                              | <primPrefix> "threadgtmask"
                              | <primPrefix> "threadgemask"
                              | <primPrefix> "warpid"
                              | <primPrefix> "smid"


    Modify Section 2.X.3.2 of the NV_tessellation_program5 specification,
    Program Attribute Variables.

    (Add the table entries and relevant text describing the tessellation
     control and evaluation program attribute variables used to query thread
     state.)

      Primitive Binding Suffix    Components  Underlying State
      --------------------------  ----------  ----------------------------
      ...
      primitive.threadid          (id,-,-,-)  id of the current thread
      primitive.threadeqmask      (m,-,-,-)   mask with the current thread
      primitive.threadltmask      (m,-,-,-)   mask with lower thread
      primitive.threadlemask      (m,-,-,-)   mask with lower or equal thread
      primitive.threadgtmask      (m,-,-,-)   mask with greater thread
      primitive.threadgemask      (m,-,-,-)   mask with greater or equal thread
      primitive.warpid            (id,-,-,-)  warp id of the current thread
      primitive.smid              (id,-,-,-)  SM id of the current thread
      ...

    If an attribute binding matches "primitive.threadid", the "x" component is
    filled with the thread id of the current thread.  The thread id is an
    unsigned integer in the range 0 to 31.

    If an attribute binding matches "primitive.threadeqmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bit whose position equals the current thread id is set.

    If an attribute binding matches "primitive.threadltmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bits at positions lower than the current thread id are set.

    If an attribute binding matches "primitive.threadlemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bits at positions lower than or equal to the current thread id are set.

    If an attribute binding matches "primitive.threadgtmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bits at positions greater than the current thread id are set.

    If an attribute binding matches "primitive.threadgemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bits at positions greater than or equal to the current thread id are set.

    If an attribute binding matches "primitive.warpid", the "x" component is
    filled with the warp id of the current thread.  The warp id is an unsigned
    integer whose range is hardware-dependent.

    If an attribute binding matches "primitive.smid", the "x" component is
    filled with the SM id of the current thread.  The SM id is an unsigned
    integer whose range is hardware-dependent.

    (Add to "Section 2.X.6, Program Options" of the NV_gpu_program4 extension,
     as extended by NV_gpu_program5 and NV_tessellation_program5)

    + Shader thread group (NV_shader_thread_group)

    If a tessellation control or evaluation program specifies the
    "NV_shader_thread_group" option, it may use the "primitive.threadid",
    "primitive.threadeqmask", "primitive.threadltmask",
    "primitive.threadlemask", "primitive.threadgtmask",
    "primitive.threadgemask", "primitive.warpid", "primitive.smid",
    "state.thread.warpsize", "state.thread.warpspersm" and
    "state.thread.smcount" bindings.  It may also use the "TGBALLOT"
    instruction.  If this option is not specified, a program will fail to
    compile if it uses those instructions or bindings.


Dependencies on NV_compute_program5

    If NV_compute_program5 is supported and "OPTION NV_shader_thread_group" is
    specified in an assembly program, the following edits are made to extend
    the assembly programming model documented in the NV_gpu_program4 extension
    and extended by NV_gpu_program5 and NV_compute_program5.

    If NV_compute_program5 is not supported, or if
    "OPTION NV_shader_thread_group" is not specified in an assembly program,
    the contents of this dependencies section should be ignored.

    Section 2.X.2, Program Grammar

    (Add the following rules to the grammar.)

    <attribBasic>           ::= "invocation" "." "threadid"
                              | "invocation" "." "threadeqmask"
                              | "invocation" "." "threadltmask"
                              | "invocation" "." "threadlemask"
                              | "invocation" "." "threadgtmask"
                              | "invocation" "." "threadgemask"
                              | "invocation" "." "warpid"
                              | "invocation" "." "smid"

    Modify Section 2.X.3.2 of the NV_compute_program5 specification, Program
    Attribute Variables.

    (Add the table entries and relevant text describing the compute program
     input variables used to query thread state.)

      Attribute Binding           Components  Underlying State
      --------------------------  ----------  ----------------------------
      ...
      invocation.threadid         (id,-,-,-)  id of the current thread
      invocation.threadeqmask     (m,-,-,-)   mask of the current thread
      invocation.threadltmask     (m,-,-,-)   mask of threads with lower id
      invocation.threadlemask     (m,-,-,-)   mask of threads with lower or equal id
      invocation.threadgtmask     (m,-,-,-)   mask of threads with greater id
      invocation.threadgemask     (m,-,-,-)   mask of threads with greater or equal id
      invocation.warpid           (id,-,-,-)  warp id of the current thread
      invocation.smid             (id,-,-,-)  SM id of the current thread
      ...

    If a compute attribute binding matches "invocation.threadid", the "x"
    component is filled with the thread id of the current thread.  The thread
    id is an unsigned integer in the range 0 to 31.

    If a compute attribute binding matches "invocation.threadeqmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which only
    the bit corresponding to the current thread id is set.

    If a compute attribute binding matches "invocation.threadltmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which all
    bits at positions lower than the current thread id are set.

    If a compute attribute binding matches "invocation.threadlemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which all
    bits at positions lower than or equal to the current thread id are set.

    If a compute attribute binding matches "invocation.threadgtmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which all
    bits at positions greater than the current thread id are set.

    If a compute attribute binding matches "invocation.threadgemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which all
    bits at positions greater than or equal to the current thread id are set.

    If a compute attribute binding matches "invocation.warpid", the "x"
    component is filled with the warp id of the current thread.  The warp id
    is an unsigned integer whose range is hardware-dependent.

    If a compute attribute binding matches "invocation.smid", the "x"
    component is filled with the SM id of the current thread.  The SM id is
    an unsigned integer whose range is hardware-dependent.

    (Add to "Section 2.X.6, Program Options" of the NV_gpu_program4 extension,
     as extended by NV_gpu_program5 and NV_compute_program5)


    + Shader thread group (NV_shader_thread_group)

    If a program specifies the "NV_shader_thread_group" option, it may use the
    "invocation.threadid", "invocation.threadeqmask",
    "invocation.threadltmask", "invocation.threadlemask",
    "invocation.threadgtmask", "invocation.threadgemask", "invocation.warpid",
    "invocation.smid", "state.thread.warpsize", "state.thread.warpspersm" and
    "state.thread.smcount" bindings.  It may also use the "TGBALLOT"
    instruction.  If this option is not specified, a program will fail to
    compile if it uses those bindings.


Errors

    None.

New State

    None.

New Implementation Dependent State

                                                             Minimum
    Get Value                         Type  Get Command       Value   Description           Sec.   Attrib
    --------------------------------  ----  ---------------  -------  --------------------- ------ ------
    WARP_SIZE_NV                       Z+   GetIntegerv        1       total number of      2.X.3.3  -
                                                                       threads in a warp.

    WARPS_PER_SM_NV                    Z+   GetIntegerv        1       maximum number of    2.X.3.3  -
                                                                       warps executing on
                                                                       an SM.

    SM_COUNT_NV                        Z+   GetIntegerv        1       number of SMs on the 2.X.3.3  -
                                                                       GPU.


Issues

    None.


Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
     4     7/21/15  jbreton    Update the layout of threads within a quad for
                               window and framebuffer object rendering.
     3     2/14/14  jbreton    Rename the extension from NVX to NV.
     2      9/4/13  jbreton    Add helperThread attribute binding.
     1    12/19/12  jbreton    Internal revisions.