Name

    NV_shader_thread_group

Name Strings

    GL_NV_shader_thread_group

Contributors

    Jeannot Breton, NVIDIA
    Pat Brown, NVIDIA
    Eric Werness, NVIDIA
    Mark Kilgard, NVIDIA

Contact

    Jeannot Breton, NVIDIA Corporation (jbreton 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified Date:      7/21/2015
    NVIDIA Revision:         4

Number

    OpenGL Extension #447

Dependencies

    This extension is written against the OpenGL 4.3 (Compatibility Profile)
    Specification.

    This extension is written against version 4.30 (revision 07) of the OpenGL
    Shading Language Specification.

    OpenGL 4.3 and GLSL 4.3 are required.

    This extension interacts with NV_gpu_program5.

    This extension interacts with NV_compute_program5.

    This extension interacts with NV_tessellation_program5.

Overview

    Implementations of the OpenGL Shading Language may, but are not required
    to, run multiple shader threads for a single stage as a SIMD thread group,
    where individual execution threads are assigned to thread groups in an
    undefined, implementation-dependent order. This extension provides a set
    of new features to the OpenGL Shading Language to query thread states and
    to share data between fragments within a 2x2 pixel quad.

    More specifically, the following functionality is added:

    * New uniform variables and tokens to query the number of threads in a
      warp, the number of warps running on an SM and the number of SMs on the
      GPU.

    * New shader inputs to query the thread id, the warp id and the SM id.

    * New shader inputs to query if a fragment shader thread is a helper
      thread.

    * New shader built-in functions to query the state of a Boolean condition
      over all threads in a thread group.

    * New shader built-in functions to query which threads are active within
      a thread group.

    * New fragment shader built-in functions to share data between fragments
      within a 2x2 pixel quad.

    Shaders using the new functionalities provided by this extension should
    enable this functionality via the construct

      #extension GL_NV_shader_thread_group : require     (or enable)

    This extension also specifies some modifications to the program assembly
    language to support the thread state query and thread data sharing
    functionalities.

    Note that in this extension specification warp and thread group have the
    same meaning. A warp is a group of threads that get executed in lockstep.
    Each thread in a warp executes the same instruction of a program, but on
    different data.

New Procedures and Functions

    None


New Tokens

    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
    GetFloatv, and GetDoublev:

        WARP_SIZE_NV                                    0x9339
        WARPS_PER_SM_NV                                 0x933A
        SM_COUNT_NV                                     0x933B


Modifications to The OpenGL Shading Language Specification, Version 4.30
(Revision 07)

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_NV_shader_thread_group : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_NV_shader_thread_group                1

    Modify Section 7.1, Built-In Language Variables, p. 110

    (Add to the list of built-in variables for the compute, vertex, geometry,
    tessellation control, tessellation evaluation and fragment languages)

        in uint  gl_ThreadInWarpNV;
        in uint  gl_ThreadEqMaskNV;
        in uint  gl_ThreadGeMaskNV;
        in uint  gl_ThreadGtMaskNV;
        in uint  gl_ThreadLeMaskNV;
        in uint  gl_ThreadLtMaskNV;
        in uint  gl_WarpIDNV;
        in uint  gl_SMIDNV;

    (Add to the list of built-in variables for the fragment language)

        in bool  gl_HelperThreadNV;

    (Add these paragraphs at the end of this section)

    The variable gl_ThreadInWarpNV holds the id of the thread within the
    thread group (or warp). This variable is in the range 0 to
    gl_WarpSizeNV-1, where gl_WarpSizeNV is the total number of threads in a
    warp.

    The variable gl_ThreadEqMaskNV is a bitfield in which the bit equal to the
    current thread id is set. The variable gl_ThreadGeMaskNV is a bitfield in
    which bits greater or equal to the current thread id are set. The variable
    gl_ThreadGtMaskNV is a bitfield in which bits greater than the current
    thread id are set. The variable gl_ThreadLeMaskNV is a bitfield in which
    bits lower or equal to the current thread id are set.
    The variable gl_ThreadLtMaskNV is a bitfield in which bits lower than the
    current thread id are set.

    The values of gl_ThreadEqMaskNV, gl_ThreadGeMaskNV, gl_ThreadGtMaskNV,
    gl_ThreadLeMaskNV and gl_ThreadLtMaskNV are derived from the value of
    gl_ThreadInWarpNV using simple bit-shift arithmetic; they don't take into
    account the value of the thread group active mask. For example, if the
    application wants a bitfield in which bits lower or equal to the current
    thread id are set only for active threads, the result of gl_ThreadLeMaskNV
    will need to be ANDed with the thread group active mask.

    The variable gl_WarpIDNV holds the warp id of the executing thread. This
    variable is in the range 0 to gl_WarpsPerSMNV-1, where gl_WarpsPerSMNV is
    the maximum number of warps executing on an SM.

    The variable gl_SMIDNV holds the SM id of the executing thread. This
    variable is in the range 0 to gl_SMCountNV-1, where gl_SMCountNV is the
    number of SMs on the GPU.

    The variable gl_HelperThreadNV specifies if the current thread is a helper
    thread. In implementations supporting this extension, fragment shader
    invocations may be arranged in SIMD thread groups of 2x2 fragments called
    "quads".
    When a fragment shader instruction is executed on a quad, it's
    possible that some fragments within the quad will execute the instruction
    even if they are not covered by the primitive. Those threads are called
    helper threads. Their outputs will be discarded and they will not execute
    global store functions, but the intermediate values they compute can still
    be used by thread group sharing functions or by fragment derivative
    functions like dFdx and dFdy.


    Modify Section 7.4, Built-In Uniform State, p. 125

    (Add to the list of built-in uniform variable declarations)

        uniform uint  gl_WarpSizeNV;
        uniform uint  gl_WarpsPerSMNV;
        uniform uint  gl_SMCountNV;

    (Add this paragraph at the end of this section)

    The variable gl_WarpSizeNV is the total number of threads in a warp. The
    variable gl_WarpsPerSMNV is the maximum number of warps executing on an
    SM. The variable gl_SMCountNV is the number of SMs on the GPU.


    Modify Section 8.3, Common Functions, p. 133

    (add a function to query which threads are active within a thread group)

    Syntax:

        uint activeThreadsNV(void)

    In the value returned by activeThreadsNV(), bit <N> is set to 1 if the
    corresponding thread in the SIMD thread group is executing the call to
    activeThreadsNV() and 0 otherwise. A bit in the return value may be set
    to zero due to conditional flow control (e.g., returning from a function,
    executing the "else" part of an "if" statement) or because the SIMD thread
    group was dispatched without a full collection of threads.

    (add a function to query the state of a Boolean condition over all the
    threads in a thread group)

    Syntax:

        uint ballotThreadNV(bool value)

    The function ballotThreadNV() computes a 32-bit bitfield. It looks at the
    condition <value> for each active thread of a thread group and sets to 1
    each bit for which the condition in the corresponding thread is true. Bits
    for threads with a false condition are set to 0. Bits for inactive threads
    are also set to 0. It's possible to query the active thread mask by
    calling the function activeThreadsNV().
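
    The mask variables and ballotThreadNV() described above can be modeled
    on the CPU as a rough illustration. The sketch below is not part of the
    extension: the names thread_masks, ballot and rank_below are
    hypothetical, the warp is assumed to hold 32 threads (matching the
    32-bit bitfields), and the exact shift expressions are one plausible
    reading of "simple bit-shift arithmetic".

```c
#include <assert.h>
#include <stdint.h>

/* CPU-side model only. thread_in_warp stands in for gl_ThreadInWarpNV;
   a 32-thread warp is assumed. All names here are hypothetical. */
typedef struct {
    uint32_t eq, ge, gt, le, lt;
} ThreadMasks;

static ThreadMasks thread_masks(uint32_t thread_in_warp)
{
    assert(thread_in_warp < 32u);
    ThreadMasks m;
    m.eq = UINT32_C(1) << thread_in_warp;  /* only this thread's bit   */
    m.lt = m.eq - 1u;                      /* every bit below it       */
    m.le = m.lt | m.eq;                    /* below or equal           */
    m.gt = ~m.le;                          /* strictly above           */
    m.ge = ~m.lt;                          /* above or equal           */
    return m;
}

/* Count the set bits of v (clears the lowest set bit each iteration). */
static unsigned popcount32(uint32_t v)
{
    unsigned n = 0;
    while (v) {
        v &= v - 1u;
        ++n;
    }
    return n;
}

/* Model of ballotThreadNV(): cond[i] is the condition seen by thread i,
   active is the mask activeThreadsNV() would return. Bits of inactive
   threads stay 0 regardless of cond[i]. */
static uint32_t ballot(const int cond[32], uint32_t active)
{
    uint32_t bits = 0;
    for (unsigned i = 0; i < 32u; ++i) {
        if (((active >> i) & 1u) && cond[i])
            bits |= UINT32_C(1) << i;
    }
    return bits;
}

/* How many lower-numbered threads voted true: ballot ANDed with LtMask. */
static unsigned rank_below(uint32_t ballot_bits, uint32_t thread_in_warp)
{
    return popcount32(ballot_bits & thread_masks(thread_in_warp).lt);
}
```

    A typical use of this pattern is stream compaction: each thread that
    votes true takes rank_below() as its slot in a packed output. Because
    the masks ignore the active mask, restricting gl_ThreadLeMaskNV to
    active threads corresponds to thread_masks(id).le & active here.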

    (add a function to share data between fragments in a quad)

    Syntax:

        float quadSwizzle0NV(float swizzledValue, [float unswizzledValue])
        vec2  quadSwizzle0NV(vec2  swizzledValue, [vec2  unswizzledValue])
        vec3  quadSwizzle0NV(vec3  swizzledValue, [vec3  unswizzledValue])
        vec4  quadSwizzle0NV(vec4  swizzledValue, [vec4  unswizzledValue])

        float quadSwizzle1NV(float swizzledValue, [float unswizzledValue])
        vec2  quadSwizzle1NV(vec2  swizzledValue, [vec2  unswizzledValue])
        vec3  quadSwizzle1NV(vec3  swizzledValue, [vec3  unswizzledValue])
        vec4  quadSwizzle1NV(vec4  swizzledValue, [vec4  unswizzledValue])

        float quadSwizzle2NV(float swizzledValue, [float unswizzledValue])
        vec2  quadSwizzle2NV(vec2  swizzledValue, [vec2  unswizzledValue])
        vec3  quadSwizzle2NV(vec3  swizzledValue, [vec3  unswizzledValue])
        vec4  quadSwizzle2NV(vec4  swizzledValue, [vec4  unswizzledValue])

        float quadSwizzle3NV(float swizzledValue, [float unswizzledValue])
        vec2  quadSwizzle3NV(vec2  swizzledValue, [vec2  unswizzledValue])
        vec3  quadSwizzle3NV(vec3  swizzledValue, [vec3  unswizzledValue])
        vec4  quadSwizzle3NV(vec4  swizzledValue, [vec4  unswizzledValue])

        float quadSwizzleXNV(float swizzledValue, [float unswizzledValue])
        vec2  quadSwizzleXNV(vec2  swizzledValue, [vec2  unswizzledValue])
        vec3  quadSwizzleXNV(vec3  swizzledValue, [vec3  unswizzledValue])
        vec4  quadSwizzleXNV(vec4  swizzledValue, [vec4  unswizzledValue])

        float quadSwizzleYNV(float swizzledValue, [float unswizzledValue])
        vec2  quadSwizzleYNV(vec2  swizzledValue, [vec2  unswizzledValue])
        vec3  quadSwizzleYNV(vec3  swizzledValue, [vec3  unswizzledValue])
        vec4  quadSwizzleYNV(vec4  swizzledValue, [vec4  unswizzledValue])

    In implementations supporting this extension, if a primitive covers a
    fragment at (x,y), its fragment shader invocation will be arranged in a
    SIMD thread group with fragment shader invocations corresponding to three
    neighboring pixels. These four invocations are arranged in a 2x2 grid,
    called a "quad". If the neighbors of a fragment are not covered by the
    primitive, fragment shader invocations will still be generated. The
    implementation may compute differences between values in these threads to
    estimate derivatives for dFdx(), dFdy(), and for texture lookups with
    automatic LOD calculations.

    Fragments may have different locations in the quads based on the type of
    render target.

    When rendering to a window, fragments within a quad follow this pattern:

         ---------------------------------------------------
        | gl_ThreadInWarpNV 4N+0  | gl_ThreadInWarpNV 4N+1  |
        | pixel (X+0,Y+1)         | pixel (X+1,Y+1)         |
         ---------------------------------------------------
        | gl_ThreadInWarpNV 4N+2  | gl_ThreadInWarpNV 4N+3  |
        | pixel (X+0,Y+0)         | pixel (X+1,Y+0)         |
         ---------------------------------------------------


    When rendering to a framebuffer object, fragments within a quad follow this
    pattern:

         ---------------------------------------------------
        | gl_ThreadInWarpNV 4N+2  | gl_ThreadInWarpNV 4N+3  |
        | pixel (X+0,Y+1)         | pixel (X+1,Y+1)         |
         ---------------------------------------------------
        | gl_ThreadInWarpNV 4N+0  | gl_ThreadInWarpNV 4N+1  |
        | pixel (X+0,Y+0)         | pixel (X+1,Y+0)         |
         ---------------------------------------------------

    There are 6 quadSwizzle functions that allow fragments within a quad to
    exchange data. All those functions will read a floating point
    operand <swizzledValue>, which can come from any fragment in the quad.
    Another optional floating point operand <unswizzledValue>, which comes from
    the current fragment, can be added to <swizzledValue>.
    The only difference between all those quadSwizzle functions is the
    location where they get the <swizzledValue> operand within the 2x2 pixel
    quad.

    quadSwizzle0NV will read the <swizzledValue> operand from the fragment 0:

        result[thread N] = swizzledValue[thread 0] + unswizzledValue[thread N]


    quadSwizzle1NV will read the <swizzledValue> operand from the fragment 1:

        result[thread N] = swizzledValue[thread 1] + unswizzledValue[thread N]


    quadSwizzle2NV will read the <swizzledValue> operand from the fragment 2:

        result[thread N] = swizzledValue[thread 2] + unswizzledValue[thread N]


    quadSwizzle3NV will read the <swizzledValue> operand from the fragment 3:

        result[thread N] = swizzledValue[thread 3] + unswizzledValue[thread N]


    quadSwizzleXNV will read the <swizzledValue> operand for each fragment
    from its neighbor in X:

        result[thread 0] = swizzledValue[thread 1] + unswizzledValue[thread 0]
        result[thread 1] = swizzledValue[thread 0] + unswizzledValue[thread 1]
        result[thread 2] = swizzledValue[thread 3] + unswizzledValue[thread 2]
        result[thread 3] = swizzledValue[thread 2] + unswizzledValue[thread 3]


    quadSwizzleYNV will read the <swizzledValue> operand for each fragment
    from its neighbor in Y:

        result[thread 0] = swizzledValue[thread 2] + unswizzledValue[thread 0]
        result[thread 1] = swizzledValue[thread 3] + unswizzledValue[thread 1]
        result[thread 2] = swizzledValue[thread 0] + unswizzledValue[thread 2]
        result[thread 3] = swizzledValue[thread 1] + unswizzledValue[thread 3]


    If any thread in a 2x2 pixel quad is inactive, the quad is divergent. In
    this case quadSwizzle will return 0 for all fragments in the quad.


Dependencies on NV_gpu_program5

    If NV_gpu_program5 is supported and "OPTION NV_shader_thread_group" is
    specified in an assembly program, the following edits are made to extend
    the assembly programming model documented in the NV_gpu_program4 extension
    and extended by NV_gpu_program5.

    If NV_gpu_program5 is not supported, or if "OPTION NV_shader_thread_group"
    is not specified in an assembly program, the contents of this dependencies
    section should be ignored.
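
    The six quadSwizzle result tables in the GLSL section can be summarized
    by a small CPU-side model. This is only a sketch for the scalar case:
    QuadSwizzle, quad_source and quad_swizzle are hypothetical names,
    thread indices 0..3 denote a thread's position within its quad
    (gl_ThreadInWarpNV modulo 4), and an omitted <unswizzledValue> is
    modeled by passing zeros.

```c
#include <assert.h>

/* Hypothetical selector for the six quadSwizzle variants. */
typedef enum { QSWZ_0, QSWZ_1, QSWZ_2, QSWZ_3, QSWZ_X, QSWZ_Y } QuadSwizzle;

/* Index (0..3) of the quad thread that supplies swizzledValue to thread n. */
static unsigned quad_source(QuadSwizzle mode, unsigned n)
{
    assert(n < 4u);
    switch (mode) {
    case QSWZ_0: return 0u;
    case QSWZ_1: return 1u;
    case QSWZ_2: return 2u;
    case QSWZ_3: return 3u;
    case QSWZ_X: return n ^ 1u;  /* neighbor in X: 0<->1, 2<->3 */
    case QSWZ_Y: return n ^ 2u;  /* neighbor in Y: 0<->2, 1<->3 */
    }
    return n;
}

/* Model of one quadSwizzle call over a whole quad.
   active_mask has bit n set when quad thread n is active. If any of the
   four threads is inactive the quad is divergent and every result is 0. */
static void quad_swizzle(QuadSwizzle mode,
                         const float swizzled[4],
                         const float unswizzled[4],
                         unsigned active_mask,
                         float result[4])
{
    int divergent = (active_mask & 0xFu) != 0xFu;
    for (unsigned n = 0; n < 4u; ++n) {
        if (divergent)
            result[n] = 0.0f;
        else
            result[n] = swizzled[quad_source(mode, n)] + unswizzled[n];
    }
}
```

    Expressing the X and Y variants as XOR with 1 and 2 is a design choice
    of this sketch; it reproduces the four result lines listed for
    quadSwizzleXNV and quadSwizzleYNV.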

    Modify Section 2.X.2, Program Grammar

    (add the following rules to the NV_gpu_program4 and
    NV_gpu_program5 base grammars)

    <VECTORop>              ::= "TGBALLOT"

    <stateSingleItem>       ::= "state" "." <stateThreadItem>

    <stateThreadItem>       ::= "thread" "." <stateThreadProperty>

    <stateThreadProperty>   ::= "warpsize"
                              | "warpspersm"
                              | "smcount"

    (add/change the following rules to the NV_fragment_program4 and
    NV_gpu_program5 base grammars)

    <VECTORop>              ::= "QSWZ0"
                              | "QSWZ1"
                              | "QSWZ2"
                              | "QSWZ3"
                              | "QSWZX"
                              | "QSWZY"

    <attribBasic>           ::= <fragPrefix> "threadid"
                              | <fragPrefix> "threadeqmask"
                              | <fragPrefix> "threadltmask"
                              | <fragPrefix> "threadlemask"
                              | <fragPrefix> "threadgtmask"
                              | <fragPrefix> "threadgemask"
                              | <fragPrefix> "warpid"
                              | <fragPrefix> "smid"
                              | <fragPrefix> "helperthread"

    (add/change the following rules to the NV_vertex_program4 and
    NV_gpu_program5 base grammars)

    <attribBasic>           ::= <vtxPrefix> "threadid"
                              | <vtxPrefix> "threadeqmask"
                              | <vtxPrefix> "threadltmask"
                              | <vtxPrefix> "threadlemask"
                              | <vtxPrefix> "threadgtmask"
                              | <vtxPrefix> "threadgemask"
                              | <vtxPrefix> "warpid"
                              | <vtxPrefix> "smid"

    (add/change the following rules to the NV_geometry_program4 and
    NV_gpu_program5 base grammars)

    <attribBasic>           ::= <primPrefix> "threadid"
                              | <primPrefix> "threadeqmask"
                              | <primPrefix> "threadltmask"
                              | <primPrefix> "threadlemask"
                              | <primPrefix> "threadgtmask"
                              | <primPrefix> "threadgemask"
                              | <primPrefix> "warpid"
                              | <primPrefix> "smid"

    Modify Section 2.X.3.2 of the NV_gpu_program4 specification, Program
    Attribute Variables.

    (Add the table entries and relevant text describing the fragment program
    input variables used to query thread states.)

      Fragment Attribute Binding  Components  Underlying State
      --------------------------  ----------  ----------------------------
      ...
      fragment.threadid           (id,-,-,-)  id of the current thread
      fragment.threadeqmask       (m,-,-,-)   mask with the current thread
      fragment.threadltmask       (m,-,-,-)   mask with lower thread
      fragment.threadlemask       (m,-,-,-)   mask with lower or equal thread
      fragment.threadgtmask       (m,-,-,-)   mask with greater thread
      fragment.threadgemask       (m,-,-,-)   mask with greater or equal thread
      fragment.warpid             (id,-,-,-)  warp id of the current thread
      fragment.smid               (id,-,-,-)  SM id of the current thread
      fragment.helperthread       (k,-,-,-)   current thread is a helper thread
      ...

    If a fragment attribute binding matches "fragment.threadid", the "x"
    component is filled with the thread id of the current thread. The thread
    id is an unsigned integer in the range 0 to 31.

    If a fragment attribute binding matches "fragment.threadeqmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bit equal to the current thread id is set.

    If a fragment attribute binding matches "fragment.threadltmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which bits
    lower than the current thread id are set.

    If a fragment attribute binding matches "fragment.threadlemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which bits
    lower or equal to the current thread id are set.

    If a fragment attribute binding matches "fragment.threadgtmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which bits
    greater than the current thread id are set.

    If a fragment attribute binding matches "fragment.threadgemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which bits
    greater or equal to the current thread id are set.

    If a fragment attribute binding matches "fragment.warpid", the "x"
    component is filled with the warp id of the current thread. The warp id is
    an unsigned integer; the range of this value is hardware dependent.

    If a fragment attribute binding matches "fragment.smid", the "x" component
    is filled with the SM id of the current thread. The SM id is an unsigned
    integer; the range of this value is hardware dependent.

    If a fragment attribute binding matches "fragment.helperthread", the "x"
    component is an integer value equal to -1 when the current thread is a
    helper thread and 0 otherwise.
    In implementations supporting this extension, fragment program
    invocations may be arranged in SIMD thread groups of 2x2 fragments called
    "quads". When a fragment program instruction is executed on a quad, it's
    possible that some fragments within the quad will execute the instruction
    even if they are not covered by the primitive. Those threads are called
    helper threads. Their outputs will be discarded and they will not execute
    global store instructions, but the intermediate values they compute can
    still be used by thread group sharing instructions or by fragment
    derivative instructions like DDX and DDY.

    (Add the table entries and relevant text describing the vertex program
    attribute variables used to query thread states.)

      Vertex Attribute Binding  Components  Underlying State
      ------------------------  ----------  ----------------------------
      ...
      vertex.threadid           (id,-,-,-)  id of the current thread
      vertex.threadeqmask       (m,-,-,-)   mask with the current thread
      vertex.threadltmask       (m,-,-,-)   mask with lower thread
      vertex.threadlemask       (m,-,-,-)   mask with lower or equal thread
      vertex.threadgtmask       (m,-,-,-)   mask with greater thread
      vertex.threadgemask       (m,-,-,-)   mask with greater or equal thread
      vertex.warpid             (id,-,-,-)  warp id of the current thread
      vertex.smid               (id,-,-,-)  SM id of the current thread
      ...

    If a vertex attribute binding matches "vertex.threadid", the "x" component
    is filled with the thread id of the current thread. The thread id is an
    unsigned integer in the range 0 to 31.

    If a vertex attribute binding matches "vertex.threadeqmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bit equal to the current thread id is set.

    If a vertex attribute binding matches "vertex.threadltmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which bits
    lower than the current thread id are set.

    If a vertex attribute binding matches "vertex.threadlemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which bits
    lower or equal to the current thread id are set.

    If a vertex attribute binding matches "vertex.threadgtmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which bits
    greater than the current thread id are set.

    If a vertex attribute binding matches "vertex.threadgemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which bits
    greater or equal to the current thread id are set.

    If a vertex attribute binding matches "vertex.warpid", the "x" component is
    filled with the warp id of the current thread.
    The warp id is an unsigned integer; the range of this value is hardware
    dependent.

    If a vertex attribute binding matches "vertex.smid", the "x" component
    is filled with the SM id of the current thread. The SM id is an unsigned
    integer; the range of this value is hardware dependent.


    (Add the table entries and relevant text describing the geometry program
    attribute variables used to query thread states.)

      Geometry Attribute Binding  Components  Underlying State
      --------------------------  ----------  ----------------------------
      ...
      primitive.threadid          (id,-,-,-)  id of the current thread
      primitive.threadeqmask      (m,-,-,-)   mask with the current thread
      primitive.threadltmask      (m,-,-,-)   mask with lower thread
      primitive.threadlemask      (m,-,-,-)   mask with lower or equal thread
      primitive.threadgtmask      (m,-,-,-)   mask with greater thread
      primitive.threadgemask      (m,-,-,-)   mask with greater or equal thread
      primitive.warpid            (id,-,-,-)  warp id of the current thread
      primitive.smid              (id,-,-,-)  SM id of the current thread
      ...

    If a geometry attribute binding matches "primitive.threadid", the "x"
    component is filled with the thread id of the current thread. The thread
    id is an unsigned integer in the range 0 to 31.

    If a geometry attribute binding matches "primitive.threadeqmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bit whose index equals the current thread id is set.

    If a geometry attribute binding matches "primitive.threadltmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which all
    bits lower than the current thread id are set.

    If a geometry attribute binding matches "primitive.threadlemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which all
    bits lower than or equal to the current thread id are set.

    If a geometry attribute binding matches "primitive.threadgtmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which all
    bits greater than the current thread id are set.

    If a geometry attribute binding matches "primitive.threadgemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which all
    bits greater than or equal to the current thread id are set.

    If a geometry attribute binding matches "primitive.warpid", the "x"
    component is filled with the warp id of the current thread. The warp id is
    an unsigned integer; the range of this value is hardware dependent.

    If a geometry attribute binding matches "primitive.smid", the "x" component
    is filled with the SM id of the current thread. The SM id is an unsigned
    integer; the range of this value is hardware dependent.


    (add the following subsection to section 2.X.3.3, Parameters)

    Thread Group Property Bindings

      Binding                        Components  Underlying State
      -----------------------------  ----------  ----------------------------
      state.thread.warpsize          (x,-,-,-)   total number of threads in a
                                                 warp
      state.thread.warpspersm        (x,-,-,-)   maximum number of warps
                                                 executing on an SM
      state.thread.smcount           (x,-,-,-)   number of SMs on the GPU

    If a program parameter binding matches "state.thread.warpsize", the "x"
    component of the program parameter variable is filled with an integer value
    indicating the total number of threads in a warp. The "y", "z", and "w"
    components are undefined.

    If a program parameter binding matches "state.thread.warpspersm", the "x"
    component of the program parameter variable is filled with an integer value
    indicating the maximum number of warps executing on an SM. The "y", "z",
    and "w" components are undefined.

    If a program parameter binding matches "state.thread.smcount", the "x"
    component of the program parameter variable is filled with an integer value
    indicating the number of SMs on the GPU. The "y", "z", and "w" components
    are undefined.


    Modify Section 2.X.4, Program Execution Environment

    (Add the table entries and relevant text describing the program
    instruction to query thread conditions.)

      Instr-      Modifiers
      uction  V  F I C S H D  Out Inputs    Description
      ------- -- - - - - - -  --- --------  --------------------------------
      ...
      TGBALLOT 50 X X X X - - F vu v        query a boolean in thread group
      ...


    (Add the table entries and relevant text describing the fragment program
    instructions to exchange data between threads.)

      Instr-      Modifiers
      uction  V  F I C S H D  Out Inputs    Description
      ------- -- - - - - - -  --- --------  --------------------------------
      ...
      QSWZ0    50 X - - - - - F v v,v       add fragment 0 in a quad
      QSWZ1    50 X - - - - - F v v,v       add fragment 1 in a quad
      QSWZ2    50 X - - - - - F v v,v       add fragment 2 in a quad
      QSWZ3    50 X - - - - - F v v,v       add fragment 3 in a quad
      QSWZX    50 X - - - - - F v v,v       add fragments horizontally
      QSWZY    50 X - - - - - F v v,v       add fragments vertically
      ...


    (Add to "Section 2.X.6, Program Options" of the NV_gpu_program4 extension,
    as extended by NV_gpu_program5)

    + Shader thread group (NV_shader_thread_group)

    If a fragment program specifies the "NV_shader_thread_group" option, it
    may use the "fragment.threadid", "fragment.threadeqmask",
    "fragment.threadltmask", "fragment.threadlemask", "fragment.threadgtmask",
    "fragment.threadgemask", "fragment.warpid", "fragment.smid",
    "fragment.helperthread", "state.thread.warpsize", "state.thread.warpspersm"
    and "state.thread.smcount" bindings. It may also use the "TGBALLOT",
    "QSWZ0", "QSWZ1", "QSWZ2", "QSWZ3", "QSWZX" and "QSWZY" instructions. If
    this option is not specified, a program will fail to compile if it uses
    those instructions or bindings.

    If a vertex program specifies the "NV_shader_thread_group" option, it may
    use the "vertex.threadid", "vertex.threadeqmask", "vertex.threadltmask",
    "vertex.threadlemask", "vertex.threadgtmask", "vertex.threadgemask",
    "vertex.warpid", "vertex.smid", "state.thread.warpsize",
    "state.thread.warpspersm" and "state.thread.smcount" bindings. It may also
    use the "TGBALLOT" instruction. If this option is not specified, a program
    will fail to compile if it uses those instructions or bindings.

    If a geometry program specifies the "NV_shader_thread_group" option, it
    may use the "primitive.threadid", "primitive.threadeqmask",
    "primitive.threadltmask", "primitive.threadlemask",
    "primitive.threadgtmask", "primitive.threadgemask", "primitive.warpid",
    "primitive.smid", "state.thread.warpsize", "state.thread.warpspersm" and
    "state.thread.smcount" bindings. It may also use the "TGBALLOT"
    instruction. If this option is not specified, a program will fail to
    compile if it uses those instructions or bindings.

    Section 2.X.8.Z, QSWZ0: add fragment 0 data to all fragments in a quad

    The QSWZ0 instruction produces a floating point result by adding the
    first operand, a floating point value from fragment 0, to the second
    operand, another floating point value from the current fragment.
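
    The addition described above can be emulated on the CPU for a single
    quad. The following is only an illustrative sketch, not part of the
    extension: the helper name qswz_fixed, the array representation of a
    quad and the "src" parameter (0 for QSWZ0, 1 for QSWZ1, and so on) are
    assumptions introduced here.

```c
#include <assert.h>

/* Number of fragments in a 2x2 quad. */
#define QUAD_SIZE 4

/*
 * Hypothetical CPU-side emulation of QSWZ0..QSWZ3 for one quad.
 * swizzled[i] and unswizzled[i] hold the first and second operand of
 * fragment i. src selects the fragment whose first operand is read by
 * every fragment in the quad (0 for QSWZ0, 1 for QSWZ1, ...).
 */
void qswz_fixed(int src, const float swizzled[QUAD_SIZE],
                const float unswizzled[QUAD_SIZE],
                float result[QUAD_SIZE])
{
    assert(src >= 0 && src < QUAD_SIZE);
    for (int i = 0; i < QUAD_SIZE; ++i) {
        /* First operand comes from fragment src, second from fragment i. */
        result[i] = swizzled[src] + unswizzled[i];
    }
}
```

    With src set to 0, every fragment receives the first operand of fragment
    0 plus its own second operand, which is the behavior described for QSWZ0.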

    quadSwizzle0NV is the GLSL function that implements the same functionality
    as the QSWZ0 assembly instruction. Section 8.3 of the OpenGL Shading
    Language Specification has more detail about the implementation of
    quadSwizzle0NV. This additional information also applies to QSWZ0.


    Section 2.X.8.Z, QSWZ1: add fragment 1 data to all fragments in a quad

    The QSWZ1 instruction produces a floating point result by adding the
    first operand, a floating point value from fragment 1, to the second
    operand, another floating point value from the current fragment.

    quadSwizzle1NV is the GLSL function that implements the same functionality
    as the QSWZ1 assembly instruction. Section 8.3 of the OpenGL Shading
    Language Specification has more detail about the implementation of
    quadSwizzle1NV. This additional information also applies to QSWZ1.


    Section 2.X.8.Z, QSWZ2: add fragment 2 data to all fragments in a quad

    The QSWZ2 instruction produces a floating point result by adding the
    first operand, a floating point value from fragment 2, to the second
    operand, another floating point value from the current fragment.

    quadSwizzle2NV is the GLSL function that implements the same functionality
    as the QSWZ2 assembly instruction.
    Section 8.3 of the OpenGL Shading Language Specification has more detail
    about the implementation of quadSwizzle2NV. This additional information
    also applies to QSWZ2.


    Section 2.X.8.Z, QSWZ3: add fragment 3 data to all fragments in a quad

    The QSWZ3 instruction produces a floating point result by adding the
    first operand, a floating point value from fragment 3, to the second
    operand, another floating point value from the current fragment.

    quadSwizzle3NV is the GLSL function that implements the same functionality
    as the QSWZ3 assembly instruction. Section 8.3 of the OpenGL Shading
    Language Specification has more detail about the implementation of
    quadSwizzle3NV. This additional information also applies to QSWZ3.


    Section 2.X.8.Z, QSWZX: add fragments in a quad horizontally

    The QSWZX instruction produces a floating point result by adding the
    first operand, a floating point value from the fragment neighbor in X to
    the current fragment, to the second operand, another floating point value
    from the current fragment.

    quadSwizzleXNV is the GLSL function that implements the same functionality
    as the QSWZX assembly instruction. Section 8.3 of the OpenGL Shading
    Language Specification has more detail about the implementation of
    quadSwizzleXNV.
    This additional information also applies to QSWZX.


    Section 2.X.8.Z, QSWZY: add fragments in a quad vertically

    The QSWZY instruction produces a floating point result by adding the
    first operand, a floating point value from the fragment neighbor in Y to
    the current fragment, to the second operand, another floating point value
    from the current fragment.

    quadSwizzleYNV is the GLSL function that implements the same functionality
    as the QSWZY assembly instruction. Section 8.3 of the OpenGL Shading
    Language Specification has more detail about the implementation of
    quadSwizzleYNV. This additional information also applies to QSWZY.


    Section 2.X.8.Z, TGBALLOT: query a boolean condition over a thread group

    The TGBALLOT instruction produces a result vector by reading a vector
    operand for each active thread in the current thread group and comparing
    each component to zero. Each result vector component contains an integer
    bitmask (described below) in which a bit is set if the corresponding
    component of the operand vector is non-zero for the corresponding thread,
    and not set otherwise.
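
    The per-component behavior can be emulated on the CPU. The following is a
    minimal sketch under stated assumptions, not part of the extension: the
    helper name tgballot_component, the integer operand type, the explicit
    active_mask parameter and the group size of 32 (chosen to match thread
    ids 0 to 31) are all introduced here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed thread group size, matching thread ids in the range 0..31. */
#define GROUP_SIZE 32

/*
 * Hypothetical emulation of one component of TGBALLOT.
 * value[t] is the operand component held by thread t. Bit t of
 * active_mask is set when thread t is active and executes the
 * instruction. Bits of inactive threads stay 0 in the result.
 */
uint32_t tgballot_component(const int32_t value[GROUP_SIZE],
                            uint32_t active_mask)
{
    uint32_t result = 0;
    for (unsigned t = 0; t < GROUP_SIZE; ++t) {
        int active = (int)((active_mask >> t) & 1u);
        if (active && value[t] != 0) {
            result |= UINT32_C(1) << t;
        }
    }
    return result;
}
```

    When every thread supplies a non-zero value, the result equals
    active_mask, so the same call also reports which threads are active.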

    Sometimes, when the instruction is in a conditional control flow block or
    when it's not possible to completely fill a thread group, only a subset of
    the threads in the thread group will be active and will execute the
    TGBALLOT instruction. Each bit in the bitfield corresponding to inactive
    threads will be set to 0. It's possible to query the active thread mask
    by calling TGBALLOT with 1 as the first operand.

      tmp = VectorLoad(op0);
      result = { 0, 0, 0, 0 };
      for (all active threads) {
        if ([thread]tmp.x != 0) result.x |= 1 << thread;
        if ([thread]tmp.y != 0) result.y |= 1 << thread;
        if ([thread]tmp.z != 0) result.z |= 1 << thread;
        if ([thread]tmp.w != 0) result.w |= 1 << thread;
      }

Dependencies on NV_tessellation_program5

    If NV_tessellation_program5 is supported and
    "OPTION NV_shader_thread_group" is specified in an assembly program, the
    following edits are made to extend the assembly programming model
    documented in the NV_gpu_program4 extension and extended by NV_gpu_program5
    and NV_tessellation_program5.

    If NV_tessellation_program5 is not supported, or if
    "OPTION NV_shader_thread_group" is not specified in an assembly program,
    the contents of this dependencies section should be ignored.


    Modify Section 2.X.2, Program Grammar

    (add/change the following rules to the NV_gpu_program5 base grammars for
    tessellation control programs)

    <attribBasic>           ::= <primPrefix> "threadid"
                              | <primPrefix> "threadeqmask"
                              | <primPrefix> "threadltmask"
                              | <primPrefix> "threadlemask"
                              | <primPrefix> "threadgtmask"
                              | <primPrefix> "threadgemask"
                              | <primPrefix> "warpid"
                              | <primPrefix> "smid"

    (add/change the following rules to the NV_gpu_program5 base grammars for
    tessellation evaluation programs)

    <attribBasic>           ::= <primPrefix> "threadid"
                              | <primPrefix> "threadeqmask"
                              | <primPrefix> "threadltmask"
                              | <primPrefix> "threadlemask"
                              | <primPrefix> "threadgtmask"
                              | <primPrefix> "threadgemask"
                              | <primPrefix> "warpid"
                              | <primPrefix> "smid"


    Modify Section 2.X.3.2 of the NV_tessellation_program5 specification,
    Program Attribute Variables.

    (Add the table entries and relevant text describing the tessellation
    control and evaluation program attribute variables used to query thread
    states.)


      Primitive Binding Suffix    Components  Underlying State
      --------------------------  ----------  ----------------------------
      ...
      primitive.threadid          (id,-,-,-)  id of the current thread
      primitive.threadeqmask      (m,-,-,-)   mask with the current thread
      primitive.threadltmask      (m,-,-,-)   mask with lower thread
      primitive.threadlemask      (m,-,-,-)   mask with lower or equal thread
      primitive.threadgtmask      (m,-,-,-)   mask with greater thread
      primitive.threadgemask      (m,-,-,-)   mask with greater or equal thread
      primitive.warpid            (id,-,-,-)  warp id of the current thread
      primitive.smid              (id,-,-,-)  SM id of the current thread
      ...

    If an attribute binding matches "primitive.threadid", the "x" component is
    filled with the thread id of the current thread. The thread id is an
    unsigned integer in the range 0 to 31.

    If an attribute binding matches "primitive.threadeqmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bit whose index equals the current thread id is set.

    If an attribute binding matches "primitive.threadltmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which all
    bits lower than the current thread id are set.

    If an attribute binding matches "primitive.threadlemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which all
    bits lower than or equal to the current thread id are set.

    If an attribute binding matches "primitive.threadgtmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which all
    bits greater than the current thread id are set.

    If an attribute binding matches "primitive.threadgemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which all
    bits greater than or equal to the current thread id are set.

    If an attribute binding matches "primitive.warpid", the "x" component is
    filled with the warp id of the current thread. The warp id is an unsigned
    integer; the range of this value is hardware dependent.

    If an attribute binding matches "primitive.smid", the "x" component is
    filled with the SM id of the current thread. The SM id is an unsigned
    integer; the range of this value is hardware dependent.
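
    The five mask bindings described above are fully determined by the thread
    id. As an illustration only, the relationship can be written out on the
    CPU; the ThreadMasks structure and the thread_masks helper are
    hypothetical names introduced for this sketch and are not part of the
    extension.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the five 32-bit masks of one thread. */
typedef struct {
    uint32_t eq; /* only the bit at thread_id               */
    uint32_t lt; /* bits below thread_id                    */
    uint32_t le; /* bits below or at thread_id              */
    uint32_t gt; /* bits above thread_id                    */
    uint32_t ge; /* bits above or at thread_id              */
} ThreadMasks;

/* Derives all five masks from a thread id in the range 0..31. */
ThreadMasks thread_masks(uint32_t thread_id)
{
    ThreadMasks m;
    assert(thread_id < 32);
    m.eq = UINT32_C(1) << thread_id;
    m.lt = m.eq - 1;      /* all lower bits set            */
    m.le = m.lt | m.eq;
    m.gt = ~m.le;         /* complement of "lower or equal" */
    m.ge = ~m.lt;         /* complement of "lower"          */
    return m;
}
```

    For thread id 5, for example, this yields eq = 0x00000020,
    lt = 0x0000001F and ge = 0xFFFFFFE0.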

    (Add to "Section 2.X.6, Program Options" of the NV_gpu_program4 extension,
    as extended by NV_gpu_program5 and NV_tessellation_program5)

    + Shader thread group (NV_shader_thread_group)

    If a program specifies the "NV_shader_thread_group" option, it may use
    the "primitive.threadid", "primitive.threadeqmask",
    "primitive.threadltmask", "primitive.threadlemask",
    "primitive.threadgtmask", "primitive.threadgemask", "primitive.warpid",
    "primitive.smid", "state.thread.warpsize", "state.thread.warpspersm" and
    "state.thread.smcount" bindings. It may also use the "TGBALLOT"
    instruction. If this option is not specified, a program will fail to
    compile if it uses those instructions or bindings.


Dependencies on NV_compute_program5

    If NV_compute_program5 is supported and "OPTION NV_shader_thread_group" is
    specified in an assembly program, the following edits are made to extend
    the assembly programming model documented in the NV_gpu_program4 extension
    and extended by NV_gpu_program5 and NV_compute_program5.

    If NV_compute_program5 is not supported, or if
    "OPTION NV_shader_thread_group" is not specified in an assembly program,
    the contents of this dependencies section should be ignored.

    Section 2.X.2, Program Grammar

    (add the following rules to the grammar)

    <attribBasic>           ::= "invocation" "." "threadid"
                              | "invocation" "." "threadeqmask"
                              | "invocation" "." "threadltmask"
                              | "invocation" "." "threadlemask"
                              | "invocation" "." "threadgtmask"
                              | "invocation" "." "threadgemask"
                              | "invocation" "." "warpid"
                              | "invocation" "." "smid"

    Modify Section 2.X.3.2 of the NV_compute_program5 specification, Program
    Attribute Variables.

    (Add the table entries and relevant text describing the compute program
    input variables used to query thread states.)

      Attribute Binding           Components  Underlying State
      --------------------------  ----------  ----------------------------
      ...
      invocation.threadid         (id,-,-,-)  id of the current thread
      invocation.threadeqmask     (m,-,-,-)   mask with the current thread
      invocation.threadltmask     (m,-,-,-)   mask with lower thread
      invocation.threadlemask     (m,-,-,-)   mask with lower or equal thread
      invocation.threadgtmask     (m,-,-,-)   mask with greater thread
      invocation.threadgemask     (m,-,-,-)   mask with greater or equal thread
      invocation.warpid           (id,-,-,-)  warp id of the current thread
      invocation.smid             (id,-,-,-)  SM id of the current thread
      ...

    If a compute attribute binding matches "invocation.threadid", the "x"
    component is filled with the thread id of the current thread. The thread
    id is an unsigned integer in the range 0 to 31.

    If a compute attribute binding matches "invocation.threadeqmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which the
    bit whose index equals the current thread id is set.

    If a compute attribute binding matches "invocation.threadltmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which all
    bits lower than the current thread id are set.

    If a compute attribute binding matches "invocation.threadlemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which all
    bits lower than or equal to the current thread id are set.

    If a compute attribute binding matches "invocation.threadgtmask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which all
    bits greater than the current thread id are set.

    If a compute attribute binding matches "invocation.threadgemask", the "x"
    component is filled with a 32-bit unsigned integer bitfield in which all
    bits greater than or equal to the current thread id are set.

    If a compute attribute binding matches "invocation.warpid", the "x"
    component is filled with the warp id of the current thread. The warp id is
    an unsigned integer; the range of this value is hardware dependent.

    If a compute attribute binding matches "invocation.smid", the "x" component
    is filled with the SM id of the current thread. The SM id is an unsigned
    integer; the range of this value is hardware dependent.
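
    One possible use of these bindings, shown here purely as an illustration,
    is to combine a "lower than" mask with a TGBALLOT result so that each
    thread can count how many lower-numbered threads passed a condition. The
    sketch below emulates that arithmetic on the CPU; popcount32 and
    compact_index are hypothetical helper names, and the ballot value is
    assumed to have been produced elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count: number of set bits in x. */
unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1; /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/*
 * Hypothetical helper: given a ballot bitmask and a thread id in the
 * range 0..31, returns how many threads with a lower id have their
 * bit set in the ballot.
 */
unsigned compact_index(uint32_t ballot, uint32_t thread_id)
{
    uint32_t ltmask;
    assert(thread_id < 32);
    ltmask = (UINT32_C(1) << thread_id) - 1; /* same value as threadltmask */
    return popcount32(ballot & ltmask);
}
```

    With a ballot of 0x2D (threads 0, 2, 3 and 5 set), thread 3 obtains
    index 2 because threads 0 and 2 precede it.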

    (Add to "Section 2.X.6, Program Options" of the NV_gpu_program4 extension,
    as extended by NV_gpu_program5 and NV_compute_program5)


    + Shader thread group (NV_shader_thread_group)

    If a program specifies the "NV_shader_thread_group" option, it may use the
    "invocation.threadid", "invocation.threadeqmask",
    "invocation.threadltmask", "invocation.threadlemask",
    "invocation.threadgtmask", "invocation.threadgemask", "invocation.warpid",
    "invocation.smid", "state.thread.warpsize", "state.thread.warpspersm" and
    "state.thread.smcount" bindings. It may also use the "TGBALLOT"
    instruction. If this option is not specified, a program will fail to
    compile if it uses those instructions or bindings.


Errors

    None.

New State

    None.

New Implementation Dependent State

                                                          Minimum
    Get Value                        Type Get Command     Value   Description           Sec.   Attrib
    -------------------------------- ---- --------------- ------- --------------------- ------ ------
    WARP_SIZE_NV                     Z+   GetIntegerv     1       total number of       2.X.3.3 -
                                                                  threads in a warp.

    WARPS_PER_SM_NV                  Z+   GetIntegerv     1       maximum number of     2.X.3.3 -
                                                                  warps executing on an
                                                                  SM.

    SM_COUNT_NV                      Z+   GetIntegerv     1       number of SMs on the  2.X.3.3 -
                                                                  GPU.


Issues

    None


Revision History

    Rev.  Date      Author    Changes
    ----  --------  --------  -----------------------------------------
     4    7/21/15   jbreton   Update the layout of threads within a quad for
                              window and framebuffer object rendering.
     3    2/14/14   jbreton   Rename the extension from NVX to NV.
     2    9/4/13    jbreton   Add helperThread attribute binding.
     1    12/19/12  jbreton   Internal revisions.