Name

    NV_gpu_program5

Name Strings

    GL_NV_gpu_program5
    GL_NV_gpu_program_fp64

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified Date:         05/25/2022
    NVIDIA Revision:            8

Number

    388

Dependencies

    OpenGL 2.0 is required.

    This extension is written against the OpenGL 3.0 specification.

    NV_gpu_program4 and NV_gpu_program4_1 are required.

    NV_shader_buffer_load is required.

    NV_shader_buffer_store is required.

    This extension is written against and interacts with the NV_gpu_program4,
    NV_vertex_program4, NV_geometry_program4, and NV_fragment_program4
    specifications.

    This extension interacts with NV_tessellation_program5.

    This extension interacts with ARB_transform_feedback3.

    This extension interacts trivially with NV_shader_buffer_load.

    This extension interacts trivially with NV_shader_buffer_store.

    This extension interacts trivially with NV_parameter_buffer_object2.

    This extension interacts trivially with OpenGL 3.3, ARB_texture_swizzle,
    and EXT_texture_swizzle.

    This extension interacts trivially with ARB_blend_func_extended.

    This extension interacts trivially with EXT_shader_image_load_store.

    This extension interacts trivially with ARB_shader_subroutine.

    If the 64-bit floating-point portion of this extension is not supported,
    "GL_NV_gpu_program_fp64" will not be found in the extension string.

Overview

    This specification documents the common instruction set and basic
    functionality provided by NVIDIA's 5th generation of assembly instruction
    sets supporting programmable graphics pipeline stages.

    The instruction set builds upon the basic framework provided by the
    ARB_vertex_program and ARB_fragment_program extensions to expose
    considerably more capable hardware.
    In addition to new capabilities for vertex and fragment programs, this
    extension provides new functionality for geometry programs as originally
    described in the NV_geometry_program4 specification, and serves as the
    basis for the new tessellation control and evaluation programs described
    in the NV_tessellation_program5 extension.

    Programs using the functionality provided by this extension should begin
    with the program headers "!!NVvp5.0" (vertex programs), "!!NVtcp5.0"
    (tessellation control programs), "!!NVtep5.0" (tessellation evaluation
    programs), "!!NVgp5.0" (geometry programs), and "!!NVfp5.0" (fragment
    programs).

    This extension provides a variety of new features, including:

      * support for 64-bit integer operations;

      * the ability to dynamically index into an array of texture units or
        program parameter buffers;

      * extending texel offset support to allow loading texel offsets from
        regular integer operands computed at run-time, instead of requiring
        that the offsets be constants encoded in texture instructions;

      * extending TXG (texture gather) support to return the 2x2 footprint
        from any component of the texture image instead of always returning
        the first (x) component;

      * extending TXG to support shadow comparisons in conjunction with a
        depth texture, via the SHADOW* targets;

      * further extending texture gather support to provide a new opcode
        (TXGO) that applies a separate texel offset vector to each of the four
        samples returned by the instruction;

      * bit manipulation instructions, including ones to find the position of
        the most or least significant set bit, bitfield insertion and
        extraction, and bit reversal;

      * a general data conversion instruction (CVT) supporting conversion
        between any two data types supported by this extension; and

      * new instructions to compute the composite of a set of boolean
        conditions across a group of shader threads.

    This extension also provides some new capabilities for individual program
    types, including:

      * support for instanced geometry programs, where a geometry program may
        be run multiple times for each primitive;

      * support for emitting vertices in a geometry program where each vertex
        emitted may be directed at a specified vertex stream and captured
        using the ARB_transform_feedback3 extension;

      * support for interpolating an attribute at a programmable offset
        relative to the pixel center (IPAO), at a programmable sample number
        (IPAS), or at the fragment's centroid location (IPAC) in a fragment
        program;

      * support for reading a mask of covered samples in a fragment program;

      * support for reading a point sprite coordinate directly in a fragment
        program, without overriding a texture coordinate;

      * support for reading patch primitives and per-patch attributes
        (introduced by ARB_tessellation_shader) in a geometry program; and

      * support for multiple output vectors for a single color output in a
        fragment program (as used by ARB_blend_func_extended).

    This extension also provides optional support for 64-bit-per-component
    variables and 64-bit floating-point arithmetic.
    These features are supported if and only if "GL_NV_gpu_program_fp64" is
    found in the extension string.

    This extension incorporates the memory access operations from the
    NV_shader_buffer_load and NV_parameter_buffer_object2 extensions,
    originally built as add-ons to NV_gpu_program4. It also provides the
    following new capabilities:

      * support for these features without requiring a separate OPTION
        keyword;

      * support for indexing into an array of constant buffers using the LDC
        opcode added by NV_parameter_buffer_object2;

      * support for storing into buffer objects at a specified GPU address
        using the STORE opcode, and allowing applications to create READ_WRITE
        and WRITE_ONLY mappings when making a buffer object resident using the
        API mechanisms in the NV_shader_buffer_store extension;

      * storage instruction modifiers to allow loading and storing 64-bit
        component values;

      * support for atomic memory transactions using the ATOM opcode, where
        the instruction atomically reads the memory pointed to by a pointer,
        performs a specified computation, stores the results of that
        computation, and returns the original value read;

      * support for memory barrier transactions using the MEMBAR opcode, which
        ensures that all memory stores issued prior to the
        opcode complete prior to any subsequent memory transactions; and

      * a fragment program option to specify that depth and stencil tests are
        performed prior to fragment program execution.

    Additionally, the assembly program languages supported by this extension
    include support for reading, writing, and performing atomic memory
    operations on texture image data using the opcodes and mechanisms
    documented in the "Dependencies on NV_gpu_program5" section of the
    EXT_shader_image_load_store extension.

New Procedures and Functions

    None.

New Tokens

    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
    GetFloatv, and GetDoublev:

        MAX_GEOMETRY_PROGRAM_INVOCATIONS_NV             0x8E5A
        MIN_FRAGMENT_INTERPOLATION_OFFSET_NV            0x8E5B
        MAX_FRAGMENT_INTERPOLATION_OFFSET_NV            0x8E5C
        FRAGMENT_PROGRAM_INTERPOLATION_OFFSET_BITS_NV   0x8E5D
        MIN_PROGRAM_TEXTURE_GATHER_OFFSET_NV            0x8E5E
        MAX_PROGRAM_TEXTURE_GATHER_OFFSET_NV            0x8E5F


Additions to Chapter 2 of the OpenGL 3.0 Specification (OpenGL Operation)

    Modify Section 2.X.2 of NV_fragment_program4, Program Grammar

    (modify the section, updating the program header string for the
    extended instruction set)

    Fragment programs are required to begin with the header string
    "!!NVfp5.0". This header string identifies the subsequent program body as
    being a fragment program and indicates that it should be parsed according
    to the base NV_gpu_program5 grammar plus the additions below. Program
    string parsing begins with the character immediately following the header
    string.

    (add/change the following rules to the NV_fragment_program4 and
    NV_gpu_program5 base grammars)

    <SpecialInstruction>    ::= "IPAC" <opModifiers> <instResult> ","
                                <instOperandV>
                              | "IPAO" <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandV>
                              | "IPAS" <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandS>

    <interpModifier>        ::= "SAMPLE"

    <attribBasic>           ::= <fragPrefix> "sampleid"
                              | <fragPrefix> "samplemask"
                              | <fragPrefix> "pointcoord"

    <resultBasic>           ::= <resPrefix> "color" <resultOptColorNum>
                                <resultOptColorType>
                              | <resPrefix> "samplemask"

    <resultOptColorType>    ::= ""
                              | "." <colorType>


    Modify Section 2.X.2 of NV_geometry_program4, Program Grammar

    (modify the section, updating the program header string for the extended
    instruction set)

    Geometry programs are required to begin with the header string
    "!!NVgp5.0". This header string identifies the subsequent program body as
    being a geometry program and indicates that it should be parsed according
    to the base NV_gpu_program5 grammar plus the additions below. Program
    string parsing begins with the character immediately following the header
    string.

    (add the following rules to the NV_geometry_program4 and NV_gpu_program5
    base grammars)

    <declaration>           ::= "INVOCATIONS" <int>

    <declPrimInType>        ::= "PATCHES"

    <SpecialInstruction>    ::= "EMITS" <instOperandS>

    <attribBasic>           ::= <primPrefix> "invocation"
                              | <primPrefix> "vertexcount"
                              | <attribTessOuter> <optArrayMemAbs>
                              | <attribTessInner> <optArrayMemAbs>
                              | <attribPatchGeneric> <optArrayMemAbs>

    <attribMulti>           ::= <attribTessOuter> <arrayRange>
                              | <attribTessInner> <arrayRange>
                              | <attribPatchGeneric> <arrayRange>

    <attribTessOuter>       ::= <primPrefix> "." "tessouter"

    <attribTessInner>       ::= <primPrefix> "." "tessinner"

    <attribPatchGeneric>    ::= <primPrefix> "." "patch" "." "attrib"


    Modify Section 2.X.2 of NV_vertex_program4, Program Grammar

    (modify the section, updating the program header string for the extended
    instruction set)

    Vertex programs are required to begin with the header string "!!NVvp5.0".
    This header string identifies the subsequent program body as being a
    vertex program and indicates that it should be parsed according to the
    base NV_gpu_program5 grammar plus the additions below. Program string
    parsing begins with the character immediately following the header string.


    Modify Section 2.X.2 of NV_gpu_program4, Program Grammar

    (add the following grammar rules to the NV_gpu_program4 base grammar;
    additional grammar rules usable for assembly programs are documented in
    the EXT_shader_image_load_store and ARB_shader_subroutine specifications)

    <instruction>           ::= <MemInstruction>

    <MemInstruction>        ::= <ATOMop_instruction>
                              | <STOREop_instruction>
                              | <MEMBARop_instruction>

    <VECTORop>              ::= "BFR"
                              | "BTC"
                              | "BTFL"
                              | "BTFM"
                              | "PK64"
                              | "LDC"
                              | "CVT"
                              | "TGALL"
                              | "TGANY"
                              | "TGEQ"
                              | "UP64"

    <SCALARop>              ::= "LOAD"

    <BINop>                 ::= "BFE"

    <TRIop>                 ::= "BFI"

    <TEXop_instruction>     ::= <TEXop> <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandV> ","
                                <texAccess>

    <TEXop>                 ::= "TXG"
                              | "LOD"

    <TXDop>                 ::= "TXGO"

    <ATOMop_instruction>    ::= <ATOMop> <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandS>

    <ATOMop>                ::= "ATOM"

    <STOREop_instruction>   ::= <STOREop> <opModifiers> <instOperandV> ","
                                <instOperandS>

    <STOREop>               ::= "STORE"

    <MEMBARop_instruction>  ::= <MEMBARop> <opModifiers>

    <MEMBARop>              ::= "MEMBAR"

    <opModifier>            ::= "F16"
                              | "F32"
                              | "F64"
                              | "F32X2"
                              | "F32X4"
                              | "F64X2"
                              | "F64X4"
                              | "S8"
                              | "S16"
                              | "S32"
                              | "S32X2"
                              | "S32X4"
                              | "S64"
                              | "S64X2"
                              | "S64X4"
                              | "U8"
                              | "U16"
                              | "U32"
                              | "U32X2"
                              | "U32X4"
                              | "U64"
                              | "U64X2"
                              | "U64X4"
                              | "ADD"
                              | "MIN"
                              | "MAX"
                              | "IWRAP"
                              | "DWRAP"
                              | "AND"
                              | "OR"
                              | "XOR"
                              | "EXCH"
                              | "CSWAP"
                              | "COH"
                              | "ROUND"
                              | "CEIL"
                              | "FLR"
                              | "TRUNC"
                              | "PREC"
                              | "VOL"

    <texAccess>             ::= <textureUseS> "," <texTarget> <optTexOffset>
                              | <textureUseV> "," <texTarget> <optTexOffset>

    <texTarget>             ::= "ARRAYCUBE"
                              | "SHADOWARRAYCUBE"

    <optTexOffset>          ::= /* empty */
                              | <texOffset>

    <texOffset>             ::= "offset" "(" <instOperandV> ")"

    <namingStatement>       ::= <TEXTURE_statement>

    <BUFFER_statement>      ::= <bufferDeclType> <establishName>
                                <optArraySize> <optArraySize> "="
                                <bufferMultInit>

    <bufferDeclType>        ::= "CBUFFER"

    <TEXTURE_statement>     ::= "TEXTURE" <establishName> <texSingleInit>
                              | "TEXTURE" <establishName> <optArraySize>
                                <texMultipleInit>

    <texSingleInit>         ::= "=" <textureUseDS>

    <texMultipleInit>       ::= "=" "{" <texItemList> "}"

    <texItemList>           ::= <textureUseDM>
                              | <textureUseDM> "," <texItemList>

    <bufferBinding>         ::= "program" "." "buffer" <arrayRange>

    <textureUseS>           ::= <textureUseV> <texImageUnitComp>

    <textureUseV>           ::= <texImageUnit>
                              | <texVarName> <optArrayMem>

    <textureUseDS>          ::= "texture" <arrayMemAbs>

    <textureUseDM>          ::= <textureUseDS>
                              | "texture" <arrayRange>

    <texImageUnitComp>      ::= <scalarSuffix>


    Modify Section 2.X.3.1, Program Variable Types

    (IGNORE if GL_NV_gpu_program_fp64 is not found in the extension string.
    Otherwise modify storage size modifiers to guarantee that "LONG"
    variables are at least 64 bits in size.)

    Explicitly declared variables may optionally have one storage size
    modifier. Variables declared as "SHORT" will be represented using at least
    16 bits per component. "SHORT" floating-point values will have at least 5
    bits of exponent and 10 bits of mantissa. Variables declared as "LONG"
    will be represented with at least 64 bits per component. "LONG"
    floating-point values will have at least 11 bits of exponent and 52 bits
    of mantissa. If no size modifier is provided, the GL will automatically
    select component sizes. Implementations are not required to support more
    than one component size, so "SHORT", "LONG", and the default could all
    refer to the same component size.
    The "LONG" modifier is supported only for declarations of temporary
    variables ("TEMP"), and attribute variables ("ATTRIB") in vertex
    programs. The "SHORT" modifier is supported only for declarations of
    temporary variables and result variables ("OUTPUT").


    Modify Section 2.X.3.2 of the NV_fragment_program4 specification, Program
    Attribute Variables.

    (Add a table entry and relevant text describing the fragment program
    input sample mask variable.)

      Fragment Attribute Binding  Components  Underlying State
      --------------------------  ----------  ----------------------------
      fragment.samplemask         (m,-,-,-)   fragment coverage mask
      fragment.pointcoord         (s,t,-,-)   fragment point sprite coordinate

    If a fragment attribute binding matches "fragment.samplemask", the "x"
    component is filled with a coverage mask indicating the set of samples
    covered by this fragment. The coverage mask is a bitfield, where bit <n>
    is one if the sample number <n> is covered and zero otherwise. If
    multisample buffers are not available (SAMPLE_BUFFERS is zero), bit zero
    indicates if the center of the pixel corresponding to the fragment is
    covered.
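
    The coverage mask layout described above can be illustrated with a short
    sketch. This is Python rather than the assembly language defined by this
    extension, and the function names and the num_samples parameter are
    hypothetical; it only shows how bit <n> of the mask maps to sample <n>.

```python
def covered_samples(sample_mask, num_samples):
    """Return the sample numbers whose bits are set in a coverage mask.

    Bit n of sample_mask is one if sample number n is covered.
    num_samples is a hypothetical parameter: how many samples to inspect.
    """
    return [n for n in range(num_samples) if (sample_mask >> n) & 1]


def pixel_center_covered(sample_mask):
    """Without multisample buffers, bit zero reports pixel-center coverage."""
    return bool(sample_mask & 1)


if __name__ == "__main__":
    print(covered_samples(0b1010, 4))   # samples 1 and 3 are covered
    print(pixel_center_covered(0b1))
```

    A mask of zero yields an empty list, since no bit is set.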

    If a fragment attribute binding matches "fragment.pointcoord", the "x" and
    "y" components are filled with the s and t point sprite coordinates
    (section 3.3.1), respectively. The "z" and "w" components are undefined.
    If the fragment is generated by any primitive other than a point, or if
    point sprites are disabled, all four components of the binding are
    undefined.

    Modify Section 2.X.3.2 of the NV_geometry_program4 specification, Program
    Attribute Variables.

    (Add a table entry and relevant text describing the geometry program
    invocation attribute and per-patch attributes.)

      Geometry Vertex Binding        Components  Description
      -----------------------------  ----------  ----------------------------
       ...
      primitive.invocation           (id,-,-,-)  geometry program invocation
      primitive.tessouter[n]         (x,-,-,-)   outer tess. level n
      primitive.tessinner[n]         (x,-,-,-)   inner tess. level n
      primitive.patch.attrib[n]      (x,y,z,w)   generic patch attribute n
      primitive.tessouter[n..o]      (x,-,-,-)   outer tess. levels n to o
      primitive.tessinner[n..o]      (x,-,-,-)   inner tess. levels n to o
      primitive.patch.attrib[n..o]   (x,y,z,w)   generic patch attrib n to o
      primitive.vertexcount          (c,-,-,-)   vertices in primitive

       ...

    If a geometry attribute binding matches "primitive.invocation", the "x"
    component is filled with an integer giving the number of previous
    invocations of the geometry program on the primitive being processed. If
    the geometry program is invoked only once per primitive (default), this
    component will always be zero. If the program is invoked multiple times
    (via the INVOCATIONS declaration), the component will be zero on the first
    invocation, one on the second, and so forth. The "y", "z", and "w"
    components of the variable are always undefined.

    If an attribute binding matches "primitive.tessouter[n]", the "x"
    component is filled with the per-patch outer tessellation level numbered
    <n> of the input patch. <n> must be less than four. The "y", "z", and
    "w" components are always undefined. A program will fail to load if this
    attribute binding is used and the input primitive type is not PATCHES.

    If an attribute binding matches "primitive.tessinner[n]", the "x"
    component is filled with the per-patch inner tessellation level numbered
    <n> of the input patch. <n> must be less than two. The "y", "z", and "w"
    components are always undefined. A program will fail to load if this
    attribute binding is used and the input primitive type is not PATCHES.
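
    The rules in the three paragraphs above can be restated as a small Python
    sketch. It is not part of the assembly language, and the helper names are
    hypothetical. The text does not say what happens when a tessellation
    level index is out of range, so the sketch only reports which stated
    rules a binding violates.

```python
# Index limits stated in the text: tessouter[n] needs n < 4,
# tessinner[n] needs n < 2.
TESS_LEVEL_LIMITS = {"primitive.tessouter": 4, "primitive.tessinner": 2}


def tess_binding_violations(binding, n, input_primitive):
    """List the stated rules violated by a single tessellation level binding.

    binding is "primitive.tessouter" or "primitive.tessinner"; n is assumed
    to be a non-negative integer index.
    """
    violations = []
    if n >= TESS_LEVEL_LIMITS[binding]:
        violations.append("index out of range")
    if input_primitive != "PATCHES":
        violations.append("input primitive type is not PATCHES")
    return violations


def invocation_ids(invocations=1):
    """Values seen in primitive.invocation.x: 0 first, then 1, and so on."""
    return list(range(invocations))


if __name__ == "__main__":
    print(tess_binding_violations("primitive.tessinner", 2, "PATCHES"))
    print(invocation_ids(3))
```

    With the default of one invocation per primitive, invocation_ids()
    returns a single zero, matching "this component will always be zero".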

    If an attribute binding matches "primitive.patch.attrib[n]", the "x", "y",
    "z", and "w" components are filled with the corresponding components of
    the per-patch generic attribute numbered <n> of the input patch. A
    program will fail to load if this attribute binding is used and the input
    primitive type is not PATCHES.

    If an attribute binding matches "primitive.tessouter[n..o]",
    "primitive.tessinner[n..o]", or "primitive.patch.attrib[n..o]", a sequence
    of 1+<o>-<n> outer tessellation level, inner tessellation level, or
    per-patch generic attribute bindings is created. For per-patch generic
    attribute bindings, it is as though the sequence
    "primitive.patch.attrib[n], primitive.patch.attrib[n+1], ...
    primitive.patch.attrib[o]" were specified. These bindings are available
    only in explicit declarations of array variables. A program will fail to
    load if <n> is greater than <o> or the input primitive type is not
    PATCHES.

    If a geometry attribute binding matches "primitive.vertexcount", the "x"
    component is filled with the number of vertices in the input primitive
    being processed. The "y", "z", and "w" components of the variable are
    always undefined.
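
    The expansion of a range binding such as "primitive.patch.attrib[n..o]"
    into 1+<o>-<n> individual bindings can be sketched as follows. This is an
    illustrative Python model, not the assembler's implementation; the
    function name is hypothetical, and raising ValueError stands in for "the
    program will fail to load".

```python
def expand_range_binding(base, n, o, input_primitive="PATCHES"):
    """Expand base[n..o] into the equivalent list of single bindings.

    base is a binding prefix such as "primitive.patch.attrib".
    ValueError models a program that would fail to load.
    """
    if input_primitive != "PATCHES":
        raise ValueError("input primitive type is not PATCHES")
    if n > o:
        raise ValueError("range start is greater than range end")
    # Inclusive range: produces 1 + o - n bindings.
    return ["%s[%d]" % (base, i) for i in range(n, o + 1)]


if __name__ == "__main__":
    for binding in expand_range_binding("primitive.patch.attrib", 2, 4):
        print(binding)
```

    A range with <n> equal to <o> expands to exactly one binding.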


    Modify Section 2.X.3.5, Program Results

    (modify Table X.X)

      Binding                        Components  Description
      -----------------------------  ----------  ----------------------------
      result.color[n].primary        (r,g,b,a)   primary color n (SRC_COLOR)
      result.color[n].secondary      (r,g,b,a)   secondary color n (SRC1_COLOR)

      Table X.X: Fragment Result Variable Bindings. Components labeled "*"
      are unused. "[n]" is optional -- color <n> is used if specified; color
      0 is used otherwise.

    (add after third paragraph)

    If a result variable binding matches "result.color[n].primary" or
    "result.color[n].secondary" and the ARB_blend_func_extended option is
    specified, updates to the "x", "y", "z", and "w" components of these color
    result variables modify the "r", "g", "b", and "a" components of the
    SRC_COLOR and SRC1_COLOR color outputs, respectively, for the fragment
    output color numbered <n>. If the ARB_blend_func_extended program option
    is not specified, the "result.color[n].primary" and
    "result.color[n].secondary" bindings are unavailable.


    Modify Section 2.X.3.6, Program Parameter Buffers

    (modify the description of parameter buffer arrays to require that all
    bindings in an array declaration must use the same single buffer *or*
    buffer range)

    ... Program parameter buffer variables may be declared as arrays, but all
    bindings assigned to the array must use the same binding point or binding
    point range, and must increase consecutively.

    (add to the end of the section)

    In explicit variable declarations, the bindings in Table X.12.1 of the
    form "program.buffer[a..b]" may also be used, and indicate the variable
    spans multiple buffer binding points. Such variables must be accessed as
    an array, with the first index specifying an offset into the range of
    buffer object binding points. A buffer index of zero identifies binding
    point <a>; an index of <b>-<a> identifies binding point <b>. If such a
    variable is declared as an array, a second index must be provided to
    identify the individual array element. A program will fail to compile if
    such bindings are used when <a> or <b> is negative or greater than or
    equal to the number of buffer binding points supported for the program
    type, or if <a> is greater than <b>. The bindings in Table X.12.1 may not
    be used in implicit variable declarations.
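
    The index arithmetic for "program.buffer[a..b]" bindings can be sketched
    in C as follows. This is only an illustration, assuming the range is
    inclusive (1+<b>-<a> binding points) and that index zero maps to binding
    point <a>. The function names and the -1 return value for an
    out-of-range index are hypothetical; this specification does not define
    them.

```c
#include <stdbool.h>

/* Hypothetical helper: true if "program.buffer[a..b]" names a valid range
 * for a program type that supports max_bindings buffer binding points. */
bool buffer_range_is_valid(int a, int b, int max_bindings)
{
    return a >= 0 && b >= 0 && a < max_bindings && b < max_bindings && a <= b;
}

/* Number of binding points spanned by the inclusive range [a..b]. */
int buffer_range_count(int a, int b)
{
    return 1 + b - a;
}

/* Maps the first array index to a buffer binding point.  Returning -1 for
 * an out-of-range index is an assumption made for this sketch only. */
int buffer_range_binding_point(int a, int b, int index)
{
    if (index < 0 || index >= buffer_range_count(a, b))
        return -1;
    return a + index;
}
```

    For a declaration using "program.buffer[2..5]", indices 0 through 3
    select binding points 2 through 5 under these assumptions.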

      Binding                        Components  Underlying State
      -----------------------------  ----------  -----------------------------
      program.buffer[a..b][c]        (x,x,x,x)   program parameter buffers a
                                                 through b, element c
      program.buffer[a..b][c..d]     (x,x,x,x)   program parameter buffers a
                                                 through b, elements c
                                                 through d
      program.buffer[a..b]           (x,x,x,x)   program parameter buffers a
                                                 through b, all elements

      Table X.12.1: Program Parameter Buffer Array Bindings. <a> and <b>
      indicate buffer numbers, <c> and <d> indicate individual elements.

    When bindings beginning with "program.buffer[a..b]" are used in a variable
    declaration, they behave identically to corresponding bindings beginning
    with "program.buffer[a]", except that the variable is filled with a
    separate set of values for each buffer binding point from <a> to <b>
    inclusive.

    (add new section after Section 2.X.3.7, Program Condition Code Registers
    and renumber subsequent sections accordingly)

    Section 2.X.3.8, Program Texture Variables

    Program texture variables are used as constants during program execution
    and refer to the texture objects bound to one or more texture image units.
    All texture variables have associated bindings and are read-only during
    program execution. Texture variables retain their values across program
    invocations, and the set of texture image units to which they refer is
    constant. The texture object a variable refers to may be changed by
    binding a new texture object to the appropriate target of the
    corresponding texture image unit. Texture variables may only be used to
    identify a texture object in texture instructions, and may not be used as
    operands in any other instruction. Texture variables may be declared
    explicitly via the <TEXTURE_statement> grammar rule, or implicitly by
    using a texture image unit binding in an instruction.

    Texture variables may be declared as arrays, but the list of texture
    image units assigned to the array must increase consecutively.

    Texture variables identify only a texture image unit; the corresponding
    texture target (e.g., 1D, 2D, CUBE) and texture object are identified by
    the <texTarget> grammar rule in instructions using the texture variable.

      Binding          Components  Underlying State
      ---------------  ----------  ------------------------------------------
      texture[a]       x           texture object bound to image unit a
      texture[a..b]    x           texture objects bound to image units a
                                   through b

      Table X.12.2: Texture Image Unit Bindings. <a> and <b> indicate
      texture image unit numbers.

    If a texture binding matches "texture[a]", the texture variable is filled
    with a single integer referring to texture image unit <a>.

    If a texture binding matches "texture[a..b]", the texture variable is
    filled with an array of integers referring to texture image units <a>
    through <b>, inclusive. A program will fail to compile if <a> or <b> is
    negative or greater than or equal to the number of texture image units
    supported, or if <a> is greater than <b>.


    Modify Section 2.X.4, Program Execution Environment

    (Update the instruction set table to include new columns to indicate the
    first ISA supporting the instruction, and to indicate whether the
    instruction supports 64-bit floating-point modifiers.)

      Instr-      Modifiers
      uction  V  F I C S H D  Out Inputs      Description
      ------- -- - - - - - -  --- ----------  --------------------------------
      ABS     40 6 6 X X X F  v   v           absolute value
      ADD     40 6 6 X X X F  v   v,v         add
      AND     40 - 6 X - - S  v   v,v         bitwise and
      ATOM    50 - - X - - -  s   v,su        atomic memory transaction
      BFE     50 - X X - - S  v   v,v         bitfield extract
      BFI     50 - X X - - S  v   v,v,v       bitfield insert
      BFR     50 - X X - - S  v   v           bitfield reverse
      BRK     40 - - - - - -  -   c           break out of loop instruction
      BTC     50 - X X - - S  v   v           bit count
      BTFL    50 - X X - - S  v   v           find least significant bit
      BTFM    50 - X X - - S  v   v           find most significant bit
      CAL     40 - - - - - -  -   c           subroutine call
      CEIL    40 6 6 X X X F  v   vf          ceiling
      CMP     40 6 6 X X X F  v   v,v,v       compare
      CONT    40 - - - - - -  -   c           continue with next loop iteration
      COS     40 X - X X X F  s   s           cosine with reduction to [-PI,PI]
      CVT     50 - - X X - F  v   v           general data type conversion
      DDX     40 X - X X X F  v   v           derivative relative to X (fp-only)
      DDY     40 X - X X X F  v   v           derivative relative to Y (fp-only)
      DIV     40 6 6 X X X F  v   v,s         divide vector components by scalar
      DP2     40 X - X X X F  s   v,v         2-component dot product
      DP2A    40 X - X X X F  s   v,v,v       2-comp. dot product w/scalar add
      DP3     40 X - X X X F  s   v,v         3-component dot product
      DP4     40 X - X X X F  s   v,v         4-component dot product
      DPH     40 X - X X X F  s   v,v         homogeneous dot product
      DST     40 X - X X X F  v   v,v         distance vector
      ELSE    40 - - - - - -  -   -           start if test else block
      EMIT    40 - - - - - -  -   -           emit vertex stream 0 (gp-only)
      EMITS   50 - X - - - S  -   s           emit vertex to stream (gp-only)
      ENDIF   40 - - - - - -  -   -           end if test block
      ENDPRIM 40 - - - - - -  -   -           end of primitive (gp-only)
      ENDREP  40 - - - - - -  -   -           end of repeat block
      EX2     40 X - X X X F  s   s           exponential base 2
      FLR     40 6 6 X X X F  v   vf          floor
      FRC     40 6 - X X X F  v   v           fraction
      I2F     40 - 6 X - - S  vf  v           integer to float
      IF      40 - - - - - -  -   c           start of if test block
      IPAC    50 X - X X - F  v   v           interpolate at centroid (fp-only)
      IPAO    50 X - X X - F  v   v,v         interpolate w/offset (fp-only)
      IPAS    50 X - X X - F  v   v,su        interpolate at sample (fp-only)
      KIL     40 X X - - X F  -   vc          kill fragment
      LDC     40 - - X X - F  v   v           load from constant buffer
      LG2     40 X - X X X F  s   s           logarithm base 2
      LIT     40 X - X X X F  v   v           compute lighting coefficients
      LOAD    40 - - X X - F  v   su          global load
      LOD     41 X - X X - F  v   vf,t        compute texture LOD
      LRP     40 X - X X X F  v   v,v,v       linear interpolation
      MAD     40 6 6 X X X F  v   v,v,v       multiply and add
      MAX     40 6 6 X X X F  v   v,v         maximum
      MEMBAR  50 - - - - - -  -   -           memory barrier
      MIN     40 6 6 X X X F  v   v,v         minimum
      MOD     40 - 6 X - - S  v   v,s         modulus vector components by scalar
      MOV     40 6 6 X X X F  v   v           move
      MUL     40 6 6 X X X F  v   v,v         multiply
      NOT     40 - 6 X - - S  v   v           bitwise not
      NRM     40 X - X X X F  v   v           normalize 3-component vector
      OR      40 - 6 X - - S  v   v,v         bitwise or
      PK2H    40 X X - - - F  s   vf          pack two 16-bit floats
      PK2US   40 X X - - - F  s   vf          pack two floats as unsigned 16-bit
      PK4B    40 X X - - - F  s   vf          pack four floats as signed 8-bit
      PK4UB   40 X X - - - F  s   vf          pack four floats as unsigned 8-bit
      PK64    50 X X - - - F  v   v           pack 4x32-bit vectors to 2x64
      POW     40 X - X X X F  s   s,s         exponentiate
      RCC     40 X - X X X F  s   s           reciprocal (clamped)
      RCP     40 6 - X X X F  s   s           reciprocal
      REP     40 6 6 - - X F  -   v           start of repeat block
      RET     40 - - - - - -  -   c           subroutine return
      RFL     40 X - X X X F  v   v,v         reflection vector
      ROUND   40 6 6 X X X F  v   vf          round to nearest integer
      RSQ     40 6 - X X X F  s   s           reciprocal square root
      SAD     40 - 6 X - - S  vu  v,v,vu      sum of absolute differences
      SCS     40 X - X X X F  v   s           sine/cosine without reduction
      SEQ     40 6 6 X X X F  v   v,v         set on equal
      SFL     40 6 6 X X X F  v   v,v         set on false
      SGE     40 6 6 X X X F  v   v,v         set on greater than or equal
      SGT     40 6 6 X X X F  v   v,v         set on greater than
      SHL     40 - 6 X - - S  v   v,s         shift left
      SHR     40 - 6 X - - S  v   v,s         shift right
      SIN     40 X - X X X F  s   s           sine with reduction to [-PI,PI]
      SLE     40 6 6 X X X F  v   v,v         set on less than or equal
      SLT     40 6 6 X X X F  v   v,v         set on less than
      SNE     40 6 6 X X X F  v   v,v         set on not equal
      SSG     40 6 - X X X F  v   v           set sign
      STORE   50 - - - - - -  -   v,su        global store
      STR     40 6 6 X X X F  v   v,v         set on true
      SUB     40 6 6 X X X F  v   v,v         subtract
      SWZ     40 X - X X X F  v   v           extended swizzle
      TEX     40 X X X X - F  v   vf,t        texture sample
      TGALL   50 X X X X - F  v   v           test all non-zero in thread group
      TGANY   50 X X X X - F  v   v           test any non-zero in thread group
      TGEQ    50 X X X X - F  v   v           test all equal in thread group
      TRUNC   40 6 6 X X X F  v   vf          truncate (round toward zero)
      TXB     40 X X X X - F  v   vf,t        texture sample with bias
      TXD     40 X X X X - F  v   vf,vf,vf,t  texture sample w/partials
      TXF     40 X X X X - F  v   vs,t        texel fetch
      TXFMS   40 X X X X - F  v   vs,t        multisample texel fetch
      TXG     41 X X X X - F  v   vf,t        texture gather
      TXGO    50 X X X X - F  v   vf,vs,vs,t  texture gather w/per-texel offsets
      TXL     40 X X X X - F  v   vf,t        texture sample w/LOD
      TXP     40 X X X X - F  v   vf,t        texture sample w/projection
      TXQ     40 - - - - - S  vs  vs,t        texture info query
      UP2H    40 X X X X - F  vf  s           unpack two 16-bit floats
      UP2US   40 X X X X - F  vf  s           unpack two unsigned 16-bit integers
      UP4B    40 X X X X - F  vf  s           unpack four signed 8-bit integers
      UP4UB   40 X X X X - F  vf  s           unpack four unsigned 8-bit integers
      UP64    50 X X X X - F  v   v           unpack 2x64 vectors to 4x32
      X2D     40 X - X X X F  v   v,v,v       2D coordinate transformation
      XOR     40 - 6 X - - S  v   v,v         exclusive or
      XPD     40 X - X X X F  v   v,v         cross product

      Table X.13: Summary of NV_gpu_program5 instructions.

      The "V" column indicates the first assembly language in the
      NV_gpu_program4 family (if any) supporting the opcode. "41" and "50"
      indicate NV_gpu_program4_1 and NV_gpu_program5, respectively.

      The "Modifiers" columns specify the set of modifiers allowed for the
      instruction:

        F = floating-point data type modifiers
        I = signed and unsigned integer data type modifiers
        C = condition code update modifiers
        S = clamping (saturation) modifiers
        H = half-precision float data type suffix
        D = default data type modifier (F, U, or S)

      For the "F" and "I" columns, an "X" indicates support for both unsized
      type modifiers and sized type modifiers with fewer than 64 bits. A "6"
      indicates support for all modifiers, including 64-bit versions (when
      supported).

      The input and output columns describe the formats of the operands and
      results of the instruction.

        v:  4-component vector (data type is inherited from operation)
        vf: 4-component vector (data type is always floating-point)
        vs: 4-component vector (data type is always signed integer)
        vu: 4-component vector (data type is always unsigned integer)
        s:  scalar (replicated if written to a vector destination;
            data type is inherited from operation)
        su: scalar (data type is always unsigned integer)
        c:  condition code test result (e.g., "EQ", "GT1.x")
        vc: 4-component vector or condition code test
        t:  texture

      Instructions labeled "fp-only" and "gp-only" are supported only for
      fragment and geometry programs, respectively.


    Modify Section 2.X.4.1, Program Instruction Modifiers

    (Update the discussion of instruction precision modifiers. If
    GL_NV_gpu_program_fp64 is not found in the extension string, the "F64"
    instruction modifier described below is not supported.)

    (add to Table X.14 of the NV_gpu_program4 specification.)

      Modifier  Description
      --------  ---------------------------------------------------
      F         Floating-point operation
      U         Fixed-point operation, unsigned operands
      S         Fixed-point operation, signed operands
      ...
      F32       Floating-point operation, 32-bit precision or
                access one 32-bit floating-point value
      F64       Floating-point operation, 64-bit precision or
                access one 64-bit floating-point value
      S32       Fixed-point operation, signed 32-bit operands or
                access one 32-bit signed integer value
      S64       Fixed-point operation, signed 64-bit operands or
                access one 64-bit signed integer value
      U32       Fixed-point operation, unsigned 32-bit operands or
                access one 32-bit unsigned integer value
      U64       Fixed-point operation, unsigned 64-bit operands or
                access one 64-bit unsigned integer value
      ...
      F32X2     Access two 32-bit floating-point values
      F32X4     Access four 32-bit floating-point values
      F64X2     Access two 64-bit floating-point values
      F64X4     Access four 64-bit floating-point values
      S8        Access one 8-bit signed integer value
      S16       Access one 16-bit signed integer value
      S32X2     Access two 32-bit signed integer values
      S32X4     Access four 32-bit signed integer values
      S64       Access one 64-bit signed integer value
      S64X2     Access two 64-bit signed integer values
      S64X4     Access four 64-bit signed integer values
      U8        Access one 8-bit unsigned integer value
      U16       Access one 16-bit unsigned integer value
      U32       Access one 32-bit unsigned integer value
      U32X2     Access two 32-bit unsigned integer values
      U32X4     Access four 32-bit unsigned integer values
      U64       Access one 64-bit unsigned integer value
      U64X2     Access two 64-bit unsigned integer values
      U64X4     Access four 64-bit unsigned integer values

      ADD       Perform add operation for ATOM
      MIN       Perform minimum operation for ATOM
      MAX       Perform maximum operation for ATOM
      IWRAP     Perform wrapping increment for ATOM
      DWRAP     Perform wrapping decrement for ATOM
      AND       Perform logical AND operation for ATOM
      OR        Perform logical OR operation for ATOM
      XOR       Perform logical XOR operation for ATOM
      EXCH      Perform exchange operation for ATOM
      CSWAP     Perform compare-and-swap operation for ATOM

      COH       Make LOAD and STORE operations use coherent caching
      VOL       Make LOAD and STORE operations treat memory as volatile

      PREC      Instruction results should be precise

      ROUND     Inexact conversion results round to nearest value (even)
      CEIL      Inexact conversion results round to larger value
      FLR       Inexact conversion results round to smaller value
      TRUNC     Inexact conversion results round to value closest to zero


    "F", "U", and "S" modifiers are base data type modifiers and specify that
    the instruction should operate on floating-point, unsigned integer, or
    signed integer values, respectively.
    For example, "ADD.F", "ADD.U", and
    "ADD.S" specify component-wise addition of floating-point, unsigned
    integer, or signed integer vectors, respectively. While these modifiers
    specify a data type, they do not specify an exact precision at which the
    operation is performed. Floating-point and fixed-point operations will
    typically be carried out at 32-bit precision, unless otherwise described
    in the instruction documentation or overridden by the precision modifiers.
    If all operands are represented with less than 32-bit precision (e.g.,
    variables with the "SHORT" component size modifier), operations may be
    carried out at a precision no less than the precision of the largest
    operand used by the instruction. For some instructions, the data type of
    some operands or the result is fixed; in these cases, the data type
    modifier specifies the data type of the remaining values.

    Operands represented with fewer bits than used to perform the instruction
    will be promoted to a larger data type. Signed integer operands will be
    sign-extended, where the most significant bits are filled with ones if the
    operand is negative and zero otherwise. Unsigned integer operands will be
    zero-extended, where the most significant bits are always filled with
    zeroes. Operands represented with more bits than used to perform the
    instruction will be converted to lower precision. Floating-point
    overflows result in IEEE infinity encodings; integer overflows result in
    the truncation of the most significant bits.

    For arithmetic operations, the "F32", "F64", "U32", "U64", "S32", and
    "S64" modifiers are precision-specific data type modifiers that specify
    that floating-point, unsigned integer, or signed integer operations be
    carried out with an internal precision of no less than 32 or 64 bits per
    component, respectively. The "F64", "U64", and "S64" modifiers are
    supported on only a subset of instructions, as documented in the
    instruction table. The base data type of the instruction is trivially
    derived from a precision-specific data type modifier, and an instruction
    may not specify both base and precision-specific data type modifiers.

    ...

    "SAT" and "SSAT" are clamping modifiers that generally specify that the
    floating-point components of the instruction result should be clamped to
    [0,1] or [-1,1], respectively, before updating the condition code and the
    destination variable. If no clamping suffix is specified, unclamped
    results will be used for condition code updates (if any) and destination
    variable writes. Clamping modifiers are not supported on instructions
    that do not produce floating-point results, with one exception.

    ...
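
    The operand promotion, integer truncation, and clamping rules described
    above can be illustrated with a short C sketch. The function names are
    hypothetical, and the 16-bit and 32-bit widths are chosen only as an
    example. The sketch does not attempt to model NaN handling or the
    condition code update.

```c
#include <stdint.h>

/* Sign extension: the upper bits are filled with ones if the 16-bit operand
 * is negative (sign bit set) and with zeroes otherwise. */
uint32_t sign_extend_16_to_32(uint16_t bits)
{
    uint32_t wide = bits;
    if (bits & 0x8000u)
        wide |= 0xFFFF0000u;
    return wide;
}

/* Zero extension: the upper bits are always filled with zeroes. */
uint32_t zero_extend_16_to_32(uint16_t bits)
{
    return (uint32_t)bits;
}

/* Integer overflow on conversion to a narrower type drops the most
 * significant bits. */
uint16_t truncate_32_to_16(uint32_t bits)
{
    return (uint16_t)(bits & 0xFFFFu);
}

/* "SAT"-style clamp of one floating-point component to [0,1]. */
float clamp_sat(float x)
{
    if (x < 0.0f) return 0.0f;
    if (x > 1.0f) return 1.0f;
    return x;
}

/* "SSAT"-style clamp of one floating-point component to [-1,1]. */
float clamp_ssat(float x)
{
    if (x < -1.0f) return -1.0f;
    if (x > 1.0f) return 1.0f;
    return x;
}
```

    Under these definitions the 16-bit pattern 0xFFFE widens to 0xFFFFFFFE
    when treated as signed and to 0x0000FFFE when treated as unsigned.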

    For load and store operations, the "F32", "F32X2", "F32X4", "F64",
    "F64X2", "F64X4", "S8", "S16", "S32", "S32X2", "S32X4", "S64", "S64X2",
    "S64X4", "U8", "U16", "U32", "U32X2", "U32X4", "U64", "U64X2", and "U64X4"
    storage modifiers control how data are loaded from or stored to memory.
    Storage modifiers are supported by the ATOM, LDC, LOAD, and STORE
    instructions and are covered in more detail in the descriptions of these
    instructions. These instructions must specify exactly one of these
    modifiers, and may not specify any of the base data type modifiers (F,U,S)
    described above. The base data types of the result vector of a load
    instruction or the first operand of a store instruction are trivially
    derived from the storage modifier.

    For atomic memory operations performed by the ATOM instruction, the "ADD",
    "MIN", "MAX", "IWRAP", "DWRAP", "AND", "OR", "XOR", "EXCH", and "CSWAP"
    modifiers specify the operation to perform on the memory being accessed,
    and are described in more detail in the description of this instruction.

    For load and store operations, the "COH" modifier controls whether the
    operation uses a coherent level of the cache hierarchy, as described in
    Section 2.X.4.5.

    For load and store operations, the "VOL" modifier controls whether the
    operation treats the memory being read or written as volatile.
    Instructions modified with "VOL" will always read or write the underlying
    memory, whether or not previous or subsequent loads and stores access the
    same memory.

    For arithmetic and logical operations, the "PREC" modifier controls
    whether the instruction result should be treated as precise. For
    instructions not qualified with ".PREC", the implementation may rearrange
    the computations specified by the program instructions to execute more
    efficiently, even if it may generate slightly different results in some
    cases. For example, an implementation may combine a MUL instruction with
    a dependent ADD instruction and generate code to execute a MAD
    (multiply-add) instruction instead. The difference in rounding may
    produce unacceptable artifacts for some algorithms. When ".PREC" is
    specified, the instruction will be executed in a manner that always
    generates the same result regardless of the program instructions that
    precede or follow the instruction. Note that a ".PREC" modifier does not
    affect the processing of any other instruction. For example, tagging an
    instruction with ".PREC" does not mean that the instructions used to
    generate the instruction's operands will be treated as precise unless
    those instructions are also qualified with ".PREC".
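
    The rounding difference between a separate MUL and ADD and a combined
    MAD can be reproduced on a CPU with the following C sketch. It is an
    illustration only: the function names are hypothetical, and the combined
    operation is emulated in double precision on the assumption that it
    rounds once at the end, which this specification does not state.

```c
/* MUL followed by a dependent ADD: the product is rounded to 32-bit float
 * before the addition.  The volatile store forces that rounding and keeps
 * the compiler from contracting the two operations into one. */
float mul_then_add(float a, float b, float c)
{
    volatile float product = a * b;
    return product + c;
}

/* Emulated multiply-add with no intermediate rounding to 32-bit float.
 * The product of two floats is exact in double precision; the sum is then
 * rounded once more when converted back to float. */
float fused_mad(float a, float b, float c)
{
    double wide = (double)a * (double)b + (double)c;
    return (float)wide;
}
```

    With a = b = 0x1.001p0f (1 + 2^-12) and c = -0x1.002p0f (-(1 + 2^-11)),
    the exact product 1 + 2^-11 + 2^-24 lies halfway between two 32-bit
    floats and rounds to 1 + 2^-11, so mul_then_add returns zero while
    fused_mad returns 2^-24.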
9705bd8deadSopenharmony_ci 9715bd8deadSopenharmony_ci For the CVT (data type conversion) instruction, the "F16", "F32", "F64", 9725bd8deadSopenharmony_ci "S8", "S16", "S32", "S64", "U8", "U16", "U32", and "U64" storage modifiers 9735bd8deadSopenharmony_ci specify the data type of the vector operand and the converted result. Two 9745bd8deadSopenharmony_ci storage modifiers must be provided, which specify the data type of the 9755bd8deadSopenharmony_ci result and the operand, respectively. 9765bd8deadSopenharmony_ci 9775bd8deadSopenharmony_ci For the CVT (data type conversion) instruction, the "ROUND", "CEIL", 9785bd8deadSopenharmony_ci "FLR", and "TRUNC" modifiers specify how to round converted results that 9795bd8deadSopenharmony_ci are not directly representable using the data type of the result. 9805bd8deadSopenharmony_ci 9815bd8deadSopenharmony_ci 9825bd8deadSopenharmony_ci Modify Section 2.X.4.4, Program Texture Access 9835bd8deadSopenharmony_ci 9845bd8deadSopenharmony_ci (Extend the language describing the operation of texel offsets to cover 9855bd8deadSopenharmony_ci the new capability to load texel offsets from a register. Otherwise, 9865bd8deadSopenharmony_ci this functionality is unchanged from previous extensions.) 9875bd8deadSopenharmony_ci 9885bd8deadSopenharmony_ci <offset> is a 3-component signed integer vector, which can be specified 9895bd8deadSopenharmony_ci using constants embedded in the texture instruction according to the 9905bd8deadSopenharmony_ci <texOffsetImmed> grammar rule, or taken from a vector operand according to 9915bd8deadSopenharmony_ci the <texOffsetVar> grammar rule. The three components of the offset 9925bd8deadSopenharmony_ci vector are added to the computed <u>, <v>, and <w> texel locations prior 9935bd8deadSopenharmony_ci to sampling. 
    When using a constant offset, one, two, or three components
    may be specified in the instruction; if fewer than three are specified,
    the remaining offset components are zero. If no offsets are specified,
    all three components of the offset are treated as zero. A limited range
    of offset values is supported; the minimum and maximum <texOffset> values
    are implementation-dependent and given by MIN_PROGRAM_TEXEL_OFFSET_EXT and
    MAX_PROGRAM_TEXEL_OFFSET_EXT, respectively. A program will fail to load:

      * if the texture target specified in the instruction is 1D, ARRAY1D,
        SHADOW1D, or SHADOWARRAY1D, and the second or third component of a
        constant offset vector is non-zero;

      * if the texture target specified in the instruction is 2D, RECT,
        ARRAY2D, SHADOW2D, SHADOWRECT, or SHADOWARRAY2D, and the third
        component of a constant offset vector is non-zero;

      * if the texture target is CUBE, SHADOWCUBE, ARRAYCUBE, or
        SHADOWARRAYCUBE, and any component of a constant offset vector is
        non-zero -- texel offsets are not supported for cube map or buffer
        textures;

      * if any component of the constant offset vector of a TXGO instruction
        is non-zero -- non-constant offsets are provided in separate operands;

      * if any component of a constant offset vector is less than
        MIN_PROGRAM_TEXEL_OFFSET_EXT or greater than
        MAX_PROGRAM_TEXEL_OFFSET_EXT;

      * if a TXD or TXGO instruction specifies a non-constant texel offset
        according to the <texOffsetVar> grammar rule; or

      * if any instruction specifies a non-constant texel offset according
        to the <texOffsetVar> grammar rule and the texture target is CUBE,
        SHADOWCUBE, ARRAYCUBE, or SHADOWARRAYCUBE.

    The implementation-dependent minimum and maximum texel offset values also
    apply to texel offsets taken from a vector operand, but out-of-bounds or
    invalid component values will not prevent program loading since the
    offsets may not be computed until the program is executed. Components of
    the vector operand not needed for the texture target are ignored. The W
    component of the offset vector is always ignored; the Z component of the
    offset vector is ignored unless the target is 3D; the Y component is
    ignored if the target is 1D, ARRAY1D, SHADOW1D, or SHADOWARRAY1D. If the
    value of any non-ignored component of the vector operand is outside
    implementation-dependent limits, the results of the texture lookup are
    undefined. For all instructions except TXGO, the limits are
    MIN_PROGRAM_TEXEL_OFFSET_EXT and MAX_PROGRAM_TEXEL_OFFSET_EXT. For the
    TXGO instruction, the limits are MIN_PROGRAM_TEXTURE_GATHER_OFFSET_NV and
    MAX_PROGRAM_TEXTURE_GATHER_OFFSET_NV.
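    The load-time rules for constant offsets listed above can be sketched
    in C as follows. This is an illustrative model only: the TargetClass
    enum and both function names are hypothetical, the limits are passed in
    by the caller instead of being queried from the GL, and the additional
    TXGO rule (all constant components must be zero) is not modeled.

```c
#include <stdbool.h>

/* Hypothetical target groups; these names are not spec tokens. */
typedef enum {
    TARGET_1D_FAMILY,    /* 1D, ARRAY1D, SHADOW1D, SHADOWARRAY1D */
    TARGET_2D_FAMILY,    /* 2D, RECT, ARRAY2D, SHADOW2D, SHADOWRECT, SHADOWARRAY2D */
    TARGET_3D,
    TARGET_CUBE_FAMILY   /* CUBE, SHADOWCUBE, ARRAYCUBE, SHADOWARRAYCUBE */
} TargetClass;

/* Number of leading offset components (x, y, z) a target class consumes. */
int used_offset_components(TargetClass t)
{
    switch (t) {
    case TARGET_1D_FAMILY:   return 1;
    case TARGET_2D_FAMILY:   return 2;
    case TARGET_3D:          return 3;
    case TARGET_CUBE_FAMILY: return 0;  /* texel offsets unsupported */
    }
    return 0;
}

/* Returns true if a constant offset would pass the load-time checks:
 * components the target does not use must be zero, and every component
 * must lie within [min_offset, max_offset]. */
bool constant_offset_loads(TargetClass t, const int offset[3],
                           int min_offset, int max_offset)
{
    int used = used_offset_components(t);
    for (int i = 0; i < 3; i++) {
        if (i >= used && offset[i] != 0)
            return false;
        if (offset[i] < min_offset || offset[i] > max_offset)
            return false;
    }
    return true;
}
```

    With placeholder limits of -8 and 7, the offset (1, -1, 0) would be
    accepted for a 2D target, while (1, -1, 2) would be rejected for the
    same target and (1, 0, 0) would be rejected for a cube map target.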


    (Modify language describing how the check for using multiple targets on a
    single texture image unit works, to account for texture array variables
    where a single instruction may access one of multiple textures and the
    texture used is not known when the program is loaded.)

    A program will fail to load if it attempts to sample from multiple texture
    targets (including the SHADOW pseudo-targets) on the same texture image
    unit. For example, a program containing any two of the following
    instructions will fail to load:

      TEX out, coord, texture[0], 1D;
      TEX out, coord, texture[0], 2D;
      TEX out, coord, texture[0], ARRAY2D;
      TEX out, coord, texture[0], SHADOW2D;
      TEX out, coord, texture[0], 3D;

    For the purposes of this test, sampling using a texture variable declared
    as an array is treated as though all texture image units bound to the
    variable were accessed.
    A program containing the following
    instructions would fail to load:

      TEXTURE textures[] = { texture[0..3] };
      TEX out, coord, textures[2], 2D;   # acts as if all textures are used
      TEX out, coord, texture[1], 3D;

    (Add language describing texture gather component selection)

    The TXG and TXGO instructions provide the ability to assemble a
    four-component vector by taking the value of a single component of a
    multi-component texture from each of four texels. The component selected
    is identified by the <texImageUnitComp> grammar rule. Component selection
    is not supported for any other instruction, and a program will fail to
    load if <texImageUnitComp> is matched for any texture instruction other
    than TXG or TXGO.


    Add New Section 2.X.4.5, Program Memory Access

    Programs may load from or store to buffer object memory via the ATOM
    (atomic global memory operation), LDC (load constant), LOAD (global load),
    and STORE (global store) instructions.

    Load instructions read 8, 16, 32, 64, 128, or 256 bits of data from a
    source address to produce a four-component vector, according to the
    storage modifier specified with the instruction. The storage modifier has
The storage modifier has 10895bd8deadSopenharmony_ci three parts: 10905bd8deadSopenharmony_ci 10915bd8deadSopenharmony_ci - a base data type, "F", "S", or "U", specifying that the instruction 10925bd8deadSopenharmony_ci fetches floating-point, signed integer, or unsigned integer values, 10935bd8deadSopenharmony_ci respectively; 10945bd8deadSopenharmony_ci 10955bd8deadSopenharmony_ci - a component size, specifying that the components fetched by the 10965bd8deadSopenharmony_ci instruction have 8, 16, 32, or 64 bits; and 10975bd8deadSopenharmony_ci 10985bd8deadSopenharmony_ci - an optional component count, where "X2" and "X4" indicate that two or 10995bd8deadSopenharmony_ci four components be fetched, and no count indicates a single component 11005bd8deadSopenharmony_ci fetch. 11015bd8deadSopenharmony_ci 11025bd8deadSopenharmony_ci When the storage modifier specifies that fewer than four components should 11035bd8deadSopenharmony_ci be fetched, remaining components are filled with zeroes. When performing 11045bd8deadSopenharmony_ci an atomic memory operation (ATOM) or a global load (LOAD), the GPU address 11055bd8deadSopenharmony_ci is specified as an instruction operand. When performing a constant buffer 11065bd8deadSopenharmony_ci load (LDC), the GPU address is derived by adding the base address of the 11075bd8deadSopenharmony_ci bound buffer object to an offset specified as an instruction operand. 
    Given a GPU address <address> and a storage modifier <modifier>, the
    memory load can be described by the following code:

      result_t_vec BufferMemoryLoad(char *address, OpModifier modifier)
      {
        result_t_vec result = { 0, 0, 0, 0 };
        switch (modifier) {
        case F32:
            result.x = ((float32_t *)address)[0];
            break;
        case F32X2:
            result.x = ((float32_t *)address)[0];
            result.y = ((float32_t *)address)[1];
            break;
        case F32X4:
            result.x = ((float32_t *)address)[0];
            result.y = ((float32_t *)address)[1];
            result.z = ((float32_t *)address)[2];
            result.w = ((float32_t *)address)[3];
            break;
        case F64:
            result.x = ((float64_t *)address)[0];
            break;
        case F64X2:
            result.x = ((float64_t *)address)[0];
            result.y = ((float64_t *)address)[1];
            break;
        case F64X4:
            result.x = ((float64_t *)address)[0];
            result.y = ((float64_t *)address)[1];
            result.z = ((float64_t *)address)[2];
            result.w = ((float64_t *)address)[3];
            break;
        case S8:
            result.x = ((int8_t *)address)[0];
            break;
        case S16:
            result.x = ((int16_t *)address)[0];
            break;
        case S32:
            result.x = ((int32_t *)address)[0];
            break;
        case S32X2:
            result.x = ((int32_t *)address)[0];
            result.y = ((int32_t *)address)[1];
            break;
        case S32X4:
            result.x = ((int32_t *)address)[0];
            result.y = ((int32_t *)address)[1];
            result.z = ((int32_t *)address)[2];
            result.w = ((int32_t *)address)[3];
            break;
        case S64:
            result.x = ((int64_t *)address)[0];
            break;
        case S64X2:
            result.x = ((int64_t *)address)[0];
            result.y = ((int64_t *)address)[1];
            break;
        case S64X4:
            result.x = ((int64_t *)address)[0];
            result.y = ((int64_t *)address)[1];
            result.z = ((int64_t *)address)[2];
            result.w = ((int64_t *)address)[3];
            break;
        case U8:
            result.x = ((uint8_t *)address)[0];
            break;
        case U16:
            result.x = ((uint16_t *)address)[0];
            break;
        case U32:
            result.x = ((uint32_t *)address)[0];
            break;
        case U32X2:
            result.x = ((uint32_t *)address)[0];
            result.y = ((uint32_t *)address)[1];
            break;
        case U32X4:
            result.x = ((uint32_t *)address)[0];
            result.y = ((uint32_t *)address)[1];
            result.z = ((uint32_t *)address)[2];
            result.w = ((uint32_t *)address)[3];
            break;
        case U64:
            result.x = ((uint64_t *)address)[0];
            break;
        case U64X2:
            result.x = ((uint64_t *)address)[0];
            result.y = ((uint64_t *)address)[1];
            break;
        case U64X4:
            result.x = ((uint64_t *)address)[0];
            result.y = ((uint64_t *)address)[1];
            result.z = ((uint64_t *)address)[2];
            result.w = ((uint64_t *)address)[3];
            break;
        }
        return result;
      }

    Store instructions write the contents of a four-component vector operand
    into 8, 16, 32, 64, 128, or 256 bits, according to the storage modifier
    specified with the instruction. The storage modifiers supported by stores
    are identical to those supported for loads.
    Given a GPU address <address>, a vector operand <operand> containing the
    data to be stored, and a storage modifier <modifier>, the memory store can
    be described by the following code:

      void BufferMemoryStore(char *address, operand_t_vec operand,
                             OpModifier modifier)
      {
        switch (modifier) {
        case F32:
            ((float32_t *)address)[0] = operand.x;
            break;
        case F32X2:
            ((float32_t *)address)[0] = operand.x;
            ((float32_t *)address)[1] = operand.y;
            break;
        case F32X4:
            ((float32_t *)address)[0] = operand.x;
            ((float32_t *)address)[1] = operand.y;
            ((float32_t *)address)[2] = operand.z;
            ((float32_t *)address)[3] = operand.w;
            break;
        case F64:
            ((float64_t *)address)[0] = operand.x;
            break;
        case F64X2:
            ((float64_t *)address)[0] = operand.x;
            ((float64_t *)address)[1] = operand.y;
            break;
        case F64X4:
            ((float64_t *)address)[0] = operand.x;
            ((float64_t *)address)[1] = operand.y;
            ((float64_t *)address)[2] = operand.z;
            ((float64_t *)address)[3] = operand.w;
            break;
        case S8:
            ((int8_t *)address)[0] = operand.x;
            break;
        case S16:
            ((int16_t *)address)[0] = operand.x;
            break;
        case S32:
            ((int32_t *)address)[0] = operand.x;
            break;
        case S32X2:
            ((int32_t *)address)[0] = operand.x;
            ((int32_t *)address)[1] = operand.y;
            break;
        case S32X4:
            ((int32_t *)address)[0] = operand.x;
            ((int32_t *)address)[1] = operand.y;
            ((int32_t *)address)[2] = operand.z;
            ((int32_t *)address)[3] = operand.w;
            break;
        case S64:
            ((int64_t *)address)[0] = operand.x;
            break;
        case S64X2:
            ((int64_t *)address)[0] = operand.x;
            ((int64_t *)address)[1] = operand.y;
            break;
        case S64X4:
            ((int64_t *)address)[0] = operand.x;
            ((int64_t *)address)[1] = operand.y;
            ((int64_t *)address)[2] = operand.z;
            ((int64_t *)address)[3] = operand.w;
            break;
        case U8:
            ((uint8_t *)address)[0] = operand.x;
            break;
        case U16:
            ((uint16_t *)address)[0] = operand.x;
            break;
        case U32:
            ((uint32_t *)address)[0] = operand.x;
            break;
        case U32X2:
            ((uint32_t *)address)[0] = operand.x;
            ((uint32_t *)address)[1] = operand.y;
            break;
        case U32X4:
            ((uint32_t *)address)[0] = operand.x;
            ((uint32_t *)address)[1] = operand.y;
            ((uint32_t *)address)[2] = operand.z;
            ((uint32_t *)address)[3] = operand.w;
            break;
        case U64:
            ((uint64_t *)address)[0] = operand.x;
            break;
        case U64X2:
            ((uint64_t *)address)[0] = operand.x;
            ((uint64_t *)address)[1] = operand.y;
            break;
        case U64X4:
            ((uint64_t *)address)[0] = operand.x;
            ((uint64_t *)address)[1] = operand.y;
            ((uint64_t *)address)[2] = operand.z;
            ((uint64_t *)address)[3] = operand.w;
            break;
        }
      }

    If a global load or store accesses a memory address that does not
    correspond to a buffer object made resident by MakeBufferResidentNV, the
    results of the operation are undefined and may produce a fault resulting
    in application termination.
    If a load accesses a buffer object made
    resident with an <access> parameter of WRITE_ONLY, or if a store accesses
    a buffer object made resident with an <access> parameter of READ_ONLY, the
    results of the operation are also undefined and may lead to application
    termination.

    The address used for global memory loads or stores or offset used for
    constant buffer loads must be aligned to the fetch size corresponding to
    the storage opcode modifier. For S8 and U8, the offset has no alignment
    requirements. For S16 and U16, the offset must be a multiple of two basic
    machine units. For F32, S32, and U32, the offset must be a multiple of
    four. For F32X2, F64, S32X2, S64, U32X2, and U64, the offset must be a
    multiple of eight. For F32X4, F64X2, S32X4, S64X2, U32X4, and U64X2, the
    offset must be a multiple of sixteen. For F64X4, S64X4, and U64X4, the
    offset must be a multiple of thirty-two. If an offset is not correctly
    aligned, the values returned by a buffer memory load will be undefined,
    and the effects of a buffer memory store will also be undefined.

    Global and image memory accesses in assembly programs are weakly ordered
    and may require synchronization relative to other operations in the OpenGL
    pipeline.
    The ordering and synchronization mechanisms described in
    Section 2.14.X (of the EXT_shader_image_load_store extension
    specification) for shaders using the OpenGL Shading Language apply equally
    to loads, stores, and atomics performed in assembly programs.


    Modify Section 2.X.6.Y of the NV_fragment_program4 specification

    (add new option section)

    + Early Per-Fragment Tests (NV_early_fragment_tests)

    If a fragment program specifies the "NV_early_fragment_tests" option, the
    depth and stencil tests will be performed prior to fragment program
    invocation, as described in Section 3.X.


    Modify Section 2.X.7.Y of the NV_geometry_program4 specification

    (Simply add the new input primitive type "PATCHES" to the list of tokens
    allowed by the "PRIMITIVE_IN" declaration.)

    - Input Primitive Type (PRIMITIVE_IN)

    The PRIMITIVE_IN statement declares the type of primitives seen by a
    geometry program. The single argument must be one of "POINTS", "LINES",
    "LINES_ADJACENCY", "TRIANGLES", "TRIANGLES_ADJACENCY", or "PATCHES".


    (Add a new optional program declaration to declare a geometry shader that
    is run <N> times per primitive.)

    Geometry programs support three types of mandatory declaration statements,
    as described below. Each of the three must be included exactly once in
    the geometry program.

    ...

    Geometry programs also support one optional declaration statement.

    - Program Invocation Count (INVOCATIONS)

    The INVOCATIONS statement declares the number of times the geometry
    program is run on each primitive processed. The single argument must be a
    positive integer less than or equal to the value of the
    implementation-dependent limit MAX_GEOMETRY_PROGRAM_INVOCATIONS_NV. Each
    invocation of the geometry program will have the same inputs and outputs
    except for the built-in input variable "primitive.invocation". This
    variable will be an integer between 0 and <n>-1, where <n> is the declared
    number of invocations. If omitted, the program invocation count is one.


    Section 2.X.8.Z, ATOM: Atomic Global Memory Operation

    The ATOM instruction performs an atomic global memory operation by reading
    from memory at the address specified by the second unsigned integer scalar
    operand, computing a new value based on the value read from memory and the
    first (vector) operand, and then writing the result back to the same
    memory address.
    The memory transaction is atomic, guaranteeing that no
    other write to the memory accessed will occur between the time it is read
    and written by the ATOM instruction. The result of the ATOM instruction
    is the scalar value read from memory.

    The ATOM instruction has two required instruction modifiers. The atomic
    modifier specifies the type of operation to be performed. The storage
    modifier specifies the size and data type of the operand read from memory
    and the base data type of the operation used to compute the value to be
    written to memory.

      atomic     storage
      modifier   modifiers            operation
      --------   ------------------   --------------------------------------
       ADD       U32, S32, U64        compute a sum
       MIN       U32, S32             compute minimum
       MAX       U32, S32             compute maximum
       IWRAP     U32                  increment memory, wrapping at operand
       DWRAP     U32                  decrement memory, wrapping at operand
       AND       U32, S32             compute bit-wise AND
       OR        U32, S32             compute bit-wise OR
       XOR       U32, S32             compute bit-wise XOR
       EXCH      U32, S32, U64        exchange memory with operand
       CSWAP     U32, S32, U64        compare-and-swap

      Table X.Y, Supported atomic and storage modifiers for the ATOM
      instruction.
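    The "wrapping at operand" behavior of IWRAP and DWRAP in Table X.Y may
    be easier to follow with concrete values. The following C sketch
    computes only the value that would be written back for the U32 storage
    modifier; it does not model atomicity, and the function names are
    hypothetical.

```c
#include <stdint.h>

/* Value IWRAP would write back: increment, wrapping to zero once the value
 * in memory has reached (or exceeds) the operand. */
uint32_t iwrap_next(uint32_t mem, uint32_t operand)
{
    return (mem >= operand) ? 0u : mem + 1u;
}

/* Value DWRAP would write back: decrement, wrapping to the operand when the
 * value in memory is zero or is already above the operand. */
uint32_t dwrap_next(uint32_t mem, uint32_t operand)
{
    return (mem == 0u || mem > operand) ? operand : mem - 1u;
}
```

    With an operand of 3, repeated IWRAP operations on a location that
    starts at zero step through 0, 1, 2, 3 and then return to 0; repeated
    DWRAP operations step through 3, 2, 1, 0 and then return to 3.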

    Not all storage modifiers are supported by ATOM, and the set of modifiers
    allowed for any given instruction depends on the atomic modifier
    specified. Table X.Y enumerates the set of atomic modifiers supported by
    the ATOM instruction, and the storage modifiers allowed for each.

      tmp0 = VectorLoad(op0);
      address = ScalarLoad(op1);
      result = BufferMemoryLoad(address, storageModifier);
      switch (atomicModifier) {
      case ADD:
        writeval = tmp0.x + result;
        break;
      case MIN:
        writeval = min(tmp0.x, result);
        break;
      case MAX:
        writeval = max(tmp0.x, result);
        break;
      case IWRAP:
        writeval = (result >= tmp0.x) ? 0 : result+1;
        break;
      case DWRAP:
        writeval = (result == 0 || result > tmp0.x) ? tmp0.x : result-1;
        break;
      case AND:
        writeval = tmp0.x & result;
        break;
      case OR:
        writeval = tmp0.x | result;
        break;
      case XOR:
        writeval = tmp0.x ^ result;
        break;
      case EXCH:
        writeval = tmp0.x;  // exchange memory with operand
        break;
      case CSWAP:
        if (result == tmp0.x) {
          writeval = tmp0.y;
        } else {
          return result;  // no memory store
        }
        break;
      }
      BufferMemoryStore(address, writeval, storageModifier);

    ATOM performs a scalar atomic operation. The <y>, <z>, and <w> components
    of the result vector are undefined.

    ATOM supports no base data type modifiers, but requires exactly one
    storage modifier. The base data types of the result vector and the first
    (vector) operand are derived from the storage modifier. The second
    operand is always interpreted as a scalar unsigned integer.


    Section 2.X.8.Z, BFE: Bitfield Extract

    The BFE instruction performs a component-wise bit extraction of the
    second vector operand to yield a result vector.
    For
    each component, the number of bits extracted is given by the x component
    of the first vector operand, and the bit number of the least significant
    bit extracted is given by the y component of the first vector operand.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = BitfieldExtract(tmp0.x, tmp0.y, tmp1.x);
      result.y = BitfieldExtract(tmp0.x, tmp0.y, tmp1.y);
      result.z = BitfieldExtract(tmp0.x, tmp0.y, tmp1.z);
      result.w = BitfieldExtract(tmp0.x, tmp0.y, tmp1.w);

    If the number of bits to extract is zero, zero is returned. The results
    of bitfield extraction are undefined

      * if the number of bits to extract or the starting offset is negative,
      * if the sum of the number of bits to extract and the starting offset
        is greater than the total number of bits in the operand/result, or
      * if the starting offset is greater than or equal to the total number of
        bits in the operand/result.

      Type BitfieldExtract(Type bits, Type offset, Type value)
      {
        if (bits < 0 || offset < 0 || offset >= TotalBits(Type) ||
            bits + offset > TotalBits(Type)) {
          /* result undefined */
        } else if (bits == 0) {
          return 0;
        } else {
          return (value << (TotalBits(Type) - (bits+offset))) >>
                 (TotalBits(Type) - bits);
        }
      }

    BFE supports only signed and unsigned integer data type modifiers. For
    signed integer data types, the extracted value is sign-extended (i.e.,
    filled with ones if the most significant bit extracted is one and filled
    with zeroes otherwise). For unsigned integer data types, the extracted
    value is zero-extended.


    Section 2.X.8.Z, BFI: Bitfield Insert

    The BFI instruction performs a component-wise bitfield insertion of the
    second vector operand into the third vector operand to yield a result
    vector. For each component, the <n> least significant bits are extracted
    from the corresponding component of the second vector operand, where <n>
    is given by the x component of the first vector operand. Those bits are
    merged into the corresponding component of the third vector operand,
    replacing bits <b> through <b>+<n>-1, to produce the result.
    The bit offset <b> is specified by the y component of the first operand.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = BitfieldInsert(tmp0.x, tmp0.y, tmp1.x, tmp2.x);
      result.y = BitfieldInsert(tmp0.x, tmp0.y, tmp1.y, tmp2.y);
      result.z = BitfieldInsert(tmp0.x, tmp0.y, tmp1.z, tmp2.z);
      result.w = BitfieldInsert(tmp0.x, tmp0.y, tmp1.w, tmp2.w);

    The results of bitfield insertion are undefined

      * if the number of bits to insert or the starting offset is negative,
      * if the sum of the number of bits to insert and the starting offset
        is greater than the total number of bits in the operand/result, or
      * if the starting offset is greater than or equal to the total number of
        bits in the operand/result.
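
    As a rough illustration of the insertion rule, the following C sketch
    handles 32-bit unsigned components. The name bfi_u32 is hypothetical,
    the argument order (bits, offset, src, dst) mirrors the pseudo-code,
    and callers are assumed to avoid the undefined cases listed above. The
    full-width case is handled separately because shifting a 32-bit value
    by 32 is not defined in C.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the specification): insert the <bits>
 * least significant bits of src into dst, starting at bit <offset>.
 * Assumes offset < 32 and bits + offset <= 32. */
uint32_t bfi_u32(uint32_t bits, uint32_t offset, uint32_t src, uint32_t dst)
{
    if (bits == 32) {
        return src;   /* only valid with offset == 0: the whole word is replaced */
    }
    /* Mask covering bits offset .. offset+bits-1; empty when bits == 0. */
    uint32_t mask = ((UINT32_C(1) << bits) - 1) << offset;
    return ((src << offset) & mask) | (dst & ~mask);
}
```

    For example, bfi_u32(4, 8, 0xA, 0xFFFF0000) replaces bits 8 through 11
    of the destination and yields 0xFFFF0A00.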

      Type BitfieldInsert(Type bits, Type offset, Type src, Type dst)
      {
        if (bits < 0 || offset < 0 || offset >= TotalBits(Type) ||
            bits + offset > TotalBits(Type)) {
          /* result undefined */
        } else if (bits == TotalBits(Type)) {
          return src;
        } else {
          Type mask = ((1 << bits) - 1) << offset;
          return ((src << offset) & mask) | (dst & (~mask));
        }
      }

    BFI supports only signed and unsigned integer data type modifiers. If no
    type modifier is specified, the operand and result vectors are treated as
    signed integers.


    Section 2.X.8.Z, BFR: Bitfield Reverse

    The BFR instruction performs a component-wise bit reversal of the single
    vector operand to produce a result vector. Bit reversal is performed by
    exchanging the most and least significant bits, the second-most and
    second-least significant bits, and so on.

      tmp0 = VectorLoad(op0);
      result.x = BitReverse(tmp0.x);
      result.y = BitReverse(tmp0.y);
      result.z = BitReverse(tmp0.z);
      result.w = BitReverse(tmp0.w);

    BFR supports only signed and unsigned integer data type modifiers.
    If no type modifier is specified, the operand and result vectors are
    treated as signed integers.


    Section 2.X.8.Z, BTC: Bit Count

    The BTC instruction performs a component-wise bit count of the single
    source vector to yield a result vector. Each component of the result
    vector contains the number of one bits in the corresponding component of
    the source vector.

      tmp0 = VectorLoad(op0);
      result.x = BitCount(tmp0.x);
      result.y = BitCount(tmp0.y);
      result.z = BitCount(tmp0.z);
      result.w = BitCount(tmp0.w);

    BTC supports only signed and unsigned integer data type modifiers. If no
    type modifier is specified, the operand and the result are treated as
    signed integers.


    Section 2.X.8.Z, BTFL: Find Least Significant Bit

    The BTFL instruction searches for the least significant one bit of each
    component of the single source vector, yielding a result vector comprising
    the bit number of the located bit for each component.
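
    The per-component operations used by BFR, BTC, and BTFL can be sketched
    in C for 32-bit values. These are illustrative loops only; the function
    names are hypothetical, hardware would not implement the operations
    this way, and find_lsb_u32 follows the signed convention given later in
    this section (-1 when no bit is set).

```c
#include <stdint.h>

/* Hypothetical helper: swap bit 0 with bit 31, bit 1 with bit 30, etc. */
uint32_t bit_reverse_u32(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (v & 1u);  /* append the current lowest bit of v */
        v >>= 1;
    }
    return r;
}

/* Hypothetical helper: count the one bits in v. */
int32_t bit_count_u32(uint32_t v)
{
    int32_t n = 0;
    while (v != 0) {
        v &= v - 1;               /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* Hypothetical helper: bit number of the least significant one bit,
 * or -1 if v is zero. */
int32_t find_lsb_u32(uint32_t v)
{
    if (v == 0) {
        return -1;
    }
    int32_t n = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        ++n;
    }
    return n;
}
```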

      tmp0 = VectorLoad(op0);
      result.x = FindLSB(tmp0.x);
      result.y = FindLSB(tmp0.y);
      result.z = FindLSB(tmp0.z);
      result.w = FindLSB(tmp0.w);

    BTFL supports only signed and unsigned integer data type modifiers. For
    unsigned integer data types, the search will yield the bit number of the
    least significant one bit in each component, or the maximum integer (all
    bits are ones) if the source vector component is zero. For signed data
    types, the search will yield the bit number of the least significant one
    bit in each component, or -1 if the source vector component is zero. If
    no type modifier is specified, the operand and the result are treated as
    signed integers.


    Section 2.X.8.Z, BTFM: Find Most Significant Bit

    The BTFM instruction searches for the most significant bit of each
    component of the single source vector, yielding a result vector comprising
    the bit number of the located bit for each component.

      tmp0 = VectorLoad(op0);
      result.x = FindMSB(tmp0.x);
      result.y = FindMSB(tmp0.y);
      result.z = FindMSB(tmp0.z);
      result.w = FindMSB(tmp0.w);

    BTFM supports only signed and unsigned integer data type modifiers.
    For unsigned integer data types, the search will yield the bit number of
    the most significant one bit in each component, or the maximum integer
    (all bits are ones) if the source vector component is zero. For signed
    data types, the search will yield the bit number of the most significant
    one bit if the source value is positive, the bit number of the most
    significant zero bit if the source value is negative, or -1 if the source
    value is zero. If no type modifier is specified, the operand and the
    result are treated as signed integers.


    Section 2.X.8.Z, CVT: Data Type Conversion

    The CVT instruction converts each component of the single source vector
    from one specified data type to another to yield a result vector.

      tmp0 = VectorLoad(op0);
      result = DataTypeConvert(tmp0);

    The CVT instruction requires two storage modifiers. The first specifies
    the data type of the result components; the second specifies the data type
    of the operand components. The supported storage modifiers are F16, F32,
    F64, S8, S16, S32, S64, U8, U16, U32, and U64. A storage modifier of
    "F16" indicates a source or destination that is treated as having a
    floating-point type, but whose sixteen least significant bits describe a
    16-bit floating-point value using the encoding provided in Section 2.1.2.
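
    To make the "F16" interpretation concrete, the C sketch below decodes
    the sixteen least significant bits of a component into a 32-bit float.
    It assumes the common 16-bit layout of one sign bit, five exponent bits
    (bias 15), and ten mantissa bits; Section 2.1.2 remains the
    authoritative definition, and the function name is hypothetical.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: decode a 16-bit floating-point bit pattern,
 * assuming a 1/5/10 sign/exponent/mantissa layout with exponent bias 15. */
float f16_bits_to_float(uint16_t h)
{
    int sign     = (h >> 15) & 0x1;
    int exponent = (h >> 10) & 0x1F;
    int mantissa = h & 0x3FF;
    float magnitude;

    if (exponent == 0) {
        /* Zero or denormal: mantissa * 2^-24. */
        magnitude = ldexpf((float)mantissa, -24);
    } else if (exponent == 31) {
        /* All-ones exponent: infinity or NaN. */
        magnitude = (mantissa == 0) ? INFINITY : NAN;
    } else {
        /* Normalized: (1 + mantissa/1024) * 2^(exponent - 15). */
        magnitude = ldexpf((float)(mantissa + 1024), exponent - 25);
    }
    return sign ? -magnitude : magnitude;
}
```

    Under these assumptions, the bit pattern 0x3C00 decodes to 1.0 and
    0x7BFF decodes to 65504.0, the largest finite value in this layout.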

    If the component size of the source register doesn't match the size of the
    specified operand data type, the source register components are first
    interpreted as a value with the same base data type as the operand and
    converted to the operand data type. The operand components are then
    converted to the result data type. Finally, if the component size of the
    destination register doesn't match the specified result data type, the
    result components are converted to values of the same base data type with
    a size matching the result register's component size.

    Data type conversion is performed by first converting the source
    components to an infinite-precision value of the destination data type,
    and then converting to the result data type. When converting between
    floating-point and integer values, integer values are never interpreted as
    being normalized to [0,1] or [-1,+1]. Converting the floating-point
    special values -INF, +INF, and NaN to integers will yield undefined
    results.

    When converting from a non-integral floating-point value to an integer,
    one of the two integers closest in value to the floating-point value is
    chosen according to the rounding instruction modifier. If "CEIL" or "FLR"
    is specified, the larger or smaller value, respectively, is chosen. If
    "TRUNC" is specified, the value nearest to zero is chosen.
    If "ROUND" is specified and one integer is nearer in value to the original
    floating-point value, it is chosen; otherwise, the even integer is chosen.
    "ROUND" is used if no rounding modifier is specified.

    When converting from the infinite-precision intermediate value to the
    destination data type:

      * Floating-point values not exactly representable in the destination
        data type are rounded to one of the two nearest values in the
        destination type according to the rounding modifier. Note that the
        results of float-to-float conversion are not automatically rounded to
        integer values, even if a rounding modifier such as CEIL or FLR is
        specified.

      * Integer values are clamped to the closest value representable in the
        result data type if the "SAT" (saturation) modifier is specified.

      * Integer values drop the most significant bits if the "SAT" modifier is
        not specified.

    Negation and absolute value operators are not supported on the source
    operand; a program using such operators will fail to compile.

    CVT supports no data type modifiers; the type of the operand and result
    vectors is fully specified by the required storage modifiers.


    Section 2.X.8.Z, EMIT: Emit Vertex

    (Modify the description of the EMIT opcode to deal with the interaction
    with multiple vertex streams added by ARB_transform_feedback3. For more
    information on vertex streams, see ARB_transform_feedback3.)

    The EMIT instruction emits a new vertex to be added to the current output
    primitive for vertex stream zero. The attributes of the emitted vertex
    are given by the current values of the vertex result variables. After the
    EMIT instruction completes, a new vertex is started and all result
    variables become undefined.


    Section 2.X.8.Z, EMITS: Emit Vertex to Stream

    (Add new geometry program opcode; the EMITS instruction is not supported
    for any other program types. For more information on vertex streams, see
    ARB_transform_feedback3.)

    The EMITS instruction emits a new vertex to be added to the current output
    primitive for the vertex stream specified by the single signed integer
    scalar operand. The attributes of the emitted vertex are given by the
    current values of the vertex result variables. After the EMITS
    instruction completes, a new vertex is started and all result variables
    become undefined.

    If the specified stream is negative or greater than or equal to the
    implementation-dependent number of vertex streams
    (MAX_VERTEX_STREAMS_NV), the results of the instruction are undefined.


    Section 2.X.8.Z, IPAC: Interpolate at Centroid

    The IPAC instruction generates a result vector by evaluating the fragment
    attribute named by the single vector operand at the centroid location.
    The result vector would be identical to the value obtained by a MOV
    instruction if the attribute variable were declared using the CENTROID
    modifier.

    When interpolating an attribute variable with this instruction, the
    CENTROID and SAMPLE attribute variable modifiers are ignored. The FLAT
    and NOPERSPECTIVE variable modifiers operate normally.

      tmp0 = Interpolate(op0, x_pixel + x_centroid, y_pixel + y_centroid);
      result = tmp0;

    IPAC supports only floating-point data type modifiers. A program will
    fail to load if it contains an IPAC instruction whose single operand is
    not a fragment program attribute variable or matches the "fragment.facing"
    or "primitive.id" binding.


    Section 2.X.8.Z, IPAO: Interpolate with Offset

    The IPAO instruction generates a result vector by evaluating the fragment
    attribute named by the first vector operand at an offset from the pixel
    center given by the x and y components of the second vector operand. The
    z and w components of the second vector operand are ignored. The (x,y)
    position used for interpolating the attribute variable is obtained by
    adding the (x,y) offsets in the second vector operand to the (x,y)
    position of the pixel center.

    The range of offsets supported by the IPAO instruction is
    implementation-dependent. The position used to interpolate the attribute
    variable is undefined if the x or y component of the second operand is
    less than MIN_FRAGMENT_INTERPOLATION_OFFSET_NV or greater than
    MAX_FRAGMENT_INTERPOLATION_OFFSET_NV. Additionally, the granularity of
    offsets may be limited. The (x,y) value may be snapped to a fixed
    sub-pixel grid with the number of subpixel bits given by
    FRAGMENT_PROGRAM_INTERPOLATION_OFFSET_BITS_NV.

    When interpolating an attribute variable with this instruction, the
    CENTROID and SAMPLE attribute variable modifiers are ignored. The FLAT
    and NOPERSPECTIVE variable modifiers operate normally.

      tmp1 = VectorLoad(op1);
      tmp0 = Interpolate(op0, x_pixel + tmp1.x, y_pixel + tmp1.y);
      result = tmp0;

    IPAO supports only floating-point data type modifiers. A program will
    fail to load if it contains an IPAO instruction whose first operand is not
    a fragment program attribute variable or matches the "fragment.facing" or
    "primitive.id" binding.


    Section 2.X.8.Z, IPAS: Interpolate at Sample Location

    The IPAS instruction generates a result vector by evaluating the fragment
    attribute named by the first vector operand at the location of the
    pixel's sample whose sample number is given by the second integer scalar
    operand. If multisample buffers are not available (SAMPLE_BUFFERS is
    zero), the attribute will be evaluated at the pixel center. If the sample
    number given by the second operand does not exist, the position used to
    interpolate the attribute is undefined.

    When interpolating an attribute variable with this instruction, the
    CENTROID and SAMPLE attribute variable modifiers are ignored. The FLAT
    and NOPERSPECTIVE variable modifiers operate normally.

      sample = ScalarLoad(op1);
      tmp1 = SampleOffset(sample);
      tmp0 = Interpolate(op0, x_pixel + tmp1.x, y_pixel + tmp1.y);
      result = tmp0;

    IPAS supports only floating-point data type modifiers. A program will
    fail to load if it contains an IPAS instruction whose first operand is not
    a fragment program attribute variable or matches the "fragment.facing" or
    "primitive.id" binding.


    Section 2.X.8.Z, LDC: Load from Constant Buffer

    The LDC instruction loads a vector operand from a buffer object to yield a
    result vector. The operand used for the LDC instruction must correspond
    to a parameter buffer variable declared using the "CBUFFER" statement; a
    program will fail to load if any other type of operand is used in an LDC
    instruction.

      result = BufferMemoryLoad(&op0, storageModifier);

    A base operand vector is fetched from memory as described in Section
    2.X.4.5, with the GPU address derived from the binding corresponding to
    the operand. A final operand vector is derived from the base operand
    vector by applying swizzle, negation, and absolute value operand modifiers
    as described in Section 2.X.4.2.

    The amount of memory in any given buffer object binding accessible by the
    LDC instruction may be limited. If any component fetched by the LDC
    instruction extends 4*<n> or more basic machine units from the beginning
    of the buffer object binding, where <n> is the implementation-dependent
    constant MAX_PROGRAM_PARAMETER_BUFFER_SIZE_NV, the value fetched for that
    component will be undefined.

    LDC supports no base data type modifiers, but requires exactly one storage
    modifier. The base data types of the operand and result vectors are
    derived from the storage modifier.


    Section 2.X.8.Z, LOAD: Global Load

    The LOAD instruction generates a result vector by reading an address from
    the single unsigned integer scalar operand and fetching data from buffer
    object memory, as described in Section 2.X.4.5.

      address = ScalarLoad(op0);
      result = BufferMemoryLoad(address, storageModifier);

    LOAD supports no base data type modifiers, but requires exactly one
    storage modifier. The base data type of the result vector is derived from
    the storage modifier. The single scalar operand is always interpreted as
    an unsigned integer.


    Section 2.X.8.Z, MEMBAR: Memory Barrier

    The MEMBAR instruction synchronizes memory transactions to ensure that
    memory transactions resulting from any instruction executed by the thread
    prior to the MEMBAR instruction complete prior to any memory transactions
    issued after the instruction.

    MEMBAR has no operands and generates no result.


    Section 2.X.8.Z, PK64: Pack 64-Bit Component

    The PK64 instruction reads the four components of the single vector
    operand as 32-bit values, packs the bit representations of these into a
    pair of 64-bit values, and replicates those to produce a four-component
    result vector. The "x" and "y" components of the operand are packed to
    produce the "x" and "z" components of the result vector; the "z" and "w"
    components of the operand are packed to produce the "y" and "w" components
    of the result vector. The PK64 instruction can be reversed by the UP64
    instruction below.

    This instruction is intended to allow a program to reconstruct 64-bit
    integer or floating-point values generated by the application but passed
    to the GL as two 32-bit values taken from adjacent words in memory. The
    ability to use this technique depends on how the 64-bit value is stored in
    memory.
    For "little-endian" processors, the first 32-bit value holds the least
    significant 32 bits of the 64-bit value. For "big-endian" processors, the
    first 32-bit value holds the most significant 32 bits of the 64-bit value.
    This reconstruction assumes that the first 32-bit word comes from the x
    component of the operand and the second 32-bit word comes from the y
    component. The method used to construct a 64-bit value from a pair of
    32-bit values depends on the processor type.

      tmp = VectorLoad(op0);

      if (underlying system is little-endian) {
        result.x = RawBits(tmp.x) | (RawBits(tmp.y) << 32);
        result.y = RawBits(tmp.z) | (RawBits(tmp.w) << 32);
        result.z = RawBits(tmp.x) | (RawBits(tmp.y) << 32);
        result.w = RawBits(tmp.z) | (RawBits(tmp.w) << 32);
      } else {
        result.x = RawBits(tmp.y) | (RawBits(tmp.x) << 32);
        result.y = RawBits(tmp.w) | (RawBits(tmp.z) << 32);
        result.z = RawBits(tmp.y) | (RawBits(tmp.x) << 32);
        result.w = RawBits(tmp.w) | (RawBits(tmp.z) << 32);
      }

    PK64 supports integer and floating-point data type modifiers, which
    specify the base data type of the operand and result. The single vector
    operand is always treated as having 32-bit components, and the result is
    treated as a vector with 64-bit components.
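
    The endian-dependent packing in the pseudo-code can be sketched in C as
    follows. The function names and the explicit little_endian flag are
    hypothetical; a real implementation knows the byte order of the
    underlying system rather than taking it as a parameter.

```c
#include <stdint.h>

/* Hypothetical helper: combine two adjacent 32-bit words into one 64-bit
 * value. "first" is the word at the lower memory address. */
uint64_t pk64_pair(uint32_t first, uint32_t second, int little_endian)
{
    if (little_endian) {
        /* First word holds the least significant 32 bits. */
        return (uint64_t)first | ((uint64_t)second << 32);
    }
    /* Big-endian: first word holds the most significant 32 bits. */
    return (uint64_t)second | ((uint64_t)first << 32);
}

/* Hypothetical helper: pack (x,y) and (z,w), then replicate the two
 * 64-bit values into a four-component result as PK64 does. */
void pk64(const uint32_t op[4], uint64_t result[4], int little_endian)
{
    uint64_t xy = pk64_pair(op[0], op[1], little_endian);
    uint64_t zw = pk64_pair(op[2], op[3], little_endian);
    result[0] = xy;  /* x */
    result[1] = zw;  /* y */
    result[2] = xy;  /* z */
    result[3] = zw;  /* w */
}
```

    For the 64-bit value 0x0123456789ABCDEF, a little-endian system stores
    0x89ABCDEF as the first word, while a big-endian system stores
    0x01234567 first; pk64_pair rebuilds the same value in both cases.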
    The encoding performed by PK64 can be reversed using the UP64 instruction.

    A program will fail to load if it contains a PK64 instruction that writes
    its results to a variable not declared as "LONG".


    Section 2.X.8.Z, STORE: Global Store

    The STORE instruction reads an address from the second unsigned integer
    scalar operand and writes the contents of the first vector operand to
    buffer object memory at that address, as described in Section 2.X.4.5.
    This instruction generates no result.

      tmp0 = VectorLoad(op0);
      address = ScalarLoad(op1);
      BufferMemoryStore(address, tmp0, storageModifier);

    STORE supports no base data type modifiers, but requires exactly one
    storage modifier. The base data type of the vector components of the
    first operand is derived from the storage modifier. The second operand is
    always interpreted as an unsigned integer scalar.


    Section 2.X.8.Z, TEX: Texture Sample

    (Modify the instruction pseudo-code to account for texel offsets no
    longer needing to be immediate arguments.)

      tmp = VectorLoad(op0);
      if (instruction has variable texel offset) {
        itmp = VectorLoad(op1);
      } else {
        itmp = instruction.texelOffset;
      }
      ddx = ComputePartialsX(tmp);
      ddy = ComputePartialsY(tmp);
      lambda = ComputeLOD(ddx, ddy);
      result = TextureSample(tmp, lambda, ddx, ddy, itmp);


    Section 2.X.8.Z, TGALL: Test for All Non-Zero in a Thread Group

    The TGALL instruction produces a result vector by reading a vector operand
    for each active thread in the current thread group and comparing each
    component to zero. A result vector component contains a TRUE value
    (described below) if the value of the corresponding component in the
    operand vector is non-zero for all active threads, and a FALSE value
    otherwise.

    An implementation may choose to arrange program threads into thread
    groups, and execute an instruction simultaneously for each thread in the
    group. If the TGALL instruction is contained inside conditional flow
    control blocks and not all threads in the group execute the instruction,
    the operand values for threads not executing the instruction have no
    bearing on the value returned. The method used to arrange threads into
    groups is undefined.
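
    The role of inactive threads can be made explicit with a C sketch that
    models a thread group as an array of operand vectors plus an "active"
    flag per thread. The function name, the array representation, and the
    bitmask return value are all hypothetical conveniences; the instruction
    itself returns a vector of TRUE/FALSE values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of TGALL: bit c of the return value is set if
 * component c (0 = x, 1 = y, 2 = z, 3 = w) is non-zero in every ACTIVE
 * thread. Operands of inactive threads are skipped entirely. */
unsigned tgall_mask(const int32_t operands[][4], const bool active[],
                    size_t thread_count)
{
    unsigned mask = 0xF;                     /* start with all TRUE */
    for (size_t t = 0; t < thread_count; ++t) {
        if (!active[t]) {
            continue;                        /* no bearing on the result */
        }
        for (int c = 0; c < 4; ++c) {
            if (operands[t][c] == 0) {
                mask &= ~(1u << c);          /* component c becomes FALSE */
            }
        }
    }
    return mask;
}
```

    With two threads holding (1,2,0,4) and (5,0,0,8), the model reports
    TRUE for x and w only (mask 0x9) when both threads are active; if the
    second thread is inactive, its zero in y no longer matters and the
    mask becomes 0xB.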

      tmp = VectorLoad(op0);
      result = { TRUE, TRUE, TRUE, TRUE };
      for (all active threads) {
        if ([thread]tmp.x == 0) result.x = FALSE;
        if ([thread]tmp.y == 0) result.y = FALSE;
        if ([thread]tmp.z == 0) result.z = FALSE;
        if ([thread]tmp.w == 0) result.w = FALSE;
      }

    TGALL supports all data type modifiers. For floating-point data types,
    the TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0. For unsigned
    integer data types, the TRUE value is the maximum integer value (all bits
    are ones) and the FALSE value is zero.


    Section 2.X.8.Z, TGANY: Test for Any Non-Zero in a Thread Group

    The TGANY instruction produces a result vector by reading a vector operand
    for each active thread in the current thread group and comparing each
    component to zero. A result vector component contains a TRUE value
    (described below) if the value of the corresponding component in the
    operand vector is non-zero for any active thread, and a FALSE value
    otherwise.

    An implementation may choose to arrange program threads into thread
    groups, and execute an instruction simultaneously for each thread in the
    group.
    If the TGANY instruction is contained inside conditional flow
    control blocks and not all threads in the group execute the instruction,
    the operand values for threads not executing the instruction have no
    bearing on the value returned. The method used to arrange threads into
    groups is undefined.

      tmp = VectorLoad(op0);
      result = { FALSE, FALSE, FALSE, FALSE };
      for (all active threads) {
        if ([thread]tmp.x != 0) result.x = TRUE;
        if ([thread]tmp.y != 0) result.y = TRUE;
        if ([thread]tmp.z != 0) result.z = TRUE;
        if ([thread]tmp.w != 0) result.w = TRUE;
      }

    TGANY supports all data type modifiers. For floating-point data types,
    the TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0. For unsigned
    integer data types, the TRUE value is the maximum integer value (all bits
    are ones) and the FALSE value is zero.


    Section 2.X.8.Z, TGEQ: Test for All Equal Values in a Thread Group

    The TGEQ instruction produces a result vector by reading a vector operand
    for each active thread in the current thread group and comparing each
    component to zero.
    A result vector component contains a TRUE value
    (described below) if the value of the corresponding component in the
    operand vector is the same for all active threads, and a FALSE value
    otherwise.

    An implementation may choose to arrange program threads into thread
    groups, and execute an instruction simultaneously for each thread in the
    group. If the TGEQ instruction is contained inside conditional flow
    control blocks and not all threads in the group execute the instruction,
    the operand values for threads not executing the instruction have no
    bearing on the value returned. The method used to arrange threads into
    groups is undefined.

      tmp = VectorLoad(op0);
      tgall = { TRUE, TRUE, TRUE, TRUE };
      tgany = { FALSE, FALSE, FALSE, FALSE };
      for (all active threads) {
        if ([thread]tmp.x == 0) tgall.x = FALSE; else tgany.x = TRUE;
        if ([thread]tmp.y == 0) tgall.y = FALSE; else tgany.y = TRUE;
        if ([thread]tmp.z == 0) tgall.z = FALSE; else tgany.z = TRUE;
        if ([thread]tmp.w == 0) tgall.w = FALSE; else tgany.w = TRUE;
      }
      result.x = (tgall.x == tgany.x) ? TRUE : FALSE;
      result.y = (tgall.y == tgany.y) ? TRUE : FALSE;
      result.z = (tgall.z == tgany.z) ? TRUE : FALSE;
      result.w = (tgall.w == tgany.w) ? TRUE : FALSE;

    TGEQ supports all data type modifiers. For floating-point data types, the
    TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0. For unsigned
    integer data types, the TRUE value is the maximum integer value (all bits
    are ones) and the FALSE value is zero.


    Section 2.X.8.Z, TXB: Texture Sample with Bias

    (Modify the instruction pseudo-code to account for texel offsets no
    longer needing to be immediate arguments.)

      tmp = VectorLoad(op0);
      if (instruction has variable texel offset) {
        itmp = VectorLoad(op1);
      } else {
        itmp = instruction.texelOffset;
      }
      ddx = ComputePartialsX(tmp);
      ddy = ComputePartialsY(tmp);
      lambda = ComputeLOD(ddx, ddy);
      result = TextureSample(tmp, lambda + tmp.w, ddx, ddy, itmp);

    Section 2.X.8.Z, TXG: Texture Gather

    (Update the TXG opcode description from the NV_gpu_program4_1 specification.
    This version adds two capabilities: any component of a multi-component
    texture can be selected by tacking on a component name to the texture
    variable passed to identify the texture unit, and depth compares are
    supported if a SHADOW target is specified.)

    The TXG instruction takes the four components of a single floating-point
    vector operand as a texture coordinate, determines a set of four texels to
    sample from the base level of detail of the specified texture image, and
    returns one component from each texel in a four-component result vector.
    To determine the four texels to sample, the minification and magnification
    filters are ignored and the rules for LINEAR filter are applied to the
    base level of the texture image to determine the texels T_i0_j1, T_i1_j1,
    T_i1_j0, and T_i0_j0, as defined in equations 3.23 through 3.25. The
    texels are then converted to texture source colors (Rs,Gs,Bs,As) according
    to table 3.21, followed by application of the texture swizzle as described
    in section 3.8.13. A four-component vector is returned by taking one of
    the four components of the swizzled texture source colors from each of the
    four selected texels. The component is selected using the
    <texImageUnitComp> grammar rule, by adding a scalar suffix
    (".x", ".y", ".z", ".w") to the identified texture; if no scalar suffix
    is provided, the first component is selected.

    TXG only operates on 2D, SHADOW2D, CUBE, SHADOWCUBE, ARRAY2D,
    SHADOWARRAY2D, ARRAYCUBE, SHADOWARRAYCUBE, RECT, and SHADOWRECT texture
    targets; a program will fail to compile if any other texture target is
    used.

    When using a "SHADOW" texture target, component selection is ignored.
    Instead, depth comparisons are performed on the depth values for each of
    the four selected texels, and 0/1 values are returned based on the results
    of the comparison.

    As with other texture accesses, the results of a texture gather operation
    are undefined if the texture target in the instruction is incompatible
    with the selected texture's base internal format and depth compare mode.

      tmp = VectorLoad(op0);
      ddx = (0,0,0);
      ddy = (0,0,0);
      lambda = 0;
      if (instruction has variable texel offset) {
        itmp = VectorLoad(op1);
      } else {
        itmp = instruction.texelOffset;
      }
      result.x = TextureSample_i0j1(tmp, lambda, ddx, ddy, itmp).<comp>;
      result.y = TextureSample_i1j1(tmp, lambda, ddx, ddy, itmp).<comp>;
      result.z = TextureSample_i1j0(tmp, lambda, ddx, ddy, itmp).<comp>;
      result.w = TextureSample_i0j0(tmp, lambda, ddx, ddy, itmp).<comp>;

    In this pseudocode, "<comp>" refers to the texel component selected by the
    <texImageUnitComp> grammar rule, as described above.

    TXG supports all three data type modifiers. The single operand is always
    treated as a floating-point vector; the results are interpreted according
    to the data type modifier.


    Section 2.X.8.Z, TXGO: Texture Gather with Per-Texel Offsets

    Like the TXG instruction, the TXGO instruction takes the four components
    of its first floating-point vector operand as a texture coordinate,
    determines a set of four texels to sample from the base level of detail of
    the specified texture image, and returns one component from each texel in
    a four-component result vector.
    The second and third vector operands are
    taken as signed four-component integer vectors providing the x and y
    components of the offsets, respectively, used to determine the location of
    each of the four texels. To determine the four texels to sample, each of
    the four independent offsets is used in conjunction with the specified
    texture coordinate to select a texel. The minification and magnification
    filters are ignored and the rules for LINEAR filtering are used to select
    the texel T_i0_j0, as defined in equations 3.23 through 3.25, from the
    base level of the texture image. The texels are then converted to texture
    source colors (Rs,Gs,Bs,As) according to table 3.21, followed by
    application of the texture swizzle as described in section 3.8.13. A
    four-component vector is returned by taking one of the four components
    of the swizzled texture source colors from each of the four selected
    texels. The component is selected using the <texImageUnitComp> grammar
    rule, by adding a scalar suffix (".x", ".y", ".z", ".w") to the identified
    texture; if no scalar suffix is provided, the first component is selected.

    TXGO only operates on 2D, SHADOW2D, ARRAY2D, SHADOWARRAY2D, RECT, and
    SHADOWRECT texture targets; a program will fail to compile if any other
    texture target is used.

    When using a "SHADOW" texture target, component selection is ignored.
    Instead, depth comparisons are performed on the depth values for each of
    the four selected texels, and 0/1 values are returned based on the results
    of the comparison.

    As with other texture accesses, the results of a texture gather operation
    are undefined if the texture target in the instruction is incompatible
    with the selected texture's base internal format and depth compare mode.

      tmp = VectorLoad(op0);
      itmp1 = VectorLoad(op1);
      itmp2 = VectorLoad(op2);
      ddx = (0,0,0);
      ddy = (0,0,0);
      lambda = 0;
      itmp = (itmp1.x, itmp2.x);
      result.x = TextureSample_i0j0(tmp, lambda, ddx, ddy, itmp).<comp>;
      itmp = (itmp1.y, itmp2.y);
      result.y = TextureSample_i0j0(tmp, lambda, ddx, ddy, itmp).<comp>;
      itmp = (itmp1.z, itmp2.z);
      result.z = TextureSample_i0j0(tmp, lambda, ddx, ddy, itmp).<comp>;
      itmp = (itmp1.w, itmp2.w);
      result.w = TextureSample_i0j0(tmp, lambda, ddx, ddy, itmp).<comp>;

    In this pseudocode, "<comp>" refers to the texel component selected by the
    <texImageUnitComp> grammar rule, as described above.

    If TEXTURE_WRAP_S or TEXTURE_WRAP_T is either CLAMP or MIRROR_CLAMP_EXT,
    the results of the TXGO instruction are undefined.
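
    The per-texel offset selection above can be sketched in C for a
    single-component texture. This is an illustrative model only: the
    Texture2D type, the helper names, and the use of a REPEAT-style wrap are
    assumptions made for the example, and the base texel (i0, j0) is assumed
    to have been computed already from the texture coordinate.

```c
/* Hypothetical single-component texture: width * height texels, row-major. */
typedef struct {
    int width;
    int height;
    const float *texels;
} Texture2D;

/* Wrap a texel index into [0, size), as a REPEAT wrap mode would (assumed). */
static int wrap_repeat(int i, int size)
{
    int m = i % size;
    return m < 0 ? m + size : m;
}

/* For k = 0..3, fetch the texel at (i0 + off_x[k], j0 + off_y[k]).
 * off_x and off_y play the role of the second and third TXGO operands. */
static void gather_offsets(const Texture2D *tex, int i0, int j0,
                           const int off_x[4], const int off_y[4],
                           float result[4])
{
    for (int k = 0; k < 4; k++) {
        int i = wrap_repeat(i0 + off_x[k], tex->width);
        int j = wrap_repeat(j0 + off_y[k], tex->height);
        result[k] = tex->texels[j * tex->width + i];
    }
}
```

    Each result component uses its own (x, y) offset pair, which is the only
    difference from a gather that always fetches a fixed 2x2 footprint.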

    Note: The TXG instruction is equivalent to the TXGO instruction with X
    and Y offset vectors of (0,1,1,0) and (0,0,-1,-1), respectively.

    TXGO supports all three data type modifiers. The first operand is always
    treated as a floating-point vector and the second and third operands are
    always treated as signed integer vectors; the results are interpreted
    according to the data type modifier.


    Section 2.X.8.Z, TXL: Texture Sample with LOD

    (Modify the instruction pseudo-code to account for texel offsets no
    longer needing to be immediate arguments.)

      tmp = VectorLoad(op0);
      if (instruction has variable texel offset) {
        itmp = VectorLoad(op1);
      } else {
        itmp = instruction.texelOffset;
      }
      ddx = (0,0,0);
      ddy = (0,0,0);
      result = TextureSample(tmp, tmp.w, ddx, ddy, itmp);


    Section 2.X.8.Z, TXP: Texture Sample with Projection

    (Modify the instruction pseudo-code to account for texel offsets no
    longer needing to be immediate arguments.)

      tmp0 = VectorLoad(op0);
      tmp0.x = tmp0.x / tmp0.w;
      tmp0.y = tmp0.y / tmp0.w;
      tmp0.z = tmp0.z / tmp0.w;
      if (instruction has variable texel offset) {
        itmp = VectorLoad(op1);
      } else {
        itmp = instruction.texelOffset;
      }
      ddx = ComputePartialsX(tmp0);
      ddy = ComputePartialsY(tmp0);
      lambda = ComputeLOD(ddx, ddy);
      result = TextureSample(tmp0, lambda, ddx, ddy, itmp);


    Section 2.X.8.Z, UP64: Unpack 64-bit Component

    The UP64 instruction produces a vector result with 32-bit components by
    unpacking the bits of the "x" and "y" components of a 64-bit vector
    operand. The "x" component of the operand is unpacked to produce the "x"
    and "y" components of the result vector; the "y" component is unpacked to
    produce the "z" and "w" components of the result vector.

    This instruction is intended to allow a program to pass 64-bit integer or
    floating-point values to an application using two 32-bit values stored in
    adjacent words in memory, which will be read by the application as single
    64-bit values. The ability to use this technique depends on how the
    64-bit value is stored in memory.
    For "little-endian" processors, the
    first 32-bit value would hold the least significant 32 bits of
    the 64-bit value. For "big-endian" processors, the first 32-bit value
    holds the most significant 32 bits of the 64-bit value. This
    reconstruction assumes that the first 32-bit word comes from the "x"
    component of the operand and the second 32-bit word comes from the "y"
    component. The method used to unpack a 64-bit value into a pair of 32-bit
    values depends on the processor type.

      tmp = VectorLoad(op0);
      if (underlying system is little-endian) {
        result.x = (RawBits(tmp.x) >> 0) & 0xFFFFFFFF;
        result.y = (RawBits(tmp.x) >> 32) & 0xFFFFFFFF;
        result.z = (RawBits(tmp.y) >> 0) & 0xFFFFFFFF;
        result.w = (RawBits(tmp.y) >> 32) & 0xFFFFFFFF;
      } else {
        result.x = (RawBits(tmp.x) >> 32) & 0xFFFFFFFF;
        result.y = (RawBits(tmp.x) >> 0) & 0xFFFFFFFF;
        result.z = (RawBits(tmp.y) >> 32) & 0xFFFFFFFF;
        result.w = (RawBits(tmp.y) >> 0) & 0xFFFFFFFF;
      }

    UP64 supports integer and floating-point data type modifiers, which
    specify the base data type of the operand and result. The single operand
    vector always has 64-bit components. The result is treated as a vector
    with 32-bit components. The encoding performed by UP64 can be reversed
    using the PK64 instruction.
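
    The endian-dependent split and its inverse can be sketched in C for a
    single 64-bit component. The function names and the explicit
    "little_endian" flag are assumptions made for illustration; an
    implementation would use the byte order of the underlying system rather
    than a parameter.

```c
#include <stdint.h>

/* Split one 64-bit value into two 32-bit words, as UP64 does for one
 * operand component. out[0] is the first word in memory order. */
static void unpack64(uint64_t bits, int little_endian, uint32_t out[2])
{
    uint32_t lo = (uint32_t)(bits & 0xFFFFFFFFu);
    uint32_t hi = (uint32_t)(bits >> 32);
    out[0] = little_endian ? lo : hi;
    out[1] = little_endian ? hi : lo;
}

/* Rebuild the 64-bit value from the two words (the PK64 direction). */
static uint64_t pack64(const uint32_t in[2], int little_endian)
{
    uint64_t lo = little_endian ? in[0] : in[1];
    uint64_t hi = little_endian ? in[1] : in[0];
    return (hi << 32) | lo;
}
```

    Packing the two words with the same byte-order assumption used to unpack
    them returns the original 64-bit value.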

    A program will fail to load if it contains a UP64 instruction whose
    operand is a variable not declared as "LONG".


    Modify Section 2.14.6.1 of the NV_geometry_program4 specification,
    Geometry Program Input Primitives

    (add patches to the list of supported input primitive types)

    The supported input primitive types are: ...

      Patches (PATCHES)

      Geometry programs that operate on patches are valid only for the
      PATCHES_NV primitive type. There are a variable number of vertices
      available for each program invocation, depending on the number of input
      vertices in the primitive itself. For a patch with <n> vertices,
      "vertex[0]" refers to the first vertex of the patch, and "vertex[<n>-1]"
      refers to the last vertex.


    Modify Section 2.14.6.2 of the NV_geometry_program4 specification,
    Geometry Program Output Primitives

    (Add a new paragraph limiting the use of the EMITS opcode to geometry
    programs with a POINTS output primitive type at the end of the section.
    This limitation may be removed in future specifications.)

    Geometry programs may write to multiple vertex streams only if the
    specified output primitive type is POINTS.
    A program will fail to load if
    it contains an EMITS instruction and the output primitive type specified
    by the PRIMITIVE_OUT declaration is not POINTS.

    Modify Section 2.14.6.4 of the NV_geometry_program4 specification,
    Geometry Program Output Limits

    (Modify the limitation on the total number of components emitted by a
    geometry program from NV_gpu_program4 to be per-invocation. If that
    limit is 4096 and a program has 16 invocations, each of the 16 program
    invocations can emit up to 4096 total components.)

    There are two implementation-dependent limits that limit the total number
    of vertices that each invocation of a program can emit. First, the vertex
    limit may not exceed the value of MAX_PROGRAM_OUTPUT_VERTICES_NV. Second,
    the product of the vertex limit and the number of result variable
    components written by the program (PROGRAM_RESULT_COMPONENTS_NV, as
    described in section 2.X.3.5 of NV_gpu_program4) may not exceed the value
    of MAX_PROGRAM_TOTAL_OUTPUT_COMPONENTS_NV. A geometry program will fail to
    load if its maximum vertex count or maximum total component count exceeds
    the implementation-dependent limit. The limits may be queried by calling
    GetProgramiv with a <target> of GEOMETRY_PROGRAM_NV.
    Note that the
    maximum number of vertices that a geometry program can emit may be much
    lower than MAX_PROGRAM_OUTPUT_VERTICES_NV if the program writes a large
    number of result variable components. If a geometry program has multiple
    invocations (via the "INVOCATIONS" declaration), the program will load
    successfully as long as no single invocation exceeds the total component
    count limit, even if the total output of all invocations combined exceeds
    the limit.


Additions to Chapter 3 of the OpenGL 3.0 Specification (Rasterization)

    Modify Section 3.X, Early Per-Fragment Tests, as documented in the
    EXT_shader_image_load_store specification

    (add new paragraph at the end of a section, describing how early fragment
    tests work when assembly fragment programs are active)

    If an assembly fragment program is active, early depth tests are
    considered enabled if and only if the fragment program source included the
    NV_early_fragment_tests option.


    Add to Section 3.11.4.5 of ARB_fragment_program (Fragment Program):

    Section 3.11.4.5.3, ARB_blend_func_extended Option

    If a fragment program specifies the "ARB_blend_func_extended" option, dual
    source color outputs as described in ARB_blend_func_extended are made
    available through the use of the "result.color[n].primary" and
    "result.color[n].secondary" result bindings, corresponding to SRC_COLOR
    and SRC1_COLOR, respectively, for the fragment color output numbered <n>.


Additions to Chapter 4 of the OpenGL 3.0 Specification (Per-Fragment
Operations and the Frame Buffer)

    Modify Section 4.4.3, Rendering When an Image of a Bound Texture Object
    is Also Attached to the Framebuffer, p. 288

    (Replace the complicated set of conditions with the following)

    Specifically, the values of rendered fragments are undefined if any
    shader stage fetches texels from a given mipmap level, cubemap face, and
    array layer of a texture if that same mipmap level, cubemap face, and
    array layer of the texture can be written to via fragment shader outputs,
    even if the reads and writes are not in the same Draw call.
    However, an
    application can insert MemoryBarrier(TEXTURE_FETCH_BARRIER_BIT_NV) between
    Draw calls that have such read/write hazards in order to guarantee that
    writes have completed and caches have been invalidated, as described in
    section 2.20.X.


Additions to Chapter 5 of the OpenGL 3.0 Specification (Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 3.0 Specification (State and
State Requests)

    None.

Additions to Appendix A of the OpenGL 3.0 Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

GLX Protocol

    None.

Errors

    None, other than new conditions by which a program string would fail to
    load.

New State

    None.


New Implementation Dependent State

                                                             Minimum
    Get Value                         Type  Get Command      Value    Description            Sec.     Attrib
    --------------------------------  ----  ---------------  -------  ---------------------  -------  ------
    MAX_GEOMETRY_PROGRAM_             Z+    GetIntegerv      32       Maximum number of GP   2.X.6.Y  -
      INVOCATIONS_NV                                                  invocations per prim.
    MIN_FRAGMENT_INTERPOLATION_       R     GetFloatv        -0.5     Max. negative offset   2.X.8.Z  -
      OFFSET_NV                                                       for IPAO instruction.
    MAX_FRAGMENT_INTERPOLATION_       R     GetFloatv        +0.5     Max. positive offset   2.X.8.Z  -
      OFFSET_NV                                                       for IPAO instruction.
    FRAGMENT_PROGRAM_INTERPOLATION_   Z+    GetIntegerv      4        Subpixel bit count     2.X.8.Z  -
      OFFSET_BITS_NV                                                  for IPAO instruction


Dependencies on NV_gpu_program4, NV_vertex_program4, NV_geometry_program4, and
NV_fragment_program4

    This extension is written against the NV_gpu_program4 family of
    extensions, and introduces new instruction set features and inputs/outputs
    described here. These features are available only if the extension is
    supported and the appropriate program header string is used ("!!NVvp5.0"
    for vertex programs, "!!NVgp5.0" for geometry programs, and "!!NVfp5.0"
    for fragment programs.) When loading a program with an older header (e.g.,
    "!!NVvp4.0"), the instruction set features described in this extension are
    not available. The features in this extension build upon those documented
    in full in NV_gpu_program4.
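
    As an illustration of the header rule above, an application-side helper
    might check which header a program string starts with before relying on
    the features in this extension. The helper name is hypothetical, and the
    sketch checks only the three headers named in this section using a simple
    prefix match; it is not part of the extension and does not validate the
    rest of the program string.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if the program string begins with one
 * of the vertex, geometry, or fragment "5.0" headers named in this section. */
static bool uses_gpu_program5_header(const char *program)
{
    static const char *const headers[] = {
        "!!NVvp5.0", "!!NVgp5.0", "!!NVfp5.0"
    };
    for (size_t i = 0; i < sizeof headers / sizeof headers[0]; i++) {
        if (strncmp(program, headers[i], strlen(headers[i])) == 0)
            return true;
    }
    return false;
}
```

    A program string beginning with an older header such as "!!NVvp4.0" is
    reported as not using the new instruction set features.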

Dependencies on NV_tessellation_program5

    This extension provides the basic assembly instruction set constructs for
    tessellation programs. If this extension is supported, tessellation
    control and evaluation programs are supported, as described in the
    NV_tessellation_program5 specification. There is no separate extension
    string for tessellation programs; such support is implied by this
    extension.

Dependencies on ARB_transform_feedback3

    The concept of multiple vertex streams emitted by a geometry shader is
    introduced by ARB_transform_feedback3, as is the description of how they
    operate and implementation-dependent limits on the number of streams.
    This extension simply provides a mechanism to emit a vertex to more than
    one stream. If ARB_transform_feedback3 is not supported, language
    describing the EMITS opcode and the restriction on PRIMITIVE_OUT when
    EMITS is used should be removed.

Dependencies on NV_shader_buffer_load

    The programmability functionality provided by NV_shader_buffer_load is
    also incorporated by this extension. Any assembly program using a program
    header corresponding to this or any subsequent extension (e.g.,
    "!!NVfp5.0") may use the LOAD opcode without needing to declare "OPTION
    NV_shader_buffer_load".

    NV_shader_buffer_load is required by this extension, which means that the
    API mechanisms documented there allowing applications to make a buffer
    resident and query its GPU address are available to any applications using
    this extension.

    In addition to the basic functionality in NV_shader_buffer_load, this
    extension provides the ability to load 64-bit integers and floating-point
    values using the "S64", "S64X2", "S64X4", "U64", "U64X2", "U64X4", "F64",
    "F64X2", and "F64X4" opcode modifiers.

Dependencies on NV_shader_buffer_store

    This extension provides assembly programmability support for the
    NV_shader_buffer_store extension, which provides the API mechanisms
    allowing buffer objects to be stored to. NV_shader_buffer_store does not
    have a separate extension string entry, and will always be supported if
    this extension is present.

Dependencies on NV_parameter_buffer_object2

    The programmability functionality provided by NV_parameter_buffer_object2
    is also incorporated by this extension. Any assembly program using a
    program header corresponding to this or any subsequent extension (e.g.,
    "!!NVfp5.0") may use the LDC opcode without needing to declare "OPTION
    NV_parameter_buffer_object2".

    In addition to the basic functionality in NV_parameter_buffer_object2,
    this extension provides the ability to load 64-bit integers and
    floating-point values using the "S64", "S64X2", "S64X4", "U64", "U64X2",
    "U64X4", "F64", "F64X2", and "F64X4" opcode modifiers.

Dependencies on OpenGL 3.3, ARB_texture_swizzle, and EXT_texture_swizzle

    If OpenGL 3.3, ARB_texture_swizzle, and EXT_texture_swizzle are not
    supported, remove the swizzling step from the definition of TXG and TXGO.

Dependencies on ARB_blend_func_extended

    If ARB_blend_func_extended is not supported, references to the dual source
    color output bindings (result.color.primary and result.color.secondary)
    should be removed.

Dependencies on EXT_shader_image_load_store

    EXT_shader_image_load_store provides OpenGL Shading Language mechanisms to
    load/store to buffer and texture image memory, including spec language
    describing memory access ordering and synchronization, a built-in function
    (MemoryBarrierEXT) controlling synchronization of memory operations, and
    spec language describing early fragment tests that can be enabled via GLSL
    fragment shader source.
    These sections of the EXT_shader_image_load_store
    specification apply equally to the assembly program memory accesses
    provided by this extension. If EXT_shader_image_load_store is not
    supported, the sections of that specification describing these features
    should be considered to be added to this extension.

    EXT_shader_image_load_store additionally provides and documents assembly
    language support for image loads, stores, and atomics as described in the
    "Dependencies on NV_gpu_program5" section of EXT_shader_image_load_store.
    The features described there are automatically supported for all
    NV_gpu_program5 assembly programs without requiring any additional
    "OPTION" line.

Dependencies on ARB_shader_subroutine

    ARB_shader_subroutine provides and documents assembly language support for
    subroutines as described in the "Dependencies on NV_gpu_program5" section
    of ARB_shader_subroutine. The features described there are automatically
    supported for all NV_gpu_program5 assembly programs without requiring any
    additional "OPTION" line.


Issues

    (1) Are there any restrictions or performance concerns involving the
        support for indexing textures or parameter buffers?

      RESOLVED: There are no significant functional limitations.
      Textures
      and parameter buffers accessed with an index must be declared as arrays,
      so the assembler knows which textures might be accessed this way.
      Additionally, accessing an array of textures or parameter buffers with
      an out-of-bounds index will yield undefined results.

      In particular, there is no limitation on the values used for indexing --
      they are not required to be true constants and are not required to have
      the same value for all vertices/fragments in a primitive. However,
      using divergent texture or parameter buffer indices may have performance
      concerns. We expect that GPU implementations of this extension will run
      multiple program threads in parallel (SIMD). If different threads in a
      thread group have different indices, it will be necessary to do lookups
      in more than one texture at once. This is likely to result in some
      thread serialization. We expect that indexed texture or parameter
      buffer access where all indices in a thread group match will perform
      identically to non-indexed accesses.

    (2) Which texture instructions support programmable texel offsets, and
        what offset limits apply?

      RESOLVED: Most texture instructions (TEX, TXB, TXF, TXG, TXL, TXP)
      support both constant texel offsets as provided by NV_gpu_program4 and
      programmable texel offsets. TXD supports only constant offsets.
      TXGO
      does not support non-zero or programmable offsets in the texture portion
      of the instruction, but provides full support for programmable offsets
      via two of the three vector arguments in the regular instruction.

      For example,

        TEX result, coord, texture[0], 2D, (-1,-1);

      uses the NV_gpu_program4 mechanism to apply a constant texel offset of
      (-1,-1) to the texture coordinates. With programmable offsets, the
      following code applies the same offset.

        TEMP offxy;
        MOV offxy, {-1, -1};
        TEX result, coord, texture[0], 2D, offset(offxy);

      Of course, the programmable form allows the offsets to be computed in
      the program and does not require constant values.

      For most texture instructions, the range of allowable offsets is
      [MIN_PROGRAM_TEXEL_OFFSET_EXT, MAX_PROGRAM_TEXEL_OFFSET_EXT] for both
      constant and programmable texel offsets. Constant offsets can be
      checked when the program is loaded, and out-of-bounds offsets cause the
      program to fail to load. Programmable offsets cannot have a
      load-time range check; out-of-bounds offsets produce undefined results.
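
      Because out-of-bounds programmable offsets are undefined, a program
      that computes its offsets may want to clamp them before use. The
      following C sketch models the two cases: a load-time style check for
      a constant offset, and a clamp for a computed offset. Both function
      names are hypothetical, and the limits are passed in as parameters;
      the values -8 and +7 in the usage note are illustrative only and
      would normally be queried from the GL.

```c
#include <assert.h>

/* Models the load-time check for a constant texel offset: returns 1 if
   the offset is acceptable, 0 if the program would fail to load. */
int constant_texel_offset_ok(int offset, int min_offset, int max_offset)
{
    assert(min_offset <= max_offset);
    return offset >= min_offset && offset <= max_offset;
}

/* Clamps one component of a computed (programmable) offset into the
   allowed range, so that the lookup never sees an undefined offset. */
int clamp_texel_offset(int offset, int min_offset, int max_offset)
{
    assert(min_offset <= max_offset);
    if (offset < min_offset)
        return min_offset;
    if (offset > max_offset)
        return max_offset;
    return offset;
}
```

      With illustrative limits of -8 and +7, clamp_texel_offset(12, -8, 7)
      returns 7, while constant_texel_offset_ok(12, -8, 7) returns 0. In an
      assembly program the same clamp would be expressed with integer
      minimum/maximum operations on the offset vector.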

      Additionally, the new TXGO instruction has a separate (likely larger)
      allowable offset range, [MIN_PROGRAM_TEXTURE_GATHER_OFFSET_NV,
      MAX_PROGRAM_TEXTURE_GATHER_OFFSET_NV], that applies to the offset
      vectors passed in its second and third operands.

      In the initial implementation of this extension, the range limits are
      [-8,+7] for most instructions and [-32,+31] for TXGO.

    (3) What is TXGO (texture gather with separate offsets) good for?

      RESOLVED: TXGO allows for efficiently sampling a single-component
      texture with a variety of offsets that need not be contiguous.

      For example, a shadow mapping algorithm using a high-resolution shadow
      map may have pixels whose footprint covers a large number of texels in
      the shadow map. Such pixels could do a single lookup into a
      lower-resolution texture (using mipmapping), but quality problems will
      arise. Alternately, a shader could perform a large number of texture
      lookups using either NEAREST or LINEAR filtering from the
      high-resolution texture. NEAREST filtering will require a separate
      lookup for each texel accessed; LINEAR filtering may require somewhat
      fewer lookups, but all accesses cover a 2x2 portion of the texture.
      The
      TXG instruction added to NV_gpu_program4_1 allows a 2x2 block of texels
      to be returned in a single instruction in case the program wants to do
      something other than linear filtering with the samples. The TXGO
      instruction allows a program to do semi-random sampling of the texture
      without requiring that each sample cover a 2x2 block of texels. For
      example, the TXGO instruction would allow a program to sample the four
      texels A, H, J, and O from the 4x4 block depicted below:

        TXGO result, coord, {-1,+2,0,+1}, {-1,0,+1,+2}, texture[0], 2D;

      The "equivalent" TXG instruction would only sample the four center
      texels F, G, J, and K:

        TXG result, coord, texture[0], 2D;

      All sixteen texels of the footprint could be sampled with four TXG
      instructions,

        TXG result0, coord, texture[0], 2D, (-1,-1);
        TXG result1, coord, texture[0], 2D, (-1,+1);
        TXG result2, coord, texture[0], 2D, (+1,-1);
        TXG result3, coord, texture[0], 2D, (+1,+1);

      but accessing a smaller number of samples spread across the footprint
      with fewer instructions may produce results that are good enough.

      The figure here depicts a texture with texel (0,0) shown in the
      upper-left corner.
      If you insist on a lower-left origin, please look at
      this figure while standing on your head.

        (0,0) +-+-+-+-+
              |A|B|C|D|
              +-+-+-+-+
              |E|F|G|H|
              +-+-+-+-+
              |I|J|K|L|
              +-+-+-+-+
              |M|N|O|P|
              +-+-+-+-+ (4,4)

    (4) Why are the results of TXGO (texture gather with separate offsets)
        undefined if the wrap mode is CLAMP or MIRROR_CLAMP_EXT?

      RESOLVED: The CLAMP and MIRROR_CLAMP_EXT wrap modes are fairly
      different from other wrap modes. After adding any instruction offsets,
      the spec says to pre-clamp the (u,v) coordinates to [0,texture_size]
      before generating the footprint. If such clamping occurs on one edge
      for a normal texture filtering operation, the footprint ends up being
      half border texels, half edge texels, and the clamping effectively
      forces the interpolation weights used for texture filtering to 50/50.

      We expect the TXG instruction to be used in cases where an application
      may want to do custom filtering, and is in control of its own filtering
      weights. Coordinate clamping as above will affect the footprint used
      for filtering, but not the weights.
      In the NV_gpu_program4_1 spec, we
      defined the TXG/CLAMP combination to simply return the "normal"
      footprint produced after the pre-clamp operation above. Any adjustment
      of weights due to clamping is the responsibility of the application. We
      don't expect this to be a common operation, because CLAMP_TO_EDGE or
      CLAMP_TO_BORDER are much more sensible wrap modes.

      The hardware implementing TXGO is anticipated to extract all four
      samples in a single pass. However, the spec language is defined for
      simplicity to perform four separate "gather" operations with the four
      provided offsets, extract a single sample from each, and combine the
      four samples into a vector. This would require four separate pre-clamp
      operations, which was deemed too costly to implement in hardware for a
      wrap mode that doesn't work well with texture gather operations. Even
      if such hardware were built, it still wouldn't obtain a footprint
      resembling the half-border, half-edge footprint for simple TXGO offsets
      -- that would require different per-texel clamping rules for the four
      samples. We chose to leave the results of this operation undefined.

    (5) Should double-precision floating-point support be required or
        optional? If optional, how?

      RESOLVED: Double-precision floating-point support will be optional in
      case low-end GPUs supporting the remainder of these instruction features
      choose to cut costs by removing the silicon necessary to implement
      64-bit floating-point arithmetic.

    (6) While this extension supports double-precision computation, how can
        you provide high-precision inputs and outputs to the GPU programs?

      RESOLVED: The underlying hardware implementing this extension does not
      provide full support for 64-bit floats, even though DOUBLE is a standard
      data type provided by the GL. For example, when specifying a vertex
      array with a data type of DOUBLE, the vertex attribute components will
      end up being converted to 32-bit floats (FLOAT) by the driver before
      being passed to the hardware, and the extra precision in the original
      64-bit float values will be lost.

      For vertex attributes, the EXT_vertex_attrib_64bit and
      NV_vertex_attrib_integer_64bit extensions provide the ability to specify
      64-bit vertex attribute components using the VertexAttribL* and
      VertexAttribLPointer APIs.
      Such attributes can be read in a vertex
      program using a "LONG ATTRIB" declaration:

        LONG ATTRIB vector64;

      The LONG modifier can only be used on vertex program inputs, and cannot
      be used for inputs of any other program type or outputs of any program
      type.

      For other cases, this extension provides the PK64 and UP64 instructions
      that provide a mechanism to pass 64-bit components using consecutive
      32-bit components. For example, a 3-component vector with 64-bit
      components can be passed to a vertex shader using multiple vertex
      attributes without using the VertexAttribL APIs with the following code:

        /* Pass the X/Y components in vertex attribute 0 (X/Y/Z/W). Use
           stride to skip over Z. */
        glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 3*sizeof(GLdouble),
                              (GLdouble *) buffer);

        /* Pass the Z components in vertex attribute 1 (X/Y). Use stride to
           skip over original X/Y components. */
        glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 3*sizeof(GLdouble),
                              (GLdouble *) buffer + 2);

      In this example, the vertex program would use the PK64 instruction to
      reconstruct the 64-bit value for each component as follows:

        LONG TEMP reconstructed;
        PK64 reconstructed.xy, vertex.attrib[0];
        PK64 reconstructed.z, vertex.attrib[1];

      A similar technique can be used to pass 64-bit values computed by a GPU
      program, using transform feedback or writes to a color buffer. The UP64
      instruction would be used to convert the 64-bit computed value into two
      32-bit values, which would be written to adjacent components.

      Note also that the original hardware implementation of this extension
      does not support interpolation of 64-bit floating-point values. If an
      application desires to pass a 64-bit floating-point value from a vertex
      or geometry program to a fragment program, and doesn't require
      interpolation, the PK64/UP64 techniques can be combined.
      For example,
      the vertex shader could unpack a 3-component vector with 64-bit
      components into a four-component and a two-component 32-bit vector:

        LONG TEMP result64;
        OUTPUT result32[2] = { result.attrib[0..1] };
        UP64 result32[0], result64.xyxy;
        UP64 result32[1].xy, result64.z;

      The fragment program would read and reconstruct using PK64:

        LONG TEMP input64;
        FLAT ATTRIB input32[2] = { fragment.attrib[0..1] };
        PK64 input64.xy, input32[0];
        PK64 input64.z, input32[1];

      Note that such inputs must be declared as "FLAT" in the fragment program
      to prevent the hardware from trying to do floating-point interpolation
      on the separate 32-bit halves of the value being passed. Such
      interpolation would produce complete garbage.

    (7) What are instanced geometry programs useful for?

      RESOLVED: Instanced geometry programs allow geometry programs that
      perform regular operations to run more efficiently.

      Consider a simple example of an algorithm that uses geometry programs to
      render primitives to a cube map in a single pass.
      Without instanced
      geometry programs, the geometry program to render triangles to the cube
      map would do something like:

        for (face = 0; face < 6; face++) {
          for (vertex = 0; vertex < 3; vertex++) {
            project vertex <vertex> onto face <face>, output position
            compute/copy attributes of emitted <vertex> to outputs
            output <face> to result.layer
            emit the projected vertex
          }
          end the primitive (next triangle)
        }

      This algorithm would output 18 vertices per input triangle, three for
      each cube face. The six triangles emitted would be rasterized, one per
      face. Geometry programs that emit a large number of attributes have
      often posed performance challenges, since all the attributes must be
      stored somewhere until the emitted primitives are processed. Large
      storage requirements may limit the number of threads that can be run in
      parallel and reduce overall performance.

      Instanced geometry programs allow this example to be restructured to run
      with six separate threads, one per face. Each thread projects the
      triangle to only a single face (identified by the invocation number) and
      emits only 3 vertices. The reduced storage requirements allow more
      geometry program threads to be run in parallel, with greater overall
      efficiency.
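
      The restructuring can be modeled on the CPU with the following C
      sketch, which only counts emitted vertices and records the layer; the
      projection and attribute work is omitted. The struct and function
      names are hypothetical bookkeeping for illustration, not part of the
      GL API or the assembly language.

```c
#include <assert.h>

enum { CUBE_FACES = 6, TRIANGLE_VERTICES = 3 };

/* Hypothetical per-invocation bookkeeping. */
typedef struct {
    int vertices_emitted;   /* vertices whose attributes must be stored */
    int primitives_ended;   /* triangles completed                      */
    int last_layer;         /* last value written to result.layer       */
} GPInvocation;

/* Emits one triangle to cube face <face>. */
static void emit_triangle_to_face(GPInvocation *inv, int face)
{
    for (int vertex = 0; vertex < TRIANGLE_VERTICES; vertex++) {
        /* project <vertex> onto <face>, copy attributes (omitted) */
        inv->last_layer = face;
        inv->vertices_emitted++;
    }
    inv->primitives_ended++;
}

/* Non-instanced form: a single invocation loops over all six faces. */
GPInvocation run_non_instanced(void)
{
    GPInvocation inv = { 0, 0, -1 };
    for (int face = 0; face < CUBE_FACES; face++)
        emit_triangle_to_face(&inv, face);
    return inv;
}

/* Instanced form: the invocation number selects the single face. */
GPInvocation run_instanced(int invocation)
{
    GPInvocation inv = { 0, 0, -1 };
    assert(invocation >= 0 && invocation < CUBE_FACES);
    emit_triangle_to_face(&inv, invocation);
    return inv;
}
```

      In this model run_non_instanced() accounts for 18 vertices in one
      invocation, while each of the six run_instanced() calls accounts for
      only 3, which is the storage reduction described above.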

      Additionally, the total number of attributes that can be emitted by a
      single geometry program invocation is limited. However, for instanced
      geometry shaders, that limit applies to each of <N> program invocations
      which allows for a larger total output. For example, if the GL
      implementation supports only 1024 components of output per program
      invocation, the 18-vertex algorithm above could emit no more than 56
      components per vertex. The same algorithm implemented as a 3-vertex
      6-invocation geometry program could theoretically allow for 341
      components per vertex.

    (8) What are the special interpolation opcodes (IPAC, IPAO, IPAS) good
        for, and how do they work?

      RESOLVED: The interpolation opcodes allow programs to control the
      frequency and location at which fragment inputs are sampled. Some
      control has been provided in previous extensions, but the support was
      more limited. NV_gpu_program4 had an interpolation modifier (CENTROID)
      that allowed attributes to be sampled inside the primitive, but that was
      a per-attribute modifier -- you could only sample any given attribute at
      one location. NV_gpu_program4_1 added a new interpolation modifier
      (SAMPLE) that directed that fragment programs be run once per sample,
      and that the specified attributes be interpolated at the sample
      location.
      Per-sample interpolation can produce higher quality, but the
      performance cost is significant since more fragment program invocations
      are required.

      This extension provides additional control over interpolation, and
      allows programs to interpolate attributes at different locations without
      necessarily requiring the performance hit of per-sample invocation.

      The IPAC instruction allows an attribute to be sampled at the centroid
      location, while still allowing the same attribute to be sampled
      elsewhere. The IPAS instruction allows the attribute to be sampled at a
      numbered sample location, as per-sample interpolation would do. Multiple
      IPAS instructions with different sample numbers allow a program to
      sample an attribute at multiple sample points in the pixel and then
      combine the samples in a programmable manner, which may allow for higher
      quality than simply interpolating at a single representative point in
      the pixel. The IPAO instruction allows the attribute to be sampled at
      an arbitrary (x,y) offset relative to the pixel center. The range of
      supported (x,y) values is limited, and the limits in the initial
      implementation are not large enough to permit sampling the attribute
      outside the pixel.
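
      The IPAO limits can be illustrated with the following C sketch. It
      takes the offset range and subpixel bit count as parameters; the
      values -0.5, +0.5, and 4 used in the usage note are the minimums
      listed under "New Implementation Dependent State". Both function
      names are hypothetical, and the reading of the bit count as a number
      of fractional bits (so that offsets are representable in steps of
      1/2^bits) is an assumption of this sketch.

```c
#include <assert.h>

/* Returns 1 if both components of an IPAO offset lie inside the
   range [min_offset, max_offset], 0 otherwise. */
int ipao_offset_in_range(double x, double y,
                         double min_offset, double max_offset)
{
    assert(min_offset <= max_offset);
    return x >= min_offset && x <= max_offset &&
           y >= min_offset && y <= max_offset;
}

/* Size of one subpixel step, ASSUMING <offset_bits> counts fractional
   bits of the offset (an assumption, see the text above). */
double ipao_subpixel_step(int offset_bits)
{
    assert(offset_bits >= 0 && offset_bits < 31);
    return 1.0 / (double)(1 << offset_bits);
}
```

      With limits of -0.5 and +0.5, an offset of (0.25, -0.5) is in range
      and (0.75, 0.0) is not, which matches the statement that the initial
      limits do not reach outside the pixel. With 4 subpixel bits the step
      under the stated assumption is 1/16 of a pixel.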

      Note that previous instruction sets allowed shaders to fake IPAC,
      IPAS, and IPAO by a sequence such as:

        TEMP ddx, ddy, offset, interp;
        MOV interp, fragment.attrib[0];      # start with center
        DDX ddx, fragment.attrib[0];
        MAD interp, offset.x, ddx, interp;   # add offset.x * dA/dx
        DDY ddy, fragment.attrib[0];
        MAD interp, offset.y, ddy, interp;   # add offset.y * dA/dy

      However, this method does not apply perspective correction. The quality
      of the results may be unacceptable, particularly for primitives that are
      nearly perpendicular to the screen.

      The semantics of the first operand of these instructions is different
      from normal assembly instructions. Operands are normally evaluated by
      loading the value of the corresponding variable and applying any
      swizzle/negation/absolute value modifier before the instruction is
      executed. In the IPAC/IPAO/IPAS instructions, the value of the
      attribute is evaluated by the instruction itself. Swizzles, negation,
      and absolute value modifiers are still allowed, and are applied after
      the attribute values are interpolated.

    (9) When using a program that issues global stores (via the STORE
        instruction), what amount of execution ordering is guaranteed?
        How can an application ensure that writes executed in a shader have
        completed and will be visible to other operations using the buffer
        object in question?

      RESOLVED: There are very few automatic guarantees for potential
      write/read or write/write conflicts. Program invocations will
      generally run in arbitrary order, and applications can't rely on
      read/write order to match primitive order.

      To get consistent results when buffers are read and written using
      multiple pipeline stages, manual synchronization using the
      MemoryBarrierEXT() API documented in EXT_shader_image_load_store or some
      other synchronization primitive is necessary.

    (10) Unlike most other shader features, the STORE opcode allows for
         externally-visible side effects from executing a program. How does
         this capability interact with other features of the GL?

      RESOLVED: First, some GL implementations support a variety of "early Z"
      optimizations designed to minimize unnecessary fragment processing work,
      such as executing an expensive fragment program on a fragment that will
      eventually fail the depth test. Such optimizations have been valid
      because fragment programs had no side effects. That is no longer the
      case, and such optimizations may not be employed if the fragment program
      performs a global store.
      However, we provide a new "early depth and stencil test" enable that
      allows applications to deterministically control depth and stencil
      testing. If enabled, depth and stencil testing are always performed
      prior to fragment program execution. Fragment programs will never be
      run on fragments that fail any of these tests.

      Second, we are permitting global stores in all program types; however,
      the number of program invocations is not well-defined for some program
      types. For example, a GL implementation may choose to combine multiple
      instances of identical vertices (e.g., duplicate indices in
      DrawElements, immediate-mode vertices with identical data) into one
      single vertex program invocation, or it may run a vertex program on each
      separately. Similarly, the tessellation primitive generator will
      generate independent primitives with duplicated vertices, which may or
      may not be combined for tessellation evaluation program execution.
      Fragment program execution also has several issues described in more
      detail below.

    (11) What issues arise when running fragment programs doing global stores?

      RESOLVED: The order of per-fragment operations in the existing OpenGL
      3.0 specification can be fairly loose, because previously-defined
      fragment programs, shaders, and fixed-function fragment processing had
      no side effects.
      With side effects, the order of operations must be
      defined more tightly. In particular, the pixel ownership and scissor
      tests are specified to be performed prior to fragment program execution,
      and we provide an option to perform depth and stencil tests early as
      well.

      OpenGL implementations sometimes run fragment programs on "helper"
      pixels that have no coverage in order to be able to compute sane partial
      derivatives for fragment program instructions (DDX, DDY) or automatic
      level-of-detail calculation for texturing. In this approach,
      derivatives are approximated by computing the difference in a quantity
      computed for a given fragment at (x,y) and a fragment at a neighboring
      pixel. When a fragment program is executed on a "helper" pixel, global
      stores have no effect. Helper pixels aren't explicitly mentioned in the
      spec body; instead, partial derivatives are obtained by magic.

      If a fragment program contains a KIL instruction, compilers may not
      reorder code such that an ATOM or STORE instruction is executed before
      a KIL instruction that logically precedes it in flow control. Once a
      fragment is killed, subsequent atomics or stores should never be
      executed.

      Multisample rasterization poses several issues for fragment programs
      with global stores.
      The number of times a fragment program is executed
      for multisample rendering is not fully specified, which gives
      implementations a number of different choices -- pure multisample (only
      runs once), pure supersample (runs once per covered sample), or modes in
      between. There are some ways for an application to indirectly control
      the behavior -- for example, fragment programs specifying per-sample
      attribute interpolation are guaranteed to run once per covered sample.

      Note that when rendering to a multisample buffer, a pair of adjacent
      triangles may cause a fragment program to be executed more than once at
      a given (x,y) with different sets of samples covered. This can also
      occur in the interior of a quadrilateral or polygon primitive.
      Implementations are permitted to split quads and polygons with >3
      vertices into triangles, creating interior edges that split a pixel.

    (12) What happens if early fragment tests are enabled, the early depth
         test passes, and a fragment program that computes a new depth value
         is executed?

      RESOLVED: The depth value produced by the fragment program has no
      effect if early fragment tests are enabled. The depth value computed by
      a fragment program is used only by the post-fragment program stencil and
      depth tests, and those tests always have no effect when early depth
      testing is enabled.
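
      The difference between the two orderings can be modeled with a small
      CPU-side sketch. This is a simplified illustration rather than a
      description of any implementation: it assumes a LESS depth function
      with depth writes enabled, omits the stencil test, and uses a
      hypothetical run_fragment() helper whose <program> argument stands in
      for a fragment program that may compute a depth and perform stores.

```python
def run_fragment(interp_depth, stored_depth, program, early_tests):
    """Process one fragment with a LESS depth test and depth writes enabled.

    `program` takes the interpolated depth and returns a replacement depth
    (or None to keep the interpolated value); it may have side effects.
    Returns (program_ran, new_stored_depth).
    """
    if early_tests:
        # The depth test happens first, using the interpolated depth.
        if not interp_depth < stored_depth:
            return False, stored_depth      # program never runs
        program(interp_depth)               # returned depth is ignored
        return True, interp_depth
    # Late tests: the program runs first and may replace the depth.
    program_depth = program(interp_depth)
    depth = interp_depth if program_depth is None else program_depth
    if not depth < stored_depth:
        return True, stored_depth           # side effects already happened
    return True, depth
```

      With early tests, a fragment whose interpolated depth passes is kept
      with that depth even if the program would have written a failing
      value; with late tests, the program's side effects occur even for a
      fragment that is subsequently discarded.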

    (13) How do early fragment tests interact with occlusion queries?

      RESOLVED: When early fragment tests are enabled, sample counting for
      occlusion queries also happens prior to fragment program execution.
      Enabling early fragment tests can change the overall sample count,
      because samples killed by alpha test and alpha to coverage will still be
      counted if early fragment tests are enabled.

    (14) What happens if a program performs a global store to a GPU address
         corresponding to a read-only buffer mapping? What if it performs a
         global read to a write-only mapping?

      RESOLVED: Implementations may choose to implement full memory
      protection, in which case accesses using the wrong type of memory
      mapping will fault and lead to termination of the application.

      However, full memory protection is not required in this extension --
      implementations may choose to substitute a read-write mapping in place
      of a read-only or write-only mapping. As a result, we specify the
      result of such invalid loads and stores to be undefined.

      Note that if a program erroneously writes to nominally read-only
      mappings, the results may be weird. If the implementation substitutes a
      read-write mapping, such invalid writes are likely to proceed normally.
      However, if the application later makes a buffer object non-resident and
      the memory manager of the GL implementation needs to move the buffer,
      the GL may assume that the contents of the buffer have not been modified
      and thus discard the new values written by the (invalid) global store
      instructions.

    (15) What performance considerations apply to atomics?

      RESOLVED: Atomics can be useful for operations like locking, or for
      maintaining counters. Note that high-performance GPUs may have hundreds
      of program threads in flight at once, and may also have some SIMD
      characteristics (where threads are grouped and run as a unit). Using
      ATOM instructions with a single memory address to implement a critical
      section will result in serial execution -- only one of the hundreds of
      threads can execute code in the critical section at a time.

      When a global operation would be done under a lock, it may be possible
      to improve performance if the algorithm can be parallelized to have
      multiple critical sections. For example, an application could allocate
      an array of shared resources, each protected by its own lock, and use
      the LSBs of the primitive ID or some function of the screen-space (x,y)
      to determine which resource in the array to use.
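
      The slot selection described above might look like the following
      sketch, written on the CPU purely to show the indexing. The array
      size, the 4x4 tile function, and all names are assumptions chosen for
      this example, not values taken from this specification.

```python
NUM_SLOTS = 16  # hypothetical array size; must be a power of two


def slot_from_primitive_id(primitive_id, num_slots=NUM_SLOTS):
    """Pick a slot from the least significant bits of the primitive ID."""
    return primitive_id & (num_slots - 1)


def slot_from_xy(x, y):
    """Pick one of 16 slots from the pixel's position in a 4x4 tile."""
    return ((y & 3) << 2) | (x & 3)


def count_per_slot(primitive_ids, num_slots=NUM_SLOTS):
    """Accumulate one counter per slot instead of a single shared counter."""
    counters = [0] * num_slots
    for primitive_id in primitive_ids:
        counters[slot_from_primitive_id(primitive_id, num_slots)] += 1
    return counters
```

      Summing the per-slot counters afterwards recovers the global total,
      while contention on any one address is reduced by roughly the number
      of slots when the IDs are evenly distributed.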

    (16) The atomic instruction ATOM returns the old contents of memory into
         the result register. Should we provide a version of this opcode
         that doesn't return a value?

      RESOLVED: No. In theory, atomics that don't return any values can
      perform better (because the program may not need to allocate resources
      to hold a result or wait for the result). However, a new opcode isn't
      required to obtain this behavior -- a compiler can recognize that the
      result of an ATOM instruction is written to a "dummy" temporary that
      isn't read by subsequent instructions:

        TEMP junk;
        ATOM.ADD.U32 junk, address, 1;

      The compiler can also recognize that the result will always be discarded
      if a conditional write mask of "(FL)" is used.

        ATOM.ADD.U32 not_junk (FL), address, 1;

    (17) How do we ensure that memory accesses made by multiple program
         invocations of possibly different types are coherent?

      RESOLVED: Atomic instructions allow program invocations to coordinate
      using shared global memory addresses.
      However, memory transactions,
      including atomics, are not guaranteed to land in the order specified in
      the program; they may be reordered by the compiler, cached in different
      memory hierarchies, and stored in a distributed memory system where
      later stores to one "partition" might be completed prior to earlier
      stores to another. The MEMBAR instruction helps control memory
      transaction ordering by ensuring that all memory transactions prior to
      the barrier complete before any after the barrier. Additionally, the
      ".COH" modifier ensures that memory transactions using the modifier are
      cached coherently and will be visible to other shader invocations.

    (18) How do the TXG and TXGO opcodes work with sRGB textures?

      RESOLVED: Gamma-correction is applied to the texture source color
      before "gathering" and hence applies to all four components, unless
      the texture swizzle of the selected component is ALPHA, in which case
      no gamma-correction is applied.

    (19) How can render-to-texture algorithms take advantage of
         MemoryBarrierEXT, nominally provided for global memory transactions?

      RESOLVED: Many algorithms use RTT to ping-pong between two allocations,
      using the result of one rendering pass as the input to the next.
      Existing mechanisms require expensive FBO Binds, DrawBuffer changes, or
      FBO attachment changes to safely swap the render target and texture.
      With memory barriers, layered geometry shader rendering, and texture
      arrays, an application can very cheaply ping-pong between two layers of
      a single texture, i.e.:

        X = 0;
        // Bind the array texture to a texture unit
        // Attach the array texture to an FBO using FramebufferTextureARB
        while (!done) {
          // Stuff X in a constant, vertex attrib, etc.
          Draw -
            Texturing from layer X;
            Writing gl_Layer = 1 - X in the geometry shader;

          MemoryBarrierEXT(TEXTURE_FETCH_BARRIER_BIT_EXT);
          X = 1 - X;
        }

      However, be warned that this requires geometry shaders and hence adds
      the overhead that all geometry must pass through an additional program
      stage, so an application using large amounts of geometry could become
      geometry-limited or more shader-limited.

    (20) What is the ".PREC" instruction modifier good for?

      RESOLVED: ".PREC" provides some invariance guarantees that are useful
      for certain algorithms. Using ".PREC", it is possible to ensure that an
      algorithm can be written to produce identical results on subtly
      different inputs.
      For example, the order of vertices visible to a
      geometry or tessellation shader used to subdivide primitive edges might
      present an edge shared between two primitives in one direction for one
      primitive and the other direction for the adjacent primitive. Even if
      the weights are identical in the two cases, there may be cracking if the
      computations are being done in an order-dependent manner. If the
      position of a new vertex were evaluated with limited-precision
      floating-point math, it's not necessarily the case that we will get the
      same result for inputs (a,b,c) and (c,b,a) in the following code:

        ADD result, a, b;
        ADD result, result, c;

      There are two problems with this code: the rounding errors will be
      different and the implementation is free to rearrange the computation
      order. The code can be rewritten as follows with ".PREC" and a
      symmetric evaluation order to ensure the same result when the inputs
      are reversed:

        ADD result, a, c;
        ADD.PREC result, result, b;

      Note that in this example, the first instruction doesn't need the
      ".PREC" qualifier because the second instruction requires that the
      implementation compute <a>+<c>, which will be done reliably if <a> and
      <c> are inputs.
      If <a> and <c> were results of other computations, the
      first add and possibly the dependent computations may also need to be
      tagged with ".PREC" to ensure reliable results.

      The ".PREC" modifier will disable certain optimizations and thus
      carries a performance cost.

    (21) What are the TGALL, TGANY, TGEQ instructions good for?

      RESOLVED: If an implementation performs SIMD thread execution,
      divergent branching may result in reduced performance if the "if" and
      "else" blocks of an "if" statement are executed sequentially. For
      example, an algorithm may have both a "fast path" that performs a
      computation quickly for a subset of all cases and a "slow path" that
      performs the computation more slowly but correctly for all cases. When
      performing SIMD execution, code like the following:

        SNE.S.CC cc.x, condition.x;
        IF NE.x;
          # do fast path
        ELSE;
          # do slow path
        ENDIF;

      may end up executing *both* the fast and slow paths for a SIMD thread
      group if <condition> diverges, and may execute more slowly than simply
      executing the slow path unconditionally. These instructions allow code
      like:

        # Condition code matches NE if and only if condition.x is non-zero
        # for all threads.
        TGALL.S.CC cc.x, condition.x;
        IF NE.x;
          # do fast path
        ELSE;
          # do slow path
        ENDIF;

      that executes the fast path if and only if it can be used for *all*
      threads in the group. For thread groups where <condition> diverges,
      this algorithm would unconditionally run the slow path, but would never
      run both in sequence.


Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
     8    05/25/22  shqxu     Fix use of a removed function
                              MemoryBarrierNV.

     7    09/11/14  pbrown    Minor typo fixes.

     6    07/04/13  pbrown    Add missing language describing the
                              <texImageUnitComp> grammar rule for component
                              selection in TXG and TXGO instructions.

     5    09/23/10  pbrown    Add missing constants for {MIN,MAX}_PROGRAM_
                              TEXTURE_GATHER_OFFSET_NV (same as ARB/core).
                              Add missing description for "su" in the opcode
                              table; fix a couple operand order bugs for
                              STORE.

     4    06/22/10  pbrown    Specify that the y/z/w component of the ATOM
                              results are undefined, as is the case with
                              ATOMIM from EXT_shader_image_load_store.

     3    04/13/10  pbrown    Remove F32 support from ATOM.ADD.

     2    03/22/10  pbrown    Various wording updates to the spec overview,
                              dependencies, issues, and body. Remove various
                              spec language that has been refactored into the
                              EXT_shader_image_load_store specification.

     1              pbrown    Internal revisions.