Name

    NV_gpu_multicast

Name Strings

    GL_NV_gpu_multicast

Contact

    Joshua Schnarr, NVIDIA Corporation (jschnarr 'at' nvidia.com)
    Ingo Esser, NVIDIA Corporation (iesser 'at' nvidia.com)

Contributors

    Christoph Kubisch, NVIDIA
    Mark Kilgard, NVIDIA
    Robert Menzel, NVIDIA
    Kevin Lefebvre, NVIDIA
    Ralf Biermann, NVIDIA

Status

    Shipping in NVIDIA release 370.XX drivers and up.

Version

    Last Modified Date:         April 2, 2019
    Revision:                   7

Number

    OpenGL Extension #494

Dependencies

    This extension is written against the OpenGL 4.5 specification
    (Compatibility Profile), dated February 2, 2015.

    This extension requires ARB_copy_image.

    This extension interacts with ARB_sample_locations.

    This extension interacts with ARB_sparse_buffer.

    This extension requires EXT_direct_state_access.

    This extension interacts with EXT_bindable_uniform.

Overview

    This extension enables novel multi-GPU rendering techniques by providing application control
    over a group of linked GPUs with identical hardware configuration.

    Multi-GPU rendering techniques fall into two categories: implicit and explicit.  Existing
    explicit approaches like WGL_NV_gpu_affinity have two main drawbacks: CPU overhead and
    application complexity.  An application must manage one context per GPU and multi-pump the API
    stream.  Implicit multi-GPU rendering techniques avoid these issues by broadcasting rendering
    from one context to multiple GPUs.  Common implicit approaches include alternate-frame
    rendering (AFR), split-frame rendering (SFR) and multi-GPU anti-aliasing.  They each have
    drawbacks.  AFR scales nicely but interacts poorly with inter-frame dependencies.  SFR can
    improve latency but has challenges with offscreen rendering and scaling of vertex processing.
    With multi-GPU anti-aliasing, each GPU renders the same content with alternate sample
    positions and the driver blends the result to improve quality.  This also has issues with
    offscreen rendering and can conflict with other anti-aliasing techniques.

    These issues with implicit multi-GPU rendering all have the same root cause: the driver lacks
    adequate knowledge to accelerate every application.  To resolve this, NV_gpu_multicast
    provides fine-grained, explicit application control over multiple GPUs with a single context.

    Key points:

    - One context controls multiple GPUs.  Every GPU in the linked group can access every object.

    - Rendering is broadcast.  Each draw is repeated across all GPUs in the linked group.

    - Each GPU gets its own instance of all framebuffers, allowing individualized output for each
      GPU.  Input data can be customized for each GPU using buffers created with the storage flag
      PER_GPU_STORAGE_BIT_NV and a new API, MulticastBufferSubDataNV.

    - New interfaces provide mechanisms to transfer textures and buffers from one GPU to another.

New Procedures and Functions

    void RenderGpuMaskNV(bitfield mask);

    void MulticastBufferSubDataNV(
        bitfield gpuMask, uint buffer,
        intptr offset, sizeiptr size,
        const void *data);

    void MulticastCopyBufferSubDataNV(
        uint readGpu, bitfield writeGpuMask,
        uint readBuffer, uint writeBuffer,
        intptr readOffset, intptr writeOffset, sizeiptr size);

    void MulticastCopyImageSubDataNV(
        uint srcGpu, bitfield dstGpuMask,
        uint srcName, enum srcTarget,
        int srcLevel,
        int srcX, int srcY, int srcZ,
        uint dstName, enum dstTarget,
        int dstLevel,
        int dstX, int dstY, int dstZ,
        sizei srcWidth, sizei srcHeight, sizei srcDepth);

    void MulticastBlitFramebufferNV(uint srcGpu, uint dstGpu,
                                    int srcX0, int srcY0, int srcX1, int srcY1,
                                    int dstX0, int dstY0, int dstX1, int dstY1,
                                    bitfield mask, enum filter);

    void MulticastFramebufferSampleLocationsfvNV(uint gpu, uint framebuffer, uint start,
                                                 sizei count, const float *v);

    void MulticastBarrierNV(void);

    void MulticastWaitSyncNV(uint signalGpu, bitfield waitGpuMask);

    void MulticastGetQueryObjectivNV(uint gpu, uint id, enum pname, int *params);
    void MulticastGetQueryObjectuivNV(uint gpu, uint id, enum pname, uint *params);
    void MulticastGetQueryObjecti64vNV(uint gpu, uint id, enum pname, int64 *params);
    void MulticastGetQueryObjectui64vNV(uint gpu, uint id, enum pname, uint64 *params);

New Tokens

    Accepted in the <flags> parameter of BufferStorage and NamedBufferStorageEXT:

        PER_GPU_STORAGE_BIT_NV                     0x0800

    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetInteger64v, GetFloatv, and
    GetDoublev:

        MULTICAST_GPUS_NV                          0x92BA
        RENDER_GPU_MASK_NV                         0x9558

    Accepted as a value for <pname> for the TexParameter{if}, TexParameter{if}v,
    TextureParameter{if}, TextureParameter{if}v, MultiTexParameter{if}EXT and
    MultiTexParameter{if}vEXT commands and for the <value> parameter of GetTexParameter{if}v,
    GetTextureParameter{if}vEXT and GetMultiTexParameter{if}vEXT:

        PER_GPU_STORAGE_NV                          0x9548

    Accepted by the <pname> parameter of GetMultisamplefv:

        MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV   0x9549

Additions to the OpenGL 4.5 Specification (Compatibility Profile)

    (Add a new chapter after chapter 19 "Compute Shaders")

    20 Multicast Rendering

    Some implementations support multiple linked GPUs driven by a single context.  Often the
    distribution of work to individual GPUs is managed by the GL without client knowledge.  This
    chapter specifies commands for explicitly distributing work across GPUs in a linked group.
    Rendering can be enabled or disabled for specific GPUs.  Draw commands are multicast, or
    repeated across all enabled GPUs.  Objects are shared by all GPUs; however, each GPU has its
    own instance (copy) of many resources, including framebuffers.  When each GPU has its own
    instance of a resource, it is considered to have per-GPU storage.  When all GPUs share a
    single instance of a resource, this is considered GPU-shared storage.

    The mechanism for linking GPUs is implementation specific, as is the mechanism for enabling
    multicast rendering support (if necessary).  The number of GPUs usable for multicast rendering
    by a context can be queried by calling GetIntegerv with the symbolic constant
    MULTICAST_GPUS_NV.  This number is constant for the lifetime of a context.  Individual GPUs
    are identified using zero-based indices in the range [0, n-1], where n is the number of
    multicast GPUs.  GPUs are also identified by bitmasks of the form 2^i, where i is the GPU
    index.  A set of GPUs is specified by the union of masks for each GPU in the set.
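
    The index and mask conventions above can be sketched in C as follows.  This is an
    illustration only: the helper names are hypothetical and not part of this extension, and
    gpu_count stands in for the value an application would query with MULTICAST_GPUS_NV.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not part of NV_gpu_multicast.  gpu_count stands in
 * for the value of MULTICAST_GPUS_NV and is assumed to be in [1, 32]. */

/* Mask of the form 2^i that identifies the GPU with index gpu_index. */
uint32_t gpu_mask_from_index(uint32_t gpu_index)
{
    assert(gpu_index < 32u);
    return (uint32_t)1u << gpu_index;
}

/* Union of the masks of all multicast GPUs, i.e. (2^n)-1. */
uint32_t gpu_mask_all(uint32_t gpu_count)
{
    assert(gpu_count >= 1u && gpu_count <= 32u);
    return (gpu_count == 32u) ? 0xFFFFFFFFu
                              : (((uint32_t)1u << gpu_count) - 1u);
}

/* Nonzero if the GPU with index gpu_index is a member of the set mask. */
int gpu_mask_contains(uint32_t mask, uint32_t gpu_index)
{
    return (mask & gpu_mask_from_index(gpu_index)) != 0u;
}
```

    For example, with four multicast GPUs gpu_mask_all(4) yields 0xF, and the set containing
    GPUs 0 and 2 is gpu_mask_from_index(0) | gpu_mask_from_index(2), which is 0x5.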
    
    20.1 Controlling Individual GPUs

    Render commands are restricted to a specific set of GPUs with

      void RenderGpuMaskNV(bitfield mask);

    The following errors apply to RenderGpuMaskNV:

    INVALID_OPERATION is generated
    * if <mask> is zero,
    * if <mask> is not zero and <mask> is greater than or equal to 2^n, where n is equal
      to MULTICAST_GPUS_NV,
    * if issued between BeginConditionalRender and the corresponding EndConditionalRender.

    If the command does not generate an error, RENDER_GPU_MASK_NV is set to <mask>.  The default
    value of RENDER_GPU_MASK_NV is (2^n)-1.

    Render commands are skipped for a GPU that is not present in RENDER_GPU_MASK_NV.  Examples
    include draw calls, clears, compute dispatches, and copies or pixel path operations that write
    to a framebuffer (e.g. DrawPixels, BlitFramebuffer).  For a full list of render commands see
    section 2.4 (page 26).  MulticastBlitFramebufferNV is an exception to this policy: while it is
    a rendering command, it takes its own source and destination GPU arguments.  Note that buffer
    and texture updates are not affected by RENDER_GPU_MASK_NV.
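
    The state update and the skipping rule described above can be modeled as shown below.  This
    is a minimal sketch under stated assumptions, not the implementation: the MulticastStateModel
    type, the model_* functions and the MODEL_* constants are hypothetical names introduced for
    illustration, and 0x0502 is the standard numeric value of the GL INVALID_OPERATION error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the RENDER_GPU_MASK_NV state, not driver code. */
enum { MODEL_NO_ERROR = 0, MODEL_INVALID_OPERATION = 0x0502 };

typedef struct {
    uint32_t gpu_count;         /* stands in for MULTICAST_GPUS_NV, 1..31 */
    uint32_t render_gpu_mask;   /* stands in for RENDER_GPU_MASK_NV */
    int in_conditional_render;  /* nonzero between Begin/EndConditionalRender */
} MulticastStateModel;

/* Initial state: every GPU is enabled, i.e. the mask is (2^n)-1. */
void model_init(MulticastStateModel *s, uint32_t gpu_count)
{
    assert(s != NULL && gpu_count >= 1u && gpu_count <= 31u);
    s->gpu_count = gpu_count;
    s->render_gpu_mask = ((uint32_t)1u << gpu_count) - 1u;
    s->in_conditional_render = 0;
}

/* Models RenderGpuMaskNV: on any listed error the state is left unchanged. */
int model_render_gpu_mask(MulticastStateModel *s, uint32_t mask)
{
    if (mask == 0u ||
        mask >= ((uint32_t)1u << s->gpu_count) ||
        s->in_conditional_render)
        return MODEL_INVALID_OPERATION;
    s->render_gpu_mask = mask;
    return MODEL_NO_ERROR;
}

/* Nonzero if render commands are executed (not skipped) on the given GPU. */
int model_gpu_runs_render_commands(const MulticastStateModel *s, uint32_t gpu)
{
    return gpu < s->gpu_count &&
           (s->render_gpu_mask & ((uint32_t)1u << gpu)) != 0u;
}
```

    In an application the corresponding call would be RenderGpuMaskNV itself; the model only
    mirrors the error conditions and the default value listed in this section.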
    
    20.2 Multi-GPU Buffer Storage

    Like other resources, buffer objects can have two types of storage, per-GPU storage or
    GPU-shared storage.  Per-GPU storage can be explicitly requested using the
    PER_GPU_STORAGE_BIT_NV flag with BufferStorage/NamedBufferStorageEXT.  If this flag is not
    set, the type of storage used is undefined.  The implementation may use either type and
    transition between them at any time.  Client reads of a buffer with per-GPU storage may source
    from any GPU.

    The following rules apply to buffer objects with per-GPU storage:

    * When mapped, updates apply to all GPUs (only WRITE_ONLY access is supported).
    * When used as the write buffer for CopyBufferSubData or CopyNamedBufferSubData, writes apply
      to all GPUs.

    The following commands affect storage on all GPUs, even if the buffer object has per-GPU
    storage:

      BufferSubData, NamedBufferSubData, ClearBufferSubData, and ClearNamedBufferData

    An INVALID_VALUE error is generated if BufferStorage/NamedBufferStorageEXT is called with
    PER_GPU_STORAGE_BIT_NV set together with MAP_READ_BIT or SPARSE_STORAGE_BIT_ARB.

    To modify buffer object data on one or more GPUs, the client may use the command

      void MulticastBufferSubDataNV(
          bitfield gpuMask, uint buffer,
          intptr offset, sizeiptr size,
          const void *data);

    This command operates similarly to NamedBufferSubData, except that it updates the per-GPU
    buffer data on the set of GPUs defined by <gpuMask>.  If <buffer> has GPU-shared storage,
    <gpuMask> is ignored and the shared instance of the buffer is updated.

    An INVALID_VALUE error is generated if <gpuMask> is zero or is greater than or equal to 2^n,
    where n is equal to MULTICAST_GPUS_NV.
    An INVALID_OPERATION error is generated if <buffer> is not the name of an existing buffer
    object.
    An INVALID_VALUE error is generated if <offset> or <size> is negative, or if <offset> + <size>
    is greater than the value of BUFFER_SIZE for the buffer object.
    An INVALID_OPERATION error is generated if any part of the specified buffer range is mapped
    with MapBufferRange or MapBuffer (see section 6.3), unless it was mapped with
    MAP_PERSISTENT_BIT set in the MapBufferRange access flags.
    An INVALID_OPERATION error is generated if the BUFFER_IMMUTABLE_STORAGE flag of the buffer
    object is TRUE and the value of BUFFER_STORAGE_FLAGS for the buffer does not have the
    DYNAMIC_STORAGE_BIT set.
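
    The effect of <gpuMask> on a buffer with per-GPU storage can be illustrated with a small
    model in which each GPU owns one byte array.  This is a sketch under stated assumptions, not
    driver code: PerGpuBufferModel, model_multicast_buffer_sub_data and the MODEL_* constants are
    hypothetical, only the mask and range checks are modeled, and 0x0501 is the standard numeric
    value of the GL INVALID_VALUE error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model: a buffer with per-GPU storage is one byte array per GPU.
 * All names and sizes here are hypothetical. */
enum { MODEL_MAX_GPUS = 4, MODEL_BUFFER_SIZE = 16 };
enum { MODEL_NO_ERROR = 0, MODEL_INVALID_VALUE = 0x0501 };

typedef struct {
    uint32_t gpu_count;  /* stands in for MULTICAST_GPUS_NV */
    unsigned char data[MODEL_MAX_GPUS][MODEL_BUFFER_SIZE];
} PerGpuBufferModel;

/* Models the mask and range checks of MulticastBufferSubDataNV, then writes
 * the new bytes only into the instances selected by gpu_mask. */
int model_multicast_buffer_sub_data(PerGpuBufferModel *buf, uint32_t gpu_mask,
                                    ptrdiff_t offset, ptrdiff_t size,
                                    const void *src)
{
    assert(buf != NULL && src != NULL);
    assert(buf->gpu_count >= 1u && buf->gpu_count <= (uint32_t)MODEL_MAX_GPUS);

    /* <gpuMask> must be nonzero and less than 2^n. */
    if (gpu_mask == 0u || gpu_mask >= ((uint32_t)1u << buf->gpu_count))
        return MODEL_INVALID_VALUE;
    /* <offset> and <size> must be non-negative and lie inside the buffer. */
    if (offset < 0 || size < 0 || size > MODEL_BUFFER_SIZE ||
        offset > MODEL_BUFFER_SIZE - size)
        return MODEL_INVALID_VALUE;

    for (uint32_t gpu = 0; gpu < buf->gpu_count; ++gpu) {
        if (gpu_mask & ((uint32_t)1u << gpu))
            memcpy(&buf->data[gpu][offset], src, (size_t)size);
    }
    return MODEL_NO_ERROR;
}
```

    Calling the model with a mask of 0x2 changes only the instance belonging to GPU 1; the
    instance belonging to GPU 0 keeps its previous contents.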
    
    To copy between buffers created with PER_GPU_STORAGE_BIT_NV, the client may use the command

      void MulticastCopyBufferSubDataNV(
        uint readGpu, bitfield writeGpuMask,
        uint readBuffer, uint writeBuffer,
        intptr readOffset, intptr writeOffset, sizeiptr size);

    This command operates similarly to CopyNamedBufferSubData, while adding control over the
    source and destination GPU(s).  The read GPU index is specified by <readGpu> and
    the set of write GPUs is specified by the mask in <writeGpuMask>.

    Implementations may also support this command with buffers not created with
    PER_GPU_STORAGE_BIT_NV.  This support can be determined with one test copy with an error check
    (see error discussion below).  Note that a buffer created without PER_GPU_STORAGE_BIT_NV is
    considered to have undefined storage and the behavior of the command depends on the storage
    type (per-GPU or GPU-shared) currently used for <writeBuffer>.  If <writeBuffer> is using
    GPU-shared storage, the normal error checks apply but the command behaves as if <writeGpuMask>
    includes all GPUs.  If <writeBuffer> is using per-GPU storage, the command behaves as if
    PER_GPU_STORAGE_BIT_NV were set, however performance may be reduced.

    The following error may apply to MulticastCopyBufferSubDataNV on some implementations and not
    on others.  In earlier revisions of this extension the error was required; therefore,
    applications should perform a test copy using buffers without PER_GPU_STORAGE_BIT_NV before
    relying on that functionality:

    An INVALID_OPERATION error is generated if the value of BUFFER_STORAGE_FLAGS for <readBuffer>
    or <writeBuffer> does not have PER_GPU_STORAGE_BIT_NV set.

    The following errors apply to MulticastCopyBufferSubDataNV:

    An INVALID_OPERATION error is generated if <readBuffer> or <writeBuffer> is not the name of an
    existing buffer object.
    An INVALID_VALUE error is generated if any of <readOffset>, <writeOffset>, or <size> is
    negative, if <readOffset> + <size> exceeds the size of the source buffer object, or if
    <writeOffset> + <size> exceeds the size of the destination buffer object.
    An INVALID_OPERATION error is generated if either the source or destination buffer object is
    mapped, unless it was mapped with MAP_PERSISTENT_BIT set in the Map*BufferRange access
    flags.
    An INVALID_VALUE error is generated if <readGpu> is greater than or equal to
    MULTICAST_GPUS_NV.
    An INVALID_OPERATION error is generated if <writeGpuMask> is zero.  An INVALID_VALUE error is
    generated if <writeGpuMask> is not zero and <writeGpuMask> is greater than or equal to 2^n,
    where n is equal to MULTICAST_GPUS_NV.
    An INVALID_VALUE error is generated if the source and destination are the same buffer object,
    <readGpu> is present in <writeGpuMask>, and the ranges [<readOffset>; <readOffset> + <size>)
    and [<writeOffset>; <writeOffset> + <size>) overlap.
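
    The last error condition combines three tests, which the following C sketch spells out.
    The function name is hypothetical and the code only restates the rule above; it assumes
    that the other argument checks (non-negative offsets and size, valid <readGpu>) have
    already passed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the extension.  Returns 1 when a copy
 * would hit the overlap error: same buffer object, read GPU included in the
 * write mask, and overlapping half-open byte ranges.  Returns 0 otherwise. */
int multicast_copy_overlaps(uint32_t read_gpu, uint32_t write_gpu_mask,
                            uint32_t read_buffer, uint32_t write_buffer,
                            ptrdiff_t read_offset, ptrdiff_t write_offset,
                            ptrdiff_t size)
{
    assert(read_gpu < 32u);
    assert(read_offset >= 0 && write_offset >= 0 && size >= 0);

    if (read_buffer != write_buffer)
        return 0;  /* different buffer objects never conflict */
    if ((write_gpu_mask & ((uint32_t)1u << read_gpu)) == 0u)
        return 0;  /* the read GPU's own instance is not written */

    /* [a, a+size) and [b, b+size) overlap iff a < b+size and b < a+size. */
    return read_offset < write_offset + size &&
           write_offset < read_offset + size;
}
```

    Note that copying a range of a buffer onto the same range of the same buffer is permitted
    by this rule as long as <readGpu> is not part of <writeGpuMask>, which is the usual way to
    replicate one GPU's instance to the other GPUs.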
    
    20.3 Multi-GPU Framebuffers and Textures

    All buffers in the default framebuffer as well as renderbuffers receive per-GPU storage.  By
    default, storage for textures is undefined: it may be per-GPU or GPU-shared and can transition
    between the types at any time.  Per-GPU storage can be specified via
    [Multi]Tex[ture]Parameter{if}[v] with PER_GPU_STORAGE_NV for the <pname> argument and TRUE for
    the value.  For this storage parameter to take effect, it must be specified after the texture
    object is created and before the texture contents are defined by TexImage*, TexStorage* or
    TextureStorage*.

    20.3.1 Copying Image Data Between GPUs

    To copy texel data between GPUs, the client may use the command:

    void MulticastCopyImageSubDataNV(
        uint srcGpu, bitfield dstGpuMask,
        uint srcName, enum srcTarget,
        int srcLevel,
        int srcX, int srcY, int srcZ,
        uint dstName, enum dstTarget,
        int dstLevel,
        int dstX, int dstY, int dstZ,
        sizei srcWidth, sizei srcHeight, sizei srcDepth);

    This command operates equivalently to CopyImageSubData, except that it takes a source GPU and
    a destination GPU set defined by <srcGpu> and <dstGpuMask> (respectively).  Texel data is
    copied from the source GPU to all destination GPUs.  The following errors apply to
    MulticastCopyImageSubDataNV:

    INVALID_ENUM is generated
     * if either <srcTarget> or <dstTarget>
       - is not RENDERBUFFER or a valid non-proxy texture target,
       - is TEXTURE_BUFFER, or
       - is one of the cubemap face selectors described in table 3.17,
     * if the target does not match the type of the object.

    INVALID_OPERATION is generated
     * if either object is a texture and the texture is not complete,
     * if the source and destination formats are not compatible,
     * if the source and destination number of samples do not match,
     * if one image is compressed and the other is uncompressed and the
       block size of the compressed image is not equal to the texel size
       of the uncompressed image.

    INVALID_VALUE is generated
     * if <srcGpu> is greater than or equal to MULTICAST_GPUS_NV,
     * if <dstGpuMask> is zero,
     * if <dstGpuMask> is greater than or equal to 2^n, where n is equal to
       MULTICAST_GPUS_NV,
     * if either <srcName> or <dstName> does not correspond to a valid
       renderbuffer or texture object according to the corresponding
       target parameter, or
     * if the specified level is not a valid level for the image, or
     * if the dimensions of either subregion exceed the boundaries
       of the corresponding image object, or
     * if the image format is compressed and the dimensions of the
       subregion fail to meet the alignment constraints of the format.

    To copy pixel values from one GPU to another use the following command:

    void MulticastBlitFramebufferNV(uint srcGpu, uint dstGpu,
                                    int srcX0, int srcY0, int srcX1, int srcY1,
                                    int dstX0, int dstY0, int dstX1, int dstY1,
                                    bitfield mask, enum filter);

    This command operates equivalently to BlitNamedFramebuffer except that it takes a source GPU
    and a destination GPU defined by <srcGpu> and <dstGpu> (respectively).  Pixel values are
    copied from the read framebuffer on the source GPU to the draw framebuffer on the destination
    GPU.

    In addition to the errors generated by BlitNamedFramebuffer (see listing starting on page
    634), calling MulticastBlitFramebufferNV will generate INVALID_VALUE if <srcGpu> or <dstGpu>
    is greater than or equal to MULTICAST_GPUS_NV.

    20.3.2 Per-GPU Sample Locations

    Programmable sample locations can be customized for each GPU and framebuffer using the
    following command:

    void MulticastFramebufferSampleLocationsfvNV(uint gpu, uint framebuffer, uint start,
                                                 sizei count, const float *v);

    An INVALID_OPERATION error is generated by MulticastFramebufferSampleLocationsfvNV if
    <framebuffer> is not the name of an existing framebuffer object.

    INVALID_VALUE is generated if the sum of <start> and <count> is greater than
    PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB.

    An INVALID_VALUE error is generated if <gpu> is greater than or equal to MULTICAST_GPUS_NV.

    This command is equivalent to FramebufferSampleLocationsfvARB except that it sets
    MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV at the appropriate offset for the specified GPU.
    Just as with FramebufferSampleLocationsfvARB, FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_ARB
    must be enabled for these sample locations to take effect.  FramebufferSampleLocationsfvARB
    and NamedFramebufferSampleLocationsfvARB also set MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV
    but for the specified sample across all multicast GPUs.  If <gpu> is 0,
    MulticastFramebufferSampleLocationsfvNV updates PROGRAMMABLE_SAMPLE_LOCATION_ARB in addition
    to MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV.

    The programmed sample locations can be retrieved using GetMultisamplefv with <pname> set to
    MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV and indices calculated as follows:

        index_x = gpu * PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB + 2 * sample_i;
        index_y = gpu * PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB + 2 * sample_i + 1;
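
    The two index formulas can be wrapped in a small helper, sketched here in C.  The type and
    function names are hypothetical; table_size stands in for the queried value of
    PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB and gpu_count for MULTICAST_GPUS_NV.  The code
    restates the formulas as written and makes no further assumption about the table layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the extension. */
typedef struct {
    uint32_t x;  /* index of the x coordinate of the sample location */
    uint32_t y;  /* index of the y coordinate of the sample location */
} SampleLocationIndices;

/* Applies the index_x / index_y formulas for sample sample_i on GPU gpu. */
SampleLocationIndices multicast_sample_location_indices(uint32_t gpu,
                                                        uint32_t gpu_count,
                                                        uint32_t table_size,
                                                        uint32_t sample_i)
{
    /* <gpu> must be a valid multicast GPU index. */
    assert(gpu < gpu_count);

    SampleLocationIndices idx;
    idx.x = gpu * table_size + 2u * sample_i;
    idx.y = gpu * table_size + 2u * sample_i + 1u;
    return idx;
}
```

    For instance, for GPU 1, a table size of 16 and sample 3, the helper yields the indices 22
    and 23, which would then be used with GetMultisamplefv as described above.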
    
    20.4 Interactions with Other Copy Functions

    Many existing commands can be used to copy between resources with GPU-shared, per-GPU or
    undefined storage.  For example: ReadPixels, GetBufferSubData or TexImage2D with a pixel
    unpack buffer.  The following table defines how the storage of the resource influences the
    behavior of these copies.

    Table 20.1 Behavior of Copy Commands with Multi-GPU Storage

    Source     Destination Behavior
    ---------- ----------- -----------------------------------------------------------------------
    GPU-shared GPU-shared  There is just one source and one destination.  Copy from source to
                           destination.
    GPU-shared per-GPU     There is a single source.  Copy it to the destination on all GPUs.
    GPU-shared undefined   Either of the above behaviors for a GPU-shared source may apply.

    per-GPU    GPU-shared  Copy from the GPU with the lowest index set in RENDER_GPU_MASK_NV to
                           the shared destination.
    per-GPU    per-GPU     Implementations are encouraged to copy from source to destination
                           separately on each GPU.  This is not required.  If and when this is not
                           feasible, the copy should source from the GPU with the lowest index set
                           in RENDER_GPU_MASK_NV.
    per-GPU    undefined   Either of the above behaviors for a per-GPU source may apply.

    undefined  GPU-shared  Either of the above behaviors for a GPU-shared destination may apply.
    undefined  per-GPU     Either of the above behaviors for a per-GPU destination may apply.
    undefined  undefined   Any of the above behaviors may apply.
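
    Several rows of Table 20.1 refer to "the GPU with the lowest index set in
    RENDER_GPU_MASK_NV".  As a sketch, that index can be computed from a mask value as
    follows; the function name is hypothetical and the mask is assumed to be a valid, nonzero
    value of RENDER_GPU_MASK_NV.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the extension.  Returns the index of the
 * lowest GPU whose bit is set in render_gpu_mask. */
uint32_t lowest_gpu_in_mask(uint32_t render_gpu_mask)
{
    /* RenderGpuMaskNV rejects a zero mask, so a valid mask has a set bit. */
    assert(render_gpu_mask != 0u);

    uint32_t index = 0u;
    while ((render_gpu_mask & 1u) == 0u) {
        render_gpu_mask >>= 1;
        ++index;
    }
    return index;
}
```

    With RENDER_GPU_MASK_NV equal to 0x6 (GPUs 1 and 2 enabled), the function returns 1, so a
    copy from a per-GPU source to a GPU-shared destination would source from GPU 1.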
4205bd8deadSopenharmony_ci
4215bd8deadSopenharmony_ci    20.5 Multi-GPU Synchronization
4225bd8deadSopenharmony_ci
4235bd8deadSopenharmony_ci    MulticastCopyImageSubDataNV and MulticastCopyBufferSubDataNV each provide implicit
4245bd8deadSopenharmony_ci    synchronization with previous work on the source GPU.  MulticastBlitFramebufferNV is
4255bd8deadSopenharmony_ci    different, providing implicit synchronization with previous work on the destination GPU.
4265bd8deadSopenharmony_ci    In both cases, synchronization of the copies can be achieved with calls to the barrier
4275bd8deadSopenharmony_ci    command:
4285bd8deadSopenharmony_ci
4295bd8deadSopenharmony_ci      void MulticastBarrierNV(void);
4305bd8deadSopenharmony_ci
4315bd8deadSopenharmony_ci    This is called to block all GPUs until all previous commands have been completed by all GPUs,
4325bd8deadSopenharmony_ci    and all writes have landed.  To guarantee consistency, synchronization must be placed between
4335bd8deadSopenharmony_ci    any two accesses by multiple GPUs to the same memory when at least one of the accesses is a
4345bd8deadSopenharmony_ci    write.  This includes accesses to both the source and the destination.  The safest approach is
4355bd8deadSopenharmony_ci    to call MulticastBarrierNV immediately before and after each copy that involves multiple GPUs.
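
    The consistency rule above can be restated as a small predicate.  The following C sketch is
    non-normative; the gpu_access structure and the "resource" identifier are hypothetical names
    introduced only to model "the same memory" for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* One access to a piece of memory by one GPU (hypothetical model). */
struct gpu_access {
    unsigned gpu;      /* index of the GPU performing the access */
    unsigned resource; /* hypothetical ID of the memory being accessed */
    bool     is_write; /* true for a write, false for a read */
};

/* Returns true when synchronization (for example MulticastBarrierNV) must be
 * placed between the two accesses: same memory, different GPUs, and at least
 * one of the accesses is a write. */
static bool needs_multi_gpu_sync(struct gpu_access a, struct gpu_access b)
{
    return a.resource == b.resource &&
           a.gpu != b.gpu &&
           (a.is_write || b.is_write);
}
```

    Two reads of the same memory by different GPUs do not need synchronization under this rule,
    whereas a write on one GPU followed by a read on another does.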
4365bd8deadSopenharmony_ci    
4375bd8deadSopenharmony_ci    GPU writes and reads to/from GPU-shared locations require synchronization as well.  GPU writes
4385bd8deadSopenharmony_ci    such as transform feedback, shader image store, CopyTexImage, CopyBufferSubData are not
4395bd8deadSopenharmony_ci    automatically synchronized with writes by other GPUs.  Neither are GPU reads such as texture
4405bd8deadSopenharmony_ci    fetches, shader image loads, CopyTexImage, etc. synchronized with writes by other GPUs.
4415bd8deadSopenharmony_ci    Existing barriers such as TextureBarrier and MemoryBarrier only provide consistency guarantees
4425bd8deadSopenharmony_ci    for rendering, writes and reads on a single GPU.
4435bd8deadSopenharmony_ci
4445bd8deadSopenharmony_ci    In some cases it may be desirable to have one or more GPUs wait for an operation to complete
4455bd8deadSopenharmony_ci    on another GPU without synchronizing all GPUs with MulticastBarrierNV.  This can be performed
4465bd8deadSopenharmony_ci    with the following command:
4475bd8deadSopenharmony_ci
4485bd8deadSopenharmony_ci      void MulticastWaitSyncNV(uint signalGpu, bitfield waitGpuMask);
4495bd8deadSopenharmony_ci
4505bd8deadSopenharmony_ci    INVALID_VALUE is generated
4515bd8deadSopenharmony_ci     * if <signalGpu> is greater than or equal to MULTICAST_GPUS_NV,
4525bd8deadSopenharmony_ci     * if <waitGpuMask> is zero,
4535bd8deadSopenharmony_ci     * if <waitGpuMask> is greater than or equal to 2^n, where n is equal to
4545bd8deadSopenharmony_ci       MULTICAST_GPUS_NV, or
4555bd8deadSopenharmony_ci     * if <signalGpu> is present in <waitGpuMask>.
4565bd8deadSopenharmony_ci
    MulticastWaitSyncNV provides the same consistency guarantees as MulticastBarrierNV but only
    between the GPUs specified by <signalGpu> and <waitGpuMask> in a single direction.  It forces
    the GPUs specified by <waitGpuMask> to wait until the GPU specified by <signalGpu> has
    completed all previous commands and writes associated with those commands.
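
    The INVALID_VALUE conditions listed above can be checked on the application side before
    calling MulticastWaitSyncNV.  The following C sketch is non-normative; the function name is
    hypothetical, and num_gpus is assumed to hold the value of MULTICAST_GPUS_NV and to be no
    larger than 32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if (signal_gpu, wait_gpu_mask) would pass the INVALID_VALUE
 * checks of MulticastWaitSyncNV for a context with num_gpus multicast GPUs. */
static bool wait_sync_args_valid(uint32_t signal_gpu, uint32_t wait_gpu_mask,
                                 uint32_t num_gpus)
{
    assert(num_gpus >= 1 && num_gpus <= 32); /* assumption of this sketch */

    uint64_t limit = (uint64_t)1 << num_gpus; /* 2^n */

    if (signal_gpu >= num_gpus)
        return false;                  /* signal GPU out of range */
    if (wait_gpu_mask == 0)
        return false;                  /* nobody would wait */
    if ((uint64_t)wait_gpu_mask >= limit)
        return false;                  /* mask names a GPU that does not exist */
    if (wait_gpu_mask & ((uint32_t)1 << signal_gpu))
        return false;                  /* a GPU may not wait on itself */
    return true;
}
```

    For a two-GPU configuration, (signalGpu = 0, waitGpuMask = 0x2) passes, while
    (signalGpu = 0, waitGpuMask = 0x1) fails because GPU 0 would be waiting on itself.  Under
    these rules no argument combination passes when MULTICAST_GPUS_NV is 1.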
4615bd8deadSopenharmony_ci
4625bd8deadSopenharmony_ci    20.6 Multi-GPU Queries
4635bd8deadSopenharmony_ci
4645bd8deadSopenharmony_ci    Queries are performed across all multicast GPUs.  Each query object stores independent result
4655bd8deadSopenharmony_ci    values for each GPU.  The result value for a specific GPU can be queried using one of the 
4665bd8deadSopenharmony_ci    following commands:
4675bd8deadSopenharmony_ci    
4685bd8deadSopenharmony_ci    void MulticastGetQueryObjectivNV(uint gpu, uint id, enum pname, int *params);
4695bd8deadSopenharmony_ci    void MulticastGetQueryObjectuivNV(uint gpu, uint id, enum pname, uint *params);
4705bd8deadSopenharmony_ci    void MulticastGetQueryObjecti64vNV(uint gpu, uint id, enum pname, int64 *params);
4715bd8deadSopenharmony_ci    void MulticastGetQueryObjectui64vNV(uint gpu, uint id, enum pname, uint64 *params);
4725bd8deadSopenharmony_ci
4735bd8deadSopenharmony_ci    The behavior of these commands matches the GetQueryObject* equivalent commands, except they
4745bd8deadSopenharmony_ci    return the result value for the specified GPU.  A query may be available on one GPU but not on
4755bd8deadSopenharmony_ci    another, so it may be necessary to check QUERY_RESULT_AVAILABLE for each GPU.  GetQueryObject*
4765bd8deadSopenharmony_ci    return query results and availability for GPU 0 only.
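
    Because availability is tracked per GPU, an application that needs results from every GPU
    may poll each one in turn.  The following C sketch is non-normative.  To stay runnable
    without a GL context it takes the query function as a pointer; in a real application that
    pointer would be glMulticastGetQueryObjectuivNV.  The helper name and the fake_get_query
    stand-in are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define QUERY_RESULT_AVAILABLE 0x8867 /* value of GL_QUERY_RESULT_AVAILABLE */

/* Same parameter list as MulticastGetQueryObjectuivNV. */
typedef void (*get_query_fn)(unsigned gpu, unsigned id, unsigned pname,
                             unsigned *params);

/* Returns true only if the query result is available on every GPU. */
static bool query_available_on_all_gpus(get_query_fn get, unsigned num_gpus,
                                        unsigned id)
{
    for (unsigned gpu = 0; gpu < num_gpus; ++gpu) {
        unsigned available = 0;
        get(gpu, id, QUERY_RESULT_AVAILABLE, &available);
        if (!available)
            return false;
    }
    return true;
}

/* Stand-in for the GL entry point, used only to exercise the sketch:
 * it pretends that query 7 is still pending on GPU 1. */
static void fake_get_query(unsigned gpu, unsigned id, unsigned pname,
                           unsigned *params)
{
    (void)pname;
    *params = !(id == 7 && gpu == 1);
}
```

    Checking only GPU 0 (which is what GetQueryObject* reports) would miss the pending result on
    GPU 1 in this scenario.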
4775bd8deadSopenharmony_ci
4785bd8deadSopenharmony_ci    In addition to the errors generated by GetQueryObject* (see the listing in section 4.2 on page
4795bd8deadSopenharmony_ci    49), calling MulticastGetQueryObject* will generate INVALID_VALUE if <gpu> is greater than or
4805bd8deadSopenharmony_ci    equal to MULTICAST_GPUS_NV.
4815bd8deadSopenharmony_ci
4825bd8deadSopenharmony_ciAdditions to Chapter 8 of the OpenGL 4.5 (Compatibility Profile) Specification
4835bd8deadSopenharmony_ci(Textures and Samplers)
4845bd8deadSopenharmony_ci
4855bd8deadSopenharmony_ci    Modify Section 8.10 (Texture Parameters)
4865bd8deadSopenharmony_ci
4875bd8deadSopenharmony_ci    Insert the following paragraph before Table 8.25 (Texture parameters and their values):
4885bd8deadSopenharmony_ci
        If <pname> is PER_GPU_STORAGE_NV, then the state is stored in the texture, but only
        takes effect the next time storage is allocated for a texture using TexImage*,
        TexStorage* or TextureStorage*.  If the value of TEXTURE_IMMUTABLE_FORMAT is TRUE, then
        PER_GPU_STORAGE_NV cannot be changed and an error is generated.
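
    The deferred effect of PER_GPU_STORAGE_NV can be pictured with a small state model.  The
    following C sketch is non-normative and does not call the GL; the structure and function
    names are hypothetical and only mirror the two rules in the paragraph above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the texture state relevant to PER_GPU_STORAGE_NV. */
struct texture_model {
    bool per_gpu_requested; /* value of the PER_GPU_STORAGE_NV parameter */
    bool per_gpu_allocated; /* storage type of the current allocation */
    bool immutable_format;  /* value of TEXTURE_IMMUTABLE_FORMAT */
};

/* Models setting PER_GPU_STORAGE_NV with TexParameter*.
 * Returns false where the GL would generate an error. */
static bool set_per_gpu_storage(struct texture_model *t, bool value)
{
    if (t->immutable_format)
        return false; /* parameter can no longer be changed */
    t->per_gpu_requested = value;
    return true;
}

/* Models a storage allocation: TexImage* when immutable is false,
 * TexStorage* or TextureStorage* when immutable is true.
 * The requested storage type is latched at this point. */
static void allocate_storage(struct texture_model *t, bool immutable)
{
    t->per_gpu_allocated = t->per_gpu_requested;
    t->immutable_format = immutable;
}
```

    In this model, setting the parameter alone leaves per_gpu_allocated unchanged; the request
    only becomes visible in the allocation made afterwards.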
4935bd8deadSopenharmony_ci
4945bd8deadSopenharmony_ci    Additions to Table 8.26 Texture parameters and their values
4955bd8deadSopenharmony_ci
4965bd8deadSopenharmony_ci    Name               Type    Legal values
4975bd8deadSopenharmony_ci    ------------------ ------- ------------
4985bd8deadSopenharmony_ci    PER_GPU_STORAGE_NV boolean TRUE, FALSE
4995bd8deadSopenharmony_ci
5005bd8deadSopenharmony_ciAdditions to Chapter 10 of the OpenGL 4.5 (Compatibility Profile) Specification
5015bd8deadSopenharmony_ci(Vertex Specification and Drawing Commands)
5025bd8deadSopenharmony_ci
5035bd8deadSopenharmony_ci    Modify Section 10.9 (Conditional Rendering)
5045bd8deadSopenharmony_ci
5055bd8deadSopenharmony_ci    Replace the following text:
5065bd8deadSopenharmony_ci
        If the result (SAMPLES_PASSED) of the query is zero, or if the result (ANY_SAMPLES_PASSED
        or ANY_SAMPLES_PASSED_CONSERVATIVE) is FALSE, all rendering commands described in
        section 2.4 are discarded and have no effect when issued between BeginConditionalRender
        and the corresponding EndConditionalRender
5115bd8deadSopenharmony_ci
5125bd8deadSopenharmony_ci    with this text:
5135bd8deadSopenharmony_ci
        For each active render GPU, if the result (SAMPLES_PASSED) of the query on that GPU is
        zero, or if the result (ANY_SAMPLES_PASSED or ANY_SAMPLES_PASSED_CONSERVATIVE) is FALSE,
        all rendering commands described in section 2.4 are discarded by this GPU and have no
        effect when issued between BeginConditionalRender and the corresponding
        EndConditionalRender
5195bd8deadSopenharmony_ci
5205bd8deadSopenharmony_ci    Similarly replace the following:
5215bd8deadSopenharmony_ci
        If the result (SAMPLES_PASSED) of the query is non-zero, or if the result
        (ANY_SAMPLES_PASSED or ANY_SAMPLES_PASSED_CONSERVATIVE) is TRUE, such commands are not
        discarded.
5255bd8deadSopenharmony_ci
5265bd8deadSopenharmony_ci    with this:
5275bd8deadSopenharmony_ci
        For each active render GPU, if the result (SAMPLES_PASSED) of the query on that GPU is
        non-zero, or if the result (ANY_SAMPLES_PASSED or ANY_SAMPLES_PASSED_CONSERVATIVE) is
        TRUE, such commands are not discarded.
5315bd8deadSopenharmony_ci
5325bd8deadSopenharmony_ci    Finally, replace all instances of "the GL" with "each active render GPU".
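
    The per-GPU decision described by the replacement text can be sketched as follows.  This C
    example is non-normative.  It assumes that an "active render GPU" is one whose bit is set in
    the render mask and that the per-GPU SAMPLES_PASSED results have already been collected into
    an array; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the mask of GPUs that execute (do not discard) the rendering
 * commands issued inside a conditional render block.
 * samples_passed[g] is the SAMPLES_PASSED result of the query on GPU g. */
static uint32_t gpus_that_render(uint32_t render_mask,
                                 const uint32_t *samples_passed,
                                 unsigned num_gpus)
{
    assert(num_gpus <= 32); /* assumption of this sketch */

    uint32_t result = 0;
    for (unsigned gpu = 0; gpu < num_gpus; ++gpu) {
        uint32_t bit = (uint32_t)1 << gpu;
        /* Each GPU evaluates its own query result independently. */
        if ((render_mask & bit) && samples_passed[gpu] != 0)
            result |= bit;
    }
    return result;
}
```

    With results {0, 5} and both GPUs enabled, only GPU 1 renders; GPU 0 discards the commands.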
5335bd8deadSopenharmony_ci
5345bd8deadSopenharmony_ciAdditions to Chapter 14 of the OpenGL 4.5 (Compatibility Profile) Specification
5355bd8deadSopenharmony_ci(Fixed-Function Primitive Assembly and Rasterization)
5365bd8deadSopenharmony_ci
5375bd8deadSopenharmony_ci    Modify Section 14.3.1 (Multisampling)
5385bd8deadSopenharmony_ci
5395bd8deadSopenharmony_ci    Replace the following text:
5405bd8deadSopenharmony_ci
5415bd8deadSopenharmony_ci        The location for sample <i> is taken from v[2*(i-start)] and v[2*(i-start)+1].
5425bd8deadSopenharmony_ci
5435bd8deadSopenharmony_ci    with the following:
5445bd8deadSopenharmony_ci
5455bd8deadSopenharmony_ci        These commands set the sample locations for all multicast GPUs in
5465bd8deadSopenharmony_ci        MULTICAST_FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_NV.  The location for sample <i> on
5475bd8deadSopenharmony_ci        gpu <g> is taken from v[g*N+2*(i-start)] and v[g*N+2*(i-start)+1].
5485bd8deadSopenharmony_ci
5495bd8deadSopenharmony_ci    Replace the following error generated by GetMultisamplefv:
5505bd8deadSopenharmony_ci
5515bd8deadSopenharmony_ci        An INVALID_ENUM error is generated if <pname> is not SAMPLE_LOCATION_ARB or
5525bd8deadSopenharmony_ci        PROGRAMMABLE_SAMPLE_LOCATION_ARB.
5535bd8deadSopenharmony_ci
5545bd8deadSopenharmony_ci    with the following:
5555bd8deadSopenharmony_ci
5565bd8deadSopenharmony_ci        An INVALID_ENUM error is generated if <pname> is not SAMPLE_LOCATION_ARB,
5575bd8deadSopenharmony_ci        PROGRAMMABLE_SAMPLE_LOCATION_ARB or MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV.
5585bd8deadSopenharmony_ci
5595bd8deadSopenharmony_ci    Add the following to the list of errors generated by GetMultisamplefv:
5605bd8deadSopenharmony_ci
        An INVALID_VALUE error is generated if <pname> is
        MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV and <index> is greater than or equal to the
        value of PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB multiplied by the value of
        MULTICAST_GPUS_NV.
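
    The index range above can be sketched in C as follows.  This example is non-normative.  The
    names are hypothetical, and the split of an index into a GPU and a per-GPU sample slot
    assumes that the per-GPU tables are stored one after the other, as the state table note for
    MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV describes.

```c
#include <assert.h>
#include <stdbool.h>

/* A multicast sample-location index split into its two parts. */
struct sample_slot {
    unsigned gpu;    /* which GPU's table the index falls into */
    unsigned sample; /* entry within that GPU's table */
};

/* table_size stands in for PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB and
 * num_gpus for MULTICAST_GPUS_NV. Returns false where GetMultisamplefv
 * would generate INVALID_VALUE. */
static bool multicast_sample_index_valid(unsigned index, unsigned table_size,
                                         unsigned num_gpus)
{
    return index < table_size * num_gpus;
}

/* Splits a valid index, assuming GPU tables are laid out back to back. */
static struct sample_slot split_multicast_sample_index(unsigned index,
                                                       unsigned table_size,
                                                       unsigned num_gpus)
{
    assert(multicast_sample_index_valid(index, table_size, num_gpus));
    struct sample_slot slot = { index / table_size, index % table_size };
    return slot;
}
```

    For a table size of 16 and two GPUs, indices 0 through 31 are accepted, and index 17 refers
    to entry 1 of GPU 1's table under the stated layout assumption.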
5655bd8deadSopenharmony_ci
5665bd8deadSopenharmony_ci    Replace the following pseudocode (in both locations):
5675bd8deadSopenharmony_ci
5685bd8deadSopenharmony_ci        float *table = FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_ARB;
5695bd8deadSopenharmony_ci        sample_location.xy = (table[2*sample_i], table[2*sample_i+1]);
5705bd8deadSopenharmony_ci
5715bd8deadSopenharmony_ci    with the following:
5725bd8deadSopenharmony_ci    
5735bd8deadSopenharmony_ci        float *table = MULTICAST_FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_NV;
5745bd8deadSopenharmony_ci        table += PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB * gpu;
5755bd8deadSopenharmony_ci        sample_location.xy = (table[2*sample_i], table[2*sample_i+1]);
5765bd8deadSopenharmony_ci
5775bd8deadSopenharmony_ciAdditions to the WGL/GLX/EGL/AGL Specifications
5785bd8deadSopenharmony_ci
5795bd8deadSopenharmony_ci    None
5805bd8deadSopenharmony_ci
5815bd8deadSopenharmony_ciDependencies on ARB_sample_locations
5825bd8deadSopenharmony_ci
5835bd8deadSopenharmony_ci    If ARB_sample_locations is not supported, section 20.3.2 and any references to
5845bd8deadSopenharmony_ci    MulticastFramebufferSampleLocationsfvNV and MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV should
5855bd8deadSopenharmony_ci    be removed.  The modifications to Section 14.3.1 (Multisampling) should also be removed.
5865bd8deadSopenharmony_ci
5875bd8deadSopenharmony_ciDependencies on ARB_sparse_buffer
5885bd8deadSopenharmony_ci
5895bd8deadSopenharmony_ci    If ARB_sparse_buffer is not supported, any reference to SPARSE_STORAGE_BIT_ARB should be
5905bd8deadSopenharmony_ci    removed.
5915bd8deadSopenharmony_ci
5925bd8deadSopenharmony_ciInteractions with EXT_bindable_uniform
5935bd8deadSopenharmony_ci
5945bd8deadSopenharmony_ci    When using the functionality of EXT_bindable_uniform and a per-GPU storage buffer is bound
5955bd8deadSopenharmony_ci    to a bindable location in a program object, client uniform updates apply to all GPUs.
5965bd8deadSopenharmony_ci
5975bd8deadSopenharmony_ci    An INVALID_OPERATION is generated if a buffer with PER_GPU_STORAGE_BIT_NV is bound to a
5985bd8deadSopenharmony_ci    program object's bindable location and GetUniformfv, GetUniformiv, GetUniformuiv or
5995bd8deadSopenharmony_ci    GetUniformdv is called.
6005bd8deadSopenharmony_ci
6015bd8deadSopenharmony_ciErrors
6025bd8deadSopenharmony_ci
6035bd8deadSopenharmony_ci    Relaxation of INVALID_ENUM errors
6045bd8deadSopenharmony_ci    ---------------------------------
6055bd8deadSopenharmony_ci    GetBooleanv, GetIntegerv, GetInteger64v, GetFloatv, and GetDoublev now accept new tokens as
6065bd8deadSopenharmony_ci    described in the "New Tokens" section.
6075bd8deadSopenharmony_ci
6085bd8deadSopenharmony_ciNew State
6095bd8deadSopenharmony_ci
6105bd8deadSopenharmony_ci    Additions to Table 23.4 Rasterization
6115bd8deadSopenharmony_ci                                                   Initial
6125bd8deadSopenharmony_ci    Get Value                   Type  Get Command Value  Description               Sec.  Attribute
6135bd8deadSopenharmony_ci    -------------------------- ------ ----------- -----  -----------------------   ----  ---------
6145bd8deadSopenharmony_ci    RENDER_GPU_MASK_NV           Z+   GetIntegerv   *    Mask of GPUs that have    20.1     -
6155bd8deadSopenharmony_ci                                                           writes enabled
6165bd8deadSopenharmony_ci    * See section 20.1
6175bd8deadSopenharmony_ci
6185bd8deadSopenharmony_ci    Additions to Table 23.19 Textures (state per texture object)
6195bd8deadSopenharmony_ci
6205bd8deadSopenharmony_ci                                                    Initial
6215bd8deadSopenharmony_ci    Get Value                Type   Get Command      Value    Description                  Sec.
6225bd8deadSopenharmony_ci    ---------                ----   -----------      -------  -----------                  ----
6235bd8deadSopenharmony_ci    PER_GPU_STORAGE_NV       B      GetTexParameter  FALSE    Per-GPU storage requested    20.3
6245bd8deadSopenharmony_ci
6255bd8deadSopenharmony_ci    
6265bd8deadSopenharmony_ci    Additions to Table 23.30 Framebuffer (state per framebuffer object)
6275bd8deadSopenharmony_ci
6285bd8deadSopenharmony_ci    Get Value                Get Command      Type Initial Value    Description          Sec.    Attribute
6295bd8deadSopenharmony_ci    ---------                -----------      ---- -------------    -----------          ----    ---------
6305bd8deadSopenharmony_ci    MULTICAST_PROGRAMMABLE_- GetMultisamplefv  *    (0.5,0.5)       Programmable sample  20.3.2      -
6315bd8deadSopenharmony_ci        SAMPLE_LOCATION_NV        
6325bd8deadSopenharmony_ci
    * The type here is "2* x n x 2 x R[0,1]" which is equivalent to PROGRAMMABLE_SAMPLE_LOCATION_ARB
    but with sample locations for all multicast GPUs (one after the other).
6355bd8deadSopenharmony_ci
6365bd8deadSopenharmony_ciNew Implementation Dependent State
6375bd8deadSopenharmony_ci
6385bd8deadSopenharmony_ci    Add to Table 23.82, Implementation-Dependent Values, p. 784
6395bd8deadSopenharmony_ci
6405bd8deadSopenharmony_ci                                                     Minimum
6415bd8deadSopenharmony_ci    Get Value                     Type   Get Command  Value  Description               Sec.  Attribute
6425bd8deadSopenharmony_ci    ---------------------------- ------ ------------- -----  ----------------------    ----  ---------
6435bd8deadSopenharmony_ci    MULTICAST_GPUS_NV              Z+    GetIntegerv    1    Number of linked GPUs     20.0     -
6445bd8deadSopenharmony_ci                                                             usable for multicast
6455bd8deadSopenharmony_ci
6465bd8deadSopenharmony_ciBackwards Compatibility
6475bd8deadSopenharmony_ci
    This extension replaces NVX_linked_gpu_multicast.  The enumerant values for MULTICAST_GPUS_NV
    and PER_GPU_STORAGE_BIT_NV match those of MAX_LGPU_GPUS_NVX and LGPU_SEPARATE_STORAGE_BIT_NVX
    (respectively).  MulticastBufferSubDataNV, MulticastCopyImageSubDataNV and MulticastBarrierNV
    behave analogously to LGPUNamedBufferSubDataNVX, LGPUCopyImageSubDataNVX and LGPUInterlockNVX
    (respectively).
6535bd8deadSopenharmony_ci
6545bd8deadSopenharmony_ciSample Code
6555bd8deadSopenharmony_ci
6565bd8deadSopenharmony_ci    Binocular stereo rendering example using NV_gpu_multicast with single GPU fallback:
6575bd8deadSopenharmony_ci   
6585bd8deadSopenharmony_ci    struct ViewData {
6595bd8deadSopenharmony_ci        GLint viewport_index;
6605bd8deadSopenharmony_ci        GLfloat mvp[16];
6615bd8deadSopenharmony_ci        GLfloat modelview[16];
6625bd8deadSopenharmony_ci    };
6635bd8deadSopenharmony_ci    ViewData leftViewData = { 0, {...}, {...} };
6645bd8deadSopenharmony_ci    ViewData rightViewData = { 1, {...}, {...} };
6655bd8deadSopenharmony_ci
6665bd8deadSopenharmony_ci    GLuint ubo[2];
6675bd8deadSopenharmony_ci    glCreateBuffers(2, &ubo[0]);
6685bd8deadSopenharmony_ci
6695bd8deadSopenharmony_ci    if (has_NV_gpu_multicast) {
6705bd8deadSopenharmony_ci        glNamedBufferStorage(ubo[0], size, NULL, GL_PER_GPU_STORAGE_BIT_NV | GL_DYNAMIC_STORAGE_BIT);
6715bd8deadSopenharmony_ci        glMulticastBufferSubDataNV(0x1, ubo[0], 0, size, &leftViewData);
6725bd8deadSopenharmony_ci        glMulticastBufferSubDataNV(0x2, ubo[0], 0, size, &rightViewData);
6735bd8deadSopenharmony_ci    } else {
6745bd8deadSopenharmony_ci        glNamedBufferStorage(ubo[0], size, &leftViewData, 0);
6755bd8deadSopenharmony_ci        glNamedBufferStorage(ubo[1], size, &rightViewData, 0);
6765bd8deadSopenharmony_ci    }
6775bd8deadSopenharmony_ci
6785bd8deadSopenharmony_ci    glViewportIndexedf(0, 0, 0, 640, 480);  // left viewport
6795bd8deadSopenharmony_ci    glViewportIndexedf(1, 640, 0, 640, 480);  // right viewport
6805bd8deadSopenharmony_ci    // Vertex shader sets gl_ViewportIndex according to viewport_index in UBO
6815bd8deadSopenharmony_ci
6825bd8deadSopenharmony_ci    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
6835bd8deadSopenharmony_ci
6845bd8deadSopenharmony_ci    if (has_NV_gpu_multicast) {
6855bd8deadSopenharmony_ci        glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[0]);
6865bd8deadSopenharmony_ci        drawScene();
6875bd8deadSopenharmony_ci        // Make GPU 1 wait for glClear above to complete on GPU 0
6885bd8deadSopenharmony_ci        glMulticastWaitSyncNV(0, 0x2);
6895bd8deadSopenharmony_ci        // Copy right viewport from GPU 1 to GPU 0
6905bd8deadSopenharmony_ci        glMulticastCopyImageSubDataNV(1, 0x1,
6915bd8deadSopenharmony_ci                                      renderBuffer, GL_RENDERBUFFER, 0, 640, 0, 0,
6925bd8deadSopenharmony_ci                                      renderBuffer, GL_RENDERBUFFER, 0, 640, 0, 0,
6935bd8deadSopenharmony_ci                                      640, 480, 1);
6945bd8deadSopenharmony_ci        // Make GPU 0 wait for GPU 1 copy to GPU 0
6955bd8deadSopenharmony_ci        glMulticastWaitSyncNV(1, 0x1);
6965bd8deadSopenharmony_ci    } else {
6975bd8deadSopenharmony_ci        glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[0]);
6985bd8deadSopenharmony_ci        drawScene();
6995bd8deadSopenharmony_ci        glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[1]);
7005bd8deadSopenharmony_ci        drawScene();
7015bd8deadSopenharmony_ci    }
7025bd8deadSopenharmony_ci    // Both viewports are now present in GPU 0's renderbuffer
7035bd8deadSopenharmony_ci
7045bd8deadSopenharmony_ciIssues
7055bd8deadSopenharmony_ci
7065bd8deadSopenharmony_ci  (1) Should we provide explicit inter-GPU synchronization API?  Will this make the implementation
7075bd8deadSopenharmony_ci    easier or harder for the driver and applications?
7085bd8deadSopenharmony_ci
7095bd8deadSopenharmony_ci    RESOLVED. Yes. A naive implementation of implicit synchronization would simply synchronize the
7105bd8deadSopenharmony_ci    GPUs before and after each copy.  Smart implicit synchronization would have to track all APIs
7115bd8deadSopenharmony_ci    that can modify buffers and textures, creating an excessive burden for driver implementation
7125bd8deadSopenharmony_ci    and maintenance.  An application can track dependencies more easily and outperform a naive
7135bd8deadSopenharmony_ci    driver implementation using explicit synchronization.
7145bd8deadSopenharmony_ci
7155bd8deadSopenharmony_ci  (2) How does this extension interact with queries (e.g. occlusion queries)?
7165bd8deadSopenharmony_ci
    RESOLVED. Queries are performed separately on each GPU. The standard GetQueryObject* APIs
    return query results for GPU 0 only. However, GetQueryBufferObject* can be used to retrieve
    query results for all GPUs through a buffer with separate storage (PER_GPU_STORAGE_BIT_NV).
7205bd8deadSopenharmony_ci
7215bd8deadSopenharmony_ci  (3) Are copy operations controlled by the render mask?
7225bd8deadSopenharmony_ci
7235bd8deadSopenharmony_ci    RESOLVED. Copies which write to the framebuffer are considered render commands and implicitly
7245bd8deadSopenharmony_ci    controlled by the render mask.  Copies between textures and buffers are not considered render
7255bd8deadSopenharmony_ci    commands so they are not influenced by the mask.  If masked copies are desired, use
7265bd8deadSopenharmony_ci    MulticastCopyImageSubDataNV, MulticastCopyBufferSubDataNV or MulticastBlitFramebufferNV.
7275bd8deadSopenharmony_ci    These commands explicitly specify the GPU source and destination and are not influenced by the
7285bd8deadSopenharmony_ci    render mask.  
7295bd8deadSopenharmony_ci
  (4) What happens if the MulticastCopyBufferSubDataNV source and destination buffers are the same?
7315bd8deadSopenharmony_ci
7325bd8deadSopenharmony_ci    RESOLVED.  When the source and destination involve the same GPU, MulticastCopyBufferSubDataNV
7335bd8deadSopenharmony_ci    matches the behavior of CopyBufferSubData: overlapped copies are not allowed and an
7345bd8deadSopenharmony_ci    INVALID_VALUE error results.  When the source and destination do not involve the same GPU,
7355bd8deadSopenharmony_ci    overlapping copies are allowed and no error is generated.
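
    The resolution can be sketched as a predicate.  This C example is non-normative.  The names
    are hypothetical, and it reads "involve the same GPU" as "the source GPU's bit is set in the
    destination GPU mask", which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [a, a + size) and [b, b + size) share at least one byte. */
static bool ranges_overlap(size_t a, size_t b, size_t size)
{
    return a < b + size && b < a + size;
}

/* Returns true if a copy within one buffer would be rejected with
 * INVALID_VALUE because the ranges overlap on a GPU that is both the
 * source and a destination. */
static bool copy_overlap_error(bool same_buffer, unsigned read_gpu,
                               uint32_t write_gpu_mask, size_t read_offset,
                               size_t write_offset, size_t size)
{
    assert(read_gpu < 32); /* assumption of this sketch */

    bool same_gpu = ((write_gpu_mask >> read_gpu) & 1u) != 0;
    return same_buffer && same_gpu &&
           ranges_overlap(read_offset, write_offset, size);
}
```

    Copying bytes [0, 16) to [8, 24) of the same buffer is rejected when GPU 0 is both source and
    destination, but is accepted when GPU 0 is the source and only GPU 1 is the destination.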
7365bd8deadSopenharmony_ci
7375bd8deadSopenharmony_ci  (5) How does this extension interact with CopyTexImage2D?
7385bd8deadSopenharmony_ci
7395bd8deadSopenharmony_ci    RESOLVED.  The behavior depends on the storage type of the target.  See section 20.4.  Since
7405bd8deadSopenharmony_ci    CopyTexImage* sources from the framebuffer, the source always has per-GPU storage.
7415bd8deadSopenharmony_ci
7425bd8deadSopenharmony_ci  (6) Should we provide a mechanism to modify viewports independently for each GPU?
7435bd8deadSopenharmony_ci
7445bd8deadSopenharmony_ci    RESOLVED. No. This can be achieved using multicast UBOs and ARB_shader_viewport_layer_array.
7455bd8deadSopenharmony_ci
7465bd8deadSopenharmony_ci  (7) Should we add a present API that automatically displays content from a specific GPU? It
7475bd8deadSopenharmony_ci    could abstract the transport mechanism, copying when necessary. 
7485bd8deadSopenharmony_ci
7495bd8deadSopenharmony_ci    RESOLVED. No. Transfers should be avoided to maximize performance and minimize latency.
7505bd8deadSopenharmony_ci    Minimizing transfers requires application awareness of display connectivity to assign
7515bd8deadSopenharmony_ci    rendering appropriately.  Hiding transfers behind an API would also prevent some interesting
7525bd8deadSopenharmony_ci    multi-GPU rendering techniques (e.g. checkerboard-style split rendering).
7535bd8deadSopenharmony_ci
7545bd8deadSopenharmony_ci    WGL_NV_bridged_display can be used to enable display from multiple GPUs without copies.
7555bd8deadSopenharmony_ci
7565bd8deadSopenharmony_ci  (8) Should we expose the extension on single-GPU configurations?
7575bd8deadSopenharmony_ci
7585bd8deadSopenharmony_ci    RESOLVED.  Yes, this is recommended.  It allows more code sharing between multi-GPU and
7595bd8deadSopenharmony_ci    single-GPU code paths.  If there is only one GPU present MULTICAST_GPUS_NV will be 1.  It
7605bd8deadSopenharmony_ci    may also be 1 if explicit GPU control is unavailable (e.g. if the active multi-GPU rendering
7615bd8deadSopenharmony_ci    mode prevents it).  Note that in revisions 5 and prior of this extension the minimum for
7625bd8deadSopenharmony_ci    MULTICAST_GPUS_NV was 2.
7635bd8deadSopenharmony_ci  
7645bd8deadSopenharmony_ci  (9) Should glGet*BufferParameter* return the PER_GPU_STORAGE_BIT_NV bit when
7655bd8deadSopenharmony_ci    BUFFER_STORAGE_FLAGS is queried?
7665bd8deadSopenharmony_ci
7675bd8deadSopenharmony_ci    RESOLVED. Yes. BUFFER_STORAGE_FLAGS must match the flags parameter input to *BufferStorage, as
7685bd8deadSopenharmony_ci    specified in table 6.3.
7695bd8deadSopenharmony_ci
7705bd8deadSopenharmony_ci  (10) Can a query be complete/available on one GPU and not another?
7715bd8deadSopenharmony_ci
7725bd8deadSopenharmony_ci    RESOLVED. Yes. Independent query completion is important for conditional rendering.  It
7735bd8deadSopenharmony_ci    allows each GPU to begin conditional rendering in mode QUERY_WAIT without waiting on other
7745bd8deadSopenharmony_ci    GPUs.
7755bd8deadSopenharmony_ci
  (11) How can custom texel data be uploaded to each GPU for a given texture?
7775bd8deadSopenharmony_ci
7785bd8deadSopenharmony_ci    The easiest way is to create staging textures with the custom texel data and then copy it
7795bd8deadSopenharmony_ci    to a texture with per-GPU storage using MulticastCopyImageSubDataNV.
7805bd8deadSopenharmony_ci
7815bd8deadSopenharmony_ci  (12) Should we allow the waitGpuMask in MulticastWaitSyncNV to include the signal GPU?
7825bd8deadSopenharmony_ci
7835bd8deadSopenharmony_ci    RESOLVED. No. There is no reason for a GPU to wait on itself.  This is effectively a no-op in
7845bd8deadSopenharmony_ci    the command stream.  Furthermore it is easy to confuse GPU indices and masks, so it is
7855bd8deadSopenharmony_ci    beneficial to explicitly generate an error in this case.
7865bd8deadSopenharmony_ci
7875bd8deadSopenharmony_ci  (13) Will support for NVX_linked_gpu_multicast continue?
7885bd8deadSopenharmony_ci
7895bd8deadSopenharmony_ci    RESOLVED. NVX_linked_gpu_multicast is deprecated and applications should switch to
7905bd8deadSopenharmony_ci    NV_gpu_multicast.  However, implementations are encouraged to continue supporting
7915bd8deadSopenharmony_ci    NVX_linked_gpu_multicast for backwards compatibility.
7925bd8deadSopenharmony_ci
7935bd8deadSopenharmony_ci  (14) Does RenderGpuMaskNV work with immediate mode rendering?
7945bd8deadSopenharmony_ci
7955bd8deadSopenharmony_ci    RESOLVED. Yes, the render GPU mask applies to immediate mode rendering the same as other
7965bd8deadSopenharmony_ci    rendering.  Note that RenderGpuMaskNV is not one of the commands allowed between Begin and End
7975bd8deadSopenharmony_ci    (see section 10.7.5) so the render mask must be set before Begin is called.
7985bd8deadSopenharmony_ci
7995bd8deadSopenharmony_ciRevision History
8005bd8deadSopenharmony_ci
8015bd8deadSopenharmony_ci    Rev.    Date    Author    Changes
8025bd8deadSopenharmony_ci    ----  --------  --------  -----------------------------------------------
8035bd8deadSopenharmony_ci     7    04/02/19  jschnarr  clarify that the interactions with uniform APIs only apply to
8045bd8deadSopenharmony_ci                              EXT_bindable_uniform (not ARB_uniform_buffer_object).
8055bd8deadSopenharmony_ci                              optionally allow MulticastCopyBufferSubDataNV with buffers lacking
8065bd8deadSopenharmony_ci                              per-GPU storage
8075bd8deadSopenharmony_ci     6    01/03/19  jschnarr  reduce MULTICAST_GPUS_NV minimum to 1
8085bd8deadSopenharmony_ci                              clarify that MULTICAST_GPUS_NV is constant for a context
8095bd8deadSopenharmony_ci     5    10/07/16  jschnarr  trivial typo fix
8105bd8deadSopenharmony_ci     4    07/21/16  mjk       registered
8115bd8deadSopenharmony_ci     3    06/15/16  jschnarr  R370 release