Name

    NV_gpu_multicast

Name Strings

    GL_NV_gpu_multicast

Contact

    Joshua Schnarr, NVIDIA Corporation (jschnarr 'at' nvidia.com)
    Ingo Esser, NVIDIA Corporation (iesser 'at' nvidia.com)

Contributors

    Christoph Kubisch, NVIDIA
    Mark Kilgard, NVIDIA
    Robert Menzel, NVIDIA
    Kevin Lefebvre, NVIDIA
    Ralf Biermann, NVIDIA

Status

    Shipping in NVIDIA release 370.XX drivers and up.

Version

    Last Modified Date: April 2, 2019
    Revision: 7

Number

    OpenGL Extension #494

Dependencies

    This extension is written against the OpenGL 4.5 specification
    (Compatibility Profile), dated February 2, 2015.

    This extension requires ARB_copy_image.

    This extension interacts with ARB_sample_locations.

    This extension interacts with ARB_sparse_buffer.

    This extension requires EXT_direct_state_access.

    This extension interacts with EXT_bindable_uniform.

Overview

    This extension enables novel multi-GPU rendering techniques by providing application control
    over a group of linked GPUs with identical hardware configuration.

    Multi-GPU rendering techniques fall into two categories: implicit and explicit. Existing
    explicit approaches like WGL_NV_gpu_affinity have two main drawbacks: CPU overhead and
    application complexity. An application must manage one context per GPU and multi-pump the API
    stream. Implicit multi-GPU rendering techniques avoid these issues by broadcasting rendering
    from one context to multiple GPUs. Common implicit approaches include alternate-frame
    rendering (AFR), split-frame rendering (SFR) and multi-GPU anti-aliasing. Each has
    drawbacks. AFR scales nicely but interacts poorly with inter-frame dependencies. SFR can
    improve latency but has challenges with offscreen rendering and scaling of vertex processing.
    With multi-GPU anti-aliasing, each GPU renders the same content with alternate sample
    positions and the driver blends the result to improve quality. This also has issues with
    offscreen rendering and can conflict with other anti-aliasing techniques.

    These issues with implicit multi-GPU rendering all have the same root cause: the driver lacks
    adequate knowledge to accelerate every application.
    To resolve this, NV_gpu_multicast provides fine-grained, explicit application control over
    multiple GPUs with a single context.

    Key points:

    - One context controls multiple GPUs. Every GPU in the linked group can access every object.

    - Rendering is broadcast. Each draw is repeated across all GPUs in the linked group.

    - Each GPU gets its own instance of all framebuffers, allowing individualized output for each
      GPU. Input data can be customized for each GPU using buffers created with the storage flag
      PER_GPU_STORAGE_BIT_NV and a new API, MulticastBufferSubDataNV.

    - New interfaces provide mechanisms to transfer textures and buffers from one GPU to another.

New Procedures and Functions

    void RenderGpuMaskNV(bitfield mask);

    void MulticastBufferSubDataNV(
        bitfield gpuMask, uint buffer,
        intptr offset, sizeiptr size,
        const void *data);

    void MulticastCopyBufferSubDataNV(
        uint readGpu, bitfield writeGpuMask,
        uint readBuffer, uint writeBuffer,
        intptr readOffset, intptr writeOffset, sizeiptr size);

    void MulticastCopyImageSubDataNV(
        uint srcGpu, bitfield dstGpuMask,
        uint srcName, enum srcTarget,
        int srcLevel,
        int srcX, int srcY, int
            srcZ,
        uint dstName, enum dstTarget,
        int dstLevel,
        int dstX, int dstY, int dstZ,
        sizei srcWidth, sizei srcHeight, sizei srcDepth);

    void MulticastBlitFramebufferNV(uint srcGpu, uint dstGpu,
                                    int srcX0, int srcY0, int srcX1, int srcY1,
                                    int dstX0, int dstY0, int dstX1, int dstY1,
                                    bitfield mask, enum filter);

    void MulticastFramebufferSampleLocationsfvNV(uint gpu, uint framebuffer, uint start,
                                                 sizei count, const float *v);

    void MulticastBarrierNV(void);

    void MulticastWaitSyncNV(uint signalGpu, bitfield waitGpuMask);

    void MulticastGetQueryObjectivNV(uint gpu, uint id, enum pname, int *params);
    void MulticastGetQueryObjectuivNV(uint gpu, uint id, enum pname, uint *params);
    void MulticastGetQueryObjecti64vNV(uint gpu, uint id, enum pname, int64 *params);
    void MulticastGetQueryObjectui64vNV(uint gpu, uint id, enum pname, uint64 *params);

New Tokens

    Accepted in the <flags> parameter of BufferStorage and NamedBufferStorageEXT:

        PER_GPU_STORAGE_BIT_NV                      0x0800

    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetInteger64v, GetFloatv, and
    GetDoublev:

        MULTICAST_GPUS_NV                           0x92BA
        RENDER_GPU_MASK_NV
                                                    0x9558

    Accepted as a value for <pname> for the TexParameter{if}, TexParameter{if}v,
    TextureParameter{if}, TextureParameter{if}v, MultiTexParameter{if}EXT and
    MultiTexParameter{if}vEXT commands and for the <pname> parameter of GetTexParameter{if}v,
    GetTextureParameter{if}vEXT and GetMultiTexParameter{if}vEXT:

        PER_GPU_STORAGE_NV                          0x9548

    Accepted by the <pname> parameter of GetMultisamplefv:

        MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV   0x9549

Additions to the OpenGL 4.5 Specification (Compatibility Profile)

    (Add a new chapter after chapter 19 "Compute Shaders")

    20 Multicast Rendering

    Some implementations support multiple linked GPUs driven by a single context. Often the
    distribution of work to individual GPUs is managed by the GL without client knowledge. This
    chapter specifies commands for explicitly distributing work across GPUs in a linked group.
    Rendering can be enabled or disabled for specific GPUs. Draw commands are multicast, or
    repeated across all enabled GPUs. Objects are shared by all GPUs; however, each GPU has its
    own instance (copy) of many resources, including framebuffers. When each GPU has its own
    instance of a resource, the resource is considered to have per-GPU storage. When all GPUs
    share a single instance of a resource, it is considered to have GPU-shared storage.

    The mechanism for linking GPUs is implementation-specific, as is the mechanism for enabling
    multicast rendering support (if necessary). The number of GPUs usable for multicast rendering
    by a context can be queried by calling GetIntegerv with the symbolic constant
    MULTICAST_GPUS_NV. This number is constant for the lifetime of a context. Individual GPUs
    are identified using zero-based indices in the range [0, n-1], where n is the number of
    multicast GPUs. GPUs are also identified by bitmasks of the form 2^i, where i is the GPU
    index. A set of GPUs is specified by the union (bitwise OR) of the masks for each GPU in the
    set.

    20.1 Controlling Individual GPUs

    Render commands are restricted to a specific set of GPUs with

        void RenderGpuMaskNV(bitfield mask);

    The following errors apply to RenderGpuMaskNV:

    INVALID_OPERATION is generated
      * if <mask> is zero,
      * if <mask> is not zero and <mask> is greater than or equal to 2^n, where n is equal
        to MULTICAST_GPUS_NV, or
      * if issued between BeginConditionalRender and the corresponding EndConditionalRender.

    If the command does not generate an error, RENDER_GPU_MASK_NV is set to <mask>. The default
    value of RENDER_GPU_MASK_NV is (2^n)-1.

    Render commands are skipped for a GPU that is not present in RENDER_GPU_MASK_NV.
    Examples of render commands include draw calls, clears, compute dispatches, and copies or
    pixel path operations that write to a framebuffer (e.g. DrawPixels, BlitFramebuffer). For a
    full list of render commands see section 2.4 (page 26). MulticastBlitFramebufferNV is an
    exception to this policy: while it is a rendering command, it has its own source and
    destination GPU parameters. Note that buffer and texture updates are not affected by
    RENDER_GPU_MASK_NV.

    20.2 Multi-GPU Buffer Storage

    Like other resources, buffer objects can have two types of storage: per-GPU storage or
    GPU-shared storage. Per-GPU storage can be explicitly requested using the
    PER_GPU_STORAGE_BIT_NV flag with BufferStorage/NamedBufferStorageEXT. If this flag is not
    set, the type of storage used is undefined. The implementation may use either type and
    transition between them at any time. Client reads of a buffer with per-GPU storage may source
    from any GPU.

    The following rules apply to buffer objects with per-GPU storage:

      * When the buffer is mapped, updates apply to all GPUs (only WRITE_ONLY access is
        supported).
      * When the buffer is used as the write buffer for CopyBufferSubData or
        CopyNamedBufferSubData, writes apply to all GPUs.
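    The storage rules above suggest a client-side sanity check before requesting per-GPU
    storage. The C sketch below is a hypothetical helper, not part of the extension. It rejects
    flag sets that combine PER_GPU_STORAGE_BIT_NV with MAP_READ_BIT (mapped access to per-GPU
    storage is write-only) or with SPARSE_STORAGE_BIT_ARB; BufferStorage rejects both
    combinations with INVALID_VALUE, as stated further on in this section. The SKETCH_ macro
    names are invented so that the block compiles without GL headers. Their values are the
    standard GL token values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of GL token values; the SKETCH_ prefix marks them as
   illustrative names, not real GL header macros. */
#define SKETCH_MAP_READ_BIT            0x0001u
#define SKETCH_MAP_WRITE_BIT           0x0002u
#define SKETCH_DYNAMIC_STORAGE_BIT     0x0100u
#define SKETCH_SPARSE_STORAGE_BIT_ARB  0x0400u
#define SKETCH_PER_GPU_STORAGE_BIT_NV  0x0800u

/* Hypothetical helper: true if <flags> avoids the PER_GPU_STORAGE_BIT_NV-specific
   INVALID_VALUE error of BufferStorage/NamedBufferStorageEXT. */
static bool per_gpu_storage_flags_ok(uint32_t flags)
{
    if ((flags & SKETCH_PER_GPU_STORAGE_BIT_NV) == 0u)
        return true; /* storage type is undefined; no per-GPU restriction applies */
    return (flags & (SKETCH_MAP_READ_BIT | SKETCH_SPARSE_STORAGE_BIT_ARB)) == 0u;
}

/* Hypothetical flag set for a per-GPU buffer that can be mapped for writing and
   updated with MulticastBufferSubDataNV, which requires DYNAMIC_STORAGE_BIT on
   immutable (BufferStorage) buffers. */
static uint32_t per_gpu_upload_flags(void)
{
    uint32_t flags = SKETCH_PER_GPU_STORAGE_BIT_NV | SKETCH_DYNAMIC_STORAGE_BIT |
                     SKETCH_MAP_WRITE_BIT;
    assert(per_gpu_storage_flags_ok(flags));
    return flags;
}
```

    The check only covers the per-GPU-specific restriction. The other BufferStorage flag rules
    still apply.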

    The following commands affect storage on all GPUs, even if the buffer object has per-GPU
    storage:

        BufferSubData, NamedBufferSubData, ClearBufferSubData, and ClearNamedBufferData

    An INVALID_VALUE error is generated if BufferStorage/NamedBufferStorageEXT is called with
    PER_GPU_STORAGE_BIT_NV set together with MAP_READ_BIT or SPARSE_STORAGE_BIT_ARB.

    To modify buffer object data on one or more GPUs, the client may use the command

        void MulticastBufferSubDataNV(
            bitfield gpuMask, uint buffer,
            intptr offset, sizeiptr size,
            const void *data);

    This command operates similarly to NamedBufferSubData, except that it updates the per-GPU
    buffer data on the set of GPUs defined by <gpuMask>. If <buffer> has GPU-shared storage,
    <gpuMask> is ignored and the shared instance of the buffer is updated.

    An INVALID_VALUE error is generated if <gpuMask> is zero or is greater than or equal to 2^n,
    where n is equal to MULTICAST_GPUS_NV.
    An INVALID_OPERATION error is generated if <buffer> is not the name of an existing buffer
    object.
    An INVALID_VALUE error is generated if <offset> or <size> is negative, or if <offset> + <size>
    is greater than the value of BUFFER_SIZE for the buffer object.
    An INVALID_OPERATION error is generated if any part of the specified buffer range is mapped
    with MapBufferRange or MapBuffer (see section 6.3), unless it was mapped with
    MAP_PERSISTENT_BIT set in the MapBufferRange access flags.
    An INVALID_OPERATION error is generated if the BUFFER_IMMUTABLE_STORAGE flag of the buffer
    object is TRUE and the value of BUFFER_STORAGE_FLAGS for the buffer does not have the
    DYNAMIC_STORAGE_BIT set.

    To copy between buffers created with PER_GPU_STORAGE_BIT_NV, the client may use the command

        void MulticastCopyBufferSubDataNV(
            uint readGpu, bitfield writeGpuMask,
            uint readBuffer, uint writeBuffer,
            intptr readOffset, intptr writeOffset, sizeiptr size);

    This command operates similarly to CopyNamedBufferSubData, while adding control over the
    source and destination GPU(s). The read GPU index is specified by <readGpu> and the set of
    write GPUs is specified by the mask in <writeGpuMask>.

    Implementations may also support this command with buffers not created with
    PER_GPU_STORAGE_BIT_NV. Whether this is supported can be determined by performing one test
    copy and checking for an error (see the error discussion below). Note that a buffer created
    without PER_GPU_STORAGE_BIT_NV is considered to have undefined storage, and the behavior of
    the command depends on the storage type (per-GPU or GPU-shared) currently used for
    <writeBuffer>.
    If <writeBuffer> is using GPU-shared storage, the normal error checks apply, but the command
    behaves as if <writeGpuMask> includes all GPUs. If <writeBuffer> is using per-GPU storage, the
    command behaves as if PER_GPU_STORAGE_BIT_NV were set; however, performance may be reduced.

    The following error may apply to MulticastCopyBufferSubDataNV on some implementations and not
    on others. In earlier revisions of this extension the error was required; therefore,
    applications should perform a test copy using buffers without PER_GPU_STORAGE_BIT_NV before
    relying on that functionality:

    An INVALID_OPERATION error is generated if the value of BUFFER_STORAGE_FLAGS for <readBuffer>
    or <writeBuffer> does not have PER_GPU_STORAGE_BIT_NV set.

    The following errors apply to MulticastCopyBufferSubDataNV:

    An INVALID_OPERATION error is generated if <readBuffer> or <writeBuffer> is not the name of an
    existing buffer object.
    An INVALID_VALUE error is generated if any of <readOffset>, <writeOffset>, or <size> is
    negative, if <readOffset> + <size> exceeds the size of the source buffer object, or if
    <writeOffset> + <size> exceeds the size of the destination buffer object.
    An INVALID_OPERATION error is generated if either the source or destination buffer object is
    mapped, unless it was mapped with MAP_PERSISTENT_BIT set in the Map*BufferRange access
    flags.
    An INVALID_VALUE error is generated if <readGpu> is greater than or equal to
    MULTICAST_GPUS_NV.
    An INVALID_OPERATION error is generated if <writeGpuMask> is zero. An INVALID_VALUE error is
    generated if <writeGpuMask> is not zero and <writeGpuMask> is greater than or equal to 2^n,
    where n is equal to MULTICAST_GPUS_NV.
    An INVALID_VALUE error is generated if the source and destination are the same buffer object,
    <readGpu> is present in <writeGpuMask>, and the ranges [<readOffset>; <readOffset> + <size>)
    and [<writeOffset>; <writeOffset> + <size>) overlap.

    20.3 Multi-GPU Framebuffers and Textures

    All buffers in the default framebuffer as well as renderbuffers receive per-GPU storage. By
    default, storage for textures is undefined: it may be per-GPU or GPU-shared and can transition
    between the types at any time. Per-GPU storage can be specified via
    [Multi]Tex[ture]Parameter{if}[v] with PER_GPU_STORAGE_NV for the <pname> argument and TRUE for
    the value. For this storage parameter to take effect, it must be specified after the texture
    object is created and before the texture contents are defined by TexImage*, TexStorage* or
    TextureStorage*.
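    The ordering rule for PER_GPU_STORAGE_NV can be modeled as a parameter that is latched when
    storage is allocated. The C sketch below is a hypothetical toy model: the type and function
    names are invented for illustration and say nothing about how a driver tracks this state. It
    follows the rule above and the Section 8.10 addition later in this document. The parameter
    only affects the next allocation, and it cannot be changed once the texture is immutable.

```c
#include <assert.h>
#include <stdbool.h>

/* Storage type of a texture as seen by the client. */
typedef enum {
    TEX_STORAGE_NOT_ALLOCATED, /* created, but no TexImage* or TexStorage* call yet */
    TEX_STORAGE_UNDEFINED,     /* per-GPU or GPU-shared, at the implementation's choice */
    TEX_STORAGE_PER_GPU        /* allocated while PER_GPU_STORAGE_NV was TRUE */
} tex_storage_kind;

typedef struct {
    bool per_gpu_param;       /* current value of PER_GPU_STORAGE_NV */
    bool immutable;           /* TEXTURE_IMMUTABLE_FORMAT */
    tex_storage_kind storage; /* storage type latched at the last allocation */
} toy_texture;

/* Models TexParameteri(..., PER_GPU_STORAGE_NV, value). Returns false where the
   GL would generate an error (the texture is immutable). */
static bool toy_set_per_gpu(toy_texture *t, bool value)
{
    if (t->immutable)
        return false;
    t->per_gpu_param = value; /* stored now, applied at the next allocation */
    return true;
}

/* Models TexImage* (immutable == false) or TexStorage* (immutable == true).
   Re-allocating an immutable texture is a GL error; the toy model asserts instead. */
static void toy_allocate(toy_texture *t, bool immutable)
{
    assert(!t->immutable);
    t->storage = t->per_gpu_param ? TEX_STORAGE_PER_GPU : TEX_STORAGE_UNDEFINED;
    t->immutable = immutable;
}
```

    In this model, setting the parameter after TexImage* leaves the current storage undefined
    until the next TexImage* call. After TexStorage*, the parameter can no longer be set.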

    20.3.1 Copying Image Data Between GPUs

    To copy texel data between GPUs, the client may use the command:

        void MulticastCopyImageSubDataNV(
            uint srcGpu, bitfield dstGpuMask,
            uint srcName, enum srcTarget,
            int srcLevel,
            int srcX, int srcY, int srcZ,
            uint dstName, enum dstTarget,
            int dstLevel,
            int dstX, int dstY, int dstZ,
            sizei srcWidth, sizei srcHeight, sizei srcDepth);

    This command operates equivalently to CopyImageSubData, except that it takes a source GPU and
    a destination GPU set defined by <srcGpu> and <dstGpuMask> (respectively). Texel data is
    copied from the source GPU to all destination GPUs. The following errors apply to
    MulticastCopyImageSubDataNV:

    INVALID_ENUM is generated
      * if either <srcTarget> or <dstTarget>
        - is not RENDERBUFFER or a valid non-proxy texture target
        - is TEXTURE_BUFFER, or
        - is one of the cubemap face selectors described in table 3.17,
      * if the target does not match the type of the object.

    INVALID_OPERATION is generated
      * if either object is a texture and the texture is not complete,
      * if the source and destination formats are not compatible,
      * if the source and destination numbers of samples do not match,
      * if one image is compressed and the other is uncompressed and the
        block size of the compressed image is not equal to the texel size
        of the uncompressed image.

    INVALID_VALUE is generated
      * if <srcGpu> is greater than or equal to MULTICAST_GPUS_NV,
      * if <dstGpuMask> is zero,
      * if <dstGpuMask> is greater than or equal to 2^n, where n is equal to
        MULTICAST_GPUS_NV,
      * if either <srcName> or <dstName> does not correspond to a valid
        renderbuffer or texture object according to the corresponding
        target parameter, or
      * if the specified level is not a valid level for the image, or
      * if the dimensions of either subregion exceed the boundaries
        of the corresponding image object, or
      * if the image format is compressed and the dimensions of the
        subregion fail to meet the alignment constraints of the format.
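    The GPU-related INVALID_VALUE conditions above follow the index and mask conventions of
    chapter 20. GPU i is identified by the mask 2^i, a set of GPUs is the bitwise OR of its
    members' masks, and a valid set mask is nonzero and less than 2^n. The C sketch below uses
    hypothetical helper names. It assumes n <= 32, so that masks fit in a 32-bit bitfield. It
    mirrors only the GPU argument checks, not the texture and format checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask 2^i identifying GPU index i. */
static uint32_t gpu_bit(unsigned i)
{
    assert(i < 32u); /* sketch assumption: masks fit in 32 bits */
    return 1u << i;
}

/* Mask (2^n)-1 of all n multicast GPUs; also the default RENDER_GPU_MASK_NV. */
static uint32_t all_gpus_mask(unsigned n)
{
    assert(n >= 1u && n <= 32u);
    return n == 32u ? 0xFFFFFFFFu : (1u << n) - 1u;
}

/* A GPU set mask is valid if it is nonzero and less than 2^n. */
static bool gpu_mask_valid(uint32_t mask, unsigned n)
{
    return mask != 0u && (mask & ~all_gpus_mask(n)) == 0u;
}

/* Hypothetical pre-check mirroring the GPU-related INVALID_VALUE conditions of
   MulticastCopyImageSubDataNV; n is the value of MULTICAST_GPUS_NV. */
static bool copy_image_gpu_args_valid(unsigned srcGpu, uint32_t dstGpuMask, unsigned n)
{
    return srcGpu < n && gpu_mask_valid(dstGpuMask, n);
}
```

    For example, with four GPUs, copying from GPU 0 to GPUs 1 and 2 uses srcGpu = 0 and
    dstGpuMask = gpu_bit(1) | gpu_bit(2), which is 6. The GL performs these checks itself; a
    client might mirror them only in debug builds.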

    To copy pixel values from one GPU to another, use the following command:

        void MulticastBlitFramebufferNV(uint srcGpu, uint dstGpu,
                                        int srcX0, int srcY0, int srcX1, int srcY1,
                                        int dstX0, int dstY0, int dstX1, int dstY1,
                                        bitfield mask, enum filter);

    This command operates equivalently to BlitNamedFramebuffer, except that it takes a source GPU
    and a destination GPU defined by <srcGpu> and <dstGpu> (respectively). Pixel values are
    copied from the read framebuffer on the source GPU to the draw framebuffer on the destination
    GPU.

    In addition to the errors generated by BlitNamedFramebuffer (see listing starting on page
    634), calling MulticastBlitFramebufferNV will generate INVALID_VALUE if <srcGpu> or <dstGpu>
    is greater than or equal to MULTICAST_GPUS_NV.

    20.3.2 Per-GPU Sample Locations

    Programmable sample locations can be customized for each GPU and framebuffer using the
    following command:

        void MulticastFramebufferSampleLocationsfvNV(uint gpu, uint framebuffer, uint start,
                                                     sizei count, const float *v);

    An INVALID_OPERATION error is generated by MulticastFramebufferSampleLocationsfvNV if
    <framebuffer> is not the name of an existing framebuffer object.

    An INVALID_VALUE error is generated if the sum of <start> and <count> is greater than
    PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB.

    An INVALID_VALUE error is generated if <gpu> is greater than or equal to MULTICAST_GPUS_NV.

    This command is equivalent to FramebufferSampleLocationsfvARB, except that it sets
    MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV at the appropriate offset for the specified GPU.
    Just as with FramebufferSampleLocationsfvARB, FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_ARB
    must be enabled for these sample locations to take effect. FramebufferSampleLocationsfvARB
    and NamedFramebufferSampleLocationsfvARB also set MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV,
    but they set the specified samples on all multicast GPUs. If <gpu> is 0,
    MulticastFramebufferSampleLocationsfvNV updates PROGRAMMABLE_SAMPLE_LOCATION_ARB in addition
    to MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV.
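    The relationship between the per-GPU command and the broadcast ARB commands can be
    illustrated with a small toy model of one framebuffer's sample-location state. The C sketch
    below is hypothetical. It stores one table of (x, y) pairs per GPU and uses invented values
    for the GPU limit and the table size. Following ARB_sample_locations, it assumes that <v>
    holds 2 * <count> floats for samples <start> through <start> + <count> - 1. It does not model
    the GetMultisamplefv index layout.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_GPUS   4  /* hypothetical upper bound on MULTICAST_GPUS_NV */
#define TOY_TABLE_SIZE 16 /* hypothetical PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB */

/* Sample-location state of one framebuffer in the toy model. */
typedef struct {
    unsigned num_gpus;                                /* must be <= TOY_MAX_GPUS */
    float multicast[TOY_MAX_GPUS][TOY_TABLE_SIZE][2]; /* per-GPU (x, y) pairs */
    float arb[TOY_TABLE_SIZE][2];                     /* PROGRAMMABLE_SAMPLE_LOCATION_ARB */
} toy_sample_locations;

/* start + count must not exceed the table size (checked without overflow). */
static bool range_ok(unsigned start, unsigned count)
{
    return start <= TOY_TABLE_SIZE && count <= TOY_TABLE_SIZE - start;
}

/* Models MulticastFramebufferSampleLocationsfvNV: v holds 2 * count floats. */
static bool toy_set_gpu(toy_sample_locations *s, unsigned gpu,
                        unsigned start, unsigned count, const float *v)
{
    if (gpu >= s->num_gpus || gpu >= TOY_MAX_GPUS || !range_ok(start, count))
        return false; /* the GL would generate INVALID_VALUE */
    for (unsigned i = 0; i < count; ++i) {
        s->multicast[gpu][start + i][0] = v[2 * i];
        s->multicast[gpu][start + i][1] = v[2 * i + 1];
        if (gpu == 0) { /* GPU 0 also updates the ARB state */
            s->arb[start + i][0] = v[2 * i];
            s->arb[start + i][1] = v[2 * i + 1];
        }
    }
    return true;
}

/* Models FramebufferSampleLocationsfvARB: the same locations on every GPU. */
static bool toy_set_all(toy_sample_locations *s, unsigned start, unsigned count,
                        const float *v)
{
    assert(s->num_gpus >= 1 && s->num_gpus <= TOY_MAX_GPUS);
    if (!range_ok(start, count))
        return false;
    for (unsigned g = 0; g < s->num_gpus; ++g)
        (void)toy_set_gpu(s, g, start, count, v); /* g == 0 also fills arb */
    return true;
}
```

    A typical pattern in this model is to broadcast a base pattern with toy_set_all and then
    override selected GPUs with toy_set_gpu. Overrides on GPUs other than 0 leave the ARB table
    untouched.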

    The programmed sample locations can be retrieved using GetMultisamplefv with <pname> set to
    MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV and indices calculated as follows:

        index_x = gpu * PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB + 2 * sample_i;
        index_y = gpu * PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB + 2 * sample_i + 1;

    20.4 Interactions with Other Copy Functions

    Many existing commands can be used to copy between resources with GPU-shared, per-GPU or
    undefined storage. Examples include ReadPixels, GetBufferSubData, and TexImage2D with a pixel
    unpack buffer. The following table defines how the storage types of the source and
    destination influence the behavior of these copies.

    Table 20.1 Behavior of Copy Commands with Multi-GPU Storage

    Source      Destination  Behavior
    ----------  -----------  ----------------------------------------------------------------------
    GPU-shared  GPU-shared   There is just one source and one destination. Copy from source to
                             destination.
    GPU-shared  per-GPU      There is a single source. Copy it to the destination on all GPUs.
    GPU-shared  undefined    Either of the above behaviors for a GPU-shared source may apply.

    per-GPU     GPU-shared   Copy from the GPU with the lowest index set in RENDER_GPU_MASK_NV
                             to the shared destination.
    per-GPU     per-GPU      Implementations are encouraged to copy from source to destination
                             separately on each GPU. This is not required. When this is not
                             feasible, the copy should source from the GPU with the lowest index
                             set in RENDER_GPU_MASK_NV.
    per-GPU     undefined    Either of the above behaviors for a per-GPU source may apply.

    undefined   GPU-shared   Either of the above behaviors for a GPU-shared destination may apply.
    undefined   per-GPU      Either of the above behaviors for a per-GPU destination may apply.
    undefined   undefined    Any of the above behaviors may apply.

    20.5 Multi-GPU Synchronization

    MulticastCopyImageSubDataNV and MulticastCopyBufferSubDataNV each provide implicit
    synchronization with previous work on the source GPU. MulticastBlitFramebufferNV is
    different, providing implicit synchronization with previous work on the destination GPU.
    In both cases, any synchronization beyond this implicit synchronization can be achieved with
    calls to the barrier command:

        void MulticastBarrierNV(void);

    This command blocks all GPUs until all previous commands have been completed by all GPUs and
    all writes have landed. To guarantee consistency, synchronization must be placed between
    any two accesses by multiple GPUs to the same memory when at least one of the accesses is a
    write. This includes accesses to both the source and the destination.
    The safest approach is to call MulticastBarrierNV immediately before and after each copy that
    involves multiple GPUs.

    GPU writes to and reads from GPU-shared locations require synchronization as well. GPU writes
    such as transform feedback, shader image stores, CopyTexImage, and CopyBufferSubData are not
    automatically synchronized with writes by other GPUs. Nor are GPU reads, such as texture
    fetches, shader image loads, and CopyTexImage, synchronized with writes by other GPUs.
    Existing barriers such as TextureBarrier and MemoryBarrier only provide consistency guarantees
    for rendering, writes and reads on a single GPU.

    In some cases it may be desirable to have one or more GPUs wait for an operation to complete
    on another GPU without synchronizing all GPUs with MulticastBarrierNV. This can be performed
    with the following command:

        void MulticastWaitSyncNV(uint signalGpu, bitfield waitGpuMask);

    INVALID_VALUE is generated
      * if <signalGpu> is greater than or equal to MULTICAST_GPUS_NV,
      * if <waitGpuMask> is zero,
      * if <waitGpuMask> is greater than or equal to 2^n, where n is equal to
        MULTICAST_GPUS_NV, or
      * if <signalGpu> is present in <waitGpuMask>.

    MulticastWaitSyncNV provides the same consistency guarantees as MulticastBarrierNV, but only
    between the GPUs specified by <signalGpu> and <waitGpuMask> and only in a single direction.
It forces 4595bd8deadSopenharmony_ci the GPUs specified by waitGpuMask to wait until the GPU specified by <signalGpu> has completed 4605bd8deadSopenharmony_ci all previous commands and writes associated with those commands. 4615bd8deadSopenharmony_ci 4625bd8deadSopenharmony_ci 20.6 Multi-GPU Queries 4635bd8deadSopenharmony_ci 4645bd8deadSopenharmony_ci Queries are performed across all multicast GPUs. Each query object stores independent result 4655bd8deadSopenharmony_ci values for each GPU. The result value for a specific GPU can be queried using one of the 4665bd8deadSopenharmony_ci following commands: 4675bd8deadSopenharmony_ci 4685bd8deadSopenharmony_ci void MulticastGetQueryObjectivNV(uint gpu, uint id, enum pname, int *params); 4695bd8deadSopenharmony_ci void MulticastGetQueryObjectuivNV(uint gpu, uint id, enum pname, uint *params); 4705bd8deadSopenharmony_ci void MulticastGetQueryObjecti64vNV(uint gpu, uint id, enum pname, int64 *params); 4715bd8deadSopenharmony_ci void MulticastGetQueryObjectui64vNV(uint gpu, uint id, enum pname, uint64 *params); 4725bd8deadSopenharmony_ci 4735bd8deadSopenharmony_ci The behavior of these commands matches the GetQueryObject* equivalent commands, except they 4745bd8deadSopenharmony_ci return the result value for the specified GPU. A query may be available on one GPU but not on 4755bd8deadSopenharmony_ci another, so it may be necessary to check QUERY_RESULT_AVAILABLE for each GPU. GetQueryObject* 4765bd8deadSopenharmony_ci return query results and availability for GPU 0 only. 4775bd8deadSopenharmony_ci 4785bd8deadSopenharmony_ci In addition to the errors generated by GetQueryObject* (see the listing in section 4.2 on page 4795bd8deadSopenharmony_ci 49), calling MulticastGetQueryObject* will generate INVALID_VALUE if <gpu> is greater than or 4805bd8deadSopenharmony_ci equal to MULTICAST_GPUS_NV. 
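
    Since GPU indices and GPU masks are both small integers, the argument rules for
    MulticastWaitSyncNV in section 20.5 are easy to violate by accident (compare Issue 12). The
    following is a minimal C sketch of client-side helpers, assuming at most 32 GPUs so that a
    mask fits in a 32-bit bitfield. The names all_gpus_mask, others_mask and
    wait_sync_args_valid are hypothetical and not part of this extension; wait_sync_args_valid
    only mirrors the INVALID_VALUE conditions listed for MulticastWaitSyncNV.

```c
/* Hypothetical helpers, not part of NV_gpu_multicast.
   Assumes MULTICAST_GPUS_NV <= 32 so every GPU mask fits in 32 bits. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per multicast GPU. num_gpus == 32 is handled separately
   because shifting a 32-bit value by 32 is undefined behavior in C. */
uint32_t all_gpus_mask(uint32_t num_gpus)
{
    assert(num_gpus >= 1 && num_gpus <= 32);
    return num_gpus == 32 ? UINT32_MAX : (1u << num_gpus) - 1u;
}

/* Every GPU except signal_gpu: a <waitGpuMask> meaning "all other GPUs
   wait for signal_gpu". Returns 0 when there is only one GPU. */
uint32_t others_mask(uint32_t signal_gpu, uint32_t num_gpus)
{
    assert(signal_gpu < num_gpus);
    return all_gpus_mask(num_gpus) & ~(1u << signal_gpu);
}

/* Returns false whenever one of the INVALID_VALUE conditions listed for
   MulticastWaitSyncNV applies. */
bool wait_sync_args_valid(uint32_t signal_gpu, uint32_t wait_gpu_mask, uint32_t num_gpus)
{
    if (signal_gpu >= num_gpus)
        return false;                                 /* signalGpu >= MULTICAST_GPUS_NV */
    if (wait_gpu_mask == 0)
        return false;                                 /* empty waitGpuMask */
    if ((wait_gpu_mask & ~all_gpus_mask(num_gpus)) != 0)
        return false;                                 /* waitGpuMask >= 2^n */
    if ((wait_gpu_mask & (1u << signal_gpu)) != 0)
        return false;                                 /* a GPU may not wait on itself */
    return true;
}
```

    For example, others_mask(0, n) with n equal to MULTICAST_GPUS_NV gives a <waitGpuMask> that
    makes every other GPU wait for GPU 0. It is zero when MULTICAST_GPUS_NV is 1, which
    MulticastWaitSyncNV would reject, so a single-GPU code path should simply skip the wait.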

Additions to Chapter 8 of the OpenGL 4.5 (Compatibility Profile) Specification
(Textures and Samplers)

    Modify Section 8.10 (Texture Parameters)

    Insert the following paragraph before Table 8.25 (Texture parameters and their values):

        If <pname> is PER_GPU_STORAGE_NV, then the state is stored in the texture, but only
        takes effect the next time storage is allocated for a texture using TexImage*,
        TexStorage* or TextureStorage*. If the value of TEXTURE_IMMUTABLE_FORMAT is TRUE, then
        PER_GPU_STORAGE_NV cannot be changed and an error is generated.

    Additions to Table 8.26 Texture parameters and their values

        Name                Type     Legal values
        ------------------  -------  ------------
        PER_GPU_STORAGE_NV  boolean  TRUE, FALSE

Additions to Chapter 10 of the OpenGL 4.5 (Compatibility Profile) Specification
(Vertex Specification and Drawing Commands)

    Modify Section 10.9 (Conditional Rendering)

    Replace the following text:

        If the result (SAMPLES_PASSED) of the query is zero, or if the result
        (ANY_SAMPLES_PASSED or ANY_SAMPLES_PASSED_CONSERVATIVE) is FALSE, all rendering
        commands described in section 2.4 are discarded and have no effect when issued between
        BeginConditionalRender and the corresponding EndConditionalRender.

    with this text:

        For each active render GPU, if the result (SAMPLES_PASSED) of the query on that GPU is
        zero, or if the result (ANY_SAMPLES_PASSED or ANY_SAMPLES_PASSED_CONSERVATIVE) is
        FALSE, all rendering commands described in section 2.4 are discarded by that GPU and
        have no effect when issued between BeginConditionalRender and the corresponding
        EndConditionalRender.

    Similarly replace the following:

        If the result (SAMPLES_PASSED) of the query is non-zero, or if the result
        (ANY_SAMPLES_PASSED or ANY_SAMPLES_PASSED_CONSERVATIVE) is TRUE, such commands are not
        discarded.

    with this:

        For each active render GPU, if the result (SAMPLES_PASSED) of the query on that GPU is
        non-zero, or if the result (ANY_SAMPLES_PASSED or ANY_SAMPLES_PASSED_CONSERVATIVE) is
        TRUE, such commands are not discarded.

    Finally, replace all instances of "the GL" with "each active render GPU".
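
    The replacement text above makes the discard decision independently on each active render
    GPU, using only that GPU's own query result. The C function below is a hypothetical
    CPU-side model of that rule, not driver code or API. It assumes SAMPLES_PASSED results, at
    most 32 GPUs, and that the active render GPUs are those set in RENDER_GPU_MASK_NV (section
    20.1). The ANY_SAMPLES_PASSED variants follow the same pattern with FALSE/TRUE in place of
    zero/non-zero.

```c
/* Hypothetical CPU-side model of per-GPU conditional rendering; not driver
   code. Assumes SAMPLES_PASSED results and at most 32 GPUs. */
#include <assert.h>
#include <stdint.h>

/* Returns the mask of GPUs that execute rendering commands issued between
   BeginConditionalRender and EndConditionalRender.
   render_gpu_mask:   active render GPUs (assumed to be RENDER_GPU_MASK_NV)
   samples_passed[g]: SAMPLES_PASSED result of the query on GPU g */
uint32_t conditional_render_gpus(uint32_t render_gpu_mask,
                                 const uint64_t *samples_passed,
                                 uint32_t num_gpus)
{
    assert(num_gpus >= 1 && num_gpus <= 32);
    uint32_t executing = 0;
    for (uint32_t g = 0; g < num_gpus; ++g) {
        uint32_t bit = 1u << g;
        /* Inactive GPUs render nothing; active GPUs consult only their own
           result, so a pass on one GPU never un-discards another. */
        if ((render_gpu_mask & bit) != 0 && samples_passed[g] != 0)
            executing |= bit;
    }
    return executing;
}
```

    For example, with a render mask of 0x3 and per-GPU results {0, 5}, the model returns 0x2:
    GPU 1 executes the commands while GPU 0 discards them, even though the query passed on
    GPU 1.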

Additions to Chapter 14 of the OpenGL 4.5 (Compatibility Profile) Specification
(Fixed-Function Primitive Assembly and Rasterization)

    Modify Section 14.3.1 (Multisampling)

    Replace the following text:

        The location for sample <i> is taken from v[2*(i-start)] and v[2*(i-start)+1].

    with the following:

        These commands set the sample locations for all multicast GPUs in
        MULTICAST_FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_NV. The location for sample <i> on
        GPU <g> is taken from v[g*N+2*(i-start)] and v[g*N+2*(i-start)+1].

    Replace the following error generated by GetMultisamplefv:

        An INVALID_ENUM error is generated if <pname> is not SAMPLE_LOCATION_ARB or
        PROGRAMMABLE_SAMPLE_LOCATION_ARB.

    with the following:

        An INVALID_ENUM error is generated if <pname> is not SAMPLE_LOCATION_ARB,
        PROGRAMMABLE_SAMPLE_LOCATION_ARB or MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV.

    Add the following to the list of errors generated by GetMultisamplefv:

        An INVALID_VALUE error is generated if <pname> is
        MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV and <index> is greater than or equal to the
        value of PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB multiplied by the value of
        MULTICAST_GPUS_NV.

    Replace the following pseudocode (in both locations):

        float *table = FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_ARB;
        sample_location.xy = (table[2*sample_i], table[2*sample_i+1]);

    with the following:

        float *table = MULTICAST_FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_NV;
        table += PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB * gpu;
        sample_location.xy = (table[2*sample_i], table[2*sample_i+1]);

Additions to the WGL/GLX/EGL/AGL Specifications

    None

Dependencies on ARB_sample_locations

    If ARB_sample_locations is not supported, section 20.3.2 and any references to
    MulticastFramebufferSampleLocationsfvNV and MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV should
    be removed. The modifications to Section 14.3.1 (Multisampling) should also be removed.

Dependencies on ARB_sparse_buffer

    If ARB_sparse_buffer is not supported, any reference to SPARSE_STORAGE_BIT_ARB should be
    removed.

Interactions with EXT_bindable_uniform

    When using the functionality of EXT_bindable_uniform and a per-GPU storage buffer is bound
    to a bindable location in a program object, client uniform updates apply to all GPUs.

    An INVALID_OPERATION error is generated if a buffer with PER_GPU_STORAGE_BIT_NV is bound to
    a program object's bindable location and GetUniformfv, GetUniformiv, GetUniformuiv or
    GetUniformdv is called.

Errors

    Relaxation of INVALID_ENUM errors
    ---------------------------------
    GetBooleanv, GetIntegerv, GetInteger64v, GetFloatv, and GetDoublev now accept new tokens as
    described in the "New Tokens" section.

New State

    Additions to Table 23.4 Rasterization
                                                     Initial
    Get Value                   Type    Get Command  Value    Description              Sec.  Attribute
    --------------------------  ------  -----------  -------  -----------------------  ----  ---------
    RENDER_GPU_MASK_NV          Z+      GetIntegerv  *        Mask of GPUs that have   20.1  -
                                                              writes enabled
    * See section 20.1

    Additions to Table 23.19 Textures (state per texture object)

                                               Initial
    Get Value           Type  Get Command      Value    Description                Sec.
    ------------------  ----  ---------------  -------  -------------------------  ----
    PER_GPU_STORAGE_NV  B     GetTexParameter  FALSE    Per-GPU storage requested  20.3

    Additions to Table 23.30 Framebuffer (state per framebuffer object)

    Get Value                 Get Command       Type  Initial Value  Description          Sec.    Attribute
    ------------------------  ----------------  ----  -------------  -------------------  ------  ---------
    MULTICAST_PROGRAMMABLE_-  GetMultisamplefv  *     (0.5,0.5)      Programmable sample  20.3.2  -
    SAMPLE_LOCATION_NV                                               locations

    * The type here is "2* x n x 2 x R[0,1]", which is equivalent to
      PROGRAMMABLE_SAMPLE_LOCATION_ARB but with sample locations for all multicast GPUs
      (one after the other).

New Implementation Dependent State

    Add to Table 23.82, Implementation-Dependent Values, p. 784

                                                         Minimum
    Get Value                     Type    Get Command    Value    Description             Sec.  Attribute
    ----------------------------  ------  -------------  -------  ----------------------  ----  ---------
    MULTICAST_GPUS_NV             Z+      GetIntegerv    1        Number of linked GPUs   20.0  -
                                                                  usable for multicast

Backwards Compatibility

    This extension replaces NVX_linked_gpu_multicast. The enumerant values for MULTICAST_GPUS_NV
    and PER_GPU_STORAGE_BIT_NV match those of MAX_LGPU_GPUS_NVX and LGPU_SEPARATE_STORAGE_BIT_NVX
    (respectively). MulticastBufferSubDataNV, MulticastCopyImageSubDataNV and MulticastBarrierNV
    behave analogously to LGPUNamedBufferSubDataNVX, LGPUCopyImageSubDataNVX and
    LGPUInterlockNVX (respectively).

Sample Code

    Binocular stereo rendering example using NV_gpu_multicast with single GPU fallback:

        struct ViewData {
            GLint viewport_index;
            GLfloat mvp[16];
            GLfloat modelview[16];
        };
        ViewData leftViewData = { 0, {...}, {...} };
        ViewData rightViewData = { 1, {...}, {...} };
        const GLsizeiptr size = sizeof(ViewData);

        GLuint ubo[2];
        glCreateBuffers(2, &ubo[0]);

        // has_NV_gpu_multicast is assumed to be true only if the extension is supported and
        // MULTICAST_GPUS_NV >= 2, since the masks below address GPU 1.
        if (has_NV_gpu_multicast) {
            glNamedBufferStorage(ubo[0], size, NULL, GL_PER_GPU_STORAGE_BIT_NV | GL_DYNAMIC_STORAGE_BIT);
            // Per-GPU contents of one buffer: left view on GPU 0, right view on GPU 1
            glMulticastBufferSubDataNV(0x1, ubo[0], 0, size, &leftViewData);
            glMulticastBufferSubDataNV(0x2, ubo[0], 0, size, &rightViewData);
        } else {
            glNamedBufferStorage(ubo[0], size, &leftViewData, 0);
            glNamedBufferStorage(ubo[1], size, &rightViewData, 0);
        }

        glViewportIndexedf(0, 0, 0, 640, 480);   // left viewport
        glViewportIndexedf(1, 640, 0, 640, 480); // right viewport
        // Vertex shader sets gl_ViewportIndex according to viewport_index in the UBO
        // (e.g. via ARB_shader_viewport_layer_array, see Issue 6)

        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

        if (has_NV_gpu_multicast) {
            glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[0]);
            drawScene();
            // Make GPU 1 wait for the glClear above to complete on GPU 0, since the copy
            // below overwrites the right half of GPU 0's renderbuffer
            glMulticastWaitSyncNV(0, 0x2);
            // Copy right viewport from GPU 1 to GPU 0
            // (renderBuffer is the application's color renderbuffer)
            glMulticastCopyImageSubDataNV(1, 0x1,
                                          renderBuffer, GL_RENDERBUFFER, 0, 640, 0, 0,
                                          renderBuffer, GL_RENDERBUFFER, 0, 640, 0, 0,
                                          640, 480, 1);
            // Make GPU 0 wait for GPU 1's copy to GPU 0
            glMulticastWaitSyncNV(1, 0x1);
        } else {
            glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[0]);
            drawScene();
            glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[1]);
            drawScene();
        }
        // Both viewports are now present in GPU 0's renderbuffer

Issues

    (1) Should we provide an explicit inter-GPU synchronization API?
        Will this make the implementation easier or harder for the driver and applications?

        RESOLVED. Yes. A naive implementation of implicit synchronization would simply
        synchronize the GPUs before and after each copy. Smart implicit synchronization would
        have to track all APIs that can modify buffers and textures, creating an excessive
        burden for driver implementation and maintenance. An application can track
        dependencies more easily and outperform a naive driver implementation using explicit
        synchronization.

    (2) How does this extension interact with queries (e.g. occlusion queries)?

        RESOLVED. Queries are performed separately on each GPU. The standard GetQueryObject*
        APIs return query results for GPU 0 only. However, GetQueryBufferObject* can be used to
        retrieve query results for all GPUs through a buffer with separate storage
        (PER_GPU_STORAGE_BIT_NV).

    (3) Are copy operations controlled by the render mask?

        RESOLVED. Copies which write to the framebuffer are considered render commands and are
        implicitly controlled by the render mask. Copies between textures and buffers are not
        considered render commands, so they are not influenced by the mask. If masked copies
        are desired, use MulticastCopyImageSubDataNV, MulticastCopyBufferSubDataNV or
        MulticastBlitFramebufferNV. These commands explicitly specify the GPU source and
        destination and are not influenced by the render mask.

    (4) What happens if the MulticastCopyBufferSubDataNV source and destination buffers are the
        same?

        RESOLVED. When the source and destination involve the same GPU,
        MulticastCopyBufferSubDataNV matches the behavior of CopyBufferSubData: overlapped
        copies are not allowed and an INVALID_VALUE error results. When the source and
        destination do not involve the same GPU, overlapping copies are allowed and no error
        is generated.

    (5) How does this extension interact with CopyTexImage2D?

        RESOLVED. The behavior depends on the storage type of the target. See section 20.4.
        Since CopyTexImage* sources from the framebuffer, the source always has per-GPU
        storage.

    (6) Should we provide a mechanism to modify viewports independently for each GPU?

        RESOLVED. No. This can be achieved using multicast UBOs and
        ARB_shader_viewport_layer_array.

    (7) Should we add a present API that automatically displays content from a specific GPU? It
        could abstract the transport mechanism, copying when necessary.

        RESOLVED. No. Transfers should be avoided to maximize performance and minimize latency.
        Minimizing transfers requires application awareness of display connectivity to assign
        rendering appropriately. Hiding transfers behind an API would also prevent some
        interesting multi-GPU rendering techniques (e.g. checkerboard-style split rendering).

        WGL_NV_bridged_display can be used to enable display from multiple GPUs without copies.

    (8) Should we expose the extension on single-GPU configurations?

        RESOLVED. Yes, this is recommended. It allows more code sharing between multi-GPU and
        single-GPU code paths. If there is only one GPU present, MULTICAST_GPUS_NV will be 1.
        It may also be 1 if explicit GPU control is unavailable (e.g. if the active multi-GPU
        rendering mode prevents it). Note that in revisions 5 and prior of this extension the
        minimum for MULTICAST_GPUS_NV was 2.

    (9) Should glGet*BufferParameter* return the PER_GPU_STORAGE_BIT_NV bit when
        BUFFER_STORAGE_FLAGS is queried?

        RESOLVED. Yes. BUFFER_STORAGE_FLAGS must match the flags parameter input to
        *BufferStorage, as specified in table 6.3.

    (10) Can a query be complete/available on one GPU and not another?

        RESOLVED. Yes. Independent query completion is important for conditional rendering. It
        allows each GPU to begin conditional rendering in mode QUERY_WAIT without waiting on
        other GPUs.

    (11) How can custom texel data be uploaded to each GPU for a given texture?

        The easiest way is to create staging textures with the custom texel data and then copy
        them to a texture with per-GPU storage using MulticastCopyImageSubDataNV.
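
    Issue (4) distinguishes buffer copies that involve the same GPU from copies that do not.
    The C function below is a minimal, hypothetical sketch of that rule and not part of the
    API. It assumes that "involve the same GPU" means the source GPU index is also set in the
    destination GPU mask, that the read and write ranges are [offset, offset + size) as for
    CopyBufferSubData, and that the source GPU index is less than 32.

```c
/* Hypothetical model of the Issue (4) overlap rule; not part of the API. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if, under the assumptions above, a MulticastCopyBufferSubDataNV
   call with these arguments would hit the CopyBufferSubData-style overlap
   error (INVALID_VALUE). */
bool multicast_copy_overlap_error(uint32_t src_gpu, uint32_t dst_gpu_mask,
                                  uint32_t read_buffer, uint32_t write_buffer,
                                  uint64_t read_offset, uint64_t write_offset,
                                  uint64_t size)
{
    assert(src_gpu < 32);
    /* Different buffer objects can never overlap. */
    if (read_buffer != write_buffer)
        return false;
    /* Source GPU not among the destinations: the copy only crosses GPUs,
       so overlapping ranges are allowed. */
    if ((dst_gpu_mask & (1u << src_gpu)) == 0)
        return false;
    /* Same buffer on the same GPU: the half-open ranges
       [read_offset, read_offset + size) and [write_offset, write_offset + size)
       must not intersect. */
    return size != 0 &&
           read_offset < write_offset + size &&
           write_offset < read_offset + size;
}
```

    For instance, copying bytes [0, 32) to [16, 48) of the same buffer is reported as an error
    when the source GPU is in the destination mask, but is allowed as a copy from GPU 0 to
    GPU 1 only (destination mask 0x2).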

    (12) Should we allow the <waitGpuMask> in MulticastWaitSyncNV to include the signal GPU?

        RESOLVED. No. There is no reason for a GPU to wait on itself; this would effectively be
        a no-op in the command stream. Furthermore, it is easy to confuse GPU indices and masks,
        so it is beneficial to explicitly generate an error in this case.

    (13) Will support for NVX_linked_gpu_multicast continue?

        RESOLVED. NVX_linked_gpu_multicast is deprecated and applications should switch to
        NV_gpu_multicast. However, implementations are encouraged to continue supporting
        NVX_linked_gpu_multicast for backwards compatibility.

    (14) Does RenderGpuMaskNV work with immediate mode rendering?

        RESOLVED. Yes, the render GPU mask applies to immediate mode rendering the same as
        other rendering. Note that RenderGpuMaskNV is not one of the commands allowed between
        Begin and End (see section 10.7.5), so the render mask must be set before Begin is
        called.

Revision History

    Rev.  Date      Author    Changes
    ----  --------  --------  -----------------------------------------------
     7    04/02/19  jschnarr  clarify that the interactions with uniform APIs only apply to
                              EXT_bindable_uniform (not ARB_uniform_buffer_object).
                              optionally allow MulticastCopyBufferSubDataNV with buffers lacking
                              per-GPU storage
     6    01/03/19  jschnarr  reduce MULTICAST_GPUS_NV minimum to 1
                              clarify that MULTICAST_GPUS_NV is constant for a context
     5    10/07/16  jschnarr  trivial typo fix
     4    07/21/16  mjk       registered
     3    06/15/16  jschnarr  R370 release