1// Copyright 2021-2024 The Khronos Group Inc.
2//
3// SPDX-License-Identifier: CC-BY-4.0
4
5= VK_KHR_video_encode_h264
6:toc: left
7:refpage: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/
8:sectnums:
9
10This document outlines a proposal to enable performing H.264/AVC video encode operations in Vulkan.
11
12== Problem Statement
13
14The `VK_KHR_video_queue` extension introduces support for video coding operations and the `VK_KHR_video_encode_queue` extension further extends this with APIs specific to video encoding.
15
16The goal of this proposal is to build upon this infrastructure to introduce support for encoding elementary video stream sequences compliant with the H.264/AVC video compression standard.
17
18
19== Solution Space
20
21As the `VK_KHR_video_queue` and `VK_KHR_video_encode_queue` extensions already laid down the architecture for how codec-specific video encode extensions need to be designed, this extension only needs to define the APIs to provide the necessary codec-specific parameters at various points during the use of the codec-independent APIs. In particular:
22
23  * APIs allowing to specify H.264 sequence and picture parameter sets (SPS, PPS) to be stored in video session parameters objects
24  * APIs allowing to specify H.264 information specific to the encoded picture, including references to previously stored SPS and PPS entries
25  * APIs allowing to specify H.264 reference picture information specific to the active reference pictures and optional reconstructed picture used in video encode operations
26
27Codec-specific encoding parameters are specified by the application through custom definitions provided by a video std header dedicated to H.264 video encoding.
28
29This proposal uses the common H.264 definitions first utilized by the `VK_KHR_video_decode_h264` extension and augments it with another video std header specific to H.264 encoding. Thus this extension uses the following video std headers:
30
31  * `vulkan_video_codec_h264std` - containing common definitions for all H.264 video coding operations
32  * `vulkan_video_codec_h264std_encode` - containing definitions specific to H.264 video encoding operations
33
34These headers can be included as follows:
35
36[source,c]
37----
38#include <vk_video/vulkan_video_codec_h264std.h>
39#include <vk_video/vulkan_video_codec_h264std_encode.h>
40----
41
42
43== Proposal
44
45=== Video Std Headers
46
47This extension uses the new `vulkan_video_codec_h264std_encode` video std header. Implementations must always support at least version 1.0.0 of this video std header.
48
49
50=== H.264 Encode Profiles
51
52This extension introduces the new video codec operation `VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR`. This flag can be used to check whether a particular queue family supports encoding H.264/AVC content, as returned in `VkQueueFamilyVideoPropertiesKHR`.
53
54An H.264 encode profile can be defined through a `VkVideoProfileInfoKHR` structure using this new video codec operation and by including the following new codec-specific profile information structure in the `pNext` chain:
55
56[source,c]
57----
58typedef struct VkVideoEncodeH264ProfileInfoKHR {
59    VkStructureType                              sType;
60    const void*                                  pNext;
61    StdVideoH264ProfileIdc                       stdProfileIdc;
62} VkVideoEncodeH264ProfileInfoKHR;
63----
64
65`stdProfileIdc` specifies the H.264 profile indicator.
66
67
68=== H.264 Encode Capabilities
69
70Applications need to include the following new structure in the `pNext` chain of `VkVideoCapabilitiesKHR` when calling the `vkGetPhysicalDeviceVideoCapabilitiesKHR` command to retrieve the capabilities specific to H.264 video encoding:
71
72[source,c]
73----
74typedef struct VkVideoEncodeH264CapabilitiesKHR {
75    VkStructureType                        sType;
76    void*                                  pNext;
77    VkVideoEncodeH264CapabilityFlagsKHR    flags;
78    StdVideoH264LevelIdc                   maxLevelIdc;
79    uint32_t                               maxSliceCount;
80    uint32_t                               maxPPictureL0ReferenceCount;
81    uint32_t                               maxBPictureL0ReferenceCount;
82    uint32_t                               maxL1ReferenceCount;
83    uint32_t                               maxTemporalLayerCount;
84    VkBool32                               expectDyadicTemporalLayerPattern;
85    int32_t                                minQp;
86    int32_t                                maxQp;
87    VkBool32                               prefersGopRemainingFrames;
88    VkBool32                               requiresGopRemainingFrames;
89    VkVideoEncodeH264StdFlagsKHR           stdSyntaxFlags;
90} VkVideoEncodeH264CapabilitiesKHR;
91----
92
93`flags` indicates support for various H.264 encoding capabilities:
94
95  * `VK_VIDEO_ENCODE_H264_CAPABILITY_HRD_COMPLIANCE_BIT_KHR` - support for generating HRD compliant bitstreams when the related HRD parameters are present
96  * `VK_VIDEO_ENCODE_H264_CAPABILITY_PREDICTION_WEIGHT_TABLE_GENERATED_BIT_KHR` - support for generating the weight tables used by the encoding process, when necessary, instead of the application having to provide them
97  * `VK_VIDEO_ENCODE_H264_CAPABILITY_ROW_UNALIGNED_SLICE_BIT_KHR` - support for slices that do not start/finish at macroblock row boundaries
98  * `VK_VIDEO_ENCODE_H264_CAPABILITY_DIFFERENT_SLICE_TYPE_BIT_KHR` - support for different slice types within a frame
99  * `VK_VIDEO_ENCODE_H264_CAPABILITY_B_FRAME_IN_L0_LIST_BIT_KHR` - support for including B pictures in the L0 reference list
100  * `VK_VIDEO_ENCODE_H264_CAPABILITY_B_FRAME_IN_L1_LIST_BIT_KHR` - support for including B pictures in the L1 reference list
101  * `VK_VIDEO_ENCODE_H264_CAPABILITY_PER_PICTURE_TYPE_MIN_MAX_QP_BIT_KHR` - support for using different min/max QP values for each picture type when rate control is enabled
102  * `VK_VIDEO_ENCODE_H264_CAPABILITY_PER_SLICE_CONSTANT_QP_BIT_KHR` - support for using different constant QP values for each slice of a frame when rate control is disabled
103  * `VK_VIDEO_ENCODE_H264_CAPABILITY_GENERATE_PREFIX_NALU_BIT_KHR` - support for generating prefix NAL units
104
105`maxLevelIdc` indicates the maximum supported H.264 level indicator.
106
107`maxSliceCount` indicates the implementation's upper bound on the number of H.264 slices that an encoded frame can contain, although the actual maximum may be smaller for a given frame depending on its dimensions and some of the capability flags described earlier.
108
109`maxPPictureL0ReferenceCount`, `maxBPictureL0ReferenceCount`, and `maxL1ReferenceCount` indicate the maximum number of reference frames that the encoded frames can refer to through the L0 and L1 reference lists depending on the type of the picture (P or B), respectively. These capabilities do not restrict the number of references the application can include in the L0 and L1 reference lists as, in practice, implementations may restrict the effective number of used references based on the encoded content and/or the capabilities of the encoder implementation. However, they do indirectly indicate whether encoding P or B pictures are supported. In particular:
110
111  * If `maxPPictureL0ReferenceCount` is zero, then encoding P pictures is not supported by the implementation
112  * If both `maxBPictureL0ReferenceCount` and `maxL1ReferenceCount` are zero, then encoding B pictures is not supported by the implementation
113
114`maxTemporalLayerCount` indicates the number of supported H.264 temporal layers, while `expectDyadicTemporalLayerPattern` indicates whether the multi-layer rate control algorithm of the implementation (if support is indicated by `VkVideoEncodeCapabilitiesKHR::maxRateControlLayers` being greater than one for the given H.264 encode profile) expects the application to use a dyadic temporal layer pattern for accurate operation.
115
116`minQp` and `maxQp` indicate the supported range of QP values that can be used in the rate control configurations or as the constant QP to be used when rate control is disabled.
117
118`prefersGopRemainingFrames` and `requiresGopRemainingFrames` indicate whether the implementation prefers or requires, respectively, that the application track the remaining number of frames (for each type) in the current GOP (group of pictures), as some implementations may need this information for the accurate operation of their rate control algorithm.
119
120`stdSyntaxFlags` contains a set of flags that provide information to the application about which video std parameters or parameter values are supported to be used directly as specified by the application. These flags do not restrict what video std parameter values the application can specify, rather, they provide guarantees about respecting those.
121
122
123=== H.264 Encode Parameter Sets
124
125The use of video session parameters objects is mandatory when encoding H.264 video streams. Applications need to include the following new structure in the `pNext` chain of `VkVideoSessionParametersCreateInfoKHR` when creating video session parameters objects for H.264 encode use, to specify the parameter set capacity of the created objects:
126
127[source,c]
128----
129typedef struct VkVideoEncodeH264SessionParametersCreateInfoKHR {
130    VkStructureType                                        sType;
131    const void*                                            pNext;
132    uint32_t                                               maxStdSPSCount;
133    uint32_t                                               maxStdPPSCount;
134    const VkVideoEncodeH264SessionParametersAddInfoKHR*    pParametersAddInfo;
135} VkVideoEncodeH264SessionParametersCreateInfoKHR;
136----
137
138The optional `pParametersAddInfo` member also allows specifying an initial set of parameter sets to add to the created object:
139
140[source,c]
141----
142typedef struct VkVideoEncodeH264SessionParametersAddInfoKHR {
143    VkStructureType                            sType;
144    const void*                                pNext;
145    uint32_t                                   stdSPSCount;
146    const StdVideoH264SequenceParameterSet*    pStdSPSs;
147    uint32_t                                   stdPPSCount;
148    const StdVideoH264PictureParameterSet*     pStdPPSs;
149} VkVideoEncodeH264SessionParametersAddInfoKHR;
150----
151
152This structure can also be included in the `pNext` chain of `VkVideoSessionParametersUpdateInfoKHR` used in video session parameters update operations to add further parameter sets to an object after its creation.
153
154Individual parameter sets are stored using parameter set IDs as their keys, specifically:
155
156  * H.264 SPS entries are identified using a `seq_parameter_set_id` value
157  * H.264 PPS entries are identified using a pair of `seq_parameter_set_id` and `pic_parameter_set_id` values
158
159The H.264/AVC video compression standard always requires an SPS and PPS, hence the application has to add an instance of each parameter set to the used parameters object before being able to record video encode operations.
160
161Furthermore, the H.264/AVC video compression standard also allows modifying existing parameter sets, but as parameters already stored in video session parameters objects cannot be changed in Vulkan, the application has to create new parameters objects in such cases, as described in the proposal for `VK_KHR_video_queue`.
162
163As implementations can override parameters in the SPS and PPS entries stored in video session parameters objects, as described in the proposal for `VK_KHR_video_encode_queue`, this proposal introduces additional structures specific to H.264 encode to be used with the `vkGetEncodedVideoSessionParametersKHR` command.
164
165First, the following new structure has to be included in the `pNext` chain of `VkVideoEncodeSessionParametersGetInfoKHR` to identify the H.264 parameter sets that the command is expected to return feedback information or encoded parameter set data for:
166
167[source,c]
168----
169typedef struct VkVideoEncodeH264SessionParametersGetInfoKHR {
170    VkStructureType    sType;
171    const void*        pNext;
172    VkBool32           writeStdSPS;
173    VkBool32           writeStdPPS;
174    uint32_t           stdSPSId;
175    uint32_t           stdPPSId;
176} VkVideoEncodeH264SessionParametersGetInfoKHR;
177----
178
179`writeStdSPS` and `writeStdPPS` specify whether SPS or PPS feedback/bitstream data is requested. Both can be requested, if needed.
180
181`stdSPSId` and `stdPPSId` are used to identify the SPS and/or PPS to request data for, the latter being relevant only for PPS queries.
182
183When requesting feedback using the `vkGetEncodedVideoSessionParametersKHR` command, the following new structure can be included in the `pNext` chain of `VkVideoEncodeSessionParametersFeedbackInfoKHR`:
184
185[source,c]
186----
187typedef struct VkVideoEncodeH264SessionParametersFeedbackInfoKHR {
188    VkStructureType    sType;
189    void*              pNext;
190    VkBool32           hasStdSPSOverrides;
191    VkBool32           hasStdPPSOverrides;
192} VkVideoEncodeH264SessionParametersFeedbackInfoKHR;
193----
194
195The resulting values of `hasStdSPSOverrides` and `hasStdPPSOverrides` indicate whether overrides were applied to the SPS and/or PPS, respectively, if the corresponding `writeStd` field was set in the input parameters.
196
197When requesting encoded bitstream data using the `vkGetEncodedVideoSessionParametersKHR` command, the output host data buffer will be filled with the encoded bitstream of the requested H.264 parameter sets.
198
199As described in great detail in the proposal for the `VK_KHR_video_encode_queue` extension, the application may have the option to encode the parameters otherwise stored in video session parameters object on its own. However, this may not result in a compliant bitstream if the implementation applied overrides to SPS or PPS parameters, thus it is generally recommended for applications to use the encoded parameter set data retrieved using the `vkGetEncodedVideoSessionParametersKHR` command.
200
201
202=== H.264 Encoding Parameters
203
204Encode parameters specific to H.264 need to be provided by the application through the `pNext` chain of `VkVideoEncodeInfoKHR`, using the following new structure:
205
206[source,c]
207----
208typedef struct VkVideoEncodeH264PictureInfoKHR {
209    VkStructureType                             sType;
210    const void*                                 pNext;
211    uint32_t                                    naluSliceEntryCount;
212    const VkVideoEncodeH264NaluSliceInfoKHR*    pNaluSliceEntries;
213    const StdVideoEncodeH264PictureInfo*        pStdPictureInfo;
214    VkBool32                                    generatePrefixNalu;
215} VkVideoEncodeH264PictureInfoKHR;
216----
217
218`naluSliceEntryCount` specifies the number of slices to encode for the frame and the elements of the `pNaluSliceEntries` array provide additional information for each slice, as described later.
219
220`pStdPictureInfo` points to the codec-specific encode parameters defined in the `vulkan_video_codec_h264std_encode` video std header.
221
222The active SPS and PPS (sourced from the bound video session parameters object) are identified by the `seq_parameter_set_id` and `pic_parameter_set_id` parameters.
223
224The structure pointed to by `pStdPictureInfo->pRefLists` specifies the codec-specific parameters related to the reference lists. In particular, it specifies the DPB slots corresponding to the elements of the L0 and L1 reference lists, as well as the reference picture marking and reference list modification operations.
225
226If the `VK_VIDEO_ENCODE_H264_CAPABILITY_GENERATE_PREFIX_NALU_BIT_KHR` capability flag is supported, `generatePrefixNalu` can be set to `VK_TRUE` to request the generation of prefix NAL units before each encoded slice.
227
228The parameters of individual slices are provided through instances of the following new structure:
229
230[source,c]
231----
232typedef struct VkVideoEncodeH264NaluSliceInfoKHR {
233    VkStructureType                         sType;
234    const void*                             pNext;
235    int32_t                                 constantQp;
236    const StdVideoEncodeH264SliceHeader*    pStdSliceHeader;
237} VkVideoEncodeH264NaluSliceInfoKHR;
238----
239
240`constantQp` specifies the constant QP value to use for the slice when rate control is disabled.
241
242`pStdSliceHeader` points to the codec-specific encode parameters to use in the slice header.
243
244Picture information specific to H.264 for the active reference pictures and the optional reconstructed picture need to be provided by the application through the `pNext` chain of corresponding elements of `VkVideoEncodeInfoKHR::pReferenceSlots` and the `pNext` chain of `VkVideoEncodeInfoKHR::pSetupReferenceSlot`, respectively, using the following new structure:
245
246[source,c]
247----
248typedef struct VkVideoEncodeH264DpbSlotInfoKHR {
249    VkStructureType                           sType;
250    const void*                               pNext;
251    const StdVideoEncodeH264ReferenceInfo*    pStdReferenceInfo;
252} VkVideoEncodeH264DpbSlotInfoKHR;
253----
254
255`pStdReferenceInfo` points to the codec-specific reference picture parameters defined in the `vulkan_video_codec_h264std_encode` video std header.
256
257It is the application's responsibility to specify codec-specific parameters that are compliant to the rules defined by the H.264/AVC video compression standard. While it is not illegal, from the API usage's point of view, to specify non-compliant inputs, they may cause the video encode operation to complete unsuccessfully and will cause the output bitstream and the reconstructed picture, if one is specified, to have undefined contents after the execution of the operation.
258
259Implementations may override some of these parameters in order to conform to any restrictions of the encoder implementation, but that will not affect the overall operation of the encoding. The application has the option to also opt-in for additional optimizing overrides that can result in better performance or efficiency tailored to the usage scenario by creating the video session with the new `VK_VIDEO_SESSION_CREATE_ALLOW_ENCODE_PARAMETER_OPTIMIZATIONS_BIT_KHR` flag.
260
261For more information about individual H.264 bitstream syntax elements, derived values, and, in general, how to interpret these parameters, please refer to the corresponding sections of the https://www.itu.int/rec/T-REC-H.264-202108-I/[ITU-T H.264 Specification].
262
263
264=== H.264 Reference Lists
265
266In order to populate the L0 and L1 reference lists used to encode predictive pictures, the application has to set the corresponding elements of the `RefPicList0` and `RefPicList1` array members of the structure pointed to by `VkVideoEncodeH264PictureInfoKHR::pStdPictureInfo->pRefLists` to the DPB slot indices of the reference pictures, while all unused elements of `RefPicList0` and `RefPicList1` have to be set to `STD_VIDEO_H264_NO_REFERENCE_PICTURE`. As usual, the reference picture resources are specified by including them in the list of active reference pictures according to the codec-independent semantics defined by the `VK_KHR_video_encode_queue` extension.
267
268In all cases the set of DPB slot indices referenced by the L0 and L1 reference lists and the list of active reference pictures specified in `VkVideoEncodeInfoKHR::pReferenceSlots` must match, but the order in which the active reference pictures are included in the `pReferenceSlots` array does not matter.
269
270
271=== H.264 Rate Control
272
273This proposal adds a set of optional rate control parameters specific to H.264 encoding that provide additional guidance to the implementation's rate control algorithm.
274
275When rate control is not disabled and not set to implementation-default behavior, the application can include the following new structure in the `pNext` chain of `VkVideoEncodeRateControlInfoKHR`:
276
277[source,c]
278----
279typedef struct VkVideoEncodeH264RateControlInfoKHR {
280    VkStructureType                         sType;
281    const void*                             pNext;
282    VkVideoEncodeH264RateControlFlagsKHR    flags;
283    uint32_t                                gopFrameCount;
284    uint32_t                                idrPeriod;
285    uint32_t                                consecutiveBFrameCount;
286    uint32_t                                temporalLayerCount;
287} VkVideoEncodeH264RateControlInfoKHR;
288----
289
290`flags` can include one or more of the following flags:
291
292  * `VK_VIDEO_ENCODE_H264_RATE_CONTROL_ATTEMPT_HRD_COMPLIANCE_BIT_KHR` can be used to indicate that the application would like the implementation's rate control algorithm to attempt to produce an HRD compliant bitstream when possible
293  * `VK_VIDEO_ENCODE_H264_RATE_CONTROL_REGULAR_GOP_BIT_KHR` can be used to indicate that the application intends to use a regular GOP structure according to the parameters specified in `gopFrameCount`, `idrPeriod`, and `consecutiveBFrameCount`
294  * `VK_VIDEO_ENCODE_H264_RATE_CONTROL_REFERENCE_PATTERN_FLAT_BIT_KHR` can be used to indicate that the application intends to follow a flat reference pattern in the GOP where each P frame uses the last non-B frame as reference, and each B frame uses the last and next non-B frame as forward and backward references, respectively
295  * `VK_VIDEO_ENCODE_H264_RATE_CONTROL_REFERENCE_PATTERN_DYADIC_BIT_KHR` can be used to indicate that the application intends to follow a dyadic reference pattern
296  * `VK_VIDEO_ENCODE_H264_RATE_CONTROL_TEMPORAL_LAYER_PATTERN_DYADIC_BIT_KHR` can be used to indicate that the application intends to follow a dyadic temporal layer pattern when using multiple temporal layers
297
298`gopFrameCount`, `idrPeriod`, and `consecutiveBFrameCount` specify the GOP size, IDR period, and the number of consecutive B frames between non-B frames, respectively, that define the typical structure of the GOP the implementation's rate control algorithm should expect. If `VK_VIDEO_ENCODE_H264_RATE_CONTROL_REGULAR_GOP_BIT_KHR` is also specified in `flags`, the implementation will expect all GOPs to follow this structure, while otherwise it may assume that the application will diverge from these values from time to time. If any of these values are zero, then the implementation's rate control algorithm will not make any assumptions about the corresponding parameter of the GOP structure.
299
300`temporalLayerCount` indicates the number of H.264 temporal layers that the application intends to use and it is expected to match the number of rate control layers when multi-layer rate control is used.
301
302The following new structure can be included in the `pNext` chain of `VkVideoEncodeRateControlLayerInfoKHR` to specify additional per-rate-control-layer guidance parameters specific to H.264 encode:
303
304[source,c]
305----
306typedef struct VkVideoEncodeH264RateControlLayerInfoKHR {
307    VkStructureType                  sType;
308    const void*                      pNext;
309    VkBool32                         useMinQp;
310    VkVideoEncodeH264QpKHR           minQp;
311    VkBool32                         useMaxQp;
312    VkVideoEncodeH264QpKHR           maxQp;
313    VkBool32                         useMaxFrameSize;
314    VkVideoEncodeH264FrameSizeKHR    maxFrameSize;
315} VkVideoEncodeH264RateControlLayerInfoKHR;
316----
317
318When `useMinQp` is set to `VK_TRUE`, `minQp` specifies the lower bound on the QP values, for each picture type, that the implementation's rate control algorithm should use. Similarly, when `useMaxQp` is set to `VK_TRUE`, `maxQp` specifies the upper bound on the QP values.
319
320When `useMaxFrameSize` is set to `VK_TRUE`, `maxFrameSize` specifies the maximum frame size in bytes, for each picture type, that the implementation's rate control algorithm should target.
321
322Some implementations may benefit from or require additional guidance on the remaining number of frames in the currently encoded GOP, as indicated by the `prefersGopRemainingFrames` and `requiresGopRemainingFrames` capabilities, respectively. This may be the case either due to the implementation not being able to track the current position of the encoded stream within the GOP, or because the implementation may be able to use this information to better react to dynamic changes to the GOP structure. This proposal solves this by introducing the following new structure that can be included in the `pNext` chain of `VkVideoBeginCodingInfoKHR`:
323
324[source,c]
325----
326typedef struct VkVideoEncodeH264GopRemainingFrameInfoKHR {
327    VkStructureType    sType;
328    const void*        pNext;
329    VkBool32           useGopRemainingFrames;
330    uint32_t           gopRemainingI;
331    uint32_t           gopRemainingP;
332    uint32_t           gopRemainingB;
333} VkVideoEncodeH264GopRemainingFrameInfoKHR;
334----
335
336When `useGopRemainingFrames` is set to `VK_TRUE`, the implementation's rate control algorithm may use the values specified in `gopRemainingI`, `gopRemainingP`, and `gopRemainingB` as a guidance on the number of remaining frames of the corresponding type in the currently encoded GOP.
337
338
339== Examples
340
341=== Select queue family with H.264 encode support
342
343[source,c]
344----
345uint32_t queueFamilyIndex;
346uint32_t queueFamilyCount;
347
348vkGetPhysicalDeviceQueueFamilyProperties2(physicalDevice, &queueFamilyCount, NULL);
349
350VkQueueFamilyProperties2* props = calloc(queueFamilyCount,
351    sizeof(VkQueueFamilyProperties2));
352VkQueueFamilyVideoPropertiesKHR* videoProps = calloc(queueFamilyCount,
353    sizeof(VkQueueFamilyVideoPropertiesKHR));
354
355for (queueFamilyIndex = 0; queueFamilyIndex < queueFamilyCount; ++queueFamilyIndex) {
356    props[queueFamilyIndex].sType = VK_STRUCTURE_TYPE_QUEUE_FAMILY_PROPERTIES_2;
357    props[queueFamilyIndex].pNext = &videoProps[queueFamilyIndex];
358
359    videoProps[queueFamilyIndex].sType = VK_STRUCTURE_TYPE_QUEUE_FAMILY_VIDEO_PROPERTIES_KHR;
360}
361
362vkGetPhysicalDeviceQueueFamilyProperties2(physicalDevice, &queueFamilyCount, props);
363
364for (queueFamilyIndex = 0; queueFamilyIndex < queueFamilyCount; ++queueFamilyIndex) {
365    if ((props[queueFamilyIndex].queueFamilyProperties.queueFlags & VK_QUEUE_VIDEO_ENCODE_BIT_KHR) != 0 &&
366        (videoProps[queueFamilyIndex].videoCodecOperations & VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR) != 0) {
367        break;
368    }
369}
370
371if (queueFamilyIndex < queueFamilyCount) {
372    // Found appropriate queue family
373    ...
374} else {
375    // Did not find a queue family with the needed capabilities
376    ...
377}
378----
379
380
381=== Check support and query the capabilities for an H.264 encode profile
382
383[source,c]
384----
385VkResult result;
386
387VkVideoEncodeH264ProfileInfoKHR encodeH264ProfileInfo = {
388    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_PROFILE_INFO_KHR,
389    .pNext = NULL,
390    .stdProfileIdc = STD_VIDEO_H264_PROFILE_IDC_BASELINE
391};
392
393VkVideoProfileInfoKHR profileInfo = {
394    .sType = VK_STRUCTURE_TYPE_VIDEO_PROFILE_INFO_KHR,
395    .pNext = &encodeH264ProfileInfo,
396    .videoCodecOperation = VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR,
397    .chromaSubsampling = VK_VIDEO_CHROMA_SUBSAMPLING_420_BIT_KHR,
398    .lumaBitDepth = VK_VIDEO_COMPONENT_BIT_DEPTH_8_BIT_KHR,
399    .chromaBitDepth = VK_VIDEO_COMPONENT_BIT_DEPTH_8_BIT_KHR
400};
401
402VkVideoEncodeH264CapabilitiesKHR encodeH264Capabilities = {
403    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_CAPABILITIES_KHR,
404    .pNext = NULL,
405};
406
407VkVideoEncodeCapabilitiesKHR encodeCapabilities = {
408    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_CAPABILITIES_KHR,
409    .pNext = &encodeH264Capabilities
410}
411
412VkVideoCapabilitiesKHR capabilities = {
413    .sType = VK_STRUCTURE_TYPE_VIDEO_CAPABILITIES_KHR,
414    .pNext = &encodeCapabilities
415};
416
417result = vkGetPhysicalDeviceVideoCapabilitiesKHR(physicalDevice, &profileInfo, &capabilities);
418
419if (result == VK_SUCCESS) {
420    // Profile is supported, check additional capabilities
421    ...
422} else {
423    // Profile is not supported, result provides additional information about why
424    ...
425}
426----
427
428=== Create and update H.264 video session parameters objects
429
430[source,c]
431----
432VkVideoSessionParametersKHR videoSessionParams = VK_NULL_HANDLE;
433
434VkVideoEncodeH264SessionParametersCreateInfoKHR encodeH264CreateInfo = {
435    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_SESSION_PARAMETERS_CREATE_INFO_KHR,
436    .pNext = NULL,
437    .maxStdSPSCount = ... // SPS capacity
438    .maxStdPPSCount = ... // PPS capacity
439    .pParametersAddInfo = ... // parameters to add at creation time or NULL
440};
441
442VkVideoSessionParametersCreateInfoKHR createInfo = {
443    .sType = VK_STRUCTURE_TYPE_VIDEO_SESSION_PARAMETERS_CREATE_INFO_KHR,
444    .pNext = &encodeH264CreateInfo,
445    .flags = 0,
446    .videoSessionParametersTemplate = ... // template to use or VK_NULL_HANDLE
447    .videoSession = videoSession
448};
449
450vkCreateVideoSessionParametersKHR(device, &createInfo, NULL, &videoSessionParams);
451
452...
453
454StdVideoH264SequenceParameterSet sps = {};
455// parse and populate SPS parameters
456...
457
458StdVideoH264PictureParameterSet pps = {};
459// parse and populate PPS parameters
460...
461
462VkVideoEncodeH264SessionParametersAddInfoKHR encodeH264AddInfo = {
463    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_SESSION_PARAMETERS_ADD_INFO_KHR,
464    .pNext = NULL,
465    .stdSPSCount = 1,
466    .pStdSPSs = &sps,
467    .stdPPSCount = 1,
468    .pStdPPSs = &pps
469};
470
471VkVideoSessionParametersUpdateInfoKHR updateInfo = {
472    .sType = VK_STRUCTURE_TYPE_VIDEO_SESSION_PARAMETERS_UPDATE_INFO_KHR,
473    .pNext = &encodeH264AddInfo,
474    .updateSequenceCount = 1 // incremented for each subsequent update
475};
476
477vkUpdateVideoSessionParametersKHR(device, &videoSessionParams, &updateInfo);
478----
479
480
481=== Record H.264 encode operation producing an I frame that is also set up as a reference
482
483[source,c]
484----
485// Bound reference resource list provided has to include reconstructed picture resource
486vkCmdBeginVideoCodingKHR(commandBuffer, ...);
487
488StdVideoEncodeH264ReferenceInfo stdReferenceInfo = {};
489// Populate H.264 reference picture info for the reconstructed picture
490stdReferenceInfo.primary_pic_type = STD_VIDEO_H264_PICTURE_TYPE_I;
491...
492
493VkVideoEncodeH264DpbSlotInfoKHR encodeH264DpbSlotInfo = {
494    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_DPB_SLOT_INFO_KHR,
495    .pNext = NULL,
496    .pStdReferenceInfo = &stdReferenceInfo
497};
498
499VkVideoReferenceSlotInfoKHR setupSlotInfo = {
500    .sType = VK_STRUCTURE_TYPE_VIDEO_REFERENCE_SLOT_INFO_KHR,
501    .pNext = &encodeH264DpbSlotInfo
502    ...
503};
504
505StdVideoEncodeH264ReferenceListsInfo stdRefListInfo = {};
506// No references are used so just initialize the RefPicLists
507for (uint32_t i = 0; i < STD_VIDEO_H264_MAX_NUM_LIST_REF; ++i) {
508    stdRefListInfo.RefPicList0[i] = STD_VIDEO_H264_NO_REFERENCE_PICTURE;
509    stdRefListInfo.RefPicList1[i] = STD_VIDEO_H264_NO_REFERENCE_PICTURE;
510}
511// Populate H.264 reference list modification/marking ops and other parameters
512...
513
514StdVideoEncodeH264PictureInfo stdPictureInfo = {};
515// Populate H.264 picture info for the encode input picture
516...
517// Make sure that the reconstructed picture is requested to be set up as reference
518stdPictureInfo.flags.is_reference = 1;
519...
520stdPictureInfo.primary_pic_type = STD_VIDEO_H264_PICTURE_TYPE_I;
521...
522stdPictureInfo.pRefLists = &stdRefListInfo;
523
524VkVideoEncodeH264PictureInfoKHR encodeH264PictureInfo = {
525    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_PICTURE_INFO_KHR,
526    .pNext = NULL,
527    .naluSliceEntryCount = ... // number of slices to encode
528    .pNaluSliceEntries = ... // pointer to the array of slice parameters
529    .pStdPictureInfo = &stdPictureInfo
530};
531
532VkVideoEncodeInfoKHR encodeInfo = {
533    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_INFO_KHR,
534    .pNext = &encodeH264PictureInfo,
535    ...
536    .pSetupReferenceSlot = &setupSlotInfo,
537    ...
538};
539
540vkCmdEncodeVideoKHR(commandBuffer, &encodeInfo);
541
542vkCmdEndVideoCodingKHR(commandBuffer, ...);
543----
544
545
546=== Record H.264 encode operation producing a P frame with a single backward reference
547
548[source,c]
549----
550// Bound reference resource list provided has to include the used reference picture resource
551vkCmdBeginVideoCodingKHR(commandBuffer, ...);
552
553StdVideoEncodeH264ReferenceInfo stdBackwardReferenceInfo = {};
554// Populate H.264 reference picture info for the backward referenced picture
555...
556
557VkVideoEncodeH264DpbSlotInfoKHR encodeH264DpbSlotInfo = {
558    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_DPB_SLOT_INFO_KHR,
559    .pNext = NULL,
560    .pStdReferenceInfo = &stdBackwardReferenceInfo
561};
562
563VkVideoReferenceSlotInfoKHR referenceSlotInfo = {
564    .sType = VK_STRUCTURE_TYPE_VIDEO_REFERENCE_SLOT_INFO_KHR,
565    .pNext = &encodeH264DpbSlotInfo,
566    .slotIndex = ... // DPB slot index of the backward reference picture
567    ...
568};
569
570StdVideoEncodeH264ReferenceListsInfo stdRefListInfo = {};
571// Initialize the RefPicLists and add the backward reference to the L0 list
572for (uint32_t i = 0; i < STD_VIDEO_H264_MAX_NUM_LIST_REF; ++i) {
573    stdRefListInfo.RefPicList0[i] = STD_VIDEO_H264_NO_REFERENCE_PICTURE;
574    stdRefListInfo.RefPicList1[i] = STD_VIDEO_H264_NO_REFERENCE_PICTURE;
575}
576stdRefListInfo.RefPicList0[0] = ... // DPB slot index of the backward reference picture
577// Populate H.264 reference list modification/marking ops and other parameters
578...
579
580StdVideoEncodeH264PictureInfo stdPictureInfo = {};
581// Populate H.264 picture info for the encode input picture
582...
583stdPictureInfo.primary_pic_type = STD_VIDEO_H264_PICTURE_TYPE_P;
584...
585stdPictureInfo.pRefLists = &stdRefListInfo;
586
587VkVideoEncodeH264PictureInfoKHR encodeH264PictureInfo = {
588    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_PICTURE_INFO_KHR,
589    .pNext = NULL,
590    .naluSliceEntryCount = ... // number of slices to encode
591    .pNaluSliceEntries = ... // pointer to the array of slice parameters
592    .pStdPictureInfo = &stdPictureInfo
593};
594
595VkVideoEncodeInfoKHR encodeInfo = {
596    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_INFO_KHR,
597    .pNext = &encodeH264PictureInfo,
598    ...
599    .referenceSlotCount = 1,
600    .pReferenceSlots = &referenceSlotInfo
601};
602
603vkCmdEncodeVideoKHR(commandBuffer, &encodeInfo);
604
605vkCmdEndVideoCodingKHR(commandBuffer, ...);
606----
607
608
609=== Record H.264 encode operation producing a B frame with a forward and a backward reference
610
611[source,c]
612----
613// Bound reference resource list provided has to include the used reference picture resources
614vkCmdBeginVideoCodingKHR(commandBuffer, ...);
615
616StdVideoEncodeH264ReferenceInfo stdBackwardReferenceInfo = {};
617// Populate H.264 reference picture info for the backward referenced picture
618...
619
620StdVideoEncodeH264ReferenceInfo stdForwardReferenceInfo = {};
621// Populate H.264 reference picture info for the forward referenced picture
622...
623
624VkVideoEncodeH264DpbSlotInfoKHR encodeH264DpbSlotInfo[] = {
625    {
626        .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_DPB_SLOT_INFO_KHR,
627        .pNext = NULL,
628        .pStdReferenceInfo = &stdBackwardReferenceInfo
629    },
630    {
631        .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_DPB_SLOT_INFO_KHR,
632        .pNext = NULL,
633        .pStdReferenceInfo = &stdForwardReferenceInfo
634    }
635};
636
637VkVideoReferenceSlotInfoKHR referenceSlotInfo[] = {
638    {
639        .sType = VK_STRUCTURE_TYPE_VIDEO_REFERENCE_SLOT_INFO_KHR,
640        .pNext = &encodeH264DpbSlotInfo[0],
641        .slotIndex = ... // DPB slot index of the backward reference picture
642        ...
643    },
644    {
645        .sType = VK_STRUCTURE_TYPE_VIDEO_REFERENCE_SLOT_INFO_KHR,
646        .pNext = &encodeH264DpbSlotInfo[1],
647        .slotIndex = ... // DPB slot index of the forward reference picture
648        ...
649    }
650};
651
652StdVideoEncodeH264ReferenceListsInfo stdRefListInfo = {};
653// Initialize the RefPicLists, add the backward reference to the L0 list,
654// and add the forward reference to the L1 list
655for (uint32_t i = 0; i < STD_VIDEO_H264_MAX_NUM_LIST_REF; ++i) {
656    stdRefListInfo.RefPicList0[i] = STD_VIDEO_H264_NO_REFERENCE_PICTURE;
657    stdRefListInfo.RefPicList1[i] = STD_VIDEO_H264_NO_REFERENCE_PICTURE;
658}
659stdRefListInfo.RefPicList0[0] = ... // DPB slot index of the backward reference picture
660stdRefListInfo.RefPicList1[0] = ... // DPB slot index of the forward reference picture
661// Populate H.264 reference list modification/marking ops and other parameters
662...
663
664StdVideoEncodeH264PictureInfo stdPictureInfo = {};
665// Populate H.264 picture info for the encode input picture
666...
667stdPictureInfo.primary_pic_type = STD_VIDEO_H264_PICTURE_TYPE_B;
668...
669stdPictureInfo.pRefLists = &stdRefListInfo;
670
671VkVideoEncodeH264PictureInfoKHR encodeH264PictureInfo = {
672    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_PICTURE_INFO_KHR,
673    .pNext = NULL,
674    .naluSliceEntryCount = ... // number of slices to encode
675    .pNaluSliceEntries = ... // pointer to the array of slice parameters
676    .pStdPictureInfo = &stdPictureInfo
677};
678
679VkVideoEncodeInfoKHR encodeInfo = {
680    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_INFO_KHR,
681    .pNext = &encodeH264PictureInfo,
682    ...
683    .referenceSlotCount = sizeof(referenceSlotInfo) / sizeof(referenceSlotInfo[0]),
684    .pReferenceSlots = &referenceSlotInfo[0]
685};
686
687vkCmdEncodeVideoKHR(commandBuffer, &encodeInfo);
688
689vkCmdEndVideoCodingKHR(commandBuffer, ...);
690----
691
692
693=== Change the rate control configuration of an H.264 encode session with optional H.264 controls
694
695[source,c]
696----
697vkCmdBeginVideoCodingKHR(commandBuffer, ...);
698
699// Include the optional H.264 rate control layer information
700// In this example we restrict the QP range to be used by the implementation
701VkVideoEncodeH264RateControlLayerInfoKHR rateControlLayersH264[] = {
702    {
703        .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_RATE_CONTROL_LAYER_INFO_KHR,
704        .pNext = NULL,
705        .useMinQp = VK_TRUE,
706        .minQp = { /* min I frame QP */, /* min P frame QP */, /* min B frame QP */ },
707        .useMaxQp = VK_TRUE,
708        .minQp = { /* max I frame QP */, /* max P frame QP */, /* max B frame QP */ },
709        .useMaxFrameSize = VK_FALSE,
710        .maxFrameSize = { 0, 0, 0 }
711    },
712    ...
713};
714
715VkVideoEncodeRateControlLayerInfoKHR rateControlLayers[] = {
716    {
717        .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_RATE_CONTROL_LAYER_INFO_KHR,
718        .pNext = &rateControlLayersH264[0],
719        ...
720    },
721    ...
722};
723
724// Include the optional H.264 global rate control information
725VkVideoEncodeH264RateControlInfoKHR rateControlInfoH264 = {
726    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_H264_RATE_CONTROL_INFO_KHR,
727    .pNext = NULL,
728    .flags = VK_VIDEO_ENCODE_H264_RATE_CONTROL_REGULAR_GOP_BIT_KHR // Indicate the use of a regular GOP structure...
729           | VK_VIDEO_ENCODE_H264_RATE_CONTROL_TEMPORAL_LAYER_PATTERN_DYADIC_BIT_KHR, // ... and a dyadic temporal layer pattern
730    // Indicate a GOP structure of the form IBBBPBBBPBBBI with an IDR frame at the beginning of every 10th GOP
731    .gopFrameCount = 12,
732    .idrPeriod = 120,
733    .consecutiveBFrameCount = 3,
734    // This example uses multiple temporal layers with per layer rate control
735    .temporalLayerCount = sizeof(rateControlLayers) / sizeof(rateControlLayers[0])
736};
737
738VkVideoEncodeRateControlInfoKHR rateControlInfo = {
739    .sType = VK_STRUCTURE_TYPE_VIDEO_ENCODE_RATE_CONTROL_INFO_KHR,
740    .pNext = &rateControlInfoH264,
741    ...
742    .layerCount = sizeof(rateControlLayers) / sizeof(rateControlLayers[0]),
743    .pLayers = rateControlLayers,
744    ...
745};
746
747// Change the rate control configuration for the video session
748VkVideoCodingControlInfoKHR controlInfo = {
749    .sType = VK_STRUCTURE_TYPE_VIDEO_CODING_CONTROL_INFO_KHR,
750    .pNext = &rateControlInfo,
751    .flags = VK_VIDEO_CODING_CONTROL_ENCODE_RATE_CONTROL_BIT_KHR
752};
753
754vkCmdControlVideoCodingKHR(commandBuffer, &controlInfo);
755
756...
757
758vkCmdEndVideoCodingKHR(commandBuffer, ...);
759----
760
761
762== Issues
763
764=== RESOLVED: In what form should codec-specific parameters be provided?
765
766In the form of structures defined by the `vulkan_video_codec_h264std_encode` and `vulkan_video_codec_h264std` video std headers. Applications are responsible to populate the structures defined by the video std headers. It is also the application's responsibility to maintain and manage these data structures, as needed, to be able to provide them as inputs to video encode operations where needed.
767
768
769=== RESOLVED: Why the `vulkan_video_codec_h264std` video std header does not have a version number?
770
771The `vulkan_video_codec_h264std` video std header was introduced to share common definitions used in both H.264/AVC video decoding and video encoding, as the two functionalities were designed in parallel. However, as no video coding extension uses this video std header directly, only as a dependency of the video std header specific to the particular video coding operation, no separate versioning scheme was deemed necessary.
772
773
774=== RESOLVED: What are the requirements for the codec-specific input parameters?
775
776It is legal from an API usage perspective for the application to provide any values for the codec-specific input parameters (parameter sets, picture information, etc.). However, if the input data does not conform to the requirements of the H.264/AVC video compression standard, then video encode operations may complete unsuccessfully and, in general, the outputs produced by the video encode operation will have undefined contents.
777
778In addition, certain commands may return the `VK_ERROR_INVALID_VIDEO_STD_PARAMETERS_KHR` error if any of the specified codec-specific parameters do not adhere to the syntactic or semantic requirements of the H.264/AVC video compression standard or if values derived from parameters according to the rules defined by the H.264/AVC video compression standard do not adhere to the capabilities of the H.264/AVC video compression standard or the implementation. In particular, in this extension the following commands may return this error code:
779
780  * `vkCreateVideoSessionParametersKHR` or `vkUpdateVideoSessionParametersKHR` - if the specified parameter sets are invalid according to these rules
781  * `vkEndCommandBuffer` - if the codec-specific picture information provided to video encode operations are invalid according to these rules
782
783Generating errors in the cases above, however, is not required so applications should not rely on receiving an error code for the purposes of verifying the correctness of the used codec-specific parameters.
784
785
786=== RESOLVED: Are interlaced frames supported?
787
788No. Encoding interlaced H.264 content does not seem like an important use case to support.
789
790
791=== RESOLVED: Do we want to allow the application to specify separate reference lists for each slice?
792
793Not in this extension. While the H.264/AVC video compression standard seems to support this, such flexibility is not exposed here for the sake of simplicity. If the need arises to support per slice reference lists operations, a layered extension can introduce the necessary APIs to enable it.
794
795
796=== RESOLVED: Are prefix NAL units generated by the implementation when multiple temporal layers are used?
797
798Only when the `VK_VIDEO_ENCODE_H264_CAPABILITY_GENERATE_PREFIX_NALU_BIT_KHR` capability flag is supported by the implementation and the application explicitly requests the generation of prefix NAL units using the `generatePrefixNalu` parameter.
799
800If an application intends to use multiple temporal layers on an implementation that does not support the generation of prefix NALU units, then the application is responsible for inserting those into the final bitstream.
801
802
803=== RESOLVED: What codec-specific parameters are guaranteed to not be overridden by implementations?
804
805This proposal only requires that implementations do not override the `primary_pic_type` and `slice_type` parameters, as the used picture and slice types are fundamental to the general operation of H.264 encoding. In addition, bits set in the `stdSyntaxFlags` capability provide additional guarantees about other Video Std parameters that the implementation will use without overriding them. No further restrictions are included in this extension regarding codec-specific parameter overrides, however, future extensions may include capability flags providing additional guarantees based on the needs of the users of the API.
806
807
808=== RESOLVED: How is reference picture setup requested for H.264 encode operations?
809
810As specifying a reconstructed picture DPB slot and resource is always required per the latest revision of the video extensions, additional codec syntax controls whether reference picture setup is requested and, in response, the DPB slot is activated with the reconstructed picture.
811
812For H.264 encode, reference picture setup is requested and the DPB slot specified for the reconstructed picture is activated with the picture if and only if the `StdVideoEncodeH264PictureInfo::flags.is_reference` flag is set.
813
814
815== Further Functionality
816
817Future extensions can further extend the capabilities provided here, e.g. exposing support for encode modes allowing per-slice input and/or output.
818