# ASTC Format Overview

Adaptive Scalable Texture Compression (ASTC) is an advanced lossy texture
compression technology developed by Arm and AMD. It has been adopted as an
official Khronos extension to the OpenGL and OpenGL ES APIs, and as a standard
optional feature for the Vulkan API.

ASTC offers a number of advantages over earlier texture compression formats:

* **Format flexibility:** ASTC supports compressing between 1 and 4 channels of
  data, including support for one non-correlated channel such as RGB+A
  (correlated RGB, non-correlated alpha).
* **Bit rate flexibility:** ASTC supports compressing images with a fine
  grained choice of bit rates between 0.89 and 8 bits per texel (bpt). The bit
  rate choice is independent of the color format choice.
* **Advanced format support:** ASTC supports compressing images in either low
  dynamic range (LDR), LDR sRGB, or high dynamic range (HDR) color spaces, as
  well as compressing 3D volumetric textures.
* **Improved image quality:** Despite the high degree of format flexibility,
  ASTC manages to beat nearly all legacy texture compression formats -- such as
  ETC2, PVRTC, and the BC formats -- on image quality at equivalent bit
  rates.

This article explores the ASTC format, and how it achieves its flexibility
and quality improvements.


Why ASTC?
=========

Before the creation of ASTC, the format and bit rate coverage of the available
formats was very sparse:

In reality the situation is even worse than this diagram shows, as many of
these formats are proprietary or simply not available on some operating
systems, so any single platform will have very limited compression choices.

For developers this situation makes developing content which is portable across
multiple platforms a tricky proposition. It's almost certain that differently
compressed assets will be needed for different platforms.
Each asset pack would
likely then need to use different levels of compression, and may even have to
fall back to no compression for some assets on some platforms, which leaves
either some image quality or some memory bandwidth efficiency untapped.

It was clear a better way was needed, so the Khronos Group asked members to
submit proposals for a new compression algorithm to be adopted in the same
manner that the earlier ETC algorithm was adopted for OpenGL ES. ASTC was the
result of this, and has been adopted as an official algorithm for OpenGL,
OpenGL ES, and Vulkan.


Format overview
===============

Given the fragmentation issues with the existing compression formats, it should
be no surprise that the high level design objectives for ASTC were to have
something which could be used across the whole range of art assets found in
modern content, and which allows artists to have more control over the quality
to bit rate tradeoff.

There are quite a few technical components which make up the ASTC format, so
before we dive into detail it will be useful to give an overview of how ASTC
works at a higher level.


Block compression
-----------------

Compression formats for real-time graphics need the ability to quickly and
efficiently make random samples into a texture. This places two technical
requirements on any compression format:

* It must be possible to compute the address of data in memory given only a
  sample coordinate.
* It must be possible to decompress random samples without decompressing too
  much surrounding data.

The standard solution for this used by all contemporary real-time formats,
including ASTC, is to divide the image into fixed-size blocks of texels, each
of which is compressed into a fixed number of output bits. This feature makes
it possible to access texels quickly, in any order, and with a well-bounded
decompression cost.
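
As a minimal sketch of the first requirement, the function below computes the
byte offset of the compressed block that contains a given texel. It assumes a
simple row-major block layout and 16-byte (128-bit) blocks. The name
`block_offset` and the linear layout are illustrative assumptions; real GPU
memory layouts are often tiled or swizzled.

```python
def block_offset(x, y, image_width, block_w, block_h, block_bytes=16):
    """Byte offset of the compressed block that holds texel (x, y).

    Assumes blocks are stored row-major with no padding or swizzling,
    which is an illustrative simplification.
    """
    # Number of blocks needed to cover one row of texels (round up).
    blocks_per_row = (image_width + block_w - 1) // block_w
    # Block coordinates of the texel.
    block_x = x // block_w
    block_y = y // block_h
    return (block_y * blocks_per_row + block_x) * block_bytes


# Texel (13, 7) in a 100-texel-wide image using 6x6 blocks.
print(block_offset(13, 7, 100, 6, 6))  # prints 304
```

Because every block has the same size, the lookup needs only integer
arithmetic on the sample coordinate and no per-image index table.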

The 2D block footprints in ASTC range from 4x4 texels up to 12x12 texels, which
all compress into 128-bit output blocks. By dividing 128 bits by the number of
texels in the footprint, we derive the format bit rates which range from 8 bpt
(`128/(4*4)`) down to 0.89 bpt (`128/(12*12)`).


Color encoding
--------------

ASTC uses gradients to assign the color values of each texel. Each compressed
block stores the end-point colors for a gradient, and an interpolation weight
for each texel which defines the texel's location along that gradient. During
decompression the color value for each texel is generated by interpolating
between the two end-point colors, based on the per-texel weight.

In many cases a block will contain a complex distribution of colors, for
example a red ball sitting on green grass. In these scenarios a single color
gradient will not be able to accurately represent all of the texels' values. To
support this ASTC allows a block to define up to four distinct color gradients,
known as partitions, and can assign each texel to a single partition. For our
example we require two partitions, one for our ball texels and one for our
grass texels.

Now that you know the high level operation of the format, we can dive into more
detail.


Integer encoding
================

Initially the idea of fractional bits per texel sounds implausible, or even
impossible, because we're so used to storing numbers as a whole number of bits.
However, it's not quite as strange as it sounds. ASTC uses an encoding
technique called Bounded Integer Sequence Encoding (BISE), which makes heavy
use of storing numbers with a fractional number of bits to pack information
more efficiently.


Storing alphabets
-----------------

Even though color and weight values per texel are notionally floating-point
values, we have far too few bits available to directly store the actual values,
so they must be quantized during compression to reduce the storage size. For
example, if we have a floating-point weight for each texel in the range 0.0 to
1.0 we could choose to quantize it to five values - 0.0, 0.25, 0.5, 0.75, and
1.0 - which we can then represent in storage using the integer values 0 to 4.

In the general case we need to be able to efficiently store characters of an
alphabet containing N symbols if we choose to quantize to N levels. An N symbol
alphabet contains `log2(N)` bits of information per character. If we have an
alphabet of 5 possible symbols then each character contains ~2.32 bits of
information, but simple binary storage would require us to round up to 3 bits.
This wastes 22.6% of our storage capacity. The chart below shows the percentage
of our bit-space wasted when using simple binary encoding to store an arbitrary
N symbol alphabet:

... which shows for most alphabet sizes we waste a lot of our storage capacity
when using an integer number of bits per character. Efficiency is of critical
importance to a compression format, so this is something we needed to be able
to improve.

**Note:** We could have chosen to round up the quantization level to the next
power of two, and at least use the bits we're spending. However, this forces
the encoder to spend bits which could be used elsewhere for a bigger benefit,
so it will reduce image quality and is a sub-optimal solution.


Quints
------

Instead of rounding up a 5 symbol alphabet - called a "quint" in BISE - to
three bits, we could choose to instead pack three quint characters together.
Three characters in a 5-symbol alphabet have 5<sup>3</sup> (125) combinations,
and contain 6.97 bits of information. We can store this in 7 bits and have a
storage waste of only 0.5%.


Trits
-----

We can similarly construct a 3-symbol alphabet - called a "trit" in BISE - and
pack trit characters in groups of five. Each character group has 3<sup>5</sup>
(243) combinations, and contains 7.92 bits of information. We can store this in
8 bits and have a storage waste of only 1%.


BISE
----

The BISE encoding used by ASTC allows storage of character sequences using
arbitrary alphabets of up to 256 symbols, encoding each alphabet size in the
most space-efficient choice of bits, trits, and quints.

* Alphabets with values up to (2<sup>n</sup> - 1) can be encoded using n bits
  per character.
* Alphabets with values up to (3 * 2<sup>n</sup> - 1) can be encoded using n
  bits (m) and a trit (t) per character, and reconstructed using the equation
  (t * 2<sup>n</sup> + m).
* Alphabets with values up to (5 * 2<sup>n</sup> - 1) can be encoded using n
  bits (m) and a quint (q) per character, and reconstructed using the equation
  (q * 2<sup>n</sup> + m).

When the number of characters in a sequence is not a multiple of three or five
we need to avoid wasting storage at the end of the sequence, so we add another
constraint on the encoding. If the last few values in the sequence to encode
are zero, the last few bits in the encoded bit string must also be zero.
Ideally, the number of non-zero bits should be easily calculated and not depend
on the magnitudes of the previous encoded values. This is a little tricky to
arrange during compression, but it is possible. This means that we do not need
to store any padding after the end of the bit sequence, as we can safely assume
that any such bits are zero.
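
The packing arithmetic behind quints and trits, and the reconstruction
equations in the list above, can be checked with a short sketch. The helper
names `storage_waste` and `reconstruct` are hypothetical and are not part of
any ASTC library. The sketch only reproduces the arithmetic, not the actual
BISE bit layout.

```python
import math


def storage_waste(symbols, group_size, bits):
    """Fraction of `bits` left unused when `group_size` characters of a
    `symbols`-symbol alphabet are packed together."""
    information = group_size * math.log2(symbols)
    return 1.0 - information / bits


def reconstruct(high, low, n):
    """Rebuild a value from a trit or quint (`high`) and n plain bits (`low`)."""
    return high * (1 << n) + low


print(f"quint alone in 3 bits: {storage_waste(5, 1, 3):.1%}")
print(f"3 quints in 7 bits:    {storage_waste(5, 3, 7):.1%}")
print(f"5 trits in 8 bits:     {storage_waste(3, 5, 8):.1%}")

# Largest value for one trit plus n = 2 bits: 2 * 4 + 3 = 11 = 3 * 2^2 - 1.
print(reconstruct(2, 0b11, 2))
```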

With this constraint in place - and by some smart packing of the bits, trits,
and quints - BISE encodes a string of S characters in an N symbol alphabet
using a fixed number of bits, where n is the number of plain bits per
character:

* S values up to (2<sup>n</sup> - 1) uses (nS) bits.
* S values up to (3 * 2<sup>n</sup> - 1) uses (nS + ceil(8S / 5)) bits.
* S values up to (5 * 2<sup>n</sup> - 1) uses (nS + ceil(7S / 3)) bits.

... and the compressor will choose the one of these which produces the smallest
storage for the alphabet size being stored; some will use binary, some will use
bits and a trit, and some will use bits and a quint. If we compare the storage
efficiency of BISE against simple binary for the range of possible alphabet
sizes we might want to encode we can see that it is much more efficient.


Block sizes
===========

ASTC always compresses blocks of texels into 128-bit outputs, but allows the
developer to select from a range of block sizes to enable a fine-grained
tradeoff between image quality and size.

| Block footprint | Bits/texel |     | Block footprint | Bits/texel |
| --------------- | ---------- | --- | --------------- | ---------- |
| 4x4             | 8.00       |     | 10x5            | 2.56       |
| 5x4             | 6.40       |     | 10x6            | 2.13       |
| 5x5             | 5.12       |     | 8x8             | 2.00       |
| 6x5             | 4.27       |     | 10x8            | 1.60       |
| 6x6             | 3.56       |     | 10x10           | 1.28       |
| 8x5             | 3.20       |     | 12x10           | 1.07       |
| 8x6             | 2.67       |     | 12x12           | 0.89       |


Color endpoints
===============

The color data for a block is encoded as a gradient between two color
endpoints, with each texel selecting a position along that gradient which is
then interpolated during decompression. ASTC supports 16 color endpoint
encoding schemes, known as "endpoint modes". Options for endpoint modes
include:

* Varying the number of color channels: e.g. luminance, luminance + alpha, rgb,
  and rgba.
* Varying the encoding method: e.g.
  direct, base+offset, base+scale,
  quantization level.
* Varying the data range: e.g. low dynamic range, or high dynamic range.

The endpoint modes, and the endpoint color BISE quantization level, can be
chosen on a per-block basis.


Color partitions
================

Colors within a block are often complex, and cannot be accurately captured by a
single color gradient, as discussed earlier with our example of a red ball
lying on green grass. ASTC allows up to four color gradients - known as
"partitions" - to be assigned to a single block. Each texel is then assigned to
a single partition for the purposes of decompression.

Rather than directly storing the partition assignment for each texel, which
would need a lot of decompressor hardware to store it for all block sizes, we
generate it procedurally. Each block only needs to store the partition index -
which is the seed for the procedural generator - and the per-texel assignment
can then be generated on-the-fly during decompression. The image below shows
the generated texel assignments for two (top), three (middle), and four
(bottom) partitions for the 8x8 block size.

The number of partitions and the partition index can be chosen on a per-block
basis, and a different color endpoint mode can be chosen per partition.

**Note:** ASTC uses a 10-bit seed to drive the partition assignments. The hash
used will introduce horizontal bias in a third of the partitions, vertical bias
in a third, and no bias in the rest. As they are procedurally generated not all
of the partitions are useful, in particular with the smaller block sizes.

* Many partitions are duplicates.
* Many partitions are degenerate (an N partition hash results in at least one
  partition assignment that contains no texels).
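
To illustrate the idea of generating assignments from a seed, the sketch below
uses a toy hash that is an assumption made for this example and is NOT the
hash defined by ASTC. It produces noise-like patterns rather than the coherent
regions a real encoder relies on, but it shows how a decoder can derive a
per-texel partition from only the seed and the texel position, and how
duplicate and degenerate seeds can be detected. The names `toy_partition`,
`block_assignment`, and `is_degenerate` are hypothetical.

```python
def toy_partition(seed, x, y, partition_count):
    """Toy stand-in for a procedural partition function (not the ASTC hash)."""
    h = (seed * 0x9E3779B1 + x * 0x85EBCA6B + y * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 15
    h = (h * 0x2C1B3C6D) & 0xFFFFFFFF
    h ^= h >> 12
    return h % partition_count


def block_assignment(seed, block_w, block_h, partition_count):
    """Partition index for every texel in a block, in row-major order."""
    return tuple(
        toy_partition(seed, x, y, partition_count)
        for y in range(block_h)
        for x in range(block_w)
    )


def is_degenerate(assignment, partition_count):
    """True if at least one partition has no texels assigned to it."""
    return len(set(assignment)) < partition_count


# Survey all 1024 values of a 10-bit seed for a 4x4 block with 4 partitions.
patterns = [block_assignment(seed, 4, 4, 4) for seed in range(1024)]
degenerate = sum(is_degenerate(p, 4) for p in patterns)
print(f"{len(set(patterns))} unique patterns, {degenerate} degenerate")
```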


Texel weights
=============

Each texel requires a weight, which defines the relative contribution of each
color endpoint when interpolating the color gradient.

For smaller block sizes we can choose to store the weight directly, with one
weight per texel, but for the larger block sizes we simply do not have enough
bits of storage to do this. To work around this ASTC allows the weight grid to
be stored at a lower resolution than the texel grid. The per-texel weights are
interpolated from the stored weight grid during decompression using bilinear
interpolation.

The number of texel weights, and the weight value BISE quantization level, can
be chosen on a per-block basis.


Dual-plane weights
------------------

Using a single weight for all color channels works well when there is good
correlation across the channels, but this is not always the case. Common
examples where we would expect to get low correlation at least some of the time
are textures storing RGBA data - alpha masks are not usually closely
correlated with the color value - or normal data - the X and Y normal values
often change independently.

ASTC allows a dual-plane mode, which stores two separate weight grids, giving
each texel two weights. A single channel can be assigned to a second plane of
weights, while the other three use the first plane of weights.

The use of dual-plane mode can be chosen on a per-block basis, but its use
prevents the use of four color partitions as we do not have enough bits to
concurrently store both an extra plane of weights and an extra set of color
endpoints.


End results
===========

So, if we pull all of this together what do we end up with?


Adaptive
--------

The first word in the name of ASTC is "adaptive", and it should now hopefully
be clear why.
Each block always compresses into 128 bits of storage, but the
developer can choose from a wide range of texel block sizes and the compressor
gets a huge amount of latitude to determine how those 128 bits are used.

The compressor can trade off the number of bits assigned to colors (number of
partitions, endpoint mode, and stored quantization level) and weights (number
of weights per block, use of dual-plane, and stored quantization level) on a
per-block basis to get the best image quality possible.


Format support
--------------

The compression scheme used by ASTC effectively compresses arbitrary sequences
of floating-point numbers, with a flexible number of channels, across any of
the supported block sizes. There is no real notion of "color format" in the
format itself at all, beyond the color endpoint mode selection, although a
sensible compressor will want to use some format-specific heuristics to drive
an efficient state-space search.

The orthogonal encoding design allows ASTC to provide almost complete coverage
of our desirable format matrix from earlier, across a wide range of bit rates:

The only significant omission is the absence of a dedicated two channel
encoding for HDR textures. We simply ran out of entries in the space we had for
encoding color endpoint modes, and this one didn't make the cut.

The flexibility allowed by ASTC meets the requirement that almost any asset can
be compressed to some degree, at an appropriate bit rate for its quality needs.
This is a powerful enabler for a compression format, because it puts control in
the hands of content creators and not arbitrary format restrictions.


Image quality
-------------

The normal expectation would be that this level of format flexibility would
come at a cost of image quality; it has to cost something, right? Luckily this
isn't true.
The high packing efficiency allowed by BISE encoding, and the
ability to dynamically choose where to spend encoding space on a per-block
basis, means that an ASTC compressor is not forced to spend bits on things that
don't help image quality.

This gives some significant improvements in image quality compared to the older
texture formats, even though ASTC also handles a much wider range of options.

* ASTC at 2 bpt outperforms PVRTC at 2 bpt by ~2.0dB.
* ASTC at 3.56 bpt outperforms PVRTC and BC1 at 4 bpt by ~1.5dB, and ETC2 by
  ~0.7dB, despite a 10% bit rate disadvantage.
* ASTC at 8 bpt for LDR formats is comparable in quality to BC7 at 8 bpt.
* ASTC at 8 bpt for HDR formats is comparable in quality to BC6H at 8 bpt.

Differences as small as 0.25dB are visible to the human eye, and remember that
dB uses a logarithmic scale, so these are significant image quality
improvements.


3D compression
--------------

One of the nice bonus features of ASTC is that the techniques which underpin
the format generalize to compressing volumetric texture data without needing
very much additional decompression hardware.

ASTC is therefore also able to optionally support compression of 3D textures,
which is a unique feature not found in any earlier format, at the following
bit rates:

| Block footprint | Bits/texel |     | Block footprint | Bits/texel |
| --------------- | ---------- | --- | --------------- | ---------- |
| 3x3x3           | 4.74       |     | 5x5x4           | 1.28       |
| 4x3x3           | 3.56       |     | 5x5x5           | 1.02       |
| 4x4x3           | 2.67       |     | 6x5x5           | 0.85       |
| 4x4x4           | 2.00       |     | 6x6x5           | 0.71       |
| 5x4x4           | 1.60       |     | 6x6x6           | 0.59       |


Availability
============

The ASTC functionality is specified as a set of feature profiles, allowing
GPU hardware manufacturers to select which parts of the standard they
implement. There are four commonly seen profiles:

* "LDR":
    * 2D blocks.
    * LDR and sRGB color space.
    * [KHR_texture_compression_astc_ldr][astc_ldr]: KHR OpenGL ES extension.
* "LDR + Sliced 3D":
    * 2D blocks and sliced 3D blocks.
    * LDR and sRGB color space.
    * [KHR_texture_compression_astc_sliced_3d][astc_3d]: KHR OpenGL ES extension.
* "HDR":
    * 2D and sliced 3D blocks.
    * LDR, sRGB, and HDR color spaces.
    * [KHR_texture_compression_astc_hdr][astc_ldr]: KHR OpenGL ES extension.
* "Full":
    * 2D, sliced 3D, and volumetric 3D blocks.
    * LDR, sRGB, and HDR color spaces.
    * [OES_texture_compression_astc][astc_full]: OES OpenGL ES extension.

The LDR profile is mandatory in OpenGL ES 3.2 and a standardized optional
feature for Vulkan, and therefore widely supported on contemporary mobile
devices. The 2D HDR profile is not mandatory, but is widely supported.

3D texturing
------------

The APIs expose 3D textures in two flavors.

The sliced 3D texture support builds a 3D texture from an array of 2D image
slices that have each been individually compressed using 2D ASTC compression.
This is required for the HDR profile, so is also widely supported.

The volumetric 3D texture support uses the native 3D block sizes provided by
ASTC to implement true volumetric compression. This enables a wider choice of
low bitrate options than the 2D blocks, which is particularly important for 3D
textures of any non-trivial size. Volumetric formats are not widely supported,
but are supported on all of the Arm Mali GPUs that support ASTC.

ASTC decode mode
----------------

ASTC is specified to decompress texels into fp16 intermediate values, except
for sRGB which always decompresses into 8-bit UNORM intermediates. For many use
cases this gives more dynamic range and precision than required. This can cause
a reduction in both texture cache efficiency and texture filtering performance
due to the larger decompressed data size.

A pair of extensions exist, and are widely supported on recent mobile GPUs,
which allow applications to reduce the intermediate precision to either UNORM8
(recommended for LDR textures) or RGB9e5 (recommended for HDR textures).

* [EXT_texture_compression_astc_decode_mode][astc_decode]: Allow UNORM8
  intermediates
* [EXT_texture_compression_astc_decode_mode_rgb9e5][astc_decode]: Allow RGB9e5
  intermediates

[astc_ldr]: https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_texture_compression_astc_hdr.txt
[astc_3d]: https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_texture_compression_astc_sliced_3d.txt
[astc_full]: https://www.khronos.org/registry/OpenGL/extensions/OES/OES_texture_compression_astc.txt
[astc_decode]: https://www.khronos.org/registry/OpenGL/extensions/EXT/EXT_texture_compression_astc_decode_mode.txt

- - -

_Copyright © 2019-2022, Arm Limited and contributors. All rights reserved._