LZ4 Frame Format Description
============================

### Notices

Copyright (c) 2013-2020 Yann Collet

Permission is granted to copy and distribute this document
for any purpose and without charge,
including translations into other languages
and incorporation into compilations,
provided that the copyright notice and this notice are preserved,
and that any substantive changes or deletions from the original
are clearly marked.
Distribution of this document is unlimited.

### Version

1.6.2 (12/08/2020)


Introduction
------------

The purpose of this document is to define a lossless compressed data format
that is independent of CPU type, operating system,
file system and character set, suitable for
file compression, pipe and streaming compression
using the [LZ4 algorithm](http://www.lz4.org).

The data can be produced or consumed,
even for an arbitrarily long sequentially presented input data stream,
using only an a priori bounded amount of intermediate storage,
and hence can be used in data communications.
The format uses the LZ4 compression method,
and an optional [xxHash-32 checksum method](https://github.com/Cyan4973/xxHash),
for detection of data corruption.

The data format defined by this specification
does not attempt to allow random access to compressed data.

This specification is intended for use by implementers of software
to compress data into LZ4 format and/or decompress data from LZ4 format.
The text of the specification assumes a basic background in programming
at the level of bits and other primitive data representations.

Unless otherwise indicated below,
a compliant compressor must produce data sets
that conform to the specifications presented here.
It doesn't need to support all options though.

A compliant decompressor must be able to decompress
at least one working set of parameters
that conforms to the specifications presented here.
It may also ignore checksums.
Whenever it does not support a specific parameter within the compressed stream,
it must produce a non-ambiguous error code
and associated error message explaining which parameter is unsupported.


General Structure of LZ4 Frame format
-------------------------------------

| MagicNb | F. Descriptor | Block | (...) | EndMark | C. Checksum |
|:-------:|:-------------:| ----- | ----- | ------- | ----------- |
| 4 bytes |  3-15 bytes   |       |       | 4 bytes | 0-4 bytes   |

__Magic Number__

4 Bytes, Little endian format.
Value : 0x184D2204

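A minimal sketch of how a reader might recognise this field, assuming Python and its standard `struct` module. The names `LZ4_FRAME_MAGIC` and `is_lz4_frame` are hypothetical and not part of this specification; only the value and the little-endian byte order come from the text above.

```python
import struct

LZ4_FRAME_MAGIC = 0x184D2204  # value given in this specification


def is_lz4_frame(data: bytes) -> bool:
    """Return True if `data` starts with the LZ4 frame magic number."""
    if len(data) < 4:
        return False
    # "<I" reads an unsigned 32-bit integer in little-endian order.
    (magic,) = struct.unpack_from("<I", data, 0)
    return magic == LZ4_FRAME_MAGIC


# Stored little-endian, the magic number appears in the stream as 04 22 4D 18.
print(is_lz4_frame(bytes([0x04, 0x22, 0x4D, 0x18])))  # prints True
```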
__Frame Descriptor__

3 to 15 Bytes, to be detailed in its own paragraph,
as it is the most important part of the spec.

The combined _Magic_Number_ and _Frame_Descriptor_ fields are sometimes
called ___LZ4 Frame Header___. Its size varies between 7 and 19 bytes.

__Data Blocks__

To be detailed in its own paragraph.
That’s where compressed data is stored.

__EndMark__

The flow of blocks ends when the last data block is followed by
the 32-bit value `0x00000000`.

__Content Checksum__

_Content_Checksum_ verifies that the full content has been decoded correctly.
The content checksum is the result of the [xxHash-32 algorithm]
digesting the original (decoded) data as input, and a seed of zero.
Content checksum is only present when its associated flag
is set in the frame descriptor.
Content Checksum validates the result:
that all blocks were fully transmitted in the correct order and without error,
and also that the encoding/decoding process itself generated no distortion.
Its usage is recommended.

The combined _EndMark_ and _Content_Checksum_ fields might sometimes be
referred to as ___LZ4 Frame Footer___. Its size varies between 4 and 8 bytes.

__Frame Concatenation__

In some circumstances, it may be preferable to append multiple frames,
for example in order to add new data to an existing compressed file
without re-framing it.

In such a case, each frame has its own set of descriptor flags.
Each frame is considered independent.
The only relation between frames is their sequential order.

The ability to decode multiple concatenated frames
within a single stream or file
is left outside of this specification.
As an example, the behavior of the reference lz4 command line utility is
to decode all concatenated frames in their sequential order.


Frame Descriptor
----------------

| FLG     | BD      | (Content Size) | (Dictionary ID) | HC      |
| ------- | ------- |:--------------:|:---------------:| ------- |
| 1 byte  | 1 byte  |  0 - 8 bytes   |   0 - 4 bytes   | 1 byte  |

The descriptor uses a minimum of 3 bytes,
and up to 15 bytes depending on optional parameters.

__FLG byte__

|  BitNb  |  7-6  |   5   |    4     |  3   |    2     |    1     |   0  |
| ------- |-------|-------|----------|------|----------|----------|------|
|FieldName|Version|B.Indep|B.Checksum|C.Size|C.Checksum|*Reserved*|DictID|


__BD byte__

|  BitNb  |     7    |     6-5-4     |  3-2-1-0 |
| ------- | -------- | ------------- | -------- |
|FieldName|*Reserved*| Block MaxSize |*Reserved*|

In the tables, bit 7 is the highest bit, while bit 0 is the lowest.

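As an illustration of the two bit layouts above, the following Python sketch extracts each field with shifts and masks. The `DescriptorFlags` type and the `parse_flags` function are hypothetical names chosen for this example, and the sample byte values are arbitrary; the sketch performs no validation.

```python
from dataclasses import dataclass


@dataclass
class DescriptorFlags:
    version: int             # FLG bits 7-6
    block_independence: bool  # FLG bit 5
    block_checksum: bool      # FLG bit 4
    content_size: bool        # FLG bit 3
    content_checksum: bool    # FLG bit 2
    flg_reserved: int         # FLG bit 1
    dict_id: bool             # FLG bit 0
    block_max_size_id: int    # BD bits 6-5-4
    bd_reserved: int          # BD bit 7 and bits 3-0, left in place


def parse_flags(flg: int, bd: int) -> DescriptorFlags:
    """Split the FLG and BD bytes into their fields (no validation)."""
    return DescriptorFlags(
        version=(flg >> 6) & 0x3,
        block_independence=bool((flg >> 5) & 1),
        block_checksum=bool((flg >> 4) & 1),
        content_size=bool((flg >> 3) & 1),
        content_checksum=bool((flg >> 2) & 1),
        flg_reserved=(flg >> 1) & 1,
        dict_id=bool(flg & 1),
        block_max_size_id=(bd >> 4) & 0x7,
        bd_reserved=bd & 0x8F,
    )


# Example: FLG = 0x64 (0b01100100), BD = 0x70 (0b01110000).
flags = parse_flags(0x64, 0x70)
print(flags.version, flags.block_independence, flags.content_checksum,
      flags.block_max_size_id)  # prints: 1 True True 7
```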
__Version Number__

2-bit field, must be set to `01`.
Any other value cannot be decoded by this version of the specification.
Other version numbers will use different flag layouts.

__Block Independence flag__

If this flag is set to “1”, blocks are independent.
If this flag is set to “0”, each block depends on previous ones
(up to LZ4 window size, which is 64 KB).
In such a case, it’s necessary to decode all blocks in sequence.

Block dependency improves compression ratio, especially for small blocks.
On the other hand, it makes random access or multi-threaded decoding impossible.

__Block checksum flag__

If this flag is set, each data block will be followed by a 4-byte checksum,
calculated by using the xxHash-32 algorithm on the raw (compressed) data block.
The intention is to detect data corruption (storage or transmission errors)
immediately, before decoding.
Block checksum usage is optional.

__Content Size flag__

If this flag is set, the uncompressed size of data included within the frame
will be present as an 8-byte unsigned little endian value, after the flags.
Content Size usage is optional.

__Content checksum flag__

If this flag is set, a 32-bit content checksum will be appended
after the EndMark.

__Dictionary ID flag__

If this flag is set, a 4-byte Dict-ID field will be present,
after the descriptor flags and the Content Size.

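Since the optional fields are announced by FLG bits, the descriptor length can be derived from the FLG byte alone. The Python sketch below shows the arithmetic; `descriptor_size` is a hypothetical helper name, and the bit positions are those of the FLG table above (C.Size is bit 3, DictID is bit 0).

```python
def descriptor_size(flg: int) -> int:
    """Return the Frame Descriptor length in bytes implied by the FLG byte."""
    size = 2          # FLG and BD are always present
    if flg & 0x08:    # C.Size flag (bit 3): 8-byte Content Size
        size += 8
    if flg & 0x01:    # DictID flag (bit 0): 4-byte Dictionary ID
        size += 4
    return size + 1   # HC byte closes the descriptor


# The four combinations span the 3 to 15 byte range stated above.
for flg in (0x60, 0x61, 0x68, 0x69):
    print(hex(flg), descriptor_size(flg))
```

Adding the 4-byte magic number to these values gives the 7 to 19 byte range of the LZ4 Frame Header.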
__Block Maximum Size__

This information is useful to help the decoder allocate memory.
Size here refers to the original (uncompressed) data size.
Block Maximum Size is one of the values in the following table,
indexed by the value of the 3-bit field :

|  0  |  1  |  2  |  3  |   4   |   5    |  6   |  7   |
| --- | --- | --- | --- | ----- | ------ | ---- | ---- |
| N/A | N/A | N/A | N/A | 64 KB | 256 KB | 1 MB | 4 MB |

The decoder may refuse to allocate block sizes above any system-specific size.
Unused values may be used in a future revision of the spec.
A decoder conformant with the current version of the spec
is only able to decode block sizes defined in this spec.

__Reserved bits__

Value of reserved bits **must** be 0 (zero).
Reserved bits might be used in a future version of the specification,
typically enabling new optional features.
When this happens, a decoder respecting the current specification version
shall not be able to decode such a frame.

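The constraints on the version number, the Block Maximum Size table and the reserved bits can be checked together. The following Python sketch is one possible way to do so, not a reference implementation: `check_descriptor_flags` and `BLOCK_MAX_SIZES` are hypothetical names, and the sizes assume that KB and MB denote powers of two (1 KB = 1024 bytes). Each failure raises an error naming the offending parameter, in the spirit of the requirement stated in the Introduction.

```python
# Table index -> Block Maximum Size in bytes (assuming 1 KB = 1024 bytes).
BLOCK_MAX_SIZES = {
    4: 64 * 1024,
    5: 256 * 1024,
    6: 1024 * 1024,
    7: 4 * 1024 * 1024,
}


def check_descriptor_flags(flg: int, bd: int) -> int:
    """Validate FLG and BD; return the Block Maximum Size in bytes."""
    version = (flg >> 6) & 0x3
    if version != 0b01:
        raise ValueError(f"unsupported frame version: {version}")
    if flg & 0x02:  # FLG bit 1 is reserved
        raise ValueError("unsupported: reserved bit 1 of FLG is set")
    if bd & 0x8F:   # BD bit 7 and bits 3-0 are reserved
        raise ValueError("unsupported: reserved bits of BD are set")
    size_id = (bd >> 4) & 0x7
    if size_id not in BLOCK_MAX_SIZES:
        raise ValueError(f"unsupported Block MaxSize value: {size_id}")
    return BLOCK_MAX_SIZES[size_id]


print(check_descriptor_flags(0x64, 0x70))  # prints 4194304
```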
__Content Size__

This is the original (uncompressed) size.
This information is optional, and only present if the associated flag is set.
Content size is provided as an unsigned 8-byte value, for a maximum of 16 Exabytes.
Format is Little endian.
This value is informational, typically for display or memory allocation.
It can be skipped by a decoder, or used to validate content correctness.

__Dictionary ID__

Dict-ID is only present if the associated flag is set.
It's an unsigned 32-bit value, stored using little-endian convention.
A dictionary is useful to compress short input sequences.
The compressor can take advantage of the dictionary context
to encode the input in a more compact manner.
It works as a kind of “known prefix” which is used by
both the compressor and the decompressor to “warm-up” reference tables.

The decompressor can use Dict-ID identifier to determine
which dictionary must be used to correctly decode data.
The compressor and the decompressor must use exactly the same dictionary.
It's presumed that the 32-bit dictID uniquely identifies a dictionary.

Within a single frame, a single dictionary can be defined.
When the frame descriptor defines independent blocks,
each block will be initialized with the same dictionary.
If the frame descriptor defines linked blocks,
the dictionary will only be used once, at the beginning of the frame.

__Header Checksum__

One-byte checksum of combined descriptor fields, including optional ones.
The value is the second byte of `xxh32()` : ` (xxh32()>>8) & 0xFF `
using zero as a seed, and the full Frame Descriptor as an input
(including optional fields when they are present).
A wrong checksum indicates that the descriptor is erroneous.

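To make the formula concrete, here is a sketch in pure Python. It includes a compact, unoptimised xxHash-32 written from the public xxHash specification so that the example is self-contained; a real implementation would more likely call an existing xxHash library. The names `xxh32` and `header_checksum` are local to this example. It assumes the checksum input is every descriptor byte that precedes HC (FLG, BD and any optional fields), excluding the magic number.

```python
import struct

P1, P2, P3, P4, P5 = 0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D, 0x27D4EB2F, 0x165667B1
MASK = 0xFFFFFFFF


def _rotl(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK


def _round(acc: int, lane: int) -> int:
    """One accumulator round over a 32-bit little-endian lane."""
    return (_rotl((acc + lane * P2) & MASK, 13) * P1) & MASK


def xxh32(data: bytes, seed: int = 0) -> int:
    """Unoptimised xxHash-32 of `data`, returned as an unsigned 32-bit int."""
    n, i = len(data), 0
    if n >= 16:
        v1 = (seed + P1 + P2) & MASK
        v2 = (seed + P2) & MASK
        v3 = seed & MASK
        v4 = (seed - P1) & MASK
        while i + 16 <= n:  # consume 16-byte stripes
            l1, l2, l3, l4 = struct.unpack_from("<4I", data, i)
            v1, v2 = _round(v1, l1), _round(v2, l2)
            v3, v4 = _round(v3, l3), _round(v4, l4)
            i += 16
        h = (_rotl(v1, 1) + _rotl(v2, 7) + _rotl(v3, 12) + _rotl(v4, 18)) & MASK
    else:
        h = (seed + P5) & MASK
    h = (h + n) & MASK
    while i + 4 <= n:  # remaining 4-byte lanes
        (lane,) = struct.unpack_from("<I", data, i)
        h = (_rotl((h + lane * P3) & MASK, 17) * P4) & MASK
        i += 4
    while i < n:       # remaining single bytes
        h = (_rotl((h + data[i] * P5) & MASK, 11) * P1) & MASK
        i += 1
    h ^= h >> 15       # final avalanche
    h = (h * P2) & MASK
    h ^= h >> 13
    h = (h * P3) & MASK
    h ^= h >> 16
    return h


def header_checksum(descriptor: bytes) -> int:
    """HC byte: second byte of xxh32 over the descriptor, seed zero."""
    return (xxh32(descriptor, 0) >> 8) & 0xFF


# FLG = 0x64, BD = 0x40, no optional fields.
print(hex(header_checksum(bytes([0x64, 0x40]))))  # prints 0xa7
```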

Data Blocks
-----------

| Block Size |  data  | (Block Checksum) |
|:----------:| ------ |:----------------:|
|  4 bytes   |        |   0 - 4 bytes    |


__Block Size__

This field uses 4 bytes, in little-endian format.

If the highest bit is set (`1`), the block is uncompressed.

If the highest bit is not set (`0`), the block is LZ4-compressed,
using the [LZ4 block format specification](https://github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md).

All other bits give the size, in bytes, of the data section.
The size does not include the block checksum if present.

_Block_Size_ shall never be larger than _Block_Maximum_Size_.
A larger compressed size could potentially happen for non-compressible sources.
In such a case, the data block must be passed using uncompressed format.

A value of `0x00000000` is invalid, and signifies an _EndMark_ instead.
Note that this is different from a value of `0x80000000` (highest bit set),
which is an uncompressed block of size 0 (empty),
which is valid, and therefore doesn't end a frame.
Note that, if _Block_checksum_ is enabled,
even an empty block must be followed by a 32-bit block checksum.

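The rules for the Block Size field can be sketched as follows in Python. `decode_block_size` and `iter_blocks` are hypothetical helpers for illustration: the walker only separates blocks, it neither decompresses data nor verifies block checksums, and it assumes `body` starts right after the frame descriptor.

```python
import struct


def decode_block_size(field: bytes):
    """Interpret a 4-byte Block Size field as (kind, size)."""
    (raw,) = struct.unpack("<I", field)  # little-endian, exactly 4 bytes
    if raw == 0:
        return ("endmark", 0)
    kind = "uncompressed" if raw & 0x80000000 else "compressed"
    return (kind, raw & 0x7FFFFFFF)      # the low 31 bits hold the size


def iter_blocks(body: bytes, has_block_checksum: bool = False):
    """Yield (kind, data) for each block until the EndMark is reached."""
    pos = 0
    while True:
        kind, size = decode_block_size(body[pos:pos + 4])
        pos += 4
        if kind == "endmark":
            return
        data = body[pos:pos + size]
        pos += size
        if has_block_checksum:
            pos += 4                     # checksum skipped, not verified here
        yield kind, data


# One uncompressed 5-byte block followed by the EndMark.
body = b"\x05\x00\x00\x80hello" + b"\x00\x00\x00\x00"
print(list(iter_blocks(body)))  # prints [('uncompressed', b'hello')]
```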
__Data__

Where the actual data to decode stands.
It might be compressed or not, depending on previous field indications.

When compressed, the data must respect the [LZ4 block format specification](https://github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md).

Note that a block is not necessarily full.
Uncompressed size of data can be any size __up to__ _Block_Maximum_Size_,
so it may contain less data than the maximum block size.

__Block checksum__

Only present if the associated flag is set.
This is a 4-byte checksum value, in little endian format,
calculated by using the [xxHash-32 algorithm] on the __raw__ (undecoded) data block,
and a seed of zero.
The intention is to detect data corruption (storage or transmission errors)
before decoding.

_Block_checksum_ can be used together with _Content_checksum_.

[xxHash-32 algorithm]: https://github.com/Cyan4973/xxHash/blob/release/doc/xxhash_spec.md


Skippable Frames
----------------

| Magic Number | Frame Size | User Data |
|:------------:|:----------:| --------- |
|   4 bytes    |  4 bytes   |           |

Skippable frames allow the integration of user-defined data
into a flow of concatenated frames.
Their design is straightforward,
with the sole objective of allowing the decoder to quickly skip
over user-defined data and continue decoding.

To facilitate identification,
starting a flow of concatenated frames with a skippable frame is discouraged.
If there is a need to start such a flow with some user data
encapsulated into a skippable frame,
it’s recommended to start with a zero-byte LZ4 frame
followed by a skippable frame.
This makes identification easier for file type identifiers.


__Magic Number__

4 Bytes, Little endian format.
Value : 0x184D2A5X, which means any value from 0x184D2A50 to 0x184D2A5F.
All 16 values are valid to identify a skippable frame.

__Frame Size__

This is the size, in bytes, of the following User Data
(not including the magic number or the size field itself).
4 Bytes, Little endian format, unsigned 32-bit.
This means User Data can’t be bigger than (2^32-1) Bytes.

__User Data__

User Data can be anything. Data will just be skipped by the decoder.

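A decoder can step over a skippable frame using only its 8-byte prefix. The Python sketch below illustrates this; `skippable_frame_length` is a hypothetical helper name, and returning `None` for anything that is not a skippable frame is a choice made for this example only.

```python
import struct


def skippable_frame_length(data: bytes, pos: int = 0):
    """Return the total length of the skippable frame at `pos`, or None."""
    if len(data) - pos < 8:
        return None
    # Magic number and Frame Size are both unsigned 32-bit little-endian.
    magic, user_size = struct.unpack_from("<II", data, pos)
    if (magic & 0xFFFFFFF0) != 0x184D2A50:  # matches 0x184D2A50..0x184D2A5F
        return None
    return 8 + user_size                    # prefix plus User Data


frame = struct.pack("<II", 0x184D2A53, 3) + b"abc"
print(skippable_frame_length(frame))  # prints 11
```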

Legacy frame
------------

The Legacy frame format was defined in the initial versions of “LZ4Demo”.
Newer compressors should not use this format anymore, as it is too restrictive.

Main characteristics of the legacy format :

- Fixed block size : 8 MB.
- All blocks must be completely filled, except the last one.
- All blocks are always compressed, even when compression is detrimental.
- The last block is detected either because
  it is followed by the “EOF” (End of File) mark,
  or because it is followed by a known Frame Magic Number.
- No checksum
- Convention is Little endian

| MagicNb | B.CSize | CData | B.CSize | CData |  (...)  | EndMark |
| ------- | ------- | ----- | ------- | ----- | ------- | ------- |
| 4 bytes | 4 bytes | CSize | 4 bytes | CSize | x times |   EOF   |


__Magic Number__

4 Bytes, Little endian format.
Value : 0x184C2102

__Block Compressed Size__

This is the size, in bytes, of the following compressed data block.
4 Bytes, Little endian format.

__Data__

Where the actual compressed data stands.
Data is always compressed, even when compression is detrimental.

__EndMark__

End of legacy frame is implicit only.
It must be followed by a standard EOF (End Of File) signal,
whether it is a file or a stream.

Alternatively, if the frame is followed by a valid Frame Magic Number,
it is considered completed.
This policy makes it possible to concatenate legacy frames.

Any other value will be interpreted as a block size,
and trigger an error if it does not fit within acceptable range.

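The implicit end-of-frame rule can be sketched in Python as below. The helper names are hypothetical, the blocks are returned still compressed, and the "acceptable range" check is an assumption of this example: since the text above does not give a numeric bound, the sketch only rejects a size that exceeds the remaining input.

```python
import struct

LZ4_FRAME_MAGIC = 0x184D2204
LEGACY_MAGIC = 0x184C2102


def is_frame_magic(value: int) -> bool:
    """True for the LZ4, legacy and skippable frame magic numbers."""
    return (value in (LZ4_FRAME_MAGIC, LEGACY_MAGIC)
            or (value & 0xFFFFFFF0) == 0x184D2A50)


def read_legacy_frame(data: bytes, pos: int = 0):
    """Return (compressed_blocks, end_position) for the legacy frame at pos."""
    if len(data) - pos < 4:
        raise ValueError("truncated legacy frame")
    (magic,) = struct.unpack_from("<I", data, pos)
    if magic != LEGACY_MAGIC:
        raise ValueError("not a legacy frame")
    pos += 4
    blocks = []
    while pos < len(data):               # reaching EOF ends the frame
        if len(data) - pos < 4:
            raise ValueError("truncated block size field")
        (value,) = struct.unpack_from("<I", data, pos)
        if is_frame_magic(value):        # another frame starts here
            break
        pos += 4
        if value > len(data) - pos:      # assumed range check, see lead-in
            raise ValueError(f"block size {value} exceeds remaining data")
        blocks.append(data[pos:pos + value])
        pos += value
    return blocks, pos


frame = (struct.pack("<I", LEGACY_MAGIC)
         + struct.pack("<I", 3) + b"abc"
         + struct.pack("<I", 2) + b"de")
print(read_legacy_frame(frame))  # prints ([b'abc', b'de'], 17)
```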

Version changes
---------------

1.6.2 : clarifies specification of _EndMark_

1.6.1 : introduced terms "LZ4 Frame Header" and "LZ4 Frame Footer"

1.6.0 : restored Dictionary ID field in Frame header

1.5.1 : changed document format to MarkDown

1.5 : removed Dictionary ID from specification

1.4.1 : changed wording from “stream” to “frame”

1.4 : added skippable streams, re-added stream checksum

1.3 : modified header checksum

1.2 : reduced choice of “block size”, to postpone decision on “dynamic size of BlockSize Field”.

1.1 : optional fields are now part of the descriptor

1.0 : changed “block size” specification, adding a compressed/uncompressed flag

0.9 : reduced scale of “block maximum size” table

0.8 : removed : high compression flag

0.7 : removed : stream checksum

0.6 : settled : stream size uses 8 bytes, endian convention is little endian

0.5 : added copyright notice

0.4 : changed format to Google Doc compatible OpenDocument