162306a36Sopenharmony_ci// SPDX-License-Identifier: GPL-2.0-or-later 262306a36Sopenharmony_ci/* 362306a36Sopenharmony_ci * decompress_common.c - Code shared by the XPRESS and LZX decompressors 462306a36Sopenharmony_ci * 562306a36Sopenharmony_ci * Copyright (C) 2015 Eric Biggers 662306a36Sopenharmony_ci */ 762306a36Sopenharmony_ci 862306a36Sopenharmony_ci#include "decompress_common.h" 962306a36Sopenharmony_ci 1062306a36Sopenharmony_ci/* 1162306a36Sopenharmony_ci * make_huffman_decode_table() - 1262306a36Sopenharmony_ci * 1362306a36Sopenharmony_ci * Build a decoding table for a canonical prefix code, or "Huffman code". 1462306a36Sopenharmony_ci * 1562306a36Sopenharmony_ci * This is an internal function, not part of the library API! 1662306a36Sopenharmony_ci * 1762306a36Sopenharmony_ci * This takes as input the length of the codeword for each symbol in the 1862306a36Sopenharmony_ci * alphabet and produces as output a table that can be used for fast 1962306a36Sopenharmony_ci * decoding of prefix-encoded symbols using read_huffsym(). 2062306a36Sopenharmony_ci * 2162306a36Sopenharmony_ci * Strictly speaking, a canonical prefix code might not be a Huffman 2262306a36Sopenharmony_ci * code. But this algorithm will work either way; and in fact, since 2362306a36Sopenharmony_ci * Huffman codes are defined in terms of symbol frequencies, there is no 2462306a36Sopenharmony_ci * way for the decompressor to know whether the code is a true Huffman 2562306a36Sopenharmony_ci * code or not until all symbols have been decoded. 2662306a36Sopenharmony_ci * 2762306a36Sopenharmony_ci * Because the prefix code is assumed to be "canonical", it can be 2862306a36Sopenharmony_ci * reconstructed directly from the codeword lengths. A prefix code is 2962306a36Sopenharmony_ci * canonical if and only if a longer codeword never lexicographically 3062306a36Sopenharmony_ci * precedes a shorter codeword, and the lexicographic ordering of 3162306a36Sopenharmony_ci * codewords of the same length is the same as the lexicographic ordering 3262306a36Sopenharmony_ci * of the corresponding symbols. Consequently, we can sort the symbols 3362306a36Sopenharmony_ci * primarily by codeword length and secondarily by symbol value, then 3462306a36Sopenharmony_ci * reconstruct the prefix code by generating codewords lexicographically 3562306a36Sopenharmony_ci * in that order. 3662306a36Sopenharmony_ci * 3762306a36Sopenharmony_ci * This function does not, however, generate the prefix code explicitly. 3862306a36Sopenharmony_ci * Instead, it directly builds a table for decoding symbols using the 3962306a36Sopenharmony_ci * code. The basic idea is this: given the next 'max_codeword_len' bits 4062306a36Sopenharmony_ci * in the input, we can look up the decoded symbol by indexing a table 4162306a36Sopenharmony_ci * containing 2**max_codeword_len entries. A codeword with length 4262306a36Sopenharmony_ci * 'max_codeword_len' will have exactly one entry in this table, whereas 4362306a36Sopenharmony_ci * a codeword shorter than 'max_codeword_len' will have multiple entries 4462306a36Sopenharmony_ci * in this table. Precisely, a codeword of length n will be represented 4562306a36Sopenharmony_ci * by 2**(max_codeword_len - n) entries in this table. The 0-based index 4662306a36Sopenharmony_ci * of each such entry will contain the corresponding codeword as a prefix 4762306a36Sopenharmony_ci * when zero-padded on the left to 'max_codeword_len' binary digits. 4862306a36Sopenharmony_ci * 4962306a36Sopenharmony_ci * That's the basic idea, but we implement two optimizations regarding 5062306a36Sopenharmony_ci * the format of the decode table itself: 5162306a36Sopenharmony_ci * 5262306a36Sopenharmony_ci * - For many compression formats, the maximum codeword length is too 5362306a36Sopenharmony_ci * long for it to be efficient to build the full decoding table 5462306a36Sopenharmony_ci * whenever a new prefix code is used. Instead, we can build the table 5562306a36Sopenharmony_ci * using only 2**table_bits entries, where 'table_bits' is some number 5662306a36Sopenharmony_ci * less than or equal to 'max_codeword_len'. Then, only codewords of 5762306a36Sopenharmony_ci * length 'table_bits' and shorter can be directly looked up. For 5862306a36Sopenharmony_ci * longer codewords, the direct lookup instead produces the root of a 5962306a36Sopenharmony_ci * binary tree. Using this tree, the decoder can do traditional 6062306a36Sopenharmony_ci * bit-by-bit decoding of the remainder of the codeword. Child nodes 6162306a36Sopenharmony_ci * are allocated in extra entries at the end of the table; leaf nodes 6262306a36Sopenharmony_ci * contain symbols. Note that the long-codeword case is, in general, 6362306a36Sopenharmony_ci * not performance critical, since in Huffman codes the most frequently 6462306a36Sopenharmony_ci * used symbols are assigned the shortest codeword lengths. 6562306a36Sopenharmony_ci * 6662306a36Sopenharmony_ci * - When we decode a symbol using a direct lookup of the table, we still 6762306a36Sopenharmony_ci * need to know its length so that the bitstream can be advanced by the 6862306a36Sopenharmony_ci * appropriate number of bits. The simple solution is to simply retain 6962306a36Sopenharmony_ci * the 'lens' array and use the decoded symbol as an index into it. 7062306a36Sopenharmony_ci * However, this requires two separate array accesses in the fast path. 7162306a36Sopenharmony_ci * The optimization is to store the length directly in the decode 7262306a36Sopenharmony_ci * table. We use the bottom 11 bits for the symbol and the top 5 bits 7362306a36Sopenharmony_ci * for the length. In addition, to combine this optimization with the 7462306a36Sopenharmony_ci * previous one, we introduce a special case where the top 2 bits of 7562306a36Sopenharmony_ci * the length are both set if the entry is actually the root of a 7662306a36Sopenharmony_ci * binary tree. 7762306a36Sopenharmony_ci * 7862306a36Sopenharmony_ci * @decode_table: 7962306a36Sopenharmony_ci * The array in which to create the decoding table. This must have 8062306a36Sopenharmony_ci * a length of at least ((2**table_bits) + 2 * num_syms) entries. 8162306a36Sopenharmony_ci * 8262306a36Sopenharmony_ci * @num_syms: 8362306a36Sopenharmony_ci * The number of symbols in the alphabet; also, the length of the 8462306a36Sopenharmony_ci * 'lens' array. Must be less than or equal to 2048. 8562306a36Sopenharmony_ci * 8662306a36Sopenharmony_ci * @table_bits: 8762306a36Sopenharmony_ci * The order of the decode table size, as explained above. Must be 8862306a36Sopenharmony_ci * less than or equal to 13. 8962306a36Sopenharmony_ci * 9062306a36Sopenharmony_ci * @lens: 9162306a36Sopenharmony_ci * An array of length @num_syms, indexable by symbol, that gives the 9262306a36Sopenharmony_ci * length of the codeword, in bits, for that symbol. The length can 9362306a36Sopenharmony_ci * be 0, which means that the symbol does not have a codeword 9462306a36Sopenharmony_ci * assigned. 9562306a36Sopenharmony_ci * 9662306a36Sopenharmony_ci * @max_codeword_len: 9762306a36Sopenharmony_ci * The longest codeword length allowed in the compression format. 9862306a36Sopenharmony_ci * All entries in 'lens' must be less than or equal to this value. 9962306a36Sopenharmony_ci * This must be less than or equal to 23. 10062306a36Sopenharmony_ci * 10162306a36Sopenharmony_ci * @working_space 10262306a36Sopenharmony_ci * A temporary array of length '2 * (max_codeword_len + 1) + 10362306a36Sopenharmony_ci * num_syms'. 10462306a36Sopenharmony_ci * 10562306a36Sopenharmony_ci * Returns 0 on success, or -1 if the lengths do not form a valid prefix 10662306a36Sopenharmony_ci * code. 10762306a36Sopenharmony_ci */ 10862306a36Sopenharmony_ciint make_huffman_decode_table(u16 decode_table[], const u32 num_syms, 10962306a36Sopenharmony_ci const u32 table_bits, const u8 lens[], 11062306a36Sopenharmony_ci const u32 max_codeword_len, 11162306a36Sopenharmony_ci u16 working_space[]) 11262306a36Sopenharmony_ci{ 11362306a36Sopenharmony_ci const u32 table_num_entries = 1 << table_bits; 11462306a36Sopenharmony_ci u16 * const len_counts = &working_space[0]; 11562306a36Sopenharmony_ci u16 * const offsets = &working_space[1 * (max_codeword_len + 1)]; 11662306a36Sopenharmony_ci u16 * const sorted_syms = &working_space[2 * (max_codeword_len + 1)]; 11762306a36Sopenharmony_ci int left; 11862306a36Sopenharmony_ci void *decode_table_ptr; 11962306a36Sopenharmony_ci u32 sym_idx; 12062306a36Sopenharmony_ci u32 codeword_len; 12162306a36Sopenharmony_ci u32 stores_per_loop; 12262306a36Sopenharmony_ci u32 decode_table_pos; 12362306a36Sopenharmony_ci u32 len; 12462306a36Sopenharmony_ci u32 sym; 12562306a36Sopenharmony_ci 12662306a36Sopenharmony_ci /* Count how many symbols have each possible codeword length. 12762306a36Sopenharmony_ci * Note that a length of 0 indicates the corresponding symbol is not 12862306a36Sopenharmony_ci * used in the code and therefore does not have a codeword. 12962306a36Sopenharmony_ci */ 13062306a36Sopenharmony_ci for (len = 0; len <= max_codeword_len; len++) 13162306a36Sopenharmony_ci len_counts[len] = 0; 13262306a36Sopenharmony_ci for (sym = 0; sym < num_syms; sym++) 13362306a36Sopenharmony_ci len_counts[lens[sym]]++; 13462306a36Sopenharmony_ci 13562306a36Sopenharmony_ci /* We can assume all lengths are <= max_codeword_len, but we 13662306a36Sopenharmony_ci * cannot assume they form a valid prefix code. A codeword of 13762306a36Sopenharmony_ci * length n should require a proportion of the codespace equaling 13862306a36Sopenharmony_ci * (1/2)^n. The code is valid if and only if the codespace is 13962306a36Sopenharmony_ci * exactly filled by the lengths, by this measure. 14062306a36Sopenharmony_ci */ 14162306a36Sopenharmony_ci left = 1; 14262306a36Sopenharmony_ci for (len = 1; len <= max_codeword_len; len++) { 14362306a36Sopenharmony_ci left <<= 1; 14462306a36Sopenharmony_ci left -= len_counts[len]; 14562306a36Sopenharmony_ci if (left < 0) { 14662306a36Sopenharmony_ci /* The lengths overflow the codespace; that is, the code 14762306a36Sopenharmony_ci * is over-subscribed. 14862306a36Sopenharmony_ci */ 14962306a36Sopenharmony_ci return -1; 15062306a36Sopenharmony_ci } 15162306a36Sopenharmony_ci } 15262306a36Sopenharmony_ci 15362306a36Sopenharmony_ci if (left) { 15462306a36Sopenharmony_ci /* The lengths do not fill the codespace; that is, they form an 15562306a36Sopenharmony_ci * incomplete set. 15662306a36Sopenharmony_ci */ 15762306a36Sopenharmony_ci if (left == (1 << max_codeword_len)) { 15862306a36Sopenharmony_ci /* The code is completely empty. This is arguably 15962306a36Sopenharmony_ci * invalid, but in fact it is valid in LZX and XPRESS, 16062306a36Sopenharmony_ci * so we must allow it. By definition, no symbols can 16162306a36Sopenharmony_ci * be decoded with an empty code. Consequently, we 16262306a36Sopenharmony_ci * technically don't even need to fill in the decode 16362306a36Sopenharmony_ci * table. However, to avoid accessing uninitialized 16462306a36Sopenharmony_ci * memory if the algorithm nevertheless attempts to 16562306a36Sopenharmony_ci * decode symbols using such a code, we zero out the 16662306a36Sopenharmony_ci * decode table. 16762306a36Sopenharmony_ci */ 16862306a36Sopenharmony_ci memset(decode_table, 0, 16962306a36Sopenharmony_ci table_num_entries * sizeof(decode_table[0])); 17062306a36Sopenharmony_ci return 0; 17162306a36Sopenharmony_ci } 17262306a36Sopenharmony_ci return -1; 17362306a36Sopenharmony_ci } 17462306a36Sopenharmony_ci 17562306a36Sopenharmony_ci /* Sort the symbols primarily by length and secondarily by symbol order. 17662306a36Sopenharmony_ci */ 17762306a36Sopenharmony_ci 17862306a36Sopenharmony_ci /* Initialize 'offsets' so that offsets[len] for 1 <= len <= 17962306a36Sopenharmony_ci * max_codeword_len is the number of codewords shorter than 'len' bits. 18062306a36Sopenharmony_ci */ 18162306a36Sopenharmony_ci offsets[1] = 0; 18262306a36Sopenharmony_ci for (len = 1; len < max_codeword_len; len++) 18362306a36Sopenharmony_ci offsets[len + 1] = offsets[len] + len_counts[len]; 18462306a36Sopenharmony_ci 18562306a36Sopenharmony_ci /* Use the 'offsets' array to sort the symbols. Note that we do not 18662306a36Sopenharmony_ci * include symbols that are not used in the code. Consequently, fewer 18762306a36Sopenharmony_ci * than 'num_syms' entries in 'sorted_syms' may be filled. 18862306a36Sopenharmony_ci */ 18962306a36Sopenharmony_ci for (sym = 0; sym < num_syms; sym++) 19062306a36Sopenharmony_ci if (lens[sym]) 19162306a36Sopenharmony_ci sorted_syms[offsets[lens[sym]]++] = sym; 19262306a36Sopenharmony_ci 19362306a36Sopenharmony_ci /* Fill entries for codewords with length <= table_bits 19462306a36Sopenharmony_ci * --- that is, those short enough for a direct mapping. 19562306a36Sopenharmony_ci * 19662306a36Sopenharmony_ci * The table will start with entries for the shortest codeword(s), which 19762306a36Sopenharmony_ci * have the most entries. From there, the number of entries per 19862306a36Sopenharmony_ci * codeword will decrease. 19962306a36Sopenharmony_ci */ 20062306a36Sopenharmony_ci decode_table_ptr = decode_table; 20162306a36Sopenharmony_ci sym_idx = 0; 20262306a36Sopenharmony_ci codeword_len = 1; 20362306a36Sopenharmony_ci stores_per_loop = (1 << (table_bits - codeword_len)); 20462306a36Sopenharmony_ci for (; stores_per_loop != 0; codeword_len++, stores_per_loop >>= 1) { 20562306a36Sopenharmony_ci u32 end_sym_idx = sym_idx + len_counts[codeword_len]; 20662306a36Sopenharmony_ci 20762306a36Sopenharmony_ci for (; sym_idx < end_sym_idx; sym_idx++) { 20862306a36Sopenharmony_ci u16 entry; 20962306a36Sopenharmony_ci u16 *p; 21062306a36Sopenharmony_ci u32 n; 21162306a36Sopenharmony_ci 21262306a36Sopenharmony_ci entry = ((u32)codeword_len << 11) | sorted_syms[sym_idx]; 21362306a36Sopenharmony_ci p = (u16 *)decode_table_ptr; 21462306a36Sopenharmony_ci n = stores_per_loop; 21562306a36Sopenharmony_ci 21662306a36Sopenharmony_ci do { 21762306a36Sopenharmony_ci *p++ = entry; 21862306a36Sopenharmony_ci } while (--n); 21962306a36Sopenharmony_ci 22062306a36Sopenharmony_ci decode_table_ptr = p; 22162306a36Sopenharmony_ci } 22262306a36Sopenharmony_ci } 22362306a36Sopenharmony_ci 22462306a36Sopenharmony_ci /* If we've filled in the entire table, we are done. Otherwise, 22562306a36Sopenharmony_ci * there are codewords longer than table_bits for which we must 22662306a36Sopenharmony_ci * generate binary trees. 22762306a36Sopenharmony_ci */ 22862306a36Sopenharmony_ci decode_table_pos = (u16 *)decode_table_ptr - decode_table; 22962306a36Sopenharmony_ci if (decode_table_pos != table_num_entries) { 23062306a36Sopenharmony_ci u32 j; 23162306a36Sopenharmony_ci u32 next_free_tree_slot; 23262306a36Sopenharmony_ci u32 cur_codeword; 23362306a36Sopenharmony_ci 23462306a36Sopenharmony_ci /* First, zero out the remaining entries. This is 23562306a36Sopenharmony_ci * necessary so that these entries appear as 23662306a36Sopenharmony_ci * "unallocated" in the next part. Each of these entries 23762306a36Sopenharmony_ci * will eventually be filled with the representation of 23862306a36Sopenharmony_ci * the root node of a binary tree. 23962306a36Sopenharmony_ci */ 24062306a36Sopenharmony_ci j = decode_table_pos; 24162306a36Sopenharmony_ci do { 24262306a36Sopenharmony_ci decode_table[j] = 0; 24362306a36Sopenharmony_ci } while (++j != table_num_entries); 24462306a36Sopenharmony_ci 24562306a36Sopenharmony_ci /* We allocate child nodes starting at the end of the 24662306a36Sopenharmony_ci * direct lookup table. Note that there should be 24762306a36Sopenharmony_ci * 2*num_syms extra entries for this purpose, although 24862306a36Sopenharmony_ci * fewer than this may actually be needed. 24962306a36Sopenharmony_ci */ 25062306a36Sopenharmony_ci next_free_tree_slot = table_num_entries; 25162306a36Sopenharmony_ci 25262306a36Sopenharmony_ci /* Iterate through each codeword with length greater than 25362306a36Sopenharmony_ci * 'table_bits', primarily in order of codeword length 25462306a36Sopenharmony_ci * and secondarily in order of symbol. 25562306a36Sopenharmony_ci */ 25662306a36Sopenharmony_ci for (cur_codeword = decode_table_pos << 1; 25762306a36Sopenharmony_ci codeword_len <= max_codeword_len; 25862306a36Sopenharmony_ci codeword_len++, cur_codeword <<= 1) { 25962306a36Sopenharmony_ci u32 end_sym_idx = sym_idx + len_counts[codeword_len]; 26062306a36Sopenharmony_ci 26162306a36Sopenharmony_ci for (; sym_idx < end_sym_idx; sym_idx++, cur_codeword++) { 26262306a36Sopenharmony_ci /* 'sorted_sym' is the symbol represented by the 26362306a36Sopenharmony_ci * codeword. 26462306a36Sopenharmony_ci */ 26562306a36Sopenharmony_ci u32 sorted_sym = sorted_syms[sym_idx]; 26662306a36Sopenharmony_ci u32 extra_bits = codeword_len - table_bits; 26762306a36Sopenharmony_ci u32 node_idx = cur_codeword >> extra_bits; 26862306a36Sopenharmony_ci 26962306a36Sopenharmony_ci /* Go through each bit of the current codeword 27062306a36Sopenharmony_ci * beyond the prefix of length @table_bits and 27162306a36Sopenharmony_ci * walk the appropriate binary tree, allocating 27262306a36Sopenharmony_ci * any slots that have not yet been allocated. 27362306a36Sopenharmony_ci * 27462306a36Sopenharmony_ci * Note that the 'pointer' entry to the binary 27562306a36Sopenharmony_ci * tree, which is stored in the direct lookup 27662306a36Sopenharmony_ci * portion of the table, is represented 27762306a36Sopenharmony_ci * identically to other internal (non-leaf) 27862306a36Sopenharmony_ci * nodes of the binary tree; it can be thought 27962306a36Sopenharmony_ci * of as simply the root of the tree. The 28062306a36Sopenharmony_ci * representation of these internal nodes is 28162306a36Sopenharmony_ci * simply the index of the left child combined 28262306a36Sopenharmony_ci * with the special bits 0xC000 to distinguish 28362306a36Sopenharmony_ci * the entry from direct mapping and leaf node 28462306a36Sopenharmony_ci * entries. 28562306a36Sopenharmony_ci */ 28662306a36Sopenharmony_ci do { 28762306a36Sopenharmony_ci /* At least one bit remains in the 28862306a36Sopenharmony_ci * codeword, but the current node is an 28962306a36Sopenharmony_ci * unallocated leaf. Change it to an 29062306a36Sopenharmony_ci * internal node. 29162306a36Sopenharmony_ci */ 29262306a36Sopenharmony_ci if (decode_table[node_idx] == 0) { 29362306a36Sopenharmony_ci decode_table[node_idx] = 29462306a36Sopenharmony_ci next_free_tree_slot | 0xC000; 29562306a36Sopenharmony_ci decode_table[next_free_tree_slot++] = 0; 29662306a36Sopenharmony_ci decode_table[next_free_tree_slot++] = 0; 29762306a36Sopenharmony_ci } 29862306a36Sopenharmony_ci 29962306a36Sopenharmony_ci /* Go to the left child if the next bit 30062306a36Sopenharmony_ci * in the codeword is 0; otherwise go to 30162306a36Sopenharmony_ci * the right child. 30262306a36Sopenharmony_ci */ 30362306a36Sopenharmony_ci node_idx = decode_table[node_idx] & 0x3FFF; 30462306a36Sopenharmony_ci --extra_bits; 30562306a36Sopenharmony_ci node_idx += (cur_codeword >> extra_bits) & 1; 30662306a36Sopenharmony_ci } while (extra_bits != 0); 30762306a36Sopenharmony_ci 30862306a36Sopenharmony_ci /* We've traversed the tree using the entire 30962306a36Sopenharmony_ci * codeword, and we're now at the entry where 31062306a36Sopenharmony_ci * the actual symbol will be stored. This is 31162306a36Sopenharmony_ci * distinguished from internal nodes by not 31262306a36Sopenharmony_ci * having its high two bits set. 31362306a36Sopenharmony_ci */ 31462306a36Sopenharmony_ci decode_table[node_idx] = sorted_sym; 31562306a36Sopenharmony_ci } 31662306a36Sopenharmony_ci } 31762306a36Sopenharmony_ci } 31862306a36Sopenharmony_ci return 0; 31962306a36Sopenharmony_ci} 320