# third_party_lzma

## Introduction

LZMA is an improved version of the well-known LZ77 compression algorithm. It maximizes the compression ratio while keeping decompression fast and the memory needed for decompression low.

LZMA2 is based on LZMA. It provides better multithreading support during compression, along with other improvements.

7z is a data compression and file archiving format, and the main file format of the 7-Zip software ([**7z website**](https://www.7-zip.org)).
The 7z format supports several compression methods (LZMA, LZMA2 and others) as well as symmetric encryption based on AES-256.

XZ is a file format that uses LZMA2 data compression. It adds extra features: SHA/CRC integrity checks, filters that improve the compression ratio, and splitting into blocks and streams.

## Software Architecture

The SDK ships implementations in several languages. The table shows which formats and algorithms each one supports.

| format/algorithm | C | C++ | C# | Java |
| :------ | :---------| :----- | :----- | :----- |
| LZMA compression and decompression | ✓ | ✓ | ✓ | ✓ |
| LZMA2 compression and decompression | ✓ | ✓ | | |
| XZ compression and decompression | ✓ | ✓ | | |
| 7Z decompression | ✓ | ✓ | | |
| 7Z compression | | ✓ | | |
| small SFXs for installers (7z decompression) | ✓ | | | |
| SFXs and SFXs for installers (7z decompression) | | ✓ | | |

---

```bash
/third_party/lzma
├── Asm                           # asm files (optimized code for CRC calculation and Intel-AES encryption)
│   ├── arm
│   ├── arm64
│   └── x86
├── C                             # C files (compression / decompression and other)
│   └── Util
│       ├── 7z                    # 7z decoder program (decoding 7z files)
│       ├── Lzma                  # LZMA program (file->file LZMA encoder/decoder)
│       ├── LzmaLib               # LZMA library (.DLL for Windows)
│       └── SfxSetup              # small SFX module for installers
├── CPP
│   ├── Common                    # common files for C++ projects
│   ├── Windows                   # common files for Windows related code
│   └── 7zip                      # files related to 7-Zip
│       ├── Archive               # files related to archiving
│       │   ├── Common            # common files for archive handling
│       │   └── 7z                # 7z C++ Encoder/Decoder
│       ├── Bundles               # Modules that are bundles of other modules (files)
│       │   ├── Alone7z           # 7zr.exe: Standalone 7-Zip console program (reduced version)
│       │   ├── Format7zExtractR  # 7zxr.dll: Reduced version of 7z DLL: extracting from 7z/LZMA/BCJ/BCJ2.
│       │   ├── Format7zR         # 7zr.dll: Reduced version of 7z DLL: extracting/compressing to 7z/LZMA/BCJ/BCJ2
│       │   ├── LzmaCon           # lzma.exe: LZMA compression/decompression
│       │   ├── LzmaSpec          # example code for LZMA Specification
│       │   ├── SFXCon            # 7zCon.sfx: Console 7z SFX module
│       │   ├── SFXSetup          # 7zS.sfx: 7z SFX module for installers
│       │   └── SFXWin            # 7z.sfx: GUI 7z SFX module
│       ├── Common                # common files for 7-Zip
│       ├── Compress              # files for compression/decompression
│       ├── Crypto                # files for encryption / decryption
│       └── UI                    # User Interface files
│           ├── Client7z          # Test application for 7za.dll, 7zr.dll, 7zxr.dll
│           ├── Common            # Common UI files
│           ├── Console           # Code for console program (7z.exe)
│           ├── Explorer          # Some code from 7-Zip Shell extension
│           ├── FileManager       # Some GUI code from 7-Zip File Manager
│           └── GUI               # Some GUI code from 7-Zip
├── CS
│   └── 7zip
│       ├── Common                # some common files for 7-Zip
│       └── Compress              # files related to compression/decompression
│           ├── LZ                # files related to LZ (Lempel-Ziv) compression algorithm
│           ├── LZMA              # LZMA compression/decompression
│           ├── LzmaAlone         # file->file LZMA compression/decompression
│           └── RangeCoder        # Range Coder (special code of compression/decompression)
├── DOC
│   ├── 7zC.txt                   # 7z ANSI-C Decoder description
│   ├── 7zFormat.txt              # 7z Format description
│   ├── installer.txt             # information about 7-Zip for installers
│   ├── lzma-history.txt          # history of LZMA SDK
│   ├── lzma-sdk.txt              # LZMA SDK description
│   ├── lzma-specification.txt    # Specification of LZMA
│   ├── lzma.txt                  # LZMA compression description
│   └── Methods.txt               # Compression method IDs for .7z
└── Java
    └── SevenZip
        └── Compression           # files related to compression/decompression
            ├── LZ                # files related to LZ (Lempel-Ziv) compression algorithm
            ├── LZMA              # LZMA compression/decompression
            └── RangeCoder        # Range Coder (special code of compression/decompression)
```

## License

LZMA SDK is written and placed in the
public domain by Igor Pavlov.

Some code in LZMA SDK is based on public domain code from other developers:

  1) PPMd var.H (2001): Dmitry Shkarin

  2) SHA-256: Wei Dai (Crypto++ library)

Anyone is free to copy, modify, publish, use, compile, sell, or distribute the
original LZMA SDK code, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means.

LZMA SDK code is compatible with open source licenses; for example, you can include it in GNU GPL or GNU LGPL code.

## Build

### ***UNIX/Linux***

There are several options for compiling 7-Zip with gcc and clang. The 7-Zip code has two important parts: C and assembly. Building together with the assembly code produces a faster 7-Zip binary. The assembly code uses a different syntax on each platform.

#### *arm64*

The arm64 versions of gcc and clang support the arm64 assembly syntax.

#### *x86 and x86_64(AMD64)*

Asmc Macro Assembler and JWasm both support MASM syntax on Linux, but JWasm does not support some of the CPU instructions used in 7-Zip.
To build the faster 7-Zip binary, you must install Asmc Macro Assembler on Linux: [https://github.com/nidud/asmc](https://github.com/nidud/asmc)

### ***Build commands***

Each build directory contains two main files:

- makefile - builds the Windows version of 7-Zip with nmake
- makefile.gcc - builds the Linux/macOS version of 7-Zip with make

First change to a directory that contains `makefile.gcc`:

```bash
  cd CPP/7zip/Bundles/Alone7z
```

```bash
  make -j -f makefile.gcc
```

The "*.mak" files in the "CPP/7zip/" directory can also be used to build with the optimized code and with optimization options. For example:

```bash
  cd CPP/7zip/Bundles/Alone7z
  make -j -f ../../cmpl_gcc.mak
```

## **API Usage**

This section describes the LZMA encoding and decoding functions written in C.

Note: you can also read the LZMA Specification (lzma-specification.txt in the LZMA SDK).

For an example that uses LZMA encoding and decoding, see:
 ***C/Util/Lzma/LzmaUtil.c***

### ***LZMA compressed file format***

```bash
Offset Size Description
  0     1   Special LZMA properties (lc, lp, pb in encoded form)
  1     4   Dictionary size (little endian)
  5     8   Uncompressed size (little endian).
            -1 means unknown size
 13         Compressed data
```

ANSI-C (American National Standards Institute) LZMA Decoder

Note that the ANSI-C interfaces changed in LZMA SDK 4.58. If you want to use the old interfaces, you can download an earlier version of the LZMA SDK from sourceforge.net.

The ANSI-C LZMA Decoder needs the following files:

```bash
  LzmaDec.h
  LzmaDec.c
  7zTypes.h
  Precomp.h
  Compiler.h
```

Example: C/Util/Lzma/LzmaUtil.c

Memory requirements for LZMA decoding:

1. The LZMA decoding function uses no more than 200-400 bytes of stack for local variables.

2. The LZMA Decoder uses a dictionary buffer and an internal state structure.

3. The internal state structure takes state_size = (4 + (1.5 << (lc + lp))) KB. With the default settings (lc=3, lp=0), state_size = 16 KB.

### ***How to decompress***

The LZMA Decoder (ANSI-C version) supports two interfaces:

**1)** Single-call decompression: LzmaDecode

**2)** Multi-call decompression: LzmaDec_DecodeToBuf (zlib-like interface)

**You must define your own memory allocator:**

Example:

```c
void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); }
void SzFree(void *p, void *address) { p = p; free(address); }
ISzAlloc alloc = { SzAlloc, SzFree };
```

The `p = p;` statement is only there to suppress unused-parameter compiler warnings.

#### ***Single-call decompression***

1. Use case: RAM->RAM decompression
2. Files to compile: LzmaDec.h + LzmaDec.c + 7zTypes.h
3. Compile-time macros: none required
4. Memory requirements:

- Input buffer: compressed size
- Output buffer: uncompressed size
- LZMA Internal Structures: state_size (16 KB for default settings)

**Interface:**

```c
  int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,
      const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode,
      ELzmaStatus *status, ISzAlloc *alloc);
  In:
    dest     - output data
    destLen  - output data size
    src      - input data
    srcLen   - input data size
    propData - LZMA properties (5 bytes)
    propSize - size of propData buffer (5 bytes)
    finishMode - It has meaning only if the decoding reaches output limit (*destLen).
         LZMA_FINISH_ANY - Decode just destLen bytes.
         LZMA_FINISH_END - Stream must be finished after (*destLen).
                           You can use LZMA_FINISH_END, when you know that
                           current output buffer covers last bytes of stream.
    alloc    - Memory allocator.

  Out:
    destLen  - processed output size
    srcLen   - processed input size

  Output:
    SZ_OK
      status:
        LZMA_STATUS_FINISHED_WITH_MARK
        LZMA_STATUS_NOT_FINISHED
        LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK
    SZ_ERROR_DATA - Data error
    SZ_ERROR_MEM  - Memory allocation error
    SZ_ERROR_UNSUPPORTED - Unsupported properties
    SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src).
```

If the LZMA decoder sees an end marker before it reaches the output limit, it returns SZ_OK, and the output value of destLen is smaller than the output buffer limit.

After full decompression you can run several checks on data integrity:

 1. Check the return value and the status variable.
 2. If you know the uncompressed size, check that output(destLen) = uncompressedSize.
 3. If you know the compressed size, check that output(srcLen) = compressedSize.

#### ***State-based multi-call decompression (zlib-like interface)***

1. Use case: file->file decompression
2. Files to compile: LzmaDec.h + LzmaDec.c + 7zTypes.h
3. Memory requirements:

- Buffer for input stream: any size (for example, 16 KB)
- Buffer for output stream: any size (for example, 16 KB)
- LZMA Internal Structures: state_size (16 KB for default settings)
- LZMA dictionary (the dictionary size is encoded in the LZMA properties header)

Procedure:

**1)** Read the LZMA properties (5 bytes) and the uncompressed size (8 bytes, little endian) into header:

```c
  unsigned char header[LZMA_PROPS_SIZE + 8];
  ReadFile(inFile, header, sizeof(header));
```

**2)** Allocate CLzmaDec (state + dictionary) using the LZMA properties

```c
  CLzmaDec state;
  LzmaDec_Constr(&state);
  res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc);
  if (res != SZ_OK)
    return res;
```

**3)** Initialize LzmaDec, then call LzmaDec_DecodeToBuf in a loop

```c
  LzmaDec_Init(&state);
  for (;;)
  {
    ...
    /* prototype: SRes LzmaDec_DecodeToBuf(CLzmaDec *p, Byte *dest, SizeT *destLen,
           const Byte *src, SizeT *srcLen, ELzmaFinishMode finishMode); */
    int res = LzmaDec_DecodeToBuf(&state, dest, &destLen,
        src, &srcLen, finishMode);
    ...
  }
```

**4)** Free all allocated structures

```c
  LzmaDec_Free(&state, &g_Alloc);
```

See the example code:
 C/Util/Lzma/LzmaUtil.c

### ***How to compress data***

1. Files to compile:

```bash
  7zTypes.h
  Threads.h
  LzmaEnc.h
  LzmaEnc.c
  LzFind.h
  LzFind.c
  LzFindMt.h
  LzFindMt.c
  LzHash.h
```

2. Memory requirements:

- (dictSize * 11.5 + 6 MB) + state_size

The LZMA Encoder can use two memory allocators:

- alloc - for small arrays.
- allocBig - for big arrays.

For example, you can use large RAM pages (2 MB) in the allocBig allocator to get faster compression. Note that Windows has a poor implementation of large RAM pages. alloc and allocBig can also be the same allocator.

#### ***Single-call compression with callbacks***

See the example code:
 C/Util/Lzma/LzmaUtil.c

Use case: file->file compression

**1)** You must implement the callback functions for these interfaces

```c
/* Interfaces to implement:
   ISeqInStream, ISeqOutStream, ICompressProgress, ISzAlloc */

static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); }
static void SzFree(void *p, void *address) { p = p; MyFree(address); }
static ISzAlloc g_Alloc = { SzAlloc, SzFree };

  CFileSeqInStream inStream;
  CFileSeqOutStream outStream;

  inStream.funcTable.Read = MyRead;
  inStream.file = inFile;
  outStream.funcTable.Write = MyWrite;
  outStream.file = outFile;
```

**2)** Create the CLzmaEncHandle object

```c
  CLzmaEncHandle enc;

  enc = LzmaEnc_Create(&g_Alloc);
  if (enc == 0)
    return SZ_ERROR_MEM;
```

**3)** Initialize the CLzmaEncProps properties

```c
  CLzmaEncProps props;
  LzmaEncProps_Init(&props);
```

After that you can change some of the properties in this structure.

**4)** Pass the properties from the previous step to the LZMA Encoder

```c
  res = LzmaEnc_SetProps(enc, &props);
```

**5)** Write the encoded properties to the header

```c
  Byte header[LZMA_PROPS_SIZE + 8];
  size_t headerSize = LZMA_PROPS_SIZE;
  UInt64 fileSize;
  int i;

  res = LzmaEnc_WriteProperties(enc, header, &headerSize);
  fileSize = MyGetFileLength(inFile);
  /* append the uncompressed size as 8 little-endian bytes */
  for (i = 0; i < 8; i++)
    header[headerSize++] = (Byte)(fileSize >> (8 * i));
  MyWriteFileAndCheck(outFile, header,
      headerSize);
```

**6)** Call the encoding function

```c
  res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable,
      NULL, &g_Alloc, &g_Alloc);
```

**7)** Destroy the LZMA Encoder object

```c
  LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc);
```

If a callback function returns an error code, LzmaEnc_Encode also returns that code, or it may return a code such as SZ_ERROR_READ, SZ_ERROR_WRITE or SZ_ERROR_PROGRESS.

---

#### ***Single-call RAM->RAM compression***

Single-call RAM->RAM compression is similar to compression with callbacks, but you provide pointers to buffers instead of pointers to stream callbacks.

```c
SRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,
    const CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark,
    ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig);
Return code:
  SZ_OK               - OK
  SZ_ERROR_MEM        - Memory allocation error
  SZ_ERROR_PARAM      - Incorrect parameter
  SZ_ERROR_OUTPUT_EOF - output buffer overflow
  SZ_ERROR_THREAD     - errors in multithreading functions (only for Mt version)
```

Macros

```c
_LZMA_SIZE_OPT - Enable some optimizations in LZMA Decoder to get smaller executable code.
_LZMA_PROB32 - It can increase the speed on some 32-bit CPUs, but memory usage for
               some structures will be doubled in that case.
_LZMA_UINT32_IS_ULONG - Define it if int is 16-bit on your compiler and long is 32-bit.
_LZMA_NO_SYSTEM_SIZE_T - Define it if you don't want to use size_t type.
_7ZIP_PPMD_SUPPPORT - Define it if you want to support PPMD method in ANSI-C .7z decoder.
```

C++ LZMA Encoder/Decoder

The C++ LZMA code uses COM-like interfaces. If you want to use it, it helps to know the basics of COM (Component Object Model) / OLE (Object Linking and Embedding) / DDE (Dynamic Data Exchange).

The C++ LZMA code is merely a wrapper around the ANSI-C code.
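The 13-byte header described earlier (one properties byte, a 4-byte dictionary size and an 8-byte uncompressed size, both little endian) can be checked without the SDK. The following is a minimal sketch in plain C, not SDK code: `LzmaHeaderInfo`, `write_size_le`, `parse_lzma_header` and `lzma_state_size_bytes` are hypothetical names introduced here for illustration. The properties-byte encoding `(pb * 5 + lp) * 9 + lc` is an assumption taken from the LZMA specification rather than from the text above, and the state-size helper simply evaluates the (4 + (1.5 << (lc + lp))) KB formula in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LZMA_PROPS_SIZE 5                       /* 5 property bytes, as stated above */
#define LZMA_HEADER_SIZE (LZMA_PROPS_SIZE + 8)  /* plus 8-byte uncompressed size */

/* Hypothetical helper type for this sketch; not part of the LZMA SDK. */
typedef struct {
    unsigned lc, lp, pb;   /* decoded from header[0] */
    uint32_t dict_size;    /* header[1..4], little endian */
    uint64_t unpack_size;  /* header[5..12], little endian; all bits set = unknown */
} LzmaHeaderInfo;

/* Store a 64-bit size as 8 little-endian bytes (same loop as in step 5). */
static void write_size_le(uint8_t *dst, uint64_t size)
{
    int i;
    for (i = 0; i < 8; i++)
        dst[i] = (uint8_t)(size >> (8 * i));
}

/* Parse a 13-byte header. Returns 0 on success, -1 if the properties
   byte is out of range (assumed encoding: (pb * 5 + lp) * 9 + lc). */
static int parse_lzma_header(const uint8_t *h, LzmaHeaderInfo *out)
{
    unsigned d;
    int i;

    assert(h != NULL && out != NULL);
    d = h[0];
    if (d >= 9 * 5 * 5)
        return -1;
    out->lc = d % 9;
    d /= 9;
    out->lp = d % 5;
    out->pb = d / 5;

    out->dict_size = 0;
    for (i = 0; i < 4; i++)
        out->dict_size |= (uint32_t)h[1 + i] << (8 * i);

    out->unpack_size = 0;
    for (i = 0; i < 8; i++)
        out->unpack_size |= (uint64_t)h[5 + i] << (8 * i);
    return 0;
}

/* state_size in bytes: (4 + (1.5 << (lc + lp))) KB = 4096 + (1536 << (lc + lp)). */
static size_t lzma_state_size_bytes(unsigned lc, unsigned lp)
{
    return (size_t)4096 + ((size_t)1536 << (lc + lp));
}
```

For example, a header whose first byte is 0x5D decodes to lc=3, lp=0, pb=2, and `lzma_state_size_bytes(3, 0)` gives 16384 bytes, which matches the 16 KB figure quoted for the default settings.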

Note:
If you use the C++ code from the 7zip directory, you must make sure that operator new is used correctly.
When 7-Zip is compiled with MSVC 6.0, operator new does not throw an exception on failure, so 7-Zip redefines operator new in CPP\Common\NewHandler.cpp:

```cpp
void * operator new(size_t size)
{
  void *p = ::malloc(size);
  if (p == 0)
    throw CNewException();  // report allocation failure as an exception
  return p;
}
```

If your MSVC version supports throwing exceptions from operator new, you can leave "NewHandler.cpp" out when compiling 7-Zip, so that the standard exception is used.
Some 7-Zip code catches any exception and converts it to an HRESULT code, so you do not need to catch CNewException if you call 7-Zip's COM interfaces.

### ***API example:***

See the example code: C/Util/Lzma/LzmaUtil.c

```bash
  cd C/Util/Lzma
  make -j -f makefile.gcc
  # output: ./_o/7lzma
```

```bash
  LZMA-C 22.01 (x64) : Igor Pavlov : Public domain : 2022-07-15

  Usage: lzma <e|d> inputFile outputFile
    e: encode file
    d: decode file
```

## Contributing

[https://sourceforge.net/p/sevenzip/_list/tickets](https://sourceforge.net/p/sevenzip/_list/tickets)

## Related Repositories

[**developtools\hiperf**](https://gitee.com/openharmony/developtools_hiperf)