/* stb_image - v2.28 - public domain image loader - http://nothings.org/stb
                                  no warranty implied; use at your own risk

   Do this:
      #define STB_IMAGE_IMPLEMENTATION
   before you include this file in *one* C or C++ file to create the implementation.

   // i.e. it should look like this:
   #include ...
   #include ...
   #include ...
   #define STB_IMAGE_IMPLEMENTATION
   #include "stb_image.h"

   You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
   And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc, realloc, and free.


   QUICK NOTES:
      Primarily of interest to game developers and other people who can
          avoid problematic images and only need the trivial interface

      JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
      PNG 1/2/4/8/16-bit-per-channel

      TGA (not sure what subset, if a subset)
      BMP non-1bpp, non-RLE
      PSD (composited view only, no extra channels, 8/16 bit-per-channel)

      GIF (*comp always reports as 4-channel)
      HDR (radiance rgbE format)
      PIC (Softimage PIC)
      PNM (PPM and PGM binary only)

      Animated GIF still needs a proper API, but here's one way to do it:
          http://gist.github.com/urraka/685d9a6340b26b830d49

      - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
      - decode from arbitrary I/O callbacks
      - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)

   Full documentation under "DOCUMENTATION" below.


LICENSE

  See end of file for license information.

RECENT REVISION HISTORY:

      2.28  (2023-01-29) many error fixes, security errors, just tons of stuff
      2.27  (2021-07-11) document stbi_info better, 16-bit PNM support, bug fixes
      2.26  (2020-07-13) many minor fixes
      2.25  (2020-02-02) fix warnings
      2.24  (2020-02-02) fix warnings; thread-local failure_reason and flip_vertically
      2.23  (2019-08-11) fix clang static analysis warning
      2.22  (2019-03-04) gif fixes, fix warnings
      2.21  (2019-02-25) fix typo in comment
      2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
      2.19  (2018-02-11) fix warning
      2.18  (2018-01-30) fix warnings
      2.17  (2018-01-29) bugfix, 1-bit BMP, 16-bitness query, fix warnings
      2.16  (2017-07-23) all functions have 16-bit variants; optimizations; bugfixes
      2.15  (2017-03-18) fix png-1,2,4; all Imagenet JPGs; no runtime SSE detection on GCC
      2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
      2.13  (2016-12-04) experimental 16-bit API, only for PNG so far; fixes
      2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
      2.11  (2016-04-02) 16-bit PNGS; enable SSE2 in non-gcc x64
                         RGB-format JPEG; remove white matting in PSD;
                         allocate large structures on the stack;
                         correct channel count for PNG & BMP
      2.10  (2016-01-22) avoid warning introduced in 2.09
      2.09  (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED

   See end of file for full revision history.


 ============================    Contributors    =========================

 Image formats                          Extensions, features
    Sean Barrett (jpeg, png, bmp)          Jetro Lauha (stbi_info)
    Nicolas Schulz (hdr, psd)              Martin "SpartanJ" Golini (stbi_info)
    Jonathan Dummer (tga)                  James "moose2000" Brown (iPhone PNG)
    Jean-Marc Lienher (gif)                Ben "Disch" Wenger (io callbacks)
    Tom Seddon (pic)                       Omar Cornut (1/2/4-bit PNG)
    Thatcher Ulrich (psd)                  Nicolas Guillemot (vertical flip)
    Ken Miller (pgm, ppm)                  Richard Mitton (16-bit PSD)
    github:urraka (animated gif)           Junggon Kim (PNM comments)
    Christopher Forseth (animated gif)     Daniel Gibson (16-bit TGA)
                                           socks-the-fox (16-bit PNG)
                                           Jeremy Sawicki (handle all ImageNet JPGs)
 Optimizations & bugfixes                  Mikhail Morozov (1-bit BMP)
    Fabian "ryg" Giesen                    Anael Seghezzi (is-16-bit query)
    Arseny Kapoulkine                      Simon Breuss (16-bit PNM)
    John-Mark Allen
    Carmelo J Fdez-Aguera

 Bug & warning fixes
    Marc LeBlanc            David Woo          Guillaume George     Martins Mozeiko
    Christpher Lloyd        Jerry Jansson      Joseph Thomson       Blazej Dariusz Roszkowski
    Phil Jordan                                Dave Moore           Roy Eltham
    Hayaki Saito            Nathan Reed        Won Chun
    Luke Graham             Johan Duparc       Nick Verigakis       the Horde3D community
    Thomas Ruf              Ronny Chevalier                         github:rlyeh
    Janez Zemva             John Bartholomew   Michal Cichon        github:romigrou
    Jonathan Blow           Ken Hamada         Tero Hanninen        github:svdijk
    Eugene Golushkov        Laurent Gomila     Cort Stratton        github:snagar
    Aruelien Pocheville     Sergio Gonzalez    Thibault Reuille     github:Zelex
    Cass Everitt            Ryamond Barbiero                        github:grim210
    Paul Du Bois            Engin Manap        Aldo Culquicondor    github:sammyhw
    Philipp Wiesemann       Dale Weiler        Oriol Ferrer Mesia   github:phprus
    Josh Tobin              Neil Bickford      Matthew Gregan       github:poppolopoppo
    Julian Raschke          Gregory Mullen     Christian Floisand   github:darealshinji
    Baldur Karlsson         Kevin Schmidt      JR Smith             github:Michaelangel007
                            Brad Weinberger    Matvey Cherevko      github:mosra
    Luca Sas                Alexander Veselov  Zack Middleton       [reserved]
    Ryan C. Gordon          [reserved]                              [reserved]
                     DO NOT ADD YOUR NAME HERE

                     Jacko Dirks

  To add your name to the credits, pick a random blank space in the middle and fill it.
  80% of merge conflicts on stb PRs are due to people adding their name at the end
  of the credits.
*/

#ifndef STBI_INCLUDE_STB_IMAGE_H
#define STBI_INCLUDE_STB_IMAGE_H

// DOCUMENTATION
//
// Limitations:
//    - no 12-bit-per-channel JPEG
//    - no JPEGs with arithmetic coding
//    - GIF always returns *comp=4
//
// Basic usage (see HDR discussion below for HDR usage):
//    int x,y,n;
//    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
//    // ... process data if not NULL ...
//    // ... x = width, y = height, n = # 8-bit components per pixel ...
//    // ... replace '0' with '1'..'4' to force that many components per pixel
//    // ... but 'n' will always be the number that it would have been if you said 0
//    stbi_image_free(data);
//
// Standard parameters:
//    int *x                 -- outputs image width in pixels
//    int *y                 -- outputs image height in pixels
//    int *channels_in_file  -- outputs # of image components in image file
//    int desired_channels   -- if non-zero, # of image components requested in result
//
// The return value from an image loader is an 'unsigned char *' which points
// to the pixel data, or NULL on an allocation failure or if the image is
// corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
// with each pixel consisting of N interleaved 8-bit components; the first
// pixel pointed to is top-left-most in the image. There is no padding between
// image scanlines or between pixels, regardless of format. The number of
// components N is 'desired_channels' if desired_channels is non-zero, or
// *channels_in_file otherwise. If desired_channels is non-zero,
// *channels_in_file has the number of components that _would_ have been
// output otherwise. E.g. if you set desired_channels to 4, you will always
// get RGBA output, but you can check *channels_in_file to see if it's trivially
// opaque because e.g. there were only 3 channels in the source image.
//
// An output image with N components has the following components interleaved
// in this order in each pixel:
//
//     N=#comp     components
//       1           grey
//       2           grey, alpha
//       3           red, green, blue
//       4           red, green, blue, alpha
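//
// The layout rules above (top-left origin, no padding, N interleaved
// components) imply a simple index formula. The following is a minimal
// sketch of it; pixel_component is a hypothetical helper for illustration
// and is not part of stb_image.
//
#if 0 // illustrative sketch only, never compiled as part of this header
```c
#include <stddef.h>

// Hypothetical helper (not part of stb_image): returns component c of the
// pixel at column px, row py in a tightly packed, top-left-origin buffer
// with w pixels per row and n interleaved 8-bit components per pixel.
static unsigned char pixel_component(const unsigned char *data,
                                     int w, int n, int px, int py, int c)
{
   // size_t arithmetic avoids int overflow on large images
   size_t index = ((size_t) py * (size_t) w + (size_t) px) * (size_t) n + (size_t) c;
   return data[index];
}
```
#endif
//
// For a 3-component image, c = 0, 1, 2 selects red, green, blue; for a
// 4-component image, c = 3 selects alpha.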
//
// If image loading fails for any reason, the return value will be NULL,
// and *x, *y, *channels_in_file will be unchanged. The function
// stbi_failure_reason() can be queried for an extremely brief, end-user
// unfriendly explanation of why the load failed. Define STBI_NO_FAILURE_STRINGS
// to avoid compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
// more user-friendly ones.
//
// Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
//
// To query the width, height and component count of an image without having to
// decode the full file, you can use the stbi_info family of functions:
//
//   int x,y,n,ok;
//   ok = stbi_info(filename, &x, &y, &n);
//   // returns ok=1 and sets x, y, n if image is a supported format,
//   // 0 otherwise.
//
// Note that stb_image pervasively uses ints in its public API for sizes,
// including sizes of memory buffers. This is now part of the API and thus
// hard to change without causing breakage. As a result, the various image
// loaders all have certain limits on image size; these differ somewhat
// by format but generally boil down to either just under 2GB or just under
// 1GB. When the decoded image would be larger than this, stb_image decoding
// will fail.
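//
// A caller that wants to pre-screen dimensions (for example, values obtained
// from stbi_info) against an int-sized budget could use a check like the one
// below. This is only a sketch: decoded_size_fits_int is a hypothetical
// helper, it assumes INT_MAX as the ceiling, and the library's own
// per-format limits may be stricter.
//
#if 0 // illustrative sketch only, never compiled as part of this header
```c
#include <limits.h>

// Hypothetical helper (not part of stb_image): returns 1 if
// w * h * n * bytes_per_comp is positive and does not exceed INT_MAX,
// and 0 otherwise. The product is never formed if it would overflow.
static int decoded_size_fits_int(int w, int h, int n, int bytes_per_comp)
{
   int factors[4];
   int total = 1;
   int i;
   factors[0] = w;
   factors[1] = h;
   factors[2] = n;
   factors[3] = bytes_per_comp;
   for (i = 0; i < 4; ++i) {
      if (factors[i] <= 0) return 0;               // reject empty or negative sizes
      if (total > INT_MAX / factors[i]) return 0;  // product would exceed INT_MAX
      total *= factors[i];
   }
   return 1;
}
```
#endif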
//
// Additionally, stb_image will reject image files that have any of their
// dimensions set to a larger value than the configurable STBI_MAX_DIMENSIONS,
// which defaults to 2**24 = 16777216 pixels. Due to the above memory limit,
// the only way to have an image with such dimensions load correctly
// is for it to have a rather extreme aspect ratio. Either way, the
// assumption here is that such larger images are likely to be malformed
// or malicious. If you do need to load an image with individual dimensions
// larger than that, and it still fits in the overall size limit, you can
// #define STBI_MAX_DIMENSIONS on your own to be something larger.
//
// ===========================================================================
//
// UNICODE:
//
//   If compiling for Windows and you wish to use Unicode filenames, compile
//   with
//       #define STBI_WINDOWS_UTF8
//   and pass utf8-encoded filenames. Call stbi_convert_wchar_to_utf8 to convert
//   Windows wchar_t filenames to utf8.
//
// ===========================================================================
//
// Philosophy
//
// stb libraries are designed with the following priorities:
//
//    1. easy to use
//    2. easy to maintain
//    3. good performance
//
// Sometimes I let "good performance" creep up in priority over "easy to maintain",
// and for best performance I may provide less-easy-to-use APIs that give higher
// performance, in addition to the easy-to-use ones. Nevertheless, it's important
// to keep in mind that from the standpoint of you, a client of this library,
// all you care about is #1 and #3, and stb libraries DO NOT emphasize #3 above all.
//
// Some secondary priorities arise directly from the first two, some of which
// provide more explicit reasons why performance can't be emphasized.
//
//    - Portable ("ease of use")
//    - Small source code footprint ("easy to maintain")
//    - No dependencies ("ease of use")
//
// ===========================================================================
//
// I/O callbacks
//
// I/O callbacks allow you to read from arbitrary sources, like packaged
// files or some other source. Data read from callbacks are processed
// through a small internal buffer (currently 128 bytes) to try to reduce
// overhead.
//
// The three functions you must define are "read" (reads some bytes of data),
// "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
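//
// As a rough illustration of the three callbacks, the sketch below serves
// bytes from an in-memory array. To compile on its own it declares a local
// struct (demo_io_callbacks) with the same three members; real code would
// fill in stbi_io_callbacks instead. All demo_* names are hypothetical.
//
#if 0 // illustrative sketch only, never compiled as part of this header
```c
#include <string.h>

// Local mirror of the three-callback layout, so the sketch is self-contained.
typedef struct
{
   int  (*read)(void *user, char *data, int size);
   void (*skip)(void *user, int n);
   int  (*eof) (void *user);
} demo_io_callbacks;

// Hypothetical in-memory stream passed as the 'user' pointer.
typedef struct
{
   const unsigned char *bytes;
   int len;
   int pos;
} demo_mem_stream;

// Copy up to 'size' bytes into 'data'; return how many were actually copied.
static int demo_read(void *user, char *data, int size)
{
   demo_mem_stream *s = (demo_mem_stream *) user;
   int remaining = s->len - s->pos;
   if (size > remaining) size = remaining;
   if (size < 0) size = 0;
   memcpy(data, s->bytes + s->pos, (size_t) size);
   s->pos += size;
   return size;
}

// Advance by n bytes (a negative n moves backwards), clamped to the stream.
static void demo_skip(void *user, int n)
{
   demo_mem_stream *s = (demo_mem_stream *) user;
   s->pos += n;
   if (s->pos < 0) s->pos = 0;
   if (s->pos > s->len) s->pos = s->len;
}

// Nonzero once every byte has been consumed.
static int demo_eof(void *user)
{
   demo_mem_stream *s = (demo_mem_stream *) user;
   return s->pos >= s->len;
}

static const demo_io_callbacks demo_callbacks = { demo_read, demo_skip, demo_eof };
```
#endif
//
// With the real library, the equivalent callback table and a pointer to the
// stream state would be passed to stbi_load_from_callbacks.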
//
// ===========================================================================
//
// SIMD support
//
// The JPEG decoder will try to automatically use SIMD kernels on x86 when
// supported by the compiler. For ARM Neon support, you must explicitly
// request it.
//
// (The old do-it-yourself SIMD API is no longer supported in the current
// code.)
//
// On x86, SSE2 will automatically be used when available based on a run-time
// test; if not, the generic C versions are used as a fall-back. On ARM targets,
// the typical path is to have separate builds for NEON and non-NEON devices
// (at least this is true for iOS and Android). Therefore, the NEON support is
// toggled by a build flag: define STBI_NEON to get NEON loops.
//
// If for some reason you do not want to use any of the SIMD code, or if
// you have issues compiling it, you can disable it entirely by
// defining STBI_NO_SIMD.
//
// ===========================================================================
//
// HDR image support   (disable by defining STBI_NO_HDR)
//
// stb_image supports loading HDR images in general, and currently the Radiance
// .HDR file format specifically. You can still load any file through the existing
// interface; if you attempt to load an HDR file, it will be automatically remapped
// to LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
// both of these constants can be reconfigured through this interface:
//
//     stbi_hdr_to_ldr_gamma(2.2f);
//     stbi_hdr_to_ldr_scale(1.0f);
//
// (note, do not use _inverse_ constants; stb_image will invert them
// appropriately).
//
// Additionally, there is a new, parallel interface for loading files as
// (linear) floats to preserve the full dynamic range:
//
//    float *data = stbi_loadf(filename, &x, &y, &n, 0);
//
// If you load LDR images through this interface, those images will
// be promoted to floating point values, run through the inverse of
// constants corresponding to the above:
//
//     stbi_ldr_to_hdr_scale(1.0f);
//     stbi_ldr_to_hdr_gamma(2.2f);
//
// Finally, given a filename (or an open file or memory block--see header
// file for details) containing image data, you can query for the "most
// appropriate" interface to use (that is, whether the image is HDR or
// not), using:
//
//     stbi_is_hdr(char const *filename);
//
// ===========================================================================
//
// iPhone PNG support:
//
// We optionally support converting iPhone-formatted PNGs (which store
// premultiplied BGRA) back to RGB, even though they're internally encoded
// differently. To enable this conversion, call
// stbi_convert_iphone_png_to_rgb(1).
//
// Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
// pixel to remove any premultiplied alpha *only* if the image file explicitly
// says there's premultiplied data (currently only happens in iPhone images,
// and only if iPhone convert-to-rgb processing is on).
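//
// The "divide per pixel" mentioned above can be pictured with the sketch
// below, which converts one premultiplied BGRA pixel to straight RGBA with
// round-to-nearest division. bgra_premul_to_rgba is a hypothetical helper
// written for illustration; the decoder's actual code path may differ.
//
#if 0 // illustrative sketch only, never compiled as part of this header
```c
// Hypothetical helper (not part of stb_image): converts one premultiplied
// BGRA pixel to non-premultiplied RGBA in place. If a color value exceeds
// alpha (invalid premultiplied data), the result wraps when narrowed to
// 8 bits, so such input is not handled meaningfully.
static void bgra_premul_to_rgba(unsigned char *p)
{
   unsigned char a = p[3];
   unsigned char b = p[0];
   if (a != 0) {
      unsigned int half = a / 2u;                          // rounding term
      p[0] = (unsigned char) ((p[2] * 255u + half) / a);   // red
      p[1] = (unsigned char) ((p[1] * 255u + half) / a);   // green
      p[2] = (unsigned char) ((b    * 255u + half) / a);   // blue
   } else {
      p[0] = p[2];                                         // alpha 0: swap only
      p[2] = b;
   }
}
```
#endif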
//
// ===========================================================================
//
// ADDITIONAL CONFIGURATION
//
//  - You can suppress implementation of any of the decoders to reduce
//    your code footprint by #defining one or more of the following
//    symbols before creating the implementation.
//
//        STBI_NO_JPEG
//        STBI_NO_PNG
//        STBI_NO_BMP
//        STBI_NO_PSD
//        STBI_NO_TGA
//        STBI_NO_GIF
//        STBI_NO_HDR
//        STBI_NO_PIC
//        STBI_NO_PNM   (.ppm and .pgm)
//
//  - You can request *only* certain decoders and suppress all other ones
//    (this will be more forward-compatible, as addition of new decoders
//    doesn't require you to disable them explicitly):
//
//        STBI_ONLY_JPEG
//        STBI_ONLY_PNG
//        STBI_ONLY_BMP
//        STBI_ONLY_PSD
//        STBI_ONLY_TGA
//        STBI_ONLY_GIF
//        STBI_ONLY_HDR
//        STBI_ONLY_PIC
//        STBI_ONLY_PNM   (.ppm and .pgm)
//
//  - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
//    want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
//
//  - If you define STBI_MAX_DIMENSIONS, stb_image will reject images greater
//    than that size (in either width or height) without further processing.
//    This is to let programs in the wild set an upper bound to prevent
//    denial-of-service attacks on untrusted data, as one could generate a
//    valid image of gigantic dimensions and force stb_image to allocate a
//    huge block of memory and spend disproportionate time decoding it. By
//    default this is set to (1 << 24), which is 16777216, but that's still
//    very big.

#ifndef STBI_NO_STDIO
#include <stdio.h>
#endif // STBI_NO_STDIO

#define STBI_VERSION 1

enum
{
   STBI_default = 0, // only used for desired_channels

   STBI_grey       = 1,
   STBI_grey_alpha = 2,
   STBI_rgb        = 3,
   STBI_rgb_alpha  = 4
};

#include <stdlib.h>
typedef unsigned char stbi_uc;
typedef unsigned short stbi_us;

#ifdef __cplusplus
extern "C" {
#endif

#ifndef STBIDEF
#ifdef STB_IMAGE_STATIC
#define STBIDEF static
#else
#define STBIDEF extern
#endif
#endif

//////////////////////////////////////////////////////////////////////////////
//
// PRIMARY API - works on images of any type
//

//
// load image by filename, open file, or memory buffer
//

typedef struct
{
   int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
   void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
   int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
} stbi_io_callbacks;

////////////////////////////////////
//
// 8-bits-per-channel interface
//

STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *channels_in_file, int desired_channels);
STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *channels_in_file, int desired_channels);

#ifndef STBI_NO_STDIO
STBIDEF stbi_uc *stbi_load            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
STBIDEF stbi_uc *stbi_load_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
// for stbi_load_from_file, file pointer is left pointing immediately after image
#endif

#ifndef STBI_NO_GIF
STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
#endif

#ifdef STBI_WINDOWS_UTF8
STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input);
#endif

////////////////////////////////////
//
// 16-bits-per-channel interface
//

STBIDEF stbi_us *stbi_load_16_from_memory   (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels);

#ifndef STBI_NO_STDIO
STBIDEF stbi_us *stbi_load_16          (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
STBIDEF stbi_us *stbi_load_from_file_16(FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
#endif

////////////////////////////////////
//
// float-per-channel interface
//
#ifndef STBI_NO_LINEAR
   STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
   STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y,  int *channels_in_file, int desired_channels);

   #ifndef STBI_NO_STDIO
   STBIDEF float *stbi_loadf            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
   STBIDEF float *stbi_loadf_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
   #endif
#endif

#ifndef STBI_NO_HDR
   STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
   STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
#endif // STBI_NO_HDR

#ifndef STBI_NO_LINEAR
   STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
   STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
#endif // STBI_NO_LINEAR

// stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
#ifndef STBI_NO_STDIO
STBIDEF int      stbi_is_hdr          (char const *filename);
STBIDEF int      stbi_is_hdr_from_file(FILE *f);
#endif // STBI_NO_STDIO


// get a VERY brief reason for failure
// on most compilers (and ALL modern mainstream compilers) this is threadsafe
STBIDEF const char *stbi_failure_reason  (void);

// free the loaded image -- this is just free()
STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);

// get image dimensions & components without fully decoding
STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
STBIDEF int      stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len);
STBIDEF int      stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *clbk, void *user);

#ifndef STBI_NO_STDIO
STBIDEF int      stbi_info               (char const *filename,     int *x, int *y, int *comp);
STBIDEF int      stbi_info_from_file     (FILE *f,                  int *x, int *y, int *comp);
STBIDEF int      stbi_is_16_bit          (char const *filename);
STBIDEF int      stbi_is_16_bit_from_file(FILE *f);
#endif



// for image formats that explicitly notate that they have premultiplied alpha,
// we just return the colors as stored in the file. set this flag to force
// unpremultiplication. results are undefined if the unpremultiply overflows.
STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);

// indicate whether we should process iphone images back to canonical format,
// or just pass them through "as-is"
STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);

// flip the image vertically, so the first pixel in the output array is the bottom left
STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);

// as above, but only applies to images loaded on the thread that calls the function
// this function is only available if your compiler supports thread-local variables;
// calling it will fail to link if your compiler doesn't
STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply);
STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert);
STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip);
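
// For readers wondering what a vertical flip of a tightly packed image
// involves, here is a minimal sketch that swaps rows in place through a small
// temporary buffer. flip_rows_in_place is a hypothetical helper, not the
// library's own routine, and the 256-byte chunk size is an arbitrary choice.
#if 0 // illustrative sketch only, never compiled as part of this header
```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not part of stb_image): reverses the row order of a
// tightly packed image with w pixels per row, h rows, and bytes_per_pixel
// bytes per pixel, so the former bottom row becomes the first row.
static void flip_rows_in_place(unsigned char *data, int w, int h, int bytes_per_pixel)
{
   size_t row_bytes = (size_t) w * (size_t) bytes_per_pixel;
   unsigned char temp[256];
   int row;
   for (row = 0; row < h / 2; ++row) {
      unsigned char *top    = data + (size_t) row * row_bytes;
      unsigned char *bottom = data + (size_t) (h - 1 - row) * row_bytes;
      size_t left = row_bytes;
      // swap the two rows chunk by chunk; the middle row of an odd-height
      // image is never visited and stays where it is
      while (left > 0) {
         size_t chunk = left < sizeof(temp) ? left : sizeof(temp);
         memcpy(temp, top, chunk);
         memcpy(top, bottom, chunk);
         memcpy(bottom, temp, chunk);
         top += chunk;
         bottom += chunk;
         left -= chunk;
      }
   }
}
```
#endif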

// ZLIB client - used by PNG, available for other purposes

STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);

STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);


#ifdef __cplusplus
}
#endif

//
//
////   end header file   /////////////////////////////////////////////////////
#endif // STBI_INCLUDE_STB_IMAGE_H

#ifdef STB_IMAGE_IMPLEMENTATION

#if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
  || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
  || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
  || defined(STBI_ONLY_ZLIB)
   #ifndef STBI_ONLY_JPEG
   #define STBI_NO_JPEG
   #endif
   #ifndef STBI_ONLY_PNG
   #define STBI_NO_PNG
   #endif
   #ifndef STBI_ONLY_BMP
   #define STBI_NO_BMP
   #endif
   #ifndef STBI_ONLY_PSD
   #define STBI_NO_PSD
   #endif
   #ifndef STBI_ONLY_TGA
   #define STBI_NO_TGA
   #endif
   #ifndef STBI_ONLY_GIF
   #define STBI_NO_GIF
   #endif
   #ifndef STBI_ONLY_HDR
   #define STBI_NO_HDR
   #endif
   #ifndef STBI_ONLY_PIC
   #define STBI_NO_PIC
   #endif
   #ifndef STBI_ONLY_PNM
   #define STBI_NO_PNM
   #endif
#endif
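
// The block above turns any STBI_ONLY_* request into STBI_NO_* for every
// decoder that was not requested. The same preprocessor pattern, reduced to
// two made-up features, might look like the sketch below; every DEMO_* name
// is hypothetical and exists only for this illustration.
#if 0 // illustrative sketch only, never compiled as part of this header
```c
// The client asks for "PNG" and nothing else.
#define DEMO_ONLY_PNG

// If any ONLY flag is set, disable every feature that was not requested.
#if defined(DEMO_ONLY_PNG) || defined(DEMO_ONLY_BMP)
   #ifndef DEMO_ONLY_PNG
   #define DEMO_NO_PNG
   #endif
   #ifndef DEMO_ONLY_BMP
   #define DEMO_NO_BMP
   #endif
#endif

// Report which features survived as a bitmask: bit 0 = PNG, bit 1 = BMP.
static int demo_enabled_decoders(void)
{
   int mask = 0;
#ifndef DEMO_NO_PNG
   mask |= 1;
#endif
#ifndef DEMO_NO_BMP
   mask |= 2;
#endif
   return mask;
}
```
#endif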

#if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
#define STBI_NO_ZLIB
#endif


#include <stdarg.h>
#include <stddef.h> // ptrdiff_t on osx
#include <stdlib.h>
#include <string.h>
#include <limits.h>

#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
#include <math.h>  // ldexp, pow
#endif

#ifndef STBI_NO_STDIO
#include <stdio.h>
#endif

#ifndef STBI_ASSERT
#include <assert.h>
#define STBI_ASSERT(x) assert(x)
#endif

#ifdef __cplusplus
#define STBI_EXTERN extern "C"
#else
#define STBI_EXTERN extern
#endif


#ifndef _MSC_VER
   #ifdef __cplusplus
   #define stbi_inline inline
   #else
   #define stbi_inline
   #endif
#else
   #define stbi_inline __forceinline
#endif

#ifndef STBI_NO_THREAD_LOCALS
   #if defined(__cplusplus) &&  __cplusplus >= 201103L
      #define STBI_THREAD_LOCAL       thread_local
   #elif defined(__GNUC__) && __GNUC__ < 5
      #define STBI_THREAD_LOCAL       __thread
   #elif defined(_MSC_VER)
      #define STBI_THREAD_LOCAL       __declspec(thread)
   #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_THREADS__)
      #define STBI_THREAD_LOCAL       _Thread_local
   #endif

   #ifndef STBI_THREAD_LOCAL
      #if defined(__GNUC__)
        #define STBI_THREAD_LOCAL       __thread
      #endif
   #endif
#endif

#if defined(_MSC_VER) || defined(__SYMBIAN32__)
typedef unsigned short stbi__uint16;
typedef   signed short stbi__int16;
typedef unsigned int   stbi__uint32;
typedef   signed int   stbi__int32;
#else
#include <stdint.h>
typedef uint16_t stbi__uint16;
typedef int16_t  stbi__int16;
typedef uint32_t stbi__uint32;
typedef int32_t  stbi__int32;
#endif

// should produce compiler error if size is wrong
typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
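
// The validate_uint32 typedef above relies on the fact that an array type
// of size -1 is ill-formed, so a false condition stops compilation. A sketch
// of the same trick wrapped in a reusable macro follows; DEMO_STATIC_ASSERT
// and the demo_* names are hypothetical and not part of stb_image.
#if 0 // illustrative sketch only, never compiled as part of this header
```c
// Declares an array typedef of size 1 when cond holds and size -1 (a
// compile error) when it does not.
#define DEMO_STATIC_ASSERT(name, cond) typedef unsigned char name[(cond) ? 1 : -1]

DEMO_STATIC_ASSERT(demo_char_is_one_byte, sizeof(char) == 1);
DEMO_STATIC_ASSERT(demo_short_at_least_two, sizeof(short) >= 2);
// DEMO_STATIC_ASSERT(demo_fails, sizeof(char) == 2);   // would not compile

// The typedef names a one-element unsigned char array, so its size is 1.
static int demo_checks_passed(void)
{
   return (int) sizeof(demo_char_is_one_byte);
}
```
#endif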

#ifdef _MSC_VER
#define STBI_NOTUSED(v)  (void)(v)
#else
#define STBI_NOTUSED(v)  (void)sizeof(v)
#endif

#ifdef _MSC_VER
#define STBI_HAS_LROTL
#endif

#ifdef STBI_HAS_LROTL
   #define stbi_lrot(x,y)  _lrotl(x,y)
#else
   #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (-(y) & 31)))
#endif
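
// The portable fallback above is a 32-bit rotate-left. As a sketch of why
// the right-shift count is written as (-(y) & 31), here is a function form;
// demo_rotl32 is a hypothetical name, and unlike the macro it also masks
// the left-shift count so that any y is accepted.
#if 0 // illustrative sketch only, never compiled as part of this header
```c
#include <stdint.h>

// Rotate x left by y bits (taken modulo 32).
static uint32_t demo_rotl32(uint32_t x, unsigned int y)
{
   // ((0 - y) & 31) equals 32 - y for y in 1..31 and 0 for y == 0, which
   // avoids the undefined 32-bit shift that x >> (32 - y) would hit at y == 0.
   return (x << (y & 31u)) | (x >> ((0u - y) & 31u));
}
```
#endif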

#if defined(STBI_MALLOC) && defined(STBI_FREE) && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
// ok
#elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) && !defined(STBI_REALLOC_SIZED)
// ok
#else
#error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
#endif

#ifndef STBI_MALLOC
#define STBI_MALLOC(sz)           malloc(sz)
#define STBI_REALLOC(p,newsz)     realloc(p,newsz)
#define STBI_FREE(p)              free(p)
#endif

#ifndef STBI_REALLOC_SIZED
#define STBI_REALLOC_SIZED(p,oldsz,newsz) STBI_REALLOC(p,newsz)
#endif
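
// Because the check above demands all-or-none, a client that overrides the
// allocator would define the three macros together before including the
// implementation. A minimal sketch with a counting allocator follows; the
// demo_* functions are hypothetical, and the sketch ignores the
// realloc-to-zero-size corner case.
#if 0 // illustrative sketch only, never compiled as part of this header
```c
#include <stdlib.h>

// Number of blocks currently allocated through the demo_* wrappers.
static int demo_live_allocations = 0;

static void *demo_malloc(size_t sz)
{
   void *p = malloc(sz);
   if (p) ++demo_live_allocations;
   return p;
}

static void *demo_realloc(void *p, size_t newsz)
{
   void *q = realloc(p, newsz);
   if (q && !p) ++demo_live_allocations;   // realloc(NULL, n) acts like malloc
   return q;
}

static void demo_free(void *p)
{
   if (p) --demo_live_allocations;
   free(p);
}

// All three are defined together, as the all-or-none check requires.
#define STBI_MALLOC(sz)        demo_malloc(sz)
#define STBI_REALLOC(p,newsz)  demo_realloc(p,newsz)
#define STBI_FREE(p)           demo_free(p)
```
#endif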

// x86/x64 detection
#if defined(__x86_64__) || defined(_M_X64)
#define STBI__X64_TARGET
#elif defined(__i386) || defined(_M_IX86)
#define STBI__X86_TARGET
#endif

#if defined(__GNUC__) && defined(STBI__X86_TARGET) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
// gcc doesn't support sse2 intrinsics unless you compile with -msse2,
// which in turn means it gets to use SSE2 everywhere. This is unfortunate,
// but previous attempts to provide the SSE2 functions with runtime
// detection caused numerous issues. The way architecture extensions are
// exposed in GCC/Clang is, sadly, not really suited for one-file libs.
// New behavior: if compiled with -msse2, we use SSE2 without any
// detection; if not, we don't use it at all.
#define STBI_NO_SIMD
#endif

#if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
// Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
//
// 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
// Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
// As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
// simultaneously enabling "-mstackrealign".
//
// See https://github.com/nothings/stb/issues/81 for more information.
//
// So default to no SSE2 on 32-bit MinGW. If you've read this far and added
// -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
#define STBI_NO_SIMD
#endif

#if !defined(STBI_NO_SIMD) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET))
#define STBI_SSE2
#include <emmintrin.h>

#ifdef _MSC_VER

#if _MSC_VER >= 1400  // not VC6
#include <intrin.h> // __cpuid
static int stbi__cpuid3(void)
{
   int info[4];
   __cpuid(info,1);
   return info[3];
}
#else
static int stbi__cpuid3(void)
{
   int res;
   __asm {
      mov  eax,1
      cpuid
      mov  res,edx
   }
   return res;
}
#endif

#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name

#if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
static int stbi__sse2_available(void)
{
   int info3 = stbi__cpuid3();
   return ((info3 >> 26) & 1) != 0;
}
#endif

#else // assume GCC-style if not VC++
#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))

#if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
static int stbi__sse2_available(void)
{
   // If we're even attempting to compile this on GCC/Clang, that means
   // -msse2 is on, which means the compiler is allowed to use SSE2
   // instructions at will, and so are we.
   return 1;
}
#endif

#endif
#endif
774
775// ARM NEON
776#if defined(STBI_NO_SIMD) && defined(STBI_NEON)
777#undef STBI_NEON
778#endif
779
780#ifdef STBI_NEON
781#include <arm_neon.h>
782#ifdef _MSC_VER
783#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
784#else
785#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
786#endif
787#endif
788
789#ifndef STBI_SIMD_ALIGN
790#define STBI_SIMD_ALIGN(type, name) type name
791#endif
792
793#ifndef STBI_MAX_DIMENSIONS
794#define STBI_MAX_DIMENSIONS (1 << 24)
795#endif
796
797///////////////////////////////////////////////
798//
799//  stbi__context struct and start_xxx functions
800
801// stbi__context structure is our basic context used by all images, so it
802// contains all the IO context, plus some basic image information
803typedef struct
804{
805   stbi__uint32 img_x, img_y;
806   int img_n, img_out_n;
807
808   stbi_io_callbacks io;
809   void *io_user_data;
810
811   int read_from_callbacks;
812   int buflen;
813   stbi_uc buffer_start[128];
814   int callback_already_read;
815
816   stbi_uc *img_buffer, *img_buffer_end;
817   stbi_uc *img_buffer_original, *img_buffer_original_end;
818} stbi__context;
819
820
821static void stbi__refill_buffer(stbi__context *s);
822
823// initialize a memory-decode context
824static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
825{
826   s->io.read = NULL;
827   s->read_from_callbacks = 0;
828   s->callback_already_read = 0;
829   s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
830   s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
831}
832
833// initialize a callback-based context
834static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
835{
836   s->io = *c;
837   s->io_user_data = user;
838   s->buflen = sizeof(s->buffer_start);
839   s->read_from_callbacks = 1;
840   s->callback_already_read = 0;
841   s->img_buffer = s->img_buffer_original = s->buffer_start;
842   stbi__refill_buffer(s);
843   s->img_buffer_original_end = s->img_buffer_end;
844}
845
846#ifndef STBI_NO_STDIO
847
848static int stbi__stdio_read(void *user, char *data, int size)
849{
850   return (int) fread(data,1,size,(FILE*) user);
851}
852
853static void stbi__stdio_skip(void *user, int n)
854{
855   int ch;
856   fseek((FILE*) user, n, SEEK_CUR);
857   ch = fgetc((FILE*) user);  /* have to read a byte to reset feof()'s flag */
858   if (ch != EOF) {
859      ungetc(ch, (FILE *) user);  /* push byte back onto stream if valid. */
860   }
861}
862
863static int stbi__stdio_eof(void *user)
864{
865   return feof((FILE*) user) || ferror((FILE *) user);
866}
867
868static stbi_io_callbacks stbi__stdio_callbacks =
869{
870   stbi__stdio_read,
871   stbi__stdio_skip,
872   stbi__stdio_eof,
873};
874
875static void stbi__start_file(stbi__context *s, FILE *f)
876{
877   stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
878}
879
880//static void stop_file(stbi__context *s) { }
881
882#endif // !STBI_NO_STDIO
883
884static void stbi__rewind(stbi__context *s)
885{
886   // conceptually rewind SHOULD rewind to the beginning of the stream,
887   // but we just rewind to the beginning of the initial buffer, because
   // we only use it after a format 'test', which never reads more than 92 bytes
889   s->img_buffer = s->img_buffer_original;
890   s->img_buffer_end = s->img_buffer_original_end;
891}
892
893enum
894{
895   STBI_ORDER_RGB,
896   STBI_ORDER_BGR
897};
898
899typedef struct
900{
901   int bits_per_channel;
902   int num_channels;
903   int channel_order;
904} stbi__result_info;
905
906#ifndef STBI_NO_JPEG
907static int      stbi__jpeg_test(stbi__context *s);
908static void    *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
909static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
910#endif
911
912#ifndef STBI_NO_PNG
913static int      stbi__png_test(stbi__context *s);
914static void    *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
915static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
916static int      stbi__png_is16(stbi__context *s);
917#endif
918
919#ifndef STBI_NO_BMP
920static int      stbi__bmp_test(stbi__context *s);
921static void    *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
922static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
923#endif
924
925#ifndef STBI_NO_TGA
926static int      stbi__tga_test(stbi__context *s);
927static void    *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
928static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
929#endif
930
931#ifndef STBI_NO_PSD
932static int      stbi__psd_test(stbi__context *s);
933static void    *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc);
934static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
935static int      stbi__psd_is16(stbi__context *s);
936#endif
937
938#ifndef STBI_NO_HDR
939static int      stbi__hdr_test(stbi__context *s);
940static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
941static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
942#endif
943
944#ifndef STBI_NO_PIC
945static int      stbi__pic_test(stbi__context *s);
946static void    *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
947static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
948#endif
949
950#ifndef STBI_NO_GIF
951static int      stbi__gif_test(stbi__context *s);
952static void    *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
953static void    *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
954static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
955#endif
956
957#ifndef STBI_NO_PNM
958static int      stbi__pnm_test(stbi__context *s);
959static void    *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
960static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
961static int      stbi__pnm_is16(stbi__context *s);
962#endif
963
964static
965#ifdef STBI_THREAD_LOCAL
966STBI_THREAD_LOCAL
967#endif
968const char *stbi__g_failure_reason;
969
970STBIDEF const char *stbi_failure_reason(void)
971{
972   return stbi__g_failure_reason;
973}
974
975#ifndef STBI_NO_FAILURE_STRINGS
976static int stbi__err(const char *str)
977{
978   stbi__g_failure_reason = str;
979   return 0;
980}
981#endif
982
983static void *stbi__malloc(size_t size)
984{
985    return STBI_MALLOC(size);
986}
987
988// stb_image uses ints pervasively, including for offset calculations.
989// therefore the largest decoded image size we can support with the
990// current code, even on 64-bit targets, is INT_MAX. this is not a
991// significant limitation for the intended use case.
992//
993// we do, however, need to make sure our size calculations don't
994// overflow. hence a few helper functions for size calculations that
995// multiply integers together, making sure that they're non-negative
996// and no overflow occurs.
997
998// return 1 if the sum is valid, 0 on overflow.
999// negative terms are considered invalid.
1000static int stbi__addsizes_valid(int a, int b)
1001{
1002   if (b < 0) return 0;
1003   // now 0 <= b <= INT_MAX, hence also
   // 0 <= INT_MAX - b <= INT_MAX.
1005   // And "a + b <= INT_MAX" (which might overflow) is the
1006   // same as a <= INT_MAX - b (no overflow)
1007   return a <= INT_MAX - b;
1008}
1009
1010// returns 1 if the product is valid, 0 on overflow.
1011// negative factors are considered invalid.
1012static int stbi__mul2sizes_valid(int a, int b)
1013{
1014   if (a < 0 || b < 0) return 0;
1015   if (b == 0) return 1; // mul-by-0 is always safe
1016   // portable way to check for no overflows in a*b
1017   return a <= INT_MAX/b;
1018}
1019
1020#if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
1021// returns 1 if "a*b + add" has no negative terms/factors and doesn't overflow
1022static int stbi__mad2sizes_valid(int a, int b, int add)
1023{
1024   return stbi__mul2sizes_valid(a, b) && stbi__addsizes_valid(a*b, add);
1025}
1026#endif
1027
1028// returns 1 if "a*b*c + add" has no negative terms/factors and doesn't overflow
1029static int stbi__mad3sizes_valid(int a, int b, int c, int add)
1030{
1031   return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
1032      stbi__addsizes_valid(a*b*c, add);
1033}
1034
1035// returns 1 if "a*b*c*d + add" has no negative terms/factors and doesn't overflow
1036#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
1037static int stbi__mad4sizes_valid(int a, int b, int c, int d, int add)
1038{
1039   return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
1040      stbi__mul2sizes_valid(a*b*c, d) && stbi__addsizes_valid(a*b*c*d, add);
1041}
1042#endif
1043
1044#if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
1045// mallocs with size overflow checking
1046static void *stbi__malloc_mad2(int a, int b, int add)
1047{
1048   if (!stbi__mad2sizes_valid(a, b, add)) return NULL;
1049   return stbi__malloc(a*b + add);
1050}
1051#endif
1052
1053static void *stbi__malloc_mad3(int a, int b, int c, int add)
1054{
1055   if (!stbi__mad3sizes_valid(a, b, c, add)) return NULL;
1056   return stbi__malloc(a*b*c + add);
1057}
1058
1059#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
1060static void *stbi__malloc_mad4(int a, int b, int c, int d, int add)
1061{
1062   if (!stbi__mad4sizes_valid(a, b, c, d, add)) return NULL;
1063   return stbi__malloc(a*b*c*d + add);
1064}
1065#endif
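
// Illustration only, kept under #if 0 so it is never compiled: a minimal
// standalone sketch of the overflow-checked "a*b + add" size computation that
// the helpers above implement. The names mul_ok, add_ok and checked_mad2 are
// hypothetical and are not part of stb_image.
#if 0
```c
#include <limits.h>

// 1 if a*b fits in an int and neither factor is negative.
static int mul_ok(int a, int b)
{
   if (a < 0 || b < 0) return 0;
   if (b == 0) return 1;        // a*0 can never overflow
   return a <= INT_MAX / b;     // the division cannot overflow, unlike a*b
}

// 1 if a+b fits in an int; b must be non-negative.
static int add_ok(int a, int b)
{
   if (b < 0) return 0;
   return a <= INT_MAX - b;     // the subtraction cannot overflow, unlike a+b
}

// Stores a*b + add in *out and returns 1, or returns 0 and leaves *out
// untouched if any step would overflow or any input is negative.
static int checked_mad2(int a, int b, int add, int *out)
{
   // a*b is only evaluated after mul_ok has accepted it (short-circuit ||)
   if (!mul_ok(a, b) || !add_ok(a * b, add)) return 0;
   *out = a * b + add;
   return 1;
}
```
#endif
// The point of the pattern is that every check is phrased with a division or
// subtraction that is known to be safe, so the risky multiply or add only
// runs after it has been proven to fit.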
1066
1067// returns 1 if the sum of two signed ints is valid (between -2^31 and 2^31-1 inclusive), 0 on overflow.
1068static int stbi__addints_valid(int a, int b)
1069{
1070   if ((a >= 0) != (b >= 0)) return 1; // a and b have different signs, so no overflow
1071   if (a < 0 && b < 0) return a >= INT_MIN - b; // same as a + b >= INT_MIN; INT_MIN - b cannot overflow since b < 0.
1072   return a <= INT_MAX - b;
1073}
1074
1075// returns 1 if the product of two signed shorts is valid, 0 on overflow.
1076static int stbi__mul2shorts_valid(short a, short b)
1077{
1078   if (b == 0 || b == -1) return 1; // multiplication by 0 is always 0; check for -1 so SHRT_MIN/b doesn't overflow
1079   if ((a >= 0) == (b >= 0)) return a <= SHRT_MAX/b; // product is positive, so similar to mul2sizes_valid
1080   if (b < 0) return a <= SHRT_MIN / b; // same as a * b >= SHRT_MIN
1081   return a >= SHRT_MIN / b;
1082}
1083
1084// stbi__err - error
1085// stbi__errpf - error returning pointer to float
1086// stbi__errpuc - error returning pointer to unsigned char
1087
1088#ifdef STBI_NO_FAILURE_STRINGS
1089   #define stbi__err(x,y)  0
1090#elif defined(STBI_FAILURE_USERMSG)
1091   #define stbi__err(x,y)  stbi__err(y)
1092#else
1093   #define stbi__err(x,y)  stbi__err(x)
1094#endif
1095
1096#define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
1097#define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
1098
1099STBIDEF void stbi_image_free(void *retval_from_stbi_load)
1100{
1101   STBI_FREE(retval_from_stbi_load);
1102}
1103
1104#ifndef STBI_NO_LINEAR
1105static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
1106#endif
1107
1108#ifndef STBI_NO_HDR
1109static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
1110#endif
1111
1112static int stbi__vertically_flip_on_load_global = 0;
1113
1114STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
1115{
1116   stbi__vertically_flip_on_load_global = flag_true_if_should_flip;
1117}
1118
1119#ifndef STBI_THREAD_LOCAL
1120#define stbi__vertically_flip_on_load  stbi__vertically_flip_on_load_global
1121#else
1122static STBI_THREAD_LOCAL int stbi__vertically_flip_on_load_local, stbi__vertically_flip_on_load_set;
1123
1124STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip)
1125{
1126   stbi__vertically_flip_on_load_local = flag_true_if_should_flip;
1127   stbi__vertically_flip_on_load_set = 1;
1128}
1129
1130#define stbi__vertically_flip_on_load  (stbi__vertically_flip_on_load_set       \
1131                                         ? stbi__vertically_flip_on_load_local  \
1132                                         : stbi__vertically_flip_on_load_global)
1133#endif // STBI_THREAD_LOCAL
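
// Illustration only, kept under #if 0 so it is never compiled: a minimal
// sketch of the "per-thread value overrides the global one" selection made by
// the macro above. It assumes a C11 compiler (_Thread_local); flip_global,
// flip_local, flip_local_set and the three functions are hypothetical names,
// not part of stb_image.
#if 0
```c
static int flip_global = 0;                          // shared by all threads
static _Thread_local int flip_local, flip_local_set; // private to each thread

static void set_flip(int on)        { flip_global = on; }
static void set_flip_thread(int on) { flip_local = on; flip_local_set = 1; }

// Once a thread has called set_flip_thread, its own value wins and later
// changes to the global flag no longer affect that thread.
static int effective_flip(void)
{
   return flip_local_set ? flip_local : flip_global;
}
```
#endif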
1134
1135static void *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
1136{
1137   memset(ri, 0, sizeof(*ri)); // make sure it's initialized if we add new fields
1138   ri->bits_per_channel = 8; // default is 8 so most paths don't have to be changed
1139   ri->channel_order = STBI_ORDER_RGB; // all current input & output are this, but this is here so we can add BGR order
1140   ri->num_channels = 0;
1141
   // test the formats with a very explicit header first (at least a FOURCC
   // or distinctive magic number)
1144   #ifndef STBI_NO_PNG
1145   if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp, ri);
1146   #endif
1147   #ifndef STBI_NO_BMP
1148   if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp, ri);
1149   #endif
1150   #ifndef STBI_NO_GIF
1151   if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp, ri);
1152   #endif
1153   #ifndef STBI_NO_PSD
1154   if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp, ri, bpc);
1155   #else
1156   STBI_NOTUSED(bpc);
1157   #endif
1158   #ifndef STBI_NO_PIC
1159   if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp, ri);
1160   #endif
1161
1162   // then the formats that can end up attempting to load with just 1 or 2
1163   // bytes matching expectations; these are prone to false positives, so
1164   // try them later
1165   #ifndef STBI_NO_JPEG
1166   if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp, ri);
1167   #endif
1168   #ifndef STBI_NO_PNM
1169   if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp, ri);
1170   #endif
1171
1172   #ifndef STBI_NO_HDR
1173   if (stbi__hdr_test(s)) {
1174      float *hdr = stbi__hdr_load(s, x,y,comp,req_comp, ri);
1175      return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
1176   }
1177   #endif
1178
1179   #ifndef STBI_NO_TGA
1180   // test tga last because it's a crappy test!
1181   if (stbi__tga_test(s))
1182      return stbi__tga_load(s,x,y,comp,req_comp, ri);
1183   #endif
1184
1185   return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
1186}
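
// Illustration only, kept under #if 0 so it is never compiled: a simplified
// sketch of the "strong signatures first, weak signatures last" ordering used
// above. sniff_format and the FMT_* constants are hypothetical; the byte
// patterns are the published PNG, GIF, JPEG (SOI marker) and binary PNM magic
// numbers, and the real stbi__*_test functions may inspect more or less of the
// header than this sketch does.
#if 0
```c
#include <stddef.h>
#include <string.h>

enum { FMT_UNKNOWN, FMT_PNG, FMT_GIF, FMT_JPEG, FMT_PNM };

static int sniff_format(const unsigned char *p, size_t len)
{
   static const unsigned char png_sig[8] =
      { 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };

   // strong signatures: six or eight bytes must match, so a false
   // positive on unrelated data is unlikely
   if (len >= 8 && memcmp(p, png_sig, 8) == 0) return FMT_PNG;
   if (len >= 6 && (memcmp(p, "GIF87a", 6) == 0 ||
                    memcmp(p, "GIF89a", 6) == 0)) return FMT_GIF;

   // weak signatures: only two bytes, so try them after the strong ones
   if (len >= 2 && p[0] == 0xFF && p[1] == 0xD8) return FMT_JPEG;
   if (len >= 2 && p[0] == 'P' && (p[1] == '5' || p[1] == '6')) return FMT_PNM;

   return FMT_UNKNOWN;
}
```
#endif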
1187
1188static stbi_uc *stbi__convert_16_to_8(stbi__uint16 *orig, int w, int h, int channels)
1189{
1190   int i;
1191   int img_len = w * h * channels;
1192   stbi_uc *reduced;
1193
1194   reduced = (stbi_uc *) stbi__malloc(img_len);
1195   if (reduced == NULL) return stbi__errpuc("outofmem", "Out of memory");
1196
1197   for (i = 0; i < img_len; ++i)
      reduced[i] = (stbi_uc)((orig[i] >> 8) & 0xFF); // top byte of each 16-bit sample is a sufficient approximation of 16->8 bit scaling
1199
1200   STBI_FREE(orig);
1201   return reduced;
1202}
1203
1204static stbi__uint16 *stbi__convert_8_to_16(stbi_uc *orig, int w, int h, int channels)
1205{
1206   int i;
1207   int img_len = w * h * channels;
1208   stbi__uint16 *enlarged;
1209
1210   enlarged = (stbi__uint16 *) stbi__malloc(img_len*2);
1211   if (enlarged == NULL) return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
1212
1213   for (i = 0; i < img_len; ++i)
1214      enlarged[i] = (stbi__uint16)((orig[i] << 8) + orig[i]); // replicate to high and low byte, maps 0->0, 255->0xffff
1215
1216   STBI_FREE(orig);
1217   return enlarged;
1218}
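
// Illustration only, kept under #if 0 so it is never compiled: a per-sample
// sketch of the two bit-depth conversions above. narrow_16_to_8 and
// widen_8_to_16 are hypothetical names, and <stdint.h> types are used here
// only to keep the sketch standalone.
#if 0
```c
#include <stdint.h>

// 16 -> 8 bits: keep the high byte of the sample.
static uint8_t narrow_16_to_8(uint16_t v)
{
   return (uint8_t)(v >> 8);
}

// 8 -> 16 bits: copy the byte into both halves, which equals v * 257 and
// therefore maps 0 to 0 and 255 to 0xffff exactly.
static uint16_t widen_8_to_16(uint8_t v)
{
   return (uint16_t)((v << 8) | v);
}
```
#endif
// Widening and then narrowing returns the original byte, so an 8-bit image
// survives a round trip through the 16-bit path unchanged.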
1219
1220static void stbi__vertical_flip(void *image, int w, int h, int bytes_per_pixel)
1221{
1222   int row;
1223   size_t bytes_per_row = (size_t)w * bytes_per_pixel;
1224   stbi_uc temp[2048];
1225   stbi_uc *bytes = (stbi_uc *)image;
1226
1227   for (row = 0; row < (h>>1); row++) {
1228      stbi_uc *row0 = bytes + row*bytes_per_row;
1229      stbi_uc *row1 = bytes + (h - row - 1)*bytes_per_row;
1230      // swap row0 with row1
1231      size_t bytes_left = bytes_per_row;
1232      while (bytes_left) {
1233         size_t bytes_copy = (bytes_left < sizeof(temp)) ? bytes_left : sizeof(temp);
1234         memcpy(temp, row0, bytes_copy);
1235         memcpy(row0, row1, bytes_copy);
1236         memcpy(row1, temp, bytes_copy);
1237         row0 += bytes_copy;
1238         row1 += bytes_copy;
1239         bytes_left -= bytes_copy;
1240      }
1241   }
1242}
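
// Illustration only, kept under #if 0 so it is never compiled: the same
// chunked row swap as above, reduced to a standalone sketch with a
// deliberately tiny scratch buffer so the chunking is visible on a small
// image. flip_rows is a hypothetical name, not part of stb_image.
#if 0
```c
#include <stddef.h>
#include <string.h>

// Swap row r with row (rows-1-r) for the top half of the image, moving at
// most sizeof(scratch) bytes at a time so no full-row buffer is needed.
static void flip_rows(unsigned char *img, size_t row_bytes, int rows)
{
   unsigned char scratch[4];
   int r;
   for (r = 0; r < rows / 2; ++r) {
      unsigned char *top = img + (size_t)r * row_bytes;
      unsigned char *bot = img + (size_t)(rows - 1 - r) * row_bytes;
      size_t left = row_bytes;
      while (left) {
         size_t n = left < sizeof(scratch) ? left : sizeof(scratch);
         memcpy(scratch, top, n);
         memcpy(top, bot, n);
         memcpy(bot, scratch, n);
         top += n;
         bot += n;
         left -= n;
      }
   }
}
```
#endif
// With an odd number of rows the middle row is left in place, and flipping
// twice restores the original image.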
1243
1244#ifndef STBI_NO_GIF
1245static void stbi__vertical_flip_slices(void *image, int w, int h, int z, int bytes_per_pixel)
1246{
1247   int slice;
1248   int slice_size = w * h * bytes_per_pixel;
1249
1250   stbi_uc *bytes = (stbi_uc *)image;
1251   for (slice = 0; slice < z; ++slice) {
1252      stbi__vertical_flip(bytes, w, h, bytes_per_pixel);
1253      bytes += slice_size;
1254   }
1255}
1256#endif
1257
1258static unsigned char *stbi__load_and_postprocess_8bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1259{
1260   stbi__result_info ri;
1261   void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 8);
1262
1263   if (result == NULL)
1264      return NULL;
1265
1266   // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
1267   STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
1268
1269   if (ri.bits_per_channel != 8) {
1270      result = stbi__convert_16_to_8((stbi__uint16 *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
1271      ri.bits_per_channel = 8;
1272   }
1273
1274   // @TODO: move stbi__convert_format to here
1275
1276   if (stbi__vertically_flip_on_load) {
1277      int channels = req_comp ? req_comp : *comp;
1278      stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi_uc));
1279   }
1280
1281   return (unsigned char *) result;
1282}
1283
1284static stbi__uint16 *stbi__load_and_postprocess_16bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1285{
1286   stbi__result_info ri;
1287   void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 16);
1288
1289   if (result == NULL)
1290      return NULL;
1291
1292   // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
1293   STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
1294
1295   if (ri.bits_per_channel != 16) {
1296      result = stbi__convert_8_to_16((stbi_uc *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
1297      ri.bits_per_channel = 16;
1298   }
1299
1300   // @TODO: move stbi__convert_format16 to here
1301   // @TODO: special case RGB-to-Y (and RGBA-to-YA) for 8-bit-to-16-bit case to keep more precision
1302
1303   if (stbi__vertically_flip_on_load) {
1304      int channels = req_comp ? req_comp : *comp;
1305      stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi__uint16));
1306   }
1307
1308   return (stbi__uint16 *) result;
1309}
1310
1311#if !defined(STBI_NO_HDR) && !defined(STBI_NO_LINEAR)
1312static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
1313{
1314   if (stbi__vertically_flip_on_load && result != NULL) {
1315      int channels = req_comp ? req_comp : *comp;
1316      stbi__vertical_flip(result, *x, *y, channels * sizeof(float));
1317   }
1318}
1319#endif
1320
1321#ifndef STBI_NO_STDIO
1322
1323#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
1324STBI_EXTERN __declspec(dllimport) int __stdcall MultiByteToWideChar(unsigned int cp, unsigned long flags, const char *str, int cbmb, wchar_t *widestr, int cchwide);
1325STBI_EXTERN __declspec(dllimport) int __stdcall WideCharToMultiByte(unsigned int cp, unsigned long flags, const wchar_t *widestr, int cchwide, char *str, int cbmb, const char *defchar, int *used_default);
1326#endif
1327
1328#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
1329STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input)
1330{
   return WideCharToMultiByte(65001 /* UTF8 */, 0, input, -1, buffer, (int) bufferlen, NULL, NULL);
1332}
1333#endif
1334
1335static FILE *stbi__fopen(char const *filename, char const *mode)
1336{
1337   FILE *f;
1338#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
1339   wchar_t wMode[64];
1340   wchar_t wFilename[1024];
   if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, filename, -1, wFilename, sizeof(wFilename)/sizeof(*wFilename)))
      return 0;

   if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, mode, -1, wMode, sizeof(wMode)/sizeof(*wMode)))
      return 0;

#if defined(_MSC_VER) && _MSC_VER >= 1400
   if (0 != _wfopen_s(&f, wFilename, wMode))
      f = 0;
1350#else
1351   f = _wfopen(wFilename, wMode);
1352#endif
1353
1354#elif defined(_MSC_VER) && _MSC_VER >= 1400
1355   if (0 != fopen_s(&f, filename, mode))
1356      f=0;
1357#else
1358   f = fopen(filename, mode);
1359#endif
1360   return f;
1361}
1362
1363
1364STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
1365{
1366   FILE *f = stbi__fopen(filename, "rb");
1367   unsigned char *result;
1368   if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
1369   result = stbi_load_from_file(f,x,y,comp,req_comp);
1370   fclose(f);
1371   return result;
1372}
1373
1374STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1375{
1376   unsigned char *result;
1377   stbi__context s;
1378   stbi__start_file(&s,f);
1379   result = stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
1380   if (result) {
1381      // need to 'unget' all the characters in the IO buffer
1382      fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
1383   }
1384   return result;
1385}
1386
1387STBIDEF stbi__uint16 *stbi_load_from_file_16(FILE *f, int *x, int *y, int *comp, int req_comp)
1388{
1389   stbi__uint16 *result;
1390   stbi__context s;
1391   stbi__start_file(&s,f);
1392   result = stbi__load_and_postprocess_16bit(&s,x,y,comp,req_comp);
1393   if (result) {
1394      // need to 'unget' all the characters in the IO buffer
1395      fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
1396   }
1397   return result;
1398}
1399
1400STBIDEF stbi_us *stbi_load_16(char const *filename, int *x, int *y, int *comp, int req_comp)
1401{
1402   FILE *f = stbi__fopen(filename, "rb");
1403   stbi__uint16 *result;
1404   if (!f) return (stbi_us *) stbi__errpuc("can't fopen", "Unable to open file");
1405   result = stbi_load_from_file_16(f,x,y,comp,req_comp);
1406   fclose(f);
1407   return result;
1408}
1409
1410
1411#endif //!STBI_NO_STDIO
1412
1413STBIDEF stbi_us *stbi_load_16_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels)
1414{
1415   stbi__context s;
1416   stbi__start_mem(&s,buffer,len);
1417   return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
1418}
1419
1420STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels)
1421{
1422   stbi__context s;
1423   stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
1424   return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
1425}
1426
1427STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1428{
1429   stbi__context s;
1430   stbi__start_mem(&s,buffer,len);
1431   return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
1432}
1433
1434STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1435{
1436   stbi__context s;
1437   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1438   return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
1439}
1440
1441#ifndef STBI_NO_GIF
1442STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
1443{
1444   unsigned char *result;
1445   stbi__context s;
1446   stbi__start_mem(&s,buffer,len);
1447
1448   result = (unsigned char*) stbi__load_gif_main(&s, delays, x, y, z, comp, req_comp);
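   if (stbi__vertically_flip_on_load && result) {
      // the returned pixels have req_comp channels when it is non-zero, so
      // size the flip by that rather than by the channel count of the file
      int channels = req_comp ? req_comp : *comp;
      stbi__vertical_flip_slices( result, *x, *y, *z, channels );
   }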
1452
1453   return result;
1454}
1455#endif
1456
1457#ifndef STBI_NO_LINEAR
1458static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1459{
1460   unsigned char *data;
1461   #ifndef STBI_NO_HDR
1462   if (stbi__hdr_test(s)) {
1463      stbi__result_info ri;
1464      float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp, &ri);
1465      if (hdr_data)
1466         stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
1467      return hdr_data;
1468   }
1469   #endif
1470   data = stbi__load_and_postprocess_8bit(s, x, y, comp, req_comp);
1471   if (data)
1472      return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
1473   return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
1474}
1475
1476STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1477{
1478   stbi__context s;
1479   stbi__start_mem(&s,buffer,len);
1480   return stbi__loadf_main(&s,x,y,comp,req_comp);
1481}
1482
1483STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1484{
1485   stbi__context s;
1486   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1487   return stbi__loadf_main(&s,x,y,comp,req_comp);
1488}
1489
1490#ifndef STBI_NO_STDIO
1491STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
1492{
1493   float *result;
1494   FILE *f = stbi__fopen(filename, "rb");
1495   if (!f) return stbi__errpf("can't fopen", "Unable to open file");
1496   result = stbi_loadf_from_file(f,x,y,comp,req_comp);
1497   fclose(f);
1498   return result;
1499}
1500
1501STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1502{
1503   stbi__context s;
1504   stbi__start_file(&s,f);
1505   return stbi__loadf_main(&s,x,y,comp,req_comp);
1506}
1507#endif // !STBI_NO_STDIO
1508
1509#endif // !STBI_NO_LINEAR
1510
// these is-hdr-or-not functions are defined independent of whether
// STBI_NO_LINEAR is defined, for API simplicity; if STBI_NO_HDR is defined,
// they always report false!
1514
1515STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
1516{
1517   #ifndef STBI_NO_HDR
1518   stbi__context s;
1519   stbi__start_mem(&s,buffer,len);
1520   return stbi__hdr_test(&s);
1521   #else
1522   STBI_NOTUSED(buffer);
1523   STBI_NOTUSED(len);
1524   return 0;
1525   #endif
1526}
1527
1528#ifndef STBI_NO_STDIO
1529STBIDEF int      stbi_is_hdr          (char const *filename)
1530{
1531   FILE *f = stbi__fopen(filename, "rb");
1532   int result=0;
1533   if (f) {
1534      result = stbi_is_hdr_from_file(f);
1535      fclose(f);
1536   }
1537   return result;
1538}
1539
1540STBIDEF int stbi_is_hdr_from_file(FILE *f)
1541{
1542   #ifndef STBI_NO_HDR
1543   long pos = ftell(f);
1544   int res;
1545   stbi__context s;
1546   stbi__start_file(&s,f);
1547   res = stbi__hdr_test(&s);
1548   fseek(f, pos, SEEK_SET);
1549   return res;
1550   #else
1551   STBI_NOTUSED(f);
1552   return 0;
1553   #endif
1554}
1555#endif // !STBI_NO_STDIO
1556
1557STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
1558{
1559   #ifndef STBI_NO_HDR
1560   stbi__context s;
1561   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1562   return stbi__hdr_test(&s);
1563   #else
1564   STBI_NOTUSED(clbk);
1565   STBI_NOTUSED(user);
1566   return 0;
1567   #endif
1568}
1569
1570#ifndef STBI_NO_LINEAR
1571static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
1572
1573STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
1574STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
1575#endif
1576
1577static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
1578
1579STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
1580STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
1581
1582
1583//////////////////////////////////////////////////////////////////////////////
1584//
1585// Common code used by all image loaders
1586//
1587
1588enum
1589{
1590   STBI__SCAN_load=0,
1591   STBI__SCAN_type,
1592   STBI__SCAN_header
1593};
1594
1595static void stbi__refill_buffer(stbi__context *s)
1596{
1597   int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
1598   s->callback_already_read += (int) (s->img_buffer - s->img_buffer_original);
1599   if (n == 0) {
1600      // at end of file, treat same as if from memory, but need to handle case
1601      // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
1602      s->read_from_callbacks = 0;
1603      s->img_buffer = s->buffer_start;
1604      s->img_buffer_end = s->buffer_start+1;
1605      *s->img_buffer = 0;
1606   } else {
1607      s->img_buffer = s->buffer_start;
1608      s->img_buffer_end = s->buffer_start + n;
1609   }
1610}
1611
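// return the next byte of input, refilling the buffer via the I/O callbacks
// when needed; returns 0 once the input is exhausted
stbi_inline static stbi_uc stbi__get8(stbi__context *s)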
1613{
1614   if (s->img_buffer < s->img_buffer_end)
1615      return *s->img_buffer++;
1616   if (s->read_from_callbacks) {
1617      stbi__refill_buffer(s);
1618      return *s->img_buffer++;
1619   }
1620   return 0;
1621}
1622
1623#if defined(STBI_NO_JPEG) && defined(STBI_NO_HDR) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
1624// nothing
1625#else
1626stbi_inline static int stbi__at_eof(stbi__context *s)
1627{
1628   if (s->io.read) {
1629      if (!(s->io.eof)(s->io_user_data)) return 0;
1630      // if feof() is true, check if buffer = end
1631      // special case: we've only got the special 0 character at the end
1632      if (s->read_from_callbacks == 0) return 1;
1633   }
1634
1635   return s->img_buffer >= s->img_buffer_end;
1636}
1637#endif
1638
1639#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC)
1640// nothing
1641#else
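// advance the read position by n bytes; a negative n jumps to the end of the
// currently buffered data
static void stbi__skip(stbi__context *s, int n)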
1643{
1644   if (n == 0) return;  // already there!
1645   if (n < 0) {
1646      s->img_buffer = s->img_buffer_end;
1647      return;
1648   }
1649   if (s->io.read) {
1650      int blen = (int) (s->img_buffer_end - s->img_buffer);
1651      if (blen < n) {
1652         s->img_buffer = s->img_buffer_end;
1653         (s->io.skip)(s->io_user_data, n - blen);
1654         return;
1655      }
1656   }
1657   s->img_buffer += n;
1658}
1659#endif
1660
1661#if defined(STBI_NO_PNG) && defined(STBI_NO_TGA) && defined(STBI_NO_HDR) && defined(STBI_NO_PNM)
1662// nothing
1663#else
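// copy the next n bytes of input into buffer; returns 1 if all n bytes were
// available, 0 if the input ran short
static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)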
1665{
1666   if (s->io.read) {
1667      int blen = (int) (s->img_buffer_end - s->img_buffer);
1668      if (blen < n) {
1669         int res, count;
1670
1671         memcpy(buffer, s->img_buffer, blen);
1672
1673         count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
1674         res = (count == (n-blen));
1675         s->img_buffer = s->img_buffer_end;
1676         return res;
1677      }
1678   }
1679
1680   if (s->img_buffer+n <= s->img_buffer_end) {
1681      memcpy(buffer, s->img_buffer, n);
1682      s->img_buffer += n;
1683      return 1;
1684   } else
1685      return 0;
1686}
1687#endif
1688
1689#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
1690// nothing
1691#else
1692static int stbi__get16be(stbi__context *s)
1693{
1694   int z = stbi__get8(s);
1695   return (z << 8) + stbi__get8(s);
1696}
1697#endif
1698
1699#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
1700// nothing
1701#else
1702static stbi__uint32 stbi__get32be(stbi__context *s)
1703{
1704   stbi__uint32 z = stbi__get16be(s);
1705   return (z << 16) + stbi__get16be(s);
1706}
1707#endif
1708
1709#if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
1710// nothing
1711#else
1712static int stbi__get16le(stbi__context *s)
1713{
1714   int z = stbi__get8(s);
1715   return z + (stbi__get8(s) << 8);
1716}
1717#endif
1718
1719#ifndef STBI_NO_BMP
1720static stbi__uint32 stbi__get32le(stbi__context *s)
1721{
1722   stbi__uint32 z = stbi__get16le(s);
1723   z += (stbi__uint32)stbi__get16le(s) << 16;
1724   return z;
1725}
1726#endif
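
/*
   The stbi__get16be/stbi__get32be and stbi__get16le/stbi__get32le helpers
   above assemble multi-byte integers one byte at a time, so the result does
   not depend on the host's byte order. A minimal standalone sketch of the
   same idea, reading from a plain byte array rather than a stbi__context;
   the sketch_* names are hypothetical and not part of stb_image:

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helpers: read from a raw byte pointer, not a stbi__context.
static uint32_t sketch_read16be(const unsigned char *p)
{
   return ((uint32_t) p[0] << 8) | p[1];   // first byte is most significant
}

static uint32_t sketch_read32be(const unsigned char *p)
{
   return (sketch_read16be(p) << 16) | sketch_read16be(p + 2);
}

static uint32_t sketch_read16le(const unsigned char *p)
{
   return p[0] | ((uint32_t) p[1] << 8);   // first byte is least significant
}

static uint32_t sketch_read32le(const unsigned char *p)
{
   return sketch_read16le(p) | (sketch_read16le(p + 2) << 16);
}
```

   For the bytes {0x12, 0x34, 0x56, 0x78}, the big-endian 32-bit read gives
   0x12345678 and the little-endian read gives 0x78563412.
*/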
1727
1728#define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings
1729
1730#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
1731// nothing
1732#else
1733//////////////////////////////////////////////////////////////////////////////
1734//
1735//  generic converter from built-in img_n to req_comp
1736//    individual types do this automatically as much as possible (e.g. jpeg
1737//    does all cases internally since it needs to colorspace convert anyway,
1738//    and it never has alpha, so very few cases ). png can automatically
1739//    interleave an alpha=255 channel, but falls back to this for other cases
1740//
1741//  assume data buffer is malloced, so malloc a new one and free that one
1742//  only failure mode is malloc failing
1743
1744static stbi_uc stbi__compute_y(int r, int g, int b)
1745{
1746   return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
1747}
1748#endif
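
/*
   stbi__compute_y above is a fixed-point weighted sum: the weights 77, 150
   and 29 add up to 256, so the final >> 8 divides by the sum of the weights
   and a gray input maps back to itself. The weights appear to approximate
   the common 0.299/0.587/0.114 luma coefficients; that reading is an
   assumption, not something the source states. A standalone sketch
   (sketch_luma8 is a hypothetical name):

```c
#include <assert.h>

// Same integer weights as stbi__compute_y: 77 + 150 + 29 == 256.
static unsigned char sketch_luma8(int r, int g, int b)
{
   return (unsigned char) ((r*77 + g*150 + b*29) >> 8);
}
```

   Because the shift truncates, pure red (255,0,0) yields 76 rather than a
   rounded 77, while white (255,255,255) yields exactly 255.
*/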
1749
1750#if defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
1751// nothing
1752#else
1753static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1754{
1755   int i,j;
1756   unsigned char *good;
1757
1758   if (req_comp == img_n) return data;
1759   STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1760
1761   good = (unsigned char *) stbi__malloc_mad3(req_comp, x, y, 0);
1762   if (good == NULL) {
1763      STBI_FREE(data);
1764      return stbi__errpuc("outofmem", "Out of memory");
1765   }
1766
1767   for (j=0; j < (int) y; ++j) {
1768      unsigned char *src  = data + j * x * img_n   ;
1769      unsigned char *dest = good + j * x * req_comp;
1770
1771      #define STBI__COMBO(a,b)  ((a)*8+(b))
1772      #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1773      // convert source image with img_n components to one with req_comp components;
1774      // avoid switch per pixel, so use switch per scanline and massive macros
1775      switch (STBI__COMBO(img_n, req_comp)) {
1776         STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=255;                                     } break;
1777         STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
1778         STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=255;                     } break;
1779         STBI__CASE(2,1) { dest[0]=src[0];                                                  } break;
1780         STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
1781         STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                  } break;
1782         STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=255;        } break;
1783         STBI__CASE(3,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
1784         STBI__CASE(3,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = 255;    } break;
1785         STBI__CASE(4,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
1786         STBI__CASE(4,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = src[3]; } break;
1787         STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                    } break;
1788         default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return stbi__errpuc("unsupported", "Unsupported format conversion");
1789      }
1790      #undef STBI__CASE
1791   }
1792
1793   STBI_FREE(data);
1794   return good;
1795}
1796#endif
1797
1798#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
1799// nothing
1800#else
1801static stbi__uint16 stbi__compute_y_16(int r, int g, int b)
1802{
1803   return (stbi__uint16) (((r*77) + (g*150) +  (29*b)) >> 8);
1804}
1805#endif
1806
1807#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
1808// nothing
1809#else
1810static stbi__uint16 *stbi__convert_format16(stbi__uint16 *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1811{
1812   int i,j;
1813   stbi__uint16 *good;
1814
1815   if (req_comp == img_n) return data;
1816   STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1817
1818   good = (stbi__uint16 *) stbi__malloc(req_comp * x * y * 2);
1819   if (good == NULL) {
1820      STBI_FREE(data);
1821      return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
1822   }
1823
1824   for (j=0; j < (int) y; ++j) {
1825      stbi__uint16 *src  = data + j * x * img_n   ;
1826      stbi__uint16 *dest = good + j * x * req_comp;
1827
1828      #define STBI__COMBO(a,b)  ((a)*8+(b))
1829      #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1830      // convert source image with img_n components to one with req_comp components;
1831      // avoid switch per pixel, so use switch per scanline and massive macros
1832      switch (STBI__COMBO(img_n, req_comp)) {
1833         STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=0xffff;                                     } break;
1834         STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
1835         STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=0xffff;                     } break;
1836         STBI__CASE(2,1) { dest[0]=src[0];                                                     } break;
1837         STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
1838         STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                     } break;
1839         STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=0xffff;        } break;
1840         STBI__CASE(3,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
1841         STBI__CASE(3,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = 0xffff; } break;
1842         STBI__CASE(4,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
1843         STBI__CASE(4,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = src[3]; } break;
1844         STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                       } break;
1845         default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return (stbi__uint16*) stbi__errpuc("unsupported", "Unsupported format conversion");
1846      }
1847      #undef STBI__CASE
1848   }
1849
1850   STBI_FREE(data);
1851   return good;
1852}
1853#endif
1854
1855#ifndef STBI_NO_LINEAR
1856static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
1857{
1858   int i,k,n;
1859   float *output;
1860   if (!data) return NULL;
1861   output = (float *) stbi__malloc_mad4(x, y, comp, sizeof(float), 0);
1862   if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
1863   // compute number of non-alpha components
1864   if (comp & 1) n = comp; else n = comp-1;
1865   for (i=0; i < x*y; ++i) {
1866      for (k=0; k < n; ++k) {
1867         output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
1868      }
1869   }
1870   if (n < comp) {
1871      for (i=0; i < x*y; ++i) {
1872         output[i*comp + n] = data[i*comp + n]/255.0f;
1873      }
1874   }
1875   STBI_FREE(data);
1876   return output;
1877}
1878#endif
1879
1880#ifndef STBI_NO_HDR
1881#define stbi__float2int(x)   ((int) (x))
1882static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
1883{
1884   int i,k,n;
1885   stbi_uc *output;
1886   if (!data) return NULL;
1887   output = (stbi_uc *) stbi__malloc_mad3(x, y, comp, 0);
1888   if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
1889   // compute number of non-alpha components
1890   if (comp & 1) n = comp; else n = comp-1;
1891   for (i=0; i < x*y; ++i) {
1892      for (k=0; k < n; ++k) {
1893         float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
1894         if (z < 0) z = 0;
1895         if (z > 255) z = 255;
1896         output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1897      }
1898      if (k < comp) {
1899         float z = data[i*comp+k] * 255 + 0.5f;
1900         if (z < 0) z = 0;
1901         if (z > 255) z = 255;
1902         output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1903      }
1904   }
1905   STBI_FREE(data);
1906   return output;
1907}
1908#endif
1909
1910//////////////////////////////////////////////////////////////////////////////
1911//
1912//  "baseline" JPEG/JFIF decoder
1913//
1914//    simple implementation
1915//      - doesn't support delayed output of y-dimension
1916//      - simple interface (only one output format: 8-bit interleaved RGB)
1917//      - doesn't try to recover corrupt jpegs
1918//      - doesn't allow partial loading, loading multiple at once
1919//      - still fast on x86 (copying globals into locals doesn't help x86)
1920//      - allocates lots of intermediate memory (full size of all components)
1921//        - non-interleaved case requires this anyway
1922//        - allows good upsampling (see next)
1923//    high-quality
1924//      - upsampled channels are bilinearly interpolated, even across blocks
1925//      - quality integer IDCT derived from IJG's 'slow'
1926//    performance
1927//      - fast huffman; reasonable integer IDCT
1928//      - some SIMD kernels for common paths on targets with SSE2/NEON
1929//      - uses a lot of intermediate memory, could cache poorly
1930
1931#ifndef STBI_NO_JPEG
1932
1933// huffman decoding acceleration
1934#define FAST_BITS   9  // larger handles more cases; smaller stomps less cache
1935
1936typedef struct
1937{
1938   stbi_uc  fast[1 << FAST_BITS];
1939   // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
1940   stbi__uint16 code[256];
1941   stbi_uc  values[256];
1942   stbi_uc  size[257];
1943   unsigned int maxcode[18];
1944   int    delta[17];   // old 'firstsymbol' - old 'firstcode'
1945} stbi__huffman;
1946
1947typedef struct
1948{
1949   stbi__context *s;
1950   stbi__huffman huff_dc[4];
1951   stbi__huffman huff_ac[4];
1952   stbi__uint16 dequant[4][64];
1953   stbi__int16 fast_ac[4][1 << FAST_BITS];
1954
1955// sizes for components, interleaved MCUs
1956   int img_h_max, img_v_max;
1957   int img_mcu_x, img_mcu_y;
1958   int img_mcu_w, img_mcu_h;
1959
1960// definition of jpeg image component
1961   struct
1962   {
1963      int id;
1964      int h,v;
1965      int tq;
1966      int hd,ha;
1967      int dc_pred;
1968
1969      int x,y,w2,h2;
1970      stbi_uc *data;
1971      void *raw_data, *raw_coeff;
1972      stbi_uc *linebuf;
1973      short   *coeff;   // progressive only
1974      int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
1975   } img_comp[4];
1976
1977   stbi__uint32   code_buffer; // jpeg entropy-coded buffer
1978   int            code_bits;   // number of valid bits
1979   unsigned char  marker;      // marker seen while filling entropy buffer
1980   int            nomore;      // flag if we saw a marker so must stop
1981
1982   int            progressive;
1983   int            spec_start;
1984   int            spec_end;
1985   int            succ_high;
1986   int            succ_low;
1987   int            eob_run;
1988   int            jfif;
1989   int            app14_color_transform; // Adobe APP14 tag
1990   int            rgb;
1991
1992   int scan_n, order[4];
1993   int restart_interval, todo;
1994
1995// kernels
1996   void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
1997   void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
1998   stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
1999} stbi__jpeg;
2000
2001static int stbi__build_huffman(stbi__huffman *h, int *count)
2002{
2003   int i,j,k=0;
2004   unsigned int code;
2005   // build size list for each symbol (from JPEG spec)
2006   for (i=0; i < 16; ++i) {
2007      for (j=0; j < count[i]; ++j) {
2008         h->size[k++] = (stbi_uc) (i+1);
2009         if(k >= 257) return stbi__err("bad size list","Corrupt JPEG");
2010      }
2011   }
2012   h->size[k] = 0;
2013
2014   // compute actual symbols (from jpeg spec)
2015   code = 0;
2016   k = 0;
2017   for(j=1; j <= 16; ++j) {
2018      // compute delta to add to code to compute symbol id
2019      h->delta[j] = k - code;
2020      if (h->size[k] == j) {
2021         while (h->size[k] == j)
2022            h->code[k++] = (stbi__uint16) (code++);
2023         if (code-1 >= (1u << j)) return stbi__err("bad code lengths","Corrupt JPEG");
2024      }
2025      // compute largest code + 1 for this size, preshifted as needed later
2026      h->maxcode[j] = code << (16-j);
2027      code <<= 1;
2028   }
2029   h->maxcode[j] = 0xffffffff;
2030
2031   // build non-spec acceleration table; 255 is flag for not-accelerated
2032   memset(h->fast, 255, 1 << FAST_BITS);
2033   for (i=0; i < k; ++i) {
2034      int s = h->size[i];
2035      if (s <= FAST_BITS) {
2036         int c = h->code[i] << (FAST_BITS-s);
2037         int m = 1 << (FAST_BITS-s);
2038         for (j=0; j < m; ++j) {
2039            h->fast[c+j] = (stbi_uc) i;
2040         }
2041      }
2042   }
2043   return 1;
2044}
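
/*
   The first two loops of stbi__build_huffman assign canonical Huffman codes:
   symbols are listed in order of increasing code length, codes of one length
   are consecutive integers, and moving to the next length shifts the running
   code left by one bit. A simplified standalone sketch of just that step,
   without the delta, maxcode and fast tables; sketch_assign_codes and its
   signature are hypothetical:

```c
#include <assert.h>

// count[i] is the number of codes that are i+1 bits long (1..16 bits).
// Fills size[] and code[] per symbol; returns the symbol count, or -1 if
// the count list cannot describe a valid prefix code.
static int sketch_assign_codes(const int count[16], unsigned char size[256], unsigned short code[256])
{
   int i, j, k = 0;
   unsigned int next = 0;
   for (i = 0; i < 16; ++i) {
      for (j = 0; j < count[i]; ++j) {
         if (k >= 256) return -1;               // too many symbols
         if (next >= (1u << (i+1))) return -1;  // code does not fit in i+1 bits
         size[k] = (unsigned char) (i+1);
         code[k] = (unsigned short) next++;
         ++k;
      }
      next <<= 1;  // the next length appends one bit to the running code
   }
   return k;
}
```

   With the count list {0,1,5,1,1,1,1,1,1,0,0,0,0,0,0,0} this produces 12
   symbols: one 2-bit code (00), five 3-bit codes (010 through 110), then
   1110, 11110, and so on.
*/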
2045
2046// build a table that decodes both magnitude and value of small ACs in
2047// one go.
2048static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
2049{
2050   int i;
2051   for (i=0; i < (1 << FAST_BITS); ++i) {
2052      stbi_uc fast = h->fast[i];
2053      fast_ac[i] = 0;
2054      if (fast < 255) {
2055         int rs = h->values[fast];
2056         int run = (rs >> 4) & 15;
2057         int magbits = rs & 15;
2058         int len = h->size[fast];
2059
2060         if (magbits && len + magbits <= FAST_BITS) {
2061            // magnitude code followed by receive_extend code
2062            int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
2063            int m = 1 << (magbits - 1);
2064            if (k < m) k += (~0U << magbits) + 1;
2065            // if the result is small enough, we can fit it in fast_ac table
2066            if (k >= -128 && k <= 127)
2067               fast_ac[i] = (stbi__int16) ((k * 256) + (run * 16) + (len + magbits));
2068         }
2069      }
2070   }
2071}
2072
2073static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
2074{
2075   do {
2076      unsigned int b = j->nomore ? 0 : stbi__get8(j->s);
2077      if (b == 0xff) {
2078         int c = stbi__get8(j->s);
2079         while (c == 0xff) c = stbi__get8(j->s); // consume fill bytes
2080         if (c != 0) {
2081            j->marker = (unsigned char) c;
2082            j->nomore = 1;
2083            return;
2084         }
2085      }
2086      j->code_buffer |= b << (24 - j->code_bits);
2087      j->code_bits += 8;
2088   } while (j->code_bits <= 24);
2089}
2090
2091// (1 << n) - 1
2092static const stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
2093
2094// decode a jpeg huffman value from the bitstream
2095stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
2096{
2097   unsigned int temp;
2098   int c,k;
2099
2100   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2101
2102   // look at the top FAST_BITS and determine what symbol ID it is,
2103   // if the code length is <= FAST_BITS
2104   c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
2105   k = h->fast[c];
2106   if (k < 255) {
2107      int s = h->size[k];
2108      if (s > j->code_bits)
2109         return -1;
2110      j->code_buffer <<= s;
2111      j->code_bits -= s;
2112      return h->values[k];
2113   }
2114
2115   // naive test is to shift the code_buffer down so k bits are
2116   // valid, then test against maxcode. To speed this up, we've
2117   // preshifted maxcode left so that it has (16-k) 0s at the
2118   // end; in other words, regardless of the number of bits, it
2119   // wants to be compared against something shifted to have 16;
2120   // that way we don't need to shift inside the loop.
2121   temp = j->code_buffer >> 16;
2122   for (k=FAST_BITS+1 ; ; ++k)
2123      if (temp < h->maxcode[k])
2124         break;
2125   if (k == 17) {
2126      // error! code not found
2127      j->code_bits -= 16;
2128      return -1;
2129   }
2130
2131   if (k > j->code_bits)
2132      return -1;
2133
2134   // convert the huffman code to the symbol id
2135   c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
2136   if(c < 0 || c >= 256) // symbol id out of bounds!
2137       return -1;
2138   STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
2139
2140   // convert the id to a symbol
2141   j->code_bits -= k;
2142   j->code_buffer <<= k;
2143   return h->values[c];
2144}
2145
2146// bias[n] = (-1<<n) + 1
2147static const int stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
2148
2149// combined JPEG 'receive' and JPEG 'extend', since baseline
2150// always extends everything it receives.
2151stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
2152{
2153   unsigned int k;
2154   int sgn;
2155   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
2156   if (j->code_bits < n) return 0; // ran out of bits from stream, return 0s instead of continuing
2157
2158   sgn = j->code_buffer >> 31; // sign bit always in MSB; 0 if MSB clear (positive), 1 if MSB set (negative)
2159   k = stbi_lrot(j->code_buffer, n);
2160   j->code_buffer = k & ~stbi__bmask[n];
2161   k &= stbi__bmask[n];
2162   j->code_bits -= n;
2163   return k + (stbi__jbias[n] & (sgn - 1));
2164}
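
/*
   stbi__extend_receive above combines reading n bits with the JPEG sign
   extension step: an n-bit value whose top bit is set is positive and used
   as-is, while one whose top bit is clear is negative and is offset by
   stbi__jbias[n] = (-1<<n) + 1. The function does this without a branch by
   masking the bias with (sgn - 1). A branching sketch of the extension step
   alone, for clarity; sketch_extend is a hypothetical name:

```c
#include <assert.h>

// v is an n-bit unsigned value, 1 <= n <= 15.
static int sketch_extend(int v, int n)
{
   int half = 1 << (n - 1);                      // smallest value with the top bit set
   return (v < half) ? v - ((1 << n) - 1) : v;   // same as adding (-1<<n) + 1
}
```

   For n = 2 the four inputs 0, 1, 2, 3 map to -3, -2, 2, 3.
*/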
2165
2166// get some unsigned bits
2167stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
2168{
2169   unsigned int k;
2170   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
2171   if (j->code_bits < n) return 0; // ran out of bits from stream, return 0s instead of continuing
2172   k = stbi_lrot(j->code_buffer, n);
2173   j->code_buffer = k & ~stbi__bmask[n];
2174   k &= stbi__bmask[n];
2175   j->code_bits -= n;
2176   return k;
2177}
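
/*
   stbi__jpeg_get_bits above keeps unread bits left-aligned in code_buffer.
   Rotating left by n moves the top n bits into the low n bits; masking then
   splits the rotated word into the extracted value and the remaining buffer.
   A standalone sketch of that split, leaving out the refill and the
   code_bits bookkeeping; the sketch_* names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

// Rotate left; n must be in 1..31 so neither shift count reaches 32.
static uint32_t sketch_rotl(uint32_t x, int n)
{
   return (x << n) | (x >> (32 - n));
}

// Remove and return the top n bits (1 <= n <= 16) of a left-aligned buffer.
static uint32_t sketch_take_bits(uint32_t *buf, int n)
{
   uint32_t mask = (1u << n) - 1;
   uint32_t k = sketch_rotl(*buf, n);  // top n bits are now the low n bits
   *buf = k & ~mask;                   // the rest stays left-aligned
   return k & mask;
}
```

   Starting from 0xA5000000, taking 4 bits returns 0xA and leaves
   0x50000000; taking 4 more returns 0x5 and leaves 0.
*/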
2178
2179stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
2180{
2181   unsigned int k;
2182   if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
2183   if (j->code_bits < 1) return 0; // ran out of bits from stream, return 0s instead of continuing
2184   k = j->code_buffer;
2185   j->code_buffer <<= 1;
2186   --j->code_bits;
2187   return k & 0x80000000;
2188}
2189
2190// given a value that's at position X in the zigzag stream,
2191// where does it appear in the 8x8 matrix coded as row-major?
2192static const stbi_uc stbi__jpeg_dezigzag[64+15] =
2193{
2194    0,  1,  8, 16,  9,  2,  3, 10,
2195   17, 24, 32, 25, 18, 11,  4,  5,
2196   12, 19, 26, 33, 40, 48, 41, 34,
2197   27, 20, 13,  6,  7, 14, 21, 28,
2198   35, 42, 49, 56, 57, 50, 43, 36,
2199   29, 22, 15, 23, 30, 37, 44, 51,
2200   58, 59, 52, 45, 38, 31, 39, 46,
2201   53, 60, 61, 54, 47, 55, 62, 63,
2202   // let corrupt input sample past end
2203   63, 63, 63, 63, 63, 63, 63, 63,
2204   63, 63, 63, 63, 63, 63, 63
2205};
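
/*
   The first 64 entries of stbi__jpeg_dezigzag above follow the zigzag walk
   over the anti-diagonals of an 8x8 block, alternating direction on each
   diagonal; the 15 trailing 63s are padding for corrupt input. A sketch
   that regenerates the first 64 entries, offered as a way to cross-check
   the table; sketch_build_dezigzag is a hypothetical name:

```c
#include <assert.h>

static void sketch_build_dezigzag(unsigned char table[64])
{
   int s, n = 0;
   for (s = 0; s <= 14; ++s) {             // s = row + column of the diagonal
      int lo = (s > 7) ? s - 7 : 0;        // smallest valid row on this diagonal
      int hi = (s < 7) ? s : 7;            // largest valid row on this diagonal
      int row;
      if (s & 1) {
         for (row = lo; row <= hi; ++row)  // odd diagonals: top-right to bottom-left
            table[n++] = (unsigned char) (row * 8 + (s - row));
      } else {
         for (row = hi; row >= lo; --row)  // even diagonals: bottom-left to top-right
            table[n++] = (unsigned char) (row * 8 + (s - row));
      }
   }
}
```

   The generated sequence starts 0, 1, 8, 16, 9, 2 and ends 62, 63, matching
   the table.
*/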
2206
2207// decode one 64-entry block
2208static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi__uint16 *dequant)
2209{
2210   int diff,dc,k;
2211   int t;
2212
2213   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2214   t = stbi__jpeg_huff_decode(j, hdc);
2215   if (t < 0 || t > 15) return stbi__err("bad huffman code","Corrupt JPEG");
2216
2217   // 0 all the ac values now so we can do it 32-bits at a time
2218   memset(data,0,64*sizeof(data[0]));
2219
2220   diff = t ? stbi__extend_receive(j, t) : 0;
2221   if (!stbi__addints_valid(j->img_comp[b].dc_pred, diff)) return stbi__err("bad delta","Corrupt JPEG");
2222   dc = j->img_comp[b].dc_pred + diff;
2223   j->img_comp[b].dc_pred = dc;
2224   if (!stbi__mul2shorts_valid(dc, dequant[0])) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2225   data[0] = (short) (dc * dequant[0]);
2226
2227   // decode AC components, see JPEG spec
2228   k = 1;
2229   do {
2230      unsigned int zig;
2231      int c,r,s;
2232      if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2233      c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
2234      r = fac[c];
2235      if (r) { // fast-AC path
2236         k += (r >> 4) & 15; // run
2237         s = r & 15; // combined length
2238         if (s > j->code_bits) return stbi__err("bad huffman code", "Combined length longer than code bits available");
2239         j->code_buffer <<= s;
2240         j->code_bits -= s;
2241         // decode into unzigzag'd location
2242         zig = stbi__jpeg_dezigzag[k++];
2243         data[zig] = (short) ((r >> 8) * dequant[zig]);
2244      } else {
2245         int rs = stbi__jpeg_huff_decode(j, hac);
2246         if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
2247         s = rs & 15;
2248         r = rs >> 4;
2249         if (s == 0) {
2250            if (rs != 0xf0) break; // end block
2251            k += 16;
2252         } else {
2253            k += r;
2254            // decode into unzigzag'd location
2255            zig = stbi__jpeg_dezigzag[k++];
2256            data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
2257         }
2258      }
2259   } while (k < 64);
2260   return 1;
2261}
2262
2263static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
2264{
2265   int diff,dc;
2266   int t;
2267   if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2268
2269   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2270
2271   if (j->succ_high == 0) {
2272      // first scan for DC coefficient, must be first
2273      memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
2274      t = stbi__jpeg_huff_decode(j, hdc);
2275      if (t < 0 || t > 15) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2276      diff = t ? stbi__extend_receive(j, t) : 0;
2277
2278      if (!stbi__addints_valid(j->img_comp[b].dc_pred, diff)) return stbi__err("bad delta", "Corrupt JPEG");
2279      dc = j->img_comp[b].dc_pred + diff;
2280      j->img_comp[b].dc_pred = dc;
2281      if (!stbi__mul2shorts_valid(dc, 1 << j->succ_low)) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2282      data[0] = (short) (dc * (1 << j->succ_low));
2283   } else {
2284      // refinement scan for DC coefficient
2285      if (stbi__jpeg_get_bit(j))
2286         data[0] += (short) (1 << j->succ_low);
2287   }
2288   return 1;
2289}
2290
2291// @OPTIMIZE: store non-zigzagged during the decode passes,
2292// and only de-zigzag when dequantizing
2293static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
2294{
2295   int k;
2296   if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2297
2298   if (j->succ_high == 0) {
2299      int shift = j->succ_low;
2300
2301      if (j->eob_run) {
2302         --j->eob_run;
2303         return 1;
2304      }
2305
2306      k = j->spec_start;
2307      do {
2308         unsigned int zig;
2309         int c,r,s;
2310         if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2311         c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
2312         r = fac[c];
2313         if (r) { // fast-AC path
2314            k += (r >> 4) & 15; // run
2315            s = r & 15; // combined length
2316            if (s > j->code_bits) return stbi__err("bad huffman code", "Combined length longer than code bits available");
2317            j->code_buffer <<= s;
2318            j->code_bits -= s;
2319            zig = stbi__jpeg_dezigzag[k++];
2320            data[zig] = (short) ((r >> 8) * (1 << shift));
2321         } else {
2322            int rs = stbi__jpeg_huff_decode(j, hac);
2323            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
2324            s = rs & 15;
2325            r = rs >> 4;
2326            if (s == 0) {
2327               if (r < 15) {
2328                  j->eob_run = (1 << r);
2329                  if (r)
2330                     j->eob_run += stbi__jpeg_get_bits(j, r);
2331                  --j->eob_run;
2332                  break;
2333               }
2334               k += 16;
2335            } else {
2336               k += r;
2337               zig = stbi__jpeg_dezigzag[k++];
2338               data[zig] = (short) (stbi__extend_receive(j,s) * (1 << shift));
2339            }
2340         }
2341      } while (k <= j->spec_end);
2342   } else {
2343      // refinement scan for these AC coefficients
2344
2345      short bit = (short) (1 << j->succ_low);
2346
2347      if (j->eob_run) {
2348         --j->eob_run;
2349         for (k = j->spec_start; k <= j->spec_end; ++k) {
2350            short *p = &data[stbi__jpeg_dezigzag[k]];
2351            if (*p != 0)
2352               if (stbi__jpeg_get_bit(j))
2353                  if ((*p & bit)==0) {
2354                     if (*p > 0)
2355                        *p += bit;
2356                     else
2357                        *p -= bit;
2358                  }
2359         }
2360      } else {
2361         k = j->spec_start;
2362         do {
2363            int r,s;
2364            int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
2365            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
2366            s = rs & 15;
2367            r = rs >> 4;
2368            if (s == 0) {
2369               if (r < 15) {
2370                  j->eob_run = (1 << r) - 1;
2371                  if (r)
2372                     j->eob_run += stbi__jpeg_get_bits(j, r);
2373                  r = 64; // force end of block
2374               } else {
2375                  // r=15 s=0 should write 16 0s, so we just do
2376                  // a run of 15 0s and then write s (which is 0),
2377                  // so we don't have to do anything special here
2378               }
2379            } else {
2380               if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
2381               // sign bit
2382               if (stbi__jpeg_get_bit(j))
2383                  s = bit;
2384               else
2385                  s = -bit;
2386            }
2387
2388            // advance by r
2389            while (k <= j->spec_end) {
2390               short *p = &data[stbi__jpeg_dezigzag[k++]];
2391               if (*p != 0) {
2392                  if (stbi__jpeg_get_bit(j))
2393                     if ((*p & bit)==0) {
2394                        if (*p > 0)
2395                           *p += bit;
2396                        else
2397                           *p -= bit;
2398                     }
2399               } else {
2400                  if (r == 0) {
2401                     *p = (short) s;
2402                     break;
2403                  }
2404                  --r;
2405               }
2406            }
2407         } while (k <= j->spec_end);
2408      }
2409   }
2410   return 1;
2411}
2412
2413// clamp an int to the range 0..255 and return it as a byte
2414stbi_inline static stbi_uc stbi__clamp(int x)
2415{
2416   // trick to use a single test to catch both cases
2417   if ((unsigned int) x > 255) {
2418      if (x < 0) return 0;
2419      if (x > 255) return 255;
2420   }
2421   return (stbi_uc) x;
2422}
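
/*
   The single test in stbi__clamp above works because converting a negative
   int to unsigned int yields a value far above 255, so one comparison
   detects both x < 0 and x > 255 and the common in-range case costs one
   branch. A standalone sketch of the same trick; sketch_clamp is a
   hypothetical name:

```c
#include <assert.h>

static unsigned char sketch_clamp(int x)
{
   // Negative x wraps to a large unsigned value, so this one compare
   // catches both out-of-range directions.
   if ((unsigned int) x > 255)
      return (unsigned char) ((x < 0) ? 0 : 255);
   return (unsigned char) x;
}
```

   For example, -1 clamps to 0, 256 clamps to 255, and 128 passes through.
*/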
2423
2424#define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
2425#define stbi__fsh(x)  ((x) * 4096)
2426
2427// derived from jidctint -- DCT_ISLOW
2428#define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
2429   int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
2430   p2 = s2;                                    \
2431   p3 = s6;                                    \
2432   p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
2433   t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
2434   t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
2435   p2 = s0;                                    \
2436   p3 = s4;                                    \
2437   t0 = stbi__fsh(p2+p3);                      \
   t1 = stbi__fsh(p2-p3);                      \
   x0 = t0+t3;                                 \
   x3 = t0-t3;                                 \
   x1 = t1+t2;                                 \
   x2 = t1-t2;                                 \
   t0 = s7;                                    \
   t1 = s5;                                    \
   t2 = s3;                                    \
   t3 = s1;                                    \
   p3 = t0+t2;                                 \
   p4 = t1+t3;                                 \
   p1 = t0+t3;                                 \
   p2 = t1+t2;                                 \
   p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
   t0 = t0*stbi__f2f( 0.298631336f);           \
   t1 = t1*stbi__f2f( 2.053119869f);           \
   t2 = t2*stbi__f2f( 3.072711026f);           \
   t3 = t3*stbi__f2f( 1.501321110f);           \
   p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
   p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
   p3 = p3*stbi__f2f(-1.961570560f);           \
   p4 = p4*stbi__f2f(-0.390180644f);           \
   t3 += p1+p4;                                \
   t2 += p2+p3;                                \
   t1 += p2+p4;                                \
   t0 += p1+p3;

static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
{
   int i,val[64],*v=val;
   stbi_uc *o;
   short *d = data;

   // columns
   for (i=0; i < 8; ++i,++d, ++v) {
      // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
      if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
           && d[40]==0 && d[48]==0 && d[56]==0) {
         //    no shortcut                 0     seconds
         //    (1|2|3|4|5|6|7)==0          0     seconds
         //    all separate               -0.047 seconds
         //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
         int dcterm = d[0]*4;
         v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
      } else {
         STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
         // constants scaled things up by 1<<12; let's bring them back
         // down, but keep 2 extra bits of precision
         x0 += 512; x1 += 512; x2 += 512; x3 += 512;
         v[ 0] = (x0+t3) >> 10;
         v[56] = (x0-t3) >> 10;
         v[ 8] = (x1+t2) >> 10;
         v[48] = (x1-t2) >> 10;
         v[16] = (x2+t1) >> 10;
         v[40] = (x2-t1) >> 10;
         v[24] = (x3+t0) >> 10;
         v[32] = (x3-t0) >> 10;
      }
   }

   for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
      // no fast case since the first 1D IDCT spread components out
      STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
      // constants scaled things up by 1<<12, plus we had 1<<2 from first
      // loop, plus horizontal and vertical each scale by sqrt(8) so together
      // we've got an extra 1<<3, so 1<<17 total we need to remove.
      // so we want to round that, which means adding 0.5 * 1<<17,
      // aka 65536. Also, we'll end up with -128 to 127 that we want
      // to encode as 0..255 by adding 128, so we'll add that before the shift
      x0 += 65536 + (128<<17);
      x1 += 65536 + (128<<17);
      x2 += 65536 + (128<<17);
      x3 += 65536 + (128<<17);
      // tried computing the shifts into temps, or'ing the temps to see
      // if any were out of range, but that was slower
      o[0] = stbi__clamp((x0+t3) >> 17);
      o[7] = stbi__clamp((x0-t3) >> 17);
      o[1] = stbi__clamp((x1+t2) >> 17);
      o[6] = stbi__clamp((x1-t2) >> 17);
      o[2] = stbi__clamp((x2+t1) >> 17);
      o[5] = stbi__clamp((x2-t1) >> 17);
      o[3] = stbi__clamp((x3+t0) >> 17);
      o[4] = stbi__clamp((x3-t0) >> 17);
   }
}
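/* Illustrative sketch only, not part of stb_image. The row pass above folds
   the rounding term (65536, half of 1<<17) and the +128 level shift
   (pre-scaled by 1<<17) into a single bias before shifting down and clamping.
   The standalone fragment below shows that arithmetic on one value. The names
   sketch_clamp_u8 and sketch_descale_row are hypothetical, and the early
   return for negative sums is an assumption made to keep the sketch free of
   implementation-defined shifts.

```c
#include <assert.h>

// Hypothetical helper: saturate an int to the 0..255 range.
static unsigned char sketch_clamp_u8(int x)
{
   if (x < 0) return 0;
   if (x > 255) return 255;
   return (unsigned char) x;
}

// Hypothetical helper: take a value scaled by 1<<17 and centered on zero,
// add the rounding term and the pre-scaled +128 offset, then shift and clamp.
static unsigned char sketch_descale_row(int scaled)
{
   int biased;
   assert(scaled > -(1 << 30) && scaled < (1 << 30)); // bias must not overflow int
   biased = scaled + 65536 + (128 << 17);
   if (biased < 0) return 0; // avoid shifting a negative value in this sketch
   return sketch_clamp_u8(biased >> 17);
}
```

   For example, an input of 0 maps to 128, and 65536 (exactly half a step)
   rounds up to 129.
*/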

#ifdef STBI_SSE2
// sse2 integer IDCT. not the fastest possible implementation but it
// produces bit-identical results to the generic C version so it's
// fully "transparent".
static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
{
   // This is constructed to match our regular (generic) integer IDCT exactly.
   __m128i row0, row1, row2, row3, row4, row5, row6, row7;
   __m128i tmp;

   // dot product constant: even elems=x, odd elems=y
   #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))

   // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
   // out(1) = c1[even]*x + c1[odd]*y
   #define dct_rot(out0,out1, x,y,c0,c1) \
      __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
      __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
      __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
      __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
      __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
      __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)

   // out = in << 12  (in 16-bit, out 32-bit)
   #define dct_widen(out, in) \
      __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
      __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)

   // wide add
   #define dct_wadd(out, a, b) \
      __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
      __m128i out##_h = _mm_add_epi32(a##_h, b##_h)

   // wide sub
   #define dct_wsub(out, a, b) \
      __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
      __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)

   // butterfly a/b, add bias, then shift by "s" and pack
   #define dct_bfly32o(out0, out1, a,b,bias,s) \
      { \
         __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
         __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
         dct_wadd(sum, abiased, b); \
         dct_wsub(dif, abiased, b); \
         out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
         out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
      }

   // 8-bit interleave step (for transposes)
   #define dct_interleave8(a, b) \
      tmp = a; \
      a = _mm_unpacklo_epi8(a, b); \
      b = _mm_unpackhi_epi8(tmp, b)

   // 16-bit interleave step (for transposes)
   #define dct_interleave16(a, b) \
      tmp = a; \
      a = _mm_unpacklo_epi16(a, b); \
      b = _mm_unpackhi_epi16(tmp, b)

   #define dct_pass(bias,shift) \
      { \
         /* even part */ \
         dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
         __m128i sum04 = _mm_add_epi16(row0, row4); \
         __m128i dif04 = _mm_sub_epi16(row0, row4); \
         dct_widen(t0e, sum04); \
         dct_widen(t1e, dif04); \
         dct_wadd(x0, t0e, t3e); \
         dct_wsub(x3, t0e, t3e); \
         dct_wadd(x1, t1e, t2e); \
         dct_wsub(x2, t1e, t2e); \
         /* odd part */ \
         dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
         dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
         __m128i sum17 = _mm_add_epi16(row1, row7); \
         __m128i sum35 = _mm_add_epi16(row3, row5); \
         dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
         dct_wadd(x4, y0o, y4o); \
         dct_wadd(x5, y1o, y5o); \
         dct_wadd(x6, y2o, y5o); \
         dct_wadd(x7, y3o, y4o); \
         dct_bfly32o(row0,row7, x0,x7,bias,shift); \
         dct_bfly32o(row1,row6, x1,x6,bias,shift); \
         dct_bfly32o(row2,row5, x2,x5,bias,shift); \
         dct_bfly32o(row3,row4, x3,x4,bias,shift); \
      }

   __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
   __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
   __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
   __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
   __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
   __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
   __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
   __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));

   // rounding biases in column/row passes, see stbi__idct_block for explanation.
   __m128i bias_0 = _mm_set1_epi32(512);
   __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));

   // load
   row0 = _mm_load_si128((const __m128i *) (data + 0*8));
   row1 = _mm_load_si128((const __m128i *) (data + 1*8));
   row2 = _mm_load_si128((const __m128i *) (data + 2*8));
   row3 = _mm_load_si128((const __m128i *) (data + 3*8));
   row4 = _mm_load_si128((const __m128i *) (data + 4*8));
   row5 = _mm_load_si128((const __m128i *) (data + 5*8));
   row6 = _mm_load_si128((const __m128i *) (data + 6*8));
   row7 = _mm_load_si128((const __m128i *) (data + 7*8));

   // column pass
   dct_pass(bias_0, 10);

   {
      // 16bit 8x8 transpose pass 1
      dct_interleave16(row0, row4);
      dct_interleave16(row1, row5);
      dct_interleave16(row2, row6);
      dct_interleave16(row3, row7);

      // transpose pass 2
      dct_interleave16(row0, row2);
      dct_interleave16(row1, row3);
      dct_interleave16(row4, row6);
      dct_interleave16(row5, row7);

      // transpose pass 3
      dct_interleave16(row0, row1);
      dct_interleave16(row2, row3);
      dct_interleave16(row4, row5);
      dct_interleave16(row6, row7);
   }

   // row pass
   dct_pass(bias_1, 17);

   {
      // pack
      __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
      __m128i p1 = _mm_packus_epi16(row2, row3);
      __m128i p2 = _mm_packus_epi16(row4, row5);
      __m128i p3 = _mm_packus_epi16(row6, row7);

      // 8bit 8x8 transpose pass 1
      dct_interleave8(p0, p2); // a0e0a1e1...
      dct_interleave8(p1, p3); // c0g0c1g1...

      // transpose pass 2
      dct_interleave8(p0, p1); // a0c0e0g0...
      dct_interleave8(p2, p3); // b0d0f0h0...

      // transpose pass 3
      dct_interleave8(p0, p2); // a0b0c0d0...
      dct_interleave8(p1, p3); // a4b4c4d4...

      // store
      _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
   }

#undef dct_const
#undef dct_rot
#undef dct_widen
#undef dct_wadd
#undef dct_wsub
#undef dct_bfly32o
#undef dct_interleave8
#undef dct_interleave16
#undef dct_pass
}

#endif // STBI_SSE2

#ifdef STBI_NEON

// NEON integer IDCT. should produce bit-identical
// results to the generic C version.
static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
{
   int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;

   int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
   int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
   int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
   int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
   int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
   int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
   int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
   int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
   int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
   int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
   int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
   int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));

#define dct_long_mul(out, inq, coeff) \
   int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
   int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)

#define dct_long_mac(out, acc, inq, coeff) \
   int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
   int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)

#define dct_widen(out, inq) \
   int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
   int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)

// wide add
#define dct_wadd(out, a, b) \
   int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
   int32x4_t out##_h = vaddq_s32(a##_h, b##_h)

// wide sub
#define dct_wsub(out, a, b) \
   int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
   int32x4_t out##_h = vsubq_s32(a##_h, b##_h)

// butterfly a/b, then shift using "shiftop" by "s" and pack
#define dct_bfly32o(out0,out1, a,b,shiftop,s) \
   { \
      dct_wadd(sum, a, b); \
      dct_wsub(dif, a, b); \
      out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
      out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
   }

#define dct_pass(shiftop, shift) \
   { \
      /* even part */ \
      int16x8_t sum26 = vaddq_s16(row2, row6); \
      dct_long_mul(p1e, sum26, rot0_0); \
      dct_long_mac(t2e, p1e, row6, rot0_1); \
      dct_long_mac(t3e, p1e, row2, rot0_2); \
      int16x8_t sum04 = vaddq_s16(row0, row4); \
      int16x8_t dif04 = vsubq_s16(row0, row4); \
      dct_widen(t0e, sum04); \
      dct_widen(t1e, dif04); \
      dct_wadd(x0, t0e, t3e); \
      dct_wsub(x3, t0e, t3e); \
      dct_wadd(x1, t1e, t2e); \
      dct_wsub(x2, t1e, t2e); \
      /* odd part */ \
      int16x8_t sum15 = vaddq_s16(row1, row5); \
      int16x8_t sum17 = vaddq_s16(row1, row7); \
      int16x8_t sum35 = vaddq_s16(row3, row5); \
      int16x8_t sum37 = vaddq_s16(row3, row7); \
      int16x8_t sumodd = vaddq_s16(sum17, sum35); \
      dct_long_mul(p5o, sumodd, rot1_0); \
      dct_long_mac(p1o, p5o, sum17, rot1_1); \
      dct_long_mac(p2o, p5o, sum35, rot1_2); \
      dct_long_mul(p3o, sum37, rot2_0); \
      dct_long_mul(p4o, sum15, rot2_1); \
      dct_wadd(sump13o, p1o, p3o); \
      dct_wadd(sump24o, p2o, p4o); \
      dct_wadd(sump23o, p2o, p3o); \
      dct_wadd(sump14o, p1o, p4o); \
      dct_long_mac(x4, sump13o, row7, rot3_0); \
      dct_long_mac(x5, sump24o, row5, rot3_1); \
      dct_long_mac(x6, sump23o, row3, rot3_2); \
      dct_long_mac(x7, sump14o, row1, rot3_3); \
      dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
      dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
      dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
      dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
   }

   // load
   row0 = vld1q_s16(data + 0*8);
   row1 = vld1q_s16(data + 1*8);
   row2 = vld1q_s16(data + 2*8);
   row3 = vld1q_s16(data + 3*8);
   row4 = vld1q_s16(data + 4*8);
   row5 = vld1q_s16(data + 5*8);
   row6 = vld1q_s16(data + 6*8);
   row7 = vld1q_s16(data + 7*8);

   // add DC bias
   row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));

   // column pass
   dct_pass(vrshrn_n_s32, 10);

   // 16bit 8x8 transpose
   {
// these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
// whether compilers actually get this is another story, sadly.
#define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
#define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
#define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }

      // pass 1
      dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
      dct_trn16(row2, row3);
      dct_trn16(row4, row5);
      dct_trn16(row6, row7);

      // pass 2
      dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
      dct_trn32(row1, row3);
      dct_trn32(row4, row6);
      dct_trn32(row5, row7);

      // pass 3
      dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
      dct_trn64(row1, row5);
      dct_trn64(row2, row6);
      dct_trn64(row3, row7);

#undef dct_trn16
#undef dct_trn32
#undef dct_trn64
   }

   // row pass
   // vrshrn_n_s32 only supports shifts up to 16, we need
   // 17. so do a non-rounding shift of 16 first then follow
   // up with a rounding shift by 1.
   dct_pass(vshrn_n_s32, 16);

   {
      // pack and round
      uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
      uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
      uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
      uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
      uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
      uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
      uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
      uint8x8_t p7 = vqrshrun_n_s16(row7, 1);

      // again, these can translate into one instruction, but often don't.
#define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
#define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
#define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }

      // sadly can't use interleaved stores here since we only write
      // 8 bytes to each scan line!

      // 8x8 8-bit transpose pass 1
      dct_trn8_8(p0, p1);
      dct_trn8_8(p2, p3);
      dct_trn8_8(p4, p5);
      dct_trn8_8(p6, p7);

      // pass 2
      dct_trn8_16(p0, p2);
      dct_trn8_16(p1, p3);
      dct_trn8_16(p4, p6);
      dct_trn8_16(p5, p7);

      // pass 3
      dct_trn8_32(p0, p4);
      dct_trn8_32(p1, p5);
      dct_trn8_32(p2, p6);
      dct_trn8_32(p3, p7);

      // store
      vst1_u8(out, p0); out += out_stride;
      vst1_u8(out, p1); out += out_stride;
      vst1_u8(out, p2); out += out_stride;
      vst1_u8(out, p3); out += out_stride;
      vst1_u8(out, p4); out += out_stride;
      vst1_u8(out, p5); out += out_stride;
      vst1_u8(out, p6); out += out_stride;
      vst1_u8(out, p7);

#undef dct_trn8_8
#undef dct_trn8_16
#undef dct_trn8_32
   }

#undef dct_long_mul
#undef dct_long_mac
#undef dct_widen
#undef dct_wadd
#undef dct_wsub
#undef dct_bfly32o
#undef dct_pass
}

#endif // STBI_NEON

#define STBI__MARKER_none  0xff
// if there's a pending marker from the entropy stream, return that
// otherwise, fetch from the stream and get a marker. if there's no
// marker, return 0xff, which is never a valid marker value
static stbi_uc stbi__get_marker(stbi__jpeg *j)
{
   stbi_uc x;
   if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
   x = stbi__get8(j->s);
   if (x != 0xff) return STBI__MARKER_none;
   while (x == 0xff)
      x = stbi__get8(j->s); // consume repeated 0xff fill bytes
   return x;
}
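/* Illustrative sketch only, not part of stb_image. It mirrors the fetch path
   of the function above on a plain memory buffer: a marker is a 0xff byte,
   optionally followed by more 0xff fill bytes, then the marker code. The
   names SKETCH_MARKER_NONE and sketch_next_marker are hypothetical, there is
   no pending-marker state here, and returning 0 for reads past the end of
   the buffer is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MARKER_NONE 0xff

// Hypothetical reader: returns the marker code found at the current position,
// or SKETCH_MARKER_NONE if the next byte is not 0xff. Advances the position
// past every byte it consumes.
static unsigned char sketch_next_marker(const unsigned char *buf, size_t len, size_t *pos)
{
   unsigned char x;
   assert(buf != NULL && pos != NULL);
   x = (*pos < len) ? buf[(*pos)++] : 0;
   if (x != 0xff) return SKETCH_MARKER_NONE;
   while (x == 0xff)
      x = (*pos < len) ? buf[(*pos)++] : 0; // skip repeated 0xff fill bytes
   return x;
}
```

   Given the bytes ff ff ff d8, the sketch skips the fill bytes and reports
   0xd8 with the position advanced to 4.
*/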

// in each scan, we'll have scan_n components, and the order
// of the components is specified by order[]

// true if x is one of the restart markers RST0..RST7
#define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)

// after a restart interval, reset the entropy decoder and
// the dc prediction
static void stbi__jpeg_reset(stbi__jpeg *j)
{
   j->code_bits = 0;
   j->code_buffer = 0;
   j->nomore = 0;
   j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = j->img_comp[3].dc_pred = 0;
   j->marker = STBI__MARKER_none;
   j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
   j->eob_run = 0;
   // no more than 1<<31 MCUs if no restart_interval? that's plenty safe,
   // since we don't even allow 1<<30 pixels
}

static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
{
   stbi__jpeg_reset(z);
   if (!z->progressive) {
      if (z->scan_n == 1) {
         int i,j;
         STBI_SIMD_ALIGN(short, data[64]);
         int n = z->order[0];
         // non-interleaved data, we just need to process one block at a time,
         // in trivial scanline order
         // number of blocks to do just depends on how many actual "pixels" this
         // component has, independent of interleaved MCU blocking and such
         int w = (z->img_comp[n].x+7) >> 3;
         int h = (z->img_comp[n].y+7) >> 3;
         for (j=0; j < h; ++j) {
            for (i=0; i < w; ++i) {
               int ha = z->img_comp[n].ha;
               if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
               z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
               // every data block is an MCU, so count down the restart interval
               if (--z->todo <= 0) {
                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
                  // if it's NOT a restart, then just bail, so we get corrupt data
                  // rather than no data
                  if (!STBI__RESTART(z->marker)) return 1;
                  stbi__jpeg_reset(z);
               }
            }
         }
         return 1;
      } else { // interleaved
         int i,j,k,x,y;
         STBI_SIMD_ALIGN(short, data[64]);
         for (j=0; j < z->img_mcu_y; ++j) {
            for (i=0; i < z->img_mcu_x; ++i) {
               // scan an interleaved mcu... process scan_n components in order
               for (k=0; k < z->scan_n; ++k) {
                  int n = z->order[k];
                  // scan out an mcu's worth of this component; that's just determined
                  // by the basic H and V specified for the component
                  for (y=0; y < z->img_comp[n].v; ++y) {
                     for (x=0; x < z->img_comp[n].h; ++x) {
                        int x2 = (i*z->img_comp[n].h + x)*8;
                        int y2 = (j*z->img_comp[n].v + y)*8;
                        int ha = z->img_comp[n].ha;
                        if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
                        z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
                     }
                  }
               }
               // after all interleaved components, that's an interleaved MCU,
               // so now count down the restart interval
               if (--z->todo <= 0) {
                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
                  if (!STBI__RESTART(z->marker)) return 1;
                  stbi__jpeg_reset(z);
               }
            }
         }
         return 1;
      }
   } else {
      if (z->scan_n == 1) {
         int i,j;
         int n = z->order[0];
         // non-interleaved data, we just need to process one block at a time,
         // in trivial scanline order
         // number of blocks to do just depends on how many actual "pixels" this
         // component has, independent of interleaved MCU blocking and such
         int w = (z->img_comp[n].x+7) >> 3;
         int h = (z->img_comp[n].y+7) >> 3;
         for (j=0; j < h; ++j) {
            for (i=0; i < w; ++i) {
               short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
               if (z->spec_start == 0) {
                  if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
                     return 0;
               } else {
                  int ha = z->img_comp[n].ha;
                  if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
                     return 0;
               }
               // every data block is an MCU, so count down the restart interval
               if (--z->todo <= 0) {
                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
                  if (!STBI__RESTART(z->marker)) return 1;
                  stbi__jpeg_reset(z);
               }
            }
         }
         return 1;
      } else { // interleaved
         int i,j,k,x,y;
         for (j=0; j < z->img_mcu_y; ++j) {
            for (i=0; i < z->img_mcu_x; ++i) {
               // scan an interleaved mcu... process scan_n components in order
               for (k=0; k < z->scan_n; ++k) {
                  int n = z->order[k];
                  // scan out an mcu's worth of this component; that's just determined
                  // by the basic H and V specified for the component
                  for (y=0; y < z->img_comp[n].v; ++y) {
                     for (x=0; x < z->img_comp[n].h; ++x) {
                        int x2 = (i*z->img_comp[n].h + x);
                        int y2 = (j*z->img_comp[n].v + y);
                        short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
                        if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
                           return 0;
                     }
                  }
               }
               // after all interleaved components, that's an interleaved MCU,
               // so now count down the restart interval
               if (--z->todo <= 0) {
                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
                  if (!STBI__RESTART(z->marker)) return 1;
                  stbi__jpeg_reset(z);
               }
            }
         }
         return 1;
      }
   }
}
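/* Illustrative sketch only, not part of stb_image. The non-interleaved paths
   above size their loops with (x+7)>>3 and address each 8x8 block in the
   output plane with w2*j*8+i*8. The fragment below restates those two
   formulas in isolation. The names sketch_block_count and sketch_block_offset
   are hypothetical, and "stride" stands in for the w2 field.

```c
#include <assert.h>

// Hypothetical helper: number of 8-sample blocks needed to cover `pixels`
// samples, rounding up so a partial block at the edge is still counted.
static int sketch_block_count(int pixels)
{
   assert(pixels > 0);
   return (pixels + 7) >> 3;
}

// Hypothetical helper: offset of the top-left sample of block (i,j) in a
// plane whose rows are `stride` samples apart.
static int sketch_block_offset(int stride, int i, int j)
{
   assert(stride > 0 && i >= 0 && j >= 0);
   return stride * j * 8 + i * 8;
}
```

   A 9-sample-wide component therefore needs two blocks per row, with the
   second block mostly covering padding.
*/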

static void stbi__jpeg_dequantize(short *data, stbi__uint16 *dequant)
{
   int i;
   for (i=0; i < 64; ++i)
      data[i] *= dequant[i];
}

static void stbi__jpeg_finish(stbi__jpeg *z)
{
   if (z->progressive) {
      // dequantize and idct the data
      int i,j,n;
      for (n=0; n < z->s->img_n; ++n) {
         int w = (z->img_comp[n].x+7) >> 3;
         int h = (z->img_comp[n].y+7) >> 3;
         for (j=0; j < h; ++j) {
            for (i=0; i < w; ++i) {
               short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
               stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
               z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
            }
         }
      }
   }
}

static int stbi__process_marker(stbi__jpeg *z, int m)
{
   int L;
   switch (m) {
      case STBI__MARKER_none: // no marker found
         return stbi__err("expected marker","Corrupt JPEG");

      case 0xDD: // DRI - specify restart interval
         if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
         z->restart_interval = stbi__get16be(z->s);
         return 1;

      case 0xDB: // DQT - define quantization table
         L = stbi__get16be(z->s)-2;
         while (L > 0) {
            int q = stbi__get8(z->s);
            int p = q >> 4, sixteen = (p != 0);
            int t = q & 15,i;
            if (p != 0 && p != 1) return stbi__err("bad DQT type","Corrupt JPEG");
            if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");

            for (i=0; i < 64; ++i)
               z->dequant[t][stbi__jpeg_dezigzag[i]] = (stbi__uint16)(sixteen ? stbi__get16be(z->s) : stbi__get8(z->s));
            L -= (sixteen ? 129 : 65);
         }
         return L==0;

      case 0xC4: // DHT - define huffman table
         L = stbi__get16be(z->s)-2;
         while (L > 0) {
            stbi_uc *v;
            int sizes[16],i,n=0;
            int q = stbi__get8(z->s);
            int tc = q >> 4;
            int th = q & 15;
            if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
            for (i=0; i < 16; ++i) {
               sizes[i] = stbi__get8(z->s);
               n += sizes[i];
            }
            if(n > 256) return stbi__err("bad DHT header","Corrupt JPEG"); // Loop over i < n would write past end of values!
            L -= 17;
            if (tc == 0) {
               if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
               v = z->huff_dc[th].values;
            } else {
               if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
               v = z->huff_ac[th].values;
            }
            for (i=0; i < n; ++i)
               v[i] = stbi__get8(z->s);
            if (tc != 0)
               stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
            L -= n;
         }
         return L==0;
   }

   // check for comment block or APP blocks
   if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
      L = stbi__get16be(z->s);
      if (L < 2) {
         if (m == 0xFE)
            return stbi__err("bad COM len","Corrupt JPEG");
         else
            return stbi__err("bad APP len","Corrupt JPEG");
      }
      L -= 2;

      if (m == 0xE0 && L >= 5) { // JFIF APP0 segment
         static const unsigned char tag[5] = {'J','F','I','F','\0'};
         int ok = 1;
         int i;
         for (i=0; i < 5; ++i)
            if (stbi__get8(z->s) != tag[i])
               ok = 0;
         L -= 5;
         if (ok)
            z->jfif = 1;
      } else if (m == 0xEE && L >= 12) { // Adobe APP14 segment
         static const unsigned char tag[6] = {'A','d','o','b','e','\0'};
         int ok = 1;
         int i;
         for (i=0; i < 6; ++i)
            if (stbi__get8(z->s) != tag[i])
               ok = 0;
         L -= 6;
         if (ok) {
            stbi__get8(z->s); // version
            stbi__get16be(z->s); // flags0
            stbi__get16be(z->s); // flags1
            z->app14_color_transform = stbi__get8(z->s); // color transform
            L -= 6;
         }
      }

      stbi__skip(z->s, L);
      return 1;
   }

   return stbi__err("unknown marker","Corrupt JPEG");
}
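/* Illustrative sketch only, not part of stb_image. In the DQT case above,
   each table starts with one byte whose high nibble selects 8-bit or 16-bit
   entries and whose low nibble selects the destination table, and the
   remaining segment length drops by 65 or 129 bytes per table. The fragment
   below isolates that header-byte decoding. The type sketch_dqt_info and the
   function sketch_parse_dqt_byte are hypothetical names.

```c
#include <assert.h>

// Hypothetical result type for this sketch.
typedef struct {
   int precision16; // 1 if entries are 16-bit, 0 if 8-bit
   int table;       // destination table index, 0..3
   int bytes;       // bytes used by this table, including the header byte
} sketch_dqt_info;

// Returns 1 and fills the result on success, 0 if the header byte is invalid.
static int sketch_parse_dqt_byte(int q, sketch_dqt_info *out)
{
   int p, t;
   assert(out != 0 && q >= 0 && q <= 255);
   p = q >> 4;
   t = q & 15;
   if (p != 0 && p != 1) return 0; // only 8-bit and 16-bit entries exist
   if (t > 3) return 0;            // only four table slots
   out->precision16 = p;
   out->table = t;
   out->bytes = p ? 129 : 65;      // 1 header byte + 64 entries of 2 or 1 bytes
   return 1;
}
```

   A header byte of 0x13, for instance, describes a 16-bit table for slot 3
   that occupies 129 bytes of the segment.
*/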
3199
3200// after we see SOS
3201static int stbi__process_scan_header(stbi__jpeg *z)
3202{
3203   int i;
3204   int Ls = stbi__get16be(z->s);
3205   z->scan_n = stbi__get8(z->s);
3206   if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
3207   if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
3208   for (i=0; i < z->scan_n; ++i) {
3209      int id = stbi__get8(z->s), which;
3210      int q = stbi__get8(z->s);
3211      for (which = 0; which < z->s->img_n; ++which)
3212         if (z->img_comp[which].id == id)
3213            break;
3214      if (which == z->s->img_n) return 0; // no match
3215      z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
3216      z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
3217      z->order[i] = which;
3218   }
3219
3220   {
3221      int aa;
3222      z->spec_start = stbi__get8(z->s);
3223      z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
3224      aa = stbi__get8(z->s);
3225      z->succ_high = (aa >> 4);
3226      z->succ_low  = (aa & 15);
3227      if (z->progressive) {
3228         if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
3229            return stbi__err("bad SOS", "Corrupt JPEG");
3230      } else {
3231         if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
3232         if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
3233         z->spec_end = 63;
3234      }
3235   }
3236
3237   return 1;
3238}
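
/* Illustrative sketch, not part of stb_image: the SOS layout validated above
   can be exercised on a plain byte buffer. The names sos_info and parse_sos
   are hypothetical, and the sketch assumes the buffer starts at the 2-byte
   length field that follows the SOS marker.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical result type for this sketch; not an stb_image structure.
typedef struct {
   int ncomp;          // components in this scan (1..4)
   int id[4];          // component selectors
   int dc_table[4];    // high nibble of the table byte
   int ac_table[4];    // low nibble of the table byte
   int spec_start, spec_end, succ_high, succ_low;
} sos_info;

// Parse an SOS segment beginning at its length field.
// Returns 1 on success, 0 if the segment is malformed or truncated.
static int parse_sos(const unsigned char *p, size_t len, sos_info *out)
{
   int i, Ls, n;
   if (len < 3) return 0;
   Ls = (p[0] << 8) | p[1];        // big-endian segment length
   n  = p[2];
   if (n < 1 || n > 4) return 0;
   if (Ls != 6 + 2*n) return 0;    // 2 length + 1 count + 2 per component + 3 trailer
   if (len < (size_t) Ls) return 0;
   out->ncomp = n;
   for (i = 0; i < n; ++i) {
      int q = p[4 + 2*i];
      out->id[i]       = p[3 + 2*i];
      out->dc_table[i] = q >> 4;
      out->ac_table[i] = q & 15;
      if (out->dc_table[i] > 3 || out->ac_table[i] > 3) return 0;
   }
   out->spec_start = p[3 + 2*n];
   out->spec_end   = p[4 + 2*n];
   out->succ_high  = p[5 + 2*n] >> 4;
   out->succ_low   = p[5 + 2*n] & 15;
   return 1;
}
```

   For a three-component scan the length field must be 6+2x3 = 12, which is
   the same relation the Ls check above enforces.
*/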

static int stbi__free_jpeg_components(stbi__jpeg *z, int ncomp, int why)
{
   int i;
   for (i=0; i < ncomp; ++i) {
      if (z->img_comp[i].raw_data) {
         STBI_FREE(z->img_comp[i].raw_data);
         z->img_comp[i].raw_data = NULL;
         z->img_comp[i].data = NULL;
      }
      if (z->img_comp[i].raw_coeff) {
         STBI_FREE(z->img_comp[i].raw_coeff);
         z->img_comp[i].raw_coeff = 0;
         z->img_comp[i].coeff = 0;
      }
      if (z->img_comp[i].linebuf) {
         STBI_FREE(z->img_comp[i].linebuf);
         z->img_comp[i].linebuf = NULL;
      }
   }
   return why;
}

static int stbi__process_frame_header(stbi__jpeg *z, int scan)
{
   stbi__context *s = z->s;
   int Lf,p,i,q, h_max=1,v_max=1,c;
   Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
   p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
   s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
   s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
   if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   c = stbi__get8(s);
   if (c != 3 && c != 1 && c != 4) return stbi__err("bad component count","Corrupt JPEG");
   s->img_n = c;
   for (i=0; i < c; ++i) {
      z->img_comp[i].data = NULL;
      z->img_comp[i].linebuf = NULL;
   }

   if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");

   z->rgb = 0;
   for (i=0; i < s->img_n; ++i) {
      static const unsigned char rgb[3] = { 'R', 'G', 'B' };
      z->img_comp[i].id = stbi__get8(s);
      if (s->img_n == 3 && z->img_comp[i].id == rgb[i])
         ++z->rgb;
      q = stbi__get8(s);
      z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
      z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
      z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
   }

   if (scan != STBI__SCAN_load) return 1;

   if (!stbi__mad3sizes_valid(s->img_x, s->img_y, s->img_n, 0)) return stbi__err("too large", "Image too large to decode");

   for (i=0; i < s->img_n; ++i) {
      if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
      if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
   }

   // check that plane subsampling factors are integer ratios; our resamplers can't deal with fractional ratios
   // and I've never seen a non-corrupted JPEG file actually use them
   for (i=0; i < s->img_n; ++i) {
      if (h_max % z->img_comp[i].h != 0) return stbi__err("bad H","Corrupt JPEG");
      if (v_max % z->img_comp[i].v != 0) return stbi__err("bad V","Corrupt JPEG");
   }

   // compute interleaved mcu info
   z->img_h_max = h_max;
   z->img_v_max = v_max;
   z->img_mcu_w = h_max * 8;
   z->img_mcu_h = v_max * 8;
   // these sizes can't be more than 17 bits
   z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
   z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;

   for (i=0; i < s->img_n; ++i) {
      // number of effective pixels (e.g. for non-interleaved MCU)
      z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
      z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
      // to simplify generation, we'll allocate enough memory to decode
      // the bogus oversized data from using interleaved MCUs and their
      // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
      // discard the extra data until colorspace conversion
      //
      // img_mcu_x, img_mcu_y: <=17 bits; comp[i].h and .v are <=4 (checked earlier)
      // so these muls can't overflow with 32-bit ints (which we require)
      z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
      z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
      z->img_comp[i].coeff = 0;
      z->img_comp[i].raw_coeff = 0;
      z->img_comp[i].linebuf = NULL;
      z->img_comp[i].raw_data = stbi__malloc_mad2(z->img_comp[i].w2, z->img_comp[i].h2, 15);
      if (z->img_comp[i].raw_data == NULL)
         return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
      // align blocks for idct using mmx/sse
      z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
      if (z->progressive) {
         // w2, h2 are multiples of 8 (see above)
         z->img_comp[i].coeff_w = z->img_comp[i].w2 / 8;
         z->img_comp[i].coeff_h = z->img_comp[i].h2 / 8;
         z->img_comp[i].raw_coeff = stbi__malloc_mad3(z->img_comp[i].w2, z->img_comp[i].h2, sizeof(short), 15);
         if (z->img_comp[i].raw_coeff == NULL)
            return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
         z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
      }
   }

   return 1;
}
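
/* Illustrative sketch, not part of stb_image: the MCU rounding and the padded
   plane sizes computed above, reduced to a standalone helper, plus the 16-byte
   pointer round-up used for raw_data and raw_coeff. The names plane_geom,
   plane_geometry and align16 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper type for this sketch.
typedef struct { int mcu_x, mcu_y, w2, h2; } plane_geom;

// img_w, img_h: image size in pixels; h, v: this component's sampling factors;
// h_max, v_max: the largest sampling factors across all components.
static plane_geom plane_geometry(int img_w, int img_h, int h, int v, int h_max, int v_max)
{
   plane_geom g;
   int mcu_w = h_max * 8, mcu_h = v_max * 8;
   g.mcu_x = (img_w + mcu_w - 1) / mcu_w;   // round up to whole MCUs
   g.mcu_y = (img_h + mcu_h - 1) / mcu_h;
   g.w2 = g.mcu_x * h * 8;                  // padded plane width in samples
   g.h2 = g.mcu_y * v * 8;                  // padded plane height in samples
   return g;
}

// Round a pointer up to the next 16-byte boundary; this is why the
// allocations above request 15 extra bytes.
static void *align16(void *p)
{
   return (void *) (((size_t) p + 15) & ~(size_t) 15);
}
```

   With a 33x17 image and 2x2 luma sampling, the MCU is 16x16, so the image
   needs 3x2 MCUs; the luma plane is padded to 48x32 and a 1x1 chroma plane
   to 24x16.
*/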

// use comparisons since in some cases we handle more than one case (e.g. SOF)
#define stbi__DNL(x)         ((x) == 0xdc)
#define stbi__SOI(x)         ((x) == 0xd8)
#define stbi__EOI(x)         ((x) == 0xd9)
#define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
#define stbi__SOS(x)         ((x) == 0xda)

#define stbi__SOF_progressive(x)   ((x) == 0xc2)

static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
{
   int m;
   z->jfif = 0;
   z->app14_color_transform = -1; // valid values are 0,1,2
   z->marker = STBI__MARKER_none; // initialize cached marker to empty
   m = stbi__get_marker(z);
   if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
   if (scan == STBI__SCAN_type) return 1;
   m = stbi__get_marker(z);
   while (!stbi__SOF(m)) {
      if (!stbi__process_marker(z,m)) return 0;
      m = stbi__get_marker(z);
      while (m == STBI__MARKER_none) {
         // some files have extra padding after their blocks, so ok, we'll scan
         if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
         m = stbi__get_marker(z);
      }
   }
   z->progressive = stbi__SOF_progressive(m);
   if (!stbi__process_frame_header(z, scan)) return 0;
   return 1;
}

static int stbi__skip_jpeg_junk_at_end(stbi__jpeg *j)
{
   // some JPEGs have junk at end, skip over it but if we find what looks
   // like a valid marker, resume there
   while (!stbi__at_eof(j->s)) {
      int x = stbi__get8(j->s);
      while (x == 255) { // might be a marker
         if (stbi__at_eof(j->s)) return STBI__MARKER_none;
         x = stbi__get8(j->s);
         if (x != 0x00 && x != 0xff) {
            // not a stuffed zero or lead-in to another marker, looks
            // like an actual marker, return it
            return x;
         }
         // stuffed zero has x=0 now which ends the loop, meaning we go
         // back to regular scan loop.
         // repeated 0xff keeps trying to read the next byte of the marker.
      }
   }
   return STBI__MARKER_none;
}
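
/* Illustrative sketch, not part of stb_image: the same junk-skipping scan over
   an in-memory buffer instead of an stbi__context. find_next_marker and
   MARKER_NONE are hypothetical names; MARKER_NONE is assumed to play the role
   of STBI__MARKER_none.

```c
#include <assert.h>
#include <stddef.h>

#define MARKER_NONE 0xff   // assumption: stands in for STBI__MARKER_none

// Scan buf[*pos..len) for a 0xFF byte followed by a byte X that is neither
// 0x00 (a stuffed zero) nor 0xFF (fill). Returns X with *pos just past it,
// or MARKER_NONE if the buffer ends first.
static int find_next_marker(const unsigned char *buf, size_t len, size_t *pos)
{
   while (*pos < len) {
      int x = buf[(*pos)++];
      while (x == 0xff) {               // might be a marker
         if (*pos >= len) return MARKER_NONE;
         x = buf[(*pos)++];
         if (x != 0x00 && x != 0xff) return x;
         // x == 0x00 leaves the inner loop; x == 0xff tries the next byte
      }
   }
   return MARKER_NONE;
}
```

   In the bytes 12 FF 00 34 FF FF D9 56, the pair FF 00 is skipped as a
   stuffed zero and FF FF D9 is reported as marker 0xD9.
*/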

// decode image to YCbCr format
static int stbi__decode_jpeg_image(stbi__jpeg *j)
{
   int m;
   for (m = 0; m < 4; m++) {
      j->img_comp[m].raw_data = NULL;
      j->img_comp[m].raw_coeff = NULL;
   }
   j->restart_interval = 0;
   if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
   m = stbi__get_marker(j);
   while (!stbi__EOI(m)) {
      if (stbi__SOS(m)) {
         if (!stbi__process_scan_header(j)) return 0;
         if (!stbi__parse_entropy_coded_data(j)) return 0;
         if (j->marker == STBI__MARKER_none) {
            j->marker = stbi__skip_jpeg_junk_at_end(j);
            // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
         }
         m = stbi__get_marker(j);
         if (STBI__RESTART(m))
            m = stbi__get_marker(j);
      } else if (stbi__DNL(m)) {
         int Ld = stbi__get16be(j->s);
         stbi__uint32 NL = stbi__get16be(j->s);
         if (Ld != 4) return stbi__err("bad DNL len", "Corrupt JPEG");
         if (NL != j->s->img_y) return stbi__err("bad DNL height", "Corrupt JPEG");
         m = stbi__get_marker(j);
      } else {
         if (!stbi__process_marker(j, m)) return 1;
         m = stbi__get_marker(j);
      }
   }
   if (j->progressive)
      stbi__jpeg_finish(j);
   return 1;
}

// static jfif-centered resampling (across block boundaries)

typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
                                    int w, int hs);

#define stbi__div4(x) ((stbi_uc) ((x) >> 2))

static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   STBI_NOTUSED(out);
   STBI_NOTUSED(in_far);
   STBI_NOTUSED(w);
   STBI_NOTUSED(hs);
   return in_near;
}

static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate two samples vertically for every one in input
   int i;
   STBI_NOTUSED(hs);
   for (i=0; i < w; ++i)
      out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
   return out;
}

static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate two samples horizontally for every one in input
   int i;
   stbi_uc *input = in_near;

   if (w == 1) {
      // if only one sample, can't do any interpolation
      out[0] = out[1] = input[0];
      return out;
   }

   out[0] = input[0];
   out[1] = stbi__div4(input[0]*3 + input[1] + 2);
   for (i=1; i < w-1; ++i) {
      int n = 3*input[i]+2;
      out[i*2+0] = stbi__div4(n+input[i-1]);
      out[i*2+1] = stbi__div4(n+input[i+1]);
   }
   out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
   out[i*2+1] = input[w-1];

   STBI_NOTUSED(in_far);
   STBI_NOTUSED(hs);

   return out;
}

#define stbi__div16(x) ((stbi_uc) ((x) >> 4))

static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate 2x2 samples for every one in input
   int i,t0,t1;
   if (w == 1) {
      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
      return out;
   }

   t1 = 3*in_near[0] + in_far[0];
   out[0] = stbi__div4(t1+2);
   for (i=1; i < w; ++i) {
      t0 = t1;
      t1 = 3*in_near[i]+in_far[i];
      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   }
   out[w*2-1] = stbi__div4(t1+2);

   STBI_NOTUSED(hs);

   return out;
}
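
/* Illustrative sketch, not part of stb_image: a standalone copy of the 2x2
   upsampling arithmetic above, using plain unsigned char so it can be tried
   outside the decoder. The name upsample_hv2 is hypothetical.

```c
#include <assert.h>

// Produce 2*w output samples from w samples of the nearer and farther source
// rows. Rows are blended 3:1 first, then neighbouring columns 3:1, so an
// interior output sample has effective weights 9,3,3,1 over 16.
static void upsample_hv2(unsigned char *out, const unsigned char *in_near,
                         const unsigned char *in_far, int w)
{
   int i, t0, t1;
   t1 = 3*in_near[0] + in_far[0];              // vertical blend, scaled by 4
   out[0] = (unsigned char) ((t1 + 2) >> 2);   // edge: no left neighbour
   if (w == 1) { out[1] = out[0]; return; }
   for (i = 1; i < w; ++i) {
      t0 = t1;
      t1 = 3*in_near[i] + in_far[i];
      out[2*i-1] = (unsigned char) ((3*t0 + t1 + 8) >> 4);
      out[2*i]   = (unsigned char) ((3*t1 + t0 + 8) >> 4);
   }
   out[2*w-1] = (unsigned char) ((t1 + 2) >> 2);   // edge: no right neighbour
}
```

   A constant input reproduces itself exactly, since (16*c + 8) >> 4 == c
   and (4*c + 2) >> 2 == c for any 8-bit c.
*/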

#if defined(STBI_SSE2) || defined(STBI_NEON)
static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate 2x2 samples for every one in input
   int i=0,t0,t1;

   if (w == 1) {
      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
      return out;
   }

   t1 = 3*in_near[0] + in_far[0];
   // process groups of 8 pixels for as long as we can.
   // note we can't handle the last pixel in a row in this loop
   // because we need to handle the filter boundary conditions.
   for (; i < ((w-1) & ~7); i += 8) {
#if defined(STBI_SSE2)
      // load and perform the vertical filtering pass
      // this uses 3*x + y = 4*x + (y - x)
      __m128i zero  = _mm_setzero_si128();
      __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
      __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
      __m128i farw  = _mm_unpacklo_epi8(farb, zero);
      __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
      __m128i diff  = _mm_sub_epi16(farw, nearw);
      __m128i nears = _mm_slli_epi16(nearw, 2);
      __m128i curr  = _mm_add_epi16(nears, diff); // current row

      // horizontal filter works the same based on shifted versions of the
      // current row. "prev" is current row shifted right by 1 pixel; we need
      // to insert the previous pixel value (from t1).
      // "next" is current row shifted left by 1 pixel, with first pixel
      // of next block of 8 pixels added in.
      __m128i prv0 = _mm_slli_si128(curr, 2);
      __m128i nxt0 = _mm_srli_si128(curr, 2);
      __m128i prev = _mm_insert_epi16(prv0, t1, 0);
      __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);

      // horizontal filter, polyphase implementation since it's convenient:
      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
      // note the shared term.
      __m128i bias  = _mm_set1_epi16(8);
      __m128i curs = _mm_slli_epi16(curr, 2);
      __m128i prvd = _mm_sub_epi16(prev, curr);
      __m128i nxtd = _mm_sub_epi16(next, curr);
      __m128i curb = _mm_add_epi16(curs, bias);
      __m128i even = _mm_add_epi16(prvd, curb);
      __m128i odd  = _mm_add_epi16(nxtd, curb);

      // interleave even and odd pixels, then undo scaling.
      __m128i int0 = _mm_unpacklo_epi16(even, odd);
      __m128i int1 = _mm_unpackhi_epi16(even, odd);
      __m128i de0  = _mm_srli_epi16(int0, 4);
      __m128i de1  = _mm_srli_epi16(int1, 4);

      // pack and write output
      __m128i outv = _mm_packus_epi16(de0, de1);
      _mm_storeu_si128((__m128i *) (out + i*2), outv);
#elif defined(STBI_NEON)
      // load and perform the vertical filtering pass
      // this uses 3*x + y = 4*x + (y - x)
      uint8x8_t farb  = vld1_u8(in_far + i);
      uint8x8_t nearb = vld1_u8(in_near + i);
      int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
      int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
      int16x8_t curr  = vaddq_s16(nears, diff); // current row

      // horizontal filter works the same based on shifted versions of the
      // current row. "prev" is current row shifted right by 1 pixel; we need
      // to insert the previous pixel value (from t1).
      // "next" is current row shifted left by 1 pixel, with first pixel
      // of next block of 8 pixels added in.
      int16x8_t prv0 = vextq_s16(curr, curr, 7);
      int16x8_t nxt0 = vextq_s16(curr, curr, 1);
      int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
      int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);

      // horizontal filter, polyphase implementation since it's convenient:
      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
      // note the shared term.
      int16x8_t curs = vshlq_n_s16(curr, 2);
      int16x8_t prvd = vsubq_s16(prev, curr);
      int16x8_t nxtd = vsubq_s16(next, curr);
      int16x8_t even = vaddq_s16(curs, prvd);
      int16x8_t odd  = vaddq_s16(curs, nxtd);

      // undo scaling and round, then store with even/odd phases interleaved
      uint8x8x2_t o;
      o.val[0] = vqrshrun_n_s16(even, 4);
      o.val[1] = vqrshrun_n_s16(odd,  4);
      vst2_u8(out + i*2, o);
#endif

      // "previous" value for next iter
      t1 = 3*in_near[i+7] + in_far[i+7];
   }

   t0 = t1;
   t1 = 3*in_near[i] + in_far[i];
   out[i*2] = stbi__div16(3*t1 + t0 + 8);

   for (++i; i < w; ++i) {
      t0 = t1;
      t1 = 3*in_near[i]+in_far[i];
      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   }
   out[w*2-1] = stbi__div4(t1+2);

   STBI_NOTUSED(hs);

   return out;
}
#endif

static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // resample with nearest-neighbor
   int i,j;
   STBI_NOTUSED(in_far);
   for (i=0; i < w; ++i)
      for (j=0; j < hs; ++j)
         out[i*hs+j] = in_near[i];
   return out;
}

// this is a reduced-precision calculation of YCbCr-to-RGB introduced
// to make sure the code produces the same results in both SIMD and scalar
#define stbi__float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
{
   int i;
   for (i=0; i < count; ++i) {
      int y_fixed = (y[i] << 20) + (1<<19); // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed +  cr* stbi__float2fixed(1.40200f);
      g = y_fixed + (cr*-stbi__float2fixed(0.71414f)) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
      b = y_fixed                                     +   cb* stbi__float2fixed(1.77200f);
      r >>= 20;
      g >>= 20;
      b >>= 20;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}
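
/* Illustrative sketch, not part of stb_image: the fixed-point conversion above
   applied to a single sample, which makes it easy to try individual values.
   SKETCH_F2F, sketch_clamp and sketch_ycbcr_to_rgb are hypothetical names. The
   sketch assumes two's-complement ints and an arithmetic right shift for
   negative values, as the loop above does.

```c
#include <assert.h>

// Same scaling as stbi__float2fixed: 12 fractional bits, then 8 more.
#define SKETCH_F2F(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)

static unsigned char sketch_clamp(int v)
{
   return (unsigned char) (v < 0 ? 0 : v > 255 ? 255 : v);
}

// Convert one full-range YCbCr sample to RGB using 20 fractional bits.
static void sketch_ycbcr_to_rgb(int y, int cb, int cr, unsigned char rgb[3])
{
   int y_fixed = (y << 20) + (1 << 19);   // +0.5 so the final shift rounds
   int r, g, b, gcb;
   cr -= 128;
   cb -= 128;
   // the Cb contribution to green drops its low 16 bits, as in the source
   gcb = (int) ((unsigned int) (cb * -SKETCH_F2F(0.34414f)) & 0xffff0000u);
   r = y_fixed + cr *  SKETCH_F2F(1.40200f);
   g = y_fixed + cr * -SKETCH_F2F(0.71414f) + gcb;
   b = y_fixed + cb *  SKETCH_F2F(1.77200f);
   rgb[0] = sketch_clamp(r >> 20);
   rgb[1] = sketch_clamp(g >> 20);
   rgb[2] = sketch_clamp(b >> 20);
}
```

   Neutral chroma (Cb = Cr = 128) leaves all three channels equal to Y, which
   is a quick sanity check for any variant of this routine.
*/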

#if defined(STBI_SSE2) || defined(STBI_NEON)
static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
{
   int i = 0;

#ifdef STBI_SSE2
   // step == 3 is pretty ugly on the final interleave, and I'm not convinced
   // it's useful in practice (you wouldn't use it for textures, for example).
   // so just accelerate step == 4 case.
   if (step == 4) {
      // this is a fairly straightforward implementation and not super-optimized.
      __m128i signflip  = _mm_set1_epi8(-0x80);
      __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
      __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
      __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
      __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
      __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
      __m128i xw = _mm_set1_epi16(255); // alpha channel

      for (; i+7 < count; i += 8) {
         // load
         __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
         __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
         __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
         __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
         __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128

         // unpack to short (and left-shift cr, cb by 8)
         __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
         __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
         __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);

         // color transform
         __m128i yws = _mm_srli_epi16(yw, 4);
         __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
         __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
         __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
         __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
         __m128i rws = _mm_add_epi16(cr0, yws);
         __m128i gwt = _mm_add_epi16(cb0, yws);
         __m128i bws = _mm_add_epi16(yws, cb1);
         __m128i gws = _mm_add_epi16(gwt, cr1);

         // descale
         __m128i rw = _mm_srai_epi16(rws, 4);
         __m128i bw = _mm_srai_epi16(bws, 4);
         __m128i gw = _mm_srai_epi16(gws, 4);

         // back to byte, set up for transpose
         __m128i brb = _mm_packus_epi16(rw, bw);
         __m128i gxb = _mm_packus_epi16(gw, xw);

         // transpose to interleave channels
         __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
         __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
         __m128i o0 = _mm_unpacklo_epi16(t0, t1);
         __m128i o1 = _mm_unpackhi_epi16(t0, t1);

         // store
         _mm_storeu_si128((__m128i *) (out + 0), o0);
         _mm_storeu_si128((__m128i *) (out + 16), o1);
         out += 32;
      }
   }
#endif

#ifdef STBI_NEON
   // in this version, step=3 support would be easy to add. but is there demand?
   if (step == 4) {
      // this is a fairly straightforward implementation and not super-optimized.
      uint8x8_t signflip = vdup_n_u8(0x80);
      int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
      int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
      int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
      int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));

      for (; i+7 < count; i += 8) {
         // load
         uint8x8_t y_bytes  = vld1_u8(y + i);
         uint8x8_t cr_bytes = vld1_u8(pcr + i);
         uint8x8_t cb_bytes = vld1_u8(pcb + i);
         int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
         int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));

         // expand to s16
         int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
         int16x8_t crw = vshll_n_s8(cr_biased, 7);
         int16x8_t cbw = vshll_n_s8(cb_biased, 7);

         // color transform
         int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
         int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
         int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
         int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
         int16x8_t rws = vaddq_s16(yws, cr0);
         int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
         int16x8_t bws = vaddq_s16(yws, cb1);

         // undo scaling, round, convert to byte
         uint8x8x4_t o;
         o.val[0] = vqrshrun_n_s16(rws, 4);
         o.val[1] = vqrshrun_n_s16(gws, 4);
         o.val[2] = vqrshrun_n_s16(bws, 4);
         o.val[3] = vdup_n_u8(255);

         // store, interleaving r/g/b/a
         vst4_u8(out, o);
         out += 8*4;
      }
   }
#endif

   for (; i < count; ++i) {
      int y_fixed = (y[i] << 20) + (1<<19); // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed + cr* stbi__float2fixed(1.40200f);
      g = y_fixed + cr*-stbi__float2fixed(0.71414f) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
      b = y_fixed                                   +   cb* stbi__float2fixed(1.77200f);
      r >>= 20;
      g >>= 20;
      b >>= 20;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}
#endif

// set up the kernels
static void stbi__setup_jpeg(stbi__jpeg *j)
{
   j->idct_block_kernel = stbi__idct_block;
   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;

#ifdef STBI_SSE2
   if (stbi__sse2_available()) {
      j->idct_block_kernel = stbi__idct_simd;
      j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
      j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   }
#endif

#ifdef STBI_NEON
   j->idct_block_kernel = stbi__idct_simd;
   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
#endif
}

// clean up the temporary component buffers
static void stbi__cleanup_jpeg(stbi__jpeg *j)
{
   stbi__free_jpeg_components(j, j->s->img_n, 0);
}

typedef struct
{
   resample_row_func resample;
   stbi_uc *line0,*line1;
   int hs,vs;   // expansion factor in each axis
   int w_lores; // horizontal pixels pre-expansion
   int ystep;   // how far through vertical expansion we are
   int ypos;    // which pre-expansion row we're on
} stbi__resample;

// fast 0..255 * 0..255 => 0..255 rounded multiplication
static stbi_uc stbi__blinn_8x8(stbi_uc x, stbi_uc y)
{
   unsigned int t = x*y + 128;
   return (stbi_uc) ((t + (t >>8)) >> 8);
}
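
/* Illustrative sketch, not part of stb_image: the shift-based multiply above
   compared against a plain rounded division by 255. The names mul8,
   mul8_reference and mul8_matches_reference are hypothetical.

```c
#include <assert.h>

// Same arithmetic as stbi__blinn_8x8.
static unsigned char mul8(unsigned char x, unsigned char y)
{
   unsigned int t = x*y + 128;
   return (unsigned char) ((t + (t >> 8)) >> 8);
}

// Reference: x*y/255 rounded to nearest, using an integer division.
static unsigned char mul8_reference(unsigned char x, unsigned char y)
{
   return (unsigned char) ((2*x*y + 255) / 510);
}

// Returns 1 if both versions agree on all 65536 input pairs.
static int mul8_matches_reference(void)
{
   int x, y;
   for (x = 0; x < 256; ++x)
      for (y = 0; y < 256; ++y)
         if (mul8((unsigned char) x, (unsigned char) y) !=
             mul8_reference((unsigned char) x, (unsigned char) y))
            return 0;
   return 1;
}
```

   Treating 255 as 1.0, multiplying by 255 returns the other operand and
   multiplying by 0 returns 0, which is what the CMYK paths below rely on.
*/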

static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
{
   int n, decode_n, is_rgb;
   z->s->img_n = 0; // make stbi__cleanup_jpeg safe

   // validate req_comp
   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");

   // load a jpeg image from whichever source, but leave in YCbCr format
   if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }

   // determine actual number of components to generate
   n = req_comp ? req_comp : z->s->img_n >= 3 ? 3 : 1;

   is_rgb = z->s->img_n == 3 && (z->rgb == 3 || (z->app14_color_transform == 0 && !z->jfif));

   if (z->s->img_n == 3 && n < 3 && !is_rgb)
      decode_n = 1;
   else
      decode_n = z->s->img_n;

   // nothing to do if no components requested; check this now to avoid
   // accessing uninitialized coutput[0] later
   if (decode_n <= 0) { stbi__cleanup_jpeg(z); return NULL; }

   // resample and color-convert
   {
      int k;
      unsigned int i,j;
      stbi_uc *output;
      stbi_uc *coutput[4] = { NULL, NULL, NULL, NULL };

      stbi__resample res_comp[4];

      for (k=0; k < decode_n; ++k) {
         stbi__resample *r = &res_comp[k];

         // allocate line buffer big enough for upsampling off the edges
         // with upsample factor of 4
         z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
         if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }

         r->hs      = z->img_h_max / z->img_comp[k].h;
         r->vs      = z->img_v_max / z->img_comp[k].v;
         r->ystep   = r->vs >> 1;
         r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
         r->ypos    = 0;
         r->line0   = r->line1 = z->img_comp[k].data;

         if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
         else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
         else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
         else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
         else                               r->resample = stbi__resample_row_generic;
      }

      // this is the last allocation; nothing after it can fail
      output = (stbi_uc *) stbi__malloc_mad3(n, z->s->img_x, z->s->img_y, 1);
      if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }

      // now go ahead and resample
3924      for (j=0; j < z->s->img_y; ++j) {
3925         stbi_uc *out = output + n * z->s->img_x * j;
3926         for (k=0; k < decode_n; ++k) {
3927            stbi__resample *r = &res_comp[k];
3928            int y_bot = r->ystep >= (r->vs >> 1);
3929            coutput[k] = r->resample(z->img_comp[k].linebuf,
3930                                     y_bot ? r->line1 : r->line0,
3931                                     y_bot ? r->line0 : r->line1,
3932                                     r->w_lores, r->hs);
3933            if (++r->ystep >= r->vs) {
3934               r->ystep = 0;
3935               r->line0 = r->line1;
3936               if (++r->ypos < z->img_comp[k].y)
3937                  r->line1 += z->img_comp[k].w2;
3938            }
3939         }
3940         if (n >= 3) {
3941            stbi_uc *y = coutput[0];
3942            if (z->s->img_n == 3) {
3943               if (is_rgb) {
3944                  for (i=0; i < z->s->img_x; ++i) {
3945                     out[0] = y[i];
3946                     out[1] = coutput[1][i];
3947                     out[2] = coutput[2][i];
3948                     out[3] = 255;
3949                     out += n;
3950                  }
3951               } else {
3952                  z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3953               }
3954            } else if (z->s->img_n == 4) {
3955               if (z->app14_color_transform == 0) { // CMYK
3956                  for (i=0; i < z->s->img_x; ++i) {
3957                     stbi_uc m = coutput[3][i];
3958                     out[0] = stbi__blinn_8x8(coutput[0][i], m);
3959                     out[1] = stbi__blinn_8x8(coutput[1][i], m);
3960                     out[2] = stbi__blinn_8x8(coutput[2][i], m);
3961                     out[3] = 255;
3962                     out += n;
3963                  }
3964               } else if (z->app14_color_transform == 2) { // YCCK
3965                  z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3966                  for (i=0; i < z->s->img_x; ++i) {
3967                     stbi_uc m = coutput[3][i];
3968                     out[0] = stbi__blinn_8x8(255 - out[0], m);
3969                     out[1] = stbi__blinn_8x8(255 - out[1], m);
3970                     out[2] = stbi__blinn_8x8(255 - out[2], m);
3971                     out += n;
3972                  }
3973               } else { // YCbCr + alpha?  Ignore the fourth channel for now
3974                  z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3975               }
3976            } else
3977               for (i=0; i < z->s->img_x; ++i) {
3978                  out[0] = out[1] = out[2] = y[i];
3979                  out[3] = 255; // not used if n==3
3980                  out += n;
3981               }
3982         } else {
3983            if (is_rgb) {
3984               if (n == 1)
3985                  for (i=0; i < z->s->img_x; ++i)
3986                     *out++ = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
3987               else {
3988                  for (i=0; i < z->s->img_x; ++i, out += 2) {
3989                     out[0] = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
3990                     out[1] = 255;
3991                  }
3992               }
3993            } else if (z->s->img_n == 4 && z->app14_color_transform == 0) {
3994               for (i=0; i < z->s->img_x; ++i) {
3995                  stbi_uc m = coutput[3][i];
3996                  stbi_uc r = stbi__blinn_8x8(coutput[0][i], m);
3997                  stbi_uc g = stbi__blinn_8x8(coutput[1][i], m);
3998                  stbi_uc b = stbi__blinn_8x8(coutput[2][i], m);
3999                  out[0] = stbi__compute_y(r, g, b);
4000                  out[1] = 255;
4001                  out += n;
4002               }
4003            } else if (z->s->img_n == 4 && z->app14_color_transform == 2) {
4004               for (i=0; i < z->s->img_x; ++i) {
4005                  out[0] = stbi__blinn_8x8(255 - coutput[0][i], coutput[3][i]);
4006                  out[1] = 255;
4007                  out += n;
4008               }
4009            } else {
4010               stbi_uc *y = coutput[0];
4011               if (n == 1)
4012                  for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
4013               else
4014                  for (i=0; i < z->s->img_x; ++i) { *out++ = y[i]; *out++ = 255; }
4015            }
4016         }
4017      }
4018      stbi__cleanup_jpeg(z);
4019      *out_x = z->s->img_x;
4020      *out_y = z->s->img_y;
4021      if (comp) *comp = z->s->img_n >= 3 ? 3 : 1; // report original components, not output
4022      return output;
4023   }
4024}
4025
4026static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
4027{
4028   unsigned char* result;
4029   stbi__jpeg* j = (stbi__jpeg*) stbi__malloc(sizeof(stbi__jpeg));
4030   if (!j) return stbi__errpuc("outofmem", "Out of memory");
4031   memset(j, 0, sizeof(stbi__jpeg));
4032   STBI_NOTUSED(ri);
4033   j->s = s;
4034   stbi__setup_jpeg(j);
4035   result = load_jpeg_image(j, x,y,comp,req_comp);
4036   STBI_FREE(j);
4037   return result;
4038}
4039
4040static int stbi__jpeg_test(stbi__context *s)
4041{
4042   int r;
4043   stbi__jpeg* j = (stbi__jpeg*)stbi__malloc(sizeof(stbi__jpeg));
4044   if (!j) return stbi__err("outofmem", "Out of memory");
4045   memset(j, 0, sizeof(stbi__jpeg));
4046   j->s = s;
4047   stbi__setup_jpeg(j);
4048   r = stbi__decode_jpeg_header(j, STBI__SCAN_type);
4049   stbi__rewind(s);
4050   STBI_FREE(j);
4051   return r;
4052}
4053
4054static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
4055{
4056   if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
4057      stbi__rewind( j->s );
4058      return 0;
4059   }
4060   if (x) *x = j->s->img_x;
4061   if (y) *y = j->s->img_y;
4062   if (comp) *comp = j->s->img_n >= 3 ? 3 : 1;
4063   return 1;
4064}
4065
4066static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
4067{
4068   int result;
4069   stbi__jpeg* j = (stbi__jpeg*) (stbi__malloc(sizeof(stbi__jpeg)));
4070   if (!j) return stbi__err("outofmem", "Out of memory");
4071   memset(j, 0, sizeof(stbi__jpeg));
4072   j->s = s;
4073   result = stbi__jpeg_info_raw(j, x, y, comp);
4074   STBI_FREE(j);
4075   return result;
4076}
4077#endif
4078
4079// public domain zlib decode    v0.2  Sean Barrett 2006-11-18
4080//    simple implementation
4081//      - all input must be provided in an upfront buffer
4082//      - all output is written to a single output buffer (can malloc/realloc)
4083//    performance
4084//      - fast huffman
4085
4086#ifndef STBI_NO_ZLIB
4087
4088// fast-way is faster to check than jpeg huffman, but slow way is slower
4089#define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
4090#define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)
4091#define STBI__ZNSYMS 288 // number of symbols in literal/length alphabet
4092
4093// zlib-style huffman encoding
// (JPEG packs bits from the left, zlib from the right, so the code can't be shared)
4095typedef struct
4096{
4097   stbi__uint16 fast[1 << STBI__ZFAST_BITS];
4098   stbi__uint16 firstcode[16];
4099   int maxcode[17];
4100   stbi__uint16 firstsymbol[16];
4101   stbi_uc  size[STBI__ZNSYMS];
4102   stbi__uint16 value[STBI__ZNSYMS];
4103} stbi__zhuffman;
4104
4105stbi_inline static int stbi__bitreverse16(int n)
4106{
4107  n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
4108  n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
4109  n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
4110  n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
4111  return n;
4112}
4113
4114stbi_inline static int stbi__bit_reverse(int v, int bits)
4115{
4116   STBI_ASSERT(bits <= 16);
4117   // to bit reverse n bits, reverse 16 and shift
4118   // e.g. 11 bits, bit reverse and shift away 5
4119   return stbi__bitreverse16(v) >> (16-bits);
4120}
4121
4122static int stbi__zbuild_huffman(stbi__zhuffman *z, const stbi_uc *sizelist, int num)
4123{
4124   int i,k=0;
4125   int code, next_code[16], sizes[17];
4126
4127   // DEFLATE spec for generating codes
4128   memset(sizes, 0, sizeof(sizes));
4129   memset(z->fast, 0, sizeof(z->fast));
4130   for (i=0; i < num; ++i)
4131      ++sizes[sizelist[i]];
4132   sizes[0] = 0;
4133   for (i=1; i < 16; ++i)
4134      if (sizes[i] > (1 << i))
4135         return stbi__err("bad sizes", "Corrupt PNG");
4136   code = 0;
4137   for (i=1; i < 16; ++i) {
4138      next_code[i] = code;
4139      z->firstcode[i] = (stbi__uint16) code;
4140      z->firstsymbol[i] = (stbi__uint16) k;
4141      code = (code + sizes[i]);
4142      if (sizes[i])
4143         if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
4144      z->maxcode[i] = code << (16-i); // preshift for inner loop
4145      code <<= 1;
4146      k += sizes[i];
4147   }
4148   z->maxcode[16] = 0x10000; // sentinel
4149   for (i=0; i < num; ++i) {
4150      int s = sizelist[i];
4151      if (s) {
4152         int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
4153         stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
4154         z->size [c] = (stbi_uc     ) s;
4155         z->value[c] = (stbi__uint16) i;
4156         if (s <= STBI__ZFAST_BITS) {
4157            int j = stbi__bit_reverse(next_code[s],s);
4158            while (j < (1 << STBI__ZFAST_BITS)) {
4159               z->fast[j] = fastv;
4160               j += (1 << s);
4161            }
4162         }
4163         ++next_code[s];
4164      }
4165   }
4166   return 1;
4167}
4168
4169// zlib-from-memory implementation for PNG reading
4170//    because PNG allows splitting the zlib stream arbitrarily,
4171//    and it's annoying structurally to have PNG call ZLIB call PNG,
//    we require PNG to read all the IDATs and combine them into a single
4173//    memory buffer
4174
4175typedef struct
4176{
4177   stbi_uc *zbuffer, *zbuffer_end;
4178   int num_bits;
4179   stbi__uint32 code_buffer;
4180
4181   char *zout;
4182   char *zout_start;
4183   char *zout_end;
4184   int   z_expandable;
4185
4186   stbi__zhuffman z_length, z_distance;
4187} stbi__zbuf;
4188
4189stbi_inline static int stbi__zeof(stbi__zbuf *z)
4190{
4191   return (z->zbuffer >= z->zbuffer_end);
4192}
4193
4194stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
4195{
4196   return stbi__zeof(z) ? 0 : *z->zbuffer++;
4197}
4198
4199static void stbi__fill_bits(stbi__zbuf *z)
4200{
4201   do {
4202      if (z->code_buffer >= (1U << z->num_bits)) {
4203        z->zbuffer = z->zbuffer_end;  /* treat this as EOF so we fail. */
4204        return;
4205      }
4206      z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
4207      z->num_bits += 8;
4208   } while (z->num_bits <= 24);
4209}
4210
4211stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
4212{
4213   unsigned int k;
4214   if (z->num_bits < n) stbi__fill_bits(z);
4215   k = z->code_buffer & ((1 << n) - 1);
4216   z->code_buffer >>= n;
4217   z->num_bits -= n;
4218   return k;
4219}
4220
4221static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
4222{
4223   int b,s,k;
4224   // not resolved by fast table, so compute it the slow way
4225   // use jpeg approach, which requires MSbits at top
4226   k = stbi__bit_reverse(a->code_buffer, 16);
4227   for (s=STBI__ZFAST_BITS+1; ; ++s)
4228      if (k < z->maxcode[s])
4229         break;
4230   if (s >= 16) return -1; // invalid code!
4231   // code size is s, so:
4232   b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
4233   if (b >= STBI__ZNSYMS) return -1; // some data was corrupt somewhere!
4234   if (z->size[b] != s) return -1;  // was originally an assert, but report failure instead.
4235   a->code_buffer >>= s;
4236   a->num_bits -= s;
4237   return z->value[b];
4238}
4239
4240stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
4241{
4242   int b,s;
4243   if (a->num_bits < 16) {
4244      if (stbi__zeof(a)) {
4245         return -1;   /* report error for unexpected end of data. */
4246      }
4247      stbi__fill_bits(a);
4248   }
4249   b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
4250   if (b) {
4251      s = b >> 9;
4252      a->code_buffer >>= s;
4253      a->num_bits -= s;
4254      return b & 511;
4255   }
4256   return stbi__zhuffman_decode_slowpath(a, z);
4257}
4258
4259static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
4260{
4261   char *q;
4262   unsigned int cur, limit, old_limit;
4263   z->zout = zout;
4264   if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
4265   cur   = (unsigned int) (z->zout - z->zout_start);
4266   limit = old_limit = (unsigned) (z->zout_end - z->zout_start);
4267   if (UINT_MAX - cur < (unsigned) n) return stbi__err("outofmem", "Out of memory");
4268   while (cur + n > limit) {
4269      if(limit > UINT_MAX / 2) return stbi__err("outofmem", "Out of memory");
4270      limit *= 2;
4271   }
4272   q = (char *) STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
4273   STBI_NOTUSED(old_limit);
4274   if (q == NULL) return stbi__err("outofmem", "Out of memory");
4275   z->zout_start = q;
4276   z->zout       = q + cur;
4277   z->zout_end   = q + limit;
4278   return 1;
4279}
4280
4281static const int stbi__zlength_base[31] = {
4282   3,4,5,6,7,8,9,10,11,13,
4283   15,17,19,23,27,31,35,43,51,59,
4284   67,83,99,115,131,163,195,227,258,0,0 };
4285
4286static const int stbi__zlength_extra[31]=
4287{ 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
4288
4289static const int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
4290257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};
4291
4292static const int stbi__zdist_extra[32] =
4293{ 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
4294
4295static int stbi__parse_huffman_block(stbi__zbuf *a)
4296{
4297   char *zout = a->zout;
4298   for(;;) {
4299      int z = stbi__zhuffman_decode(a, &a->z_length);
4300      if (z < 256) {
4301         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
4302         if (zout >= a->zout_end) {
4303            if (!stbi__zexpand(a, zout, 1)) return 0;
4304            zout = a->zout;
4305         }
4306         *zout++ = (char) z;
4307      } else {
4308         stbi_uc *p;
4309         int len,dist;
4310         if (z == 256) {
4311            a->zout = zout;
4312            return 1;
4313         }
4314         if (z >= 286) return stbi__err("bad huffman code","Corrupt PNG"); // per DEFLATE, length codes 286 and 287 must not appear in compressed data
4315         z -= 257;
4316         len = stbi__zlength_base[z];
4317         if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
4318         z = stbi__zhuffman_decode(a, &a->z_distance);
4319         if (z < 0 || z >= 30) return stbi__err("bad huffman code","Corrupt PNG"); // per DEFLATE, distance codes 30 and 31 must not appear in compressed data
4320         dist = stbi__zdist_base[z];
4321         if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
4322         if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
4323         if (zout + len > a->zout_end) {
4324            if (!stbi__zexpand(a, zout, len)) return 0;
4325            zout = a->zout;
4326         }
4327         p = (stbi_uc *) (zout - dist);
4328         if (dist == 1) { // run of one byte; common in images.
4329            stbi_uc v = *p;
4330            if (len) { do *zout++ = v; while (--len); }
4331         } else {
4332            if (len) { do *zout++ = *p++; while (--len); }
4333         }
4334      }
4335   }
4336}
4337
4338static int stbi__compute_huffman_codes(stbi__zbuf *a)
4339{
4340   static const stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
4341   stbi__zhuffman z_codelength;
   stbi_uc lencodes[286+32+137]; // 286 literal/length + 32 distance code lengths, plus padding for the longest single repeat op
4343   stbi_uc codelength_sizes[19];
4344   int i,n;
4345
4346   int hlit  = stbi__zreceive(a,5) + 257;
4347   int hdist = stbi__zreceive(a,5) + 1;
4348   int hclen = stbi__zreceive(a,4) + 4;
4349   int ntot  = hlit + hdist;
4350
4351   memset(codelength_sizes, 0, sizeof(codelength_sizes));
4352   for (i=0; i < hclen; ++i) {
4353      int s = stbi__zreceive(a,3);
4354      codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
4355   }
4356   if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
4357
4358   n = 0;
4359   while (n < ntot) {
4360      int c = stbi__zhuffman_decode(a, &z_codelength);
4361      if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
4362      if (c < 16)
4363         lencodes[n++] = (stbi_uc) c;
4364      else {
4365         stbi_uc fill = 0;
4366         if (c == 16) {
4367            c = stbi__zreceive(a,2)+3;
4368            if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG");
4369            fill = lencodes[n-1];
4370         } else if (c == 17) {
4371            c = stbi__zreceive(a,3)+3;
4372         } else if (c == 18) {
4373            c = stbi__zreceive(a,7)+11;
4374         } else {
4375            return stbi__err("bad codelengths", "Corrupt PNG");
4376         }
4377         if (ntot - n < c) return stbi__err("bad codelengths", "Corrupt PNG");
4378         memset(lencodes+n, fill, c);
4379         n += c;
4380      }
4381   }
4382   if (n != ntot) return stbi__err("bad codelengths","Corrupt PNG");
4383   if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
4384   if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
4385   return 1;
4386}
4387
4388static int stbi__parse_uncompressed_block(stbi__zbuf *a)
4389{
4390   stbi_uc header[4];
4391   int len,nlen,k;
4392   if (a->num_bits & 7)
4393      stbi__zreceive(a, a->num_bits & 7); // discard
4394   // drain the bit-packed data into header
4395   k = 0;
4396   while (a->num_bits > 0) {
4397      header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
4398      a->code_buffer >>= 8;
4399      a->num_bits -= 8;
4400   }
4401   if (a->num_bits < 0) return stbi__err("zlib corrupt","Corrupt PNG");
4402   // now fill header the normal way
4403   while (k < 4)
4404      header[k++] = stbi__zget8(a);
4405   len  = header[1] * 256 + header[0];
4406   nlen = header[3] * 256 + header[2];
4407   if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
4408   if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
4409   if (a->zout + len > a->zout_end)
4410      if (!stbi__zexpand(a, a->zout, len)) return 0;
4411   memcpy(a->zout, a->zbuffer, len);
4412   a->zbuffer += len;
4413   a->zout += len;
4414   return 1;
4415}
4416
4417static int stbi__parse_zlib_header(stbi__zbuf *a)
4418{
4419   int cmf   = stbi__zget8(a);
4420   int cm    = cmf & 15;
4421   /* int cinfo = cmf >> 4; */
4422   int flg   = stbi__zget8(a);
4423   if (stbi__zeof(a)) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
4424   if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
4425   if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
4426   if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
4427   // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
4428   return 1;
4429}
4430
4431static const stbi_uc stbi__zdefault_length[STBI__ZNSYMS] =
4432{
4433   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
4434   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
4435   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
4436   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
4437   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
4438   9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
4439   9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
4440   9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
4441   7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8
4442};
4443static const stbi_uc stbi__zdefault_distance[32] =
4444{
4445   5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
4446};
4447/*
4448Init algorithm:
4449{
4450   int i;   // use <= to match clearly with spec
4451   for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
4452   for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
4453   for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
4454   for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;
4455
4456   for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
4457}
4458*/
4459
4460static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
4461{
4462   int final, type;
4463   if (parse_header)
4464      if (!stbi__parse_zlib_header(a)) return 0;
4465   a->num_bits = 0;
4466   a->code_buffer = 0;
4467   do {
4468      final = stbi__zreceive(a,1);
4469      type = stbi__zreceive(a,2);
4470      if (type == 0) {
4471         if (!stbi__parse_uncompressed_block(a)) return 0;
4472      } else if (type == 3) {
         return 0; // block type 3 is reserved in DEFLATE, so the stream is invalid
4474      } else {
4475         if (type == 1) {
4476            // use fixed code lengths
4477            if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , STBI__ZNSYMS)) return 0;
4478            if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
4479         } else {
4480            if (!stbi__compute_huffman_codes(a)) return 0;
4481         }
4482         if (!stbi__parse_huffman_block(a)) return 0;
4483      }
4484   } while (!final);
4485   return 1;
4486}
4487
4488static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
4489{
4490   a->zout_start = obuf;
4491   a->zout       = obuf;
4492   a->zout_end   = obuf + olen;
4493   a->z_expandable = exp;
4494
4495   return stbi__parse_zlib(a, parse_header);
4496}
4497
4498STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
4499{
4500   stbi__zbuf a;
4501   char *p = (char *) stbi__malloc(initial_size);
4502   if (p == NULL) return NULL;
4503   a.zbuffer = (stbi_uc *) buffer;
4504   a.zbuffer_end = (stbi_uc *) buffer + len;
4505   if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
4506      if (outlen) *outlen = (int) (a.zout - a.zout_start);
4507      return a.zout_start;
4508   } else {
4509      STBI_FREE(a.zout_start);
4510      return NULL;
4511   }
4512}
4513
4514STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
4515{
4516   return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
4517}
4518
4519STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
4520{
4521   stbi__zbuf a;
4522   char *p = (char *) stbi__malloc(initial_size);
4523   if (p == NULL) return NULL;
4524   a.zbuffer = (stbi_uc *) buffer;
4525   a.zbuffer_end = (stbi_uc *) buffer + len;
4526   if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
4527      if (outlen) *outlen = (int) (a.zout - a.zout_start);
4528      return a.zout_start;
4529   } else {
4530      STBI_FREE(a.zout_start);
4531      return NULL;
4532   }
4533}
4534
4535STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
4536{
4537   stbi__zbuf a;
4538   a.zbuffer = (stbi_uc *) ibuffer;
4539   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
4540   if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
4541      return (int) (a.zout - a.zout_start);
4542   else
4543      return -1;
4544}
4545
4546STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
4547{
4548   stbi__zbuf a;
4549   char *p = (char *) stbi__malloc(16384);
4550   if (p == NULL) return NULL;
4551   a.zbuffer = (stbi_uc *) buffer;
4552   a.zbuffer_end = (stbi_uc *) buffer+len;
4553   if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
4554      if (outlen) *outlen = (int) (a.zout - a.zout_start);
4555      return a.zout_start;
4556   } else {
4557      STBI_FREE(a.zout_start);
4558      return NULL;
4559   }
4560}
4561
4562STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
4563{
4564   stbi__zbuf a;
4565   a.zbuffer = (stbi_uc *) ibuffer;
4566   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
4567   if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
4568      return (int) (a.zout - a.zout_start);
4569   else
4570      return -1;
4571}
4572#endif
4573
4574// public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
4575//    simple implementation
//      - 1/2/4/8/16-bit samples
4577//      - no CRC checking
4578//      - allocates lots of intermediate memory
4579//        - avoids problem of streaming data between subsystems
4580//        - avoids explicit window management
4581//    performance
4582//      - uses stb_zlib, a PD zlib implementation with fast huffman decoding
4583
4584#ifndef STBI_NO_PNG
4585typedef struct
4586{
4587   stbi__uint32 length;
4588   stbi__uint32 type;
4589} stbi__pngchunk;
4590
4591static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
4592{
4593   stbi__pngchunk c;
4594   c.length = stbi__get32be(s);
4595   c.type   = stbi__get32be(s);
4596   return c;
4597}
4598
4599static int stbi__check_png_header(stbi__context *s)
4600{
4601   static const stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
4602   int i;
4603   for (i=0; i < 8; ++i)
4604      if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
4605   return 1;
4606}
4607
4608typedef struct
4609{
4610   stbi__context *s;
4611   stbi_uc *idata, *expanded, *out;
4612   int depth;
4613} stbi__png;
4614
4615
4616enum {
4617   STBI__F_none=0,
4618   STBI__F_sub=1,
4619   STBI__F_up=2,
4620   STBI__F_avg=3,
4621   STBI__F_paeth=4,
4622   // synthetic filters used for first scanline to avoid needing a dummy row of 0s
4623   STBI__F_avg_first,
4624   STBI__F_paeth_first
4625};
4626
4627static stbi_uc first_row_filter[5] =
4628{
4629   STBI__F_none,
4630   STBI__F_sub,
4631   STBI__F_none,
4632   STBI__F_avg_first,
4633   STBI__F_paeth_first
4634};
4635
4636static int stbi__paeth(int a, int b, int c)
4637{
4638   int p = a + b - c;
4639   int pa = abs(p-a);
4640   int pb = abs(p-b);
4641   int pc = abs(p-c);
4642   if (pa <= pb && pa <= pc) return a;
4643   if (pb <= pc) return b;
4644   return c;
4645}
4646
4647static const stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
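
/*
   Illustration only. The nonzero entries of stbi__depth_scale_table are the
   factors that stretch a 1-, 2-, 4- or 8-bit gray sample to the full 0..255
   range, that is 255 / (2^depth - 1). A sketch that computes the same factors,
   under a hypothetical name:

```c
// Scale factor mapping a depth-bit gray sample onto 0..255.
// Valid for depth 1, 2, 4 and 8, where (2^depth - 1) divides 255 exactly.
static unsigned char sketch_depth_scale(int depth)
{
   return (unsigned char)(255 / ((1 << depth) - 1));
}
```

   A 2-bit sample of 3 becomes 3 * 0x55 = 255, and a 4-bit sample of 8 becomes
   8 * 0x11 = 136.
*/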
4648
// create the png image from the inflated (decompressed) data
4650static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
4651{
4652   int bytes = (depth == 16? 2 : 1);
4653   stbi__context *s = a->s;
4654   stbi__uint32 i,j,stride = x*out_n*bytes;
4655   stbi__uint32 img_len, img_width_bytes;
4656   int k;
4657   int img_n = s->img_n; // copy it into a local for later
4658
4659   int output_bytes = out_n*bytes;
4660   int filter_bytes = img_n*bytes;
4661   int width = x;
4662
4663   STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
   a->out = (stbi_uc *) stbi__malloc_mad3(x, y, output_bytes, 0); // no extra padding bytes; the sub-8-bit expansion below clamps its final writes
4665   if (!a->out) return stbi__err("outofmem", "Out of memory");
4666
4667   if (!stbi__mad3sizes_valid(img_n, x, depth, 7)) return stbi__err("too large", "Corrupt PNG");
4668   img_width_bytes = (((img_n * x * depth) + 7) >> 3);
4669   img_len = (img_width_bytes + 1) * y;
4670
4671   // we used to check for exact match between raw_len and img_len on non-interlaced PNGs,
4672   // but issue #276 reported a PNG in the wild that had extra data at the end (all zeros),
4673   // so just check for raw_len < img_len always.
4674   if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
4675
4676   for (j=0; j < y; ++j) {
4677      stbi_uc *cur = a->out + stride*j;
4678      stbi_uc *prior;
4679      int filter = *raw++;
4680
4681      if (filter > 4)
4682         return stbi__err("invalid filter","Corrupt PNG");
4683
4684      if (depth < 8) {
4685         if (img_width_bytes > x) return stbi__err("invalid width","Corrupt PNG");
         cur += x*out_n - img_width_bytes; // store output to the rightmost img_width_bytes bytes, so we can decode in place
4687         filter_bytes = 1;
4688         width = img_width_bytes;
4689      }
4690      prior = cur - stride; // bugfix: need to compute this after 'cur +=' computation above
4691
4692      // if first row, use special filter that doesn't sample previous row
4693      if (j == 0) filter = first_row_filter[filter];
4694
4695      // handle first byte explicitly
4696      for (k=0; k < filter_bytes; ++k) {
4697         switch (filter) {
4698            case STBI__F_none       : cur[k] = raw[k]; break;
4699            case STBI__F_sub        : cur[k] = raw[k]; break;
4700            case STBI__F_up         : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4701            case STBI__F_avg        : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
4702            case STBI__F_paeth      : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
4703            case STBI__F_avg_first  : cur[k] = raw[k]; break;
4704            case STBI__F_paeth_first: cur[k] = raw[k]; break;
4705         }
4706      }
4707
4708      if (depth == 8) {
4709         if (img_n != out_n)
4710            cur[img_n] = 255; // first pixel
4711         raw += img_n;
4712         cur += out_n;
4713         prior += out_n;
4714      } else if (depth == 16) {
4715         if (img_n != out_n) {
4716            cur[filter_bytes]   = 255; // first pixel top byte
4717            cur[filter_bytes+1] = 255; // first pixel bottom byte
4718         }
4719         raw += filter_bytes;
4720         cur += output_bytes;
4721         prior += output_bytes;
4722      } else {
4723         raw += 1;
4724         cur += 1;
4725         prior += 1;
4726      }
4727
4728      // this is a little gross, so that we don't switch per-pixel or per-component
4729      if (depth < 8 || img_n == out_n) {
4730         int nk = (width - 1)*filter_bytes;
4731         #define STBI__CASE(f) \
4732             case f:     \
4733                for (k=0; k < nk; ++k)
4734         switch (filter) {
4735            // "none" filter turns into a memcpy here; make that explicit.
4736            case STBI__F_none:         memcpy(cur, raw, nk); break;
4737            STBI__CASE(STBI__F_sub)          { cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); } break;
4738            STBI__CASE(STBI__F_up)           { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
4739            STBI__CASE(STBI__F_avg)          { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); } break;
4740            STBI__CASE(STBI__F_paeth)        { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); } break;
4741            STBI__CASE(STBI__F_avg_first)    { cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); } break;
4742            STBI__CASE(STBI__F_paeth_first)  { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); } break;
4743         }
4744         #undef STBI__CASE
4745         raw += nk;
4746      } else {
4747         STBI_ASSERT(img_n+1 == out_n);
4748         #define STBI__CASE(f) \
4749             case f:     \
4750                for (i=x-1; i >= 1; --i, cur[filter_bytes]=255,raw+=filter_bytes,cur+=output_bytes,prior+=output_bytes) \
4751                   for (k=0; k < filter_bytes; ++k)
4752         switch (filter) {
4753            STBI__CASE(STBI__F_none)         { cur[k] = raw[k]; } break;
4754            STBI__CASE(STBI__F_sub)          { cur[k] = STBI__BYTECAST(raw[k] + cur[k- output_bytes]); } break;
4755            STBI__CASE(STBI__F_up)           { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
4756            STBI__CASE(STBI__F_avg)          { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k- output_bytes])>>1)); } break;
4757            STBI__CASE(STBI__F_paeth)        { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],prior[k],prior[k- output_bytes])); } break;
4758            STBI__CASE(STBI__F_avg_first)    { cur[k] = STBI__BYTECAST(raw[k] + (cur[k- output_bytes] >> 1)); } break;
4759            STBI__CASE(STBI__F_paeth_first)  { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],0,0)); } break;
4760         }
4761         #undef STBI__CASE
4762
4763         // the loop above sets the high byte of the pixels' alpha, but for
4764         // 16 bit png files we also need the low byte set. we'll do that here.
4765         if (depth == 16) {
4766            cur = a->out + stride*j; // start at the beginning of the row again
4767            for (i=0; i < x; ++i,cur+=output_bytes) {
4768               cur[filter_bytes+1] = 255;
4769            }
4770         }
4771      }
4772   }
4773
4774   // we make a separate pass to expand bits to pixels; for performance,
4775   // this could run two scanlines behind the above code, so it won't
   // interfere with filtering but will still be in the cache.
4777   if (depth < 8) {
4778      for (j=0; j < y; ++j) {
4779         stbi_uc *cur = a->out + stride*j;
4780         stbi_uc *in  = a->out + stride*j + x*out_n - img_width_bytes;
         // unpack 1/2/4-bit into an 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
         // PNG guarantees byte alignment; if width is not a multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop
4783         stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
4784
4785         // note that the final byte might overshoot and write more data than desired.
4786         // we can allocate enough data that this never writes out of memory, but it
4787         // could also overwrite the next scanline. can it overwrite non-empty data
4788         // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
4789         // so we need to explicitly clamp the final ones
4790
4791         if (depth == 4) {
4792            for (k=x*img_n; k >= 2; k-=2, ++in) {
4793               *cur++ = scale * ((*in >> 4)       );
4794               *cur++ = scale * ((*in     ) & 0x0f);
4795            }
4796            if (k > 0) *cur++ = scale * ((*in >> 4)       );
4797         } else if (depth == 2) {
4798            for (k=x*img_n; k >= 4; k-=4, ++in) {
4799               *cur++ = scale * ((*in >> 6)       );
4800               *cur++ = scale * ((*in >> 4) & 0x03);
4801               *cur++ = scale * ((*in >> 2) & 0x03);
4802               *cur++ = scale * ((*in     ) & 0x03);
4803            }
4804            if (k > 0) *cur++ = scale * ((*in >> 6)       );
4805            if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
4806            if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
4807         } else if (depth == 1) {
4808            for (k=x*img_n; k >= 8; k-=8, ++in) {
4809               *cur++ = scale * ((*in >> 7)       );
4810               *cur++ = scale * ((*in >> 6) & 0x01);
4811               *cur++ = scale * ((*in >> 5) & 0x01);
4812               *cur++ = scale * ((*in >> 4) & 0x01);
4813               *cur++ = scale * ((*in >> 3) & 0x01);
4814               *cur++ = scale * ((*in >> 2) & 0x01);
4815               *cur++ = scale * ((*in >> 1) & 0x01);
4816               *cur++ = scale * ((*in     ) & 0x01);
4817            }
4818            if (k > 0) *cur++ = scale * ((*in >> 7)       );
4819            if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
4820            if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
4821            if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
4822            if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
4823            if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
4824            if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
4825         }
4826         if (img_n != out_n) {
4827            int q;
4828            // insert alpha = 255
4829            cur = a->out + stride*j;
4830            if (img_n == 1) {
4831               for (q=x-1; q >= 0; --q) {
4832                  cur[q*2+1] = 255;
4833                  cur[q*2+0] = cur[q];
4834               }
4835            } else {
4836               STBI_ASSERT(img_n == 3);
4837               for (q=x-1; q >= 0; --q) {
4838                  cur[q*4+3] = 255;
4839                  cur[q*4+2] = cur[q*3+2];
4840                  cur[q*4+1] = cur[q*3+1];
4841                  cur[q*4+0] = cur[q*3+0];
4842               }
4843            }
4844         }
4845      }
4846   } else if (depth == 16) {
4847      // force the image data from big-endian to platform-native.
4848      // this is done in a separate pass due to the decoding relying
4849      // on the data being untouched, but could probably be done
4850      // per-line during decode if care is taken.
4851      stbi_uc *cur = a->out;
4852      stbi__uint16 *cur16 = (stbi__uint16*)cur;
4853
4854      for(i=0; i < x*y*out_n; ++i,cur16++,cur+=2) {
4855         *cur16 = (cur[0] << 8) | cur[1];
4856      }
4857   }
4858
4859   return 1;
4860}
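
/*
   EXAMPLE (illustrative sketch only; not part of stb_image).
   The low-bit-depth expansion above can be tried in isolation. The helper
   name example_unpack_bits and its parameters are hypothetical. It assumes
   samples are packed most-significant-bits first, as the shifts above do,
   and takes the grayscale scale factor as an argument (255 for 1-bit,
   85 for 2-bit, 17 for 4-bit, i.e. 255 / (2^depth - 1)).

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: expand `count` packed samples of `depth` bits
// (1, 2 or 4), most significant bits first, into one byte per sample,
// multiplying each sample by `scale`.
static void example_unpack_bits(const unsigned char *in, size_t count,
                                int depth, unsigned char scale,
                                unsigned char *out)
{
   size_t i;
   int per_byte, mask;
   assert(depth == 1 || depth == 2 || depth == 4);
   per_byte = 8 / depth;           // samples stored in each input byte
   mask = (1 << depth) - 1;
   for (i = 0; i < count; ++i) {
      // sample 0 of each byte lives in the top bits
      int shift = 8 - depth * (int)(i % per_byte + 1);
      out[i] = (unsigned char)(scale * ((in[i / per_byte] >> shift) & mask));
   }
}
```

   Unpacking three 1-bit samples from the byte 0xA0 with scale 255 yields
   255, 0, 255. Passing the exact sample count plays the same role as the
   explicit clamp on the final partial byte above.
*/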
4861
4862static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
4863{
4864   int bytes = (depth == 16 ? 2 : 1);
4865   int out_bytes = out_n * bytes;
4866   stbi_uc *final;
4867   int p;
4868   if (!interlaced)
4869      return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
4870
4871   // de-interlacing
4872   final = (stbi_uc *) stbi__malloc_mad3(a->s->img_x, a->s->img_y, out_bytes, 0);
4873   if (!final) return stbi__err("outofmem", "Out of memory");
4874   for (p=0; p < 7; ++p) {
4875      int xorig[] = { 0,4,0,2,0,1,0 };
4876      int yorig[] = { 0,0,4,0,2,0,1 };
4877      int xspc[]  = { 8,8,4,4,2,2,1 };
4878      int yspc[]  = { 8,8,8,4,4,2,2 };
4879      int i,j,x,y;
      // size of this pass, rounding up. e.g. for p == 1 (xorig 4, xspc 8):
      // width 4 -> x = 0, width 5 -> x = 1, width 12 -> x = 1
4881      x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
4882      y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
4883      if (x && y) {
4884         stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
4885         if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
4886            STBI_FREE(final);
4887            return 0;
4888         }
4889         for (j=0; j < y; ++j) {
4890            for (i=0; i < x; ++i) {
4891               int out_y = j*yspc[p]+yorig[p];
4892               int out_x = i*xspc[p]+xorig[p];
4893               memcpy(final + out_y*a->s->img_x*out_bytes + out_x*out_bytes,
4894                      a->out + (j*x+i)*out_bytes, out_bytes);
4895            }
4896         }
4897         STBI_FREE(a->out);
4898         image_data += img_len;
4899         image_data_len -= img_len;
4900      }
4901   }
4902   a->out = final;
4903
4904   return 1;
4905}
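
/*
   EXAMPLE (illustrative sketch only; not part of stb_image).
   The per-pass size computation in the de-interlacing loop can be checked
   on its own. example_pass_extent and example_adam7_total are hypothetical
   names; the origin and spacing tables are copied from the loop above.
   Because every pixel belongs to exactly one of the seven passes, the pass
   sizes should sum to width * height.

```c
#include <assert.h>

// Hypothetical helper: number of samples along one axis that belong to a
// pass starting at `orig` with step `spc`, for a full-image extent `dim`.
static unsigned example_pass_extent(unsigned dim, unsigned orig, unsigned spc)
{
   assert(spc > 0);
   if (dim <= orig) return 0;             // pass starts beyond the image edge
   return (dim - orig + spc - 1) / spc;   // ceiling division
}

// Hypothetical helper: total pixel count over all seven passes.
static unsigned example_adam7_total(unsigned w, unsigned h)
{
   static const unsigned xorig[7] = { 0,4,0,2,0,1,0 };
   static const unsigned yorig[7] = { 0,0,4,0,2,0,1 };
   static const unsigned xspc[7]  = { 8,8,4,4,2,2,1 };
   static const unsigned yspc[7]  = { 8,8,8,4,4,2,2 };
   unsigned p, total = 0;
   for (p = 0; p < 7; ++p)
      total += example_pass_extent(w, xorig[p], xspc[p])
             * example_pass_extent(h, yorig[p], yspc[p]);
   return total;
}
```

   For a 5x3 image the seven passes hold 1, 1, 0, 1, 3, 4 and 5 pixels,
   which adds up to 15.
*/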
4906
4907static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
4908{
4909   stbi__context *s = z->s;
4910   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4911   stbi_uc *p = z->out;
4912
4913   // compute color-based transparency, assuming we've
4914   // already got 255 as the alpha value in the output
4915   STBI_ASSERT(out_n == 2 || out_n == 4);
4916
4917   if (out_n == 2) {
4918      for (i=0; i < pixel_count; ++i) {
4919         p[1] = (p[0] == tc[0] ? 0 : 255);
4920         p += 2;
4921      }
4922   } else {
4923      for (i=0; i < pixel_count; ++i) {
4924         if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4925            p[3] = 0;
4926         p += 4;
4927      }
4928   }
4929   return 1;
4930}
4931
4932static int stbi__compute_transparency16(stbi__png *z, stbi__uint16 tc[3], int out_n)
4933{
4934   stbi__context *s = z->s;
4935   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4936   stbi__uint16 *p = (stbi__uint16*) z->out;
4937
4938   // compute color-based transparency, assuming we've
4939   // already got 65535 as the alpha value in the output
4940   STBI_ASSERT(out_n == 2 || out_n == 4);
4941
4942   if (out_n == 2) {
4943      for (i = 0; i < pixel_count; ++i) {
4944         p[1] = (p[0] == tc[0] ? 0 : 65535);
4945         p += 2;
4946      }
4947   } else {
4948      for (i = 0; i < pixel_count; ++i) {
4949         if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4950            p[3] = 0;
4951         p += 4;
4952      }
4953   }
4954   return 1;
4955}
4956
4957static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
4958{
4959   stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
4960   stbi_uc *p, *temp_out, *orig = a->out;
4961
4962   p = (stbi_uc *) stbi__malloc_mad2(pixel_count, pal_img_n, 0);
4963   if (p == NULL) return stbi__err("outofmem", "Out of memory");
4964
   // between here and the STBI_FREE(a->out) below, exiting would leak
4966   temp_out = p;
4967
4968   if (pal_img_n == 3) {
4969      for (i=0; i < pixel_count; ++i) {
4970         int n = orig[i]*4;
4971         p[0] = palette[n  ];
4972         p[1] = palette[n+1];
4973         p[2] = palette[n+2];
4974         p += 3;
4975      }
4976   } else {
4977      for (i=0; i < pixel_count; ++i) {
4978         int n = orig[i]*4;
4979         p[0] = palette[n  ];
4980         p[1] = palette[n+1];
4981         p[2] = palette[n+2];
4982         p[3] = palette[n+3];
4983         p += 4;
4984      }
4985   }
4986   STBI_FREE(a->out);
4987   a->out = temp_out;
4988
4989   STBI_NOTUSED(len);
4990
4991   return 1;
4992}
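
/*
   EXAMPLE (illustrative sketch only; not part of stb_image).
   The palette lookup above does not range-check each index. A small
   standalone version shows why a 1024-byte table is enough: an 8-bit index
   is at most 255 and each entry is 4 bytes, so the last byte read is at
   offset 255*4+3 == 1023. example_expand_palette and its parameters are
   hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: expand 8-bit palette indices into RGB (out_n == 3)
// or RGBA (out_n == 4) using a table of 256 four-byte entries.
static void example_expand_palette(const unsigned char *indices, size_t count,
                                   const unsigned char palette[1024], int out_n,
                                   unsigned char *out)
{
   size_t i;
   int c;
   assert(out_n == 3 || out_n == 4);
   for (i = 0; i < count; ++i) {
      // entries are always 4 bytes apart, even when only 3 are copied
      const unsigned char *entry = palette + indices[i] * 4;
      for (c = 0; c < out_n; ++c)
         *out++ = entry[c];
   }
}
```

   The output buffer must hold count * out_n bytes; the caller is assumed to
   have allocated it, as the function above does with stbi__malloc_mad2.
*/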
4993
4994static int stbi__unpremultiply_on_load_global = 0;
4995static int stbi__de_iphone_flag_global = 0;
4996
4997STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
4998{
4999   stbi__unpremultiply_on_load_global = flag_true_if_should_unpremultiply;
5000}
5001
5002STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
5003{
5004   stbi__de_iphone_flag_global = flag_true_if_should_convert;
5005}
5006
5007#ifndef STBI_THREAD_LOCAL
5008#define stbi__unpremultiply_on_load  stbi__unpremultiply_on_load_global
5009#define stbi__de_iphone_flag  stbi__de_iphone_flag_global
5010#else
5011static STBI_THREAD_LOCAL int stbi__unpremultiply_on_load_local, stbi__unpremultiply_on_load_set;
5012static STBI_THREAD_LOCAL int stbi__de_iphone_flag_local, stbi__de_iphone_flag_set;
5013
5014STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply)
5015{
5016   stbi__unpremultiply_on_load_local = flag_true_if_should_unpremultiply;
5017   stbi__unpremultiply_on_load_set = 1;
5018}
5019
5020STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert)
5021{
5022   stbi__de_iphone_flag_local = flag_true_if_should_convert;
5023   stbi__de_iphone_flag_set = 1;
5024}
5025
5026#define stbi__unpremultiply_on_load  (stbi__unpremultiply_on_load_set           \
5027                                       ? stbi__unpremultiply_on_load_local      \
5028                                       : stbi__unpremultiply_on_load_global)
5029#define stbi__de_iphone_flag  (stbi__de_iphone_flag_set                         \
5030                                ? stbi__de_iphone_flag_local                    \
5031                                : stbi__de_iphone_flag_global)
5032#endif // STBI_THREAD_LOCAL
5033
5034static void stbi__de_iphone(stbi__png *z)
5035{
5036   stbi__context *s = z->s;
5037   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
5038   stbi_uc *p = z->out;
5039
5040   if (s->img_out_n == 3) {  // convert bgr to rgb
5041      for (i=0; i < pixel_count; ++i) {
5042         stbi_uc t = p[0];
5043         p[0] = p[2];
5044         p[2] = t;
5045         p += 3;
5046      }
5047   } else {
5048      STBI_ASSERT(s->img_out_n == 4);
5049      if (stbi__unpremultiply_on_load) {
5050         // convert bgr to rgb and unpremultiply
5051         for (i=0; i < pixel_count; ++i) {
5052            stbi_uc a = p[3];
5053            stbi_uc t = p[0];
5054            if (a) {
5055               stbi_uc half = a / 2;
5056               p[0] = (p[2] * 255 + half) / a;
5057               p[1] = (p[1] * 255 + half) / a;
5058               p[2] = ( t   * 255 + half) / a;
5059            } else {
5060               p[0] = p[2];
5061               p[2] = t;
5062            }
5063            p += 4;
5064         }
5065      } else {
5066         // convert bgr to rgb
5067         for (i=0; i < pixel_count; ++i) {
5068            stbi_uc t = p[0];
5069            p[0] = p[2];
5070            p[2] = t;
5071            p += 4;
5072         }
5073      }
5074   }
5075}
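
/*
   EXAMPLE (illustrative sketch only; not part of stb_image).
   The unpremultiply step above divides by alpha with rounding to nearest:
   adding half of the divisor before an integer division rounds instead of
   truncating. example_unpremultiply is a hypothetical name for the
   per-channel arithmetic.

```c
#include <assert.h>

// Hypothetical helper: undo alpha premultiplication for one 8-bit channel,
// rounding to nearest. `alpha` must be nonzero. Valid premultiplied data has
// c <= alpha; otherwise the quotient exceeds 255 and the cast truncates it.
static unsigned char example_unpremultiply(unsigned char c, unsigned char alpha)
{
   unsigned half;
   assert(alpha != 0);
   half = alpha / 2u;
   return (unsigned char)((c * 255u + half) / alpha);
}
```

   For instance, a channel value of 64 stored with alpha 128 comes back as
   (64*255 + 64) / 128 == 128, i.e. half intensity at full scale.
*/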
5076
5077#define STBI__PNG_TYPE(a,b,c,d)  (((unsigned) (a) << 24) + ((unsigned) (b) << 16) + ((unsigned) (c) << 8) + (unsigned) (d))
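
/*
   EXAMPLE (illustrative sketch only; not part of stb_image).
   The macro above packs a four-letter chunk name into one 32-bit value,
   first letter in the top byte. The parser below tests bit 29 of that value
   to decide whether an unknown chunk may be skipped: bit 29 is bit 5 of the
   first letter, which is set for lowercase ASCII letters. EXAMPLE_PNG_TYPE
   and example_chunk_is_ancillary are hypothetical names.

```c
#include <assert.h>

#define EXAMPLE_PNG_TYPE(a,b,c,d) \
   (((unsigned) (a) << 24) + ((unsigned) (b) << 16) + ((unsigned) (c) << 8) + (unsigned) (d))

// Hypothetical helper: nonzero when the first letter of the chunk name is
// lowercase, meaning the chunk is ancillary and can be ignored if unknown.
static int example_chunk_is_ancillary(unsigned type)
{
   return (type & (1u << 29)) != 0;
}
```

   'I','H','D','R' packs to 0x49484452 and is critical; 't','E','X','t'
   starts with a lowercase letter and is ancillary.
*/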
5078
5079static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
5080{
5081   stbi_uc palette[1024], pal_img_n=0;
5082   stbi_uc has_trans=0, tc[3]={0};
5083   stbi__uint16 tc16[3];
5084   stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
5085   int first=1,k,interlace=0, color=0, is_iphone=0;
5086   stbi__context *s = z->s;
5087
5088   z->expanded = NULL;
5089   z->idata = NULL;
5090   z->out = NULL;
5091
5092   if (!stbi__check_png_header(s)) return 0;
5093
5094   if (scan == STBI__SCAN_type) return 1;
5095
5096   for (;;) {
5097      stbi__pngchunk c = stbi__get_chunk_header(s);
5098      switch (c.type) {
5099         case STBI__PNG_TYPE('C','g','B','I'):
5100            is_iphone = 1;
5101            stbi__skip(s, c.length);
5102            break;
5103         case STBI__PNG_TYPE('I','H','D','R'): {
5104            int comp,filter;
5105            if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
5106            first = 0;
5107            if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
5108            s->img_x = stbi__get32be(s);
5109            s->img_y = stbi__get32be(s);
5110            if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
5111            if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
5112            z->depth = stbi__get8(s);  if (z->depth != 1 && z->depth != 2 && z->depth != 4 && z->depth != 8 && z->depth != 16)  return stbi__err("1/2/4/8/16-bit only","PNG not supported: 1/2/4/8/16-bit only");
5113            color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
5114            if (color == 3 && z->depth == 16)                  return stbi__err("bad ctype","Corrupt PNG");
5115            if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
5116            comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
5117            filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
5118            interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
5119            if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
5120            if (!pal_img_n) {
5121               s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
5122               if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
5123            } else {
               // if paletted, then pal_img_n is our final components, and
               // img_n is # components to decompress/filter.
5126               s->img_n = 1;
5127               if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
5128            }
5129            // even with SCAN_header, have to scan to see if we have a tRNS
5130            break;
5131         }
5132
5133         case STBI__PNG_TYPE('P','L','T','E'):  {
5134            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5135            if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
5136            pal_len = c.length / 3;
5137            if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
5138            for (i=0; i < pal_len; ++i) {
5139               palette[i*4+0] = stbi__get8(s);
5140               palette[i*4+1] = stbi__get8(s);
5141               palette[i*4+2] = stbi__get8(s);
5142               palette[i*4+3] = 255;
5143            }
5144            break;
5145         }
5146
5147         case STBI__PNG_TYPE('t','R','N','S'): {
5148            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5149            if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
5150            if (pal_img_n) {
5151               if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
5152               if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
5153               if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
5154               pal_img_n = 4;
5155               for (i=0; i < c.length; ++i)
5156                  palette[i*4+3] = stbi__get8(s);
5157            } else {
5158               if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
5159               if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
5160               has_trans = 1;
5161               // non-paletted with tRNS = constant alpha. if header-scanning, we can stop now.
5162               if (scan == STBI__SCAN_header) { ++s->img_n; return 1; }
5163               if (z->depth == 16) {
5164                  for (k = 0; k < s->img_n; ++k) tc16[k] = (stbi__uint16)stbi__get16be(s); // copy the values as-is
5165               } else {
                  for (k = 0; k < s->img_n; ++k) tc[k] = (stbi_uc)(stbi__get16be(s) & 255) * stbi__depth_scale_table[z->depth]; // values from sub-8-bit images are scaled up to 0..255
5167               }
5168            }
5169            break;
5170         }
5171
5172         case STBI__PNG_TYPE('I','D','A','T'): {
5173            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5174            if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
5175            if (scan == STBI__SCAN_header) {
5176               // header scan definitely stops at first IDAT
5177               if (pal_img_n)
5178                  s->img_n = pal_img_n;
5179               return 1;
5180            }
5181            if (c.length > (1u << 30)) return stbi__err("IDAT size limit", "IDAT section larger than 2^30 bytes");
5182            if ((int)(ioff + c.length) < (int)ioff) return 0;
5183            if (ioff + c.length > idata_limit) {
5184               stbi__uint32 idata_limit_old = idata_limit;
5185               stbi_uc *p;
5186               if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
5187               while (ioff + c.length > idata_limit)
5188                  idata_limit *= 2;
5189               STBI_NOTUSED(idata_limit_old);
5190               p = (stbi_uc *) STBI_REALLOC_SIZED(z->idata, idata_limit_old, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
5191               z->idata = p;
5192            }
5193            if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
5194            ioff += c.length;
5195            break;
5196         }
5197
5198         case STBI__PNG_TYPE('I','E','N','D'): {
5199            stbi__uint32 raw_len, bpl;
5200            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5201            if (scan != STBI__SCAN_load) return 1;
5202            if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
5203            // initial guess for decoded data size to avoid unnecessary reallocs
5204            bpl = (s->img_x * z->depth + 7) / 8; // bytes per line, per component
5205            raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
5206            z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
5207            if (z->expanded == NULL) return 0; // zlib should set error
5208            STBI_FREE(z->idata); z->idata = NULL;
5209            if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
5210               s->img_out_n = s->img_n+1;
5211            else
5212               s->img_out_n = s->img_n;
5213            if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, z->depth, color, interlace)) return 0;
5214            if (has_trans) {
5215               if (z->depth == 16) {
5216                  if (!stbi__compute_transparency16(z, tc16, s->img_out_n)) return 0;
5217               } else {
5218                  if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
5219               }
5220            }
5221            if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
5222               stbi__de_iphone(z);
5223            if (pal_img_n) {
5224               // pal_img_n == 3 or 4
5225               s->img_n = pal_img_n; // record the actual colors we had
5226               s->img_out_n = pal_img_n;
5227               if (req_comp >= 3) s->img_out_n = req_comp;
5228               if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
5229                  return 0;
5230            } else if (has_trans) {
5231               // non-paletted image with tRNS -> source image has (constant) alpha
5232               ++s->img_n;
5233            }
5234            STBI_FREE(z->expanded); z->expanded = NULL;
5235            // end of PNG chunk, read and skip CRC
5236            stbi__get32be(s);
5237            return 1;
5238         }
5239
5240         default:
5241            // if critical, fail
5242            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5243            if ((c.type & (1 << 29)) == 0) {
5244               #ifndef STBI_NO_FAILURE_STRINGS
5245               // not threadsafe
5246               static char invalid_chunk[] = "XXXX PNG chunk not known";
5247               invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
5248               invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
5249               invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
5250               invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
5251               #endif
5252               return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
5253            }
5254            stbi__skip(s, c.length);
5255            break;
5256      }
5257      // end of PNG chunk, read and skip CRC
5258      stbi__get32be(s);
5259   }
5260}
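
/*
   EXAMPLE (illustrative sketch only; not part of stb_image).
   The IEND case above guesses the size of the decompressed stream as the
   packed sample bytes plus one filter-type byte per row. The arithmetic can
   be worked through with a small helper; example_png_raw_len is a
   hypothetical name, and the result is only the initial guess handed to the
   zlib decoder, not a limit.

```c
#include <assert.h>

// Hypothetical helper mirroring the estimate: bytes per line per component,
// times rows and components, plus one filter byte per row.
static unsigned long example_png_raw_len(unsigned long w, unsigned long h,
                                         unsigned depth, unsigned channels)
{
   unsigned long bpl = (w * depth + 7) / 8;   // round partial bytes up
   return bpl * h * channels + h;
}
```

   A 4x2 image with 8-bit RGB gives 4*2*3 + 2 == 26 bytes; a 5x1 image with
   1-bit grayscale gives 1 + 1 == 2 bytes.
*/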
5261
5262static void *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp, stbi__result_info *ri)
5263{
5264   void *result=NULL;
5265   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
5266   if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
5267      if (p->depth <= 8)
5268         ri->bits_per_channel = 8;
5269      else if (p->depth == 16)
5270         ri->bits_per_channel = 16;
5271      else
5272         return stbi__errpuc("bad bits_per_channel", "PNG not supported: unsupported color depth");
5273      result = p->out;
5274      p->out = NULL;
5275      if (req_comp && req_comp != p->s->img_out_n) {
5276         if (ri->bits_per_channel == 8)
5277            result = stbi__convert_format((unsigned char *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
5278         else
5279            result = stbi__convert_format16((stbi__uint16 *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
5280         p->s->img_out_n = req_comp;
5281         if (result == NULL) return result;
5282      }
5283      *x = p->s->img_x;
5284      *y = p->s->img_y;
5285      if (n) *n = p->s->img_n;
5286   }
5287   STBI_FREE(p->out);      p->out      = NULL;
5288   STBI_FREE(p->expanded); p->expanded = NULL;
5289   STBI_FREE(p->idata);    p->idata    = NULL;
5290
5291   return result;
5292}
5293
5294static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
5295{
5296   stbi__png p;
5297   p.s = s;
5298   return stbi__do_png(&p, x,y,comp,req_comp, ri);
5299}
5300
5301static int stbi__png_test(stbi__context *s)
5302{
5303   int r;
5304   r = stbi__check_png_header(s);
5305   stbi__rewind(s);
5306   return r;
5307}
5308
5309static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
5310{
5311   if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
5312      stbi__rewind( p->s );
5313      return 0;
5314   }
5315   if (x) *x = p->s->img_x;
5316   if (y) *y = p->s->img_y;
5317   if (comp) *comp = p->s->img_n;
5318   return 1;
5319}
5320
5321static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
5322{
5323   stbi__png p;
5324   p.s = s;
5325   return stbi__png_info_raw(&p, x, y, comp);
5326}
5327
5328static int stbi__png_is16(stbi__context *s)
5329{
5330   stbi__png p;
5331   p.s = s;
5332   if (!stbi__png_info_raw(&p, NULL, NULL, NULL))
      return 0;
5334   if (p.depth != 16) {
5335      stbi__rewind(p.s);
5336      return 0;
5337   }
5338   return 1;
5339}
5340#endif
5341
5342// Microsoft/Windows BMP image
5343
5344#ifndef STBI_NO_BMP
5345static int stbi__bmp_test_raw(stbi__context *s)
5346{
5347   int r;
5348   int sz;
5349   if (stbi__get8(s) != 'B') return 0;
5350   if (stbi__get8(s) != 'M') return 0;
5351   stbi__get32le(s); // discard filesize
5352   stbi__get16le(s); // discard reserved
5353   stbi__get16le(s); // discard reserved
5354   stbi__get32le(s); // discard data offset
5355   sz = stbi__get32le(s);
5356   r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
5357   return r;
5358}
5359
5360static int stbi__bmp_test(stbi__context *s)
5361{
5362   int r = stbi__bmp_test_raw(s);
5363   stbi__rewind(s);
5364   return r;
5365}
5366
5367
5368// returns 0..31 for the highest set bit
5369static int stbi__high_bit(unsigned int z)
5370{
5371   int n=0;
5372   if (z == 0) return -1;
5373   if (z >= 0x10000) { n += 16; z >>= 16; }
5374   if (z >= 0x00100) { n +=  8; z >>=  8; }
5375   if (z >= 0x00010) { n +=  4; z >>=  4; }
5376   if (z >= 0x00004) { n +=  2; z >>=  2; }
5377   if (z >= 0x00002) { n +=  1;/* >>=  1;*/ }
5378   return n;
5379}
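
/*
   EXAMPLE (illustrative sketch only; not part of stb_image).
   The binary-search version above can be cross-checked against a
   straightforward loop. example_high_bit is a hypothetical reference
   implementation that should return the same values, just more slowly.

```c
#include <assert.h>

// Hypothetical reference version: index of the highest set bit, or -1 for
// zero, found by shifting right one bit at a time.
static int example_high_bit(unsigned int z)
{
   int n = -1;
   while (z) { ++n; z >>= 1; }
   return n;
}
```

   For the 5-5-5 red mask 0x7C00 the result is 14, the position of the top
   bit of the red field.
*/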
5380
5381static int stbi__bitcount(unsigned int a)
5382{
5383   a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
5384   a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
5385   a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
5386   a = (a + (a >> 8)); // max 16 per 8 bits
5387   a = (a + (a >> 16)); // max 32 per 8 bits
5388   return a & 0xff;
5389}
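
/*
   EXAMPLE (illustrative sketch only; not part of stb_image).
   The parallel bit count above sums adjacent bit groups in place. A simple
   loop that clears the lowest set bit on each iteration gives the same
   answer and is easier to verify by hand. example_bitcount is a
   hypothetical name.

```c
#include <assert.h>

// Hypothetical reference version: count set bits by repeatedly clearing
// the lowest one.
static int example_bitcount(unsigned int a)
{
   int n = 0;
   while (a) { a &= a - 1; ++n; }
   return n;
}
```

   Each of the 5-5-5 channel masks (0x7C00, 0x03E0, 0x001F) has 5 bits set.
*/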
5390
5391// extract an arbitrarily-aligned N-bit value (N=bits)
5392// from v, and then make it 8-bits long and fractionally
// extend it to the full range.
5394static int stbi__shiftsigned(unsigned int v, int shift, int bits)
5395{
5396   static unsigned int mul_table[9] = {
5397      0,
5398      0xff/*0b11111111*/, 0x55/*0b01010101*/, 0x49/*0b01001001*/, 0x11/*0b00010001*/,
5399      0x21/*0b00100001*/, 0x41/*0b01000001*/, 0x81/*0b10000001*/, 0x01/*0b00000001*/,
5400   };
5401   static unsigned int shift_table[9] = {
5402      0, 0,0,1,0,2,4,6,0,
5403   };
5404   if (shift < 0)
5405      v <<= -shift;
5406   else
5407      v >>= shift;
   STBI_ASSERT(v < 256);
   STBI_ASSERT(bits >= 0 && bits <= 8); // check before bits is used as a shift count
   v >>= (8-bits);
5411   return (int) ((unsigned) v * mul_table[bits]) >> shift_table[bits];
5412}
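
/*
   EXAMPLE (illustrative sketch only; not part of stb_image).
   The multiply and shift tables above widen an N-bit value to 8 bits by
   repeating its bit pattern, so that 0 maps to 0 and the maximum maps to
   255. The same replication can be written as a loop, which may make the
   table entries easier to check. example_expand_to_8 is a hypothetical name
   and takes the already-extracted N-bit value, not the raw pixel word.

```c
#include <assert.h>

// Hypothetical helper: widen a `bits`-wide value (1..8 bits) to 8 bits by
// bit replication.
static unsigned example_expand_to_8(unsigned v, int bits)
{
   unsigned out = 0;
   int filled;
   assert(bits >= 1 && bits <= 8);
   assert(v < (1u << bits));
   // append copies of the value until at least 8 bits are filled
   for (filled = 0; filled < 8; filled += bits)
      out = (out << bits) | v;
   // drop the surplus low bits
   return out >> (filled - 8);
}
```

   For bits == 5 the loop makes two copies (10 bits) and drops 2, matching a
   multiply by 0x21 followed by a shift of 2: 31 becomes 255 and 16 becomes
   132.
*/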
5413
5414typedef struct
5415{
5416   int bpp, offset, hsz;
5417   unsigned int mr,mg,mb,ma, all_a;
5418   int extra_read;
5419} stbi__bmp_data;
5420
5421static int stbi__bmp_set_mask_defaults(stbi__bmp_data *info, int compress)
5422{
5423   // BI_BITFIELDS specifies masks explicitly, don't override
5424   if (compress == 3)
5425      return 1;
5426
5427   if (compress == 0) {
5428      if (info->bpp == 16) {
5429         info->mr = 31u << 10;
5430         info->mg = 31u <<  5;
5431         info->mb = 31u <<  0;
5432      } else if (info->bpp == 32) {
5433         info->mr = 0xffu << 16;
5434         info->mg = 0xffu <<  8;
5435         info->mb = 0xffu <<  0;
5436         info->ma = 0xffu << 24;
5437         info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
5438      } else {
5439         // otherwise, use defaults, which is all-0
5440         info->mr = info->mg = info->mb = info->ma = 0;
5441      }
5442      return 1;
5443   }
5444   return 0; // error
5445}
5446
5447static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info)
5448{
5449   int hsz;
5450   if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
5451   stbi__get32le(s); // discard filesize
5452   stbi__get16le(s); // discard reserved
5453   stbi__get16le(s); // discard reserved
5454   info->offset = stbi__get32le(s);
5455   info->hsz = hsz = stbi__get32le(s);
5456   info->mr = info->mg = info->mb = info->ma = 0;
5457   info->extra_read = 14;
5458
5459   if (info->offset < 0) return stbi__errpuc("bad BMP", "bad BMP");
5460
5461   if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
5462   if (hsz == 12) {
5463      s->img_x = stbi__get16le(s);
5464      s->img_y = stbi__get16le(s);
5465   } else {
5466      s->img_x = stbi__get32le(s);
5467      s->img_y = stbi__get32le(s);
5468   }
5469   if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
5470   info->bpp = stbi__get16le(s);
5471   if (hsz != 12) {
5472      int compress = stbi__get32le(s);
5473      if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
5474      if (compress >= 4) return stbi__errpuc("BMP JPEG/PNG", "BMP type not supported: unsupported compression"); // this includes PNG/JPEG modes
5475      if (compress == 3 && info->bpp != 16 && info->bpp != 32) return stbi__errpuc("bad BMP", "bad BMP"); // bitfields requires 16 or 32 bits/pixel
5476      stbi__get32le(s); // discard sizeof
5477      stbi__get32le(s); // discard hres
5478      stbi__get32le(s); // discard vres
5479      stbi__get32le(s); // discard colorsused
5480      stbi__get32le(s); // discard max important
5481      if (hsz == 40 || hsz == 56) {
5482         if (hsz == 56) {
5483            stbi__get32le(s);
5484            stbi__get32le(s);
5485            stbi__get32le(s);
5486            stbi__get32le(s);
5487         }
5488         if (info->bpp == 16 || info->bpp == 32) {
5489            if (compress == 0) {
5490               stbi__bmp_set_mask_defaults(info, compress);
5491            } else if (compress == 3) {
5492               info->mr = stbi__get32le(s);
5493               info->mg = stbi__get32le(s);
5494               info->mb = stbi__get32le(s);
5495               info->extra_read += 12;
5496               // not documented, but generated by photoshop and handled by mspaint
5497               if (info->mr == info->mg && info->mg == info->mb) {
5498                  // ?!?!?
5499                  return stbi__errpuc("bad BMP", "bad BMP");
5500               }
5501            } else
5502               return stbi__errpuc("bad BMP", "bad BMP");
5503         }
5504      } else {
5505         // V4/V5 header
5506         int i;
5507         if (hsz != 108 && hsz != 124)
5508            return stbi__errpuc("bad BMP", "bad BMP");
5509         info->mr = stbi__get32le(s);
5510         info->mg = stbi__get32le(s);
5511         info->mb = stbi__get32le(s);
5512         info->ma = stbi__get32le(s);
5513         if (compress != 3) // override mr/mg/mb unless in BI_BITFIELDS mode, as per docs
5514            stbi__bmp_set_mask_defaults(info, compress);
5515         stbi__get32le(s); // discard color space
5516         for (i=0; i < 12; ++i)
5517            stbi__get32le(s); // discard color space parameters
5518         if (hsz == 124) {
5519            stbi__get32le(s); // discard rendering intent
5520            stbi__get32le(s); // discard offset of profile data
5521            stbi__get32le(s); // discard size of profile data
5522            stbi__get32le(s); // discard reserved
5523         }
5524      }
5525   }
5526   return (void *) 1;
5527}
5528
5529
5530static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
5531{
5532   stbi_uc *out;
5533   unsigned int mr=0,mg=0,mb=0,ma=0, all_a;
5534   stbi_uc pal[256][4];
5535   int psize=0,i,j,width;
5536   int flip_vertically, pad, target;
5537   stbi__bmp_data info;
5538   STBI_NOTUSED(ri);
5539
5540   info.all_a = 255;
5541   if (stbi__bmp_parse_header(s, &info) == NULL)
5542      return NULL; // error code already set
5543
5544   flip_vertically = ((int) s->img_y) > 0;
5545   s->img_y = abs((int) s->img_y);
5546
5547   if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
5548   if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
5549
5550   mr = info.mr;
5551   mg = info.mg;
5552   mb = info.mb;
5553   ma = info.ma;
5554   all_a = info.all_a;
5555
5556   if (info.hsz == 12) {
5557      if (info.bpp < 24)
5558         psize = (info.offset - info.extra_read - 24) / 3;
5559   } else {
5560      if (info.bpp < 16)
5561         psize = (info.offset - info.extra_read - info.hsz) >> 2;
5562   }
5563   if (psize == 0) {
5564      // accept some number of extra bytes after the header, but if the offset points either to before
5565      // the header ends or implies a large amount of extra data, reject the file as malformed
5566      int bytes_read_so_far = s->callback_already_read + (int)(s->img_buffer - s->img_buffer_original);
5567      int header_limit = 1024; // max we actually read is below 256 bytes currently.
5568      int extra_data_limit = 256*4; // what ordinarily goes here is a palette; 256 entries*4 bytes is its max size.
5569      if (bytes_read_so_far <= 0 || bytes_read_so_far > header_limit) {
5570         return stbi__errpuc("bad header", "Corrupt BMP");
5571      }
5572      // we established that bytes_read_so_far is positive and sensible.
      // the first half of this test rejects offsets that are either too small or
5574      // negative, and guarantees that info.offset >= bytes_read_so_far > 0. this in turn
5575      // ensures the number computed in the second half of the test can't overflow.
5576      if (info.offset < bytes_read_so_far || info.offset - bytes_read_so_far > extra_data_limit) {
5577         return stbi__errpuc("bad offset", "Corrupt BMP");
5578      } else {
5579         stbi__skip(s, info.offset - bytes_read_so_far);
5580      }
5581   }
5582
5583   if (info.bpp == 24 && ma == 0xff000000)
5584      s->img_n = 3;
5585   else
5586      s->img_n = ma ? 4 : 3;
5587   if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
5588      target = req_comp;
5589   else
5590      target = s->img_n; // if they want monochrome, we'll post-convert
5591
5592   // sanity-check size
5593   if (!stbi__mad3sizes_valid(target, s->img_x, s->img_y, 0))
5594      return stbi__errpuc("too large", "Corrupt BMP");
5595
5596   out = (stbi_uc *) stbi__malloc_mad3(target, s->img_x, s->img_y, 0);
5597   if (!out) return stbi__errpuc("outofmem", "Out of memory");
   if (info.bpp < 16) {
      int z=0;
      if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
      for (i=0; i < psize; ++i) {
         pal[i][2] = stbi__get8(s);
         pal[i][1] = stbi__get8(s);
         pal[i][0] = stbi__get8(s);
         if (info.hsz != 12) stbi__get8(s);
         pal[i][3] = 255;
      }
      stbi__skip(s, info.offset - info.extra_read - info.hsz - psize * (info.hsz == 12 ? 3 : 4));
      if (info.bpp == 1) width = (s->img_x + 7) >> 3;
      else if (info.bpp == 4) width = (s->img_x + 1) >> 1;
      else if (info.bpp == 8) width = s->img_x;
      else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
      pad = (-width)&3;
      if (info.bpp == 1) {
         for (j=0; j < (int) s->img_y; ++j) {
            int bit_offset = 7, v = stbi__get8(s);
            for (i=0; i < (int) s->img_x; ++i) {
               int color = (v>>bit_offset)&0x1;
               out[z++] = pal[color][0];
               out[z++] = pal[color][1];
               out[z++] = pal[color][2];
               if (target == 4) out[z++] = 255;
               if (i+1 == (int) s->img_x) break;
               if((--bit_offset) < 0) {
                  bit_offset = 7;
                  v = stbi__get8(s);
               }
            }
            stbi__skip(s, pad);
         }
      } else {
         for (j=0; j < (int) s->img_y; ++j) {
            for (i=0; i < (int) s->img_x; i += 2) {
               int v=stbi__get8(s),v2=0;
               if (info.bpp == 4) {
                  v2 = v & 15;
                  v >>= 4;
               }
               out[z++] = pal[v][0];
               out[z++] = pal[v][1];
               out[z++] = pal[v][2];
               if (target == 4) out[z++] = 255;
               if (i+1 == (int) s->img_x) break;
               v = (info.bpp == 8) ? stbi__get8(s) : v2;
               out[z++] = pal[v][0];
               out[z++] = pal[v][1];
               out[z++] = pal[v][2];
               if (target == 4) out[z++] = 255;
            }
            stbi__skip(s, pad);
         }
      }
   } else {
      int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
      int z = 0;
      int easy=0;
      stbi__skip(s, info.offset - info.extra_read - info.hsz);
      if (info.bpp == 24) width = 3 * s->img_x;
      else if (info.bpp == 16) width = 2*s->img_x;
      else /* bpp = 32 and pad = 0 */ width=0;
      pad = (-width) & 3;
      if (info.bpp == 24) {
         easy = 1;
      } else if (info.bpp == 32) {
         if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
            easy = 2;
      }
      if (!easy) {
         if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
         // right shift amt to put high bit in position #7
         rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
         gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
         bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
         ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
         if (rcount > 8 || gcount > 8 || bcount > 8 || acount > 8) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
      }
      for (j=0; j < (int) s->img_y; ++j) {
         if (easy) {
            for (i=0; i < (int) s->img_x; ++i) {
               unsigned char a;
               out[z+2] = stbi__get8(s);
               out[z+1] = stbi__get8(s);
               out[z+0] = stbi__get8(s);
               z += 3;
               a = (easy == 2 ? stbi__get8(s) : 255);
               all_a |= a;
               if (target == 4) out[z++] = a;
            }
         } else {
            int bpp = info.bpp;
            for (i=0; i < (int) s->img_x; ++i) {
               stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
               unsigned int a;
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
               a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
               all_a |= a;
               if (target == 4) out[z++] = STBI__BYTECAST(a);
            }
         }
         stbi__skip(s, pad);
      }
   }

   // if alpha channel is all 0s, replace with all 255s
   if (target == 4 && all_a == 0)
      for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
         out[i] = 255;

   if (flip_vertically) {
      stbi_uc t;
      for (j=0; j < (int) s->img_y>>1; ++j) {
         stbi_uc *p1 = out +      j     *s->img_x*target;
         stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
         for (i=0; i < (int) s->img_x*target; ++i) {
            t = p1[i]; p1[i] = p2[i]; p2[i] = t;
         }
      }
   }

   if (req_comp && req_comp != target) {
      out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
      if (out == NULL) return out; // stbi__convert_format frees input on failure
   }

   *x = s->img_x;
   *y = s->img_y;
   if (comp) *comp = s->img_n;
   return out;
}
#endif
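
/*
   Illustrative sketch, not part of stb_image. The BMP loader above treats
   each scanline as padded to a multiple of 4 bytes and computes the padding
   as (-width) & 3. The helper below isolates that expression so it can be
   tried on its own; the name bmp_row_padding is hypothetical.

```c
#include <assert.h>

// Hypothetical helper: number of padding bytes that follow a BMP scanline
// holding row_bytes bytes of pixel data.
static int bmp_row_padding(int row_bytes)
{
   assert(row_bytes >= 0);
   // (-n) & 3 is the distance from n up to the next multiple of 4
   // (assumes two's-complement int, as the loader's own expression does).
   return (-row_bytes) & 3;
}
```

   For example, a 24-bit row of 5 pixels holds 15 bytes, so one padding byte
   is skipped after it; a row of 8 bytes needs none.
*/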

// Targa Truevision - TGA
// by Jonathan Dummer
#ifndef STBI_NO_TGA
// returns STBI_rgb or whatever, 0 on error
static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int* is_rgb16)
{
   // only RGB or RGBA (incl. 16bit) or grey allowed
   if (is_rgb16) *is_rgb16 = 0;
   switch(bits_per_pixel) {
      case 8:  return STBI_grey;
      case 16: if(is_grey) return STBI_grey_alpha;
               // fallthrough
      case 15: if(is_rgb16) *is_rgb16 = 1;
               return STBI_rgb;
      case 24: // fallthrough
      case 32: return bits_per_pixel/8;
      default: return 0;
   }
}

static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
{
    int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel, tga_colormap_bpp;
    int sz, tga_colormap_type;
    stbi__get8(s);                   // discard Offset
    tga_colormap_type = stbi__get8(s); // colormap type
    if( tga_colormap_type > 1 ) {
        stbi__rewind(s);
        return 0;      // only RGB or indexed allowed
    }
    tga_image_type = stbi__get8(s); // image type
    if ( tga_colormap_type == 1 ) { // colormapped (paletted) image
        if (tga_image_type != 1 && tga_image_type != 9) {
            stbi__rewind(s);
            return 0;
        }
        stbi__skip(s,4);       // skip index of first colormap entry and number of entries
        sz = stbi__get8(s);    //   check bits per palette color entry
        if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) {
            stbi__rewind(s);
            return 0;
        }
        stbi__skip(s,4);       // skip image x and y origin
        tga_colormap_bpp = sz;
    } else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
        if ( (tga_image_type != 2) && (tga_image_type != 3) && (tga_image_type != 10) && (tga_image_type != 11) ) {
            stbi__rewind(s);
            return 0; // only RGB or grey allowed, +/- RLE
        }
        stbi__skip(s,9); // skip colormap specification and image x/y origin
        tga_colormap_bpp = 0;
    }
    tga_w = stbi__get16le(s);
    if( tga_w < 1 ) {
        stbi__rewind(s);
        return 0;   // test width
    }
    tga_h = stbi__get16le(s);
    if( tga_h < 1 ) {
        stbi__rewind(s);
        return 0;   // test height
    }
    tga_bits_per_pixel = stbi__get8(s); // bits per pixel
    stbi__get8(s); // ignore alpha bits
    if (tga_colormap_bpp != 0) {
        if((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
            // when using a colormap, tga_bits_per_pixel is the size of the indexes
            // I don't think anything but 8 or 16bit indexes makes sense
            stbi__rewind(s);
            return 0;
        }
        tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
    } else {
        tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11), NULL);
    }
    if(!tga_comp) {
        stbi__rewind(s);
        return 0;
    }
    if (x) *x = tga_w;
    if (y) *y = tga_h;
    if (comp) *comp = tga_comp;
    return 1;                   // seems to have passed everything
}

static int stbi__tga_test(stbi__context *s)
{
   int res = 0;
   int sz, tga_color_type;
   stbi__get8(s);      //   discard Offset
   tga_color_type = stbi__get8(s);   //   color type
   if ( tga_color_type > 1 ) goto errorEnd;   //   only RGB or indexed allowed
   sz = stbi__get8(s);   //   image type
   if ( tga_color_type == 1 ) { // colormapped (paletted) image
      if (sz != 1 && sz != 9) goto errorEnd; // colortype 1 demands image type 1 or 9
      stbi__skip(s,4);       // skip index of first colormap entry and number of entries
      sz = stbi__get8(s);    //   check bits per palette color entry
      if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
      stbi__skip(s,4);       // skip image x and y origin
   } else { // "normal" image w/o colormap
      if ( (sz != 2) && (sz != 3) && (sz != 10) && (sz != 11) ) goto errorEnd; // only RGB or grey allowed, +/- RLE
      stbi__skip(s,9); // skip colormap specification and image x/y origin
   }
   if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test width
   if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test height
   sz = stbi__get8(s);   //   bits per pixel
   if ( (tga_color_type == 1) && (sz != 8) && (sz != 16) ) goto errorEnd; // for colormapped images, bpp is size of an index
   if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;

   res = 1; // if we got this far, everything's good and we can return 1 instead of 0

errorEnd:
   stbi__rewind(s);
   return res;
}

// read 16bit value and convert to 24bit RGB
static void stbi__tga_read_rgb16(stbi__context *s, stbi_uc* out)
{
   stbi__uint16 px = (stbi__uint16)stbi__get16le(s);
   stbi__uint16 fiveBitMask = 31;
   // we have 3 channels with 5bits each
   int r = (px >> 10) & fiveBitMask;
   int g = (px >> 5) & fiveBitMask;
   int b = px & fiveBitMask;
   // Note that this saves the data in RGB(A) order, so it doesn't need to be swapped later
   out[0] = (stbi_uc)((r * 255)/31);
   out[1] = (stbi_uc)((g * 255)/31);
   out[2] = (stbi_uc)((b * 255)/31);

   // some people claim that the most significant bit might be used for alpha
   // (possibly if an alpha-bit is set in the "image descriptor byte")
   // but that only made 16bit test images completely translucent..
   // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
}
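
/*
   Illustrative sketch, not part of stb_image. stbi__tga_read_rgb16 above
   widens each 5-bit channel to 8 bits with (v * 255) / 31, which maps 0 to 0
   and 31 to 255. The stand-alone function below applies the same arithmetic
   to a pixel value already held in memory and returns it packed as 0xRRGGBB;
   the name tga_rgb555_to_rgb888 and the packed return format are assumptions
   made for this example.

```c
#include <assert.h>

// Hypothetical helper: expands a 15/16-bit TGA pixel (xRRRRRGG GGGBBBBB)
// to 8 bits per channel, packed as 0xRRGGBB. The top bit is ignored.
static unsigned int tga_rgb555_to_rgb888(unsigned int px)
{
   unsigned int r = (px >> 10) & 31;
   unsigned int g = (px >> 5) & 31;
   unsigned int b = px & 31;
   assert(px <= 0xFFFFu);
   r = (r * 255) / 31;   // 0..31 -> 0..255
   g = (g * 255) / 31;
   b = (b * 255) / 31;
   return (r << 16) | (g << 8) | b;
}
```

   Because the most significant bit is masked away, 0x7FFF and 0xFFFF both
   decode to white, matching the loader's choice to treat 15- and 16-bit
   pixels as RGB without alpha.
*/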

static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
{
   //   read in the TGA header stuff
   int tga_offset = stbi__get8(s);
   int tga_indexed = stbi__get8(s);
   int tga_image_type = stbi__get8(s);
   int tga_is_RLE = 0;
   int tga_palette_start = stbi__get16le(s);
   int tga_palette_len = stbi__get16le(s);
   int tga_palette_bits = stbi__get8(s);
   int tga_x_origin = stbi__get16le(s);
   int tga_y_origin = stbi__get16le(s);
   int tga_width = stbi__get16le(s);
   int tga_height = stbi__get16le(s);
   int tga_bits_per_pixel = stbi__get8(s);
   int tga_comp, tga_rgb16=0;
   int tga_inverted = stbi__get8(s);
   // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused (useless?)
   //   image data
   unsigned char *tga_data;
   unsigned char *tga_palette = NULL;
   int i, j;
   unsigned char raw_data[4] = {0};
   int RLE_count = 0;
   int RLE_repeating = 0;
   int read_next_pixel = 1;
   STBI_NOTUSED(ri);
   STBI_NOTUSED(tga_x_origin); // @TODO
   STBI_NOTUSED(tga_y_origin); // @TODO

   if (tga_height > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   if (tga_width > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");

   //   do a tiny bit of preprocessing
   if ( tga_image_type >= 8 )
   {
      tga_image_type -= 8;
      tga_is_RLE = 1;
   }
   tga_inverted = 1 - ((tga_inverted >> 5) & 1);

   //   If I'm paletted, then I'll use the number of bits from the palette
   if ( tga_indexed ) tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
   else tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3), &tga_rgb16);

   if(!tga_comp) // shouldn't really happen, stbi__tga_test() should have ensured basic consistency
      return stbi__errpuc("bad format", "Can't find out TGA pixelformat");

   //   tga info
   *x = tga_width;
   *y = tga_height;
   if (comp) *comp = tga_comp;

   if (!stbi__mad3sizes_valid(tga_width, tga_height, tga_comp, 0))
      return stbi__errpuc("too large", "Corrupt TGA");

   tga_data = (unsigned char*)stbi__malloc_mad3(tga_width, tga_height, tga_comp, 0);
   if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");

   // skip to the data's starting position (offset usually = 0)
   stbi__skip(s, tga_offset );

   if ( !tga_indexed && !tga_is_RLE && !tga_rgb16 ) {
      for (i=0; i < tga_height; ++i) {
         int row = tga_inverted ? tga_height -i - 1 : i;
         stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
         stbi__getn(s, tga_row, tga_width * tga_comp);
      }
   } else  {
      //   do I need to load a palette?
      if ( tga_indexed)
      {
         if (tga_palette_len == 0) {  /* you have to have at least one entry! */
            STBI_FREE(tga_data);
            return stbi__errpuc("bad palette", "Corrupt TGA");
         }

         //   any data to skip? (offset usually = 0)
         stbi__skip(s, tga_palette_start );
         //   load the palette
         tga_palette = (unsigned char*)stbi__malloc_mad2(tga_palette_len, tga_comp, 0);
         if (!tga_palette) {
            STBI_FREE(tga_data);
            return stbi__errpuc("outofmem", "Out of memory");
         }
         if (tga_rgb16) {
            stbi_uc *pal_entry = tga_palette;
            STBI_ASSERT(tga_comp == STBI_rgb);
            for (i=0; i < tga_palette_len; ++i) {
               stbi__tga_read_rgb16(s, pal_entry);
               pal_entry += tga_comp;
            }
         } else if (!stbi__getn(s, tga_palette, tga_palette_len * tga_comp)) {
            STBI_FREE(tga_data);
            STBI_FREE(tga_palette);
            return stbi__errpuc("bad palette", "Corrupt TGA");
         }
      }
      //   load the data
      for (i=0; i < tga_width * tga_height; ++i)
      {
         //   if I'm in RLE mode, do I need to get an RLE chunk?
         if ( tga_is_RLE )
         {
            if ( RLE_count == 0 )
            {
               //   yep, get the next byte as a RLE command
               int RLE_cmd = stbi__get8(s);
               RLE_count = 1 + (RLE_cmd & 127);
               RLE_repeating = RLE_cmd >> 7;
               read_next_pixel = 1;
            } else if ( !RLE_repeating )
            {
               read_next_pixel = 1;
            }
         } else
         {
            read_next_pixel = 1;
         }
         //   OK, if I need to read a pixel, do it now
         if ( read_next_pixel )
         {
            //   load however much data we did have
            if ( tga_indexed )
            {
               // read in index, then perform the lookup
               int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s) : stbi__get16le(s);
               if ( pal_idx >= tga_palette_len ) {
                  // invalid index
                  pal_idx = 0;
               }
               pal_idx *= tga_comp;
               for (j = 0; j < tga_comp; ++j) {
                  raw_data[j] = tga_palette[pal_idx+j];
               }
            } else if(tga_rgb16) {
               STBI_ASSERT(tga_comp == STBI_rgb);
               stbi__tga_read_rgb16(s, raw_data);
            } else {
               //   read in the data raw
               for (j = 0; j < tga_comp; ++j) {
                  raw_data[j] = stbi__get8(s);
               }
            }
            //   clear the reading flag for the next pixel
            read_next_pixel = 0;
         } // end of reading a pixel

         // copy data
         for (j = 0; j < tga_comp; ++j)
            tga_data[i*tga_comp+j] = raw_data[j];

         //   in case we're in RLE mode, keep counting down
         --RLE_count;
      }
      //   do I need to invert the image?
      if ( tga_inverted )
      {
         for (j = 0; j*2 < tga_height; ++j)
         {
            int index1 = j * tga_width * tga_comp;
            int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
            for (i = tga_width * tga_comp; i > 0; --i)
            {
               unsigned char temp = tga_data[index1];
               tga_data[index1] = tga_data[index2];
               tga_data[index2] = temp;
               ++index1;
               ++index2;
            }
         }
      }
      //   clear my palette, if I had one
      if ( tga_palette != NULL )
      {
         STBI_FREE( tga_palette );
      }
   }

   // swap RGB - if the source data was RGB16, it already is in the right order
   if (tga_comp >= 3 && !tga_rgb16)
   {
      unsigned char* tga_pixel = tga_data;
      for (i=0; i < tga_width * tga_height; ++i)
      {
         unsigned char temp = tga_pixel[0];
         tga_pixel[0] = tga_pixel[2];
         tga_pixel[2] = temp;
         tga_pixel += tga_comp;
      }
   }

   // convert to target component count
   if (req_comp && req_comp != tga_comp)
      tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);

   //   the things I do to get rid of an error message, and yet keep
   //   Microsoft's C compilers happy... [8^(
   tga_palette_start = tga_palette_len = tga_palette_bits =
         tga_x_origin = tga_y_origin = 0;
   STBI_NOTUSED(tga_palette_start);
   //   OK, done
   return tga_data;
}
#endif
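
/*
   Illustrative sketch, not part of stb_image. In the TGA loader above, each
   RLE command byte carries a pixel count in its low 7 bits (stored minus
   one) and a "repeat" flag in its top bit. The simplified decoder below
   applies the same rule to 1-byte pixels held in memory. The names
   tga_rle_decode8 and tga_rle_selftest are hypothetical, and unlike the
   loader this version reports truncated input instead of reading on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: decodes TGA-style RLE for 1-byte pixels.
// Returns the number of pixels written, or 0 if the input ends early.
static size_t tga_rle_decode8(const unsigned char *src, size_t src_len,
                              unsigned char *dst, size_t dst_pixels)
{
   size_t in = 0, out = 0;
   assert(src != NULL && dst != NULL);
   while (out < dst_pixels) {
      unsigned int cmd, count, k;
      if (in >= src_len) return 0;
      cmd = src[in++];
      count = 1 + (cmd & 127);              // 1..128 pixels in this packet
      if (count > dst_pixels - out) count = (unsigned int)(dst_pixels - out);
      if (cmd >> 7) {                       // run packet: one pixel, repeated
         if (in >= src_len) return 0;
         for (k = 0; k < count; ++k) dst[out++] = src[in];
         ++in;
      } else {                              // raw packet: count literal pixels
         if (src_len - in < count) return 0;
         memcpy(dst + out, src + in, count);
         in += count;
         out += count;
      }
   }
   return out;
}

// Small usage example: 0x82 means "repeat the next pixel 3 times",
// 0x01 means "copy the next 2 pixels literally".
static int tga_rle_selftest(void)
{
   static const unsigned char packed[] = { 0x82, 7, 0x01, 1, 2 };
   static const unsigned char expect[] = { 7, 7, 7, 1, 2 };
   unsigned char unpacked[5];
   return tga_rle_decode8(packed, sizeof packed, unpacked, 5) == 5
       && memcmp(unpacked, expect, 5) == 0;
}
```

   The loader itself works one pixel at a time and keeps RLE_count and
   RLE_repeating across iterations; the packet-at-a-time loop here is only a
   different way to express the same format rule.
*/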

// *************************************************************************************************
// Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB

#ifndef STBI_NO_PSD
static int stbi__psd_test(stbi__context *s)
{
   int r = (stbi__get32be(s) == 0x38425053);
   stbi__rewind(s);
   return r;
}

static int stbi__psd_decode_rle(stbi__context *s, stbi_uc *p, int pixelCount)
{
   int count, nleft, len;

   count = 0;
   while ((nleft = pixelCount - count) > 0) {
      len = stbi__get8(s);
      if (len == 128) {
         // No-op.
      } else if (len < 128) {
         // Copy next len+1 bytes literally.
         len++;
         if (len > nleft) return 0; // corrupt data
         count += len;
         while (len) {
            *p = stbi__get8(s);
            p += 4;
            len--;
         }
      } else if (len > 128) {
         stbi_uc   val;
         // Next -len+1 bytes in the dest are replicated from next source byte.
         // (Interpret len as a negative 8-bit int.)
         len = 257 - len;
         if (len > nleft) return 0; // corrupt data
         val = stbi__get8(s);
         count += len;
         while (len) {
            *p = val;
            p += 4;
            len--;
         }
      }
   }

   return 1;
}
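
/*
   Illustrative sketch, not part of stb_image. stbi__psd_decode_rle above
   reads the packet header as an unsigned byte, so a "negative" header n shows
   up as 256+n and the run length -n+1 becomes 257 - len. The helper below
   spells out that mapping; the name packbits_count and its sign convention
   for the return value are assumptions made for this example.

```c
#include <assert.h>

// Hypothetical helper: interprets one RLE header byte (0..255).
// Returns +k for "copy k literal bytes", -k for "repeat the next byte k
// times", and 0 for the no-op header 128.
static int packbits_count(int header)
{
   assert(header >= 0 && header <= 255);
   if (header == 128) return 0;            // no-op
   if (header < 128) return header + 1;    // literal run of 1..128 bytes
   return -(257 - header);                 // replicated run of 2..128 bytes
}
```

   So header 0 copies a single byte, header 255 (that is, -1 as a signed
   byte) repeats the next byte twice, and header 129 (-127) repeats it 128
   times.
*/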

static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
{
   int pixelCount;
   int channelCount, compression;
   int channel, i;
   int bitdepth;
   int w,h;
   stbi_uc *out;
   STBI_NOTUSED(ri);

   // Check identifier
   if (stbi__get32be(s) != 0x38425053)   // "8BPS"
      return stbi__errpuc("not PSD", "Corrupt PSD image");

   // Check file type version.
   if (stbi__get16be(s) != 1)
      return stbi__errpuc("wrong version", "Unsupported version of PSD image");

   // Skip 6 reserved bytes.
   stbi__skip(s, 6 );

   // Read the number of channels (R, G, B, A, etc).
   channelCount = stbi__get16be(s);
   if (channelCount < 0 || channelCount > 16)
      return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");

   // Read the rows and columns of the image.
   h = stbi__get32be(s);
   w = stbi__get32be(s);

   if (h > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   if (w > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");

   // Make sure the depth is 8 or 16 bits.
   bitdepth = stbi__get16be(s);
   if (bitdepth != 8 && bitdepth != 16)
      return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");

   // Make sure the color mode is RGB.
   // Valid options are:
   //   0: Bitmap
   //   1: Grayscale
   //   2: Indexed color
   //   3: RGB color
   //   4: CMYK color
   //   7: Multichannel
   //   8: Duotone
   //   9: Lab color
   if (stbi__get16be(s) != 3)
      return stbi__errpuc("wrong color format", "PSD is not in RGB color format");

   // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
   stbi__skip(s,stbi__get32be(s) );

   // Skip the image resources.  (resolution, pen tool paths, etc)
   stbi__skip(s, stbi__get32be(s) );

   // Skip the layer and mask information section.
   stbi__skip(s, stbi__get32be(s) );

   // Find out if the data is compressed.
   // Known values:
   //   0: no compression
   //   1: RLE compressed
   compression = stbi__get16be(s);
   if (compression > 1)
      return stbi__errpuc("bad compression", "PSD has an unknown compression format");

   // Check size
   if (!stbi__mad3sizes_valid(4, w, h, 0))
      return stbi__errpuc("too large", "Corrupt PSD");

   // Create the destination image.

   if (!compression && bitdepth == 16 && bpc == 16) {
      out = (stbi_uc *) stbi__malloc_mad3(8, w, h, 0);
      ri->bits_per_channel = 16;
   } else
      out = (stbi_uc *) stbi__malloc(4 * w*h);

   if (!out) return stbi__errpuc("outofmem", "Out of memory");
   pixelCount = w*h;

   // Initialize the data to zero.
   //memset( out, 0, pixelCount * 4 );

   // Finally, the image data.
   if (compression) {
      // RLE as used by .PSD and .TIFF
      // Loop until you get the number of unpacked bytes you are expecting:
      //     Read the next source byte into n (as a signed 8-bit value).
      //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
      //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
      //     Else if n is -128 (byte value 128), no-op.
      // Endloop

      // The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
      // which we're going to just skip.
      stbi__skip(s, h * channelCount * 2 );

      // Read the RLE data by channel.
      for (channel = 0; channel < 4; channel++) {
         stbi_uc *p;

         p = out+channel;
         if (channel >= channelCount) {
            // Fill this channel with default data.
            for (i = 0; i < pixelCount; i++, p += 4)
               *p = (channel == 3 ? 255 : 0);
         } else {
            // Read the RLE data.
            if (!stbi__psd_decode_rle(s, p, pixelCount)) {
               STBI_FREE(out);
               return stbi__errpuc("corrupt", "bad RLE data");
            }
         }
      }

   } else {
      // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
      // where each channel consists of an 8-bit (or 16-bit) value for each pixel in the image.

      // Read the data by channel.
      for (channel = 0; channel < 4; channel++) {
         if (channel >= channelCount) {
            // Fill this channel with default data.
            if (bitdepth == 16 && bpc == 16) {
               stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
               stbi__uint16 val = channel == 3 ? 65535 : 0;
               for (i = 0; i < pixelCount; i++, q += 4)
                  *q = val;
            } else {
               stbi_uc *p = out+channel;
               stbi_uc val = channel == 3 ? 255 : 0;
               for (i = 0; i < pixelCount; i++, p += 4)
                  *p = val;
            }
         } else {
            if (ri->bits_per_channel == 16) {    // output bpc
               stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
               for (i = 0; i < pixelCount; i++, q += 4)
                  *q = (stbi__uint16) stbi__get16be(s);
            } else {
               stbi_uc *p = out+channel;
               if (bitdepth == 16) {  // input bpc
                  for (i = 0; i < pixelCount; i++, p += 4)
                     *p = (stbi_uc) (stbi__get16be(s) >> 8);
               } else {
                  for (i = 0; i < pixelCount; i++, p += 4)
                     *p = stbi__get8(s);
               }
            }
         }
      }
   }

   // remove weird white matte from PSD
   if (channelCount >= 4) {
      if (ri->bits_per_channel == 16) {
         for (i=0; i < w*h; ++i) {
            stbi__uint16 *pixel = (stbi__uint16 *) out + 4*i;
            if (pixel[3] != 0 && pixel[3] != 65535) {
               float a = pixel[3] / 65535.0f;
               float ra = 1.0f / a;
               float inv_a = 65535.0f * (1 - ra);
               pixel[0] = (stbi__uint16) (pixel[0]*ra + inv_a);
               pixel[1] = (stbi__uint16) (pixel[1]*ra + inv_a);
               pixel[2] = (stbi__uint16) (pixel[2]*ra + inv_a);
            }
         }
      } else {
         for (i=0; i < w*h; ++i) {
            unsigned char *pixel = out + 4*i;
            if (pixel[3] != 0 && pixel[3] != 255) {
               float a = pixel[3] / 255.0f;
               float ra = 1.0f / a;
               float inv_a = 255.0f * (1 - ra);
               pixel[0] = (unsigned char) (pixel[0]*ra + inv_a);
               pixel[1] = (unsigned char) (pixel[1]*ra + inv_a);
               pixel[2] = (unsigned char) (pixel[2]*ra + inv_a);
            }
         }
      }
   }

   // convert to desired output format
   if (req_comp && req_comp != 4) {
      if (ri->bits_per_channel == 16)
         out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, 4, req_comp, w, h);
      else
         out = stbi__convert_format(out, 4, req_comp, w, h);
      if (out == NULL) return out; // stbi__convert_format frees input on failure
   }

   if (comp) *comp = 4;
   *y = h;
   *x = w;

   return out;
}
#endif
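
/*
   Illustrative sketch, not part of stb_image. The "white matte" step in the
   PSD loader above can be read as undoing a blend over white: if the stored
   value is a*color + (1-a)*255, then color = stored/a + 255*(1 - 1/a), which
   is the pixel*ra + inv_a expression in the loader. The helper below applies
   that formula to a single 8-bit channel. The name psd_unmatte8 is
   hypothetical, and the clamp to 0..255 is an addition made for this sketch;
   the loader casts the float result directly.

```c
#include <assert.h>

// Hypothetical helper: removes a white matte from one 8-bit channel value c
// given the pixel's 8-bit alpha.
static unsigned char psd_unmatte8(int c, int alpha)
{
   float a, ra, v;
   assert(c >= 0 && c <= 255 && alpha >= 0 && alpha <= 255);
   if (alpha == 0 || alpha == 255) return (unsigned char) c; // nothing to undo
   a = alpha / 255.0f;
   ra = 1.0f / a;
   v = c * ra + 255.0f * (1 - ra);   // same arithmetic as the loader
   if (v < 0.0f) v = 0.0f;           // clamp: added in this sketch only
   if (v > 255.0f) v = 255.0f;
   return (unsigned char) v;
}
```

   With alpha 128, a stored value of 192 works out to roughly 129.5 and is
   truncated to 129, while fully opaque and fully transparent pixels pass
   through unchanged.
*/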

// *************************************************************************************************
// Softimage PIC loader
// by Tom Seddon
//
// See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
// See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/

#ifndef STBI_NO_PIC
static int stbi__pic_is4(stbi__context *s,const char *str)
{
   int i;
   for (i=0; i<4; ++i)
      if (stbi__get8(s) != (stbi_uc)str[i])
         return 0;

   return 1;
}

static int stbi__pic_test_core(stbi__context *s)
{
   int i;

   if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
      return 0;

   for(i=0;i<84;++i)
      stbi__get8(s);

   if (!stbi__pic_is4(s,"PICT"))
      return 0;

   return 1;
}

typedef struct
{
   stbi_uc size,type,channel;
} stbi__pic_packet;

static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
{
   int mask=0x80, i;

   for (i=0; i<4; ++i, mask>>=1) {
      if (channel & mask) {
         if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
         dest[i]=stbi__get8(s);
      }
   }

   return dest;
}

static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
{
   int mask=0x80,i;

   for (i=0;i<4; ++i, mask>>=1)
      if (channel&mask)
         dest[i]=src[i];
}
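
/*
   Illustrative sketch, not part of stb_image. stbi__readval and
   stbi__copyval above walk a PIC channel mask from bit 0x80 down to 0x10,
   writing dest[0] through dest[3] in that order, so the four bits select
   R, G, B, and A respectively. The helpers below make that layout explicit;
   the names pic_mask_channels and pic_mask_has_alpha are hypothetical.

```c
#include <assert.h>

// Hypothetical helper: counts how many of the four channel bits
// (0x80 = R, 0x40 = G, 0x20 = B, 0x10 = A) are set in a packet's mask.
static int pic_mask_channels(int mask)
{
   int bit, n = 0;
   assert(mask >= 0 && mask <= 0xFF);
   for (bit = 0x80; bit >= 0x10; bit >>= 1)
      if (mask & bit) ++n;
   return n;
}

// Hypothetical helper: nonzero if the mask selects the alpha channel.
static int pic_mask_has_alpha(int mask)
{
   return (mask & 0x10) != 0;
}
```

   A mask of 0xE0 therefore describes an RGB packet, and 0x10 an alpha-only
   packet; the low four bits are not consulted by either loader helper.
*/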

static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
{
   int act_comp=0,num_packets=0,y,chained;
   stbi__pic_packet packets[10];

   // this will (should...) cater for even some bizarre stuff like having data
   // for the same channel in multiple packets.
   do {
      stbi__pic_packet *packet;

      if (num_packets==sizeof(packets)/sizeof(packets[0]))
         return stbi__errpuc("bad format","too many packets");

      packet = &packets[num_packets++];

      chained = stbi__get8(s);
      packet->size    = stbi__get8(s);
      packet->type    = stbi__get8(s);
      packet->channel = stbi__get8(s);

      act_comp |= packet->channel;

      if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
      if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
   } while (chained);

   *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?

   for(y=0; y<height; ++y) {
      int packet_idx;

      for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
         stbi__pic_packet *packet = &packets[packet_idx];
         stbi_uc *dest = result+y*width*4;

         switch (packet->type) {
            default:
               return stbi__errpuc("bad format","packet has bad compression type");

            case 0: {//uncompressed
               int x;

               for(x=0;x<width;++x, dest+=4)
                  if (!stbi__readval(s,packet->channel,dest))
                     return 0;
               break;
            }

            case 1://Pure RLE
               {
                  int left=width, i;

                  while (left>0) {
                     stbi_uc count,value[4];

                     count=stbi__get8(s);
                     if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");

                     if (count > left)
                        count = (stbi_uc) left;

                     if (!stbi__readval(s,packet->channel,value))  return 0;

                     for(i=0; i<count; ++i,dest+=4)
                        stbi__copyval(packet->channel,dest,value);
                     left -= count;
                  }
               }
               break;

            case 2: {//Mixed RLE
6460               int left=width;
6461               while (left>0) {
6462                  int count = stbi__get8(s), i;
6463                  if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");
6464
6465                  if (count >= 128) { // Repeated
6466                     stbi_uc value[4];
6467
6468                     if (count==128)
6469                        count = stbi__get16be(s);
6470                     else
6471                        count -= 127;
6472                     if (count > left)
6473                        return stbi__errpuc("bad file","scanline overrun");
6474
6475                     if (!stbi__readval(s,packet->channel,value))
6476                        return 0;
6477
6478                     for(i=0;i<count;++i, dest += 4)
6479                        stbi__copyval(packet->channel,dest,value);
6480                  } else { // Raw
6481                     ++count;
6482                     if (count>left) return stbi__errpuc("bad file","scanline overrun");
6483
6484                     for(i=0;i<count;++i, dest+=4)
6485                        if (!stbi__readval(s,packet->channel,dest))
6486                           return 0;
6487                  }
6488                  left-=count;
6489               }
6490               break;
6491            }
6492         }
6493      }
6494   }
6495
6496   return result;
6497}
6498
6499static void *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp, stbi__result_info *ri)
6500{
6501   stbi_uc *result;
6502   int i, x,y, internal_comp;
6503   STBI_NOTUSED(ri);
6504
6505   if (!comp) comp = &internal_comp;
6506
6507   for (i=0; i<92; ++i)
6508      stbi__get8(s);
6509
6510   x = stbi__get16be(s);
6511   y = stbi__get16be(s);
6512
6513   if (y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
6514   if (x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
6515
6516   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
6517   if (!stbi__mad3sizes_valid(x, y, 4, 0)) return stbi__errpuc("too large", "PIC image too large to decode");
6518
6519   stbi__get32be(s); //skip `ratio'
6520   stbi__get16be(s); //skip `fields'
6521   stbi__get16be(s); //skip `pad'
6522
6523   // intermediate buffer is RGBA
6524   result = (stbi_uc *) stbi__malloc_mad3(x, y, 4, 0);
6525   if (!result) return stbi__errpuc("outofmem", "Out of memory");
6526   memset(result, 0xff, x*y*4);
6527
   if (!stbi__pic_load_core(s,x,y,comp, result)) {
      STBI_FREE(result);
      return 0; // don't fall through to stbi__convert_format with a NULL buffer
   }
6532   *px = x;
6533   *py = y;
6534   if (req_comp == 0) req_comp = *comp;
6535   result=stbi__convert_format(result,4,req_comp,x,y);
6536
6537   return result;
6538}
6539
6540static int stbi__pic_test(stbi__context *s)
6541{
6542   int r = stbi__pic_test_core(s);
6543   stbi__rewind(s);
6544   return r;
6545}
6546#endif
6547
6548// *************************************************************************************************
6549// GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
6550
6551#ifndef STBI_NO_GIF
6552typedef struct
6553{
6554   stbi__int16 prefix;
6555   stbi_uc first;
6556   stbi_uc suffix;
6557} stbi__gif_lzw;
6558
6559typedef struct
6560{
6561   int w,h;
6562   stbi_uc *out;                 // output buffer (always 4 components)
6563   stbi_uc *background;          // The current "background" as far as a gif is concerned
6564   stbi_uc *history;
6565   int flags, bgindex, ratio, transparent, eflags;
6566   stbi_uc  pal[256][4];
6567   stbi_uc lpal[256][4];
6568   stbi__gif_lzw codes[8192];
6569   stbi_uc *color_table;
6570   int parse, step;
6571   int lflags;
6572   int start_x, start_y;
6573   int max_x, max_y;
6574   int cur_x, cur_y;
6575   int line_size;
6576   int delay;
6577} stbi__gif;
6578
6579static int stbi__gif_test_raw(stbi__context *s)
6580{
6581   int sz;
6582   if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
6583   sz = stbi__get8(s);
6584   if (sz != '9' && sz != '7') return 0;
6585   if (stbi__get8(s) != 'a') return 0;
6586   return 1;
6587}
6588
6589static int stbi__gif_test(stbi__context *s)
6590{
6591   int r = stbi__gif_test_raw(s);
6592   stbi__rewind(s);
6593   return r;
6594}
6595
6596static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
6597{
6598   int i;
6599   for (i=0; i < num_entries; ++i) {
6600      pal[i][2] = stbi__get8(s);
6601      pal[i][1] = stbi__get8(s);
6602      pal[i][0] = stbi__get8(s);
6603      pal[i][3] = transp == i ? 0 : 255;
6604   }
6605}
6606
6607static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
6608{
6609   stbi_uc version;
6610   if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
6611      return stbi__err("not GIF", "Corrupt GIF");
6612
6613   version = stbi__get8(s);
6614   if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
6615   if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");
6616
6617   stbi__g_failure_reason = "";
6618   g->w = stbi__get16le(s);
6619   g->h = stbi__get16le(s);
6620   g->flags = stbi__get8(s);
6621   g->bgindex = stbi__get8(s);
6622   g->ratio = stbi__get8(s);
6623   g->transparent = -1;
6624
6625   if (g->w > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
6626   if (g->h > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
6627
6628   if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments
6629
6630   if (is_info) return 1;
6631
6632   if (g->flags & 0x80)
6633      stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
6634
6635   return 1;
6636}
6637
6638static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
6639{
6640   stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif));
6641   if (!g) return stbi__err("outofmem", "Out of memory");
6642   if (!stbi__gif_header(s, g, comp, 1)) {
6643      STBI_FREE(g);
6644      stbi__rewind( s );
6645      return 0;
6646   }
6647   if (x) *x = g->w;
6648   if (y) *y = g->h;
6649   STBI_FREE(g);
6650   return 1;
6651}
6652
6653static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
6654{
6655   stbi_uc *p, *c;
6656   int idx;
6657
6658   // recurse to decode the prefixes, since the linked-list is backwards,
6659   // and working backwards through an interleaved image would be nasty
6660   if (g->codes[code].prefix >= 0)
6661      stbi__out_gif_code(g, g->codes[code].prefix);
6662
6663   if (g->cur_y >= g->max_y) return;
6664
6665   idx = g->cur_x + g->cur_y;
6666   p = &g->out[idx];
6667   g->history[idx / 4] = 1;
6668
6669   c = &g->color_table[g->codes[code].suffix * 4];
6670   if (c[3] > 128) { // don't render transparent pixels;
6671      p[0] = c[2];
6672      p[1] = c[1];
6673      p[2] = c[0];
6674      p[3] = c[3];
6675   }
6676   g->cur_x += 4;
6677
6678   if (g->cur_x >= g->max_x) {
6679      g->cur_x = g->start_x;
6680      g->cur_y += g->step;
6681
6682      while (g->cur_y >= g->max_y && g->parse > 0) {
6683         g->step = (1 << g->parse) * g->line_size;
6684         g->cur_y = g->start_y + (g->step >> 1);
6685         --g->parse;
6686      }
6687   }
6688}
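
/*
   Illustrative sketch, not part of stb_image: the row order produced by the
   step/parse bookkeeping in stbi__out_gif_code for an interlaced image. It
   counts in whole rows instead of byte offsets and assumes the image starts
   at row 0. The name example_gif_interlace_order is hypothetical.

```c
#include <assert.h>

// Hypothetical helper: record the order in which the rows of an interlaced
// image with `height` rows are visited. `order` must have room for `height`
// entries. Returns the number of rows written.
static int example_gif_interlace_order(int height, int *order)
{
   int step = 8, parse = 3, cur_y = 0, n = 0;
   if (height <= 0) return 0;
   while (cur_y < height) {
      order[n++] = cur_y;
      cur_y += step;
      // past the last row: start the next pass with a finer step
      while (cur_y >= height && parse > 0) {
         step = 1 << parse;
         cur_y = step >> 1;
         --parse;
      }
   }
   return n;
}
```

   For an 8-row image this visits rows 0, 4, 2, 6, 1, 3, 5, 7: the four
   interlace passes start at rows 0, 4, 2 and 1 with steps 8, 8, 4 and 2.
*/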
6689
6690static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
6691{
6692   stbi_uc lzw_cs;
6693   stbi__int32 len, init_code;
6694   stbi__uint32 first;
6695   stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
6696   stbi__gif_lzw *p;
6697
6698   lzw_cs = stbi__get8(s);
6699   if (lzw_cs > 12) return NULL;
6700   clear = 1 << lzw_cs;
6701   first = 1;
6702   codesize = lzw_cs + 1;
6703   codemask = (1 << codesize) - 1;
6704   bits = 0;
6705   valid_bits = 0;
6706   for (init_code = 0; init_code < clear; init_code++) {
6707      g->codes[init_code].prefix = -1;
6708      g->codes[init_code].first = (stbi_uc) init_code;
6709      g->codes[init_code].suffix = (stbi_uc) init_code;
6710   }
6711
6712   // support no starting clear code
6713   avail = clear+2;
6714   oldcode = -1;
6715
6716   len = 0;
6717   for(;;) {
6718      if (valid_bits < codesize) {
6719         if (len == 0) {
6720            len = stbi__get8(s); // start new block
6721            if (len == 0)
6722               return g->out;
6723         }
6724         --len;
6725         bits |= (stbi__int32) stbi__get8(s) << valid_bits;
6726         valid_bits += 8;
6727      } else {
6728         stbi__int32 code = bits & codemask;
6729         bits >>= codesize;
6730         valid_bits -= codesize;
6731         // @OPTIMIZE: is there some way we can accelerate the non-clear path?
6732         if (code == clear) {  // clear code
6733            codesize = lzw_cs + 1;
6734            codemask = (1 << codesize) - 1;
6735            avail = clear + 2;
6736            oldcode = -1;
6737            first = 0;
6738         } else if (code == clear + 1) { // end of stream code
6739            stbi__skip(s, len);
6740            while ((len = stbi__get8(s)) > 0)
6741               stbi__skip(s,len);
6742            return g->out;
6743         } else if (code <= avail) {
6744            if (first) {
6745               return stbi__errpuc("no clear code", "Corrupt GIF");
6746            }
6747
6748            if (oldcode >= 0) {
6749               p = &g->codes[avail++];
6750               if (avail > 8192) {
6751                  return stbi__errpuc("too many codes", "Corrupt GIF");
6752               }
6753
6754               p->prefix = (stbi__int16) oldcode;
6755               p->first = g->codes[oldcode].first;
6756               p->suffix = (code == avail) ? p->first : g->codes[code].first;
6757            } else if (code == avail)
6758               return stbi__errpuc("illegal code in raster", "Corrupt GIF");
6759
6760            stbi__out_gif_code(g, (stbi__uint16) code);
6761
6762            if ((avail & codemask) == 0 && avail <= 0x0FFF) {
6763               codesize++;
6764               codemask = (1 << codesize) - 1;
6765            }
6766
6767            oldcode = code;
6768         } else {
6769            return stbi__errpuc("illegal code in raster", "Corrupt GIF");
6770         }
6771      }
6772   }
6773}
6774
6775// this function is designed to support animated gifs, although stb_image doesn't support it
6776// two back is the image from two frames ago, used for a very specific disposal format
6777static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp, stbi_uc *two_back)
6778{
6779   int dispose;
6780   int first_frame;
6781   int pi;
6782   int pcount;
6783   STBI_NOTUSED(req_comp);
6784
6785   // on first frame, any non-written pixels get the background colour (non-transparent)
6786   first_frame = 0;
6787   if (g->out == 0) {
6788      if (!stbi__gif_header(s, g, comp,0)) return 0; // stbi__g_failure_reason set by stbi__gif_header
6789      if (!stbi__mad3sizes_valid(4, g->w, g->h, 0))
6790         return stbi__errpuc("too large", "GIF image is too large");
6791      pcount = g->w * g->h;
6792      g->out = (stbi_uc *) stbi__malloc(4 * pcount);
6793      g->background = (stbi_uc *) stbi__malloc(4 * pcount);
6794      g->history = (stbi_uc *) stbi__malloc(pcount);
6795      if (!g->out || !g->background || !g->history)
6796         return stbi__errpuc("outofmem", "Out of memory");
6797
6798      // image is treated as "transparent" at the start - ie, nothing overwrites the current background;
6799      // background colour is only used for pixels that are not rendered first frame, after that "background"
6800      // color refers to the color that was there the previous frame.
6801      memset(g->out, 0x00, 4 * pcount);
6802      memset(g->background, 0x00, 4 * pcount); // state of the background (starts transparent)
6803      memset(g->history, 0x00, pcount);        // pixels that were affected previous frame
6804      first_frame = 1;
6805   } else {
6806      // second frame - how do we dispose of the previous one?
6807      dispose = (g->eflags & 0x1C) >> 2;
6808      pcount = g->w * g->h;
6809
6810      if ((dispose == 3) && (two_back == 0)) {
6811         dispose = 2; // if I don't have an image to revert back to, default to the old background
6812      }
6813
6814      if (dispose == 3) { // use previous graphic
6815         for (pi = 0; pi < pcount; ++pi) {
6816            if (g->history[pi]) {
6817               memcpy( &g->out[pi * 4], &two_back[pi * 4], 4 );
6818            }
6819         }
6820      } else if (dispose == 2) {
6821         // restore what was changed last frame to background before that frame;
6822         for (pi = 0; pi < pcount; ++pi) {
6823            if (g->history[pi]) {
6824               memcpy( &g->out[pi * 4], &g->background[pi * 4], 4 );
6825            }
6826         }
6827      } else {
         // This is a non-disposal case either way, so just
6829         // leave the pixels as is, and they will become the new background
6830         // 1: do not dispose
6831         // 0:  not specified.
6832      }
6833
      // background is what out is after the undoing of the previous frame;
6835      memcpy( g->background, g->out, 4 * g->w * g->h );
6836   }
6837
6838   // clear my history;
6839   memset( g->history, 0x00, g->w * g->h );        // pixels that were affected previous frame
6840
6841   for (;;) {
6842      int tag = stbi__get8(s);
6843      switch (tag) {
6844         case 0x2C: /* Image Descriptor */
6845         {
6846            stbi__int32 x, y, w, h;
6847            stbi_uc *o;
6848
6849            x = stbi__get16le(s);
6850            y = stbi__get16le(s);
6851            w = stbi__get16le(s);
6852            h = stbi__get16le(s);
6853            if (((x + w) > (g->w)) || ((y + h) > (g->h)))
6854               return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
6855
6856            g->line_size = g->w * 4;
6857            g->start_x = x * 4;
6858            g->start_y = y * g->line_size;
6859            g->max_x   = g->start_x + w * 4;
6860            g->max_y   = g->start_y + h * g->line_size;
6861            g->cur_x   = g->start_x;
6862            g->cur_y   = g->start_y;
6863
6864            // if the width of the specified rectangle is 0, that means
6865            // we may not see *any* pixels or the image is malformed;
6866            // to make sure this is caught, move the current y down to
6867            // max_y (which is what out_gif_code checks).
6868            if (w == 0)
6869               g->cur_y = g->max_y;
6870
6871            g->lflags = stbi__get8(s);
6872
6873            if (g->lflags & 0x40) {
6874               g->step = 8 * g->line_size; // first interlaced spacing
6875               g->parse = 3;
6876            } else {
6877               g->step = g->line_size;
6878               g->parse = 0;
6879            }
6880
6881            if (g->lflags & 0x80) {
6882               stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
6883               g->color_table = (stbi_uc *) g->lpal;
6884            } else if (g->flags & 0x80) {
6885               g->color_table = (stbi_uc *) g->pal;
6886            } else
6887               return stbi__errpuc("missing color table", "Corrupt GIF");
6888
6889            o = stbi__process_gif_raster(s, g);
6890            if (!o) return NULL;
6891
            // recompute the pixel count before the first-frame background fill
6893            pcount = g->w * g->h;
6894            if (first_frame && (g->bgindex > 0)) {
6895               // if first frame, any pixel not drawn to gets the background color
6896               for (pi = 0; pi < pcount; ++pi) {
6897                  if (g->history[pi] == 0) {
6898                     g->pal[g->bgindex][3] = 255; // just in case it was made transparent, undo that; It will be reset next frame if need be;
6899                     memcpy( &g->out[pi * 4], &g->pal[g->bgindex], 4 );
6900                  }
6901               }
6902            }
6903
6904            return o;
6905         }
6906
         case 0x21: // Extension Introducer (Graphic Control, Comment, Application, ...)
6908         {
6909            int len;
6910            int ext = stbi__get8(s);
6911            if (ext == 0xF9) { // Graphic Control Extension.
6912               len = stbi__get8(s);
6913               if (len == 4) {
6914                  g->eflags = stbi__get8(s);
6915                  g->delay = 10 * stbi__get16le(s); // delay - 1/100th of a second, saving as 1/1000ths.
6916
6917                  // unset old transparent
6918                  if (g->transparent >= 0) {
6919                     g->pal[g->transparent][3] = 255;
6920                  }
6921                  if (g->eflags & 0x01) {
6922                     g->transparent = stbi__get8(s);
6923                     if (g->transparent >= 0) {
6924                        g->pal[g->transparent][3] = 0;
6925                     }
6926                  } else {
6927                     // don't need transparent
6928                     stbi__skip(s, 1);
6929                     g->transparent = -1;
6930                  }
6931               } else {
6932                  stbi__skip(s, len);
6933                  break;
6934               }
6935            }
6936            while ((len = stbi__get8(s)) != 0) {
6937               stbi__skip(s, len);
6938            }
6939            break;
6940         }
6941
6942         case 0x3B: // gif stream termination code
6943            return (stbi_uc *) s; // using '1' causes warning on some compilers
6944
6945         default:
6946            return stbi__errpuc("unknown code", "Corrupt GIF");
6947      }
6948   }
6949}
6950
6951static void *stbi__load_gif_main_outofmem(stbi__gif *g, stbi_uc *out, int **delays)
6952{
6953   STBI_FREE(g->out);
6954   STBI_FREE(g->history);
6955   STBI_FREE(g->background);
6956
6957   if (out) STBI_FREE(out);
6958   if (delays && *delays) STBI_FREE(*delays);
6959   return stbi__errpuc("outofmem", "Out of memory");
6960}
6961
6962static void *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
6963{
6964   if (stbi__gif_test(s)) {
6965      int layers = 0;
6966      stbi_uc *u = 0;
6967      stbi_uc *out = 0;
6968      stbi_uc *two_back = 0;
6969      stbi__gif g;
6970      int stride;
6971      int out_size = 0;
6972      int delays_size = 0;
6973
6974      STBI_NOTUSED(out_size);
6975      STBI_NOTUSED(delays_size);
6976
6977      memset(&g, 0, sizeof(g));
6978      if (delays) {
6979         *delays = 0;
6980      }
6981
6982      do {
6983         u = stbi__gif_load_next(s, &g, comp, req_comp, two_back);
6984         if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
6985
6986         if (u) {
6987            *x = g.w;
6988            *y = g.h;
6989            ++layers;
6990            stride = g.w * g.h * 4;
6991
6992            if (out) {
6993               void *tmp = (stbi_uc*) STBI_REALLOC_SIZED( out, out_size, layers * stride );
6994               if (!tmp) {
6995                  void *ret = stbi__load_gif_main_outofmem(&g, out, delays);
6996                  if (delays && *delays) *delays = 0;
6997                  return ret;
6998               }
6999               else {
7000                   out = (stbi_uc*) tmp;
7001                   out_size = layers * stride;
7002               }
7003
7004               if (delays) {
7005                  int *new_delays = (int*) STBI_REALLOC_SIZED( *delays, delays_size, sizeof(int) * layers );
7006                  if (!new_delays)
7007                     return stbi__load_gif_main_outofmem(&g, out, delays);
7008                  *delays = new_delays;
7009                  delays_size = layers * sizeof(int);
7010               }
7011            } else {
7012               out = (stbi_uc*)stbi__malloc( layers * stride );
7013               if (!out) {
7014                  void *ret = stbi__load_gif_main_outofmem(&g, out, delays);
7015                  if (delays && *delays) *delays = 0;
7016                  return ret;
7017               }
7018               out_size = layers * stride;
7019               if (delays) {
7020                  *delays = (int*) stbi__malloc( layers * sizeof(int) );
7021                  if (!*delays)
7022                     return stbi__load_gif_main_outofmem(&g, out, delays);
7023                  delays_size = layers * sizeof(int);
7024               }
7025            }
7026            memcpy( out + ((layers - 1) * stride), u, stride );
7027            if (layers >= 2) {
7028               two_back = out + (layers - 2) * stride;
7029            }
7030
7031            if (delays) {
7032               (*delays)[layers - 1U] = g.delay;
7033            }
7034         }
7035      } while (u != 0);
7036
7037      // free temp buffer;
7038      STBI_FREE(g.out);
7039      STBI_FREE(g.history);
7040      STBI_FREE(g.background);
7041
7042      // do the final conversion after loading everything;
7043      if (req_comp && req_comp != 4)
7044         out = stbi__convert_format(out, 4, req_comp, layers * g.w, g.h);
7045
7046      *z = layers;
7047      return out;
7048   } else {
      return stbi__errpuc("not GIF", "Image was not a GIF.");
7050   }
7051}
7052
7053static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
7054{
7055   stbi_uc *u = 0;
7056   stbi__gif g;
7057   memset(&g, 0, sizeof(g));
7058   STBI_NOTUSED(ri);
7059
7060   u = stbi__gif_load_next(s, &g, comp, req_comp, 0);
7061   if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
7062   if (u) {
7063      *x = g.w;
7064      *y = g.h;
7065
7066      // moved conversion to after successful load so that the same
7067      // can be done for multiple frames.
7068      if (req_comp && req_comp != 4)
7069         u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
7070   } else if (g.out) {
7071      // if there was an error and we allocated an image buffer, free it!
7072      STBI_FREE(g.out);
7073   }
7074
7075   // free buffers needed for multiple frame loading;
7076   STBI_FREE(g.history);
7077   STBI_FREE(g.background);
7078
7079   return u;
7080}
7081
7082static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
7083{
7084   return stbi__gif_info_raw(s,x,y,comp);
7085}
7086#endif
7087
7088// *************************************************************************************************
7089// Radiance RGBE HDR loader
7090// originally by Nicolas Schulz
7091#ifndef STBI_NO_HDR
7092static int stbi__hdr_test_core(stbi__context *s, const char *signature)
7093{
7094   int i;
7095   for (i=0; signature[i]; ++i)
7096      if (stbi__get8(s) != signature[i])
7097          return 0;
7098   stbi__rewind(s);
7099   return 1;
7100}
7101
7102static int stbi__hdr_test(stbi__context* s)
7103{
7104   int r = stbi__hdr_test_core(s, "#?RADIANCE\n");
7105   stbi__rewind(s);
7106   if(!r) {
7107       r = stbi__hdr_test_core(s, "#?RGBE\n");
7108       stbi__rewind(s);
7109   }
7110   return r;
7111}
7112
7113#define STBI__HDR_BUFLEN  1024
7114static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
7115{
7116   int len=0;
7117   char c = '\0';
7118
7119   c = (char) stbi__get8(z);
7120
7121   while (!stbi__at_eof(z) && c != '\n') {
7122      buffer[len++] = c;
7123      if (len == STBI__HDR_BUFLEN-1) {
7124         // flush to end of line
7125         while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
7126            ;
7127         break;
7128      }
7129      c = (char) stbi__get8(z);
7130   }
7131
7132   buffer[len] = 0;
7133   return buffer;
7134}
7135
7136static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
7137{
7138   if ( input[3] != 0 ) {
7139      float f1;
7140      // Exponent
7141      f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
7142      if (req_comp <= 2)
7143         output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
7144      else {
7145         output[0] = input[0] * f1;
7146         output[1] = input[1] * f1;
7147         output[2] = input[2] * f1;
7148      }
7149      if (req_comp == 2) output[1] = 1;
7150      if (req_comp == 4) output[3] = 1;
7151   } else {
7152      switch (req_comp) {
7153         case 4: output[3] = 1; /* fallthrough */
7154         case 3: output[0] = output[1] = output[2] = 0;
7155                 break;
7156         case 2: output[1] = 1; /* fallthrough */
7157         case 1: output[0] = 0;
7158                 break;
7159      }
7160   }
7161}
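
/*
   Illustrative sketch, not part of stb_image: the three-channel path of
   stbi__hdr_convert above, as a standalone function. Each 8-bit mantissa is
   scaled by 2^(E - 136), i.e. 2^(E - 128) divided by 256. The name
   example_rgbe_to_float is hypothetical.

```c
#include <assert.h>
#include <math.h>

// Hypothetical stand-in for the req_comp == 3 case of stbi__hdr_convert.
static void example_rgbe_to_float(const unsigned char rgbe[4], float out[3])
{
   if (rgbe[3] != 0) {
      // shared exponent: scale = 2^(E - 136)
      float scale = (float) ldexp(1.0, rgbe[3] - (128 + 8));
      out[0] = rgbe[0] * scale;
      out[1] = rgbe[1] * scale;
      out[2] = rgbe[2] * scale;
   } else {
      out[0] = out[1] = out[2] = 0.0f; // a zero exponent encodes black
   }
}
```

   With rgbe = {128, 64, 32, 129} the scale is 2^-7, so the result is
   (1.0, 0.5, 0.25).
*/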
7162
7163static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
7164{
7165   char buffer[STBI__HDR_BUFLEN];
7166   char *token;
7167   int valid = 0;
7168   int width, height;
7169   stbi_uc *scanline;
7170   float *hdr_data;
7171   int len;
7172   unsigned char count, value;
7173   int i, j, k, c1,c2, z;
7174   const char *headerToken;
7175   STBI_NOTUSED(ri);
7176
7177   // Check identifier
7178   headerToken = stbi__hdr_gettoken(s,buffer);
7179   if (strcmp(headerToken, "#?RADIANCE") != 0 && strcmp(headerToken, "#?RGBE") != 0)
7180      return stbi__errpf("not HDR", "Corrupt HDR image");
7181
7182   // Parse header
7183   for(;;) {
7184      token = stbi__hdr_gettoken(s,buffer);
7185      if (token[0] == 0) break;
7186      if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
7187   }
7188
7189   if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");
7190
7191   // Parse width and height
7192   // can't use sscanf() if we're not using stdio!
7193   token = stbi__hdr_gettoken(s,buffer);
7194   if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
7195   token += 3;
7196   height = (int) strtol(token, &token, 10);
7197   while (*token == ' ') ++token;
7198   if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
7199   token += 3;
7200   width = (int) strtol(token, NULL, 10);
7201
7202   if (height > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
7203   if (width > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
7204
7205   *x = width;
7206   *y = height;
7207
7208   if (comp) *comp = 3;
7209   if (req_comp == 0) req_comp = 3;
7210
7211   if (!stbi__mad4sizes_valid(width, height, req_comp, sizeof(float), 0))
7212      return stbi__errpf("too large", "HDR image is too large");
7213
7214   // Read data
7215   hdr_data = (float *) stbi__malloc_mad4(width, height, req_comp, sizeof(float), 0);
7216   if (!hdr_data)
7217      return stbi__errpf("outofmem", "Out of memory");
7218
7219   // Load image data
7220   // image data is stored as some number of sca
7221   if ( width < 8 || width >= 32768) {
7222      // Read flat data
7223      for (j=0; j < height; ++j) {
7224         for (i=0; i < width; ++i) {
7225            stbi_uc rgbe[4];
7226           main_decode_loop:
7227            stbi__getn(s, rgbe, 4);
7228            stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
7229         }
7230      }
7231   } else {
7232      // Read RLE-encoded data
7233      scanline = NULL;
7234
7235      for (j = 0; j < height; ++j) {
7236         c1 = stbi__get8(s);
7237         c2 = stbi__get8(s);
7238         len = stbi__get8(s);
7239         if (c1 != 2 || c2 != 2 || (len & 0x80)) {
7240            // not run-length encoded, so we have to actually use THIS data as a decoded
7241            // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
7242            stbi_uc rgbe[4];
7243            rgbe[0] = (stbi_uc) c1;
7244            rgbe[1] = (stbi_uc) c2;
7245            rgbe[2] = (stbi_uc) len;
7246            rgbe[3] = (stbi_uc) stbi__get8(s);
7247            stbi__hdr_convert(hdr_data, rgbe, req_comp);
7248            i = 1;
7249            j = 0;
7250            STBI_FREE(scanline);
7251            goto main_decode_loop; // yes, this makes no sense
7252         }
7253         len <<= 8;
7254         len |= stbi__get8(s);
7255         if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
7256         if (scanline == NULL) {
7257            scanline = (stbi_uc *) stbi__malloc_mad2(width, 4, 0);
7258            if (!scanline) {
7259               STBI_FREE(hdr_data);
7260               return stbi__errpf("outofmem", "Out of memory");
7261            }
7262         }
7263
7264         for (k = 0; k < 4; ++k) {
7265            int nleft;
7266            i = 0;
7267            while ((nleft = width - i) > 0) {
7268               count = stbi__get8(s);
7269               if (count > 128) {
7270                  // Run
7271                  value = stbi__get8(s);
7272                  count -= 128;
7273                  if ((count == 0) || (count > nleft)) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
7274                  for (z = 0; z < count; ++z)
7275                     scanline[i++ * 4 + k] = value;
7276               } else {
7277                  // Dump
7278                  if ((count == 0) || (count > nleft)) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
7279                  for (z = 0; z < count; ++z)
7280                     scanline[i++ * 4 + k] = stbi__get8(s);
7281               }
7282            }
7283         }
7284         for (i=0; i < width; ++i)
7285            stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
7286      }
7287      if (scanline)
7288         STBI_FREE(scanline);
7289   }
7290
7291   return hdr_data;
7292}
7293
7294static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
7295{
7296   char buffer[STBI__HDR_BUFLEN];
7297   char *token;
7298   int valid = 0;
7299   int dummy;
7300
7301   if (!x) x = &dummy;
7302   if (!y) y = &dummy;
7303   if (!comp) comp = &dummy;
7304
7305   if (stbi__hdr_test(s) == 0) {
7306       stbi__rewind( s );
7307       return 0;
7308   }
7309
7310   for(;;) {
7311      token = stbi__hdr_gettoken(s,buffer);
7312      if (token[0] == 0) break;
7313      if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
7314   }
7315
7316   if (!valid) {
7317       stbi__rewind( s );
7318       return 0;
7319   }
7320   token = stbi__hdr_gettoken(s,buffer);
7321   if (strncmp(token, "-Y ", 3)) {
7322       stbi__rewind( s );
7323       return 0;
7324   }
7325   token += 3;
7326   *y = (int) strtol(token, &token, 10);
7327   while (*token == ' ') ++token;
7328   if (strncmp(token, "+X ", 3)) {
7329       stbi__rewind( s );
7330       return 0;
7331   }
7332   token += 3;
7333   *x = (int) strtol(token, NULL, 10);
7334   *comp = 3;
7335   return 1;
7336}
7337#endif // STBI_NO_HDR
7338
7339#ifndef STBI_NO_BMP
7340static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
7341{
7342   void *p;
7343   stbi__bmp_data info;
7344
7345   info.all_a = 255;
7346   p = stbi__bmp_parse_header(s, &info);
7347   if (p == NULL) {
7348      stbi__rewind( s );
7349      return 0;
7350   }
7351   if (x) *x = s->img_x;
7352   if (y) *y = s->img_y;
7353   if (comp) {
7354      if (info.bpp == 24 && info.ma == 0xff000000)
7355         *comp = 3;
7356      else
7357         *comp = info.ma ? 4 : 3;
7358   }
7359   return 1;
7360}
7361#endif
7362
7363#ifndef STBI_NO_PSD
7364static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
7365{
7366   int channelCount, dummy, depth;
7367   if (!x) x = &dummy;
7368   if (!y) y = &dummy;
7369   if (!comp) comp = &dummy;
7370   if (stbi__get32be(s) != 0x38425053) {
7371       stbi__rewind( s );
7372       return 0;
7373   }
7374   if (stbi__get16be(s) != 1) {
7375       stbi__rewind( s );
7376       return 0;
7377   }
7378   stbi__skip(s, 6);
7379   channelCount = stbi__get16be(s);
7380   if (channelCount < 0 || channelCount > 16) {
7381       stbi__rewind( s );
7382       return 0;
7383   }
7384   *y = stbi__get32be(s);
7385   *x = stbi__get32be(s);
7386   depth = stbi__get16be(s);
7387   if (depth != 8 && depth != 16) {
7388       stbi__rewind( s );
7389       return 0;
7390   }
7391   if (stbi__get16be(s) != 3) {
7392       stbi__rewind( s );
7393       return 0;
7394   }
7395   *comp = 4;
7396   return 1;
7397}
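
/*
   Illustrative sketch, not part of stb_image: the checks in stbi__psd_info
   correspond to a fixed 26-byte big-endian header (signature, version,
   6 reserved bytes, channel count, height, width, depth, color mode).
   The helper names be16, be32 and psd_header_info are assumptions made for
   this example, which reads from a memory buffer rather than an stbi__context.

```c
#include <assert.h>
#include <stddef.h>

static unsigned be16(const unsigned char *p)
{
   return ((unsigned) p[0] << 8) | p[1];
}

static unsigned long be32(const unsigned char *p)
{
   return ((unsigned long) p[0] << 24) | ((unsigned long) p[1] << 16) |
          ((unsigned long) p[2] << 8)  |  (unsigned long) p[3];
}

// Hypothetical helper: returns 1 and fills w, h, depth if the header passes
// the same checks as stbi__psd_info; returns 0 otherwise.
static int psd_header_info(const unsigned char *hdr, size_t len,
                           unsigned long *w, unsigned long *h, unsigned *depth)
{
   assert(hdr != NULL && w != NULL && h != NULL && depth != NULL);
   if (len < 26) return 0;
   if (be32(hdr) != 0x38425053UL) return 0;    // "8BPS"
   if (be16(hdr + 4) != 1) return 0;           // version
   // bytes 6..11 are reserved and skipped
   if (be16(hdr + 12) > 16) return 0;          // channel count
   *h = be32(hdr + 14);
   *w = be32(hdr + 18);
   *depth = be16(hdr + 22);
   if (*depth != 8 && *depth != 16) return 0;
   if (be16(hdr + 24) != 3) return 0;          // color mode 3 (RGB)
   return 1;
}
```

   Note the field order: height comes before width, which is why the code
   above assigns *y before *x.
*/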
7398
7399static int stbi__psd_is16(stbi__context *s)
7400{
7401   int channelCount, depth;
7402   if (stbi__get32be(s) != 0x38425053) {
7403       stbi__rewind( s );
7404       return 0;
7405   }
7406   if (stbi__get16be(s) != 1) {
7407       stbi__rewind( s );
7408       return 0;
7409   }
7410   stbi__skip(s, 6);
7411   channelCount = stbi__get16be(s);
7412   if (channelCount < 0 || channelCount > 16) {
7413       stbi__rewind( s );
7414       return 0;
7415   }
7416   STBI_NOTUSED(stbi__get32be(s));
7417   STBI_NOTUSED(stbi__get32be(s));
7418   depth = stbi__get16be(s);
7419   if (depth != 16) {
7420       stbi__rewind( s );
7421       return 0;
7422   }
7423   return 1;
7424}
7425#endif
7426
7427#ifndef STBI_NO_PIC
7428static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
7429{
7430   int act_comp=0,num_packets=0,chained,dummy;
7431   stbi__pic_packet packets[10];
7432
7433   if (!x) x = &dummy;
7434   if (!y) y = &dummy;
7435   if (!comp) comp = &dummy;
7436
7437   if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
7438      stbi__rewind(s);
7439      return 0;
7440   }
7441
7442   stbi__skip(s, 88);
7443
7444   *x = stbi__get16be(s);
7445   *y = stbi__get16be(s);
7446   if (stbi__at_eof(s)) {
7447      stbi__rewind( s);
7448      return 0;
7449   }
7450   if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
7451      stbi__rewind( s );
7452      return 0;
7453   }
7454
7455   stbi__skip(s, 8);
7456
7457   do {
7458      stbi__pic_packet *packet;
7459
      if (num_packets==sizeof(packets)/sizeof(packets[0])) {
         stbi__rewind( s );
         return 0;
      }
7462
7463      packet = &packets[num_packets++];
7464      chained = stbi__get8(s);
7465      packet->size    = stbi__get8(s);
7466      packet->type    = stbi__get8(s);
7467      packet->channel = stbi__get8(s);
7468      act_comp |= packet->channel;
7469
7470      if (stbi__at_eof(s)) {
7471          stbi__rewind( s );
7472          return 0;
7473      }
7474      if (packet->size != 8) {
7475          stbi__rewind( s );
7476          return 0;
7477      }
7478   } while (chained);
7479
7480   *comp = (act_comp & 0x10 ? 4 : 3);
7481
7482   return 1;
7483}
7484#endif
7485
7486// *************************************************************************************************
7487// Portable Gray Map and Portable Pixel Map loader
7488// by Ken Miller
7489//
7490// PGM: http://netpbm.sourceforge.net/doc/pgm.html
7491// PPM: http://netpbm.sourceforge.net/doc/ppm.html
7492//
// Known limitations:
//    Does not support ASCII image data (formats P2 and P3)
7496
7497#ifndef STBI_NO_PNM
7498
7499static int      stbi__pnm_test(stbi__context *s)
7500{
7501   char p, t;
7502   p = (char) stbi__get8(s);
7503   t = (char) stbi__get8(s);
7504   if (p != 'P' || (t != '5' && t != '6')) {
7505       stbi__rewind( s );
7506       return 0;
7507   }
7508   return 1;
7509}
7510
7511static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
7512{
   stbi_uc *out;
7515
7516   ri->bits_per_channel = stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n);
7517   if (ri->bits_per_channel == 0)
7518      return 0;
7519
7520   if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
7521   if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
7522
7523   *x = s->img_x;
7524   *y = s->img_y;
7525   if (comp) *comp = s->img_n;
7526
7527   if (!stbi__mad4sizes_valid(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0))
7528      return stbi__errpuc("too large", "PNM too large");
7529
7530   out = (stbi_uc *) stbi__malloc_mad4(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0);
7531   if (!out) return stbi__errpuc("outofmem", "Out of memory");
7532   if (!stbi__getn(s, out, s->img_n * s->img_x * s->img_y * (ri->bits_per_channel / 8))) {
7533      STBI_FREE(out);
7534      return stbi__errpuc("bad PNM", "PNM file truncated");
7535   }
7536
7537   if (req_comp && req_comp != s->img_n) {
7538      if (ri->bits_per_channel == 16) {
7539         out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, s->img_n, req_comp, s->img_x, s->img_y);
7540      } else {
7541         out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
7542      }
7543      if (out == NULL) return out; // stbi__convert_format frees input on failure
7544   }
7545   return out;
7546}
7547
7548static int      stbi__pnm_isspace(char c)
7549{
7550   return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
7551}
7552
7553static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
7554{
7555   for (;;) {
7556      while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
7557         *c = (char) stbi__get8(s);
7558
7559      if (stbi__at_eof(s) || *c != '#')
7560         break;
7561
7562      while (!stbi__at_eof(s) && *c != '\n' && *c != '\r' )
7563         *c = (char) stbi__get8(s);
7564   }
7565}
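
/*
   Illustrative sketch, not part of stb_image: stbi__pnm_skip_whitespace
   alternates between skipping whitespace and skipping '#' comments that run
   to the end of the line. The same loop over a memory buffer might look like
   this; the names pnm_is_space and pnm_skip_ws_and_comments are assumptions
   made for this example.

```c
#include <assert.h>
#include <stddef.h>

static int pnm_is_space(char c)
{
   return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

// Hypothetical helper: return the index of the first byte at or after pos
// that is neither whitespace nor part of a '#' comment (len if there is none).
static size_t pnm_skip_ws_and_comments(const char *buf, size_t len, size_t pos)
{
   assert(buf != NULL);
   for (;;) {
      while (pos < len && pnm_is_space(buf[pos]))
         ++pos;
      if (pos >= len || buf[pos] != '#')
         return pos;
      // a comment runs to the end of the line
      while (pos < len && buf[pos] != '\n' && buf[pos] != '\r')
         ++pos;
   }
}
```

   The outer loop is what lets several comment lines in a row, with
   whitespace between them, be skipped in one call.
*/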
7566
7567static int      stbi__pnm_isdigit(char c)
7568{
7569   return c >= '0' && c <= '9';
7570}
7571
7572static int      stbi__pnm_getinteger(stbi__context *s, char *c)
7573{
7574   int value = 0;
7575
7576   while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
7577      value = value*10 + (*c - '0');
7578      *c = (char) stbi__get8(s);
7579      if((value > 214748364) || (value == 214748364 && *c > '7'))
7580          return stbi__err("integer parse overflow", "Parsing an integer in the PPM header overflowed a 32-bit int");
7581   }
7582
7583   return value;
7584}
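
/*
   Illustrative sketch, not part of stb_image: stbi__pnm_getinteger guards
   against overflow by comparing against 214748364 (INT_MAX/10 for a 32-bit
   int) before the next digit is folded in. A variant that derives the bound
   from INT_MAX and checks before each multiply is shown below. The name
   parse_decimal_checked and its return convention are assumptions made for
   this example; the library instead reports overflow through stbi__err and
   returns 0.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

// Hypothetical helper: parse the leading decimal digits of a NUL-terminated string.
// Returns 1 and stores the value on success; returns 0 if there are no digits
// or the value would exceed INT_MAX.
static int parse_decimal_checked(const char *str, int *out)
{
   int value = 0;
   const char *p = str;
   assert(str != NULL && out != NULL);
   while (*p >= '0' && *p <= '9') {
      int digit = *p - '0';
      // check first, so that value*10 + digit can never overflow
      if (value > (INT_MAX - digit) / 10)
         return 0;
      value = value * 10 + digit;
      ++p;
   }
   if (p == str) return 0;
   *out = value;
   return 1;
}
```

   Returning a separate success flag keeps a parsed value of 0 distinguishable
   from a failure, which the int-only return of the function above cannot do.
*/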
7585
7586static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
7587{
7588   int maxv, dummy;
7589   char c, p, t;
7590
7591   if (!x) x = &dummy;
7592   if (!y) y = &dummy;
7593   if (!comp) comp = &dummy;
7594
7595   stbi__rewind(s);
7596
7597   // Get identifier
7598   p = (char) stbi__get8(s);
7599   t = (char) stbi__get8(s);
7600   if (p != 'P' || (t != '5' && t != '6')) {
7601       stbi__rewind(s);
7602       return 0;
7603   }
7604
7605   *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm
7606
7607   c = (char) stbi__get8(s);
7608   stbi__pnm_skip_whitespace(s, &c);
7609
7610   *x = stbi__pnm_getinteger(s, &c); // read width
7611   if(*x == 0)
7612       return stbi__err("invalid width", "PPM image header had zero or overflowing width");
7613   stbi__pnm_skip_whitespace(s, &c);
7614
7615   *y = stbi__pnm_getinteger(s, &c); // read height
7616   if (*y == 0)
       return stbi__err("invalid height", "PPM image header had zero or overflowing height");
7618   stbi__pnm_skip_whitespace(s, &c);
7619
7620   maxv = stbi__pnm_getinteger(s, &c);  // read max value
7621   if (maxv > 65535)
7622      return stbi__err("max value > 65535", "PPM image supports only 8-bit and 16-bit images");
7623   else if (maxv > 255)
7624      return 16;
7625   else
7626      return 8;
7627}
7628
7629static int stbi__pnm_is16(stbi__context *s)
7630{
7631   if (stbi__pnm_info(s, NULL, NULL, NULL) == 16)
      return 1;
7633   return 0;
7634}
7635#endif
7636
7637static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
7638{
7639   #ifndef STBI_NO_JPEG
7640   if (stbi__jpeg_info(s, x, y, comp)) return 1;
7641   #endif
7642
7643   #ifndef STBI_NO_PNG
7644   if (stbi__png_info(s, x, y, comp))  return 1;
7645   #endif
7646
7647   #ifndef STBI_NO_GIF
7648   if (stbi__gif_info(s, x, y, comp))  return 1;
7649   #endif
7650
7651   #ifndef STBI_NO_BMP
7652   if (stbi__bmp_info(s, x, y, comp))  return 1;
7653   #endif
7654
7655   #ifndef STBI_NO_PSD
7656   if (stbi__psd_info(s, x, y, comp))  return 1;
7657   #endif
7658
7659   #ifndef STBI_NO_PIC
7660   if (stbi__pic_info(s, x, y, comp))  return 1;
7661   #endif
7662
7663   #ifndef STBI_NO_PNM
7664   if (stbi__pnm_info(s, x, y, comp))  return 1;
7665   #endif
7666
7667   #ifndef STBI_NO_HDR
7668   if (stbi__hdr_info(s, x, y, comp))  return 1;
7669   #endif
7670
   // test TGA last: the format has no reliable signature, so its detection is the weakest
7672   #ifndef STBI_NO_TGA
7673   if (stbi__tga_info(s, x, y, comp))
7674       return 1;
7675   #endif
7676   return stbi__err("unknown image type", "Image not of any known type, or corrupt");
7677}
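
/*
   Illustrative sketch, not part of stb_image: stbi__info_main tries one
   format probe after another and returns on the first match, so the order
   of the probes matters. The same idea as a table of probe functions over a
   memory buffer is shown below. The names probe_fn, probe_png, probe_psd,
   probe_pnm and identify are assumptions made for this example; each probe
   always starts from the beginning of the buffer, which stands in for the
   stbi__rewind calls used by the real probes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*probe_fn)(const unsigned char *buf, size_t len);

static int probe_png(const unsigned char *b, size_t n)
{
   static const unsigned char sig[8] = { 0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n' };
   return n >= 8 && memcmp(b, sig, 8) == 0;
}

static int probe_psd(const unsigned char *b, size_t n)
{
   return n >= 4 && memcmp(b, "8BPS", 4) == 0;
}

static int probe_pnm(const unsigned char *b, size_t n)
{
   return n >= 2 && b[0] == 'P' && (b[1] == '5' || b[1] == '6');
}

// Hypothetical helper: index of the first probe that accepts buf, or -1.
static int identify(const unsigned char *buf, size_t len)
{
   static const probe_fn probes[] = { probe_png, probe_psd, probe_pnm };
   size_t i;
   assert(buf != NULL);
   for (i = 0; i < sizeof probes / sizeof probes[0]; ++i)
      if (probes[i](buf, len)) return (int) i;
   return -1;
}
```

   A probe with a weak test would go at the end of the table, for the same
   reason the TGA check comes last above.
*/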
7678
7679static int stbi__is_16_main(stbi__context *s)
7680{
7681   #ifndef STBI_NO_PNG
7682   if (stbi__png_is16(s))  return 1;
7683   #endif
7684
7685   #ifndef STBI_NO_PSD
7686   if (stbi__psd_is16(s))  return 1;
7687   #endif
7688
7689   #ifndef STBI_NO_PNM
7690   if (stbi__pnm_is16(s))  return 1;
7691   #endif
7692   return 0;
7693}
7694
7695#ifndef STBI_NO_STDIO
7696STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
7697{
7698    FILE *f = stbi__fopen(filename, "rb");
7699    int result;
7700    if (!f) return stbi__err("can't fopen", "Unable to open file");
7701    result = stbi_info_from_file(f, x, y, comp);
7702    fclose(f);
7703    return result;
7704}
7705
7706STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
7707{
7708   int r;
7709   stbi__context s;
7710   long pos = ftell(f);
7711   stbi__start_file(&s, f);
7712   r = stbi__info_main(&s,x,y,comp);
7713   fseek(f,pos,SEEK_SET);
7714   return r;
7715}
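
/*
   Illustrative sketch, not part of stb_image: stbi_info_from_file saves the
   stream position with ftell and restores it with fseek, so the caller's
   FILE is left where it started. The same save/restore pattern in isolation
   is shown below. The name peek_bytes is an assumption made for this
   example, and the error checks on ftell and fseek are an addition of the
   sketch; the function above does not check them.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical helper: read up to n bytes at the current position, then put
// the stream back where it was. Returns the number of bytes read, or -1 if
// the position could not be saved or restored.
static long peek_bytes(FILE *f, unsigned char *dst, size_t n)
{
   long pos, got;
   assert(f != NULL && dst != NULL);
   pos = ftell(f);
   if (pos < 0) return -1;
   got = (long) fread(dst, 1, n, f);
   if (fseek(f, pos, SEEK_SET) != 0) return -1;
   return got;
}
```

   On a stream that cannot seek (a pipe, for example) ftell fails, and the
   sketch reports that instead of reading bytes it cannot give back.
*/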
7716
7717STBIDEF int stbi_is_16_bit(char const *filename)
7718{
7719    FILE *f = stbi__fopen(filename, "rb");
7720    int result;
7721    if (!f) return stbi__err("can't fopen", "Unable to open file");
7722    result = stbi_is_16_bit_from_file(f);
7723    fclose(f);
7724    return result;
7725}
7726
7727STBIDEF int stbi_is_16_bit_from_file(FILE *f)
7728{
7729   int r;
7730   stbi__context s;
7731   long pos = ftell(f);
7732   stbi__start_file(&s, f);
7733   r = stbi__is_16_main(&s);
7734   fseek(f,pos,SEEK_SET);
7735   return r;
7736}
7737#endif // !STBI_NO_STDIO
7738
7739STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
7740{
7741   stbi__context s;
7742   stbi__start_mem(&s,buffer,len);
7743   return stbi__info_main(&s,x,y,comp);
7744}
7745
7746STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
7747{
7748   stbi__context s;
7749   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
7750   return stbi__info_main(&s,x,y,comp);
7751}
7752
7753STBIDEF int stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len)
7754{
7755   stbi__context s;
7756   stbi__start_mem(&s,buffer,len);
7757   return stbi__is_16_main(&s);
7758}
7759
7760STBIDEF int stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *c, void *user)
7761{
7762   stbi__context s;
7763   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
7764   return stbi__is_16_main(&s);
7765}
7766
7767#endif // STB_IMAGE_IMPLEMENTATION
7768
7769/*
7770   revision history:
7771      2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
7772      2.19  (2018-02-11) fix warning
7773      2.18  (2018-01-30) fix warnings
      2.17  (2018-01-29) change stbi__shiftsigned to avoid clang -O2 bug
7775                         1-bit BMP
7776                         *_is_16_bit api
7777                         avoid warnings
7778      2.16  (2017-07-23) all functions have 16-bit variants;
7779                         STBI_NO_STDIO works again;
7780                         compilation fixes;
7781                         fix rounding in unpremultiply;
7782                         optimize vertical flip;
7783                         disable raw_len validation;
7784                         documentation fixes
7785      2.15  (2017-03-18) fix png-1,2,4 bug; now all Imagenet JPGs decode;
7786                         warning fixes; disable run-time SSE detection on gcc;
7787                         uniform handling of optional "return" values;
7788                         thread-safe initialization of zlib tables
7789      2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
7790      2.13  (2016-11-29) add 16-bit API, only supported for PNG right now
7791      2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
7792      2.11  (2016-04-02) allocate large structures on the stack
7793                         remove white matting for transparent PSD
7794                         fix reported channel count for PNG & BMP
7795                         re-enable SSE2 in non-gcc 64-bit
7796                         support RGB-formatted JPEG
7797                         read 16-bit PNGs (only as 8-bit)
7798      2.10  (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
7799      2.09  (2016-01-16) allow comments in PNM files
7800                         16-bit-per-pixel TGA (not bit-per-component)
7801                         info() for TGA could break due to .hdr handling
                         info() for BMP now shares code instead of sloppy parse
7803                         can use STBI_REALLOC_SIZED if allocator doesn't support realloc
7804                         code cleanup
7805      2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
7806      2.07  (2015-09-13) fix compiler warnings
7807                         partial animated GIF support
7808                         limited 16-bpc PSD support
7809                         #ifdef unused functions
7810                         bug with < 92 byte PIC,PNM,HDR,TGA
7811      2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
7812      2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
7813      2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
7814      2.03  (2015-04-12) extra corruption checking (mmozeiko)
7815                         stbi_set_flip_vertically_on_load (nguillemot)
7816                         fix NEON support; fix mingw support
7817      2.02  (2015-01-19) fix incorrect assert, fix warning
7818      2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
7819      2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
7820      2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
7821                         progressive JPEG (stb)
7822                         PGM/PPM support (Ken Miller)
7823                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
7824                         GIF bugfix -- seemingly never worked
7825                         STBI_NO_*, STBI_ONLY_*
7826      1.48  (2014-12-14) fix incorrectly-named assert()
7827      1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
7828                         optimize PNG (ryg)
7829                         fix bug in interlaced PNG with user-specified channel count (stb)
7830      1.46  (2014-08-26)
7831              fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
7832      1.45  (2014-08-16)
7833              fix MSVC-ARM internal compiler error by wrapping malloc
7834      1.44  (2014-08-07)
7835              various warning fixes from Ronny Chevalier
7836      1.43  (2014-07-15)
7837              fix MSVC-only compiler problem in code changed in 1.42
7838      1.42  (2014-07-09)
7839              don't define _CRT_SECURE_NO_WARNINGS (affects user code)
7840              fixes to stbi__cleanup_jpeg path
7841              added STBI_ASSERT to avoid requiring assert.h
7842      1.41  (2014-06-25)
7843              fix search&replace from 1.36 that messed up comments/error messages
7844      1.40  (2014-06-22)
7845              fix gcc struct-initialization warning
7846      1.39  (2014-06-15)
7847              fix to TGA optimization when req_comp != number of components in TGA;
7848              fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
7849              add support for BMP version 5 (more ignored fields)
7850      1.38  (2014-06-06)
7851              suppress MSVC warnings on integer casts truncating values
7852              fix accidental rename of 'skip' field of I/O
7853      1.37  (2014-06-04)
7854              remove duplicate typedef
7855      1.36  (2014-06-03)
7856              convert to header file single-file library
7857              if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
7858      1.35  (2014-05-27)
7859              various warnings
7860              fix broken STBI_SIMD path
7861              fix bug where stbi_load_from_file no longer left file pointer in correct place
7862              fix broken non-easy path for 32-bit BMP (possibly never used)
7863              TGA optimization by Arseny Kapoulkine
7864      1.34  (unknown)
7865              use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
7866      1.33  (2011-07-14)
7867              make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
7868      1.32  (2011-07-13)
7869              support for "info" function for all supported filetypes (SpartanJ)
7870      1.31  (2011-06-20)
7871              a few more leak fixes, bug in PNG handling (SpartanJ)
7872      1.30  (2011-06-11)
              added ability to load files via callbacks to accommodate custom input streams (Ben Wenger)
7874              removed deprecated format-specific test/load functions
7875              removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
7876              error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
7877              fix inefficiency in decoding 32-bit BMP (David Woo)
7878      1.29  (2010-08-16)
7879              various warning fixes from Aurelien Pocheville
7880      1.28  (2010-08-01)
7881              fix bug in GIF palette transparency (SpartanJ)
7882      1.27  (2010-08-01)
7883              cast-to-stbi_uc to fix warnings
7884      1.26  (2010-07-24)
7885              fix bug in file buffering for PNG reported by SpartanJ
7886      1.25  (2010-07-17)
7887              refix trans_data warning (Won Chun)
7888      1.24  (2010-07-12)
7889              perf improvements reading from files on platforms with lock-heavy fgetc()
7890              minor perf improvements for jpeg
7891              deprecated type-specific functions so we'll get feedback if they're needed
7892              attempt to fix trans_data warning (Won Chun)
7893      1.23    fixed bug in iPhone support
7894      1.22  (2010-07-10)
7895              removed image *writing* support
7896              stbi_info support from Jetro Lauha
7897              GIF support from Jean-Marc Lienher
7898              iPhone PNG-extensions from James Brown
              warning-fixes from Nicolas Schulz and Janez Zemva (i.e. Janez (U+017D)emva)
7900      1.21    fix use of 'stbi_uc' in header (reported by jon blow)
7901      1.20    added support for Softimage PIC, by Tom Seddon
7902      1.19    bug in interlaced PNG corruption check (found by ryg)
7903      1.18  (2008-08-02)
7904              fix a threading bug (local mutable static)
7905      1.17    support interlaced PNG
7906      1.16    major bugfix - stbi__convert_format converted one too many pixels
7907      1.15    initialize some fields for thread safety
7908      1.14    fix threadsafe conversion bug
7909              header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
7910      1.13    threadsafe
7911      1.12    const qualifiers in the API
7912      1.11    Support installable IDCT, colorspace conversion routines
7913      1.10    Fixes for 64-bit (don't use "unsigned long")
7914              optimized upsampling by Fabian "ryg" Giesen
7915      1.09    Fix format-conversion for PSD code (bad global variables!)
7916      1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
7917      1.07    attempt to fix C++ warning/errors again
7918      1.06    attempt to fix C++ warning/errors again
7919      1.05    fix TGA loading to return correct *comp and use good luminance calc
7920      1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
7921      1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
7922      1.02    support for (subset of) HDR files, float interface for preferred access to them
7923      1.01    fix bug: possible bug in handling right-side up bmps... not sure
7924              fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
7925      1.00    interface to zlib that skips zlib header
7926      0.99    correct handling of alpha in palette
7927      0.98    TGA loader by lonesock; dynamically add loaders (untested)
7928      0.97    jpeg errors on too large a file; also catch another malloc failure
7929      0.96    fix detection of invalid v value - particleman@mollyrocket forum
7930      0.95    during header scan, seek to markers in case of padding
7931      0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
7932      0.93    handle jpegtran output; verbose errors
7933      0.92    read 4,8,16,24,32-bit BMP files of several formats
7934      0.91    output 24-bit Windows 3.0 BMP files
7935      0.90    fix a few more warnings; bump version number to approach 1.0
7936      0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
7937      0.60    fix compiling as c++
7938      0.59    fix warnings: merge Dave Moore's -Wall fixes
7939      0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
7940      0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
7941      0.56    fix bug: zlib uncompressed mode len vs. nlen
7942      0.55    fix bug: restart_interval not initialized to 0
7943      0.54    allow NULL for 'int *comp'
7944      0.53    fix bug in png 3->4; speedup png decoding
7945      0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
7946      0.51    obey req_comp requests, 1-component jpegs return as 1-component,
7947              on 'test' only check type, not whether we support this variant
7948      0.50  (2006-11-19)
7949              first released version
7950*/
7951
7952
7953/*
7954------------------------------------------------------------------------------
7955This software is available under 2 licenses -- choose whichever you prefer.
7956------------------------------------------------------------------------------
7957ALTERNATIVE A - MIT License
7958Copyright (c) 2017 Sean Barrett
7959Permission is hereby granted, free of charge, to any person obtaining a copy of
7960this software and associated documentation files (the "Software"), to deal in
7961the Software without restriction, including without limitation the rights to
7962use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
7963of the Software, and to permit persons to whom the Software is furnished to do
7964so, subject to the following conditions:
7965The above copyright notice and this permission notice shall be included in all
7966copies or substantial portions of the Software.
7967THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
7968IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
7969FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
7970AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
7971LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
7972OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
7973SOFTWARE.
7974------------------------------------------------------------------------------
7975ALTERNATIVE B - Public Domain (www.unlicense.org)
7976This is free and unencumbered software released into the public domain.
7977Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
7978software, either in source code form or as a compiled binary, for any purpose,
7979commercial or non-commercial, and by any means.
7980In jurisdictions that recognize copyright laws, the author or authors of this
7981software dedicate any and all copyright interest in the software to the public
7982domain. We make this dedication for the benefit of the public at large and to
7983the detriment of our heirs and successors. We intend this dedication to be an
7984overt act of relinquishment in perpetuity of all present and future rights to
7985this software under copyright law.
7986THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
7987IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
7988FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
7989AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
7990ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
7991WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
7992------------------------------------------------------------------------------
7993*/
7994