minesweeper

A minewseeper implementation to play around with Hare and Raylib
git clone https://git.tronto.net/minesweeper
Download | Log | Files | Refs | README | LICENSE

dr_flac.h (525671B)


      1 /*
      2 FLAC audio decoder. Choice of public domain or MIT-0. See license statements at the end of this file.
      3 dr_flac - v0.12.42 - 2023-11-02
      4 
      5 David Reid - mackron@gmail.com
      6 
      7 GitHub: https://github.com/mackron/dr_libs
      8 */
      9 
     10 /*
     11 RELEASE NOTES - v0.12.0
     12 =======================
     13 Version 0.12.0 has breaking API changes including changes to the existing API and the removal of deprecated APIs.
     14 
     15 
     16 Improved Client-Defined Memory Allocation
     17 -----------------------------------------
     18 The main change with this release is the addition of a more flexible way of implementing custom memory allocation routines. The
     19 existing system of DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE are still in place and will be used by default when no custom
     20 allocation callbacks are specified.
     21 
     22 To use the new system, you pass in a pointer to a drflac_allocation_callbacks object to drflac_open() and family, like this:
     23 
     24     void* my_malloc(size_t sz, void* pUserData)
     25     {
     26         return malloc(sz);
     27     }
     28     void* my_realloc(void* p, size_t sz, void* pUserData)
     29     {
     30         return realloc(p, sz);
     31     }
     32     void my_free(void* p, void* pUserData)
     33     {
     34         free(p);
     35     }
     36 
     37     ...
     38 
     39     drflac_allocation_callbacks allocationCallbacks;
     40     allocationCallbacks.pUserData = &myData;
     41     allocationCallbacks.onMalloc  = my_malloc;
     42     allocationCallbacks.onRealloc = my_realloc;
     43     allocationCallbacks.onFree    = my_free;
     44     drflac* pFlac = drflac_open_file("my_file.flac", &allocationCallbacks);
     45 
     46 The advantage of this new system is that it allows you to specify user data which will be passed in to the allocation routines.
     47 
     48 Passing in null for the allocation callbacks object will cause dr_flac to use defaults which is the same as DRFLAC_MALLOC,
     49 DRFLAC_REALLOC and DRFLAC_FREE and the equivalent of how it worked in previous versions.
     50 
     51 Every API that opens a drflac object now takes this extra parameter. These include the following:
     52 
     53     drflac_open()
     54     drflac_open_relaxed()
     55     drflac_open_with_metadata()
     56     drflac_open_with_metadata_relaxed()
     57     drflac_open_file()
     58     drflac_open_file_with_metadata()
     59     drflac_open_memory()
     60     drflac_open_memory_with_metadata()
     61     drflac_open_and_read_pcm_frames_s32()
     62     drflac_open_and_read_pcm_frames_s16()
     63     drflac_open_and_read_pcm_frames_f32()
     64     drflac_open_file_and_read_pcm_frames_s32()
     65     drflac_open_file_and_read_pcm_frames_s16()
     66     drflac_open_file_and_read_pcm_frames_f32()
     67     drflac_open_memory_and_read_pcm_frames_s32()
     68     drflac_open_memory_and_read_pcm_frames_s16()
     69     drflac_open_memory_and_read_pcm_frames_f32()
     70 
     71 
     72 
     73 Optimizations
     74 -------------
     75 Seeking performance has been greatly improved. A new binary search based seeking algorithm has been introduced which significantly
     76 improves performance over the brute force method which was used when no seek table was present. Seek table based seeking also takes
     77 advantage of the new binary search seeking system to further improve performance there as well. Note that this depends on CRC which
     78 means it will be disabled when DR_FLAC_NO_CRC is used.
     79 
     80 The SSE4.1 pipeline has been cleaned up and optimized. You should see some improvements with decoding speed of 24-bit files in
     81 particular. 16-bit streams should also see some improvement.
     82 
     83 drflac_read_pcm_frames_s16() has been optimized. Previously this sat on top of drflac_read_pcm_frames_s32() and performed it's s32
     84 to s16 conversion in a second pass. This is now all done in a single pass. This includes SSE2 and ARM NEON optimized paths.
     85 
     86 A minor optimization has been implemented for drflac_read_pcm_frames_s32(). This will now use an SSE2 optimized pipeline for stereo
     87 channel reconstruction which is the last part of the decoding process.
     88 
     89 The ARM build has seen a few improvements. The CLZ (count leading zeroes) and REV (byte swap) instructions are now used when
     90 compiling with GCC and Clang which is achieved using inline assembly. The CLZ instruction requires ARM architecture version 5 at
     91 compile time and the REV instruction requires ARM architecture version 6.
     92 
     93 An ARM NEON optimized pipeline has been implemented. To enable this you'll need to add -mfpu=neon to the command line when compiling.
     94 
     95 
     96 Removed APIs
     97 ------------
     98 The following APIs were deprecated in version 0.11.0 and have been completely removed in version 0.12.0:
     99 
    100     drflac_read_s32()                   -> drflac_read_pcm_frames_s32()
    101     drflac_read_s16()                   -> drflac_read_pcm_frames_s16()
    102     drflac_read_f32()                   -> drflac_read_pcm_frames_f32()
    103     drflac_seek_to_sample()             -> drflac_seek_to_pcm_frame()
    104     drflac_open_and_decode_s32()        -> drflac_open_and_read_pcm_frames_s32()
    105     drflac_open_and_decode_s16()        -> drflac_open_and_read_pcm_frames_s16()
    106     drflac_open_and_decode_f32()        -> drflac_open_and_read_pcm_frames_f32()
    107     drflac_open_and_decode_file_s32()   -> drflac_open_file_and_read_pcm_frames_s32()
    108     drflac_open_and_decode_file_s16()   -> drflac_open_file_and_read_pcm_frames_s16()
    109     drflac_open_and_decode_file_f32()   -> drflac_open_file_and_read_pcm_frames_f32()
    110     drflac_open_and_decode_memory_s32() -> drflac_open_memory_and_read_pcm_frames_s32()
    111     drflac_open_and_decode_memory_s16() -> drflac_open_memory_and_read_pcm_frames_s16()
    112     drflac_open_and_decode_memory_f32() -> drflac_open_memroy_and_read_pcm_frames_f32()
    113 
    114 Prior versions of dr_flac operated on a per-sample basis whereas now it operates on PCM frames. The removed APIs all relate
    115 to the old per-sample APIs. You now need to use the "pcm_frame" versions.
    116 */
    117 
    118 
    119 /*
    120 Introduction
    121 ============
    122 dr_flac is a single file library. To use it, do something like the following in one .c file.
    123 
    124     ```c
    125     #define DR_FLAC_IMPLEMENTATION
    126     #include "dr_flac.h"
    127     ```
    128 
    129 You can then #include this file in other parts of the program as you would with any other header file. To decode audio data, do something like the following:
    130 
    131     ```c
    132     drflac* pFlac = drflac_open_file("MySong.flac", NULL);
    133     if (pFlac == NULL) {
    134         // Failed to open FLAC file
    135     }
    136 
    137     drflac_int32* pSamples = malloc(pFlac->totalPCMFrameCount * pFlac->channels * sizeof(drflac_int32));
    138     drflac_uint64 numberOfInterleavedSamplesActuallyRead = drflac_read_pcm_frames_s32(pFlac, pFlac->totalPCMFrameCount, pSamples);
    139     ```
    140 
    141 The drflac object represents the decoder. It is a transparent type so all the information you need, such as the number of channels and the bits per sample,
    142 should be directly accessible - just make sure you don't change their values. Samples are always output as interleaved signed 32-bit PCM. In the example above
    143 a native FLAC stream was opened, however dr_flac has seamless support for Ogg encapsulated FLAC streams as well.
    144 
    145 You do not need to decode the entire stream in one go - you just specify how many samples you'd like at any given time and the decoder will give you as many
    146 samples as it can, up to the amount requested. Later on when you need the next batch of samples, just call it again. Example:
    147 
    148     ```c
    149     while (drflac_read_pcm_frames_s32(pFlac, chunkSizeInPCMFrames, pChunkSamples) > 0) {
    150         do_something();
    151     }
    152     ```
    153 
    154 You can seek to a specific PCM frame with `drflac_seek_to_pcm_frame()`.
    155 
    156 If you just want to quickly decode an entire FLAC file in one go you can do something like this:
    157 
    158     ```c
    159     unsigned int channels;
    160     unsigned int sampleRate;
    161     drflac_uint64 totalPCMFrameCount;
    162     drflac_int32* pSampleData = drflac_open_file_and_read_pcm_frames_s32("MySong.flac", &channels, &sampleRate, &totalPCMFrameCount, NULL);
    163     if (pSampleData == NULL) {
    164         // Failed to open and decode FLAC file.
    165     }
    166 
    167     ...
    168 
    169     drflac_free(pSampleData, NULL);
    170     ```
    171 
    172 You can read samples as signed 16-bit integer and 32-bit floating-point PCM with the *_s16() and *_f32() family of APIs respectively, but note that these
    173 should be considered lossy.
    174 
    175 
    176 If you need access to metadata (album art, etc.), use `drflac_open_with_metadata()`, `drflac_open_file_with_metdata()` or `drflac_open_memory_with_metadata()`.
    177 The rationale for keeping these APIs separate is that they're slightly slower than the normal versions and also just a little bit harder to use. dr_flac
    178 reports metadata to the application through the use of a callback, and every metadata block is reported before `drflac_open_with_metdata()` returns.
    179 
    180 The main opening APIs (`drflac_open()`, etc.) will fail if the header is not present. The presents a problem in certain scenarios such as broadcast style
    181 streams or internet radio where the header may not be present because the user has started playback mid-stream. To handle this, use the relaxed APIs:
    182     
    183     `drflac_open_relaxed()`
    184     `drflac_open_with_metadata_relaxed()`
    185 
    186 It is not recommended to use these APIs for file based streams because a missing header would usually indicate a corrupt or perverse file. In addition, these
    187 APIs can take a long time to initialize because they may need to spend a lot of time finding the first frame.
    188 
    189 
    190 
    191 Build Options
    192 =============
    193 #define these options before including this file.
    194 
    195 #define DR_FLAC_NO_STDIO
    196   Disable `drflac_open_file()` and family.
    197 
    198 #define DR_FLAC_NO_OGG
    199   Disables support for Ogg/FLAC streams.
    200 
    201 #define DR_FLAC_BUFFER_SIZE <number>
    202   Defines the size of the internal buffer to store data from onRead(). This buffer is used to reduce the number of calls back to the client for more data.
    203   Larger values means more memory, but better performance. My tests show diminishing returns after about 4KB (which is the default). Consider reducing this if
    204   you have a very efficient implementation of onRead(), or increase it if it's very inefficient. Must be a multiple of 8.
    205 
    206 #define DR_FLAC_NO_CRC
    207   Disables CRC checks. This will offer a performance boost when CRC is unnecessary. This will disable binary search seeking. When seeking, the seek table will
    208   be used if available. Otherwise the seek will be performed using brute force.
    209 
    210 #define DR_FLAC_NO_SIMD
    211   Disables SIMD optimizations (SSE on x86/x64 architectures, NEON on ARM architectures). Use this if you are having compatibility issues with your compiler.
    212 
    213 #define DR_FLAC_NO_WCHAR
    214   Disables all functions ending with `_w`. Use this if your compiler does not provide wchar.h. Not required if DR_FLAC_NO_STDIO is also defined.
    215 
    216 
    217 
    218 Notes
    219 =====
    220 - dr_flac does not support changing the sample rate nor channel count mid stream.
    221 - dr_flac is not thread-safe, but its APIs can be called from any thread so long as you do your own synchronization.
    222 - When using Ogg encapsulation, a corrupted metadata block will result in `drflac_open_with_metadata()` and `drflac_open()` returning inconsistent samples due
    223   to differences in corrupted stream recorvery logic between the two APIs.
    224 */
    225 
    226 #ifndef dr_flac_h
    227 #define dr_flac_h
    228 
    229 #ifdef __cplusplus
    230 extern "C" {
    231 #endif
    232 
    233 #define DRFLAC_STRINGIFY(x)      #x
    234 #define DRFLAC_XSTRINGIFY(x)     DRFLAC_STRINGIFY(x)
    235 
    236 #define DRFLAC_VERSION_MAJOR     0
    237 #define DRFLAC_VERSION_MINOR     12
    238 #define DRFLAC_VERSION_REVISION  42
    239 #define DRFLAC_VERSION_STRING    DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MAJOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MINOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_REVISION)
    240 
    241 #include <stddef.h> /* For size_t. */
    242 
    243 /* Sized Types */
    244 typedef   signed char           drflac_int8;
    245 typedef unsigned char           drflac_uint8;
    246 typedef   signed short          drflac_int16;
    247 typedef unsigned short          drflac_uint16;
    248 typedef   signed int            drflac_int32;
    249 typedef unsigned int            drflac_uint32;
    250 #if defined(_MSC_VER) && !defined(__clang__)
    251     typedef   signed __int64    drflac_int64;
    252     typedef unsigned __int64    drflac_uint64;
    253 #else
    254     #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
    255         #pragma GCC diagnostic push
    256         #pragma GCC diagnostic ignored "-Wlong-long"
    257         #if defined(__clang__)
    258             #pragma GCC diagnostic ignored "-Wc++11-long-long"
    259         #endif
    260     #endif
    261     typedef   signed long long  drflac_int64;
    262     typedef unsigned long long  drflac_uint64;
    263     #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
    264         #pragma GCC diagnostic pop
    265     #endif
    266 #endif
    267 #if defined(__LP64__) || defined(_WIN64) || (defined(__x86_64__) && !defined(__ILP32__)) || defined(_M_X64) || defined(__ia64) || defined(_M_IA64) || defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__)
    268     typedef drflac_uint64       drflac_uintptr;
    269 #else
    270     typedef drflac_uint32       drflac_uintptr;
    271 #endif
    272 typedef drflac_uint8            drflac_bool8;
    273 typedef drflac_uint32           drflac_bool32;
    274 #define DRFLAC_TRUE             1
    275 #define DRFLAC_FALSE            0
    276 /* End Sized Types */
    277 
    278 /* Decorations */
    279 #if !defined(DRFLAC_API)
    280     #if defined(DRFLAC_DLL)
    281         #if defined(_WIN32)
    282             #define DRFLAC_DLL_IMPORT  __declspec(dllimport)
    283             #define DRFLAC_DLL_EXPORT  __declspec(dllexport)
    284             #define DRFLAC_DLL_PRIVATE static
    285         #else
    286             #if defined(__GNUC__) && __GNUC__ >= 4
    287                 #define DRFLAC_DLL_IMPORT  __attribute__((visibility("default")))
    288                 #define DRFLAC_DLL_EXPORT  __attribute__((visibility("default")))
    289                 #define DRFLAC_DLL_PRIVATE __attribute__((visibility("hidden")))
    290             #else
    291                 #define DRFLAC_DLL_IMPORT
    292                 #define DRFLAC_DLL_EXPORT
    293                 #define DRFLAC_DLL_PRIVATE static
    294             #endif
    295         #endif
    296 
    297         #if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
    298             #define DRFLAC_API  DRFLAC_DLL_EXPORT
    299         #else
    300             #define DRFLAC_API  DRFLAC_DLL_IMPORT
    301         #endif
    302         #define DRFLAC_PRIVATE DRFLAC_DLL_PRIVATE
    303     #else
    304         #define DRFLAC_API extern
    305         #define DRFLAC_PRIVATE static
    306     #endif
    307 #endif
    308 /* End Decorations */
    309 
    310 #if defined(_MSC_VER) && _MSC_VER >= 1700   /* Visual Studio 2012 */
    311     #define DRFLAC_DEPRECATED       __declspec(deprecated)
    312 #elif (defined(__GNUC__) && __GNUC__ >= 4)  /* GCC 4 */
    313     #define DRFLAC_DEPRECATED       __attribute__((deprecated))
    314 #elif defined(__has_feature)                /* Clang */
    315     #if __has_feature(attribute_deprecated)
    316         #define DRFLAC_DEPRECATED   __attribute__((deprecated))
    317     #else
    318         #define DRFLAC_DEPRECATED
    319     #endif
    320 #else
    321     #define DRFLAC_DEPRECATED
    322 #endif
    323 
    324 DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision);
    325 DRFLAC_API const char* drflac_version_string(void);
    326 
    327 /* Allocation Callbacks */
    328 typedef struct
    329 {
    330     void* pUserData;
    331     void* (* onMalloc)(size_t sz, void* pUserData);
    332     void* (* onRealloc)(void* p, size_t sz, void* pUserData);
    333     void  (* onFree)(void* p, void* pUserData);
    334 } drflac_allocation_callbacks;
    335 /* End Allocation Callbacks */
    336 
    337 /*
    338 As data is read from the client it is placed into an internal buffer for fast access. This controls the size of that buffer. Larger values means more speed,
    339 but also more memory. In my testing there is diminishing returns after about 4KB, but you can fiddle with this to suit your own needs. Must be a multiple of 8.
    340 */
    341 #ifndef DR_FLAC_BUFFER_SIZE
    342 #define DR_FLAC_BUFFER_SIZE   4096
    343 #endif
    344 
    345 
    346 /* Architecture Detection */
    347 #if defined(_WIN64) || defined(_LP64) || defined(__LP64__)
    348 #define DRFLAC_64BIT
    349 #endif
    350 
    351 #if defined(__x86_64__) || defined(_M_X64)
    352     #define DRFLAC_X64
    353 #elif defined(__i386) || defined(_M_IX86)
    354     #define DRFLAC_X86
    355 #elif defined(__arm__) || defined(_M_ARM) || defined(__arm64) || defined(__arm64__) || defined(__aarch64__) || defined(_M_ARM64)
    356     #define DRFLAC_ARM
    357 #endif
    358 /* End Architecture Detection */
    359 
    360 
    361 #ifdef DRFLAC_64BIT
    362 typedef drflac_uint64 drflac_cache_t;
    363 #else
    364 typedef drflac_uint32 drflac_cache_t;
    365 #endif
    366 
    367 /* The various metadata block types. */
    368 #define DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO       0
    369 #define DRFLAC_METADATA_BLOCK_TYPE_PADDING          1
    370 #define DRFLAC_METADATA_BLOCK_TYPE_APPLICATION      2
    371 #define DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE        3
    372 #define DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT   4
    373 #define DRFLAC_METADATA_BLOCK_TYPE_CUESHEET         5
    374 #define DRFLAC_METADATA_BLOCK_TYPE_PICTURE          6
    375 #define DRFLAC_METADATA_BLOCK_TYPE_INVALID          127
    376 
    377 /* The various picture types specified in the PICTURE block. */
    378 #define DRFLAC_PICTURE_TYPE_OTHER                   0
    379 #define DRFLAC_PICTURE_TYPE_FILE_ICON               1
    380 #define DRFLAC_PICTURE_TYPE_OTHER_FILE_ICON         2
    381 #define DRFLAC_PICTURE_TYPE_COVER_FRONT             3
    382 #define DRFLAC_PICTURE_TYPE_COVER_BACK              4
    383 #define DRFLAC_PICTURE_TYPE_LEAFLET_PAGE            5
    384 #define DRFLAC_PICTURE_TYPE_MEDIA                   6
    385 #define DRFLAC_PICTURE_TYPE_LEAD_ARTIST             7
    386 #define DRFLAC_PICTURE_TYPE_ARTIST                  8
    387 #define DRFLAC_PICTURE_TYPE_CONDUCTOR               9
    388 #define DRFLAC_PICTURE_TYPE_BAND                    10
    389 #define DRFLAC_PICTURE_TYPE_COMPOSER                11
    390 #define DRFLAC_PICTURE_TYPE_LYRICIST                12
    391 #define DRFLAC_PICTURE_TYPE_RECORDING_LOCATION      13
    392 #define DRFLAC_PICTURE_TYPE_DURING_RECORDING        14
    393 #define DRFLAC_PICTURE_TYPE_DURING_PERFORMANCE      15
    394 #define DRFLAC_PICTURE_TYPE_SCREEN_CAPTURE          16
    395 #define DRFLAC_PICTURE_TYPE_BRIGHT_COLORED_FISH     17
    396 #define DRFLAC_PICTURE_TYPE_ILLUSTRATION            18
    397 #define DRFLAC_PICTURE_TYPE_BAND_LOGOTYPE           19
    398 #define DRFLAC_PICTURE_TYPE_PUBLISHER_LOGOTYPE      20
    399 
    400 typedef enum
    401 {
    402     drflac_container_native,
    403     drflac_container_ogg,
    404     drflac_container_unknown
    405 } drflac_container;
    406 
    407 typedef enum
    408 {
    409     drflac_seek_origin_start,
    410     drflac_seek_origin_current
    411 } drflac_seek_origin;
    412 
    413 /* The order of members in this structure is important because we map this directly to the raw data within the SEEKTABLE metadata block. */
    414 typedef struct
    415 {
    416     drflac_uint64 firstPCMFrame;
    417     drflac_uint64 flacFrameOffset;   /* The offset from the first byte of the header of the first frame. */
    418     drflac_uint16 pcmFrameCount;
    419 } drflac_seekpoint;
    420 
    421 typedef struct
    422 {
    423     drflac_uint16 minBlockSizeInPCMFrames;
    424     drflac_uint16 maxBlockSizeInPCMFrames;
    425     drflac_uint32 minFrameSizeInPCMFrames;
    426     drflac_uint32 maxFrameSizeInPCMFrames;
    427     drflac_uint32 sampleRate;
    428     drflac_uint8  channels;
    429     drflac_uint8  bitsPerSample;
    430     drflac_uint64 totalPCMFrameCount;
    431     drflac_uint8  md5[16];
    432 } drflac_streaminfo;
    433 
    434 typedef struct
    435 {
    436     /*
    437     The metadata type. Use this to know how to interpret the data below. Will be set to one of the
    438     DRFLAC_METADATA_BLOCK_TYPE_* tokens.
    439     */
    440     drflac_uint32 type;
    441 
    442     /*
    443     A pointer to the raw data. This points to a temporary buffer so don't hold on to it. It's best to
    444     not modify the contents of this buffer. Use the structures below for more meaningful and structured
    445     information about the metadata. It's possible for this to be null.
    446     */
    447     const void* pRawData;
    448 
    449     /* The size in bytes of the block and the buffer pointed to by pRawData if it's non-NULL. */
    450     drflac_uint32 rawDataSize;
    451 
    452     union
    453     {
    454         drflac_streaminfo streaminfo;
    455 
    456         struct
    457         {
    458             int unused;
    459         } padding;
    460 
    461         struct
    462         {
    463             drflac_uint32 id;
    464             const void* pData;
    465             drflac_uint32 dataSize;
    466         } application;
    467 
    468         struct
    469         {
    470             drflac_uint32 seekpointCount;
    471             const drflac_seekpoint* pSeekpoints;
    472         } seektable;
    473 
    474         struct
    475         {
    476             drflac_uint32 vendorLength;
    477             const char* vendor;
    478             drflac_uint32 commentCount;
    479             const void* pComments;
    480         } vorbis_comment;
    481 
    482         struct
    483         {
    484             char catalog[128];
    485             drflac_uint64 leadInSampleCount;
    486             drflac_bool32 isCD;
    487             drflac_uint8 trackCount;
    488             const void* pTrackData;
    489         } cuesheet;
    490 
    491         struct
    492         {
    493             drflac_uint32 type;
    494             drflac_uint32 mimeLength;
    495             const char* mime;
    496             drflac_uint32 descriptionLength;
    497             const char* description;
    498             drflac_uint32 width;
    499             drflac_uint32 height;
    500             drflac_uint32 colorDepth;
    501             drflac_uint32 indexColorCount;
    502             drflac_uint32 pictureDataSize;
    503             const drflac_uint8* pPictureData;
    504         } picture;
    505     } data;
    506 } drflac_metadata;
    507 
    508 
    509 /*
    510 Callback for when data needs to be read from the client.
    511 
    512 
    513 Parameters
    514 ----------
    515 pUserData (in)
    516     The user data that was passed to drflac_open() and family.
    517 
    518 pBufferOut (out)
    519     The output buffer.
    520 
    521 bytesToRead (in)
    522     The number of bytes to read.
    523 
    524 
    525 Return Value
    526 ------------
    527 The number of bytes actually read.
    528 
    529 
    530 Remarks
    531 -------
    532 A return value of less than bytesToRead indicates the end of the stream. Do _not_ return from this callback until either the entire bytesToRead is filled or
    533 you have reached the end of the stream.
    534 */
    535 typedef size_t (* drflac_read_proc)(void* pUserData, void* pBufferOut, size_t bytesToRead);
    536 
    537 /*
    538 Callback for when data needs to be seeked.
    539 
    540 
    541 Parameters
    542 ----------
    543 pUserData (in)
    544     The user data that was passed to drflac_open() and family.
    545 
    546 offset (in)
    547     The number of bytes to move, relative to the origin. Will never be negative.
    548 
    549 origin (in)
    550     The origin of the seek - the current position or the start of the stream.
    551 
    552 
    553 Return Value
    554 ------------
    555 Whether or not the seek was successful.
    556 
    557 
    558 Remarks
    559 -------
    560 The offset will never be negative. Whether or not it is relative to the beginning or current position is determined by the "origin" parameter which will be
    561 either drflac_seek_origin_start or drflac_seek_origin_current.
    562 
    563 When seeking to a PCM frame using drflac_seek_to_pcm_frame(), dr_flac may call this with an offset beyond the end of the FLAC stream. This needs to be detected
    564 and handled by returning DRFLAC_FALSE.
    565 */
    566 typedef drflac_bool32 (* drflac_seek_proc)(void* pUserData, int offset, drflac_seek_origin origin);
    567 
    568 /*
    569 Callback for when a metadata block is read.
    570 
    571 
    572 Parameters
    573 ----------
    574 pUserData (in)
    575     The user data that was passed to drflac_open() and family.
    576 
    577 pMetadata (in)
    578     A pointer to a structure containing the data of the metadata block.
    579 
    580 
    581 Remarks
    582 -------
    583 Use pMetadata->type to determine which metadata block is being handled and how to read the data. This
    584 will be set to one of the DRFLAC_METADATA_BLOCK_TYPE_* tokens.
    585 */
    586 typedef void (* drflac_meta_proc)(void* pUserData, drflac_metadata* pMetadata);
    587 
    588 
    589 /* Structure for internal use. Only used for decoders opened with drflac_open_memory. */
    590 typedef struct
    591 {
    592     const drflac_uint8* data;
    593     size_t dataSize;
    594     size_t currentReadPos;
    595 } drflac__memory_stream;
    596 
    597 /* Structure for internal use. Used for bit streaming. */
    598 typedef struct
    599 {
    600     /* The function to call when more data needs to be read. */
    601     drflac_read_proc onRead;
    602 
    603     /* The function to call when the current read position needs to be moved. */
    604     drflac_seek_proc onSeek;
    605 
    606     /* The user data to pass around to onRead and onSeek. */
    607     void* pUserData;
    608 
    609 
    610     /*
    611     The number of unaligned bytes in the L2 cache. This will always be 0 until the end of the stream is hit. At the end of the
    612     stream there will be a number of bytes that don't cleanly fit in an L1 cache line, so we use this variable to know whether
    613     or not the bistreamer needs to run on a slower path to read those last bytes. This will never be more than sizeof(drflac_cache_t).
    614     */
    615     size_t unalignedByteCount;
    616 
    617     /* The content of the unaligned bytes. */
    618     drflac_cache_t unalignedCache;
    619 
    620     /* The index of the next valid cache line in the "L2" cache. */
    621     drflac_uint32 nextL2Line;
    622 
    623     /* The number of bits that have been consumed by the cache. This is used to determine how many valid bits are remaining. */
    624     drflac_uint32 consumedBits;
    625 
    626     /*
    627     The cached data which was most recently read from the client. There are two levels of cache. Data flows as such:
    628     Client -> L2 -> L1. The L2 -> L1 movement is aligned and runs on a fast path in just a few instructions.
    629     */
    630     drflac_cache_t cacheL2[DR_FLAC_BUFFER_SIZE/sizeof(drflac_cache_t)];
    631     drflac_cache_t cache;
    632 
    633     /*
    634     CRC-16. This is updated whenever bits are read from the bit stream. Manually set this to 0 to reset the CRC. For FLAC, this
    635     is reset to 0 at the beginning of each frame.
    636     */
    637     drflac_uint16 crc16;
    638     drflac_cache_t crc16Cache;              /* A cache for optimizing CRC calculations. This is filled when when the L1 cache is reloaded. */
    639     drflac_uint32 crc16CacheIgnoredBytes;   /* The number of bytes to ignore when updating the CRC-16 from the CRC-16 cache. */
    640 } drflac_bs;
    641 
    642 typedef struct
    643 {
    644     /* The type of the subframe: SUBFRAME_CONSTANT, SUBFRAME_VERBATIM, SUBFRAME_FIXED or SUBFRAME_LPC. */
    645     drflac_uint8 subframeType;
    646 
    647     /* The number of wasted bits per sample as specified by the sub-frame header. */
    648     drflac_uint8 wastedBitsPerSample;
    649 
    650     /* The order to use for the prediction stage for SUBFRAME_FIXED and SUBFRAME_LPC. */
    651     drflac_uint8 lpcOrder;
    652 
    653     /* A pointer to the buffer containing the decoded samples in the subframe. This pointer is an offset from drflac::pExtraData. */
    654     drflac_int32* pSamplesS32;
    655 } drflac_subframe;
    656 
    657 typedef struct
    658 {
    659     /*
    660     If the stream uses variable block sizes, this will be set to the index of the first PCM frame. If fixed block sizes are used, this will
    661     always be set to 0. This is 64-bit because the decoded PCM frame number will be 36 bits.
    662     */
    663     drflac_uint64 pcmFrameNumber;
    664 
    665     /*
    666     If the stream uses fixed block sizes, this will be set to the frame number. If variable block sizes are used, this will always be 0. This
    667     is 32-bit because in fixed block sizes, the maximum frame number will be 31 bits.
    668     */
    669     drflac_uint32 flacFrameNumber;
    670 
    671     /* The sample rate of this frame. */
    672     drflac_uint32 sampleRate;
    673 
    674     /* The number of PCM frames in each sub-frame within this frame. */
    675     drflac_uint16 blockSizeInPCMFrames;
    676 
    677     /*
    678     The channel assignment of this frame. This is not always set to the channel count. If interchannel decorrelation is being used this
    679     will be set to DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE, DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE or DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE.
    680     */
    681     drflac_uint8 channelAssignment;
    682 
    683     /* The number of bits per sample within this frame. */
    684     drflac_uint8 bitsPerSample;
    685 
    686     /* The frame's CRC. */
    687     drflac_uint8 crc8;
    688 } drflac_frame_header;
    689 
    690 typedef struct
    691 {
    692     /* The header. */
    693     drflac_frame_header header;
    694 
    695     /*
    696     The number of PCM frames left to be read in this FLAC frame. This is initially set to the block size. As PCM frames are read,
    697     this will be decremented. When it reaches 0, the decoder will see this frame as fully consumed and load the next frame.
    698     */
    699     drflac_uint32 pcmFramesRemaining;
    700 
    701     /* The list of sub-frames within the frame. There is one sub-frame for each channel, and there's a maximum of 8 channels. */
    702     drflac_subframe subframes[8];
    703 } drflac_frame;
    704 
    705 typedef struct
    706 {
    707     /* The function to call when a metadata block is read. */
    708     drflac_meta_proc onMeta;
    709 
    710     /* The user data posted to the metadata callback function. */
    711     void* pUserDataMD;
    712 
    713     /* Memory allocation callbacks. */
    714     drflac_allocation_callbacks allocationCallbacks;
    715 
    716 
    717     /* The sample rate. Will be set to something like 44100. */
    718     drflac_uint32 sampleRate;
    719 
    720     /*
    721     The number of channels. This will be set to 1 for monaural streams, 2 for stereo, etc. Maximum 8. This is set based on the
    722     value specified in the STREAMINFO block.
    723     */
    724     drflac_uint8 channels;
    725 
    726     /* The bits per sample. Will be set to something like 16, 24, etc. */
    727     drflac_uint8 bitsPerSample;
    728 
    729     /* The maximum block size, in samples. This number represents the number of samples in each channel (not combined). */
    730     drflac_uint16 maxBlockSizeInPCMFrames;
    731 
    732     /*
    733     The total number of PCM Frames making up the stream. Can be 0 in which case it's still a valid stream, but just means
    734     the total PCM frame count is unknown. Likely the case with streams like internet radio.
    735     */
    736     drflac_uint64 totalPCMFrameCount;
    737 
    738 
    739     /* The container type. This is set based on whether or not the decoder was opened from a native or Ogg stream. */
    740     drflac_container container;
    741 
    742     /* The number of seekpoints in the seektable. */
    743     drflac_uint32 seekpointCount;
    744 
    745 
    746     /* Information about the frame the decoder is currently sitting on. */
    747     drflac_frame currentFLACFrame;
    748 
    749 
    750     /* The index of the PCM frame the decoder is currently sitting on. This is only used for seeking. */
    751     drflac_uint64 currentPCMFrame;
    752 
    753     /* The position of the first FLAC frame in the stream. This is only ever used for seeking. */
    754     drflac_uint64 firstFLACFramePosInBytes;
    755 
    756 
    757     /* A hack to avoid a malloc() when opening a decoder with drflac_open_memory(). */
    758     drflac__memory_stream memoryStream;
    759 
    760 
    761     /* A pointer to the decoded sample data. This is an offset of pExtraData. */
    762     drflac_int32* pDecodedSamples;
    763 
    764     /* A pointer to the seek table. This is an offset of pExtraData, or NULL if there is no seek table. */
    765     drflac_seekpoint* pSeekpoints;
    766 
    767     /* Internal use only. Only used with Ogg containers. Points to a drflac_oggbs object. This is an offset of pExtraData. */
    768     void* _oggbs;
    769 
    770     /* Internal use only. Used for profiling and testing different seeking modes. */
    771     drflac_bool32 _noSeekTableSeek    : 1;
    772     drflac_bool32 _noBinarySearchSeek : 1;
    773     drflac_bool32 _noBruteForceSeek   : 1;
    774 
    775     /* The bit streamer. The raw FLAC data is fed through this object. */
    776     drflac_bs bs;
    777 
    778     /* Variable length extra data. We attach this to the end of the object so we can avoid unnecessary mallocs. */
    779     drflac_uint8 pExtraData[1];
    780 } drflac;
    781 
    782 
    783 /*
    784 Opens a FLAC decoder.
    785 
    786 
    787 Parameters
    788 ----------
    789 onRead (in)
    790     The function to call when data needs to be read from the client.
    791 
    792 onSeek (in)
    793     The function to call when the read position of the client data needs to move.
    794 
    795 pUserData (in, optional)
    796     A pointer to application defined data that will be passed to onRead and onSeek.
    797 
    798 pAllocationCallbacks (in, optional)
    799     A pointer to application defined callbacks for managing memory allocations.
    800 
    801 
    802 Return Value
    803 ------------
    804 Returns a pointer to an object representing the decoder.
    805 
    806 
    807 Remarks
    808 -------
    809 Close the decoder with `drflac_close()`.
    810 
    811 `pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
    812 
    813 This function will automatically detect whether or not you are attempting to open a native or Ogg encapsulated FLAC, both of which should work seamlessly
    814 without any manual intervention. Ogg encapsulation also works with multiplexed streams which basically means it can play FLAC encoded audio tracks in videos.
    815 
    816 This is the lowest level function for opening a FLAC stream. You can also use `drflac_open_file()` and `drflac_open_memory()` to open the stream from a file or
    817 from a block of memory respectively.
    818 
    819 The STREAMINFO block must be present for this to succeed. Use `drflac_open_relaxed()` to open a FLAC stream where the header may not be present.
    820 
    821 Use `drflac_open_with_metadata()` if you need access to metadata.
    822 
    823 
    824 Seek Also
    825 ---------
    826 drflac_open_file()
    827 drflac_open_memory()
    828 drflac_open_with_metadata()
    829 drflac_close()
    830 */
    831 DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
    832 
    833 /*
    834 Opens a FLAC stream with relaxed validation of the header block.
    835 
    836 
    837 Parameters
    838 ----------
    839 onRead (in)
    840     The function to call when data needs to be read from the client.
    841 
    842 onSeek (in)
    843     The function to call when the read position of the client data needs to move.
    844 
    845 container (in)
    846     Whether or not the FLAC stream is encapsulated using standard FLAC encapsulation or Ogg encapsulation.
    847 
    848 pUserData (in, optional)
    849     A pointer to application defined data that will be passed to onRead and onSeek.
    850 
    851 pAllocationCallbacks (in, optional)
    852     A pointer to application defined callbacks for managing memory allocations.
    853 
    854 
    855 Return Value
    856 ------------
    857 A pointer to an object representing the decoder.
    858 
    859 
    860 Remarks
    861 -------
    862 The same as drflac_open(), except attempts to open the stream even when a header block is not present.
    863 
    864 Because the header is not necessarily available, the caller must explicitly define the container (Native or Ogg). Do not set this to `drflac_container_unknown`
    865 as that is for internal use only.
    866 
    867 Opening in relaxed mode will continue reading data from onRead until it finds a valid frame. If a frame is never found it will continue forever. To abort,
    868 force your `onRead` callback to return 0, which dr_flac will use as an indicator that the end of the stream was found.
    869 
    870 Use `drflac_open_with_metadata_relaxed()` if you need access to metadata.
    871 */
    872 DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
    873 
    874 /*
    875 Opens a FLAC decoder and notifies the caller of the metadata chunks (album art, etc.).
    876 
    877 
    878 Parameters
    879 ----------
    880 onRead (in)
    881     The function to call when data needs to be read from the client.
    882 
    883 onSeek (in)
    884     The function to call when the read position of the client data needs to move.
    885 
    886 onMeta (in)
    887     The function to call for every metadata block.
    888 
    889 pUserData (in, optional)
    890     A pointer to application defined data that will be passed to onRead, onSeek and onMeta.
    891 
    892 pAllocationCallbacks (in, optional)
    893     A pointer to application defined callbacks for managing memory allocations.
    894 
    895 
    896 Return Value
    897 ------------
    898 A pointer to an object representing the decoder.
    899 
    900 
    901 Remarks
    902 -------
    903 Close the decoder with `drflac_close()`.
    904 
    905 `pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
    906 
    907 This is slower than `drflac_open()`, so avoid this one if you don't need metadata. Internally, this will allocate and free memory on the heap for every
    908 metadata block except for STREAMINFO and PADDING blocks.
    909 
    910 The caller is notified of the metadata via the `onMeta` callback. All metadata blocks will be handled before the function returns. This callback takes a
    911 pointer to a `drflac_metadata` object which is a union containing the data of all relevant metadata blocks. Use the `type` member to discriminate against
    912 the different metadata types.
    913 
    914 The STREAMINFO block must be present for this to succeed. Use `drflac_open_with_metadata_relaxed()` to open a FLAC stream where the header may not be present.
    915 
    916 Note that this will behave inconsistently with `drflac_open()` if the stream is an Ogg encapsulated stream and a metadata block is corrupted. This is due to
    917 the way the Ogg stream recovers from corrupted pages. When `drflac_open_with_metadata()` is being used, the open routine will try to read the contents of the
    918 metadata block, whereas `drflac_open()` will simply seek past it (for the sake of efficiency). This inconsistency can result in different samples being
    919 returned depending on whether or not the stream is being opened with metadata.
    920 
    921 
    922 Seek Also
    923 ---------
    924 drflac_open_file_with_metadata()
    925 drflac_open_memory_with_metadata()
    926 drflac_open()
    927 drflac_close()
    928 */
    929 DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
    930 
    931 /*
    932 The same as drflac_open_with_metadata(), except attempts to open the stream even when a header block is not present.
    933 
    934 See Also
    935 --------
    936 drflac_open_with_metadata()
    937 drflac_open_relaxed()
    938 */
    939 DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
    940 
    941 /*
    942 Closes the given FLAC decoder.
    943 
    944 
    945 Parameters
    946 ----------
    947 pFlac (in)
    948     The decoder to close.
    949 
    950 
    951 Remarks
    952 -------
    953 This will destroy the decoder object.
    954 
    955 
    956 See Also
    957 --------
    958 drflac_open()
    959 drflac_open_with_metadata()
    960 drflac_open_file()
    961 drflac_open_file_w()
    962 drflac_open_file_with_metadata()
    963 drflac_open_file_with_metadata_w()
    964 drflac_open_memory()
    965 drflac_open_memory_with_metadata()
    966 */
    967 DRFLAC_API void drflac_close(drflac* pFlac);
    968 
    969 
    970 /*
    971 Reads sample data from the given FLAC decoder, output as interleaved signed 32-bit PCM.
    972 
    973 
    974 Parameters
    975 ----------
    976 pFlac (in)
    977     The decoder.
    978 
    979 framesToRead (in)
    980     The number of PCM frames to read.
    981 
    982 pBufferOut (out, optional)
    983     A pointer to the buffer that will receive the decoded samples.
    984 
    985 
    986 Return Value
    987 ------------
    988 Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
    989 
    990 
    991 Remarks
    992 -------
    993 pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
    994 */
    995 DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut);
    996 
    997 
    998 /*
    999 Reads sample data from the given FLAC decoder, output as interleaved signed 16-bit PCM.
   1000 
   1001 
   1002 Parameters
   1003 ----------
   1004 pFlac (in)
   1005     The decoder.
   1006 
   1007 framesToRead (in)
   1008     The number of PCM frames to read.
   1009 
   1010 pBufferOut (out, optional)
   1011     A pointer to the buffer that will receive the decoded samples.
   1012 
   1013 
   1014 Return Value
   1015 ------------
   1016 Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
   1017 
   1018 
   1019 Remarks
   1020 -------
   1021 pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
   1022 
   1023 Note that this is lossy for streams where the bits per sample is larger than 16.
   1024 */
   1025 DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut);
   1026 
   1027 /*
   1028 Reads sample data from the given FLAC decoder, output as interleaved 32-bit floating point PCM.
   1029 
   1030 
   1031 Parameters
   1032 ----------
   1033 pFlac (in)
   1034     The decoder.
   1035 
   1036 framesToRead (in)
   1037     The number of PCM frames to read.
   1038 
   1039 pBufferOut (out, optional)
   1040     A pointer to the buffer that will receive the decoded samples.
   1041 
   1042 
   1043 Return Value
   1044 ------------
   1045 Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
   1046 
   1047 
   1048 Remarks
   1049 -------
   1050 pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
   1051 
   1052 Note that this should be considered lossy due to the nature of floating point numbers not being able to exactly represent every possible number.
   1053 */
   1054 DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut);
   1055 
   1056 /*
   1057 Seeks to the PCM frame at the given index.
   1058 
   1059 
   1060 Parameters
   1061 ----------
   1062 pFlac (in)
   1063     The decoder.
   1064 
   1065 pcmFrameIndex (in)
   1066     The index of the PCM frame to seek to. See notes below.
   1067 
   1068 
   1069 Return Value
   1070 -------------
   1071 `DRFLAC_TRUE` if successful; `DRFLAC_FALSE` otherwise.
   1072 */
   1073 DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex);
   1074 
   1075 
   1076 
   1077 #ifndef DR_FLAC_NO_STDIO
   1078 /*
   1079 Opens a FLAC decoder from the file at the given path.
   1080 
   1081 
   1082 Parameters
   1083 ----------
   1084 pFileName (in)
   1085     The path of the file to open, either absolute or relative to the current directory.
   1086 
   1087 pAllocationCallbacks (in, optional)
   1088     A pointer to application defined callbacks for managing memory allocations.
   1089 
   1090 
   1091 Return Value
   1092 ------------
   1093 A pointer to an object representing the decoder.
   1094 
   1095 
   1096 Remarks
   1097 -------
   1098 Close the decoder with drflac_close().
   1099 
   1100 
   1101 Remarks
   1102 -------
   1103 This will hold a handle to the file until the decoder is closed with drflac_close(). Some platforms will restrict the number of files a process can have open
   1104 at any given time, so keep this mind if you have many decoders open at the same time.
   1105 
   1106 
   1107 See Also
   1108 --------
   1109 drflac_open_file_with_metadata()
   1110 drflac_open()
   1111 drflac_close()
   1112 */
   1113 DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
   1114 DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
   1115 
   1116 /*
   1117 Opens a FLAC decoder from the file at the given path and notifies the caller of the metadata chunks (album art, etc.)
   1118 
   1119 
   1120 Parameters
   1121 ----------
   1122 pFileName (in)
   1123     The path of the file to open, either absolute or relative to the current directory.
   1124 
   1125 pAllocationCallbacks (in, optional)
   1126     A pointer to application defined callbacks for managing memory allocations.
   1127 
   1128 onMeta (in)
   1129     The callback to fire for each metadata block.
   1130 
   1131 pUserData (in)
   1132     A pointer to the user data to pass to the metadata callback.
   1133 
   1134 pAllocationCallbacks (in)
   1135     A pointer to application defined callbacks for managing memory allocations.
   1136 
   1137 
   1138 Remarks
   1139 -------
   1140 Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
   1141 
   1142 
   1143 See Also
   1144 --------
   1145 drflac_open_with_metadata()
   1146 drflac_open()
   1147 drflac_close()
   1148 */
   1149 DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
   1150 DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
   1151 #endif
   1152 
   1153 /*
   1154 Opens a FLAC decoder from a pre-allocated block of memory
   1155 
   1156 
   1157 Parameters
   1158 ----------
   1159 pData (in)
   1160     A pointer to the raw encoded FLAC data.
   1161 
   1162 dataSize (in)
   1163     The size in bytes of `data`.
   1164 
   1165 pAllocationCallbacks (in)
   1166     A pointer to application defined callbacks for managing memory allocations.
   1167 
   1168 
   1169 Return Value
   1170 ------------
   1171 A pointer to an object representing the decoder.
   1172 
   1173 
   1174 Remarks
   1175 -------
   1176 This does not create a copy of the data. It is up to the application to ensure the buffer remains valid for the lifetime of the decoder.
   1177 
   1178 
   1179 See Also
   1180 --------
   1181 drflac_open()
   1182 drflac_close()
   1183 */
   1184 DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks);
   1185 
   1186 /*
   1187 Opens a FLAC decoder from a pre-allocated block of memory and notifies the caller of the metadata chunks (album art, etc.)
   1188 
   1189 
   1190 Parameters
   1191 ----------
   1192 pData (in)
   1193     A pointer to the raw encoded FLAC data.
   1194 
   1195 dataSize (in)
   1196     The size in bytes of `data`.
   1197 
   1198 onMeta (in)
   1199     The callback to fire for each metadata block.
   1200 
   1201 pUserData (in)
   1202     A pointer to the user data to pass to the metadata callback.
   1203 
   1204 pAllocationCallbacks (in)
   1205     A pointer to application defined callbacks for managing memory allocations.
   1206 
   1207 
   1208 Remarks
   1209 -------
   1210 Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
   1211 
   1212 
   1213 See Also
   1214 -------
   1215 drflac_open_with_metadata()
   1216 drflac_open()
   1217 drflac_close()
   1218 */
   1219 DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
   1220 
   1221 
   1222 
   1223 /* High Level APIs */
   1224 
   1225 /*
   1226 Opens a FLAC stream from the given callbacks and fully decodes it in a single operation. The return value is a
   1227 pointer to the sample data as interleaved signed 32-bit PCM. The returned data must be freed with drflac_free().
   1228 
   1229 You can pass in custom memory allocation callbacks via the pAllocationCallbacks parameter. This can be NULL in which
   1230 case it will use DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
   1231 
   1232 Sometimes a FLAC file won't keep track of the total sample count. In this situation the function will continuously
   1233 read samples into a dynamically sized buffer on the heap until no samples are left.
   1234 
   1235 Do not call this function on a broadcast type of stream (like internet radio streams and whatnot).
   1236 */
   1237 DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
   1238 
   1239 /* Same as drflac_open_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
   1240 DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
   1241 
   1242 /* Same as drflac_open_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
   1243 DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
   1244 
   1245 #ifndef DR_FLAC_NO_STDIO
   1246 /* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a file. */
   1247 DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
   1248 
   1249 /* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
   1250 DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
   1251 
   1252 /* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
   1253 DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
   1254 #endif
   1255 
   1256 /* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a block of memory. */
   1257 DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
   1258 
   1259 /* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
   1260 DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
   1261 
   1262 /* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
   1263 DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
   1264 
   1265 /*
   1266 Frees memory that was allocated internally by dr_flac.
   1267 
   1268 Set pAllocationCallbacks to the same object that was passed to drflac_open_*_and_read_pcm_frames_*(). If you originally passed in NULL, pass in NULL for this.
   1269 */
   1270 DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks);
   1271 
   1272 
   1273 /* Structure representing an iterator for vorbis comments in a VORBIS_COMMENT metadata block. */
   1274 typedef struct
   1275 {
   1276     drflac_uint32 countRemaining;
   1277     const char* pRunningData;
   1278 } drflac_vorbis_comment_iterator;
   1279 
   1280 /*
   1281 Initializes a vorbis comment iterator. This can be used for iterating over the vorbis comments in a VORBIS_COMMENT
   1282 metadata block.
   1283 */
   1284 DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments);
   1285 
   1286 /*
   1287 Goes to the next vorbis comment in the given iterator. If null is returned it means there are no more comments. The
   1288 returned string is NOT null terminated.
   1289 */
   1290 DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut);
   1291 
   1292 
   1293 /* Structure representing an iterator for cuesheet tracks in a CUESHEET metadata block. */
   1294 typedef struct
   1295 {
   1296     drflac_uint32 countRemaining;
   1297     const char* pRunningData;
   1298 } drflac_cuesheet_track_iterator;
   1299 
   1300 /* The order of members here is important because we map this directly to the raw data within the CUESHEET metadata block. */
   1301 typedef struct
   1302 {
   1303     drflac_uint64 offset;
   1304     drflac_uint8 index;
   1305     drflac_uint8 reserved[3];
   1306 } drflac_cuesheet_track_index;
   1307 
   1308 typedef struct
   1309 {
   1310     drflac_uint64 offset;
   1311     drflac_uint8 trackNumber;
   1312     char ISRC[12];
   1313     drflac_bool8 isAudio;
   1314     drflac_bool8 preEmphasis;
   1315     drflac_uint8 indexCount;
   1316     const drflac_cuesheet_track_index* pIndexPoints;
   1317 } drflac_cuesheet_track;
   1318 
   1319 /*
   1320 Initializes a cuesheet track iterator. This can be used for iterating over the cuesheet tracks in a CUESHEET metadata
   1321 block.
   1322 */
   1323 DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData);
   1324 
   1325 /* Goes to the next cuesheet track in the given iterator. If DRFLAC_FALSE is returned it means there are no more comments. */
   1326 DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack);
   1327 
   1328 
   1329 #ifdef __cplusplus
   1330 }
   1331 #endif
   1332 #endif  /* dr_flac_h */
   1333 
   1334 
   1335 /************************************************************************************************************************************************************
   1336  ************************************************************************************************************************************************************
   1337 
   1338  IMPLEMENTATION
   1339 
   1340  ************************************************************************************************************************************************************
   1341  ************************************************************************************************************************************************************/
   1342 #if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
   1343 #ifndef dr_flac_c
   1344 #define dr_flac_c
   1345 
   1346 /* Disable some annoying warnings. */
   1347 #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
   1348     #pragma GCC diagnostic push
   1349     #if __GNUC__ >= 7
   1350     #pragma GCC diagnostic ignored "-Wimplicit-fallthrough"
   1351     #endif
   1352 #endif
   1353 
   1354 #ifdef __linux__
   1355     #ifndef _BSD_SOURCE
   1356         #define _BSD_SOURCE
   1357     #endif
   1358     #ifndef _DEFAULT_SOURCE
   1359         #define _DEFAULT_SOURCE
   1360     #endif
   1361     #ifndef __USE_BSD
   1362         #define __USE_BSD
   1363     #endif
   1364     #include <endian.h>
   1365 #endif
   1366 
   1367 #include <stdlib.h>
   1368 #include <string.h>
   1369 
   1370 /* Inline */
   1371 #ifdef _MSC_VER
   1372     #define DRFLAC_INLINE __forceinline
   1373 #elif defined(__GNUC__)
   1374     /*
   1375     I've had a bug report where GCC is emitting warnings about functions possibly not being inlineable. This warning happens when
   1376     the __attribute__((always_inline)) attribute is defined without an "inline" statement. I think therefore there must be some
   1377     case where "__inline__" is not always defined, thus the compiler emitting these warnings. When using -std=c89 or -ansi on the
   1378     command line, we cannot use the "inline" keyword and instead need to use "__inline__". In an attempt to work around this issue
   1379     I am using "__inline__" only when we're compiling in strict ANSI mode.
   1380     */
   1381     #if defined(__STRICT_ANSI__)
   1382         #define DRFLAC_GNUC_INLINE_HINT __inline__
   1383     #else
   1384         #define DRFLAC_GNUC_INLINE_HINT inline
   1385     #endif
   1386 
   1387     #if (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 2)) || defined(__clang__)
   1388         #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT __attribute__((always_inline))
   1389     #else
   1390         #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT
   1391     #endif
   1392 #elif defined(__WATCOMC__)
   1393     #define DRFLAC_INLINE __inline
   1394 #else
   1395     #define DRFLAC_INLINE
   1396 #endif
   1397 /* End Inline */
   1398 
   1399 /*
   1400 Intrinsics Support
   1401 
   1402 There's a bug in GCC 4.2.x which results in an incorrect compilation error when using _mm_slli_epi32() where it complains with
   1403 
   1404     "error: shift must be an immediate"
   1405 
   1406 Unfortuantely dr_flac depends on this for a few things so we're just going to disable SSE on GCC 4.2 and below.
   1407 */
   1408 #if !defined(DR_FLAC_NO_SIMD)
   1409     #if defined(DRFLAC_X64) || defined(DRFLAC_X86)
   1410         #if defined(_MSC_VER) && !defined(__clang__)
   1411             /* MSVC. */
   1412             #if _MSC_VER >= 1400 && !defined(DRFLAC_NO_SSE2)    /* 2005 */
   1413                 #define DRFLAC_SUPPORT_SSE2
   1414             #endif
   1415             #if _MSC_VER >= 1600 && !defined(DRFLAC_NO_SSE41)   /* 2010 */
   1416                 #define DRFLAC_SUPPORT_SSE41
   1417             #endif
   1418         #elif defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)))
   1419             /* Assume GNUC-style. */
   1420             #if defined(__SSE2__) && !defined(DRFLAC_NO_SSE2)
   1421                 #define DRFLAC_SUPPORT_SSE2
   1422             #endif
   1423             #if defined(__SSE4_1__) && !defined(DRFLAC_NO_SSE41)
   1424                 #define DRFLAC_SUPPORT_SSE41
   1425             #endif
   1426         #endif
   1427 
   1428         /* If at this point we still haven't determined compiler support for the intrinsics just fall back to __has_include. */
   1429         #if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
   1430             #if !defined(DRFLAC_SUPPORT_SSE2) && !defined(DRFLAC_NO_SSE2) && __has_include(<emmintrin.h>)
   1431                 #define DRFLAC_SUPPORT_SSE2
   1432             #endif
   1433             #if !defined(DRFLAC_SUPPORT_SSE41) && !defined(DRFLAC_NO_SSE41) && __has_include(<smmintrin.h>)
   1434                 #define DRFLAC_SUPPORT_SSE41
   1435             #endif
   1436         #endif
   1437 
   1438         #if defined(DRFLAC_SUPPORT_SSE41)
   1439             #include <smmintrin.h>
   1440         #elif defined(DRFLAC_SUPPORT_SSE2)
   1441             #include <emmintrin.h>
   1442         #endif
   1443     #endif
   1444 
   1445     #if defined(DRFLAC_ARM)
   1446         #if !defined(DRFLAC_NO_NEON) && (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
   1447             #define DRFLAC_SUPPORT_NEON
   1448             #include <arm_neon.h>
   1449         #endif
   1450     #endif
   1451 #endif
   1452 
   1453 /* Compile-time CPU feature support. */
   1454 #if !defined(DR_FLAC_NO_SIMD) && (defined(DRFLAC_X86) || defined(DRFLAC_X64))
   1455     #if defined(_MSC_VER) && !defined(__clang__)
   1456         #if _MSC_VER >= 1400
   1457             #include <intrin.h>
   1458             static void drflac__cpuid(int info[4], int fid)
   1459             {
   1460                 __cpuid(info, fid);
   1461             }
   1462         #else
   1463             #define DRFLAC_NO_CPUID
   1464         #endif
   1465     #else
   1466         #if defined(__GNUC__) || defined(__clang__)
   1467             static void drflac__cpuid(int info[4], int fid)
   1468             {
   1469                 /*
   1470                 It looks like the -fPIC option uses the ebx register which GCC complains about. We can work around this by just using a different register, the
   1471                 specific register of which I'm letting the compiler decide on. The "k" prefix is used to specify a 32-bit register. The {...} syntax is for
   1472                 supporting different assembly dialects.
   1473 
   1474                 What's basically happening is that we're saving and restoring the ebx register manually.
   1475                 */
   1476                 #if defined(DRFLAC_X86) && defined(__PIC__)
   1477                     __asm__ __volatile__ (
   1478                         "xchg{l} {%%}ebx, %k1;"
   1479                         "cpuid;"
   1480                         "xchg{l} {%%}ebx, %k1;"
   1481                         : "=a"(info[0]), "=&r"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
   1482                     );
   1483                 #else
   1484                     __asm__ __volatile__ (
   1485                         "cpuid" : "=a"(info[0]), "=b"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
   1486                     );
   1487                 #endif
   1488             }
   1489         #else
   1490             #define DRFLAC_NO_CPUID
   1491         #endif
   1492     #endif
   1493 #else
   1494     #define DRFLAC_NO_CPUID
   1495 #endif
   1496 
   1497 static DRFLAC_INLINE drflac_bool32 drflac_has_sse2(void)
   1498 {
   1499 #if defined(DRFLAC_SUPPORT_SSE2)
   1500     #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE2)
   1501         #if defined(DRFLAC_X64)
   1502             return DRFLAC_TRUE;    /* 64-bit targets always support SSE2. */
   1503         #elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) || defined(__SSE2__)
   1504             return DRFLAC_TRUE;    /* If the compiler is allowed to freely generate SSE2 code we can assume support. */
   1505         #else
   1506             #if defined(DRFLAC_NO_CPUID)
   1507                 return DRFLAC_FALSE;
   1508             #else
   1509                 int info[4];
   1510                 drflac__cpuid(info, 1);
   1511                 return (info[3] & (1 << 26)) != 0;
   1512             #endif
   1513         #endif
   1514     #else
   1515         return DRFLAC_FALSE;       /* SSE2 is only supported on x86 and x64 architectures. */
   1516     #endif
   1517 #else
   1518     return DRFLAC_FALSE;           /* No compiler support. */
   1519 #endif
   1520 }
   1521 
   1522 static DRFLAC_INLINE drflac_bool32 drflac_has_sse41(void)
   1523 {
   1524 #if defined(DRFLAC_SUPPORT_SSE41)
   1525     #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE41)
   1526         #if defined(__SSE4_1__) || defined(__AVX__)
   1527             return DRFLAC_TRUE;    /* If the compiler is allowed to freely generate SSE41 code we can assume support. */
   1528         #else
   1529             #if defined(DRFLAC_NO_CPUID)
   1530                 return DRFLAC_FALSE;
   1531             #else
   1532                 int info[4];
   1533                 drflac__cpuid(info, 1);
   1534                 return (info[2] & (1 << 19)) != 0;
   1535             #endif
   1536         #endif
   1537     #else
   1538         return DRFLAC_FALSE;       /* SSE41 is only supported on x86 and x64 architectures. */
   1539     #endif
   1540 #else
   1541     return DRFLAC_FALSE;           /* No compiler support. */
   1542 #endif
   1543 }
   1544 
   1545 
   1546 #if defined(_MSC_VER) && _MSC_VER >= 1500 && (defined(DRFLAC_X86) || defined(DRFLAC_X64)) && !defined(__clang__)
   1547     #define DRFLAC_HAS_LZCNT_INTRINSIC
   1548 #elif (defined(__GNUC__) && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
   1549     #define DRFLAC_HAS_LZCNT_INTRINSIC
   1550 #elif defined(__clang__)
   1551     #if defined(__has_builtin)
   1552         #if __has_builtin(__builtin_clzll) || __has_builtin(__builtin_clzl)
   1553             #define DRFLAC_HAS_LZCNT_INTRINSIC
   1554         #endif
   1555     #endif
   1556 #endif
   1557 
   1558 #if defined(_MSC_VER) && _MSC_VER >= 1400 && !defined(__clang__)
   1559     #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
   1560     #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
   1561     #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
   1562 #elif defined(__clang__)
   1563     #if defined(__has_builtin)
   1564         #if __has_builtin(__builtin_bswap16)
   1565             #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
   1566         #endif
   1567         #if __has_builtin(__builtin_bswap32)
   1568             #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
   1569         #endif
   1570         #if __has_builtin(__builtin_bswap64)
   1571             #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
   1572         #endif
   1573     #endif
   1574 #elif defined(__GNUC__)
   1575     #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
   1576         #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
   1577         #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
   1578     #endif
   1579     #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
   1580         #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
   1581     #endif
   1582 #elif defined(__WATCOMC__) && defined(__386__)
   1583     #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
   1584     #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
   1585     #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
   1586     extern __inline drflac_uint16 _watcom_bswap16(drflac_uint16);
   1587     extern __inline drflac_uint32 _watcom_bswap32(drflac_uint32);
   1588     extern __inline drflac_uint64 _watcom_bswap64(drflac_uint64);
   1589 #pragma aux _watcom_bswap16 = \
   1590     "xchg al, ah" \
   1591     parm  [ax]    \
   1592     value [ax]    \
   1593     modify nomemory;
   1594 #pragma aux _watcom_bswap32 = \
   1595     "bswap eax" \
   1596     parm  [eax] \
   1597     value [eax] \
   1598     modify nomemory;
   1599 #pragma aux _watcom_bswap64 = \
   1600     "bswap eax"     \
   1601     "bswap edx"     \
   1602     "xchg eax,edx"  \
   1603     parm [eax edx]  \
   1604     value [eax edx] \
   1605     modify nomemory;
   1606 #endif
   1607 
   1608 
   1609 /* Standard library stuff. */
   1610 #ifndef DRFLAC_ASSERT
   1611 #include <assert.h>
   1612 #define DRFLAC_ASSERT(expression)           assert(expression)
   1613 #endif
   1614 #ifndef DRFLAC_MALLOC
   1615 #define DRFLAC_MALLOC(sz)                   malloc((sz))
   1616 #endif
   1617 #ifndef DRFLAC_REALLOC
   1618 #define DRFLAC_REALLOC(p, sz)               realloc((p), (sz))
   1619 #endif
   1620 #ifndef DRFLAC_FREE
   1621 #define DRFLAC_FREE(p)                      free((p))
   1622 #endif
   1623 #ifndef DRFLAC_COPY_MEMORY
   1624 #define DRFLAC_COPY_MEMORY(dst, src, sz)    memcpy((dst), (src), (sz))
   1625 #endif
   1626 #ifndef DRFLAC_ZERO_MEMORY
   1627 #define DRFLAC_ZERO_MEMORY(p, sz)           memset((p), 0, (sz))
   1628 #endif
   1629 #ifndef DRFLAC_ZERO_OBJECT
   1630 #define DRFLAC_ZERO_OBJECT(p)               DRFLAC_ZERO_MEMORY((p), sizeof(*(p)))
   1631 #endif
   1632 
   1633 #define DRFLAC_MAX_SIMD_VECTOR_SIZE                     64  /* 64 for AVX-512 in the future. */
   1634 
   1635 /* Result Codes */
   1636 typedef drflac_int32 drflac_result;
   1637 #define DRFLAC_SUCCESS                                   0
   1638 #define DRFLAC_ERROR                                    -1   /* A generic error. */
   1639 #define DRFLAC_INVALID_ARGS                             -2
   1640 #define DRFLAC_INVALID_OPERATION                        -3
   1641 #define DRFLAC_OUT_OF_MEMORY                            -4
   1642 #define DRFLAC_OUT_OF_RANGE                             -5
   1643 #define DRFLAC_ACCESS_DENIED                            -6
   1644 #define DRFLAC_DOES_NOT_EXIST                           -7
   1645 #define DRFLAC_ALREADY_EXISTS                           -8
   1646 #define DRFLAC_TOO_MANY_OPEN_FILES                      -9
   1647 #define DRFLAC_INVALID_FILE                             -10
   1648 #define DRFLAC_TOO_BIG                                  -11
   1649 #define DRFLAC_PATH_TOO_LONG                            -12
   1650 #define DRFLAC_NAME_TOO_LONG                            -13
   1651 #define DRFLAC_NOT_DIRECTORY                            -14
   1652 #define DRFLAC_IS_DIRECTORY                             -15
   1653 #define DRFLAC_DIRECTORY_NOT_EMPTY                      -16
   1654 #define DRFLAC_END_OF_FILE                              -17
   1655 #define DRFLAC_NO_SPACE                                 -18
   1656 #define DRFLAC_BUSY                                     -19
   1657 #define DRFLAC_IO_ERROR                                 -20
   1658 #define DRFLAC_INTERRUPT                                -21
   1659 #define DRFLAC_UNAVAILABLE                              -22
   1660 #define DRFLAC_ALREADY_IN_USE                           -23
   1661 #define DRFLAC_BAD_ADDRESS                              -24
   1662 #define DRFLAC_BAD_SEEK                                 -25
   1663 #define DRFLAC_BAD_PIPE                                 -26
   1664 #define DRFLAC_DEADLOCK                                 -27
   1665 #define DRFLAC_TOO_MANY_LINKS                           -28
   1666 #define DRFLAC_NOT_IMPLEMENTED                          -29
   1667 #define DRFLAC_NO_MESSAGE                               -30
   1668 #define DRFLAC_BAD_MESSAGE                              -31
   1669 #define DRFLAC_NO_DATA_AVAILABLE                        -32
   1670 #define DRFLAC_INVALID_DATA                             -33
   1671 #define DRFLAC_TIMEOUT                                  -34
   1672 #define DRFLAC_NO_NETWORK                               -35
   1673 #define DRFLAC_NOT_UNIQUE                               -36
   1674 #define DRFLAC_NOT_SOCKET                               -37
   1675 #define DRFLAC_NO_ADDRESS                               -38
   1676 #define DRFLAC_BAD_PROTOCOL                             -39
   1677 #define DRFLAC_PROTOCOL_UNAVAILABLE                     -40
   1678 #define DRFLAC_PROTOCOL_NOT_SUPPORTED                   -41
   1679 #define DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED            -42
   1680 #define DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED             -43
   1681 #define DRFLAC_SOCKET_NOT_SUPPORTED                     -44
   1682 #define DRFLAC_CONNECTION_RESET                         -45
   1683 #define DRFLAC_ALREADY_CONNECTED                        -46
   1684 #define DRFLAC_NOT_CONNECTED                            -47
   1685 #define DRFLAC_CONNECTION_REFUSED                       -48
   1686 #define DRFLAC_NO_HOST                                  -49
   1687 #define DRFLAC_IN_PROGRESS                              -50
   1688 #define DRFLAC_CANCELLED                                -51
   1689 #define DRFLAC_MEMORY_ALREADY_MAPPED                    -52
   1690 #define DRFLAC_AT_END                                   -53
   1691 
   1692 #define DRFLAC_CRC_MISMATCH                             -100
   1693 /* End Result Codes */
   1694 
   1695 
   1696 #define DRFLAC_SUBFRAME_CONSTANT                        0
   1697 #define DRFLAC_SUBFRAME_VERBATIM                        1
   1698 #define DRFLAC_SUBFRAME_FIXED                           8
   1699 #define DRFLAC_SUBFRAME_LPC                             32
   1700 #define DRFLAC_SUBFRAME_RESERVED                        255
   1701 
   1702 #define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE  0
   1703 #define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2 1
   1704 
   1705 #define DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT           0
   1706 #define DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE             8
   1707 #define DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE            9
   1708 #define DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE              10
   1709 
   1710 #define DRFLAC_SEEKPOINT_SIZE_IN_BYTES                  18
   1711 #define DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES             36
   1712 #define DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES       12
   1713 
   1714 #define drflac_align(x, a)                              ((((x) + (a) - 1) / (a)) * (a))
   1715 
   1716 
   1717 DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision)
   1718 {
   1719     if (pMajor) {
   1720         *pMajor = DRFLAC_VERSION_MAJOR;
   1721     }
   1722 
   1723     if (pMinor) {
   1724         *pMinor = DRFLAC_VERSION_MINOR;
   1725     }
   1726 
   1727     if (pRevision) {
   1728         *pRevision = DRFLAC_VERSION_REVISION;
   1729     }
   1730 }
   1731 
   1732 DRFLAC_API const char* drflac_version_string(void)
   1733 {
   1734     return DRFLAC_VERSION_STRING;
   1735 }
   1736 
   1737 
   1738 /* CPU caps. */
   1739 #if defined(__has_feature)
   1740     #if __has_feature(thread_sanitizer)
   1741         #define DRFLAC_NO_THREAD_SANITIZE __attribute__((no_sanitize("thread")))
   1742     #else
   1743         #define DRFLAC_NO_THREAD_SANITIZE
   1744     #endif
   1745 #else
   1746     #define DRFLAC_NO_THREAD_SANITIZE
   1747 #endif
   1748 
   1749 #if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
   1750 static drflac_bool32 drflac__gIsLZCNTSupported = DRFLAC_FALSE;
   1751 #endif
   1752 
   1753 #ifndef DRFLAC_NO_CPUID
   1754 static drflac_bool32 drflac__gIsSSE2Supported  = DRFLAC_FALSE;
   1755 static drflac_bool32 drflac__gIsSSE41Supported = DRFLAC_FALSE;
   1756 
   1757 /*
   1758 I've had a bug report that Clang's ThreadSanitizer presents a warning in this function. Having reviewed this, this does
   1759 actually make sense. However, since CPU caps should never differ for a running process, I don't think the trade off of
   1760 complicating internal API's by passing around CPU caps versus just disabling the warnings is worthwhile. I'm therefore
   1761 just going to disable these warnings. This is disabled via the DRFLAC_NO_THREAD_SANITIZE attribute.
   1762 */
   1763 DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
   1764 {
   1765     static drflac_bool32 isCPUCapsInitialized = DRFLAC_FALSE;
   1766 
   1767     if (!isCPUCapsInitialized) {
   1768         /* LZCNT */
   1769 #if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
   1770         int info[4] = {0};
   1771         drflac__cpuid(info, 0x80000001);
   1772         drflac__gIsLZCNTSupported = (info[2] & (1 << 5)) != 0;
   1773 #endif
   1774 
   1775         /* SSE2 */
   1776         drflac__gIsSSE2Supported = drflac_has_sse2();
   1777 
   1778         /* SSE4.1 */
   1779         drflac__gIsSSE41Supported = drflac_has_sse41();
   1780 
   1781         /* Initialized. */
   1782         isCPUCapsInitialized = DRFLAC_TRUE;
   1783     }
   1784 }
   1785 #else
   1786 static drflac_bool32 drflac__gIsNEONSupported  = DRFLAC_FALSE;
   1787 
   1788 static DRFLAC_INLINE drflac_bool32 drflac__has_neon(void)
   1789 {
   1790 #if defined(DRFLAC_SUPPORT_NEON)
   1791     #if defined(DRFLAC_ARM) && !defined(DRFLAC_NO_NEON)
   1792         #if (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
   1793             return DRFLAC_TRUE;    /* If the compiler is allowed to freely generate NEON code we can assume support. */
   1794         #else
   1795             /* TODO: Runtime check. */
   1796             return DRFLAC_FALSE;
   1797         #endif
   1798     #else
   1799         return DRFLAC_FALSE;       /* NEON is only supported on ARM architectures. */
   1800     #endif
   1801 #else
   1802     return DRFLAC_FALSE;           /* No compiler support. */
   1803 #endif
   1804 }
   1805 
   1806 DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
   1807 {
   1808     drflac__gIsNEONSupported = drflac__has_neon();
   1809 
   1810 #if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
   1811     drflac__gIsLZCNTSupported = DRFLAC_TRUE;
   1812 #endif
   1813 }
   1814 #endif
   1815 
   1816 
   1817 /* Endian Management */
   1818 static DRFLAC_INLINE drflac_bool32 drflac__is_little_endian(void)
   1819 {
   1820 #if defined(DRFLAC_X86) || defined(DRFLAC_X64)
   1821     return DRFLAC_TRUE;
   1822 #elif defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && __BYTE_ORDER == __LITTLE_ENDIAN
   1823     return DRFLAC_TRUE;
   1824 #else
   1825     int n = 1;
   1826     return (*(char*)&n) == 1;
   1827 #endif
   1828 }
   1829 
   1830 static DRFLAC_INLINE drflac_uint16 drflac__swap_endian_uint16(drflac_uint16 n)
   1831 {
   1832 #ifdef DRFLAC_HAS_BYTESWAP16_INTRINSIC
   1833     #if defined(_MSC_VER) && !defined(__clang__)
   1834         return _byteswap_ushort(n);
   1835     #elif defined(__GNUC__) || defined(__clang__)
   1836         return __builtin_bswap16(n);
   1837     #elif defined(__WATCOMC__) && defined(__386__)
   1838         return _watcom_bswap16(n);
   1839     #else
   1840         #error "This compiler does not support the byte swap intrinsic."
   1841     #endif
   1842 #else
   1843     return ((n & 0xFF00) >> 8) |
   1844            ((n & 0x00FF) << 8);
   1845 #endif
   1846 }
   1847 
   1848 static DRFLAC_INLINE drflac_uint32 drflac__swap_endian_uint32(drflac_uint32 n)
   1849 {
   1850 #ifdef DRFLAC_HAS_BYTESWAP32_INTRINSIC
   1851     #if defined(_MSC_VER) && !defined(__clang__)
   1852         return _byteswap_ulong(n);
   1853     #elif defined(__GNUC__) || defined(__clang__)
   1854         #if defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 6) && !defined(__ARM_ARCH_6M__) && !defined(DRFLAC_64BIT)   /* <-- 64-bit inline assembly has not been tested, so disabling for now. */
   1855             /* Inline assembly optimized implementation for ARM. In my testing, GCC does not generate optimized code with __builtin_bswap32(). */
   1856             drflac_uint32 r;
   1857             __asm__ __volatile__ (
   1858             #if defined(DRFLAC_64BIT)
   1859                 "rev %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(n)   /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
   1860             #else
   1861                 "rev %[out], %[in]" : [out]"=r"(r) : [in]"r"(n)
   1862             #endif
   1863             );
   1864             return r;
   1865         #else
   1866             return __builtin_bswap32(n);
   1867         #endif
   1868     #elif defined(__WATCOMC__) && defined(__386__)
   1869         return _watcom_bswap32(n);
   1870     #else
   1871         #error "This compiler does not support the byte swap intrinsic."
   1872     #endif
   1873 #else
   1874     return ((n & 0xFF000000) >> 24) |
   1875            ((n & 0x00FF0000) >>  8) |
   1876            ((n & 0x0000FF00) <<  8) |
   1877            ((n & 0x000000FF) << 24);
   1878 #endif
   1879 }
   1880 
   1881 static DRFLAC_INLINE drflac_uint64 drflac__swap_endian_uint64(drflac_uint64 n)
   1882 {
   1883 #ifdef DRFLAC_HAS_BYTESWAP64_INTRINSIC
   1884     #if defined(_MSC_VER) && !defined(__clang__)
   1885         return _byteswap_uint64(n);
   1886     #elif defined(__GNUC__) || defined(__clang__)
   1887         return __builtin_bswap64(n);
   1888     #elif defined(__WATCOMC__) && defined(__386__)
   1889         return _watcom_bswap64(n);
   1890     #else
   1891         #error "This compiler does not support the byte swap intrinsic."
   1892     #endif
   1893 #else
   1894     /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
   1895     return ((n & ((drflac_uint64)0xFF000000 << 32)) >> 56) |
   1896            ((n & ((drflac_uint64)0x00FF0000 << 32)) >> 40) |
   1897            ((n & ((drflac_uint64)0x0000FF00 << 32)) >> 24) |
   1898            ((n & ((drflac_uint64)0x000000FF << 32)) >>  8) |
   1899            ((n & ((drflac_uint64)0xFF000000      )) <<  8) |
   1900            ((n & ((drflac_uint64)0x00FF0000      )) << 24) |
   1901            ((n & ((drflac_uint64)0x0000FF00      )) << 40) |
   1902            ((n & ((drflac_uint64)0x000000FF      )) << 56);
   1903 #endif
   1904 }
   1905 
   1906 
   1907 static DRFLAC_INLINE drflac_uint16 drflac__be2host_16(drflac_uint16 n)
   1908 {
   1909     if (drflac__is_little_endian()) {
   1910         return drflac__swap_endian_uint16(n);
   1911     }
   1912 
   1913     return n;
   1914 }
   1915 
   1916 static DRFLAC_INLINE drflac_uint32 drflac__be2host_32(drflac_uint32 n)
   1917 {
   1918     if (drflac__is_little_endian()) {
   1919         return drflac__swap_endian_uint32(n);
   1920     }
   1921 
   1922     return n;
   1923 }
   1924 
   1925 static DRFLAC_INLINE drflac_uint32 drflac__be2host_32_ptr_unaligned(const void* pData)
   1926 {
   1927     const drflac_uint8* pNum = (drflac_uint8*)pData;
   1928     return *(pNum) << 24 | *(pNum+1) << 16 | *(pNum+2) << 8 | *(pNum+3);
   1929 }
   1930 
   1931 static DRFLAC_INLINE drflac_uint64 drflac__be2host_64(drflac_uint64 n)
   1932 {
   1933     if (drflac__is_little_endian()) {
   1934         return drflac__swap_endian_uint64(n);
   1935     }
   1936 
   1937     return n;
   1938 }
   1939 
   1940 
   1941 static DRFLAC_INLINE drflac_uint32 drflac__le2host_32(drflac_uint32 n)
   1942 {
   1943     if (!drflac__is_little_endian()) {
   1944         return drflac__swap_endian_uint32(n);
   1945     }
   1946 
   1947     return n;
   1948 }
   1949 
   1950 static DRFLAC_INLINE drflac_uint32 drflac__le2host_32_ptr_unaligned(const void* pData)
   1951 {
   1952     const drflac_uint8* pNum = (drflac_uint8*)pData;
   1953     return *pNum | *(pNum+1) << 8 |  *(pNum+2) << 16 | *(pNum+3) << 24;
   1954 }
   1955 
   1956 
   1957 static DRFLAC_INLINE drflac_uint32 drflac__unsynchsafe_32(drflac_uint32 n)
   1958 {
   1959     drflac_uint32 result = 0;
   1960     result |= (n & 0x7F000000) >> 3;
   1961     result |= (n & 0x007F0000) >> 2;
   1962     result |= (n & 0x00007F00) >> 1;
   1963     result |= (n & 0x0000007F) >> 0;
   1964 
   1965     return result;
   1966 }
   1967 
   1968 
   1969 
   1970 /* The CRC code below is based on this document: http://zlib.net/crc_v3.txt */
   1971 static drflac_uint8 drflac__crc8_table[] = {
   1972     0x00, 0x07, 0x0E, 0x09, 0x1C, 0x1B, 0x12, 0x15, 0x38, 0x3F, 0x36, 0x31, 0x24, 0x23, 0x2A, 0x2D,
   1973     0x70, 0x77, 0x7E, 0x79, 0x6C, 0x6B, 0x62, 0x65, 0x48, 0x4F, 0x46, 0x41, 0x54, 0x53, 0x5A, 0x5D,
   1974     0xE0, 0xE7, 0xEE, 0xE9, 0xFC, 0xFB, 0xF2, 0xF5, 0xD8, 0xDF, 0xD6, 0xD1, 0xC4, 0xC3, 0xCA, 0xCD,
   1975     0x90, 0x97, 0x9E, 0x99, 0x8C, 0x8B, 0x82, 0x85, 0xA8, 0xAF, 0xA6, 0xA1, 0xB4, 0xB3, 0xBA, 0xBD,
   1976     0xC7, 0xC0, 0xC9, 0xCE, 0xDB, 0xDC, 0xD5, 0xD2, 0xFF, 0xF8, 0xF1, 0xF6, 0xE3, 0xE4, 0xED, 0xEA,
   1977     0xB7, 0xB0, 0xB9, 0xBE, 0xAB, 0xAC, 0xA5, 0xA2, 0x8F, 0x88, 0x81, 0x86, 0x93, 0x94, 0x9D, 0x9A,
   1978     0x27, 0x20, 0x29, 0x2E, 0x3B, 0x3C, 0x35, 0x32, 0x1F, 0x18, 0x11, 0x16, 0x03, 0x04, 0x0D, 0x0A,
   1979     0x57, 0x50, 0x59, 0x5E, 0x4B, 0x4C, 0x45, 0x42, 0x6F, 0x68, 0x61, 0x66, 0x73, 0x74, 0x7D, 0x7A,
   1980     0x89, 0x8E, 0x87, 0x80, 0x95, 0x92, 0x9B, 0x9C, 0xB1, 0xB6, 0xBF, 0xB8, 0xAD, 0xAA, 0xA3, 0xA4,
   1981     0xF9, 0xFE, 0xF7, 0xF0, 0xE5, 0xE2, 0xEB, 0xEC, 0xC1, 0xC6, 0xCF, 0xC8, 0xDD, 0xDA, 0xD3, 0xD4,
   1982     0x69, 0x6E, 0x67, 0x60, 0x75, 0x72, 0x7B, 0x7C, 0x51, 0x56, 0x5F, 0x58, 0x4D, 0x4A, 0x43, 0x44,
   1983     0x19, 0x1E, 0x17, 0x10, 0x05, 0x02, 0x0B, 0x0C, 0x21, 0x26, 0x2F, 0x28, 0x3D, 0x3A, 0x33, 0x34,
   1984     0x4E, 0x49, 0x40, 0x47, 0x52, 0x55, 0x5C, 0x5B, 0x76, 0x71, 0x78, 0x7F, 0x6A, 0x6D, 0x64, 0x63,
   1985     0x3E, 0x39, 0x30, 0x37, 0x22, 0x25, 0x2C, 0x2B, 0x06, 0x01, 0x08, 0x0F, 0x1A, 0x1D, 0x14, 0x13,
   1986     0xAE, 0xA9, 0xA0, 0xA7, 0xB2, 0xB5, 0xBC, 0xBB, 0x96, 0x91, 0x98, 0x9F, 0x8A, 0x8D, 0x84, 0x83,
   1987     0xDE, 0xD9, 0xD0, 0xD7, 0xC2, 0xC5, 0xCC, 0xCB, 0xE6, 0xE1, 0xE8, 0xEF, 0xFA, 0xFD, 0xF4, 0xF3
   1988 };
   1989 
   1990 static drflac_uint16 drflac__crc16_table[] = {
   1991     0x0000, 0x8005, 0x800F, 0x000A, 0x801B, 0x001E, 0x0014, 0x8011,
   1992     0x8033, 0x0036, 0x003C, 0x8039, 0x0028, 0x802D, 0x8027, 0x0022,
   1993     0x8063, 0x0066, 0x006C, 0x8069, 0x0078, 0x807D, 0x8077, 0x0072,
   1994     0x0050, 0x8055, 0x805F, 0x005A, 0x804B, 0x004E, 0x0044, 0x8041,
   1995     0x80C3, 0x00C6, 0x00CC, 0x80C9, 0x00D8, 0x80DD, 0x80D7, 0x00D2,
   1996     0x00F0, 0x80F5, 0x80FF, 0x00FA, 0x80EB, 0x00EE, 0x00E4, 0x80E1,
   1997     0x00A0, 0x80A5, 0x80AF, 0x00AA, 0x80BB, 0x00BE, 0x00B4, 0x80B1,
   1998     0x8093, 0x0096, 0x009C, 0x8099, 0x0088, 0x808D, 0x8087, 0x0082,
   1999     0x8183, 0x0186, 0x018C, 0x8189, 0x0198, 0x819D, 0x8197, 0x0192,
   2000     0x01B0, 0x81B5, 0x81BF, 0x01BA, 0x81AB, 0x01AE, 0x01A4, 0x81A1,
   2001     0x01E0, 0x81E5, 0x81EF, 0x01EA, 0x81FB, 0x01FE, 0x01F4, 0x81F1,
   2002     0x81D3, 0x01D6, 0x01DC, 0x81D9, 0x01C8, 0x81CD, 0x81C7, 0x01C2,
   2003     0x0140, 0x8145, 0x814F, 0x014A, 0x815B, 0x015E, 0x0154, 0x8151,
   2004     0x8173, 0x0176, 0x017C, 0x8179, 0x0168, 0x816D, 0x8167, 0x0162,
   2005     0x8123, 0x0126, 0x012C, 0x8129, 0x0138, 0x813D, 0x8137, 0x0132,
   2006     0x0110, 0x8115, 0x811F, 0x011A, 0x810B, 0x010E, 0x0104, 0x8101,
   2007     0x8303, 0x0306, 0x030C, 0x8309, 0x0318, 0x831D, 0x8317, 0x0312,
   2008     0x0330, 0x8335, 0x833F, 0x033A, 0x832B, 0x032E, 0x0324, 0x8321,
   2009     0x0360, 0x8365, 0x836F, 0x036A, 0x837B, 0x037E, 0x0374, 0x8371,
   2010     0x8353, 0x0356, 0x035C, 0x8359, 0x0348, 0x834D, 0x8347, 0x0342,
   2011     0x03C0, 0x83C5, 0x83CF, 0x03CA, 0x83DB, 0x03DE, 0x03D4, 0x83D1,
   2012     0x83F3, 0x03F6, 0x03FC, 0x83F9, 0x03E8, 0x83ED, 0x83E7, 0x03E2,
   2013     0x83A3, 0x03A6, 0x03AC, 0x83A9, 0x03B8, 0x83BD, 0x83B7, 0x03B2,
   2014     0x0390, 0x8395, 0x839F, 0x039A, 0x838B, 0x038E, 0x0384, 0x8381,
   2015     0x0280, 0x8285, 0x828F, 0x028A, 0x829B, 0x029E, 0x0294, 0x8291,
   2016     0x82B3, 0x02B6, 0x02BC, 0x82B9, 0x02A8, 0x82AD, 0x82A7, 0x02A2,
   2017     0x82E3, 0x02E6, 0x02EC, 0x82E9, 0x02F8, 0x82FD, 0x82F7, 0x02F2,
   2018     0x02D0, 0x82D5, 0x82DF, 0x02DA, 0x82CB, 0x02CE, 0x02C4, 0x82C1,
   2019     0x8243, 0x0246, 0x024C, 0x8249, 0x0258, 0x825D, 0x8257, 0x0252,
   2020     0x0270, 0x8275, 0x827F, 0x027A, 0x826B, 0x026E, 0x0264, 0x8261,
   2021     0x0220, 0x8225, 0x822F, 0x022A, 0x823B, 0x023E, 0x0234, 0x8231,
   2022     0x8213, 0x0216, 0x021C, 0x8219, 0x0208, 0x820D, 0x8207, 0x0202
   2023 };
   2024 
   2025 static DRFLAC_INLINE drflac_uint8 drflac_crc8_byte(drflac_uint8 crc, drflac_uint8 data)
   2026 {
   2027     return drflac__crc8_table[crc ^ data];
   2028 }
   2029 
   2030 static DRFLAC_INLINE drflac_uint8 drflac_crc8(drflac_uint8 crc, drflac_uint32 data, drflac_uint32 count)
   2031 {
   2032 #ifdef DR_FLAC_NO_CRC
   2033     (void)crc;
   2034     (void)data;
   2035     (void)count;
   2036     return 0;
   2037 #else
   2038 #if 0
   2039     /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc8(crc, 0, 8);") */
   2040     drflac_uint8 p = 0x07;
   2041     for (int i = count-1; i >= 0; --i) {
   2042         drflac_uint8 bit = (data & (1 << i)) >> i;
   2043         if (crc & 0x80) {
   2044             crc = ((crc << 1) | bit) ^ p;
   2045         } else {
   2046             crc = ((crc << 1) | bit);
   2047         }
   2048     }
   2049     return crc;
   2050 #else
   2051     drflac_uint32 wholeBytes;
   2052     drflac_uint32 leftoverBits;
   2053     drflac_uint64 leftoverDataMask;
   2054 
   2055     static drflac_uint64 leftoverDataMaskTable[8] = {
   2056         0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
   2057     };
   2058 
   2059     DRFLAC_ASSERT(count <= 32);
   2060 
   2061     wholeBytes = count >> 3;
   2062     leftoverBits = count - (wholeBytes*8);
   2063     leftoverDataMask = leftoverDataMaskTable[leftoverBits];
   2064 
   2065     switch (wholeBytes) {
   2066         case 4: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
   2067         case 3: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
   2068         case 2: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
   2069         case 1: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
   2070         case 0: if (leftoverBits > 0) crc = (drflac_uint8)((crc << leftoverBits) ^ drflac__crc8_table[(crc >> (8 - leftoverBits)) ^ (data & leftoverDataMask)]);
   2071     }
   2072     return crc;
   2073 #endif
   2074 #endif
   2075 }
   2076 
   2077 static DRFLAC_INLINE drflac_uint16 drflac_crc16_byte(drflac_uint16 crc, drflac_uint8 data)
   2078 {
   2079     return (crc << 8) ^ drflac__crc16_table[(drflac_uint8)(crc >> 8) ^ data];
   2080 }
   2081 
   2082 static DRFLAC_INLINE drflac_uint16 drflac_crc16_cache(drflac_uint16 crc, drflac_cache_t data)
   2083 {
   2084 #ifdef DRFLAC_64BIT
   2085     crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
   2086     crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
   2087     crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
   2088     crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
   2089 #endif
   2090     crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
   2091     crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
   2092     crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  8) & 0xFF));
   2093     crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  0) & 0xFF));
   2094 
   2095     return crc;
   2096 }
   2097 
   2098 static DRFLAC_INLINE drflac_uint16 drflac_crc16_bytes(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 byteCount)
   2099 {
   2100     switch (byteCount)
   2101     {
   2102 #ifdef DRFLAC_64BIT
   2103     case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
   2104     case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
   2105     case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
   2106     case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
   2107 #endif
   2108     case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
   2109     case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
   2110     case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  8) & 0xFF));
   2111     case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  0) & 0xFF));
   2112     }
   2113 
   2114     return crc;
   2115 }
   2116 
   2117 #if 0
   2118 static DRFLAC_INLINE drflac_uint16 drflac_crc16__32bit(drflac_uint16 crc, drflac_uint32 data, drflac_uint32 count)
   2119 {
   2120 #ifdef DR_FLAC_NO_CRC
   2121     (void)crc;
   2122     (void)data;
   2123     (void)count;
   2124     return 0;
   2125 #else
   2126 #if 0
   2127     /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc16(crc, 0, 16);") */
   2128     drflac_uint16 p = 0x8005;
   2129     for (int i = count-1; i >= 0; --i) {
   2130         drflac_uint16 bit = (data & (1ULL << i)) >> i;
   2131         if (r & 0x8000) {
   2132             r = ((r << 1) | bit) ^ p;
   2133         } else {
   2134             r = ((r << 1) | bit);
   2135         }
   2136     }
   2137 
   2138     return crc;
   2139 #else
   2140     drflac_uint32 wholeBytes;
   2141     drflac_uint32 leftoverBits;
   2142     drflac_uint64 leftoverDataMask;
   2143 
   2144     static drflac_uint64 leftoverDataMaskTable[8] = {
   2145         0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
   2146     };
   2147 
   2148     DRFLAC_ASSERT(count <= 64);
   2149 
   2150     wholeBytes = count >> 3;
   2151     leftoverBits = count & 7;
   2152     leftoverDataMask = leftoverDataMaskTable[leftoverBits];
   2153 
   2154     switch (wholeBytes) {
   2155         default:
   2156         case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
   2157         case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
   2158         case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
   2159         case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
   2160         case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
   2161     }
   2162     return crc;
   2163 #endif
   2164 #endif
   2165 }
   2166 
   2167 static DRFLAC_INLINE drflac_uint16 drflac_crc16__64bit(drflac_uint16 crc, drflac_uint64 data, drflac_uint32 count)
   2168 {
   2169 #ifdef DR_FLAC_NO_CRC
   2170     (void)crc;
   2171     (void)data;
   2172     (void)count;
   2173     return 0;
   2174 #else
   2175     drflac_uint32 wholeBytes;
   2176     drflac_uint32 leftoverBits;
   2177     drflac_uint64 leftoverDataMask;
   2178 
   2179     static drflac_uint64 leftoverDataMaskTable[8] = {
   2180         0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
   2181     };
   2182 
   2183     DRFLAC_ASSERT(count <= 64);
   2184 
   2185     wholeBytes = count >> 3;
   2186     leftoverBits = count & 7;
   2187     leftoverDataMask = leftoverDataMaskTable[leftoverBits];
   2188 
   2189     switch (wholeBytes) {
   2190         default:
   2191         case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 << 32) << leftoverBits)) >> (56 + leftoverBits)));    /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
   2192         case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 << 32) << leftoverBits)) >> (48 + leftoverBits)));
   2193         case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 << 32) << leftoverBits)) >> (40 + leftoverBits)));
   2194         case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF << 32) << leftoverBits)) >> (32 + leftoverBits)));
   2195         case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000      ) << leftoverBits)) >> (24 + leftoverBits)));
   2196         case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000      ) << leftoverBits)) >> (16 + leftoverBits)));
   2197         case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00      ) << leftoverBits)) >> ( 8 + leftoverBits)));
   2198         case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF      ) << leftoverBits)) >> ( 0 + leftoverBits)));
   2199         case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
   2200     }
   2201     return crc;
   2202 #endif
   2203 }
   2204 
   2205 
   2206 static DRFLAC_INLINE drflac_uint16 drflac_crc16(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 count)
   2207 {
   2208 #ifdef DRFLAC_64BIT
   2209     return drflac_crc16__64bit(crc, data, count);
   2210 #else
   2211     return drflac_crc16__32bit(crc, data, count);
   2212 #endif
   2213 }
   2214 #endif
   2215 
   2216 
   2217 #ifdef DRFLAC_64BIT
   2218 #define drflac__be2host__cache_line drflac__be2host_64
   2219 #else
   2220 #define drflac__be2host__cache_line drflac__be2host_32
   2221 #endif
   2222 
   2223 /*
   2224 BIT READING ATTEMPT #2
   2225 
   2226 This uses a 32- or 64-bit bit-shifted cache - as bits are read, the cache is shifted such that the first valid bit is sitting
   2227 on the most significant bit. It uses the notion of an L1 and L2 cache (borrowed from CPU architecture), where the L1 cache
   2228 is a 32- or 64-bit unsigned integer (depending on whether or not a 32- or 64-bit build is being compiled) and the L2 is an
   2229 array of "cache lines", with each cache line being the same size as the L1. The L2 is a buffer of about 4KB and is where data
   2230 from onRead() is read into.
   2231 */
   2232 #define DRFLAC_CACHE_L1_SIZE_BYTES(bs)                      (sizeof((bs)->cache))
   2233 #define DRFLAC_CACHE_L1_SIZE_BITS(bs)                       (sizeof((bs)->cache)*8)
   2234 #define DRFLAC_CACHE_L1_BITS_REMAINING(bs)                  (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (bs)->consumedBits)
   2235 #define DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount)           (~((~(drflac_cache_t)0) >> (_bitCount)))
   2236 #define DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, _bitCount)      (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (_bitCount))
   2237 #define DRFLAC_CACHE_L1_SELECT(bs, _bitCount)               (((bs)->cache) & DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount))
   2238 #define DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, _bitCount)     (DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >>  DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)))
   2239 #define DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, _bitCount)(DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> (DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)) & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1)))
   2240 #define DRFLAC_CACHE_L2_SIZE_BYTES(bs)                      (sizeof((bs)->cacheL2))
   2241 #define DRFLAC_CACHE_L2_LINE_COUNT(bs)                      (DRFLAC_CACHE_L2_SIZE_BYTES(bs) / sizeof((bs)->cacheL2[0]))
   2242 #define DRFLAC_CACHE_L2_LINES_REMAINING(bs)                 (DRFLAC_CACHE_L2_LINE_COUNT(bs) - (bs)->nextL2Line)
   2243 
   2244 
   2245 #ifndef DR_FLAC_NO_CRC
   2246 static DRFLAC_INLINE void drflac__reset_crc16(drflac_bs* bs)
   2247 {
   2248     bs->crc16 = 0;
   2249     bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
   2250 }
   2251 
   2252 static DRFLAC_INLINE void drflac__update_crc16(drflac_bs* bs)
   2253 {
   2254     if (bs->crc16CacheIgnoredBytes == 0) {
   2255         bs->crc16 = drflac_crc16_cache(bs->crc16, bs->crc16Cache);
   2256     } else {
   2257         bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache, DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bs->crc16CacheIgnoredBytes);
   2258         bs->crc16CacheIgnoredBytes = 0;
   2259     }
   2260 }
   2261 
   2262 static DRFLAC_INLINE drflac_uint16 drflac__flush_crc16(drflac_bs* bs)
   2263 {
   2264     /* We should never be flushing in a situation where we are not aligned on a byte boundary. */
   2265     DRFLAC_ASSERT((DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7) == 0);
   2266 
   2267     /*
   2268     The bits that were read from the L1 cache need to be accumulated. The number of bytes needing to be accumulated is determined
   2269     by the number of bits that have been consumed.
   2270     */
   2271     if (DRFLAC_CACHE_L1_BITS_REMAINING(bs) == 0) {
   2272         drflac__update_crc16(bs);
   2273     } else {
   2274         /* We only accumulate the consumed bits. */
   2275         bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache >> DRFLAC_CACHE_L1_BITS_REMAINING(bs), (bs->consumedBits >> 3) - bs->crc16CacheIgnoredBytes);
   2276 
   2277         /*
   2278         The bits that we just accumulated should never be accumulated again. We need to keep track of how many bytes were accumulated
   2279         so we can handle that later.
   2280         */
   2281         bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
   2282     }
   2283 
   2284     return bs->crc16;
   2285 }
   2286 #endif
   2287 
   2288 static DRFLAC_INLINE drflac_bool32 drflac__reload_l1_cache_from_l2(drflac_bs* bs)
   2289 {
   2290     size_t bytesRead;
   2291     size_t alignedL1LineCount;
   2292 
   2293     /* Fast path. Try loading straight from L2. */
   2294     if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
   2295         bs->cache = bs->cacheL2[bs->nextL2Line++];
   2296         return DRFLAC_TRUE;
   2297     }
   2298 
   2299     /*
   2300     If we get here it means we've run out of data in the L2 cache. We'll need to fetch more from the client, if there's
   2301     any left.
   2302     */
   2303     if (bs->unalignedByteCount > 0) {
   2304         return DRFLAC_FALSE;   /* If we have any unaligned bytes it means there's no more aligned bytes left in the client. */
   2305     }
   2306 
   2307     bytesRead = bs->onRead(bs->pUserData, bs->cacheL2, DRFLAC_CACHE_L2_SIZE_BYTES(bs));
   2308 
   2309     bs->nextL2Line = 0;
   2310     if (bytesRead == DRFLAC_CACHE_L2_SIZE_BYTES(bs)) {
   2311         bs->cache = bs->cacheL2[bs->nextL2Line++];
   2312         return DRFLAC_TRUE;
   2313     }
   2314 
   2315 
   2316     /*
   2317     If we get here it means we were unable to retrieve enough data to fill the entire L2 cache. It probably
   2318     means we've just reached the end of the file. We need to move the valid data down to the end of the buffer
   2319     and adjust the index of the next line accordingly. Also keep in mind that the L2 cache must be aligned to
   2320     the size of the L1 so we'll need to seek backwards by any misaligned bytes.
   2321     */
   2322     alignedL1LineCount = bytesRead / DRFLAC_CACHE_L1_SIZE_BYTES(bs);
   2323 
   2324     /* We need to keep track of any unaligned bytes for later use. */
   2325     bs->unalignedByteCount = bytesRead - (alignedL1LineCount * DRFLAC_CACHE_L1_SIZE_BYTES(bs));
   2326     if (bs->unalignedByteCount > 0) {
   2327         bs->unalignedCache = bs->cacheL2[alignedL1LineCount];
   2328     }
   2329 
   2330     if (alignedL1LineCount > 0) {
   2331         size_t offset = DRFLAC_CACHE_L2_LINE_COUNT(bs) - alignedL1LineCount;
   2332         size_t i;
   2333         for (i = alignedL1LineCount; i > 0; --i) {
   2334             bs->cacheL2[i-1 + offset] = bs->cacheL2[i-1];
   2335         }
   2336 
   2337         bs->nextL2Line = (drflac_uint32)offset;
   2338         bs->cache = bs->cacheL2[bs->nextL2Line++];
   2339         return DRFLAC_TRUE;
   2340     } else {
   2341         /* If we get into this branch it means we weren't able to load any L1-aligned data. */
   2342         bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs);
   2343         return DRFLAC_FALSE;
   2344     }
   2345 }
   2346 
   2347 static drflac_bool32 drflac__reload_cache(drflac_bs* bs)
   2348 {
   2349     size_t bytesRead;
   2350 
   2351 #ifndef DR_FLAC_NO_CRC
   2352     drflac__update_crc16(bs);
   2353 #endif
   2354 
   2355     /* Fast path. Try just moving the next value in the L2 cache to the L1 cache. */
   2356     if (drflac__reload_l1_cache_from_l2(bs)) {
   2357         bs->cache = drflac__be2host__cache_line(bs->cache);
   2358         bs->consumedBits = 0;
   2359 #ifndef DR_FLAC_NO_CRC
   2360         bs->crc16Cache = bs->cache;
   2361 #endif
   2362         return DRFLAC_TRUE;
   2363     }
   2364 
   2365     /* Slow path. */
   2366 
   2367     /*
   2368     If we get here it means we have failed to load the L1 cache from the L2. Likely we've just reached the end of the stream and the last
   2369     few bytes did not meet the alignment requirements for the L2 cache. In this case we need to fall back to a slower path and read the
   2370     data from the unaligned cache.
   2371     */
   2372     bytesRead = bs->unalignedByteCount;
   2373     if (bytesRead == 0) {
   2374         bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);   /* <-- The stream has been exhausted, so marked the bits as consumed. */
   2375         return DRFLAC_FALSE;
   2376     }
   2377 
   2378     DRFLAC_ASSERT(bytesRead < DRFLAC_CACHE_L1_SIZE_BYTES(bs));
   2379     bs->consumedBits = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bytesRead) * 8;
   2380 
   2381     bs->cache = drflac__be2host__cache_line(bs->unalignedCache);
   2382     bs->cache &= DRFLAC_CACHE_L1_SELECTION_MASK(DRFLAC_CACHE_L1_BITS_REMAINING(bs));    /* <-- Make sure the consumed bits are always set to zero. Other parts of the library depend on this property. */
   2383     bs->unalignedByteCount = 0;     /* <-- At this point the unaligned bytes have been moved into the cache and we thus have no more unaligned bytes. */
   2384 
   2385 #ifndef DR_FLAC_NO_CRC
   2386     bs->crc16Cache = bs->cache >> bs->consumedBits;
   2387     bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
   2388 #endif
   2389     return DRFLAC_TRUE;
   2390 }
   2391 
   2392 static void drflac__reset_cache(drflac_bs* bs)
   2393 {
   2394     bs->nextL2Line   = DRFLAC_CACHE_L2_LINE_COUNT(bs);  /* <-- This clears the L2 cache. */
   2395     bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);   /* <-- This clears the L1 cache. */
   2396     bs->cache = 0;
   2397     bs->unalignedByteCount = 0;                         /* <-- This clears the trailing unaligned bytes. */
   2398     bs->unalignedCache = 0;
   2399 
   2400 #ifndef DR_FLAC_NO_CRC
   2401     bs->crc16Cache = 0;
   2402     bs->crc16CacheIgnoredBytes = 0;
   2403 #endif
   2404 }
   2405 
   2406 
   2407 static DRFLAC_INLINE drflac_bool32 drflac__read_uint32(drflac_bs* bs, unsigned int bitCount, drflac_uint32* pResultOut)
   2408 {
   2409     DRFLAC_ASSERT(bs != NULL);
   2410     DRFLAC_ASSERT(pResultOut != NULL);
   2411     DRFLAC_ASSERT(bitCount > 0);
   2412     DRFLAC_ASSERT(bitCount <= 32);
   2413 
   2414     if (bs->consumedBits == DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
   2415         if (!drflac__reload_cache(bs)) {
   2416             return DRFLAC_FALSE;
   2417         }
   2418     }
   2419 
   2420     if (bitCount <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
   2421         /*
   2422         If we want to load all 32-bits from a 32-bit cache we need to do it slightly differently because we can't do
   2423         a 32-bit shift on a 32-bit integer. This will never be the case on 64-bit caches, so we can have a slightly
   2424         more optimal solution for this.
   2425         */
   2426 #ifdef DRFLAC_64BIT
   2427         *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
   2428         bs->consumedBits += bitCount;
   2429         bs->cache <<= bitCount;
   2430 #else
   2431         if (bitCount < DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
   2432             *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
   2433             bs->consumedBits += bitCount;
   2434             bs->cache <<= bitCount;
   2435         } else {
   2436             /* Cannot shift by 32-bits, so need to do it differently. */
   2437             *pResultOut = (drflac_uint32)bs->cache;
   2438             bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);
   2439             bs->cache = 0;
   2440         }
   2441 #endif
   2442 
   2443         return DRFLAC_TRUE;
   2444     } else {
   2445         /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
   2446         drflac_uint32 bitCountHi = DRFLAC_CACHE_L1_BITS_REMAINING(bs);
   2447         drflac_uint32 bitCountLo = bitCount - bitCountHi;
   2448         drflac_uint32 resultHi;
   2449 
   2450         DRFLAC_ASSERT(bitCountHi > 0);
   2451         DRFLAC_ASSERT(bitCountHi < 32);
   2452         resultHi = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountHi);
   2453 
   2454         if (!drflac__reload_cache(bs)) {
   2455             return DRFLAC_FALSE;
   2456         }
   2457         if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
   2458             /* This happens when we get to end of stream */
   2459             return DRFLAC_FALSE;
   2460         }
   2461 
   2462         *pResultOut = (resultHi << bitCountLo) | (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountLo);
   2463         bs->consumedBits += bitCountLo;
   2464         bs->cache <<= bitCountLo;
   2465         return DRFLAC_TRUE;
   2466     }
   2467 }
   2468 
   2469 static drflac_bool32 drflac__read_int32(drflac_bs* bs, unsigned int bitCount, drflac_int32* pResult)
   2470 {
   2471     drflac_uint32 result;
   2472 
   2473     DRFLAC_ASSERT(bs != NULL);
   2474     DRFLAC_ASSERT(pResult != NULL);
   2475     DRFLAC_ASSERT(bitCount > 0);
   2476     DRFLAC_ASSERT(bitCount <= 32);
   2477 
   2478     if (!drflac__read_uint32(bs, bitCount, &result)) {
   2479         return DRFLAC_FALSE;
   2480     }
   2481 
   2482     /* Do not attempt to shift by 32 as it's undefined. */
   2483     if (bitCount < 32) {
   2484         drflac_uint32 signbit;
   2485         signbit = ((result >> (bitCount-1)) & 0x01);
   2486         result |= (~signbit + 1) << bitCount;
   2487     }
   2488 
   2489     *pResult = (drflac_int32)result;
   2490     return DRFLAC_TRUE;
   2491 }
   2492 
   2493 #ifdef DRFLAC_64BIT
   2494 static drflac_bool32 drflac__read_uint64(drflac_bs* bs, unsigned int bitCount, drflac_uint64* pResultOut)
   2495 {
   2496     drflac_uint32 resultHi;
   2497     drflac_uint32 resultLo;
   2498 
   2499     DRFLAC_ASSERT(bitCount <= 64);
   2500     DRFLAC_ASSERT(bitCount >  32);
   2501 
   2502     if (!drflac__read_uint32(bs, bitCount - 32, &resultHi)) {
   2503         return DRFLAC_FALSE;
   2504     }
   2505 
   2506     if (!drflac__read_uint32(bs, 32, &resultLo)) {
   2507         return DRFLAC_FALSE;
   2508     }
   2509 
   2510     *pResultOut = (((drflac_uint64)resultHi) << 32) | ((drflac_uint64)resultLo);
   2511     return DRFLAC_TRUE;
   2512 }
   2513 #endif
   2514 
   2515 /* Function below is unused, but leaving it here in case I need to quickly add it again. */
   2516 #if 0
   2517 static drflac_bool32 drflac__read_int64(drflac_bs* bs, unsigned int bitCount, drflac_int64* pResultOut)
   2518 {
   2519     drflac_uint64 result;
   2520     drflac_uint64 signbit;
   2521 
   2522     DRFLAC_ASSERT(bitCount <= 64);
   2523 
   2524     if (!drflac__read_uint64(bs, bitCount, &result)) {
   2525         return DRFLAC_FALSE;
   2526     }
   2527 
   2528     signbit = ((result >> (bitCount-1)) & 0x01);
   2529     result |= (~signbit + 1) << bitCount;
   2530 
   2531     *pResultOut = (drflac_int64)result;
   2532     return DRFLAC_TRUE;
   2533 }
   2534 #endif
   2535 
   2536 static drflac_bool32 drflac__read_uint16(drflac_bs* bs, unsigned int bitCount, drflac_uint16* pResult)
   2537 {
   2538     drflac_uint32 result;
   2539 
   2540     DRFLAC_ASSERT(bs != NULL);
   2541     DRFLAC_ASSERT(pResult != NULL);
   2542     DRFLAC_ASSERT(bitCount > 0);
   2543     DRFLAC_ASSERT(bitCount <= 16);
   2544 
   2545     if (!drflac__read_uint32(bs, bitCount, &result)) {
   2546         return DRFLAC_FALSE;
   2547     }
   2548 
   2549     *pResult = (drflac_uint16)result;
   2550     return DRFLAC_TRUE;
   2551 }
   2552 
   2553 #if 0
   2554 static drflac_bool32 drflac__read_int16(drflac_bs* bs, unsigned int bitCount, drflac_int16* pResult)
   2555 {
   2556     drflac_int32 result;
   2557 
   2558     DRFLAC_ASSERT(bs != NULL);
   2559     DRFLAC_ASSERT(pResult != NULL);
   2560     DRFLAC_ASSERT(bitCount > 0);
   2561     DRFLAC_ASSERT(bitCount <= 16);
   2562 
   2563     if (!drflac__read_int32(bs, bitCount, &result)) {
   2564         return DRFLAC_FALSE;
   2565     }
   2566 
   2567     *pResult = (drflac_int16)result;
   2568     return DRFLAC_TRUE;
   2569 }
   2570 #endif
   2571 
   2572 static drflac_bool32 drflac__read_uint8(drflac_bs* bs, unsigned int bitCount, drflac_uint8* pResult)
   2573 {
   2574     drflac_uint32 result;
   2575 
   2576     DRFLAC_ASSERT(bs != NULL);
   2577     DRFLAC_ASSERT(pResult != NULL);
   2578     DRFLAC_ASSERT(bitCount > 0);
   2579     DRFLAC_ASSERT(bitCount <= 8);
   2580 
   2581     if (!drflac__read_uint32(bs, bitCount, &result)) {
   2582         return DRFLAC_FALSE;
   2583     }
   2584 
   2585     *pResult = (drflac_uint8)result;
   2586     return DRFLAC_TRUE;
   2587 }
   2588 
   2589 static drflac_bool32 drflac__read_int8(drflac_bs* bs, unsigned int bitCount, drflac_int8* pResult)
   2590 {
   2591     drflac_int32 result;
   2592 
   2593     DRFLAC_ASSERT(bs != NULL);
   2594     DRFLAC_ASSERT(pResult != NULL);
   2595     DRFLAC_ASSERT(bitCount > 0);
   2596     DRFLAC_ASSERT(bitCount <= 8);
   2597 
   2598     if (!drflac__read_int32(bs, bitCount, &result)) {
   2599         return DRFLAC_FALSE;
   2600     }
   2601 
   2602     *pResult = (drflac_int8)result;
   2603     return DRFLAC_TRUE;
   2604 }
   2605 
   2606 
   2607 static drflac_bool32 drflac__seek_bits(drflac_bs* bs, size_t bitsToSeek)
   2608 {
   2609     if (bitsToSeek <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
   2610         bs->consumedBits += (drflac_uint32)bitsToSeek;
   2611         bs->cache <<= bitsToSeek;
   2612         return DRFLAC_TRUE;
   2613     } else {
   2614         /* It straddles the cached data. This function isn't called too frequently so I'm favouring simplicity here. */
   2615         bitsToSeek       -= DRFLAC_CACHE_L1_BITS_REMAINING(bs);
   2616         bs->consumedBits += DRFLAC_CACHE_L1_BITS_REMAINING(bs);
   2617         bs->cache         = 0;
   2618 
   2619         /* Simple case. Seek in groups of the same number as bits that fit within a cache line. */
   2620 #ifdef DRFLAC_64BIT
   2621         while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
   2622             drflac_uint64 bin;
   2623             if (!drflac__read_uint64(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
   2624                 return DRFLAC_FALSE;
   2625             }
   2626             bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
   2627         }
   2628 #else
   2629         while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
   2630             drflac_uint32 bin;
   2631             if (!drflac__read_uint32(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
   2632                 return DRFLAC_FALSE;
   2633             }
   2634             bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
   2635         }
   2636 #endif
   2637 
   2638         /* Whole leftover bytes. */
   2639         while (bitsToSeek >= 8) {
   2640             drflac_uint8 bin;
   2641             if (!drflac__read_uint8(bs, 8, &bin)) {
   2642                 return DRFLAC_FALSE;
   2643             }
   2644             bitsToSeek -= 8;
   2645         }
   2646 
   2647         /* Leftover bits. */
   2648         if (bitsToSeek > 0) {
   2649             drflac_uint8 bin;
   2650             if (!drflac__read_uint8(bs, (drflac_uint32)bitsToSeek, &bin)) {
   2651                 return DRFLAC_FALSE;
   2652             }
   2653             bitsToSeek = 0; /* <-- Necessary for the assert below. */
   2654         }
   2655 
   2656         DRFLAC_ASSERT(bitsToSeek == 0);
   2657         return DRFLAC_TRUE;
   2658     }
   2659 }
   2660 
   2661 
   2662 /* This function moves the bit streamer to the first bit after the sync code (bit 15 of the of the frame header). It will also update the CRC-16. */
   2663 static drflac_bool32 drflac__find_and_seek_to_next_sync_code(drflac_bs* bs)
   2664 {
   2665     DRFLAC_ASSERT(bs != NULL);
   2666 
   2667     /*
   2668     The sync code is always aligned to 8 bits. This is convenient for us because it means we can do byte-aligned movements. The first
   2669     thing to do is align to the next byte.
   2670     */
   2671     if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
   2672         return DRFLAC_FALSE;
   2673     }
   2674 
   2675     for (;;) {
   2676         drflac_uint8 hi;
   2677 
   2678 #ifndef DR_FLAC_NO_CRC
   2679         drflac__reset_crc16(bs);
   2680 #endif
   2681 
   2682         if (!drflac__read_uint8(bs, 8, &hi)) {
   2683             return DRFLAC_FALSE;
   2684         }
   2685 
   2686         if (hi == 0xFF) {
   2687             drflac_uint8 lo;
   2688             if (!drflac__read_uint8(bs, 6, &lo)) {
   2689                 return DRFLAC_FALSE;
   2690             }
   2691 
   2692             if (lo == 0x3E) {
   2693                 return DRFLAC_TRUE;
   2694             } else {
   2695                 if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
   2696                     return DRFLAC_FALSE;
   2697                 }
   2698             }
   2699         }
   2700     }
   2701 
   2702     /* Should never get here. */
   2703     /*return DRFLAC_FALSE;*/
   2704 }
   2705 
   2706 
   2707 #if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
   2708 #define DRFLAC_IMPLEMENT_CLZ_LZCNT
   2709 #endif
   2710 #if  defined(_MSC_VER) && _MSC_VER >= 1400 && (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(__clang__)
   2711 #define DRFLAC_IMPLEMENT_CLZ_MSVC
   2712 #endif
   2713 #if  defined(__WATCOMC__) && defined(__386__)
   2714 #define DRFLAC_IMPLEMENT_CLZ_WATCOM
   2715 #endif
   2716 #ifdef __MRC__
   2717 #include <intrinsics.h>
   2718 #define DRFLAC_IMPLEMENT_CLZ_MRC
   2719 #endif
   2720 
   2721 static DRFLAC_INLINE drflac_uint32 drflac__clz_software(drflac_cache_t x)
   2722 {
   2723     drflac_uint32 n;
   2724     static drflac_uint32 clz_table_4[] = {
   2725         0,
   2726         4,
   2727         3, 3,
   2728         2, 2, 2, 2,
   2729         1, 1, 1, 1, 1, 1, 1, 1
   2730     };
   2731 
   2732     if (x == 0) {
   2733         return sizeof(x)*8;
   2734     }
   2735 
   2736     n = clz_table_4[x >> (sizeof(x)*8 - 4)];
   2737     if (n == 0) {
   2738 #ifdef DRFLAC_64BIT
   2739         if ((x & ((drflac_uint64)0xFFFFFFFF << 32)) == 0) { n  = 32; x <<= 32; }
   2740         if ((x & ((drflac_uint64)0xFFFF0000 << 32)) == 0) { n += 16; x <<= 16; }
   2741         if ((x & ((drflac_uint64)0xFF000000 << 32)) == 0) { n += 8;  x <<= 8;  }
   2742         if ((x & ((drflac_uint64)0xF0000000 << 32)) == 0) { n += 4;  x <<= 4;  }
   2743 #else
   2744         if ((x & 0xFFFF0000) == 0) { n  = 16; x <<= 16; }
   2745         if ((x & 0xFF000000) == 0) { n += 8;  x <<= 8;  }
   2746         if ((x & 0xF0000000) == 0) { n += 4;  x <<= 4;  }
   2747 #endif
   2748         n += clz_table_4[x >> (sizeof(x)*8 - 4)];
   2749     }
   2750 
   2751     return n - 1;
   2752 }
   2753 
   2754 #ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
   2755 static DRFLAC_INLINE drflac_bool32 drflac__is_lzcnt_supported(void)
   2756 {
   2757     /* Fast compile time check for ARM. */
   2758 #if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
   2759     return DRFLAC_TRUE;
   2760 #elif defined(__MRC__)
   2761     return DRFLAC_TRUE;
   2762 #else
   2763     /* If the compiler itself does not support the intrinsic then we'll need to return false. */
   2764     #ifdef DRFLAC_HAS_LZCNT_INTRINSIC
   2765         return drflac__gIsLZCNTSupported;
   2766     #else
   2767         return DRFLAC_FALSE;
   2768     #endif
   2769 #endif
   2770 }
   2771 
   2772 static DRFLAC_INLINE drflac_uint32 drflac__clz_lzcnt(drflac_cache_t x)
   2773 {
   2774     /*
   2775     It's critical for competitive decoding performance that this function be highly optimal. With MSVC we can use the __lzcnt64() and __lzcnt() intrinsics
   2776     to achieve good performance, however on GCC and Clang it's a little bit more annoying. The __builtin_clzl() and __builtin_clzll() intrinsics leave
   2777     it undefined as to the return value when `x` is 0. We need this to be well defined as returning 32 or 64, depending on whether or not it's a 32- or
   2778     64-bit build. To work around this we would need to add a conditional to check for the x = 0 case, but this creates unnecessary inefficiency. To work
   2779     around this problem I have written some inline assembly to emit the LZCNT (x86) or CLZ (ARM) instruction directly which removes the need to include
   2780     the conditional. This has worked well in the past, but for some reason Clang's MSVC compatible driver, clang-cl, does not seem to be handling this
   2781     in the same way as the normal Clang driver. It seems that `clang-cl` is just outputting the wrong results sometimes, maybe due to some register
   2782     getting clobbered?
   2783 
   2784     I'm not sure if this is a bug with dr_flac's inlined assembly (most likely), a bug in `clang-cl` or just a misunderstanding on my part with inline
   2785     assembly rules for `clang-cl`. If somebody can identify an error in dr_flac's inlined assembly I'm happy to get that fixed.
   2786 
   2787     Fortunately there is an easy workaround for this. Clang implements MSVC-specific intrinsics for compatibility. It also defines _MSC_VER for extra
   2788     compatibility. We can therefore just check for _MSC_VER and use the MSVC intrinsic which, fortunately for us, Clang supports. It would still be nice
   2789     to know how to fix the inlined assembly for correctness sake, however.
   2790     */
   2791 
   2792 #if defined(_MSC_VER) /*&& !defined(__clang__)*/    /* <-- Intentionally wanting Clang to use the MSVC __lzcnt64/__lzcnt intrinsics due to above ^. */
   2793     #ifdef DRFLAC_64BIT
   2794         return (drflac_uint32)__lzcnt64(x);
   2795     #else
   2796         return (drflac_uint32)__lzcnt(x);
   2797     #endif
   2798 #else
   2799     #if defined(__GNUC__) || defined(__clang__)
   2800         #if defined(DRFLAC_X64)
   2801             {
   2802                 drflac_uint64 r;
   2803                 __asm__ __volatile__ (
   2804                     "lzcnt{ %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
   2805                 );
   2806 
   2807                 return (drflac_uint32)r;
   2808             }
   2809         #elif defined(DRFLAC_X86)
   2810             {
   2811                 drflac_uint32 r;
   2812                 __asm__ __volatile__ (
   2813                     "lzcnt{l %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
   2814                 );
   2815 
   2816                 return r;
   2817             }
   2818         #elif defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5) && !defined(__ARM_ARCH_6M__) && !defined(DRFLAC_64BIT)   /* <-- I haven't tested 64-bit inline assembly, so only enabling this for the 32-bit build for now. */
   2819             {
   2820                 unsigned int r;
   2821                 __asm__ __volatile__ (
   2822                 #if defined(DRFLAC_64BIT)
   2823                     "clz %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(x)   /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
   2824                 #else
   2825                     "clz %[out], %[in]" : [out]"=r"(r) : [in]"r"(x)
   2826                 #endif
   2827                 );
   2828 
   2829                 return r;
   2830             }
   2831         #else
   2832             if (x == 0) {
   2833                 return sizeof(x)*8;
   2834             }
   2835             #ifdef DRFLAC_64BIT
   2836                 return (drflac_uint32)__builtin_clzll((drflac_uint64)x);
   2837             #else
   2838                 return (drflac_uint32)__builtin_clzl((drflac_uint32)x);
   2839             #endif
   2840         #endif
   2841     #else
   2842         /* Unsupported compiler. */
   2843         #error "This compiler does not support the lzcnt intrinsic."
   2844     #endif
   2845 #endif
   2846 }
   2847 #endif
   2848 
   2849 #ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
   2850 #include <intrin.h> /* For BitScanReverse(). */
   2851 
   2852 static DRFLAC_INLINE drflac_uint32 drflac__clz_msvc(drflac_cache_t x)
   2853 {
   2854     drflac_uint32 n;
   2855 
   2856     if (x == 0) {
   2857         return sizeof(x)*8;
   2858     }
   2859 
   2860 #ifdef DRFLAC_64BIT
   2861     _BitScanReverse64((unsigned long*)&n, x);
   2862 #else
   2863     _BitScanReverse((unsigned long*)&n, x);
   2864 #endif
   2865     return sizeof(x)*8 - n - 1;
   2866 }
   2867 #endif
   2868 
   2869 #ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM
   2870 static __inline drflac_uint32 drflac__clz_watcom (drflac_uint32);
   2871 #ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT
   2872 /* Use the LZCNT instruction (only available on some processors since the 2010s). */
   2873 #pragma aux drflac__clz_watcom_lzcnt = \
   2874     "db 0F3h, 0Fh, 0BDh, 0C0h" /* lzcnt eax, eax */ \
   2875     parm [eax] \
   2876     value [eax] \
   2877     modify nomemory;
   2878 #else
   2879 /* Use the 386+-compatible implementation. */
   2880 #pragma aux drflac__clz_watcom = \
   2881     "bsr eax, eax" \
   2882     "xor eax, 31" \
   2883     parm [eax] nomemory \
   2884     value [eax] \
   2885     modify exact [eax] nomemory;
   2886 #endif
   2887 #endif
   2888 
   2889 static DRFLAC_INLINE drflac_uint32 drflac__clz(drflac_cache_t x)
   2890 {
   2891 #ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
   2892     if (drflac__is_lzcnt_supported()) {
   2893         return drflac__clz_lzcnt(x);
   2894     } else
   2895 #endif
   2896     {
   2897 #ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
   2898         return drflac__clz_msvc(x);
   2899 #elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT)
   2900         return drflac__clz_watcom_lzcnt(x);
   2901 #elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM)
   2902         return (x == 0) ? sizeof(x)*8 : drflac__clz_watcom(x);
   2903 #elif defined(__MRC__)
   2904         return __cntlzw(x);
   2905 #else
   2906         return drflac__clz_software(x);
   2907 #endif
   2908     }
   2909 }
   2910 
   2911 
   2912 static DRFLAC_INLINE drflac_bool32 drflac__seek_past_next_set_bit(drflac_bs* bs, unsigned int* pOffsetOut)
   2913 {
   2914     drflac_uint32 zeroCounter = 0;
   2915     drflac_uint32 setBitOffsetPlus1;
   2916 
   2917     while (bs->cache == 0) {
   2918         zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
   2919         if (!drflac__reload_cache(bs)) {
   2920             return DRFLAC_FALSE;
   2921         }
   2922     }
   2923 
   2924     if (bs->cache == 1) {
   2925         /* Not catching this would lead to undefined behaviour: a shift of a 32-bit number by 32 or more is undefined */
   2926         *pOffsetOut = zeroCounter + (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs) - 1;
   2927         if (!drflac__reload_cache(bs)) {
   2928             return DRFLAC_FALSE;
   2929         }
   2930 
   2931         return DRFLAC_TRUE;
   2932     }
   2933 
   2934     setBitOffsetPlus1 = drflac__clz(bs->cache);
   2935     setBitOffsetPlus1 += 1;
   2936 
   2937     if (setBitOffsetPlus1 > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
   2938         /* This happens when we get to end of stream */
   2939         return DRFLAC_FALSE;
   2940     }
   2941 
   2942     bs->consumedBits += setBitOffsetPlus1;
   2943     bs->cache <<= setBitOffsetPlus1;
   2944 
   2945     *pOffsetOut = zeroCounter + setBitOffsetPlus1 - 1;
   2946     return DRFLAC_TRUE;
   2947 }
   2948 
   2949 
   2950 
   2951 static drflac_bool32 drflac__seek_to_byte(drflac_bs* bs, drflac_uint64 offsetFromStart)
   2952 {
   2953     DRFLAC_ASSERT(bs != NULL);
   2954     DRFLAC_ASSERT(offsetFromStart > 0);
   2955 
   2956     /*
   2957     Seeking from the start is not quite as trivial as it sounds because the onSeek callback takes a signed 32-bit integer (which
   2958     is intentional because it simplifies the implementation of the onSeek callbacks), however offsetFromStart is unsigned 64-bit.
   2959     To resolve we just need to do an initial seek from the start, and then a series of offset seeks to make up the remainder.
   2960     */
   2961     if (offsetFromStart > 0x7FFFFFFF) {
   2962         drflac_uint64 bytesRemaining = offsetFromStart;
   2963         if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
   2964             return DRFLAC_FALSE;
   2965         }
   2966         bytesRemaining -= 0x7FFFFFFF;
   2967 
   2968         while (bytesRemaining > 0x7FFFFFFF) {
   2969             if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
   2970                 return DRFLAC_FALSE;
   2971             }
   2972             bytesRemaining -= 0x7FFFFFFF;
   2973         }
   2974 
   2975         if (bytesRemaining > 0) {
   2976             if (!bs->onSeek(bs->pUserData, (int)bytesRemaining, drflac_seek_origin_current)) {
   2977                 return DRFLAC_FALSE;
   2978             }
   2979         }
   2980     } else {
   2981         if (!bs->onSeek(bs->pUserData, (int)offsetFromStart, drflac_seek_origin_start)) {
   2982             return DRFLAC_FALSE;
   2983         }
   2984     }
   2985 
   2986     /* The cache should be reset to force a reload of fresh data from the client. */
   2987     drflac__reset_cache(bs);
   2988     return DRFLAC_TRUE;
   2989 }
   2990 
   2991 
   2992 static drflac_result drflac__read_utf8_coded_number(drflac_bs* bs, drflac_uint64* pNumberOut, drflac_uint8* pCRCOut)
   2993 {
   2994     drflac_uint8 crc;
   2995     drflac_uint64 result;
   2996     drflac_uint8 utf8[7] = {0};
   2997     int byteCount;
   2998     int i;
   2999 
   3000     DRFLAC_ASSERT(bs != NULL);
   3001     DRFLAC_ASSERT(pNumberOut != NULL);
   3002     DRFLAC_ASSERT(pCRCOut != NULL);
   3003 
   3004     crc = *pCRCOut;
   3005 
   3006     if (!drflac__read_uint8(bs, 8, utf8)) {
   3007         *pNumberOut = 0;
   3008         return DRFLAC_AT_END;
   3009     }
   3010     crc = drflac_crc8(crc, utf8[0], 8);
   3011 
   3012     if ((utf8[0] & 0x80) == 0) {
   3013         *pNumberOut = utf8[0];
   3014         *pCRCOut = crc;
   3015         return DRFLAC_SUCCESS;
   3016     }
   3017 
   3018     /*byteCount = 1;*/
   3019     if ((utf8[0] & 0xE0) == 0xC0) {
   3020         byteCount = 2;
   3021     } else if ((utf8[0] & 0xF0) == 0xE0) {
   3022         byteCount = 3;
   3023     } else if ((utf8[0] & 0xF8) == 0xF0) {
   3024         byteCount = 4;
   3025     } else if ((utf8[0] & 0xFC) == 0xF8) {
   3026         byteCount = 5;
   3027     } else if ((utf8[0] & 0xFE) == 0xFC) {
   3028         byteCount = 6;
   3029     } else if ((utf8[0] & 0xFF) == 0xFE) {
   3030         byteCount = 7;
   3031     } else {
   3032         *pNumberOut = 0;
   3033         return DRFLAC_CRC_MISMATCH;     /* Bad UTF-8 encoding. */
   3034     }
   3035 
   3036     /* Read extra bytes. */
   3037     DRFLAC_ASSERT(byteCount > 1);
   3038 
   3039     result = (drflac_uint64)(utf8[0] & (0xFF >> (byteCount + 1)));
   3040     for (i = 1; i < byteCount; ++i) {
   3041         if (!drflac__read_uint8(bs, 8, utf8 + i)) {
   3042             *pNumberOut = 0;
   3043             return DRFLAC_AT_END;
   3044         }
   3045         crc = drflac_crc8(crc, utf8[i], 8);
   3046 
   3047         result = (result << 6) | (utf8[i] & 0x3F);
   3048     }
   3049 
   3050     *pNumberOut = result;
   3051     *pCRCOut = crc;
   3052     return DRFLAC_SUCCESS;
   3053 }
   3054 
   3055 
   3056 static DRFLAC_INLINE drflac_uint32 drflac__ilog2_u32(drflac_uint32 x)
   3057 {
   3058 #if 1   /* Needs optimizing. */
   3059     drflac_uint32 result = 0;
   3060     while (x > 0) {
   3061         result += 1;
   3062         x >>= 1;
   3063     }
   3064 
   3065     return result;
   3066 #endif
   3067 }
   3068 
   3069 static DRFLAC_INLINE drflac_bool32 drflac__use_64_bit_prediction(drflac_uint32 bitsPerSample, drflac_uint32 order, drflac_uint32 precision)
   3070 {
   3071     /* https://web.archive.org/web/20220205005724/https://github.com/ietf-wg-cellar/flac-specification/blob/37a49aa48ba4ba12e8757badfc59c0df35435fec/rfc_backmatter.md */
   3072     return bitsPerSample + precision + drflac__ilog2_u32(order) > 32;
   3073 }
   3074 
   3075 
   3076 /*
   3077 The next two functions are responsible for calculating the prediction.
   3078 
   3079 When the bits per sample is >16 we need to use 64-bit integer arithmetic because otherwise we'll run out of precision. It's
   3080 safe to assume this will be slower on 32-bit platforms so we use a more optimal solution when the bits per sample is <=16.
   3081 */
   3082 #if defined(__clang__)
   3083 __attribute__((no_sanitize("signed-integer-overflow")))
   3084 #endif
   3085 static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_32(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
   3086 {
   3087     drflac_int32 prediction = 0;
   3088 
   3089     DRFLAC_ASSERT(order <= 32);
   3090 
   3091     /* 32-bit version. */
   3092 
   3093     /* VC++ optimizes this to a single jmp. I've not yet verified this for other compilers. */
   3094     switch (order)
   3095     {
   3096     case 32: prediction += coefficients[31] * pDecodedSamples[-32];
   3097     case 31: prediction += coefficients[30] * pDecodedSamples[-31];
   3098     case 30: prediction += coefficients[29] * pDecodedSamples[-30];
   3099     case 29: prediction += coefficients[28] * pDecodedSamples[-29];
   3100     case 28: prediction += coefficients[27] * pDecodedSamples[-28];
   3101     case 27: prediction += coefficients[26] * pDecodedSamples[-27];
   3102     case 26: prediction += coefficients[25] * pDecodedSamples[-26];
   3103     case 25: prediction += coefficients[24] * pDecodedSamples[-25];
   3104     case 24: prediction += coefficients[23] * pDecodedSamples[-24];
   3105     case 23: prediction += coefficients[22] * pDecodedSamples[-23];
   3106     case 22: prediction += coefficients[21] * pDecodedSamples[-22];
   3107     case 21: prediction += coefficients[20] * pDecodedSamples[-21];
   3108     case 20: prediction += coefficients[19] * pDecodedSamples[-20];
   3109     case 19: prediction += coefficients[18] * pDecodedSamples[-19];
   3110     case 18: prediction += coefficients[17] * pDecodedSamples[-18];
   3111     case 17: prediction += coefficients[16] * pDecodedSamples[-17];
   3112     case 16: prediction += coefficients[15] * pDecodedSamples[-16];
   3113     case 15: prediction += coefficients[14] * pDecodedSamples[-15];
   3114     case 14: prediction += coefficients[13] * pDecodedSamples[-14];
   3115     case 13: prediction += coefficients[12] * pDecodedSamples[-13];
   3116     case 12: prediction += coefficients[11] * pDecodedSamples[-12];
   3117     case 11: prediction += coefficients[10] * pDecodedSamples[-11];
   3118     case 10: prediction += coefficients[ 9] * pDecodedSamples[-10];
   3119     case  9: prediction += coefficients[ 8] * pDecodedSamples[- 9];
   3120     case  8: prediction += coefficients[ 7] * pDecodedSamples[- 8];
   3121     case  7: prediction += coefficients[ 6] * pDecodedSamples[- 7];
   3122     case  6: prediction += coefficients[ 5] * pDecodedSamples[- 6];
   3123     case  5: prediction += coefficients[ 4] * pDecodedSamples[- 5];
   3124     case  4: prediction += coefficients[ 3] * pDecodedSamples[- 4];
   3125     case  3: prediction += coefficients[ 2] * pDecodedSamples[- 3];
   3126     case  2: prediction += coefficients[ 1] * pDecodedSamples[- 2];
   3127     case  1: prediction += coefficients[ 0] * pDecodedSamples[- 1];
   3128     }
   3129 
   3130     return (drflac_int32)(prediction >> shift);
   3131 }
   3132 
   3133 static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_64(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
   3134 {
   3135     drflac_int64 prediction;
   3136 
   3137     DRFLAC_ASSERT(order <= 32);
   3138 
   3139     /* 64-bit version. */
   3140 
   3141     /* This method is faster on the 32-bit build when compiling with VC++. See note below. */
   3142 #ifndef DRFLAC_64BIT
   3143     if (order == 8)
   3144     {
   3145         prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
   3146         prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
   3147         prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
   3148         prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
   3149         prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
   3150         prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
   3151         prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
   3152         prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
   3153     }
   3154     else if (order == 7)
   3155     {
   3156         prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
   3157         prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
   3158         prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
   3159         prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
   3160         prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
   3161         prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
   3162         prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
   3163     }
   3164     else if (order == 3)
   3165     {
   3166         prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
   3167         prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
   3168         prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
   3169     }
   3170     else if (order == 6)
   3171     {
   3172         prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
   3173         prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
   3174         prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
   3175         prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
   3176         prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
   3177         prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
   3178     }
   3179     else if (order == 5)
   3180     {
   3181         prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
   3182         prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
   3183         prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
   3184         prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
   3185         prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
   3186     }
   3187     else if (order == 4)
   3188     {
   3189         prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
   3190         prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
   3191         prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
   3192         prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
   3193     }
   3194     else if (order == 12)
   3195     {
   3196         prediction  = coefficients[0]  * (drflac_int64)pDecodedSamples[-1];
   3197         prediction += coefficients[1]  * (drflac_int64)pDecodedSamples[-2];
   3198         prediction += coefficients[2]  * (drflac_int64)pDecodedSamples[-3];
   3199         prediction += coefficients[3]  * (drflac_int64)pDecodedSamples[-4];
   3200         prediction += coefficients[4]  * (drflac_int64)pDecodedSamples[-5];
   3201         prediction += coefficients[5]  * (drflac_int64)pDecodedSamples[-6];
   3202         prediction += coefficients[6]  * (drflac_int64)pDecodedSamples[-7];
   3203         prediction += coefficients[7]  * (drflac_int64)pDecodedSamples[-8];
   3204         prediction += coefficients[8]  * (drflac_int64)pDecodedSamples[-9];
   3205         prediction += coefficients[9]  * (drflac_int64)pDecodedSamples[-10];
   3206         prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
   3207         prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
   3208     }
   3209     else if (order == 2)
   3210     {
   3211         prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
   3212         prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
   3213     }
   3214     else if (order == 1)
   3215     {
   3216         prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
   3217     }
   3218     else if (order == 10)
   3219     {
   3220         prediction  = coefficients[0]  * (drflac_int64)pDecodedSamples[-1];
   3221         prediction += coefficients[1]  * (drflac_int64)pDecodedSamples[-2];
   3222         prediction += coefficients[2]  * (drflac_int64)pDecodedSamples[-3];
   3223         prediction += coefficients[3]  * (drflac_int64)pDecodedSamples[-4];
   3224         prediction += coefficients[4]  * (drflac_int64)pDecodedSamples[-5];
   3225         prediction += coefficients[5]  * (drflac_int64)pDecodedSamples[-6];
   3226         prediction += coefficients[6]  * (drflac_int64)pDecodedSamples[-7];
   3227         prediction += coefficients[7]  * (drflac_int64)pDecodedSamples[-8];
   3228         prediction += coefficients[8]  * (drflac_int64)pDecodedSamples[-9];
   3229         prediction += coefficients[9]  * (drflac_int64)pDecodedSamples[-10];
   3230     }
   3231     else if (order == 9)
   3232     {
   3233         prediction  = coefficients[0]  * (drflac_int64)pDecodedSamples[-1];
   3234         prediction += coefficients[1]  * (drflac_int64)pDecodedSamples[-2];
   3235         prediction += coefficients[2]  * (drflac_int64)pDecodedSamples[-3];
   3236         prediction += coefficients[3]  * (drflac_int64)pDecodedSamples[-4];
   3237         prediction += coefficients[4]  * (drflac_int64)pDecodedSamples[-5];
   3238         prediction += coefficients[5]  * (drflac_int64)pDecodedSamples[-6];
   3239         prediction += coefficients[6]  * (drflac_int64)pDecodedSamples[-7];
   3240         prediction += coefficients[7]  * (drflac_int64)pDecodedSamples[-8];
   3241         prediction += coefficients[8]  * (drflac_int64)pDecodedSamples[-9];
   3242     }
   3243     else if (order == 11)
   3244     {
   3245         prediction  = coefficients[0]  * (drflac_int64)pDecodedSamples[-1];
   3246         prediction += coefficients[1]  * (drflac_int64)pDecodedSamples[-2];
   3247         prediction += coefficients[2]  * (drflac_int64)pDecodedSamples[-3];
   3248         prediction += coefficients[3]  * (drflac_int64)pDecodedSamples[-4];
   3249         prediction += coefficients[4]  * (drflac_int64)pDecodedSamples[-5];
   3250         prediction += coefficients[5]  * (drflac_int64)pDecodedSamples[-6];
   3251         prediction += coefficients[6]  * (drflac_int64)pDecodedSamples[-7];
   3252         prediction += coefficients[7]  * (drflac_int64)pDecodedSamples[-8];
   3253         prediction += coefficients[8]  * (drflac_int64)pDecodedSamples[-9];
   3254         prediction += coefficients[9]  * (drflac_int64)pDecodedSamples[-10];
   3255         prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
   3256     }
   3257     else
   3258     {
   3259         int j;
   3260 
   3261         prediction = 0;
   3262         for (j = 0; j < (int)order; ++j) {
   3263             prediction += coefficients[j] * (drflac_int64)pDecodedSamples[-j-1];
   3264         }
   3265     }
   3266 #endif
   3267 
   3268     /*
   3269     VC++ optimizes this to a single jmp instruction, but only the 64-bit build. The 32-bit build generates less efficient code for some
   3270     reason. The ugly version above is faster so we'll just switch between the two depending on the target platform.
   3271     */
   3272 #ifdef DRFLAC_64BIT
   3273     prediction = 0;
   3274     switch (order)
   3275     {
   3276     case 32: prediction += coefficients[31] * (drflac_int64)pDecodedSamples[-32];
   3277     case 31: prediction += coefficients[30] * (drflac_int64)pDecodedSamples[-31];
   3278     case 30: prediction += coefficients[29] * (drflac_int64)pDecodedSamples[-30];
   3279     case 29: prediction += coefficients[28] * (drflac_int64)pDecodedSamples[-29];
   3280     case 28: prediction += coefficients[27] * (drflac_int64)pDecodedSamples[-28];
   3281     case 27: prediction += coefficients[26] * (drflac_int64)pDecodedSamples[-27];
   3282     case 26: prediction += coefficients[25] * (drflac_int64)pDecodedSamples[-26];
   3283     case 25: prediction += coefficients[24] * (drflac_int64)pDecodedSamples[-25];
   3284     case 24: prediction += coefficients[23] * (drflac_int64)pDecodedSamples[-24];
   3285     case 23: prediction += coefficients[22] * (drflac_int64)pDecodedSamples[-23];
   3286     case 22: prediction += coefficients[21] * (drflac_int64)pDecodedSamples[-22];
   3287     case 21: prediction += coefficients[20] * (drflac_int64)pDecodedSamples[-21];
   3288     case 20: prediction += coefficients[19] * (drflac_int64)pDecodedSamples[-20];
   3289     case 19: prediction += coefficients[18] * (drflac_int64)pDecodedSamples[-19];
   3290     case 18: prediction += coefficients[17] * (drflac_int64)pDecodedSamples[-18];
   3291     case 17: prediction += coefficients[16] * (drflac_int64)pDecodedSamples[-17];
   3292     case 16: prediction += coefficients[15] * (drflac_int64)pDecodedSamples[-16];
   3293     case 15: prediction += coefficients[14] * (drflac_int64)pDecodedSamples[-15];
   3294     case 14: prediction += coefficients[13] * (drflac_int64)pDecodedSamples[-14];
   3295     case 13: prediction += coefficients[12] * (drflac_int64)pDecodedSamples[-13];
   3296     case 12: prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
   3297     case 11: prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
   3298     case 10: prediction += coefficients[ 9] * (drflac_int64)pDecodedSamples[-10];
   3299     case  9: prediction += coefficients[ 8] * (drflac_int64)pDecodedSamples[- 9];
   3300     case  8: prediction += coefficients[ 7] * (drflac_int64)pDecodedSamples[- 8];
   3301     case  7: prediction += coefficients[ 6] * (drflac_int64)pDecodedSamples[- 7];
   3302     case  6: prediction += coefficients[ 5] * (drflac_int64)pDecodedSamples[- 6];
   3303     case  5: prediction += coefficients[ 4] * (drflac_int64)pDecodedSamples[- 5];
   3304     case  4: prediction += coefficients[ 3] * (drflac_int64)pDecodedSamples[- 4];
   3305     case  3: prediction += coefficients[ 2] * (drflac_int64)pDecodedSamples[- 3];
   3306     case  2: prediction += coefficients[ 1] * (drflac_int64)pDecodedSamples[- 2];
   3307     case  1: prediction += coefficients[ 0] * (drflac_int64)pDecodedSamples[- 1];
   3308     }
   3309 #endif
   3310 
   3311     return (drflac_int32)(prediction >> shift);
   3312 }
   3313 
   3314 
   3315 #if 0
   3316 /*
   3317 Reference implementation for reading and decoding samples with residual. This is intentionally left unoptimized for the
   3318 sake of readability and should only be used as a reference.
   3319 */
   3320 static drflac_bool32 drflac__decode_samples_with_residual__rice__reference(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
   3321 {
   3322     drflac_uint32 i;
   3323 
   3324     DRFLAC_ASSERT(bs != NULL);
   3325     DRFLAC_ASSERT(pSamplesOut != NULL);
   3326 
   3327     for (i = 0; i < count; ++i) {
   3328         drflac_uint32 zeroCounter = 0;
   3329         for (;;) {
   3330             drflac_uint8 bit;
   3331             if (!drflac__read_uint8(bs, 1, &bit)) {
   3332                 return DRFLAC_FALSE;
   3333             }
   3334 
   3335             if (bit == 0) {
   3336                 zeroCounter += 1;
   3337             } else {
   3338                 break;
   3339             }
   3340         }
   3341 
   3342         drflac_uint32 decodedRice;
   3343         if (riceParam > 0) {
   3344             if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
   3345                 return DRFLAC_FALSE;
   3346             }
   3347         } else {
   3348             decodedRice = 0;
   3349         }
   3350 
   3351         decodedRice |= (zeroCounter << riceParam);
   3352         if ((decodedRice & 0x01)) {
   3353             decodedRice = ~(decodedRice >> 1);
   3354         } else {
   3355             decodedRice =  (decodedRice >> 1);
   3356         }
   3357 
   3358 
   3359         if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
   3360             pSamplesOut[i] = decodedRice + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
   3361         } else {
   3362             pSamplesOut[i] = decodedRice + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
   3363         }
   3364     }
   3365 
   3366     return DRFLAC_TRUE;
   3367 }
   3368 #endif
   3369 
   3370 #if 0
   3371 static drflac_bool32 drflac__read_rice_parts__reference(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
   3372 {
   3373     drflac_uint32 zeroCounter = 0;
   3374     drflac_uint32 decodedRice;
   3375 
   3376     for (;;) {
   3377         drflac_uint8 bit;
   3378         if (!drflac__read_uint8(bs, 1, &bit)) {
   3379             return DRFLAC_FALSE;
   3380         }
   3381 
   3382         if (bit == 0) {
   3383             zeroCounter += 1;
   3384         } else {
   3385             break;
   3386         }
   3387     }
   3388 
   3389     if (riceParam > 0) {
   3390         if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
   3391             return DRFLAC_FALSE;
   3392         }
   3393     } else {
   3394         decodedRice = 0;
   3395     }
   3396 
   3397     *pZeroCounterOut = zeroCounter;
   3398     *pRiceParamPartOut = decodedRice;
   3399     return DRFLAC_TRUE;
   3400 }
   3401 #endif
   3402 
   3403 #if 0
   3404 static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
   3405 {
   3406     drflac_cache_t riceParamMask;
   3407     drflac_uint32 zeroCounter;
   3408     drflac_uint32 setBitOffsetPlus1;
   3409     drflac_uint32 riceParamPart;
   3410     drflac_uint32 riceLength;
   3411 
   3412     DRFLAC_ASSERT(riceParam > 0);   /* <-- riceParam should never be 0. drflac__read_rice_parts__param_equals_zero() should be used instead for this case. */
   3413 
   3414     riceParamMask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParam);
   3415 
   3416     zeroCounter = 0;
   3417     while (bs->cache == 0) {
   3418         zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
   3419         if (!drflac__reload_cache(bs)) {
   3420             return DRFLAC_FALSE;
   3421         }
   3422     }
   3423 
   3424     setBitOffsetPlus1 = drflac__clz(bs->cache);
   3425     zeroCounter += setBitOffsetPlus1;
   3426     setBitOffsetPlus1 += 1;
   3427 
   3428     riceLength = setBitOffsetPlus1 + riceParam;
   3429     if (riceLength < DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
   3430         riceParamPart = (drflac_uint32)((bs->cache & (riceParamMask >> setBitOffsetPlus1)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceLength));
   3431 
   3432         bs->consumedBits += riceLength;
   3433         bs->cache <<= riceLength;
   3434     } else {
   3435         drflac_uint32 bitCountLo;
   3436         drflac_cache_t resultHi;
   3437 
   3438         bs->consumedBits += riceLength;
   3439         bs->cache <<= setBitOffsetPlus1 & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1);    /* <-- Equivalent to "if (setBitOffsetPlus1 < DRFLAC_CACHE_L1_SIZE_BITS(bs)) { bs->cache <<= setBitOffsetPlus1; }" */
   3440 
   3441         /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
   3442         bitCountLo = bs->consumedBits - DRFLAC_CACHE_L1_SIZE_BITS(bs);
   3443         resultHi = DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, riceParam);  /* <-- Use DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE() if ever this function allows riceParam=0. */
   3444 
   3445         if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
   3446 #ifndef DR_FLAC_NO_CRC
   3447             drflac__update_crc16(bs);
   3448 #endif
   3449             bs->cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
   3450             bs->consumedBits = 0;
   3451 #ifndef DR_FLAC_NO_CRC
   3452             bs->crc16Cache = bs->cache;
   3453 #endif
   3454         } else {
   3455             /* Slow path. We need to fetch more data from the client. */
   3456             if (!drflac__reload_cache(bs)) {
   3457                 return DRFLAC_FALSE;
   3458             }
   3459             if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
   3460                 /* This happens when we get to end of stream */
   3461                 return DRFLAC_FALSE;
   3462             }
   3463         }
   3464 
   3465         riceParamPart = (drflac_uint32)(resultHi | DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, bitCountLo));
   3466 
   3467         bs->consumedBits += bitCountLo;
   3468         bs->cache <<= bitCountLo;
   3469     }
   3470 
   3471     pZeroCounterOut[0] = zeroCounter;
   3472     pRiceParamPartOut[0] = riceParamPart;
   3473 
   3474     return DRFLAC_TRUE;
   3475 }
   3476 #endif
   3477 
   3478 static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts_x1(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
   3479 {
   3480     drflac_uint32  riceParamPlus1 = riceParam + 1;
   3481     /*drflac_cache_t riceParamPlus1Mask  = DRFLAC_CACHE_L1_SELECTION_MASK(riceParamPlus1);*/
   3482     drflac_uint32  riceParamPlus1Shift = DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPlus1);
   3483     drflac_uint32  riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
   3484 
   3485     /*
   3486     The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
   3487     no idea how this will work in practice...
   3488     */
   3489     drflac_cache_t bs_cache = bs->cache;
   3490     drflac_uint32  bs_consumedBits = bs->consumedBits;
   3491 
   3492     /* The first thing to do is find the first unset bit. Most likely a bit will be set in the current cache line. */
   3493     drflac_uint32  lzcount = drflac__clz(bs_cache);
   3494     if (lzcount < sizeof(bs_cache)*8) {
   3495         pZeroCounterOut[0] = lzcount;
   3496 
   3497         /*
   3498         It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
   3499         this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
   3500         outside of this function at a higher level.
   3501         */
   3502     extract_rice_param_part:
   3503         bs_cache       <<= lzcount;
   3504         bs_consumedBits += lzcount;
   3505 
   3506         if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
   3507             /* Getting here means the rice parameter part is wholly contained within the current cache line. */
   3508             pRiceParamPartOut[0] = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
   3509             bs_cache       <<= riceParamPlus1;
   3510             bs_consumedBits += riceParamPlus1;
   3511         } else {
   3512             drflac_uint32 riceParamPartHi;
   3513             drflac_uint32 riceParamPartLo;
   3514             drflac_uint32 riceParamPartLoBitCount;
   3515 
   3516             /*
   3517             Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
   3518             line, reload the cache, and then combine it with the head of the next cache line.
   3519             */
   3520 
   3521             /* Grab the high part of the rice parameter part. */
   3522             riceParamPartHi = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
   3523 
   3524             /* Before reloading the cache we need to grab the size in bits of the low part. */
   3525             riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
   3526             DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
   3527 
   3528             /* Now reload the cache. */
   3529             if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
   3530             #ifndef DR_FLAC_NO_CRC
   3531                 drflac__update_crc16(bs);
   3532             #endif
   3533                 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
   3534                 bs_consumedBits = riceParamPartLoBitCount;
   3535             #ifndef DR_FLAC_NO_CRC
   3536                 bs->crc16Cache = bs_cache;
   3537             #endif
   3538             } else {
   3539                 /* Slow path. We need to fetch more data from the client. */
   3540                 if (!drflac__reload_cache(bs)) {
   3541                     return DRFLAC_FALSE;
   3542                 }
   3543                 if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
   3544                     /* This happens when we get to end of stream */
   3545                     return DRFLAC_FALSE;
   3546                 }
   3547 
   3548                 bs_cache = bs->cache;
   3549                 bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
   3550             }
   3551 
   3552             /* We should now have enough information to construct the rice parameter part. */
   3553             riceParamPartLo = (drflac_uint32)(bs_cache >> (DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPartLoBitCount)));
   3554             pRiceParamPartOut[0] = riceParamPartHi | riceParamPartLo;
   3555 
   3556             bs_cache <<= riceParamPartLoBitCount;
   3557         }
   3558     } else {
   3559         /*
   3560         Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
   3561         to drflac__clz() and we need to reload the cache.
   3562         */
   3563         drflac_uint32 zeroCounter = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BITS(bs) - bs_consumedBits);
   3564         for (;;) {
   3565             if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
   3566             #ifndef DR_FLAC_NO_CRC
   3567                 drflac__update_crc16(bs);
   3568             #endif
   3569                 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
   3570                 bs_consumedBits = 0;
   3571             #ifndef DR_FLAC_NO_CRC
   3572                 bs->crc16Cache = bs_cache;
   3573             #endif
   3574             } else {
   3575                 /* Slow path. We need to fetch more data from the client. */
   3576                 if (!drflac__reload_cache(bs)) {
   3577                     return DRFLAC_FALSE;
   3578                 }
   3579 
   3580                 bs_cache = bs->cache;
   3581                 bs_consumedBits = bs->consumedBits;
   3582             }
   3583 
   3584             lzcount = drflac__clz(bs_cache);
   3585             zeroCounter += lzcount;
   3586 
   3587             if (lzcount < sizeof(bs_cache)*8) {
   3588                 break;
   3589             }
   3590         }
   3591 
   3592         pZeroCounterOut[0] = zeroCounter;
   3593         goto extract_rice_param_part;
   3594     }
   3595 
   3596     /* Make sure the cache is restored at the end of it all. */
   3597     bs->cache = bs_cache;
   3598     bs->consumedBits = bs_consumedBits;
   3599 
   3600     return DRFLAC_TRUE;
   3601 }
   3602 
   3603 static DRFLAC_INLINE drflac_bool32 drflac__seek_rice_parts(drflac_bs* bs, drflac_uint8 riceParam)
   3604 {
   3605     drflac_uint32  riceParamPlus1 = riceParam + 1;
   3606     drflac_uint32  riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
   3607 
   3608     /*
   3609     The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
   3610     no idea how this will work in practice...
   3611     */
   3612     drflac_cache_t bs_cache = bs->cache;
   3613     drflac_uint32  bs_consumedBits = bs->consumedBits;
   3614 
   3615     /* The first thing to do is find the first unset bit. Most likely a bit will be set in the current cache line. */
   3616     drflac_uint32  lzcount = drflac__clz(bs_cache);
   3617     if (lzcount < sizeof(bs_cache)*8) {
   3618         /*
   3619         It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
   3620         this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
   3621         outside of this function at a higher level.
   3622         */
   3623     extract_rice_param_part:
   3624         bs_cache       <<= lzcount;
   3625         bs_consumedBits += lzcount;
   3626 
   3627         if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
   3628             /* Getting here means the rice parameter part is wholly contained within the current cache line. */
   3629             bs_cache       <<= riceParamPlus1;
   3630             bs_consumedBits += riceParamPlus1;
   3631         } else {
   3632             /*
   3633             Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
   3634             line, reload the cache, and then combine it with the head of the next cache line.
   3635             */
   3636 
   3637             /* Before reloading the cache we need to grab the size in bits of the low part. */
   3638             drflac_uint32 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
   3639             DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
   3640 
   3641             /* Now reload the cache. */
   3642             if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
   3643             #ifndef DR_FLAC_NO_CRC
   3644                 drflac__update_crc16(bs);
   3645             #endif
   3646                 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
   3647                 bs_consumedBits = riceParamPartLoBitCount;
   3648             #ifndef DR_FLAC_NO_CRC
   3649                 bs->crc16Cache = bs_cache;
   3650             #endif
   3651             } else {
   3652                 /* Slow path. We need to fetch more data from the client. */
   3653                 if (!drflac__reload_cache(bs)) {
   3654                     return DRFLAC_FALSE;
   3655                 }
   3656 
   3657                 if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
   3658                     /* This happens when we get to end of stream */
   3659                     return DRFLAC_FALSE;
   3660                 }
   3661 
   3662                 bs_cache = bs->cache;
   3663                 bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
   3664             }
   3665 
   3666             bs_cache <<= riceParamPartLoBitCount;
   3667         }
   3668     } else {
   3669         /*
   3670         Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
   3671         to drflac__clz() and we need to reload the cache.
   3672         */
   3673         for (;;) {
   3674             if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
   3675             #ifndef DR_FLAC_NO_CRC
   3676                 drflac__update_crc16(bs);
   3677             #endif
   3678                 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
   3679                 bs_consumedBits = 0;
   3680             #ifndef DR_FLAC_NO_CRC
   3681                 bs->crc16Cache = bs_cache;
   3682             #endif
   3683             } else {
   3684                 /* Slow path. We need to fetch more data from the client. */
   3685                 if (!drflac__reload_cache(bs)) {
   3686                     return DRFLAC_FALSE;
   3687                 }
   3688 
   3689                 bs_cache = bs->cache;
   3690                 bs_consumedBits = bs->consumedBits;
   3691             }
   3692 
   3693             lzcount = drflac__clz(bs_cache);
   3694             if (lzcount < sizeof(bs_cache)*8) {
   3695                 break;
   3696             }
   3697         }
   3698 
   3699         goto extract_rice_param_part;
   3700     }
   3701 
   3702     /* Make sure the cache is restored at the end of it all. */
   3703     bs->cache = bs_cache;
   3704     bs->consumedBits = bs_consumedBits;
   3705 
   3706     return DRFLAC_TRUE;
   3707 }
   3708 
   3709 
   3710 static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar_zeroorder(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
   3711 {
   3712     drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
   3713     drflac_uint32 zeroCountPart0;
   3714     drflac_uint32 riceParamPart0;
   3715     drflac_uint32 riceParamMask;
   3716     drflac_uint32 i;
   3717 
   3718     DRFLAC_ASSERT(bs != NULL);
   3719     DRFLAC_ASSERT(pSamplesOut != NULL);
   3720 
   3721     (void)bitsPerSample;
   3722     (void)order;
   3723     (void)shift;
   3724     (void)coefficients;
   3725 
   3726     riceParamMask  = (drflac_uint32)~((~0UL) << riceParam);
   3727 
   3728     i = 0;
   3729     while (i < count) {
   3730         /* Rice extraction. */
   3731         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
   3732             return DRFLAC_FALSE;
   3733         }
   3734 
   3735         /* Rice reconstruction. */
   3736         riceParamPart0 &= riceParamMask;
   3737         riceParamPart0 |= (zeroCountPart0 << riceParam);
   3738         riceParamPart0  = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
   3739 
   3740         pSamplesOut[i] = riceParamPart0;
   3741 
   3742         i += 1;
   3743     }
   3744 
   3745     return DRFLAC_TRUE;
   3746 }
   3747 
   3748 static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
   3749 {
   3750     drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
   3751     drflac_uint32 zeroCountPart0 = 0;
   3752     drflac_uint32 zeroCountPart1 = 0;
   3753     drflac_uint32 zeroCountPart2 = 0;
   3754     drflac_uint32 zeroCountPart3 = 0;
   3755     drflac_uint32 riceParamPart0 = 0;
   3756     drflac_uint32 riceParamPart1 = 0;
   3757     drflac_uint32 riceParamPart2 = 0;
   3758     drflac_uint32 riceParamPart3 = 0;
   3759     drflac_uint32 riceParamMask;
   3760     const drflac_int32* pSamplesOutEnd;
   3761     drflac_uint32 i;
   3762 
   3763     DRFLAC_ASSERT(bs != NULL);
   3764     DRFLAC_ASSERT(pSamplesOut != NULL);
   3765 
   3766     if (lpcOrder == 0) {
   3767         return drflac__decode_samples_with_residual__rice__scalar_zeroorder(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
   3768     }
   3769 
   3770     riceParamMask  = (drflac_uint32)~((~0UL) << riceParam);
   3771     pSamplesOutEnd = pSamplesOut + (count & ~3);
   3772 
   3773     if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
   3774         while (pSamplesOut < pSamplesOutEnd) {
   3775             /*
   3776             Rice extraction. It's faster to do this one at a time against local variables than it is to use the x4 version
   3777             against an array. Not sure why, but perhaps it's making more efficient use of registers?
   3778             */
   3779             if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
   3780                 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
   3781                 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
   3782                 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
   3783                 return DRFLAC_FALSE;
   3784             }
   3785 
   3786             riceParamPart0 &= riceParamMask;
   3787             riceParamPart1 &= riceParamMask;
   3788             riceParamPart2 &= riceParamMask;
   3789             riceParamPart3 &= riceParamMask;
   3790 
   3791             riceParamPart0 |= (zeroCountPart0 << riceParam);
   3792             riceParamPart1 |= (zeroCountPart1 << riceParam);
   3793             riceParamPart2 |= (zeroCountPart2 << riceParam);
   3794             riceParamPart3 |= (zeroCountPart3 << riceParam);
   3795 
   3796             riceParamPart0  = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
   3797             riceParamPart1  = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
   3798             riceParamPart2  = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
   3799             riceParamPart3  = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
   3800 
   3801             pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
   3802             pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
   3803             pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
   3804             pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);
   3805 
   3806             pSamplesOut += 4;
   3807         }
   3808     } else {
   3809         while (pSamplesOut < pSamplesOutEnd) {
   3810             if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
   3811                 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
   3812                 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
   3813                 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
   3814                 return DRFLAC_FALSE;
   3815             }
   3816 
   3817             riceParamPart0 &= riceParamMask;
   3818             riceParamPart1 &= riceParamMask;
   3819             riceParamPart2 &= riceParamMask;
   3820             riceParamPart3 &= riceParamMask;
   3821 
   3822             riceParamPart0 |= (zeroCountPart0 << riceParam);
   3823             riceParamPart1 |= (zeroCountPart1 << riceParam);
   3824             riceParamPart2 |= (zeroCountPart2 << riceParam);
   3825             riceParamPart3 |= (zeroCountPart3 << riceParam);
   3826 
   3827             riceParamPart0  = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
   3828             riceParamPart1  = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
   3829             riceParamPart2  = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
   3830             riceParamPart3  = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
   3831 
   3832             pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
   3833             pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
   3834             pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
   3835             pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);
   3836 
   3837             pSamplesOut += 4;
   3838         }
   3839     }
   3840 
   3841     i = (count & ~3);
   3842     while (i < count) {
   3843         /* Rice extraction. */
   3844         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
   3845             return DRFLAC_FALSE;
   3846         }
   3847 
   3848         /* Rice reconstruction. */
   3849         riceParamPart0 &= riceParamMask;
   3850         riceParamPart0 |= (zeroCountPart0 << riceParam);
   3851         riceParamPart0  = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
   3852         /*riceParamPart0  = (riceParamPart0 >> 1) ^ (~(riceParamPart0 & 0x01) + 1);*/
   3853 
   3854         /* Sample reconstruction. */
   3855         if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
   3856             pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
   3857         } else {
   3858             pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
   3859         }
   3860 
   3861         i += 1;
   3862         pSamplesOut += 1;
   3863     }
   3864 
   3865     return DRFLAC_TRUE;
   3866 }
   3867 
   3868 #if defined(DRFLAC_SUPPORT_SSE2)
   3869 static DRFLAC_INLINE __m128i drflac__mm_packs_interleaved_epi32(__m128i a, __m128i b)
   3870 {
   3871     __m128i r;
   3872 
   3873     /* Pack. */
   3874     r = _mm_packs_epi32(a, b);
   3875 
   3876     /* a3a2 a1a0 b3b2 b1b0 -> a3a2 b3b2 a1a0 b1b0 */
   3877     r = _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 1, 2, 0));
   3878 
   3879     /* a3a2 b3b2 a1a0 b1b0 -> a3b3 a2b2 a1b1 a0b0 */
   3880     r = _mm_shufflehi_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
   3881     r = _mm_shufflelo_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
   3882 
   3883     return r;
   3884 }
   3885 #endif
   3886 
   3887 #if defined(DRFLAC_SUPPORT_SSE41)
   3888 static DRFLAC_INLINE __m128i drflac__mm_not_si128(__m128i a)
   3889 {
   3890     return _mm_xor_si128(a, _mm_cmpeq_epi32(_mm_setzero_si128(), _mm_setzero_si128()));
   3891 }
   3892 
   3893 static DRFLAC_INLINE __m128i drflac__mm_hadd_epi32(__m128i x)
   3894 {
   3895     __m128i x64 = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
   3896     __m128i x32 = _mm_shufflelo_epi16(x64, _MM_SHUFFLE(1, 0, 3, 2));
   3897     return _mm_add_epi32(x64, x32);
   3898 }
   3899 
   3900 static DRFLAC_INLINE __m128i drflac__mm_hadd_epi64(__m128i x)
   3901 {
   3902     return _mm_add_epi64(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
   3903 }
   3904 
   3905 static DRFLAC_INLINE __m128i drflac__mm_srai_epi64(__m128i x, int count)
   3906 {
   3907     /*
   3908     To simplify this we are assuming count < 32. This restriction allows us to work on a low side and a high side. The low side
   3909     is shifted with zero bits, whereas the right side is shifted with sign bits.
   3910     */
   3911     __m128i lo = _mm_srli_epi64(x, count);
   3912     __m128i hi = _mm_srai_epi32(x, count);
   3913 
   3914     hi = _mm_and_si128(hi, _mm_set_epi32(0xFFFFFFFF, 0, 0xFFFFFFFF, 0));    /* The high part needs to have the low part cleared. */
   3915 
   3916     return _mm_or_si128(lo, hi);
   3917 }
   3918 
   3919 static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
   3920 {
   3921     int i;
   3922     drflac_uint32 riceParamMask;
   3923     drflac_int32* pDecodedSamples    = pSamplesOut;
   3924     drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
   3925     drflac_uint32 zeroCountParts0 = 0;
   3926     drflac_uint32 zeroCountParts1 = 0;
   3927     drflac_uint32 zeroCountParts2 = 0;
   3928     drflac_uint32 zeroCountParts3 = 0;
   3929     drflac_uint32 riceParamParts0 = 0;
   3930     drflac_uint32 riceParamParts1 = 0;
   3931     drflac_uint32 riceParamParts2 = 0;
   3932     drflac_uint32 riceParamParts3 = 0;
   3933     __m128i coefficients128_0;
   3934     __m128i coefficients128_4;
   3935     __m128i coefficients128_8;
   3936     __m128i samples128_0;
   3937     __m128i samples128_4;
   3938     __m128i samples128_8;
   3939     __m128i riceParamMask128;
   3940 
   3941     const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
   3942 
   3943     riceParamMask    = (drflac_uint32)~((~0UL) << riceParam);
   3944     riceParamMask128 = _mm_set1_epi32(riceParamMask);
   3945 
   3946     /* Pre-load. */
   3947     coefficients128_0 = _mm_setzero_si128();
   3948     coefficients128_4 = _mm_setzero_si128();
   3949     coefficients128_8 = _mm_setzero_si128();
   3950 
   3951     samples128_0 = _mm_setzero_si128();
   3952     samples128_4 = _mm_setzero_si128();
   3953     samples128_8 = _mm_setzero_si128();
   3954 
   3955     /*
   3956     Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
   3957     what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
   3958     in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
   3959     so I think there's opportunity for this to be simplified.
   3960     */
   3961 #if 1
   3962     {
   3963         int runningOrder = order;
   3964 
   3965         /* 0 - 3. */
   3966         if (runningOrder >= 4) {
   3967             coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
   3968             samples128_0      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 4));
   3969             runningOrder -= 4;
   3970         } else {
   3971             switch (runningOrder) {
   3972                 case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
   3973                 case 2: coefficients128_0 = _mm_set_epi32(0, 0,               coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0,               0); break;
   3974                 case 1: coefficients128_0 = _mm_set_epi32(0, 0,               0,               coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0,               0,               0); break;
   3975             }
   3976             runningOrder = 0;
   3977         }
   3978 
   3979         /* 4 - 7 */
   3980         if (runningOrder >= 4) {
   3981             coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
   3982             samples128_4      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 8));
   3983             runningOrder -= 4;
   3984         } else {
   3985             switch (runningOrder) {
   3986                 case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
   3987                 case 2: coefficients128_4 = _mm_set_epi32(0, 0,               coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0,               0); break;
   3988                 case 1: coefficients128_4 = _mm_set_epi32(0, 0,               0,               coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0,               0,               0); break;
   3989             }
   3990             runningOrder = 0;
   3991         }
   3992 
   3993         /* 8 - 11 */
   3994         if (runningOrder == 4) {
   3995             coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
   3996             samples128_8      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 12));
   3997             runningOrder -= 4;
   3998         } else {
   3999             switch (runningOrder) {
   4000                 case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
   4001                 case 2: coefficients128_8 = _mm_set_epi32(0, 0,                coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0,                0); break;
   4002                 case 1: coefficients128_8 = _mm_set_epi32(0, 0,                0,               coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0,                0,                0); break;
   4003             }
   4004             runningOrder = 0;
   4005         }
   4006 
   4007         /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
   4008         coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
   4009         coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
   4010         coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
   4011     }
   4012 #else
   4013     /* This causes strict-aliasing warnings with GCC. */
   4014     switch (order)
   4015     {
   4016     case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
   4017     case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
   4018     case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
   4019     case 9:  ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
   4020     case 8:  ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
   4021     case 7:  ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
   4022     case 6:  ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
   4023     case 5:  ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
   4024     case 4:  ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
   4025     case 3:  ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
   4026     case 2:  ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
   4027     case 1:  ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
   4028     }
   4029 #endif
   4030 
   4031     /* For this version we are doing one sample at a time. */
   4032     while (pDecodedSamples < pDecodedSamplesEnd) {
   4033         __m128i prediction128;
   4034         __m128i zeroCountPart128;
   4035         __m128i riceParamPart128;
   4036 
   4037         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
   4038             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
   4039             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
   4040             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
   4041             return DRFLAC_FALSE;
   4042         }
   4043 
   4044         zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
   4045         riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
   4046 
   4047         riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
   4048         riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
   4049         riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01))), _mm_set1_epi32(0x01)));  /* <-- SSE2 compatible */
   4050         /*riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_mullo_epi32(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01)), _mm_set1_epi32(0xFFFFFFFF)));*/   /* <-- Only supported from SSE4.1 and is slower in my testing... */
   4051 
   4052         if (order <= 4) {
   4053             for (i = 0; i < 4; i += 1) {
   4054                 prediction128 = _mm_mullo_epi32(coefficients128_0, samples128_0);
   4055 
   4056                 /* Horizontal add and shift. */
   4057                 prediction128 = drflac__mm_hadd_epi32(prediction128);
   4058                 prediction128 = _mm_srai_epi32(prediction128, shift);
   4059                 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
   4060 
   4061                 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
   4062                 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
   4063             }
   4064         } else if (order <= 8) {
   4065             for (i = 0; i < 4; i += 1) {
   4066                 prediction128 =                              _mm_mullo_epi32(coefficients128_4, samples128_4);
   4067                 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
   4068 
   4069                 /* Horizontal add and shift. */
   4070                 prediction128 = drflac__mm_hadd_epi32(prediction128);
   4071                 prediction128 = _mm_srai_epi32(prediction128, shift);
   4072                 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
   4073 
   4074                 samples128_4 = _mm_alignr_epi8(samples128_0,  samples128_4, 4);
   4075                 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
   4076                 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
   4077             }
   4078         } else {
   4079             for (i = 0; i < 4; i += 1) {
   4080                 prediction128 =                              _mm_mullo_epi32(coefficients128_8, samples128_8);
   4081                 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_4, samples128_4));
   4082                 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
   4083 
   4084                 /* Horizontal add and shift. */
   4085                 prediction128 = drflac__mm_hadd_epi32(prediction128);
   4086                 prediction128 = _mm_srai_epi32(prediction128, shift);
   4087                 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
   4088 
   4089                 samples128_8 = _mm_alignr_epi8(samples128_4,  samples128_8, 4);
   4090                 samples128_4 = _mm_alignr_epi8(samples128_0,  samples128_4, 4);
   4091                 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
   4092                 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
   4093             }
   4094         }
   4095 
   4096         /* We store samples in groups of 4. */
   4097         _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
   4098         pDecodedSamples += 4;
   4099     }
   4100 
   4101     /* Make sure we process the last few samples. */
   4102     i = (count & ~3);
   4103     while (i < (int)count) {
   4104         /* Rice extraction. */
   4105         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
   4106             return DRFLAC_FALSE;
   4107         }
   4108 
   4109         /* Rice reconstruction. */
   4110         riceParamParts0 &= riceParamMask;
   4111         riceParamParts0 |= (zeroCountParts0 << riceParam);
   4112         riceParamParts0  = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
   4113 
   4114         /* Sample reconstruction. */
   4115         pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
   4116 
   4117         i += 1;
   4118         pDecodedSamples += 1;
   4119     }
   4120 
   4121     return DRFLAC_TRUE;
   4122 }
   4123 
   4124 static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
   4125 {
   4126     int i;
   4127     drflac_uint32 riceParamMask;
   4128     drflac_int32* pDecodedSamples    = pSamplesOut;
   4129     drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
   4130     drflac_uint32 zeroCountParts0 = 0;
   4131     drflac_uint32 zeroCountParts1 = 0;
   4132     drflac_uint32 zeroCountParts2 = 0;
   4133     drflac_uint32 zeroCountParts3 = 0;
   4134     drflac_uint32 riceParamParts0 = 0;
   4135     drflac_uint32 riceParamParts1 = 0;
   4136     drflac_uint32 riceParamParts2 = 0;
   4137     drflac_uint32 riceParamParts3 = 0;
   4138     __m128i coefficients128_0;
   4139     __m128i coefficients128_4;
   4140     __m128i coefficients128_8;
   4141     __m128i samples128_0;
   4142     __m128i samples128_4;
   4143     __m128i samples128_8;
   4144     __m128i prediction128;
   4145     __m128i riceParamMask128;
   4146 
   4147     const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
   4148 
   4149     DRFLAC_ASSERT(order <= 12);
   4150 
   4151     riceParamMask    = (drflac_uint32)~((~0UL) << riceParam);
   4152     riceParamMask128 = _mm_set1_epi32(riceParamMask);
   4153 
   4154     prediction128 = _mm_setzero_si128();
   4155 
   4156     /* Pre-load. */
   4157     coefficients128_0  = _mm_setzero_si128();
   4158     coefficients128_4  = _mm_setzero_si128();
   4159     coefficients128_8  = _mm_setzero_si128();
   4160 
   4161     samples128_0  = _mm_setzero_si128();
   4162     samples128_4  = _mm_setzero_si128();
   4163     samples128_8  = _mm_setzero_si128();
   4164 
   4165 #if 1
   4166     {
   4167         int runningOrder = order;
   4168 
   4169         /* 0 - 3. */
   4170         if (runningOrder >= 4) {
   4171             coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
   4172             samples128_0      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 4));
   4173             runningOrder -= 4;
   4174         } else {
   4175             switch (runningOrder) {
   4176                 case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
   4177                 case 2: coefficients128_0 = _mm_set_epi32(0, 0,               coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0,               0); break;
   4178                 case 1: coefficients128_0 = _mm_set_epi32(0, 0,               0,               coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0,               0,               0); break;
   4179             }
   4180             runningOrder = 0;
   4181         }
   4182 
   4183         /* 4 - 7 */
   4184         if (runningOrder >= 4) {
   4185             coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
   4186             samples128_4      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 8));
   4187             runningOrder -= 4;
   4188         } else {
   4189             switch (runningOrder) {
   4190                 case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
   4191                 case 2: coefficients128_4 = _mm_set_epi32(0, 0,               coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0,               0); break;
   4192                 case 1: coefficients128_4 = _mm_set_epi32(0, 0,               0,               coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0,               0,               0); break;
   4193             }
   4194             runningOrder = 0;
   4195         }
   4196 
   4197         /* 8 - 11 */
   4198         if (runningOrder == 4) {
   4199             coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
   4200             samples128_8      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 12));
   4201             runningOrder -= 4;
   4202         } else {
   4203             switch (runningOrder) {
   4204                 case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
   4205                 case 2: coefficients128_8 = _mm_set_epi32(0, 0,                coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0,                0); break;
   4206                 case 1: coefficients128_8 = _mm_set_epi32(0, 0,                0,               coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0,                0,                0); break;
   4207             }
   4208             runningOrder = 0;
   4209         }
   4210 
   4211         /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
   4212         coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
   4213         coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
   4214         coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
   4215     }
   4216 #else
   4217     switch (order)
   4218     {
   4219     case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
   4220     case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
   4221     case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
   4222     case 9:  ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
   4223     case 8:  ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
   4224     case 7:  ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
   4225     case 6:  ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
   4226     case 5:  ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
   4227     case 4:  ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
   4228     case 3:  ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
   4229     case 2:  ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
   4230     case 1:  ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
   4231     }
   4232 #endif
   4233 
   4234     /* For this version we are doing one sample at a time. */
   4235     while (pDecodedSamples < pDecodedSamplesEnd) {
   4236         __m128i zeroCountPart128;
   4237         __m128i riceParamPart128;
   4238 
   4239         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
   4240             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
   4241             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
   4242             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
   4243             return DRFLAC_FALSE;
   4244         }
   4245 
   4246         zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
   4247         riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
   4248 
   4249         riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
   4250         riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
   4251         riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(1))), _mm_set1_epi32(1)));
   4252 
   4253         for (i = 0; i < 4; i += 1) {
   4254             prediction128 = _mm_xor_si128(prediction128, prediction128);    /* Reset to 0. */
   4255 
   4256             switch (order)
   4257             {
   4258             case 12:
   4259             case 11: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(1, 1, 0, 0))));
   4260             case 10:
   4261             case  9: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(3, 3, 2, 2))));
   4262             case  8:
   4263             case  7: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(1, 1, 0, 0))));
   4264             case  6:
   4265             case  5: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(3, 3, 2, 2))));
   4266             case  4:
   4267             case  3: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(1, 1, 0, 0))));
   4268             case  2:
   4269             case  1: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(3, 3, 2, 2))));
   4270             }
   4271 
   4272             /* Horizontal add and shift. */
   4273             prediction128 = drflac__mm_hadd_epi64(prediction128);
   4274             prediction128 = drflac__mm_srai_epi64(prediction128, shift);
   4275             prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
   4276 
   4277             /* Our value should be sitting in prediction128[0]. We need to combine this with our SSE samples. */
   4278             samples128_8 = _mm_alignr_epi8(samples128_4,  samples128_8, 4);
   4279             samples128_4 = _mm_alignr_epi8(samples128_0,  samples128_4, 4);
   4280             samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
   4281 
   4282             /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
   4283             riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
   4284         }
   4285 
   4286         /* We store samples in groups of 4. */
   4287         _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
   4288         pDecodedSamples += 4;
   4289     }
   4290 
   4291     /* Make sure we process the last few samples. */
   4292     i = (count & ~3);
   4293     while (i < (int)count) {
   4294         /* Rice extraction. */
   4295         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
   4296             return DRFLAC_FALSE;
   4297         }
   4298 
   4299         /* Rice reconstruction. */
   4300         riceParamParts0 &= riceParamMask;
   4301         riceParamParts0 |= (zeroCountParts0 << riceParam);
   4302         riceParamParts0  = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
   4303 
   4304         /* Sample reconstruction. */
   4305         pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
   4306 
   4307         i += 1;
   4308         pDecodedSamples += 1;
   4309     }
   4310 
   4311     return DRFLAC_TRUE;
   4312 }
   4313 
   4314 static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
   4315 {
   4316     DRFLAC_ASSERT(bs != NULL);
   4317     DRFLAC_ASSERT(pSamplesOut != NULL);
   4318 
   4319     /* In my testing the order is rarely > 12, so in this case I'm going to simplify the SSE implementation by only handling order <= 12. */
   4320     if (lpcOrder > 0 && lpcOrder <= 12) {
   4321         if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
   4322             return drflac__decode_samples_with_residual__rice__sse41_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
   4323         } else {
   4324             return drflac__decode_samples_with_residual__rice__sse41_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
   4325         }
   4326     } else {
   4327         return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
   4328     }
   4329 }
   4330 #endif
   4331 
   4332 #if defined(DRFLAC_SUPPORT_NEON)
   4333 static DRFLAC_INLINE void drflac__vst2q_s32(drflac_int32* p, int32x4x2_t x)
   4334 {
   4335     vst1q_s32(p+0, x.val[0]);
   4336     vst1q_s32(p+4, x.val[1]);
   4337 }
   4338 
   4339 static DRFLAC_INLINE void drflac__vst2q_u32(drflac_uint32* p, uint32x4x2_t x)
   4340 {
   4341     vst1q_u32(p+0, x.val[0]);
   4342     vst1q_u32(p+4, x.val[1]);
   4343 }
   4344 
   4345 static DRFLAC_INLINE void drflac__vst2q_f32(float* p, float32x4x2_t x)
   4346 {
   4347     vst1q_f32(p+0, x.val[0]);
   4348     vst1q_f32(p+4, x.val[1]);
   4349 }
   4350 
   4351 static DRFLAC_INLINE void drflac__vst2q_s16(drflac_int16* p, int16x4x2_t x)
   4352 {
   4353     vst1q_s16(p, vcombine_s16(x.val[0], x.val[1]));
   4354 }
   4355 
   4356 static DRFLAC_INLINE void drflac__vst2q_u16(drflac_uint16* p, uint16x4x2_t x)
   4357 {
   4358     vst1q_u16(p, vcombine_u16(x.val[0], x.val[1]));
   4359 }
   4360 
   4361 static DRFLAC_INLINE int32x4_t drflac__vdupq_n_s32x4(drflac_int32 x3, drflac_int32 x2, drflac_int32 x1, drflac_int32 x0)
   4362 {
   4363     drflac_int32 x[4];
   4364     x[3] = x3;
   4365     x[2] = x2;
   4366     x[1] = x1;
   4367     x[0] = x0;
   4368     return vld1q_s32(x);
   4369 }
   4370 
   4371 static DRFLAC_INLINE int32x4_t drflac__valignrq_s32_1(int32x4_t a, int32x4_t b)
   4372 {
   4373     /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
   4374 
   4375     /* Reference */
   4376     /*return drflac__vdupq_n_s32x4(
   4377         vgetq_lane_s32(a, 0),
   4378         vgetq_lane_s32(b, 3),
   4379         vgetq_lane_s32(b, 2),
   4380         vgetq_lane_s32(b, 1)
   4381     );*/
   4382 
   4383     return vextq_s32(b, a, 1);
   4384 }
   4385 
   4386 static DRFLAC_INLINE uint32x4_t drflac__valignrq_u32_1(uint32x4_t a, uint32x4_t b)
   4387 {
   4388     /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
   4389 
   4390     /* Reference */
   4391     /*return drflac__vdupq_n_s32x4(
   4392         vgetq_lane_s32(a, 0),
   4393         vgetq_lane_s32(b, 3),
   4394         vgetq_lane_s32(b, 2),
   4395         vgetq_lane_s32(b, 1)
   4396     );*/
   4397 
   4398     return vextq_u32(b, a, 1);
   4399 }
   4400 
   4401 static DRFLAC_INLINE int32x2_t drflac__vhaddq_s32(int32x4_t x)
   4402 {
   4403     /* The sum must end up in position 0. */
   4404 
   4405     /* Reference */
   4406     /*return vdupq_n_s32(
   4407         vgetq_lane_s32(x, 3) +
   4408         vgetq_lane_s32(x, 2) +
   4409         vgetq_lane_s32(x, 1) +
   4410         vgetq_lane_s32(x, 0)
   4411     );*/
   4412 
   4413     int32x2_t r = vadd_s32(vget_high_s32(x), vget_low_s32(x));
   4414     return vpadd_s32(r, r);
   4415 }
   4416 
   4417 static DRFLAC_INLINE int64x1_t drflac__vhaddq_s64(int64x2_t x)
   4418 {
   4419     return vadd_s64(vget_high_s64(x), vget_low_s64(x));
   4420 }
   4421 
   4422 static DRFLAC_INLINE int32x4_t drflac__vrevq_s32(int32x4_t x)
   4423 {
   4424     /* Reference */
   4425     /*return drflac__vdupq_n_s32x4(
   4426         vgetq_lane_s32(x, 0),
   4427         vgetq_lane_s32(x, 1),
   4428         vgetq_lane_s32(x, 2),
   4429         vgetq_lane_s32(x, 3)
   4430     );*/
   4431 
   4432     return vrev64q_s32(vcombine_s32(vget_high_s32(x), vget_low_s32(x)));
   4433 }
   4434 
   4435 static DRFLAC_INLINE int32x4_t drflac__vnotq_s32(int32x4_t x)
   4436 {
   4437     return veorq_s32(x, vdupq_n_s32(0xFFFFFFFF));
   4438 }
   4439 
   4440 static DRFLAC_INLINE uint32x4_t drflac__vnotq_u32(uint32x4_t x)
   4441 {
   4442     return veorq_u32(x, vdupq_n_u32(0xFFFFFFFF));
   4443 }
   4444 
   4445 static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
   4446 {
   4447     int i;
   4448     drflac_uint32 riceParamMask;
   4449     drflac_int32* pDecodedSamples    = pSamplesOut;
   4450     drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
   4451     drflac_uint32 zeroCountParts[4];
   4452     drflac_uint32 riceParamParts[4];
   4453     int32x4_t coefficients128_0;
   4454     int32x4_t coefficients128_4;
   4455     int32x4_t coefficients128_8;
   4456     int32x4_t samples128_0;
   4457     int32x4_t samples128_4;
   4458     int32x4_t samples128_8;
   4459     uint32x4_t riceParamMask128;
   4460     int32x4_t riceParam128;
   4461     int32x2_t shift64;
   4462     uint32x4_t one128;
   4463 
   4464     const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
   4465 
   4466     riceParamMask    = (drflac_uint32)~((~0UL) << riceParam);
   4467     riceParamMask128 = vdupq_n_u32(riceParamMask);
   4468 
   4469     riceParam128 = vdupq_n_s32(riceParam);
   4470     shift64 = vdup_n_s32(-shift); /* Negate the shift because we'll be doing a variable shift using vshlq_s32(). */
   4471     one128 = vdupq_n_u32(1);
   4472 
   4473     /*
   4474     Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
   4475     what's available in the input buffers. It would be conenient to use a fall-through switch to do this, but this results
   4476     in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
   4477     so I think there's opportunity for this to be simplified.
   4478     */
   4479     {
   4480         int runningOrder = order;
   4481         drflac_int32 tempC[4] = {0, 0, 0, 0};
   4482         drflac_int32 tempS[4] = {0, 0, 0, 0};
   4483 
   4484         /* 0 - 3. */
   4485         if (runningOrder >= 4) {
   4486             coefficients128_0 = vld1q_s32(coefficients + 0);
   4487             samples128_0      = vld1q_s32(pSamplesOut  - 4);
   4488             runningOrder -= 4;
   4489         } else {
   4490             switch (runningOrder) {
   4491                 case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
   4492                 case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
   4493                 case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
   4494             }
   4495 
   4496             coefficients128_0 = vld1q_s32(tempC);
   4497             samples128_0      = vld1q_s32(tempS);
   4498             runningOrder = 0;
   4499         }
   4500 
   4501         /* 4 - 7 */
   4502         if (runningOrder >= 4) {
   4503             coefficients128_4 = vld1q_s32(coefficients + 4);
   4504             samples128_4      = vld1q_s32(pSamplesOut  - 8);
   4505             runningOrder -= 4;
   4506         } else {
   4507             switch (runningOrder) {
   4508                 case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
   4509                 case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
   4510                 case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
   4511             }
   4512 
   4513             coefficients128_4 = vld1q_s32(tempC);
   4514             samples128_4      = vld1q_s32(tempS);
   4515             runningOrder = 0;
   4516         }
   4517 
   4518         /* 8 - 11 */
   4519         if (runningOrder == 4) {
   4520             coefficients128_8 = vld1q_s32(coefficients + 8);
   4521             samples128_8      = vld1q_s32(pSamplesOut  - 12);
   4522             runningOrder -= 4;
   4523         } else {
   4524             switch (runningOrder) {
   4525                 case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
   4526                 case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
   4527                 case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
   4528             }
   4529 
   4530             coefficients128_8 = vld1q_s32(tempC);
   4531             samples128_8      = vld1q_s32(tempS);
   4532             runningOrder = 0;
   4533         }
   4534 
   4535         /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
   4536         coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
   4537         coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
   4538         coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
   4539     }
   4540 
   4541     /* For this version we are doing one sample at a time. */
   4542     while (pDecodedSamples < pDecodedSamplesEnd) {
   4543         int32x4_t prediction128;
   4544         int32x2_t prediction64;
   4545         uint32x4_t zeroCountPart128;
   4546         uint32x4_t riceParamPart128;
   4547 
   4548         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
   4549             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
   4550             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
   4551             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
   4552             return DRFLAC_FALSE;
   4553         }
   4554 
   4555         zeroCountPart128 = vld1q_u32(zeroCountParts);
   4556         riceParamPart128 = vld1q_u32(riceParamParts);
   4557 
   4558         riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
   4559         riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
   4560         riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
   4561 
   4562         if (order <= 4) {
   4563             for (i = 0; i < 4; i += 1) {
   4564                 prediction128 = vmulq_s32(coefficients128_0, samples128_0);
   4565 
   4566                 /* Horizontal add and shift. */
   4567                 prediction64 = drflac__vhaddq_s32(prediction128);
   4568                 prediction64 = vshl_s32(prediction64, shift64);
   4569                 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
   4570 
   4571                 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
   4572                 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
   4573             }
   4574         } else if (order <= 8) {
   4575             for (i = 0; i < 4; i += 1) {
   4576                 prediction128 =                vmulq_s32(coefficients128_4, samples128_4);
   4577                 prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
   4578 
   4579                 /* Horizontal add and shift. */
   4580                 prediction64 = drflac__vhaddq_s32(prediction128);
   4581                 prediction64 = vshl_s32(prediction64, shift64);
   4582                 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
   4583 
   4584                 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
   4585                 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
   4586                 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
   4587             }
   4588         } else {
   4589             for (i = 0; i < 4; i += 1) {
   4590                 prediction128 =                vmulq_s32(coefficients128_8, samples128_8);
   4591                 prediction128 = vmlaq_s32(prediction128, coefficients128_4, samples128_4);
   4592                 prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
   4593 
   4594                 /* Horizontal add and shift. */
   4595                 prediction64 = drflac__vhaddq_s32(prediction128);
   4596                 prediction64 = vshl_s32(prediction64, shift64);
   4597                 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
   4598 
   4599                 samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
   4600                 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
   4601                 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
   4602                 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
   4603             }
   4604         }
   4605 
   4606         /* We store samples in groups of 4. */
   4607         vst1q_s32(pDecodedSamples, samples128_0);
   4608         pDecodedSamples += 4;
   4609     }
   4610 
   4611     /* Make sure we process the last few samples. */
   4612     i = (count & ~3);
   4613     while (i < (int)count) {
   4614         /* Rice extraction. */
   4615         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
   4616             return DRFLAC_FALSE;
   4617         }
   4618 
   4619         /* Rice reconstruction. */
   4620         riceParamParts[0] &= riceParamMask;
   4621         riceParamParts[0] |= (zeroCountParts[0] << riceParam);
   4622         riceParamParts[0]  = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
   4623 
   4624         /* Sample reconstruction. */
   4625         pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
   4626 
   4627         i += 1;
   4628         pDecodedSamples += 1;
   4629     }
   4630 
   4631     return DRFLAC_TRUE;
   4632 }
   4633 
   4634 static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
   4635 {
   4636     int i;
   4637     drflac_uint32 riceParamMask;
   4638     drflac_int32* pDecodedSamples    = pSamplesOut;
   4639     drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
   4640     drflac_uint32 zeroCountParts[4];
   4641     drflac_uint32 riceParamParts[4];
   4642     int32x4_t coefficients128_0;
   4643     int32x4_t coefficients128_4;
   4644     int32x4_t coefficients128_8;
   4645     int32x4_t samples128_0;
   4646     int32x4_t samples128_4;
   4647     int32x4_t samples128_8;
   4648     uint32x4_t riceParamMask128;
   4649     int32x4_t riceParam128;
   4650     int64x1_t shift64;
   4651     uint32x4_t one128;
   4652     int64x2_t prediction128 = { 0 };
   4653     uint32x4_t zeroCountPart128;
   4654     uint32x4_t riceParamPart128;
   4655 
   4656     const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
   4657 
   4658     riceParamMask    = (drflac_uint32)~((~0UL) << riceParam);
   4659     riceParamMask128 = vdupq_n_u32(riceParamMask);
   4660 
   4661     riceParam128 = vdupq_n_s32(riceParam);
   4662     shift64 = vdup_n_s64(-shift); /* Negate the shift because we'll be doing a variable shift using vshlq_s32(). */
   4663     one128 = vdupq_n_u32(1);
   4664 
   4665     /*
   4666     Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
   4667     what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
   4668     in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
   4669     so I think there's opportunity for this to be simplified.
   4670     */
   4671     {
   4672         int runningOrder = order;
   4673         drflac_int32 tempC[4] = {0, 0, 0, 0};
   4674         drflac_int32 tempS[4] = {0, 0, 0, 0};
   4675 
   4676         /* 0 - 3. */
   4677         if (runningOrder >= 4) {
   4678             coefficients128_0 = vld1q_s32(coefficients + 0);
   4679             samples128_0      = vld1q_s32(pSamplesOut  - 4);
   4680             runningOrder -= 4;
   4681         } else {
   4682             switch (runningOrder) {
   4683                 case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
   4684                 case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
   4685                 case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
   4686             }
   4687 
   4688             coefficients128_0 = vld1q_s32(tempC);
   4689             samples128_0      = vld1q_s32(tempS);
   4690             runningOrder = 0;
   4691         }
   4692 
   4693         /* 4 - 7 */
   4694         if (runningOrder >= 4) {
   4695             coefficients128_4 = vld1q_s32(coefficients + 4);
   4696             samples128_4      = vld1q_s32(pSamplesOut  - 8);
   4697             runningOrder -= 4;
   4698         } else {
   4699             switch (runningOrder) {
   4700                 case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
   4701                 case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
   4702                 case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
   4703             }
   4704 
   4705             coefficients128_4 = vld1q_s32(tempC);
   4706             samples128_4      = vld1q_s32(tempS);
   4707             runningOrder = 0;
   4708         }
   4709 
   4710         /* 8 - 11 */
   4711         if (runningOrder == 4) {
   4712             coefficients128_8 = vld1q_s32(coefficients + 8);
   4713             samples128_8      = vld1q_s32(pSamplesOut  - 12);
   4714             runningOrder -= 4;
   4715         } else {
   4716             switch (runningOrder) {
   4717                 case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
   4718                 case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
   4719                 case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
   4720             }
   4721 
   4722             coefficients128_8 = vld1q_s32(tempC);
   4723             samples128_8      = vld1q_s32(tempS);
   4724             runningOrder = 0;
   4725         }
   4726 
   4727         /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
   4728         coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
   4729         coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
   4730         coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
   4731     }
   4732 
   4733     /* For this version we are doing one sample at a time. */
   4734     while (pDecodedSamples < pDecodedSamplesEnd) {
   4735         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
   4736             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
   4737             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
   4738             !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
   4739             return DRFLAC_FALSE;
   4740         }
   4741 
   4742         zeroCountPart128 = vld1q_u32(zeroCountParts);
   4743         riceParamPart128 = vld1q_u32(riceParamParts);
   4744 
   4745         riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
   4746         riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
   4747         riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
   4748 
   4749         for (i = 0; i < 4; i += 1) {
   4750             int64x1_t prediction64;
   4751 
   4752             prediction128 = veorq_s64(prediction128, prediction128);    /* Reset to 0. */
   4753             switch (order)
   4754             {
   4755             case 12:
   4756             case 11: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_8), vget_low_s32(samples128_8)));
   4757             case 10:
   4758             case  9: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_8), vget_high_s32(samples128_8)));
   4759             case  8:
   4760             case  7: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_4), vget_low_s32(samples128_4)));
   4761             case  6:
   4762             case  5: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_4), vget_high_s32(samples128_4)));
   4763             case  4:
   4764             case  3: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_0), vget_low_s32(samples128_0)));
   4765             case  2:
   4766             case  1: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_0), vget_high_s32(samples128_0)));
   4767             }
   4768 
   4769             /* Horizontal add and shift. */
   4770             prediction64 = drflac__vhaddq_s64(prediction128);
   4771             prediction64 = vshl_s64(prediction64, shift64);
   4772             prediction64 = vadd_s64(prediction64, vdup_n_s64(vgetq_lane_u32(riceParamPart128, 0)));
   4773 
   4774             /* Our value should be sitting in prediction64[0]. We need to combine this with our SSE samples. */
   4775             samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
   4776             samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
   4777             samples128_0 = drflac__valignrq_s32_1(vcombine_s32(vreinterpret_s32_s64(prediction64), vdup_n_s32(0)), samples128_0);
   4778 
   4779             /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
   4780             riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
   4781         }
   4782 
   4783         /* We store samples in groups of 4. */
   4784         vst1q_s32(pDecodedSamples, samples128_0);
   4785         pDecodedSamples += 4;
   4786     }
   4787 
   4788     /* Make sure we process the last few samples. */
   4789     i = (count & ~3);
   4790     while (i < (int)count) {
   4791         /* Rice extraction. */
   4792         if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
   4793             return DRFLAC_FALSE;
   4794         }
   4795 
   4796         /* Rice reconstruction. */
   4797         riceParamParts[0] &= riceParamMask;
   4798         riceParamParts[0] |= (zeroCountParts[0] << riceParam);
   4799         riceParamParts[0]  = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
   4800 
   4801         /* Sample reconstruction. */
   4802         pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
   4803 
   4804         i += 1;
   4805         pDecodedSamples += 1;
   4806     }
   4807 
   4808     return DRFLAC_TRUE;
   4809 }
   4810 
   4811 static drflac_bool32 drflac__decode_samples_with_residual__rice__neon(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
   4812 {
   4813     DRFLAC_ASSERT(bs != NULL);
   4814     DRFLAC_ASSERT(pSamplesOut != NULL);
   4815 
   4816     /* In my testing the order is rarely > 12, so in this case I'm going to simplify the NEON implementation by only handling order <= 12. */
   4817     if (lpcOrder > 0 && lpcOrder <= 12) {
   4818         if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
   4819             return drflac__decode_samples_with_residual__rice__neon_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
   4820         } else {
   4821             return drflac__decode_samples_with_residual__rice__neon_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
   4822         }
   4823     } else {
   4824         return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
   4825     }
   4826 }
   4827 #endif
   4828 
   4829 static drflac_bool32 drflac__decode_samples_with_residual__rice(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
   4830 {
   4831 #if defined(DRFLAC_SUPPORT_SSE41)
   4832     if (drflac__gIsSSE41Supported) {
   4833         return drflac__decode_samples_with_residual__rice__sse41(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
   4834     } else
   4835 #elif defined(DRFLAC_SUPPORT_NEON)
   4836     if (drflac__gIsNEONSupported) {
   4837         return drflac__decode_samples_with_residual__rice__neon(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
   4838     } else
   4839 #endif
   4840     {
   4841         /* Scalar fallback. */
   4842     #if 0
   4843         return drflac__decode_samples_with_residual__rice__reference(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
   4844     #else
   4845         return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
   4846     #endif
   4847     }
   4848 }
   4849 
   4850 /* Reads and seeks past a string of residual values as Rice codes. The decoder should be sitting on the first bit of the Rice codes. */
   4851 static drflac_bool32 drflac__read_and_seek_residual__rice(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam)
   4852 {
   4853     drflac_uint32 i;
   4854 
   4855     DRFLAC_ASSERT(bs != NULL);
   4856 
   4857     for (i = 0; i < count; ++i) {
   4858         if (!drflac__seek_rice_parts(bs, riceParam)) {
   4859             return DRFLAC_FALSE;
   4860         }
   4861     }
   4862 
   4863     return DRFLAC_TRUE;
   4864 }
   4865 
   4866 #if defined(__clang__)
   4867 __attribute__((no_sanitize("signed-integer-overflow")))
   4868 #endif
   4869 static drflac_bool32 drflac__decode_samples_with_residual__unencoded(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 unencodedBitsPerSample, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
   4870 {
   4871     drflac_uint32 i;
   4872 
   4873     DRFLAC_ASSERT(bs != NULL);
   4874     DRFLAC_ASSERT(unencodedBitsPerSample <= 31);    /* <-- unencodedBitsPerSample is a 5 bit number, so cannot exceed 31. */
   4875     DRFLAC_ASSERT(pSamplesOut != NULL);
   4876 
   4877     for (i = 0; i < count; ++i) {
   4878         if (unencodedBitsPerSample > 0) {
   4879             if (!drflac__read_int32(bs, unencodedBitsPerSample, pSamplesOut + i)) {
   4880                 return DRFLAC_FALSE;
   4881             }
   4882         } else {
   4883             pSamplesOut[i] = 0;
   4884         }
   4885 
   4886         if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
   4887             pSamplesOut[i] += drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
   4888         } else {
   4889             pSamplesOut[i] += drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
   4890         }
   4891     }
   4892 
   4893     return DRFLAC_TRUE;
   4894 }
   4895 
   4896 
   4897 /*
   4898 Reads and decodes the residual for the sub-frame the decoder is currently sitting on. This function should be called
   4899 when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be ignored. The
   4900 <blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
   4901 */
   4902 static drflac_bool32 drflac__decode_samples_with_residual(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 blockSize, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
   4903 {
   4904     drflac_uint8 residualMethod;
   4905     drflac_uint8 partitionOrder;
   4906     drflac_uint32 samplesInPartition;
   4907     drflac_uint32 partitionsRemaining;
   4908 
   4909     DRFLAC_ASSERT(bs != NULL);
   4910     DRFLAC_ASSERT(blockSize != 0);
   4911     DRFLAC_ASSERT(pDecodedSamples != NULL);       /* <-- Should we allow NULL, in which case we just seek past the residual rather than do a full decode? */
   4912 
   4913     if (!drflac__read_uint8(bs, 2, &residualMethod)) {
   4914         return DRFLAC_FALSE;
   4915     }
   4916 
   4917     if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
   4918         return DRFLAC_FALSE;    /* Unknown or unsupported residual coding method. */
   4919     }
   4920 
   4921     /* Ignore the first <order> values. */
   4922     pDecodedSamples += lpcOrder;
   4923 
   4924     if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
   4925         return DRFLAC_FALSE;
   4926     }
   4927 
   4928     /*
   4929     From the FLAC spec:
   4930       The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
   4931     */
   4932     if (partitionOrder > 8) {
   4933         return DRFLAC_FALSE;
   4934     }
   4935 
   4936     /* Validation check. */
   4937     if ((blockSize / (1 << partitionOrder)) < lpcOrder) {
   4938         return DRFLAC_FALSE;
   4939     }
   4940 
   4941     samplesInPartition = (blockSize / (1 << partitionOrder)) - lpcOrder;
   4942     partitionsRemaining = (1 << partitionOrder);
   4943     for (;;) {
   4944         drflac_uint8 riceParam = 0;
   4945         if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
   4946             if (!drflac__read_uint8(bs, 4, &riceParam)) {
   4947                 return DRFLAC_FALSE;
   4948             }
   4949             if (riceParam == 15) {
   4950                 riceParam = 0xFF;
   4951             }
   4952         } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
   4953             if (!drflac__read_uint8(bs, 5, &riceParam)) {
   4954                 return DRFLAC_FALSE;
   4955             }
   4956             if (riceParam == 31) {
   4957                 riceParam = 0xFF;
   4958             }
   4959         }
   4960 
   4961         if (riceParam != 0xFF) {
   4962             if (!drflac__decode_samples_with_residual__rice(bs, bitsPerSample, samplesInPartition, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
   4963                 return DRFLAC_FALSE;
   4964             }
   4965         } else {
   4966             drflac_uint8 unencodedBitsPerSample = 0;
   4967             if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
   4968                 return DRFLAC_FALSE;
   4969             }
   4970 
   4971             if (!drflac__decode_samples_with_residual__unencoded(bs, bitsPerSample, samplesInPartition, unencodedBitsPerSample, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
   4972                 return DRFLAC_FALSE;
   4973             }
   4974         }
   4975 
   4976         pDecodedSamples += samplesInPartition;
   4977 
   4978         if (partitionsRemaining == 1) {
   4979             break;
   4980         }
   4981 
   4982         partitionsRemaining -= 1;
   4983 
   4984         if (partitionOrder != 0) {
   4985             samplesInPartition = blockSize / (1 << partitionOrder);
   4986         }
   4987     }
   4988 
   4989     return DRFLAC_TRUE;
   4990 }
   4991 
   4992 /*
   4993 Reads and seeks past the residual for the sub-frame the decoder is currently sitting on. This function should be called
   4994 when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be set to 0. The
   4995 <blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
   4996 */
   4997 static drflac_bool32 drflac__read_and_seek_residual(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 order)
   4998 {
   4999     drflac_uint8 residualMethod;
   5000     drflac_uint8 partitionOrder;
   5001     drflac_uint32 samplesInPartition;
   5002     drflac_uint32 partitionsRemaining;
   5003 
   5004     DRFLAC_ASSERT(bs != NULL);
   5005     DRFLAC_ASSERT(blockSize != 0);
   5006 
   5007     if (!drflac__read_uint8(bs, 2, &residualMethod)) {
   5008         return DRFLAC_FALSE;
   5009     }
   5010 
   5011     if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
   5012         return DRFLAC_FALSE;    /* Unknown or unsupported residual coding method. */
   5013     }
   5014 
   5015     if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
   5016         return DRFLAC_FALSE;
   5017     }
   5018 
   5019     /*
   5020     From the FLAC spec:
   5021       The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
   5022     */
   5023     if (partitionOrder > 8) {
   5024         return DRFLAC_FALSE;
   5025     }
   5026 
   5027     /* Validation check. */
   5028     if ((blockSize / (1 << partitionOrder)) <= order) {
   5029         return DRFLAC_FALSE;
   5030     }
   5031 
   5032     samplesInPartition = (blockSize / (1 << partitionOrder)) - order;
   5033     partitionsRemaining = (1 << partitionOrder);
   5034     for (;;)
   5035     {
   5036         drflac_uint8 riceParam = 0;
   5037         if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
   5038             if (!drflac__read_uint8(bs, 4, &riceParam)) {
   5039                 return DRFLAC_FALSE;
   5040             }
   5041             if (riceParam == 15) {
   5042                 riceParam = 0xFF;
   5043             }
   5044         } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
   5045             if (!drflac__read_uint8(bs, 5, &riceParam)) {
   5046                 return DRFLAC_FALSE;
   5047             }
   5048             if (riceParam == 31) {
   5049                 riceParam = 0xFF;
   5050             }
   5051         }
   5052 
   5053         if (riceParam != 0xFF) {
   5054             if (!drflac__read_and_seek_residual__rice(bs, samplesInPartition, riceParam)) {
   5055                 return DRFLAC_FALSE;
   5056             }
   5057         } else {
   5058             drflac_uint8 unencodedBitsPerSample = 0;
   5059             if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
   5060                 return DRFLAC_FALSE;
   5061             }
   5062 
   5063             if (!drflac__seek_bits(bs, unencodedBitsPerSample * samplesInPartition)) {
   5064                 return DRFLAC_FALSE;
   5065             }
   5066         }
   5067 
   5068 
   5069         if (partitionsRemaining == 1) {
   5070             break;
   5071         }
   5072 
   5073         partitionsRemaining -= 1;
   5074         samplesInPartition = blockSize / (1 << partitionOrder);
   5075     }
   5076 
   5077     return DRFLAC_TRUE;
   5078 }
   5079 
   5080 
   5081 static drflac_bool32 drflac__decode_samples__constant(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
   5082 {
   5083     drflac_uint32 i;
   5084 
   5085     /* Only a single sample needs to be decoded here. */
   5086     drflac_int32 sample;
   5087     if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
   5088         return DRFLAC_FALSE;
   5089     }
   5090 
   5091     /*
   5092     We don't really need to expand this, but it does simplify the process of reading samples. If this becomes a performance issue (unlikely)
   5093     we'll want to look at a more efficient way.
   5094     */
   5095     for (i = 0; i < blockSize; ++i) {
   5096         pDecodedSamples[i] = sample;
   5097     }
   5098 
   5099     return DRFLAC_TRUE;
   5100 }
   5101 
   5102 static drflac_bool32 drflac__decode_samples__verbatim(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
   5103 {
   5104     drflac_uint32 i;
   5105 
   5106     for (i = 0; i < blockSize; ++i) {
   5107         drflac_int32 sample;
   5108         if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
   5109             return DRFLAC_FALSE;
   5110         }
   5111 
   5112         pDecodedSamples[i] = sample;
   5113     }
   5114 
   5115     return DRFLAC_TRUE;
   5116 }
   5117 
   5118 static drflac_bool32 drflac__decode_samples__fixed(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
   5119 {
   5120     drflac_uint32 i;
   5121 
   5122     static drflac_int32 lpcCoefficientsTable[5][4] = {
   5123         {0,  0, 0,  0},
   5124         {1,  0, 0,  0},
   5125         {2, -1, 0,  0},
   5126         {3, -3, 1,  0},
   5127         {4, -6, 4, -1}
   5128     };
   5129 
   5130     /* Warm up samples and coefficients. */
   5131     for (i = 0; i < lpcOrder; ++i) {
   5132         drflac_int32 sample;
   5133         if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
   5134             return DRFLAC_FALSE;
   5135         }
   5136 
   5137         pDecodedSamples[i] = sample;
   5138     }
   5139 
   5140     if (!drflac__decode_samples_with_residual(bs, subframeBitsPerSample, blockSize, lpcOrder, 0, 4, lpcCoefficientsTable[lpcOrder], pDecodedSamples)) {
   5141         return DRFLAC_FALSE;
   5142     }
   5143 
   5144     return DRFLAC_TRUE;
   5145 }
   5146 
   5147 static drflac_bool32 drflac__decode_samples__lpc(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 bitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
   5148 {
   5149     drflac_uint8 i;
   5150     drflac_uint8 lpcPrecision;
   5151     drflac_int8 lpcShift;
   5152     drflac_int32 coefficients[32];
   5153 
   5154     /* Warm up samples. */
   5155     for (i = 0; i < lpcOrder; ++i) {
   5156         drflac_int32 sample;
   5157         if (!drflac__read_int32(bs, bitsPerSample, &sample)) {
   5158             return DRFLAC_FALSE;
   5159         }
   5160 
   5161         pDecodedSamples[i] = sample;
   5162     }
   5163 
   5164     if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
   5165         return DRFLAC_FALSE;
   5166     }
   5167     if (lpcPrecision == 15) {
   5168         return DRFLAC_FALSE;    /* Invalid. */
   5169     }
   5170     lpcPrecision += 1;
   5171 
   5172     if (!drflac__read_int8(bs, 5, &lpcShift)) {
   5173         return DRFLAC_FALSE;
   5174     }
   5175 
   5176     /*
   5177     From the FLAC specification:
   5178 
   5179         Quantized linear predictor coefficient shift needed in bits (NOTE: this number is signed two's-complement)
   5180 
   5181     Emphasis on the "signed two's-complement". In practice there does not seem to be any encoders nor decoders supporting negative shifts. For now dr_flac is
   5182     not going to support negative shifts as I don't have any reference files. However, when a reference file comes through I will consider adding support.
   5183     */
   5184     if (lpcShift < 0) {
   5185         return DRFLAC_FALSE;
   5186     }
   5187 
   5188     DRFLAC_ZERO_MEMORY(coefficients, sizeof(coefficients));
   5189     for (i = 0; i < lpcOrder; ++i) {
   5190         if (!drflac__read_int32(bs, lpcPrecision, coefficients + i)) {
   5191             return DRFLAC_FALSE;
   5192         }
   5193     }
   5194 
   5195     if (!drflac__decode_samples_with_residual(bs, bitsPerSample, blockSize, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
   5196         return DRFLAC_FALSE;
   5197     }
   5198 
   5199     return DRFLAC_TRUE;
   5200 }
   5201 
   5202 
   5203 static drflac_bool32 drflac__read_next_flac_frame_header(drflac_bs* bs, drflac_uint8 streaminfoBitsPerSample, drflac_frame_header* header)
   5204 {
   5205     const drflac_uint32 sampleRateTable[12]  = {0, 88200, 176400, 192000, 8000, 16000, 22050, 24000, 32000, 44100, 48000, 96000};
   5206     const drflac_uint8 bitsPerSampleTable[8] = {0, 8, 12, (drflac_uint8)-1, 16, 20, 24, (drflac_uint8)-1};   /* -1 = reserved. */
   5207 
   5208     DRFLAC_ASSERT(bs != NULL);
   5209     DRFLAC_ASSERT(header != NULL);
   5210 
   5211     /* Keep looping until we find a valid sync code. */
   5212     for (;;) {
   5213         drflac_uint8 crc8 = 0xCE; /* 0xCE = drflac_crc8(0, 0x3FFE, 14); */
   5214         drflac_uint8 reserved = 0;
   5215         drflac_uint8 blockingStrategy = 0;
   5216         drflac_uint8 blockSize = 0;
   5217         drflac_uint8 sampleRate = 0;
   5218         drflac_uint8 channelAssignment = 0;
   5219         drflac_uint8 bitsPerSample = 0;
   5220         drflac_bool32 isVariableBlockSize;
   5221 
   5222         if (!drflac__find_and_seek_to_next_sync_code(bs)) {
   5223             return DRFLAC_FALSE;
   5224         }
   5225 
   5226         if (!drflac__read_uint8(bs, 1, &reserved)) {
   5227             return DRFLAC_FALSE;
   5228         }
   5229         if (reserved == 1) {
   5230             continue;
   5231         }
   5232         crc8 = drflac_crc8(crc8, reserved, 1);
   5233 
   5234         if (!drflac__read_uint8(bs, 1, &blockingStrategy)) {
   5235             return DRFLAC_FALSE;
   5236         }
   5237         crc8 = drflac_crc8(crc8, blockingStrategy, 1);
   5238 
   5239         if (!drflac__read_uint8(bs, 4, &blockSize)) {
   5240             return DRFLAC_FALSE;
   5241         }
   5242         if (blockSize == 0) {
   5243             continue;
   5244         }
   5245         crc8 = drflac_crc8(crc8, blockSize, 4);
   5246 
   5247         if (!drflac__read_uint8(bs, 4, &sampleRate)) {
   5248             return DRFLAC_FALSE;
   5249         }
   5250         crc8 = drflac_crc8(crc8, sampleRate, 4);
   5251 
   5252         if (!drflac__read_uint8(bs, 4, &channelAssignment)) {
   5253             return DRFLAC_FALSE;
   5254         }
   5255         if (channelAssignment > 10) {
   5256             continue;
   5257         }
   5258         crc8 = drflac_crc8(crc8, channelAssignment, 4);
   5259 
   5260         if (!drflac__read_uint8(bs, 3, &bitsPerSample)) {
   5261             return DRFLAC_FALSE;
   5262         }
   5263         if (bitsPerSample == 3 || bitsPerSample == 7) {
   5264             continue;
   5265         }
   5266         crc8 = drflac_crc8(crc8, bitsPerSample, 3);
   5267 
   5268 
   5269         if (!drflac__read_uint8(bs, 1, &reserved)) {
   5270             return DRFLAC_FALSE;
   5271         }
   5272         if (reserved == 1) {
   5273             continue;
   5274         }
   5275         crc8 = drflac_crc8(crc8, reserved, 1);
   5276 
   5277 
   5278         isVariableBlockSize = blockingStrategy == 1;
   5279         if (isVariableBlockSize) {
   5280             drflac_uint64 pcmFrameNumber;
   5281             drflac_result result = drflac__read_utf8_coded_number(bs, &pcmFrameNumber, &crc8);
   5282             if (result != DRFLAC_SUCCESS) {
   5283                 if (result == DRFLAC_AT_END) {
   5284                     return DRFLAC_FALSE;
   5285                 } else {
   5286                     continue;
   5287                 }
   5288             }
   5289             header->flacFrameNumber  = 0;
   5290             header->pcmFrameNumber = pcmFrameNumber;
   5291         } else {
   5292             drflac_uint64 flacFrameNumber = 0;
   5293             drflac_result result = drflac__read_utf8_coded_number(bs, &flacFrameNumber, &crc8);
   5294             if (result != DRFLAC_SUCCESS) {
   5295                 if (result == DRFLAC_AT_END) {
   5296                     return DRFLAC_FALSE;
   5297                 } else {
   5298                     continue;
   5299                 }
   5300             }
   5301             header->flacFrameNumber  = (drflac_uint32)flacFrameNumber;   /* <-- Safe cast. */
   5302             header->pcmFrameNumber = 0;
   5303         }
   5304 
   5305 
   5306         DRFLAC_ASSERT(blockSize > 0);
   5307         if (blockSize == 1) {
   5308             header->blockSizeInPCMFrames = 192;
   5309         } else if (blockSize <= 5) {
   5310             DRFLAC_ASSERT(blockSize >= 2);
   5311             header->blockSizeInPCMFrames = 576 * (1 << (blockSize - 2));
   5312         } else if (blockSize == 6) {
   5313             if (!drflac__read_uint16(bs, 8, &header->blockSizeInPCMFrames)) {
   5314                 return DRFLAC_FALSE;
   5315             }
   5316             crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 8);
   5317             header->blockSizeInPCMFrames += 1;
   5318         } else if (blockSize == 7) {
   5319             if (!drflac__read_uint16(bs, 16, &header->blockSizeInPCMFrames)) {
   5320                 return DRFLAC_FALSE;
   5321             }
   5322             crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 16);
   5323             if (header->blockSizeInPCMFrames == 0xFFFF) {
   5324                 return DRFLAC_FALSE;    /* Frame is too big. This is the size of the frame minus 1. The STREAMINFO block defines the max block size which is 16-bits. Adding one will make it 17 bits and therefore too big. */
   5325             }
   5326             header->blockSizeInPCMFrames += 1;
   5327         } else {
   5328             DRFLAC_ASSERT(blockSize >= 8);
   5329             header->blockSizeInPCMFrames = 256 * (1 << (blockSize - 8));
   5330         }
   5331 
   5332 
   5333         if (sampleRate <= 11) {
   5334             header->sampleRate = sampleRateTable[sampleRate];
   5335         } else if (sampleRate == 12) {
   5336             if (!drflac__read_uint32(bs, 8, &header->sampleRate)) {
   5337                 return DRFLAC_FALSE;
   5338             }
   5339             crc8 = drflac_crc8(crc8, header->sampleRate, 8);
   5340             header->sampleRate *= 1000;
   5341         } else if (sampleRate == 13) {
   5342             if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
   5343                 return DRFLAC_FALSE;
   5344             }
   5345             crc8 = drflac_crc8(crc8, header->sampleRate, 16);
   5346         } else if (sampleRate == 14) {
   5347             if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
   5348                 return DRFLAC_FALSE;
   5349             }
   5350             crc8 = drflac_crc8(crc8, header->sampleRate, 16);
   5351             header->sampleRate *= 10;
   5352         } else {
   5353             continue;  /* Invalid. Assume an invalid block. */
   5354         }
   5355 
   5356 
   5357         header->channelAssignment = channelAssignment;
   5358 
   5359         header->bitsPerSample = bitsPerSampleTable[bitsPerSample];
   5360         if (header->bitsPerSample == 0) {
   5361             header->bitsPerSample = streaminfoBitsPerSample;
   5362         }
   5363 
   5364         if (header->bitsPerSample != streaminfoBitsPerSample) {
   5365             /* If this subframe has a different bitsPerSample then streaminfo or the first frame, reject it */
   5366             return DRFLAC_FALSE;
   5367         }
   5368 
   5369         if (!drflac__read_uint8(bs, 8, &header->crc8)) {
   5370             return DRFLAC_FALSE;
   5371         }
   5372 
   5373 #ifndef DR_FLAC_NO_CRC
   5374         if (header->crc8 != crc8) {
   5375             continue;    /* CRC mismatch. Loop back to the top and find the next sync code. */
   5376         }
   5377 #endif
   5378         return DRFLAC_TRUE;
   5379     }
   5380 }
   5381 
   5382 static drflac_bool32 drflac__read_subframe_header(drflac_bs* bs, drflac_subframe* pSubframe)
   5383 {
   5384     drflac_uint8 header;
   5385     int type;
   5386 
   5387     if (!drflac__read_uint8(bs, 8, &header)) {
   5388         return DRFLAC_FALSE;
   5389     }
   5390 
   5391     /* First bit should always be 0. */
   5392     if ((header & 0x80) != 0) {
   5393         return DRFLAC_FALSE;
   5394     }
   5395 
   5396     type = (header & 0x7E) >> 1;
   5397     if (type == 0) {
   5398         pSubframe->subframeType = DRFLAC_SUBFRAME_CONSTANT;
   5399     } else if (type == 1) {
   5400         pSubframe->subframeType = DRFLAC_SUBFRAME_VERBATIM;
   5401     } else {
   5402         if ((type & 0x20) != 0) {
   5403             pSubframe->subframeType = DRFLAC_SUBFRAME_LPC;
   5404             pSubframe->lpcOrder = (drflac_uint8)(type & 0x1F) + 1;
   5405         } else if ((type & 0x08) != 0) {
   5406             pSubframe->subframeType = DRFLAC_SUBFRAME_FIXED;
   5407             pSubframe->lpcOrder = (drflac_uint8)(type & 0x07);
   5408             if (pSubframe->lpcOrder > 4) {
   5409                 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
   5410                 pSubframe->lpcOrder = 0;
   5411             }
   5412         } else {
   5413             pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
   5414         }
   5415     }
   5416 
   5417     if (pSubframe->subframeType == DRFLAC_SUBFRAME_RESERVED) {
   5418         return DRFLAC_FALSE;
   5419     }
   5420 
   5421     /* Wasted bits per sample. */
   5422     pSubframe->wastedBitsPerSample = 0;
   5423     if ((header & 0x01) == 1) {
   5424         unsigned int wastedBitsPerSample;
   5425         if (!drflac__seek_past_next_set_bit(bs, &wastedBitsPerSample)) {
   5426             return DRFLAC_FALSE;
   5427         }
   5428         pSubframe->wastedBitsPerSample = (drflac_uint8)wastedBitsPerSample + 1;
   5429     }
   5430 
   5431     return DRFLAC_TRUE;
   5432 }
   5433 
   5434 static drflac_bool32 drflac__decode_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex, drflac_int32* pDecodedSamplesOut)
   5435 {
   5436     drflac_subframe* pSubframe;
   5437     drflac_uint32 subframeBitsPerSample;
   5438 
   5439     DRFLAC_ASSERT(bs != NULL);
   5440     DRFLAC_ASSERT(frame != NULL);
   5441 
   5442     pSubframe = frame->subframes + subframeIndex;
   5443     if (!drflac__read_subframe_header(bs, pSubframe)) {
   5444         return DRFLAC_FALSE;
   5445     }
   5446 
   5447     /* Side channels require an extra bit per sample. Took a while to figure that one out... */
   5448     subframeBitsPerSample = frame->header.bitsPerSample;
   5449     if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
   5450         subframeBitsPerSample += 1;
   5451     } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
   5452         subframeBitsPerSample += 1;
   5453     }
   5454 
   5455     if (subframeBitsPerSample > 32) {
   5456         /* libFLAC and ffmpeg reject 33-bit subframes as well */
   5457         return DRFLAC_FALSE;
   5458     }
   5459 
   5460     /* Need to handle wasted bits per sample. */
   5461     if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
   5462         return DRFLAC_FALSE;
   5463     }
   5464     subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
   5465 
   5466     pSubframe->pSamplesS32 = pDecodedSamplesOut;
   5467 
   5468     switch (pSubframe->subframeType)
   5469     {
   5470         case DRFLAC_SUBFRAME_CONSTANT:
   5471         {
   5472             drflac__decode_samples__constant(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
   5473         } break;
   5474 
   5475         case DRFLAC_SUBFRAME_VERBATIM:
   5476         {
   5477             drflac__decode_samples__verbatim(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
   5478         } break;
   5479 
   5480         case DRFLAC_SUBFRAME_FIXED:
   5481         {
   5482             drflac__decode_samples__fixed(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
   5483         } break;
   5484 
   5485         case DRFLAC_SUBFRAME_LPC:
   5486         {
   5487             drflac__decode_samples__lpc(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
   5488         } break;
   5489 
   5490         default: return DRFLAC_FALSE;
   5491     }
   5492 
   5493     return DRFLAC_TRUE;
   5494 }
   5495 
   5496 static drflac_bool32 drflac__seek_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex)
   5497 {
   5498     drflac_subframe* pSubframe;
   5499     drflac_uint32 subframeBitsPerSample;
   5500 
   5501     DRFLAC_ASSERT(bs != NULL);
   5502     DRFLAC_ASSERT(frame != NULL);
   5503 
   5504     pSubframe = frame->subframes + subframeIndex;
   5505     if (!drflac__read_subframe_header(bs, pSubframe)) {
   5506         return DRFLAC_FALSE;
   5507     }
   5508 
   5509     /* Side channels require an extra bit per sample. Took a while to figure that one out... */
   5510     subframeBitsPerSample = frame->header.bitsPerSample;
   5511     if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
   5512         subframeBitsPerSample += 1;
   5513     } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
   5514         subframeBitsPerSample += 1;
   5515     }
   5516 
   5517     /* Need to handle wasted bits per sample. */
   5518     if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
   5519         return DRFLAC_FALSE;
   5520     }
   5521     subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
   5522 
   5523     pSubframe->pSamplesS32 = NULL;
   5524 
   5525     switch (pSubframe->subframeType)
   5526     {
   5527         case DRFLAC_SUBFRAME_CONSTANT:
   5528         {
   5529             if (!drflac__seek_bits(bs, subframeBitsPerSample)) {
   5530                 return DRFLAC_FALSE;
   5531             }
   5532         } break;
   5533 
   5534         case DRFLAC_SUBFRAME_VERBATIM:
   5535         {
   5536             unsigned int bitsToSeek = frame->header.blockSizeInPCMFrames * subframeBitsPerSample;
   5537             if (!drflac__seek_bits(bs, bitsToSeek)) {
   5538                 return DRFLAC_FALSE;
   5539             }
   5540         } break;
   5541 
   5542         case DRFLAC_SUBFRAME_FIXED:
   5543         {
   5544             unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
   5545             if (!drflac__seek_bits(bs, bitsToSeek)) {
   5546                 return DRFLAC_FALSE;
   5547             }
   5548 
   5549             if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
   5550                 return DRFLAC_FALSE;
   5551             }
   5552         } break;
   5553 
   5554         case DRFLAC_SUBFRAME_LPC:
   5555         {
   5556             drflac_uint8 lpcPrecision;
   5557 
   5558             unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
   5559             if (!drflac__seek_bits(bs, bitsToSeek)) {
   5560                 return DRFLAC_FALSE;
   5561             }
   5562 
   5563             if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
   5564                 return DRFLAC_FALSE;
   5565             }
   5566             if (lpcPrecision == 15) {
   5567                 return DRFLAC_FALSE;    /* Invalid. */
   5568             }
   5569             lpcPrecision += 1;
   5570 
   5571 
   5572             bitsToSeek = (pSubframe->lpcOrder * lpcPrecision) + 5;    /* +5 for shift. */
   5573             if (!drflac__seek_bits(bs, bitsToSeek)) {
   5574                 return DRFLAC_FALSE;
   5575             }
   5576 
   5577             if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
   5578                 return DRFLAC_FALSE;
   5579             }
   5580         } break;
   5581 
   5582         default: return DRFLAC_FALSE;
   5583     }
   5584 
   5585     return DRFLAC_TRUE;
   5586 }
   5587 
   5588 
   5589 static DRFLAC_INLINE drflac_uint8 drflac__get_channel_count_from_channel_assignment(drflac_int8 channelAssignment)
   5590 {
   5591     drflac_uint8 lookup[] = {1, 2, 3, 4, 5, 6, 7, 8, 2, 2, 2};
   5592 
   5593     DRFLAC_ASSERT(channelAssignment <= 10);
   5594     return lookup[channelAssignment];
   5595 }
   5596 
   5597 static drflac_result drflac__decode_flac_frame(drflac* pFlac)
   5598 {
   5599     int channelCount;
   5600     int i;
   5601     drflac_uint8 paddingSizeInBits;
   5602     drflac_uint16 desiredCRC16;
   5603 #ifndef DR_FLAC_NO_CRC
   5604     drflac_uint16 actualCRC16;
   5605 #endif
   5606 
   5607     /* This function should be called while the stream is sitting on the first byte after the frame header. */
   5608     DRFLAC_ZERO_MEMORY(pFlac->currentFLACFrame.subframes, sizeof(pFlac->currentFLACFrame.subframes));
   5609 
   5610     /* The frame block size must never be larger than the maximum block size defined by the FLAC stream. */
   5611     if (pFlac->currentFLACFrame.header.blockSizeInPCMFrames > pFlac->maxBlockSizeInPCMFrames) {
   5612         return DRFLAC_ERROR;
   5613     }
   5614 
   5615     /* The number of channels in the frame must match the channel count from the STREAMINFO block. */
   5616     channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
   5617     if (channelCount != (int)pFlac->channels) {
   5618         return DRFLAC_ERROR;
   5619     }
   5620 
   5621     for (i = 0; i < channelCount; ++i) {
   5622         if (!drflac__decode_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i, pFlac->pDecodedSamples + (pFlac->currentFLACFrame.header.blockSizeInPCMFrames * i))) {
   5623             return DRFLAC_ERROR;
   5624         }
   5625     }
   5626 
   5627     paddingSizeInBits = (drflac_uint8)(DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7);
   5628     if (paddingSizeInBits > 0) {
   5629         drflac_uint8 padding = 0;
   5630         if (!drflac__read_uint8(&pFlac->bs, paddingSizeInBits, &padding)) {
   5631             return DRFLAC_AT_END;
   5632         }
   5633     }
   5634 
   5635 #ifndef DR_FLAC_NO_CRC
   5636     actualCRC16 = drflac__flush_crc16(&pFlac->bs);
   5637 #endif
   5638     if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
   5639         return DRFLAC_AT_END;
   5640     }
   5641 
   5642 #ifndef DR_FLAC_NO_CRC
   5643     if (actualCRC16 != desiredCRC16) {
   5644         return DRFLAC_CRC_MISMATCH;    /* CRC mismatch. */
   5645     }
   5646 #endif
   5647 
   5648     pFlac->currentFLACFrame.pcmFramesRemaining = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
   5649 
   5650     return DRFLAC_SUCCESS;
   5651 }
   5652 
   5653 static drflac_result drflac__seek_flac_frame(drflac* pFlac)
   5654 {
   5655     int channelCount;
   5656     int i;
   5657     drflac_uint16 desiredCRC16;
   5658 #ifndef DR_FLAC_NO_CRC
   5659     drflac_uint16 actualCRC16;
   5660 #endif
   5661 
   5662     channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
   5663     for (i = 0; i < channelCount; ++i) {
   5664         if (!drflac__seek_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i)) {
   5665             return DRFLAC_ERROR;
   5666         }
   5667     }
   5668 
   5669     /* Padding. */
   5670     if (!drflac__seek_bits(&pFlac->bs, DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7)) {
   5671         return DRFLAC_ERROR;
   5672     }
   5673 
   5674     /* CRC. */
   5675 #ifndef DR_FLAC_NO_CRC
   5676     actualCRC16 = drflac__flush_crc16(&pFlac->bs);
   5677 #endif
   5678     if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
   5679         return DRFLAC_AT_END;
   5680     }
   5681 
   5682 #ifndef DR_FLAC_NO_CRC
   5683     if (actualCRC16 != desiredCRC16) {
   5684         return DRFLAC_CRC_MISMATCH;    /* CRC mismatch. */
   5685     }
   5686 #endif
   5687 
   5688     return DRFLAC_SUCCESS;
   5689 }
   5690 
   5691 static drflac_bool32 drflac__read_and_decode_next_flac_frame(drflac* pFlac)
   5692 {
   5693     DRFLAC_ASSERT(pFlac != NULL);
   5694 
   5695     for (;;) {
   5696         drflac_result result;
   5697 
   5698         if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
   5699             return DRFLAC_FALSE;
   5700         }
   5701 
   5702         result = drflac__decode_flac_frame(pFlac);
   5703         if (result != DRFLAC_SUCCESS) {
   5704             if (result == DRFLAC_CRC_MISMATCH) {
   5705                 continue;   /* CRC mismatch. Skip to the next frame. */
   5706             } else {
   5707                 return DRFLAC_FALSE;
   5708             }
   5709         }
   5710 
   5711         return DRFLAC_TRUE;
   5712     }
   5713 }
   5714 
   5715 static void drflac__get_pcm_frame_range_of_current_flac_frame(drflac* pFlac, drflac_uint64* pFirstPCMFrame, drflac_uint64* pLastPCMFrame)
   5716 {
   5717     drflac_uint64 firstPCMFrame;
   5718     drflac_uint64 lastPCMFrame;
   5719 
   5720     DRFLAC_ASSERT(pFlac != NULL);
   5721 
   5722     firstPCMFrame = pFlac->currentFLACFrame.header.pcmFrameNumber;
   5723     if (firstPCMFrame == 0) {
   5724         firstPCMFrame = ((drflac_uint64)pFlac->currentFLACFrame.header.flacFrameNumber) * pFlac->maxBlockSizeInPCMFrames;
   5725     }
   5726 
   5727     lastPCMFrame = firstPCMFrame + pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
   5728     if (lastPCMFrame > 0) {
   5729         lastPCMFrame -= 1; /* Needs to be zero based. */
   5730     }
   5731 
   5732     if (pFirstPCMFrame) {
   5733         *pFirstPCMFrame = firstPCMFrame;
   5734     }
   5735     if (pLastPCMFrame) {
   5736         *pLastPCMFrame = lastPCMFrame;
   5737     }
   5738 }
   5739 
   5740 static drflac_bool32 drflac__seek_to_first_frame(drflac* pFlac)
   5741 {
   5742     drflac_bool32 result;
   5743 
   5744     DRFLAC_ASSERT(pFlac != NULL);
   5745 
   5746     result = drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes);
   5747 
   5748     DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
   5749     pFlac->currentPCMFrame = 0;
   5750 
   5751     return result;
   5752 }
   5753 
   5754 static DRFLAC_INLINE drflac_result drflac__seek_to_next_flac_frame(drflac* pFlac)
   5755 {
   5756     /* This function should only ever be called while the decoder is sitting on the first byte past the FRAME_HEADER section. */
   5757     DRFLAC_ASSERT(pFlac != NULL);
   5758     return drflac__seek_flac_frame(pFlac);
   5759 }
   5760 
   5761 
   5762 static drflac_uint64 drflac__seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 pcmFramesToSeek)
   5763 {
   5764     drflac_uint64 pcmFramesRead = 0;
   5765     while (pcmFramesToSeek > 0) {
   5766         if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
   5767             if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
   5768                 break;  /* Couldn't read the next frame, so just break from the loop and return. */
   5769             }
   5770         } else {
   5771             if (pFlac->currentFLACFrame.pcmFramesRemaining > pcmFramesToSeek) {
   5772                 pcmFramesRead   += pcmFramesToSeek;
   5773                 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)pcmFramesToSeek;   /* <-- Safe cast. Will always be < currentFrame.pcmFramesRemaining < 65536. */
   5774                 pcmFramesToSeek  = 0;
   5775             } else {
   5776                 pcmFramesRead   += pFlac->currentFLACFrame.pcmFramesRemaining;
   5777                 pcmFramesToSeek -= pFlac->currentFLACFrame.pcmFramesRemaining;
   5778                 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
   5779             }
   5780         }
   5781     }
   5782 
   5783     pFlac->currentPCMFrame += pcmFramesRead;
   5784     return pcmFramesRead;
   5785 }
   5786 
   5787 
   5788 static drflac_bool32 drflac__seek_to_pcm_frame__brute_force(drflac* pFlac, drflac_uint64 pcmFrameIndex)
   5789 {
   5790     drflac_bool32 isMidFrame = DRFLAC_FALSE;
   5791     drflac_uint64 runningPCMFrameCount;
   5792 
   5793     DRFLAC_ASSERT(pFlac != NULL);
   5794 
   5795     /* If we are seeking forward we start from the current position. Otherwise we need to start all the way from the start of the file. */
   5796     if (pcmFrameIndex >= pFlac->currentPCMFrame) {
   5797         /* Seeking forward. Need to seek from the current position. */
   5798         runningPCMFrameCount = pFlac->currentPCMFrame;
   5799 
   5800         /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
   5801         if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
   5802             if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
   5803                 return DRFLAC_FALSE;
   5804             }
   5805         } else {
   5806             isMidFrame = DRFLAC_TRUE;
   5807         }
   5808     } else {
   5809         /* Seeking backwards. Need to seek from the start of the file. */
   5810         runningPCMFrameCount = 0;
   5811 
   5812         /* Move back to the start. */
   5813         if (!drflac__seek_to_first_frame(pFlac)) {
   5814             return DRFLAC_FALSE;
   5815         }
   5816 
   5817         /* Decode the first frame in preparation for sample-exact seeking below. */
   5818         if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
   5819             return DRFLAC_FALSE;
   5820         }
   5821     }
   5822 
   5823     /*
   5824     We need to as quickly as possible find the frame that contains the target sample. To do this, we iterate over each frame and inspect its
   5825     header. If based on the header we can determine that the frame contains the sample, we do a full decode of that frame.
   5826     */
   5827     for (;;) {
   5828         drflac_uint64 pcmFrameCountInThisFLACFrame;
   5829         drflac_uint64 firstPCMFrameInFLACFrame = 0;
   5830         drflac_uint64 lastPCMFrameInFLACFrame = 0;
   5831 
   5832         drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
   5833 
   5834         pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
   5835         if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
   5836             /*
   5837             The sample should be in this frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
   5838             it never existed and keep iterating.
   5839             */
   5840             drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
   5841 
   5842             if (!isMidFrame) {
   5843                 drflac_result result = drflac__decode_flac_frame(pFlac);
   5844                 if (result == DRFLAC_SUCCESS) {
   5845                     /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
   5846                     return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;  /* <-- If this fails, something bad has happened (it should never fail). */
   5847                 } else {
   5848                     if (result == DRFLAC_CRC_MISMATCH) {
   5849                         goto next_iteration;   /* CRC mismatch. Pretend this frame never existed. */
   5850                     } else {
   5851                         return DRFLAC_FALSE;
   5852                     }
   5853                 }
   5854             } else {
   5855                 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
   5856                 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
   5857             }
   5858         } else {
   5859             /*
   5860             It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
   5861             frame never existed and leave the running sample count untouched.
   5862             */
   5863             if (!isMidFrame) {
   5864                 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
   5865                 if (result == DRFLAC_SUCCESS) {
   5866                     runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
   5867                 } else {
   5868                     if (result == DRFLAC_CRC_MISMATCH) {
   5869                         goto next_iteration;   /* CRC mismatch. Pretend this frame never existed. */
   5870                     } else {
   5871                         return DRFLAC_FALSE;
   5872                     }
   5873                 }
   5874             } else {
   5875                 /*
   5876                 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
   5877                 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
   5878                 */
   5879                 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
   5880                 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
   5881                 isMidFrame = DRFLAC_FALSE;
   5882             }
   5883 
   5884             /* If we are seeking to the end of the file and we've just hit it, we're done. */
   5885             if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
   5886                 return DRFLAC_TRUE;
   5887             }
   5888         }
   5889 
   5890     next_iteration:
   5891         /* Grab the next frame in preparation for the next iteration. */
   5892         if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
   5893             return DRFLAC_FALSE;
   5894         }
   5895     }
   5896 }
   5897 
   5898 
   5899 #if !defined(DR_FLAC_NO_CRC)
   5900 /*
   5901 We use an average compression ratio to determine our approximate start location. FLAC files are generally about 50%-70% the size of their
   5902 uncompressed counterparts so we'll use this as a basis. I'm going to split the middle and use a factor of 0.6 to determine the starting
   5903 location.
   5904 */
   5905 #define DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO 0.6f
   5906 
   5907 static drflac_bool32 drflac__seek_to_approximate_flac_frame_to_byte(drflac* pFlac, drflac_uint64 targetByte, drflac_uint64 rangeLo, drflac_uint64 rangeHi, drflac_uint64* pLastSuccessfulSeekOffset)
   5908 {
   5909     DRFLAC_ASSERT(pFlac != NULL);
   5910     DRFLAC_ASSERT(pLastSuccessfulSeekOffset != NULL);
   5911     DRFLAC_ASSERT(targetByte >= rangeLo);
   5912     DRFLAC_ASSERT(targetByte <= rangeHi);
   5913 
   5914     *pLastSuccessfulSeekOffset = pFlac->firstFLACFramePosInBytes;
   5915 
   5916     for (;;) {
   5917         /* After rangeLo == rangeHi == targetByte fails, we need to break out. */
   5918         drflac_uint64 lastTargetByte = targetByte;
   5919 
   5920         /* When seeking to a byte, failure probably means we've attempted to seek beyond the end of the stream. To counter this we just halve it each attempt. */
   5921         if (!drflac__seek_to_byte(&pFlac->bs, targetByte)) {
   5922             /* If we couldn't even seek to the first byte in the stream we have a problem. Just abandon the whole thing. */
   5923             if (targetByte == 0) {
   5924                 drflac__seek_to_first_frame(pFlac); /* Try to recover. */
   5925                 return DRFLAC_FALSE;
   5926             }
   5927 
   5928             /* Halve the byte location and continue. */
   5929             targetByte = rangeLo + ((rangeHi - rangeLo)/2);
   5930             rangeHi = targetByte;
   5931         } else {
   5932             /* Getting here should mean that we have seeked to an appropriate byte. */
   5933 
   5934             /* Clear the details of the FLAC frame so we don't misreport data. */
   5935             DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
   5936 
   5937             /*
   5938             Now seek to the next FLAC frame. We need to decode the entire frame (not just the header) because it's possible for the header to incorrectly pass the
   5939             CRC check and return bad data. We need to decode the entire frame to be more certain. Although this seems unlikely, this has happened to me in testing
   5940             so it needs to stay this way for now.
   5941             */
   5942 #if 1
   5943             if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
   5944                 /* Halve the byte location and continue. */
   5945                 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
   5946                 rangeHi = targetByte;
   5947             } else {
   5948                 break;
   5949             }
   5950 #else
   5951             if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
   5952                 /* Halve the byte location and continue. */
   5953                 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
   5954                 rangeHi = targetByte;
   5955             } else {
   5956                 break;
   5957             }
   5958 #endif
   5959         }
   5960 
   5961         /* We already tried this byte and there are no more to try, break out. */
   5962         if(targetByte == lastTargetByte) {
   5963             return DRFLAC_FALSE;
   5964         }
   5965     }
   5966 
   5967     /* The current PCM frame needs to be updated based on the frame we just seeked to. */
   5968     drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
   5969 
   5970     DRFLAC_ASSERT(targetByte <= rangeHi);
   5971 
   5972     *pLastSuccessfulSeekOffset = targetByte;
   5973     return DRFLAC_TRUE;
   5974 }
   5975 
   5976 static drflac_bool32 drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 offset)
   5977 {
   5978     /* This section of code would be used if we were only decoding the FLAC frame header when calling drflac__seek_to_approximate_flac_frame_to_byte(). */
   5979 #if 0
   5980     if (drflac__decode_flac_frame(pFlac) != DRFLAC_SUCCESS) {
   5981         /* We failed to decode this frame which may be due to it being corrupt. We'll just use the next valid FLAC frame. */
   5982         if (drflac__read_and_decode_next_flac_frame(pFlac) == DRFLAC_FALSE) {
   5983             return DRFLAC_FALSE;
   5984         }
   5985     }
   5986 #endif
   5987 
   5988     return drflac__seek_forward_by_pcm_frames(pFlac, offset) == offset;
   5989 }
   5990 
   5991 
   5992 static drflac_bool32 drflac__seek_to_pcm_frame__binary_search_internal(drflac* pFlac, drflac_uint64 pcmFrameIndex, drflac_uint64 byteRangeLo, drflac_uint64 byteRangeHi)
   5993 {
   5994     /* This assumes pFlac->currentPCMFrame is sitting on byteRangeLo upon entry. */
   5995 
   5996     drflac_uint64 targetByte;
   5997     drflac_uint64 pcmRangeLo = pFlac->totalPCMFrameCount;
   5998     drflac_uint64 pcmRangeHi = 0;
   5999     drflac_uint64 lastSuccessfulSeekOffset = (drflac_uint64)-1;
   6000     drflac_uint64 closestSeekOffsetBeforeTargetPCMFrame = byteRangeLo;
   6001     drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
   6002 
   6003     targetByte = byteRangeLo + (drflac_uint64)(((drflac_int64)((pcmFrameIndex - pFlac->currentPCMFrame) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO);
   6004     if (targetByte > byteRangeHi) {
   6005         targetByte = byteRangeHi;
   6006     }
   6007 
   6008     for (;;) {
   6009         if (drflac__seek_to_approximate_flac_frame_to_byte(pFlac, targetByte, byteRangeLo, byteRangeHi, &lastSuccessfulSeekOffset)) {
   6010             /* We found a FLAC frame. We need to check if it contains the sample we're looking for. */
   6011             drflac_uint64 newPCMRangeLo;
   6012             drflac_uint64 newPCMRangeHi;
   6013             drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &newPCMRangeLo, &newPCMRangeHi);
   6014 
   6015             /* If we selected the same frame, it means we should be pretty close. Just decode the rest. */
   6016             if (pcmRangeLo == newPCMRangeLo) {
   6017                 if (!drflac__seek_to_approximate_flac_frame_to_byte(pFlac, closestSeekOffsetBeforeTargetPCMFrame, closestSeekOffsetBeforeTargetPCMFrame, byteRangeHi, &lastSuccessfulSeekOffset)) {
   6018                     break;  /* Failed to seek to closest frame. */
   6019                 }
   6020 
   6021                 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
   6022                     return DRFLAC_TRUE;
   6023                 } else {
   6024                     break;  /* Failed to seek forward. */
   6025                 }
   6026             }
   6027 
   6028             pcmRangeLo = newPCMRangeLo;
   6029             pcmRangeHi = newPCMRangeHi;
   6030 
   6031             if (pcmRangeLo <= pcmFrameIndex && pcmRangeHi >= pcmFrameIndex) {
   6032                 /* The target PCM frame is in this FLAC frame. */
   6033                 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame) ) {
   6034                     return DRFLAC_TRUE;
   6035                 } else {
   6036                     break;  /* Failed to seek to FLAC frame. */
   6037                 }
   6038             } else {
   6039                 const float approxCompressionRatio = (drflac_int64)(lastSuccessfulSeekOffset - pFlac->firstFLACFramePosInBytes) / ((drflac_int64)(pcmRangeLo * pFlac->channels * pFlac->bitsPerSample)/8.0f);
   6040 
   6041                 if (pcmRangeLo > pcmFrameIndex) {
   6042                     /* We seeked too far forward. We need to move our target byte backward and try again. */
   6043                     byteRangeHi = lastSuccessfulSeekOffset;
   6044                     if (byteRangeLo > byteRangeHi) {
   6045                         byteRangeLo = byteRangeHi;
   6046                     }
   6047 
   6048                     targetByte = byteRangeLo + ((byteRangeHi - byteRangeLo) / 2);
   6049                     if (targetByte < byteRangeLo) {
   6050                         targetByte = byteRangeLo;
   6051                     }
   6052                 } else /*if (pcmRangeHi < pcmFrameIndex)*/ {
   6053                     /* We didn't seek far enough. We need to move our target byte forward and try again. */
   6054 
   6055                     /* If we're close enough we can just seek forward. */
   6056                     if ((pcmFrameIndex - pcmRangeLo) < seekForwardThreshold) {
   6057                         if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
   6058                             return DRFLAC_TRUE;
   6059                         } else {
   6060                             break;  /* Failed to seek to FLAC frame. */
   6061                         }
   6062                     } else {
   6063                         byteRangeLo = lastSuccessfulSeekOffset;
   6064                         if (byteRangeHi < byteRangeLo) {
   6065                             byteRangeHi = byteRangeLo;
   6066                         }
   6067 
   6068                         targetByte = lastSuccessfulSeekOffset + (drflac_uint64)(((drflac_int64)((pcmFrameIndex-pcmRangeLo) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * approxCompressionRatio);
   6069                         if (targetByte > byteRangeHi) {
   6070                             targetByte = byteRangeHi;
   6071                         }
   6072 
   6073                         if (closestSeekOffsetBeforeTargetPCMFrame < lastSuccessfulSeekOffset) {
   6074                             closestSeekOffsetBeforeTargetPCMFrame = lastSuccessfulSeekOffset;
   6075                         }
   6076                     }
   6077                 }
   6078             }
   6079         } else {
   6080             /* Getting here is really bad. We just recover as best we can, but moving to the first frame in the stream, and then abort. */
   6081             break;
   6082         }
   6083     }
   6084 
   6085     drflac__seek_to_first_frame(pFlac); /* <-- Try to recover. */
   6086     return DRFLAC_FALSE;
   6087 }
   6088 
   6089 static drflac_bool32 drflac__seek_to_pcm_frame__binary_search(drflac* pFlac, drflac_uint64 pcmFrameIndex)
   6090 {
   6091     drflac_uint64 byteRangeLo;
   6092     drflac_uint64 byteRangeHi;
   6093     drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
   6094 
   6095     /* Our algorithm currently assumes the FLAC stream is currently sitting at the start. */
   6096     if (drflac__seek_to_first_frame(pFlac) == DRFLAC_FALSE) {
   6097         return DRFLAC_FALSE;
   6098     }
   6099 
   6100     /* If we're close enough to the start, just move to the start and seek forward. */
   6101     if (pcmFrameIndex < seekForwardThreshold) {
   6102         return drflac__seek_forward_by_pcm_frames(pFlac, pcmFrameIndex) == pcmFrameIndex;
   6103     }
   6104 
   6105     /*
   6106     Our starting byte range is the byte position of the first FLAC frame and the approximate end of the file as if it were completely uncompressed. This ensures
   6107     the entire file is included, even though most of the time it'll exceed the end of the actual stream. This is OK as the frame searching logic will handle it.
   6108     */
   6109     byteRangeLo = pFlac->firstFLACFramePosInBytes;
   6110     byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
   6111 
   6112     return drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi);
   6113 }
   6114 #endif  /* !DR_FLAC_NO_CRC */
   6115 
   6116 static drflac_bool32 drflac__seek_to_pcm_frame__seek_table(drflac* pFlac, drflac_uint64 pcmFrameIndex)
   6117 {
   6118     drflac_uint32 iClosestSeekpoint = 0;
   6119     drflac_bool32 isMidFrame = DRFLAC_FALSE;
   6120     drflac_uint64 runningPCMFrameCount;
   6121     drflac_uint32 iSeekpoint;
   6122 
   6123 
   6124     DRFLAC_ASSERT(pFlac != NULL);
   6125 
   6126     if (pFlac->pSeekpoints == NULL || pFlac->seekpointCount == 0) {
   6127         return DRFLAC_FALSE;
   6128     }
   6129 
   6130     /* Do not use the seektable if pcmFramIndex is not coverd by it. */
   6131     if (pFlac->pSeekpoints[0].firstPCMFrame > pcmFrameIndex) {
   6132         return DRFLAC_FALSE;
   6133     }
   6134 
   6135     for (iSeekpoint = 0; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
   6136         if (pFlac->pSeekpoints[iSeekpoint].firstPCMFrame >= pcmFrameIndex) {
   6137             break;
   6138         }
   6139 
   6140         iClosestSeekpoint = iSeekpoint;
   6141     }
   6142 
   6143     /* There's been cases where the seek table contains only zeros. We need to do some basic validation on the closest seekpoint. */
   6144     if (pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount == 0 || pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount > pFlac->maxBlockSizeInPCMFrames) {
   6145         return DRFLAC_FALSE;
   6146     }
   6147     if (pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame > pFlac->totalPCMFrameCount && pFlac->totalPCMFrameCount > 0) {
   6148         return DRFLAC_FALSE;
   6149     }
   6150 
   6151 #if !defined(DR_FLAC_NO_CRC)
   6152     /* At this point we should know the closest seek point. We can use a binary search for this. We need to know the total sample count for this. */
   6153     if (pFlac->totalPCMFrameCount > 0) {
   6154         drflac_uint64 byteRangeLo;
   6155         drflac_uint64 byteRangeHi;
   6156 
   6157         byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
   6158         byteRangeLo = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset;
   6159 
   6160         /*
   6161         If our closest seek point is not the last one, we only need to search between it and the next one. The section below calculates an appropriate starting
   6162         value for byteRangeHi which will clamp it appropriately.
   6163 
   6164         Note that the next seekpoint must have an offset greater than the closest seekpoint because otherwise our binary search algorithm will break down. There
   6165         have been cases where a seektable consists of seek points where every byte offset is set to 0 which causes problems. If this happens we need to abort.
   6166         */
   6167         if (iClosestSeekpoint < pFlac->seekpointCount-1) {
   6168             drflac_uint32 iNextSeekpoint = iClosestSeekpoint + 1;
   6169 
   6170             /* Basic validation on the seekpoints to ensure they're usable. */
   6171             if (pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset >= pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset || pFlac->pSeekpoints[iNextSeekpoint].pcmFrameCount == 0) {
   6172                 return DRFLAC_FALSE;    /* The next seekpoint doesn't look right. The seek table cannot be trusted from here. Abort. */
   6173             }
   6174 
   6175             if (pFlac->pSeekpoints[iNextSeekpoint].firstPCMFrame != (((drflac_uint64)0xFFFFFFFF << 32) | 0xFFFFFFFF)) { /* Make sure it's not a placeholder seekpoint. */
   6176                 byteRangeHi = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset - 1; /* byteRangeHi must be zero based. */
   6177             }
   6178         }
   6179 
   6180         if (drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
   6181             if (drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
   6182                 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
   6183 
   6184                 if (drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi)) {
   6185                     return DRFLAC_TRUE;
   6186                 }
   6187             }
   6188         }
   6189     }
   6190 #endif  /* !DR_FLAC_NO_CRC */
   6191 
   6192     /* Getting here means we need to use a slower algorithm because the binary search method failed or cannot be used. */
   6193 
   6194     /*
   6195     If we are seeking forward and the closest seekpoint is _before_ the current sample, we just seek forward from where we are. Otherwise we start seeking
   6196     from the seekpoint's first sample.
   6197     */
   6198     if (pcmFrameIndex >= pFlac->currentPCMFrame && pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame <= pFlac->currentPCMFrame) {
   6199         /* Optimized case. Just seek forward from where we are. */
   6200         runningPCMFrameCount = pFlac->currentPCMFrame;
   6201 
   6202         /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
   6203         if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
   6204             if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
   6205                 return DRFLAC_FALSE;
   6206             }
   6207         } else {
   6208             isMidFrame = DRFLAC_TRUE;
   6209         }
   6210     } else {
   6211         /* Slower case. Seek to the start of the seekpoint and then seek forward from there. */
   6212         runningPCMFrameCount = pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame;
   6213 
   6214         if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
   6215             return DRFLAC_FALSE;
   6216         }
   6217 
   6218         /* Grab the frame the seekpoint is sitting on in preparation for the sample-exact seeking below. */
   6219         if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
   6220             return DRFLAC_FALSE;
   6221         }
   6222     }
   6223 
   6224     for (;;) {
   6225         drflac_uint64 pcmFrameCountInThisFLACFrame;
   6226         drflac_uint64 firstPCMFrameInFLACFrame = 0;
   6227         drflac_uint64 lastPCMFrameInFLACFrame = 0;
   6228 
   6229         drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
   6230 
   6231         pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
   6232         if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
   6233             /*
   6234             The sample should be in this frame. We need to fully decode it, but if it's an invalid frame (a CRC mismatch) we need to pretend
   6235             it never existed and keep iterating.
   6236             */
   6237             drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
   6238 
   6239             if (!isMidFrame) {
   6240                 drflac_result result = drflac__decode_flac_frame(pFlac);
   6241                 if (result == DRFLAC_SUCCESS) {
   6242                     /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
   6243                     return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;  /* <-- If this fails, something bad has happened (it should never fail). */
   6244                 } else {
   6245                     if (result == DRFLAC_CRC_MISMATCH) {
   6246                         goto next_iteration;   /* CRC mismatch. Pretend this frame never existed. */
   6247                     } else {
   6248                         return DRFLAC_FALSE;
   6249                     }
   6250                 }
   6251             } else {
   6252                 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
   6253                 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
   6254             }
   6255         } else {
   6256             /*
   6257             It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
   6258             frame never existed and leave the running sample count untouched.
   6259             */
   6260             if (!isMidFrame) {
   6261                 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
   6262                 if (result == DRFLAC_SUCCESS) {
   6263                     runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
   6264                 } else {
   6265                     if (result == DRFLAC_CRC_MISMATCH) {
   6266                         goto next_iteration;   /* CRC mismatch. Pretend this frame never existed. */
   6267                     } else {
   6268                         return DRFLAC_FALSE;
   6269                     }
   6270                 }
   6271             } else {
   6272                 /*
   6273                 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
   6274                 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
   6275                 */
   6276                 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
   6277                 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
   6278                 isMidFrame = DRFLAC_FALSE;
   6279             }
   6280 
   6281             /* If we are seeking to the end of the file and we've just hit it, we're done. */
   6282             if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
   6283                 return DRFLAC_TRUE;
   6284             }
   6285         }
   6286 
   6287     next_iteration:
   6288         /* Grab the next frame in preparation for the next iteration. */
   6289         if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
   6290             return DRFLAC_FALSE;
   6291         }
   6292     }
   6293 }
   6294 
   6295 
   6296 #ifndef DR_FLAC_NO_OGG
   6297 typedef struct
   6298 {
   6299     drflac_uint8 capturePattern[4];  /* Should be "OggS" */
   6300     drflac_uint8 structureVersion;   /* Always 0. */
   6301     drflac_uint8 headerType;
   6302     drflac_uint64 granulePosition;
   6303     drflac_uint32 serialNumber;
   6304     drflac_uint32 sequenceNumber;
   6305     drflac_uint32 checksum;
   6306     drflac_uint8 segmentCount;
   6307     drflac_uint8 segmentTable[255];
   6308 } drflac_ogg_page_header;
   6309 #endif
   6310 
   6311 typedef struct
   6312 {
   6313     drflac_read_proc onRead;
   6314     drflac_seek_proc onSeek;
   6315     drflac_meta_proc onMeta;
   6316     drflac_container container;
   6317     void* pUserData;
   6318     void* pUserDataMD;
   6319     drflac_uint32 sampleRate;
   6320     drflac_uint8  channels;
   6321     drflac_uint8  bitsPerSample;
   6322     drflac_uint64 totalPCMFrameCount;
   6323     drflac_uint16 maxBlockSizeInPCMFrames;
   6324     drflac_uint64 runningFilePos;
   6325     drflac_bool32 hasStreamInfoBlock;
   6326     drflac_bool32 hasMetadataBlocks;
   6327     drflac_bs bs;                           /* <-- A bit streamer is required for loading data during initialization. */
   6328     drflac_frame_header firstFrameHeader;   /* <-- The header of the first frame that was read during relaxed initalization. Only set if there is no STREAMINFO block. */
   6329 
   6330 #ifndef DR_FLAC_NO_OGG
   6331     drflac_uint32 oggSerial;
   6332     drflac_uint64 oggFirstBytePos;
   6333     drflac_ogg_page_header oggBosHeader;
   6334 #endif
   6335 } drflac_init_info;
   6336 
   6337 static DRFLAC_INLINE void drflac__decode_block_header(drflac_uint32 blockHeader, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
   6338 {
   6339     blockHeader = drflac__be2host_32(blockHeader);
   6340     *isLastBlock = (drflac_uint8)((blockHeader & 0x80000000UL) >> 31);
   6341     *blockType   = (drflac_uint8)((blockHeader & 0x7F000000UL) >> 24);
   6342     *blockSize   =                (blockHeader & 0x00FFFFFFUL);
   6343 }
   6344 
   6345 static DRFLAC_INLINE drflac_bool32 drflac__read_and_decode_block_header(drflac_read_proc onRead, void* pUserData, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
   6346 {
   6347     drflac_uint32 blockHeader;
   6348 
   6349     *blockSize = 0;
   6350     if (onRead(pUserData, &blockHeader, 4) != 4) {
   6351         return DRFLAC_FALSE;
   6352     }
   6353 
   6354     drflac__decode_block_header(blockHeader, isLastBlock, blockType, blockSize);
   6355     return DRFLAC_TRUE;
   6356 }
   6357 
   6358 static drflac_bool32 drflac__read_streaminfo(drflac_read_proc onRead, void* pUserData, drflac_streaminfo* pStreamInfo)
   6359 {
   6360     drflac_uint32 blockSizes;
   6361     drflac_uint64 frameSizes = 0;
   6362     drflac_uint64 importantProps;
   6363     drflac_uint8 md5[16];
   6364 
   6365     /* min/max block size. */
   6366     if (onRead(pUserData, &blockSizes, 4) != 4) {
   6367         return DRFLAC_FALSE;
   6368     }
   6369 
   6370     /* min/max frame size. */
   6371     if (onRead(pUserData, &frameSizes, 6) != 6) {
   6372         return DRFLAC_FALSE;
   6373     }
   6374 
   6375     /* Sample rate, channels, bits per sample and total sample count. */
   6376     if (onRead(pUserData, &importantProps, 8) != 8) {
   6377         return DRFLAC_FALSE;
   6378     }
   6379 
   6380     /* MD5 */
   6381     if (onRead(pUserData, md5, sizeof(md5)) != sizeof(md5)) {
   6382         return DRFLAC_FALSE;
   6383     }
   6384 
   6385     blockSizes     = drflac__be2host_32(blockSizes);
   6386     frameSizes     = drflac__be2host_64(frameSizes);
   6387     importantProps = drflac__be2host_64(importantProps);
   6388 
   6389     pStreamInfo->minBlockSizeInPCMFrames = (drflac_uint16)((blockSizes & 0xFFFF0000) >> 16);
   6390     pStreamInfo->maxBlockSizeInPCMFrames = (drflac_uint16) (blockSizes & 0x0000FFFF);
   6391     pStreamInfo->minFrameSizeInPCMFrames = (drflac_uint32)((frameSizes     &  (((drflac_uint64)0x00FFFFFF << 16) << 24)) >> 40);
   6392     pStreamInfo->maxFrameSizeInPCMFrames = (drflac_uint32)((frameSizes     &  (((drflac_uint64)0x00FFFFFF << 16) <<  0)) >> 16);
   6393     pStreamInfo->sampleRate              = (drflac_uint32)((importantProps &  (((drflac_uint64)0x000FFFFF << 16) << 28)) >> 44);
   6394     pStreamInfo->channels                = (drflac_uint8 )((importantProps &  (((drflac_uint64)0x0000000E << 16) << 24)) >> 41) + 1;
   6395     pStreamInfo->bitsPerSample           = (drflac_uint8 )((importantProps &  (((drflac_uint64)0x0000001F << 16) << 20)) >> 36) + 1;
   6396     pStreamInfo->totalPCMFrameCount      =                ((importantProps & ((((drflac_uint64)0x0000000F << 16) << 16) | 0xFFFFFFFF)));
   6397     DRFLAC_COPY_MEMORY(pStreamInfo->md5, md5, sizeof(md5));
   6398 
   6399     return DRFLAC_TRUE;
   6400 }
   6401 
   6402 
   6403 static void* drflac__malloc_default(size_t sz, void* pUserData)
   6404 {
   6405     (void)pUserData;
   6406     return DRFLAC_MALLOC(sz);
   6407 }
   6408 
   6409 static void* drflac__realloc_default(void* p, size_t sz, void* pUserData)
   6410 {
   6411     (void)pUserData;
   6412     return DRFLAC_REALLOC(p, sz);
   6413 }
   6414 
   6415 static void drflac__free_default(void* p, void* pUserData)
   6416 {
   6417     (void)pUserData;
   6418     DRFLAC_FREE(p);
   6419 }
   6420 
   6421 
   6422 static void* drflac__malloc_from_callbacks(size_t sz, const drflac_allocation_callbacks* pAllocationCallbacks)
   6423 {
   6424     if (pAllocationCallbacks == NULL) {
   6425         return NULL;
   6426     }
   6427 
   6428     if (pAllocationCallbacks->onMalloc != NULL) {
   6429         return pAllocationCallbacks->onMalloc(sz, pAllocationCallbacks->pUserData);
   6430     }
   6431 
   6432     /* Try using realloc(). */
   6433     if (pAllocationCallbacks->onRealloc != NULL) {
   6434         return pAllocationCallbacks->onRealloc(NULL, sz, pAllocationCallbacks->pUserData);
   6435     }
   6436 
   6437     return NULL;
   6438 }
   6439 
   6440 static void* drflac__realloc_from_callbacks(void* p, size_t szNew, size_t szOld, const drflac_allocation_callbacks* pAllocationCallbacks)
   6441 {
   6442     if (pAllocationCallbacks == NULL) {
   6443         return NULL;
   6444     }
   6445 
   6446     if (pAllocationCallbacks->onRealloc != NULL) {
   6447         return pAllocationCallbacks->onRealloc(p, szNew, pAllocationCallbacks->pUserData);
   6448     }
   6449 
   6450     /* Try emulating realloc() in terms of malloc()/free(). */
   6451     if (pAllocationCallbacks->onMalloc != NULL && pAllocationCallbacks->onFree != NULL) {
   6452         void* p2;
   6453 
   6454         p2 = pAllocationCallbacks->onMalloc(szNew, pAllocationCallbacks->pUserData);
   6455         if (p2 == NULL) {
   6456             return NULL;
   6457         }
   6458 
   6459         if (p != NULL) {
   6460             DRFLAC_COPY_MEMORY(p2, p, szOld);
   6461             pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
   6462         }
   6463 
   6464         return p2;
   6465     }
   6466 
   6467     return NULL;
   6468 }
   6469 
   6470 static void drflac__free_from_callbacks(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
   6471 {
   6472     if (p == NULL || pAllocationCallbacks == NULL) {
   6473         return;
   6474     }
   6475 
   6476     if (pAllocationCallbacks->onFree != NULL) {
   6477         pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
   6478     }
   6479 }
   6480 
   6481 
   6482 static drflac_bool32 drflac__read_and_decode_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_uint64* pFirstFramePos, drflac_uint64* pSeektablePos, drflac_uint32* pSeekpointCount, drflac_allocation_callbacks* pAllocationCallbacks)
   6483 {
   6484     /*
   6485     We want to keep track of the byte position in the stream of the seektable. At the time of calling this function we know that
   6486     we'll be sitting on byte 42.
   6487     */
   6488     drflac_uint64 runningFilePos = 42;
   6489     drflac_uint64 seektablePos   = 0;
   6490     drflac_uint32 seektableSize  = 0;
   6491 
   6492     for (;;) {
   6493         drflac_metadata metadata;
   6494         drflac_uint8 isLastBlock = 0;
   6495         drflac_uint8 blockType = 0;
   6496         drflac_uint32 blockSize;
   6497         if (drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize) == DRFLAC_FALSE) {
   6498             return DRFLAC_FALSE;
   6499         }
   6500         runningFilePos += 4;
   6501 
   6502         metadata.type = blockType;
   6503         metadata.pRawData = NULL;
   6504         metadata.rawDataSize = 0;
   6505 
   6506         switch (blockType)
   6507         {
   6508             case DRFLAC_METADATA_BLOCK_TYPE_APPLICATION:
   6509             {
   6510                 if (blockSize < 4) {
   6511                     return DRFLAC_FALSE;
   6512                 }
   6513 
   6514                 if (onMeta) {
   6515                     void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
   6516                     if (pRawData == NULL) {
   6517                         return DRFLAC_FALSE;
   6518                     }
   6519 
   6520                     if (onRead(pUserData, pRawData, blockSize) != blockSize) {
   6521                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6522                         return DRFLAC_FALSE;
   6523                     }
   6524 
   6525                     metadata.pRawData = pRawData;
   6526                     metadata.rawDataSize = blockSize;
   6527                     metadata.data.application.id       = drflac__be2host_32(*(drflac_uint32*)pRawData);
   6528                     metadata.data.application.pData    = (const void*)((drflac_uint8*)pRawData + sizeof(drflac_uint32));
   6529                     metadata.data.application.dataSize = blockSize - sizeof(drflac_uint32);
   6530                     onMeta(pUserDataMD, &metadata);
   6531 
   6532                     drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6533                 }
   6534             } break;
   6535 
   6536             case DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE:
   6537             {
   6538                 seektablePos  = runningFilePos;
   6539                 seektableSize = blockSize;
   6540 
   6541                 if (onMeta) {
   6542                     drflac_uint32 seekpointCount;
   6543                     drflac_uint32 iSeekpoint;
   6544                     void* pRawData;
   6545 
   6546                     seekpointCount = blockSize/DRFLAC_SEEKPOINT_SIZE_IN_BYTES;
   6547 
   6548                     pRawData = drflac__malloc_from_callbacks(seekpointCount * sizeof(drflac_seekpoint), pAllocationCallbacks);
   6549                     if (pRawData == NULL) {
   6550                         return DRFLAC_FALSE;
   6551                     }
   6552 
   6553                     /* We need to read seekpoint by seekpoint and do some processing. */
   6554                     for (iSeekpoint = 0; iSeekpoint < seekpointCount; ++iSeekpoint) {
   6555                         drflac_seekpoint* pSeekpoint = (drflac_seekpoint*)pRawData + iSeekpoint;
   6556 
   6557                         if (onRead(pUserData, pSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) != DRFLAC_SEEKPOINT_SIZE_IN_BYTES) {
   6558                             drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6559                             return DRFLAC_FALSE;
   6560                         }
   6561 
   6562                         /* Endian swap. */
   6563                         pSeekpoint->firstPCMFrame   = drflac__be2host_64(pSeekpoint->firstPCMFrame);
   6564                         pSeekpoint->flacFrameOffset = drflac__be2host_64(pSeekpoint->flacFrameOffset);
   6565                         pSeekpoint->pcmFrameCount   = drflac__be2host_16(pSeekpoint->pcmFrameCount);
   6566                     }
   6567 
   6568                     metadata.pRawData = pRawData;
   6569                     metadata.rawDataSize = blockSize;
   6570                     metadata.data.seektable.seekpointCount = seekpointCount;
   6571                     metadata.data.seektable.pSeekpoints = (const drflac_seekpoint*)pRawData;
   6572 
   6573                     onMeta(pUserDataMD, &metadata);
   6574 
   6575                     drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6576                 }
   6577             } break;
   6578 
   6579             case DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT:
   6580             {
   6581                 if (blockSize < 8) {
   6582                     return DRFLAC_FALSE;
   6583                 }
   6584 
   6585                 if (onMeta) {
   6586                     void* pRawData;
   6587                     const char* pRunningData;
   6588                     const char* pRunningDataEnd;
   6589                     drflac_uint32 i;
   6590 
   6591                     pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
   6592                     if (pRawData == NULL) {
   6593                         return DRFLAC_FALSE;
   6594                     }
   6595 
   6596                     if (onRead(pUserData, pRawData, blockSize) != blockSize) {
   6597                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6598                         return DRFLAC_FALSE;
   6599                     }
   6600 
   6601                     metadata.pRawData = pRawData;
   6602                     metadata.rawDataSize = blockSize;
   6603 
   6604                     pRunningData    = (const char*)pRawData;
   6605                     pRunningDataEnd = (const char*)pRawData + blockSize;
   6606 
   6607                     metadata.data.vorbis_comment.vendorLength = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
   6608 
   6609                     /* Need space for the rest of the block */
   6610                     if ((pRunningDataEnd - pRunningData) - 4 < (drflac_int64)metadata.data.vorbis_comment.vendorLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
   6611                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6612                         return DRFLAC_FALSE;
   6613                     }
   6614                     metadata.data.vorbis_comment.vendor       = pRunningData;                                            pRunningData += metadata.data.vorbis_comment.vendorLength;
   6615                     metadata.data.vorbis_comment.commentCount = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
   6616 
   6617                     /* Need space for 'commentCount' comments after the block, which at minimum is a drflac_uint32 per comment */
   6618                     if ((pRunningDataEnd - pRunningData) / sizeof(drflac_uint32) < metadata.data.vorbis_comment.commentCount) { /* <-- Note the order of operations to avoid overflow to a valid value */
   6619                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6620                         return DRFLAC_FALSE;
   6621                     }
   6622                     metadata.data.vorbis_comment.pComments    = pRunningData;
   6623 
   6624                     /* Check that the comments section is valid before passing it to the callback */
   6625                     for (i = 0; i < metadata.data.vorbis_comment.commentCount; ++i) {
   6626                         drflac_uint32 commentLength;
   6627 
   6628                         if (pRunningDataEnd - pRunningData < 4) {
   6629                             drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6630                             return DRFLAC_FALSE;
   6631                         }
   6632 
   6633                         commentLength = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
   6634                         if (pRunningDataEnd - pRunningData < (drflac_int64)commentLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
   6635                             drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6636                             return DRFLAC_FALSE;
   6637                         }
   6638                         pRunningData += commentLength;
   6639                     }
   6640 
   6641                     onMeta(pUserDataMD, &metadata);
   6642 
   6643                     drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6644                 }
   6645             } break;
   6646 
   6647             case DRFLAC_METADATA_BLOCK_TYPE_CUESHEET:
   6648             {
   6649                 if (blockSize < 396) {
   6650                     return DRFLAC_FALSE;
   6651                 }
   6652 
   6653                 if (onMeta) {
   6654                     void* pRawData;
   6655                     const char* pRunningData;
   6656                     const char* pRunningDataEnd;
   6657                     size_t bufferSize;
   6658                     drflac_uint8 iTrack;
   6659                     drflac_uint8 iIndex;
   6660                     void* pTrackData;
   6661 
   6662                     /*
   6663                     This needs to be loaded in two passes. The first pass is used to calculate the size of the memory allocation
   6664                     we need for storing the necessary data. The second pass will fill that buffer with usable data.
   6665                     */
   6666                     pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
   6667                     if (pRawData == NULL) {
   6668                         return DRFLAC_FALSE;
   6669                     }
   6670 
   6671                     if (onRead(pUserData, pRawData, blockSize) != blockSize) {
   6672                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6673                         return DRFLAC_FALSE;
   6674                     }
   6675 
   6676                     metadata.pRawData = pRawData;
   6677                     metadata.rawDataSize = blockSize;
   6678 
   6679                     pRunningData    = (const char*)pRawData;
   6680                     pRunningDataEnd = (const char*)pRawData + blockSize;
   6681 
   6682                     DRFLAC_COPY_MEMORY(metadata.data.cuesheet.catalog, pRunningData, 128);                              pRunningData += 128;
   6683                     metadata.data.cuesheet.leadInSampleCount = drflac__be2host_64(*(const drflac_uint64*)pRunningData); pRunningData += 8;
   6684                     metadata.data.cuesheet.isCD              = (pRunningData[0] & 0x80) != 0;                           pRunningData += 259;
   6685                     metadata.data.cuesheet.trackCount        = pRunningData[0];                                         pRunningData += 1;
   6686                     metadata.data.cuesheet.pTrackData        = NULL;    /* Will be filled later. */
   6687 
   6688                     /* Pass 1: Calculate the size of the buffer for the track data. */
   6689                     {
   6690                         const char* pRunningDataSaved = pRunningData;   /* Will be restored at the end in preparation for the second pass. */
   6691 
   6692                         bufferSize = metadata.data.cuesheet.trackCount * DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES;
   6693 
   6694                         for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
   6695                             drflac_uint8 indexCount;
   6696                             drflac_uint32 indexPointSize;
   6697 
   6698                             if (pRunningDataEnd - pRunningData < DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES) {
   6699                                 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6700                                 return DRFLAC_FALSE;
   6701                             }
   6702 
   6703                             /* Skip to the index point count */
   6704                             pRunningData += 35;
   6705                             
   6706                             indexCount = pRunningData[0];
   6707                             pRunningData += 1;
   6708                             
   6709                             bufferSize += indexCount * sizeof(drflac_cuesheet_track_index);
   6710 
   6711                             /* Quick validation check. */
   6712                             indexPointSize = indexCount * DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
   6713                             if (pRunningDataEnd - pRunningData < (drflac_int64)indexPointSize) {
   6714                                 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6715                                 return DRFLAC_FALSE;
   6716                             }
   6717 
   6718                             pRunningData += indexPointSize;
   6719                         }
   6720 
   6721                         pRunningData = pRunningDataSaved;
   6722                     }
   6723 
   6724                     /* Pass 2: Allocate a buffer and fill the data. Validation was done in the step above so can be skipped. */
   6725                     {
   6726                         char* pRunningTrackData;
   6727 
   6728                         pTrackData = drflac__malloc_from_callbacks(bufferSize, pAllocationCallbacks);
   6729                         if (pTrackData == NULL) {
   6730                             drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6731                             return DRFLAC_FALSE;
   6732                         }
   6733 
   6734                         pRunningTrackData = (char*)pTrackData;
   6735 
   6736                         for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
   6737                             drflac_uint8 indexCount;
   6738 
   6739                             DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES);
   6740                             pRunningData      += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1; /* Skip forward, but not beyond the last byte in the CUESHEET_TRACK block which is the index count. */
   6741                             pRunningTrackData += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1;
   6742 
   6743                             /* Grab the index count for the next part. */
   6744                             indexCount = pRunningData[0];
   6745                             pRunningData      += 1;
   6746                             pRunningTrackData += 1;
   6747 
   6748                             /* Extract each track index. */
   6749                             for (iIndex = 0; iIndex < indexCount; ++iIndex) {
   6750                                 drflac_cuesheet_track_index* pTrackIndex = (drflac_cuesheet_track_index*)pRunningTrackData;
   6751 
   6752                                 DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES);
   6753                                 pRunningData      += DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
   6754                                 pRunningTrackData += sizeof(drflac_cuesheet_track_index);
   6755 
   6756                                 pTrackIndex->offset = drflac__be2host_64(pTrackIndex->offset);
   6757                             }
   6758                         }
   6759 
   6760                         metadata.data.cuesheet.pTrackData = pTrackData;
   6761                     }
   6762 
   6763                     /* The original data is no longer needed. */
   6764                     drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6765                     pRawData = NULL;
   6766 
   6767                     onMeta(pUserDataMD, &metadata);
   6768 
   6769                     drflac__free_from_callbacks(pTrackData, pAllocationCallbacks);
   6770                     pTrackData = NULL;
   6771                 }
   6772             } break;
   6773 
   6774             case DRFLAC_METADATA_BLOCK_TYPE_PICTURE:
   6775             {
   6776                 if (blockSize < 32) {
   6777                     return DRFLAC_FALSE;
   6778                 }
   6779 
   6780                 if (onMeta) {
   6781                     void* pRawData;
   6782                     const char* pRunningData;
   6783                     const char* pRunningDataEnd;
   6784 
   6785                     pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
   6786                     if (pRawData == NULL) {
   6787                         return DRFLAC_FALSE;
   6788                     }
   6789 
   6790                     if (onRead(pUserData, pRawData, blockSize) != blockSize) {
   6791                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6792                         return DRFLAC_FALSE;
   6793                     }
   6794 
   6795                     metadata.pRawData = pRawData;
   6796                     metadata.rawDataSize = blockSize;
   6797 
   6798                     pRunningData    = (const char*)pRawData;
   6799                     pRunningDataEnd = (const char*)pRawData + blockSize;
   6800 
   6801                     metadata.data.picture.type       = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
   6802                     metadata.data.picture.mimeLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
   6803 
   6804                     /* Need space for the rest of the block */
   6805                     if ((pRunningDataEnd - pRunningData) - 24 < (drflac_int64)metadata.data.picture.mimeLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
   6806                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6807                         return DRFLAC_FALSE;
   6808                     }
   6809                     metadata.data.picture.mime              = pRunningData;                                   pRunningData += metadata.data.picture.mimeLength;
   6810                     metadata.data.picture.descriptionLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
   6811 
   6812                     /* Need space for the rest of the block */
   6813                     if ((pRunningDataEnd - pRunningData) - 20 < (drflac_int64)metadata.data.picture.descriptionLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
   6814                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6815                         return DRFLAC_FALSE;
   6816                     }
   6817                     metadata.data.picture.description     = pRunningData;                                   pRunningData += metadata.data.picture.descriptionLength;
   6818                     metadata.data.picture.width           = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
   6819                     metadata.data.picture.height          = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
   6820                     metadata.data.picture.colorDepth      = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
   6821                     metadata.data.picture.indexColorCount = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
   6822                     metadata.data.picture.pictureDataSize = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
   6823                     metadata.data.picture.pPictureData    = (const drflac_uint8*)pRunningData;
   6824 
   6825                     /* Need space for the picture after the block */
   6826                     if (pRunningDataEnd - pRunningData < (drflac_int64)metadata.data.picture.pictureDataSize) { /* <-- Note the order of operations to avoid overflow to a valid value */
   6827                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6828                         return DRFLAC_FALSE;
   6829                     }
   6830 
   6831                     onMeta(pUserDataMD, &metadata);
   6832 
   6833                     drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6834                 }
   6835             } break;
   6836 
   6837             case DRFLAC_METADATA_BLOCK_TYPE_PADDING:
   6838             {
   6839                 if (onMeta) {
   6840                     metadata.data.padding.unused = 0;
   6841 
   6842                     /* Padding doesn't have anything meaningful in it, so just skip over it, but make sure the caller is aware of it by firing the callback. */
   6843                     if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
   6844                         isLastBlock = DRFLAC_TRUE;  /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
   6845                     } else {
   6846                         onMeta(pUserDataMD, &metadata);
   6847                     }
   6848                 }
   6849             } break;
   6850 
   6851             case DRFLAC_METADATA_BLOCK_TYPE_INVALID:
   6852             {
   6853                 /* Invalid chunk. Just skip over this one. */
   6854                 if (onMeta) {
   6855                     if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
   6856                         isLastBlock = DRFLAC_TRUE;  /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
   6857                     }
   6858                 }
   6859             } break;
   6860 
   6861             default:
   6862             {
   6863                 /*
   6864                 It's an unknown chunk, but not necessarily invalid. There's a chance more metadata blocks might be defined later on, so we
   6865                 can at the very least report the chunk to the application and let it look at the raw data.
   6866                 */
   6867                 if (onMeta) {
   6868                     void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
   6869                     if (pRawData == NULL) {
   6870                         return DRFLAC_FALSE;
   6871                     }
   6872 
   6873                     if (onRead(pUserData, pRawData, blockSize) != blockSize) {
   6874                         drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6875                         return DRFLAC_FALSE;
   6876                     }
   6877 
   6878                     metadata.pRawData = pRawData;
   6879                     metadata.rawDataSize = blockSize;
   6880                     onMeta(pUserDataMD, &metadata);
   6881 
   6882                     drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
   6883                 }
   6884             } break;
   6885         }
   6886 
   6887         /* If we're not handling metadata, just skip over the block. If we are, it will have been handled earlier in the switch statement above. */
   6888         if (onMeta == NULL && blockSize > 0) {
   6889             if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
   6890                 isLastBlock = DRFLAC_TRUE;
   6891             }
   6892         }
   6893 
   6894         runningFilePos += blockSize;
   6895         if (isLastBlock) {
   6896             break;
   6897         }
   6898     }
   6899 
   6900     *pSeektablePos   = seektablePos;
   6901     *pSeekpointCount = seektableSize / DRFLAC_SEEKPOINT_SIZE_IN_BYTES;
   6902     *pFirstFramePos  = runningFilePos;
   6903 
   6904     return DRFLAC_TRUE;
   6905 }
   6906 
   6907 static drflac_bool32 drflac__init_private__native(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
   6908 {
   6909     /* Pre Condition: The bit stream should be sitting just past the 4-byte id header. */
   6910 
   6911     drflac_uint8 isLastBlock;
   6912     drflac_uint8 blockType;
   6913     drflac_uint32 blockSize;
   6914 
   6915     (void)onSeek;
   6916 
   6917     pInit->container = drflac_container_native;
   6918 
   6919     /* The first metadata block should be the STREAMINFO block. */
   6920     if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
   6921         return DRFLAC_FALSE;
   6922     }
   6923 
   6924     if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
   6925         if (!relaxed) {
   6926             /* We're opening in strict mode and the first block is not the STREAMINFO block. Error. */
   6927             return DRFLAC_FALSE;
   6928         } else {
   6929             /*
   6930             Relaxed mode. To open from here we need to just find the first frame and set the sample rate, etc. to whatever is defined
   6931             for that frame.
   6932             */
   6933             pInit->hasStreamInfoBlock = DRFLAC_FALSE;
   6934             pInit->hasMetadataBlocks  = DRFLAC_FALSE;
   6935 
   6936             if (!drflac__read_next_flac_frame_header(&pInit->bs, 0, &pInit->firstFrameHeader)) {
   6937                 return DRFLAC_FALSE;    /* Couldn't find a frame. */
   6938             }
   6939 
   6940             if (pInit->firstFrameHeader.bitsPerSample == 0) {
   6941                 return DRFLAC_FALSE;    /* Failed to initialize because the first frame depends on the STREAMINFO block, which does not exist. */
   6942             }
   6943 
   6944             pInit->sampleRate              = pInit->firstFrameHeader.sampleRate;
   6945             pInit->channels                = drflac__get_channel_count_from_channel_assignment(pInit->firstFrameHeader.channelAssignment);
   6946             pInit->bitsPerSample           = pInit->firstFrameHeader.bitsPerSample;
   6947             pInit->maxBlockSizeInPCMFrames = 65535;   /* <-- See notes here: https://xiph.org/flac/format.html#metadata_block_streaminfo */
   6948             return DRFLAC_TRUE;
   6949         }
   6950     } else {
   6951         drflac_streaminfo streaminfo;
   6952         if (!drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
   6953             return DRFLAC_FALSE;
   6954         }
   6955 
   6956         pInit->hasStreamInfoBlock      = DRFLAC_TRUE;
   6957         pInit->sampleRate              = streaminfo.sampleRate;
   6958         pInit->channels                = streaminfo.channels;
   6959         pInit->bitsPerSample           = streaminfo.bitsPerSample;
   6960         pInit->totalPCMFrameCount      = streaminfo.totalPCMFrameCount;
   6961         pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;    /* Don't care about the min block size - only the max (used for determining the size of the memory allocation). */
   6962         pInit->hasMetadataBlocks       = !isLastBlock;
   6963 
   6964         if (onMeta) {
   6965             drflac_metadata metadata;
   6966             metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
   6967             metadata.pRawData = NULL;
   6968             metadata.rawDataSize = 0;
   6969             metadata.data.streaminfo = streaminfo;
   6970             onMeta(pUserDataMD, &metadata);
   6971         }
   6972 
   6973         return DRFLAC_TRUE;
   6974     }
   6975 }
   6976 
   6977 #ifndef DR_FLAC_NO_OGG
   6978 #define DRFLAC_OGG_MAX_PAGE_SIZE            65307
   6979 #define DRFLAC_OGG_CAPTURE_PATTERN_CRC32    1605413199  /* CRC-32 of "OggS". */
   6980 
   6981 typedef enum
   6982 {
   6983     drflac_ogg_recover_on_crc_mismatch,
   6984     drflac_ogg_fail_on_crc_mismatch
   6985 } drflac_ogg_crc_mismatch_recovery;
   6986 
   6987 #ifndef DR_FLAC_NO_CRC
   6988 static drflac_uint32 drflac__crc32_table[] = {
   6989     0x00000000L, 0x04C11DB7L, 0x09823B6EL, 0x0D4326D9L,
   6990     0x130476DCL, 0x17C56B6BL, 0x1A864DB2L, 0x1E475005L,
   6991     0x2608EDB8L, 0x22C9F00FL, 0x2F8AD6D6L, 0x2B4BCB61L,
   6992     0x350C9B64L, 0x31CD86D3L, 0x3C8EA00AL, 0x384FBDBDL,
   6993     0x4C11DB70L, 0x48D0C6C7L, 0x4593E01EL, 0x4152FDA9L,
   6994     0x5F15ADACL, 0x5BD4B01BL, 0x569796C2L, 0x52568B75L,
   6995     0x6A1936C8L, 0x6ED82B7FL, 0x639B0DA6L, 0x675A1011L,
   6996     0x791D4014L, 0x7DDC5DA3L, 0x709F7B7AL, 0x745E66CDL,
   6997     0x9823B6E0L, 0x9CE2AB57L, 0x91A18D8EL, 0x95609039L,
   6998     0x8B27C03CL, 0x8FE6DD8BL, 0x82A5FB52L, 0x8664E6E5L,
   6999     0xBE2B5B58L, 0xBAEA46EFL, 0xB7A96036L, 0xB3687D81L,
   7000     0xAD2F2D84L, 0xA9EE3033L, 0xA4AD16EAL, 0xA06C0B5DL,
   7001     0xD4326D90L, 0xD0F37027L, 0xDDB056FEL, 0xD9714B49L,
   7002     0xC7361B4CL, 0xC3F706FBL, 0xCEB42022L, 0xCA753D95L,
   7003     0xF23A8028L, 0xF6FB9D9FL, 0xFBB8BB46L, 0xFF79A6F1L,
   7004     0xE13EF6F4L, 0xE5FFEB43L, 0xE8BCCD9AL, 0xEC7DD02DL,
   7005     0x34867077L, 0x30476DC0L, 0x3D044B19L, 0x39C556AEL,
   7006     0x278206ABL, 0x23431B1CL, 0x2E003DC5L, 0x2AC12072L,
   7007     0x128E9DCFL, 0x164F8078L, 0x1B0CA6A1L, 0x1FCDBB16L,
   7008     0x018AEB13L, 0x054BF6A4L, 0x0808D07DL, 0x0CC9CDCAL,
   7009     0x7897AB07L, 0x7C56B6B0L, 0x71159069L, 0x75D48DDEL,
   7010     0x6B93DDDBL, 0x6F52C06CL, 0x6211E6B5L, 0x66D0FB02L,
   7011     0x5E9F46BFL, 0x5A5E5B08L, 0x571D7DD1L, 0x53DC6066L,
   7012     0x4D9B3063L, 0x495A2DD4L, 0x44190B0DL, 0x40D816BAL,
   7013     0xACA5C697L, 0xA864DB20L, 0xA527FDF9L, 0xA1E6E04EL,
   7014     0xBFA1B04BL, 0xBB60ADFCL, 0xB6238B25L, 0xB2E29692L,
   7015     0x8AAD2B2FL, 0x8E6C3698L, 0x832F1041L, 0x87EE0DF6L,
   7016     0x99A95DF3L, 0x9D684044L, 0x902B669DL, 0x94EA7B2AL,
   7017     0xE0B41DE7L, 0xE4750050L, 0xE9362689L, 0xEDF73B3EL,
   7018     0xF3B06B3BL, 0xF771768CL, 0xFA325055L, 0xFEF34DE2L,
   7019     0xC6BCF05FL, 0xC27DEDE8L, 0xCF3ECB31L, 0xCBFFD686L,
   7020     0xD5B88683L, 0xD1799B34L, 0xDC3ABDEDL, 0xD8FBA05AL,
   7021     0x690CE0EEL, 0x6DCDFD59L, 0x608EDB80L, 0x644FC637L,
   7022     0x7A089632L, 0x7EC98B85L, 0x738AAD5CL, 0x774BB0EBL,
   7023     0x4F040D56L, 0x4BC510E1L, 0x46863638L, 0x42472B8FL,
   7024     0x5C007B8AL, 0x58C1663DL, 0x558240E4L, 0x51435D53L,
   7025     0x251D3B9EL, 0x21DC2629L, 0x2C9F00F0L, 0x285E1D47L,
   7026     0x36194D42L, 0x32D850F5L, 0x3F9B762CL, 0x3B5A6B9BL,
   7027     0x0315D626L, 0x07D4CB91L, 0x0A97ED48L, 0x0E56F0FFL,
   7028     0x1011A0FAL, 0x14D0BD4DL, 0x19939B94L, 0x1D528623L,
   7029     0xF12F560EL, 0xF5EE4BB9L, 0xF8AD6D60L, 0xFC6C70D7L,
   7030     0xE22B20D2L, 0xE6EA3D65L, 0xEBA91BBCL, 0xEF68060BL,
   7031     0xD727BBB6L, 0xD3E6A601L, 0xDEA580D8L, 0xDA649D6FL,
   7032     0xC423CD6AL, 0xC0E2D0DDL, 0xCDA1F604L, 0xC960EBB3L,
   7033     0xBD3E8D7EL, 0xB9FF90C9L, 0xB4BCB610L, 0xB07DABA7L,
   7034     0xAE3AFBA2L, 0xAAFBE615L, 0xA7B8C0CCL, 0xA379DD7BL,
   7035     0x9B3660C6L, 0x9FF77D71L, 0x92B45BA8L, 0x9675461FL,
   7036     0x8832161AL, 0x8CF30BADL, 0x81B02D74L, 0x857130C3L,
   7037     0x5D8A9099L, 0x594B8D2EL, 0x5408ABF7L, 0x50C9B640L,
   7038     0x4E8EE645L, 0x4A4FFBF2L, 0x470CDD2BL, 0x43CDC09CL,
   7039     0x7B827D21L, 0x7F436096L, 0x7200464FL, 0x76C15BF8L,
   7040     0x68860BFDL, 0x6C47164AL, 0x61043093L, 0x65C52D24L,
   7041     0x119B4BE9L, 0x155A565EL, 0x18197087L, 0x1CD86D30L,
   7042     0x029F3D35L, 0x065E2082L, 0x0B1D065BL, 0x0FDC1BECL,
   7043     0x3793A651L, 0x3352BBE6L, 0x3E119D3FL, 0x3AD08088L,
   7044     0x2497D08DL, 0x2056CD3AL, 0x2D15EBE3L, 0x29D4F654L,
   7045     0xC5A92679L, 0xC1683BCEL, 0xCC2B1D17L, 0xC8EA00A0L,
   7046     0xD6AD50A5L, 0xD26C4D12L, 0xDF2F6BCBL, 0xDBEE767CL,
   7047     0xE3A1CBC1L, 0xE760D676L, 0xEA23F0AFL, 0xEEE2ED18L,
   7048     0xF0A5BD1DL, 0xF464A0AAL, 0xF9278673L, 0xFDE69BC4L,
   7049     0x89B8FD09L, 0x8D79E0BEL, 0x803AC667L, 0x84FBDBD0L,
   7050     0x9ABC8BD5L, 0x9E7D9662L, 0x933EB0BBL, 0x97FFAD0CL,
   7051     0xAFB010B1L, 0xAB710D06L, 0xA6322BDFL, 0xA2F33668L,
   7052     0xBCB4666DL, 0xB8757BDAL, 0xB5365D03L, 0xB1F740B4L
   7053 };
   7054 #endif
   7055 
   7056 static DRFLAC_INLINE drflac_uint32 drflac_crc32_byte(drflac_uint32 crc32, drflac_uint8 data)
   7057 {
   7058 #ifndef DR_FLAC_NO_CRC
   7059     return (crc32 << 8) ^ drflac__crc32_table[(drflac_uint8)((crc32 >> 24) & 0xFF) ^ data];
   7060 #else
   7061     (void)data;
   7062     return crc32;
   7063 #endif
   7064 }
   7065 
   7066 #if 0
   7067 static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint32(drflac_uint32 crc32, drflac_uint32 data)
   7068 {
   7069     crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 24) & 0xFF));
   7070     crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 16) & 0xFF));
   7071     crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >>  8) & 0xFF));
   7072     crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >>  0) & 0xFF));
   7073     return crc32;
   7074 }
   7075 
   7076 static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint64(drflac_uint32 crc32, drflac_uint64 data)
   7077 {
   7078     crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 32) & 0xFFFFFFFF));
   7079     crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >>  0) & 0xFFFFFFFF));
   7080     return crc32;
   7081 }
   7082 #endif
   7083 
   7084 static DRFLAC_INLINE drflac_uint32 drflac_crc32_buffer(drflac_uint32 crc32, drflac_uint8* pData, drflac_uint32 dataSize)
   7085 {
   7086     /* This can be optimized. */
   7087     drflac_uint32 i;
   7088     for (i = 0; i < dataSize; ++i) {
   7089         crc32 = drflac_crc32_byte(crc32, pData[i]);
   7090     }
   7091     return crc32;
   7092 }
   7093 
   7094 
   7095 static DRFLAC_INLINE drflac_bool32 drflac_ogg__is_capture_pattern(drflac_uint8 pattern[4])
   7096 {
   7097     return pattern[0] == 'O' && pattern[1] == 'g' && pattern[2] == 'g' && pattern[3] == 'S';
   7098 }
   7099 
   7100 static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_header_size(drflac_ogg_page_header* pHeader)
   7101 {
   7102     return 27 + pHeader->segmentCount;
   7103 }
   7104 
   7105 static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_body_size(drflac_ogg_page_header* pHeader)
   7106 {
   7107     drflac_uint32 pageBodySize = 0;
   7108     int i;
   7109 
   7110     for (i = 0; i < pHeader->segmentCount; ++i) {
   7111         pageBodySize += pHeader->segmentTable[i];
   7112     }
   7113 
   7114     return pageBodySize;
   7115 }
   7116 
   7117 static drflac_result drflac_ogg__read_page_header_after_capture_pattern(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
   7118 {
   7119     drflac_uint8 data[23];
   7120     drflac_uint32 i;
   7121 
   7122     DRFLAC_ASSERT(*pCRC32 == DRFLAC_OGG_CAPTURE_PATTERN_CRC32);
   7123 
   7124     if (onRead(pUserData, data, 23) != 23) {
   7125         return DRFLAC_AT_END;
   7126     }
   7127     *pBytesRead += 23;
   7128 
   7129     /*
   7130     It's not actually used, but set the capture pattern to 'OggS' for completeness. Not doing this will cause static analysers to complain about
   7131     us trying to access uninitialized data. We could alternatively just comment out this member of the drflac_ogg_page_header structure, but I
   7132     like to have it map to the structure of the underlying data.
   7133     */
   7134     pHeader->capturePattern[0] = 'O';
   7135     pHeader->capturePattern[1] = 'g';
   7136     pHeader->capturePattern[2] = 'g';
   7137     pHeader->capturePattern[3] = 'S';
   7138 
   7139     pHeader->structureVersion = data[0];
   7140     pHeader->headerType       = data[1];
   7141     DRFLAC_COPY_MEMORY(&pHeader->granulePosition, &data[ 2], 8);
   7142     DRFLAC_COPY_MEMORY(&pHeader->serialNumber,    &data[10], 4);
   7143     DRFLAC_COPY_MEMORY(&pHeader->sequenceNumber,  &data[14], 4);
   7144     DRFLAC_COPY_MEMORY(&pHeader->checksum,        &data[18], 4);
   7145     pHeader->segmentCount     = data[22];
   7146 
   7147     /* Calculate the CRC. Note that for the calculation the checksum part of the page needs to be set to 0. */
   7148     data[18] = 0;
   7149     data[19] = 0;
   7150     data[20] = 0;
   7151     data[21] = 0;
   7152 
   7153     for (i = 0; i < 23; ++i) {
   7154         *pCRC32 = drflac_crc32_byte(*pCRC32, data[i]);
   7155     }
   7156 
   7157 
   7158     if (onRead(pUserData, pHeader->segmentTable, pHeader->segmentCount) != pHeader->segmentCount) {
   7159         return DRFLAC_AT_END;
   7160     }
   7161     *pBytesRead += pHeader->segmentCount;
   7162 
   7163     for (i = 0; i < pHeader->segmentCount; ++i) {
   7164         *pCRC32 = drflac_crc32_byte(*pCRC32, pHeader->segmentTable[i]);
   7165     }
   7166 
   7167     return DRFLAC_SUCCESS;
   7168 }
   7169 
   7170 static drflac_result drflac_ogg__read_page_header(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
   7171 {
   7172     drflac_uint8 id[4];
   7173 
   7174     *pBytesRead = 0;
   7175 
   7176     if (onRead(pUserData, id, 4) != 4) {
   7177         return DRFLAC_AT_END;
   7178     }
   7179     *pBytesRead += 4;
   7180 
   7181     /* We need to read byte-by-byte until we find the OggS capture pattern. */
   7182     for (;;) {
   7183         if (drflac_ogg__is_capture_pattern(id)) {
   7184             drflac_result result;
   7185 
   7186             *pCRC32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
   7187 
   7188             result = drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, pHeader, pBytesRead, pCRC32);
   7189             if (result == DRFLAC_SUCCESS) {
   7190                 return DRFLAC_SUCCESS;
   7191             } else {
   7192                 if (result == DRFLAC_CRC_MISMATCH) {
   7193                     continue;
   7194                 } else {
   7195                     return result;
   7196                 }
   7197             }
   7198         } else {
   7199             /* The first 4 bytes did not equal the capture pattern. Read the next byte and try again. */
   7200             id[0] = id[1];
   7201             id[1] = id[2];
   7202             id[2] = id[3];
   7203             if (onRead(pUserData, &id[3], 1) != 1) {
   7204                 return DRFLAC_AT_END;
   7205             }
   7206             *pBytesRead += 1;
   7207         }
   7208     }
   7209 }
   7210 
   7211 
   7212 /*
   7213 The main part of the Ogg encapsulation is the conversion from the physical Ogg bitstream to the native FLAC bitstream. It works
   7214 in three general stages: Ogg Physical Bitstream -> Ogg/FLAC Logical Bitstream -> FLAC Native Bitstream. dr_flac is designed
   7215 in such a way that the core sections assume everything is delivered in native format. Therefore, for each encapsulation type
   7216 dr_flac is supporting there needs to be a layer sitting on top of the onRead and onSeek callbacks that ensures the bits read from
   7217 the physical Ogg bitstream are converted and delivered in native FLAC format.
   7218 */
   7219 typedef struct
   7220 {
   7221     drflac_read_proc onRead;                /* The original onRead callback from drflac_open() and family. */
   7222     drflac_seek_proc onSeek;                /* The original onSeek callback from drflac_open() and family. */
   7223     void* pUserData;                        /* The user data passed on onRead and onSeek. This is the user data that was passed on drflac_open() and family. */
   7224     drflac_uint64 currentBytePos;           /* The position of the byte we are sitting on in the physical byte stream. Used for efficient seeking. */
   7225     drflac_uint64 firstBytePos;             /* The position of the first byte in the physical bitstream. Points to the start of the "OggS" identifier of the FLAC bos page. */
   7226     drflac_uint32 serialNumber;             /* The serial number of the FLAC audio pages. This is determined by the initial header page that was read during initialization. */
   7227     drflac_ogg_page_header bosPageHeader;   /* Used for seeking. */
   7228     drflac_ogg_page_header currentPageHeader;
   7229     drflac_uint32 bytesRemainingInPage;
   7230     drflac_uint32 pageDataSize;
   7231     drflac_uint8 pageData[DRFLAC_OGG_MAX_PAGE_SIZE];
   7232 } drflac_oggbs; /* oggbs = Ogg Bitstream */
   7233 
   7234 static size_t drflac_oggbs__read_physical(drflac_oggbs* oggbs, void* bufferOut, size_t bytesToRead)
   7235 {
   7236     size_t bytesActuallyRead = oggbs->onRead(oggbs->pUserData, bufferOut, bytesToRead);
   7237     oggbs->currentBytePos += bytesActuallyRead;
   7238 
   7239     return bytesActuallyRead;
   7240 }
   7241 
   7242 static drflac_bool32 drflac_oggbs__seek_physical(drflac_oggbs* oggbs, drflac_uint64 offset, drflac_seek_origin origin)
   7243 {
   7244     if (origin == drflac_seek_origin_start) {
   7245         if (offset <= 0x7FFFFFFF) {
   7246             if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_start)) {
   7247                 return DRFLAC_FALSE;
   7248             }
   7249             oggbs->currentBytePos = offset;
   7250 
   7251             return DRFLAC_TRUE;
   7252         } else {
   7253             if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
   7254                 return DRFLAC_FALSE;
   7255             }
   7256             oggbs->currentBytePos = offset;
   7257 
   7258             return drflac_oggbs__seek_physical(oggbs, offset - 0x7FFFFFFF, drflac_seek_origin_current);
   7259         }
   7260     } else {
   7261         while (offset > 0x7FFFFFFF) {
   7262             if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
   7263                 return DRFLAC_FALSE;
   7264             }
   7265             oggbs->currentBytePos += 0x7FFFFFFF;
   7266             offset -= 0x7FFFFFFF;
   7267         }
   7268 
   7269         if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_current)) {    /* <-- Safe cast thanks to the loop above. */
   7270             return DRFLAC_FALSE;
   7271         }
   7272         oggbs->currentBytePos += offset;
   7273 
   7274         return DRFLAC_TRUE;
   7275     }
   7276 }
   7277 
   7278 static drflac_bool32 drflac_oggbs__goto_next_page(drflac_oggbs* oggbs, drflac_ogg_crc_mismatch_recovery recoveryMethod)
   7279 {
   7280     drflac_ogg_page_header header;
   7281     for (;;) {
   7282         drflac_uint32 crc32 = 0;
   7283         drflac_uint32 bytesRead;
   7284         drflac_uint32 pageBodySize;
   7285 #ifndef DR_FLAC_NO_CRC
   7286         drflac_uint32 actualCRC32;
   7287 #endif
   7288 
   7289         if (drflac_ogg__read_page_header(oggbs->onRead, oggbs->pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
   7290             return DRFLAC_FALSE;
   7291         }
   7292         oggbs->currentBytePos += bytesRead;
   7293 
   7294         pageBodySize = drflac_ogg__get_page_body_size(&header);
   7295         if (pageBodySize > DRFLAC_OGG_MAX_PAGE_SIZE) {
   7296             continue;   /* Invalid page size. Assume it's corrupted and just move to the next page. */
   7297         }
   7298 
   7299         if (header.serialNumber != oggbs->serialNumber) {
   7300             /* It's not a FLAC page. Skip it. */
   7301             if (pageBodySize > 0 && !drflac_oggbs__seek_physical(oggbs, pageBodySize, drflac_seek_origin_current)) {
   7302                 return DRFLAC_FALSE;
   7303             }
   7304             continue;
   7305         }
   7306 
   7307 
   7308         /* We need to read the entire page and then do a CRC check on it. If there's a CRC mismatch we need to skip this page. */
   7309         if (drflac_oggbs__read_physical(oggbs, oggbs->pageData, pageBodySize) != pageBodySize) {
   7310             return DRFLAC_FALSE;
   7311         }
   7312         oggbs->pageDataSize = pageBodySize;
   7313 
   7314 #ifndef DR_FLAC_NO_CRC
   7315         actualCRC32 = drflac_crc32_buffer(crc32, oggbs->pageData, oggbs->pageDataSize);
   7316         if (actualCRC32 != header.checksum) {
   7317             if (recoveryMethod == drflac_ogg_recover_on_crc_mismatch) {
   7318                 continue;   /* CRC mismatch. Skip this page. */
   7319             } else {
   7320                 /*
   7321                 Even though we are failing on a CRC mismatch, we still want our stream to be in a good state. Therefore we
   7322                 go to the next valid page to ensure we're in a good state, but return false to let the caller know that the
   7323                 seek did not fully complete.
   7324                 */
   7325                 drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch);
   7326                 return DRFLAC_FALSE;
   7327             }
   7328         }
   7329 #else
   7330         (void)recoveryMethod;   /* <-- Silence a warning. */
   7331 #endif
   7332 
   7333         oggbs->currentPageHeader = header;
   7334         oggbs->bytesRemainingInPage = pageBodySize;
   7335         return DRFLAC_TRUE;
   7336     }
   7337 }
   7338 
   7339 /* Function below is unused at the moment, but I might be re-adding it later. */
   7340 #if 0
   7341 static drflac_uint8 drflac_oggbs__get_current_segment_index(drflac_oggbs* oggbs, drflac_uint8* pBytesRemainingInSeg)
   7342 {
   7343     drflac_uint32 bytesConsumedInPage = drflac_ogg__get_page_body_size(&oggbs->currentPageHeader) - oggbs->bytesRemainingInPage;
   7344     drflac_uint8 iSeg = 0;
   7345     drflac_uint32 iByte = 0;
   7346     while (iByte < bytesConsumedInPage) {
   7347         drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
   7348         if (iByte + segmentSize > bytesConsumedInPage) {
   7349             break;
   7350         } else {
   7351             iSeg += 1;
   7352             iByte += segmentSize;
   7353         }
   7354     }
   7355 
   7356     *pBytesRemainingInSeg = oggbs->currentPageHeader.segmentTable[iSeg] - (drflac_uint8)(bytesConsumedInPage - iByte);
   7357     return iSeg;
   7358 }
   7359 
   7360 static drflac_bool32 drflac_oggbs__seek_to_next_packet(drflac_oggbs* oggbs)
   7361 {
   7362     /* The current packet ends when we get to the segment with a lacing value of < 255 which is not at the end of a page. */
   7363     for (;;) {
   7364         drflac_bool32 atEndOfPage = DRFLAC_FALSE;
   7365 
   7366         drflac_uint8 bytesRemainingInSeg;
   7367         drflac_uint8 iFirstSeg = drflac_oggbs__get_current_segment_index(oggbs, &bytesRemainingInSeg);
   7368 
   7369         drflac_uint32 bytesToEndOfPacketOrPage = bytesRemainingInSeg;
   7370         for (drflac_uint8 iSeg = iFirstSeg; iSeg < oggbs->currentPageHeader.segmentCount; ++iSeg) {
   7371             drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
   7372             if (segmentSize < 255) {
   7373                 if (iSeg == oggbs->currentPageHeader.segmentCount-1) {
   7374                     atEndOfPage = DRFLAC_TRUE;
   7375                 }
   7376 
   7377                 break;
   7378             }
   7379 
   7380             bytesToEndOfPacketOrPage += segmentSize;
   7381         }
   7382 
   7383         /*
   7384         At this point we will have found either the packet or the end of the page. If were at the end of the page we'll
   7385         want to load the next page and keep searching for the end of the packet.
   7386         */
   7387         drflac_oggbs__seek_physical(oggbs, bytesToEndOfPacketOrPage, drflac_seek_origin_current);
   7388         oggbs->bytesRemainingInPage -= bytesToEndOfPacketOrPage;
   7389 
   7390         if (atEndOfPage) {
   7391             /*
   7392             We're potentially at the next packet, but we need to check the next page first to be sure because the packet may
   7393             straddle pages.
   7394             */
   7395             if (!drflac_oggbs__goto_next_page(oggbs)) {
   7396                 return DRFLAC_FALSE;
   7397             }
   7398 
   7399             /* If it's a fresh packet it most likely means we're at the next packet. */
   7400             if ((oggbs->currentPageHeader.headerType & 0x01) == 0) {
   7401                 return DRFLAC_TRUE;
   7402             }
   7403         } else {
   7404             /* We're at the next packet. */
   7405             return DRFLAC_TRUE;
   7406         }
   7407     }
   7408 }
   7409 
   7410 static drflac_bool32 drflac_oggbs__seek_to_next_frame(drflac_oggbs* oggbs)
   7411 {
   7412     /* The bitstream should be sitting on the first byte just after the header of the frame. */
   7413 
   7414     /* What we're actually doing here is seeking to the start of the next packet. */
   7415     return drflac_oggbs__seek_to_next_packet(oggbs);
   7416 }
   7417 #endif
   7418 
   7419 static size_t drflac__on_read_ogg(void* pUserData, void* bufferOut, size_t bytesToRead)
   7420 {
   7421     drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
   7422     drflac_uint8* pRunningBufferOut = (drflac_uint8*)bufferOut;
   7423     size_t bytesRead = 0;
   7424 
   7425     DRFLAC_ASSERT(oggbs != NULL);
   7426     DRFLAC_ASSERT(pRunningBufferOut != NULL);
   7427 
   7428     /* Reading is done page-by-page. If we've run out of bytes in the page we need to move to the next one. */
   7429     while (bytesRead < bytesToRead) {
   7430         size_t bytesRemainingToRead = bytesToRead - bytesRead;
   7431 
   7432         if (oggbs->bytesRemainingInPage >= bytesRemainingToRead) {
   7433             DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), bytesRemainingToRead);
   7434             bytesRead += bytesRemainingToRead;
   7435             oggbs->bytesRemainingInPage -= (drflac_uint32)bytesRemainingToRead;
   7436             break;
   7437         }
   7438 
   7439         /* If we get here it means some of the requested data is contained in the next pages. */
   7440         if (oggbs->bytesRemainingInPage > 0) {
   7441             DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), oggbs->bytesRemainingInPage);
   7442             bytesRead += oggbs->bytesRemainingInPage;
   7443             pRunningBufferOut += oggbs->bytesRemainingInPage;
   7444             oggbs->bytesRemainingInPage = 0;
   7445         }
   7446 
   7447         DRFLAC_ASSERT(bytesRemainingToRead > 0);
   7448         if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
   7449             break;  /* Failed to go to the next page. Might have simply hit the end of the stream. */
   7450         }
   7451     }
   7452 
   7453     return bytesRead;
   7454 }
   7455 
   7456 static drflac_bool32 drflac__on_seek_ogg(void* pUserData, int offset, drflac_seek_origin origin)
   7457 {
   7458     drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
   7459     int bytesSeeked = 0;
   7460 
   7461     DRFLAC_ASSERT(oggbs != NULL);
   7462     DRFLAC_ASSERT(offset >= 0);  /* <-- Never seek backwards. */
   7463 
   7464     /* Seeking is always forward which makes things a lot simpler. */
   7465     if (origin == drflac_seek_origin_start) {
   7466         if (!drflac_oggbs__seek_physical(oggbs, (int)oggbs->firstBytePos, drflac_seek_origin_start)) {
   7467             return DRFLAC_FALSE;
   7468         }
   7469 
   7470         if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
   7471             return DRFLAC_FALSE;
   7472         }
   7473 
   7474         return drflac__on_seek_ogg(pUserData, offset, drflac_seek_origin_current);
   7475     }
   7476 
   7477     DRFLAC_ASSERT(origin == drflac_seek_origin_current);
   7478 
   7479     while (bytesSeeked < offset) {
   7480         int bytesRemainingToSeek = offset - bytesSeeked;
   7481         DRFLAC_ASSERT(bytesRemainingToSeek >= 0);
   7482 
   7483         if (oggbs->bytesRemainingInPage >= (size_t)bytesRemainingToSeek) {
   7484             bytesSeeked += bytesRemainingToSeek;
   7485             (void)bytesSeeked;  /* <-- Silence a dead store warning emitted by Clang Static Analyzer. */
   7486             oggbs->bytesRemainingInPage -= bytesRemainingToSeek;
   7487             break;
   7488         }
   7489 
   7490         /* If we get here it means some of the requested data is contained in the next pages. */
   7491         if (oggbs->bytesRemainingInPage > 0) {
   7492             bytesSeeked += (int)oggbs->bytesRemainingInPage;
   7493             oggbs->bytesRemainingInPage = 0;
   7494         }
   7495 
   7496         DRFLAC_ASSERT(bytesRemainingToSeek > 0);
   7497         if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
   7498             /* Failed to go to the next page. We either hit the end of the stream or had a CRC mismatch. */
   7499             return DRFLAC_FALSE;
   7500         }
   7501     }
   7502 
   7503     return DRFLAC_TRUE;
   7504 }
   7505 
   7506 
   7507 static drflac_bool32 drflac_ogg__seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
   7508 {
   7509     drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
   7510     drflac_uint64 originalBytePos;
   7511     drflac_uint64 runningGranulePosition;
   7512     drflac_uint64 runningFrameBytePos;
   7513     drflac_uint64 runningPCMFrameCount;
   7514 
   7515     DRFLAC_ASSERT(oggbs != NULL);
   7516 
   7517     originalBytePos = oggbs->currentBytePos;   /* For recovery. Points to the OggS identifier. */
   7518 
   7519     /* First seek to the first frame. */
   7520     if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes)) {
   7521         return DRFLAC_FALSE;
   7522     }
   7523     oggbs->bytesRemainingInPage = 0;
   7524 
   7525     runningGranulePosition = 0;
   7526     for (;;) {
   7527         if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
   7528             drflac_oggbs__seek_physical(oggbs, originalBytePos, drflac_seek_origin_start);
   7529             return DRFLAC_FALSE;   /* Never did find that sample... */
   7530         }
   7531 
   7532         runningFrameBytePos = oggbs->currentBytePos - drflac_ogg__get_page_header_size(&oggbs->currentPageHeader) - oggbs->pageDataSize;
   7533         if (oggbs->currentPageHeader.granulePosition >= pcmFrameIndex) {
   7534             break; /* The sample is somewhere in the previous page. */
   7535         }
   7536 
   7537         /*
   7538         At this point we know the sample is not in the previous page. It could possibly be in this page. For simplicity we
   7539         disregard any pages that do not begin a fresh packet.
   7540         */
   7541         if ((oggbs->currentPageHeader.headerType & 0x01) == 0) {    /* <-- Is it a fresh page? */
   7542             if (oggbs->currentPageHeader.segmentTable[0] >= 2) {
   7543                 drflac_uint8 firstBytesInPage[2];
   7544                 firstBytesInPage[0] = oggbs->pageData[0];
   7545                 firstBytesInPage[1] = oggbs->pageData[1];
   7546 
   7547                 if ((firstBytesInPage[0] == 0xFF) && (firstBytesInPage[1] & 0xFC) == 0xF8) {    /* <-- Does the page begin with a frame's sync code? */
   7548                     runningGranulePosition = oggbs->currentPageHeader.granulePosition;
   7549                 }
   7550 
   7551                 continue;
   7552             }
   7553         }
   7554     }
   7555 
   7556     /*
   7557     We found the page that that is closest to the sample, so now we need to find it. The first thing to do is seek to the
   7558     start of that page. In the loop above we checked that it was a fresh page which means this page is also the start of
   7559     a new frame. This property means that after we've seeked to the page we can immediately start looping over frames until
   7560     we find the one containing the target sample.
   7561     */
   7562     if (!drflac_oggbs__seek_physical(oggbs, runningFrameBytePos, drflac_seek_origin_start)) {
   7563         return DRFLAC_FALSE;
   7564     }
   7565     if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
   7566         return DRFLAC_FALSE;
   7567     }
   7568 
   7569     /*
   7570     At this point we'll be sitting on the first byte of the frame header of the first frame in the page. We just keep
   7571     looping over these frames until we find the one containing the sample we're after.
   7572     */
   7573     runningPCMFrameCount = runningGranulePosition;
   7574     for (;;) {
   7575         /*
   7576         There are two ways to find the sample and seek past irrelevant frames:
   7577           1) Use the native FLAC decoder.
   7578           2) Use Ogg's framing system.
   7579 
   7580         Both of these options have their own pros and cons. Using the native FLAC decoder is slower because it needs to
   7581         do a full decode of the frame. Using Ogg's framing system is faster, but more complicated and involves some code
   7582         duplication for the decoding of frame headers.
   7583 
   7584         Another thing to consider is that using the Ogg framing system will perform direct seeking of the physical Ogg
   7585         bitstream. This is important to consider because it means we cannot read data from the drflac_bs object using the
   7586         standard drflac__*() APIs because that will read in extra data for its own internal caching which in turn breaks
   7587         the positioning of the read pointer of the physical Ogg bitstream. Therefore, anything that would normally be read
   7588         using the native FLAC decoding APIs, such as drflac__read_next_flac_frame_header(), need to be re-implemented so as to
   7589         avoid the use of the drflac_bs object.
   7590 
   7591         Considering these issues, I have decided to use the slower native FLAC decoding method for the following reasons:
   7592           1) Seeking is already partially accelerated using Ogg's paging system in the code block above.
   7593           2) Seeking in an Ogg encapsulated FLAC stream is probably quite uncommon.
   7594           3) Simplicity.
   7595         */
   7596         drflac_uint64 firstPCMFrameInFLACFrame = 0;
   7597         drflac_uint64 lastPCMFrameInFLACFrame = 0;
   7598         drflac_uint64 pcmFrameCountInThisFrame;
   7599 
   7600         if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
   7601             return DRFLAC_FALSE;
   7602         }
   7603 
   7604         drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
   7605 
   7606         pcmFrameCountInThisFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
   7607 
   7608         /* If we are seeking to the end of the file and we've just hit it, we're done. */
   7609         if (pcmFrameIndex == pFlac->totalPCMFrameCount && (runningPCMFrameCount + pcmFrameCountInThisFrame) == pFlac->totalPCMFrameCount) {
   7610             drflac_result result = drflac__decode_flac_frame(pFlac);
   7611             if (result == DRFLAC_SUCCESS) {
   7612                 pFlac->currentPCMFrame = pcmFrameIndex;
   7613                 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
   7614                 return DRFLAC_TRUE;
   7615             } else {
   7616                 return DRFLAC_FALSE;
   7617             }
   7618         }
   7619 
   7620         if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFrame)) {
   7621             /*
   7622             The sample should be in this FLAC frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
   7623             it never existed and keep iterating.
   7624             */
   7625             drflac_result result = drflac__decode_flac_frame(pFlac);
   7626             if (result == DRFLAC_SUCCESS) {
   7627                 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
   7628                 drflac_uint64 pcmFramesToDecode = (size_t)(pcmFrameIndex - runningPCMFrameCount);    /* <-- Safe cast because the maximum number of samples in a frame is 65535. */
   7629                 if (pcmFramesToDecode == 0) {
   7630                     return DRFLAC_TRUE;
   7631                 }
   7632 
   7633                 pFlac->currentPCMFrame = runningPCMFrameCount;
   7634 
   7635                 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;  /* <-- If this fails, something bad has happened (it should never fail). */
   7636             } else {
   7637                 if (result == DRFLAC_CRC_MISMATCH) {
   7638                     continue;   /* CRC mismatch. Pretend this frame never existed. */
   7639                 } else {
   7640                     return DRFLAC_FALSE;
   7641                 }
   7642             }
   7643         } else {
   7644             /*
   7645             It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
   7646             frame never existed and leave the running sample count untouched.
   7647             */
   7648             drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
   7649             if (result == DRFLAC_SUCCESS) {
   7650                 runningPCMFrameCount += pcmFrameCountInThisFrame;
   7651             } else {
   7652                 if (result == DRFLAC_CRC_MISMATCH) {
   7653                     continue;   /* CRC mismatch. Pretend this frame never existed. */
   7654                 } else {
   7655                     return DRFLAC_FALSE;
   7656                 }
   7657             }
   7658         }
   7659     }
   7660 }
   7661 
   7662 
   7663 
   7664 static drflac_bool32 drflac__init_private__ogg(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
   7665 {
   7666     drflac_ogg_page_header header;
   7667     drflac_uint32 crc32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
   7668     drflac_uint32 bytesRead = 0;
   7669 
   7670     /* Pre Condition: The bit stream should be sitting just past the 4-byte OggS capture pattern. */
   7671     (void)relaxed;
   7672 
   7673     pInit->container = drflac_container_ogg;
   7674     pInit->oggFirstBytePos = 0;
   7675 
   7676     /*
   7677     We'll get here if the first 4 bytes of the stream were the OggS capture pattern, however it doesn't necessarily mean the
   7678     stream includes FLAC encoded audio. To check for this we need to scan the beginning-of-stream page markers and check if
   7679     any match the FLAC specification. Important to keep in mind that the stream may be multiplexed.
   7680     */
   7681     if (drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
   7682         return DRFLAC_FALSE;
   7683     }
   7684     pInit->runningFilePos += bytesRead;
   7685 
   7686     for (;;) {
   7687         int pageBodySize;
   7688 
   7689         /* Break if we're past the beginning of stream page. */
   7690         if ((header.headerType & 0x02) == 0) {
   7691             return DRFLAC_FALSE;
   7692         }
   7693 
   7694         /* Check if it's a FLAC header. */
   7695         pageBodySize = drflac_ogg__get_page_body_size(&header);
   7696         if (pageBodySize == 51) {   /* 51 = the lacing value of the FLAC header packet. */
   7697             /* It could be a FLAC page... */
   7698             drflac_uint32 bytesRemainingInPage = pageBodySize;
   7699             drflac_uint8 packetType;
   7700 
   7701             if (onRead(pUserData, &packetType, 1) != 1) {
   7702                 return DRFLAC_FALSE;
   7703             }
   7704 
   7705             bytesRemainingInPage -= 1;
   7706             if (packetType == 0x7F) {
   7707                 /* Increasingly more likely to be a FLAC page... */
   7708                 drflac_uint8 sig[4];
   7709                 if (onRead(pUserData, sig, 4) != 4) {
   7710                     return DRFLAC_FALSE;
   7711                 }
   7712 
   7713                 bytesRemainingInPage -= 4;
   7714                 if (sig[0] == 'F' && sig[1] == 'L' && sig[2] == 'A' && sig[3] == 'C') {
   7715                     /* Almost certainly a FLAC page... */
   7716                     drflac_uint8 mappingVersion[2];
   7717                     if (onRead(pUserData, mappingVersion, 2) != 2) {
   7718                         return DRFLAC_FALSE;
   7719                     }
   7720 
   7721                     if (mappingVersion[0] != 1) {
   7722                         return DRFLAC_FALSE;   /* Only supporting version 1.x of the Ogg mapping. */
   7723                     }
   7724 
   7725                     /*
   7726                     The next 2 bytes are the non-audio packets, not including this one. We don't care about this because we're going to
   7727                     be handling it in a generic way based on the serial number and packet types.
   7728                     */
   7729                     if (!onSeek(pUserData, 2, drflac_seek_origin_current)) {
   7730                         return DRFLAC_FALSE;
   7731                     }
   7732 
   7733                     /* Expecting the native FLAC signature "fLaC". */
   7734                     if (onRead(pUserData, sig, 4) != 4) {
   7735                         return DRFLAC_FALSE;
   7736                     }
   7737 
   7738                     if (sig[0] == 'f' && sig[1] == 'L' && sig[2] == 'a' && sig[3] == 'C') {
   7739                         /* The remaining data in the page should be the STREAMINFO block. */
   7740                         drflac_streaminfo streaminfo;
   7741                         drflac_uint8 isLastBlock;
   7742                         drflac_uint8 blockType;
   7743                         drflac_uint32 blockSize;
   7744                         if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
   7745                             return DRFLAC_FALSE;
   7746                         }
   7747 
   7748                         if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
   7749                             return DRFLAC_FALSE;    /* Invalid block type. First block must be the STREAMINFO block. */
   7750                         }
   7751 
   7752                         if (drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
   7753                             /* Success! */
   7754                             pInit->hasStreamInfoBlock      = DRFLAC_TRUE;
   7755                             pInit->sampleRate              = streaminfo.sampleRate;
   7756                             pInit->channels                = streaminfo.channels;
   7757                             pInit->bitsPerSample           = streaminfo.bitsPerSample;
   7758                             pInit->totalPCMFrameCount      = streaminfo.totalPCMFrameCount;
   7759                             pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;
   7760                             pInit->hasMetadataBlocks       = !isLastBlock;
   7761 
   7762                             if (onMeta) {
   7763                                 drflac_metadata metadata;
   7764                                 metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
   7765                                 metadata.pRawData = NULL;
   7766                                 metadata.rawDataSize = 0;
   7767                                 metadata.data.streaminfo = streaminfo;
   7768                                 onMeta(pUserDataMD, &metadata);
   7769                             }
   7770 
   7771                             pInit->runningFilePos  += pageBodySize;
   7772                             pInit->oggFirstBytePos  = pInit->runningFilePos - 79;   /* Subtracting 79 will place us right on top of the "OggS" identifier of the FLAC bos page. */
   7773                             pInit->oggSerial        = header.serialNumber;
   7774                             pInit->oggBosHeader     = header;
   7775                             break;
   7776                         } else {
   7777                             /* Failed to read STREAMINFO block. Aww, so close... */
   7778                             return DRFLAC_FALSE;
   7779                         }
   7780                     } else {
   7781                         /* Invalid file. */
   7782                         return DRFLAC_FALSE;
   7783                     }
   7784                 } else {
   7785                     /* Not a FLAC header. Skip it. */
   7786                     if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
   7787                         return DRFLAC_FALSE;
   7788                     }
   7789                 }
   7790             } else {
   7791                 /* Not a FLAC header. Seek past the entire page and move on to the next. */
   7792                 if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
   7793                     return DRFLAC_FALSE;
   7794                 }
   7795             }
   7796         } else {
   7797             if (!onSeek(pUserData, pageBodySize, drflac_seek_origin_current)) {
   7798                 return DRFLAC_FALSE;
   7799             }
   7800         }
   7801 
   7802         pInit->runningFilePos += pageBodySize;
   7803 
   7804 
   7805         /* Read the header of the next page. */
   7806         if (drflac_ogg__read_page_header(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
   7807             return DRFLAC_FALSE;
   7808         }
   7809         pInit->runningFilePos += bytesRead;
   7810     }
   7811 
   7812     /*
   7813     If we get here it means we found a FLAC audio stream. We should be sitting on the first byte of the header of the next page. The next
   7814     packets in the FLAC logical stream contain the metadata. The only thing left to do in the initialization phase for Ogg is to create the
   7815     Ogg bistream object.
   7816     */
   7817     pInit->hasMetadataBlocks = DRFLAC_TRUE;    /* <-- Always have at least VORBIS_COMMENT metadata block. */
   7818     return DRFLAC_TRUE;
   7819 }
   7820 #endif
   7821 
   7822 static drflac_bool32 drflac__init_private(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD)
   7823 {
   7824     drflac_bool32 relaxed;
   7825     drflac_uint8 id[4];
   7826 
   7827     if (pInit == NULL || onRead == NULL || onSeek == NULL) {
   7828         return DRFLAC_FALSE;
   7829     }
   7830 
   7831     DRFLAC_ZERO_MEMORY(pInit, sizeof(*pInit));
   7832     pInit->onRead       = onRead;
   7833     pInit->onSeek       = onSeek;
   7834     pInit->onMeta       = onMeta;
   7835     pInit->container    = container;
   7836     pInit->pUserData    = pUserData;
   7837     pInit->pUserDataMD  = pUserDataMD;
   7838 
   7839     pInit->bs.onRead    = onRead;
   7840     pInit->bs.onSeek    = onSeek;
   7841     pInit->bs.pUserData = pUserData;
   7842     drflac__reset_cache(&pInit->bs);
   7843 
   7844 
   7845     /* If the container is explicitly defined then we can try opening in relaxed mode. */
   7846     relaxed = container != drflac_container_unknown;
   7847 
   7848     /* Skip over any ID3 tags. */
   7849     for (;;) {
   7850         if (onRead(pUserData, id, 4) != 4) {
   7851             return DRFLAC_FALSE;    /* Ran out of data. */
   7852         }
   7853         pInit->runningFilePos += 4;
   7854 
   7855         if (id[0] == 'I' && id[1] == 'D' && id[2] == '3') {
   7856             drflac_uint8 header[6];
   7857             drflac_uint8 flags;
   7858             drflac_uint32 headerSize;
   7859 
   7860             if (onRead(pUserData, header, 6) != 6) {
   7861                 return DRFLAC_FALSE;    /* Ran out of data. */
   7862             }
   7863             pInit->runningFilePos += 6;
   7864 
   7865             flags = header[1];
   7866 
   7867             DRFLAC_COPY_MEMORY(&headerSize, header+2, 4);
   7868             headerSize = drflac__unsynchsafe_32(drflac__be2host_32(headerSize));
   7869             if (flags & 0x10) {
   7870                 headerSize += 10;
   7871             }
   7872 
   7873             if (!onSeek(pUserData, headerSize, drflac_seek_origin_current)) {
   7874                 return DRFLAC_FALSE;    /* Failed to seek past the tag. */
   7875             }
   7876             pInit->runningFilePos += headerSize;
   7877         } else {
   7878             break;
   7879         }
   7880     }
   7881 
   7882     if (id[0] == 'f' && id[1] == 'L' && id[2] == 'a' && id[3] == 'C') {
   7883         return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
   7884     }
   7885 #ifndef DR_FLAC_NO_OGG
   7886     if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
   7887         return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
   7888     }
   7889 #endif
   7890 
   7891     /* If we get here it means we likely don't have a header. Try opening in relaxed mode, if applicable. */
   7892     if (relaxed) {
   7893         if (container == drflac_container_native) {
   7894             return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
   7895         }
   7896 #ifndef DR_FLAC_NO_OGG
   7897         if (container == drflac_container_ogg) {
   7898             return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
   7899         }
   7900 #endif
   7901     }
   7902 
   7903     /* Unsupported container. */
   7904     return DRFLAC_FALSE;
   7905 }
   7906 
   7907 static void drflac__init_from_info(drflac* pFlac, const drflac_init_info* pInit)
   7908 {
   7909     DRFLAC_ASSERT(pFlac != NULL);
   7910     DRFLAC_ASSERT(pInit != NULL);
   7911 
   7912     DRFLAC_ZERO_MEMORY(pFlac, sizeof(*pFlac));
   7913     pFlac->bs                      = pInit->bs;
   7914     pFlac->onMeta                  = pInit->onMeta;
   7915     pFlac->pUserDataMD             = pInit->pUserDataMD;
   7916     pFlac->maxBlockSizeInPCMFrames = pInit->maxBlockSizeInPCMFrames;
   7917     pFlac->sampleRate              = pInit->sampleRate;
   7918     pFlac->channels                = (drflac_uint8)pInit->channels;
   7919     pFlac->bitsPerSample           = (drflac_uint8)pInit->bitsPerSample;
   7920     pFlac->totalPCMFrameCount      = pInit->totalPCMFrameCount;
   7921     pFlac->container               = pInit->container;
   7922 }
   7923 
   7924 
   7925 static drflac* drflac_open_with_metadata_private(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD, const drflac_allocation_callbacks* pAllocationCallbacks)
   7926 {
   7927     drflac_init_info init;
   7928     drflac_uint32 allocationSize;
   7929     drflac_uint32 wholeSIMDVectorCountPerChannel;
   7930     drflac_uint32 decodedSamplesAllocationSize;
   7931 #ifndef DR_FLAC_NO_OGG
   7932     drflac_oggbs* pOggbs = NULL;
   7933 #endif
   7934     drflac_uint64 firstFramePos;
   7935     drflac_uint64 seektablePos;
   7936     drflac_uint32 seekpointCount;
   7937     drflac_allocation_callbacks allocationCallbacks;
   7938     drflac* pFlac;
   7939 
   7940     /* CPU support first. */
   7941     drflac__init_cpu_caps();
   7942 
   7943     if (!drflac__init_private(&init, onRead, onSeek, onMeta, container, pUserData, pUserDataMD)) {
   7944         return NULL;
   7945     }
   7946 
   7947     if (pAllocationCallbacks != NULL) {
   7948         allocationCallbacks = *pAllocationCallbacks;
   7949         if (allocationCallbacks.onFree == NULL || (allocationCallbacks.onMalloc == NULL && allocationCallbacks.onRealloc == NULL)) {
   7950             return NULL;    /* Invalid allocation callbacks. */
   7951         }
   7952     } else {
   7953         allocationCallbacks.pUserData = NULL;
   7954         allocationCallbacks.onMalloc  = drflac__malloc_default;
   7955         allocationCallbacks.onRealloc = drflac__realloc_default;
   7956         allocationCallbacks.onFree    = drflac__free_default;
   7957     }
   7958 
   7959 
   7960     /*
   7961     The size of the allocation for the drflac object needs to be large enough to fit the following:
   7962       1) The main members of the drflac structure
   7963       2) A block of memory large enough to store the decoded samples of the largest frame in the stream
   7964       3) If the container is Ogg, a drflac_oggbs object
   7965 
   7966     The complicated part of the allocation is making sure there's enough room the decoded samples, taking into consideration
   7967     the different SIMD instruction sets.
   7968     */
   7969     allocationSize = sizeof(drflac);
   7970 
   7971     /*
   7972     The allocation size for decoded frames depends on the number of 32-bit integers that fit inside the largest SIMD vector
   7973     we are supporting.
   7974     */
   7975     if ((init.maxBlockSizeInPCMFrames % (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) == 0) {
   7976         wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32)));
   7977     } else {
   7978         wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) + 1;
   7979     }
   7980 
   7981     decodedSamplesAllocationSize = wholeSIMDVectorCountPerChannel * DRFLAC_MAX_SIMD_VECTOR_SIZE * init.channels;
   7982 
   7983     allocationSize += decodedSamplesAllocationSize;
   7984     allocationSize += DRFLAC_MAX_SIMD_VECTOR_SIZE;  /* Allocate extra bytes to ensure we have enough for alignment. */
   7985 
   7986 #ifndef DR_FLAC_NO_OGG
   7987     /* There's additional data required for Ogg streams. */
   7988     if (init.container == drflac_container_ogg) {
   7989         allocationSize += sizeof(drflac_oggbs);
   7990 
   7991         pOggbs = (drflac_oggbs*)drflac__malloc_from_callbacks(sizeof(*pOggbs), &allocationCallbacks);
   7992         if (pOggbs == NULL) {
   7993             return NULL; /*DRFLAC_OUT_OF_MEMORY;*/
   7994         }
   7995 
   7996         DRFLAC_ZERO_MEMORY(pOggbs, sizeof(*pOggbs));
   7997         pOggbs->onRead = onRead;
   7998         pOggbs->onSeek = onSeek;
   7999         pOggbs->pUserData = pUserData;
   8000         pOggbs->currentBytePos = init.oggFirstBytePos;
   8001         pOggbs->firstBytePos = init.oggFirstBytePos;
   8002         pOggbs->serialNumber = init.oggSerial;
   8003         pOggbs->bosPageHeader = init.oggBosHeader;
   8004         pOggbs->bytesRemainingInPage = 0;
   8005     }
   8006 #endif
   8007 
   8008     /*
   8009     This part is a bit awkward. We need to load the seektable so that it can be referenced in-memory, but I want the drflac object to
   8010     consist of only a single heap allocation. To this, the size of the seek table needs to be known, which we determine when reading
   8011     and decoding the metadata.
   8012     */
   8013     firstFramePos  = 42;   /* <-- We know we are at byte 42 at this point. */
   8014     seektablePos   = 0;
   8015     seekpointCount = 0;
   8016     if (init.hasMetadataBlocks) {
   8017         drflac_read_proc onReadOverride = onRead;
   8018         drflac_seek_proc onSeekOverride = onSeek;
   8019         void* pUserDataOverride = pUserData;
   8020 
   8021 #ifndef DR_FLAC_NO_OGG
   8022         if (init.container == drflac_container_ogg) {
   8023             onReadOverride = drflac__on_read_ogg;
   8024             onSeekOverride = drflac__on_seek_ogg;
   8025             pUserDataOverride = (void*)pOggbs;
   8026         }
   8027 #endif
   8028 
   8029         if (!drflac__read_and_decode_metadata(onReadOverride, onSeekOverride, onMeta, pUserDataOverride, pUserDataMD, &firstFramePos, &seektablePos, &seekpointCount, &allocationCallbacks)) {
   8030         #ifndef DR_FLAC_NO_OGG
   8031             drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
   8032         #endif
   8033             return NULL;
   8034         }
   8035 
   8036         allocationSize += seekpointCount * sizeof(drflac_seekpoint);
   8037     }
   8038 
   8039 
   8040     pFlac = (drflac*)drflac__malloc_from_callbacks(allocationSize, &allocationCallbacks);
   8041     if (pFlac == NULL) {
   8042     #ifndef DR_FLAC_NO_OGG
   8043         drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
   8044     #endif
   8045         return NULL;
   8046     }
   8047 
   8048     drflac__init_from_info(pFlac, &init);
   8049     pFlac->allocationCallbacks = allocationCallbacks;
   8050     pFlac->pDecodedSamples = (drflac_int32*)drflac_align((size_t)pFlac->pExtraData, DRFLAC_MAX_SIMD_VECTOR_SIZE);
   8051 
   8052 #ifndef DR_FLAC_NO_OGG
   8053     if (init.container == drflac_container_ogg) {
   8054         drflac_oggbs* pInternalOggbs = (drflac_oggbs*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize + (seekpointCount * sizeof(drflac_seekpoint)));
   8055         DRFLAC_COPY_MEMORY(pInternalOggbs, pOggbs, sizeof(*pOggbs));
   8056 
   8057         /* At this point the pOggbs object has been handed over to pInternalOggbs and can be freed. */
   8058         drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
   8059         pOggbs = NULL;
   8060 
   8061         /* The Ogg bistream needs to be layered on top of the original bitstream. */
   8062         pFlac->bs.onRead = drflac__on_read_ogg;
   8063         pFlac->bs.onSeek = drflac__on_seek_ogg;
   8064         pFlac->bs.pUserData = (void*)pInternalOggbs;
   8065         pFlac->_oggbs = (void*)pInternalOggbs;
   8066     }
   8067 #endif
   8068 
   8069     pFlac->firstFLACFramePosInBytes = firstFramePos;
   8070 
   8071     /* NOTE: Seektables are not currently compatible with Ogg encapsulation (Ogg has its own accelerated seeking system). I may change this later, so I'm leaving this here for now. */
   8072 #ifndef DR_FLAC_NO_OGG
   8073     if (init.container == drflac_container_ogg)
   8074     {
   8075         pFlac->pSeekpoints = NULL;
   8076         pFlac->seekpointCount = 0;
   8077     }
   8078     else
   8079 #endif
   8080     {
   8081         /* If we have a seektable we need to load it now, making sure we move back to where we were previously. */
   8082         if (seektablePos != 0) {
   8083             pFlac->seekpointCount = seekpointCount;
   8084             pFlac->pSeekpoints = (drflac_seekpoint*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize);
   8085 
   8086             DRFLAC_ASSERT(pFlac->bs.onSeek != NULL);
   8087             DRFLAC_ASSERT(pFlac->bs.onRead != NULL);
   8088 
   8089             /* Seek to the seektable, then just read directly into our seektable buffer. */
   8090             if (pFlac->bs.onSeek(pFlac->bs.pUserData, (int)seektablePos, drflac_seek_origin_start)) {
   8091                 drflac_uint32 iSeekpoint;
   8092 
   8093                 for (iSeekpoint = 0; iSeekpoint < seekpointCount; iSeekpoint += 1) {
   8094                     if (pFlac->bs.onRead(pFlac->bs.pUserData, pFlac->pSeekpoints + iSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) == DRFLAC_SEEKPOINT_SIZE_IN_BYTES) {
   8095                         /* Endian swap. */
   8096                         pFlac->pSeekpoints[iSeekpoint].firstPCMFrame   = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].firstPCMFrame);
   8097                         pFlac->pSeekpoints[iSeekpoint].flacFrameOffset = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].flacFrameOffset);
   8098                         pFlac->pSeekpoints[iSeekpoint].pcmFrameCount   = drflac__be2host_16(pFlac->pSeekpoints[iSeekpoint].pcmFrameCount);
   8099                     } else {
   8100                         /* Failed to read the seektable. Pretend we don't have one. */
   8101                         pFlac->pSeekpoints = NULL;
   8102                         pFlac->seekpointCount = 0;
   8103                         break;
   8104                     }
   8105                 }
   8106 
   8107                 /* We need to seek back to where we were. If this fails it's a critical error. */
   8108                 if (!pFlac->bs.onSeek(pFlac->bs.pUserData, (int)pFlac->firstFLACFramePosInBytes, drflac_seek_origin_start)) {
   8109                     drflac__free_from_callbacks(pFlac, &allocationCallbacks);
   8110                     return NULL;
   8111                 }
   8112             } else {
   8113                 /* Failed to seek to the seektable. Ominous sign, but for now we can just pretend we don't have one. */
   8114                 pFlac->pSeekpoints = NULL;
   8115                 pFlac->seekpointCount = 0;
   8116             }
   8117         }
   8118     }
   8119 
   8120 
   8121     /*
   8122     If we get here, but don't have a STREAMINFO block, it means we've opened the stream in relaxed mode and need to decode
   8123     the first frame.
   8124     */
   8125     if (!init.hasStreamInfoBlock) {
   8126         pFlac->currentFLACFrame.header = init.firstFrameHeader;
   8127         for (;;) {
   8128             drflac_result result = drflac__decode_flac_frame(pFlac);
   8129             if (result == DRFLAC_SUCCESS) {
   8130                 break;
   8131             } else {
   8132                 if (result == DRFLAC_CRC_MISMATCH) {
   8133                     if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
   8134                         drflac__free_from_callbacks(pFlac, &allocationCallbacks);
   8135                         return NULL;
   8136                     }
   8137                     continue;
   8138                 } else {
   8139                     drflac__free_from_callbacks(pFlac, &allocationCallbacks);
   8140                     return NULL;
   8141                 }
   8142             }
   8143         }
   8144     }
   8145 
   8146     return pFlac;
   8147 }
   8148 
   8149 
   8150 
   8151 #ifndef DR_FLAC_NO_STDIO
   8152 #include <stdio.h>
   8153 #ifndef DR_FLAC_NO_WCHAR
   8154 #include <wchar.h>      /* For wcslen(), wcsrtombs() */
   8155 #endif
   8156 
   8157 /* Errno */
   8158 /* drflac_result_from_errno() is only used for fopen() and wfopen() so putting it inside DR_WAV_NO_STDIO for now. If something else needs this later we can move it out. */
   8159 #include <errno.h>
   8160 static drflac_result drflac_result_from_errno(int e)
   8161 {
   8162     switch (e)
   8163     {
   8164         case 0: return DRFLAC_SUCCESS;
   8165     #ifdef EPERM
   8166         case EPERM: return DRFLAC_INVALID_OPERATION;
   8167     #endif
   8168     #ifdef ENOENT
   8169         case ENOENT: return DRFLAC_DOES_NOT_EXIST;
   8170     #endif
   8171     #ifdef ESRCH
   8172         case ESRCH: return DRFLAC_DOES_NOT_EXIST;
   8173     #endif
   8174     #ifdef EINTR
   8175         case EINTR: return DRFLAC_INTERRUPT;
   8176     #endif
   8177     #ifdef EIO
   8178         case EIO: return DRFLAC_IO_ERROR;
   8179     #endif
   8180     #ifdef ENXIO
   8181         case ENXIO: return DRFLAC_DOES_NOT_EXIST;
   8182     #endif
   8183     #ifdef E2BIG
   8184         case E2BIG: return DRFLAC_INVALID_ARGS;
   8185     #endif
   8186     #ifdef ENOEXEC
   8187         case ENOEXEC: return DRFLAC_INVALID_FILE;
   8188     #endif
   8189     #ifdef EBADF
   8190         case EBADF: return DRFLAC_INVALID_FILE;
   8191     #endif
   8192     #ifdef ECHILD
   8193         case ECHILD: return DRFLAC_ERROR;
   8194     #endif
   8195     #ifdef EAGAIN
   8196         case EAGAIN: return DRFLAC_UNAVAILABLE;
   8197     #endif
   8198     #ifdef ENOMEM
   8199         case ENOMEM: return DRFLAC_OUT_OF_MEMORY;
   8200     #endif
   8201     #ifdef EACCES
   8202         case EACCES: return DRFLAC_ACCESS_DENIED;
   8203     #endif
   8204     #ifdef EFAULT
   8205         case EFAULT: return DRFLAC_BAD_ADDRESS;
   8206     #endif
   8207     #ifdef ENOTBLK
   8208         case ENOTBLK: return DRFLAC_ERROR;
   8209     #endif
   8210     #ifdef EBUSY
   8211         case EBUSY: return DRFLAC_BUSY;
   8212     #endif
   8213     #ifdef EEXIST
   8214         case EEXIST: return DRFLAC_ALREADY_EXISTS;
   8215     #endif
   8216     #ifdef EXDEV
   8217         case EXDEV: return DRFLAC_ERROR;
   8218     #endif
   8219     #ifdef ENODEV
   8220         case ENODEV: return DRFLAC_DOES_NOT_EXIST;
   8221     #endif
   8222     #ifdef ENOTDIR
   8223         case ENOTDIR: return DRFLAC_NOT_DIRECTORY;
   8224     #endif
   8225     #ifdef EISDIR
   8226         case EISDIR: return DRFLAC_IS_DIRECTORY;
   8227     #endif
   8228     #ifdef EINVAL
   8229         case EINVAL: return DRFLAC_INVALID_ARGS;
   8230     #endif
   8231     #ifdef ENFILE
   8232         case ENFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
   8233     #endif
   8234     #ifdef EMFILE
   8235         case EMFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
   8236     #endif
   8237     #ifdef ENOTTY
   8238         case ENOTTY: return DRFLAC_INVALID_OPERATION;
   8239     #endif
   8240     #ifdef ETXTBSY
   8241         case ETXTBSY: return DRFLAC_BUSY;
   8242     #endif
   8243     #ifdef EFBIG
   8244         case EFBIG: return DRFLAC_TOO_BIG;
   8245     #endif
   8246     #ifdef ENOSPC
   8247         case ENOSPC: return DRFLAC_NO_SPACE;
   8248     #endif
   8249     #ifdef ESPIPE
   8250         case ESPIPE: return DRFLAC_BAD_SEEK;
   8251     #endif
   8252     #ifdef EROFS
   8253         case EROFS: return DRFLAC_ACCESS_DENIED;
   8254     #endif
   8255     #ifdef EMLINK
   8256         case EMLINK: return DRFLAC_TOO_MANY_LINKS;
   8257     #endif
   8258     #ifdef EPIPE
   8259         case EPIPE: return DRFLAC_BAD_PIPE;
   8260     #endif
   8261     #ifdef EDOM
   8262         case EDOM: return DRFLAC_OUT_OF_RANGE;
   8263     #endif
   8264     #ifdef ERANGE
   8265         case ERANGE: return DRFLAC_OUT_OF_RANGE;
   8266     #endif
   8267     #ifdef EDEADLK
   8268         case EDEADLK: return DRFLAC_DEADLOCK;
   8269     #endif
   8270     #ifdef ENAMETOOLONG
   8271         case ENAMETOOLONG: return DRFLAC_PATH_TOO_LONG;
   8272     #endif
   8273     #ifdef ENOLCK
   8274         case ENOLCK: return DRFLAC_ERROR;
   8275     #endif
   8276     #ifdef ENOSYS
   8277         case ENOSYS: return DRFLAC_NOT_IMPLEMENTED;
   8278     #endif
   8279     #ifdef ENOTEMPTY
   8280         case ENOTEMPTY: return DRFLAC_DIRECTORY_NOT_EMPTY;
   8281     #endif
   8282     #ifdef ELOOP
   8283         case ELOOP: return DRFLAC_TOO_MANY_LINKS;
   8284     #endif
   8285     #ifdef ENOMSG
   8286         case ENOMSG: return DRFLAC_NO_MESSAGE;
   8287     #endif
   8288     #ifdef EIDRM
   8289         case EIDRM: return DRFLAC_ERROR;
   8290     #endif
   8291     #ifdef ECHRNG
   8292         case ECHRNG: return DRFLAC_ERROR;
   8293     #endif
   8294     #ifdef EL2NSYNC
   8295         case EL2NSYNC: return DRFLAC_ERROR;
   8296     #endif
   8297     #ifdef EL3HLT
   8298         case EL3HLT: return DRFLAC_ERROR;
   8299     #endif
   8300     #ifdef EL3RST
   8301         case EL3RST: return DRFLAC_ERROR;
   8302     #endif
   8303     #ifdef ELNRNG
   8304         case ELNRNG: return DRFLAC_OUT_OF_RANGE;
   8305     #endif
   8306     #ifdef EUNATCH
   8307         case EUNATCH: return DRFLAC_ERROR;
   8308     #endif
   8309     #ifdef ENOCSI
   8310         case ENOCSI: return DRFLAC_ERROR;
   8311     #endif
   8312     #ifdef EL2HLT
   8313         case EL2HLT: return DRFLAC_ERROR;
   8314     #endif
   8315     #ifdef EBADE
   8316         case EBADE: return DRFLAC_ERROR;
   8317     #endif
   8318     #ifdef EBADR
   8319         case EBADR: return DRFLAC_ERROR;
   8320     #endif
   8321     #ifdef EXFULL
   8322         case EXFULL: return DRFLAC_ERROR;
   8323     #endif
   8324     #ifdef ENOANO
   8325         case ENOANO: return DRFLAC_ERROR;
   8326     #endif
   8327     #ifdef EBADRQC
   8328         case EBADRQC: return DRFLAC_ERROR;
   8329     #endif
   8330     #ifdef EBADSLT
   8331         case EBADSLT: return DRFLAC_ERROR;
   8332     #endif
   8333     #ifdef EBFONT
   8334         case EBFONT: return DRFLAC_INVALID_FILE;
   8335     #endif
   8336     #ifdef ENOSTR
   8337         case ENOSTR: return DRFLAC_ERROR;
   8338     #endif
   8339     #ifdef ENODATA
   8340         case ENODATA: return DRFLAC_NO_DATA_AVAILABLE;
   8341     #endif
   8342     #ifdef ETIME
   8343         case ETIME: return DRFLAC_TIMEOUT;
   8344     #endif
   8345     #ifdef ENOSR
   8346         case ENOSR: return DRFLAC_NO_DATA_AVAILABLE;
   8347     #endif
   8348     #ifdef ENONET
   8349         case ENONET: return DRFLAC_NO_NETWORK;
   8350     #endif
   8351     #ifdef ENOPKG
   8352         case ENOPKG: return DRFLAC_ERROR;
   8353     #endif
   8354     #ifdef EREMOTE
   8355         case EREMOTE: return DRFLAC_ERROR;
   8356     #endif
   8357     #ifdef ENOLINK
   8358         case ENOLINK: return DRFLAC_ERROR;
   8359     #endif
   8360     #ifdef EADV
   8361         case EADV: return DRFLAC_ERROR;
   8362     #endif
   8363     #ifdef ESRMNT
   8364         case ESRMNT: return DRFLAC_ERROR;
   8365     #endif
   8366     #ifdef ECOMM
   8367         case ECOMM: return DRFLAC_ERROR;
   8368     #endif
   8369     #ifdef EPROTO
   8370         case EPROTO: return DRFLAC_ERROR;
   8371     #endif
   8372     #ifdef EMULTIHOP
   8373         case EMULTIHOP: return DRFLAC_ERROR;
   8374     #endif
   8375     #ifdef EDOTDOT
   8376         case EDOTDOT: return DRFLAC_ERROR;
   8377     #endif
   8378     #ifdef EBADMSG
   8379         case EBADMSG: return DRFLAC_BAD_MESSAGE;
   8380     #endif
   8381     #ifdef EOVERFLOW
   8382         case EOVERFLOW: return DRFLAC_TOO_BIG;
   8383     #endif
   8384     #ifdef ENOTUNIQ
   8385         case ENOTUNIQ: return DRFLAC_NOT_UNIQUE;
   8386     #endif
   8387     #ifdef EBADFD
   8388         case EBADFD: return DRFLAC_ERROR;
   8389     #endif
   8390     #ifdef EREMCHG
   8391         case EREMCHG: return DRFLAC_ERROR;
   8392     #endif
   8393     #ifdef ELIBACC
   8394         case ELIBACC: return DRFLAC_ACCESS_DENIED;
   8395     #endif
   8396     #ifdef ELIBBAD
   8397         case ELIBBAD: return DRFLAC_INVALID_FILE;
   8398     #endif
   8399     #ifdef ELIBSCN
   8400         case ELIBSCN: return DRFLAC_INVALID_FILE;
   8401     #endif
   8402     #ifdef ELIBMAX
   8403         case ELIBMAX: return DRFLAC_ERROR;
   8404     #endif
   8405     #ifdef ELIBEXEC
   8406         case ELIBEXEC: return DRFLAC_ERROR;
   8407     #endif
   8408     #ifdef EILSEQ
   8409         case EILSEQ: return DRFLAC_INVALID_DATA;
   8410     #endif
   8411     #ifdef ERESTART
   8412         case ERESTART: return DRFLAC_ERROR;
   8413     #endif
   8414     #ifdef ESTRPIPE
   8415         case ESTRPIPE: return DRFLAC_ERROR;
   8416     #endif
   8417     #ifdef EUSERS
   8418         case EUSERS: return DRFLAC_ERROR;
   8419     #endif
   8420     #ifdef ENOTSOCK
   8421         case ENOTSOCK: return DRFLAC_NOT_SOCKET;
   8422     #endif
   8423     #ifdef EDESTADDRREQ
   8424         case EDESTADDRREQ: return DRFLAC_NO_ADDRESS;
   8425     #endif
   8426     #ifdef EMSGSIZE
   8427         case EMSGSIZE: return DRFLAC_TOO_BIG;
   8428     #endif
   8429     #ifdef EPROTOTYPE
   8430         case EPROTOTYPE: return DRFLAC_BAD_PROTOCOL;
   8431     #endif
   8432     #ifdef ENOPROTOOPT
   8433         case ENOPROTOOPT: return DRFLAC_PROTOCOL_UNAVAILABLE;
   8434     #endif
   8435     #ifdef EPROTONOSUPPORT
   8436         case EPROTONOSUPPORT: return DRFLAC_PROTOCOL_NOT_SUPPORTED;
   8437     #endif
   8438     #ifdef ESOCKTNOSUPPORT
   8439         case ESOCKTNOSUPPORT: return DRFLAC_SOCKET_NOT_SUPPORTED;
   8440     #endif
   8441     #ifdef EOPNOTSUPP
   8442         case EOPNOTSUPP: return DRFLAC_INVALID_OPERATION;
   8443     #endif
   8444     #ifdef EPFNOSUPPORT
   8445         case EPFNOSUPPORT: return DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED;
   8446     #endif
   8447     #ifdef EAFNOSUPPORT
   8448         case EAFNOSUPPORT: return DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED;
   8449     #endif
   8450     #ifdef EADDRINUSE
   8451         case EADDRINUSE: return DRFLAC_ALREADY_IN_USE;
   8452     #endif
   8453     #ifdef EADDRNOTAVAIL
   8454         case EADDRNOTAVAIL: return DRFLAC_ERROR;
   8455     #endif
   8456     #ifdef ENETDOWN
   8457         case ENETDOWN: return DRFLAC_NO_NETWORK;
   8458     #endif
   8459     #ifdef ENETUNREACH
   8460         case ENETUNREACH: return DRFLAC_NO_NETWORK;
   8461     #endif
   8462     #ifdef ENETRESET
   8463         case ENETRESET: return DRFLAC_NO_NETWORK;
   8464     #endif
   8465     #ifdef ECONNABORTED
   8466         case ECONNABORTED: return DRFLAC_NO_NETWORK;
   8467     #endif
   8468     #ifdef ECONNRESET
   8469         case ECONNRESET: return DRFLAC_CONNECTION_RESET;
   8470     #endif
   8471     #ifdef ENOBUFS
   8472         case ENOBUFS: return DRFLAC_NO_SPACE;
   8473     #endif
   8474     #ifdef EISCONN
   8475         case EISCONN: return DRFLAC_ALREADY_CONNECTED;
   8476     #endif
   8477     #ifdef ENOTCONN
   8478         case ENOTCONN: return DRFLAC_NOT_CONNECTED;
   8479     #endif
   8480     #ifdef ESHUTDOWN
   8481         case ESHUTDOWN: return DRFLAC_ERROR;
   8482     #endif
   8483     #ifdef ETOOMANYREFS
   8484         case ETOOMANYREFS: return DRFLAC_ERROR;
   8485     #endif
   8486     #ifdef ETIMEDOUT
   8487         case ETIMEDOUT: return DRFLAC_TIMEOUT;
   8488     #endif
   8489     #ifdef ECONNREFUSED
   8490         case ECONNREFUSED: return DRFLAC_CONNECTION_REFUSED;
   8491     #endif
   8492     #ifdef EHOSTDOWN
   8493         case EHOSTDOWN: return DRFLAC_NO_HOST;
   8494     #endif
   8495     #ifdef EHOSTUNREACH
   8496         case EHOSTUNREACH: return DRFLAC_NO_HOST;
   8497     #endif
   8498     #ifdef EALREADY
   8499         case EALREADY: return DRFLAC_IN_PROGRESS;
   8500     #endif
   8501     #ifdef EINPROGRESS
   8502         case EINPROGRESS: return DRFLAC_IN_PROGRESS;
   8503     #endif
   8504     #ifdef ESTALE
   8505         case ESTALE: return DRFLAC_INVALID_FILE;
   8506     #endif
   8507     #ifdef EUCLEAN
   8508         case EUCLEAN: return DRFLAC_ERROR;
   8509     #endif
   8510     #ifdef ENOTNAM
   8511         case ENOTNAM: return DRFLAC_ERROR;
   8512     #endif
   8513     #ifdef ENAVAIL
   8514         case ENAVAIL: return DRFLAC_ERROR;
   8515     #endif
   8516     #ifdef EISNAM
   8517         case EISNAM: return DRFLAC_ERROR;
   8518     #endif
   8519     #ifdef EREMOTEIO
   8520         case EREMOTEIO: return DRFLAC_IO_ERROR;
   8521     #endif
   8522     #ifdef EDQUOT
   8523         case EDQUOT: return DRFLAC_NO_SPACE;
   8524     #endif
   8525     #ifdef ENOMEDIUM
   8526         case ENOMEDIUM: return DRFLAC_DOES_NOT_EXIST;
   8527     #endif
   8528     #ifdef EMEDIUMTYPE
   8529         case EMEDIUMTYPE: return DRFLAC_ERROR;
   8530     #endif
   8531     #ifdef ECANCELED
   8532         case ECANCELED: return DRFLAC_CANCELLED;
   8533     #endif
   8534     #ifdef ENOKEY
   8535         case ENOKEY: return DRFLAC_ERROR;
   8536     #endif
   8537     #ifdef EKEYEXPIRED
   8538         case EKEYEXPIRED: return DRFLAC_ERROR;
   8539     #endif
   8540     #ifdef EKEYREVOKED
   8541         case EKEYREVOKED: return DRFLAC_ERROR;
   8542     #endif
   8543     #ifdef EKEYREJECTED
   8544         case EKEYREJECTED: return DRFLAC_ERROR;
   8545     #endif
   8546     #ifdef EOWNERDEAD
   8547         case EOWNERDEAD: return DRFLAC_ERROR;
   8548     #endif
   8549     #ifdef ENOTRECOVERABLE
   8550         case ENOTRECOVERABLE: return DRFLAC_ERROR;
   8551     #endif
   8552     #ifdef ERFKILL
   8553         case ERFKILL: return DRFLAC_ERROR;
   8554     #endif
   8555     #ifdef EHWPOISON
   8556         case EHWPOISON: return DRFLAC_ERROR;
   8557     #endif
   8558         default: return DRFLAC_ERROR;
   8559     }
   8560 }
   8561 /* End Errno */
   8562 
   8563 /* fopen */
   8564 static drflac_result drflac_fopen(FILE** ppFile, const char* pFilePath, const char* pOpenMode)
   8565 {
   8566 #if defined(_MSC_VER) && _MSC_VER >= 1400
   8567     errno_t err;
   8568 #endif
   8569 
   8570     if (ppFile != NULL) {
   8571         *ppFile = NULL;  /* Safety. */
   8572     }
   8573 
   8574     if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
   8575         return DRFLAC_INVALID_ARGS;
   8576     }
   8577 
   8578 #if defined(_MSC_VER) && _MSC_VER >= 1400
   8579     err = fopen_s(ppFile, pFilePath, pOpenMode);
   8580     if (err != 0) {
   8581         return drflac_result_from_errno(err);
   8582     }
   8583 #else
   8584 #if defined(_WIN32) || defined(__APPLE__)
   8585     *ppFile = fopen(pFilePath, pOpenMode);
   8586 #else
   8587     #if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64 && defined(_LARGEFILE64_SOURCE)
   8588         *ppFile = fopen64(pFilePath, pOpenMode);
   8589     #else
   8590         *ppFile = fopen(pFilePath, pOpenMode);
   8591     #endif
   8592 #endif
   8593     if (*ppFile == NULL) {
   8594         drflac_result result = drflac_result_from_errno(errno);
   8595         if (result == DRFLAC_SUCCESS) {
   8596             result = DRFLAC_ERROR;   /* Just a safety check to make sure we never ever return success when pFile == NULL. */
   8597         }
   8598 
   8599         return result;
   8600     }
   8601 #endif
   8602 
   8603     return DRFLAC_SUCCESS;
   8604 }
   8605 
   8606 /*
   8607 _wfopen() isn't always available in all compilation environments.
   8608 
   8609     * Windows only.
   8610     * MSVC seems to support it universally as far back as VC6 from what I can tell (haven't checked further back).
   8611     * MinGW-64 (both 32- and 64-bit) seems to support it.
   8612     * MinGW wraps it in !defined(__STRICT_ANSI__).
   8613     * OpenWatcom wraps it in !defined(_NO_EXT_KEYS).
   8614 
   8615 This can be reviewed as compatibility issues arise. The preference is to use _wfopen_s() and _wfopen() as opposed to the wcsrtombs()
   8616 fallback, so if you notice your compiler not detecting this properly I'm happy to look at adding support.
   8617 */
   8618 #if defined(_WIN32)
   8619     #if defined(_MSC_VER) || defined(__MINGW64__) || (!defined(__STRICT_ANSI__) && !defined(_NO_EXT_KEYS))
   8620         #define DRFLAC_HAS_WFOPEN
   8621     #endif
   8622 #endif
   8623 
   8624 #ifndef DR_FLAC_NO_WCHAR
   8625 static drflac_result drflac_wfopen(FILE** ppFile, const wchar_t* pFilePath, const wchar_t* pOpenMode, const drflac_allocation_callbacks* pAllocationCallbacks)
   8626 {
   8627     if (ppFile != NULL) {
   8628         *ppFile = NULL;  /* Safety. */
   8629     }
   8630 
   8631     if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
   8632         return DRFLAC_INVALID_ARGS;
   8633     }
   8634 
   8635 #if defined(DRFLAC_HAS_WFOPEN)
   8636     {
   8637         /* Use _wfopen() on Windows. */
   8638     #if defined(_MSC_VER) && _MSC_VER >= 1400
   8639         errno_t err = _wfopen_s(ppFile, pFilePath, pOpenMode);
   8640         if (err != 0) {
   8641             return drflac_result_from_errno(err);
   8642         }
   8643     #else
   8644         *ppFile = _wfopen(pFilePath, pOpenMode);
   8645         if (*ppFile == NULL) {
   8646             return drflac_result_from_errno(errno);
   8647         }
   8648     #endif
   8649         (void)pAllocationCallbacks;
   8650     }
   8651 #else
   8652     /*
   8653     Use fopen() on anything other than Windows. Requires a conversion. This is annoying because
   8654 	fopen() is locale specific. The only real way I can think of to do this is with wcsrtombs(). Note
   8655 	that wcstombs() is apparently not thread-safe because it uses a static global mbstate_t object for
   8656     maintaining state. I've checked this with -std=c89 and it works, but if somebody get's a compiler
   8657 	error I'll look into improving compatibility.
   8658     */
   8659 
   8660 	/*
   8661 	Some compilers don't support wchar_t or wcsrtombs() which we're using below. In this case we just
   8662 	need to abort with an error. If you encounter a compiler lacking such support, add it to this list
   8663 	and submit a bug report and it'll be added to the library upstream.
   8664 	*/
   8665 	#if defined(__DJGPP__)
   8666 	{
   8667 		/* Nothing to do here. This will fall through to the error check below. */
   8668 	}
   8669 	#else
   8670     {
   8671         mbstate_t mbs;
   8672         size_t lenMB;
   8673         const wchar_t* pFilePathTemp = pFilePath;
   8674         char* pFilePathMB = NULL;
   8675         char pOpenModeMB[32] = {0};
   8676 
   8677         /* Get the length first. */
   8678         DRFLAC_ZERO_OBJECT(&mbs);
   8679         lenMB = wcsrtombs(NULL, &pFilePathTemp, 0, &mbs);
   8680         if (lenMB == (size_t)-1) {
   8681             return drflac_result_from_errno(errno);
   8682         }
   8683 
   8684         pFilePathMB = (char*)drflac__malloc_from_callbacks(lenMB + 1, pAllocationCallbacks);
   8685         if (pFilePathMB == NULL) {
   8686             return DRFLAC_OUT_OF_MEMORY;
   8687         }
   8688 
   8689         pFilePathTemp = pFilePath;
   8690         DRFLAC_ZERO_OBJECT(&mbs);
   8691         wcsrtombs(pFilePathMB, &pFilePathTemp, lenMB + 1, &mbs);
   8692 
   8693         /* The open mode should always consist of ASCII characters so we should be able to do a trivial conversion. */
   8694         {
   8695             size_t i = 0;
   8696             for (;;) {
   8697                 if (pOpenMode[i] == 0) {
   8698                     pOpenModeMB[i] = '\0';
   8699                     break;
   8700                 }
   8701 
   8702                 pOpenModeMB[i] = (char)pOpenMode[i];
   8703                 i += 1;
   8704             }
   8705         }
   8706 
   8707         *ppFile = fopen(pFilePathMB, pOpenModeMB);
   8708 
   8709         drflac__free_from_callbacks(pFilePathMB, pAllocationCallbacks);
   8710     }
   8711 	#endif
   8712 
   8713     if (*ppFile == NULL) {
   8714         return DRFLAC_ERROR;
   8715     }
   8716 #endif
   8717 
   8718     return DRFLAC_SUCCESS;
   8719 }
   8720 #endif
   8721 /* End fopen */
   8722 
   8723 static size_t drflac__on_read_stdio(void* pUserData, void* bufferOut, size_t bytesToRead)
   8724 {
   8725     return fread(bufferOut, 1, bytesToRead, (FILE*)pUserData);
   8726 }
   8727 
   8728 static drflac_bool32 drflac__on_seek_stdio(void* pUserData, int offset, drflac_seek_origin origin)
   8729 {
   8730     DRFLAC_ASSERT(offset >= 0);  /* <-- Never seek backwards. */
   8731 
   8732     return fseek((FILE*)pUserData, offset, (origin == drflac_seek_origin_current) ? SEEK_CUR : SEEK_SET) == 0;
   8733 }
   8734 
   8735 
   8736 DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
   8737 {
   8738     drflac* pFlac;
   8739     FILE* pFile;
   8740 
   8741     if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
   8742         return NULL;
   8743     }
   8744 
   8745     pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
   8746     if (pFlac == NULL) {
   8747         fclose(pFile);
   8748         return NULL;
   8749     }
   8750 
   8751     return pFlac;
   8752 }
   8753 
   8754 #ifndef DR_FLAC_NO_WCHAR
   8755 DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
   8756 {
   8757     drflac* pFlac;
   8758     FILE* pFile;
   8759 
   8760     if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
   8761         return NULL;
   8762     }
   8763 
   8764     pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
   8765     if (pFlac == NULL) {
   8766         fclose(pFile);
   8767         return NULL;
   8768     }
   8769 
   8770     return pFlac;
   8771 }
   8772 #endif
   8773 
   8774 DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
   8775 {
   8776     drflac* pFlac;
   8777     FILE* pFile;
   8778 
   8779     if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
   8780         return NULL;
   8781     }
   8782 
   8783     pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
   8784     if (pFlac == NULL) {
   8785         fclose(pFile);
   8786         return pFlac;
   8787     }
   8788 
   8789     return pFlac;
   8790 }
   8791 
   8792 #ifndef DR_FLAC_NO_WCHAR
   8793 DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
   8794 {
   8795     drflac* pFlac;
   8796     FILE* pFile;
   8797 
   8798     if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
   8799         return NULL;
   8800     }
   8801 
   8802     pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
   8803     if (pFlac == NULL) {
   8804         fclose(pFile);
   8805         return pFlac;
   8806     }
   8807 
   8808     return pFlac;
   8809 }
   8810 #endif
   8811 #endif  /* DR_FLAC_NO_STDIO */
   8812 
   8813 static size_t drflac__on_read_memory(void* pUserData, void* bufferOut, size_t bytesToRead)
   8814 {
   8815     drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
   8816     size_t bytesRemaining;
   8817 
   8818     DRFLAC_ASSERT(memoryStream != NULL);
   8819     DRFLAC_ASSERT(memoryStream->dataSize >= memoryStream->currentReadPos);
   8820 
   8821     bytesRemaining = memoryStream->dataSize - memoryStream->currentReadPos;
   8822     if (bytesToRead > bytesRemaining) {
   8823         bytesToRead = bytesRemaining;
   8824     }
   8825 
   8826     if (bytesToRead > 0) {
   8827         DRFLAC_COPY_MEMORY(bufferOut, memoryStream->data + memoryStream->currentReadPos, bytesToRead);
   8828         memoryStream->currentReadPos += bytesToRead;
   8829     }
   8830 
   8831     return bytesToRead;
   8832 }
   8833 
   8834 static drflac_bool32 drflac__on_seek_memory(void* pUserData, int offset, drflac_seek_origin origin)
   8835 {
   8836     drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
   8837 
   8838     DRFLAC_ASSERT(memoryStream != NULL);
   8839     DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
   8840 
   8841     if (offset > (drflac_int64)memoryStream->dataSize) {
   8842         return DRFLAC_FALSE;
   8843     }
   8844 
   8845     if (origin == drflac_seek_origin_current) {
   8846         if (memoryStream->currentReadPos + offset <= memoryStream->dataSize) {
   8847             memoryStream->currentReadPos += offset;
   8848         } else {
   8849             return DRFLAC_FALSE;  /* Trying to seek too far forward. */
   8850         }
   8851     } else {
   8852         if ((drflac_uint32)offset <= memoryStream->dataSize) {
   8853             memoryStream->currentReadPos = offset;
   8854         } else {
   8855             return DRFLAC_FALSE;  /* Trying to seek too far forward. */
   8856         }
   8857     }
   8858 
   8859     return DRFLAC_TRUE;
   8860 }
   8861 
   8862 DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks)
   8863 {
   8864     drflac__memory_stream memoryStream;
   8865     drflac* pFlac;
   8866 
   8867     memoryStream.data = (const drflac_uint8*)pData;
   8868     memoryStream.dataSize = dataSize;
   8869     memoryStream.currentReadPos = 0;
   8870     pFlac = drflac_open(drflac__on_read_memory, drflac__on_seek_memory, &memoryStream, pAllocationCallbacks);
   8871     if (pFlac == NULL) {
   8872         return NULL;
   8873     }
   8874 
   8875     pFlac->memoryStream = memoryStream;
   8876 
   8877     /* This is an awful hack... */
   8878 #ifndef DR_FLAC_NO_OGG
   8879     if (pFlac->container == drflac_container_ogg)
   8880     {
   8881         drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
   8882         oggbs->pUserData = &pFlac->memoryStream;
   8883     }
   8884     else
   8885 #endif
   8886     {
   8887         pFlac->bs.pUserData = &pFlac->memoryStream;
   8888     }
   8889 
   8890     return pFlac;
   8891 }
   8892 
   8893 DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
   8894 {
   8895     drflac__memory_stream memoryStream;
   8896     drflac* pFlac;
   8897 
   8898     memoryStream.data = (const drflac_uint8*)pData;
   8899     memoryStream.dataSize = dataSize;
   8900     memoryStream.currentReadPos = 0;
   8901     pFlac = drflac_open_with_metadata_private(drflac__on_read_memory, drflac__on_seek_memory, onMeta, drflac_container_unknown, &memoryStream, pUserData, pAllocationCallbacks);
   8902     if (pFlac == NULL) {
   8903         return NULL;
   8904     }
   8905 
   8906     pFlac->memoryStream = memoryStream;
   8907 
   8908     /* This is an awful hack... */
   8909 #ifndef DR_FLAC_NO_OGG
   8910     if (pFlac->container == drflac_container_ogg)
   8911     {
   8912         drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
   8913         oggbs->pUserData = &pFlac->memoryStream;
   8914     }
   8915     else
   8916 #endif
   8917     {
   8918         pFlac->bs.pUserData = &pFlac->memoryStream;
   8919     }
   8920 
   8921     return pFlac;
   8922 }
   8923 
   8924 
   8925 
   8926 DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
   8927 {
   8928     return drflac_open_with_metadata_private(onRead, onSeek, NULL, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
   8929 }
   8930 DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
   8931 {
   8932     return drflac_open_with_metadata_private(onRead, onSeek, NULL, container, pUserData, pUserData, pAllocationCallbacks);
   8933 }
   8934 
   8935 DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
   8936 {
   8937     return drflac_open_with_metadata_private(onRead, onSeek, onMeta, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
   8938 }
   8939 DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
   8940 {
   8941     return drflac_open_with_metadata_private(onRead, onSeek, onMeta, container, pUserData, pUserData, pAllocationCallbacks);
   8942 }
   8943 
   8944 DRFLAC_API void drflac_close(drflac* pFlac)
   8945 {
   8946     if (pFlac == NULL) {
   8947         return;
   8948     }
   8949 
   8950 #ifndef DR_FLAC_NO_STDIO
   8951     /*
   8952     If we opened the file with drflac_open_file() we will want to close the file handle. We can know whether or not drflac_open_file()
   8953     was used by looking at the callbacks.
   8954     */
   8955     if (pFlac->bs.onRead == drflac__on_read_stdio) {
   8956         fclose((FILE*)pFlac->bs.pUserData);
   8957     }
   8958 
   8959 #ifndef DR_FLAC_NO_OGG
   8960     /* Need to clean up Ogg streams a bit differently due to the way the bit streaming is chained. */
   8961     if (pFlac->container == drflac_container_ogg) {
   8962         drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
   8963         DRFLAC_ASSERT(pFlac->bs.onRead == drflac__on_read_ogg);
   8964 
   8965         if (oggbs->onRead == drflac__on_read_stdio) {
   8966             fclose((FILE*)oggbs->pUserData);
   8967         }
   8968     }
   8969 #endif
   8970 #endif
   8971 
   8972     drflac__free_from_callbacks(pFlac, &pFlac->allocationCallbacks);
   8973 }
   8974 
   8975 
   8976 #if 0
   8977 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   8978 {
   8979     drflac_uint64 i;
   8980     for (i = 0; i < frameCount; ++i) {
   8981         drflac_uint32 left  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
   8982         drflac_uint32 side  = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
   8983         drflac_uint32 right = left - side;
   8984 
   8985         pOutputSamples[i*2+0] = (drflac_int32)left;
   8986         pOutputSamples[i*2+1] = (drflac_int32)right;
   8987     }
   8988 }
   8989 #endif
   8990 
   8991 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   8992 {
   8993     drflac_uint64 i;
   8994     drflac_uint64 frameCount4 = frameCount >> 2;
   8995     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
   8996     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
   8997     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   8998     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   8999 
   9000     for (i = 0; i < frameCount4; ++i) {
   9001         drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
   9002         drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
   9003         drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
   9004         drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
   9005 
   9006         drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
   9007         drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
   9008         drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
   9009         drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
   9010 
   9011         drflac_uint32 right0 = left0 - side0;
   9012         drflac_uint32 right1 = left1 - side1;
   9013         drflac_uint32 right2 = left2 - side2;
   9014         drflac_uint32 right3 = left3 - side3;
   9015 
   9016         pOutputSamples[i*8+0] = (drflac_int32)left0;
   9017         pOutputSamples[i*8+1] = (drflac_int32)right0;
   9018         pOutputSamples[i*8+2] = (drflac_int32)left1;
   9019         pOutputSamples[i*8+3] = (drflac_int32)right1;
   9020         pOutputSamples[i*8+4] = (drflac_int32)left2;
   9021         pOutputSamples[i*8+5] = (drflac_int32)right2;
   9022         pOutputSamples[i*8+6] = (drflac_int32)left3;
   9023         pOutputSamples[i*8+7] = (drflac_int32)right3;
   9024     }
   9025 
   9026     for (i = (frameCount4 << 2); i < frameCount; ++i) {
   9027         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
   9028         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
   9029         drflac_uint32 right = left - side;
   9030 
   9031         pOutputSamples[i*2+0] = (drflac_int32)left;
   9032         pOutputSamples[i*2+1] = (drflac_int32)right;
   9033     }
   9034 }
   9035 
   9036 #if defined(DRFLAC_SUPPORT_SSE2)
   9037 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9038 {
   9039     drflac_uint64 i;
   9040     drflac_uint64 frameCount4 = frameCount >> 2;
   9041     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
   9042     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
   9043     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9044     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9045 
   9046     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
   9047 
   9048     for (i = 0; i < frameCount4; ++i) {
   9049         __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
   9050         __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
   9051         __m128i right = _mm_sub_epi32(left, side);
   9052 
   9053         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
   9054         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
   9055     }
   9056 
   9057     for (i = (frameCount4 << 2); i < frameCount; ++i) {
   9058         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
   9059         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
   9060         drflac_uint32 right = left - side;
   9061 
   9062         pOutputSamples[i*2+0] = (drflac_int32)left;
   9063         pOutputSamples[i*2+1] = (drflac_int32)right;
   9064     }
   9065 }
   9066 #endif
   9067 
   9068 #if defined(DRFLAC_SUPPORT_NEON)
   9069 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9070 {
   9071     drflac_uint64 i;
   9072     drflac_uint64 frameCount4 = frameCount >> 2;
   9073     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
   9074     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
   9075     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9076     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9077     int32x4_t shift0_4;
   9078     int32x4_t shift1_4;
   9079 
   9080     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
   9081 
   9082     shift0_4 = vdupq_n_s32(shift0);
   9083     shift1_4 = vdupq_n_s32(shift1);
   9084 
   9085     for (i = 0; i < frameCount4; ++i) {
   9086         uint32x4_t left;
   9087         uint32x4_t side;
   9088         uint32x4_t right;
   9089 
   9090         left  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
   9091         side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
   9092         right = vsubq_u32(left, side);
   9093 
   9094         drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
   9095     }
   9096 
   9097     for (i = (frameCount4 << 2); i < frameCount; ++i) {
   9098         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
   9099         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
   9100         drflac_uint32 right = left - side;
   9101 
   9102         pOutputSamples[i*2+0] = (drflac_int32)left;
   9103         pOutputSamples[i*2+1] = (drflac_int32)right;
   9104     }
   9105 }
   9106 #endif
   9107 
   9108 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9109 {
   9110 #if defined(DRFLAC_SUPPORT_SSE2)
   9111     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
   9112         drflac_read_pcm_frames_s32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9113     } else
   9114 #elif defined(DRFLAC_SUPPORT_NEON)
   9115     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
   9116         drflac_read_pcm_frames_s32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9117     } else
   9118 #endif
   9119     {
   9120         /* Scalar fallback. */
   9121 #if 0
   9122         drflac_read_pcm_frames_s32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9123 #else
   9124         drflac_read_pcm_frames_s32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9125 #endif
   9126     }
   9127 }
   9128 
   9129 
   9130 #if 0
   9131 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9132 {
   9133     drflac_uint64 i;
   9134     for (i = 0; i < frameCount; ++i) {
   9135         drflac_uint32 side  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
   9136         drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
   9137         drflac_uint32 left  = right + side;
   9138 
   9139         pOutputSamples[i*2+0] = (drflac_int32)left;
   9140         pOutputSamples[i*2+1] = (drflac_int32)right;
   9141     }
   9142 }
   9143 #endif
   9144 
   9145 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9146 {
   9147     drflac_uint64 i;
   9148     drflac_uint64 frameCount4 = frameCount >> 2;
   9149     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
   9150     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
   9151     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9152     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9153 
   9154     for (i = 0; i < frameCount4; ++i) {
   9155         drflac_uint32 side0  = pInputSamples0U32[i*4+0] << shift0;
   9156         drflac_uint32 side1  = pInputSamples0U32[i*4+1] << shift0;
   9157         drflac_uint32 side2  = pInputSamples0U32[i*4+2] << shift0;
   9158         drflac_uint32 side3  = pInputSamples0U32[i*4+3] << shift0;
   9159 
   9160         drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
   9161         drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
   9162         drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
   9163         drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
   9164 
   9165         drflac_uint32 left0 = right0 + side0;
   9166         drflac_uint32 left1 = right1 + side1;
   9167         drflac_uint32 left2 = right2 + side2;
   9168         drflac_uint32 left3 = right3 + side3;
   9169 
   9170         pOutputSamples[i*8+0] = (drflac_int32)left0;
   9171         pOutputSamples[i*8+1] = (drflac_int32)right0;
   9172         pOutputSamples[i*8+2] = (drflac_int32)left1;
   9173         pOutputSamples[i*8+3] = (drflac_int32)right1;
   9174         pOutputSamples[i*8+4] = (drflac_int32)left2;
   9175         pOutputSamples[i*8+5] = (drflac_int32)right2;
   9176         pOutputSamples[i*8+6] = (drflac_int32)left3;
   9177         pOutputSamples[i*8+7] = (drflac_int32)right3;
   9178     }
   9179 
   9180     for (i = (frameCount4 << 2); i < frameCount; ++i) {
   9181         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
   9182         drflac_uint32 right = pInputSamples1U32[i] << shift1;
   9183         drflac_uint32 left  = right + side;
   9184 
   9185         pOutputSamples[i*2+0] = (drflac_int32)left;
   9186         pOutputSamples[i*2+1] = (drflac_int32)right;
   9187     }
   9188 }
   9189 
   9190 #if defined(DRFLAC_SUPPORT_SSE2)
   9191 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9192 {
   9193     drflac_uint64 i;
   9194     drflac_uint64 frameCount4 = frameCount >> 2;
   9195     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
   9196     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
   9197     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9198     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9199 
   9200     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
   9201 
   9202     for (i = 0; i < frameCount4; ++i) {
   9203         __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
   9204         __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
   9205         __m128i left  = _mm_add_epi32(right, side);
   9206 
   9207         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
   9208         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
   9209     }
   9210 
   9211     for (i = (frameCount4 << 2); i < frameCount; ++i) {
   9212         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
   9213         drflac_uint32 right = pInputSamples1U32[i] << shift1;
   9214         drflac_uint32 left  = right + side;
   9215 
   9216         pOutputSamples[i*2+0] = (drflac_int32)left;
   9217         pOutputSamples[i*2+1] = (drflac_int32)right;
   9218     }
   9219 }
   9220 #endif
   9221 
   9222 #if defined(DRFLAC_SUPPORT_NEON)
   9223 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9224 {
   9225     drflac_uint64 i;
   9226     drflac_uint64 frameCount4 = frameCount >> 2;
   9227     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
   9228     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
   9229     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9230     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9231     int32x4_t shift0_4;
   9232     int32x4_t shift1_4;
   9233 
   9234     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
   9235 
   9236     shift0_4 = vdupq_n_s32(shift0);
   9237     shift1_4 = vdupq_n_s32(shift1);
   9238 
   9239     for (i = 0; i < frameCount4; ++i) {
   9240         uint32x4_t side;
   9241         uint32x4_t right;
   9242         uint32x4_t left;
   9243 
   9244         side  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
   9245         right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
   9246         left  = vaddq_u32(right, side);
   9247 
   9248         drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
   9249     }
   9250 
   9251     for (i = (frameCount4 << 2); i < frameCount; ++i) {
   9252         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
   9253         drflac_uint32 right = pInputSamples1U32[i] << shift1;
   9254         drflac_uint32 left  = right + side;
   9255 
   9256         pOutputSamples[i*2+0] = (drflac_int32)left;
   9257         pOutputSamples[i*2+1] = (drflac_int32)right;
   9258     }
   9259 }
   9260 #endif
   9261 
   9262 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9263 {
   9264 #if defined(DRFLAC_SUPPORT_SSE2)
   9265     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
   9266         drflac_read_pcm_frames_s32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9267     } else
   9268 #elif defined(DRFLAC_SUPPORT_NEON)
   9269     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
   9270         drflac_read_pcm_frames_s32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9271     } else
   9272 #endif
   9273     {
   9274         /* Scalar fallback. */
   9275 #if 0
   9276         drflac_read_pcm_frames_s32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9277 #else
   9278         drflac_read_pcm_frames_s32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9279 #endif
   9280     }
   9281 }
   9282 
   9283 
   9284 #if 0
   9285 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9286 {
   9287     for (drflac_uint64 i = 0; i < frameCount; ++i) {
   9288         drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9289         drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9290 
   9291         mid = (mid << 1) | (side & 0x01);
   9292 
   9293         pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
   9294         pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
   9295     }
   9296 }
   9297 #endif
   9298 
   9299 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9300 {
   9301     drflac_uint64 i;
   9302     drflac_uint64 frameCount4 = frameCount >> 2;
   9303     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
   9304     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
   9305     drflac_int32 shift = unusedBitsPerSample;
   9306 
   9307     if (shift > 0) {
   9308         shift -= 1;
   9309         for (i = 0; i < frameCount4; ++i) {
   9310             drflac_uint32 temp0L;
   9311             drflac_uint32 temp1L;
   9312             drflac_uint32 temp2L;
   9313             drflac_uint32 temp3L;
   9314             drflac_uint32 temp0R;
   9315             drflac_uint32 temp1R;
   9316             drflac_uint32 temp2R;
   9317             drflac_uint32 temp3R;
   9318 
   9319             drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9320             drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9321             drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9322             drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9323 
   9324             drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9325             drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9326             drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9327             drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9328 
   9329             mid0 = (mid0 << 1) | (side0 & 0x01);
   9330             mid1 = (mid1 << 1) | (side1 & 0x01);
   9331             mid2 = (mid2 << 1) | (side2 & 0x01);
   9332             mid3 = (mid3 << 1) | (side3 & 0x01);
   9333 
   9334             temp0L = (mid0 + side0) << shift;
   9335             temp1L = (mid1 + side1) << shift;
   9336             temp2L = (mid2 + side2) << shift;
   9337             temp3L = (mid3 + side3) << shift;
   9338 
   9339             temp0R = (mid0 - side0) << shift;
   9340             temp1R = (mid1 - side1) << shift;
   9341             temp2R = (mid2 - side2) << shift;
   9342             temp3R = (mid3 - side3) << shift;
   9343 
   9344             pOutputSamples[i*8+0] = (drflac_int32)temp0L;
   9345             pOutputSamples[i*8+1] = (drflac_int32)temp0R;
   9346             pOutputSamples[i*8+2] = (drflac_int32)temp1L;
   9347             pOutputSamples[i*8+3] = (drflac_int32)temp1R;
   9348             pOutputSamples[i*8+4] = (drflac_int32)temp2L;
   9349             pOutputSamples[i*8+5] = (drflac_int32)temp2R;
   9350             pOutputSamples[i*8+6] = (drflac_int32)temp3L;
   9351             pOutputSamples[i*8+7] = (drflac_int32)temp3R;
   9352         }
   9353     } else {
   9354         for (i = 0; i < frameCount4; ++i) {
   9355             drflac_uint32 temp0L;
   9356             drflac_uint32 temp1L;
   9357             drflac_uint32 temp2L;
   9358             drflac_uint32 temp3L;
   9359             drflac_uint32 temp0R;
   9360             drflac_uint32 temp1R;
   9361             drflac_uint32 temp2R;
   9362             drflac_uint32 temp3R;
   9363 
   9364             drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9365             drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9366             drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9367             drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9368 
   9369             drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9370             drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9371             drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9372             drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9373 
   9374             mid0 = (mid0 << 1) | (side0 & 0x01);
   9375             mid1 = (mid1 << 1) | (side1 & 0x01);
   9376             mid2 = (mid2 << 1) | (side2 & 0x01);
   9377             mid3 = (mid3 << 1) | (side3 & 0x01);
   9378 
   9379             temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
   9380             temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
   9381             temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
   9382             temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
   9383 
   9384             temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
   9385             temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
   9386             temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
   9387             temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
   9388 
   9389             pOutputSamples[i*8+0] = (drflac_int32)temp0L;
   9390             pOutputSamples[i*8+1] = (drflac_int32)temp0R;
   9391             pOutputSamples[i*8+2] = (drflac_int32)temp1L;
   9392             pOutputSamples[i*8+3] = (drflac_int32)temp1R;
   9393             pOutputSamples[i*8+4] = (drflac_int32)temp2L;
   9394             pOutputSamples[i*8+5] = (drflac_int32)temp2R;
   9395             pOutputSamples[i*8+6] = (drflac_int32)temp3L;
   9396             pOutputSamples[i*8+7] = (drflac_int32)temp3R;
   9397         }
   9398     }
   9399 
   9400     for (i = (frameCount4 << 2); i < frameCount; ++i) {
   9401         drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9402         drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9403 
   9404         mid = (mid << 1) | (side & 0x01);
   9405 
   9406         pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
   9407         pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
   9408     }
   9409 }
   9410 
   9411 #if defined(DRFLAC_SUPPORT_SSE2)
   9412 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9413 {
   9414     drflac_uint64 i;
   9415     drflac_uint64 frameCount4 = frameCount >> 2;
   9416     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
   9417     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
   9418     drflac_int32 shift = unusedBitsPerSample;
   9419 
   9420     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
   9421 
   9422     if (shift == 0) {
   9423         for (i = 0; i < frameCount4; ++i) {
   9424             __m128i mid;
   9425             __m128i side;
   9426             __m128i left;
   9427             __m128i right;
   9428 
   9429             mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
   9430             side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
   9431 
   9432             mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
   9433 
   9434             left  = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
   9435             right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
   9436 
   9437             _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
   9438             _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
   9439         }
   9440 
   9441         for (i = (frameCount4 << 2); i < frameCount; ++i) {
   9442             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9443             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9444 
   9445             mid = (mid << 1) | (side & 0x01);
   9446 
   9447             pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
   9448             pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
   9449         }
   9450     } else {
   9451         shift -= 1;
   9452         for (i = 0; i < frameCount4; ++i) {
   9453             __m128i mid;
   9454             __m128i side;
   9455             __m128i left;
   9456             __m128i right;
   9457 
   9458             mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
   9459             side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
   9460 
   9461             mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
   9462 
   9463             left  = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
   9464             right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
   9465 
   9466             _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
   9467             _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
   9468         }
   9469 
   9470         for (i = (frameCount4 << 2); i < frameCount; ++i) {
   9471             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9472             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9473 
   9474             mid = (mid << 1) | (side & 0x01);
   9475 
   9476             pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
   9477             pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
   9478         }
   9479     }
   9480 }
   9481 #endif
   9482 
   9483 #if defined(DRFLAC_SUPPORT_NEON)
   9484 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9485 {
   9486     drflac_uint64 i;
   9487     drflac_uint64 frameCount4 = frameCount >> 2;
   9488     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
   9489     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
   9490     drflac_int32 shift = unusedBitsPerSample;
   9491     int32x4_t  wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
   9492     int32x4_t  wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
   9493     uint32x4_t one4;
   9494 
   9495     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
   9496 
   9497     wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
   9498     wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
   9499     one4         = vdupq_n_u32(1);
   9500 
   9501     if (shift == 0) {
   9502         for (i = 0; i < frameCount4; ++i) {
   9503             uint32x4_t mid;
   9504             uint32x4_t side;
   9505             int32x4_t left;
   9506             int32x4_t right;
   9507 
   9508             mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
   9509             side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
   9510 
   9511             mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));
   9512 
   9513             left  = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
   9514             right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
   9515 
   9516             drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
   9517         }
   9518 
   9519         for (i = (frameCount4 << 2); i < frameCount; ++i) {
   9520             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9521             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9522 
   9523             mid = (mid << 1) | (side & 0x01);
   9524 
   9525             pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
   9526             pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
   9527         }
   9528     } else {
   9529         int32x4_t shift4;
   9530 
   9531         shift -= 1;
   9532         shift4 = vdupq_n_s32(shift);
   9533 
   9534         for (i = 0; i < frameCount4; ++i) {
   9535             uint32x4_t mid;
   9536             uint32x4_t side;
   9537             int32x4_t left;
   9538             int32x4_t right;
   9539 
   9540             mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
   9541             side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
   9542 
   9543             mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));
   9544 
   9545             left  = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
   9546             right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
   9547 
   9548             drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
   9549         }
   9550 
   9551         for (i = (frameCount4 << 2); i < frameCount; ++i) {
   9552             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9553             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9554 
   9555             mid = (mid << 1) | (side & 0x01);
   9556 
   9557             pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
   9558             pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
   9559         }
   9560     }
   9561 }
   9562 #endif
   9563 
   9564 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9565 {
   9566 #if defined(DRFLAC_SUPPORT_SSE2)
   9567     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
   9568         drflac_read_pcm_frames_s32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9569     } else
   9570 #elif defined(DRFLAC_SUPPORT_NEON)
   9571     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
   9572         drflac_read_pcm_frames_s32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9573     } else
   9574 #endif
   9575     {
   9576         /* Scalar fallback. */
   9577 #if 0
   9578         drflac_read_pcm_frames_s32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9579 #else
   9580         drflac_read_pcm_frames_s32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9581 #endif
   9582     }
   9583 }
   9584 
   9585 
   9586 #if 0
   9587 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9588 {
   9589     for (drflac_uint64 i = 0; i < frameCount; ++i) {
   9590         pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample));
   9591         pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample));
   9592     }
   9593 }
   9594 #endif
   9595 
   9596 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9597 {
   9598     drflac_uint64 i;
   9599     drflac_uint64 frameCount4 = frameCount >> 2;
   9600     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
   9601     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
   9602     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9603     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9604 
   9605     for (i = 0; i < frameCount4; ++i) {
   9606         drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
   9607         drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
   9608         drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
   9609         drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
   9610 
   9611         drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
   9612         drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
   9613         drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
   9614         drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
   9615 
   9616         pOutputSamples[i*8+0] = (drflac_int32)tempL0;
   9617         pOutputSamples[i*8+1] = (drflac_int32)tempR0;
   9618         pOutputSamples[i*8+2] = (drflac_int32)tempL1;
   9619         pOutputSamples[i*8+3] = (drflac_int32)tempR1;
   9620         pOutputSamples[i*8+4] = (drflac_int32)tempL2;
   9621         pOutputSamples[i*8+5] = (drflac_int32)tempR2;
   9622         pOutputSamples[i*8+6] = (drflac_int32)tempL3;
   9623         pOutputSamples[i*8+7] = (drflac_int32)tempR3;
   9624     }
   9625 
   9626     for (i = (frameCount4 << 2); i < frameCount; ++i) {
   9627         pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
   9628         pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
   9629     }
   9630 }
   9631 
   9632 #if defined(DRFLAC_SUPPORT_SSE2)
   9633 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9634 {
   9635     drflac_uint64 i;
   9636     drflac_uint64 frameCount4 = frameCount >> 2;
   9637     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
   9638     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
   9639     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9640     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9641 
   9642     for (i = 0; i < frameCount4; ++i) {
   9643         __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
   9644         __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
   9645 
   9646         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
   9647         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
   9648     }
   9649 
   9650     for (i = (frameCount4 << 2); i < frameCount; ++i) {
   9651         pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
   9652         pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
   9653     }
   9654 }
   9655 #endif
   9656 
   9657 #if defined(DRFLAC_SUPPORT_NEON)
   9658 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9659 {
   9660     drflac_uint64 i;
   9661     drflac_uint64 frameCount4 = frameCount >> 2;
   9662     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
   9663     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
   9664     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9665     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9666 
   9667     int32x4_t shift4_0 = vdupq_n_s32(shift0);
   9668     int32x4_t shift4_1 = vdupq_n_s32(shift1);
   9669 
   9670     for (i = 0; i < frameCount4; ++i) {
   9671         int32x4_t left;
   9672         int32x4_t right;
   9673 
   9674         left  = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift4_0));
   9675         right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift4_1));
   9676 
   9677         drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
   9678     }
   9679 
   9680     for (i = (frameCount4 << 2); i < frameCount; ++i) {
   9681         pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
   9682         pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
   9683     }
   9684 }
   9685 #endif
   9686 
   9687 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
   9688 {
   9689 #if defined(DRFLAC_SUPPORT_SSE2)
   9690     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
   9691         drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9692     } else
   9693 #elif defined(DRFLAC_SUPPORT_NEON)
   9694     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
   9695         drflac_read_pcm_frames_s32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9696     } else
   9697 #endif
   9698     {
   9699         /* Scalar fallback. */
   9700 #if 0
   9701         drflac_read_pcm_frames_s32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9702 #else
   9703         drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9704 #endif
   9705     }
   9706 }
   9707 
   9708 
   9709 DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut)
   9710 {
   9711     drflac_uint64 framesRead;
   9712     drflac_uint32 unusedBitsPerSample;
   9713 
   9714     if (pFlac == NULL || framesToRead == 0) {
   9715         return 0;
   9716     }
   9717 
   9718     if (pBufferOut == NULL) {
   9719         return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
   9720     }
   9721 
   9722     DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
   9723     unusedBitsPerSample = 32 - pFlac->bitsPerSample;
   9724 
   9725     framesRead = 0;
   9726     while (framesToRead > 0) {
   9727         /* If we've run out of samples in this frame, go to the next. */
   9728         if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
   9729             if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
   9730                 break;  /* Couldn't read the next frame, so just break from the loop and return. */
   9731             }
   9732         } else {
   9733             unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
   9734             drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
   9735             drflac_uint64 frameCountThisIteration = framesToRead;
   9736 
   9737             if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
   9738                 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
   9739             }
   9740 
   9741             if (channelCount == 2) {
   9742                 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
   9743                 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
   9744 
   9745                 switch (pFlac->currentFLACFrame.header.channelAssignment)
   9746                 {
   9747                     case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
   9748                     {
   9749                         drflac_read_pcm_frames_s32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
   9750                     } break;
   9751 
   9752                     case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
   9753                     {
   9754                         drflac_read_pcm_frames_s32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
   9755                     } break;
   9756 
   9757                     case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
   9758                     {
   9759                         drflac_read_pcm_frames_s32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
   9760                     } break;
   9761 
   9762                     case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
   9763                     default:
   9764                     {
   9765                         drflac_read_pcm_frames_s32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
   9766                     } break;
   9767                 }
   9768             } else {
   9769                 /* Generic interleaving. */
   9770                 drflac_uint64 i;
   9771                 for (i = 0; i < frameCountThisIteration; ++i) {
   9772                     unsigned int j;
   9773                     for (j = 0; j < channelCount; ++j) {
   9774                         pBufferOut[(i*channelCount)+j] = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
   9775                     }
   9776                 }
   9777             }
   9778 
   9779             framesRead                += frameCountThisIteration;
   9780             pBufferOut                += frameCountThisIteration * channelCount;
   9781             framesToRead              -= frameCountThisIteration;
   9782             pFlac->currentPCMFrame    += frameCountThisIteration;
   9783             pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
   9784         }
   9785     }
   9786 
   9787     return framesRead;
   9788 }
   9789 
   9790 
   9791 #if 0
   9792 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
   9793 {
   9794     drflac_uint64 i;
   9795     for (i = 0; i < frameCount; ++i) {
   9796         drflac_uint32 left  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
   9797         drflac_uint32 side  = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
   9798         drflac_uint32 right = left - side;
   9799 
   9800         left  >>= 16;
   9801         right >>= 16;
   9802 
   9803         pOutputSamples[i*2+0] = (drflac_int16)left;
   9804         pOutputSamples[i*2+1] = (drflac_int16)right;
   9805     }
   9806 }
   9807 #endif
   9808 
   9809 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
   9810 {
   9811     drflac_uint64 i;
   9812     drflac_uint64 frameCount4 = frameCount >> 2;
   9813     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
   9814     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
   9815     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9816     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9817 
   9818     for (i = 0; i < frameCount4; ++i) {
   9819         drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
   9820         drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
   9821         drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
   9822         drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
   9823 
   9824         drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
   9825         drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
   9826         drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
   9827         drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
   9828 
   9829         drflac_uint32 right0 = left0 - side0;
   9830         drflac_uint32 right1 = left1 - side1;
   9831         drflac_uint32 right2 = left2 - side2;
   9832         drflac_uint32 right3 = left3 - side3;
   9833 
   9834         left0  >>= 16;
   9835         left1  >>= 16;
   9836         left2  >>= 16;
   9837         left3  >>= 16;
   9838 
   9839         right0 >>= 16;
   9840         right1 >>= 16;
   9841         right2 >>= 16;
   9842         right3 >>= 16;
   9843 
   9844         pOutputSamples[i*8+0] = (drflac_int16)left0;
   9845         pOutputSamples[i*8+1] = (drflac_int16)right0;
   9846         pOutputSamples[i*8+2] = (drflac_int16)left1;
   9847         pOutputSamples[i*8+3] = (drflac_int16)right1;
   9848         pOutputSamples[i*8+4] = (drflac_int16)left2;
   9849         pOutputSamples[i*8+5] = (drflac_int16)right2;
   9850         pOutputSamples[i*8+6] = (drflac_int16)left3;
   9851         pOutputSamples[i*8+7] = (drflac_int16)right3;
   9852     }
   9853 
   9854     for (i = (frameCount4 << 2); i < frameCount; ++i) {
   9855         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
   9856         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
   9857         drflac_uint32 right = left - side;
   9858 
   9859         left  >>= 16;
   9860         right >>= 16;
   9861 
   9862         pOutputSamples[i*2+0] = (drflac_int16)left;
   9863         pOutputSamples[i*2+1] = (drflac_int16)right;
   9864     }
   9865 }
   9866 
   9867 #if defined(DRFLAC_SUPPORT_SSE2)
   9868 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
   9869 {
   9870     drflac_uint64 i;
   9871     drflac_uint64 frameCount4 = frameCount >> 2;
   9872     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
   9873     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
   9874     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9875     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9876 
   9877     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
   9878 
   9879     for (i = 0; i < frameCount4; ++i) {
   9880         __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
   9881         __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
   9882         __m128i right = _mm_sub_epi32(left, side);
   9883 
   9884         left  = _mm_srai_epi32(left,  16);
   9885         right = _mm_srai_epi32(right, 16);
   9886 
   9887         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
   9888     }
   9889 
   9890     for (i = (frameCount4 << 2); i < frameCount; ++i) {
   9891         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
   9892         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
   9893         drflac_uint32 right = left - side;
   9894 
   9895         left  >>= 16;
   9896         right >>= 16;
   9897 
   9898         pOutputSamples[i*2+0] = (drflac_int16)left;
   9899         pOutputSamples[i*2+1] = (drflac_int16)right;
   9900     }
   9901 }
   9902 #endif
   9903 
   9904 #if defined(DRFLAC_SUPPORT_NEON)
   9905 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
   9906 {
   9907     drflac_uint64 i;
   9908     drflac_uint64 frameCount4 = frameCount >> 2;
   9909     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
   9910     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
   9911     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9912     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9913     int32x4_t shift0_4;
   9914     int32x4_t shift1_4;
   9915 
   9916     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
   9917 
   9918     shift0_4 = vdupq_n_s32(shift0);
   9919     shift1_4 = vdupq_n_s32(shift1);
   9920 
   9921     for (i = 0; i < frameCount4; ++i) {
   9922         uint32x4_t left;
   9923         uint32x4_t side;
   9924         uint32x4_t right;
   9925 
   9926         left  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
   9927         side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
   9928         right = vsubq_u32(left, side);
   9929 
   9930         left  = vshrq_n_u32(left,  16);
   9931         right = vshrq_n_u32(right, 16);
   9932 
   9933         drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
   9934     }
   9935 
   9936     for (i = (frameCount4 << 2); i < frameCount; ++i) {
   9937         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
   9938         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
   9939         drflac_uint32 right = left - side;
   9940 
   9941         left  >>= 16;
   9942         right >>= 16;
   9943 
   9944         pOutputSamples[i*2+0] = (drflac_int16)left;
   9945         pOutputSamples[i*2+1] = (drflac_int16)right;
   9946     }
   9947 }
   9948 #endif
   9949 
   9950 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
   9951 {
   9952 #if defined(DRFLAC_SUPPORT_SSE2)
   9953     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
   9954         drflac_read_pcm_frames_s16__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9955     } else
   9956 #elif defined(DRFLAC_SUPPORT_NEON)
   9957     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
   9958         drflac_read_pcm_frames_s16__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9959     } else
   9960 #endif
   9961     {
   9962         /* Scalar fallback. */
   9963 #if 0
   9964         drflac_read_pcm_frames_s16__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9965 #else
   9966         drflac_read_pcm_frames_s16__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
   9967 #endif
   9968     }
   9969 }
   9970 
   9971 
   9972 #if 0
   9973 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
   9974 {
   9975     drflac_uint64 i;
   9976     for (i = 0; i < frameCount; ++i) {
   9977         drflac_uint32 side  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
   9978         drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
   9979         drflac_uint32 left  = right + side;
   9980 
   9981         left  >>= 16;
   9982         right >>= 16;
   9983 
   9984         pOutputSamples[i*2+0] = (drflac_int16)left;
   9985         pOutputSamples[i*2+1] = (drflac_int16)right;
   9986     }
   9987 }
   9988 #endif
   9989 
   9990 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
   9991 {
   9992     drflac_uint64 i;
   9993     drflac_uint64 frameCount4 = frameCount >> 2;
   9994     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
   9995     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
   9996     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
   9997     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
   9998 
   9999     for (i = 0; i < frameCount4; ++i) {
  10000         drflac_uint32 side0  = pInputSamples0U32[i*4+0] << shift0;
  10001         drflac_uint32 side1  = pInputSamples0U32[i*4+1] << shift0;
  10002         drflac_uint32 side2  = pInputSamples0U32[i*4+2] << shift0;
  10003         drflac_uint32 side3  = pInputSamples0U32[i*4+3] << shift0;
  10004 
  10005         drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
  10006         drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
  10007         drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
  10008         drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
  10009 
  10010         drflac_uint32 left0 = right0 + side0;
  10011         drflac_uint32 left1 = right1 + side1;
  10012         drflac_uint32 left2 = right2 + side2;
  10013         drflac_uint32 left3 = right3 + side3;
  10014 
  10015         left0  >>= 16;
  10016         left1  >>= 16;
  10017         left2  >>= 16;
  10018         left3  >>= 16;
  10019 
  10020         right0 >>= 16;
  10021         right1 >>= 16;
  10022         right2 >>= 16;
  10023         right3 >>= 16;
  10024 
  10025         pOutputSamples[i*8+0] = (drflac_int16)left0;
  10026         pOutputSamples[i*8+1] = (drflac_int16)right0;
  10027         pOutputSamples[i*8+2] = (drflac_int16)left1;
  10028         pOutputSamples[i*8+3] = (drflac_int16)right1;
  10029         pOutputSamples[i*8+4] = (drflac_int16)left2;
  10030         pOutputSamples[i*8+5] = (drflac_int16)right2;
  10031         pOutputSamples[i*8+6] = (drflac_int16)left3;
  10032         pOutputSamples[i*8+7] = (drflac_int16)right3;
  10033     }
  10034 
  10035     for (i = (frameCount4 << 2); i < frameCount; ++i) {
  10036         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
  10037         drflac_uint32 right = pInputSamples1U32[i] << shift1;
  10038         drflac_uint32 left  = right + side;
  10039 
  10040         left  >>= 16;
  10041         right >>= 16;
  10042 
  10043         pOutputSamples[i*2+0] = (drflac_int16)left;
  10044         pOutputSamples[i*2+1] = (drflac_int16)right;
  10045     }
  10046 }
  10047 
  10048 #if defined(DRFLAC_SUPPORT_SSE2)
  10049 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  10050 {
  10051     drflac_uint64 i;
  10052     drflac_uint64 frameCount4 = frameCount >> 2;
  10053     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  10054     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  10055     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10056     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10057 
  10058     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  10059 
  10060     for (i = 0; i < frameCount4; ++i) {
  10061         __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
  10062         __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
  10063         __m128i left  = _mm_add_epi32(right, side);
  10064 
  10065         left  = _mm_srai_epi32(left,  16);
  10066         right = _mm_srai_epi32(right, 16);
  10067 
  10068         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
  10069     }
  10070 
  10071     for (i = (frameCount4 << 2); i < frameCount; ++i) {
  10072         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
  10073         drflac_uint32 right = pInputSamples1U32[i] << shift1;
  10074         drflac_uint32 left  = right + side;
  10075 
  10076         left  >>= 16;
  10077         right >>= 16;
  10078 
  10079         pOutputSamples[i*2+0] = (drflac_int16)left;
  10080         pOutputSamples[i*2+1] = (drflac_int16)right;
  10081     }
  10082 }
  10083 #endif
  10084 
  10085 #if defined(DRFLAC_SUPPORT_NEON)
  10086 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  10087 {
  10088     drflac_uint64 i;
  10089     drflac_uint64 frameCount4 = frameCount >> 2;
  10090     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  10091     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  10092     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10093     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10094     int32x4_t shift0_4;
  10095     int32x4_t shift1_4;
  10096 
  10097     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  10098 
  10099     shift0_4 = vdupq_n_s32(shift0);
  10100     shift1_4 = vdupq_n_s32(shift1);
  10101 
  10102     for (i = 0; i < frameCount4; ++i) {
  10103         uint32x4_t side;
  10104         uint32x4_t right;
  10105         uint32x4_t left;
  10106 
  10107         side  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
  10108         right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
  10109         left  = vaddq_u32(right, side);
  10110 
  10111         left  = vshrq_n_u32(left,  16);
  10112         right = vshrq_n_u32(right, 16);
  10113 
  10114         drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
  10115     }
  10116 
  10117     for (i = (frameCount4 << 2); i < frameCount; ++i) {
  10118         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
  10119         drflac_uint32 right = pInputSamples1U32[i] << shift1;
  10120         drflac_uint32 left  = right + side;
  10121 
  10122         left  >>= 16;
  10123         right >>= 16;
  10124 
  10125         pOutputSamples[i*2+0] = (drflac_int16)left;
  10126         pOutputSamples[i*2+1] = (drflac_int16)right;
  10127     }
  10128 }
  10129 #endif
  10130 
  10131 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  10132 {
  10133 #if defined(DRFLAC_SUPPORT_SSE2)
  10134     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  10135         drflac_read_pcm_frames_s16__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  10136     } else
  10137 #elif defined(DRFLAC_SUPPORT_NEON)
  10138     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  10139         drflac_read_pcm_frames_s16__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  10140     } else
  10141 #endif
  10142     {
  10143         /* Scalar fallback. */
  10144 #if 0
  10145         drflac_read_pcm_frames_s16__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  10146 #else
  10147         drflac_read_pcm_frames_s16__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  10148 #endif
  10149     }
  10150 }
  10151 
  10152 
  10153 #if 0
  10154 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  10155 {
  10156     for (drflac_uint64 i = 0; i < frameCount; ++i) {
  10157         drflac_uint32 mid  = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10158         drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10159 
  10160         mid = (mid << 1) | (side & 0x01);
  10161 
  10162         pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
  10163         pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
  10164     }
  10165 }
  10166 #endif
  10167 
  10168 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  10169 {
  10170     drflac_uint64 i;
  10171     drflac_uint64 frameCount4 = frameCount >> 2;
  10172     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  10173     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  10174     drflac_uint32 shift = unusedBitsPerSample;
  10175 
  10176     if (shift > 0) {
  10177         shift -= 1;
  10178         for (i = 0; i < frameCount4; ++i) {
  10179             drflac_uint32 temp0L;
  10180             drflac_uint32 temp1L;
  10181             drflac_uint32 temp2L;
  10182             drflac_uint32 temp3L;
  10183             drflac_uint32 temp0R;
  10184             drflac_uint32 temp1R;
  10185             drflac_uint32 temp2R;
  10186             drflac_uint32 temp3R;
  10187 
  10188             drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10189             drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10190             drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10191             drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10192 
  10193             drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10194             drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10195             drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10196             drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10197 
  10198             mid0 = (mid0 << 1) | (side0 & 0x01);
  10199             mid1 = (mid1 << 1) | (side1 & 0x01);
  10200             mid2 = (mid2 << 1) | (side2 & 0x01);
  10201             mid3 = (mid3 << 1) | (side3 & 0x01);
  10202 
  10203             temp0L = (mid0 + side0) << shift;
  10204             temp1L = (mid1 + side1) << shift;
  10205             temp2L = (mid2 + side2) << shift;
  10206             temp3L = (mid3 + side3) << shift;
  10207 
  10208             temp0R = (mid0 - side0) << shift;
  10209             temp1R = (mid1 - side1) << shift;
  10210             temp2R = (mid2 - side2) << shift;
  10211             temp3R = (mid3 - side3) << shift;
  10212 
  10213             temp0L >>= 16;
  10214             temp1L >>= 16;
  10215             temp2L >>= 16;
  10216             temp3L >>= 16;
  10217 
  10218             temp0R >>= 16;
  10219             temp1R >>= 16;
  10220             temp2R >>= 16;
  10221             temp3R >>= 16;
  10222 
  10223             pOutputSamples[i*8+0] = (drflac_int16)temp0L;
  10224             pOutputSamples[i*8+1] = (drflac_int16)temp0R;
  10225             pOutputSamples[i*8+2] = (drflac_int16)temp1L;
  10226             pOutputSamples[i*8+3] = (drflac_int16)temp1R;
  10227             pOutputSamples[i*8+4] = (drflac_int16)temp2L;
  10228             pOutputSamples[i*8+5] = (drflac_int16)temp2R;
  10229             pOutputSamples[i*8+6] = (drflac_int16)temp3L;
  10230             pOutputSamples[i*8+7] = (drflac_int16)temp3R;
  10231         }
  10232     } else {
  10233         for (i = 0; i < frameCount4; ++i) {
  10234             drflac_uint32 temp0L;
  10235             drflac_uint32 temp1L;
  10236             drflac_uint32 temp2L;
  10237             drflac_uint32 temp3L;
  10238             drflac_uint32 temp0R;
  10239             drflac_uint32 temp1R;
  10240             drflac_uint32 temp2R;
  10241             drflac_uint32 temp3R;
  10242 
  10243             drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10244             drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10245             drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10246             drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10247 
  10248             drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10249             drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10250             drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10251             drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10252 
  10253             mid0 = (mid0 << 1) | (side0 & 0x01);
  10254             mid1 = (mid1 << 1) | (side1 & 0x01);
  10255             mid2 = (mid2 << 1) | (side2 & 0x01);
  10256             mid3 = (mid3 << 1) | (side3 & 0x01);
  10257 
  10258             temp0L = ((drflac_int32)(mid0 + side0) >> 1);
  10259             temp1L = ((drflac_int32)(mid1 + side1) >> 1);
  10260             temp2L = ((drflac_int32)(mid2 + side2) >> 1);
  10261             temp3L = ((drflac_int32)(mid3 + side3) >> 1);
  10262 
  10263             temp0R = ((drflac_int32)(mid0 - side0) >> 1);
  10264             temp1R = ((drflac_int32)(mid1 - side1) >> 1);
  10265             temp2R = ((drflac_int32)(mid2 - side2) >> 1);
  10266             temp3R = ((drflac_int32)(mid3 - side3) >> 1);
  10267 
  10268             temp0L >>= 16;
  10269             temp1L >>= 16;
  10270             temp2L >>= 16;
  10271             temp3L >>= 16;
  10272 
  10273             temp0R >>= 16;
  10274             temp1R >>= 16;
  10275             temp2R >>= 16;
  10276             temp3R >>= 16;
  10277 
  10278             pOutputSamples[i*8+0] = (drflac_int16)temp0L;
  10279             pOutputSamples[i*8+1] = (drflac_int16)temp0R;
  10280             pOutputSamples[i*8+2] = (drflac_int16)temp1L;
  10281             pOutputSamples[i*8+3] = (drflac_int16)temp1R;
  10282             pOutputSamples[i*8+4] = (drflac_int16)temp2L;
  10283             pOutputSamples[i*8+5] = (drflac_int16)temp2R;
  10284             pOutputSamples[i*8+6] = (drflac_int16)temp3L;
  10285             pOutputSamples[i*8+7] = (drflac_int16)temp3R;
  10286         }
  10287     }
  10288 
  10289     for (i = (frameCount4 << 2); i < frameCount; ++i) {
  10290         drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10291         drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10292 
  10293         mid = (mid << 1) | (side & 0x01);
  10294 
  10295         pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
  10296         pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
  10297     }
  10298 }
  10299 
  10300 #if defined(DRFLAC_SUPPORT_SSE2)
  10301 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  10302 {
  10303     drflac_uint64 i;
  10304     drflac_uint64 frameCount4 = frameCount >> 2;
  10305     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  10306     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  10307     drflac_uint32 shift = unusedBitsPerSample;
  10308 
  10309     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  10310 
  10311     if (shift == 0) {
  10312         for (i = 0; i < frameCount4; ++i) {
  10313             __m128i mid;
  10314             __m128i side;
  10315             __m128i left;
  10316             __m128i right;
  10317 
  10318             mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  10319             side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  10320 
  10321             mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
  10322 
  10323             left  = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
  10324             right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
  10325 
  10326             left  = _mm_srai_epi32(left,  16);
  10327             right = _mm_srai_epi32(right, 16);
  10328 
  10329             _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
  10330         }
  10331 
  10332         for (i = (frameCount4 << 2); i < frameCount; ++i) {
  10333             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10334             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10335 
  10336             mid = (mid << 1) | (side & 0x01);
  10337 
  10338             pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
  10339             pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
  10340         }
  10341     } else {
  10342         shift -= 1;
  10343         for (i = 0; i < frameCount4; ++i) {
  10344             __m128i mid;
  10345             __m128i side;
  10346             __m128i left;
  10347             __m128i right;
  10348 
  10349             mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  10350             side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  10351 
  10352             mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
  10353 
  10354             left  = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
  10355             right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
  10356 
  10357             left  = _mm_srai_epi32(left,  16);
  10358             right = _mm_srai_epi32(right, 16);
  10359 
  10360             _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
  10361         }
  10362 
  10363         for (i = (frameCount4 << 2); i < frameCount; ++i) {
  10364             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10365             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10366 
  10367             mid = (mid << 1) | (side & 0x01);
  10368 
  10369             pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
  10370             pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
  10371         }
  10372     }
  10373 }
  10374 #endif
  10375 
  10376 #if defined(DRFLAC_SUPPORT_NEON)
  10377 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  10378 {
  10379     drflac_uint64 i;
  10380     drflac_uint64 frameCount4 = frameCount >> 2;
  10381     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  10382     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  10383     drflac_uint32 shift = unusedBitsPerSample;
  10384     int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
  10385     int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
  10386 
  10387     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  10388 
  10389     wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  10390     wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  10391 
  10392     if (shift == 0) {
  10393         for (i = 0; i < frameCount4; ++i) {
  10394             uint32x4_t mid;
  10395             uint32x4_t side;
  10396             int32x4_t left;
  10397             int32x4_t right;
  10398 
  10399             mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
  10400             side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
  10401 
  10402             mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
  10403 
  10404             left  = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
  10405             right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
  10406 
  10407             left  = vshrq_n_s32(left,  16);
  10408             right = vshrq_n_s32(right, 16);
  10409 
  10410             drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
  10411         }
  10412 
  10413         for (i = (frameCount4 << 2); i < frameCount; ++i) {
  10414             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10415             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10416 
  10417             mid = (mid << 1) | (side & 0x01);
  10418 
  10419             pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
  10420             pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
  10421         }
  10422     } else {
  10423         int32x4_t shift4;
  10424 
  10425         shift -= 1;
  10426         shift4 = vdupq_n_s32(shift);
  10427 
  10428         for (i = 0; i < frameCount4; ++i) {
  10429             uint32x4_t mid;
  10430             uint32x4_t side;
  10431             int32x4_t left;
  10432             int32x4_t right;
  10433 
  10434             mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
  10435             side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
  10436 
  10437             mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
  10438 
  10439             left  = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
  10440             right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
  10441 
  10442             left  = vshrq_n_s32(left,  16);
  10443             right = vshrq_n_s32(right, 16);
  10444 
  10445             drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
  10446         }
  10447 
  10448         for (i = (frameCount4 << 2); i < frameCount; ++i) {
  10449             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10450             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10451 
  10452             mid = (mid << 1) | (side & 0x01);
  10453 
  10454             pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
  10455             pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
  10456         }
  10457     }
  10458 }
  10459 #endif
  10460 
  10461 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  10462 {
  10463 #if defined(DRFLAC_SUPPORT_SSE2)
  10464     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  10465         drflac_read_pcm_frames_s16__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  10466     } else
  10467 #elif defined(DRFLAC_SUPPORT_NEON)
  10468     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  10469         drflac_read_pcm_frames_s16__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  10470     } else
  10471 #endif
  10472     {
  10473         /* Scalar fallback. */
  10474 #if 0
  10475         drflac_read_pcm_frames_s16__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  10476 #else
  10477         drflac_read_pcm_frames_s16__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  10478 #endif
  10479     }
  10480 }
  10481 
  10482 
  10483 #if 0
  10484 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  10485 {
  10486     for (drflac_uint64 i = 0; i < frameCount; ++i) {
  10487         pOutputSamples[i*2+0] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) >> 16);
  10488         pOutputSamples[i*2+1] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) >> 16);
  10489     }
  10490 }
  10491 #endif
  10492 
  10493 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  10494 {
  10495     drflac_uint64 i;
  10496     drflac_uint64 frameCount4 = frameCount >> 2;
  10497     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  10498     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  10499     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10500     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10501 
  10502     for (i = 0; i < frameCount4; ++i) {
  10503         drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
  10504         drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
  10505         drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
  10506         drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
  10507 
  10508         drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
  10509         drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
  10510         drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
  10511         drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
  10512 
  10513         tempL0 >>= 16;
  10514         tempL1 >>= 16;
  10515         tempL2 >>= 16;
  10516         tempL3 >>= 16;
  10517 
  10518         tempR0 >>= 16;
  10519         tempR1 >>= 16;
  10520         tempR2 >>= 16;
  10521         tempR3 >>= 16;
  10522 
  10523         pOutputSamples[i*8+0] = (drflac_int16)tempL0;
  10524         pOutputSamples[i*8+1] = (drflac_int16)tempR0;
  10525         pOutputSamples[i*8+2] = (drflac_int16)tempL1;
  10526         pOutputSamples[i*8+3] = (drflac_int16)tempR1;
  10527         pOutputSamples[i*8+4] = (drflac_int16)tempL2;
  10528         pOutputSamples[i*8+5] = (drflac_int16)tempR2;
  10529         pOutputSamples[i*8+6] = (drflac_int16)tempL3;
  10530         pOutputSamples[i*8+7] = (drflac_int16)tempR3;
  10531     }
  10532 
  10533     for (i = (frameCount4 << 2); i < frameCount; ++i) {
  10534         pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
  10535         pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
  10536     }
  10537 }
  10538 
  10539 #if defined(DRFLAC_SUPPORT_SSE2)
  10540 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  10541 {
  10542     drflac_uint64 i;
  10543     drflac_uint64 frameCount4 = frameCount >> 2;
  10544     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  10545     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  10546     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10547     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10548 
  10549     for (i = 0; i < frameCount4; ++i) {
  10550         __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
  10551         __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
  10552 
  10553         left  = _mm_srai_epi32(left,  16);
  10554         right = _mm_srai_epi32(right, 16);
  10555 
  10556         /* At this point we have results. We can now pack and interleave these into a single __m128i object and then store the in the output buffer. */
  10557         _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
  10558     }
  10559 
  10560     for (i = (frameCount4 << 2); i < frameCount; ++i) {
  10561         pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
  10562         pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
  10563     }
  10564 }
  10565 #endif
  10566 
  10567 #if defined(DRFLAC_SUPPORT_NEON)
  10568 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  10569 {
  10570     drflac_uint64 i;
  10571     drflac_uint64 frameCount4 = frameCount >> 2;
  10572     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  10573     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  10574     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10575     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10576 
  10577     int32x4_t shift0_4 = vdupq_n_s32(shift0);
  10578     int32x4_t shift1_4 = vdupq_n_s32(shift1);
  10579 
  10580     for (i = 0; i < frameCount4; ++i) {
  10581         int32x4_t left;
  10582         int32x4_t right;
  10583 
  10584         left  = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
  10585         right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
  10586 
  10587         left  = vshrq_n_s32(left,  16);
  10588         right = vshrq_n_s32(right, 16);
  10589 
  10590         drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
  10591     }
  10592 
  10593     for (i = (frameCount4 << 2); i < frameCount; ++i) {
  10594         pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
  10595         pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
  10596     }
  10597 }
  10598 #endif
  10599 
  10600 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  10601 {
  10602 #if defined(DRFLAC_SUPPORT_SSE2)
  10603     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  10604         drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  10605     } else
  10606 #elif defined(DRFLAC_SUPPORT_NEON)
  10607     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  10608         drflac_read_pcm_frames_s16__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  10609     } else
  10610 #endif
  10611     {
  10612         /* Scalar fallback. */
  10613 #if 0
  10614         drflac_read_pcm_frames_s16__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  10615 #else
  10616         drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  10617 #endif
  10618     }
  10619 }
  10620 
  10621 DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut)
  10622 {
  10623     drflac_uint64 framesRead;
  10624     drflac_uint32 unusedBitsPerSample;
  10625 
  10626     if (pFlac == NULL || framesToRead == 0) {
  10627         return 0;
  10628     }
  10629 
  10630     if (pBufferOut == NULL) {
  10631         return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
  10632     }
  10633 
  10634     DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
  10635     unusedBitsPerSample = 32 - pFlac->bitsPerSample;
  10636 
  10637     framesRead = 0;
  10638     while (framesToRead > 0) {
  10639         /* If we've run out of samples in this frame, go to the next. */
  10640         if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
  10641             if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
  10642                 break;  /* Couldn't read the next frame, so just break from the loop and return. */
  10643             }
  10644         } else {
  10645             unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
  10646             drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
  10647             drflac_uint64 frameCountThisIteration = framesToRead;
  10648 
  10649             if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
  10650                 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
  10651             }
  10652 
  10653             if (channelCount == 2) {
  10654                 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
  10655                 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
  10656 
  10657                 switch (pFlac->currentFLACFrame.header.channelAssignment)
  10658                 {
  10659                     case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
  10660                     {
  10661                         drflac_read_pcm_frames_s16__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  10662                     } break;
  10663 
  10664                     case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
  10665                     {
  10666                         drflac_read_pcm_frames_s16__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  10667                     } break;
  10668 
  10669                     case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
  10670                     {
  10671                         drflac_read_pcm_frames_s16__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  10672                     } break;
  10673 
  10674                     case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
  10675                     default:
  10676                     {
  10677                         drflac_read_pcm_frames_s16__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  10678                     } break;
  10679                 }
  10680             } else {
  10681                 /* Generic interleaving. */
  10682                 drflac_uint64 i;
  10683                 for (i = 0; i < frameCountThisIteration; ++i) {
  10684                     unsigned int j;
  10685                     for (j = 0; j < channelCount; ++j) {
  10686                         drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
  10687                         pBufferOut[(i*channelCount)+j] = (drflac_int16)(sampleS32 >> 16);
  10688                     }
  10689                 }
  10690             }
  10691 
  10692             framesRead                += frameCountThisIteration;
  10693             pBufferOut                += frameCountThisIteration * channelCount;
  10694             framesToRead              -= frameCountThisIteration;
  10695             pFlac->currentPCMFrame    += frameCountThisIteration;
  10696             pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
  10697         }
  10698     }
  10699 
  10700     return framesRead;
  10701 }
  10702 
  10703 
  10704 #if 0
  10705 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  10706 {
  10707     drflac_uint64 i;
  10708     for (i = 0; i < frameCount; ++i) {
  10709         drflac_uint32 left  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  10710         drflac_uint32 side  = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  10711         drflac_uint32 right = left - side;
  10712 
  10713         pOutputSamples[i*2+0] = (float)((drflac_int32)left  / 2147483648.0);
  10714         pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
  10715     }
  10716 }
  10717 #endif
  10718 
  10719 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  10720 {
  10721     drflac_uint64 i;
  10722     drflac_uint64 frameCount4 = frameCount >> 2;
  10723     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  10724     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  10725     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10726     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10727 
  10728     float factor = 1 / 2147483648.0;
  10729 
  10730     for (i = 0; i < frameCount4; ++i) {
  10731         drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
  10732         drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
  10733         drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
  10734         drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
  10735 
  10736         drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
  10737         drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
  10738         drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
  10739         drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
  10740 
  10741         drflac_uint32 right0 = left0 - side0;
  10742         drflac_uint32 right1 = left1 - side1;
  10743         drflac_uint32 right2 = left2 - side2;
  10744         drflac_uint32 right3 = left3 - side3;
  10745 
  10746         pOutputSamples[i*8+0] = (drflac_int32)left0  * factor;
  10747         pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
  10748         pOutputSamples[i*8+2] = (drflac_int32)left1  * factor;
  10749         pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
  10750         pOutputSamples[i*8+4] = (drflac_int32)left2  * factor;
  10751         pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
  10752         pOutputSamples[i*8+6] = (drflac_int32)left3  * factor;
  10753         pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
  10754     }
  10755 
  10756     for (i = (frameCount4 << 2); i < frameCount; ++i) {
  10757         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
  10758         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
  10759         drflac_uint32 right = left - side;
  10760 
  10761         pOutputSamples[i*2+0] = (drflac_int32)left  * factor;
  10762         pOutputSamples[i*2+1] = (drflac_int32)right * factor;
  10763     }
  10764 }
  10765 
  10766 #if defined(DRFLAC_SUPPORT_SSE2)
  10767 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  10768 {
  10769     drflac_uint64 i;
  10770     drflac_uint64 frameCount4 = frameCount >> 2;
  10771     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  10772     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  10773     drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
  10774     drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
  10775     __m128 factor;
  10776 
  10777     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  10778 
  10779     factor = _mm_set1_ps(1.0f / 8388608.0f);
  10780 
  10781     for (i = 0; i < frameCount4; ++i) {
  10782         __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
  10783         __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
  10784         __m128i right = _mm_sub_epi32(left, side);
  10785         __m128 leftf  = _mm_mul_ps(_mm_cvtepi32_ps(left),  factor);
  10786         __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
  10787 
  10788         _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
  10789         _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
  10790     }
  10791 
  10792     for (i = (frameCount4 << 2); i < frameCount; ++i) {
  10793         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
  10794         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
  10795         drflac_uint32 right = left - side;
  10796 
  10797         pOutputSamples[i*2+0] = (drflac_int32)left  / 8388608.0f;
  10798         pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
  10799     }
  10800 }
  10801 #endif
  10802 
  10803 #if defined(DRFLAC_SUPPORT_NEON)
  10804 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  10805 {
  10806     drflac_uint64 i;
  10807     drflac_uint64 frameCount4 = frameCount >> 2;
  10808     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  10809     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  10810     drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
  10811     drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
  10812     float32x4_t factor4;
  10813     int32x4_t shift0_4;
  10814     int32x4_t shift1_4;
  10815 
  10816     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  10817 
  10818     factor4  = vdupq_n_f32(1.0f / 8388608.0f);
  10819     shift0_4 = vdupq_n_s32(shift0);
  10820     shift1_4 = vdupq_n_s32(shift1);
  10821 
  10822     for (i = 0; i < frameCount4; ++i) {
  10823         uint32x4_t left;
  10824         uint32x4_t side;
  10825         uint32x4_t right;
  10826         float32x4_t leftf;
  10827         float32x4_t rightf;
  10828 
  10829         left   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
  10830         side   = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
  10831         right  = vsubq_u32(left, side);
  10832         leftf  = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)),  factor4);
  10833         rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
  10834 
  10835         drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
  10836     }
  10837 
  10838     for (i = (frameCount4 << 2); i < frameCount; ++i) {
  10839         drflac_uint32 left  = pInputSamples0U32[i] << shift0;
  10840         drflac_uint32 side  = pInputSamples1U32[i] << shift1;
  10841         drflac_uint32 right = left - side;
  10842 
  10843         pOutputSamples[i*2+0] = (drflac_int32)left  / 8388608.0f;
  10844         pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
  10845     }
  10846 }
  10847 #endif
  10848 
  10849 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  10850 {
  10851 #if defined(DRFLAC_SUPPORT_SSE2)
  10852     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  10853         drflac_read_pcm_frames_f32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  10854     } else
  10855 #elif defined(DRFLAC_SUPPORT_NEON)
  10856     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  10857         drflac_read_pcm_frames_f32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  10858     } else
  10859 #endif
  10860     {
  10861         /* Scalar fallback. */
  10862 #if 0
  10863         drflac_read_pcm_frames_f32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  10864 #else
  10865         drflac_read_pcm_frames_f32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  10866 #endif
  10867     }
  10868 }
  10869 
  10870 
  10871 #if 0
  10872 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  10873 {
  10874     drflac_uint64 i;
  10875     for (i = 0; i < frameCount; ++i) {
  10876         drflac_uint32 side  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  10877         drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  10878         drflac_uint32 left  = right + side;
  10879 
  10880         pOutputSamples[i*2+0] = (float)((drflac_int32)left  / 2147483648.0);
  10881         pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
  10882     }
  10883 }
  10884 #endif
  10885 
  10886 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  10887 {
  10888     drflac_uint64 i;
  10889     drflac_uint64 frameCount4 = frameCount >> 2;
  10890     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  10891     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  10892     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  10893     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  10894     float factor = 1 / 2147483648.0;
  10895 
  10896     for (i = 0; i < frameCount4; ++i) {
  10897         drflac_uint32 side0  = pInputSamples0U32[i*4+0] << shift0;
  10898         drflac_uint32 side1  = pInputSamples0U32[i*4+1] << shift0;
  10899         drflac_uint32 side2  = pInputSamples0U32[i*4+2] << shift0;
  10900         drflac_uint32 side3  = pInputSamples0U32[i*4+3] << shift0;
  10901 
  10902         drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
  10903         drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
  10904         drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
  10905         drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
  10906 
  10907         drflac_uint32 left0 = right0 + side0;
  10908         drflac_uint32 left1 = right1 + side1;
  10909         drflac_uint32 left2 = right2 + side2;
  10910         drflac_uint32 left3 = right3 + side3;
  10911 
  10912         pOutputSamples[i*8+0] = (drflac_int32)left0  * factor;
  10913         pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
  10914         pOutputSamples[i*8+2] = (drflac_int32)left1  * factor;
  10915         pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
  10916         pOutputSamples[i*8+4] = (drflac_int32)left2  * factor;
  10917         pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
  10918         pOutputSamples[i*8+6] = (drflac_int32)left3  * factor;
  10919         pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
  10920     }
  10921 
  10922     for (i = (frameCount4 << 2); i < frameCount; ++i) {
  10923         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
  10924         drflac_uint32 right = pInputSamples1U32[i] << shift1;
  10925         drflac_uint32 left  = right + side;
  10926 
  10927         pOutputSamples[i*2+0] = (drflac_int32)left  * factor;
  10928         pOutputSamples[i*2+1] = (drflac_int32)right * factor;
  10929     }
  10930 }
  10931 
  10932 #if defined(DRFLAC_SUPPORT_SSE2)
  10933 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  10934 {
  10935     drflac_uint64 i;
  10936     drflac_uint64 frameCount4 = frameCount >> 2;
  10937     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  10938     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  10939     drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
  10940     drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
  10941     __m128 factor;
  10942 
  10943     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  10944 
  10945     factor = _mm_set1_ps(1.0f / 8388608.0f);
  10946 
  10947     for (i = 0; i < frameCount4; ++i) {
  10948         __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
  10949         __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
  10950         __m128i left  = _mm_add_epi32(right, side);
  10951         __m128 leftf  = _mm_mul_ps(_mm_cvtepi32_ps(left),  factor);
  10952         __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
  10953 
  10954         _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
  10955         _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
  10956     }
  10957 
  10958     for (i = (frameCount4 << 2); i < frameCount; ++i) {
  10959         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
  10960         drflac_uint32 right = pInputSamples1U32[i] << shift1;
  10961         drflac_uint32 left  = right + side;
  10962 
  10963         pOutputSamples[i*2+0] = (drflac_int32)left  / 8388608.0f;
  10964         pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
  10965     }
  10966 }
  10967 #endif
  10968 
  10969 #if defined(DRFLAC_SUPPORT_NEON)
  10970 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  10971 {
  10972     drflac_uint64 i;
  10973     drflac_uint64 frameCount4 = frameCount >> 2;
  10974     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  10975     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  10976     drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
  10977     drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
  10978     float32x4_t factor4;
  10979     int32x4_t shift0_4;
  10980     int32x4_t shift1_4;
  10981 
  10982     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  10983 
  10984     factor4  = vdupq_n_f32(1.0f / 8388608.0f);
  10985     shift0_4 = vdupq_n_s32(shift0);
  10986     shift1_4 = vdupq_n_s32(shift1);
  10987 
  10988     for (i = 0; i < frameCount4; ++i) {
  10989         uint32x4_t side;
  10990         uint32x4_t right;
  10991         uint32x4_t left;
  10992         float32x4_t leftf;
  10993         float32x4_t rightf;
  10994 
  10995         side   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
  10996         right  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
  10997         left   = vaddq_u32(right, side);
  10998         leftf  = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)),  factor4);
  10999         rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
  11000 
  11001         drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
  11002     }
  11003 
  11004     for (i = (frameCount4 << 2); i < frameCount; ++i) {
  11005         drflac_uint32 side  = pInputSamples0U32[i] << shift0;
  11006         drflac_uint32 right = pInputSamples1U32[i] << shift1;
  11007         drflac_uint32 left  = right + side;
  11008 
  11009         pOutputSamples[i*2+0] = (drflac_int32)left  / 8388608.0f;
  11010         pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
  11011     }
  11012 }
  11013 #endif
  11014 
  11015 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  11016 {
  11017 #if defined(DRFLAC_SUPPORT_SSE2)
  11018     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  11019         drflac_read_pcm_frames_f32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  11020     } else
  11021 #elif defined(DRFLAC_SUPPORT_NEON)
  11022     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  11023         drflac_read_pcm_frames_f32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  11024     } else
  11025 #endif
  11026     {
  11027         /* Scalar fallback. */
  11028 #if 0
  11029         drflac_read_pcm_frames_f32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  11030 #else
  11031         drflac_read_pcm_frames_f32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  11032 #endif
  11033     }
  11034 }
  11035 
  11036 
  11037 #if 0
  11038 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  11039 {
  11040     for (drflac_uint64 i = 0; i < frameCount; ++i) {
  11041         drflac_uint32 mid  = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  11042         drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  11043 
  11044         mid = (mid << 1) | (side & 0x01);
  11045 
  11046         pOutputSamples[i*2+0] = (float)((((drflac_int32)(mid + side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
  11047         pOutputSamples[i*2+1] = (float)((((drflac_int32)(mid - side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
  11048     }
  11049 }
  11050 #endif
  11051 
  11052 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  11053 {
  11054     drflac_uint64 i;
  11055     drflac_uint64 frameCount4 = frameCount >> 2;
  11056     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  11057     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  11058     drflac_uint32 shift = unusedBitsPerSample;
  11059     float factor = 1 / 2147483648.0;
  11060 
  11061     if (shift > 0) {
  11062         shift -= 1;
  11063         for (i = 0; i < frameCount4; ++i) {
  11064             drflac_uint32 temp0L;
  11065             drflac_uint32 temp1L;
  11066             drflac_uint32 temp2L;
  11067             drflac_uint32 temp3L;
  11068             drflac_uint32 temp0R;
  11069             drflac_uint32 temp1R;
  11070             drflac_uint32 temp2R;
  11071             drflac_uint32 temp3R;
  11072 
  11073             drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  11074             drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  11075             drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  11076             drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  11077 
  11078             drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  11079             drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  11080             drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  11081             drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  11082 
  11083             mid0 = (mid0 << 1) | (side0 & 0x01);
  11084             mid1 = (mid1 << 1) | (side1 & 0x01);
  11085             mid2 = (mid2 << 1) | (side2 & 0x01);
  11086             mid3 = (mid3 << 1) | (side3 & 0x01);
  11087 
  11088             temp0L = (mid0 + side0) << shift;
  11089             temp1L = (mid1 + side1) << shift;
  11090             temp2L = (mid2 + side2) << shift;
  11091             temp3L = (mid3 + side3) << shift;
  11092 
  11093             temp0R = (mid0 - side0) << shift;
  11094             temp1R = (mid1 - side1) << shift;
  11095             temp2R = (mid2 - side2) << shift;
  11096             temp3R = (mid3 - side3) << shift;
  11097 
  11098             pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
  11099             pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
  11100             pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
  11101             pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
  11102             pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
  11103             pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
  11104             pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
  11105             pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
  11106         }
  11107     } else {
  11108         for (i = 0; i < frameCount4; ++i) {
  11109             drflac_uint32 temp0L;
  11110             drflac_uint32 temp1L;
  11111             drflac_uint32 temp2L;
  11112             drflac_uint32 temp3L;
  11113             drflac_uint32 temp0R;
  11114             drflac_uint32 temp1R;
  11115             drflac_uint32 temp2R;
  11116             drflac_uint32 temp3R;
  11117 
  11118             drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  11119             drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  11120             drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  11121             drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  11122 
  11123             drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  11124             drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  11125             drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  11126             drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  11127 
  11128             mid0 = (mid0 << 1) | (side0 & 0x01);
  11129             mid1 = (mid1 << 1) | (side1 & 0x01);
  11130             mid2 = (mid2 << 1) | (side2 & 0x01);
  11131             mid3 = (mid3 << 1) | (side3 & 0x01);
  11132 
  11133             temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
  11134             temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
  11135             temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
  11136             temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
  11137 
  11138             temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
  11139             temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
  11140             temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
  11141             temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
  11142 
  11143             pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
  11144             pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
  11145             pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
  11146             pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
  11147             pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
  11148             pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
  11149             pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
  11150             pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
  11151         }
  11152     }
  11153 
  11154     for (i = (frameCount4 << 2); i < frameCount; ++i) {
  11155         drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  11156         drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  11157 
  11158         mid = (mid << 1) | (side & 0x01);
  11159 
  11160         pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) * factor;
  11161         pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) * factor;
  11162     }
  11163 }
  11164 
  11165 #if defined(DRFLAC_SUPPORT_SSE2)
  11166 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  11167 {
  11168     drflac_uint64 i;
  11169     drflac_uint64 frameCount4 = frameCount >> 2;
  11170     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  11171     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  11172     drflac_uint32 shift = unusedBitsPerSample - 8;
  11173     float factor;
  11174     __m128 factor128;
  11175 
  11176     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  11177 
  11178     factor = 1.0f / 8388608.0f;
  11179     factor128 = _mm_set1_ps(factor);
  11180 
  11181     if (shift == 0) {
  11182         for (i = 0; i < frameCount4; ++i) {
  11183             __m128i mid;
  11184             __m128i side;
  11185             __m128i tempL;
  11186             __m128i tempR;
  11187             __m128  leftf;
  11188             __m128  rightf;
  11189 
  11190             mid    = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  11191             side   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  11192 
  11193             mid    = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
  11194 
  11195             tempL  = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
  11196             tempR  = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
  11197 
  11198             leftf  = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
  11199             rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
  11200 
  11201             _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
  11202             _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
  11203         }
  11204 
  11205         for (i = (frameCount4 << 2); i < frameCount; ++i) {
  11206             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  11207             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  11208 
  11209             mid = (mid << 1) | (side & 0x01);
  11210 
  11211             pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
  11212             pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
  11213         }
  11214     } else {
  11215         shift -= 1;
  11216         for (i = 0; i < frameCount4; ++i) {
  11217             __m128i mid;
  11218             __m128i side;
  11219             __m128i tempL;
  11220             __m128i tempR;
  11221             __m128 leftf;
  11222             __m128 rightf;
  11223 
  11224             mid    = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  11225             side   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  11226 
  11227             mid    = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
  11228 
  11229             tempL  = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
  11230             tempR  = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
  11231 
  11232             leftf  = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
  11233             rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
  11234 
  11235             _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
  11236             _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
  11237         }
  11238 
  11239         for (i = (frameCount4 << 2); i < frameCount; ++i) {
  11240             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  11241             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  11242 
  11243             mid = (mid << 1) | (side & 0x01);
  11244 
  11245             pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
  11246             pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
  11247         }
  11248     }
  11249 }
  11250 #endif
  11251 
  11252 #if defined(DRFLAC_SUPPORT_NEON)
  11253 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  11254 {
  11255     drflac_uint64 i;
  11256     drflac_uint64 frameCount4 = frameCount >> 2;
  11257     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  11258     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  11259     drflac_uint32 shift = unusedBitsPerSample - 8;
  11260     float factor;
  11261     float32x4_t factor4;
  11262     int32x4_t shift4;
  11263     int32x4_t wbps0_4;  /* Wasted Bits Per Sample */
  11264     int32x4_t wbps1_4;  /* Wasted Bits Per Sample */
  11265 
  11266     DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  11267 
  11268     factor  = 1.0f / 8388608.0f;
  11269     factor4 = vdupq_n_f32(factor);
  11270     wbps0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  11271     wbps1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  11272 
  11273     if (shift == 0) {
  11274         for (i = 0; i < frameCount4; ++i) {
  11275             int32x4_t lefti;
  11276             int32x4_t righti;
  11277             float32x4_t leftf;
  11278             float32x4_t rightf;
  11279 
  11280             uint32x4_t mid  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
  11281             uint32x4_t side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
  11282 
  11283             mid    = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
  11284 
  11285             lefti  = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
  11286             righti = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
  11287 
  11288             leftf  = vmulq_f32(vcvtq_f32_s32(lefti),  factor4);
  11289             rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
  11290 
  11291             drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
  11292         }
  11293 
  11294         for (i = (frameCount4 << 2); i < frameCount; ++i) {
  11295             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  11296             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  11297 
  11298             mid = (mid << 1) | (side & 0x01);
  11299 
  11300             pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
  11301             pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
  11302         }
  11303     } else {
  11304         shift -= 1;
  11305         shift4 = vdupq_n_s32(shift);
  11306         for (i = 0; i < frameCount4; ++i) {
  11307             uint32x4_t mid;
  11308             uint32x4_t side;
  11309             int32x4_t lefti;
  11310             int32x4_t righti;
  11311             float32x4_t leftf;
  11312             float32x4_t rightf;
  11313 
  11314             mid    = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
  11315             side   = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
  11316 
  11317             mid    = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
  11318 
  11319             lefti  = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
  11320             righti = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
  11321 
  11322             leftf  = vmulq_f32(vcvtq_f32_s32(lefti),  factor4);
  11323             rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
  11324 
  11325             drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
  11326         }
  11327 
  11328         for (i = (frameCount4 << 2); i < frameCount; ++i) {
  11329             drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  11330             drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  11331 
  11332             mid = (mid << 1) | (side & 0x01);
  11333 
  11334             pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
  11335             pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
  11336         }
  11337     }
  11338 }
  11339 #endif
  11340 
  11341 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  11342 {
  11343 #if defined(DRFLAC_SUPPORT_SSE2)
  11344     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  11345         drflac_read_pcm_frames_f32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  11346     } else
  11347 #elif defined(DRFLAC_SUPPORT_NEON)
  11348     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  11349         drflac_read_pcm_frames_f32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  11350     } else
  11351 #endif
  11352     {
  11353         /* Scalar fallback. */
  11354 #if 0
  11355         drflac_read_pcm_frames_f32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  11356 #else
  11357         drflac_read_pcm_frames_f32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  11358 #endif
  11359     }
  11360 }
  11361 
  11362 #if 0
  11363 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  11364 {
  11365     for (drflac_uint64 i = 0; i < frameCount; ++i) {
  11366         pOutputSamples[i*2+0] = (float)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) / 2147483648.0);
  11367         pOutputSamples[i*2+1] = (float)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) / 2147483648.0);
  11368     }
  11369 }
  11370 #endif
  11371 
  11372 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  11373 {
  11374     drflac_uint64 i;
  11375     drflac_uint64 frameCount4 = frameCount >> 2;
  11376     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  11377     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  11378     drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  11379     drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  11380     float factor = 1 / 2147483648.0;
  11381 
  11382     for (i = 0; i < frameCount4; ++i) {
  11383         drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
  11384         drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
  11385         drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
  11386         drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
  11387 
  11388         drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
  11389         drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
  11390         drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
  11391         drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
  11392 
  11393         pOutputSamples[i*8+0] = (drflac_int32)tempL0 * factor;
  11394         pOutputSamples[i*8+1] = (drflac_int32)tempR0 * factor;
  11395         pOutputSamples[i*8+2] = (drflac_int32)tempL1 * factor;
  11396         pOutputSamples[i*8+3] = (drflac_int32)tempR1 * factor;
  11397         pOutputSamples[i*8+4] = (drflac_int32)tempL2 * factor;
  11398         pOutputSamples[i*8+5] = (drflac_int32)tempR2 * factor;
  11399         pOutputSamples[i*8+6] = (drflac_int32)tempL3 * factor;
  11400         pOutputSamples[i*8+7] = (drflac_int32)tempR3 * factor;
  11401     }
  11402 
  11403     for (i = (frameCount4 << 2); i < frameCount; ++i) {
  11404         pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
  11405         pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
  11406     }
  11407 }
  11408 
  11409 #if defined(DRFLAC_SUPPORT_SSE2)
  11410 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  11411 {
  11412     drflac_uint64 i;
  11413     drflac_uint64 frameCount4 = frameCount >> 2;
  11414     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  11415     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  11416     drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
  11417     drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
  11418 
  11419     float factor = 1.0f / 8388608.0f;
  11420     __m128 factor128 = _mm_set1_ps(factor);
  11421 
  11422     for (i = 0; i < frameCount4; ++i) {
  11423         __m128i lefti;
  11424         __m128i righti;
  11425         __m128 leftf;
  11426         __m128 rightf;
  11427 
  11428         lefti  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
  11429         righti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
  11430 
  11431         leftf  = _mm_mul_ps(_mm_cvtepi32_ps(lefti),  factor128);
  11432         rightf = _mm_mul_ps(_mm_cvtepi32_ps(righti), factor128);
  11433 
  11434         _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
  11435         _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
  11436     }
  11437 
  11438     for (i = (frameCount4 << 2); i < frameCount; ++i) {
  11439         pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
  11440         pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
  11441     }
  11442 }
  11443 #endif
  11444 
  11445 #if defined(DRFLAC_SUPPORT_NEON)
  11446 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  11447 {
  11448     drflac_uint64 i;
  11449     drflac_uint64 frameCount4 = frameCount >> 2;
  11450     const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  11451     const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  11452     drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
  11453     drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
  11454 
  11455     float factor = 1.0f / 8388608.0f;
  11456     float32x4_t factor4 = vdupq_n_f32(factor);
  11457     int32x4_t shift0_4  = vdupq_n_s32(shift0);
  11458     int32x4_t shift1_4  = vdupq_n_s32(shift1);
  11459 
  11460     for (i = 0; i < frameCount4; ++i) {
  11461         int32x4_t lefti;
  11462         int32x4_t righti;
  11463         float32x4_t leftf;
  11464         float32x4_t rightf;
  11465 
  11466         lefti  = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
  11467         righti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
  11468 
  11469         leftf  = vmulq_f32(vcvtq_f32_s32(lefti),  factor4);
  11470         rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
  11471 
  11472         drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
  11473     }
  11474 
  11475     for (i = (frameCount4 << 2); i < frameCount; ++i) {
  11476         pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
  11477         pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
  11478     }
  11479 }
  11480 #endif
  11481 
  11482 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  11483 {
  11484 #if defined(DRFLAC_SUPPORT_SSE2)
  11485     if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  11486         drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  11487     } else
  11488 #elif defined(DRFLAC_SUPPORT_NEON)
  11489     if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  11490         drflac_read_pcm_frames_f32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  11491     } else
  11492 #endif
  11493     {
  11494         /* Scalar fallback. */
  11495 #if 0
  11496         drflac_read_pcm_frames_f32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  11497 #else
  11498         drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  11499 #endif
  11500     }
  11501 }
  11502 
  11503 DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut)
  11504 {
  11505     drflac_uint64 framesRead;
  11506     drflac_uint32 unusedBitsPerSample;
  11507 
  11508     if (pFlac == NULL || framesToRead == 0) {
  11509         return 0;
  11510     }
  11511 
  11512     if (pBufferOut == NULL) {
  11513         return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
  11514     }
  11515 
  11516     DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
  11517     unusedBitsPerSample = 32 - pFlac->bitsPerSample;
  11518 
  11519     framesRead = 0;
  11520     while (framesToRead > 0) {
  11521         /* If we've run out of samples in this frame, go to the next. */
  11522         if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
  11523             if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
  11524                 break;  /* Couldn't read the next frame, so just break from the loop and return. */
  11525             }
  11526         } else {
  11527             unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
  11528             drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
  11529             drflac_uint64 frameCountThisIteration = framesToRead;
  11530 
  11531             if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
  11532                 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
  11533             }
  11534 
  11535             if (channelCount == 2) {
  11536                 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
  11537                 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
  11538 
  11539                 switch (pFlac->currentFLACFrame.header.channelAssignment)
  11540                 {
  11541                     case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
  11542                     {
  11543                         drflac_read_pcm_frames_f32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  11544                     } break;
  11545 
  11546                     case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
  11547                     {
  11548                         drflac_read_pcm_frames_f32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  11549                     } break;
  11550 
  11551                     case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
  11552                     {
  11553                         drflac_read_pcm_frames_f32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  11554                     } break;
  11555 
  11556                     case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
  11557                     default:
  11558                     {
  11559                         drflac_read_pcm_frames_f32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  11560                     } break;
  11561                 }
  11562             } else {
  11563                 /* Generic interleaving. */
  11564                 drflac_uint64 i;
  11565                 for (i = 0; i < frameCountThisIteration; ++i) {
  11566                     unsigned int j;
  11567                     for (j = 0; j < channelCount; ++j) {
  11568                         drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
  11569                         pBufferOut[(i*channelCount)+j] = (float)(sampleS32 / 2147483648.0);
  11570                     }
  11571                 }
  11572             }
  11573 
  11574             framesRead                += frameCountThisIteration;
  11575             pBufferOut                += frameCountThisIteration * channelCount;
  11576             framesToRead              -= frameCountThisIteration;
  11577             pFlac->currentPCMFrame    += frameCountThisIteration;
  11578             pFlac->currentFLACFrame.pcmFramesRemaining -= (unsigned int)frameCountThisIteration;
  11579         }
  11580     }
  11581 
  11582     return framesRead;
  11583 }
  11584 
  11585 
  11586 DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
  11587 {
  11588     if (pFlac == NULL) {
  11589         return DRFLAC_FALSE;
  11590     }
  11591 
  11592     /* Don't do anything if we're already on the seek point. */
  11593     if (pFlac->currentPCMFrame == pcmFrameIndex) {
  11594         return DRFLAC_TRUE;
  11595     }
  11596 
  11597     /*
  11598     If we don't know where the first frame begins then we can't seek. This will happen when the STREAMINFO block was not present
  11599     when the decoder was opened.
  11600     */
  11601     if (pFlac->firstFLACFramePosInBytes == 0) {
  11602         return DRFLAC_FALSE;
  11603     }
  11604 
  11605     if (pcmFrameIndex == 0) {
  11606         pFlac->currentPCMFrame = 0;
  11607         return drflac__seek_to_first_frame(pFlac);
  11608     } else {
  11609         drflac_bool32 wasSuccessful = DRFLAC_FALSE;
  11610         drflac_uint64 originalPCMFrame = pFlac->currentPCMFrame;
  11611 
  11612         /* Clamp the sample to the end. */
  11613         if (pcmFrameIndex > pFlac->totalPCMFrameCount) {
  11614             pcmFrameIndex = pFlac->totalPCMFrameCount;
  11615         }
  11616 
  11617         /* If the target sample and the current sample are in the same frame we just move the position forward. */
  11618         if (pcmFrameIndex > pFlac->currentPCMFrame) {
  11619             /* Forward. */
  11620             drflac_uint32 offset = (drflac_uint32)(pcmFrameIndex - pFlac->currentPCMFrame);
  11621             if (pFlac->currentFLACFrame.pcmFramesRemaining >  offset) {
  11622                 pFlac->currentFLACFrame.pcmFramesRemaining -= offset;
  11623                 pFlac->currentPCMFrame = pcmFrameIndex;
  11624                 return DRFLAC_TRUE;
  11625             }
  11626         } else {
  11627             /* Backward. */
  11628             drflac_uint32 offsetAbs = (drflac_uint32)(pFlac->currentPCMFrame - pcmFrameIndex);
  11629             drflac_uint32 currentFLACFramePCMFrameCount = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
  11630             drflac_uint32 currentFLACFramePCMFramesConsumed = currentFLACFramePCMFrameCount - pFlac->currentFLACFrame.pcmFramesRemaining;
  11631             if (currentFLACFramePCMFramesConsumed > offsetAbs) {
  11632                 pFlac->currentFLACFrame.pcmFramesRemaining += offsetAbs;
  11633                 pFlac->currentPCMFrame = pcmFrameIndex;
  11634                 return DRFLAC_TRUE;
  11635             }
  11636         }
  11637 
  11638         /*
  11639         Different techniques depending on encapsulation. Using the native FLAC seektable with Ogg encapsulation is a bit awkward so
  11640         we'll instead use Ogg's natural seeking facility.
  11641         */
  11642 #ifndef DR_FLAC_NO_OGG
  11643         if (pFlac->container == drflac_container_ogg)
  11644         {
  11645             wasSuccessful = drflac_ogg__seek_to_pcm_frame(pFlac, pcmFrameIndex);
  11646         }
  11647         else
  11648 #endif
  11649         {
  11650             /* First try seeking via the seek table. If this fails, fall back to a brute force seek which is much slower. */
  11651             if (/*!wasSuccessful && */!pFlac->_noSeekTableSeek) {
  11652                 wasSuccessful = drflac__seek_to_pcm_frame__seek_table(pFlac, pcmFrameIndex);
  11653             }
  11654 
  11655 #if !defined(DR_FLAC_NO_CRC)
  11656             /* Fall back to binary search if seek table seeking fails. This requires the length of the stream to be known. */
  11657             if (!wasSuccessful && !pFlac->_noBinarySearchSeek && pFlac->totalPCMFrameCount > 0) {
  11658                 wasSuccessful = drflac__seek_to_pcm_frame__binary_search(pFlac, pcmFrameIndex);
  11659             }
  11660 #endif
  11661 
  11662             /* Fall back to brute force if all else fails. */
  11663             if (!wasSuccessful && !pFlac->_noBruteForceSeek) {
  11664                 wasSuccessful = drflac__seek_to_pcm_frame__brute_force(pFlac, pcmFrameIndex);
  11665             }
  11666         }
  11667 
  11668         if (wasSuccessful) {
  11669             pFlac->currentPCMFrame = pcmFrameIndex;
  11670         } else {
  11671             /* Seek failed. Try putting the decoder back to it's original state. */
  11672             if (drflac_seek_to_pcm_frame(pFlac, originalPCMFrame) == DRFLAC_FALSE) {
  11673                 /* Failed to seek back to the original PCM frame. Fall back to 0. */
  11674                 drflac_seek_to_pcm_frame(pFlac, 0);
  11675             }
  11676         }
  11677 
  11678         return wasSuccessful;
  11679     }
  11680 }
  11681 
  11682 
  11683 
  11684 /* High Level APIs */
  11685 
  11686 /* SIZE_MAX */
  11687 #if defined(SIZE_MAX)
  11688     #define DRFLAC_SIZE_MAX  SIZE_MAX
  11689 #else
  11690     #if defined(DRFLAC_64BIT)
  11691         #define DRFLAC_SIZE_MAX  ((drflac_uint64)0xFFFFFFFFFFFFFFFF)
  11692     #else
  11693         #define DRFLAC_SIZE_MAX  0xFFFFFFFF
  11694     #endif
  11695 #endif
  11696 /* End SIZE_MAX */
  11697 
  11698 
  11699 /* Using a macro as the definition of the drflac__full_decode_and_close_*() API family. Sue me. */
  11700 #define DRFLAC_DEFINE_FULL_READ_AND_CLOSE(extension, type) \
  11701 static type* drflac__full_read_and_close_ ## extension (drflac* pFlac, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut)\
  11702 {                                                                                                                                                                   \
  11703     type* pSampleData = NULL;                                                                                                                                       \
  11704     drflac_uint64 totalPCMFrameCount;                                                                                                                               \
  11705                                                                                                                                                                     \
  11706     DRFLAC_ASSERT(pFlac != NULL);                                                                                                                                   \
  11707                                                                                                                                                                     \
  11708     totalPCMFrameCount = pFlac->totalPCMFrameCount;                                                                                                                 \
  11709                                                                                                                                                                     \
  11710     if (totalPCMFrameCount == 0) {                                                                                                                                  \
  11711         type buffer[4096];                                                                                                                                          \
  11712         drflac_uint64 pcmFramesRead;                                                                                                                                \
  11713         size_t sampleDataBufferSize = sizeof(buffer);                                                                                                               \
  11714                                                                                                                                                                     \
  11715         pSampleData = (type*)drflac__malloc_from_callbacks(sampleDataBufferSize, &pFlac->allocationCallbacks);                                                      \
  11716         if (pSampleData == NULL) {                                                                                                                                  \
  11717             goto on_error;                                                                                                                                          \
  11718         }                                                                                                                                                           \
  11719                                                                                                                                                                     \
  11720         while ((pcmFramesRead = (drflac_uint64)drflac_read_pcm_frames_##extension(pFlac, sizeof(buffer)/sizeof(buffer[0])/pFlac->channels, buffer)) > 0) {          \
  11721             if (((totalPCMFrameCount + pcmFramesRead) * pFlac->channels * sizeof(type)) > sampleDataBufferSize) {                                                   \
  11722                 type* pNewSampleData;                                                                                                                               \
  11723                 size_t newSampleDataBufferSize;                                                                                                                     \
  11724                                                                                                                                                                     \
  11725                 newSampleDataBufferSize = sampleDataBufferSize * 2;                                                                                                 \
  11726                 pNewSampleData = (type*)drflac__realloc_from_callbacks(pSampleData, newSampleDataBufferSize, sampleDataBufferSize, &pFlac->allocationCallbacks);    \
  11727                 if (pNewSampleData == NULL) {                                                                                                                       \
  11728                     drflac__free_from_callbacks(pSampleData, &pFlac->allocationCallbacks);                                                                          \
  11729                     goto on_error;                                                                                                                                  \
  11730                 }                                                                                                                                                   \
  11731                                                                                                                                                                     \
  11732                 sampleDataBufferSize = newSampleDataBufferSize;                                                                                                     \
  11733                 pSampleData = pNewSampleData;                                                                                                                       \
  11734             }                                                                                                                                                       \
  11735                                                                                                                                                                     \
  11736             DRFLAC_COPY_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), buffer, (size_t)(pcmFramesRead*pFlac->channels*sizeof(type)));                   \
  11737             totalPCMFrameCount += pcmFramesRead;                                                                                                                    \
  11738         }                                                                                                                                                           \
  11739                                                                                                                                                                     \
  11740         /* At this point everything should be decoded, but we just want to fill the unused part buffer with silence - need to                                       \
  11741            protect those ears from random noise! */                                                                                                                 \
  11742         DRFLAC_ZERO_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), (size_t)(sampleDataBufferSize - totalPCMFrameCount*pFlac->channels*sizeof(type)));   \
  11743     } else {                                                                                                                                                        \
  11744         drflac_uint64 dataSize = totalPCMFrameCount*pFlac->channels*sizeof(type);                                                                                   \
  11745         if (dataSize > (drflac_uint64)DRFLAC_SIZE_MAX) {                                                                                                            \
  11746             goto on_error;  /* The decoded data is too big. */                                                                                                      \
  11747         }                                                                                                                                                           \
  11748                                                                                                                                                                     \
  11749         pSampleData = (type*)drflac__malloc_from_callbacks((size_t)dataSize, &pFlac->allocationCallbacks);    /* <-- Safe cast as per the check above. */           \
  11750         if (pSampleData == NULL) {                                                                                                                                  \
  11751             goto on_error;                                                                                                                                          \
  11752         }                                                                                                                                                           \
  11753                                                                                                                                                                     \
  11754         totalPCMFrameCount = drflac_read_pcm_frames_##extension(pFlac, pFlac->totalPCMFrameCount, pSampleData);                                                     \
  11755     }                                                                                                                                                               \
  11756                                                                                                                                                                     \
  11757     if (sampleRateOut) *sampleRateOut = pFlac->sampleRate;                                                                                                          \
  11758     if (channelsOut) *channelsOut = pFlac->channels;                                                                                                                \
  11759     if (totalPCMFrameCountOut) *totalPCMFrameCountOut = totalPCMFrameCount;                                                                                         \
  11760                                                                                                                                                                     \
  11761     drflac_close(pFlac);                                                                                                                                            \
  11762     return pSampleData;                                                                                                                                             \
  11763                                                                                                                                                                     \
  11764 on_error:                                                                                                                                                           \
  11765     drflac_close(pFlac);                                                                                                                                            \
  11766     return NULL;                                                                                                                                                    \
  11767 }
  11768 
  11769 DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s32, drflac_int32)
  11770 DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s16, drflac_int16)
  11771 DRFLAC_DEFINE_FULL_READ_AND_CLOSE(f32, float)
  11772 
  11773 DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
  11774 {
  11775     drflac* pFlac;
  11776 
  11777     if (channelsOut) {
  11778         *channelsOut = 0;
  11779     }
  11780     if (sampleRateOut) {
  11781         *sampleRateOut = 0;
  11782     }
  11783     if (totalPCMFrameCountOut) {
  11784         *totalPCMFrameCountOut = 0;
  11785     }
  11786 
  11787     pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
  11788     if (pFlac == NULL) {
  11789         return NULL;
  11790     }
  11791 
  11792     return drflac__full_read_and_close_s32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
  11793 }
  11794 
  11795 DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
  11796 {
  11797     drflac* pFlac;
  11798 
  11799     if (channelsOut) {
  11800         *channelsOut = 0;
  11801     }
  11802     if (sampleRateOut) {
  11803         *sampleRateOut = 0;
  11804     }
  11805     if (totalPCMFrameCountOut) {
  11806         *totalPCMFrameCountOut = 0;
  11807     }
  11808 
  11809     pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
  11810     if (pFlac == NULL) {
  11811         return NULL;
  11812     }
  11813 
  11814     return drflac__full_read_and_close_s16(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
  11815 }
  11816 
  11817 DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
  11818 {
  11819     drflac* pFlac;
  11820 
  11821     if (channelsOut) {
  11822         *channelsOut = 0;
  11823     }
  11824     if (sampleRateOut) {
  11825         *sampleRateOut = 0;
  11826     }
  11827     if (totalPCMFrameCountOut) {
  11828         *totalPCMFrameCountOut = 0;
  11829     }
  11830 
  11831     pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
  11832     if (pFlac == NULL) {
  11833         return NULL;
  11834     }
  11835 
  11836     return drflac__full_read_and_close_f32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
  11837 }
  11838 
  11839 #ifndef DR_FLAC_NO_STDIO
  11840 DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
  11841 {
  11842     drflac* pFlac;
  11843 
  11844     if (sampleRate) {
  11845         *sampleRate = 0;
  11846     }
  11847     if (channels) {
  11848         *channels = 0;
  11849     }
  11850     if (totalPCMFrameCount) {
  11851         *totalPCMFrameCount = 0;
  11852     }
  11853 
  11854     pFlac = drflac_open_file(filename, pAllocationCallbacks);
  11855     if (pFlac == NULL) {
  11856         return NULL;
  11857     }
  11858 
  11859     return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
  11860 }
  11861 
  11862 DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
  11863 {
  11864     drflac* pFlac;
  11865 
  11866     if (sampleRate) {
  11867         *sampleRate = 0;
  11868     }
  11869     if (channels) {
  11870         *channels = 0;
  11871     }
  11872     if (totalPCMFrameCount) {
  11873         *totalPCMFrameCount = 0;
  11874     }
  11875 
  11876     pFlac = drflac_open_file(filename, pAllocationCallbacks);
  11877     if (pFlac == NULL) {
  11878         return NULL;
  11879     }
  11880 
  11881     return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
  11882 }
  11883 
  11884 DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
  11885 {
  11886     drflac* pFlac;
  11887 
  11888     if (sampleRate) {
  11889         *sampleRate = 0;
  11890     }
  11891     if (channels) {
  11892         *channels = 0;
  11893     }
  11894     if (totalPCMFrameCount) {
  11895         *totalPCMFrameCount = 0;
  11896     }
  11897 
  11898     pFlac = drflac_open_file(filename, pAllocationCallbacks);
  11899     if (pFlac == NULL) {
  11900         return NULL;
  11901     }
  11902 
  11903     return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
  11904 }
  11905 #endif
  11906 
  11907 DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
  11908 {
  11909     drflac* pFlac;
  11910 
  11911     if (sampleRate) {
  11912         *sampleRate = 0;
  11913     }
  11914     if (channels) {
  11915         *channels = 0;
  11916     }
  11917     if (totalPCMFrameCount) {
  11918         *totalPCMFrameCount = 0;
  11919     }
  11920 
  11921     pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
  11922     if (pFlac == NULL) {
  11923         return NULL;
  11924     }
  11925 
  11926     return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
  11927 }
  11928 
  11929 DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
  11930 {
  11931     drflac* pFlac;
  11932 
  11933     if (sampleRate) {
  11934         *sampleRate = 0;
  11935     }
  11936     if (channels) {
  11937         *channels = 0;
  11938     }
  11939     if (totalPCMFrameCount) {
  11940         *totalPCMFrameCount = 0;
  11941     }
  11942 
  11943     pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
  11944     if (pFlac == NULL) {
  11945         return NULL;
  11946     }
  11947 
  11948     return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
  11949 }
  11950 
  11951 DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
  11952 {
  11953     drflac* pFlac;
  11954 
  11955     if (sampleRate) {
  11956         *sampleRate = 0;
  11957     }
  11958     if (channels) {
  11959         *channels = 0;
  11960     }
  11961     if (totalPCMFrameCount) {
  11962         *totalPCMFrameCount = 0;
  11963     }
  11964 
  11965     pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
  11966     if (pFlac == NULL) {
  11967         return NULL;
  11968     }
  11969 
  11970     return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
  11971 }
  11972 
  11973 
  11974 DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
  11975 {
  11976     if (pAllocationCallbacks != NULL) {
  11977         drflac__free_from_callbacks(p, pAllocationCallbacks);
  11978     } else {
  11979         drflac__free_default(p, NULL);
  11980     }
  11981 }
  11982 
  11983 
  11984 
  11985 
  11986 DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments)
  11987 {
  11988     if (pIter == NULL) {
  11989         return;
  11990     }
  11991 
  11992     pIter->countRemaining = commentCount;
  11993     pIter->pRunningData   = (const char*)pComments;
  11994 }
  11995 
  11996 DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut)
  11997 {
  11998     drflac_int32 length;
  11999     const char* pComment;
  12000 
  12001     /* Safety. */
  12002     if (pCommentLengthOut) {
  12003         *pCommentLengthOut = 0;
  12004     }
  12005 
  12006     if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
  12007         return NULL;
  12008     }
  12009 
  12010     length = drflac__le2host_32_ptr_unaligned(pIter->pRunningData);
  12011     pIter->pRunningData += 4;
  12012 
  12013     pComment = pIter->pRunningData;
  12014     pIter->pRunningData += length;
  12015     pIter->countRemaining -= 1;
  12016 
  12017     if (pCommentLengthOut) {
  12018         *pCommentLengthOut = length;
  12019     }
  12020 
  12021     return pComment;
  12022 }
  12023 
  12024 
  12025 
  12026 
  12027 DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData)
  12028 {
  12029     if (pIter == NULL) {
  12030         return;
  12031     }
  12032 
  12033     pIter->countRemaining = trackCount;
  12034     pIter->pRunningData   = (const char*)pTrackData;
  12035 }
  12036 
  12037 DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack)
  12038 {
  12039     drflac_cuesheet_track cuesheetTrack;
  12040     const char* pRunningData;
  12041     drflac_uint64 offsetHi;
  12042     drflac_uint64 offsetLo;
  12043 
  12044     if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
  12045         return DRFLAC_FALSE;
  12046     }
  12047 
  12048     pRunningData = pIter->pRunningData;
  12049 
  12050     offsetHi                   = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
  12051     offsetLo                   = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
  12052     cuesheetTrack.offset       = offsetLo | (offsetHi << 32);
  12053     cuesheetTrack.trackNumber  = pRunningData[0];                                         pRunningData += 1;
  12054     DRFLAC_COPY_MEMORY(cuesheetTrack.ISRC, pRunningData, sizeof(cuesheetTrack.ISRC));     pRunningData += 12;
  12055     cuesheetTrack.isAudio      = (pRunningData[0] & 0x80) != 0;
  12056     cuesheetTrack.preEmphasis  = (pRunningData[0] & 0x40) != 0;                           pRunningData += 14;
  12057     cuesheetTrack.indexCount   = pRunningData[0];                                         pRunningData += 1;
  12058     cuesheetTrack.pIndexPoints = (const drflac_cuesheet_track_index*)pRunningData;        pRunningData += cuesheetTrack.indexCount * sizeof(drflac_cuesheet_track_index);
  12059 
  12060     pIter->pRunningData = pRunningData;
  12061     pIter->countRemaining -= 1;
  12062 
  12063     if (pCuesheetTrack) {
  12064         *pCuesheetTrack = cuesheetTrack;
  12065     }
  12066 
  12067     return DRFLAC_TRUE;
  12068 }
  12069 
  12070 #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
  12071     #pragma GCC diagnostic pop
  12072 #endif
  12073 #endif  /* dr_flac_c */
  12074 #endif  /* DR_FLAC_IMPLEMENTATION */
  12075 
  12076 
  12077 /*
  12078 REVISION HISTORY
  12079 ================
  12080 v0.12.42 - 2023-11-02
  12081   - Fix build for ARMv6-M.
  12082   - Fix a compilation warning with GCC.
  12083 
  12084 v0.12.41 - 2023-06-17
  12085   - Fix an incorrect date in revision history. No functional change.
  12086 
  12087 v0.12.40 - 2023-05-22
  12088   - Minor code restructure. No functional change.
  12089 
  12090 v0.12.39 - 2022-09-17
  12091   - Fix compilation with DJGPP.
  12092   - Fix compilation error with Visual Studio 2019 and the ARM build.
  12093   - Fix an error with SSE 4.1 detection.
  12094   - Add support for disabling wchar_t with DR_WAV_NO_WCHAR.
  12095   - Improve compatibility with compilers which lack support for explicit struct packing.
  12096   - Improve compatibility with low-end and embedded hardware by reducing the amount of stack
  12097     allocation when loading an Ogg encapsulated file.
  12098 
  12099 v0.12.38 - 2022-04-10
  12100   - Fix compilation error on older versions of GCC.
  12101 
  12102 v0.12.37 - 2022-02-12
  12103   - Improve ARM detection.
  12104 
  12105 v0.12.36 - 2022-02-07
  12106   - Fix a compilation error with the ARM build.
  12107 
  12108 v0.12.35 - 2022-02-06
  12109   - Fix a bug due to underestimating the amount of precision required for the prediction stage.
  12110   - Fix some bugs found from fuzz testing.
  12111 
  12112 v0.12.34 - 2022-01-07
  12113   - Fix some misalignment bugs when reading metadata.
  12114 
  12115 v0.12.33 - 2021-12-22
  12116   - Fix a bug with seeking when the seek table does not start at PCM frame 0.
  12117 
  12118 v0.12.32 - 2021-12-11
  12119   - Fix a warning with Clang.
  12120 
  12121 v0.12.31 - 2021-08-16
  12122   - Silence some warnings.
  12123 
  12124 v0.12.30 - 2021-07-31
  12125   - Fix platform detection for ARM64.
  12126 
  12127 v0.12.29 - 2021-04-02
  12128   - Fix a bug where the running PCM frame index is set to an invalid value when over-seeking.
  12129   - Fix a decoding error due to an incorrect validation check.
  12130 
  12131 v0.12.28 - 2021-02-21
  12132   - Fix a warning due to referencing _MSC_VER when it is undefined.
  12133 
  12134 v0.12.27 - 2021-01-31
  12135   - Fix a static analysis warning.
  12136 
  12137 v0.12.26 - 2021-01-17
  12138   - Fix a compilation warning due to _BSD_SOURCE being deprecated.
  12139 
  12140 v0.12.25 - 2020-12-26
  12141   - Update documentation.
  12142 
  12143 v0.12.24 - 2020-11-29
  12144   - Fix ARM64/NEON detection when compiling with MSVC.
  12145 
  12146 v0.12.23 - 2020-11-21
  12147   - Fix compilation with OpenWatcom.
  12148 
  12149 v0.12.22 - 2020-11-01
  12150   - Fix an error with the previous release.
  12151 
  12152 v0.12.21 - 2020-11-01
  12153   - Fix a possible deadlock when seeking.
  12154   - Improve compiler support for older versions of GCC.
  12155 
  12156 v0.12.20 - 2020-09-08
  12157   - Fix a compilation error on older compilers.
  12158 
  12159 v0.12.19 - 2020-08-30
  12160   - Fix a bug due to an undefined 32-bit shift.
  12161 
  12162 v0.12.18 - 2020-08-14
  12163   - Fix a crash when compiling with clang-cl.
  12164 
  12165 v0.12.17 - 2020-08-02
  12166   - Simplify sized types.
  12167 
  12168 v0.12.16 - 2020-07-25
  12169   - Fix a compilation warning.
  12170 
  12171 v0.12.15 - 2020-07-06
  12172   - Check for negative LPC shifts and return an error.
  12173 
  12174 v0.12.14 - 2020-06-23
  12175   - Add include guard for the implementation section.
  12176 
  12177 v0.12.13 - 2020-05-16
  12178   - Add compile-time and run-time version querying.
  12179     - DRFLAC_VERSION_MINOR
  12180     - DRFLAC_VERSION_MAJOR
  12181     - DRFLAC_VERSION_REVISION
  12182     - DRFLAC_VERSION_STRING
  12183     - drflac_version()
  12184     - drflac_version_string()
  12185 
  12186 v0.12.12 - 2020-04-30
  12187   - Fix compilation errors with VC6.
  12188 
  12189 v0.12.11 - 2020-04-19
  12190   - Fix some pedantic warnings.
  12191   - Fix some undefined behaviour warnings.
  12192 
  12193 v0.12.10 - 2020-04-10
  12194   - Fix some bugs when trying to seek with an invalid seek table.
  12195 
  12196 v0.12.9 - 2020-04-05
  12197   - Fix warnings.
  12198 
  12199 v0.12.8 - 2020-04-04
  12200   - Add drflac_open_file_w() and drflac_open_file_with_metadata_w().
  12201   - Fix some static analysis warnings.
  12202   - Minor documentation updates.
  12203 
  12204 v0.12.7 - 2020-03-14
  12205   - Fix compilation errors with VC6.
  12206 
  12207 v0.12.6 - 2020-03-07
  12208   - Fix compilation error with Visual Studio .NET 2003.
  12209 
  12210 v0.12.5 - 2020-01-30
  12211   - Silence some static analysis warnings.
  12212 
  12213 v0.12.4 - 2020-01-29
  12214   - Silence some static analysis warnings.
  12215 
  12216 v0.12.3 - 2019-12-02
  12217   - Fix some warnings when compiling with GCC and the -Og flag.
  12218   - Fix a crash in out-of-memory situations.
  12219   - Fix potential integer overflow bug.
  12220   - Fix some static analysis warnings.
  12221   - Fix a possible crash when using custom memory allocators without a custom realloc() implementation.
  12222   - Fix a bug with binary search seeking where the bits per sample is not a multiple of 8.
  12223 
  12224 v0.12.2 - 2019-10-07
  12225   - Internal code clean up.
  12226 
  12227 v0.12.1 - 2019-09-29
  12228   - Fix some Clang Static Analyzer warnings.
  12229   - Fix an unused variable warning.
  12230 
  12231 v0.12.0 - 2019-09-23
  12232   - API CHANGE: Add support for user defined memory allocation routines. This system allows the program to specify their own memory allocation
  12233     routines with a user data pointer for client-specific contextual data. This adds an extra parameter to the end of the following APIs:
  12234     - drflac_open()
  12235     - drflac_open_relaxed()
  12236     - drflac_open_with_metadata()
  12237     - drflac_open_with_metadata_relaxed()
  12238     - drflac_open_file()
  12239     - drflac_open_file_with_metadata()
  12240     - drflac_open_memory()
  12241     - drflac_open_memory_with_metadata()
  12242     - drflac_open_and_read_pcm_frames_s32()
  12243     - drflac_open_and_read_pcm_frames_s16()
  12244     - drflac_open_and_read_pcm_frames_f32()
  12245     - drflac_open_file_and_read_pcm_frames_s32()
  12246     - drflac_open_file_and_read_pcm_frames_s16()
  12247     - drflac_open_file_and_read_pcm_frames_f32()
  12248     - drflac_open_memory_and_read_pcm_frames_s32()
  12249     - drflac_open_memory_and_read_pcm_frames_s16()
  12250     - drflac_open_memory_and_read_pcm_frames_f32()
  12251     Set this extra parameter to NULL to use defaults which is the same as the previous behaviour. Setting this NULL will use
  12252     DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
  12253   - Remove deprecated APIs:
  12254     - drflac_read_s32()
  12255     - drflac_read_s16()
  12256     - drflac_read_f32()
  12257     - drflac_seek_to_sample()
  12258     - drflac_open_and_decode_s32()
  12259     - drflac_open_and_decode_s16()
  12260     - drflac_open_and_decode_f32()
  12261     - drflac_open_and_decode_file_s32()
  12262     - drflac_open_and_decode_file_s16()
  12263     - drflac_open_and_decode_file_f32()
  12264     - drflac_open_and_decode_memory_s32()
  12265     - drflac_open_and_decode_memory_s16()
  12266     - drflac_open_and_decode_memory_f32()
  12267   - Remove drflac.totalSampleCount which is now replaced with drflac.totalPCMFrameCount. You can emulate drflac.totalSampleCount
  12268     by doing pFlac->totalPCMFrameCount*pFlac->channels.
  12269   - Rename drflac.currentFrame to drflac.currentFLACFrame to remove ambiguity with PCM frames.
  12270   - Fix errors when seeking to the end of a stream.
  12271   - Optimizations to seeking.
  12272   - SSE improvements and optimizations.
  12273   - ARM NEON optimizations.
  12274   - Optimizations to drflac_read_pcm_frames_s16().
  12275   - Optimizations to drflac_read_pcm_frames_s32().
  12276 
  12277 v0.11.10 - 2019-06-26
  12278   - Fix a compiler error.
  12279 
  12280 v0.11.9 - 2019-06-16
  12281   - Silence some ThreadSanitizer warnings.
  12282 
  12283 v0.11.8 - 2019-05-21
  12284   - Fix warnings.
  12285 
  12286 v0.11.7 - 2019-05-06
  12287   - C89 fixes.
  12288 
  12289 v0.11.6 - 2019-05-05
  12290   - Add support for C89.
  12291   - Fix a compiler warning when CRC is disabled.
  12292   - Change license to choice of public domain or MIT-0.
  12293 
  12294 v0.11.5 - 2019-04-19
  12295   - Fix a compiler error with GCC.
  12296 
  12297 v0.11.4 - 2019-04-17
  12298   - Fix some warnings with GCC when compiling with -std=c99.
  12299 
  12300 v0.11.3 - 2019-04-07
  12301   - Silence warnings with GCC.
  12302 
  12303 v0.11.2 - 2019-03-10
  12304   - Fix a warning.
  12305 
  12306 v0.11.1 - 2019-02-17
  12307   - Fix a potential bug with seeking.
  12308 
  12309 v0.11.0 - 2018-12-16
  12310   - API CHANGE: Deprecated drflac_read_s32(), drflac_read_s16() and drflac_read_f32() and replaced them with
  12311     drflac_read_pcm_frames_s32(), drflac_read_pcm_frames_s16() and drflac_read_pcm_frames_f32(). The new APIs take
  12312     and return PCM frame counts instead of sample counts. To upgrade you will need to change the input count by
  12313     dividing it by the channel count, and then do the same with the return value.
  12314   - API_CHANGE: Deprecated drflac_seek_to_sample() and replaced with drflac_seek_to_pcm_frame(). Same rules as
  12315     the changes to drflac_read_*() apply.
  12316   - API CHANGE: Deprecated drflac_open_and_decode_*() and replaced with drflac_open_*_and_read_*(). Same rules as
  12317     the changes to drflac_read_*() apply.
  12318   - Optimizations.
  12319 
  12320 v0.10.0 - 2018-09-11
  12321   - Remove the DR_FLAC_NO_WIN32_IO option and the Win32 file IO functionality. If you need to use Win32 file IO you
  12322     need to do it yourself via the callback API.
  12323   - Fix the clang build.
  12324   - Fix undefined behavior.
  12325   - Fix errors with CUESHEET metdata blocks.
  12326   - Add an API for iterating over each cuesheet track in the CUESHEET metadata block. This works the same way as the
  12327     Vorbis comment API.
  12328   - Other miscellaneous bug fixes, mostly relating to invalid FLAC streams.
  12329   - Minor optimizations.
  12330 
  12331 v0.9.11 - 2018-08-29
  12332   - Fix a bug with sample reconstruction.
  12333 
  12334 v0.9.10 - 2018-08-07
  12335   - Improve 64-bit detection.
  12336 
  12337 v0.9.9 - 2018-08-05
  12338   - Fix C++ build on older versions of GCC.
  12339 
  12340 v0.9.8 - 2018-07-24
  12341   - Fix compilation errors.
  12342 
  12343 v0.9.7 - 2018-07-05
  12344   - Fix a warning.
  12345 
  12346 v0.9.6 - 2018-06-29
  12347   - Fix some typos.
  12348 
  12349 v0.9.5 - 2018-06-23
  12350   - Fix some warnings.
  12351 
  12352 v0.9.4 - 2018-06-14
  12353   - Optimizations to seeking.
  12354   - Clean up.
  12355 
  12356 v0.9.3 - 2018-05-22
  12357   - Bug fix.
  12358 
  12359 v0.9.2 - 2018-05-12
  12360   - Fix a compilation error due to a missing break statement.
  12361 
  12362 v0.9.1 - 2018-04-29
  12363   - Fix compilation error with Clang.
  12364 
  12365 v0.9 - 2018-04-24
  12366   - Fix Clang build.
  12367   - Start using major.minor.revision versioning.
  12368 
  12369 v0.8g - 2018-04-19
  12370   - Fix build on non-x86/x64 architectures.
  12371 
  12372 v0.8f - 2018-02-02
  12373   - Stop pretending to support changing rate/channels mid stream.
  12374 
  12375 v0.8e - 2018-02-01
  12376   - Fix a crash when the block size of a frame is larger than the maximum block size defined by the FLAC stream.
  12377   - Fix a crash the the Rice partition order is invalid.
  12378 
  12379 v0.8d - 2017-09-22
  12380   - Add support for decoding streams with ID3 tags. ID3 tags are just skipped.
  12381 
  12382 v0.8c - 2017-09-07
  12383   - Fix warning on non-x86/x64 architectures.
  12384 
  12385 v0.8b - 2017-08-19
  12386   - Fix build on non-x86/x64 architectures.
  12387 
  12388 v0.8a - 2017-08-13
  12389   - A small optimization for the Clang build.
  12390 
  12391 v0.8 - 2017-08-12
  12392   - API CHANGE: Rename dr_* types to drflac_*.
  12393   - Optimizations. This brings dr_flac back to about the same class of efficiency as the reference implementation.
  12394   - Add support for custom implementations of malloc(), realloc(), etc.
  12395   - Add CRC checking to Ogg encapsulated streams.
  12396   - Fix VC++ 6 build. This is only for the C++ compiler. The C compiler is not currently supported.
  12397   - Bug fixes.
  12398 
  12399 v0.7 - 2017-07-23
  12400   - Add support for opening a stream without a header block. To do this, use drflac_open_relaxed() / drflac_open_with_metadata_relaxed().
  12401 
  12402 v0.6 - 2017-07-22
  12403   - Add support for recovering from invalid frames. With this change, dr_flac will simply skip over invalid frames as if they
  12404     never existed. Frames are checked against their sync code, the CRC-8 of the frame header and the CRC-16 of the whole frame.
  12405 
  12406 v0.5 - 2017-07-16
  12407   - Fix typos.
  12408   - Change drflac_bool* types to unsigned.
  12409   - Add CRC checking. This makes dr_flac slower, but can be disabled with #define DR_FLAC_NO_CRC.
  12410 
  12411 v0.4f - 2017-03-10
  12412   - Fix a couple of bugs with the bitstreaming code.
  12413 
  12414 v0.4e - 2017-02-17
  12415   - Fix some warnings.
  12416 
  12417 v0.4d - 2016-12-26
  12418   - Add support for 32-bit floating-point PCM decoding.
  12419   - Use drflac_int* and drflac_uint* sized types to improve compiler support.
  12420   - Minor improvements to documentation.
  12421 
  12422 v0.4c - 2016-12-26
  12423   - Add support for signed 16-bit integer PCM decoding.
  12424 
  12425 v0.4b - 2016-10-23
  12426   - A minor change to drflac_bool8 and drflac_bool32 types.
  12427 
  12428 v0.4a - 2016-10-11
  12429   - Rename drBool32 to drflac_bool32 for styling consistency.
  12430 
  12431 v0.4 - 2016-09-29
  12432   - API/ABI CHANGE: Use fixed size 32-bit booleans instead of the built-in bool type.
  12433   - API CHANGE: Rename drflac_open_and_decode*() to drflac_open_and_decode*_s32().
  12434   - API CHANGE: Swap the order of "channels" and "sampleRate" parameters in drflac_open_and_decode*(). Rationale for this is to
  12435     keep it consistent with drflac_audio.
  12436 
  12437 v0.3f - 2016-09-21
  12438   - Fix a warning with GCC.
  12439 
  12440 v0.3e - 2016-09-18
  12441   - Fixed a bug where GCC 4.3+ was not getting properly identified.
  12442   - Fixed a few typos.
  12443   - Changed date formats to ISO 8601 (YYYY-MM-DD).
  12444 
  12445 v0.3d - 2016-06-11
  12446   - Minor clean up.
  12447 
  12448 v0.3c - 2016-05-28
  12449   - Fixed compilation error.
  12450 
  12451 v0.3b - 2016-05-16
  12452   - Fixed Linux/GCC build.
  12453   - Updated documentation.
  12454 
  12455 v0.3a - 2016-05-15
  12456   - Minor fixes to documentation.
  12457 
  12458 v0.3 - 2016-05-11
  12459   - Optimizations. Now at about parity with the reference implementation on 32-bit builds.
  12460   - Lots of clean up.
  12461 
  12462 v0.2b - 2016-05-10
  12463   - Bug fixes.
  12464 
  12465 v0.2a - 2016-05-10
  12466   - Made drflac_open_and_decode() more robust.
  12467   - Removed an unused debugging variable
  12468 
  12469 v0.2 - 2016-05-09
  12470   - Added support for Ogg encapsulation.
  12471   - API CHANGE. Have the onSeek callback take a third argument which specifies whether or not the seek
  12472     should be relative to the start or the current position. Also changes the seeking rules such that
  12473     seeking offsets will never be negative.
  12474   - Have drflac_open_and_decode() fail gracefully if the stream has an unknown total sample count.
  12475 
  12476 v0.1b - 2016-05-07
  12477   - Properly close the file handle in drflac_open_file() and family when the decoder fails to initialize.
  12478   - Removed a stale comment.
  12479 
  12480 v0.1a - 2016-05-05
  12481   - Minor formatting changes.
  12482   - Fixed a warning on the GCC build.
  12483 
  12484 v0.1 - 2016-05-03
  12485   - Initial versioned release.
  12486 */
  12487 
  12488 /*
  12489 This software is available as a choice of the following licenses. Choose
  12490 whichever you prefer.
  12491 
  12492 ===============================================================================
  12493 ALTERNATIVE 1 - Public Domain (www.unlicense.org)
  12494 ===============================================================================
  12495 This is free and unencumbered software released into the public domain.
  12496 
  12497 Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
  12498 software, either in source code form or as a compiled binary, for any purpose,
  12499 commercial or non-commercial, and by any means.
  12500 
  12501 In jurisdictions that recognize copyright laws, the author or authors of this
  12502 software dedicate any and all copyright interest in the software to the public
  12503 domain. We make this dedication for the benefit of the public at large and to
  12504 the detriment of our heirs and successors. We intend this dedication to be an
  12505 overt act of relinquishment in perpetuity of all present and future rights to
  12506 this software under copyright law.
  12507 
  12508 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  12509 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  12510 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
  12511 AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
  12512 ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
  12513 WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
  12514 
  12515 For more information, please refer to <http://unlicense.org/>
  12516 
  12517 ===============================================================================
  12518 ALTERNATIVE 2 - MIT No Attribution
  12519 ===============================================================================
  12520 Copyright 2023 David Reid
  12521 
  12522 Permission is hereby granted, free of charge, to any person obtaining a copy of
  12523 this software and associated documentation files (the "Software"), to deal in
  12524 the Software without restriction, including without limitation the rights to
  12525 use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
  12526 of the Software, and to permit persons to whom the Software is furnished to do
  12527 so.
  12528 
  12529 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  12530 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  12531 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
  12532 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  12533 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
  12534 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  12535 SOFTWARE.
  12536 */