dr_flac.h (525671B)
/*
FLAC audio decoder. Choice of public domain or MIT-0. See license statements at the end of this file.
dr_flac - v0.12.42 - 2023-11-02

David Reid - mackron@gmail.com

GitHub: https://github.com/mackron/dr_libs
*/

/*
RELEASE NOTES - v0.12.0
=======================
Version 0.12.0 has breaking API changes including changes to the existing API and the removal of deprecated APIs.


Improved Client-Defined Memory Allocation
-----------------------------------------
The main change with this release is the addition of a more flexible way of implementing custom memory allocation routines. The
existing system of DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE is still in place and will be used by default when no custom
allocation callbacks are specified.

To use the new system, you pass in a pointer to a drflac_allocation_callbacks object to drflac_open() and family, like this:

    void* my_malloc(size_t sz, void* pUserData)
    {
        return malloc(sz);
    }
    void* my_realloc(void* p, size_t sz, void* pUserData)
    {
        return realloc(p, sz);
    }
    void my_free(void* p, void* pUserData)
    {
        free(p);
    }

    ...

    drflac_allocation_callbacks allocationCallbacks;
    allocationCallbacks.pUserData = &myData;
    allocationCallbacks.onMalloc  = my_malloc;
    allocationCallbacks.onRealloc = my_realloc;
    allocationCallbacks.onFree    = my_free;
    drflac* pFlac = drflac_open_file("my_file.flac", &allocationCallbacks);

The advantage of this new system is that it allows you to specify user data which will be passed in to the allocation routines.

Passing in null for the allocation callbacks object will cause dr_flac to use defaults which is the same as DRFLAC_MALLOC,
DRFLAC_REALLOC and DRFLAC_FREE and the equivalent of how it worked in previous versions.

Every API that opens a drflac object now takes this extra parameter. These include the following:

    drflac_open()
    drflac_open_relaxed()
    drflac_open_with_metadata()
    drflac_open_with_metadata_relaxed()
    drflac_open_file()
    drflac_open_file_with_metadata()
    drflac_open_memory()
    drflac_open_memory_with_metadata()
    drflac_open_and_read_pcm_frames_s32()
    drflac_open_and_read_pcm_frames_s16()
    drflac_open_and_read_pcm_frames_f32()
    drflac_open_file_and_read_pcm_frames_s32()
    drflac_open_file_and_read_pcm_frames_s16()
    drflac_open_file_and_read_pcm_frames_f32()
    drflac_open_memory_and_read_pcm_frames_s32()
    drflac_open_memory_and_read_pcm_frames_s16()
    drflac_open_memory_and_read_pcm_frames_f32()



Optimizations
-------------
Seeking performance has been greatly improved. A new binary search based seeking algorithm has been introduced which significantly
improves performance over the brute force method which was used when no seek table was present. Seek table based seeking also takes
advantage of the new binary search seeking system to further improve performance there as well. Note that this depends on CRC which
means it will be disabled when DR_FLAC_NO_CRC is used.

The SSE4.1 pipeline has been cleaned up and optimized. You should see some improvements with decoding speed of 24-bit files in
particular. 16-bit streams should also see some improvement.

drflac_read_pcm_frames_s16() has been optimized. Previously this sat on top of drflac_read_pcm_frames_s32() and performed its s32
to s16 conversion in a second pass. This is now all done in a single pass. This includes SSE2 and ARM NEON optimized paths.

A minor optimization has been implemented for drflac_read_pcm_frames_s32(). This will now use an SSE2 optimized pipeline for stereo
channel reconstruction which is the last part of the decoding process.

The ARM build has seen a few improvements. The CLZ (count leading zeroes) and REV (byte swap) instructions are now used when
compiling with GCC and Clang which is achieved using inline assembly. The CLZ instruction requires ARM architecture version 5 at
compile time and the REV instruction requires ARM architecture version 6.

An ARM NEON optimized pipeline has been implemented. To enable this you'll need to add -mfpu=neon to the command line when compiling.


Removed APIs
------------
The following APIs were deprecated in version 0.11.0 and have been completely removed in version 0.12.0:

    drflac_read_s32()                   -> drflac_read_pcm_frames_s32()
    drflac_read_s16()                   -> drflac_read_pcm_frames_s16()
    drflac_read_f32()                   -> drflac_read_pcm_frames_f32()
    drflac_seek_to_sample()             -> drflac_seek_to_pcm_frame()
    drflac_open_and_decode_s32()        -> drflac_open_and_read_pcm_frames_s32()
    drflac_open_and_decode_s16()        -> drflac_open_and_read_pcm_frames_s16()
    drflac_open_and_decode_f32()        -> drflac_open_and_read_pcm_frames_f32()
    drflac_open_and_decode_file_s32()   -> drflac_open_file_and_read_pcm_frames_s32()
    drflac_open_and_decode_file_s16()   -> drflac_open_file_and_read_pcm_frames_s16()
    drflac_open_and_decode_file_f32()   -> drflac_open_file_and_read_pcm_frames_f32()
    drflac_open_and_decode_memory_s32() -> drflac_open_memory_and_read_pcm_frames_s32()
    drflac_open_and_decode_memory_s16() -> drflac_open_memory_and_read_pcm_frames_s16()
    drflac_open_and_decode_memory_f32() -> drflac_open_memory_and_read_pcm_frames_f32()

Prior versions of dr_flac operated on a per-sample basis whereas now it operates on PCM frames. The removed APIs all relate
to the old per-sample APIs. You now need to use the "pcm_frame" versions.
*/


/*
Introduction
============
dr_flac is a single file library. To use it, do something like the following in one .c file.

```c
#define DR_FLAC_IMPLEMENTATION
#include "dr_flac.h"
```

You can then #include this file in other parts of the program as you would with any other header file. To decode audio data, do something like the following:

```c
drflac* pFlac = drflac_open_file("MySong.flac", NULL);
if (pFlac == NULL) {
    // Failed to open FLAC file
}

drflac_int32* pSamples = malloc(pFlac->totalPCMFrameCount * pFlac->channels * sizeof(drflac_int32));
drflac_uint64 numberOfPCMFramesActuallyRead = drflac_read_pcm_frames_s32(pFlac, pFlac->totalPCMFrameCount, pSamples);
```

The drflac object represents the decoder. It is a transparent type so all the information you need, such as the number of channels and the bits per sample,
should be directly accessible - just make sure you don't change their values. Samples are always output as interleaved signed 32-bit PCM. In the example above
a native FLAC stream was opened, however dr_flac has seamless support for Ogg encapsulated FLAC streams as well.

You do not need to decode the entire stream in one go - you just specify how many PCM frames you'd like at any given time and the decoder will give you as many
PCM frames as it can, up to the amount requested. Later on when you need the next batch, just call it again. Example:

```c
while (drflac_read_pcm_frames_s32(pFlac, chunkSizeInPCMFrames, pChunkSamples) > 0) {
    do_something();
}
```

You can seek to a specific PCM frame with `drflac_seek_to_pcm_frame()`.
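
As a rough sketch of seeking by time rather than by frame index, the helper below converts a position in seconds to a PCM frame index and clamps it to the
stream length. `my_seconds_to_pcm_frame()` is a hypothetical name and is not part of dr_flac; it only assumes that a total frame count of 0 means "unknown":

```c
// Hypothetical helper, not part of dr_flac. Converts a time in seconds to a PCM
// frame index, clamped to the last frame when the total count is known (non-zero).
unsigned long long my_seconds_to_pcm_frame(double seconds, unsigned int sampleRate, unsigned long long totalPCMFrameCount)
{
    unsigned long long frame;

    if (seconds <= 0) {
        return 0;
    }

    frame = (unsigned long long)(seconds * sampleRate);
    if (totalPCMFrameCount > 0 && frame >= totalPCMFrameCount) {
        frame = totalPCMFrameCount - 1;
    }

    return frame;
}

// Usage, assuming pFlac is an open decoder:
//   drflac_seek_to_pcm_frame(pFlac, my_seconds_to_pcm_frame(30.0, pFlac->sampleRate, pFlac->totalPCMFrameCount));
```

Check the return value of `drflac_seek_to_pcm_frame()` before reading, since a seek can fail on a stream that does not support it.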

If you just want to quickly decode an entire FLAC file in one go you can do something like this:

```c
unsigned int channels;
unsigned int sampleRate;
drflac_uint64 totalPCMFrameCount;
drflac_int32* pSampleData = drflac_open_file_and_read_pcm_frames_s32("MySong.flac", &channels, &sampleRate, &totalPCMFrameCount, NULL);
if (pSampleData == NULL) {
    // Failed to open and decode FLAC file.
}

...

drflac_free(pSampleData, NULL);
```

You can read samples as signed 16-bit integer and 32-bit floating-point PCM with the *_s16() and *_f32() family of APIs respectively, but note that these
should be considered lossy.


If you need access to metadata (album art, etc.), use `drflac_open_with_metadata()`, `drflac_open_file_with_metadata()` or `drflac_open_memory_with_metadata()`.
The rationale for keeping these APIs separate is that they're slightly slower than the normal versions and also just a little bit harder to use. dr_flac
reports metadata to the application through the use of a callback, and every metadata block is reported before `drflac_open_with_metadata()` returns.

The main opening APIs (`drflac_open()`, etc.) will fail if the header is not present. This presents a problem in certain scenarios such as broadcast style
streams or internet radio where the header may not be present because the user has started playback mid-stream. To handle this, use the relaxed APIs:

    `drflac_open_relaxed()`
    `drflac_open_with_metadata_relaxed()`

It is not recommended to use these APIs for file based streams because a missing header would usually indicate a corrupt or perverse file. In addition, these
APIs can take a long time to initialize because they may need to spend a lot of time finding the first frame.



Build Options
=============
#define these options before including this file.

#define DR_FLAC_NO_STDIO
  Disable `drflac_open_file()` and family.

#define DR_FLAC_NO_OGG
  Disables support for Ogg/FLAC streams.

#define DR_FLAC_BUFFER_SIZE <number>
  Defines the size of the internal buffer to store data from onRead(). This buffer is used to reduce the number of calls back to the client for more data.
  Larger values mean more memory, but better performance. My tests show diminishing returns after about 4KB (which is the default). Consider reducing this if
  you have a very efficient implementation of onRead(), or increase it if it's very inefficient. Must be a multiple of 8.

#define DR_FLAC_NO_CRC
  Disables CRC checks. This will offer a performance boost when CRC is unnecessary. This will disable binary search seeking. When seeking, the seek table will
  be used if available. Otherwise the seek will be performed using brute force.

#define DR_FLAC_NO_SIMD
  Disables SIMD optimizations (SSE on x86/x64 architectures, NEON on ARM architectures). Use this if you are having compatibility issues with your compiler.

#define DR_FLAC_NO_WCHAR
  Disables all functions ending with `_w`. Use this if your compiler does not provide wchar.h. Not required if DR_FLAC_NO_STDIO is also defined.



Notes
=====
- dr_flac does not support changing the sample rate nor channel count mid stream.
- dr_flac is not thread-safe, but its APIs can be called from any thread so long as you do your own synchronization.
- When using Ogg encapsulation, a corrupted metadata block will result in `drflac_open_with_metadata()` and `drflac_open()` returning inconsistent samples due
  to differences in corrupted stream recovery logic between the two APIs.
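
As an illustration of the build options described above, the sketch below sets two of them and guards the "multiple of 8" requirement on
DR_FLAC_BUFFER_SIZE at compile time. The typedef name `my_drflac_buffer_size_check` is hypothetical and not part of dr_flac, and the value 8192 is only an
example:

```c
// These must appear before dr_flac.h is included in the one .c file that
// defines DR_FLAC_IMPLEMENTATION.
#define DR_FLAC_NO_STDIO
#define DR_FLAC_BUFFER_SIZE 8192

// Hypothetical compile-time guard, not part of dr_flac. The array size becomes
// -1, which fails to compile, if the buffer size is not a multiple of 8.
typedef char my_drflac_buffer_size_check[(DR_FLAC_BUFFER_SIZE % 8 == 0) ? 1 : -1];

// #define DR_FLAC_IMPLEMENTATION
// #include "dr_flac.h"
```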
*/

#ifndef dr_flac_h
#define dr_flac_h

#ifdef __cplusplus
extern "C" {
#endif

#define DRFLAC_STRINGIFY(x)      #x
#define DRFLAC_XSTRINGIFY(x)     DRFLAC_STRINGIFY(x)

#define DRFLAC_VERSION_MAJOR     0
#define DRFLAC_VERSION_MINOR     12
#define DRFLAC_VERSION_REVISION  42
#define DRFLAC_VERSION_STRING    DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MAJOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MINOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_REVISION)

#include <stddef.h> /* For size_t. */

/* Sized Types */
typedef   signed char           drflac_int8;
typedef unsigned char           drflac_uint8;
typedef   signed short          drflac_int16;
typedef unsigned short          drflac_uint16;
typedef   signed int            drflac_int32;
typedef unsigned int            drflac_uint32;
#if defined(_MSC_VER) && !defined(__clang__)
    typedef   signed __int64    drflac_int64;
    typedef unsigned __int64    drflac_uint64;
#else
    #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
        #pragma GCC diagnostic push
        #pragma GCC diagnostic ignored "-Wlong-long"
        #if defined(__clang__)
            #pragma GCC diagnostic ignored "-Wc++11-long-long"
        #endif
    #endif
    typedef   signed long long  drflac_int64;
    typedef unsigned long long  drflac_uint64;
    #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
        #pragma GCC diagnostic pop
    #endif
#endif
#if defined(__LP64__) || defined(_WIN64) || (defined(__x86_64__) && !defined(__ILP32__)) || defined(_M_X64) || defined(__ia64) || defined(_M_IA64) || defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__)
    typedef drflac_uint64       drflac_uintptr;
#else
    typedef drflac_uint32       drflac_uintptr;
#endif
typedef drflac_uint8            drflac_bool8;
typedef drflac_uint32           drflac_bool32;
#define DRFLAC_TRUE             1
#define DRFLAC_FALSE            0
/* End Sized Types */

/* Decorations */
#if !defined(DRFLAC_API)
    #if defined(DRFLAC_DLL)
        #if defined(_WIN32)
            #define DRFLAC_DLL_IMPORT  __declspec(dllimport)
            #define DRFLAC_DLL_EXPORT  __declspec(dllexport)
            #define DRFLAC_DLL_PRIVATE static
        #else
            #if defined(__GNUC__) && __GNUC__ >= 4
                #define DRFLAC_DLL_IMPORT  __attribute__((visibility("default")))
                #define DRFLAC_DLL_EXPORT  __attribute__((visibility("default")))
                #define DRFLAC_DLL_PRIVATE __attribute__((visibility("hidden")))
            #else
                #define DRFLAC_DLL_IMPORT
                #define DRFLAC_DLL_EXPORT
                #define DRFLAC_DLL_PRIVATE static
            #endif
        #endif

        #if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
            #define DRFLAC_API  DRFLAC_DLL_EXPORT
        #else
            #define DRFLAC_API  DRFLAC_DLL_IMPORT
        #endif
        #define DRFLAC_PRIVATE DRFLAC_DLL_PRIVATE
    #else
        #define DRFLAC_API extern
        #define DRFLAC_PRIVATE static
    #endif
#endif
/* End Decorations */

#if defined(_MSC_VER) && _MSC_VER >= 1700   /* Visual Studio 2012 */
    #define DRFLAC_DEPRECATED       __declspec(deprecated)
#elif (defined(__GNUC__) && __GNUC__ >= 4)  /* GCC 4 */
    #define DRFLAC_DEPRECATED       __attribute__((deprecated))
#elif defined(__has_feature)                /* Clang */
    #if __has_feature(attribute_deprecated)
        #define DRFLAC_DEPRECATED   __attribute__((deprecated))
    #else
        #define DRFLAC_DEPRECATED
    #endif
#else
    #define DRFLAC_DEPRECATED
#endif

DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision);
DRFLAC_API const char* drflac_version_string(void);

/* Allocation Callbacks */
typedef struct
{
    void* pUserData;
    void* (* onMalloc)(size_t sz, void* pUserData);
    void* (* onRealloc)(void* p, size_t sz, void* pUserData);
    void  (* onFree)(void* p, void* pUserData);
} drflac_allocation_callbacks;
/* End Allocation Callbacks */

/*
As data is read from the client it is placed into an internal buffer for fast access. This controls the size of that buffer. Larger values mean more speed,
but also more memory. In my testing there are diminishing returns after about 4KB, but you can fiddle with this to suit your own needs. Must be a multiple of 8.
*/
#ifndef DR_FLAC_BUFFER_SIZE
#define DR_FLAC_BUFFER_SIZE   4096
#endif


/* Architecture Detection */
#if defined(_WIN64) || defined(_LP64) || defined(__LP64__)
#define DRFLAC_64BIT
#endif

#if defined(__x86_64__) || defined(_M_X64)
    #define DRFLAC_X64
#elif defined(__i386) || defined(_M_IX86)
    #define DRFLAC_X86
#elif defined(__arm__) || defined(_M_ARM) || defined(__arm64) || defined(__arm64__) || defined(__aarch64__) || defined(_M_ARM64)
    #define DRFLAC_ARM
#endif
/* End Architecture Detection */


#ifdef DRFLAC_64BIT
typedef drflac_uint64 drflac_cache_t;
#else
typedef drflac_uint32 drflac_cache_t;
#endif

/* The various metadata block types. */
#define DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO       0
#define DRFLAC_METADATA_BLOCK_TYPE_PADDING          1
#define DRFLAC_METADATA_BLOCK_TYPE_APPLICATION      2
#define DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE        3
#define DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT   4
#define DRFLAC_METADATA_BLOCK_TYPE_CUESHEET         5
#define DRFLAC_METADATA_BLOCK_TYPE_PICTURE          6
#define DRFLAC_METADATA_BLOCK_TYPE_INVALID          127

/* The various picture types specified in the PICTURE block. */
#define DRFLAC_PICTURE_TYPE_OTHER                   0
#define DRFLAC_PICTURE_TYPE_FILE_ICON               1
#define DRFLAC_PICTURE_TYPE_OTHER_FILE_ICON         2
#define DRFLAC_PICTURE_TYPE_COVER_FRONT             3
#define DRFLAC_PICTURE_TYPE_COVER_BACK              4
#define DRFLAC_PICTURE_TYPE_LEAFLET_PAGE            5
#define DRFLAC_PICTURE_TYPE_MEDIA                   6
#define DRFLAC_PICTURE_TYPE_LEAD_ARTIST             7
#define DRFLAC_PICTURE_TYPE_ARTIST                  8
#define DRFLAC_PICTURE_TYPE_CONDUCTOR               9
#define DRFLAC_PICTURE_TYPE_BAND                    10
#define DRFLAC_PICTURE_TYPE_COMPOSER                11
#define DRFLAC_PICTURE_TYPE_LYRICIST                12
#define DRFLAC_PICTURE_TYPE_RECORDING_LOCATION      13
#define DRFLAC_PICTURE_TYPE_DURING_RECORDING        14
#define DRFLAC_PICTURE_TYPE_DURING_PERFORMANCE      15
#define DRFLAC_PICTURE_TYPE_SCREEN_CAPTURE          16
#define DRFLAC_PICTURE_TYPE_BRIGHT_COLORED_FISH     17
#define DRFLAC_PICTURE_TYPE_ILLUSTRATION            18
#define DRFLAC_PICTURE_TYPE_BAND_LOGOTYPE           19
#define DRFLAC_PICTURE_TYPE_PUBLISHER_LOGOTYPE      20

typedef enum
{
    drflac_container_native,
    drflac_container_ogg,
    drflac_container_unknown
} drflac_container;

typedef enum
{
    drflac_seek_origin_start,
    drflac_seek_origin_current
} drflac_seek_origin;

/* The order of members in this structure is important because we map this directly to the raw data within the SEEKTABLE metadata block. */
typedef struct
{
    drflac_uint64 firstPCMFrame;
    drflac_uint64 flacFrameOffset;   /* The offset from the first byte of the header of the first frame. */
    drflac_uint16 pcmFrameCount;
} drflac_seekpoint;

typedef struct
{
    drflac_uint16 minBlockSizeInPCMFrames;
    drflac_uint16 maxBlockSizeInPCMFrames;
    drflac_uint32 minFrameSizeInPCMFrames;
    drflac_uint32 maxFrameSizeInPCMFrames;
    drflac_uint32 sampleRate;
    drflac_uint8  channels;
    drflac_uint8  bitsPerSample;
    drflac_uint64 totalPCMFrameCount;
    drflac_uint8  md5[16];
} drflac_streaminfo;

typedef struct
{
    /*
    The metadata type. Use this to know how to interpret the data below. Will be set to one of the
    DRFLAC_METADATA_BLOCK_TYPE_* tokens.
    */
    drflac_uint32 type;

    /*
    A pointer to the raw data. This points to a temporary buffer so don't hold on to it. It's best to
    not modify the contents of this buffer. Use the structures below for more meaningful and structured
    information about the metadata. It's possible for this to be null.
    */
    const void* pRawData;

    /* The size in bytes of the block and the buffer pointed to by pRawData if it's non-NULL. */
    drflac_uint32 rawDataSize;

    union
    {
        drflac_streaminfo streaminfo;

        struct
        {
            int unused;
        } padding;

        struct
        {
            drflac_uint32 id;
            const void* pData;
            drflac_uint32 dataSize;
        } application;

        struct
        {
            drflac_uint32 seekpointCount;
            const drflac_seekpoint* pSeekpoints;
        } seektable;

        struct
        {
            drflac_uint32 vendorLength;
            const char* vendor;
            drflac_uint32 commentCount;
            const void* pComments;
        } vorbis_comment;

        struct
        {
            char catalog[128];
            drflac_uint64 leadInSampleCount;
            drflac_bool32 isCD;
            drflac_uint8 trackCount;
            const void* pTrackData;
        } cuesheet;

        struct
        {
            drflac_uint32 type;
            drflac_uint32 mimeLength;
            const char* mime;
            drflac_uint32 descriptionLength;
            const char* description;
            drflac_uint32 width;
            drflac_uint32 height;
            drflac_uint32 colorDepth;
            drflac_uint32 indexColorCount;
            drflac_uint32 pictureDataSize;
            const drflac_uint8* pPictureData;
        } picture;
    } data;
} drflac_metadata;


/*
Callback for when data needs to be read from the client.


Parameters
----------
pUserData (in)
    The user data that was passed to drflac_open() and family.

pBufferOut (out)
    The output buffer.

bytesToRead (in)
    The number of bytes to read.


Return Value
------------
The number of bytes actually read.


Remarks
-------
A return value of less than bytesToRead indicates the end of the stream. Do _not_ return from this callback until either the entire bytesToRead is filled or
you have reached the end of the stream.
*/
typedef size_t (* drflac_read_proc)(void* pUserData, void* pBufferOut, size_t bytesToRead);

/*
Callback for when data needs to be seeked.


Parameters
----------
pUserData (in)
    The user data that was passed to drflac_open() and family.

offset (in)
    The number of bytes to move, relative to the origin. Will never be negative.

origin (in)
    The origin of the seek - the current position or the start of the stream.


Return Value
------------
Whether or not the seek was successful.


Remarks
-------
The offset will never be negative. Whether or not it is relative to the beginning or current position is determined by the "origin" parameter which will be
either drflac_seek_origin_start or drflac_seek_origin_current.

When seeking to a PCM frame using drflac_seek_to_pcm_frame(), dr_flac may call this with an offset beyond the end of the FLAC stream. This needs to be detected
and handled by returning DRFLAC_FALSE.
*/
typedef drflac_bool32 (* drflac_seek_proc)(void* pUserData, int offset, drflac_seek_origin origin);

/*
Callback for when a metadata block is read.


Parameters
----------
pUserData (in)
    The user data that was passed to drflac_open() and family.

pMetadata (in)
    A pointer to a structure containing the data of the metadata block.


Remarks
-------
Use pMetadata->type to determine which metadata block is being handled and how to read the data. This
will be set to one of the DRFLAC_METADATA_BLOCK_TYPE_* tokens.
*/
typedef void (* drflac_meta_proc)(void* pUserData, drflac_metadata* pMetadata);


/* Structure for internal use. Only used for decoders opened with drflac_open_memory. */
typedef struct
{
    const drflac_uint8* data;
    size_t dataSize;
    size_t currentReadPos;
} drflac__memory_stream;

/* Structure for internal use. Used for bit streaming. */
typedef struct
{
    /* The function to call when more data needs to be read. */
    drflac_read_proc onRead;

    /* The function to call when the current read position needs to be moved. */
    drflac_seek_proc onSeek;

    /* The user data to pass around to onRead and onSeek. */
    void* pUserData;


    /*
    The number of unaligned bytes in the L2 cache. This will always be 0 until the end of the stream is hit. At the end of the
    stream there will be a number of bytes that don't cleanly fit in an L1 cache line, so we use this variable to know whether
    or not the bitstreamer needs to run on a slower path to read those last bytes. This will never be more than sizeof(drflac_cache_t).
    */
    size_t unalignedByteCount;

    /* The content of the unaligned bytes. */
    drflac_cache_t unalignedCache;

    /* The index of the next valid cache line in the "L2" cache. */
    drflac_uint32 nextL2Line;

    /* The number of bits that have been consumed by the cache. This is used to determine how many valid bits are remaining. */
    drflac_uint32 consumedBits;

    /*
    The cached data which was most recently read from the client. There are two levels of cache. Data flows as such:
    Client -> L2 -> L1. The L2 -> L1 movement is aligned and runs on a fast path in just a few instructions.
    */
    drflac_cache_t cacheL2[DR_FLAC_BUFFER_SIZE/sizeof(drflac_cache_t)];
    drflac_cache_t cache;

    /*
    CRC-16. This is updated whenever bits are read from the bit stream. Manually set this to 0 to reset the CRC. For FLAC, this
    is reset to 0 at the beginning of each frame.
    */
    drflac_uint16 crc16;
    drflac_cache_t crc16Cache;              /* A cache for optimizing CRC calculations. This is filled when the L1 cache is reloaded. */
    drflac_uint32 crc16CacheIgnoredBytes;   /* The number of bytes to ignore when updating the CRC-16 from the CRC-16 cache. */
} drflac_bs;

typedef struct
{
    /* The type of the subframe: SUBFRAME_CONSTANT, SUBFRAME_VERBATIM, SUBFRAME_FIXED or SUBFRAME_LPC. */
    drflac_uint8 subframeType;

    /* The number of wasted bits per sample as specified by the sub-frame header. */
    drflac_uint8 wastedBitsPerSample;

    /* The order to use for the prediction stage for SUBFRAME_FIXED and SUBFRAME_LPC. */
    drflac_uint8 lpcOrder;

    /* A pointer to the buffer containing the decoded samples in the subframe. This pointer is an offset from drflac::pExtraData. */
    drflac_int32* pSamplesS32;
} drflac_subframe;

typedef struct
{
    /*
    If the stream uses variable block sizes, this will be set to the index of the first PCM frame. If fixed block sizes are used, this will
    always be set to 0. This is 64-bit because the decoded PCM frame number will be 36 bits.
    */
    drflac_uint64 pcmFrameNumber;

    /*
    If the stream uses fixed block sizes, this will be set to the frame number. If variable block sizes are used, this will always be 0. This
    is 32-bit because in fixed block sizes, the maximum frame number will be 31 bits.
    */
    drflac_uint32 flacFrameNumber;

    /* The sample rate of this frame. */
    drflac_uint32 sampleRate;

    /* The number of PCM frames in each sub-frame within this frame. */
    drflac_uint16 blockSizeInPCMFrames;

    /*
    The channel assignment of this frame. This is not always set to the channel count. If interchannel decorrelation is being used this
    will be set to DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE, DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE or DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE.
    */
    drflac_uint8 channelAssignment;

    /* The number of bits per sample within this frame. */
    drflac_uint8 bitsPerSample;

    /* The frame's CRC. */
    drflac_uint8 crc8;
} drflac_frame_header;

typedef struct
{
    /* The header. */
    drflac_frame_header header;

    /*
    The number of PCM frames left to be read in this FLAC frame. This is initially set to the block size. As PCM frames are read,
    this will be decremented. When it reaches 0, the decoder will see this frame as fully consumed and load the next frame.
    */
    drflac_uint32 pcmFramesRemaining;

    /* The list of sub-frames within the frame. There is one sub-frame for each channel, and there's a maximum of 8 channels. */
    drflac_subframe subframes[8];
} drflac_frame;

typedef struct
{
    /* The function to call when a metadata block is read. */
    drflac_meta_proc onMeta;

    /* The user data posted to the metadata callback function. */
    void* pUserDataMD;

    /* Memory allocation callbacks. */
    drflac_allocation_callbacks allocationCallbacks;


    /* The sample rate. Will be set to something like 44100. */
    drflac_uint32 sampleRate;

    /*
    The number of channels. This will be set to 1 for monaural streams, 2 for stereo, etc. Maximum 8. This is set based on the
    value specified in the STREAMINFO block.
    */
    drflac_uint8 channels;

    /* The bits per sample. Will be set to something like 16, 24, etc. */
    drflac_uint8 bitsPerSample;

    /* The maximum block size, in samples. This number represents the number of samples in each channel (not combined). */
    drflac_uint16 maxBlockSizeInPCMFrames;

    /*
    The total number of PCM Frames making up the stream. Can be 0 in which case it's still a valid stream, but just means
    the total PCM frame count is unknown. Likely the case with streams like internet radio.
    */
    drflac_uint64 totalPCMFrameCount;


    /* The container type. This is set based on whether or not the decoder was opened from a native or Ogg stream. */
    drflac_container container;

    /* The number of seekpoints in the seektable. */
    drflac_uint32 seekpointCount;


    /* Information about the frame the decoder is currently sitting on. */
    drflac_frame currentFLACFrame;


    /* The index of the PCM frame the decoder is currently sitting on. This is only used for seeking. */
    drflac_uint64 currentPCMFrame;

    /* The position of the first FLAC frame in the stream. This is only ever used for seeking. */
    drflac_uint64 firstFLACFramePosInBytes;


    /* A hack to avoid a malloc() when opening a decoder with drflac_open_memory(). */
    drflac__memory_stream memoryStream;


    /* A pointer to the decoded sample data. This is an offset of pExtraData. */
    drflac_int32* pDecodedSamples;

    /* A pointer to the seek table. This is an offset of pExtraData, or NULL if there is no seek table. */
    drflac_seekpoint* pSeekpoints;

    /* Internal use only. Only used with Ogg containers. Points to a drflac_oggbs object. This is an offset of pExtraData. */
    void* _oggbs;

    /* Internal use only. Used for profiling and testing different seeking modes. */
    drflac_bool32 _noSeekTableSeek    : 1;
    drflac_bool32 _noBinarySearchSeek : 1;
    drflac_bool32 _noBruteForceSeek   : 1;

    /* The bit streamer. The raw FLAC data is fed through this object. */
    drflac_bs bs;

    /* Variable length extra data. We attach this to the end of the object so we can avoid unnecessary mallocs. */
    drflac_uint8 pExtraData[1];
} drflac;


/*
Opens a FLAC decoder.


Parameters
----------
onRead (in)
    The function to call when data needs to be read from the client.

onSeek (in)
    The function to call when the read position of the client data needs to move.

pUserData (in, optional)
    A pointer to application defined data that will be passed to onRead and onSeek.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.


Return Value
------------
Returns a pointer to an object representing the decoder.
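
As a minimal sketch of what an `onRead` callback passed to this function might look like, the example below reads from a block of memory. The
`my_memory_source` type and `my_on_read()` name are hypothetical and not part of dr_flac; the function only has the same shape as `drflac_read_proc`. It
follows the rule that a short return count signals the end of the stream. For real in-memory decoding, `drflac_open_memory()` already covers this case.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical user data type, not part of dr_flac.
typedef struct
{
    const unsigned char* pData;
    size_t sizeInBytes;
    size_t cursor;
} my_memory_source;

// Has the shape of drflac_read_proc. Copies as many bytes as are available and
// returns that count; fewer than bytesToRead means the end of the stream.
size_t my_on_read(void* pUserData, void* pBufferOut, size_t bytesToRead)
{
    my_memory_source* pSource = (my_memory_source*)pUserData;
    size_t bytesRemaining = pSource->sizeInBytes - pSource->cursor;

    if (bytesToRead > bytesRemaining) {
        bytesToRead = bytesRemaining;
    }

    if (bytesToRead > 0) {
        memcpy(pBufferOut, pSource->pData + pSource->cursor, bytesToRead);
        pSource->cursor += bytesToRead;
    }

    return bytesToRead;
}
```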


Remarks
-------
Close the decoder with `drflac_close()`.

`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.

This function will automatically detect whether or not you are attempting to open a native or Ogg encapsulated FLAC, both of which should work seamlessly
without any manual intervention. Ogg encapsulation also works with multiplexed streams which basically means it can play FLAC encoded audio tracks in videos.

This is the lowest level function for opening a FLAC stream. You can also use `drflac_open_file()` and `drflac_open_memory()` to open the stream from a file or
from a block of memory respectively.

The STREAMINFO block must be present for this to succeed. Use `drflac_open_relaxed()` to open a FLAC stream where the header may not be present.

Use `drflac_open_with_metadata()` if you need access to metadata.


See Also
--------
drflac_open_file()
drflac_open_memory()
drflac_open_with_metadata()
drflac_close()
*/
DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);

/*
Opens a FLAC stream with relaxed validation of the header block.


Parameters
----------
onRead (in)
    The function to call when data needs to be read from the client.

onSeek (in)
    The function to call when the read position of the client data needs to move.

container (in)
    Whether or not the FLAC stream is encapsulated using standard FLAC encapsulation or Ogg encapsulation.

pUserData (in, optional)
    A pointer to application defined data that will be passed to onRead and onSeek.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.


Return Value
------------
A pointer to an object representing the decoder.


Remarks
-------
The same as drflac_open(), except attempts to open the stream even when a header block is not present.

Because the header is not necessarily available, the caller must explicitly define the container (Native or Ogg). Do not set this to `drflac_container_unknown`
as that is for internal use only.

Opening in relaxed mode will continue reading data from onRead until it finds a valid frame. If a frame is never found it will continue forever. To abort,
force your `onRead` callback to return 0, which dr_flac will use as an indicator that the end of the stream was found.

Use `drflac_open_with_metadata_relaxed()` if you need access to metadata.
*/
DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);

/*
Opens a FLAC decoder and notifies the caller of the metadata chunks (album art, etc.).


Parameters
----------
onRead (in)
    The function to call when data needs to be read from the client.

onSeek (in)
    The function to call when the read position of the client data needs to move.

onMeta (in)
    The function to call for every metadata block.

pUserData (in, optional)
    A pointer to application defined data that will be passed to onRead, onSeek and onMeta.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.


Return Value
------------
A pointer to an object representing the decoder.


Remarks
-------
Close the decoder with `drflac_close()`.

`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
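
To illustrate what the user data pointer of the allocation callbacks can be used for, the sketch below counts live allocations. The `alloc_stats` type
and the three `counting_` functions are hypothetical names; the callback signatures follow the `my_malloc()`, `my_realloc()` and `my_free()` example in
the v0.12.0 release notes at the top of this file.

```c
#include <stddef.h>
#include <stdlib.h>

// Hypothetical bookkeeping shared by all three callbacks through pUserData.
typedef struct
{
    size_t liveAllocations;
    size_t totalCalls;
} alloc_stats;

static void* counting_malloc(size_t sz, void* pUserData)
{
    alloc_stats* pStats = (alloc_stats*)pUserData;
    void* p = malloc(sz);

    pStats->totalCalls += 1;
    if (p != NULL) {
        pStats->liveAllocations += 1;
    }

    return p;
}

// Assumes sz is non-zero; realloc() with a size of zero is not handled by this sketch.
static void* counting_realloc(void* p, size_t sz, void* pUserData)
{
    alloc_stats* pStats = (alloc_stats*)pUserData;
    void* pNew = realloc(p, sz);

    pStats->totalCalls += 1;
    if (p == NULL && pNew != NULL) {
        pStats->liveAllocations += 1;   // realloc(NULL, sz) behaves like malloc(sz).
    }

    return pNew;
}

static void counting_free(void* p, void* pUserData)
{
    alloc_stats* pStats = (alloc_stats*)pUserData;

    pStats->totalCalls += 1;
    if (p != NULL) {
        pStats->liveAllocations -= 1;
        free(p);
    }
}
```

These functions would be assigned to the `onMalloc`, `onRealloc` and `onFree` members of a `drflac_allocation_callbacks` object, with `pUserData` pointing
at an `alloc_stats` instance. A non-zero `liveAllocations` after `drflac_close()` would then hint at a leak.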

This is slower than `drflac_open()`, so avoid this one if you don't need metadata. Internally, this will allocate and free memory on the heap for every
metadata block except for STREAMINFO and PADDING blocks.

The caller is notified of the metadata via the `onMeta` callback. All metadata blocks will be handled before the function returns. This callback takes a
pointer to a `drflac_metadata` object which is a union containing the data of all relevant metadata blocks. Use the `type` member to discriminate between
the different metadata types.

The STREAMINFO block must be present for this to succeed. Use `drflac_open_with_metadata_relaxed()` to open a FLAC stream where the header may not be present.

Note that this will behave inconsistently with `drflac_open()` if the stream is an Ogg encapsulated stream and a metadata block is corrupted. This is due to
the way the Ogg stream recovers from corrupted pages. When `drflac_open_with_metadata()` is being used, the open routine will try to read the contents of the
metadata block, whereas `drflac_open()` will simply seek past it (for the sake of efficiency). This inconsistency can result in different samples being
returned depending on whether or not the stream is being opened with metadata.


See Also
--------
drflac_open_file_with_metadata()
drflac_open_memory_with_metadata()
drflac_open()
drflac_close()
*/
DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);

/*
The same as drflac_open_with_metadata(), except attempts to open the stream even when a header block is not present.

See Also
--------
drflac_open_with_metadata()
drflac_open_relaxed()
*/
DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);

/*
Closes the given FLAC decoder.


Parameters
----------
pFlac (in)
    The decoder to close.


Remarks
-------
This will destroy the decoder object.


See Also
--------
drflac_open()
drflac_open_with_metadata()
drflac_open_file()
drflac_open_file_w()
drflac_open_file_with_metadata()
drflac_open_file_with_metadata_w()
drflac_open_memory()
drflac_open_memory_with_metadata()
*/
DRFLAC_API void drflac_close(drflac* pFlac);


/*
Reads sample data from the given FLAC decoder, output as interleaved signed 32-bit PCM.


Parameters
----------
pFlac (in)
    The decoder.

framesToRead (in)
    The number of PCM frames to read.

pBufferOut (out, optional)
    A pointer to the buffer that will receive the decoded samples.


Return Value
------------
Returns the number of PCM frames actually read. If the return value is less than `framesToRead`, the end of the stream has been reached.


Remarks
-------
pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
*/
DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut);


/*
Reads sample data from the given FLAC decoder, output as interleaved signed 16-bit PCM.


Parameters
----------
pFlac (in)
    The decoder.

framesToRead (in)
    The number of PCM frames to read.

pBufferOut (out, optional)
    A pointer to the buffer that will receive the decoded samples.


Return Value
------------
Returns the number of PCM frames actually read. If the return value is less than `framesToRead`, the end of the stream has been reached.


Remarks
-------
pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.

Note that this is lossy for streams where the bits per sample is larger than 16.
*/
DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut);

/*
Reads sample data from the given FLAC decoder, output as interleaved 32-bit floating point PCM.


Parameters
----------
pFlac (in)
    The decoder.

framesToRead (in)
    The number of PCM frames to read.

pBufferOut (out, optional)
    A pointer to the buffer that will receive the decoded samples.


Return Value
------------
Returns the number of PCM frames actually read. If the return value is less than `framesToRead`, the end of the stream has been reached.


Remarks
-------
pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.

Note that this should be considered lossy due to the nature of floating point numbers not being able to exactly represent every possible number.
*/
DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut);

/*
Seeks to the PCM frame at the given index.


Parameters
----------
pFlac (in)
    The decoder.

pcmFrameIndex (in)
    The index of the PCM frame to seek to.


Return Value
------------
`DRFLAC_TRUE` if successful; `DRFLAC_FALSE` otherwise.
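
As a worked example of choosing `pcmFrameIndex`, a position in seconds maps to a frame index by multiplying it by the sample rate. The helper below is a
hypothetical sketch and not part of dr_flac. It assumes the caller supplies the sample rate and the total PCM frame count of the stream (with 0 meaning
the count is unknown), clamps the result to the last frame, and uses `<stdint.h>` types in place of the `drflac_` integer types.

```c
#include <stdint.h>

// Converts a position in seconds to a PCM frame index.
// totalPCMFrameCount == 0 is treated as "length unknown", in which case no clamping is done.
static uint64_t seconds_to_pcm_frame(double seconds, uint32_t sampleRate, uint64_t totalPCMFrameCount)
{
    uint64_t frameIndex;

    if (seconds <= 0.0) {
        return 0;
    }

    // One PCM frame holds one sample per channel, so the channel count does not enter into this.
    frameIndex = (uint64_t)(seconds * (double)sampleRate);

    if (totalPCMFrameCount > 0 && frameIndex >= totalPCMFrameCount) {
        frameIndex = totalPCMFrameCount - 1;
    }

    return frameIndex;
}
```

The result would then be passed along as `drflac_seek_to_pcm_frame(pFlac, frameIndex)`.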
*/
DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex);



#ifndef DR_FLAC_NO_STDIO
/*
Opens a FLAC decoder from the file at the given path.


Parameters
----------
pFileName (in)
    The path of the file to open, either absolute or relative to the current directory.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.


Return Value
------------
A pointer to an object representing the decoder.


Remarks
-------
Close the decoder with drflac_close().

This will hold a handle to the file until the decoder is closed with drflac_close(). Some platforms will restrict the number of files a process can have open
at any given time, so keep this in mind if you have many decoders open at the same time.


See Also
--------
drflac_open_file_with_metadata()
drflac_open()
drflac_close()
*/
DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);

/*
Opens a FLAC decoder from the file at the given path and notifies the caller of the metadata chunks (album art, etc.).


Parameters
----------
pFileName (in)
    The path of the file to open, either absolute or relative to the current directory.

onMeta (in)
    The callback to fire for each metadata block.

pUserData (in)
    A pointer to the user data to pass to the metadata callback.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.


Remarks
-------
Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.


See Also
--------
drflac_open_with_metadata()
drflac_open()
drflac_close()
*/
DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
#endif

/*
Opens a FLAC decoder from a pre-allocated block of memory.


Parameters
----------
pData (in)
    A pointer to the raw encoded FLAC data.

dataSize (in)
    The size in bytes of `pData`.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.


Return Value
------------
A pointer to an object representing the decoder.


Remarks
-------
This does not create a copy of the data. It is up to the application to ensure the buffer remains valid for the lifetime of the decoder.


See Also
--------
drflac_open()
drflac_close()
*/
DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks);

/*
Opens a FLAC decoder from a pre-allocated block of memory and notifies the caller of the metadata chunks (album art, etc.).


Parameters
----------
pData (in)
    A pointer to the raw encoded FLAC data.

dataSize (in)
    The size in bytes of `pData`.

onMeta (in)
    The callback to fire for each metadata block.

pUserData (in)
    A pointer to the user data to pass to the metadata callback.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.


Remarks
-------
Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.


See Also
--------
drflac_open_with_metadata()
drflac_open()
drflac_close()
*/
DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);



/* High Level APIs */

/*
Opens a FLAC stream from the given callbacks and fully decodes it in a single operation. The return value is a
pointer to the sample data as interleaved signed 32-bit PCM. The returned data must be freed with drflac_free().

You can pass in custom memory allocation callbacks via the pAllocationCallbacks parameter. This can be NULL in which
case it will use DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.

Sometimes a FLAC file won't keep track of the total sample count. In this situation the function will continuously
read samples into a dynamically sized buffer on the heap until no samples are left.

Do not call this function on a broadcast type of stream (such as an internet radio stream), since such a stream
never runs out of samples.
*/
DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);

/* Same as drflac_open_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);

/* Same as drflac_open_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);

#ifndef DR_FLAC_NO_STDIO
/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a file. */
DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);

/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);

/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
#endif

/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a block of memory. */
DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);

/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);

/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);

/*
Frees memory that was allocated internally by dr_flac.

Set pAllocationCallbacks to the same object that was passed to drflac_open_*_and_read_pcm_frames_*(). If you originally passed in NULL, pass in NULL for this.
*/
DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks);


/* Structure representing an iterator for vorbis comments in a VORBIS_COMMENT metadata block. */
typedef struct
{
    drflac_uint32 countRemaining;
    const char* pRunningData;
} drflac_vorbis_comment_iterator;

/*
Initializes a vorbis comment iterator. This can be used for iterating over the vorbis comments in a VORBIS_COMMENT
metadata block.
*/
DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments);

/*
Goes to the next vorbis comment in the given iterator.
If null is returned it means there are no more comments. The returned string is NOT null terminated.
*/
DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut);


/* Structure representing an iterator for cuesheet tracks in a CUESHEET metadata block. */
typedef struct
{
    drflac_uint32 countRemaining;
    const char* pRunningData;
} drflac_cuesheet_track_iterator;

/* The order of members here is important because we map this directly to the raw data within the CUESHEET metadata block. */
typedef struct
{
    drflac_uint64 offset;
    drflac_uint8 index;
    drflac_uint8 reserved[3];
} drflac_cuesheet_track_index;

typedef struct
{
    drflac_uint64 offset;
    drflac_uint8 trackNumber;
    char ISRC[12];
    drflac_bool8 isAudio;
    drflac_bool8 preEmphasis;
    drflac_uint8 indexCount;
    const drflac_cuesheet_track_index* pIndexPoints;
} drflac_cuesheet_track;

/*
Initializes a cuesheet track iterator. This can be used for iterating over the cuesheet tracks in a CUESHEET metadata
block.
*/
DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData);

/* Goes to the next cuesheet track in the given iterator. If DRFLAC_FALSE is returned it means there are no more tracks. */
DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack);


#ifdef __cplusplus
}
#endif
#endif  /* dr_flac_h */


/************************************************************************************************************************************************************
 ************************************************************************************************************************************************************

 IMPLEMENTATION

 ************************************************************************************************************************************************************
 ************************************************************************************************************************************************************/
#if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
#ifndef dr_flac_c
#define dr_flac_c

/* Disable some annoying warnings. */
#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
    #pragma GCC diagnostic push
    #if __GNUC__ >= 7
    #pragma GCC diagnostic ignored "-Wimplicit-fallthrough"
    #endif
#endif

#ifdef __linux__
    #ifndef _BSD_SOURCE
        #define _BSD_SOURCE
    #endif
    #ifndef _DEFAULT_SOURCE
        #define _DEFAULT_SOURCE
    #endif
    #ifndef __USE_BSD
        #define __USE_BSD
    #endif
    #include <endian.h>
#endif

#include <stdlib.h>
#include <string.h>

/* Inline */
#ifdef _MSC_VER
    #define DRFLAC_INLINE __forceinline
#elif defined(__GNUC__)
    /*
    I've had a bug report where GCC is emitting warnings about functions possibly not being inlineable. This warning happens when
    the __attribute__((always_inline)) attribute is defined without an "inline" statement.
    I think therefore there must be some case where "__inline__" is not always defined, thus the compiler emitting these warnings.
    When using -std=c89 or -ansi on the command line, we cannot use the "inline" keyword and instead need to use "__inline__". In an
    attempt to work around this issue I am using "__inline__" only when we're compiling in strict ANSI mode.
    */
    #if defined(__STRICT_ANSI__)
        #define DRFLAC_GNUC_INLINE_HINT __inline__
    #else
        #define DRFLAC_GNUC_INLINE_HINT inline
    #endif

    #if (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 2)) || defined(__clang__)
        #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT __attribute__((always_inline))
    #else
        #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT
    #endif
#elif defined(__WATCOMC__)
    #define DRFLAC_INLINE __inline
#else
    #define DRFLAC_INLINE
#endif
/* End Inline */

/*
Intrinsics Support

There's a bug in GCC 4.2.x which results in an incorrect compilation error when using _mm_slli_epi32() where it complains with

    "error: shift must be an immediate"

Unfortunately dr_flac depends on this for a few things so we're just going to disable SSE on GCC 4.2 and below.
*/
#if !defined(DR_FLAC_NO_SIMD)
    #if defined(DRFLAC_X64) || defined(DRFLAC_X86)
        #if defined(_MSC_VER) && !defined(__clang__)
            /* MSVC. */
            #if _MSC_VER >= 1400 && !defined(DRFLAC_NO_SSE2)    /* 2005 */
                #define DRFLAC_SUPPORT_SSE2
            #endif
            #if _MSC_VER >= 1600 && !defined(DRFLAC_NO_SSE41)   /* 2010 */
                #define DRFLAC_SUPPORT_SSE41
            #endif
        #elif defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)))
            /* Assume GNUC-style. */
            #if defined(__SSE2__) && !defined(DRFLAC_NO_SSE2)
                #define DRFLAC_SUPPORT_SSE2
            #endif
            #if defined(__SSE4_1__) && !defined(DRFLAC_NO_SSE41)
                #define DRFLAC_SUPPORT_SSE41
            #endif
        #endif

        /* If at this point we still haven't determined compiler support for the intrinsics just fall back to __has_include. */
        #if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
            #if !defined(DRFLAC_SUPPORT_SSE2) && !defined(DRFLAC_NO_SSE2) && __has_include(<emmintrin.h>)
                #define DRFLAC_SUPPORT_SSE2
            #endif
            #if !defined(DRFLAC_SUPPORT_SSE41) && !defined(DRFLAC_NO_SSE41) && __has_include(<smmintrin.h>)
                #define DRFLAC_SUPPORT_SSE41
            #endif
        #endif

        #if defined(DRFLAC_SUPPORT_SSE41)
            #include <smmintrin.h>
        #elif defined(DRFLAC_SUPPORT_SSE2)
            #include <emmintrin.h>
        #endif
    #endif

    #if defined(DRFLAC_ARM)
        #if !defined(DRFLAC_NO_NEON) && (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
            #define DRFLAC_SUPPORT_NEON
            #include <arm_neon.h>
        #endif
    #endif
#endif

/* Compile-time CPU feature support. */
#if !defined(DR_FLAC_NO_SIMD) && (defined(DRFLAC_X86) || defined(DRFLAC_X64))
    #if defined(_MSC_VER) && !defined(__clang__)
        #if _MSC_VER >= 1400
            #include <intrin.h>
            static void drflac__cpuid(int info[4], int fid)
            {
                __cpuid(info, fid);
            }
        #else
            #define DRFLAC_NO_CPUID
        #endif
    #else
        #if defined(__GNUC__) || defined(__clang__)
            static void drflac__cpuid(int info[4], int fid)
            {
                /*
                It looks like the -fPIC option uses the ebx register which GCC complains about. We can work around this by just using a different register, the
                specific register of which I'm letting the compiler decide on. The "k" prefix is used to specify a 32-bit register. The {...} syntax is for
                supporting different assembly dialects.

                What's basically happening is that we're saving and restoring the ebx register manually.
                */
                #if defined(DRFLAC_X86) && defined(__PIC__)
                    __asm__ __volatile__ (
                        "xchg{l} {%%}ebx, %k1;"
                        "cpuid;"
                        "xchg{l} {%%}ebx, %k1;"
                        : "=a"(info[0]), "=&r"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
                    );
                #else
                    __asm__ __volatile__ (
                        "cpuid" : "=a"(info[0]), "=b"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
                    );
                #endif
            }
        #else
            #define DRFLAC_NO_CPUID
        #endif
    #endif
#else
    #define DRFLAC_NO_CPUID
#endif

static DRFLAC_INLINE drflac_bool32 drflac_has_sse2(void)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE2)
        #if defined(DRFLAC_X64)
            return DRFLAC_TRUE;    /* 64-bit targets always support SSE2. */
        #elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) || defined(__SSE2__)
            return DRFLAC_TRUE;    /* If the compiler is allowed to freely generate SSE2 code we can assume support. */
        #else
            #if defined(DRFLAC_NO_CPUID)
                return DRFLAC_FALSE;
            #else
                int info[4];
                drflac__cpuid(info, 1);
                return (info[3] & (1 << 26)) != 0;
            #endif
        #endif
    #else
        return DRFLAC_FALSE;       /* SSE2 is only supported on x86 and x64 architectures. */
    #endif
#else
    return DRFLAC_FALSE;           /* No compiler support. */
#endif
}

static DRFLAC_INLINE drflac_bool32 drflac_has_sse41(void)
{
#if defined(DRFLAC_SUPPORT_SSE41)
    #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE41)
        #if defined(__SSE4_1__) || defined(__AVX__)
            return DRFLAC_TRUE;    /* If the compiler is allowed to freely generate SSE41 code we can assume support. */
        #else
            #if defined(DRFLAC_NO_CPUID)
                return DRFLAC_FALSE;
            #else
                int info[4];
                drflac__cpuid(info, 1);
                return (info[2] & (1 << 19)) != 0;
            #endif
        #endif
    #else
        return DRFLAC_FALSE;       /* SSE41 is only supported on x86 and x64 architectures. */
    #endif
#else
    return DRFLAC_FALSE;           /* No compiler support. */
#endif
}


#if defined(_MSC_VER) && _MSC_VER >= 1500 && (defined(DRFLAC_X86) || defined(DRFLAC_X64)) && !defined(__clang__)
    #define DRFLAC_HAS_LZCNT_INTRINSIC
#elif (defined(__GNUC__) && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
    #define DRFLAC_HAS_LZCNT_INTRINSIC
#elif defined(__clang__)
    #if defined(__has_builtin)
        #if __has_builtin(__builtin_clzll) || __has_builtin(__builtin_clzl)
            #define DRFLAC_HAS_LZCNT_INTRINSIC
        #endif
    #endif
#endif

#if defined(_MSC_VER) && _MSC_VER >= 1400 && !defined(__clang__)
    #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
    #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
    #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
#elif defined(__clang__)
    #if defined(__has_builtin)
        #if __has_builtin(__builtin_bswap16)
            #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
        #endif
        #if __has_builtin(__builtin_bswap32)
            #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
        #endif
        #if __has_builtin(__builtin_bswap64)
            #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
        #endif
    #endif
#elif defined(__GNUC__)
    #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
        #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
        #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
    #endif
    #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
        #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
    #endif
#elif defined(__WATCOMC__) && defined(__386__)
    #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
    #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
    #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
    extern __inline drflac_uint16 _watcom_bswap16(drflac_uint16);
    extern __inline drflac_uint32 _watcom_bswap32(drflac_uint32);
    extern __inline drflac_uint64 _watcom_bswap64(drflac_uint64);
#pragma aux _watcom_bswap16 = \
    "xchg al, ah" \
    parm [ax] \
    value [ax] \
    modify nomemory;
#pragma aux _watcom_bswap32 = \
    "bswap eax" \
    parm [eax] \
    value [eax] \
    modify nomemory;
#pragma aux _watcom_bswap64 = \
    "bswap eax" \
    "bswap edx" \
    "xchg eax,edx" \
    parm [eax edx] \
    value [eax edx] \
    modify nomemory;
#endif


/* Standard library stuff. */
#ifndef DRFLAC_ASSERT
#include <assert.h>
#define DRFLAC_ASSERT(expression)           assert(expression)
#endif
#ifndef DRFLAC_MALLOC
#define DRFLAC_MALLOC(sz)                   malloc((sz))
#endif
#ifndef DRFLAC_REALLOC
#define DRFLAC_REALLOC(p, sz)               realloc((p), (sz))
#endif
#ifndef DRFLAC_FREE
#define DRFLAC_FREE(p)                      free((p))
#endif
#ifndef DRFLAC_COPY_MEMORY
#define DRFLAC_COPY_MEMORY(dst, src, sz)    memcpy((dst), (src), (sz))
#endif
#ifndef DRFLAC_ZERO_MEMORY
#define DRFLAC_ZERO_MEMORY(p, sz)           memset((p), 0, (sz))
#endif
#ifndef DRFLAC_ZERO_OBJECT
#define DRFLAC_ZERO_OBJECT(p)               DRFLAC_ZERO_MEMORY((p), sizeof(*(p)))
#endif

#define DRFLAC_MAX_SIMD_VECTOR_SIZE         64  /* 64 for AVX-512 in the future. */

/* Result Codes */
typedef drflac_int32 drflac_result;
#define DRFLAC_SUCCESS                         0
#define DRFLAC_ERROR                          -1   /* A generic error. */
#define DRFLAC_INVALID_ARGS                   -2
#define DRFLAC_INVALID_OPERATION              -3
#define DRFLAC_OUT_OF_MEMORY                  -4
#define DRFLAC_OUT_OF_RANGE                   -5
#define DRFLAC_ACCESS_DENIED                  -6
#define DRFLAC_DOES_NOT_EXIST                 -7
#define DRFLAC_ALREADY_EXISTS                 -8
#define DRFLAC_TOO_MANY_OPEN_FILES            -9
#define DRFLAC_INVALID_FILE                   -10
#define DRFLAC_TOO_BIG                        -11
#define DRFLAC_PATH_TOO_LONG                  -12
#define DRFLAC_NAME_TOO_LONG                  -13
#define DRFLAC_NOT_DIRECTORY                  -14
#define DRFLAC_IS_DIRECTORY                   -15
#define DRFLAC_DIRECTORY_NOT_EMPTY            -16
#define DRFLAC_END_OF_FILE                    -17
#define DRFLAC_NO_SPACE                       -18
#define DRFLAC_BUSY                           -19
#define DRFLAC_IO_ERROR                       -20
#define DRFLAC_INTERRUPT                      -21
#define DRFLAC_UNAVAILABLE                    -22
#define DRFLAC_ALREADY_IN_USE                 -23
#define DRFLAC_BAD_ADDRESS                    -24
#define DRFLAC_BAD_SEEK                       -25
#define DRFLAC_BAD_PIPE                       -26
#define DRFLAC_DEADLOCK                       -27
#define DRFLAC_TOO_MANY_LINKS                 -28
#define DRFLAC_NOT_IMPLEMENTED                -29
#define DRFLAC_NO_MESSAGE                     -30
#define DRFLAC_BAD_MESSAGE                    -31
#define DRFLAC_NO_DATA_AVAILABLE              -32
#define DRFLAC_INVALID_DATA                   -33
#define DRFLAC_TIMEOUT                        -34
#define DRFLAC_NO_NETWORK                     -35
#define DRFLAC_NOT_UNIQUE                     -36
#define DRFLAC_NOT_SOCKET                     -37
#define DRFLAC_NO_ADDRESS                     -38
#define DRFLAC_BAD_PROTOCOL                   -39
#define DRFLAC_PROTOCOL_UNAVAILABLE           -40
#define DRFLAC_PROTOCOL_NOT_SUPPORTED         -41
#define DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED  -42
#define DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED   -43
#define DRFLAC_SOCKET_NOT_SUPPORTED           -44
#define DRFLAC_CONNECTION_RESET               -45
#define DRFLAC_ALREADY_CONNECTED              -46
#define DRFLAC_NOT_CONNECTED                  -47
#define DRFLAC_CONNECTION_REFUSED             -48
#define DRFLAC_NO_HOST                        -49
#define DRFLAC_IN_PROGRESS                    -50
#define DRFLAC_CANCELLED                      -51
#define DRFLAC_MEMORY_ALREADY_MAPPED          -52
#define DRFLAC_AT_END                         -53

#define DRFLAC_CRC_MISMATCH                   -100
/* End Result Codes */


#define DRFLAC_SUBFRAME_CONSTANT                        0
#define DRFLAC_SUBFRAME_VERBATIM                        1
#define DRFLAC_SUBFRAME_FIXED                           8
#define DRFLAC_SUBFRAME_LPC                             32
#define DRFLAC_SUBFRAME_RESERVED                        255

#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE  0
#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2 1

#define DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT           0
#define DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE             8
#define DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE            9
#define DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE              10

#define DRFLAC_SEEKPOINT_SIZE_IN_BYTES                  18
#define DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES             36
#define DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES       12

#define drflac_align(x, a)                              ((((x) + (a) - 1) / (a)) * (a))


DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision)
{
    if (pMajor) {
        *pMajor = DRFLAC_VERSION_MAJOR;
    }

    if (pMinor) {
        *pMinor = DRFLAC_VERSION_MINOR;
    }

    if (pRevision) {
        *pRevision = DRFLAC_VERSION_REVISION;
    }
}

DRFLAC_API const char* drflac_version_string(void)
{
    return DRFLAC_VERSION_STRING;
}


/* CPU caps. */
#if defined(__has_feature)
    #if __has_feature(thread_sanitizer)
        #define DRFLAC_NO_THREAD_SANITIZE __attribute__((no_sanitize("thread")))
    #else
        #define DRFLAC_NO_THREAD_SANITIZE
    #endif
#else
    #define DRFLAC_NO_THREAD_SANITIZE
#endif

#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
static drflac_bool32 drflac__gIsLZCNTSupported = DRFLAC_FALSE;
#endif

#ifndef DRFLAC_NO_CPUID
static drflac_bool32 drflac__gIsSSE2Supported = DRFLAC_FALSE;
static drflac_bool32 drflac__gIsSSE41Supported = DRFLAC_FALSE;

/*
I've had a bug report that Clang's ThreadSanitizer presents a warning in this function.
Having reviewed this, this does
actually make sense. However, since CPU caps should never differ for a running process, I don't think the trade-off of
complicating internal APIs by passing around CPU caps versus just disabling the warnings is worthwhile. I'm therefore
just going to disable these warnings. This is disabled via the DRFLAC_NO_THREAD_SANITIZE attribute.
*/
DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
{
    static drflac_bool32 isCPUCapsInitialized = DRFLAC_FALSE;

    if (!isCPUCapsInitialized) {
        /* LZCNT */
#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
        int info[4] = {0};
        drflac__cpuid(info, 0x80000001);
        drflac__gIsLZCNTSupported = (info[2] & (1 << 5)) != 0;
#endif

        /* SSE2 */
        drflac__gIsSSE2Supported = drflac_has_sse2();

        /* SSE4.1 */
        drflac__gIsSSE41Supported = drflac_has_sse41();

        /* Initialized. */
        isCPUCapsInitialized = DRFLAC_TRUE;
    }
}
#else
static drflac_bool32 drflac__gIsNEONSupported = DRFLAC_FALSE;

static DRFLAC_INLINE drflac_bool32 drflac__has_neon(void)
{
#if defined(DRFLAC_SUPPORT_NEON)
    #if defined(DRFLAC_ARM) && !defined(DRFLAC_NO_NEON)
        #if (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
            return DRFLAC_TRUE; /* If the compiler is allowed to freely generate NEON code we can assume support. */
        #else
            /* TODO: Runtime check. */
            return DRFLAC_FALSE;
        #endif
    #else
        return DRFLAC_FALSE; /* NEON is only supported on ARM architectures. */
    #endif
#else
    return DRFLAC_FALSE; /* No compiler support.
*/
#endif
}

DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
{
    drflac__gIsNEONSupported = drflac__has_neon();

#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
    drflac__gIsLZCNTSupported = DRFLAC_TRUE;
#endif
}
#endif


/* Endian Management */
static DRFLAC_INLINE drflac_bool32 drflac__is_little_endian(void)
{
#if defined(DRFLAC_X86) || defined(DRFLAC_X64)
    return DRFLAC_TRUE;
#elif defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && __BYTE_ORDER == __LITTLE_ENDIAN
    return DRFLAC_TRUE;
#else
    int n = 1;
    return (*(char*)&n) == 1;
#endif
}

static DRFLAC_INLINE drflac_uint16 drflac__swap_endian_uint16(drflac_uint16 n)
{
#ifdef DRFLAC_HAS_BYTESWAP16_INTRINSIC
    #if defined(_MSC_VER) && !defined(__clang__)
        return _byteswap_ushort(n);
    #elif defined(__GNUC__) || defined(__clang__)
        return __builtin_bswap16(n);
    #elif defined(__WATCOMC__) && defined(__386__)
        return _watcom_bswap16(n);
    #else
        #error "This compiler does not support the byte swap intrinsic."
    #endif
#else
    return ((n & 0xFF00) >> 8) |
           ((n & 0x00FF) << 8);
#endif
}

static DRFLAC_INLINE drflac_uint32 drflac__swap_endian_uint32(drflac_uint32 n)
{
#ifdef DRFLAC_HAS_BYTESWAP32_INTRINSIC
    #if defined(_MSC_VER) && !defined(__clang__)
        return _byteswap_ulong(n);
    #elif defined(__GNUC__) || defined(__clang__)
        #if defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 6) && !defined(__ARM_ARCH_6M__) && !defined(DRFLAC_64BIT) /* <-- 64-bit inline assembly has not been tested, so disabling for now. */
            /* Inline assembly optimized implementation for ARM. In my testing, GCC does not generate optimized code with __builtin_bswap32().
            */
            drflac_uint32 r;
            __asm__ __volatile__ (
            #if defined(DRFLAC_64BIT)
                "rev %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(n) /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
            #else
                "rev %[out], %[in]" : [out]"=r"(r) : [in]"r"(n)
            #endif
            );
            return r;
        #else
            return __builtin_bswap32(n);
        #endif
    #elif defined(__WATCOMC__) && defined(__386__)
        return _watcom_bswap32(n);
    #else
        #error "This compiler does not support the byte swap intrinsic."
    #endif
#else
    return ((n & 0xFF000000) >> 24) |
           ((n & 0x00FF0000) >>  8) |
           ((n & 0x0000FF00) <<  8) |
           ((n & 0x000000FF) << 24);
#endif
}

static DRFLAC_INLINE drflac_uint64 drflac__swap_endian_uint64(drflac_uint64 n)
{
#ifdef DRFLAC_HAS_BYTESWAP64_INTRINSIC
    #if defined(_MSC_VER) && !defined(__clang__)
        return _byteswap_uint64(n);
    #elif defined(__GNUC__) || defined(__clang__)
        return __builtin_bswap64(n);
    #elif defined(__WATCOMC__) && defined(__386__)
        return _watcom_bswap64(n);
    #else
        #error "This compiler does not support the byte swap intrinsic."
    #endif
#else
    /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler.
    */
    return ((n & ((drflac_uint64)0xFF000000 << 32)) >> 56) |
           ((n & ((drflac_uint64)0x00FF0000 << 32)) >> 40) |
           ((n & ((drflac_uint64)0x0000FF00 << 32)) >> 24) |
           ((n & ((drflac_uint64)0x000000FF << 32)) >>  8) |
           ((n & ((drflac_uint64)0xFF000000      )) <<  8) |
           ((n & ((drflac_uint64)0x00FF0000      )) << 24) |
           ((n & ((drflac_uint64)0x0000FF00      )) << 40) |
           ((n & ((drflac_uint64)0x000000FF      )) << 56);
#endif
}


static DRFLAC_INLINE drflac_uint16 drflac__be2host_16(drflac_uint16 n)
{
    if (drflac__is_little_endian()) {
        return drflac__swap_endian_uint16(n);
    }

    return n;
}

static DRFLAC_INLINE drflac_uint32 drflac__be2host_32(drflac_uint32 n)
{
    if (drflac__is_little_endian()) {
        return drflac__swap_endian_uint32(n);
    }

    return n;
}

static DRFLAC_INLINE drflac_uint32 drflac__be2host_32_ptr_unaligned(const void* pData)
{
    /* Each byte is widened to 32-bit unsigned before shifting so that a shift by 24 cannot overflow a signed int. */
    const drflac_uint8* pNum = (const drflac_uint8*)pData;
    return ((drflac_uint32)pNum[0] << 24) | ((drflac_uint32)pNum[1] << 16) | ((drflac_uint32)pNum[2] << 8) | (drflac_uint32)pNum[3];
}

static DRFLAC_INLINE drflac_uint64 drflac__be2host_64(drflac_uint64 n)
{
    if (drflac__is_little_endian()) {
        return drflac__swap_endian_uint64(n);
    }

    return n;
}


static DRFLAC_INLINE drflac_uint32 drflac__le2host_32(drflac_uint32 n)
{
    if (!drflac__is_little_endian()) {
        return drflac__swap_endian_uint32(n);
    }

    return n;
}

static DRFLAC_INLINE drflac_uint32 drflac__le2host_32_ptr_unaligned(const void* pData)
{
    /* Each byte is widened to 32-bit unsigned before shifting so that a shift by 24 cannot overflow a signed int. */
    const drflac_uint8* pNum = (const drflac_uint8*)pData;
    return (drflac_uint32)pNum[0] | ((drflac_uint32)pNum[1] << 8) | ((drflac_uint32)pNum[2] << 16) | ((drflac_uint32)pNum[3] << 24);
}


static DRFLAC_INLINE drflac_uint32 drflac__unsynchsafe_32(drflac_uint32 n)
{
    drflac_uint32 result = 0;
    result |= (n & 0x7F000000) >> 3;
    result |= (n & 0x007F0000) >> 2;
    result |= (n & 0x00007F00) >> 1;
    result |= (n & 0x0000007F) >> 0;

    return result;
}



/* The CRC code below is based on this document: http://zlib.net/crc_v3.txt */
static drflac_uint8 drflac__crc8_table[] = {
    0x00, 0x07, 0x0E, 0x09, 0x1C, 0x1B, 0x12, 0x15, 0x38, 0x3F, 0x36, 0x31, 0x24, 0x23, 0x2A, 0x2D,
    0x70, 0x77, 0x7E, 0x79, 0x6C, 0x6B, 0x62, 0x65, 0x48, 0x4F, 0x46, 0x41, 0x54, 0x53, 0x5A, 0x5D,
    0xE0, 0xE7, 0xEE, 0xE9, 0xFC, 0xFB, 0xF2, 0xF5, 0xD8, 0xDF, 0xD6, 0xD1, 0xC4, 0xC3, 0xCA, 0xCD,
    0x90, 0x97, 0x9E, 0x99, 0x8C, 0x8B, 0x82, 0x85, 0xA8, 0xAF, 0xA6, 0xA1, 0xB4, 0xB3, 0xBA, 0xBD,
    0xC7, 0xC0, 0xC9, 0xCE, 0xDB, 0xDC, 0xD5, 0xD2, 0xFF, 0xF8, 0xF1, 0xF6, 0xE3, 0xE4, 0xED, 0xEA,
    0xB7, 0xB0, 0xB9, 0xBE, 0xAB, 0xAC, 0xA5, 0xA2, 0x8F, 0x88, 0x81, 0x86, 0x93, 0x94, 0x9D, 0x9A,
    0x27, 0x20, 0x29, 0x2E, 0x3B, 0x3C, 0x35, 0x32, 0x1F, 0x18, 0x11, 0x16, 0x03, 0x04, 0x0D, 0x0A,
    0x57, 0x50, 0x59, 0x5E, 0x4B, 0x4C, 0x45, 0x42, 0x6F, 0x68, 0x61, 0x66, 0x73, 0x74, 0x7D, 0x7A,
    0x89, 0x8E, 0x87, 0x80, 0x95, 0x92, 0x9B, 0x9C, 0xB1, 0xB6, 0xBF, 0xB8, 0xAD, 0xAA, 0xA3, 0xA4,
    0xF9, 0xFE, 0xF7, 0xF0, 0xE5, 0xE2, 0xEB, 0xEC, 0xC1, 0xC6, 0xCF, 0xC8, 0xDD, 0xDA, 0xD3, 0xD4,
    0x69, 0x6E, 0x67, 0x60, 0x75, 0x72, 0x7B, 0x7C, 0x51, 0x56, 0x5F, 0x58, 0x4D, 0x4A, 0x43, 0x44,
    0x19, 0x1E, 0x17, 0x10, 0x05, 0x02, 0x0B, 0x0C, 0x21, 0x26, 0x2F, 0x28, 0x3D, 0x3A, 0x33, 0x34,
    0x4E, 0x49, 0x40, 0x47, 0x52, 0x55, 0x5C, 0x5B, 0x76, 0x71, 0x78, 0x7F, 0x6A, 0x6D, 0x64, 0x63,
    0x3E, 0x39, 0x30, 0x37, 0x22, 0x25, 0x2C, 0x2B, 0x06, 0x01, 0x08, 0x0F, 0x1A, 0x1D, 0x14, 0x13,
    0xAE, 0xA9, 0xA0, 0xA7, 0xB2, 0xB5, 0xBC, 0xBB, 0x96, 0x91, 0x98, 0x9F, 0x8A, 0x8D, 0x84, 0x83,
    0xDE, 0xD9, 0xD0, 0xD7, 0xC2, 0xC5, 0xCC, 0xCB, 0xE6, 0xE1, 0xE8, 0xEF, 0xFA, 0xFD, 0xF4, 0xF3
};

static drflac_uint16 drflac__crc16_table[] = {
    0x0000, 0x8005, 0x800F, 0x000A, 0x801B, 0x001E, 0x0014, 0x8011,
    0x8033, 0x0036, 0x003C, 0x8039,
    0x0028, 0x802D, 0x8027, 0x0022,
    0x8063, 0x0066, 0x006C, 0x8069, 0x0078, 0x807D, 0x8077, 0x0072,
    0x0050, 0x8055, 0x805F, 0x005A, 0x804B, 0x004E, 0x0044, 0x8041,
    0x80C3, 0x00C6, 0x00CC, 0x80C9, 0x00D8, 0x80DD, 0x80D7, 0x00D2,
    0x00F0, 0x80F5, 0x80FF, 0x00FA, 0x80EB, 0x00EE, 0x00E4, 0x80E1,
    0x00A0, 0x80A5, 0x80AF, 0x00AA, 0x80BB, 0x00BE, 0x00B4, 0x80B1,
    0x8093, 0x0096, 0x009C, 0x8099, 0x0088, 0x808D, 0x8087, 0x0082,
    0x8183, 0x0186, 0x018C, 0x8189, 0x0198, 0x819D, 0x8197, 0x0192,
    0x01B0, 0x81B5, 0x81BF, 0x01BA, 0x81AB, 0x01AE, 0x01A4, 0x81A1,
    0x01E0, 0x81E5, 0x81EF, 0x01EA, 0x81FB, 0x01FE, 0x01F4, 0x81F1,
    0x81D3, 0x01D6, 0x01DC, 0x81D9, 0x01C8, 0x81CD, 0x81C7, 0x01C2,
    0x0140, 0x8145, 0x814F, 0x014A, 0x815B, 0x015E, 0x0154, 0x8151,
    0x8173, 0x0176, 0x017C, 0x8179, 0x0168, 0x816D, 0x8167, 0x0162,
    0x8123, 0x0126, 0x012C, 0x8129, 0x0138, 0x813D, 0x8137, 0x0132,
    0x0110, 0x8115, 0x811F, 0x011A, 0x810B, 0x010E, 0x0104, 0x8101,
    0x8303, 0x0306, 0x030C, 0x8309, 0x0318, 0x831D, 0x8317, 0x0312,
    0x0330, 0x8335, 0x833F, 0x033A, 0x832B, 0x032E, 0x0324, 0x8321,
    0x0360, 0x8365, 0x836F, 0x036A, 0x837B, 0x037E, 0x0374, 0x8371,
    0x8353, 0x0356, 0x035C, 0x8359, 0x0348, 0x834D, 0x8347, 0x0342,
    0x03C0, 0x83C5, 0x83CF, 0x03CA, 0x83DB, 0x03DE, 0x03D4, 0x83D1,
    0x83F3, 0x03F6, 0x03FC, 0x83F9, 0x03E8, 0x83ED, 0x83E7, 0x03E2,
    0x83A3, 0x03A6, 0x03AC, 0x83A9, 0x03B8, 0x83BD, 0x83B7, 0x03B2,
    0x0390, 0x8395, 0x839F, 0x039A, 0x838B, 0x038E, 0x0384, 0x8381,
    0x0280, 0x8285, 0x828F, 0x028A, 0x829B, 0x029E, 0x0294, 0x8291,
    0x82B3, 0x02B6, 0x02BC, 0x82B9, 0x02A8, 0x82AD, 0x82A7, 0x02A2,
    0x82E3, 0x02E6, 0x02EC, 0x82E9, 0x02F8, 0x82FD, 0x82F7, 0x02F2,
    0x02D0, 0x82D5, 0x82DF, 0x02DA, 0x82CB, 0x02CE, 0x02C4, 0x82C1,
    0x8243, 0x0246, 0x024C, 0x8249, 0x0258, 0x825D, 0x8257, 0x0252,
    0x0270, 0x8275, 0x827F, 0x027A, 0x826B, 0x026E, 0x0264, 0x8261,
    0x0220, 0x8225, 0x822F,
    0x022A, 0x823B, 0x023E, 0x0234, 0x8231,
    0x8213, 0x0216, 0x021C, 0x8219, 0x0208, 0x820D, 0x8207, 0x0202
};

static DRFLAC_INLINE drflac_uint8 drflac_crc8_byte(drflac_uint8 crc, drflac_uint8 data)
{
    return drflac__crc8_table[crc ^ data];
}

static DRFLAC_INLINE drflac_uint8 drflac_crc8(drflac_uint8 crc, drflac_uint32 data, drflac_uint32 count)
{
#ifdef DR_FLAC_NO_CRC
    (void)crc;
    (void)data;
    (void)count;
    return 0;
#else
#if 0
    /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc8(crc, 0, 8);") */
    drflac_uint8 p = 0x07;
    for (int i = count-1; i >= 0; --i) {
        drflac_uint8 bit = (data & (1 << i)) >> i;
        if (crc & 0x80) {
            crc = ((crc << 1) | bit) ^ p;
        } else {
            crc = ((crc << 1) | bit);
        }
    }
    return crc;
#else
    drflac_uint32 wholeBytes;
    drflac_uint32 leftoverBits;
    drflac_uint64 leftoverDataMask;

    static drflac_uint64 leftoverDataMaskTable[8] = {
        0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
    };

    DRFLAC_ASSERT(count <= 32);

    wholeBytes = count >> 3;
    leftoverBits = count - (wholeBytes*8);
    leftoverDataMask = leftoverDataMaskTable[leftoverBits];

    switch (wholeBytes) {
        case 4: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
        case 3: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
        case 2: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
        case 1: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
        case 0: if (leftoverBits > 0) crc = (drflac_uint8)((crc << leftoverBits) ^ drflac__crc8_table[(crc >> (8 - leftoverBits)) ^ (data & leftoverDataMask)]);
    }
    return crc;
#endif
#endif
}

static DRFLAC_INLINE drflac_uint16 drflac_crc16_byte(drflac_uint16 crc, drflac_uint8 data)
{
    return (crc << 8) ^ drflac__crc16_table[(drflac_uint8)(crc >> 8) ^ data];
}

static DRFLAC_INLINE drflac_uint16 drflac_crc16_cache(drflac_uint16 crc, drflac_cache_t data)
{
#ifdef DRFLAC_64BIT
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
#endif
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  8) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  0) & 0xFF));

    return crc;
}

static DRFLAC_INLINE drflac_uint16 drflac_crc16_bytes(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 byteCount)
{
    switch (byteCount)
    {
#ifdef DRFLAC_64BIT
    case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
    case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
    case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
    case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
#endif
    case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
    case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
    case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  8) & 0xFF));
    case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  0) & 0xFF));
    }

    return crc;
}

#if 0
static DRFLAC_INLINE drflac_uint16 drflac_crc16__32bit(drflac_uint16 crc, drflac_uint32 data, drflac_uint32 count)
{
#ifdef DR_FLAC_NO_CRC
    (void)crc;
    (void)data;
    (void)count;
    return 0;
#else
#if 0
    /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc16(crc, 0, 16);") */
    drflac_uint16 p = 0x8005;
    for (int i = count-1; i >= 0; --i) {
        drflac_uint16 bit = (data & (1ULL << i)) >> i;
        if (crc & 0x8000) {
            crc = ((crc << 1) | bit) ^ p;
        } else {
            crc = ((crc << 1) | bit);
        }
    }

    return crc;
#else
    drflac_uint32 wholeBytes;
    drflac_uint32 leftoverBits;
    drflac_uint64 leftoverDataMask;

    static drflac_uint64 leftoverDataMaskTable[8] = {
        0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
    };

    DRFLAC_ASSERT(count <= 64);

    wholeBytes = count >> 3;
    leftoverBits = count & 7;
    leftoverDataMask = leftoverDataMaskTable[leftoverBits];

    switch (wholeBytes) {
        default:
        case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
        case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
        case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
        case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
        case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
    }
    return crc;
#endif
#endif
}

static DRFLAC_INLINE drflac_uint16 drflac_crc16__64bit(drflac_uint16 crc, drflac_uint64 data, drflac_uint32 count)
{
#ifdef DR_FLAC_NO_CRC
    (void)crc;
    (void)data;
    (void)count;
    return 0;
#else
    drflac_uint32 wholeBytes;
    drflac_uint32 leftoverBits;
    drflac_uint64 leftoverDataMask;

    static drflac_uint64 leftoverDataMaskTable[8]
    = {
        0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
    };

    DRFLAC_ASSERT(count <= 64);

    wholeBytes = count >> 3;
    leftoverBits = count & 7;
    leftoverDataMask = leftoverDataMaskTable[leftoverBits];

    switch (wholeBytes) {
        default:
        case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 << 32) << leftoverBits)) >> (56 + leftoverBits))); /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
        case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 << 32) << leftoverBits)) >> (48 + leftoverBits)));
        case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 << 32) << leftoverBits)) >> (40 + leftoverBits)));
        case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF << 32) << leftoverBits)) >> (32 + leftoverBits)));
        case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000      ) << leftoverBits)) >> (24 + leftoverBits)));
        case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000      ) << leftoverBits)) >> (16 + leftoverBits)));
        case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00      ) << leftoverBits)) >> ( 8 + leftoverBits)));
        case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF      ) << leftoverBits)) >> ( 0 + leftoverBits)));
        case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
    }
    return crc;
#endif
}


static DRFLAC_INLINE drflac_uint16 drflac_crc16(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 count)
{
#ifdef DRFLAC_64BIT
    return drflac_crc16__64bit(crc, data, count);
#else
    return drflac_crc16__32bit(crc, data,
        count);
#endif
}
#endif


#ifdef DRFLAC_64BIT
#define drflac__be2host__cache_line drflac__be2host_64
#else
#define drflac__be2host__cache_line drflac__be2host_32
#endif

/*
BIT READING ATTEMPT #2

This uses a 32- or 64-bit bit-shifted cache - as bits are read, the cache is shifted such that the first valid bit is sitting
on the most significant bit. It uses the notion of an L1 and L2 cache (borrowed from CPU architecture), where the L1 cache
is a 32- or 64-bit unsigned integer (depending on whether or not a 32- or 64-bit build is being compiled) and the L2 is an
array of "cache lines", with each cache line being the same size as the L1. The L2 is a buffer of about 4KB and is where data
from onRead() is read into.
*/
#define DRFLAC_CACHE_L1_SIZE_BYTES(bs) (sizeof((bs)->cache))
#define DRFLAC_CACHE_L1_SIZE_BITS(bs) (sizeof((bs)->cache)*8)
#define DRFLAC_CACHE_L1_BITS_REMAINING(bs) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (bs)->consumedBits)
#define DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount) (~((~(drflac_cache_t)0) >> (_bitCount)))
#define DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (_bitCount))
#define DRFLAC_CACHE_L1_SELECT(bs, _bitCount) (((bs)->cache) & DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount))
#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)))
#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, _bitCount) (DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> (DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)) & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1)))
#define DRFLAC_CACHE_L2_SIZE_BYTES(bs) (sizeof((bs)->cacheL2))
#define DRFLAC_CACHE_L2_LINE_COUNT(bs) (DRFLAC_CACHE_L2_SIZE_BYTES(bs) / sizeof((bs)->cacheL2[0]))
#define DRFLAC_CACHE_L2_LINES_REMAINING(bs) (DRFLAC_CACHE_L2_LINE_COUNT(bs) - (bs)->nextL2Line)


#ifndef DR_FLAC_NO_CRC
static DRFLAC_INLINE void drflac__reset_crc16(drflac_bs* bs)
{
    bs->crc16 = 0;
    bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
}

static DRFLAC_INLINE void drflac__update_crc16(drflac_bs* bs)
{
    if (bs->crc16CacheIgnoredBytes == 0) {
        bs->crc16 = drflac_crc16_cache(bs->crc16, bs->crc16Cache);
    } else {
        bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache, DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bs->crc16CacheIgnoredBytes);
        bs->crc16CacheIgnoredBytes = 0;
    }
}

static DRFLAC_INLINE drflac_uint16 drflac__flush_crc16(drflac_bs* bs)
{
    /* We should never be flushing in a situation where we are not aligned on a byte boundary. */
    DRFLAC_ASSERT((DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7) == 0);

    /*
    The bits that were read from the L1 cache need to be accumulated. The number of bytes needing to be accumulated is determined
    by the number of bits that have been consumed.
    */
    if (DRFLAC_CACHE_L1_BITS_REMAINING(bs) == 0) {
        drflac__update_crc16(bs);
    } else {
        /* We only accumulate the consumed bits. */
        bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache >> DRFLAC_CACHE_L1_BITS_REMAINING(bs), (bs->consumedBits >> 3) - bs->crc16CacheIgnoredBytes);

        /*
        The bits that we just accumulated should never be accumulated again. We need to keep track of how many bytes were accumulated
        so we can handle that later.
        */
        bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
    }

    return bs->crc16;
}
#endif

static DRFLAC_INLINE drflac_bool32 drflac__reload_l1_cache_from_l2(drflac_bs* bs)
{
    size_t bytesRead;
    size_t alignedL1LineCount;

    /* Fast path. Try loading straight from L2.
    */
    if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
        bs->cache = bs->cacheL2[bs->nextL2Line++];
        return DRFLAC_TRUE;
    }

    /*
    If we get here it means we've run out of data in the L2 cache. We'll need to fetch more from the client, if there's
    any left.
    */
    if (bs->unalignedByteCount > 0) {
        return DRFLAC_FALSE; /* If we have any unaligned bytes it means there are no more aligned bytes left in the client. */
    }

    bytesRead = bs->onRead(bs->pUserData, bs->cacheL2, DRFLAC_CACHE_L2_SIZE_BYTES(bs));

    bs->nextL2Line = 0;
    if (bytesRead == DRFLAC_CACHE_L2_SIZE_BYTES(bs)) {
        bs->cache = bs->cacheL2[bs->nextL2Line++];
        return DRFLAC_TRUE;
    }


    /*
    If we get here it means we were unable to retrieve enough data to fill the entire L2 cache. It probably
    means we've just reached the end of the file. We need to move the valid data down to the end of the buffer
    and adjust the index of the next line accordingly. Also keep in mind that the L2 cache must be aligned to
    the size of the L1 so we'll need to seek backwards by any misaligned bytes.
    */
    alignedL1LineCount = bytesRead / DRFLAC_CACHE_L1_SIZE_BYTES(bs);

    /* We need to keep track of any unaligned bytes for later use. */
    bs->unalignedByteCount = bytesRead - (alignedL1LineCount * DRFLAC_CACHE_L1_SIZE_BYTES(bs));
    if (bs->unalignedByteCount > 0) {
        bs->unalignedCache = bs->cacheL2[alignedL1LineCount];
    }

    if (alignedL1LineCount > 0) {
        size_t offset = DRFLAC_CACHE_L2_LINE_COUNT(bs) - alignedL1LineCount;
        size_t i;
        for (i = alignedL1LineCount; i > 0; --i) {
            bs->cacheL2[i-1 + offset] = bs->cacheL2[i-1];
        }

        bs->nextL2Line = (drflac_uint32)offset;
        bs->cache = bs->cacheL2[bs->nextL2Line++];
        return DRFLAC_TRUE;
    } else {
        /* If we get into this branch it means we weren't able to load any L1-aligned data.
        */
        bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs);
        return DRFLAC_FALSE;
    }
}

static drflac_bool32 drflac__reload_cache(drflac_bs* bs)
{
    size_t bytesRead;

#ifndef DR_FLAC_NO_CRC
    drflac__update_crc16(bs);
#endif

    /* Fast path. Try just moving the next value in the L2 cache to the L1 cache. */
    if (drflac__reload_l1_cache_from_l2(bs)) {
        bs->cache = drflac__be2host__cache_line(bs->cache);
        bs->consumedBits = 0;
#ifndef DR_FLAC_NO_CRC
        bs->crc16Cache = bs->cache;
#endif
        return DRFLAC_TRUE;
    }

    /* Slow path. */

    /*
    If we get here it means we have failed to load the L1 cache from the L2. Likely we've just reached the end of the stream and the last
    few bytes did not meet the alignment requirements for the L2 cache. In this case we need to fall back to a slower path and read the
    data from the unaligned cache.
    */
    bytesRead = bs->unalignedByteCount;
    if (bytesRead == 0) {
        bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs); /* <-- The stream has been exhausted, so mark the bits as consumed. */
        return DRFLAC_FALSE;
    }

    DRFLAC_ASSERT(bytesRead < DRFLAC_CACHE_L1_SIZE_BYTES(bs));
    bs->consumedBits = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bytesRead) * 8;

    bs->cache = drflac__be2host__cache_line(bs->unalignedCache);
    bs->cache &= DRFLAC_CACHE_L1_SELECTION_MASK(DRFLAC_CACHE_L1_BITS_REMAINING(bs)); /* <-- Make sure the consumed bits are always set to zero. Other parts of the library depend on this property. */
    bs->unalignedByteCount = 0; /* <-- At this point the unaligned bytes have been moved into the cache and we thus have no more unaligned bytes.
    */

#ifndef DR_FLAC_NO_CRC
    bs->crc16Cache = bs->cache >> bs->consumedBits;
    bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
#endif
    return DRFLAC_TRUE;
}

static void drflac__reset_cache(drflac_bs* bs)
{
    bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs); /* <-- This clears the L2 cache. */
    bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs); /* <-- This clears the L1 cache. */
    bs->cache = 0;
    bs->unalignedByteCount = 0; /* <-- This clears the trailing unaligned bytes. */
    bs->unalignedCache = 0;

#ifndef DR_FLAC_NO_CRC
    bs->crc16Cache = 0;
    bs->crc16CacheIgnoredBytes = 0;
#endif
}


static DRFLAC_INLINE drflac_bool32 drflac__read_uint32(drflac_bs* bs, unsigned int bitCount, drflac_uint32* pResultOut)
{
    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResultOut != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 32);

    if (bs->consumedBits == DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
        if (!drflac__reload_cache(bs)) {
            return DRFLAC_FALSE;
        }
    }

    if (bitCount <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
        /*
        If we want to load all 32 bits from a 32-bit cache we need to do it slightly differently because we can't do
        a 32-bit shift on a 32-bit integer. This will never be the case on 64-bit caches, so we can have a slightly
        more optimal solution for this.
        */
#ifdef DRFLAC_64BIT
        *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
        bs->consumedBits += bitCount;
        bs->cache <<= bitCount;
#else
        if (bitCount < DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
            *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
            bs->consumedBits += bitCount;
            bs->cache <<= bitCount;
        } else {
            /* Cannot shift by 32 bits, so need to do it differently.
            */
            *pResultOut = (drflac_uint32)bs->cache;
            bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);
            bs->cache = 0;
        }
#endif

        return DRFLAC_TRUE;
    } else {
        /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
        drflac_uint32 bitCountHi = DRFLAC_CACHE_L1_BITS_REMAINING(bs);
        drflac_uint32 bitCountLo = bitCount - bitCountHi;
        drflac_uint32 resultHi;

        DRFLAC_ASSERT(bitCountHi > 0);
        DRFLAC_ASSERT(bitCountHi < 32);
        resultHi = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountHi);

        if (!drflac__reload_cache(bs)) {
            return DRFLAC_FALSE;
        }
        if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
            /* This happens when we get to end of stream */
            return DRFLAC_FALSE;
        }

        *pResultOut = (resultHi << bitCountLo) | (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountLo);
        bs->consumedBits += bitCountLo;
        bs->cache <<= bitCountLo;
        return DRFLAC_TRUE;
    }
}

static drflac_bool32 drflac__read_int32(drflac_bs* bs, unsigned int bitCount, drflac_int32* pResult)
{
    drflac_uint32 result;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResult != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 32);

    if (!drflac__read_uint32(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    /* Do not attempt to shift by 32 as it's undefined.
    */
    if (bitCount < 32) {
        drflac_uint32 signbit;
        signbit = ((result >> (bitCount-1)) & 0x01);
        result |= (~signbit + 1) << bitCount;
    }

    *pResult = (drflac_int32)result;
    return DRFLAC_TRUE;
}

#ifdef DRFLAC_64BIT
static drflac_bool32 drflac__read_uint64(drflac_bs* bs, unsigned int bitCount, drflac_uint64* pResultOut)
{
    drflac_uint32 resultHi;
    drflac_uint32 resultLo;

    DRFLAC_ASSERT(bitCount <= 64);
    DRFLAC_ASSERT(bitCount > 32);

    if (!drflac__read_uint32(bs, bitCount - 32, &resultHi)) {
        return DRFLAC_FALSE;
    }

    if (!drflac__read_uint32(bs, 32, &resultLo)) {
        return DRFLAC_FALSE;
    }

    *pResultOut = (((drflac_uint64)resultHi) << 32) | ((drflac_uint64)resultLo);
    return DRFLAC_TRUE;
}
#endif

/* Function below is unused, but leaving it here in case I need to quickly add it again. */
#if 0
static drflac_bool32 drflac__read_int64(drflac_bs* bs, unsigned int bitCount, drflac_int64* pResultOut)
{
    drflac_uint64 result;
    drflac_uint64 signbit;

    DRFLAC_ASSERT(bitCount <= 64);

    if (!drflac__read_uint64(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    signbit = ((result >> (bitCount-1)) & 0x01);
    result |= (~signbit + 1) << bitCount;

    *pResultOut = (drflac_int64)result;
    return DRFLAC_TRUE;
}
#endif

static drflac_bool32 drflac__read_uint16(drflac_bs* bs, unsigned int bitCount, drflac_uint16* pResult)
{
    drflac_uint32 result;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResult != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 16);

    if (!drflac__read_uint32(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    *pResult = (drflac_uint16)result;
    return DRFLAC_TRUE;
}

#if 0
static drflac_bool32 drflac__read_int16(drflac_bs* bs, unsigned int
    bitCount, drflac_int16* pResult)
{
    drflac_int32 result;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResult != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 16);

    if (!drflac__read_int32(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    *pResult = (drflac_int16)result;
    return DRFLAC_TRUE;
}
#endif

static drflac_bool32 drflac__read_uint8(drflac_bs* bs, unsigned int bitCount, drflac_uint8* pResult)
{
    drflac_uint32 result;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResult != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 8);

    if (!drflac__read_uint32(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    *pResult = (drflac_uint8)result;
    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__read_int8(drflac_bs* bs, unsigned int bitCount, drflac_int8* pResult)
{
    drflac_int32 result;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResult != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 8);

    if (!drflac__read_int32(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    *pResult = (drflac_int8)result;
    return DRFLAC_TRUE;
}


static drflac_bool32 drflac__seek_bits(drflac_bs* bs, size_t bitsToSeek)
{
    if (bitsToSeek <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
        bs->consumedBits += (drflac_uint32)bitsToSeek;
        bs->cache <<= bitsToSeek;
        return DRFLAC_TRUE;
    } else {
        /* It straddles the cached data. This function isn't called too frequently so I'm favouring simplicity here. */
        bitsToSeek -= DRFLAC_CACHE_L1_BITS_REMAINING(bs);
        bs->consumedBits += DRFLAC_CACHE_L1_BITS_REMAINING(bs);
        bs->cache = 0;

        /* Simple case. Seek in groups of as many bits as fit within a cache line.
        */
#ifdef DRFLAC_64BIT
        while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
            drflac_uint64 bin;
            if (!drflac__read_uint64(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
                return DRFLAC_FALSE;
            }
            bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
        }
#else
        while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
            drflac_uint32 bin;
            if (!drflac__read_uint32(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
                return DRFLAC_FALSE;
            }
            bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
        }
#endif

        /* Whole leftover bytes. */
        while (bitsToSeek >= 8) {
            drflac_uint8 bin;
            if (!drflac__read_uint8(bs, 8, &bin)) {
                return DRFLAC_FALSE;
            }
            bitsToSeek -= 8;
        }

        /* Leftover bits. */
        if (bitsToSeek > 0) {
            drflac_uint8 bin;
            if (!drflac__read_uint8(bs, (drflac_uint32)bitsToSeek, &bin)) {
                return DRFLAC_FALSE;
            }
            bitsToSeek = 0; /* <-- Necessary for the assert below. */
        }

        DRFLAC_ASSERT(bitsToSeek == 0);
        return DRFLAC_TRUE;
    }
}


/* This function moves the bit streamer to the first bit after the sync code (bit 15 of the frame header). It will also update the CRC-16. */
static drflac_bool32 drflac__find_and_seek_to_next_sync_code(drflac_bs* bs)
{
    DRFLAC_ASSERT(bs != NULL);

    /*
    The sync code is always aligned to 8 bits. This is convenient for us because it means we can do byte-aligned movements. The first
    thing to do is align to the next byte.
    */
    if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
        return DRFLAC_FALSE;
    }

    for (;;) {
        drflac_uint8 hi;

#ifndef DR_FLAC_NO_CRC
        drflac__reset_crc16(bs);
#endif

        if (!drflac__read_uint8(bs, 8, &hi)) {
            return DRFLAC_FALSE;
        }

        if (hi == 0xFF) {
            drflac_uint8 lo;
            if (!drflac__read_uint8(bs, 6, &lo)) {
                return DRFLAC_FALSE;
            }

            if (lo == 0x3E) {
                return DRFLAC_TRUE;
            } else {
                if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
                    return DRFLAC_FALSE;
                }
            }
        }
    }

    /* Should never get here. */
    /*return DRFLAC_FALSE;*/
}


#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
    #define DRFLAC_IMPLEMENT_CLZ_LZCNT
#endif
#if defined(_MSC_VER) && _MSC_VER >= 1400 && (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(__clang__)
    #define DRFLAC_IMPLEMENT_CLZ_MSVC
#endif
#if defined(__WATCOMC__) && defined(__386__)
    #define DRFLAC_IMPLEMENT_CLZ_WATCOM
#endif
#ifdef __MRC__
    #include <intrinsics.h>
    #define DRFLAC_IMPLEMENT_CLZ_MRC
#endif

static DRFLAC_INLINE drflac_uint32 drflac__clz_software(drflac_cache_t x)
{
    drflac_uint32 n;
    static drflac_uint32 clz_table_4[] = {
        0,
        4,
        3, 3,
        2, 2, 2, 2,
        1, 1, 1, 1, 1, 1, 1, 1
    };

    if (x == 0) {
        return sizeof(x)*8;
    }

    n = clz_table_4[x >> (sizeof(x)*8 - 4)];
    if (n == 0) {
#ifdef DRFLAC_64BIT
        if ((x & ((drflac_uint64)0xFFFFFFFF << 32)) == 0) { n = 32; x <<= 32; }
        if ((x & ((drflac_uint64)0xFFFF0000 << 32)) == 0) { n += 16; x <<= 16; }
        if ((x & ((drflac_uint64)0xFF000000 << 32)) == 0) { n += 8; x <<= 8; }
        if ((x & ((drflac_uint64)0xF0000000 << 32)) == 0) { n += 4; x <<= 4; }
#else
        if ((x & 0xFFFF0000) == 0) { n = 16; x <<= 16; }
        if ((x & 0xFF000000) == 0) { n += 8; x
<<= 8; }
        if ((x & 0xF0000000) == 0) { n += 4; x <<= 4; }
#endif
        n += clz_table_4[x >> (sizeof(x)*8 - 4)];
    }

    return n - 1;
}

#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
static DRFLAC_INLINE drflac_bool32 drflac__is_lzcnt_supported(void)
{
    /* Fast compile time check for ARM. */
#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
    return DRFLAC_TRUE;
#elif defined(__MRC__)
    return DRFLAC_TRUE;
#else
    /* If the compiler itself does not support the intrinsic then we'll need to return false. */
    #ifdef DRFLAC_HAS_LZCNT_INTRINSIC
        return drflac__gIsLZCNTSupported;
    #else
        return DRFLAC_FALSE;
    #endif
#endif
}

static DRFLAC_INLINE drflac_uint32 drflac__clz_lzcnt(drflac_cache_t x)
{
    /*
    It's critical for competitive decoding performance that this function be highly optimal. With MSVC we can use the __lzcnt64() and __lzcnt() intrinsics
    to achieve good performance, however on GCC and Clang it's a little bit more annoying. The __builtin_clzl() and __builtin_clzll() intrinsics leave
    it undefined as to the return value when `x` is 0. We need this to be well defined as returning 32 or 64, depending on whether or not it's a 32- or
    64-bit build. To work around this we would need to add a conditional to check for the x = 0 case, but this creates unnecessary inefficiency. To work
    around this problem I have written some inline assembly to emit the LZCNT (x86) or CLZ (ARM) instruction directly which removes the need to include
    the conditional. This has worked well in the past, but for some reason Clang's MSVC compatible driver, clang-cl, does not seem to be handling this
    in the same way as the normal Clang driver. It seems that `clang-cl` is just outputting the wrong results sometimes, maybe due to some register
    getting clobbered?

    I'm not sure if this is a bug with dr_flac's inlined assembly (most likely), a bug in `clang-cl` or just a misunderstanding on my part with inline
    assembly rules for `clang-cl`. If somebody can identify an error in dr_flac's inlined assembly I'm happy to get that fixed.

    Fortunately there is an easy workaround for this. Clang implements MSVC-specific intrinsics for compatibility. It also defines _MSC_VER for extra
    compatibility. We can therefore just check for _MSC_VER and use the MSVC intrinsic which, fortunately for us, Clang supports. It would still be nice
    to know how to fix the inlined assembly for correctness' sake, however.
    */

#if defined(_MSC_VER) /*&& !defined(__clang__)*/    /* <-- Intentionally wanting Clang to use the MSVC __lzcnt64/__lzcnt intrinsics due to above ^. */
    #ifdef DRFLAC_64BIT
        return (drflac_uint32)__lzcnt64(x);
    #else
        return (drflac_uint32)__lzcnt(x);
    #endif
#else
    #if defined(__GNUC__) || defined(__clang__)
        #if defined(DRFLAC_X64)
            {
                drflac_uint64 r;
                __asm__ __volatile__ (
                    "lzcnt{ %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
                );

                return (drflac_uint32)r;
            }
        #elif defined(DRFLAC_X86)
            {
                drflac_uint32 r;
                __asm__ __volatile__ (
                    "lzcnt{l %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
                );

                return r;
            }
        #elif defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5) && !defined(__ARM_ARCH_6M__) && !defined(DRFLAC_64BIT)   /* <-- I haven't tested 64-bit inline assembly, so only enabling this for the 32-bit build for now. */
            {
                unsigned int r;
                __asm__ __volatile__ (
                #if defined(DRFLAC_64BIT)
                    "clz %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(x)   /* <-- This is untested. If someone in the community could test this, that would be appreciated!
                    */
                #else
                    "clz %[out], %[in]" : [out]"=r"(r) : [in]"r"(x)
                #endif
                );

                return r;
            }
        #else
            if (x == 0) {
                return sizeof(x)*8;
            }
            #ifdef DRFLAC_64BIT
                return (drflac_uint32)__builtin_clzll((drflac_uint64)x);
            #else
                return (drflac_uint32)__builtin_clzl((drflac_uint32)x);
            #endif
        #endif
    #else
        /* Unsupported compiler. */
        #error "This compiler does not support the lzcnt intrinsic."
    #endif
#endif
}
#endif

#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
#include <intrin.h> /* For BitScanReverse(). */

static DRFLAC_INLINE drflac_uint32 drflac__clz_msvc(drflac_cache_t x)
{
    drflac_uint32 n;

    if (x == 0) {
        return sizeof(x)*8;
    }

#ifdef DRFLAC_64BIT
    _BitScanReverse64((unsigned long*)&n, x);
#else
    _BitScanReverse((unsigned long*)&n, x);
#endif
    return sizeof(x)*8 - n - 1;
}
#endif

#ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM
static __inline drflac_uint32 drflac__clz_watcom (drflac_uint32);
#ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT
/* Use the LZCNT instruction (only available on some processors since the 2010s). */
#pragma aux drflac__clz_watcom_lzcnt = \
    "db 0F3h, 0Fh, 0BDh, 0C0h" /* lzcnt eax, eax */ \
    parm [eax] \
    value [eax] \
    modify nomemory;
#else
/* Use the 386+-compatible implementation.
*/
#pragma aux drflac__clz_watcom = \
    "bsr eax, eax" \
    "xor eax, 31" \
    parm [eax] nomemory \
    value [eax] \
    modify exact [eax] nomemory;
#endif
#endif

static DRFLAC_INLINE drflac_uint32 drflac__clz(drflac_cache_t x)
{
#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
    if (drflac__is_lzcnt_supported()) {
        return drflac__clz_lzcnt(x);
    } else
#endif
    {
#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
        return drflac__clz_msvc(x);
#elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT)
        return drflac__clz_watcom_lzcnt(x);
#elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM)
        return (x == 0) ? sizeof(x)*8 : drflac__clz_watcom(x);
#elif defined(__MRC__)
        return __cntlzw(x);
#else
        return drflac__clz_software(x);
#endif
    }
}


static DRFLAC_INLINE drflac_bool32 drflac__seek_past_next_set_bit(drflac_bs* bs, unsigned int* pOffsetOut)
{
    drflac_uint32 zeroCounter = 0;
    drflac_uint32 setBitOffsetPlus1;

    while (bs->cache == 0) {
        zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
        if (!drflac__reload_cache(bs)) {
            return DRFLAC_FALSE;
        }
    }

    if (bs->cache == 1) {
        /* Not catching this would lead to undefined behaviour: a shift of a 32-bit number by 32 or more is undefined */
        *pOffsetOut = zeroCounter + (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs) - 1;
        if (!drflac__reload_cache(bs)) {
            return DRFLAC_FALSE;
        }

        return DRFLAC_TRUE;
    }

    setBitOffsetPlus1 = drflac__clz(bs->cache);
    setBitOffsetPlus1 += 1;

    if (setBitOffsetPlus1 > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
        /* This happens when we get to end of stream */
        return DRFLAC_FALSE;
    }

    bs->consumedBits += setBitOffsetPlus1;
    bs->cache <<= setBitOffsetPlus1;

    *pOffsetOut = zeroCounter + setBitOffsetPlus1 - 1;
    return DRFLAC_TRUE;
}



static
drflac_bool32 drflac__seek_to_byte(drflac_bs* bs, drflac_uint64 offsetFromStart)
{
    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(offsetFromStart > 0);

    /*
    Seeking from the start is not quite as trivial as it sounds because the onSeek callback takes a signed 32-bit integer (which
    is intentional because it simplifies the implementation of the onSeek callbacks), however offsetFromStart is unsigned 64-bit.
    To resolve we just need to do an initial seek from the start, and then a series of offset seeks to make up the remainder.
    */
    if (offsetFromStart > 0x7FFFFFFF) {
        drflac_uint64 bytesRemaining = offsetFromStart;
        if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
            return DRFLAC_FALSE;
        }
        bytesRemaining -= 0x7FFFFFFF;

        while (bytesRemaining > 0x7FFFFFFF) {
            if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
                return DRFLAC_FALSE;
            }
            bytesRemaining -= 0x7FFFFFFF;
        }

        if (bytesRemaining > 0) {
            if (!bs->onSeek(bs->pUserData, (int)bytesRemaining, drflac_seek_origin_current)) {
                return DRFLAC_FALSE;
            }
        }
    } else {
        if (!bs->onSeek(bs->pUserData, (int)offsetFromStart, drflac_seek_origin_start)) {
            return DRFLAC_FALSE;
        }
    }

    /* The cache should be reset to force a reload of fresh data from the client.
    */
    drflac__reset_cache(bs);
    return DRFLAC_TRUE;
}


static drflac_result drflac__read_utf8_coded_number(drflac_bs* bs, drflac_uint64* pNumberOut, drflac_uint8* pCRCOut)
{
    drflac_uint8 crc;
    drflac_uint64 result;
    drflac_uint8 utf8[7] = {0};
    int byteCount;
    int i;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pNumberOut != NULL);
    DRFLAC_ASSERT(pCRCOut != NULL);

    crc = *pCRCOut;

    if (!drflac__read_uint8(bs, 8, utf8)) {
        *pNumberOut = 0;
        return DRFLAC_AT_END;
    }
    crc = drflac_crc8(crc, utf8[0], 8);

    if ((utf8[0] & 0x80) == 0) {
        *pNumberOut = utf8[0];
        *pCRCOut = crc;
        return DRFLAC_SUCCESS;
    }

    /*byteCount = 1;*/
    if ((utf8[0] & 0xE0) == 0xC0) {
        byteCount = 2;
    } else if ((utf8[0] & 0xF0) == 0xE0) {
        byteCount = 3;
    } else if ((utf8[0] & 0xF8) == 0xF0) {
        byteCount = 4;
    } else if ((utf8[0] & 0xFC) == 0xF8) {
        byteCount = 5;
    } else if ((utf8[0] & 0xFE) == 0xFC) {
        byteCount = 6;
    } else if ((utf8[0] & 0xFF) == 0xFE) {
        byteCount = 7;
    } else {
        *pNumberOut = 0;
        return DRFLAC_CRC_MISMATCH;     /* Bad UTF-8 encoding. */
    }

    /* Read extra bytes. */
    DRFLAC_ASSERT(byteCount > 1);

    result = (drflac_uint64)(utf8[0] & (0xFF >> (byteCount + 1)));
    for (i = 1; i < byteCount; ++i) {
        if (!drflac__read_uint8(bs, 8, utf8 + i)) {
            *pNumberOut = 0;
            return DRFLAC_AT_END;
        }
        crc = drflac_crc8(crc, utf8[i], 8);

        result = (result << 6) | (utf8[i] & 0x3F);
    }

    *pNumberOut = result;
    *pCRCOut = crc;
    return DRFLAC_SUCCESS;
}


static DRFLAC_INLINE drflac_uint32 drflac__ilog2_u32(drflac_uint32 x)
{
#if 1   /* Needs optimizing.
        */
    drflac_uint32 result = 0;
    while (x > 0) {
        result += 1;
        x >>= 1;
    }

    return result;
#endif
}

static DRFLAC_INLINE drflac_bool32 drflac__use_64_bit_prediction(drflac_uint32 bitsPerSample, drflac_uint32 order, drflac_uint32 precision)
{
    /* https://web.archive.org/web/20220205005724/https://github.com/ietf-wg-cellar/flac-specification/blob/37a49aa48ba4ba12e8757badfc59c0df35435fec/rfc_backmatter.md */
    return bitsPerSample + precision + drflac__ilog2_u32(order) > 32;
}


/*
The next two functions are responsible for calculating the prediction.

When the bits per sample is >16 we need to use 64-bit integer arithmetic because otherwise we'll run out of precision. It's
safe to assume this will be slower on 32-bit platforms so we use a more optimal solution when the bits per sample is <=16.
*/
#if defined(__clang__)
__attribute__((no_sanitize("signed-integer-overflow")))
#endif
static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_32(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
{
    drflac_int32 prediction = 0;

    DRFLAC_ASSERT(order <= 32);

    /* 32-bit version. */

    /* VC++ optimizes this to a single jmp. I've not yet verified this for other compilers.
    */
    switch (order)
    {
    case 32: prediction += coefficients[31] * pDecodedSamples[-32];
    case 31: prediction += coefficients[30] * pDecodedSamples[-31];
    case 30: prediction += coefficients[29] * pDecodedSamples[-30];
    case 29: prediction += coefficients[28] * pDecodedSamples[-29];
    case 28: prediction += coefficients[27] * pDecodedSamples[-28];
    case 27: prediction += coefficients[26] * pDecodedSamples[-27];
    case 26: prediction += coefficients[25] * pDecodedSamples[-26];
    case 25: prediction += coefficients[24] * pDecodedSamples[-25];
    case 24: prediction += coefficients[23] * pDecodedSamples[-24];
    case 23: prediction += coefficients[22] * pDecodedSamples[-23];
    case 22: prediction += coefficients[21] * pDecodedSamples[-22];
    case 21: prediction += coefficients[20] * pDecodedSamples[-21];
    case 20: prediction += coefficients[19] * pDecodedSamples[-20];
    case 19: prediction += coefficients[18] * pDecodedSamples[-19];
    case 18: prediction += coefficients[17] * pDecodedSamples[-18];
    case 17: prediction += coefficients[16] * pDecodedSamples[-17];
    case 16: prediction += coefficients[15] * pDecodedSamples[-16];
    case 15: prediction += coefficients[14] * pDecodedSamples[-15];
    case 14: prediction += coefficients[13] * pDecodedSamples[-14];
    case 13: prediction += coefficients[12] * pDecodedSamples[-13];
    case 12: prediction += coefficients[11] * pDecodedSamples[-12];
    case 11: prediction += coefficients[10] * pDecodedSamples[-11];
    case 10: prediction += coefficients[ 9] * pDecodedSamples[-10];
    case  9: prediction += coefficients[ 8] * pDecodedSamples[- 9];
    case  8: prediction += coefficients[ 7] * pDecodedSamples[- 8];
    case  7: prediction += coefficients[ 6] * pDecodedSamples[- 7];
    case  6: prediction += coefficients[ 5] * pDecodedSamples[- 6];
    case  5: prediction += coefficients[ 4] * pDecodedSamples[- 5];
    case  4: prediction += coefficients[
3] * pDecodedSamples[- 4];
    case  3: prediction += coefficients[ 2] * pDecodedSamples[- 3];
    case  2: prediction += coefficients[ 1] * pDecodedSamples[- 2];
    case  1: prediction += coefficients[ 0] * pDecodedSamples[- 1];
    }

    return (drflac_int32)(prediction >> shift);
}

static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_64(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
{
    drflac_int64 prediction;

    DRFLAC_ASSERT(order <= 32);

    /* 64-bit version. */

    /* This method is faster on the 32-bit build when compiling with VC++. See note below. */
#ifndef DRFLAC_64BIT
    if (order == 8)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
        prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
        prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
        prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
        prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
    }
    else if (order == 7)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
        prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
        prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
        prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
    }
    else if (order == 3)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction +=
coefficients[1] * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
    }
    else if (order == 6)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
        prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
        prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
    }
    else if (order == 5)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
        prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
    }
    else if (order == 4)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
    }
    else if (order == 12)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
        prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
        prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
        prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
        prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
        prediction += coefficients[8] *
(drflac_int64)pDecodedSamples[-9];
        prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
        prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
        prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
    }
    else if (order == 2)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
    }
    else if (order == 1)
    {
        prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
    }
    else if (order == 10)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
        prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
        prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
        prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
        prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
        prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
        prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
    }
    else if (order == 9)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
        prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
        prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
        prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
        prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
        prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
    }
    else if (order == 11)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
        prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
        prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
        prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
        prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
        prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
        prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
        prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
    }
    else
    {
        int j;

        prediction = 0;
        for (j = 0; j < (int)order; ++j) {
            prediction += coefficients[j] * (drflac_int64)pDecodedSamples[-j-1];
        }
    }
#endif

    /*
    VC++ optimizes this to a single jmp instruction, but only the 64-bit build. The 32-bit build generates less efficient code for some
    reason. The ugly version above is faster so we'll just switch between the two depending on the target platform.
    */
#ifdef DRFLAC_64BIT
    prediction = 0;
    switch (order)
    {
    case 32: prediction += coefficients[31] * (drflac_int64)pDecodedSamples[-32];
    case 31: prediction += coefficients[30] * (drflac_int64)pDecodedSamples[-31];
    case 30: prediction += coefficients[29] * (drflac_int64)pDecodedSamples[-30];
    case 29: prediction += coefficients[28] * (drflac_int64)pDecodedSamples[-29];
    case 28: prediction += coefficients[27] * (drflac_int64)pDecodedSamples[-28];
    case 27: prediction += coefficients[26] * (drflac_int64)pDecodedSamples[-27];
    case 26: prediction += coefficients[25] * (drflac_int64)pDecodedSamples[-26];
    case 25: prediction += coefficients[24] * (drflac_int64)pDecodedSamples[-25];
    case 24: prediction += coefficients[23] * (drflac_int64)pDecodedSamples[-24];
    case 23: prediction += coefficients[22] * (drflac_int64)pDecodedSamples[-23];
    case 22: prediction += coefficients[21] * (drflac_int64)pDecodedSamples[-22];
    case 21: prediction += coefficients[20] * (drflac_int64)pDecodedSamples[-21];
    case 20: prediction += coefficients[19] * (drflac_int64)pDecodedSamples[-20];
    case 19: prediction += coefficients[18] * (drflac_int64)pDecodedSamples[-19];
    case 18: prediction += coefficients[17] * (drflac_int64)pDecodedSamples[-18];
    case 17: prediction += coefficients[16] * (drflac_int64)pDecodedSamples[-17];
    case 16: prediction += coefficients[15] * (drflac_int64)pDecodedSamples[-16];
    case 15: prediction += coefficients[14] * (drflac_int64)pDecodedSamples[-15];
    case 14: prediction += coefficients[13] * (drflac_int64)pDecodedSamples[-14];
    case 13: prediction += coefficients[12] * (drflac_int64)pDecodedSamples[-13];
    case 12: prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
    case 11: prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
    case 10: prediction += coefficients[ 9] * (drflac_int64)pDecodedSamples[-10];
    case
9: prediction += coefficients[ 8] * (drflac_int64)pDecodedSamples[- 9];
    case  8: prediction += coefficients[ 7] * (drflac_int64)pDecodedSamples[- 8];
    case  7: prediction += coefficients[ 6] * (drflac_int64)pDecodedSamples[- 7];
    case  6: prediction += coefficients[ 5] * (drflac_int64)pDecodedSamples[- 6];
    case  5: prediction += coefficients[ 4] * (drflac_int64)pDecodedSamples[- 5];
    case  4: prediction += coefficients[ 3] * (drflac_int64)pDecodedSamples[- 4];
    case  3: prediction += coefficients[ 2] * (drflac_int64)pDecodedSamples[- 3];
    case  2: prediction += coefficients[ 1] * (drflac_int64)pDecodedSamples[- 2];
    case  1: prediction += coefficients[ 0] * (drflac_int64)pDecodedSamples[- 1];
    }
#endif

    return (drflac_int32)(prediction >> shift);
}


#if 0
/*
Reference implementation for reading and decoding samples with residual. This is intentionally left unoptimized for the
sake of readability and should only be used as a reference.
*/
static drflac_bool32 drflac__decode_samples_with_residual__rice__reference(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    drflac_uint32 i;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pSamplesOut != NULL);

    for (i = 0; i < count; ++i) {
        drflac_uint32 zeroCounter = 0;
        for (;;) {
            drflac_uint8 bit;
            if (!drflac__read_uint8(bs, 1, &bit)) {
                return DRFLAC_FALSE;
            }

            if (bit == 0) {
                zeroCounter += 1;
            } else {
                break;
            }
        }

        drflac_uint32 decodedRice;
        if (riceParam > 0) {
            if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
                return DRFLAC_FALSE;
            }
        } else {
            decodedRice = 0;
        }

        decodedRice |= (zeroCounter << riceParam);
        if ((decodedRice & 0x01)) {
            decodedRice = ~(decodedRice >> 1);
        } else {
            decodedRice = (decodedRice >> 1);
        }


        if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
            pSamplesOut[i] = decodedRice + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
        } else {
            pSamplesOut[i] = decodedRice + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
        }
    }

    return DRFLAC_TRUE;
}
#endif

#if 0
static drflac_bool32 drflac__read_rice_parts__reference(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
{
    drflac_uint32 zeroCounter = 0;
    drflac_uint32 decodedRice;

    for (;;) {
        drflac_uint8 bit;
        if (!drflac__read_uint8(bs, 1, &bit)) {
            return DRFLAC_FALSE;
        }

        if (bit == 0) {
            zeroCounter += 1;
        } else {
            break;
        }
    }

    if (riceParam > 0) {
        if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
            return DRFLAC_FALSE;
        }
    } else {
        decodedRice = 0;
    }

    *pZeroCounterOut = zeroCounter;
    *pRiceParamPartOut = decodedRice;
    return DRFLAC_TRUE;
}
#endif

#if 0
static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
{
    drflac_cache_t riceParamMask;
    drflac_uint32 zeroCounter;
    drflac_uint32 setBitOffsetPlus1;
    drflac_uint32 riceParamPart;
    drflac_uint32 riceLength;

    DRFLAC_ASSERT(riceParam > 0);   /* <-- riceParam should never be 0. drflac__read_rice_parts__param_equals_zero() should be used instead for this case. */

    riceParamMask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParam);

    zeroCounter = 0;
    while (bs->cache == 0) {
        zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
        if (!drflac__reload_cache(bs)) {
            return DRFLAC_FALSE;
        }
    }

    setBitOffsetPlus1 = drflac__clz(bs->cache);
    zeroCounter += setBitOffsetPlus1;
    setBitOffsetPlus1 += 1;

    riceLength = setBitOffsetPlus1 + riceParam;
    if (riceLength < DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
        riceParamPart = (drflac_uint32)((bs->cache & (riceParamMask >> setBitOffsetPlus1)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceLength));

        bs->consumedBits += riceLength;
        bs->cache <<= riceLength;
    } else {
        drflac_uint32 bitCountLo;
        drflac_cache_t resultHi;

        bs->consumedBits += riceLength;
        bs->cache <<= setBitOffsetPlus1 & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1);    /* <-- Equivalent to "if (setBitOffsetPlus1 < DRFLAC_CACHE_L1_SIZE_BITS(bs)) { bs->cache <<= setBitOffsetPlus1; }" */

        /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them.
        */
        bitCountLo = bs->consumedBits - DRFLAC_CACHE_L1_SIZE_BITS(bs);
        resultHi = DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, riceParam);  /* <-- Use DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE() if ever this function allows riceParam=0. */

        if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
#ifndef DR_FLAC_NO_CRC
            drflac__update_crc16(bs);
#endif
            bs->cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
            bs->consumedBits = 0;
#ifndef DR_FLAC_NO_CRC
            bs->crc16Cache = bs->cache;
#endif
        } else {
            /* Slow path. We need to fetch more data from the client. */
            if (!drflac__reload_cache(bs)) {
                return DRFLAC_FALSE;
            }
            if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
                /* This happens when we get to end of stream */
                return DRFLAC_FALSE;
            }
        }

        riceParamPart = (drflac_uint32)(resultHi | DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, bitCountLo));

        bs->consumedBits += bitCountLo;
        bs->cache <<= bitCountLo;
    }

    pZeroCounterOut[0] = zeroCounter;
    pRiceParamPartOut[0] = riceParamPart;

    return DRFLAC_TRUE;
}
#endif

static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts_x1(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
{
    drflac_uint32 riceParamPlus1 = riceParam + 1;
    /*drflac_cache_t riceParamPlus1Mask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParamPlus1);*/
    drflac_uint32 riceParamPlus1Shift = DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPlus1);
    drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;

    /*
    The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
    no idea how this will work in practice...
    */
    drflac_cache_t bs_cache = bs->cache;
    drflac_uint32 bs_consumedBits = bs->consumedBits;

    /* The first thing to do is find the first set bit. Most likely a bit will be set in the current cache line. */
    drflac_uint32 lzcount = drflac__clz(bs_cache);
    if (lzcount < sizeof(bs_cache)*8) {
        pZeroCounterOut[0] = lzcount;

        /*
        It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
        this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
        outside of this function at a higher level.
        */
    extract_rice_param_part:
        bs_cache <<= lzcount;
        bs_consumedBits += lzcount;

        if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
            /* Getting here means the rice parameter part is wholly contained within the current cache line. */
            pRiceParamPartOut[0] = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
            bs_cache <<= riceParamPlus1;
            bs_consumedBits += riceParamPlus1;
        } else {
            drflac_uint32 riceParamPartHi;
            drflac_uint32 riceParamPartLo;
            drflac_uint32 riceParamPartLoBitCount;

            /*
            Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
            line, reload the cache, and then combine it with the head of the next cache line.
            */

            /* Grab the high part of the rice parameter part. */
            riceParamPartHi = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);

            /* Before reloading the cache we need to grab the size in bits of the low part. */
            riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
            DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);

            /* Now reload the cache.
            */
            if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
            #ifndef DR_FLAC_NO_CRC
                drflac__update_crc16(bs);
            #endif
                bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
                bs_consumedBits = riceParamPartLoBitCount;
            #ifndef DR_FLAC_NO_CRC
                bs->crc16Cache = bs_cache;
            #endif
            } else {
                /* Slow path. We need to fetch more data from the client. */
                if (!drflac__reload_cache(bs)) {
                    return DRFLAC_FALSE;
                }
                if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
                    /* This happens when we get to end of stream */
                    return DRFLAC_FALSE;
                }

                bs_cache = bs->cache;
                bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
            }

            /* We should now have enough information to construct the rice parameter part. */
            riceParamPartLo = (drflac_uint32)(bs_cache >> (DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPartLoBitCount)));
            pRiceParamPartOut[0] = riceParamPartHi | riceParamPartLo;

            bs_cache <<= riceParamPartLoBitCount;
        }
    } else {
        /*
        Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
        to drflac__clz() and we need to reload the cache.
        */
        drflac_uint32 zeroCounter = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BITS(bs) - bs_consumedBits);
        for (;;) {
            if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
            #ifndef DR_FLAC_NO_CRC
                drflac__update_crc16(bs);
            #endif
                bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
                bs_consumedBits = 0;
            #ifndef DR_FLAC_NO_CRC
                bs->crc16Cache = bs_cache;
            #endif
            } else {
                /* Slow path. We need to fetch more data from the client.
                */
                if (!drflac__reload_cache(bs)) {
                    return DRFLAC_FALSE;
                }

                bs_cache = bs->cache;
                bs_consumedBits = bs->consumedBits;
            }

            lzcount = drflac__clz(bs_cache);
            zeroCounter += lzcount;

            if (lzcount < sizeof(bs_cache)*8) {
                break;
            }
        }

        pZeroCounterOut[0] = zeroCounter;
        goto extract_rice_param_part;
    }

    /* Make sure the cache is restored at the end of it all. */
    bs->cache = bs_cache;
    bs->consumedBits = bs_consumedBits;

    return DRFLAC_TRUE;
}

static DRFLAC_INLINE drflac_bool32 drflac__seek_rice_parts(drflac_bs* bs, drflac_uint8 riceParam)
{
    drflac_uint32 riceParamPlus1 = riceParam + 1;
    drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;

    /*
    The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
    no idea how this will work in practice...
    */
    drflac_cache_t bs_cache = bs->cache;
    drflac_uint32 bs_consumedBits = bs->consumedBits;

    /* The first thing to do is find the first set bit. Most likely a bit will be set in the current cache line. */
    drflac_uint32 lzcount = drflac__clz(bs_cache);
    if (lzcount < sizeof(bs_cache)*8) {
        /*
        It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
        this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
        outside of this function at a higher level.
        */
    extract_rice_param_part:
        bs_cache <<= lzcount;
        bs_consumedBits += lzcount;

        if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
            /* Getting here means the rice parameter part is wholly contained within the current cache line.
            */
            bs_cache <<= riceParamPlus1;
            bs_consumedBits += riceParamPlus1;
        } else {
            /*
            Getting here means the rice parameter part straddles the cache line. We need to skip the tail of the current cache
            line, reload the cache, and then skip the remaining bits at the head of the next cache line.
            */

            /* Before reloading the cache we need to grab the size in bits of the low part. */
            drflac_uint32 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
            DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);

            /* Now reload the cache. */
            if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
            #ifndef DR_FLAC_NO_CRC
                drflac__update_crc16(bs);
            #endif
                bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
                bs_consumedBits = riceParamPartLoBitCount;
            #ifndef DR_FLAC_NO_CRC
                bs->crc16Cache = bs_cache;
            #endif
            } else {
                /* Slow path. We need to fetch more data from the client. */
                if (!drflac__reload_cache(bs)) {
                    return DRFLAC_FALSE;
                }

                if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
                    /* This happens when we get to end of stream */
                    return DRFLAC_FALSE;
                }

                bs_cache = bs->cache;
                bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
            }

            bs_cache <<= riceParamPartLoBitCount;
        }
    } else {
        /*
        Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
        to drflac__clz() and we need to reload the cache.
        */
        for (;;) {
            if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
            #ifndef DR_FLAC_NO_CRC
                drflac__update_crc16(bs);
            #endif
                bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
                bs_consumedBits = 0;
            #ifndef DR_FLAC_NO_CRC
                bs->crc16Cache = bs_cache;
            #endif
            } else {
                /* Slow path.
                We need to fetch more data from the client. */
                if (!drflac__reload_cache(bs)) {
                    return DRFLAC_FALSE;
                }

                bs_cache = bs->cache;
                bs_consumedBits = bs->consumedBits;
            }

            lzcount = drflac__clz(bs_cache);
            if (lzcount < sizeof(bs_cache)*8) {
                break;
            }
        }

        goto extract_rice_param_part;
    }

    /* Make sure the cache is restored at the end of it all. */
    bs->cache = bs_cache;
    bs->consumedBits = bs_consumedBits;

    return DRFLAC_TRUE;
}


static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar_zeroorder(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
    drflac_uint32 zeroCountPart0;
    drflac_uint32 riceParamPart0;
    drflac_uint32 riceParamMask;
    drflac_uint32 i;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pSamplesOut != NULL);

    (void)bitsPerSample;
    (void)order;
    (void)shift;
    (void)coefficients;

    riceParamMask = (drflac_uint32)~((~0UL) << riceParam);

    i = 0;
    while (i < count) {
        /* Rice extraction. */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction.
        */
        riceParamPart0 &= riceParamMask;
        riceParamPart0 |= (zeroCountPart0 << riceParam);
        riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];

        pSamplesOut[i] = riceParamPart0;

        i += 1;
    }

    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
    drflac_uint32 zeroCountPart0 = 0;
    drflac_uint32 zeroCountPart1 = 0;
    drflac_uint32 zeroCountPart2 = 0;
    drflac_uint32 zeroCountPart3 = 0;
    drflac_uint32 riceParamPart0 = 0;
    drflac_uint32 riceParamPart1 = 0;
    drflac_uint32 riceParamPart2 = 0;
    drflac_uint32 riceParamPart3 = 0;
    drflac_uint32 riceParamMask;
    const drflac_int32* pSamplesOutEnd;
    drflac_uint32 i;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pSamplesOut != NULL);

    if (lpcOrder == 0) {
        return drflac__decode_samples_with_residual__rice__scalar_zeroorder(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
    }

    riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
    pSamplesOutEnd = pSamplesOut + (count & ~3);

    if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
        while (pSamplesOut < pSamplesOutEnd) {
            /*
            Rice extraction. It's faster to do this one at a time against local variables than it is to use the x4 version
            against an array. Not sure why, but perhaps it's making more efficient use of registers?
            */
            if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
                return DRFLAC_FALSE;
            }

            riceParamPart0 &= riceParamMask;
            riceParamPart1 &= riceParamMask;
            riceParamPart2 &= riceParamMask;
            riceParamPart3 &= riceParamMask;

            riceParamPart0 |= (zeroCountPart0 << riceParam);
            riceParamPart1 |= (zeroCountPart1 << riceParam);
            riceParamPart2 |= (zeroCountPart2 << riceParam);
            riceParamPart3 |= (zeroCountPart3 << riceParam);

            riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
            riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
            riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
            riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];

            pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
            pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
            pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
            pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);

            pSamplesOut += 4;
        }
    } else {
        while (pSamplesOut < pSamplesOutEnd) {
            if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
                return DRFLAC_FALSE;
            }

            riceParamPart0 &= riceParamMask;
            riceParamPart1 &= riceParamMask;
            riceParamPart2 &= riceParamMask;
            riceParamPart3 &= riceParamMask;

            riceParamPart0 |= (zeroCountPart0 << riceParam);
            riceParamPart1 |= (zeroCountPart1 << riceParam);
            riceParamPart2 |= (zeroCountPart2 << riceParam);
            riceParamPart3 |= (zeroCountPart3 << riceParam);

            riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
            riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
            riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
            riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];

            pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
            pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
            pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
            pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);

            pSamplesOut += 4;
        }
    }

    i = (count & ~3);
    while (i < count) {
        /* Rice extraction. */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction. */
        riceParamPart0 &= riceParamMask;
        riceParamPart0 |= (zeroCountPart0 << riceParam);
        riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
        /*riceParamPart0 = (riceParamPart0 >> 1) ^ (~(riceParamPart0 & 0x01) + 1);*/

        /* Sample reconstruction.
        */
        if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
            pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
        } else {
            pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
        }

        i += 1;
        pSamplesOut += 1;
    }

    return DRFLAC_TRUE;
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE __m128i drflac__mm_packs_interleaved_epi32(__m128i a, __m128i b)
{
    __m128i r;

    /* Pack. */
    r = _mm_packs_epi32(a, b);

    /* a3a2 a1a0 b3b2 b1b0 -> a3a2 b3b2 a1a0 b1b0 */
    r = _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 1, 2, 0));

    /* a3a2 b3b2 a1a0 b1b0 -> a3b3 a2b2 a1b1 a0b0 */
    r = _mm_shufflehi_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
    r = _mm_shufflelo_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));

    return r;
}
#endif

#if defined(DRFLAC_SUPPORT_SSE41)
static DRFLAC_INLINE __m128i drflac__mm_not_si128(__m128i a)
{
    return _mm_xor_si128(a, _mm_cmpeq_epi32(_mm_setzero_si128(), _mm_setzero_si128()));
}

static DRFLAC_INLINE __m128i drflac__mm_hadd_epi32(__m128i x)
{
    __m128i x64 = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
    __m128i x32 = _mm_shufflelo_epi16(x64, _MM_SHUFFLE(1, 0, 3, 2));
    return _mm_add_epi32(x64, x32);
}

static DRFLAC_INLINE __m128i drflac__mm_hadd_epi64(__m128i x)
{
    return _mm_add_epi64(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
}

static DRFLAC_INLINE __m128i drflac__mm_srai_epi64(__m128i x, int count)
{
    /*
    To simplify this we are assuming count < 32. This restriction allows us to work on a low side and a high side. The low side
    is shifted with zero bits, whereas the high side is shifted with sign bits.
    */
    __m128i lo = _mm_srli_epi64(x, count);
    __m128i hi = _mm_srai_epi32(x, count);

    hi = _mm_and_si128(hi, _mm_set_epi32(0xFFFFFFFF, 0, 0xFFFFFFFF, 0)); /* The high part needs to have the low part cleared. */

    return _mm_or_si128(lo, hi);
}

static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    int i;
    drflac_uint32 riceParamMask;
    drflac_int32* pDecodedSamples = pSamplesOut;
    drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
    drflac_uint32 zeroCountParts0 = 0;
    drflac_uint32 zeroCountParts1 = 0;
    drflac_uint32 zeroCountParts2 = 0;
    drflac_uint32 zeroCountParts3 = 0;
    drflac_uint32 riceParamParts0 = 0;
    drflac_uint32 riceParamParts1 = 0;
    drflac_uint32 riceParamParts2 = 0;
    drflac_uint32 riceParamParts3 = 0;
    __m128i coefficients128_0;
    __m128i coefficients128_4;
    __m128i coefficients128_8;
    __m128i samples128_0;
    __m128i samples128_4;
    __m128i samples128_8;
    __m128i riceParamMask128;

    const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};

    riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
    riceParamMask128 = _mm_set1_epi32(riceParamMask);

    /* Pre-load. */
    coefficients128_0 = _mm_setzero_si128();
    coefficients128_4 = _mm_setzero_si128();
    coefficients128_8 = _mm_setzero_si128();

    samples128_0 = _mm_setzero_si128();
    samples128_4 = _mm_setzero_si128();
    samples128_8 = _mm_setzero_si128();

    /*
    Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
    what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
    in strict aliasing warnings with GCC.
    To work around this I'm just doing something hacky. This feels a bit convoluted
    so I think there's opportunity for this to be simplified.
    */
#if 1
    {
        int runningOrder = order;

        /* 0 - 3. */
        if (runningOrder >= 4) {
            coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
            samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
                case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
                case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
            }
            runningOrder = 0;
        }

        /* 4 - 7 */
        if (runningOrder >= 4) {
            coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
            samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
                case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
                case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
            }
            runningOrder = 0;
        }

        /* 8 - 11 */
        if (runningOrder == 4) {
            coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
            samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut
                - 12));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
                case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
                case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
            }
            runningOrder = 0;
        }

        /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
        coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
        coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
        coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
    }
#else
    /* This causes strict-aliasing warnings with GCC.
    */
    switch (order)
    {
    case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
    case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
    case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
    case  9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
    case  8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
    case  7: ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
    case  6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
    case  5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
    case  4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
    case  3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
    case  2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
    case  1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
    }
#endif

    /* For this version we are doing one sample at a time.
    */
    while (pDecodedSamples < pDecodedSamplesEnd) {
        __m128i prediction128;
        __m128i zeroCountPart128;
        __m128i riceParamPart128;

        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
            return DRFLAC_FALSE;
        }

        zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
        riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);

        riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
        riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
        riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01))), _mm_set1_epi32(0x01))); /* <-- SSE2 compatible */
        /*riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_mullo_epi32(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01)), _mm_set1_epi32(0xFFFFFFFF)));*/ /* <-- Only supported from SSE4.1 and is slower in my testing... */

        if (order <= 4) {
            for (i = 0; i < 4; i += 1) {
                prediction128 = _mm_mullo_epi32(coefficients128_0, samples128_0);

                /* Horizontal add and shift.
                */
                prediction128 = drflac__mm_hadd_epi32(prediction128);
                prediction128 = _mm_srai_epi32(prediction128, shift);
                prediction128 = _mm_add_epi32(riceParamPart128, prediction128);

                samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
                riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
            }
        } else if (order <= 8) {
            for (i = 0; i < 4; i += 1) {
                prediction128 = _mm_mullo_epi32(coefficients128_4, samples128_4);
                prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));

                /* Horizontal add and shift. */
                prediction128 = drflac__mm_hadd_epi32(prediction128);
                prediction128 = _mm_srai_epi32(prediction128, shift);
                prediction128 = _mm_add_epi32(riceParamPart128, prediction128);

                samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
                samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
                riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
            }
        } else {
            for (i = 0; i < 4; i += 1) {
                prediction128 = _mm_mullo_epi32(coefficients128_8, samples128_8);
                prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_4, samples128_4));
                prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));

                /* Horizontal add and shift. */
                prediction128 = drflac__mm_hadd_epi32(prediction128);
                prediction128 = _mm_srai_epi32(prediction128, shift);
                prediction128 = _mm_add_epi32(riceParamPart128, prediction128);

                samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
                samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
                samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
                riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
            }
        }

        /* We store samples in groups of 4.
        */
        _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
        pDecodedSamples += 4;
    }

    /* Make sure we process the last few samples. */
    i = (count & ~3);
    while (i < (int)count) {
        /* Rice extraction. */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction. */
        riceParamParts0 &= riceParamMask;
        riceParamParts0 |= (zeroCountParts0 << riceParam);
        riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];

        /* Sample reconstruction. */
        pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);

        i += 1;
        pDecodedSamples += 1;
    }

    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    int i;
    drflac_uint32 riceParamMask;
    drflac_int32* pDecodedSamples = pSamplesOut;
    drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
    drflac_uint32 zeroCountParts0 = 0;
    drflac_uint32 zeroCountParts1 = 0;
    drflac_uint32 zeroCountParts2 = 0;
    drflac_uint32 zeroCountParts3 = 0;
    drflac_uint32 riceParamParts0 = 0;
    drflac_uint32 riceParamParts1 = 0;
    drflac_uint32 riceParamParts2 = 0;
    drflac_uint32 riceParamParts3 = 0;
    __m128i coefficients128_0;
    __m128i coefficients128_4;
    __m128i coefficients128_8;
    __m128i samples128_0;
    __m128i samples128_4;
    __m128i samples128_8;
    __m128i prediction128;
    __m128i riceParamMask128;

    const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};

    DRFLAC_ASSERT(order <= 12);

    riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
    riceParamMask128 =
        _mm_set1_epi32(riceParamMask);

    prediction128 = _mm_setzero_si128();

    /* Pre-load. */
    coefficients128_0 = _mm_setzero_si128();
    coefficients128_4 = _mm_setzero_si128();
    coefficients128_8 = _mm_setzero_si128();

    samples128_0 = _mm_setzero_si128();
    samples128_4 = _mm_setzero_si128();
    samples128_8 = _mm_setzero_si128();

#if 1
    {
        int runningOrder = order;

        /* 0 - 3. */
        if (runningOrder >= 4) {
            coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
            samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
                case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
                case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
            }
            runningOrder = 0;
        }

        /* 4 - 7 */
        if (runningOrder >= 4) {
            coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
            samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
                case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
                case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
            }
            runningOrder = 0;
        }

        /* 8 - 11 */
        if (runningOrder == 4) {
            coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
            samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 12));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
                case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
                case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
            }
            runningOrder = 0;
        }

        /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
        coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
        coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
        coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
    }
#else
    switch (order)
    {
    case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
    case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
    case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
    case  9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
    case  8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
    case  7:
        ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
    case 6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
    case 5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
    case 4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
    case 3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
    case 2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
    case 1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
    }
#endif

    /* For this version we are doing one sample at a time. */
    while (pDecodedSamples < pDecodedSamplesEnd) {
        __m128i zeroCountPart128;
        __m128i riceParamPart128;

        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
            return DRFLAC_FALSE;
        }

        zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
        riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);

        riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
        riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
        riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1),
            _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(1))), _mm_set1_epi32(1)));

        for (i = 0; i < 4; i += 1) {
            prediction128 = _mm_xor_si128(prediction128, prediction128); /* Reset to 0. */

            switch (order)
            {
            case 12:
            case 11: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(1, 1, 0, 0))));
            case 10:
            case 9: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(3, 3, 2, 2))));
            case 8:
            case 7: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(1, 1, 0, 0))));
            case 6:
            case 5: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(3, 3, 2, 2))));
            case 4:
            case 3: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(1, 1, 0, 0))));
            case 2:
            case 1: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(3, 3, 2, 2))));
            }

            /* Horizontal add and shift. */
            prediction128 = drflac__mm_hadd_epi64(prediction128);
            prediction128 = drflac__mm_srai_epi64(prediction128, shift);
            prediction128 = _mm_add_epi32(riceParamPart128, prediction128);

            /* Our value should be sitting in prediction128[0]. We need to combine this with our SSE samples.
            */
            samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
            samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
            samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);

            /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
            riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
        }

        /* We store samples in groups of 4. */
        _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
        pDecodedSamples += 4;
    }

    /* Make sure we process the last few samples. */
    i = (count & ~3);
    while (i < (int)count) {
        /* Rice extraction. */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction. */
        riceParamParts0 &= riceParamMask;
        riceParamParts0 |= (zeroCountParts0 << riceParam);
        riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];

        /* Sample reconstruction. */
        pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);

        i += 1;
        pDecodedSamples += 1;
    }

    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pSamplesOut != NULL);

    /* In my testing the order is rarely > 12, so in this case I'm going to simplify the SSE implementation by only handling order <= 12.
    */
    if (lpcOrder > 0 && lpcOrder <= 12) {
        if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
            return drflac__decode_samples_with_residual__rice__sse41_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
        } else {
            return drflac__decode_samples_with_residual__rice__sse41_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
        }
    } else {
        return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac__vst2q_s32(drflac_int32* p, int32x4x2_t x)
{
    vst1q_s32(p+0, x.val[0]);
    vst1q_s32(p+4, x.val[1]);
}

static DRFLAC_INLINE void drflac__vst2q_u32(drflac_uint32* p, uint32x4x2_t x)
{
    vst1q_u32(p+0, x.val[0]);
    vst1q_u32(p+4, x.val[1]);
}

static DRFLAC_INLINE void drflac__vst2q_f32(float* p, float32x4x2_t x)
{
    vst1q_f32(p+0, x.val[0]);
    vst1q_f32(p+4, x.val[1]);
}

static DRFLAC_INLINE void drflac__vst2q_s16(drflac_int16* p, int16x4x2_t x)
{
    vst1q_s16(p, vcombine_s16(x.val[0], x.val[1]));
}

static DRFLAC_INLINE void drflac__vst2q_u16(drflac_uint16* p, uint16x4x2_t x)
{
    vst1q_u16(p, vcombine_u16(x.val[0], x.val[1]));
}

static DRFLAC_INLINE int32x4_t drflac__vdupq_n_s32x4(drflac_int32 x3, drflac_int32 x2, drflac_int32 x1, drflac_int32 x0)
{
    drflac_int32 x[4];
    x[3] = x3;
    x[2] = x2;
    x[1] = x1;
    x[0] = x0;
    return vld1q_s32(x);
}

static DRFLAC_INLINE int32x4_t drflac__valignrq_s32_1(int32x4_t a, int32x4_t b)
{
    /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */

    /* Reference */
    /*return drflac__vdupq_n_s32x4(
        vgetq_lane_s32(a, 0),
        vgetq_lane_s32(b, 3),
        vgetq_lane_s32(b, 2),
        vgetq_lane_s32(b, 1)
    );*/

    return vextq_s32(b, a, 1);
}

static DRFLAC_INLINE uint32x4_t drflac__valignrq_u32_1(uint32x4_t a, uint32x4_t b)
{
    /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */

    /* Reference */
    /*return drflac__vdupq_n_s32x4(
        vgetq_lane_s32(a, 0),
        vgetq_lane_s32(b, 3),
        vgetq_lane_s32(b, 2),
        vgetq_lane_s32(b, 1)
    );*/

    return vextq_u32(b, a, 1);
}

static DRFLAC_INLINE int32x2_t drflac__vhaddq_s32(int32x4_t x)
{
    /* The sum must end up in position 0. */

    /* Reference */
    /*return vdup_n_s32(
        vgetq_lane_s32(x, 3) +
        vgetq_lane_s32(x, 2) +
        vgetq_lane_s32(x, 1) +
        vgetq_lane_s32(x, 0)
    );*/

    int32x2_t r = vadd_s32(vget_high_s32(x), vget_low_s32(x));
    return vpadd_s32(r, r);
}

static DRFLAC_INLINE int64x1_t drflac__vhaddq_s64(int64x2_t x)
{
    return vadd_s64(vget_high_s64(x), vget_low_s64(x));
}

static DRFLAC_INLINE int32x4_t drflac__vrevq_s32(int32x4_t x)
{
    /* Reference */
    /*return drflac__vdupq_n_s32x4(
        vgetq_lane_s32(x, 0),
        vgetq_lane_s32(x, 1),
        vgetq_lane_s32(x, 2),
        vgetq_lane_s32(x, 3)
    );*/

    return vrev64q_s32(vcombine_s32(vget_high_s32(x), vget_low_s32(x)));
}

static DRFLAC_INLINE int32x4_t drflac__vnotq_s32(int32x4_t x)
{
    return veorq_s32(x, vdupq_n_s32(0xFFFFFFFF));
}

static DRFLAC_INLINE uint32x4_t drflac__vnotq_u32(uint32x4_t x)
{
    return veorq_u32(x, vdupq_n_u32(0xFFFFFFFF));
}

static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    int i;
    drflac_uint32 riceParamMask;
    drflac_int32* pDecodedSamples = pSamplesOut;
    drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
    drflac_uint32 zeroCountParts[4];
    drflac_uint32 riceParamParts[4];
    int32x4_t coefficients128_0;
    int32x4_t coefficients128_4;
    int32x4_t coefficients128_8;
    int32x4_t samples128_0;
    int32x4_t samples128_4;
    int32x4_t samples128_8;
    uint32x4_t riceParamMask128;
    int32x4_t riceParam128;
    int32x2_t shift64;
    uint32x4_t one128;

    const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};

    riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
    riceParamMask128 = vdupq_n_u32(riceParamMask);

    riceParam128 = vdupq_n_s32(riceParam);
    shift64 = vdup_n_s32(-shift); /* Negate the shift because we'll be doing a variable shift using vshl_s32(), which shifts right when given a negative count. */
    one128 = vdupq_n_u32(1);

    /*
    Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
    what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
    in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
    so I think there's opportunity for this to be simplified.
    */
    {
        int runningOrder = order;
        drflac_int32 tempC[4] = {0, 0, 0, 0};
        drflac_int32 tempS[4] = {0, 0, 0, 0};

        /* 0 - 3.
        */
        if (runningOrder >= 4) {
            coefficients128_0 = vld1q_s32(coefficients + 0);
            samples128_0 = vld1q_s32(pSamplesOut - 4);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
                case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
                case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
            }

            coefficients128_0 = vld1q_s32(tempC);
            samples128_0 = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* 4 - 7 */
        if (runningOrder >= 4) {
            coefficients128_4 = vld1q_s32(coefficients + 4);
            samples128_4 = vld1q_s32(pSamplesOut - 8);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
                case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
                case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
            }

            coefficients128_4 = vld1q_s32(tempC);
            samples128_4 = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* 8 - 11 */
        if (runningOrder == 4) {
            coefficients128_8 = vld1q_s32(coefficients + 8);
            samples128_8 = vld1q_s32(pSamplesOut - 12);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
                case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
                case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
            }

            coefficients128_8 = vld1q_s32(tempC);
            samples128_8 = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above.
        */
        coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
        coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
        coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
    }

    /* For this version we are doing one sample at a time. */
    while (pDecodedSamples < pDecodedSamplesEnd) {
        int32x4_t prediction128;
        int32x2_t prediction64;
        uint32x4_t zeroCountPart128;
        uint32x4_t riceParamPart128;

        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
            return DRFLAC_FALSE;
        }

        zeroCountPart128 = vld1q_u32(zeroCountParts);
        riceParamPart128 = vld1q_u32(riceParamParts);

        riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
        riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
        riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));

        if (order <= 4) {
            for (i = 0; i < 4; i += 1) {
                prediction128 = vmulq_s32(coefficients128_0, samples128_0);

                /* Horizontal add and shift.
                */
                prediction64 = drflac__vhaddq_s32(prediction128);
                prediction64 = vshl_s32(prediction64, shift64);
                prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));

                samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
                riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
            }
        } else if (order <= 8) {
            for (i = 0; i < 4; i += 1) {
                prediction128 = vmulq_s32(coefficients128_4, samples128_4);
                prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);

                /* Horizontal add and shift. */
                prediction64 = drflac__vhaddq_s32(prediction128);
                prediction64 = vshl_s32(prediction64, shift64);
                prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));

                samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
                samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
                riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
            }
        } else {
            for (i = 0; i < 4; i += 1) {
                prediction128 = vmulq_s32(coefficients128_8, samples128_8);
                prediction128 = vmlaq_s32(prediction128, coefficients128_4, samples128_4);
                prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);

                /* Horizontal add and shift.
                */
                prediction64 = drflac__vhaddq_s32(prediction128);
                prediction64 = vshl_s32(prediction64, shift64);
                prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));

                samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
                samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
                samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
                riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
            }
        }

        /* We store samples in groups of 4. */
        vst1q_s32(pDecodedSamples, samples128_0);
        pDecodedSamples += 4;
    }

    /* Make sure we process the last few samples. */
    i = (count & ~3);
    while (i < (int)count) {
        /* Rice extraction. */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction. */
        riceParamParts[0] &= riceParamMask;
        riceParamParts[0] |= (zeroCountParts[0] << riceParam);
        riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];

        /* Sample reconstruction.
        */
        pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);

        i += 1;
        pDecodedSamples += 1;
    }

    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    int i;
    drflac_uint32 riceParamMask;
    drflac_int32* pDecodedSamples = pSamplesOut;
    drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
    drflac_uint32 zeroCountParts[4];
    drflac_uint32 riceParamParts[4];
    int32x4_t coefficients128_0;
    int32x4_t coefficients128_4;
    int32x4_t coefficients128_8;
    int32x4_t samples128_0;
    int32x4_t samples128_4;
    int32x4_t samples128_8;
    uint32x4_t riceParamMask128;
    int32x4_t riceParam128;
    int64x1_t shift64;
    uint32x4_t one128;
    int64x2_t prediction128 = { 0 };
    uint32x4_t zeroCountPart128;
    uint32x4_t riceParamPart128;

    const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};

    riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
    riceParamMask128 = vdupq_n_u32(riceParamMask);

    riceParam128 = vdupq_n_s32(riceParam);
    shift64 = vdup_n_s64(-shift); /* Negate the shift because we'll be doing a variable shift using vshl_s64(), which shifts right when given a negative count. */
    one128 = vdupq_n_u32(1);

    /*
    Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
    what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
    in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
    so I think there's opportunity for this to be simplified.
    */
    {
        int runningOrder = order;
        drflac_int32 tempC[4] = {0, 0, 0, 0};
        drflac_int32 tempS[4] = {0, 0, 0, 0};

        /* 0 - 3. */
        if (runningOrder >= 4) {
            coefficients128_0 = vld1q_s32(coefficients + 0);
            samples128_0 = vld1q_s32(pSamplesOut - 4);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
                case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
                case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
            }

            coefficients128_0 = vld1q_s32(tempC);
            samples128_0 = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* 4 - 7 */
        if (runningOrder >= 4) {
            coefficients128_4 = vld1q_s32(coefficients + 4);
            samples128_4 = vld1q_s32(pSamplesOut - 8);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
                case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
                case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
            }

            coefficients128_4 = vld1q_s32(tempC);
            samples128_4 = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* 8 - 11 */
        if (runningOrder == 4) {
            coefficients128_8 = vld1q_s32(coefficients + 8);
            samples128_8 = vld1q_s32(pSamplesOut - 12);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
                case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
                case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
            }

            coefficients128_8 = vld1q_s32(tempC);
            samples128_8 = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* Coefficients need to be
        shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
        coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
        coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
        coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
    }

    /* For this version we are doing one sample at a time. */
    while (pDecodedSamples < pDecodedSamplesEnd) {
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
            return DRFLAC_FALSE;
        }

        zeroCountPart128 = vld1q_u32(zeroCountParts);
        riceParamPart128 = vld1q_u32(riceParamParts);

        riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
        riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
        riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));

        for (i = 0; i < 4; i += 1) {
            int64x1_t prediction64;

            prediction128 = veorq_s64(prediction128, prediction128); /* Reset to 0.
            */
            switch (order)
            {
            case 12:
            case 11: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_8), vget_low_s32(samples128_8)));
            case 10:
            case 9: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_8), vget_high_s32(samples128_8)));
            case 8:
            case 7: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_4), vget_low_s32(samples128_4)));
            case 6:
            case 5: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_4), vget_high_s32(samples128_4)));
            case 4:
            case 3: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_0), vget_low_s32(samples128_0)));
            case 2:
            case 1: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_0), vget_high_s32(samples128_0)));
            }

            /* Horizontal add and shift. */
            prediction64 = drflac__vhaddq_s64(prediction128);
            prediction64 = vshl_s64(prediction64, shift64);
            prediction64 = vadd_s64(prediction64, vdup_n_s64(vgetq_lane_u32(riceParamPart128, 0)));

            /* Our value should be sitting in prediction64[0]. We need to combine this with our NEON samples. */
            samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
            samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
            samples128_0 = drflac__valignrq_s32_1(vcombine_s32(vreinterpret_s32_s64(prediction64), vdup_n_s32(0)), samples128_0);

            /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
            riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
        }

        /* We store samples in groups of 4. */
        vst1q_s32(pDecodedSamples, samples128_0);
        pDecodedSamples += 4;
    }

    /* Make sure we process the last few samples. */
    i = (count & ~3);
    while (i < (int)count) {
        /* Rice extraction.
        */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction. */
        riceParamParts[0] &= riceParamMask;
        riceParamParts[0] |= (zeroCountParts[0] << riceParam);
        riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];

        /* Sample reconstruction. */
        pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);

        i += 1;
        pDecodedSamples += 1;
    }

    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__decode_samples_with_residual__rice__neon(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pSamplesOut != NULL);

    /* In my testing the order is rarely > 12, so in this case I'm going to simplify the NEON implementation by only handling order <= 12.
    */
    if (lpcOrder > 0 && lpcOrder <= 12) {
        if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
            return drflac__decode_samples_with_residual__rice__neon_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
        } else {
            return drflac__decode_samples_with_residual__rice__neon_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
        }
    } else {
        return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
    }
}
#endif

static drflac_bool32 drflac__decode_samples_with_residual__rice(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
#if defined(DRFLAC_SUPPORT_SSE41)
    if (drflac__gIsSSE41Supported) {
        return drflac__decode_samples_with_residual__rice__sse41(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported) {
        return drflac__decode_samples_with_residual__rice__neon(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
    } else
#endif
    {
        /* Scalar fallback. */
    #if 0
        return drflac__decode_samples_with_residual__rice__reference(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
    #else
        return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
    #endif
    }
}

/* Reads and seeks past a string of residual values as Rice codes. The decoder should be sitting on the first bit of the Rice codes.
*/
static drflac_bool32 drflac__read_and_seek_residual__rice(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam)
{
    drflac_uint32 i;

    DRFLAC_ASSERT(bs != NULL);

    for (i = 0; i < count; ++i) {
        if (!drflac__seek_rice_parts(bs, riceParam)) {
            return DRFLAC_FALSE;
        }
    }

    return DRFLAC_TRUE;
}

#if defined(__clang__)
__attribute__((no_sanitize("signed-integer-overflow")))
#endif
static drflac_bool32 drflac__decode_samples_with_residual__unencoded(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 unencodedBitsPerSample, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    drflac_uint32 i;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(unencodedBitsPerSample <= 31); /* <-- unencodedBitsPerSample is a 5 bit number, so cannot exceed 31. */
    DRFLAC_ASSERT(pSamplesOut != NULL);

    for (i = 0; i < count; ++i) {
        if (unencodedBitsPerSample > 0) {
            if (!drflac__read_int32(bs, unencodedBitsPerSample, pSamplesOut + i)) {
                return DRFLAC_FALSE;
            }
        } else {
            pSamplesOut[i] = 0;
        }

        if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
            pSamplesOut[i] += drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
        } else {
            pSamplesOut[i] += drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
        }
    }

    return DRFLAC_TRUE;
}


/*
Reads and decodes the residual for the sub-frame the decoder is currently sitting on. This function should be called
when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be ignored. The
<blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
*/
static drflac_bool32 drflac__decode_samples_with_residual(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 blockSize, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
{
    drflac_uint8 residualMethod;
    drflac_uint8 partitionOrder;
    drflac_uint32 samplesInPartition;
    drflac_uint32 partitionsRemaining;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(blockSize != 0);
    DRFLAC_ASSERT(pDecodedSamples != NULL); /* <-- Should we allow NULL, in which case we just seek past the residual rather than do a full decode? */

    if (!drflac__read_uint8(bs, 2, &residualMethod)) {
        return DRFLAC_FALSE;
    }

    if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
        return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */
    }

    /* Ignore the first <order> values. */
    pDecodedSamples += lpcOrder;

    if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
        return DRFLAC_FALSE;
    }

    /*
    From the FLAC spec:
      The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
    */
    if (partitionOrder > 8) {
        return DRFLAC_FALSE;
    }

    /* Validation check.
    */
    if ((blockSize / (1 << partitionOrder)) < lpcOrder) {
        return DRFLAC_FALSE;
    }

    samplesInPartition = (blockSize / (1 << partitionOrder)) - lpcOrder;
    partitionsRemaining = (1 << partitionOrder);
    for (;;) {
        drflac_uint8 riceParam = 0;
        if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
            if (!drflac__read_uint8(bs, 4, &riceParam)) {
                return DRFLAC_FALSE;
            }
            if (riceParam == 15) {
                riceParam = 0xFF;
            }
        } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
            if (!drflac__read_uint8(bs, 5, &riceParam)) {
                return DRFLAC_FALSE;
            }
            if (riceParam == 31) {
                riceParam = 0xFF;
            }
        }

        if (riceParam != 0xFF) {
            if (!drflac__decode_samples_with_residual__rice(bs, bitsPerSample, samplesInPartition, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
                return DRFLAC_FALSE;
            }
        } else {
            drflac_uint8 unencodedBitsPerSample = 0;
            if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
                return DRFLAC_FALSE;
            }

            if (!drflac__decode_samples_with_residual__unencoded(bs, bitsPerSample, samplesInPartition, unencodedBitsPerSample, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
                return DRFLAC_FALSE;
            }
        }

        pDecodedSamples += samplesInPartition;

        if (partitionsRemaining == 1) {
            break;
        }

        partitionsRemaining -= 1;

        if (partitionOrder != 0) {
            samplesInPartition = blockSize / (1 << partitionOrder);
        }
    }

    return DRFLAC_TRUE;
}

/*
Reads and seeks past the residual for the sub-frame the decoder is currently sitting on. This function should be called
when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be set to 0.
The 4995 <blockSize> and <order> parameters are used to determine how many residual values need to be decoded. 4996 */ 4997 static drflac_bool32 drflac__read_and_seek_residual(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 order) 4998 { 4999 drflac_uint8 residualMethod; 5000 drflac_uint8 partitionOrder; 5001 drflac_uint32 samplesInPartition; 5002 drflac_uint32 partitionsRemaining; 5003 5004 DRFLAC_ASSERT(bs != NULL); 5005 DRFLAC_ASSERT(blockSize != 0); 5006 5007 if (!drflac__read_uint8(bs, 2, &residualMethod)) { 5008 return DRFLAC_FALSE; 5009 } 5010 5011 if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) { 5012 return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */ 5013 } 5014 5015 if (!drflac__read_uint8(bs, 4, &partitionOrder)) { 5016 return DRFLAC_FALSE; 5017 } 5018 5019 /* 5020 From the FLAC spec: 5021 The Rice partition order in a Rice-coded residual section must be less than or equal to 8. 5022 */ 5023 if (partitionOrder > 8) { 5024 return DRFLAC_FALSE; 5025 } 5026 5027 /* Validation check. 
*/ 5028 if ((blockSize / (1 << partitionOrder)) <= order) { 5029 return DRFLAC_FALSE; 5030 } 5031 5032 samplesInPartition = (blockSize / (1 << partitionOrder)) - order; 5033 partitionsRemaining = (1 << partitionOrder); 5034 for (;;) 5035 { 5036 drflac_uint8 riceParam = 0; 5037 if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) { 5038 if (!drflac__read_uint8(bs, 4, &riceParam)) { 5039 return DRFLAC_FALSE; 5040 } 5041 if (riceParam == 15) { 5042 riceParam = 0xFF; 5043 } 5044 } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) { 5045 if (!drflac__read_uint8(bs, 5, &riceParam)) { 5046 return DRFLAC_FALSE; 5047 } 5048 if (riceParam == 31) { 5049 riceParam = 0xFF; 5050 } 5051 } 5052 5053 if (riceParam != 0xFF) { 5054 if (!drflac__read_and_seek_residual__rice(bs, samplesInPartition, riceParam)) { 5055 return DRFLAC_FALSE; 5056 } 5057 } else { 5058 drflac_uint8 unencodedBitsPerSample = 0; 5059 if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) { 5060 return DRFLAC_FALSE; 5061 } 5062 5063 if (!drflac__seek_bits(bs, unencodedBitsPerSample * samplesInPartition)) { 5064 return DRFLAC_FALSE; 5065 } 5066 } 5067 5068 5069 if (partitionsRemaining == 1) { 5070 break; 5071 } 5072 5073 partitionsRemaining -= 1; 5074 samplesInPartition = blockSize / (1 << partitionOrder); 5075 } 5076 5077 return DRFLAC_TRUE; 5078 } 5079 5080 5081 static drflac_bool32 drflac__decode_samples__constant(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples) 5082 { 5083 drflac_uint32 i; 5084 5085 /* Only a single sample needs to be decoded here. */ 5086 drflac_int32 sample; 5087 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) { 5088 return DRFLAC_FALSE; 5089 } 5090 5091 /* 5092 We don't really need to expand this, but it does simplify the process of reading samples. If this becomes a performance issue (unlikely) 5093 we'll want to look at a more efficient way. 
5094 */ 5095 for (i = 0; i < blockSize; ++i) { 5096 pDecodedSamples[i] = sample; 5097 } 5098 5099 return DRFLAC_TRUE; 5100 } 5101 5102 static drflac_bool32 drflac__decode_samples__verbatim(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples) 5103 { 5104 drflac_uint32 i; 5105 5106 for (i = 0; i < blockSize; ++i) { 5107 drflac_int32 sample; 5108 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) { 5109 return DRFLAC_FALSE; 5110 } 5111 5112 pDecodedSamples[i] = sample; 5113 } 5114 5115 return DRFLAC_TRUE; 5116 } 5117 5118 static drflac_bool32 drflac__decode_samples__fixed(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples) 5119 { 5120 drflac_uint32 i; 5121 5122 static drflac_int32 lpcCoefficientsTable[5][4] = { 5123 {0, 0, 0, 0}, 5124 {1, 0, 0, 0}, 5125 {2, -1, 0, 0}, 5126 {3, -3, 1, 0}, 5127 {4, -6, 4, -1} 5128 }; 5129 5130 /* Warm up samples and coefficients. */ 5131 for (i = 0; i < lpcOrder; ++i) { 5132 drflac_int32 sample; 5133 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) { 5134 return DRFLAC_FALSE; 5135 } 5136 5137 pDecodedSamples[i] = sample; 5138 } 5139 5140 if (!drflac__decode_samples_with_residual(bs, subframeBitsPerSample, blockSize, lpcOrder, 0, 4, lpcCoefficientsTable[lpcOrder], pDecodedSamples)) { 5141 return DRFLAC_FALSE; 5142 } 5143 5144 return DRFLAC_TRUE; 5145 } 5146 5147 static drflac_bool32 drflac__decode_samples__lpc(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 bitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples) 5148 { 5149 drflac_uint8 i; 5150 drflac_uint8 lpcPrecision; 5151 drflac_int8 lpcShift; 5152 drflac_int32 coefficients[32]; 5153 5154 /* Warm up samples. 
*/
    for (i = 0; i < lpcOrder; ++i) {
        drflac_int32 sample;
        if (!drflac__read_int32(bs, bitsPerSample, &sample)) {
            return DRFLAC_FALSE;
        }

        pDecodedSamples[i] = sample;
    }

    if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
        return DRFLAC_FALSE;
    }
    if (lpcPrecision == 15) {
        return DRFLAC_FALSE;    /* Invalid. */
    }
    lpcPrecision += 1;

    if (!drflac__read_int8(bs, 5, &lpcShift)) {
        return DRFLAC_FALSE;
    }

    /*
    From the FLAC specification:

        Quantized linear predictor coefficient shift needed in bits (NOTE: this number is signed two's-complement)

    Emphasis on the "signed two's-complement". In practice there do not seem to be any encoders or decoders that support negative shifts. For now dr_flac
    does not support negative shifts because I don't have any reference files. When a reference file comes through I will consider adding support.
    */
    if (lpcShift < 0) {
        return DRFLAC_FALSE;
    }

    DRFLAC_ZERO_MEMORY(coefficients, sizeof(coefficients));
    for (i = 0; i < lpcOrder; ++i) {
        if (!drflac__read_int32(bs, lpcPrecision, coefficients + i)) {
            return DRFLAC_FALSE;
        }
    }

    if (!drflac__decode_samples_with_residual(bs, bitsPerSample, blockSize, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
        return DRFLAC_FALSE;
    }

    return DRFLAC_TRUE;
}


static drflac_bool32 drflac__read_next_flac_frame_header(drflac_bs* bs, drflac_uint8 streaminfoBitsPerSample, drflac_frame_header* header)
{
    const drflac_uint32 sampleRateTable[12] = {0, 88200, 176400, 192000, 8000, 16000, 22050, 24000, 32000, 44100, 48000, 96000};
    const drflac_uint8 bitsPerSampleTable[8] = {0, 8, 12, (drflac_uint8)-1, 16, 20, 24, (drflac_uint8)-1}; /* -1 = reserved.
*/ 5207 5208 DRFLAC_ASSERT(bs != NULL); 5209 DRFLAC_ASSERT(header != NULL); 5210 5211 /* Keep looping until we find a valid sync code. */ 5212 for (;;) { 5213 drflac_uint8 crc8 = 0xCE; /* 0xCE = drflac_crc8(0, 0x3FFE, 14); */ 5214 drflac_uint8 reserved = 0; 5215 drflac_uint8 blockingStrategy = 0; 5216 drflac_uint8 blockSize = 0; 5217 drflac_uint8 sampleRate = 0; 5218 drflac_uint8 channelAssignment = 0; 5219 drflac_uint8 bitsPerSample = 0; 5220 drflac_bool32 isVariableBlockSize; 5221 5222 if (!drflac__find_and_seek_to_next_sync_code(bs)) { 5223 return DRFLAC_FALSE; 5224 } 5225 5226 if (!drflac__read_uint8(bs, 1, &reserved)) { 5227 return DRFLAC_FALSE; 5228 } 5229 if (reserved == 1) { 5230 continue; 5231 } 5232 crc8 = drflac_crc8(crc8, reserved, 1); 5233 5234 if (!drflac__read_uint8(bs, 1, &blockingStrategy)) { 5235 return DRFLAC_FALSE; 5236 } 5237 crc8 = drflac_crc8(crc8, blockingStrategy, 1); 5238 5239 if (!drflac__read_uint8(bs, 4, &blockSize)) { 5240 return DRFLAC_FALSE; 5241 } 5242 if (blockSize == 0) { 5243 continue; 5244 } 5245 crc8 = drflac_crc8(crc8, blockSize, 4); 5246 5247 if (!drflac__read_uint8(bs, 4, &sampleRate)) { 5248 return DRFLAC_FALSE; 5249 } 5250 crc8 = drflac_crc8(crc8, sampleRate, 4); 5251 5252 if (!drflac__read_uint8(bs, 4, &channelAssignment)) { 5253 return DRFLAC_FALSE; 5254 } 5255 if (channelAssignment > 10) { 5256 continue; 5257 } 5258 crc8 = drflac_crc8(crc8, channelAssignment, 4); 5259 5260 if (!drflac__read_uint8(bs, 3, &bitsPerSample)) { 5261 return DRFLAC_FALSE; 5262 } 5263 if (bitsPerSample == 3 || bitsPerSample == 7) { 5264 continue; 5265 } 5266 crc8 = drflac_crc8(crc8, bitsPerSample, 3); 5267 5268 5269 if (!drflac__read_uint8(bs, 1, &reserved)) { 5270 return DRFLAC_FALSE; 5271 } 5272 if (reserved == 1) { 5273 continue; 5274 } 5275 crc8 = drflac_crc8(crc8, reserved, 1); 5276 5277 5278 isVariableBlockSize = blockingStrategy == 1; 5279 if (isVariableBlockSize) { 5280 drflac_uint64 pcmFrameNumber; 5281 drflac_result result = 
drflac__read_utf8_coded_number(bs, &pcmFrameNumber, &crc8); 5282 if (result != DRFLAC_SUCCESS) { 5283 if (result == DRFLAC_AT_END) { 5284 return DRFLAC_FALSE; 5285 } else { 5286 continue; 5287 } 5288 } 5289 header->flacFrameNumber = 0; 5290 header->pcmFrameNumber = pcmFrameNumber; 5291 } else { 5292 drflac_uint64 flacFrameNumber = 0; 5293 drflac_result result = drflac__read_utf8_coded_number(bs, &flacFrameNumber, &crc8); 5294 if (result != DRFLAC_SUCCESS) { 5295 if (result == DRFLAC_AT_END) { 5296 return DRFLAC_FALSE; 5297 } else { 5298 continue; 5299 } 5300 } 5301 header->flacFrameNumber = (drflac_uint32)flacFrameNumber; /* <-- Safe cast. */ 5302 header->pcmFrameNumber = 0; 5303 } 5304 5305 5306 DRFLAC_ASSERT(blockSize > 0); 5307 if (blockSize == 1) { 5308 header->blockSizeInPCMFrames = 192; 5309 } else if (blockSize <= 5) { 5310 DRFLAC_ASSERT(blockSize >= 2); 5311 header->blockSizeInPCMFrames = 576 * (1 << (blockSize - 2)); 5312 } else if (blockSize == 6) { 5313 if (!drflac__read_uint16(bs, 8, &header->blockSizeInPCMFrames)) { 5314 return DRFLAC_FALSE; 5315 } 5316 crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 8); 5317 header->blockSizeInPCMFrames += 1; 5318 } else if (blockSize == 7) { 5319 if (!drflac__read_uint16(bs, 16, &header->blockSizeInPCMFrames)) { 5320 return DRFLAC_FALSE; 5321 } 5322 crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 16); 5323 if (header->blockSizeInPCMFrames == 0xFFFF) { 5324 return DRFLAC_FALSE; /* Frame is too big. This is the size of the frame minus 1. The STREAMINFO block defines the max block size which is 16-bits. Adding one will make it 17 bits and therefore too big. 
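For reference, the block size codes handled by this if/else chain can be written as a standalone function. This is a sketch, not dr_flac code: block_size_from_code() is a hypothetical name, and the extra parameter stands for the 8-bit or 16-bit value that follows the header for codes 6 and 7.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, not part of dr_flac. Returns the block size in PCM
// frames, or 0 when the code is reserved or the size does not fit in 16 bits.
static uint32_t block_size_from_code(uint8_t code, uint32_t extra)
{
    assert(code <= 15);     // The code is a 4-bit field.

    if (code == 0) {
        return 0;                           // Reserved.
    }
    if (code == 1) {
        return 192;
    }
    if (code <= 5) {
        return 576u << (code - 2);          // 576, 1152, 2304, 4608.
    }
    if (code == 6) {
        return (extra & 0xFF) + 1;          // 8-bit value stores size minus 1.
    }
    if (code == 7) {
        extra &= 0xFFFF;                    // 16-bit value stores size minus 1.
        if (extra == 0xFFFF) {
            return 0;                       // 65536 would need 17 bits.
        }
        return extra + 1;
    }

    return 256u << (code - 8);              // 256, 512, ..., 32768.
}
```

Code 12, for example, maps to 4096 PCM frames, and code 7 with a trailing value of 0xFFFF is rejected for the reason given in this comment.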
*/
            }
            header->blockSizeInPCMFrames += 1;
        } else {
            DRFLAC_ASSERT(blockSize >= 8);
            header->blockSizeInPCMFrames = 256 * (1 << (blockSize - 8));
        }


        if (sampleRate <= 11) {
            header->sampleRate = sampleRateTable[sampleRate];
        } else if (sampleRate == 12) {
            if (!drflac__read_uint32(bs, 8, &header->sampleRate)) {
                return DRFLAC_FALSE;
            }
            crc8 = drflac_crc8(crc8, header->sampleRate, 8);
            header->sampleRate *= 1000;
        } else if (sampleRate == 13) {
            if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
                return DRFLAC_FALSE;
            }
            crc8 = drflac_crc8(crc8, header->sampleRate, 16);
        } else if (sampleRate == 14) {
            if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
                return DRFLAC_FALSE;
            }
            crc8 = drflac_crc8(crc8, header->sampleRate, 16);
            header->sampleRate *= 10;
        } else {
            continue;  /* Invalid. Assume an invalid block. */
        }


        header->channelAssignment = channelAssignment;

        header->bitsPerSample = bitsPerSampleTable[bitsPerSample];
        if (header->bitsPerSample == 0) {
            header->bitsPerSample = streaminfoBitsPerSample;
        }

        if (header->bitsPerSample != streaminfoBitsPerSample) {
            /* If this frame has a different bitsPerSample than STREAMINFO or the first frame, reject it. */
            return DRFLAC_FALSE;
        }

        if (!drflac__read_uint8(bs, 8, &header->crc8)) {
            return DRFLAC_FALSE;
        }

#ifndef DR_FLAC_NO_CRC
        if (header->crc8 != crc8) {
            continue;    /* CRC mismatch. Loop back to the top and find the next sync code. */
        }
#endif
        return DRFLAC_TRUE;
    }
}

static drflac_bool32 drflac__read_subframe_header(drflac_bs* bs, drflac_subframe* pSubframe)
{
    drflac_uint8 header;
    int type;

    if (!drflac__read_uint8(bs, 8, &header)) {
        return DRFLAC_FALSE;
    }

    /* First bit should always be 0.
*/ 5392 if ((header & 0x80) != 0) { 5393 return DRFLAC_FALSE; 5394 } 5395 5396 type = (header & 0x7E) >> 1; 5397 if (type == 0) { 5398 pSubframe->subframeType = DRFLAC_SUBFRAME_CONSTANT; 5399 } else if (type == 1) { 5400 pSubframe->subframeType = DRFLAC_SUBFRAME_VERBATIM; 5401 } else { 5402 if ((type & 0x20) != 0) { 5403 pSubframe->subframeType = DRFLAC_SUBFRAME_LPC; 5404 pSubframe->lpcOrder = (drflac_uint8)(type & 0x1F) + 1; 5405 } else if ((type & 0x08) != 0) { 5406 pSubframe->subframeType = DRFLAC_SUBFRAME_FIXED; 5407 pSubframe->lpcOrder = (drflac_uint8)(type & 0x07); 5408 if (pSubframe->lpcOrder > 4) { 5409 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED; 5410 pSubframe->lpcOrder = 0; 5411 } 5412 } else { 5413 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED; 5414 } 5415 } 5416 5417 if (pSubframe->subframeType == DRFLAC_SUBFRAME_RESERVED) { 5418 return DRFLAC_FALSE; 5419 } 5420 5421 /* Wasted bits per sample. */ 5422 pSubframe->wastedBitsPerSample = 0; 5423 if ((header & 0x01) == 1) { 5424 unsigned int wastedBitsPerSample; 5425 if (!drflac__seek_past_next_set_bit(bs, &wastedBitsPerSample)) { 5426 return DRFLAC_FALSE; 5427 } 5428 pSubframe->wastedBitsPerSample = (drflac_uint8)wastedBitsPerSample + 1; 5429 } 5430 5431 return DRFLAC_TRUE; 5432 } 5433 5434 static drflac_bool32 drflac__decode_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex, drflac_int32* pDecodedSamplesOut) 5435 { 5436 drflac_subframe* pSubframe; 5437 drflac_uint32 subframeBitsPerSample; 5438 5439 DRFLAC_ASSERT(bs != NULL); 5440 DRFLAC_ASSERT(frame != NULL); 5441 5442 pSubframe = frame->subframes + subframeIndex; 5443 if (!drflac__read_subframe_header(bs, pSubframe)) { 5444 return DRFLAC_FALSE; 5445 } 5446 5447 /* Side channels require an extra bit per sample. Took a while to figure that one out... 
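The reason, sketched as a standalone example rather than dr_flac code: a side channel stores the difference between the left and right samples, and the difference of two N-bit signed values spans a range that needs N+1 bits. The helper bits_needed() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, not part of dr_flac. Returns the smallest
// two's-complement width, in bits, that can represent the given value.
static unsigned int bits_needed(int32_t value)
{
    unsigned int bits = 1;
    int64_t lo = -1;    // Smallest value representable with the current width.
    int64_t hi = 0;     // Largest value representable with the current width.

    while (value < lo || value > hi) {
        bits += 1;
        lo = lo * 2;
        hi = hi * 2 + 1;
    }

    return bits;
}
```

For 16-bit audio the extreme samples 32767 and -32768 each fit in 16 bits, but their difference, 65535 or -65535, needs 17 bits, which is why the side subframe is read with one extra bit per sample.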
*/ 5448 subframeBitsPerSample = frame->header.bitsPerSample; 5449 if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) { 5450 subframeBitsPerSample += 1; 5451 } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) { 5452 subframeBitsPerSample += 1; 5453 } 5454 5455 if (subframeBitsPerSample > 32) { 5456 /* libFLAC and ffmpeg reject 33-bit subframes as well */ 5457 return DRFLAC_FALSE; 5458 } 5459 5460 /* Need to handle wasted bits per sample. */ 5461 if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) { 5462 return DRFLAC_FALSE; 5463 } 5464 subframeBitsPerSample -= pSubframe->wastedBitsPerSample; 5465 5466 pSubframe->pSamplesS32 = pDecodedSamplesOut; 5467 5468 switch (pSubframe->subframeType) 5469 { 5470 case DRFLAC_SUBFRAME_CONSTANT: 5471 { 5472 drflac__decode_samples__constant(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32); 5473 } break; 5474 5475 case DRFLAC_SUBFRAME_VERBATIM: 5476 { 5477 drflac__decode_samples__verbatim(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32); 5478 } break; 5479 5480 case DRFLAC_SUBFRAME_FIXED: 5481 { 5482 drflac__decode_samples__fixed(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32); 5483 } break; 5484 5485 case DRFLAC_SUBFRAME_LPC: 5486 { 5487 drflac__decode_samples__lpc(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32); 5488 } break; 5489 5490 default: return DRFLAC_FALSE; 5491 } 5492 5493 return DRFLAC_TRUE; 5494 } 5495 5496 static drflac_bool32 drflac__seek_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex) 5497 { 5498 drflac_subframe* pSubframe; 5499 drflac_uint32 subframeBitsPerSample; 5500 5501 DRFLAC_ASSERT(bs != NULL); 5502 DRFLAC_ASSERT(frame 
!= NULL); 5503 5504 pSubframe = frame->subframes + subframeIndex; 5505 if (!drflac__read_subframe_header(bs, pSubframe)) { 5506 return DRFLAC_FALSE; 5507 } 5508 5509 /* Side channels require an extra bit per sample. Took a while to figure that one out... */ 5510 subframeBitsPerSample = frame->header.bitsPerSample; 5511 if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) { 5512 subframeBitsPerSample += 1; 5513 } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) { 5514 subframeBitsPerSample += 1; 5515 } 5516 5517 /* Need to handle wasted bits per sample. */ 5518 if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) { 5519 return DRFLAC_FALSE; 5520 } 5521 subframeBitsPerSample -= pSubframe->wastedBitsPerSample; 5522 5523 pSubframe->pSamplesS32 = NULL; 5524 5525 switch (pSubframe->subframeType) 5526 { 5527 case DRFLAC_SUBFRAME_CONSTANT: 5528 { 5529 if (!drflac__seek_bits(bs, subframeBitsPerSample)) { 5530 return DRFLAC_FALSE; 5531 } 5532 } break; 5533 5534 case DRFLAC_SUBFRAME_VERBATIM: 5535 { 5536 unsigned int bitsToSeek = frame->header.blockSizeInPCMFrames * subframeBitsPerSample; 5537 if (!drflac__seek_bits(bs, bitsToSeek)) { 5538 return DRFLAC_FALSE; 5539 } 5540 } break; 5541 5542 case DRFLAC_SUBFRAME_FIXED: 5543 { 5544 unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample; 5545 if (!drflac__seek_bits(bs, bitsToSeek)) { 5546 return DRFLAC_FALSE; 5547 } 5548 5549 if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) { 5550 return DRFLAC_FALSE; 5551 } 5552 } break; 5553 5554 case DRFLAC_SUBFRAME_LPC: 5555 { 5556 drflac_uint8 lpcPrecision; 5557 5558 unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample; 5559 if (!drflac__seek_bits(bs, bitsToSeek)) { 5560 return DRFLAC_FALSE; 5561 } 5562 5563 if 
(!drflac__read_uint8(bs, 4, &lpcPrecision)) { 5564 return DRFLAC_FALSE; 5565 } 5566 if (lpcPrecision == 15) { 5567 return DRFLAC_FALSE; /* Invalid. */ 5568 } 5569 lpcPrecision += 1; 5570 5571 5572 bitsToSeek = (pSubframe->lpcOrder * lpcPrecision) + 5; /* +5 for shift. */ 5573 if (!drflac__seek_bits(bs, bitsToSeek)) { 5574 return DRFLAC_FALSE; 5575 } 5576 5577 if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) { 5578 return DRFLAC_FALSE; 5579 } 5580 } break; 5581 5582 default: return DRFLAC_FALSE; 5583 } 5584 5585 return DRFLAC_TRUE; 5586 } 5587 5588 5589 static DRFLAC_INLINE drflac_uint8 drflac__get_channel_count_from_channel_assignment(drflac_int8 channelAssignment) 5590 { 5591 drflac_uint8 lookup[] = {1, 2, 3, 4, 5, 6, 7, 8, 2, 2, 2}; 5592 5593 DRFLAC_ASSERT(channelAssignment <= 10); 5594 return lookup[channelAssignment]; 5595 } 5596 5597 static drflac_result drflac__decode_flac_frame(drflac* pFlac) 5598 { 5599 int channelCount; 5600 int i; 5601 drflac_uint8 paddingSizeInBits; 5602 drflac_uint16 desiredCRC16; 5603 #ifndef DR_FLAC_NO_CRC 5604 drflac_uint16 actualCRC16; 5605 #endif 5606 5607 /* This function should be called while the stream is sitting on the first byte after the frame header. */ 5608 DRFLAC_ZERO_MEMORY(pFlac->currentFLACFrame.subframes, sizeof(pFlac->currentFLACFrame.subframes)); 5609 5610 /* The frame block size must never be larger than the maximum block size defined by the FLAC stream. */ 5611 if (pFlac->currentFLACFrame.header.blockSizeInPCMFrames > pFlac->maxBlockSizeInPCMFrames) { 5612 return DRFLAC_ERROR; 5613 } 5614 5615 /* The number of channels in the frame must match the channel count from the STREAMINFO block. 
*/ 5616 channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment); 5617 if (channelCount != (int)pFlac->channels) { 5618 return DRFLAC_ERROR; 5619 } 5620 5621 for (i = 0; i < channelCount; ++i) { 5622 if (!drflac__decode_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i, pFlac->pDecodedSamples + (pFlac->currentFLACFrame.header.blockSizeInPCMFrames * i))) { 5623 return DRFLAC_ERROR; 5624 } 5625 } 5626 5627 paddingSizeInBits = (drflac_uint8)(DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7); 5628 if (paddingSizeInBits > 0) { 5629 drflac_uint8 padding = 0; 5630 if (!drflac__read_uint8(&pFlac->bs, paddingSizeInBits, &padding)) { 5631 return DRFLAC_AT_END; 5632 } 5633 } 5634 5635 #ifndef DR_FLAC_NO_CRC 5636 actualCRC16 = drflac__flush_crc16(&pFlac->bs); 5637 #endif 5638 if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) { 5639 return DRFLAC_AT_END; 5640 } 5641 5642 #ifndef DR_FLAC_NO_CRC 5643 if (actualCRC16 != desiredCRC16) { 5644 return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */ 5645 } 5646 #endif 5647 5648 pFlac->currentFLACFrame.pcmFramesRemaining = pFlac->currentFLACFrame.header.blockSizeInPCMFrames; 5649 5650 return DRFLAC_SUCCESS; 5651 } 5652 5653 static drflac_result drflac__seek_flac_frame(drflac* pFlac) 5654 { 5655 int channelCount; 5656 int i; 5657 drflac_uint16 desiredCRC16; 5658 #ifndef DR_FLAC_NO_CRC 5659 drflac_uint16 actualCRC16; 5660 #endif 5661 5662 channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment); 5663 for (i = 0; i < channelCount; ++i) { 5664 if (!drflac__seek_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i)) { 5665 return DRFLAC_ERROR; 5666 } 5667 } 5668 5669 /* Padding. */ 5670 if (!drflac__seek_bits(&pFlac->bs, DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7)) { 5671 return DRFLAC_ERROR; 5672 } 5673 5674 /* CRC. 
*/ 5675 #ifndef DR_FLAC_NO_CRC 5676 actualCRC16 = drflac__flush_crc16(&pFlac->bs); 5677 #endif 5678 if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) { 5679 return DRFLAC_AT_END; 5680 } 5681 5682 #ifndef DR_FLAC_NO_CRC 5683 if (actualCRC16 != desiredCRC16) { 5684 return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */ 5685 } 5686 #endif 5687 5688 return DRFLAC_SUCCESS; 5689 } 5690 5691 static drflac_bool32 drflac__read_and_decode_next_flac_frame(drflac* pFlac) 5692 { 5693 DRFLAC_ASSERT(pFlac != NULL); 5694 5695 for (;;) { 5696 drflac_result result; 5697 5698 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) { 5699 return DRFLAC_FALSE; 5700 } 5701 5702 result = drflac__decode_flac_frame(pFlac); 5703 if (result != DRFLAC_SUCCESS) { 5704 if (result == DRFLAC_CRC_MISMATCH) { 5705 continue; /* CRC mismatch. Skip to the next frame. */ 5706 } else { 5707 return DRFLAC_FALSE; 5708 } 5709 } 5710 5711 return DRFLAC_TRUE; 5712 } 5713 } 5714 5715 static void drflac__get_pcm_frame_range_of_current_flac_frame(drflac* pFlac, drflac_uint64* pFirstPCMFrame, drflac_uint64* pLastPCMFrame) 5716 { 5717 drflac_uint64 firstPCMFrame; 5718 drflac_uint64 lastPCMFrame; 5719 5720 DRFLAC_ASSERT(pFlac != NULL); 5721 5722 firstPCMFrame = pFlac->currentFLACFrame.header.pcmFrameNumber; 5723 if (firstPCMFrame == 0) { 5724 firstPCMFrame = ((drflac_uint64)pFlac->currentFLACFrame.header.flacFrameNumber) * pFlac->maxBlockSizeInPCMFrames; 5725 } 5726 5727 lastPCMFrame = firstPCMFrame + pFlac->currentFLACFrame.header.blockSizeInPCMFrames; 5728 if (lastPCMFrame > 0) { 5729 lastPCMFrame -= 1; /* Needs to be zero based. 
*/ 5730 } 5731 5732 if (pFirstPCMFrame) { 5733 *pFirstPCMFrame = firstPCMFrame; 5734 } 5735 if (pLastPCMFrame) { 5736 *pLastPCMFrame = lastPCMFrame; 5737 } 5738 } 5739 5740 static drflac_bool32 drflac__seek_to_first_frame(drflac* pFlac) 5741 { 5742 drflac_bool32 result; 5743 5744 DRFLAC_ASSERT(pFlac != NULL); 5745 5746 result = drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes); 5747 5748 DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame)); 5749 pFlac->currentPCMFrame = 0; 5750 5751 return result; 5752 } 5753 5754 static DRFLAC_INLINE drflac_result drflac__seek_to_next_flac_frame(drflac* pFlac) 5755 { 5756 /* This function should only ever be called while the decoder is sitting on the first byte past the FRAME_HEADER section. */ 5757 DRFLAC_ASSERT(pFlac != NULL); 5758 return drflac__seek_flac_frame(pFlac); 5759 } 5760 5761 5762 static drflac_uint64 drflac__seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 pcmFramesToSeek) 5763 { 5764 drflac_uint64 pcmFramesRead = 0; 5765 while (pcmFramesToSeek > 0) { 5766 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) { 5767 if (!drflac__read_and_decode_next_flac_frame(pFlac)) { 5768 break; /* Couldn't read the next frame, so just break from the loop and return. */ 5769 } 5770 } else { 5771 if (pFlac->currentFLACFrame.pcmFramesRemaining > pcmFramesToSeek) { 5772 pcmFramesRead += pcmFramesToSeek; 5773 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)pcmFramesToSeek; /* <-- Safe cast. Will always be < currentFrame.pcmFramesRemaining < 65536. 
*/
                pcmFramesToSeek = 0;
            } else {
                pcmFramesRead += pFlac->currentFLACFrame.pcmFramesRemaining;
                pcmFramesToSeek -= pFlac->currentFLACFrame.pcmFramesRemaining;
                pFlac->currentFLACFrame.pcmFramesRemaining = 0;
            }
        }
    }

    pFlac->currentPCMFrame += pcmFramesRead;
    return pcmFramesRead;
}


static drflac_bool32 drflac__seek_to_pcm_frame__brute_force(drflac* pFlac, drflac_uint64 pcmFrameIndex)
{
    drflac_bool32 isMidFrame = DRFLAC_FALSE;
    drflac_uint64 runningPCMFrameCount;

    DRFLAC_ASSERT(pFlac != NULL);

    /* If we are seeking forward we start from the current position. Otherwise we need to start all the way from the start of the file. */
    if (pcmFrameIndex >= pFlac->currentPCMFrame) {
        /* Seeking forward. Need to seek from the current position. */
        runningPCMFrameCount = pFlac->currentPCMFrame;

        /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
        if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
                return DRFLAC_FALSE;
            }
        } else {
            isMidFrame = DRFLAC_TRUE;
        }
    } else {
        /* Seeking backwards. Need to seek from the start of the file. */
        runningPCMFrameCount = 0;

        /* Move back to the start. */
        if (!drflac__seek_to_first_frame(pFlac)) {
            return DRFLAC_FALSE;
        }

        /* Read the header of the first frame in preparation for sample-exact seeking below. */
        if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
            return DRFLAC_FALSE;
        }
    }

    /*
    We need to find the frame that contains the target sample as quickly as possible. To do this, we iterate over each frame and inspect its
    header.
If based on the header we can determine that the frame contains the sample, we do a full decode of that frame. 5826 */ 5827 for (;;) { 5828 drflac_uint64 pcmFrameCountInThisFLACFrame; 5829 drflac_uint64 firstPCMFrameInFLACFrame = 0; 5830 drflac_uint64 lastPCMFrameInFLACFrame = 0; 5831 5832 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame); 5833 5834 pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1; 5835 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) { 5836 /* 5837 The sample should be in this frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend 5838 it never existed and keep iterating. 5839 */ 5840 drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount; 5841 5842 if (!isMidFrame) { 5843 drflac_result result = drflac__decode_flac_frame(pFlac); 5844 if (result == DRFLAC_SUCCESS) { 5845 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */ 5846 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */ 5847 } else { 5848 if (result == DRFLAC_CRC_MISMATCH) { 5849 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */ 5850 } else { 5851 return DRFLAC_FALSE; 5852 } 5853 } 5854 } else { 5855 /* We started seeking mid-frame which means we need to skip the frame decoding part. */ 5856 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; 5857 } 5858 } else { 5859 /* 5860 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this 5861 frame never existed and leave the running sample count untouched. 
5862 */ 5863 if (!isMidFrame) { 5864 drflac_result result = drflac__seek_to_next_flac_frame(pFlac); 5865 if (result == DRFLAC_SUCCESS) { 5866 runningPCMFrameCount += pcmFrameCountInThisFLACFrame; 5867 } else { 5868 if (result == DRFLAC_CRC_MISMATCH) { 5869 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */ 5870 } else { 5871 return DRFLAC_FALSE; 5872 } 5873 } 5874 } else { 5875 /* 5876 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with 5877 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header. 5878 */ 5879 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining; 5880 pFlac->currentFLACFrame.pcmFramesRemaining = 0; 5881 isMidFrame = DRFLAC_FALSE; 5882 } 5883 5884 /* If we are seeking to the end of the file and we've just hit it, we're done. */ 5885 if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) { 5886 return DRFLAC_TRUE; 5887 } 5888 } 5889 5890 next_iteration: 5891 /* Grab the next frame in preparation for the next iteration. */ 5892 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) { 5893 return DRFLAC_FALSE; 5894 } 5895 } 5896 } 5897 5898 5899 #if !defined(DR_FLAC_NO_CRC) 5900 /* 5901 We use an average compression ratio to determine our approximate start location. FLAC files are generally about 50%-70% the size of their 5902 uncompressed counterparts so we'll use this as a basis. I'm going to split the middle and use a factor of 0.6 to determine the starting 5903 location. 
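As a rough worked example of that estimate (a standalone sketch, not dr_flac code): the uncompressed size of the PCM frames between the current position and the target is scaled by the ratio to guess how many bytes to skip, and the result is clamped to the search range. The name approx_byte_offset() and its parameters are hypothetical, and the sketch uses integer math (6/10) where the library uses a float factor.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, not part of dr_flac. Estimates the byte offset of a
// target that lies pcmFrameDelta PCM frames ahead of byteRangeLo.
static uint64_t approx_byte_offset(uint64_t pcmFrameDelta, uint32_t channels, uint32_t bitsPerSample, uint64_t byteRangeLo, uint64_t byteRangeHi)
{
    uint64_t uncompressedBytes;
    uint64_t target;

    assert(byteRangeLo <= byteRangeHi);

    // Size the skipped audio would occupy as raw PCM.
    uncompressedBytes = (pcmFrameDelta * channels * bitsPerSample) / 8;

    // Scale by the assumed compression ratio of 0.6.
    target = byteRangeLo + (uncompressedBytes * 6) / 10;

    // Never aim past the end of the search range.
    if (target > byteRangeHi) {
        target = byteRangeHi;
    }

    return target;
}
```

Seeking one second ahead in 44.1 kHz, 16-bit stereo audio skips 176400 bytes of raw PCM, so the first guess lands about 105840 bytes past the start of the range.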
5904 */ 5905 #define DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO 0.6f 5906 5907 static drflac_bool32 drflac__seek_to_approximate_flac_frame_to_byte(drflac* pFlac, drflac_uint64 targetByte, drflac_uint64 rangeLo, drflac_uint64 rangeHi, drflac_uint64* pLastSuccessfulSeekOffset) 5908 { 5909 DRFLAC_ASSERT(pFlac != NULL); 5910 DRFLAC_ASSERT(pLastSuccessfulSeekOffset != NULL); 5911 DRFLAC_ASSERT(targetByte >= rangeLo); 5912 DRFLAC_ASSERT(targetByte <= rangeHi); 5913 5914 *pLastSuccessfulSeekOffset = pFlac->firstFLACFramePosInBytes; 5915 5916 for (;;) { 5917 /* After rangeLo == rangeHi == targetByte fails, we need to break out. */ 5918 drflac_uint64 lastTargetByte = targetByte; 5919 5920 /* When seeking to a byte, failure probably means we've attempted to seek beyond the end of the stream. To counter this we just halve it each attempt. */ 5921 if (!drflac__seek_to_byte(&pFlac->bs, targetByte)) { 5922 /* If we couldn't even seek to the first byte in the stream we have a problem. Just abandon the whole thing. */ 5923 if (targetByte == 0) { 5924 drflac__seek_to_first_frame(pFlac); /* Try to recover. */ 5925 return DRFLAC_FALSE; 5926 } 5927 5928 /* Halve the byte location and continue. */ 5929 targetByte = rangeLo + ((rangeHi - rangeLo)/2); 5930 rangeHi = targetByte; 5931 } else { 5932 /* Getting here should mean that we have seeked to an appropriate byte. */ 5933 5934 /* Clear the details of the FLAC frame so we don't misreport data. */ 5935 DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame)); 5936 5937 /* 5938 Now seek to the next FLAC frame. We need to decode the entire frame (not just the header) because it's possible for the header to incorrectly pass the 5939 CRC check and return bad data. We need to decode the entire frame to be more certain. Although this seems unlikely, this has happened to me in testing 5940 so it needs to stay this way for now. 
5941 */ 5942 #if 1 5943 if (!drflac__read_and_decode_next_flac_frame(pFlac)) { 5944 /* Halve the byte location and continue. */ 5945 targetByte = rangeLo + ((rangeHi - rangeLo)/2); 5946 rangeHi = targetByte; 5947 } else { 5948 break; 5949 } 5950 #else 5951 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) { 5952 /* Halve the byte location and continue. */ 5953 targetByte = rangeLo + ((rangeHi - rangeLo)/2); 5954 rangeHi = targetByte; 5955 } else { 5956 break; 5957 } 5958 #endif 5959 } 5960 5961 /* We already tried this byte and there are no more to try, break out. */ 5962 if(targetByte == lastTargetByte) { 5963 return DRFLAC_FALSE; 5964 } 5965 } 5966 5967 /* The current PCM frame needs to be updated based on the frame we just seeked to. */ 5968 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL); 5969 5970 DRFLAC_ASSERT(targetByte <= rangeHi); 5971 5972 *pLastSuccessfulSeekOffset = targetByte; 5973 return DRFLAC_TRUE; 5974 } 5975 5976 static drflac_bool32 drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 offset) 5977 { 5978 /* This section of code would be used if we were only decoding the FLAC frame header when calling drflac__seek_to_approximate_flac_frame_to_byte(). */ 5979 #if 0 5980 if (drflac__decode_flac_frame(pFlac) != DRFLAC_SUCCESS) { 5981 /* We failed to decode this frame which may be due to it being corrupt. We'll just use the next valid FLAC frame. 
*/ 5982 if (drflac__read_and_decode_next_flac_frame(pFlac) == DRFLAC_FALSE) { 5983 return DRFLAC_FALSE; 5984 } 5985 } 5986 #endif 5987 5988 return drflac__seek_forward_by_pcm_frames(pFlac, offset) == offset; 5989 } 5990 5991 5992 static drflac_bool32 drflac__seek_to_pcm_frame__binary_search_internal(drflac* pFlac, drflac_uint64 pcmFrameIndex, drflac_uint64 byteRangeLo, drflac_uint64 byteRangeHi) 5993 { 5994 /* This assumes pFlac->currentPCMFrame is sitting on byteRangeLo upon entry. */ 5995 5996 drflac_uint64 targetByte; 5997 drflac_uint64 pcmRangeLo = pFlac->totalPCMFrameCount; 5998 drflac_uint64 pcmRangeHi = 0; 5999 drflac_uint64 lastSuccessfulSeekOffset = (drflac_uint64)-1; 6000 drflac_uint64 closestSeekOffsetBeforeTargetPCMFrame = byteRangeLo; 6001 drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096; 6002 6003 targetByte = byteRangeLo + (drflac_uint64)(((drflac_int64)((pcmFrameIndex - pFlac->currentPCMFrame) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO); 6004 if (targetByte > byteRangeHi) { 6005 targetByte = byteRangeHi; 6006 } 6007 6008 for (;;) { 6009 if (drflac__seek_to_approximate_flac_frame_to_byte(pFlac, targetByte, byteRangeLo, byteRangeHi, &lastSuccessfulSeekOffset)) { 6010 /* We found a FLAC frame. We need to check if it contains the sample we're looking for. */ 6011 drflac_uint64 newPCMRangeLo; 6012 drflac_uint64 newPCMRangeHi; 6013 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &newPCMRangeLo, &newPCMRangeHi); 6014 6015 /* If we selected the same frame, it means we should be pretty close. Just decode the rest. */ 6016 if (pcmRangeLo == newPCMRangeLo) { 6017 if (!drflac__seek_to_approximate_flac_frame_to_byte(pFlac, closestSeekOffsetBeforeTargetPCMFrame, closestSeekOffsetBeforeTargetPCMFrame, byteRangeHi, &lastSuccessfulSeekOffset)) { 6018 break; /* Failed to seek to closest frame. 
*/ 6019 } 6020 6021 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) { 6022 return DRFLAC_TRUE; 6023 } else { 6024 break; /* Failed to seek forward. */ 6025 } 6026 } 6027 6028 pcmRangeLo = newPCMRangeLo; 6029 pcmRangeHi = newPCMRangeHi; 6030 6031 if (pcmRangeLo <= pcmFrameIndex && pcmRangeHi >= pcmFrameIndex) { 6032 /* The target PCM frame is in this FLAC frame. */ 6033 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame) ) { 6034 return DRFLAC_TRUE; 6035 } else { 6036 break; /* Failed to seek to FLAC frame. */ 6037 } 6038 } else { 6039 const float approxCompressionRatio = (drflac_int64)(lastSuccessfulSeekOffset - pFlac->firstFLACFramePosInBytes) / ((drflac_int64)(pcmRangeLo * pFlac->channels * pFlac->bitsPerSample)/8.0f); 6040 6041 if (pcmRangeLo > pcmFrameIndex) { 6042 /* We seeked too far forward. We need to move our target byte backward and try again. */ 6043 byteRangeHi = lastSuccessfulSeekOffset; 6044 if (byteRangeLo > byteRangeHi) { 6045 byteRangeLo = byteRangeHi; 6046 } 6047 6048 targetByte = byteRangeLo + ((byteRangeHi - byteRangeLo) / 2); 6049 if (targetByte < byteRangeLo) { 6050 targetByte = byteRangeLo; 6051 } 6052 } else /*if (pcmRangeHi < pcmFrameIndex)*/ { 6053 /* We didn't seek far enough. We need to move our target byte forward and try again. */ 6054 6055 /* If we're close enough we can just seek forward. */ 6056 if ((pcmFrameIndex - pcmRangeLo) < seekForwardThreshold) { 6057 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) { 6058 return DRFLAC_TRUE; 6059 } else { 6060 break; /* Failed to seek to FLAC frame. 
*/
                        }
                    } else {
                        byteRangeLo = lastSuccessfulSeekOffset;
                        if (byteRangeHi < byteRangeLo) {
                            byteRangeHi = byteRangeLo;
                        }

                        targetByte = lastSuccessfulSeekOffset + (drflac_uint64)(((drflac_int64)((pcmFrameIndex-pcmRangeLo) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * approxCompressionRatio);
                        if (targetByte > byteRangeHi) {
                            targetByte = byteRangeHi;
                        }

                        if (closestSeekOffsetBeforeTargetPCMFrame < lastSuccessfulSeekOffset) {
                            closestSeekOffsetBeforeTargetPCMFrame = lastSuccessfulSeekOffset;
                        }
                    }
                }
            }
        } else {
            /* Getting here is really bad. We just recover as best we can by moving to the first frame in the stream, and then abort. */
            break;
        }
    }

    drflac__seek_to_first_frame(pFlac); /* <-- Try to recover. */
    return DRFLAC_FALSE;
}

static drflac_bool32 drflac__seek_to_pcm_frame__binary_search(drflac* pFlac, drflac_uint64 pcmFrameIndex)
{
    drflac_uint64 byteRangeLo;
    drflac_uint64 byteRangeHi;
    drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;

    /* Our algorithm assumes the FLAC stream is currently sitting at the start. */
    if (drflac__seek_to_first_frame(pFlac) == DRFLAC_FALSE) {
        return DRFLAC_FALSE;
    }

    /* If we're close enough to the start, just move to the start and seek forward. */
    if (pcmFrameIndex < seekForwardThreshold) {
        return drflac__seek_forward_by_pcm_frames(pFlac, pcmFrameIndex) == pcmFrameIndex;
    }

    /*
    Our starting byte range is the byte position of the first FLAC frame and the approximate end of the file as if it were completely uncompressed. This ensures
    the entire file is included, even though most of the time it'll exceed the end of the actual stream. This is OK as the frame searching logic will handle it.
    */
    byteRangeLo = pFlac->firstFLACFramePosInBytes;
    byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);

    return drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi);
}
#endif  /* !DR_FLAC_NO_CRC */

static drflac_bool32 drflac__seek_to_pcm_frame__seek_table(drflac* pFlac, drflac_uint64 pcmFrameIndex)
{
    drflac_uint32 iClosestSeekpoint = 0;
    drflac_bool32 isMidFrame = DRFLAC_FALSE;
    drflac_uint64 runningPCMFrameCount;
    drflac_uint32 iSeekpoint;


    DRFLAC_ASSERT(pFlac != NULL);

    if (pFlac->pSeekpoints == NULL || pFlac->seekpointCount == 0) {
        return DRFLAC_FALSE;
    }

    /* Do not use the seek table if pcmFrameIndex is not covered by it. */
    if (pFlac->pSeekpoints[0].firstPCMFrame > pcmFrameIndex) {
        return DRFLAC_FALSE;
    }

    for (iSeekpoint = 0; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
        if (pFlac->pSeekpoints[iSeekpoint].firstPCMFrame >= pcmFrameIndex) {
            break;
        }

        iClosestSeekpoint = iSeekpoint;
    }

    /* There have been cases where the seek table contains only zeros. We need to do some basic validation on the closest seekpoint. */
    if (pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount == 0 || pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount > pFlac->maxBlockSizeInPCMFrames) {
        return DRFLAC_FALSE;
    }
    if (pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame > pFlac->totalPCMFrameCount && pFlac->totalPCMFrameCount > 0) {
        return DRFLAC_FALSE;
    }

#if !defined(DR_FLAC_NO_CRC)
    /* At this point we should know the closest seek point. We can use a binary search for this. We need to know the total sample count for this.
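
    For reference, the seekpoint selection done by the loop above can be sketched on its own. The struct and function names here are
    hypothetical stand-ins and not part of dr_flac. The sketch picks the last seekpoint whose first PCM frame is strictly before the
    target, falling back to the first seekpoint, and refuses to use the table when the target sits before the first seekpoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a seekpoint. Only the field used by the search matters here.
typedef struct
{
    uint64_t firstPCMFrame;
    uint64_t flacFrameOffset;
    uint16_t pcmFrameCount;
} example_seekpoint;

// Returns 1 and writes the index of the closest seekpoint, or 0 if the table cannot be used.
static int find_closest_seekpoint(const example_seekpoint* pSeekpoints, uint32_t seekpointCount, uint64_t pcmFrameIndex, uint32_t* pIndex)
{
    uint32_t iSeekpoint;
    uint32_t iClosest = 0;

    assert(pIndex != NULL);

    if (pSeekpoints == NULL || seekpointCount == 0) {
        return 0;
    }

    // The target is not covered by the table.
    if (pSeekpoints[0].firstPCMFrame > pcmFrameIndex) {
        return 0;
    }

    for (iSeekpoint = 0; iSeekpoint < seekpointCount; ++iSeekpoint) {
        if (pSeekpoints[iSeekpoint].firstPCMFrame >= pcmFrameIndex) {
            break;
        }

        iClosest = iSeekpoint;
    }

    *pIndex = iClosest;
    return 1;
}
```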
*/ 6153 if (pFlac->totalPCMFrameCount > 0) { 6154 drflac_uint64 byteRangeLo; 6155 drflac_uint64 byteRangeHi; 6156 6157 byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f); 6158 byteRangeLo = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset; 6159 6160 /* 6161 If our closest seek point is not the last one, we only need to search between it and the next one. The section below calculates an appropriate starting 6162 value for byteRangeHi which will clamp it appropriately. 6163 6164 Note that the next seekpoint must have an offset greater than the closest seekpoint because otherwise our binary search algorithm will break down. There 6165 have been cases where a seektable consists of seek points where every byte offset is set to 0 which causes problems. If this happens we need to abort. 6166 */ 6167 if (iClosestSeekpoint < pFlac->seekpointCount-1) { 6168 drflac_uint32 iNextSeekpoint = iClosestSeekpoint + 1; 6169 6170 /* Basic validation on the seekpoints to ensure they're usable. */ 6171 if (pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset >= pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset || pFlac->pSeekpoints[iNextSeekpoint].pcmFrameCount == 0) { 6172 return DRFLAC_FALSE; /* The next seekpoint doesn't look right. The seek table cannot be trusted from here. Abort. */ 6173 } 6174 6175 if (pFlac->pSeekpoints[iNextSeekpoint].firstPCMFrame != (((drflac_uint64)0xFFFFFFFF << 32) | 0xFFFFFFFF)) { /* Make sure it's not a placeholder seekpoint. */ 6176 byteRangeHi = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset - 1; /* byteRangeHi must be zero based. 
*/ 6177 } 6178 } 6179 6180 if (drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) { 6181 if (drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) { 6182 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL); 6183 6184 if (drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi)) { 6185 return DRFLAC_TRUE; 6186 } 6187 } 6188 } 6189 } 6190 #endif /* !DR_FLAC_NO_CRC */ 6191 6192 /* Getting here means we need to use a slower algorithm because the binary search method failed or cannot be used. */ 6193 6194 /* 6195 If we are seeking forward and the closest seekpoint is _before_ the current sample, we just seek forward from where we are. Otherwise we start seeking 6196 from the seekpoint's first sample. 6197 */ 6198 if (pcmFrameIndex >= pFlac->currentPCMFrame && pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame <= pFlac->currentPCMFrame) { 6199 /* Optimized case. Just seek forward from where we are. */ 6200 runningPCMFrameCount = pFlac->currentPCMFrame; 6201 6202 /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */ 6203 if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) { 6204 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) { 6205 return DRFLAC_FALSE; 6206 } 6207 } else { 6208 isMidFrame = DRFLAC_TRUE; 6209 } 6210 } else { 6211 /* Slower case. Seek to the start of the seekpoint and then seek forward from there. 
*/ 6212 runningPCMFrameCount = pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame; 6213 6214 if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) { 6215 return DRFLAC_FALSE; 6216 } 6217 6218 /* Grab the frame the seekpoint is sitting on in preparation for the sample-exact seeking below. */ 6219 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) { 6220 return DRFLAC_FALSE; 6221 } 6222 } 6223 6224 for (;;) { 6225 drflac_uint64 pcmFrameCountInThisFLACFrame; 6226 drflac_uint64 firstPCMFrameInFLACFrame = 0; 6227 drflac_uint64 lastPCMFrameInFLACFrame = 0; 6228 6229 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame); 6230 6231 pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1; 6232 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) { 6233 /* 6234 The sample should be in this frame. We need to fully decode it, but if it's an invalid frame (a CRC mismatch) we need to pretend 6235 it never existed and keep iterating. 6236 */ 6237 drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount; 6238 6239 if (!isMidFrame) { 6240 drflac_result result = drflac__decode_flac_frame(pFlac); 6241 if (result == DRFLAC_SUCCESS) { 6242 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */ 6243 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */ 6244 } else { 6245 if (result == DRFLAC_CRC_MISMATCH) { 6246 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */ 6247 } else { 6248 return DRFLAC_FALSE; 6249 } 6250 } 6251 } else { 6252 /* We started seeking mid-frame which means we need to skip the frame decoding part. 
*/ 6253 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; 6254 } 6255 } else { 6256 /* 6257 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this 6258 frame never existed and leave the running sample count untouched. 6259 */ 6260 if (!isMidFrame) { 6261 drflac_result result = drflac__seek_to_next_flac_frame(pFlac); 6262 if (result == DRFLAC_SUCCESS) { 6263 runningPCMFrameCount += pcmFrameCountInThisFLACFrame; 6264 } else { 6265 if (result == DRFLAC_CRC_MISMATCH) { 6266 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */ 6267 } else { 6268 return DRFLAC_FALSE; 6269 } 6270 } 6271 } else { 6272 /* 6273 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with 6274 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header. 6275 */ 6276 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining; 6277 pFlac->currentFLACFrame.pcmFramesRemaining = 0; 6278 isMidFrame = DRFLAC_FALSE; 6279 } 6280 6281 /* If we are seeking to the end of the file and we've just hit it, we're done. */ 6282 if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) { 6283 return DRFLAC_TRUE; 6284 } 6285 } 6286 6287 next_iteration: 6288 /* Grab the next frame in preparation for the next iteration. */ 6289 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) { 6290 return DRFLAC_FALSE; 6291 } 6292 } 6293 } 6294 6295 6296 #ifndef DR_FLAC_NO_OGG 6297 typedef struct 6298 { 6299 drflac_uint8 capturePattern[4]; /* Should be "OggS" */ 6300 drflac_uint8 structureVersion; /* Always 0. 
*/
    drflac_uint8 headerType;
    drflac_uint64 granulePosition;
    drflac_uint32 serialNumber;
    drflac_uint32 sequenceNumber;
    drflac_uint32 checksum;
    drflac_uint8 segmentCount;
    drflac_uint8 segmentTable[255];
} drflac_ogg_page_header;
#endif

typedef struct
{
    drflac_read_proc onRead;
    drflac_seek_proc onSeek;
    drflac_meta_proc onMeta;
    drflac_container container;
    void* pUserData;
    void* pUserDataMD;
    drflac_uint32 sampleRate;
    drflac_uint8 channels;
    drflac_uint8 bitsPerSample;
    drflac_uint64 totalPCMFrameCount;
    drflac_uint16 maxBlockSizeInPCMFrames;
    drflac_uint64 runningFilePos;
    drflac_bool32 hasStreamInfoBlock;
    drflac_bool32 hasMetadataBlocks;
    drflac_bs bs;                           /* <-- A bit streamer is required for loading data during initialization. */
    drflac_frame_header firstFrameHeader;   /* <-- The header of the first frame that was read during relaxed initialization. Only set if there is no STREAMINFO block.
*/ 6329 6330 #ifndef DR_FLAC_NO_OGG 6331 drflac_uint32 oggSerial; 6332 drflac_uint64 oggFirstBytePos; 6333 drflac_ogg_page_header oggBosHeader; 6334 #endif 6335 } drflac_init_info; 6336 6337 static DRFLAC_INLINE void drflac__decode_block_header(drflac_uint32 blockHeader, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize) 6338 { 6339 blockHeader = drflac__be2host_32(blockHeader); 6340 *isLastBlock = (drflac_uint8)((blockHeader & 0x80000000UL) >> 31); 6341 *blockType = (drflac_uint8)((blockHeader & 0x7F000000UL) >> 24); 6342 *blockSize = (blockHeader & 0x00FFFFFFUL); 6343 } 6344 6345 static DRFLAC_INLINE drflac_bool32 drflac__read_and_decode_block_header(drflac_read_proc onRead, void* pUserData, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize) 6346 { 6347 drflac_uint32 blockHeader; 6348 6349 *blockSize = 0; 6350 if (onRead(pUserData, &blockHeader, 4) != 4) { 6351 return DRFLAC_FALSE; 6352 } 6353 6354 drflac__decode_block_header(blockHeader, isLastBlock, blockType, blockSize); 6355 return DRFLAC_TRUE; 6356 } 6357 6358 static drflac_bool32 drflac__read_streaminfo(drflac_read_proc onRead, void* pUserData, drflac_streaminfo* pStreamInfo) 6359 { 6360 drflac_uint32 blockSizes; 6361 drflac_uint64 frameSizes = 0; 6362 drflac_uint64 importantProps; 6363 drflac_uint8 md5[16]; 6364 6365 /* min/max block size. */ 6366 if (onRead(pUserData, &blockSizes, 4) != 4) { 6367 return DRFLAC_FALSE; 6368 } 6369 6370 /* min/max frame size. */ 6371 if (onRead(pUserData, &frameSizes, 6) != 6) { 6372 return DRFLAC_FALSE; 6373 } 6374 6375 /* Sample rate, channels, bits per sample and total sample count. 
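
    These four fields are packed into a single big-endian 64-bit value: 20 bits of sample rate, 3 bits of channel count minus one,
    5 bits of bits per sample minus one, and 36 bits of total sample count. A minimal sketch of that layout follows, assuming the value
    has already been converted to host byte order. The names are hypothetical and not part of dr_flac.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical container for the unpacked fields.
typedef struct
{
    uint32_t sampleRate;
    uint8_t  channels;
    uint8_t  bitsPerSample;
    uint64_t totalPCMFrameCount;
} example_stream_props;

// Packs the four fields using the STREAMINFO bit layout (result is in host byte order).
static uint64_t pack_stream_props(uint32_t sampleRate, uint8_t channels, uint8_t bitsPerSample, uint64_t totalPCMFrameCount)
{
    assert(sampleRate <= 0xFFFFF);
    assert(channels >= 1 && channels <= 8);
    assert(bitsPerSample >= 1 && bitsPerSample <= 32);
    assert(totalPCMFrameCount <= 0xFFFFFFFFFULL);

    return ((uint64_t)sampleRate          << 44) |
           ((uint64_t)(channels - 1)      << 41) |
           ((uint64_t)(bitsPerSample - 1) << 36) |
           totalPCMFrameCount;
}

// Reverses pack_stream_props(). Channels and bits per sample are stored minus one.
static example_stream_props unpack_stream_props(uint64_t packed)
{
    example_stream_props props;

    props.sampleRate         = (uint32_t)((packed >> 44) & 0xFFFFF);
    props.channels           = (uint8_t)(((packed >> 41) & 0x7) + 1);
    props.bitsPerSample      = (uint8_t)(((packed >> 36) & 0x1F) + 1);
    props.totalPCMFrameCount = packed & 0xFFFFFFFFFULL;   // Low 36 bits.

    return props;
}
```

    For 44.1kHz, 16-bit stereo the upper four bytes of the packed value are 0A C4 42 F0.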
*/ 6376 if (onRead(pUserData, &importantProps, 8) != 8) { 6377 return DRFLAC_FALSE; 6378 } 6379 6380 /* MD5 */ 6381 if (onRead(pUserData, md5, sizeof(md5)) != sizeof(md5)) { 6382 return DRFLAC_FALSE; 6383 } 6384 6385 blockSizes = drflac__be2host_32(blockSizes); 6386 frameSizes = drflac__be2host_64(frameSizes); 6387 importantProps = drflac__be2host_64(importantProps); 6388 6389 pStreamInfo->minBlockSizeInPCMFrames = (drflac_uint16)((blockSizes & 0xFFFF0000) >> 16); 6390 pStreamInfo->maxBlockSizeInPCMFrames = (drflac_uint16) (blockSizes & 0x0000FFFF); 6391 pStreamInfo->minFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 24)) >> 40); 6392 pStreamInfo->maxFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 0)) >> 16); 6393 pStreamInfo->sampleRate = (drflac_uint32)((importantProps & (((drflac_uint64)0x000FFFFF << 16) << 28)) >> 44); 6394 pStreamInfo->channels = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000000E << 16) << 24)) >> 41) + 1; 6395 pStreamInfo->bitsPerSample = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000001F << 16) << 20)) >> 36) + 1; 6396 pStreamInfo->totalPCMFrameCount = ((importantProps & ((((drflac_uint64)0x0000000F << 16) << 16) | 0xFFFFFFFF))); 6397 DRFLAC_COPY_MEMORY(pStreamInfo->md5, md5, sizeof(md5)); 6398 6399 return DRFLAC_TRUE; 6400 } 6401 6402 6403 static void* drflac__malloc_default(size_t sz, void* pUserData) 6404 { 6405 (void)pUserData; 6406 return DRFLAC_MALLOC(sz); 6407 } 6408 6409 static void* drflac__realloc_default(void* p, size_t sz, void* pUserData) 6410 { 6411 (void)pUserData; 6412 return DRFLAC_REALLOC(p, sz); 6413 } 6414 6415 static void drflac__free_default(void* p, void* pUserData) 6416 { 6417 (void)pUserData; 6418 DRFLAC_FREE(p); 6419 } 6420 6421 6422 static void* drflac__malloc_from_callbacks(size_t sz, const drflac_allocation_callbacks* pAllocationCallbacks) 6423 { 6424 if (pAllocationCallbacks == NULL) { 6425 return NULL; 
6426 } 6427 6428 if (pAllocationCallbacks->onMalloc != NULL) { 6429 return pAllocationCallbacks->onMalloc(sz, pAllocationCallbacks->pUserData); 6430 } 6431 6432 /* Try using realloc(). */ 6433 if (pAllocationCallbacks->onRealloc != NULL) { 6434 return pAllocationCallbacks->onRealloc(NULL, sz, pAllocationCallbacks->pUserData); 6435 } 6436 6437 return NULL; 6438 } 6439 6440 static void* drflac__realloc_from_callbacks(void* p, size_t szNew, size_t szOld, const drflac_allocation_callbacks* pAllocationCallbacks) 6441 { 6442 if (pAllocationCallbacks == NULL) { 6443 return NULL; 6444 } 6445 6446 if (pAllocationCallbacks->onRealloc != NULL) { 6447 return pAllocationCallbacks->onRealloc(p, szNew, pAllocationCallbacks->pUserData); 6448 } 6449 6450 /* Try emulating realloc() in terms of malloc()/free(). */ 6451 if (pAllocationCallbacks->onMalloc != NULL && pAllocationCallbacks->onFree != NULL) { 6452 void* p2; 6453 6454 p2 = pAllocationCallbacks->onMalloc(szNew, pAllocationCallbacks->pUserData); 6455 if (p2 == NULL) { 6456 return NULL; 6457 } 6458 6459 if (p != NULL) { 6460 DRFLAC_COPY_MEMORY(p2, p, szOld); 6461 pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData); 6462 } 6463 6464 return p2; 6465 } 6466 6467 return NULL; 6468 } 6469 6470 static void drflac__free_from_callbacks(void* p, const drflac_allocation_callbacks* pAllocationCallbacks) 6471 { 6472 if (p == NULL || pAllocationCallbacks == NULL) { 6473 return; 6474 } 6475 6476 if (pAllocationCallbacks->onFree != NULL) { 6477 pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData); 6478 } 6479 } 6480 6481 6482 static drflac_bool32 drflac__read_and_decode_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_uint64* pFirstFramePos, drflac_uint64* pSeektablePos, drflac_uint32* pSeekpointCount, drflac_allocation_callbacks* pAllocationCallbacks) 6483 { 6484 /* 6485 We want to keep track of the byte position in the stream of the 
seektable. At the time of calling this function we know that 6486 we'll be sitting on byte 42. 6487 */ 6488 drflac_uint64 runningFilePos = 42; 6489 drflac_uint64 seektablePos = 0; 6490 drflac_uint32 seektableSize = 0; 6491 6492 for (;;) { 6493 drflac_metadata metadata; 6494 drflac_uint8 isLastBlock = 0; 6495 drflac_uint8 blockType = 0; 6496 drflac_uint32 blockSize; 6497 if (drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize) == DRFLAC_FALSE) { 6498 return DRFLAC_FALSE; 6499 } 6500 runningFilePos += 4; 6501 6502 metadata.type = blockType; 6503 metadata.pRawData = NULL; 6504 metadata.rawDataSize = 0; 6505 6506 switch (blockType) 6507 { 6508 case DRFLAC_METADATA_BLOCK_TYPE_APPLICATION: 6509 { 6510 if (blockSize < 4) { 6511 return DRFLAC_FALSE; 6512 } 6513 6514 if (onMeta) { 6515 void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks); 6516 if (pRawData == NULL) { 6517 return DRFLAC_FALSE; 6518 } 6519 6520 if (onRead(pUserData, pRawData, blockSize) != blockSize) { 6521 drflac__free_from_callbacks(pRawData, pAllocationCallbacks); 6522 return DRFLAC_FALSE; 6523 } 6524 6525 metadata.pRawData = pRawData; 6526 metadata.rawDataSize = blockSize; 6527 metadata.data.application.id = drflac__be2host_32(*(drflac_uint32*)pRawData); 6528 metadata.data.application.pData = (const void*)((drflac_uint8*)pRawData + sizeof(drflac_uint32)); 6529 metadata.data.application.dataSize = blockSize - sizeof(drflac_uint32); 6530 onMeta(pUserDataMD, &metadata); 6531 6532 drflac__free_from_callbacks(pRawData, pAllocationCallbacks); 6533 } 6534 } break; 6535 6536 case DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE: 6537 { 6538 seektablePos = runningFilePos; 6539 seektableSize = blockSize; 6540 6541 if (onMeta) { 6542 drflac_uint32 seekpointCount; 6543 drflac_uint32 iSeekpoint; 6544 void* pRawData; 6545 6546 seekpointCount = blockSize/DRFLAC_SEEKPOINT_SIZE_IN_BYTES; 6547 6548 pRawData = drflac__malloc_from_callbacks(seekpointCount * 
sizeof(drflac_seekpoint), pAllocationCallbacks); 6549 if (pRawData == NULL) { 6550 return DRFLAC_FALSE; 6551 } 6552 6553 /* We need to read seekpoint by seekpoint and do some processing. */ 6554 for (iSeekpoint = 0; iSeekpoint < seekpointCount; ++iSeekpoint) { 6555 drflac_seekpoint* pSeekpoint = (drflac_seekpoint*)pRawData + iSeekpoint; 6556 6557 if (onRead(pUserData, pSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) != DRFLAC_SEEKPOINT_SIZE_IN_BYTES) { 6558 drflac__free_from_callbacks(pRawData, pAllocationCallbacks); 6559 return DRFLAC_FALSE; 6560 } 6561 6562 /* Endian swap. */ 6563 pSeekpoint->firstPCMFrame = drflac__be2host_64(pSeekpoint->firstPCMFrame); 6564 pSeekpoint->flacFrameOffset = drflac__be2host_64(pSeekpoint->flacFrameOffset); 6565 pSeekpoint->pcmFrameCount = drflac__be2host_16(pSeekpoint->pcmFrameCount); 6566 } 6567 6568 metadata.pRawData = pRawData; 6569 metadata.rawDataSize = blockSize; 6570 metadata.data.seektable.seekpointCount = seekpointCount; 6571 metadata.data.seektable.pSeekpoints = (const drflac_seekpoint*)pRawData; 6572 6573 onMeta(pUserDataMD, &metadata); 6574 6575 drflac__free_from_callbacks(pRawData, pAllocationCallbacks); 6576 } 6577 } break; 6578 6579 case DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT: 6580 { 6581 if (blockSize < 8) { 6582 return DRFLAC_FALSE; 6583 } 6584 6585 if (onMeta) { 6586 void* pRawData; 6587 const char* pRunningData; 6588 const char* pRunningDataEnd; 6589 drflac_uint32 i; 6590 6591 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks); 6592 if (pRawData == NULL) { 6593 return DRFLAC_FALSE; 6594 } 6595 6596 if (onRead(pUserData, pRawData, blockSize) != blockSize) { 6597 drflac__free_from_callbacks(pRawData, pAllocationCallbacks); 6598 return DRFLAC_FALSE; 6599 } 6600 6601 metadata.pRawData = pRawData; 6602 metadata.rawDataSize = blockSize; 6603 6604 pRunningData = (const char*)pRawData; 6605 pRunningDataEnd = (const char*)pRawData + blockSize; 6606 6607 metadata.data.vorbis_comment.vendorLength = 
drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4; 6608 6609 /* Need space for the rest of the block */ 6610 if ((pRunningDataEnd - pRunningData) - 4 < (drflac_int64)metadata.data.vorbis_comment.vendorLength) { /* <-- Note the order of operations to avoid overflow to a valid value */ 6611 drflac__free_from_callbacks(pRawData, pAllocationCallbacks); 6612 return DRFLAC_FALSE; 6613 } 6614 metadata.data.vorbis_comment.vendor = pRunningData; pRunningData += metadata.data.vorbis_comment.vendorLength; 6615 metadata.data.vorbis_comment.commentCount = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4; 6616 6617 /* Need space for 'commentCount' comments after the block, which at minimum is a drflac_uint32 per comment */ 6618 if ((pRunningDataEnd - pRunningData) / sizeof(drflac_uint32) < metadata.data.vorbis_comment.commentCount) { /* <-- Note the order of operations to avoid overflow to a valid value */ 6619 drflac__free_from_callbacks(pRawData, pAllocationCallbacks); 6620 return DRFLAC_FALSE; 6621 } 6622 metadata.data.vorbis_comment.pComments = pRunningData; 6623 6624 /* Check that the comments section is valid before passing it to the callback */ 6625 for (i = 0; i < metadata.data.vorbis_comment.commentCount; ++i) { 6626 drflac_uint32 commentLength; 6627 6628 if (pRunningDataEnd - pRunningData < 4) { 6629 drflac__free_from_callbacks(pRawData, pAllocationCallbacks); 6630 return DRFLAC_FALSE; 6631 } 6632 6633 commentLength = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4; 6634 if (pRunningDataEnd - pRunningData < (drflac_int64)commentLength) { /* <-- Note the order of operations to avoid overflow to a valid value */ 6635 drflac__free_from_callbacks(pRawData, pAllocationCallbacks); 6636 return DRFLAC_FALSE; 6637 } 6638 pRunningData += commentLength; 6639 } 6640 6641 onMeta(pUserDataMD, &metadata); 6642 6643 drflac__free_from_callbacks(pRawData, pAllocationCallbacks); 6644 } 6645 } break; 6646 6647 case 
DRFLAC_METADATA_BLOCK_TYPE_CUESHEET:
            {
                if (blockSize < 396) {
                    return DRFLAC_FALSE;
                }

                if (onMeta) {
                    void* pRawData;
                    const char* pRunningData;
                    const char* pRunningDataEnd;
                    size_t bufferSize;
                    drflac_uint8 iTrack;
                    drflac_uint8 iIndex;
                    void* pTrackData;

                    /*
                    This needs to be loaded in two passes. The first pass is used to calculate the size of the memory allocation
                    we need for storing the necessary data. The second pass will fill that buffer with usable data.
                    */
                    pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
                    if (pRawData == NULL) {
                        return DRFLAC_FALSE;
                    }

                    if (onRead(pUserData, pRawData, blockSize) != blockSize) {
                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                        return DRFLAC_FALSE;
                    }

                    metadata.pRawData = pRawData;
                    metadata.rawDataSize = blockSize;

                    pRunningData = (const char*)pRawData;
                    pRunningDataEnd = (const char*)pRawData + blockSize;

                    DRFLAC_COPY_MEMORY(metadata.data.cuesheet.catalog, pRunningData, 128); pRunningData += 128;
                    metadata.data.cuesheet.leadInSampleCount = drflac__be2host_64(*(const drflac_uint64*)pRunningData); pRunningData += 8;
                    metadata.data.cuesheet.isCD = (pRunningData[0] & 0x80) != 0; pRunningData += 259;
                    metadata.data.cuesheet.trackCount = pRunningData[0]; pRunningData += 1;
                    metadata.data.cuesheet.pTrackData = NULL; /* Will be filled later. */

                    /* Pass 1: Calculate the size of the buffer for the track data. */
                    {
                        const char* pRunningDataSaved = pRunningData; /* Will be restored at the end in preparation for the second pass. */

                        bufferSize = metadata.data.cuesheet.trackCount * DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES;

                        for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
                            drflac_uint8 indexCount;
                            drflac_uint32 indexPointSize;

                            if (pRunningDataEnd - pRunningData < DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES) {
                                drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                                return DRFLAC_FALSE;
                            }

                            /* Skip to the index point count */
                            pRunningData += 35;

                            indexCount = pRunningData[0];
                            pRunningData += 1;

                            bufferSize += indexCount * sizeof(drflac_cuesheet_track_index);

                            /* Quick validation check. */
                            indexPointSize = indexCount * DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
                            if (pRunningDataEnd - pRunningData < (drflac_int64)indexPointSize) {
                                drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                                return DRFLAC_FALSE;
                            }

                            pRunningData += indexPointSize;
                        }

                        pRunningData = pRunningDataSaved;
                    }

                    /* Pass 2: Allocate a buffer and fill the data. Validation was done in the step above so can be skipped. */
                    {
                        char* pRunningTrackData;

                        pTrackData = drflac__malloc_from_callbacks(bufferSize, pAllocationCallbacks);
                        if (pTrackData == NULL) {
                            drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                            return DRFLAC_FALSE;
                        }

                        pRunningTrackData = (char*)pTrackData;

                        for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
                            drflac_uint8 indexCount;

                            DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES);
                            pRunningData += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1; /* Skip forward, but not beyond the last byte in the CUESHEET_TRACK block which is the index count. */
                            pRunningTrackData += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1;

                            /* Grab the index count for the next part. */
                            indexCount = pRunningData[0];
                            pRunningData += 1;
                            pRunningTrackData += 1;

                            /* Extract each track index. */
                            for (iIndex = 0; iIndex < indexCount; ++iIndex) {
                                drflac_cuesheet_track_index* pTrackIndex = (drflac_cuesheet_track_index*)pRunningTrackData;

                                DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES);
                                pRunningData += DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
                                pRunningTrackData += sizeof(drflac_cuesheet_track_index);

                                pTrackIndex->offset = drflac__be2host_64(pTrackIndex->offset);
                            }
                        }

                        metadata.data.cuesheet.pTrackData = pTrackData;
                    }

                    /* The original data is no longer needed. */
                    drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                    pRawData = NULL;

                    onMeta(pUserDataMD, &metadata);

                    drflac__free_from_callbacks(pTrackData, pAllocationCallbacks);
                    pTrackData = NULL;
                }
            } break;

            case DRFLAC_METADATA_BLOCK_TYPE_PICTURE:
            {
                if (blockSize < 32) {
                    return DRFLAC_FALSE;
                }

                if (onMeta) {
                    void* pRawData;
                    const char* pRunningData;
                    const char* pRunningDataEnd;

                    pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
                    if (pRawData == NULL) {
                        return DRFLAC_FALSE;
                    }

                    if (onRead(pUserData, pRawData, blockSize) != blockSize) {
                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                        return DRFLAC_FALSE;
                    }

                    metadata.pRawData = pRawData;
                    metadata.rawDataSize = blockSize;

                    pRunningData = (const char*)pRawData;
                    pRunningDataEnd = (const char*)pRawData + blockSize;

                    metadata.data.picture.type = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
                    metadata.data.picture.mimeLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;

                    /* Need space for the rest of the block */
                    if ((pRunningDataEnd - pRunningData) - 24 < (drflac_int64)metadata.data.picture.mimeLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                        return DRFLAC_FALSE;
                    }
                    metadata.data.picture.mime = pRunningData; pRunningData += metadata.data.picture.mimeLength;
                    metadata.data.picture.descriptionLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;

                    /* Need space for the rest of the block */
                    if ((pRunningDataEnd - pRunningData) - 20 < (drflac_int64)metadata.data.picture.descriptionLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                        return DRFLAC_FALSE;
                    }
                    metadata.data.picture.description = pRunningData; pRunningData += metadata.data.picture.descriptionLength;
                    metadata.data.picture.width = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
                    metadata.data.picture.height = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
                    metadata.data.picture.colorDepth = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
                    metadata.data.picture.indexColorCount = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
                    metadata.data.picture.pictureDataSize = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
                    metadata.data.picture.pPictureData = (const drflac_uint8*)pRunningData;

                    /* Need space for the picture after the block */
                    if (pRunningDataEnd - pRunningData < (drflac_int64)metadata.data.picture.pictureDataSize) { /* <-- Note the order of operations to avoid overflow to a valid value */
                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                        return DRFLAC_FALSE;
                    }

                    onMeta(pUserDataMD, &metadata);

                    drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                }
            } break;

            case DRFLAC_METADATA_BLOCK_TYPE_PADDING:
            {
                if (onMeta) {
                    metadata.data.padding.unused = 0;

                    /* Padding doesn't have anything meaningful in it, so just skip over it, but make sure the caller is aware of it by firing the callback. */
                    if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
                        isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
                    } else {
                        onMeta(pUserDataMD, &metadata);
                    }
                }
            } break;

            case DRFLAC_METADATA_BLOCK_TYPE_INVALID:
            {
                /* Invalid chunk. Just skip over this one. */
                if (onMeta) {
                    if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
                        isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
                    }
                }
            } break;

            default:
            {
                /*
                It's an unknown chunk, but not necessarily invalid. There's a chance more metadata blocks might be defined later on, so we
                can at the very least report the chunk to the application and let it look at the raw data.
                */
                if (onMeta) {
                    void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
                    if (pRawData == NULL) {
                        return DRFLAC_FALSE;
                    }

                    if (onRead(pUserData, pRawData, blockSize) != blockSize) {
                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                        return DRFLAC_FALSE;
                    }

                    metadata.pRawData = pRawData;
                    metadata.rawDataSize = blockSize;
                    onMeta(pUserDataMD, &metadata);

                    drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                }
            } break;
        }

        /* If we're not handling metadata, just skip over the block. If we are, it will have been handled earlier in the switch statement above. */
        if (onMeta == NULL && blockSize > 0) {
            if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
                isLastBlock = DRFLAC_TRUE;
            }
        }

        runningFilePos += blockSize;
        if (isLastBlock) {
            break;
        }
    }

    *pSeektablePos = seektablePos;
    *pSeekpointCount = seektableSize / DRFLAC_SEEKPOINT_SIZE_IN_BYTES;
    *pFirstFramePos = runningFilePos;

    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__init_private__native(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
{
    /* Pre Condition: The bit stream should be sitting just past the 4-byte id header. */

    drflac_uint8 isLastBlock;
    drflac_uint8 blockType;
    drflac_uint32 blockSize;

    (void)onSeek;

    pInit->container = drflac_container_native;

    /* The first metadata block should be the STREAMINFO block. */
    if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
        return DRFLAC_FALSE;
    }

    if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
        if (!relaxed) {
            /* We're opening in strict mode and the first block is not the STREAMINFO block. Error. */
            return DRFLAC_FALSE;
        } else {
            /*
            Relaxed mode. To open from here we need to just find the first frame and set the sample rate, etc. to whatever is defined
            for that frame.
            */
            pInit->hasStreamInfoBlock = DRFLAC_FALSE;
            pInit->hasMetadataBlocks = DRFLAC_FALSE;

            if (!drflac__read_next_flac_frame_header(&pInit->bs, 0, &pInit->firstFrameHeader)) {
                return DRFLAC_FALSE; /* Couldn't find a frame. */
            }

            if (pInit->firstFrameHeader.bitsPerSample == 0) {
                return DRFLAC_FALSE; /* Failed to initialize because the first frame depends on the STREAMINFO block, which does not exist. */
            }

            pInit->sampleRate = pInit->firstFrameHeader.sampleRate;
            pInit->channels = drflac__get_channel_count_from_channel_assignment(pInit->firstFrameHeader.channelAssignment);
            pInit->bitsPerSample = pInit->firstFrameHeader.bitsPerSample;
            pInit->maxBlockSizeInPCMFrames = 65535; /* <-- See notes here: https://xiph.org/flac/format.html#metadata_block_streaminfo */
            return DRFLAC_TRUE;
        }
    } else {
        drflac_streaminfo streaminfo;
        if (!drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
            return DRFLAC_FALSE;
        }

        pInit->hasStreamInfoBlock = DRFLAC_TRUE;
        pInit->sampleRate = streaminfo.sampleRate;
        pInit->channels = streaminfo.channels;
        pInit->bitsPerSample = streaminfo.bitsPerSample;
        pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
        pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames; /* Don't care about the min block size - only the max (used for determining the size of the memory allocation). */
        pInit->hasMetadataBlocks = !isLastBlock;

        if (onMeta) {
            drflac_metadata metadata;
            metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
            metadata.pRawData = NULL;
            metadata.rawDataSize = 0;
            metadata.data.streaminfo = streaminfo;
            onMeta(pUserDataMD, &metadata);
        }

        return DRFLAC_TRUE;
    }
}

#ifndef DR_FLAC_NO_OGG
#define DRFLAC_OGG_MAX_PAGE_SIZE 65307
#define DRFLAC_OGG_CAPTURE_PATTERN_CRC32 1605413199 /* CRC-32 of "OggS".
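For reference, the constant can be reproduced without a lookup table. The sketch below is not part of dr_flac. It assumes the Ogg page checksum parameters (polynomial 0x04C11DB7, most significant bit first, initial value 0, no final XOR), and ogg_crc32_bitwise is a hypothetical name. It uses only // comments so that it can sit inside this block comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Bitwise CRC-32 over a buffer. Each input byte is XORed into the top byte of
// the running value, which is then shifted out one bit at a time.
static uint32_t ogg_crc32_bitwise(uint32_t crc, const uint8_t* data, size_t size)
{
    size_t i;
    int bit;

    assert(data != NULL || size == 0);

    for (i = 0; i < size; ++i) {
        crc ^= (uint32_t)data[i] << 24;
        for (bit = 0; bit < 8; ++bit) {
            if (crc & 0x80000000u) {
                crc = (crc << 1) ^ 0x04C11DB7u;
            } else {
                crc <<= 1;
            }
        }
    }

    return crc;
}
```

Feeding the four bytes 'O', 'g', 'g', 'S' with an initial value of 0 gives 1605413199 (0x5FB0A94F). A single byte of 1 gives 0x04C11DB7, which is entry 1 of drflac__crc32_table.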
*/

typedef enum
{
    drflac_ogg_recover_on_crc_mismatch,
    drflac_ogg_fail_on_crc_mismatch
} drflac_ogg_crc_mismatch_recovery;

#ifndef DR_FLAC_NO_CRC
static drflac_uint32 drflac__crc32_table[] = {
    0x00000000L, 0x04C11DB7L, 0x09823B6EL, 0x0D4326D9L,
    0x130476DCL, 0x17C56B6BL, 0x1A864DB2L, 0x1E475005L,
    0x2608EDB8L, 0x22C9F00FL, 0x2F8AD6D6L, 0x2B4BCB61L,
    0x350C9B64L, 0x31CD86D3L, 0x3C8EA00AL, 0x384FBDBDL,
    0x4C11DB70L, 0x48D0C6C7L, 0x4593E01EL, 0x4152FDA9L,
    0x5F15ADACL, 0x5BD4B01BL, 0x569796C2L, 0x52568B75L,
    0x6A1936C8L, 0x6ED82B7FL, 0x639B0DA6L, 0x675A1011L,
    0x791D4014L, 0x7DDC5DA3L, 0x709F7B7AL, 0x745E66CDL,
    0x9823B6E0L, 0x9CE2AB57L, 0x91A18D8EL, 0x95609039L,
    0x8B27C03CL, 0x8FE6DD8BL, 0x82A5FB52L, 0x8664E6E5L,
    0xBE2B5B58L, 0xBAEA46EFL, 0xB7A96036L, 0xB3687D81L,
    0xAD2F2D84L, 0xA9EE3033L, 0xA4AD16EAL, 0xA06C0B5DL,
    0xD4326D90L, 0xD0F37027L, 0xDDB056FEL, 0xD9714B49L,
    0xC7361B4CL, 0xC3F706FBL, 0xCEB42022L, 0xCA753D95L,
    0xF23A8028L, 0xF6FB9D9FL, 0xFBB8BB46L, 0xFF79A6F1L,
    0xE13EF6F4L, 0xE5FFEB43L, 0xE8BCCD9AL, 0xEC7DD02DL,
    0x34867077L, 0x30476DC0L, 0x3D044B19L, 0x39C556AEL,
    0x278206ABL, 0x23431B1CL, 0x2E003DC5L, 0x2AC12072L,
    0x128E9DCFL, 0x164F8078L, 0x1B0CA6A1L, 0x1FCDBB16L,
    0x018AEB13L, 0x054BF6A4L, 0x0808D07DL, 0x0CC9CDCAL,
    0x7897AB07L, 0x7C56B6B0L, 0x71159069L, 0x75D48DDEL,
    0x6B93DDDBL, 0x6F52C06CL, 0x6211E6B5L, 0x66D0FB02L,
    0x5E9F46BFL, 0x5A5E5B08L, 0x571D7DD1L, 0x53DC6066L,
    0x4D9B3063L, 0x495A2DD4L, 0x44190B0DL, 0x40D816BAL,
    0xACA5C697L, 0xA864DB20L, 0xA527FDF9L, 0xA1E6E04EL,
    0xBFA1B04BL, 0xBB60ADFCL, 0xB6238B25L, 0xB2E29692L,
    0x8AAD2B2FL, 0x8E6C3698L, 0x832F1041L, 0x87EE0DF6L,
    0x99A95DF3L, 0x9D684044L, 0x902B669DL, 0x94EA7B2AL,
    0xE0B41DE7L, 0xE4750050L, 0xE9362689L, 0xEDF73B3EL,
    0xF3B06B3BL, 0xF771768CL, 0xFA325055L, 0xFEF34DE2L,
    0xC6BCF05FL, 0xC27DEDE8L, 0xCF3ECB31L, 0xCBFFD686L,
    0xD5B88683L, 0xD1799B34L, 0xDC3ABDEDL, 0xD8FBA05AL,
    0x690CE0EEL, 0x6DCDFD59L, 0x608EDB80L, 0x644FC637L,
    0x7A089632L, 0x7EC98B85L, 0x738AAD5CL, 0x774BB0EBL,
    0x4F040D56L, 0x4BC510E1L, 0x46863638L, 0x42472B8FL,
    0x5C007B8AL, 0x58C1663DL, 0x558240E4L, 0x51435D53L,
    0x251D3B9EL, 0x21DC2629L, 0x2C9F00F0L, 0x285E1D47L,
    0x36194D42L, 0x32D850F5L, 0x3F9B762CL, 0x3B5A6B9BL,
    0x0315D626L, 0x07D4CB91L, 0x0A97ED48L, 0x0E56F0FFL,
    0x1011A0FAL, 0x14D0BD4DL, 0x19939B94L, 0x1D528623L,
    0xF12F560EL, 0xF5EE4BB9L, 0xF8AD6D60L, 0xFC6C70D7L,
    0xE22B20D2L, 0xE6EA3D65L, 0xEBA91BBCL, 0xEF68060BL,
    0xD727BBB6L, 0xD3E6A601L, 0xDEA580D8L, 0xDA649D6FL,
    0xC423CD6AL, 0xC0E2D0DDL, 0xCDA1F604L, 0xC960EBB3L,
    0xBD3E8D7EL, 0xB9FF90C9L, 0xB4BCB610L, 0xB07DABA7L,
    0xAE3AFBA2L, 0xAAFBE615L, 0xA7B8C0CCL, 0xA379DD7BL,
    0x9B3660C6L, 0x9FF77D71L, 0x92B45BA8L, 0x9675461FL,
    0x8832161AL, 0x8CF30BADL, 0x81B02D74L, 0x857130C3L,
    0x5D8A9099L, 0x594B8D2EL, 0x5408ABF7L, 0x50C9B640L,
    0x4E8EE645L, 0x4A4FFBF2L, 0x470CDD2BL, 0x43CDC09CL,
    0x7B827D21L, 0x7F436096L, 0x7200464FL, 0x76C15BF8L,
    0x68860BFDL, 0x6C47164AL, 0x61043093L, 0x65C52D24L,
    0x119B4BE9L, 0x155A565EL, 0x18197087L, 0x1CD86D30L,
    0x029F3D35L, 0x065E2082L, 0x0B1D065BL, 0x0FDC1BECL,
    0x3793A651L, 0x3352BBE6L, 0x3E119D3FL, 0x3AD08088L,
    0x2497D08DL, 0x2056CD3AL, 0x2D15EBE3L, 0x29D4F654L,
    0xC5A92679L, 0xC1683BCEL, 0xCC2B1D17L, 0xC8EA00A0L,
    0xD6AD50A5L, 0xD26C4D12L, 0xDF2F6BCBL, 0xDBEE767CL,
    0xE3A1CBC1L, 0xE760D676L, 0xEA23F0AFL, 0xEEE2ED18L,
    0xF0A5BD1DL, 0xF464A0AAL, 0xF9278673L, 0xFDE69BC4L,
    0x89B8FD09L, 0x8D79E0BEL, 0x803AC667L, 0x84FBDBD0L,
    0x9ABC8BD5L, 0x9E7D9662L, 0x933EB0BBL, 0x97FFAD0CL,
    0xAFB010B1L, 0xAB710D06L, 0xA6322BDFL, 0xA2F33668L,
    0xBCB4666DL, 0xB8757BDAL, 0xB5365D03L, 0xB1F740B4L
};
#endif

static DRFLAC_INLINE drflac_uint32 drflac_crc32_byte(drflac_uint32 crc32, drflac_uint8 data)
{
#ifndef DR_FLAC_NO_CRC
    return (crc32 << 8) ^ drflac__crc32_table[(drflac_uint8)((crc32 >> 24) & 0xFF) ^ data];
#else
    (void)data;
    return crc32;
#endif
}

#if 0
static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint32(drflac_uint32 crc32, drflac_uint32 data)
{
    crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 24) & 0xFF));
    crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 16) & 0xFF));
    crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 8) & 0xFF));
    crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 0) & 0xFF));
    return crc32;
}

static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint64(drflac_uint32 crc32, drflac_uint64 data)
{
    crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 32) & 0xFFFFFFFF));
    crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 0) & 0xFFFFFFFF));
    return crc32;
}
#endif

static DRFLAC_INLINE drflac_uint32 drflac_crc32_buffer(drflac_uint32 crc32, drflac_uint8* pData, drflac_uint32 dataSize)
{
    /* This can be optimized. */
    drflac_uint32 i;
    for (i = 0; i < dataSize; ++i) {
        crc32 = drflac_crc32_byte(crc32, pData[i]);
    }
    return crc32;
}


static DRFLAC_INLINE drflac_bool32 drflac_ogg__is_capture_pattern(drflac_uint8 pattern[4])
{
    return pattern[0] == 'O' && pattern[1] == 'g' && pattern[2] == 'g' && pattern[3] == 'S';
}

static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_header_size(drflac_ogg_page_header* pHeader)
{
    return 27 + pHeader->segmentCount;
}

static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_body_size(drflac_ogg_page_header* pHeader)
{
    drflac_uint32 pageBodySize = 0;
    int i;

    for (i = 0; i < pHeader->segmentCount; ++i) {
        pageBodySize += pHeader->segmentTable[i];
    }

    return pageBodySize;
}

static drflac_result drflac_ogg__read_page_header_after_capture_pattern(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
{
    drflac_uint8 data[23];
    drflac_uint32 i;

    DRFLAC_ASSERT(*pCRC32 == DRFLAC_OGG_CAPTURE_PATTERN_CRC32);

    if (onRead(pUserData, data, 23) != 23) {
        return DRFLAC_AT_END;
    }
    *pBytesRead += 23;

    /*
    It's not actually used, but set the capture pattern to 'OggS' for completeness. Not doing this will cause static analysers to complain about
    us trying to access uninitialized data. We could alternatively just comment out this member of the drflac_ogg_page_header structure, but I
    like to have it map to the structure of the underlying data.
    */
    pHeader->capturePattern[0] = 'O';
    pHeader->capturePattern[1] = 'g';
    pHeader->capturePattern[2] = 'g';
    pHeader->capturePattern[3] = 'S';

    pHeader->structureVersion = data[0];
    pHeader->headerType = data[1];
    DRFLAC_COPY_MEMORY(&pHeader->granulePosition, &data[ 2], 8);
    DRFLAC_COPY_MEMORY(&pHeader->serialNumber, &data[10], 4);
    DRFLAC_COPY_MEMORY(&pHeader->sequenceNumber, &data[14], 4);
    DRFLAC_COPY_MEMORY(&pHeader->checksum, &data[18], 4);
    pHeader->segmentCount = data[22];

    /* Calculate the CRC. Note that for the calculation the checksum part of the page needs to be set to 0. */
    data[18] = 0;
    data[19] = 0;
    data[20] = 0;
    data[21] = 0;

    for (i = 0; i < 23; ++i) {
        *pCRC32 = drflac_crc32_byte(*pCRC32, data[i]);
    }


    if (onRead(pUserData, pHeader->segmentTable, pHeader->segmentCount) != pHeader->segmentCount) {
        return DRFLAC_AT_END;
    }
    *pBytesRead += pHeader->segmentCount;

    for (i = 0; i < pHeader->segmentCount; ++i) {
        *pCRC32 = drflac_crc32_byte(*pCRC32, pHeader->segmentTable[i]);
    }

    return DRFLAC_SUCCESS;
}

static drflac_result drflac_ogg__read_page_header(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
{
    drflac_uint8 id[4];

    *pBytesRead = 0;

    if (onRead(pUserData, id, 4) != 4) {
        return DRFLAC_AT_END;
    }
    *pBytesRead += 4;

    /* We need to read byte-by-byte until we find the OggS capture pattern.
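The byte-by-byte search can be pictured on an in-memory buffer. The sketch below is illustrative only and is not dr_flac code: find_capture_pattern is a hypothetical name, and it reads from an array instead of going through onRead. It uses only // comments so that it can sit inside this block comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Returns the offset of the first "OggS" in the buffer, or -1 if there is
// none. A 4-byte window is shifted left by one byte and refilled with the
// next input byte until the window matches the capture pattern.
static long find_capture_pattern(const uint8_t* buf, size_t size)
{
    uint8_t id[4];
    size_t pos;

    assert(buf != NULL || size == 0);

    if (size < 4) {
        return -1;
    }

    id[0] = buf[0];
    id[1] = buf[1];
    id[2] = buf[2];
    id[3] = buf[3];
    pos = 4; // Number of bytes consumed so far.

    for (;;) {
        if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
            return (long)(pos - 4);
        }
        if (pos == size) {
            return -1; // Ran out of input without a match.
        }

        id[0] = id[1];
        id[1] = id[2];
        id[2] = id[3];
        id[3] = buf[pos++];
    }
}
```

Only one new byte is consumed per step, so a pattern that starts at any byte offset is found, including one that overlaps a partial match such as "OgOggS".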
*/
    for (;;) {
        if (drflac_ogg__is_capture_pattern(id)) {
            drflac_result result;

            *pCRC32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;

            result = drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, pHeader, pBytesRead, pCRC32);
            if (result == DRFLAC_SUCCESS) {
                return DRFLAC_SUCCESS;
            } else {
                if (result == DRFLAC_CRC_MISMATCH) {
                    continue;
                } else {
                    return result;
                }
            }
        } else {
            /* The first 4 bytes did not equal the capture pattern. Read the next byte and try again. */
            id[0] = id[1];
            id[1] = id[2];
            id[2] = id[3];
            if (onRead(pUserData, &id[3], 1) != 1) {
                return DRFLAC_AT_END;
            }
            *pBytesRead += 1;
        }
    }
}


/*
The main part of the Ogg encapsulation is the conversion from the physical Ogg bitstream to the native FLAC bitstream. It works
in three general stages: Ogg Physical Bitstream -> Ogg/FLAC Logical Bitstream -> FLAC Native Bitstream. dr_flac is designed
in such a way that the core sections assume everything is delivered in native format. Therefore, for each encapsulation type
dr_flac is supporting there needs to be a layer sitting on top of the onRead and onSeek callbacks that ensures the bits read from
the physical Ogg bitstream are converted and delivered in native FLAC format.
*/
typedef struct
{
    drflac_read_proc onRead; /* The original onRead callback from drflac_open() and family. */
    drflac_seek_proc onSeek; /* The original onSeek callback from drflac_open() and family. */
    void* pUserData; /* The user data passed to onRead and onSeek. This is the user data that was passed to drflac_open() and family. */
    drflac_uint64 currentBytePos; /* The position of the byte we are sitting on in the physical byte stream. Used for efficient seeking. */
    drflac_uint64 firstBytePos; /* The position of the first byte in the physical bitstream. Points to the start of the "OggS" identifier of the FLAC bos page. */
    drflac_uint32 serialNumber; /* The serial number of the FLAC audio pages. This is determined by the initial header page that was read during initialization. */
    drflac_ogg_page_header bosPageHeader; /* Used for seeking. */
    drflac_ogg_page_header currentPageHeader;
    drflac_uint32 bytesRemainingInPage;
    drflac_uint32 pageDataSize;
    drflac_uint8 pageData[DRFLAC_OGG_MAX_PAGE_SIZE];
} drflac_oggbs; /* oggbs = Ogg Bitstream */

static size_t drflac_oggbs__read_physical(drflac_oggbs* oggbs, void* bufferOut, size_t bytesToRead)
{
    size_t bytesActuallyRead = oggbs->onRead(oggbs->pUserData, bufferOut, bytesToRead);
    oggbs->currentBytePos += bytesActuallyRead;

    return bytesActuallyRead;
}

static drflac_bool32 drflac_oggbs__seek_physical(drflac_oggbs* oggbs, drflac_uint64 offset, drflac_seek_origin origin)
{
    if (origin == drflac_seek_origin_start) {
        if (offset <= 0x7FFFFFFF) {
            if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_start)) {
                return DRFLAC_FALSE;
            }
            oggbs->currentBytePos = offset;

            return DRFLAC_TRUE;
        } else {
            if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
                return DRFLAC_FALSE;
            }
            oggbs->currentBytePos = 0x7FFFFFFF; /* Only this far so far. The relative seek below adds the remainder. */

            return drflac_oggbs__seek_physical(oggbs, offset - 0x7FFFFFFF, drflac_seek_origin_current);
        }
    } else {
        while (offset > 0x7FFFFFFF) {
            if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
                return DRFLAC_FALSE;
            }
            oggbs->currentBytePos += 0x7FFFFFFF;
            offset -= 0x7FFFFFFF;
        }

        if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_current)) { /* <-- Safe cast thanks to the loop above.
*/
            return DRFLAC_FALSE;
        }
        oggbs->currentBytePos += offset;

        return DRFLAC_TRUE;
    }
}

static drflac_bool32 drflac_oggbs__goto_next_page(drflac_oggbs* oggbs, drflac_ogg_crc_mismatch_recovery recoveryMethod)
{
    drflac_ogg_page_header header;
    for (;;) {
        drflac_uint32 crc32 = 0;
        drflac_uint32 bytesRead;
        drflac_uint32 pageBodySize;
    #ifndef DR_FLAC_NO_CRC
        drflac_uint32 actualCRC32;
    #endif

        if (drflac_ogg__read_page_header(oggbs->onRead, oggbs->pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
            return DRFLAC_FALSE;
        }
        oggbs->currentBytePos += bytesRead;

        pageBodySize = drflac_ogg__get_page_body_size(&header);
        if (pageBodySize > DRFLAC_OGG_MAX_PAGE_SIZE) {
            continue; /* Invalid page size. Assume it's corrupted and just move to the next page. */
        }

        if (header.serialNumber != oggbs->serialNumber) {
            /* It's not a FLAC page. Skip it. */
            if (pageBodySize > 0 && !drflac_oggbs__seek_physical(oggbs, pageBodySize, drflac_seek_origin_current)) {
                return DRFLAC_FALSE;
            }
            continue;
        }


        /* We need to read the entire page and then do a CRC check on it. If there's a CRC mismatch we need to skip this page. */
        if (drflac_oggbs__read_physical(oggbs, oggbs->pageData, pageBodySize) != pageBodySize) {
            return DRFLAC_FALSE;
        }
        oggbs->pageDataSize = pageBodySize;

    #ifndef DR_FLAC_NO_CRC
        actualCRC32 = drflac_crc32_buffer(crc32, oggbs->pageData, oggbs->pageDataSize);
        if (actualCRC32 != header.checksum) {
            if (recoveryMethod == drflac_ogg_recover_on_crc_mismatch) {
                continue; /* CRC mismatch. Skip this page. */
            } else {
                /*
                Even though we are failing on a CRC mismatch, we still want our stream to be in a good state. Therefore we
                go to the next valid page to ensure we're in a good state, but return false to let the caller know that the
                seek did not fully complete.
                */
                drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch);
                return DRFLAC_FALSE;
            }
        }
    #else
        (void)recoveryMethod; /* <-- Silence a warning. */
    #endif

        oggbs->currentPageHeader = header;
        oggbs->bytesRemainingInPage = pageBodySize;
        return DRFLAC_TRUE;
    }
}

/* Function below is unused at the moment, but I might be re-adding it later. */
#if 0
static drflac_uint8 drflac_oggbs__get_current_segment_index(drflac_oggbs* oggbs, drflac_uint8* pBytesRemainingInSeg)
{
    drflac_uint32 bytesConsumedInPage = drflac_ogg__get_page_body_size(&oggbs->currentPageHeader) - oggbs->bytesRemainingInPage;
    drflac_uint8 iSeg = 0;
    drflac_uint32 iByte = 0;
    while (iByte < bytesConsumedInPage) {
        drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
        if (iByte + segmentSize > bytesConsumedInPage) {
            break;
        } else {
            iSeg += 1;
            iByte += segmentSize;
        }
    }

    *pBytesRemainingInSeg = oggbs->currentPageHeader.segmentTable[iSeg] - (drflac_uint8)(bytesConsumedInPage - iByte);
    return iSeg;
}

static drflac_bool32 drflac_oggbs__seek_to_next_packet(drflac_oggbs* oggbs)
{
    /* The current packet ends when we get to the segment with a lacing value of < 255 which is not at the end of a page.
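As an illustration of lacing values, the sketch below walks a segment table and reports how many body bytes remain until the packet in progress ends. It is not dr_flac code: bytes_to_packet_end is a hypothetical name, and it assumes the framing rule from the Ogg specification (RFC 3533), where the first lacing value below 255 terminates the packet and a page whose last lacing value is 255 leaves the packet to continue on the next page. It uses only // comments so that it can sit inside this block comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Sums lacing values starting at firstSeg until the packet in progress ends.
// Sets the output flag to 1 when a lacing value below 255 was found, or to 0
// when the table ran out first, meaning the packet continues on the next page.
static uint32_t bytes_to_packet_end(const uint8_t* segmentTable, size_t segmentCount, size_t firstSeg, int* pPacketEnded)
{
    uint32_t total = 0;
    size_t i;

    assert(segmentTable != NULL && pPacketEnded != NULL);

    *pPacketEnded = 0;
    for (i = firstSeg; i < segmentCount; ++i) {
        total += segmentTable[i];
        if (segmentTable[i] < 255) {
            *pPacketEnded = 1;
            break;
        }
    }

    return total;
}
```

For the table { 255, 255, 10, 255, 255 } the first packet is 520 bytes long and ends inside the page, while the packet starting at segment 3 has 510 bytes on this page and continues on the next one.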
*/
    for (;;) {
        drflac_bool32 atEndOfPage = DRFLAC_FALSE;

        drflac_uint8 bytesRemainingInSeg;
        drflac_uint8 iFirstSeg = drflac_oggbs__get_current_segment_index(oggbs, &bytesRemainingInSeg);

        drflac_uint32 bytesToEndOfPacketOrPage = bytesRemainingInSeg;
        for (drflac_uint8 iSeg = iFirstSeg; iSeg < oggbs->currentPageHeader.segmentCount; ++iSeg) {
            drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
            if (segmentSize < 255) {
                if (iSeg == oggbs->currentPageHeader.segmentCount-1) {
                    atEndOfPage = DRFLAC_TRUE;
                }

                break;
            }

            bytesToEndOfPacketOrPage += segmentSize;
        }

        /*
        At this point we will have found either the packet or the end of the page. If we're at the end of the page we'll
        want to load the next page and keep searching for the end of the packet.
        */
        drflac_oggbs__seek_physical(oggbs, bytesToEndOfPacketOrPage, drflac_seek_origin_current);
        oggbs->bytesRemainingInPage -= bytesToEndOfPacketOrPage;

        if (atEndOfPage) {
            /*
            We're potentially at the next packet, but we need to check the next page first to be sure because the packet may
            straddle pages.
            */
            if (!drflac_oggbs__goto_next_page(oggbs)) {
                return DRFLAC_FALSE;
            }

            /* If it's a fresh packet it most likely means we're at the next packet. */
            if ((oggbs->currentPageHeader.headerType & 0x01) == 0) {
                return DRFLAC_TRUE;
            }
        } else {
            /* We're at the next packet. */
            return DRFLAC_TRUE;
        }
    }
}

static drflac_bool32 drflac_oggbs__seek_to_next_frame(drflac_oggbs* oggbs)
{
    /* The bitstream should be sitting on the first byte just after the header of the frame. */

    /* What we're actually doing here is seeking to the start of the next packet. */
    return drflac_oggbs__seek_to_next_packet(oggbs);
}
#endif

static size_t drflac__on_read_ogg(void* pUserData, void* bufferOut, size_t bytesToRead)
{
    drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
    drflac_uint8* pRunningBufferOut = (drflac_uint8*)bufferOut;
    size_t bytesRead = 0;

    DRFLAC_ASSERT(oggbs != NULL);
    DRFLAC_ASSERT(pRunningBufferOut != NULL);

    /* Reading is done page-by-page. If we've run out of bytes in the page we need to move to the next one. */
    while (bytesRead < bytesToRead) {
        size_t bytesRemainingToRead = bytesToRead - bytesRead;

        if (oggbs->bytesRemainingInPage >= bytesRemainingToRead) {
            DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), bytesRemainingToRead);
            bytesRead += bytesRemainingToRead;
            oggbs->bytesRemainingInPage -= (drflac_uint32)bytesRemainingToRead;
            break;
        }

        /* If we get here it means some of the requested data is contained in the next pages. */
        if (oggbs->bytesRemainingInPage > 0) {
            DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), oggbs->bytesRemainingInPage);
            bytesRead += oggbs->bytesRemainingInPage;
            pRunningBufferOut += oggbs->bytesRemainingInPage;
            oggbs->bytesRemainingInPage = 0;
        }

        DRFLAC_ASSERT(bytesRemainingToRead > 0);
        if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
            break; /* Failed to go to the next page. Might have simply hit the end of the stream. */
        }
    }

    return bytesRead;
}

static drflac_bool32 drflac__on_seek_ogg(void* pUserData, int offset, drflac_seek_origin origin)
{
    drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
    int bytesSeeked = 0;

    DRFLAC_ASSERT(oggbs != NULL);
    DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */

    /* Seeking is always forward which makes things a lot simpler. */
    if (origin == drflac_seek_origin_start) {
        if (!drflac_oggbs__seek_physical(oggbs, oggbs->firstBytePos, drflac_seek_origin_start)) {
            return DRFLAC_FALSE;
        }

        if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
            return DRFLAC_FALSE;
        }

        return drflac__on_seek_ogg(pUserData, offset, drflac_seek_origin_current);
    }

    DRFLAC_ASSERT(origin == drflac_seek_origin_current);

    while (bytesSeeked < offset) {
        int bytesRemainingToSeek = offset - bytesSeeked;
        DRFLAC_ASSERT(bytesRemainingToSeek >= 0);

        if (oggbs->bytesRemainingInPage >= (size_t)bytesRemainingToSeek) {
            bytesSeeked += bytesRemainingToSeek;
            (void)bytesSeeked; /* <-- Silence a dead store warning emitted by Clang Static Analyzer. */
            oggbs->bytesRemainingInPage -= bytesRemainingToSeek;
            break;
        }

        /* If we get here it means some of the requested data is contained in the next pages. */
        if (oggbs->bytesRemainingInPage > 0) {
            bytesSeeked += (int)oggbs->bytesRemainingInPage;
            oggbs->bytesRemainingInPage = 0;
        }

        DRFLAC_ASSERT(bytesRemainingToSeek > 0);
        if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
            /* Failed to go to the next page. We either hit the end of the stream or had a CRC mismatch. */
            return DRFLAC_FALSE;
        }
    }

    return DRFLAC_TRUE;
}


static drflac_bool32 drflac_ogg__seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
{
    drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
    drflac_uint64 originalBytePos;
    drflac_uint64 runningGranulePosition;
    drflac_uint64 runningFrameBytePos;
    drflac_uint64 runningPCMFrameCount;

    DRFLAC_ASSERT(oggbs != NULL);

    originalBytePos = oggbs->currentBytePos; /* For recovery. Points to the OggS identifier. */

    /* First seek to the first frame. */
    if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes)) {
        return DRFLAC_FALSE;
    }
    oggbs->bytesRemainingInPage = 0;

    runningGranulePosition = 0;
    for (;;) {
        if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
            drflac_oggbs__seek_physical(oggbs, originalBytePos, drflac_seek_origin_start);
            return DRFLAC_FALSE; /* Never did find that sample... */
        }

        runningFrameBytePos = oggbs->currentBytePos - drflac_ogg__get_page_header_size(&oggbs->currentPageHeader) - oggbs->pageDataSize;
        if (oggbs->currentPageHeader.granulePosition >= pcmFrameIndex) {
            break; /* The sample is somewhere in the previous page. */
        }

        /*
        At this point we know the sample is not in the previous page. It could possibly be in this page. For simplicity we
        disregard any pages that do not begin a fresh packet.
        */
        if ((oggbs->currentPageHeader.headerType & 0x01) == 0) { /* <-- Is it a fresh page? */
            if (oggbs->currentPageHeader.segmentTable[0] >= 2) {
                drflac_uint8 firstBytesInPage[2];
                firstBytesInPage[0] = oggbs->pageData[0];
                firstBytesInPage[1] = oggbs->pageData[1];

                if ((firstBytesInPage[0] == 0xFF) && (firstBytesInPage[1] & 0xFC) == 0xF8) { /* <-- Does the page begin with a frame's sync code? */
                    runningGranulePosition = oggbs->currentPageHeader.granulePosition;
                }

                continue;
            }
        }
    }

    /*
    We found the page that is closest to the sample, so now we need to find it. The first thing to do is seek to the
    start of that page. In the loop above we checked that it was a fresh page which means this page is also the start of
    a new frame. This property means that after we've seeked to the page we can immediately start looping over frames until
    we find the one containing the target sample.
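
The frame walk described above amounts to a running sum over frame lengths. The sketch below is illustrative only and is not dr_flac code: locate_pcm_frame is a hypothetical name, and the frame lengths are supplied as an array instead of being read from frame headers. It uses only // comments so that it can sit inside this block comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Given the PCM frame count of each FLAC frame in order, finds the FLAC frame
// that contains targetIndex. startCount is the number of PCM frames that come
// before frameSizes[0]. On success returns 1 and writes the frame number and
// the number of PCM frames to skip inside that frame.
static int locate_pcm_frame(const uint32_t* frameSizes, size_t frameCount, uint64_t startCount, uint64_t targetIndex, size_t* pFrame, uint64_t* pSkip)
{
    uint64_t running = startCount;
    size_t i;

    assert(frameSizes != NULL && pFrame != NULL && pSkip != NULL);

    if (targetIndex < startCount) {
        return 0; // The target lies before the starting point.
    }

    for (i = 0; i < frameCount; ++i) {
        if (targetIndex < running + frameSizes[i]) {
            *pFrame = i;
            *pSkip = targetIndex - running;
            return 1;
        }
        running += frameSizes[i];
    }

    return 0; // Ran out of frames before reaching the target.
}
```

With frame lengths { 4096, 4096, 1000 } and a start count of 0, PCM frame 5000 falls in FLAC frame 1 with 904 PCM frames to skip.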
    */
    if (!drflac_oggbs__seek_physical(oggbs, runningFrameBytePos, drflac_seek_origin_start)) {
        return DRFLAC_FALSE;
    }
    if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
        return DRFLAC_FALSE;
    }

    /*
    At this point we'll be sitting on the first byte of the frame header of the first frame in the page. We just keep
    looping over these frames until we find the one containing the sample we're after.
    */
    runningPCMFrameCount = runningGranulePosition;
    for (;;) {
        /*
        There are two ways to find the sample and seek past irrelevant frames:
          1) Use the native FLAC decoder.
          2) Use Ogg's framing system.

        Both of these options have their own pros and cons. Using the native FLAC decoder is slower because it needs to
        do a full decode of the frame. Using Ogg's framing system is faster, but more complicated and involves some code
        duplication for the decoding of frame headers.

        Another thing to consider is that using the Ogg framing system will perform direct seeking of the physical Ogg
        bitstream. This is important to consider because it means we cannot read data from the drflac_bs object using the
        standard drflac__*() APIs because that will read in extra data for its own internal caching which in turn breaks
        the positioning of the read pointer of the physical Ogg bitstream. Therefore, anything that would normally be read
        using the native FLAC decoding APIs, such as drflac__read_next_flac_frame_header(), needs to be re-implemented so as to
        avoid the use of the drflac_bs object.

        Considering these issues, I have decided to use the slower native FLAC decoding method for the following reasons:
          1) Seeking is already partially accelerated using Ogg's paging system in the code block above.
          2) Seeking in an Ogg encapsulated FLAC stream is probably quite uncommon.
          3) Simplicity.
        */
        drflac_uint64 firstPCMFrameInFLACFrame = 0;
        drflac_uint64 lastPCMFrameInFLACFrame = 0;
        drflac_uint64 pcmFrameCountInThisFrame;

        if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
            return DRFLAC_FALSE;
        }

        drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);

        pcmFrameCountInThisFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;

        /* If we are seeking to the end of the file and we've just hit it, we're done. */
        if (pcmFrameIndex == pFlac->totalPCMFrameCount && (runningPCMFrameCount + pcmFrameCountInThisFrame) == pFlac->totalPCMFrameCount) {
            drflac_result result = drflac__decode_flac_frame(pFlac);
            if (result == DRFLAC_SUCCESS) {
                pFlac->currentPCMFrame = pcmFrameIndex;
                pFlac->currentFLACFrame.pcmFramesRemaining = 0;
                return DRFLAC_TRUE;
            } else {
                return DRFLAC_FALSE;
            }
        }

        if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFrame)) {
            /*
            The sample should be in this FLAC frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
            it never existed and keep iterating.
            */
            drflac_result result = drflac__decode_flac_frame(pFlac);
            if (result == DRFLAC_SUCCESS) {
                /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
                drflac_uint64 pcmFramesToDecode = (size_t)(pcmFrameIndex - runningPCMFrameCount); /* <-- Safe cast because the maximum number of samples in a frame is 65535. */
                if (pcmFramesToDecode == 0) {
                    return DRFLAC_TRUE;
                }

                pFlac->currentPCMFrame = runningPCMFrameCount;

                return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail).
*/
            } else {
                if (result == DRFLAC_CRC_MISMATCH) {
                    continue; /* CRC mismatch. Pretend this frame never existed. */
                } else {
                    return DRFLAC_FALSE;
                }
            }
        } else {
            /*
            It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
            frame never existed and leave the running sample count untouched.
            */
            drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
            if (result == DRFLAC_SUCCESS) {
                runningPCMFrameCount += pcmFrameCountInThisFrame;
            } else {
                if (result == DRFLAC_CRC_MISMATCH) {
                    continue; /* CRC mismatch. Pretend this frame never existed. */
                } else {
                    return DRFLAC_FALSE;
                }
            }
        }
    }
}



static drflac_bool32 drflac__init_private__ogg(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
{
    drflac_ogg_page_header header;
    drflac_uint32 crc32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
    drflac_uint32 bytesRead = 0;

    /* Pre Condition: The bit stream should be sitting just past the 4-byte OggS capture pattern. */
    (void)relaxed;

    pInit->container = drflac_container_ogg;
    pInit->oggFirstBytePos = 0;

    /*
    We'll get here if the first 4 bytes of the stream were the OggS capture pattern, however it doesn't necessarily mean the
    stream includes FLAC encoded audio. To check for this we need to scan the beginning-of-stream page markers and check if
    any match the FLAC specification. Important to keep in mind that the stream may be multiplexed.
    */
    if (drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
        return DRFLAC_FALSE;
    }
    pInit->runningFilePos += bytesRead;

    for (;;) {
        int pageBodySize;

        /* Fail if we're past the beginning-of-stream pages without having found a FLAC header. */
        if ((header.headerType & 0x02) == 0) {
            return DRFLAC_FALSE;
        }

        /* Check if it's a FLAC header. */
        pageBodySize = drflac_ogg__get_page_body_size(&header);
        if (pageBodySize == 51) { /* 51 = the lacing value of the FLAC header packet. */
            /* It could be a FLAC page... */
            drflac_uint32 bytesRemainingInPage = pageBodySize;
            drflac_uint8 packetType;

            if (onRead(pUserData, &packetType, 1) != 1) {
                return DRFLAC_FALSE;
            }

            bytesRemainingInPage -= 1;
            if (packetType == 0x7F) {
                /* Increasingly more likely to be a FLAC page... */
                drflac_uint8 sig[4];
                if (onRead(pUserData, sig, 4) != 4) {
                    return DRFLAC_FALSE;
                }

                bytesRemainingInPage -= 4;
                if (sig[0] == 'F' && sig[1] == 'L' && sig[2] == 'A' && sig[3] == 'C') {
                    /* Almost certainly a FLAC page... */
                    drflac_uint8 mappingVersion[2];
                    if (onRead(pUserData, mappingVersion, 2) != 2) {
                        return DRFLAC_FALSE;
                    }

                    if (mappingVersion[0] != 1) {
                        return DRFLAC_FALSE; /* Only supporting version 1.x of the Ogg mapping. */
                    }

                    /*
                    The next 2 bytes are the number of non-audio packets, not including this one. We don't care about this because we're going to
                    be handling it in a generic way based on the serial number and packet types.
                    */
                    if (!onSeek(pUserData, 2, drflac_seek_origin_current)) {
                        return DRFLAC_FALSE;
                    }

                    /* Expecting the native FLAC signature "fLaC".
*/
                    if (onRead(pUserData, sig, 4) != 4) {
                        return DRFLAC_FALSE;
                    }

                    if (sig[0] == 'f' && sig[1] == 'L' && sig[2] == 'a' && sig[3] == 'C') {
                        /* The remaining data in the page should be the STREAMINFO block. */
                        drflac_streaminfo streaminfo;
                        drflac_uint8 isLastBlock;
                        drflac_uint8 blockType;
                        drflac_uint32 blockSize;
                        if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
                            return DRFLAC_FALSE;
                        }

                        if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
                            return DRFLAC_FALSE; /* Invalid block type. First block must be the STREAMINFO block. */
                        }

                        if (drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
                            /* Success! */
                            pInit->hasStreamInfoBlock = DRFLAC_TRUE;
                            pInit->sampleRate = streaminfo.sampleRate;
                            pInit->channels = streaminfo.channels;
                            pInit->bitsPerSample = streaminfo.bitsPerSample;
                            pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
                            pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;
                            pInit->hasMetadataBlocks = !isLastBlock;

                            if (onMeta) {
                                drflac_metadata metadata;
                                metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
                                metadata.pRawData = NULL;
                                metadata.rawDataSize = 0;
                                metadata.data.streaminfo = streaminfo;
                                onMeta(pUserDataMD, &metadata);
                            }

                            pInit->runningFilePos += pageBodySize;
                            pInit->oggFirstBytePos = pInit->runningFilePos - 79; /* Subtracting 79 will place us right on top of the "OggS" identifier of the FLAC bos page. */
                            pInit->oggSerial = header.serialNumber;
                            pInit->oggBosHeader = header;
                            break;
                        } else {
                            /* Failed to read STREAMINFO block. Aww, so close... */
                            return DRFLAC_FALSE;
                        }
                    } else {
                        /* Invalid file. */
                        return DRFLAC_FALSE;
                    }
                } else {
                    /* Not a FLAC header. Skip it.
*/
                    if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
                        return DRFLAC_FALSE;
                    }
                }
            } else {
                /* Not a FLAC header. Seek past the entire page and move on to the next. */
                if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
                    return DRFLAC_FALSE;
                }
            }
        } else {
            if (!onSeek(pUserData, pageBodySize, drflac_seek_origin_current)) {
                return DRFLAC_FALSE;
            }
        }

        pInit->runningFilePos += pageBodySize;


        /* Read the header of the next page. */
        if (drflac_ogg__read_page_header(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
            return DRFLAC_FALSE;
        }
        pInit->runningFilePos += bytesRead;
    }

    /*
    If we get here it means we found a FLAC audio stream. We should be sitting on the first byte of the header of the next page. The next
    packets in the FLAC logical stream contain the metadata. The only thing left to do in the initialization phase for Ogg is to create the
    Ogg bitstream object.
    */
    pInit->hasMetadataBlocks = DRFLAC_TRUE; /* <-- Always have at least VORBIS_COMMENT metadata block.
*/
    return DRFLAC_TRUE;
}
#endif

static drflac_bool32 drflac__init_private(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD)
{
    drflac_bool32 relaxed;
    drflac_uint8 id[4];

    if (pInit == NULL || onRead == NULL || onSeek == NULL) {
        return DRFLAC_FALSE;
    }

    DRFLAC_ZERO_MEMORY(pInit, sizeof(*pInit));
    pInit->onRead = onRead;
    pInit->onSeek = onSeek;
    pInit->onMeta = onMeta;
    pInit->container = container;
    pInit->pUserData = pUserData;
    pInit->pUserDataMD = pUserDataMD;

    pInit->bs.onRead = onRead;
    pInit->bs.onSeek = onSeek;
    pInit->bs.pUserData = pUserData;
    drflac__reset_cache(&pInit->bs);


    /* If the container is explicitly defined then we can try opening in relaxed mode. */
    relaxed = container != drflac_container_unknown;

    /* Skip over any ID3 tags. */
    for (;;) {
        if (onRead(pUserData, id, 4) != 4) {
            return DRFLAC_FALSE; /* Ran out of data. */
        }
        pInit->runningFilePos += 4;

        if (id[0] == 'I' && id[1] == 'D' && id[2] == '3') {
            drflac_uint8 header[6];
            drflac_uint8 flags;
            drflac_uint32 headerSize;

            if (onRead(pUserData, header, 6) != 6) {
                return DRFLAC_FALSE; /* Ran out of data. */
            }
            pInit->runningFilePos += 6;

            flags = header[1];

            DRFLAC_COPY_MEMORY(&headerSize, header+2, 4);
            headerSize = drflac__unsynchsafe_32(drflac__be2host_32(headerSize));
            if (flags & 0x10) {
                headerSize += 10;
            }

            if (!onSeek(pUserData, headerSize, drflac_seek_origin_current)) {
                return DRFLAC_FALSE; /* Failed to seek past the tag.
*/
            }
            pInit->runningFilePos += headerSize;
        } else {
            break;
        }
    }

    if (id[0] == 'f' && id[1] == 'L' && id[2] == 'a' && id[3] == 'C') {
        return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
    }
#ifndef DR_FLAC_NO_OGG
    if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
        return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
    }
#endif

    /* If we get here it means we likely don't have a header. Try opening in relaxed mode, if applicable. */
    if (relaxed) {
        if (container == drflac_container_native) {
            return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
        }
#ifndef DR_FLAC_NO_OGG
        if (container == drflac_container_ogg) {
            return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
        }
#endif
    }

    /* Unsupported container.
*/
    return DRFLAC_FALSE;
}

static void drflac__init_from_info(drflac* pFlac, const drflac_init_info* pInit)
{
    DRFLAC_ASSERT(pFlac != NULL);
    DRFLAC_ASSERT(pInit != NULL);

    DRFLAC_ZERO_MEMORY(pFlac, sizeof(*pFlac));
    pFlac->bs = pInit->bs;
    pFlac->onMeta = pInit->onMeta;
    pFlac->pUserDataMD = pInit->pUserDataMD;
    pFlac->maxBlockSizeInPCMFrames = pInit->maxBlockSizeInPCMFrames;
    pFlac->sampleRate = pInit->sampleRate;
    pFlac->channels = (drflac_uint8)pInit->channels;
    pFlac->bitsPerSample = (drflac_uint8)pInit->bitsPerSample;
    pFlac->totalPCMFrameCount = pInit->totalPCMFrameCount;
    pFlac->container = pInit->container;
}


static drflac* drflac_open_with_metadata_private(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac_init_info init;
    drflac_uint32 allocationSize;
    drflac_uint32 wholeSIMDVectorCountPerChannel;
    drflac_uint32 decodedSamplesAllocationSize;
#ifndef DR_FLAC_NO_OGG
    drflac_oggbs* pOggbs = NULL;
#endif
    drflac_uint64 firstFramePos;
    drflac_uint64 seektablePos;
    drflac_uint32 seekpointCount;
    drflac_allocation_callbacks allocationCallbacks;
    drflac* pFlac;

    /* CPU support first. */
    drflac__init_cpu_caps();

    if (!drflac__init_private(&init, onRead, onSeek, onMeta, container, pUserData, pUserDataMD)) {
        return NULL;
    }

    if (pAllocationCallbacks != NULL) {
        allocationCallbacks = *pAllocationCallbacks;
        if (allocationCallbacks.onFree == NULL || (allocationCallbacks.onMalloc == NULL && allocationCallbacks.onRealloc == NULL)) {
            return NULL; /* Invalid allocation callbacks.
*/
        }
    } else {
        allocationCallbacks.pUserData = NULL;
        allocationCallbacks.onMalloc = drflac__malloc_default;
        allocationCallbacks.onRealloc = drflac__realloc_default;
        allocationCallbacks.onFree = drflac__free_default;
    }


    /*
    The size of the allocation for the drflac object needs to be large enough to fit the following:
      1) The main members of the drflac structure
      2) A block of memory large enough to store the decoded samples of the largest frame in the stream
      3) If the container is Ogg, a drflac_oggbs object

    The complicated part of the allocation is making sure there's enough room for the decoded samples, taking into consideration
    the different SIMD instruction sets.
    */
    allocationSize = sizeof(drflac);

    /*
    The allocation size for decoded frames depends on the number of 32-bit integers that fit inside the largest SIMD vector
    we are supporting.
    */
    if ((init.maxBlockSizeInPCMFrames % (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) == 0) {
        wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32)));
    } else {
        wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) + 1;
    }

    decodedSamplesAllocationSize = wholeSIMDVectorCountPerChannel * DRFLAC_MAX_SIMD_VECTOR_SIZE * init.channels;

    allocationSize += decodedSamplesAllocationSize;
    allocationSize += DRFLAC_MAX_SIMD_VECTOR_SIZE; /* Allocate extra bytes to ensure we have enough for alignment. */

#ifndef DR_FLAC_NO_OGG
    /* There's additional data required for Ogg streams.
*/
    if (init.container == drflac_container_ogg) {
        allocationSize += sizeof(drflac_oggbs);

        pOggbs = (drflac_oggbs*)drflac__malloc_from_callbacks(sizeof(*pOggbs), &allocationCallbacks);
        if (pOggbs == NULL) {
            return NULL; /*DRFLAC_OUT_OF_MEMORY;*/
        }

        DRFLAC_ZERO_MEMORY(pOggbs, sizeof(*pOggbs));
        pOggbs->onRead = onRead;
        pOggbs->onSeek = onSeek;
        pOggbs->pUserData = pUserData;
        pOggbs->currentBytePos = init.oggFirstBytePos;
        pOggbs->firstBytePos = init.oggFirstBytePos;
        pOggbs->serialNumber = init.oggSerial;
        pOggbs->bosPageHeader = init.oggBosHeader;
        pOggbs->bytesRemainingInPage = 0;
    }
#endif

    /*
    This part is a bit awkward. We need to load the seektable so that it can be referenced in-memory, but I want the drflac object to
    consist of only a single heap allocation. To do this, the size of the seek table needs to be known, which we determine when reading
    and decoding the metadata.
    */
    firstFramePos = 42; /* <-- We know we are at byte 42 at this point.
*/
    seektablePos = 0;
    seekpointCount = 0;
    if (init.hasMetadataBlocks) {
        drflac_read_proc onReadOverride = onRead;
        drflac_seek_proc onSeekOverride = onSeek;
        void* pUserDataOverride = pUserData;

#ifndef DR_FLAC_NO_OGG
        if (init.container == drflac_container_ogg) {
            onReadOverride = drflac__on_read_ogg;
            onSeekOverride = drflac__on_seek_ogg;
            pUserDataOverride = (void*)pOggbs;
        }
#endif

        if (!drflac__read_and_decode_metadata(onReadOverride, onSeekOverride, onMeta, pUserDataOverride, pUserDataMD, &firstFramePos, &seektablePos, &seekpointCount, &allocationCallbacks)) {
#ifndef DR_FLAC_NO_OGG
            drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
#endif
            return NULL;
        }

        allocationSize += seekpointCount * sizeof(drflac_seekpoint);
    }


    pFlac = (drflac*)drflac__malloc_from_callbacks(allocationSize, &allocationCallbacks);
    if (pFlac == NULL) {
#ifndef DR_FLAC_NO_OGG
        drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
#endif
        return NULL;
    }

    drflac__init_from_info(pFlac, &init);
    pFlac->allocationCallbacks = allocationCallbacks;
    pFlac->pDecodedSamples = (drflac_int32*)drflac_align((size_t)pFlac->pExtraData, DRFLAC_MAX_SIMD_VECTOR_SIZE);

#ifndef DR_FLAC_NO_OGG
    if (init.container == drflac_container_ogg) {
        drflac_oggbs* pInternalOggbs = (drflac_oggbs*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize + (seekpointCount * sizeof(drflac_seekpoint)));
        DRFLAC_COPY_MEMORY(pInternalOggbs, pOggbs, sizeof(*pOggbs));

        /* At this point the pOggbs object has been handed over to pInternalOggbs and can be freed. */
        drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
        pOggbs = NULL;

        /* The Ogg bitstream needs to be layered on top of the original bitstream.
*/
        pFlac->bs.onRead = drflac__on_read_ogg;
        pFlac->bs.onSeek = drflac__on_seek_ogg;
        pFlac->bs.pUserData = (void*)pInternalOggbs;
        pFlac->_oggbs = (void*)pInternalOggbs;
    }
#endif

    pFlac->firstFLACFramePosInBytes = firstFramePos;

    /* NOTE: Seektables are not currently compatible with Ogg encapsulation (Ogg has its own accelerated seeking system). I may change this later, so I'm leaving this here for now. */
#ifndef DR_FLAC_NO_OGG
    if (init.container == drflac_container_ogg)
    {
        pFlac->pSeekpoints = NULL;
        pFlac->seekpointCount = 0;
    }
    else
#endif
    {
        /* If we have a seektable we need to load it now, making sure we move back to where we were previously. */
        if (seektablePos != 0) {
            pFlac->seekpointCount = seekpointCount;
            pFlac->pSeekpoints = (drflac_seekpoint*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize);

            DRFLAC_ASSERT(pFlac->bs.onSeek != NULL);
            DRFLAC_ASSERT(pFlac->bs.onRead != NULL);

            /* Seek to the seektable, then just read directly into our seektable buffer. */
            if (pFlac->bs.onSeek(pFlac->bs.pUserData, (int)seektablePos, drflac_seek_origin_start)) {
                drflac_uint32 iSeekpoint;

                for (iSeekpoint = 0; iSeekpoint < seekpointCount; iSeekpoint += 1) {
                    if (pFlac->bs.onRead(pFlac->bs.pUserData, pFlac->pSeekpoints + iSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) == DRFLAC_SEEKPOINT_SIZE_IN_BYTES) {
                        /* Endian swap. */
                        pFlac->pSeekpoints[iSeekpoint].firstPCMFrame = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].firstPCMFrame);
                        pFlac->pSeekpoints[iSeekpoint].flacFrameOffset = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].flacFrameOffset);
                        pFlac->pSeekpoints[iSeekpoint].pcmFrameCount = drflac__be2host_16(pFlac->pSeekpoints[iSeekpoint].pcmFrameCount);
                    } else {
                        /* Failed to read the seektable. Pretend we don't have one.
*/
                        pFlac->pSeekpoints = NULL;
                        pFlac->seekpointCount = 0;
                        break;
                    }
                }

                /* We need to seek back to where we were. If this fails it's a critical error. */
                if (!pFlac->bs.onSeek(pFlac->bs.pUserData, (int)pFlac->firstFLACFramePosInBytes, drflac_seek_origin_start)) {
                    drflac__free_from_callbacks(pFlac, &allocationCallbacks);
                    return NULL;
                }
            } else {
                /* Failed to seek to the seektable. Ominous sign, but for now we can just pretend we don't have one. */
                pFlac->pSeekpoints = NULL;
                pFlac->seekpointCount = 0;
            }
        }
    }


    /*
    If we get here, but don't have a STREAMINFO block, it means we've opened the stream in relaxed mode and need to decode
    the first frame.
    */
    if (!init.hasStreamInfoBlock) {
        pFlac->currentFLACFrame.header = init.firstFrameHeader;
        for (;;) {
            drflac_result result = drflac__decode_flac_frame(pFlac);
            if (result == DRFLAC_SUCCESS) {
                break;
            } else {
                if (result == DRFLAC_CRC_MISMATCH) {
                    if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
                        drflac__free_from_callbacks(pFlac, &allocationCallbacks);
                        return NULL;
                    }
                    continue;
                } else {
                    drflac__free_from_callbacks(pFlac, &allocationCallbacks);
                    return NULL;
                }
            }
        }
    }

    return pFlac;
}



#ifndef DR_FLAC_NO_STDIO
#include <stdio.h>
#ifndef DR_FLAC_NO_WCHAR
#include <wchar.h> /* For wcslen(), wcsrtombs() */
#endif

/* Errno */
/* drflac_result_from_errno() is only used for fopen() and wfopen() so putting it inside DR_FLAC_NO_STDIO for now. If something else needs this later we can move it out.
*/
#include <errno.h>
static drflac_result drflac_result_from_errno(int e)
{
    switch (e)
    {
        case 0: return DRFLAC_SUCCESS;
    #ifdef EPERM
        case EPERM: return DRFLAC_INVALID_OPERATION;
    #endif
    #ifdef ENOENT
        case ENOENT: return DRFLAC_DOES_NOT_EXIST;
    #endif
    #ifdef ESRCH
        case ESRCH: return DRFLAC_DOES_NOT_EXIST;
    #endif
    #ifdef EINTR
        case EINTR: return DRFLAC_INTERRUPT;
    #endif
    #ifdef EIO
        case EIO: return DRFLAC_IO_ERROR;
    #endif
    #ifdef ENXIO
        case ENXIO: return DRFLAC_DOES_NOT_EXIST;
    #endif
    #ifdef E2BIG
        case E2BIG: return DRFLAC_INVALID_ARGS;
    #endif
    #ifdef ENOEXEC
        case ENOEXEC: return DRFLAC_INVALID_FILE;
    #endif
    #ifdef EBADF
        case EBADF: return DRFLAC_INVALID_FILE;
    #endif
    #ifdef ECHILD
        case ECHILD: return DRFLAC_ERROR;
    #endif
    #ifdef EAGAIN
        case EAGAIN: return DRFLAC_UNAVAILABLE;
    #endif
    #ifdef ENOMEM
        case ENOMEM: return DRFLAC_OUT_OF_MEMORY;
    #endif
    #ifdef EACCES
        case EACCES: return DRFLAC_ACCESS_DENIED;
    #endif
    #ifdef EFAULT
        case EFAULT: return DRFLAC_BAD_ADDRESS;
    #endif
    #ifdef ENOTBLK
        case ENOTBLK: return DRFLAC_ERROR;
    #endif
    #ifdef EBUSY
        case EBUSY: return DRFLAC_BUSY;
    #endif
    #ifdef EEXIST
        case EEXIST: return DRFLAC_ALREADY_EXISTS;
    #endif
    #ifdef EXDEV
        case EXDEV: return DRFLAC_ERROR;
    #endif
    #ifdef ENODEV
        case ENODEV: return DRFLAC_DOES_NOT_EXIST;
    #endif
    #ifdef ENOTDIR
        case ENOTDIR: return DRFLAC_NOT_DIRECTORY;
    #endif
    #ifdef EISDIR
        case EISDIR: return DRFLAC_IS_DIRECTORY;
    #endif
    #ifdef EINVAL
        case EINVAL: return DRFLAC_INVALID_ARGS;
    #endif
    #ifdef ENFILE
        case ENFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
    #endif
    #ifdef EMFILE
        case EMFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
    #endif
    #ifdef ENOTTY
        case ENOTTY: return DRFLAC_INVALID_OPERATION;
    #endif
    #ifdef ETXTBSY
        case ETXTBSY: return DRFLAC_BUSY;
    #endif
    #ifdef EFBIG
        case EFBIG: return DRFLAC_TOO_BIG;
    #endif
    #ifdef ENOSPC
        case ENOSPC: return DRFLAC_NO_SPACE;
    #endif
    #ifdef ESPIPE
        case ESPIPE: return DRFLAC_BAD_SEEK;
    #endif
    #ifdef EROFS
        case EROFS: return DRFLAC_ACCESS_DENIED;
    #endif
    #ifdef EMLINK
        case EMLINK: return DRFLAC_TOO_MANY_LINKS;
    #endif
    #ifdef EPIPE
        case EPIPE: return DRFLAC_BAD_PIPE;
    #endif
    #ifdef EDOM
        case EDOM: return DRFLAC_OUT_OF_RANGE;
    #endif
    #ifdef ERANGE
        case ERANGE: return DRFLAC_OUT_OF_RANGE;
    #endif
    #ifdef EDEADLK
        case EDEADLK: return DRFLAC_DEADLOCK;
    #endif
    #ifdef ENAMETOOLONG
        case ENAMETOOLONG: return DRFLAC_PATH_TOO_LONG;
    #endif
    #ifdef ENOLCK
        case ENOLCK: return DRFLAC_ERROR;
    #endif
    #ifdef ENOSYS
        case ENOSYS: return DRFLAC_NOT_IMPLEMENTED;
    #endif
    #ifdef ENOTEMPTY
        case ENOTEMPTY: return DRFLAC_DIRECTORY_NOT_EMPTY;
    #endif
    #ifdef ELOOP
        case ELOOP: return DRFLAC_TOO_MANY_LINKS;
    #endif
    #ifdef ENOMSG
        case ENOMSG: return DRFLAC_NO_MESSAGE;
    #endif
    #ifdef EIDRM
        case EIDRM: return DRFLAC_ERROR;
    #endif
    #ifdef ECHRNG
        case ECHRNG: return DRFLAC_ERROR;
    #endif
    #ifdef EL2NSYNC
        case EL2NSYNC: return DRFLAC_ERROR;
    #endif
    #ifdef EL3HLT
        case EL3HLT: return DRFLAC_ERROR;
    #endif
    #ifdef EL3RST
        case EL3RST: return DRFLAC_ERROR;
    #endif
    #ifdef ELNRNG
        case ELNRNG: return DRFLAC_OUT_OF_RANGE;
    #endif
    #ifdef EUNATCH
        case EUNATCH: return DRFLAC_ERROR;
    #endif
    #ifdef ENOCSI
        case ENOCSI: return DRFLAC_ERROR;
    #endif
    #ifdef EL2HLT
        case EL2HLT: return DRFLAC_ERROR;
    #endif
    #ifdef EBADE
        case EBADE: return DRFLAC_ERROR;
    #endif
    #ifdef EBADR
        case EBADR: return DRFLAC_ERROR;
    #endif
    #ifdef EXFULL
        case EXFULL: return DRFLAC_ERROR;
    #endif
    #ifdef ENOANO
        case ENOANO: return DRFLAC_ERROR;
    #endif
    #ifdef EBADRQC
        case EBADRQC: return DRFLAC_ERROR;
    #endif
    #ifdef EBADSLT
        case EBADSLT: return DRFLAC_ERROR;
    #endif
    #ifdef EBFONT
        case EBFONT: return DRFLAC_INVALID_FILE;
    #endif
    #ifdef ENOSTR
        case ENOSTR: return DRFLAC_ERROR;
    #endif
    #ifdef ENODATA
        case ENODATA: return DRFLAC_NO_DATA_AVAILABLE;
    #endif
    #ifdef ETIME
        case ETIME: return DRFLAC_TIMEOUT;
    #endif
    #ifdef ENOSR
        case ENOSR: return DRFLAC_NO_DATA_AVAILABLE;
    #endif
    #ifdef ENONET
        case ENONET: return DRFLAC_NO_NETWORK;
    #endif
    #ifdef ENOPKG
        case ENOPKG: return DRFLAC_ERROR;
    #endif
    #ifdef EREMOTE
        case EREMOTE: return DRFLAC_ERROR;
    #endif
    #ifdef ENOLINK
        case ENOLINK: return DRFLAC_ERROR;
    #endif
    #ifdef EADV
        case EADV: return DRFLAC_ERROR;
    #endif
    #ifdef ESRMNT
        case ESRMNT: return DRFLAC_ERROR;
    #endif
    #ifdef ECOMM
        case ECOMM: return DRFLAC_ERROR;
    #endif
    #ifdef EPROTO
        case EPROTO: return DRFLAC_ERROR;
    #endif
    #ifdef EMULTIHOP
        case EMULTIHOP: return DRFLAC_ERROR;
    #endif
    #ifdef EDOTDOT
        case EDOTDOT: return DRFLAC_ERROR;
    #endif
    #ifdef EBADMSG
        case EBADMSG: return DRFLAC_BAD_MESSAGE;
    #endif
    #ifdef EOVERFLOW
        case EOVERFLOW: return DRFLAC_TOO_BIG;
    #endif
    #ifdef ENOTUNIQ
        case ENOTUNIQ: return DRFLAC_NOT_UNIQUE;
    #endif
    #ifdef EBADFD
        case EBADFD: return DRFLAC_ERROR;
    #endif
    #ifdef EREMCHG
        case EREMCHG: return DRFLAC_ERROR;
    #endif
    #ifdef ELIBACC
        case ELIBACC: return DRFLAC_ACCESS_DENIED;
    #endif
    #ifdef ELIBBAD
        case ELIBBAD: return DRFLAC_INVALID_FILE;
    #endif
    #ifdef ELIBSCN
        case ELIBSCN: return DRFLAC_INVALID_FILE;
    #endif
    #ifdef ELIBMAX
        case ELIBMAX: return DRFLAC_ERROR;
    #endif
    #ifdef ELIBEXEC
        case ELIBEXEC: return DRFLAC_ERROR;
    #endif
    #ifdef EILSEQ
        case EILSEQ: return DRFLAC_INVALID_DATA;
    #endif
    #ifdef ERESTART
        case ERESTART: return DRFLAC_ERROR;
    #endif
    #ifdef ESTRPIPE
        case ESTRPIPE: return DRFLAC_ERROR;
    #endif
    #ifdef EUSERS
        case EUSERS: return DRFLAC_ERROR;
    #endif
    #ifdef ENOTSOCK
        case ENOTSOCK: return DRFLAC_NOT_SOCKET;
    #endif
    #ifdef EDESTADDRREQ
        case EDESTADDRREQ: return DRFLAC_NO_ADDRESS;
    #endif
    #ifdef EMSGSIZE
        case EMSGSIZE: return DRFLAC_TOO_BIG;
    #endif
    #ifdef EPROTOTYPE
        case EPROTOTYPE: return DRFLAC_BAD_PROTOCOL;
    #endif
    #ifdef ENOPROTOOPT
        case ENOPROTOOPT: return DRFLAC_PROTOCOL_UNAVAILABLE;
    #endif
    #ifdef EPROTONOSUPPORT
        case EPROTONOSUPPORT: return DRFLAC_PROTOCOL_NOT_SUPPORTED;
    #endif
    #ifdef ESOCKTNOSUPPORT
        case ESOCKTNOSUPPORT: return DRFLAC_SOCKET_NOT_SUPPORTED;
    #endif
    #ifdef EOPNOTSUPP
        case EOPNOTSUPP: return DRFLAC_INVALID_OPERATION;
    #endif
    #ifdef EPFNOSUPPORT
        case EPFNOSUPPORT: return DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED;
    #endif
    #ifdef EAFNOSUPPORT
        case EAFNOSUPPORT: return DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED;
    #endif
    #ifdef EADDRINUSE
        case EADDRINUSE: return DRFLAC_ALREADY_IN_USE;
    #endif
    #ifdef EADDRNOTAVAIL
        case EADDRNOTAVAIL: return DRFLAC_ERROR;
    #endif
    #ifdef ENETDOWN
        case ENETDOWN: return DRFLAC_NO_NETWORK;
    #endif
    #ifdef ENETUNREACH
        case ENETUNREACH: return DRFLAC_NO_NETWORK;
    #endif
    #ifdef ENETRESET
        case ENETRESET: return DRFLAC_NO_NETWORK;
    #endif
    #ifdef ECONNABORTED
        case ECONNABORTED: return DRFLAC_NO_NETWORK;
    #endif
    #ifdef ECONNRESET
        case ECONNRESET: return
            DRFLAC_CONNECTION_RESET;
    #endif
    #ifdef ENOBUFS
        case ENOBUFS: return DRFLAC_NO_SPACE;
    #endif
    #ifdef EISCONN
        case EISCONN: return DRFLAC_ALREADY_CONNECTED;
    #endif
    #ifdef ENOTCONN
        case ENOTCONN: return DRFLAC_NOT_CONNECTED;
    #endif
    #ifdef ESHUTDOWN
        case ESHUTDOWN: return DRFLAC_ERROR;
    #endif
    #ifdef ETOOMANYREFS
        case ETOOMANYREFS: return DRFLAC_ERROR;
    #endif
    #ifdef ETIMEDOUT
        case ETIMEDOUT: return DRFLAC_TIMEOUT;
    #endif
    #ifdef ECONNREFUSED
        case ECONNREFUSED: return DRFLAC_CONNECTION_REFUSED;
    #endif
    #ifdef EHOSTDOWN
        case EHOSTDOWN: return DRFLAC_NO_HOST;
    #endif
    #ifdef EHOSTUNREACH
        case EHOSTUNREACH: return DRFLAC_NO_HOST;
    #endif
    #ifdef EALREADY
        case EALREADY: return DRFLAC_IN_PROGRESS;
    #endif
    #ifdef EINPROGRESS
        case EINPROGRESS: return DRFLAC_IN_PROGRESS;
    #endif
    #ifdef ESTALE
        case ESTALE: return DRFLAC_INVALID_FILE;
    #endif
    #ifdef EUCLEAN
        case EUCLEAN: return DRFLAC_ERROR;
    #endif
    #ifdef ENOTNAM
        case ENOTNAM: return DRFLAC_ERROR;
    #endif
    #ifdef ENAVAIL
        case ENAVAIL: return DRFLAC_ERROR;
    #endif
    #ifdef EISNAM
        case EISNAM: return DRFLAC_ERROR;
    #endif
    #ifdef EREMOTEIO
        case EREMOTEIO: return DRFLAC_IO_ERROR;
    #endif
    #ifdef EDQUOT
        case EDQUOT: return DRFLAC_NO_SPACE;
    #endif
    #ifdef ENOMEDIUM
        case ENOMEDIUM: return DRFLAC_DOES_NOT_EXIST;
    #endif
    #ifdef EMEDIUMTYPE
        case EMEDIUMTYPE: return DRFLAC_ERROR;
    #endif
    #ifdef ECANCELED
        case ECANCELED: return DRFLAC_CANCELLED;
    #endif
    #ifdef ENOKEY
        case ENOKEY: return DRFLAC_ERROR;
    #endif
    #ifdef EKEYEXPIRED
        case EKEYEXPIRED: return DRFLAC_ERROR;
    #endif
    #ifdef EKEYREVOKED
        case EKEYREVOKED: return DRFLAC_ERROR;
    #endif
    #ifdef EKEYREJECTED
        case EKEYREJECTED: return DRFLAC_ERROR;
    #endif
    #ifdef EOWNERDEAD
        case EOWNERDEAD: return DRFLAC_ERROR;
    #endif
    #ifdef ENOTRECOVERABLE
        case ENOTRECOVERABLE: return DRFLAC_ERROR;
    #endif
    #ifdef ERFKILL
        case ERFKILL: return DRFLAC_ERROR;
    #endif
    #ifdef EHWPOISON
        case EHWPOISON: return DRFLAC_ERROR;
    #endif
        default: return DRFLAC_ERROR;
    }
}
/* End Errno */

/* fopen */
static drflac_result drflac_fopen(FILE** ppFile, const char* pFilePath, const char* pOpenMode)
{
#if defined(_MSC_VER) && _MSC_VER >= 1400
    errno_t err;
#endif

    if (ppFile != NULL) {
        *ppFile = NULL; /* Safety. */
    }

    if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
        return DRFLAC_INVALID_ARGS;
    }

#if defined(_MSC_VER) && _MSC_VER >= 1400
    err = fopen_s(ppFile, pFilePath, pOpenMode);
    if (err != 0) {
        return drflac_result_from_errno(err);
    }
#else
#if defined(_WIN32) || defined(__APPLE__)
    *ppFile = fopen(pFilePath, pOpenMode);
#else
    #if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64 && defined(_LARGEFILE64_SOURCE)
        *ppFile = fopen64(pFilePath, pOpenMode);
    #else
        *ppFile = fopen(pFilePath, pOpenMode);
    #endif
#endif
    if (*ppFile == NULL) {
        drflac_result result = drflac_result_from_errno(errno);
        if (result == DRFLAC_SUCCESS) {
            result = DRFLAC_ERROR; /* Just a safety check to make sure we never ever return success when *ppFile == NULL. */
        }

        return result;
    }
#endif

    return DRFLAC_SUCCESS;
}

/*
_wfopen() isn't always available in all compilation environments.

    * Windows only.
    * MSVC seems to support it universally as far back as VC6 from what I can tell (haven't checked further back).
    * MinGW-64 (both 32- and 64-bit) seems to support it.
    * MinGW wraps it in !defined(__STRICT_ANSI__).
    * OpenWatcom wraps it in !defined(_NO_EXT_KEYS).

This can be reviewed as compatibility issues arise. The preference is to use _wfopen_s() and _wfopen() as opposed to the wcsrtombs()
fallback, so if you notice your compiler not detecting this properly I'm happy to look at adding support.
*/
#if defined(_WIN32)
    #if defined(_MSC_VER) || defined(__MINGW64__) || (!defined(__STRICT_ANSI__) && !defined(_NO_EXT_KEYS))
        #define DRFLAC_HAS_WFOPEN
    #endif
#endif

#ifndef DR_FLAC_NO_WCHAR
static drflac_result drflac_wfopen(FILE** ppFile, const wchar_t* pFilePath, const wchar_t* pOpenMode, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    if (ppFile != NULL) {
        *ppFile = NULL; /* Safety. */
    }

    if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
        return DRFLAC_INVALID_ARGS;
    }

#if defined(DRFLAC_HAS_WFOPEN)
    {
        /* Use _wfopen() on Windows. */
    #if defined(_MSC_VER) && _MSC_VER >= 1400
        errno_t err = _wfopen_s(ppFile, pFilePath, pOpenMode);
        if (err != 0) {
            return drflac_result_from_errno(err);
        }
    #else
        *ppFile = _wfopen(pFilePath, pOpenMode);
        if (*ppFile == NULL) {
            return drflac_result_from_errno(errno);
        }
    #endif
        (void)pAllocationCallbacks;
    }
#else
    /*
    Use fopen() on anything other than Windows. Requires a conversion. This is annoying because
    fopen() is locale specific. The only real way I can think of to do this is with wcsrtombs(). Note
    that wcstombs() is apparently not thread-safe because it uses a static global mbstate_t object for
    maintaining state. I've checked this with -std=c89 and it works, but if somebody gets a compiler
    error I'll look into improving compatibility.
    */

    /*
    Some compilers don't support wchar_t or wcsrtombs() which we're using below. In this case we just
    need to abort with an error.
    If you encounter a compiler lacking such support, add it to this list
    and submit a bug report and it'll be added to the library upstream.
    */
    #if defined(__DJGPP__)
    {
        /* Nothing to do here. This will fall through to the error check below. */
    }
    #else
    {
        mbstate_t mbs;
        size_t lenMB;
        const wchar_t* pFilePathTemp = pFilePath;
        char* pFilePathMB = NULL;
        char pOpenModeMB[32] = {0};

        /* Get the length first. */
        DRFLAC_ZERO_OBJECT(&mbs);
        lenMB = wcsrtombs(NULL, &pFilePathTemp, 0, &mbs);
        if (lenMB == (size_t)-1) {
            return drflac_result_from_errno(errno);
        }

        pFilePathMB = (char*)drflac__malloc_from_callbacks(lenMB + 1, pAllocationCallbacks);
        if (pFilePathMB == NULL) {
            return DRFLAC_OUT_OF_MEMORY;
        }

        pFilePathTemp = pFilePath;
        DRFLAC_ZERO_OBJECT(&mbs);
        wcsrtombs(pFilePathMB, &pFilePathTemp, lenMB + 1, &mbs);

        /* The open mode should always consist of ASCII characters so we should be able to do a trivial conversion. */
        {
            size_t i = 0;
            for (;;) {
                if (pOpenMode[i] == 0) {
                    pOpenModeMB[i] = '\0';
                    break;
                }

                pOpenModeMB[i] = (char)pOpenMode[i];
                i += 1;
            }
        }

        *ppFile = fopen(pFilePathMB, pOpenModeMB);

        drflac__free_from_callbacks(pFilePathMB, pAllocationCallbacks);
    }
    #endif

    if (*ppFile == NULL) {
        return DRFLAC_ERROR;
    }
#endif

    return DRFLAC_SUCCESS;
}
#endif
/* End fopen */

static size_t drflac__on_read_stdio(void* pUserData, void* bufferOut, size_t bytesToRead)
{
    return fread(bufferOut, 1, bytesToRead, (FILE*)pUserData);
}

static drflac_bool32 drflac__on_seek_stdio(void* pUserData, int offset, drflac_seek_origin origin)
{
    DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */

    return fseek((FILE*)pUserData, offset, (origin == drflac_seek_origin_current) ? SEEK_CUR : SEEK_SET) == 0;
}


DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;
    FILE* pFile;

    if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
        return NULL;
    }

    pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
    if (pFlac == NULL) {
        fclose(pFile);
        return NULL;
    }

    return pFlac;
}

#ifndef DR_FLAC_NO_WCHAR
DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;
    FILE* pFile;

    if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
        return NULL;
    }

    pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
    if (pFlac == NULL) {
        fclose(pFile);
        return NULL;
    }

    return pFlac;
}
#endif

DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;
    FILE* pFile;

    if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
        return NULL;
    }

    pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        fclose(pFile);
        return pFlac;
    }

    return pFlac;
}

#ifndef DR_FLAC_NO_WCHAR
DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;
    FILE* pFile;

    if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
        return NULL;
    }

    pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        fclose(pFile);
        return pFlac;
    }

    return pFlac;
}
#endif
#endif /* DR_FLAC_NO_STDIO */

static size_t drflac__on_read_memory(void* pUserData, void* bufferOut, size_t bytesToRead)
{
    drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
    size_t bytesRemaining;

    DRFLAC_ASSERT(memoryStream != NULL);
    DRFLAC_ASSERT(memoryStream->dataSize >= memoryStream->currentReadPos);

    bytesRemaining = memoryStream->dataSize - memoryStream->currentReadPos;
    if (bytesToRead > bytesRemaining) {
        bytesToRead = bytesRemaining;
    }

    if (bytesToRead > 0) {
        DRFLAC_COPY_MEMORY(bufferOut, memoryStream->data + memoryStream->currentReadPos, bytesToRead);
        memoryStream->currentReadPos += bytesToRead;
    }

    return bytesToRead;
}

static drflac_bool32 drflac__on_seek_memory(void* pUserData, int offset, drflac_seek_origin origin)
{
    drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;

    DRFLAC_ASSERT(memoryStream != NULL);
    DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */

    if (offset > (drflac_int64)memoryStream->dataSize) {
        return DRFLAC_FALSE;
    }

    if (origin == drflac_seek_origin_current) {
        if (memoryStream->currentReadPos + offset <= memoryStream->dataSize) {
            memoryStream->currentReadPos += offset;
        } else {
            return DRFLAC_FALSE; /* Trying to seek too far forward. */
        }
    } else {
        if ((drflac_uint32)offset <= memoryStream->dataSize) {
            memoryStream->currentReadPos = offset;
        } else {
            return DRFLAC_FALSE; /* Trying to seek too far forward. */
        }
    }

    return DRFLAC_TRUE;
}

DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac__memory_stream memoryStream;
    drflac* pFlac;

    memoryStream.data = (const drflac_uint8*)pData;
    memoryStream.dataSize = dataSize;
    memoryStream.currentReadPos = 0;
    pFlac = drflac_open(drflac__on_read_memory, drflac__on_seek_memory, &memoryStream, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    pFlac->memoryStream = memoryStream;

    /* This is an awful hack... */
#ifndef DR_FLAC_NO_OGG
    if (pFlac->container == drflac_container_ogg)
    {
        drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
        oggbs->pUserData = &pFlac->memoryStream;
    }
    else
#endif
    {
        pFlac->bs.pUserData = &pFlac->memoryStream;
    }

    return pFlac;
}

DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac__memory_stream memoryStream;
    drflac* pFlac;

    memoryStream.data = (const drflac_uint8*)pData;
    memoryStream.dataSize = dataSize;
    memoryStream.currentReadPos = 0;
    pFlac = drflac_open_with_metadata_private(drflac__on_read_memory, drflac__on_seek_memory, onMeta, drflac_container_unknown, &memoryStream, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    pFlac->memoryStream = memoryStream;

    /* This is an awful hack... */
#ifndef DR_FLAC_NO_OGG
    if (pFlac->container == drflac_container_ogg)
    {
        drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
        oggbs->pUserData = &pFlac->memoryStream;
    }
    else
#endif
    {
        pFlac->bs.pUserData = &pFlac->memoryStream;
    }

    return pFlac;
}



DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    return drflac_open_with_metadata_private(onRead, onSeek, NULL, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
}
DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    return drflac_open_with_metadata_private(onRead, onSeek, NULL, container, pUserData, pUserData, pAllocationCallbacks);
}

DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    return drflac_open_with_metadata_private(onRead, onSeek, onMeta, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
}
DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    return drflac_open_with_metadata_private(onRead, onSeek, onMeta, container, pUserData, pUserData, pAllocationCallbacks);
}

DRFLAC_API void drflac_close(drflac* pFlac)
{
    if (pFlac == NULL) {
        return;
    }

#ifndef DR_FLAC_NO_STDIO
    /*
    If we opened the file with drflac_open_file() we will want to close the file handle.
    We can know whether or not drflac_open_file()
    was used by looking at the callbacks.
    */
    if (pFlac->bs.onRead == drflac__on_read_stdio) {
        fclose((FILE*)pFlac->bs.pUserData);
    }

#ifndef DR_FLAC_NO_OGG
    /* Need to clean up Ogg streams a bit differently due to the way the bit streaming is chained. */
    if (pFlac->container == drflac_container_ogg) {
        drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
        DRFLAC_ASSERT(pFlac->bs.onRead == drflac__on_read_ogg);

        if (oggbs->onRead == drflac__on_read_stdio) {
            fclose((FILE*)oggbs->pUserData);
        }
    }
#endif
#endif

    drflac__free_from_callbacks(pFlac, &pFlac->allocationCallbacks);
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 right0 = left0 - side0;
        drflac_uint32 right1 = left1 - side1;
        drflac_uint32 right2 = left2 - side2;
        drflac_uint32 right3 = left3 - side3;

        pOutputSamples[i*8+0] = (drflac_int32)left0;
        pOutputSamples[i*8+1] = (drflac_int32)right0;
        pOutputSamples[i*8+2] = (drflac_int32)left1;
        pOutputSamples[i*8+3] = (drflac_int32)right1;
        pOutputSamples[i*8+4] = (drflac_int32)left2;
        pOutputSamples[i*8+5] = (drflac_int32)right2;
        pOutputSamples[i*8+6] = (drflac_int32)left3;
        pOutputSamples[i*8+7] = (drflac_int32)right3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left = pInputSamples0U32[i] << shift0;
        drflac_uint32 side = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i right = _mm_sub_epi32(left, side);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left = pInputSamples0U32[i] << shift0;
        drflac_uint32 side = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t left;
        uint32x4_t side;
        uint32x4_t right;

        left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        right = vsubq_u32(left, side);

        drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left = pInputSamples0U32[i] << shift0;
        drflac_uint32 side = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback.
        */
#if 0
        drflac_read_pcm_frames_s32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 left = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 left0 = right0 + side0;
        drflac_uint32 left1 = right1 + side1;
        drflac_uint32 left2 = right2 + side2;
        drflac_uint32 left3 = right3 + side3;

        pOutputSamples[i*8+0] = (drflac_int32)left0;
        pOutputSamples[i*8+1] = (drflac_int32)right0;
        pOutputSamples[i*8+2] = (drflac_int32)left1;
        pOutputSamples[i*8+3] = (drflac_int32)right1;
        pOutputSamples[i*8+4] = (drflac_int32)left2;
        pOutputSamples[i*8+5] = (drflac_int32)right2;
        pOutputSamples[i*8+6] = (drflac_int32)left3;
        pOutputSamples[i*8+7] = (drflac_int32)right3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i left = _mm_add_epi32(right, side);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t side;
        uint32x4_t right;
        uint32x4_t left;

        side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        left = vaddq_u32(right, side);

        drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback.
        */
#if 0
        drflac_read_pcm_frames_s32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    /* Declared up front (C89 style) so this reference path compiles if it is ever re-enabled. */
    drflac_uint64 i;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;

    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_int32 shift = unusedBitsPerSample;

    if (shift > 0) {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (mid0 + side0) << shift;
            temp1L = (mid1 + side1) << shift;
            temp2L = (mid2 + side2) << shift;
            temp3L = (mid3 + side3) << shift;

            temp0R = (mid0 - side0) << shift;
            temp1R = (mid1 - side1) << shift;
            temp2R = (mid2 - side2) << shift;
            temp3R = (mid3 - side3) << shift;

            pOutputSamples[i*8+0] = (drflac_int32)temp0L;
            pOutputSamples[i*8+1] = (drflac_int32)temp0R;
            pOutputSamples[i*8+2] = (drflac_int32)temp1L;
            pOutputSamples[i*8+3] = (drflac_int32)temp1R;
            pOutputSamples[i*8+4] = (drflac_int32)temp2L;
            pOutputSamples[i*8+5] = (drflac_int32)temp2R;
            pOutputSamples[i*8+6] = (drflac_int32)temp3L;
            pOutputSamples[i*8+7] = (drflac_int32)temp3R;
        }
    } else {
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
            temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
            temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
            temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);

            temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
            temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
            temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
            temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);

            pOutputSamples[i*8+0] = (drflac_int32)temp0L;
            pOutputSamples[i*8+1] = (drflac_int32)temp0R;
            pOutputSamples[i*8+2] = (drflac_int32)temp1L;
            pOutputSamples[i*8+3] =
            (drflac_int32)temp1R;
            pOutputSamples[i*8+4] = (drflac_int32)temp2L;
            pOutputSamples[i*8+5] = (drflac_int32)temp2R;
            pOutputSamples[i*8+6] = (drflac_int32)temp3L;
            pOutputSamples[i*8+7] = (drflac_int32)temp3R;
        }
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_int32 shift = unusedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
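
            /*
            Worked example of the mid/side reconstruction done by the surrounding code. This is an editorial sketch that assumes
            the usual FLAC encoder definition, mid = (L + R) >> 1 and side = L - R. The names L, R and mid2 are illustrative only
            and do not appear in this file, and wasted bits are ignored for simplicity.

            L + R and L - R always have the same parity, so (side & 1) is exactly the bit dropped by the encoder's right shift:

                mid2  = (mid << 1) | (side & 1)   ==  L + R
                left  = (mid2 + side) >> 1        ==  ((L + R) + (L - R)) >> 1  ==  L
                right = (mid2 - side) >> 1        ==  ((L + R) - (L - R)) >> 1  ==  R

            For L = 5 and R = 2: mid = 3, side = 3, mid2 = 7, left = (7 + 3) >> 1 = 5, right = (7 - 3) >> 1 = 2.
            */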
            left = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
            right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
            pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
        }
    } else {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
            right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)((mid +
            side) << shift);
            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
        }
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_int32 shift = unusedBitsPerSample;
    int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
    int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
    uint32x4_t one4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
    wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
    one4 = vdupq_n_u32(1);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));

            left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
            right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);

            drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] <<
            pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
            pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
        }
    } else {
        int32x4_t shift4;

        shift -= 1;
        shift4 = vdupq_n_s32(shift);

        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));

            left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
            right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));

            drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
        }
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if
    (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample));
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample));
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample +
    pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;

        pOutputSamples[i*8+0] = (drflac_int32)tempL0;
        pOutputSamples[i*8+1] = (drflac_int32)tempR0;
        pOutputSamples[i*8+2] = (drflac_int32)tempL1;
        pOutputSamples[i*8+3] = (drflac_int32)tempR1;
        pOutputSamples[i*8+4] = (drflac_int32)tempL2;
        pOutputSamples[i*8+5] = (drflac_int32)tempR2;
        pOutputSamples[i*8+6] = (drflac_int32)tempL3;
        pOutputSamples[i*8+7] = (drflac_int32)tempR3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample +
    pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    int32x4_t shift4_0 = vdupq_n_s32(shift0);
    int32x4_t shift4_1 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        int32x4_t left;
        int32x4_t right;

        left = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift4_0));
        right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift4_1));

        drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
    }

    for (i =
    (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback.
        */
#if 0
        drflac_read_pcm_frames_s32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut)
{
    drflac_uint64 framesRead;
    drflac_uint32 unusedBitsPerSample;

    if (pFlac == NULL || framesToRead == 0) {
        return 0;
    }

    if (pBufferOut == NULL) {
        return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
    }

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
    unusedBitsPerSample = 32 - pFlac->bitsPerSample;

    framesRead = 0;
    while (framesToRead > 0) {
        /* If we've run out of samples in this frame, go to the next. */
        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
                break; /* Couldn't read the next frame, so just break from the loop and return.
                */
            }
        } else {
            unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
            drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
            drflac_uint64 frameCountThisIteration = framesToRead;

            if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
                frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
            }

            if (channelCount == 2) {
                const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
                const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;

                switch (pFlac->currentFLACFrame.header.channelAssignment)
                {
                    case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
                    {
                        drflac_read_pcm_frames_s32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
                    {
                        drflac_read_pcm_frames_s32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
                    {
                        drflac_read_pcm_frames_s32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
                    default:
                    {
                        drflac_read_pcm_frames_s32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;
                }
            } else {
                /* Generic interleaving.
                */
                drflac_uint64 i;
                for (i = 0; i < frameCountThisIteration; ++i) {
                    unsigned int j;
                    for (j = 0; j < channelCount; ++j) {
                        pBufferOut[(i*channelCount)+j] = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
                    }
                }
            }

            framesRead += frameCountThisIteration;
            pBufferOut += frameCountThisIteration * channelCount;
            framesToRead -= frameCountThisIteration;
            pFlac->currentPCMFrame += frameCountThisIteration;
            pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
        }
    }

    return framesRead;
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 right = left - side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const
    drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 right0 = left0 - side0;
        drflac_uint32 right1 = left1 - side1;
        drflac_uint32 right2 = left2 - side2;
        drflac_uint32 right3 = left3 - side3;

        left0 >>= 16;
        left1 >>= 16;
        left2 >>= 16;
        left3 >>= 16;

        right0 >>= 16;
        right1 >>= 16;
        right2 >>= 16;
        right3 >>= 16;

        pOutputSamples[i*8+0] = (drflac_int16)left0;
        pOutputSamples[i*8+1] = (drflac_int16)right0;
        pOutputSamples[i*8+2] = (drflac_int16)left1;
        pOutputSamples[i*8+3] = (drflac_int16)right1;
        pOutputSamples[i*8+4] = (drflac_int16)left2;
        pOutputSamples[i*8+5] = (drflac_int16)right2;
        pOutputSamples[i*8+6] = (drflac_int16)left3;
        pOutputSamples[i*8+7] = (drflac_int16)right3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left = pInputSamples0U32[i] << shift0;
        drflac_uint32 side = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] =
        (drflac_int16)right;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i right = _mm_sub_epi32(left, side);

        left = _mm_srai_epi32(left, 16);
        right = _mm_srai_epi32(right, 16);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left = pInputSamples0U32[i] << shift0;
        drflac_uint32 side = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32*
pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t left;
        uint32x4_t side;
        uint32x4_t right;

        left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        right = vsubq_u32(left, side);

        left = vshrq_n_u32(left, 16);
        right = vshrq_n_u32(right, 16);

        drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left = pInputSamples0U32[i] << shift0;
        drflac_uint32 side = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 left = right + side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 =
    frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 left0 = right0 + side0;
        drflac_uint32 left1 = right1 + side1;
        drflac_uint32 left2 = right2 + side2;
        drflac_uint32 left3 = right3 + side3;

        left0 >>= 16;
        left1 >>= 16;
        left2 >>= 16;
        left3 >>= 16;

        right0 >>= 16;
        right1 >>= 16;
        right2 >>= 16;
        right3 >>= 16;

        pOutputSamples[i*8+0] = (drflac_int16)left0;
        pOutputSamples[i*8+1] = (drflac_int16)right0;
        pOutputSamples[i*8+2] = (drflac_int16)left1;
        pOutputSamples[i*8+3] = (drflac_int16)right1;
        pOutputSamples[i*8+4] = (drflac_int16)left2;
        pOutputSamples[i*8+5] = (drflac_int16)right2;
        pOutputSamples[i*8+6] = (drflac_int16)left3;
        pOutputSamples[i*8+7] = (drflac_int16)right3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        left >>=
        16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i left = _mm_add_epi32(right, side);

        left = _mm_srai_epi32(left, 16);
        right = _mm_srai_epi32(right, 16);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void
drflac_read_pcm_frames_s16__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t side;
        uint32x4_t right;
        uint32x4_t left;

        side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        left = vaddq_u32(right, side);

        left = vshrq_n_u32(left, 16);
        right = vshrq_n_u32(right, 16);

        drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const
drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
    }
}
#endif

static DRFLAC_INLINE void
drflac_read_pcm_frames_s16__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;

    if (shift > 0) {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (mid0 + side0) << shift;
            temp1L = (mid1 + side1) << shift;
            temp2L = (mid2 + side2) << shift;
            temp3L = (mid3 + side3) << shift;

            temp0R = (mid0 - side0) << shift;
            temp1R = (mid1 - side1) << shift;
            temp2R = (mid2 - side2) << shift;
            temp3R = (mid3 - side3) << shift;

            temp0L >>= 16;
            temp1L >>= 16;
            temp2L >>= 16;
            temp3L >>= 16;

            temp0R >>= 16;
            temp1R >>= 16;
            temp2R >>= 16;
            temp3R >>= 16;

            pOutputSamples[i*8+0] = (drflac_int16)temp0L;
            pOutputSamples[i*8+1] = (drflac_int16)temp0R;
            pOutputSamples[i*8+2] = (drflac_int16)temp1L;
            pOutputSamples[i*8+3] = (drflac_int16)temp1R;
            pOutputSamples[i*8+4] = (drflac_int16)temp2L;
            pOutputSamples[i*8+5] = (drflac_int16)temp2R;
            pOutputSamples[i*8+6] = (drflac_int16)temp3L;
            pOutputSamples[i*8+7] = (drflac_int16)temp3R;
        }
    } else {
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32
side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = ((drflac_int32)(mid0 + side0) >> 1);
            temp1L = ((drflac_int32)(mid1 + side1) >> 1);
            temp2L = ((drflac_int32)(mid2 + side2) >> 1);
            temp3L = ((drflac_int32)(mid3 + side3) >> 1);

            temp0R = ((drflac_int32)(mid0 - side0) >> 1);
            temp1R = ((drflac_int32)(mid1 - side1) >> 1);
            temp2R = ((drflac_int32)(mid2 - side2) >> 1);
            temp3R = ((drflac_int32)(mid3 - side3) >> 1);

            temp0L >>= 16;
            temp1L >>= 16;
            temp2L >>= 16;
            temp3L >>= 16;

            temp0R >>= 16;
            temp1R >>= 16;
            temp2R >>= 16;
            temp3R >>= 16;

            pOutputSamples[i*8+0] = (drflac_int16)temp0L;
            pOutputSamples[i*8+1] = (drflac_int16)temp0R;
            pOutputSamples[i*8+2] = (drflac_int16)temp1L;
            pOutputSamples[i*8+3] = (drflac_int16)temp1R;
            pOutputSamples[i*8+4] = (drflac_int16)temp2L;
            pOutputSamples[i*8+5] = (drflac_int16)temp2R;
            pOutputSamples[i*8+6] = (drflac_int16)temp3L;
            pOutputSamples[i*8+7] = (drflac_int16)temp3R;
        }
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void
drflac_read_pcm_frames_s16__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
            right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);

            left = _mm_srai_epi32(left, 16);
            right = _mm_srai_epi32(right, 16);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
        }
    } else {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
            right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);

            left = _mm_srai_epi32(left, 16);
            right = _mm_srai_epi32(right, 16);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
        }
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;
    int32x4_t
wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
    int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
    wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));

            left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
            right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);

            left = vshrq_n_s32(left, 16);
            right = vshrq_n_s32(right, 16);

            drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
        }
    } else {
        int32x4_t shift4;

        shift -= 1;
        shift4 = vdupq_n_s32(shift);

        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4),
wbpsShift1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));

            left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
            right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));

            left = vshrq_n_s32(left, 16);
            right = vshrq_n_s32(right, 16);

            drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
        }
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback.
*/
#if 0
        drflac_read_pcm_frames_s16__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) >> 16);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32
tempL1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;

        tempL0 >>= 16;
        tempL1 >>= 16;
        tempL2 >>= 16;
        tempL3 >>= 16;

        tempR0 >>= 16;
        tempR1 >>= 16;
        tempR2 >>= 16;
        tempR3 >>= 16;

        pOutputSamples[i*8+0] = (drflac_int16)tempL0;
        pOutputSamples[i*8+1] = (drflac_int16)tempR0;
        pOutputSamples[i*8+2] = (drflac_int16)tempL1;
        pOutputSamples[i*8+3] = (drflac_int16)tempR1;
        pOutputSamples[i*8+4] = (drflac_int16)tempL2;
        pOutputSamples[i*8+5] = (drflac_int16)tempR2;
        pOutputSamples[i*8+6] = (drflac_int16)tempL3;
        pOutputSamples[i*8+7] = (drflac_int16)tempR3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32
shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);

        left = _mm_srai_epi32(left, 16);
        right = _mm_srai_epi32(right, 16);

        /* At this point we have results. We can now pack and interleave these into a single __m128i object and then store them in the output buffer. */
        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    int32x4_t shift0_4 = vdupq_n_s32(shift0);
    int32x4_t shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        int32x4_t left;
        int32x4_t right;

        left =
vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
        right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));

        left = vshrq_n_s32(left, 16);
        right = vshrq_n_s32(right, 16);

        drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback.
*/
#if 0
        drflac_read_pcm_frames_s16__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}

DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut)
{
    drflac_uint64 framesRead;
    drflac_uint32 unusedBitsPerSample;

    if (pFlac == NULL || framesToRead == 0) {
        return 0;
    }

    if (pBufferOut == NULL) {
        return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
    }

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
    unusedBitsPerSample = 32 - pFlac->bitsPerSample;

    framesRead = 0;
    while (framesToRead > 0) {
        /* If we've run out of samples in this frame, go to the next. */
        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
                break;  /* Couldn't read the next frame, so just break from the loop and return.
*/
            }
        } else {
            unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
            drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
            drflac_uint64 frameCountThisIteration = framesToRead;

            if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
                frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
            }

            if (channelCount == 2) {
                const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
                const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;

                switch (pFlac->currentFLACFrame.header.channelAssignment)
                {
                    case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
                    {
                        drflac_read_pcm_frames_s16__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
                    {
                        drflac_read_pcm_frames_s16__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
                    {
                        drflac_read_pcm_frames_s16__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
                    default:
                    {
                        drflac_read_pcm_frames_s16__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;
                }
            } else {
                /* Generic interleaving.
*/
                drflac_uint64 i;
                for (i = 0; i < frameCountThisIteration; ++i) {
                    unsigned int j;
                    for (j = 0; j < channelCount; ++j) {
                        drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
                        pBufferOut[(i*channelCount)+j] = (drflac_int16)(sampleS32 >> 16);
                    }
                }
            }

            framesRead += frameCountThisIteration;
            pBufferOut += frameCountThisIteration * channelCount;
            framesToRead -= frameCountThisIteration;
            pFlac->currentPCMFrame += frameCountThisIteration;
            pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
        }
    }

    return framesRead;
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (float)((drflac_int32)left / 2147483648.0);
        pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    float factor = 1 / 2147483648.0;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 right0 = left0 - side0;
        drflac_uint32 right1 = left1 - side1;
        drflac_uint32 right2 = left2 - side2;
        drflac_uint32 right3 = left3 - side3;

        pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
        pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
        pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
        pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
        pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
        pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
        pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
        pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left = pInputSamples0U32[i] << shift0;
        drflac_uint32 side = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] =
(drflac_int32)left * factor; 10762 pOutputSamples[i*2+1] = (drflac_int32)right * factor; 10763 } 10764 } 10765 10766 #if defined(DRFLAC_SUPPORT_SSE2) 10767 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples) 10768 { 10769 drflac_uint64 i; 10770 drflac_uint64 frameCount4 = frameCount >> 2; 10771 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0; 10772 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1; 10773 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8; 10774 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8; 10775 __m128 factor; 10776 10777 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24); 10778 10779 factor = _mm_set1_ps(1.0f / 8388608.0f); 10780 10781 for (i = 0; i < frameCount4; ++i) { 10782 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0); 10783 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1); 10784 __m128i right = _mm_sub_epi32(left, side); 10785 __m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor); 10786 __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor); 10787 10788 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf)); 10789 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf)); 10790 } 10791 10792 for (i = (frameCount4 << 2); i < frameCount; ++i) { 10793 drflac_uint32 left = pInputSamples0U32[i] << shift0; 10794 drflac_uint32 side = pInputSamples1U32[i] << shift1; 10795 drflac_uint32 right = left - side; 10796 10797 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f; 10798 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f; 10799 } 10800 } 10801 #endif 

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
    float32x4_t factor4;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    factor4 = vdupq_n_f32(1.0f / 8388608.0f);
    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t left;
        uint32x4_t side;
        uint32x4_t right;
        float32x4_t leftf;
        float32x4_t rightf;

        left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        right = vsubq_u32(left, side);
        leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
        rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);

        drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left = pInputSamples0U32[i] << shift0;
        drflac_uint32 side = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
        pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
    }
}
#endif

/*
Dispatcher: uses the SIMD path when it is compiled in, reported as supported at run time, and the stream has at most
24 bits per sample. Everything else goes through the scalar path.
*/
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_f32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_f32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


/*
Right/side stereo: subframe 0 carries the side (left - right) signal and subframe 1 carries the right channel, so
left = right + side.
*/
#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 left = right + side;

        pOutputSamples[i*2+0] = (float)((drflac_int32)left / 2147483648.0);
        pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    float factor = 1 / 2147483648.0;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 left0 = right0 + side0;
        drflac_uint32 left1 = right1 + side1;
        drflac_uint32 left2 = right2 + side2;
        drflac_uint32 left3 = right3 + side3;

        pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
        pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
        pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
        pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
        pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
        pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
        pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
        pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left * factor;
        pOutputSamples[i*2+1] = (drflac_int32)right * factor;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
    __m128 factor;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    factor = _mm_set1_ps(1.0f / 8388608.0f);

    for (i = 0; i < frameCount4; ++i) {
        __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i left = _mm_add_epi32(right, side);
        __m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
        __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);

        _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
        _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
        pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
    float32x4_t factor4;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    factor4 = vdupq_n_f32(1.0f / 8388608.0f);
    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t side;
        uint32x4_t right;
        uint32x4_t left;
        float32x4_t leftf;
        float32x4_t rightf;

        side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        left = vaddq_u32(right, side);
        leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
        rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);

        drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
        pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_f32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_f32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


/*
Mid/side stereo: subframe 0 carries mid and subframe 1 carries side. The low bit of side restores the bit that was
dropped from mid, after which left = (mid + side) >> 1 and right = (mid - side) >> 1.
*/
#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (float)((((drflac_int32)(mid + side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
        pOutputSamples[i*2+1] = (float)((((drflac_int32)(mid - side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;
    float factor = 1 / 2147483648.0;

    /* mid + side and mid - side are always even, so when a left shift follows, the ">> 1" is folded into it. */
    if (shift > 0) {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (mid0 + side0) << shift;
            temp1L = (mid1 + side1) << shift;
            temp2L = (mid2 + side2) << shift;
            temp3L = (mid3 + side3) << shift;

            temp0R = (mid0 - side0) << shift;
            temp1R = (mid1 - side1) << shift;
            temp2R = (mid2 - side2) << shift;
            temp3R = (mid3 - side3) << shift;

            pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
            pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
            pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
            pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
            pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
            pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
            pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
            pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
        }
    } else {
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
            temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
            temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
            temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);

            temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
            temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
            temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
            temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);

            pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
            pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
            pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
            pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
            pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
            pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
            pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
            pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
        }
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) * factor;
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) * factor;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample - 8;
    float factor;
    __m128 factor128;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    factor = 1.0f / 8388608.0f;
    factor128 = _mm_set1_ps(factor);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i tempL;
            __m128i tempR;
            __m128 leftf;
            __m128 rightf;

            mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            tempL = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
            tempR = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);

            leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
            rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);

            _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
            _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
            pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
        }
    } else {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i tempL;
            __m128i tempR;
            __m128 leftf;
            __m128 rightf;

            mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            tempL = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
            tempR = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);

            leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
            rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);

            _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
            _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
        }
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample - 8;
    float factor;
    float32x4_t factor4;
    int32x4_t shift4;
    int32x4_t wbps0_4; /* Wasted Bits Per Sample */
    int32x4_t wbps1_4; /* Wasted Bits Per Sample */

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    factor = 1.0f / 8388608.0f;
    factor4 = vdupq_n_f32(factor);
    wbps0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
    wbps1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            int32x4_t lefti;
            int32x4_t righti;
            float32x4_t leftf;
            float32x4_t rightf;

            uint32x4_t mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
            uint32x4_t side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));

            lefti = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
            righti = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);

            leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
            rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);

            drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
            pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
        }
    } else {
        shift -= 1;
        shift4 = vdupq_n_s32(shift);
        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t lefti;
            int32x4_t righti;
            float32x4_t leftf;
            float32x4_t rightf;

            mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
            side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));

            lefti = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
            righti = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));

            leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
            rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);

            drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
        }
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_f32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_f32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}

/* Independent stereo: each subframe already holds one channel, so samples only need shifting, scaling and interleaving. */
#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (float)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) / 2147483648.0);
        pOutputSamples[i*2+1] = (float)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) / 2147483648.0);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    float factor = 1 / 2147483648.0;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
11384 drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0; 11385 drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0; 11386 drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0; 11387 11388 drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1; 11389 drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1; 11390 drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1; 11391 drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1; 11392 11393 pOutputSamples[i*8+0] = (drflac_int32)tempL0 * factor; 11394 pOutputSamples[i*8+1] = (drflac_int32)tempR0 * factor; 11395 pOutputSamples[i*8+2] = (drflac_int32)tempL1 * factor; 11396 pOutputSamples[i*8+3] = (drflac_int32)tempR1 * factor; 11397 pOutputSamples[i*8+4] = (drflac_int32)tempL2 * factor; 11398 pOutputSamples[i*8+5] = (drflac_int32)tempR2 * factor; 11399 pOutputSamples[i*8+6] = (drflac_int32)tempL3 * factor; 11400 pOutputSamples[i*8+7] = (drflac_int32)tempR3 * factor; 11401 } 11402 11403 for (i = (frameCount4 << 2); i < frameCount; ++i) { 11404 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor; 11405 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor; 11406 } 11407 } 11408 11409 #if defined(DRFLAC_SUPPORT_SSE2) 11410 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples) 11411 { 11412 drflac_uint64 i; 11413 drflac_uint64 frameCount4 = frameCount >> 2; 11414 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0; 11415 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1; 11416 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8; 11417 drflac_uint32 shift1 = (unusedBitsPerSample + 
        pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;

    float factor = 1.0f / 8388608.0f;
    __m128 factor128 = _mm_set1_ps(factor);

    for (i = 0; i < frameCount4; ++i) {
        __m128i lefti;
        __m128i righti;
        __m128 leftf;
        __m128 rightf;

        lefti  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        righti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);

        leftf  = _mm_mul_ps(_mm_cvtepi32_ps(lefti),  factor128);
        rightf = _mm_mul_ps(_mm_cvtepi32_ps(righti), factor128);

        _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
        _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;

    float factor = 1.0f / 8388608.0f;
    float32x4_t factor4 = vdupq_n_f32(factor);
    int32x4_t shift0_4 = vdupq_n_s32(shift0);
    int32x4_t
        shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        int32x4_t lefti;
        int32x4_t righti;
        float32x4_t leftf;
        float32x4_t rightf;

        lefti  = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
        righti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));

        leftf  = vmulq_f32(vcvtq_f32_s32(lefti),  factor4);
        rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);

        drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback.
*/
#if 0
        drflac_read_pcm_frames_f32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}

DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut)
{
    drflac_uint64 framesRead;
    drflac_uint32 unusedBitsPerSample;

    if (pFlac == NULL || framesToRead == 0) {
        return 0;
    }

    if (pBufferOut == NULL) {
        return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
    }

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
    unusedBitsPerSample = 32 - pFlac->bitsPerSample;

    framesRead = 0;
    while (framesToRead > 0) {
        /* If we've run out of samples in this frame, go to the next. */
        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
                break;  /* Couldn't read the next frame, so just break from the loop and return.
*/
            }
        } else {
            unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
            drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
            drflac_uint64 frameCountThisIteration = framesToRead;

            if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
                frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
            }

            if (channelCount == 2) {
                const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
                const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;

                switch (pFlac->currentFLACFrame.header.channelAssignment)
                {
                    case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
                    {
                        drflac_read_pcm_frames_f32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
                    {
                        drflac_read_pcm_frames_f32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
                    {
                        drflac_read_pcm_frames_f32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
                    default:
                    {
                        drflac_read_pcm_frames_f32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;
                }
            } else {
                /* Generic interleaving.
*/
                drflac_uint64 i;
                for (i = 0; i < frameCountThisIteration; ++i) {
                    unsigned int j;
                    for (j = 0; j < channelCount; ++j) {
                        drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
                        pBufferOut[(i*channelCount)+j] = (float)(sampleS32 / 2147483648.0);
                    }
                }
            }

            framesRead                += frameCountThisIteration;
            pBufferOut                += frameCountThisIteration * channelCount;
            framesToRead              -= frameCountThisIteration;
            pFlac->currentPCMFrame    += frameCountThisIteration;
            pFlac->currentFLACFrame.pcmFramesRemaining -= (unsigned int)frameCountThisIteration;
        }
    }

    return framesRead;
}


DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
{
    if (pFlac == NULL) {
        return DRFLAC_FALSE;
    }

    /* Don't do anything if we're already on the seek point. */
    if (pFlac->currentPCMFrame == pcmFrameIndex) {
        return DRFLAC_TRUE;
    }

    /*
    If we don't know where the first frame begins then we can't seek. This will happen when the STREAMINFO block was not present
    when the decoder was opened.
    */
    if (pFlac->firstFLACFramePosInBytes == 0) {
        return DRFLAC_FALSE;
    }

    if (pcmFrameIndex == 0) {
        pFlac->currentPCMFrame = 0;
        return drflac__seek_to_first_frame(pFlac);
    } else {
        drflac_bool32 wasSuccessful = DRFLAC_FALSE;
        drflac_uint64 originalPCMFrame = pFlac->currentPCMFrame;

        /* Clamp the sample to the end. */
        if (pcmFrameIndex > pFlac->totalPCMFrameCount) {
            pcmFrameIndex = pFlac->totalPCMFrameCount;
        }

        /* If the target sample and the current sample are in the same frame we just move the position forward.
*/
        if (pcmFrameIndex > pFlac->currentPCMFrame) {
            /* Forward. */
            drflac_uint32 offset = (drflac_uint32)(pcmFrameIndex - pFlac->currentPCMFrame);
            if (pFlac->currentFLACFrame.pcmFramesRemaining > offset) {
                pFlac->currentFLACFrame.pcmFramesRemaining -= offset;
                pFlac->currentPCMFrame = pcmFrameIndex;
                return DRFLAC_TRUE;
            }
        } else {
            /* Backward. */
            drflac_uint32 offsetAbs = (drflac_uint32)(pFlac->currentPCMFrame - pcmFrameIndex);
            drflac_uint32 currentFLACFramePCMFrameCount = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
            drflac_uint32 currentFLACFramePCMFramesConsumed = currentFLACFramePCMFrameCount - pFlac->currentFLACFrame.pcmFramesRemaining;
            if (currentFLACFramePCMFramesConsumed > offsetAbs) {
                pFlac->currentFLACFrame.pcmFramesRemaining += offsetAbs;
                pFlac->currentPCMFrame = pcmFrameIndex;
                return DRFLAC_TRUE;
            }
        }

        /*
        Different techniques depending on encapsulation. Using the native FLAC seektable with Ogg encapsulation is a bit awkward so
        we'll instead use Ogg's natural seeking facility.
        */
#ifndef DR_FLAC_NO_OGG
        if (pFlac->container == drflac_container_ogg)
        {
            wasSuccessful = drflac_ogg__seek_to_pcm_frame(pFlac, pcmFrameIndex);
        }
        else
#endif
        {
            /* First try seeking via the seek table. If this fails, fall back to a brute force seek which is much slower. */
            if (/*!wasSuccessful && */!pFlac->_noSeekTableSeek) {
                wasSuccessful = drflac__seek_to_pcm_frame__seek_table(pFlac, pcmFrameIndex);
            }

#if !defined(DR_FLAC_NO_CRC)
            /* Fall back to binary search if seek table seeking fails. This requires the length of the stream to be known.
*/
            if (!wasSuccessful && !pFlac->_noBinarySearchSeek && pFlac->totalPCMFrameCount > 0) {
                wasSuccessful = drflac__seek_to_pcm_frame__binary_search(pFlac, pcmFrameIndex);
            }
#endif

            /* Fall back to brute force if all else fails. */
            if (!wasSuccessful && !pFlac->_noBruteForceSeek) {
                wasSuccessful = drflac__seek_to_pcm_frame__brute_force(pFlac, pcmFrameIndex);
            }
        }

        if (wasSuccessful) {
            pFlac->currentPCMFrame = pcmFrameIndex;
        } else {
            /* Seek failed. Try putting the decoder back to its original state. */
            if (drflac_seek_to_pcm_frame(pFlac, originalPCMFrame) == DRFLAC_FALSE) {
                /* Failed to seek back to the original PCM frame. Fall back to 0. */
                drflac_seek_to_pcm_frame(pFlac, 0);
            }
        }

        return wasSuccessful;
    }
}



/* High Level APIs */

/* SIZE_MAX */
#if defined(SIZE_MAX)
    #define DRFLAC_SIZE_MAX  SIZE_MAX
#else
    #if defined(DRFLAC_64BIT)
        #define DRFLAC_SIZE_MAX  ((drflac_uint64)0xFFFFFFFFFFFFFFFF)
    #else
        #define DRFLAC_SIZE_MAX  0xFFFFFFFF
    #endif
#endif
/* End SIZE_MAX */


/* Using a macro as the definition of the drflac__full_read_and_close_*() API family. Sue me.
*/
#define DRFLAC_DEFINE_FULL_READ_AND_CLOSE(extension, type) \
static type* drflac__full_read_and_close_ ## extension (drflac* pFlac, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut)\
{ \
    type* pSampleData = NULL; \
    drflac_uint64 totalPCMFrameCount; \
    \
    DRFLAC_ASSERT(pFlac != NULL); \
    \
    totalPCMFrameCount = pFlac->totalPCMFrameCount; \
    \
    if (totalPCMFrameCount == 0) { \
        type buffer[4096]; \
        drflac_uint64 pcmFramesRead; \
        size_t sampleDataBufferSize = sizeof(buffer); \
        \
        pSampleData = (type*)drflac__malloc_from_callbacks(sampleDataBufferSize, &pFlac->allocationCallbacks); \
        if (pSampleData == NULL) { \
            goto on_error; \
        } \
        \
        while ((pcmFramesRead = (drflac_uint64)drflac_read_pcm_frames_##extension(pFlac, sizeof(buffer)/sizeof(buffer[0])/pFlac->channels, buffer)) > 0) { \
            if (((totalPCMFrameCount + pcmFramesRead) * pFlac->channels * sizeof(type)) > sampleDataBufferSize) { \
                type* pNewSampleData; \
                size_t newSampleDataBufferSize; \
                \
                newSampleDataBufferSize = sampleDataBufferSize * 2; \
                pNewSampleData = (type*)drflac__realloc_from_callbacks(pSampleData, newSampleDataBufferSize, sampleDataBufferSize, &pFlac->allocationCallbacks); \
                if (pNewSampleData == NULL) { \
                    drflac__free_from_callbacks(pSampleData, &pFlac->allocationCallbacks); \
                    goto on_error; \
                } \
                \
                sampleDataBufferSize = newSampleDataBufferSize; \
                pSampleData = pNewSampleData; \
            } \
            \
            DRFLAC_COPY_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), buffer, (size_t)(pcmFramesRead*pFlac->channels*sizeof(type))); \
            totalPCMFrameCount += pcmFramesRead; \
        } \
        \
        /* At this point everything should be decoded, but we just want to fill the unused part of the buffer with silence - need to \
           protect those ears from random noise!
*/ \
        DRFLAC_ZERO_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), (size_t)(sampleDataBufferSize - totalPCMFrameCount*pFlac->channels*sizeof(type))); \
    } else { \
        drflac_uint64 dataSize = totalPCMFrameCount*pFlac->channels*sizeof(type); \
        if (dataSize > (drflac_uint64)DRFLAC_SIZE_MAX) { \
            goto on_error;  /* The decoded data is too big. */ \
        } \
        \
        pSampleData = (type*)drflac__malloc_from_callbacks((size_t)dataSize, &pFlac->allocationCallbacks);    /* <-- Safe cast as per the check above. */ \
        if (pSampleData == NULL) { \
            goto on_error; \
        } \
        \
        totalPCMFrameCount = drflac_read_pcm_frames_##extension(pFlac, pFlac->totalPCMFrameCount, pSampleData); \
    } \
    \
    if (sampleRateOut) *sampleRateOut = pFlac->sampleRate; \
    if (channelsOut) *channelsOut = pFlac->channels; \
    if (totalPCMFrameCountOut) *totalPCMFrameCountOut = totalPCMFrameCount; \
    \
    drflac_close(pFlac); \
    return pSampleData; \
    \
on_error: \
    drflac_close(pFlac); \
    return NULL; \
}

DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s32, drflac_int32)
DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s16, drflac_int16)
DRFLAC_DEFINE_FULL_READ_AND_CLOSE(f32, float)

DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (channelsOut) {
        *channelsOut = 0;
    }
    if (sampleRateOut) {
        *sampleRateOut = 0;
    }
    if (totalPCMFrameCountOut) {
        *totalPCMFrameCountOut = 0;
    }

    pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s32(pFlac,
        channelsOut, sampleRateOut, totalPCMFrameCountOut);
}

DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (channelsOut) {
        *channelsOut = 0;
    }
    if (sampleRateOut) {
        *sampleRateOut = 0;
    }
    if (totalPCMFrameCountOut) {
        *totalPCMFrameCountOut = 0;
    }

    pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s16(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
}

DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (channelsOut) {
        *channelsOut = 0;
    }
    if (sampleRateOut) {
        *sampleRateOut = 0;
    }
    if (totalPCMFrameCountOut) {
        *totalPCMFrameCountOut = 0;
    }

    pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_f32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
}

#ifndef DR_FLAC_NO_STDIO
DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_file(filename, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
}

DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_file(filename, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
}

DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_file(filename, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
}
#endif

DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
}

DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
}

DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
}


DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    if (pAllocationCallbacks != NULL) {
        drflac__free_from_callbacks(p, pAllocationCallbacks);
    } else {
        drflac__free_default(p, NULL);
    }
}




DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments)
{
    if (pIter == NULL) {
        return;
    }

    pIter->countRemaining = commentCount;
    pIter->pRunningData   = (const char*)pComments;
}

DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut)
{
    drflac_int32 length;
    const char* pComment;

    /* Safety. */
    if (pCommentLengthOut) {
        *pCommentLengthOut = 0;
    }

    if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
        return NULL;
    }

    length = drflac__le2host_32_ptr_unaligned(pIter->pRunningData);
    pIter->pRunningData += 4;

    pComment = pIter->pRunningData;
    pIter->pRunningData += length;
    pIter->countRemaining -= 1;

    if (pCommentLengthOut) {
        *pCommentLengthOut = length;
    }

    return pComment;
}




DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData)
{
    if (pIter == NULL) {
        return;
    }

    pIter->countRemaining = trackCount;
    pIter->pRunningData   = (const char*)pTrackData;
}

DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack)
{
    drflac_cuesheet_track cuesheetTrack;
    const char* pRunningData;
    drflac_uint64 offsetHi;
    drflac_uint64 offsetLo;

    if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
        return DRFLAC_FALSE;
    }

    pRunningData =
        pIter->pRunningData;

    offsetHi = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
    offsetLo = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
    cuesheetTrack.offset = offsetLo | (offsetHi << 32);
    cuesheetTrack.trackNumber = pRunningData[0]; pRunningData += 1;
    DRFLAC_COPY_MEMORY(cuesheetTrack.ISRC, pRunningData, sizeof(cuesheetTrack.ISRC)); pRunningData += 12;
    cuesheetTrack.isAudio = (pRunningData[0] & 0x80) != 0;
    cuesheetTrack.preEmphasis = (pRunningData[0] & 0x40) != 0; pRunningData += 14;
    cuesheetTrack.indexCount = pRunningData[0]; pRunningData += 1;
    cuesheetTrack.pIndexPoints = (const drflac_cuesheet_track_index*)pRunningData; pRunningData += cuesheetTrack.indexCount * sizeof(drflac_cuesheet_track_index);

    pIter->pRunningData = pRunningData;
    pIter->countRemaining -= 1;

    if (pCuesheetTrack) {
        *pCuesheetTrack = cuesheetTrack;
    }

    return DRFLAC_TRUE;
}

#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
    #pragma GCC diagnostic pop
#endif
#endif  /* dr_flac_c */
#endif  /* DR_FLAC_IMPLEMENTATION */


/*
REVISION HISTORY
================
v0.12.42 - 2023-11-02
  - Fix build for ARMv6-M.
  - Fix a compilation warning with GCC.

v0.12.41 - 2023-06-17
  - Fix an incorrect date in revision history. No functional change.

v0.12.40 - 2023-05-22
  - Minor code restructure. No functional change.

v0.12.39 - 2022-09-17
  - Fix compilation with DJGPP.
  - Fix compilation error with Visual Studio 2019 and the ARM build.
  - Fix an error with SSE 4.1 detection.
  - Add support for disabling wchar_t with DR_FLAC_NO_WCHAR.
  - Improve compatibility with compilers which lack support for explicit struct packing.
  - Improve compatibility with low-end and embedded hardware by reducing the amount of stack
    allocation when loading an Ogg encapsulated file.

v0.12.38 - 2022-04-10
  - Fix compilation error on older versions of GCC.

v0.12.37 - 2022-02-12
  - Improve ARM detection.

v0.12.36 - 2022-02-07
  - Fix a compilation error with the ARM build.

v0.12.35 - 2022-02-06
  - Fix a bug due to underestimating the amount of precision required for the prediction stage.
  - Fix some bugs found from fuzz testing.

v0.12.34 - 2022-01-07
  - Fix some misalignment bugs when reading metadata.

v0.12.33 - 2021-12-22
  - Fix a bug with seeking when the seek table does not start at PCM frame 0.

v0.12.32 - 2021-12-11
  - Fix a warning with Clang.

v0.12.31 - 2021-08-16
  - Silence some warnings.

v0.12.30 - 2021-07-31
  - Fix platform detection for ARM64.

v0.12.29 - 2021-04-02
  - Fix a bug where the running PCM frame index is set to an invalid value when over-seeking.
  - Fix a decoding error due to an incorrect validation check.

v0.12.28 - 2021-02-21
  - Fix a warning due to referencing _MSC_VER when it is undefined.

v0.12.27 - 2021-01-31
  - Fix a static analysis warning.

v0.12.26 - 2021-01-17
  - Fix a compilation warning due to _BSD_SOURCE being deprecated.

v0.12.25 - 2020-12-26
  - Update documentation.

v0.12.24 - 2020-11-29
  - Fix ARM64/NEON detection when compiling with MSVC.

v0.12.23 - 2020-11-21
  - Fix compilation with OpenWatcom.

v0.12.22 - 2020-11-01
  - Fix an error with the previous release.

v0.12.21 - 2020-11-01
  - Fix a possible deadlock when seeking.
  - Improve compiler support for older versions of GCC.

v0.12.20 - 2020-09-08
  - Fix a compilation error on older compilers.

v0.12.19 - 2020-08-30
  - Fix a bug due to an undefined 32-bit shift.

v0.12.18 - 2020-08-14
  - Fix a crash when compiling with clang-cl.

v0.12.17 - 2020-08-02
  - Simplify sized types.

v0.12.16 - 2020-07-25
  - Fix a compilation warning.

v0.12.15 - 2020-07-06
  - Check for negative LPC shifts and return an error.

v0.12.14 - 2020-06-23
  - Add include guard for the implementation section.

v0.12.13 - 2020-05-16
  - Add compile-time and run-time version querying.
    - DRFLAC_VERSION_MINOR
    - DRFLAC_VERSION_MAJOR
    - DRFLAC_VERSION_REVISION
    - DRFLAC_VERSION_STRING
    - drflac_version()
    - drflac_version_string()

v0.12.12 - 2020-04-30
  - Fix compilation errors with VC6.

v0.12.11 - 2020-04-19
  - Fix some pedantic warnings.
  - Fix some undefined behaviour warnings.

v0.12.10 - 2020-04-10
  - Fix some bugs when trying to seek with an invalid seek table.

v0.12.9 - 2020-04-05
  - Fix warnings.

v0.12.8 - 2020-04-04
  - Add drflac_open_file_w() and drflac_open_file_with_metadata_w().
  - Fix some static analysis warnings.
  - Minor documentation updates.

v0.12.7 - 2020-03-14
  - Fix compilation errors with VC6.

v0.12.6 - 2020-03-07
  - Fix compilation error with Visual Studio .NET 2003.

v0.12.5 - 2020-01-30
  - Silence some static analysis warnings.

v0.12.4 - 2020-01-29
  - Silence some static analysis warnings.

v0.12.3 - 2019-12-02
  - Fix some warnings when compiling with GCC and the -Og flag.
  - Fix a crash in out-of-memory situations.
  - Fix potential integer overflow bug.
  - Fix some static analysis warnings.
  - Fix a possible crash when using custom memory allocators without a custom realloc() implementation.
  - Fix a bug with binary search seeking where the bits per sample is not a multiple of 8.

v0.12.2 - 2019-10-07
  - Internal code clean up.

v0.12.1 - 2019-09-29
  - Fix some Clang Static Analyzer warnings.
  - Fix an unused variable warning.

v0.12.0 - 2019-09-23
  - API CHANGE: Add support for user defined memory allocation routines. This system allows the program to specify its own memory allocation
    routines with a user data pointer for client-specific contextual data. This adds an extra parameter to the end of the following APIs:
    - drflac_open()
    - drflac_open_relaxed()
    - drflac_open_with_metadata()
    - drflac_open_with_metadata_relaxed()
    - drflac_open_file()
    - drflac_open_file_with_metadata()
    - drflac_open_memory()
    - drflac_open_memory_with_metadata()
    - drflac_open_and_read_pcm_frames_s32()
    - drflac_open_and_read_pcm_frames_s16()
    - drflac_open_and_read_pcm_frames_f32()
    - drflac_open_file_and_read_pcm_frames_s32()
    - drflac_open_file_and_read_pcm_frames_s16()
    - drflac_open_file_and_read_pcm_frames_f32()
    - drflac_open_memory_and_read_pcm_frames_s32()
    - drflac_open_memory_and_read_pcm_frames_s16()
    - drflac_open_memory_and_read_pcm_frames_f32()
    Set this extra parameter to NULL to use defaults which is the same as the previous behaviour. Setting this to NULL will use
    DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
12253 - Remove deprecated APIs: 12254 - drflac_read_s32() 12255 - drflac_read_s16() 12256 - drflac_read_f32() 12257 - drflac_seek_to_sample() 12258 - drflac_open_and_decode_s32() 12259 - drflac_open_and_decode_s16() 12260 - drflac_open_and_decode_f32() 12261 - drflac_open_and_decode_file_s32() 12262 - drflac_open_and_decode_file_s16() 12263 - drflac_open_and_decode_file_f32() 12264 - drflac_open_and_decode_memory_s32() 12265 - drflac_open_and_decode_memory_s16() 12266 - drflac_open_and_decode_memory_f32() 12267 - Remove drflac.totalSampleCount which is now replaced with drflac.totalPCMFrameCount. You can emulate drflac.totalSampleCount 12268 by doing pFlac->totalPCMFrameCount*pFlac->channels. 12269 - Rename drflac.currentFrame to drflac.currentFLACFrame to remove ambiguity with PCM frames. 12270 - Fix errors when seeking to the end of a stream. 12271 - Optimizations to seeking. 12272 - SSE improvements and optimizations. 12273 - ARM NEON optimizations. 12274 - Optimizations to drflac_read_pcm_frames_s16(). 12275 - Optimizations to drflac_read_pcm_frames_s32(). 12276 12277 v0.11.10 - 2019-06-26 12278 - Fix a compiler error. 12279 12280 v0.11.9 - 2019-06-16 12281 - Silence some ThreadSanitizer warnings. 12282 12283 v0.11.8 - 2019-05-21 12284 - Fix warnings. 12285 12286 v0.11.7 - 2019-05-06 12287 - C89 fixes. 12288 12289 v0.11.6 - 2019-05-05 12290 - Add support for C89. 12291 - Fix a compiler warning when CRC is disabled. 12292 - Change license to choice of public domain or MIT-0. 12293 12294 v0.11.5 - 2019-04-19 12295 - Fix a compiler error with GCC. 12296 12297 v0.11.4 - 2019-04-17 12298 - Fix some warnings with GCC when compiling with -std=c99. 12299 12300 v0.11.3 - 2019-04-07 12301 - Silence warnings with GCC. 12302 12303 v0.11.2 - 2019-03-10 12304 - Fix a warning. 12305 12306 v0.11.1 - 2019-02-17 12307 - Fix a potential bug with seeking. 

v0.11.0 - 2018-12-16
  - API CHANGE: Deprecated drflac_read_s32(), drflac_read_s16() and drflac_read_f32() and replaced them with
    drflac_read_pcm_frames_s32(), drflac_read_pcm_frames_s16() and drflac_read_pcm_frames_f32(). The new APIs take
    and return PCM frame counts instead of sample counts. To upgrade you will need to divide the input count by the
    channel count, and multiply the return value by the channel count if you still need a sample count.
  - API CHANGE: Deprecated drflac_seek_to_sample() and replaced with drflac_seek_to_pcm_frame(). Same rules as
    the changes to drflac_read_*() apply.
  - API CHANGE: Deprecated drflac_open_and_decode_*() and replaced with drflac_open_*_and_read_*(). Same rules as
    the changes to drflac_read_*() apply.
  - Optimizations.

v0.10.0 - 2018-09-11
  - Remove the DR_FLAC_NO_WIN32_IO option and the Win32 file IO functionality. If you need to use Win32 file IO you
    need to do it yourself via the callback API.
  - Fix the clang build.
  - Fix undefined behavior.
  - Fix errors with CUESHEET metadata blocks.
  - Add an API for iterating over each cuesheet track in the CUESHEET metadata block. This works the same way as the
    Vorbis comment API.
  - Other miscellaneous bug fixes, mostly relating to invalid FLAC streams.
  - Minor optimizations.

v0.9.11 - 2018-08-29
  - Fix a bug with sample reconstruction.

v0.9.10 - 2018-08-07
  - Improve 64-bit detection.

v0.9.9 - 2018-08-05
  - Fix C++ build on older versions of GCC.

v0.9.8 - 2018-07-24
  - Fix compilation errors.

v0.9.7 - 2018-07-05
  - Fix a warning.

v0.9.6 - 2018-06-29
  - Fix some typos.

v0.9.5 - 2018-06-23
  - Fix some warnings.

v0.9.4 - 2018-06-14
  - Optimizations to seeking.
  - Clean up.

v0.9.3 - 2018-05-22
  - Bug fix.

v0.9.2 - 2018-05-12
  - Fix a compilation error due to a missing break statement.

v0.9.1 - 2018-04-29
  - Fix compilation error with Clang.

v0.9 - 2018-04-24
  - Fix Clang build.
  - Start using major.minor.revision versioning.

v0.8g - 2018-04-19
  - Fix build on non-x86/x64 architectures.

v0.8f - 2018-02-02
  - Stop pretending to support changing rate/channels mid stream.

v0.8e - 2018-02-01
  - Fix a crash when the block size of a frame is larger than the maximum block size defined by the FLAC stream.
  - Fix a crash when the Rice partition order is invalid.

v0.8d - 2017-09-22
  - Add support for decoding streams with ID3 tags. ID3 tags are just skipped.

v0.8c - 2017-09-07
  - Fix warning on non-x86/x64 architectures.

v0.8b - 2017-08-19
  - Fix build on non-x86/x64 architectures.

v0.8a - 2017-08-13
  - A small optimization for the Clang build.

v0.8 - 2017-08-12
  - API CHANGE: Rename dr_* types to drflac_*.
  - Optimizations. This brings dr_flac back to about the same class of efficiency as the reference implementation.
  - Add support for custom implementations of malloc(), realloc(), etc.
  - Add CRC checking to Ogg encapsulated streams.
  - Fix VC++ 6 build. This is only for the C++ compiler. The C compiler is not currently supported.
  - Bug fixes.

v0.7 - 2017-07-23
  - Add support for opening a stream without a header block. To do this, use drflac_open_relaxed() / drflac_open_with_metadata_relaxed().

v0.6 - 2017-07-22
  - Add support for recovering from invalid frames. With this change, dr_flac will simply skip over invalid frames as if they
    never existed. Frames are checked against their sync code, the CRC-8 of the frame header and the CRC-16 of the whole frame.

v0.5 - 2017-07-16
  - Fix typos.
  - Change drflac_bool* types to unsigned.
  - Add CRC checking. This makes dr_flac slower, but can be disabled with #define DR_FLAC_NO_CRC.

v0.4f - 2017-03-10
  - Fix a couple of bugs with the bitstreaming code.

v0.4e - 2017-02-17
  - Fix some warnings.

v0.4d - 2016-12-26
  - Add support for 32-bit floating-point PCM decoding.
  - Use drflac_int* and drflac_uint* sized types to improve compiler support.
  - Minor improvements to documentation.

v0.4c - 2016-12-26
  - Add support for signed 16-bit integer PCM decoding.

v0.4b - 2016-10-23
  - A minor change to drflac_bool8 and drflac_bool32 types.

v0.4a - 2016-10-11
  - Rename drBool32 to drflac_bool32 for styling consistency.

v0.4 - 2016-09-29
  - API/ABI CHANGE: Use fixed size 32-bit booleans instead of the built-in bool type.
  - API CHANGE: Rename drflac_open_and_decode*() to drflac_open_and_decode*_s32().
  - API CHANGE: Swap the order of "channels" and "sampleRate" parameters in drflac_open_and_decode*(). Rationale for this is to
    keep it consistent with drflac_audio.

v0.3f - 2016-09-21
  - Fix a warning with GCC.

v0.3e - 2016-09-18
  - Fixed a bug where GCC 4.3+ was not getting properly identified.
  - Fixed a few typos.
  - Changed date formats to ISO 8601 (YYYY-MM-DD).

v0.3d - 2016-06-11
  - Minor clean up.

v0.3c - 2016-05-28
  - Fixed compilation error.

v0.3b - 2016-05-16
  - Fixed Linux/GCC build.
  - Updated documentation.

v0.3a - 2016-05-15
  - Minor fixes to documentation.

v0.3 - 2016-05-11
  - Optimizations. Now at about parity with the reference implementation on 32-bit builds.
  - Lots of clean up.

v0.2b - 2016-05-10
  - Bug fixes.

v0.2a - 2016-05-10
  - Made drflac_open_and_decode() more robust.
  - Removed an unused debugging variable.

v0.2 - 2016-05-09
  - Added support for Ogg encapsulation.
  - API CHANGE: Have the onSeek callback take a third argument which specifies whether or not the seek
    should be relative to the start or the current position. Also changes the seeking rules such that
    seeking offsets will never be negative.
  - Have drflac_open_and_decode() fail gracefully if the stream has an unknown total sample count.

v0.1b - 2016-05-07
  - Properly close the file handle in drflac_open_file() and family when the decoder fails to initialize.
  - Removed a stale comment.

v0.1a - 2016-05-05
  - Minor formatting changes.
  - Fixed a warning on the GCC build.

v0.1 - 2016-05-03
  - Initial versioned release.
*/

/*
This software is available as a choice of the following licenses. Choose
whichever you prefer.

===============================================================================
ALTERNATIVE 1 - Public Domain (www.unlicense.org)
===============================================================================
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
software, either in source code form or as a compiled binary, for any purpose,
commercial or non-commercial, and by any means.

In jurisdictions that recognize copyright laws, the author or authors of this
software dedicate any and all copyright interest in the software to the public
domain. We make this dedication for the benefit of the public at large and to
the detriment of our heirs and successors. We intend this dedication to be an
overt act of relinquishment in perpetuity of all present and future rights to
this software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <http://unlicense.org/>

===============================================================================
ALTERNATIVE 2 - MIT No Attribution
===============================================================================
Copyright 2023 David Reid

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/