# A masochist's guide to web development

## Table of contents

* [Introduction](#introduction)
* [Setting things up](#setting-things-up)
* [Hello world](#hello-world)
* [Intermezzo I: What is WebAssembly?](#intermezzo-i-what-is-webassembly)
* [Building a library](#building-a-library)
* [Intermezzo II: JavaScript and the DOM](#intermezzo-ii-javascript-and-the-dom)
* [Loading the library and making it a module](#loading-the-library-and-making-it-a-module)
* [Multithreading](#multithreading)
* [Intermezzo III: Web Workers and Spectre](#intermezzo-iii-web-workers-and-spectre)
* [Don't block the main thread!](#dont-block-the-main-thread)
* [Callback functions](#callback-functions)
* [Persistent storage](#persistent-storage)
* [Closing thoughts](#closing-thoughts)

## Introduction

I have recently worked on making a web application out of
[my latest Rubik's cube optimal solver](https://git.tronto.net/nissy-core/file/README.md.html).
This involved building a rather complex C code base (with
multithreading, SIMD, callback functions and whatnot) to
[WebAssembly](https://en.wikipedia.org/wiki/WebAssembly) via
[Emscripten](https://emscripten.org/), and writing a minimal amount of
JavaScript and HTML for the frontend.

This whole process was complex, tiring and at times frustrating -
but eventually [it was a success](https://tronto.net:48)! Not only
did I accomplish my goal, but I also learnt a lot along the way. After
finishing the work, I decided to write down all that I have learnt and
share it with the world with this post.

You may be wondering why one should do such a thing instead of either
rewriting their code base in a more web-friendly language, or distributing
their app using a native GUI framework.
The main reason to use WebAssembly
is that it can provide near-native performance (or so they claim) while
running inside a web browser; this gives you all the portability of a
web app without too much of a performance drawback, something that would
not be possible with an interpreted language such as JavaScript.

So, what is this blog post? A tutorial for web development? I am not sure
about this, but if it is, it is definitely not a normal one. As the title
suggests, you should not start from this guide unless you just *love*
banging your head against the wall. If you are looking for a *sane*
guide to web development, I strongly advise you to head over to the
[Mozilla Developer Network tutorials page](https://developer.mozilla.org/en-US/docs/MDN/Tutorials)
and start from there.

But if you are a C or C++ developer looking to port a program or library
to the web, then you are in the right place. With this post I am going
to walk you through the process of building an increasingly complex
library that can run in a web browser. Make sure you are
sitting comfortably and be ready to sweat, because I am not going to
shy away from the hard stuff and the complicated details.

To follow this tutorial you won't need much experience with web
development, but some familiarity with HTML and an idea of what JavaScript
is will be useful. It will also help to know that you can access your
browser's JavaScript console and other developer tools by pressing F12,
at least on Firefox or Chrome - but I guess I have literally just taught
you that, if you did not already know it. For all the rest, I'll make
sure to add many hyperlinks throughout the text, so you can follow them
if something is new to you.

A little disclaimer: although I am a somewhat experienced C developer,
I had very little web development experience before embarking on
this adventure.
If you are a web developer, you may find errors in
this post that are going to make you laugh at my ignorance. If you do,
I'd appreciate it if you could report them to me by sending an email to
`sebastiano@tronto.net`!

With this out of the way, let's get started!

## Setting things up

The examples used in this tutorial are all contained in a git repository,
which you can find either on
[my git page](https://git.tronto.net/emscripten-tutorial/file/README.md.html) or
[on github](https://github.com/sebastianotronto/emscripten-tutorial).

In order to follow them you are going to need:

* A working installation of [Emscripten](https://emscripten.org/)
  (which also includes Node.js). Refer to the official website for
  installation instructions.
* A web server such as [darkhttpd](https://github.com/emikulic/darkhttpd)
  or the Python `http.server` module; the examples will use darkhttpd.

I have only tested all of this on Linux, but everything should work
exactly the same on any UNIX system. If you are a Windows user, you can
either run everything inside
[WSL](https://learn.microsoft.com/en-us/windows/wsl/), or you can try and
adjust the examples to your system - if you choose this second option,
I'll happily accept patches or pull requests :)

## Hello world

Let's start with the classic Hello World program:

```
#include <stdio.h>

int main() {
    printf("Hello, web!\n");
}
```

You can compile the code above with

```
emcc -o index.html hello.c
```

And if you now start a web server in the current folder, for example with
`darkhttpd .` (the dot at the end is important), and open a web browser to
[localhost:8080](http://localhost:8080) (or whatever port your web server
uses), you should see something like this:

As you can see, the compiler generated a bunch of extra stuff around
your print statement.
You may or may not want this, but for now we can
take it as a convenient way to check that our program works as expected.

There are other ways to run this compiled code. With the command above,
the compiler should have generated three files for you:

* `index.html` - the web page in the screenshot above.
* `index.wasm` - the actual compiled code of your program; this file contains
  WebAssembly bytecode.
* `index.js` - some JavaScript *glue code* to make it possible for `index.wasm`
  to actually run in a browser.

If you don't specify `-o index.html`, or if you specify `-o` followed
by a filename ending in `.js`, the `.html` page is not going to be
generated. In this case (but also if you *do* generate the html page),
you can run the JavaScript code in your terminal with:

```
node index.js
```

In later examples, the same code may not work seamlessly in both a web
browser and in Node.js - for example, when dealing with persistent data
storage. But until then, we can generate all three files with a single
command and run our code in either way.

It is also possible to ask Emscripten to generate only the `.wasm` file,
in case you want to write the JavaScript glue code by yourself. To do
this, you can pass the `-sSTANDALONE_WASM` option to `emcc`. However,
in some cases the `.js` file is going to be generated even when this
option is used, for example when building a source file without a `main()`
entry point. Since this is something we'll do soon, we can forget about
this option and just take it as a fact that the `.wasm` files generated
by Emscripten require some glue JavaScript code to actually run,
but in case you are interested you can check out
[the official documentation](https://emscripten.org/docs/tools_reference/settings_reference.html#standalone-wasm).
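
To give a rough idea of what the glue code does at its very core, here is a minimal sketch that loads a WebAssembly module by hand. It makes some assumptions: the byte array is a tiny hand-assembled module written only for illustration (it is not Emscripten output), and the exported name `mul` is made up for this example. The glue that Emscripten generates does a lot more than this, such as setting up memory, imports and the C runtime.

```javascript
// A tiny hand-assembled WebAssembly module (illustrative, not Emscripten
// output) that exports one function: mul(i32, i32) -> i32.
const bytes = new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic "\0asm", version 1
    0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type 0: (i32, i32) -> i32
    0x03, 0x02, 0x01, 0x00,                               // one function, of type 0
    0x07, 0x07, 0x01, 0x03, 0x6d, 0x75, 0x6c, 0x00, 0x00, // export function 0 as "mul"
    0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section, one body, no locals
    0x20, 0x00, 0x20, 0x01, 0x6c, 0x0b                    // local.get 0, local.get 1, i32.mul, end
]);

// Compile and instantiate synchronously; this is fine for a module this small.
const wasmModule = new WebAssembly.Module(bytes);
const instance = new WebAssembly.Instance(wasmModule, {});

console.log("The answer is " + instance.exports.mul(6, 7));
```

For a real `.wasm` file served over HTTP, the asynchronous `WebAssembly.instantiateStreaming()` function is usually a better fit than the synchronous constructors used here.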

You can find the code for this example, as well as scripts to
build it and run the web server, in the directory `00_hello_world`
of the git repository
([git.tronto.net](https://git.tronto.net/emscripten-tutorial/file/README.md.html),
[github](https://github.com/sebastianotronto/emscripten-tutorial)).

Anyway, now we can build our C code to run in a web page. But this is
probably not the way we want to run it. First of all, we don't want to
use the HTML template provided by Emscripten; but more importantly, we
probably don't want to write a program that just prints stuff to standard
output. More likely, we want to write some kind of library of functions
that can be called from the front-end, so that the user can interact with
our program via an HTML + JavaScript web page. Before going into that,
let's take a break to discuss what we are actually compiling our code to.

## Intermezzo I: What is WebAssembly?

[WebAssembly](https://en.wikipedia.org/wiki/WebAssembly) is a low-level
language meant to run in a virtual machine inside a web browser. The main
motivation behind it is running higher-performance web applications compared
to JavaScript; this is made possible by its
compact bytecode and its stack-based virtual machine.

WebAssembly (or WASM for short) is supported by all major browsers
since around 2017. Interestingly, Emscripten, the compiler we are
using to translate our C code to WASM, first appeared in 2011,
predating WASM by a few years. Early on, Emscripten would compile
C and C++ code into JavaScript, or rather a subset thereof called
[asm.js](https://en.wikipedia.org/wiki/Asm.js).

Just like regular
[assembly](https://en.wikipedia.org/wiki/Assembly_language), WASM
also has a text-based representation. This means that one could write
WASM code directly, assemble it to bytecode, and then run it.
We are
not going to do it, but if you are curious here is a simple example
(computing the factorial of a number, taken from Wikipedia):

```
(func (param i64) (result i64)
    local.get 0
    i64.eqz
    if (result i64)
        i64.const 1
    else
        local.get 0
        local.get 0
        i64.const 1
        i64.sub
        call 0
        i64.mul
    end)
```

As you can see, it looks like a strange mix of assembly and
[Lisp](https://en.wikipedia.org/wiki/Lisp_(programming_language)).
If you want to try and run WASM locally, outside of a web browser,
you could use something like [Wasmtime](https://wasmtime.dev/).

Until early 2025, the WASM "architecture" was 32-bit only. One big
limitation that this brings is that you cannot use more than 4GB
(2<sup>32</sup> bytes) of memory, because pointers are only 32 bits
long; moreover, your C / C++ code may need some adjustments if it
relied on the assumption that e.g. `sizeof(size_t) == 8`. At the
time of writing, a new standard that enables 64-bit pointers, called
WASM64, is supported on Firefox and Chrome, but not yet on WebKit-based
browsers such as Safari. Depending on when you are reading this,
this may have changed - you can check the status of WASM64 support
[here](https://webassembly.org/features/).

## Building a library

Back to the main topic. Where were we? Oh yes, we wanted to build
a C *library* to WASM and call it from JavaScript. Our complex,
high-performance, math-heavy library probably looks something like this:

library.h (actually, we are not going to need this):

```
int multiply(int, int);
```

library.c:

```
int multiply(int a, int b) {
    return a * b;
}
```

Or maybe it is a bit more complicated than that. But we said we are
going to build up in complexity, and this is just the beginning, so
let's stick to `multiply()`.

To build this library you can use:

```
emcc -o library.js library.c
```

As we saw before, this is going to generate both a `library.js` and a
`library.wasm` file. Now we would like to call our library function
with something like this

program.js:

```
var library = require("./library.js");
const result = library.multiply(6, 7);
console.log("The answer is " + result);
```

*(The `require()` syntax above is valid when running this code in Node.js,
but not, for example, when running in a browser. We'll see in the next
section what to do in that case, but for now let's stick to this.)*

Unfortunately, this will not work for a couple of reasons. The first
reason is that Emscripten is going to add an underscore `_` to all our
function names; so we'll have to call `library._multiply()`. But this
still won't work, because by default the compiler does not *export* all
the functions in your code - that is, it does not make them visible to
the outside. To specify which functions you want to
export, you can use the `-sEXPORTED_FUNCTIONS` flag, like so:

```
emcc -sEXPORTED_FUNCTIONS=_multiply -o library.js library.c
```

And now we finally have access to our `multiply()` function...

```
$ node program.js
Aborted(Assertion failed: native function `multiply` called before runtime initialization)
```

...or maybe not. If you are new to JavaScript like I was a few weeks
ago, you may find this error message surprising. Some runtime must be
initialized, but can't it just, like... initialize *before* trying to
run the next instruction?

Things are not that simple.
A lot of things in JavaScript happen
*asynchronously*, and in these situations you'll have to either use
[`await`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/await)
or a
[*callback function*](https://developer.mozilla.org/en-US/docs/Glossary/Callback_function).
So we'll have to do something like this:

```
var library = require("./build/library.js");

library.onRuntimeInitialized = () => {
    const result = library._multiply(6, 7);
    console.log("The answer is " + result);
};
```

And now we can finally run our program:

```
$ node program.js
The answer is 42
```

The code for this example can be found in the `01_library` folder in
the git repository
([git.tronto.net](https://git.tronto.net/emscripten-tutorial/file/README.md.html),
[github](https://github.com/sebastianotronto/emscripten-tutorial)).

## Intermezzo II: JavaScript and the DOM

If we want to build an interactive web page using JavaScript, we'll
need a way for our script to communicate with the page, i.e. a way
to access the HTML structure from JavaScript code. What we are looking
for is called
*[Document Object Model](https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model)*,
or DOM for short.

For example, if you have a paragraph with some text in your HTML:

```
<p id="myParagraph">Hello!</p>
```

you can access this text from JavaScript like this:

```
var paragraph = document.getElementById("myParagraph");
paragraph.innerText = "New text!";
```

Here we are selecting the paragraph HTML element using its ID, and we
are changing its text via its `innerText` property, all from JavaScript.

Let's see a more complex example:

HTML:

```
<button id="theButton">Press me!</button>
```

JS:

```
var button = document.getElementById("theButton");
var counter = 0;

button.addEventListener("click", () => {
    counter++;
    button.innerText = "I have been pressed " + counter + " times!";
});
```

In the example above we add an
*[event listener](https://developer.mozilla.org/en-US/docs/Web/API/EventTarget/addEventListener)*
to a button: the (anonymous) function we defined is going to be called
every time the button is clicked. And since this is a web page, I guess
I can show you what this actually looks like.

Behold, the dynamic button:

<div style="text-align:center">
<button id="theButton">Press me!</button>
</div>

<script>
window.onload = () => {
    var button = document.getElementById("theButton");
    var count = 0;

    button.addEventListener("click", () => {
        count++;
        button.innerText = "I have been pressed " + count + " times!";
    });
};
</script>

If you are completely new to web development, you may be wondering
where you should write this JavaScript code. One option is to write it
in the same HTML file as the rest of the page, inside a `<script>` tag;
this is how I did it in the example above, as you can check by viewing
the source of this page (press Ctrl+U, or right-click and select
"view source", or prepend `view-source:` to this page's URL; hopefully
at least one of these methods should work in your browser).

However, if the script gets too large you may want to split it off into
a separate file, which we'll demonstrate in this next example.

Let's now make a template web page for using our powerful library.
Let's
start with the HTML, which is in large part boilerplate:

index.html:

```
<!doctype html>
<html lang="en-US">
<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width">
    <title>Multiply two numbers</title>
    <script src="./script.js" defer></script>
</head>

<body>
    <p>
        <input id="aInput"> x <input id="bInput">
        <button id="goButton">=</button>
        <span id="resultText"></span>
    </p>
</body>

</html>
```

Besides the `<body>` element, the only important line for us is line
7, which loads the script from a file. Notice that we use the `defer`
attribute here: this is telling the browser to wait until the whole page
has been loaded before executing the script. If we did not do this, we
could run into a situation where `document.getElementById()` returns
`null`, because the element we are trying to get is not loaded yet (yes,
this happened to me while I was writing this post). If you want to know
more, check out this
[MDN page](https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/script#defer).

Now to the JavaScript code. For now we are going to use the built-in
`*` operator to multiply the two numbers, but in the next section we
are going to replace it with our own library.

script.js (in the same folder as index.html):

```
var aInput = document.getElementById("aInput");
var bInput = document.getElementById("bInput");
var button = document.getElementById("goButton");
var resultText = document.getElementById("resultText");

button.addEventListener("click", () => {
    var a = Number(aInput.value);
    var b = Number(bInput.value);
    resultText.innerText = a * b;
});
```

The final result will look something like this:

<p style="text-align:center">
<input id="aInput"> x <input id="bInput">
<button id="goButton">=</button>
<span id="resultText"></span>
</p>

<script>
var aInput = document.getElementById("aInput");
var bInput = document.getElementById("bInput");
var button = document.getElementById("goButton");
var resultText = document.getElementById("resultText");

button.addEventListener("click", () => {
    var a = Number(aInput.value);
    var b = Number(bInput.value);
    resultText.innerText = a * b;
});
</script>

In a real-world scenario you would probably want to check that the text
provided in the input fields is actually a number, or perhaps use the
[`type="number"`](https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/input/number)
attribute for the input fields. But we'll ignore these issues here -
we are going to have more serious problems to deal with.

## Loading the library and making it a module

With what we have learned in the previous intermezzo (you are not skipping
those, right?) we can finally run our library code in a real web page.
The
code is pretty much the same as above; we just need to include both the
library and the script file in the HTML:

```
<script src="./library.js" defer></script>
<script src="./script.js" defer></script>
```

and of course we have to change the line where we perform the multiplication:

```
resultText.innerText = Module._multiply(a, b);
```

Here `Module` is the default name given to our library by
Emscripten. Apart from being too generic a name, this leads to another
problem: we can't include more than one Emscripten-built library in our
page in this way - otherwise, both are going to be called `Module`.

Luckily, there is another way: we can build a
[modularized](https://emscripten.org/docs/compiling/Modularized-Output.html)
library, i.e. obtain a
[JavaScript Module](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules).
This may sound a bit strange, because the name `Module` kind of implies
there is already a module. The way I understand it is that by default
Emscripten produces a *script* that *contains* a module named `Module`;
when building a modularized library, the whole resulting file is a module.

Modularizing our build is not necessary right now, but
there are a couple of other advantages to it:

* As mentioned above, we can change the name of our module and include
  more than one Emscripten-built library, if we want.
* We will be able to use the module in the same way in Node.js and in
  our web page script. This way we can minimize the differences between
  the two versions of our code, which can be useful for testing.
* In case we want to build a more complex layer of JavaScript between
  our library and our web page, with a modularized build we can easily
  include the module in another file, which can then be included in the
  main script.

So let's go ahead and build our library like so:

```
emcc -sEXPORTED_FUNCTIONS=_multiply -sMODULARIZE -sEXPORT_NAME=MyLibrary \
    -o library.mjs library.c
```

Notice I have changed the extension from `.js` to `.mjs`. Don't worry,
either extension can be used. And you are going to run into issues with
either choice:

* If you run your code in Node.js, it will understand that the library
  file is a module only if you use the `.mjs` extension. Alternatively,
  you can change some settings in a local configuration file to
  enforce this.
* If you run your code in a web page, your web server may not be
  configured to serve `.mjs` files as JavaScript files. This can
  easily be changed by adding a configuration line somewhere.

In my examples I chose to use the `.mjs` extension to make Node.js
happy, and I changed the configuration of my web servers as needed. For
example, for darkhttpd I added a file called `mime.txt` with a single
line `text/javascript mjs`, and launched the server with the
`--mimetypes mime.txt` option.

Now we have to make a couple of changes. Our `program.js`, for running
in node, becomes:

```
import MyLibrary from "./library.mjs"

var myLibraryInstance = await MyLibrary();

const result = myLibraryInstance._multiply(6, 7);
console.log("The answer is " + result);
```

By the way, I have renamed this file to `program.mjs`. This is because
only modules can use the
[static `import`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import)
statement; alternatively, I could have used the
[dynamic `import()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/import)
and kept the `.js` extension.

Similarly, we have to update our `script.js` (or `script.mjs`) to import
the module and create an instance.
Moreover, we have to specify in the
HTML that the script is now a module:

```
<script src="./script.mjs" type="module" defer></script>
```

And we can get rid of the other `<script>` tag, since now the library
is included directly in `script.mjs`.

You can find the full code for this example in the folder
`02_library_modularized` in the git repository
([git.tronto.net](https://git.tronto.net/emscripten-tutorial/file/README.md.html),
[github](https://github.com/sebastianotronto/emscripten-tutorial)).

## Multithreading

Let's move on to a more interesting example. If one of the goals of
WebAssembly is performance, there is no point in using only 1/16th of
your CPU - let's port a multithreaded application to the web!

As a more complicated example, let's write a function that counts how
many prime numbers there are in a given range. This function takes two
integers as input and returns a single integer as output, but it does
a non-trivial amount of work under the hood. A simple implementation of
this routine would be something like this:

```
#include <stdbool.h>

bool isprime(int n) {
    if (n < 2)
        return false;

    for (int i = 2; i*i <= n; i++)
        if (n % i == 0)
            return false;
    return true;
}

int primes_in_range(int low, int high) {
    if (low < 0 || high < low)
        return 0;

    int count = 0;
    for (int i = low; i < high; i++)
        if (isprime(i))
            count++;

    return count;
}
```

This algorithm is
[embarrassingly parallelizable](https://en.wikipedia.org/wiki/Embarrassingly_parallel):
we can split the interval `[low, high)` into smaller sub-intervals and
process each one of them in a separate thread; then we just need to add
up the results of the sub-intervals.
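
Before bringing threads into the picture, the splitting idea itself can be sketched sequentially. The snippet below is only an illustration in JavaScript, not part of the tutorial's repository: the names `splitRange` and `countPrimes` are made up for this sketch, and the chunks are processed one after the other rather than in parallel. The point is just to check that adding up the counts of the sub-intervals gives the same result as counting over the whole interval.

```javascript
// Trial division, mirroring the C isprime() above.
function isPrime(n) {
    if (n < 2) return false;
    for (let i = 2; i * i <= n; i++)
        if (n % i === 0) return false;
    return true;
}

// Count the primes in the half-open interval [low, high).
function countPrimes(low, high) {
    let count = 0;
    for (let i = low; i < high; i++)
        if (isPrime(i)) count++;
    return count;
}

// Split [low, high) into at most `parts` contiguous chunks.
// Every chunk is clamped so that it never extends past `high`.
function splitRange(low, high, parts) {
    const size = Math.ceil((high - low) / parts);
    const chunks = [];
    for (let start = low; start < high; start += size)
        chunks.push({ low: start, high: Math.min(start + size, high) });
    return chunks;
}

const chunks = splitRange(0, 100, 4);
const total = chunks
    .map((c) => countPrimes(c.low, c.high))
    .reduce((a, b) => a + b, 0);

console.log("Chunked and direct counts agree: " + (total === countPrimes(0, 100)));
```

Note that each chunk has to be clamped to the original upper bound: if the chunk size does not divide the interval length exactly, an unclamped last chunk would count numbers outside the requested range.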

For the actual implementation, we are going to use
[pthreads](https://en.wikipedia.org/wiki/Pthreads), for the simple reason
that it is
[supported by Emscripten](https://emscripten.org/docs/porting/pthreads.html).
In practice, assuming we are working on a UNIX platform, we could also
use C11's [threads.h](https://en.cppreference.com/w/c/header/threads) or
C++'s [std::thread](https://en.cppreference.com/w/cpp/thread/thread.html),
but only because they happen to be wrappers around pthreads. On other
platforms, or in other implementations of the C and C++ standard library,
this may not be the case; so we'll stick to old-school pthreads.

This is my parallel version of `primes_in_range()`:

primes.c:

```
#include <stdbool.h>
#include <pthread.h>

#define NTHREADS 16

bool isprime(int);
void *pthread_routine(void *);

struct interval { int low; int high; int count; };

int primes_in_range(int low, int high) {
    pthread_t threads[NTHREADS];
    struct interval args[NTHREADS];

    if (low < 0 || high < low)
        return 0;

    int interval_size = (high-low)/NTHREADS + 1;
    for (int i = 0; i < NTHREADS; i++) {
        /* Clamp both ends so that no sub-interval goes past high */
        int start = low + i*interval_size;
        int end = start + interval_size;
        args[i].low = start < high ? start : high;
        args[i].high = end < high ? end : high;
        pthread_create(&threads[i], NULL, pthread_routine, &args[i]);
    }

    int result = 0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(threads[i], NULL);
        result += args[i].count;
    }

    return result;
}

bool isprime(int n) {
    if (n < 2)
        return false;

    for (int i = 2; i*i <= n; i++)
        if (n % i == 0)
            return false;
    return true;
}

void *pthread_routine(void *arg) {
    struct interval *interval = arg;

    interval->count = 0;
    for (int i = interval->low; i < interval->high; i++)
        if (isprime(i))
            interval->count++;

    return NULL;
}
```

*(Pro tip: if you take
the number of threads as an extra parameter for
your function, you can pass to it the value
[`navigator.hardwareConcurrency`](https://developer.mozilla.org/en-US/docs/Web/API/Navigator/hardwareConcurrency)
from the JavaScript front-end and use exactly the maximum number of
threads that can run in parallel on the host platform.)*

To build this with Emscripten we'll have to pass the `-pthread` option and,
optionally, a suitable value for
[`-sPTHREAD_POOL_SIZE`](https://emscripten.org/docs/tools_reference/settings_reference.html#pthread-pool-size).

If we want to run our multithreaded code in an actual browser, we'll
have to scratch our head a bit harder. The code we are supposed to
write is exactly what we expect, but once again we have to tinker with
our web server configuration. For technical reasons that we'll cover in
the next intermezzo, in order to run multithreaded code in a browser
we must add a couple of HTTP headers:

```
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```

These headers are part of the response your browser will receive when
it requests any web page from the server. The way you set these depends on
the server you are using; with darkhttpd you can use the `--header` option.

With your server correctly set up, you can enjoy a multithreaded program
running in your browser! As always, you can check out this example from
the `03_threads` folder of the git repository
([git.tronto.net](https://git.tronto.net/emscripten-tutorial/file/README.md.html),
[github](https://github.com/sebastianotronto/emscripten-tutorial)).
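
If you are not using darkhttpd, the server-side requirements collected so far (a JavaScript MIME type for `.mjs` files and the two `Cross-Origin-*` headers) can be summarized in a small sketch like the one below. This is not taken from the tutorial's repository: the function name `responseHeaders` and the MIME table are assumptions made for illustration, and the table only lists the file types used in these examples.

```javascript
// File extension -> MIME type, limited to what the examples need.
const MIME_TYPES = {
    ".html": "text/html",
    ".js": "text/javascript",
    ".mjs": "text/javascript",
    ".wasm": "application/wasm"
};

// Build the response headers for the file at the given path: a suitable
// Content-Type plus the two headers needed for cross-origin isolation.
function responseHeaders(path) {
    const dot = path.lastIndexOf(".");
    const extension = dot === -1 ? "" : path.slice(dot);
    return {
        "Content-Type": MIME_TYPES[extension] ?? "application/octet-stream",
        "Cross-Origin-Opener-Policy": "same-origin",
        "Cross-Origin-Embedder-Policy": "require-corp"
    };
}

console.log(responseHeaders("/build/primes.mjs"));
```

In a hand-written Node.js server, an object like this could be passed as the second argument of `response.writeHead(200, ...)`; other servers have their own configuration syntax for the same thing.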

## Intermezzo III: Web Workers and Spectre

On a low level, threads are implemented by Emscripten using
[web workers](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API),
which run separately from the main web page thread and
communicate with it and with each other by
[passing messages](https://developer.mozilla.org/en-US/docs/Web/API/Worker/postMessage).
Web workers are commonly used to run slow operations in the background
without blocking the UI thread, so the web page remains responsive
while these operations run - we'll do this in the next section.

Web workers do not have regular access to the same memory as the main
thread, and this is something that will give us some issues in later
sections. However, there are ways around this limitation. One of these
ways is provided by
[SharedArrayBuffer](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer),
which we won't use directly in this tutorial, but is used by
Emscripten under the hood.

And this is why we had to set the `Cross-Origin-*` headers. In 2018, a
CPU vulnerability called [Spectre](https://spectreattack.com) was found,
and it was shown that an attacker could take advantage of shared memory
between the main browser thread and web workers to
[exploit it remotely](https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)#Remote_exploitation),
reading data that should be out of their reach.
As a counter-measure, most browsers now require your app to be in a
[secure context](https://developer.mozilla.org/en-US/docs/Web/Security/Secure_Contexts)
and
[cross-origin isolated](https://developer.mozilla.org/en-US/docs/Web/API/Window/crossOriginIsolated)
to allow using `SharedArrayBuffer`s.
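
If you are unsure whether your page satisfies these requirements, a quick check along the following lines may help. This is only a sketch: the function name `sharedMemoryStatus` and the shape of the object it returns are made up for this example, while `isSecureContext`, `crossOriginIsolated` and `SharedArrayBuffer` are the standard globals described in the pages linked above. Outside of a browser (for example in Node.js) the first two are not defined, so the check simply reports `false` for them.

```javascript
// Report whether the given global scope looks ready for shared memory.
// By default the current global scope is inspected.
function sharedMemoryStatus(scope = globalThis) {
    const secure = scope.isSecureContext === true;
    const isolated = scope.crossOriginIsolated === true;
    const hasSharedArrayBuffer = typeof scope.SharedArrayBuffer === "function";
    return {
        secure,
        isolated,
        hasSharedArrayBuffer,
        // In a browser, all three conditions are expected to hold.
        ready: secure && isolated && hasSharedArrayBuffer
    };
}

console.log(sharedMemoryStatus());
```

Pasting this in the browser's console on your page is probably the quickest way to tell whether the `Cross-Origin-*` headers were actually applied by your web server.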

Even if you do not plan to use web workers directly, it is still good to
have a rough idea of how they work, because of the
[law of leaky abstractions](https://en.wikipedia.org/wiki/Leaky_abstraction):
*all abstractions are leaky*.
The fact that we had to mess around with our `Cross-Origin-*` headers
despite not caring at all about `SharedArrayBuffer`s is a blatant example
of this.

## Don't block the main thread!

If you have run the previous example, you may have noticed a scary warning
like this in your browser's console:

![A warning saying "Blocking on the main thread is very dangerous, see [link]"](blocking.png)

*The link points to
[this page](https://emscripten.org/docs/porting/pthreads.html#blocking-on-the-main-browser-thread)
in Emscripten's documentation.*

The issue here is that our heavy computation is not running "in the
background", but its main thread (the one spawning the other threads)
coincides with the browser's main thread, the one that is responsible
for drawing the UI and handling user interaction. So if our computation
really takes long, the browser is going to freeze - and after a few
seconds it will ask us if we want to kill this long-running script.

As we anticipated in the previous intermezzo, we are going to solve this
with a web worker. We will structure this solution as follows:

* The main script will be responsible for reading the user input, sending
  a message to the worker to ask it to compute the result, and handling
  the result that the worker is going to send back once it is done. No
  slow operation is performed by this script, so that it won't block
  the main thread.
* The worker will be responsible for receiving messages from the main
  script, handling them by calling the library, and sending a message
  with the response back once it is done computing.

In practice, this will look like this:

script.mjs:

```
var aInput = document.getElementById("aInput");
var bInput = document.getElementById("bInput");
var button = document.getElementById("goButton");
var resultText = document.getElementById("resultText");

// The worker runs worker.mjs in the background, off the main thread
var worker = new Worker("./worker.mjs", { type: "module" });

// On click, only send the request: no heavy work happens here
button.addEventListener("click", () => worker.postMessage({
	a: Number(aInput.value),
	b: Number(bInput.value)
}));

// Show the result when the worker sends it back
worker.onmessage = (e) => resultText.innerText = "There are " +
    e.data.result + " primes between " + e.data.a + " and " + e.data.b;
```

worker.mjs:

```
import Primes from "./build/primes.mjs";

var primes = await Primes();

// Handle a request from the main script and send back the response
onmessage = (e) => {
	const count = primes._primes_in_range(e.data.a, e.data.b);
	postMessage({ result: count, a: e.data.a, b: e.data.b });
};
```

More complicated than before, but nothing crazy. Notice how we are using
[`postMessage()`](https://developer.mozilla.org/en-US/docs/Web/API/Worker/postMessage)
and
[`onmessage()`](https://developer.mozilla.org/en-US/docs/Web/API/Worker/message_event)
to pass messages back and forth. The argument of `postMessage()` is the
actual data we want to send, here a plain object written in
[JSON](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON)-like
notation (the browser copies it to the other side rather than sharing
it), while the argument of `onmessage()` is an
[event](https://developer.mozilla.org/en-US/docs/Web/API/Event)
whose `data` property contains the object that was sent with `postMessage()`.

You can check out this example in the directory `04_no_block` in the
repository
([git.tronto.net](https://git.tronto.net/emscripten-tutorial/file/README.md.html),
[github](https://github.com/sebastianotronto/emscripten-tutorial)).
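
A detail worth knowing about `postMessage()`: the data is not shared
with the receiver, it is copied with the structured clone algorithm.
Plain data goes through, but some values, functions in particular,
cannot be cloned. The sketch below tries this out with the standard
`structuredClone()` function, available in current browsers and in
recent versions of Node.js; the helper name `canBeSent()` is
hypothetical and used only for this illustration.

```javascript
// Returns true if `value` survives a structured clone, which is the same
// kind of copy that postMessage() performs on its argument.
// `canBeSent` is a hypothetical helper name used only for this sketch.
function canBeSent(value) {
  try {
    structuredClone(value);
    return true;
  } catch (err) {
    return false; // typically a DataCloneError
  }
}

const request = { a: 1, b: 100 };
const copy = structuredClone(request);

console.log(copy !== request, copy.a, copy.b); // the receiver gets a copy
console.log(canBeSent(request));               // plain data is fine
console.log(canBeSent({ log: console.log }));  // functions cannot be cloned
```

This is one reason why a message to a worker should carry only data,
never a function to call.
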
Try also large numbers, in the range of millions or tens of millions, and
compare the behavior with the previous example - but don't go too large,
we only support 32-bit integers for now. Notice how, with this new setup,
the browser remains responsive while it is waiting for the response.

Oh and by the way, a nice exercise for you now could be making the
script show some kind of `"Loading result..."` message while the worker
is working. This is not hard to do, but it is a huge improvement in user
experience!

## Callback functions

For one reason or another, your library function may take another
function as a parameter. For example, you may use this other function to
print log messages regardless of where your library code is run: a
command-line tool may pass `printf()` to log to console, while a GUI
application may want to show these messages in some text area in a
window, and it will pass the appropriate function pointer parameter. This
is the use case that we are going to take as an example here, but it is
not the only one.

Implementing this was probably the step that took me the longest in my
endeavor to port my Rubik's cube solver to the web. Luckily for you,
when writing this post I found a simpler method, so you won't have to
endure the same pain.

First, we'll have to adapt our library function like this:

```
int primes_in_range(int low, int high, void (*log)(const char *)) {
	/* The old code, with calls to log() whenever we want */
}
```

*Tip: when using callback functions like this, it is good practice
to have them accept an extra `void *` parameter, and the library
function should also accept an extra `void *` parameter that it then
passes on to the callback. So our function would look something like
this: `int primes_in_range(int low, int high, void (*log)(const char *, void *), void *log_data)`.
This makes the setup extremely flexible, and allows passing callback
functions in situations where this may be tricky. For example, this
way you could pass a C++ member function by passing an object as
`log_data` and a function that calls `log_data`'s member function
as `log`. Since we are not going to use this in this example, I'll stick
to the simpler setup.*

Now, to call our function from the JavaScript side we would like
to do something like this:

```
const result = primes._primes_in_range(a, b, console.log); // Logging to console
```

Unfortunately, this will not work, because `console.log`, a JavaScript
[function object](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Function),
does not get automatically converted to a function *pointer*, which is
what C expects. So we'll have to do something slightly more complicated:

```
import Primes from "./build/primes.mjs";

var primes = await Primes();

// Register a JavaScript function in the module's function table and
// get back an integer that the C code can use as a function pointer
const logPtr = primes.addFunction((cstr) => {
	console.log(primes.UTF8ToString(cstr));
}, "vp");

const count = primes._primes_in_range(1, 100, logPtr);
```

Here `addFunction()` is a function generated by Emscripten. Notice also
that we are converting the callback's argument with `UTF8ToString()`,
an Emscripten utility to convert C strings to JavaScript strings, before
passing it to `console.log()`, and that we are passing the function's
signature `"vp"` (returns `void`, takes a `pointer`) as an argument; see
[here](https://emscripten.org/docs/porting/connecting_cpp_and_javascript/Interacting-with-code.html#function-signatures)
for more information.

Other than that, you just need to add a couple of compiler flags:

* `-sEXPORTED_RUNTIME_METHODS=addFunction,UTF8ToString` to tell the
  compiler to make these two methods available.
* `-sALLOW_TABLE_GROWTH` to make it possible to add functions to
  our module at runtime with `addFunction()`.

And as you can check by running the example `05_callback` from the repo
([git.tronto.net](https://git.tronto.net/emscripten-tutorial/file/README.md.html),
[github](https://github.com/sebastianotronto/emscripten-tutorial)),
everything works as expected, both in Node.js and in a web page. To make
the examples more interesting, the web page one is not only logging the
messages to the console, but also showing them as text in the web page.

*Note: you must be careful where you call this callback function from.
If you try to call it from outside the main thread (for example, in one
of the threads that are spawned to count the primes in the sub-intervals)
you'll get a horrible crash. This is because web workers do not have
access to the functions that reside in another worker's memory.*

## Persistent storage

Our multithreaded implementation of `primes_in_range()` is not slow, but
it could be faster. One possible way to speed it up is to use a look-up
table to make `is_prime()` run in constant time; for this we'll need to
remember which numbers below 2<sup>31</sup> (just above the maximum value
of a 32-bit signed integer) are prime. This will require 2<sup>31</sup>
bits of data, or 256MB. It would be nice if we could store this data
persistently in the user's browser, so that if they use our app again in
the future we won't need to repeat expensive calculations or re-download
large files.
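
To make the size estimate concrete, here is a small sketch of such a
bit table, written in JavaScript only to keep it short; the tutorial's
library is in C, and none of the names below (`tableBytes()`,
`buildPrimeTable()`, `isPrime()`) come from its code. It uses one bit
per number and fills the table with a sieve of Eratosthenes, which is
just one of the possible ways to generate it.

```javascript
// One bit per number: bit n is 1 if n is prime, 0 otherwise.
// All names here are illustrative, not taken from the tutorial's code.
function tableBytes(limit) {
  return Math.ceil(limit / 8);
}

function isPrime(table, n) {
  return (table[n >> 3] & (1 << (n & 7))) !== 0;
}

function buildPrimeTable(limit) {
  const table = new Uint8Array(tableBytes(limit));
  // Start by marking every number from 2 to limit - 1 as prime...
  for (let n = 2; n < limit; n++) table[n >> 3] |= 1 << (n & 7);
  // ...then clear the multiples of each prime (sieve of Eratosthenes).
  for (let p = 2; p * p < limit; p++) {
    if (!isPrime(table, p)) continue;
    for (let m = p * p; m < limit; m += p) table[m >> 3] &= ~(1 << (m & 7));
  }
  return table;
}

// Size of the full table in MiB: 2^31 bits / 8 = 2^28 bytes
console.log(tableBytes(2 ** 31) / (1024 * 1024)); // 256

// A tiny table is enough to try out the look-up
const small = buildPrimeTable(100);
console.log(isPrime(small, 97), isPrime(small, 91));
```

With the table in memory, checking a number takes one array access and a
bit mask, which is where the constant-time look-up comes from.
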

Putting aside the question of whether any of the above is a good idea,
and assuming you know how to generate such a table, in C you would
read and store the data like this:

```
#include <stdio.h>

#define FILENAME "./build/primes_table"

/* TABLESIZE is the size of the table in bytes, defined elsewhere */

/* Load the table from the file into the given buffer */
void read_table(unsigned char *table) {
	FILE *f = fopen(FILENAME, "rb");
	fread(table, TABLESIZE, 1, f);
	fclose(f);
}

/* Write the table from the given buffer to the file */
void store_table(const unsigned char *table) {
	FILE *f = fopen(FILENAME, "wb");
	fwrite(table, TABLESIZE, 1, f);
	fclose(f);
}
```

*Note: the code snippet above is extremely simplified, you probably want
to add some error-handling code if you implement something like this.*

The good news is that we can use the same code when building with
Emscripten! The bad news is that... well, it's a bit more complicated
than that.

First of all, it is important to know that
[Emscripten's File System API](https://emscripten.org/docs/api_reference/Filesystem-API.html)
supports different "backends", by which I mean ways of translating the
C / C++ file operations to WASM / JavaScript. I am not going to discuss
all of them here, but I want to highlight a few key points:

* The default backend is called `MEMFS`. It is a virtual file system
  that resides in RAM, and all data written to it is lost when the
  application is closed.
* The backends that give access to the actual local file system (such
  as `NODERAWFS`) are only usable when running your app with
  Node.js. Browsers are *sandboxed*, and the file system is not normally
  accessible to them. There are ways, such as the
  [File System API](https://developer.mozilla.org/en-US/docs/Web/API/File_System_API),
  to access files, but as far as I understand each file you want to
  access requires explicit actions from the user.
  We would like to manage
  our data automatically, so we are not going to use this API.
* The backend we are going to use is called `IDBFS`. It provides access
  to the [IndexedDB API](https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API),
  which allows storing large quantities of data persistently in the
  browser. The data is normally only removed if the user asks for it,
  for example by clearing it from the browser's settings page.

To activate the `IDBFS` backend, we are going to add `-lidbfs.js`
to our compiler options. IndexedDB is not the only way to store
data persistently in the browser. For an overview of all the options,
you can take a look at
[this page on MDN](https://developer.mozilla.org/en-US/docs/Learn_web_development/Extensions/Client-side_APIs/Client-side_storage).

The compiler flag is not enough, however. We also need to:

1. Create a directory (in the virtual file system) where our data file
   is going to be stored. We are going to call this directory `assets`,
   but you can pick any other name; it does not have to coincide with the
   name of a directory that exists on your local file system.
2. Mount the directory we have just created in the indexed DB.
3. Synchronize the virtual file system, so that our script is able to
   read pre-existing files.

All of the above has to be done from JavaScript, which makes things a
little bit complicated, because we are reading our files from C code.
We have a couple of ways to work around this issue:

* Using
  [inline JavaScript](https://emscripten.org/docs/porting/connecting_cpp_and_javascript/Interacting-with-code.html#interacting-with-code-call-javascript-from-native)
  in our C code with the `EM_JS()` or `EM_ASYNC_JS()` Emscripten macros.
* Setting up the indexed DB file system when the module loads using
  the `--pre-js` compiler option.

Here we are going to use the second solution, but the first option is
good to keep in mind, because it allows us to call JavaScript code at
any point rather than just at startup.

*Note: if you do end up using `EM_ASYNC_JS()` to make asynchronous
JavaScript functions callable from C, keep in mind that any C
function that, directly or indirectly, calls an async JavaScript
function, will now return a
[promise](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise)
when called from JavaScript. But whether an async function is called is
determined at runtime, so your C function may return a value one time
and a promise another time, depending on how exactly it runs!*

So we are going to add `--pre-js init_idbfs.js` to our compiler options,
with `init_idbfs.js` containing the following:

```
Module['preRun'] = [
	async () => {
		const dir = "/assets";

		// Steps 1 and 2: create the directory and mount it on IDBFS
		FS.mkdir(dir);
		FS.mount(IDBFS, { autoPersist: true }, dir);

		// Step 3: load existing data from the indexed DB (populate = true)
		Module.fileSystemLoaded = new Promise((resolve, reject) => {
			FS.syncfs(true, (err) => {
				if (err) reject(err);
				else resolve(true);
			});
		});

	}
];
```

As you can see, the syncing operation is more complicated than the other
two, the main reason being that it is an
[asynchronous operation](https://developer.mozilla.org/en-US/docs/Learn_web_development/Extensions/Async_JS).
For this reason, we are wrapping it in a
[Promise](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise),
so we can detect when this operation is done and react accordingly.
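
The same wrapping pattern works for any callback-style function, not
only at startup. The following is a minimal sketch of a reusable
wrapper: it only assumes an object with a `syncfs(populate, callback)`
method, which is the shape of Emscripten's `FS.syncfs()`; the name
`syncfsAsync()` and the stand-in object `fakeFS` are made up so that the
example can run outside of an Emscripten module.

```javascript
// Wrap a callback-style function such as Emscripten's FS.syncfs() in a
// Promise. `fs` is any object with a syncfs(populate, callback) method;
// the wrapper name `syncfsAsync` is made up for this sketch.
function syncfsAsync(fs, populate) {
  return new Promise((resolve, reject) => {
    fs.syncfs(populate, (err) => {
      if (err) reject(err);
      else resolve();
    });
  });
}

// A stand-in for the real FS object, so the sketch runs on its own.
const fakeFS = {
  syncfs(populate, callback) {
    setTimeout(() => callback(null), 0); // pretend the sync takes a while
  },
};

syncfsAsync(fakeFS, true).then(() => console.log("file system ready"));
```

In an `async` function one could then write `await syncfsAsync(FS, false)`
to flush pending writes to the indexed DB, since a `populate` value of
`false` asks Emscripten to save rather than load.
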
We are going to wait for this promise in our worker script, which will
then send a message to the main script to communicate that the file
system is ready to go:

```
primes.fileSystemLoaded.then(() => {
	postMessage({ type: "readySignal" });
});
```

The main script can then handle this signal as it prefers, for example by
enabling the `Compute` button, if it was previously marked as `disabled`.

One last thing: since we are now using a large amount of memory and
loading the virtual file system at the start, the compiler will complain
that we are not reserving enough memory for our application. Adding a
`-sINITIAL_MEMORY=272629760` compiler flag will do the trick (watch out:
the number you provide must be a multiple of 2<sup>16</sup>). I am not
entirely sure why this is the case, since we are not loading the file in
memory statically, but only at runtime, and only when the
`primes_in_range()` function is called. I would expect that using
[`-sALLOW_MEMORY_GROWTH`](https://emscripten.org/docs/tools_reference/settings_reference.html#allow-memory-growth)
would be enough - and indeed this is the case if we use the `EM_ASYNC_JS()`
macro to load the file system on-demand.

And with all this, we are ready to run our optimized version of the
`primes_in_range()` algorithm, all from within our browser! As always,
you can check out the complete code in the folder `06_storage` of
the repository
([git.tronto.net](https://git.tronto.net/emscripten-tutorial/file/README.md.html),
[github](https://github.com/sebastianotronto/emscripten-tutorial)).

If generating this data on the user's side seems redundant, you can
also have it downloaded from the server. I won't explain how to do it
here, since there are many possible ways to achieve this - after all, the
indexed DB is also accessible from JavaScript.
If you want to experiment more
with Emscripten you can try to use the
[Fetch API](https://emscripten.org/docs/api_reference/fetch.html); in my
project I was not able to make its synchronous version work together with
`-sMODULARIZE`, so I ended up using
[`fetch()`](https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API)
directly from within an `EM_ASYNC_JS()` function. This tutorial is already
too long, so I am going to leave this as an exercise for the reader.

## Closing thoughts

I have discussed almost everything that I have learned about building a
webapp in C / C++ with Emscripten. I ended up using C, not C++, for all
of my examples, so I did not have a chance to discuss some neat
C++-specific features such as
[Embind](https://emscripten.org/docs/porting/connecting_cpp_and_javascript/embind.html)
and
[`emscripten::val`](https://emscripten.org/docs/api_reference/val.h.html).
Do check them out if you plan to use C++ for your web app!

Even if this page is structured like a tutorial, it is probably better
described as a collection of personal notes, a "brain dump" that I wrote
for myself, as is the case with many of my blog posts. Writing this piece
was a great occasion for me to review the work that I have done and the
things I have learned. And while reflecting on all of this I was able to
isolate a specific impression that I had while working on this,
and I summarized it in one sentence:

<center><strong><i>
It's leaky abstractions all the way down.
</i></strong></center>

If you have not encountered this term before (but you should have, since
I have already used it in this post), *leaky abstraction* is a term used
to describe the failure of an abstraction to hide the low-level details
it is abstracting.
The so-called
[law of leaky abstractions](https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/)
says that all abstractions are leaky. But in my opinion not all
abstractions leak in the same way - some leak way more than others.

Emscripten is a great project that tries to abstract away all the web
(JavaScript, WASM, web workers, local storage...) so that you can build
and run your C / C++ code in a web browser. Frankly, this is mind-blowing,
and I have mad respect for the Emscripten developers.

But as soon as the complexity of your codebase bumps up a notch, you
immediately find out that the abstractions don't hold anymore. If your
app is multithreaded, you have to learn what a web worker is. If you
want to read some data from a file, welcome to the world of client-side
storage. You need 64-bit memory support because you are processing more
than 2GB of data? Sure, but first make sure that your users are not
using Safari.

But I am not complaining about this. A browser is a very different beast
from a bare-metal operating system, and it is to be expected that you
have to know something about the system you are deploying to. I am happy
that I could learn about all of this, and I believe this knowledge is
going to give me an extra edge whenever I work on the web again.