# WASM Parallelism Using Rust

August 20, 2021

This will be a living document over the coming weeks as I explore this project.

I'm looking to explore the viability of using WASM for highly parallel tasks in a web browser. That means answering:

  • is it meaningfully faster than plain JS or Web Worker-based JS?
  • is there sufficient browser support?
  • is it durable?
  • what are the development ergonomics like?

# Library/Toolchain Use

This project does not intend to explore every option for compiling Rust to WASM and utilizing threading. I found this fantastic post about the issue and I intend to follow its advice, which boils down to using Rayon for the threading model via wasm-bindgen-rayon.
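Based on my reading of the wasm-bindgen-rayon README, the wiring ends up being small: you re-export the crate's thread pool initializer so the JS side can call the generated `initThreadPool(navigator.hardwareConcurrency)` before any parallel work, and then ordinary Rayon code just works. A minimal sketch (the `sum_of_squares` example is mine, not from the demo):

```rust
use rayon::prelude::*;
use wasm_bindgen::prelude::*;

// Re-export wasm-bindgen-rayon's initializer. On the JS side this becomes
// `initThreadPool(navigator.hardwareConcurrency)`, which must complete before
// any parallel function is called, because it spins up the Web Worker-backed pool.
pub use wasm_bindgen_rayon::init_thread_pool;

// Once the pool exists, plain Rayon code works as usual.
#[wasm_bindgen]
pub fn sum_of_squares(numbers: &[i32]) -> f64 {
    numbers.par_iter().map(|&n| (n as f64) * (n as f64)).sum()
}
```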

# Meltdown/Spectre and SharedArrayBuffer

SharedArrayBuffer is a variant of ArrayBuffer whose backing memory can be shared between the main thread and workers; combined with the Atomics API, it lets you coordinate access to that memory safely. This is the key to sharing data between the JS environment and multiple WASM threads without introducing race conditions.

I recalled that there was discussion of a Spectre mitigation in browsers that disabled SharedArrayBuffer. It looks like that's been addressed, as long as you serve the page with the cross-origin isolation headers (Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy).

SharedArrayBuffer exists in Chrome, Firefox, and Edge, but not Safari. So you either write Safari off, or feature-detect and fall back to a single-threaded approach.
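In practice the detection probably belongs in the JS loader, before deciding whether to fetch the multithreaded build at all, but for completeness, a Rust-side probe via js_sys could look roughly like this (the function name is mine):

```rust
use js_sys::Reflect;
use wasm_bindgen::prelude::*;

/// Rough capability check: the multithreaded path needs SharedArrayBuffer on
/// the global object, which in current browsers also implies the page was
/// served with the cross-origin isolation headers.
#[wasm_bindgen]
pub fn can_use_threads() -> bool {
    Reflect::has(&js_sys::global(), &JsValue::from_str("SharedArrayBuffer")).unwrap_or(false)
}
```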

# Demo Failures on an M1 Mac

I ran this demo in two ways: from the hosted demo website and by compiling it myself. For me, it doesn't work properly. I can run both threading approaches, but the multithreaded approach takes 10-20x longer. I asked friends to try, and they all got the expected results: roughly cpu-count times faster. I suspect something weird is going on because I'm on an M1 Mac. I have an issue open here.

This is a possible dealbreaker: if I can detect a lack of feature support, I can fall back, but if I can't, I end up shipping something that feels very broken on an M1 Mac. Of course I could get clever and switch based on user agent, but that's such a can of worms.

Trying to compile it locally was a pain, given the lack of prebuilt wasm-bindgen binaries for an M1 Mac. I had to install it with Cargo and then remove it from the demo's package.json dependencies.

# Testing Durability

I've tried to do clever performance tricks using Web Workers and WebGL in the past. Most of the time I get a product that works very well, and then there are one or two coworkers for whom it utterly fails. In one case, the Web Workers would leak memory, while WebGL shaders would eat up gigs of VRAM and send their discrete graphics cards screaming to the moon.

I need to design and perform a test that demonstrates not just the parallelism win of this approach, but that it's durable. That means:

  • it performs consistently
  • graceful degradation to a single-threaded or JS implementation is reliable
  • no memory leaks
  • it works on all major browsers (I can live with Safari being slower, but not broken, so it'll have to fall back properly)

# Testing Performance

Testing performance can reuse a lot of what we build to test durability. If we've proven a fallback approach works reliably, we can use the fallback (e.g. pure JS or Web Worker JS) as the performance baseline. I'm thinking these flavours (a sketch of the two Rust variants follows the list):

  • Rust WASM (multithreaded)
  • Rust WASM (single-threaded)
  • JS Web Workers (multithreaded)
  • JS (single-threaded)
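To keep the two Rust flavours honest, the same kernel could be exported twice, once over Rayon's parallel iterators and once over plain iterators, so the harness times identical work and only the threading differs. A sketch with a toy workload and names I've made up:

```rust
use rayon::prelude::*;
use wasm_bindgen::prelude::*;

// Shared toy kernel so both exports do identical per-element work.
fn work(x: f32) -> f32 {
    x.sqrt().sin()
}

// Multithreaded flavour: the slice is split across the Rayon thread pool.
#[wasm_bindgen]
pub fn run_parallel(data: &[f32]) -> f32 {
    data.par_iter().map(|&x| work(x)).sum()
}

// Single-threaded flavour: same logic on one thread, as the baseline.
#[wasm_bindgen]
pub fn run_single(data: &[f32]) -> f32 {
    data.iter().map(|&x| work(x)).sum()
}
```

The JS flavours would be timed from the same harness, so all four numbers come from identical inputs.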

While my primary goal is to learn, this is all in the context of a real problem at work: testing whether robots can reach navigable positions on a map, given a raster that describes occupancy (walls, obstacles, etc.). The demo should ideally exercise this concept:

  1. Load an array of occupancy data.
  2. Load a collection of (x,y) position tuples.
  3. Check if each position, when expanded to a circle by a given radius (our robots are circular), intersects any occupied space.

The parallelism happens in step 3. There might be >10,000 positions and we can evaluate them in parallel. We might also want to do things like evaluate different radii, given different robot configurations.
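To make step 3 concrete, here's a rough sketch of what the parallel check could look like with Rayon. Everything about the data model is an assumption on my part: a flat row-major u8 occupancy grid, positions flattened to [x0, y0, x1, y1, ...] in cell units, and all the names are illustrative.

```rust
use rayon::prelude::*;
use wasm_bindgen::prelude::*;

/// For each (x, y) position, returns 1 if a circle of `radius` around it is
/// free of occupied cells, 0 otherwise. Positions are evaluated in parallel.
#[wasm_bindgen]
pub fn check_positions(
    occupancy: &[u8], // row-major grid, non-zero = occupied
    width: usize,
    height: usize,
    positions: &[f32], // flattened [x0, y0, x1, y1, ...] in cell units
    radius: f32,
) -> Vec<u8> {
    positions
        .par_chunks_exact(2) // one (x, y) pair per robot position
        .map(|p| {
            let (cx, cy) = (p[0], p[1]);
            // Clamp the circle's bounding box to the grid.
            let x0 = (cx - radius).floor().max(0.0) as usize;
            let y0 = (cy - radius).floor().max(0.0) as usize;
            let x1 = ((cx + radius).ceil() as usize).min(width.saturating_sub(1));
            let y1 = ((cy + radius).ceil() as usize).min(height.saturating_sub(1));
            // Blocked if any occupied cell centre falls inside the circle.
            let blocked = (y0..=y1).any(|y| {
                (x0..=x1).any(|x| {
                    occupancy[y * width + x] != 0 && {
                        let (dx, dy) = (x as f32 - cx, y as f32 - cy);
                        dx * dx + dy * dy <= radius * radius
                    }
                })
            });
            if blocked { 0 } else { 1 }
        })
        .collect()
}
```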

# To be continued...