I’ve been thinking about WebAssembly a lot lately.
I’m mostly focused on two lines of inquiry. First: What’s the promise of WebAssembly? Why is it exciting? What makes it different? (I get into some of this in What’s WebAssembly?) And second: What’s the state of WebAssembly today?
For me, answering both of those questions required an exploration of Serverless more broadly, as WebAssembly could both compete with and complement emergent technologies like V8 Isolates (which powers Cloudflare Workers) and microVMs (like Firecracker, which powers AWS Lambda and Fly.io).
N.B. These are public notes and not intended as an opinionated essay. Feel free to DM me any corrections.
Containers
Imagine you’re me (very naive), and you’re building a Serverless [1] platform (like Cloud Run) based on Docker containers. The idea: users give you Docker containers, and you provide them with HTTP endpoints in return. When a request comes in, you route it to one of your servers, which routes it to a container (spinning it up if needed), which services the request.
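To make the naive flow concrete, here’s a sketch of the per-request logic. The `handle_request` helper and port mapping are hypothetical, and a real platform would pool warm containers rather than launching one per request:

```rust
// Naive request path for the container-based platform sketched above.
// Shelling out to the Docker CLI; pooling and error handling omitted.
use std::process::Command;

fn handle_request(user_image: &str, host_port: u16) -> std::io::Result<()> {
    // Cold start: pull (if needed) and run the user's image. This is the
    // expensive step that the "cold starts" question below worries about.
    Command::new("docker")
        .args([
            "run", "--rm", "-d",
            "-p", &format!("{host_port}:8080"),
            user_image,
        ])
        .status()?;

    // ...then proxy the incoming HTTP request to localhost:{host_port}.
    Ok(())
}
```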
This illustrates some of the problem-solving required to deliver a Serverless solution (taken from Matt Rickard’s excellent write-up):
- 📦 Packaging: How do users give you their code? (In the above example, we chose to use Docker containers, but there are other options.)
- 🧊 Cold starts: When your server receives a request, it needs to download and run the Docker image. How do you avoid lengthy delays for those initial requests? For future requests? (Do you keep that container running forever?)
- 🔒 Security boundaries: Not only are you running untrusted, user-provided code, but in order to make Serverless ‘work’, you need to be able to run code from many users on the same physical machine. Docker provides some isolation, which is good news for us… but it’s considered unsafe to run multiple Docker containers with the default configuration in a multi-tenant scenario (source), which is, of course, bad news for us.
- 📈 Efficiency: How do you make the system as efficient as possible? You might have many, many containers running on each of your machines. How do you enable your system to run as many of those containers as cheaply as possible? (Unlike the others, this is largely a concern of the platform provider as opposed to the user, though of course it will ultimately impact pricing.)
There are lots of tradeoffs here. For example, if you’re willing to keep the user’s container running at all times, that will help with cold starts, at the cost of efficiency (resource limits). But with this lens, the primary upside of our system is packaging, as users can give us any Docker container, and we can run it.
The first downside (at least, as described above) is security. If we want a secure multi-tenant system, we can’t run Docker directly. GKE, for example, uses a technology called gVisor, which effectively emulates Linux in Go to create a “kernel sandbox”, and thus avoids containers hitting the kernel directly (source). So maybe we’d have to do something like that.
The second downside is low resource efficiency (in a relative, fuzzy sense). Containers are typically seen as more efficient than virtual machines, since they can share a common operating system kernel, which reduces the resources required to run a bunch of them at once and greatly decreases startup time. In a multi-tenant environment, though, we need to enforce isolation between containers — so we have to use something like the gVisor system described above (or: Different Types of Software Containers), which will cause “a low-double-digits percentage hit” in performance vis-à-vis that ideal (source). Also: compared to a small JavaScript payload (as in V8 Isolates, below), we’re asking users to give us an entire root filesystem by way of a container, which means we’re moving around and storing more, bigger files.
We may also suffer from cold start times. An AWS Lambda instance, for example, can only handle a single request at a time, so a new Lambda has to be cold-started whenever concurrent requests arrive. As Lambda receives more and more requests, it has to create more of these containerized processes, which is expensive (compared to, e.g., Isolates, described below).
Isolates (Cloudflare Workers)
Cloudflare Workers takes a different approach to Serverless.
There’s no Docker, and no containers. Instead, they leverage a technology called V8 Isolates, originally designed to power JavaScript in the browser. Isolates are “lightweight contexts that group variables with the code allowed to mutate them”.
You can run thousands of Isolates in a single process. You spin up one JavaScript runtime, and then have essentially no overhead for startup time or memory consumption as you run more and more Isolates. Cold starts are non-existent, and the system itself is extremely efficient. You can run this system on the edge in part because all it takes is a small amount of user code (JavaScript) — V8 already has the standard library and everything that user code needs to be useful.
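As a rough sketch of what that looks like from an embedder’s perspective, here’s the pattern using the Rust `v8` crate (rusty_v8); the exact API differs between crate versions, so treat this as illustrative. One process-wide runtime, many cheap isolates inside it:

```rust
fn main() {
    // One process-wide V8 runtime...
    let platform = v8::new_default_platform(0, false).make_shared();
    v8::V8::initialize_platform(platform);
    v8::V8::initialize();

    // ...and many lightweight isolates inside it, each with its own heap
    // and globals, and no ability to see the other isolates' state.
    for tenant in ["alice", "bob", "carol"] {
        let isolate = &mut v8::Isolate::new(Default::default());
        let scope = &mut v8::HandleScope::new(isolate);
        let context = v8::Context::new(scope);
        let scope = &mut v8::ContextScope::new(scope, context);

        // Run this tenant's (stand-in) JavaScript inside its isolate.
        let code = v8::String::new(scope, "21 * 2").unwrap();
        let script = v8::Script::compile(scope, code, None).unwrap();
        let result = script.run(scope).unwrap();
        println!("{tenant}: {}", result.to_rust_string_lossy(scope));
    }
}
```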
The downsides of Isolates are:
- 🔒 Security: “v8 has a large and complicated attack surface” (source). Kurt Mackey from Fly.io argues that “Under no circumstances should CloudFlare or anyone else be running multiple isolates in the same OS process. They need to be sandboxed in isolated processes” (source). In response, Kenton Varda (the tech lead for Cloudflare Workers) claims that there’s plenty they can do instead of process isolation, and that imposing strict process isolation “would mean an order of magnitude more overhead, in terms of CPU and memory usage” (source). So, there’s at least some debate here around the level of isolation and hardening required to run V8 Isolates in this way.
- 📦 Packaging: Originally, those running V8 Isolates could only really support JavaScript / TypeScript. This is changing and is part of why WebAssembly is exciting: V8 can run WebAssembly, so any language that can compile to WebAssembly can in theory run on V8 Isolates.
Workers does support WebAssembly, but the cold start time is quite high in my experience due to the lack of shared modules (source). For Ruff (linked above), I see a cold start time of ~1.5 seconds on Workers; warm requests take < 100 ms.
Firecracker / microVMs (Fly.io)
Fly.io (and AWS Lambda) takes yet another approach.
With Fly.io, you do package your code with Docker, but they don’t run it in a container. Instead, they use a technology called Firecracker to create “microVMs”. These microVMs are very efficient: “Firecracker can fit thousands of micro-VMs on a single server, paying less than 5MB per instance in memory” (source). So when a request comes in, they route it to your VM, and run your code on the VM directly.
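For a sense of the mechanics: Firecracker exposes a small REST API over a Unix socket, and booting a microVM is a handful of PUTs. Here’s a rough sketch that shells out to curl; the socket path, kernel image, and rootfs paths are placeholders:

```rust
// Drive Firecracker's REST API over its Unix socket via curl.
use std::process::Command;

fn api_put(sock: &str, path: &str, body: &str) -> std::io::Result<()> {
    Command::new("curl")
        .args([
            "--unix-socket", sock, "-X", "PUT",
            &format!("http://localhost{path}"),
            "-H", "Content-Type: application/json",
            "-d", body,
        ])
        .status()?;
    Ok(())
}

fn main() -> std::io::Result<()> {
    let sock = "/tmp/firecracker.socket"; // placeholder socket path

    // A tiny VM: 1 vCPU, 128 MiB. Small enough that thousands fit per host.
    api_put(sock, "/machine-config", r#"{"vcpu_count": 1, "mem_size_mib": 128}"#)?;
    // Guest kernel and root filesystem (placeholder paths):
    api_put(sock, "/boot-source", r#"{"kernel_image_path": "vmlinux", "boot_args": "console=ttyS0"}"#)?;
    api_put(sock, "/drives/rootfs", r#"{"drive_id": "rootfs", "path_on_host": "rootfs.ext4", "is_root_device": true, "is_read_only": false}"#)?;
    // Boot it.
    api_put(sock, "/actions", r#"{"action_type": "InstanceStart"}"#)?;
    Ok(())
}
```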
The upside, from the user’s perspective, is that packaging is much easier: you can ship anything, not just JavaScript (according to Kurt Mackey, Fly.io started with a JavaScript runtime, like Cloudflare, but found that customers were better off “just running Docker images” (source)). It’s also (arguably) more secure than Isolates (source), while still being very fast. Kurt Mackey argues that if you’re forking a process for every Isolate (he thinks you should), you might as well run Firecracker, which enables you to run a much wider range of applications (source).
Fly’s bet is that they can “make launching VMs as quick as launching containers while retaining the great isolation benefits of a VM” (source). (They couple this with running your app in multiple locations, rather than in a traditional single-region datacenter.)
Note that Fly.io is more of a… PaaS? It’s priced based on the number and size of your VMs, so it’s “more like Fargate than Lambda” (source). It does autoscale, but they don’t scale down to zero. It has a different set of tradeoffs than Cloudflare Workers, which takes your code and runs it for you at ‘any’ scale.
WebAssembly
Given this context, where does WebAssembly fit in? And how does it align with the criteria we set out at the start?
WebAssembly is designed to be very fast, very portable, and very secure (source). With WebAssembly, you can compile a variety of languages down to WebAssembly’s intermediate representation, then run that compiled code on any WebAssembly runtime, anywhere, in a highly sandboxed way, at near-native speeds.
I don’t have great intuition for whether WebAssembly can be as fast and lightweight as ‘JavaScript on V8 Isolates’, but one goal could be: a “much faster runtime than VMs”, with the same security guarantees, and similarly broad language support. Something between Isolates and microVMs, maybe? Fast, lightweight, portable, secure, etc.
I could imagine a Serverless platform based on WebAssembly (but not necessarily on V8 Isolates). That platform would spin up a bunch of servers, each armed with a WebAssembly runtime like Wasmtime or WasmEdge or Wasmer (there are a lot of runtimes), then run users’ WebAssembly binaries directly on host machines — no need for containers, no need for VMs, no need for microVMs.
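A minimal sketch of what that host might look like with Wasmtime, assuming the pre-component-model `wasmtime` and `wasmtime-wasi` crates as they existed around 2022 (the WASI glue has since changed):

```rust
use anyhow::Result;
use wasmtime::{Engine, Linker, Module, Store};
use wasmtime_wasi::{WasiCtx, WasiCtxBuilder};

fn main() -> Result<()> {
    let engine = Engine::default();

    // Compile the user's binary once; instantiation is then cheap enough
    // to do per request.
    let module = Module::from_file(&engine, "user.wasm")?;

    let mut linker: Linker<WasiCtx> = Linker::new(&engine);
    wasmtime_wasi::add_to_linker(&mut linker, |ctx| ctx)?;

    // One Store (and WASI context) per request: each invocation gets a
    // fresh, sandboxed world with only the capabilities we grant it.
    let wasi = WasiCtxBuilder::new().inherit_stdio().build();
    let mut store = Store::new(&engine, wasi);

    let instance = linker.instantiate(&mut store, &module)?;
    let start = instance.get_typed_func::<(), ()>(&mut store, "_start")?;
    start.call(&mut store, ())?;
    Ok(())
}
```

No container runtime, no VM boot: the sandbox boundary is the WebAssembly module itself.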
In theory, this could be super efficient. Fermyon, for example, “has found tens of thousands of WebAssembly binaries can run in a single Spin instance while keeping startup times under a millisecond” (source).
This platform could have other benefits too. For example, you could seamlessly mix and match code written in multiple languages, each of which compiles to WebAssembly. Further, the only demand on the user would be that they provide a WebAssembly module — they don’t have to construct a Docker Image or otherwise package their code (though some might view this as a weakness, not a strength).
WebAssembly in 2022
In exploring this idea, and its applicability to shipping software today, I’ve run into a few problems:
- There’s no standardized component model for WebAssembly right now, so in order to run on any sort of hosted WebAssembly platform, you have to implement module-side code (source). You see this with Fermyon, where you have to write your code using their Spin framework. You also see this with Fastly’s Compute@Edge, where you have to implement `fastly_http_req` and `fastly_http_body` to make your WebAssembly code compatible with their interface (source). Maybe this is fine, I just expected something more generic.
- Compared to running JavaScript on V8 Isolates, WebAssembly binaries are quite large (though compared to Docker Images, they’re very small). Deploying WebAssembly on Cloudflare Workers is really slick (see this primer, or my example), but (1) they have a 1MB limit on the size of the binary, and (2) cold starts can take ~1-2 seconds. (Fastly’s Compute@Edge doesn’t seem to suffer from these limitations.) The main issue seems to be that there’s no concept of shared modules in the WebAssembly world right now, so every binary has to ship with its language’s standard library. Kenton Varda, the tech lead for Cloudflare Workers, talks about it here (this comment is also interesting in that it demonstrates some hesitancy from Cloudflare to prioritize Wasm).
- The early WebAssembly-native companies, like Fermyon, don’t yet have hosted solutions: Fermyon’s Spin can only be self-hosted right now (source). Suborbital’s product is focused on infrastructure for running user-defined code rather than as a generic Serverless solution. (Note: this is often cited as a ‘killer use-case’ for WebAssembly. As a concrete example: rather than using webhooks, and calling out a user’s HTTP endpoint whenever an event occurs, the user could just give you code that you run directly — made safe by WebAssembly’s sandboxing. But you can also do this with V8 Isolates.) Cosmonic is building a WebAssembly-first cloud, but it’s in a developer preview. Most other companies (like Wasmer) seem to be focused on runtimes at the moment. It may just be too early!
- There are a bunch of limitations to what you can do with WebAssembly right now. For example, if you’re using WASI (the standardized WebAssembly system interface), there’s no way to make HTTP requests (source); the sketch after this list makes that concrete. It will be possible once the standard has been defined and implemented, but today, you’re out of luck. (Spin does support HTTP requests.)
- Today, you can package a WebAssembly module into a container and deploy it wherever you like, but… why do that? Like with Firecracker, I think you need to run on bare metal to fulfill the promise of WebAssembly. Do we need a Fly.io for WebAssembly? (That might end up being Fastly, or Cloudflare, or Fly.io, or another existing platform.)
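To make the WASI limitation concrete, here’s a sketch targeting `wasm32-wasi` as it stood in 2022 (the `/data` directory is a hypothetical pre-opened path): the standard library compiles, but the socket APIs simply fail at runtime.

```rust
// Compile with: cargo build --target wasm32-wasi
use std::fs;
use std::net::TcpStream;

fn main() {
    // Filesystem access works, via directories the host pre-opens for
    // the sandbox ("/data" here is a hypothetical pre-opened dir).
    let config = fs::read_to_string("/data/config.txt");
    println!("config readable: {}", config.is_ok());

    // But there's no socket (and hence no HTTP) support in WASI yet:
    // this compiles fine, then fails at runtime with an unsupported error.
    match TcpStream::connect("example.com:80") {
        Ok(_) => println!("connected (not on WASI, as of 2022!)"),
        Err(e) => println!("no network under WASI: {e}"),
    }
}
```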
The service I want doesn’t seem to exist (yet). What I want is a hosted service that lets me…
- Compile my code into a WASI-compliant WebAssembly binary…
- Run a single command to create an HTTP endpoint that runs the binary (piping the POST body to stdin and returning stdout, like Cloudflare Workers; see the sketch after this list)…
- Reap all of the performance, latency, scalability, and security benefits that WebAssembly promises…
- Pay via a true serverless, usage-based pricing model…
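Under that stdin/stdout model, the guest program could be as simple as this hypothetical echo handler (compiled with `cargo build --target wasm32-wasi`):

```rust
use std::io::{self, Read, Write};

// Hypothetical guest: the platform pipes the POST body to stdin and
// returns whatever we write to stdout as the HTTP response body.
fn main() -> io::Result<()> {
    let mut body = String::new();
    io::stdin().read_to_string(&mut body)?;

    let response = format!("echo: {}", body.trim());
    io::stdout().write_all(response.as_bytes())?;
    Ok(())
}
```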
This would be something like Cloudflare Workers, but without the 1MB code size limit, and without the expensive cold starts. Something that’s ‘WebAssembly-native’. The best solution I’ve found is Fastly’s Compute@Edge, with the one tradeoff being that it requires writing code atop Fastly’s web framework (or, at least, implementing a specific module interface). Fermyon also seems well-positioned to offer this, and I suspect it’s their goal to provide a fully managed solution eventually.
[1] Serverless is a weird term, since on GCP, all of AppEngine, Cloud Run, and Cloud Functions could be considered serverless (source), despite serving very different use-cases and architectures. I like Erik’s claim that serverless demands usage-based pricing.
Published on September 26, 2022.