Browsertech Interview: WebAssembly for Data Analysis

Paul was joined by Andrew and Eric of Prospective to talk WebAssembly, Pyodide, and streaming real-time data visualization in the browser.

What is Prospective?

Paul: Tell me about Prospective.

This will sound reductionist, but it's a tool for making dashboards that runs in the browser?

Andrew: It's a tool for making dashboards, but it is a single tool for making data products, is the way that I would phrase it. We want to bring that down the stack and incorporate more of the aspects of how you got your data into your product in the first place. Basically, making it a place where you can go to a platform, bring in your data, do all of the parts that are involved with raw data in the wild, make it presentable and tell a story with it.

And that's not just picking the colors, or setting the axis widths, or how the ticks work, right? It is aligning it, it's simplifying it, it's grouping it, it's summarizing it, it's identifying outliers and annotating it with the information that you want to convey. It's filtering it down to the pieces that you think are interesting, it's segmenting it along the dimensions around the story that you want to tell.

And it brings a data science environment based on Pyodide. Pyodide is the CPython interpreter, and a collection of data science libraries and their C extensions, compiled to WebAssembly with Emscripten.

When [data analysts] get to the edges of what our tool provides, we want it to be naturally extendable with more sophisticated tools, specifically programming tools.

Paul: So now you've got Pyodide running in the browser, but originally, pre-Prospective the company, when you were just working with the open source component, the Python side was just server side?

Andrew: Yeah. So, I mean, our technology vocabularies fail us at this point because of all of the weird stuff you can do.

We are running a Python server in a web worker as if it was a web server. It allows you to say, project this Kalman filter on this, or calculate a linear regression, or do something that Perspective doesn't do.
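The "Python server in a web worker" idea can be sketched as a small request/response handler. This is only an illustration, not Prospective's code: the message shapes and `handleRequest` are hypothetical, and the only Pyodide call assumed is `runPythonAsync`, available on the object a worker would get from `loadPyodide()`.

```typescript
// Hypothetical message shapes; not taken from Prospective or Perspective.
interface PyRequest {
  id: number;
  code: string; // Python source to evaluate inside the worker
}

type PyResponse =
  | { id: number; ok: true; result: unknown }
  | { id: number; ok: false; error: string };

// The one Pyodide method this sketch relies on.
interface PythonRuntime {
  runPythonAsync(code: string): Promise<unknown>;
}

// Treat each incoming message like an HTTP request: run it, reply with the
// result or an error, and echo the id so the caller can match the reply.
async function handleRequest(
  runtime: PythonRuntime,
  req: PyRequest,
): Promise<PyResponse> {
  try {
    const result = await runtime.runPythonAsync(req.code);
    return { id: req.id, ok: true, result };
  } catch (err) {
    return { id: req.id, ok: false, error: String(err) };
  }
}
```

Inside a worker, this would presumably be wired to `onmessage`, with each reply sent back through `postMessage`, so the page talks to the worker much as it would talk to a remote server.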

Wasm pain points

Paul: What pain points are you encountering right now with WebAssembly?

Andrew: The standard is moving a lot slower than I expected it to. I don't get the impression that the founders of the WebAssembly initiative, the Googles etc., are as interested in the technology now as they were when they introduced it.

It’s growing a life of its own among hackers and experimenters. But Google was looking to sell ads, not the hundred [megabyte] platform things that people are starting to build on it now.

For what we're doing as a company, and as an open source project, we're well aware of what the limitations of WebAssembly are. So we are building a product that takes advantage of what it can do right now. We're not limited by any of these things. But the potential of WebAssembly, I think, could be a lot more.

I know Spectre and the timing attack classes that were discovered around the time that WebAssembly was introduced really threw a wrench in some of the more aggressive features associated with WebAssembly.

Paul: Even SharedArrayBuffer...

Andrew: It's currently behind a special content flag [cross-origin isolation headers] that has to be provided by the server in order for it to run in the browser. Which is no problem for us, because we're an enterprise product. We can support multithreading until the cows come home. But it is a huge problem for an open source project.

You have to build a bespoke server that supports these headers and hosts your code in a very specific way. Otherwise, you won't be able to use these features. And it's hard because it's different bytecode. So you have to build multiple versions of the library, and have runtime detection to figure out which one to download and all sorts of complex stuff.
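A rough sketch of the two pieces described here: the response headers that make a page cross-origin isolated, and runtime detection of which build to fetch. The header names and values are the commonly documented ones that browsers check before exposing `SharedArrayBuffer`; the build file names and the `pickBuild` helper are hypothetical.

```typescript
// Headers a server sends so the page becomes cross-origin isolated, which
// browsers require before exposing SharedArrayBuffer.
const ISOLATION_HEADERS: Record<string, string> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Merge the isolation headers into whatever the server already sends.
function withIsolationHeaders(
  headers: Record<string, string>,
): Record<string, string> {
  return { ...headers, ...ISOLATION_HEADERS };
}

// The parts of the global scope the check needs, passed in so the logic
// can be exercised outside a browser.
interface RuntimeEnv {
  crossOriginIsolated?: boolean;
  SharedArrayBuffer?: unknown;
}

// Hypothetical file names for the two separately compiled builds.
function pickBuild(env: RuntimeEnv): string {
  const threadsOk =
    env.crossOriginIsolated === true &&
    typeof env.SharedArrayBuffer === "function";
  return threadsOk ? "lib.threads.wasm" : "lib.single.wasm";
}
```

In a browser the global scope itself would be passed to `pickBuild`, so a page served without the headers quietly falls back to the single-threaded build.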

Anyway, I'll tell you what's not been a blocker that I think the community complains about a lot: the interface with the DOM. This was quite slow originally in WebAssembly, but it's actually gotten a lot faster recently by using TextEncoder and DataView, and stuff like that, to transfer string values back and forth. It’s a lot better now.
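To illustrate the kind of string transfer mentioned here, the following sketch copies a string into a byte buffer and reads it back using `TextEncoder`, `TextDecoder`, and `DataView`. A plain `ArrayBuffer` stands in for WebAssembly linear memory, and the length-prefixed layout and function names are assumptions made for the example, not how any particular toolchain lays out strings.

```typescript
// Write `text` at `offset` as a little-endian u32 byte length followed by
// the UTF-8 bytes. Returns the total number of bytes written.
function writeString(memory: ArrayBuffer, offset: number, text: string): number {
  const bytes = new TextEncoder().encode(text);
  new DataView(memory).setUint32(offset, bytes.length, true);
  new Uint8Array(memory, offset + 4, bytes.length).set(bytes);
  return 4 + bytes.length;
}

// Read back a string stored in the layout used by writeString.
function readString(memory: ArrayBuffer, offset: number): string {
  const length = new DataView(memory).getUint32(offset, true);
  return new TextDecoder().decode(new Uint8Array(memory, offset + 4, length));
}
```

The prefix stores the UTF-8 byte length rather than the JavaScript string length, so non-ASCII text round-trips correctly.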

I'm not aware that the WebAssembly committee or anybody actually involved in the WebAssembly spec ever had any interest in building out, like, a full DOM model, or an IDL model or something like that for WebAssembly. So it seemed like a completely impractical goal. And you know, we're currently using Yew, which touches the DOM a lot.

Paul: And that's all going over the JavaScript/WebAssembly boundary with wasm-bindgen?

Andrew: Yes. It generates a lot of JavaScript glue code, for sure. But I've built a lot of React applications, I've built a lot of JavaScript applications, and our new application is very fast. We get two millisecond render times, basically no matter what I do to it, on an application which is quite complicated and does a lot of custom rendering, a lot of icons, lots of data reactivity on the page at the same time. It's quite good.

For what we want to use it for, we're trying to get away from the DOM because we want to do very bespoke rendering with it.

So, you know, having a faster way to move divs around on the page or something, that both seems an impractical goal and is something that I'm certainly not asking for.

The rest of the conversation is available in podcast form.