Announcing Perspective 3.0, the worst release ever!

It only gets better

We’ve just released Perspective v3.0.0, the latest major release of the open-source high performance data-visualization and analytics component underlying Prospective.co.

Perspective 3.0.0 is, by definition, the worst version of Perspective ever released, on account of the Major Version tick which indicates we’ve intentionally broken the API of this previously-working product. Until today, Perspective 2.10.1 had matured to confident stability on the back of 25 minor and patch version releases in the 2.x.x series. Now … well, Perspective 2.10.1 still exists technically, but we’ve also introduced a shiny new version that is guaranteed (numerically) to be the worst version yet!

There are a lot of breaking API changes in this release, some behavior has been changed or deprecated, and a few new features have been added as well. Read on to learn some more about why we went and broke this perfectly good open-source project, or skip to the end to get a summary of the actual API changes.

How did we get here?

Perspective used to look like ***. It still does, but it used to also.

In ancient times (2016), Perspective’s data engine bore little resemblance to its modern OSS version. Its API was complex, often requiring dozens of classes to configure even simple data queries, and much of its implementation was buggy, incomplete or just plain non-existent. Data ingestion had to be written by hand, cell-by-cell. The developer experience was overall cumbersome, error-prone and frustrating, especially for implementing UI workflows with dynamic (not hard-coded) queries.

Perspective OSS did away with most of that nonsense when porting this component to WebAssembly. In its new life as a data engine for a high-performance Browser-based JavaScript UI, Perspective needed a simpler API that could be queried from a single simple serializable query configuration, rather than a long sequence of oft-quirky method calls. Manual, cell-by-cell data ingestion was replaced with efficient support for formats data actually comes in, such as CSV, JSON and Apache Arrow.

The overall API was reduced to just 2 classes comprising 39 methods in its JavaScript incarnation. This simplicity allowed us to architect a UI which was mostly decoupled from the features of the engine, only interacting with it asynchronously and windowing data requests to avoid extracting the entire dataset at once in-memory.

Everything was fixed and no further changes were ever needed!

Further changes were needed

As the project improved, new features demanded more and more of our simple data engine architecture.

Perspective soon added WebWorker support, allowing queries against the data engine to run in parallel with the UI rendering. Accompanying this change came a rudimentary JSON-based RPC protocol, based on the newly-simplified 39 method data engine JavaScript API.

Later, Python, Jupyter and Node.js engine bindings were added via a WebSocket transport, leading to an explosion of Client and Server implementations for this ad-hoc JSON RPC protocol. The duplicate protocol implementations suffered from sometimes-subtle system idiosyncrasies as well as copious opportunity for simple programmer error. Lacking a spec, each implementation grew its own extensions and exceptions - DataFrame support in Python, Date object support in Node.js, etc. As behavior between implementations began to drift, bugs began to emerge and Perspective’s test suite struggled to cope.

Further, network transports like the WebSockets API have message size, update rate and connection durability limitations not present in the relatively straightforward WebWorker API. Suddenly, our simple RPC protocol needed features such as batching, chunking, throttling, error transmission and multiplexing. Features which needed to be duplicated, tested and documented for each combination of binding Language, Client and Server.

This design limited how much we could extend the platform, and how quickly. It locked us into the existing API patterns, and the tight coupling between engine and RPC protocol constrained what features we could deliver.

In the end, at least we were confident the complexity was justified by the engine’s unique WebAssembly support.

WebAssembly support is no longer unique

When Perspective was open-sourced circa 2017, there weren’t many options available for the plucky web developer seeking a high performance browser-side query engine. Today, that is no longer the case. Perspective’s own data engine continues to offer a unique mix of high-performance data ingestion & processing, OLAP-style multi-axis pivoting and streaming support - but other offerings are starting to grow their own unique features in this space. As the ecosystem matures, we’d like to extend our UI to the feature sets of new data engines available in the browser, allowing high performance visualization on top of shared data in engines like DuckDB, rather than serializing and copying as must be done today.

On the server (in Python for example), the environment is much more competitive. There are great options available for streaming, and basically any other data consideration is catered to with mature and performant solutions. While Perspective on the server is still quite fast, its simple design limits the feature set and overall scalability - and for ecosystems with a mature data solution already in place, Perspective on the server becomes an extra in-memory bottleneck for distributed or out-of-memory Tables. Supporting these platforms via a common high-performance virtual data API, without copying into Perspective’s in-memory model, would enable a Perspective UX with the raw query performance, table size and features of your server-side data engine of choice.

What do we want to change?

The Perspective project’s goal is to be a great data visualization tool. To the extent that a data engine is core to the overall user experience, it is core to Perspective. Metrics like CPU/memory performance, query features, data size limits and idempotent streaming/static ultimately limit the problems that can be solved with Perspective, making it less useful and introducing the need for specialized cohorts for “big”, “fast” or “complex” data sets.

In order to support pluggable data engines, we need a stable, minimal, unambiguously simple API - to minimize the developer effort required to implement this API on the Server side for new data engines such as DuckDB, SQLite and Polars. It needs to be easy to implement this API with good performance by default, and easy to implement the entire API correctly so that Perspective’s UX doesn’t degrade between engines. It needs to be tested, documented, and properly versioned, so integrations can be robust as the feature set expands.

We need a portable design which makes it easy to support Perspective’s myriad current-and-future language bindings. The cost of this complexity in 2.10.1 made certain feature choices (like SQL join operations) intimidating, because any extension of the RPC API would need multiple platform-specific implementations. The more code we can write once and compile to multiple platforms, the less code there is to test, document, or potentially break.

As I said in the title, any change of this magnitude will make this the worst version of Perspective ever released. It would therefore be helpful if this API (RPC and developer-facing) can be iterated on rapidly. We will add new features to Perspective’s UX and data engine, as well as begin to model the feature sets of other engines. All of these will likely require elaborations (or consolidations) of this new RPC API. The lack of a reference implementation of 2.10.1’s API meant that we relied on ancestral stability to maintain order, and changing a published API meant tedious and risky multi-language rebuilding and refactoring (and 3 new docs sites to update and publish!).

Overall, we want a solution that is rigorous, testable, typeable, assertable, and portable.

Design decisions

Rust

The friction with the original API led us to the decision to write our new Client in a compiled language that would be portable and easily bindable to all the different target languages we want to support. Rust has the best story for cross-language compatibility right now: it’s fast (it can make memory-efficient use of a binary message protocol, for example), and it embeds in everything (we use pyo3 and wasm-bindgen). It has excellent build tooling and (subjectively) the best WebAssembly compatibility story, along with the most mature WebAssembly ecosystem.

Writing the Client bindings in Rust resulted in a subsystem that behaves the same in every language. It has allowed us to re-use our benchmark and test suites interchangeably across all languages we support, and eliminated thousands of lines of duplicated code between languages. It promises to keep ongoing maintenance of language bindings low, and has already allowed us to add native bindings to Rust itself.

It’s also fun! But that wasn’t the point (… unless?)

Protobuf

Choice of wire format can be a contentious affair for a project, especially when performance is of prime concern as it often correlates inversely with developer sanity. For Perspective 3.0.0 however, that question was easy and the answer was Protobuf.

We wanted something with efficient serialization and broad platform support, but more importantly we wanted something rigorous and dependable. The shared Client/Server design allows us to potentially swap out this serialization format for all client/server/languages at once. The appeal of Protobuf came largely from its many high-quality implementations, long industry track record and general aura of warm indifference it inspires. No one ever got fired for suggesting Protobuf, as I’m sure someone important said once.

Using a binary protocol allows target languages like Python to parse and generate the message stream efficiently and off-GIL, so that communications overhead won’t impact the runtime performance of the Python server. In WebAssembly, it allows us to avoid extensive allocation on the JavaScript heap when interacting with messages (even though the runtime is currently single-threaded).

What’s changed?

Rust library

Perspective 3.0.0 adds a native Rust library (perspective on crates.io) alongside the Python and JavaScript versions. See the new rust-axum example, which embeds Perspective in an axum application.

C++ Server

The Perspective Server (also packaged as perspective-server on crates.io) has been ported almost entirely into C++ (there is no JS or Python wrapping layer), and the API is simply two methods: receive_request and send_response, which each take a protobuf payload. As a result, binding the server into other languages is dead simple, and binding the client to an arbitrary message transport (Socket, external message queue, smoke signal, etc.) similarly only requires the send and receive implementations in the host language.
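
To make the shape of a two-method, bytes-in/bytes-out API concrete, here is a minimal sketch in Python. Everything in it is hypothetical: ToyServer, the "create_table" and "size" operations, and the in-memory queues are illustrative stand-ins, and JSON-encoded bytes are used in place of protobuf only to keep the sketch standard-library-only. It is not the real perspective-server API; it only shows why a host needs nothing more than a way to hand bytes in and a callback to ship bytes out.

```python
import json
from collections import deque
from typing import Callable, Deque, Dict, List


class ToyServer:
    """Hypothetical stand-in for an engine with a bytes-in, bytes-out API."""

    def __init__(self, send_response: Callable[[bytes], None]) -> None:
        # The host supplies the function that ships bytes back to the client.
        self._send_response = send_response
        self._tables: Dict[str, List[dict]] = {}

    def receive_request(self, payload: bytes) -> None:
        # Assumption: JSON bytes stand in for a protobuf message here.
        request = json.loads(payload.decode("utf-8"))
        if request["op"] == "create_table":
            self._tables[request["name"]] = list(request["rows"])
            result = {"id": request["id"], "ok": True}
        elif request["op"] == "size":
            rows = self._tables[request["name"]]
            result = {"id": request["id"], "size": len(rows)}
        else:
            result = {"id": request["id"], "error": "unknown op"}
        self._send_response(json.dumps(result).encode("utf-8"))


# An in-memory "transport": two queues of opaque byte strings.
to_server: Deque[bytes] = deque()
to_client: Deque[bytes] = deque()

server = ToyServer(send_response=to_client.append)


def encode(message: dict) -> bytes:
    """Serialize a request for the toy transport."""
    return json.dumps(message).encode("utf-8")


to_server.append(encode({"id": 1, "op": "create_table", "name": "t",
                         "rows": [{"x": 1}, {"x": 2}]}))
to_server.append(encode({"id": 2, "op": "size", "name": "t"}))

# Deliver every queued request; the server never sees the transport itself.
while to_server:
    server.receive_request(to_server.popleft())

responses = [json.loads(message.decode("utf-8")) for message in to_client]
print(responses)
```

Swapping the two deques for a WebSocket or a message queue would not change ToyServer at all, which is the property the two-method design is after.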

Many features needed to be de-duplicated from Python/JavaScript into an idiomatic C++ implementation:

  • JSON parsing (in String form, which subsumes Python Dict and JavaScript Object forms via language-native stringification).
  • "date" and "datetime" parsing (uses Apache Arrow’s built-in parser, + many of the legacy formats from 2.10.1).
  • View config parsing and creation of the internal View object.
  • PerspectiveManager (Python), WebSocketServer (JavaScript), etc. state management for collections of Tables and Views is now built-in to the Client/Server API.
  • Callback bookkeeping, resource ownership/cleanup code.

As a result, new features like improved "datetime" type string parsing behave consistently across languages by default.

Docs

Since the language bindings in Perspective 2.10.1 were hand-written, so too was the documentation, a mix of Sphinx, JSDoc, TypeScript and Markdown, all self-hosted without proper versioning. As all language bindings are now written in Rust, and Rust has excellent documentation tooling built-in, Perspective now has consistent documentation across languages that is properly versioned.

Performance

Some APIs, such as JSON ingestion, are now much faster even in single-threaded mode.

Performance improvements were not the goal of 3.0.0 development. Nevertheless, we diligently track Perspective’s CPU performance across versions to protect against performance regressions, and the 3.0.0 release has recorded an overall CPU time improvement on every method we benchmark, JSON ingestion benchmarks in particular.

While Perspective lacks comprehensive concurrent benchmarks so far, ad-hoc testing of 3.0.0 seems to exhibit much better thread utilization than 2.10.1, especially for paths such as JSON ingestion where logic has been moved from Python to C++ (and the GIL is released). We expect to be able to iterate quickly with the new API design, and we’ve even merged some early multi-core optimization on the back of this work.

No, like, what’s actually changed, like in the API?

In all languages:

  • Python’s PerspectiveManager, the browser’s PerspectiveWorker, etc., have been replaced by Server, an explicitly instanced engine API. A Server hosts Tables and shares no state with other Servers in the same process, aside from a global executor pool on platforms which support threading.
  • A Client is needed to send commands to a Server. Methods like JavaScript’s perspective.websocket() now return an instance of Client, so for the most part the user experience here is unchanged from 2.10.1. However, Client can be implemented for an arbitrary transport (like a Socket) with only a few methods, analogous to “send protobuf” and “receive protobuf”. This will make the process of extending Perspective to new platforms (like Rust!) much easier.
  • JSON (JavaScript) and Dict/List (Python) ingestion has been streamlined. This was previously implemented internally through the legacy cell-by-cell batch update API, leading to bad performance and behavior drift. In 3.0.0, JSON is now parsed and generated entirely in C++ via RapidJSON. While the browser generally has excellent JSON parsing performance, the resulting JavaScript objects need further processing to be arranged as a Perspective Table. With this change, loading data in JSON format is substantially faster, more so if you can pass the JSON data as a String data type rather than a JavaScript Object or Python Dict (internally we’ll now stringify the latter types and use RapidJSON to parse them, and this is still much faster than 2.10.1!).
  • Previously, Perspective supported (Date, Datetime) (JS), and (date, datetime, pandas.Timestamp) (Python) in JSON/Dict format. However, these types are not JSON serializable, which was a source of implementation inconsistency in 2.10.1. In 3.0.0, these types are no longer directly supported. See the language-specific notes on this data type below.
  • In JSON input modes, Perspective used to perform platform-specific coercion of string types to "date" and "datetime" types. This is now standardized in 3.0.0, so some parsing behavior may be slightly different.
  • Partial updates used to be supported on a row-by-row basis using JSON null vs undefined values to differentiate “reset” and “ignore” update behavior, respectively. However, this was difficult to support consistently in formats like CSV and Apache Arrow, which lack a distinct “ignore” value like JavaScript’s undefined. In 3.0.0, partial updates are still supported, but the entire column must now be omitted per update batch; if you need to apply a partial update with mixed missing columns per row, you’ll need to split the batch manually before calling Table.update().
  • Perspective’s ExprTK integration has some extensions, e.g. for handling string columns & literals, which were implemented per-platform in the client (for some reason). This behavior has been rewritten in C++, so all Perspective servers behave the same now - but legacy applications with complex expressions may find some “valid” expressions in the Table.view and Table.validate_expr commands no longer work. See Python-specific notes below.
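
The manual batch splitting mentioned for partial updates can be sketched as follows. This is a minimal illustration, assuming the update is a list of row dicts; split_by_columns is a hypothetical helper, not part of Perspective. It groups consecutive rows that share the same set of columns, so each resulting batch omits whole columns rather than individual cells and the original row order is preserved.

```python
from itertools import groupby
from typing import Any, Dict, List

Row = Dict[str, Any]


def split_by_columns(rows: List[Row]) -> List[List[Row]]:
    """Split rows into consecutive runs that share one set of column names."""
    return [
        list(group)
        for _, group in groupby(rows, key=lambda row: frozenset(row))
    ]


rows = [
    {"id": 1, "price": 10.0},
    {"id": 2, "price": 11.5},
    {"id": 3, "size": 5},
    {"id": 1, "price": 9.0, "size": 7},
    {"id": 2, "size": 8, "price": 9.5},
]

batches = split_by_columns(rows)
for batch in batches:
    # Each batch could then be passed to Table.update() on its own.
    print(sorted(batch[0]), len(batch))
```

Grouping only adjacent rows can produce more batches than grouping globally, but it never reorders updates to the same index, which matters when later rows overwrite earlier ones.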

JavaScript:

  • All JavaScript Perspective packages are now ES Modules (type: "module" in package.json). This requires <script type="module"> tags when importing the CDN versions, or a bundler that properly understands ES Modules. While we still only officially provide a bundler plugin for esbuild, in 3.0.0 it should be much easier to write bindings for new bundlers, as there is very little JavaScript left and the bootstrap process without a bundler is much simpler.
  • perspective.table() constructor no longer supports schema inference for JSON columns with Date and Datetime values. These non-JSON compatible types can still be coerced into Perspective (or rather - the browser will auto coerce these to numeric types, which Perspective can coerce further), but a schema must be provided to the constructor to inform Perspective to do so, because JSON.stringify will coerce these to number which causes Perspective to infer them as integer. This change simplifies the API quite a bit as well as making it consistent in behavior between Python, JavaScript and Rust.
  • perspective.worker() and perspective.websocket() are now asynchronous and must be awaited.
  • View.on_update(), View.on_remove(), View.on_delete() now return callback ID values that must be provided to their reciprocal View.remove_update() (etc., respectively).
  • perspective.memory_usage() is renamed perspective.system_info().
  • <perspective-viewer> must now be imported with <script type="module">. In order to call methods on a <perspective-viewer> custom element from a script, you must either import "@finos/perspective-viewer"; or await customElements.whenDefined("perspective-viewer"); to await the WebAssembly module compilation.

Python:

  • Python wheels now target abi3-py39, adding support for Python 3.12 and beyond, but deprecating 3.8 and below. perspective-python can still be built from source for these platforms.
  • The perspective module no longer exports or instantiates a default Server; instead, you must create a Server and synchronous Client before you can create a Table. We made this change to keep the API consistent across platforms, and to minimize confusion in contexts where you may accidentally create alternative Server instances, such as passing data to the PerspectiveWidget Jupyter widget constructor. TODO A default Server may be added to the root in 3.1.0.
  • date, datetime and pandas.Timestamp (in Dicts at least) are not supported by the Table constructor anymore (as described above). Internally, Perspective uses json.dumps() to stringify input, which can be somewhat configured globally if you choose. In 3.0.0, it is best to leave JSON in string format if you can! Since JSON parsing now occurs in C++ rather than Python, it is now off-GIL and threadsafe, in addition to improved single-threaded performance.
  • pandas.DataFrame is no longer directly supported by perspective.table(), but DataFrames may still be loaded internally if pyarrow is available in the environment. This load path is both dramatically less code and faster than 2.x, but pyarrow is much more stringent about type coercion/inference than Perspective 2.x is. TODO We plan to remove this from the engine entirely.
  • Table.validate_expr had a different return format (a List of Lists, but really tuples) from the JavaScript version. This behavior now uses the JavaScript version’s List of Dict output. See docs.
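
For the date and datetime note above, one way to keep input in JSON string form is to stringify the rows yourself before handing them to Perspective. The sketch below uses only the standard library; encode_temporal is a hypothetical helper, and rendering values as ISO 8601 strings is an assumption about what is convenient, not a statement of which formats the engine infers as "date" or "datetime". A schema may still be needed to get the column types you want.

```python
import json
from datetime import date, datetime, timezone
from typing import Any


def encode_temporal(value: Any) -> str:
    """json.dumps `default` hook: render date/datetime values as ISO 8601."""
    # datetime is a subclass of date, so one check covers both types.
    if isinstance(value, date):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__} to JSON")


rows = [
    {"id": 1,
     "when": datetime(2024, 8, 1, 12, 30, tzinfo=timezone.utc),
     "day": date(2024, 8, 1)},
    {"id": 2,
     "when": datetime(2024, 8, 2, 9, 0, tzinfo=timezone.utc),
     "day": date(2024, 8, 2)},
]

# The resulting string contains only JSON-native types.
payload = json.dumps(rows, default=encode_temporal)
print(payload)
```

Because the hook is passed per call, this avoids changing json.dumps() behavior globally for the rest of the application.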

In Jupyter:

  • PerspectiveWidget constructor kwargs server and client have been replaced with binding_mode, which can be either "server" (the default) or "client-server". Previously, client=False, server=False was a nonsensical case, and just client=True relied on custom Python data marshalling to JSON (causing non-idiomatic behavior & performance).
  • When passing data (in Perspective-compatible format) to PerspectiveWidget, instead of an instance of a Table, a new Server and Client are implicitly created internally (because the global Server instance has been removed). In order to access the Table, you must call PerspectiveWidget.table or the newly added Perspective.client properties.

The long view

Befitting a release canonically the worst since launch, the Perspective 3.x.x series is only going to get better! 3.0.0 is not just a major internal change, it is a stable platform upon which we intend to rapidly innovate. We’ll post more about our future plans soon!

We think Perspective 3.0.0’s design unlocks the next stage of the project’s evolution. It has reduced the accidental complexity related to language bindings and the client-server API, and allowed us to increase the breadth and depth of our test coverage with a single suite to cover all our implementations. We’ll be able to add new features more quickly, and we’ll have fewer bugs in fewer subsystems. In Perspective 3.0, we’ve been able to take advantage of better tech and simpler design to make 3.0 easier to integrate into your enterprise systems.

Special Thanks!

A special thanks to the main architects/contributors for this release, @timbess, @sinistersnare and @tomjakubowski.

Full Changelog: https://github.com/finos/perspective/compare/v2.10.1...v3.0.0