Prospective + S3: De-Swamping the Data Lake

Data discovery and exploration in S3

S3 has become synonymous with affordable, easy, reliable data storage. It is a foundational layer for most organizations' “big data” strategy. Sadly, it is also often where useful data goes to die. More often than not, the effort (and cost) of exploring data that lives in S3 outweighs the potential benefit of using an additional data source.

What if, instead of involving countless data teams and setting up (and maintaining) complex ETL pipelines just to decide whether a data source is worth using, you could simply visualize any table with just its file path? What if it took only seconds to get from “I wonder if this table would be useful to include …” to “this has exactly what I’m looking for” or “this is complete garbage”?

At Prospective, our mission is to make all data accessible to anyone from experimentation to production. We have built our architecture to finally eliminate the friction between data discovery and data productization. 

After working with customers across capital markets, industrials, gaming, and beyond, we realized that a critical piece of achieving this was a direct connection to S3. While real-time data feeds often take a similar shape – transactions, events, observable things – the data in S3 can take many forms. For some clients, the goal was making sense of historical pricing data. For others, it was making third-party data feeds usable alongside their own transactional data.

In a more literal sense, data can also take many file formats given the flexibility of S3. For data engineers, dealing with compressed, non-human-readable file formats like Parquet is easy, but what about data analysts or business leaders?

TL;DR – storing data in S3 is like storing junk in your junk drawer. You know there is something useful in there, but you are scared to open it. We built Prospective’s data adapter to:

  1. Give teams instant visibility into their data lakes. With our adapter, all you need to connect an S3 bucket to Prospective is a file path; within seconds, files are displayed on screen, ready to be interrogated. This simple, direct connection to S3 storage transforms the data swamp (junk drawer) into a usable data lake.
  1. Eliminate complex ETL processes. Our customers no longer need to pre-process, transform, join, combine, or move files in order to view or make sense of them. For one of our capital markets customers, this completely changed the way they did price auditing. Instead of having to build manual processes to find historical pricing data on an ad hoc basis, they could instantly add historical pricing data to their existing Prospective dashboard and visualize it alongside their real-time feeds.
  1. Simplify data infrastructure. Leveraging large amounts of contextual data from S3 no longer requires setting up additional infrastructure. Beyond eliminating ETL processes, having a direct connection to S3 also removes the need to send data to intermediate servers prior to consumption. For one of our industrials customers, this meant being able to incorporate geospatial data directly from a third-party feed.
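
To make the "all you need is a file path" idea concrete, here is a minimal sketch of a quick look at a table before committing to it. It is not Prospective's adapter: the function names `split_s3_uri` and `profile_csv` are hypothetical, the bucket and key are made up, and the file contents are passed in as text so the example runs without AWS credentials. In practice you would fetch the bytes from S3 first, and a Parquet file would need a Parquet reader rather than the `csv` module used here.

```python
import csv
import io
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/path/to/file.csv' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def profile_csv(text, sample_size=3):
    """Summarize each column: filled cells, distinct values, a few samples."""
    reader = csv.DictReader(io.StringIO(text))
    profile = {}
    for row in reader:
        for column, value in row.items():
            if column is None:
                # Extra cells beyond the header row; ignore them in this sketch.
                continue
            stats = profile.setdefault(column, {"filled": 0, "distinct": set()})
            # Missing or empty cells do not count as filled.
            if value not in ("", None):
                stats["filled"] += 1
                stats["distinct"].add(value)
    return {
        column: {
            "filled": stats["filled"],
            "distinct": len(stats["distinct"]),
            "sample": sorted(stats["distinct"])[:sample_size],
        }
        for column, stats in profile.items()
    }


if __name__ == "__main__":
    # Hypothetical path and contents, for illustration only.
    bucket, key = split_s3_uri("s3://example-bucket/prices/eod.csv")
    contents = "symbol,price\nABC,10.5\nXYZ,\nABC,11.0\n"
    print(bucket, key)
    print(profile_csv(contents))
```

A summary like this is usually enough to tell "exactly what I’m looking for" from "complete garbage": a column that is mostly empty, or that has a single distinct value, shows up immediately.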

Our S3 adapter both (1) reduces our customers' data engineering and ETL footprint and (2) directly increases their ability to execute on their core business priorities. We’ve been able to do this because of the early bet we made on the browser as the foundation for software delivery.

If you are looking for a way to make sense of S3, we’d love to chat with you about how we could simplify and enhance your existing user experience. You can reach us at https://prospective.co/meet-eric

Thanks to Caitlin Lohrenz for help on this post!