Data analytics solutions are continuing to emerge at a fast and furious rate. Data teams are at the center of the storm because they have to balance all the demands for access, data integrity, security, and proper governance, which entails compliance with policies and regulations. The businesses they serve need information as quickly as possible and have little patience for that precarious balancing act. The data teams have to move fast and smart.

They also have to be fortune tellers, because they need to build not just the systems for today but also the platforms for tomorrow. The first key question a data team must consider is whether to adopt an open or a closed data architecture.

Open vs. closed data architecture

Traditional databases are, by definition, what we would call “closed data architectures.” That’s not a value statement; it’s a descriptive one. It means that the data itself is closed off from other applications and must be accessed through the database engine. This is true even when moving data around with ETL jobs: at some point, to perform the export or the import, you have to go through the database, whether or not that is the optimal way to achieve what you want. The data is “closed” off from the rest of the architecture in this important sense.
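To make that concrete, here is a minimal sketch of the closed access path in Python, using SQLite from the standard library purely as a stand-in for any closed-architecture database; the database file and the orders table are hypothetical. Even a bulk export has to be phrased as a query, because nothing else can usefully read the engine’s storage format.

    import csv
    import sqlite3

    # SQLite stands in for any closed-architecture database; the file and
    # table names are hypothetical examples.
    conn = sqlite3.connect("warehouse.db")
    cur = conn.execute("SELECT id, amount FROM orders")

    # The "export" is just another query: every byte passes through the engine.
    with open("orders_export.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])
        writer.writerows(cur)

    conn.close()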

In contrast, an “open data architecture” is one that stores the data in its own independent tier within the architecture, which allows different best-of-breed engines to be used for an organization’s variety of analytic needs. That’s important because there’s never been a silver bullet when it comes to analytic processing needs, and there likely never will be. An open architecture puts you in an ideal position to be able to use whatever best-of-breed services exist today or in the future.
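For contrast, here is a minimal sketch of the open access path, assuming the data sits in an open columnar format (Parquet) in its own storage tier; the file name, the schema, and the particular engines shown (PyArrow and DuckDB) are illustrative assumptions, not the only options. Two different engines read the same files directly, with no export or import step between them.

    import duckdb
    import pyarrow as pa
    import pyarrow.compute as pc
    import pyarrow.parquet as pq

    # Write a small, hypothetical dataset to the shared storage tier (here the
    # local disk, but it could just as well be an object store).
    table = pa.table({"id": [1, 2, 3], "amount": [9.99, 24.50, 3.10]})
    pq.write_table(table, "orders.parquet")

    # Engine 1: a columnar library reads the file directly.
    orders = pq.read_table("orders.parquet")
    print(pc.sum(orders["amount"]))

    # Engine 2: a SQL engine queries the very same file, with no import step
    # and no single database engine in the way.
    print(duckdb.sql("SELECT SUM(amount) FROM 'orders.parquet'").fetchall())

Because the open files, rather than any one engine, act as the contract, a new best-of-breed engine can be added later without migrating the data.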

Open, services-oriented data architecture

When applications moved from client-server to web, the fundamental architecture changed. We went from monolithic applications that ran in one process to services-oriented applications that were broken into smaller, more specialized software services. Eventually these became known as “microservices,” and they remain the dominant design for web and mobile applications. The microservices approach offered many advantages that the nature of cloud infrastructure made possible to realize: in a scale-out system with on-demand resource models and numerous teams working on pieces of functionality, the “application” became nothing more than a facade for dozens or hundreds of microservices.

This approach is widely agreed to have many advantages for building modular and scalable applications. For some reason, we’re expected to believe that this paradigm isn’t nearly as effective for data. At Dremio, we believe that’s inaccurate: looking at our data in the same open, services-oriented manner as our applications is both intuitive and desirable.
