Businesses, institutions, and organizations across various fields and industries derive data from many different sources, using it in multiple ways and for numerous purposes. As the complexity of data grows exponentially (especially in larger organizations), a modern data platform becomes indispensable for consolidating data management and extracting its actual value.
A data platform is a collection of technologies that collects, processes, analyzes, and presents data from different systems and processes. However, what exactly constitutes a “data platform” varies depending on its purpose and on whom you ask. Some people apply the term to any solution that goes even slightly beyond business intelligence. Others set stricter requirements and will not accept anything less than end-to-end data management.
In any case, every data platform consists of several layers, at least three or four. The first is ingestion: collecting and importing data from various sources. That data must then be stored somewhere, for example in object storage or a data warehouse. After that comes a serving layer, possibly with an analysis stack on top.
But like all technology, the data platform is evolving quickly. Ten years ago, a data platform counted as “modern” if it was built on Hadoop. Today, the requirements are different. Here are the five characteristics a modern data platform needs.
Cloud-Based and Managed
According to Diederik Greveling, CTO of GoDataDriven, cloud computing has played a major role in shaping today’s modern data platform. “The Cloud has introduced new technology and brought a new way of working with data,” Greveling explained during Club Cloud’s GoDataFest 2021. “We see tools, especially managed tools like Fivetran, Stitch, or Airbyte, being used more often. In combination with the Cloud, these tools make the data platform simpler to maintain and easier to scale than on-premises platforms were,” he said.
Cataloging and Data Lineage as Standard
Increasingly, organizations need to be able to follow their data, and a data catalog and data lineage make that possible. What used to be an afterthought, bolted onto the platform later, is now a standard component. “Having this ability from the get-go makes data much more accessible,” Greveling explained. “It’s critical because it’s not only data scientists and data engineers who need to work with the data but also the analysts. Marketing teams also want access to all the financial data.”
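To make “following the data” concrete, lineage can be thought of as a graph in which each dataset points to the datasets it was derived from. The sketch below is a minimal illustration under that assumption: the dataset names are hypothetical, and a real catalog would collect this metadata automatically rather than from a hand-written dictionary. Walking the graph upstream answers the question an analyst typically asks: where did the numbers on this dashboard come from?

```python
from collections import deque

# Hypothetical lineage graph: each dataset maps to the datasets it is built from.
LINEAGE: dict[str, list[str]] = {
    "finance_dashboard": ["revenue_by_month"],
    "revenue_by_month": ["orders_clean", "fx_rates"],
    "orders_clean": ["raw_orders"],
    "raw_orders": [],
    "fx_rates": [],
}


def upstream(dataset: str, lineage: dict[str, list[str]]) -> set[str]:
    """Return every dataset that `dataset` depends on, directly or transitively."""
    seen: set[str] = set()
    queue = deque(lineage.get(dataset, []))
    while queue:  # breadth-first walk toward the sources
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(lineage.get(parent, []))
    return seen


def sources(dataset: str, lineage: dict[str, list[str]]) -> set[str]:
    """Return only the raw sources (datasets with no parents) feeding `dataset`."""
    return {d for d in upstream(dataset, lineage) if not lineage.get(d)}


if __name__ == "__main__":
    print(sorted(upstream("finance_dashboard", LINEAGE)))
    print(sorted(sources("finance_dashboard", LINEAGE)))
```

Run as a script, this lists the four intermediate and raw datasets behind the dashboard, then narrows them to the two raw sources, `fx_rates` and `raw_orders`.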
From ETL to ELT
The traditional procedure for pulling data out of different systems and combining it in a data warehouse was ETL: extract-transform-load. You would first pull the data from the source system and transform it to fit the data warehouse before loading it. Today, the industry is moving toward ELT: extract-load-transform. In a modern data platform, raw data is first stored in the data lake and only then transformed. That takes more computing power, but the Cloud offers the capacity for it.
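The difference between the two orderings can be sketched in a few lines. The example below is an illustration only: it uses Python’s built-in `sqlite3` as a stand-in for a cloud warehouse, and the sales rows and table names are hypothetical. ETL reshapes the data in the pipeline and loads only the result; ELT loads the raw rows untouched and performs the same transformation afterward with SQL inside the storage engine.

```python
import sqlite3

# Hypothetical rows as extracted from a source system: (day, currency, amount).
SOURCE_ROWS = [
    ("2021-09-01", "eur", "19.99"),
    ("2021-09-01", "usd", "5.00"),
    ("2021-09-02", "eur", "7.50"),
]


def etl(conn: sqlite3.Connection, rows: list[tuple[str, str, str]]) -> None:
    """ETL: transform in the pipeline first, then load only the shaped result."""
    totals: dict[tuple[str, str], float] = {}
    for day, currency, amount in rows:
        key = (day, currency.upper())
        totals[key] = totals.get(key, 0.0) + float(amount)
    conn.execute("CREATE TABLE daily_sales_etl (day TEXT, currency TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO daily_sales_etl VALUES (?, ?, ?)",
        [(day, cur, total) for (day, cur), total in totals.items()],
    )


def elt(conn: sqlite3.Connection, rows: list[tuple[str, str, str]]) -> None:
    """ELT: load raw rows as-is, then transform with SQL inside the warehouse."""
    conn.execute("CREATE TABLE raw_sales (day TEXT, currency TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", rows)
    conn.execute(
        """
        CREATE TABLE daily_sales_elt AS
        SELECT day, UPPER(currency) AS currency, SUM(CAST(amount AS REAL)) AS total
        FROM raw_sales
        GROUP BY day, UPPER(currency)
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn, SOURCE_ROWS)
    elt(conn, SOURCE_ROWS)
    order = " ORDER BY day, currency"
    print(conn.execute("SELECT * FROM daily_sales_etl" + order).fetchall())
    print(conn.execute("SELECT * FROM daily_sales_elt" + order).fetchall())
```

Both paths produce the same daily totals. The practical difference is that the ELT path keeps `raw_sales` in storage, so a new transformation can later be run over the original data without extracting it again; the extra query work is what the Cloud’s capacity absorbs.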
“Because the step from extract to load is standardized, you no longer have to do all those separate transformations,” explained Greveling. “It provides a standardized method for going from MySQL to BigQuery, so I no longer have to deploy one or two data engineers to manage all those separate transformations. It’s a simple, linear process,” he said.
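One way to picture a “standardized method” for the extract and load steps is a single copy routine that works for any source and destination implementing the same two-method interface. The sketch below assumes such an interface; `Source`, `Destination`, `extract_load`, and the in-memory classes are hypothetical names for illustration and are not the API of Fivetran, Stitch, Airbyte, or any other tool. In practice, a MySQL connector and a BigQuery connector would each implement one side.

```python
from typing import Iterable, Iterator, Protocol

Record = dict[str, object]


class Source(Protocol):
    """Anything that can stream the records of a named table (hypothetical)."""

    def read(self, table: str) -> Iterator[Record]: ...


class Destination(Protocol):
    """Anything that can store records under a table name (hypothetical)."""

    def write(self, table: str, records: Iterable[Record]) -> int: ...


def extract_load(source: Source, dest: Destination, tables: list[str]) -> dict[str, int]:
    """Copy each table as-is; no per-pipeline transformation code is involved."""
    return {table: dest.write(table, source.read(table)) for table in tables}


class InMemorySource:
    """Toy source standing in for, e.g., a database connector."""

    def __init__(self, data: dict[str, list[Record]]) -> None:
        self.data = data

    def read(self, table: str) -> Iterator[Record]:
        yield from self.data[table]


class InMemoryDestination:
    """Toy destination standing in for, e.g., a warehouse connector."""

    def __init__(self) -> None:
        self.tables: dict[str, list[Record]] = {}

    def write(self, table: str, records: Iterable[Record]) -> int:
        rows = list(records)
        self.tables.setdefault(table, []).extend(rows)
        return len(rows)  # number of records loaded


if __name__ == "__main__":
    src = InMemorySource({
        "customers": [{"id": 1}, {"id": 2}],
        "orders": [{"id": 10, "customer_id": 1}],
    })
    dst = InMemoryDestination()
    print(extract_load(src, dst, ["customers", "orders"]))
```

Because every pipeline runs through the same `extract_load` call, adding a new source means writing one connector rather than a bespoke pipeline, which is the effort saving Greveling describes.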
Standardization
Every company is different, and the value of data lies mainly in the use case. That made setting up traditional on-premises data platforms an enormous challenge. Everything had to be custom-made, every model had to be put into production, and everything had to be adapted to the requirements. “To a certain degree, that is still the case,” said Greveling, “but the big difference with the modern data platform is that everything is standardized, which significantly shortens the implementation time, especially in the beginning.”
GoDataDriven has already delivered an operational data platform to one customer within a few weeks, something that used to take months. “When you look at the requirements for many customers, you notice that they align on many points. You can lay the foundation in a couple of weeks and then focus on the use case. That’s really important,” he said.
Makes Data Mesh Possible
The fifth characteristic of the modern data platform is that it enables organizations to implement a data mesh architecture. This relatively new concept decentralizes the data team so that it does not become an enormous bottleneck. In a certain sense, data mesh is comparable to microservices, but for data platforms: the data streams are divided across different teams, each of which uses them where they are relevant. “You can then scale the whole data process within the organization,” explained Greveling. Because the different components can easily exchange datasets through APIs, each team can, for example, keep using its own BigQuery environment.
“Managing the data is decentralized, but sharing the data is standardized,” Greveling summarized. “That is really quite powerful.”
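The principle of decentralized management with standardized sharing can be sketched as a shared registry of “data products.” Each team publishes a product that declares an owner and a schema, while the team’s own storage stays hidden behind a `read` callable (in the article’s example, that could be the team’s own BigQuery environment). Consumers go through the registry, which enforces the shared contract. All names here (`DataProduct`, `MeshRegistry`, the example products) are hypothetical, and real data mesh implementations vary widely; this only illustrates the division of responsibilities.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical standardized contract that every domain team publishes."""

    name: str
    owner: str                      # domain team responsible for the data
    schema: dict[str, type]         # column name -> expected Python type
    read: Callable[[], list[Row]]   # team-owned storage stays behind this call


@dataclass
class MeshRegistry:
    """Shared catalog: management is per team, sharing goes through one interface."""

    products: dict[str, DataProduct] = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        self.products[product.name] = product

    def consume(self, name: str) -> list[Row]:
        product = self.products[name]
        rows = product.read()
        for row in rows:
            # Enforce the shared contract: declared columns with declared types.
            if set(row) != set(product.schema):
                raise ValueError(f"{name}: columns {sorted(row)} do not match schema")
            for col, typ in product.schema.items():
                if not isinstance(row[col], typ):
                    raise TypeError(f"{name}.{col}: expected {typ.__name__}")
        return rows


if __name__ == "__main__":
    registry = MeshRegistry()
    registry.publish(DataProduct(
        name="campaign_spend",
        owner="marketing",
        schema={"campaign": str, "spend": float},
        read=lambda: [{"campaign": "autumn", "spend": 1200.0}],
    ))
    print(registry.consume("campaign_spend"))
```

Each team remains free to change how its data is stored and processed, as long as what it publishes still satisfies the declared schema.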
Thanks to the enormous computing power of the Cloud, the modern data platform improves significantly on the traditional on-premises model at several critical points. It expands the possibilities for extracting value from your data while improving your organization’s processes. A modern data platform also helps organizations shorten their time-to-market and implement faster, all while minimizing maintenance and management costs.