Ever since the term “data lake” first appeared in 2010, organizations have rushed to build them. They were driven by a vision of unified data access for the entire organization, transforming their legacy firms into modern, data-driven companies. Many companies spent millions of dollars deploying and operating these complex systems with many integration points. At that time, data lakes promised to bring together siloed data, so organizations could see an integrated view of their businesses, run scenarios and implement new data-driven processes.
The reality of data lakes, however, has been quite far from that original vision. This happened for a variety of reasons, including a lack of skills maturity within organizations, operational complexities and difficulties with existing legacy systems at integration points. The biggest mistake was that most organizations focused on the technical details, such as implementing a Hadoop cluster in a data center, while missing the operational process pieces that affect the consumption of data, the business process impact and the measurement of outcomes.
Enter the Enterprise Data Platform (EDP)
An EDP provides not only the technical elements necessary for a data-driven organization, but also the additional services that ensure business users can consume the data stored within the EDP to make better decisions and improve business processes.
Figure 1 shows the functional elements that make up an EDP. These elements give organizations a single, uniform view of data that a variety of tools and methods can then access, manipulate and analyze to meet the individual needs of business units and users. Components include:
- Persistent Store – A highly performant platform for storing raw data for the purpose of processing and loading it into analysis platforms.
- Flexible Services – The deployment of an EDP will demand an ever-changing combination of services to support batch processing, streaming analytics, data transformation and machine learning. An EDP lets you deploy these services rapidly, and then remove them when they are no longer needed.
- Data Catalog – This is the central element that ensures users have a uniform view into the data stored in the EDP, so they can see how that data has been utilized and how to consume the various data sets and types.
- Automated Data Preparation – An effective EDP will include data sets that are pre-integrated and pre-prepared for users to consume. This process should be automated so new data sets are quickly added to the EDP, and made available to users with minimal pre-work before moving to business analysis.
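To make the Data Catalog component more concrete, here is a minimal sketch in Python. It is not tied to any particular product: the class names, fields and the sample data set (`sales_orders`, `finance-dashboard`) are all hypothetical, chosen only to illustrate how a catalog might record where a data set lives, how to consume it and who has used it.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data set as a catalog user would see it (all fields are hypothetical)."""
    name: str
    location: str       # where the data lives in the persistent store
    data_format: str    # how to consume it, e.g. "parquet" or "csv"
    used_by: list = field(default_factory=list)  # consumers recorded so far


class DataCatalog:
    """In-memory stand-in for a catalog service (illustrative only)."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Make a new data set visible to every catalog user.
        self._entries[entry.name] = entry

    def record_use(self, name, consumer):
        # Track how the data set has been utilized.
        self._entries[name].used_by.append(consumer)

    def describe(self, name):
        # Return what a user needs in order to locate and consume the data set.
        entry = self._entries[name]
        return {
            "location": entry.location,
            "format": entry.data_format,
            "used_by": list(entry.used_by),
        }


catalog = DataCatalog()
catalog.register(CatalogEntry("sales_orders", "store/raw/sales_orders", "parquet"))
catalog.record_use("sales_orders", "finance-dashboard")
print(catalog.describe("sales_orders"))
```

A real catalog would persist its entries and enforce access controls. The point here is only that a single lookup returns both the usage history of a data set and the information needed to consume it.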
For an EDP to become a powerful tool for your organization, you need to make some changes so that the capability is fully utilized.
- Training – Staff should be adequately trained on how to take advantage of the EDP’s capabilities, focusing on making sure business users have the necessary skills in data analysis, visualization and decision-making.
- Data Governance – It is important that the organization put in place strong data governance, so that decisions based on EDP-sourced data sets can be made with high confidence.
- Lineage – Strong lineage makes certain that organizations can track decisions back through all past data integration, transformation or modification.
- Metadata and Business Term Mapping – Strong mapping of data sets to business terms ensures that the organizational culture carries through to how people locate and consume valuable data sets.
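The lineage and business term mapping ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference to any specific lineage tool: the data set names, transformations and the `trace_lineage` and `find_datasets` helpers are all hypothetical.

```python
# Hypothetical lineage records: each data set maps to the parents it was
# derived from, together with the transformation that was applied.
lineage = {
    "quarterly_revenue_report": [("clean_orders", "aggregate by quarter")],
    "clean_orders": [
        ("raw_orders", "drop duplicates"),
        ("raw_customers", "join on customer_id"),
    ],
}

# Hypothetical mapping from business terms to the data sets that carry them.
business_terms = {
    "revenue": ["quarterly_revenue_report"],
    "customer": ["raw_customers", "clean_orders"],
}


def trace_lineage(dataset, lineage):
    """Walk upstream from a data set and return (child, parent, transformation) steps."""
    steps = []
    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against visiting a shared parent twice
        seen.add(current)
        for parent, transformation in lineage.get(current, []):
            steps.append((current, parent, transformation))
            stack.append(parent)
    return steps


def find_datasets(term, business_terms):
    """Look up data sets by business term, ignoring case."""
    return business_terms.get(term.lower(), [])


steps = trace_lineage("quarterly_revenue_report", lineage)
for child, parent, transformation in steps:
    print(f"{child} <- {parent} ({transformation})")

print(find_datasets("Revenue", business_terms))
```

With records like these, a decision based on the revenue report can be traced back through each integration and transformation step to the raw sources, and a business user can find the report by searching for a familiar term rather than a table name.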
An Enterprise Data Platform allows an organization to move beyond simply storing data for basic analysis, to enabling data consumption by all parts of an organization.
EDPs change the model from moving all data to a central location, to building a next-generation set of services that can easily adapt to changing business needs, while simplifying data platforms and eliminating legacy technologies and processes. EDPs give businesses access to prepared data that is easily deployable across different services as needed, yet protected by organizational standards.