The global big data market is projected to reach a quarter of a trillion dollars by 2025, and the AI market, which also depends heavily on data, is expected to grow to $38.4 billion over the same period. The hype around data has been immense, with cliché quotes such as “Data is the new oil”. Yet as early as 2010, the Economist reported on an IBM study showing that businesses did not have the kind of confidence needed to apply data science to their data. What has changed today? Quite a lot, particularly in how businesses cover all their bases when embarking on data projects.
The Digital Transformation Context
Given the current hype around digital and digital transformation, enterprise data is the quintessential elephant in the room, occupying a much bigger and more important place than it did a decade ago. Hence the question: “With digital technologies dominating the marketplace, how do businesses get to the point where their data and data science translate into revenue and ROI?”
The answer begins with addressing the elephant in the room. How do you plan to:
- Aggregate your data efficiently
- Manage your data sources
- Pinpoint the right use cases for your data science
- Establish clear governance processes to make sure your data is trustworthy
- Rigorously implement access and security systems to address data privacy and security concerns
- Evangelize the data story to gain stakeholder confidence and support
This AMPERE approach can help maintain the flow of interest, the current that keeps your data science projects running and, more importantly, successful. But it is only the first step, and it requires teams to understand that:
- Your data lakes, data warehouses, and cloud are the technology: the infrastructure that holds your data, along with the tools, processes, and integrators that give life to your data program.
- Data is the oil that lubricates your business. Keeping it pure, keeping it going, and matching the data requirements for each of your use cases is key to achieving consistency.
- AI is your differentiator. The secret sauce that gives your business the competitive advantage you need the most to remain relevant in today’s digital world.
That said, it is still a combination of all three facets of your data story that offers the business benefit – they need to be orchestrated seamlessly – which brings us to…
How do we do that?
Orchestrating the three crucial facets of your data story requires what is called a Data Center of Excellence: an aggregate of the data sources, data governance, data tools, platforms, and of course the AI itself.
It starts with Data
The data you have needs to be in perspective. For that, you need more than just data scientists. You first need to get your data in order, so it can be used by the data scientists. Enter Data Management.
Data Management is the set of processes you implement to make sure your data is clean, trustworthy, and available. It covers:
- Cleanse and Validate – Going back a decade, when big data was the big deal, being able to trust the data you use was the biggest challenge, and it is still one of the biggest today. Duplicates, along with errors of omission and commission such as differences in spelling, punctuation, and which fields are filled in, can compromise the data you need to draw insights from. It is important that this information is accurate and validated.
- Data Integrity – Once cleaned, the data needs a clear lineage from the source, through the staging area, to its final destination in your data science platforms, and it needs to stay free of errors. Yes, even after it is cleansed, because your data is constantly used, modified, and updated. It is important to maintain data quality by implementing processes and procedures that prevent unintended or unwarranted modifications from compromising its integrity. This is also known as data governance.
- Data Governed? Now Share It – Integrate the data sources with the analytics tools, APIs, and interfaces for reports and dashboards.
It sounds easy, but a lot of work goes into identifying your data, making it trustworthy, governing it, and making it available to the data scientists.
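To make the cleansing, validation, and integrity steps above a little more concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not a prescribed implementation: the record layout, the `REQUIRED_FIELDS` tuple, and the helper names `normalize_name`, `cleanse`, and `fingerprint` are all hypothetical, and a real Data CoE would likely rely on dedicated data quality tooling.

```python
import hashlib
import json

# Hypothetical required fields; adjust to your own schema.
REQUIRED_FIELDS = ("name", "email")


def normalize_name(value):
    """Drop punctuation, lowercase, and collapse whitespace so that
    spelling and punctuation variants compare as equal."""
    kept = "".join(ch for ch in value if ch.isalnum() or ch.isspace())
    return " ".join(kept.lower().split())


def cleanse(records):
    """Return (clean, rejected): deduplicated valid rows and invalid rows."""
    clean, rejected, seen = [], [], set()
    for record in records:
        # Validation: every required field must be present and non-blank.
        if any(not str(record.get(f) or "").strip() for f in REQUIRED_FIELDS):
            rejected.append(record)
            continue
        row = {
            "name": normalize_name(record["name"]),
            "email": record["email"].strip().lower(),
        }
        # Deduplication on the normalized values.
        key = (row["name"], row["email"])
        if key in seen:
            continue
        seen.add(key)
        clean.append(row)
    return clean, rejected


def fingerprint(records):
    """Order-independent SHA-256 digest of a record set, usable to check
    that data is unchanged between source, staging, and destination."""
    lines = sorted(json.dumps(r, sort_keys=True) for r in records)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


# Made-up sample data for illustration only.
raw = [
    {"name": "Acme, Inc.", "email": "Sales@Acme.com"},
    {"name": "ACME Inc", "email": "sales@acme.com "},  # duplicate after normalizing
    {"name": "Globex", "email": ""},                   # fails validation
    {"name": "Initech", "email": "info@initech.com"},
]

clean, rejected = cleanse(raw)
staged_digest = fingerprint(clean)
```

Comparing `fingerprint(clean)` at the staging area with the same digest computed at the destination is one simple way to detect unwarranted modification in transit; rejected rows can be routed back to the data owners rather than silently dropped.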
Getting Started with your Data Center of Excellence
“Getting started” is a bit of a misnomer here, because one of the biggest challenges businesses face with their Data CoE is the process of getting started. Some of the aspects that go into creating a Data Center of Excellence, or Data CoE, are:
Your data vision drives your data aggregation, management, governance, and ultimately your data science. While large businesses have hired their own data science teams and have this part of the Data CoE covered, they have still relied on consulting services providers. Why? Because of the unbiased nature of consultancies. As consulting services providers can take a third-party perspective, they are typically able to identify the drivers that will either support your business data vision or provide constructive arguments against it.
The Business Cases
Your data business cases ultimately crystallize that vision into reality. While identifying potential cases is easy, understanding their feasibility and auditing the potential gains from each one is crucial to ascertaining its validity. Here again, external contributors can help maintain objectivity and eliminate bias.
Adopt the right technologies – Whether on-prem, cloud, or hybrid, your data platforms need the appropriate technologies that avoid silos and improve visibility and governance. Here, the emphasis is typically on the “Right Partner”. Even if you were to partner with a consulting services provider, you need to make sure they take a technology-agnostic approach that will eventually help you make the most of your Data CoE investments. One of the most critical contributions of a consulting partner when it comes to technology adoption is creating a long-term business case view of using one technology as opposed to another.
A very important consideration here is that the technology stack, once selected, needs the right skillset and experience to implement. Once implemented, it will need to be managed and supported, which also requires an appropriate skillset. Pertinent questions to ask here include:
- People: Do you have the skillset in-house? If not, can you acquire it in time? If you can, can you do it affordably? Data scientists are by far the most expensive resources today, followed by AI, ML, and NLP specialists. This is an important consideration before you choose to hire the skillset you need to implement and support your Data CoE technology stack.
- Processes and practices: You need both, and you need the right processes and industry best practices established so that your Data CoE operates at full steam and peak efficiency. Getting there can be difficult because, once again, there are people involved. Experience helps reduce risk, and expertise makes things go smoother and, almost always, faster.
- Technologies: Beyond the technology stack itself, you need tools and technologies that keep your Data CoE running: monitoring tools that help keep the infrastructure and applications up and running, IT Service Management tools to address issues and service requests, and of course the development tools for Continuous Integration and Continuous Delivery (CI/CD). The first two points apply here as well: you need resources with experience and expertise in these tools and technologies, and processes and practices such as Agile, ITIL, and ISO to make sure the tools are used appropriately and adequately.
Build or Buy
The most important aspect of the Data CoE is the CoE itself. In the longer term, it doesn’t matter whether you established it yourself or with a partner. Ultimately, your Data CoE’s performance will be measured by the insight and the ROI it delivers. So, isn’t faster better?
When time is of the essence and building a team from scratch is difficult, Consulting and Talent Services can help immensely in reducing risk and improving time-to-insights. They can bring in the right resources to design your Data CoE, establish the right technology stack and practices, and allow you to hit the ground running.