Today, most companies still rely on either traditional databases or data warehouses to generate the reports their management needs to make informed decisions. Those solutions have gotten us pretty far, but they also have some serious drawbacks. For one, loading all that data into one place and then generating reports can take hours, and that’s only if everything works right the first time around! If something goes wrong during the process (and something often does), it can take even longer.
The problem is obvious. Processing large amounts of data requires significant compute power, and that kind of processing power costs a lot of money. Spending more money on hardware to get the runtime of that important monthly report down to one minute doesn’t make sense, especially if that hardware does nothing else the rest of the month. If you bought it, you’re stuck paying for it for several years, whether you’re using it or not.
It becomes even worse when you get down to the BI, analytics, and data science aspects of your organization. To begin with, good data analysts, data scientists, and data engineers are hard to find, and expensive. Then, in most data science departments it’s not uncommon to have one data scientist waiting for their complex job on their expensive cluster to finish, while five more are waiting for their turn, just so their job can begin. So now you have all these hard-to-find, expensive people standing around waiting for their compute jobs to finish. That’s not very efficient! On the other hand, buying enough hardware to process all that data more quickly will cost a prohibitive amount of money as well, and will require even more people to keep it all up and running. And, at most, these resources will only be used for eight hours a day, five days a week.
So either you don’t have enough compute power and people are left waiting for results, or you have too much and it sits idle while still costing money.
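To put a number on that idle time, here is a minimal sketch of the arithmetic. It assumes the hardware is only busy during the eight-hours-a-day, five-days-a-week window mentioned above, which is the best case; the variable names are purely illustrative.

```python
# Best-case utilization of owned hardware, assuming it is fully busy
# during office hours and idle the rest of the week.
HOURS_USED_PER_DAY = 8
DAYS_USED_PER_WEEK = 5
HOURS_IN_WEEK = 24 * 7

hours_used = HOURS_USED_PER_DAY * DAYS_USED_PER_WEEK  # 40 hours
utilization = hours_used / HOURS_IN_WEEK              # 40 / 168

print(f"Busy {hours_used} of {HOURS_IN_WEEK} hours: {utilization:.1%} utilization")
```

Even in that best case, the machines sit idle for more than three quarters of the week.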
Isn’t there a better way to deal with this problem?
Luckily, there is. Today, cloud providers sell compute power in small increments. Amazon sells it by the hour, and Google Cloud even sells it by the minute. Combined with technology that allows you to separate data storage from data processing, this means you can rent a large amount of compute power when employees are waiting for it, and immediately stop paying when you don't need it anymore.
You will be able to split your costs into two parts: the storage part that retains your data 24/7, which is relatively cheap, and the expensive compute resources that calculate reports, which you pay for only when you need them and people are waiting for them.
But the best part is that your people won't waste time waiting around for results. It doesn't matter if processing your data takes ten machines ten hours, or 100 machines one hour, or 1,200 machines five minutes: the total cost will be the same, since you’re paying for those machines by the minute.
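The arithmetic behind that claim can be checked with a short sketch. It rests on assumptions the text does not spell out: the job parallelizes perfectly, there is no start-up overhead, and every machine costs the same flat rate per minute. The price used here is a made-up placeholder, not a real provider rate.

```python
from fractions import Fraction

# Hypothetical flat price: 0.01 currency units per machine per minute.
PRICE_PER_MACHINE_MINUTE = Fraction(1, 100)


def rental_cost(machines: int, minutes: int) -> Fraction:
    """Cost of renting `machines` machines for `minutes` minutes each."""
    return machines * minutes * PRICE_PER_MACHINE_MINUTE


# (number of machines, minutes each machine runs)
scenarios = [
    (10, 10 * 60),  # ten machines for ten hours
    (100, 60),      # 100 machines for one hour
    (1200, 5),      # 1,200 machines for five minutes
]

costs = [rental_cost(machines, minutes) for machines, minutes in scenarios]

for (machines, minutes), cost in zip(scenarios, costs):
    machine_minutes = machines * minutes
    print(f"{machines:>5} machines x {minutes:>3} min = "
          f"{machine_minutes} machine-minutes, cost {float(cost):.2f}")
```

All three scenarios add up to the same 6,000 machine-minutes, so under these assumptions the bill is identical and only the waiting time changes.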
So, what about those expensive data scientists who had to wait ten hours to get a report back before? They’ll get it much faster now. And, if it turns out that there’s a mistake in it somewhere, they can just run the report again. Co-workers who used to wait ten hours before their own jobs could start can now use their own resources and run their jobs at the same time.
The real value in moving your data to the cloud goes beyond saving costs. The real value is that your employees won’t waste their valuable hours anymore waiting for their data.
This article is part of the Urgent Future report Cloud - It's a Golden Age.