When a gamer playing Fortnite buys a new "skin" (or outfit) for their player, the transaction data from the purchase is sent to a data lake.
Epic Games, the creator of the videogame hit with 250 million registered users, relies on data lake infrastructure to produce insights that drive its decisions.
Amazon Web Services, the company's cloud provider, says this is a good fit between use case and technology. Tending to a massive community of gamers means Epic Games must handle two petabytes of new data each month with enough speed and flexibility, according to Herain Oberoi, GM of database, analytics, and blockchain marketing at AWS.
"It's an example of scale that you just would not have been able to solve with a traditional data warehouse," Oberoi said.
Data lake infrastructure, a centralized repository of information, lets companies house raw, unstructured data with the elasticity of the cloud. For the technology to make sense, key decision makers within the organization need easy access to the data.
To maximize the technology's impact, businesses need to:
- discover whether data lakes are a good fit
- have a rough idea of which data insights are important
- strategize to avoid ending up with unused, siloed data
A question of access
Data lakes can store large quantities of data from multiple sources whether the sets are structured or not, a key difference from data warehouses, which house already-structured data for a set purpose.
The development of data analytics tools makes the line between a data lake and a data warehouse permeable, said Tristan Handy, CEO and founder of Fishtown Analytics, in an interview with CIO Dive.
With tools such as Amazon's Redshift Spectrum or Snowflake, data engineers can join data in the warehouse and the lake in the same query. It's all part of the broader business intelligence infrastructure, which allows data lake/warehouse hybrids to exist. Tools such as Talend and Apache Spark, as well as other cloud providers such as Microsoft and IBM, enable similar capabilities.
Still, when data lakes are deployed, leaders have to ensure the potential benefits of the infrastructure reach those who can act on them, such as business analysts or sales managers.
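To make the idea of one query spanning both stores concrete, here is a minimal sketch. It does not use Redshift Spectrum or Snowflake; an in-memory SQLite database stands in for the warehouse, and a string of newline-delimited JSON stands in for raw files in the lake. The table names, fields, and values are all hypothetical.

```python
import json
import sqlite3

# Hypothetical raw "lake" data: newline-delimited JSON, structured only on read.
LAKE_EVENTS = """\
{"player_id": 1, "item": "skin_a", "price": 8.0}
{"player_id": 2, "item": "skin_b", "price": 12.0}
{"player_id": 1, "item": "skin_c", "price": 15.0}
"""


def spend_by_region(lake_text):
    """Join curated warehouse rows with raw lake events in a single SQL query."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for a curated warehouse table with a fixed schema.
    conn.execute("CREATE TABLE players (player_id INTEGER PRIMARY KEY, region TEXT)")
    conn.executemany("INSERT INTO players VALUES (?, ?)", [(1, "EU"), (2, "NA")])
    # Stand-in for an external table defined over files in the lake.
    conn.execute("CREATE TEMP TABLE purchases (player_id INTEGER, item TEXT, price REAL)")
    rows = [json.loads(line) for line in lake_text.splitlines() if line.strip()]
    conn.executemany("INSERT INTO purchases VALUES (:player_id, :item, :price)", rows)
    # One query touches both the "warehouse" and the "lake" data.
    result = conn.execute(
        "SELECT p.region, SUM(e.price) "
        "FROM players AS p JOIN purchases AS e ON e.player_id = p.player_id "
        "GROUP BY p.region ORDER BY p.region"
    ).fetchall()
    conn.close()
    return result
```

In a real deployment the engine would read the lake files in place rather than copying them into a temporary table; the copy here only keeps the sketch self-contained.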
"You unlock the ability to do all of this stuff to a totally different set of people in the enterprise," Handy said. "The change hasn't been fully realized in a lot of enterprise settings."
Data in context
In the age of cheap cloud, the barrier to entry for spinning up some servers and setting up a data lake has dropped drastically, in terms of both ease of use and cost.
As it's become easier to build data lakes, the guardrails of data ingestion have fallen off: All data splashes down in the lake, often without a clear idea of which insights it can yield for a company.
A data lake without some intentionality can lead to a disaster scenario, said Taylor Bird, SVP of product and solutions at Onica, in an interview with CIO Dive. "You sort of assumed that the advantage would just present itself instead of planning for it."
Siloed information that sits unused in the data lake lands companies in a "data black hole," a situation that wastes resources and drives leaders further away from the insights they're looking for, according to Bird.
Part of the appeal of the technology is that companies can deploy it without defining every insight on day one. But Bird advises against building a data lake with the aspirational idea of getting to the insights later.
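One way to picture the intentionality Bird describes is a small catalog that records an owner and an intended use for each dataset, and flags datasets nobody has read recently. The sketch below is illustrative only; the `DatasetEntry` fields, the `find_stale` helper, and the 90-day threshold are assumptions, not something the sources prescribe.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class DatasetEntry:
    name: str
    owner: str
    intended_use: str           # the question this data is expected to answer
    last_read: Optional[date]   # None if the dataset has never been queried


def find_stale(catalog: List[DatasetEntry], today: date, max_idle_days: int = 90) -> List[str]:
    """Return names of datasets never read, or not read within max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    return [d.name for d in catalog if d.last_read is None or d.last_read < cutoff]
```

A report built on something like `find_stale` gives leaders a list of candidates to either assign a purpose or stop collecting.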
How to succeed?
Netflix and YouTube surprise users by delivering highly accurate content suggestions. It's one use case data lake infrastructure is well suited for, because it pairs multiple streams of information, at scale, to gather deeper insights.
"If I look at just my business data, that gives me a lot of value," said Matthew Halliday, co-founder and VP of products at Incorta, in an interview with CIO Dive. "But if I can bring my business data and couple it with other data, like clickstream, intent data ... that's where things get super, super interesting."
With large data volumes, the question of security comes into play. The right people in an organization need the right credentials to access the right data, according to Oberoi.
Infrastructure woes can be solved by the right management strategy. With strong leaders who enforce clear requirements around data lake infrastructure, a company improves its chances of avoiding sunk costs in an infrastructure that won't deliver.
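The access principle Oberoi describes can be sketched as a simple mapping from roles to the datasets each may read. The role names, dataset names, and `can_read` helper below are hypothetical; real data lake platforms manage permissions through their own access-control services.

```python
# Hypothetical grants: each role maps to the set of datasets it may read.
ROLE_GRANTS = {
    "business_analyst": {"sales_summary", "clickstream_aggregates"},
    "data_engineer": {"sales_summary", "clickstream_aggregates", "raw_transactions"},
}


def can_read(role: str, dataset: str) -> bool:
    """Deny by default: unknown roles and ungranted datasets return False."""
    return dataset in ROLE_GRANTS.get(role, set())
```

Defaulting to denial means a new role sees nothing until someone deliberately grants it access.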