Data lineage: The key to avoiding a Facebook-sized data fallout

Editor's Note: The following is a guest post from Sue Habas, VP of Strategic Technologies at ASG Technologies, a technology solutions provider.

Facebook is the first corporate giant to fall, or at least publicly skin its knee, over data privacy. The Cambridge Analytica scandal has brought to light several issues with how U.S. companies handle personal data, proving that even the world's largest tech companies are struggling to achieve security and compliance.

The recent call to reform data privacy has too often been dismissed as everything from an inconvenience to an empty threat, but with the General Data Protection Regulation looming close, companies are realizing that the consequences for failing to properly manage data may be more severe than they first anticipated.

If the scandal around Facebook and Cambridge Analytica is just the tip of the iceberg, organizations should take note of how quickly mismanaged data can cause unsuspecting companies to sink.

To avoid the fate (and fines) that come with noncompliance, companies are searching for ways to manage the ever-increasing volume of data they collect and create, ideally without devoting countless hours to manually checking for compliance issues. The task is not simple, but it is possible, and there are tools that can accelerate the process.

Organizations can drastically enhance security by investing in an enterprisewide approach to finding and managing data. All too often, companies delegate privacy data protection to department heads; as a result, they are more likely to miss data, create redundancies and secure only bits and pieces of the story needed to assess risk and compliance.

Rather than focusing on specific parts of the data lifecycle, such as how it's being used by the sales team or marketing department, an end-to-end compliance strategy delivers a transparent view of data across the entire lifecycle — weaving together a comprehensive story of information.

In the case of the GDPR, it is especially critical for organizations to discover exactly where personally identifiable information originates and where it is moving in and out of the organization on an international scale. For those in charge of implementing standards around data governance, this process necessitates looking at data from a global perspective — across departments, states and even countries.

This end-to-end view of data flow, combined with the GDPR's consent rules, gives organizations the full picture of where private data is stored, how it is being used and whether that use aligns with the consent required under corporate privacy policies and regulations.
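To make the idea concrete, here is a minimal sketch of checking an end-to-end data flow against consent. Everything in it is an illustrative assumption rather than a description of any product: the system names, the `FLOWS` and `PURPOSE` tables and the two helper functions are hypothetical, and a real lineage map would be far larger and discovered automatically.

```python
from collections import deque

# Hypothetical lineage edges: (source system, destination system).
FLOWS = [
    ("signup_form", "crm"),
    ("crm", "marketing_db"),
    ("crm", "billing"),
    ("marketing_db", "ad_partner_export"),
]

# Hypothetical purpose each system uses the personal data for.
PURPOSE = {
    "crm": "account_management",
    "marketing_db": "marketing",
    "billing": "billing",
    "ad_partner_export": "third_party_sharing",
}


def downstream(origin, flows):
    """Return every system reachable from origin, in breadth-first order."""
    children = {}
    for src, dst in flows:
        children.setdefault(src, []).append(dst)
    seen, order, queue = {origin}, [], deque([origin])
    while queue:
        node = queue.popleft()
        for nxt in children.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def consent_violations(origin, flows, purpose, consented):
    """List downstream systems whose purpose is not covered by the consent given."""
    return [s for s in downstream(origin, flows) if purpose[s] not in consented]


reached = downstream("signup_form", FLOWS)
violations = consent_violations(
    "signup_form", FLOWS, PURPOSE, {"account_management", "billing"}
)
print(reached)
print(violations)
```

In this toy example the person consented only to account management and billing, so the marketing database and the partner export are flagged. The point is that the check is only possible once the full flow is known.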

Data lineage is the lifeboat that organizations need to achieve an enterprisewide approach to corporate data strategy. Without it, looking at different areas of data means working through each of the technologies that sit in between, which demands (and wastes) significant man-hours.

In fact, humans are no longer capable of keeping up with the speed of data creation. With so much data being generated and gathered, information collected manually on a Monday could be outdated by Friday, not to mention that human-created spreadsheets are inherently biased and error-prone.

Strong, automated data lineage supplies the diversity, depth and breadth needed to gather and analyze data from end to end, rather than focusing on specific chunks of information — enabling companies to cross business and technology borders.

There is a right way to do data lineage, however — it's not a one-size-fits-all solution to data privacy and GDPR. The more automation and intelligence that organizations implement around data discovery, the more success and efficiency they can drive. Automation diminishes human inefficiencies and reduces liability.

Still, even with this holistic view, there will be gaps in the data that are difficult to detect and discover due to "one-off" code written by developers to connect that data. Automation significantly reduces the number of gaps in data lineage, though no solution is 100 percent perfect.

Leveraging technology that delivers automation but also offers stitching capabilities can help show how the data in these gaps is connected, even before the gaps are completely closed — allowing users to gain a full picture despite those darker areas of information.

There are two types of data lineage: inferred and fact-based. Some vendors infer data lineage, but it's important to be highly skeptical of what this means and how often it is used, because inferred lineage is a guess made by comparing metadata structures, time stamps or names rather than by observing how the data actually moves.
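A small sketch shows why name matching deserves that skepticism. It is a deliberately naive illustration, not any vendor's method: the table and column names are invented, and `normalize` and `infer_links` are hypothetical helpers.

```python
def normalize(name):
    """Lowercase a column name and drop separators, so 'Cust_Email' equals 'custemail'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def infer_links(source, target):
    """Guess column-level lineage purely from matching normalized names."""
    src_table, src_cols = source
    dst_table, dst_cols = target
    by_name = {normalize(c): c for c in src_cols}
    return [
        (f"{src_table}.{by_name[normalize(c)]}", f"{dst_table}.{c}")
        for c in dst_cols
        if normalize(c) in by_name
    ]


# Hypothetical tables: in this scenario order_report.id is an order number,
# not a customer number, but nothing in the names says so.
crm = ("crm_customers", ["id", "Cust_Email", "signup_date"])
report = ("order_report", ["id", "custemail", "total"])

links = infer_links(crm, report)
for src, dst in links:
    print(src, "->", dst)
```

The email link is plausible, but the two `id` columns are linked only because they share a name. An inferred result like this cannot distinguish a real connection from a coincidence, which is why it should be checked against what the data actually does.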

Fact-based lineage, on the other hand, looks at the transformation code that spans those gaps and at how it aggregates data to make connections, employing automation to see what is actually happening to the data and when it is being transformed. This information should be transparent so that analysts can judge whether or not it meets the business rules for manipulating that data.
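By contrast, a fact-based approach reads the transformation itself. The sketch below is a toy under strong assumptions: it handles only one narrow statement shape (`INSERT INTO ... (cols) SELECT exprs FROM table`, with no commas inside expressions), the SQL and table names are made up, and `column_lineage` is a hypothetical helper. Production tools rely on full parsers rather than a regular expression.

```python
import re

# Matches only: INSERT INTO <target> (<cols>) SELECT <exprs> FROM <source>
STATEMENT = re.compile(
    r"INSERT\s+INTO\s+(\w+)\s*\(([^)]*)\)\s*SELECT\s+(.*?)\s+FROM\s+(\w+)",
    re.IGNORECASE | re.DOTALL,
)
# An identifier that is not immediately followed by "(" (i.e. not a function name).
COLUMN = re.compile(r"\b([A-Za-z_]\w*)\b(?!\s*\()")


def column_lineage(sql):
    """Return (source column, target column, expression) triples read from the code."""
    match = STATEMENT.search(sql)
    if match is None:
        raise ValueError("unsupported statement shape")
    target, target_cols, select_list, source = match.groups()
    targets = [c.strip() for c in target_cols.split(",")]
    exprs = [e.strip() for e in select_list.split(",")]
    if len(targets) != len(exprs):
        raise ValueError("column count mismatch")
    lineage = []
    for tgt, expr in zip(targets, exprs):
        for col in COLUMN.findall(expr):
            lineage.append((f"{source}.{col}", f"{target}.{tgt}", expr))
    return lineage


SQL = """
INSERT INTO marketing_contacts (contact_id, email_address)
SELECT customer_id, LOWER(email)
FROM crm_customers
"""

facts = column_lineage(SQL)
for src, dst, expr in facts:
    print(f"{src} -> {dst}  via  {expr}")
```

Because each link carries the expression that produced it, a reviewer can see not just that `email` feeds `email_address` but how it was changed on the way, which is the transparency needed to test the result against business rules.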

Companies that try to take on data privacy without an enterprisewide approach to data discovery and lineage — especially mammoth companies like Facebook — are vulnerable. Organizations cannot risk losing track of their most critical data, whether it’s around privacy or regulatory compliance, given the steep consequences, from losing customer loyalty and facing noncompliance fines to receiving bad publicity that can be devastating to the brand.

As the GDPR era fast approaches, companies need to have precise, real-time visibility into the information supply chain to discover critical data, to know when it changes and to understand how it can impact the organization.

An enterprisewide approach to data lineage empowers organizations, from corporate giants to small enterprises, to know exactly what they're working with. By leveraging the right tools to keep pace with personal data as it constantly changes and moves, any organization can secure a full picture that will allow it to prove compliance and ensure business remains afloat come the GDPR deadline in May.