Five Steps to Take Back Control of your Hadoop Data

Recently, Reiner Kappenberger shared five steps for taking back control of Hadoop data on the Discover Performance website, HP Software’s community for IT leaders. Reiner is a global product manager at HP Security Voltage for HP SecureData for Hadoop and brings more than 25 years of experience in the online security, data management, and telecommunications sectors.


Enterprises are quickly adopting the open architecture of Hadoop for big data because it can handle large volumes of structured and unstructured data efficiently. Reiner points out that Hadoop has few built-in security layers, so businesses are left on their own to secure their data.

Commercial solutions for Hadoop security are beginning to emerge, with features for protecting sensitive data. Whether you use a commercial solution or build a homegrown approach, Reiner suggests the following five steps to identify what needs protecting and apply the right techniques to protect it.

Five steps to take back control of Hadoop data

First, take an inventory of all the data you intend to store in your Hadoop environment. It is imperative to audit and understand your Hadoop data, advises Reiner.

Next, perform threat modeling on sensitive data. The goal of threat modeling is to identify the potential vulnerabilities of at-risk data and to know how the data could be used against you if stolen. “For example,” explains Reiner, “we know that personally identifiable information always has a high black market value. But assessing data vulnerability isn’t always so straightforward. Date of birth may not seem like a sensitive value alone, but when combined with a zip code, a date of birth gives criminals a lot more to go on. Be aware of how various data can be combined for corrupt purposes.”
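To make the point about date of birth and zip code concrete, the sketch below measures how many records become unique once fields are combined. It is a minimal illustration and not part of Reiner's guidance: the function name `unique_fraction`, the field names, and the sample records are all hypothetical.

```python
from collections import Counter


def unique_fraction(records, fields):
    """Return the fraction of records whose combination of `fields`
    values appears exactly once in the data set."""
    if not records:
        return 0.0
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Hypothetical sample data, for illustration only.
records = [
    {"dob": "1980-01-15", "zip": "94105"},
    {"dob": "1980-01-15", "zip": "10001"},
    {"dob": "1975-06-30", "zip": "94105"},
    {"dob": "1975-06-30", "zip": "94105"},
]

for fields in (["dob"], ["zip"], ["dob", "zip"]):
    print(fields, unique_fraction(records, fields))
```

In this sample, no record is unique by date of birth alone, but half of the records are unique once date of birth and zip code are combined. A field combination with a high unique fraction on real data is a candidate for protection even when each field looks harmless by itself.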

Then, identify the business-critical values within sensitive data. Securing the data does little good if the security tactic also neutralizes its business value. “You’ll need to know if data has a characteristic that is critical for downstream business processes,” explains Reiner.

After that, Reiner says, apply tokenization and format-preserving encryption to data as it is ingested. “[These techniques are] particularly suited for Hadoop,” says Reiner, “because they do not result in collisions that prevent you from analyzing data.”
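The idea of protecting a value at ingestion while keeping its format, avoiding collisions, and preserving a business-critical characteristic can be sketched as follows. This is an assumption-heavy illustration of vault-based tokenization, not HP SecureData's method and not format-preserving encryption itself (standardized FPE modes such as NIST FF1 use a keyed cipher rather than a lookup table). The class `TokenVault`, its parameters, and the choice to keep the last four digits are hypothetical.

```python
import secrets


class TokenVault:
    """Illustrative in-memory token vault; not a production design."""

    def __init__(self):
        self._to_token = {}   # original value -> token
        self._to_value = {}   # token -> original value

    def tokenize(self, value, keep_last=4, max_tries=1000):
        # Deterministic: a value seen before gets the same token,
        # so joins and group-bys on the tokenized column still work.
        if value in self._to_token:
            return self._to_token[value]

        cut = max(0, len(value) - keep_last)
        head, tail = value[:cut], value[cut:]
        if not any(ch.isdigit() for ch in head):
            raise ValueError("nothing to tokenize outside the preserved suffix")

        for _ in range(max_tries):
            # Replace each digit with a random digit; keep separators as-is.
            candidate = "".join(
                secrets.choice("0123456789") if ch.isdigit() else ch
                for ch in head
            ) + tail
            # Reject tokens already issued, so two values never share a token.
            if candidate != value and candidate not in self._to_value:
                self._to_token[value] = candidate
                self._to_value[candidate] = value
                return candidate
        raise RuntimeError("token space appears exhausted")

    def detokenize(self, token):
        return self._to_value[token]


vault = TokenVault()
card = "4111-1111-1111-1111"   # well-known test card number, not real data
token = vault.tokenize(card)
print(token)
```

The token has the same length and separators as the input and keeps the last four digits, so downstream processes that rely on that suffix continue to work. Because each value maps to exactly one token and no token is issued twice, counts and joins on the tokenized column match those on the original data.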

Lastly, provide data-at-rest encryption throughout the Hadoop cluster. “When hard drives age out of the system and need replacing, encryption of data-at-rest means you won’t have to worry about what could be found on a discarded drive once it has left your control,” continues Reiner. “This step is often overlooked because it’s not a standard feature offered by Hadoop vendors.”

Whether your company is planning a big data project or is already using Hadoop applications, Reiner offers these best practices to help security leaders prevent costly data breaches and ensure attackers glean nothing from their attempts to breach Hadoop in the enterprise.

For more information, see HP SecureData for Hadoop.



