Big Data Survey Results Show that Data Security is Paramount
At the end of September, HPE Security – Data Security attended the Strata + Hadoop World Conference in New York City. Strata + Hadoop World is where data scientists, analysts, and executives get up to speed on emerging techniques and technologies. The show bills itself as the largest of its kind, and it was interesting to note that new sessions and tracks were added to reflect challenges that have emerged in the data field, including security. One of the sessions was led by our very own Reiner Kappenberger, who showed how to “Enable secure data sharing and analytics in Hadoop with 5 key steps.”
On the conference floor, we conducted an anonymous survey asking attendees how they protect sensitive data and how they currently approach securing data in Hadoop. With over 142 attendees participating, the results are revealing and show that protecting sensitive data in Hadoop is a top-of-mind concern for over 75% of the survey participants.
Below is a quick summary of the survey results:
- Seventy-five percent of the survey participants said their business currently uses some form of sensitive data such as PCI (payment card industry data), PII (personally identifiable information) or PHI (protected health information).
- When it comes to protecting that sensitive data, 61 percent said they use encryption and 27 percent said they use tokenization.
- Over 77% of the survey participants said they are planning big data projects involving sensitive data.
- When asked what kind of sensitive data they need to secure for their Big Data projects, 42% said they need to secure credit card numbers, 54% need to secure social security numbers, 71% need to secure names and addresses and 60% need to secure date of birth (more than one answer could be selected).
This survey provides some great insights. Most notably, it tells us that the clear majority of big data analytics projects involve sensitive data, a point underscored by the 77% of respondents who are planning such projects.
What is even more interesting is to compare this survey data to the results from last year’s Strata + Hadoop World Conference in NYC, where the same questions were asked of 224 participants. While the same percentage of survey participants said their business currently uses some form of sensitive data, the share who use encryption is up, from 51% in 2014 to 61% in 2015, as is the share who use tokenization, from 17% last year to 27% in 2015. Encryption and tokenization are becoming more widely adopted as standards from bodies such as NIST and ANSI become more commonly recognized.
Information is the New Asset
When survey participants from Strata + Hadoop World 2014 were asked what kind of sensitive data they need to secure for their Big Data projects, 35% said they need to secure credit card numbers, 27% need to secure social security numbers, 45% need to secure names and addresses and 29% need to secure date of birth. Compare that to the 2015 results.
With the 2015 participants, 42% said they need to secure credit card numbers, up 7 percentage points, showing the growth of transaction data in big data environments. Of last year’s participants, 27% said they needed to secure social security numbers, and this year that number doubled to 54%. The same trend holds for other personally identifiable information: last year only 45% needed to secure names and addresses and 29% needed to secure date of birth. Those numbers are up this year, to 71% for names and addresses and 60% for date of birth. As businesses become more aware that PII can be monetized and is increasingly targeted by cyber criminals, the survey shows they are taking steps to prevent financial harm to consumers and loss of reputation to their brand.
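The year-over-year comparison can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the figures are the percentages quoted in this post, while the names `SURVEY` and `year_over_year` are hypothetical and not part of any survey tooling. It separates the change in percentage points from the relative change, since the two are easy to confuse.

```python
# Percentages quoted in this post: (Strata 2014, Strata 2015).
# The data-type questions allowed more than one answer, so columns do not sum to 100.
SURVEY = {
    "encryption in use": (51, 61),
    "tokenization in use": (17, 27),
    "credit card numbers": (35, 42),
    "social security numbers": (27, 54),
    "names and addresses": (45, 71),
    "date of birth": (29, 60),
}


def year_over_year(results):
    """Return (item, 2014 %, 2015 %, point change, relative change %) rows."""
    rows = []
    for item, (pct_2014, pct_2015) in results.items():
        point_change = pct_2015 - pct_2014  # difference in percentage points
        relative_change = round(100 * point_change / pct_2014)  # growth vs. 2014
        rows.append((item, pct_2014, pct_2015, point_change, relative_change))
    return rows


if __name__ == "__main__":
    for item, y14, y15, points, relative in year_over_year(SURVEY):
        print(f"{item:<26}{y14:>4}% -> {y15:>3}%  {points:+d} pts  ({relative:+d}% relative)")
```

Read this way, a doubling such as the social security number figure shows up as a 100% relative increase, while the credit card figure moves by a much smaller number of percentage points.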
The same questions were asked at the recent Teradata PARTNERS Conference in October. Similar to Strata + Hadoop, the Teradata conference is where technology leaders learn and share the latest strategies for optimizing their company’s use of data. Here a full 83% of participants said their company uses PCI, PII or PHI data. A similar share to Strata 2015 are using encryption (57%) and tokenization (22%) to secure their data. The types of sensitive data they are protecting are roughly the same: credit card (48%), social security (47%), name/address (69%) and date of birth (58%).
Where the information really gets interesting is the question about the internet of things. The internet of things has become a hot topic and can be loosely defined as the growing network of everyday objects that feature an IP address for internet connectivity, such as smoke detectors, wearables, door locks and thermostats. This is such an emerging field that we didn’t even ask the question last year at Strata + Hadoop. We asked it for the first time in September, and 37% said their companies were sending and receiving data with IoT, with 9% not knowing. Of that number, only about one-third (30%) were protecting the data, and 14% did not know whether their data was protected.
Sensitive Data is Sensitive Data
At Teradata, 44% of participants said they collect data from Internet-connected devices. Of those, over one-third (35%) are protecting the data, which would appear to indicate that a high proportion of sensitive or potentially sensitive data may be involved. Sensitive data is sensitive data, whether it is collected at a point-of-sale device, a web page or a smart watch. This sensitive data needs to be protected wherever it travels in a company’s ecosystem. With such large amounts of data being collected, stored and analyzed every day with Big Data projects, data security needs to be top of mind for CISOs and database architects.
As Hadoop adoption and big data projects continue to accelerate across the enterprise landscape, data security clearly becomes a key consideration. As the survey reveals, three-quarters of Hadoop projects will involve some form of sensitive data, from credit cards to customer names and addresses. Data-centric protection is a best practice to secure sensitive data-at-rest, in-use, and in-motion.
Learn more about HPE SecureData for Hadoop.