Panic, shame, anger. These are emotions that hit you when your personal health or financial data has been stolen by a stranger. Even just the fear of having your data compromised is unsettling. No one expects this to happen to them, especially when they place their trust in the system.
This is why health and financial institutions have a duty to safeguard private and sensitive patient and client information. Health institutions of course hold our most sensitive information — medical records, things we may not even want family to know about. Financial institutions hold our valuable social security number, banking details, and payment information.
According to the 2017 Ponemon Institute Cost of a Data Breach Study sponsored by IBM, the average global cost of a data breach for a health organization is $9.8 million and $6.3 million for a financial organization. Naturally, these two industries have the highest average cost per stolen record: health care is $380 and financial services is $245, much more expensive than the global average industry cost of $141 per stolen record.
The slew of high profile cyber attacks on citizens is a stark reminder of how vulnerable our sensitive information really is. The very recent double Equifax data breach potentially impacts 143 million U.S. consumers, pummelling both the Equifax stock and most importantly consumer confidence. Anthem, the largest health insurance company in the U.S., has recently agreed to pay a record $115 million to settle a 2015 data breach class action lawsuit for over 78 million healthcare records breached. For both these cases, the intangible consequences were just as damaging as the financial costs, namely bruised company morale and lost leadership from executive attrition.
Beware of the legal obligations for protecting sensitive data
Given how much personal data is being stored digitally in the cloud, important regulations exist to ensure organizations do their best to protect us.
For healthcare, the HIPAA Privacy Rule is a national U.S. standard that requires healthcare providers and organizations, as well as their business associates, to develop and follow procedures that ensure the confidentiality and security of protected health information (PHI) when it is transferred, received, handled, or shared. HIPAA violations can be very expensive: penalties for noncompliance are based on the level of negligence and range from $100 to $50,000 per violation (or per record), with a maximum penalty of $1.5 million per year for violations of an identical provision.
For financial privacy, the Federal Trade Commission (FTC) under federal law requires banks, securities firms, insurance companies, and companies providing many other types of financial products and services to maintain safeguards to protect customer information and prevent malicious access to it. Noncompliance fines are even more damaging than HIPAA penalties. The fines for not being PCI compliant range from $5,000 to $500,000 and are levied by banks and credit card institutions. Even if a company is 100% PCI compliant and validated, a breach in cardholder data may still occur if the institution isn’t careful.
Malicious attacks are the biggest cause of data breaches
According to the 2017 Ponemon Study, the main cause responsible for 42% of data breaches is malicious or criminal attacks. 25% of breaches are due to human error, especially negligence, and 25% involved system glitches including both IT and business process failures. Since malicious attacks are obviously intentional and can even come from internal bad actors, they are the most costly.
The consequences of not taking proper precautions are severe. For instance, CardioNet, a maker of wireless devices for heart patients, was fined $2.5 million for a 2012 incident in which an employee’s laptop computer was stolen from an unlocked car. The damage to brand reputation is even more difficult to overcome when customers churn and partners cut ties. According to a Forbes Insights report, ‘Fallout: The Reputational Impact of IT Risk’, 46% of organizations had suffered damage to their reputations and brand value as a result of a breach.
Use encryption or hashing in real-time at scale
It’s clear that safeguarding private and sensitive customer and patient information is paramount. But for massive health and financial organizations, actually implementing the best controls and security is challenging. When troves of data originate from multiple sources and are stored across one or more databases and data warehouses, how do you protect data at scale?
As experts in moving data and data integration, we at Alooma believe that encryption at scale is one really good answer. Encryption is the process of taking meaningful information and turning it into an unreadable form (usually random looking numbers) to prevent unauthorized access. Encryption is a reversible process, enabling authorized entities to convert the unreadable form back to meaningful information.
Another way to prevent unauthorized access to data is hashing. Hashing is a one-way process that turns meaningful information into a random-looking number in a reproducible yet irreversible way. Hashing is good for storing passwords: even if someone obtains the hashed password, they cannot reverse it to recover the real password, but if you have the password, you can easily hash it again and check that it matches. Hashing is also useful when a private detail (like a name or SSN) is shared across different tables or datasets. In this case, the hashed value stays identical across the different tables, keeping the relationships in the data intact.
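To make the two properties of hashing concrete, here is a minimal sketch using Python's standard `hashlib`. The helper names (`hash_value`, `matches`), the sample password, and the dummy SSN are our own illustrative assumptions, and the fixed salt simply mirrors the example later in this post.

```python
import hashlib
import hmac

SALT = "12"  # fixed, illustrative salt (an assumption for this sketch)

def hash_value(value: str, salt: str = SALT) -> str:
    """Return the SHA-256 hex digest of salt + value."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

def matches(candidate: str, stored_hash: str) -> bool:
    """Re-hash the candidate and compare it with the stored digest."""
    return hmac.compare_digest(hash_value(candidate), stored_hash)

# Reproducible: the same input always produces the same digest,
# so a password can be checked without ever storing it in the clear.
stored = hash_value("s3cret-password")
print(matches("s3cret-password", stored))  # True
print(matches("wrong-guess", stored))      # False

# Consistent across tables: a hashed SSN still works as a join key.
accounts = {hash_value("123-45-6789"): "checking"}
claims = {hash_value("123-45-6789"): "claim #1"}
shared_keys = accounts.keys() & claims.keys()
print(len(shared_keys))  # 1
```

For real password storage, a per-user random salt and a deliberately slow function such as `hashlib.pbkdf2_hmac` is the more usual choice. The fixed salt here is only meant to show reproducibility.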
The last form of preventing unauthorized access to data is called anonymization, or masking. This means any personal information will just be removed and will not leave a trace as to what it was.
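As a rough sketch of what masking could look like on a single record, the hypothetical `mask_event` helper below drops the listed fields from a Python dictionary. The function name and the sample record are assumptions made for illustration only.

```python
from typing import Any, Dict, Iterable

def mask_event(event: Dict[str, Any], fields: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of the event with the given fields removed entirely."""
    to_drop = set(fields)
    return {key: value for key, value in event.items() if key not in to_drop}

event = {"first_name": "Simon", "last_name": "Wong", "ssn": "123-45-6789"}
masked = mask_event(event, ["ssn"])
print(masked)  # {'first_name': 'Simon', 'last_name': 'Wong'}
```

Unlike a hashed value, a masked field cannot be used to match records across tables, because nothing of the original value remains.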
So, how can you do this? Let’s go through an example.
Two-step example using Alooma
Since it’s common for related customer or patient data to be stored in different tables in a data warehouse, it’s most efficient to perform data encryption in transit using a data pipeline before it hits your data warehouse.
In this example, we will demonstrate how to hash dummy customer information. In the table below we have six customers and their name, date of birth, address, income and credit score. Using the Alooma Code Engine, we’ll hash the sensitive information that hackers want: date of birth, address, income and credit score.
Original Table - Unencrypted Data
# | first_name | last_name | D.O.B | address | Income | credit score |
---|---|---|---|---|---|---|
1 | Simon | Wong | 07/23/1972 | 34 6th Street | $250,000 | 856 |
2 | Onson | Sweemey | 02/07/1964 | 13 34th Street | $100,000 | 745 |
3 | Sleve | Mcdichal | 10/16/1988 | 18 35th Street | $75,000 | 624 |
4 | Gleanallen | Mixon | 08/22/1990 | 92 39th Street | $84,000 | 766 |
5 | Darryl | Archibeld | 12/12/1982 | 21 36th Street | $62,000 | 512 |
6 | Anatoli | Smorin | 05/30/1974 | 10 37th Street | $35,000 | 254 |
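Step 1 - Run this hashing code in the Alooma Code Engine

    import hashlib

    def hash_password(password):
        # A fixed salt (not a random one) keeps hashes reproducible,
        # so the same value hashes identically across tables.
        salt = '12'
        # str() lets non-string values such as integers be hashed too.
        return hashlib.sha256(salt.encode() + str(password).encode()).hexdigest()

    def hash_event(event, fields):
        # Hash each listed field that is present and non-empty.
        for field in fields:
            if event.get(field):
                event[field] = hash_password(event[field])
        return event
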
Note: The `hash_event` function will only work for flat events. Nested events will not be converted this way.
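If your events are nested, one possible workaround is to address each sensitive field by its sequence of keys. The sketch below is not part of the Code Engine. `hash_nested_event` is a hypothetical helper, and the event layout is invented for illustration.

```python
import hashlib
from typing import Any, Dict, Iterable, Sequence

SALT = "12"  # same fixed, illustrative salt as the flat example

def hash_value(value: Any) -> str:
    """Return the SHA-256 hex digest of salt + str(value)."""
    return hashlib.sha256((SALT + str(value)).encode("utf-8")).hexdigest()

def hash_nested_event(event: Dict[str, Any],
                      paths: Iterable[Sequence[str]]) -> Dict[str, Any]:
    """Hash fields addressed by key paths such as ("customer", "address").

    Missing paths and empty values are skipped. The event is modified in place.
    """
    for path in paths:
        *parents, leaf = path
        node: Any = event
        # Walk down to the dictionary that holds the leaf field.
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict) and node.get(leaf):
            node[leaf] = hash_value(node[leaf])
    return event

event = {"id": 1, "customer": {"name": "Simon",
                               "contact": {"address": "34 6th Street"}}}
hash_nested_event(event, [("customer", "contact", "address"),
                          ("customer", "missing_field")])
print(event["customer"]["name"])                     # Simon
print(len(event["customer"]["contact"]["address"]))  # 64
```

Key tuples are used instead of dotted strings so that field names containing dots, such as `D.O.B`, stay unambiguous.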
Step 2 - Click `Run Test` to view a sample of the unencrypted and encrypted data in the lower panel of the Code Engine
Final Table - Encrypted Data
# | first_name | last_name | D.O.B | address | Income | credit score |
---|---|---|---|---|---|---|
1 | Simon | Wong | 54f55ee8fde0e163256e319cb3e18cdd238d99031b6a3aa24f32074192c9cc6b | 01b628db091f236a504eb9029aef41798f67bffc194177730259789a4a1c964c | 50cf0782d11e5bb5514080b35047ce2d661d7a826e8b84995b029bc37f9583fb | 58ba31f958e9c22ad7c3d98c1849b9cfbf3e34430ef567ee9aa0e13af82b17e8 |
2 | Onson | Sweemey | 1f5ae5f850c5a0afabcef0ba14d0a6b925d22ba59ffafe24cacd7bce406c925f | 03883a6737efb434243a69f6bb798bb6a2fc8174d8f48e652f20b7884c9e8012 | db3ee8084ee3b7100df8c43b21aad3c20326a8dd5bcf74276f0e6fd942dad1ad | b86c1a3640eec9bc237b66051f9d88ba0f21ce176cc35e0eaa42cdd072e38316 |
3 | Sleve | Mcdichal | b58a0f1598e962e12760f18d3608e86ed9f767831c80eba24354ff7c1468a792 | c03aa67a138c8c84c48de3617eb253563bba4652d67cf85b70dafad373819863 | bd54ec4218ff83a351345336006ab613f3d894b6b1d7e46cb5dbb67bafea8b52 | a6745bab61af13d9e0bbd5e54ca9c5dd377f07aab2d51f2e7fe6bef41b42703a |
4 | Gleanallen | Mixon | d5cf7ac42cb51b669fa71f1945124a7617027717e93c56c6021d0780bedd99f1 | 8bbd6676f8d3e79305d90de50f7288e512f5826d8125c95fda8bf46458d24ed4 | 652032d4cdf2f5615455cbdaa747555179f11e0213384e37e1b0aa1288ca92ea | 7a6a56cdae56d8bb046f9fa759029c5f5ea5f7f9d673637752bac2c71ebdb0e7 |
5 | Darryl | Archibeld | 39ed390e4f16bdf548fafe4c1efe26bfe1a6cfe141b3da74f9a56aa9f7cbdab7 | 1ba39fe0250500081fbc94beb31c13f41bdfa530477a9c29e08c50805dc7b243 | bc94e8f46b3b3530c7fc341ccc9e03575203df31c22a44e4abb5c78b9ed0508f | fa1f94262c4ad9e45c99eded9f2406423a40039a60f299d5703114f8ce770e32 |
6 | Anatoli | Smorin | 5b186c25f37dcdf9025600c1153e84eedc932d49cea329da4f256ddb0984d2e8 | aac84dd8c940635d90fd8f8956ba4aff135376be08d91f6b2c1dc78ba34ecc19 | b62cc50ba1c0b73db27f84b56471c7e5ac137f5f615861107ab0dc97ed4f52d8 | fb5f6241b4e9c28e7e7972412f27894c6b9fbfa10e9f5a0539abace709651c0a |
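To try something similar outside the Code Engine, the sketch below applies the same hashing to the first row of the table. We assume here that the Code Engine calls a `transform(event)` function for every event and that the event keys match the column names above. Both are assumptions for illustration, so adjust them to your own setup.

```python
import hashlib

SALT = "12"  # same fixed salt as in Step 1
SENSITIVE_FIELDS = ["D.O.B", "address", "Income", "credit score"]

def hash_value(value):
    """Return the SHA-256 hex digest of salt + str(value)."""
    return hashlib.sha256((SALT + str(value)).encode("utf-8")).hexdigest()

def hash_event(event, fields):
    """Hash each listed field that is present and non-empty (flat events only)."""
    for field in fields:
        if event.get(field):
            event[field] = hash_value(event[field])
    return event

def transform(event):
    # Assumed entry point: called once per event passing through the pipeline.
    return hash_event(event, SENSITIVE_FIELDS)

row = {
    "first_name": "Simon",
    "last_name": "Wong",
    "D.O.B": "07/23/1972",
    "address": "34 6th Street",
    "Income": "$250,000",
    "credit score": 856,
}

result = transform(dict(row))  # copy so the original row stays readable
for field, value in result.items():
    print(f"{field}: {value}")
```

The names are left untouched, while each sensitive field becomes a 64-character hex digest. The exact digests depend on how each value is formatted before hashing.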
Things to pay attention to
When using this hashing method, there are several important points to remember, especially when hashing at scale.
Key points are:
- Once data is hashed, you cannot reverse it. You can obtain the original data and hash it again to check that the result is identical, but you can't go from the hashed form back to the data.
- Hashing algorithms can be CPU intensive and will take a small toll on the resources used for processing. Therefore, using the Code Engine to perform extensive hashing may take a larger toll on the total rate of event processing.
- If you’re using PII columns for sorting (e.g. first & last name) the sorting will be lost after hashing those columns. Keep that in mind when designing your data warehouse's schema.
PII data is rarely required for analytics. That’s why keeping data hashed or encrypted in a consistent manner helps to reduce security risk.
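The sorting caveat above is easy to see in a small sketch. The helper name `hash_value` and the fixed salt are illustrative assumptions, and the names come from the sample table.

```python
import hashlib

SALT = "12"  # fixed, illustrative salt

def hash_value(value: str) -> str:
    """Return the SHA-256 hex digest of salt + value."""
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()

# Last names from the sample table, already in alphabetical order.
last_names = ["Archibeld", "Mcdichal", "Mixon", "Smorin", "Sweemey", "Wong"]

# Ordering by digest has no relation to the alphabetical order,
# so the two lists will almost certainly differ.
by_hash = sorted(last_names, key=hash_value)
print(last_names)
print(by_hash)

# Equality lookups still work, because hashing is reproducible.
hashed_column = {hash_value(name) for name in last_names}
print(hash_value("Wong") in hashed_column)   # True
print(hash_value("Smith") in hashed_column)  # False
```

In practice this means hashed columns remain usable for joins and exact-match filters, but not for range queries or ordered reports.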
Advantages of using Alooma for Hashing PII
Since Alooma is a modern data pipeline service built on top of scalable, distributed streaming infrastructure, there are a number of advantages over home grown solutions for masking, hashing, or encrypting.
Key advantages are:
- Easy to deploy and maintain: with about 10 lines of code, you can configure masking, hashing, or encryption that will work uniformly across all of your data, and will stay resilient even as your data grows and changes.
- Distributed streaming infrastructure for processing: frees you from worrying about scaling computing resources as your data volumes grow, and as encryption algorithms take their toll.
- Automatic compliance: Alooma invests considerable resources into keeping its compliance and certification up to date and to the highest standard.
Alooma takes security seriously
Since the Alooma founders hail from the Israel Defense Forces, security is in our DNA. We take data security extremely seriously and want our customers to trust us and our customers’ customers to trust them. With a combination of technology, experience, and protocols, Alooma upholds the strictest data privacy standards.
We are happy to sign a Business Associate Agreement (BAA) to ensure your data is HIPAA compliant. For an extra layer of security, we are SOC 2 Type II certified. That means our information systems are continuously audited by Ernst & Young to verify that our data security, availability, processing integrity, confidentiality, and privacy standards are met.