Encrypt Private Health and Financial Data in Real-Time

by Daniel Ben Yosef & Jason Lim  
8 min read  • 14 Dec 2017

Panic, shame, anger. These are emotions that hit you when your personal health or financial data has been stolen by a stranger. Even just the fear of having your data compromised is unsettling. No one expects this to happen to them, especially when they place their trust in the system.

This is why health and financial institutions have a duty to safeguard private and sensitive patient and client information. Health institutions of course hold our most sensitive information — medical records, things we may not even want family to know about. Financial institutions hold our valuable social security number, banking details, and payment information.

According to the 2017 Ponemon Institute Cost of a Data Breach Study sponsored by IBM, the average global cost of a data breach for a health organization is $9.8 million and $6.3 million for a financial organization. Naturally, these two industries have the highest average cost per stolen record: health care is $380 and financial services is $245, much more expensive than the global average industry cost of $141 per stolen record.

The slew of high-profile cyber attacks on citizens is a stark reminder of how vulnerable our sensitive information really is. The very recent double Equifax data breach potentially impacts 143 million U.S. consumers, pummelling both the Equifax stock and, most importantly, consumer confidence. Anthem, the largest health insurance company in the U.S., recently agreed to pay a record $115 million to settle a class action lawsuit over a 2015 data breach that exposed more than 78 million healthcare records. In both cases, the intangible consequences were just as damaging as the financial costs, namely bruised company morale and lost leadership from executive attrition.

Given how much personal data is being stored digitally in the cloud, important regulations exist to ensure organizations do their best to protect us.

For healthcare, the HIPAA Privacy Rule is a national U.S. standard that requires healthcare providers and organizations, as well as their business associates, to develop and follow procedures that ensure the confidentiality and security of protected health information (PHI) when it is transferred, received, handled, or shared. The penalties for noncompliance can be very expensive: they are based on the level of negligence and range from $100 to $50,000 per violation (or per record), with a maximum penalty of $1.5 million per year for violations of an identical provision.

For financial privacy, the Federal Trade Commission (FTC) under federal law requires banks, securities firms, insurance companies, and companies providing many other types of financial products and services to maintain safeguards to protect customer information and prevent malicious access to it. Noncompliance fines are even more damaging than HIPAA violations. The fines for not being PCI compliant range from $5,000 to $500,000 and are levied by banks and credit card institutions. Even if a company is 100% PCI compliant and validated, a breach in cardholder data may still occur if the institution isn’t careful.

Malicious attacks are the biggest cause of data breaches

According to the 2017 Ponemon Study, the main cause responsible for 42% of data breaches is malicious or criminal attacks. 25% of breaches are due to human error, especially negligence, and 25% involved system glitches including both IT and business process failures. Since malicious attacks are obviously intentional and can even come from internal bad actors, they are the most costly.

The consequences of not taking proper precautions are severe. For instance, CardioNet, a maker of wireless devices for heart patients, was fined $2.5 million for a 2012 incident in which an employee’s laptop computer was stolen from an unlocked car. The damage to brand reputation is even more difficult to overcome when customers churn and partners cut ties. According to a Forbes Insights report, ‘Fallout: The Reputational Impact of IT Risk’, 46% of organizations had suffered damage to their reputations and brand value as a result of a breach.

Use encryption or hashing in real-time at scale

It’s clear that safeguarding private and sensitive customer and patient information is paramount. But for massive health and financial organizations, actually implementing the best controls and security is challenging. When troves of data originate from multiple sources and are stored across one or more databases and data warehouses, how do you protect data at scale?

As experts in moving data and data integration, we at Alooma believe that encryption at scale is one really good answer. Encryption is the process of taking meaningful information and turning it into an unreadable form (usually random-looking data) to prevent unauthorized access. Encryption is a reversible process: authorized entities that hold the key can convert the unreadable form back to meaningful information.

Another way to prevent unauthorized access to data is hashing. Hashing is a one-way process that turns meaningful information into a fixed-length, random-looking value in a reproducible yet irreversible way. Hashing is good for storing passwords: even if someone obtains the hashed password, they cannot reverse it to the real password, but if you have the password, you can easily hash it again and check that it matches. Hashing is also useful when a private detail (like a name or SSN) is shared across different tables or datasets. In this case, the hashed value stays identical across the different tables, keeping the relationships in the data intact.
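
To make both uses concrete, here is a minimal Python sketch that uses only the standard library. The function names, the `SECRET_KEY` value, and the sample records are hypothetical, and the choice of a keyed hash (HMAC-SHA256) instead of a bare hash is our assumption for illustration, not a requirement of any particular pipeline.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be loaded from a secrets manager.
SECRET_KEY = b"replace-with-a-real-secret"

def pseudonymize(value):
    """Return a deterministic, keyed SHA-256 digest of a string value."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def matches(candidate, stored_digest):
    """Hash the candidate again and compare it to the stored digest."""
    return hmac.compare_digest(pseudonymize(candidate), stored_digest)

# Two hypothetical tables that share the same (dummy) SSN.
accounts = [{"ssn": pseudonymize("078-05-1120"), "balance": 1200}]
claims = [{"ssn": pseudonymize("078-05-1120"), "claim_id": "C-17"}]

# The hashed SSN is identical in both tables, so it still works as a join key.
joined = [
    (a["balance"], c["claim_id"])
    for a in accounts
    for c in claims
    if a["ssn"] == c["ssn"]
]
print(joined)  # [(1200, 'C-17')]
```

For real password storage, a deliberately slow function such as `hashlib.pbkdf2_hmac` or `hashlib.scrypt` is usually a better fit than a single SHA-256 pass; the sketch above is aimed at the join-key use case.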

A third way to prevent unauthorized access to data is anonymization, or masking. Here the personal information is simply removed, leaving no trace of what it was.
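
As a rough illustration of masking, the sketch below drops the listed fields from a record. The `mask_event` name, its `placeholder` option (which replaces a value instead of deleting it), and the sample event are all hypothetical.

```python
import copy

def mask_event(event, fields, placeholder=None):
    """Return a copy of the event with the given fields removed or replaced."""
    masked = copy.deepcopy(event)
    for field in fields:
        if field in masked:
            if placeholder is None:
                del masked[field]            # remove the value entirely
            else:
                masked[field] = placeholder  # keep the column, hide the value
    return masked

event = {"first_name": "Simon", "last_name": "Wong", "address": "34 6th Street"}
print(mask_event(event, ["address"]))
# {'first_name': 'Simon', 'last_name': 'Wong'}
print(mask_event(event, ["address"], placeholder="***"))
# {'first_name': 'Simon', 'last_name': 'Wong', 'address': '***'}
```

Unlike hashing, a masked field cannot be used to join tables or to verify a known value, because nothing derived from the original remains.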

So, how can you do this? Let’s go through an example.

Two-step example using Alooma

Since it’s common for related customer or patient data to be stored in different tables in a data warehouse, it’s most efficient to perform data encryption in transit using a data pipeline before it hits your data warehouse.

In this example, we will demonstrate how to hash dummy customer information. The table below lists six customers with their name, date of birth, address, income, and credit score. Using the Alooma Code Engine, we’ll hash the sensitive information that hackers want: date of birth, address, income, and credit score.

Original Table - Unencrypted Data

#  first_name  last_name  D.O.B       address         income    credit_score
1  Simon       Wong       07/23/1972  34 6th Street   $250,000  856
2  Onson       Sweemey    02/07/1964  13 34th Street  $100,000  745
3  Sleve       Mcdichal   10/16/1988  18 35th Street  $75,000   624
4  Gleanallen  Mixon      08/22/1990  92 39th Street  $84,000   766
5  Darryl      Archibeld  12/12/1982  21 36th Street  $62,000   512
6  Anatoli     Smorin     05/30/1974  10 37th Street  $35,000   254

Step 1 - Run this hashing code in the Alooma Code Engine

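import hashlib

def hash_password(password):
  # A fixed salt keeps the output reproducible, so the same value always
  # produces the same hash across events and tables.
  # In production, use a long secret value instead of a short hard-coded one.
  salt = '12'
  # Salted SHA-256 of one string value, returned as a 64-character hex string
  return hashlib.sha256(salt.encode() + password.encode()).hexdigest()

def hash_event(event, fields):
  # Replace each listed top-level field with its hash.
  # Missing or empty fields are skipped instead of raising a KeyError.
  for field in fields:
    if event.get(field):
      event[field] = hash_password(event[field])

  return event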

Note: The hash_event function only works on flat events. Fields inside nested events will not be hashed this way.
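
If your events are nested, one possible approach is to address fields by dotted paths. The sketch below is an illustration under stated assumptions, not part of the example above: `hash_nested`, the dotted-path convention, the field names, and the `transform` entry point name are all hypothetical, so check your pipeline's documentation for the actual hook.

```python
import hashlib

SALT = "12"  # same fixed salt as the flat example; illustrative only

def hash_value(value):
    """Salted SHA-256 of a value, converted to text first."""
    return hashlib.sha256((SALT + str(value)).encode("utf-8")).hexdigest()

def hash_nested(event, paths):
    """Hash fields addressed by dotted paths such as 'customer.address'."""
    for path in paths:
        node = event
        *parents, leaf = path.split(".")
        # Walk down to the dictionary that holds the leaf field.
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        # Skip paths that do not exist or hold an empty value.
        if isinstance(node, dict) and node.get(leaf) not in (None, ""):
            node[leaf] = hash_value(node[leaf])
    return event

def transform(event):
    # Assumed entry point name and field paths, for illustration only.
    return hash_nested(event, ["customer.address", "customer.profile.dob", "income"])

sample = {
    "first_name": "Simon",
    "income": 250000,
    "customer": {"address": "34 6th Street", "profile": {"dob": "07/23/1972"}},
}
print(transform(sample))
```

Paths that are missing from an event are left alone, so the same path list can be applied to events with different shapes.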


Step 2 - Click `Run Test` to view a sample of the original and hashed data in the lower panel of the Code Engine


Final Table - Hashed Data

# first_name last_name D.O.B address income credit_score
1 Simon Wong 54f55ee8fde0e163256e319cb3e18cdd238d99031b6a3aa24f32074192c9cc6b 01b628db091f236a504eb9029aef41798f67bffc194177730259789a4a1c964c 50cf0782d11e5bb5514080b35047ce2d661d7a826e8b84995b029bc37f9583fb 58ba31f958e9c22ad7c3d98c1849b9cfbf3e34430ef567ee9aa0e13af82b17e8
2 Onson Sweemey 1f5ae5f850c5a0afabcef0ba14d0a6b925d22ba59ffafe24cacd7bce406c925f 03883a6737efb434243a69f6bb798bb6a2fc8174d8f48e652f20b7884c9e8012 db3ee8084ee3b7100df8c43b21aad3c20326a8dd5bcf74276f0e6fd942dad1ad b86c1a3640eec9bc237b66051f9d88ba0f21ce176cc35e0eaa42cdd072e38316
3 Sleve Mcdichal b58a0f1598e962e12760f18d3608e86ed9f767831c80eba24354ff7c1468a792 c03aa67a138c8c84c48de3617eb253563bba4652d67cf85b70dafad373819863 bd54ec4218ff83a351345336006ab613f3d894b6b1d7e46cb5dbb67bafea8b52 a6745bab61af13d9e0bbd5e54ca9c5dd377f07aab2d51f2e7fe6bef41b42703a
4 Gleanallen Mixon d5cf7ac42cb51b669fa71f1945124a7617027717e93c56c6021d0780bedd99f1 8bbd6676f8d3e79305d90de50f7288e512f5826d8125c95fda8bf46458d24ed4 652032d4cdf2f5615455cbdaa747555179f11e0213384e37e1b0aa1288ca92ea 7a6a56cdae56d8bb046f9fa759029c5f5ea5f7f9d673637752bac2c71ebdb0e7
5 Darryl Archibeld 39ed390e4f16bdf548fafe4c1efe26bfe1a6cfe141b3da74f9a56aa9f7cbdab7 1ba39fe0250500081fbc94beb31c13f41bdfa530477a9c29e08c50805dc7b243 bc94e8f46b3b3530c7fc341ccc9e03575203df31c22a44e4abb5c78b9ed0508f fa1f94262c4ad9e45c99eded9f2406423a40039a60f299d5703114f8ce770e32
6 Anatoli Smorin 5b186c25f37dcdf9025600c1153e84eedc932d49cea329da4f256ddb0984d2e8 aac84dd8c940635d90fd8f8956ba4aff135376be08d91f6b2c1dc78ba34ecc19 b62cc50ba1c0b73db27f84b56471c7e5ac137f5f615861107ab0dc97ed4f52d8 fb5f6241b4e9c28e7e7972412f27894c6b9fbfa10e9f5a0539abace709651c0a

Things to pay attention to

When using this hashing method, there are several important points to remember, especially when hashing at scale.

Key points are:

  • Once data is hashed, you cannot reverse it. You can hash the original data again to confirm that it matches, but you can't go from the hashed form back to the data. Note that short, guessable values (such as a three-digit credit score) can still be recovered by hashing every candidate value, so keep the salt secret.
  • Hashing algorithms are CPU intensive, so each hashed field adds processing cost. Hashing many fields per event in the Code Engine can noticeably reduce the total rate of event processing.
  • If you’re using PII columns for sorting (e.g. first & last name) the sorting will be lost after hashing those columns. Keep that in mind when designing your data warehouse's schema.
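
The first and last points can be checked with a few lines of Python. This is a small sketch that reuses the fixed salt from the example above; `hash_value` is a hypothetical helper name, and the last names are taken from the dummy table.

```python
import hashlib

SALT = "12"  # fixed salt from the example above

def hash_value(value):
    """Salted SHA-256 of a string, as a hex digest."""
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()

last_names = ["Archibeld", "Mcdichal", "Mixon", "Smorin", "Sweemey", "Wong"]
hashed = {name: hash_value(name) for name in last_names}

# Verification: hash the known value again and compare with the stored hash.
assert hash_value("Wong") == hashed["Wong"]

# Sorting by digest orders rows by the hex string, not by the original name.
by_name = sorted(last_names)
by_digest = sorted(last_names, key=lambda name: hashed[name])
print(by_name)
print(by_digest)
```

The two printed lists contain the same names, but the order of the second one depends only on the digests, so any sort key or range filter built on the original column stops being meaningful after hashing.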

PII is rarely required for analytics. That’s why keeping it hashed or encrypted in a consistent manner reduces security risk without hurting most analyses.

Advantages of using Alooma for Hashing PII

Since Alooma is a modern data pipeline service built on top of scalable, distributed streaming infrastructure, it offers a number of advantages over homegrown solutions for masking, hashing, or encrypting.

Key advantages are:

  • Easy to deploy and maintain: with about 10 lines of code, you can configure masking, hashing, or encryption that will work uniformly across all of your data, and will stay resilient even as your data grows and changes.
  • Distributed streaming infrastructure for processing: frees you from worrying about scaling computing resources as your data volumes grow, and as encryption algorithms take their toll.
  • Automatic compliance: Alooma invests considerable resources into keeping its compliance and certification up to date and to the highest standard.

Alooma takes security seriously

Since the Alooma founders hail from the Israel Defense Forces, security is in our DNA. We take data security extremely seriously: we want our customers to trust us, and our customers’ customers to trust them. With a combination of technology, experience, and protocols, Alooma upholds the strictest data privacy standards.

We are happy to sign a Business Associate Agreement (BAA) to ensure your data is HIPAA compliant. For an extra layer of security, we are SOC 2 Type II certified. That means our information systems are continuously audited by Ernst & Young to make sure our data security, availability, processing integrity, confidentiality, and privacy standards are met.
