Data cleansing, without a catch
Cleanse your mismatches
Automatically parse different formats
Enrich your data on the fly
Integrate your data into a single source of truth
All of your data is safe and accounted for
View your data in real time
Data cleansing FAQ
Data cleansing refers to the detection and removal of errors from collected data. As part of the ETL process, whenever large amounts of data from multiple sources require integration, there is a likelihood that some data will be redundant, erroneous, missing, or invalid. Also, data types and formats may need adjustment to match those in the destination data warehouse schema. Learn more about data cleansing.
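The format-adjustment step mentioned above can be sketched in Python. This is a minimal illustration, not any particular tool's implementation; the list of source date formats is an assumption for the example.

```python
from datetime import datetime

# Candidate source formats (assumed for illustration only).
SOURCE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"]

def normalize_date(raw: str) -> str:
    """Parse a date string in any known source format and
    return it in ISO 8601 (YYYY-MM-DD) for the warehouse schema."""
    for fmt in SOURCE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")

print(normalize_date("03/14/2021"))  # 2021-03-14
print(normalize_date("14 Mar 2021"))  # 2021-03-14
```

In practice the accepted formats would be driven by what each source actually emits, and unparseable values would be quarantined rather than raised.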
Data-driven decisions are crucial to modern business, but errors in your data sources, or inconsistencies between them, will undermine your decision making. Data cleansing is essential when integrating heterogeneous data sources: it ensures that data arrives in your data warehouse uniform and ready to query, and lets you correct data integrity problems before they affect decisions.
Despite your best efforts, data from each of your sources may contain misspellings, mis-fielded information, duplicate records, or even contradictory or incorrect information.
Some examples of data cleansing include:
- Deduplicating records;
- Matching data (do records in a common dataset share a common format?);
- Removing garbage or nonsense values;
- Removing null values;
- Removing errors;
- Enriching data with supplementary information, e.g. geodata or timestamps;
- Validating data such as addresses and URLs;
- Fixing records according to a predefined ruleset.
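Several of the steps above can be combined in one pass over the data. The following Python sketch deduplicates records, drops null and garbage values, and applies a simple predefined rule; the record shape, the garbage list, and the country-code ruleset are all assumptions made for illustration.

```python
def cleanse(records):
    """Deduplicate, drop null/garbage fields, and apply a simple
    predefined rule (normalize country codes) to each record."""
    COUNTRY_RULES = {"U.S.": "US", "USA": "US", "United States": "US"}
    GARBAGE = {"", "N/A", "?", "null"}  # assumed nonsense values

    seen = set()
    cleaned = []
    for rec in records:
        # Drop fields whose value is missing or nonsense.
        rec = {k: v for k, v in rec.items()
               if v is not None and str(v).strip() not in GARBAGE}
        # Fix records according to a predefined ruleset.
        if "country" in rec:
            rec["country"] = COUNTRY_RULES.get(rec["country"], rec["country"])
        # Deduplicate on the record's full contents.
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(rec)
    return cleaned

rows = [
    {"name": "Ada", "country": "USA", "email": None},
    {"name": "Ada", "country": "US"},
    {"name": "Bob", "country": "N/A"},
]
print(cleanse(rows))  # [{'name': 'Ada', 'country': 'US'}, {'name': 'Bob'}]
```

Note that the rule step runs before deduplication, so two records that differ only in country-code spelling collapse into one.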
Data cleansing should not occur in isolation, but in conjunction with the other data transformations in your pipeline.