GoFundMe lacked a central warehouse for the data available from its backend relational databases, online events and metrics, support service, and other internal and external sources, which together generated about one billion events every month. Without centralization, analytics were piecemeal and siloed, preventing the IT staff from getting a holistic view of where the business was going.
Michal Bujak, technical lead on the analytics project, understood the challenge involved in building a data pipeline to get just such a holistic view: “Data is spread across multiple sources and different systems in different formats. Some of it’s flat, some is relational, some is JSON, and writing custom scripts to integrate it all seemed too difficult to even attempt”—an effort he estimates would have taken two or three people upwards of a year to complete.
In a word: flexibility. Navigating the uncharted waters of crowdfunding would require a data analytics solution as adaptable as the business itself, and that in turn requires a flexible data pipeline. According to Bujak, “We didn’t want to be locked into any product’s schema or toolset, and with Alooma we got a data pipeline that has connectors for all of our data sources, along with the ability to write custom Python scripts to transform any of our data as needed. In effect, Alooma has given us an out-of-the-box pipeline that lets us retain full control over all of our data.”
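To give a sense of what an in-pipeline Python transformation looks like, here is a minimal sketch in the style of a per-event transform function: each incoming event arrives as a Python dict, and the function returns the (possibly modified) dict, or None to drop it. The field names (`amount_cents`, `created_at`, `type`) are hypothetical and not drawn from GoFundMe's actual schema.

```python
from datetime import datetime, timezone

def transform(event):
    """Per-event transform sketch: normalize one event dict, or drop it."""
    # Drop internal heartbeat events entirely by returning None.
    if event.get("type") == "heartbeat":
        return None

    # Normalize a monetary amount from cents to dollars.
    if "amount_cents" in event:
        event["amount_usd"] = event.pop("amount_cents") / 100.0

    # Normalize timestamps to ISO 8601 UTC so flat, relational, and
    # JSON sources agree on a single format in the warehouse.
    ts = event.get("created_at")
    if isinstance(ts, (int, float)):
        event["created_at"] = datetime.fromtimestamp(
            ts, tz=timezone.utc
        ).isoformat()
    return event
```

A function like this is where schema differences between flat, relational, and JSON sources get smoothed out before the data lands in the warehouse.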
With Alooma, GoFundMe got not only flexibility and simplicity but also data integrity. Custom ETL scripts often corrupt data in ways that go undetected, so Alooma builds in several safeguards against the misleading “garbage in, garbage out” results that can plague businesses. One of the safeguards Bujak values most is the ability to re-stream data to ensure that nothing gets lost, broken, or duplicated as it transits the data pipeline.
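Re-streaming is safe when the load step is idempotent: if every event carries a unique ID, replaying a batch skips duplicates instead of creating double rows. The toy in-memory sketch below illustrates that idea only; it is not Alooma's actual implementation, and the event IDs are invented.

```python
def load_events(events, warehouse=None, seen_ids=None):
    """Idempotent load sketch: skip events whose IDs were already loaded."""
    warehouse = [] if warehouse is None else warehouse
    seen_ids = set() if seen_ids is None else seen_ids
    for event in events:
        if event["id"] in seen_ids:
            continue  # replayed duplicate from a re-stream: skip it
        seen_ids.add(event["id"])
        warehouse.append(event)
    return warehouse

batch = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
seen = set()
rows = load_events(batch, seen_ids=seen)
# Re-streaming the same batch leaves the warehouse unchanged.
rows = load_events(batch, rows, seen)
```

Because a replay produces the same warehouse state as the original run, the pipeline can re-stream freely after a failure without risking duplicated rows.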