Enterprises building data lakes and warehouses have been turning to third-party vendors to bring in data. Building connectors to applications like Salesforce is tedious and requires data engineering expertise. Plus, SaaS platforms frequently change their data schemas and API details.
These data pipeline vendors extract data from the source systems and load it into the warehouse. They will also clean up or sync the data between different systems, either before it reaches the warehouse or after.
Today, more and more companies are opting for the latter, transforming the data after it lands in the warehouse.
With the first approach, transforming the data before it reaches the warehouse, you lose traceability, said Goutham Belliappa, vice president of artificial intelligence engineering at Capgemini North America.
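The traceability point can be sketched in a few lines. This is a hypothetical illustration, not any vendor's implementation: all function and table names are made up, and the "warehouse" is just a dict. The difference is only the ordering of the transform step, but with the transform-after-load ordering the untouched source rows stay queryable.

```python
def extract():
    # Raw records as they arrive from a source system (e.g., a SaaS API).
    return [{"id": 1, "email": " A@EXAMPLE.COM "}, {"id": 2, "email": "b@example.com"}]

def transform(records):
    # A simple cleanup step: normalize email addresses.
    return [{**r, "email": r["email"].strip().lower()} for r in records]

warehouse = {}

def load(table, records):
    warehouse[table] = records

# Transform before loading: only cleaned rows ever reach the warehouse.
load("contacts_cleaned_only", transform(extract()))

# Load first, transform inside the warehouse: the raw rows are preserved,
# so a cleaned value can always be traced back to its original form.
load("contacts_raw", extract())
load("contacts_cleaned", transform(warehouse["contacts_raw"]))
```

In the first ordering, the original `" A@EXAMPLE.COM "` value is gone by the time the data lands; in the second, it survives in `contacts_raw` alongside the normalized copy.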