Have you ever encountered a situation where your downstream data pipelines are blocked by a small manual mistake in a Google Sheet? Sometimes, the sheet is not even owned by your team, so you can’t do anything but chase the sheet owner to fix it. Meanwhile, many other critical pipelines are failing as a consequence, and you need to take care of them as well.
You feel exhausted and drained. The worst part is that there is nothing you can really do as an engineer. It’s all endless communication and stakeholder management. The Google Sheet issue is just one example of the source issues that can occur at any scale. Take a moment to think of one such issue that resonates with you as we delve into the article.
A key to improving this situation is automating the communication lifecycle within your data pipelines. If your pipeline has an alerting mechanism in place, then it’s already a good start. However, alerts primarily target the data engineering teams rather than external teams.
Based on my experience, it’s equally vital to establish proactive communication with the source team and end users, so that they are well informed about ongoing situations and can take action accordingly. Throughout this article, I will use Mage, a modern Airflow alternative whose features are well suited to this kind of problem, for the implementation.
One of the missions of engineers is to automate things. It saves us time in the future, and it is fun. Nobody enjoys continuously chasing the source team to fix data issues or individually explaining what happened to end users when things are not working. We would rather let a bot do it for us. There are two levels of automation we can implement:
Immediate feedback to the data source team: Rather than informing the source team about a data issue manually, we can establish an automated and consistent channel of communication through a bot. Whenever a data test fails, a callback-like function is triggered to notify the source team via email or Slack, providing them with detailed reasons for…