Masora: Bridging the Gap Between Raw Data and Actionable Insights
Introduction
Every hour, thousands of user actions flow through our systems—cart confirmations, cancellations, and interactions that tell the story of how users engage with our platform. But raw data sitting in storage doesn’t create value on its own. That’s where Masora comes in.
Masora is our data pipeline service. It transforms raw user action data stored in Google Cloud Storage into structured data ready for analytics. Think of it as a translator that takes the language of Parquet files and converts it into the dialect our PostgreSQL reporting database understands.
The Challenge We’re Solving
When information is scattered across different storage systems, analysis becomes a bottleneck. Our user action data, particularly WhatsApp interactions such as cart cancellations and confirmations, was sitting in Parquet files in Google Cloud Storage. The format is efficient for storage, but the data wasn’t readily accessible for the analytics and reporting our team needs to make informed decisions.
That’s the problem Masora solves. It bridges the gap between our raw data storage and our analytics infrastructure, ensuring that every user interaction is captured, processed, and made available for analysis.
How Masora Works
The beauty of Masora lies in its simplicity and reliability. The pipeline is orchestrated by Google Cloud Scheduler, which triggers the process every hour. When the scheduler sends an HTTP request to Masora’s entry point, the service springs into action.
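To make the trigger concrete, here is a minimal sketch of what an HTTP entry point of this kind could look like, using only the Python standard library. It is not Masora’s actual code: the `/run` path, the `run_pipeline` function, and the JSON response shape are all assumptions made for illustration. The only platform detail relied on is that Cloud Run passes the listening port through the `PORT` environment variable.

```python
import json
import os
from wsgiref.simple_server import make_server


def run_pipeline():
    """Hypothetical stand-in for one pipeline run (query, then batch write)."""
    return {"status": "ok"}


def app(environ, start_response):
    """WSGI app: POST /run starts a pipeline run, anything else is a 404."""
    # The /run path is an assumption; the real entry point may differ.
    is_trigger = (
        environ.get("REQUEST_METHOD") == "POST"
        and environ.get("PATH_INFO") == "/run"
    )
    if is_trigger:
        status, payload = "200 OK", run_pipeline()
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve():
    """Listen on the port Cloud Run provides through the PORT variable."""
    port = int(os.environ.get("PORT", "8080"))
    with make_server("", port, app) as server:
        server.serve_forever()
```

With something like this in place, a Cloud Scheduler job on an hourly cron schedule (for example `0 * * * *`) would send the POST request, and the same endpoint could be called by hand for a manual run.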
Here’s what happens behind the scenes: Masora uses BigQuery’s external table capabilities to query the Parquet files directly in Google Cloud Storage. This means we don’t need to move or duplicate data; we can query it right where it lives. BigQuery processes the user action data, and Masora then writes the results in carefully managed batches to our PostgreSQL reporting database.
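As a rough illustration of the external table idea, the sketch below builds the kind of BigQuery SQL involved: a DDL statement that defines an external table over Parquet files in a bucket, and a query against it. The project, dataset, table, bucket path, and column names (`user_id`, `action_type`, `occurred_at`) are all hypothetical, as is the one-hour filter; Masora’s real schema and queries may look quite different.

```python
def external_table_ddl(table, uri_pattern):
    """Build DDL for a BigQuery external table over Parquet files in GCS."""
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table}`\n"
        "OPTIONS (\n"
        "  format = 'PARQUET',\n"
        f"  uris = ['{uri_pattern}']\n"
        ")"
    )


def recent_actions_query(table):
    """Build a query for the last hour of actions (column names are assumed)."""
    return (
        "SELECT user_id, action_type, occurred_at\n"
        f"FROM `{table}`\n"
        "WHERE occurred_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)"
    )


# Hypothetical identifiers, used only to show the generated SQL.
TABLE = "my-project.analytics.user_actions_ext"
URIS = "gs://example-bucket/user-actions/*.parquet"

print(external_table_ddl(TABLE, URIS))
print(recent_actions_query(TABLE))
```

The data never leaves the bucket: the external table is only a definition that tells BigQuery where the files are and how to read them. In a real service, statements like these would typically be submitted through the `google-cloud-bigquery` client, for example with `bigquery.Client().query(sql).result()`.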
Writing in batches isn’t just about efficiency: it keeps the write load on our reporting database manageable, maintaining performance even as data volumes grow. The entire process runs on Google Cloud Run, which lets us scale up or down based on demand and trigger a run manually via HTTP endpoints when needed.
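The batching step can be sketched in a few lines of Python. This is an illustration of the general technique rather than Masora’s implementation: the `write_in_batches` and `batched` names, the `write_batch` callback, and the default batch size of 500 are assumptions chosen for the example.

```python
from itertools import islice


def batched(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def write_in_batches(rows, write_batch, batch_size=500):
    """Send rows to write_batch one batch at a time; return rows written.

    write_batch is a hypothetical callback that persists one list of rows,
    for example by running a multi-row INSERT against PostgreSQL.
    """
    written = 0
    for batch in batched(rows, batch_size):
        write_batch(batch)
        written += len(batch)
    return written


# Usage with an in-memory stand-in for the database writer.
collected = []
total = write_in_batches(range(7), collected.append, batch_size=3)
print(total, collected)
```

Because the rows are consumed lazily, only one batch is held in memory at a time, and the database sees a series of bounded writes instead of one very large one. With a PostgreSQL driver such as `psycopg2`, the callback could wrap `cursor.executemany` or `psycopg2.extras.execute_values`.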
The result? A seamless flow from raw user actions to actionable insights, running automatically in the background so our team can focus on what matters most—understanding our users and improving their experience.