User Action Logs: Queue-First Design and Why We Chose Parquet
We used to write all user action logs to PostgreSQL. It was the fastest way to ship the feature, but we knew it wouldn’t scale. As volume grew—user clicks, page views, cart actions, seller operations—the database became a bottleneck. We moved to a queue-first design and from PostgreSQL to Parquet files on GCS. This post explains why we made that shift and how it works.
What are user action logs? In our system, they’re timestamped records of things that happen in the product: a customer viewing a product or adding to cart, a seller updating a package status, an order being confirmed, and so on. They come from many sources, such as the user and cart services, the seller dashboard, and the order and fulfillment back-ends, and we use them for analytics, reporting, and understanding how the platform is used.
The Problem: When Everything Went to PostgreSQL
Writing every action log straight into PostgreSQL caused three issues:
- Latency — The client (or the service emitting the event) had to wait for the write.
- Database load — Traffic growth meant more connections, more disk I/O, and more tuning.
- Schema evolution — Adding or changing columns for wide, nested payloads meant migrations and backfills.
We wanted ingestion to be non-blocking for producers and cost-effective for storage and analytics. That’s why we moved user action logs from PostgreSQL to Parquet files on GCS.
High-Level Design: Queue Then Process
We use a single message queue (AWS SQS) as the buffer between producers and our logging backend. Services (e.g. order, seller, user) don’t call the logging API and wait—they publish a message and return. A dedicated logging service, Ore, consumes from the queue and writes Parquet files to GCS.
Benefits of the queue approach:
- Decoupling — Producers depend only on SQS, not on Ore’s availability or speed.
- Backpressure — Traffic spikes sit in the queue, and Ore consumes at a sustainable rate.
- Retries — Failed processing can be retried per message without the client re-sending.
- Batching — The consumer processes many messages in one run (e.g. one Lambda batch) and writes one or more Parquet files per batch, cutting I/O and cost.
Each message carries a type (e.g. user action, seller action) so Ore can route to the right schema and write Parquet files to GCS.
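To make that concrete, here is a minimal sketch of what a producer might publish. The payload fields, queue URL, and attribute name are illustrative rather than our actual schema, and the SQS call assumes the standard boto3 client:

```python
import json
from datetime import datetime, timezone

def build_action_message(log_type: str, action_type: str, user_id: str) -> dict:
    """Build an SQS message for a user action log.

    The body stays small: an action type, identifiers, and a timestamp.
    The log type travels as a message attribute, so the consumer can route
    the message to the right schema without parsing the body.
    """
    body = {
        "actionType": action_type,
        "userId": user_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return {
        "MessageBody": json.dumps(body),
        "MessageAttributes": {
            "logType": {"DataType": "String", "StringValue": log_type},
        },
    }

def publish(queue_url: str, message: dict) -> None:
    # Producers publish and return immediately; nothing waits on the
    # logging backend. Requires boto3 and AWS credentials at runtime.
    import boto3  # imported lazily so the sketch runs without AWS installed

    sqs = boto3.client("sqs")
    sqs.send_message(QueueUrl=queue_url, **message)

msg = build_action_message("user_action", "add_to_cart", "user-123")
```

The important property is that `build_action_message` is cheap and `publish` is fire-and-forget; the producer never touches the database.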
Why We Moved from PostgreSQL to Parquet
Several factors drove the move to Parquet files:
| Concern | PostgreSQL | Parquet (object storage) |
|---|---|---|
| Write throughput | Limited by DB connections and disk, and scaling is vertical or via replicas. | Append-only files, and throughput scales with object storage and consumer parallelism. |
| Storage cost | Higher per GB, and indexes and tables add overhead. | Columnar format compresses well, and cold storage (e.g. GCS) is cheap. |
| Schema flexibility | Migrations and backfills for new columns are heavy. | New columns can be optional, and readers ignore unknown fields. |
| Analytics | Heavy aggregations can lock or slow the OLTP DB. | We query these files in BigQuery, and engines like Athena or Spark read Parquet natively without touching the main DB. |
| Retention | Deleting or archiving old rows is non-trivial. | Lifecycle policies (e.g. move to colder storage or delete by prefix) are simple. |
In short: the queue keeps ingestion async and resilient, and the move from PostgreSQL to Parquet keeps storage and analytics scalable and cheap.
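As an example of how simple retention becomes, a single GCS lifecycle configuration (bucket prefix and ages below are illustrative) can demote old logs to colder storage and eventually delete them, with no DELETE queries or vacuuming on a live database:

```json
{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
        "condition": {"age": 90, "matchesPrefix": ["user-action-logs/"]}
      },
      {
        "action": {"type": "Delete"},
        "condition": {"age": 365, "matchesPrefix": ["user-action-logs/"]}
      }
    ]
  }
}
```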
End-to-End Flow in a Nutshell
- Producers (user, seller, order services, etc.) build a small payload (e.g. actionType, identifiers, timestamps) and send it to a central SQS queue, with a message attribute for the log type.
- Ore is triggered by the queue (e.g. via Lambda) and reads a batch of messages.
- Ore maps payloads to a flat, column-oriented structure and writes Parquet files to GCS.
- We query the Parquet logs in BigQuery (e.g. via external tables over GCS) for analytics and reporting. For the pipeline that moves this data into our reporting database, see Masora: Bridging the Gap Between Raw Data and Actionable Insights.
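The mapping step in that flow, turning a batch of JSON messages into a flat, column-oriented structure, might look roughly like this. Field names are illustrative; in production the column list comes from the schema selected by the message’s log type, and the result is handed to a Parquet writer such as pyarrow:

```python
import json

# Columns for the "user action" schema. Unknown fields in a message are
# ignored, and missing fields become None, which Parquet stores as nulls.
# This is what makes adding an optional column cheap: older messages
# simply carry nulls in the new column.
USER_ACTION_COLUMNS = ["actionType", "userId", "timestamp"]

def to_columns(messages: list[str], columns: list[str]) -> dict[str, list]:
    """Map a batch of JSON message bodies to a column-oriented dict."""
    table = {name: [] for name in columns}
    for raw in messages:
        record = json.loads(raw)
        for name in columns:
            table[name].append(record.get(name))  # None if the field is absent
    return table

# In the real consumer, the columns are then written out, e.g.:
#   import pyarrow as pa, pyarrow.parquet as pq
#   pq.write_table(pa.table(table), "gs://<bucket>/<prefix>/batch.parquet")

batch = [
    '{"actionType": "view_product", "userId": "u1", "timestamp": "2024-01-01T00:00:00Z"}',
    '{"actionType": "add_to_cart", "userId": "u2"}',  # older message, no timestamp
]
cols = to_columns(batch, USER_ACTION_COLUMNS)
```

One file per batch (or a few, split by log type) is what keeps I/O and per-request cost low compared with row-at-a-time database inserts.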
From the client’s perspective, there is no synchronous database write; the request succeeds as soon as the message reaches the queue.
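For the querying step, BigQuery can expose the Parquet files on GCS through an external table. A sketch of the DDL, with dataset, table, and bucket names as placeholders:

```sql
CREATE EXTERNAL TABLE analytics.user_action_logs
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-logs-bucket/user-action-logs/*.parquet']
);
```

After that, the logs are queryable with ordinary SQL without any load step, and the OLTP database is never involved.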
Takeaways
- We used to write all user action logs to PostgreSQL. We now write them as Parquet files on GCS.
- Queue-first ingestion keeps producers fast and decouples them from the logging backend.
- Parquet on GCS gives us high write throughput and low storage cost. We query the logs in BigQuery for analytics, with Masora feeding our reporting database.
If you’re writing everything to a single database and feeling the pinch, moving those streams to Parquet—and putting a queue in front—is a pattern worth considering. It’s scaled well for us without overloading the core database.