The morning everything fell apart
Tuesday, 8:47 AM. Datadog is screaming. Average latency at 14 seconds per request. The database is sweating. The job queue is 40 minutes deep. And 23 customers are waiting for Novaseed to surface fresh Reddit threads for their morning prospecting session.
We'd grown volume faster than we'd grown infrastructure. Classic. We thought spinning up another server would fix it. It didn't. Because the problem wasn't compute, it was architecture.
Here's what we rebuilt, and why it holds at 100,000 threads a day.
The mistake everyone makes: the synchronous monolith
We built the first version fast. Too fast. A single Python service that crawled sources, scored buying intent, and stored results in the database, all in one synchronous loop. It worked fine at 500 threads a day. At 5,000, we started seeing timeouts. At 20,000, the service just died.
The problem with a synchronous pipeline at this scale is that every step waits for the one before it. If scraping Reddit takes 800ms per thread and you've got 20,000 URLs to process, that's over four hours of sequential scraping before you've scored a single thing. And if one step slows down (rate limiting, a flaky external API, a saturated DB), everything behind it stalls.
So we decoupled everything.
Today the architecture works like this: collectors (the components reading Reddit, LinkedIn, X, Facebook) publish raw events to a queue. Scorers consume that queue independently. Reply generators consume the scorer output. Each step is stateless, horizontally scalable, and can fail without taking the rest of the pipeline down.
We run Celery with Redis as the broker for short-lived tasks, and dedicated workers for heavier jobs that need persistent context. No magic. Just decoupling.
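Here's roughly what that shape looks like in Celery. This is a minimal sketch, not our production code: the queue names, the intent threshold, and the stand-in task bodies are all illustrative.

```python
# pipeline.py: three decoupled stages wired through Celery queues.
from celery import Celery

app = Celery("pipeline", broker="redis://localhost:6379/0")

@app.task(queue="collect", acks_late=True)
def collect(source: str, url: str) -> None:
    # Collector: fetch the raw thread, then publish it to the scoring queue.
    raw = {"source": source, "url": url, "text": "..."}  # stand-in for the real fetch
    score.delay(raw)

@app.task(queue="score", acks_late=True)
def score(raw: dict) -> None:
    # Scorer: runs on its own workers; if it dies, collectors keep filling the queue.
    intent = 0.9 if "pricing" in raw["text"] else 0.1  # stand-in for the layered scoring
    if intent >= 0.7:  # illustrative threshold
        reply.delay({**raw, "intent": intent})

@app.task(queue="reply", acks_late=True)
def reply(scored: dict) -> None:
    # Reply generator: consumes scorer output at its own pace, never blocks upstream.
    print(f"drafting reply for {scored['url']}")
```

Each stage gets its own worker pool, so you can scale the slow one without touching the others, along the lines of `celery -A pipeline worker -Q score -c 16`.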
Scoring at 100k: this is where it gets real
Scraping threads is a volume problem. Scoring buying intent is a nuance problem. Doing both at scale without burning money on compute is the actual challenge.
Our scoring model runs in layers. The first layer is lexical and very fast: we look for weak signals, language patterns that suggest purchase intent (direct questions, solution comparisons, frustration with existing tools). This layer eliminates 60 to 70% of threads in milliseconds. You don't need an LLM for that.
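To give a sense of how cheap that first pass is, here's a sketch. The patterns below are illustrative only; the real list is much longer and tuned per vertical.

```python
import re

# Illustrative weak-signal patterns: direct questions, comparisons, frustration.
INTENT_PATTERNS = [
    re.compile(r"\b(any|best) (tool|alternative)s? (for|to)\b", re.I),
    re.compile(r"\bhow do you (handle|deal with)\b", re.I),
    re.compile(r"\b(fed up|frustrated|tired) (of|with)\b", re.I),
    re.compile(r"\brecommendations? for\b", re.I),
]

def passes_lexical_filter(text: str) -> bool:
    """Cheap first layer: keep a thread only if at least one weak signal fires."""
    return any(p.search(text) for p in INTENT_PATTERNS)
```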
The second layer is where we call a heavier model. But only on the 30-40% that passed the first filter. That keeps scoring quality high without a catastrophic OpenAI bill. We batch calls in groups of 20 threads, which cuts latency and per-unit cost significantly.
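The batching itself is mundane. A rough sketch, with `call_model` standing in for whatever LLM client you use (the batch size of 20 is what we settled on; the right number depends on prompt size and token limits):

```python
from itertools import islice

BATCH_SIZE = 20

def batched(items, n):
    """Yield successive lists of up to n items."""
    it = iter(items)
    while chunk := list(islice(it, n)):
        yield chunk

def call_model(prompt: str, n_threads: int) -> list[float]:
    # Placeholder for the heavier model call: in production this is a single
    # LLM request that returns one intent score per thread in the batch.
    return [0.5] * n_threads

def score_survivors(threads: list[dict]) -> list[float]:
    """Score only the threads that passed the lexical filter, 20 per API call."""
    scores: list[float] = []
    for chunk in batched(threads, BATCH_SIZE):
        prompt = "\n---\n".join(t["text"] for t in chunk)
        scores.extend(call_model(prompt, len(chunk)))
    return scores
```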
We also run aggressive caching on domains we've already seen. A Reddit thread on r/SaaS about a HubSpot integration is likely to look a lot like other threads we've already scored. We cache embeddings, not scores, because customer context changes. But skipping the recomputation of vector representations alone saves us roughly 40% on compute.
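Conceptually the cache is just a content hash mapped to an embedding. A sketch, assuming a Redis-backed cache (we already have Redis in place, but any shared key-value store works) and a `compute` callback that returns the embedding as serialized bytes:

```python
import hashlib
import redis

cache = redis.Redis(host="localhost", port=6379, db=1)
TTL = 7 * 24 * 3600  # a week; stale embeddings are cheap to evict

def embedding_key(text: str) -> str:
    return "emb:" + hashlib.sha256(text.encode("utf-8")).hexdigest()

def get_or_compute_embedding(text: str, compute) -> bytes:
    """Return a cached embedding if we've seen this text before, else compute and store it.
    Scores are never cached: they depend on each customer's context."""
    key = embedding_key(text)
    hit = cache.get(key)
    if hit is not None:
        return hit
    vector = compute(text)  # the expensive call we want to skip on repeats
    cache.setex(key, TTL, vector)
    return vector
```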
Dealing with rate limits without losing your mind
Reddit, LinkedIn, X: all three have rate limits, all apply them differently, and none of them warn you cleanly when you're getting close. (LinkedIn in particular loves to silently throttle you without returning a 429. Took us an embarrassingly long time to figure that one out.)
We built an abstraction layer we internally call the "router". Its only job: distribute requests across multiple accounts, user agents, and proxies, while respecting each platform's rate limiting windows. It's not glamorous work. It's plumbing. But it's the difference between a tool that crashes and a tool that runs.
Every collector exposes health metrics: request count within the rolling window, average response time, error rate. The router makes real-time decisions. If Reddit starts responding slowly, we back off automatically without stopping the whole pipeline.
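The real router is more involved (per-platform windows, proxy pools, error-rate weighting), but the core idea fits in a few lines. A simplified sketch with illustrative budgets:

```python
import time
from collections import deque
from dataclasses import dataclass, field

@dataclass
class CollectorHealth:
    """Rolling-window metrics each collector reports to the router."""
    window_s: int = 60
    max_requests: int = 90  # illustrative per-window budget
    timestamps: deque = field(default_factory=deque)

    def record(self) -> None:
        self.timestamps.append(time.monotonic())

    def requests_in_window(self) -> int:
        cutoff = time.monotonic() - self.window_s
        while self.timestamps and self.timestamps[0] < cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)

def pick_account(accounts: dict[str, CollectorHealth]) -> str | None:
    """Route the next request to the account with the most headroom,
    or tell the caller to back off if every account is near its budget."""
    name, health = min(accounts.items(), key=lambda kv: kv[1].requests_in_window())
    if health.requests_in_window() >= health.max_requests:
        return None  # back off; don't crash the pipeline
    health.record()
    return name
```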
We lost two weeks building this. We haven't had a single rate limiting incident in production since.
What we'd do differently from day one
If we started over: queues go in during the first sprint. Not when things break. The temptation to build something synchronous "just to test it" is understandable, but the technical debt compounds fast.
We'd also measure cost per thread from the start. Today we know we process one thread for roughly €0.0004 in total compute, which at 100,000 threads a day works out to about €40 of compute. It took us six months to get that number. Without it, you can't make smart optimization decisions.
And we'd pick Postgres with JSONB over MongoDB to store enriched threads. We migrated three months post-launch and it was not fun.
The architecture we're running now isn't perfect. We still have bottlenecks on personalized reply generation, especially when multiple customers in the same vertical hit their peaks at the same time. We're working on it. But we can process 100,000 threads a day with infrastructure that costs less than a junior sales rep's monthly salary. That's the benchmark that actually matters.
Want to see Inbown in action?
Scan your site, get 20 prospects ready to buy. Free, 30 seconds.