Building AI at Scale: Lessons from 1B+ Users
Key lessons learned from scaling Adobe's AI Assistant to serve over a billion users. From architecture decisions to team dynamics.
When we set out to build Adobe's AI Assistant, we knew it had to serve our entire user base - over a billion monthly active users. Here's what I learned along the way.
The Challenge
Building AI products is hard. Building AI products that serve a billion users is exponentially harder. You're dealing with:
- Massive scale: at a billion users, every millisecond of added latency translates into measurable losses in engagement
- Global distribution: Users across 100+ countries with varying network conditions
- Cost constraints: LLM API costs can spiral out of control quickly
- Quality expectations: Users expect ChatGPT-level quality, but with PDF-specific knowledge
Architecture Decisions
1. Multi-Cloud from Day One
We built on AWS, Azure, and GCP simultaneously. This wasn't premature optimization - it was essential:
```python
class ModelGateway:
    def route_request(self, model: str, prompt: str):
        # Intelligent routing based on cost, latency, availability
        if model.startswith("gpt"):
            return self.azure_client.complete(prompt)
        elif model.startswith("claude"):
            return self.aws_bedrock.complete(prompt)
        else:
            return self.gcp_vertex.complete(prompt)
```
Result: $2M+ saved annually through intelligent routing and cost optimization.
2. Hybrid RAG Architecture
We didn't just use vector search. We combined the following stages (see the sketch after this list):
- Dense retrieval: For semantic similarity
- Sparse retrieval: For exact keyword matches
- Reranking: To surface the most relevant chunks
- Conversational memory: For multi-turn coherence
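Here's a minimal sketch of how these stages can compose. It is not our production code: `Chunk`, `dense_search`, `sparse_search`, and `rerank` are hypothetical stand-ins for a vector index, a BM25-style keyword index, and a cross-encoder reranker.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Chunk:
    text: str

def hybrid_retrieve(
    query: str,
    dense_search: Callable[[str, int], list[Chunk]],    # semantic / vector lookup
    sparse_search: Callable[[str, int], list[Chunk]],   # exact keyword lookup (e.g. BM25)
    rerank: Callable[[str, list[Chunk]], list[Chunk]],  # cross-encoder or similar
    k: int = 8,
) -> list[Chunk]:
    # Over-fetch from both retrievers, dedupe by text, and let the reranker
    # decide the final ordering. Conversational memory would feed into query
    # rewriting before this function is called.
    candidates = {c.text: c for c in dense_search(query, k * 4)}
    for chunk in sparse_search(query, k * 4):
        candidates.setdefault(chunk.text, chunk)
    return rerank(query, list(candidates.values()))[:k]
```

The over-fetch factor and `k` are tuning knobs; the important part is that neither retriever alone decides the final ranking.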
3. Streaming Everything
Sub-2 second response times at billion-user scale required streaming from end to end:
- Stream from LLM
- Stream through our processing pipeline
- Stream to the client
Users see words appearing instantly, not a loading spinner.
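A minimal sketch of the end-to-end idea using plain async generators; `llm_stream` is a hypothetical stand-in for whatever streaming LLM client you use, and a real service would write each chunk to an SSE or WebSocket connection instead of printing it.

```python
import asyncio
from typing import AsyncIterator

async def llm_stream(prompt: str) -> AsyncIterator[str]:
    # Hypothetical stand-in for a streaming LLM call.
    for token in ["Streaming ", "keeps ", "time-to-first-token ", "low."]:
        await asyncio.sleep(0.05)  # simulate per-token network latency
        yield token

async def process(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    # Post-process tokens as they arrive instead of buffering the full answer.
    async for token in tokens:
        yield token  # e.g. sanitization, citation tagging, safety checks

async def serve(prompt: str) -> None:
    # A real service would flush each chunk to the client (SSE / WebSocket).
    async for chunk in process(llm_stream(prompt)):
        print(chunk, end="", flush=True)
    print()

asyncio.run(serve("Summarize this PDF"))
```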
Team Dynamics
Scaling the product also meant scaling the team. At peak, I was directing 30+ engineers across:
- San Jose (ML pipelines & experimentation)
- Bangalore (Research)
- Noida (Infra)
Key lesson: Over-communicate. What works for 5 people breaks at 30. We instituted:
- Daily async standups (timezone-friendly)
- Weekly architecture reviews
- Bi-weekly demo days
What I'd Do Differently
Mistake #1: We built too many models internally before leveraging LLMs. We should have started with GPT-4 and moved to custom models only when justified.
Mistake #2: We underestimated the importance of evaluation. Build your eval harness before your product.
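A deliberately tiny illustration of what "eval harness before product" can mean in practice; `EvalCase`, `must_contain`, and `answer_fn` are hypothetical names, and real harnesses typically layer LLM-as-judge or human grading on top of checks like this.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    question: str
    must_contain: list[str]  # minimal grading criterion for this sketch

def run_eval(answer_fn: Callable[[str], str], cases: list[EvalCase]) -> float:
    # Score each case pass/fail on whether the answer mentions the expected facts.
    passed = 0
    for case in cases:
        answer = answer_fn(case.question).lower()
        if all(term.lower() in answer for term in case.must_contain):
            passed += 1
    return passed / len(cases)

# Run the same fixed cases against every prompt, model, or retrieval change.
cases = [EvalCase("What is the refund window?", ["30 days"])]
print(f"pass rate: {run_eval(lambda q: 'Refunds are accepted within 30 days.', cases):.0%}")
```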
Mistake #3: We didn't invest enough in observability early. When things break at scale, you need deep visibility.
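Even a thin layer of structured, per-stage timing goes a long way. This is just a stdlib sketch of the idea, not the tooling we actually ran:

```python
import json, logging, time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(message)s")

@contextmanager
def traced(stage: str, **attrs):
    # Emit one structured log line per pipeline stage with its latency.
    start = time.perf_counter()
    try:
        yield
    finally:
        record = {"stage": stage,
                  "latency_ms": round((time.perf_counter() - start) * 1000, 1),
                  **attrs}
        logging.info(json.dumps(record))

with traced("retrieval", query_len=42):
    time.sleep(0.05)  # stand-in for the actual retrieval call
```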
Results
- 🏆 TIME Best Invention 2024
- 💰 $5B revenue potential unlocked
- 👥 1B+ MAU served
- ⚡ Sub-2s response times
- 📊 95%+ user satisfaction
Takeaways for Builders
- Start simple: MVP with LLM APIs like Gemini, then optimize
- Multi-cloud from start: Don't get locked in
- Obsess over latency: Every 100ms matters
- Build evaluation first: You can't improve what you can't measure
- Stream everything: Users perceive streaming as 10x faster
Building at this scale taught me that the hard problems aren't technical - they're about coordination, communication, and keeping quality high while moving fast.