Building AI at Scale: Lessons from 1B+ Users
Key lessons learned from scaling Adobe's AI Assistant to serve over a billion users. From architecture decisions to team dynamics.
When we set out to build Adobe's AI Assistant, we knew it had to serve our entire user base - over a billion monthly active users. Here's what I learned along the way.
The Challenge
Building AI products is hard. Building AI products that serve a billion users is exponentially harder. You're dealing with:
- Massive scale: at a billion users, every millisecond of added latency translates into measurable losses in engagement
- Global distribution: Users across 100+ countries with varying network conditions
- Cost constraints: LLM API costs can spiral out of control quickly
- Quality expectations: Users expect ChatGPT-level quality, but with PDF-specific knowledge
Architecture Decisions
1. Multi-Cloud from Day One
We built on AWS, Azure, and GCP simultaneously. This wasn't premature optimization - it was essential:
```python
class ModelGateway:
    def route_request(self, model: str, prompt: str):
        # Intelligent routing based on cost, latency, availability
        if model.startswith("gpt"):
            return self.azure_client.complete(prompt)
        elif model.startswith("claude"):
            return self.aws_bedrock.complete(prompt)
        else:
            return self.gcp_vertex.complete(prompt)
```
Result: $2M+ saved annually through intelligent routing and cost optimization.
2. Hybrid RAG Architecture
We didn't just use vector search. We combined the following stages (see the sketch after this list):
- Dense retrieval: For semantic similarity
- Sparse retrieval: For exact keyword matches
- Reranking: To surface the most relevant chunks
- Conversational memory: For multi-turn coherence
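Here's a minimal sketch of how these stages can compose. It is not our production code: `Chunk`, `dense_search`, `sparse_search`, and `rerank` are hypothetical stand-ins for a vector index, a BM25-style keyword index, and a cross-encoder reranker.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Chunk:
    text: str

def hybrid_retrieve(
    query: str,
    dense_search: Callable[[str, int], list[Chunk]],    # semantic / vector lookup
    sparse_search: Callable[[str, int], list[Chunk]],   # exact keyword lookup (e.g. BM25)
    rerank: Callable[[str, list[Chunk]], list[Chunk]],  # cross-encoder or similar
    k: int = 8,
) -> list[Chunk]:
    # Over-fetch from both retrievers, dedupe by text, and let the reranker
    # decide the final ordering. Conversational memory would feed into query
    # rewriting before this function is called.
    candidates = {c.text: c for c in dense_search(query, k * 4)}
    for chunk in sparse_search(query, k * 4):
        candidates.setdefault(chunk.text, chunk)
    return rerank(query, list(candidates.values()))[:k]
```

The over-fetch factor and `k` are tuning knobs; the important part is that neither retriever alone decides the final ranking.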
3. Streaming Everything
Sub-2 second response times at billion-user scale required streaming from end to end:
- Stream from LLM
- Stream through our processing pipeline
- Stream to the client
Users see words appearing instantly, not a loading spinner.
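A minimal sketch of the end-to-end idea using plain async generators; `llm_stream` is a hypothetical stand-in for whatever streaming LLM client you use, and a real service would write each chunk to an SSE or WebSocket connection instead of printing it.

```python
import asyncio
from typing import AsyncIterator

async def llm_stream(prompt: str) -> AsyncIterator[str]:
    # Hypothetical stand-in for a streaming LLM call.
    for token in ["Streaming ", "keeps ", "time-to-first-token ", "low."]:
        await asyncio.sleep(0.05)  # simulate per-token network latency
        yield token

async def process(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    # Post-process tokens as they arrive instead of buffering the full answer.
    async for token in tokens:
        yield token  # e.g. sanitization, citation tagging, safety checks

async def serve(prompt: str) -> None:
    # A real service would flush each chunk to the client (SSE / WebSocket).
    async for chunk in process(llm_stream(prompt)):
        print(chunk, end="", flush=True)
    print()

asyncio.run(serve("Summarize this PDF"))
```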
Team Dynamics
Scaling the product also meant scaling the team. At peak, I was directing 30+ engineers across:
- San Jose (ML pipelines & experimentation)
- Bangalore (Research)
- Noida (Infra)
Key lesson: Over-communicate. What works for 5 people breaks at 30. We instituted:
- Daily async standups (timezone-friendly)
- Weekly architecture reviews
- Bi-weekly demo days
What I'd Do Differently
Mistake #1: We built too many models internally before leveraging LLMs. We should have started with GPT-4 and moved to custom models only when justified.
Mistake #2: We underestimated the importance of evaluation. Build your eval harness before your product.
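A deliberately tiny illustration of what "eval harness before product" can mean in practice; `EvalCase`, `must_contain`, and `answer_fn` are hypothetical names, and real harnesses typically layer LLM-as-judge or human grading on top of checks like this.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    question: str
    must_contain: list[str]  # minimal grading criterion for this sketch

def run_eval(answer_fn: Callable[[str], str], cases: list[EvalCase]) -> float:
    # Score each case pass/fail on whether the answer mentions the expected facts.
    passed = 0
    for case in cases:
        answer = answer_fn(case.question).lower()
        if all(term.lower() in answer for term in case.must_contain):
            passed += 1
    return passed / len(cases)

# Run the same fixed cases against every prompt, model, or retrieval change.
cases = [EvalCase("What is the refund window?", ["30 days"])]
print(f"pass rate: {run_eval(lambda q: 'Refunds are accepted within 30 days.', cases):.0%}")
```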
Mistake #3: We didn't invest enough in observability early. When things break at scale, you need deep visibility.
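Even a thin layer of structured, per-stage timing goes a long way. This is just a stdlib sketch of the idea, not the tooling we actually ran:

```python
import json, logging, time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(message)s")

@contextmanager
def traced(stage: str, **attrs):
    # Emit one structured log line per pipeline stage with its latency.
    start = time.perf_counter()
    try:
        yield
    finally:
        record = {"stage": stage,
                  "latency_ms": round((time.perf_counter() - start) * 1000, 1),
                  **attrs}
        logging.info(json.dumps(record))

with traced("retrieval", query_len=42):
    time.sleep(0.05)  # stand-in for the actual retrieval call
```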
Results
- 🏆 TIME Best Invention 2024
- 💰 $5B revenue potential unlocked
- 👥 1B+ MAU served
- ⚡ Sub-2s response times
- 📊 95%+ user satisfaction
Takeaways for Builders
- Start simple: MVP with LLM APIs like Gemini, then optimize
- Multi-cloud from start: Don't get locked in
- Obsess over latency: Every 100ms matters
- Build evaluation first: You can't improve what you can't measure
- Stream everything: Users perceive streaming as 10x faster
Building at this scale taught me that the hard problems aren't technical - they're about coordination, communication, and keeping quality high while moving fast.