Cost Optimization Tips
Understanding token pricing
Costs scale with input and output tokens. Long system prompts and conversation history cost real money on every single call. Model choice also matters — smaller/faster models cost less per token than the largest models.
Practical cost-reduction techniques
Trim conversation history to what's needed. Use prompt caching for large, repeated system prompts or documents (Claude supports caching to avoid reprocessing the same content). Set sensible max_tokens limits rather than leaving them unbounded.
Monitoring usage
Check the usage field on every API response to track input/output tokens per call. Aggregate this in your database to build per-user or per-feature cost dashboards before costs surprise you.
Key Takeaways
- Costs scale with both input and output tokens on every call.
- Trim history and use prompt caching for repeated large content.
- Set explicit `max_tokens` limits to avoid runaway costs.
- Track `usage` data to monitor spend before it surprises you.
Build a cost tracker
Log the `usage.input_tokens` and `usage.output_tokens` from every API call into Firestore, and write a small script that sums total tokens used this week.