Lesson 912 lessons

Cost Optimization Tips

Understanding token pricing

Costs scale with input and output tokens. Long system prompts and conversation history cost real money on every single call. Model choice also matters — smaller/faster models cost less per token than the largest models.

Practical cost-reduction techniques

Trim conversation history to what's needed. Use prompt caching for large, repeated system prompts or documents (Claude supports caching to avoid reprocessing the same content). Set sensible max_tokens limits rather than leaving them unbounded.

Monitoring usage

Check the usage field on every API response to track input/output tokens per call. Aggregate this in your database to build per-user or per-feature cost dashboards before costs surprise you.

Key Takeaways

  • Costs scale with both input and output tokens on every call.
  • Trim history and use prompt caching for repeated large content.
  • Set explicit `max_tokens` limits to avoid runaway costs.
  • Track `usage` data to monitor spend before it surprises you.

Build a cost tracker

Log the `usage.input_tokens` and `usage.output_tokens` from every API call into Firestore, and write a small script that sums total tokens used this week.

Deploying to Vercel