Vector Databases — Pinecone and Chroma
What a vector database does
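A vector database stores embeddings (numeric vectors that represent text or other data) and performs fast similarity search: given a query vector, it returns the stored vectors, along with their associated text, that are closest to it under a distance metric such as cosine similarity or Euclidean distance.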
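To make "closest" concrete, here is a minimal sketch of similarity search by brute force, assuming cosine similarity as the metric. The three-dimensional vectors, the document ids, and the `nearest` helper are hypothetical and chosen only for illustration. Production vector databases typically use approximate nearest-neighbor indexes rather than comparing the query against every stored vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings; real embeddings usually have
# hundreds or thousands of dimensions.
stored = {
    "doc-cats": [0.9, 0.1, 0.0],
    "doc-dogs": [0.8, 0.3, 0.1],
    "doc-tax": [0.0, 0.2, 0.9],
}

def nearest(query, k=2):
    """Return the ids of the k stored vectors most similar to the query, best first."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in stored.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

print(nearest([1.0, 0.0, 0.0]))
```

The query points in nearly the same direction as the two animal vectors, so they rank ahead of the tax vector.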
Pinecone vs Chroma — choosing a starting point
Pinecone is a managed cloud service: there is no infrastructure to run, scaling is handled by the service, and it has a free tier suitable for learning. Chroma is open-source and runs locally or on your own servers, which makes it the better choice for learning the internals or keeping data fully under your control.
Setting up your first vector store
Create an index (Pinecone) or collection (Chroma), generate embeddings for a handful of test documents, insert them with their metadata (source, date, title), and run a test query to confirm that relevant results are returned.
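The four steps can be sketched without any external service. The example below is a library-free stand-in, not the Pinecone or Chroma API: `ToyCollection`, `toy_embed`, the vocabulary, and the sample documents with their metadata are all hypothetical. With a real store, the client's own create, insert, and query calls would replace the class, and an embedding model would replace the word-count embedding.

```python
import math
from dataclasses import dataclass, field

# Hypothetical fixed vocabulary; a real pipeline would call an embedding model.
VOCAB = ["vector", "database", "pinecone", "chroma", "invoice", "tax"]

def toy_embed(text):
    """Count vocabulary words in the text, then scale the counts to unit length."""
    words = text.lower().split()
    counts = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0  # avoid dividing by zero
    return [c / norm for c in counts]

@dataclass
class ToyCollection:
    """In-memory stand-in for a Pinecone index or a Chroma collection."""
    records: list = field(default_factory=list)

    def add(self, doc_id, text, metadata):
        # Store the vector together with its text and metadata.
        self.records.append(
            {"id": doc_id, "text": text, "vector": toy_embed(text), "metadata": metadata}
        )

    def query(self, text, k=1):
        q = toy_embed(text)

        def score(record):
            # Vectors are unit length, so the dot product equals cosine similarity.
            return sum(a * b for a, b in zip(q, record["vector"]))

        return sorted(self.records, key=score, reverse=True)[:k]

# Step 1: create the collection.
collection = ToyCollection()

# Steps 2 and 3: embed each test document and insert it with its metadata.
collection.add("a1", "pinecone is a managed vector database",
               {"source": "notes/pinecone.txt", "date": "2024-01-10", "title": "Pinecone overview"})
collection.add("a2", "chroma is an open source vector database",
               {"source": "notes/chroma.txt", "date": "2024-01-11", "title": "Chroma overview"})
collection.add("a3", "invoice totals and tax rules",
               {"source": "notes/finance.txt", "date": "2024-01-12", "title": "Finance notes"})

# Step 4: run a test query and check that the relevant document comes back.
for hit in collection.query("which database is chroma", k=2):
    print(hit["id"], hit["metadata"]["title"], hit["metadata"]["source"])
```

Because each record keeps its metadata next to the vector, every hit can be traced back to its source document.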
Key Takeaways
- Vector databases store embeddings and perform fast similarity search.
- Pinecone is managed/cloud; Chroma is open-source and self-hostable.
- Choose based on infrastructure preference and data control needs.
- Always store metadata alongside vectors for source attribution.
Set up a test vector store
Create a Pinecone index or a Chroma collection, insert embeddings for 5 test document chunks with metadata, and run a query to confirm that retrieval works.