
Vector Databases — Pinecone and Chroma

What a vector database does

A vector database stores embeddings and performs fast similarity search: given a query vector, it returns the stored vectors (and their associated text) that are closest to it under a distance metric such as cosine similarity or Euclidean distance.
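To make "closest" concrete, here is a minimal sketch of similarity search done by brute force in plain Python. The three-dimensional vectors and the sample texts are made up for illustration; real embeddings come from an embedding model and usually have hundreds or thousands of dimensions. The function names `cosine_similarity` and `nearest` are hypothetical helpers, not part of Pinecone or Chroma.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def nearest(query, stored, k=2):
    """Return the k (text, score) pairs whose vectors are most similar to the query."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Made-up embeddings: (associated text, vector)
stored = [
    ("How to reset your password", [0.9, 0.1, 0.0]),
    ("Quarterly revenue report", [0.0, 0.2, 0.9]),
    ("Changing account login settings", [0.8, 0.3, 0.1]),
]

query = [1.0, 0.2, 0.0]  # pretend this is the embedding of a password question
results = nearest(query, stored, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

This sketch compares the query against every stored vector, which is fine for a handful of items. Vector databases typically avoid that full scan by building an approximate nearest-neighbor index, which is where the "fast" in fast similarity search comes from.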

Pinecone vs Chroma — choosing a starting point

Pinecone is a managed cloud service: there is no infrastructure to run, it scales easily, and it has a free tier for learning. Chroma is open-source and can run locally or self-hosted, which makes it the better choice for learning the internals or keeping data fully under your control.

Setting up your first vector store

Create an index (Pinecone) or collection (Chroma), generate embeddings for a handful of test documents, insert them with their metadata (source, date, title), and run a test query to confirm that relevant results come back.
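The workflow above can be sketched end to end without installing anything. The example below is a toy in-memory stand-in, not the Pinecone or Chroma API: `ToyCollection`, `toy_embed`, and the sample documents and metadata values are all hypothetical, and `toy_embed` uses simple word counts in place of a real embedding model. The point is only to show the shape of the steps: create, insert with metadata, query, check the result.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyCollection:
    """Toy in-memory collection; a real store would persist and index the vectors."""

    def __init__(self, name):
        self.name = name
        self.records = {}

    def add(self, doc_id, text, metadata):
        # Keep the text and metadata next to the vector so hits can be attributed.
        self.records[doc_id] = {"text": text, "vector": toy_embed(text), "metadata": metadata}

    def query(self, text, k=1):
        q = toy_embed(text)
        ranked = sorted(self.records.items(),
                        key=lambda item: cosine(q, item[1]["vector"]),
                        reverse=True)
        return [(doc_id, rec["text"], rec["metadata"]) for doc_id, rec in ranked[:k]]

# 1. Create the collection.
collection = ToyCollection("test-docs")

# 2. Insert test documents with metadata (all values are made-up samples).
collection.add("doc-1", "Reset your password from the account settings page.",
               {"source": "help-center", "date": "2024-01-15", "title": "Password reset"})
collection.add("doc-2", "Invoices are emailed on the first day of each month.",
               {"source": "help-center", "date": "2024-02-01", "title": "Billing schedule"})
collection.add("doc-3", "Contact support by chat or email for urgent issues.",
               {"source": "handbook", "date": "2024-03-10", "title": "Getting support"})

# 3. Run a test query and inspect the top hit and its metadata.
hits = collection.query("how do I reset my password", k=1)
for doc_id, text, metadata in hits:
    print(doc_id, metadata["title"], "|", text)
```

With Pinecone or Chroma the same steps map onto the client library's own create, insert, and query calls, and the embeddings would come from a real model; check the current client documentation for exact method names before adapting this.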

Key Takeaways

Vector databases store embeddings and perform fast similarity search.
Pinecone is managed/cloud; Chroma is open-source and self-hostable.
Choose based on infrastructure preference and data control needs.
Always store metadata alongside vectors for source attribution.

Set up a test vector store

Create a Pinecone or Chroma index, insert embeddings for 5 test document chunks with metadata, and run a query to confirm retrieval works.
