Embeddings — How AI Understands Documents
What an embedding actually is
An embedding model converts a piece of text into a fixed-length list of numbers (a vector) that represents its meaning. Texts with similar meaning end up with vectors that point in similar directions, typically measured with cosine similarity or a dot product. This is what makes semantic search possible.
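To make "similar vectors" concrete, here is a minimal Python sketch of cosine similarity, one common way to compare embedding vectors. The three vectors are made up for illustration; real embedding models output hundreds or thousands of dimensions. Only the relative scores matter: the subscription vector scores far higher against the billing vector than against the recipe vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between vectors a and b (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings", invented for this example only.
cancel_subscription = [0.9, 0.1, 0.2]
end_recurring_billing = [0.8, 0.2, 0.25]
banana_bread_recipe = [0.1, 0.9, 0.05]

print(round(cosine_similarity(cancel_subscription, end_recurring_billing), 3))
print(round(cosine_similarity(cancel_subscription, banana_bread_recipe), 3))
```

Cosine similarity ignores vector length and compares only direction, which is why it is a popular choice for comparing embeddings.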
Semantic search vs keyword search
Keyword search finds documents that contain the exact words in the query. Semantic search (via embeddings) finds conceptually related content even when the wording is completely different: a search for "how to cancel a subscription" can match a document about "ending recurring billing", even though the two phrases share no words.
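A deliberately naive keyword matcher shows why the subscription/billing example defeats keyword search. This sketch lowercases the text and intersects sets of words; `tokenize` and `keyword_overlap` are hypothetical helpers. Real keyword engines add stemming and relevance ranking (such as BM25), but without a synonym list they still need shared terms to match on.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_overlap(query: str, document: str) -> set[str]:
    """Return the words that appear in both the query and the document."""
    return tokenize(query) & tokenize(document)


query = "how to cancel a subscription"
billing_doc = "ending recurring billing"
settings_doc = "To cancel your subscription, open Settings."

print(keyword_overlap(query, billing_doc))  # prints set(): no shared words
print(sorted(keyword_overlap(query, settings_doc)))  # prints ['cancel', 'subscription', 'to']
```

The billing document is exactly the kind of match a semantic search is meant to recover, yet keyword matching scores it as having nothing in common with the query.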
Generating embeddings for your documents
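Use an embedding API (such as OpenAI's or Voyage AI's embedding models) to convert each document chunk into a vector, then store those vectors alongside the original text in a vector database. At search time, embed the query with the same model and retrieve the chunks whose vectors are most similar to it.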
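The store-and-retrieve workflow can be sketched in Python with an in-memory list standing in for a vector database. `InMemoryVectorStore`, `toy_embed`, and `TOY_CONCEPTS` are hypothetical names for this sketch; `toy_embed` uses a hand-written word-to-concept table so the example runs offline, and it is not a real embedding model. `openai_embed` shows how the OpenAI Python SDK's embeddings endpoint could be plugged in instead (the model name `text-embedding-3-small` is one option, chosen here as an assumption); a Voyage AI client could be wrapped the same way.

```python
import math
import re
from dataclasses import dataclass, field
from typing import Callable

Vector = list[float]
EmbedFn = Callable[[list[str]], list[Vector]]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: parallel lists of texts and vectors."""

    embed: EmbedFn
    texts: list[str] = field(default_factory=list)
    vectors: list[Vector] = field(default_factory=list)

    def add(self, chunks: list[str]) -> None:
        # Embed every chunk once and keep each vector next to its original text.
        self.texts.extend(chunks)
        self.vectors.extend(self.embed(chunks))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        # Embed the query with the SAME function used for the chunks, then rank by similarity.
        query_vec = self.embed([query])[0]
        scored = [(text, cosine_similarity(query_vec, vec))
                  for text, vec in zip(self.texts, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def openai_embed(texts: list[str]) -> list[Vector]:
    """Real embeddings via the OpenAI Python SDK (needs `pip install openai` and OPENAI_API_KEY)."""
    from openai import OpenAI  # imported lazily so the rest of the sketch runs without it

    client = OpenAI()
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in response.data]


# Hypothetical toy embedder for running offline: a hand-written table maps words
# to three made-up "concept" dimensions (0 = stopping, 1 = payments, 2 = baking).
TOY_CONCEPTS = {
    "cancel": 0, "end": 0, "ending": 0, "stop": 0,
    "subscription": 1, "billing": 1, "recurring": 1, "plan": 1,
    "banana": 2, "bread": 2, "recipe": 2, "bake": 2,
}


def toy_embed(texts: list[str]) -> list[Vector]:
    vectors = []
    for text in texts:
        vec = [0.0, 0.0, 0.0]
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in TOY_CONCEPTS:  # words outside the table are ignored
                vec[TOY_CONCEPTS[word]] += 1.0
        vectors.append(vec)
    return vectors


store = InMemoryVectorStore(embed=toy_embed)
store.add(["Ending recurring billing", "Banana bread recipe"])
print(store.search("how to cancel a subscription", k=1))  # the billing chunk ranks first
```

Because the store depends only on a function that maps a list of texts to a list of vectors, swapping `toy_embed` for `openai_embed` changes nothing else. A production vector database would also persist the vectors and typically use an approximate nearest-neighbor index rather than scoring every chunk.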
Key Takeaways
- Embeddings convert text into numerical vectors representing meaning.
- Semantically similar texts produce vectors that are close together (for example, with high cosine similarity).
- Semantic search finds conceptually related content, not just exact keyword matches.
- Embeddings are generated once per document chunk and stored for retrieval.
Compare keyword vs semantic matches
Write two differently worded questions about the same topic. Then judge by hand whether a keyword search would retrieve the same source document for both questions, and whether a semantic search would.