Hire AI-First Engineer
Practical definitions for business leaders and engineers. Written by production AI engineers, not marketing teams.
Trusted by 247+ companies worldwide
Agentic AI refers to AI systems that can autonomously plan, reason, use tools, and take actions to accomplish complex goals, going beyond simple question-answering.
An AI agent is an autonomous software system that perceives its environment, makes decisions, and takes actions to achieve specific goals without constant human supervision.
A centralized service that manages access to multiple language models, handling authentication, rate limiting, cost tracking, and routing requests to appropriate models.
Safety mechanisms and rules that constrain AI behavior, preventing harmful outputs, ensuring compliance, and maintaining alignment with organizational policies and user expectations.
An AI hallucination is when a language model generates information that sounds plausible but is factually incorrect, fabricated, or not grounded in its training data.
The coordination and management of multiple AI components, agents, and workflows to work together seamlessly, handling routing, sequencing, and error management.
A neural network component that allows models to focus on relevant parts of input data, weighing the importance of different elements when processing information.
A Microsoft framework for building applications with multiple AI agents that communicate through conversation, enabling collaborative problem-solving and autonomous execution.
Bidirectional Encoder Representations from Transformers: a pre-trained language model developed by Google that excels at understanding context and semantic relationships in text.
A deployment strategy maintaining two identical production environments, enabling zero-downtime updates and instant rollback if the new version has issues.
A systematic approach to storing frequently accessed data in fast-access locations, reducing database load, improving response times, and reducing computational costs.
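As a minimal sketch of the cache-aside idea, memoizing an expensive lookup means repeat calls skip the slow work entirely (the profile function and call counter here are hypothetical stand-ins for a real database query):

```python
import functools

# Counter standing in for "how many times the slow database was hit".
CALLS = {"count": 0}

@functools.lru_cache(maxsize=128)
def fetch_user_profile(user_id: int) -> str:
    CALLS["count"] += 1          # stands in for a slow database query
    return f"profile-{user_id}"

fetch_user_profile(42)
fetch_user_profile(42)           # served from cache; no second "query"
print(CALLS["count"])            # 1
```

Production systems typically use an external cache such as Redis with explicit expiry, but the access pattern is the same: check the cache first, compute and store on a miss.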
Content Delivery Network: a geographically distributed system of servers that cache and deliver content to users from locations close to them, reducing latency and bandwidth costs.
Chain of Thought (CoT) is a prompting technique that instructs AI models to reason step-by-step before giving a final answer, significantly improving accuracy on complex tasks.
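In its simplest form, CoT is just a prompt-construction pattern. A rough sketch (the helper name is hypothetical; a real system would send the resulting prompt to an LLM API):

```python
def build_cot_prompt(question: str) -> str:
    """Wrap a question in a step-by-step reasoning instruction."""
    return (
        "Answer the question below. Think through the problem step by step, "
        "then state your final answer on a line starting with 'Answer:'.\n\n"
        f"Question: {question}"
    )

prompt = build_cot_prompt(
    "A train travels 120 km in 1.5 hours. What is its average speed?"
)
print(prompt)
```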
The percentage of customers who stop using a service during a given period, indicating customer satisfaction and product-market fit.
An automated system for building, testing, and deploying code changes, enabling rapid, reliable release cycles and reducing manual errors.
A large language model developed by Anthropic, designed with a focus on safety, reasoning, and helpful responses. Claude excels at complex tasks, code generation, and multi-turn conversations.
The maximum amount of text that a language model can consider when processing input, including both the user's prompt and the model's previous responses in a conversation.
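One practical consequence is that long conversations must be trimmed to fit. A minimal sketch, assuming a rough 4-characters-per-token estimate (real systems use the model's actual tokenizer):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def trim_history(messages: list[str], budget: int) -> list[str]:
    """Drop the oldest messages until the estimated total fits the budget."""
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):      # walk newest-first
        cost = estimate_tokens(msg)
        if total + cost > budget:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))         # restore chronological order

history = ["a" * 40, "b" * 40, "c" * 40]   # ~10 tokens each
print(trim_history(history, budget=20))    # keeps only the two newest
```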
The practice of improving the percentage of website visitors or product users who take desired actions, such as making purchases or signing up, through testing and refinement.
A framework for orchestrating multiple AI agents with distinct roles and expertise that collaborate to solve complex tasks through structured communication and task delegation.
The total cost to acquire a new customer, including sales, marketing, and overhead, indicating business unit economics and sustainability of growth.
The total profit expected from a customer over their entire relationship with a company, indicating business sustainability and guiding acquisition spending.
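A common back-of-envelope formula for subscription businesses (an assumption here: constant churn and revenue per user; real models are more nuanced) is monthly profit per customer divided by the monthly churn rate:

```python
def lifetime_value(arpu: float, gross_margin: float, monthly_churn: float) -> float:
    """Simple subscription LTV: monthly profit per customer / churn rate.

    Assumes ARPU, margin, and churn stay constant over the customer's life.
    """
    return arpu * gross_margin / monthly_churn

ltv = lifetime_value(arpu=50.0, gross_margin=0.8, monthly_churn=0.05)
cac = 300.0
print(ltv)        # 800.0
print(ltv / cac)  # LTV:CAC ratio of about 2.67
```

The LTV:CAC ratio this produces is a common sanity check on whether acquisition spending is sustainable.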
A subset of machine learning using neural networks with multiple hidden layers to learn hierarchical representations of data, enabling AI to understand complex patterns.
A set of practices combining software development and IT operations, emphasizing automation, collaboration, and continuous improvement for faster, more reliable software delivery.
A generative AI model that learns to create new content by iteratively removing noise from random data, capable of generating images, audio, and other media from text descriptions.
A containerization platform that packages applications and dependencies into lightweight, portable containers, enabling consistent behavior across development, testing, and production environments.
A distributed computing architecture where computation happens near data sources (on edge devices) rather than in centralized cloud data centers, reducing latency and bandwidth.
Embeddings are numerical representations of text, images, or other data that capture semantic meaning, enabling AI systems to understand similarity and relationships.
A machine learning approach where models learn to perform a task from just a few examples (typically 2-10), enabling quick adaptation without extensive training.
Fine-tuning is the process of further training a pre-trained AI model on your specific data to improve its performance for your particular use case.
A part-time or project-based Chief Technology Officer role providing strategic technology leadership, guidance, and decision-making to organizations that can't afford or don't need a full-time CTO.
A capability where language models can request execution of specific functions or APIs by outputting structured specifications, enabling precise tool use and integrations.
A multimodal AI model developed by Google that can process text, images, audio, and video. Gemini powers Google's AI services and competes with other large language models in reasoning and accuracy.
Generative Pre-trained Transformer: a large language model developed by OpenAI that generates human-like text based on prompts. GPT models power many AI applications from chatbots to content generation.
A query language and runtime for APIs that enables clients to request exactly the data they need, eliminating over-fetching and under-fetching common with REST APIs.
A mindset and set of techniques focused on rapid, sustainable business growth through creativity, analytical thinking, and rapid experimentation over traditional marketing.
A search technique combining keyword matching with semantic similarity, leveraging both approaches to provide more relevant and comprehensive results than either alone.
The process of using a trained AI model to make predictions or generate outputs from new input data, as opposed to the training phase where the model learns.
A practice of defining and managing infrastructure (servers, networks, databases) through code and version control, enabling repeatable, version-controlled infrastructure.
A knowledge graph is a structured representation of real-world entities and their relationships, used to organize information for AI reasoning and search.
An open-source platform for orchestrating containerized applications, automating deployment, scaling, and management across clusters of machines.
An open-source framework that simplifies building applications with large language models by providing tools for prompt management, chains, memory, and integration with external data sources.
A library for building complex AI applications by creating directed graphs that represent the flow of information and decisions, enabling cyclic workflows and stateful computation.
A technique for distributing incoming network requests across multiple servers, ensuring no single server becomes overloaded and improving overall system reliability and performance.
A field of AI where systems learn patterns from data without being explicitly programmed, enabling them to make predictions and decisions based on examples.
An architectural style where applications are built as a collection of loosely coupled, independently deployable services that communicate through APIs, enabling flexibility and scalability.
An open standard that enables language models to securely access external tools, data sources, and systems through a standardized interface, extending AI capabilities.
A technique where requests are directed to different AI models based on criteria like task type, complexity, or cost, optimizing for performance and economics.
The process of teaching an AI model to recognize patterns by iteratively adjusting its parameters based on examples from training data until accuracy improves.
A metric tracking predictable revenue generated each month from subscriptions, excluding one-time payments and refunds, indicating business health and growth.
A multi-agent system is an AI architecture where multiple specialized agents collaborate, communicate, and coordinate to solve complex tasks that a single agent cannot.
A Minimum Viable Product (MVP) is the simplest version of a product that delivers core value to early users and validates a business hypothesis.
Natural Language Processing (NLP) is a branch of AI that enables computers to understand, interpret, and generate human language.
A machine learning model inspired by biological neural systems, consisting of interconnected nodes (neurons) organized in layers that learn patterns from data through training.
pgvector is a PostgreSQL extension that adds vector similarity search capabilities, enabling RAG systems and semantic search directly in your existing PostgreSQL database.
The moment when a product resonates strongly with its target market, evidenced by strong user retention, word-of-mouth growth, and willingness to pay for the product.
A technique where the output of one AI request becomes the input to another, breaking complex tasks into manageable steps and improving accuracy and reasoning.
Prompt engineering is the practice of designing and optimizing inputs to AI language models to get accurate, consistent, and useful outputs.
Prompt injection is a security attack where malicious input manipulates an AI system into ignoring its instructions, revealing confidential data, or performing unauthorized actions.
A machine learning approach where AI agents learn by taking actions in an environment, receiving rewards or penalties for their actions, and optimizing to maximize cumulative rewards.
An architectural style for web services using HTTP methods (GET, POST, PUT, DELETE) to perform operations on resources identified by URLs, enabling stateless client-server communication.
Retrieval-Augmented Generation (RAG) is an AI architecture that combines a language model with a search system to answer questions using your own data.
A software delivery model where applications are hosted on cloud servers and accessed by users through web browsers, eliminating installation and maintenance overhead.
A search technique that finds results based on meaning and intent rather than keyword matching, using embeddings to understand conceptual relationships between documents.
A cloud computing model where applications run on managed platforms without explicit server management, with automatic scaling and pay-per-use billing.
The process of finding items in a database most similar to a query item based on numerical metrics, used extensively in recommendation systems and semantic search.
A machine learning approach where models learn from labeled training dataβexamples paired with correct answersβto make predictions on new, unseen data.
A co-founder who leads technology and product development, bringing engineering expertise, technical decision-making, and often serving as initial CTO.
Technical debt is the accumulated cost of choosing quick, expedient solutions over well-engineered ones, resulting in future rework and slower development.
A parameter that controls the randomness or creativity in AI model outputs: lower values (0-0.5) produce focused, consistent responses; higher values (0.7-2.0) produce more creative, diverse outputs.
A token is the basic unit of text that AI language models process β roughly 3/4 of a word in English β and the basis for LLM pricing and context limits.
The process of breaking down text into smaller units called tokens (words, subwords, or characters) that AI models can process and understand.
The ability of AI agents to access and use external tools and systems to accomplish tasks, extending their capabilities beyond text generation to interact with the real world.
A machine learning technique where knowledge learned from one task is transferred and applied to a different but related task, enabling faster training with less data.
The Transformer is the neural network architecture behind all modern LLMs, using self-attention mechanisms to process and generate sequences of data.
A machine learning approach where models find hidden patterns and structures in unlabeled data without being told what to look for.
A vector database is a specialized database designed to store, index, and search high-dimensional vector embeddings for AI applications like semantic search and RAG systems.
A binary instruction format and runtime environment that enables high-performance applications to run in web browsers, complementing JavaScript with near-native speed.
A WebSocket is a communication protocol that enables persistent, two-way connections between a client and server, used for real-time features like live chat and streaming AI responses.
A machine learning approach where AI models perform tasks without any task-specific training or examples, relying purely on their pre-training and prompt instructions.
Tell us about your project and we'll get back to you within 24 hours with a game plan.
Mon-Fri, 8AM-12PM EST
Follow Us
For startups & product teams
One engineer replaces an entire team. Full-stack development, AI orchestration, and production-grade delivery β fixed-fee AI Sprint packages.
Helped 8+ startups save $200K+ in 60 days
"Their engineer built our marketplace MVP in 4 weeks. Saved us $180K vs hiring a full team."
β Marketplace Founder, USA
No long-term commitment Β· Flexible pricing Β· Cancel anytime