AI & Agent Development Glossary

78 terms covering AI agents, LLMs, and developer infrastructure. Each definition is self-contained and quotable.

A

A/B Testing

A/B testing compares two or more variants of a system by randomly assigning users to groups and testing whether differences in predefined outcome metrics are statistically significant.

MLOps
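
As a rough illustration of the significance check in the A/B Testing entry, the sketch below runs a two-proportion z-test on conversion counts using only the Python standard library. The function name and the sample counts are hypothetical, and a real experiment would normally fix its sample size and significance level in advance.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical data: variant A converts 100/1000 users, variant B 150/1000.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A p-value below the chosen threshold (0.05 is a common convention) suggests the difference is unlikely to be due to random assignment alone.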

Agent Harness

An agent harness is the runtime environment that manages an AI agent's execution loop, tool access, permission boundaries, memory persistence, and conversation state.

Developer Tools

Agent Loop

An agent loop is the iterative cycle of observe, reason, act, and evaluate that an AI agent repeats until it completes a task or reaches a termination condition.

AI Agent Development
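
The observe, reason, act, evaluate cycle from the Agent Loop entry can be sketched as a plain function that takes each stage as a callable. This is a minimal sketch under stated assumptions: `run_agent_loop`, the toy counter task, and the `max_steps` termination condition are all hypothetical, and a real agent would put a model call inside `reason`.

```python
def run_agent_loop(goal, observe, reason, act, evaluate, max_steps=10):
    """Repeat observe -> reason -> act -> evaluate until done or out of steps."""
    history = []
    for step in range(max_steps):
        observation = observe()
        action = reason(goal, observation, history)
        result = act(action)
        history.append((observation, action, result))
        if evaluate(goal, result):
            return {"done": True, "steps": step + 1, "history": history}
    # Termination condition: the step budget ran out before the goal was met.
    return {"done": False, "steps": max_steps, "history": history}

# Toy task: increment a counter until it reaches the goal value.
state = {"value": 0}

def observe():
    return state["value"]

def reason(goal, observation, history):
    return "increment" if observation < goal else "stop"

def act(action):
    if action == "increment":
        state["value"] += 1
    return state["value"]

def evaluate(goal, result):
    return result >= goal

outcome = run_agent_loop(3, observe, reason, act, evaluate)
print(outcome["done"], outcome["steps"])
```

Passing the stages in as callables keeps the loop itself independent of any particular model or tool set.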

Agent Memory

Agent memory is the system that enables AI agents to persist, retrieve, and reason over information across conversation turns and sessions, providing continuity beyond the immediate context window.

AI Agent Development

Agent Observability

Agent observability is the practice of instrumenting AI agent systems to capture traces, metrics, and logs across the full execution lifecycle, enabling debugging, performance optimization, and reliability monitoring.

MLOps

Agent Orchestration

Agent orchestration is the coordination layer that manages how multiple AI agents communicate, share context, delegate tasks, and resolve conflicts within a system.

AI Agent Development

Agentic AI

Agentic AI refers to artificial intelligence systems that autonomously plan, execute, and adapt multi-step tasks toward a goal without requiring human intervention at each step.

AI Agent Development

Agentic RAG

Agentic RAG is a retrieval-augmented generation pattern where an AI agent iteratively decides what to retrieve, evaluates retrieval quality, and reformulates queries until it has sufficient context to answer accurately.

AI Agent Development
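
The retrieve, evaluate, reformulate cycle in the Agentic RAG entry can be sketched as a bounded loop. Everything here is a hypothetical stand-in: the two-document corpus, the substring-based `retrieve`, the "any document found" sufficiency check, and the word-dropping `reformulate` take the place of a real retriever and model-driven judgments.

```python
def agentic_rag(question, retrieve, is_sufficient, reformulate, max_rounds=3):
    """Retrieve iteratively, rewriting the query until the context is sufficient."""
    query = question
    context = []
    for round_number in range(1, max_rounds + 1):
        # Add newly retrieved documents, skipping duplicates.
        context.extend(doc for doc in retrieve(query) if doc not in context)
        if is_sufficient(question, context):
            return context, round_number
        query = reformulate(question, context, query)
    return context, max_rounds

CORPUS = [
    "canary release routes a small share of traffic",
    "continuous deployment ships every passing change",
]

def retrieve(query):
    # Toy retriever: a document matches only if it contains every query word.
    words = query.lower().split()
    return [doc for doc in CORPUS if all(word in doc for word in words)]

def is_sufficient(question, context):
    return len(context) > 0

def reformulate(question, context, query):
    # Toy strategy: broaden the query by dropping its first word.
    words = query.split()
    return " ".join(words[1:] or words)

context, rounds = agentic_rag("gradual rollout traffic", retrieve, is_sufficient, reformulate)
print(rounds, context)
```

The `max_rounds` cap is a design choice that keeps the loop from running indefinitely when no reformulation finds useful context.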

AI Agent Memory

AI agent memory is the system that persists information across interactions, enabling agents to recall past context, learn from experience, and maintain continuity between sessions.

AI Agent Development

AI Alignment

AI alignment is the research field dedicated to ensuring artificial intelligence systems reliably pursue goals that match human intentions, values, and ethical principles.

AI Safety

AI Coding Agent

An AI coding agent is an autonomous software development assistant that can read codebases, write code, run tests, debug errors, and commit changes with minimal human direction.

Developer Tools

AI Guardrails

AI guardrails are programmatic constraints and validation layers that prevent AI systems from generating harmful, off-topic, or policy-violating outputs during production use.

AI Safety
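
A validation layer of the kind the AI Guardrails entry describes might look like the sketch below, which checks generated text before it is returned. The policy (blocking SSN-like strings and capping length), the function names, and the fallback message are all hypothetical; production guardrails typically combine many more checks.

```python
import re

# Hypothetical policy: flag strings shaped like US Social Security numbers.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def check_output(text, max_chars=500):
    """Return a list of policy violations found in the text."""
    violations = []
    if SSN_PATTERN.search(text):
        violations.append("possible_pii")
    if len(text) > max_chars:
        violations.append("too_long")
    return violations

def guarded_generate(generate, prompt, fallback="Sorry, I can't share that."):
    """Call the generator, then replace any violating output with a fallback."""
    text = generate(prompt)
    violations = check_output(text)
    if violations:
        return fallback, violations
    return text, []

# Stand-in generators in place of a real model call.
safe = guarded_generate(lambda prompt: "The capital of France is Paris.", "capital?")
blocked = guarded_generate(lambda prompt: "Her SSN is 123-45-6789.", "ssn?")
print(safe)
print(blocked)
```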

Attention Mechanism

An attention mechanism allows neural networks to dynamically focus on relevant parts of the input when producing each element of the output, weighting information by learned importance.

LLM Architecture
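
One widely used form of attention is scaled dot-product attention, sketched below in pure Python so the weighting step is visible. This is a single illustrative variant rather than the only attention mechanism, and the tiny hand-written vectors stand in for learned query, key, and value projections.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Two identical keys receive equal weight, so the output averages the values.
result = attention(queries=[[1.0, 0.0]], keys=[[1.0, 0.0], [1.0, 0.0]], values=[[2.0], [4.0]])
print(result)
```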

C

Canary Release

A canary release routes a small percentage of production traffic to a new version and monitors it for errors before gradually expanding the rollout to all users.

DevOps/CI-CD
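
One common way to pick which users see the canary is deterministic hash bucketing, sketched below. The function name and user ID format are hypothetical; real rollouts usually do this at the load balancer or service mesh rather than in application code.

```python
import hashlib

def route_version(user_id, canary_percent):
    """Assign a user to 'canary' or 'stable' based on a stable hash bucket."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in the range 0..99
    return "canary" if bucket < canary_percent else "stable"

# Hypothetical user population, with 5% of traffic targeted at the canary.
users = [f"user-{i}" for i in range(10000)]
canary_share = sum(route_version(u, 5) == "canary" for u in users) / len(users)
print(f"canary share: {canary_share:.3f}")
```

Because the bucket depends only on the user ID, a user who lands in the canary at 5% stays in it as the percentage is raised, which keeps the experience consistent during expansion.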

Chain of Thought

Chain of thought is a prompting technique that instructs language models to produce intermediate reasoning steps before arriving at a final answer, which often improves accuracy on complex, multi-step tasks.

Prompt Engineering

Constitutional AI

Constitutional AI is an alignment technique where a language model critiques and revises its own outputs according to a set of written principles, reducing reliance on human feedback for safety training.

AI Safety

Container Orchestration

Container orchestration automates the deployment, scaling, networking, and lifecycle management of containerized applications across clusters of machines.

Cloud Infrastructure

Content Delivery Network

A content delivery network (CDN) distributes cached copies of web content across geographically dispersed servers to reduce latency and improve load times for users worldwide.

Cloud Infrastructure

Context Compression

Context compression is a set of techniques that reduce the token count of prompts while preserving semantic content, enabling more information to fit within a model's fixed context window.

LLM Infrastructure

Context Engineering

Context engineering is the practice of designing and optimizing the information provided to a language model to maximize the relevance, accuracy, and efficiency of its outputs.

LLM Infrastructure

Context Window

A context window is the maximum number of tokens a language model can process in a single input-output interaction, encompassing both the prompt and the generated response.

LLM Infrastructure
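
Because the window covers both prompt and response, the room left for generation is a simple subtraction. The sketch below shows that arithmetic; the function name and the token counts are hypothetical, and some providers additionally enforce a separate output limit that this does not model.

```python
def clamp_output_budget(context_window, prompt_tokens, requested_output_tokens):
    """Limit the requested output length to what still fits in the window."""
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills or exceeds the context window")
    return min(requested_output_tokens, remaining)

# Hypothetical numbers: an 8192-token window with a 7000-token prompt.
budget = clamp_output_budget(context_window=8192, prompt_tokens=7000,
                             requested_output_tokens=2000)
print(budget)  # 1192
```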

Continuous Batching

Continuous batching is an inference serving technique that dynamically adds and removes requests from a running batch at each generation step, maximizing GPU utilization without waiting for all requests to complete.

LLM Infrastructure
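
The scheduling difference can be seen in a toy simulation that counts decode steps. This is a sketch under simplifying assumptions: each request is reduced to a hypothetical number of tokens to generate, every step produces one token per running request, and prefill cost and memory limits are ignored.

```python
from collections import deque

def static_batching_steps(lengths, batch_size):
    """Each fixed batch runs until its longest request finishes."""
    return sum(max(lengths[i:i + batch_size])
               for i in range(0, len(lengths), batch_size))

def continuous_batching_steps(lengths, batch_size):
    """Finished requests leave and waiting ones join at every step."""
    waiting = deque(lengths)
    running = []
    steps = 0
    while waiting or running:
        # Fill free slots before the next generation step.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One decode step: every running request emits one token.
        running = [remaining - 1 for remaining in running]
        # Drop requests that have produced all of their tokens.
        running = [remaining for remaining in running if remaining > 0]
        steps += 1
    return steps

lengths = [2, 8, 2, 2]  # hypothetical tokens to generate per request
print(static_batching_steps(lengths, batch_size=2))      # 10
print(continuous_batching_steps(lengths, batch_size=2))  # 8
```

In the static case the two short requests in the second batch wait for the 8-token request to finish; in the continuous case they take over the freed slot while it is still running.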

Continuous Deployment

Continuous deployment automatically releases every code change that passes automated testing directly to production without manual approval gates.

DevOps/CI-CD

M

MCP Server

An MCP server is a lightweight program that exposes tools, resources, and prompts to AI applications through the Model Context Protocol's standardized client-server interface.

Developer Tools

Mixture of Experts

Mixture of Experts (MoE) is a neural network architecture that routes each input to a small subset of specialized sub-networks (experts), so total model capacity can grow while the computation spent on each token stays comparatively low.

LLM Architecture
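
The routing step can be illustrated with top-k gating, one common MoE scheme. In this sketch the gate logits are passed in directly and the experts are plain functions; in a real model a learned router computes the logits from the input and each expert is a feed-forward network, so all names here are hypothetical.

```python
import math

def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in top)
    exps = {i: math.exp(gate_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_forward(x, experts, gate_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    weights = top_k_route(gate_logits, k)
    return sum(weight * experts[i](x) for i, weight in weights.items())

# Three toy experts; the third has a low gate score and is never evaluated.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 100 * x]
output = moe_forward(2.0, experts, gate_logits=[1.0, 1.0, -5.0], k=2)
print(output)
```

Only `k` experts run per input, which is where the per-token compute saving comes from.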

Model Context Protocol (MCP)

Model Context Protocol is an open standard that defines how AI applications connect to external data sources and tools through a unified client-server interface.

Developer Tools

Model Distillation

Model distillation transfers knowledge from a large teacher model to a smaller student model by training the student to match the teacher's output probability distributions (soft targets) rather than only hard labels.

LLM Architecture
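
Matching output distributions is often expressed as a KL divergence between temperature-softened teacher and student probabilities. The sketch below computes that term for a single example in pure Python; the logits and temperature are hypothetical, and practical recipes often scale this term by the squared temperature and combine it with a hard-label loss, which is omitted here.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperatures give softer distributions."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return sum(p * math.log(p / q)
               for p, q in zip(teacher_probs, student_probs) if p > 0)

# Hypothetical logits for a three-class problem.
matched = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
mismatched = distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])
print(matched, mismatched)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge.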

Model Evaluation

Model evaluation is the systematic process of measuring language model performance against benchmarks, human judgments, and task-specific metrics to determine fitness for production deployment.

MLOps

Model Gateway

A model gateway is an API proxy layer that sits between applications and LLM providers, providing unified access, load balancing, fallback routing, cost tracking, and policy enforcement across multiple models.

LLM Infrastructure
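
Of the gateway responsibilities listed above, fallback routing is the easiest to sketch: try providers in priority order and return the first success. The provider names and the stand-in callables below are hypothetical; a real gateway would wrap actual provider clients and add timeouts, retries, and cost tracking.

```python
def call_with_fallback(providers, prompt):
    """Try each (name, callable) provider in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            output = call(prompt)
        except Exception as exc:
            # Record the failure and move on to the next provider.
            errors.append((name, str(exc)))
            continue
        return {"provider": name, "output": output, "failed": errors}
    raise RuntimeError(f"all providers failed: {errors}")

# Stand-ins for real provider clients.
def flaky_primary(prompt):
    raise TimeoutError("primary timed out")

def backup(prompt):
    return f"echo: {prompt}"

result = call_with_fallback([("primary", flaky_primary), ("backup", backup)], "hello")
print(result)
```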

Model Registry

A model registry is a centralized repository that stores, versions, and manages machine learning model artifacts along with their metadata, lineage, and deployment status.

MLOps

Model Routing

Model routing is the dynamic selection of which language model handles each request based on task complexity, cost constraints, latency requirements, or content classification.

LLM Infrastructure
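
A very simple rule-based router illustrates the idea. The model names, keyword list, word-count threshold, and latency cutoff below are all hypothetical; production routers often use a trained classifier instead of hand-written rules.

```python
HARD_KEYWORDS = ("prove", "refactor", "analyze", "step by step")

def route_model(prompt, latency_budget_ms=None):
    """Choose a model tier from prompt complexity and an optional latency budget."""
    # A tight latency budget overrides everything else.
    if latency_budget_ms is not None and latency_budget_ms < 500:
        return "small-fast-model"
    lowered = prompt.lower()
    looks_complex = (len(prompt.split()) > 200
                     or any(keyword in lowered for keyword in HARD_KEYWORDS))
    return "large-model" if looks_complex else "small-fast-model"

print(route_model("What is 2 + 2?"))
print(route_model("Refactor this module and analyze the risks"))
print(route_model("Refactor this module", latency_budget_ms=200))
```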

Model Serving

Model serving deploys trained machine learning models as production services that accept inference requests and return predictions with low latency and high availability.

MLOps

Multi-Agent System

A multi-agent system is an architecture where multiple specialized AI agents collaborate, communicate, and coordinate to solve problems that exceed any single agent's capabilities.

AI Agent Development

Multi-Modal Agents

Multi-modal agents are AI systems that perceive and act across multiple data types (text, images, audio, video, and code), often using vision-language models to understand and interact with graphical interfaces.

AI Agent Development

Multimodal AI

Multimodal AI refers to systems that can process, understand, and generate content across multiple data types including text, images, audio, and video within a unified model.

Machine Learning