Glossary

AI and LLM terms, explained

Short, practical definitions for the concepts behind modern AI assistants, chatbots, and local AI clients.

C

Caching
Operations

Caching stores results to avoid repeated computation.
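
A minimal in-memory sketch, with `call_model` as a hypothetical stand-in for a real LLM request:

```python
from functools import lru_cache

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for an expensive LLM call.
    return f"response to: {prompt}"

@lru_cache(maxsize=1024)
def cached_completion(prompt: str) -> str:
    # Identical prompts are served from the cache instead of recomputed.
    return call_model(prompt)

cached_completion("hello")  # computed
cached_completion("hello")  # cache hit
```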

Catastrophic forgetting
Training

Catastrophic forgetting is loss of earlier knowledge after new training.

Chain-of-thought
Prompting

Chain-of-thought is the intermediate, step-by-step reasoning a model produces before giving its final answer.

Checkpoint
Training

A checkpoint is a saved snapshot of model weights.

Chunk overlap
Retrieval

Chunk overlap repeats a small amount of text between adjacent chunks so context is not lost at chunk boundaries.

Chunking
Retrieval

Chunking splits content into smaller pieces for retrieval.
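
A character-based sketch showing chunking and chunk overlap together; real pipelines often split by tokens or sentence boundaries instead:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    # Step forward by chunk_size minus overlap, so consecutive chunks
    # share `overlap` characters of context across the boundary.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("some long document text " * 200)
```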

Circuit breaker
Operations

A circuit breaker stops requests to a failing service.
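
An illustrative sketch of the pattern; production systems usually rely on a tested library rather than hand-rolling this:

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 5, reset_after: float = 30.0):
        self.max_failures = max_failures  # failures before the circuit opens
        self.reset_after = reset_after    # seconds to wait before retrying
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: not calling failing service")
            self.opened_at = None  # half-open: allow one trial request
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```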

Citations
Safety

Citations link claims to sources.

Code interpreter
Agents

A code interpreter executes code to solve tasks.

Compliance
Safety

Compliance ensures systems meet legal or policy requirements.

Concurrency limit
Operations

Concurrency limits cap simultaneous requests.
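
A sketch using an `asyncio.Semaphore` to cap in-flight requests at four; the model call is a placeholder:

```python
import asyncio

semaphore = asyncio.Semaphore(4)  # at most 4 requests in flight at once

async def limited_request(prompt: str) -> str:
    async with semaphore:
        await asyncio.sleep(0.1)  # placeholder for a real async LLM call
        return f"response to: {prompt}"

async def main():
    prompts = [f"question {i}" for i in range(20)]
    return await asyncio.gather(*(limited_request(p) for p in prompts))

asyncio.run(main())
```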

Content filter
Safety

A content filter blocks or removes unsafe content.

Context compression
Prompting

Context compression shortens input while preserving key information.

Context reranking
Retrieval

Context reranking reorders retrieved chunks for relevance.

Context truncation
Operations

Context truncation drops older tokens when the context limit is reached.
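
A sketch that drops the oldest turns while keeping the system prompt; `count_tokens` is a rough stand-in for the model's real tokenizer:

```python
def count_tokens(text: str) -> int:
    return len(text.split())  # crude proxy; use the model's tokenizer in practice

def truncate_context(messages: list[dict], budget: int) -> list[dict]:
    # Preserve the system prompt; drop the oldest user/assistant turns first.
    system, rest = messages[:1], messages[1:]
    while rest and sum(count_tokens(m["content"]) for m in system + rest) > budget:
        rest.pop(0)
    return system + rest
```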

Context window
Models

The context window is the maximum amount of text a model can consider at once.

Continual learning
Training

Continual learning updates models over time without full retraining.

Conversation state
Product

Conversation state is the accumulated context of a chat.

Cosine similarity
Retrieval

Cosine similarity measures how similar two vectors are via the cosine of the angle between them.
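
A plain-Python version of the formula, dot(a, b) / (|a| · |b|):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cosine_similarity([1.0, 0.0], [1.0, 1.0])  # ~0.707, i.e. a 45-degree angle
```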

Cost optimization
Operations

Cost optimization reduces spend without hurting quality.

Cost per token
Operations

Cost per token is the price paid for input and output tokens.

Critic model
Agents

A critic model reviews outputs and flags issues.

P

Parameter count
Models

Parameter count is the number of learned weights in a model.

Passage retrieval
Retrieval

Passage retrieval finds specific passages instead of full documents.

PEFT
Training

PEFT (parameter-efficient fine-tuning) adapts a model by training a small set of added or selected parameters instead of all weights.

Perplexity
Evaluation

Perplexity measures how well a model predicts text.
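
For a sequence of token log-probabilities, perplexity is the exponential of the average negative log-likelihood:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# A model that assigns probability 0.25 to every token has perplexity 4.
perplexity([math.log(0.25)] * 10)  # 4.0
```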

PII
Safety

PII (personally identifiable information) is data that can identify a specific person, such as a name, email address, or phone number.

PII detection
Safety

PII detection finds personal identifiers in text.
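
A toy regex-based sketch; production systems typically combine patterns like these with trained NER models:

```python
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def find_pii(text: str) -> list[tuple[str, str]]:
    return [(kind, m.group()) for kind, pat in PII_PATTERNS.items()
            for m in pat.finditer(text)]

find_pii("Contact jane@example.com or 555-123-4567.")
# [('email', 'jane@example.com'), ('phone', '555-123-4567')]
```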

Planner
Agents

A planner produces a structured plan for completing a task.

Planning agent
Agents

A planning agent creates a multi-step plan before acting.

Positional encoding
Models

Positional encoding injects token order information into embeddings.
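
A sketch of the sinusoidal scheme from the original transformer paper; many current models use learned or rotary position embeddings instead:

```python
import numpy as np

def sinusoidal_encoding(seq_len: int, d_model: int) -> np.ndarray:
    positions = np.arange(seq_len)[:, None]       # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]      # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles)  # even dimensions get sine
    enc[:, 1::2] = np.cos(angles)  # odd dimensions get cosine
    return enc                     # added to the token embeddings

sinusoidal_encoding(128, 64).shape  # (128, 64)
```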

Preference dataset
Training

A preference dataset contains ranked or paired responses.

Presence penalty
Generation

Presence penalty discourages reuse of tokens that already appeared.
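
A sketch of the logit adjustment, assuming access to raw logits; unlike a frequency penalty, the deduction is flat no matter how often a token recurred:

```python
import numpy as np

def apply_presence_penalty(logits: np.ndarray, seen_ids: set[int],
                           penalty: float = 0.6) -> np.ndarray:
    adjusted = logits.copy()
    for token_id in seen_ids:
        adjusted[token_id] -= penalty  # flat deduction for any token already used
    return adjusted
```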

Pretraining
Training

Pretraining is the large-scale training phase on broad data.

Prompt
Prompting

A prompt is the input text that guides the model's response.

Prompt chaining
Prompting

Prompt chaining splits a task into multiple prompt steps, feeding each step's output into the next.

Prompt engineering
Prompting

Prompt engineering is crafting prompts to improve output quality.

Prompt evaluation
Evaluation

Prompt evaluation measures output quality across prompt variants.

Prompt injection
Safety

Prompt injection is a malicious attempt to override system instructions.

Prompt monitoring
Operations

Prompt monitoring tracks prompt usage and performance.

Prompt registry
Operations

A prompt registry stores prompts and metadata.

Prompt template
Prompting

A prompt template is a reusable prompt with placeholders.
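
A minimal example using Python's built-in string templating:

```python
from string import Template

template = Template(
    "You are a $role. Answer the question below.\n\nQuestion: $question"
)
prompt = template.substitute(role="patient math tutor",
                             question="What is a derivative?")
```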

Prompt versioning
Operations

Prompt versioning tracks prompt changes over time.

S

Safety classifier
Safety

A safety classifier labels content risk or policy categories.

Safety policy
Safety

A safety policy defines allowed and disallowed content.

Sampling
Generation

Sampling chooses output tokens probabilistically.

Sampling seed
Generation

A sampling seed fixes the random sequence used in generation.

Schema validation
Operations

Schema validation checks output against a defined structure.
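
A sketch using the `jsonschema` package to validate a model's JSON output before accepting it:

```python
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["answer", "confidence"],
}

raw_output = '{"answer": "42", "confidence": 0.9}'  # model response
try:
    validate(instance=json.loads(raw_output), schema=schema)
except (json.JSONDecodeError, ValidationError) as err:
    print(f"rejecting model output: {err}")  # e.g. retry or re-prompt
```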

Secret management
Safety

Secret management stores and controls access to credentials.

Self-attention
Models

Self-attention lets a model weigh relationships between tokens in a sequence.
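
A single-head, scaled dot-product sketch in NumPy; real models use many heads plus masking:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    q, k, v = x @ w_q, x @ w_k, x @ w_v      # project tokens to Q, K, V
    scores = q @ k.T / np.sqrt(k.shape[-1])  # pairwise token affinities
    return softmax(scores) @ v               # attention-weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                  # 4 tokens, 8-dim embeddings
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
self_attention(x, w_q, w_k, w_v).shape       # (4, 8)
```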

Self-consistency
Agents

Self-consistency samples multiple reasoning paths and returns the most common answer.
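
A majority-vote sketch, with `sample_answer` as a hypothetical stand-in for a temperature-sampled model call:

```python
import random
from collections import Counter

def sample_answer(question: str) -> str:
    # Hypothetical stand-in for one sampled (temperature > 0) model response.
    return random.choice(["4", "4", "4", "5"])

def self_consistent_answer(question: str, n: int = 9) -> str:
    answers = [sample_answer(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]  # majority vote

self_consistent_answer("What is 2 + 2?")  # almost always "4"
```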

Self-reflection
Agents

Self-reflection is a review step where a model critiques its output.

Semantic caching
Operations

Semantic caching reuses responses for similar queries.

Semantic search
Retrieval

Semantic search finds results based on meaning, not just keywords.

Session
Product

A session is a bounded interaction period with a user.

Short-term memory
Agents

Short-term memory holds recent context for the current task.

Sliding window attention
Models

Sliding window attention limits attention to a moving context window.

Sparse attention
Models

Sparse attention computes attention only for selected token pairs.

Sparse MoE
Models

Sparse MoE activates only a subset of experts per token.

Speculative decoding
Generation

Speculative decoding uses a small draft model to propose tokens that the main model then verifies in parallel.

Speech-to-text
Multimodal

Speech-to-text converts audio into written text.

Stop sequence
Generation

A stop sequence tells the model when to stop generating.
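
Inference servers usually apply stop sequences during decoding; this client-side sketch shows the effect:

```python
def apply_stop_sequences(text: str, stops: list[str]) -> str:
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)  # truncate at the earliest stop sequence
    return text[:cut]

apply_stop_sequences("Answer: 42\nUser:", ["\nUser:"])  # "Answer: 42"
```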

Streaming
Operations

Streaming sends partial output as it is generated.

Structured output
Prompting

Structured output enforces a specific response format.

Supervised fine-tuning
Training

Supervised fine-tuning trains on labeled input-output pairs.

Supervisor agent
Agents

A supervisor agent coordinates sub-agents.

Synthetic data
Training

Synthetic data is model-generated training data.

System prompt
Prompting

A system prompt sets the overall behavior and rules for the model.

T

Task decomposition
Agents

Task decomposition splits a goal into smaller steps.

Temperature
Generation

Temperature controls randomness in text generation.

Temperature scaling
Generation

Temperature scaling adjusts the sharpness of token probabilities.
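
A NumPy sketch showing how dividing logits by the temperature sharpens or flattens the token distribution:

```python
import numpy as np

def scaled_probs(logits: np.ndarray, temperature: float) -> np.ndarray:
    z = logits / temperature  # temperature < 1 sharpens, > 1 flattens
    e = np.exp(z - z.max())   # numerically stable softmax
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])
scaled_probs(logits, 0.5)  # peaked distribution, near-greedy sampling
scaled_probs(logits, 1.5)  # flatter distribution, more random sampling
```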

Text-to-speech
Multimodal

Text-to-speech converts text into spoken audio.

Thread
Product

A thread is a sequence of messages for a single topic.

Throughput
Operations

Throughput is the amount of work completed per unit time.

Time to first token
Operations

Time to first token (TTFT) measures how long a request waits before the first output token arrives.

Time to last token
Operations

Time to last token (TTLT) measures the total time from sending a request to receiving the end of the response.

Token accounting
Operations

Token accounting tracks input and output token usage.

Token budget
Models

A token budget is the maximum tokens allowed for prompt and response.

Tokenization
Models

Tokenization is the process of splitting text into tokens.
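
An example using the `tiktoken` library; the encoding (here `cl100k_base`) varies by model:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Tokenization splits text into tokens.")
print(ids)              # a list of integer token ids
print(enc.decode(ids))  # round-trips back to the original string
```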

Tokens
Models

Tokens are the basic units of text that models process.

Tokens per second
Operations

Tokens per second measures generation speed.
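
A sketch that measures TTFT and tokens per second around a streaming response; `fake_stream` stands in for a real streaming API:

```python
import time

def fake_stream():
    for token in ["Hello", ",", " world", "!"]:
        time.sleep(0.05)  # stand-in for per-token generation latency
        yield token

start = time.monotonic()
first = None
count = 0
for token in fake_stream():
    if first is None:
        first = time.monotonic()  # first token arrived: this gives TTFT
    count += 1
total = time.monotonic() - start
print(f"TTFT: {first - start:.3f}s, speed: {count / total:.1f} tokens/s")
```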

Tool router
Agents

A tool router selects the best tool for a given request.

Tool sandbox
Deployment

A tool sandbox runs tools in a restricted environment.

Tool use
Agents

Tool use lets a model call external capabilities such as search, code execution, or APIs.

Top-k
Generation

Top-k sampling limits generation to the k most probable tokens.

Top-p
Generation

Top-p (nucleus sampling) limits output to the smallest set of most probable tokens whose cumulative probability reaches p.
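
A NumPy sketch that applies both filters to a probability distribution; real decoders do this on logits inside the sampling loop:

```python
import numpy as np

def filter_top_k_top_p(probs: np.ndarray, k: int = 50, p: float = 0.9) -> np.ndarray:
    order = np.argsort(probs)[::-1]  # token ids, most probable first
    keep = order[:k]                 # top-k cutoff
    cumulative = np.cumsum(probs[keep])
    keep = keep[: int(np.searchsorted(cumulative, p)) + 1]  # top-p cutoff
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()  # renormalize, then sample from this

probs = np.array([0.5, 0.25, 0.15, 0.07, 0.03])
filter_top_k_top_p(probs, k=3, p=0.9)  # mass left only on the top 3 tokens
```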

Toxicity
Safety

Toxicity is harmful, abusive, or offensive content.

Tracing
Operations

Tracing records the path of a request through a system.

Transformer
Models

The transformer is the neural network architecture behind most modern LLMs.
