The Brocode AI Glossary · 240 terms · 5 categories

Every AI term you'll hear in a UAE boardroom — defined in under 60 seconds.

Plain English, a one-sentence GCC example, and a one-line note on why it matters to your business. No Wikipedia walls of text. Arabic-script parity on Arabic-specific entries. Reviewed quarterly by a named principal engineer.

240 terms

Foundation models · 62
Classical ML · 48
MLOps & infrastructure · 54
Governance & risk · 40
Arabic-specific · 36

Download the 24-page Pocket Guide
Get the monthly vocabulary email

How an entry is built

A consistent three-block format on every term.

The shortest accurate answer, in the same shape every time. Definition. Example. Why it matters.

Block 1 · Plain definition
Forty words or fewer. No jargon. No analogies that require a second analogy. A non-technical reader can quote it into a board paper.
Block 2 · UAE / GCC example
One sentence, regionally grounded. A UAE bank, a federal entity, a Saudi retailer, a Qatari hospital — the example is recognisable to a GCC reader within five seconds.
Block 3 · Why it matters to your business
One sentence on the commercial or operational consequence. If the visitor remembers nothing else, this is the line they remember.

240 · Terms in the library
5 · Categories, from foundation to Arabic-specific
Quarterly · Reviewed every quarter by named engineers
60 s · Average time to read an entry

Most-read this month

What other readers searched for first.

1. RAG
2. AI agent
3. Fine-tuning
4. Data residency
5. Vector database
6. Khaleeji Arabic

Browse by category

Five tracks across the 240 terms.

Foundation models · 62 terms
LLM, transformer, attention, embedding, RAG, context window, agent, tool use.

Classical ML · 48 terms
Supervised, unsupervised, reinforcement, gradient descent, overfitting, ROC.

MLOps & infrastructure · 54 terms
Training, inference, drift, observability, model registry, vector database.

Governance & risk · 40 terms
Model risk, bias, fairness, audit trail, data residency, TDRA, CBUAE, FSRA.

Arabic-specific · 36 terms
Khaleeji, MSA, NER, tashkeel, tatweel, with Arabic-script parity.

Sample entries — read inline

Fifteen terms in the three-block format.

The first chapter of the glossary, rendered openly. Arabic-script parity is shown on Arabic-specific terms.

Foundation

RAG (Retrieval-Augmented Generation)

Definition: A pattern where a language model is fed relevant passages from your own corpus at query time, so its answer is grounded in your data rather than its training memory.
GCC example: A UAE bank uses RAG to answer customer queries against its product disclosures — the LLM cites the disclosure page rather than inventing a rate.
Why it matters: It is the cheapest way to get an LLM to be useful on private data without retraining.

Read the practitioner guide
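
The pattern in the RAG entry can be sketched in a few lines. This is a minimal illustration, not a production design: the three passages, their ids, and the word-overlap scoring are all hypothetical stand-ins (a real system would use an embedding model and send the assembled prompt to an LLM).

```python
import re

# Hypothetical three-passage corpus; a real system would index actual disclosures.
PASSAGES = {
    "savings-p3": "The savings account pays a profit rate reviewed every quarter.",
    "cards-p7": "The credit card annual fee is waived in the first year.",
    "loans-p2": "Personal loan early settlement carries a fee of one percent.",
}


def tokens(text):
    """Lower-cased word set; a crude stand-in for an embedding model."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, k=1):
    """Return the k (id, text) pairs sharing the most words with the query."""
    q = tokens(query)
    ranked = sorted(
        PASSAGES.items(),
        key=lambda item: len(q & tokens(item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query):
    """Assemble the grounded prompt that would be sent to the language model."""
    context = "\n".join(f"[{pid}] {text}" for pid, text in retrieve(query))
    return (
        "Answer using only the passages below and cite the passage id.\n"
        f"{context}\n"
        f"Question: {query}"
    )


print(build_prompt("Is the credit card annual fee waived?"))
```

The model's weights are never touched; only the prompt changes from query to query, which is why the entry describes RAG as cheaper than retraining.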

Foundation

Fine-tuning

Definition: Updating a model's weights on a small, task-specific dataset so it performs better on that task. Distinct from RAG, which leaves the weights alone.
GCC example: A federal entity fine-tunes Falcon-7B on 8,000 Khaleeji intent-classification examples and improves accuracy from 71 % to 88 %.
Why it matters: It is the right answer when behaviour, not knowledge, is what you need to change.

Read the practitioner guide

MLOps

Vector database

Definition: A database optimised for nearest-neighbour search over high-dimensional embeddings, used as the retrieval layer in most RAG systems.
GCC example: A telco stores 4 million product-disclosure passages as 768-dim embeddings in pgvector and serves queries at a p95 latency of 18 ms.
Why it matters: Choice of vector store affects latency, cost, and where the data lives — three things procurement asks about.
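
What "nearest-neighbour search over embeddings" means can be shown with a brute-force scan. This is a sketch under stated assumptions: the three-dimensional vectors and passage ids are invented for illustration, and a real vector store would use far larger vectors and an approximate index rather than comparing the query against every row.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical toy index: passage id -> embedding (3 dimensions instead of 768).
INDEX = {
    "fees": [0.9, 0.1, 0.0],
    "rates": [0.1, 0.9, 0.1],
    "branches": [0.0, 0.2, 0.9],
}


def nearest(query, k=2):
    """Return the k most similar passage ids with their scores, best first."""
    scored = [(pid, cosine(query, vec)) for pid, vec in INDEX.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]


for pid, score in nearest([0.8, 0.3, 0.0]):
    print(pid, round(score, 3))
```

The scan above grows linearly with the number of stored passages, which is the cost that dedicated vector indexes exist to avoid.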

Foundation

Context window

Definition: The maximum number of tokens (roughly, sub-words) a language model can consider at one time. Anything that does not fit in the window is not seen by the model.
GCC example: Claude 3.5 has a 200K-token window; GPT-4o has 128K; the original Jais 13B release has 2K. Window size shapes RAG chunking strategy.
Why it matters: Window size is the difference between answering a single email and answering a 60-page board pack.
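
The link between window size and chunking can be made concrete with a small splitter. This is a rough sketch: it counts whitespace-separated words as a stand-in for tokens (real tokenisers split differently, especially for Arabic), and the `chunk` function and its parameters are hypothetical names.

```python
def chunk(text, max_tokens, overlap=0):
    """Split text into windows of at most max_tokens whitespace tokens.

    Consecutive windows share `overlap` tokens so that a sentence cut at a
    boundary still appears whole in one of the two neighbouring chunks.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


doc = " ".join(f"w{i}" for i in range(10))
for piece in chunk(doc, max_tokens=4, overlap=1):
    print(piece)
```

A smaller window forces smaller chunks and therefore more retrieval calls per question; a larger window lets whole sections travel together.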

Foundation

AI agent

Definition: A program that uses a language model to decide what tool to call next, in a loop, until a goal is met. Distinguished from a chatbot by its ability to take actions.
GCC example: A claims agent in a UAE insurer pulls the policy, queries the fraud model, drafts a settlement letter, and routes it for adjuster review.
Why it matters: Agents close the loop between language and action — they are how AI moves from advice to work.

Read the practitioner guide
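
The "decide, call a tool, repeat" loop from the definition can be sketched without any model at all. Everything here is hypothetical: the three tools mirror the insurer example, their return values are made up, and `choose_next_tool` is a fixed rule standing in for the language model's decision.

```python
# Hypothetical tools: each takes the shared state dict and returns an updated copy.
def pull_policy(state):
    return {**state, "policy": "POL-001"}


def score_fraud(state):
    return {**state, "fraud_score": 0.12}


def draft_letter(state):
    return {**state, "letter": f"Settlement draft for {state['policy']}"}


TOOLS = {
    "pull_policy": pull_policy,
    "score_fraud": score_fraud,
    "draft_letter": draft_letter,
}


def choose_next_tool(state):
    """Stand-in for the language model: pick the first step not yet done."""
    plan = (("pull_policy", "policy"),
            ("score_fraud", "fraud_score"),
            ("draft_letter", "letter"))
    for name, key in plan:
        if key not in state:
            return name
    return None  # goal met: hand over to the human adjuster


def run_agent(state, max_steps=10):
    """Loop until the chooser returns None or the step budget runs out."""
    trace = []
    for _ in range(max_steps):
        tool = choose_next_tool(state)
        if tool is None:
            break
        state = TOOLS[tool](state)
        trace.append(tool)
    return state, trace


final_state, trace = run_agent({"claim_id": "C-42"})
print(trace)
print(final_state["letter"])
```

The step budget (`max_steps`) and the explicit hand-off to a reviewer are the two guard rails worth noticing: an agent that can act also needs a way to stop.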

Arabic-specific · العربية الخليجية

Khaleeji Arabic

Definition: The cluster of Gulf Arabic dialects spoken across the UAE, KSA, Qatar, Bahrain, Kuwait, and Oman. Distinct from Modern Standard Arabic in vocabulary, morphology, and phonology.
GCC example: A call from Sharjah to a contact centre is in Khaleeji; an MSA-tuned model mis-transcribes 9–14 % more tokens than a Khaleeji-tuned one.
Why it matters: Most production AI for GCC customers fails on Khaleeji unless explicitly tuned for it.

Read the practitioner guide

Arabic-specific · العربية الفصحى الحديثة

MSA (Modern Standard Arabic)

Definition: The pan-Arab literary and broadcast register, used in print, news, and official documents. Almost nobody speaks it in everyday conversation.
GCC example: A federal-entity letter is in MSA; the call-centre conversation about that letter is in Khaleeji.
Why it matters: Most published Arabic NLP benchmarks score MSA — your production data is rarely MSA.

Read the practitioner guide

Arabic-specific · التشكيل

Tashkeel

Definition: Diacritical marks that indicate short vowels, consonant doubling, and case endings in Arabic script. Usually omitted in everyday text, which leaves some words ambiguous until context resolves them.
GCC example: Without tashkeel, the word "كتب" can mean "he wrote", "books", or "was written" — a Khaleeji intent classifier has to handle all three.
Why it matters: Tashkeel handling is a tokenisation question with downstream model-quality effects.
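
One common preprocessing choice is to strip tashkeel before tokenisation, which is exactly what collapses the three readings in the example into one string. The sketch below assumes the marks of interest are the Unicode range U+064B to U+0652 plus the tatweel stretch character U+0640; whether stripping is the right choice for a given model is a separate decision that this snippet does not settle.

```python
import re

# Arabic diacritics (fathatan .. sukun) in the Unicode Arabic block.
TASHKEEL = re.compile("[\u064B-\u0652]")
# Tatweel (kashida): a purely typographic stretch character.
TATWEEL = "\u0640"


def strip_tashkeel(text):
    """Remove diacritics and tatweel, leaving the bare consonant skeleton."""
    return TASHKEEL.sub("", text).replace(TATWEEL, "")


# The three vocalised readings from the example, written with escapes so the
# diacritics are visible in the source regardless of font or editor.
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf+fatha, ta+fatha, ba+fatha: "he wrote"
kutub = "\u0643\u064F\u062A\u064F\u0628"         # kaf+damma, ta+damma, ba: "books"
kutiba = "\u0643\u064F\u062A\u0650\u0628\u064E"  # kaf+damma, ta+kasra, ba+fatha: "was written"

bare = "\u0643\u062A\u0628"  # the undiacritised form seen in everyday text

for word in (kataba, kutub, kutiba):
    print(strip_tashkeel(word) == bare)
```

After stripping, all three inputs are the same three-letter string, so any distinction between them has to come from the surrounding context.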

Governance

Data residency

Definition: A requirement that data remain physically within a specified jurisdiction at rest, in transit, and during processing.
GCC example: A CBUAE-supervised bank requires customer PII to remain in UAE-resident infrastructure: in practice, AWS Middle East (UAE), Azure UAE North, OCI Abu Dhabi, or G42 Cloud.
Why it matters: Residency drives architecture, vendor selection, and contract clauses. Get it wrong at design and you pay at audit.

Read the practitioner guide

Foundation

Hallucination

Definition: When a language model produces a fluent, plausible answer that is factually wrong — usually because the answer was not grounded in retrieved evidence.
GCC example: An ungrounded chatbot tells a customer a fictitious branch is open at midnight; the customer drives there and complains.
Why it matters: Hallucinations are the failure mode procurement worries about most. RAG, citation, and refusal patterns reduce them but do not eliminate them.

MLOps

Model drift

Definition: A gradual deterioration in model performance because the live data distribution has drifted away from the training data.
GCC example: A fraud model trained pre-Ramadan misses a seasonal mule-account pattern; recall falls 12 % in the first week of the holy month.
Why it matters: Drift detection is the single MLOps capability separating a hobby model from a production one.
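
One simple way to detect that the live distribution has moved is the Population Stability Index (PSI), computed over the same bins for training and live data. The glossary does not name a specific method, so PSI is an illustrative choice here; the bin shares below are invented, and any alert threshold should be treated as an assumption to tune rather than a standard.

```python
import math


def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions.

    Both inputs are lists of bin shares that each sum to 1. Empty bins are
    floored at eps to keep the logarithm defined.
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


# Hypothetical shares of transactions per amount band.
train_share = [0.25, 0.25, 0.25, 0.25]  # at training time
live_share = [0.10, 0.20, 0.30, 0.40]   # first week of live traffic

score = psi(train_share, live_share)
print(round(score, 3))

ALERT_THRESHOLD = 0.2  # assumption: pick per model and per feature
print("drift alert" if score > ALERT_THRESHOLD else "stable")
```

PSI only says that inputs have shifted; it does not say whether the model's accuracy has suffered, so it is usually paired with monitoring of the model's own metrics once labels arrive.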

Governance

Model card

Definition: A short structured document recording what a model does, what data trained it, how it was evaluated, and its known limits — the regulator-facing equivalent of a datasheet.
GCC example: A CBUAE-supervised bank publishes a model card for every AI model in customer-facing production. The card is the first thing the supervisor asks for.
Why it matters: Model cards turn AI risk into a documented, auditable artefact. No card, no production deployment.

Read the practitioner guide

Foundation

Embedding

Definition: A vector of numbers representing the meaning of a piece of text, image, or audio in a high-dimensional space — so that semantically similar items sit near each other.
GCC example: A retailer embeds product titles in Arabic and English in a shared 768-dim space; a Khaleeji query finds the English product.
Why it matters: Embeddings are how search escapes keyword matching and starts behaving like a human reader.

Governance

TDRA-compliance

Definition: Conformance with the technical and licensing standards set by the UAE Telecommunications and Digital Government Regulatory Authority for connected and AI-enabled services.
GCC example: A federal-entity AI service exposes APIs through a TDRA-licensed interconnect; data does not leave the UAE.
Why it matters: TDRA-alignment is a procurement gate for federal projects. It shapes architecture and operating model.

MLOps

Quantisation

Definition: Reducing the numeric precision of a model's weights — for example, from 16-bit to 8-bit or 4-bit — to shrink memory and accelerate inference.
GCC example: A 13B-parameter LLM quantised to INT4 fits on a single 24 GB GPU and serves at twice the throughput.
Why it matters: Quantisation is how a self-hosted model becomes commercially viable on a sensible GPU bill.
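
The memory arithmetic behind the example, plus a toy round trip, can be sketched as follows. The figures cover weights only (activations and the attention cache need additional memory), and the symmetric per-tensor INT8 scheme shown is one simple illustrative method, not necessarily what any particular serving stack uses.

```python
def weight_memory_gb(params, bits):
    """Memory needed for the weights alone, in gigabytes (10^9 bytes)."""
    return params * bits / 8 / 1e9


def quantise_int8(weights):
    """Symmetric quantisation: map floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    return [round(w / scale) for w in weights], scale


def dequantise(q, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in q]


for bits in (16, 8, 4):
    print(bits, "bit:", weight_memory_gb(13e9, bits), "GB")

weights = [0.5, -1.27, 0.03]  # hypothetical weight values
q, scale = quantise_int8(weights)
restored = dequantise(q, scale)
print(q)
print(max(abs(w - r) for w, r in zip(weights, restored)))
```

At 16 bits the 13B model's weights need 26 GB and do not fit on a 24 GB card; at 4 bits they need 6.5 GB, which is the gap the entry refers to. The round trip shows the price: each restored weight is off by at most half a quantisation step.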

Reviewed by Yasmin Al Marzooqi, Head of Arabic NLP — last refresh February 2026.

Arabic-specific entries — English and Arabic side-by-side

Three regional terms in script-parity format.

Where the term is anchored in the regional context, the Arabic rendering sits beside the English. Reviewed by a native MSA editor.

Khaleeji Arabic
العربية الخليجية
The cluster of Gulf Arabic dialects spoken across the UAE, KSA, Qatar, Bahrain, Kuwait, and Oman. Distinct from Modern Standard Arabic in vocabulary, morphology, and phonology.
Why it matters: Most production AI for GCC customers fails on Khaleeji unless explicitly tuned for it.

MSA (Modern Standard Arabic)
العربية الفصحى الحديثة
The pan-Arab literary and broadcast register, used in print, news, and official documents. Almost nobody speaks it in everyday conversation.
Why it matters: Most published Arabic NLP benchmarks score MSA, but your production data is rarely MSA.

Tashkeel
التشكيل
Diacritical marks that indicate short vowels, consonant doubling, and case endings in Arabic script. Usually omitted in everyday text, which leaves some words ambiguous until context resolves them.
Why it matters: Tashkeel handling is a tokenisation question with downstream model-quality effects.

Free download

Brocode AI Glossary — Pocket Guide

A 24-page downloadable distillation of the 60 most-asked terms, formatted for printing or reading on a phone. The three-block format is preserved. The Arabic-script parity is preserved.

The 60 most-asked terms in the three-block format
Foundation, Classical ML, MLOps, Governance, Arabic-specific
Arabic-script parity on Arabic-specific terms
Printable single-sheet quick-reference at the back
Reviewed by Yasmin Al Marzooqi — last refresh February 2026

Glossary editorial

Missing a term? Suggest one — the editorial team reviews suggestions monthly.

Common requests in the last quarter: speculative decoding, ICV (in-country value), tatweel, prompt injection, and federated learning. Three of these will be in the next refresh.

Suggest a term

Related capabilities and stories

Insights hub

Practitioner guides

Commercial FAQs

NLP services

Generative AI development

Monthly vocabulary email: three new terms, one practitioner guide, one minute to read.
