Arabic ASR · Khaleeji-tuned · 120 days

Khaleeji speech recognition that hears what your customers actually say.

Sub-15 % WER on real Khaleeji call recordings — code-switching, channel noise, and brand names intact. Deployed inside your Genesys, Avaya, or Cisco estate in under 120 days, with a documented WER benchmark on your recordings before contract signature.

Request the 200-call WER benchmark · WhatsApp our Arabic Speech lead

NDA + DPA · TDRA / SAMA / CBUAE-aligned · Customer-held weights

Live transcript

287 ms first-token

Customer

السلام عليكم، [KHJ] صراحة [KHJ] the OTP ما [KHJ] وصل [KHJ]

Agent

وعليكم السلام، دقيقة I check your account.

listening…

Agent assist

Next best action

Re-send OTP via SMS

Customer eligible. KB article #2841.

Sentiment

+0.42

Detected intent

OTP not received 0.94
Login issue 0.18

11.8 %
WER on Khaleeji customer turns
287 ms
First-token latency, agent-assist
14,000 h
Proprietary Arabic corpus, dialect-labelled
120 days
From signed SoW to live on the floor

The operations-floor reality

Why your speech-analytics dashboard is being ignored.

When the Khaleeji and code-switched transcripts are wrong half the time, agents stop trusting the dashboard. AHT stays flat. The CXO promise to move from sample QA to 100 % coverage slips. Again.

A CXO-level AHT target slipping for the second quarter.
Every 10 seconds of avoidable AHT across a 2-million-call-per-month operation is a seven-figure annual line. Another miss triggers a CXO review and an RFP re-issue, with reputational cost to the head of operations who championed the existing stack.
Regulator pressure for full call coverage.
CBUAE, SAMA, and TDRA are pushing documented call-recording analytics for conduct and complaints. Partial coverage is a compliance exposure, not just an efficiency miss, and the audit committee has started asking why.
An Arabic-NPS gap visible on the CEO scorecard.
Arabic callers rate the same journeys 8–12 NPS points lower than English callers. That gap shows up directly on the CEO scorecard and it is now everyone's problem — not just the speech-analytics team's.
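
The seven-figure AHT claim above can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the loaded cost per agent-hour is an assumed placeholder, not a figure from this page, so substitute your own rate and currency.

```python
# Rough annual cost of avoidable handle time.
# ASSUMPTION: the hourly rate passed in below is a placeholder, not a sourced figure.

def annual_aht_cost(avoidable_seconds_per_call: float,
                    calls_per_month: int,
                    loaded_cost_per_agent_hour: float) -> float:
    """Return the yearly cost of the avoidable seconds, in the currency of the hourly rate."""
    seconds_per_year = avoidable_seconds_per_call * calls_per_month * 12
    agent_hours_per_year = seconds_per_year / 3600
    return agent_hours_per_year * loaded_cost_per_agent_hour

# 10 avoidable seconds on 2 million calls per month, at an assumed rate of 15 per hour.
cost = annual_aht_cost(10, 2_000_000, loaded_cost_per_agent_hour=15.0)
print(f"{cost:,.0f}")
```

Ten seconds across 2 million monthly calls is roughly 66,667 agent-hours a year, so any loaded rate of about 15 per hour or more puts the total at seven figures.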

Why generic Arabic ASR breaks on Khaleeji calls

Three structural reasons your current engine drops to 30 %+ WER.

Failure mode 1

Dialect lexicon gaps

MSA-trained ASR does not know that وايد means "very" in Gulf usage, or that ابغى ("I want") is the Najdi counterpart of MSA أريد. Customer requests get rewritten into plausible-but-wrong MSA equivalents.

Example: a UAE telco where customer turns containing the word شو were transcribed as ما in 41 % of cases on the incumbent stack.

Failure mode 2

Code-switching break

Stock engines treat Arabic and English as separate locales. Mid-sentence English brand names — the OTP, the bundle, the Apple ID — are dropped or transliterated incorrectly.

Example: a tier-1 bank where the brand "Liv." was transcribed correctly in 18 % of mentions on Azure ar-AE vs 96 % on the Brocode pipeline.
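
To make "code-switching rate" concrete, here is a minimal sketch that counts Arabic/English switches in a transcript by looking at the script of each token. It is a coarse heuristic for illustration, not the method used in the Brocode pipeline; the function names and the Unicode-range test are assumptions.

```python
import re

# Arabic block (U+0600-U+06FF) vs basic Latin letters; a deliberately coarse heuristic.
ARABIC = re.compile(r"[\u0600-\u06FF]")
LATIN = re.compile(r"[A-Za-z]")

def token_script(token: str) -> str:
    """Label a token 'ar', 'en', or 'other' by the script it contains."""
    if ARABIC.search(token):
        return "ar"
    if LATIN.search(token):
        return "en"
    return "other"

def code_switch_stats(utterance: str) -> dict:
    """Count language switches between adjacent Arabic/English tokens."""
    scripts = [s for s in map(token_script, utterance.split()) if s != "other"]
    switches = sum(1 for a, b in zip(scripts, scripts[1:]) if a != b)
    english_share = scripts.count("en") / len(scripts) if scripts else 0.0
    return {"tokens": len(scripts), "switches": switches, "english_share": english_share}

# The customer turn from the live-transcript demo at the top of the page.
stats = code_switch_stats("السلام عليكم، صراحة the OTP ما وصل")
print(stats)
```

On that turn the sketch finds 7 tokens, 2 switches (into English for "the OTP" and back out), and an English share of 2/7. Bucketing calls by a statistic like this is one way to report WER separately for heavily code-switched audio.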

Failure mode 3

Channel noise & 8 kHz telephony

Most Arabic ASR is benchmarked on 16 kHz broadcast. Real call audio is 8 kHz, lossy, with side-tone, agent headset distortion, and noisy customer mobile audio.

Example: 1,200 customer calls on a Saudi bank — Whisper-large-v3 dropped from a published 13 % WER on broadcast to 36 % WER on the live channel.
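
One way to see why broadcast benchmarks do not transfer is to push clean 16 kHz audio through a telephony-like channel before scoring it. The sketch below halves the sample rate and applies the continuous mu-law companding curve with coarse quantisation. It approximates a G.711-style channel for illustration only: it is not a bit-exact codec, the pair-averaging filter is crude, and none of it is taken from the Brocode stack.

```python
import math

MU = 255  # companding constant used by G.711 mu-law

def decimate_by_two(samples):
    """Halve the sample rate (e.g. 16 kHz -> 8 kHz) by averaging adjacent pairs.
    Pair averaging is a crude low-pass filter, used here only for illustration."""
    return [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples) - 1, 2)]

def mu_law_roundtrip(x, steps=128):
    """Compress a sample in [-1, 1] with mu-law, quantise it, and expand it back."""
    log_mu = math.log1p(MU)
    compressed = math.copysign(math.log1p(MU * abs(x)) / log_mu, x)
    quantised = round(compressed * steps) / steps   # roughly 8-bit resolution
    return math.copysign(math.expm1(abs(quantised) * log_mu) / MU, quantised)

def simulate_telephony(samples_16k):
    """Apply decimation, then the lossy companding round trip, to every sample."""
    return [mu_law_roundtrip(s) for s in decimate_by_two(samples_16k)]

# 100 ms of a 440 Hz tone sampled at 16 kHz, as a stand-in for real speech.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
narrowband = simulate_telephony(tone)
print(len(tone), "->", len(narrowband))
```

Real call audio adds side-tone, headset distortion, and mobile noise on top of this, so a simulation of this kind only gives a rough lower bound on the degradation.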

The Brocode Arabic speech stack

Whisper-large-v3 + NeMo Conformer + a Khaleeji adapter. On Triton, in your data centre.

A purpose-built ASR stack — every layer is named, every contribution is measured against the benchmark set, and every component runs inside your boundary.

Pipeline

Audio tap → base model → adapter → decoder → CCaaS

Base acoustic models

NVIDIA NeMo Conformer-Transducer + Whisper-large-v3

Both fine-tuned on a proprietary 14,000-hour Khaleeji + MSA + Egyptian + Levantine corpus — broadcast, call recordings under DPA, and noise-augmented synthetic data.

Adapter layer

LoRA-style Khaleeji head

MSA, Khaleeji, Levantine, and Egyptian variants share one base and swap adapter weights at runtime. Code-switching head trained on UAE / KSA bilingual call turns.
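
The idea of one shared base with swappable adapter weights can be sketched in a few lines. This toy example shows the general LoRA pattern (a frozen weight plus a low-rank update per dialect) in plain Python. The class, the dialect keys, and every number are made up for illustration; the page does not describe how the production adapters are implemented.

```python
# Toy illustration of one shared base weight with swappable low-rank adapters.
# All matrices and dialect keys below are made-up values for demonstration.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class AdaptedLinear:
    """y = W x + scale * B (A x): W is frozen and shared, (A, B) is per dialect."""

    def __init__(self, base_weight, scale=1.0):
        self.base_weight = base_weight
        self.scale = scale
        self.adapters = {}   # dialect name -> (A, B)
        self.active = None

    def register(self, name, a, b):
        self.adapters[name] = (a, b)

    def activate(self, name):
        """Switch the active adapter; None falls back to the bare base weight."""
        if name is not None and name not in self.adapters:
            raise KeyError(f"unknown adapter: {name}")
        self.active = name

    def forward(self, x):
        y = matvec(self.base_weight, x)
        if self.active is not None:
            a, b = self.adapters[self.active]
            delta = matvec(b, matvec(a, x))   # low-rank update B (A x)
            y = [yi + self.scale * di for yi, di in zip(y, delta)]
        return y

layer = AdaptedLinear(base_weight=[[1.0, 0.0], [0.0, 1.0]])
layer.register("khaleeji", a=[[1.0, 1.0]], b=[[0.5], [0.0]])    # rank-1 update
layer.register("egyptian", a=[[1.0, -1.0]], b=[[0.0], [0.5]])

x = [2.0, 1.0]
layer.activate("khaleeji")
khaleeji_out = layer.forward(x)
layer.activate("egyptian")
egyptian_out = layer.forward(x)
print(khaleeji_out, egyptian_out)
```

Because only the small (A, B) pair changes, switching dialect at runtime is a dictionary lookup and leaves the large base weights untouched.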

Serving

NVIDIA Triton + Riva chunked attention

Streaming decoder, < 300 ms first-token latency for real-time agent assist. Continuous batching for cost efficiency on post-call workloads.

Retraining

MLflow + appliance retraining loop

Monthly retraining on your call data, on your appliance. Lineage and rollback covered by MLOps & AI Infrastructure.

Side-by-side

Brocode vs the engines on your shortlist.

Measured on a shared 50-hour Khaleeji + Egyptian call set — agent and customer turns, English code-switching, 8 kHz telephony channel.

Capability	Brocode	Nuance Mix.asr	Azure Speech (ar-AE)	Google ar-XA	AWS Transcribe	In-house Whisper
Khaleeji WER on real call audio (customer turns, English code-switching, 8 kHz telephony channel)	11.8 %	~28 %	~31 %	~34 %	~37 %	~22 %
Dedicated Khaleeji dialect adapter
Arabic-English code-switching head		Partial
Real-time agent assist < 300 ms			Variable
On-premise / in-country appliance
Native Genesys / Avaya / Cisco connectors
Pre-contract benchmark on your audio	Free 200-call	Paid POC	Paid POC	No	No	In-house effort

Numbers from the Khaleeji ASR Benchmark Report 2026 (Q1 2026 refresh). All figures require confirmation on your own recordings during the pre-contract benchmark.

The three objections from your CXO sponsor

What gets asked in the steering committee, and what we answer.

Objection 1

Show me real WER on Khaleeji call audio — not MSA broadcast benchmarks.

Free 200-call pre-contract benchmark on your own recordings, under NDA + DPA. Reported by dialect bucket, by code-switching rate, by agent vs customer turn. If our numbers do not clear your gates, we walk.
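
For readers who want to check a WER report themselves, the sketch below computes word error rate per bucket using word-level edit distance. The bucket names and sample rows are invented for illustration. A real benchmark would also need agreed text normalisation (Arabic diacritics, numerals, punctuation), which this sketch omits.

```python
from collections import defaultdict

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]

def wer_by_bucket(rows):
    """rows: iterable of (bucket, reference_text, hypothesis_text).
    Returns {bucket: total_errors / total_reference_words}."""
    errors, words = defaultdict(int), defaultdict(int)
    for bucket, ref, hyp in rows:
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        errors[bucket] += edit_distance(ref_tokens, hyp_tokens)
        words[bucket] += len(ref_tokens)
    return {b: errors[b] / words[b] for b in words if words[b]}

# Invented sample rows; the bucket could equally be a dialect or a code-switching band.
rows = [
    ("customer", "the otp did not arrive", "the otp did not arrive"),
    ("customer", "please resend the otp", "please send the otp"),
    ("agent", "one minute i check your account", "one minute i check account"),
]
wer_report = wer_by_bucket(rows)
print(wer_report)
```

Errors are pooled per bucket before dividing, so long calls weigh more than short ones. Averaging per-call WER instead would give a different number, which is worth agreeing on before comparing engines.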

Objection 2

We cannot ship call recordings to a US cloud. Where exactly does inference happen?

Inside your boundary. On-premise 6U appliance, or in-country sovereign cloud (G42, stc), or hyperscaler UAE / KSA region under your residency commitments. No audio crosses the border. DPA template and TDRA pack included.

Objection 3

Real-time agent assist needs sub-500 ms. How do you tap the media without re-architecting our SBCs?

SIPREC / AudioHook / DMCC / Finesse — depending on the platform. No SBC change. First-token latency ~287 ms, NBA card surfaced inside 700 ms end-to-end. Latency budget sized with your network team in week 2.
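
A latency budget of this kind is easy to keep honest with a small check. In the sketch below, only the 287 ms first-token figure and the 700 ms end-to-end ceiling come from this page. The downstream stage names and their values are assumptions, and treating the stages as strictly additive is a simplification.

```python
# Only FIRST_TOKEN_MS and END_TO_END_CEILING_MS are taken from the page.
FIRST_TOKEN_MS = 287
END_TO_END_CEILING_MS = 700

# ASSUMED stage timings in milliseconds; replace with measured values.
downstream_stages_ms = {
    "intent_classification": 60,
    "knowledge_base_lookup": 150,
    "nba_card_render": 80,
    "network_round_trip": 70,
}

def budget_report(first_token_ms, stages_ms, ceiling_ms):
    """Sum the stages and report the headroom left under the ceiling."""
    total = first_token_ms + sum(stages_ms.values())
    return {
        "total_ms": total,
        "headroom_ms": ceiling_ms - total,
        "within_budget": total <= ceiling_ms,
    }

latency = budget_report(FIRST_TOKEN_MS, downstream_stages_ms, END_TO_END_CEILING_MS)
print(latency)
```

With these placeholder values the total is 647 ms, leaving 53 ms of headroom. The useful part is the structure: each stage gets a named allowance that can be compared against measurements from your own network.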

Integration with your voice estate

Wired into the platforms your contact centre already runs.

Six platforms, six integration patterns. Each documented with a reference architecture and a latency budget — sized against your real call concurrency.

Genesys Cloud CX

AudioHook real-time tap; conversation-context API for agent-assist; post-call analytics into the Workforce Engagement plane.

Genesys Engage (on-prem)

SIPREC tap into the existing recording fabric; CallControl events for screen pops; no change at the SBC.

Avaya Aura

DMCC media tap or passive mirror at the recorder; AES events for agent presence; analytics surfaced in Avaya Workspaces.

Cisco UCCE

Finesse gadget for real-time transcript; SIPREC source via CVP / Webex Connect; CUIC reporting plug-in.

NICE CXone / Engage

Real-time WFO connector; archive ingest of recorded calls for retraining; compliance-flag write-back into the case-management layer.

Amazon Connect

Kinesis Video Streams tap; agent-assist via Connect Tasks API; deployed inside your AWS UAE or KSA region for residency.

Use-case deep dives

Where the Khaleeji ASR earns its keep on day one.

Real-time agent assist

Live transcript, sentiment ribbon, and next-best-action cards surfaced to the agent inside 700 ms end-to-end. Knowledge-base lookup, intent-aware prompts, and compliance triggers fire as the call unfolds. AHT typically lands 40–60 seconds down on the first deployed queue.

AHT down 47 s · NBA acceptance 38 %

Post-call analytics & 100 % QA

Replaces a 4 %-sample QA programme with 100 % call coverage. Searchable bilingual transcripts, sentiment trends, complaint detection, and conduct flagging across MSA, Khaleeji, English. Conduct recall typically 3–4x the sampling baseline.

Conduct recall × 3.4

Voicebot / IVR replacement

Conversational voicebots in Khaleeji and MSA, deployed against the same ASR core, with handoff back to human agents under full context preservation. Replaces touch-tone IVRs and reduces "press 0" abandonment.

32 % containment on tier-1 bank IVR

Compliance & conduct flagging

Regulator-aligned flags for CBUAE / SAMA / TDRA conduct rules: missed disclosures, mis-selling language, complaint suppression. Audit trail with timestamp, agent ID, and call recording reference; integrates with NICE Engage and Verint compliance archives.

100 % call coverage vs 4 % sample baseline
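
As an illustration of what an audit-trail entry could carry, the sketch below models one conduct flag with the three fields named above (timestamp, agent ID, call recording reference) plus a rule label and an excerpt. The field names, the rule identifier, and the reference format are hypothetical; they are not the schema of any product mentioned on this page.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass(frozen=True)
class ConductFlag:
    """One immutable audit-trail entry (hypothetical schema)."""
    timestamp: str           # ISO 8601, UTC
    agent_id: str
    recording_ref: str       # pointer into the compliance archive
    rule: str                # e.g. "missed_disclosure"
    transcript_excerpt: str

def to_audit_line(flag: ConductFlag) -> str:
    """Serialise one flag as a JSON line with stable key order."""
    return json.dumps(asdict(flag), ensure_ascii=False, sort_keys=True)

flag = ConductFlag(
    timestamp=datetime(2026, 1, 15, 9, 30, tzinfo=timezone.utc).isoformat(),
    agent_id="agent-0042",
    recording_ref="rec/2026/01/15/000123",
    rule="missed_disclosure",
    transcript_excerpt="...",
)
line = to_audit_line(flag)
print(line)
```

Keeping the record frozen and writing it as one JSON line per flag makes the trail append-only and easy to load into an existing archive.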

For voicebot / IVR work, see Conversational AI & Chatbots. For downstream Arabic NLP on the transcripts, see Natural Language Processing.

Anonymised references

Three live deployments. Each available in full under NDA.

UAE telco — 4.2 million calls / month

WER reduced from 31 % to 11.8 % on Khaleeji customer turns. AHT down 47 seconds across the inbound consumer queue. Full Genesys Cloud CX AudioHook integration, 1,400 seats, real-time agent assist live in 14 weeks.

WER 31 % → 11.8 %

KSA tier-1 bank — agent assist on 1,800 seats

Real-time next-best-action across complaint and retention queues. Complaint-call AHT down 22 %, Arabic-language NPS up 9 points in 6 months. SAMA-aligned conduct flagging now visible to the audit committee weekly.

Arabic NPS +9 points

Federal-entity citizen hotline

100 % call-coverage QA replacing a 4 % sample programme. Conduct-flagging recall 3.4x the prior baseline. Khaleeji + MSA + Egyptian dialect coverage, all on a single appliance behind the existing Avaya Aura estate.

100 % coverage QA · recall × 3.4

Reference logos on request under NDA. Related sector pages: Banking & Financial Services.

Free download

Khaleeji ASR Benchmark Report 2026: 6 Engines on 50 Hours of UAE & KSA Call Audio

A 28-page technical report on how six enterprise Arabic ASR engines perform on real GCC contact-centre audio. Plus an interactive WER explorer — filter by dialect, channel quality, code-switching rate, agent vs customer turn.

Benchmark setup — 50-hour Khaleeji + Egyptian corpus, telephony channel
WER by dialect bucket: Nuance, Azure, Google, AWS, Whisper, Brocode
Code-switching handling — where each engine breaks
Latency budget for real-time agent assist (8 kHz vs 16 kHz)
Free 200-call pre-contract benchmark on your own audio

FAQ

What CXOs and procurement leads ask first.

The eight questions our speech team answers in nearly every steering committee — straight, on the record.

Whisper-large-v3 out of the box has seen very little Khaleeji audio in training. Our base models — Whisper-large-v3 and NeMo Conformer-Transducer — are fine-tuned on a proprietary 14,000-hour corpus that mixes Khaleeji broadcast, agent-customer call recordings collected under DPA from regional operators, and noise-augmented synthetic data. Layered on top is a LoRA-style Khaleeji adapter and an explicit Arabic-English code-switching head. The pre-contract benchmark measures the delta on your own recordings, by dialect bucket and by code-switching rate, before any contract.

Pre-contract WER benchmark

Two hundred of your call recordings. One signed WER report. No contract.

Six fields — volume, use cases, dialect mix, voice platform, deployment, current engine. Our Arabic Speech lead reviews your recordings under NDA + DPA and replies within one business day with the proposed benchmark plan.

Or skip the form.

Message our Speech lead on WhatsApp directly. We see it within working hours.

Message on WhatsApp
