Arabic ASR · Khaleeji-tuned · 120 days
Khaleeji speech recognition that hears what your customers actually say.
Sub-15 % WER on real Khaleeji call recordings — code-switching, channel noise, and brand names intact. Deployed inside your Genesys, Avaya, or Cisco estate in under 120 days, with a documented WER benchmark on your recordings before contract signature.
NDA + DPA · TDRA / SAMA / CBUAE-aligned · Customer-held weights
Customer
السلام عليكم، صراحة the OTP ما وصل ("Peace be upon you. Honestly, the OTP has not arrived." Arabic tokens tagged KHJ, Khaleeji.)
Agent
وعليكم السلام، دقيقة I check your account. ("And peace be upon you. One minute, I check your account.")
Agent assist
Next best action
Re-send OTP via SMS
Customer eligible. KB article #2841.
Sentiment
Detected intent
- OTP not received: 0.94
- Login issue: 0.18
11.8 %
WER on Khaleeji customer turns
287 ms
First-token latency, agent-assist
14,000 h
Proprietary Arabic corpus, dialect-labelled
120 days
From signed SoW to live on the floor
The operations-floor reality
Why your speech-analytics dashboard is being ignored.
When the Khaleeji and code-switched transcripts are wrong half the time, agents stop trusting the dashboard. AHT stays flat. The CXO promise to move from sample QA to 100 % coverage slips. Again.
A CXO-level AHT target slipping for the second quarter.
Every 10 seconds of avoidable AHT across a 2-million-call-per-month operation is a seven-figure annual line. Another miss triggers a CXO review and an RFP re-issue, with reputational cost to the head of operations who championed the existing stack.
Regulator pressure for full call coverage.
CBUAE, SAMA, and TDRA are pushing documented call-recording analytics for conduct and complaints. Partial coverage is a compliance exposure, not just an efficiency miss, and the audit committee has started asking why.
An Arabic-NPS gap visible on the CEO scorecard.
Arabic callers rate the same journeys 8–12 NPS points lower than English callers. That gap shows up directly on the CEO scorecard and it is now everyone's problem — not just the speech-analytics team's.
Why generic Arabic ASR breaks on Khaleeji calls
Three structural reasons your current engine drops to 30 %+ WER.
Failure mode 1
Dialect lexicon gaps
MSA-trained ASR does not know that وايد means "very" in Gulf usage, or that ابغى is the Najdi form of "I want". Customer requests get rewritten into plausible-but-wrong MSA equivalents.
Example: a UAE telco where customer turns containing the word شو were transcribed as ما in 41 % of cases on the incumbent stack.
Failure mode 2
Code-switching break
Stock engines treat Arabic and English as separate locales. Mid-sentence English terms and brand names (the OTP, the bundle, the Apple ID) are dropped or transliterated incorrectly.
Example: a tier-1 bank where the brand "Liv." was transcribed correctly in 18 % of mentions on Azure ar-AE vs 96 % on the Brocode pipeline.
Failure mode 3
Channel noise & 8 kHz telephony
Most Arabic ASR is benchmarked on 16 kHz broadcast. Real call audio is 8 kHz, lossy, with side-tone, agent headset distortion, and noisy customer mobile audio.
Example: 1,200 customer calls at a Saudi bank, where Whisper-large-v3 dropped from a published 13 % WER on broadcast to 36 % WER on the live channel.
The Brocode Arabic speech stack
Whisper-large-v3 + NeMo Conformer + a Khaleeji adapter. On Triton, in your data centre.
A purpose-built ASR stack — every layer is named, every contribution is measured against the benchmark set, and every component runs inside your boundary.
Pipeline
Audio tap → base model → adapter → decoder → CCaaS
Base acoustic models
NVIDIA NeMo Conformer-Transducer + Whisper-large-v3
Both fine-tuned on a proprietary 14,000-hour Khaleeji + MSA + Egyptian + Levantine corpus — broadcast, call recordings under DPA, and noise-augmented synthetic data.
Adapter layer
LoRA-style Khaleeji head
MSA, Khaleeji, Levantine, and Egyptian variants share one base and swap adapter weights at runtime. Code-switching head trained on UAE / KSA bilingual call turns.
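The shared-base, swappable-adapter idea can be sketched in a few lines. This is a toy illustration of the general LoRA pattern (a frozen weight plus a low-rank delta per dialect), not the production implementation; the class names, matrix sizes, and adapter values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


@dataclass
class LoraAdapter:
    down: Matrix        # shape (rank, d_in)
    up: Matrix          # shape (d_out, rank)
    scale: float = 1.0  # conventionally alpha / rank


class AdaptedLinear:
    """One frozen base weight shared by every dialect, plus swappable low-rank deltas."""

    def __init__(self, base_weight: Matrix) -> None:
        self.base_weight = base_weight
        self.adapters: Dict[str, LoraAdapter] = {}
        self.active: Optional[str] = None

    def register(self, name: str, adapter: LoraAdapter) -> None:
        self.adapters[name] = adapter

    def activate(self, name: Optional[str]) -> None:
        # Swapping dialects only changes a pointer; the base weight is untouched.
        if name is not None and name not in self.adapters:
            raise KeyError(f"unknown adapter: {name}")
        self.active = name

    def forward(self, x: List[float]) -> List[float]:
        column = [[v] for v in x]
        out = matmul(self.base_weight, column)
        if self.active is not None:
            adapter = self.adapters[self.active]
            delta = matmul(adapter.up, matmul(adapter.down, column))
            out = [[o[0] + adapter.scale * d[0]] for o, d in zip(out, delta)]
        return [row[0] for row in out]


# Hypothetical 2x2 base layer with two rank-1 adapters.
layer = AdaptedLinear([[1.0, 0.0], [0.0, 1.0]])
layer.register("khaleeji", LoraAdapter(down=[[1.0, 0.0]], up=[[0.0], [1.0]], scale=0.5))
layer.register("msa", LoraAdapter(down=[[0.0, 1.0]], up=[[1.0], [0.0]]))

layer.activate("khaleeji")
khaleeji_out = layer.forward([2.0, 3.0])
layer.activate("msa")
msa_out = layer.forward([2.0, 3.0])
```

Because each adapter stores only the two small matrices, several dialects can stay resident next to a single copy of the base model, which is what makes a runtime swap cheap.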
Serving
NVIDIA Triton + Riva chunked attention
Streaming decoder, < 300 ms first-token latency for real-time agent assist. Continuous batching for cost efficiency on post-call workloads.
Retraining
MLflow + appliance retraining loop
Monthly retraining on your call data, on your appliance. Lineage and rollback covered by MLOps & AI Infrastructure.
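One way to picture lineage and rollback in a monthly retraining loop is a small registry that promotes a candidate only when it beats the live model on a fixed held-out set. The sketch below is a plain-Python stand-in, not MLflow's API; the class names are hypothetical, and every WER value other than 0.118 is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelVersion:
    version: int
    wer: float             # WER on a fixed held-out set, as a fraction
    parent: Optional[int]  # version this one was retrained from (lineage)


@dataclass
class Registry:
    history: List[ModelVersion] = field(default_factory=list)
    live: Optional[int] = None

    def _get(self, version: int) -> ModelVersion:
        return self.history[version - 1]

    def submit(self, wer: float) -> bool:
        """Record a retrained candidate; promote it only if it beats the live WER."""
        candidate = ModelVersion(len(self.history) + 1, wer, self.live)
        self.history.append(candidate)
        live_wer = self._get(self.live).wer if self.live is not None else float("inf")
        if wer < live_wer:
            self.live = candidate.version
            return True
        return False

    def rollback(self) -> None:
        """Revert to the version the live model was retrained from."""
        if self.live is None or self._get(self.live).parent is None:
            raise RuntimeError("no earlier version to roll back to")
        self.live = self._get(self.live).parent


registry = Registry()
first = registry.submit(0.130)   # hypothetical starting point, promoted
second = registry.submit(0.118)  # better than live, promoted
third = registry.submit(0.125)   # worse than live, recorded but not promoted
live_before_rollback = registry.live
registry.rollback()
live_after_rollback = registry.live
```

Rejected candidates stay in the history, so the record of what was tried each month survives even when nothing is promoted.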
Side-by-side
Brocode vs the engines on your shortlist.
Measured on a shared 50-hour Khaleeji + Egyptian call set — agent and customer turns, English code-switching, 8 kHz telephony channel.
| Capability | Brocode | Nuance Mix.asr | Azure Speech (ar-AE) | Google ar-XA | AWS Transcribe | In-house Whisper |
|---|---|---|---|---|---|---|
| Khaleeji WER on real call audio (customer turns, English code-switching, 8 kHz telephony channel) | 11.8 % | ~28 % | ~31 % | ~34 % | ~37 % | ~22 % |
| Dedicated Khaleeji dialect adapter | ||||||
| Arabic-English code-switching head | Partial | |||||
| Real-time agent assist < 300 ms | Variable | |||||
| On-premise / in-country appliance | ||||||
| Native Genesys / Avaya / Cisco connectors | ||||||
| Pre-contract benchmark on your audio | Free 200-call | Paid POC | Paid POC | No | No | In-house effort |
Numbers from the Khaleeji ASR Benchmark Report 2026 (Q1 2026 refresh). All figures require confirmation on your own recordings during the pre-contract benchmark.
The three objections from your CXO sponsor
What gets asked in the steering committee, and what we answer.
Objection 1
Show me real WER on Khaleeji call audio — not MSA broadcast benchmarks.
Free 200-call pre-contract benchmark on your own recordings, under NDA + DPA. Reported by dialect bucket, by code-switching rate, by agent vs customer turn. If our numbers do not clear your gates, we walk.
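For readers who want to see what a bucketed WER report involves, here is a minimal sketch: word-level edit distance divided by reference word count, pooled per bucket. The bucket names and the three sample turns are illustrative assumptions, and whitespace tokenisation is a simplification; a real benchmark would normalise Arabic text first.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_bucket(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """rows are (bucket, reference, hypothesis); errors are pooled per bucket."""
    errors: Dict[str, int] = defaultdict(int)
    words: Dict[str, int] = defaultdict(int)
    for bucket, ref, hyp in rows:
        ref_tokens = ref.split()
        errors[bucket] += edit_distance(ref_tokens, hyp.split())
        words[bucket] += len(ref_tokens)
    return {b: errors[b] / words[b] for b in words if words[b]}


# Illustrative turns only: one clean, one with an MSA rewrite, one with a dropped word.
rows = [
    ("customer", "صراحة the OTP ما وصل", "صراحة the OTP ما وصل"),
    ("customer", "ابغى اغير الباقة", "اريد اغير الباقة"),
    ("agent", "دقيقة I check your account", "دقيقة check your account"),
]
report = wer_by_bucket(rows)
```

Here the customer bucket has 1 error over 8 reference words (0.125) and the agent bucket 1 over 5 (0.2). Pooling errors before dividing keeps long turns from being under-weighted, which matters when buckets mix short greetings with long complaints.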
Objection 2
We cannot ship call recordings to a US cloud. Where exactly does inference happen?
Inside your boundary. On-premise 6U appliance, or in-country sovereign cloud (G42, stc), or hyperscaler UAE / KSA region under your residency commitments. No audio crosses the border. DPA template and TDRA pack included.
Objection 3
Real-time agent assist needs sub-500 ms. How do you tap the media without re-architecting our SBCs?
SIPREC / AudioHook / DMCC / Finesse — depending on the platform. No SBC change. First-token latency ~287 ms, NBA card surfaced inside 700 ms end-to-end. Latency budget sized with your network team in week 2.
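A latency budget of this kind is just a sum of stage timings checked against the end-to-end target. In the sketch below, the 287 ms first-token figure and the 700 ms target come from the answer above; the other stage names and timings are hypothetical placeholders, and treating the media tap as separate from first-token latency is also an assumption.

```python
from typing import Dict, Tuple


def check_budget(stages_ms: Dict[str, float], budget_ms: float) -> Tuple[float, float, bool]:
    """Return (total latency, remaining headroom, whether the budget holds)."""
    total = sum(stages_ms.values())
    return total, budget_ms - total, total <= budget_ms


stages_ms = {
    "media_tap_and_network": 60,  # hypothetical
    "asr_first_token": 287,       # from the text
    "intent_and_nba_lookup": 180, # hypothetical
    "agent_desktop_render": 90,   # hypothetical
}
total_ms, headroom_ms, within_budget = check_budget(stages_ms, budget_ms=700)
print(total_ms, headroom_ms, within_budget)  # 617 83 True
```

Keeping the stages as named entries makes it easy to replace each placeholder with a measured value once the network path is known.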
Integration with your voice estate
Wired into the platforms your contact centre already runs.
Six platforms, six integration patterns. Each documented with a reference architecture and a latency budget — sized against your real call concurrency.
Genesys Cloud CX
AudioHook real-time tap; conversation-context API for agent-assist; post-call analytics into the Workforce Engagement plane.
Genesys Engage (on-prem)
SIPREC tap into the existing recording fabric; CallControl events for screen pops; no change at the SBC.
Avaya Aura
DMCC media tap or passive mirror at the recorder; AES events for agent presence; analytics surfaced in Avaya Workspaces.
Cisco UCCE
Finesse gadget for real-time transcript; SIPREC source via CVP / Webex Connect; CUIC reporting plug-in.
NICE CXone / Engage
Real-time WFO connector; archive ingest of recorded calls for retraining; compliance-flag write-back into the case-management layer.
Amazon Connect
Kinesis Video Streams tap; agent-assist via Connect Tasks API; deployed inside your AWS UAE or KSA region for residency.
Use-case deep dives
Where the Khaleeji ASR earns its keep on day one.
Real-time agent assist
Live transcript, sentiment ribbon, and next-best-action cards surfaced to the agent inside 700 ms end-to-end. Knowledge-base lookup, intent-aware prompts, and compliance triggers fire as the call unfolds. AHT typically lands 40–60 seconds down on the first deployed queue.
AHT down 47 s · NBA acceptance 38 %
Post-call analytics & 100 % QA
Replaces a 4 %-sample QA programme with 100 % call coverage. Searchable bilingual transcripts, sentiment trends, complaint detection, and conduct flagging across MSA, Khaleeji, and English. Conduct recall typically 3–4x the sampling baseline.
Conduct recall × 3.4
Voicebot / IVR replacement
Conversational voicebots in Khaleeji and MSA, deployed against the same ASR core, with handoff back to human agents under full context preservation. Replaces touch-tone IVRs and reduces "press 0" abandonment.
32 % containment on tier-1 bank IVR
Compliance & conduct flagging
Regulator-aligned flags for CBUAE / SAMA / TDRA conduct rules: missed disclosures, mis-selling language, complaint suppression. Audit trail with timestamp, agent ID, and call recording reference; integrates with NICE Engage and Verint compliance archives.
100 % call coverage vs 4 % sample baseline
For voicebot / IVR work, see Conversational AI & Chatbots. For downstream Arabic NLP on the transcripts, see Natural Language Processing.
Anonymised references
Three live deployments. Each available in full under NDA.
UAE telco — 4.2 million calls / month
WER reduced from 31 % to 11.8 % on Khaleeji customer turns. AHT down 47 seconds across the inbound consumer queue. Full Genesys Cloud CX AudioHook integration, 1,400 seats, real-time agent assist live in 14 weeks.
WER 31 % → 11.8 %
KSA tier-1 bank — agent assist on 1,800 seats
Real-time next-best-action across complaint and retention queues. Complaint-call AHT down 22 %, Arabic-language NPS up 9 points in 6 months. SAMA-aligned conduct flagging now visible to the audit committee weekly.
Arabic NPS +9 points
Federal-entity citizen hotline
100 % call-coverage QA replacing a 4 % sample programme. Conduct-flagging recall 3.4x the prior baseline. Khaleeji + MSA + Egyptian dialect coverage, all on a single appliance behind the existing Avaya Aura estate.
100 % coverage QA · recall × 3.4
Reference logos on request under NDA. Related sector pages: Banking & Financial Services.
Free download
Khaleeji ASR Benchmark Report 2026: 6 Engines on 50 Hours of UAE & KSA Call Audio
A 28-page technical report on how six enterprise Arabic ASR engines perform on real GCC contact-centre audio. Plus an interactive WER explorer — filter by dialect, channel quality, code-switching rate, agent vs customer turn.
- Benchmark setup — 50-hour Khaleeji + Egyptian corpus, telephony channel
- WER by dialect bucket: Nuance, Azure, Google, AWS, Whisper, Brocode
- Code-switching handling — where each engine breaks
- Latency budget for real-time agent assist (8 kHz vs 16 kHz)
- Free 200-call pre-contract benchmark on your own audio
FAQ
What CXOs and procurement leads ask first.
The eight questions our speech team answers in nearly every steering committee — straight, on the record.
Whisper-large-v3 out of the box has seen very little Khaleeji audio in training. Our base models — Whisper-large-v3 and NeMo Conformer-Transducer — are fine-tuned on a proprietary 14,000-hour corpus that mixes Khaleeji broadcast, agent-customer call recordings collected under DPA from regional operators, and noise-augmented synthetic data. Layered on top is a LoRA-style Khaleeji adapter and an explicit Arabic-English code-switching head. The pre-contract benchmark measures the delta on your own recordings, by dialect bucket and by code-switching rate, before any contract.
Pre-contract WER benchmark
Two hundred of your call recordings. One signed WER report. No contract.
Six fields — volume, use cases, dialect mix, voice platform, deployment, current engine. Our Arabic Speech lead reviews your recordings under NDA + DPA and replies within one business day with the proposed benchmark plan.
Or skip the form.
Message our Speech lead on WhatsApp directly. We see it within working hours.
Message on WhatsApp
Continue exploring
Related capabilities and stories
Natural Language Processing
Intent, entities, and conduct flags downstream of ASR transcripts.
Read more
Conversational AI & Chatbots
Voicebot and IVR replacement on the same Arabic speech core.
Read more
MLOps & AI Infrastructure
Acoustic-model retraining, drift monitoring, and lineage.
Read more
Self-hosted LLM Infrastructure
Sovereign LLM tier downstream of the transcript layer.
Read more
Banking & Financial Services
Tier-1 bank agent-assist and conduct-flagging deployments.
Read more