
Sovereign AI 101: why on-shore data storage just became the new standard for Malaysian fintech.

Published 6 May 2026

WEKA, the global high-performance AI data platform, announced a strategic partnership with Glocomp Systems today, May 6, to deploy production-ready AI infrastructure inside Malaysia. The deal gives fintechs a way to meet the 2026 data-residency rules without an offshore inference hop. Until this morning, "sovereign AI" in Malaysian fintech was mostly a slide in a deck. As of today, it is buyable, locally deployable, and meaningfully easier to defend at a regulator review than the multi-cloud architecture most fintechs shipped last year. Here's the technical and commercial read for product, engineering, and compliance leads.

What was announced

May 6 2026

WEKA × Glocomp Systems strategic partnership announced — locally-deployed, production-ready AI infrastructure for Malaysian financial services

WEKA brings the high-performance data platform layer (object storage, GPU-side caching, POSIX-compatible file access). Glocomp brings the local data-centre, deployment, and managed-service footprint. The combination yields an AI inference stack that runs entirely within Malaysian jurisdiction.

For two years, Malaysian fintechs running anything model-heavy have made the same uncomfortable architectural compromise: train and infer in Singapore, Tokyo, or US-East, send the data offshore for milliseconds, then justify the round-trip in a residency-compliance memo. The 2026 data-residency rules tightened the legal noose on that pattern. Today's announcement removes the technical excuse for keeping it.

This is not a hyperscaler region announcement (those continue separately). It is a data platform deployment — the layer that sits underneath the GPUs and decides how fast your inference stack can read training weights, customer embeddings, and feature stores without round-tripping to object storage. The reason the announcement matters: WEKA's architecture is the standard most large global financial firms use under their AI training pipelines. Having it locally deployable means a Malaysian fintech can now ship a credible on-shore AI product, not a marketing-grade one.

What changed for compliance

Pre-announcement (default 2024–early 2026 stack)

Inference offshore — usually Singapore or Jakarta region
Training data — copied offshore, returned post-training
Audit answer — "data flows are encrypted in transit, residency commitments per [hyperscaler] schedule"
Regulator posture — tolerated; questioned more aggressively in 2025
Risk profile — depends on the hyperscaler's regional resilience and that vendor's own compliance posture

Post-announcement (now buyable)

Inference on-shore — Glocomp DC, Malaysian jurisdiction, no cross-border hop required
Training data — stays inside Malaysia for the entire pipeline
Audit answer — "production training and inference both inside Malaysian jurisdiction; data does not leave"
Regulator posture — preferred; matches the spirit of 2026 residency rules, not just the letter
Risk profile — single-tenant or dedicated; failure modes are local and inspectable

The audit-answer line is the part most leadership teams will care about. "We comply because our hyperscaler has a Malaysian region commitment" is a defensible posture. "Our training and inference both run on infrastructure physically inside Malaysia" is a better posture. The 2026 rules don't require the better posture, but a regulator review under stress (post-incident, pre-licence-renewal, or in a sector-wide audit) gives more weight to the simpler answer.

Who this actually moves

1. Fintechs running their own models (credit scoring, fraud, KYC)

This is the primary buyer group. If your team trains or fine-tunes models on customer data — credit-risk scoring, transaction-fraud detection, KYC document verification — moving the training and inference loop on-shore is now a procurement decision, not an engineering greenfield. Expect 4–8 months from contract to first production cutover for a mid-size fintech.

2. Banks and insurers running embedded GenAI features

Customer-service summarisation, document Q&A, internal copilots. Most current deployments use a hyperscaler's managed model with private-link routing. The new option: deploy an open-weights model (Llama 3, Qwen, DeepSeek) on the on-shore stack with the data never leaving. The output quality gap closed in 2025; the residency story closes in 2026.

3. Aggregators and budgeting apps with AI categorisation

This category should care most and will move slowest. The AI features (spending categorisation, anomaly alerts) usually run via a third-party LLM API — meaning your customers' bank data leaves the country to be classified. An on-shore inference stack changes the architecture, but only if the product team prioritises it over feature velocity.

4. Pure on-device apps (Duitful, encrypted ledgers, on-device wallets)

Already compliant by construction — there is no server-side AI for residency rules to apply to, because the data does not leave the device. The new announcement does not change anything for this category. It does, however, raise the floor on what cloud fintechs can credibly claim about residency, which makes the on-device category easier to explain to consumers ("they have to do this in a data centre; we just don't move your data at all").

5. SME and corporate-banking software

Document understanding, contract analysis, supplier-risk classification. These workloads typically include sensitive corporate counterparty data — sometimes more sensitive than retail. On-shore inference is increasingly a procurement requirement from large corporate buyers. Expect RFPs in H2 2026 to include "AI inference on Malaysian-resident infrastructure" as a hard line item.

What this is not

It is

A locally-deployable, production-grade AI data platform with established financial-services credentials
An answer to the technical-feasibility question that compliance teams have been raising since 2024
A standard reference architecture that auditors will recognise from other jurisdictions (UAE, Saudi Arabia, Indonesia)
A meaningful reduction in residency risk for any fintech currently sending training data offshore

It is not

A replacement for end-to-end encryption or on-device privacy where those are the right architecture
A guarantee of cost reduction — on-shore typically costs more per GPU-hour than the cheapest hyperscaler region
An exemption from PDPA, BNM RMiT, or sector-specific rules — those still apply on top
A reason to skip data minimisation — the cheapest data to protect remains the data you never collected

The last point is the one most procurement teams skim. Buying an on-shore AI stack does not absolve a product of unnecessary collection. If your KYC flow scoops up data fields the model doesn't actually use, on-shore inference makes the storage compliant but doesn't make the collection necessary. Data minimisation is upstream of residency, and residency is upstream of inference architecture. Fix in that order.
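As a minimal sketch of the "fix in that order" point, the snippet below trims a KYC record down to the fields a model actually consumes before anything is stored or sent for inference. The field names and the `MODEL_FEATURES` allow-list are hypothetical and not taken from any real KYC schema.

```python
# Hypothetical allow-list: the only fields the (assumed) KYC model reads.
MODEL_FEATURES = frozenset(
    {"id_document_type", "id_document_image_hash", "date_of_birth"}
)


def minimise(record: dict, allowed: frozenset = MODEL_FEATURES) -> tuple:
    """Split a record into (fields to keep, sorted names of dropped fields)."""
    kept = {key: value for key, value in record.items() if key in allowed}
    dropped = sorted(key for key in record if key not in allowed)
    return kept, dropped


# Illustrative record; every value here is made up.
raw = {
    "id_document_type": "MyKad",
    "id_document_image_hash": "ab12cd34",
    "date_of_birth": "1990-01-01",
    "employer_name": "Example Sdn Bhd",
    "marital_status": "single",
}

kept, dropped = minimise(raw)
print("collected but unused by the model:", dropped)
```

Anything that lands in `dropped` is a candidate for removal from the collection form itself, which is a stronger fix than storing it on-shore.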

Architecture and procurement implications

1. Inventory the data flows that currently leave Malaysia

For every model in production, document: where the training data is stored, where the training compute runs, where the inference runs, what flows to a third-party LLM API. Most fintechs discover three or four offshore hops they had not consciously approved — usually in vendor-embedded analytics or model-quality telemetry.

2. Decide the on-shore migration tier per workload

Not every workload needs to migrate. Critical, regulated, or customer-PII-heavy models go first. Internal-only or de-identified workloads can stay on the hyperscaler. The wrong default is "migrate everything"; the right default is "migrate what's residency-critical and benchmark the cost."

3. Lock the audit narrative now

Before the next external review or licence-renewal cycle, write the one-page residency narrative that names the on-shore stack, the data flows it covers, and the residual offshore exposure (if any). Auditors prefer one document that answers the question completely to a stack of memos that answer it partially.

4. Negotiate dual-region resilience

A single Malaysian DC is a single point of failure. The local-resilience story (KL + Cyberjaya, or KL + Penang) needs to land in the procurement contract, not arrive later as an "enhancement." A second in-country site is the credible production posture.

5. Re-cost the unit economics

On-shore GPU-hours typically run 15–40% above the cheapest hyperscaler region. For a credit-scoring model running 50M inferences/month, that is a real number. It is also smaller than the cost of a residency-related licence-conditioning event. Run the math; document the trade-off.
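The first two steps in this list (inventory the flows, then tier each workload) can be sketched as a small data structure. This is one possible shape under stated assumptions, not a prescribed format: the `ModelFlow` class, its field names, the `residency_critical` flag, and the tier labels are all hypothetical, and the example entries are invented.

```python
from dataclasses import dataclass
from typing import List, Optional

HOME = "MY"  # ISO 3166-1 alpha-2 country code for Malaysia


@dataclass(frozen=True)
class ModelFlow:
    """One production model and the country where each pipeline stage runs."""
    name: str
    training_data_country: str
    training_compute_country: str
    inference_country: str
    third_party_api_country: Optional[str]  # None if no external LLM API is called
    residency_critical: bool  # critical, regulated, or customer-PII-heavy

    def offshore_hops(self) -> List[str]:
        """Names of the pipeline stages that run outside Malaysia."""
        stages = {
            "training_data": self.training_data_country,
            "training_compute": self.training_compute_country,
            "inference": self.inference_country,
            "third_party_api": self.third_party_api_country,
        }
        return [stage for stage, country in stages.items()
                if country is not None and country != HOME]


def migration_tier(flow: ModelFlow) -> str:
    """Assumed rule: residency-critical workloads with offshore hops go first."""
    if not flow.offshore_hops():
        return "already on-shore"
    if flow.residency_critical:
        return "migrate first"
    return "can stay, benchmark cost"


# Invented inventory entries for illustration.
inventory = [
    ModelFlow("credit-scoring", "MY", "SG", "SG", None, True),
    ModelFlow("doc-search-assistant", "SG", "SG", "SG", None, False),
]
for flow in inventory:
    print(flow.name, flow.offshore_hops(), migration_tier(flow))
```

Keeping the third-party API as its own field is deliberate: vendor-embedded calls are the hops teams most often find they never consciously approved.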

What to ignore

1. "Sovereign AI" branding from non-specialist vendors

Several local SIs will rebrand existing managed-services offerings as "sovereign AI" this quarter. The marker of a credible offering is the data-platform layer (WEKA, VAST, or comparable), not the brand. If the deck does not name the storage layer and the GPU class, the offering is a wrap, not a stack.

2. "All AI must be on-shore" hot takes

Internal productivity tooling (a developer copilot, a doc-search assistant) does not handle customer PII and does not need on-shore inference. The conversation is about regulated data flows, not about every prompt the company sends.

3. "Open-source LLM solves residency by itself"

Self-hosting Llama 3 on a hyperscaler's GPU instance does not solve residency if the inference VPC is in Singapore. Whether the model weights are open is orthogonal to where the inference physically runs. You need both: weights you can self-host and inference that runs on-shore.

4. Conflating on-shore with on-device

They solve different problems. On-shore puts data inside Malaysian jurisdiction; on-device puts it inside the user's phone. For most fintech back-end workloads, on-shore is the right answer. For consumer-facing personal-records products (expense tracking, debt tracking), on-device is the better answer because the data never has to leave the user at all.

For the consumer-facing read of the same shift, the Data Residency 5-min checklist covers what end-users should actually ask of the apps they hand financial data to.

Common questions

We're a 30-person fintech and we don't train our own models. Does this matter?

Yes, but indirectly. If you call a third-party LLM API for any feature touching customer data, the residency exposure is on you regardless of who runs the model. Map your prompts: if customer PII is in the request body, you have an offshore data flow. The fix is either an on-shore-deployed model (now feasible) or removing PII from the prompt entirely. Most teams discover the second option after costing the first.
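The second option, removing PII from the prompt, might look like the sketch below. It is a rough illustration only: the two regular expressions (a 12-digit MyKad-style number and an email address) are assumptions chosen for the example, and a real deployment would need a proper PII-detection step rather than a pair of patterns.

```python
import re

# Rough, illustrative patterns; they will miss many PII types and may
# over-match other 12-digit numbers.
PATTERNS = {
    "NRIC": re.compile(r"\b\d{6}-?\d{2}-?\d{4}\b"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def redact(prompt: str) -> tuple:
    """Replace matches with placeholders; return (clean text, count per type)."""
    counts = {}
    for label, pattern in PATTERNS.items():
        prompt, replaced = pattern.subn(f"[{label}]", prompt)
        counts[label] = replaced
    return prompt, counts


# Made-up request body for illustration.
text = "Categorise spend for 900101-14-5678, contact aina@example.com"
clean, counts = redact(text)
print(clean)
print(counts)
```

Logging the counts rather than the matched values lets a team measure how much PII its prompts carried without creating a second copy of it.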

What about Singapore — isn't that "close enough" for residency?

Not under the 2026 Malaysian data-residency rules. Geography and political proximity are different from legal jurisdiction. A Singapore-region deployment may still satisfy a hyperscaler's "regional" commitment, but it is not Malaysian-resident data. The 2026 rules treat that distinction as material.

Will this make AI fintech features more expensive?

At the GPU-hour level, yes — on-shore typically runs 15–40% above the cheapest offshore region. At the unit-economics level, the answer depends on your scale. For most pre-IPO fintechs, the cost increment is meaningful but not disqualifying. For workloads that don't actually need on-shore inference (internal tooling, public-data analysis), the right choice is to leave them where they are and migrate only what's residency-critical.
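To make the GPU-hour arithmetic concrete, here is a small worked sketch. Only the 15–40% premium range and the 50M inferences/month volume come from this guide; the throughput per GPU-hour and the offshore price are placeholder assumptions that each team should replace with its own measured figures.

```python
def monthly_gpu_cost(inferences_per_month: float,
                     inferences_per_gpu_hour: float,
                     price_per_gpu_hour: float) -> float:
    """Monthly GPU spend = GPU-hours needed x price per GPU-hour."""
    gpu_hours = inferences_per_month / inferences_per_gpu_hour
    return gpu_hours * price_per_gpu_hour


INFERENCES_PER_MONTH = 50_000_000   # volume used as the example in this guide
INFERENCES_PER_GPU_HOUR = 100_000   # assumed throughput (placeholder)
OFFSHORE_PRICE = 10.0               # assumed price per GPU-hour, any currency

baseline = monthly_gpu_cost(
    INFERENCES_PER_MONTH, INFERENCES_PER_GPU_HOUR, OFFSHORE_PRICE
)
for premium in (0.15, 0.40):        # on-shore premium range quoted in this guide
    extra = baseline * premium
    print(f"premium {premium:.0%}: baseline {baseline:,.0f}, "
          f"on-shore adds {extra:,.0f} per month")
```

Because the increment scales linearly with GPU-hours, migrating only the residency-critical workloads shrinks it in direct proportion.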

How does this interact with BNM's existing RMiT framework?

It complements rather than replaces. RMiT covers technology risk management generally — change management, third-party assurance, business continuity. Residency is a subset. An on-shore AI deployment still needs an RMiT-aligned change record, third-party due diligence on the data-platform vendor, and a documented BCP. The deployment makes the residency line easier to defend; it does not remove the surrounding RMiT obligations.

How does Duitful relate to any of this?

Duitful is the on-device end of the spectrum: encrypted local storage, no account, no server-side AI, no inference of any kind on user data. The on-shore vs offshore conversation does not apply to it because the data does not leave the device. The reason to mention it in this guide is to anchor the architecture taxonomy: on-device is the strongest privacy posture; on-shore is the strongest server-side posture; offshore is what most apps ship by default and what 2026 is now constraining.

Build the consumer side on-device

Whatever your AI stack does on the back end, the consumer-facing layer should still keep raw personal records on-device wherever it can. Duitful is the reference example (encrypted local storage, no account, no cloud sync), and it shows what an on-shore AI partnership cannot replace: keeping the user's raw data outside any third party's reach in the first place. Free to start, RM 19.90 one-time for Pro.

