WEKA — the global high-performance AI data platform — announced a strategic partnership with Glocomp Systems today, May 6, to deploy production-ready AI infrastructure inside Malaysia. The deal directly answers the 2026 data-residency rules without requiring an offshore inference hop. Until this morning, "sovereign AI" in Malaysian fintech was mostly a slide in a deck. As of today, it is buyable, locally deployable, and meaningfully cheaper to defend at a regulator review than the multi-cloud architecture most fintechs shipped last year. Here's the technical and commercial read for product, engineering, and compliance leads.
May 6 2026
WEKA × Glocomp Systems strategic partnership announced — locally-deployed, production-ready AI infrastructure for Malaysian financial services
WEKA brings the high-performance data platform layer (object storage, GPU-side caching, POSIX-compatible file access). Glocomp brings the local data-centre, deployment, and managed-service footprint. Together they yield an AI inference stack that runs entirely within Malaysian jurisdiction.
For two years, Malaysian fintechs running anything model-heavy have made the same uncomfortable architectural compromise: train and infer in Singapore, Tokyo, or US-East, send the data offshore on a millisecond-scale round trip, then justify that round trip in a residency-compliance memo. The 2026 data-residency rules tightened the legal noose on that pattern. Today's announcement removes the technical excuse for keeping it.
This is not a hyperscaler region announcement (those continue separately). It is a data-platform deployment: the layer that sits underneath the GPUs and decides how fast your inference stack can read model weights, customer embeddings, and feature stores without round-tripping to object storage. The announcement matters because WEKA's architecture is widely used under the AI training pipelines of large global financial firms. Having it locally deployable means a Malaysian fintech can now ship a credible on-shore AI product, not a marketing-grade one.
Set the pre-announcement stack (the default from 2024 to early 2026) against the post-announcement stack (now buyable), and the audit answer is the difference most leadership teams will care about. "We comply because our hyperscaler has a Malaysian region commitment" is a defensible posture. "Our training and inference both run on infrastructure physically inside Malaysia" is a better one. The 2026 rules don't require the better posture, but a regulator review under stress (post-incident, pre-licence-renewal, or in a sector-wide audit) gives more weight to the simpler answer.
Fintechs that train or fine-tune models on customer data (credit-risk scoring, transaction-fraud detection, KYC document verification) are the primary buyer group. For them, moving the training and inference loop on-shore is now a procurement decision, not a greenfield engineering project. Expect 4–8 months from contract to first production cutover for a mid-size fintech.
Generative-AI features (customer-service summarisation, document Q&A, internal copilots) mostly run today on a hyperscaler's managed model with private-link routing. The new option: deploy an open-weights model (Llama 3, Qwen, DeepSeek) on the on-shore stack so the data never leaves Malaysia. The output-quality gap between open-weights and managed models closed in 2025; the residency story closes in 2026.
Consumer personal-finance apps should care most and will move slowest. Their AI features (spending categorisation, anomaly alerts) usually run via a third-party LLM API, meaning your customers' bank data leaves the country to be classified. An on-shore inference stack changes that architecture, but only if the product team prioritises it over feature velocity.
On-device apps are already compliant by construction: there is no server-side AI for residency rules to apply to, because the data never leaves the device. The announcement changes nothing for this category. It does, however, raise the floor on what cloud fintechs can credibly claim about residency, which makes the on-device category easier to explain to consumers ("they have to do this in a data centre; we just don't move your data at all").
B2B and corporate-finance workloads (document understanding, contract analysis, supplier-risk classification) typically include sensitive corporate counterparty data, sometimes more sensitive than retail data. Large corporate buyers increasingly make on-shore inference a procurement requirement. Expect RFPs in H2 2026 to include "AI inference on Malaysian-resident infrastructure" as a hard line item.
One limit most procurement teams skim: buying an on-shore AI stack does not absolve a product of unnecessary collection. If your KYC flow scoops up data fields the model doesn't actually use, on-shore inference makes the storage compliant but doesn't make the collection necessary. Data minimisation is upstream of residency, and residency is upstream of inference architecture. Fix them in that order.
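A minimal sketch of that minimisation check, in Python: compare what a KYC flow collects against the fields the model actually reads, and surface everything else as a candidate to stop collecting. The field names and the `MODEL_INPUT_FIELDS` set are hypothetical; in practice the allowed set should come from the model's documented input schema.

```python
from __future__ import annotations

from typing import Any

# Hypothetical input schema for a KYC document-verification model.
# Replace with the fields your model's feature pipeline actually reads.
MODEL_INPUT_FIELDS = frozenset({"document_type", "document_image_ref", "date_of_birth", "nationality"})


def minimise(record: dict[str, Any], allowed: frozenset[str] = MODEL_INPUT_FIELDS) -> tuple[dict[str, Any], list[str]]:
    """Split a collected record into fields the model uses and fields it never reads."""
    kept = {k: v for k, v in record.items() if k in allowed}
    # Collected but unused by the model: candidates to remove from the collection flow upstream
    unused = sorted(k for k in record if k not in allowed)
    return kept, unused


if __name__ == "__main__":
    collected = {
        "document_type": "mykad",
        "document_image_ref": "blob://kyc/example",  # hypothetical storage reference
        "date_of_birth": "1990-01-01",
        "nationality": "MY",
        "marital_status": "single",
        "employer": "Example Sdn Bhd",
    }
    kept, unused = minimise(collected)
    print(unused)
```

Run as a script, it prints `['employer', 'marital_status']`: two fields the flow collects that the model never reads. The check belongs before storage, so the on-shore stack never holds data the model did not need.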
For every model in production, document where the training data is stored, where the training compute runs, where the inference runs, and what flows to a third-party LLM API. Most fintechs discover three or four offshore hops they had not consciously approved, usually in vendor-embedded analytics or model-quality telemetry.
Not every workload needs to migrate. Critical, regulated, or customer-PII-heavy models go first. Internal-only or de-identified workloads can stay on the hyperscaler. The wrong default is "migrate everything"; the right default is "migrate what's residency-critical and benchmark the cost."
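The migration rule reduces to a few boolean checks. A sketch with hypothetical `Workload` fields; the precedence (anything critical, regulated, or carrying identifiable customer PII wins over "internal-only") is an assumption, and workloads that fit neither bucket go to manual review rather than being defaulted either way.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    MIGRATE_FIRST = "migrate on-shore first"
    STAY = "leave on the hyperscaler"
    REVIEW = "review case by case"


@dataclass(frozen=True)
class Workload:
    name: str
    regulated: bool = False
    critical: bool = False
    customer_pii: bool = False   # identifiable customer data reaches the model
    internal_only: bool = False
    de_identified: bool = False  # inputs are de-identified before reaching the model


def triage(w: Workload) -> Decision:
    # Residency-critical signals take precedence
    if w.regulated or w.critical or (w.customer_pii and not w.de_identified):
        return Decision.MIGRATE_FIRST
    # Internal-only or de-identified workloads can stay where they are
    if w.internal_only or w.de_identified:
        return Decision.STAY
    # Everything else is a judgment call: benchmark the cost before deciding
    return Decision.REVIEW


if __name__ == "__main__":
    for w in (
        Workload("credit-scoring", regulated=True, customer_pii=True),
        Workload("doc-search-assistant", internal_only=True),
        Workload("public-market-sentiment"),
    ):
        print(f"{w.name}: {triage(w).value}")
```

Keeping the rule in code rather than in a slide makes it reviewable: when a workload's classification changes, the migration decision changes with it.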
Before the next external review or licence-renewal cycle, write the one-page residency narrative that names the on-shore stack, the data flows it covers, and the residual offshore exposure (if any). Auditors prefer one document that answers the question completely to a stack of memos that answer it partially.
A single Malaysian DC is a single point of failure. The local-resilience story (KL + Cyberjaya, or KL + Penang) needs to land in the procurement contract — not later as an "enhancement." Dual-zone is the credible production posture.
On-shore GPU-hours typically run 15–40% above the cheapest hyperscaler region. For a credit-scoring model running 50M inferences a month, that premium is a material budget line. It is also smaller than the cost of a residency-related licence-conditioning event. Run the math and document the trade-off.
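The arithmetic is simple enough to keep next to the decision record. A sketch: the 15–40% band comes from the text, while the throughput (inferences per GPU-hour) and the offshore GPU-hour price in the example are placeholders, not benchmarks; substitute your own measured figures and rates.

```python
from __future__ import annotations


def offshore_monthly_cost(inferences_per_month: float, inferences_per_gpu_hour: float, rate_per_gpu_hour: float) -> float:
    """Baseline monthly GPU spend in the cheapest offshore region."""
    if inferences_per_gpu_hour <= 0:
        raise ValueError("throughput must be positive")
    gpu_hours = inferences_per_month / inferences_per_gpu_hour
    return gpu_hours * rate_per_gpu_hour


def onshore_premium(baseline: float, low: float = 0.15, high: float = 0.40) -> tuple[float, float]:
    """Monthly on-shore premium band; defaults are the 15-40% range cited in the text."""
    return baseline * low, baseline * high


if __name__ == "__main__":
    # Placeholder throughput and price: NOT benchmarks, substitute your own
    baseline = offshore_monthly_cost(50_000_000, inferences_per_gpu_hour=20_000, rate_per_gpu_hour=12.0)
    lo, hi = onshore_premium(baseline)
    print(f"baseline RM {baseline:,.0f}/month; premium RM {lo:,.0f}-{hi:,.0f}/month, "
          f"RM {lo * 12:,.0f}-{hi * 12:,.0f}/year")
```

With those placeholder inputs the baseline is 2,500 GPU-hours (RM 30,000) a month, and the on-shore premium lands between RM 4,500 and RM 12,000 a month. The sketch shows the shape of the calculation, not the figures.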
Several local systems integrators (SIs) will rebrand existing managed-services offerings as "sovereign AI" this quarter. The marker of a credible offering is the data-platform layer (WEKA, VAST, or comparable), not the brand. If the deck does not name the storage layer and the GPU class, the offering is a wrap, not a stack.
Internal productivity tooling (a developer copilot, a doc-search assistant) does not handle customer PII and does not need on-shore inference. The conversation is about regulated data flows, not about every prompt the company sends.
Self-hosting Llama 3 on a hyperscaler's GPU instance does not solve residency if the inference VPC is in Singapore. The model weights being open-source is orthogonal to where the inference physically runs. Both have to align.
On-shore and on-device solve different problems. On-shore puts data inside Malaysian jurisdiction; on-device keeps it on the user's phone. For most fintech back-end workloads, on-shore is the right answer. For consumer-facing personal-records products (expense tracking, debt tracking), on-device is the better answer because the data never has to leave the user at all.
For the consumer-facing read of the same shift, the Data Residency 5-min checklist covers what end-users should actually ask of the apps they hand financial data to.
Teams that only call a third-party LLM API are affected too, but indirectly. If you call an offshore LLM API for any feature touching customer data, the residency exposure is yours regardless of who runs the model. Map your prompts: if customer PII is in the request body, you have an offshore data flow. The fix is either an on-shore-deployed model (now feasible) or removing PII from the prompt entirely. Most teams discover the second option after costing the first.
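Mapping prompts can start with a crude scan of outbound request bodies. A sketch using Python's standard `re` module and three illustrative patterns (a 12-digit MyKad-style number in `YYMMDD-PB-####` form, email addresses, and 13–19-digit card-like numbers); `find_pii`, `redact`, and the pattern set are hypothetical, nowhere near a complete PII detector, and names, addresses, and free-text account details will slip through.

```python
from __future__ import annotations

import re

# Illustrative patterns only; production scanning needs far broader coverage
PII_PATTERNS = {
    "nric": re.compile(r"\b\d{6}-?\d{2}-?\d{4}\b"),        # MyKad-style YYMMDD-PB-####
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),       # 13-19 digits, optional separators
}


def find_pii(prompt: str) -> dict[str, list[str]]:
    """Return each pattern label that matched, with the matched substrings."""
    hits: dict[str, list[str]] = {}
    for label, pattern in PII_PATTERNS.items():
        found = pattern.findall(prompt)
        if found:
            hits[label] = found
    return hits


def redact(prompt: str) -> str:
    """Replace every match with a placeholder such as [EMAIL] before the request leaves."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"[{label.upper()}]", prompt)
    return prompt


if __name__ == "__main__":
    body = "Customer 900101-14-5678 (aisyah@example.com) disputes a charge on 4111 1111 1111 1111."
    print(find_pii(body))
    print(redact(body))
```

`find_pii` is the mapping step (does this prompt carry PII offshore?); `redact` is a crude version of the second fix, stripping recognisable PII before the request leaves.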
Under the 2026 Malaysian data-residency rules, a Singapore-region deployment does not count as on-shore. Geographic and political proximity are not legal jurisdiction. Such a deployment may still satisfy a hyperscaler's "regional" commitment, but the data is not Malaysian-resident, and the 2026 rules treat that distinction as material.
On-shore is more expensive at the GPU-hour level, typically 15–40% above the cheapest offshore region. At the unit-economics level, the answer depends on your scale. For most pre-IPO fintechs, the cost increment is meaningful but not disqualifying. Workloads that don't actually need on-shore inference (internal tooling, public-data analysis) should stay where they are; migrate only what is residency-critical.
An on-shore deployment complements RMiT (Bank Negara Malaysia's Risk Management in Technology policy) rather than replacing it. RMiT covers technology risk management generally: change management, third-party assurance, business continuity. Residency is a subset. An on-shore AI deployment still needs an RMiT-aligned change record, third-party due diligence on the data-platform vendor, and a documented BCP. The deployment makes the residency line easier to defend; it does not remove the surrounding RMiT obligations.
Duitful is the on-device end of the spectrum: encrypted local storage, no account, no server-side AI, no inference of any kind on user data. The on-shore vs offshore conversation does not apply to it because the data does not leave the device. The reason to mention it in this guide is to anchor the architecture taxonomy: on-device is the strongest privacy posture; on-shore is the strongest server-side posture; offshore is what most apps ship by default and what 2026 is now constraining.
Whatever your AI stack does on the back end, the consumer-facing layer should still keep raw personal records on-device wherever it can. Duitful is the reference example — encrypted local storage, no account, no cloud sync — and it shows what an on-shore AI partnership *cannot* replace: keeping the user's raw data outside any third party's reach in the first place. Free to start, RM 19.90 one-time for Pro.