
WEKA Integrates NeuralMesh with NVIDIA STX to Address AI Inference Memory Bottlenecks

April 10, 2026
WEKA has announced the integration of its NeuralMesh platform with the NVIDIA STX reference architecture, establishing its Augmented Memory Grid as a key building block for next-generation AI infrastructure. The combined solution addresses one of the most significant bottlenecks in large-scale inference environments: memory constraints that directly affect performance, total cost of ownership, and the ability to scale.

Operating through NeuralMesh, WEKA’s Augmented Memory Grid expands GPU memory by externalizing and persisting key-value caches. When deployed with NVIDIA STX, this architecture delivers high-throughput context memory storage for agentic AI workloads, supporting long-context reasoning across sessions, tools, and end-to-end workflows. According to the company, configurations combining NVIDIA Vera Rubin NVL72 systems, BlueField-4 DPUs, and Spectrum-X Ethernet can boost context memory token throughput by 4x to 10x. The platform is also projected to deliver at least 320 GB/s read and 150 GB/s write throughput, more than doubling the performance of traditional AI storage architectures.
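
To make the externalize-and-persist idea concrete, the sketch below models a two-tier KV cache in Python: a small GPU-resident tier that demotes evicted entries to a larger external tier instead of discarding them. The class and method names are hypothetical illustrations of the pattern, not WEKA's actual interfaces.

```python
# Illustrative two-tier KV cache: a small "GPU" tier backed by a larger
# persistent external tier, mimicking the externalize-and-persist pattern
# described above. All names are hypothetical, not WEKA's API.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, gpu_capacity: int):
        self.gpu_tier = OrderedDict()   # fast and scarce: stands in for HBM
        self.external_tier = {}         # large and persistent: stands in for the grid
        self.gpu_capacity = gpu_capacity

    def put(self, prefix_hash: str, kv_blocks: bytes) -> None:
        self.gpu_tier[prefix_hash] = kv_blocks
        self.gpu_tier.move_to_end(prefix_hash)
        # Key difference from a plain LRU: eviction demotes entries to the
        # external tier rather than discarding them, so context persists.
        while len(self.gpu_tier) > self.gpu_capacity:
            key, val = self.gpu_tier.popitem(last=False)
            self.external_tier[key] = val

    def get(self, prefix_hash: str):
        if prefix_hash in self.gpu_tier:        # HBM hit: no extra work
            self.gpu_tier.move_to_end(prefix_hash)
            return self.gpu_tier[prefix_hash]
        if prefix_hash in self.external_tier:   # grid hit: fetch, skip recompute
            self.put(prefix_hash, self.external_tier[prefix_hash])
            return self.gpu_tier[prefix_hash]
        return None                             # true miss: prefill must recompute
```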

Memory Infrastructure Becomes the Inference Bottleneck


WEKA centers this integration on the growing memory wall challenge in modern AI deployments. Within today’s inference pipelines, limited high-bandwidth GPU memory forces frequent KV cache evictions, leading to repeated recomputation and diminished operational efficiency. As system concurrency rises, these inefficiencies multiply, increasing infrastructure expenses and reducing performance predictability.
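
A back-of-the-envelope footprint calculation shows why HBM fills so quickly. Per token, the KV cache occupies roughly 2 × layers × KV heads × head dimension × bytes per element. The model shape below is an assumed Llama-3-70B-like configuration with grouped-query attention, chosen purely for illustration:

```python
# Rough KV cache footprint for an assumed 70B-class model with GQA.
# Swap in your own deployment's shape; the formula is the point.
n_layers, n_kv_heads, head_dim = 80, 8, 128
bytes_per_elem = 2                                   # fp16/bf16
kv_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
print(f"{kv_per_token / 2**10:.0f} KiB per token")   # ~320 KiB

context_tokens = 128_000
per_session_gib = kv_per_token * context_tokens / 2**30
print(f"{per_session_gib:.1f} GiB per 128k-token session")   # ~39.1 GiB

sessions = 8
print(f"{per_session_gib * sessions:.0f} GiB for {sessions} sessions")  # ~312 GiB
```

Under these assumptions, a single 128k-token session consumes roughly half of an 80 GB GPU's HBM before model weights are even counted, so eviction pressure appears at very modest concurrency.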

The company promotes shared KV cache infrastructure as the solution. By preserving persistent context across users and sessions, shared caching eliminates redundant processing and stabilizes token throughput. NVIDIA STX provides the validated reference architecture for this model, while WEKA delivers the storage and memory extension layer.
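
The sharing model can be sketched as a content-addressed cache: requests that begin with the same token prefix (a common system prompt, shared tool schemas) resolve to the same entry, so prefill work done for one session serves every later one. The helper below is a hypothetical illustration of the pattern, not WEKA's interface:

```python
# Prefix-keyed sharing: identical prefixes hash to the same cache entry,
# so prefill done for one user/session is reused by all others.
# Hypothetical sketch; production systems key on fixed-size token blocks.
import hashlib

shared_cache: dict[str, bytes] = {}      # stands in for the shared KV store

def prefix_key(token_ids: list[int]) -> str:
    # Content-addressed: equal prefixes collide on purpose.
    return hashlib.sha256(str(token_ids).encode()).hexdigest()

def prefill_or_reuse(token_ids: list[int], compute_kv) -> bytes:
    key = prefix_key(token_ids)
    if key in shared_cache:
        return shared_cache[key]         # another session already paid the cost
    kv = compute_kv(token_ids)           # miss: run prefill once
    shared_cache[key] = kv               # persist for everyone
    return kv
```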

NeuralMesh and Augmented Memory Grid Architecture


NeuralMesh acts as WEKA’s distributed storage platform, built to integrate seamlessly across the full NVIDIA STX stack. It delivers high-performance data services optimized for AI workloads, while the Augmented Memory Grid serves as a dedicated memory expansion layer that consolidates KV cache outside of GPU memory.

This design allows inference environments to sustain long-context sessions without overloading GPU resources. By retaining cache state and enabling reuse across workloads, the platform maintains high utilization and consistent performance as deployments scale.

WEKA notes that the Augmented Memory Grid, first unveiled at GTC 2025 and now generally available, has been validated on NVIDIA Grace CPU platforms paired with BlueField DPUs. The architecture delivers measurable gains in inference efficiency, including drastically faster time-to-first-token, higher per-GPU token throughput, and stable performance under increased concurrency. Offloading the data path to BlueField-4 also reduces CPU overhead and alleviates I/O bottlenecks.

Performance and Efficiency Gains


In production-like environments, the platform is engineered to enhance responsiveness and infrastructure efficiency. WEKA states that the Augmented Memory Grid can reduce time-to-first-token by 4x to 20x, while increasing per-GPU token output by up to 6.5x. These improvements stem from higher KV cache hit rates and fewer recomputation cycles, enabling systems to maintain performance as context sizes and user counts expand.
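
To see how speedups of this magnitude can arise, compare recomputing a long prefill against streaming the persisted KV cache back at the read bandwidth quoted earlier. Every number below is an assumption chosen for a worked example, not a WEKA-published benchmark; under these particular assumptions, the ratio happens to land inside the range the company cites.

```python
# Worked example: time-to-first-token on a cache miss (recompute prefill)
# vs. a cache hit (reload persisted KV). All inputs are assumptions.
context_tokens = 128_000
kv_per_token = 327_680              # bytes/token, from the footprint example above
params = 70e9                       # assumed 70B-class model
cluster_flops = 8 * 1e15            # assumed 8 GPUs at ~1 PFLOPS effective each
grid_read_bps = 320e9               # the 320 GB/s read figure quoted above

# Miss: prefill costs ~2 FLOPs per parameter per token over the whole context.
ttft_miss = 2 * params * context_tokens / cluster_flops      # ~2.2 s

# Hit: stream the persisted KV cache back instead of recomputing it.
ttft_hit = context_tokens * kv_per_token / grid_read_bps     # ~0.13 s

print(f"miss ~{ttft_miss:.1f} s, hit ~{ttft_hit:.2f} s, "
      f"speedup ~{ttft_miss / ttft_hit:.0f}x")               # ~17x
```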

Firmus, an AI infrastructure provider, is highlighted as an early adopter leveraging NeuralMesh with NVIDIA-based infrastructure. The firm reports improved token throughput and lower latency at scale, with gains coming from more efficient use of existing GPUs rather than additional hardware deployments.

Implications for AI Infrastructure Design


This integration highlights a shift in AI system design, where memory and storage strategies increasingly define overall performance and cost efficiency. As agentic AI workloads expand and context windows widen, DRAM-only approaches become unsustainable due to rising recomputation costs and underutilized GPUs.

WEKA positions persistent, shared KV cache as a foundational capability for AI factories. Organizations adopting this model can achieve higher GPU utilization, lower energy consumption per inference task, and more predictable scaling. In contrast, environments relying exclusively on local GPU memory will likely face rising operational costs and diminishing returns as workloads grow.

Beijing Qianxing Jietong Technology Co., Ltd.
Sandy Yang/Global Strategy Director
WhatsApp / WeChat: +86 13426366826
Email: yangyd@qianxingdata.com
Website: www.qianxingdata.com / www.storagesserver.com
Business Focus:
ICT Product Distribution/System Integration & Services/Infrastructure Solutions
With 20+ years of IT distribution experience, we partner with leading global brands to deliver reliable products and professional services.
“Using Technology to Build an Intelligent World.” Your Trusted ICT Product Service Provider!