
F5 has announced new capabilities for F5 BIG-IP Next for Kubernetes accelerated with NVIDIA BlueField-3 DPUs and the NVIDIA DOCA software framework, underscored by customer Sesterce's validation deployment. Sesterce is a leading European operator specializing in next-generation infrastructure and sovereign AI designed to meet the needs of accelerated computing and artificial intelligence.
Extending the F5 Application Delivery and Security Platform, BIG-IP Next for Kubernetes running natively on NVIDIA BlueField-3 DPUs delivers high-performance traffic management and security for large-scale AI infrastructure, unlocking greater efficiency, control, and performance for AI applications. Building on the performance advantages announced at general availability earlier this year, Sesterce has successfully validated the F5 and NVIDIA solution across a number of key capabilities, including the following areas:
- Enhanced performance, multi-tenancy, and security to meet cloud-grade expectations, initially showing a 20% improvement in GPU utilization.
- Integration with NVIDIA Dynamo and its KV Cache Manager to reduce latency for large language model (LLM) inference and reasoning systems and to optimize GPU and memory resources.
- Smart LLM routing on BlueField DPUs, running effectively alongside NVIDIA NIM microservices for workloads that require multiple models, giving customers the best of all available models.
- Scaling and securing the Model Context Protocol (MCP), including reverse-proxy capabilities and protections that make LLM deployments more scalable and secure, enabling customers to use the power of MCP servers swiftly and safely.
- Powerful data programmability with robust F5 iRules capabilities, allowing rapid customization to support AI applications and evolving security requirements.
Highlights of new solution capabilities include:
· LLM Routing and Dynamic Load Balancing with BIG-IP Next for Kubernetes
With this collaborative solution, simple AI tasks can be routed to less expensive, lightweight LLMs supporting generative AI, while advanced models are reserved for complex queries. This level of customizable intelligence also enables routing functions to leverage domain-specific LLMs, improving output quality and significantly enhancing customer experiences. F5's advanced traffic management ensures queries are sent to the most suitable LLM, lowering latency and improving time to first token. A sketch of the routing idea follows below.
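The routing idea can be pictured with a short sketch. The snippet below is illustrative only: the model endpoints, threshold, and the deliberately crude complexity heuristic are assumptions, not F5's actual DPU-hosted classification logic, which has not been published in this form.

```python
# Minimal sketch of complexity-based LLM routing (illustrative only;
# endpoints, costs, and the heuristic are hypothetical, not F5's code).
from dataclasses import dataclass

@dataclass
class ModelEndpoint:
    name: str
    url: str
    cost_per_1k_tokens: float

LIGHTWEIGHT = ModelEndpoint("small-llm", "http://small-llm.local/v1", 0.05)
ADVANCED = ModelEndpoint("large-llm", "http://large-llm.local/v1", 0.60)

def estimate_complexity(prompt: str) -> float:
    """Crude heuristic: longer prompts with reasoning keywords score higher."""
    keywords = ("explain why", "step by step", "prove", "compare", "analyze")
    score = min(len(prompt) / 2000.0, 1.0)
    score += 0.3 * sum(k in prompt.lower() for k in keywords)
    return min(score, 1.0)

def route(prompt: str, threshold: float = 0.5) -> ModelEndpoint:
    """Send simple queries to the cheap model, complex ones to the large one."""
    return ADVANCED if estimate_complexity(prompt) >= threshold else LIGHTWEIGHT

if __name__ == "__main__":
    for p in ("What time is it in Paris?",
              "Explain why transformers use attention, step by step."):
        print(p[:45], "->", route(p).name)
```

The design point is that the classification happens before any GPU is touched; running it on the DPU keeps this per-request work off the CPUs and GPUs serving the models themselves.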
“Enterprises are increasingly deploying multiple LLMs to power advanced AI experiences—but routing and classifying LLM traffic can be compute-heavy, degrading performance and user experience,” said Kunal Anand, Chief Innovation Officer at F5. “By programming routing logic directly on NVIDIA BlueField-3 DPUs, F5 BIG-IP Next for Kubernetes is the most efficient approach for delivering and securing LLM traffic. This is just the beginning. Our platform unlocks new possibilities for AI infrastructure, and we’re excited to deepen co-innovation with NVIDIA as enterprise AI continues to scale.”
· Optimizing GPUs for Distributed AI Inference at Scale with NVIDIA Dynamo and KV Cache Integration
Earlier this year, NVIDIA introduced Dynamo, a supplementary framework for deploying generative AI and reasoning models in large-scale distributed environments. NVIDIA Dynamo streamlines the complexity of running AI inference in distributed environments by orchestrating tasks such as scheduling, routing, and memory management to ensure seamless operation under dynamic workloads. Offloading specific operations from CPUs to BlueField DPUs is one of the core benefits of the combined F5 and NVIDIA solution. With F5, the Dynamo KV Cache Manager can intelligently route requests based on available capacity, using Key-Value (KV) caching to accelerate generative AI use cases: information retained from previous operations is reused rather than recomputed at significant expense. From an infrastructure perspective, organizations that store and reuse KV cache data can do so at a fraction of the cost of holding it in GPU memory.
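Conceptually, KV cache reuse works like a prefix lookup: if the beginning of a new request matches work already done, only the unseen suffix needs fresh computation. The sketch below illustrates that idea at the token-string level; the real Dynamo KV Cache Manager operates on attention key/value tensors across GPU, CPU, and storage tiers, and the class and naming here are assumptions for illustration.

```python
# Conceptual sketch of prefix-based KV cache reuse (illustrative only; the
# actual system caches attention key/value tensors, not strings).
import hashlib

class PrefixKVCache:
    def __init__(self):
        self._store = {}  # prefix hash -> precomputed "KV state"

    @staticmethod
    def _key(tokens: tuple) -> str:
        return hashlib.sha256(repr(tokens).encode()).hexdigest()

    def lookup(self, tokens: tuple):
        """Return the longest cached prefix state plus the uncached suffix."""
        for cut in range(len(tokens), 0, -1):
            state = self._store.get(self._key(tokens[:cut]))
            if state is not None:
                return state, tokens[cut:]   # hit: only the suffix needs compute
        return None, tokens                  # miss: full recomputation

    def insert(self, tokens: tuple, state):
        self._store[self._key(tokens)] = state

if __name__ == "__main__":
    cache = PrefixKVCache()
    prompt = ("You", "are", "a", "helpful", "assistant", ".")
    cache.insert(prompt[:5], state="kv-state-for-system-prompt")
    state, remaining = cache.lookup(prompt)
    print(state, remaining)  # cached prefix reused; only "." needs compute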
· Improved Protection for MCP Servers with F5 and NVIDIA
Model Context Protocol (MCP) is an open protocol developed by Anthropic that standardizes how applications provide context to LLMs. Deploying the combined F5 and NVIDIA solution in front of MCP servers allows F5 technology to serve as a reverse proxy, bolstering security for MCP solutions and the LLMs they support. In addition, the full data programmability enabled by F5 iRules promotes rapid adaptation and resilience to fast-evolving AI protocol requirements, as well as additional protection against emerging cybersecurity risks.
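To make the reverse-proxy role concrete, the sketch below shows a minimal policy-enforcing proxy in front of a hypothetical MCP server: it parses each JSON-RPC request, rejects methods outside an allowlist, and forwards the rest upstream. In production the equivalent logic runs on BlueField-3 DPUs via iRules rather than Python, and the upstream address and allowlist here are assumptions.

```python
# Minimal sketch of a reverse proxy guarding an MCP server (illustrative
# only; not F5's data path, which runs on BlueField-3 DPUs with iRules).
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "http://127.0.0.1:9000"          # hypothetical MCP server
ALLOWED_METHODS = {"initialize", "tools/list", "tools/call"}

class MCPProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        try:
            method = json.loads(body).get("method", "")
        except json.JSONDecodeError:
            self.send_error(400, "malformed JSON-RPC payload")
            return
        if method not in ALLOWED_METHODS:   # policy check before forwarding
            self.send_error(403, f"method {method!r} not permitted")
            return
        req = urllib.request.Request(UPSTREAM + self.path, data=body,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
            self.send_response(resp.status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), MCPProxy).serve_forever()
```

Centralizing the policy check at the proxy means individual MCP servers never see disallowed calls, mirroring how a DPU-hosted reverse proxy shields backend LLM infrastructure.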