Cloudflare (NET 1.44%), known for its global edge computing network that speeds up and secures apps and websites, is positioning itself as the simplest way to run artificial intelligence (AI) models in the cloud. The company already uses AI internally on every request that comes through its network, and it has been testing out ways for its clients to run AI models as well.

Last week, Cloudflare announced a new product called Workers AI that supercharges developers' ability to run AI inference workloads on its platform. Built on top of the Workers serverless computing product, Workers AI aims to make it dead simple to get popular AI models up and running. While the number of models it supports today is limited, the long-term potential for Cloudflare is enormous.
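To make that concrete, here is a minimal sketch of what running a model on Workers AI looks like from a developer's perspective. It assumes the `@cloudflare/ai` package and an `AI` binding configured for the Worker, and it uses the Llama 2 model identifier Cloudflare documented at launch; treat the exact names as illustrative rather than definitive.

```typescript
// A minimal Cloudflare Worker that runs an AI model via Workers AI.
// Assumes an `AI` binding is configured in wrangler.toml, e.g.:
//   [ai]
//   binding = "AI"
import { Ai } from '@cloudflare/ai';

export interface Env {
  AI: any; // the Workers AI binding injected by the platform
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const ai = new Ai(env.AI);

    // Run an inference request against a hosted model. The model name
    // below is the Llama 2 variant Cloudflare documented at launch.
    const answer = await ai.run('@cf/meta/llama-2-7b-chat-int8', {
      prompt: 'What is edge computing, in one sentence?',
    });

    return new Response(JSON.stringify(answer), {
      headers: { 'content-type': 'application/json' },
    });
  },
};
```

Deploying something like this is a single `wrangler deploy`, which is the point: provisioning GPUs and hosting the model are Cloudflare's problem, not the developer's.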

GPUs for everyone

Running an advanced artificial intelligence model at high speeds requires some serious hardware. Cloudflare's Workers AI runs on graphics processing units (GPUs), which the company has installed at a subset of its data centers so far. It plans to bring the high-powered hardware to 100 data centers by the end of the year and to nearly all of its data centers by the end of 2024. This will make it possible to run AI inference workloads close to users, minimizing latency and improving the overall experience.

Currently, Workers AI supports a handful of popular AI models, including the smallest variant of the Llama 2 large language model. This model isn't nearly as powerful as the models that underpin OpenAI's ChatGPT, but Cloudflare plans to add support for additional models over time. It is partnering with Hugging Face, which hosts more than half a million AI models, and users will eventually be able to deploy those models directly from the Cloudflare dashboard.
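Models aren't limited to code running inside a Worker, either. Cloudflare also exposes Workers AI through a REST endpoint, so an existing backend can experiment with a supported model using nothing but an API token. The sketch below follows the documented `/ai/run/{model}` route; the account ID and token are placeholders.

```typescript
// Calling a Workers AI model over Cloudflare's REST API.
// ACCOUNT_ID and API_TOKEN are placeholders; the endpoint shape
// follows Cloudflare's documented /ai/run/{model} route.
const ACCOUNT_ID = 'your-account-id';
const API_TOKEN = 'your-api-token';
const MODEL = '@cf/meta/llama-2-7b-chat-int8';

async function runModel(prompt: string): Promise<unknown> {
  const url = `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${MODEL}`;

  const res = await fetch(url, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${API_TOKEN}`,
      'content-type': 'application/json',
    },
    body: JSON.stringify({ prompt }),
  });

  if (!res.ok) {
    throw new Error(`Workers AI request failed: ${res.status}`);
  }
  return res.json();
}

runModel('Summarize what a CDN does.').then(console.log);
```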

Notably, Cloudflare is putting privacy front and center. The company won't use customers' data to train AI models, and the models won't learn from customer usage. This is an important feature for any business that needs to keep its data private.

Workers AI is in early beta, so we can expect some hiccups as the company brings it to general availability. Each AI model also comes with its own limits, which will presumably be lifted as the product matures. The Llama 2 model, for example, is capped at 50 requests per minute.
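Until those beta limits are lifted, developers have to budget for them. One common approach, sketched below, is to retry with exponential backoff whenever the platform signals it has been rate-limited (HTTP 429); the helper is generic and works with any fetch-based call.

```typescript
// Retry an inference call with exponential backoff when rate-limited.
// Useful while beta limits like Llama 2's 50 requests/minute are in place.
async function runWithBackoff(
  call: () => Promise<Response>,
  maxRetries = 4,
): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await call();
    if (res.status !== 429) return res; // success, or a non-rate-limit error

    // Wait 1s, 2s, 4s, 8s... before trying again.
    const delayMs = 1000 * 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error('Still rate-limited after retries');
}
```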

Betting on AI inference

Cloudflare is making no attempt to go after the AI training market. Training an advanced AI model requires vast computational resources, and Cloudflare's edge network provides no advantages for that purpose.

AI inference -- actually running those trained AI models -- is a different story. A key differentiator for AI platforms will be speed. In other words, when a query is submitted to an AI model, how quickly can a response be returned? A few seconds isn't good enough for a real-time application. By installing GPUs globally and allowing developers to route requests to the nearest data center, Cloudflare can greatly reduce this response time.
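That speed advantage is easy to check empirically. The sketch below times a single round trip to an inference endpoint (the URL is a placeholder); run it from different regions against a globally distributed endpoint and the wall-clock number reflects both network distance and the model's compute time.

```typescript
// Measure end-to-end latency of one inference request.
// INFERENCE_URL is a placeholder for whatever endpoint serves the model.
const INFERENCE_URL = 'https://example.com/ai/run/some-model';

async function timeOneRequest(prompt: string): Promise<number> {
  const start = performance.now();

  await fetch(INFERENCE_URL, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });

  // Round-trip time in milliseconds: network hops to the serving
  // data center plus the model's own compute time.
  return performance.now() - start;
}

timeOneRequest('ping').then((ms) => console.log(`round trip: ${ms.toFixed(0)} ms`));
```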

One benefit of focusing on the AI inference niche is that Cloudflare doesn't need the most advanced hardware available. NVIDIA's powerful H100 GPUs are in high demand for AI training, but AI inference can be accomplished with older and less powerful hardware. Cloudflare will be able to expand its GPU network much more quickly and at a lower cost by installing older accelerators. "And maybe we don't need the H100. Maybe we can live within A100 or, you know, whatever is, again, a generation or two behind," said Cloudflare CEO Matthew Prince during the latest earnings call.

Cloudflare expects the AI inference market to be "substantially larger" than the AI training market in the long run. By making it as easy and as cheap as possible for developers to run AI models on its platform, the company is positioning itself to be a leader in this fast-growing industry.