Meta and Cerebras Systems introduce the Llama API, achieving 2,600 tokens per second and outperforming GPU-based solutions by up to 18x, as Meta moves to commercialize its Llama models and compete with OpenAI and Google in AI inference.

Meta partners with Cerebras Systems to launch the Llama API, which delivers 2,600 tokens per second, up to 18x faster than GPU-based solutions. By turning its Llama models into a commercial service, Meta aims to challenge OpenAI and Google in AI inference.
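
For context on what a figure like 2,600 tokens per second means in practice, the sketch below times a streaming chat completion against an OpenAI-compatible endpoint and divides the generated token count by the elapsed wall-clock time. The base URL, model name, and API key are placeholders, not confirmed details of Meta's Llama API.

```python
# Minimal sketch: measuring output tokens/second from an
# OpenAI-compatible chat endpoint. The base_url, api_key, and
# model name are placeholders, not confirmed Llama API values.
import time
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                 # placeholder key
)

start = time.perf_counter()
stream = client.chat.completions.create(
    model="llama-placeholder-model",        # placeholder model name
    messages=[{"role": "user", "content": "Explain AI inference in one paragraph."}],
    stream=True,
    stream_options={"include_usage": True},  # request token counts in the final chunk
)

completion_tokens = 0
for chunk in stream:
    if chunk.usage is not None:              # usage arrives on the last chunk
        completion_tokens = chunk.usage.completion_tokens

elapsed = time.perf_counter() - start
print(f"{completion_tokens} tokens in {elapsed:.2f}s "
      f"~ {completion_tokens / elapsed:.0f} tokens/second")
```

Note that this rough measurement includes prompt processing and time to first token, so it understates pure generation throughput; vendor figures are typically reported for sustained decode speed.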

Source: venturebeat.com