Meta partners with Cerebras Systems to launch a Llama API delivering 2,600 tokens/second, exceeding GPU solutions by up to 18x. Transforming Llama models into a commercial service, Meta aims to challenge OpenAI and Google in AI inference.
Source: venturebeat.com