AI startup Groq gains attention for its ‘lightning-fast’ engine, outperforming Nvidia’s AI chips
AI startup Groq, not to be confused with Elon Musk’s Grok, has introduced a new AI chip built around a Language Processing Unit (LPU) architecture that promises near-instant response times. The announcement comes amid a surge in AI development, with companies like OpenAI, Meta, and Google working on their own AI tools such as Sora and Gemma. Groq boldly asserts that its technology runs large language models faster than anything else in the world.
Groq claims its LPUs are faster than Nvidia’s graphics processing units (GPUs). That is a startling claim, given that Nvidia has so far dominated the spotlight when it comes to AI chips. Gizmodo, however, reports that Groq’s demonstrations were “lightning fast” and even made “…current versions of ChatGPT, Gemini, and even Grok look slow.”
Groq AI chip
The AI chip developed by Groq has specialized processing units that run large language models (LLMs) with near-instant response times. The new processing unit, known as the Tensor Streaming Processor (TSP), is classified as an LPU rather than a graphics processing unit (GPU). The company says it provides “the fastest inference for computationally demanding applications with a sequential component,” such as AI applications or LLMs.
What is it used for?
The company claims the LPU eliminates the need for complex scheduling hardware in favor of more streamlined processing. Groq’s LPU is designed to overcome the two bottlenecks that plague LLMs: compute density and memory bandwidth. For LLMs, the company says, LPUs offer greater computing capacity than GPUs and CPUs, which reduces the computing time per word and results in much faster text generation.
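To see why memory bandwidth in particular caps generation speed, consider a rough back-of-the-envelope estimate. The sketch below uses illustrative numbers (a 70-billion-parameter model in 16-bit weights and an assumed 2 TB/s of memory bandwidth, not published Groq or Nvidia figures): because generating each token requires streaming roughly every model weight through the processor once, bandwidth alone puts a hard ceiling on tokens per second.

```python
# Illustrative estimate of how memory bandwidth limits LLM text generation.
# All figures are assumptions for this sketch, not measured hardware specs.

params = 70e9            # assumed model size: 70 billion parameters
bytes_per_param = 2      # FP16/BF16 weights occupy 2 bytes each
bandwidth = 2e12         # assumed memory bandwidth: 2 TB/s

# Each generated token must read (roughly) every weight once, so the
# achievable token rate is bounded by bandwidth / bytes moved per token.
bytes_per_token = params * bytes_per_param
max_tokens_per_sec = bandwidth / bytes_per_token

print(f"Upper bound: {max_tokens_per_sec:.1f} tokens/sec")  # ~14.3 tokens/sec
```

Under those assumptions a single chip tops out at roughly 14 tokens per second, which is why designs that raise effective memory bandwidth, as Groq says its LPU does, can advertise far higher generation speeds.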
Calling it an “inference engine,” the company says its new AI processor supports standard machine learning (ML) frameworks such as PyTorch, TensorFlow, and ONNX for inference. However, the LPU Inference Engine does not currently support ML training.
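In practice, inference-only support means a model is trained elsewhere and the finished artifact is handed to the engine. Here is a minimal sketch of that workflow; the export call is standard PyTorch, while how Groq ingests the resulting ONNX file is an assumption, since the article does not detail it:

```python
import torch
import torch.nn as nn

# A stand-in for a model that was trained elsewhere, e.g. on GPUs.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 32))
model.eval()  # switch to inference mode: no gradient tracking needed

example_input = torch.randn(1, 128)  # dummy input that fixes tensor shapes

# Export to ONNX, the interchange format an inference engine can consume.
torch.onnx.export(
    model,
    example_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
)
```

The resulting model.onnx file contains only what is needed to run the model forward, which is exactly the part of the workflow an inference engine accelerates.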
Groq enables faster, more efficient processing with lower latency and consistent performance. It is not an AI chatbot, however, and is not meant to replace one; rather, the company claims it makes chatbots run faster. Those who want to try Groq can do so with open-source LLMs like Llama 2 or Mixtral 8x7B.
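Groq exposes an OpenAI-style chat completions API through a Python SDK, so trying one of those models takes only a few lines. The sketch below assumes the groq package is installed, a GROQ_API_KEY environment variable is set, and that the Mixtral model identifier shown is still current:

```python
import os
from groq import Groq  # assumes: pip install groq

# The client authenticates with an API key from the environment.
client = Groq(api_key=os.environ["GROQ_API_KEY"])

# Run an open-source model (here Mixtral 8x7B) on Groq's hardware.
completion = client.chat.completions.create(
    model="mixtral-8x7b-32768",  # assumed model identifier
    messages=[
        {"role": "user", "content": "Explain what an LPU is in one sentence."}
    ],
)

print(completion.choices[0].message.content)
```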
Examples
In a demo shared on X by HyperWrite CEO Matt Shumer, Groq generated multiple responses, complete with references, in a matter of seconds. Another demo, a side-by-side comparison with GPT-3.5, showed Groq completing the same task nearly four times faster. According to benchmarks, Groq can generate nearly 500 tokens per second, compared with the 30 to 50 tokens per second that GPT-3.5 manages.
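To put those benchmark numbers in concrete terms, here is a quick calculation of what the throughput gap means for a long answer. The 500 and 40 tokens-per-second rates come from the figures above (40 being the midpoint of GPT-3.5’s reported range); the 1,000-token response length is an illustrative assumption:

```python
# How long does a 1,000-token answer take at each reported rate?
response_tokens = 1_000          # illustrative response length
groq_rate = 500                  # tokens/sec, per the benchmarks above
gpt35_rate = 40                  # midpoint of the reported 30-50 tokens/sec

groq_seconds = response_tokens / groq_rate    # 2.0 seconds
gpt35_seconds = response_tokens / gpt35_rate  # 25.0 seconds

print(f"Groq:    {groq_seconds:.1f} s")
print(f"GPT-3.5: {gpt35_seconds:.1f} s")
```

At those rates, a full-page answer that keeps a GPT-3.5 user waiting about 25 seconds arrives from Groq in roughly two.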