Elon Musk's xAI unveils Grok 1.5 Vision, a new AI model with integrated computer vision features, competing directly with GPT-4 Vision and Gemini 1.5 Pro. (Bloomberg)

Elon Musk Introduces Grok 1.5 Vision

Adriana Marais | April 16, 2024 | Artificial Intelligence

Elon Musk’s artificial intelligence company, xAI, has unveiled Grok 1.5 Vision, an enhanced version of its Grok 1.5 model. The updated model adds computer vision capabilities, enabling it to analyze visual content and answer questions about images. The announcement follows closely on OpenAI’s introduction of a GPT-4 model that also incorporates computer vision technology.

xAI announced the update on its official X account (formerly Twitter) and described the model’s features in a blog post. The core features of Grok 1.5 carry over to the updated version, while the added vision capabilities promise to broaden how AI interacts with the real world.

Benchmarks and performance

xAI ran benchmark tests that showcased Grok 1.5 Vision’s performance on several measures, including the company’s own RealWorldQA benchmark, which evaluates a model’s “real world spatial understanding”. The model was also evaluated on other tests such as MMMU and ChartQA. On RealWorldQA, Grok outperformed OpenAI’s GPT-4 with Vision and Google’s Gemini 1.5 Pro, though it lagged behind in other tests.
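For readers curious how a question-answering benchmark produces a single score, the sketch below shows one common approach: exact-match accuracy, the share of questions where the model's answer matches the reference answer. It is only an illustration. The function names and the sample answers are made up, and nothing here is confirmed to be how xAI scores RealWorldQA, MMMU, or ChartQA.

```python
def normalize(answer: str) -> str:
    """Lowercase and trim an answer so trivial differences do not count as errors."""
    return answer.strip().lower().rstrip(".")


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Return the fraction of predictions that match their reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical answers for illustration only; not real benchmark items or results.
references = ["left", "yes", "2", "no"]
predictions = ["Left", "yes", "3", "No."]

score = exact_match_accuracy(predictions, references)
print(f"accuracy: {score:.2%}")
```

Published benchmarks often use more forgiving matching rules, so a score computed this way would not necessarily be comparable to the figures companies report.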

Understanding computer vision

Computer vision is a field of computer science that focuses on enabling computers, including artificial intelligence models, to recognize and interpret real-world objects in images and videos. In short, it aims to give machines human-like vision.

Several leading technology companies are investing heavily in the development of vision-focused AI models. Google’s Gemini 1.5 Pro and OpenAI’s GPT-4 with Vision are notable competitors in this field.

The potential applications of computer vision are vast. For example, Healthify, an Indian calorie tracking and nutrition platform, recently integrated a feature called “Snap”. With it, users can photograph food items, and the AI suggests healthier recipe changes and exercise programs to offset the calories consumed. Beyond that, computer vision holds promise for medical diagnostics, autonomous vehicles, and more.