Google Introduces Enhanced AI Model for Processing Lengthy Text and Video, Despite Persistent Hallucinations
Google, a subsidiary of Alphabet Inc., is introducing an upgraded version of its powerful artificial intelligence model, which it says outperforms rival products at processing large volumes of text and video.
The updated AI model, called Gemini 1.5 Pro, will be made available Thursday to cloud customers and developers so they can test its new features and eventually build new commercial applications. Google and its rivals have spent billions of dollars to strengthen their capabilities in generative artificial intelligence and are courting business customers to show that those investments are paying off.
“Our primary focus today is to deliver to you the research that enables this model,” Oriol Vinyals, Google’s vice president and CTO of Gemini, said in a briefing with reporters. “Tomorrow, we look forward to seeing what the world does with the new features.” Gemini 1.5 Pro, the mid-sized version of the new AI model, performs at the same level as the larger Gemini 1.0 Ultra model, Google said.
Since OpenAI’s explosive success in late 2022 with its conversational chatbot ChatGPT, Google has been working to show that it, too, is a powerhouse in cutting-edge generative AI, the technology that can create new text, images or even videos based on user prompts. A growing number of companies have experimented with the technology to automate tasks such as coding, summarizing reports or creating marketing campaigns.
Google launched its Gemini AI model in December, and it comes in three versions, so it can be tailored to the task at hand and run on everything from mobile devices to large data centers. Gemini is Google’s answer to the combined forces of Microsoft Corp. and OpenAI, which some say have been able to take advantage of the current AI boom faster, including among cloud customers and developers.
Now Google is trying to attract these users to its ecosystem with even more powerful tools. According to Vinyals, Gemini 1.5 can be trained faster and more efficiently, and it can process a huge amount of data each time it is prompted. For example, developers can use Gemini 1.5 Pro to process up to an hour of video, 11 hours of audio, or more than 700,000 words of text in a single request. That capacity is, according to Google, the “longest context window” of any large-scale AI model yet. Google also said Gemini 1.5 can handle far more data than the latest AI models from OpenAI and Anthropic.
In a pre-recorded video presentation to reporters, Google showed engineers giving Gemini 1.5 Pro a 402-page PDF transcript of the Apollo 11 moon landing, then asking it to find quotes that represented “three funny moments.” One of the AI model’s responses noted that five hours into the Apollo 11 mission, astronaut Michael Collins told Mission Control, “If we’re late getting back to you, it’s because we’re eating sandwiches.”
In another pre-recorded demo, Google engineers asked Gemini 1.5 Pro to find a specific scene in a 44-minute Buster Keaton film, providing the AI model with a rough sketch of the scene as they remembered it. Gemini successfully located the scene, noting that it occurred about 15 minutes into the video.
However, Google cautioned that, as with all generative models, the answers aren’t always perfect. Gemini 1.5 Pro is still prone to hallucinations, is sometimes slow and doesn’t always understand users’ intent, forcing them to rephrase their questions before the model gives a correct answer. Vinyals said the company is “working to optimize” Gemini 1.5’s performance to make it faster, and that it is “still in the experimental and research phase.”
According to the company, developers can explore Gemini 1.5 Pro using Google’s AI Studio, while some cloud customers can use the AI model in private preview on its enterprise platform, Vertex AI. Google also said Thursday that it will expand access to the larger Gemini 1.0 Ultra, opening the model to a wider range of global customers on Vertex AI.