Discover the Benefits of Google Gemini: A Multimodal AI Model
Yesterday, December 6, Alphabet CEO Sundar Pichai and Google DeepMind CEO Demis Hassabis unveiled Google Gemini. The new release surpasses PaLM 2 and is now the largest language model the company has introduced. With that increase in scale comes a broader set of capabilities: the largest variant, Gemini Ultra, is a multimodal AI model that works across text, images, video, and audio, expanding what a general-purpose foundation model can do. If you are curious about Gemini's features and potential applications, read on.
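To ground what "multimodal" means in practice, here is a minimal sketch of how a developer might send a combined image-and-text prompt to a vision-capable Gemini model through Google's generative AI Python SDK. The model name, file path, and API key below are placeholders, and the exact SDK surface may differ from what Google ultimately ships for Gemini Ultra.

```python
# Minimal sketch: sending a mixed image + text prompt to a Gemini model.
# Assumes the google-generativeai SDK and Pillow are installed
# (pip install google-generativeai pillow); model name, key, and path are placeholders.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder API key

# A vision-capable Gemini model accepts a list of parts: text and images together.
model = genai.GenerativeModel("gemini-pro-vision")

drawing = Image.open("drawing.jpg")  # e.g. a photo of a hand-drawn sketch
response = model.generate_content(
    ["What is being drawn here, and what might the object be made of?", drawing]
)
print(response.text)  # the model replies in text, grounded in the image
```

The same call pattern extends to the scenarios shown in the demo video: the prompt list simply mixes text with whatever visual material the user wants the model to reason about.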
Following the announcement of the new AI model, Google released a YouTube video showcasing Google Gemini's capabilities. The narration notes: "We've taken the material to test it in a variety of challenges, showing it a series of images and asking it to justify what it's seeing." The full video highlights some of Gemini's more advanced features and use cases.
Google Gemini features
Throughout the video, Gemini has access to the camera and can see what the user is doing. The video puts the model through a series of tests in which it has to analyze whatever is happening on camera.
1. Multimodal dialogue
In the first part, the user draws on paper and asks Gemini to guess what it sees. The model keeps identifying the drawing as the user adds detail. At each step, Gemini offers a sound analysis of the sketch and shares extra information about the object it recognizes, including what it might be made of.
2. Multilingualism
In the second part, the user asks the AI how to pronounce a word in another language. The model not only displays the answer as text but also responds with audio, coaching the user on the correct pronunciation.
3. Creating a game
In the third segment, the user places a world map and a rubber duck on the table and asks Gemini to invent a fun game based on them, using emoji. Gemini obliges and creates a country-guessing game in which the user has to identify a country from three emoji clues.
4. Visual puzzles
In the next part, Gemini is asked to solve a puzzle presented to it in the physical world. The video shows it tracking and solving the puzzle in real time.
5. Making connections
In the next part, the user places two random objects on the table and asks Gemini what it sees. From visual context alone, the model finds a connection between the two objects and classifies them. The user keeps swapping the objects out, and each time Gemini finds a fitting category to group them together.
6. Generating images and text
Next, the user places two balls of yarn in different colors on the table and asks the AI what could be made with them. Gemini suggests several ideas; although its primary response is text, it also displays an AI-generated reference image to help the user visualize the end result.
7. Logic and spatial reasoning
Gemini is also shown comfortably handling logic-based visual puzzles, correctly identifying the relevant details of each puzzle before offering a solution.
8. Translating visuals into music
In the last part, Google Gemini is asked to recognize what the user is drawing. When the user sketches a guitar, the AI identifies it and plays AI-generated guitar music. As the user adds more instruments and themes to the drawing, the music changes to reflect the new elements.
The video highlights many of Gemini's features and shows how, when the model is paired with different devices and packaged into specialized AI tools, it can assist users in a wide range of situations.