Google admits that the Gemini AI hands-on demo video was not authentic and was edited with the intention of “inspiring developers.”
Google recently introduced its newest foundation model, Gemini, and showcased its impressive capabilities. The company also shared a demo video on YouTube in which a user performs various activities at a table and the AI promptly recognizes them and responds verbally. The video left viewers in awe, as no other industry player has achieved such results with an AI model, and, as it turns out, neither has Google: the demo was not entirely authentic. Google edited the Gemini AI demo video and added elements that exaggerate how advanced the model actually is.
Questions about the demo video were first raised by Bloomberg’s Parmy Olson, who claimed that Google misrepresented Gemini AI’s capabilities in the demo. The Verge later contacted a Google spokesperson, who pointed to a post on X by Oriol Vinyals, Google DeepMind’s Vice President of Research & Deep Learning Lead and a co-lead of the Gemini project.
In the post, he said: “All the user prompts and outputs in the video are real, shortened for brevity. The video illustrates what the multimodal user experiences built with Gemini could look like. We made it to inspire developers.”
Google admits to editing its Gemini AI demo video
Bloomberg learned more about exactly how it was done from another Google spokesperson, who revealed that the video’s results were achieved “using stills from footage and prompting through text.”
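In practice, the approach described amounts to feeding individual frames to a multimodal model alongside written prompts, rather than streaming live video and audio. Below is a minimal sketch of that kind of workflow using the google-generativeai Python SDK and its gemini-pro-vision model; the frame file name and prompt wording are illustrative assumptions, not the prompts Google actually used.

```python
# Minimal sketch of the "still frames plus text prompts" workflow described above,
# using the google-generativeai Python SDK. The frame file, prompt wording, and
# model choice are illustrative assumptions; Google has not published its actual prompts.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # assumes an API key from Google AI Studio

# A single still frame exported from the demo footage (hypothetical file name).
frame = Image.open("cup_shuffle_frame.png")

# The model sees one static image plus a written question, not a live video feed.
model = genai.GenerativeModel("gemini-pro-vision")
response = model.generate_content([
    "Three cups were just shuffled and a ball is hidden under one of them. "
    "Based on this frame, which cup is the ball under, and why?",
    frame,
])

print(response.text)  # text output; in the edited demo, answers like this were voiced over
```

The gap between this kind of prompting and the seamless, real-time interaction shown in the video is exactly what critics objected to.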
These admissions are worrying. Editing a demo video to trim pauses or clean up minor glitches is common practice, but adding elements that make a model appear to have capabilities it does not actually have can be considered misleading. It also becomes clear that, once those edited parts are set aside, the demo is not all that impressive.
A second blow came from Wharton professor Ethan Mollick, who showed on X that ChatGPT Plus could deliver similar results from an image-based prompt. He said: “The video part is pretty cool, but I find it hard to believe that there isn’t prompting going on behind the scenes here. GPT-4 class AIs are good at interpreting intent, but not so much in changing context. GPT-4 doesn’t do video, but seems to give similar answers to Gemini.”
Many reacted to Vinyals’ post about the prompts being “shortened” and expressed their disappointment upon discovering that the video was not real. One X user responded: “If you want to inspire developers, why don’t you post factual content? Prompts can’t be ‘real’ and truncated at the same time. It was dishonest and misleading.”