Meta Launches AI-Powered Generative Model ‘CM3leon’ For Text and Image Creation
Meta, previously known as Facebook, has unveiled a new artificial intelligence (AI) model called “CM3leon” (pronounced “chameleon”), which is capable of generating both images from text and text from images.
“CM3leon is the first multimodal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pre-training stage and a second multitask supervised fine-tuning (SFT) stage,” Meta said in a blog post on Friday.
With CM3leon’s capabilities, the company said image creation tools can produce more consistent images that better follow input prompts.
According to Meta, CM3leon requires five times less computing power and less training data than previous transformer-based methods.
On the most widely used image-generation benchmark (zero-shot MS-COCO), CM3leon achieved an FID (Frechet Inception Distance) score of 4.88, establishing a new state of the art for text-to-image generation and surpassing Google’s text-to-image model, Parti.
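For context on the benchmark figure: FID compares the statistics (mean and covariance) of Inception-network feature vectors extracted from real versus generated images, with lower scores indicating closer distributions. The sketch below is a minimal NumPy implementation of the distance itself, applied to synthetic stand-in feature vectors; the Inception feature-extraction step, and the specific array shapes, are illustrative assumptions, not part of Meta's evaluation pipeline.

```python
import numpy as np

def frechet_distance(feat_real, feat_gen):
    """Frechet distance between two sets of feature vectors
    (rows = samples). Lower is better; identical sets score ~0."""
    mu1, mu2 = feat_real.mean(axis=0), feat_gen.mean(axis=0)
    c1 = np.cov(feat_real, rowvar=False)
    c2 = np.cov(feat_gen, rowvar=False)
    # Tr(sqrtm(c1 @ c2)) via eigenvalues: for PSD covariances the
    # product's eigenvalues are real and non-negative (up to numerical noise).
    eigvals = np.linalg.eigvals(c1 @ c2)
    covmean_trace = np.sqrt(np.clip(eigvals.real, 0, None)).sum()
    diff = mu1 - mu2
    return diff @ diff + np.trace(c1) + np.trace(c2) - 2 * covmean_trace

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 64))   # stand-in for Inception features
same = real.copy()                  # identical distribution -> score ~0
shifted = real + 1.0                # shifted distribution -> larger score

print(frechet_distance(real, same))
print(frechet_distance(real, shifted))
```

In practice, reported FID scores such as CM3leon's 4.88 are computed over tens of thousands of images using features from a pretrained Inception-v3 network rather than raw pixels.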
In addition, the tech giant said that CM3leon excels at a range of vision-language tasks, such as visual question answering and long-form captioning.
CM3leon’s zero-shot performance compares favorably with that of larger models trained on larger datasets, despite being trained on a dataset of only three billion text tokens.
“As our goal is to create high-quality generative models, we believe CM3leon’s strong performance across a variety of tasks is a step toward more accurate image generation and understanding,” Meta said.
“Models like CM3leon can ultimately help drive more creativity and better applications in the metaverse. We look forward to exploring the boundaries of multimodal language models and will add more models in the future,” it added.