Alibaba releases Qwen-VLo, its latest AI image model rivaling OpenAI’s GPT-4o


Alibaba has launched a new AI image generation model, Qwen-VLo, which the company says can understand the content of an image and generate new images based on that understanding.

“Today, we are excited to introduce a new model, Qwen VLo, a unified multimodal understanding and generation model. This newly upgraded model not only ‘understands’ the world but also generates high-quality recreations based on that understanding, truly bridging the gap between perception and creation,” the company said in a blog post published on June 26.

Compared with earlier Alibaba models such as Qwen-VL, Qwen-VLo produces more detailed images with significantly greater accuracy, according to the e-commerce giant. Where previous models would alter unrelated details in an image when the user asked for only a minor change (such as a different colour), Qwen-VLo preserves the original structure of the image and applies only the requested change.


The model can also handle open-ended instructions, such as changing the artistic style or the weather in an image, or making it resemble a specific time period. Alibaba also said the model would support multiple languages besides Chinese and English.

One of the model’s notable features is Multiple Image Input. The model can take several images provided by the user, edit the text within them, and even combine them into a single generated image. In one example given by the company, the user provided images of individual bathing products and a basket, then asked Qwen-VLo to put the products into the basket.

The Multiple Image Input feature in Qwen-VLo. (Image: Alibaba)

However, this feature has not yet been officially rolled out.

Qwen-VLo uses dynamic resolution training, which lets users generate images at the aspect ratio they need, including 1:1, 3:4 and 16:9. The model also generates images progressively, from top to bottom and left to right, an approach that helps with tasks requiring fine-grained control. However, in its blog post, the company said the model is still in preview and users may encounter problems such as inconsistent outputs or results that do not follow their instructions.


Looking ahead, the company suggested that its AI models could eventually convey ideas and meaning through the images they create. Alibaba also proposed having the model generate segmentation and detection maps to further improve Qwen-VLo’s performance.

Best known for its e-commerce business in China, Alibaba has thrown its hat into the AI race. The company’s CEO, Eddie Wu, has said that Alibaba is now fully focused on AI model development and aims to build AI systems with human-level intellectual capabilities.

(This article has been curated by Purv Ashar, who is an intern with The Indian Express)