
Microsoft presents a small language model that can understand images

Phi-3-vision is a multimodal model, meaning it can process both text and images, and it is intended for use on mobile devices. Microsoft says Phi-3-vision, now available in preview, is a 4.2-billion-parameter model (the parameter count is a rough measure of a model's size and capacity) that can perform general visual reasoning tasks, such as answering questions about charts or images.

But Phi-3-vision is much smaller than other image-focused AI models like OpenAI’s DALL-E or Stability AI’s Stable Diffusion. Unlike these models, Phi-3-vision does not generate images, but it can understand the content of an image and analyze it for a user.

Microsoft announced the Phi-3 family in April with the release of Phi-3-mini, the smallest model in the lineup at 3.8 billion parameters. The family has two other members: Phi-3-small (7 billion parameters) and Phi-3-medium (14 billion parameters).

AI developers have been building small, lightweight models like Phi-3 as demand grows for more cost-effective and less computationally intensive AI services. Small models can power AI features on devices such as phones and laptops without consuming large amounts of memory. Microsoft has released other small models beyond Phi-3 and its predecessor, Phi-2. Its math problem-solving model, Orca-Math, is said to answer math questions better than larger counterparts such as Google's Gemini Pro.

Phi-3-vision is available in preview, while the other members of the Phi-3 family (Phi-3-mini, Phi-3-small, and Phi-3-medium) are now available through the Azure model library.
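For readers who want to try the model, below is a minimal sketch of asking Phi-3-vision a question about an image through the Hugging Face transformers library. The model ID and the <|image_1|> prompt convention follow Microsoft's published model card, but treat the exact identifiers, and the example image URL, as assumptions that may change while the model is in preview.

```python
# Minimal sketch: asking Phi-3-vision a question about a chart image.
# Assumes the Hugging Face model ID "microsoft/Phi-3-vision-128k-instruct"
# and its <|image_1|> prompt convention; both may change during preview.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Load an example image (hypothetical URL; any chart or photo works).
url = "https://example.com/chart.png"
image = Image.open(requests.get(url, stream=True).raw)

# The <|image_1|> token marks where the image sits in the prompt.
messages = [{"role": "user", "content": "<|image_1|>\nWhat trend does this chart show?"}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(prompt, [image], return_tensors="pt").to("cuda")
output_ids = model.generate(**inputs, max_new_tokens=200)

# Strip the prompt tokens so only the model's answer is decoded.
answer_ids = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(answer_ids, skip_special_tokens=True)[0])
```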

News source: www.theverge.com