In a release that drew more attention in tech circles than many orchestrated announcements, researchers at Apple Inc. and Cornell University quietly introduced Ferret last October. The open source, multimodal large language model (LLM) breaks with Apple’s tradition of secrecy and marks a notable step for the company in AI. Ferret, which can take regions of an image as queries, debuted on GitHub with little fanfare and went largely unnoticed at first, but it has since drawn considerable interest from AI enthusiasts and researchers.
How Ferret works: a closer look
Ferret works by examining specific regions of an image, identifying the elements of interest, and enclosing them in a bounding box. Users can then use these regions as part of a query, and Ferret answers in ordinary text, as a conventional chatbot would.
For example, when a user highlights an animal in an image and asks Ferret about its species, the model identifies the animal and responds accordingly. Ferret can also draw on the context of other elements in the image to give more detailed answers, which illustrates its multimodal capabilities.
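To make the idea of a region-based query concrete, here is a minimal Python sketch of how a highlighted area might be paired with a question. It is an illustration only and not Ferret’s actual interface: the `BoundingBox` class, the `build_region_query` function, and the `[region: ...]` prompt format are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in pixel coordinates (hypothetical helper, not Ferret's API)."""
    left: int
    top: int
    right: int
    bottom: int

    def __post_init__(self):
        # Reject empty or inverted boxes early.
        if self.left >= self.right or self.top >= self.bottom:
            raise ValueError("box must have positive width and height")

    def normalized(self, image_width, image_height):
        """Return the corners as fractions of the image size, rounded to 3 places."""
        return (
            round(self.left / image_width, 3),
            round(self.top / image_height, 3),
            round(self.right / image_width, 3),
            round(self.bottom / image_height, 3),
        )


def build_region_query(question, box, image_width, image_height):
    """Attach a region reference to a text question (assumed prompt format)."""
    x1, y1, x2, y2 = box.normalized(image_width, image_height)
    return f"{question} [region: {x1}, {y1}, {x2}, {y2}]"


# The user highlights an animal in a 640x480 image and asks about it.
box = BoundingBox(left=128, top=96, right=384, bottom=336)
prompt = build_region_query("What species is this animal?", box, 640, 480)
print(prompt)  # What species is this animal? [region: 0.2, 0.2, 0.6, 0.7]
```

Normalizing the coordinates keeps the region description independent of image resolution, which is one plausible design choice. How the real model encodes regions is not described in this article.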
According to Zhe Gan, an AI researcher at Apple, the open source Ferret model can refer to elements in an image and make connections between them at different levels of granularity. Its release marks a significant shift for Apple.
Because the company is known for its secretive nature, its willingness to share its AI advances with the open source community is considered a surprising move. This new openness positions Apple as an important player in the field of multimodal AI, defying industry expectations.
The release of Ferret not only marks Apple’s foray into open source AI, but also reflects the company’s strategic response to the challenges of the AI industry. As technology blogger Ben Dickson noted, limited computing resources put Apple at a disadvantage against rivals like Microsoft Corp. and Google LLC. Apple’s infrastructure is not equipped to serve large language models (LLMs) at the scale of a service like ChatGPT.
This predicament leaves Apple at a crossroads, with two viable options. The first is to form strategic partnerships with hyperscale cloud providers to strengthen its AI capabilities. The second, as Ferret’s release indicates, is to adopt an open source approach, similar to the strategy employed by Meta Platforms Inc. The choice between collaboration and community sharing reflects Apple’s commitment to remaining competitive in the rapidly evolving AI landscape.
While Ferret quietly explores new territory in multimodal AI, Apple faces a decision that goes beyond technological innovation. The release of this open source model raises a nuanced question about Apple’s future in the field of AI.
Will Ferret propel Apple to the forefront of multimodal AI, challenging industry norms and enabling collaborative advancements? Or does it symbolize a broader shift in the AI landscape, where industry giants balance proprietary prowess and community innovation? Echoes of Ferret’s stealth arrival persist, inviting speculation about Apple’s evolving role in shaping the future of artificial intelligence. The answer lies at the intersection of technology, collaboration, and the ever-changing dynamics of the AI narrative.