NVIDIA, widely recognized for its dominance in artificial intelligence hardware thanks to powerful GPUs such as the H100 and the new B200, has decided to enter the competitive software arena as well. The company has traditionally led in data center infrastructure, but it is now looking to compete in large language model (LLM) development. It has announced its own LLM, dubbed NVLM 1.0, a family of models designed to excel at vision and language tasks.
NVLM 1.0: A New Multimodal Model
NVLM 1.0 is a family of multimodal models that, according to NVIDIA, competes directly with big names in the industry such as GPT-4 and Llama 3. The most prominent member of the family is NVLM-D-72B, which has 72 billion parameters and has shown strong performance on vision and language tasks, even outperforming Llama 3 405B in certain tests despite being a far smaller model.
One of the most interesting aspects of NVLM 1.0 is its open-source nature. NVIDIA plans to release the model weights and the code used to train the model, which should make it easier for developers to adapt it and apply it to their own projects. This approach aligns with the trend of providing open resources, as Meta has done with Llama, enabling greater accessibility and flexibility in the use of artificial intelligence.
A Versatile Model for Multiple Applications
NVLM-D-72B has multimodal capabilities that allow it to interpret both visual and textual inputs. It can analyze images, solve mathematical problems step by step, and even interpret memes, making it particularly versatile. NVIDIA explained that the model combines capabilities such as OCR, reasoning, and world knowledge to analyze its inputs comprehensively.
NVIDIA's entry into the AI software arena marks a new milestone for the company. By offering its model openly, the company is positioning itself as a serious competitor to other AI giants, with an attractive alternative for developers and experts looking for more accessible and powerful solutions.