The Tool for Customizing the Llama 3.1 Model Is Here! NVIDIA Builds a Generative AI Foundry, Along with Accelerated Deployment Microservices
NVIDIA has announced the launch of the new NVIDIA AI Foundry service and NVIDIA NIM inference microservices, which, together with the newly released Meta Llama 3.1 series of open-source models, provide strong support for generative AI for enterprises worldwide.
The Llama 3.1 large language model comes in three parameter sizes: 8B, 70B, and 405B. The models were trained on over 16,000 NVIDIA Tensor Core GPUs and optimized for NVIDIA accelerated computing and software, whether in data centers, in the cloud, or locally on workstations with NVIDIA RTX GPUs and PCs with GeForce RTX GPUs.
Just as TSMC serves as a foundry for global chip companies, NVIDIA has created an enterprise-level AI foundry called NVIDIA AI Foundry.
NVIDIA founder and CEO Jensen Huang stated, "Meta's Llama 3.1 open-source model marks a pivotal moment for global enterprises to adopt generative AI. Llama 3.1 will spark a wave of advanced generative AI applications created by various enterprises and industries. NVIDIA AI Foundry has integrated Llama 3.1 throughout the process and can help enterprises build and deploy custom Llama supermodels."
NVIDIA AI Foundry is powered by the NVIDIA DGX Cloud AI platform, co-designed by NVIDIA and leading public cloud providers, offering an end-to-end service for rapidly building custom supermodels. It aims to provide enterprises with substantial computing resources that can be easily scaled in response to changing AI demands.
"With NVIDIA AI Foundry, enterprises can easily create and customize the most advanced AI services they desire and deploy them through NVIDIA NIM," said Meta founder and CEO Mark Zuckerberg.
If enterprises require more training data to create domain-specific models, they can combine their own data with synthetic data generated by the Llama 3.1 405B and NVIDIA Nemotron Reward models to train these supermodels for improved accuracy. Customers with their own training data can customize the Llama 3.1 model using NVIDIA NeMo and further enhance model accuracy through domain-adaptive pre-training (DAPT).
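The generate-then-score pattern described above can be sketched in a few lines. This is a minimal illustration, not NVIDIA's pipeline: `generate` stands in for a large generator model (such as Llama 3.1 405B) and `score` for a reward model (such as Nemotron Reward); both stand-ins below are toy functions.

```python
def curate_synthetic_data(prompts, generate, score, n_candidates=4, threshold=0.7):
    """For each prompt, sample several candidate responses from the generator
    model and keep only those the reward model scores above a threshold."""
    dataset = []
    for prompt in prompts:
        for _ in range(n_candidates):
            response = generate(prompt)
            if score(prompt, response) >= threshold:
                dataset.append({"prompt": prompt, "response": response})
    return dataset

# Toy stand-ins for the two models (purely illustrative):
fake_generate = lambda p: p.upper()
fake_score = lambda p, r: 0.9 if r else 0.0

examples = curate_synthetic_data(["hello"], fake_generate, fake_score, n_candidates=2)
```

The surviving examples then serve as additional fine-tuning data for the custom model.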
NVIDIA and Meta have also collaborated to provide a distillation method for Llama 3.1, allowing developers to create smaller custom Llama 3.1 models for generative AI applications. This enables enterprises to run Llama-driven AI applications on more accelerated infrastructures, such as AI workstations and laptops.
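At the core of any distillation recipe is a loss that pulls the small "student" model's output distribution toward the large "teacher" model's. The details of the NVIDIA/Meta method are not public here; the sketch below shows only the standard temperature-scaled KL-divergence formulation often used for distillation, in plain Python for clarity.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature > 1."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions; minimizing
    this pushes the small model to mimic the large model's behavior."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)) * temperature ** 2

# Identical logits give zero loss; mismatched logits give a positive loss.
loss_same = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
loss_diff = distillation_loss([2.0, 0.5, -1.0], [0.5, 2.0, -1.0])
```

In practice this term is computed over the vocabulary at each token position and often mixed with the ordinary cross-entropy loss on ground-truth labels.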
Once custom models are created, enterprises can package them as NVIDIA NIM inference microservices and run them in production on their preferred cloud platform or on NVIDIA-certified systems from global server manufacturers, using their choice of machine learning operations (MLOps) and artificial intelligence operations (AIOps) platforms.
NIM microservices help deploy the Llama 3.1 model into production, with throughput up to 2.5 times higher than running inference without NIM.
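A deployed NIM microservice exposes an OpenAI-compatible HTTP API, so applications talk to a custom Llama 3.1 model with an ordinary chat-completions request. The sketch below only builds the request payload; the URL and model name are illustrative assumptions (a locally hosted NIM container is assumed on port 8000), and sending the request requires any standard HTTP client plus credentials.

```python
import json

# Assumed local NIM endpoint; hosted endpoints at ai.nvidia.com differ.
NIM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(user_message, model="meta/llama-3.1-8b-instruct",
                       temperature=0.2, max_tokens=256):
    """Build an OpenAI-style chat-completions payload for a NIM endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Summarize NVIDIA AI Foundry in one sentence.")
body = json.dumps(payload)  # POST this body to NIM_URL with an HTTP client
```

Because the interface matches the OpenAI API shape, existing client libraries can typically be pointed at a NIM endpoint by changing only the base URL and model name.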
NVIDIA NIM inference microservices for the Llama 3.1 models are available at ai.nvidia.com; they accelerate the deployment of Llama 3.1 into production-grade AI.
By combining Llama 3.1 NIM microservices with the new NVIDIA NeMo Retriever NIM microservices, enterprises can build advanced retrieval workflows for AI copilots, assistants, and digital avatars.
Using the NeMo Retriever NIM inference microservices for retrieval-augmented generation (RAG), enterprises can deploy custom Llama supermodels and Llama NIM microservices into production with improved response accuracy.
Paired with the NIM inference microservice for Llama 3.1 405B, the NeMo Retriever NIM microservices deliver high retrieval accuracy for open and commercial text question answering in RAG workflows.
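The retrieve-then-generate shape of such a RAG workflow can be shown in miniature. This is only a toy sketch: NeMo Retriever uses learned embedding and reranking models, whereas the stand-in retriever below ranks documents by simple word overlap, and the generation step is left as a prompt to be sent to a Llama NIM endpoint.

```python
def retrieve(query, documents, k=1):
    """Toy retriever: rank documents by word overlap with the query.
    (A real retriever like NeMo Retriever uses learned embeddings.)"""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(query, documents):
    """Ground the model's answer in retrieved context."""
    context = "\n".join(retrieve(query, documents, k=1))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "NIM microservices package models for production inference.",
    "Llama 3.1 comes in 8B, 70B, and 405B parameter sizes.",
]
prompt = build_rag_prompt("What sizes does Llama 3.1 come in?", docs)
# `prompt` would then be sent to a Llama 3.1 NIM endpoint for generation.
```

Grounding the generator in retrieved enterprise documents is what improves response accuracy over asking the model from its parameters alone.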
NVIDIA AI Foundry integrates NVIDIA software, infrastructure, and expertise with open community models, technologies, and support from the NVIDIA AI ecosystem. NVIDIA AI Enterprise experts and global system integrator partners collaborate with AI Foundry customers to accelerate the entire process from development to deployment.
Accenture, a professional services company, is pioneering the use of NVIDIA AI Foundry to create custom Llama 3.1 models using the Accenture AI Refinery framework for itself and for clients who wish to deploy generative AI applications that reflect their culture, language, and industry.
Enterprises in industries such as healthcare, energy, financial services, retail, transportation, and telecommunications are already using NVIDIA NIM microservices for Llama. The first companies to utilize the new NIM microservices for Llama 3.1 include Aramco, AT&T, and Uber.
Hundreds of NVIDIA NIM partners providing enterprise, data, and infrastructure platforms can now integrate these new microservices into their AI solutions, empowering over 5 million developers and 19,000 startups in the NVIDIA community with generative AI.
Production support for Llama 3.1 NIM and NeMo Retriever NIM microservices is available through NVIDIA AI Enterprise. Members of the NVIDIA Developer Program will soon be able to access NIM microservices for free to conduct research, development, and testing on their preferred infrastructure.
Accessibility and cost-effectiveness are key factors driving enterprise AI adoption. In recent years, NVIDIA has been helping businesses acquire advanced generative AI capabilities more easily through initiatives like AI Foundry, NIM microservices, and the development of several high-performance large models.
With the release of Llama 3.1, NVIDIA has launched a series of tools to help enterprises quickly customize or deploy the model, demonstrating the company's responsiveness to shifts at the industry's cutting edge. Compared with competitors still focused on chip development alone, NVIDIA has clearly moved well beyond hardware.