Databricks Unveils GPU and LLM Optimization Support for Databricks Model Serving

Databricks has released a public preview of GPU and LLM optimization support for Databricks Model Serving. The new capability lets users deploy a broad range of AI models, including large language models (LLMs) and vision models, on the Databricks Lakehouse Platform.

Databricks Model Serving automatically optimizes models for LLM serving, so users get high performance without manual configuration. Databricks claims this is the first serverless GPU serving product built on a unified data and AI platform. It lets users build and deploy generative AI (GenAI) applications within a single platform, covering every step from data ingestion to model deployment and monitoring.

With Databricks Model Serving, deploying AI models becomes straightforward, even for users without deep infrastructure expertise. Users can deploy a wide variety of models, including natural language, vision, audio, tabular, and custom models, regardless of how they were trained: from scratch, from an open-source base, or fine-tuned on proprietary data.

To get started, users register their model with MLflow. Databricks Model Serving then builds a production-grade container that includes GPU libraries such as CUDA and deploys it on serverless GPUs. The fully managed service handles instance management, version compatibility, and patch updates, and it automatically scales instances up and down with traffic. This can substantially reduce infrastructure costs while optimizing performance and latency.
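
As a rough illustration of the deployment step, the Python sketch below builds (without sending) the kind of REST request that creates a GPU serving endpoint for a model already registered with MLflow. It assumes the Databricks `serving-endpoints` REST API with a `served_models` configuration and a `GPU_MEDIUM` workload type. The workspace URL, token, model name, and endpoint name are hypothetical placeholders, and field names and available GPU workload types may differ by API version, cloud, and region, so check the current documentation before relying on them.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own workspace URL, access token,
# and the name/version of a model already registered in MLflow.
WORKSPACE_URL = "https://example-workspace.cloud.databricks.com"
TOKEN = "dapi-REPLACE-ME"
MODEL_NAME = "my_llm"
MODEL_VERSION = "1"


def build_endpoint_request(endpoint_name: str) -> urllib.request.Request:
    """Build (but do not send) a request that creates a GPU serving endpoint."""
    body = {
        "name": endpoint_name,
        "config": {
            "served_models": [
                {
                    "model_name": MODEL_NAME,
                    "model_version": MODEL_VERSION,
                    # Assumed GPU workload type; valid values can vary by
                    # cloud and region.
                    "workload_type": "GPU_MEDIUM",
                    "workload_size": "Small",
                    "scale_to_zero_enabled": False,
                }
            ]
        },
    }
    return urllib.request.Request(
        url=f"{WORKSPACE_URL}/api/2.0/serving-endpoints",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_endpoint_request("my-llm-endpoint")
# To actually create the endpoint, you would send it:
# urllib.request.urlopen(req)
```

Building the `Request` object separately from sending it keeps the payload easy to inspect and test before any network call is made.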

Alongside GPU and LLM support, Databricks Model Serving introduces optimizations for serving large language models more efficiently, which Databricks says cut latency and cost by up to 3-5x. To use Optimized LLM Serving, users only need to provide the model and its weights; Databricks handles the rest to ensure optimal model performance.

This frees users from low-level model optimization details, so they can focus on integrating LLMs into their applications. Databricks Model Serving currently optimizes MPT and Llama 2 models automatically, with support for more models planned.
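
Once an endpoint is running, an application queries it over REST. The Python sketch below builds (without sending) an invocation request for a hypothetical LLM endpoint. It assumes the `/serving-endpoints/<name>/invocations` path and a request body with an `inputs.prompt` list plus a `params` object carrying `max_tokens` and `temperature`. The exact schema an endpoint accepts depends on the served model's MLflow signature, and the URL, token, and endpoint name are placeholders.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with real values before sending.
WORKSPACE_URL = "https://example-workspace.cloud.databricks.com"
TOKEN = "dapi-REPLACE-ME"


def build_invocation_request(
    endpoint_name: str, prompt: str, max_tokens: int = 100
) -> urllib.request.Request:
    """Build (but do not send) a text-generation request to a serving endpoint."""
    # Assumed body schema; the fields actually accepted depend on the
    # served model's signature.
    body = {
        "inputs": {"prompt": [prompt]},
        "params": {"max_tokens": max_tokens, "temperature": 0.0},
    }
    return urllib.request.Request(
        url=f"{WORKSPACE_URL}/serving-endpoints/{endpoint_name}/invocations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_invocation_request("my-llm-endpoint", "What is Apache Spark?")
# Sending it would look like:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```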
