Deployment

Once a model is finalized—whether from the Catalog or after fine-tuning—it can be deployed as an always-available endpoint for inference.

AI Studio supports two deployment modes:

  • On-Demand: Instant, serverless access. Best for quick experiments and lightweight use cases.

  • Dedicated: Persistent deployment on dedicated infrastructure. Recommended for production-grade workloads.


1. Why Deploy?

Deploying a model creates a stable, callable API that can be integrated into downstream systems and user-facing products. It ensures:

  • Predictable performance

  • Repeatable results

  • Centralized monitoring

  • Easy access via standard APIs

For high-availability, real-time applications, deploying the model is essential.


2. On-Demand vs Dedicated Deployments

| Mode | Description | Use Case |
| --- | --- | --- |
| On-Demand | Serverless, ephemeral deployment managed by the platform | Ad-hoc testing, internal tools |
| Dedicated | Persistent instance with reserved GPU | Production systems, high-throughput APIs |
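
Both modes are reached through the same OpenAI-compatible inference API; in practice only the model identifier changes. The sketch below is illustrative only: the base URL and model identifiers are hypothetical placeholders, so substitute the values shown in your AI Studio console.

```python
# Minimal sketch: calling an on-demand vs. a dedicated deployment through the
# OpenAI-compatible API. Base URL and model IDs below are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-ai-studio.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

# On-Demand: reference a Catalog model directly by name.
resp = client.chat.completions.create(
    model="catalog/llama-3-8b-instruct",  # hypothetical Catalog model ID
    messages=[{"role": "user", "content": "Hello!"}],
)

# Dedicated: reference your deployment by its unique deployment name.
resp = client.chat.completions.create(
    model="my-dedicated-deployment",  # the deployment name you chose
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```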


3. Why Use Dedicated Deployment?

Dedicated deployments offer significant benefits over serverless access:

  • Guaranteed Throughput: Your model runs on a dedicated GPU (e.g., NVIDIA H100), delivering consistent latency and handling concurrent requests reliably.

  • Data Security: Inference runs in an isolated environment, reducing risk of data leakage. Suitable for enterprise, healthcare, and financial use cases.

  • Stable Endpoint: Model versioning is locked, making it easy to debug, monitor, and iterate. Ideal for applications with audit or compliance needs.

  • Fine-tuned Model Hosting: Use dedicated deployments to host your own custom fine-tuned models with controlled rollout.


4. Creating a Deployment

To deploy a model from the Model Catalog:

  1. Navigate to the Deployments tab

  2. Click New Deployment

  3. Select the desired model from the drop-down

  4. Provide a unique deployment name

  5. Click Deploy; the system provisions a dedicated instance and exposes an OpenAI-compatible endpoint

To deploy a fine-tuned model:

  1. Navigate to the completed fine-tuning job

  2. Click Deploy

  3. Provide a unique deployment name

  4. Click Deploy; the system provisions a dedicated instance and exposes an OpenAI-compatible endpoint

Deployment completes within minutes, and the endpoint is then ready for use across your application stack.
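
Once the endpoint is live, a quick smoke test confirms it responds and gives a rough latency figure. This is a minimal sketch assuming the OpenAI Python SDK; the base URL and deployment name are hypothetical, so use the values shown in your console.

```python
# Smoke test for a freshly provisioned endpoint: send one request and time it.
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-ai-studio.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

start = time.perf_counter()
resp = client.chat.completions.create(
    model="my-dedicated-deployment",  # your deployment name
    messages=[{"role": "user", "content": "Reply with OK."}],
    max_tokens=5,
)
elapsed = time.perf_counter() - start

print(f"Response: {resp.choices[0].message.content!r}")
print(f"Round-trip latency: {elapsed:.2f}s")
```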


5. Managing Deployments

The Deployments tab provides:

  • Status Monitoring: Running, stopped, or failed state

  • Usage Tracking: Requests, token throughput, and basic logs

  • Deployment Controls: Start, stop, or redeploy models

  • Checkpoint Selection: Redeploy older fine-tuning checkpoints if needed

Stopping a deployment releases its GPU and suspends the associated endpoint.
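
Where a management API is available, the same controls can be scripted. The sketch below is purely illustrative: the REST paths and response fields are assumptions standing in for a typical status-check and stop flow, not AI Studio's documented API.

```python
# Hypothetical sketch of programmatic deployment management.
# Endpoint paths and fields below are assumptions, not a documented API.
import requests

BASE = "https://api.example-ai-studio.com/v1/deployments"  # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Check the deployment's state (e.g., running, stopped, or failed).
status = requests.get(f"{BASE}/my-dedicated-deployment", headers=HEADERS).json()
print(status.get("state"))

# Stop the deployment to release its GPU; the endpoint stays suspended
# until the deployment is started again.
requests.post(f"{BASE}/my-dedicated-deployment/stop", headers=HEADERS)
```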


6. Next Steps

  • Run Inference using the deployed model

  • Evaluate latency and runtime performance post-deployment

  • Fine-Tune to improve domain alignment before deployment

  • Refer to the API Reference to integrate your deployment endpoint
