# Deployment

## Deployment

Once a model is finalized—whether from the Catalog or after fine-tuning—it can be deployed as an always-available endpoint for inference.

AI Studio supports two deployment modes:

* **On-Demand**: Instant, serverless access. Best for quick experiments and lightweight use cases.
* **Dedicated**: Persistent deployment on dedicated infrastructure. Recommended for production-grade workloads.

***

### 1. Why Deploy?

Deploying a model creates a stable, callable API that can be integrated into downstream systems and user-facing products. It ensures:

* Predictable performance
* Repeatable results
* Centralized monitoring
* Easy access via standard APIs

For high-availability, real-time applications, deploying the model is essential.

***

### 2. On-Demand vs Dedicated Deployments

| Mode          | Description                                              | Use Case                                 |
| ------------- | -------------------------------------------------------- | ---------------------------------------- |
| **On-Demand** | Serverless, ephemeral deployment managed by the platform | Ad-hoc testing, internal tools           |
| **Dedicated** | Persistent instance with reserved GPU                    | Production systems, high-throughput APIs |

***

### 3. Why Use Dedicated Deployment?

Dedicated deployments offer significant benefits over serverless access:

* **Guaranteed Throughput**: Your model runs on a dedicated GPU (e.g., NVIDIA H100), delivering consistent latency and handling concurrent requests reliably.
* **Data Security**: Inference runs in an isolated environment, reducing risk of data leakage. Suitable for enterprise, healthcare, and financial use cases.
* **Stable Endpoint**: Model versioning is locked, making it easy to debug, monitor, and iterate. Ideal for applications with audit or compliance needs.
* **Fine-tuned Model Hosting**: Use dedicated deployments to host your own custom fine-tuned models with controlled rollout.

***

### 4. Creating a Deployment

To deploy a model from Model Catalog:

1. Navigate to the deployment tab
2. Click New Deployment
3. Select desired model from drop down
4. Provide unique deplyment name
5. Click on deploy and the system provisions a dedicated instance and exposes an OpenAI-compatible endpoint

To deploy a fine-tuned model:

1. Navigate to the completed fine-tuning job
2. Click **Deploy**
3. Provide a unique deployment name
4. Click on deploy and the system provisions a dedicated instance and exposes an OpenAI-compatible endpoint

Deployment is complete within minutes and ready for use across your application stack.

***

### 5. Managing Deployments

The **Deployments** tab provides:

* **Status Monitoring**: Running, stopped, or failed state
* **Usage Tracking**: Requests, token throughput, and basic logs
* **Deployment Controls**: Start, stop, or redeploy models
* **Checkpoint Selection**: Redeploy older fine-tuning checkpoints if needed

Stopping a deployment releases its GPU and suspends the associated endpoint.

***

### 6. Next Steps

* Run Inference using the deployed model
* Evaluate latency and runtime performance post-deployment
* Fine-Tune to improve domain alignment before deployment
* Refer to the API Reference to integrate your deployment endpoint


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.cloud.olakrutrim.com/basics/ai-studio/ai-jobs/deployment.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
