# Inferencing

Once you've chosen a model from the Catalog, you can run inference directly in the Playground or through the API. Inference generates outputs from prompts, uploaded files, or other input types, depending on the model's modality.

***

### 1. How to Run Inference

#### Option 1: Playground (No Code)

Every model in the Catalog includes a **Playground** tab. This UI lets you:

* Enter prompts (for text models)
* Upload files (for speech/image models)
* Adjust generation parameters (`temperature`, `top_p`, `max_tokens`, etc.)
* View results inline

This is the fastest way to test a model before moving to production.

#### Option 2: API via Starter Code

Use the **Starter Code** tab to copy cURL or Python code that calls the Krutrim API directly. The code includes:

* The correct API endpoint
* A pre-filled `model` identifier
* A default prompt structure
* Optional generation parameters

You only need to plug in your API key and input data.
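As a rough illustration of what such a request contains, the sketch below assembles (but does not send) a chat request using only the Python standard library. The `/chat/completions` path, the `Bearer` authorization header, and the helper name `build_request` are assumptions based on the OpenAI-compatible convention; the base URL and model name are the ones used in the SDK examples on this page. Always copy the exact values from the Starter Code tab for your model.

{% code overflow="wrap" %}

```python
import json
import urllib.request

# Base URL taken from the SDK examples on this page; the path suffix below
# is an assumption that follows the OpenAI-compatible convention.
BASE_URL = "https://cloud.olakrutrim.com/v1"


def build_request(api_key, model, prompt):
    """Assemble (but do not send) a chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request("your_krutrim_key", "krutrim-1", "Hello!")
print(request.full_url)
# To actually send it: urllib.request.urlopen(request), then JSON-decode the body.
```

{% endcode %}

Keeping request construction separate from sending makes it easy to inspect the URL, headers, and body before spending any tokens.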

***

### 2. Integration Options

Krutrim inference APIs are **OpenAI-compatible**, so many open-source tools and SDKs built for the OpenAI API can call them once you set the base URL and API key.

#### OpenAI SDK (Python)

Krutrim supports the `openai` Python SDK for text models:

{% code overflow="wrap" %}

```python
from openai import OpenAI

client = OpenAI(
    api_key="your_krutrim_key",
    base_url="https://cloud.olakrutrim.com/v1"
)

response = client.chat.completions.create(
    model="krutrim-1",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement simply."}
    ]
)
print(response.choices[0].message.content)
```

{% endcode %}

#### LangChain Integration

Krutrim models can also be used in LangChain through its OpenAI-compatible chat wrapper.

{% code overflow="wrap" %}

```python
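# Note: newer LangChain releases expose ChatOpenAI from the separate
# `langchain_openai` package; adjust the import to match your version.
from langchain.chat_models import ChatOpenAI

# Point the OpenAI-compatible wrapper at the Krutrim endpoint.
llm = ChatOpenAI(
    openai_api_key="your_krutrim_key",
    openai_api_base="https://cloud.olakrutrim.com/v1",
    model_name="krutrim-1"
)

# Send a single prompt and print the model's reply.
print(llm.predict("What are some use cases of LLMs in finance?"))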
```

{% endcode %}

This enables integration with LangChain chains, memory, tools, and agents.

### 3. Supported Parameters

You can control generation behavior using the following parameters:

<table><thead><tr><th width="184.87890625">Parameter</th><th>Description</th></tr></thead><tbody><tr><td><code>temperature</code></td><td>Controls randomness (lower = deterministic, higher = more creative)</td></tr><tr><td><code>top_p</code></td><td>Controls nucleus sampling probability mass</td></tr><tr><td><code>max_tokens</code></td><td>Maximum number of tokens to generate</td></tr><tr><td><code>frequency_penalty</code></td><td>Penalizes repeating tokens</td></tr><tr><td><code>presence_penalty</code></td><td>Encourages introducing new topics</td></tr><tr><td><code>logit_bias</code></td><td>Biases probability of specific tokens</td></tr><tr><td><code>stop</code></td><td>Token(s) at which generation should stop</td></tr><tr><td><code>stream</code></td><td>Enables token-by-token streaming</td></tr></tbody></table>

Defaults vary by model and can be overridden via Playground or API.
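To make the override behavior concrete, here is a minimal sketch of a helper that assembles keyword arguments for a chat completion call and rejects parameter names not listed in the table above. The helper name `build_chat_kwargs` and the specific values are illustrative assumptions, not recommended settings; any parameter you omit is simply left at the model's default.

{% code overflow="wrap" %}

```python
# Parameter names taken from the table above.
SUPPORTED_PARAMS = {
    "temperature", "top_p", "max_tokens", "frequency_penalty",
    "presence_penalty", "logit_bias", "stop", "stream",
}


def build_chat_kwargs(model, prompt, **params):
    """Return keyword arguments for client.chat.completions.create()."""
    unknown = set(params) - SUPPORTED_PARAMS
    if unknown:
        raise ValueError(f"Unsupported parameter(s): {sorted(unknown)}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **params,  # only the keys given here override the model's defaults
    }


kwargs = build_chat_kwargs(
    "krutrim-1",
    "Summarize the benefits of unit testing.",
    temperature=0.2,  # illustrative values, not recommended defaults
    max_tokens=256,
    stop=["\n\n"],
)
print(sorted(kwargs))

# With the OpenAI SDK client shown earlier on this page:
# response = client.chat.completions.create(**kwargs)
```

{% endcode %}

Validating names up front catches typos such as `temprature` locally instead of relying on the API to reject or ignore them.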

***

### 4. Tokenization and Output

* Each model uses its own tokenizer, which is applied automatically.
* You are charged for **input and output tokens** combined, based on the model's pricing.

Refer to the **Billing** page for detailed rates and token limits.
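Because both input and output tokens are billed, a per-request estimate is a weighted sum of the two counts. The sketch below assumes separate input and output rates quoted per 1,000 tokens and uses placeholder numbers, not actual Krutrim pricing. It also assumes the token counts come from an OpenAI-style `usage` object (`prompt_tokens`, `completion_tokens`) on the response, which you should confirm for your model.

{% code overflow="wrap" %}

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  input_rate_per_1k, output_rate_per_1k):
    """Cost = input tokens * input rate + output tokens * output rate."""
    input_cost = (prompt_tokens / 1000) * input_rate_per_1k
    output_cost = (completion_tokens / 1000) * output_rate_per_1k
    return input_cost + output_cost


# Placeholder rates for illustration only, NOT actual Krutrim pricing.
cost = estimate_cost(
    prompt_tokens=1200,
    completion_tokens=300,
    input_rate_per_1k=0.5,
    output_rate_per_1k=1.5,
)
print(f"Estimated cost: {cost:.2f}")
```

{% endcode %}

If a model charges the same rate for input and output, pass the same value for both rate arguments.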

***

### 5. Troubleshooting Inference

| Symptom               | Likely Cause                      | Solution                                         |
| --------------------- | --------------------------------- | ------------------------------------------------ |
| Output is cut off     | `max_tokens` is too low           | Increase the `max_tokens` value                  |
| Output is repetitive  | Low `temperature` or no penalties | Raise `temperature` or apply `frequency_penalty` |
| High latency          | Large model or long prompt        | Use a smaller model or reduce prompt size        |
| Invalid model error   | Incorrect model name              | Copy exact model string from the Model Card      |
| Authentication failed | Missing or expired API key        | Regenerate your API key in the Krutrim Console   |
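The first row of the table can often be detected programmatically. The sketch below assumes the response follows the OpenAI convention of reporting `finish_reason == "length"` when generation stops at the token limit; verify this against the API Reference for your model. It uses a stand-in object instead of a live response so that it runs offline, and the helper name `was_truncated` is hypothetical.

{% code overflow="wrap" %}

```python
from types import SimpleNamespace


def was_truncated(response):
    """True if the first choice stopped because it hit the token limit."""
    return response.choices[0].finish_reason == "length"


# Stand-in for a real SDK response so the sketch runs without a network call.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="length",
            message=SimpleNamespace(content="The answer is"),
        )
    ]
)

if was_truncated(fake_response):
    print("Output was cut off; retry with a larger max_tokens value.")
```

{% endcode %}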

***

### 6. Next Steps

* Fine-tune a model for improved domain alignment
* Evaluate model quality and latency metrics
* Deploy a model as a persistent, production-ready endpoint

For API endpoint details and parameters, visit the **API Reference**.

