Billing for AI Studio
Krutrim Cloud's AI Studio offers access to a diverse catalogue of open-source and in-house AI models for text generation, embeddings, speech, and multimodal use cases. All models are billed based on usage, with costs varying depending on the type of input/output and the model in use.
Model Catalog Billing
Each model in AI Studio defines:
- Billing mode (Input tokens, Output tokens, Audio minutes, etc.) 
- Unit of pricing (Per 1M tokens, Per minute, Per hour, etc.) 
- Rate (in ₹) 
You are billed only for what you use. Charges are calculated based on usage metrics collected during API calls or console interactions with the models.
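The three fields above are enough to compute a charge: divide the metered usage by the size of the priced unit, then multiply by the rate. The following is a minimal sketch of that arithmetic, not the platform's actual billing code. The `ModelRate` class and its field names are hypothetical, and the example rate is the Krutrim-2 price listed in the pricing section of this page.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ModelRate:
    """Hypothetical pricing entry: what is metered, in what unit, at what rate."""
    billing_mode: str    # e.g. "input_tokens", "output_tokens", "audio_minutes"
    unit_size: Decimal   # metered units covered by one priced unit (1M tokens -> 1_000_000)
    rate_inr: Decimal    # price in rupees per priced unit

    def charge(self, usage: float) -> Decimal:
        """Return the charge in rupees for `usage` metered units."""
        return Decimal(str(usage)) / self.unit_size * self.rate_inr


# Krutrim-2 input tokens: ₹6.6 per 1M tokens.
krutrim2_input = ModelRate("input_tokens", Decimal(1_000_000), Decimal("6.6"))

# 250,000 input tokens are 0.25 priced units, so the charge is ₹1.65.
cost = krutrim2_input.charge(250_000)
print(cost)
```

`Decimal` is used instead of `float` so that rupee amounts are not affected by binary floating-point rounding.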
Model Catalog Pricing
| Model | Input price | Output price | Billing unit | Type |
|---|---|---|---|---|
| DeepSeek-R1-Distill-Llama-70B | ₹10.00 / 1M tokens | ₹10.00 / 1M tokens | Tokens | Text Generation |
| DeepSeek-R1-Distill-Llama-8B | ₹3.00 / 1M tokens | ₹3.00 / 1M tokens | Tokens | Text Generation |
| DeepSeek-R1 | ₹11.00 / 1M tokens | ₹16.00 / 1M tokens | Tokens | Text Generation |
| Google / Gemma-3-27b-it | ₹8.00 / 1M tokens | ₹25.00 / 1M tokens | Tokens | Multimodal |
| Krutrim / Krutrim-1 | ₹16.6 / 1M tokens | ₹16.6 / 1M tokens | Tokens | Text Generation |
| Krutrim / Krutrim-2 | ₹6.6 / 1M tokens | ₹6.6 / 1M tokens | Tokens | Text Generation |
| Krutrim / Krutrim-TTS | N/A | ₹4.42 / min | Minutes (output) | Text-to-Speech |
| Krutrim / Krutrim-Dhwani | ₹24.00 / hour | N/A | Hours (input) | Speech-to-Text |
| Krutrim / Bhasantarit | ₹6.26 / 1M tokens | N/A | Tokens (input) | Text-to-Embedding |
| Krutrim / Vyakyarth | ₹6.06 / 1M tokens | N/A | Tokens (input) | Text-to-Embedding |
| Krutrim / Chitrapathak | ₹83.6 / 1M tokens | ₹34.53 / 1M tokens | Tokens | Image-to-Text |
| Krutrim / Tokenizer | ₹0.00 | ₹0.00 | Tokens | Tokenization Utility |
| Meta / LLaMA-3.2-11B-Vision-Instruct | ₹14.94 / 1M tokens | ₹14.94 / 1M tokens | Tokens | Image-to-Text |
| Meta / LLaMA-3.3-70B-Instruct | ₹73.04 / 1M tokens | ₹73.04 / 1M tokens | Tokens | Text Generation |
| Meta / LLaMA-4-Maverick-17B-12BE-Instruct | ₹17.00 / 1M tokens | ₹50.00 / 1M tokens | Tokens | Multimodal |
| Microsoft / Phi-4 Reasoning Plus | ₹5.00 / 1M tokens | ₹29.00 / 1M tokens | Tokens | Text Generation |
| Mistral / Mistral-7B-v0.2 | ₹16.6 / 1M tokens | ₹16.6 / 1M tokens | Tokens | Text Generation |
| Qwen / Qwen3-32B | ₹8.00 / 1M tokens | ₹25.00 / 1M tokens | Tokens | Text Generation |
| Qwen / Qwen3-30B-A3B | ₹8.00 / 1M tokens | ₹25.00 / 1M tokens | Tokens | Text Generation |
Additional Notes
- Tokenization logic varies by model and is handled automatically using the model’s default tokenizer. 
- Speech models are priced by audio duration: text-to-speech (TTS) per minute of generated audio, and speech-to-text (STT) per hour of input audio.
- Multimodal models may count both text and vision tokens. 
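Because input and output are priced separately for many models, and speech models are priced by duration, a per-request estimate needs two small formulas. The sketch below uses rates from the pricing listing above (DeepSeek-R1, Krutrim-TTS, Krutrim-Dhwani). The function names are hypothetical, and the sketch assumes audio is prorated by the second, which this page does not state; actual metering may round differently.

```python
from decimal import Decimal

TOKENS_PER_UNIT = Decimal(1_000_000)  # token prices are quoted per 1M tokens


def token_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Rupee cost when input and output tokens carry separate per-1M rates.

    Pass rates as strings (e.g. "11.00") so Decimal keeps them exact.
    """
    return (Decimal(input_tokens) / TOKENS_PER_UNIT * Decimal(input_rate)
            + Decimal(output_tokens) / TOKENS_PER_UNIT * Decimal(output_rate))


def audio_cost(seconds, rate, seconds_per_unit):
    """Rupee cost for audio priced per minute (60) or per hour (3600).

    Assumption: usage is prorated by the second.
    """
    return Decimal(seconds) / Decimal(seconds_per_unit) * Decimal(rate)


# DeepSeek-R1: 200,000 input tokens at ₹11.00 and 50,000 output tokens at ₹16.00 per 1M.
text_total = token_cost(200_000, 50_000, "11.00", "16.00")   # 2.2 + 0.8 = ₹3.00

# Krutrim-TTS: 90 seconds of generated audio at ₹4.42 per minute.
tts_total = audio_cost(90, "4.42", 60)                       # 1.5 min = ₹6.63

# Krutrim-Dhwani: 30 minutes of input audio at ₹24.00 per hour.
stt_total = audio_cost(1800, "24.00", 3600)                  # 0.5 h = ₹12.00
```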
Evaluation Billing
Model Evaluations and Performance Evaluations on the Krutrim platform are billed the same way as inference: by the number of tokens processed.
Token-Based Pricing
- You are billed per token used during evaluation. 
- There are no additional fees for launching or running evaluation jobs. 
Low Balance Handling
- If your account balance drops below a certain threshold, the platform will automatically pause evaluation services until funds are added. 
Fine-Tuning Billing
Krutrim offers transparent and flexible pricing for both fine-tuning and deployment of custom models. Below is a detailed breakdown of the billing model:
Fine-Tuning Pricing
Fine-tuning costs are calculated per 1 million tokens, and pricing varies based on the model used.
| Model | Price | Unit |
|---|---|---|
| Llama-3-8b-instruct | ₹33 | per 1M tokens |
Note: This is token-based pricing, ensuring you pay only for the compute resources used during fine-tuning.
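As a rough illustration of the per-1M-token pricing, the sketch below estimates a fine-tuning charge from a token count using the Llama-3-8b-instruct rate listed above. The function name is hypothetical, and `billed_tokens` is taken as given: this page does not say whether tokens are counted once per dataset or once per training epoch, so confirm that before relying on an estimate.

```python
from decimal import Decimal

# Llama-3-8b-instruct fine-tuning rate: ₹33 per 1M tokens.
FINE_TUNE_RATE_INR_PER_1M = Decimal("33")


def fine_tuning_cost(billed_tokens: int) -> Decimal:
    """Rupee cost for a fine-tuning job that is billed for `billed_tokens` tokens."""
    return Decimal(billed_tokens) / Decimal(1_000_000) * FINE_TUNE_RATE_INR_PER_1M


# 12.5M billed tokens -> 12.5 priced units -> ₹412.50
estimate = fine_tuning_cost(12_500_000)
print(estimate)
```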
Deployment Pricing
Once your model is fine-tuned, you can deploy it on an NVIDIA H100 GPU. Pricing is based on actual GPU usage.
| GPU | Price | Unit |
|---|---|---|
| NVIDIA H100 | ₹213 | per GPU-hour |
Deployment pricing includes inference usage; you are charged only for the active duration of your deployment.
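Since deployment is billed by active GPU time rather than by tokens, the charge depends only on how long the deployment runs. A minimal sketch of that calculation follows, using the H100 rate listed above. The function name and the `gpu_count` parameter are hypothetical, and per-second proration is an assumption; the page states only that billing is per GPU-hour of active duration.

```python
from datetime import datetime
from decimal import Decimal

# NVIDIA H100 deployment rate: ₹213 per GPU-hour.
H100_RATE_INR_PER_GPU_HOUR = Decimal("213")


def deployment_cost(started_at: datetime, stopped_at: datetime, gpu_count: int = 1) -> Decimal:
    """Rupee cost for a deployment active between two timestamps.

    Assumption: time is prorated by the second and every GPU bills at the same rate.
    """
    seconds = Decimal(str((stopped_at - started_at).total_seconds()))
    gpu_hours = seconds / Decimal(3600) * gpu_count
    return gpu_hours * H100_RATE_INR_PER_GPU_HOUR


# A deployment running from 09:00 to 17:30 is active for 8.5 GPU-hours: 8.5 * 213 = ₹1,810.50
cost = deployment_cost(datetime(2025, 1, 1, 9, 0), datetime(2025, 1, 1, 17, 30))
print(cost)
```

Running the same calculation for an idle deployment shows why shutting down unused deployments matters: the charge accrues whether or not any requests are served.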
Cost Management Best Practices
- Optimize Your Dataset: High-quality, concise datasets reduce token usage and improve training efficiency. 
- Monitor Active Deployments: Shut down unused deployments to avoid incurring extra GPU charges. 
- Use Checkpoints: Save checkpoints during training to avoid repeating the entire process in case of interruptions. 
Low Balance Handling
If your account balance falls below a defined threshold:
- Fine-tuning and deployment services will be automatically paused. 
- These services can be resumed once sufficient balance has been added to the account.
Where to See This in Console
To see real-time usage and billing:
- Go to Billing → Usage → AI Studio 
- Select the relevant sub-tab:
  - Model Catalogue
  - Fine Tuning
  - Deployment
  - Evaluation
- Use the date filter to narrow your view 
- Click Export to download usage data 

