LiteLLM

Access 100+ AI providers through a unified, OpenAI-compatible LiteLLM proxy with load balancing, cost tracking, and other advanced features


Overview

LiteLLM is a powerful proxy server that unifies access to 100+ AI providers (OpenAI, Anthropic, Azure, Vertex, Bedrock, Cohere, etc.) through a single OpenAI-compatible API. It adds enterprise features like load balancing, fallbacks, budgets, and rate limiting on top of any AI provider.

Key Benefits

  • 🌐 100+ Providers: Access every major AI provider through one interface

  • 🔄 Load Balancing: Distribute requests across multiple providers/models

  • 💰 Cost Tracking: Built-in budget management and spend tracking

  • ⚡ Fallbacks: Automatic failover when providers are down

  • 🔧 Proxy Mode: Run as standalone proxy server for team-wide use

  • 📊 Observability: Detailed logging, metrics, and analytics

  • 🔐 Virtual Keys: Manage API keys centrally with role-based access

Use Cases

  • Multi-Provider Access: Unified interface for all AI providers

  • Load Balancing: Distribute load across providers for reliability

  • Cost Management: Track and limit AI spending across teams

  • Provider Migration: Easy switching between providers

  • Team Collaboration: Centralized proxy for entire organization

  • Enterprise Features: Budgets, rate limits, audit logs


Quick Start

Option 1: Direct Integration (SDK Only)

Use LiteLLM directly in your code without running a proxy server.

1. Install LiteLLM
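
Install the SDK from PyPI:

```bash
pip install litellm
```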

2. Use the LiteLLM Python Client
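
A minimal sketch of direct SDK usage; set the provider keys you actually intend to use (the values below are placeholders):

```python
import os
from litellm import completion

# Provider keys are read from the environment (placeholder values)
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."

# Every provider uses the same call shape: model="provider/model-id"
response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello from LiteLLM"}],
)
print(response.choices[0].message.content)

# Switching providers is just a change of model string
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Hello from LiteLLM"}],
)
```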

Option 2: Proxy Server

Run LiteLLM as a standalone proxy server for team-wide access.

1. Install LiteLLM
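
The proxy extra installs the server dependencies:

```bash
pip install 'litellm[proxy]'
```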

2. Create Configuration File

Create litellm_config.yaml:
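
A minimal example configuration; the model names and master key are placeholders, swap in the providers you actually use:

```yaml
model_list:
  - model_name: gpt-4o                        # name clients will request
    litellm_params:
      model: openai/gpt-4o                    # actual provider/model
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-3-5-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY

general_settings:
  master_key: sk-1234                         # replace with a strong secret
```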

3. Start Proxy Server
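
Point the CLI at the config file (4000 is the default port):

```bash
litellm --config litellm_config.yaml --port 4000
```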

4. Test Setup
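
A quick smoke test against the OpenAI-compatible endpoint, authenticating with the master key from the config above:

```bash
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello from the proxy"}]
  }'
```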


Provider Support

Supported Providers (100+)

LiteLLM supports all major AI providers:

| Category | Providers |
| --- | --- |
| Major Cloud | OpenAI, Anthropic, Google (Gemini, Vertex), Azure OpenAI, AWS Bedrock |
| Open Source | Hugging Face, Together AI, Replicate, Ollama, vLLM, LocalAI |
| Specialized | Cohere, AI21, Aleph Alpha, Perplexity, Groq, Fireworks AI |
| Aggregators | OpenRouter, Anyscale, Deep Infra, Mistral AI |
| Enterprise | SageMaker, Cloudflare Workers AI, Azure AI Studio |
| Custom | Any OpenAI-compatible endpoint |

Model Name Format
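
Models are addressed as provider/model-id (for plain OpenAI models the SDK also accepts the bare model name). A few representative examples; exact model IDs change over time, so check each provider's catalog:

```text
openai/gpt-4o
anthropic/claude-3-5-sonnet-20240620
gemini/gemini-1.5-pro
groq/llama3-70b-8192
ollama/llama3
```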


Advanced Features

1. Load Balancing

Distribute requests across multiple providers or API keys:
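
One way to do this in the proxy config: register several deployments under the same model_name and let the router spread traffic across them. The endpoints and keys below are placeholders:

```yaml
model_list:
  - model_name: gpt-4o                   # one logical name, several deployments
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY_1
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY_2
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o                # Azure deployment name
      api_base: https://example-endpoint.openai.azure.com/
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-01"

router_settings:
  routing_strategy: simple-shuffle       # others: least-busy, usage-based-routing, latency-based-routing
```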

Usage with NeurosLink AI:
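
The proxy exposes a standard OpenAI-compatible endpoint, so any OpenAI-compatible client, NeurosLink AI included, can point its base URL at the proxy. The exact NeurosLink AI settings depend on its own provider configuration; the sketch below uses the OpenAI Python client with the local proxy defaults from the Quick Start:

```python
from openai import OpenAI

# Point an OpenAI-compatible client at the LiteLLM proxy
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

response = client.chat.completions.create(
    model="gpt-4o",  # the logical model_name from the proxy config
    messages=[{"role": "user", "content": "Route me through LiteLLM"}],
)
print(response.choices[0].message.content)
```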

2. Automatic Failover

Configure fallback providers for reliability:
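
A sketch of fallback configuration; the exact key placement has shifted slightly between LiteLLM releases, so confirm against the docs for your version:

```yaml
router_settings:
  num_retries: 2
  fallbacks:
    - gpt-4o: ["claude-3-5-sonnet"]      # if gpt-4o fails, retry on claude-3-5-sonnet
```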

3. Budget Management

Set spending limits per user/team:
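
Budgets are typically attached to virtual keys via the proxy's /key/generate endpoint (authenticated with the master key; the values below are examples):

```bash
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "models": ["gpt-4o"],
    "max_budget": 100.0,
    "duration": "30d"
  }'
```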

Track spending:
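
Spend per key can be read back from the proxy, for example via /key/info (the key value is a placeholder):

```bash
curl "http://localhost:4000/key/info?key=sk-generated-key" \
  -H "Authorization: Bearer sk-1234"
```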

4. Rate Limiting

Control request rates per user/model:
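
Token and request rates can be capped per virtual key when it is generated (tpm = tokens per minute, rpm = requests per minute):

```bash
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "models": ["gpt-4o"],
    "tpm_limit": 100000,
    "rpm_limit": 100
  }'
```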

5. Caching

Reduce costs by caching responses:
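
A sketch of response caching backed by Redis; the host and port are placeholders, and an in-memory cache is also available:

```yaml
litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
```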

Usage:
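
With caching enabled nothing changes on the client side; repeated identical requests are served from the cache:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")
messages = [{"role": "user", "content": "What is LiteLLM?"}]

first = client.chat.completions.create(model="gpt-4o", messages=messages)
second = client.chat.completions.create(model="gpt-4o", messages=messages)  # identical request, answered from cache
```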

6. Virtual Keys (Team Management)

Create team-specific API keys with permissions:
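
Virtual keys are minted with the master key; the team name, model allow-list, and budget below are examples:

```bash
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "ml-research",
    "models": ["gpt-4o", "claude-3-5-sonnet"],
    "max_budget": 50.0,
    "metadata": {"owner": "ml-research"}
  }'
```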

Teams use their virtual key:
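
Clients then authenticate with the generated key instead of the master key (sk-team-... stands in for the key returned by /key/generate):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-team-...")
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```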


Basic Usage

Multi-Model Workflow
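
A sketch of a multi-model workflow with the SDK: draft with a cheaper model, then refine with a stronger one (model choices are illustrative):

```python
from litellm import completion

draft = completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Draft a short product announcement."}],
)

final = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[
        {"role": "user", "content": "Polish this draft:\n" + draft.choices[0].message.content},
    ],
)
print(final.choices[0].message.content)
```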

Cost Tracking
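
The SDK can estimate the cost of a completed call with completion_cost; the proxy additionally records spend per virtual key:

```python
from litellm import completion, completion_cost

response = completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
cost_usd = completion_cost(completion_response=response)
print(f"This call cost roughly ${cost_usd:.6f}")
```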


CLI Usage

Basic Commands
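
A few common invocations (run litellm --help for the full flag list):

```bash
# Start a proxy from a config file
litellm --config litellm_config.yaml --port 4000

# Quick single-model proxy without a config file
litellm --model openai/gpt-4o

# Verbose request/response logging while debugging
litellm --config litellm_config.yaml --debug
```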

Proxy Management
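
A running proxy exposes management endpoints such as health checks and the served model list (master key shown as a placeholder):

```bash
# Liveness / readiness of the proxy process
curl http://localhost:4000/health/liveliness
curl http://localhost:4000/health/readiness

# Health of the configured model deployments
curl http://localhost:4000/health -H "Authorization: Bearer sk-1234"

# Models currently served
curl http://localhost:4000/v1/models -H "Authorization: Bearer sk-1234"
```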


Production Deployment

Docker Deployment
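
The official image is published at ghcr.io/berriai/litellm; mount the config file and pass provider keys as environment variables (paths and keys are placeholders):

```bash
docker run -d \
  -p 4000:4000 \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  -v $(pwd)/litellm_config.yaml:/app/config.yaml \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml --port 4000
```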

Docker Compose
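
A minimal docker-compose.yml sketch of the same setup; add Postgres and Redis services if you use virtual keys or caching:

```yaml
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    command: ["--config", "/app/config.yaml", "--port", "4000"]
    ports:
      - "4000:4000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
```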

Kubernetes Deployment
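
A bare-bones Deployment/Service sketch; in practice the config lives in a ConfigMap (litellm-config below) and provider keys in a Secret (litellm-provider-keys), both assumed names:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm-proxy
spec:
  replicas: 2
  selector:
    matchLabels: {app: litellm-proxy}
  template:
    metadata:
      labels: {app: litellm-proxy}
    spec:
      containers:
        - name: litellm
          image: ghcr.io/berriai/litellm:main-latest
          args: ["--config", "/app/config.yaml", "--port", "4000"]
          ports: [{containerPort: 4000}]
          envFrom:
            - secretRef: {name: litellm-provider-keys}   # OPENAI_API_KEY, etc.
          volumeMounts:
            - {name: config, mountPath: /app/config.yaml, subPath: config.yaml}
      volumes:
        - name: config
          configMap: {name: litellm-config}
---
apiVersion: v1
kind: Service
metadata:
  name: litellm-proxy
spec:
  selector: {app: litellm-proxy}
  ports: [{port: 4000, targetPort: 4000}]
```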

High Availability Setup
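
For high availability, run several proxy replicas behind a load balancer and let them share state: Postgres for keys and spend, Redis for caching and router state. A sketch of the shared-state portion of the config (hostnames are placeholders):

```yaml
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL     # shared Postgres for keys and spend

litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: redis.internal                    # shared Redis instance
    port: 6379
```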


Observability & Monitoring

Logging
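
Request and response logging is driven by callbacks in the config; for example, forwarding successful and failed calls to Langfuse (any supported logging integration can be substituted, with its credentials supplied as environment variables):

```yaml
litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]
```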

Prometheus Metrics
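
A Prometheus callback can expose request, token, and spend metrics on a /metrics endpoint for scraping; availability of this integration depends on your LiteLLM version and plan, so verify against the docs for your release:

```yaml
litellm_settings:
  callbacks: ["prometheus"]
```

Point your Prometheus scrape config at http://localhost:4000/metrics (or the equivalent service address).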

Custom Logging
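
With the SDK, custom hooks can subclass CustomLogger; a minimal sketch that prints the model and estimated cost of every successful call:

```python
import litellm
from litellm import completion
from litellm.integrations.custom_logger import CustomLogger

class SpendLogger(CustomLogger):
    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        # kwargs carries call metadata, including the estimated response cost
        print(f"model={kwargs.get('model')} cost=${kwargs.get('response_cost', 0)}")

litellm.callbacks = [SpendLogger()]

completion(model="openai/gpt-4o-mini", messages=[{"role": "user", "content": "hi"}])
```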


Troubleshooting

Common Issues

1. "Connection refused"

Problem: LiteLLM proxy not running.

Solution:
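
Make sure the proxy is running and listening on the expected port, then re-check the liveness endpoint:

```bash
litellm --config litellm_config.yaml --port 4000
curl http://localhost:4000/health/liveliness
```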

2. "Invalid API key"

Problem: Master key or virtual key incorrect.

Solution:
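
Confirm that the key in the Authorization header matches the master_key in litellm_config.yaml, or that the virtual key is still valid (key values below are placeholders):

```bash
curl "http://localhost:4000/key/info?key=sk-your-key" \
  -H "Authorization: Bearer sk-1234"
```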

3. "Budget exceeded"

Problem: Virtual key reached budget limit.

Solution:
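
Raise the key's budget with /key/update, or issue a new key with a larger max_budget; the key value below is a placeholder:

```bash
curl -X POST http://localhost:4000/key/update \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{"key": "sk-team-key", "max_budget": 200.0}'
```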

4. "Model not found"

Problem: Model not configured in model_list.

Solution:
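
Add the requested model_name to model_list in litellm_config.yaml, restart the proxy, and confirm it is being served:

```bash
curl http://localhost:4000/v1/models -H "Authorization: Bearer sk-1234"
```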


Best Practices

1. Use Virtual Keys

2. Enable Fallbacks

3. Implement Caching

4. Monitor Costs

5. Use Load Balancing



Additional Resources


Need Help? Join our GitHub Discussions or open an issue.
