Custom Providers
DevoxxGenie offers flexible integration with custom LLM providers through its "Custom OpenAI" provider option. This allows you to connect to any service that implements the OpenAI-compatible API format.
Overview
The Custom OpenAI provider feature allows you to:
- Connect to self-hosted or third-party LLM services
- Use private or specialized LLM deployments
- Integrate with enterprise LLM platforms
- Use new providers that aren't directly supported yet
This feature is particularly useful for:
- Enterprise environments with custom AI deployments
- Research teams using specialized models
- Users who want to connect to newer or niche providers
- Organizations with specific security or compliance requirements
Setting Up Custom Providers
Basic Configuration
To configure a custom provider:
- Open DevoxxGenie settings (
Settings→Tools→DevoxxGenie) - Navigate to the "LLM Providers" tab
- Select "Custom OpenAI" as the provider
- Configure the required settings:
- URL: The base endpoint for the API (e.g.,
http://localhost:8000/v1orhttps://custom-provider.example.com/api) - Model: The model identifier expected by the provider
- API Key: Authentication key (if required)
- URL: The base endpoint for the API (e.g.,
Model Parameters
You can configure standard LLM parameters:
- Temperature: Controls randomness (0.0-2.0)
- Top P: Controls diversity via nucleus sampling (0.0-1.0)
- Max Tokens: Maximum length of the response
- Timeout: Request timeout in seconds
Compatible Provider Types
Self-Hosted OpenAI-Compatible Servers
Many self-hosted inference servers implement the OpenAI API format:
LocalAI
LocalAI is a drop-in replacement for OpenAI's API that runs on your hardware:
# Example setup
docker run -p 8080:8080 localai/localai
Configuration for DevoxxGenie:
- URL:
http://localhost:8080/v1 - Model: Name of your loaded model (e.g.,
ggml-gpt4all-j)
vLLM
vLLM is a high-throughput and memory-efficient inference engine:
# Example setup
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-chat-hf
Configuration for DevoxxGenie:
- URL:
http://localhost:8000/v1 - Model: The model name (e.g.,
meta-llama/Llama-2-7b-chat-hf)
Text Generation WebUI
Text Generation WebUI is a popular interface for running LLMs:
# Enable the OpenAI API extension in the WebUI
python server.py --extensions openai
Configuration for DevoxxGenie:
- URL:
http://localhost:5000/v1 - Model: The model name as shown in the WebUI
LMStudio
LMStudio provides an OpenAI-compatible server:
- Launch LMStudio
- Load your model
- Start the local server
Configuration for DevoxxGenie:
- URL:
http://localhost:1234/v1 - Model: The model name as configured in LMStudio
Java-Based LLM Servers
Several Java-based LLM servers provide OpenAI-compatible APIs:
JLama
JLama is a pure Java implementation of LLM inference:
# Start JLama with OpenAI-compatible API
java -jar jlama.jar --openai-api
Configuration for DevoxxGenie:
- URL:
http://localhost:8080/v1 - Model: The model name as configured in JLama
Llama3.java
Llama3.java is a Spring Boot wrapper for Llama 3 models:
# Start the service
java -jar llama3java.jar
Configuration for DevoxxGenie:
- URL:
http://localhost:8080/v1 - Model: The model name (e.g.,
llama3)
Enterprise AI Platforms
Many enterprise AI platforms provide OpenAI-compatible endpoints:
HuggingFace Inference Endpoints
If you deploy models on HuggingFace Inference Endpoints, they can expose OpenAI-compatible APIs:
Configuration for DevoxxGenie:
- URL: Your HuggingFace endpoint URL
- API Key: Your HuggingFace API key
- Model: The deployed model name
SageMaker Endpoints
Custom Amazon SageMaker deployments with OpenAI-compatible APIs:
Configuration for DevoxxGenie:
- URL: Your SageMaker endpoint URL
- API Key: Your AWS credentials (handled separately)
- Model: The model name expected by your endpoint
Private AI Deployments
For private enterprise AI deployments:
Configuration for DevoxxGenie:
- URL: Your internal API endpoint
- API Key: Your internal authentication token
- Model: The internal model identifier
Recent Cloud Providers with Custom Integration
Several newer cloud providers may be used via the Custom OpenAI integration. Note that DeepSeek, Grok, and Kimi are now available as native providers in DevoxxGenie. Use Custom OpenAI only if you need to connect to alternative endpoints.
DeepSeek R1 (via Nvidia/Chutes)
To connect to DeepSeek R1 model via Nvidia or Chutes (when not using the native DeepSeek provider):
Configuration for DevoxxGenie:
- URL:
https://chutes-deepseek-ai-deepseek-r1.chutes.ai/v1/(Chutes) orhttps://integrate.api.nvidia.com/v1(Nvidia) - API Key: Your DeepSeek API key
- Model:
deepseek-r1
Grok
To connect to Grok (when not using the native Grok provider):
Configuration for DevoxxGenie:
- URL:
https://console.x.ai/(Check the latest URL) - API Key: Your Grok API key
- Model: The Grok model name
Kimi (Moonshot AI)
To connect to Kimi via Custom OpenAI (when not using the native Kimi provider):
Configuration for DevoxxGenie:
- URL:
https://api.moonshot.ai/v1 - API Key: Your Kimi API key from platform.moonshot.ai
- Model:
moonshot-v1-8k,moonshot-v1-32k,moonshot-v1-128k, orkimi-k2-turbo-preview
OpenAI API Compatibility
When using custom providers, it's important to understand OpenAI API compatibility:
Core Endpoints
Most providers implement these essential endpoints:
/chat/completions: For chat-based interactions (most commonly used)/completions: For completion-based interactions/models: To list available models
Request Format
The standard request format follows this pattern:
{
"model": "model-name",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, can you help me with Java code?"}
],
"temperature": 0.7,
"max_tokens": 1000
}
Response Format
The standard response format follows this pattern:
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1677858242,
"model": "model-name",
"choices": [
{
"message": {
"role": "assistant",
"content": "Hello! I'd be happy to help with your Java code..."
},
"finish_reason": "stop",
"index": 0
}
],
"usage": {
"prompt_tokens": 50,
"completion_tokens": 120,
"total_tokens": 170
}
}
Compatibility Variations
Different providers may have varying levels of compatibility:
- Full compatibility: Implements all endpoints and parameters
- Basic compatibility: Implements only the essential endpoints
- Partial compatibility: May have limitations or differences in parameters
Best Practices
Testing New Providers
Before using a custom provider extensively:
- Test with simple queries to verify basic functionality
- Check response formats match what DevoxxGenie expects
- Test streaming if you intend to use it
- Validate token counting accuracy
Security Considerations
When using custom providers:
- Verify the trustworthiness of the provider
- Use HTTPS for remote endpoints
- Use API keys with minimal necessary permissions
- Be careful with sensitive code and data
Performance Optimization
To optimize performance with custom providers:
- Choose appropriate timeout settings
- Check latency and adjust expectations
- Use appropriate model parameters (temperature, etc.)
- Consider local endpoints for lower latency
Troubleshooting
Common Issues
Connection Errors
If you're unable to connect to the provider:
- Verify the URL is correct and accessible
- Check that the service is running
- Test the endpoint with a tool like curl or Postman
- Check for firewall or network restrictions
# Example curl command to test an endpoint
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model":"test-model","messages":[{"role":"user","content":"Hello"}]}'
Authentication Issues
If authentication fails:
- Verify the API key is correct
- Ensure the key has the necessary permissions
- Check if the provider requires additional headers
- Verify the authentication method (some may use Bearer tokens)
Response Format Problems
If responses aren't being processed correctly:
- Check if the provider fully implements the OpenAI schema
- Look for any non-standard fields or formats
- Test with simpler requests to isolate issues
- Check provider documentation for any format differences
Model Not Found
If you get "model not found" errors:
- Verify the model name is exactly as expected by the provider
- Check if the model needs to be loaded or started first
- Ensure the model is available in your provider's deployment
Example Configurations
Here are complete configuration examples for common custom provider setups:
LocalAI with Llama 3
URL: http://localhost:8080/v1
Model: llama3:8b
API Key: (leave empty)
Temperature: 0.7
Max Tokens: 2000
Timeout: 300
JLama Server
URL: http://localhost:8080/v1
Model: jlama-model
API Key: (leave empty)
Temperature: 0.8
Max Tokens: 1000
Timeout: 120
Enterprise Deployment
URL: https://ai-api.enterprise.com/openai/v1
Model: enterprise-code-llm
API Key: your-api-key-here
Temperature: 0.5
Max Tokens: 4000
Timeout: 60
Future Directions
The landscape of custom LLM providers is rapidly evolving:
- More standardization: Increasing adoption of OpenAI-compatible APIs
- Better performance: Improved inference servers with lower latency
- Specialized providers: More code-focused and domain-specific models
- Enterprise solutions: More managed and private AI deployments
DevoxxGenie will continue to enhance its custom provider support to accommodate these developments.