Custom Providers

DevoxxGenie integrates with custom LLM providers through its "Custom OpenAI" provider option, which connects to any service that implements the OpenAI-compatible API format.

Overview

The Custom OpenAI provider feature allows you to:

Connect to self-hosted or third-party LLM services
Use private or specialized LLM deployments
Integrate with enterprise LLM platforms
Use new providers that aren't directly supported yet

This feature is particularly useful for:

Enterprise environments with custom AI deployments
Research teams using specialized models
Users who want to connect to newer or niche providers
Organizations with specific security or compliance requirements

Setting Up Custom Providers

Basic Configuration

To configure a custom provider:

1. Open DevoxxGenie settings (Settings → Tools → DevoxxGenie)
2. Navigate to the "LLM Providers" tab
3. Select "Custom OpenAI" as the provider
4. Configure the required settings:
   - URL: The base endpoint for the API (e.g., http://localhost:8000/v1 or https://custom-provider.example.com/api)
   - Model: The model identifier expected by the provider
   - API Key: Authentication key (if required)
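
The URL setting is a base endpoint, so the chat path is appended to it when a request is sent. The sketch below illustrates that relationship in Python; `chat_completions_url` is a hypothetical helper written for this page, and the exact way DevoxxGenie joins the two parts internally is an assumption.

```python
from urllib.parse import urlsplit


def chat_completions_url(base_url: str) -> str:
    """Return the chat endpoint for a base URL such as http://localhost:8000/v1."""
    parts = urlsplit(base_url)
    # Reject values that are not absolute http(s) URLs, e.g. "localhost:8000/v1".
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {base_url!r}")
    # Tolerate a trailing slash so the result never contains "//chat".
    return base_url.rstrip("/") + "/chat/completions"


if __name__ == "__main__":
    print(chat_completions_url("http://localhost:8000/v1"))
```

If the printed URL does not match the one the provider documents, the base URL is usually missing (or duplicating) the /v1 segment.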

Model Parameters

You can configure standard LLM parameters:

Temperature: Controls randomness (0.0-2.0)
Top P: Controls diversity via nucleus sampling (0.0-1.0)
Max Tokens: Maximum length of the response
Timeout: Request timeout in seconds
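
As a rough illustration of the ranges listed above, the following Python sketch checks a set of parameters before they are used. `validate_parameters` is a hypothetical helper, not part of DevoxxGenie, and individual providers may accept narrower ranges than the ones shown here.

```python
from typing import List


def validate_parameters(temperature: float, top_p: float,
                        max_tokens: int, timeout_s: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means all values are in range."""
    problems = []
    if not 0.0 <= temperature <= 2.0:
        problems.append("temperature must be between 0.0 and 2.0")
    if not 0.0 <= top_p <= 1.0:
        problems.append("top_p must be between 0.0 and 1.0")
    if max_tokens <= 0:
        problems.append("max_tokens must be a positive integer")
    if timeout_s <= 0:
        problems.append("timeout must be a positive number of seconds")
    return problems


if __name__ == "__main__":
    print(validate_parameters(0.7, 1.0, 2000, 300))
```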

Compatible Provider Types

Self-Hosted OpenAI-Compatible Servers

Many self-hosted inference servers implement the OpenAI API format:

LocalAI

LocalAI is a drop-in replacement for OpenAI's API that runs on your hardware:

# Example setup
docker run -p 8080:8080 localai/localai

Configuration for DevoxxGenie:

URL: http://localhost:8080/v1
Model: Name of your loaded model (e.g., ggml-gpt4all-j)

vLLM

vLLM is a high-throughput and memory-efficient inference engine:

# Example setup
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-chat-hf

Configuration for DevoxxGenie:

URL: http://localhost:8000/v1
Model: The model name (e.g., meta-llama/Llama-2-7b-chat-hf)

Text Generation WebUI

Text Generation WebUI is a popular interface for running LLMs:

# Enable the OpenAI API extension in the WebUI
python server.py --extensions openai

Configuration for DevoxxGenie:

URL: http://localhost:5000/v1
Model: The model name as shown in the WebUI

LMStudio

LMStudio provides an OpenAI-compatible server:

Launch LMStudio
Load your model
Start the local server

Configuration for DevoxxGenie:

URL: http://localhost:1234/v1
Model: The model name as configured in LMStudio

Java-Based LLM Servers

Several Java-based LLM servers provide OpenAI-compatible APIs:

JLama

JLama is a pure Java implementation of LLM inference:

# Start JLama with OpenAI-compatible API
java -jar jlama.jar --openai-api

Configuration for DevoxxGenie:

URL: http://localhost:8080/v1
Model: The model name as configured in JLama

Llama3.java

Llama3.java is a Spring Boot wrapper for Llama 3 models:

# Start the service
java -jar llama3java.jar

Configuration for DevoxxGenie:

URL: http://localhost:8080/v1
Model: The model name (e.g., llama3)

Enterprise AI Platforms

Many enterprise AI platforms provide OpenAI-compatible endpoints:

HuggingFace Inference Endpoints

If you deploy models on HuggingFace Inference Endpoints, they can expose OpenAI-compatible APIs:

Configuration for DevoxxGenie:

URL: Your HuggingFace endpoint URL
API Key: Your HuggingFace API key
Model: The deployed model name

SageMaker Endpoints

Custom Amazon SageMaker deployments can also expose OpenAI-compatible APIs:

Configuration for DevoxxGenie:

URL: Your SageMaker endpoint URL
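API Key: The key accepted by your endpoint or gateway, if any (AWS credentials are handled separately; SageMaker itself normally expects SigV4-signed requests rather than an API key)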
Model: The model name expected by your endpoint

Private AI Deployments

For private enterprise AI deployments:

Configuration for DevoxxGenie:

URL: Your internal API endpoint
API Key: Your internal authentication token
Model: The internal model identifier

Recent Cloud Providers with Custom Integration

Several newer cloud providers may be used via the Custom OpenAI integration. Note that DeepSeek, Grok, and Kimi are now available as native providers in DevoxxGenie. Use Custom OpenAI only if you need to connect to alternative endpoints.

DeepSeek R1 (via Nvidia/Chutes)

To connect to the DeepSeek R1 model via Nvidia or Chutes (when not using the native DeepSeek provider):

Configuration for DevoxxGenie:

URL: https://chutes-deepseek-ai-deepseek-r1.chutes.ai/v1/ (Chutes) or https://integrate.api.nvidia.com/v1 (Nvidia)
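API Key: Your Chutes or Nvidia API key (a key issued by DeepSeek's own platform is not valid on these hosts)
Model: deepseek-r1 (the exact identifier can differ per host, for example with a deepseek-ai/ prefix; check the host's model list)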

Grok

To connect to Grok (when not using the native Grok provider):

Configuration for DevoxxGenie:

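URL: https://api.x.ai/v1 (check xAI's documentation for the current URL; https://console.x.ai/ is the web console where keys are created, not the API endpoint)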
API Key: Your Grok API key
Model: The Grok model name

Kimi (Moonshot AI)

To connect to Kimi via Custom OpenAI (when not using the native Kimi provider):

Configuration for DevoxxGenie:

URL: https://api.moonshot.ai/v1
API Key: Your Kimi API key from platform.moonshot.ai
Model: moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k, or kimi-k2-turbo-preview

OpenAI API Compatibility

A custom provider only works if it implements the parts of the OpenAI API that DevoxxGenie calls. The essentials are:

Core Endpoints

Most providers implement these essential endpoints:

/chat/completions: For chat-based interactions (most commonly used)
/completions: For completion-based interactions
/models: To list available models
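
The /models endpoint is a convenient way to find the exact identifier to enter in the Model field. In the OpenAI format the response is a JSON object whose `data` array holds one entry per model, each with an `id`. The Python sketch below extracts those identifiers; `model_ids` is a hypothetical helper, and providers with only basic compatibility may not offer this endpoint at all.

```python
import json
from typing import List


def model_ids(models_response: str) -> List[str]:
    """Extract model identifiers from the body of a GET /models response."""
    payload = json.loads(models_response)
    # Each entry in "data" describes one model; "id" is the name to send in requests.
    return [entry["id"] for entry in payload.get("data", [])]


if __name__ == "__main__":
    sample = '{"object": "list", "data": [{"id": "llama3", "object": "model"}]}'
    print(model_ids(sample))
```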

Request Format

The standard request format follows this pattern:

{
  "model": "model-name",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, can you help me with Java code?"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000
}
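
To make the request format concrete, here is a minimal Python sketch that builds (but does not send) such a request with the standard library. `build_chat_request` is a hypothetical helper; the `Authorization: Bearer` header is the usual OpenAI convention and is assumed here, so check your provider's documentation if it expects something else.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(base_url: str, model: str, user_message: str,
                       api_key: Optional[str] = None,
                       temperature: float = 0.7,
                       max_tokens: int = 1000) -> urllib.request.Request:
    """Build a POST request for <base_url>/chat/completions without sending it."""
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Local servers often need no key, so the header is only added when one is set.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    request = build_chat_request("http://localhost:8000/v1", "model-name",
                                 "Hello, can you help me with Java code?")
    print(request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen(request, timeout=...)` would send it; that step is left out so the sketch runs without a server.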

Response Format

The standard response format follows this pattern:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "model-name",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Hello! I'd be happy to help with your Java code..."
      },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 120,
    "total_tokens": 170
  }
}
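
A client typically needs only a few fields from this structure. The Python sketch below reads the assistant text, the finish reason, and the token total from a response body; `extract_reply` is a hypothetical helper, and it assumes the provider returns at least one choice. The `usage` block is treated as optional because some servers omit it.

```python
import json


def extract_reply(response_text: str) -> dict:
    """Pull the assistant text, finish reason and token total out of a response body."""
    payload = json.loads(response_text)
    first = payload["choices"][0]
    usage = payload.get("usage") or {}
    return {
        "content": first["message"]["content"],
        "finish_reason": first.get("finish_reason"),
        # None when the provider does not report token usage.
        "total_tokens": usage.get("total_tokens"),
    }


if __name__ == "__main__":
    sample = json.dumps({
        "choices": [{"message": {"role": "assistant", "content": "Hello!"},
                     "finish_reason": "stop", "index": 0}],
        "usage": {"prompt_tokens": 50, "completion_tokens": 120, "total_tokens": 170},
    })
    print(extract_reply(sample))
```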

Compatibility Variations

Different providers may have varying levels of compatibility:

Full compatibility: Implements all endpoints and parameters
Basic compatibility: Implements only the essential endpoints
Partial compatibility: May have limitations or differences in parameters

Best Practices

Testing New Providers

Before using a custom provider extensively:

Test with simple queries to verify basic functionality
Check that response formats match what DevoxxGenie expects
Test streaming if you intend to use it
Validate token counting accuracy
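
Streaming is the item in this checklist that most often differs between servers. In the OpenAI format, a streamed reply arrives as server-sent events: lines starting with `data:` that each carry a JSON chunk with the new text in `choices[0].delta.content`, ending with `data: [DONE]`. The Python sketch below reassembles the text from such lines; `collect_stream` is a hypothetical helper, and it operates on already-received lines rather than a live connection.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Concatenate the content deltas from OpenAI-style streaming lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some servers send chunks that carry only usage data
        delta = choices[0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)


if __name__ == "__main__":
    captured = [
        'data: {"choices": [{"delta": {"role": "assistant"}}]}',
        'data: {"choices": [{"delta": {"content": "Hel"}}]}',
        'data: {"choices": [{"delta": {"content": "lo"}}]}',
        "data: [DONE]",
    ]
    print(collect_stream(captured))
```

Lines captured with a tool such as curl can be fed to this function to confirm that a provider's stream follows the expected shape.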

Security Considerations

When using custom providers:

Verify the trustworthiness of the provider
Use HTTPS for remote endpoints
Use API keys with minimal necessary permissions
Be careful with sensitive code and data
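
The HTTPS recommendation above can be checked mechanically. The Python sketch below accepts HTTPS URLs everywhere and plain HTTP only for the local machine; `uses_safe_transport` and the `LOCAL_HOSTS` set are hypothetical names, and treating only loopback addresses as local is an assumption that may be too strict for a trusted internal network.

```python
from urllib.parse import urlsplit

# Hosts that refer to the local machine, where unencrypted HTTP stays on the loopback interface.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def uses_safe_transport(url: str) -> bool:
    """True for HTTPS endpoints, or for plain HTTP when the host is the local machine."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        return True
    return parts.scheme == "http" and parts.hostname in LOCAL_HOSTS


if __name__ == "__main__":
    for candidate in ("http://localhost:8080/v1",
                      "https://custom-provider.example.com/api",
                      "http://custom-provider.example.com/api"):
        print(candidate, uses_safe_transport(candidate))
```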

Performance Optimization

To optimize performance with custom providers:

Choose appropriate timeout settings
Measure typical response latency and set the timeout comfortably above it
Use appropriate model parameters (temperature, etc.)
Consider local endpoints for lower latency
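
One way to pick a timeout is to time a few representative requests first. The Python sketch below measures the median wall-clock duration of any callable; `median_latency` is a hypothetical helper, and `call` stands in for whatever function sends a request to your provider.

```python
import statistics
import time
from typing import Callable


def median_latency(call: Callable[[], object], runs: int = 5) -> float:
    """Run `call` several times and return the median duration in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # A short sleep is used as a placeholder for a real request.
    print(f"{median_latency(lambda: time.sleep(0.01), runs=3):.3f} s")
```

The median is used rather than the mean so that a single slow first request (for example, while a local model is still loading) does not skew the result.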

Troubleshooting

Common Issues

Connection Errors

If you're unable to connect to the provider:

Verify the URL is correct and accessible
Check that the service is running
Test the endpoint with a tool like curl or Postman
Check for firewall or network restrictions

# Example curl command to test an endpoint
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model":"test-model","messages":[{"role":"user","content":"Hello"}]}'

Authentication Issues

If authentication fails:

Verify the API key is correct
Ensure the key has the necessary permissions
Check if the provider requires additional headers
Verify the authentication method (OpenAI-compatible APIs normally expect the key in an Authorization: Bearer header; some providers use a different header)

Response Format Problems

If responses aren't being processed correctly:

Check if the provider fully implements the OpenAI schema
Look for any non-standard fields or formats
Test with simpler requests to isolate issues
Check provider documentation for any format differences

Model Not Found

If you get "model not found" errors:

Verify the model name is exactly as expected by the provider
Check if the model needs to be loaded or started first
Ensure the model is available in your provider's deployment
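
The four categories above can often be told apart from the error a client receives. The Python sketch below maps exceptions raised by the standard library's `urllib` and `json` modules to those categories; `diagnose` is a hypothetical helper, and the status-code mapping follows common HTTP conventions that individual providers may not honor (a 404, for instance, can mean either a wrong URL or an unknown model).

```python
import urllib.error


def diagnose(error: Exception) -> str:
    """Map a client-side exception to one of the troubleshooting categories."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(error, urllib.error.HTTPError):
        if error.code in (401, 403):
            return "authentication"
        if error.code == 404:
            return "model or endpoint not found"
        return f"http {error.code}"
    if isinstance(error, urllib.error.URLError):
        return "connection"
    # json.JSONDecodeError subclasses ValueError; KeyError signals a missing field.
    if isinstance(error, (ValueError, KeyError)):
        return "response format"
    return "unknown"


if __name__ == "__main__":
    print(diagnose(urllib.error.URLError("connection refused")))
```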

Example Configurations

Here are complete configuration examples for common custom provider setups:

LocalAI with Llama 3

URL: http://localhost:8080/v1
Model: llama3:8b
API Key: (leave empty)
Temperature: 0.7
Max Tokens: 2000
Timeout: 300

JLama Server

URL: http://localhost:8080/v1
Model: jlama-model
API Key: (leave empty)
Temperature: 0.8
Max Tokens: 1000
Timeout: 120

Enterprise Deployment

URL: https://ai-api.enterprise.com/openai/v1
Model: enterprise-code-llm
API Key: your-api-key-here
Temperature: 0.5
Max Tokens: 4000
Timeout: 60

Future Directions

The landscape of custom LLM providers is rapidly evolving:

More standardization: Increasing adoption of OpenAI-compatible APIs
Better performance: Improved inference servers with lower latency
Specialized providers: More code-focused and domain-specific models
Enterprise solutions: More managed and private AI deployments

DevoxxGenie will continue to enhance its custom provider support to accommodate these developments.
