# Drag & Drop Images

Starting with version 0.4.10, DevoxxGenie supports dragging and dropping images for multimodal LLMs, so you can include visual context in your conversations with the AI.

## Overview

The image drag and drop feature lets you:

- Drag images directly into the DevoxxGenie chat window
- Include screenshots of UIs, diagrams, or code
- Ask questions about visual elements
- Get more contextually relevant answers based on both text and images

This feature works with multimodal LLMs that can process both text and images, such as:

- Google Gemini
- Anthropic Claude (3 series and newer)
- OpenAI GPT-4V and newer
- Local models like LLaVA (via Ollama)

## Using Image Drag & Drop

### Adding Images to Your Prompts

To include an image in your conversation:

1. Drag an image file from your file explorer and drop it into the DevoxxGenie input field, or take a screenshot and paste it directly into the input field (Ctrl+V, or Cmd+V on macOS)
2. Type your question or prompt about the image
3. Submit the prompt
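The sketch below shows one way a Swing input component can accept files dropped from a file explorer. IntelliJ's UI is built on Swing. This is not DevoxxGenie's source: `ImageDropHandler` and its constructor parameters are hypothetical names, and the caller supplies the test for what counts as an image.

```java
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.UnsupportedFlavorException;
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import javax.swing.TransferHandler;

/**
 * Hypothetical drop handler for a Swing input component. It accepts a list of
 * dropped files and forwards only the ones that pass {@code isImage}.
 */
final class ImageDropHandler extends TransferHandler {
    private final Predicate<File> isImage;
    private final Consumer<List<File>> onImages;

    ImageDropHandler(Predicate<File> isImage, Consumer<List<File>> onImages) {
        this.isImage = isImage;
        this.onImages = onImages;
    }

    @Override
    public boolean canImport(TransferSupport support) {
        // Only file-list transfers (e.g. drags from a file explorer) are accepted.
        return support.isDataFlavorSupported(DataFlavor.javaFileListFlavor);
    }

    @Override
    public boolean importData(TransferSupport support) {
        if (!canImport(support)) {
            return false;
        }
        try {
            // javaFileListFlavor carries a java.util.List whose elements are java.io.File.
            @SuppressWarnings("unchecked")
            List<File> dropped = (List<File>) support.getTransferable()
                    .getTransferData(DataFlavor.javaFileListFlavor);
            List<File> images = dropped.stream()
                    .filter(isImage)
                    .collect(Collectors.toList());
            if (images.isEmpty()) {
                return false; // nothing usable was dropped
            }
            onImages.accept(images);
            return true;
        } catch (UnsupportedFlavorException | IOException e) {
            return false;
        }
    }
}
```

You would attach it with `inputField.setTransferHandler(new ImageDropHandler(...))`. This sketch has two limits. First, replacing a text component's transfer handler also replaces its default text copy and paste, so a fuller version would delegate other flavors to the original handler. Second, pasted screenshots usually arrive as `DataFlavor.imageFlavor` (an in-memory `java.awt.Image`) rather than as a file list, and this handler does not cover that case.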

### Supported Image Types

DevoxxGenie supports common image formats:

- PNG (`.png`)
- JPEG/JPG (`.jpg`, `.jpeg`)
- GIF (`.gif`, non-animated)
- WebP (`.webp`)
- BMP (`.bmp`)
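An image sent to a provider is usually labeled with a MIME type. The minimal sketch below maps the formats listed above to their MIME types. The class name is hypothetical, and this is not DevoxxGenie's own lookup. An extension check cannot tell whether a GIF is animated; that would require decoding the file.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup from the supported file extensions to their MIME types. */
final class SupportedImageTypes {
    private static final Map<String, String> MIME_BY_EXTENSION = Map.of(
            "png", "image/png",
            "jpg", "image/jpeg",
            "jpeg", "image/jpeg",
            "gif", "image/gif",
            "webp", "image/webp",
            "bmp", "image/bmp");

    private SupportedImageTypes() {}

    /** Returns the MIME type for a file name, or empty if its extension is not in the list. */
    static Optional<String> mimeTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension at all
        }
        // Extensions are compared case-insensitively, so "Shot.PNG" matches "png".
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(MIME_BY_EXTENSION.get(extension));
    }
}
```

For example, `mimeTypeFor("Screenshot.PNG")` yields `image/png`, while `mimeTypeFor("notes.txt")` is empty.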

### Image Size Considerations

When using images with LLMs, consider:

- **Resolution**: Higher-resolution images provide more detail but use more tokens
- **File size**: Larger files may take longer to process
- **Cropping**: Crop images to focus on the relevant areas
- **Compression**: Some compression is applied automatically
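The sketch below illustrates the resolution trade-off. It shrinks an image so its longest edge fits a chosen limit while keeping the aspect ratio, using only `java.awt.image`. The class name and the 500-pixel limit in the usage example are arbitrary assumptions. Providers publish their own size limits, and this is not the compression DevoxxGenie applies.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Sketch: shrink an image so its longest edge is at most maxEdge pixels, keeping the aspect ratio. */
final class ImageDownscaler {
    private ImageDownscaler() {}

    static BufferedImage downscale(BufferedImage source, int maxEdge) {
        if (maxEdge < 1) {
            throw new IllegalArgumentException("maxEdge must be positive");
        }
        int width = source.getWidth();
        int height = source.getHeight();
        int longest = Math.max(width, height);
        if (longest <= maxEdge) {
            return source; // already within the limit: return the same instance untouched
        }
        double scale = (double) maxEdge / longest;
        int newWidth = Math.max(1, (int) Math.round(width * scale));
        int newHeight = Math.max(1, (int) Math.round(height * scale));

        // TYPE_INT_ARGB keeps any transparency from PNG or GIF sources.
        BufferedImage target = new BufferedImage(newWidth, newHeight, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = target.createGraphics();
        try {
            // Bilinear interpolation usually keeps text more legible than nearest-neighbour sampling.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, newWidth, newHeight, null);
        } finally {
            g.dispose();
        }
        return target;
    }
}
```

For example, `ImageDownscaler.downscale(image, 500)` turns a 2000x1000 screenshot into a 500x250 one and returns smaller images unchanged.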

## Use Cases for Images

### Code Screenshots

Use image drag and drop to:

- Share code from outside your project
- Show code from other applications or websites
- Ask about code that isn't text-selectable

Example prompts:

- "What does this code do? How can I improve it?"
- "Is there a bug in this code? How would you fix it?"
- "How would you refactor this to be more efficient?"

### UI and Design

Get feedback on user interfaces:

- Drag in screenshots of your application UI
- Ask for design recommendations
- Identify UI/UX issues

Example prompts:

- "How can this UI be improved for better usability?"
- "Does this design follow best practices?"
- "How would you implement this UI in Java Swing?"

### Diagrams and Architecture

Discuss system architecture and diagrams:

- Include architecture diagrams
- Show database schemas
- Share UML diagrams

Example prompts:

- "Explain what this architecture diagram represents"
- "How would you implement this class diagram in Java?"
- "Suggest improvements to this database schema"

### Error Messages and Logs

Get help with errors:

- Share screenshots of error messages
- Include log output
- Show stack traces

Example prompts:

- "What's causing this error and how can I fix it?"
- "Help me understand this stack trace"
- "How should I debug this issue?"

## Combining Images with Code

For the best results, combine images with text and code:

1. Drag and drop an image
2. Include relevant code snippets in your prompt
3. Ask specific questions that relate the image to the code

Example of a combined prompt:

Here's my current implementation:

[Image of UI]

And here's the related code:

```java
public void createUI() {
    JPanel panel = new JPanel();
    // ...
}
```

How can I improve this to better follow Material Design guidelines?

## Multimodal Model Support

Different LLM providers have varying levels of image support:

### Cloud Providers

- **OpenAI**: GPT-4V and GPT-4o support images
- **Anthropic**: Claude 3 Opus, Sonnet, and Haiku support images
- **Google**: Most Gemini models support images

### Local Providers

- **Ollama**: Supports image input with models like LLaVA
- **Other local providers**: Image support varies by implementation
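The sketch below makes "varying levels of image support" concrete. It builds the same question and image as a single user message in two wire formats:

- OpenAI's Chat Completions API takes a `content` array with an `image_url` part that holds a Base64 data URL.
- Ollama's `/api/chat` endpoint takes raw Base64 strings in an `images` array.

The class and method names are hypothetical. The JSON is assembled by hand only to keep the example dependency-free. DevoxxGenie's own provider integrations may build these requests differently.

```java
import java.util.Base64;

/** Sketch: one question plus one image expressed as a user message for two provider formats. */
final class ProviderPayloads {
    private ProviderPayloads() {}

    /** Minimal JSON string escaping: quotes, backslashes, and control characters. */
    static String json(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> out.append("\\\"");
                case '\\' -> out.append("\\\\");
                case '\n' -> out.append("\\n");
                case '\r' -> out.append("\\r");
                case '\t' -> out.append("\\t");
                default -> {
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
                }
            }
        }
        return out.append('"').toString();
    }

    /** OpenAI Chat Completions style: a text part plus an image_url part carrying a data URL. */
    static String openAiMessage(String prompt, byte[] image, String mimeType) {
        String dataUrl = "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(image);
        return "{\"role\":\"user\",\"content\":["
                + "{\"type\":\"text\",\"text\":" + json(prompt) + "},"
                + "{\"type\":\"image_url\",\"image_url\":{\"url\":" + json(dataUrl) + "}}]}";
    }

    /** Ollama /api/chat style: plain string content plus raw Base64 images (no data-URL prefix). */
    static String ollamaMessage(String prompt, byte[] image) {
        String base64 = Base64.getEncoder().encodeToString(image);
        return "{\"role\":\"user\",\"content\":" + json(prompt)
                + ",\"images\":[" + json(base64) + "]}";
    }
}
```

Either message goes into the `messages` array of a request body that also names the `model` (for example `llava` for Ollama). Ollama streams its response by default unless the request sets `"stream": false`.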

## Image Privacy Considerations

When using images with LLMs, consider:

1. **Cloud processing**: Images sent to cloud providers are processed on their servers
2. **Data retention**: Check provider policies regarding image data retention
3. **Sensitive information**: Avoid sharing images with sensitive or confidential information
4. **Local alternatives**: For maximum privacy, use local multimodal models like LLaVA through Ollama
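Sometimes a screenshot contains something sensitive, but you still want help with the rest of it. One option is to black out those regions before dragging the image in. The sketch below (hypothetical class name, JDK classes only) paints opaque boxes on a copy of the image. The original pixels under each box are overwritten rather than blurred.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.List;

/** Sketch: cover sensitive regions with solid boxes on a copy of a screenshot. */
final class ScreenshotRedactor {
    private ScreenshotRedactor() {}

    static BufferedImage redact(BufferedImage source, List<Rectangle> regions) {
        BufferedImage copy = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = copy.createGraphics();
        try {
            g.drawImage(source, 0, 0, null); // start from a copy; the source is never modified
            g.setColor(Color.BLACK);
            for (Rectangle region : regions) {
                g.fill(region); // opaque fill replaces the pixels underneath
            }
        } finally {
            g.dispose();
        }
        return copy;
    }
}
```

Save the result with `ImageIO.write(redacted, "png", file)` and drop that file instead of the original. The copy is `TYPE_INT_RGB`, so any transparency in the source is flattened.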

## Best Practices

For the best results with image drag and drop:

1. **Be specific**: Ask clear questions about the image
2. **Crop appropriately**: Include only the relevant parts of the image
3. **Provide context**: Explain what the image shows and what you're looking for
4. **Use high contrast**: Ensure text in images is easily readable
5. **Combine with code**: When relevant, include both images and code snippets
6. **Test different models**: Different multimodal models have varying capabilities with images

## Troubleshooting

If you encounter issues with image drag and drop:

### Image Not Displaying in Input

- Verify the image format is supported
- Check that the file isn't too large
- Try copying and pasting instead of dragging and dropping
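The first two checks above can be scripted. The sketch below rejects files over a size limit and then asks the JDK's `ImageIO` whether it can decode the file. The 5 MiB limit and the class name are assumptions for illustration, not DevoxxGenie's actual limits. The JDK ships no WebP reader, so this check reports WebP files as undecodable even though WebP is listed as a supported format.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.imageio.ImageIO;

/** Sketch of a pre-flight check before attaching an image. */
final class ImagePreflight {
    /** Assumed limit for illustration only (5 MiB); real limits depend on the tool and provider. */
    static final long MAX_BYTES = 5L * 1024 * 1024;

    private ImagePreflight() {}

    /** Returns "ok: WIDTHxHEIGHT" or a short description of the first problem found. */
    static String check(Path file) throws IOException {
        if (!Files.isRegularFile(file)) {
            return "not a regular file: " + file;
        }
        long size = Files.size(file);
        if (size > MAX_BYTES) {
            return "too large: " + size + " bytes (limit " + MAX_BYTES + ")";
        }
        // ImageIO.read returns null when no installed reader recognizes the format.
        BufferedImage image = ImageIO.read(file.toFile());
        if (image == null) {
            return "no ImageIO reader for this format";
        }
        return "ok: " + image.getWidth() + "x" + image.getHeight();
    }
}
```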

### Model Not Responding to Image

- Confirm you're using a multimodal-capable model
- Try a different provider or model
- Ensure your provider API key has access to multimodal models

### Poor Response Quality

- Improve image clarity and contrast
- Be more specific in your prompt
- Try breaking complex images into simpler ones
- Add more textual context about what you're asking
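One simple way to break a complex image into simpler ones is to cut it into a grid. The sketch below (hypothetical class name) splits an image into `rows` x `cols` tiles with `BufferedImage.getSubimage`. Integer division spreads any remainder across the tiles, so together they cover the whole image exactly.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

/** Sketch: split an image into a grid of tiles, row by row from the top-left. */
final class ImageTiler {
    private ImageTiler() {}

    static List<BufferedImage> tiles(BufferedImage source, int rows, int cols) {
        int width = source.getWidth();
        int height = source.getHeight();
        // Guarding rows <= height and cols <= width ensures every tile is at least 1 pixel.
        if (rows < 1 || cols < 1 || rows > height || cols > width) {
            throw new IllegalArgumentException("grid does not fit the image");
        }
        List<BufferedImage> result = new ArrayList<>(rows * cols);
        for (int r = 0; r < rows; r++) {
            int y0 = r * height / rows;
            int y1 = (r + 1) * height / rows;
            for (int c = 0; c < cols; c++) {
                int x0 = c * width / cols;
                int x1 = (c + 1) * width / cols;
                // getSubimage shares pixel data with the source instead of copying it.
                result.add(source.getSubimage(x0, y0, x1 - x0, y1 - y0));
            }
        }
        return result;
    }
}
```

Multi-image prompts are listed under Future Enhancements below, so you may need to send the tiles one prompt at a time, for example by asking about the top-left region first.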

## Future Enhancements

The DevoxxGenie team is working on several enhancements to the image feature:

1. **Support for more image formats**
2. **Better image compression and optimization**
3. **Image annotation tools**
4. **Multi-image support in a single prompt**

Stay updated with the latest releases to access these improvements as they become available.
