Model Card for PaliGemma Fine-Tuned Model
This model is a fine-tuned version of Google’s PaliGemma-3B for vision-language tasks, in particular image-based question answering and multimodal reasoning. It was fine-tuned with parameter-efficient fine-tuning (PEFT) methods such as LoRA and QLoRA to reduce computational and memory costs while maintaining strong task performance.
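For reference, below is a minimal sketch of what a LoRA/QLoRA setup on the base model can look like with the `peft` and `bitsandbytes` libraries. The rank, alpha, dropout, and target modules shown are illustrative assumptions, not the exact values used to train this checkpoint.

```python
import torch
from transformers import PaliGemmaForConditionalGeneration, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit (QLoRA-style) quantization of the frozen base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = PaliGemmaForConditionalGeneration.from_pretrained(
    "google/paligemma-3b-pt-224",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters on the attention projections (hypothetical choice of modules and hyperparameters)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```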
Model Details
Model Description
- Developed by: Taha Majlesi
- Funded by: [More Information Needed]
- Model Type: Vision-Language Model (VLM)
- Language(s): English
- License: MIT
- Finetuned from model: google/paligemma-3b-pt-224
Model Sources
- Repository: [More Information Needed]
- Paper (if available): [More Information Needed]
- Demo: [More Information Needed]
Uses
Direct Use
- Visual Question Answering (VQA)
- Multimodal reasoning on image-text pairs
- Image captioning with contextual understanding
Downstream Use
- Custom fine-tuning for domain-specific multimodal datasets
- Integration into AI assistants for visual understanding
- Enhancements in image-text search systems
Out-of-Scope Use
- This model is not designed for pure NLP tasks without visual inputs.
- The model may not perform well on low-resource languages.
- Not intended for real-time inference on edge devices due to model size constraints.
Bias, Risks, and Limitations
- Bias: The model may reflect biases present in the training data, especially in image-text relationships.
- Limitations: Performance may degrade on unseen, highly abstract, or domain-specific images.
- Risks: Misinterpretation of ambiguous images and hallucination of non-existent details.
Recommendations
- Use dataset-specific fine-tuning to mitigate biases.
- Evaluate performance on diverse benchmarks before deployment.
- Implement human-in-the-loop validation in sensitive applications.
How to Get Started with the Model
To use the fine-tuned model, install the required libraries:

```bash
pip install transformers peft accelerate bitsandbytes
```
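The snippet below is a minimal inference sketch. It assumes the LoRA adapter is published as `tahamajs/plamma` and is applied on top of `google/paligemma-3b-pt-224`; the adapter id and the image path are assumptions you should replace with your own.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from peft import PeftModel

base_id = "google/paligemma-3b-pt-224"
adapter_id = "tahamajs/plamma"  # assumed adapter repo id; adjust if needed

processor = AutoProcessor.from_pretrained(base_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA weights
model.eval()

image = Image.open("example.jpg")  # replace with your image
# PaliGemma-style task prefix: "answer en <question>" for VQA, "caption en" for captioning
prompt = "answer en What is shown in the image?"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=50)

# Strip the prompt tokens and decode only the generated answer
answer = processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(answer)
```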