Model Card for PaliGemma Fine-Tuned Model

This model is a fine-tuned version of Google’s PaliGemma-3B (google/paligemma-3b-pt-224) for vision-language tasks, particularly image-based question answering and multimodal reasoning. It was fine-tuned with Parameter-Efficient Fine-Tuning (PEFT) methods, such as LoRA and QLoRA, which reduce the compute and memory cost of fine-tuning while preserving the base model’s performance.
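
For reference, the snippet below is a minimal sketch of how a QLoRA setup of this kind is typically assembled with transformers and peft. The rank, alpha, and target modules shown are illustrative assumptions, not the exact hyperparameters used to train this checkpoint.

# Illustrative QLoRA setup (hyperparameters are assumptions, not this checkpoint's values)
import torch
from transformers import PaliGemmaForConditionalGeneration, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit (QLoRA)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = PaliGemmaForConditionalGeneration.from_pretrained(
    "google/paligemma-3b-pt-224",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=8,                                    # illustrative rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()          # only the LoRA adapter weights are trainable

Only the low-rank adapter weights are updated during training, which is what keeps the memory footprint small compared with full fine-tuning of the 3B-parameter base model.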

Model Details

Model Description

  • Developed by: Taha Majlesi
  • Funded by: [More Information Needed]
  • Model Type: Vision-Language Model (VLM)
  • Language(s): English
  • License: MIT
  • Finetuned from model: google/paligemma-3b-pt-224

Model Sources

  • Repository: https://huggingface.co/tahamajs/plamma
  • Paper (if available): [More Information Needed]
  • Demo: [More Information Needed]

Uses

Direct Use

  • Visual Question Answering (VQA)
  • Multimodal reasoning on image-text pairs
  • Image captioning with contextual understanding

Downstream Use

  • Custom fine-tuning for domain-specific multimodal datasets
  • Integration into AI assistants for visual understanding
  • Enhancements in image-text search systems

Out-of-Scope Use

  • This model is not designed for pure NLP tasks without visual inputs.
  • The model may not perform well on low-resource languages.
  • Not intended for real-time inference on edge devices due to model size constraints.

Bias, Risks, and Limitations

  • Bias: The model may reflect biases present in the training data, especially in image-text relationships.
  • Limitations: Performance may degrade on unseen, highly abstract, or domain-specific images.
  • Risks: Misinterpretation of ambiguous images and hallucination of non-existent details.

Recommendations

  • Use dataset-specific fine-tuning to mitigate biases.
  • Evaluate performance on diverse benchmarks before deployment.
  • Implement human-in-the-loop validation in sensitive applications.

How to Get Started with the Model

To use the fine-tuned model, install the required libraries:

pip install transformers peft accelerate bitsandbytes
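
Once the dependencies are installed, the adapter can be loaded on top of the base model. The following is a minimal usage sketch: it assumes the adapter weights for this model are published as tahamajs/plamma, and the image path and question are placeholders.

# Minimal inference sketch (adapter id, image path, and prompt are placeholders/assumptions)
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from peft import PeftModel

base_id = "google/paligemma-3b-pt-224"
adapter_id = "tahamajs/plamma"              # assumed adapter repository for this model

processor = AutoProcessor.from_pretrained(base_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)   # attach the fine-tuned LoRA adapter
model.eval()

# Visual question answering on a local image (placeholder path)
image = Image.open("example.jpg")
prompt = "answer en What is shown in the image?"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
input_len = inputs["input_ids"].shape[-1]

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(processor.decode(output[0][input_len:], skip_special_tokens=True))

If the adapter has been merged into the base weights instead of being kept as a separate PEFT adapter, it can be loaded directly with PaliGemmaForConditionalGeneration.from_pretrained and the PeftModel step can be skipped.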