
language:
  - en
  - zh
tags:
  - paligemma
  - multitask
  - object-detection
  - text-generation
  - visual-question-answering
datasets:
  - custom
license: mit

model-index:
  - name: Extremely4606/paligemma24_12_30
    results:
      - task:
          type: object-detection
        dataset:
          type: custom
          name: Defect Detection
        metrics:
          - name: mAP
            type: mean_average_precision
            value: 0.85  # replace with the actual evaluation value
      - task:
          type: text-generation
        dataset:
          type: custom
          name: Text Description
        metrics:
          - name: BLEU
            type: bleu
            value: 0.78  # replace with the actual evaluation value
      - task:
          type: visual-question-answering
        dataset:
          type: custom
          name: Visual QA
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.90  # replace with the actual evaluation value

PaliGemma Multitask Model

This is a multitask model based on PaliGemma that can perform:

  1. Object Detection (defect detection)
  2. Text Generation (defect description)
  3. Visual Question Answering
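
As a rough illustration of how one model can serve all three tasks, the sketch below builds a task prefix for each of them. It assumes the fine-tuned model kept the prompt conventions of the PaliGemma "mix" checkpoints ("detect ...", "describe en", "answer en ..."). This card does not state which prompts were used in training, so treat the strings and the helper name build_prompt as hypothetical.

```python
def build_prompt(task: str, target: str = "", lang: str = "en") -> str:
    """Return a PaliGemma mix-style text prompt for one of the three tasks.

    Assumption: the prompt prefixes below follow the base "mix" checkpoint
    conventions; the fine-tuned model may expect different wording.
    """
    if task == "detect":
        # target is the object class to localize, e.g. "defect"
        return f"detect {target}"
    if task == "describe":
        # free-form description of the image in the given language
        return f"describe {lang}"
    if task == "vqa":
        # target is the question to answer about the image
        return f"answer {lang} {target}"
    raise ValueError(f"unknown task: {task}")


print(build_prompt("detect", "defect"))
print(build_prompt("describe"))
print(build_prompt("vqa", "Is the surface scratched?"))
```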

Model Description

The model is fine-tuned from google/paligemma-3b-mix-224 using LoRA (low-rank adaptation).
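
To make the LoRA idea concrete, here is a minimal pure-Python sketch of the weight update it trains: the frozen weight W is left untouched and a low-rank product B·A, scaled by alpha / r, is added on top. The dimensions, rank, and alpha below are illustrative only; the values actually used for this model are not given in the card.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha, r):
    """Return the effective weight W + (alpha / r) * (B @ A)."""
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Illustrative sizes (assumptions, not this model's real configuration).
d_out, d_in, r, alpha = 8, 8, 2, 4
random.seed(0)

w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable, r x d_in
b = [[0.0] * r for _ in range(d_out)]                                  # trainable, d_out x r

# B starts at zero, so the adapted weight initially equals the frozen one.
merged = lora_weight(w, a, b, alpha, r)

full_params = d_out * d_in
lora_params = r * (d_in + d_out)
print(f"trainable parameters: {lora_params} (LoRA) vs {full_params} (full fine-tuning)")
```

Only A and B receive gradients during fine-tuning, which is why the adapter is much smaller than the base model once the layer dimensions are large relative to r.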

Usage

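import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

# Load model and processor
processor = AutoProcessor.from_pretrained("Extremely4606/paligemma24_12_30")
model = AutoModelForVision2Seq.from_pretrained("Extremely4606/paligemma24_12_30")
model.eval()

# Process image and text (placeholder path and prompt; replace with your own)
image = Image.open("example.jpg").convert("RGB")
text = "detect defect"
inputs = processor(images=image, text=text, return_tensors="pt")

# Get predictions: generate tokens, then decode only the newly generated part
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
prompt_len = inputs["input_ids"].shape[-1]
print(processor.decode(output_ids[0][prompt_len:], skip_special_tokens=True))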
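
For the detection task, the base PaliGemma checkpoints encode each box as four `<locNNNN>` tokens (y_min, x_min, y_max, x_max, each binned to 0..1023 and normalized by 1024) followed by a label, with multiple detections separated by ";". The sketch below parses such a string into pixel coordinates. It assumes this fine-tuned model keeps that output format, which the card does not confirm; parse_detections and the sample string are made up for illustration.

```python
import re

# Four location tokens followed by a label that runs up to the next ";" or "<".
_DETECTION = re.compile(
    r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)"
)


def parse_detections(text, width, height):
    """Convert PaliGemma-style detection text into labeled pixel boxes.

    Returns a list of dicts with "label" and "box" = (x_min, y_min, x_max, y_max).
    """
    detections = []
    for match in _DETECTION.finditer(text):
        # Token order is y_min, x_min, y_max, x_max, normalized by 1024.
        y0, x0, y1, x1 = (int(match.group(i)) / 1024 for i in range(1, 5))
        detections.append(
            {
                "label": match.group(5).strip(),
                "box": (x0 * width, y0 * height, x1 * width, y1 * height),
            }
        )
    return detections


# Hypothetical model output, not produced by this model.
sample = "<loc0256><loc0128><loc0512><loc0768> scratch ; <loc0000><loc0000><loc1023><loc1023> dent"
for det in parse_detections(sample, width=1024, height=1024):
    print(det["label"], det["box"])
```

Pass the original image width and height so the boxes land in the coordinate system of the input image rather than the 224x224 model input.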

Model size: 2.92B params, stored as Safetensors in FP16.