SpatialVLA Fine-Tuned on fractal & bridge

This model was produced by fine-tuning the SpatialVLA model via LoRA (r=32) on the fractal and bridge datasets. We made a few modifications to the training data to improve final performance (see the SpatialVLA paper for details).
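LoRA keeps each pretrained weight W frozen and learns a low-rank update B·A of rank r. The adapted layer computes (W + (alpha / r)·B·A)·x, so a d_out × d_in layer gains only r·(d_out + d_in) trainable parameters. The plain-Python sketch below illustrates this with r=32, the rank used here. The 2048 hidden size and the alpha values are hypothetical illustration values, not SpatialVLA's actual configuration, and this card does not say which layers were adapted.

```python
# Minimal sketch of a LoRA (low-rank adaptation) update for one linear layer.
# Assumption: layer sizes and alpha are hypothetical; only r=32 comes from this card.


def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters LoRA adds to a frozen d_out x d_in weight.

    B has shape d_out x r and A has shape r x d_in.
    """
    return r * (d_out + d_in)


def matmul(X, Y):
    """Plain-Python product of two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_forward(W, A, B, x, alpha, r):
    """Compute (W + (alpha / r) * B @ A) @ x without materializing the merged weight."""
    base = matmul(W, x)               # frozen pretrained path
    delta = matmul(B, matmul(A, x))   # low-rank trainable path
    scale = alpha / r
    return [[b + scale * d for b, d in zip(brow, drow)]
            for brow, drow in zip(base, delta)]


if __name__ == "__main__":
    d = 2048  # hypothetical hidden size, for illustration only
    r = 32
    full = d * d
    extra = lora_trainable_params(d, d, r)
    # Fraction of a square d x d layer's size that LoRA trains at rank r.
    print(full, extra, extra / full)
```

For the hypothetical 2048 × 2048 layer, this prints 4194304 131072 0.03125: the rank-32 adapter trains about 3% as many parameters as the full weight.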

Usage Instructions

See the SpatialVLA GitHub README for instructions on how to run and evaluate this model on WidowX Robot tasks.

Citation

BibTeX:

@misc{qu2025spatialvlaexploringspatialrepresentations,
      title={SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model}, 
      author={Delin Qu and Haoming Song and Qizhi Chen and Yuanqi Yao and Xinyi Ye and Yan Ding and Zhigang Wang and JiaYuan Gu and Bin Zhao and Dong Wang and Xuelong Li},
      year={2025},
      eprint={2501.15830},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2501.15830}, 
}
Model size: 4.03B params (Safetensors, BF16 tensors)

Hugging Face repository: IPEC-COMMUNITY/spatialvla-4b-mix-224-pt