---
license: mit
---

# Merlin: Vision Language Foundation Model for 3D Computed Tomography

[![pypi](https://img.shields.io/pypi/v/merlin-vlm?style=for-the-badge)](https://pypi.org/project/merlin-vlm/)

Merlin is a 3D vision language model (VLM) for computed tomography that leverages both structured electronic health records (EHR) and unstructured radiology reports for pretraining.

[[💻 GitHub](https://github.com/StanfordMIMI/Merlin)] [[📄 Paper](https://arxiv.org/abs/2406.06512)]

## ⚡️ Installation

To install Merlin, run:

```bash
pip install merlin-vlm
```

For an editable installation, clone this repository and install it from the source directory:

```bash
git clone https://github.com/StanfordMIMI/Merlin.git
cd Merlin
pip install -e .
```

For usage instructions, please visit the GitHub [repository](https://github.com/StanfordMIMI/Merlin).

### 📁 Project Structure:

```
.
├── README.md
├── i3_resnet_clinical_longformer_best_clip_04-02-2024_23-21-36_epoch_99.pt
└── image1.nii.gz
```

## 📎 Citation

If you find this repository useful for your work, please cite the [original paper](https://arxiv.org/abs/2406.06512):

```bibtex
@article{blankemeier2024merlin,
  title={Merlin: A vision language foundation model for 3d computed tomography},
  author={Blankemeier, Louis and Cohen, Joseph Paul and Kumar, Ashwin and Van Veen, Dave and Gardezi, Syed Jamal Safdar and Paschali, Magdalini and Chen, Zhihong and Delbrouck, Jean-Benoit and Reis, Eduardo and Truyts, Cesar and others},
  journal={Research Square},
  pages={rs--3},
  year={2024}
}
```
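
If you download the files listed under Project Structure manually, a quick check that nothing is missing can save a failed run later. The following is a minimal sketch using only the Python standard library; `EXPECTED_FILES` and `missing_files` are hypothetical names chosen for illustration and are not part of the `merlin-vlm` package.

```python
from pathlib import Path

# File names copied from the Project Structure listing in this README.
EXPECTED_FILES = (
    "README.md",
    "i3_resnet_clinical_longformer_best_clip_04-02-2024_23-21-36_epoch_99.pt",
    "image1.nii.gz",
)


def missing_files(root, expected=EXPECTED_FILES):
    """Return the expected file names that are not regular files directly under root.

    Hypothetical helper for illustration only; it checks presence,
    not file integrity or contents.
    """
    root = Path(root)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_files(".")` from the download directory returns an empty list when all three files are present, and otherwise the names that still need to be fetched.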