# Phi-3.5 Mini Instruct Medical Chat (LoRA Adapter)
This is a LoRA adapter for the `microsoft/Phi-3.5-mini-instruct` model, fine-tuned with QLoRA on medical instruction-following datasets. This is NOT a standalone model; you must load it on top of the base model.
## 🔥 How to Use the LoRA Adapter
To use this adapter, you need the base model `microsoft/Phi-3.5-mini-instruct` plus the `transformers`, `peft`, `bitsandbytes`, and `accelerate` libraries. Load the adapter on top of the 4-bit quantized base model with `peft`:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Base model and fine-tuned LoRA checkpoint
base_model_name = "microsoft/Phi-3.5-mini-instruct"
lora_model_path = "syubraj/Phi-3.5-mini-instruct-MedicalChat-QLoRA"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load the base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter and merge it into the base weights.
# Note: do not call .to(device) on a bitsandbytes-quantized model;
# device_map="auto" already places it, so we use model.device below.
model = PeftModel.from_pretrained(base_model, lora_model_path)
model = model.merge_and_unload()
print("Model successfully loaded!")

# Inference function
def generate_response(user_query, system_message=None, max_length=1024):
    if system_message is None:
        system_message = (
            "You are a trusted AI-powered medical assistant. "
            "Analyze patient queries carefully and provide accurate, professional, and empathetic responses. "
            "Prioritize patient safety, adhere to medical best practices, and recommend consulting a healthcare provider when necessary."
        )
    # Build the prompt in the Phi-3.5 chat format
    prompt = f"<|system|>\n{system_message}<|end|>\n<|user|>\n{user_query}<|end|>\n<|assistant|>\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_length=max_length)
    # Decode only the newly generated tokens so the prompt is not echoed back
    generated_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()

if __name__ == "__main__":
    res = generate_response("Hi, how can someone get rid of a fever?")
    print(res)
```
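Alternatively, the base tokenizer ships with a chat template, so you can build the same prompt from structured messages instead of a hand-formatted string. A minimal sketch, reusing the merged `model` and `tokenizer` from above; the sampling parameters are illustrative, not a recommendation:

```python
# Build the prompt via the tokenizer's built-in chat template
messages = [
    {"role": "system", "content": "You are a trusted AI-powered medical assistant."},
    {"role": "user", "content": "Hi, how can someone get rid of a fever?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the tokens generated after the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```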
## 💡 Training Details
- Base Model: `microsoft/Phi-3.5-mini-instruct`
- Fine-Tuned On: medical conversations & instruction-based datasets
- Fine-Tuning Method: QLoRA
- Precision: 4-bit (`bitsandbytes`)
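For reference, a QLoRA fine-tune of this kind is usually wired up as sketched below with `peft` and `trl`. This is not the exact training script for this adapter: the dataset id is a placeholder, and the LoRA rank, target modules, and trainer settings are illustrative assumptions (the `trl` API also varies slightly between versions):

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Same 4-bit NF4 quantization as used at inference time
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA on the attention projections; rank/alpha values are illustrative
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Placeholder dataset id; substitute the medical chat dataset you use
dataset = load_dataset("your-org/medical-chat", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="phi35-medchat-qlora",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
    ),
)
trainer.train()
trainer.save_model("phi35-medchat-qlora")  # saves only the LoRA adapter weights
```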
## 📜 License & Credits
- This adapter is released under the Apache-2.0 license.
- Credits: syubraj for fine-tuning.
## 📖 Citation
If you use this model, please cite:
```bibtex
@misc{syubraj2024phi3.5medical,
  title  = {Phi-3.5 Mini Instruct Medical Chat (LoRA Adapter)},
  author = {syubraj},
  year   = {2024},
  url    = {https://huggingface.co/syubraj/Phi-3.5-mini-instruct-MedicalChat-adapter}
}
```