metadata

license: apache-2.0
pipeline_tag: text-generation
tags:
  - Medical AI
  - Small LM
  - USMLE
  - Chain-of-thought Reasoning
  - Synthetic Data

Meerkat-7B (Version 1.0)

Meerkat-7B-v1.0 is an instruction-tuned medical AI system and the first 7B-parameter model to exceed the 60% passing threshold of the United States Medical Licensing Examination (USMLE). The model was trained on our new synthetic dataset of high-quality chain-of-thought reasoning paths sourced from 18 medical textbooks, along with diverse instruction-following datasets. This training equips the model with the high-level medical reasoning needed to solve complex medical problems. For further details, please refer to our paper.

Quick Start

The input query should always end with "ASSISTANT:" as shown below.

query = "USER: What should I do when I get cold? ASSISTANT:"
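If you build raw prompts by hand rather than through a chat template, a small helper can guarantee the trailing "ASSISTANT:" marker. This is a minimal sketch for single-turn queries only; the function name `build_query` is hypothetical and not part of the model's released code.

```python
def build_query(user_message: str) -> str:
    """Wrap one user message in the "USER: ... ASSISTANT:" format.

    Hypothetical helper for illustration; it covers single-turn
    queries only and always ends the prompt with "ASSISTANT:".
    """
    return f"USER: {user_message.strip()} ASSISTANT:"


query = build_query("What should I do when I get cold?")
print(query)
```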

You can run the model with the tokenizer's apply_chat_template function as follows:

import torch  # Needed only if you enable torch_dtype below.
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # "cuda" or "cpu"
checkpoint = "dmis-lab/meerkat-7b-v1.0"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    # torch_dtype=torch.bfloat16,  # Uncomment to reduce memory use when GPU memory is limited.
)

# Multi-turn dialogue example
messages = [
    {"role": "system", "content": "You are a helpful doctor or healthcare professional. Guide the conversation to provide useful, complete, and scientifically-grounded answers to user questions. You have the option to compose a concise, single-turn conversation if the user's input is comprehensive to provide accurate answers. However, if essential details are missing, you should engage in a multi-turn dialogue, asking follow-up questions to gather a thorough medical history and records.\n\n"},
    {"role": "user", "content": "Hello, doctor. I'm a 69-year-old male currently undergoing chemotherapy for metastatic small cell lung carcinoma. I've been responding well to etoposide and cisplatin, but recently, I've developed multiple \"spots\" all over my body."},
    {"role": "assistant", "content": "Hello, I'm sorry to hear that you're experiencing these new symptoms. Can you please describe the spots in more detail?"},
    {"role": "user", "content": "It started recently, and the spots are all over my body. I also have a rash on my trunk and both upper and lower extremities."}
]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

Prompt Details

To reproduce the results reported in our paper, use the same system messages that were used during model training, as detailed below.

USMLE or Clinical Cases

When solving USMLE-style questions such as MedQA and MedBullets, or dealing with complex clinical cases like the JAMA Clinical Challenge, use the following system message:

messages = [
    {"role": "system", "content": "The following is a multiple-choice question about medical knowledge. Solve this in a step-by-step fashion, starting by summarizing the available information. Output a single option from the given options as the final answer. You are strongly required to follow the specified output format; conclude your response with the phrase \"the answer is ([option_id]) [answer_string]\".\n\n"},
    {"role": "user", "content": "A 67-year-old man with transitional cell carcinoma of the bladder comes to the physician because of a 2-day history of ringing sensation in his ear. He received this first course of neoadjuvant chemotherapy 1 week ago. Pure tone audiometry shows a sensorineural hearing loss of 45 dB. The expected beneficial effect of the drug that caused this patient's symptoms is most likely due to which of the following actions? (A) Inhibition of proteasome (B) Hyperstabilization of microtubules (C) Generation of free radicals (D) Cross-linking of DNA"},
]

The model generates a reasoning path to solve the problem and then gives its predicted answer. Because the response concludes with the phrase "the answer is" followed by the chosen option, the predicted answer is easy to extract and compare with the ground truth.
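The extraction step could look like the sketch below. It assumes the response follows the output format requested in the system message, "the answer is ([option_id]) [answer_string]", with a single-letter option ID in parentheses. The names `ANSWER_PATTERN` and `extract_answer` are hypothetical, and the sample response is an invented string for illustration, not real model output.

```python
import re
from typing import Optional

# Matches the final-answer phrase requested by the system message,
# e.g. "the answer is (D) Cross-linking of DNA". Parentheses are
# required so that ordinary phrases such as "the answer is not ..."
# are not mistaken for an option ID.
ANSWER_PATTERN = re.compile(r"the answer is\s*\(([A-Z])\)", re.IGNORECASE)


def extract_answer(response: str) -> Optional[str]:
    """Return the predicted option letter, or None if no answer phrase is found."""
    matches = ANSWER_PATTERN.findall(response)
    # Use the last match: the reasoning may mention the phrase earlier,
    # but the final answer comes at the end of the response.
    return matches[-1].upper() if matches else None


# Invented response text, used only to exercise the helper.
sample_response = (
    "The drug most likely responsible is a platinum-based agent. "
    "Therefore, the answer is (D) Cross-linking of DNA"
)
predicted = extract_answer(sample_response)
print(predicted == "D")
```

Returning None when the phrase is missing lets an evaluation script count malformed responses separately instead of silently scoring them as wrong.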

Multiple-choice Exams

For other types of multiple-choice exams such as MedMCQA or MMLU, use the following simple system message:

messages = [
    {"role": "system", "content": "Answer the multiple-choice question about medical knowledge.\n\n"},
    {"role": "user", "content": "In a Robertsonian translocation fusion occurs at the: (A) telomeres. (B) centromeres. (C) histones. (D) ends of the long arms."},
]
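Benchmark datasets often store the question and its options in separate fields. The sketch below assembles them into the message format shown above, with options labeled "(A)", "(B)", and so on. The helper name `build_mcq_messages` and the assumption that options arrive as a list of strings are illustrative and not taken from the original evaluation code.

```python
from string import ascii_uppercase
from typing import Dict, List

# System message quoted from the example above.
SYSTEM_MESSAGE = "Answer the multiple-choice question about medical knowledge.\n\n"


def build_mcq_messages(question: str, options: List[str]) -> List[Dict[str, str]]:
    """Build chat messages for one multiple-choice question.

    Hypothetical helper: labels options (A), (B), ... in order and
    appends them to the question on a single line.
    """
    if len(options) > len(ascii_uppercase):
        raise ValueError("Too many options to label with single letters.")
    labeled = " ".join(
        f"({label}) {option}" for label, option in zip(ascii_uppercase, options)
    )
    return [
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": f"{question} {labeled}"},
    ]


messages = build_mcq_messages(
    "In a Robertsonian translocation fusion occurs at the:",
    ["telomeres.", "centromeres.", "histones.", "ends of the long arms."],
)
print(messages[1]["content"])
```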

Other Use Cases

Our model was also trained on the AlpaCare instruction dataset, which comprises 52K examples, to improve its generalization across diverse user prompts. Feel free to design and test your own prompts and to share your thoughts with us, whether the model exceeds expectations or falls short!

Model Architecture

Our model is based on Mistral-7B-v0.1, which we chose for its accuracy and run-time efficiency.

Training Data

We plan to release our training dataset publicly soon.

Contact

Feel free to email [email protected] if you have any questions.