Introduction

This repo contains Physician-Ko-8B, an 8-billion-parameter medical language model. It builds on Llama-3-Physician-8B-Instruct, further fine-tuned on a Korean dataset.

Usage

Approach 1

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "eded0902/Physician-Ko-8B"
tokenizer_name = "YiDuo1999/Llama-3-Physician-8B-Instruct"
device_map = 'auto'

# Load the fine-tuned model; the tokenizer is taken from the base model it was tuned from.
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, use_cache=False, device_map=device_map)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)

# Override the chat template with a ChatML-style template (<|im_start|>/<|im_end|> markers).
tokenizer.chat_template = "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
# Token IDs that should stop generation.
eos_token_id = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>"), tokenizer.convert_tokens_to_ids("<|im_end|>")]
tokenizer.pad_token = tokenizer.eos_token
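# For reference, the template above renders a system + user exchange as:
#   <|im_start|>system
#   ...system message...<|im_end|>
#   <|im_start|>user
#   ...question...<|im_end|>
#   <|im_start|>assistant
# and the model's reply is expected to end with <|im_end|>.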

def askme(question):
    sys_message = ''' 
    You are an AI Medical Assistant trained on a vast dataset of health information. Please be thorough and
    provide an informative answer. If you don't know the answer to a specific medical inquiry, advise seeking professional help.
    '''   
    # Create messages structured for the chat template
    messages = [{"role": "system", "content": sys_message}, {"role": "user", "content": question}]
    
    # Applying chat template
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # Move inputs to the model's device (device_map='auto' decides placement).
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=1000, eos_token_id=eos_token_id, use_cache=True)
    
    # Extract and return the generated text, removing the prompt
    response_text = tokenizer.batch_decode(outputs)[0].strip()
    answer = response_text.split('<|im_start|>assistant')[-1].split('<|im_end|>')[0].strip()
    return answer

# Example usage
# - Context: first describe your problem.
# - Question: then ask your question.
question = '''HIV๊ฐ€ ๋ญ์•ผ?'''  # "What is HIV?"
print(askme(question))
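
The generate call above uses the default decoding settings. If you want more varied answers, the standard sampling arguments can be passed through. The snippet below is a drop-in replacement for the model.generate line inside askme, with illustrative values that are not tuned for this model:

outputs = model.generate(
    **inputs,
    max_new_tokens=1000,
    eos_token_id=eos_token_id,
    use_cache=True,
    do_sample=True,    # sample instead of greedy decoding
    temperature=0.7,   # illustrative value, not tuned for this model
    top_p=0.9,         # nucleus sampling cutoff
)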

A typical answer (the model replies in Korean):

HIV๋Š” Human Immunodeficiency Virus์˜ ์•ฝ์ž๋กœ, ์ธ์ฒด ๋ฉด์—ญ๊ฒฐํ• ๋ฐ”์ด๋Ÿฌ์Šค๋ผ๊ณ ๋„ ๋ถˆ๋ฆฝ๋‹ˆ๋‹ค. ์ด ๋ฐ”์ด๋Ÿฌ์Šค๋Š” ์ธ๊ฐ„์˜ ๋ฉด์—ญ ์ฒด๊ณ„๋ฅผ ์•ฝํ™”์‹œํ‚ค๋Š” ๋ฐ”์ด๋Ÿฌ์Šค๋กœ, ์ธ์ฒด์˜ ๋ฉด์—ญ ์„ธํฌ๋ฅผ ๊ณต๊ฒฉํ•˜์—ฌ ๋ฉด์—ญ๋ ฅ์„ ๊ฐ์†Œ์‹œํ‚ต๋‹ˆ๋‹ค. HIV์— ๊ฐ์—ผ๋˜๋ฉด ์ธ์ฒด์˜ ๋ฉด์—ญ ์ฒด๊ณ„๊ฐ€ ์•ฝํ•ด์ ธ ๋‹ค์–‘ํ•œ ๊ฐ์—ผ์„ฑ ์งˆํ™˜๊ณผ ์ข…์–‘์ด ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. HIV ๊ฐ์—ผ์„ ์˜ˆ๋ฐฉํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์•ˆ์ „ํ•œ ์„ฑ๊ด€๊ณ„ ์œ ์ง€, ํ˜ˆ์•ก ๋ฐ ํ˜ˆ์•ก ์ œ์ œ์˜ ๊ณต์œ ๋ฅผ ํ”ผํ•˜๋Š” ๋“ฑ์˜ ์˜ˆ๋ฐฉ ์กฐ์น˜๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

Approach 2

Using LangChain

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from langchain_core.prompts import PromptTemplate

model_name = "eded0902/Physician-Ko-8B"
tokenizer_name = "YiDuo1999/Llama-3-Physician-8B-Instruct"
device_map = 'auto'

# Load the fine-tuned model; the tokenizer is taken from the base model it was tuned from.
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, use_cache=False, device_map=device_map)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)

# Override the chat template with a ChatML-style template (<|im_start|>/<|im_end|> markers).
tokenizer.chat_template = "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
# Token IDs that should stop generation.
eos_token_id = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>"), tokenizer.convert_tokens_to_ids("<|im_end|>")]
tokenizer.pad_token = tokenizer.eos_token

# Wrap the model in a transformers pipeline and expose it to LangChain.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
)
hf = HuggingFacePipeline(pipeline=pipe)
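
Before wiring the pipeline into a chain, it can help to sanity-check it directly. This optional snippet just renders a prompt with the chat template and calls the raw pipeline:

# Optional sanity check: call the transformers pipeline directly.
test_prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "HIV๊ฐ€ ๋ญ์•ผ?"}], tokenize=False, add_generation_prompt=True
)
print(pipe(test_prompt)[0]["generated_text"])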

sys_message = """ You are an AI Medical Assistant trained on a vast dataset of health information. Please be thorough and
    provide an informative answer. If you don't know the answer to a specific medical inquiry, advise seeking professional help.
    """
question = "HIV๊ฐ€ ๋ญ์•ผ?"
# Create messages structured for the chat template
messages = [{"role": "system", "content": sys_message}, {"role": "user", "content": question}]

# Applying chat template
template = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt = PromptTemplate.from_template(template)

chain = prompt | hf

print(chain.invoke({"question": question})[len(template):].split('<|im_end|>')[0].strip())

A typical answer (again in Korean):

HIV๋Š” ์ธ๊ฐ„ ๋ฉด์—ญ ๊ฒฐํ• ๋ฐ”์ด๋Ÿฌ์Šค(Human Immunodeficiency Virus, HIV)์˜ ์•ฝ์ž์ž…๋‹ˆ๋‹ค. ์ด ๋ฐ”์ด๋Ÿฌ์Šค๋Š” ์ธ์ฒด์˜ ๋ฉด์—ญ ์ฒด๊ณ„๋ฅผ ์•ฝํ™”์‹œ์ผœ ๊ฐ์—ผ์„ ์ผ์œผํ‚ค๋Š” ๋ฐ”์ด๋Ÿฌ์Šค์ž…๋‹ˆ๋‹ค. HIV๋Š” ์ฃผ๋กœ ์„ฑ์  ์ ‘์ด‰, ํ˜ˆ์•ก ์ „ํŒŒ, ํƒœ์•„ ๊ฐ์—ผ ๋“ฑ์„ ํ†ตํ•ด ์ „ํŒŒ๋ฉ๋‹ˆ๋‹ค. HIV์— ๊ฐ์—ผ๋˜๋ฉด ๋ฉด์—ญ ์„ธํฌ๋“ค์ด ํŒŒ๊ดด๋˜์–ด ๋‹ค์–‘ํ•œ ๊ฐ์—ผ์„ฑ ์งˆํ™˜๊ณผ ์ข…์–‘์ด ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. HIV ๊ฐ์—ผ์„ ์˜ˆ๋ฐฉํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์•ˆ์ „ํ•œ ์„ฑํ–‰์œ„์™€ ํ˜ˆ์•ก ๋ฐ ํ˜ˆ์•ก ์ œํ’ˆ์˜ ์•ˆ์ „ํ•œ ์‚ฌ์šฉ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ HIV ๊ฐ์—ผ ์—ฌ๋ถ€๋ฅผ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•ด ์ •๊ธฐ์ ์ธ ๊ฒ€์‚ฌ๋ฅผ ๋ฐ›๋Š” ๊ฒƒ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.