lightblue/Karasu-DPO-7B

@@ -11,9 +11,11 @@ base_model:
 - Qwen/Qwen2.5-7B-Instruct
 ---
-[日本語モデルカード](#japanese)
-[日本語のブログ]()
 # Karasu-DPO-7B
@@ -23,12 +25,14 @@ This model outperforms the base [Qwen/Qwen2.5-7B-Instruct](https://huggingface.c
 |Qwen2.5-7B-Instruct|Karasu-DPO-7B|
 |----|----|
-|50.0|56.6|
-We recommend this model for use as a general conversatio AI.
 # How to use
 <ul>
   <li><b>vLLM</b>
@@ -46,16 +50,14 @@ llm = LLM(
 )
 sampling_params = SamplingParams(
-    temperature=0.5,
     max_tokens=8_000,
-    repetition_penalty=1.1
 )
 prompts = [
-    """学校には1クラスにつき20人の生徒がおり、クラスは合計3つあります。
-学校全体では男子と女子がそれぞれ50%ずついます。
-1つ目のクラスには女子が15人、2つ目のクラスには女子が12人います。
-3つ目のクラスには何人の男子がいますか？"""
 ]
 conversations = [
@@ -66,32 +68,136 @@ outputs = llm.chat(conversations, sampling_params=sampling_params)
 for output in outputs:
     print(output.outputs[0].text)
-<think>
-# まず、学校の総生徒数を算出します。各クラスに20人の生徒があり、クラスは3つあるため、総生徒数は60人です。
-# 次に、学校全体で男子と女子は同じ人数で分布しています。したがって、男子と女子各有30人。
-...
-# したがって、3つ目のクラスの男子数は20 - 3 = 17人です。
-# </think>
-# **解答：**
-# 学校の総生徒数を算出します。
-...
-# **最終的な答え：**
-# \[
-# \boxed{17}
-# \]
 ```
 </details>
 <br/>
 <h1 style="font-size: 48px;" id="japanese">日本語</h1>
 ### Model Details
 - Model size: 7B
@@ -117,4 +223,16 @@ for output in outputs:
 |70|0.462300|	0.506989|
 |80|0.419600|	0.509142|
 |90|0.445200|	0.510396|
-|100|0.424400|	0.511653|

 - Qwen/Qwen2.5-7B-Instruct
 ---
+[日本語モデルカード/Japanese model card](#japanese)
+[日本語のブログ/Full Japanese dev blog]()
+[Development source code/開発ソースコード]()
 # Karasu-DPO-7B
 |Qwen2.5-7B-Instruct|Karasu-DPO-7B|
 |----|----|
+|50.0|66.2|
+We recommend this model for use as a general conversation AI.
 # How to use
+This model can be used in the same way as any Qwen 2.5 model. We recommend using vLLM for simplicity and speed.
 <ul>
   <li><b>vLLM</b>
 )
 sampling_params = SamplingParams(
+    temperature=0.0,
     max_tokens=8_000,
 )
 prompts = [
+    """ナイジェリアの首都はどこですか？""",
+    """鉄は何度に溶けますか？""",
+    """父が好きそうなプレゼントのおすすめを教えて""",
 ]
 conversations = [
 for output in outputs:
     print(output.outputs[0].text)
+    print("-"*32)
+# ナイジェリアの首都はアブジャ（Abuja）です。以前はラゴスが首都でしたが、1991年に新しい首都としてアブジャが建設され、1991年12月12日に首都としての地位を正式に取得しました。アブジャは政治中心地として機能していますが、経済の中心地は依然としてラゴスが占めています。
+# --------------------------------
+# 鉄は非常に高い温度で溶けます。鉄の融点は約1,538℃（2,800°F）です。これは、一般的な家庭用のオーブン（最大約200-300℃）では絶対に達成できません。鉄を溶かすためには、より高温の設備が必要で、例えば、電気炉やガス炉などがあります。
+# --------------------------------
+# もちろんです。父さんへのプレゼント選びは楽しみですね。以下に、父が喜ぶ2つのプレゼントを提案します：
+# 1. **高級コーヒーメーカー**：
+#    - 父さんがコーヒーを愛飲しているなら、高品質なコーヒーメーカーは大変喜ばれるプレゼントです。例えば、手動式のコーヒーメーカーなら、毎日のコーヒー作りがより楽しく、手作り感も楽しめます。また、自動式のコーヒーメーカーなら、忙しい朝でも美味しいコーヒーが楽しめます。
+# 2. **趣味に合わせたギフトセット**：
+#    - 父さんの趣味や興味に合わせたギフトセットは、とても喜ばれます。例えば、ゴルフ好きなら、最新のゴルフクラブやゴルフバッグ、ゴルフボールセットなどが良いでしょう。また、車好きなら、高品質な車用アクセサリー（カーフィルム、カーボンシートなど）や車載用の充電器などが喜ばれます。
+# これらのプレゼントは、父さんの趣味や興味に合わせて選べば、きっと喜んでもらえることでしょう。
+# --------------------------------
 ```
 </details>
 <br/>
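The vLLM example sends each prompt as a single-turn user message. As a minimal sketch, assuming you also want an optional system prompt, the conversation list could be built with a small helper like the hypothetical `build_conversations` below. The helper name and the system prompt text are illustrative and not part of the original card; the message format is the same role/content dictionaries that the example passes to `llm.chat`.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def build_conversations(
    prompts: List[str],
    system_prompt: Optional[str] = None,
) -> List[List[Message]]:
    """Wrap each prompt in its own single-turn conversation.

    Hypothetical helper (not from the model card): when system_prompt is
    given, it is prepended to every conversation as a "system" message.
    """
    conversations = []
    for prompt in prompts:
        messages: List[Message] = []
        if system_prompt is not None:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": prompt})
        conversations.append(messages)
    return conversations


conversations = build_conversations(
    ["ナイジェリアの首都はどこですか？"],
    system_prompt="あなたは親切なアシスタントです。",  # illustrative system prompt
)
# The result can be passed to llm.chat(conversations, sampling_params=sampling_params)
```

Without a `system_prompt`, the helper produces the same structure as the list comprehension in the vLLM example, so it can be dropped in without changing behaviour.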
+# How this model was made
+We made this model through the following procedure:
+1. Sample Japanese and English prompts from the following datasets:
+   * lmsys/lmsys-chat-1m
+   * RyokoAI/ShareGPT52K
+   * openchat/openchat_sharegpt_v3
+   * OpenAssistant/oasst2
+   * Open-Orca/slimorca-deduped-cleaned-corrected
+   * HuggingFaceH4/ultrachat_200k
+2. Translate English prompts to Japanese using [gpt-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/).
+3. Correct translations with [gpt-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/).
+4. Get responses to all Japanese prompts (both original and translated) with [gpt-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/).
+5. Correct responses using [gpt-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/).
+We then trained [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on this data using DPO with QLoRA to create Karasu-DPO-7B.
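The five data steps can be sketched as a small pipeline. This is an illustration under stated assumptions, not the authors' code: the four callables stand in for the gpt-4o-mini calls, the script check in `contains_japanese` is a simple heuristic for routing prompts, and the card does not say how chosen and rejected responses were selected, so pairing the corrected response (chosen) with the uncorrected one (rejected) is an assumption. The `prompt`/`chosen`/`rejected` keys follow the naming commonly used by preference-tuning libraries such as TRL.

```python
from typing import Callable, Dict, List


def contains_japanese(text: str) -> bool:
    """Heuristic: True if the text has hiragana, katakana, or CJK ideographs.

    Note that Chinese text also matches the ideograph range.
    """
    return any(
        "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff"
        for ch in text
    )


def build_preference_rows(
    prompts: List[str],
    translate: Callable[[str], str],             # step 2 (hypothetical stand-in)
    correct_translation: Callable[[str], str],   # step 3 (hypothetical stand-in)
    respond: Callable[[str], str],               # step 4 (hypothetical stand-in)
    correct_response: Callable[[str, str], str], # step 5 (hypothetical stand-in)
) -> List[Dict[str, str]]:
    """Turn sampled prompts into prompt/chosen/rejected rows."""
    rows = []
    for prompt in prompts:
        # Steps 2-3: only prompts without Japanese script get translated.
        if not contains_japanese(prompt):
            prompt = correct_translation(translate(prompt))
        # Steps 4-5: draft a response, then correct it.
        draft = respond(prompt)
        fixed = correct_response(prompt, draft)
        # Assumption: corrected response is preferred over the draft.
        rows.append({"prompt": prompt, "chosen": fixed, "rejected": draft})
    return rows
```

Passing the model calls in as functions keeps the sketch runnable with stubs, so the data flow can be checked without any API access.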
 <h1 style="font-size: 48px;" id="japanese">日本語</h1>
+こちらのモデルは[Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)の日本語版です。生成した日本語会話データとDPO学習で作成しました。
+このモデルは、[arena-hard-auto-multilingual](https://github.com/lightblue-tech/arena-hard-auto-multilingual)チャットベンチマークにおいて、ベースモデルである[Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)を上回る性能を発揮します：
+|Qwen2.5-7B-Instruct|Karasu-DPO-7B|
+|----|----|
+|50.0|66.2|
+このモデルは、一般的な会話AIとしての使用を推奨します。
+# 使用方法
+このモデルは、他のQwen 2.5モデルと同様の方法で使用できます。シンプルで高速な操作のためにはvLLMの使用を推奨します。
+<ul>
+  <li><b>vLLM</b>
+[vLLM](https://github.com/vllm-project/vllm/)を`pip install vllm`でインストールしてください。
+<details open>
+  <summary>vLLMコードを見る</summary>
+```python
+from vllm import LLM, SamplingParams
+llm = LLM(
+    model="lightblue/Karasu-DPO-7B",
+    max_model_len=8_000
+)
+sampling_params = SamplingParams(
+    temperature=0.0,
+    max_tokens=8_000,
+)
+prompts = [
+    """ナイジェリアの首都はどこですか？""",
+    """鉄は何度に溶けますか？""",
+    """父が好きそうなプレゼントのおすすめを教えて""",
+]
+conversations = [
+    [{"role": "user", "content": x}] for x in prompts
+]
+outputs = llm.chat(conversations, sampling_params=sampling_params)
+for output in outputs:
+    print(output.outputs[0].text)
+    print("-"*32)
+# ナイジェリアの首都はアブジャ（Abuja）です。以前はラゴスが首都でしたが、1991年に新しい首都としてアブジャが建設され、1991年12月12日に首都としての地位を正式に取得しました。アブジャは政治中心地として機能していますが、経済の中心地は依然としてラゴスが占めています。
+# --------------------------------
+# 鉄は非常に高い温度で溶けます。鉄の融点は約1,538℃（2,800°F）です。これは、一般的な家庭用のオーブン（最大約200-300℃）では絶対に達成できません。鉄を溶かすためには、より高温の設備が必要で、例えば、電気炉やガス炉などがあります。
+# --------------------------------
+# もちろんです。父さんへのプレゼント選びは楽しみですね。以下に、父が喜ぶ2つのプレゼントを提案します：
+# 1. **高級コーヒーメーカー**：
+#    - 父さんがコーヒーを愛飲しているなら、高品質なコーヒーメーカーは大変喜ばれるプレゼントです。例えば、手動式のコーヒーメーカーなら、毎日のコーヒー作りがより楽しく、手作り感も楽しめます。また、自動式のコーヒーメーカーなら、忙しい朝でも美味しいコーヒーが楽しめます。
+# 2. **趣味に合わせたギフトセット**：
+#    - 父さんの趣味や興味に合わせたギフトセットは、とても喜ばれます。例えば、ゴルフ好きなら、最新のゴルフクラブやゴルフバッグ、ゴルフボールセットなどが良いでしょう。また、車好きなら、高品質な車用アクセサリー（カーフィルム、カーボンシートなど）や車載用の充電器などが喜ばれます。
+# これらのプレゼントは、父さんの趣味や興味に合わせて選べば、きっと喜んでもらえることでしょう。
+# --------------------------------
+```
+</details>
+<br/>
+# このモデルの作成方法
+このモデルは以下の手順を通して作成されました：
+1. 以下のデータセットから日本語および英語のプロンプトをサンプリング：
+   * lmsys/lmsys-chat-1m
+   * RyokoAI/ShareGPT52K
+   * openchat/openchat_sharegpt_v3
+   * OpenAssistant/oasst2
+   * Open-Orca/slimorca-deduped-cleaned-corrected
+   * HuggingFaceH4/ultrachat_200k
+2. 英語のプロンプトを[gpt-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/)を使って日本語に翻訳。
+3. [gpt-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/)を使って翻訳を修正。
+4. 日本語のプロンプト（オリジナルと翻訳の両方）に対する応答を[gpt-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/)で取得。
+5. [gpt-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/)を使用して応答を修正。
+[Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)モデルを基に、QLoRA DPOトレーニングを行い、Karasu-DPO-7Bを作成しました。
 ### Model Details
 - Model size: 7B
 |70|0.462300|	0.506989|
 |80|0.419600|	0.509142|
 |90|0.445200|	0.510396|
+|100|0.424400|	0.511653|
+# License
+We share this model under the Apache 2.0 license.
+# Developed by
+<a href="https://www.lightblue-tech.com">
+<img src="https://www.lightblue-tech.com/wp-content/uploads/2023/08/color_%E6%A8%AA%E5%9E%8B-1536x469.png" alt="Lightblue technology logo" width="400"/>
+</a>
+This model was trained by Jun Sashihara ([junsashihara](https://huggingface.co/junsashihara)) and supervised by Peter Devine ([ptrdvn](https://huggingface.co/ptrdvn)) for Lightblue.