Update README.md
README.md
CHANGED
@@ -4,10 +4,16 @@ license: llama3.2
 base_model: macadeliccc/magistrate-3.2-3b-base
 tags:
 - generated_from_trainer
-
-
-
+- spectrum
+- pytorch
+- llama-3
+- axolotl
 ---
+# magistrate-3.2-3b-it
+
+This model is a fine-tuned version of [macadeliccc/magistrate-3.2-3b-base](https://huggingface.co/macadeliccc/magistrate-3.2-3b-base) on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.8067
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
@@ -687,26 +693,29 @@ tokens:
 
 </details><br>
 
-# outputs/magistrate-3.2-3b
-
-This model is a fine-tuned version of [macadeliccc/magistrate-3.2-3b-base](https://huggingface.co/macadeliccc/magistrate-3.2-3b-base) on the None dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.8067
 
 ## Model description
 
-
+Magistrate-3.2-3b-it is a legal assistant specializing in US Supreme Court case law and US Federal regulations.
+
+The base model was pretrained on ~250M tokens containing no synthetic legal data; the instruct model does contain synthetic data.
 
 ## Intended uses & limitations
 
-
+This model is intended for research purposes and for continued development of the legal specialty. You are liable for all model outputs.
 
 ## Training and evaluation data
 
-
+This model was trained on a variety of standard open-source datasets such as OpenHermes-2.5, hermes-function-calling, and select entries from the Tome.
+Additionally, I have included a comprehensive, non-synthetic argument dataset. This is a work in progress but has shown promising results so far.
 
 ## Training procedure
 
+Spectrum top-35% fine-tune for both the pretraining and SFT stages. Thanks to the Cognitive Computations team for their work on Spectrum.
+
+- Pretraining methodology based on Cohere's paper: [To Code, or Not To Code? Exploring Impact of Code in Pre-training](https://arxiv.org/abs/2408.10914)
+- Instruct fine-tune largely based on OpenHermes-2.5 and hermes-function-calling
+
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
@@ -742,4 +751,4 @@ The following hyperparameters were used during training:
 - Transformers 4.45.0
 - Pytorch 2.3.1+cu121
 - Datasets 2.21.0
-- Tokenizers 0.20.0
+- Tokenizers 0.20.0
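The updated card describes an instruction-tuned legal assistant. For readers who want to try it, below is a minimal inference sketch using the `transformers` chat template. The Hub id `macadeliccc/magistrate-3.2-3b-it`, the dtype, and the example prompt are assumptions for illustration and are not taken from the card.

```python
# Minimal inference sketch; repo id, dtype, and prompt are assumptions, not from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/magistrate-3.2-3b-it"  # assumed Hub id for this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a legal research assistant."},
    {"role": "user", "content": "Summarize the holding of Marbury v. Madison."},
]

# Build the prompt from the tokenizer's chat template and generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Per the intended-uses note in the card, outputs are research material rather than legal advice, and responsibility for them rests with the user.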
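The training-procedure note mentions a Spectrum top-35% fine-tune for pretraining and SFT. As a rough illustration of that idea (train only a selected fraction of modules and freeze the rest), here is a sketch; the module-name patterns are hypothetical placeholders and are not the layers Spectrum actually selected for this model.

```python
# Conceptual sketch of Spectrum-style selective training: freeze every parameter,
# then unfreeze only the modules picked by a Spectrum SNR scan.
# The patterns below are hypothetical placeholders, not this model's actual selection.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("macadeliccc/magistrate-3.2-3b-base")

# In practice this list comes from Spectrum's analysis (e.g. the top 35% of layers by SNR).
unfrozen_patterns = [
    "model.layers.20.self_attn",
    "model.layers.21.mlp",
]

for name, param in model.named_parameters():
    param.requires_grad = any(pat in name for pat in unfrozen_patterns)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} / {total:,}")
```

With axolotl (tagged on this model), a Spectrum-generated list of this kind is typically supplied through the `unfrozen_parameters` field of the training config.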
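The framework versions in the final hunk pin the training environment. A quick sketch for checking a local environment against those pins, assuming the standard PyPI package names:

```python
# Compare installed package versions against the versions listed in the card.
from importlib.metadata import version

expected = {
    "transformers": "4.45.0",
    "torch": "2.3.1",      # the card lists 2.3.1+cu121; the suffix is the CUDA build tag
    "datasets": "2.21.0",
    "tokenizers": "0.20.0",
}

for pkg, want in expected.items():
    have = version(pkg)
    note = "" if have.startswith(want) else "  <-- differs from the card"
    print(f"{pkg}: installed {have}, card lists {want}{note}")
```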