macadeliccc committed
Commit 1229612 · verified · 1 Parent(s): e7ea80d

Update README.md

Files changed (1):
  1. README.md +21 -12
README.md CHANGED
@@ -4,10 +4,16 @@ license: llama3.2
 base_model: macadeliccc/magistrate-3.2-3b-base
 tags:
 - generated_from_trainer
-model-index:
-- name: outputs/magistrate-3.2-3b
-  results: []
+- spectrum
+- pytorch
+- llama-3
+- axolotl
 ---
+# magistrate-3.2-3b-it
+
+This model is a fine-tuned version of [macadeliccc/magistrate-3.2-3b-base](https://huggingface.co/macadeliccc/magistrate-3.2-3b-base) on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.8067
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
@@ -687,26 +693,29 @@ tokens:
 
 </details><br>
 
-# outputs/magistrate-3.2-3b
-
-This model is a fine-tuned version of [macadeliccc/magistrate-3.2-3b-base](https://huggingface.co/macadeliccc/magistrate-3.2-3b-base) on the None dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.8067
 
 ## Model description
 
-More information needed
+Magistrate-3.2-3b-it is a legal assistant specializing in US Supreme Court case law and US Federal regulations.
+
+The base model is pretrained with ~250M tokens containing no synthetic legal data. The instruct model does contain synthetic data.
 
 ## Intended uses & limitations
 
-More information needed
+This model is for research purposes and for continued development of the legal specialty. You are liable for all model outputs.
 
 ## Training and evaluation data
 
-More information needed
+This model was trained on a variety of standard open source datasets like OpenHermes-2.5, hermes-function-calling, and some select entries from the Tome.
+Additionally, I have included a comprehensive, non-synthetic argument dataset. This is a work in progress but has shown promising results so far.
 
 ## Training procedure
 
+Spectrum top 35% finetune for both pretrain and SFT. Thanks to the cognitive computations team for the work done with spectrum.
+
++ Pretraining methodology based on Cohere's paper: [To Code, or Not To Code? Exploring Impact of Code in Pre-training](https://arxiv.org/abs/2408.10914)
++ Instruct finetune largely based on OpenHermes-2.5 and hermes-function-calling
+
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
@@ -742,4 +751,4 @@ The following hyperparameters were used during training:
 - Transformers 4.45.0
 - Pytorch 2.3.1+cu121
 - Datasets 2.21.0
-- Tokenizers 0.20.0
+- Tokenizers 0.20.0
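
The rewritten Training procedure section references Spectrum's top-35% selective finetuning. As a minimal sketch of the idea only, assuming a precomputed signal-to-noise ranking (the real Spectrum tooling derives the unfrozen-module list with its own scanner; the module names below are hypothetical):

```python
# Sketch of Spectrum-style selective finetuning: freeze everything,
# then re-enable gradients only for the modules ranked highest by SNR.
# Assumption: `unfrozen_patterns` stands in for the output of a real
# Spectrum scan; the fragments below are illustrative only.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "macadeliccc/magistrate-3.2-3b-base"
)

unfrozen_patterns = [
    "model.layers.20.self_attn",  # hypothetical top-SNR modules
    "model.layers.21.mlp",
    "model.embed_tokens",
]

for name, param in model.named_parameters():
    param.requires_grad = any(pat in name for pat in unfrozen_patterns)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({trainable / total:.1%})")
```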
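Since the new card adds a description but no usage snippet, here is a minimal inference sketch against the pinned Transformers 4.45.0; the repo id is assumed from the new card title, and the prompt is only an example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "macadeliccc/magistrate-3.2-3b-it"  # assumed from the card title
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

messages = [
    {"role": "user", "content": "Summarize the holding of Marbury v. Madison."}
]
# Build the prompt with whatever chat template the tokenizer ships.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```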