Update README.md
README.md
CHANGED
@@ -4,10 +4,16 @@ license: llama3.2
 base_model: macadeliccc/magistrate-3.2-3b-base
 tags:
 - generated_from_trainer
-
-
-
+- spectrum
+- pytorch
+- llama-3
+- axolotl
 ---
+# magistrate-3.2-3b-it
+
+This model is a fine-tuned version of [macadeliccc/magistrate-3.2-3b-base](https://huggingface.co/macadeliccc/magistrate-3.2-3b-base) on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.8067
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
@@ -687,26 +693,29 @@ tokens:
 
 </details><br>
 
-# outputs/magistrate-3.2-3b
-
-This model is a fine-tuned version of [macadeliccc/magistrate-3.2-3b-base](https://huggingface.co/macadeliccc/magistrate-3.2-3b-base) on the None dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.8067
 
 ## Model description
 
-
+Magistrate-3.2-3b-it is a legal assistant specializing in US Supreme Court case law and US Federal regulations.
+
+The base model was pretrained on ~250M tokens containing no synthetic legal data; the instruct model does contain synthetic data.
 
 ## Intended uses & limitations
 
-
+This model is intended for research purposes and for continued development of the legal specialty. You are liable for all model outputs.
 
 ## Training and evaluation data
 
-
+This model was trained on a variety of standard open-source datasets such as OpenHermes-2.5, hermes-function-calling, and select entries from the Tome.
+Additionally, I have included a comprehensive, non-synthetic argument dataset. This is a work in progress but has shown promising results so far.
 
 ## Training procedure
 
+Spectrum top-35% fine-tune for both the pretraining and SFT stages. Thanks to the Cognitive Computations team for their work on Spectrum.
+
+- Pretraining methodology based on Cohere's paper: [To Code, or Not To Code? Exploring Impact of Code in Pre-training](https://arxiv.org/abs/2408.10914)
+- Instruct fine-tune largely based on OpenHermes-2.5 and hermes-function-calling
+
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
@@ -742,4 +751,4 @@ The following hyperparameters were used during training:
 - Transformers 4.45.0
 - Pytorch 2.3.1+cu121
 - Datasets 2.21.0
-- Tokenizers 0.20.0
+- Tokenizers 0.20.0
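The updated card describes an instruction-tuned legal assistant. For readers who want to try it, below is a minimal inference sketch using the `transformers` chat template. The Hub id `macadeliccc/magistrate-3.2-3b-it`, the dtype, and the example prompt are assumptions for illustration and are not taken from the card.

```python
# Minimal inference sketch; repo id, dtype, and prompt are assumptions, not from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/magistrate-3.2-3b-it"  # assumed Hub id for this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a legal research assistant."},
    {"role": "user", "content": "Summarize the holding of Marbury v. Madison."},
]

# Build the prompt from the tokenizer's chat template and generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Per the intended-uses note in the card, outputs are research material rather than legal advice, and responsibility for them rests with the user.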
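The training-procedure note mentions a Spectrum top-35% fine-tune for pretraining and SFT. As a rough illustration of that idea (train only a selected fraction of modules and freeze the rest), here is a sketch; the module-name patterns are hypothetical placeholders and are not the layers Spectrum actually selected for this model.

```python
# Conceptual sketch of Spectrum-style selective training: freeze every parameter,
# then unfreeze only the modules picked by a Spectrum SNR scan.
# The patterns below are hypothetical placeholders, not this model's actual selection.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("macadeliccc/magistrate-3.2-3b-base")

# In practice this list comes from Spectrum's analysis (e.g. the top 35% of layers by SNR).
unfrozen_patterns = [
    "model.layers.20.self_attn",
    "model.layers.21.mlp",
]

for name, param in model.named_parameters():
    param.requires_grad = any(pat in name for pat in unfrozen_patterns)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} / {total:,}")
```

With axolotl (tagged on this model), a Spectrum-generated list of this kind is typically supplied through the `unfrozen_parameters` field of the training config.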
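The framework versions in the final hunk pin the training environment. A quick sketch for checking a local environment against those pins, assuming the standard PyPI package names:

```python
# Compare installed package versions against the versions listed in the card.
from importlib.metadata import version

expected = {
    "transformers": "4.45.0",
    "torch": "2.3.1",      # the card lists 2.3.1+cu121; the suffix is the CUDA build tag
    "datasets": "2.21.0",
    "tokenizers": "0.20.0",
}

for pkg, want in expected.items():
    have = version(pkg)
    note = "" if have.startswith(want) else "  <-- differs from the card"
    print(f"{pkg}: installed {have}, card lists {want}{note}")
```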