Training in progress, step 341, checkpoint

Files changed (13)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +34 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/added_tokens.json +4 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer.model +3 -0
last-checkpoint/tokenizer_config.json +63 -0
last-checkpoint/trainer_state.json +2436 -0
last-checkpoint/training_args.bin +3 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: unsloth/OpenHermes-2.5-Mistral-7B
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "unsloth/OpenHermes-2.5-Mistral-7B",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "gate_proj",
+    "o_proj",
+    "down_proj",
+    "up_proj",
+    "k_proj",
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
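
The configuration above can be sanity-checked with a little arithmetic. The following is a minimal sketch, not part of the checkpoint: it computes the LoRA scaling factor from `lora_alpha` and `r`, and estimates the number of trainable adapter parameters. The projection shapes and the layer count are assumptions about the Mistral-7B base model (they do not appear in this diff), and the helper names `lora_scaling` and `lora_param_count` are hypothetical.

```python
import json

# Subset of the adapter_config.json fields shown in the diff.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 16,
  "r": 8,
  "use_rslora": false,
  "target_modules": ["gate_proj", "o_proj", "down_proj", "up_proj",
                     "k_proj", "v_proj", "q_proj"]
}
""")

# ASSUMPTION: (in_features, out_features) of each targeted projection in
# Mistral-7B, and its number of decoder layers. Not taken from this diff.
ASSUMED_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
ASSUMED_NUM_LAYERS = 32


def lora_scaling(config):
    """Return alpha / r, or alpha / sqrt(r) when rank-stabilized LoRA is on."""
    if config.get("use_rslora"):
        return config["lora_alpha"] / config["r"] ** 0.5
    return config["lora_alpha"] / config["r"]


def lora_param_count(config, shapes, num_layers):
    """Each adapted matrix adds A (r x in) and B (out x r): r * (in + out)."""
    r = config["r"]
    per_layer = sum(r * sum(shapes[name]) for name in config["target_modules"])
    return per_layer * num_layers


scaling = lora_scaling(ADAPTER_CONFIG)
n_params = lora_param_count(ADAPTER_CONFIG, ASSUMED_SHAPES, ASSUMED_NUM_LAYERS)
print(scaling)        # 2.0
print(n_params)       # 20971520
print(n_params * 4)   # 83886080 bytes if stored as 32-bit floats
```

Under these assumptions the 32-bit estimate is within about 60 KB of the 83945296-byte `adapter_model.safetensors` recorded in this commit, which would be consistent with the adapter weights being saved in fp32 plus a small file header. That is an inference from the sizes, not something the diff states.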

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:669a93a7a599d147ac68d4e0cd4acfda8ad8db9f83a76565f9a7cbec3d5822cf
+size 83945296
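
The three lines above are a Git LFS pointer rather than the weights themselves: the repository stores a spec version, a content hash, and the byte size, and the real file lives in LFS storage. As a rough illustration, the sketch below parses such a pointer into its fields. The function name `parse_lfs_pointer` is hypothetical, and the leading `+` diff markers are assumed to have been stripped already.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is a key, a single space, then the value.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash-algorithm>:<hex-digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:669a93a7a599d147ac68d4e0cd4acfda8ad8db9f83a76565f9a7cbec3d5822cf
size 83945296
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algo"], pointer["size"])  # sha256 83945296
```

The same helper applies to the other pointer files in this commit (`optimizer.pt`, `rng_state.pth`, `scheduler.pt`, `tokenizer.model`, `training_args.bin`), since they share the format.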

last-checkpoint/added_tokens.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "<|im_end|>": 32000,
+  "<|im_start|>": 32001
+}

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b4e78b8cbbc492e035ec231795143bd705d58ffccf9c6af9264d03f3d598018d
+size 43123028

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ec6897e825191b0a80051b1e34fb6ced22b692c5c07df5bd607896b8ec6078eb
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5b0bb58a6976151803bee01b02feadab21639b27a6d2d75a55682622d20ee556
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render.

last-checkpoint/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+size 493443

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,63 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": true,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32000": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32001": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [],
+  "bos_token": "<s>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "legacy": true,
+  "model_max_length": 32768,
+  "pad_token": "<unk>",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "trust_remote_code": false,
+  "unk_token": "<unk>",
+  "use_default_system_prompt": true,
+  "use_fast": true
+}

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,2436 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.0668693009118541,
+  "eval_steps": 341,
+  "global_step": 341,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.00019609765663300324,
+      "grad_norm": 18.01366424560547,
+      "learning_rate": 2e-05,
+      "loss": 3.0843,
+      "step": 1
+    },
+    {
+      "epoch": 0.00019609765663300324,
+      "eval_loss": 1.1017773151397705,
+      "eval_runtime": 79.0856,
+      "eval_samples_per_second": 27.16,
+      "eval_steps_per_second": 13.58,
+      "step": 1
+    },
+    {
+      "epoch": 0.0003921953132660065,
+      "grad_norm": 16.80318260192871,
+      "learning_rate": 4e-05,
+      "loss": 3.2221,
+      "step": 2
+    },
+    {
+      "epoch": 0.0005882929698990097,
+      "grad_norm": 16.65929412841797,
+      "learning_rate": 6e-05,
+      "loss": 3.8956,
+      "step": 3
+    },
+    {
+      "epoch": 0.000784390626532013,
+      "grad_norm": 37.782188415527344,
+      "learning_rate": 8e-05,
+      "loss": 5.185,
+      "step": 4
+    },
+    {
+      "epoch": 0.0009804882831650162,
+      "grad_norm": 19.226940155029297,
+      "learning_rate": 0.0001,
+      "loss": 3.1542,
+      "step": 5
+    },
+    {
+      "epoch": 0.0011765859397980193,
+      "grad_norm": 26.570402145385742,
+      "learning_rate": 0.00012,
+      "loss": 4.5356,
+      "step": 6
+    },
+    {
+      "epoch": 0.0013726835964310226,
+      "grad_norm": 22.43348503112793,
+      "learning_rate": 0.00014,
+      "loss": 3.8177,
+      "step": 7
+    },
+    {
+      "epoch": 0.001568781253064026,
+      "grad_norm": 29.817941665649414,
+      "learning_rate": 0.00016,
+      "loss": 4.4013,
+      "step": 8
+    },
+    {
+      "epoch": 0.0017648789096970292,
+      "grad_norm": 18.46044158935547,
+      "learning_rate": 0.00018,
+      "loss": 3.374,
+      "step": 9
+    },
+    {
+      "epoch": 0.0019609765663300325,
+      "grad_norm": 24.337013244628906,
+      "learning_rate": 0.0002,
+      "loss": 3.2509,
+      "step": 10
+    },
+    {
+      "epoch": 0.0021570742229630358,
+      "grad_norm": 18.9931640625,
+      "learning_rate": 0.00019999972962977903,
+      "loss": 2.933,
+      "step": 11
+    },
+    {
+      "epoch": 0.0023531718795960386,
+      "grad_norm": 17.7039852142334,
+      "learning_rate": 0.00019999891852057812,
+      "loss": 3.2867,
+      "step": 12
+    },
+    {
+      "epoch": 0.002549269536229042,
+      "grad_norm": 15.128417015075684,
+      "learning_rate": 0.0001999975666767833,
+      "loss": 2.2303,
+      "step": 13
+    },
+    {
+      "epoch": 0.002745367192862045,
+      "grad_norm": 8.747583389282227,
+      "learning_rate": 0.00019999567410570446,
+      "loss": 1.7348,
+      "step": 14
+    },
+    {
+      "epoch": 0.0029414648494950485,
+      "grad_norm": 20.570377349853516,
+      "learning_rate": 0.00019999324081757555,
+      "loss": 3.9284,
+      "step": 15
+    },
+    {
+      "epoch": 0.003137562506128052,
+      "grad_norm": 7.86672306060791,
+      "learning_rate": 0.00019999026682555434,
+      "loss": 1.8121,
+      "step": 16
+    },
+    {
+      "epoch": 0.003333660162761055,
+      "grad_norm": 10.341567039489746,
+      "learning_rate": 0.0001999867521457224,
+      "loss": 1.5864,
+      "step": 17
+    },
+    {
+      "epoch": 0.0035297578193940584,
+      "grad_norm": 7.888909339904785,
+      "learning_rate": 0.00019998269679708504,
+      "loss": 3.1584,
+      "step": 18
+    },
+    {
+      "epoch": 0.0037258554760270617,
+      "grad_norm": 15.346771240234375,
+      "learning_rate": 0.00019997810080157113,
+      "loss": 2.0858,
+      "step": 19
+    },
+    {
+      "epoch": 0.003921953132660065,
+      "grad_norm": 11.417168617248535,
+      "learning_rate": 0.0001999729641840331,
+      "loss": 3.5646,
+      "step": 20
+    },
+    {
+      "epoch": 0.004118050789293068,
+      "grad_norm": 12.86220932006836,
+      "learning_rate": 0.00019996728697224675,
+      "loss": 3.0594,
+      "step": 21
+    },
+    {
+      "epoch": 0.0043141484459260715,
+      "grad_norm": 7.418736457824707,
+      "learning_rate": 0.00019996106919691102,
+      "loss": 2.1349,
+      "step": 22
+    },
+    {
+      "epoch": 0.004510246102559075,
+      "grad_norm": 4.803528308868408,
+      "learning_rate": 0.00019995431089164795,
+      "loss": 2.3151,
+      "step": 23
+    },
+    {
+      "epoch": 0.004706343759192077,
+      "grad_norm": 8.667632102966309,
+      "learning_rate": 0.00019994701209300245,
+      "loss": 1.6791,
+      "step": 24
+    },
+    {
+      "epoch": 0.0049024414158250805,
+      "grad_norm": 5.3950724601745605,
+      "learning_rate": 0.00019993917284044202,
+      "loss": 2.7265,
+      "step": 25
+    },
+    {
+      "epoch": 0.005098539072458084,
+      "grad_norm": 18.16863441467285,
+      "learning_rate": 0.0001999307931763567,
+      "loss": 1.7289,
+      "step": 26
+    },
+    {
+      "epoch": 0.005294636729091087,
+      "grad_norm": 10.66887378692627,
+      "learning_rate": 0.00019992187314605872,
+      "loss": 2.645,
+      "step": 27
+    },
+    {
+      "epoch": 0.00549073438572409,
+      "grad_norm": 26.42878532409668,
+      "learning_rate": 0.00019991241279778232,
+      "loss": 3.4603,
+      "step": 28
+    },
+    {
+      "epoch": 0.005686832042357094,
+      "grad_norm": 10.017987251281738,
+      "learning_rate": 0.0001999024121826834,
+      "loss": 1.2001,
+      "step": 29
+    },
+    {
+      "epoch": 0.005882929698990097,
+      "grad_norm": 11.155564308166504,
+      "learning_rate": 0.00019989187135483933,
+      "loss": 1.5102,
+      "step": 30
+    },
+    {
+      "epoch": 0.0060790273556231,
+      "grad_norm": 3.7590131759643555,
+      "learning_rate": 0.00019988079037124864,
+      "loss": 1.0619,
+      "step": 31
+    },
+    {
+      "epoch": 0.006275125012256104,
+      "grad_norm": 8.47985553741455,
+      "learning_rate": 0.00019986916929183067,
+      "loss": 2.6256,
+      "step": 32
+    },
+    {
+      "epoch": 0.006471222668889107,
+      "grad_norm": 16.655269622802734,
+      "learning_rate": 0.00019985700817942533,
+      "loss": 2.2039,
+      "step": 33
+    },
+    {
+      "epoch": 0.00666732032552211,
+      "grad_norm": 9.003829956054688,
+      "learning_rate": 0.00019984430709979264,
+      "loss": 1.7281,
+      "step": 34
+    },
+    {
+      "epoch": 0.0068634179821551134,
+      "grad_norm": 36.33080291748047,
+      "learning_rate": 0.0001998310661216125,
+      "loss": 3.0575,
+      "step": 35
+    },
+    {
+      "epoch": 0.007059515638788117,
+      "grad_norm": 11.909663200378418,
+      "learning_rate": 0.00019981728531648423,
+      "loss": 1.9623,
+      "step": 36
+    },
+    {
+      "epoch": 0.00725561329542112,
+      "grad_norm": 10.967493057250977,
+      "learning_rate": 0.00019980296475892616,
+      "loss": 2.8071,
+      "step": 37
+    },
+    {
+      "epoch": 0.007451710952054123,
+      "grad_norm": 11.050918579101562,
+      "learning_rate": 0.00019978810452637543,
+      "loss": 1.8019,
+      "step": 38
+    },
+    {
+      "epoch": 0.007647808608687127,
+      "grad_norm": 17.752105712890625,
+      "learning_rate": 0.00019977270469918727,
+      "loss": 3.0332,
+      "step": 39
+    },
+    {
+      "epoch": 0.00784390626532013,
+      "grad_norm": 6.67478609085083,
+      "learning_rate": 0.0001997567653606348,
+      "loss": 1.32,
+      "step": 40
+    },
+    {
+      "epoch": 0.008040003921953132,
+      "grad_norm": 7.402947902679443,
+      "learning_rate": 0.00019974028659690843,
+      "loss": 1.4442,
+      "step": 41
+    },
+    {
+      "epoch": 0.008236101578586136,
+      "grad_norm": 13.006294250488281,
+      "learning_rate": 0.00019972326849711553,
+      "loss": 2.2418,
+      "step": 42
+    },
+    {
+      "epoch": 0.008432199235219139,
+      "grad_norm": 5.940983295440674,
+      "learning_rate": 0.00019970571115327985,
+      "loss": 0.9049,
+      "step": 43
+    },
+    {
+      "epoch": 0.008628296891852143,
+      "grad_norm": 9.866477012634277,
+      "learning_rate": 0.00019968761466034103,
+      "loss": 2.7203,
+      "step": 44
+    },
+    {
+      "epoch": 0.008824394548485145,
+      "grad_norm": 8.239157676696777,
+      "learning_rate": 0.00019966897911615416,
+      "loss": 1.9653,
+      "step": 45
+    },
+    {
+      "epoch": 0.00902049220511815,
+      "grad_norm": 16.142181396484375,
+      "learning_rate": 0.0001996498046214891,
+      "loss": 2.7509,
+      "step": 46
+    },
+    {
+      "epoch": 0.009216589861751152,
+      "grad_norm": 9.295431137084961,
+      "learning_rate": 0.00019963009128003018,
+      "loss": 2.0133,
+      "step": 47
+    },
+    {
+      "epoch": 0.009412687518384154,
+      "grad_norm": 10.362130165100098,
+      "learning_rate": 0.00019960983919837535,
+      "loss": 1.716,
+      "step": 48
+    },
+    {
+      "epoch": 0.009608785175017159,
+      "grad_norm": 28.2889461517334,
+      "learning_rate": 0.00019958904848603584,
+      "loss": 2.8961,
+      "step": 49
+    },
+    {
+      "epoch": 0.009804882831650161,
+      "grad_norm": 13.199325561523438,
+      "learning_rate": 0.0001995677192554354,
+      "loss": 2.62,
+      "step": 50
+    },
+    {
+      "epoch": 0.010000980488283165,
+      "grad_norm": 11.95132827758789,
+      "learning_rate": 0.00019954585162190985,
+      "loss": 2.792,
+      "step": 51
+    },
+    {
+      "epoch": 0.010197078144916168,
+      "grad_norm": 16.581575393676758,
+      "learning_rate": 0.0001995234457037063,
+      "loss": 2.7239,
+      "step": 52
+    },
+    {
+      "epoch": 0.010393175801549172,
+      "grad_norm": 12.048559188842773,
+      "learning_rate": 0.00019950050162198258,
+      "loss": 1.9892,
+      "step": 53
+    },
+    {
+      "epoch": 0.010589273458182174,
+      "grad_norm": 9.297942161560059,
+      "learning_rate": 0.00019947701950080672,
+      "loss": 1.8015,
+      "step": 54
+    },
+    {
+      "epoch": 0.010785371114815178,
+      "grad_norm": 6.09962797164917,
+      "learning_rate": 0.00019945299946715596,
+      "loss": 1.1493,
+      "step": 55
+    },
+    {
+      "epoch": 0.01098146877144818,
+      "grad_norm": 6.668224811553955,
+      "learning_rate": 0.00019942844165091633,
+      "loss": 0.9968,
+      "step": 56
+    },
+    {
+      "epoch": 0.011177566428081185,
+      "grad_norm": 17.000507354736328,
+      "learning_rate": 0.00019940334618488194,
+      "loss": 1.5857,
+      "step": 57
+    },
+    {
+      "epoch": 0.011373664084714187,
+      "grad_norm": 11.42551326751709,
+      "learning_rate": 0.00019937771320475406,
+      "loss": 1.5236,
+      "step": 58
+    },
+    {
+      "epoch": 0.011569761741347192,
+      "grad_norm": 13.255271911621094,
+      "learning_rate": 0.00019935154284914065,
+      "loss": 1.6174,
+      "step": 59
+    },
+    {
+      "epoch": 0.011765859397980194,
+      "grad_norm": 12.664427757263184,
+      "learning_rate": 0.00019932483525955533,
+      "loss": 1.476,
+      "step": 60
+    },
+    {
+      "epoch": 0.011961957054613198,
+      "grad_norm": 19.540660858154297,
+      "learning_rate": 0.00019929759058041687,
+      "loss": 1.5251,
+      "step": 61
+    },
+    {
+      "epoch": 0.0121580547112462,
+      "grad_norm": 10.44942855834961,
+      "learning_rate": 0.0001992698089590483,
+      "loss": 1.7865,
+      "step": 62
+    },
+    {
+      "epoch": 0.012354152367879203,
+      "grad_norm": 13.294017791748047,
+      "learning_rate": 0.00019924149054567606,
+      "loss": 4.1284,
+      "step": 63
+    },
+    {
+      "epoch": 0.012550250024512207,
+      "grad_norm": 13.700861930847168,
+      "learning_rate": 0.00019921263549342922,
+      "loss": 2.8987,
+      "step": 64
+    },
+    {
+      "epoch": 0.01274634768114521,
+      "grad_norm": 51.7520866394043,
+      "learning_rate": 0.00019918324395833877,
+      "loss": 1.7335,
+      "step": 65
+    },
+    {
+      "epoch": 0.012942445337778214,
+      "grad_norm": 25.415748596191406,
+      "learning_rate": 0.00019915331609933657,
+      "loss": 2.6131,
+      "step": 66
+    },
+    {
+      "epoch": 0.013138542994411216,
+      "grad_norm": 10.575088500976562,
+      "learning_rate": 0.00019912285207825475,
+      "loss": 1.202,
+      "step": 67
+    },
+    {
+      "epoch": 0.01333464065104422,
+      "grad_norm": 7.0860772132873535,
+      "learning_rate": 0.00019909185205982453,
+      "loss": 1.5077,
+      "step": 68
+    },
+    {
+      "epoch": 0.013530738307677223,
+      "grad_norm": 8.169393539428711,
+      "learning_rate": 0.00019906031621167553,
+      "loss": 2.4139,
+      "step": 69
+    },
+    {
+      "epoch": 0.013726835964310227,
+      "grad_norm": 15.750587463378906,
+      "learning_rate": 0.00019902824470433489,
+      "loss": 2.8999,
+      "step": 70
+    },
+    {
+      "epoch": 0.01392293362094323,
+      "grad_norm": 19.615734100341797,
+      "learning_rate": 0.00019899563771122618,
+      "loss": 3.3154,
+      "step": 71
+    },
+    {
+      "epoch": 0.014119031277576233,
+      "grad_norm": 24.015810012817383,
+      "learning_rate": 0.0001989624954086686,
+      "loss": 2.6689,
+      "step": 72
+    },
+    {
+      "epoch": 0.014315128934209236,
+      "grad_norm": 7.955759525299072,
+      "learning_rate": 0.00019892881797587601,
+      "loss": 1.8847,
+      "step": 73
+    },
+    {
+      "epoch": 0.01451122659084224,
+      "grad_norm": 13.804747581481934,
+      "learning_rate": 0.00019889460559495588,
+      "loss": 2.2221,
+      "step": 74
+    },
+    {
+      "epoch": 0.014707324247475242,
+      "grad_norm": 19.74148941040039,
+      "learning_rate": 0.0001988598584509084,
+      "loss": 1.9316,
+      "step": 75
+    },
+    {
+      "epoch": 0.014903421904108247,
+      "grad_norm": 7.9701385498046875,
+      "learning_rate": 0.00019882457673162543,
+      "loss": 2.1958,
+      "step": 76
+    },
+    {
+      "epoch": 0.015099519560741249,
+      "grad_norm": 4.7594170570373535,
+      "learning_rate": 0.00019878876062788954,
+      "loss": 1.3551,
+      "step": 77
+    },
+    {
+      "epoch": 0.015295617217374253,
+      "grad_norm": 7.796854496002197,
+      "learning_rate": 0.0001987524103333728,
+      "loss": 2.616,
+      "step": 78
+    },
+    {
+      "epoch": 0.015491714874007256,
+      "grad_norm": 6.997422218322754,
+      "learning_rate": 0.00019871552604463602,
+      "loss": 2.908,
+      "step": 79
+    },
+    {
+      "epoch": 0.01568781253064026,
+      "grad_norm": 6.436347484588623,
+      "learning_rate": 0.00019867810796112744,
+      "loss": 3.6826,
+      "step": 80
+    },
+    {
+      "epoch": 0.01588391018727326,
+      "grad_norm": 10.243407249450684,
+      "learning_rate": 0.00019864015628518175,
+      "loss": 1.3711,
+      "step": 81
+    },
+    {
+      "epoch": 0.016080007843906265,
+      "grad_norm": 11.830530166625977,
+      "learning_rate": 0.00019860167122201904,
+      "loss": 2.2325,
+      "step": 82
+    },
+    {
+      "epoch": 0.01627610550053927,
+      "grad_norm": 7.456768989562988,
+      "learning_rate": 0.0001985626529797436,
+      "loss": 1.9991,
+      "step": 83
+    },
+    {
+      "epoch": 0.016472203157172273,
+      "grad_norm": 50.91776657104492,
+      "learning_rate": 0.00019852310176934288,
+      "loss": 1.294,
+      "step": 84
+    },
+    {
+      "epoch": 0.016668300813805274,
+      "grad_norm": 6.4661054611206055,
+      "learning_rate": 0.00019848301780468622,
+      "loss": 2.4052,
+      "step": 85
+    },
+    {
+      "epoch": 0.016864398470438278,
+      "grad_norm": 5.4749674797058105,
+      "learning_rate": 0.00019844240130252385,
+      "loss": 2.1508,
+      "step": 86
+    },
+    {
+      "epoch": 0.017060496127071282,
+      "grad_norm": 9.457857131958008,
+      "learning_rate": 0.00019840125248248564,
+      "loss": 2.1732,
+      "step": 87
+    },
+    {
+      "epoch": 0.017256593783704286,
+      "grad_norm": 6.932736396789551,
+      "learning_rate": 0.00019835957156707988,
+      "loss": 1.0618,
+      "step": 88
+    },
+    {
+      "epoch": 0.017452691440337287,
+      "grad_norm": 10.447488784790039,
+      "learning_rate": 0.00019831735878169212,
+      "loss": 1.1214,
+      "step": 89
+    },
+    {
+      "epoch": 0.01764878909697029,
+      "grad_norm": 7.159199237823486,
+      "learning_rate": 0.000198274614354584,
+      "loss": 2.495,
+      "step": 90
+    },
+    {
+      "epoch": 0.017844886753603295,
+      "grad_norm": 8.753946304321289,
+      "learning_rate": 0.00019823133851689187,
+      "loss": 2.343,
+      "step": 91
+    },
+    {
+      "epoch": 0.0180409844102363,
+      "grad_norm": 9.124870300292969,
+      "learning_rate": 0.00019818753150262574,
+      "loss": 1.6556,
+      "step": 92
+    },
+    {
+      "epoch": 0.0182370820668693,
+      "grad_norm": 5.470896244049072,
+      "learning_rate": 0.00019814319354866786,
+      "loss": 1.2787,
+      "step": 93
+    },
+    {
+      "epoch": 0.018433179723502304,
+      "grad_norm": 8.39749526977539,
+      "learning_rate": 0.00019809832489477142,
+      "loss": 1.6804,
+      "step": 94
+    },
+    {
+      "epoch": 0.01862927738013531,
+      "grad_norm": 6.869524955749512,
+      "learning_rate": 0.0001980529257835594,
+      "loss": 1.6563,
+      "step": 95
+    },
+    {
+      "epoch": 0.01882537503676831,
+      "grad_norm": 11.144633293151855,
+      "learning_rate": 0.0001980069964605232,
+      "loss": 1.9428,
+      "step": 96
+    },
+    {
+      "epoch": 0.019021472693401313,
+      "grad_norm": 11.564960479736328,
+      "learning_rate": 0.00019796053717402118,
+      "loss": 2.2905,
+      "step": 97
+    },
+    {
+      "epoch": 0.019217570350034317,
+      "grad_norm": 4.628671169281006,
+      "learning_rate": 0.00019791354817527755,
+      "loss": 1.0654,
+      "step": 98
+    },
+    {
+      "epoch": 0.01941366800666732,
+      "grad_norm": 15.970938682556152,
+      "learning_rate": 0.00019786602971838074,
+      "loss": 2.9314,
+      "step": 99
+    },
+    {
+      "epoch": 0.019609765663300322,
+      "grad_norm": 5.425403594970703,
+      "learning_rate": 0.00019781798206028239,
+      "loss": 1.7236,
+      "step": 100
+    },
+    {
+      "epoch": 0.019805863319933326,
+      "grad_norm": 5.114929676055908,
+      "learning_rate": 0.0001977694054607955,
+      "loss": 2.1674,
+      "step": 101
+    },
+    {
+      "epoch": 0.02000196097656633,
+      "grad_norm": 9.564162254333496,
+      "learning_rate": 0.0001977203001825935,
+      "loss": 2.3091,
+      "step": 102
+    },
+    {
+      "epoch": 0.020198058633199335,
+      "grad_norm": 7.184217929840088,
+      "learning_rate": 0.00019767066649120838,
+      "loss": 2.5231,
+      "step": 103
+    },
+    {
+      "epoch": 0.020394156289832335,
+      "grad_norm": 8.926065444946289,
+      "learning_rate": 0.00019762050465502965,
+      "loss": 1.3343,
+      "step": 104
+    },
+    {
+      "epoch": 0.02059025394646534,
+      "grad_norm": 5.511591911315918,
+      "learning_rate": 0.0001975698149453026,
+      "loss": 1.7055,
+      "step": 105
+    },
+    {
+      "epoch": 0.020786351603098344,
+      "grad_norm": 8.730749130249023,
+      "learning_rate": 0.00019751859763612704,
+      "loss": 1.7862,
+      "step": 106
+    },
+    {
+      "epoch": 0.020982449259731348,
+      "grad_norm": 10.519305229187012,
+      "learning_rate": 0.00019746685300445565,
+      "loss": 1.9938,
+      "step": 107
+    },
+    {
+      "epoch": 0.02117854691636435,
+      "grad_norm": 6.307378768920898,
+      "learning_rate": 0.00019741458133009258,
+      "loss": 1.5184,
+      "step": 108
+    },
+    {
+      "epoch": 0.021374644572997353,
+      "grad_norm": 10.66562557220459,
+      "learning_rate": 0.00019736178289569186,
+      "loss": 2.0555,
+      "step": 109
+    },
+    {
+      "epoch": 0.021570742229630357,
+      "grad_norm": 6.235299587249756,
+      "learning_rate": 0.0001973084579867561,
+      "loss": 1.4547,
+      "step": 110
+    },
+    {
+      "epoch": 0.021766839886263357,
+      "grad_norm": 12.061420440673828,
+      "learning_rate": 0.00019725460689163455,
+      "loss": 1.9823,
+      "step": 111
+    },
+    {
+      "epoch": 0.02196293754289636,
+      "grad_norm": 8.891043663024902,
+      "learning_rate": 0.0001972002299015219,
+      "loss": 2.4757,
+      "step": 112
+    },
+    {
+      "epoch": 0.022159035199529366,
+      "grad_norm": 8.200632095336914,
+      "learning_rate": 0.00019714532731045649,
+      "loss": 1.4906,
+      "step": 113
+    },
+    {
+      "epoch": 0.02235513285616237,
+      "grad_norm": 8.106672286987305,
+      "learning_rate": 0.00019708989941531887,
+      "loss": 0.8863,
+      "step": 114
+    },
+    {
+      "epoch": 0.02255123051279537,
+      "grad_norm": 6.451066493988037,
+      "learning_rate": 0.0001970339465158301,
+      "loss": 2.1244,
+      "step": 115
+    },
+    {
+      "epoch": 0.022747328169428375,
+      "grad_norm": 6.3670220375061035,
+      "learning_rate": 0.0001969774689145501,
+      "loss": 1.3732,
+      "step": 116
+    },
+    {
+      "epoch": 0.02294342582606138,
+      "grad_norm": 6.401678562164307,
+      "learning_rate": 0.0001969204669168761,
+      "loss": 1.3963,
+      "step": 117
+    },
+    {
+      "epoch": 0.023139523482694383,
+      "grad_norm": 6.917253494262695,
+      "learning_rate": 0.00019686294083104094,
+      "loss": 1.3802,
+      "step": 118
+    },
+    {
+      "epoch": 0.023335621139327384,
+      "grad_norm": 4.852232456207275,
+      "learning_rate": 0.00019680489096811149,
+      "loss": 0.7538,
+      "step": 119
+    },
+    {
+      "epoch": 0.023531718795960388,
+      "grad_norm": 5.08411169052124,
+      "learning_rate": 0.00019674631764198677,
+      "loss": 1.5128,
+      "step": 120
+    },
+    {
+      "epoch": 0.023727816452593392,
+      "grad_norm": 13.64840316772461,
+      "learning_rate": 0.00019668722116939649,
+      "loss": 1.2718,
+      "step": 121
+    },
+    {
+      "epoch": 0.023923914109226396,
+      "grad_norm": 7.075209617614746,
+      "learning_rate": 0.00019662760186989913,
+      "loss": 2.4186,
+      "step": 122
+    },
+    {
+      "epoch": 0.024120011765859397,
+      "grad_norm": 12.186111450195312,
+      "learning_rate": 0.00019656746006588044,
+      "loss": 2.8495,
+      "step": 123
+    },
+    {
+      "epoch": 0.0243161094224924,
+      "grad_norm": 13.221094131469727,
+      "learning_rate": 0.00019650679608255138,
+      "loss": 1.1832,
+      "step": 124
+    },
+    {
+      "epoch": 0.024512207079125405,
+      "grad_norm": 4.765728950500488,
+      "learning_rate": 0.0001964456102479467,
+      "loss": 1.1407,
+      "step": 125
+    },
+    {
+      "epoch": 0.024708304735758406,
+      "grad_norm": 4.750761032104492,
+      "learning_rate": 0.00019638390289292295,
+      "loss": 1.1096,
+      "step": 126
+    },
+    {
+      "epoch": 0.02490440239239141,
+      "grad_norm": 5.133049011230469,
+      "learning_rate": 0.0001963216743511567,
+      "loss": 2.6525,
+      "step": 127
+    },
+    {
+      "epoch": 0.025100500049024414,
+      "grad_norm": 14.541696548461914,
+      "learning_rate": 0.0001962589249591429,
+      "loss": 3.3739,
+      "step": 128
+    },
+    {
+      "epoch": 0.02529659770565742,
+      "grad_norm": 7.252123832702637,
+      "learning_rate": 0.00019619565505619288,
+      "loss": 2.0899,
+      "step": 129
+    },
+    {
+      "epoch": 0.02549269536229042,
+      "grad_norm": 7.14664888381958,
+      "learning_rate": 0.00019613186498443257,
+      "loss": 1.4538,
+      "step": 130
+    },
+    {
+      "epoch": 0.025688793018923423,
+      "grad_norm": 7.936334609985352,
+      "learning_rate": 0.0001960675550888007,
+      "loss": 1.351,
+      "step": 131
+    },
+    {
+      "epoch": 0.025884890675556427,
+      "grad_norm": 7.465827465057373,
+      "learning_rate": 0.00019600272571704687,
+      "loss": 1.0752,
+      "step": 132
+    },
+    {
+      "epoch": 0.02608098833218943,
+      "grad_norm": 12.85190486907959,
+      "learning_rate": 0.00019593737721972977,
+      "loss": 2.5674,
+      "step": 133
+    },
+    {
+      "epoch": 0.026277085988822432,
+      "grad_norm": 8.640382766723633,
+      "learning_rate": 0.00019587150995021505,
+      "loss": 2.5631,
+      "step": 134
+    },
+    {
+      "epoch": 0.026473183645455436,
+      "grad_norm": 5.070466995239258,
+      "learning_rate": 0.00019580512426467376,
+      "loss": 0.8935,
+      "step": 135
+    },
+    {
+      "epoch": 0.02666928130208844,
+      "grad_norm": 5.741654872894287,
+      "learning_rate": 0.00019573822052208013,
+      "loss": 2.1005,
+      "step": 136
+    },
+    {
+      "epoch": 0.026865378958721445,
+      "grad_norm": 8.615095138549805,
+      "learning_rate": 0.00019567079908420972,
+      "loss": 2.7478,
+      "step": 137
+    },
+    {
+      "epoch": 0.027061476615354445,
+      "grad_norm": 15.559739112854004,
+      "learning_rate": 0.00019560286031563754,
+      "loss": 1.8455,
+      "step": 138
+    },
+    {
+      "epoch": 0.02725757427198745,
+      "grad_norm": 5.683060169219971,
+      "learning_rate": 0.000195534404583736,
+      "loss": 1.6896,
+      "step": 139
+    },
+    {
+      "epoch": 0.027453671928620454,
+      "grad_norm": 5.791153430938721,
+      "learning_rate": 0.00019546543225867292,
+      "loss": 1.9291,
+      "step": 140
+    },
+    {
+      "epoch": 0.027649769585253458,
+      "grad_norm": 6.800760269165039,
+      "learning_rate": 0.0001953959437134095,
+      "loss": 1.789,
+      "step": 141
+    },
+    {
+      "epoch": 0.02784586724188646,
+      "grad_norm": 6.912458896636963,
+      "learning_rate": 0.00019532593932369849,
+      "loss": 2.2544,
+      "step": 142
+    },
+    {
+      "epoch": 0.028041964898519463,
+      "grad_norm": 6.928767681121826,
+      "learning_rate": 0.00019525541946808188,
+      "loss": 1.2531,
+      "step": 143
+    },
+    {
+      "epoch": 0.028238062555152467,
+      "grad_norm": 8.96179485321045,
+      "learning_rate": 0.00019518438452788907,
+      "loss": 2.3403,
+      "step": 144
+    },
+    {
+      "epoch": 0.028434160211785468,
+      "grad_norm": 7.004507064819336,
+      "learning_rate": 0.00019511283488723473,
+      "loss": 1.1211,
+      "step": 145
+    },
+    {
+      "epoch": 0.028630257868418472,
+      "grad_norm": 8.655360221862793,
+      "learning_rate": 0.00019504077093301665,
+      "loss": 1.6074,
+      "step": 146
+    },
+    {
+      "epoch": 0.028826355525051476,
+      "grad_norm": 7.188142776489258,
+      "learning_rate": 0.00019496819305491383,
+      "loss": 1.3564,
+      "step": 147
+    },
+    {
+      "epoch": 0.02902245318168448,
+      "grad_norm": 5.618204116821289,
+      "learning_rate": 0.00019489510164538416,
+      "loss": 2.5936,
+      "step": 148
+    },
+    {
+      "epoch": 0.02921855083831748,
+      "grad_norm": 6.578945159912109,
+      "learning_rate": 0.00019482149709966246,
+      "loss": 0.9577,
+      "step": 149
+    },
+    {
+      "epoch": 0.029414648494950485,
+      "grad_norm": 11.468759536743164,
+      "learning_rate": 0.00019474737981575832,
+      "loss": 2.1746,
+      "step": 150
+    },
+    {
+      "epoch": 0.02961074615158349,
+      "grad_norm": 5.663360595703125,
+      "learning_rate": 0.00019467275019445385,
+      "loss": 1.2751,
+      "step": 151
+    },
+    {
+      "epoch": 0.029806843808216493,
+      "grad_norm": 6.297888278961182,
+      "learning_rate": 0.00019459760863930155,
+      "loss": 1.574,
+      "step": 152
+    },
+    {
+      "epoch": 0.030002941464849494,
+      "grad_norm": 8.760517120361328,
+      "learning_rate": 0.00019452195555662224,
+      "loss": 1.1148,
+      "step": 153
+    },
+    {
+      "epoch": 0.030199039121482498,
+      "grad_norm": 8.745152473449707,
+      "learning_rate": 0.00019444579135550273,
+      "loss": 1.4212,
+      "step": 154
+    },
+    {
+      "epoch": 0.030395136778115502,
+      "grad_norm": 7.594636917114258,
+      "learning_rate": 0.00019436911644779366,
+      "loss": 0.9161,
+      "step": 155
+    },
+    {
+      "epoch": 0.030591234434748506,
+      "grad_norm": 8.563089370727539,
+      "learning_rate": 0.00019429193124810725,
+      "loss": 1.5844,
+      "step": 156
+    },
+    {
+      "epoch": 0.030787332091381507,
+      "grad_norm": 16.4445743560791,
+      "learning_rate": 0.00019421423617381508,
+      "loss": 1.4798,
+      "step": 157
+    },
+    {
+      "epoch": 0.03098342974801451,
+      "grad_norm": 6.86074161529541,
+      "learning_rate": 0.00019413603164504588,
+      "loss": 2.3323,
+      "step": 158
+    },
+    {
+      "epoch": 0.031179527404647515,
+      "grad_norm": 11.293970108032227,
+      "learning_rate": 0.0001940573180846831,
+      "loss": 2.0304,
+      "step": 159
+    },
+    {
+      "epoch": 0.03137562506128052,
+      "grad_norm": 9.772544860839844,
+      "learning_rate": 0.00019397809591836286,
+      "loss": 3.1622,
+      "step": 160
+    },
+    {
+      "epoch": 0.031571722717913524,
+      "grad_norm": 8.489032745361328,
+      "learning_rate": 0.00019389836557447143,
+      "loss": 1.0113,
+      "step": 161
+    },
+    {
+      "epoch": 0.03176782037454652,
+      "grad_norm": 12.992219924926758,
+      "learning_rate": 0.000193818127484143,
+      "loss": 2.5478,
+      "step": 162
+    },
+    {
+      "epoch": 0.031963918031179525,
+      "grad_norm": 6.758875846862793,
+      "learning_rate": 0.0001937373820812574,
+      "loss": 1.8935,
+      "step": 163
+    },
+    {
+      "epoch": 0.03216001568781253,
+      "grad_norm": 11.88957405090332,
+      "learning_rate": 0.0001936561298024377,
+      "loss": 3.1899,
+      "step": 164
+    },
+    {
+      "epoch": 0.03235611334444553,
+      "grad_norm": 8.777862548828125,
+      "learning_rate": 0.00019357437108704777,
+      "loss": 2.7038,
+      "step": 165
+    },
+    {
+      "epoch": 0.03255221100107854,
+      "grad_norm": 5.135364532470703,
+      "learning_rate": 0.0001934921063771901,
+      "loss": 0.7644,
+      "step": 166
+    },
+    {
+      "epoch": 0.03274830865771154,
+      "grad_norm": 9.942065238952637,
+      "learning_rate": 0.00019340933611770321,
+      "loss": 1.4148,
+      "step": 167
+    },
+    {
+      "epoch": 0.032944406314344546,
+      "grad_norm": 7.881731033325195,
+      "learning_rate": 0.0001933260607561594,
+      "loss": 1.7402,
+      "step": 168
+    },
+    {
+      "epoch": 0.03314050397097755,
+      "grad_norm": 6.189711570739746,
+      "learning_rate": 0.00019324228074286222,
+      "loss": 1.642,
+      "step": 169
+    },
+    {
+      "epoch": 0.03333660162761055,
+      "grad_norm": 4.979406356811523,
+      "learning_rate": 0.00019315799653084404,
+      "loss": 2.425,
+      "step": 170
+    },
+    {
+      "epoch": 0.03353269928424355,
+      "grad_norm": 8.461871147155762,
+      "learning_rate": 0.00019307320857586376,
+      "loss": 2.6563,
+      "step": 171
+    },
+    {
+      "epoch": 0.033728796940876556,
+      "grad_norm": 8.992694854736328,
+      "learning_rate": 0.00019298791733640406,
+      "loss": 1.7962,
+      "step": 172
+    },
+    {
+      "epoch": 0.03392489459750956,
+      "grad_norm": 8.074629783630371,
+      "learning_rate": 0.00019290212327366924,
+      "loss": 1.5342,
+      "step": 173
+    },
+    {
+      "epoch": 0.034120992254142564,
+      "grad_norm": 6.5658111572265625,
+      "learning_rate": 0.00019281582685158247,
+      "loss": 0.7919,
+      "step": 174
+    },
+    {
+      "epoch": 0.03431708991077557,
+      "grad_norm": 9.946451187133789,
+      "learning_rate": 0.00019272902853678336,
+      "loss": 1.5664,
+      "step": 175
+    },
+    {
+      "epoch": 0.03451318756740857,
+      "grad_norm": 6.764862060546875,
+      "learning_rate": 0.00019264172879862552,
+      "loss": 2.1083,
+      "step": 176
+    },
+    {
+      "epoch": 0.03470928522404157,
+      "grad_norm": 11.195146560668945,
+      "learning_rate": 0.000192553928109174,
+      "loss": 1.3371,
+      "step": 177
+    },
+    {
+      "epoch": 0.034905382880674574,
+      "grad_norm": 6.352316856384277,
+      "learning_rate": 0.00019246562694320255,
+      "loss": 2.543,
+      "step": 178
+    },
+    {
+      "epoch": 0.03510148053730758,
+      "grad_norm": 4.030996799468994,
+      "learning_rate": 0.00019237682577819137,
+      "loss": 0.9273,
+      "step": 179
+    },
+    {
+      "epoch": 0.03529757819394058,
+      "grad_norm": 6.009339809417725,
+      "learning_rate": 0.00019228752509432417,
+      "loss": 2.1444,
+      "step": 180
+    },
+    {
+      "epoch": 0.035493675850573586,
+      "grad_norm": 14.911331176757812,
+      "learning_rate": 0.00019219772537448597,
+      "loss": 1.5989,
+      "step": 181
+    },
+    {
+      "epoch": 0.03568977350720659,
+      "grad_norm": 6.454592227935791,
+      "learning_rate": 0.00019210742710426012,
+      "loss": 1.0608,
+      "step": 182
+    },
+    {
+      "epoch": 0.035885871163839594,
+      "grad_norm": 11.354242324829102,
+      "learning_rate": 0.00019201663077192586,
+      "loss": 1.7558,
+      "step": 183
+    },
+    {
+      "epoch": 0.0360819688204726,
+      "grad_norm": 5.804329872131348,
+      "learning_rate": 0.0001919253368684557,
+      "loss": 1.411,
+      "step": 184
+    },
+    {
+      "epoch": 0.036278066477105596,
+      "grad_norm": 6.735583305358887,
+      "learning_rate": 0.00019183354588751271,
+      "loss": 2.4473,
+      "step": 185
+    },
+    {
+      "epoch": 0.0364741641337386,
+      "grad_norm": 9.169321060180664,
+      "learning_rate": 0.00019174125832544786,
+      "loss": 1.8947,
+      "step": 186
+    },
+    {
+      "epoch": 0.036670261790371604,
+      "grad_norm": 3.465175151824951,
+      "learning_rate": 0.0001916484746812973,
+      "loss": 1.1306,
+      "step": 187
+    },
+    {
+      "epoch": 0.03686635944700461,
+      "grad_norm": 10.71740436553955,
+      "learning_rate": 0.0001915551954567797,
+      "loss": 1.0832,
+      "step": 188
+    },
+    {
+      "epoch": 0.03706245710363761,
+      "grad_norm": 10.946560859680176,
+      "learning_rate": 0.0001914614211562936,
+      "loss": 1.9391,
+      "step": 189
+    },
+    {
+      "epoch": 0.03725855476027062,
+      "grad_norm": 6.847762584686279,
+      "learning_rate": 0.0001913671522869145,
+      "loss": 2.197,
+      "step": 190
+    },
+    {
+      "epoch": 0.03745465241690362,
+      "grad_norm": 13.330089569091797,
+      "learning_rate": 0.00019127238935839235,
+      "loss": 2.3539,
+      "step": 191
+    },
+    {
+      "epoch": 0.03765075007353662,
+      "grad_norm": 22.94158172607422,
+      "learning_rate": 0.00019117713288314863,
+      "loss": 2.868,
+      "step": 192
+    },
+    {
+      "epoch": 0.03784684773016962,
+      "grad_norm": 7.080881595611572,
+      "learning_rate": 0.00019108138337627358,
+      "loss": 1.7925,
+      "step": 193
+    },
+    {
+      "epoch": 0.038042945386802626,
+      "grad_norm": 4.726489067077637,
+      "learning_rate": 0.00019098514135552357,
+      "loss": 1.008,
+      "step": 194
+    },
+    {
+      "epoch": 0.03823904304343563,
+      "grad_norm": 5.7414870262146,
+      "learning_rate": 0.00019088840734131807,
+      "loss": 0.8934,
+      "step": 195
+    },
+    {
+      "epoch": 0.038435140700068635,
+      "grad_norm": 6.781312465667725,
+      "learning_rate": 0.00019079118185673705,
+      "loss": 1.6637,
+      "step": 196
+    },
+    {
+      "epoch": 0.03863123835670164,
+      "grad_norm": 12.264861106872559,
+      "learning_rate": 0.00019069346542751803,
+      "loss": 1.6055,
+      "step": 197
+    },
+    {
+      "epoch": 0.03882733601333464,
+      "grad_norm": 8.975689888000488,
+      "learning_rate": 0.00019059525858205323,
+      "loss": 2.6467,
+      "step": 198
+    },
+    {
+      "epoch": 0.03902343366996765,
+      "grad_norm": 9.027484893798828,
+      "learning_rate": 0.0001904965618513868,
+      "loss": 1.9813,
+      "step": 199
+    },
+    {
+      "epoch": 0.039219531326600644,
+      "grad_norm": 9.491011619567871,
+      "learning_rate": 0.0001903973757692119,
+      "loss": 1.4225,
+      "step": 200
+    },
+    {
+      "epoch": 0.03941562898323365,
+      "grad_norm": 5.202075481414795,
+      "learning_rate": 0.00019029770087186773,
+      "loss": 1.3524,
+      "step": 201
+    },
+    {
+      "epoch": 0.03961172663986665,
+      "grad_norm": 11.139556884765625,
+      "learning_rate": 0.00019019753769833678,
+      "loss": 2.0723,
+      "step": 202
+    },
+    {
+      "epoch": 0.03980782429649966,
+      "grad_norm": 11.383284568786621,
+      "learning_rate": 0.0001900968867902419,
+      "loss": 1.5843,
+      "step": 203
+    },
+    {
+      "epoch": 0.04000392195313266,
+      "grad_norm": 7.112687110900879,
+      "learning_rate": 0.00018999574869184324,
+      "loss": 1.3899,
+      "step": 204
+    },
+    {
+      "epoch": 0.040200019609765665,
+      "grad_norm": 13.50672721862793,
+      "learning_rate": 0.00018989412395003537,
+      "loss": 1.7484,
+      "step": 205
+    },
+    {
+      "epoch": 0.04039611726639867,
+      "grad_norm": 13.107057571411133,
+      "learning_rate": 0.00018979201311434434,
+      "loss": 1.6412,
+      "step": 206
+    },
+    {
+      "epoch": 0.040592214923031666,
+      "grad_norm": 11.221402168273926,
+      "learning_rate": 0.0001896894167369248,
+      "loss": 2.608,
+      "step": 207
+    },
+    {
+      "epoch": 0.04078831257966467,
+      "grad_norm": 4.945010662078857,
+      "learning_rate": 0.0001895863353725568,
+      "loss": 1.3582,
+      "step": 208
+    },
+    {
+      "epoch": 0.040984410236297675,
+      "grad_norm": 15.066801071166992,
+      "learning_rate": 0.00018948276957864299,
+      "loss": 1.7296,
+      "step": 209
+    },
+    {
+      "epoch": 0.04118050789293068,
+      "grad_norm": 7.497617244720459,
+      "learning_rate": 0.0001893787199152055,
+      "loss": 1.4961,
+      "step": 210
+    },
+    {
+      "epoch": 0.04137660554956368,
+      "grad_norm": 4.299473762512207,
+      "learning_rate": 0.00018927418694488296,
+      "loss": 1.7403,
+      "step": 211
+    },
+    {
+      "epoch": 0.04157270320619669,
+      "grad_norm": 6.988886833190918,
+      "learning_rate": 0.00018916917123292738,
+      "loss": 2.6546,
+      "step": 212
+    },
+    {
+      "epoch": 0.04176880086282969,
+      "grad_norm": 9.461721420288086,
+      "learning_rate": 0.00018906367334720124,
+      "loss": 2.067,
+      "step": 213
+    },
+    {
+      "epoch": 0.041964898519462696,
+      "grad_norm": 4.021213054656982,
+      "learning_rate": 0.0001889576938581742,
+      "loss": 1.0274,
+      "step": 214
+    },
+    {
+      "epoch": 0.04216099617609569,
+      "grad_norm": 7.254751205444336,
+      "learning_rate": 0.00018885123333892026,
+      "loss": 1.7091,
+      "step": 215
+    },
+    {
+      "epoch": 0.0423570938327287,
+      "grad_norm": 10.783350944519043,
+      "learning_rate": 0.00018874429236511448,
+      "loss": 1.3779,
+      "step": 216
+    },
+    {
+      "epoch": 0.0425531914893617,
+      "grad_norm": 16.38484764099121,
+      "learning_rate": 0.00018863687151503,
+      "loss": 2.8516,
+      "step": 217
+    },
+    {
+      "epoch": 0.042749289145994705,
+      "grad_norm": 5.848017692565918,
+      "learning_rate": 0.00018852897136953473,
+      "loss": 1.6814,
+      "step": 218
+    },
+    {
+      "epoch": 0.04294538680262771,
+      "grad_norm": 5.781267166137695,
+      "learning_rate": 0.00018842059251208845,
+      "loss": 2.1672,
+      "step": 219
+    },
+    {
+      "epoch": 0.043141484459260714,
+      "grad_norm": 6.244543552398682,
+      "learning_rate": 0.00018831173552873946,
+      "loss": 0.6934,
+      "step": 220
+    },
+    {
+      "epoch": 0.04333758211589372,
+      "grad_norm": 6.526815414428711,
+      "learning_rate": 0.0001882024010081215,
+      "loss": 1.6071,
+      "step": 221
+    },
+    {
+      "epoch": 0.043533679772526715,
+      "grad_norm": 6.064664363861084,
+      "learning_rate": 0.00018809258954145052,
+      "loss": 1.8964,
+      "step": 222
+    },
+    {
+      "epoch": 0.04372977742915972,
+      "grad_norm": 6.710343837738037,
+      "learning_rate": 0.0001879823017225215,
+      "loss": 2.063,
+      "step": 223
+    },
+    {
+      "epoch": 0.04392587508579272,
+      "grad_norm": 9.060622215270996,
+      "learning_rate": 0.00018787153814770537,
+      "loss": 1.568,
+      "step": 224
+    },
+    {
+      "epoch": 0.04412197274242573,
+      "grad_norm": 4.686062335968018,
+      "learning_rate": 0.00018776029941594552,
+      "loss": 1.1178,
+      "step": 225
+    },
+    {
+      "epoch": 0.04431807039905873,
+      "grad_norm": 6.257881164550781,
+      "learning_rate": 0.00018764858612875472,
+      "loss": 2.1195,
+      "step": 226
+    },
+    {
+      "epoch": 0.044514168055691736,
+      "grad_norm": 9.814234733581543,
+      "learning_rate": 0.00018753639889021196,
+      "loss": 1.1679,
+      "step": 227
+    },
+    {
+      "epoch": 0.04471026571232474,
+      "grad_norm": 13.946937561035156,
+      "learning_rate": 0.00018742373830695898,
+      "loss": 1.6899,
+      "step": 228
+    },
+    {
+      "epoch": 0.044906363368957744,
+      "grad_norm": 6.277964115142822,
+      "learning_rate": 0.0001873106049881971,
+      "loss": 1.5076,
+      "step": 229
+    },
+    {
+      "epoch": 0.04510246102559074,
+      "grad_norm": 14.092109680175781,
+      "learning_rate": 0.00018719699954568398,
+      "loss": 1.9726,
+      "step": 230
+    },
+    {
+      "epoch": 0.045298558682223745,
+      "grad_norm": 6.5708231925964355,
+      "learning_rate": 0.00018708292259373015,
+      "loss": 0.6623,
+      "step": 231
+    },
+    {
+      "epoch": 0.04549465633885675,
+      "grad_norm": 6.839974880218506,
+      "learning_rate": 0.00018696837474919582,
+      "loss": 1.8836,
+      "step": 232
+    },
+    {
+      "epoch": 0.045690753995489754,
+      "grad_norm": 17.919775009155273,
+      "learning_rate": 0.00018685335663148753,
+      "loss": 2.2343,
+      "step": 233
+    },
+    {
+      "epoch": 0.04588685165212276,
+      "grad_norm": 7.140262603759766,
+      "learning_rate": 0.00018673786886255476,
+      "loss": 1.9653,
+      "step": 234
+    },
+    {
+      "epoch": 0.04608294930875576,
+      "grad_norm": 11.049707412719727,
+      "learning_rate": 0.00018662191206688658,
+      "loss": 1.8658,
+      "step": 235
+    },
+    {
+      "epoch": 0.046279046965388766,
+      "grad_norm": 5.617869853973389,
+      "learning_rate": 0.00018650548687150823,
+      "loss": 1.862,
+      "step": 236
+    },
+    {
+      "epoch": 0.04647514462202176,
+      "grad_norm": 6.494004726409912,
+      "learning_rate": 0.00018638859390597792,
+      "loss": 3.787,
+      "step": 237
+    },
+    {
+      "epoch": 0.04667124227865477,
+      "grad_norm": 17.628082275390625,
+      "learning_rate": 0.00018627123380238314,
+      "loss": 1.9129,
+      "step": 238
+    },
+    {
+      "epoch": 0.04686733993528777,
+      "grad_norm": 5.634758949279785,
+      "learning_rate": 0.0001861534071953374,
+      "loss": 1.4702,
+      "step": 239
+    },
+    {
+      "epoch": 0.047063437591920776,
+      "grad_norm": 20.99329948425293,
+      "learning_rate": 0.00018603511472197685,
+      "loss": 2.2196,
+      "step": 240
+    },
+    {
+      "epoch": 0.04725953524855378,
+      "grad_norm": 6.650235652923584,
+      "learning_rate": 0.00018591635702195673,
+      "loss": 0.986,
+      "step": 241
+    },
+    {
+      "epoch": 0.047455632905186784,
+      "grad_norm": 4.837536334991455,
+      "learning_rate": 0.00018579713473744795,
+      "loss": 1.2033,
+      "step": 242
+    },
+    {
+      "epoch": 0.04765173056181979,
+      "grad_norm": 9.390655517578125,
+      "learning_rate": 0.00018567744851313362,
+      "loss": 1.7356,
+      "step": 243
+    },
+    {
+      "epoch": 0.04784782821845279,
+      "grad_norm": 8.881529808044434,
+      "learning_rate": 0.0001855572989962056,
+      "loss": 1.0063,
+      "step": 244
+    },
+    {
+      "epoch": 0.04804392587508579,
+      "grad_norm": 6.325936317443848,
+      "learning_rate": 0.00018543668683636085,
+      "loss": 1.5957,
+      "step": 245
+    },
+    {
+      "epoch": 0.048240023531718794,
+      "grad_norm": 10.866408348083496,
+      "learning_rate": 0.0001853156126857981,
+      "loss": 2.0472,
+      "step": 246
+    },
+    {
+      "epoch": 0.0484361211883518,
+      "grad_norm": 6.741912364959717,
+      "learning_rate": 0.00018519407719921427,
+      "loss": 1.6029,
+      "step": 247
+    },
+    {
+      "epoch": 0.0486322188449848,
+      "grad_norm": 5.979978561401367,
+      "learning_rate": 0.00018507208103380092,
+      "loss": 0.9353,
+      "step": 248
+    },
+    {
+      "epoch": 0.048828316501617806,
+      "grad_norm": 6.375998020172119,
+      "learning_rate": 0.00018494962484924058,
+      "loss": 2.5973,
+      "step": 249
+    },
+    {
+      "epoch": 0.04902441415825081,
+      "grad_norm": 8.528829574584961,
+      "learning_rate": 0.00018482670930770342,
+      "loss": 1.1618,
+      "step": 250
+    },
+    {
+      "epoch": 0.049220511814883815,
+      "grad_norm": 7.874660491943359,
+      "learning_rate": 0.0001847033350738435,
+      "loss": 1.3043,
+      "step": 251
+    },
+    {
+      "epoch": 0.04941660947151681,
+      "grad_norm": 4.421433925628662,
+      "learning_rate": 0.00018457950281479513,
+      "loss": 2.0768,
+      "step": 252
+    },
+    {
+      "epoch": 0.049612707128149816,
+      "grad_norm": 8.347919464111328,
+      "learning_rate": 0.00018445521320016944,
+      "loss": 1.0983,
+      "step": 253
+    },
+    {
+      "epoch": 0.04980880478478282,
+      "grad_norm": 6.713651657104492,
+      "learning_rate": 0.00018433046690205068,
+      "loss": 0.9891,
+      "step": 254
+    },
+    {
+      "epoch": 0.050004902441415824,
+      "grad_norm": 12.359843254089355,
+      "learning_rate": 0.0001842052645949925,
+      "loss": 2.5271,
+      "step": 255
+    },
+    {
+      "epoch": 0.05020100009804883,
+      "grad_norm": 4.7271199226379395,
+      "learning_rate": 0.00018407960695601442,
+      "loss": 1.394,
+      "step": 256
+    },
+    {
+      "epoch": 0.05039709775468183,
+      "grad_norm": 5.810708522796631,
+      "learning_rate": 0.0001839534946645981,
+      "loss": 1.7354,
+      "step": 257
+    },
+    {
+      "epoch": 0.05059319541131484,
+      "grad_norm": 28.575908660888672,
+      "learning_rate": 0.00018382692840268367,
+      "loss": 3.4793,
+      "step": 258
+    },
+    {
+      "epoch": 0.05078929306794784,
+      "grad_norm": 5.775376319885254,
+      "learning_rate": 0.00018369990885466617,
+      "loss": 1.4695,
+      "step": 259
+    },
+    {
+      "epoch": 0.05098539072458084,
+      "grad_norm": 7.531515121459961,
+      "learning_rate": 0.0001835724367073916,
+      "loss": 2.0772,
+      "step": 260
+    },
+    {
+      "epoch": 0.05118148838121384,
+      "grad_norm": 5.099686145782471,
+      "learning_rate": 0.00018344451265015348,
+      "loss": 3.2733,
+      "step": 261
+    },
+    {
+      "epoch": 0.05137758603784685,
+      "grad_norm": 5.558218479156494,
+      "learning_rate": 0.00018331613737468887,
+      "loss": 1.7578,
+      "step": 262
+    },
+    {
+      "epoch": 0.05157368369447985,
+      "grad_norm": 7.837512016296387,
+      "learning_rate": 0.00018318731157517478,
+      "loss": 2.7413,
+      "step": 263
+    },
+    {
+      "epoch": 0.051769781351112855,
+      "grad_norm": 9.013374328613281,
+      "learning_rate": 0.00018305803594822448,
+      "loss": 1.7722,
+      "step": 264
+    },
+    {
+      "epoch": 0.05196587900774586,
+      "grad_norm": 7.108828067779541,
+      "learning_rate": 0.00018292831119288348,
+      "loss": 1.31,
+      "step": 265
+    },
+    {
+      "epoch": 0.05216197666437886,
+      "grad_norm": 6.387202739715576,
+      "learning_rate": 0.0001827981380106261,
+      "loss": 1.0588,
+      "step": 266
+    },
+    {
+      "epoch": 0.05235807432101187,
+      "grad_norm": 13.306784629821777,
+      "learning_rate": 0.00018266751710535131,
+      "loss": 2.6092,
+      "step": 267
+    },
+    {
+      "epoch": 0.052554171977644865,
+      "grad_norm": 7.774232864379883,
+      "learning_rate": 0.00018253644918337915,
+      "loss": 1.2318,
+      "step": 268
+    },
+    {
+      "epoch": 0.05275026963427787,
+      "grad_norm": 5.396208763122559,
+      "learning_rate": 0.00018240493495344694,
+      "loss": 1.2144,
+      "step": 269
+    },
+    {
+      "epoch": 0.05294636729091087,
+      "grad_norm": 13.393839836120605,
+      "learning_rate": 0.0001822729751267053,
+      "loss": 2.4278,
+      "step": 270
+    },
+    {
+      "epoch": 0.05314246494754388,
+      "grad_norm": 14.873156547546387,
+      "learning_rate": 0.00018214057041671434,
+      "loss": 2.4015,
+      "step": 271
+    },
+    {
+      "epoch": 0.05333856260417688,
+      "grad_norm": 7.224187850952148,
+      "learning_rate": 0.00018200772153943988,
+      "loss": 1.578,
+      "step": 272
+    },
+    {
+      "epoch": 0.053534660260809885,
+      "grad_norm": 16.03417205810547,
+      "learning_rate": 0.00018187442921324958,
+      "loss": 2.4634,
+      "step": 273
+    },
+    {
+      "epoch": 0.05373075791744289,
+      "grad_norm": 6.284254550933838,
+      "learning_rate": 0.00018174069415890888,
+      "loss": 1.1995,
+      "step": 274
+    },
+    {
+      "epoch": 0.05392685557407589,
+      "grad_norm": 5.986271858215332,
+      "learning_rate": 0.00018160651709957736,
+      "loss": 1.8718,
+      "step": 275
+    },
+    {
+      "epoch": 0.05412295323070889,
+      "grad_norm": 13.663339614868164,
+      "learning_rate": 0.00018147189876080463,
+      "loss": 1.8664,
+      "step": 276
+    },
+    {
+      "epoch": 0.054319050887341895,
+      "grad_norm": 8.843401908874512,
+      "learning_rate": 0.0001813368398705265,
+      "loss": 4.1059,
+      "step": 277
+    },
+    {
+      "epoch": 0.0545151485439749,
+      "grad_norm": 6.658843994140625,
+      "learning_rate": 0.00018120134115906096,
+      "loss": 0.8269,
+      "step": 278
+    },
+    {
+      "epoch": 0.0547112462006079,
+      "grad_norm": 14.153685569763184,
+      "learning_rate": 0.0001810654033591044,
+      "loss": 1.9537,
+      "step": 279
+    },
+    {
+      "epoch": 0.05490734385724091,
+      "grad_norm": 5.585552215576172,
+      "learning_rate": 0.00018092902720572745,
+      "loss": 1.2531,
+      "step": 280
+    },
+    {
+      "epoch": 0.05510344151387391,
+      "grad_norm": 6.414156436920166,
+      "learning_rate": 0.00018079221343637113,
+      "loss": 0.9456,
+      "step": 281
+    },
+    {
+      "epoch": 0.055299539170506916,
+      "grad_norm": 6.674518585205078,
+      "learning_rate": 0.00018065496279084283,
+      "loss": 1.1899,
+      "step": 282
+    },
+    {
+      "epoch": 0.05549563682713991,
+      "grad_norm": 16.878883361816406,
+      "learning_rate": 0.00018051727601131227,
+      "loss": 0.8761,
+      "step": 283
+    },
+    {
+      "epoch": 0.05569173448377292,
+      "grad_norm": 31.99020004272461,
+      "learning_rate": 0.0001803791538423076,
+      "loss": 1.7235,
+      "step": 284
+    },
+    {
+      "epoch": 0.05588783214040592,
+      "grad_norm": 4.821253776550293,
+      "learning_rate": 0.0001802405970307112,
+      "loss": 1.2902,
+      "step": 285
+    },
+    {
+      "epoch": 0.056083929797038926,
+      "grad_norm": 7.2293267250061035,
+      "learning_rate": 0.00018010160632575577,
+      "loss": 1.0864,
+      "step": 286
+    },
+    {
+      "epoch": 0.05628002745367193,
+      "grad_norm": 8.262434959411621,
+      "learning_rate": 0.00017996218247902035,
+      "loss": 1.6246,
+      "step": 287
+    },
+    {
+      "epoch": 0.056476125110304934,
+      "grad_norm": 7.845222473144531,
+      "learning_rate": 0.00017982232624442595,
+      "loss": 1.7249,
+      "step": 288
+    },
+    {
+      "epoch": 0.05667222276693794,
+      "grad_norm": 9.781503677368164,
+      "learning_rate": 0.0001796820383782319,
+      "loss": 1.9911,
+      "step": 289
+    },
+    {
+      "epoch": 0.056868320423570935,
+      "grad_norm": 10.7435884475708,
+      "learning_rate": 0.00017954131963903133,
+      "loss": 1.4689,
+      "step": 290
+    },
+    {
+      "epoch": 0.05706441808020394,
+      "grad_norm": 8.709835052490234,
+      "learning_rate": 0.00017940017078774747,
+      "loss": 2.3939,
+      "step": 291
+    },
+    {
+      "epoch": 0.057260515736836944,
+      "grad_norm": 6.140249252319336,
+      "learning_rate": 0.00017925859258762915,
+      "loss": 2.0753,
+      "step": 292
+    },
+    {
+      "epoch": 0.05745661339346995,
+      "grad_norm": 9.725993156433105,
+      "learning_rate": 0.00017911658580424704,
+      "loss": 1.5315,
+      "step": 293
+    },
+    {
+      "epoch": 0.05765271105010295,
+      "grad_norm": 20.413393020629883,
+      "learning_rate": 0.00017897415120548917,
+      "loss": 2.4083,
+      "step": 294
+    },
+    {
+      "epoch": 0.057848808706735956,
+      "grad_norm": 7.881735324859619,
+      "learning_rate": 0.00017883128956155706,
+      "loss": 3.1099,
+      "step": 295
+    },
+    {
+      "epoch": 0.05804490636336896,
+      "grad_norm": 8.856921195983887,
+      "learning_rate": 0.0001786880016449614,
+      "loss": 0.9566,
+      "step": 296
+    },
+    {
+      "epoch": 0.058241004020001964,
+      "grad_norm": 25.80061912536621,
+      "learning_rate": 0.0001785442882305179,
+      "loss": 2.6205,
+      "step": 297
+    },
+    {
+      "epoch": 0.05843710167663496,
+      "grad_norm": 13.293787956237793,
+      "learning_rate": 0.00017840015009534308,
+      "loss": 1.7317,
+      "step": 298
+    },
+    {
+      "epoch": 0.058633199333267966,
+      "grad_norm": 11.313176155090332,
+      "learning_rate": 0.00017825558801885016,
+      "loss": 1.5122,
+      "step": 299
+    },
+    {
+      "epoch": 0.05882929698990097,
+      "grad_norm": 9.636054039001465,
+      "learning_rate": 0.00017811060278274474,
+      "loss": 2.1976,
+      "step": 300
+    },
+    {
+      "epoch": 0.059025394646533974,
+      "grad_norm": 11.10985279083252,
+      "learning_rate": 0.00017796519517102066,
+      "loss": 2.5926,
+      "step": 301
+    },
+    {
+      "epoch": 0.05922149230316698,
+      "grad_norm": 6.27458381652832,
+      "learning_rate": 0.00017781936596995563,
+      "loss": 2.5326,
+      "step": 302
+    },
+    {
+      "epoch": 0.05941758995979998,
+      "grad_norm": 10.134029388427734,
+      "learning_rate": 0.00017767311596810715,
+      "loss": 1.4142,
+      "step": 303
+    },
+    {
+      "epoch": 0.059613687616432987,
+      "grad_norm": 10.580477714538574,
+      "learning_rate": 0.0001775264459563081,
+      "loss": 2.3389,
+      "step": 304
+    },
+    {
+      "epoch": 0.059809785273065984,
+      "grad_norm": 7.545969009399414,
+      "learning_rate": 0.00017737935672766257,
+      "loss": 2.3728,
+      "step": 305
+    },
+    {
+      "epoch": 0.06000588292969899,
+      "grad_norm": 6.5107622146606445,
+      "learning_rate": 0.00017723184907754154,
+      "loss": 1.0087,
+      "step": 306
+    },
+    {
+      "epoch": 0.06020198058633199,
+      "grad_norm": 10.663747787475586,
+      "learning_rate": 0.00017708392380357845,
+      "loss": 2.2001,
+      "step": 307
+    },
+    {
+      "epoch": 0.060398078242964996,
+      "grad_norm": 10.211285591125488,
+      "learning_rate": 0.0001769355817056651,
+      "loss": 2.2537,
+      "step": 308
+    },
+    {
+      "epoch": 0.060594175899598,
+      "grad_norm": 8.512606620788574,
+      "learning_rate": 0.00017678682358594728,
+      "loss": 1.0687,
+      "step": 309
+    },
+    {
+      "epoch": 0.060790273556231005,
+      "grad_norm": 10.64330768585205,
+      "learning_rate": 0.0001766376502488202,
+      "loss": 2.4672,
+      "step": 310
+    },
+    {
+      "epoch": 0.06098637121286401,
+      "grad_norm": 9.972363471984863,
+      "learning_rate": 0.0001764880625009245,
+      "loss": 2.2277,
+      "step": 311
+    },
+    {
+      "epoch": 0.06118246886949701,
+      "grad_norm": 10.01198959350586,
+      "learning_rate": 0.0001763380611511416,
+      "loss": 1.906,
+      "step": 312
+    },
+    {
+      "epoch": 0.06137856652613001,
+      "grad_norm": 11.719582557678223,
+      "learning_rate": 0.00017618764701058949,
+      "loss": 2.4849,
+      "step": 313
+    },
+    {
+      "epoch": 0.061574664182763014,
+      "grad_norm": 9.08901309967041,
+      "learning_rate": 0.0001760368208926182,
+      "loss": 1.4476,
+      "step": 314
+    },
+    {
+      "epoch": 0.06177076183939602,
+      "grad_norm": 11.74946403503418,
+      "learning_rate": 0.00017588558361280557,
+      "loss": 1.3607,
+      "step": 315
+    },
+    {
+      "epoch": 0.06196685949602902,
+      "grad_norm": 6.549604892730713,
+      "learning_rate": 0.00017573393598895276,
+      "loss": 2.066,
+      "step": 316
+    },
+    {
+      "epoch": 0.06216295715266203,
+      "grad_norm": 5.160909652709961,
+      "learning_rate": 0.00017558187884107978,
+      "loss": 0.9799,
+      "step": 317
+    },
+    {
+      "epoch": 0.06235905480929503,
+      "grad_norm": 11.00747299194336,
+      "learning_rate": 0.00017542941299142112,
+      "loss": 2.3193,
+      "step": 318
+    },
+    {
+      "epoch": 0.06255515246592804,
+      "grad_norm": 4.467062473297119,
+      "learning_rate": 0.00017527653926442135,
+      "loss": 1.9208,
+      "step": 319
+    },
+    {
+      "epoch": 0.06275125012256104,
+      "grad_norm": 11.554845809936523,
+      "learning_rate": 0.00017512325848673043,
+      "loss": 1.5197,
+      "step": 320
+    },
+    {
+      "epoch": 0.06294734777919404,
+      "grad_norm": 6.626307964324951,
+      "learning_rate": 0.0001749695714871996,
+      "loss": 2.0222,
+      "step": 321
+    },
+    {
+      "epoch": 0.06314344543582705,
+      "grad_norm": 5.357675075531006,
+      "learning_rate": 0.00017481547909687658,
+      "loss": 2.3264,
+      "step": 322
+    },
+    {
+      "epoch": 0.06333954309246005,
+      "grad_norm": 9.451851844787598,
+      "learning_rate": 0.00017466098214900124,
+      "loss": 1.5699,
+      "step": 323
+    },
+    {
+      "epoch": 0.06353564074909304,
+      "grad_norm": 5.937804698944092,
+      "learning_rate": 0.00017450608147900106,
+      "loss": 1.832,
+      "step": 324
+    },
+    {
+      "epoch": 0.06373173840572605,
+      "grad_norm": 10.744606971740723,
+      "learning_rate": 0.00017435077792448664,
+      "loss": 1.4665,
+      "step": 325
+    },
+    {
+      "epoch": 0.06392783606235905,
+      "grad_norm": 9.900199890136719,
+      "learning_rate": 0.0001741950723252471,
+      "loss": 1.4785,
+      "step": 326
+    },
+    {
+      "epoch": 0.06412393371899205,
+      "grad_norm": 13.663531303405762,
+      "learning_rate": 0.00017403896552324553,
+      "loss": 0.9564,
+      "step": 327
+    },
+    {
+      "epoch": 0.06432003137562506,
+      "grad_norm": 4.708930015563965,
+      "learning_rate": 0.00017388245836261464,
+      "loss": 0.8293,
+      "step": 328
+    },
+    {
+      "epoch": 0.06451612903225806,
+      "grad_norm": 6.656357288360596,
+      "learning_rate": 0.00017372555168965184,
+      "loss": 2.2177,
+      "step": 329
+    },
+    {
+      "epoch": 0.06471222668889107,
+      "grad_norm": 6.349363803863525,
+      "learning_rate": 0.00017356824635281502,
+      "loss": 3.3319,
+      "step": 330
+    },
+    {
+      "epoch": 0.06490832434552407,
+      "grad_norm": 10.561055183410645,
+      "learning_rate": 0.00017341054320271776,
+      "loss": 1.919,
+      "step": 331
+    },
+    {
+      "epoch": 0.06510442200215708,
+      "grad_norm": 10.805617332458496,
+      "learning_rate": 0.00017325244309212475,
+      "loss": 2.9898,
+      "step": 332
+    },
+    {
+      "epoch": 0.06530051965879008,
+      "grad_norm": 9.67844009399414,
+      "learning_rate": 0.0001730939468759472,
+      "loss": 3.0727,
+      "step": 333
+    },
+    {
+      "epoch": 0.06549661731542308,
+      "grad_norm": 9.487295150756836,
+      "learning_rate": 0.00017293505541123833,
+      "loss": 1.2741,
+      "step": 334
+    },
+    {
+      "epoch": 0.06569271497205609,
+      "grad_norm": 5.027645111083984,
+      "learning_rate": 0.00017277576955718847,
+      "loss": 1.6962,
+      "step": 335
+    },
+    {
+      "epoch": 0.06588881262868909,
+      "grad_norm": 6.2696685791015625,
+      "learning_rate": 0.0001726160901751207,
+      "loss": 1.4388,
+      "step": 336
+    },
+    {
+      "epoch": 0.0660849102853221,
+      "grad_norm": 9.509906768798828,
+      "learning_rate": 0.000172456018128486,
+      "loss": 2.3173,
+      "step": 337
+    },
+    {
+      "epoch": 0.0662810079419551,
+      "grad_norm": 5.882188320159912,
+      "learning_rate": 0.00017229555428285864,
+      "loss": 1.1832,
+      "step": 338
+    },
+    {
+      "epoch": 0.06647710559858809,
+      "grad_norm": 8.857211112976074,
+      "learning_rate": 0.00017213469950593156,
+      "loss": 2.484,
+      "step": 339
+    },
+    {
+      "epoch": 0.0666732032552211,
+      "grad_norm": 10.660289764404297,
+      "learning_rate": 0.00017197345466751158,
+      "loss": 2.5169,
+      "step": 340
+    },
+    {
+      "epoch": 0.0668693009118541,
+      "grad_norm": 6.141770839691162,
+      "learning_rate": 0.00017181182063951474,
+      "loss": 1.601,
+      "step": 341
+    },
+    {
+      "epoch": 0.0668693009118541,
+      "eval_loss": 0.4308730363845825,
+      "eval_runtime": 77.758,
+      "eval_samples_per_second": 27.624,
+      "eval_steps_per_second": 13.812,
+      "step": 341
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 1361,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 341,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 7.34807651820503e+16,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b824b34abb9b483ba144d252745a6c7219fffc9242a3b936ca3b8b5c3590fbfe
+size 6776