Training in progress, step 326, checkpoint

Files changed (14)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +30 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/added_tokens.json +4 -0
last-checkpoint/merges.txt +0 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer_config.json +205 -0
last-checkpoint/trainer_state.json +2331 -0
last-checkpoint/training_args.bin +3 -0
last-checkpoint/vocab.json +0 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: katuni4ka/tiny-random-dbrx
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "katuni4ka/tiny-random-dbrx",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "out_proj",
+    "layer",
+    "Wqkv"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
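The sketch below shows what the main LoRA fields in this config imply, assuming PEFT's usual conventions. The low-rank update `B @ A` is scaled by `lora_alpha / r`, which is 16 / 8 = 2.0 here. If `use_rslora` were true, the scale would instead be `lora_alpha / sqrt(r)`. A list-valued `target_modules` selects every module whose name either equals an entry or ends with `.` followed by that entry. The module paths in the example are hypothetical and only loosely modeled on DBRX naming. Because matching is by suffix, a generic entry such as `layer` adapts every module whose name ends in `.layer`, not one specific layer.

```python
import math

# LoRA fields copied from the adapter_config.json above (other keys omitted).
ADAPTER_CONFIG = {
    "r": 8,
    "lora_alpha": 16,
    "use_rslora": False,
    "target_modules": ["out_proj", "layer", "Wqkv"],
}


def lora_scaling(cfg: dict) -> float:
    """Factor that multiplies the low-rank update B @ A before it is added to W."""
    if cfg["use_rslora"]:
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])  # rank-stabilized LoRA
    return cfg["lora_alpha"] / cfg["r"]  # standard LoRA: alpha / r


def matches_target(module_name: str, targets: list[str]) -> bool:
    """Exact-name or dotted-suffix match against a target_modules list."""
    return any(module_name == t or module_name.endswith("." + t) for t in targets)


if __name__ == "__main__":
    print(lora_scaling(ADAPTER_CONFIG))  # 2.0
    # Hypothetical module paths, loosely modeled on DBRX naming.
    for name in [
        "transformer.blocks.0.norm_attn_norm.attn.Wqkv",  # matched via "Wqkv"
        "transformer.blocks.0.ffn.router.layer",          # matched via "layer"
        "transformer.blocks.0.norm_attn_norm.norm_1",     # not matched
    ]:
        print(name, matches_target(name, ADAPTER_CONFIG["target_modules"]))
```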

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:08ca8205181b68d333a4f0a289870a53b963f718604c5c951422e52950488627
+size 5752
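The binary files shown in this commit are stored as Git LFS pointers rather than as raw bytes. This covers the adapter weights and the optimizer, RNG and scheduler state. Each pointer has three `key value` lines: the spec version, the SHA-256 of the real content, and its size in bytes. The sketch below parses such a pointer and checks a downloaded payload against it. It assumes only the v1 fields that appear here.

```python
import hashlib
import re

# Pointer text committed for adapter_model.safetensors in this checkpoint.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:08ca8205181b68d333a4f0a289870a53b963f718604c5c951422e52950488627
size 5752
"""

LFS_V1 = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS v1 pointer into oid and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_V1:
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields.get("oid", "").partition(":")
    if algo != "sha256" or not re.fullmatch(r"[0-9a-f]{64}", digest):
        raise ValueError("expected a sha256 oid")
    return {"oid": digest, "size": int(fields["size"])}


def matches_pointer(payload: bytes, info: dict) -> bool:
    """True if the downloaded bytes have the size and SHA-256 the pointer records."""
    return len(payload) == info["size"] and hashlib.sha256(payload).hexdigest() == info["oid"]


if __name__ == "__main__":
    info = parse_lfs_pointer(POINTER)
    print(info["size"])  # 5752
```

The same check applies to optimizer.pt, rng_state.pth and scheduler.pt; only their oid and size lines differ.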

last-checkpoint/added_tokens.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "<|im_end|>": 100279,
+  "<|im_start|>": 100278
+}

last-checkpoint/merges.txt ADDED

The diff for this file is too large to render.

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b61c2ac0c7b26591a9180a1aff3f962565b362434671c0886e8e747a2c4d663b
+size 15814

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0d46b927c6c42d138de013ceecb523baf0b07dd3e0ea25da1fa01ad086a6af7f
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7f7efc07dc5d49180f63f51b1c4ed58268af3b4670f1838f1cfdb3384e17c3fe
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|pad|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
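Here `pad_token` is `<|pad|>`, which has id 100277 in tokenizer_config.json. It is distinct from `eos_token` (`<|endoftext|>`, id 100257). As a result, padding can be masked out of the loss by token id without also hiding the end-of-sequence token. That would not be possible if pad and EOS shared an id. The sketch below assumes right padding and the `-100` ignore index that PyTorch's `CrossEntropyLoss` uses by default. The helper is hypothetical, not the data collator this run actually used, and it omits the attention mask.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
PAD_ID = 100277      # "<|pad|>" in tokenizer_config.json
EOS_ID = 100257      # "<|endoftext|>" in tokenizer_config.json


def pad_and_label(sequences: list[list[int]]) -> tuple[list[list[int]], list[list[int]]]:
    """Right-pad token-id sequences to a common length and build causal-LM labels.

    Padding positions get IGNORE_INDEX so they add nothing to the loss. Because
    PAD_ID != EOS_ID, masking by token id leaves every real EOS label intact.
    """
    width = max(len(seq) for seq in sequences)
    input_ids, labels = [], []
    for seq in sequences:
        padded = seq + [PAD_ID] * (width - len(seq))
        input_ids.append(padded)
        labels.append([IGNORE_INDEX if tok == PAD_ID else tok for tok in padded])
    return input_ids, labels


if __name__ == "__main__":
    ids, labels = pad_and_label([[15, 16, EOS_ID], [17, EOS_ID]])
    print(labels)  # [[15, 16, 100257], [17, 100257, -100]]
```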

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render.

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,205 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "100256": {
+      "content": "<||_unused_0_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100257": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100258": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100259": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100260": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100261": {
+      "content": "<||_unused_1_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100262": {
+      "content": "<||_unused_2_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100263": {
+      "content": "<||_unused_3_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100264": {
+      "content": "<||_unused_4_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100265": {
+      "content": "<||_unused_5_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100266": {
+      "content": "<||_unused_6_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100267": {
+      "content": "<||_unused_7_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100268": {
+      "content": "<||_unused_8_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100269": {
+      "content": "<||_unused_9_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100270": {
+      "content": "<||_unused_10_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100271": {
+      "content": "<||_unused_11_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100272": {
+      "content": "<||_unused_12_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100273": {
+      "content": "<||_unused_13_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100274": {
+      "content": "<||_unused_14_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100275": {
+      "content": "<||_unused_15_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100276": {
+      "content": "<|endofprompt|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100277": {
+      "content": "<|pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100278": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100279": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<|endoftext|>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 32768,
+  "pad_token": "<|pad|>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>"
+}
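The `chat_template` above follows the Llama 3 layout. It wraps each turn in `<|start_header_id|>`, `<|end_header_id|>` and `<|eot_id|>`, and none of these appear in `added_tokens_decoder`. Meanwhile, the registered ChatML markers `<|im_start|>` and `<|im_end|>` are never emitted by the template. Unless the base vocabulary in tokenizer.json happens to contain those header strings, they would likely be split into ordinary BPE pieces rather than encoded as single tokens.

The sketch below reproduces the rendered string in plain Python and lists the markers that are not registered special tokens. It assumes the template's Jinja semantics carry over as written: `| trim` applies only to the content, and BOS precedes the first message only. `REGISTERED` is a hand-copied subset of the tokens listed above.

```python
import re

BOS_TOKEN = "<|endoftext|>"  # bos_token in tokenizer_config.json

# Subset of the special tokens registered in added_tokens_decoder above.
REGISTERED = {"<|endoftext|>", "<|pad|>", "<|endofprompt|>", "<|im_start|>", "<|im_end|>"}


def render_chat(messages: list[dict], add_generation_prompt: bool = False,
                bos_token: str = BOS_TOKEN) -> str:
    """Plain-Python reading of the Jinja chat_template in tokenizer_config.json."""
    parts = []
    for i, message in enumerate(messages):
        # In Jinja the `| trim` filter binds tighter than `+`, so only the content is trimmed.
        content = ("<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
                   + message["content"].strip() + "<|eot_id|>")
        if i == 0:  # loop.index0 == 0: the BOS token precedes the first turn only
            content = bos_token + content
        parts.append(content)
    if add_generation_prompt:
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


def unregistered_markers(text: str, registered: set = REGISTERED) -> list[str]:
    """Sorted <|...|> markers in text that are not registered special tokens."""
    return sorted(set(re.findall(r"<\|[^|]+\|>", text)) - registered)


if __name__ == "__main__":
    prompt = render_chat([{"role": "user", "content": " Hi! "}], add_generation_prompt=True)
    print(repr(prompt))
    print(unregistered_markers(prompt))
    # ['<|end_header_id|>', '<|eot_id|>', '<|start_header_id|>']
```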

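The next file in this commit is trainer_state.json. The `learning_rate` values it logs rise linearly by 2e-5 per step to 2e-4 at step 10, then begin to decay. Those values are consistent with a schedule of linear warmup followed by half-cosine decay over about 1303 total steps.

That total is inferred from the logged values; it is not recorded in the visible part of the file. It would also fit `eval_steps` (326) being a quarter of the run, rounded up. The sketch below assumes that schedule and checks it against a few logged values.

```python
import math

PEAK_LR = 2e-4      # learning_rate logged at step 10
WARMUP_STEPS = 10   # lr rises by 2e-5 per step until step 10
TOTAL_STEPS = 1303  # assumption: inferred from the decay at steps 11-13, not logged


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# (step, learning_rate) pairs copied from log_history below.
LOGGED = [
    (1, 2e-05),
    (5, 0.0001),
    (10, 0.0002),
    (11, 0.00019999970482981582),
    (12, 0.0001999988193210057),
    (13, 0.00019999734347879723),
    (107, 0.0001972355738591467),
]

if __name__ == "__main__":
    for step, logged in LOGGED:
        # Relative error between the assumed schedule and the logged value.
        print(step, logged, abs(lr_at(step) - logged) / logged)
```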
last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,2331 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.006562194913292471,
+  "eval_steps": 326,
+  "global_step": 326,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 2.0129432249363408e-05,
+      "grad_norm": 1.1866097338497639e-05,
+      "learning_rate": 2e-05,
+      "loss": 46.0,
+      "step": 1
+    },
+    {
+      "epoch": 2.0129432249363408e-05,
+      "eval_loss": 11.5,
+      "eval_runtime": 126.1545,
+      "eval_samples_per_second": 165.813,
+      "eval_steps_per_second": 82.906,
+      "step": 1
+    },
+    {
+      "epoch": 4.0258864498726816e-05,
+      "grad_norm": 2.147201303159818e-05,
+      "learning_rate": 4e-05,
+      "loss": 46.0,
+      "step": 2
+    },
+    {
+      "epoch": 6.038829674809022e-05,
+      "grad_norm": 1.848486135713756e-05,
+      "learning_rate": 6e-05,
+      "loss": 46.0,
+      "step": 3
+    },
+    {
+      "epoch": 8.051772899745363e-05,
+      "grad_norm": 1.654278821661137e-05,
+      "learning_rate": 8e-05,
+      "loss": 46.0,
+      "step": 4
+    },
+    {
+      "epoch": 0.00010064716124681703,
+      "grad_norm": 2.277838393638376e-05,
+      "learning_rate": 0.0001,
+      "loss": 46.0,
+      "step": 5
+    },
+    {
+      "epoch": 0.00012077659349618043,
+      "grad_norm": 2.333819975319784e-05,
+      "learning_rate": 0.00012,
+      "loss": 46.0,
+      "step": 6
+    },
+    {
+      "epoch": 0.00014090602574554385,
+      "grad_norm": 1.976581188500859e-05,
+      "learning_rate": 0.00014,
+      "loss": 46.0,
+      "step": 7
+    },
+    {
+      "epoch": 0.00016103545799490726,
+      "grad_norm": 2.9277169232955202e-05,
+      "learning_rate": 0.00016,
+      "loss": 46.0,
+      "step": 8
+    },
+    {
+      "epoch": 0.00018116489024427065,
+      "grad_norm": 1.2510759916040115e-05,
+      "learning_rate": 0.00018,
+      "loss": 46.0,
+      "step": 9
+    },
+    {
+      "epoch": 0.00020129432249363407,
+      "grad_norm": 1.7789652702049352e-05,
+      "learning_rate": 0.0002,
+      "loss": 46.0,
+      "step": 10
+    },
+    {
+      "epoch": 0.00022142375474299748,
+      "grad_norm": 2.230467725894414e-05,
+      "learning_rate": 0.00019999970482981582,
+      "loss": 46.0,
+      "step": 11
+    },
+    {
+      "epoch": 0.00024155318699236087,
+      "grad_norm": 2.8929885957040824e-05,
+      "learning_rate": 0.0001999988193210057,
+      "loss": 46.0,
+      "step": 12
+    },
+    {
+      "epoch": 0.0002616826192417243,
+      "grad_norm": 2.140910510206595e-05,
+      "learning_rate": 0.00019999734347879723,
+      "loss": 46.0,
+      "step": 13
+    },
+    {
+      "epoch": 0.0002818120514910877,
+      "grad_norm": 1.3324294741323683e-05,
+      "learning_rate": 0.0001999952773119029,
+      "loss": 46.0,
+      "step": 14
+    },
+    {
+      "epoch": 0.0003019414837404511,
+      "grad_norm": 6.112633127486333e-05,
+      "learning_rate": 0.00019999262083252007,
+      "loss": 46.0,
+      "step": 15
+    },
+    {
+      "epoch": 0.00032207091598981453,
+      "grad_norm": 2.477996349625755e-05,
+      "learning_rate": 0.00019998937405633105,
+      "loss": 46.0,
+      "step": 16
+    },
+    {
+      "epoch": 0.0003422003482391779,
+      "grad_norm": 2.2150932636577636e-05,
+      "learning_rate": 0.00019998553700250284,
+      "loss": 46.0,
+      "step": 17
+    },
+    {
+      "epoch": 0.0003623297804885413,
+      "grad_norm": 1.1595971955102868e-05,
+      "learning_rate": 0.00019998110969368717,
+      "loss": 46.0,
+      "step": 18
+    },
+    {
+      "epoch": 0.00038245921273790474,
+      "grad_norm": 1.8772680050460622e-05,
+      "learning_rate": 0.00019997609215602019,
+      "loss": 46.0,
+      "step": 19
+    },
+    {
+      "epoch": 0.00040258864498726813,
+      "grad_norm": 1.745060035318602e-05,
+      "learning_rate": 0.00019997048441912246,
+      "loss": 46.0,
+      "step": 20
+    },
+    {
+      "epoch": 0.0004227180772366315,
+      "grad_norm": 3.103197013842873e-05,
+      "learning_rate": 0.0001999642865160987,
+      "loss": 46.0,
+      "step": 21
+    },
+    {
+      "epoch": 0.00044284750948599496,
+      "grad_norm": 3.2184922019951046e-05,
+      "learning_rate": 0.0001999574984835377,
+      "loss": 46.0,
+      "step": 22
+    },
+    {
+      "epoch": 0.00046297694173535835,
+      "grad_norm": 2.257189953525085e-05,
+      "learning_rate": 0.00019995012036151186,
+      "loss": 46.0,
+      "step": 23
+    },
+    {
+      "epoch": 0.00048310637398472174,
+      "grad_norm": 3.554321301635355e-05,
+      "learning_rate": 0.00019994215219357728,
+      "loss": 46.0,
+      "step": 24
+    },
+    {
+      "epoch": 0.0005032358062340851,
+      "grad_norm": 1.5587129382765852e-05,
+      "learning_rate": 0.00019993359402677323,
+      "loss": 46.0,
+      "step": 25
+    },
+    {
+      "epoch": 0.0005233652384834486,
+      "grad_norm": 9.828573638515081e-06,
+      "learning_rate": 0.00019992444591162206,
+      "loss": 46.0,
+      "step": 26
+    },
+    {
+      "epoch": 0.000543494670732812,
+      "grad_norm": 1.708105810394045e-05,
+      "learning_rate": 0.00019991470790212877,
+      "loss": 46.0,
+      "step": 27
+    },
+    {
+      "epoch": 0.0005636241029821754,
+      "grad_norm": 2.235212923551444e-05,
+      "learning_rate": 0.00019990438005578075,
+      "loss": 46.0,
+      "step": 28
+    },
+    {
+      "epoch": 0.0005837535352315388,
+      "grad_norm": 2.0345447410363704e-05,
+      "learning_rate": 0.00019989346243354746,
+      "loss": 46.0,
+      "step": 29
+    },
+    {
+      "epoch": 0.0006038829674809022,
+      "grad_norm": 2.3022035747999325e-05,
+      "learning_rate": 0.00019988195509988005,
+      "loss": 46.0,
+      "step": 30
+    },
+    {
+      "epoch": 0.0006240123997302656,
+      "grad_norm": 2.097547439916525e-05,
+      "learning_rate": 0.00019986985812271092,
+      "loss": 46.0,
+      "step": 31
+    },
+    {
+      "epoch": 0.0006441418319796291,
+      "grad_norm": 2.48163087235298e-05,
+      "learning_rate": 0.00019985717157345345,
+      "loss": 46.0,
+      "step": 32
+    },
+    {
+      "epoch": 0.0006642712642289924,
+      "grad_norm": 1.3824127563566435e-05,
+      "learning_rate": 0.00019984389552700144,
+      "loss": 46.0,
+      "step": 33
+    },
+    {
+      "epoch": 0.0006844006964783558,
+      "grad_norm": 5.524979133042507e-05,
+      "learning_rate": 0.0001998300300617287,
+      "loss": 46.0,
+      "step": 34
+    },
+    {
+      "epoch": 0.0007045301287277192,
+      "grad_norm": 2.9547367375926115e-05,
+      "learning_rate": 0.00019981557525948875,
+      "loss": 46.0,
+      "step": 35
+    },
+    {
+      "epoch": 0.0007246595609770826,
+      "grad_norm": 3.511565591907129e-05,
+      "learning_rate": 0.00019980053120561411,
+      "loss": 46.0,
+      "step": 36
+    },
+    {
+      "epoch": 0.0007447889932264461,
+      "grad_norm": 1.500822963862447e-05,
+      "learning_rate": 0.00019978489798891584,
+      "loss": 46.0,
+      "step": 37
+    },
+    {
+      "epoch": 0.0007649184254758095,
+      "grad_norm": 2.595680416561663e-05,
+      "learning_rate": 0.00019976867570168318,
+      "loss": 46.0,
+      "step": 38
+    },
+    {
+      "epoch": 0.0007850478577251729,
+      "grad_norm": 2.681766818568576e-05,
+      "learning_rate": 0.00019975186443968286,
+      "loss": 46.0,
+      "step": 39
+    },
+    {
+      "epoch": 0.0008051772899745363,
+      "grad_norm": 3.518196172080934e-05,
+      "learning_rate": 0.0001997344643021585,
+      "loss": 46.0,
+      "step": 40
+    },
+    {
+      "epoch": 0.0008253067222238997,
+      "grad_norm": 2.3766757294652052e-05,
+      "learning_rate": 0.00019971647539183013,
+      "loss": 46.0,
+      "step": 41
+    },
+    {
+      "epoch": 0.000845436154473263,
+      "grad_norm": 1.9241595509811305e-05,
+      "learning_rate": 0.00019969789781489362,
+      "loss": 46.0,
+      "step": 42
+    },
+    {
+      "epoch": 0.0008655655867226265,
+      "grad_norm": 2.352761111978907e-05,
+      "learning_rate": 0.00019967873168101984,
+      "loss": 46.0,
+      "step": 43
+    },
+    {
+      "epoch": 0.0008856950189719899,
+      "grad_norm": 2.3743756173644215e-05,
+      "learning_rate": 0.00019965897710335422,
+      "loss": 46.0,
+      "step": 44
+    },
+    {
+      "epoch": 0.0009058244512213533,
+      "grad_norm": 3.65232554031536e-05,
+      "learning_rate": 0.00019963863419851605,
+      "loss": 46.0,
+      "step": 45
+    },
+    {
+      "epoch": 0.0009259538834707167,
+      "grad_norm": 2.59846947301412e-05,
+      "learning_rate": 0.00019961770308659767,
+      "loss": 46.0,
+      "step": 46
+    },
+    {
+      "epoch": 0.0009460833157200801,
+      "grad_norm": 2.885664434870705e-05,
+      "learning_rate": 0.00019959618389116387,
+      "loss": 46.0,
+      "step": 47
+    },
+    {
+      "epoch": 0.0009662127479694435,
+      "grad_norm": 2.3175163732958026e-05,
+      "learning_rate": 0.0001995740767392512,
+      "loss": 46.0,
+      "step": 48
+    },
+    {
+      "epoch": 0.000986342180218807,
+      "grad_norm": 4.130275920033455e-05,
+      "learning_rate": 0.0001995513817613671,
+      "loss": 46.0,
+      "step": 49
+    },
+    {
+      "epoch": 0.0010064716124681702,
+      "grad_norm": 3.658945206552744e-05,
+      "learning_rate": 0.00019952809909148914,
+      "loss": 46.0,
+      "step": 50
+    },
+    {
+      "epoch": 0.0010266010447175337,
+      "grad_norm": 2.976952600874938e-05,
+      "learning_rate": 0.0001995042288670643,
+      "loss": 46.0,
+      "step": 51
+    },
+    {
+      "epoch": 0.0010467304769668972,
+      "grad_norm": 1.5616597011103295e-05,
+      "learning_rate": 0.00019947977122900822,
+      "loss": 46.0,
+      "step": 52
+    },
+    {
+      "epoch": 0.0010668599092162605,
+      "grad_norm": 2.330297138541937e-05,
+      "learning_rate": 0.0001994547263217042,
+      "loss": 46.0,
+      "step": 53
+    },
+    {
+      "epoch": 0.001086989341465624,
+      "grad_norm": 2.5345374524476938e-05,
+      "learning_rate": 0.00019942909429300238,
+      "loss": 46.0,
+      "step": 54
+    },
+    {
+      "epoch": 0.0011071187737149873,
+      "grad_norm": 2.747085818555206e-05,
+      "learning_rate": 0.00019940287529421902,
+      "loss": 46.0,
+      "step": 55
+    },
+    {
+      "epoch": 0.0011272482059643508,
+      "grad_norm": 5.7161822041962296e-05,
+      "learning_rate": 0.00019937606948013548,
+      "loss": 46.0,
+      "step": 56
+    },
+    {
+      "epoch": 0.0011473776382137143,
+      "grad_norm": 1.3162572031433228e-05,
+      "learning_rate": 0.00019934867700899722,
+      "loss": 46.0,
+      "step": 57
+    },
+    {
+      "epoch": 0.0011675070704630776,
+      "grad_norm": 3.8153884815983474e-05,
+      "learning_rate": 0.00019932069804251312,
+      "loss": 46.0,
+      "step": 58
+    },
+    {
+      "epoch": 0.001187636502712441,
+      "grad_norm": 2.5788935090531595e-05,
+      "learning_rate": 0.0001992921327458543,
+      "loss": 46.0,
+      "step": 59
+    },
+    {
+      "epoch": 0.0012077659349618043,
+      "grad_norm": 1.2793129826604854e-05,
+      "learning_rate": 0.00019926298128765323,
+      "loss": 46.0,
+      "step": 60
+    },
+    {
+      "epoch": 0.0012278953672111678,
+      "grad_norm": 2.963062252092641e-05,
+      "learning_rate": 0.00019923324384000276,
+      "loss": 46.0,
+      "step": 61
+    },
+    {
+      "epoch": 0.0012480247994605311,
+      "grad_norm": 1.7501424736110494e-05,
+      "learning_rate": 0.00019920292057845499,
+      "loss": 46.0,
+      "step": 62
+    },
+    {
+      "epoch": 0.0012681542317098946,
+      "grad_norm": 2.3330876501859166e-05,
+      "learning_rate": 0.00019917201168202043,
+      "loss": 46.0,
+      "step": 63
+    },
+    {
+      "epoch": 0.0012882836639592581,
+      "grad_norm": 1.3082960322208237e-05,
+      "learning_rate": 0.00019914051733316678,
+      "loss": 46.0,
+      "step": 64
+    },
+    {
+      "epoch": 0.0013084130962086214,
+      "grad_norm": 2.3455559130525216e-05,
+      "learning_rate": 0.00019910843771781783,
+      "loss": 46.0,
+      "step": 65
+    },
+    {
+      "epoch": 0.0013285425284579849,
+      "grad_norm": 1.9461349438643083e-05,
+      "learning_rate": 0.00019907577302535255,
+      "loss": 46.0,
+      "step": 66
+    },
+    {
+      "epoch": 0.0013486719607073482,
+      "grad_norm": 3.472498428891413e-05,
+      "learning_rate": 0.00019904252344860382,
+      "loss": 46.0,
+      "step": 67
+    },
+    {
+      "epoch": 0.0013688013929567117,
+      "grad_norm": 2.7159438104717992e-05,
+      "learning_rate": 0.00019900868918385726,
+      "loss": 46.0,
+      "step": 68
+    },
+    {
+      "epoch": 0.0013889308252060752,
+      "grad_norm": 1.6992695236695e-05,
+      "learning_rate": 0.00019897427043085022,
+      "loss": 46.0,
+      "step": 69
+    },
+    {
+      "epoch": 0.0014090602574554384,
+      "grad_norm": 2.162869532185141e-05,
+      "learning_rate": 0.0001989392673927705,
+      "loss": 46.0,
+      "step": 70
+    },
+    {
+      "epoch": 0.001429189689704802,
+      "grad_norm": 5.969742778688669e-05,
+      "learning_rate": 0.00019890368027625517,
+      "loss": 46.0,
+      "step": 71
+    },
+    {
+      "epoch": 0.0014493191219541652,
+      "grad_norm": 2.1275785911711864e-05,
+      "learning_rate": 0.00019886750929138934,
+      "loss": 46.0,
+      "step": 72
+    },
+    {
+      "epoch": 0.0014694485542035287,
+      "grad_norm": 2.3872542442404665e-05,
+      "learning_rate": 0.0001988307546517049,
+      "loss": 46.0,
+      "step": 73
+    },
+    {
+      "epoch": 0.0014895779864528922,
+      "grad_norm": 5.359681381378323e-05,
+      "learning_rate": 0.00019879341657417935,
+      "loss": 46.0,
+      "step": 74
+    },
+    {
+      "epoch": 0.0015097074187022555,
+      "grad_norm": 2.5549368729116395e-05,
+      "learning_rate": 0.00019875549527923449,
+      "loss": 46.0,
+      "step": 75
+    },
+    {
+      "epoch": 0.001529836850951619,
+      "grad_norm": 2.281313754792791e-05,
+      "learning_rate": 0.00019871699099073493,
+      "loss": 46.0,
+      "step": 76
+    },
+    {
+      "epoch": 0.0015499662832009823,
+      "grad_norm": 3.20350554829929e-05,
+      "learning_rate": 0.0001986779039359871,
+      "loss": 46.0,
+      "step": 77
+    },
+    {
+      "epoch": 0.0015700957154503458,
+      "grad_norm": 3.160408232361078e-05,
+      "learning_rate": 0.00019863823434573762,
+      "loss": 46.0,
+      "step": 78
+    },
+    {
+      "epoch": 0.001590225147699709,
+      "grad_norm": 2.4337972718058154e-05,
+      "learning_rate": 0.00019859798245417217,
+      "loss": 46.0,
+      "step": 79
+    },
+    {
+      "epoch": 0.0016103545799490725,
+      "grad_norm": 3.159191328450106e-05,
+      "learning_rate": 0.0001985571484989138,
+      "loss": 46.0,
+      "step": 80
+    },
+    {
+      "epoch": 0.001630484012198436,
+      "grad_norm": 2.5550882128300145e-05,
+      "learning_rate": 0.00019851573272102195,
+      "loss": 46.0,
+      "step": 81
+    },
+    {
+      "epoch": 0.0016506134444477993,
+      "grad_norm": 1.8689172065933235e-05,
+      "learning_rate": 0.0001984737353649906,
+      "loss": 46.0,
+      "step": 82
+    },
+    {
+      "epoch": 0.0016707428766971628,
+      "grad_norm": 2.9251643354655243e-05,
+      "learning_rate": 0.00019843115667874707,
+      "loss": 46.0,
+      "step": 83
+    },
+    {
+      "epoch": 0.001690872308946526,
+      "grad_norm": 3.018877760041505e-05,
+      "learning_rate": 0.00019838799691365065,
+      "loss": 46.0,
+      "step": 84
+    },
+    {
+      "epoch": 0.0017110017411958896,
+      "grad_norm": 1.1726152479241136e-05,
+      "learning_rate": 0.00019834425632449075,
+      "loss": 46.0,
+      "step": 85
+    },
+    {
+      "epoch": 0.001731131173445253,
+      "grad_norm": 2.3671049348195083e-05,
+      "learning_rate": 0.00019829993516948577,
+      "loss": 46.0,
+      "step": 86
+    },
+    {
+      "epoch": 0.0017512606056946164,
+      "grad_norm": 2.0576631868607365e-05,
+      "learning_rate": 0.00019825503371028136,
+      "loss": 46.0,
+      "step": 87
+    },
+    {
+      "epoch": 0.0017713900379439798,
+      "grad_norm": 1.466808589611901e-05,
+      "learning_rate": 0.000198209552211949,
+      "loss": 46.0,
+      "step": 88
+    },
+    {
+      "epoch": 0.0017915194701933431,
+      "grad_norm": 2.361923543503508e-05,
+      "learning_rate": 0.00019816349094298427,
+      "loss": 46.0,
+      "step": 89
+    },
+    {
+      "epoch": 0.0018116489024427066,
+      "grad_norm": 1.9187695215805434e-05,
+      "learning_rate": 0.0001981168501753055,
+      "loss": 46.0,
+      "step": 90
+    },
+    {
+      "epoch": 0.0018317783346920701,
+      "grad_norm": 2.630672861414496e-05,
+      "learning_rate": 0.0001980696301842519,
+      "loss": 46.0,
+      "step": 91
+    },
+    {
+      "epoch": 0.0018519077669414334,
+      "grad_norm": 1.8121598259313032e-05,
+      "learning_rate": 0.00019802183124858222,
+      "loss": 46.0,
+      "step": 92
+    },
+    {
+      "epoch": 0.001872037199190797,
+      "grad_norm": 3.593276414903812e-05,
+      "learning_rate": 0.00019797345365047284,
+      "loss": 46.0,
+      "step": 93
+    },
+    {
+      "epoch": 0.0018921666314401602,
+      "grad_norm": 2.5328612537123263e-05,
+      "learning_rate": 0.0001979244976755162,
+      "loss": 46.0,
+      "step": 94
+    },
+    {
+      "epoch": 0.0019122960636895237,
+      "grad_norm": 3.064305201405659e-05,
+      "learning_rate": 0.00019787496361271925,
+      "loss": 46.0,
+      "step": 95
+    },
+    {
+      "epoch": 0.001932425495938887,
+      "grad_norm": 2.1601079424726777e-05,
+      "learning_rate": 0.00019782485175450155,
+      "loss": 46.0,
+      "step": 96
+    },
+    {
+      "epoch": 0.0019525549281882504,
+      "grad_norm": 1.7290110918111168e-05,
+      "learning_rate": 0.0001977741623966936,
+      "loss": 46.0,
+      "step": 97
+    },
+    {
+      "epoch": 0.001972684360437614,
+      "grad_norm": 1.116962175728986e-05,
+      "learning_rate": 0.00019772289583853514,
+      "loss": 46.0,
+      "step": 98
+    },
+    {
+      "epoch": 0.0019928137926869772,
+      "grad_norm": 1.0275795830239076e-05,
+      "learning_rate": 0.00019767105238267338,
+      "loss": 46.0,
+      "step": 99
+    },
+    {
+      "epoch": 0.0020129432249363405,
+      "grad_norm": 2.2131345758680254e-05,
+      "learning_rate": 0.00019761863233516117,
+      "loss": 46.0,
+      "step": 100
+    },
+    {
+      "epoch": 0.002033072657185704,
+      "grad_norm": 3.4143031371058896e-05,
+      "learning_rate": 0.0001975656360054552,
+      "loss": 46.0,
+      "step": 101
+    },
+    {
+      "epoch": 0.0020532020894350675,
+      "grad_norm": 3.857325282297097e-05,
+      "learning_rate": 0.0001975120637064142,
+      "loss": 46.0,
+      "step": 102
+    },
+    {
+      "epoch": 0.0020733315216844308,
+      "grad_norm": 2.403794314886909e-05,
+      "learning_rate": 0.00019745791575429705,
+      "loss": 46.0,
+      "step": 103
+    },
+    {
+      "epoch": 0.0020934609539337945,
+      "grad_norm": 3.789052425418049e-05,
+      "learning_rate": 0.00019740319246876106,
+      "loss": 46.0,
+      "step": 104
+    },
+    {
+      "epoch": 0.0021135903861831578,
+      "grad_norm": 3.8589034375036135e-05,
+      "learning_rate": 0.00019734789417285976,
+      "loss": 46.0,
+      "step": 105
+    },
+    {
+      "epoch": 0.002133719818432521,
+      "grad_norm": 2.034025419561658e-05,
+      "learning_rate": 0.0001972920211930414,
+      "loss": 46.0,
+      "step": 106
+    },
+    {
+      "epoch": 0.0021538492506818843,
+      "grad_norm": 1.9496819732012227e-05,
+      "learning_rate": 0.0001972355738591467,
+      "loss": 46.0,
+      "step": 107
+    },
+    {
+      "epoch": 0.002173978682931248,
+      "grad_norm": 1.7886142813949846e-05,
+      "learning_rate": 0.00019717855250440705,
+      "loss": 46.0,
+      "step": 108
+    },
+    {
+      "epoch": 0.0021941081151806113,
+      "grad_norm": 1.818929194996599e-05,
+      "learning_rate": 0.00019712095746544255,
+      "loss": 46.0,
+      "step": 109
+    },
+    {
+      "epoch": 0.0022142375474299746,
+      "grad_norm": 2.199762820964679e-05,
+      "learning_rate": 0.00019706278908225992,
+      "loss": 46.0,
+      "step": 110
+    },
+    {
+      "epoch": 0.0022343669796793383,
+      "grad_norm": 2.1755575289716944e-05,
+      "learning_rate": 0.00019700404769825068,
+      "loss": 46.0,
+      "step": 111
+    },
+    {
+      "epoch": 0.0022544964119287016,
+      "grad_norm": 3.8793521525803953e-05,
+      "learning_rate": 0.00019694473366018887,
+      "loss": 46.0,
+      "step": 112
+    },
+    {
+      "epoch": 0.002274625844178065,
+      "grad_norm": 3.468850627541542e-05,
+      "learning_rate": 0.00019688484731822923,
+      "loss": 46.0,
+      "step": 113
+    },
+    {
+      "epoch": 0.0022947552764274286,
+      "grad_norm": 2.4715391191421077e-05,
+      "learning_rate": 0.00019682438902590498,
+      "loss": 46.0,
+      "step": 114
+    },
+    {
+      "epoch": 0.002314884708676792,
+      "grad_norm": 3.426595503697172e-05,
+      "learning_rate": 0.0001967633591401259,
+      "loss": 46.0,
+      "step": 115
+    },
+    {
+      "epoch": 0.002335014140926155,
+      "grad_norm": 5.176919148652814e-05,
+      "learning_rate": 0.000196701758021176,
+      "loss": 46.0,
+      "step": 116
+    },
+    {
+      "epoch": 0.0023551435731755184,
+      "grad_norm": 2.376974771323148e-05,
+      "learning_rate": 0.00019663958603271148,
+      "loss": 46.0,
+      "step": 117
+    },
+    {
+      "epoch": 0.002375273005424882,
+      "grad_norm": 2.0293871784815565e-05,
+      "learning_rate": 0.0001965768435417588,
+      "loss": 46.0,
+      "step": 118
+    },
+    {
+      "epoch": 0.0023954024376742454,
+      "grad_norm": 4.838638415094465e-05,
+      "learning_rate": 0.00019651353091871215,
+      "loss": 46.0,
+      "step": 119
+    },
+    {
+      "epoch": 0.0024155318699236087,
+      "grad_norm": 2.106054307660088e-05,
+      "learning_rate": 0.00019644964853733152,
+      "loss": 46.0,
+      "step": 120
+    },
+    {
+      "epoch": 0.0024356613021729724,
+      "grad_norm": 2.7618483727565035e-05,
+      "learning_rate": 0.0001963851967747404,
+      "loss": 46.0,
+      "step": 121
+    },
+    {
+      "epoch": 0.0024557907344223357,
+      "grad_norm": 1.421527485945262e-05,
+      "learning_rate": 0.00019632017601142355,
+      "loss": 46.0,
+      "step": 122
+    },
+    {
+      "epoch": 0.002475920166671699,
+      "grad_norm": 3.1367508199764416e-05,
+      "learning_rate": 0.00019625458663122478,
+      "loss": 46.0,
+      "step": 123
+    },
+    {
+      "epoch": 0.0024960495989210622,
+      "grad_norm": 3.238041608710773e-05,
+      "learning_rate": 0.00019618842902134465,
+      "loss": 46.0,
+      "step": 124
+    },
+    {
+      "epoch": 0.002516179031170426,
+      "grad_norm": 2.0453908291528933e-05,
+      "learning_rate": 0.00019612170357233836,
+      "loss": 46.0,
+      "step": 125
+    },
+    {
+      "epoch": 0.0025363084634197892,
+      "grad_norm": 1.5395889931824058e-05,
+      "learning_rate": 0.00019605441067811302,
+      "loss": 46.0,
+      "step": 126
+    },
+    {
+      "epoch": 0.0025564378956691525,
+      "grad_norm": 2.2598505893256515e-05,
+      "learning_rate": 0.00019598655073592585,
+      "loss": 46.0,
+      "step": 127
+    },
+    {
+      "epoch": 0.0025765673279185162,
+      "grad_norm": 2.011835022130981e-05,
+      "learning_rate": 0.0001959181241463814,
+      "loss": 46.0,
+      "step": 128
+    },
+    {
+      "epoch": 0.0025966967601678795,
+      "grad_norm": 2.2615582565777004e-05,
+      "learning_rate": 0.00019584913131342953,
+      "loss": 46.0,
+      "step": 129
+    },
+    {
+      "epoch": 0.0026168261924172428,
+      "grad_norm": 2.472496998962015e-05,
+      "learning_rate": 0.0001957795726443628,
+      "loss": 46.0,
+      "step": 130
+    },
+    {
+      "epoch": 0.0026369556246666065,
+      "grad_norm": 2.1229192498140037e-05,
+      "learning_rate": 0.000195709448549814,
+      "loss": 46.0,
+      "step": 131
+    },
+    {
+      "epoch": 0.0026570850569159698,
+      "grad_norm": 3.1881041650194675e-05,
+      "learning_rate": 0.00019563875944375407,
+      "loss": 46.0,
+      "step": 132
+    },
+    {
+      "epoch": 0.002677214489165333,
+      "grad_norm": 3.062764881178737e-05,
+      "learning_rate": 0.0001955675057434893,
+      "loss": 46.0,
+      "step": 133
+    },
+    {
+      "epoch": 0.0026973439214146963,
+      "grad_norm": 3.407730400795117e-05,
+      "learning_rate": 0.00019549568786965903,
+      "loss": 46.0,
+      "step": 134
+    },
+    {
+      "epoch": 0.00271747335366406,
+      "grad_norm": 2.335791396035347e-05,
+      "learning_rate": 0.00019542330624623322,
+      "loss": 46.0,
+      "step": 135
+    },
+    {
+      "epoch": 0.0027376027859134233,
+      "grad_norm": 2.1637504687532783e-05,
+      "learning_rate": 0.00019535036130050975,
+      "loss": 46.0,
+      "step": 136
+    },
+    {
+      "epoch": 0.0027577322181627866,
+      "grad_norm": 2.3219181457534432e-05,
+      "learning_rate": 0.00019527685346311212,
+      "loss": 46.0,
+      "step": 137
+    },
+    {
+      "epoch": 0.0027778616504121503,
+      "grad_norm": 1.165738285635598e-05,
+      "learning_rate": 0.0001952027831679867,
+      "loss": 46.0,
+      "step": 138
+    },
+    {
+      "epoch": 0.0027979910826615136,
+      "grad_norm": 2.6394216547487304e-05,
+      "learning_rate": 0.00019512815085240046,
+      "loss": 46.0,
+      "step": 139
+    },
+    {
+      "epoch": 0.002818120514910877,
+      "grad_norm": 2.7199243049835786e-05,
+      "learning_rate": 0.000195052956956938,
+      "loss": 46.0,
+      "step": 140
+    },
+    {
+      "epoch": 0.00283824994716024,
+      "grad_norm": 1.723020432109479e-05,
+      "learning_rate": 0.00019497720192549926,
+      "loss": 46.0,
+      "step": 141
+    },
+    {
+      "epoch": 0.002858379379409604,
+      "grad_norm": 2.4921268050093204e-05,
+      "learning_rate": 0.00019490088620529678,
+      "loss": 46.0,
+      "step": 142
+    },
+    {
+      "epoch": 0.002878508811658967,
+      "grad_norm": 2.3121931008063257e-05,
+      "learning_rate": 0.00019482401024685308,
+      "loss": 46.0,
+      "step": 143
+    },
+    {
+      "epoch": 0.0028986382439083304,
+      "grad_norm": 4.1502407839288935e-05,
+      "learning_rate": 0.0001947465745039979,
+      "loss": 46.0,
+      "step": 144
+    },
+    {
+      "epoch": 0.002918767676157694,
+      "grad_norm": 3.218562051188201e-05,
+      "learning_rate": 0.0001946685794338658,
+      "loss": 46.0,
+      "step": 145
+    },
+    {
+      "epoch": 0.0029388971084070574,
+      "grad_norm": 1.8879612980526872e-05,
+      "learning_rate": 0.00019459002549689308,
+      "loss": 46.0,
+      "step": 146
+    },
+    {
+      "epoch": 0.0029590265406564207,
+      "grad_norm": 2.8899030439788476e-05,
+      "learning_rate": 0.0001945109131568154,
+      "loss": 46.0,
+      "step": 147
+    },
+    {
+      "epoch": 0.0029791559729057844,
+      "grad_norm": 3.5309523809701204e-05,
+      "learning_rate": 0.00019443124288066475,
+      "loss": 46.0,
+      "step": 148
+    },
+    {
+      "epoch": 0.0029992854051551477,
+      "grad_norm": 4.7148212615866214e-05,
+      "learning_rate": 0.00019435101513876703,
+      "loss": 46.0,
+      "step": 149
+    },
+    {
+      "epoch": 0.003019414837404511,
+      "grad_norm": 3.963925701100379e-05,
+      "learning_rate": 0.00019427023040473896,
+      "loss": 46.0,
+      "step": 150
+    },
+    {
+      "epoch": 0.0030395442696538742,
+      "grad_norm": 2.9483388061635196e-05,
+      "learning_rate": 0.0001941888891554854,
+      "loss": 46.0,
+      "step": 151
+    },
+    {
+      "epoch": 0.003059673701903238,
+      "grad_norm": 2.0797941033379175e-05,
+      "learning_rate": 0.00019410699187119663,
+      "loss": 46.0,
+      "step": 152
+    },
+    {
+      "epoch": 0.0030798031341526012,
+      "grad_norm": 2.525432500988245e-05,
+      "learning_rate": 0.00019402453903534533,
+      "loss": 46.0,
+      "step": 153
+    },
+    {
+      "epoch": 0.0030999325664019645,
+      "grad_norm": 1.9120217984891497e-05,
+      "learning_rate": 0.0001939415311346839,
+      "loss": 46.0,
+      "step": 154
+    },
+    {
+      "epoch": 0.0031200619986513282,
+      "grad_norm": 2.6778399842442013e-05,
+      "learning_rate": 0.0001938579686592415,
+      "loss": 46.0,
+      "step": 155
+    },
+    {
+      "epoch": 0.0031401914309006915,
+      "grad_norm": 2.4967603167169727e-05,
+      "learning_rate": 0.00019377385210232113,
+      "loss": 46.0,
+      "step": 156
+    },
+    {
+      "epoch": 0.003160320863150055,
+      "grad_norm": 2.38423963310197e-05,
+      "learning_rate": 0.0001936891819604968,
+      "loss": 46.0,
+      "step": 157
+    },
+    {
+      "epoch": 0.003180450295399418,
+      "grad_norm": 5.6928216508822516e-05,
+      "learning_rate": 0.00019360395873361055,
+      "loss": 46.0,
+      "step": 158
+    },
+    {
+      "epoch": 0.0032005797276487818,
+      "grad_norm": 4.014354999526404e-05,
+      "learning_rate": 0.00019351818292476946,
+      "loss": 46.0,
+      "step": 159
+    },
+    {
+      "epoch": 0.003220709159898145,
+      "grad_norm": 4.82712421217002e-05,
+      "learning_rate": 0.00019343185504034277,
+      "loss": 46.0,
+      "step": 160
+    },
+    {
+      "epoch": 0.0032408385921475083,
+      "grad_norm": 3.384835144970566e-05,
+      "learning_rate": 0.0001933449755899588,
+      "loss": 46.0,
+      "step": 161
+    },
+    {
+      "epoch": 0.003260968024396872,
+      "grad_norm": 1.4583272786694579e-05,
+      "learning_rate": 0.0001932575450865021,
+      "loss": 46.0,
+      "step": 162
+    },
+    {
+      "epoch": 0.0032810974566462353,
+      "grad_norm": 4.5586399210151285e-05,
+      "learning_rate": 0.00019316956404611012,
+      "loss": 46.0,
+      "step": 163
+    },
+    {
+      "epoch": 0.0033012268888955986,
+      "grad_norm": 4.526826523942873e-05,
+      "learning_rate": 0.00019308103298817052,
+      "loss": 46.0,
+      "step": 164
+    },
+    {
+      "epoch": 0.0033213563211449623,
+      "grad_norm": 5.154962491360493e-05,
+      "learning_rate": 0.00019299195243531792,
+      "loss": 46.0,
+      "step": 165
+    },
+    {
+      "epoch": 0.0033414857533943256,
+      "grad_norm": 2.3496044377679937e-05,
+      "learning_rate": 0.00019290232291343067,
+      "loss": 46.0,
+      "step": 166
+    },
+    {
+      "epoch": 0.003361615185643689,
+      "grad_norm": 3.0550760129699484e-05,
+      "learning_rate": 0.0001928121449516281,
+      "loss": 46.0,
+      "step": 167
+    },
+    {
+      "epoch": 0.003381744617893052,
+      "grad_norm": 2.7053209123550914e-05,
+      "learning_rate": 0.00019272141908226707,
+      "loss": 46.0,
+      "step": 168
+    },
+    {
+      "epoch": 0.003401874050142416,
+      "grad_norm": 1.612185405974742e-05,
+      "learning_rate": 0.0001926301458409391,
+      "loss": 46.0,
+      "step": 169
+    },
+    {
+      "epoch": 0.003422003482391779,
+      "grad_norm": 1.803100349206943e-05,
+      "learning_rate": 0.00019253832576646688,
+      "loss": 46.0,
+      "step": 170
+    },
+    {
+      "epoch": 0.0034421329146411424,
+      "grad_norm": 1.77473557414487e-05,
+      "learning_rate": 0.00019244595940090143,
+      "loss": 46.0,
+      "step": 171
+    },
+    {
+      "epoch": 0.003462262346890506,
+      "grad_norm": 2.4842493075993843e-05,
+      "learning_rate": 0.00019235304728951866,
+      "loss": 46.0,
+      "step": 172
+    },
+    {
+      "epoch": 0.0034823917791398694,
+      "grad_norm": 3.840986391878687e-05,
+      "learning_rate": 0.00019225958998081633,
+      "loss": 46.0,
+      "step": 173
+    },
+    {
+      "epoch": 0.0035025212113892327,
+      "grad_norm": 3.629952698247507e-05,
+      "learning_rate": 0.0001921655880265106,
+      "loss": 46.0,
+      "step": 174
+    },
+    {
+      "epoch": 0.003522650643638596,
+      "grad_norm": 3.082855619140901e-05,
+      "learning_rate": 0.00019207104198153295,
+      "loss": 46.0,
+      "step": 175
+    },
+    {
+      "epoch": 0.0035427800758879597,
+      "grad_norm": 8.436971984338015e-05,
+      "learning_rate": 0.0001919759524040269,
+      "loss": 46.0,
+      "step": 176
+    },
+    {
+      "epoch": 0.003562909508137323,
+      "grad_norm": 3.003582423843909e-05,
+      "learning_rate": 0.0001918803198553446,
+      "loss": 46.0,
+      "step": 177
+    },
+    {
+      "epoch": 0.0035830389403866863,
+      "grad_norm": 4.6667788410559297e-05,
+      "learning_rate": 0.00019178414490004356,
+      "loss": 46.0,
+      "step": 178
+    },
+    {
+      "epoch": 0.00360316837263605,
+      "grad_norm": 3.2573891076026484e-05,
+      "learning_rate": 0.00019168742810588335,
+      "loss": 46.0,
+      "step": 179
+    },
+    {
+      "epoch": 0.0036232978048854132,
+      "grad_norm": 2.6542162231635302e-05,
+      "learning_rate": 0.00019159017004382234,
+      "loss": 46.0,
+      "step": 180
+    },
+    {
+      "epoch": 0.0036434272371347765,
+      "grad_norm": 2.6043957404908724e-05,
+      "learning_rate": 0.00019149237128801404,
+      "loss": 46.0,
+      "step": 181
+    },
+    {
+      "epoch": 0.0036635566693841402,
+      "grad_norm": 1.9306073227198794e-05,
+      "learning_rate": 0.000191394032415804,
+      "loss": 46.0,
+      "step": 182
+    },
+    {
+      "epoch": 0.0036836861016335035,
+      "grad_norm": 4.7370471293106675e-05,
+      "learning_rate": 0.00019129515400772635,
+      "loss": 46.0,
+      "step": 183
+    },
+    {
+      "epoch": 0.003703815533882867,
+      "grad_norm": 3.607594771892764e-05,
+      "learning_rate": 0.00019119573664750018,
+      "loss": 46.0,
+      "step": 184
+    },
+    {
+      "epoch": 0.00372394496613223,
+      "grad_norm": 4.207424717606045e-05,
+      "learning_rate": 0.00019109578092202628,
+      "loss": 46.0,
+      "step": 185
+    },
+    {
+      "epoch": 0.003744074398381594,
+      "grad_norm": 4.7341436584247276e-05,
+      "learning_rate": 0.00019099528742138371,
+      "loss": 46.0,
+      "step": 186
+    },
+    {
+      "epoch": 0.003764203830630957,
+      "grad_norm": 6.413136725313962e-05,
+      "learning_rate": 0.00019089425673882615,
+      "loss": 46.0,
+      "step": 187
+    },
+    {
+      "epoch": 0.0037843332628803203,
+      "grad_norm": 3.3956010156543925e-05,
+      "learning_rate": 0.0001907926894707785,
+      "loss": 46.0,
+      "step": 188
+    },
+    {
+      "epoch": 0.003804462695129684,
+      "grad_norm": 7.443443610100076e-05,
+      "learning_rate": 0.00019069058621683336,
+      "loss": 46.0,
+      "step": 189
+    },
+    {
+      "epoch": 0.0038245921273790473,
+      "grad_norm": 9.83256395556964e-05,
+      "learning_rate": 0.0001905879475797474,
+      "loss": 46.0,
+      "step": 190
+    },
+    {
+      "epoch": 0.0038447215596284106,
+      "grad_norm": 2.799310823320411e-05,
+      "learning_rate": 0.00019048477416543801,
+      "loss": 46.0,
+      "step": 191
+    },
+    {
+      "epoch": 0.003864850991877774,
+      "grad_norm": 2.725904414546676e-05,
+      "learning_rate": 0.00019038106658297944,
+      "loss": 46.0,
+      "step": 192
+    },
+    {
+      "epoch": 0.0038849804241271376,
+      "grad_norm": 1.805232386686839e-05,
+      "learning_rate": 0.00019027682544459947,
+      "loss": 46.0,
+      "step": 193
+    },
+    {
+      "epoch": 0.003905109856376501,
+      "grad_norm": 2.9510436434065923e-05,
+      "learning_rate": 0.00019017205136567556,
+      "loss": 46.0,
+      "step": 194
+    },
+    {
+      "epoch": 0.003925239288625864,
+      "grad_norm": 3.2932246540440246e-05,
+      "learning_rate": 0.00019006674496473144,
+      "loss": 46.0,
+      "step": 195
+    },
+    {
+      "epoch": 0.003945368720875228,
+      "grad_norm": 3.495354394544847e-05,
+      "learning_rate": 0.00018996090686343328,
+      "loss": 46.0,
+      "step": 196
+    },
+    {
+      "epoch": 0.003965498153124591,
+      "grad_norm": 6.263954128371552e-05,
+      "learning_rate": 0.0001898545376865861,
+      "loss": 46.0,
+      "step": 197
+    },
+    {
+      "epoch": 0.0039856275853739544,
+      "grad_norm": 2.9388587790890597e-05,
+      "learning_rate": 0.00018974763806213013,
+      "loss": 46.0,
+      "step": 198
+    },
+    {
+      "epoch": 0.004005757017623318,
+      "grad_norm": 2.9143146093701944e-05,
+      "learning_rate": 0.000189640208621137,
+      "loss": 46.0,
+      "step": 199
+    },
+    {
+      "epoch": 0.004025886449872681,
+      "grad_norm": 2.8607553758774884e-05,
+      "learning_rate": 0.00018953224999780605,
+      "loss": 46.0,
+      "step": 200
+    },
+    {
+      "epoch": 0.004046015882122045,
+      "grad_norm": 2.6011948648374528e-05,
+      "learning_rate": 0.00018942376282946066,
+      "loss": 46.0,
+      "step": 201
+    },
+    {
+      "epoch": 0.004066145314371408,
+      "grad_norm": 5.046524165663868e-05,
+      "learning_rate": 0.0001893147477565443,
+      "loss": 46.0,
+      "step": 202
+    },
+    {
+      "epoch": 0.004086274746620771,
+      "grad_norm": 2.9760611141682602e-05,
+      "learning_rate": 0.000189205205422617,
+      "loss": 46.0,
+      "step": 203
+    },
+    {
+      "epoch": 0.004106404178870135,
+      "grad_norm": 8.055127545958385e-05,
+      "learning_rate": 0.0001890951364743514,
+      "loss": 46.0,
+      "step": 204
+    },
+    {
+      "epoch": 0.004126533611119499,
+      "grad_norm": 3.0201517802197486e-05,
+      "learning_rate": 0.00018898454156152886,
+      "loss": 46.0,
+      "step": 205
+    },
+    {
+      "epoch": 0.0041466630433688615,
+      "grad_norm": 3.596295937313698e-05,
+      "learning_rate": 0.0001888734213370359,
+      "loss": 46.0,
+      "step": 206
+    },
+    {
+      "epoch": 0.004166792475618225,
+      "grad_norm": 3.9855971408542246e-05,
+      "learning_rate": 0.00018876177645685998,
+      "loss": 46.0,
+      "step": 207
+    },
+    {
+      "epoch": 0.004186921907867589,
+      "grad_norm": 2.937594945251476e-05,
+      "learning_rate": 0.00018864960758008592,
+      "loss": 46.0,
+      "step": 208
+    },
+    {
+      "epoch": 0.004207051340116952,
+      "grad_norm": 2.6503237677388825e-05,
+      "learning_rate": 0.00018853691536889188,
+      "loss": 46.0,
+      "step": 209
+    },
+    {
+      "epoch": 0.0042271807723663155,
+      "grad_norm": 2.7466578103485517e-05,
+      "learning_rate": 0.0001884237004885455,
+      "loss": 46.0,
+      "step": 210
+    },
+    {
+      "epoch": 0.004247310204615679,
+      "grad_norm": 2.5270055630244315e-05,
+      "learning_rate": 0.0001883099636073999,
+      "loss": 46.0,
+      "step": 211
+    },
+    {
+      "epoch": 0.004267439636865042,
+      "grad_norm": 4.509964492172003e-05,
+      "learning_rate": 0.0001881957053968898,
+      "loss": 46.0,
+      "step": 212
+    },
+    {
+      "epoch": 0.004287569069114406,
+      "grad_norm": 4.1347884689457715e-05,
+      "learning_rate": 0.00018808092653152753,
+      "loss": 46.0,
+      "step": 213
+    },
+    {
+      "epoch": 0.004307698501363769,
+      "grad_norm": 2.3344733563135378e-05,
+      "learning_rate": 0.00018796562768889913,
+      "loss": 46.0,
+      "step": 214
+    },
+    {
+      "epoch": 0.004327827933613132,
+      "grad_norm": 3.056141213164665e-05,
+      "learning_rate": 0.0001878498095496601,
+      "loss": 46.0,
+      "step": 215
+    },
+    {
+      "epoch": 0.004347957365862496,
+      "grad_norm": 1.8424869267619215e-05,
+      "learning_rate": 0.00018773347279753177,
+      "loss": 46.0,
+      "step": 216
+    },
+    {
+      "epoch": 0.004368086798111859,
+      "grad_norm": 3.535512223606929e-05,
+      "learning_rate": 0.00018761661811929686,
+      "loss": 46.0,
+      "step": 217
+    },
+    {
+      "epoch": 0.004388216230361223,
+      "grad_norm": 2.6731742764241062e-05,
+      "learning_rate": 0.00018749924620479585,
+      "loss": 46.0,
+      "step": 218
+    },
+    {
+      "epoch": 0.004408345662610586,
+      "grad_norm": 4.029847332276404e-05,
+      "learning_rate": 0.0001873813577469224,
+      "loss": 46.0,
+      "step": 219
+    },
+    {
+      "epoch": 0.004428475094859949,
+      "grad_norm": 4.0732127672526985e-05,
+      "learning_rate": 0.0001872629534416197,
+      "loss": 46.0,
+      "step": 220
+    },
+    {
+      "epoch": 0.004448604527109313,
+      "grad_norm": 2.8962362193851732e-05,
+      "learning_rate": 0.0001871440339878762,
+      "loss": 46.0,
+      "step": 221
+    },
+    {
+      "epoch": 0.004468733959358677,
+      "grad_norm": 4.08275009249337e-05,
+      "learning_rate": 0.0001870246000877214,
+      "loss": 46.0,
+      "step": 222
+    },
+    {
+      "epoch": 0.0044888633916080395,
+      "grad_norm": 3.2036841730587184e-05,
+      "learning_rate": 0.00018690465244622183,
+      "loss": 46.0,
+      "step": 223
+    },
+    {
+      "epoch": 0.004508992823857403,
+      "grad_norm": 5.666902507073246e-05,
+      "learning_rate": 0.00018678419177147685,
+      "loss": 46.0,
+      "step": 224
+    },
+    {
+      "epoch": 0.004529122256106767,
+      "grad_norm": 1.926498043758329e-05,
+      "learning_rate": 0.0001866632187746145,
+      "loss": 46.0,
+      "step": 225
+    },
+    {
+      "epoch": 0.00454925168835613,
+      "grad_norm": 5.15770552738104e-05,
+      "learning_rate": 0.00018654173416978714,
+      "loss": 46.0,
+      "step": 226
+    },
+    {
+      "epoch": 0.0045693811206054934,
+      "grad_norm": 4.0023831388680264e-05,
+      "learning_rate": 0.0001864197386741674,
+      "loss": 46.0,
+      "step": 227
+    },
+    {
+      "epoch": 0.004589510552854857,
+      "grad_norm": 2.732311622821726e-05,
+      "learning_rate": 0.00018629723300794408,
+      "loss": 46.0,
+      "step": 228
+    },
+    {
+      "epoch": 0.00460963998510422,
+      "grad_norm": 3.606328391470015e-05,
+      "learning_rate": 0.00018617421789431747,
+      "loss": 46.0,
+      "step": 229
+    },
+    {
+      "epoch": 0.004629769417353584,
+      "grad_norm": 4.1729483200469986e-05,
+      "learning_rate": 0.0001860506940594955,
+      "loss": 46.0,
+      "step": 230
+    },
+    {
+      "epoch": 0.0046498988496029466,
+      "grad_norm": 4.251101199770346e-05,
+      "learning_rate": 0.00018592666223268917,
+      "loss": 46.0,
+      "step": 231
+    },
+    {
+      "epoch": 0.00467002828185231,
+      "grad_norm": 4.2483963625272736e-05,
+      "learning_rate": 0.00018580212314610846,
+      "loss": 46.0,
+      "step": 232
+    },
+    {
+      "epoch": 0.004690157714101674,
+      "grad_norm": 3.098902016063221e-05,
+      "learning_rate": 0.0001856770775349579,
+      "loss": 46.0,
+      "step": 233
+    },
+    {
+      "epoch": 0.004710287146351037,
+      "grad_norm": 2.9945371352368966e-05,
+      "learning_rate": 0.00018555152613743215,
+      "loss": 46.0,
+      "step": 234
+    },
+    {
+      "epoch": 0.0047304165786004005,
+      "grad_norm": 4.764752884511836e-05,
+      "learning_rate": 0.00018542546969471183,
+      "loss": 46.0,
+      "step": 235
+    },
+    {
+      "epoch": 0.004750546010849764,
+      "grad_norm": 2.68215353571577e-05,
+      "learning_rate": 0.00018529890895095902,
+      "loss": 46.0,
+      "step": 236
+    },
+    {
+      "epoch": 0.004770675443099127,
+      "grad_norm": 5.318366311257705e-05,
+      "learning_rate": 0.00018517184465331288,
+      "loss": 46.0,
+      "step": 237
+    },
+    {
+      "epoch": 0.004790804875348491,
+      "grad_norm": 7.759372965665534e-05,
+      "learning_rate": 0.00018504427755188521,
+      "loss": 46.0,
+      "step": 238
+    },
+    {
+      "epoch": 0.0048109343075978545,
+      "grad_norm": 2.4518141799489968e-05,
+      "learning_rate": 0.00018491620839975617,
+      "loss": 46.0,
+      "step": 239
+    },
+    {
+      "epoch": 0.004831063739847217,
+      "grad_norm": 2.9744596758973785e-05,
+      "learning_rate": 0.00018478763795296962,
+      "loss": 46.0,
+      "step": 240
+    },
+    {
+      "epoch": 0.004851193172096581,
+      "grad_norm": 3.903737888322212e-05,
+      "learning_rate": 0.0001846585669705288,
+      "loss": 46.0,
+      "step": 241
+    },
+    {
+      "epoch": 0.004871322604345945,
+      "grad_norm": 3.140496482956223e-05,
+      "learning_rate": 0.00018452899621439182,
+      "loss": 46.0,
+      "step": 242
+    },
+    {
+      "epoch": 0.004891452036595308,
+      "grad_norm": 2.7846319426316768e-05,
+      "learning_rate": 0.00018439892644946722,
+      "loss": 46.0,
+      "step": 243
+    },
+    {
+      "epoch": 0.004911581468844671,
+      "grad_norm": 2.935269549197983e-05,
+      "learning_rate": 0.00018426835844360929,
+      "loss": 46.0,
+      "step": 244
+    },
+    {
+      "epoch": 0.004931710901094035,
+      "grad_norm": 2.9461683880072087e-05,
+      "learning_rate": 0.00018413729296761364,
+      "loss": 46.0,
+      "step": 245
+    },
+    {
+      "epoch": 0.004951840333343398,
+      "grad_norm": 3.557924719643779e-05,
+      "learning_rate": 0.00018400573079521278,
+      "loss": 46.0,
+      "step": 246
+    },
+    {
+      "epoch": 0.004971969765592762,
+      "grad_norm": 3.282381294411607e-05,
+      "learning_rate": 0.0001838736727030712,
+      "loss": 46.0,
+      "step": 247
+    },
+    {
+      "epoch": 0.0049920991978421245,
+      "grad_norm": 4.159653326496482e-05,
+      "learning_rate": 0.00018374111947078124,
+      "loss": 46.0,
+      "step": 248
+    },
+    {
+      "epoch": 0.005012228630091488,
+      "grad_norm": 3.4549964766483754e-05,
+      "learning_rate": 0.00018360807188085807,
+      "loss": 46.0,
+      "step": 249
+    },
+    {
+      "epoch": 0.005032358062340852,
+      "grad_norm": 4.0204184188041836e-05,
+      "learning_rate": 0.00018347453071873536,
+      "loss": 46.0,
+      "step": 250
+    },
+    {
+      "epoch": 0.005052487494590215,
+      "grad_norm": 8.349636482307687e-05,
+      "learning_rate": 0.00018334049677276045,
+      "loss": 46.0,
+      "step": 251
+    },
+    {
+      "epoch": 0.0050726169268395785,
+      "grad_norm": 3.3643322240095586e-05,
+      "learning_rate": 0.0001832059708341899,
+      "loss": 46.0,
+      "step": 252
+    },
+    {
+      "epoch": 0.005092746359088942,
+      "grad_norm": 3.255937190260738e-05,
+      "learning_rate": 0.00018307095369718456,
+      "loss": 46.0,
+      "step": 253
+    },
+    {
+      "epoch": 0.005112875791338305,
+      "grad_norm": 3.45467560691759e-05,
+      "learning_rate": 0.00018293544615880517,
+      "loss": 46.0,
+      "step": 254
+    },
+    {
+      "epoch": 0.005133005223587669,
+      "grad_norm": 6.099267557146959e-05,
+      "learning_rate": 0.00018279944901900737,
+      "loss": 46.0,
+      "step": 255
+    },
+    {
+      "epoch": 0.0051531346558370324,
+      "grad_norm": 3.314892455819063e-05,
+      "learning_rate": 0.00018266296308063718,
+      "loss": 46.0,
+      "step": 256
+    },
+    {
+      "epoch": 0.005173264088086395,
+      "grad_norm": 2.7799773306469433e-05,
+      "learning_rate": 0.00018252598914942622,
+      "loss": 46.0,
+      "step": 257
+    },
+    {
+      "epoch": 0.005193393520335759,
+      "grad_norm": 4.2107418266823515e-05,
+      "learning_rate": 0.00018238852803398689,
+      "loss": 46.0,
+      "step": 258
+    },
+    {
+      "epoch": 0.005213522952585123,
+      "grad_norm": 6.404446321539581e-05,
+      "learning_rate": 0.00018225058054580765,
+      "loss": 46.0,
+      "step": 259
+    },
+    {
+      "epoch": 0.0052336523848344856,
+      "grad_norm": 5.3031737479614094e-05,
+      "learning_rate": 0.0001821121474992482,
+      "loss": 46.0,
+      "step": 260
+    },
+    {
+      "epoch": 0.005253781817083849,
+      "grad_norm": 4.130045635974966e-05,
+      "learning_rate": 0.00018197322971153467,
+      "loss": 46.0,
+      "step": 261
+    },
+    {
+      "epoch": 0.005273911249333213,
+      "grad_norm": 4.748915307573043e-05,
+      "learning_rate": 0.0001818338280027549,
+      "loss": 46.0,
+      "step": 262
+    },
+    {
+      "epoch": 0.005294040681582576,
+      "grad_norm": 2.8563030355144292e-05,
+      "learning_rate": 0.00018169394319585345,
+      "loss": 46.0,
+      "step": 263
+    },
+    {
+      "epoch": 0.0053141701138319395,
+      "grad_norm": 4.959934449288994e-05,
+      "learning_rate": 0.00018155357611662672,
+      "loss": 46.0,
+      "step": 264
+    },
+    {
+      "epoch": 0.005334299546081302,
+      "grad_norm": 4.6712710172869265e-05,
+      "learning_rate": 0.0001814127275937183,
+      "loss": 46.0,
+      "step": 265
+    },
+    {
+      "epoch": 0.005354428978330666,
+      "grad_norm": 0.00011124753655167297,
+      "learning_rate": 0.0001812713984586139,
+      "loss": 46.0,
+      "step": 266
+    },
+    {
+      "epoch": 0.00537455841058003,
+      "grad_norm": 4.563620314002037e-05,
+      "learning_rate": 0.00018112958954563646,
+      "loss": 46.0,
+      "step": 267
+    },
+    {
+      "epoch": 0.005394687842829393,
+      "grad_norm": 5.554988456424326e-05,
+      "learning_rate": 0.00018098730169194117,
+      "loss": 46.0,
+      "step": 268
+    },
+    {
+      "epoch": 0.005414817275078756,
+      "grad_norm": 4.447490573511459e-05,
+      "learning_rate": 0.00018084453573751072,
+      "loss": 46.0,
+      "step": 269
+    },
+    {
+      "epoch": 0.00543494670732812,
+      "grad_norm": 3.21212355629541e-05,
+      "learning_rate": 0.00018070129252515014,
+      "loss": 46.0,
+      "step": 270
+    },
+    {
+      "epoch": 0.005455076139577483,
+      "grad_norm": 3.499364902381785e-05,
+      "learning_rate": 0.00018055757290048202,
+      "loss": 46.0,
+      "step": 271
+    },
+    {
+      "epoch": 0.005475205571826847,
+      "grad_norm": 4.179975076112896e-05,
+      "learning_rate": 0.00018041337771194121,
+      "loss": 46.0,
+      "step": 272
+    },
+    {
+      "epoch": 0.00549533500407621,
+      "grad_norm": 5.2844952733721584e-05,
+      "learning_rate": 0.0001802687078107702,
+      "loss": 46.0,
+      "step": 273
+    },
+    {
+      "epoch": 0.005515464436325573,
+      "grad_norm": 2.9436003387672827e-05,
+      "learning_rate": 0.0001801235640510138,
+      "loss": 46.0,
+      "step": 274
+    },
+    {
+      "epoch": 0.005535593868574937,
+      "grad_norm": 0.00010626760922605172,
+      "learning_rate": 0.0001799779472895142,
+      "loss": 46.0,
+      "step": 275
+    },
+    {
+      "epoch": 0.005555723300824301,
+      "grad_norm": 7.006096711847931e-05,
+      "learning_rate": 0.00017983185838590587,
+      "loss": 46.0,
+      "step": 276
+    },
+    {
+      "epoch": 0.0055758527330736635,
+      "grad_norm": 4.731449007522315e-05,
+      "learning_rate": 0.0001796852982026107,
+      "loss": 46.0,
+      "step": 277
+    },
+    {
+      "epoch": 0.005595982165323027,
+      "grad_norm": 2.740498530329205e-05,
+      "learning_rate": 0.00017953826760483255,
+      "loss": 46.0,
+      "step": 278
+    },
+    {
+      "epoch": 0.005616111597572391,
+      "grad_norm": 2.5784778699744493e-05,
+      "learning_rate": 0.00017939076746055239,
+      "loss": 46.0,
+      "step": 279
+    },
+    {
+      "epoch": 0.005636241029821754,
+      "grad_norm": 3.0875242373440415e-05,
+      "learning_rate": 0.00017924279864052313,
+      "loss": 46.0,
+      "step": 280
+    },
+    {
+      "epoch": 0.0056563704620711175,
+      "grad_norm": 2.555253195168916e-05,
+      "learning_rate": 0.00017909436201826444,
+      "loss": 46.0,
+      "step": 281
+    },
+    {
+      "epoch": 0.00567649989432048,
+      "grad_norm": 3.1929652323015034e-05,
+      "learning_rate": 0.00017894545847005764,
+      "loss": 46.0,
+      "step": 282
+    },
+    {
+      "epoch": 0.005696629326569844,
+      "grad_norm": 5.2126772061455995e-05,
+      "learning_rate": 0.00017879608887494045,
+      "loss": 46.0,
+      "step": 283
+    },
+    {
+      "epoch": 0.005716758758819208,
+      "grad_norm": 2.7905460228794254e-05,
+      "learning_rate": 0.00017864625411470193,
+      "loss": 46.0,
+      "step": 284
+    },
+    {
+      "epoch": 0.005736888191068571,
+      "grad_norm": 5.273651913739741e-05,
+      "learning_rate": 0.00017849595507387714,
+      "loss": 46.0,
+      "step": 285
+    },
+    {
+      "epoch": 0.005757017623317934,
+      "grad_norm": 2.429057531116996e-05,
+      "learning_rate": 0.00017834519263974197,
+      "loss": 46.0,
+      "step": 286
+    },
+    {
+      "epoch": 0.005777147055567298,
+      "grad_norm": 3.3973785320995376e-05,
+      "learning_rate": 0.00017819396770230793,
+      "loss": 46.0,
+      "step": 287
+    },
+    {
+      "epoch": 0.005797276487816661,
+      "grad_norm": 3.730989556061104e-05,
+      "learning_rate": 0.0001780422811543169,
+      "loss": 46.0,
+      "step": 288
+    },
+    {
+      "epoch": 0.0058174059200660246,
+      "grad_norm": 5.928779864916578e-05,
+      "learning_rate": 0.00017789013389123582,
+      "loss": 46.0,
+      "step": 289
+    },
+    {
+      "epoch": 0.005837535352315388,
+      "grad_norm": 3.284361446276307e-05,
+      "learning_rate": 0.00017773752681125133,
+      "loss": 46.0,
+      "step": 290
+    },
+    {
+      "epoch": 0.005857664784564751,
+      "grad_norm": 2.5975041353376582e-05,
+      "learning_rate": 0.00017758446081526472,
+      "loss": 46.0,
+      "step": 291
+    },
+    {
+      "epoch": 0.005877794216814115,
+      "grad_norm": 4.9675658374326304e-05,
+      "learning_rate": 0.00017743093680688628,
+      "loss": 46.0,
+      "step": 292
+    },
+    {
+      "epoch": 0.0058979236490634785,
+      "grad_norm": 3.443100649747066e-05,
+      "learning_rate": 0.00017727695569243025,
+      "loss": 46.0,
+      "step": 293
+    },
+    {
+      "epoch": 0.005918053081312841,
+      "grad_norm": 4.2306735849706456e-05,
+      "learning_rate": 0.00017712251838090929,
+      "loss": 46.0,
+      "step": 294
+    },
+    {
+      "epoch": 0.005938182513562205,
+      "grad_norm": 5.587004852714017e-05,
+      "learning_rate": 0.00017696762578402918,
+      "loss": 46.0,
+      "step": 295
+    },
+    {
+      "epoch": 0.005958311945811569,
+      "grad_norm": 4.021718632429838e-05,
+      "learning_rate": 0.0001768122788161835,
+      "loss": 46.0,
+      "step": 296
+    },
+    {
+      "epoch": 0.005978441378060932,
+      "grad_norm": 3.435139296925627e-05,
+      "learning_rate": 0.00017665647839444808,
+      "loss": 46.0,
+      "step": 297
+    },
+    {
+      "epoch": 0.005998570810310295,
+      "grad_norm": 4.693563096225262e-05,
+      "learning_rate": 0.0001765002254385757,
+      "loss": 46.0,
+      "step": 298
+    },
+    {
+      "epoch": 0.006018700242559658,
+      "grad_norm": 3.511687464197166e-05,
+      "learning_rate": 0.0001763435208709906,
+      "loss": 46.0,
+      "step": 299
+    },
+    {
+      "epoch": 0.006038829674809022,
+      "grad_norm": 5.281609992380254e-05,
+      "learning_rate": 0.00017618636561678316,
+      "loss": 46.0,
+      "step": 300
+    },
+    {
+      "epoch": 0.006058959107058386,
+      "grad_norm": 6.96783245075494e-05,
+      "learning_rate": 0.0001760287606037043,
+      "loss": 46.0,
+      "step": 301
+    },
+    {
+      "epoch": 0.0060790885393077485,
+      "grad_norm": 3.3282187359873205e-05,
+      "learning_rate": 0.00017587070676215993,
+      "loss": 46.0,
+      "step": 302
+    },
+    {
+      "epoch": 0.006099217971557112,
+      "grad_norm": 7.593463669763878e-05,
+      "learning_rate": 0.0001757122050252058,
+      "loss": 46.0,
+      "step": 303
+    },
+    {
+      "epoch": 0.006119347403806476,
+      "grad_norm": 6.294970808085054e-05,
+      "learning_rate": 0.0001755532563285416,
+      "loss": 46.0,
+      "step": 304
+    },
+    {
+      "epoch": 0.006139476836055839,
+      "grad_norm": 3.691632446134463e-05,
+      "learning_rate": 0.0001753938616105056,
+      "loss": 46.0,
+      "step": 305
+    },
+    {
+      "epoch": 0.0061596062683052025,
+      "grad_norm": 4.616468140739016e-05,
+      "learning_rate": 0.0001752340218120693,
+      "loss": 46.0,
+      "step": 306
+    },
+    {
+      "epoch": 0.006179735700554566,
+      "grad_norm": 2.737195791269187e-05,
+      "learning_rate": 0.00017507373787683142,
+      "loss": 46.0,
+      "step": 307
+    },
+    {
+      "epoch": 0.006199865132803929,
+      "grad_norm": 6.505291094072163e-05,
+      "learning_rate": 0.00017491301075101278,
+      "loss": 46.0,
+      "step": 308
+    },
+    {
+      "epoch": 0.006219994565053293,
+      "grad_norm": 5.131972284289077e-05,
+      "learning_rate": 0.0001747518413834505,
+      "loss": 46.0,
+      "step": 309
+    },
+    {
+      "epoch": 0.0062401239973026565,
+      "grad_norm": 4.8223384510492906e-05,
+      "learning_rate": 0.0001745902307255924,
+      "loss": 46.0,
+      "step": 310
+    },
+    {
+      "epoch": 0.006260253429552019,
+      "grad_norm": 3.8179550756467506e-05,
+      "learning_rate": 0.00017442817973149145,
+      "loss": 46.0,
+      "step": 311
+    },
+    {
+      "epoch": 0.006280382861801383,
+      "grad_norm": 7.28157683624886e-05,
+      "learning_rate": 0.0001742656893578001,
+      "loss": 46.0,
+      "step": 312
+    },
+    {
+      "epoch": 0.006300512294050747,
+      "grad_norm": 4.902153159491718e-05,
+      "learning_rate": 0.00017410276056376456,
+      "loss": 46.0,
+      "step": 313
+    },
+    {
+      "epoch": 0.00632064172630011,
+      "grad_norm": 6.659854261670262e-05,
+      "learning_rate": 0.00017393939431121933,
+      "loss": 46.0,
+      "step": 314
+    },
+    {
+      "epoch": 0.006340771158549473,
+      "grad_norm": 5.896111542824656e-05,
+      "learning_rate": 0.00017377559156458132,
+      "loss": 46.0,
+      "step": 315
+    },
+    {
+      "epoch": 0.006360900590798836,
+      "grad_norm": 3.361068957019597e-05,
+      "learning_rate": 0.00017361135329084428,
+      "loss": 46.0,
+      "step": 316
+    },
+    {
+      "epoch": 0.0063810300230482,
+      "grad_norm": 8.01550195319578e-05,
+      "learning_rate": 0.00017344668045957305,
+      "loss": 46.0,
+      "step": 317
+    },
+    {
+      "epoch": 0.0064011594552975636,
+      "grad_norm": 7.291202200576663e-05,
+      "learning_rate": 0.0001732815740428978,
+      "loss": 46.0,
+      "step": 318
+    },
+    {
+      "epoch": 0.006421288887546926,
+      "grad_norm": 4.988636646885425e-05,
+      "learning_rate": 0.00017311603501550838,
+      "loss": 46.0,
+      "step": 319
+    },
+    {
+      "epoch": 0.00644141831979629,
+      "grad_norm": 4.8562131269136444e-05,
+      "learning_rate": 0.00017295006435464848,
+      "loss": 46.0,
+      "step": 320
+    },
+    {
+      "epoch": 0.006461547752045654,
+      "grad_norm": 3.899990770150907e-05,
+      "learning_rate": 0.00017278366304010993,
+      "loss": 46.0,
+      "step": 321
+    },
+    {
+      "epoch": 0.006481677184295017,
+      "grad_norm": 8.76895574037917e-05,
+      "learning_rate": 0.00017261683205422687,
+      "loss": 46.0,
+      "step": 322
+    },
+    {
+      "epoch": 0.00650180661654438,
+      "grad_norm": 6.916802522027865e-05,
+      "learning_rate": 0.00017244957238186993,
+      "loss": 46.0,
+      "step": 323
+    },
+    {
+      "epoch": 0.006521936048793744,
+      "grad_norm": 7.918164919828996e-05,
+      "learning_rate": 0.00017228188501044043,
+      "loss": 46.0,
+      "step": 324
+    },
+    {
+      "epoch": 0.006542065481043107,
+      "grad_norm": 0.00010430561087559909,
+      "learning_rate": 0.00017211377092986476,
+      "loss": 46.0,
+      "step": 325
+    },
+    {
+      "epoch": 0.006562194913292471,
+      "grad_norm": 3.571771958377212e-05,
+      "learning_rate": 0.00017194523113258804,
+      "loss": 46.0,
+      "step": 326
+    },
+    {
+      "epoch": 0.006562194913292471,
+      "eval_loss": 11.5,
+      "eval_runtime": 125.9586,
+      "eval_samples_per_second": 166.07,
+      "eval_steps_per_second": 83.035,
+      "step": 326
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 1303,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 326,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 7535828779008.0,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
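The state file above is mostly a `log_history` list. Each per-step training record carries `loss`, `grad_norm` and `learning_rate`. Evaluation records carry `eval_loss` and timing fields. A minimal Python sketch for splitting the two is below. It assumes the standard Hugging Face `Trainer` layout, including top-level `global_step` and `max_steps` keys; only `max_steps` is visible in this excerpt. The helper names `load_trainer_state` and `summarize_trainer_state` are hypothetical.

```python
import json
from typing import Any, Optional


def load_trainer_state(path: str) -> dict[str, Any]:
    """Read a trainer_state.json file (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_trainer_state(state: dict[str, Any]) -> dict[str, Any]:
    """Split log_history into train and eval records and report the latest values."""
    history = state.get("log_history", [])
    # Training records carry "loss"; evaluation records carry "eval_loss" instead.
    train = [r for r in history if "loss" in r]
    evals = [r for r in history if "eval_loss" in r]

    step: Optional[int] = state.get("global_step")
    max_steps: Optional[int] = state.get("max_steps")
    # Fraction of the scheduled steps completed, if both counters are present.
    progress = step / max_steps if step is not None and max_steps else None

    return {
        "global_step": step,
        "max_steps": max_steps,
        "num_train_logs": len(train),
        "last_train_loss": train[-1]["loss"] if train else None,
        "last_eval_loss": evals[-1]["eval_loss"] if evals else None,
        "progress": progress,
    }
```

Usage, assuming the checkpoint directory from the file listing: `summarize_trainer_state(load_trainer_state("last-checkpoint/trainer_state.json"))`. With `save_steps` of 326 and `max_steps` of 1303, this checkpoint sits about a quarter of the way through the scheduled run.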

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:33d4bc277b56b41edc4bffcdbfbe9d17ee64bf4defcf31380e449771efdfc1d3
+size 6776

last-checkpoint/vocab.json ADDED

The diff for this file is too large to render.