This is a character-level model (English a-z, 0-9, and so on) trained following Andrej Karpathy's llama2.c project (https://github.com/karpathy/llama2.c) on both TinyStories and a similar internal dataset I made. The last 150k comes from a subset of Cosmopedia I extracted for younger readers.

Trained for 49,152,000,000 tokens

To see and output uppercase letters, this model uses a shift-key modifier placed before the letter to be uppercased; it has never been trained on actual uppercase letters.

The modifier is ↨. Here are the functions I use to convert plain text to the modified format and back:

def add_caseifer(text):
    # Prefix each uppercase character with ↨ and lowercase it
    return ''.join(['↨' + char.lower() if char.isupper() else char for char in text])
    
def remove_caseifer(text):
    # Inverse of add_caseifer: uppercase the character that follows each ↨
    new_text = ""
    i = 0
    while i < len(text):
        if text[i] == "↨":
            if i + 1 < len(text):
                new_text += text[i + 1].upper()
                i += 1
            # a trailing ↨ with nothing after it is simply dropped
        else:
            new_text += text[i]
        i += 1
    return new_text
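A quick round-trip sanity check of the two functions (restated here so the snippet is self-contained; the example string is made up):

```python
def add_caseifer(text):
    # Prefix each uppercase character with ↨ and lowercase it
    return ''.join('↨' + c.lower() if c.isupper() else c for c in text)

def remove_caseifer(text):
    # Inverse: uppercase the character that follows each ↨
    out, i = "", 0
    while i < len(text):
        if text[i] == "↨" and i + 1 < len(text):
            out += text[i + 1].upper()
            i += 2
        elif text[i] == "↨":
            i += 1  # trailing marker with nothing after it: drop
        else:
            out += text[i]
            i += 1
    return out

original = "Hello, my name is Clara and I like cats."
encoded = add_caseifer(original)
print(encoded)  # ↨hello, my name is ↨clara and ↨i like cats.
assert remove_caseifer(encoded) == original
```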

As such, for test strings to use in chat, try something like:

↨hello, my name is ↨clara and ↨i like

Run history (W&B sparklines, not reproduced here): iter, loss/train, loss/val, lr, mfu, step_time, tokens.

Run summary: iter 500000 loss/train 0.48935 loss/val 0.45042 lr 1e-05 mfu 9.31042 step_time 63441.47873 tokens 49152000000
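The summary numbers are self-consistent: 500,000 iterations over 49,152,000,000 tokens works out to 98,304 tokens per step (presumably batch size × sequence length, neither of which is stated here):

```python
iters = 500_000
total_tokens = 49_152_000_000
tokens_per_step = total_tokens // iters
assert tokens_per_step * iters == total_tokens
print(tokens_per_step)  # 98304
```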

Model size: 85.2M params (F32, safetensors)

Model: Corianas/Microllama_Char_500k_step