# Model Card for Model ID
This is a vanilla example of a model resulting from DoRA fine-tuning. The Jupyter notebook here: https://github.com/david-thrower/DoRA-fine-tuning-gemma-2-2b-it provides a template for DoRA / LoRA fine-tuning of Gemma 2-2B instruct. The template can easily be modified to create a custom LLM that solves a real-world problem.
## Model Details
### Gemma-2 Fine-Tuning with LoRA and DoRA: A Practical, Plug-and-Play Template
**Overview:**
The notebook where this model was developed provides a practical, simple-case template for fine-tuning Gemma-2 models (2B, 9B, 27B) using Weight-Decomposed Low-Rank Adaptation (DoRA), a refinement of Low-Rank Adaptation (LoRA), on a free-tier Google Colab GPU. This approach allows for efficient customization of Gemma-2 for specific tasks without the computational overhead of full fine-tuning. The basic concepts are discussed, but the notebook is meant to be a practical template that lets any developer, at any level, "just plug and play" without needing a PhD in math.
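As a rough sketch of what the notebook sets up, the snippet below loads Gemma 2-2B-IT and wraps it with a DoRA adapter via the PEFT library's `use_dora=True` flag. The hyperparameter values here are illustrative placeholders, not the notebook's exact settings:

```python
# Minimal sketch of the DoRA setup, assuming recent transformers and
# peft (>= 0.9.0, where use_dora was introduced). Hyperparameters are
# illustrative; see the notebook for the values actually used.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "google/gemma-2-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # the free-tier T4 lacks bfloat16 support
    device_map="auto",
)

peft_config = LoraConfig(
    r=8,                        # adapter rank (illustrative)
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj"],
    task_type="CAUSAL_LM",
    use_dora=True,              # this one flag upgrades LoRA to DoRA
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # typically well under 1% trainable
```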
### Model Description
#### The Problem: "Off-the-Shelf" LLMs Are Jacks of All Trades and Masters of None
Gemma-2 offers impressive performance, especially for its size, excelling at code generation, complex question answering, and following nuanced instructions. The quality of its writing, the explanations it generates, and its human-like writing style are also rather impressive. However, like other pre-trained LLMs, its performance on a niche task usually needs to be enhanced a bit, and that is where fine-tuning on task-specific data comes in. Traditional fine-tuning is computationally expensive, costs thousands of dollars in compute resources, and leaves a gaping carbon footprint, making it impractical for many users.
#### LoRA: A Second-Generation Approach to Parameter-Efficient Fine-Tuning
- LoRA addresses this challenge by freezing the pre-trained model weights, in other words, leaving the existing model as-is, and training a small set of new weights added in parallel to some of the model's layers.
- The benefit: this drastically reduces the number of trainable parameters, enabling efficient fine-tuning on consumer-grade hardware. It achieves accuracy close to full fine-tuning while requiring as little as 1% of the compute resources.
- The drawback we really want to avoid here: the accuracy of models produced by classic LoRA is usually lower than that of full fine-tuning.
- For advanced users: these adapters are low-rank matrices (adapter weights) injected alongside specific layers, usually the query, key, and value projection layers. See the parameter-count sketch after this list.
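To make the "as little as 1%" figure concrete, here is a back-of-the-envelope calculation of the adapter size for a single weight matrix. The dimensions are illustrative, not Gemma-2's actual layer shapes:

```python
# For a frozen d x k weight matrix W, LoRA trains a rank-r update
# B @ A (B is d x r, A is r x k), adding only r * (d + k) parameters.
d, k, r = 2048, 2048, 8
full_params = d * k          # what full fine-tuning would update
lora_params = r * (d + k)    # what the LoRA adapter trains instead
print(f"adapter is {100 * lora_params / full_params:.2f}% of the layer")
# -> adapter is 0.78% of the layer
```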
#### DoRA: A Third-Generation Approach to Parameter-Efficient Fine-Tuning (Used Here)
- Weight-Decomposed Low-Rank Adaptation (DoRA) builds upon LoRA by adding a weight decomposition that improves accuracy without much additional computational expense. You don't really need to understand what is happening under the hood to use it. This template is fairly robust and should work reasonably well on a lot of data sets.
- The benefits: like conventional LoRA, we leave the model's original weights as-is and only train added adapters that account for less than 1% of the model's weights. Unlike conventional LoRA, DoRA will often create models as accurate as those produced by expensive full fine-tuning, and if not equally accurate, very close to it in most cases, provided the work is done carefully, optimized well, and trained on the right data.
- **For advanced users:** DoRA decomposes the weight update into two parts, magnitude and direction. The direction is handled by normal LoRA, whereas the magnitude is handled by a separate learnable parameter (see the sketch after this list). You can read more in the DoRA paper (Liu et al., 2024), but to stay true to the scope of this notebook, which is to serve as a practical template for arriving at a proof-of-concept or MVP custom LLM that advanced users can refine later, we refer you to the paper and other academic materials rather than going deep into the details.
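For the curious, here is a conceptual sketch of that decomposition. It is illustrative pseudocode of the math from the DoRA paper, not an excerpt from the PEFT implementation, and the tensor sizes are arbitrary:

```python
# DoRA reparameterizes the adapted weight as a learnable per-column
# magnitude m times the column-normalized direction of (W0 + B @ A).
import torch

d, k, r = 64, 64, 4
W0 = torch.randn(d, k)               # frozen pre-trained weight
A = torch.randn(r, k) * 0.01         # LoRA down-projection (trainable)
B = torch.zeros(d, r)                # LoRA up-projection (trainable)
m = W0.norm(dim=0, keepdim=True)     # magnitude, initialized to ||W0||

V = W0 + B @ A                       # direction update: plain LoRA
W_adapted = m * V / V.norm(dim=0, keepdim=True)  # DoRA-adapted weight
```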
#### Why This Template (From the GitHub Link Above)?
- Practical, Plug and Play: If you don't understand the theory discussed here, no problem. If you understand the basics of Python and follow the instructions, this template can easily be used to fine-tune your own custom LLM at no cost to you. If you are a developer, you can use other tutorials to integrate the model you create into a chatbot UI to make a practical app.
- Free-Tier Colab Ready: Designed to run efficiently on Google Colab's free T4 GPUs, making powerful LLM customization accessible to everyone.
- Scalable: Easily adaptable to the larger Gemma-2 models (9B, 27B) by simply changing the `model_name` and running in a suitable environment with more resources; see the snippet after this list.
- Simple and Customizable: Provides a clear and concise code structure that can easily be modified for various tasks and datasets.
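For example, scaling up is a one-line change, assuming an environment with enough GPU memory (the 9B and 27B variants will not fit on a free-tier T4 without quantization):

```python
# Swap in a larger Gemma-2 variant; everything else stays the same.
model_name = "google/gemma-2-9b-it"  # or "google/gemma-2-27b-it"
```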
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
- Developed by: David Thrower and Cerebros AutoML
- Funded by [optional]: David Thrower
- Shared by [optional]: David Thrower
- Model type: LLM, Fork of Gemma 2-2B-IT
- Language(s) (NLP): [More Information Needed]
- License: Gemma, Cerebros modified Apache 2.0
- Finetuned from model [optional]: Gemma 2-2B-IT
### Model Sources [optional]
- Repository: https://huggingface.co/google/gemma-2-2b-it
- Paper [optional]: [More Information Needed]
- Demo [optional]: https://github.com/david-thrower/DoRA-fine-tuning-gemma-2-2b-it
## Uses
The model and associated Jupyter notebook demonstrate how you can easily make a custom LLM to do whatever you want, using a simple template to fine-tune Gemma family models with DoRA / LoRA on free-tier Google Colab notebooks.
### Direct Use
This model is a simple vanilla demo. The Jupyter notebook associated with it will enable you to make a fine-tuned LLM that does pretty much whatever you want.
### Downstream Use [optional]
Follow the fine-tuning template and fine-tune the model to do whatever you want, as long as it is legal and ethical.
### Out-of-Scope Use
- Anything that the Cerebros modified Apache 2.0 license excludes:
  - Anything Apache 2.0 excludes
  - Military use, except as explicitly authorized by the author
  - Law enforcement use intended to aid in making decisions that lead to anyone being incarcerated, in any way managing an incarceration operation, criminal prosecution operation, jail, or prison, or participating in decisions that flag citizens for investigation or exclusion from public locations, whether physical or virtual
  - Use in committing property or violent crimes
  - Use in any application supporting the adult films industry
  - Use in any application supporting or in any way promoting the alcoholic beverages, firearms, and / or tobacco industries
  - Any use supporting the trade, marketing, or administration of prescription drugs which are commonly abused
  - Use in a manner intended to identify or discriminate against anyone on the basis of ethnicity, ideology, religion, race, demographics, familial status, family of origin, sex or gender, gender identity, sexual orientation, status of being a victim of any crime, having a history or present status of being a good-faith litigant, national origin (including citizenship or lawful resident status), disability, age, pregnancy, parental status, mental health, income, or socioeconomic / credit status (which includes lawful credit, tenant, and HR screening other than screening for criminal history)
  - Promoting controversial services such as abortion, via any and all types of marketing, market targeting, or operational, administrative, or financial support for providers of such services
  - Any use supporting any operation which attempts to sway public opinion, political alignment, or purchasing habits via means such as:
    - Misleading the public to believe that the opinions promoted by said operation are those of a different group of people than the campaign portrays them as being, for example, a political group attempting to cast an image that a given political alignment is that of low-income rural citizens when that is not consistent with known statistics on that population (commonly referred to as astroturfing)
    - Leading the public to believe premises that contradict duly accepted scientific findings, implausible doctrines, or premises that are generally regarded as heretical or occult
  - Promoting or managing any operation profiting from dishonest or unorthodox marketing practices, or from marketing unorthodox products generally regarded as a junk deal to consumers or employees (e.g. multi-level marketing operations, "businesses" that rely on 1099 contractors not ensured a regular wage for all hours worked, companies having any full-time employee paid less than $40,000 per year at the time of this writing, weighted to BLS inflation, short-term consumer lenders and retailers / car dealers offering credit to consumers who could not be approved for the same loan by an FDIC-insured bank, operations that make sales through telemarketing or automated phone calls, non-opt-in email distribution marketing, vacation timeshare operations, etc.)
  - Any use that supports copyright, trademark, patent, or trade secret infringement
  - Any use that may reasonably be deemed negligent
  - Any use intended to prevent Cerebros from operating their own commercial distribution of Cerebros, or any attempt to gain a de facto monopoly on commercial or managed-platform use of this or a derivative work
  - Any use in an AI system that is inherently designed to avoid contact from customers, employees, applicants, or citizens, or that otherwise makes decisions significantly affecting a person's life or finances without human review of ALL decisions made by said system that have an unfavorable impact on a person
    - Example of an acceptable use under this term:
      - An IVR or email routing system that predicts which department a customer's inquiry should be routed to
    - Examples of unacceptable uses under this term:
      - An IVR system designed to make it cumbersome for a customer to reach a human representative (e.g. the system has no option to reach a human representative, or the option is buried in a nested layer of a multi-layer menu of options)
      - Email screening applications that only allow selected categories of email from known customers, employees, constituents, etc. to appear in a business or government representative's inbox, blindly discarding or obfuscating all other inquiries
- Anything that violates Google's terms of use for Gemma and derivative works
## Bias, Risks, and Limitations
- It is always your responsibility to screen the models you create for bias and proper operating characteristics, as required by the professional ethics of your use case.
- Being a fork of Gemma, reasonable efforts have been made at the foundation-model level to ensure the model is fair.
### Recommendations
- Follow the DIY DoRA fine-tuning notebook at https://github.com/david-thrower/DoRA-fine-tuning-gemma-2-2b-it (download the .ipynb file and run it in Google Colab).
- Adapt the training data set to suit your own use case.
- Make contributions to the notebook and extend it.

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
## How to Get Started with the Model
Use the code below to get started with the model. For the full fine-tuning workflow, download the .ipynb file from https://github.com/david-thrower/DoRA-fine-tuning-gemma-2-2b-it and run it in Google Colab.
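This card does not pin down the model's Hub id, so the sketch below uses a placeholder; substitute the actual repo id from this page. Otherwise it is a standard transformers chat-style generation loop:

```python
# Minimal inference sketch, assuming recent transformers. Replace the
# placeholder model_id with this model's actual Hugging Face repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-model's-hub-id>"  # placeholder; copy from this page
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Gemma's chat template uses the roles "user" and "model".
messages = [{"role": "user", "content": "What were you fine-tuned to do?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```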
## Training Details
This is a basic vanilla example and a template to train your own fine-tuned Gemma 2 model.
### Training Data
A simple-case vanilla data set meant to emulate a start-up's proof of concept / MVP. It is meant to be replaced with your own data; an illustrative record shape follows below.
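As a rough illustration of what replacement data could look like, here is a list of chat-style records that can be rendered with Gemma's chat template. The exact schema the notebook expects may differ, so treat this as a hypothetical example and adapt it to the notebook's data-loading code:

```python
# Hypothetical replacement dataset: user/model turns per example.
# Note that Gemma's chat template uses "model" rather than "assistant".
train_examples = [
    {
        "messages": [
            {"role": "user", "content": "Summarize our refund policy."},
            {"role": "model", "content": "Refunds are issued within 30 days of purchase..."},
        ]
    },
    # ... add one dict per training example for your use case
]
```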
### Training Procedure
See the Jupyter notebook in the repo link. Run it in Google Colab and modify it to your use case.
#### Preprocessing [optional]
N/A
#### Training Hyperparameters
See the Jupyter notebook in the repo link.
#### Speeds, Sizes, Times [optional]
N/A
## Evaluation
Contributions welcome!
### Testing Data, Factors & Metrics
#### Testing Data
Contributions welcome!
#### Factors
Contributions welcome!
## Model Examination [optional]
Contributions welcome!
## Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: Google Colab T4
- Hours used: 0.15
- Cloud Provider: GCP
- Compute Region: N/A
- Carbon Emitted: 2g (a back-of-the-envelope check follows below)
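The 2 g figure is roughly consistent with the stated hardware and runtime. The sketch below is a sanity check, not a measurement; the T4 board power and grid carbon intensity used are assumed values:

```python
# Sanity check of the ~2 g figure, assuming a T4 board power of ~70 W
# and a grid carbon intensity of ~0.19 kg CO2e/kWh (both assumptions).
hours = 0.15
watts = 70
kwh = watts * hours / 1000      # ~0.0105 kWh of GPU energy
co2_grams = kwh * 0.19 * 1000   # ~2.0 g CO2e
print(f"{co2_grams:.1f} g CO2e")
```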
## Technical Specifications [optional]
N/A
### Model Architecture and Objective
Gemma 2 causal language model (CausalLM), not otherwise modified.
### Compute Infrastructure
N/A
#### Hardware
T4 GPU
#### Software
N/A
## Citation [optional]
N/A
**BibTeX:**
N/A
**APA:**
N/A
## Glossary [optional]
N/A
## More Information [optional]
N/A
## Model Card Authors [optional]
N/A
## Model Card Contact
David Thrower, [email protected], (239) 645-3585, https://www.linkedin.com/in/david-thrower-%F0%9F%8C%BB-2972482a