# AIDO.Protein-16B-v1

## Model Description

We pretrained our model in three stages. This model represents the final stage, in which we continued training AIDO.Protein-16B on an additional 100 billion amino acids from UniRef90.

## How to Use

### Build any downstream model from this backbone

#### Embedding

```python
from genbio_finetune.tasks import Embed

model = Embed.from_config({"model.backbone": "proteinfm_v1"}).eval()
collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
embedding = model(collated_batch)
print(embedding.shape)
print(embedding)
```

#### Sequence-Level Classification

```python
import torch
from genbio_finetune.tasks import SequenceClassification

model = SequenceClassification.from_config(
    {"model.backbone": "proteinfm_v1", "model.n_classes": 2}
).eval()
collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
logits = model(collated_batch)
print(logits)
print(torch.argmax(logits, dim=-1))
```

#### Token-Level Classification

```python
import torch
from genbio_finetune.tasks import TokenClassification

model = TokenClassification.from_config(
    {"model.backbone": "proteinfm_v1", "model.n_classes": 3}
).eval()
collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
logits = model(collated_batch)
print(logits)
print(torch.argmax(logits, dim=-1))
```

#### Regression

```python
from genbio_finetune.tasks import SequenceRegression

model = SequenceRegression.from_config({"model.backbone": "proteinfm_v1"}).eval()
collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
logits = model(collated_batch)
print(logits)
```

#### Protein-Protein Interaction

#### Or use our one-liner CLI to finetune or evaluate any of the above!
```shell
gbft fit --model SequenceClassification --model.backbone proteinfm_v1 --data SequenceClassification --data.path
gbft test --model SequenceClassification --model.backbone proteinfm_v1 --data SequenceClassification --data.path
```

For more information, visit: [Model Generator](https://github.com/genbio-ai/modelgenerator)
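The `Embed` task above returns per-token embeddings. A common downstream step, when you need one fixed-size vector per sequence (e.g. for clustering or nearest-neighbor search), is masked mean-pooling over the token dimension. Below is a minimal NumPy sketch of that pooling step; the tensor shapes and the `attention_mask` name are illustrative assumptions, not part of the `genbio_finetune` API:

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings over real (non-padding) positions.

    token_embeddings: (batch, seq_len, hidden) array
    attention_mask:   (batch, seq_len) array of 1s (real tokens) and 0s (padding)
    """
    # Broadcast the mask over the hidden dimension so padded tokens contribute 0.
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, hidden)
    counts = mask.sum(axis=1).clip(min=1.0)                          # avoid divide-by-zero
    return summed / counts

# Toy example: batch of 2 sequences, 3 tokens each, hidden size 4.
emb = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mask = np.array([[1, 1, 0],   # second sequence has one padding token
                 [1, 1, 1]])
pooled = mean_pool(emb, mask)
print(pooled.shape)  # (2, 4)
```

Because the mask zeroes out padded positions before averaging, sequences of different lengths in the same batch yield comparable vectors.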