
Waseem AlShikh

wassemgtk

AI & ML interests

Multi-modal, Palmyra LLMs, Knowledge Graph

Organizations

Writer, Social Post Explorers

wassemgtk's activity

reacted to samjulien's post with 🔥 7 months ago
🔥 Today, Writer dropped Palmyra-Med-70b and Palmyra-Fin-70b, two new domain-specific models that are setting a new standard for medical and financial model performance.

TL;DR
Palmyra-Med-70b
🔢 8k and 32k context versions available
🚀 MMLU performance of ~86%, outperforming other top models
👨‍⚕️ Well suited to diagnosis, treatment planning, medical research, and insurance coding and billing
📃 Open-model license for non-commercial use cases
🤗 Available on Hugging Face: Writer/Palmyra-Med-70B
💾 Live on NVIDIA NIM: https://build.nvidia.com/writer/palmyra-med-70b

Palmyra-Fin-70b
🚀 Passed the CFA Level III exam with a 73% score, the first model to do so
💸 Skilled at complex tasks like investment research, financial analysis, and sentiment analysis
📈 Outperformed other top models on a long-fin-eval test of real-world use cases
📃 Open-model license for non-commercial use cases
🤗 Available on Hugging Face: Writer/Palmyra-Fin-70B-32K
💾 Live on NVIDIA NIM: https://build.nvidia.com/writer/palmyra-fin-70b-32k

Try them out and let us know what you think!
posted an update 10 months ago
The Writer team had the opportunity to run an eval of Mixtral-8x22b, and the results were interesting.

| Benchmark | Score |
| --- | --- |
| mmlu | 77.26 |
| hellaswag | 88.81 |
| truthfulqa | 52.05 |
| arc_challenge | 70.31 |
| winogrande | 84.93 |
| gsm8k | 76.65 |
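For a single headline number, the six scores above can be collapsed into an unweighted mean. This is a minimal sketch that assumes equal weighting across benchmarks; the post itself does not report an aggregate score.

```python
# Mixtral-8x22b scores as reported in the post (percent).
scores = {
    "mmlu": 77.26,
    "hellaswag": 88.81,
    "truthfulqa": 52.05,
    "arc_challenge": 70.31,
    "winogrande": 84.93,
    "gsm8k": 76.65,
}

# Assumption: every benchmark counts equally toward the average.
average = sum(scores.values()) / len(scores)
print(f"average: {average:.2f}")  # prints "average: 75.00"
```

The unweighted mean lets TruthfulQA's much lower score pull the average down noticeably, so it is worth reading alongside the per-benchmark numbers rather than instead of them.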
posted an update 11 months ago
We are thrilled to announce the release of the OmniACT dataset! This dataset and benchmark aims to push the limits of how well virtual agents can automate everyday computer tasks. Imagine less clicking and typing, and more watching as your computer handles tasks such as organizing schedules or making travel arrangements on its own.

Check it out ➡️ OmniACT Dataset on Hugging Face: Writer/omniact

For a deep dive, here’s the paper: [OmniACT Paper](https://arxiv.org/abs/2402.17553)