bug: fix minor bugs
- .github/workflows/huggingface-sync.yml +13 -2
- README.md +19 -17
.github/workflows/huggingface-sync.yml
CHANGED
@@ -18,6 +18,15 @@ jobs:
           git config --global user.email "github-actions[bot]@users.noreply.github.com"
           git config --global user.name "github-actions[bot]"

+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.x"
+
+      - name: Install Hugging Face CLI
+        run: |
+          pip install --upgrade huggingface-hub
+
       - name: Login to Hugging Face
         env:
           HF_TOKEN: ${{ secrets.HUGGINGFACE_TOKEN }}
@@ -26,5 +35,7 @@ jobs:

       - name: Push to Hugging Face Space
         run: |
-          git remote add space https://huggingface.co/spaces/JeffYang52415/LLMEval-Dataset-Parser
-          git
+          git remote add space https://huggingface.co/spaces/JeffYang52415/LLMEval-Dataset-Parser || true
+          git fetch space || true
+          # Force push to ensure sync, use with caution
+          git push -f space main:main
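The revised push step tolerates a failing `git remote add` and `git fetch` (via `|| true`) but still fails the job if the force push fails. For local testing, the same logic might be sketched in Python roughly as follows. The function name `sync_to_space`, the `Runner` type, and the injectable `run` parameter are hypothetical and not part of this commit. Only the git commands are taken from the workflow.

```python
import subprocess
from typing import Callable, List, Sequence

# URL taken from the workflow; everything else in this sketch is illustrative.
SPACE_URL = "https://huggingface.co/spaces/JeffYang52415/LLMEval-Dataset-Parser"

# A runner takes a command and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def sync_to_space(
    run: Runner = _run,
    remote: str = "space",
    url: str = SPACE_URL,
    branch: str = "main",
) -> List[List[str]]:
    """Mirror the workflow's push step and return the commands issued."""
    issued: List[List[str]] = []

    def step(cmd: List[str], tolerate_failure: bool) -> None:
        issued.append(cmd)
        code = run(cmd)
        if code != 0 and not tolerate_failure:
            raise RuntimeError(f"{' '.join(cmd)} exited with status {code}")

    # Equivalent of `git remote add ... || true`: the remote may already exist.
    step(["git", "remote", "add", remote, url], tolerate_failure=True)
    # Equivalent of `git fetch space || true`.
    step(["git", "fetch", remote], tolerate_failure=True)
    # Force push overwrites the remote branch; a failure here is not ignored.
    step(["git", "push", "-f", remote, f"{branch}:{branch}"], tolerate_failure=False)
    return issued
```

Passing a stub as `run` makes it possible to check the command sequence without touching a real repository. Calling `sync_to_space()` with the default runner executes the git commands in the current working directory.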
README.md
CHANGED
@@ -13,10 +13,12 @@ short_description: A collection of parsers for LLM benchmark datasets

 **LLMDataParser** is a Python library that provides parsers for benchmark datasets used in evaluating Large Language Models (LLMs). It offers a unified interface for loading and parsing datasets like **MMLU**, **GSM8k**, and others, streamlining dataset preparation for LLM evaluation. The library aims to simplify the process of working with common LLM benchmark datasets through a consistent API.

+**Spaces**: You can also try out the online demo on Hugging Face Spaces:
+[LLMEval Dataset Parser Demo](https://huggingface.co/spaces/JeffYang52415/LLMEval-Dataset-Parser)
+
 ## Features

 - **Unified Interface**: Consistent `DatasetParser` for all datasets.
-- **LLM-Agnostic**: Independent of any specific language model.
 - **Easy to Use**: Simple methods and built-in Python types.
 - **Extensible**: Easily add support for new datasets.
 - **Gradio**: Built-in Gradio interface for interactive dataset exploration and testing.
@@ -78,22 +80,22 @@ Poetry manages the virtual environment and dependencies automatically, so you do
 Here's a simple example demonstrating how to use the library:

 ```python
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+from llmdataparser import ParserRegistry
+# list all available parsers
+ParserRegistry.list_parsers()
+# get a parser
+parser = ParserRegistry.get_parser("mmlu")
+# load the parser
+parser.load() # optional: task_name, split
+# parse the parser
+parser.parse() # optional: split_names
+
+print(parser.task_names)
+print(parser.split_names)
+print(parser.get_dataset_description)
+print(parser.get_huggingface_link)
+print(parser.total_tasks)
+data = parser.get_parsed_data
 ```

 We also provide a Gradio demo for interactive testing: