# VayuBuddy Question Curation

## 🎯 Aim
This template provides an automated interface for collecting and managing data-analytics questions for VayuBuddy.

## 📂 Folder Structure

The project is organized as follows:

```bash
project_root/
│── app.py                         # Main Streamlit application
│── requirements.txt               # Dependencies list
│── README.md                      # Documentation
│
├── data/
│   ├── questions/                 # Stores question-related data
│   │   ├── 0/                     # Folder for question ID 0
│   │   │   ├── question.txt       # Question text
│   │   │   ├── answer.txt         # Answer text
│   │   │   ├── code.py            # Reference code for the question
│   │   │   └── metadata.json      # Metadata for the question
│   │   ├── 1/                     # Folder for question ID 1
│   │   │   ├── question.txt       # Question text
│   │   │   ├── answer.txt         # Answer text
│   │   │   ├── code.py            # Reference code for the question
│   │   │   └── metadata.json      # Metadata for the question
│   ... ... ...                    # and so on...
│   │   ... ...
│   │
│   └── raw_data/                  # Stores the required CSV files
│       ├── NCAP_Funding.csv       # NCAP funding data
│       ├── State.csv              # State area & population data
│       └── Data.csv               # Main AQI data
│
├── pages/                         # Streamlit multipage support
│   ├── all_question.py            # Page to view questions
│   ├── execute_code.py            # Page to run the code of all questions
│   ├── add_question.py            # Page to add new questions
│   ├── edit_question.py           # Page to edit existing questions
│   └── delete_question.py         # Page to delete questions
│
├── utils/                         # Utility functions
│   ├── load_jsonl.py              # Function to load questions as a list
│   ├── data_to_jsonl.py           # Function to convert question folders into JSONL
│   ├── jsonl_to_data.py           # Function to convert JSONL into question folders
│   └── code_services.py           # Handles code formatting & execution
│
└── output.jsonl                   # Processed question data in JSONL format
```

This structure ensures **modularity** and **maintainability** of the project. 🚀
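To make the per-question layout concrete, here is a minimal sketch of reading the question folders back into Python. The helper names `load_question` and `load_all_questions` and the shape of the returned dict are assumptions for illustration; they are not taken from the project's `utils/` code.

```python
import json
from pathlib import Path


def load_question(folder):
    """Read one question folder into a plain dict (hypothetical helper)."""
    folder = Path(folder)
    record = {
        "question": (folder / "question.txt").read_text(encoding="utf-8").strip(),
        "answer": (folder / "answer.txt").read_text(encoding="utf-8").strip(),
        "code": (folder / "code.py").read_text(encoding="utf-8"),
    }
    # metadata.json carries question_id, category, and the other metadata keys
    record["metadata"] = json.loads(
        (folder / "metadata.json").read_text(encoding="utf-8")
    )
    return record


def load_all_questions(root="data/questions"):
    """Load every numerically named sub-folder, ordered by question ID."""
    folders = [p for p in Path(root).iterdir() if p.is_dir() and p.name.isdigit()]
    # Sort by integer value so that folder 10 comes after folder 2
    return [load_question(p) for p in sorted(folders, key=lambda p: int(p.name))]
```

Sorting by the integer value of the folder name, rather than by the name itself, keeps the IDs in numeric order once there are ten or more questions.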


## 📜 How to use this App

- Add questions through ```Add Questions``` Page
- Edit questions through ```Edit Questions``` Page
- Delete questions through ```Delete Questions``` Page
- The data is not saved if any field is missing or the code raises an error
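The last rule above can be pictured with a small check that runs before anything is written to disk. This is only a sketch: the function name `check_submission`, the `REQUIRED_FIELDS` tuple, and the form keys are assumptions, and the app's own validation may work differently.

```python
import contextlib
import io

# Assumed form keys; the real app may use different names.
REQUIRED_FIELDS = ("question", "answer", "code", "category")


def check_submission(form):
    """Return a list of problems; an empty list means the entry may be saved."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if not str(form.get(name, "")).strip()
    ]
    if problems:
        return problems  # do not run the code of an incomplete entry
    captured = io.StringIO()
    try:
        # Run the submitted code in a fresh namespace and swallow its output
        with contextlib.redirect_stdout(captured):
            exec(form["code"], {"__name__": "__main__"})
    except Exception as exc:
        problems.append(f"code error: {type(exc).__name__}: {exc}")
    return problems
```

Note that `exec` runs the submitted code with the full privileges of the app process, so a check like this is only appropriate for trusted curators.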

### ```NOTE```
- When entering the contents of code.py on the ```Add Questions``` or ```Edit Questions``` page, follow one of two formats: the ```true_code format```, in which all code is written inside the `true_code` function and that function is called right after its definition, or the ```No true_code format```, in which the same code is written at the top level of the file.

#### true_code format
```python
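def true_code():
    import pandas as pd

    # Load the main AQI data
    df = pd.read_csv('data/raw_data/Data.csv', sep=",")

    # Mean PM2.5 for each (state, station) pair
    data = df.groupby(['state','station'])['PM2.5'].mean()
    # idxmax() returns a (state, station) tuple; keep the state
    ans = data.idxmax()[0]
    print(ans)

# Call the function right after its definition
true_code()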
```

#### No true_code format
```python
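import pandas as pd

# Load the main AQI data
df = pd.read_csv('data/raw_data/Data.csv', sep=",")

# Mean PM2.5 for each (state, station) pair
data = df.groupby(['state','station'])['PM2.5'].mean()
# idxmax() returns a (state, station) tuple; keep the state
ans = data.idxmax()[0]
print(ans)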
```
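The two formats differ only in the wrapper, so one can be derived from the other mechanically. The sketch below shows one possible way to do that; `to_true_code_format` is a hypothetical name, and detecting the wrapped format by searching for `def true_code(` is an assumption, not the behaviour of `utils/code_services.py`.

```python
import textwrap


def to_true_code_format(code):
    """Wrap top-level code in true_code() unless it is already wrapped."""
    if "def true_code(" in code:
        return code  # assumed marker for the true_code format
    # Indent every non-blank line by four spaces to form the function body
    body = textwrap.indent(code.strip("\n"), "    ")
    return f"def true_code():\n{body}\n\ntrue_code()\n"


plain = "x = 2\nprint(x * 3)\n"
print(to_true_code_format(plain))
```

This assumes the input is non-empty and contains no multi-line string literals, since `textwrap.indent` would also indent the text inside them.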

## 🧩 Sample Question

### question.txt
```bash
Which state has the highest average PM2.5 concentration across all stations?
```

### answer.txt
```bash
Delhi
```

### code.py
```python
def true_code():
    import pandas as pd
    
    df = pd.read_csv('data/raw_data/Data.csv', sep=",")
    
    data = df.groupby(['state','station'])['PM2.5'].mean()
    ans = data.idxmax()[0]
    print(ans)

true_code()
```

### metadata.json
```json
{
    "question_id": 0,
    "category": "spatial",
    "answer_category": "single",
    "plot": false,
    "libraries": [
        "pandas"
    ]
}
```
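For orientation, the four sample files above could be combined into one line of `output.jsonl` roughly as follows. The record keys (`question`, `answer`, `code`, `metadata`) and the function name `to_jsonl_line` are assumptions made for this sketch; the actual schema written by `utils/data_to_jsonl.py` may differ.

```python
import json


def to_jsonl_line(question, answer, code, metadata):
    """Serialise one question as a single JSON line (keys are assumed)."""
    record = {
        "question": question,
        "answer": answer,
        "code": code,
        "metadata": metadata,
    }
    # json.dumps escapes the newlines inside the code, so the record
    # always fits on one line, as the JSONL format requires
    return json.dumps(record, ensure_ascii=False)


line = to_jsonl_line(
    "Which state has the highest average PM2.5 concentration across all stations?",
    "Delhi",
    "print('placeholder')\n",  # stands in for the contents of code.py
    {
        "question_id": 0,
        "category": "spatial",
        "answer_category": "single",
        "plot": False,
        "libraries": ["pandas"],
    },
)
print(line)
```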


## πŸ› οΈ How to Set-Up project

Open a terminal in an empty folder and follow these steps:

### 1st step: clone the repo
```bash
git clone https://github.com/ratnesh003/VayuBuddy-Question-Curation.git .
```

### 2nd step: install the dependencies needed to run the code
```bash
pip install -r requirements.txt
```

### 3rd step: create a dummy /data folder from the existing output.jsonl
```bash
py .\utils\jsonl_to_data.py 
```
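Conceptually, the conversion in the last step turns each line of `output.jsonl` into one numbered folder. The following is a rough sketch under stated assumptions: the function name `jsonl_to_folders` and the record keys (`question`, `answer`, `code`, `metadata`) are hypothetical and may not match what `utils/jsonl_to_data.py` actually does.

```python
import json
from pathlib import Path


def jsonl_to_folders(jsonl_path, out_root):
    """Write one numbered folder per JSONL line; return how many were written."""
    count = 0
    with open(jsonl_path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            meta = record["metadata"]  # assumed key
            # The folder name is the question ID, e.g. data/questions/0/
            folder = Path(out_root) / str(meta["question_id"])
            folder.mkdir(parents=True, exist_ok=True)
            (folder / "question.txt").write_text(record["question"], encoding="utf-8")
            (folder / "answer.txt").write_text(record["answer"], encoding="utf-8")
            (folder / "code.py").write_text(record["code"], encoding="utf-8")
            (folder / "metadata.json").write_text(
                json.dumps(meta, indent=4), encoding="utf-8"
            )
            count += 1
    return count
```

A call such as `jsonl_to_folders("output.jsonl", "data/questions")` would then recreate the question folders shown in the folder structure.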