# GraphRAG Indexer Application

## Table of Contents
1. [Introduction](#introduction)
2. [Setup](#setup)
3. [Application Structure](#application-structure)
4. [Indexing](#indexing)
5. [Prompt Tuning](#prompt-tuning)
6. [Data Management](#data-management)
7. [Configuration](#configuration)
8. [API Integration](#api-integration)
9. [Troubleshooting](#troubleshooting)

## Introduction

The GraphRAG Indexer Application is a Gradio-based user interface for managing the indexing and prompt tuning processes of the GraphRAG (Graph Retrieval-Augmented Generation) system. This application provides an intuitive way to configure, run, and monitor indexing and prompt tuning tasks, as well as manage related data files.

## Setup

1. Ensure you have Python 3.7+ installed.
2. Install required dependencies:
   ```
   pip install gradio requests pydantic python-dotenv pyyaml pandas lancedb
   ```
3. Set up environment variables in `indexing/.env`:
   ```
   API_BASE_URL=http://localhost:8012
   LLM_API_BASE=http://localhost:11434
   EMBEDDINGS_API_BASE=http://localhost:11434
   ROOT_DIR=indexing
   ```
4. Run the application:
   ```
   python index_app.py
   ```

## Application Structure

The application is divided into three main tabs:
1. Indexing
2. Prompt Tuning
3. Data Management

Each tab provides specific functionality related to its purpose.

## Indexing

The Indexing tab allows users to configure and run the GraphRAG indexing process.

### Features:
- Select LLM and Embedding models
- Set root directory for indexing
- Configure verbose and cache options
- Advanced options for resuming, reporting, and output formats
- Run indexing and check status

### Usage:
1. Select the desired LLM and Embedding models from the dropdowns.
2. Set the root directory for indexing.
3. Configure additional options as needed.
4. Click "Run Indexing" to start the process.
5. Use "Check Indexing Status" to monitor progress.
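
The "run, then check status" flow above can also be scripted as a polling loop. The following is a minimal sketch, not the application's own code: `wait_for_completion` is a hypothetical helper, and the status strings `"completed"` and `"failed"` are assumptions, since this README does not specify what the status endpoint returns.

```python
import time


def wait_for_completion(fetch_status, interval=2.0, max_checks=30, sleep=time.sleep):
    """Poll fetch_status() until it reports a terminal state or checks run out.

    fetch_status is any zero-argument callable returning a status string.
    The terminal values below are assumed, not taken from the backend.
    """
    for attempt in range(1, max_checks + 1):
        status = fetch_status()
        if status in ("completed", "failed"):
            return status, attempt
        sleep(interval)  # wait before asking again
    return "timeout", max_checks
```

Passing the status fetcher in as a callable keeps the loop independent of how the status is actually retrieved, so it can be exercised without a running backend.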

## Prompt Tuning

The Prompt Tuning tab enables users to configure and run prompt tuning for GraphRAG.

### Features:
- Set root directory and domain
- Choose tuning method (random, top, all)
- Configure limit, language, max tokens, and chunk size
- Option to exclude entity types
- Run prompt tuning and check status

### Usage:
1. Set the root directory and optional domain.
2. Choose the tuning method and configure parameters.
3. Click "Run Prompt Tuning" to start the process.
4. Use "Check Prompt Tuning Status" to monitor progress.
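
As an illustration of how these parameters fit together, here is a small sketch that validates the options before they are submitted. Only the three tuning methods come from this README; the class name, field names, and default values are hypothetical placeholders and may not match what the backend expects.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# The three tuning methods listed in the Features section.
TUNING_METHODS = ("random", "top", "all")


@dataclass
class PromptTuneOptions:
    # Field names and defaults are illustrative assumptions.
    root: str = "indexing"
    domain: Optional[str] = None
    method: str = "random"
    limit: int = 15
    language: Optional[str] = None
    max_tokens: int = 2000
    chunk_size: int = 200
    no_entity_types: bool = False

    def __post_init__(self):
        if self.method not in TUNING_METHODS:
            raise ValueError(f"method must be one of {TUNING_METHODS}")
        for name in ("limit", "max_tokens", "chunk_size"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be a positive integer")

    def to_payload(self):
        """Return the options as a dict, leaving out unset optional fields."""
        return {key: value for key, value in asdict(self).items() if value is not None}
```

Validating up front means a mistyped method or a non-positive limit is caught before any request is made.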

## Data Management

The Data Management tab provides tools for managing input files and viewing output folders.

### Features:
- File upload functionality
- File list management (view, refresh, delete)
- Output folder exploration
- File content viewing and editing

### Usage:
1. Use the File Upload section to add new input files.
2. Manage existing files in the File Management section.
3. Explore output folders and their contents in the Output Folders section.
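
The list and delete operations can be sketched with `pathlib`. This is an illustrative example under stated assumptions, not the application's implementation: the function names are hypothetical, and the `input` subfolder under the root directory is an assumed layout.

```python
from pathlib import Path


def list_input_files(root_dir, subfolder="input"):
    """Return the sorted names of regular files in <root_dir>/<subfolder>."""
    folder = Path(root_dir) / subfolder
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir() if p.is_file())


def delete_input_file(root_dir, name, subfolder="input"):
    """Delete one file from the input folder; return True if it was removed."""
    folder = (Path(root_dir) / subfolder).resolve()
    target = (folder / name).resolve()
    # Reject names such as "../secrets.txt" that resolve outside the folder.
    if target.parent != folder:
        raise ValueError("refusing to touch a path outside the input folder")
    if target.is_file():
        target.unlink()
        return True
    return False
```

Resolving the path before deleting guards against file names that would escape the input folder.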

## Configuration

The application uses a combination of environment variables and a `config.yaml` file for configuration. Key settings include:

- LLM and Embedding models
- API endpoints
- Community level for GraphRAG
- Token limits
- API keys and types

To modify these settings, edit `indexing/.env` or create a `config.yaml` file in the root directory.
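
One way to combine the two sources is to layer them over built-in defaults. The sketch below assumes that environment variables take precedence over `config.yaml` values, which this README does not confirm, and that the file uses the same key names as the environment variables. `resolve_settings` is a hypothetical helper; it takes an already-parsed mapping, so reading the YAML file (for example with `pyyaml`) is left out.

```python
import os

# Defaults mirror the example values from the Setup section.
DEFAULTS = {
    "API_BASE_URL": "http://localhost:8012",
    "LLM_API_BASE": "http://localhost:11434",
    "EMBEDDINGS_API_BASE": "http://localhost:11434",
    "ROOT_DIR": "indexing",
}


def resolve_settings(file_config=None, environ=None):
    """Merge settings with the assumed precedence: defaults < file < environment."""
    env = os.environ if environ is None else environ
    settings = dict(DEFAULTS)
    settings.update(file_config or {})  # values parsed from config.yaml, if any
    settings.update({key: env[key] for key in DEFAULTS if key in env})
    return settings
```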

## API Integration

The application integrates with a backend API for executing indexing and prompt tuning tasks. Key API endpoints used:

- `/v1/index`: Start indexing process
- `/v1/index_status`: Check indexing status
- `/v1/prompt_tune`: Start prompt tuning process
- `/v1/prompt_tune_status`: Check prompt tuning status

These endpoints are called using the `requests` library, with appropriate error handling and logging.
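
For illustration, a call to one of these endpoints with basic error handling and logging might look like the sketch below. It uses the standard library's `urllib` instead of `requests` so that it runs without extra dependencies. The helper names, the rule "POST when a payload is given, GET otherwise", and the assumption of JSON responses are all guesses, since this README does not document the HTTP methods or response format.

```python
import json
import logging
import urllib.error
import urllib.request

logger = logging.getLogger("index_app_sketch")

API_BASE_URL = "http://localhost:8012"  # example value from the Setup section


def endpoint(path, base=API_BASE_URL):
    """Join the base URL and an endpoint path with exactly one slash."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def call_api(path, payload=None, timeout=10):
    """Send a request and return (ok, parsed_json_or_error_message).

    Assumption: a payload means POST with a JSON body, no payload means GET.
    """
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    request = urllib.request.Request(endpoint(path), data=data, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return True, json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Covers connection failures, HTTP error codes, and invalid JSON.
        logger.error("Request to %s failed: %s", path, exc)
        return False, str(exc)
```

Returning a `(ok, result)` pair instead of raising lets a UI callback show the error message to the user while the details go to the log.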

## Troubleshooting

Common issues and solutions:

1. **Model loading fails**: Ensure `LLM_API_BASE` is set correctly and that the API it points to is reachable.
2. **Indexing or Prompt Tuning doesn't start**: Check API connectivity and verify that all required fields are filled.
3. **File management issues**: Ensure the application has read and write permissions in `ROOT_DIR`.

For any persistent issues, check the application logs (visible in the console) for detailed error messages.
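
To check connectivity quickly, a plain TCP connection test against the configured base URLs can help. This is a minimal sketch with hypothetical helper names; it only confirms that something is listening on the host and port, not that the API behind it is healthy.

```python
import socket
from urllib.parse import urlparse


def host_port(base_url):
    """Split a base URL into (host, port), using the scheme's default port if none is given."""
    parsed = urlparse(base_url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def is_reachable(base_url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_port(base_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_reachable("http://localhost:11434")` tests the `LLM_API_BASE` value from the Setup section.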