RAHMAN00700 committed on
Commit
c17eb1b
·
verified ·
1 Parent(s): 3bef709

Update README.md

Files changed (1)
  1. README.md +142 -144
README.md CHANGED
@@ -1,144 +1,142 @@
1
- ---
2
- title: Chat-with-Multiple-Documents-Using-Streamlit-and-Watsonx
3
- emoji: 😻
4
- colorFrom: purple
5
- colorTo: pink
6
- sdk: streamlit
7
- sdk_version: 1.40.0
8
- app_file: app.py
9
- pinned: false
10
- ---
11
-
12
- # Multi-Document Retrieval with Watsonx 😻
13
-
14
- **A Streamlit-powered app for querying multiple document types using Watsonx and LangChain.**
15
-
16
- This project allows users to upload various file formats (PDFs, DOCX, CSV, JSON, YAML, HTML, etc.) and retrieve contextually accurate responses using Watsonx LLM models and LangChain. The app provides a seamless interface to perform retrieval-augmented generation (RAG) from uploaded documents
17
-
18
- **Note**: While this app runs efficiently on machines with low specifications, for faster indexing and response times, I recommend using a more powerful machine.
19
-
20
- ## Live App
21
- [Link to live app](https://huggingface.co/spaces/RAHMAN00700/Chat-with-Multiple-Documents-Using-Streamlit-and-Watsonx)
22
-
23
- ![GUI image](assets/1.jpg)
24
- ---
25
-
26
- ## Features
27
-
28
- - **File Support**: Supports multiple file formats such as PDFs, Word documents, PowerPoint presentations, CSV, JSON, YAML, HTML, and plain text.
29
- - **Watsonx LLM Integration**: Utilize IBM Watsonx's LLM models for querying and generating answers.
30
- - **Embeddings**: Uses `HuggingFace` embeddings for document indexing.
31
- - **RAG (Retrieval Augmented Generation)**: Combines document-based retrieval with LLMs for accurate responses.
32
- - **Streamlit Interface**: Provides an intuitive user experience.
33
-
34
- ---
35
-
36
- ## Installation
37
-
38
- Follow these steps to clone and run the project locally:
39
-
40
- ### Prerequisites
41
-
42
- 1. **Python 3.8+** installed on your system.
43
- 2. Install `pip` (Python package manager).
44
- 3. An IBM Watsonx API key and Project ID.
45
- 4. Install Git if not already installed.
46
-
47
- ### Clone the Repository
48
-
49
- ```bash
50
- git clone https://github.com/Abd-al-RahmanH/Multi-Doc-Retrieval-Watsonx.git
51
- cd Multi-Doc-Retrieval-Watsonx
52
- ```
53
- ![Github cloning](assets/2.jpg)
54
-
55
- ### Install Dependencies
56
-
57
- 1. Create a virtual environment (optional but recommended):
58
-
59
- ```bash
60
- python -m venv env
61
- source env/bin/activate # On Windows: .\env\Scripts\activate
62
- ```
63
-
64
- 2. Install required Python packages:
65
-
66
- ```bash
67
- pip install -r requirements.txt
68
- ```
69
-
70
- ### Set Environment Variables
71
-
72
- Create a `.env` file in the project directory with the following keys:
73
-
74
- ```env
75
- WATSONX_API_KEY=<your_watsonx_api_key>
76
- WATSONX_PROJECT_ID=<your_watsonx_project_id>
77
- ```
78
-
79
- ### Run the App
80
-
81
- 1. Start the Streamlit app by running:
82
-
83
- ```bash
84
- streamlit run app.py
85
- ```
86
-
87
- 2. Open the URL displayed in your terminal (usually [http://localhost:8501](http://localhost:8501)) to access the app.
88
-
89
- ---
90
-
91
- ## How to Use
92
-
93
- 1. **Upload Documents**: Drag and drop supported files (e.g., PDFs, DOCX, JSON) in the app sidebar.
94
- 2. **Select Model and Parameters**: Choose a Watsonx model and configure settings like output tokens and decoding methods.
95
- 3. **Ask Questions**: Enter queries in the chat input to retrieve answers based on the uploaded document.
96
-
97
- ![How to use](assets/3.jpg)
98
- ---
99
-
100
- ## Project Structure
101
-
102
- ```plaintext
103
- Multi-Doc-Retrieval-Watsonx/
104
- ├── app.py # Main application file
105
- ├── requirements.txt # Python dependencies
106
- ├── README.md # Project documentation
107
- └── .env # Environment variables (not included in repo, create manually)
108
- ```
109
-
110
- ---
111
-
112
- ## Dependencies
113
-
114
- - **Streamlit**: For building the user interface.
115
- - **LangChain**: For document retrieval and RAG implementation.
116
- - **HuggingFace Transformers**: For embedding and vector representation.
117
- - **Watsonx Foundation Models**: For querying and text generation.
118
- - **Various Python Libraries**: For file handling, including `pandas`, `python-docx`, `python-pptx`, and more.
119
-
120
- ---
121
-
122
- ## Contributing
123
-
124
- We welcome contributions! If you'd like to improve this project:
125
-
126
- 1. Fork the repository.
127
- 2. Create a feature branch: `git checkout -b feature-name`.
128
- 3. Commit your changes: `git commit -m 'Add a new feature'`.
129
- 4. Push to the branch: `git push origin feature-name`.
130
- 5. Open a Pull Request.
131
-
132
- ---
133
-
134
- ## More Blogs and Interesting Projects
135
-
136
- For more blogs and interesting projects, visit my personal website: [https://abdulrahmanh.com](https://abdulrahmanh.com)
137
-
138
- ## License
139
-
140
- This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
141
-
142
- ---
143
-
144
-
 
1
+ ---
2
+ title: Chat-with-Multiple-Documents-Using-Streamlit-and-Watsonx
3
+ emoji: 😻
4
+ colorFrom: purple
5
+ colorTo: pink
6
+ sdk: streamlit
7
+ sdk_version: 1.42.0
8
+ app_file: app.py
9
+ pinned: false
10
+ ---
11
+
12
+ # Multi-Document Retrieval with Watsonx 😻
13
+
14
+ **A Streamlit-powered app for querying multiple document types using Watsonx and LangChain.**
15
+
16
+ This project allows users to upload various file formats (PDFs, DOCX, CSV, JSON, YAML, HTML, etc.) and retrieve contextually accurate responses using Watsonx LLM models and LangChain. The app provides a seamless interface to perform retrieval-augmented generation (RAG) from uploaded documents
17
+
18
+ **Note**: While this app runs efficiently on machines with low specifications, for faster indexing and response times, I recommend using a more powerful machine.
19
+
20
+ ## Live App
21
+ [Link to live app](https://huggingface.co/spaces/RAHMAN00700/Chat-with-Multiple-Documents-Using-Streamlit-and-Watsonx)
22
+
23
+ ![GUI image](assets/1.jpg)
24
+ ---
25
+
26
+ ## Features
27
+
28
+ - **File Support**: Supports multiple file formats such as PDFs, Word documents, PowerPoint presentations, CSV, JSON, YAML, HTML, and plain text.
29
+ - **Watsonx LLM Integration**: Utilize IBM Watsonx's LLM models for querying and generating answers.
30
+ - **Embeddings**: Uses `HuggingFace` embeddings for document indexing.
31
+ - **RAG (Retrieval Augmented Generation)**: Combines document-based retrieval with LLMs for accurate responses.
32
+ - **Streamlit Interface**: Provides an intuitive user experience.
33
+
34
+ ---
35
+
36
+ ## Installation
37
+
38
+ Follow these steps to clone and run the project locally:
39
+
40
+ ### Prerequisites
41
+
42
+ 1. **Python 3.8+** installed on your system.
43
+ 2. Install `pip` (Python package manager).
44
+ 3. An IBM Watsonx API key and Project ID.
45
+ 4. Install Git if not already installed.
46
+
47
+ ### Clone the Repository
48
+
49
+ ```bash
50
+ git clone https://github.com/Abd-al-RahmanH/Multi-Doc-Retrieval-Watsonx.git
51
+ cd Multi-Doc-Retrieval-Watsonx
52
+ ```
53
+ ![Github cloning](assets/2.jpg)
54
+
55
+ ### Install Dependencies
56
+
57
+ 1. Create a virtual environment (optional but recommended):
58
+
59
+ ```bash
60
+ python -m venv env
61
+ source env/bin/activate # On Windows: .\env\Scripts\activate
62
+ ```
63
+
64
+ 2. Install required Python packages:
65
+
66
+ ```bash
67
+ pip install -r requirements.txt
68
+ ```
69
+
70
+ ### Set Environment Variables
71
+
72
+ Create a `.env` file in the project directory with the following keys:
73
+
74
+ ```env
75
+ WATSONX_API_KEY=<your_watsonx_api_key>
76
+ WATSONX_PROJECT_ID=<your_watsonx_project_id>
77
+ ```
78
+
79
+ ### Run the App
80
+
81
+ 1. Start the Streamlit app by running:
82
+
83
+ ```bash
84
+ streamlit run app.py
85
+ ```
86
+
87
+ 2. Open the URL displayed in your terminal (usually [http://localhost:8501](http://localhost:8501)) to access the app.
88
+
89
+ ---
90
+
91
+ ## How to Use
92
+
93
+ 1. **Upload Documents**: Drag and drop supported files (e.g., PDFs, DOCX, JSON) in the app sidebar.
94
+ 2. **Select Model and Parameters**: Choose a Watsonx model and configure settings like output tokens and decoding methods.
95
+ 3. **Ask Questions**: Enter queries in the chat input to retrieve answers based on the uploaded document.
96
+
97
+ ![How to use](assets/3.jpg)
98
+ ---
99
+
100
+ ## Project Structure
101
+
102
+ ```plaintext
103
+ Multi-Doc-Retrieval-Watsonx/
104
+ ├── app.py # Main application file
105
+ ├── requirements.txt # Python dependencies
106
+ ├── README.md # Project documentation
107
+ └── .env # Environment variables (not included in repo, create manually)
108
+ ```
109
+
110
+ ---
111
+
112
+ ## Dependencies
113
+
114
+ - **Streamlit**: For building the user interface.
115
+ - **LangChain**: For document retrieval and RAG implementation.
116
+ - **HuggingFace Transformers**: For embedding and vector representation.
117
+ - **Watsonx Foundation Models**: For querying and text generation.
118
+ - **Various Python Libraries**: For file handling, including `pandas`, `python-docx`, `python-pptx`, and more.
119
+
120
+ ---
121
+
122
+ ## Contributing
123
+
124
+ We welcome contributions! If you'd like to improve this project:
125
+
126
+ 1. Fork the repository.
127
+ 2. Create a feature branch: `git checkout -b feature-name`.
128
+ 3. Commit your changes: `git commit -m 'Add a new feature'`.
129
+ 4. Push to the branch: `git push origin feature-name`.
130
+ 5. Open a Pull Request.
131
+
132
+ ---
133
+
134
+ ## More Blogs and Interesting Projects
135
+
136
+ For more blogs and interesting projects, visit my personal website: [https://abdulrahmanh.com](https://abdulrahmanh.com)
137
+
138
+ ## License
139
+
140
+ This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
141
+
142
+ ---