Remove Docker, update README with setup and auto-start instructions
- Remove Dockerfile and docker-compose.yaml (not suitable for this project)
- Update README.md with comprehensive setup documentation
- Add systemd, tmux, and rc.local auto-start options
- Add troubleshooting section
Dockerfile (-33 lines)
@@ -1,33 +0,0 @@
-FROM python:3.11-slim
-
-# Install system dependencies for sentence-transformers
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    gcc \
-    g++ \
-    && rm -rf /var/lib/apt/lists/*
-
-# Set working directory
-WORKDIR /app
-
-# Install uv
-RUN pip install uv
-
-# Copy pyproject.toml
-COPY pyproject.toml .
-
-# Install dependencies
-RUN uv sync --frozen --no-dev
-
-# Copy source code
-COPY src/ ./src/
-
-# Create data directories
-RUN mkdir -p /data/vault /data/chroma_db /data/embeddings_cache
-
-# Set environment variables
-ENV PYTHONUNBUFFERED=1 \
-    VAULT_PATH=/data/vault \
-    EMBEDDINGS_CACHE_DIR=/data/embeddings_cache
-
-# Default command runs the MCP server
-CMD ["python", "-m", "knowledge_rag.server"]
README.md (+171, -21)
@@ -1,21 +1,171 @@
-# Knowledge Base
+# Knowledge Base RAG System
+
-Personal knowledge base repository for storing useful information, notes, and documentation.
+A self-hosted RAG (Retrieval Augmented Generation) system for your Obsidian vault with MCP server integration.
+
-## Contents
-
-- [Getting Started](#getting-started)
-- [Contributing](#contributing)
-- [License](#license)
+## Features
+
+- **Semantic Search**: Find relevant content using embeddings, not just keywords
+- **MCP Server**: Exposes search, indexing, and stats tools via the MCP protocol
+- **Local-first**: No external APIs - everything runs locally
+- **Obsidian Compatible**: Works with your existing markdown vault
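The semantic-search feature above rests on comparing embedding vectors rather than matching keywords. A minimal illustration of the ranking step in plain Python, using hypothetical 3-dimensional vectors (real models such as all-MiniLM-L6-v2 produce 384 dimensions; this is not the project's actual code):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical embeddings for a query and two indexed chunks.
query = [0.9, 0.1, 0.0]
chunks = {
    "note on RAG pipelines": [0.8, 0.2, 0.1],
    "grocery list": [0.0, 0.1, 0.9],
}

# Rank chunks by similarity to the query; the semantically closer one wins.
ranked = sorted(chunks, key=lambda k: cosine(query, chunks[k]), reverse=True)
print(ranked[0])
```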
-## Getting Started
-
-This repository contains various knowledge articles, how-to guides, and reference documentation.
-
-## Contributing
-
-Feel free to contribute by creating issues or submitting pull requests.
+## Requirements
+
+- Python 3.11+
+- ~2GB disk space for the embeddings model
+
+## Quick Start
+
+### 1. Install uv (if not already installed)
+
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+source ~/.local/bin/env
+```
+
+### 2. Clone and set up
+
+```bash
+cd ~/knowledge-base
+cp .env.example .env
+```
+
+### 3. Configure
+
+Edit `.env` to set your vault path:
+
+```bash
+VAULT_PATH=/path/to/your/obsidian-vault
+EMBEDDING_MODEL=all-MiniLM-L6-v2  # optional
+```
+
+### 4. Install dependencies
+
+```bash
+uv sync
+```
+
+### 5. Run the server
+
+```bash
+source .venv/bin/activate
+VAULT_PATH=./knowledge python -m knowledge_rag.server
+```
+
+The server will:
+
+- Auto-index your vault on startup
+- Listen for MCP requests via stdio
+
+## MCP Tools
+
+Once running, these tools are available:
+
+| Tool | Description |
+|------|-------------|
+| `search_knowledge` | Semantic search across your vault |
+| `index_knowledge` | Re-index the vault (use after adding files) |
+| `get_knowledge_stats` | View indexing statistics |
+
+## Usage Example
+
+```python
+# Example: searching the knowledge base
+# (via an MCP client or Claude Desktop integration)
+
+await search_knowledge({
+    "query": "how does the RAG system work",
+    "top_k": 5
+})
+```
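Since the transport is stdio, a call like the usage example above ultimately travels as a JSON-RPC 2.0 `tools/call` message. A sketch of the request an MCP client would write to the server's stdin (tool name and arguments taken from this README; the exact envelope follows the MCP specification, not this project's code):

```python
import json

# JSON-RPC 2.0 request invoking the search tool over the stdio transport
# (messages are newline-delimited JSON).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_knowledge",
        "arguments": {"query": "how does the RAG system work", "top_k": 5},
    },
}
line = json.dumps(request)
print(line)
```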
+## Auto-Start on Boot
+
+### Option 1: Systemd Service
+
+Create `/etc/systemd/system/knowledge-rag.service`:
+
+```ini
+[Unit]
+Description=Knowledge Base RAG MCP Server
+After=network.target
+
+[Service]
+Type=simple
+User=ernie
+WorkingDirectory=/home/ernie/knowledge-base
+Environment="VAULT_PATH=/home/ernie/knowledge"
+Environment="PATH=/home/ernie/.local/bin:/usr/bin:/bin"
+ExecStart=/home/ernie/knowledge-base/.venv/bin/python -m knowledge_rag.server
+Restart=always
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Then enable and start it:
+
+```bash
+sudo systemctl daemon-reload
+sudo systemctl enable knowledge-rag.service
+sudo systemctl start knowledge-rag.service
+```
+
+### Option 2: tmux/screen
+
+```bash
+# Start in tmux
+tmux new -s knowledge-rag
+source .venv/bin/activate
+VAULT_PATH=./knowledge python -m knowledge_rag.server
+# Detach: Ctrl+b, then d
+```
+
+### Option 3: rc.local or startup script
+
+Add to your `~/.bashrc` or startup script:
+
+```bash
+# Only start if not already running
+if ! pgrep -f "knowledge_rag.server" > /dev/null; then
+    cd ~/knowledge-base
+    source .venv/bin/activate
+    VAULT_PATH=./knowledge nohup python -m knowledge_rag.server > /tmp/knowledge-rag.log 2>&1 &
+fi
+```
+
+## Project Structure
+
+```
+knowledge-base/
+├── src/knowledge_rag/    # Source code
+│   ├── server.py         # MCP server
+│   ├── chunker.py        # Markdown chunking
+│   ├── embeddings.py     # Sentence-transformers wrapper
+│   └── vector_store.py   # ChromaDB wrapper
+├── knowledge/            # Your Obsidian vault (gitignored)
+├── pyproject.toml        # Project config
+└── .env.example          # Environment template
+```
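The diff only describes `chunker.py` as "Markdown chunking" without showing its implementation. As an illustration of the general technique, here is a header-based splitter that keeps each heading as chunk metadata (a hypothetical, stdlib-only sketch; the project's chunker may differ):

```python
def chunk_markdown(text: str) -> list[dict]:
    """Split markdown into chunks at headings, keeping the heading as metadata.

    Hypothetical sketch; the project's chunker.py may work differently.
    """
    chunks, heading, lines = [], "", []
    for line in text.splitlines():
        if line.startswith("#"):
            # A new heading closes the previous chunk, if any.
            if lines:
                chunks.append({"heading": heading, "text": "\n".join(lines).strip()})
            heading, lines = line.lstrip("#").strip(), []
        else:
            lines.append(line)
    if lines:
        chunks.append({"heading": heading, "text": "\n".join(lines).strip()})
    return chunks

doc = "# Setup\nInstall uv first.\n# Usage\nRun the server."
print(chunk_markdown(doc))
```

Each chunk would then be embedded and stored, so search results can point back to the section they came from.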
+## Configuration
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `VAULT_PATH` | `/data/vault` | Path to your Obsidian vault |
+| `EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Sentence-transformers model |
+| `EMBEDDINGS_CACHE_DIR` | `/data/embeddings_cache` | Model cache location |
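The variables in the table above are plain environment variables with fallbacks. A sketch of how a config loader might resolve them (variable names and defaults from the table; the loader function itself is hypothetical, not the project's code):

```python
import os

def load_config(env=os.environ) -> dict:
    # Fall back to the documented defaults when a variable is unset.
    return {
        "vault_path": env.get("VAULT_PATH", "/data/vault"),
        "embedding_model": env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        "embeddings_cache_dir": env.get("EMBEDDINGS_CACHE_DIR", "/data/embeddings_cache"),
    }

print(load_config({"VAULT_PATH": "/home/ernie/knowledge"}))
```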
+## Troubleshooting
+
+### First run is slow
+
+The embedding model (~90MB) downloads on first run. Subsequent runs are faster.
+
+### No search results
+
+Run the `index_knowledge` tool to index your vault, or restart the server.
+
+### Out of memory
+
+The default model is lightweight. For an even smaller model, try `paraphrase-MiniLM-L3-v2`.
+
 ## License
 
-MIT License
+MIT
docker-compose.yaml (-32 lines)
@@ -1,32 +0,0 @@
-version: "3.8"
-
-services:
-  knowledge-rag:
-    build:
-      context: .
-      dockerfile: Dockerfile
-    container_name: knowledge-rag
-    volumes:
-      # Mount your obsidian vault here
-      - ${VAULT_PATH:-./knowledge}:/data/vault
-      # Persist ChromaDB vector store
-      - ./data/chroma_db:/data/chroma_db
-      # Persist embeddings cache
-      - ./data/embeddings_cache:/data/embeddings_cache
-    environment:
-      - VAULT_PATH=/data/vault
-      - EMBEDDING_MODEL=${EMBEDDING_MODEL:-all-MiniLM-L6-v2}
-      - EMBEDDINGS_CACHE_DIR=/data/embeddings_cache
-    restart: unless-stopped
-
-  # Optional: Watchtower for auto-updates
-  # watchtower:
-  #   image: containrr/watchtower
-  #   container_name: watchtower
-  #   volumes:
-  #     - /var/run/docker.sock:/var/run/docker.sock
-  #   environment:
-  #     - WATCHTOWER_CLEANUP=true
-  #     - WATCHTOWER_INCLUDE_STOPPED=true
-  #   command: --interval 3600 knowledge-rag
-  #   restart: unless-stopped
pyproject.toml
@@ -14,6 +14,9 @@ dependencies = [
     "pydantic>=2.0.0",
     "watchdog>=3.0.0",
     "httpx>=0.25.0",
+    # CPU-only PyTorch
+    "torch>=2.0.0",
+    "numpy>=1.24.0",
 ]
 
 [project.optional-dependencies]
@@ -27,6 +30,9 @@ dev = [
 requires = ["hatchling"]
 build-backend = "hatchling.build"
+
+[tool.hatch.build.targets.wheel]
+packages = ["src/knowledge_rag"]
 
 [tool.ruff]
 line-length = 100
 target-version = "py311"
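Note that the `# CPU-only PyTorch` comment above is not enforced by the dependency list alone; `torch>=2.0.0` would still resolve to the default (possibly CUDA) wheel. With uv, pinning torch to CPU wheels is typically done with an index override along these lines (an assumption for illustration, not part of this commit; the URL is PyTorch's published CPU wheel index):

```toml
# Hypothetical addition, not in this commit: route torch to CPU-only wheels.
[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cpu" }
```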
uv.lock (generated, +4)
@@ -1265,9 +1265,11 @@ dependencies = [
     { name = "llama-index" },
     { name = "llama-index-vector-stores-chroma" },
     { name = "mcp" },
+    { name = "numpy" },
     { name = "pydantic" },
     { name = "python-dotenv" },
     { name = "sentence-transformers" },
+    { name = "torch" },
     { name = "watchdog" },
 ]
 
@@ -1285,12 +1287,14 @@ requires-dist = [
     { name = "llama-index", specifier = ">=0.10.0" },
     { name = "llama-index-vector-stores-chroma", specifier = ">=0.1.0" },
     { name = "mcp", specifier = ">=1.0.0" },
+    { name = "numpy", specifier = ">=1.24.0" },
     { name = "pydantic", specifier = ">=2.0.0" },
     { name = "pytest", marker = "extra == 'dev'", specifier = ">=7.0.0" },
     { name = "pytest-asyncio", marker = "extra == 'dev'", specifier = ">=0.21.0" },
     { name = "python-dotenv", specifier = ">=1.0.0" },
     { name = "ruff", marker = "extra == 'dev'", specifier = ">=0.1.0" },
     { name = "sentence-transformers", specifier = ">=2.2.0" },
+    { name = "torch", specifier = ">=2.0.0" },
     { name = "watchdog", specifier = ">=3.0.0" },
 ]
 
 provides-extras = ["dev"]