Remove Docker, update README with setup and auto-start instructions

- Remove Dockerfile and docker-compose.yaml (not suitable for this project)
- Update README.md with comprehensive setup documentation
- Add systemd, tmux, and rc.local auto-start options
- Add troubleshooting section
2026-03-03 21:33:36 -05:00
parent 11c3f705ce
commit 8d09d03fe8
5 changed files with 171 additions and 76 deletions
--- a/README.md
+++ b/README.md
@@ -1,21 +1,171 @@
-# Knowledge Base
+# Knowledge Base RAG System

-Personal knowledge base repository for storing useful information, notes, and documentation.
+A self-hosted RAG (Retrieval-Augmented Generation) system for your Obsidian vault with MCP server integration.

-## Contents
+## Features

-- [Getting Started](#getting-started)
-- [Contributing](#contributing)
-- [License](#license)
+- **Semantic Search**: Find relevant content using embeddings, not just keywords
+- **MCP Server**: Exposes search, indexing, and stats tools over MCP (Model Context Protocol)
+- **Local-first**: No external APIs - everything runs locally
+- **Obsidian Compatible**: Works with your existing markdown vault
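
To illustrate what "semantic search using embeddings" means in the feature list above, here is a minimal sketch of ranking documents by cosine similarity. The function names `cosine_similarity` and `top_k`, the file names, and the 3-dimensional vectors are all hypothetical; the project itself delegates this work to sentence-transformers and ChromaDB, whose internals are not shown in this README.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical values);
# real embedding models produce hundreds of dimensions.
docs = {
    "notes/rag.md": [0.9, 0.1, 0.0],
    "notes/cooking.md": [0.0, 0.2, 0.9],
    "notes/search.md": [0.7, 0.6, 0.1],
}
results = top_k([1.0, 0.2, 0.0], docs, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

Because the ranking depends on vector direction rather than shared words, a note can match a query even when it uses different vocabulary.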

-## Getting Started
+## Requirements

-This repository contains various knowledge articles, how-to guides, and reference documentation.
+- Python 3.11+
+- ~2GB of free disk space (the default embedding model alone is ~90MB)

-## Contributing
+## Quick Start

-Feel free to contribute by creating issues or submitting pull requests.
+### 1. Install uv (if not already installed)
+
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+source ~/.local/bin/env
+```
+
+### 2. Clone and set up
+
+```bash
+cd ~/knowledge-base
+cp .env.example .env
+```
+
+### 3. Configure
+
+Edit `.env` to set your vault path:
+
+```bash
+VAULT_PATH=/path/to/your/obsidian-vault
+EMBEDDING_MODEL=all-MiniLM-L6-v2  # optional
+```
+
+### 4. Install dependencies
+
+```bash
+uv sync
+```
+
+### 5. Run the server
+
+```bash
+source .venv/bin/activate
+VAULT_PATH=./knowledge python -m knowledge_rag.server
+```
+
+The server will:
+- Auto-index your vault on startup
+- Listen for MCP requests via stdio
+
+## MCP Tools
+
+Once running, these tools are available:
+
+| Tool | Description |
+|------|-------------|
+| `search_knowledge` | Semantic search across your vault |
+| `index_knowledge` | Re-index the vault (use after adding files) |
+| `get_knowledge_stats` | View indexing statistics |
+
+## Usage Example
+
+```python
+# Example: Searching the knowledge base
+# (via MCP client or Claude Desktop integration)
+
+await search_knowledge({
+    "query": "how does the RAG system work",
+    "top_k": 5
+})
+```
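
For readers wondering what such a call looks like on the wire: MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. The sketch below only builds the request; the helper name `build_tool_call` is hypothetical, and a real client would also perform the `initialize` handshake before calling tools and would normally use an MCP client library instead of hand-built JSON.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request as a single-line string."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Same query and top_k as the usage example above.
line = build_tool_call(
    1,
    "search_knowledge",
    {"query": "how does the RAG system work", "top_k": 5},
)
print(line)
```

With the stdio transport, each such message is written to the server's standard input as one line, and the response arrives on its standard output.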
+
+## Auto-Start on Boot
+
+### Option 1: Systemd Service
+
+Create `/etc/systemd/system/knowledge-rag.service`:
+
+```ini
+[Unit]
+Description=Knowledge Base RAG MCP Server
+After=network.target
+
+[Service]
+Type=simple
+User=ernie
+WorkingDirectory=/home/ernie/knowledge-base
+Environment="VAULT_PATH=/home/ernie/knowledge"
+Environment="PATH=/home/ernie/.local/bin:/usr/bin:/bin"
+ExecStart=/home/ernie/knowledge-base/.venv/bin/python -m knowledge_rag.server
+Restart=always
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Then enable:
+
+```bash
+sudo systemctl daemon-reload
+sudo systemctl enable knowledge-rag.service
+sudo systemctl start knowledge-rag.service
+```
+
+### Option 2: tmux/screen
+
+```bash
+# Start a tmux session, then run the remaining commands inside it
+tmux new -s knowledge-rag
+cd ~/knowledge-base
+source .venv/bin/activate
+VAULT_PATH=./knowledge python -m knowledge_rag.server
+# Detach: Ctrl+b, then d
+```
+
+### Option 3: rc.local or startup script
+
+Add to your `~/.bashrc` (which runs when you open an interactive shell rather than at boot) or another startup script:
+
+```bash
+# Only start if not already running
+if ! pgrep -f "knowledge_rag.server" > /dev/null; then
+    # Run in a subshell so the calling shell's directory and environment stay unchanged
+    (
+        cd ~/knowledge-base || exit 1
+        source .venv/bin/activate
+        VAULT_PATH=./knowledge nohup python -m knowledge_rag.server > /tmp/knowledge-rag.log 2>&1 &
+    )
+fi
+```
+
+## Project Structure
+
+```
+knowledge-base/
+├── src/knowledge_rag/    # Source code
+│   ├── server.py         # MCP server
+│   ├── chunker.py        # Markdown chunking
+│   ├── embeddings.py     # Sentence-transformers wrapper
+│   └── vector_store.py   # ChromaDB wrapper
+├── knowledge/            # Your Obsidian vault (gitignored)
+├── pyproject.toml        # Project config
+└── .env.example          # Environment template
+```
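
The tree above lists `chunker.py` as handling markdown chunking, but the README does not say how. As one plausible approach, and purely as an assumption, the sketch below splits a note at each heading line. The function name `chunk_markdown` and the sample text are hypothetical and not taken from the project's source.

```python
import re

# Matches ATX headings such as "# Title" or "### Details".
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_markdown(text):
    """Split markdown into (heading, body) chunks at each heading line."""
    chunks = []
    heading, body = "", []

    def flush():
        # Skip the empty chunk that would otherwise precede the first heading.
        if heading or any(part.strip() for part in body):
            chunks.append((heading, "\n".join(body).strip()))

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    flush()
    return chunks


sample = "# Title\nIntro text.\n\n## Setup\nInstall things.\n"
chunks = chunk_markdown(sample)
for title, content in chunks:
    print(f"{title!r}: {content!r}")
```

This simple version would also treat `#` comment lines inside fenced code blocks as headings, so a production chunker needs to track fences as well.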
+
+## Configuration
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `VAULT_PATH` | `/data/vault` | Path to your Obsidian vault |
+| `EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Sentence-transformers model |
+| `EMBEDDINGS_CACHE_DIR` | `/data/embeddings_cache` | Model cache location |
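
As a rough illustration of how the variables and defaults in this table could be resolved at startup, here is a minimal sketch using only the standard library. The `Settings` class and `load_settings` function are hypothetical names; the server's actual configuration code is not shown in this README and may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    vault_path: str
    embedding_model: str
    embeddings_cache_dir: str


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default), applying table defaults."""
    env = os.environ if env is None else env
    return Settings(
        vault_path=env.get("VAULT_PATH", "/data/vault"),
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        embeddings_cache_dir=env.get("EMBEDDINGS_CACHE_DIR", "/data/embeddings_cache"),
    )


# Only VAULT_PATH is set here, so the other two fall back to their defaults.
settings = load_settings({"VAULT_PATH": "/home/ernie/knowledge"})
print(settings)
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.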
+
+## Troubleshooting
+
+### First run is slow
+The embedding model (~90MB) is downloaded on first run. Subsequent runs reuse the cached copy and start faster.
+
+### No search results
+Run the `index_knowledge` tool to re-index your vault, or restart the server, which re-indexes on startup.
+
+### Out of memory
+The default model is already lightweight. For an even smaller model, set `EMBEDDING_MODEL=paraphrase-MiniLM-L3-v2`.

 ## License

-MIT License
+MIT