PocketLLM - Local AI Chat with Document Retrieval

Full-stack chat application with Ollama (Llama 2), TF-IDF document retrieval, multi-session support, and admin dashboard. Runs entirely offline with local LLM inference.

Overview

PocketLLM is a production-ready, full-stack AI chat application that runs entirely on your machine. It combines local LLM inference via Ollama with TF-IDF-ranked document evidence retrieval, behind a modern React interface with real-time streaming responses.

Key Features

  • Local LLM Integration: Uses Ollama with Llama 2 7B Chat model for offline inference
  • Document Evidence Retrieval: Upload documents and receive cited responses with TF-IDF relevance ranking
  • Streaming Responses: Real-time message generation via Server-Sent Events (SSE)
  • Multi-Session Management: Multiple concurrent chat sessions with persistent MongoDB history
  • Admin Dashboard: System metrics, logs, and health monitoring
  • JWT Authentication: Secure user authentication with role-based access control
  • Responsive Design: Mobile-friendly interface built with TailwindCSS
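The evidence-retrieval feature above ranks uploaded documents against the user's query. The app itself uses Natural.js's TfIdf class with Porter stemming; the following is a minimal stdlib-only sketch of the ranking idea (function names are illustrative, not the project's API):

```javascript
// Minimal TF-IDF ranking sketch (illustrative; PocketLLM itself uses
// Natural.js's TfIdf with Porter stemming rather than this code).
const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) || [];

// Term frequency: how often each term appears in one document.
function termFreqs(tokens) {
  const tf = {};
  for (const t of tokens) tf[t] = (tf[t] || 0) + 1;
  return tf;
}

// Rank documents against a query by summing tf * idf per query term.
function rankDocuments(query, docs) {
  const tfs = docs.map((d) => termFreqs(tokenize(d)));
  const N = docs.length;
  const idf = (term) => {
    const df = tfs.filter((tf) => tf[term]).length; // docs containing term
    return Math.log((N + 1) / (df + 1)) + 1;        // smoothed IDF
  };
  return docs
    .map((doc, i) => ({
      doc,
      score: tokenize(query).reduce(
        (s, term) => s + (tfs[i][term] || 0) * idf(term), 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```

The top-ranked documents would then be quoted as citations alongside the model's answer.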

Tech Stack

Frontend: React 18, React Router, TailwindCSS, Axios
Backend: Node.js, Express, Mongoose, JWT
Database: MongoDB
LLM: Ollama (Llama 2 7B Chat)
NLP: Natural.js (TF-IDF, Porter Stemmer)
DevOps: Docker, Docker Compose

Architecture

The application separates the frontend and backend into distinct services, with the backend organized into dedicated modules for retrieval, inference, and session state:

  • Frontend Service: React SPA served via Nginx
  • Backend API: Express.js REST API with SSE streaming
  • Document Service: TF-IDF relevance ranking for evidence retrieval
  • LLM Service: Ollama integration for local model inference
  • Session Management: MongoDB for persistent chat history
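The LLM Service talks to Ollama over its local HTTP API, which streams newline-delimited JSON. A sketch of that integration (the endpoint and response shape follow Ollama's documented `/api/chat` route, but the model tag and helper names here are assumptions, not the project's code):

```javascript
// Each streamed NDJSON line is one JSON object; extract the token text.
function parseOllamaChunk(line) {
  const chunk = JSON.parse(line);
  return { token: chunk.message?.content ?? '', done: Boolean(chunk.done) };
}

// Stream a chat completion from a local Ollama server (sketch; assumes
// Node 18+ fetch and Ollama listening on its default port 11434).
async function chat(prompt, onToken) {
  const res = await fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    body: JSON.stringify({
      model: 'llama2:7b-chat', // model tag is an assumption
      messages: [{ role: 'user', content: prompt }],
      stream: true,
    }),
  });
  // Re-assemble complete NDJSON lines from the byte stream.
  let buf = '';
  for await (const bytes of res.body) {
    buf += Buffer.from(bytes).toString('utf8');
    let nl;
    while ((nl = buf.indexOf('\n')) >= 0) {
      const line = buf.slice(0, nl).trim();
      buf = buf.slice(nl + 1);
      if (line) onToken(parseOllamaChunk(line));
    }
  }
}
```

Each parsed token can be forwarded straight to the SSE stream, so the browser sees text appear as the model generates it.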

Highlights

  • Offline-First: Complete local deployment with no external API dependencies
  • Evidence-Based Responses: Documents are ranked by TF-IDF relevance and cited in responses
  • Production-Ready: Includes Docker Compose setup, health monitoring, and admin tools
  • Scalable Design: Supports multiple concurrent sessions and users
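The Docker Compose setup mentioned above ties the four pieces together. A sketch of what such a layout looks like (service names, images, ports, and environment variables here are assumptions for illustration; the repository's own compose file is authoritative):

```yaml
# Illustrative compose layout — not the repository's actual file.
services:
  frontend:
    build: ./frontend        # React SPA served via Nginx
    ports: ["80:80"]
  backend:
    build: ./backend         # Express API with SSE streaming
    environment:
      - MONGO_URI=mongodb://mongo:27017/pocketllm
      - OLLAMA_URL=http://ollama:11434
    depends_on: [mongo, ollama]
  mongo:
    image: mongo:7
    volumes: [mongo-data:/data/db]
  ollama:
    image: ollama/ollama
    volumes: [ollama-data:/root/.ollama]
volumes:
  mongo-data:
  ollama-data:
```

Keeping MongoDB and Ollama data in named volumes means chat history and downloaded model weights survive container restarts.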

Repository

View on GitHub