LightRAG

Author	SHA1	Message	Date
clssck	abb44eccb1	feat(lightrag): improve entity extraction prompts and rerank chunking Enhance entity extraction with better structured prompts: - Reorganize prompt format for improved clarity and consistency - Add XML-style formatting tags for better LLM parsing - Include language parameter in keywords extraction cache key - Fix language parameter usage in keywords_extraction prompt Improve rerank module with chunking fixes: - Fix top_n behavior to limit documents instead of chunks - Add Cohere reranker support with proper chunking - Improve error handling for rerank API responses Update operate.py: - Better entity extraction parsing and validation - Improved cache key generation for multilingual support	2025-12-12 16:45:14 +01:00
clssck	59e89772de	refactor: consolidate to PostgreSQL-only backend and modernize stack Remove legacy storage implementations and deprecated examples: - Delete FAISS, JSON, Memgraph, Milvus, MongoDB, Nano Vector DB, Neo4j, NetworkX, Qdrant, Redis storage backends - Remove Kubernetes deployment manifests and installation scripts - Delete unofficial examples for deprecated backends and offline deployment docs Streamline core infrastructure: - Consolidate storage layer to PostgreSQL-only implementation - Add full-text search caching with FTS cache module - Implement metrics collection and monitoring pipeline - Add explain and metrics API routes Modernize frontend and tooling: - Switch web UI to Bun with bun.lock, remove npm and pnpm lockfiles - Update Dockerfile for PostgreSQL-only deployment - Add Makefile for common development tasks - Update environment and configuration examples Enhance evaluation and testing capabilities: - Add prompt optimization with DSPy and auto-tuning - Implement ground truth regeneration and variant testing - Add prompt debugging and response comparison utilities - Expand test coverage with new integration scenarios Simplify dependencies and configuration: - Remove offline-specific requirement files - Update pyproject.toml with streamlined dependencies - Add Python version pinning with .python-version - Create project guidelines in CLAUDE.md and AGENTS.md	2025-12-12 16:28:49 +01:00
clssck	da9070ecf7	refactor: remove legacy storage implementations and k8s deployment Remove deprecated storage backends and Kubernetes deployment configuration: - Delete unused storage implementations: FAISS, JSON, Memgraph, Milvus, MongoDB, Nano Vector DB, Neo4j, NetworkX, Qdrant, Redis - Remove Kubernetes deployment manifests and installation scripts - Delete legacy examples for deprecated backends - Consolidate to PostgreSQL-only storage backend Streamline dependencies and add new capabilities: - Remove deprecated code documentation and migration guides - Add full-text search caching layer with FTS cache module - Implement metrics collection and monitoring pipeline - Add explain and metrics API routes - Simplify configuration with PostgreSQL-focused setup Update documentation and configuration: - Rewrite README to focus on supported features - Update environment and configuration examples - Remove Kubernetes-specific documentation - Add new utility scripts for PDF uploads and pipeline monitoring	2025-12-09 14:02:00 +01:00
clssck	b2736517df	docs(readme): rewrite README to highlight specialized fork features, S3 integration, and modern UI	2025-12-08 20:52:57 +01:00
clssck	95c83abcf8	feat(lightrag,lightrag_webui): add S3 storage integration and UI Add S3 storage client and API routes for document management: - Implement s3_routes.py with file upload, download, delete endpoints - Enhance s3_client.py with improved error handling and operations - Add S3 browser UI component with file viewing and management - Implement FileViewer and PDFViewer components for storage preview - Add Resizable and Sheet UI components for layout control Update backend infrastructure: - Add bulk operations and parameterized queries to postgres_impl.py - Enhance document routes with improved type hints - Update API server registration for new S3 routes - Refine upload routes and utility functions Modernize web UI: - Integrate S3 browser into main application layout - Update localization files for storage UI strings - Add storage settings to application configuration - Sync package dependencies and lock files Remove obsolete reproduction script: - Delete reproduce_citation.py (replaced by test suite) Update configuration: - Enhance pyrightconfig.json for stricter type checking	2025-12-07 11:04:38 +01:00
clssck	082a5a8fad	test(lightrag,api): add comprehensive test coverage and S3 support Add extensive test suites for API routes and utilities: - Implement test_search_routes.py (406 lines) for search endpoint validation - Implement test_upload_routes.py (724 lines) for document upload workflows - Implement test_s3_client.py (618 lines) for S3 storage operations - Implement test_citation_utils.py (352 lines) for citation extraction - Implement test_chunking.py (216 lines) for text chunking validation Add S3 storage client implementation: - Create lightrag/storage/s3_client.py with S3 operations - Add storage module initialization with exports - Integrate S3 client with document upload handling Enhance API routes and core functionality: - Add search_routes.py with full-text and graph search endpoints - Add upload_routes.py with multipart document upload support - Update operate.py with bulk operations and health checks - Enhance postgres_impl.py with bulk upsert and parameterized queries - Update lightrag_server.py to register new API routes - Improve utils.py with citation and formatting utilities Update dependencies and configuration: - Add S3 and test dependencies to pyproject.toml - Update docker-compose.test.yml for testing environment - Sync uv.lock with new dependencies Apply code quality improvements across all modified files: - Add type hints to function signatures - Update imports and router initialization - Fix logging and error handling	2025-12-05 23:13:39 +01:00
clssck	65d2cd16b1	feat(examples, lightrag): fix logging and code improvements Fix logging output in evaluation test harness and examples: - Replace print() statements with logger calls in e2e_test_harness.py - Update copy_llm_cache_to_another_storage.py to use logger instead of print - Remove redundant logging configuration in copy_llm_cache_to_another_storage.py Fix path handling and typos: - Correct makedirs() call in lightrag_openai_demo.py to create log_dir directly - Update constants.py comments to clarify SOURCE_IDS_LIMIT_METHOD options - Remove duplicate return statement in utils.py normalize_extracted_info() - Fix error string formatting in chroma_impl.py with !s conversion - Remove unused pipmaster import from chroma_impl.py	2025-12-05 18:10:19 +01:00
clssck	dd1413f3eb	test(lightrag,examples): add prompt accuracy and quality tests Add comprehensive test suites for prompt evaluation: - test_prompt_accuracy.py: 365 lines testing prompt extraction accuracy - test_prompt_quality_deep.py: 672 lines for deep quality analysis - Refactor prompt.py to consolidate optimized variants (removed prompt_optimized.py) - Apply ruff formatting and type hints across 30 files - Update pyrightconfig.json for static type checking - Modernize reproduce scripts and examples with improved type annotations - Sync uv.lock dependencies	2025-12-05 16:39:52 +01:00
clssck	69358d830d	test(lightrag,examples,api): comprehensive ruff formatting and type hints Format entire codebase with ruff and add type hints across all modules: - Apply ruff formatting to all Python files (121 files, 17K insertions) - Add type hints to function signatures throughout lightrag core and API - Update test suite with improved type annotations and docstrings - Add pyrightconfig.json for static type checking configuration - Create prompt_optimized.py and test_extraction_prompt_ab.py test files - Update ruff.toml and .gitignore for improved linting configuration - Standardize code style across examples, reproduce scripts, and utilities	2025-12-05 15:17:06 +01:00
clssck	a6b87df758	feat(postgres): add bulk operations and health check - Implement bulk upsert_nodes/edges via UNWIND reducing round trips - Add health_check for graph connectivity and AGE catalog status - Switch to parameterized queries preventing Cypher injection - Fix node ID sanitization: strip control chars, escape quotes	2025-12-03 18:19:26 +00:00
clssck	c5f230a30c	test: fix env handling, add type hints, improve docs Improve code quality and test robustness: - Refactor environment variable parsing in rerank config using centralized get_env_value helper - Add return type hints to all test methods for better type safety - Fix patch path in test from lightrag.utils to lightrag.rerank for correct import location - Clarify batch insert endpoint behavior regarding duplicate content rejection - Expand .dockerignore to comprehensively exclude node_modules (200MB+), Python cache files, and venv directories - Update dependency groups: align evaluation and test extras with pytest/pre-commit/ruff tools	2025-12-03 15:02:11 +01:00
clssck	9bae6267f6	chore: sync with upstream (#4) * chore: sync with upstream - Cohere rerank improvements - Content deduplication - Dependency updates * fix: address CodeRabbit review feedback - Harden env parsing for RERANK_MAX_TOKENS_PER_DOC with try/except - Add @pytest.mark.offline to test_overlap_validation - Remove unused doc_indices variable	2025-12-03 13:16:28 +01:00
clssck	99f950671e	feat(lightrag_webui): optimize GraphControl performance with caching and memoization - Cache selection state and neighbor sets in refs to prevent expensive reducer recreation on every hover/selection change - Memoize theme-derived values (labelColor, edgeColor, etc) to avoid recomputation in reducer functions - Improve node neighbor lookup from O(n) array.includes() to O(1) Set lookup - Refactor nodeReducer and edgeReducer with stable dependencies on themeColors - Remove unnecessary error handling in reducers (defensive checks) - Clean up comments and consolidate logic for improved readability - Fix typo: "Simgma" → "Sigma"	2025-12-03 12:50:47 +01:00
clssck	e106c8e16b	Merge pull request #2 from clssck/sync-upstream-dec-2025 chore: sync with upstream HKUDS/LightRAG	2025-12-01 21:57:11 +01:00
clssck	8d099fc3ac	chore: sync with upstream HKUDS/LightRAG - Add KaTeX extensions (mhchem for chemistry, copy-tex for copying) - Add CASCADE to AGE extension for PostgreSQL - Remove future dependency, replace passlib with bcrypt - Fix Jina embedding configuration and provider defaults - Update gunicorn help text and bump API version to 0258 - Documentation and README updates	2025-12-01 21:30:19 +01:00
clssck	1bdd906753	chore(lightrag): remove legacy prompts and clean up prompt.py Remove unused LLM-generated citation prompts that were kept for backward compatibility but never referenced in codebase. Consolidate duplicate instructions in entity summarization prompt and fix minor typos. - Remove rag_response_with_llm_citations prompt (dead code) - Remove naive_rag_response_with_llm_citations prompt (dead code) - Remove unused cite_ready_* backward compatibility aliases - Consolidate duplicate context/objectivity instructions in summarize prompt - Fix typo in example (extra parenthesis) - Clarify delimiter documentation comment	2025-12-01 21:02:44 +01:00
clssck	663ada943a	chore: add citation system and enhance RAG UI components Add citation tracking and display system across backend and frontend components. Backend changes include citation.py for document attribution, enhanced query routes with citation metadata, improved prompt templates, and PostgreSQL schema updates. Frontend includes CitationMarker component, HoverCard UI, QuerySettings refinements, and ChatMessage enhancements for displaying document sources. Update dependencies and docker-compose test configuration for improved development workflow.	2025-12-01 17:50:00 +01:00
clssck	77df910525	chore: add citation system and code formatting setup Add citation.py module for document citation tracking and management. Configure Biome and Ruff for consistent code formatting across TypeScript and Python. Update webui with improved component organization, API client refactoring, and enhanced user interface patterns. Add formatting configs and dependency updates for build toolchain optimization.	2025-11-30 20:51:43 +01:00
clssck	4e58da3583	style(lightrag_webui): fix indentation, color palette, and component optimization - Fix inconsistent indentation in App.tsx (66 → 68 chars) - Refactor GraphControl reducer logic: cache selection/theme in refs to prevent expensive re-renders on every hover/selection change; extract nodeReducer and edgeReducer to useCallback with stable dependencies - Improve GraphViewer performance: extract FocusSync and GraphSearchWithSelection components to prevent re-renders from unrelated store updates - Remove unused imports (X icon, ZapIcon, i18n) - Remove unused function parameter (storageConfig) - Standardize dark theme colors: improve contrast and visual hierarchy (hsl values); update scrollbar colors for better visibility - Normalize quote style: double quotes → single quotes in className attributes - Fix form element styling: improve dark mode button hover states (gray-800/900 → gray-700/800, red-900 → red-800) - Optimize dropdown menu colors: dark mode backgrounds (gray-900/gray-800) - Relocate HIDDEN_COLUMNS constant to module level in TableExplorer - Optimize RowDetailModal: move entries computation to useMemo for perf - Fix useLightragGraph dependency array: add missing minDegree and includeOrphans dependencies	2025-11-30 20:15:27 +01:00
clssck	9f5948650e	chore(lightrag): add wikipedia test dataset for evaluation Add comprehensive test dataset with 7 domain-specific Wikipedia documents (climate, finance, medical, sports) and corresponding test cases in JSON format. Total of 2292 lines of test data across 8 files for RAG quality evaluation and end-to-end testing infrastructure.	2025-11-30 20:14:52 +01:00
clssck	43af31f888	feat: add db_degree visibility and orphan connection UI Graph Connectivity Awareness: - Add db_degree property to all KG implementations (NetworkX, Postgres, Neo4j, Mongo, Memgraph) - Show database degree vs visual degree in node panel with amber badge - Add visual indicator (amber border) for nodes with hidden connections - Add "Load X hidden connection(s)" button to expand hidden neighbors - Add configurable "Expand Depth" setting (1-5) in graph settings - Use global maxNodes setting for node expansion consistency Orphan Connection UI: - Add OrphanConnectionDialog component for manual orphan entity connection - Add OrphanConnectionControl button in graph sidebar - Expose /graph/orphans/connect API endpoint for frontend use Backend Improvements: - Add get_orphan_entities() and connect_orphan_entities() to base storage - Add orphan connection configuration parameters - Improve entity extraction with relationship density requirements Frontend: - Add graphExpandDepth and graphIncludeOrphans to settings store - Add min_degree and include_orphans graph filtering parameters - Update translations (en.json, zh.json)	2025-11-29 21:08:07 +01:00
clssck	ef7327bb3e	chore(docker-compose, lightrag): optimize test infrastructure and add evaluation tools Add comprehensive E2E testing infrastructure with PostgreSQL performance tuning, Gunicorn multi-worker support, and evaluation scripts for RAGAS-based quality assessment. Introduces 4 new evaluation utilities: compare_results.py for A/B test analysis, download_wikipedia.py for reproducible test datasets, e2e_test_harness.py for automated evaluation pipelines, and ingest_test_docs.py for batch document ingestion. Updates docker-compose.test.yml with aggressive async settings, memory limits, and optimized chunking parameters. Parallelize entity summarization in operate.py for improved extraction performance. Fix typos in merge node/edge logs.	2025-11-29 10:39:20 +01:00
clssck	d2c9e6e2ec	test(lightrag): add orphan connection feature with quality validation tests Implement automatic orphan entity connection system that identifies entities with no relationships and creates meaningful connections via vector similarity + LLM validation. This improves knowledge graph connectivity and retrieval quality. Changes: - Add orphan connection configuration parameters (thresholds, cross-connect settings) - Implement aconnect_orphan_entities() method with 4-step validation pipeline - Add SQL templates for efficient orphan and candidate entity queries - Create POST /graph/orphans/connect API endpoint with configurable parameters - Add orphan connection validation prompt for LLM-based relationship verification - Include relationship density requirement in extraction prompts to prevent orphans - Update docker-compose.test.yml with optimized extraction parameters - Add quality validation test suite (run_quality_tests.py) for retrieval evaluation - Add unit test framework (test_orphan_connection_quality.py) with test cases - Enable auto-run of orphan connection after document processing	2025-11-28 18:23:30 +01:00
clssck	90825e823a	remove inherited workflows, keep only docker-publish	2025-11-28 09:10:38 +00:00
clssck	3b250fd0d0	simplify docker workflow to manual trigger only	2025-11-28 08:43:36 +00:00
clssck	b6074b9a81	chore(lightrag, lightrag_webui): improve code quality and security - Extract PostgreSQL storage check into named variable for clarity - Move APIRouter initialization into create_table_routes function scope - Add robust type handling for database query results - Add input validation for table names and pagination parameters - Add regex-based SQL injection prevention for table name sanitization - Improve clipboard copy fallback logic and error handling - Add memoization for JSON serialization to prevent unnecessary recalculations - Hide meta column from table explorer UI display - Sort table columns alphabetically for consistent ordering - Add keyboard accessibility to status filter buttons - Add preprocessed status filter to document manager - Update @tanstack/react-query from 5.60.0 to 5.87.1 - Extract dev storage config into constant to reduce duplication - Update documentation comments for clarity	2025-11-27 21:39:42 +01:00
clssck	a9edadef45	feat: add Table Explorer feature with dynamic table data fetching and schema display - Implemented Table Explorer component to allow users to select and view database tables. - Added API calls for fetching table list, schema, and paginated data. - Introduced row detail modal for displaying and copying row data. - Enhanced DataTable component to support row click events. - Updated UI components for better user experience and accessibility. - Added mock data for development mode to facilitate testing. - Updated localization files to include new terms related to tables. - Modified settings store to include storage configuration for conditional UI rendering. - Improved styling and layout for various components to align with new design standards.	2025-11-27 18:27:14 +01:00
clssck	48c7732edc	feat: add automatic entity resolution with 3-layer matching Implement automatic entity resolution to prevent duplicate nodes in the knowledge graph. The system uses a 3-layer approach: 1. Case-insensitive exact matching (free, instant) 2. Fuzzy string matching >85% threshold (free, instant) 3. Vector similarity + LLM verification (for acronyms/synonyms) Key features: - Pre-resolution phase prevents race conditions in parallel processing - Numeric suffix detection blocks false matches (IL-4 ≠ IL-13) - PostgreSQL alias cache for fast lookups on subsequent ingestion - Configurable thresholds via environment variables Bug fixes included: - Fix fuzzy matching false positives for numbered entities - Fix alias cache not being populated (missing db parameter) - Skip entity_aliases table from generic id index creation New files: - lightrag/entity_resolution/ - Core resolution module - tests/test_entity_resolution/ - Unit tests - docker/postgres-age-vector/ - Custom PG image with pgvector + AGE - docker-compose.test.yml - Integration test environment Configuration (env.example): - ENTITY_RESOLUTION_ENABLED=true - ENTITY_RESOLUTION_FUZZY_THRESHOLD=0.85 - ENTITY_RESOLUTION_VECTOR_THRESHOLD=0.5 - ENTITY_RESOLUTION_MAX_CANDIDATES=3	2025-11-27 15:35:02 +01:00
yangdx	4f12fe121d	Change entity extraction logging from warning to info level • Reduce log noise for empty entities	2025-11-27 11:00:34 +08:00
yangdx	93d445dfdd	Add pipeline status lock function for legacy compatibility - Add get_pipeline_status_lock function - Return NamespaceLock for consistency - Support workspace parameter - Enable logging option - Legacy code compatibility	2025-11-25 18:24:39 +08:00
Daniel.y	d2cd1c0722	Merge pull request #2421 from EightyOliveira/fix_catch_order fix:exception handling order error	2025-11-25 17:52:56 +08:00
yangdx	777c91794b	Add Langfuse observability configuration to env.example - Add Langfuse environment variables - Include setup instructions - Support OpenAI compatible APIs - Enable tracing configuration - Add cloud/self-host options	2025-11-25 17:16:55 +08:00
EightyOliveira	8994c70f2f	fix:exception handling order error	2025-11-25 16:36:41 +08:00
Daniel.y	2539b4e2c8	Merge pull request #2418 from danielaskdd/start-without-webui Refact: Allow API Server to Start Without Built WebUI Assets	2025-11-25 03:02:15 +08:00
yangdx	48b67d3077	Handle missing WebUI assets gracefully without blocking server startup - Change build check from error to warning - Redirect to /docs when WebUI unavailable - Add webui_available to health endpoint - Only mount /webui if assets exist - Return status tuple from build check	2025-11-25 02:51:55 +08:00
Daniel.y	2832a2ca7e	Merge pull request #2417 from danielaskdd/neo4j-retry Fix: Add Comprehensive Retry Mechanism for Neo4j Storage Operations	2025-11-25 02:03:48 +08:00
yangdx	5f91063c7a	Add ruff as dependency to pytest and evaluation extras	2025-11-25 02:03:28 +08:00
yangdx	8c4d7a00ad	Refactor: Extract retry decorator to reduce code duplication in Neo4J storage • Define READ_RETRY_EXCEPTIONS constant • Create reusable READ_RETRY decorator • Replace 11 duplicate retry decorators • Improve code maintainability • Add missing retry to edge_degrees_batch	2025-11-25 01:35:21 +08:00
Daniel.y	5b81ef000e	Merge pull request #2410 from netbrah/create-copilot-setup-steps feat: create copilot-setup-steps.yml	2025-11-24 22:36:33 +08:00
yangdx	7aaa51cda9	Add retry decorators to Neo4j read operations for resilience	2025-11-24 22:28:15 +08:00
palanisd	c233da6318	Update copilot-setup-steps.yml	2025-11-23 17:42:04 -05:00
palanisd	1b0413ee74	Create copilot-setup-steps.yml	2025-11-22 15:29:05 -05:00
chaohuang-ai	16eb0d5bee	Merge pull request #2409 from HKUDS/chaohuang-ai-patch-3 Update README.md	2025-11-23 00:54:04 +08:00
chaohuang-ai	37178462ab	Update README.md	2025-11-23 00:53:39 +08:00
chaohuang-ai	6d3bfe46d0	Merge pull request #2408 from HKUDS/chaohuang-ai-patch-2 Update README.md	2025-11-23 00:50:16 +08:00
chaohuang-ai	babbcb566b	Update README.md	2025-11-23 00:48:52 +08:00
yangdx	5f53de8866	Fix Azure configuration examples and correct typos in env.example	2025-11-22 09:05:52 +08:00
yangdx	fa6797f246	Update env.example	2025-11-22 00:32:12 +08:00
yangdx	49fb11e205	Update Azure OpenAI configuration examples	2025-11-22 00:19:23 +08:00
yangdx	7b76211066	Add fallback to AZURE_OPENAI_API_VERSION for embedding API version	2025-11-22 00:14:35 +08:00
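
Commit 48c7732edc in the log above describes entity resolution as case-insensitive exact matching, then fuzzy string matching above an 0.85 threshold, with numeric suffix detection so that IL-4 is never merged with IL-13. The following is a minimal sketch of those first two layers only, not the code from that commit. It assumes Python's standard difflib as the fuzzy matcher (the commit message does not name one), and the names resolve, split_numeric_suffix and FUZZY_THRESHOLD are hypothetical. The third layer (vector similarity plus LLM verification) and the PostgreSQL alias cache are left out.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple

# Assumed default, mirroring ENTITY_RESOLUTION_FUZZY_THRESHOLD=0.85 from the commit message.
FUZZY_THRESHOLD = 0.85

# Hypothetical pattern: a name stem, an optional separator, and trailing digits.
_NUMERIC_SUFFIX = re.compile(r"^(.*?)[\s\-_]*(\d+)$")


def split_numeric_suffix(name: str) -> Tuple[str, Optional[str]]:
    """Return (stem, digits) if the name ends in digits, else (name, None)."""
    match = _NUMERIC_SUFFIX.match(name)
    if match:
        return match.group(1), match.group(2)
    return name, None


def resolve(name: str, known: Iterable[str]) -> Optional[str]:
    """Return the known entity that `name` should be merged into, or None."""
    known = list(known)
    key = name.strip().casefold()

    # Layer 1: case-insensitive exact match.
    for candidate in known:
        if candidate.strip().casefold() == key:
            return candidate

    # Layer 2: fuzzy match, skipping candidates whose numeric suffix differs.
    _, number = split_numeric_suffix(key)
    best, best_score = None, 0.0
    for candidate in known:
        cand_key = candidate.strip().casefold()
        _, cand_number = split_numeric_suffix(cand_key)
        if number is not None and cand_number is not None and number != cand_number:
            continue  # e.g. "Interleukin-4" must not resolve to "Interleukin-13"
        score = SequenceMatcher(None, key, cand_key).ratio()
        if score > FUZZY_THRESHOLD and score > best_score:
            best, best_score = candidate, score
    return best


if __name__ == "__main__":
    entities = ["Interleukin-13", "Acetylsalicylic acid"]
    print(resolve("acetylsalicylic ACID", entities))  # exact match, ignoring case
    print(resolve("Acetylsalicilic acid", entities))  # one-letter typo, fuzzy match
    print(resolve("Interleukin-4", entities))         # blocked by the suffix guard
```

The suffix guard runs before scoring because names that differ only in a trailing number are usually very similar as strings, so a similarity threshold alone would tend to merge them.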

5855 commits