LightRAG

Author	SHA1	Message	Date
clssck	59e89772de	refactor: consolidate to PostgreSQL-only backend and modernize stack Remove legacy storage implementations and deprecated examples: - Delete FAISS, JSON, Memgraph, Milvus, MongoDB, Nano Vector DB, Neo4j, NetworkX, Qdrant, Redis storage backends - Remove Kubernetes deployment manifests and installation scripts - Delete unofficial examples for deprecated backends and offline deployment docs Streamline core infrastructure: - Consolidate storage layer to PostgreSQL-only implementation - Add full-text search caching with FTS cache module - Implement metrics collection and monitoring pipeline - Add explain and metrics API routes Modernize frontend and tooling: - Switch web UI to Bun with bun.lock, remove npm and pnpm lockfiles - Update Dockerfile for PostgreSQL-only deployment - Add Makefile for common development tasks - Update environment and configuration examples Enhance evaluation and testing capabilities: - Add prompt optimization with DSPy and auto-tuning - Implement ground truth regeneration and variant testing - Add prompt debugging and response comparison utilities - Expand test coverage with new integration scenarios Simplify dependencies and configuration: - Remove offline-specific requirement files - Update pyproject.toml with streamlined dependencies - Add Python version pinning with .python-version - Create project guidelines in CLAUDE.md and AGENTS.md	2025-12-12 16:28:49 +01:00
clssck	da9070ecf7	refactor: remove legacy storage implementations and k8s deployment Remove deprecated storage backends and Kubernetes deployment configuration: - Delete unused storage implementations: FAISS, JSON, Memgraph, Milvus, MongoDB, Nano Vector DB, Neo4j, NetworkX, Qdrant, Redis - Remove Kubernetes deployment manifests and installation scripts - Delete legacy examples for deprecated backends - Consolidate to PostgreSQL-only storage backend Streamline dependencies and add new capabilities: - Remove deprecated code documentation and migration guides - Add full-text search caching layer with FTS cache module - Implement metrics collection and monitoring pipeline - Add explain and metrics API routes - Simplify configuration with PostgreSQL-focused setup Update documentation and configuration: - Rewrite README to focus on supported features - Update environment and configuration examples - Remove Kubernetes-specific documentation - Add new utility scripts for PDF uploads and pipeline monitoring	2025-12-09 14:02:00 +01:00
clssck	95c83abcf8	feat(lightrag,lightrag_webui): add S3 storage integration and UI Add S3 storage client and API routes for document management: - Implement s3_routes.py with file upload, download, delete endpoints - Enhance s3_client.py with improved error handling and operations - Add S3 browser UI component with file viewing and management - Implement FileViewer and PDFViewer components for storage preview - Add Resizable and Sheet UI components for layout control Update backend infrastructure: - Add bulk operations and parameterized queries to postgres_impl.py - Enhance document routes with improved type hints - Update API server registration for new S3 routes - Refine upload routes and utility functions Modernize web UI: - Integrate S3 browser into main application layout - Update localization files for storage UI strings - Add storage settings to application configuration - Sync package dependencies and lock files Remove obsolete reproduction script: - Delete reproduce_citation.py (replaced by test suite) Update configuration: - Enhance pyrightconfig.json for stricter type checking	2025-12-07 11:04:38 +01:00
clssck	082a5a8fad	test(lightrag,api): add comprehensive test coverage and S3 support Add extensive test suites for API routes and utilities: - Implement test_search_routes.py (406 lines) for search endpoint validation - Implement test_upload_routes.py (724 lines) for document upload workflows - Implement test_s3_client.py (618 lines) for S3 storage operations - Implement test_citation_utils.py (352 lines) for citation extraction - Implement test_chunking.py (216 lines) for text chunking validation Add S3 storage client implementation: - Create lightrag/storage/s3_client.py with S3 operations - Add storage module initialization with exports - Integrate S3 client with document upload handling Enhance API routes and core functionality: - Add search_routes.py with full-text and graph search endpoints - Add upload_routes.py with multipart document upload support - Update operate.py with bulk operations and health checks - Enhance postgres_impl.py with bulk upsert and parameterized queries - Update lightrag_server.py to register new API routes - Improve utils.py with citation and formatting utilities Update dependencies and configuration: - Add S3 and test dependencies to pyproject.toml - Update docker-compose.test.yml for testing environment - Sync uv.lock with new dependencies Apply code quality improvements across all modified files: - Add type hints to function signatures - Update imports and router initialization - Fix logging and error handling	2025-12-05 23:13:39 +01:00
clssck	65d2cd16b1	feat(examples, lightrag): fix logging and code improvements Fix logging output in evaluation test harness and examples: - Replace print() statements with logger calls in e2e_test_harness.py - Update copy_llm_cache_to_another_storage.py to use logger instead of print - Remove redundant logging configuration in copy_llm_cache_to_another_storage.py Fix path handling and typos: - Correct makedirs() call in lightrag_openai_demo.py to create log_dir directly - Update constants.py comments to clarify SOURCE_IDS_LIMIT_METHOD options - Remove duplicate return statement in utils.py normalize_extracted_info() - Fix error string formatting in chroma_impl.py with !s conversion - Remove unused pipmaster import from chroma_impl.py	2025-12-05 18:10:19 +01:00
clssck	69358d830d	test(lightrag,examples,api): comprehensive ruff formatting and type hints Format entire codebase with ruff and add type hints across all modules: - Apply ruff formatting to all Python files (121 files, 17K insertions) - Add type hints to function signatures throughout lightrag core and API - Update test suite with improved type annotations and docstrings - Add pyrightconfig.json for static type checking configuration - Create prompt_optimized.py and test_extraction_prompt_ab.py test files - Update ruff.toml and .gitignore for improved linting configuration - Standardize code style across examples, reproduce scripts, and utilities	2025-12-05 15:17:06 +01:00
yangdx	0c4cba3860	Fix double decoration in azure_openai_embed and document decorator usage • Remove redundant @retry decorator • Call openai_embed.func directly • Add detailed decorator documentation • Prevent double parameter injection • Fix EmbeddingFunc wrapping issues	2025-11-21 18:03:53 +08:00
yangdx	0fb2925c6a	Remove ascii_colors dependency and fix stream handling errors • Remove ascii_colors.trace_exception calls • Add SafeStreamHandler for closed streams • Patch ascii_colors console handler • Prevent ValueError on stream close • Improve logging error handling	2025-11-19 21:38:17 +08:00
yangdx	90f52acf0c	Fix linting	2025-11-17 12:28:53 +08:00
yangdx	c13f9116d9	Add embedding dimension validation to EmbeddingFunc wrapper • Validate total elements divisibility • Check vector count matches input count • Raise clear error messages on mismatch • Ensure embedding output correctness • Add docstring for EmbeddingFunc class	2025-11-17 12:26:54 +08:00
yangdx	05852e1ab2	Add max_token_size parameter to embedding function decorators - Add max_token_size=8192 to all embed funcs - Move siliconcloud to deprecated folder - Import wrap_embedding_func_with_attrs - Update EmbeddingFunc docstring - Fix langfuse import type annotation	2025-11-14 18:41:43 +08:00
yangdx	6de4123f74	Optimize JSON string sanitization with precompiled regex and zero-copy - Precompile regex pattern at module level - Zero-copy path for clean strings - Use C-level regex for performance - Remove deprecated _sanitize_json_data - Fast detection for common case	2025-11-12 15:42:07 +08:00
yangdx	777c987371	Optimize JSON write with fast/slow path to reduce memory usage - Fast path for clean data (no sanitization) - Slow path sanitizes during encoding - Reload shared memory after sanitization - Custom encoder avoids deep copies - Comprehensive test coverage	2025-11-12 13:48:56 +08:00
yangdx	f28a0c25b1	Improve JSON data sanitization to handle tuples and dict keys - Sanitize dictionary keys - Preserve tuple types - Handle nested structures better	2025-11-12 00:50:18 +08:00
yangdx	6918a88f92	Add specialized JSON string sanitizer to prevent UTF-8 encoding errors • Remove surrogate characters (U+D800-DFFF) • Filter Unicode non-characters • Direct char-by-char filtering	2025-11-12 00:38:47 +08:00
yangdx	d1f4b6e515	Add data sanitization to JSON writing to prevent UTF-8 encoding errors • Add _sanitize_json_data helper function • Recursively clean strings in data • Sanitize before JSON serialization • Prevent encoding-related crashes • Use existing sanitize_text_for_encoding	2025-11-12 00:11:13 +08:00
yangdx	c14f25b7f8	Add mandatory dimension parameter handling for Jina API compliance	2025-11-07 21:08:34 +08:00
yangdx	33a1482f7f	Add optional embedding dimension parameter control via env var * Add EMBEDDING_SEND_DIM environment variable * Update Jina/OpenAI embed functions * Add send_dimensions to EmbeddingFunc * Auto-inject embedding_dim when enabled * Add parameter validation warnings	2025-11-07 20:46:40 +08:00
yangdx	5f49cee20f	Merge branch 'main' into VOXWAVE-FOUNDRY/main	2025-11-06 15:37:35 +08:00
yangdx	3fbd704bf9	Enhance entity/relation editing with chunk tracking synchronization • Add chunk storage sync to edit ops • Implement incremental chunk ID updates • Support entity renaming migrations • Normalize relation keys consistently • Preserve chunk references on edits	2025-10-26 14:34:56 +08:00
yangdx	a9fec26798	Add file path limit configuration for entities and relations • Add MAX_FILE_PATHS env variable • Implement file path count limiting • Support KEEP/FIFO strategies • Add truncation placeholder • Remove old build_file_path function	2025-10-20 20:12:53 +08:00
Humphry	0b3d31507e	extended to use gemini, switched to use gemini-flash-latest	2025-10-20 13:17:16 +03:00
yangdx	dc62c78f98	Add entity/relation chunk tracking with configurable source ID limits - Add entity_chunks & relation_chunks storage - Implement KEEP/FIFO limit strategies - Update env.example with new settings - Add migration for chunk tracking data - Support all KV storage	2025-10-20 15:24:15 +08:00
yangdx	03333d63f3	Merge branch 'main' into limit-vdb-metadata-size	2025-10-17 21:36:06 +08:00
yangdx	f555824064	Fix tuple delimiter corruption handling in regex patterns	2025-10-17 18:43:45 +08:00
DivinesLight	c06522b927	Get max source Id config from .env and lightRAG init	2025-10-15 18:24:38 +05:00
haseebuchiha	d52c3377b4	Import from env and use default if none and removed useless import	2025-10-14 16:14:03 +05:00
DivinesLight	54f0a7d1ca	Quick fix to limit source_id ballooning while inserting nodes	2025-10-14 14:47:04 +05:00
NeelM0906	f6d1fb98ac	Fix Linting errors	2025-10-09 16:52:22 -04:00
yangdx	a528213210	Fix logging filter logic • Fix boolean operator precedence in filter • Consolidate GET/POST condition logic	2025-09-26 19:42:33 +08:00
yangdx	8cd4139cbf	refactor: fix double query problem by add aquery_llm function for consistent response handling - Add new aquery_llm/query_llm methods providing structured responses - Consolidate /query and /query/stream endpoints to use unified aquery_llm - Optimize cache handling by moving cache checks before LLM calls	2025-09-26 19:05:03 +08:00
yangdx	5eb4a4b799	feat: simplify citations, add reference merging, and restructure API response format	2025-09-24 14:30:10 +08:00
yangdx	6e2eab5c23	Add ID fields to entities, relations, and chunks in raw data query results	2025-09-21 23:31:35 +08:00
yangdx	18e886d7e9	Improve context item identification with meaningful IDs - Add EN prefix to entity IDs - Add RE prefix to relation IDs - Add DC prefix to chunk IDs - Enhance traceability across contexts	2025-09-21 20:19:14 +08:00
yangdx	37d01e2df8	fix: Ensures complete metadata (source_id, created_at, file_path) is preserved in aquery_data responses	2025-09-15 03:45:09 +08:00
yangdx	e71229698d	refactor: centralize metadata generation in query functions - Remove processing_info generation from _convert_to_user_format function - Move all metadata generation (keywords, processing_info) to kg_query and naive_query functions - Simplify _convert_to_user_format to focus only on data format conversion	2025-09-15 03:11:07 +08:00
yangdx	b1c8206346	Add aquery_data endpoint for structured retrieval without LLM generation - Add QueryDataResponse model - Implement /query/data endpoint - Add aquery_data method to LightRAG - Return entities, relationships, chunks	2025-09-15 02:15:14 +08:00
yangdx	82a67354d0	Code formatting improvements and style consistency fixes * Remove trailing whitespace * Fix function signature ellipsis style	2025-09-14 17:49:02 +08:00
yangdx	87bb8a023b	Fix tuple delimiter regex patterns and add debug logging - Add debug logs for malformed records - Fix regex for consecutive delimiters - Handle missing closing brackets	2025-09-14 17:29:27 +08:00
yangdx	70fee5bbeb	Fix syntax warning by removing examples from fix_tuple_delimiter_corruption docstring	2025-09-14 12:37:21 +08:00
yangdx	20c5127c7c	Merge branch 'optimize-extraction' into return-data-only	2025-09-14 12:33:37 +08:00
yangdx	ff705a2323	Fix tuple delimiter corruption when missing closing bracket, Handle <|#: -> <|#|> pattern	2025-09-14 11:44:21 +08:00
yangdx	1dc96f3959	Merge branch 'optimize-extraction' into return-data-only	2025-09-14 05:37:48 +08:00
yangdx	2686fc526e	Change entity type from CreativeWork to Content and update delimiter • Replace CreativeWork with Content type • Improve LLM output error messages • Update prompt for binary relationships • Fix delimiter corruption examples	2025-09-14 00:55:15 +08:00
yangdx	0ffb5d5f2d	Replace search API with aquery_data for consistent raw data retrieval, mirroring aquery results • Reuse existing query logic paths and remove kg_search function entirely • Update kg_query/naive_query to return raw data as needed	2025-09-13 15:30:29 +08:00
yangdx	8088b7e07a	Fix tuple delimiter corruption handling and update documentation	2025-09-12 18:03:37 +08:00
yangdx	8a3e2c03a9	Fix tuple delimiter corruption patterns with pipes and brackets - Handle <||S||> malformed delimiters - Fix <||> empty pipe sequences - Repair <|| incomplete patterns - Process ||S|| missing brackets - Improve delimiter normalization	2025-09-12 17:45:32 +08:00
yangdx	0221213b9b	Improve entity summarization with JSONL format and fix tuple delimiters • Convert descriptions to JSONL format • Add token-based truncation helper • Enhance entity name consistency rules • Improve summarization prompt clarity • Fix tuple delimiter corruption patterns	2025-09-12 12:32:08 +08:00
yangdx	1892ed23cc	Change tuple delimiter from <|SEP|> to <|S|> across codebase • Update prompt instruction clarity • Correct utility function examples • Update regex pattern comments	2025-09-12 08:57:46 +08:00
yangdx	c07bcbff44	Fix tuple delimiter corruption patterns and add missing edge cases	2025-09-12 08:35:37 +08:00


267 commits
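
The JSON sanitization commits above (6918a88f92, 6de4123f74) describe removing surrogate characters and Unicode non-characters with a regex precompiled at module level, plus a path that returns clean strings without copying them. A minimal sketch of that idea follows. It is not the repository's code: the names `_UNSAFE_CHARS` and `sanitize_json_string` are hypothetical, and the non-character set is assumed to be just U+FFFE and U+FFFF.

```python
import json
import re

# Hypothetical pattern, compiled once at import time rather than per call.
# Matches lone surrogates (U+D800-U+DFFF), which cannot be encoded as UTF-8,
# and the non-characters U+FFFE / U+FFFF (an assumed, minimal set).
_UNSAFE_CHARS = re.compile("[\ud800-\udfff\ufffe\uffff]")


def sanitize_json_string(text: str) -> str:
    """Return text with unsafe characters removed.

    Clean input is returned as the same object, so the common case
    allocates nothing.
    """
    if _UNSAFE_CHARS.search(text) is None:
        return text  # fast path: nothing to strip
    return _UNSAFE_CHARS.sub("", text)  # slow path: build a cleaned copy


# Usage: a lone surrogate would make .encode("utf-8") raise UnicodeEncodeError.
dirty = "entity\ud800 name"
payload = json.dumps({"name": sanitize_json_string(dirty)}).encode("utf-8")
```

Searching first and substituting only on a match keeps the cost for already-clean strings to a single regex scan.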