LightRAG

Author	SHA1	Message	Date
clssck	59e89772de	refactor: consolidate to PostgreSQL-only backend and modernize stack Remove legacy storage implementations and deprecated examples: - Delete FAISS, JSON, Memgraph, Milvus, MongoDB, Nano Vector DB, Neo4j, NetworkX, Qdrant, Redis storage backends - Remove Kubernetes deployment manifests and installation scripts - Delete unofficial examples for deprecated backends and offline deployment docs Streamline core infrastructure: - Consolidate storage layer to PostgreSQL-only implementation - Add full-text search caching with FTS cache module - Implement metrics collection and monitoring pipeline - Add explain and metrics API routes Modernize frontend and tooling: - Switch web UI to Bun with bun.lock, remove npm and pnpm lockfiles - Update Dockerfile for PostgreSQL-only deployment - Add Makefile for common development tasks - Update environment and configuration examples Enhance evaluation and testing capabilities: - Add prompt optimization with DSPy and auto-tuning - Implement ground truth regeneration and variant testing - Add prompt debugging and response comparison utilities - Expand test coverage with new integration scenarios Simplify dependencies and configuration: - Remove offline-specific requirement files - Update pyproject.toml with streamlined dependencies - Add Python version pinning with .python-version - Create project guidelines in CLAUDE.md and AGENTS.md	2025-12-12 16:28:49 +01:00
clssck	da9070ecf7	refactor: remove legacy storage implementations and k8s deployment Remove deprecated storage backends and Kubernetes deployment configuration: - Delete unused storage implementations: FAISS, JSON, Memgraph, Milvus, MongoDB, Nano Vector DB, Neo4j, NetworkX, Qdrant, Redis - Remove Kubernetes deployment manifests and installation scripts - Delete legacy examples for deprecated backends - Consolidate to PostgreSQL-only storage backend Streamline dependencies and add new capabilities: - Remove deprecated code documentation and migration guides - Add full-text search caching layer with FTS cache module - Implement metrics collection and monitoring pipeline - Add explain and metrics API routes - Simplify configuration with PostgreSQL-focused setup Update documentation and configuration: - Rewrite README to focus on supported features - Update environment and configuration examples - Remove Kubernetes-specific documentation - Add new utility scripts for PDF uploads and pipeline monitoring	2025-12-09 14:02:00 +01:00
clssck	65d2cd16b1	feat(examples, lightrag): fix logging and code improvements Fix logging output in evaluation test harness and examples: - Replace print() statements with logger calls in e2e_test_harness.py - Update copy_llm_cache_to_another_storage.py to use logger instead of print - Remove redundant logging configuration in copy_llm_cache_to_another_storage.py Fix path handling and typos: - Correct makedirs() call in lightrag_openai_demo.py to create log_dir directly - Update constants.py comments to clarify SOURCE_IDS_LIMIT_METHOD options - Remove duplicate return statement in utils.py normalize_extracted_info() - Fix error string formatting in chroma_impl.py with !s conversion - Remove unused pipmaster import from chroma_impl.py	2025-12-05 18:10:19 +01:00
clssck	69358d830d	test(lightrag,examples,api): comprehensive ruff formatting and type hints Format entire codebase with ruff and add type hints across all modules: - Apply ruff formatting to all Python files (121 files, 17K insertions) - Add type hints to function signatures throughout lightrag core and API - Update test suite with improved type annotations and docstrings - Add pyrightconfig.json for static type checking configuration - Create prompt_optimized.py and test_extraction_prompt_ab.py test files - Update ruff.toml and .gitignore for improved linting configuration - Standardize code style across examples, reproduce scripts, and utilities	2025-12-05 15:17:06 +01:00
clssck	663ada943a	chore: add citation system and enhance RAG UI components Add citation tracking and display system across backend and frontend components. Backend changes include citation.py for document attribution, enhanced query routes with citation metadata, improved prompt templates, and PostgreSQL schema updates. Frontend includes CitationMarker component, HoverCard UI, QuerySettings refinements, and ChatMessage enhancements for displaying document sources. Update dependencies and docker-compose test configuration for improved development workflow.	2025-12-01 17:50:00 +01:00
yangdx	ab32456a79	Refactor entity merging with unified attribute merge function • Update GRAPH_FIELD_SEP comment clarity • Deprecate merge_strategy parameter • Unify entity/relation merge logic • Add join_unique_comma strategy	2025-10-27 00:04:17 +08:00
yangdx	904b1f46f9	Add entity name length truncation with configurable limit	2025-10-22 14:02:30 +08:00
yangdx	88a45523e2	Increase default max file paths from 30 to 100 and improve documentation - Bump DEFAULT_MAX_FILE_PATHS to 100 - Add clarifying comment about display	2025-10-21 17:33:00 +08:00
yangdx	3ad616be4f	Change default source IDs limit method from KEEP to FIFO	2025-10-21 16:12:11 +08:00
yangdx	1248b3ab04	Increase default limits for source IDs and file paths in metadata • Entity source IDs: 3 → 300 • Relation source IDs: 3 → 300 • File paths: 2 → 30	2025-10-21 05:30:09 +08:00
yangdx	e0fd31a60d	Fix logging message formatting	2025-10-20 22:09:09 +08:00
yangdx	a9fec26798	Add file path limit configuration for entities and relations • Add MAX_FILE_PATHS env variable • Implement file path count limiting • Support KEEP/FIFO strategies • Add truncation placeholder • Remove old build_file_path function	2025-10-20 20:12:53 +08:00
yangdx	dc62c78f98	Add entity/relation chunk tracking with configurable source ID limits - Add entity_chunks & relation_chunks storage - Implement KEEP/FIFO limit strategies - Update env.example with new settings - Add migration for chunk tracking data - Support all KV storage	2025-10-20 15:24:15 +08:00
DivinesLight	c06522b927	Get max source Id config from .env and lightRAG init	2025-10-15 18:24:38 +05:00
DivinesLight	54f0a7d1ca	Quick fix to limit source_id ballooning while inserting nodes	2025-10-14 14:47:04 +05:00
yangdx	699ca3ba00	Remove deprecated `history_turns` and `ids` parameters from query API endpoint • Update QueryParam documentation • Mark history_turns as deprecated • Clean up splash screen display • Clarify conversation_history usage	2025-09-25 04:58:57 +08:00
yangdx	9dd1790b5c	Add "Creature" entity type and reorganize type mappings - Add Creature to default entity types - Map animals/beings to creature type	2025-09-23 21:58:33 +08:00
yangdx	5311083f43	Rename "Process" entity type to "Method" across all components	2025-09-14 02:30:05 +08:00
yangdx	7060cf17f0	Add Process and Data entity types to LLM extraction system • Add Process and Data to default types • Update env.example configuration • Add translations for new entities • Support 5 languages (en/zh/fr/ar/tw)	2025-09-14 01:14:47 +08:00
yangdx	2686fc526e	Change entity type from CreativeWork to Content and update delimiter • Replace CreativeWork with Content type • Improve LLM output error messages • Update prompt for binary relationships • Fix delimiter corruption examples	2025-09-14 00:55:15 +08:00
yangdx	41cdeaeaad	Add Concept and NaturalObject to default entity types	2025-09-13 15:37:11 +08:00
yangdx	f3b5352019	Refine default entity types	2025-09-13 11:17:06 +08:00
yangdx	8d53ef7ff0	Increase default Gunicorn worker timeout from 210 to 300 seconds	2025-09-08 20:03:21 +08:00
yangdx	78abb397bf	Reorder entity types and add Document type to extraction	2025-09-03 12:44:40 +08:00
yangdx	9d81cd724a	Fix typo: change "Equiment" to "Equipment" in entity types	2025-09-02 03:19:31 +08:00
yangdx	4e751e0653	refac: Enhance extraction with improved prompts and parser - Prompts: Restructured prompts with clearer steps and quality guidelines. Simplified the relationship tuple by removing `relationship_strength` - Model: Updated default entity types to be more comprehensive and consistently capitalized (e.g., `Location`, `Product`)	2025-08-31 22:24:11 +08:00
yangdx	925e631a9a	refac: Add robust timeout handling for LLM request	2025-08-29 13:50:35 +08:00
yangdx	8a0d06e557	Restore default entity types	2025-08-27 12:51:18 +08:00
yangdx	ff0a18e08c	Unify SUMMARY_LANGUAGE and ENTITY_TYPES implementation method	2025-08-27 12:23:22 +08:00
Thibo Rosemplatt	c3aabfc251	Merge branch 'main' into entityTypesServerSupport	2025-08-26 21:48:20 +02:00
yangdx	6bcfe696ee	feat: add output length recommendation and description type to LLM summary - Add SUMMARY_LENGTH_RECOMMENDED parameter (600 tokens) - Optimize prompt template for LLM summary	2025-08-26 14:41:12 +08:00
yangdx	84416d104d	Increase default LLM summary merge threshold from 4 to 8 for reducing summary trigger frequency	2025-08-26 03:57:35 +08:00
yangdx	de2daf6565	refac: Rename summary_max_tokens to summary_context_size, comprehensive parameter validation for summary configuration - Update algorithm logic in operate.py for better token management - Fix health endpoint to use correct parameter names	2025-08-26 01:35:50 +08:00
Thibo Rosemplatt	d054ec5d00	Added entity_types as a user defined variable (via .env)	2025-08-23 20:16:11 +02:00
yangdx	47485b130d	refac(ui): Show rerank binding info on status card - Remove separate ENABLE_RERANK flag in favor of rerank_binding="null" - Change default rerank binding from "cohere" to "null" (disabled) - Update UI to display both rerank binding and model information	2025-08-23 02:04:14 +08:00
yangdx	bf43e1b8c1	fix: Resolve default rerank config problem when env var missing - Read config from selected_rerank_func when env var missing - Make api_key optional for rerank function - Add response format validation with proper error handling - Update Cohere rerank default to official API endpoint	2025-08-23 01:07:59 +08:00
yangdx	16a1ef1178	Update summary_max_tokens default from 10k to 30k tokens	2025-08-21 23:16:07 +08:00
yangdx	4c556d8aae	Set default TIMEOUT value to 150, and gunicorn timeout to TIMEOUT+30	2025-08-20 22:04:32 +08:00
yangdx	d5e8f1e860	Update default query parameters for better performance - Increase chunk_top_k from 10 to 20 - Reduce max_entity_tokens to 6000 - Reduce max_relation_tokens to 8000 - Update web UI default values - Fix max_total_tokens to 30000	2025-08-18 19:32:11 +08:00
yangdx	dcec511f72	feat: increase file path length limit to 32768 and add schema migration for Milvus DB - Bump path limit to 32768 chars - Add migration detection logic - Implement dual-client migration - Auto-migrate old collections	2025-08-18 04:37:12 +08:00
yangdx	5a40ff654e	Change KG chunk selection default to VECTOR - Set KG_CHUNK_PICK_METHOD default to VECTOR - Update env.example with new config option	2025-08-13 23:10:42 +08:00
yangdx	f1dafa0d01	feat: KG related chunks selection by vector similarity - Add env switch to toggle weighted polling vs vector-similarity strategy - Implement similarity-based sorting with fallback to weighted - Introduce batch vector read API for vector storage - Implement vector store and retrieve function for Nanovector DB - Preserve default behavior (weighted polling selection method)	2025-08-13 18:16:42 +08:00
yangdx	9d5603d35e	Set the default LLM temperature to 1.0 and centralize constant management	2025-07-31 17:15:10 +08:00
yangdx	c6bd9f0329	Disable conversation history by default - Set default history_turns to 0 - Mark history_turns as deprecated - Remove history_turns from example - Update documentation comments	2025-07-31 12:28:42 +08:00
yangdx	f2ffff063b	feat: refactor ollama server configuration management - Add ollama_server_infos attribute to LightRAG class with default initialization - Move default values to constants.py for centralized configuration - Refactor OllamaServerInfos class with property accessors and CLI support - Update OllamaAPI to get configuration through rag object instead of direct import - Add command line arguments for simulated model name and tag - Fix type imports to avoid circular dependencies	2025-07-28 01:38:35 +08:00
yangdx	598eecd06d	Refactor: Rename llm_model_max_token_size to summary_max_tokens This commit renames the parameter 'llm_model_max_token_size' to 'summary_max_tokens' for better clarity, as it specifically controls the token limit for entity relation summaries.	2025-07-28 00:49:08 +08:00
yangdx	d0d57a45b6	feat: add environment variables to /health endpoint and centralize defaults - Add 9 environment variables to /health endpoint configuration section - Centralize default constants in lightrag/constants.py for consistency - Update config.py to use centralized defaults for better maintainability	2025-07-28 00:30:56 +08:00
yangdx	a9565d7379	feat: Skip rerank filtering when `min_rerank_score` is 0.0	2025-07-27 16:50:12 +08:00
yangdx	ebaff228aa	feat: Add rerank score filtering with configurable threshold - Add DEFAULT_MIN_RERANK_SCORE constant (default: 0.0) - Add MIN_RERANK_SCORE environment variable support - Filter chunks with rerank scores below threshold in process_chunks_unified - Add info-level logging for filtering operations - Handle empty results gracefully after filtering - Maintain backward compatibility with non-reranked chunks	2025-07-27 16:37:44 +08:00
yangdx	055629d30d	Reduce default max total tokens to 30k	2025-07-27 10:33:06 +08:00

62 commits