Commit graph

55 commits

Author SHA1 Message Date
clssck
8d099fc3ac chore: sync with upstream HKUDS/LightRAG
- Add KaTeX extensions (mhchem for chemistry, copy-tex for copying)
- Add CASCADE to AGE extension for PostgreSQL
- Remove future dependency, replace passlib with bcrypt
- Fix Jina embedding configuration and provider defaults
- Update gunicorn help text and bump API version to 0258
- Documentation and README updates
2025-12-01 21:30:19 +01:00
clssck
43af31f888 feat: add db_degree visibility and orphan connection UI
Graph Connectivity Awareness:
- Add db_degree property to all KG implementations (NetworkX, Postgres, Neo4j, Mongo, Memgraph)
- Show database degree vs visual degree in node panel with amber badge
- Add visual indicator (amber border) for nodes with hidden connections
- Add "Load X hidden connection(s)" button to expand hidden neighbors
- Add configurable "Expand Depth" setting (1-5) in graph settings
- Use global maxNodes setting for node expansion consistency

Orphan Connection UI:
- Add OrphanConnectionDialog component for manual orphan entity connection
- Add OrphanConnectionControl button in graph sidebar
- Expose /graph/orphans/connect API endpoint for frontend use

Backend Improvements:
- Add get_orphan_entities() and connect_orphan_entities() to base storage
- Add orphan connection configuration parameters
- Improve entity extraction with relationship density requirements

Frontend:
- Add graphExpandDepth and graphIncludeOrphans to settings store
- Add min_degree and include_orphans graph filtering parameters
- Update translations (en.json, zh.json)
2025-11-29 21:08:07 +01:00
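The db_degree work in commit 43af31f888 amounts to reporting a node's full degree in the database alongside the degree visible in the rendered subgraph, plus surfacing orphan entities. A minimal sketch against a NetworkX-backed store, with hypothetical helper names and return shape (only get_orphan_entities appears in the commit itself):

```python
import networkx as nx

def node_degree_info(graph: nx.Graph, node_id: str, visible_nodes: set[str]) -> dict:
    """Report the full database degree next to the degree visible in the
    currently rendered subgraph, so the UI can flag hidden connections."""
    db_degree = graph.degree(node_id)  # every edge stored in the database
    visual_degree = sum(1 for nbr in graph.neighbors(node_id) if nbr in visible_nodes)
    return {
        "id": node_id,
        "db_degree": db_degree,
        "visual_degree": visual_degree,
        "hidden_connections": db_degree - visual_degree,
    }

def get_orphan_entities(graph: nx.Graph) -> list[str]:
    """Entities with no relationships at all (degree 0)."""
    return [node for node, degree in graph.degree() if degree == 0]

if __name__ == "__main__":
    g = nx.Graph()
    g.add_edges_from([("A", "B"), ("A", "C")])
    g.add_node("D")  # orphan
    print(node_degree_info(g, "A", visible_nodes={"A", "B"}))  # hidden_connections: 1
    print(get_orphan_entities(g))  # ['D']
```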
clssck
48c7732edc feat: add automatic entity resolution with 3-layer matching
Implement automatic entity resolution to prevent duplicate nodes in the
knowledge graph. The system uses a 3-layer approach:

1. Case-insensitive exact matching (free, instant)
2. Fuzzy string matching >85% threshold (free, instant)
3. Vector similarity + LLM verification (for acronyms/synonyms)

Key features:
- Pre-resolution phase prevents race conditions in parallel processing
- Numeric suffix detection blocks false matches (IL-4 ≠ IL-13)
- PostgreSQL alias cache for fast lookups on subsequent ingestion
- Configurable thresholds via environment variables

Bug fixes included:
- Fix fuzzy matching false positives for numbered entities
- Fix alias cache not being populated (missing db parameter)
- Skip entity_aliases table from generic id index creation

New files:
- lightrag/entity_resolution/ - Core resolution module
- tests/test_entity_resolution/ - Unit tests
- docker/postgres-age-vector/ - Custom PG image with pgvector + AGE
- docker-compose.test.yml - Integration test environment

Configuration (env.example):
- ENTITY_RESOLUTION_ENABLED=true
- ENTITY_RESOLUTION_FUZZY_THRESHOLD=0.85
- ENTITY_RESOLUTION_VECTOR_THRESHOLD=0.5
- ENTITY_RESOLUTION_MAX_CANDIDATES=3
2025-11-27 15:35:02 +01:00
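A minimal sketch of the first two (free) layers and the numeric-suffix guard from commit 48c7732edc, using difflib for fuzzy matching; the real module lives under lightrag/entity_resolution/ and these function names are illustrative. Layer 3 (vector similarity plus LLM verification) is only indicated by a comment.

```python
import re
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.85  # ENTITY_RESOLUTION_FUZZY_THRESHOLD

_NUM_SUFFIX = re.compile(r"(\d+)\s*$")

def numeric_suffix_conflict(a: str, b: str) -> bool:
    """Block false matches between numbered entities (e.g. IL-4 vs IL-13)."""
    ma, mb = _NUM_SUFFIX.search(a), _NUM_SUFFIX.search(b)
    return bool(ma and mb and ma.group(1) != mb.group(1))

def resolve_entity(name: str, known_entities: list[str]) -> str | None:
    # Layer 1: case-insensitive exact match (free, instant)
    lowered = {e.lower(): e for e in known_entities}
    if name.lower() in lowered:
        return lowered[name.lower()]

    # Layer 2: fuzzy string match above the threshold (free, instant)
    for candidate in known_entities:
        if numeric_suffix_conflict(name, candidate):
            continue
        if SequenceMatcher(None, name.lower(), candidate.lower()).ratio() >= FUZZY_THRESHOLD:
            return candidate

    # Layer 3 (not shown): vector similarity + LLM verification, which is
    # where acronyms and synonyms that string matching cannot catch are handled.
    return None

if __name__ == "__main__":
    known = ["Interleukin-4", "IL-13", "LightRAG"]
    print(resolve_entity("interleukin-4", known))  # Layer 1 match
    print(resolve_entity("Interleukin 4", known))  # Layer 2 match
    print(resolve_entity("IL-4", known))           # None: IL-13 blocked by the suffix guard, the rest falls to Layer 3
```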
yangdx
ab4d7ac2b0 Add configurable embedding token limit with validation
- Add EMBEDDING_TOKEN_LIMIT env var
- Set max_token_size on embedding func
- Add token limit property to LightRAG
- Validate summary length vs limit
- Log warning when limit exceeded
2025-11-14 19:28:36 +08:00
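A hedged sketch of the validation in commit ab4d7ac2b0: read EMBEDDING_TOKEN_LIMIT and warn when a generated summary exceeds it. The 8192 default and the helper name are assumptions.

```python
import logging
import os

logger = logging.getLogger(__name__)

# EMBEDDING_TOKEN_LIMIT is the env var from the commit; 8192 is an assumed default.
EMBEDDING_TOKEN_LIMIT = int(os.getenv("EMBEDDING_TOKEN_LIMIT", "8192"))

def validate_summary_length(summary_tokens: int, limit: int = EMBEDDING_TOKEN_LIMIT) -> bool:
    """Warn when a summary is longer than the embedding token limit."""
    if summary_tokens > limit:
        logger.warning(
            "Summary length %d exceeds embedding token limit %d; it may be truncated",
            summary_tokens, limit,
        )
        return False
    return True
```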
yangdx
a24d8181c2 Improve docling integration with macOS compatibility and CLI flag
- Add --docling CLI flag for easier setup
- Add numpy version constraints
- Exclude docling on macOS (fork-safety)
2025-11-13 18:58:09 +08:00
yangdx
746c069ab0 Implement lazy configuration initialization for API server
• Add lazy config initialization
• Maintain backward compatibility
• Support programmatic usage
• Add gunicorn dependency
• Explicit config in entry points
2025-11-13 15:28:05 +08:00
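An illustrative lazy-initialization pattern in the spirit of commit 746c069ab0: configuration is parsed on first access rather than at import time, so the server can also be embedded programmatically. Flag names and defaults below are assumptions, not the project's actual config.

```python
import argparse
from functools import lru_cache

def _build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="LightRAG API server")
    parser.add_argument("--host", default="0.0.0.0")
    parser.add_argument("--port", type=int, default=9621)
    return parser

@lru_cache(maxsize=1)
def get_config(argv: tuple[str, ...] | None = None) -> argparse.Namespace:
    """Parse configuration on first use instead of at import time, keeping
    the module import side-effect free for programmatic users."""
    return _build_parser().parse_args(argv if argv is not None else None)

if __name__ == "__main__":
    # Entry points pass config explicitly; here we parse an empty argv.
    config = get_config(())
    print(config.host, config.port)
```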
yangdx
de4ed73652 Add Gemini embedding support
- Implement gemini_embed function
- Add gemini to embedding binding choices
- Add L2 normalization for dims < 3072
2025-11-08 03:34:30 +08:00
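A short sketch of the normalization step in commit de4ed73652: Gemini embeddings are pre-normalized only at the full 3072 dimensions, so vectors returned at smaller requested dimensionalities are L2-normalized again (illustrative, using numpy).

```python
import numpy as np

def l2_normalize(embeddings: np.ndarray) -> np.ndarray:
    """L2-normalize each row so cosine/dot-product scores stay comparable
    when the requested output dimensionality is below 3072."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, a_min=1e-12, a_max=None)

vecs = np.random.rand(4, 768).astype(np.float32)
normalized = l2_normalize(vecs)
assert np.allclose(np.linalg.norm(normalized, axis=1), 1.0, atol=1e-5)
```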
yangdx
0b2a15c452 Centralize embedding_send_dim config through args instead of env var 2025-11-08 01:52:23 +08:00
yangdx
5f49cee20f Merge branch 'main' into VOXWAVE-FOUNDRY/main 2025-11-06 15:37:35 +08:00
yangdx
61b57cbb5d Add PDF decryption support for password-protected files
• Add PDF_DECRYPT_PASSWORD env variable
• Check encryption status before reading
• Handle decrypt errors gracefully
• Log detailed error messages
• Support both encrypted/plain PDFs
2025-11-01 15:01:17 +08:00
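A sketch of the decryption flow in commit 61b57cbb5d, assuming pypdf is the reader in use; the function name is illustrative.

```python
import os
from pypdf import PdfReader

def read_pdf_text(path: str) -> str:
    """Read a PDF, decrypting it first with PDF_DECRYPT_PASSWORD if needed."""
    reader = PdfReader(path)
    if reader.is_encrypted:
        password = os.getenv("PDF_DECRYPT_PASSWORD", "")
        try:
            result = reader.decrypt(password)
        except Exception as exc:  # e.g. unsupported encryption algorithm
            raise RuntimeError(f"Failed to decrypt {path}: {exc}") from exc
        if not result:
            raise ValueError(f"Wrong or missing PDF_DECRYPT_PASSWORD for {path}")
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```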
Humphry
0b3d31507e extended to use gemini, switched to use gemini-flash-latest 2025-10-20 13:17:16 +03:00
yangdx
6b953fa53d Remove auto-scan-at-startup feature and related documentation
• Remove --auto-scan-at-startup arg
• Delete auto scan docs sections
• Remove startup scanning logic
2025-09-23 16:24:53 +08:00
yangdx
de4fe8bc7d Improve uvicorn workers warning message clarity 2025-09-08 16:05:51 +08:00
yangdx
c8c59c38b0 Fix entity types configuration to support JSON list parsing
- Add JSON parsing for list env vars
- Update entity types example format
- Add list type support to get_env_value
2025-09-01 00:14:57 +08:00
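get_env_value() is the project's own helper (it is moved and extended in later commits below); the body here is only a sketch of the JSON list parsing added in commit c8c59c38b0, with a comma-separated fallback.

```python
import json
import os

def get_env_value(env_key: str, default, value_type=str):
    """Read an env var and coerce it to value_type. List values may be given
    as a JSON array, e.g. ENTITY_TYPES='["person", "organization", "location"]'."""
    raw = os.getenv(env_key)
    if raw is None:
        return default
    if value_type is bool:
        return raw.strip().lower() in ("true", "1", "yes")
    if value_type is list:
        try:
            parsed = json.loads(raw)
            if isinstance(parsed, list):
                return parsed
        except json.JSONDecodeError:
            pass
        return [item.strip() for item in raw.split(",") if item.strip()]
    try:
        return value_type(raw)
    except (TypeError, ValueError):
        return default

os.environ["ENTITY_TYPES"] = '["person", "organization", "location", "event"]'
print(get_env_value("ENTITY_TYPES", [], list))
```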
yangdx
ff0a18e08c Unify SUMMARY_LANGUAGE and ENTITY_TYPES implementation method 2025-08-27 12:23:22 +08:00
Thibo Rosemplatt
c3aabfc251 Merge branch 'main' into entityTypesServerSupport 2025-08-26 21:48:20 +02:00
yangdx
6bcfe696ee feat: add output length recommendation and description type to LLM summary
- Add SUMMARY_LENGTH_RECOMMENDED parameter (600 tokens)
- Optimize prompt template for LLM summary
2025-08-26 14:41:12 +08:00
yangdx
de2daf6565 refac: Rename summary_max_tokens to summary_context_size and add comprehensive parameter validation for summary configuration
- Update algorithm logic in operate.py for better token management
- Fix health endpoint to use correct parameter names
2025-08-26 01:35:50 +08:00
Thibo Rosemplatt
d054ec5d00 Added entity_types as a user defined variable (via .env) 2025-08-23 20:16:11 +02:00
yangdx
47485b130d refac(ui): Show rerank binding info on status card
- Remove separate ENABLE_RERANK flag in favor of rerank_binding="null"
- Change default rerank binding from "cohere" to "null" (disabled)
- Update UI to display both rerank binding and model information
2025-08-23 02:04:14 +08:00
yangdx
bf43e1b8c1 fix: Resolve default rerank config problem when env var missing
- Read config from selected_rerank_func when env var missing
- Make api_key optional for rerank function
- Add response format validation with proper error handling
- Update Cohere rerank default to official API endpoint
2025-08-23 01:07:59 +08:00
yangdx
580cb7906c feat: Add multiple rerank provider support to LightRAG Server via new env vars and CLI params
- Add --enable-rerank CLI argument and ENABLE_RERANK env var
- Simplify rerank configuration logic to only check enable flag and binding
- Update health endpoint to show enable_rerank and rerank_configured status
- Improve logging messages for rerank enable/disable states
- Maintain backward compatibility with default value True
2025-08-22 19:29:45 +08:00
yangdx
0e67ead8fa Rename MAX_TOKENS to SUMMARY_MAX_TOKENS for clarity 2025-08-21 10:15:20 +08:00
yangdx
aa22772721 Refactor LLM temperature handling to be provider-specific
• Remove global temperature parameter
• Add provider-specific temp configs
• Update env example with new settings
• Fix Bedrock temperature handling
• Clean up splash screen display
2025-08-20 23:52:33 +08:00
SJ
f7ca9ae16a Ruff formatted 2025-08-15 22:21:34 +00:00
SJ
99643f01de Enhancement: support aws bedrock as an LLM binding #1733 2025-08-13 02:08:13 -05:00
yangdx
4d492abf41 feat: implement temperature priority cascade for LLM bindings
- Add global --temperature command line argument with env fallback
- Implement temperature priority for Ollama LLM binding:
  1. --ollama-llm-temperature (highest)
  2. OLLAMA_LLM_TEMPERATURE env var
  3. --temperature command arg
  4. TEMPERATURE env var (lowest)
- Implement same priority logic for OpenAI/Azure OpenAI LLM binding
- Ensure command line args always override environment variables
- Maintain backward compatibility with existing configurations
2025-08-05 04:53:55 +08:00
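A sketch of the cascade for the Ollama binding from commit 4d492abf41, assuming unspecified CLI arguments default to None; the resolver name is illustrative.

```python
import os
from argparse import Namespace

def resolve_ollama_llm_temperature(cli_args: Namespace) -> float:
    """Apply the priority cascade: binding CLI flag > binding env var >
    global CLI flag > global env var (default 1.0)."""
    if getattr(cli_args, "ollama_llm_temperature", None) is not None:
        return cli_args.ollama_llm_temperature               # 1. --ollama-llm-temperature
    if os.getenv("OLLAMA_LLM_TEMPERATURE") is not None:
        return float(os.environ["OLLAMA_LLM_TEMPERATURE"])   # 2. OLLAMA_LLM_TEMPERATURE
    if getattr(cli_args, "temperature", None) is not None:
        return cli_args.temperature                          # 3. --temperature
    return float(os.getenv("TEMPERATURE", "1.0"))             # 4. TEMPERATURE (lowest)

print(resolve_ollama_llm_temperature(Namespace(ollama_llm_temperature=None, temperature=0.7)))  # 0.7
```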
yangdx
adf7ec8e35 feat: Add OpenAI LLM Options support with BindingOptions framework
- Add OpenAILLMOptions dataclass with full OpenAI API parameter support
- Integrate OpenAI options in config.py for automatic binding detection
- Update server functions to inject OpenAI options for openai/azure_openai bindings
2025-08-05 03:47:26 +08:00
yangdx
3099748668 Add temperature fallback for Ollama LLM binding
- Implement OLLAMA_LLM_TEMPERATURE env var
- Fallback to global TEMPERATURE if unset
- Remove redundant OllamaLLMOptions logic
- Update env.example with new setting
2025-08-05 01:50:09 +08:00
yangdx
8271e1f6f1 Move OllamaServerInfos class to base module
- Eliminate dependency of the core module on the API module.
2025-07-31 23:24:49 +08:00
yangdx
9d5603d35e Set the default LLM temperature to 1.0 and centralize constant management 2025-07-31 17:15:10 +08:00
administrator
c26dfa33de Fix: corrected unterminated f-string in config.py 2025-07-29 11:21:23 +07:00
yangdx
9923821d75 refactor: Remove deprecated max_token_size from embedding configuration
This parameter is no longer used. Its removal simplifies the API and clarifies that token length management is handled by upstream text chunking logic rather than the embedding wrapper.
2025-07-29 10:49:35 +08:00
yangdx
f4c2dc327d Fix linting 2025-07-29 09:57:41 +08:00
Michele Comitini
bd94714b15 Options need to be passed to the ollama client embed() method
- Fix line length
- Create binding_options.py
- Remove test property
- Add dynamic binding options to CLI and environment config: automatically generate command-line arguments and environment variable support for all LLM provider bindings using BindingOptions; add sample .env generation and an extensible framework for new providers
- Add example option definitions and fix test arg check in OllamaOptions
- Add options_dict method to BindingOptions for argument parsing
- Add comprehensive Ollama binding configuration options
- Apply ruff formatting to binding_options.py
- Add separate Ollama options for embedding and LLM
- Refactor Ollama binding options and fix class var handling: improve how class variables are handled in binding options and organize the Ollama-specific options into LLM and embedding subclasses
- Fix typo in arg test
- Rename cls parameter to klass to avoid keyword shadowing
- Fix Ollama embedding binding name typo
- Fix ollama embedder context param name
- Split Ollama options into LLM and embedding configs with mixin base
- Add Ollama option configuration to LLM and embeddings in lightrag_server
- Update sample .env generation and environment handling: conditionally add env vars and cmdline options only when ollama bindings are used; add an example env file for Ollama binding options
2025-07-28 12:05:40 +02:00
yangdx
f2ffff063b feat: refactor ollama server configuration management
- Add ollama_server_infos attribute to LightRAG class with default initialization
- Move default values to constants.py for centralized configuration
- Refactor OllamaServerInfos class with property accessors and CLI support
- Update OllamaAPI to get configuration through rag object instead of direct import
- Add command line arguments for simulated model name and tag
- Fix type imports to avoid circular dependencies
2025-07-28 01:38:35 +08:00
yangdx
598eecd06d Refactor: Rename llm_model_max_token_size to summary_max_tokens
This commit renames the parameter 'llm_model_max_token_size' to 'summary_max_tokens' for better clarity, as it specifically controls the token limit for entity relation summaries.
2025-07-28 00:49:08 +08:00
yangdx
d0d57a45b6 feat: add environment variables to /health endpoint and centralize defaults
- Add 9 environment variables to /health endpoint configuration section
- Centralize default constants in lightrag/constants.py for consistency
- Update config.py to use centralized defaults for better maintainability
2025-07-28 00:30:56 +08:00
yangdx
ebaff228aa feat: Add rerank score filtering with configurable threshold
- Add DEFAULT_MIN_RERANK_SCORE constant (default: 0.0)
- Add MIN_RERANK_SCORE environment variable support
- Filter chunks with rerank scores below threshold in process_chunks_unified
- Add info-level logging for filtering operations
- Handle empty results gracefully after filtering
- Maintain backward compatibility with non-reranked chunks
2025-07-27 16:37:44 +08:00
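A sketch of the filtering step in commit ebaff228aa, assuming each chunk carries a rerank_score field (that key name is an assumption); the 0.0 default comes from the commit.

```python
import logging
import os

logger = logging.getLogger(__name__)

MIN_RERANK_SCORE = float(os.getenv("MIN_RERANK_SCORE", "0.0"))  # DEFAULT_MIN_RERANK_SCORE

def filter_reranked_chunks(chunks: list[dict], min_score: float = MIN_RERANK_SCORE) -> list[dict]:
    """Drop chunks whose rerank score falls below the threshold; chunks
    without a score (rerank disabled) are kept for backward compatibility."""
    kept = [c for c in chunks if c.get("rerank_score") is None or c["rerank_score"] >= min_score]
    dropped = len(chunks) - len(kept)
    if dropped:
        logger.info("Filtered out %d chunk(s) below rerank score %.2f", dropped, min_score)
    return kept
```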
yangdx
5f7cb437e8 Centralize query parameters into LightRAG class
This commit refactors query parameter management by consolidating settings like `top_k`, token limits, and thresholds into the `LightRAG` class, and consistently sourcing parameters from a single location.
2025-07-15 23:56:49 +08:00
yangdx
e8e1f6ab56 feat: centralize environment variable defaults in constants.py 2025-07-15 16:11:50 +08:00
zrguo
7c882313bb remove chunk_rerank_top_k 2025-07-15 11:52:34 +08:00
yangdx
b03bb48e24 feat: Refine summary logic and add dedicated Ollama num_ctx config
- Refactor the trigger condition for LLM-based summarization of entities and relations. Instead of relying on character length, the summary is now triggered when the number of merged description fragments exceeds a configured threshold. This provides a more robust and logical condition for consolidation.
- Introduce the `OLLAMA_NUM_CTX` environment variable to explicitly configure the context window size (`num_ctx`) for Ollama models. This decouples the model's context length from the `MAX_TOKENS` parameter, which is now specifically used to limit input for summary generation, making the configuration clearer and more flexible.
- Updated `README` files, `env.example`, and default values to reflect these changes.
2025-07-14 01:55:04 +08:00
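A minimal sketch of the new trigger condition in commit b03bb48e24: summarize once the number of merged description fragments crosses a threshold rather than keying off character length. The threshold env-var name and its default are assumptions; OLLAMA_NUM_CTX is from the commit.

```python
import os

FRAGMENT_THRESHOLD = int(os.getenv("FORCE_LLM_SUMMARY_ON_MERGE", "4"))  # assumed name/default
OLLAMA_NUM_CTX = int(os.getenv("OLLAMA_NUM_CTX", "32768"))  # Ollama context window, decoupled from MAX_TOKENS

def needs_llm_summary(description_fragments: list[str]) -> bool:
    """Trigger LLM summarization for an entity/relation once enough
    description fragments have been merged."""
    return len(description_fragments) > FRAGMENT_THRESHOLD
```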
yangdx
2056c3c809 Increase default CHUNK_TOP_K from 5 to 15 2025-07-09 04:41:51 +08:00
zrguo
d4651d59c1 Add rerank to server 2025-07-08 21:44:20 +08:00
yangdx
ef79088f60 Move max_graph_nodes to global config 2025-07-07 21:53:57 +08:00
yangdx
033098c1bc Feat: Add WORKSPACE support to all storage types 2025-07-07 00:57:21 +08:00
yangdx
4d57370c94 Refactor: Move get_env_value from api.config to utils
Relocates the `get_env_value` utility function from `lightrag.api.config` to `lightrag.utils` to decouple the LightRAG core from the API server.
2025-05-10 08:58:18 +08:00
yangdx
c8ecfa2d68 feat: Centralize configuration and update defaults
This commit introduces `lightrag/constants.py` to centralize default values for various configurations across the API and core components.

Key changes:
- Added `constants.py` to centralize default values
- Improved the `get_env_value` function in `api/config.py` to correctly handle string "None" as a None value and to catch `TypeError` during value conversion.
- Updated the default `SUMMARY_LANGUAGE` to "English"
- Set default `WORKERS` to 2
2025-05-06 22:00:43 +08:00
yangdx
e94f7dbe1b Fix linting 2025-04-09 12:42:48 +08:00