Commit graph

57 commits

Author SHA1 Message Date
yangdx
ab32456a79 Refactor entity merging with unified attribute merge function
• Update GRAPH_FIELD_SEP comment clarity
• Deprecate merge_strategy parameter
• Unify entity/relation merge logic
• Add join_unique_comma strategy
2025-10-27 00:04:17 +08:00
yangdx
904b1f46f9 Add entity name length truncation with configurable limit 2025-10-22 14:02:30 +08:00
yangdx
88a45523e2 Increase default max file paths from 30 to 100 and improve documentation
- Bump DEFAULT_MAX_FILE_PATHS to 100
- Add clarifying comment about display
2025-10-21 17:33:00 +08:00
yangdx
3ad616be4f Change default source IDs limit method from KEEP to FIFO 2025-10-21 16:12:11 +08:00
yangdx
1248b3ab04 Increase default limits for source IDs and file paths in metadata
• Entity source IDs: 3 → 300
• Relation source IDs: 3 → 300
• File paths: 2 → 30
2025-10-21 05:30:09 +08:00
yangdx
e0fd31a60d Fix logging message formatting 2025-10-20 22:09:09 +08:00
yangdx
a9fec26798 Add file path limit configuration for entities and relations
• Add MAX_FILE_PATHS env variable
• Implement file path count limiting
• Support KEEP/FIFO strategies
• Add truncation placeholder
• Remove old build_file_path function
2025-10-20 20:12:53 +08:00
yangdx
dc62c78f98 Add entity/relation chunk tracking with configurable source ID limits
- Add entity_chunks & relation_chunks storage
- Implement KEEP/FIFO limit strategies
- Update env.example with new settings
- Add migration for chunk tracking data
- Support all KV storage
2025-10-20 15:24:15 +08:00
DivinesLight
c06522b927 Get max source ID config from .env and LightRAG init 2025-10-15 18:24:38 +05:00
DivinesLight
54f0a7d1ca Quick fix to limit source_id ballooning while inserting nodes 2025-10-14 14:47:04 +05:00
yangdx
699ca3ba00 Remove deprecated history_turns and ids parameters from query API endpoint
• Update QueryParam documentation
• Mark history_turns as deprecated
• Clean up splash screen display
• Clarify conversation_history usage
2025-09-25 04:58:57 +08:00
yangdx
9dd1790b5c Add "Creature" entity type and reorganize type mappings
- Add Creature to default entity types
- Map animals/beings to creature type
2025-09-23 21:58:33 +08:00
yangdx
5311083f43 Rename "Process" entity type to "Method" across all components 2025-09-14 02:30:05 +08:00
yangdx
7060cf17f0 Add Process and Data entity types to LLM extraction system
• Add Process and Data to default types
• Update env.example configuration
• Add translations for new entities
• Support 5 languages (en/zh/fr/ar/tw)
2025-09-14 01:14:47 +08:00
yangdx
2686fc526e Change entity type from CreativeWork to Content and update delimiter
• Replace CreativeWork with Content type
• Improve LLM output error messages
• Update prompt for binary relationships
• Fix delimiter corruption examples
2025-09-14 00:55:15 +08:00
yangdx
41cdeaeaad Add Concept and NaturalObject to default entity types 2025-09-13 15:37:11 +08:00
yangdx
f3b5352019 Refine default entity types 2025-09-13 11:17:06 +08:00
yangdx
8d53ef7ff0 Increase default Gunicorn worker timeout from 210 to 300 seconds 2025-09-08 20:03:21 +08:00
yangdx
78abb397bf Reorder entity types and add Document type to extraction 2025-09-03 12:44:40 +08:00
yangdx
9d81cd724a Fix typo: change "Equiment" to "Equipment" in entity types 2025-09-02 03:19:31 +08:00
yangdx
4e751e0653 refac: Enhance extraction with improved prompts and parser
- **Prompts**: Restructured prompts with clearer steps and quality guidelines. Simplified the relationship tuple by removing `relationship_strength`.
- **Model**: Updated default entity types to be more comprehensive and consistently capitalized (e.g., `Location`, `Product`).
2025-08-31 22:24:11 +08:00
yangdx
925e631a9a refac: Add robust timeout handling for LLM requests 2025-08-29 13:50:35 +08:00
yangdx
8a0d06e557 Restore default entity types 2025-08-27 12:51:18 +08:00
yangdx
ff0a18e08c Unify SUMMARY_LANGUAGE and ENTITY_TYPES implementation method 2025-08-27 12:23:22 +08:00
Thibo Rosemplatt
c3aabfc251 Merge branch 'main' into entityTypesServerSupport 2025-08-26 21:48:20 +02:00
yangdx
6bcfe696ee feat: add output length recommendation and description type to LLM summary
- Add SUMMARY_LENGTH_RECOMMENDED parameter (600 tokens)
- Optimize prompt template for LLM summary
2025-08-26 14:41:12 +08:00
yangdx
84416d104d Increase default LLM summary merge threshold from 4 to 8 for reducing summary trigger frequency 2025-08-26 03:57:35 +08:00
yangdx
de2daf6565 refac: Rename summary_max_tokens to summary_context_size, comprehensive parameter validation for summary configuration
- Update algorithm logic in operate.py for better token management
- Fix health endpoint to use correct parameter names
2025-08-26 01:35:50 +08:00
Thibo Rosemplatt
d054ec5d00 Added entity_types as a user-defined variable (via .env) 2025-08-23 20:16:11 +02:00
yangdx
47485b130d refac(ui): Show rerank binding info on status card
- Remove separate ENABLE_RERANK flag in favor of rerank_binding="null"
- Change default rerank binding from "cohere" to "null" (disabled)
- Update UI to display both rerank binding and model information
2025-08-23 02:04:14 +08:00
yangdx
bf43e1b8c1 fix: Resolve default rerank config problem when env var missing
- Read config from selected_rerank_func when env var missing
- Make api_key optional for rerank function
- Add response format validation with proper error handling
- Update Cohere rerank default to official API endpoint
2025-08-23 01:07:59 +08:00
yangdx
16a1ef1178 Update summary_max_tokens default from 10k to 30k tokens 2025-08-21 23:16:07 +08:00
yangdx
4c556d8aae Set default TIMEOUT value to 150, and gunicorn timeout to TIMEOUT+30 2025-08-20 22:04:32 +08:00
yangdx
d5e8f1e860 Update default query parameters for better performance
- Increase chunk_top_k from 10 to 20
- Reduce max_entity_tokens to 6000
- Reduce max_relation_tokens to 8000
- Update web UI default values
- Fix max_total_tokens to 30000
2025-08-18 19:32:11 +08:00
yangdx
dcec511f72 feat: increase file path length limit to 32768 and add schema migration for Milvus DB
- Bump path limit to 32768 chars
- Add migration detection logic
- Implement dual-client migration
- Auto-migrate old collections
2025-08-18 04:37:12 +08:00
yangdx
5a40ff654e Change KG chunk selection default to VECTOR
- Set KG_CHUNK_PICK_METHOD default to VECTOR
- Update env.example with new config option
2025-08-13 23:10:42 +08:00
yangdx
f1dafa0d01 feat: KG related chunks selection by vector similarity
- Add env switch to toggle weighted polling vs vector-similarity strategy
- Implement similarity-based sorting with fallback to weighted
- Introduce batch vector read API for vector storage
- Implement vector store and retrieve function for Nanovector DB
- Preserve default behavior (weighted polling selection method)
2025-08-13 18:16:42 +08:00
yangdx
9d5603d35e Set the default LLM temperature to 1.0 and centralize constant management 2025-07-31 17:15:10 +08:00
yangdx
c6bd9f0329 Disable conversation history by default
- Set default history_turns to 0
- Mark history_turns as deprecated
- Remove history_turns from example
- Update documentation comments
2025-07-31 12:28:42 +08:00
yangdx
f2ffff063b feat: refactor ollama server configuration management
- Add ollama_server_infos attribute to LightRAG class with default initialization
- Move default values to constants.py for centralized configuration
- Refactor OllamaServerInfos class with property accessors and CLI support
- Update OllamaAPI to get configuration through rag object instead of direct import
- Add command line arguments for simulated model name and tag
- Fix type imports to avoid circular dependencies
2025-07-28 01:38:35 +08:00
yangdx
598eecd06d Refactor: Rename llm_model_max_token_size to summary_max_tokens
This commit renames the parameter 'llm_model_max_token_size' to 'summary_max_tokens' for better clarity, as it specifically controls the token limit for entity relation summaries.
2025-07-28 00:49:08 +08:00
yangdx
d0d57a45b6 feat: add environment variables to /health endpoint and centralize defaults
- Add 9 environment variables to /health endpoint configuration section
- Centralize default constants in lightrag/constants.py for consistency
- Update config.py to use centralized defaults for better maintainability
2025-07-28 00:30:56 +08:00
yangdx
a9565d7379 feat: Skip rerank filtering when min_rerank_score is 0.0 2025-07-27 16:50:12 +08:00
yangdx
ebaff228aa feat: Add rerank score filtering with configurable threshold
- Add DEFAULT_MIN_RERANK_SCORE constant (default: 0.0)
- Add MIN_RERANK_SCORE environment variable support
- Filter chunks with rerank scores below threshold in process_chunks_unified
- Add info-level logging for filtering operations
- Handle empty results gracefully after filtering
- Maintain backward compatibility with non-reranked chunks
2025-07-27 16:37:44 +08:00
yangdx
055629d30d Reduce default max total tokens to 30k 2025-07-27 10:33:06 +08:00
yangdx
c8c3545454 refactor: extract file path length limit to shared constant
• Add DEFAULT_MAX_FILE_PATH_LENGTH constant
• Replace hardcoded 4090 in Milvus impl
2025-07-26 10:45:03 +08:00
yangdx
2c940f0728 reduce RELATED_CHUNK_NUMBER from 10 to 5 2025-07-24 02:49:05 +08:00
yangdx
8103b200db Set DEFAULT_HISTORY_TURNS to 0 2025-07-16 02:20:27 +08:00
yangdx
6e084bfae1 Increase default related chunk number from 5 to 10 2025-07-16 00:22:34 +08:00
yangdx
5f7cb437e8 Centralize query parameters into LightRAG class
This commit refactors query parameter management by consolidating settings like `top_k`, token limits, and thresholds into the `LightRAG` class, and consistently sourcing parameters from a single location.
2025-07-15 23:56:49 +08:00
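
Note on the KEEP/FIFO limit strategies named in dc62c78f98, a9fec26798 and 3ad616be4f: the log does not define them, so the following is only a minimal sketch of one plausible reading. It assumes KEEP retains the oldest entries and ignores newer ones once the limit is reached, while FIFO evicts the oldest entries to make room for newer ones. The function name `apply_limit` and its signature are hypothetical and are not taken from the repository.

```python
from typing import List


def apply_limit(ids: List[str], limit: int, method: str = "FIFO") -> List[str]:
    """Trim a list of IDs (ordered oldest first) to at most `limit` entries.

    Hypothetical helper for illustration only; the KEEP/FIFO semantics
    below are an assumption, not confirmed by the commit log.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    if len(ids) <= limit:
        # Nothing to trim; return a copy so the caller's list is untouched.
        return list(ids)
    if method == "KEEP":
        # Assumed meaning: keep the earliest entries, drop later arrivals.
        return list(ids[:limit])
    if method == "FIFO":
        # Assumed meaning: first in, first out, so the oldest entries go first.
        return list(ids[-limit:])
    raise ValueError(f"unknown limit method: {method!r}")


chunk_ids = ["chunk-1", "chunk-2", "chunk-3", "chunk-4"]
kept = apply_limit(chunk_ids, 2, "KEEP")
fifo = apply_limit(chunk_ids, 2, "FIFO")
```

Under these assumptions, `kept` holds the two oldest IDs and `fifo` holds the two newest, which is the practical difference a change of default from KEEP to FIFO would make.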