LightRAG

Author	SHA1	Message	Date
yangdx	5885637ebf	Add specialized JSON string sanitizer to prevent UTF-8 encoding errors • Remove surrogate characters (U+D800-DFFF) • Filter Unicode non-characters • Direct char-by-char filtering	2025-11-17 12:54:32 +08:00
yangdx	23cbb9c9b2	Add data sanitization to JSON writing to prevent UTF-8 encoding errors • Add _sanitize_json_data helper function • Recursively clean strings in data • Sanitize before JSON serialization • Prevent encoding-related crashes • Use existing sanitize_text_for_encoding	2025-11-17 12:54:32 +08:00
yangdx	c14f25b7f8	Add mandatory dimension parameter handling for Jina API compliance	2025-11-07 21:08:34 +08:00
yangdx	33a1482f7f	Add optional embedding dimension parameter control via env var * Add EMBEDDING_SEND_DIM environment variable * Update Jina/OpenAI embed functions * Add send_dimensions to EmbeddingFunc * Auto-inject embedding_dim when enabled * Add parameter validation warnings	2025-11-07 20:46:40 +08:00
yangdx	5f49cee20f	Merge branch 'main' into VOXWAVE-FOUNDRY/main	2025-11-06 15:37:35 +08:00
yangdx	3fbd704bf9	Enhance entity/relation editing with chunk tracking synchronization • Add chunk storage sync to edit ops • Implement incremental chunk ID updates • Support entity renaming migrations • Normalize relation keys consistently • Preserve chunk references on edits	2025-10-26 14:34:56 +08:00
yangdx	a9fec26798	Add file path limit configuration for entities and relations • Add MAX_FILE_PATHS env variable • Implement file path count limiting • Support KEEP/FIFO strategies • Add truncation placeholder • Remove old build_file_path function	2025-10-20 20:12:53 +08:00
Humphry	0b3d31507e	Extended to use Gemini, switched to gemini-flash-latest	2025-10-20 13:17:16 +03:00
yangdx	dc62c78f98	Add entity/relation chunk tracking with configurable source ID limits - Add entity_chunks & relation_chunks storage - Implement KEEP/FIFO limit strategies - Update env.example with new settings - Add migration for chunk tracking data - Support all KV storage	2025-10-20 15:24:15 +08:00
yangdx	03333d63f3	Merge branch 'main' into limit-vdb-metadata-size	2025-10-17 21:36:06 +08:00
yangdx	f555824064	Fix tuple delimiter corruption handling in regex patterns	2025-10-17 18:43:45 +08:00
DivinesLight	c06522b927	Get max source Id config from .env and lightRAG init	2025-10-15 18:24:38 +05:00
haseebuchiha	d52c3377b4	Import from env and use default if none and removed useless import	2025-10-14 16:14:03 +05:00
DivinesLight	54f0a7d1ca	Quick fix to limit source_id ballooning while inserting nodes	2025-10-14 14:47:04 +05:00
NeelM0906	f6d1fb98ac	Fix Linting errors	2025-10-09 16:52:22 -04:00
yangdx	a528213210	Fix logging filter logic • Fix boolean operator precedence in filter • Consolidate GET/POST condition logic	2025-09-26 19:42:33 +08:00
yangdx	8cd4139cbf	refactor: fix double query problem by adding aquery_llm function for consistent response handling - Add new aquery_llm/query_llm methods providing structured responses - Consolidate /query and /query/stream endpoints to use unified aquery_llm - Optimize cache handling by moving cache checks before LLM calls	2025-09-26 19:05:03 +08:00
yangdx	5eb4a4b799	feat: simplify citations, add reference merging, and restructure API response format	2025-09-24 14:30:10 +08:00
yangdx	6e2eab5c23	Add ID fields to entities, relations, and chunks in raw data query results	2025-09-21 23:31:35 +08:00
yangdx	18e886d7e9	Improve context item identification with meaningful IDs - Add EN prefix to entity IDs - Add RE prefix to relation IDs - Add DC prefix to chunk IDs - Enhance traceability across contexts	2025-09-21 20:19:14 +08:00
yangdx	37d01e2df8	fix: Ensures complete metadata (source_id, created_at, file_path) is preserved in aquery_data responses	2025-09-15 03:45:09 +08:00
yangdx	e71229698d	refactor: centralize metadata generation in query functions - Remove processing_info generation from _convert_to_user_format function - Move all metadata generation (keywords, processing_info) to kg_query and naive_query functions - Simplify _convert_to_user_format to focus only on data format conversion	2025-09-15 03:11:07 +08:00
yangdx	b1c8206346	Add aquery_data endpoint for structured retrieval without LLM generation - Add QueryDataResponse model - Implement /query/data endpoint - Add aquery_data method to LightRAG - Return entities, relationships, chunks	2025-09-15 02:15:14 +08:00
yangdx	82a67354d0	Code formatting improvements and style consistency fixes * Remove trailing whitespace * Fix function signature ellipsis style	2025-09-14 17:49:02 +08:00
yangdx	87bb8a023b	Fix tuple delimiter regex patterns and add debug logging - Add debug logs for malformed records - Fix regex for consecutive delimiters - Handle missing closing brackets	2025-09-14 17:29:27 +08:00
yangdx	70fee5bbeb	Fix syntax warning by removing examples from fix_tuple_delimiter_corruption docstring	2025-09-14 12:37:21 +08:00
yangdx	20c5127c7c	Merge branch 'optimize-extraction' into return-data-only	2025-09-14 12:33:37 +08:00
yangdx	ff705a2323	Fix tuple delimiter corruption when missing closing bracket, Handle <|#: -> <|#|> pattern	2025-09-14 11:44:21 +08:00
yangdx	1dc96f3959	Merge branch 'optimize-extraction' into return-data-only	2025-09-14 05:37:48 +08:00
yangdx	2686fc526e	Change entity type from CreativeWork to Content and update delimiter • Replace CreativeWork with Content type • Improve LLM output error messages • Update prompt for binary relationships • Fix delimiter corruption examples	2025-09-14 00:55:15 +08:00
yangdx	0ffb5d5f2d	Replace search API with aquery_data for consistent raw data retrieval, mirroring aquery results • Reuse existing query logic paths and remove kg_search function entirely • Update kg_query/naive_query to return raw data as needed	2025-09-13 15:30:29 +08:00
yangdx	8088b7e07a	Fix tuple delimiter corruption handling and update documentation	2025-09-12 18:03:37 +08:00
yangdx	8a3e2c03a9	Fix tuple delimiter corruption patterns with pipes and brackets - Handle <||S||> malformed delimiters - Fix <||> empty pipe sequences - Repair <|| incomplete patterns - Process ||S|| missing brackets - Improve delimiter normalization	2025-09-12 17:45:32 +08:00
yangdx	0221213b9b	Improve entity summarization with JSONL format and fix tuple delimiters • Convert descriptions to JSONL format • Add token-based truncation helper • Enhance entity name consistency rules • Improve summarization prompt clarity • Fix tuple delimiter corruption patterns	2025-09-12 12:32:08 +08:00
yangdx	1892ed23cc	Change tuple delimiter from <|SEP|> to <|S|> across codebase • Update prompt instruction clarity • Correct utility function examples • Update regex pattern comments	2025-09-12 08:57:46 +08:00
yangdx	c07bcbff44	Fix tuple delimiter corruption patterns and add missing edge cases	2025-09-12 08:35:37 +08:00
yangdx	8660bf34e4	Add timestamp tracking for LLM responses and entity/relationship data - Track timestamps for cache hits/misses - Add timestamp to entity/relationship objects - Sort descriptions by timestamp order - Preserve temporal ordering in merges	2025-09-12 04:34:12 +08:00
yangdx	40688def20	Refactor tuple delimiter corruption fix into reusable utility function - Extract regex fixes to utils module - Add case-insensitive delimiter handling	2025-09-12 04:10:14 +08:00
yangdx	a49c8e4a0d	Refactor JSON serialization to use newline-separated format - Replace json.dumps with line-by-line format - Apply to entities, relations, text units - Update truncation key functions - Maintain ensure_ascii=False setting - Improve context readability	2025-09-10 11:59:25 +08:00
yangdx	2dd143c935	Refactor conversation history handling to use LLM native message format • Remove get_conversation_turns utility • Pass history_messages to LLM directly • Clean up prompt template formatting	2025-09-10 11:56:58 +08:00
yangdx	09abb656b8	Improve log message formatting for better readability	2025-09-09 17:41:09 +08:00
yangdx	d218f15a62	Refactor entity extraction with system prompts and output limits - Add system/user prompt separation - Set max tokens for endless output fix - Improve extraction error logging - Update cache type from extract to summary	2025-09-08 15:20:45 +08:00
yangdx	c87eb2cfcf	Increase timeout buffers for async function calls • Extend execution timeout buffer to 150s • Extend task duration buffer to 180s • Account for low-level retry delays • Improve health check phase handling • Reduce timeout-related failures	2025-09-06 23:56:24 +08:00
yangdx	6be462511f	Add error prefixing for better debugging context in async operations * Add create_prefixed_exception utility * Prefix entity processing errors * Prefix relationship processing errors * Prefix chunk extraction progress info * Maintain original exception chains	2025-09-05 21:28:00 +08:00
yangdx	2c551cb5db	Add support for Chinese book title marks in normalize_extracted_info	2025-09-04 18:51:57 +08:00
yangdx	9b516a8a53	Hot Fix: Preserve whitespace chars in text sanitization • Keep \t, \n, \r in control char removal	2025-09-04 10:58:29 +08:00
yangdx	a25ce7f078	Fix linting	2025-09-03 21:58:30 +08:00
yangdx	7ef2f0dff6	Add VDB error handling with retries for data consistency - Add safe_vdb_operation_with_exception util - Wrap VDB ops in entity/relationship code - Ensure exceptions propagate on failure - Add retry logic with configurable delays	2025-09-03 21:15:09 +08:00
yangdx	5b2deccbef	Improve text normalization and add entity type capitalization - Capitalize entity types with .title() - Add non-breaking space handling - Add narrow non-breaking space regex	2025-09-02 02:51:41 +08:00
yangdx	e95622ca7b	fix(utils): enhance remove_think_tags to handle orphaned </think> closing tags The function now properly handles cases where text contains </think> closing tags without corresponding <think> opening tags, which can occur due to content truncation or processing errors.	2025-09-01 07:17:30 +08:00
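
Note on commit e95622ca7b: the behaviour described there (stripping complete <think>...</think> blocks and also handling a closing </think> with no matching opening tag) can be illustrated with a short sketch. This is not the repository's code. The function name remove_think_tags_sketch is hypothetical, and the choice to drop everything before an orphaned </think> is an assumption about what "handling" means when the opening tag was lost to truncation.

```python
import re


def remove_think_tags_sketch(text: str) -> str:
    """Illustrative only: strip reasoning wrapped in <think> tags.

    Assumption: when a closing </think> has no opening tag, the text
    before it is leftover reasoning and is discarded as well.
    """
    # Remove every well-formed <think>...</think> block (non-greedy, spans newlines).
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # If a closing tag is still present it is orphaned: drop everything
    # from the start of the text up to and including that tag.
    text = re.sub(r"^.*?</think>", "", text, flags=re.DOTALL)
    return text.strip()
```

Under these assumptions, both "<think>plan</think>Answer" and "leftover reasoning</think>Answer" reduce to "Answer", while text without any tags is returned unchanged apart from trimming surrounding whitespace.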


253 commits
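
Note on commit 5885637ebf: the char-by-char filtering it describes (removing surrogates in U+D800-DFFF and Unicode non-characters before JSON is written) might look roughly like the sketch below. The function name sanitize_json_string_sketch is hypothetical and the code is not taken from the repository. The non-character ranges used (U+FDD0-FDEF plus the last two code points of every plane) follow the Unicode definition, and whether the project filters exactly this set is an assumption.

```python
def sanitize_json_string_sketch(text: str) -> str:
    """Illustrative only: drop code points that cannot be safely UTF-8 encoded."""
    kept = []
    for ch in text:
        cp = ord(ch)
        # Lone surrogates make str.encode("utf-8") raise UnicodeEncodeError.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        # Unicode non-characters: U+FDD0..U+FDEF, and U+nFFFE / U+nFFFF in each plane.
        if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            continue
        kept.append(ch)
    return "".join(kept)
```

After filtering, the result can be encoded as UTF-8 without raising, and ordinary whitespace such as tabs and newlines passes through untouched.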