LightRAG

Author	SHA1	Message	Date
clssck	2af1170bec	chore: sync with upstream Merge latest upstream changes including: - Cohere rerank improvements (top_n chunking behavior) - Content deduplication for document insertion - Dependabot configuration updates - Dependency version bumps	2025-12-03 12:55:53 +01:00
yangdx	9009abed3e	Fix top_n behavior with chunking to limit documents not chunks - Disable API-level top_n when chunking - Apply top_n to aggregated documents - Add comprehensive test coverage	2025-12-03 13:08:26 +08:00
yangdx	561ba4e4b5	Fix trailing whitespace and update test mocking for rerank module • Remove trailing whitespace • Fix TiktokenTokenizer import patch • Add async context manager mocks • Update aiohttp.ClientSession patch • Improve test reliability	2025-12-03 12:40:48 +08:00
yangdx	8e50eef58b	Merge branch 'main' into cohere-rerank	2025-12-02 22:19:37 +08:00
yangdx	19c16bc464	Add content deduplication check for document insertion endpoints • Check content hash before insertion • Return duplicated status if exists • Use sanitized text for hash computation • Apply to both single and batch inserts • Prevent duplicate content processing	2025-12-02 17:49:48 +08:00
yangdx	8d28b95966	Fix duplicate document responses to return original track_id - Return existing track_id for duplicates - Remove track_id generation in reprocess - Update reprocess response documentation - Clarify track_id behavior in comments - Update API response examples	2025-12-02 14:32:28 +08:00
yangdx	381ddfffd4	Bump API version to 0259	2025-12-02 13:27:02 +08:00
clssck	8d099fc3ac	chore: sync with upstream HKUDS/LightRAG - Add KaTeX extensions (mhchem for chemistry, copy-tex for copying) - Add CASCADE to AGE extension for PostgreSQL - Remove future dependency, replace passlib with bcrypt - Fix Jina embedding configuration and provider defaults - Update gunicorn help text and bump API version to 0258 - Documentation and README updates	2025-12-01 21:30:19 +01:00
clssck	1bdd906753	chore(lightrag): remove legacy prompts and clean up prompt.py Remove unused LLM-generated citation prompts that were kept for backward compatibility but never referenced in codebase. Consolidate duplicate instructions in entity summarization prompt and fix minor typos. - Remove rag_response_with_llm_citations prompt (dead code) - Remove naive_rag_response_with_llm_citations prompt (dead code) - Remove unused cite_ready_* backward compatibility aliases - Consolidate duplicate context/objectivity instructions in summarize prompt - Fix typo in example (extra parenthesis) - Clarify delimiter documentation comment	2025-12-01 21:02:44 +01:00
yangdx	2ecf77efe2	Update help text to use correct gunicorn command with workers flag	2025-12-02 02:52:31 +08:00
clssck	663ada943a	chore: add citation system and enhance RAG UI components Add citation tracking and display system across backend and frontend components. Backend changes include citation.py for document attribution, enhanced query routes with citation metadata, improved prompt templates, and PostgreSQL schema updates. Frontend includes CitationMarker component, HoverCard UI, QuerySettings refinements, and ChatMessage enhancements for displaying document sources. Update dependencies and docker-compose test configuration for improved development workflow.	2025-12-01 17:50:00 +01:00
yangdx	d6019c82af	Add CASCADE to AGE extension creation in PostgreSQL implementation - Add CASCADE option to CREATE EXTENSION - Ensure dependencies are installed - Fix potential AGE setup issues	2025-12-02 00:17:41 +08:00
yangdx	112ed234c4	Bump API version to 0258	2025-12-01 12:20:27 +08:00
clssck	77df910525	chore: add citation system and code formatting setup Add citation.py module for document citation tracking and management. Configure Biome and Ruff for consistent code formatting across TypeScript and Python. Update webui with improved component organization, API client refactoring, and enhanced user interface patterns. Add formatting configs and dependency updates for build toolchain optimization.	2025-11-30 20:51:43 +01:00
clssck	9f5948650e	chore(lightrag): add wikipedia test dataset for evaluation Add comprehensive test dataset with 7 domain-specific Wikipedia documents (climate, finance, medical, sports) and corresponding test cases in JSON format. Total of 2292 lines of test data across 8 files for RAG quality evaluation and end-to-end testing infrastructure.	2025-11-30 20:14:52 +01:00
clssck	43af31f888	feat: add db_degree visibility and orphan connection UI Graph Connectivity Awareness: - Add db_degree property to all KG implementations (NetworkX, Postgres, Neo4j, Mongo, Memgraph) - Show database degree vs visual degree in node panel with amber badge - Add visual indicator (amber border) for nodes with hidden connections - Add "Load X hidden connection(s)" button to expand hidden neighbors - Add configurable "Expand Depth" setting (1-5) in graph settings - Use global maxNodes setting for node expansion consistency Orphan Connection UI: - Add OrphanConnectionDialog component for manual orphan entity connection - Add OrphanConnectionControl button in graph sidebar - Expose /graph/orphans/connect API endpoint for frontend use Backend Improvements: - Add get_orphan_entities() and connect_orphan_entities() to base storage - Add orphan connection configuration parameters - Improve entity extraction with relationship density requirements Frontend: - Add graphExpandDepth and graphIncludeOrphans to settings store - Add min_degree and include_orphans graph filtering parameters - Update translations (en.json, zh.json)	2025-11-29 21:08:07 +01:00
clssck	ef7327bb3e	chore(docker-compose, lightrag): optimize test infrastructure and add evaluation tools Add comprehensive E2E testing infrastructure with PostgreSQL performance tuning, Gunicorn multi-worker support, and evaluation scripts for RAGAS-based quality assessment. Introduces 4 new evaluation utilities: compare_results.py for A/B test analysis, download_wikipedia.py for reproducible test datasets, e2e_test_harness.py for automated evaluation pipelines, and ingest_test_docs.py for batch document ingestion. Updates docker-compose.test.yml with aggressive async settings, memory limits, and optimized chunking parameters. Parallelize entity summarization in operate.py for improved extraction performance. Fix typos in merge node/edge logs.	2025-11-29 10:39:20 +01:00
clssck	d2c9e6e2ec	test(lightrag): add orphan connection feature with quality validation tests Implement automatic orphan entity connection system that identifies entities with no relationships and creates meaningful connections via vector similarity + LLM validation. This improves knowledge graph connectivity and retrieval quality. Changes: - Add orphan connection configuration parameters (thresholds, cross-connect settings) - Implement aconnect_orphan_entities() method with 4-step validation pipeline - Add SQL templates for efficient orphan and candidate entity queries - Create POST /graph/orphans/connect API endpoint with configurable parameters - Add orphan connection validation prompt for LLM-based relationship verification - Include relationship density requirement in extraction prompts to prevent orphans - Update docker-compose.test.yml with optimized extraction parameters - Add quality validation test suite (run_quality_tests.py) for retrieval evaluation - Add unit test framework (test_orphan_connection_quality.py) with test cases - Enable auto-run of orphan connection after document processing	2025-11-28 18:23:30 +01:00
yangdx	ea8d55ab42	Add documentation for embedding provider configuration rules	2025-11-28 17:49:30 +08:00
yangdx	4ab4a7ac94	Allow embedding models to use provider defaults when unspecified - Set EMBEDDING_MODEL default to None - Pass model param only when provided - Let providers use their own defaults - Fix lollms embed function params - Add ollama embed_model default param	2025-11-28 16:57:33 +08:00
yangdx	881b8d3a50	Bump API version to 0257	2025-11-28 15:39:55 +08:00
yangdx	56e0365cf0	Add configurable model parameter to jina_embed function - Add model parameter to jina_embed - Pass model from API server - Default to jina-embeddings-v4 - Update function documentation - Make model selection flexible	2025-11-28 15:38:29 +08:00
yangdx	6e2946e78a	Add max_token_size parameter to azure_openai_embed wrapper	2025-11-28 13:41:01 +08:00
clssck	b6074b9a81	chore(lightrag, lightrag_webui): improve code quality and security - Extract PostgreSQL storage check into named variable for clarity - Move APIRouter initialization into create_table_routes function scope - Add robust type handling for database query results - Add input validation for table names and pagination parameters - Add regex-based SQL injection prevention for table name sanitization - Improve clipboard copy fallback logic and error handling - Add memoization for JSON serialization to prevent unnecessary recalculations - Hide meta column from table explorer UI display - Sort table columns alphabetically for consistent ordering - Add keyboard accessibility to status filter buttons - Add preprocessed status filter to document manager - Update @tanstack/react-query from 5.60.0 to 5.87.1 - Extract dev storage config into constant to reduce duplication - Update documentation comments for clarity	2025-11-27 21:39:42 +01:00
clssck	a9edadef45	feat: add Table Explorer feature with dynamic table data fetching and schema display - Implemented Table Explorer component to allow users to select and view database tables. - Added API calls for fetching table list, schema, and paginated data. - Introduced row detail modal for displaying and copying row data. - Enhanced DataTable component to support row click events. - Updated UI components for better user experience and accessibility. - Added mock data for development mode to facilitate testing. - Updated localization files to include new terms related to tables. - Modified settings store to include storage configuration for conditional UI rendering. - Improved styling and layout for various components to align with new design standards.	2025-11-27 18:27:14 +01:00
clssck	48c7732edc	feat: add automatic entity resolution with 3-layer matching Implement automatic entity resolution to prevent duplicate nodes in the knowledge graph. The system uses a 3-layer approach: 1. Case-insensitive exact matching (free, instant) 2. Fuzzy string matching >85% threshold (free, instant) 3. Vector similarity + LLM verification (for acronyms/synonyms) Key features: - Pre-resolution phase prevents race conditions in parallel processing - Numeric suffix detection blocks false matches (IL-4 ≠ IL-13) - PostgreSQL alias cache for fast lookups on subsequent ingestion - Configurable thresholds via environment variables Bug fixes included: - Fix fuzzy matching false positives for numbered entities - Fix alias cache not being populated (missing db parameter) - Skip entity_aliases table from generic id index creation New files: - lightrag/entity_resolution/ - Core resolution module - tests/test_entity_resolution/ - Unit tests - docker/postgres-age-vector/ - Custom PG image with pgvector + AGE - docker-compose.test.yml - Integration test environment Configuration (env.example): - ENTITY_RESOLUTION_ENABLED=true - ENTITY_RESOLUTION_FUZZY_THRESHOLD=0.85 - ENTITY_RESOLUTION_VECTOR_THRESHOLD=0.5 - ENTITY_RESOLUTION_MAX_CANDIDATES=3	2025-11-27 15:35:02 +01:00
yangdx	4f12fe121d	Change entity extraction logging from warning to info level • Reduce log noise for empty entities	2025-11-27 11:00:34 +08:00
palanisd	a898f0548d	Merge branch 'HKUDS:main' into cohere-rerank	2025-11-25 14:21:43 -05:00
yangdx	93d445dfdd	Add pipeline status lock function for legacy compatibility - Add get_pipeline_status_lock function - Return NamespaceLock for consistency - Support workspace parameter - Enable logging option - Legacy code compatibility	2025-11-25 18:24:39 +08:00
EightyOliveira	8994c70f2f	fix:exception handling order error	2025-11-25 16:36:41 +08:00
yangdx	48b67d3077	Handle missing WebUI assets gracefully without blocking server startup - Change build check from error to warning - Redirect to /docs when WebUI unavailable - Add webui_available to health endpoint - Only mount /webui if assets exist - Return status tuple from build check	2025-11-25 02:51:55 +08:00
yangdx	8c4d7a00ad	Refactor: Extract retry decorator to reduce code duplication in Neo4J storage • Define READ_RETRY_EXCEPTIONS constant • Create reusable READ_RETRY decorator • Replace 11 duplicate retry decorators • Improve code maintainability • Add missing retry to edge_degrees_batch	2025-11-25 01:35:21 +08:00
yangdx	7aaa51cda9	Add retry decorators to Neo4j read operations for resilience	2025-11-24 22:28:15 +08:00
copilot-swe-agent[bot]	8835fc244a	Improve edge case handling for max_tokens=1 Co-authored-by: netbrah <162479981+netbrah@users.noreply.github.com>	2025-11-24 03:43:05 +00:00
copilot-swe-agent[bot]	1d6ea0c5f7	Fix chunking infinite loop when overlap_tokens >= max_tokens Co-authored-by: netbrah <162479981+netbrah@users.noreply.github.com>	2025-11-24 03:40:58 +00:00
netbrah	a05bbf105e	Add Cohere reranker config, chunking, and tests	2025-11-22 16:43:13 -05:00
yangdx	7b76211066	Add fallback to AZURE_OPENAI_API_VERSION for embedding API version	2025-11-22 00:14:35 +08:00
yangdx	ffd8da512e	Improve Azure OpenAI compatibility and error handling • Reduce log noise for Azure content filters • Add default API version fallback • Change warning to debug log level • Handle empty choices in streaming • Better Azure OpenAI integration	2025-11-21 23:51:18 +08:00
yangdx	fafa1791f4	Fix Azure OpenAI model parameter to use deployment name consistently - Use deployment name for Azure API calls - Fix model param in embed function - Consistent api_model logic - Prevent Azure model name conflicts	2025-11-21 23:41:52 +08:00
yangdx	ac9f2574a5	Improve Azure OpenAI wrapper functions with full parameter support • Add missing parameters to wrappers • Update docstrings for clarity • Ensure API consistency • Fix parameter forwarding • Maintain backward compatibility	2025-11-21 19:24:32 +08:00
yangdx	45f4f82392	Refactor Azure OpenAI client creation to support client_configs merging - Handle None client_configs case - Merge configs with explicit params - Override client_configs with params - Use dict unpacking for client init - Maintain parameter precedence	2025-11-21 19:14:16 +08:00
yangdx	0c4cba3860	Fix double decoration in azure_openai_embed and document decorator usage • Remove redundant @retry decorator • Call openai_embed.func directly • Add detailed decorator documentation • Prevent double parameter injection • Fix EmbeddingFunc wrapping issues	2025-11-21 18:03:53 +08:00
yangdx	b46c152306	Fix linting	2025-11-21 17:16:44 +08:00
yangdx	b709f8f869	Consolidate Azure OpenAI implementation into main OpenAI module • Unified OpenAI/Azure client creation • Azure module now re-exports functions • Backward compatibility maintained • Reduced code duplication	2025-11-21 17:12:33 +08:00
yangdx	66d6c7dd6f	Refactor main function to provide sync CLI entry point	2025-11-21 13:11:55 +08:00
yangdx	02fdceb959	Update OpenAI client to use stable API and bump minimum version to 2.0.0 - Remove beta prefix from completions.parse - Update OpenAI dependency to >=2.0.0 - Fix whitespace formatting - Update all requirement files - Clean up pyproject.toml dependencies	2025-11-21 12:55:44 +08:00
yangdx	9f69c5bf85	feat: Support structured output `parsed` from OpenAI Added support for structured output (JSON mode) from the OpenAI API in `openai.py` and `azure_openai.py`. When `response_format` is used to request structured data, the new logic checks for the `message.parsed` attribute. If it exists, it's serialized into a JSON string as the final content. If not, the code falls back to the existing `message.content` handling, ensuring backward compatibility.	2025-11-21 12:46:31 +08:00
yangdx	c9e1c86e81	Refactor keyword extraction handling to centralize response format logic • Move response format to core function • Remove duplicate format assignments • Standardize keyword extraction flow • Clean up redundant parameter handling • Improve Azure OpenAI compatibility	2025-11-21 12:10:04 +08:00
yangdx	46ce6d9a13	Fix Azure OpenAI embedding model parameter fallback - Use model param if provided - Fall back to deployment name - Fix embedding API call - Improve parameter handling	2025-11-20 18:20:22 +08:00
Amritpal Singh	30e86fa331	use deployment variable which extracted value from .env file or have default value	2025-11-20 09:00:27 +00:00

3747 commits