Commit graph

5834 commits

Author SHA1 Message Date
clssck
ef7327bb3e chore(docker-compose, lightrag): optimize test infrastructure and add evaluation tools
Add comprehensive E2E testing infrastructure with PostgreSQL performance tuning,
Gunicorn multi-worker support, and evaluation scripts for RAGAS-based quality
assessment. Introduces 4 new evaluation utilities: compare_results.py for A/B test
analysis, download_wikipedia.py for reproducible test datasets, e2e_test_harness.py
for automated evaluation pipelines, and ingest_test_docs.py for batch document
ingestion. Updates docker-compose.test.yml with aggressive async settings, memory
limits, and optimized chunking parameters. Parallelize entity summarization in
operate.py for improved extraction performance. Fix typos in merge node/edge logs.
2025-11-29 10:39:20 +01:00
clssck
d2c9e6e2ec test(lightrag): add orphan connection feature with quality validation tests
Implement automatic orphan entity connection system that identifies entities with
no relationships and creates meaningful connections via vector similarity + LLM
validation. This improves knowledge graph connectivity and retrieval quality.
Changes:
- Add orphan connection configuration parameters (thresholds, cross-connect settings)
- Implement aconnect_orphan_entities() method with 4-step validation pipeline
- Add SQL templates for efficient orphan and candidate entity queries
- Create POST /graph/orphans/connect API endpoint with configurable parameters
- Add orphan connection validation prompt for LLM-based relationship verification
- Include relationship density requirement in extraction prompts to prevent orphans
- Update docker-compose.test.yml with optimized extraction parameters
- Add quality validation test suite (run_quality_tests.py) for retrieval evaluation
- Add unit test framework (test_orphan_connection_quality.py) with test cases
- Enable auto-run of orphan connection after document processing
2025-11-28 18:23:30 +01:00
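The orphan connection idea in d2c9e6e2ec (find entities with no relationships, shortlist candidates by vector similarity, then let an LLM confirm) might look roughly like the sketch below. Everything here is illustrative: `find_orphans`, `cosine`, `propose_connections` and the `validate` callback are hypothetical names, not the functions added by the commit, and the commit's actual 4-step pipeline is not reproduced.

```python
import math
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def find_orphans(entities: Iterable[str], edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Return entities that do not appear in any edge."""
    connected = {node for edge in edges for node in edge}
    return [name for name in entities if name not in connected]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def propose_connections(
    orphan: str,
    embeddings: Dict[str, Sequence[float]],
    threshold: float,
    validate: Callable[[str, str], bool],
) -> List[str]:
    """Rank other entities by similarity, keep those above the threshold
    that the (hypothetical) LLM validator accepts."""
    scored = [
        (cosine(embeddings[orphan], vec), name)
        for name, vec in embeddings.items()
        if name != orphan
    ]
    return [
        name
        for score, name in sorted(scored, reverse=True)
        if score >= threshold and validate(orphan, name)
    ]
```

In this sketch the validator is only called for candidates that already passed the similarity threshold, which keeps the number of LLM calls small.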
clssck
90825e823a remove inherited workflows, keep only docker-publish 2025-11-28 09:10:38 +00:00
clssck
3b250fd0d0 simplify docker workflow to manual trigger only 2025-11-28 08:43:36 +00:00
clssck
b6074b9a81 chore(lightrag, lightrag_webui): improve code quality and security
- Extract PostgreSQL storage check into named variable for clarity
- Move APIRouter initialization into create_table_routes function scope
- Add robust type handling for database query results
- Add input validation for table names and pagination parameters
- Add regex-based SQL injection prevention for table name sanitization
- Improve clipboard copy fallback logic and error handling
- Add memoization for JSON serialization to prevent unnecessary recalculations
- Hide meta column from table explorer UI display
- Sort table columns alphabetically for consistent ordering
- Add keyboard accessibility to status filter buttons
- Add preprocessed status filter to document manager
- Update @tanstack/react-query from 5.60.0 to 5.87.1
- Extract dev storage config into constant to reduce duplication
- Update documentation comments for clarity
2025-11-27 21:39:42 +01:00
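As a rough illustration of the table-name and pagination validation mentioned in b6074b9a81, a minimal sketch follows. The pattern, the allow-list argument, the `max_page_size` default and both function names are assumptions made for this example; the commit's actual rules are not shown in the log.

```python
import re
from typing import Set, Tuple

# Assumed rule: a plain SQL identifier (letters, digits, underscore; no leading digit).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_table_name(name: str, allowed: Set[str]) -> str:
    """Reject anything that is not a simple identifier or not a known table."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid table name: {name!r}")
    if name not in allowed:
        raise ValueError(f"unknown table: {name!r}")
    return name


def validate_pagination(page: int, page_size: int, max_page_size: int = 100) -> Tuple[int, int]:
    """Return (offset, limit) for a 1-based page number."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= max_page_size:
        raise ValueError(f"page_size must be between 1 and {max_page_size}")
    return (page - 1) * page_size, page_size
```

Table names cannot be passed as bound query parameters, which is why they are checked against a pattern and an allow-list before being interpolated into SQL; the offset and limit can still be bound normally.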
clssck
a9edadef45 feat: add Table Explorer feature with dynamic table data fetching and schema display
- Implemented Table Explorer component to allow users to select and view database tables.
- Added API calls for fetching table list, schema, and paginated data.
- Introduced row detail modal for displaying and copying row data.
- Enhanced DataTable component to support row click events.
- Updated UI components for better user experience and accessibility.
- Added mock data for development mode to facilitate testing.
- Updated localization files to include new terms related to tables.
- Modified settings store to include storage configuration for conditional UI rendering.
- Improved styling and layout for various components to align with new design standards.
2025-11-27 18:27:14 +01:00
clssck
48c7732edc feat: add automatic entity resolution with 3-layer matching
Implement automatic entity resolution to prevent duplicate nodes in the
knowledge graph. The system uses a 3-layer approach:

1. Case-insensitive exact matching (free, instant)
2. Fuzzy string matching >85% threshold (free, instant)
3. Vector similarity + LLM verification (for acronyms/synonyms)

Key features:
- Pre-resolution phase prevents race conditions in parallel processing
- Numeric suffix detection blocks false matches (IL-4 ≠ IL-13)
- PostgreSQL alias cache for fast lookups on subsequent ingestion
- Configurable thresholds via environment variables

Bug fixes included:
- Fix fuzzy matching false positives for numbered entities
- Fix alias cache not being populated (missing db parameter)
- Skip entity_aliases table from generic id index creation

New files:
- lightrag/entity_resolution/ - Core resolution module
- tests/test_entity_resolution/ - Unit tests
- docker/postgres-age-vector/ - Custom PG image with pgvector + AGE
- docker-compose.test.yml - Integration test environment

Configuration (env.example):
- ENTITY_RESOLUTION_ENABLED=true
- ENTITY_RESOLUTION_FUZZY_THRESHOLD=0.85
- ENTITY_RESOLUTION_VECTOR_THRESHOLD=0.5
- ENTITY_RESOLUTION_MAX_CANDIDATES=3
2025-11-27 15:35:02 +01:00
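The first two layers described in 48c7732edc, plus the numeric suffix guard, can be sketched with the standard library alone. This is an illustration under assumptions, not the code in lightrag/entity_resolution/: `resolve_entity` and `numeric_suffix` are hypothetical names, `difflib.SequenceMatcher` stands in for whatever fuzzy matcher the module uses, and layer 3 (vector similarity + LLM verification) is left out.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, Optional

FUZZY_THRESHOLD = 0.85  # mirrors ENTITY_RESOLUTION_FUZZY_THRESHOLD

_TRAILING_NUMBER = re.compile(r"(\d+)\s*$")


def numeric_suffix(name: str) -> Optional[str]:
    """Return the trailing number of a name ('IL-13' -> '13'), if any."""
    match = _TRAILING_NUMBER.search(name)
    return match.group(1) if match else None


def resolve_entity(name: str, known: Iterable[str]) -> Optional[str]:
    """Map `name` onto an existing entity, or return None if nothing matches."""
    known = list(known)
    lowered = name.casefold()

    # Layer 1: case-insensitive exact match.
    for candidate in known:
        if candidate.casefold() == lowered:
            return candidate

    # Layer 2: fuzzy match, skipping candidates whose numeric suffix differs.
    best, best_score = None, 0.0
    for candidate in known:
        if numeric_suffix(candidate) != numeric_suffix(name):
            continue  # e.g. IL-4 must never merge into IL-13
        score = SequenceMatcher(None, lowered, candidate.casefold()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best is not None and best_score > FUZZY_THRESHOLD:
        return best

    # Layer 3 (vector similarity + LLM verification) is omitted in this sketch.
    return None
```

The suffix guard matters because names that differ only in a number are very close as strings: "Factor 12" and "Factor 13" score about 0.89 with `SequenceMatcher`, which is above the 0.85 threshold.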
yangdx
4f12fe121d Change entity extraction logging from warning to info level
• Reduce log noise for empty entities
2025-11-27 11:00:34 +08:00
yangdx
93d445dfdd Add pipeline status lock function for legacy compatibility
- Add get_pipeline_status_lock function
- Return NamespaceLock for consistency
- Support workspace parameter
- Enable logging option
- Legacy code compatibility
2025-11-25 18:24:39 +08:00
Daniel.y
d2cd1c0722 Merge pull request #2421 from EightyOliveira/fix_catch_order
fix:exception handling order error
2025-11-25 17:52:56 +08:00
yangdx
777c91794b Add Langfuse observability configuration to env.example
- Add Langfuse environment variables
- Include setup instructions
- Support OpenAI compatible APIs
- Enable tracing configuration
- Add cloud/self-host options
2025-11-25 17:16:55 +08:00
EightyOliveira
8994c70f2f fix:exception handling order error 2025-11-25 16:36:41 +08:00
Daniel.y
2539b4e2c8 Merge pull request #2418 from danielaskdd/start-without-webui
Refact: Allow API Server to Start Without Built WebUI Assets
2025-11-25 03:02:15 +08:00
yangdx
48b67d3077 Handle missing WebUI assets gracefully without blocking server startup
- Change build check from error to warning
- Redirect to /docs when WebUI unavailable
- Add webui_available to health endpoint
- Only mount /webui if assets exist
- Return status tuple from build check
2025-11-25 02:51:55 +08:00
Daniel.y
2832a2ca7e Merge pull request #2417 from danielaskdd/neo4j-retry
Fix: Add Comprehensive Retry Mechanism for Neo4j Storage Operations
2025-11-25 02:03:48 +08:00
yangdx
5f91063c7a Add ruff as dependency to pytest and evaluation extras 2025-11-25 02:03:28 +08:00
yangdx
8c4d7a00ad Refactor: Extract retry decorator to reduce code duplication in Neo4J storage
• Define READ_RETRY_EXCEPTIONS constant
• Create reusable READ_RETRY decorator
• Replace 11 duplicate retry decorators
• Improve code maintainability
• Add missing retry to edge_degrees_batch
2025-11-25 01:35:21 +08:00
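The refactor in 8c4d7a00ad replaces repeated retry decorators with one shared definition. A standard-library sketch of that pattern is below. The names `READ_RETRY_EXCEPTIONS` and `READ_RETRY` come from the commit message; the exception types, attempt count, backoff and the `make_retry` helper are assumptions for illustration, and the real storage code may build its decorator differently (for example with a retry library and async functions).

```python
import functools
import time
from typing import Callable, Tuple, Type

# Assumed stand-ins for the transient errors a read operation may hit.
READ_RETRY_EXCEPTIONS: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)


def make_retry(
    exceptions: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.0,
) -> Callable:
    """Build a decorator that retries on `exceptions` with exponential backoff."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator


# Defined once, then applied to every read operation instead of
# repeating the full retry configuration at each call site.
READ_RETRY = make_retry(READ_RETRY_EXCEPTIONS)
```

With a single definition, a new read method only needs `@READ_RETRY`, which also makes a missing decorator easier to spot in review.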
Daniel.y
5b81ef000e Merge pull request #2410 from netbrah/create-copilot-setup-steps
feat: create copilot-setup-steps.yml
2025-11-24 22:36:33 +08:00
yangdx
7aaa51cda9 Add retry decorators to Neo4j read operations for resilience 2025-11-24 22:28:15 +08:00
palanisd
c233da6318 Update copilot-setup-steps.yml 2025-11-23 17:42:04 -05:00
palanisd
1b0413ee74 Create copilot-setup-steps.yml 2025-11-22 15:29:05 -05:00
chaohuang-ai
16eb0d5bee Merge pull request #2409 from HKUDS/chaohuang-ai-patch-3
Update README.md
2025-11-23 00:54:04 +08:00
chaohuang-ai
37178462ab Update README.md 2025-11-23 00:53:39 +08:00
chaohuang-ai
6d3bfe46d0 Merge pull request #2408 from HKUDS/chaohuang-ai-patch-2
Update README.md
2025-11-23 00:50:16 +08:00
chaohuang-ai
babbcb566b Update README.md 2025-11-23 00:48:52 +08:00
yangdx
5f53de8866 Fix Azure configuration examples and correct typos in env.example 2025-11-22 09:05:52 +08:00
yangdx
fa6797f246 Update env.example 2025-11-22 00:32:12 +08:00
yangdx
49fb11e205 Update Azure OpenAI configuration examples 2025-11-22 00:19:23 +08:00
yangdx
7b76211066 Add fallback to AZURE_OPENAI_API_VERSION for embedding API version 2025-11-22 00:14:35 +08:00
yangdx
ffd8da512e Improve Azure OpenAI compatibility and error handling
• Reduce log noise for Azure content filters
• Add default API version fallback
• Change warning to debug log level
• Handle empty choices in streaming
• Better Azure OpenAI integration
2025-11-21 23:51:18 +08:00
yangdx
fafa1791f4 Fix Azure OpenAI model parameter to use deployment name consistently
- Use deployment name for Azure API calls
- Fix model param in embed function
- Consistent api_model logic
- Prevent Azure model name conflicts
2025-11-21 23:41:52 +08:00
Daniel.y
021b637dc3 Merge pull request #2403 from danielaskdd/azure-cot-handling
Refact: Consolidate Azure OpenAI and OpenAI implementations
2025-11-21 19:36:12 +08:00
yangdx
ac9f2574a5 Improve Azure OpenAI wrapper functions with full parameter support
• Add missing parameters to wrappers
• Update docstrings for clarity
• Ensure API consistency
• Fix parameter forwarding
• Maintain backward compatibility
2025-11-21 19:24:32 +08:00
yangdx
45f4f82392 Refactor Azure OpenAI client creation to support client_configs merging
- Handle None client_configs case
- Merge configs with explicit params
- Override client_configs with params
- Use dict unpacking for client init
- Maintain parameter precedence
2025-11-21 19:14:16 +08:00
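The precedence described in 45f4f82392 (start from `client_configs`, let explicit parameters win, tolerate `client_configs=None`) can be shown with plain dict unpacking. `build_client_kwargs` is a hypothetical helper, and dropping explicit parameters that are `None` is an assumption of this sketch rather than something the commit message states.

```python
from typing import Any, Dict, Optional


def build_client_kwargs(
    client_configs: Optional[Dict[str, Any]] = None,
    **explicit: Any,
) -> Dict[str, Any]:
    """Merge base configs with explicit parameters; explicit values take precedence."""
    # Assumption: an explicit parameter left as None means "not provided".
    provided = {key: value for key, value in explicit.items() if value is not None}
    # Later entries win in dict unpacking, so explicit params override client_configs.
    return {**(client_configs or {}), **provided}
```

For example, `build_client_kwargs({"timeout": 30, "api_key": "from-config"}, api_key="explicit")` keeps the timeout from the config and takes the API key from the explicit argument; the result could then be unpacked into a client constructor with `**`.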
yangdx
0c4cba3860 Fix double decoration in azure_openai_embed and document decorator usage
• Remove redundant @retry decorator
• Call openai_embed.func directly
• Add detailed decorator documentation
• Prevent double parameter injection
• Fix EmbeddingFunc wrapping issues
2025-11-21 18:03:53 +08:00
yangdx
b46c152306 Fix linting 2025-11-21 17:16:44 +08:00
yangdx
b709f8f869 Consolidate Azure OpenAI implementation into main OpenAI module
• Unified OpenAI/Azure client creation
• Azure module now re-exports functions
• Backward compatibility maintained
• Reduced code duplication
2025-11-21 17:12:33 +08:00
yangdx
66d6c7dd6f Refactor main function to provide sync CLI entry point 2025-11-21 13:11:55 +08:00
Daniel.y
8777895efc Merge pull request #2401 from danielaskdd/fix-openai-keyword-extraction
Refactor: Centralize keyword_extraction parameter handling in OpenAI LLM implementations
2025-11-21 13:08:15 +08:00
yangdx
1e477e95ef Add lightrag-clean-llmqc console script entry point
- Add clean_llm_query_cache tool
- New console script for cache cleanup
- Extend CLI tool availability
2025-11-21 12:59:49 +08:00
yangdx
02fdceb959 Update OpenAI client to use stable API and bump minimum version to 2.0.0
- Remove beta prefix from completions.parse
- Update OpenAI dependency to >=2.0.0
- Fix whitespace formatting
- Update all requirement files
- Clean up pyproject.toml dependencies
2025-11-21 12:55:44 +08:00
yangdx
9f69c5bf85 feat: Support structured output parsed from OpenAI
Added support for structured output (JSON mode) from the OpenAI API in `openai.py` and `azure_openai.py`.

When `response_format` is used to request structured data, the new logic checks for the `message.parsed` attribute. If it exists, it's serialized into a JSON string as the final content. If not, the code falls back to the existing `message.content` handling, ensuring backward compatibility.
2025-11-21 12:46:31 +08:00
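The fallback described in 9f69c5bf85 (prefer `message.parsed`, serialize it to a JSON string, otherwise use `message.content`) is sketched below. `extract_content` is a hypothetical name, and the message objects in the usage note are `SimpleNamespace` stand-ins rather than real SDK responses. The sketch assumes a parsed value is either a Pydantic-style object exposing `model_dump_json()` or something `json.dumps` can handle.

```python
import json
from typing import Any


def extract_content(message: Any) -> Any:
    """Return structured output as a JSON string when present, else the plain content."""
    parsed = getattr(message, "parsed", None)
    if parsed is not None:
        # Pydantic-style models serialize themselves; plain dicts/lists go through json.
        if hasattr(parsed, "model_dump_json"):
            return parsed.model_dump_json()
        return json.dumps(parsed)
    # No structured output: keep the existing content handling.
    return message.content
```

With stand-in messages, `extract_content(SimpleNamespace(parsed={"keywords": ["rag"]}, content=None))` yields a JSON string, while a message without a `parsed` attribute falls through to its `content` unchanged.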
yangdx
c9e1c86e81 Refactor keyword extraction handling to centralize response format logic
• Move response format to core function
• Remove duplicate format assignments
• Standardize keyword extraction flow
• Clean up redundant parameter handling
• Improve Azure OpenAI compatibility
2025-11-21 12:10:04 +08:00
yangdx
46ce6d9a13 Fix Azure OpenAI embedding model parameter fallback
- Use model param if provided
- Fall back to deployment name
- Fix embedding API call
- Improve parameter handling
2025-11-20 18:20:22 +08:00
Daniel.y
cc78e2df10 Merge pull request #2395 from Amrit75/issue-2394
issue-2394: use deployment variable instead of model for embeddings API call
2025-11-20 18:10:49 +08:00
Amritpal Singh
30e86fa331 use deployment variable which extracted value from .env file or have default value 2025-11-20 09:00:27 +00:00
yangdx
ecea93992a Fix lingting 2025-11-20 13:03:31 +08:00
yangdx
1d2f534f3d Fix linting 2025-11-20 13:02:25 +08:00
yangdx
72ece7343a Remove obsolete config file and paging design doc 2025-11-20 13:00:13 +08:00
yangdx
1e415cff95 Update postgreSQL docker image link 2025-11-20 12:34:49 +08:00