Commit graph

5841 commits

Author SHA1 Message Date
clssck
8d099fc3ac chore: sync with upstream HKUDS/LightRAG
- Add KaTeX extensions (mhchem for chemistry, copy-tex for copying)
- Add CASCADE to AGE extension for PostgreSQL
- Remove future dependency, replace passlib with bcrypt
- Fix Jina embedding configuration and provider defaults
- Update gunicorn help text and bump API version to 0258
- Documentation and README updates
2025-12-01 21:30:19 +01:00
clssck
1bdd906753 chore(lightrag): remove legacy prompts and clean up prompt.py
Remove unused LLM-generated citation prompts that were kept for backward
compatibility but never referenced in codebase. Consolidate duplicate
instructions in entity summarization prompt and fix minor typos.

- Remove rag_response_with_llm_citations prompt (dead code)
- Remove naive_rag_response_with_llm_citations prompt (dead code)
- Remove unused cite_ready_* backward compatibility aliases
- Consolidate duplicate context/objectivity instructions in summarize prompt
- Fix typo in example (extra parenthesis)
- Clarify delimiter documentation comment
2025-12-01 21:02:44 +01:00
clssck
663ada943a chore: add citation system and enhance RAG UI components
Add citation tracking and display system across backend and frontend components.
Backend changes include citation.py for document attribution, enhanced query routes
with citation metadata, improved prompt templates, and PostgreSQL schema updates.
Frontend includes CitationMarker component, HoverCard UI, QuerySettings refinements,
and ChatMessage enhancements for displaying document sources. Update dependencies
and docker-compose test configuration for improved development workflow.
2025-12-01 17:50:00 +01:00
clssck
77df910525 chore: add citation system and code formatting setup
Add citation.py module for document citation tracking and management.
Configure Biome and Ruff for consistent code formatting across TypeScript
and Python. Update webui with improved component organization, API client
refactoring, and enhanced user interface patterns. Add formatting configs
and dependency updates for build toolchain optimization.
2025-11-30 20:51:43 +01:00
clssck
4e58da3583 style(lightrag_webui): fix indentation, color palette, and component optimization
- Fix inconsistent indentation in App.tsx (66 → 68 chars)
- Refactor GraphControl reducer logic: cache selection/theme in refs to prevent expensive re-renders on every hover/selection change; extract nodeReducer and edgeReducer to useCallback with stable dependencies
- Improve GraphViewer performance: extract FocusSync and GraphSearchWithSelection components to prevent re-renders from unrelated store updates
- Remove unused imports (X icon, ZapIcon, i18n)
- Remove unused function parameter (storageConfig)
- Standardize dark theme colors: improve contrast and visual hierarchy (hsl values); update scrollbar colors for better visibility
- Normalize quote style: double quotes → single quotes in className attributes
- Fix form element styling: improve dark mode button hover states (gray-800/900 → gray-700/800, red-900 → red-800)
- Optimize dropdown menu colors: dark mode backgrounds (gray-900/gray-800)
- Relocate HIDDEN_COLUMNS constant to module level in TableExplorer
- Optimize RowDetailModal: move entries computation to useMemo for perf
- Fix useLightragGraph dependency array: add missing minDegree and includeOrphans dependencies
2025-11-30 20:15:27 +01:00
clssck
9f5948650e chore(lightrag): add wikipedia test dataset for evaluation
Add comprehensive test dataset with 7 domain-specific Wikipedia documents
(climate, finance, medical, sports) and corresponding test cases in JSON format.
Total of 2292 lines of test data across 8 files for RAG quality evaluation
and end-to-end testing infrastructure.
2025-11-30 20:14:52 +01:00
clssck
43af31f888 feat: add db_degree visibility and orphan connection UI
Graph Connectivity Awareness:
- Add db_degree property to all KG implementations (NetworkX, Postgres, Neo4j, Mongo, Memgraph)
- Show database degree vs visual degree in node panel with amber badge
- Add visual indicator (amber border) for nodes with hidden connections
- Add "Load X hidden connection(s)" button to expand hidden neighbors
- Add configurable "Expand Depth" setting (1-5) in graph settings
- Use global maxNodes setting for node expansion consistency

Orphan Connection UI:
- Add OrphanConnectionDialog component for manual orphan entity connection
- Add OrphanConnectionControl button in graph sidebar
- Expose /graph/orphans/connect API endpoint for frontend use

Backend Improvements:
- Add get_orphan_entities() and connect_orphan_entities() to base storage
- Add orphan connection configuration parameters
- Improve entity extraction with relationship density requirements

Frontend:
- Add graphExpandDepth and graphIncludeOrphans to settings store
- Add min_degree and include_orphans graph filtering parameters
- Update translations (en.json, zh.json)
2025-11-29 21:08:07 +01:00
clssck
ef7327bb3e chore(docker-compose, lightrag): optimize test infrastructure and add evaluation tools
Add comprehensive E2E testing infrastructure with PostgreSQL performance tuning,
Gunicorn multi-worker support, and evaluation scripts for RAGAS-based quality
assessment. Introduces 4 new evaluation utilities: compare_results.py for A/B test
analysis, download_wikipedia.py for reproducible test datasets, e2e_test_harness.py
for automated evaluation pipelines, and ingest_test_docs.py for batch document
ingestion. Updates docker-compose.test.yml with aggressive async settings, memory
limits, and optimized chunking parameters. Parallelize entity summarization in
operate.py for improved extraction performance. Fix typos in merge node/edge logs.
2025-11-29 10:39:20 +01:00
clssck
d2c9e6e2ec test(lightrag): add orphan connection feature with quality validation tests
Implement automatic orphan entity connection system that identifies entities with
no relationships and creates meaningful connections via vector similarity + LLM
validation. This improves knowledge graph connectivity and retrieval quality.
Changes:
- Add orphan connection configuration parameters (thresholds, cross-connect settings)
- Implement aconnect_orphan_entities() method with 4-step validation pipeline
- Add SQL templates for efficient orphan and candidate entity queries
- Create POST /graph/orphans/connect API endpoint with configurable parameters
- Add orphan connection validation prompt for LLM-based relationship verification
- Include relationship density requirement in extraction prompts to prevent orphans
- Update docker-compose.test.yml with optimized extraction parameters
- Add quality validation test suite (run_quality_tests.py) for retrieval evaluation
- Add unit test framework (test_orphan_connection_quality.py) with test cases
- Enable auto-run of orphan connection after document processing
2025-11-28 18:23:30 +01:00
clssck
90825e823a remove inherited workflows, keep only docker-publish
2025-11-28 09:10:38 +00:00
clssck
3b250fd0d0 simplify docker workflow to manual trigger only
2025-11-28 08:43:36 +00:00
clssck
b6074b9a81 chore(lightrag, lightrag_webui): improve code quality and security
- Extract PostgreSQL storage check into named variable for clarity
- Move APIRouter initialization into create_table_routes function scope
- Add robust type handling for database query results
- Add input validation for table names and pagination parameters
- Add regex-based SQL injection prevention for table name sanitization
- Improve clipboard copy fallback logic and error handling
- Add memoization for JSON serialization to prevent unnecessary recalculations
- Hide meta column from table explorer UI display
- Sort table columns alphabetically for consistent ordering
- Add keyboard accessibility to status filter buttons
- Add preprocessed status filter to document manager
- Update @tanstack/react-query from 5.60.0 to 5.87.1
- Extract dev storage config into constant to reduce duplication
- Update documentation comments for clarity
2025-11-27 21:39:42 +01:00
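The table-name sanitization and pagination checks listed in b6074b9a81 could look roughly like the sketch below. This is an illustration under stated assumptions, not the code from the commit: the function names, the identifier pattern, and `MAX_PAGE_SIZE` are all hypothetical.

```python
import re
from typing import Tuple

# Hypothetical allow-list pattern: a plain SQL identifier, nothing else.
_TABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
MAX_PAGE_SIZE = 500  # hypothetical upper bound, not taken from the commit


def validate_table_name(name: str) -> str:
    """Return the name unchanged if it is a plain identifier, else raise."""
    if not _TABLE_NAME.fullmatch(name):
        raise ValueError(f"invalid table name: {name!r}")
    return name


def validate_pagination(page: int, page_size: int) -> Tuple[int, int]:
    """Translate a 1-based page number into a (limit, offset) pair."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    return page_size, (page - 1) * page_size
```

Rejecting anything that is not a bare identifier is stricter than escaping, which is usually what one wants when a table name has to be interpolated into SQL because it cannot be passed as a bound parameter.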
clssck
a9edadef45 feat: add Table Explorer feature with dynamic table data fetching and schema display
- Implemented Table Explorer component to allow users to select and view database tables.
- Added API calls for fetching table list, schema, and paginated data.
- Introduced row detail modal for displaying and copying row data.
- Enhanced DataTable component to support row click events.
- Updated UI components for better user experience and accessibility.
- Added mock data for development mode to facilitate testing.
- Updated localization files to include new terms related to tables.
- Modified settings store to include storage configuration for conditional UI rendering.
- Improved styling and layout for various components to align with new design standards.
2025-11-27 18:27:14 +01:00
clssck
48c7732edc feat: add automatic entity resolution with 3-layer matching
Implement automatic entity resolution to prevent duplicate nodes in the
knowledge graph. The system uses a 3-layer approach:

1. Case-insensitive exact matching (free, instant)
2. Fuzzy string matching >85% threshold (free, instant)
3. Vector similarity + LLM verification (for acronyms/synonyms)

Key features:
- Pre-resolution phase prevents race conditions in parallel processing
- Numeric suffix detection blocks false matches (IL-4 ≠ IL-13)
- PostgreSQL alias cache for fast lookups on subsequent ingestion
- Configurable thresholds via environment variables

Bug fixes included:
- Fix fuzzy matching false positives for numbered entities
- Fix alias cache not being populated (missing db parameter)
- Skip entity_aliases table from generic id index creation

New files:
- lightrag/entity_resolution/ - Core resolution module
- tests/test_entity_resolution/ - Unit tests
- docker/postgres-age-vector/ - Custom PG image with pgvector + AGE
- docker-compose.test.yml - Integration test environment

Configuration (env.example):
- ENTITY_RESOLUTION_ENABLED=true
- ENTITY_RESOLUTION_FUZZY_THRESHOLD=0.85
- ENTITY_RESOLUTION_VECTOR_THRESHOLD=0.5
- ENTITY_RESOLUTION_MAX_CANDIDATES=3
2025-11-27 15:35:02 +01:00
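A minimal sketch of the first two layers described in 48c7732edc, including the numeric-suffix guard. It assumes `difflib.SequenceMatcher` as the fuzzy scorer, which the commit does not specify; `resolve_entity` and `numeric_suffix` are hypothetical names, and layer 3 (vector similarity plus LLM verification) is left out.

```python
import re
from difflib import SequenceMatcher
from typing import Optional, Sequence

FUZZY_THRESHOLD = 0.85  # mirrors ENTITY_RESOLUTION_FUZZY_THRESHOLD

_NUMERIC_SUFFIX = re.compile(r"(\d+)\s*$")


def numeric_suffix(name: str) -> Optional[str]:
    """Return the trailing digits of a name, or None if there are none."""
    match = _NUMERIC_SUFFIX.search(name)
    return match.group(1) if match else None


def resolve_entity(
    name: str, known: Sequence[str], threshold: float = FUZZY_THRESHOLD
) -> Optional[str]:
    """Return the known entity that `name` duplicates, or None."""
    lowered = name.casefold()
    # Layer 1: case-insensitive exact match.
    for candidate in known:
        if candidate.casefold() == lowered:
            return candidate
    # Layer 2: fuzzy match. Pairs whose numeric suffixes differ are skipped,
    # so "IL-4" is never merged into "IL-13" however similar the strings are.
    best, best_score = None, 0.0
    for candidate in known:
        if numeric_suffix(name) != numeric_suffix(candidate):
            continue
        score = SequenceMatcher(None, lowered, candidate.casefold()).ratio()
        if score > threshold and score > best_score:
            best, best_score = candidate, score
    # Layer 3 (vector similarity + LLM verification) would run here.
    return best
```

In this sketch a name with no numeric suffix is also kept apart from one that has a suffix, which is a design choice of the example rather than something the commit message states.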
yangdx
4f12fe121d Change entity extraction logging from warning to info level
• Reduce log noise for empty entities
2025-11-27 11:00:34 +08:00
yangdx
93d445dfdd Add pipeline status lock function for legacy compatibility
- Add get_pipeline_status_lock function
- Return NamespaceLock for consistency
- Support workspace parameter
- Enable logging option
- Legacy code compatibility
2025-11-25 18:24:39 +08:00
Daniel.y
d2cd1c0722
Merge pull request #2421 from EightyOliveira/fix_catch_order
fix: exception handling order error
2025-11-25 17:52:56 +08:00
yangdx
777c91794b Add Langfuse observability configuration to env.example
- Add Langfuse environment variables
- Include setup instructions
- Support OpenAI compatible APIs
- Enable tracing configuration
- Add cloud/self-host options
2025-11-25 17:16:55 +08:00
EightyOliveira
8994c70f2f fix: exception handling order error
2025-11-25 16:36:41 +08:00
Daniel.y
2539b4e2c8
Merge pull request #2418 from danielaskdd/start-without-webui
Refact: Allow API Server to Start Without Built WebUI Assets
2025-11-25 03:02:15 +08:00
yangdx
48b67d3077 Handle missing WebUI assets gracefully without blocking server startup
- Change build check from error to warning
- Redirect to /docs when WebUI unavailable
- Add webui_available to health endpoint
- Only mount /webui if assets exist
- Return status tuple from build check
2025-11-25 02:51:55 +08:00
Daniel.y
2832a2ca7e
Merge pull request #2417 from danielaskdd/neo4j-retry
Fix: Add Comprehensive Retry Mechanism for Neo4j Storage Operations
2025-11-25 02:03:48 +08:00
yangdx
5f91063c7a Add ruff as dependency to pytest and evaluation extras
2025-11-25 02:03:28 +08:00
yangdx
8c4d7a00ad Refactor: Extract retry decorator to reduce code duplication in Neo4J storage
• Define READ_RETRY_EXCEPTIONS constant
• Create reusable READ_RETRY decorator
• Replace 11 duplicate retry decorators
• Improve code maintainability
• Add missing retry to edge_degrees_batch
2025-11-25 01:35:21 +08:00
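The refactor in 8c4d7a00ad replaces repeated retry decorators with one shared `READ_RETRY`. The sketch below shows the general pattern using only the standard library. It is synchronous for brevity, the exception tuple is a stand-in for the driver's transient errors, and `make_retry` is a hypothetical helper; the actual storage code may be built on a retry library instead.

```python
import functools
import time

# Stand-ins for the transient errors a database driver might raise.
READ_RETRY_EXCEPTIONS = (ConnectionError, TimeoutError)


def make_retry(exceptions, attempts=3, base_delay=0.0):
    """Build a decorator that retries on `exceptions` with exponential backoff."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator


# Defined once, then reused on every read operation.
# base_delay is zero here only so the example runs instantly.
READ_RETRY = make_retry(READ_RETRY_EXCEPTIONS, attempts=3, base_delay=0.0)

calls = {"n": 0}


@READ_RETRY
def edge_degrees_batch(pairs):
    """Toy read operation that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return {pair: 0 for pair in pairs}
```

Keeping the exception tuple and the decorator in one place means a new read method only needs `@READ_RETRY`, which is how a missing retry such as the one on `edge_degrees_batch` becomes easy to spot and fix.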
Daniel.y
5b81ef000e
Merge pull request #2410 from netbrah/create-copilot-setup-steps
feat: create copilot-setup-steps.yml
2025-11-24 22:36:33 +08:00
yangdx
7aaa51cda9 Add retry decorators to Neo4j read operations for resilience
2025-11-24 22:28:15 +08:00
palanisd
c233da6318
Update copilot-setup-steps.yml
2025-11-23 17:42:04 -05:00
palanisd
1b0413ee74
Create copilot-setup-steps.yml
2025-11-22 15:29:05 -05:00
chaohuang-ai
16eb0d5bee
Merge pull request #2409 from HKUDS/chaohuang-ai-patch-3
Update README.md
2025-11-23 00:54:04 +08:00
chaohuang-ai
37178462ab
Update README.md
2025-11-23 00:53:39 +08:00
chaohuang-ai
6d3bfe46d0
Merge pull request #2408 from HKUDS/chaohuang-ai-patch-2
Update README.md
2025-11-23 00:50:16 +08:00
chaohuang-ai
babbcb566b
Update README.md
2025-11-23 00:48:52 +08:00
yangdx
5f53de8866 Fix Azure configuration examples and correct typos in env.example
2025-11-22 09:05:52 +08:00
yangdx
fa6797f246 Update env.example
2025-11-22 00:32:12 +08:00
yangdx
49fb11e205 Update Azure OpenAI configuration examples
2025-11-22 00:19:23 +08:00
yangdx
7b76211066 Add fallback to AZURE_OPENAI_API_VERSION for embedding API version
2025-11-22 00:14:35 +08:00
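The fallback named in 7b76211066 amounts to a two-step environment lookup. In the sketch below, `AZURE_OPENAI_API_VERSION` comes from the commit title, while `AZURE_EMBEDDING_API_VERSION` as the embedding-specific variable and the function name are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def embedding_api_version(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer the embedding-specific version, else the general Azure one."""
    return env.get("AZURE_EMBEDDING_API_VERSION") or env.get(
        "AZURE_OPENAI_API_VERSION"
    )
```

Passing the mapping as a parameter keeps the lookup easy to test without touching the real process environment.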
yangdx
ffd8da512e Improve Azure OpenAI compatibility and error handling
• Reduce log noise for Azure content filters
• Add default API version fallback
• Change warning to debug log level
• Handle empty choices in streaming
• Better Azure OpenAI integration
2025-11-21 23:51:18 +08:00
yangdx
fafa1791f4 Fix Azure OpenAI model parameter to use deployment name consistently
- Use deployment name for Azure API calls
- Fix model param in embed function
- Consistent api_model logic
- Prevent Azure model name conflicts
2025-11-21 23:41:52 +08:00
Daniel.y
021b637dc3
Merge pull request #2403 from danielaskdd/azure-cot-handling
Refact: Consolidate Azure OpenAI and OpenAI implementations
2025-11-21 19:36:12 +08:00
yangdx
ac9f2574a5 Improve Azure OpenAI wrapper functions with full parameter support
• Add missing parameters to wrappers
• Update docstrings for clarity
• Ensure API consistency
• Fix parameter forwarding
• Maintain backward compatibility
2025-11-21 19:24:32 +08:00
yangdx
45f4f82392 Refactor Azure OpenAI client creation to support client_configs merging
- Handle None client_configs case
- Merge configs with explicit params
- Override client_configs with params
- Use dict unpacking for client init
- Maintain parameter precedence
2025-11-21 19:14:16 +08:00
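The precedence rule in 45f4f82392 (explicit parameters override `client_configs`, and a `None` config is tolerated) can be sketched as follows. `merge_client_configs` is a hypothetical helper, and skipping explicit parameters whose value is `None` is an assumption of this example, not something the commit states.

```python
from typing import Any, Dict, Optional


def merge_client_configs(
    client_configs: Optional[Dict[str, Any]] = None, **explicit: Any
) -> Dict[str, Any]:
    """Merge base configs with explicit params; explicit values win."""
    merged = dict(client_configs or {})  # handles None and avoids mutating input
    # Assumption: an explicit None means "not provided", so it does not override.
    merged.update({key: value for key, value in explicit.items() if value is not None})
    return merged
```

The merged dictionary can then be handed to a client constructor with dict unpacking, for example `SomeClient(**merged)`, where `SomeClient` stands for whichever client class is being created.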
yangdx
0c4cba3860 Fix double decoration in azure_openai_embed and document decorator usage
• Remove redundant @retry decorator
• Call openai_embed.func directly
• Add detailed decorator documentation
• Prevent double parameter injection
• Fix EmbeddingFunc wrapping issues
2025-11-21 18:03:53 +08:00
yangdx
b46c152306 Fix linting
2025-11-21 17:16:44 +08:00
yangdx
b709f8f869 Consolidate Azure OpenAI implementation into main OpenAI module
• Unified OpenAI/Azure client creation
• Azure module now re-exports functions
• Backward compatibility maintained
• Reduced code duplication
2025-11-21 17:12:33 +08:00
yangdx
66d6c7dd6f Refactor main function to provide sync CLI entry point
2025-11-21 13:11:55 +08:00
Daniel.y
8777895efc
Merge pull request #2401 from danielaskdd/fix-openai-keyword-extraction
Refactor: Centralize keyword_extraction parameter handling in OpenAI LLM implementations
2025-11-21 13:08:15 +08:00
yangdx
1e477e95ef Add lightrag-clean-llmqc console script entry point
- Add clean_llm_query_cache tool
- New console script for cache cleanup
- Extend CLI tool availability
2025-11-21 12:59:49 +08:00
yangdx
02fdceb959 Update OpenAI client to use stable API and bump minimum version to 2.0.0
- Remove beta prefix from completions.parse
- Update OpenAI dependency to >=2.0.0
- Fix whitespace formatting
- Update all requirement files
- Clean up pyproject.toml dependencies
2025-11-21 12:55:44 +08:00
yangdx
9f69c5bf85 feat: Support structured output parsed from OpenAI
Added support for structured output (JSON mode) from the OpenAI API in `openai.py` and `azure_openai.py`.

When `response_format` is used to request structured data, the new logic checks for the `message.parsed` attribute. If it exists, it's serialized into a JSON string as the final content. If not, the code falls back to the existing `message.content` handling, ensuring backward compatibility.
2025-11-21 12:46:31 +08:00
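The `parsed`-then-`content` fallback described in 9f69c5bf85 can be illustrated with a stand-in message object. `extract_content` is a hypothetical name, and the `model_dump_json` branch assumes the parsed value may be a Pydantic v2 model; plain dicts and lists go through `json.dumps`.

```python
import json
from types import SimpleNamespace
from typing import Any, Optional


def extract_content(message: Any) -> Optional[str]:
    """Prefer structured output when present, else fall back to raw content."""
    parsed = getattr(message, "parsed", None)
    if parsed is not None:
        # Assumption: Pydantic v2 models expose model_dump_json().
        if hasattr(parsed, "model_dump_json"):
            return parsed.model_dump_json()
        return json.dumps(parsed, ensure_ascii=False)
    return message.content


# Stand-ins for response messages, built only for this example.
structured = SimpleNamespace(parsed={"high_level_keywords": ["graph"]}, content=None)
plain = SimpleNamespace(content="free-form answer")
```

Because a message without a `parsed` attribute takes the same path as before, callers that never set `response_format` see no change in behaviour.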
yangdx
c9e1c86e81 Refactor keyword extraction handling to centralize response format logic
• Move response format to core function
• Remove duplicate format assignments
• Standardize keyword extraction flow
• Clean up redundant parameter handling
• Improve Azure OpenAI compatibility
2025-11-21 12:10:04 +08:00