Add extensive test suites for API routes and utilities:
- Implement test_search_routes.py (406 lines) for search endpoint validation
- Implement test_upload_routes.py (724 lines) for document upload workflows
- Implement test_s3_client.py (618 lines) for S3 storage operations
- Implement test_citation_utils.py (352 lines) for citation extraction
- Implement test_chunking.py (216 lines) for text chunking validation
Add S3 storage client implementation:
- Create lightrag/storage/s3_client.py with S3 operations
- Add storage module initialization with exports
- Integrate S3 client with document upload handling
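A minimal sketch of the kind of wrapper s3_client.py introduces, assuming boto3 as the SDK; the class and method names here are illustrative, not the module's actual API:

```python
import boto3
from botocore.exceptions import ClientError


class S3Client:
    """Thin wrapper over boto3 for storing uploaded documents."""

    def __init__(self, bucket: str, endpoint_url: str | None = None) -> None:
        # endpoint_url allows pointing at MinIO or another S3-compatible store
        self._bucket = bucket
        self._s3 = boto3.client("s3", endpoint_url=endpoint_url)

    def upload_bytes(self, key: str, data: bytes) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=key, Body=data)

    def download_bytes(self, key: str) -> bytes | None:
        try:
            obj = self._s3.get_object(Bucket=self._bucket, Key=key)
        except ClientError:
            return None
        return obj["Body"].read()
```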
Enhance API routes and core functionality:
- Add search_routes.py with full-text and graph search endpoints
- Add upload_routes.py with multipart document upload support
- Update operate.py with bulk operations and health checks
- Enhance postgres_impl.py with bulk upsert and parameterized queries (sketched after this list)
- Update lightrag_server.py to register new API routes
- Improve utils.py with citation and formatting utilities
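For the postgres_impl.py change, a hedged sketch of a bulk upsert with parameterized queries, assuming asyncpg; the table and columns are hypothetical:

```python
import asyncpg


async def bulk_upsert_chunks(
    conn: asyncpg.Connection, rows: list[tuple[str, str]]
) -> None:
    # One prepared, parameterized statement run over all rows avoids both
    # string interpolation (SQL injection risk) and per-row statement parsing.
    await conn.executemany(
        """
        INSERT INTO chunks (id, content)
        VALUES ($1, $2)
        ON CONFLICT (id) DO UPDATE SET content = EXCLUDED.content
        """,
        rows,
    )
```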
Update dependencies and configuration:
- Add S3 and test dependencies to pyproject.toml
- Update docker-compose.test.yml for testing environment
- Sync uv.lock with new dependencies
Apply code quality improvements across all modified files:
- Add type hints to function signatures
- Update imports and router initialization
- Fix logging and error handling
Fix logging output in evaluation test harness and examples:
- Replace print() statements with logger calls in e2e_test_harness.py
- Update copy_llm_cache_to_another_storage.py to use logger instead of print
- Remove redundant logging configuration in copy_llm_cache_to_another_storage.py
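The print-to-logger pattern these fixes apply, in miniature (the function name is hypothetical):

```python
import logging

logger = logging.getLogger(__name__)


def report_copied(n: int) -> None:
    # Before: print(f"Copied {n} cache entries") — bypasses levels and handlers
    logger.info("Copied %d cache entries", n)  # respects the app's logging config
```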
Fix path handling and typos:
- Correct makedirs() call in lightrag_openai_demo.py to create log_dir directly
- Update constants.py comments to clarify SOURCE_IDS_LIMIT_METHOD options
- Remove duplicate return statement in utils.py normalize_extracted_info()
- Fix error string formatting in chroma_impl.py with !s conversion
- Remove unused pipmaster import from chroma_impl.py
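The `!s` fix in chroma_impl.py follows ruff's preference (rule RUF010) for conversion flags over calling str() inside f-string braces; a tiny illustration with a hypothetical helper:

```python
def format_storage_error(e: Exception) -> str:
    # {e!s} is equivalent to str(e); ruff flags f"... {str(e)}" in favor of it
    return f"ChromaDB operation failed: {e!s}"
```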
Add comprehensive test suites for prompt evaluation:
- test_prompt_accuracy.py: 365 lines testing prompt extraction accuracy
- test_prompt_quality_deep.py: 672 lines for deep quality analysis
- Refactor prompt.py to consolidate optimized variants (removed prompt_optimized.py)
- Apply ruff formatting and type hints across 30 files
- Update pyrightconfig.json for static type checking
- Modernize reproduce scripts and examples with improved type annotations
- Sync uv.lock dependencies
Format entire codebase with ruff and add type hints across all modules:
- Apply ruff formatting to all Python files (121 files, 17K insertions)
- Add type hints to function signatures throughout lightrag core and API
- Update test suite with improved type annotations and docstrings
- Add pyrightconfig.json for static type checking configuration
- Create prompt_optimized.py and test_extraction_prompt_ab.py test files
- Update ruff.toml and .gitignore for improved linting configuration
- Standardize code style across examples, reproduce scripts, and utilities
Add citation tracking and display system across backend and frontend components.
Backend changes include citation.py for document attribution, enhanced query routes
with citation metadata, improved prompt templates, and PostgreSQL schema updates.
Frontend changes include a CitationMarker component, HoverCard UI, QuerySettings refinements,
and ChatMessage enhancements for displaying document sources. Update dependencies
and docker-compose test configuration for improved development workflow.
Add comprehensive test dataset with 7 domain-specific Wikipedia documents
(climate, finance, medical, sports) and corresponding test cases in JSON format.
A total of 2,292 lines of test data across 8 files supports RAG quality evaluation
and the end-to-end testing infrastructure.
Graph Connectivity Awareness:
- Add db_degree property to all KG implementations (NetworkX, Postgres, Neo4j, Mongo, Memgraph); see the sketch after this list
- Show database degree vs visual degree in node panel with amber badge
- Add visual indicator (amber border) for nodes with hidden connections
- Add "Load X hidden connection(s)" button to expand hidden neighbors
- Add configurable "Expand Depth" setting (1-5) in graph settings
- Use global maxNodes setting for node expansion consistency
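A sketch of what db_degree amounts to for the NetworkX backend; the class shape is illustrative, and the database-backed implementations would answer the same question with a query:

```python
import networkx as nx


class NetworkXStorage:
    def __init__(self) -> None:
        self._graph = nx.Graph()

    def db_degree(self, node_id: str) -> int:
        # Degree in the full stored graph. The viewer compares this against the
        # node's degree in the loaded subgraph (the "visual degree"); a gap
        # means hidden connections, surfaced as the amber badge and border.
        return self._graph.degree(node_id) if self._graph.has_node(node_id) else 0
```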
Orphan Connection UI:
- Add OrphanConnectionDialog component for manual orphan entity connection
- Add OrphanConnectionControl button in graph sidebar
- Expose /graph/orphans/connect API endpoint for frontend use
Backend Improvements:
- Add get_orphan_entities() and connect_orphan_entities() to base storage (sketched below)
- Add orphan connection configuration parameters
- Improve entity extraction with relationship density requirements
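Roughly what the two new base-storage methods do, sketched over NetworkX; the real methods are async and live on the storage base class, so these signatures are simplified:

```python
import networkx as nx


def get_orphan_entities(graph: nx.Graph) -> list[str]:
    # Orphans are entities left with no relationships after extraction
    return [n for n in graph.nodes if graph.degree(n) == 0]


def connect_orphan_entities(graph: nx.Graph, pairs: list[tuple[str, str]]) -> None:
    # Each pair links an orphan to a target chosen in the OrphanConnectionDialog
    for orphan, target in pairs:
        graph.add_edge(orphan, target, description="manually connected")
```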
Frontend:
- Add graphExpandDepth and graphIncludeOrphans to settings store
- Add min_degree and include_orphans graph filtering parameters
- Update translations (en.json, zh.json)
Add comprehensive E2E testing infrastructure with PostgreSQL performance tuning,
Gunicorn multi-worker support, and evaluation scripts for RAGAS-based quality
assessment. Introduce 4 new evaluation utilities: compare_results.py for A/B test
analysis, download_wikipedia.py for reproducible test datasets, e2e_test_harness.py
for automated evaluation pipelines, and ingest_test_docs.py for batch document
ingestion. Update docker-compose.test.yml with aggressive async settings, memory
limits, and optimized chunking parameters. Parallelize entity summarization in
operate.py for improved extraction performance. Fix typos in merge node/edge logs.
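The parallelization in operate.py boils down to fanning entity summaries out with asyncio.gather instead of awaiting them one at a time; a minimal sketch, where the concurrency cap and the summarize callable are illustrative:

```python
import asyncio
from collections.abc import Awaitable, Callable


async def summarize_entities(
    names: list[str], summarize: Callable[[str], Awaitable[str]]
) -> list[str]:
    sem = asyncio.Semaphore(8)  # cap concurrent LLM calls; limit is illustrative

    async def bounded(name: str) -> str:
        async with sem:
            return await summarize(name)

    # gather preserves input order while letting summaries run concurrently
    return await asyncio.gather(*(bounded(n) for n in names))
```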
• Add position pool for tqdm bars
• Serialize tqdm creation with lock
• Set leave=False to clear completed bars
• Pass position/lock to eval tasks
• Import tqdm.auto for better display
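The position-pool mechanics, sketched; the pool size and helper names are illustrative:

```python
import threading

from tqdm.auto import tqdm  # tqdm.auto picks the right display for terminal/notebook

_positions = list(range(8))   # one screen row per concurrently visible bar
_pool_lock = threading.Lock()


def acquire_bar(total: int, desc: str) -> tuple[tqdm, int]:
    with _pool_lock:          # serialize creation so bars never share a row
        pos = _positions.pop()
    # leave=False clears the bar on completion, freeing the row for reuse
    return tqdm(total=total, desc=desc, position=pos, leave=False), pos


def release_bar(bar: tqdm, pos: int) -> None:
    bar.close()
    with _pool_lock:
        _positions.append(pos)
```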
• Move rag_semaphore to wrap full function
• Increase RAG concurrency to 2x eval limit
• Prevent memory buildup from slow evals
• Keep eval_semaphore for RAGAS control
• Split RAG gen and eval stages
• Add rag_semaphore for stage 1
• Add eval_semaphore for stage 2
• Improve concurrency control
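Net effect of these two commits, sketched: the RAG stage gets its own, wider semaphore that wraps the whole per-case coroutine, so finished answers cannot pile up in memory behind slow RAGAS calls. The limits and callables below are illustrative:

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

EVAL_LIMIT = 8
rag_semaphore = asyncio.Semaphore(EVAL_LIMIT * 2)  # stage 1, 2x the eval limit
eval_semaphore = asyncio.Semaphore(EVAL_LIMIT)     # stage 2, RAGAS control


async def run_case(
    question: str,
    generate: Callable[[str], Awaitable[str]],
    evaluate: Callable[[str, str], Awaitable[dict[str, Any]]],
) -> dict[str, Any]:
    # rag_semaphore wraps the full case: at most 2x EVAL_LIMIT answers exist
    # at any moment, so slow evaluations can't cause unbounded memory buildup.
    async with rag_semaphore:
        answer = await generate(question)
        async with eval_semaphore:
            return await evaluate(question, answer)
```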
• Update connection pool limits
• Add tqdm progress bar for eval steps
• Pass progress bar to RAGAS evaluate
• Ensure progress bar cleanup in finally
• Remove redundant output buffer flushes
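The cleanup guarantee from these bullets, in outline; the RAGAS call is stubbed here because the exact hand-off of the bar is specific to the harness:

```python
from collections.abc import Callable
from typing import Any

from tqdm.auto import tqdm


def evaluate_with_progress(
    cases: list[dict[str, Any]], evaluate: Callable[[dict[str, Any]], Any]
) -> list[Any]:
    pbar = tqdm(total=len(cases), desc="eval", leave=False)
    try:
        results = []
        for case in cases:
            results.append(evaluate(case))  # stand-in for the RAGAS evaluate call
            pbar.update(1)
        return results
    finally:
        pbar.close()  # the bar is cleared even when an evaluation raises
```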
- Change default model to gpt-4o-mini
- Add deprecation warning suppression
- Update docs and comments for LightRAG
- Improve output formatting and timing
- Fix RAGAS LLM wrapper compatibility
- Add concurrency control for rate limits
- Add eval env vars for model config
- Improve error handling and logging
- Update documentation with examples
• Load .env from current directory
• Support LIGHTRAG_API_KEY auth header
• Override=False for env precedence
• Add Bearer token to API requests
• Enable per-instance .env configs
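The whole commit fits in a few lines; a sketch assuming python-dotenv and httpx, with the server URL as a placeholder:

```python
import os

import httpx
from dotenv import load_dotenv

# Load .env from the current directory; override=False keeps variables that
# are already exported in the shell, so the environment takes precedence.
load_dotenv(override=False)

headers: dict[str, str] = {}
if api_key := os.getenv("LIGHTRAG_API_KEY"):
    headers["Authorization"] = f"Bearer {api_key}"  # Bearer token for API auth

resp = httpx.get("http://localhost:9621/health", headers=headers)
```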
Add 5 markdown documents that users can index to reproduce evaluation results.
Changes:
- Add sample_documents/ folder with 5 markdown files covering LightRAG features
- Update sample_dataset.json with 3 improved, specific test questions
- Shorten and correct evaluation README (removed outdated info about mock responses)
- Add sample_documents reference with expected ~95% RAGAS score
Test Results with sample documents:
- Average RAGAS Score: 95.28%
- Faithfulness: 100%, Answer Relevance: 96.67%
- Context Recall: 88.89%, Context Precision: 95.56%
Changes:
- Move sys.exit() calls from module level to __init__() method
- Raise proper exceptions (ImportError, ValueError, EnvironmentError) instead of sys.exit()
- Add lazy import for RAGEvaluator in __init__.py using __getattr__ (see the sketch below)
- Update README to clarify sample_dataset.json contains generic test data (not personal)
- Fix README to reflect actual output format (JSON + CSV, not HTML)
- Improve documentation for custom test case creation
Addresses code review feedback about import-time validation and module exports.
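The lazy import uses PEP 562's module-level __getattr__; roughly as follows, with the submodule path assumed from the file names in this repo:

```python
# __init__.py — RAGEvaluator (and the heavy ragas imports behind it) only
# load when the attribute is first accessed, not at package import time.
def __getattr__(name: str):
    if name == "RAGEvaluator":
        from .eval_rag_quality import RAGEvaluator

        return RAGEvaluator
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```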
BREAKING CHANGE: The `content` field in query response references is now
an array of strings instead of a concatenated string. This preserves
individual chunk boundaries when a single file has multiple chunks.
Changes:
- Update QueryResponse Pydantic model to accept List[str] for content (see the sketch below)
- Modify query_text endpoint to return content as list (query_routes.py:425)
- Modify query_text_stream endpoint to support chunk content enrichment
- Update OpenAPI schema and examples to reflect array structure
- Update API README with breaking change notice and migration guide
- Fix RAGAS evaluation to flatten chunk content lists
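The shape of the updated model, approximately; field names other than `content` are illustrative, and clients that previously treated `content` as a string can migrate with a join:

```python
from pydantic import BaseModel


class Reference(BaseModel):
    file_path: str
    # BREAKING: one entry per chunk rather than a single concatenated string,
    # preserving chunk boundaries when one file contributes several chunks.
    content: list[str]


class QueryResponse(BaseModel):
    response: str
    references: list[Reference] = []


# Migration for old clients: "\n".join(ref.content) recovers the old value.
```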
**Critical Fix: Contexts vs Ground Truth**
- RAGAS metrics now evaluate actual retrieval performance
- Previously: Used ground_truth as contexts (always perfect scores)
- Now: Uses retrieved documents from LightRAG API (real evaluation)
**Changes to generate_rag_response (lines 100-156)**:
- Remove unused 'context' parameter
- Change return type: Dict[str, str] → Dict[str, Any]
- Extract contexts as list of strings from references[].text (sketched at the end of this message)
- Return a 'contexts' list instead of the old 'context' JSON-dump string
- Add response.raise_for_status() for better error handling
- Add httpx.HTTPStatusError exception handler
**Changes to evaluate_responses (lines 180-191)**:
- Line 183: Extract retrieved_contexts from rag_response
- Line 190: Use [retrieved_contexts] instead of [[ground_truth]]
- Now correctly evaluates retrieval quality, not ground_truth quality
**Impact on RAGAS Metrics**:
- Context Precision: Now ranks actual retrieved docs by relevance
- Context Recall: Compares ground_truth against actual retrieval
- Faithfulness: Verifies answer based on actual retrieved contexts
- Answer Relevance: Unchanged (question-answer relevance)
Fixes incorrect evaluation methodology. Based on RAGAS documentation:
- contexts = retrieved documents from RAG system
- ground_truth = reference answer for context_recall metric
References:
- https://docs.ragas.io/en/stable/concepts/components/eval_dataset/
- https://docs.ragas.io/en/stable/concepts/metrics/
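Putting the fix together, a condensed sketch of the corrected generate_rag_response; the endpoint path and JSON keys follow the descriptions above, and the client is assumed to be configured with the server's base URL:

```python
from typing import Any

import httpx


async def generate_rag_response(
    question: str, client: httpx.AsyncClient
) -> dict[str, Any]:
    resp = await client.post("/query", json={"query": question})
    resp.raise_for_status()  # turn HTTP errors into httpx.HTTPStatusError
    data = resp.json()
    # contexts = what the system actually retrieved, one string per chunk.
    # Passing ground_truth here instead is what made every score perfect.
    contexts = [ref["text"] for ref in data.get("references", [])]
    return {"answer": data["response"], "contexts": contexts}
```

Each dataset row then pairs these retrieved contexts with the ground_truth answer, so context_recall measures retrieval against the reference instead of against itself.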
**Lint Fixes (ruff)**:
- Sort imports alphabetically (I001)
- Add blank line after import traceback (E302)
- Add trailing comma to dict literals (COM812)
- Reformat writer.writerow for readability (E501)
**Rename test_dataset.json → sample_dataset.json**:
- Avoids .gitignore pattern conflict (test_* is ignored)
- More descriptive name - it's a sample/template, not actual test data
- Updated all references in eval_rag_quality.py and README.md
Resolves lint-and-format CI check failure.
Addresses reviewer feedback about test dataset naming.