Commit graph

11 commits

Author SHA1 Message Date
clssck
abb44eccb1 feat(lightrag): improve entity extraction prompts and rerank chunking
Enhance entity extraction with better structured prompts:
- Reorganize prompt format for improved clarity and consistency
- Add XML-style formatting tags for better LLM parsing
- Include language parameter in keywords extraction cache key
- Fix language parameter usage in keywords_extraction prompt

Improve rerank module with chunking fixes:
- Fix top_n behavior to limit documents instead of chunks
- Add Cohere reranker support with proper chunking
- Improve error handling for rerank API responses

Update operate.py:
- Better entity extraction parsing and validation
- Improved cache key generation for multilingual support
2025-12-12 16:45:14 +01:00
clssck
59e89772de refactor: consolidate to PostgreSQL-only backend and modernize stack
Remove legacy storage implementations and deprecated examples:
- Delete FAISS, JSON, Memgraph, Milvus, MongoDB, Nano Vector DB, Neo4j, NetworkX, Qdrant, Redis storage backends
- Remove Kubernetes deployment manifests and installation scripts
- Delete unofficial examples for deprecated backends and offline deployment docs
Streamline core infrastructure:
- Consolidate storage layer to PostgreSQL-only implementation
- Add full-text search caching with FTS cache module
- Implement metrics collection and monitoring pipeline
- Add explain and metrics API routes
Modernize frontend and tooling:
- Switch web UI to Bun with bun.lock, remove npm and pnpm lockfiles
- Update Dockerfile for PostgreSQL-only deployment
- Add Makefile for common development tasks
- Update environment and configuration examples
Enhance evaluation and testing capabilities:
- Add prompt optimization with DSPy and auto-tuning
- Implement ground truth regeneration and variant testing
- Add prompt debugging and response comparison utilities
- Expand test coverage with new integration scenarios
Simplify dependencies and configuration:
- Remove offline-specific requirement files
- Update pyproject.toml with streamlined dependencies
- Add Python version pinning with .python-version
- Create project guidelines in CLAUDE.md and AGENTS.md
2025-12-12 16:28:49 +01:00
clssck
da9070ecf7 refactor: remove legacy storage implementations and k8s deployment
Remove deprecated storage backends and Kubernetes deployment configuration:
- Delete unused storage implementations: FAISS, JSON, Memgraph, Milvus, MongoDB, Nano Vector DB, Neo4j, NetworkX, Qdrant, Redis
- Remove Kubernetes deployment manifests and installation scripts
- Delete legacy examples for deprecated backends
- Consolidate to PostgreSQL-only storage backend
Streamline dependencies and add new capabilities:
- Remove deprecated code documentation and migration guides
- Add full-text search caching layer with FTS cache module
- Implement metrics collection and monitoring pipeline
- Add explain and metrics API routes
- Simplify configuration with PostgreSQL-focused setup
Update documentation and configuration:
- Rewrite README to focus on supported features
- Update environment and configuration examples
- Remove Kubernetes-specific documentation
- Add new utility scripts for PDF uploads and pipeline monitoring
2025-12-09 14:02:00 +01:00
clssck
69358d830d test(lightrag,examples,api): comprehensive ruff formatting and type hints
Format entire codebase with ruff and add type hints across all modules:
- Apply ruff formatting to all Python files (121 files, 17K insertions)
- Add type hints to function signatures throughout lightrag core and API
- Update test suite with improved type annotations and docstrings
- Add pyrightconfig.json for static type checking configuration
- Create prompt_optimized.py and test_extraction_prompt_ab.py test files
- Update ruff.toml and .gitignore for improved linting configuration
- Standardize code style across examples, reproduce scripts, and utilities
2025-12-05 15:17:06 +01:00
clssck
9bae6267f6
chore: sync with upstream (#4)
* chore: sync with upstream

- Cohere rerank improvements
- Content deduplication
- Dependency updates

* fix: address CodeRabbit review feedback

- Harden env parsing for RERANK_MAX_TOKENS_PER_DOC with try/except
- Add @pytest.mark.offline to test_overlap_validation
- Remove unused doc_indices variable
2025-12-03 13:16:28 +01:00
yangdx
bf43e1b8c1 fix: Resolve default rerank config problem when env var missing
- Read config from selected_rerank_func when env var missing
- Make api_key optional for rerank function
- Add response format validation with proper error handling
- Update Cohere rerank default to official API endpoint
2025-08-23 01:07:59 +08:00
yangdx
580cb7906c feat: Add multiple rerank provider support to LightRAG Server by adding new env vars and cli params
- Add --enable-rerank CLI argument and ENABLE_RERANK env var
- Simplify rerank configuration logic to only check enable flag and binding
- Update health endpoint to show enable_rerank and rerank_configured status
- Improve logging messages for rerank enable/disable states
- Maintain backward compatibility with default value True
2025-08-22 19:29:45 +08:00
yangdx
cb3bf3291c Fix: rename rerank parameter from top_k to top_n
The change aligns with the API parameter naming used by Jina and Cohere rerank services, ensuring consistency and clarity.
2025-07-20 00:26:27 +08:00
zrguo
9a9f0f2463 Update rerank_example & readme 2025-07-15 12:17:27 +08:00
zrguo
f5c80d7cde Simplify Configuration 2025-07-08 11:16:34 +08:00
zrguo
75dd4f3498 add rerank model 2025-07-07 22:44:59 +08:00