Fix logging output in evaluation test harness and examples:
- Replace print() statements with logger calls in e2e_test_harness.py (pattern sketched below)
- Update copy_llm_cache_to_another_storage.py to use logger instead of print
- Remove redundant logging configuration in copy_llm_cache_to_another_storage.py
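The print() replacements in both files follow the standard logging pattern; a minimal sketch (the function, message, and logger name are illustrative, not the actual harness code):

```python
import logging

logger = logging.getLogger(__name__)

def report_copied(copied: int) -> None:
    # before: print(f"Copied {copied} cache entries")
    # after: level-aware output through the module logger
    logger.info("Copied %d cache entries", copied)
```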
Fix path handling and typos:
- Correct makedirs() call in lightrag_openai_demo.py to create log_dir directly (see the sketch after this list)
- Update constants.py comments to clarify SOURCE_IDS_LIMIT_METHOD options
- Remove duplicate return statement in utils.py normalize_extracted_info()
- Fix error string formatting in chroma_impl.py with !s conversion (sketched below)
- Remove unused pipmaster import from chroma_impl.py
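The makedirs and !s fixes take this shape; a minimal sketch (the path, message, and exception type are illustrative):

```python
import os

log_dir = "./logs"  # illustrative path

# Create log_dir itself, rather than a path derived from it:
os.makedirs(log_dir, exist_ok=True)

# The !s conversion renders the exception via str() inside the f-string:
try:
    raise RuntimeError("connection refused")  # stand-in for a ChromaDB error
except RuntimeError as exc:
    message = f"ChromaDB operation failed: {exc!s}"
```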
Add comprehensive test suites for prompt evaluation:
- test_prompt_accuracy.py: 365 lines testing prompt extraction accuracy
- test_prompt_quality_deep.py: 672 lines for deep quality analysis
- Refactor prompt.py to consolidate optimized variants (removed prompt_optimized.py)
- Apply ruff formatting and type hints across 30 files
- Update pyrightconfig.json for static type checking
- Modernize reproduce scripts and examples with improved type annotations
- Sync uv.lock dependencies
Format entire codebase with ruff and add type hints across all modules:
- Apply ruff formatting to all Python files (121 files, 17K insertions)
- Add type hints to function signatures throughout lightrag core and API (pattern sketched below)
- Update test suite with improved type annotations and docstrings
- Add pyrightconfig.json for static type checking configuration
- Create prompt_optimized.py and the test_extraction_prompt_ab.py A/B test file
- Update ruff.toml and .gitignore for improved linting configuration
- Standardize code style across examples, reproduce scripts, and utilities
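The type-hint additions follow the usual annotate-in-place pattern; a before/after sketch (the function is illustrative, not taken from the codebase):

```python
# before (unannotated):
# def chunk_text(text, size=1200):
#     return [text[i : i + size] for i in range(0, len(text), size)]

# after: explicit parameter and return annotations that pyright can check
def chunk_text(text: str, size: int = 1200) -> list[str]:
    return [text[i : i + size] for i in range(0, len(text), size)]
```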
This commit renames the parameter 'llm_model_max_token_size' to 'summary_max_tokens' for clarity, as it specifically controls the token limit for entity and relation summaries.
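A before/after sketch of the rename (the working_dir and the token value are illustrative):

```python
from lightrag import LightRAG

# before: the old name read like a general LLM context-window limit
# rag = LightRAG(working_dir="./rag_storage", llm_model_max_token_size=32000)

# after: the new name states what it bounds, the entity/relation summary budget
rag = LightRAG(working_dir="./rag_storage", summary_max_tokens=32000)
```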
Refactored the LLM cache to a flat key-value (KV) structure, replacing the previous nested format. The old structure used the cache 'mode' as the outer key and stored the cache content as JSON nested under it. The flat layout significantly improves cache retrieval efficiency.
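A sketch of the two layouts (the hash keys and value fields are illustrative):

```python
# Old nested layout: mode is the outer key and entries are JSON nested
# under it, so a lookup first has to fetch and parse the whole mode bucket.
old_cache = {
    "default": {
        "md5-of-prompt": {"return": "cached answer", "original_prompt": "..."},
    },
}

# New flat KV layout: mode is folded into the key, so each entry is a
# single KV get/set with no nested structure to traverse.
new_cache = {
    "default:md5-of-prompt": {"return": "cached answer", "original_prompt": "..."},
}
```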
This pull request demonstrates how to create a new Opik project when using LiteLLM for LlamaIndex-based LLM calls. The primary goal is to enable detailed tracing, monitoring, and logging of LLM interactions under a new Opik `project_name`, particularly when using LiteLLM as an API proxy. This enhancement allows for better debugging, performance analysis, and observability when using LightRAG with LiteLLM and Opik.
**Motivation:**
As our application's reliance on Large Language Models (LLMs) grows, robust observability becomes crucial for maintaining system health, optimizing performance, and understanding usage patterns. Integrating Opik provides the following key benefits:
1. **Improved Debugging:** Enables end-to-end tracing of requests through the LlamaIndex and LiteLLM layers, making it easier to identify and resolve issues or performance bottlenecks.
2. **Comprehensive Performance Monitoring:** Allows for the collection of vital metrics such as LLM call latency, token usage, and error rates. This data can be filtered and analyzed within Opik using project names and tags.
3. **Effective Cost Management:** Facilitates tracking of token consumption associated with specific requests or projects, leading to better cost control and optimization.
4. **Deeper Usage Insights:** Provides a clearer understanding of how different components of the application or various projects are utilizing LLM capabilities.
These changes empower developers to seamlessly add observability to their LlamaIndex-based LLM workflows, especially when leveraging LiteLLM, by passing necessary Opik metadata.
**Changes Made:**
1. **`lightrag/llm/llama_index_impl.py`:**
* Modified the `llama_index_complete_if_cache` function:
* The catch-all `**kwargs` handling has been refined: a dedicated `chat_kwargs={}` parameter now passes keyword arguments directly to the `model.achat()` method. This ensures that vendor-specific parameters, such as LiteLLM's `litellm_params` carrying Opik metadata, are correctly propagated (see the sketch after this list).
* The logic for retrieving `llm_instance` from `kwargs` was removed as `model` is now a direct parameter, simplifying the function.
* Updated the `llama_index_complete` function:
* Ensured that `**kwargs` (which may include `chat_kwargs` or other parameters intended for `llama_index_complete_if_cache`) are correctly passed down.
2. **`examples/unofficial-sample/lightrag_llamaindex_litellm_demo.py`:**
* This existing demo file was updated to align with the changes in `llama_index_impl.py`.
* The `llm_model_func` now passes an empty `chat_kwargs={}` by default to `llama_index_complete_if_cache` if no specific chat arguments are needed, maintaining compatibility with the updated function signature. This file serves as a baseline example without Opik integration.
3. **`examples/unofficial-sample/lightrag_llamaindex_litellm_opik_demo.py` (New File):**
* A new example script has been added to specifically demonstrate the integration of LightRAG with LlamaIndex, LiteLLM, and Opik for observability.
* The `llm_model_func` in this demo showcases how to construct the `chat_kwargs` dictionary.
* It includes `litellm_params` with a `metadata` field for Opik, containing `project_name` and `tags`, giving a clear example of how to send observability data to Opik (see the sketch after this list).
* The call to `llama_index_complete_if_cache` within `llm_model_func` passes these `chat_kwargs`, ensuring Opik metadata is included in the LiteLLM request.
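To make items 1 and 3 concrete, here is a minimal sketch under stated assumptions: the real upstream signatures carry more parameters (and the caching logic is elided), the Opik metadata layout follows the description above, and the `project_name`/`tags` values are placeholders:

```python
from llama_index.core.llms import ChatMessage

# Item 1: `model` is a direct parameter and `chat_kwargs` is forwarded
# verbatim to model.achat(), so vendor-specific options such as LiteLLM's
# `litellm_params` propagate unchanged. Caching is elided from this sketch.
async def llama_index_complete_if_cache(
    model, prompt, system_prompt=None, history_messages=None, chat_kwargs=None
):
    messages = []
    if system_prompt:
        messages.append(ChatMessage(role="system", content=system_prompt))
    for m in history_messages or []:
        messages.append(ChatMessage(role=m["role"], content=m["content"]))
    messages.append(ChatMessage(role="user", content=prompt))
    response = await model.achat(messages, **(chat_kwargs or {}))
    return response.message.content

llm_instance = None  # replace with a LiteLLM-backed LlamaIndex LLM

# Item 3: the demo's llm_model_func builds chat_kwargs so that Opik metadata
# rides inside litellm_params and LiteLLM forwards it for tracing.
async def llm_model_func(prompt, system_prompt=None, history_messages=None, **kwargs):
    chat_kwargs = {
        "litellm_params": {
            "metadata": {
                "project_name": "lightrag-opik-demo",  # placeholder project name
                "tags": ["lightrag", "litellm"],  # placeholder tags
            }
        }
    }
    return await llama_index_complete_if_cache(
        llm_instance,
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages or [],
        chat_kwargs=chat_kwargs,
    )
```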
These modifications provide a more robust and extensible way to pass parameters to the underlying LLM calls, specifically enabling the integration of observability tools like Opik.
Co-authored-by: Martin Perez-Guevara <8766915+MartinPerez@users.noreply.github.com>
Co-authored-by: Young Jin Kim <157011356+jidodata-ykim@users.noreply.github.com>