128 commits

Author SHA1 Message Date
clssck
abb44eccb1 feat(lightrag): improve entity extraction prompts and rerank chunking
Enhance entity extraction with better structured prompts:
- Reorganize prompt format for improved clarity and consistency
- Add XML-style formatting tags for better LLM parsing
- Include language parameter in keywords extraction cache key
- Fix language parameter usage in keywords_extraction prompt

Improve rerank module with chunking fixes:
- Fix top_n behavior to limit documents instead of chunks
- Add Cohere reranker support with proper chunking
- Improve error handling for rerank API responses

Update operate.py:
- Better entity extraction parsing and validation
- Improved cache key generation for multilingual support
2025-12-12 16:45:14 +01:00
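To illustrate the `top_n` fix described above (limiting documents rather than chunks), here is a minimal sketch. It assumes each reranked chunk carries the index of its parent document and that a document's score is the best score among its chunks; the function name `top_documents` and the max-score aggregation are illustrative assumptions, not taken from the repository.

```python
from collections import defaultdict


def top_documents(chunk_scores, top_n):
    """Return the top_n (doc_index, score) pairs.

    chunk_scores: iterable of (doc_index, score), one entry per reranked chunk.
    Assumption: a document is scored by its best-scoring chunk.
    """
    best = defaultdict(lambda: float("-inf"))
    for doc_index, score in chunk_scores:
        best[doc_index] = max(best[doc_index], score)
    # Rank documents, not chunks, so top_n bounds the number of documents.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


scores = [(0, 0.2), (0, 0.9), (1, 0.5), (2, 0.7), (2, 0.1)]
print(top_documents(scores, top_n=2))  # [(0, 0.9), (2, 0.7)]
```

Truncating the chunk list directly could return several chunks from one document and fewer distinct documents than requested, which is the behavior the commit describes fixing.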
clssck
59e89772de refactor: consolidate to PostgreSQL-only backend and modernize stack
Remove legacy storage implementations and deprecated examples:
- Delete FAISS, JSON, Memgraph, Milvus, MongoDB, Nano Vector DB, Neo4j, NetworkX, Qdrant, Redis storage backends
- Remove Kubernetes deployment manifests and installation scripts
- Delete unofficial examples for deprecated backends and offline deployment docs
Streamline core infrastructure:
- Consolidate storage layer to PostgreSQL-only implementation
- Add full-text search caching with FTS cache module
- Implement metrics collection and monitoring pipeline
- Add explain and metrics API routes
Modernize frontend and tooling:
- Switch web UI to Bun with bun.lock, remove npm and pnpm lockfiles
- Update Dockerfile for PostgreSQL-only deployment
- Add Makefile for common development tasks
- Update environment and configuration examples
Enhance evaluation and testing capabilities:
- Add prompt optimization with DSPy and auto-tuning
- Implement ground truth regeneration and variant testing
- Add prompt debugging and response comparison utilities
- Expand test coverage with new integration scenarios
Simplify dependencies and configuration:
- Remove offline-specific requirement files
- Update pyproject.toml with streamlined dependencies
- Add Python version pinning with .python-version
- Create project guidelines in CLAUDE.md and AGENTS.md
2025-12-12 16:28:49 +01:00
clssck
da9070ecf7 refactor: remove legacy storage implementations and k8s deployment
Remove deprecated storage backends and Kubernetes deployment configuration:
- Delete unused storage implementations: FAISS, JSON, Memgraph, Milvus, MongoDB, Nano Vector DB, Neo4j, NetworkX, Qdrant, Redis
- Remove Kubernetes deployment manifests and installation scripts
- Delete legacy examples for deprecated backends
- Consolidate to PostgreSQL-only storage backend
Streamline dependencies and add new capabilities:
- Remove deprecated code documentation and migration guides
- Add full-text search caching layer with FTS cache module
- Implement metrics collection and monitoring pipeline
- Add explain and metrics API routes
- Simplify configuration with PostgreSQL-focused setup
Update documentation and configuration:
- Rewrite README to focus on supported features
- Update environment and configuration examples
- Remove Kubernetes-specific documentation
- Add new utility scripts for PDF uploads and pipeline monitoring
2025-12-09 14:02:00 +01:00
clssck
dd1413f3eb test(lightrag,examples): add prompt accuracy and quality tests
Add comprehensive test suites for prompt evaluation:
- test_prompt_accuracy.py: 365 lines testing prompt extraction accuracy
- test_prompt_quality_deep.py: 672 lines for deep quality analysis
- Refactor prompt.py to consolidate optimized variants (removed prompt_optimized.py)
- Apply ruff formatting and type hints across 30 files
- Update pyrightconfig.json for static type checking
- Modernize reproduce scripts and examples with improved type annotations
- Sync uv.lock dependencies
2025-12-05 16:39:52 +01:00
clssck
69358d830d test(lightrag,examples,api): comprehensive ruff formatting and type hints
Format entire codebase with ruff and add type hints across all modules:
- Apply ruff formatting to all Python files (121 files, 17K insertions)
- Add type hints to function signatures throughout lightrag core and API
- Update test suite with improved type annotations and docstrings
- Add pyrightconfig.json for static type checking configuration
- Create prompt_optimized.py and test_extraction_prompt_ab.py test files
- Update ruff.toml and .gitignore for improved linting configuration
- Standardize code style across examples, reproduce scripts, and utilities
2025-12-05 15:17:06 +01:00
clssck
1bdd906753 chore(lightrag): remove legacy prompts and clean up prompt.py
Remove unused LLM-generated citation prompts that were kept for backward
compatibility but never referenced in codebase. Consolidate duplicate
instructions in entity summarization prompt and fix minor typos.

- Remove rag_response_with_llm_citations prompt (dead code)
- Remove naive_rag_response_with_llm_citations prompt (dead code)
- Remove unused cite_ready_* backward compatibility aliases
- Consolidate duplicate context/objectivity instructions in summarize prompt
- Fix typo in example (extra parenthesis)
- Clarify delimiter documentation comment
2025-12-01 21:02:44 +01:00
clssck
663ada943a chore: add citation system and enhance RAG UI components
Add citation tracking and display system across backend and frontend components.
Backend changes include citation.py for document attribution, enhanced query routes
with citation metadata, improved prompt templates, and PostgreSQL schema updates.
Frontend includes CitationMarker component, HoverCard UI, QuerySettings refinements,
and ChatMessage enhancements for displaying document sources. Update dependencies
and docker-compose test configuration for improved development workflow.
2025-12-01 17:50:00 +01:00
clssck
d2c9e6e2ec test(lightrag): add orphan connection feature with quality validation tests
Implement automatic orphan entity connection system that identifies entities with
no relationships and creates meaningful connections via vector similarity + LLM
validation. This improves knowledge graph connectivity and retrieval quality.
Changes:
- Add orphan connection configuration parameters (thresholds, cross-connect settings)
- Implement aconnect_orphan_entities() method with 4-step validation pipeline
- Add SQL templates for efficient orphan and candidate entity queries
- Create POST /graph/orphans/connect API endpoint with configurable parameters
- Add orphan connection validation prompt for LLM-based relationship verification
- Include relationship density requirement in extraction prompts to prevent orphans
- Update docker-compose.test.yml with optimized extraction parameters
- Add quality validation test suite (run_quality_tests.py) for retrieval evaluation
- Add unit test framework (test_orphan_connection_quality.py) with test cases
- Enable auto-run of orphan connection after document processing
2025-11-28 18:23:30 +01:00
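The orphan connection idea (vector similarity followed by LLM validation) can be sketched roughly as below. This is not the repository's `aconnect_orphan_entities()` implementation: the function `connect_orphans`, the in-memory embeddings, the `threshold` default, and the `validate` callable standing in for the LLM check are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def connect_orphans(orphans, candidates, validate, threshold=0.8):
    """Propose edges for entities that have no relationships.

    orphans, candidates: dict mapping entity name -> embedding vector.
    validate: callable(orphan_name, candidate_name) -> bool, a stand-in
              for the LLM-based relationship check (assumption).
    """
    edges = []
    for orphan_name, orphan_vec in orphans.items():
        for cand_name, cand_vec in candidates.items():
            if cand_name == orphan_name:
                continue
            # Cheap similarity filter first, expensive validation second.
            if cosine(orphan_vec, cand_vec) >= threshold and validate(orphan_name, cand_name):
                edges.append((orphan_name, cand_name))
    return edges


orphans = {"Widget": [1.0, 0.0]}
candidates = {"Gadget": [0.9, 0.1], "Banana": [0.0, 1.0]}
print(connect_orphans(orphans, candidates, validate=lambda a, b: True))
```

Running the similarity filter before validation keeps the number of LLM calls proportional to plausible candidates rather than to all entity pairs.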
Daniel.y
d392db7b4a Fix typo in 'equipment' in prompt.py 2025-10-22 11:13:22 +08:00
yangdx
6bf6f43d96 Remove bold formatting from instruction headers in prompts 2025-10-02 00:58:03 +08:00
yangdx
bb6138e748 fix(prompt): Clarify reference section restrictions in prompt templates 2025-10-01 22:35:26 +08:00
yangdx
37e8898cf6 Simplify reference formatting in LLM context generation
- Remove extra newlines in reference lists
- Change code block type from text to generic
2025-10-01 22:20:58 +08:00
yangdx
f83cde14df fix(prompt): Improve markdown formatting requirements and reference style 2025-10-01 21:41:12 +08:00
yangdx
0fd0186414 Improve prompt clarity by standardizing terminology and formatting
• Replace "Source Data" with "Context"
• Add bold formatting for key sections
• Clarify reference_id usage
• Improve JSON/text block formatting
• Standardize data source naming
2025-09-28 13:31:55 +08:00
yangdx
cbdc4c4bdf Refactor prompts and context building for better maintainability
- Extract context templates to PROMPTS
- Unify token calculation logic
- Simplify user_prompt formatting
- Reduce code duplication
- Improve prompt structure consistency
2025-09-26 12:39:06 +08:00
yangdx
fba2356c81 Move user_prompt to system prompt
- Refactor query prompt handling to separate user prompts in system context
- Simplify user_query to only contain query
- Apply changes to both kg_query and naive_query
2025-09-26 10:02:01 +08:00
yangdx
058ce83dba Clarify citation format and fix typo 2025-09-25 20:08:55 +08:00
yangdx
41a6da6786 Remove inline citation instructions from prompt templates
- Remove footnote syntax guidelines
- Delete inline citation examples
- Keep references section format
- Simplify citation documentation
- Update example section titles
2025-09-25 03:46:30 +08:00
yangdx
14bbafa146 Improve inline citation format and add examples to prompts
- Clarify single caret rule for citations
- Add citation format examples
- Rename to "References Section Format"
- Improve multi-citation instructions
2025-09-25 03:26:50 +08:00
yangdx
6177878812 Add inline citation format with footnote syntax to prompts
- Add footnote syntax `[^1]` for citations
- Support multiple citations `[^1,2,3]`
- Update reference section examples
- Enforce caret symbol requirement
- Match reference_id in brackets
2025-09-25 02:51:12 +08:00
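For readers unfamiliar with the footnote syntax named in this commit, the following sketch shows one way to pull reference ids out of text that uses `[^1]` and `[^1,2,3]`. The regex and the helper name `extract_reference_ids` are assumptions for illustration; the commit itself only changes prompt wording.

```python
import re

# Matches [^1] and [^1,2,3]; the caret is required, so plain [4] is ignored.
CITATION = re.compile(r"\[\^(\d+(?:\s*,\s*\d+)*)\]")


def extract_reference_ids(text):
    """Return the distinct cited reference ids in order of first appearance."""
    seen = []
    for match in CITATION.finditer(text):
        for part in match.group(1).split(","):
            ref_id = int(part)  # int() tolerates surrounding whitespace
            if ref_id not in seen:
                seen.append(ref_id)
    return seen


answer = "A fact[^1]. Another claim[^1,2,3]. No caret here [4]."
print(extract_reference_ids(answer))  # [1, 2, 3]
```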
yangdx
f610bd5d21 Update citation format to use bullet points and add examples
- Change citation format to `* [n]`
- Add reference section examples
- Apply to both prompt templates
- Improve formatting consistency
2025-09-24 21:59:21 +08:00
yangdx
e9503ee6ae Merge branch 'patch-1' into citation-optimization 2025-09-24 18:23:29 +08:00
yangdx
ac26f3a2f2 Refactor citation format from file paths to numbered document titles
• Change citation format to [n] style
• Reduce max citations from 6 to 5
• Add reference tracking instructions
• Simplify citation merge logic
• Remove inline citation requirements
2025-09-24 14:30:53 +08:00
SASon
b3cc0127d9 Fix typo in output language instruction 2025-09-24 13:22:35 +09:00
SASon
746d4c576d Fix typo in output language instruction
from Oputput to Output
2025-09-24 13:17:37 +09:00
yangdx
5fa92cbf99 Improve citation quality and reduce reference limits in prompts
- Reduce max citations from 8 to 6
- Require direct fact referencing
- Clarify relevance prioritization
2025-09-22 10:53:03 +08:00
yangdx
8826d2f892 Optimize prompt instruction for citation format 2025-09-22 01:04:57 +08:00
yangdx
2f06f851c3 Enhance citation format with merged references and clearer guidelines
- Increase max references from 5 to 8
- Merge citations by file_path
- Remove inline citations from body
- Add reference section examples
- Update citation prefixes (KG→EN, RE)
2025-09-21 22:48:48 +08:00
yangdx
f88c2fbdff Refactor citation format instructions for clarity and consistency 2025-09-21 15:51:31 +08:00
yangdx
8f0fb3c9eb Include user query in prompt returns 2025-09-21 15:24:20 +08:00
yangdx
6eb37e270a Refactor query handling and improve RAG response prompts
- Move user_prompt to query concatenation
- Remove DEFAULT_USER_PROMPT constant
- Enhance prompt clarity and structure
- Standardize citation formatting
- Improve step-by-step instructions
2025-09-21 15:16:24 +08:00
yangdx
f69c5dfd9a Add language control and format clarity to extraction prompts 2025-09-14 18:26:41 +08:00
yangdx
6e37460964 Improve entity extraction prompt clarity and ensure the LLM outputs content only 2025-09-14 17:50:56 +08:00
yangdx
4de1473875 Improve entity extraction prompts and error message formatting
• Fix typo in error log message
• Clarify format requirements in prompts
• Make extraction instructions clearer
• Improve user prompt consistency
2025-09-14 13:45:59 +08:00
yangdx
fd48afdb00 Use "relation" instead of "relationship" in extraction prompt, and support both formats for safety 2025-09-14 11:43:35 +08:00
yangdx
d993464a92 Restructure entity extraction prompt with clearer formatting and examples
* Improved instruction clarity
* Added better formatting structure
* Enhanced delimiter usage rules
* Clarified relationship handling
* Better third-person guidelines
2025-09-14 02:30:32 +08:00
yangdx
2686fc526e Change entity type from CreativeWork to Content and update delimiter
• Replace CreativeWork with Content type
• Improve LLM output error messages
• Update prompt for binary relationships
• Fix delimiter corruption examples
2025-09-14 00:55:15 +08:00
yangdx
4a5ab5121d Change delimiter from <|S|> to <|#|> and clarify formatting rules 2025-09-13 22:58:56 +08:00
yangdx
bf423a4ce1 Clarify output structure in prompt instructions by adding field count specifications 2025-09-13 09:51:33 +08:00
yangdx
369f799b16 Refine entity extraction prompts for clarity and consistency
• Clarify tuple delimiter usage
• Soften proper noun translation rules
• Standardize language requirements
• Improve output format consistency
2025-09-13 08:14:46 +08:00
yangdx
0221213b9b Improve entity summarization with JSONL format and fix tuple delimiters
• Convert descriptions to JSONL format
• Add token-based truncation helper
• Enhance entity name consistency rules
• Improve summarization prompt clarity
• Fix tuple delimiter corruption patterns
2025-09-12 12:32:08 +08:00
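A rough sketch of the two ideas in this commit, JSONL-formatted descriptions and token-based truncation, is shown below. The helper name `truncate_jsonl`, the `"description"` key, and the whitespace token counter are assumptions made for the example; a real tokenizer would be passed in through `count_tokens`.

```python
import json


def truncate_jsonl(descriptions, max_tokens, count_tokens=lambda s: len(s.split())):
    """Serialize descriptions as JSONL, stopping before the token budget is exceeded.

    count_tokens defaults to a whitespace split, which is only a stand-in
    for a real tokenizer (assumption).
    """
    lines, used = [], 0
    for description in descriptions:
        line = json.dumps({"description": description}, ensure_ascii=False)
        cost = count_tokens(line)
        if used + cost > max_tokens:
            break  # keep whole records only; never cut a JSON line in half
        lines.append(line)
        used += cost
    return "\n".join(lines)


descriptions = ["Alice founded Acme", "Acme builds rockets", "Acme is in Berlin"]
print(truncate_jsonl(descriptions, max_tokens=8))
```

Truncating at record boundaries keeps every emitted line valid JSON, which a character-level cut would not guarantee.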
yangdx
1892ed23cc Change tuple delimiter from <|SEP|> to <|S|> across codebase
• Update prompt instruction clarity
• Correct utility function examples
• Update regex pattern comments
2025-09-12 08:57:46 +08:00
yangdx
b96f1484ec Shorten tuple delimiter to <|S|> and refine relationship extraction text
• Remove redundant "within input text"
• Clarify relationship extraction scope
2025-09-12 08:36:43 +08:00
yangdx
40688def20 Refactor tuple delimiter corruption fix into reusable utility function
- Extract regex fixes to utils module
- Add case-insensitive delimiter handling
2025-09-12 04:10:14 +08:00
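As an illustration of what a reusable, case-insensitive delimiter fix might look like, the sketch below normalizes a few corrupted forms back to `<|SEP|>`, the tuple delimiter in use at this point in the history. The function name `fix_tuple_delimiter` and the specific corruption patterns handled (wrong case, missing pipes, missing angle brackets) are assumptions; the actual patterns in the utils module are not shown in this log.

```python
import re

CANONICAL = "<|SEP|>"

# Hypothetical corruption patterns:
#   <|sep|>, <SEP>, <|SEP>, <SEP|>  (wrong case or missing pipes)
#   |SEP|                           (missing angle brackets)
CORRUPTED = re.compile(r"<\|?sep\|?>|\|sep\|", re.IGNORECASE)


def fix_tuple_delimiter(record):
    """Rewrite recognized delimiter variants to the canonical form."""
    return CORRUPTED.sub(CANONICAL, record)


raw = "entity<|sep|>Alice<SEP>person|SEP|A researcher"
print(fix_tuple_delimiter(raw))
# entity<|SEP|>Alice<|SEP|>person<|SEP|>A researcher
```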
yangdx
7f83a58497 Refactor extraction delimiters from ## to newlines and change tuple delimiter to <|SEP|>
• Add robust delimiter fixing logic
• Update prompts for single-line format
2025-09-11 13:44:44 +08:00
yangdx
02e7462645 feat: enhance LLM output format tolerance for bracket processing
- Expand bracket tolerance to support additional characters: < > " '
- Implement symmetric handling for both leading and trailing characters
- Replace simple string matching with robust regex-based pattern detection
- Maintain full backward compatibility with existing bracket formats
2025-09-10 18:10:06 +08:00
yangdx
50fddeebbf fix: Remove conversation history from prompt template
- Delete history section from prompt
- Simplify user query response format
- Remove {history} placeholder variable
2025-09-10 12:07:34 +08:00
yangdx
2dd143c935 Refactor conversation history handling to use LLM native message format
• Remove get_conversation_turns utility
• Pass history_messages to LLM directly
• Clean up prompt template formatting
2025-09-10 11:56:58 +08:00
yangdx
06db511f3b Remove angle brackets from entity and relationship output formats 2025-09-09 09:21:23 +08:00
yangdx
d218f15a62 Refactor entity extraction with system prompts and output limits
- Add system/user prompt separation
- Set max tokens to fix endless output
- Improve extraction error logging
- Update cache type from extract to summary
2025-09-08 15:20:45 +08:00