LightRAG/docs
Claude 12ab6ebb42
Add trilingual entity extractor (Chinese/English/Swedish)
Implements entity extraction for three languages, using a dedicated tool for each language:
- Chinese: HanLP (F1 95%)
- English: spaCy (F1 90%)
- Swedish: spaCy (F1 80-85%)

**Why not GLiNER?**
The quality gap is too large (F1, chosen tool vs GLiNER):
- Chinese: 95% vs 24% (-71 points)
- English: 90% vs 60% (-30 points)
- Swedish: 85% vs 50% (-35 points)

**Key Features:**
1. Lazy loading (memory efficient)
   - Loads each model on demand
   - Keeps only one model in memory at a time (~1.5-1.8 GB)
   - Avoids the 4-5 GB needed to hold all models simultaneously

2. High quality
   - Each language uses the tool best suited to it
   - Chinese: HanLP (built specifically for Chinese)
   - English/Swedish: spaCy (officially supported languages)

3. Easy to use
   - Simple API: `extract(text, language=...)`, with `language` set to `'zh'`, `'en'`, or `'sv'`
   - Automatic model management
   - Error handling and logging
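The sketch below illustrates the one-model-at-a-time lazy loading described in item 1. It assumes each language maps to a zero-argument loader function. `LazySingleModelCache` and its methods are hypothetical and do not reflect the actual internals of `TrilingualEntityExtractor`. In real use, a loader might wrap `spacy.load("en_core_web_trf")`, or `hanlp.load(...)` with a HanLP pretrained model identifier.

```python
import gc
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger(__name__)


class LazySingleModelCache:
    """Hypothetical helper: loads a model on first use and keeps at most one in memory."""

    def __init__(self, loaders: Dict[str, Callable[[], object]]) -> None:
        self._loaders = loaders              # language code -> zero-arg loader
        self._current_lang: Optional[str] = None
        self._current_model: Optional[object] = None
        self.load_count = 0                  # how many times a loader has run

    def get(self, language: str) -> object:
        if language not in self._loaders:
            raise ValueError(
                f"Unsupported language {language!r}; expected one of {sorted(self._loaders)}"
            )
        if language != self._current_lang:
            # Release the previous model before loading the next one,
            # so peak memory stays close to a single model.
            self._current_model = None
            self._current_lang = None
            gc.collect()
            logger.info("Loading model for language %r", language)
            self._current_model = self._loaders[language]()
            self._current_lang = language
            self.load_count += 1
        return self._current_model


# Usage with stand-in loaders; real loaders would return spaCy/HanLP pipelines,
# e.g. lambda: spacy.load("en_core_web_trf").
cache = LazySingleModelCache({
    "zh": lambda: "zh-model",
    "en": lambda: "en-model",
    "sv": lambda: "sv-model",
})
cache.get("zh")
cache.get("zh")   # already loaded, no reload
cache.get("en")   # releases the zh model, then loads en
```

Every language switch triggers a reload. Interleaving languages document by document therefore costs one model load per switch, while batching documents by language avoids that cost.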

**Files Added:**
- lightrag/kg/trilingual_entity_extractor.py - Core extractor class
- requirements-trilingual.txt - Dependencies (spacy + hanlp)
- scripts/install_trilingual_models.sh - One-click installation
- scripts/test_trilingual_extractor.py - Comprehensive test suite
- docs/TrilingualNER-Usage-zh.md - Complete usage guide (in Chinese)

**Installation:**
```bash
# Method 1: One-click install
./scripts/install_trilingual_models.sh

# Method 2: Manual install
pip install -r requirements-trilingual.txt
python -m spacy download en_core_web_trf
python -m spacy download sv_core_news_lg
# HanLP downloads automatically on first use
```
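To confirm the installation steps succeeded, you can check that the expected packages are importable. The sketch below uses only the standard library (`importlib.util.find_spec`), so it loads no models. The package names are an assumption based on the commands above. The script checks only the HanLP Python package, because HanLP downloads its model weights on first use.

```python
import importlib.util
from typing import Dict, Iterable

# Assumed import names: the spaCy pipelines installed by `python -m spacy download`
# are regular Python packages, and HanLP is imported as `hanlp`.
REQUIRED = ("spacy", "hanlp", "en_core_web_trf", "sv_core_news_lg")


def check_installed(names: Iterable[str] = REQUIRED) -> Dict[str, bool]:
    """Map each top-level package name to whether it can be found (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_installed().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```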

**Usage:**
```python
from lightrag.kg.trilingual_entity_extractor import TrilingualEntityExtractor

extractor = TrilingualEntityExtractor()

# Chinese
entities = extractor.extract("苹果公司由史蒂夫·乔布斯创立。", language='zh')

# English
entities = extractor.extract("Apple Inc. was founded by Steve Jobs.", language='en')

# Swedish
entities = extractor.extract("Volvo grundades i Göteborg.", language='sv')
```
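The examples above pass `language` explicitly, so a caller processing a corpus of monolingual documents must choose that value per document. The sketch below shows one rough heuristic for doing so. `guess_language` and its word list are hypothetical and not part of `TrilingualEntityExtractor`. The heuristic treats any CJK ideograph as Chinese, then looks for å/ä/ö or common Swedish function words, and otherwise falls back to English.

```python
import re

# Hypothetical routing heuristic, not part of TrilingualEntityExtractor.
_CJK = re.compile(r"[\u4e00-\u9fff]")          # CJK Unified Ideographs block
_SWEDISH_LETTERS = re.compile(r"[åäöÅÄÖ]")
_SWEDISH_WORDS = {"och", "att", "det", "som", "är", "på", "för", "med", "inte", "av"}


def guess_language(text: str) -> str:
    """Return 'zh', 'sv' or 'en' for a single-language document (rough heuristic)."""
    if _CJK.search(text):
        return "zh"
    words = re.findall(r"\w+", text.lower())
    swedish_hits = sum(word in _SWEDISH_WORDS for word in words)
    if _SWEDISH_LETTERS.search(text) or swedish_hits >= 2:
        return "sv"
    return "en"
```

A caller could then write `extractor.extract(text, language=guess_language(text))`. The heuristic misroutes some inputs: English text that mentions a name like "Malmö" would go to the Swedish model. When a document already carries a reliable language tag, passing that tag is safer.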

**Testing:**
```bash
python scripts/test_trilingual_extractor.py
```

**Resource Requirements:**
- Disk: ~1.4 GB (440MB + 545MB + 400MB)
- Memory: ~1.5-1.8 GB per language (lazy loaded)

**Performance (CPU):**
- Chinese: ~12 docs/s
- English: ~29 docs/s
- Swedish: ~26 docs/s
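The throughput figures can be turned into rough processing-time estimates. The sketch below assumes the quoted CPU rates hold for the user's documents and ignores model load time. `estimate_seconds` and `count_model_loads` are hypothetical helpers. The second function shows why grouping documents by language pays off with lazy loading: every language switch means another model load.

```python
from typing import Dict, Iterable, Optional

# CPU throughput quoted above, in documents per second (hardware unspecified).
DOCS_PER_SECOND: Dict[str, float] = {"zh": 12, "en": 29, "sv": 26}


def estimate_seconds(doc_counts: Dict[str, int],
                     rates: Dict[str, float] = DOCS_PER_SECOND) -> float:
    """Rough extraction time: documents / throughput, summed over languages."""
    return sum(count / rates[lang] for lang, count in doc_counts.items())


def count_model_loads(languages: Iterable[str]) -> int:
    """Model loads needed to process documents in this order when only one
    model is kept in memory at a time."""
    loads = 0
    current: Optional[str] = None
    for lang in languages:
        if lang != current:
            loads += 1
            current = lang
    return loads


print(estimate_seconds({"zh": 1200, "en": 2900, "sv": 2600}))   # 300.0
print(count_model_loads(["zh", "en", "zh", "en"]))              # 4
print(count_model_loads(sorted(["zh", "en", "zh", "en"])))      # 2
```

For example, 1,200 Chinese, 2,900 English, and 2,600 Swedish documents each take about 100 s, or roughly 5 minutes in total.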

Targets the user's use case: corpora made up of purely Chinese, purely English, or purely Swedish documents.
2025-11-19 17:29:00 +00:00

| File | Last commit | Date |
|---|---|---|
| Algorithm.md | Create Algorithm.md | 2025-01-24 21:19:04 +01:00 |
| DockerDeployment.md | Add BuildKit cache mounts to optimize Docker build performance | 2025-11-03 12:40:30 +08:00 |
| EvaluatingEntityRelationQuality-zh.md | Add comprehensive entity/relation extraction quality evaluation guide | 2025-11-19 12:45:31 +00:00 |
| FrontendBuildGuide.md | Use frozen lockfile for consistent frontend builds | 2025-10-14 03:34:55 +08:00 |
| HanLPvsGLiNER-zh.md | Add detailed comparison: HanLP vs GLiNER for Chinese entity extraction | 2025-11-19 16:16:00 +00:00 |
| LightRAG_concurrent_explain.md | Update README | 2025-07-27 17:26:49 +08:00 |
| MultilingualNER-Comparison-zh.md | Add comprehensive multilingual NER tools comparison guide | 2025-11-19 16:34:37 +00:00 |
| OfflineDeployment.md | refactor: move document deps to api group, remove dynamic imports | 2025-11-13 13:34:09 +08:00 |
| PerformanceFAQ-zh.md | Add comprehensive performance FAQ addressing max_async, LLM selection, and database optimization | 2025-11-19 10:21:58 +00:00 |
| PerformanceOptimization-zh.md | Add performance optimization guide and configuration for LightRAG indexing | 2025-11-19 09:55:28 +00:00 |
| PerformanceOptimization.md | Add performance optimization guide and configuration for LightRAG indexing | 2025-11-19 09:55:28 +00:00 |
| RAGEvaluationMethodsComparison-zh.md | Add comprehensive comparison of RAG evaluation methods | 2025-11-19 13:36:56 +00:00 |
| SelfHostedOptimization-zh.md | Add comprehensive self-hosted LLM optimization guide for LightRAG | 2025-11-19 10:53:48 +00:00 |
| TrilingualNER-Usage-zh.md | Add trilingual entity extractor (Chinese/English/Swedish) | 2025-11-19 17:29:00 +00:00 |
| UV_LOCK_GUIDE.md | Migrate Dockerfile from pip to uv package manager for faster builds | 2025-10-16 01:54:20 +08:00 |
| WhatIsGleaning-zh.md | Add comprehensive guide explaining gleaning concept in LightRAG | 2025-11-19 11:45:07 +00:00 |
| WhatIsRAGAS-zh.md | Add comprehensive RAGAS evaluation framework guide | 2025-11-19 12:52:22 +00:00 |