LightRAG/logs/2025-12-04-11-50-fix-tenant-stats-document-count.md
2025-12-05 14:31:13 +08:00

118 lines
3.9 KiB
Markdown

# Session Log: Fix Tenant Statistics - Document Count Issue
## Date: 2025-12-04
## Summary
Fixed the tenant card statistics to correctly display document count and storage usage. The issue was that documents were stored in `lightrag_doc_full` table (with workspace column) instead of the `documents` table.
## Problem
The tenant selection cards were showing "0 Docs" for TechStart tenant even though there was 1 document in the Main KB. Additionally, storage_used_gb was always 0.
## Root Cause Analysis
The SQL query in `list_tenants` was looking for documents in the `documents` table, which was empty. The actual documents are stored in the `lightrag_doc_full` table with a `workspace` column in format `{tenant_id}:{kb_id}`.
Database investigation showed:
- `documents` table: Empty (0 rows)
- `lightrag_doc_full` table: Contains actual documents with workspace field
- Example: workspace = `techstart:kb-main` for TechStart tenant's Main KB
## Solution
Updated the SQL query to:
1. Count documents from `lightrag_doc_full` table
2. Extract tenant_id from the workspace column using `SPLIT_PART(workspace, ':', 1)`
3. Calculate storage from content length instead of file_size
4. Join this data with tenant and KB statistics
### Updated SQL Query
```sql
SELECT
t.tenant_id,
t.name,
t.description,
t.created_at,
t.updated_at,
COALESCE(kb_stats.kb_count, 0) as kb_count,
COALESCE(doc_stats.doc_count, 0) as total_documents,
COALESCE(doc_stats.total_size_bytes, 0) as total_size_bytes
FROM tenants t
LEFT JOIN (
SELECT tenant_id, COUNT(*) as kb_count
FROM knowledge_bases
GROUP BY tenant_id
) kb_stats ON t.tenant_id = kb_stats.tenant_id
LEFT JOIN (
SELECT
SPLIT_PART(workspace, ':', 1) as tenant_id,
COUNT(*) as doc_count,
COALESCE(SUM(LENGTH(content)), 0) as total_size_bytes
FROM lightrag_doc_full
GROUP BY SPLIT_PART(workspace, ':', 1)
) doc_stats ON t.tenant_id = doc_stats.tenant_id
ORDER BY t.created_at DESC
```
## Results
Before Fix:
```json
{
"tenant_id": "techstart",
"num_knowledge_bases": 2,
"num_documents": 0,
"storage_used_gb": 0.0
}
```
After Fix:
```json
{
"tenant_id": "techstart",
"num_knowledge_bases": 2,
"num_documents": 1,
"storage_used_gb": 0.00010807719081640244
}
```
## Verification
✅ Tenant cards now show correct document count:
- TechStart Inc: "1 Doc" (was "0 Docs")
- Acme Corporation: "0 Docs" (correct, no documents)
✅ Storage usage now calculated from actual content:
- TechStart Inc: 0.00010807719081640244 GB (was 0.0)
✅ KB count remains correct:
- Both tenants: "2 KBs"
## Code Changes
**File: `/lightrag/services/tenant_service.py` (lines 612-644)**
- Updated SQL query to use `lightrag_doc_full` table
- Used `SPLIT_PART(workspace, ':', 1)` to extract tenant_id
- Changed storage calculation to use `SUM(LENGTH(content))`
## Files Modified
1. `/lightrag/services/tenant_service.py` - Updated list_tenants SQL query
## Testing
- ✅ Restarted backend server
- ✅ Verified API returns correct stats via curl
- ✅ Verified tenant selection cards display correct values
- ✅ Document count matches actual data in database
---
## Task Logs
### Actions:
- Investigated database schema to find actual document storage location
- Updated SQL query to count from lightrag_doc_full instead of documents table
- Extracted tenant_id from workspace column using SPLIT_PART
- Restarted backend and verified stats are now correct
### Decisions:
- Used LENGTH(content) for storage calculation since file_size was not available
- Used SPLIT_PART to parse workspace field rather than a JOIN on a computed column
### Insights:
- LightRAG uses two document storage modes: workspace mode (lightrag_doc_full) and tenant mode (documents)
- The system was using workspace mode which stores workspace as {tenant_id}:{kb_id}
- Statistics need to aggregate data across both storage modes or correctly identify which one is in use