# Session Log: Fix Tenant Statistics - Document Count Issue

## Date: 2025-12-04

## Summary

Fixed the tenant card statistics so they display the correct document count and storage usage. The statistics query read from the `documents` table, but documents are actually stored in the `lightrag_doc_full` table, which identifies the owning tenant through a `workspace` column.

## Problem

The tenant selection cards showed "0 Docs" for the TechStart tenant even though its Main KB contained 1 document. In addition, `storage_used_gb` was always 0.

## Root Cause Analysis

The SQL query in `list_tenants` counted documents in the `documents` table, which was empty. The actual documents are stored in the `lightrag_doc_full` table, whose `workspace` column uses the format `{tenant_id}:{kb_id}`.

Database investigation showed:

- `documents` table: empty (0 rows)
- `lightrag_doc_full` table: contains the actual documents, each with a `workspace` field
- Example: `workspace = techstart:kb-main` for the TechStart tenant's Main KB

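For reference, the `{tenant_id}:{kb_id}` format can be split in application code the same way the query splits it. This is a minimal sketch, not code from `tenant_service.py`; the `Workspace` type and `parse_workspace` helper are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple


class Workspace(NamedTuple):
    """Hypothetical container for the two parts of a workspace value."""
    tenant_id: str
    kb_id: str


def parse_workspace(workspace: str) -> Workspace:
    # Split at the first ':' only. When there is no ':' at all, tenant_id is
    # the whole string and kb_id is empty, which matches what
    # SPLIT_PART(workspace, ':', 1) returns for the tenant part.
    tenant_id, _, kb_id = workspace.partition(":")
    return Workspace(tenant_id, kb_id)


parsed = parse_workspace("techstart:kb-main")
print(parsed.tenant_id, parsed.kb_id)
```

Splitting at the first separator keeps the tenant part stable even if a KB identifier ever contained a colon; whether that can happen is not covered by this log.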
## Solution

Updated the SQL query to:

1. Count documents from the `lightrag_doc_full` table
2. Extract `tenant_id` from the `workspace` column using `SPLIT_PART(workspace, ':', 1)`
3. Calculate storage from content length instead of `file_size`
4. Join this data with tenant and KB statistics

### Updated SQL Query

```sql
SELECT
    t.tenant_id,
    t.name,
    t.description,
    t.created_at,
    t.updated_at,
    COALESCE(kb_stats.kb_count, 0) as kb_count,
    COALESCE(doc_stats.doc_count, 0) as total_documents,
    COALESCE(doc_stats.total_size_bytes, 0) as total_size_bytes
FROM tenants t
-- Knowledge bases per tenant
LEFT JOIN (
    SELECT tenant_id, COUNT(*) as kb_count
    FROM knowledge_bases
    GROUP BY tenant_id
) kb_stats ON t.tenant_id = kb_stats.tenant_id
-- Documents per tenant; tenant_id is the part of workspace before the first ':'
LEFT JOIN (
    SELECT
        SPLIT_PART(workspace, ':', 1) as tenant_id,
        COUNT(*) as doc_count,
        COALESCE(SUM(LENGTH(content)), 0) as total_size_bytes
    FROM lightrag_doc_full
    GROUP BY SPLIT_PART(workspace, ':', 1)
) doc_stats ON t.tenant_id = doc_stats.tenant_id
ORDER BY t.created_at DESC
```

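The join logic can be tried in isolation with an in-memory SQLite database. This is a sketch under stated assumptions, not the production setup: the schema is trimmed to the columns the query needs, the `kb-backup` name, the `created_at` values, and the sample content are made up, and because SQLite has no `SPLIT_PART`, the tenant part is extracted with `substr`/`instr` instead (which assumes every `workspace` value contains a `:`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tenants (tenant_id TEXT PRIMARY KEY, name TEXT, created_at TEXT);
    CREATE TABLE knowledge_bases (kb_id TEXT, tenant_id TEXT);
    CREATE TABLE lightrag_doc_full (id TEXT, workspace TEXT, content TEXT);

    INSERT INTO tenants VALUES
        ('techstart', 'TechStart Inc', '2025-01-02'),
        ('acme', 'Acme Corporation', '2025-01-01');
    -- Two KBs per tenant; 'kb-backup' is a made-up name for the demo.
    INSERT INTO knowledge_bases VALUES
        ('kb-main', 'techstart'), ('kb-backup', 'techstart'),
        ('kb-main', 'acme'), ('kb-backup', 'acme');
    -- One document, stored under the TechStart tenant's Main KB.
    INSERT INTO lightrag_doc_full VALUES
        ('doc-1', 'techstart:kb-main', 'hello world');
""")

QUERY = """
SELECT
    t.tenant_id,
    COALESCE(kb_stats.kb_count, 0) AS kb_count,
    COALESCE(doc_stats.doc_count, 0) AS total_documents,
    COALESCE(doc_stats.total_size, 0) AS total_size
FROM tenants t
LEFT JOIN (
    SELECT tenant_id, COUNT(*) AS kb_count
    FROM knowledge_bases
    GROUP BY tenant_id
) kb_stats ON t.tenant_id = kb_stats.tenant_id
LEFT JOIN (
    -- SQLite stand-in for SPLIT_PART(workspace, ':', 1)
    SELECT
        substr(workspace, 1, instr(workspace, ':') - 1) AS tenant_id,
        COUNT(*) AS doc_count,
        SUM(LENGTH(content)) AS total_size
    FROM lightrag_doc_full
    GROUP BY substr(workspace, 1, instr(workspace, ':') - 1)
) doc_stats ON t.tenant_id = doc_stats.tenant_id
ORDER BY t.created_at DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The `LEFT JOIN` plus `COALESCE` combination is what lets a tenant with no rows in `lightrag_doc_full` still appear in the result with a count of 0 instead of being dropped.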
## Results

Before Fix:

```json
{
  "tenant_id": "techstart",
  "num_knowledge_bases": 2,
  "num_documents": 0,
  "storage_used_gb": 0.0
}
```

After Fix:

```json
{
  "tenant_id": "techstart",
  "num_knowledge_bases": 2,
  "num_documents": 1,
  "storage_used_gb": 0.00010807719081640244
}
```

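As a sanity check on the "After Fix" value: 0.00010807719081640244 is what 116,047 divided by 1024³ gives, which suggests the service converts `total_size_bytes` using binary gigabytes. The conversion code itself is not shown in this log, so the helper below is an inference from the numbers, with a hypothetical name.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes, inferred from the reported value


def bytes_to_gb(total_size_bytes: int) -> float:
    """Convert a raw size into the unit reported as storage_used_gb."""
    return total_size_bytes / BYTES_PER_GB


reported = 0.00010807719081640244
computed = bytes_to_gb(116_047)
print(computed, abs(computed - reported) < 1e-18)
```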
## Verification

✅ Tenant cards now show correct document count:

- TechStart Inc: "1 Doc" (was "0 Docs")
- Acme Corporation: "0 Docs" (correct, no documents)

✅ Storage usage now calculated from actual content:

- TechStart Inc: 0.00010807719081640244 GB (was 0.0)

✅ KB count remains correct:

- Both tenants: "2 KBs"

## Code Changes

**File: `/lightrag/services/tenant_service.py` (lines 612-644)**

- Updated SQL query to use the `lightrag_doc_full` table
- Used `SPLIT_PART(workspace, ':', 1)` to extract `tenant_id`
- Changed storage calculation to use `SUM(LENGTH(content))`

## Files Modified

1. `/lightrag/services/tenant_service.py` - Updated `list_tenants` SQL query

## Testing

- ✅ Restarted backend server
- ✅ Verified API returns correct stats via curl
- ✅ Verified tenant selection cards display correct values
- ✅ Document count matches actual data in database

---

## Task Logs

### Actions:

- Investigated database schema to find actual document storage location
- Updated SQL query to count from `lightrag_doc_full` instead of the `documents` table
- Extracted `tenant_id` from the `workspace` column using `SPLIT_PART`
- Restarted backend and verified stats are now correct

### Decisions:

- Used `LENGTH(content)` for storage calculation since `file_size` was not available
- Used `SPLIT_PART` to parse the `workspace` field rather than a JOIN on a computed column

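One caveat on the first decision: in PostgreSQL, `LENGTH` on a `text` value counts characters, while `OCTET_LENGTH` counts bytes. If `content` is a `text` column (an assumption; the column type is not stated in this log), `total_size_bytes` matches the byte size only for ASCII content. The sketch below shows the same distinction in Python; both helper names are hypothetical.

```python
def size_in_characters(content: str) -> int:
    # Analogous to LENGTH(content) on a PostgreSQL text value.
    return len(content)


def size_in_utf8_bytes(content: str) -> int:
    # Analogous to OCTET_LENGTH(content) when the database encoding is UTF-8.
    return len(content.encode("utf-8"))


sample = "naïve café"
print(size_in_characters(sample), size_in_utf8_bytes(sample))
```

For mostly ASCII documents the two figures are close, so the choice mainly matters for content with many multi-byte characters.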
### Insights:

- LightRAG uses two document storage modes: workspace mode (`lightrag_doc_full`) and tenant mode (`documents`)
- The system was running in workspace mode, which stores `workspace` as `{tenant_id}:{kb_id}`
- Statistics need to either aggregate data across both storage modes or correctly identify which one is in use
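If aggregating across both modes is the route taken, the per-tenant counts could be merged roughly as follows. This is only a sketch: `merge_doc_counts` is a hypothetical helper, it assumes rows from the `documents` table expose a `tenant_id` (not confirmed in this log), and it assumes a document never appears in both tables, since otherwise it would be counted twice.

```python
from collections import Counter
from typing import Dict, Iterable


def merge_doc_counts(
    workspaces: Iterable[str],         # workspace values from lightrag_doc_full
    tenant_mode_ids: Iterable[str],    # assumed tenant_id values from documents
) -> Dict[str, int]:
    counts: Counter = Counter()
    for workspace in workspaces:
        # Tenant part is everything before the first ':'.
        counts[workspace.partition(":")[0]] += 1
    for tenant_id in tenant_mode_ids:
        counts[tenant_id] += 1
    return dict(counts)


print(merge_doc_counts(["techstart:kb-main"], []))
```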