# Task Log: Document Processing Debug Session

**Date:** 2025-12-05 12:45
**Mode:** Beastmode
**Topic:** Investigation - Document stuck in "Processing" state

---

## Summary

Investigated and resolved a document stuck in the "Processing" state for the TechStart Inc tenant.

---

## Actions Performed

1. **Verified server health** - Server running on port 9621, PostgreSQL connected, multi-tenant enabled
2. **Identified tenant issue** - UI was using a cached `tenant_id: default`, which doesn't exist in PostgreSQL
3. **Switched to valid tenant** - Searched for and selected "TechStart Inc" with "Main KB"
4. **Identified stuck document** - `doc-408153a6090f3deeeea5a56df844fef8` ("Can AI Really Check Its Own Math Homework?")
5. **Found root cause in logs** - LLM extraction timed out after 360s at 03:29:09
6. **Deleted stuck document** - Used the UI to delete the orphaned document
7. **Verified resolution** - Processing count dropped from 1 to 0

---

## Decisions Made

- The document was most likely orphaned by a server crash or restart during processing
- The document status was never updated to "Failed" after the timeout exception
- Chosen fix: delete and re-upload the document rather than repair its state manually
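
The second point above can be guarded against in code. The following is a minimal sketch, not LightRAG's actual implementation: `process_document`, the `doc_status` dict (a stand-in for the PostgreSQL doc_status storage), and the two extractor functions are hypothetical names. It uses `try`/`finally` so that a timeout or any other exception always leaves a terminal status. It cannot help when the process is killed outright, so a separate recovery path for stuck documents is still needed.

```python
import asyncio

# Hypothetical in-memory stand-in for the PostgreSQL doc_status storage.
doc_status = {}


async def process_document(doc_id, extract, timeout_s):
    """Run `extract(doc_id)` and always record a terminal status."""
    doc_status[doc_id] = "Processing"
    final_status = "Failed"  # assume failure until extraction succeeds
    try:
        await asyncio.wait_for(extract(doc_id), timeout=timeout_s)
        final_status = "Completed"
    except asyncio.TimeoutError:
        pass  # timeout: final_status stays "Failed"
    finally:
        # Runs on success, on timeout, and when any other exception propagates.
        doc_status[doc_id] = final_status
    return final_status


async def slow_extract(doc_id):
    await asyncio.sleep(1.0)  # simulates an LLM call that exceeds the timeout


async def fast_extract(doc_id):
    await asyncio.sleep(0)  # simulates an extraction that finishes in time


if __name__ == "__main__":
    print(asyncio.run(process_document("doc-slow", slow_extract, 0.05)))  # Failed
    print(asyncio.run(process_document("doc-fast", fast_extract, 0.05)))  # Completed
```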

---

## Root Cause Analysis

```
2025-12-05 03:29:09 - Failed to extract entities and relationships:
C[1/1]: chunk-408153a6090f3deeeea5a56df844fef8: LLM func: Worker execution timeout after 360s
```

The document started processing at 00:53:00 and failed at 03:29:09, about 2 h 36 min later, with a timeout. The exception-handling code should have marked the document as "Failed", but the server was likely restarted or crashed during error handling, so the status stayed at "Processing".

---

## Technical Details

### Affected Components
- `lightrag/lightrag.py` - Entity extraction with timeout
- `lightrag/api/routers/document_routes.py` - Document management endpoints
- PostgreSQL doc_status storage

### Files Verified (from previous session)
- `lightrag/services/tenant_service.py` - Fixed datetime deserialization
- `lightrag_webui/src/features/DocumentManager.tsx` - Pipeline status sync

---

## Next Steps

1. Consider adding a "stale document cleanup" job that marks documents stuck in "Processing" for more than 1 hour as "Failed"
2. Add a UI button to manually reset a document's status to "Pending" for retry
3. Improve error handling in `_process_extract_entities` so that the status is always updated
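
Step 1 could look roughly like the sketch below. It is an illustration under stated assumptions, not existing LightRAG code: the record layout (`id`, `status`, `updated_at`), the function name `mark_stale_as_failed`, and the in-memory list are all hypothetical, and a real job would read and update the PostgreSQL doc_status storage instead. The one-hour threshold is the one proposed in step 1.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=1)  # threshold proposed in step 1


def mark_stale_as_failed(docs, now):
    """Mark documents stuck in "Processing" longer than STALE_AFTER as "Failed".

    `docs` is a list of dicts with hypothetical keys: id, status, updated_at.
    Returns the ids of the documents that were changed.
    """
    changed = []
    for doc in docs:
        age = now - doc["updated_at"]
        if doc["status"] == "Processing" and age > STALE_AFTER:
            doc["status"] = "Failed"
            doc["error"] = "stale: no status update for more than 1 hour"
            changed.append(doc["id"])
    return changed


if __name__ == "__main__":
    now = datetime(2025, 12, 5, 12, 45)
    docs = [
        {"id": "doc-stuck", "status": "Processing", "updated_at": datetime(2025, 12, 5, 0, 53)},
        {"id": "doc-recent", "status": "Processing", "updated_at": datetime(2025, 12, 5, 12, 30)},
        {"id": "doc-done", "status": "Completed", "updated_at": datetime(2025, 12, 4, 9, 0)},
    ]
    print(mark_stale_as_failed(docs, now))  # ['doc-stuck']
```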

---

## Lessons Learned

- Document state can become inconsistent if the server crashes during processing
- A cached "default" tenant in localStorage can cause 500 errors when it doesn't exist in PostgreSQL
- Always verify the tenant/KB selection before debugging document issues
- LLM extraction can time out (360s default) for complex documents
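
The second lesson suggests validating a cached tenant ID before using it. A minimal sketch of that check follows. `resolve_tenant` and the tenant IDs are hypothetical, and a real check would compare against the tenants stored in PostgreSQL. The Web UI itself is TypeScript; Python is used here only to illustrate the logic.

```python
def resolve_tenant(cached_id, known_ids):
    """Return cached_id if it names an existing tenant, else None.

    Returning None signals the caller to clear the cached value and ask the
    user to pick a tenant instead of sending requests that will fail.
    """
    if cached_id is not None and cached_id in known_ids:
        return cached_id
    return None


if __name__ == "__main__":
    known = ["techstart-inc", "acme-corp"]  # hypothetical tenant IDs
    print(resolve_tenant("default", known))        # None
    print(resolve_tenant("techstart-inc", known))  # techstart-inc
```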

---

## Verification Steps

```bash
# Health check
curl -s "http://localhost:9621/health" | jq '.status, .pipeline_busy, .multi_tenant_enabled'
# Result: "healthy", false, true

# Document status after fix
# All (1), Completed (1), Processing (0), Failed (0)
```

---

## Session End

- Document stuck state: **RESOLVED** ✅
- Application functional: **YES** ✅
- No pending processing: **VERIFIED** ✅