Task Log: Document Processing Debug Session
Date: 2025-12-05 12:45
Mode: Beastmode
Topic: Investigation - Document stuck in "Processing" state
Summary
Successfully investigated and resolved a document stuck in "Processing" state for TechStart Inc tenant.
Actions Performed
- Verified server health - Server running on port 9621, PostgreSQL connected, multi-tenant enabled
- Identified tenant issue - UI was using a cached tenant_id of "default", which doesn't exist in PostgreSQL
- Switched to valid tenant - Searched for and selected "TechStart Inc" with "Main KB"
- Identified stuck document - doc-408153a6090f3deeeea5a56df844fef8 ("Can AI Really Check Its Own Math Homework?")
- Found root cause in logs - LLM extraction timeout after 360s at 03:29:09
- Deleted stuck document - Used UI to delete the orphaned document
- Verified resolution - Processing count dropped from 1 to 0
Decisions Made
- Document was orphaned due to server crash/restart during processing
- The document status was never updated to "Failed" after timeout exception
- Best solution: delete and re-upload rather than fixing state manually
Root Cause Analysis
2025-12-05 03:29:09 - Failed to extract entities and relationships:
C[1/1]: chunk-408153a6090f3deeeea5a56df844fef8: LLM func: Worker execution timeout after 360s
The document started processing at 00:53:00 and failed at 03:29:09 with a timeout. The exception handler should have marked the document as "Failed", but the server was likely restarted or crashed mid-handling, leaving the status stuck at "Processing".
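The failure window can be illustrated with a minimal sketch (hypothetical function names; the real extraction path lives in lightrag/lightrag.py). The "Failed" write sits inside the except-branch, so it only happens if the process survives long enough to run it:

```python
import asyncio

# Simplified, hypothetical sketch of the fragile pattern seen here.
async def process_document(doc_id, extract, update_status, timeout=360):
    await update_status(doc_id, "Processing")
    try:
        # LLM entity extraction, bounded by the 360s worker timeout
        await asyncio.wait_for(extract(doc_id), timeout=timeout)
        await update_status(doc_id, "Completed")
    except asyncio.TimeoutError:
        # If the server crashes or restarts right here, the "Failed"
        # write never runs and the document is orphaned in "Processing".
        await update_status(doc_id, "Failed")
```

When the except-branch completes normally the document ends up "Failed"; the orphaned state observed in this session corresponds to the process dying between the timeout and the status write.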
Technical Details
Affected Components
- lightrag/lightrag.py - Entity extraction with timeout
- lightrag/api/routers/document_routes.py - Document management endpoints
- PostgreSQL doc_status storage
Files Verified (from previous session)
- lightrag/services/tenant_service.py - Fixed datetime deserialization
- lightrag_webui/src/features/DocumentManager.tsx - Pipeline status sync
Next Steps
- Consider adding a "stale document cleanup" job that marks documents stuck in "Processing" for >1 hour as "Failed"
- Add UI button to manually reset document status to "Pending" for retry
- Improve error handling in _process_extract_entities to ensure status is always updated
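The proposed stale-document cleanup could be sketched roughly as below. The row shape and column names are assumptions for illustration; the real data lives in the PostgreSQL doc_status storage, and update_status would issue the actual UPDATE:

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=1)

def find_stale(docs, now, stale_after=STALE_AFTER):
    """Return ids of documents stuck in "Processing" longer than stale_after.

    docs: iterable of (doc_id, status, started_at) tuples -- a stand-in
    for rows from the doc_status table (assumed shape, not the real schema).
    """
    return [
        doc_id
        for doc_id, status, started_at in docs
        if status == "Processing" and now - started_at > stale_after
    ]

def mark_stale_failed(docs, now, update_status):
    # In the real job, update_status would persist "Failed" to PostgreSQL.
    for doc_id in find_stale(docs, now):
        update_status(doc_id, "Failed")
```

Run periodically (e.g. from a scheduler), this would have caught the document from this session, which sat in "Processing" for roughly nine hours after the 03:29:09 timeout.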
Lessons Learned
- Document state can become inconsistent if server crashes during processing
- The "default" tenant in localStorage can cause 500 errors when it doesn't exist in PostgreSQL
- Always verify tenant/KB selection before debugging document issues
- LLM extraction can timeout (360s default) for complex documents
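The stale-tenant lesson suggests a small pre-flight guard before any tenant-scoped call. This is a hypothetical helper, not existing code (the real fix would live in the webui's tenant-selection logic): validate the cached id against the tenants the server actually knows, instead of trusting localStorage:

```python
# Hypothetical guard against a stale cached tenant id.
def resolve_tenant(cached_id, valid_ids):
    """Return cached_id if the server still knows it, else None.

    valid_ids: tenant ids fetched from the server. Sending a cached id
    that is missing from this set is exactly what produced the 500
    errors in this session ("default" no longer existed in PostgreSQL).
    """
    return cached_id if cached_id in valid_ids else None
```

A None result would prompt the UI to clear the cached value and re-run tenant selection rather than issuing doomed requests.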
Verification Steps
# Health check
curl -s "http://localhost:9621/health" | jq '.status, .pipeline_busy, .multi_tenant_enabled'
# Result: "healthy", false, true
# Document status after fix
# All (1), Completed (1), Processing (0), Failed (0)
Session End
- Document stuck state: RESOLVED ✅
- Application functional: YES ✅
- No pending processing: VERIFIED ✅