ragflow/rag
Yongteng Lei 24ca4cc6b7
Refa: GraphRAG and explaining GraphRAG stalling behavior on large files (#8223)
### What problem does this PR solve?

This PR investigates the cause of #7957.

TL;DR: An incorrect similarity calculation produced far too many candidates.
Because every candidate must be checked through an interaction with the LLM,
the excess candidates caused significant delays.
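
To see why the similarity rule dominates runtime, consider the sketch below. It is a hypothetical illustration, not RAGFlow's code: `strict_is_similar`, `loose_is_similar`, and `candidate_pairs` are names introduced here, and the edit-distance threshold (half the length of the shorter name) is an assumption. Every pair that passes the similarity check becomes a candidate for the LLM, so a loose rule can push the candidate count toward n(n-1)/2 pairs for n entities.

```python
from itertools import combinations
from typing import Callable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            # deletion, insertion, substitution (free when characters match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def strict_is_similar(a: str, b: str) -> bool:
    # Hypothetical rule: ignoring case, names are similar if they differ by
    # at most half the length of the shorter name.
    a, b = a.lower(), b.lower()
    return edit_distance(a, b) <= min(len(a), len(b)) // 2


def loose_is_similar(a: str, b: str) -> bool:
    # Deliberately loose rule for contrast: sharing any two distinct
    # characters is enough, so almost every pair of names qualifies.
    return len(set(a) & set(b)) > 1


def candidate_pairs(names: list[str],
                    similar: Callable[[str, str], bool]) -> list[tuple[str, str]]:
    # Each returned pair later costs LLM work, so the pair count drives runtime.
    return [(a, b) for a, b in combinations(names, 2) if similar(a, b)]


names = ["Ministry of Finance", "ministry of finance", "Central Bank"]
print(candidate_pairs(names, strict_is_similar))
# → [('Ministry of Finance', 'ministry of finance')]
print(len(candidate_pairs(names, loose_is_similar)))
# → 3
```

With three names the loose rule already flags every possible pair; with thousands of extracted entities that difference becomes the gap between tens of thousands and hundreds of thousands of candidates.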

What this PR does:

1. **Fix similarity calculation**:
When processing a 64-page government document, the corrected similarity
calculation reduces the number of candidates from over 100,000 to around
16,000. With the default batch size of 100 pairs per LLM call, this cuts
the number of LLM calls from over 1,000 to around 160, a reduction of more
than 6x.
2. **Add concurrency and timeout limits**:
Up to 5 entity types are processed in "parallel", each with a 180-second
timeout. These limits may become configurable in future updates.
3. **Improve logging**:
The candidate resolution process now reports progress in real time.
4. **Mitigate potential concurrency risks**
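
The batching, concurrency cap, per-type timeout, and progress reporting described in items 1 to 3 can be sketched with Python's `asyncio`. This is a minimal illustration under stated assumptions, not RAGFlow's implementation: `judge_batch`, `resolve_type`, and `resolve_all` are hypothetical names, the LLM call is stubbed out with a case-insensitive string comparison, and a timed-out entity type is assumed to be skipped (returning `None`) rather than retried.

```python
import asyncio
import logging
import math
from typing import Optional

BATCH_SIZE = 100          # pairs per LLM call (the default stated above)
MAX_CONCURRENT_TYPES = 5  # entity types resolved at the same time
TYPE_TIMEOUT_S = 180      # per-entity-type time budget, in seconds

Pair = tuple[str, str]


def llm_calls_needed(num_pairs: int, batch_size: int = BATCH_SIZE) -> int:
    """Number of LLM calls when candidate pairs are sent in fixed-size batches."""
    return math.ceil(num_pairs / batch_size)


async def judge_batch(batch: list[Pair]) -> list[bool]:
    """Hypothetical stand-in for one LLM call that judges a batch of pairs."""
    await asyncio.sleep(0)
    return [a.lower() == b.lower() for a, b in batch]


async def resolve_type(entity_type: str, pairs: list[Pair]) -> list[bool]:
    verdicts: list[bool] = []
    for start in range(0, len(pairs), BATCH_SIZE):
        verdicts.extend(await judge_batch(pairs[start:start + BATCH_SIZE]))
        # Report progress after every batch instead of only at the end.
        logging.info("%s: %d/%d pairs resolved", entity_type, len(verdicts), len(pairs))
    return verdicts


async def resolve_all(pairs_by_type: dict[str, list[Pair]]) -> dict[str, Optional[list[bool]]]:
    sem = asyncio.Semaphore(MAX_CONCURRENT_TYPES)

    async def guarded(entity_type: str, pairs: list[Pair]) -> Optional[list[bool]]:
        async with sem:  # at most MAX_CONCURRENT_TYPES types run at once
            try:
                # The timeout clock starts only once this type holds a slot.
                return await asyncio.wait_for(resolve_type(entity_type, pairs), TYPE_TIMEOUT_S)
            except asyncio.TimeoutError:
                logging.warning("%s: timed out after %ss, skipping", entity_type, TYPE_TIMEOUT_S)
                return None

    types = list(pairs_by_type)
    results = await asyncio.gather(*(guarded(t, pairs_by_type[t]) for t in types))
    return dict(zip(types, results))


print(llm_calls_needed(16_000), llm_calls_needed(100_000))  # → 160 1000
out = asyncio.run(resolve_all({"ORGANIZATION": [("UN", "un"), ("UN", "EU")], "PERSON": []}))
print(out)  # → {'ORGANIZATION': [True, False], 'PERSON': []}
```

Acquiring the semaphore before starting `wait_for` means each entity type gets its full 180 seconds for its own work, rather than spending part of its budget waiting for a free slot.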


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2025-06-12 19:09:50 +08:00
| Name | Last commit | Date |
| --- | --- | --- |
| `app` | Display only the duplicate column names and corresponding original source. (#8138) | 2025-06-10 10:16:38 +08:00 |
| `llm` | Refa: make exception more clear. (#8224) | 2025-06-12 17:53:59 +08:00 |
| `nlp` | Fix: order chunks from docx by positions. (#7979) | 2025-05-30 17:20:53 +08:00 |
| `res` | Update synonym dictionary file (#7997) | 2025-06-03 09:41:53 +08:00 |
| `svr` | Refa: GraphRAG and explaining GraphRAG stalling behavior on large files (#8223) | 2025-06-12 19:09:50 +08:00 |
| `utils` | Oss support opendal(including mysql) (#8204) | 2025-06-12 11:37:42 +08:00 |
| `__init__.py` | Update comments (#4569) | 2025-01-21 20:52:28 +08:00 |
| `benchmark.py` | Refactor embedding batch_size (#3825) | 2024-12-03 16:22:39 +08:00 |
| `prompts.py` | Refa: chat with tools. (#8210) | 2025-06-12 12:31:10 +08:00 |
| `raptor.py` | Fix task_limiter in raptor.py (#8124) | 2025-06-09 10:18:03 +08:00 |
| `settings.py` | set PARALLEL_DEVICES default value= 0 (#7935) | 2025-05-29 13:32:16 +08:00 |