ragflow/graphrag/general
Yongteng Lei 24ca4cc6b7
Refa: GraphRAG and explaining GraphRAG stalling behavior on large files (#8223)
### What problem does this PR solve?

This PR investigates the cause of #7957.

TL;DR: Incorrect similarity calculations lead to too many candidates.
Since candidate selection involves interaction with the LLM, this causes
significant delays in the program.

What this PR does:

1. **Fix similarity calculation**:
When processing a 64-page government document, the corrected similarity
calculation reduces the number of candidates from over 100,000 to around
16,000. With a default batch size of 100 pairs per LLM call, this fix
reduces unnecessary LLM interactions from over 1,000 calls to around
160, a roughly 10x improvement.
2. **Add concurrency and timeout limits**: 
Up to 5 entity types are processed in "parallel", each with a 180-second
timeout. These limits may be configurable in future updates.
3. **Improve logging**:
The candidate resolution process now reports progress in real time.
4. **Mitigate potential concurrency risks**
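
The call counts in item 1 follow from dividing the number of candidate pairs by the batch size. A minimal sketch of that arithmetic, assuming each LLM call handles at most one batch; the function name `llm_calls_needed` is hypothetical and not taken from the RAGFlow code. The pre-fix count is only given as "over 100,000", so the ratio computed from the rounded figures is a lower bound on the actual reduction.

```python
import math


def llm_calls_needed(num_candidate_pairs: int, batch_size: int = 100) -> int:
    """Return how many LLM calls are needed if each call takes up to batch_size pairs."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_candidate_pairs / batch_size)


# Rounded figures quoted in the description above.
before = llm_calls_needed(100_000)  # lower bound: the real count was "over 100,000"
after = llm_calls_needed(16_000)

print(before, after, before / after)  # 1000 160 6.25
```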

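The limits in items 2 and 3 (at most 5 entity types in flight, a 180-second timeout each, progress reported as work completes) can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: it uses the standard-library `asyncio` so it is self-contained, whereas the file listing for this directory suggests the project uses Trio. `resolve_entity_type`, `resolve_all`, and the logger name are hypothetical.

```python
import asyncio
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("resolution_sketch")  # hypothetical logger name

MAX_CONCURRENT_TYPES = 5  # concurrency limit quoted in the description
TIMEOUT_SECONDS = 180     # per-entity-type timeout quoted in the description


async def resolve_entity_type(entity_type, pairs):
    """Hypothetical stand-in for the LLM-backed resolution of one entity type."""
    await asyncio.sleep(0)  # placeholder for the real LLM round trips
    return len(pairs)


async def resolve_all(candidates, timeout=TIMEOUT_SECONDS, worker=resolve_entity_type):
    """Resolve every entity type with bounded concurrency and a per-type timeout."""
    limiter = asyncio.Semaphore(MAX_CONCURRENT_TYPES)
    results = {}
    done = 0

    async def run_one(entity_type, pairs):
        nonlocal done
        async with limiter:  # at most MAX_CONCURRENT_TYPES workers run at once
            try:
                results[entity_type] = await asyncio.wait_for(
                    worker(entity_type, pairs), timeout
                )
            except asyncio.TimeoutError:
                results[entity_type] = None  # record the timeout, keep going
                log.warning("timed out resolving %s", entity_type)
        done += 1
        log.info("resolved %d/%d entity types", done, len(candidates))

    await asyncio.gather(*(run_one(t, p) for t, p in candidates.items()))
    return results


if __name__ == "__main__":
    demo = {"person": [("A. Smith", "Alice Smith")], "organization": []}
    print(asyncio.run(resolve_all(demo)))
```

Coroutines scheduled this way interleave on a single thread rather than running in true parallel, which is consistent with the quotation marks around "parallel" above.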

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2025-06-12 19:09:50 +08:00

| File | Last commit | Date |
| --- | --- | --- |
| __init__.py | Light GraphRAG (#4585) | 2025-01-22 19:43:14 +08:00 |
| community_report_prompt.py | Optimize graphrag again (#6513) | 2025-03-26 15:34:42 +08:00 |
| community_reports_extractor.py | Refa: GraphRAG and explaining GraphRAG stalling behavior on large files (#8223) | 2025-06-12 19:09:50 +08:00 |
| entity_embedding.py | Light GraphRAG (#4585) | 2025-01-22 19:43:14 +08:00 |
| extractor.py | `<think>` tag is missing. (#7256) | 2025-04-24 11:44:10 +08:00 |
| graph_extractor.py | perf: Optimize GraphRAG’s LOOP_PROMPT (#7356) | 2025-04-28 13:31:04 +08:00 |
| graph_prompt.py | perf: Optimize GraphRAG’s LOOP_PROMPT (#7356) | 2025-04-28 13:31:04 +08:00 |
| index.py | Perf: pass useless check for tidy graph (#8121) | 2025-06-09 11:44:13 +08:00 |
| leiden.py | Optimize graphrag again (#6513) | 2025-03-26 15:34:42 +08:00 |
| mind_map_extractor.py | fix(nursery): Fix Closure Trap Issues in Trio Concurrent Tasks (#7106) | 2025-04-18 18:00:20 +08:00 |
| mind_map_prompt.py | Light GraphRAG (#4585) | 2025-01-22 19:43:14 +08:00 |
| smoke.py | Refactor graphrag to remove redis lock (#5828) | 2025-03-10 15:15:06 +08:00 |