Commit graph

7 commits

Author SHA1 Message Date
LinhKhanh
146eadc89c
Analyze dialog_service.py code 2025-12-01 05:06:53 +07:00
Claude
2f61760051
docs: Add document and knowledgebase service analysis documentation
- Add document_service_analysis.md: comprehensive analysis of document
  lifecycle management including insert, remove, parse, progress tracking
- Add knowledgebase_service_analysis.md: dataset management and access
  control analysis with permission model, parser configuration
2025-11-27 09:54:39 +00:00
Claude
1dcc9a870b
docs: Add detailed PDF parser processing steps documentation
Created comprehensive documentation for RAGFlowPdfParser processing pipeline:

- 10 major processing steps with code references
- Complete data flow diagrams
- Algorithm explanations (K-Means column detection, text merging)
- Box data structure evolution through pipeline
- Position tag format specification
- Line-by-line code analysis for key methods:
  - __init__ (model loading)
  - __images__ (OCR processing)
  - _layouts_rec (layout detection)
  - _table_transformer_job (table structure)
  - _assign_column (column detection)
  - _text_merge (horizontal merge)
  - _naive_vertical_merge (vertical merge)
  - _filter_forpages (cleanup)
  - _extract_table_figure (extraction)
  - __filterout_scraps (final output)
2025-11-27 06:29:12 +00:00
Claude
6d4dbbfe2c
docs: Add comprehensive DeepDoc deep guide documentation
Created in-depth documentation for understanding the deepdoc module:

- README.md: Complete deep guide with:
  - Big picture explanation (what problem deepdoc solves)
  - Data flow diagrams (Input → Processing → Output)
  - Detailed code analysis with line numbers
  - Technical explanations (ONNX, CTC, NMS, etc.)
  - Design reasoning (why certain technologies chosen)
  - Difficult terms glossary
  - Extension examples

- ocr_deep_dive.md: Deep dive into OCR subsystem
  - DBNet text detection architecture
  - CRNN text recognition
  - CTC decoding algorithm
  - Rotation handling
  - Performance optimization

- layout_table_deep_dive.md: Deep dive into layout/table recognition
  - YOLOv10 layout detection
  - Table structure recognition
  - Grid construction algorithm
  - Spanning cell handling
  - HTML/descriptive output generation
2025-11-27 03:46:14 +00:00
Claude
566bce428b
docs: Add comprehensive algorithm documentation (50+ algorithms)
- Updated README.md with complete algorithm map across 12 categories
- Added clustering_algorithms.md (K-Means, GMM, UMAP, Silhouette, Node2Vec)
- Added graph_algorithms.md (PageRank, Leiden, Entity Extraction/Resolution)
- Added nlp_algorithms.md (Trie tokenization, TF-IDF, NER, POS, Synonym)
- Added vision_algorithms.md (OCR, Layout Recognition, TSR, NMS, IoU, XGBoost)
- Added similarity_metrics.md (Cosine, Edit Distance, Token, Hybrid)
2025-11-27 03:34:49 +00:00
Claude
a6ee18476d
docs: Add detailed backend module analysis documentation
Add comprehensive documentation covering 6 modules:
- 01-API-LAYER: Authentication, routing, SSE streaming
- 02-SERVICE-LAYER: Dialog, Task, LLM service analysis
- 03-RAG-ENGINE: Hybrid search, embedding, reranking
- 04-AGENT-SYSTEM: Canvas engine, components, tools
- 05-DOCUMENT-PROCESSING: Task executor, PDF parsing
- 06-ALGORITHMS: BM25, fusion, RAPTOR

Total 28 documentation files with code analysis, diagrams, and formulas.
2025-11-26 11:10:54 +00:00
Claude
c7cecf9a1f
docs: Add comprehensive RAGFlow analysis documentation
- Add directory structure analysis (01_directory_structure.md)
- Add system architecture with diagrams (02_system_architecture.md)
- Add sequence diagrams for main flows (03_sequence_diagrams.md)
- Add detailed modules analysis (04_modules_analysis.md)
- Add tech stack documentation (05_tech_stack.md)
- Add source code analysis (06_source_code_analysis.md)
- Add README summary for personal_analyze folder

This documentation provides:
- Complete codebase structure overview
- System architecture diagrams (ASCII art)
- Sequence diagrams for authentication, RAG, chat, agent flows
- Detailed analysis of API, RAG, DeepDoc, Agent, GraphRAG modules
- Full tech stack with 150+ dependencies analyzed
- Source code patterns and best practices analysis
2025-11-26 10:20:05 +00:00