LinhKhanh
23f230d69e
Merge pull request #2 from Learnheart/claude/analyze-document-processing-012NBdsHwJzhXDnbtrCy4yZM
Claude/analyze document processing 012 n bds hw jzh x dnbtr cy4y zm
2025-12-01 17:23:21 +07:00
Claude
0125ae5e84
docs: Add comprehensive document processing analysis
Add detailed analysis documentation for RAGFlow's document processing pipeline:
- README.md: Overview and architecture diagram
- task_executor_analysis.md: Task execution pipeline details
- pdf_parsing.md: PDF parsing with layout analysis
- ocr_pipeline.md: PaddleOCR integration and text detection
- layout_detection.md: Detectron2 layout recognition
- table_extraction.md: Table structure recognition (TSR)
- file_type_handlers.md: Handlers for all supported file types
These documents explain the document processing flow so that newcomers
can understand how RAGFlow handles various file formats.
2025-12-01 09:47:37 +00:00
Claude
2002efb90a
docs: Add database architecture analysis for RAGFlow
Document the data flow and storage types for all 4 database systems:
- MySQL: metadata, user data, configs
- Elasticsearch/Infinity: chunks, embeddings, search
- Redis: task queue, caching, distributed locks
- MinIO: raw file storage
2025-12-01 06:40:57 +00:00
LinhKhanh
146eadc89c
Analyze dialog_service.py code
2025-12-01 05:06:53 +07:00
Claude
2f61760051
docs: Add document and knowledgebase service analysis documentation
- Add document_service_analysis.md: comprehensive analysis of document
  lifecycle management including insert, remove, parse, progress tracking
- Add knowledgebase_service_analysis.md: dataset management and access
  control analysis with permission model, parser configuration
2025-11-27 09:54:39 +00:00
Claude
1dcc9a870b
docs: Add detailed PDF parser processing steps documentation
Created comprehensive documentation for RAGFlowPdfParser processing pipeline:
- 10 major processing steps with code references
- Complete data flow diagrams
- Algorithm explanations (K-Means column detection, text merging)
- Box data structure evolution through pipeline
- Position tag format specification
- Line-by-line code analysis for key methods:
  - __init__ (model loading)
  - __images__ (OCR processing)
  - _layouts_rec (layout detection)
  - _table_transformer_job (table structure)
  - _assign_column (column detection)
  - _text_merge (horizontal merge)
  - _naive_vertical_merge (vertical merge)
  - _filter_forpages (cleanup)
  - _extract_table_figure (extraction)
  - __filterout_scraps (final output)
2025-11-27 06:29:12 +00:00
Claude
6d4dbbfe2c
docs: Add comprehensive DeepDoc deep guide documentation
Created in-depth documentation for understanding the deepdoc module:
- README.md: Complete deep guide with:
  - Big picture explanation (what problem deepdoc solves)
  - Data flow diagrams (Input → Processing → Output)
  - Detailed code analysis with line numbers
  - Technical explanations (ONNX, CTC, NMS, etc.)
  - Design reasoning (why certain technologies chosen)
  - Difficult terms glossary
  - Extension examples
- ocr_deep_dive.md: Deep dive into OCR subsystem
  - DBNet text detection architecture
  - CRNN text recognition
  - CTC decoding algorithm
  - Rotation handling
  - Performance optimization
- layout_table_deep_dive.md: Deep dive into layout/table recognition
  - YOLOv10 layout detection
  - Table structure recognition
  - Grid construction algorithm
  - Spanning cell handling
  - HTML/descriptive output generation
2025-11-27 03:46:14 +00:00
Claude
566bce428b
docs: Add comprehensive algorithm documentation (50+ algorithms)
- Updated README.md with complete algorithm map across 12 categories
- Added clustering_algorithms.md (K-Means, GMM, UMAP, Silhouette, Node2Vec)
- Added graph_algorithms.md (PageRank, Leiden, Entity Extraction/Resolution)
- Added nlp_algorithms.md (Trie tokenization, TF-IDF, NER, POS, Synonym)
- Added vision_algorithms.md (OCR, Layout Recognition, TSR, NMS, IoU, XGBoost)
- Added similarity_metrics.md (Cosine, Edit Distance, Token, Hybrid)
2025-11-27 03:34:49 +00:00
Claude
a6ee18476d
docs: Add detailed backend module analysis documentation
Add comprehensive documentation covering 6 modules:
- 01-API-LAYER: Authentication, routing, SSE streaming
- 02-SERVICE-LAYER: Dialog, Task, LLM service analysis
- 03-RAG-ENGINE: Hybrid search, embedding, reranking
- 04-AGENT-SYSTEM: Canvas engine, components, tools
- 05-DOCUMENT-PROCESSING: Task executor, PDF parsing
- 06-ALGORITHMS: BM25, fusion, RAPTOR
Total 28 documentation files with code analysis, diagrams, and formulas.
2025-11-26 11:10:54 +00:00
Claude
c7cecf9a1f
docs: Add comprehensive RAGFlow analysis documentation
- Add directory structure analysis (01_directory_structure.md)
- Add system architecture with diagrams (02_system_architecture.md)
- Add sequence diagrams for main flows (03_sequence_diagrams.md)
- Add detailed modules analysis (04_modules_analysis.md)
- Add tech stack documentation (05_tech_stack.md)
- Add source code analysis (06_source_code_analysis.md)
- Add README summary for personal_analyze folder
This documentation provides:
- Complete codebase structure overview
- System architecture diagrams (ASCII art)
- Sequence diagrams for authentication, RAG, chat, agent flows
- Detailed analysis of API, RAG, DeepDoc, Agent, GraphRAG modules
- Full tech stack with 150+ dependencies analyzed
- Source code patterns and best practices analysis
2025-11-26 10:20:05 +00:00