ragflow/common/data_source
hsparks.codes 066d6d3754 feat: Enterprise-grade MySQL/PostgreSQL database connector (2071 lines)
Implements a comprehensive database connector with advanced features for
production-grade data synchronization and vectorization.

Core Features (1378 lines - database_connector.py):
- Connection pooling with thread-safe management
- Secure credential encryption using Fernet
- Query result caching with LRU eviction
- Rate limiting with token bucket algorithm
- SQL injection prevention and validation
- Comprehensive error handling and retry logic
- Batch processing with memory management
- Incremental sync with timestamp tracking
- Real-time metrics and monitoring
- Health checks and diagnostics
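
Illustration of the query result caching item above: a minimal sketch of an LRU cache with an optional time-to-live. The class name `LRUCache`, its parameters, and the injectable `clock` are assumptions made for this example; this is not the code in database_connector.py.

```python
import time
from collections import OrderedDict


class LRUCache:
    """Hypothetical LRU cache with an optional per-entry TTL (not the connector's class)."""

    def __init__(self, max_size=128, ttl_seconds=None, clock=time.monotonic):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = OrderedDict()  # key -> (stored_at, value), oldest first

    def get(self, key, default=None):
        item = self._entries.get(key)
        if item is None:
            return default
        stored_at, value = item
        expired = (
            self.ttl_seconds is not None
            and self._clock() - stored_at >= self.ttl_seconds
        )
        if expired:
            del self._entries[key]
            return default
        # A hit makes the entry the most recently used one.
        self._entries.move_to_end(key)
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock(), value)
        self._entries.move_to_end(key)
        # Evict least recently used entries until the size limit holds.
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)

    def __len__(self):
        return len(self._entries)
```

A caller could key such a cache on the SQL text plus its bound parameters, so that identical queries inside the TTL window skip the database round trip.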

Security:
- Encrypted credential storage at rest
- SSL/TLS connection support
- SQL injection pattern detection
- Parameterized query enforcement
- Secure password handling
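
Illustration of the parameterized query item above: a small sketch of why bound parameters block injection. It uses the standard-library `sqlite3` module as a stand-in so it runs without a server; the `products` table and `find_by_name` helper are hypothetical. Note that `sqlite3` uses `?` placeholders, whereas mysql-connector-python and psycopg2 use `%s`.

```python
import sqlite3

# In-memory database standing in for a real MySQL/PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (name) VALUES (?)",
    [("widget",), ("gadget",)],
)


def find_by_name(conn, name):
    """Look up products by name; the driver binds `name` as data, never as SQL."""
    cur = conn.execute("SELECT id, name FROM products WHERE name = ?", (name,))
    return cur.fetchall()


hostile = "widget' OR '1'='1"

# Bound parameter: the whole string is compared literally, so nothing matches.
safe_rows = find_by_name(conn, hostile)

# String interpolation: the quote closes the literal and the OR clause
# becomes part of the statement, so every row is returned.
unsafe_sql = f"SELECT id, name FROM products WHERE name = '{hostile}'"
leaked_rows = conn.execute(unsafe_sql).fetchall()
```

Pattern detection can complement this, but binding values through the driver is the part that removes the injection path.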

Performance:
- Connection pool (5-20 connections)
- Query result caching (LRU, configurable TTL)
- Rate limiting (100 calls/min default)
- Batch processing (1000 rows/batch)
- Query timeout management
- Automatic retry with exponential backoff
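
Illustration of the rate limiting item above: a minimal token bucket sketch sized for the stated default of 100 calls per minute. The class name `TokenBucket`, the `try_acquire` method, and the injectable `clock` are assumptions for this example, not the connector's actual interface.

```python
import threading
import time


class TokenBucket:
    """Hypothetical token bucket; the defaults allow roughly 100 calls per minute."""

    def __init__(self, capacity=100, refill_per_second=100 / 60, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_second = refill_per_second
        self._clock = clock  # injectable so tests can control time
        self._tokens = float(capacity)  # start full, so an initial burst is allowed
        self._last = clock()
        self._lock = threading.Lock()

    def try_acquire(self, cost=1.0):
        """Take `cost` tokens if available; return False instead of blocking."""
        with self._lock:
            now = self._clock()
            elapsed = now - self._last
            self._last = now
            # Add tokens earned since the last call, capped at capacity.
            self._tokens = min(
                self.capacity, self._tokens + elapsed * self.refill_per_second
            )
            if self._tokens >= cost:
                self._tokens -= cost
                return True
            return False
```

A caller that gets `False` can sleep briefly and retry, or surface a rate-limit error; which of the two the connector does is not stated in this commit message.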

UI Configuration (693 lines - database_config_ui.py):
- Complete UI schema for frontend integration
- Field validation and conditional rendering
- Example configurations for common use cases
- Connection testing utilities
- Schema discovery from SQL queries
- Sample data preview
- Row count estimation

Supported Databases:
- MySQL 5.7+
- MariaDB 10.2+
- PostgreSQL 10+

Configuration Options:
- Batch vs Incremental sync modes
- Field mapping (vectorization vs metadata)
- Custom field transformations
- Validation rules
- SSL/TLS settings
- Performance tuning (pool size, timeouts, cache)
- Rate limiting configuration
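
Illustration of the incremental sync mode and field mapping options above: a sketch that reads only rows changed since the last run, in batches, and splits each row into content to vectorize and metadata. It uses `sqlite3` as a stand-in database; the `tickets` table, its columns, and the `fetch_batches` and `sync_once` helpers are all hypothetical.

```python
import sqlite3


def fetch_batches(conn, last_synced, batch_size=1000):
    """Yield lists of rows with updated_at after `last_synced`, oldest first."""
    cur = conn.execute(
        "SELECT id, body, updated_at FROM tickets "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (last_synced,),
    )
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield rows


def sync_once(conn, last_synced, batch_size=1000):
    """Return (documents, new_watermark) for one incremental pass."""
    docs = []
    watermark = last_synced
    for batch in fetch_batches(conn, last_synced, batch_size):
        for row_id, body, updated_at in batch:
            # `body` is the field to vectorize; the rest travels as metadata.
            docs.append(
                {"content": body, "metadata": {"id": row_id, "updated_at": updated_at}}
            )
            watermark = max(watermark, updated_at)
    return docs, watermark


# Demo data: ISO-8601 timestamps stored as text compare correctly as strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, body TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO tickets (body, updated_at) VALUES (?, ?)",
    [
        ("login fails", "2025-12-01T10:00:00"),
        ("slow search", "2025-12-01T11:00:00"),
        ("export bug", "2025-12-02T09:00:00"),
    ],
)

first_docs, mark = sync_once(conn, "", batch_size=2)
conn.execute(
    "UPDATE tickets SET body = ?, updated_at = ? WHERE id = 1",
    ("login fixed", "2025-12-03T08:00:00"),
)
second_docs, mark2 = sync_once(conn, mark, batch_size=2)
```

The strict `>` comparison assumes no two changes share the watermark timestamp; a real sync would need a tie-breaker (for example the primary key) to avoid missing such rows.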

Use Cases:
- Product catalogs
- Customer support tickets
- Internal documentation
- FAQ databases
- Real-time data feeds
- Scheduled batch imports

Dependencies:
- mysql-connector-python (MySQL/MariaDB)
- psycopg2 (PostgreSQL)
- cryptography (encryption)

Test Coverage:
- Unit tests for all major components
- Configuration validation
- Document conversion
- Field transformation
- Error handling

Fixes #11560
2025-12-03 12:27:24 +01:00
google_drive Feat: add gmail connector (#11549) 2025-11-28 13:09:40 +08:00
google_util Feat: add gmail connector (#11549) 2025-11-28 13:09:40 +08:00
jira Feat: add Jira connector (#11285) 2025-11-17 09:38:04 +08:00
__init__.py Feat: Add Webdav storage as data source (#11422) 2025-11-26 14:14:42 +08:00
blob_connector.py Feat: add addressing style config for S3-compatible storage (#11510) 2025-11-25 16:24:14 +08:00
config.py Feat: add gmail connector (#11549) 2025-11-28 13:09:40 +08:00
confluence_connector.py feat: improve metadata handling in connector service (#11421) 2025-11-26 19:55:48 +08:00
database_config_ui.py feat: Enterprise-grade MySQL/PostgreSQL database connector (2071 lines) 2025-12-03 12:27:24 +01:00
database_connector.py feat: Enterprise-grade MySQL/PostgreSQL database connector (2071 lines) 2025-12-03 12:27:24 +01:00
discord_connector.py feat: improve metadata handling in connector service (#11421) 2025-11-26 19:55:48 +08:00
dropbox_connector.py Feat: add datasource Dropbox (#11488) 2025-11-25 09:40:03 +08:00
exceptions.py Feat: Support multiple data sources synchronizations (#10954) 2025-11-03 19:59:18 +08:00
file_types.py Feat: Support multiple data sources synchronizations (#10954) 2025-11-03 19:59:18 +08:00
gmail_connector.py Feat: add gmail connector (#11549) 2025-11-28 13:09:40 +08:00
html_utils.py Feat: Support multiple data sources synchronizations (#10954) 2025-11-03 19:59:18 +08:00
interfaces.py Feat: add gmail connector (#11549) 2025-11-28 13:09:40 +08:00
models.py feat: improve metadata handling in connector service (#11421) 2025-11-26 19:55:48 +08:00
moodle_connector.py feat: improve Moodle connector functionality (#11665) 2025-12-02 19:12:43 +08:00
notion_connector.py Feat: enriches Notion connector (#11414) 2025-11-20 19:51:37 +08:00
sharepoint_connector.py Feat: Support multiple data sources synchronizations (#10954) 2025-11-03 19:59:18 +08:00
slack_connector.py Feat: Support multiple data sources synchronizations (#10954) 2025-11-03 19:59:18 +08:00
teams_connector.py Feat: Support multiple data sources synchronizations (#10954) 2025-11-03 19:59:18 +08:00
utils.py Feat: add gmail connector (#11549) 2025-11-28 13:09:40 +08:00
webdav_connector.py Feat: Add Webdav storage as data source (#11422) 2025-11-26 14:14:42 +08:00