yangdx
ecea93992a
Fix lingting
2025-11-20 13:03:31 +08:00
yangdx
1d2f534f3d
Fix linting
2025-11-20 13:02:25 +08:00
yangdx
72ece7343a
Remove obsolete config file and paging design doc
2025-11-20 13:00:13 +08:00
yangdx
1e415cff95
Update postgreSQL docker image link
2025-11-20 12:34:49 +08:00
yangdx
3c85e4882c
Update README
2025-11-20 10:50:02 +08:00
Daniel.y
d52adb64d7
Merge pull request #2390 from danielaskdd/fix-pytest-logging-error
...
Fix: Remove redundant exception logging to eliminate pytest shutdown errors
2025-11-19 23:09:30 +08:00
yangdx
b7de694f48
Add comprehensive error logging across API routes
...
- Add error logs to Ollama API endpoints
- Replace logging with unified logger
- Log streaming query errors
- Add data query error logging
- Include stack traces for debugging
2025-11-19 22:50:06 +08:00
yangdx
0fb2925c6a
Remove ascii_colors dependency and fix stream handling errors
...
• Remove ascii_colors.trace_exception calls
• Add SafeStreamHandler for closed streams
• Patch ascii_colors console handler
• Prevent ValueError on stream close
• Improve logging error handling
2025-11-19 21:38:17 +08:00
Daniel.y
f72f435cef
Merge pull request #2389 from danielaskdd/fix-chunk-size
...
Fix: Add chunk token limit validation with detailed error reporting
2025-11-19 20:34:11 +08:00
yangdx
fec7c67f45
Add comprehensive chunking tests with multi-token tokenizer edge cases
...
• Add MultiTokenCharacterTokenizer for testing
• Test token vs character counting accuracy
• Verify delimiter splitting precision
• Test overlap with distinctive content
• Add decode content preservation tests
2025-11-19 19:31:36 +08:00
yangdx
5733292557
Add comprehensive tests for chunking with recursive splitting
...
- Test recursive split mode
- Add edge case coverage
- Test parameter combinations
- Verify chunk order indexing
- Add integration test scenarios
2025-11-19 19:08:50 +08:00
yangdx
6fea68bff9
Fix ChunkTokenLimitExceededError message formatting
...
- Prevent passes two separate string objects to __init__
- Maintain same error output
2025-11-19 18:50:45 +08:00
yangdx
f988a22652
Add token limit validation for character-only chunking
...
- Add ChunkTokenLimitExceededError exception
- Validate chunks against token limits
- Include chunk preview in error messages
- Add comprehensive test coverage
- Log warnings for oversized chunks
2025-11-19 18:32:43 +08:00
yangdx
5cc916861f
Expand AGENTS.md with testing controls and automation guidelines
...
- Add pytest marker and CLI toggle docs
- Document automation workflow rules
- Clarify integration test setup
- Add agent-specific best practices
- Update testing command examples
2025-11-19 11:30:54 +08:00
Daniel.y
af4d2a3dcc
Merge pull request #2386 from danielaskdd/excel-optimization
...
Feat: Enhance XLSX Extraction by Adding Separators and Escape Special Characters
2025-11-19 10:26:32 +08:00
yangdx
95cd0ece74
Fix DOCX table extraction by escaping special characters in cells
...
- Add escape_cell() function
- Escape backslashes first
- Handle tabs and newlines
- Preserve tab-delimited format
- Prevent double-escaping issues
2025-11-19 09:54:35 +08:00
yangdx
87de2b3e9e
Update XLSX extraction documentation to reflect current implementation
2025-11-19 04:26:41 +08:00
yangdx
0244699d81
Optimize XLSX extraction by using sheet.max_column instead of two-pass scan
...
• Remove two-pass row scanning approach
• Use built-in sheet.max_column property
• Simplify column width detection logic
• Improve memory efficiency
• Maintain column alignment preservation
2025-11-19 04:02:39 +08:00
yangdx
2b16016312
Optimize XLSX extraction to avoid storing all rows in memory
...
• Remove intermediate row storage
• Use iterator twice instead of list()
• Preserve column alignment logic
• Reduce memory footprint
• Maintain same output format
2025-11-19 03:48:36 +08:00
yangdx
ef659a1e09
Preserve column alignment in XLSX extraction with two-pass processing
...
• Two-pass approach for consistent width
• Maintain tabular structure integrity
• Determine max columns first pass
• Extract with alignment second pass
• Prevent column misalignment issues
2025-11-19 03:34:22 +08:00
yangdx
3efb1716b4
Enhance XLSX extraction with structured tab-delimited format and escaping
...
- Add clear sheet separators
- Escape special characters
- Trim trailing empty columns
- Preserve row structure
- Single-pass optimization
2025-11-19 03:06:29 +08:00
Daniel.y
efbbaaf7f9
Merge pull request #2383 from danielaskdd/doc-table
...
Feat: Enhanced DOCX Extraction with Table Content Support
2025-11-19 02:26:02 +08:00
yangdx
e7d2803a65
Remove text stripping in DOCX extraction to preserve whitespace
...
• Keep original paragraph spacing
• Preserve cell whitespace in tables
• Maintain document formatting
• Don't strip leading/trailing spaces
2025-11-19 02:12:27 +08:00
yangdx
186c8f0e16
Preserve blank paragraphs in DOCX extraction to maintain spacing
...
• Remove text emptiness check
• Always append paragraph text
• Maintain document formatting
• Preserve original spacing
2025-11-19 02:03:10 +08:00
yangdx
fa887d811b
Fix table column structure preservation in DOCX extraction
...
• Always append cell text to maintain columns
• Preserve empty cells in table structure
• Check for any content before adding rows
• Use tab separation for proper alignment
• Improve table formatting consistency
2025-11-19 01:52:02 +08:00
yangdx
4438ba41a3
Enhance DOCX extraction to preserve document order with tables
...
• Include tables in extracted content
• Maintain original document order
• Add spacing around tables
• Use tabs to separate table cells
• Process all body elements sequentially
2025-11-19 01:31:33 +08:00
yangdx
d16c7840ab
Bump API version to 0256
2025-11-18 23:15:31 +08:00
yangdx
e77340d4a1
Adjust chunking parameters to match the default environment variable settings
2025-11-18 23:14:50 +08:00
yangdx
24423c9215
Merge branch 'fix_chunk_comment'
2025-11-18 22:47:23 +08:00
yangdx
1bfa1f81cb
Merge branch 'main' into fix_chunk_comment
2025-11-18 22:38:50 +08:00
yangdx
9c10c87554
Fix linting
2025-11-18 22:38:43 +08:00
yangdx
9109509b1a
Merge branch 'dev-postgres-vchordrq'
2025-11-18 22:25:35 +08:00
yangdx
dbae327a17
Merge branch 'main' into dev-postgres-vchordrq
2025-11-18 22:13:27 +08:00
yangdx
b583b8a59d
Merge branch 'feature/postgres-vchordrq-indexes' into dev-postgres-vchordrq
2025-11-18 22:05:48 +08:00
yangdx
3096f844fb
fix(postgres): allow vchordrq.epsilon config when probes is empty
...
Previously, configure_vchordrq would fail silently when probes was empty
(the default), preventing epsilon from being configured. Now each parameter
is handled independently with conditional execution, and configuration
errors fail-fast instead of being swallowed.
This fixes the documented epsilon setting being impossible to use in the
default configuration.
2025-11-18 21:58:36 +08:00
EightyOliveira
dacca334e0
refactor(chunking): rename params and improve docstring for chunking_by_token_size
2025-11-18 15:46:28 +08:00
wmsnp
f4bf5d279c
fix: add logger to configure_vchordrq() and format code
2025-11-18 15:31:08 +08:00
Daniel.y
dfbc97363c
Merge pull request #2369 from HKUDS/workspace-isolation
...
Feat: Add Workspace Isolation for Pipeline Status and In-memory Storage
2025-11-18 15:21:10 +08:00
yangdx
702cfd2981
Fix document deletion concurrency control and validation logic
...
• Clarify job naming for single vs batch deletion
• Update job name validation in busy pipeline check
2025-11-18 13:59:24 +08:00
yangdx
656025b75e
Rename GitHub workflow from "Tests" to "Offline Unit Tests"
2025-11-18 13:36:00 +08:00
yangdx
7e9c8ed1e8
Rename test classes to prevent warning from pytest
...
• TestResult → ExecutionResult
• TestStats → ExecutionStats
• Update class docstrings
• Update type hints
• Update variable references
2025-11-18 13:33:05 +08:00
yangdx
4048fc4b89
Fix: auto-acquire pipeline when idle in document deletion
...
• Track if we acquired the pipeline lock
• Auto-acquire pipeline when idle
• Only release if we acquired it
• Prevent concurrent deletion conflicts
• Improve deletion job validation
2025-11-18 13:25:13 +08:00
yangdx
1745b30a5f
Fix missing workspace parameter in update flags status call
2025-11-18 12:55:48 +08:00
yangdx
f8dd2e0724
Fix namespace parsing when workspace contains colons
...
• Use rsplit instead of split
• Handle colons in workspace names
2025-11-18 12:23:05 +08:00
yangdx
472b498ade
Replace pytest group reference with explicit dependencies in evaluation
...
• Remove pytest group dependency
• Add explicit pytest>=8.4.2
• Add pytest-asyncio>=1.2.0
• Add pre-commit directly
• Fix potential circular dependency
2025-11-18 12:17:21 +08:00
yangdx
a11912ffa5
Add testing workflow guidelines to basic development rules
...
* Define pytest marker patterns
* Document CI/CD test execution
* Specify offline vs integration tests
* Add test isolation best practices
* Reference testing guidelines doc
2025-11-18 11:54:19 +08:00
yangdx
41bf6d0283
Fix test to use default workspace parameter behavior
2025-11-18 11:51:17 +08:00
wmsnp
d07023c962
feat(postgres_impl): add vchordrq vector index support and unify vector index creation logic
2025-11-18 11:45:16 +08:00
yangdx
4ea2124001
Add GitHub CI workflow and test markers for offline/integration tests
...
- Add GitHub Actions workflow for CI
- Mark integration tests requiring services
- Add offline test markers for isolated tests
- Skip integration tests by default
- Configure pytest markers and collection
2025-11-18 11:36:10 +08:00
yangdx
4fef731f37
Standardize test directory creation and remove tempfile dependency
...
• Remove unused tempfile import
• Use consistent project temp/ structure
• Clean up existing directories first
• Create directories with os.makedirs
• Use descriptive test directory names
2025-11-18 10:39:54 +08:00