3 commits

Author SHA1 Message Date
yangdx
8a8bdba8f4 Add comprehensive chunking tests with multi-token tokenizer edge cases
• Add MultiTokenCharacterTokenizer for testing
• Test token vs character counting accuracy
• Verify delimiter splitting precision
• Test overlap with distinctive content
• Add decode content preservation tests

(cherry picked from commit fec7c67f45)
2025-12-04 19:11:22 +08:00
yangdx
7f7574c8b7 Add token limit validation for character-only chunking
- Add ChunkTokenLimitExceededError exception
- Validate chunks against token limits
- Include chunk preview in error messages
- Add comprehensive test coverage
- Log warnings for oversized chunks

(cherry picked from commit f988a22652)
2025-12-04 19:11:22 +08:00
yangdx
326acbf19b Add comprehensive tests for chunking with recursive splitting
- Test recursive split mode
- Add edge case coverage
- Test parameter combinations
- Verify chunk order indexing
- Add integration test scenarios

(cherry picked from commit 5733292557)
2025-12-04 19:11:21 +08:00