cognee/cognee/tests
Vasilije c7d0f64cb1
fix: Refactor web parsing (#1575)

## Description

This PR is an iteration over #1552:

1. Refactors `preferred_loaders` from dicts to a list whose entries are
either strings (the loader name) or dicts (`{loader_name: {arg1: val1}}`),
e.g. `[{"loader_name_one": {"arg1": "val1"}}, "loader_name_two"]`
2. Adds default extraction rules for HTML parsing
3. Adds unit tests covering these changes, plus a unit test for Tavily
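
To illustrate the new `preferred_loaders` shape from item 1, here is a minimal sketch of how a mixed list of strings and dicts could be normalized into a single name-to-arguments mapping. The helper name `normalize_preferred_loaders` and its return shape are hypothetical, chosen only for illustration; this is not taken from the cognee codebase.

```python
from typing import Any, Dict, List, Union

# One entry is either a loader name or a {loader_name: {arg: value}} dict.
LoaderSpec = Union[str, Dict[str, Dict[str, Any]]]


def normalize_preferred_loaders(
    preferred_loaders: List[LoaderSpec],
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: map each loader name to its keyword arguments.

    Plain string entries get an empty argument dict. Insertion order is
    preserved, so the list order still expresses loader priority.
    """
    normalized: Dict[str, Dict[str, Any]] = {}
    for entry in preferred_loaders:
        if isinstance(entry, str):
            normalized[entry] = {}
        elif isinstance(entry, dict):
            for name, kwargs in entry.items():
                # Copy the arguments so the caller's dict is not mutated later.
                normalized[name] = dict(kwargs or {})
        else:
            raise TypeError(
                f"Loader entries must be str or dict, got {type(entry).__name__}"
            )
    return normalized


loaders = normalize_preferred_loaders(
    [{"loader_name_one": {"arg1": "val1"}}, "loader_name_two"]
)
```

With the example list from item 1, `loaders` maps `loader_name_one` to `{"arg1": "val1"}` and `loader_name_two` to an empty dict, in that order.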

## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)

## Pre-submission Checklist
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-22 19:09:19 +02:00
..
cli_tests Remove all references to SearchType.INSIGHTS across the codebase, meaningfully replacing it with SearchType.GRAPH_COMPLETION where applicable. 2025-10-08 12:13:59 +01:00
integration add test cases for tavily 2025-10-22 13:15:56 +01:00
subprocesses feat: Redis lock integration and Kuzu agentic access fix (#1504) 2025-10-16 15:48:20 +02:00
tasks Separate BeautifulSoup crawling from fetching 2025-10-21 22:47:52 +01:00
test_data test: Use smaller files than Alice for tests. 2025-09-26 11:05:57 +02:00
unit tests: remove redundant test 2025-10-22 16:23:41 +01:00
__init__.py
test_add_docling_document.py test: Add test to CI 2025-09-30 12:04:41 +02:00
test_advanced_pdf_loader.py make advanced pdf loader optional 2025-09-22 15:07:58 +08:00
test_chromadb.py COG-3050 - remove insights search (#1506) 2025-10-11 09:09:56 +02:00
test_cognee_server_start.py feat: Add Windows compatibility and error handling improvements 2025-09-25 03:51:01 +07:00
test_concurrent_subprocess_access.py feat: Redis lock integration and Kuzu agentic access fix (#1504) 2025-10-16 15:48:20 +02:00
test_conversation_history.py chore: linting fix 2025-10-17 14:24:25 +02:00
test_custom_model.py
test_deduplication.py test: Rollback deduplication test 2025-10-01 18:10:57 +02:00
test_delete_by_id.py chore: updating delete_by_id test 2025-08-13 15:39:11 +02:00
test_delete_hard.py refactor: Speed up CI/CD execution time 2025-08-26 21:28:11 +02:00
test_delete_soft.py refactor: Speed up CI/CD execution time 2025-08-26 21:28:11 +02:00
test_edge_ingestion.py
test_graph_visualization_permissions.py
test_kuzu.py Revert "Revert "fix: search without prior cognify"" 2025-10-22 13:21:51 +01:00
test_lancedb.py COG-3050 - remove insights search (#1506) 2025-10-11 09:09:56 +02:00
test_library.py Test for update function (#1487) 2025-10-11 10:38:37 +02:00
test_neo4j.py Revert "Revert "fix: search without prior cognify"" 2025-10-22 13:21:51 +01:00
test_neptune_analytics_graph.py
test_neptune_analytics_hybrid.py
test_neptune_analytics_vector.py COG-3050 - remove insights search (#1506) 2025-10-11 09:09:56 +02:00
test_parallel_databases.py
test_permissions.py test: Removed long text string about quantum computers from tests. Used a file instead. 2025-10-01 17:59:53 +02:00
test_pgvector.py COG-3050 - remove insights search (#1506) 2025-10-11 09:09:56 +02:00
test_relational_db_migration.py refactor: refactor schema migration 2025-09-27 00:41:58 +02:00
test_remote_kuzu.py COG-3050 - remove insights search (#1506) 2025-10-11 09:09:56 +02:00
test_remote_kuzu_stress.py
test_s3.py Loader separation (#1240) 2025-08-14 19:55:39 +02:00
test_s3_file_storage.py Deprecate SearchType.INSIGHTS, replace all references to default search type - SearchType.GRAPH_COMPLETION 2025-10-08 12:13:59 +01:00
test_search_db.py test: Removed long text string about quantum computers from tests. Used a file instead. 2025-10-01 17:59:53 +02:00
test_starter_pipelines.py
test_telemetry.py
test_temporal_graph.py Merge dev into main (#1422) 2025-09-17 10:32:10 +02:00