Daulet Amirkhanov
f02aa1abfc
ruff format
2025-10-21 23:02:25 +01:00
Daulet Amirkhanov
0f6aac19e8
TDD: add test cases and finish loading stage
2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
6895813ae8
tests: name integration tests more meaningfully
2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
ed4eba4c44
add back in-code comments for ingest_data
2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
03b4547b7f
validate e2e - URLs are saved as HTML files, and loaders are selected correctly
2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
f84e31c626
bs4_loader.py -> beautiful_soup_loader.py, add to supported loaders
2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
322ef156cb
redefine preferred_loaders param to allow for args per loader
2025-10-21 22:47:52 +01:00
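
The "args per loader" change above could look something like the following. This is only a minimal sketch under stated assumptions: the accepted shape (a list mixing plain loader names with single-key dicts of keyword arguments), the helper name `normalize_preferred_loaders`, and the loader names and arguments in the usage note are all hypothetical and not taken from the commit itself.

```python
from typing import Any, Dict, List, Union

# Hypothetical shape: either a bare loader name, or {loader_name: {arg: value}}.
LoaderSpec = Union[str, Dict[str, Dict[str, Any]]]


def normalize_preferred_loaders(specs: List[LoaderSpec]) -> Dict[str, Dict[str, Any]]:
    """Turn a mixed list of loader specs into {loader_name: kwargs}.

    Insertion order is preserved, so the first entry stays the most preferred.
    """
    normalized: Dict[str, Dict[str, Any]] = {}
    for spec in specs:
        if isinstance(spec, str):
            # A bare name means "use this loader with its default arguments".
            normalized[spec] = {}
        elif isinstance(spec, dict):
            for name, args in spec.items():
                # Copy the kwargs so later mutation does not affect the caller.
                normalized[name] = dict(args or {})
        else:
            raise TypeError(f"Unsupported loader spec: {spec!r}")
    return normalized
```

For example, `normalize_preferred_loaders(["text_loader", {"beautiful_soup_loader": {"parser": "html.parser"}}])` yields one entry with empty arguments and one with the given arguments.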
Daulet Amirkhanov
7210198f2e
implement bs4_loader.py methods, except load for now
2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
16e1c60925
move bs4 html parsing into bs4_loader
2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
9d9969676f
Separate BeautifulSoup crawling from fetching
2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
a7ff188018
add crawler tests
2025-10-21 22:47:22 +01:00
Daulet Amirkhanov
5035c872a7
refactor: update web scraper configurations and simplify fetch logic
2025-10-21 22:47:22 +01:00
Daulet Amirkhanov
95e735d397
remove fetchers_config, use default configs for Tavily and BeautifulSoup
2025-10-21 22:46:50 +01:00
Daulet Amirkhanov
abbbf88ad3
CI: use scraping dependencies for integration tests
2025-10-21 22:46:50 +01:00
Daulet Amirkhanov
085e81c082
Clean up - remove UnsupportedPathSchemeError
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
35d3c08779
Clean up add.py imports
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
fdf7c27fec
refactor: remove WebUrlLoader imports
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
1213a3a4cb
revert changes to LoaderEngine and LoaderInterface
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
f7c2187ce7
remove loaders_config as it's not in use
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
fc660b46bb
remove web_url_loader since there is no logic post fetching for loader
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
d7417d9b06
refactor: move url data fetching logic into save_data_item_to_storage
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
17b33ab443
feat: web_url_fetcher
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
8fe789ee96
nit: remove unnecessary import
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
1a0978fb37
incremental loading - fallback to regular, update test cases
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
a0f760a3d1
refactor: remove redundant filestream arg from LoaderEngine.load_file(...)
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
a69a7e5fc4
tests: remove redundant bs4 configs from tests
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
b5190c90f1
add logging for crawling status; add cap to the crawl_delay from robots.txt
- Using the cap is not recommended, but it is available as a configurable option
2025-10-21 22:46:49 +01:00
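
Capping the `Crawl-delay` read from robots.txt, as described in the commit above, might be sketched as follows. This is an illustration using only the standard library's `urllib.robotparser`; the function name `effective_crawl_delay` and the parameters `default_delay` and `max_delay` are assumptions and do not reflect the project's actual configuration names.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def effective_crawl_delay(
    robots_txt: str,
    user_agent: str = "*",
    default_delay: float = 0.5,          # hypothetical fallback when no delay is declared
    max_delay: Optional[float] = 10.0,   # hypothetical cap; None disables capping
) -> float:
    """Return the delay (in seconds) to wait between requests to one host."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines,
    # so no network request is made here.
    parser.parse(robots_txt.splitlines())

    declared = parser.crawl_delay(user_agent)
    if declared is None:
        # The site did not declare a Crawl-delay for this user agent.
        return default_delay

    delay = float(declared)
    if max_delay is not None:
        # Cap very large declared delays so a crawl cannot stall indefinitely.
        delay = min(delay, max_delay)
    return delay
```

Passing `max_delay=None` honours whatever delay the site declares, which matches the commit's note that the cap is an option rather than a recommendation.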
Daulet Amirkhanov
b9877f9e87
create web_url_loader_example.py
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
9b802f651b
fix: web_url_loader load_data should yield stored_path
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
d0f3e224cb
refactor ingest_data to accommodate non-FS data items
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
2e7ff0b01b
remove redundant HtmlContent class in save_data_item_to_storage
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
c0d450b165
tests: fix test_add - add missing required parameter
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
572c8ebce7
refactor: use pydantic models for tavily and beautifulsoup configs instead of dicts
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
36364285b2
tests: fix failing tests
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
9a9f9f6836
tests: add some tests to assert behaviour is as expected
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
185600fe17
revert url_crawler changes to cognee.add(), and update web_url_loader.load()
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
d884867d2c
extend LoaderInterface to support web_url_loader, implement load()
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
305969c61b
refactor web_url_loader filename
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
95106d5914
fix: ensure web urls correctly go through ingest_data and reach loaders
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
9395539868
feat: interface for WebLoader
2025-10-21 22:46:49 +01:00
Vasilije
62157a114d
feature: Cognee Search sessions/conversation-related short-term memory (#1545)
## Description
This PR introduces QA sessions and conversation-related short-term
memory in Cognee search, backed by Redis.
## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):
## Screenshots/Videos (if applicable)
None
## Pre-submission Checklist
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-21 16:53:16 +02:00
hajdul88
e9c27fe2f2
Revert "Update basic_tests.yml"
This reverts commit 03f4a01499.
2025-10-21 14:47:22 +02:00
hajdul88
6ee8c9719d
Revert "chore: changes BAML model to openai"
This reverts commit e64e29f841.
2025-10-21 14:47:18 +02:00
hajdul88
03f4a01499
Update basic_tests.yml
2025-10-21 14:44:47 +02:00
hajdul88
e64e29f841
chore: changes BAML model to openai
2025-10-21 14:40:50 +02:00
hajdul88
7f3be30bb9
Potential fix for code scanning alert no. 379: Workflow does not contain permissions
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-10-21 14:31:52 +02:00
hajdul88
62c1862748
feat: adds sessions test again
2025-10-21 14:26:53 +02:00
hajdul88
aad6478fa8
Merge branch 'dev' into feature/cog-3160-redis-session-conversation
2025-10-21 14:25:59 +02:00
hajdul88
2705f97b16
Revert "test"
This reverts commit e824a1256b.
2025-10-21 14:23:07 +02:00
hajdul88
e824a1256b
test
2025-10-21 14:22:17 +02:00