Commit graph

3960 commits

Author SHA1 Message Date
Daulet Amirkhanov
fc660b46bb remove web_url_loader since there is no logic post fetching for loader 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
d7417d9b06 refactor: move url data fetching logic into save_data_item_to_storage 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
17b33ab443 feat: web_url_fetcher 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
8fe789ee96 nit: remove unnecessary import 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
1a0978fb37 incremental loading - fallback to regular, update test cases 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
a0f760a3d1 refactor: remove redundant filestream arg from LoaderEngine.load_file(...) 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
a69a7e5fc4 tests: remove redundant bs4 configs from tests 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
b5190c90f1 add logging for crawling status; add cap to the crawl_delay from robots.txt
- Using the cap is not recommended, but it is available as a configuration option
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
b9877f9e87 create web_url_loader_example.py 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
9b802f651b fix: web_url_loader load_data should yield stored_path 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
d0f3e224cb refactor ingest_data to accommodate non-FS data items 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
2e7ff0b01b remove redundant HtmlContent class in save_data_item_to_storage 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
c0d450b165 tests: fix test_add - add missing required parameter 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
572c8ebce7 refactor: use pydantic models for tavily and beautifulsoup configs instead of dicts 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
36364285b2 tests: fix failing tests 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
9a9f9f6836 tests: add some tests to assert behaviour is as expected 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
185600fe17 revert url_crawler changes to cognee.add(), and update web_url_loader.load() 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
d884867d2c extend LoaderInterface to support web_url_loader, implement load() 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
305969c61b refactor web_url_loader filename 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
95106d5914 fix: ensure web urls correctly go through ingest_data and reach loaders 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
9395539868 feat: interface for WebLoader 2025-10-21 22:46:49 +01:00
Vasilije
62157a114d
feature: Cognee Search sessions/conversation related short-term memory (#1545)

## Description
This PR introduces QA sessions and conversation-related short-term
memory in Cognee search, backed by Redis.

## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
None

## Pre-submission Checklist
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-21 16:53:16 +02:00
hajdul88
e9c27fe2f2 Revert "Update basic_tests.yml"
This reverts commit 03f4a01499.
2025-10-21 14:47:22 +02:00
hajdul88
6ee8c9719d Revert "chore: changes BAML model to openai"
This reverts commit e64e29f841.
2025-10-21 14:47:18 +02:00
hajdul88
03f4a01499 Update basic_tests.yml 2025-10-21 14:44:47 +02:00
hajdul88
e64e29f841 chore: changes BAML model to openai 2025-10-21 14:40:50 +02:00
hajdul88
7f3be30bb9
Potential fix for code scanning alert no. 379: Workflow does not contain permissions
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-10-21 14:31:52 +02:00
hajdul88
62c1862748 feat: adds sessions test again 2025-10-21 14:26:53 +02:00
hajdul88
aad6478fa8
Merge branch 'dev' into feature/cog-3160-redis-session-conversation 2025-10-21 14:25:59 +02:00
hajdul88
2705f97b16 Revert "test"
This reverts commit e824a1256b.
2025-10-21 14:23:07 +02:00
hajdul88
e824a1256b test 2025-10-21 14:22:17 +02:00
Vasilije
1e458b29b6
test: cog 3168/add entity extraction tests (#1572)

## Description
Added a test that checks whether graphs are generated correctly and
contain the entities we expect. It could be extended with a larger file
and more assertions, depending on how heavy we want the test to be.

## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)

## Pre-submission Checklist
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-21 14:15:45 +02:00
Andrej Milicevic
e8523bf4aa test: Add entity extraction test. Minor checks and fixes. 2025-10-21 13:00:42 +02:00
Andrej Milicevic
f8cb233389 merge conflicts resolved. merging dev into this branch 2025-10-21 11:40:37 +02:00
hajdul88
5a27c37cc2
Merge branch 'dev' into feature/cog-3160-redis-session-conversation 2025-10-21 10:30:52 +02:00
Vasilije
0518b06462
Embedding rate limiter + Cognee optimization (#1547)

## Description
Add an embedding rate limiter using Tenacity and set Cognee to its
fastest settings by default.

## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)

## Pre-submission Checklist
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-21 10:19:52 +02:00
hajdul88
47f0b577df
Merge branch 'dev' into feature/cog-3160-redis-session-conversation 2025-10-21 09:12:14 +02:00
Vasilije
00696d7ee4
Merge branch 'dev' into embedding-rate-limiter 2025-10-21 07:25:16 +02:00
vasilije
0402619ed7 add merge 2025-10-21 07:24:12 +02:00
vasilije
612a2252ce fix 2025-10-21 07:22:52 +02:00
Vasilije
915aa5184e
fix: Resolve issue with plain text files not having magic file info (#1564)

## Description
Some plain text files don't contain magic-byte information identifying
their file type. If the file type guess cannot deduce the type, treat
the file as plain text.

## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)

## Pre-submission Checklist
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-20 18:47:38 +02:00
hajdul88
c42f8392e1
Merge branch 'dev' into feature/cog-3160-redis-session-conversation 2025-10-20 17:28:12 +02:00
hajdul88
df038365c8
fix: fixes id in get_filtered_graph_data (#1569)

## Description
Fixes get_filtered_graph_data method in neo4jAdapter.


## Type of Change
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)

## Pre-submission Checklist
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-20 17:27:49 +02:00
hajdul88
dd8afe42f8
Merge branch 'dev' into feature/cog-3160-redis-session-conversation 2025-10-20 15:21:56 +02:00
Igor Ilic
0be56ee762
Merge branch 'dev' into fix-plain-txt-file-type 2025-10-20 15:08:59 +02:00
Igor Ilic
3e54b67b4d
fix: Resolve missing argument for distributed (#1563)

## Description
Resolve missing argument for distributed

## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)

## Pre-submission Checklist
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-10-20 15:03:35 +02:00
Igor Ilic
09c10286bd
Merge branch 'dev' into fix-plain-txt-file-type 2025-10-20 14:44:46 +02:00
hajdul88
d2d2cfb477
Merge branch 'dev' into feature/cog-3160-redis-session-conversation 2025-10-20 13:31:33 +02:00
Vasilije
407352d586
Revert "fix: search without prior cognify" (#1567)
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

Reverts topoteretes/cognee#1548
2025-10-20 13:19:02 +02:00
hajdul88
07caedde08
Merge branch 'dev' into feature/cog-3160-redis-session-conversation 2025-10-20 13:03:38 +02:00