Commit graph

386 commits

Author SHA1 Message Date
hajdul88
faeca138d9
fix: fixes distributed pipeline (#1454)
<!-- .github/pull_request_template.md -->

## Description
This PR fixes distributed pipeline + updates core changes in distr
logic.

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [x] Performance improvement
- [ ] Other (please specify):

## Changes Made
Fixes distributed pipeline:
-Changed spawning logic + adds incremental loading to
run_tasks_diistributed
-Adds batching to consumer nodes
-Fixes consumer stopping criteria by adding stop signal + handling
-Changed edge embedding solution to avoid huge network load in a case of
a multicontainer environment

## Testing
Tested it by running 1GB on modal + manually

## Screenshots/Videos (if applicable)
None

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## Related Issues
None

## Additional Notes
None

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Boris <boris@topoteretes.com>
Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
2025-10-09 14:06:25 +02:00
Andrej Milićević
01632988fe
test: replace neo4j usages in cicd with reusable local instances (#1507)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Changed from remote to local Neo4j instance in tests because CI was
failing due to multiple tests using the remote instance in parallel.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Andrej Milicevic <milicevi@Andrejs-MacBook-Pro.local>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2025-10-09 12:39:18 +02:00
Daulet Amirkhanov
978549106c chore: use dev env in test_s3 CI 2025-10-07 22:04:26 +01:00
Daulet Amirkhanov
ee079604f4 chore: keep exception handling changes to instructor adapters 2025-10-07 22:03:16 +01:00
Igor Ilic
b4eaea2133 Merge branch 'main' into main-merge-vol7 2025-10-07 19:56:38 +02:00
Igor Ilic
38cdacbcb6
fix: Resolve issue with Gemini adapter (#1494)
<!-- .github/pull_request_template.md -->

## Description
Resolve Gemini Adapter issues:
 1. resolve embedding batch issue,
2. Resolve slowness because gemini tokenizer was sending word per word
to Googles API to count tokens (using OpenAI's local tokenizer to count
tokens for Gemini now)
 3. Update deprecated library and move to instructor

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-07 18:04:18 +02:00
Igor Ilic
95fdbab406
refactor: Remove macos13 from ci/cd and support (#1489)
<!-- .github/pull_request_template.md -->

## Description
Remove MacOS13 support and CI/CD tests

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify): Remove MacOS13 support

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-01 18:01:04 +02:00
Igor Ilic
52c978faeb
docs: Multi user authorization example (#1466)
<!-- .github/pull_request_template.md -->

## Description
Add return value of creating role and tenant, add detailed permissions
example to Cognee

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Boris <boris@topoteretes.com>
Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-09-29 20:15:50 +02:00
Vasilije
107b5af6b5
Feature/cog 2979 fix falkordb adapter (#1430)
<!-- .github/pull_request_template.md -->

## Description
<!-- 
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Falkordb adapter didn't work on main repo, but we have it working on
community. Decision was to remove it from main repo, so it is removed.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Changes Made
<!-- List the specific changes made in this PR -->
- 
- 
- 

## Testing
<!-- Describe how you tested your changes -->

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## Related Issues
<!-- Link any related issues using "Fixes #issue_number" or "Relates to
#issue_number" -->

## Additional Notes
<!-- Add any additional notes, concerns, or context for reviewers -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-09-28 17:02:48 +02:00
Igor Ilic
5528097e29 Merge branch 'main' into merge-main-vol6 2025-09-27 00:06:33 +02:00
Vasilije
235f28aefe
refactor: Rework limit=0 for vector adapters (#1450)
<!-- .github/pull_request_template.md -->

## Description
<!-- 
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Until now, limit=0 in vector search meant that there is no limit and we
should return everything. This caused confusion and errors, so now it is
reworked so that limit=None means no limit on the search. If someone
puts limit=0, there will be no results returned, as it makes more sense
and is less error prone.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Changes Made
<!-- List the specific changes made in this PR -->
- 
- 
- 

## Testing
<!-- Describe how you tested your changes -->

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## Related Issues
<!-- Link any related issues using "Fixes #issue_number" or "Relates to
#issue_number" -->

## Additional Notes
<!-- Add any additional notes, concerns, or context for reviewers -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-09-25 21:13:41 +02:00
Igor Ilic
50032dd133 fix: install aws for gh action 2025-09-25 16:02:30 +02:00
Igor Ilic
664459e239 refactor: Install baml only for BAML test 2025-09-25 15:30:27 +02:00
Igor Ilic
d2d0d0de4e refactor: install cognee defined baml version for CI/CD 2025-09-25 13:32:09 +02:00
Igor Ilic
8cbc3eb877 Merge branch 'dev' into COG-2826 2025-09-25 13:31:21 +02:00
Hande
fcbb0a8c56 chore: update pr template 2025-09-24 16:20:26 +02:00
Vasilije
f3e04142ca
fix: added auto tagging (#1424)
<!-- .github/pull_request_template.md -->

## Description

Added auto tagging so that core team PRs always get the same lable
<!-- 
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [x ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Changes Made
<!-- List the specific changes made in this PR -->
- 
- 
- 

## Testing
<!-- Describe how you tested your changes -->

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## Related Issues
<!-- Link any related issues using "Fixes #issue_number" or "Relates to
#issue_number" -->

## Additional Notes
<!-- Add any additional notes, concerns, or context for reviewers -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-09-23 13:17:03 +02:00
Andrej Milicevic
9b6e1a8f0c test:Add tests for limit=None search 2025-09-23 12:46:51 +02:00
Chaitany
96eb0d448a
feat(#1357): Lexical chunk retriever (#1392)
<!-- .github/pull_request_template.md -->

## Description
<!-- 
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
I Implemented Lexical Chunk Retriever In the LexicalRetriever class is
Inherite the BaseRetriever and The DocumentChunk are lazy loaded when
first time query is made because it save time during object
initialization
and the function get_context and the get_completion are Implemented same
as the ChunksRetriever the only diffrence is that the DocumentChunk are
converted to match the output type of the ChunksRetriever using function
get_own_properties in the utils.

## Type of Change
<!-- Please check the relevant option -->
- [-] Bug fix (non-breaking change that fixes an issue)
- [-] New feature (non-breaking change that adds functionality)
- [-] Breaking change (fix or feature that would cause existing
functionality to change)
- [-] Documentation update
- [-] Code refactoring
- [-] Performance improvement
- [-] Other (please specify):

## Changes Made
<!-- List the specific changes made in this PR -->
- Added LexicalRetriever base class with customizable tokenizer & scorer
     - Implemented caching of DocumentChunk tokens and payloads 
- Added robust initialization with error handling and logging -
Implemented get_context with top_k ranking and optional scores
- Implemented get_completion consistent with BaseRetriever interface
- Added JaccardChunksRetriever demo using set/multiset Jaccard
similarity
- Support for stopwords and multiset frequency-aware similarity -
Integrated logging for initialization, scoring, and retrieval

## Testing

- Manual tests: initialized retriever, retrieved chunks with toy corpus
    - Edge cases: empty corpus, empty query, scorer/tokenizer errors 
    - Verified Jaccard similarity results for single/multiset cases 
    - Code formatted and linted


## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [-] **I have tested my changes thoroughly before submitting this PR**
- [-] **This PR contains minimal changes necessary to address the
issue/feature**
- [-] My code follows the project's coding standards and style
guidelines
- [-] I have added tests that prove my fix is effective or that my
feature works
- [-] I have added necessary documentation (if applicable)
- [-] All new and existing tests pass
- [-] I have searched existing PRs to ensure this change hasn't been
submitted already
- [-] I have linked any relevant issues in the description
- [-] My commits have clear and descriptive messages

## Related Issues
<!-- Link any related issues using "Fixes #issue_number" or "Relates to
#issue_number" -->
Relates to  #1392
## Additional Notes
<!-- Add any additional notes, concerns, or context for reviewers -->
Int the cognee/modules/chunking/models/DocumentChunk.py
don't remove the optional  from is_part_of attributes.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Andrej Milicevic <milicevicandrej@yahoo.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
2025-09-19 18:24:33 +02:00
Andrej Milicevic
c391be41d6 refactor: Remove falkordb adapter from main repo, since we have it on community 2025-09-17 12:09:57 +02:00
vasilije
8ec74c48e7 added auto tagging 2025-09-16 16:16:25 -07:00
Andrej Milicevic
a20fe8ad34 fix Azure error in test 2025-09-12 15:15:06 +02:00
Igor Ilic
8d7738d713 Merge branch 'dev' into feature/cog-2923-create-ci-test-for-fastembed 2025-09-12 13:33:15 +02:00
Igor Ilic
82afdef484
Potential fix for code scanning alert no. 227: Workflow does not contain permissions
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-09-12 07:27:31 -04:00
Andrej Milicevic
b0c5620702 Change how llm tests are called in workflow. Delete unnecessary workflows. 2025-09-12 09:47:27 +02:00
Igor Ilic
52b25882b3 refactor: Move cli tests to run in parallel with all tests 2025-09-11 22:34:26 +02:00
Andrej Milicevic
300d358d4f test name change 2025-09-11 18:23:46 +02:00
Andrej Milicevic
70ee196166 Change amount of tests called in test-suites.yml 2025-09-11 18:20:11 +02:00
Igor Ilic
136b5a2f95 test: Add test for memify pipeline 2025-09-11 17:58:42 +02:00
Andrej Milicevic
a16100f8cd test: Add CI test for fastembed 2025-09-11 17:50:27 +02:00
vasilije
711f3fe070 added github issue templates 2025-09-07 17:15:49 -07:00
vasilije
bb8b47bf34 add fix 2025-09-07 16:37:52 -07:00
Vasilije
10b18e5782
Merge branch 'dev' into COG-2826 2025-09-02 22:51:01 +02:00
hajdul88
a4e59b7583
Merge branch 'dev' into feature/cog-2746-time-graph-to-cognify 2025-09-01 09:47:37 +02:00
vasilije
a3da74a01d add open router 2025-08-29 21:49:28 +02:00
hajdul88
0fac4da2d0 feat: adds temporal graph integration and structural tests 2025-08-29 18:21:24 +02:00
vasilije
8f9e289a83 added baml test fix and format 2025-08-28 08:10:25 +02:00
Igor Ilic
3a3274b5f9
Update .github/workflows/test_different_operating_systems.yml
Co-authored-by: Boris <boris@topoteretes.com>
2025-08-27 07:38:05 -04:00
Igor Ilic
e4e1a5438e refactor: Add read permissions only for gh token 2025-08-27 12:47:23 +02:00
Igor Ilic
23ea1c1659
Potential fix for code scanning alert no. 187: Workflow does not contain permissions
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-08-27 06:42:30 -04:00
Igor Ilic
58655ca41e refactor: Add proper path to test file 2025-08-26 21:51:42 +02:00
Igor Ilic
229a7a1db3 refactor: Speed up CI/CD execution time 2025-08-26 21:28:11 +02:00
Igor Ilic
65542ecec7 refactor: Make CI/CD faster add more OS tests 2025-08-26 21:05:30 +02:00
Igor Ilic
8c69653912 fix: Resolve issue with Windows path 2025-08-26 20:22:20 +02:00
Andrej Milicevic
cf7d41cec2 Fix 2025-08-26 09:35:53 +02:00
Andrej Milicevic
dce525e27b Fix again. 2025-08-25 20:52:12 +02:00
Andrej Milicevic
de7a5a1c5d Fixed yml files 2025-08-25 20:45:11 +02:00
Andrej Milicevic
454358ea28 Solution for the API key error. 2025-08-25 20:26:59 +02:00
Andrej Milicevic
4b593aa523 And another potential fix for the API error. 2025-08-25 20:07:15 +02:00
Andrej Milicevic
31b33a98d3 Yet another potential fix for the invalid API key. 2025-08-25 19:40:39 +02:00