Commit graph

506 commits

Author SHA1 Message Date
Igor Ilic
ea14771fd3
Create scorecard.yml 2025-10-20 15:22:01 +02:00
vasilije
a1927548ad added 2025-10-19 14:52:02 +02:00
vasilije
66876daf85 removed docs 2025-10-19 14:38:34 +02:00
vasilije
3f7efd8b88 added fixes for tests 2025-10-19 13:33:02 +02:00
Boris
5da8b03e0b
Merge branch 'dev' into feature/cog-2985-add-ci-tests-that-run-more-examples 2025-10-18 17:14:08 +02:00
hajdul88
9821a01a47
feat: Redis lock integration and Kuzu agentic access fix (#1504)
<!-- .github/pull_request_template.md -->

## Description
This PR introduces a shared lock mechanism in KuzuAdapter to avoid the
case where multiple subprocesses from different environments try to use
the same Kuzu database.
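
A minimal sketch of the shared-lock idea, assuming a key-value store with an atomic set-if-absent operation (the kind of primitive Redis offers through `SET ... NX`). `InMemoryStore`, `SharedLock`, and the key name are hypothetical stand-ins for illustration; this is not the KuzuAdapter code.

```python
import threading
import time
import uuid


class InMemoryStore:
    """Hypothetical stand-in for a shared store with atomic set-if-absent."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()

    def set_if_absent(self, key, value):
        # Atomically claim the key; only the first caller gets True.
        with self._guard:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete_if_equal(self, key, value):
        # Release the key only if the caller still owns it.
        with self._guard:
            if self._data.get(key) == value:
                del self._data[key]
                return True
            return False


class SharedLock:
    """Lock shared by every process that talks to the same store (assumed design)."""

    def __init__(self, store, key, retry_interval=0.01):
        self.store = store
        self.key = key
        self.token = uuid.uuid4().hex  # unique owner token per lock holder
        self.retry_interval = retry_interval

    def acquire(self, timeout=1.0):
        # Poll until the key is free or the timeout expires.
        deadline = time.monotonic() + timeout
        while True:
            if self.store.set_if_absent(self.key, self.token):
                return True
            if time.monotonic() >= deadline:
                return False
            time.sleep(self.retry_interval)

    def release(self):
        return self.store.delete_if_equal(self.key, self.token)
```

The owner token keeps one holder from releasing a lock that another holder acquired.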

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [x] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
None

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-16 15:48:20 +02:00
Andrej Milicevic
6e3370399b Merge branch 'dev' into feature/cog-2985-add-ci-tests-that-run-more-examples 2025-10-16 11:02:48 +02:00
Andrej Milicevic
88cc7af4d7 test: Add a few more examples to the workflow. 2025-10-16 10:50:50 +02:00
Andrej Milicevic
e0663baba4 test: Add test to e2e workflow 2025-10-15 18:17:06 +02:00
Andrej Milicevic
832243034f test: small change in soft delete test 2025-10-13 16:21:19 +02:00
hajdul88
8137ba85e6
Merge branch 'dev' into feature/cog-3123-fix-windows-deletion-test 2025-10-13 10:24:45 +02:00
vasilije
1e90d90a72 Merge branch 'dev' into feature/cog-2871-add-docling-as-data-ingestion-option-to-cognee-add
# Conflicts:
#	.github/workflows/examples_tests.yml
#	poetry.lock
#	uv.lock
2025-10-12 13:06:13 +02:00
hajdul88
faeca138d9
fix: fixes distributed pipeline (#1454)
<!-- .github/pull_request_template.md -->

## Description
This PR fixes the distributed pipeline and updates the core distributed
logic.

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [x] Performance improvement
- [ ] Other (please specify):

## Changes Made
Fixes distributed pipeline:
- Changed spawning logic and added incremental loading to
run_tasks_distributed
- Added batching to consumer nodes
- Fixed consumer stopping criteria by adding a stop signal and its handling
- Changed edge embedding solution to avoid huge network load in a
multicontainer environment
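
The batching and stop-signal points above can be illustrated with a small sketch. It assumes a single consumer reading from an in-process `queue.Queue`; the `STOP` sentinel, `consume`, and `run_demo` names are hypothetical and do not come from the PR.

```python
import queue
import threading

STOP = object()  # hypothetical stop signal placed on the queue by the producer


def consume(work_queue, handle_batch, batch_size=3):
    """Collect items into batches and stop cleanly when STOP arrives."""
    batch = []
    while True:
        item = work_queue.get()
        if item is STOP:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            handle_batch(batch)
            batch = []
    # Flush whatever is left so no item is lost on shutdown.
    if batch:
        handle_batch(batch)


def run_demo():
    work_queue = queue.Queue()
    seen = []
    worker = threading.Thread(
        target=consume,
        args=(work_queue, lambda b: seen.append(list(b))),
    )
    worker.start()
    for i in range(7):
        work_queue.put(i)
    work_queue.put(STOP)  # tell the consumer there is no more work
    worker.join()
    return seen
```

Flushing the partial batch after the stop signal is a design choice of this sketch: without it, the last few items would be dropped silently.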

## Testing
Tested by running 1 GB on Modal, plus manual testing

## Screenshots/Videos (if applicable)
None

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## Related Issues
None

## Additional Notes
None

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Boris <boris@topoteretes.com>
Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
2025-10-09 14:06:25 +02:00
Andrej Milićević
01632988fe
test: replace neo4j usages in cicd with reusable local instances (#1507)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Changed from remote to local Neo4j instance in tests because CI was
failing due to multiple tests using the remote instance in parallel.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Andrej Milicevic <milicevi@Andrejs-MacBook-Pro.local>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2025-10-09 12:39:18 +02:00
Andrej Milicevic
1970e857da chore: fixes based on PR comments 2025-10-09 09:53:27 +02:00
Andrej Milicevic
2932a627bb test: Potential fix for soft deletion test 2025-10-09 09:45:26 +02:00
Daulet Amirkhanov
978549106c chore: use dev env in test_s3 CI 2025-10-07 22:04:26 +01:00
Daulet Amirkhanov
ee079604f4 chore: keep exception handling changes to instructor adapters 2025-10-07 22:03:16 +01:00
Igor Ilic
b4eaea2133 Merge branch 'main' into main-merge-vol7 2025-10-07 19:56:38 +02:00
Igor Ilic
38cdacbcb6
fix: Resolve issue with Gemini adapter (#1494)
<!-- .github/pull_request_template.md -->

## Description
Resolve Gemini Adapter issues:
1. Resolve the embedding batch issue.
2. Resolve slowness caused by the Gemini tokenizer sending one word at a
time to Google's API to count tokens (OpenAI's local tokenizer is now
used to count tokens for Gemini).
3. Update the deprecated library and move to instructor.

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-07 18:04:18 +02:00
Andrej Milicevic
9206d8536b initial changes, still need to work on this. commit so I can checkout to diff branch 2025-10-06 17:45:22 +02:00
Igor Ilic
95fdbab406
refactor: Remove macos13 from ci/cd and support (#1489)
<!-- .github/pull_request_template.md -->

## Description
Remove MacOS13 support and CI/CD tests

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify): Remove MacOS13 support

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-01 18:01:04 +02:00
Andrej Milicevic
45f00b022f test: Renamed s3 test. Commented out docling test. Fails until docling resolves their issue. 2025-09-30 17:22:43 +02:00
Andrej Milicevic
e74ee55137 test: Add test to CI 2025-09-30 12:04:41 +02:00
Igor Ilic
52c978faeb
docs: Multi user authorization example (#1466)
<!-- .github/pull_request_template.md -->

## Description
Add return value of creating role and tenant, add detailed permissions
example to Cognee

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Boris <boris@topoteretes.com>
Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-09-29 20:15:50 +02:00
Vasilije
107b5af6b5
Feature/cog 2979 fix falkordb adapter (#1430)
<!-- .github/pull_request_template.md -->

## Description
<!-- 
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Falkordb adapter didn't work on main repo, but we have it working on
community. Decision was to remove it from main repo, so it is removed.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-09-28 17:02:48 +02:00
Igor Ilic
5528097e29 Merge branch 'main' into merge-main-vol6 2025-09-27 00:06:33 +02:00
Vasilije
235f28aefe
refactor: Rework limit=0 for vector adapters (#1450)
<!-- .github/pull_request_template.md -->

## Description
<!-- 
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Until now, limit=0 in vector search meant that there is no limit and we
should return everything. This caused confusion and errors, so now it is
reworked so that limit=None means no limit on the search. If someone
puts limit=0, there will be no results returned, as it makes more sense
and is less error prone.
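
The new semantics can be sketched as follows. This is an illustration under assumptions, not the adapter code: `apply_limit` is a hypothetical helper, and rejecting negative limits is a choice made here that the PR description does not mention.

```python
from typing import List, Optional, Sequence


def apply_limit(ranked_results: Sequence[str], limit: Optional[int] = None) -> List[str]:
    """Trim already-ranked search results according to the reworked limit rule."""
    if limit is None:
        # No limit: return everything.
        return list(ranked_results)
    if limit < 0:
        # Assumption of this sketch: negative limits are treated as an error.
        raise ValueError("limit must be non-negative or None")
    # limit=0 yields an empty list; any other value caps the result count.
    return list(ranked_results[:limit])
```

For example, `apply_limit(hits, 0)` returns an empty list, while `apply_limit(hits)` returns every hit.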

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-09-25 21:13:41 +02:00
Igor Ilic
50032dd133 fix: install aws for gh action 2025-09-25 16:02:30 +02:00
Igor Ilic
664459e239 refactor: Install baml only for BAML test 2025-09-25 15:30:27 +02:00
Igor Ilic
d2d0d0de4e refactor: install cognee defined baml version for CI/CD 2025-09-25 13:32:09 +02:00
Igor Ilic
8cbc3eb877 Merge branch 'dev' into COG-2826 2025-09-25 13:31:21 +02:00
Hande
fcbb0a8c56 chore: update pr template 2025-09-24 16:20:26 +02:00
Vasilije
f3e04142ca
fix: added auto tagging (#1424)
<!-- .github/pull_request_template.md -->

## Description

Added auto tagging so that core team PRs always get the same label
<!-- 
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [x] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-09-23 13:17:03 +02:00
Andrej Milicevic
9b6e1a8f0c test: Add tests for limit=None search 2025-09-23 12:46:51 +02:00
Chaitany
96eb0d448a
feat(#1357): Lexical chunk retriever (#1392)
<!-- .github/pull_request_template.md -->

## Description
<!-- 
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
I implemented the Lexical Chunk Retriever. The LexicalRetriever class
inherits from BaseRetriever, and the DocumentChunks are lazy-loaded the
first time a query is made, which saves time during object
initialization. The functions get_context and get_completion are
implemented the same way as in ChunksRetriever; the only difference is
that the DocumentChunks are converted to match the output type of
ChunksRetriever using the function get_own_properties in utils.

## Type of Change
<!-- Please check the relevant option -->
- [-] Bug fix (non-breaking change that fixes an issue)
- [-] New feature (non-breaking change that adds functionality)
- [-] Breaking change (fix or feature that would cause existing
functionality to change)
- [-] Documentation update
- [-] Code refactoring
- [-] Performance improvement
- [-] Other (please specify):

## Changes Made
<!-- List the specific changes made in this PR -->
- Added LexicalRetriever base class with customizable tokenizer & scorer
- Implemented caching of DocumentChunk tokens and payloads
- Added robust initialization with error handling and logging
- Implemented get_context with top_k ranking and optional scores
- Implemented get_completion consistent with BaseRetriever interface
- Added JaccardChunksRetriever demo using set/multiset Jaccard
similarity
- Support for stopwords and multiset frequency-aware similarity
- Integrated logging for initialization, scoring, and retrieval
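
The set and multiset Jaccard similarity mentioned in the list above can be sketched as follows. This is an illustration under assumptions, not the JaccardChunksRetriever code: the `tokenize`, `jaccard`, and `top_k` names, the whitespace tokenization, and returning 0.0 when both inputs are empty are all hypothetical choices.

```python
from collections import Counter


def tokenize(text, stopwords=frozenset()):
    # Lowercase whitespace tokenizer that drops stopwords (assumed behavior).
    return [t for t in text.lower().split() if t not in stopwords]


def jaccard(query_tokens, chunk_tokens, multiset=False):
    """Jaccard similarity over token sets, or over token counts when multiset=True."""
    if multiset:
        q, c = Counter(query_tokens), Counter(chunk_tokens)
        intersection = sum((q & c).values())  # per-token minimum counts
        union = sum((q | c).values())         # per-token maximum counts
    else:
        q, c = set(query_tokens), set(chunk_tokens)
        intersection = len(q & c)
        union = len(q | c)
    return intersection / union if union else 0.0


def top_k(query, chunks, k=2, multiset=False, stopwords=frozenset()):
    # Score every chunk against the query and keep the k best (score, chunk) pairs.
    q = tokenize(query, stopwords)
    scored = [(jaccard(q, tokenize(c, stopwords), multiset), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The multiset variant rewards repeated terms, which is why it is described as frequency-aware; the plain set variant only checks whether a term is present.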

## Testing

- Manual tests: initialized retriever, retrieved chunks with toy corpus
- Edge cases: empty corpus, empty query, scorer/tokenizer errors
- Verified Jaccard similarity results for single/multiset cases
- Code formatted and linted


## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [-] **I have tested my changes thoroughly before submitting this PR**
- [-] **This PR contains minimal changes necessary to address the
issue/feature**
- [-] My code follows the project's coding standards and style
guidelines
- [-] I have added tests that prove my fix is effective or that my
feature works
- [-] I have added necessary documentation (if applicable)
- [-] All new and existing tests pass
- [-] I have searched existing PRs to ensure this change hasn't been
submitted already
- [-] I have linked any relevant issues in the description
- [-] My commits have clear and descriptive messages

## Related Issues
<!-- Link any related issues using "Fixes #issue_number" or "Relates to
#issue_number" -->
Relates to #1392

## Additional Notes
<!-- Add any additional notes, concerns, or context for reviewers -->
In cognee/modules/chunking/models/DocumentChunk.py, don't remove the
optional from the is_part_of attribute.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Andrej Milicevic <milicevicandrej@yahoo.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
2025-09-19 18:24:33 +02:00
Andrej Milicevic
c391be41d6 refactor: Remove falkordb adapter from main repo, since we have it on community 2025-09-17 12:09:57 +02:00
vasilije
8ec74c48e7 added auto tagging 2025-09-16 16:16:25 -07:00
Andrej Milicevic
a20fe8ad34 fix Azure error in test 2025-09-12 15:15:06 +02:00
Igor Ilic
8d7738d713 Merge branch 'dev' into feature/cog-2923-create-ci-test-for-fastembed 2025-09-12 13:33:15 +02:00
Igor Ilic
82afdef484
Potential fix for code scanning alert no. 227: Workflow does not contain permissions
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-09-12 07:27:31 -04:00
Andrej Milicevic
b0c5620702 Change how llm tests are called in workflow. Delete unnecessary workflows. 2025-09-12 09:47:27 +02:00
Igor Ilic
52b25882b3 refactor: Move cli tests to run in parallel with all tests 2025-09-11 22:34:26 +02:00
Andrej Milicevic
300d358d4f test name change 2025-09-11 18:23:46 +02:00
Andrej Milicevic
70ee196166 Change amount of tests called in test-suites.yml 2025-09-11 18:20:11 +02:00
Igor Ilic
136b5a2f95 test: Add test for memify pipeline 2025-09-11 17:58:42 +02:00
Andrej Milicevic
a16100f8cd test: Add CI test for fastembed 2025-09-11 17:50:27 +02:00
vasilije
711f3fe070 added github issue templates 2025-09-07 17:15:49 -07:00
vasilije
bb8b47bf34 add fix 2025-09-07 16:37:52 -07:00
xavierdurawa
06a3458982
Merge branch 'dev' into feature/bedrock-llm-provider 2025-09-02 23:08:38 -04:00