Commit graph

2375 commits

Author SHA1 Message Date
hajdul88
94d5175570 feat: adds unit test for the prepare search result - search contract 2025-12-17 10:34:57 +01:00
hajdul88
18d0a41850 Update test_search.py 2025-12-16 17:49:43 +01:00
hajdul88
789fa90790 chore: covering search.py behavior with unit tests 2025-12-16 16:39:31 +01:00
hajdul88
7892b48afe Update test_get_search_type_tools.py 2025-12-16 15:59:15 +01:00
hajdul88
48c2040f3d Delete test_get_search_type_tools_integration.py 2025-12-16 15:45:32 +01:00
hajdul88
757d5fca65
Merge branch 'feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-3' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-4 2025-12-16 15:43:05 +01:00
hajdul88
89ef7d7d15 feat: adds integration test for community registered retriever case 2025-12-16 15:41:13 +01:00
hajdul88
c61ff60e40 feat: add unit tests for get_search_type_tools 2025-12-16 15:37:33 +01:00
hajdul88
0d5b284147
Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-3 2025-12-16 15:18:01 +01:00
Vasilije
12e6ad152e
fix(api): pass run_in_background parameter to memify function (#1847)
## Summary

The `run_in_background` parameter was defined in `MemifyPayloadDTO` but
was never passed to the `cognee_memify` function call, making the
parameter effectively unused.

## Changes

This fix passes the `run_in_background` parameter from the payload to
the `cognee_memify` function so users can actually run memify operations
in the background.
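
A minimal sketch of the kind of fix described above, not the actual router code. `MemifyPayload`, `memify_endpoint`, and the stub `cognee_memify` are stand-ins invented for illustration; only the `run_in_background` parameter name comes from the description.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class MemifyPayload:
    """Hypothetical stand-in for MemifyPayloadDTO."""
    dataset_name: str
    run_in_background: bool = False


async def cognee_memify(dataset_name: str, run_in_background: bool = False) -> dict:
    """Stub standing in for the real memify call; it only echoes its arguments."""
    return {"dataset_name": dataset_name, "run_in_background": run_in_background}


async def memify_endpoint(payload: MemifyPayload) -> dict:
    # If run_in_background were omitted here, the callee would silently
    # fall back to its default and the payload field would have no effect.
    return await cognee_memify(
        dataset_name=payload.dataset_name,
        run_in_background=payload.run_in_background,
    )


result = asyncio.run(memify_endpoint(MemifyPayload("demo", run_in_background=True)))
```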

## Testing

- `uv run ruff check cognee/api/v1/memify/routers/get_memify_router.py`
  - All checks passed
- `uv run ruff format cognee/api/v1/memify/routers/get_memify_router.py`
  - No changes needed

## DCO

I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Fixed background execution flag for memify operations to be properly
applied when requested. The background execution setting is now
correctly propagated through the system, ensuring operations run as
intended.


<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-16 15:14:52 +01:00
hajdul88
aad4d0cdde
Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-3 2025-12-16 14:56:24 +01:00
Vasilije
412b6467da
feat(database): add connect_args support to SqlAlchemyAdapter (#1861)
- Add optional connect_args parameter to __init__ method
- Support DATABASE_CONNECT_ARGS environment variable for JSON-based
configuration
- Enable custom connection parameters for all database engines (SQLite
and PostgreSQL)
- Maintain backward compatibility with existing code
- Add proper error handling and validation for environment variable
parsing

<!-- .github/pull_request_template.md -->
## Description
The intent of this PR is to make database initialization more flexible
and configurable. To do this, the system supports a new
DATABASE_CONNECT_ARGS environment variable that takes JSON-based
configuration. This allows custom connection parameters to be passed to
any supported database engine, including SQLite and PostgreSQL. To
ensure that the environment variable is parsed securely and
consistently, appropriate error handling and validation are also added.
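
As an illustration only, parsing such a variable could look like the sketch below. The helper name `parse_connect_args` and the choice to fall back to an empty dict on invalid input are assumptions, not the adapter's confirmed behavior.

```python
import json
import os


def parse_connect_args(raw):
    """Parse a JSON object string into a dict.

    Assumption for this sketch: unset, malformed, or non-object input
    yields an empty dict instead of raising.
    """
    if not raw:
        return {}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


connect_args = parse_connect_args(os.environ.get("DATABASE_CONNECT_ARGS"))
# With SQLAlchemy, a dict like this is usually forwarded as
# create_engine(url, connect_args=connect_args).
```

For example, setting `DATABASE_CONNECT_ARGS='{"timeout": 30}'` would produce `{"timeout": 30}` under these assumptions.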

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [x] Breaking change (fix or feature that would cause existing
functionality to change)
- [x] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Advanced database connection configuration through the optional
DATABASE_CONNECT_ARGS environment variable, supporting custom settings
such as SSL certificates and timeout configurations.
* Custom connection arguments can now be passed to relational database
adapters.

* **Tests**
* Comprehensive unit test suite for database connection argument parsing
and validation.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-16 14:50:27 +01:00
hajdul88
646894d7c5 Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-3 2025-12-16 12:04:11 +01:00
hajdul88
b4aaa7faef
chore: retriever test reorganization + adding new tests (smoke e2e) (STEP 1.5) (#1888)
<!-- .github/pull_request_template.md -->

This PR restructures the end-to-end tests for the multi-database search
layer to improve maintainability, consistency, and coverage across
supported Python versions and database settings.

Key Changes

- Migrates the existing E2E tests to pytest for a more standard and
extensible testing framework.
- Introduces pytest fixtures to centralize and reuse test setup logic.
- Implements proper event loop management to support multiple
asynchronous pytest tests reliably.
- Improves SQLAlchemy handling in tests, ensuring clean setup and
teardown of database state.
- Extends multi-database E2E test coverage across all supported Python
versions.

Benefits

- Cleaner and more modular test structure.
- Reduced duplication and clearer test intent through fixtures.
- More reliable async test execution.
- Better alignment with our supported Python version matrix.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
* Expanded end-to-end test suite for the search database with
comprehensive setup/teardown, new session-scoped fixtures, and multiple
tests validating graph/vector consistency, retriever contexts, triplet
metadata, search result shapes, side effects, and feedback-weight
behavior.

* **Chores**
* CI updated to run matrixed test jobs across multiple Python versions
and standardize test execution for more consistent, parallelized runs.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-16 11:59:33 +01:00
hajdul88
4e8845c117
chore: retriever test reorganization + adding new tests (integration) (STEP 1) (#1881)
<!-- .github/pull_request_template.md -->

## Description
This PR restructures/adds integration and unit tests for the retrieval
module.

- Old integration tests were updated and moved under unit tests, and
fixtures were added
- Added missing unit tests for all core retrieval business logic
- Covered 100% of the core retrievers with tests
- Minor changes (dead code deletion, typo fix)

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Changes**
* TripletRetriever now returns up to 5 results by default (was 1),
providing richer context.

* **Tests**
* Reorganized test coverage: many unit tests removed and replaced with
comprehensive integration tests across retrieval components (graph,
chunks, RAG, summaries, temporal, triplets, structured output).

* **Chores**
  * Simplified triplet formatting logic and removed debug output.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-16 11:11:29 +01:00
hajdul88
622f8fa79e
chore: introduces 1 file upload in ontology endpoint (#1899)
<!-- .github/pull_request_template.md -->

## Description
This PR fixes the ontology upload endpoint by enforcing one file upload
at a time. Tests are adjusted in both the server start test and the
ontology endpoint unit test. The API was tested.
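
A rough sketch of a single-file check, assuming a hypothetical helper `validate_ontology_upload` that returns a status code and message. The real endpoint's function names and error texts are not shown in this log.

```python
def validate_ontology_upload(files, description=None):
    """Return (status_code, message) for a hypothetical upload form.

    `files` is the list of uploaded file names; `description` is an
    optional plain string.
    """
    if len(files) != 1:
        return 400, "Exactly one ontology file must be uploaded per request"
    if description is not None and not isinstance(description, str):
        return 400, "description must be a plain string"
    return 200, "ok"
```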

Do not merge this together with
https://github.com/topoteretes/cognee/pull/1898; it is either that one
or this one.



## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **API Changes**
* Ontology upload now accepts exactly one file per request; field
renamed from "descriptions" to "description" and validated as a plain
string.
* Stricter form validation and tighter 400/500 error handling for
malformed submissions.

* **Tests**
* Tests converted to real HTTP-style interactions using a shared test
client and dependency overrides.
* Payloads now use plain string fields; added coverage for single-file
constraints and specific error responses.

* **Style**
  * Minor formatting cleanups with no functional impact.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 18:30:35 +01:00
Igor Ilic
14d9540d1b
feat: Add database deletion on dataset delete (#1893)
<!-- .github/pull_request_template.md -->

## Description
- Add support for database deletion when dataset is deleted
- Simplify dataset handler usage in Cognee

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Improved dataset deletion: stronger authorization checks and reliable
removal of associated graph and vector storage.

* **Tests**
* Added end-to-end test to verify complete dataset deletion and cleanup
of all related storage components.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 18:15:48 +01:00
Andrej Milicevic
433170fe09 merge dev 2025-12-15 17:06:20 +01:00
hajdul88
bad22ba26b
chore: adds id generation to memify triplet embedding pipeline (#1895)
<!-- .github/pull_request_template.md -->

## Description
This PR adds ID generation to the Triplet objects in the triplet
embedding memify pipeline. Previously, in some edge cases, duplicated
elements could be ingested into the collection.
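
One common way to get this kind of deduplication is a deterministic ID derived from the triplet's content. The sketch below assumes that approach; the function name `triplet_id`, the key format, and the use of `uuid5` are illustrative and not taken from the PR.

```python
import uuid


def triplet_id(source_id: str, relationship: str, target_id: str) -> uuid.UUID:
    """Derive a stable ID from the triplet's parts (assumed scheme)."""
    key = f"{source_id}|{relationship}|{target_id}"
    return uuid.uuid5(uuid.NAMESPACE_OID, key)


collection = {}
triplets = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "bob"),  # duplicate on purpose
    ("bob", "knows", "carol"),
]
for source, relation, target in triplets:
    # An identical triplet maps to the same ID and overwrites its earlier
    # entry instead of being stored twice.
    collection[triplet_id(source, relation, target)] = (source, relation, target)
```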



## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **Enhancements**
* Relationship data now includes unique identifiers for improved
tracking and data management capabilities.


<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 15:45:35 +01:00
Vasilije
69e36cc834
feat: add bedrock as supported llm provider (#1830)
<!-- .github/pull_request_template.md -->

## Description
Added support for AWS Bedrock, and the models that are available there.
This was a contributor PR that was never finished, so now I polished it
up and made it work.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added AWS Bedrock as a new LLM provider with support for multiple
authentication methods.
* Integrated three new AI models: Claude 4.5 Sonnet, Claude 4.5 Haiku,
and Amazon Nova Lite.


<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 14:33:57 +01:00
Igor Ilic
c94225f505
fix: make ontology key an optional param in cognify (#1894)
<!-- .github/pull_request_template.md -->

## Description
Make the ontology key optional in Swagger and None by default (before
the change it defaulted to "string", which caused issues when running
the cognify endpoint).

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Documentation**
* Enhanced API documentation with additional examples and validation
metadata to improve request clarity and validation guidance.


<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 14:30:22 +01:00
hajdul88
fa035f42f4 chore: adds back accidentally deleted structured output test 2025-12-12 16:47:58 +01:00
hajdul88
fd23c75c09 chore: adds new Unit tests for retrievers 2025-12-12 14:44:41 +01:00
Igor Ilic
127d9860df
feat: Add dataset database handler info (#1887)
<!-- .github/pull_request_template.md -->

## Description
Add info on which dataset database handler is used for each dataset
database.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Datasets now record their assigned vector and graph database handlers,
allowing per-dataset backend selection.

* **Chores**
  * Database schema expanded to store handler identifiers per dataset.
* Deletion/cleanup processes now use dataset-level handler info for
accurate removal across backends.

* **Tests**
* Tests updated to include and validate the new handler fields in
dataset creation outputs.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-12 13:22:03 +01:00
Igor Ilic
ede884e0b0
feat: make pipeline processing cache optional (#1876)
<!-- .github/pull_request_template.md -->

## Description
Make the pipeline cache mechanism optional. It is turned off by default,
but add and cognify keep using it as they have until now.
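
To make the behavior concrete, here is a small sketch of an opt-in cache around a pipeline run. The names `run_pipeline`, `use_pipeline_cache`, and `_completed_runs` are hypothetical and only illustrate the idea of skipping work when a cached run exists.

```python
_completed_runs = set()  # (pipeline name, dataset) pairs that already ran


def run_pipeline(name, dataset, task, use_pipeline_cache=False):
    """Run `task` on `dataset`, skipping it only when caching is enabled
    and the same pipeline already completed for that dataset."""
    key = (name, dataset)
    if use_pipeline_cache and key in _completed_runs:
        return "skipped"
    task(dataset)
    _completed_runs.add(key)
    return "executed"


calls = []
first = run_pipeline("cognify", "ds1", calls.append, use_pipeline_cache=True)
second = run_pipeline("cognify", "ds1", calls.append, use_pipeline_cache=True)
third = run_pipeline("cognify", "ds1", calls.append)  # cache off by default
```

With the cache enabled the second call exits early; with the default setting the pipeline runs again.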

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Introduced pipeline caching across ingestion, processing, and custom
pipeline flows with per-run controls to enable or disable caching.
  * Added an option for incremental loading in custom pipeline runs.

* **Behavior Changes**
* One pipeline path now explicitly bypasses caching by default to always
re-run when invoked.
* Disabling cache forces re-processing instead of early exit; cache
reset still enables re-execution.

* **Tests**
* Added tests validating caching, non-caching, and cache-reset
re-execution behavior.

* **Chores**
  * Added CI job to run pipeline caching tests.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-12 13:11:31 +01:00
Igor Ilic
59f8d12fa3 Merge branch 'main' into merge-main-vol7 2025-12-11 19:11:24 +01:00
Andrej Milicevic
af8c5bedcc feat: add kwargs to other adapters 2025-12-11 17:47:23 +01:00
Igor Ilic
46ddd4fd12
feat: add dataset database handler logic and neo4j/lancedb/kuzu handlers (#1776)
<!-- .github/pull_request_template.md -->

## Description
Add the ability to use multi-tenant, multi-user mode with Neo4j.
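
The release notes for this commit name the `GRAPH_DATASET_DATABASE_HANDLER` and `VECTOR_DATASET_DATABASE_HANDLER` environment variables. A possible shape for env-driven handler selection is sketched below; the registry contents, the handler keys, the default value, and `select_handler` itself are assumptions made for illustration.

```python
import os

# Hypothetical registry: each handler maps a dataset ID to a database location.
GRAPH_HANDLERS = {
    "kuzu": lambda dataset_id: f"kuzu:{dataset_id}",
    "neo4j_aura_dev": lambda dataset_id: f"neo4j:{dataset_id}",
}


def select_handler(registry, env_var, default, environ=None):
    """Pick a handler by the name stored in `env_var`, else use `default`."""
    environ = os.environ if environ is None else environ
    name = environ.get(env_var, default)
    if name not in registry:
        raise ValueError(f"Unknown handler {name!r} for {env_var}")
    return registry[name]


handler = select_handler(
    GRAPH_HANDLERS,
    "GRAPH_DATASET_DATABASE_HANDLER",
    default="kuzu",  # assumed default for this sketch
    environ={"GRAPH_DATASET_DATABASE_HANDLER": "neo4j_aura_dev"},
)
location = handler("dataset-123")
```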

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

* **New Features**
* Multi-user support with per-dataset database isolation enabled by
default, allowing backend access control for secure data separation.
* Configurable database handlers via environment variables
(GRAPH_DATASET_DATABASE_HANDLER, VECTOR_DATASET_DATABASE_HANDLER) for
flexible deployment options.

* **Chores**
* Database schema migration to support per-user dataset database
configurations.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-11 14:15:20 +01:00
Igor Ilic
0a1ed79340 refactor: change neo4j_aura to neo4j_aura_dev 2025-12-11 13:05:23 +01:00
Pavel Zorin
fe7e97be45
Chore: Remove Ontology file size limit. Code duplications (#1880)
<!-- .github/pull_request_template.md -->

## Description
We received a complaint about the 10MB file size limit, so this PR
removes it. It also removes code duplication and makes types stricter.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Support for supplying optional per-file descriptions when uploading
multiple ontologies.

* **Improvements**
* Removed the 10MB file size limit for ontology uploads, allowing larger
files.
* Streamlined and more robust upload handling with improved per-file
validation and safer upload behavior.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-11 10:49:55 +01:00
Pavel Zorin
88f61f9bdb Added filename check 2025-12-10 17:24:31 +01:00
hajdul88
001fbe699e
feat: Adds edge centered payload and embedding structure during ingestion (#1853)
<!-- .github/pull_request_template.md -->

## Description
This pull request introduces edge‑centered payloads to the ingestion
process. Payloads are stored in the Triplet_text collection which is
compatible with the triplet_embedding memify pipeline.

Changes in This PR:

- Refactored custom edge handling: custom edges can now be passed to the
add_data_points method, so ingestion is centralized in one place.
- Added private methods to handle edge-centered payload creation inside
add_data_points.py
- Added unit tests to cover the new functionality
- Added integration tests
- Added e2e tests
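
As a rough illustration of what an edge-centered payload could look like, here is a minimal sketch. It is not the code from this PR: `Node`, `build_triplet_payloads`, the edge tuple layout, and the payload keys are all hypothetical names chosen for the example. It only shows the general idea of combining an edge label with the text of both connected nodes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical minimal node: an id plus the text to embed.
    id: str
    text: str


def build_triplet_payloads(nodes, edges):
    """Build one text payload per edge from the edge label and both endpoint texts.

    `edges` is assumed to be an iterable of (source_id, target_id, relationship).
    """
    nodes_by_id = {node.id: node for node in nodes}
    payloads = []
    for source_id, target_id, relationship in edges:
        source = nodes_by_id.get(source_id)
        target = nodes_by_id.get(target_id)
        if source is None or target is None:
            # Skip edges whose endpoints are not part of this batch.
            continue
        payloads.append(
            {
                "source_id": source_id,
                "target_id": target_id,
                "relationship": relationship,
                "text": f"{source.text} {relationship} {target.text}",
            }
        )
    return payloads
```

In this sketch each resulting `text` value is what would be embedded and stored, one entry per edge.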

Acceptance Criteria and Testing
Scenario 1:
- Set TRIPLET_EMBEDDING env var to True
- Run prune, add, cognify
- Verify the vector DB contains a non-empty Triplet_text collection and
that the number of triplets matches the number of edges in the graph
database
- Use the new triplet_completion search type and confirm it works
correctly.

Scenario 2:
- Set TRIPLET_EMBEDDING env var to False
- Run prune, add, cognify
- Verify the vector DB does not have the Triplet_text collection
- You should receive an error indicating that the Triplet_text
collection is not available
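
The two scenarios can be summarized as a small check, sketched below under stated assumptions: that the toggle is read from the environment as a truthy string, and that the second scenario describes the toggle being off. `triplet_embedding_enabled` and `scenario_holds` are hypothetical helpers, not functions from the codebase, and `triplet_count=None` stands in for "the Triplet_text collection does not exist".

```python
import os


def triplet_embedding_enabled(environ=None):
    """Return True when TRIPLET_EMBEDDING is set to a truthy string (assumed parsing)."""
    environ = os.environ if environ is None else environ
    value = environ.get("TRIPLET_EMBEDDING", "false")
    return value.strip().lower() in ("1", "true", "yes")


def scenario_holds(enabled, triplet_count, edge_count):
    """Check the acceptance criteria for one run.

    triplet_count is None when the Triplet_text collection is absent.
    """
    if enabled:
        # Enabled: collection exists, is non-empty, and has one triplet per edge.
        return triplet_count is not None and triplet_count > 0 and triplet_count == edge_count
    # Disabled: the collection should not exist at all.
    return triplet_count is None
```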


## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Triplet embeddings supported—embeddings created from graph edges plus
connected node text
  * Ability to supply custom edges when adding data points
  * New configuration toggle to enable/disable triplet embedding

* **Tests**
* Added comprehensive unit and end-to-end tests for edge-centered
payloads and triplet embedding
  * New CI job to run the edge-centered payload e2e test

* **Bug Fixes**
* Adjusted server start behavior to surface process output in parent
logs

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Pavel Zorin <pazonec@yandex.ru>
2025-12-10 17:10:06 +01:00
ketanjain3
2de1bd977d
Merge branch 'dev' into feature/sqlalchemy-custom-connect-args 2025-12-09 23:53:06 +05:30
Pavel Zorin
2ca194c28f fix format 2025-12-09 18:22:44 +01:00
Pavel Zorin
d932ee4bd9 Specify file type 2025-12-09 17:58:34 +01:00
Pavel Zorin
d0b914acaa Chore: Remove Ontology file size limit. Code duplications 2025-12-09 17:55:43 +01:00
ketanjain7981
e1d313a46b move DATABASE_CONNECT_ARGS parsing to RelationalConfig
Signed-off-by: ketanjain7981 <ketan.jain@think41.com>
2025-12-09 10:15:36 +05:30
lxobr
c04d255aca feat: remove secondary search 2025-12-08 17:29:25 +01:00
Vasilije
75fea8dcc8
Removed check_permissions_on_dataset.py and related references (#1786)
<!-- .github/pull_request_template.md -->

## Description
This PR removes the obsolete `check_permissions_on_dataset` task and all
its related imports and usages across the codebase.
The authorization logic is now handled earlier in the pipeline, so this
task is no longer needed.
These changes simplify the default Cognify pipeline and make the code
cleaner and easier to maintain.

### Changes Made
- Removed `cognee/tasks/documents/check_permissions_on_dataset.py` 
- Removed import from `cognee/tasks/documents/__init__.py` 
- Removed import and usage in `cognee/api/v1/cognify/cognify.py` 
- Removed import and usage in
`cognee/eval_framework/corpus_builder/task_getters/get_cascade_graph_tasks.py`
- Updated comments in
`cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py`
(index positions changed)
- Removed usage in `notebooks/cognee_demo.ipynb` 
- Updated documentation in `examples/python/simple_example.py` (process
description)

---

## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [x] Other (please specify): Task removal / cleanup of deprecated
function

---

## Pre-submission Checklist
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue**
- [x] My code follows the project's coding standards and style
guidelines
- [ ] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description (Closes
#1771)
- [x] My commits have clear and descriptive messages

---

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-12-08 05:43:42 +01:00
Vasilije
7a3138edf8
fix: remove double quotes from llmconfig str params (#1758)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Recently, a few cryptic errors like the one in issue #1721 have occurred
across cognee use cases.

While debugging #1721, however, I found that LLM_API_KEY can end up with
`"` quotation marks as part of its value, for example when they are
already part of the ENV:
<img width="1014" height="507" alt="Screenshot 2025-11-07 at 16 58 22"
src="https://github.com/user-attachments/assets/54b7cbb0-5bdc-4b40-b2b1-aed6c5d3d886"
/>

Then they make their way into Cognee and get treated as part of the API
key.

By default, we do no sanitization or cleanup.

Most of the time, quotation marks get handled for us:
1. `export KEY="VALUE"` will strip them
2. python dotenv will strip them if read from `.env`

However, issues like https://github.com/docker/cli/issues/3630 and #1721
demonstrate that we need some handling on our end instead of assuming
the quotes are stripped.

## This PR

This PR sets up a list of string params we want to strip, plus some that
we may want to strip.

We may want to avoid doing this for all params, which is why I went with
a selective approach.

TODO: add testing
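
For illustration only, the selective approach could look roughly like the sketch below. It is not the code in this PR: `strip_double_quotes`, `sanitize_params`, and the names in `PARAMS_TO_STRIP` are hypothetical, and the real list of params may differ. Only one pair of surrounding double quotes is removed, and internal quotes are left alone.

```python
# Hypothetical allow-list of string params to sanitize; the real list may differ.
PARAMS_TO_STRIP = {"llm_api_key", "llm_endpoint", "llm_model"}


def strip_double_quotes(value):
    """Remove one pair of surrounding double quotes, if present."""
    if (
        isinstance(value, str)
        and len(value) >= 2
        and value.startswith('"')
        and value.endswith('"')
    ):
        return value[1:-1]
    return value


def sanitize_params(params):
    """Strip surrounding quotes only from the selected params; leave the rest untouched."""
    return {
        name: strip_double_quotes(value) if name in PARAMS_TO_STRIP else value
        for name, value in params.items()
    }
```

Keeping the allow-list explicit means a param whose value legitimately starts and ends with a quote is not silently changed.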

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Configuration values with surrounding quotes are now automatically
normalized and cleaned during system initialization, ensuring consistent
and predictable data handling across all configuration parameters.

* **Tests**
* Added comprehensive unit tests to validate automatic quote removal
from configuration values, covering various scenarios including quoted,
unquoted, empty, and edge cases with mixed and internal quotes.


<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-08 05:10:23 +01:00
Vasilije
40bbdd1ac7
fix: install nvm and node for -ui cli command (#1836)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Enhanced Node.js and npm environment management for improved system
compatibility on Unix-like platforms.

* **Chores**
* Updated Next.js to v16, React to v19.2, and Auth0 SDK to v4.13.1 for
compatibility and performance improvements.
  * Removed CrewAI workflow trigger component.
  * Removed user feedback submission form.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-08 05:09:49 +01:00
Igor Ilic
2f572ae509 test: Update embedding limiter test 2025-12-05 19:18:48 +01:00
Igor Ilic
a66b2ceeca refactor: reduce amount of retry attempts for baml llm calls 2025-12-05 18:58:59 +01:00
Igor Ilic
7deaa6e8e9 feat: Add RPM limiting to Cognee 2025-12-05 18:56:34 +01:00
Igor Ilic
0c97a400b0 feat: Add RPM control 2025-12-05 15:40:24 +01:00
Igor Ilic
5d0586da28
Merge branch 'dev' into baml-rate-limit-handling 2025-12-05 13:24:07 +01:00
hajdul88
d5bf5cf4e9
fix: fixes lancedb batch handling (#1872)
<!-- .github/pull_request_template.md -->

## Description
Fixes a lancedb batch handling issue: duplicated elements could appear
in the collections when duplicates occurred within the same insert
batch.
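
A minimal sketch of in-batch deduplication is shown below, assuming each item is a dict with an `id` key and that the last occurrence in the batch is the one to keep. `deduplicate_batch` is a hypothetical helper for illustration and does not reflect the adapter's actual code.

```python
def deduplicate_batch(data_points):
    """Keep only the last occurrence of each id within a single insert batch."""
    latest_by_id = {}
    for data_point in data_points:
        # A later item with the same id overwrites the earlier one.
        latest_by_id[data_point["id"]] = data_point
    return list(latest_by_id.values())
```

Because Python dicts preserve insertion order, the result keeps ids in first-seen order while holding the latest payload for each id.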

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Improved data integrity by implementing deduplication logic to
eliminate duplicate entries and ensure only the latest version is
retained.


<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-05 12:26:45 +01:00
Vasilije
9571641199
refactor: move codify pipeline out of main repo (#1738)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
This PR moves codify and the code graph pipeline out of the repository.
It also introduces a Custom Pipeline interface, which can be used in the
future to define custom pipelines.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-12-04 23:10:39 -08:00
ketanjain3
654a573454
Merge branch 'dev' into feature/sqlalchemy-custom-connect-args 2025-12-04 23:47:39 +05:30
Igor Ilic
7d7f8a249a
Merge branch 'dev' into main-merge-vol4 2025-12-04 10:32:10 +01:00