cognee

Author	SHA1	Message	Date
hajdul88	94d5175570	feat: adds unit test for the prepare search result - search contract	2025-12-17 10:34:57 +01:00
hajdul88	18d0a41850	Update test_search.py	2025-12-16 17:49:43 +01:00
hajdul88	789fa90790	chore: covering search.py behavior with unit tests	2025-12-16 16:39:31 +01:00
hajdul88	7892b48afe	Update test_get_search_type_tools.py	2025-12-16 15:59:15 +01:00
hajdul88	48c2040f3d	Delete test_get_search_type_tools_integration.py	2025-12-16 15:45:32 +01:00
hajdul88	757d5fca65	Merge branch 'feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-3' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-4	2025-12-16 15:43:05 +01:00
hajdul88	89ef7d7d15	feat: adds integration test for community registered retriever case	2025-12-16 15:41:13 +01:00
hajdul88	c61ff60e40	feat: add unit tests for get_search_type_tools	2025-12-16 15:37:33 +01:00
hajdul88	0d5b284147	Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-3	2025-12-16 15:18:01 +01:00
Vasilije	12e6ad152e	fix(api): pass run_in_background parameter to memify function (#1847) ## Summary The `run_in_background` parameter was defined in `MemifyPayloadDTO` but was never passed to the `cognee_memify` function call, making the parameter effectively unused. ## Changes This fix passes the `run_in_background` parameter from the payload to the `cognee_memify` function so users can actually run memify operations in the background. ## Testing - `uv run ruff check cognee/api/v1/memify/routers/get_memify_router.py` - All checks passed - `uv run ruff format cognee/api/v1/memify/routers/get_memify_router.py` - No changes needed ## DCO I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit * Bug Fixes * Fixed background execution flag for memify operations to be properly applied when requested. The background execution setting is now correctly propagated through the system, ensuring operations run as intended.	2025-12-16 15:14:52 +01:00
hajdul88	aad4d0cdde	Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-3	2025-12-16 14:56:24 +01:00
Vasilije	412b6467da	feat(database): add connect_args support to SqlAlchemyAdapter (#1861) - Add optional connect_args parameter to __init__ method - Support DATABASE_CONNECT_ARGS environment variable for JSON-based configuration - Enable custom connection parameters for all database engines (SQLite and PostgreSQL) - Maintain backward compatibility with existing code - Add proper error handling and validation for environment variable parsing ## Description The intent of this PR is to make the database initialization more flexible and configurable. To do this, the system will support a new DATABASE_CONNECT_ARGS environment variable that takes JSON-based configuration. This enhancement will allow custom connection parameters to be passed to any supported database engine, including SQLite and PostgreSQL. To guarantee that the environment variable is parsed securely and consistently, appropriate error handling and validation will also be added. 
## Type of Change - [x] Bug fix (non-breaking change that fixes an issue) - [x] New feature (non-breaking change that adds functionality) - [x] Breaking change (fix or feature that would cause existing functionality to change) - [x] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Pre-submission Checklist - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit * New Features * Advanced database connection configuration through the optional DATABASE_CONNECT_ARGS environment variable, supporting custom settings such as SSL certificates and timeout configurations. * Custom connection arguments can now be passed to relational database adapters. * Tests * Comprehensive unit test suite for database connection argument parsing and validation.	2025-12-16 14:50:27 +01:00
hajdul88	646894d7c5	Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-3	2025-12-16 12:04:11 +01:00
hajdul88	b4aaa7faef	chore: retriever test reorganization + adding new tests (smoke e2e) (STEP 1.5) (#1888 ) <!-- .github/pull_request_template.md --> This PR restructures the end-to-end tests for the multi-database search layer to improve maintainability, consistency, and coverage across supported Python versions and database settings. Key Changes -Migrates the existing E2E tests to pytest for a more standard and extensible testing framework. -Introduces pytest fixtures to centralize and reuse test setup logic. -Implements proper event loop management to support multiple asynchronous pytest tests reliably. -Improves SQLAlchemy handling in tests, ensuring clean setup and teardown of database state. -Extends multi-database E2E test coverage across all supported Python versions. Benefits -Cleaner and more modular test structure. -Reduced duplication and clearer test intent through fixtures. -More reliable async test execution. -Better alignment with our supported Python version matrix. ## Type of Change <!-- Please check the relevant option --> - [ ] Bug fix (non-breaking change that fixes an issue) - [x] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [x] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched 
existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit * Tests * Expanded end-to-end test suite for the search database with comprehensive setup/teardown, new session-scoped fixtures, and multiple tests validating graph/vector consistency, retriever contexts, triplet metadata, search result shapes, side effects, and feedback-weight behavior. * Chores * CI updated to run matrixed test jobs across multiple Python versions and standardize test execution for more consistent, parallelized runs.	2025-12-16 11:59:33 +01:00
hajdul88	4e8845c117	chore: retriever test reorganization + adding new tests (integration) (STEP 1) (#1881 ) <!-- .github/pull_request_template.md --> ## Description This PR restructures/adds integration and unit tests for the retrieval module. -Old integration tests were updated and moved under unit tests + fixtures added -Added missing unit tests for all core retrieval business logic -Covered 100% of the core retrievers with tests -Minor changes (dead code deletion, typo fixed) ## Type of Change <!-- Please check the relevant option --> - [ ] Bug fix (non-breaking change that fixes an issue) - [x] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [x] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. 
## Summary by CodeRabbit * Changes * TripletRetriever now returns up to 5 results by default (was 1), providing richer context. * Tests * Reorganized test coverage: many unit tests removed and replaced with comprehensive integration tests across retrieval components (graph, chunks, RAG, summaries, temporal, triplets, structured output). * Chores * Simplified triplet formatting logic and removed debug output.	2025-12-16 11:11:29 +01:00
hajdul88	622f8fa79e	chore: introduces 1 file upload in ontology endpoint (#1899) ## Description This PR fixes the ontology upload endpoint by forcing one file upload at a time. Tests are adjusted in both the server start test and the ontology endpoint unit test. The API was tested. Do not merge it together with https://github.com/topoteretes/cognee/pull/1898; it's either that one or this one. ## Type of Change - [x] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Pre-submission Checklist - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. 
## Summary by CodeRabbit * API Changes * Ontology upload now accepts exactly one file per request; field renamed from "descriptions" to "description" and validated as a plain string. * Stricter form validation and tighter 400/500 error handling for malformed submissions. * Tests * Tests converted to real HTTP-style interactions using a shared test client and dependency overrides. * Payloads now use plain string fields; added coverage for single-file constraints and specific error responses. * Style * Minor formatting cleanups with no functional impact.	2025-12-15 18:30:35 +01:00
Igor Ilic	14d9540d1b	feat: Add database deletion on dataset delete (#1893 ) <!-- .github/pull_request_template.md --> ## Description - Add support for database deletion when dataset is deleted - Simplify dataset handler usage in Cognee ## Type of Change <!-- Please check the relevant option --> - [x] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [ ] I have tested my changes thoroughly before submitting this PR - [ ] This PR contains minimal changes necessary to address the issue/feature - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * Bug Fixes * Improved dataset deletion: stronger authorization checks and reliable removal of associated graph and vector storage. * Tests * Added end-to-end test to verify complete dataset deletion and cleanup of all related storage components. 
	2025-12-15 18:15:48 +01:00
Andrej Milicevic	433170fe09	merge dev	2025-12-15 17:06:20 +01:00
hajdul88	bad22ba26b	chore: adds id generation to memify triplet embedding pipeline (#1895 ) <!-- .github/pull_request_template.md --> ## Description This PR adds id generation to the Triplet objects in triplet embedding memify pipeline. In some edge cases duplicated elements could have been ingested into the collection ## Type of Change <!-- Please check the relevant option --> - [x] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * Enhancements * Relationship data now includes unique identifiers for improved tracking and data management capabilities. 
	2025-12-15 15:45:35 +01:00
Vasilije	69e36cc834	feat: add bedrock as supported llm provider (#1830 ) <!-- .github/pull_request_template.md --> ## Description <!-- Please provide a clear, human-generated description of the changes in this PR. DO NOT use AI-generated descriptions. We want to understand your thought process and reasoning. --> Added support for AWS Bedrock, and the models that are available there. This was a contributor PR that was never finished, so now I polished it up and made it work. ## Type of Change <!-- Please check the relevant option --> - [ ] Bug fix (non-breaking change that fixes an issue) - [x] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. 
## Summary by CodeRabbit * New Features * Added AWS Bedrock as a new LLM provider with support for multiple authentication methods. * Integrated three new AI models: Claude 4.5 Sonnet, Claude 4.5 Haiku, and Amazon Nova Lite.	2025-12-15 14:33:57 +01:00
Igor Ilic	c94225f505	fix: make ontology key an optional param in cognify (#1894 ) <!-- .github/pull_request_template.md --> ## Description Make ontology key optional in Swagger and None by default (it was "string" by default before change which was causing issues when running cognify endpoint) ## Type of Change <!-- Please check the relevant option --> - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [ ] I have tested my changes thoroughly before submitting this PR - [ ] This PR contains minimal changes necessary to address the issue/feature - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * Documentation * Enhanced API documentation with additional examples and validation metadata to improve request clarity and validation guidance. 
	2025-12-15 14:30:22 +01:00
hajdul88	fa035f42f4	chore: adds back accidentally deleted structured output test	2025-12-12 16:47:58 +01:00
hajdul88	fd23c75c09	chore: adds new Unit tests for retrievers	2025-12-12 14:44:41 +01:00
Igor Ilic	127d9860df	feat: Add dataset database handler info (#1887 ) <!-- .github/pull_request_template.md --> ## Description Add info on dataset database handler used for dataset database ## Type of Change <!-- Please check the relevant option --> - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [ ] I have tested my changes thoroughly before submitting this PR - [ ] This PR contains minimal changes necessary to address the issue/feature - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * New Features * Datasets now record their assigned vector and graph database handlers, allowing per-dataset backend selection. * Chores * Database schema expanded to store handler identifiers per dataset. * Deletion/cleanup processes now use dataset-level handler info for accurate removal across backends. 
* Tests * Tests updated to include and validate the new handler fields in dataset creation outputs.	2025-12-12 13:22:03 +01:00
Igor Ilic	ede884e0b0	feat: make pipeline processing cache optional (#1876) ## Description Make the pipeline cache mechanism optional. It is turned off by default, but add and cognify keep using it as they have until now. ## Type of Change - [ ] Bug fix (non-breaking change that fixes an issue) - [x] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Pre-submission Checklist - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit * New Features * Introduced pipeline caching across ingestion, processing, and custom pipeline flows with per-run controls to enable or disable caching. * Added an option for incremental loading in custom pipeline runs. * Behavior Changes * One pipeline path now explicitly bypasses caching by default to always re-run when invoked. 
* Disabling cache forces re-processing instead of early exit; cache reset still enables re-execution. * Tests * Added tests validating caching, non-caching, and cache-reset re-execution behavior. * Chores * Added CI job to run pipeline caching tests.	2025-12-12 13:11:31 +01:00
Igor Ilic	59f8d12fa3	Merge branch 'main' into merge-main-vol7	2025-12-11 19:11:24 +01:00
Andrej Milicevic	af8c5bedcc	feat: add kwargs to other adapters	2025-12-11 17:47:23 +01:00
Igor Ilic	46ddd4fd12	feat: add dataset database handler logic and neo4j/lancedb/kuzu handlers (#1776 ) <!-- .github/pull_request_template.md --> ## Description Add ability to use multi tenant multi user mode with Neo4j ## Type of Change <!-- Please check the relevant option --> - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [ ] I have tested my changes thoroughly before submitting this PR - [ ] This PR contains minimal changes necessary to address the issue/feature - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * New Features * Multi-user support with per-dataset database isolation enabled by default, allowing backend access control for secure data separation. * Configurable database handlers via environment variables (GRAPH_DATASET_DATABASE_HANDLER, VECTOR_DATASET_DATABASE_HANDLER) for flexible deployment options. 
* Chores * Database schema migration to support per-user dataset database configurations.	2025-12-11 14:15:20 +01:00
Igor Ilic	0a1ed79340	refactor: change neo4j_aura to neo4j_aura_dev	2025-12-11 13:05:23 +01:00
Pavel Zorin	fe7e97be45	Chore: Remove Ontology file size limit. Code duplications (#1880 ) <!-- .github/pull_request_template.md --> ## Description We received a complaint about the 10MB file size limit. Removed code duplications More strict types <!-- Please provide a clear, human-generated description of the changes in this PR. DO NOT use AI-generated descriptions. We want to understand your thought process and reasoning. --> ## Type of Change <!-- Please check the relevant option --> - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [ ] I have tested my changes thoroughly before submitting this PR - [ ] This PR contains minimal changes necessary to address the issue/feature - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * New Features * Support for supplying optional per-file descriptions when uploading multiple ontologies. 
* Improvements * Removed the 10MB file size limit for ontology uploads, allowing larger files. * Streamlined and more robust upload handling with improved per-file validation and safer upload behavior.	2025-12-11 10:49:55 +01:00
Pavel Zorin	88f61f9bdb	Added filename check	2025-12-10 17:24:31 +01:00
hajdul88	001fbe699e	feat: Adds edge centered payload and embedding structure during ingestion (#1853 ) <!-- .github/pull_request_template.md --> ## Description This pull request introduces edge‑centered payloads to the ingestion process. Payloads are stored in the Triplet_text collection which is compatible with the triplet_embedding memify pipeline. Changes in This PR: - Refactored custom edge handling, from now on they can be passed to the add_data_points method so the ingestion is centralized and is happening in one place. - Added private methods to handle edge centered payload creation inside the add_data_points.py - Added unit tests to cover the new functionality - Added integration tests - Added e2e tests Acceptance Criteria and Testing Scenario 1: -Set TRIPLET_EMBEDDING env var to True -Run prune, add, cognify -Verify the vector DB contains a non empty Triplet_text collection and the number of triplets are matching with the number of edges in the graph database -Use the new triplet_completion search type and confirm it works correctly. 
Scenario 2: - Set TRIPLET_EMBEDDING env var to False - Run prune, add, cognify - Verify the vector DB does not have the Triplet_text collection - You should receive an error indicating that Triplet_text is not available ## Type of Change - [ ] Bug fix (non-breaking change that fixes an issue) - [x] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Pre-submission Checklist - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. 
## Summary by CodeRabbit * New Features * Triplet embeddings supported: embeddings created from graph edges plus connected node text * Ability to supply custom edges when adding data points * New configuration toggle to enable/disable triplet embedding * Tests * Added comprehensive unit and end-to-end tests for edge-centered payloads and triplet embedding * New CI job to run the edge-centered payload e2e test * Bug Fixes * Adjusted server start behavior to surface process output in parent logs --------- Co-authored-by: Pavel Zorin <pazonec@yandex.ru>	2025-12-10 17:10:06 +01:00
ketanjain3	2de1bd977d	Merge branch 'dev' into feature/sqlalchemy-custom-connect-args	2025-12-09 23:53:06 +05:30
Pavel Zorin	2ca194c28f	fix format	2025-12-09 18:22:44 +01:00
Pavel Zorin	d932ee4bd9	Specify file type	2025-12-09 17:58:34 +01:00
Pavel Zorin	d0b914acaa	Chore: Remove Ontology file size limit. Code duplications	2025-12-09 17:55:43 +01:00
ketanjain7981	e1d313a46b	move DATABASE_CONNECT_ARGS parsing to RelationalConfig Signed-off-by: ketanjain7981 <ketan.jain@think41.com>	2025-12-09 10:15:36 +05:30
lxobr	c04d255aca	feat: remove secondary search	2025-12-08 17:29:25 +01:00
Vasilije	75fea8dcc8	Removed check_permissions_on_dataset.py and related references (#1786) ## Description This PR removes the obsolete `check_permissions_on_dataset` task and all its related imports and usages across the codebase. The authorization logic is now handled earlier in the pipeline, so this task is no longer needed. These changes simplify the default Cognify pipeline and make the code cleaner and easier to maintain. ### Changes Made - Removed `cognee/tasks/documents/check_permissions_on_dataset.py` - Removed import from `cognee/tasks/documents/__init__.py` - Removed import and usage in `cognee/api/v1/cognify/cognify.py` - Removed import and usage in `cognee/eval_framework/corpus_builder/task_getters/get_cascade_graph_tasks.py` - Updated comments in `cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py` (index positions changed) - Removed usage in `notebooks/cognee_demo.ipynb` - Updated documentation in `examples/python/simple_example.py` (process description) --- ## Type of Change - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [x] Code refactoring - [x] Other (please specify): Task removal / cleanup of deprecated function --- ## Pre-submission Checklist - [ ] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue - [x] My code follows the project's coding standards and style guidelines - [ ] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description (Closes #1771) - [x] My commits have clear and descriptive messages --- ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-12-08 05:43:42 +01:00
Vasilije	7a3138edf8	fix: remove double quotes from llmconfig str params (#1758) ## Description Recently, a few cases of cryptic errors like the one in issue #1721 have occurred across cognee use cases. While debugging #1721, however, I found that if LLM_API_KEY happens to have `"` quotation marks as part of its value, for example when they are already part of the ENV, they make their way into Cognee and get treated as part of the API key. By default, we do no sanitization or cleanup. Most of the time quotation marks get handled for us: 1. `export KEY="VALUE"` will strip them 2. python dotenv will strip them if read from `.env` But issues like https://github.com/docker/cli/issues/3630 and #1721 demonstrate that we need some handling on our end instead of assuming they are stripped. ## This PR This PR sets up a list of string params we want to strip, plus some that we may want to. We may want to avoid doing this for all params, which is why I went with a selective approach. 
TODO: add testing ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit * Bug Fixes * Configuration values with surrounding quotes are now automatically normalized and cleaned during system initialization, ensuring consistent and predictable data handling across all configuration parameters. * Tests * Added comprehensive unit tests to validate automatic quote removal from configuration values, covering various scenarios including quoted, unquoted, empty, and edge cases with mixed and internal quotes.	2025-12-08 05:10:23 +01:00
Vasilije	40bbdd1ac7	fix: install nvm and node for -ui cli command (#1836) ## Type of Change - [x] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit * New Features * Enhanced Node.js and npm environment management for improved system compatibility on Unix-like platforms. * Chores * Updated Next.js to v16, React to v19.2, and Auth0 SDK to v4.13.1 for compatibility and performance improvements. 
* Removed CrewAI workflow trigger component. * Removed user feedback submission form.	2025-12-08 05:09:49 +01:00
Igor Ilic	2f572ae509	test: Update embedding limiter test	2025-12-05 19:18:48 +01:00
Igor Ilic	a66b2ceeca	refactor: reduce amount of retry attempts for baml llm calls	2025-12-05 18:58:59 +01:00
Igor Ilic	7deaa6e8e9	feat: Add RPM limiting to Cognee	2025-12-05 18:56:34 +01:00
Igor Ilic	0c97a400b0	feat: Add RPM control	2025-12-05 15:40:24 +01:00
Igor Ilic	5d0586da28	Merge branch 'dev' into baml-rate-limit-handling	2025-12-05 13:24:07 +01:00
hajdul88	d5bf5cf4e9	fix: fixes lancedb batch handling (#1872) ## Description Fixes lancedb batch handling issue. Duplicated elements could appear in the collections when duplicates happen in the same insert batch. ## Type of Change - [x] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Pre-submission Checklist - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit * Bug Fixes * Improved data integrity by implementing deduplication logic to eliminate duplicate entries and ensure only the latest version is retained.	2025-12-05 12:26:45 +01:00
Vasilije	9571641199	refactor: move codify pipeline out of main repo (#1738) ## Description This PR moves codify and the code graph pipeline out of the repository. It also introduces a Custom Pipeline interface, which can be used in the future to define custom pipelines. ## Type of Change - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [x] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-12-04 23:10:39 -08:00
ketanjain3	654a573454	Merge branch 'dev' into feature/sqlalchemy-custom-connect-args	2025-12-04 23:47:39 +05:30
Igor Ilic	7d7f8a249a	Merge branch 'dev' into main-merge-vol4	2025-12-04 10:32:10 +01:00

2375 commits