cognee

Author	SHA1	Message	Date
hajdul88	b3fe144b5d	Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-2	2025-12-12 13:23:12 +01:00
Igor Ilic	127d9860df	feat: Add dataset database handler info (#1887) ## Description Add information about which dataset database handler is used for each dataset's databases. ## Type of Change - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Pre-submission Checklist - [ ] I have tested my changes thoroughly before submitting this PR - [ ] This PR contains minimal changes necessary to address the issue/feature - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit * New Features * Datasets now record their assigned vector and graph database handlers, allowing per-dataset backend selection. * Chores * Database schema expanded to store handler identifiers per dataset. * Deletion/cleanup processes now use dataset-level handler info for accurate removal across backends. 
* Tests * Tests updated to include and validate the new handler fields in dataset creation outputs.	2025-12-12 13:22:03 +01:00
Igor Ilic	ede884e0b0	feat: make pipeline processing cache optional (#1876) ## Description Make the pipeline cache mechanism optional: it is turned off by default, but add and cognify still use it as they have until now. ## Type of Change - [ ] Bug fix (non-breaking change that fixes an issue) - [x] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Pre-submission Checklist - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit * New Features * Introduced pipeline caching across ingestion, processing, and custom pipeline flows with per-run controls to enable or disable caching. * Added an option for incremental loading in custom pipeline runs. * Behavior Changes * One pipeline path now explicitly bypasses caching by default so that it always re-runs when invoked. 
* Disabling the cache forces re-processing instead of an early exit; resetting the cache still enables re-execution. * Tests * Added tests validating caching, non-caching, and cache-reset re-execution behavior. * Chores * Added a CI job to run the pipeline caching tests.	2025-12-12 13:11:31 +01:00
hajdul88	2f5d0e0107	lint fix + adds comments	2025-12-12 12:22:12 +01:00
hajdul88	eaf29f2e52	testing another approach	2025-12-12 12:07:04 +01:00
hajdul88	ed21432942	ruff	2025-12-12 11:38:30 +01:00
hajdul88	78390317cf	adds additional asserts to e2e smoke	2025-12-12 11:38:19 +01:00
hajdul88	c8571ad3c7	fixes enable backend access control on case asserts	2025-12-12 11:12:02 +01:00
hajdul88	e677723952	chore: adds tests for missing retrievers	2025-12-12 10:46:25 +01:00
hajdul88	baa158b690	removes atexit handlers	2025-12-12 10:06:40 +01:00
hajdul88	064f39a623	fix linting	2025-12-12 10:04:41 +01:00
hajdul88	d9d6c34a71	Update test_search_db.py	2025-12-12 09:53:50 +01:00
hajdul88	16d953e7fe	Update test_search_db.py	2025-12-12 09:52:37 +01:00
hajdul88	6a8307a1c9	sets log level info in pytest call	2025-12-12 09:37:45 +01:00
hajdul88	96900e18a7	Update test_search_db.py	2025-12-12 09:17:05 +01:00
hajdul88	52c2e00f90	ruff	2025-12-12 09:01:01 +01:00
hajdul88	e6c3e5951f	Update test_search_db.py	2025-12-12 09:00:52 +01:00
hajdul88	6a6e4867db	Update test_search_db.py	2025-12-12 08:48:33 +01:00
hajdul88	2a10ab69ee	Update test_search_db.py	2025-12-12 08:34:33 +01:00
hajdul88	3971e197c2	merging long expensive tests together to reduce setup cost	2025-12-12 08:20:42 +01:00
hajdul88	a06c55f907	Update test_search_db.py	2025-12-12 08:09:30 +01:00
hajdul88	987d4dabd4	Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-2	2025-12-12 07:47:15 +01:00
hajdul88	c34f6df61a	Revert "Update test_search_db.py" This reverts commit `043a2da1aa`.	2025-12-11 20:47:58 +01:00
hajdul88	043a2da1aa	Update test_search_db.py	2025-12-11 20:41:01 +01:00
hajdul88	d1a3928d7e	ruff	2025-12-11 20:12:09 +01:00
hajdul88	d349283ce3	Update test_search_db.py	2025-12-11 20:12:00 +01:00
Igor Ilic	7b3d997a06	Merge main vol7 (#1891) ## Description Add commits from main to the dev branch ## Type of Change - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Pre-submission Checklist - [ ] I have tested my changes thoroughly before submitting this PR - [ ] This PR contains minimal changes necessary to address the issue/feature - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit * Refactor * Removed permission validation checks from the data processing pipeline, streamlining the overall workflow and reducing processing steps. * Updated task sequences across task handlers to reflect the removal of the validation step. * Documentation * Updated processing pipeline documentation and example code to reflect the new streamlined task sequence.	2025-12-11 20:11:54 +01:00
hajdul88	4a99138daf	ruff	2025-12-11 20:09:06 +01:00
hajdul88	926168b6bd	Update test_search_db.py	2025-12-11 20:08:47 +01:00
hajdul88	41a3d75ed8	Update test_search_db.py	2025-12-11 20:06:30 +01:00
hajdul88	b2d2fd971d	Update test_search_db.py	2025-12-11 19:40:28 +01:00
hajdul88	9e58b3aa11	Update test_search_db.py	2025-12-11 19:21:24 +01:00
hajdul88	088e4e9f98	Update test_search_db.py	2025-12-11 19:18:11 +01:00
Igor Ilic	59f8d12fa3	Merge branch 'main' into merge-main-vol7	2025-12-11 19:11:24 +01:00
hajdul88	1eb15c9f9c	Update test_search_db.py	2025-12-11 19:07:39 +01:00
hajdul88	7e0c9f0c91	removes fixtures	2025-12-11 19:01:37 +01:00
hajdul88	0bef029e34	adds e2e tests (old test broken into separate tests)	2025-12-11 18:47:28 +01:00
hajdul88	84058d4525	Update test_search_db.py	2025-12-11 18:36:21 +01:00
hajdul88	e766a2d78c	adds context e2e test	2025-12-11 18:36:10 +01:00
hajdul88	714fa1f165	adds simple multidb test as pytest + fixtures	2025-12-11 18:22:43 +01:00
hajdul88	3defb9ad44	converts search_db test to pytest test	2025-12-11 17:09:48 +01:00
hajdul88	7a82bd7f7f	TO REVERT	2025-12-11 16:31:33 +01:00
hajdul88	151db6d29a	feat: adds different python versions to the search db test run	2025-12-11 16:30:17 +01:00
Igor Ilic	46ddd4fd12	feat: add dataset database handler logic and neo4j/lancedb/kuzu handlers (#1776) ## Description Add the ability to use multi-tenant, multi-user mode with Neo4j. ## Type of Change - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Pre-submission Checklist - [ ] I have tested my changes thoroughly before submitting this PR - [ ] This PR contains minimal changes necessary to address the issue/feature - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit ## Release Notes * New Features * Multi-user support with per-dataset database isolation enabled by default, allowing backend access control for secure data separation. * Configurable database handlers via environment variables (GRAPH_DATASET_DATABASE_HANDLER, VECTOR_DATASET_DATABASE_HANDLER) for flexible deployment options. 
* Chores * Database schema migration to support per-user dataset database configurations.	2025-12-11 14:15:20 +01:00
Igor Ilic	0a1ed79340	refactor: change neo4j_aura to neo4j_aura_dev	2025-12-11 13:05:23 +01:00
Pavel Zorin	fe7e97be45	Chore: Remove Ontology file size limit. Code duplications (#1880) ## Description We received a complaint about the 10MB file size limit, so this PR removes it. It also removes duplicated code and makes types stricter. ## Type of Change - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Pre-submission Checklist - [ ] I have tested my changes thoroughly before submitting this PR - [ ] This PR contains minimal changes necessary to address the issue/feature - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit * New Features * Support for supplying optional per-file descriptions when uploading multiple ontologies. 
* Improvements * Removed the 10MB file size limit for ontology uploads, allowing larger files. * Streamlined and more robust upload handling with improved per-file validation and safer upload behavior.	2025-12-11 10:49:55 +01:00
Pavel Zorin	88f61f9bdb	Added filename check	2025-12-10 17:24:31 +01:00
hajdul88	001fbe699e	feat: Adds edge centered payload and embedding structure during ingestion (#1853) ## Description This pull request introduces edge-centered payloads to the ingestion process. Payloads are stored in the Triplet_text collection, which is compatible with the triplet_embedding memify pipeline. Changes in This PR: - Refactored custom edge handling: custom edges can now be passed to the add_data_points method, so ingestion is centralized in one place. - Added private methods to handle edge-centered payload creation inside add_data_points.py - Added unit tests to cover the new functionality - Added integration tests - Added e2e tests Acceptance Criteria and Testing Scenario 1: - Set the TRIPLET_EMBEDDING env var to True - Run prune, add, cognify - Verify that the vector DB contains a non-empty Triplet_text collection and that the number of triplets matches the number of edges in the graph database - Use the new triplet_completion search type and confirm that it works correctly. 
Scenario 2: - Set the TRIPLET_EMBEDDING env var to False - Run prune, add, cognify - Verify that the vector DB does not have the Triplet_text collection - You should receive an error indicating that the Triplet_text collection is not available ## Type of Change - [ ] Bug fix (non-breaking change that fixes an issue) - [x] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Pre-submission Checklist - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. 
## Summary by CodeRabbit * New Features * Triplet embeddings supported: embeddings are created from graph edges plus connected node text * Ability to supply custom edges when adding data points * New configuration toggle to enable/disable triplet embedding * Tests * Added comprehensive unit and end-to-end tests for edge-centered payloads and triplet embedding * New CI job to run the edge-centered payload e2e test * Bug Fixes * Adjusted server start behavior to surface process output in parent logs --------- Co-authored-by: Pavel Zorin <pazonec@yandex.ru>	2025-12-10 17:10:06 +01:00
Pavel Zorin	2ca194c28f	fix format	2025-12-09 18:22:44 +01:00
Pavel Zorin	d932ee4bd9	Specify file type	2025-12-09 17:58:34 +01:00

4552 commits