cognee

Author	SHA1	Message	Date
Igor Ilic	c94225f505	fix: make ontology key an optional param in cognify (#1894) Description: Make the ontology key optional in Swagger and None by default (it previously defaulted to "string", which caused issues when running the cognify endpoint). DCO Affirmation: I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. Summary by CodeRabbit: Documentation: Enhanced API documentation with additional examples and validation metadata to improve request clarity and validation guidance.	2025-12-15 14:30:22 +01:00
Pavel Zorin	a6bc27afaa	Cleanup	2025-12-12 17:31:54 +01:00
Pavel Zorin	14ff94f269	Initial release pipeline	2025-12-12 17:09:10 +01:00
hajdul88	fa035f42f4	chore: adds back accidentally deleted structured output test	2025-12-12 16:47:58 +01:00
Igor Ilic	7cf6f08283	chore: update test credentials	2025-12-12 15:29:21 +01:00
hajdul88	fd23c75c09	chore: adds new Unit tests for retrievers	2025-12-12 14:44:41 +01:00
Andrej Milicevic	116b6f1eeb	chore: formatting	2025-12-12 13:46:16 +01:00
Andrej Milicevic	a225d7fc61	test: revert some changes	2025-12-12 13:44:58 +01:00
Igor Ilic	0cde551226	Merge branch 'dev' into add-s3-permissions-test	2025-12-12 13:22:50 +01:00
Igor Ilic	127d9860df	feat: Add dataset database handler info (#1887) Description: Add info on the dataset database handler used for the dataset database. Summary by CodeRabbit: New Features: Datasets now record their assigned vector and graph database handlers, allowing per-dataset backend selection. Chores: Database schema expanded to store handler identifiers per dataset. Deletion/cleanup processes now use dataset-level handler info for accurate removal across backends. Tests: Tests updated to include and validate the new handler fields in dataset creation outputs.	2025-12-12 13:22:03 +01:00
Igor Ilic	ede884e0b0	feat: make pipeline processing cache optional (#1876) Description: Make the pipeline cache mechanism optional: it is turned off by default, but add and cognify keep using it as before. Type of Change: New feature (non-breaking change that adds functionality). Summary by CodeRabbit: New Features: Introduced pipeline caching across ingestion, processing, and custom pipeline flows with per-run controls to enable or disable caching. Added an option for incremental loading in custom pipeline runs. Behavior Changes: One pipeline path now explicitly bypasses caching by default to always re-run when invoked. Disabling cache forces re-processing instead of early exit; cache reset still enables re-execution. Tests: Added tests validating caching, non-caching, and cache-reset re-execution behavior. Chores: Added CI job to run pipeline caching tests.	2025-12-12 13:11:31 +01:00
Andrej Milicevic	a337f4e54c	test: testing logger	2025-12-12 13:02:55 +01:00
Andrej Milicevic	bce6094010	test: change logger	2025-12-12 12:43:54 +01:00
Andrej Milicevic	c48b274571	test: remove delete error from mcp test	2025-12-12 11:53:40 +01:00
Andrej Milicevic	3b8a607b5f	test: fix errors in mcp test	2025-12-12 11:37:27 +01:00
Igor Ilic	7b3d997a06	Merge main vol7 (#1891) Description: Add commits from main to the dev branch. Summary by CodeRabbit: Refactor: Removed permission validation checks from the data processing pipeline, streamlining the overall workflow and reducing processing steps. Updated task sequences across task handlers to reflect the removal of the validation step. Documentation: Updated processing pipeline documentation and example code to reflect the new streamlined task sequence.	2025-12-11 20:11:54 +01:00
Igor Ilic	59f8d12fa3	Merge branch 'main' into merge-main-vol7	2025-12-11 19:11:24 +01:00
Andrej Milicevic	e211e66275	chore: remove quick option to isolate mcp CI test	2025-12-11 18:29:17 +01:00
Andrej Milicevic	0f50c993ac	chore: add quick option to isolate mcp CI test	2025-12-11 18:20:07 +01:00
Andrej Milicevic	248ba74592	test: remove codify-related stuff from mcp test	2025-12-11 18:18:42 +01:00
Andrej Milicevic	af8c5bedcc	feat: add kwargs to other adapters	2025-12-11 17:47:23 +01:00
Andrej Milicevic	0f4cf15d58	test: fix docs test trigger	2025-12-11 16:24:47 +01:00
Andrej Milicevic	41edeb0cf8	test: change target repo name	2025-12-11 16:01:26 +01:00
Andrej Milicevic	cd60ae3174	test: remove docs tests. add trigger to docs repo	2025-12-11 15:25:44 +01:00
Igor Ilic	46ddd4fd12	feat: add dataset database handler logic and neo4j/lancedb/kuzu handlers (#1776) Description: Add the ability to use multi-tenant, multi-user mode with Neo4j. Summary by CodeRabbit (Release Notes): New Features: Multi-user support with per-dataset database isolation enabled by default, allowing backend access control for secure data separation. Configurable database handlers via environment variables (GRAPH_DATASET_DATABASE_HANDLER, VECTOR_DATASET_DATABASE_HANDLER) for flexible deployment options. Chores: Database schema migration to support per-user dataset database configurations.	2025-12-11 14:15:20 +01:00
Chinmay Bhosale	0d96606fb2	Merge pull request #8 from chinu0609/delete-last-acessed fix: only document level deletion	2025-12-11 18:17:04 +05:30
Igor Ilic	0a1ed79340	refactor: change neo4j_aura to neo4j_aura_dev	2025-12-11 13:05:23 +01:00
Pavel Zorin	fe7e97be45	Chore: Remove Ontology file size limit. Code duplications (#1880) Description: We received a complaint about the 10MB file size limit. Removed code duplications. Stricter types. Summary by CodeRabbit: New Features: Support for supplying optional per-file descriptions when uploading multiple ontologies. Improvements: Removed the 10MB file size limit for ontology uploads, allowing larger files. Streamlined and more robust upload handling with improved per-file validation and safer upload behavior.	2025-12-11 10:49:55 +01:00
chinu0609	2485c3f5f0	fix: only document level deletion	2025-12-11 12:48:06 +05:30
hiyan	f48df27fc8	fix(db): url-encode postgres credentials to handle special characters	2025-12-11 10:32:45 +05:30
rajeevrajeshuni	6260f9eb82	standardizing return type for transcription and some CR changes	2025-12-11 06:53:36 +05:30
Chinmay Bhosale	e654bcb081	Merge pull request #7 from chinu0609/delete-last-acessed fix: only document level deletion	2025-12-10 22:44:06 +05:30
chinu0609	829a6f0d04	fix: only document level deletion	2025-12-10 22:41:01 +05:30
Igor Ilic	2067c459e3	Merge branch 'dev' into add-s3-permissions-test	2025-12-10 17:39:58 +01:00
Pavel Zorin	88f61f9bdb	Added filename check	2025-12-10 17:24:31 +01:00
hajdul88	001fbe699e	feat: Adds edge centered payload and embedding structure during ingestion (#1853) Description: This pull request introduces edge-centered payloads to the ingestion process. Payloads are stored in the Triplet_text collection, which is compatible with the triplet_embedding memify pipeline. Changes in this PR: - Refactored custom edge handling: custom edges can now be passed to the add_data_points method, so ingestion is centralized in one place. - Added private methods to handle edge-centered payload creation inside add_data_points.py. - Added unit tests to cover the new functionality. - Added integration tests. - Added e2e tests. Acceptance Criteria and Testing. Scenario 1: - Set the TRIPLET_EMBEDDING env var to True. - Run prune, add, cognify. - Verify the vector DB contains a non-empty Triplet_text collection and that the number of triplets matches the number of edges in the graph database. - Use the new triplet_completion search type and confirm it works correctly. Scenario 2: - Set the TRIPLET_EMBEDDING env var to False. - Run prune, add, cognify. - Verify the vector DB does not have the Triplet_text collection. - You should receive an error indicating that Triplet_text is not available. Type of Change: New feature (non-breaking change that adds functionality). Summary by CodeRabbit: New Features: Triplet embeddings supported: embeddings created from graph edges plus connected node text. Ability to supply custom edges when adding data points. New configuration toggle to enable/disable triplet embedding. Tests: Added comprehensive unit and end-to-end tests for edge-centered payloads and triplet embedding. New CI job to run the edge-centered payload e2e test. Bug Fixes: Adjusted server start behavior to surface process output in parent logs. Co-authored-by: Pavel Zorin <pazonec@yandex.ru>	2025-12-10 17:10:06 +01:00
Igor Ilic	4d0f132822	chore: Remove AWS url	2025-12-10 16:22:40 +01:00
Igor Ilic	ab20443330	chore: Change s3 bucket for permission example	2025-12-10 12:35:58 +01:00
rajeevrajeshuni	d57d188459	resolving merge conflicts	2025-12-10 10:52:10 +05:30
rajeevrajeshuni	8e5f14da78	resolving merge conflicts	2025-12-10 10:49:23 +05:30
Igor Ilic	7972e39653	Merge branch 'dev' into main	2025-12-09 20:40:33 +01:00
Chinmay Bhosale	6ecf719632	Merge pull request #6 from chinu0609/delete-last-acessed fix: flag to enable and disable last_accessed	2025-12-10 00:03:16 +05:30
ketanjain3	2de1bd977d	Merge branch 'dev' into feature/sqlalchemy-custom-connect-args	2025-12-09 23:53:06 +05:30
Pavel Zorin	2ca194c28f	fix format	2025-12-09 18:22:44 +01:00
Pavel Zorin	d932ee4bd9	Specify file type	2025-12-09 17:58:34 +01:00
Igor Ilic	032a74a409	chore: add postgres dependency for cicd test	2025-12-09 17:56:34 +01:00
Pavel Zorin	d0b914acaa	Chore: Remove Ontology file size limit. Code duplications	2025-12-09 17:55:43 +01:00
Igor Ilic	28faf7ce04	test: Add permission example test with running s3 file system	2025-12-09 17:53:18 +01:00
Vasilije	49f7c5188c	feat: avoid double edge vector search in triplet search (#1877) Description: Eliminates double vector search for edges by ensuring all edge lookups happen once in the retrieval layer. - `brute_force_triplet_search`: Always includes "EdgeType_relationship_name" in collections. - `CogneeGraph.map_vector_distances_to_graph_edges`: Removed internal vector search fallback; only maps provided distances. - Tests updated to reflect the new behavior. Type of Change: Performance improvement. Summary by CodeRabbit: Bug Fixes: Ensured relationship edges are automatically included in search collections, improving search completeness and accuracy. Refactor: Simplified graph edge distance mapping logic by removing unnecessary external dependencies, resulting in more efficient edge processing during retrieval operations.	2025-12-09 13:23:57 +01:00
ketanjain7981	e1d313a46b	move DATABASE_CONNECT_ARGS parsing to RelationalConfig Signed-off-by: ketanjain7981 <ketan.jain@think41.com>	2025-12-09 10:15:36 +05:30
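
Commit f48df27fc8 in the table above URL-encodes Postgres credentials so that special characters do not break the connection URL. The sketch below illustrates the general idea only; it is not taken from the cognee codebase. The helper name `build_postgres_url` and the sample values are hypothetical, and the only real API used is `urllib.parse.quote_plus` from the Python standard library.

```python
from urllib.parse import quote_plus


def build_postgres_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Hypothetical helper (not cognee's API): build a Postgres connection URL.

    The user and password are percent-encoded so that characters such as
    '@', ':' and '/' are not mistaken for URL delimiters.
    """
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Sample values are made up for illustration.
url = build_postgres_url("cognee", "p@ss:w/rd", "localhost", 5432, "cognee_db")
print(url)
```

Without the encoding step, the '@' inside the password would be read as the separator between credentials and host, which is the kind of failure the commit message describes.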


4774 commits