cognee

Author	SHA1	Message	Date
Faizan Shaikh	bc6117fcba	refactor: replace wildcard import in pipelines.py with explicit imports Signed-off-by: Faizan Shaikh <faizansk9292@gmail.com>	2025-12-19 18:58:37 +05:30
Faizan Shaikh	240d50e96f	refactor: cleanup unused code and improve tokenizer loading logic Signed-off-by: Faizan Shaikh <faizansk9292@gmail.com>	2025-12-19 18:43:32 +05:30
Faizan Shaikh	f637f80d7a	fix: handle provider prefix in LiteLLMEmbeddingEngine tokenizer loading Signed-off-by: Faizan Shaikh <faizansk9292@gmail.com>	2025-12-19 18:35:55 +05:30
Igor Ilic	b5949580de	refactor: add note about verbose in combined context search	2025-12-18 13:45:20 +01:00
Igor Ilic	986b93fee4	docs: add docstring update for search	2025-12-18 13:24:39 +01:00
Igor Ilic	31e491bc88	test: Add test for verbose search	2025-12-18 13:04:17 +01:00
Igor Ilic	f2bc7ca992	refactor: change comment	2025-12-18 12:00:06 +01:00
Igor Ilic	dd9aad90cb	refactor: Make graphs return optional	2025-12-18 11:57:40 +01:00
hajdul88	622f8fa79e	chore: introduces 1 file upload in ontology endpoint (#1899 ) <!-- .github/pull_request_template.md --> ## Description This PR fixes the ontology upload endpoint by enforcing one file upload at a time. Tests are adjusted in both the server start test and the ontology endpoint unit test. The API was tested. Do not merge it together with https://github.com/topoteretes/cognee/pull/1898; it's either that or this one. ## Type of Change <!-- Please check the relevant option --> - [x] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. 
<!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * API Changes * Ontology upload now accepts exactly one file per request; field renamed from "descriptions" to "description" and validated as a plain string. * Stricter form validation and tighter 400/500 error handling for malformed submissions. * Tests * Tests converted to real HTTP-style interactions using a shared test client and dependency overrides. * Payloads now use plain string fields; added coverage for single-file constraints and specific error responses. * Style * Minor formatting cleanups with no functional impact. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-12-15 18:30:35 +01:00
Igor Ilic	14d9540d1b	feat: Add database deletion on dataset delete (#1893 ) <!-- .github/pull_request_template.md --> ## Description - Add support for database deletion when dataset is deleted - Simplify dataset handler usage in Cognee ## Type of Change <!-- Please check the relevant option --> - [x] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [ ] I have tested my changes thoroughly before submitting this PR - [ ] This PR contains minimal changes necessary to address the issue/feature - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * Bug Fixes * Improved dataset deletion: stronger authorization checks and reliable removal of associated graph and vector storage. * Tests * Added end-to-end test to verify complete dataset deletion and cleanup of all related storage components. 
<!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-12-15 18:15:48 +01:00
Andrej Milicevic	433170fe09	merge dev	2025-12-15 17:06:20 +01:00
hajdul88	bad22ba26b	chore: adds id generation to memify triplet embedding pipeline (#1895 ) <!-- .github/pull_request_template.md --> ## Description This PR adds id generation to the Triplet objects in triplet embedding memify pipeline. In some edge cases duplicated elements could have been ingested into the collection ## Type of Change <!-- Please check the relevant option --> - [x] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * Enhancements * Relationship data now includes unique identifiers for improved tracking and data management capabilities. 
<!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-12-15 15:45:35 +01:00
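The id generation described in the entry above (#1895) can be illustrated with a minimal sketch. It assumes ids are derived deterministically from the triplet's content, so that ingesting the same relationship twice yields the same id and the duplicate can be dropped. The `Triplet` class, its fields, and the helper names are hypothetical and are not taken from cognee's code.

```python
import uuid
from dataclasses import dataclass, field


def triplet_id(source: str, relationship: str, target: str) -> uuid.UUID:
    """Derive a stable id from the triplet's content (illustrative scheme)."""
    key = "|".join((source, relationship, target))
    # uuid5 is deterministic: identical input always maps to the same UUID.
    return uuid.uuid5(uuid.NAMESPACE_OID, key)


@dataclass
class Triplet:
    # Hypothetical stand-in for the Triplet data point; field names are assumptions.
    source: str
    relationship: str
    target: str
    id: uuid.UUID = field(init=False)

    def __post_init__(self):
        self.id = triplet_id(self.source, self.relationship, self.target)


def deduplicate(triplets):
    """Keep one triplet per id, preserving first-seen order."""
    unique = {}
    for triplet in triplets:
        unique.setdefault(triplet.id, triplet)
    return list(unique.values())
```

With content-derived ids, a vector collection keyed on `id` can treat a repeated triplet as an update rather than a new row.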
Vasilije	69e36cc834	feat: add bedrock as supported llm provider (#1830 ) <!-- .github/pull_request_template.md --> ## Description <!-- Please provide a clear, human-generated description of the changes in this PR. DO NOT use AI-generated descriptions. We want to understand your thought process and reasoning. --> Added support for AWS Bedrock, and the models that are available there. This was a contributor PR that was never finished, so now I polished it up and made it work. ## Type of Change <!-- Please check the relevant option --> - [ ] Bug fix (non-breaking change that fixes an issue) - [x] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. 
<!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * New Features * Added AWS Bedrock as a new LLM provider with support for multiple authentication methods. * Integrated three new AI models: Claude 4.5 Sonnet, Claude 4.5 Haiku, and Amazon Nova Lite. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-12-15 14:33:57 +01:00
Igor Ilic	c94225f505	fix: make ontology key an optional param in cognify (#1894 ) <!-- .github/pull_request_template.md --> ## Description Make ontology key optional in Swagger and None by default (it was "string" by default before change which was causing issues when running cognify endpoint) ## Type of Change <!-- Please check the relevant option --> - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [ ] I have tested my changes thoroughly before submitting this PR - [ ] This PR contains minimal changes necessary to address the issue/feature - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * Documentation * Enhanced API documentation with additional examples and validation metadata to improve request clarity and validation guidance. 
<!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-12-15 14:30:22 +01:00
Igor Ilic	127d9860df	feat: Add dataset database handler info (#1887 ) <!-- .github/pull_request_template.md --> ## Description Add info on dataset database handler used for dataset database ## Type of Change <!-- Please check the relevant option --> - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [ ] I have tested my changes thoroughly before submitting this PR - [ ] This PR contains minimal changes necessary to address the issue/feature - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * New Features * Datasets now record their assigned vector and graph database handlers, allowing per-dataset backend selection. * Chores * Database schema expanded to store handler identifiers per dataset. * Deletion/cleanup processes now use dataset-level handler info for accurate removal across backends. 
* Tests * Tests updated to include and validate the new handler fields in dataset creation outputs. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-12-12 13:22:03 +01:00
Igor Ilic	ede884e0b0	feat: make pipeline processing cache optional (#1876 ) <!-- .github/pull_request_template.md --> ## Description Make the pipeline cache mechanism optional: it is turned off by default, but add and cognify still use it as they have until now. ## Type of Change <!-- Please check the relevant option --> - [ ] Bug fix (non-breaking change that fixes an issue) - [x] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * New Features * Introduced pipeline caching across ingestion, processing, and custom pipeline flows with per-run controls to enable or disable caching. * Added an option for incremental loading in custom pipeline runs. * Behavior Changes * One pipeline path now explicitly bypasses caching by default to always re-run when invoked. 
* Disabling cache forces re-processing instead of early exit; cache reset still enables re-execution. * Tests * Added tests validating caching, non-caching, and cache-reset re-execution behavior. * Chores * Added CI job to run pipeline caching tests. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-12-12 13:11:31 +01:00
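The caching behavior summarized in the entry above (#1876) can be sketched as follows: with the cache enabled, a pipeline that already completed for a dataset exits early; with it disabled (the default), the work is always re-run; and resetting the cached status re-enables execution. This is a simplified illustration only. The function names, the `use_pipeline_cache` parameter, and the in-memory set are assumptions, not cognee's actual API or storage.

```python
# (pipeline_name, dataset_id) pairs that have already finished.
# Hypothetical in-memory stand-in for a persisted pipeline status.
_completed = set()


def run_pipeline(pipeline_name, dataset_id, task, use_pipeline_cache=False):
    """Run `task` unless caching is enabled and this run already completed."""
    key = (pipeline_name, dataset_id)
    if use_pipeline_cache and key in _completed:
        # Early exit: the cached status says this work is already done.
        return "skipped"
    task()
    _completed.add(key)
    return "executed"


def reset_pipeline_status(pipeline_name, dataset_id):
    """Forget the cached status so the next cached run executes again."""
    _completed.discard((pipeline_name, dataset_id))
```

Keeping the flag off by default means a custom pipeline re-processes on every call unless the caller opts in, which matches the "bypasses caching by default" note in the summary.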
Igor Ilic	59f8d12fa3	Merge branch 'main' into merge-main-vol7	2025-12-11 19:11:24 +01:00
Andrej Milicevic	af8c5bedcc	feat: add kwargs to other adapters	2025-12-11 17:47:23 +01:00
Igor Ilic	46ddd4fd12	feat: add dataset database handler logic and neo4j/lancedb/kuzu handlers (#1776 ) <!-- .github/pull_request_template.md --> ## Description Add ability to use multi tenant multi user mode with Neo4j ## Type of Change <!-- Please check the relevant option --> - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [ ] I have tested my changes thoroughly before submitting this PR - [ ] This PR contains minimal changes necessary to address the issue/feature - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * New Features * Multi-user support with per-dataset database isolation enabled by default, allowing backend access control for secure data separation. * Configurable database handlers via environment variables (GRAPH_DATASET_DATABASE_HANDLER, VECTOR_DATASET_DATABASE_HANDLER) for flexible deployment options. 
* Chores * Database schema migration to support per-user dataset database configurations. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-12-11 14:15:20 +01:00
Igor Ilic	0a1ed79340	refactor: change neo4j_aura to neo4j_aura_dev	2025-12-11 13:05:23 +01:00
Pavel Zorin	fe7e97be45	Chore: Remove Ontology file size limit. Code duplications (#1880 ) <!-- .github/pull_request_template.md --> ## Description We received a complaint about the 10MB file size limit, so this PR removes it. It also removes code duplication and makes the types stricter. <!-- Please provide a clear, human-generated description of the changes in this PR. DO NOT use AI-generated descriptions. We want to understand your thought process and reasoning. --> ## Type of Change <!-- Please check the relevant option --> - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [ ] I have tested my changes thoroughly before submitting this PR - [ ] This PR contains minimal changes necessary to address the issue/feature - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * New Features * Support for supplying optional per-file descriptions when uploading multiple ontologies. 
* Improvements * Removed the 10MB file size limit for ontology uploads, allowing larger files. * Streamlined and more robust upload handling with improved per-file validation and safer upload behavior. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-12-11 10:49:55 +01:00
Pavel Zorin	88f61f9bdb	Added filename check	2025-12-10 17:24:31 +01:00
hajdul88	001fbe699e	feat: Adds edge centered payload and embedding structure during ingestion (#1853 ) <!-- .github/pull_request_template.md --> ## Description This pull request introduces edge‑centered payloads to the ingestion process. Payloads are stored in the Triplet_text collection which is compatible with the triplet_embedding memify pipeline. Changes in This PR: - Refactored custom edge handling, from now on they can be passed to the add_data_points method so the ingestion is centralized and is happening in one place. - Added private methods to handle edge centered payload creation inside the add_data_points.py - Added unit tests to cover the new functionality - Added integration tests - Added e2e tests Acceptance Criteria and Testing Scenario 1: -Set TRIPLET_EMBEDDING env var to True -Run prune, add, cognify -Verify the vector DB contains a non empty Triplet_text collection and the number of triplets are matching with the number of edges in the graph database -Use the new triplet_completion search type and confirm it works correctly. 
Scenario 2: -Set TRIPLET_EMBEDDING env var to False -Run prune, add, cognify -Verify the vector DB does not have the Triplet_text collection -You should receive an error indicating that the Triplet_text is not available ## Type of Change <!-- Please check the relevant option --> - [ ] Bug fix (non-breaking change that fixes an issue) - [x] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. 
<!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * New Features * Triplet embeddings supported—embeddings created from graph edges plus connected node text * Ability to supply custom edges when adding data points * New configuration toggle to enable/disable triplet embedding * Tests * Added comprehensive unit and end-to-end tests for edge-centered payloads and triplet embedding * New CI job to run the edge-centered payload e2e test * Bug Fixes * Adjusted server start behavior to surface process output in parent logs <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Pavel Zorin <pazonec@yandex.ru>	2025-12-10 17:10:06 +01:00
Pavel Zorin	2ca194c28f	fix format	2025-12-09 18:22:44 +01:00
Pavel Zorin	d932ee4bd9	Specify file type	2025-12-09 17:58:34 +01:00
Pavel Zorin	d0b914acaa	Chore: Remove Ontology file size limit. Code duplications	2025-12-09 17:55:43 +01:00
lxobr	c04d255aca	feat: remove secondary search	2025-12-08 17:29:25 +01:00
Vasilije	75fea8dcc8	Removed check_permissions_on_dataset.py and related references (#1786 ) <!-- .github/pull_request_template.md --> ## Description This PR removes the obsolete `check_permissions_on_dataset` task and all its related imports and usages across the codebase. The authorization logic is now handled earlier in the pipeline, so this task is no longer needed. These changes simplify the default Cognify pipeline and make the code cleaner and easier to maintain. ### Changes Made - Removed `cognee/tasks/documents/check_permissions_on_dataset.py` - Removed import from `cognee/tasks/documents/__init__.py` - Removed import and usage in `cognee/api/v1/cognify/cognify.py` - Removed import and usage in `cognee/eval_framework/corpus_builder/task_getters/get_cascade_graph_tasks.py` - Updated comments in `cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py` (index positions changed) - Removed usage in `notebooks/cognee_demo.ipynb` - Updated documentation in `examples/python/simple_example.py` (process description) --- ## Type of Change - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [x] Code refactoring - [x] Other (please specify): Task removal / cleanup of deprecated function --- ## Pre-submission Checklist - [ ] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue - [x] My code follows the project's coding standards and style guidelines - [ ] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description (Closes #1771) - [x] My commits have clear and descriptive messages --- ## DCO Affirmation I affirm that all code in every commit of this pull request 
conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-12-08 05:43:42 +01:00
Vasilije	7a3138edf8	fix: remove double quotes from llmconfig str params (#1758 ) <!-- .github/pull_request_template.md --> ## Description <!-- Please provide a clear, human-generated description of the changes in this PR. DO NOT use AI-generated descriptions. We want to understand your thought process and reasoning. --> Recently, a few cases of cryptic errors, like the one in issue #1721, have occurred across cognee use cases. While debugging #1721, however, I found that if LLM_API_KEY happens to have `"` quotation marks as part of its value, for example, when already part of the ENV <img width="1014" height="507" alt="Screenshot 2025-11-07 at 16 58 22" src="https://github.com/user-attachments/assets/54b7cbb0-5bdc-4b40-b2b1-aed6c5d3d886" /> then it makes its way into Cognee and gets treated as part of the API key. By default, we do neither sanitization nor cleanup. Most of the time quotation marks get handled for us: 1. `export KEY="VALUE"` will strip them 2. python dotenv will strip them if read from `.env` But issues like https://github.com/docker/cli/issues/3630 and #1721 demonstrate that we need some handling on our end instead of assuming they are stripped. ## This PR This PR sets up a list of string params we want to strip, plus some that we may want to. We may want to avoid doing this for all params, which is why I went with a selective approach. 
TODO: add testing ## Type of Change <!-- Please check the relevant option --> - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [ ] I have tested my changes thoroughly before submitting this PR - [ ] This PR contains minimal changes necessary to address the issue/feature - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * Bug Fixes * Configuration values with surrounding quotes are now automatically normalized and cleaned during system initialization, ensuring consistent and predictable data handling across all configuration parameters. * Tests * Added comprehensive unit tests to validate automatic quote removal from configuration values, covering various scenarios including quoted, unquoted, empty, and edge cases with mixed and internal quotes. 
<!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-12-08 05:10:23 +01:00
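The selective quote stripping described in the entry above (#1758) could look roughly like the sketch below. It assumes only one pair of matching surrounding quotes is removed and internal or mismatched quotes are left alone. The field list and function names are hypothetical, since the entry does not show which params the real config strips.

```python
# Hypothetical list of string params to sanitize; the actual list may differ.
STRIP_QUOTES_FIELDS = ("llm_api_key", "llm_endpoint", "llm_model")


def strip_surrounding_quotes(value):
    """Remove one pair of matching surrounding quotes, if present."""
    if (
        isinstance(value, str)
        and len(value) >= 2
        and value[0] == value[-1]
        and value[0] in ('"', "'")
    ):
        return value[1:-1]
    # Unquoted, empty, mixed-quote, and non-string values pass through unchanged.
    return value


def sanitize_config(config: dict) -> dict:
    """Strip quotes only from the selected fields, leaving other params untouched."""
    return {
        name: strip_surrounding_quotes(value) if name in STRIP_QUOTES_FIELDS else value
        for name, value in config.items()
    }
```

Limiting the cleanup to a named list keeps values that may legitimately contain quotes, such as prompts, out of scope.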
Vasilije	40bbdd1ac7	fix: install nvm and node for -ui cli command (#1836) ## Type of Change - [x] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Pre-submission Checklist - [ ] I have tested my changes thoroughly before submitting this PR - [ ] This PR contains minimal changes necessary to address the issue/feature - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit * New Features * Enhanced Node.js and npm environment management for improved system compatibility on Unix-like platforms. * Chores * Updated Next.js to v16, React to v19.2, and Auth0 SDK to v4.13.1 for compatibility and performance improvements. 
* Removed CrewAI workflow trigger component. * Removed user feedback submission form.	2025-12-08 05:09:49 +01:00
Igor Ilic	2f572ae509	test: Update embedding limiter test	2025-12-05 19:18:48 +01:00
Igor Ilic	a66b2ceeca	refactor: reduce amount of retry attempts for baml llm calls	2025-12-05 18:58:59 +01:00
Igor Ilic	7deaa6e8e9	feat: Add RPM limiting to Cognee	2025-12-05 18:56:34 +01:00
Igor Ilic	0c97a400b0	feat: Add RPM control	2025-12-05 15:40:24 +01:00
Igor Ilic	5d0586da28	Merge branch 'dev' into baml-rate-limit-handling	2025-12-05 13:24:07 +01:00
hajdul88	d5bf5cf4e9	fix: fixes lancedb batch handling (#1872) ## Description Fixes a LanceDB batch handling issue: duplicate elements could appear in a collection when duplicates occurred within the same insert batch. ## Type of Change - [x] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Pre-submission Checklist - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit * Bug Fixes * Improved data integrity by implementing deduplication logic to eliminate duplicate entries and ensure only the latest version is retained.	2025-12-05 12:26:45 +01:00
Vasilije	9571641199	refactor: move codify pipeline out of main repo (#1738) ## Description This PR moves codify and the code graph pipeline out of the repository. It also introduces a Custom Pipeline interface, which can be used in the future to define custom pipelines. ## Type of Change - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [x] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Pre-submission Checklist - [ ] I have tested my changes thoroughly before submitting this PR - [ ] This PR contains minimal changes necessary to address the issue/feature - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-12-04 23:10:39 -08:00
Igor Ilic	7d7f8a249a	Merge branch 'dev' into main-merge-vol4	2025-12-04 10:32:10 +01:00
Igor Ilic	f1c5b9a55f	fix: Resolve DB caching issues when deleting databases	2025-12-03 18:05:47 +01:00
Igor Ilic	fd84edeb74	refactor: change getting of tables during deletion	2025-12-03 15:43:41 +01:00
Boris	8cad9ef225	Merge branch 'dev' into feature/cog-3409-add-bedrock-as-supported-llm-provider	2025-12-03 14:58:00 +01:00
Igor Ilic	45f32f8bfd	Merge branch 'dev' into multi-tenant-neo4j	2025-12-03 14:37:13 +01:00
Igor Ilic	1961efcc33	fix: Handle scenario when there is no relational database on prune time	2025-12-03 14:27:06 +01:00
Igor Ilic	f4078d1247	feat: Add ability to delete lance and kuzu datasets, add prune to work with multi user mode	2025-12-03 13:10:18 +01:00
Igor Ilic	5698c609f5	test: Update tests with regards to auto scaling changes	2025-12-03 11:47:10 +01:00
Boris Arzentar	0d2e84f58e	test: test_strip_quotes_from_strings	2025-12-03 10:59:17 +01:00
Boris	3288ef01a4	Merge branch 'dev' into fix/remove-double-quotes-from-llmconfig-str-params	2025-12-03 10:05:49 +01:00
hajdul88	d4d190ac2b	feature: adds triplet embedding via memify (#1832) ## Description This PR introduces triplet embeddings via a new create_triplet_embeddings memify pipeline. The pipeline reads the graph in batches, extracts properties from graph elements based on their datapoint types, and generates combined triplet embeddings. These embeddings are stored in the vector database as a new collection. Changes in This PR: - Added a new create_triplet_embeddings memify pipeline. - Added a new get_triplet_datapoints memify task. - Introduced a new triplet_completion search type. - Added full test coverage: unit tests (memify task, pipeline, and retriever); integration tests (memify task, pipeline, and retriever); end-to-end tests (updated session history tests and multi-DB search tests; added tests for triplet_completion and memify pipeline execution). Acceptance Criteria and Testing. Scenario 1: - Run the default add and cognify pipelines. - Run the create triplet embeddings memify pipeline. - Verify the vector DB contains a non-empty Triplet_text collection. - Use the new triplet_completion search type and confirm it works correctly. Scenario 2: - Run the default add and cognify pipelines. - Do not run the triplet embeddings memify pipeline. - Attempt to use the triplet_completion search type. - You should receive an error indicating that the triplet embeddings memify pipeline must be executed first. 
## Type of Change - [ ] Bug fix (non-breaking change that fixes an issue) - [x] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Pre-submission Checklist - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. 
## Summary by CodeRabbit * New Features * Triplet-based search with LLM-powered completions (TRIPLET_COMPLETION) * Batch triplet retrieval and a triplet embeddings pipeline for extraction, indexing, and optional background processing * Context retrieval from triplet embeddings with optional caching and conversation-history support * New Triplet data type exposed for indexing and search * Examples * End-to-end example demonstrating triplet embeddings extraction and TRIPLET_COMPLETION search * Tests * Unit and integration tests covering triplet extraction, retrieval, embedding pipeline, and completion flows --------- Co-authored-by: Pavel Zorin <pazonec@yandex.ru>	2025-12-02 18:27:08 +01:00
Igor Ilic	1282905888	feat: add password encryption for Neo4j	2025-12-02 16:34:16 +01:00
Igor Ilic	92448767fe	refactor: remove done TODOs	2025-12-02 14:29:51 +01:00


2354 commits