<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md -->
## Description
Adds graph completion retriever fix
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- .github/pull_request_template.md -->
## Description
- Fixes the MCP server communication issue by switching logging to sys.stderr (the
default for Python logging)
- Adds the api optional dependency needed by FastAPI users
- Removes the lock file, since a new one must be generated after the next Cognee
release that includes the api optional dependency
- Adds the log file location to the MCP tool call answer
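A minimal sketch of the stderr switch described above, assuming the server talks to its client over stdout, so that log lines written there would mix with protocol messages. The logger name and helper function are hypothetical, not taken from this PR.

```python
import logging
import sys


def build_stderr_logger(name: str = "mcp_server_example") -> logging.Logger:
    """Return a logger that writes only to stderr, leaving stdout untouched."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Do not hand records to the root logger, which might be configured for stdout.
    logger.propagate = False
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger


logger = build_stderr_logger()
logger.info("server started")
```

Calling `build_stderr_logger()` repeatedly returns the same logger without stacking extra handlers.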
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- .github/pull_request_template.md -->
## Description
Adds logging to a log file
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
I added one example "get all connected nodes to entity"
---------
Co-authored-by: Boris <boris@topoteretes.com>
<!-- .github/pull_request_template.md -->
## Description
Add ability to migrate relational database to graph database
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- .github/pull_request_template.md -->
## Description
Adds ontology demo 2
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
---------
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
<!-- .github/pull_request_template.md -->
## Description
- Handles an empty distance list in brute force search
- Adds unit tests
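A small sketch of the kind of guard this describes, assuming the search picks the closest candidate from a list of distances. `closest_index` is a hypothetical helper, not the function changed in this PR.

```python
from typing import Optional, Sequence


def closest_index(distances: Sequence[float]) -> Optional[int]:
    """Return the position of the smallest distance, or None when there are no candidates."""
    if not distances:
        # min() raises ValueError on an empty sequence, so return early instead.
        return None
    return min(range(len(distances)), key=lambda i: distances[i])
```

Callers can then treat `None` as "no match" instead of catching an exception.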
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
---------
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
---------
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
<!-- .github/pull_request_template.md -->
## Description
Introduces structlog.
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- .github/pull_request_template.md -->
## Description
Cognee backend fixes
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Improved handling of `tenant_id` in JWT payload for enhanced type
safety.
- Unique identifier generation for datasets now considers the owner ID,
allowing for multiple users to share the same dataset name.
- **Bug Fixes**
- Disabled user role permissions in the permission check logic
temporarily during a rework.
- **Refactor**
- Simplified dependencies by removing unnecessary model imports.
- Updated parameter name from `tenant` to `tenant_id` for clarity in JWT
creation.
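
One way owner-aware dataset identifiers could work, sketched with the standard `uuid` module. The function name, the namespace choice, and the string layout are assumptions for illustration, not the scheme used in this change.

```python
from uuid import NAMESPACE_OID, UUID, uuid5


def generate_dataset_id(dataset_name: str, owner_id: UUID) -> UUID:
    """Derive a stable ID from both the dataset name and its owner."""
    # Including the owner in the hashed string keeps equal names from colliding across users.
    return uuid5(NAMESPACE_OID, f"{dataset_name}:{owner_id}")


alice = UUID("11111111-1111-1111-1111-111111111111")
bob = UUID("22222222-2222-2222-2222-222222222222")
```

With this approach, `generate_dataset_id("reports", alice)` and `generate_dataset_id("reports", bob)` differ, while repeated calls for the same owner return the same value.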
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
- Optimized to deduplicate nodes appearing in multiple triplets,
avoiding redundant text repetition
- Reimplemented `resolve_edges_to_text` with cleaner formatting
- Added `_top_n_words` method for extracting frequent words from text
- Created `_get_title` function to generate titles from text content
based on first words and word frequency
- Extracted node processing logic to `_get_nodes` helper method
- Created dedicated `stop_words` utility with common English stopwords
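A rough sketch of frequency-based title generation as described above. The stop word set, the tokenizer, and the title layout are assumptions; the actual `_top_n_words` and `_get_title` may differ.

```python
import re
from collections import Counter
from typing import List

# A tiny illustrative stop word set; the real utility is described as more complete.
STOP_WORDS = {"a", "an", "and", "as", "the", "of", "to", "in", "is", "for", "on"}


def top_n_words(text: str, n: int = 3) -> List[str]:
    """Return the n most frequent non-stop words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


def get_title(text: str, first_n_words: int = 5, top_n: int = 3) -> str:
    """Build a title from the first words of the text plus its most frequent words."""
    head = " ".join(text.split()[:first_n_words])
    keywords = ", ".join(top_n_words(text, top_n))
    return f"{head}... [{keywords}]"
```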
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Improved text output formatting that organizes content into clearly
defined sections for enhanced readability.
- Enhanced text processing capabilities, including refined title
generation and key phrase extraction.
- Introduced a comprehensive utility for managing common stop words,
further optimizing text analysis.
- **Bug Fixes**
- Updated tests to ensure accurate validation of new functionalities and
improved existing test coverage.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Chores**
- Removed unused code to streamline internal processes.
- **Tests**
- Added a comprehensive suite of tests to validate core retrieval and
search functionalities.
- Improved validation of response generation, context handling, and
error scenarios to ensure consistent and reliable performance.
These improvements enhance overall system stability and maintainability,
contributing to a smoother experience for end-users.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: vasilije <vas.markovic@gmail.com>
…t issues
<!-- .github/pull_request_template.md -->
## Description
Resolve issue with MCP timeout by switching cognify and codify to run as
background async tasks
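The pattern can be sketched with plain `asyncio`, assuming the tool handler returns immediately while the long job keeps running. `long_running_job` and `handle_tool_call` are hypothetical stand-ins for the cognify and codify handlers.

```python
import asyncio
from typing import Dict, Set, Tuple

# Keep references so pending tasks are not garbage collected before they finish.
background_tasks: Set[asyncio.Task] = set()


async def long_running_job(name: str, results: Dict[str, str]) -> None:
    """Stand-in for a slow pipeline run."""
    await asyncio.sleep(0.05)
    results[name] = "done"


async def handle_tool_call(name: str, results: Dict[str, str]) -> str:
    """Schedule the job and answer right away instead of awaiting it."""
    task = asyncio.create_task(long_running_job(name, results))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return f"'{name}' started in the background"


async def main() -> Tuple[str, Dict[str, str], Dict[str, str]]:
    results: Dict[str, str] = {}
    reply = await handle_tool_call("cognify", results)
    snapshot = dict(results)  # nothing has finished yet at this point
    await asyncio.gather(*background_tasks)
    return reply, snapshot, results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The reply reaches the caller before the job completes, which is what keeps the client from timing out.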
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Enhanced feedback messages now inform users when operations are
running in the background, providing an estimated wait time of up to 4
minutes.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->
## Description
Updates helm chart image
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Chores**
- Updated the Docker image reference for the cognee application service
to use the new, more official source for deployments.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->
## Description
This .sh file can be used for EC2 deployment as explained in
https://github.com/topoteretes/cognee-docs/pull/58
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Documentation**
- Removed outdated guidance for setting up evaluation environments,
streamlining the visible instructions.
- **Chores**
- Updated the Ubuntu setup process to install Python 3.12, ensuring the
virtual environment uses the latest version and enhancing overall
performance.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->
## Description
Exposes the adapter's query method in the search interface for Kuzu
and Neo4j (Cypher-compatible adapters)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced a new cypher-based search option that expands the app's
search functionality.
- Enabled asynchronous processing for advanced query execution.
- Enhanced error messaging for unsupported search types and query
execution issues.
- Added a new enumeration value for `CYPHER` to support the new search
type.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
…ollection handling its not an error
<!-- .github/pull_request_template.md -->
## Description
Deletes error logging from ChromaDB adapter
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Refactor**
- Updated internal error handling to ensure more consistent responses
during unforeseen issues. This change streamlines the system’s approach
to managing errors, reducing unnecessary internal error logs while
maintaining reliable operations and a stable user experience. These
refinements contribute to improved system stability and efficient error
management. Internal operations are now better optimized to handle
unexpected scenarios gracefully.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->
## Description
Removed old, unused eval files.
- swe-bench eval files are kept here as swe-bench eval is not handled by
the new eval framework
- EC2_readme and cloud/setup_ubuntu_instance.sh will be removed (and
moved to the docs website) as part of another task
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- .github/pull_request_template.md -->
## Description
Temporarily remove embedding env variables for code graph action so the
action can run
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Chores**
- Removed legacy secret configuration from the testing workflow to
streamline the CI process and enhance maintainability.
- **Improvements**
- Updated the argument name in the code graph pipeline for clarity.
- Enhanced the handling of results in the example script to support
asynchronous processing.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
- Created get_default_tasks_by_indices to filter default tasks by
specific indices
- Added get_no_summary_tasks function to skip summarization tasks
- Added get_just_chunks_tasks function for chunk extraction and data
points only
- Added NO_SUMMARIES and JUST_CHUNKS to the TaskGetters enum
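An illustrative sketch of index-based task selection. The function and enum member names follow the description above, but the task names, the chosen indices, and the synchronous signatures are assumptions.

```python
from enum import Enum
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-ins for the default pipeline steps; the real task list may differ.
DEFAULT_TASKS: List[str] = [
    "classify_documents",
    "extract_chunks",
    "extract_graph",
    "summarize_text",
    "add_data_points",
]


def get_default_tasks_by_indices(indices: Sequence[int]) -> List[str]:
    """Return the default tasks at the given positions, in the given order."""
    return [DEFAULT_TASKS[i] for i in indices]


def get_no_summary_tasks() -> List[str]:
    """Everything except the summarization step."""
    return get_default_tasks_by_indices([0, 1, 2, 4])


def get_just_chunks_tasks() -> List[str]:
    """Only chunk extraction and data point storage."""
    return get_default_tasks_by_indices([0, 1, 4])


class TaskGetters(Enum):
    DEFAULT = "Default"
    NO_SUMMARIES = "NoSummaries"
    JUST_CHUNKS = "JustChunks"


# Plain mapping from enum member to getter, so lookups stay explicit.
GETTER_FUNCTIONS: Dict[TaskGetters, Callable[[], List[str]]] = {
    TaskGetters.DEFAULT: lambda: list(DEFAULT_TASKS),
    TaskGetters.NO_SUMMARIES: get_no_summary_tasks,
    TaskGetters.JUST_CHUNKS: get_just_chunks_tasks,
}
```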
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- The evaluation configuration now includes expanded task retrieval
options. Users can choose customized modes that bypass summarization or
focus solely on extracting data chunks, offering a more tailored
evaluation experience.
- Enhanced asynchronous task processing brings increased flexibility and
smoother performance during task selection.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Tests**
- Revised test logic to align with updated conventions for retrieving
database entities, ensuring accurate verification of database state.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->
## Description
Let's scope it out.
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced support for the Kuzu graph database provider, enhancing
graph operations and data management capabilities.
- Added a comprehensive adapter for Kuzu, facilitating various graph
database operations.
- Expanded the enumeration of graph database types to include Kuzu.
- **Tests**
- Launched comprehensive asynchronous tests to validate the new Kuzu
graph integration’s performance and reliability.
- **Chores**
- Updated dependency settings and continuous integration workflows to
include the Kuzu provider, ensuring smoother deployments and improved
system quality.
- Enhanced configuration documentation to clarify Kuzu database
requirements.
- Modified Dockerfile to include Kuzu in the installation extras.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
# Add Support for ChromaDB
## Summary
This PR adds support for ChromaDB as a vector database option in the
Cognee application. ChromaDB is a modern, open-source embedding database
designed for AI applications.
## Changes
- Created a new ChromaDBAdapter implementation for vector database
operations
- Added comprehensive test suite for ChromaDB functionality
- Updated docker-compose.yml to include ChromaDB service
- Modified environment configuration to support ChromaDB settings
- Updated vector engine creation logic to support ChromaDB as an option
## Technical Details
- Implemented `ChromaDBAdapter.py` (347 lines) with full CRUD operations
for vector data
- Created test suite (`test_chromadb.py`) with 171 lines of test
coverage
- Updated vector engine creation process to dynamically select ChromaDB
when configured
- Modified settings router to accommodate new database option
- Updated environment template with ChromaDB configuration options
## Docker Changes
- Added ChromaDB service to docker-compose.yml with appropriate
configuration
This PR enhances Cognee's flexibility by providing an alternative vector
database option, allowing users to choose the most appropriate database
for their specific use case.
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
Tested with UI + tests.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Expanded vector database integration by adding support for ChromaDB,
enabling enhanced data management and search functionalities.
- **Tests**
- Added automated tests to validate the ChromaDB integration and related
operations.
- **Chores**
- Updated configuration guidance and dependency management to include
ChromaDB.
- Provided an optional container deployment template for ChromaDB.
- Added a new entry to ignore the `.chromadb_data/` directory in version
control.
- Introduced a new GitHub Actions workflow for testing ChromaDB
integration.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
- Added _filter_instances to BaseBenchmarkAdapter supporting filtering
by IDs, indices, or JSON files.
- Updated HotpotQAAdapter and MusiqueQAAdapter to use the base class
filtering.
- Added instance_filter parameter to corpus builder pipeline.
- Extracted _get_raw_corpus method in both adapters for better code
organization
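A hedged sketch of filtering by IDs, indices, or a JSON file. The standalone `filter_instances` function, the `id` key, and the rule that an empty filter selects nothing are assumptions, not the adapter's actual behavior.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional, Sequence, Union

InstanceFilter = Union[str, Path, Sequence[str], Sequence[int]]


def filter_instances(
    instances: List[Dict[str, Any]],
    instance_filter: Optional[InstanceFilter],
    id_key: str = "id",
) -> List[Dict[str, Any]]:
    """Select instances by a list of IDs, a list of indices, or a JSON file holding either."""
    if instance_filter is None:
        return list(instances)
    if isinstance(instance_filter, (str, Path)):
        # A path is treated as a JSON file containing a list of IDs or indices.
        instance_filter = json.loads(Path(instance_filter).read_text())
    if not instance_filter:
        return []
    if all(isinstance(item, int) for item in instance_filter):
        return [instances[i] for i in instance_filter if 0 <= i < len(instances)]
    wanted = set(instance_filter)
    return [instance for instance in instances if instance.get(id_key) in wanted]
```

Out-of-range indices are skipped silently in this sketch; raising an error instead would be an equally reasonable choice.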
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Corpus loading and building now support a flexible filtering option,
allowing users to apply custom criteria to tailor the retrieved data.
- **Refactor**
- The extraction process has been reorganized to separately handle text
content and associated metadata, enhancing clarity and overall workflow
efficiency.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->
## Description
Added user authorization through a JWT header and reworked the user and related
RBAC models to accommodate the future user permission system.
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced an automated workflow to validate server startup.
- Added secure JWT token generation for improved session handling.
- Enabled a new structure for permission management with role and
tenant-based controls, including endpoints for creating roles, tenants,
and assigning permissions.
- Added methods for assigning default permissions to roles, tenants, and
users.
- Introduced new classes for managing default permissions for roles,
tenants, and users.
- **Refactor**
- Streamlined authentication and user management flows with enhanced
error handling.
- **Tests**
- Upgraded integration tests with improved database initialization and
data pruning for a more stable environment.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
- Exposed chunk_size in get_default_tasks in cognify
- Reintegrated chunk_size in corpus building in eval framework
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced an optional configuration parameter to allow users to set
custom processing segment sizes. This enhances flexibility in managing
content processing and task execution, enabling more dynamic control
over resource handling during corpus creation and related operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->
## Description
This PR contains the ontology feature integrated into cognify
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Enhanced ontology management with the introduction of the
`OntologyResolver` class for improved data handling and querying.
- Expanded ontology framework now provides enriched coverage of
technology and automotive domains, including new entities and
relationships.
- Updated entity models now include a validation flag to support
improved data integrity.
- Added support for specifying an ontology file path in relevant
functions to enhance flexibility.
- **Refactor**
- Streamlined integration of ontology processing across data extraction
and workflow routines.
- **Chores**
- Updated project dependencies to include `owlready2` for advanced
ontology functionality.
- **Tests**
- Introduced a new test suite for the `OntologyResolver` class to
validate its functionality under various conditions.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Refactor**
- Simplified text processing by unifying multiple size-related
parameters into a single metric across chunking and extraction
functionalities.
- Streamlined logic for text segmentation by removing redundant
calculations and checks, resulting in a more consistent chunk management
process.
- **Chores**
- Removed the `modal` package as a dependency.
- **Documentation**
- Updated the README.md to include a new demo video link and clarified
default environment variable settings.
- Enhanced the CONTRIBUTING.md to improve clarity and engagement for
potential contributors.
- **Bug Fixes**
- Improved handling of sentence-ending punctuation in text processing to
include additional characters.
- **Version Update**
- Updated project version to 0.1.33 in the pyproject.toml file.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Chores**
- Refined dependency version specifications to allow smoother minor
updates while enhancing compatibility.
- Introduced conditional configurations for improved
environment-specific stability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Co-authored-by: Boris <boris@topoteretes.com>
<!-- .github/pull_request_template.md -->
## Description
Missing dependency.
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Enabled PostgreSQL integration, expanding support for additional
database options and enhancing overall functionality.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->