<!-- .github/pull_request_template.md -->
## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Update default tutorial:
1. Use tutorial from [notebook_tutorial
branch](https://github.com/topoteretes/cognee/blob/notebook_tutorial/notebooks/tutorial.ipynb),
specifically - it's .zip version with all necessary data files
2. Use Jupyter Notebook `Notebook` abstractions to read, and map `ipynb`
into our Notebook model
3. Dynamically update starter notebook code blocks that reference
starter data files, and swap them with local paths to downloaded copies
4. Test coverage
| Before | After (storage backend = local) | After (s3) |
|--------|---------------------------------|------------|
| <img width="613" height="546" alt="Screenshot 2025-09-17 at 01 00 58"
src="https://github.com/user-attachments/assets/20b59021-96c1-4a83-977f-e064324bd758"
/> | <img width="1480" height="262" alt="Screenshot 2025-09-18 at 13 01
57"
src="https://github.com/user-attachments/assets/bd56ea78-7c6a-42e3-ae3f-4157da231b2d"
/> | <img width="1485" height="307" alt="Screenshot 2025-09-18 at 12 56
08"
src="https://github.com/user-attachments/assets/248ae720-4c78-445a-ba8b-8a2991ed3f80"
/> |
## File Replacements
### S3 Demo
https://github.com/user-attachments/assets/bd46eec9-ef77-4f69-9ef0-e7d1612ff9b3
---
### Local FS Demo
https://github.com/user-attachments/assets/8251cea0-81b3-4cac-a968-9576c358f334
## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):
## Changes Made
<!-- List the specific changes made in this PR -->
-
-
-
## Testing
<!-- Describe how you tested your changes -->
## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->
## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages
## Related Issues
<!-- Link any related issues using "Fixes #issue_number" or "Relates to
#issue_number" -->
## Additional Notes
<!-- Add any additional notes, concerns, or context for reviewers -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
---------
Co-authored-by: vasilije <vas.markovic@gmail.com>
<!-- .github/pull_request_template.md -->
## Description
Add logging to logs file
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
# Add Support for ChromaDB
## Summary
This PR adds support for ChromaDB as a vector database option in the
Cognee application. ChromaDB is a modern, open-source embedding database
designed for AI applications.
## Changes
- Created a new ChromaDBAdapter implementation for vector database
operations
- Added comprehensive test suite for ChromaDB functionality
- Updated docker-compose.yml to include ChromaDB service
- Modified environment configuration to support ChromaDB settings
- Updated vector engine creation logic to support ChromaDB as an option
## Technical Details
- Implemented `ChromaDBAdapter.py` (347 lines) with full CRUD operations
for vector data
- Created test suite (`test_chromadb.py`) with 171 lines of test
coverage
- Updated vector engine creation process to dynamically select ChromaDB
when configured
- Modified settings router to accommodate new database option
- Updated environment template with ChromaDB configuration options
## Docker Changes
- Added ChromaDB service to docker-compose.yml with appropriate
configuration
This PR enhances Cognee's flexibility by providing an alternative vector
database option, allowing users to choose the most appropriate database
for their specific use case.
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
Tested with UI + tests.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Expanded vector database integration by adding support for Chromadb,
enabling enhanced data management and search functionalities.
- **Tests**
- Added automated tests to validate the Chromadb integration and related
operations.
- **Chores**
- Updated configuration guidance and dependency management to include
Chromadb.
- Provided an optional container deployment template for Chromadb.
- Added a new entry to ignore the `.chromadb_data/` directory in version
control.
- Introduced a new GitHub Actions workflow for testing Chromadb
integration.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
• Graph visualizations now allow exporting to a user-specified file path
for more flexible output management.
• The text embedding process has been enhanced with an additional
tokenizer option for improved performance.
• A new `ExtendableDataPoint` class has been introduced for future
extensions.
• New JSON files for companies and individuals have been added to
facilitate testing and data processing.
- **Improvements**
• Search functionality now uses updated identifiers for more reliable
content retrieval.
• Metadata handling has been streamlined across various classes by
removing unnecessary type specifications.
• Enhanced serialization of properties in the Neo4j adapter for improved
handling of complex structures.
• The setup process for databases has been improved with a new
asynchronous setup function.
- **Chores**
• Dependency and configuration updates improve overall stability and
performance.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
* fix: remove groups from UserRead model
* fix: add missing system dependencies for postgres
* fix: change vector db provider environment variable name
* fix: WeaviateAdapter retrieve bug
* fix: correctly return data point objects from retrieve method
* fix: align graph object properties
* feat: add node example
* fix: don't return anything on health endpoint
* feat: add alembic migrations
* feat: align search types with the data we store and migrate search to tasks