Commit graph

67 commits

Author SHA1 Message Date
Pavel Zorin
7a48e22b13 chore: Remove trailing whitespaces in the project, fix YAMLs 2026-01-08 17:15:53 +01:00
Igor Ilic
7de3356b1f fix: Resolve issue with migration order 2026-01-08 14:28:39 +01:00
Vasilije
a0f25f4f50
feat: redo notebook tutorials (#1922)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Two interactive tutorial notebooks added (Cognee Basics, Python
Development) with runnable code and rich markdown; MarkdownPreview for
rendered markdown; instance-aware notebook support and cloud proxy with
API key handling; notebook CRUD (create, save, run, delete).

* **Bug Fixes**
  * Improved authentication handling to treat 401/403 consistently.

* **Improvements**
* Auto-expanding text areas; better error propagation from dataset
operations; migration to allow toggling deletability for legacy tutorial
notebooks.

* **Tests**
  * Expanded tests for tutorial creation and loading.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-01 14:44:04 +01:00
Igor Ilic
2c4f9b07ac fix: Resolve migration issue 2025-12-19 13:35:14 +01:00
Igor Ilic
3bc3f63362 fix: Resolve issues with migration 2025-12-19 11:55:13 +01:00
hajdul88
4b71995a70 ruff 2025-12-19 10:25:24 +01:00
Vasilije
eb444ca18f
feat: Add a task that deletes the old data that has not been accessed in a while (#1751)
<!-- .github/pull_request_template.md -->  
  
## Description  
  
This PR implements a data deletion system for unused DataPoint models
based on last access tracking. The system tracks when data is accessed
during search operations and provides cleanup functionality to remove
data that hasn't been accessed within a configurable time threshold.
  
**Key Changes:**  
1. Added `last_accessed` timestamp field to the SQL `Data` model  
2. Added `last_accessed_at` timestamp field to the graph `DataPoint`
model
3. Implemented `update_node_access_timestamps()` function that updates
both graph nodes and SQL records during search operations
4. Created `cleanup_unused_data()` function with SQL-based deletion mode
for whole document cleanup
5. Added Alembic migration to add `last_accessed` column to the `data`
table
6. Integrated timestamp tracking into  in retrievers  
7. Added comprehensive end-to-end test for the cleanup functionality  
  
## Related Issues  
Fixes #[issue_number]  
  
## Type of Change  
- [x] New feature (non-breaking change that adds functionality)  
- [ ] Bug fix (non-breaking change that fixes an issue)  
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update  
- [ ] Code refactoring  
- [ ] Performance improvement  
  
## Database Changes  
- [x] This PR includes database schema changes  
- [x] Alembic migration included: `add_last_accessed_to_data`  
- [x] Migration adds `last_accessed` column to `data` table  
- [x] Migration includes backward compatibility (nullable column)  
- [x] Migration tested locally  
  
## Implementation Details  
  
### Files Modified:  
1. **cognee/modules/data/models/Data.py** - Added `last_accessed` column
2. **cognee/infrastructure/engine/models/DataPoint.py** - Added
`last_accessed_at` field
3. **cognee/modules/retrieval/chunks_retriever.py** - Integrated
timestamp tracking in `get_context()`
4. **cognee/modules/retrieval/utils/update_node_access_timestamps.py**
(new file) - Core tracking logic
5. **cognee/tasks/cleanup/cleanup_unused_data.py** (new file) - Cleanup
implementation
6. **alembic/versions/[revision]_add_last_accessed_to_data.py** (new
file) - Database migration
7. **cognee/tests/test_cleanup_unused_data.py** (new file) - End-to-end
test
  
### Key Functions:  
- `update_node_access_timestamps(items)` - Updates timestamps in both
graph and SQL
- `cleanup_unused_data(minutes_threshold, dry_run, text_doc)` - Main
cleanup function
- SQL-based cleanup mode uses `cognee.delete()` for proper document
deletion
  
## Testing  
- [x] Added end-to-end test: `test_textdocument_cleanup_with_sql()`  
- [x] Test covers: add → cognify → search → timestamp verification →
aging → cleanup → deletion verification
- [x] Test verifies cleanup across all storage systems (SQL, graph,
vector)
- [x] All existing tests pass  
- [x] Manual testing completed  
  
## Screenshots/Videos  
N/A - Backend functionality  
  
## Pre-submission Checklist  
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)  
- [x] All new and existing tests pass  
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description  
- [x] My commits have clear and descriptive messages  
  
## Breaking Changes  
None - This is a new feature that doesn't affect existing functionality.
  

## DCO Affirmation  
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
Resolves #1335 

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added access timestamp tracking to monitor when data is last
retrieved.
* Introduced automatic cleanup of unused data based on configurable time
thresholds and access history.
* Retrieval operations now update access timestamps to ensure accurate
tracking of data usage.

* **Tests**
* Added integration test validating end-to-end cleanup workflow across
storage layers.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-19 09:47:31 +01:00
Boris Arzentar
d127381262
Merge remote-tracking branch 'origin/dev' into feature/cog-3550-simplify-tutorial-notebook 2025-12-18 15:28:56 +01:00
Boris Arzentar
f93d414e94
feat: simplify the current tutorial and add cognee basics tutorial 2025-12-18 15:28:45 +01:00
Igor Ilic
b77961b0f1 fix: Resolve issues with data label PR, add tests and upgrade migration 2025-12-16 20:59:17 +01:00
Igor Ilic
56b03c89f3 Merge branch 'dev' into add-custom-label-apenade 2025-12-16 19:15:30 +01:00
Igor Ilic
127d9860df
feat: Add dataset database handler info (#1887)
<!-- .github/pull_request_template.md -->

## Description
Add info on dataset database handler used for dataset database

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Datasets now record their assigned vector and graph database handlers,
allowing per-dataset backend selection.

* **Chores**
  * Database schema expanded to store handler identifiers per dataset.
* Deletion/cleanup processes now use dataset-level handler info for
accurate removal across backends.

* **Tests**
* Tests updated to include and validate the new handler fields in
dataset creation outputs.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-12 13:22:03 +01:00
Igor Ilic
ddf802ff54 chore: Add migration of unique constraint for SQLite 2025-11-27 18:38:00 +01:00
Igor Ilic
1ff6a72fc7 refactor: set default value to empty dictionary 2025-11-26 16:45:18 +01:00
Igor Ilic
cf9edf2663 chore: Add migration for new dataset database model field 2025-11-25 18:03:35 +01:00
chinu0609
b52c1a1e25 fix: flag to enable and disable last_accessed 2025-11-24 12:50:39 +05:30
chinu0609
53d3b50f93 Merge remote-tracking branch 'upstream/dev' 2025-11-22 14:54:10 +05:30
chinu0609
43290af1b2 fix: set last_acessed to current timestamp 2025-11-19 21:00:16 +05:30
apenade
a072773995
Update alembic/versions/a1b2c3d4e5f6_add_label_column_to_data.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-11-19 16:02:27 +01:00
apenade
8ea83e4a26
Update alembic/versions/a1b2c3d4e5f6_add_label_column_to_data.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-11-18 10:44:49 +01:00
Andrés Peña del Río
82d48663bb Add custom label support to Data model (#1769) 2025-11-17 23:27:31 +01:00
Andrej Milićević
8e8aecb76f
feat: enable multi user for falkor (#1689)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Added multi-user support for Falkor. Adding support for the rest of the
graph dbs should be a bit easier after this first one, especially since
Falkor is hybrid.
There are a few things code quality wise that might need changing, I am
open to suggestions.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Andrej Milicevic <milicevi@Andrejs-MacBook-Pro.local>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
2025-11-11 17:03:48 +01:00
Igor Ilic
011a7fb60b fix: Resolve multi user migration 2025-11-11 13:53:19 +01:00
Igor Ilic
322c134e7e
Merge branch 'dev' into feature/cog-3245-enable-multi-user-for-falkor 2025-11-11 13:01:07 +01:00
Igor Ilic
61e1c2903f fix: Remove issue with default user creation 2025-11-06 17:00:46 +01:00
Igor Ilic
bcc59cf9a0 fix: Remove default user creation 2025-11-06 16:57:59 +01:00
Igor Ilic
0d68175167 fix: remove database creation from migrations 2025-11-06 16:53:22 +01:00
Igor Ilic
efb46c99f9 fix: resolve issue with sqlite migration 2025-11-06 16:47:42 +01:00
Igor Ilic
c146de3a4d fix: Remove creation of database and db tables from env.py 2025-11-06 16:41:00 +01:00
Igor Ilic
ef3a382669 refactor: use batch insert for SQLite as well 2025-11-06 16:23:54 +01:00
Igor Ilic
ac751bacf0 fix: Resolve SQLite migration issue 2025-11-06 14:51:25 +01:00
Igor Ilic
ac6dd08855 fix: Resolve issue with sqlite index creation 2025-11-06 14:35:26 +01:00
Igor Ilic
ce64f242b7 refactor: add droping of index as well 2025-11-05 18:04:05 +01:00
Igor Ilic
1ef5805c57 fix: Resolve issue with sync migration 2025-11-05 17:50:13 +01:00
Igor Ilic
9b6cbaf389 chore: Add multi tenant migration 2025-11-05 17:24:11 +01:00
Igor Ilic
c4807a0c67 refactor: Use user_tenants table to update 2025-11-05 16:14:37 +01:00
Igor Ilic
fa4c50f972 fix: Resolve issue with sync migration not working for postgresql 2025-11-05 16:05:33 +01:00
chinu0609
ff263c0132 fix: add column check in migration 2025-11-05 18:40:58 +05:30
Igor Ilic
9fc4199958 fix: Resolve issue with cleaning acl table 2025-11-05 13:18:47 +01:00
Igor Ilic
1643b13c95 chore: add table creation for multi-tenancy to migration 2025-11-05 12:43:01 +01:00
Igor Ilic
db2a32dd17 test: Resolve issue permission example 2025-11-04 19:17:02 +01:00
Igor Ilic
fb102f29a8 chore: Add alembic migration for multi-tenant system 2025-11-04 19:03:56 +01:00
chinu0609
d34fd9237b feat: adding last_acessed in the Data model 2025-11-04 22:04:32 +05:30
Andrej Milicevic
908d329127 feat: add alembic migrations 2025-10-30 15:15:41 +01:00
Igor Ilic
093dc2f5e3 chore: Update MCP version 2025-09-11 23:41:24 +02:00
Boris
74a3220e9f
fix: graph view with search (#1368)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-09-11 16:16:03 +02:00
Daulet Amirkhanov
47cb34e89c
feat: update sync to be two way (#1359)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-09-11 15:34:43 +02:00
Igor Ilic
1b076bce1e
Loader separation migration (#1251)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-08-14 15:50:36 -04:00
Igor Ilic
beea2f5e0a
Incremental loading migration (#1238)
<!-- .github/pull_request_template.md -->

## Description
Add relational db migration for incremental loading, change incremental
loading to work document per document instead of async together

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-08-13 07:58:09 -04:00
Igor Ilic
bc67eb9651
Regen lock files (#1153)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-25 11:45:28 -04:00