# feat: Add a task that deletes the old data that has not been accessed in a while (#1751)
  
## Description  
  
This PR implements a data deletion system for unused DataPoint models
based on last access tracking. The system tracks when data is accessed
during search operations and provides cleanup functionality to remove
data that hasn't been accessed within a configurable time threshold.
  
**Key Changes:**  
1. Added `last_accessed` timestamp field to the SQL `Data` model  
2. Added `last_accessed_at` timestamp field to the graph `DataPoint`
model
3. Implemented `update_node_access_timestamps()` function that updates
both graph nodes and SQL records during search operations
4. Created `cleanup_unused_data()` function with SQL-based deletion mode
for whole document cleanup
5. Added Alembic migration to add `last_accessed` column to the `data`
table
6. Integrated timestamp tracking into `get_context()` in retrievers (see the sketch after this list)  
7. Added comprehensive end-to-end test for the cleanup functionality  
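
For illustration, here is a minimal sketch of how the access tracking from items 3 and 6 could be wired into a retriever. The module path and the `update_node_access_timestamps(items)` signature come from this PR; the retriever class shape, the `search_chunks()` helper, and whether the call is awaited are assumptions, not the exact implementation.

```python
# Sketch only: retriever internals are simplified and the search helper is hypothetical.
from cognee.modules.retrieval.utils.update_node_access_timestamps import (
    update_node_access_timestamps,
)


class ChunksRetriever:
    async def get_context(self, query: str):
        # Run the usual vector/graph lookup first (hypothetical helper).
        found_chunks = await self.search_chunks(query)

        # Stamp the returned DataPoints as "just accessed" so the cleanup
        # task later skips them; this updates both graph nodes and SQL rows.
        await update_node_access_timestamps(found_chunks)

        return found_chunks
```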
  
## Related Issues  
Fixes #[issue_number]  
  
## Type of Change  
- [x] New feature (non-breaking change that adds functionality)  
- [ ] Bug fix (non-breaking change that fixes an issue)  
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update  
- [ ] Code refactoring  
- [ ] Performance improvement  
  
## Database Changes  
- [x] This PR includes database schema changes  
- [x] Alembic migration included: `add_last_accessed_to_data` (sketched below)  
- [x] Migration adds `last_accessed` column to `data` table  
- [x] Migration includes backward compatibility (nullable column)  
- [x] Migration tested locally  
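
A rough sketch of what the migration body could look like, assuming a timezone-aware `DateTime` column; the revision identifiers here are placeholders rather than the real generated hashes:

```python
"""add last_accessed to data (sketch)"""

import sqlalchemy as sa
from alembic import op

# Placeholder identifiers; the generated revision hashes in alembic/versions/ differ.
revision = "add_last_accessed_to_data"
down_revision = "previous_revision_placeholder"
branch_labels = None
depends_on = None


def upgrade():
    # Nullable column keeps existing rows valid, preserving backward compatibility.
    op.add_column(
        "data",
        sa.Column("last_accessed", sa.DateTime(timezone=True), nullable=True),
    )


def downgrade():
    op.drop_column("data", "last_accessed")
```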
  
## Implementation Details  
  
### Files Modified:  
1. **cognee/modules/data/models/Data.py** - Added `last_accessed` column
2. **cognee/infrastructure/engine/models/DataPoint.py** - Added
`last_accessed_at` field
3. **cognee/modules/retrieval/chunks_retriever.py** - Integrated
timestamp tracking in `get_context()`
4. **cognee/modules/retrieval/utils/update_node_access_timestamps.py**
(new file) - Core tracking logic
5. **cognee/tasks/cleanup/cleanup_unused_data.py** (new file) - Cleanup
implementation
6. **alembic/versions/[revision]_add_last_accessed_to_data.py** (new
file) - Database migration
7. **cognee/tests/test_cleanup_unused_data.py** (new file) - End-to-end
test
  
### Key Functions:  
- `update_node_access_timestamps(items)` - Updates timestamps in both
graph and SQL
- `cleanup_unused_data(minutes_threshold, dry_run, text_doc)` - Main cleanup function (usage sketch below)
- SQL-based cleanup mode uses `cognee.delete()` for proper document
deletion
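
A hedged usage sketch based only on the signature listed above; the exact parameter semantics and the return value are assumptions:

```python
import asyncio

# Module path from this PR; whether the function is a coroutine is assumed here.
from cognee.tasks.cleanup.cleanup_unused_data import cleanup_unused_data


async def main():
    # Dry run: report what would be deleted without removing anything.
    await cleanup_unused_data(
        minutes_threshold=60 * 24 * 30,  # roughly 30 days without access
        dry_run=True,
        text_doc=True,  # SQL-based whole-document mode (assumed meaning)
    )

    # Real run: stale documents are removed via cognee.delete().
    await cleanup_unused_data(
        minutes_threshold=60 * 24 * 30,
        dry_run=False,
        text_doc=True,
    )


if __name__ == "__main__":
    asyncio.run(main())
```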
  
## Testing  
- [x] Added end-to-end test: `test_textdocument_cleanup_with_sql()`  
- [x] Test covers: add → cognify → search → timestamp verification → aging → cleanup → deletion verification (outlined in the sketch below)
- [x] Test verifies cleanup across all storage systems (SQL, graph,
vector)
- [x] All existing tests pass  
- [x] Manual testing completed  
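
The test flow above roughly corresponds to the outline below (a simplified sketch: the aging helper is hypothetical and the real test's assertions are more thorough):

```python
import cognee
from cognee.tasks.cleanup.cleanup_unused_data import cleanup_unused_data


async def test_textdocument_cleanup_with_sql():
    # add -> cognify: ingest a document and build graph/vector data from it.
    await cognee.add("A document that will later go stale.")
    await cognee.cognify()

    # search: should stamp last_accessed on the touched data.
    await cognee.search("stale document")

    # aging: backdate last_accessed in SQL so the data falls outside the
    # threshold (hypothetical helper; the real test does this differently).
    await backdate_last_accessed(minutes=120)

    # cleanup: remove everything not accessed within the last 60 minutes.
    await cleanup_unused_data(minutes_threshold=60, dry_run=False, text_doc=True)

    # verification: the document should be gone from SQL, graph and vector stores.
    assert await cognee.search("stale document") == []
```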
  
## Screenshots/Videos  
N/A - Backend functionality  
  
## Pre-submission Checklist  
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)  
- [x] All new and existing tests pass  
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description  
- [x] My commits have clear and descriptive messages  
  
## Breaking Changes  
None - This is a new feature that doesn't affect existing functionality.
  

## DCO Affirmation  
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
Resolves #1335 


## Summary by CodeRabbit

* **New Features**
  * Added access timestamp tracking to monitor when data is last retrieved.
  * Introduced automatic cleanup of unused data based on configurable time thresholds and access history.
  * Retrieval operations now update access timestamps to ensure accurate tracking of data usage.

* **Tests**
  * Added integration test validating end-to-end cleanup workflow across storage layers.




Cognee - Accurate and Persistent AI Memory

Demo · Docs · Learn More · Join Discord · Join r/AIMemory · Community Plugins & Add-ons


Use your data to build personalized and dynamic memory for AI Agents. Cognee lets you replace RAG with scalable and modular ECL (Extract, Cognify, Load) pipelines.

🌐 Available Languages : Deutsch | Español | Français | 日本語 | 한국어 | Português | Русский | 中文

Why cognee?

About Cognee

Cognee is an open-source tool and platform that transforms your raw data into persistent and dynamic AI memory for Agents. It combines vector search with graph databases to make your documents both searchable by meaning and connected by relationships.

You can use Cognee in two ways:

  1. Self-host Cognee Open Source, which stores all data locally by default.
  2. Connect to Cognee Cloud, and get the same OSS stack on managed infrastructure for easier development and productionization.

Cognee Open Source (self-hosted):

  • Interconnects any type of data — including past conversations, files, images, and audio transcriptions
  • Replaces traditional RAG systems with a unified memory layer built on graphs and vectors
  • Reduces developer effort and infrastructure cost while improving quality and precision
  • Provides Pythonic data pipelines for ingestion from 30+ data sources
  • Offers high customizability through user-defined tasks, modular pipelines, and built-in search endpoints

Cognee Cloud (managed):

  • Hosted web UI dashboard
  • Automatic version updates
  • Resource usage analytics
  • GDPR compliant, enterprise-grade security

Basic Usage & Feature Guide

To learn more, check out this short, end-to-end Colab walkthrough of Cognee's core features.


Quickstart

Let's try Cognee in just a few lines of code. For detailed setup and configuration, see the Cognee Docs.

Prerequisites

  • Python 3.10 to 3.13

Step 1: Install Cognee

You can install Cognee with pip, poetry, uv, or your preferred Python package manager.

uv pip install cognee

Step 2: Configure the LLM

import os
os.environ["LLM_API_KEY"] = "YOUR_OPENAI_API_KEY"

Alternatively, create a .env file using our template.

To integrate other LLM providers, see our LLM Provider Documentation.

Step 3: Run the Pipeline

Cognee will ingest your documents, generate a knowledge graph from them, and then answer queries using the relationships in that graph.

Now, run a minimal pipeline:

import cognee
import asyncio


async def main():
    # Add text to cognee
    await cognee.add("Cognee turns documents into AI memory.")

    # Generate the knowledge graph
    await cognee.cognify()

    # Add memory algorithms to the graph
    await cognee.memify()

    # Query the knowledge graph
    results = await cognee.search("What does Cognee do?")

    # Display the results
    for result in results:
        print(result)


if __name__ == '__main__':
    asyncio.run(main())

As you can see, the output is generated from the document we previously stored in Cognee:

  Cognee turns documents into AI memory.

Use the Cognee CLI

As an alternative, you can get started with these essential commands:

cognee-cli add "Cognee turns documents into AI memory."

cognee-cli cognify

cognee-cli search "What does Cognee do?"
cognee-cli delete --all

To open the local UI, run:

cognee-cli -ui

Demos & Examples

See Cognee in action:

  • Persistent Agent Memory: Cognee Memory for LangGraph Agents
  • Simple GraphRAG: Watch Demo
  • Cognee with Ollama: Watch Demo

Community & Support

Contributing

We welcome contributions from the community! Your input helps make Cognee better for everyone. See CONTRIBUTING.md to get started.

Code of Conduct

We're committed to fostering an inclusive and respectful community. Read our Code of Conduct for guidelines.

Research & Citation

We recently published a research paper on optimizing knowledge graphs for LLM reasoning:

@misc{markovic2025optimizinginterfaceknowledgegraphs,
      title={Optimizing the Interface Between Knowledge Graphs and LLMs for Complex Reasoning},
      author={Vasilije Markovic and Lazar Obradovic and Laszlo Hajdu and Jovan Pavlovic},
      year={2025},
      eprint={2505.24478},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2505.24478},
}