# feat: Add usage frequency tracking for graph elements (#1992)

Author: Vasilije (commit `114b56d829`)
## Description

This PR adds usage frequency tracking to help identify which graph
elements (nodes) are most frequently accessed during user searches.

**Related Issue:** Closes [#1458]

**The Problem:**
When users search repeatedly, we had no way to track which pieces of
information were being referenced most often. This made it impossible
to:
- Prioritize popular content in search results
- Understand which topics users care about most
- Improve retrieval by boosting frequently-used nodes

**The Solution:**
I've implemented a system that tracks usage patterns by:
1. Leveraging the existing `save_interaction=True` flag in
`cognee.search()` which creates `CogneeUserInteraction` nodes
2. Following the `used_graph_element_to_answer` edges to see which graph
elements each search referenced
3. Counting how many times each element was accessed within a
configurable time window (default: 7 days)
4. Writing a `frequency_weight` property back to frequently-accessed
nodes

This gives us a simple numeric weight on nodes that reflects real usage
patterns and can be used to improve search ranking, power analytics
dashboards, or surface trending topics.
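The counting step can be sketched in plain Python. This is a simplified, hypothetical model of the logic: the `(element_id, accessed_at)` pair shape stands in for the `used_graph_element_to_answer` edges the real task reads from the graph store.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def extract_usage_frequency(interactions, time_window_days=7, min_frequency=2):
    """Count accesses per graph element within a recent time window.

    `interactions` is an iterable of (element_id, accessed_at) pairs,
    standing in for the `used_graph_element_to_answer` edges the real
    task would read from the graph store.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=time_window_days)
    counts = Counter(
        element_id
        for element_id, accessed_at in interactions
        if accessed_at >= cutoff
    )
    # Minimum threshold filters out noise from rarely accessed nodes.
    return {eid: n for eid, n in counts.items() if n >= min_frequency}


now = datetime.now(timezone.utc)
sample = [
    ("chunk-1", now - timedelta(days=1)),
    ("chunk-1", now - timedelta(days=2)),
    ("entity-9", now - timedelta(days=30)),  # outside the 7-day window
    ("entity-3", now - timedelta(days=3)),   # one access: below threshold
]
print(extract_usage_frequency(sample))  # {'chunk-1': 2}
```

The time-windowed (rather than cumulative) counting keeps the weight responsive to recent behavior, which matches the design decision below.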

**Key Design Decisions:**
- Time-windowed counting (not cumulative) - focuses on recent usage
patterns
- Configurable minimum threshold - filters out noise from rarely
accessed nodes
- Neo4j-first implementation using Cypher queries - works with our
primary production database
- Documented Kuzu limitation - requires schema changes; deferred to
future work per team discussion

The implementation follows existing patterns in Cognee's memify pipeline
and can be run as a scheduled task or on-demand.

**Known Limitations:**
**Kuzu adapter not currently supported** - Kuzu requires properties to
be defined in the schema at node creation time, so dynamic property
updates don't work. I'm opening a separate issue to track Kuzu support,
which will require schema modifications in the Kuzu adapter. For now,
this feature works with Neo4j (our primary production database).

**Follow-up Issue:** #1993 

## Acceptance Criteria

**Core Functionality:**
-  `extract_usage_frequency()` correctly counts node access frequencies
from interaction data
-  `add_frequency_weights()` writes `frequency_weight` property to
Neo4j nodes
-  Time window filtering works (only counts recent interactions)
-  Minimum threshold filtering works (excludes rarely-used nodes)
-  Element type distribution tracked for analytics
-  Gracefully handles unsupported adapters (logs warning, doesn't
crash)
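The write-back and its graceful-degradation behavior can be illustrated with a minimal sketch. The names, return value, and `SUPPORTED_ADAPTERS` set are illustrative assumptions, not the actual Cognee API.

```python
import logging

logger = logging.getLogger("usage_frequency")

# Kuzu is deferred: dynamic property updates need schema changes there.
SUPPORTED_ADAPTERS = {"neo4j"}


def add_frequency_weights(adapter_name, frequencies):
    """Write `frequency_weight` back to nodes, or skip gracefully.

    Returns the number of nodes that would be updated; 0 when the
    adapter is unsupported. Against Neo4j, the real task would issue a
    parameterized Cypher statement (e.g. UNWIND ... SET) at this point.
    """
    if adapter_name.lower() not in SUPPORTED_ADAPTERS:
        logger.warning(
            "frequency_weight updates not supported for adapter %r; skipping",
            adapter_name,
        )
        return 0
    return len(frequencies)


add_frequency_weights("kuzu", {"chunk-1": 2})   # logs a warning, returns 0
add_frequency_weights("neo4j", {"chunk-1": 2})  # returns 1
```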

**Testing Verification:**
1. Run the end-to-end example with Neo4j:
   ```bash
   # Update .env for Neo4j
   GRAPH_DATABASE_PROVIDER=neo4j
   GRAPH_DATASET_HANDLER=neo4j_aura_dev
   
   python extract_usage_frequency_example.py
   ```
   Should show frequencies extracted and applied to nodes

2. Verify in Neo4j Browser (http://localhost:7474):
   ```cypher
   MATCH (n) WHERE n.frequency_weight IS NOT NULL 
   RETURN n.frequency_weight, labels(n), n.text 
   ORDER BY n.frequency_weight DESC LIMIT 10
   ```
   Should return nodes with frequency weights

3. Run unit tests:
   ```bash
   python test_usage_frequency.py
   ```
   All tests pass (tests are adapter-agnostic and test core logic)

4. Test graceful handling with unsupported adapter:
   ```bash
   # Update .env for Kuzu
   GRAPH_DATABASE_PROVIDER=kuzu
   GRAPH_DATASET_HANDLER=kuzu
   
   python extract_usage_frequency_example.py
   ```
   Should log warning about Kuzu not being supported but not crash

**Files Added:**
- `cognee/tasks/memify/extract_usage_frequency.py` - Core implementation
(215 lines)
- `extract_usage_frequency_example.py` - Complete working example with
documentation
- `test_usage_frequency.py` - Unit tests for core logic
- Test utilities and Neo4j setup scripts for local development

**Tested With:**
- Neo4j 5.x (primary target, fully working)
- Kuzu (gracefully skips with warning)
- Python 3.10, 3.11
- Existing Cognee interaction tracking (save_interaction=True)

**What This Solves:**
This directly addresses the need for usage-based ranking mentioned in
[#1458]. Now teams can:
- See which information gets referenced most in their knowledge base
- Build analytics dashboards showing popular topics
- Weight search results by actual usage patterns
- Identify content that needs improvement (low frequency despite high
relevance)
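As an illustration of weighting search results by usage, here is a hedged sketch of a reranker that consumes the `frequency_weight` values. The `log1p` blend and the `alpha` parameter are assumptions for the example, not part of this PR.

```python
import math


def rerank_by_usage(results, weights, alpha=0.1):
    """Blend vector relevance with usage frequency.

    `results` is a list of (node_id, relevance_score) pairs and
    `weights` maps node_id -> frequency_weight.
    """
    def boosted(item):
        node_id, score = item
        # Log damping keeps very popular nodes from drowning out relevance.
        return score * (1 + alpha * math.log1p(weights.get(node_id, 0)))

    return sorted(results, key=boosted, reverse=True)


results = [("a", 0.80), ("b", 0.78)]
weights = {"b": 37}  # "b" carries a high frequency_weight
print(rerank_by_usage(results, weights))  # "b" now outranks "a"
```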

## Type of Change

- [x] New feature (non-breaking change that adds functionality)

## Screenshots
**Output from running the E2E example showing frequency extraction:**
<img width="1125" height="664" alt="image"
src="https://github.com/user-attachments/assets/455c1ee4-525d-498b-8219-8f12a15292eb"
/>
<img width="1125" height="664" alt="image"
src="https://github.com/user-attachments/assets/64d5da31-85db-427b-b4b4-df47a9c12d6f"
/>
<img width="822" height="456" alt="image"
src="https://github.com/user-attachments/assets/69967354-d550-4818-9aff-a2273e48c5f3"
/>


**Neo4j Browser verification:**
```
✓ Found 6 nodes with frequency_weight in Neo4j!
Sample weighted nodes:
  - Weight: 37, Type: ['DocumentChunk']
  - Weight: 30, Type: ['Entity']
```

## Pre-submission Checklist

- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation

I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.




## Summary by CodeRabbit

* **New Features**
  * Added usage frequency extraction that aggregates interaction data and weights frequently accessed graph elements.
  * Frequency analysis supports configurable time windows, minimum interaction thresholds, and element type filtering.
  * Automatic frequency weight propagation to supported graph backends (currently Neo4j; Kuzu support is deferred).

* **Documentation**
  * Added a comprehensive example script demonstrating end-to-end usage frequency extraction, weighting, and analysis.



# Cognee - Accurate and Persistent AI Memory

Demo · Docs · Learn More · Join Discord · Join r/AIMemory · Community Plugins & Add-ons


Use your data to build personalized and dynamic memory for AI Agents. Cognee lets you replace RAG with scalable and modular ECL (Extract, Cognify, Load) pipelines.

🌐 Available Languages: Deutsch | Español | Français | 日本語 | 한국어 | Português | Русский | 中文

## Why cognee?

## About Cognee

Cognee is an open-source tool and platform that transforms your raw data into persistent and dynamic AI memory for Agents. It combines vector search with graph databases to make your documents both searchable by meaning and connected by relationships.

You can use Cognee in two ways:

  1. Self-host Cognee Open Source, which stores all data locally by default.
  2. Connect to Cognee Cloud, and get the same OSS stack on managed infrastructure for easier development and productionization.

Cognee Open Source (self-hosted):

- Interconnects any type of data, including past conversations, files, images, and audio transcriptions
- Replaces traditional RAG systems with a unified memory layer built on graphs and vectors
- Reduces developer effort and infrastructure cost while improving quality and precision
- Provides Pythonic data pipelines for ingestion from 30+ data sources
- Offers high customizability through user-defined tasks, modular pipelines, and built-in search endpoints

Cognee Cloud (managed):

- Hosted web UI dashboard
- Automatic version updates
- Resource usage analytics
- GDPR compliant, enterprise-grade security

## Basic Usage & Feature Guide

To learn more, check out this short, end-to-end Colab walkthrough of Cognee's core features.


## Quickstart

Let's try Cognee in just a few lines of code. For detailed setup and configuration, see the Cognee Docs.

### Prerequisites

- Python 3.10 to 3.13

### Step 1: Install Cognee

You can install Cognee with pip, poetry, uv, or your preferred Python package manager.

```bash
uv pip install cognee
```

### Step 2: Configure the LLM

```python
import os
os.environ["LLM_API_KEY"] = "YOUR_OPENAI_API_KEY"
```

Alternatively, create a .env file using our template.

To integrate other LLM providers, see our LLM Provider Documentation.

### Step 3: Run the Pipeline

Cognee will take your documents, generate a knowledge graph from them and then query the graph based on combined relationships.

Now, run a minimal pipeline:

```python
import cognee
import asyncio


async def main():
    # Add text to cognee
    await cognee.add("Cognee turns documents into AI memory.")

    # Generate the knowledge graph
    await cognee.cognify()

    # Add memory algorithms to the graph
    await cognee.memify()

    # Query the knowledge graph
    results = await cognee.search("What does Cognee do?")

    # Display the results
    for result in results:
        print(result)


if __name__ == '__main__':
    asyncio.run(main())
```

As you can see, the output is generated from the document we previously stored in Cognee:

  Cognee turns documents into AI memory.

### Use the Cognee CLI

As an alternative, you can get started with these essential commands:

```bash
cognee-cli add "Cognee turns documents into AI memory."
cognee-cli cognify
cognee-cli search "What does Cognee do?"
cognee-cli delete --all
```

To open the local UI, run:

```bash
cognee-cli -ui
```

## Demos & Examples

See Cognee in action:

- **Persistent Agent Memory**: Cognee Memory for LangGraph Agents
- **Simple GraphRAG**: Watch Demo
- **Cognee with Ollama**: Watch Demo

## Community & Support

### Contributing

We welcome contributions from the community! Your input helps make Cognee better for everyone. See CONTRIBUTING.md to get started.

### Code of Conduct

We're committed to fostering an inclusive and respectful community. Read our Code of Conduct for guidelines.

## Research & Citation

We recently published a research paper on optimizing knowledge graphs for LLM reasoning:

```bibtex
@misc{markovic2025optimizinginterfaceknowledgegraphs,
      title={Optimizing the Interface Between Knowledge Graphs and LLMs for Complex Reasoning},
      author={Vasilije Markovic and Lazar Obradovic and Laszlo Hajdu and Jovan Pavlovic},
      year={2025},
      eprint={2505.24478},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2505.24478},
}
```