4899 commits

Author SHA1 Message Date
hajdul88
cc9eae0285 Update e2e_tests.yml 2026-01-16 19:01:41 +01:00
hajdul88
eb3e3984b3 Update e2e_tests.yml 2026-01-16 18:56:38 +01:00
hajdul88
58a4e34b5d Update e2e_tests.yml 2026-01-16 18:54:08 +01:00
hajdul88
61157725d1 Update e2e_tests.yml 2026-01-16 18:48:48 +01:00
hajdul88
9d373e7657 chore updates mcp deps for CI 2026-01-16 18:41:40 +01:00
hajdul88
cd841363a6 feat: adds mcp tool usage e2e test 2026-01-16 18:28:38 +01:00
hajdul88
01c851cf80 Adds usage logger e2e test 2026-01-16 18:15:17 +01:00
hajdul88
4e1e3dcfb9 chore: makes integration test a bit cleaner 2026-01-16 16:18:18 +01:00
hajdul88
3aadb91a6f ruff ruff 2026-01-16 16:13:22 +01:00
hajdul88
707269e8b8 feat: adds integration test for usage logger 2026-01-16 16:11:53 +01:00
Vasilije
da35f028df
Merge branch 'dev' into feature/cog-3502-tool-logging-with-redis 2026-01-16 15:26:50 +01:00
Vasilije
1d674d459f
feat: Configurable batch size (#1941)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Added configurable chunks-per-batch to control per-batch processing
    size via CLI flag, API payload, and configuration; defaults are now
    driven by config with an automatic fallback.

* **Style / Documentation**
  * Updated contribution/style guidelines (formatting, line length,
    string-quote rule, pre-commit note).

* **Tests**
  * Updated CLI tests to verify propagation of the new chunks-per-batch
    parameter.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-16 15:20:10 +01:00
hajdul88
1c96e3b469 ruff ruff 2026-01-16 13:01:46 +01:00
hajdul88
2e9f646edd
Merge branch 'dev' into feature/cog-3502-tool-logging-with-redis 2026-01-16 13:00:11 +01:00
hajdul88
2ad8bcf6e9 feat: adds unit test for usage logging 2026-01-16 12:59:45 +01:00
Igor Ilic
2c29868f9a
Neo4j multiuser delete (#1985)
<!-- .github/pull_request_template.md -->

## Description
- Add delete ability for Neo4j Aura 
- Refactor Neo4j Aura to use aiohttp to make async requests and perform
better

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Performance Improvements**
  * Neo4j Aura database operations are now asynchronous, eliminating
    blocking requests and improving system responsiveness during dataset
    management.
  * Token retrieval and database provisioning workflows now use
    non-blocking asynchronous calls.
  * Enhanced error handling for database API interactions.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-16 12:43:09 +01:00
hajdul88
abb45c65d7 feat: adding and fixing mcp tool logging 2026-01-15 18:03:47 +01:00
hajdul88
e17ca5ac59 feat: adds logging to memify endpoint 2026-01-15 17:35:06 +01:00
hajdul88
7773e811b0 chore: cleaning and adding correct defaults 2026-01-15 16:31:38 +01:00
hajdul88
c0a7b14ff3 feat: adds log usage decorator to main api endpoints 2026-01-15 16:04:09 +01:00
hajdul88
bc8c6e8bae Update usage_logger.py 2026-01-15 16:02:22 +01:00
hajdul88
2d5e74ced0 fix: fixes codebunny suggestion 2026-01-15 16:01:21 +01:00
hajdul88
b83af5f63f feat: adds new exception to shared usage logger 2026-01-15 15:58:40 +01:00
hajdul88
7ebe8563c5 ruff 2026-01-15 15:21:24 +01:00
hajdul88
8dc358da39 feat: adds default param logging 2026-01-15 15:21:15 +01:00
hajdul88
bf2357e7bf chore: cleaning usage logger logic 2026-01-15 15:16:26 +01:00
hajdul88
e803f10417 feat: implements first version of usage_logger decorator 2026-01-15 14:22:01 +01:00
hajdul88
34513f2c10 fix: fixes unit test for cacheconfig params 2026-01-15 12:46:19 +01:00
hajdul88
ace34b9a91 feat: adds log key to RedisAdapter 2026-01-15 11:56:01 +01:00
hajdul88
8b49f892ce ruff fix 2026-01-15 11:53:54 +01:00
hajdul88
e8edf4482d feat: adds usage logging and log key to the cache engine factory 2026-01-15 11:53:36 +01:00
hajdul88
97fcc15af5 feat: updates constructor params in base class 2026-01-15 11:49:37 +01:00
hajdul88
f4c2365c23 Adds default methods to satisfy base class (FSCache does not support the logging for now) 2026-01-15 11:48:10 +01:00
hajdul88
8f0705359a feat: adds log usage and get logs operations to RedisAdapter 2026-01-15 11:13:03 +01:00
hajdul88
90cf79b420 feat: adds new config params to Cacheconfig 2026-01-15 11:10:13 +01:00
hajdul88
eb8996dd81 feat: extends CacheDBInterface base class with the logging related methods 2026-01-15 11:08:30 +01:00
Igor Ilic
4765f9e4a0
feat: multiquery triplet search (#1991)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
- Adds batch search support to `brute_force_triplet_search` with a new
`query_batch` parameter that accepts a list of queries in addition to
the existing single `query` parameter.
- Introduces a new `NodeEdgeVectorSearch` class that encapsulates vector
search operations, handling embedding and distance retrieval for both
single-query and batch-query modes.
- Returns `List[List[Edge]]` (one list per query) when using
`query_batch`, instead of the single `List[Edge]` format used for single
queries.
- Adds comprehensive test coverage including new test files and cases
for the `NodeEdgeVectorSearch` class, batch search functionality, and
edge cases for both single and batch modes.
- Refactors code by extracting vector search logic into the new class
and adding a helper function `_get_top_triplet_importances` to reduce
code duplication and improve maintainability.
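
The return-shape contract described above could be sketched roughly as follows. This is a hypothetical stand-in, not the actual `brute_force_triplet_search` implementation: the `search_triplets` function, the minimal `Edge` dataclass, and the word-overlap scoring are assumptions made purely for illustration. Only the `query` / `query_batch` parameters and the `List[Edge]` versus `List[List[Edge]]` return shapes come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Edge:
    # Hypothetical minimal edge; the real Edge type carries more data.
    source: str
    relation: str
    target: str


def _rank(query: str, edges: List[Edge], top_k: int) -> List[Edge]:
    # Toy scoring (assumption): number of query words found in the triplet text.
    words = set(query.lower().split())

    def score(edge: Edge) -> int:
        text = f"{edge.source} {edge.relation} {edge.target}".lower().split()
        return len(words & set(text))

    return sorted(edges, key=score, reverse=True)[:top_k]


def search_triplets(
    edges: List[Edge],
    query: Optional[str] = None,
    query_batch: Optional[List[str]] = None,
    top_k: int = 1,
) -> Union[List[Edge], List[List[Edge]]]:
    # Exactly one of `query` / `query_batch` must be supplied.
    if (query is None) == (query_batch is None):
        raise ValueError("Provide exactly one of query or query_batch")
    if query_batch is not None:
        # Batch mode: one result list per query, in input order.
        return [_rank(q, edges, top_k) for q in query_batch]
    # Single mode: a flat list, as before batch support was added.
    return _rank(query, edges, top_k)
```

Callers that pass `query_batch` index the outer list by query position, while existing single-query callers keep receiving a flat list.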
## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Added batch-query support to triplet search; batch returns per-query
    nested results while single-query remains flat.
  * Introduced a unified vector search controller to embed queries and
    retrieve node/edge distances across collections.

* **Bug Fixes**
  * Improved input validation and safer error handling for missing
    collections and batch failures.
  * Stopped adding duplicate skeleton edge links after edge creation.

* **Tests**
  * Added comprehensive unit and integration tests covering single/batch
    flows and edge cases.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-14 14:40:29 +01:00
Igor Ilic
f09f66e90d
feat: Remove combined search (#1990)
- Remove use_combined_context parameter from search functions
- Remove CombinedSearchResult class from types module
- Update API routers to remove combined search support
- Remove prepare_combined_context helper function
- Update tutorial notebook to remove use_combined_context usage
- Simplify search return types to always return List[SearchResult]

This removes the combined search feature which aggregated results across
multiple datasets into a single response. Users can still search across
multiple datasets and get results per dataset.
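
A minimal sketch of consuming per-dataset results from a flat list, assuming each result exposes a `dataset_name` attribute (the summary for this commit mentions that field). The `SearchResult` dataclass and the `group_by_dataset` helper here are hypothetical illustrations, not the project's actual types or API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SearchResult:
    # Hypothetical shape; the field names are assumptions for this sketch.
    dataset_name: str
    text: str


def group_by_dataset(results: List[SearchResult]) -> Dict[str, List[str]]:
    # Bucket result texts by the dataset they came from, keeping input order.
    grouped: Dict[str, List[str]] = defaultdict(list)
    for result in results:
        grouped[result.dataset_name].append(result.text)
    return dict(grouped)
```

With a single flat return type, any cross-dataset aggregation becomes the caller's choice rather than a separate response shape.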

🤖 Generated with [Claude Code](https://claude.com/claude-code)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Breaking Changes**
  * Search API response simplified: combined-context result type removed
    and the legacy combined-context request flag eliminated, changing
    response shapes.

* **New Features**
  * dataset_name added to each search result for clearer attribution.

* **Refactor**
  * Search logic and return shapes streamlined for access-control and
    per-dataset flows; telemetry and request parameters aligned.

* **Tests**
  * Combined-context related tests removed or updated to reflect
    simplified behavior.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-14 11:31:31 +01:00
Igor Ilic
a27b4b5cd0 refactor: Add back verbose parameter to search 2026-01-13 21:11:57 +01:00
vasilije
bd03a43efa add fix 2026-01-13 17:56:55 +01:00
Vasilije
2b5804f2de
Merge branch 'dev' into remove-combined-search 2026-01-13 17:49:07 +01:00
Vasilije
d341aeabf4
Merge branch 'dev' into feature/cog-3504-multiquery-triplet-search 2026-01-13 17:47:35 +01:00
Igor Ilic
dd16ba89c3
Main merge vol9 (#1994)
<!-- .github/pull_request_template.md -->

## Description
Resolve conflict and merge commits from main to dev

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Add top_k to control number of search results
  * Add verbose option to include/exclude detailed graphs in search output

* **Improvements**
  * Examples now use pretty-printed output for clearer readability
  * Startup handles migration failures more gracefully with a fallback
    initialization path

* **Documentation**
  * Updated contributing guidance and added explicit run instructions for
    examples

* **Chores**
  * Project version bumped to 0.5.1
  * Adjusted frontend framework version constraint

* **Tests**
  * Updated tests to exercise verbose search behavior

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-13 17:28:03 +01:00
Igor Ilic
48c8a2996f test: Update test search options with verbose mode 2026-01-13 16:27:58 +01:00
lxobr
08779398b0 fix: deduplicate skeleton edges 2026-01-13 16:15:49 +01:00
Igor Ilic
dce51efbe3 chore: ruff format and refactor on contributor PR 2026-01-13 15:10:21 +01:00
Igor Ilic
0d2f66fa1d
Merge branch 'dev' into main-merge-vol9 2026-01-13 15:00:29 +01:00
Igor Ilic
9e5ecffc6e chore: Update test 2026-01-13 14:55:19 +01:00
Vasilije
114b56d829
feat: Add usage frequency tracking for graph elements (#1992)
## Description

This PR adds usage frequency tracking to help identify which graph
elements (nodes) are most frequently accessed during user searches.

**Related Issue:** Closes [#1458]
**The Problem:**
When users search repeatedly, we had no way to track which pieces of
information were being referenced most often. This made it impossible
to:
- Prioritize popular content in search results
- Understand which topics users care about most
- Improve retrieval by boosting frequently-used nodes

**The Solution:**
I've implemented a system that tracks usage patterns by:
1. Leveraging the existing `save_interaction=True` flag in
`cognee.search()` which creates `CogneeUserInteraction` nodes
2. Following the `used_graph_element_to_answer` edges to see which graph
elements each search referenced
3. Counting how many times each element was accessed within a
configurable time window (default: 7 days)
4. Writing a `frequency_weight` property back to frequently-accessed
nodes

This gives us a simple numeric weight on nodes that reflects real usage
patterns, which can be used to improve search ranking, analytics
dashboards, or identifying trending topics.

**Key Design Decisions:**
- Time-windowed counting (not cumulative) - focuses on recent usage
patterns
- Configurable minimum threshold - filters out noise from rarely
accessed nodes
- Neo4j-first implementation using Cypher queries - works with our
primary production database
- Documented Kuzu limitation - requires schema changes, leaving for
future work as acceptable per team discussion

The implementation follows existing patterns in Cognee's memify pipeline
and can be run as a scheduled task or on-demand.
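
The time-windowed counting and minimum threshold described above might look roughly like the sketch below. It is not the code in `extract_usage_frequency.py`: the `count_recent_usage` name, its parameters, and the `(timestamp, element_id)` input format are assumptions for illustration, and the real task reads interactions from the graph rather than from an in-memory list.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple


def count_recent_usage(
    interactions: Iterable[Tuple[datetime, str]],
    now: datetime,
    window: timedelta = timedelta(days=7),
    min_count: int = 1,
) -> Dict[str, int]:
    # Keep only interactions inside the time window (not cumulative).
    cutoff = now - window
    counts = Counter(element for ts, element in interactions if ts >= cutoff)
    # Drop elements below the minimum threshold to filter out noise.
    return {element: n for element, n in counts.items() if n >= min_count}
```

The resulting mapping is the kind of value that would then be written back to each node as `frequency_weight`.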

**Known Limitations:**
**Kuzu adapter not currently supported** - Kuzu requires properties to
be defined in the schema at node creation time, so dynamic property
updates don't work. I'm opening a separate issue to track Kuzu support,
which will require schema modifications in the Kuzu adapter. For now,
this feature works with Neo4j (our primary production database).

**Follow-up Issue:** #1993 

## Acceptance Criteria

**Core Functionality:**
- `extract_usage_frequency()` correctly counts node access frequencies
from interaction data
- `add_frequency_weights()` writes `frequency_weight` property to
Neo4j nodes
- Time window filtering works (only counts recent interactions)
- Minimum threshold filtering works (excludes rarely-used nodes)
- Element type distribution tracked for analytics
- Gracefully handles unsupported adapters (logs warning, doesn't
crash)

**Testing Verification:**
1. Run the end-to-end example with Neo4j:
   ```bash
   # Update .env for Neo4j
   GRAPH_DATABASE_PROVIDER=neo4j
   GRAPH_DATASET_HANDLER=neo4j_aura_dev
   
   python extract_usage_frequency_example.py
   ```
   Should show frequencies extracted and applied to nodes

2. Verify in Neo4j Browser (http://localhost:7474):
   ```cypher
   MATCH (n) WHERE n.frequency_weight IS NOT NULL 
   RETURN n.frequency_weight, labels(n), n.text 
   ORDER BY n.frequency_weight DESC LIMIT 10
   ```
   Should return nodes with frequency weights

3. Run unit tests:
   ```bash
   python test_usage_frequency.py
   ```
   All tests pass (tests are adapter-agnostic and test core logic)

4. Test graceful handling with unsupported adapter:
   ```bash
   # Update .env for Kuzu
   GRAPH_DATABASE_PROVIDER=kuzu
   GRAPH_DATASET_HANDLER=kuzu
   
   python extract_usage_frequency_example.py
   ```
   Should log warning about Kuzu not being supported but not crash

**Files Added:**
- `cognee/tasks/memify/extract_usage_frequency.py` - Core implementation
(215 lines)
- `extract_usage_frequency_example.py` - Complete working example with
documentation
- `test_usage_frequency.py` - Unit tests for core logic
- Test utilities and Neo4j setup scripts for local development

**Tested With:**
- Neo4j 5.x (primary target, fully working)
- Kuzu (gracefully skips with warning)
- Python 3.10, 3.11
- Existing Cognee interaction tracking (save_interaction=True)

**What This Solves:**
This directly addresses the need for usage-based ranking mentioned in
[#1458]. Now teams can:
- See which information gets referenced most in their knowledge base
- Build analytics dashboards showing popular topics
- Weight search results by actual usage patterns
- Identify content that needs improvement (low frequency despite high
relevance)

## Type of Change

- [x] New feature (non-breaking change that adds functionality)

## Screenshots
**Output from running the E2E example showing frequency extraction:**
<img width="1125" height="664" alt="image"
src="https://github.com/user-attachments/assets/455c1ee4-525d-498b-8219-8f12a15292eb"
/>
<img width="1125" height="664" alt="image"
src="https://github.com/user-attachments/assets/64d5da31-85db-427b-b4b4-df47a9c12d6f"
/>
<img width="822" height="456" alt="image"
src="https://github.com/user-attachments/assets/69967354-d550-4818-9aff-a2273e48c5f3"
/>


**Neo4j Browser verification:**
```
✓ Found 6 nodes with frequency_weight in Neo4j!
Sample weighted nodes:
  - Weight: 37, Type: ['DocumentChunk']
  - Weight: 30, Type: ['Entity']
```

## Pre-submission Checklist

- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation

I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
  * Added usage frequency extraction that aggregates interaction data and
    weights frequently accessed graph elements.
  * Frequency analysis supports configurable time windows, minimum
    interaction thresholds, and element type filtering.
  * Automatic frequency weight propagation to Neo4j, Kuzu, and generic
    graph database backends.

* **Documentation**
  * Added comprehensive example script demonstrating end-to-end usage
    frequency extraction, weighting, and analysis.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-13 14:46:48 +01:00
Igor Ilic
86451cfbc2 chore: update test 2026-01-13 14:43:00 +01:00