Compare commits

..

280 commits

Author SHA1 Message Date
Milenko Gavric
dab862e058 update and refactor demo examples 2026-01-20 14:09:12 +01:00
Vasilije
ba94d1270c
fix: Resolve issue with parameter caching for engine creation (#2004)
<!-- .github/pull_request_template.md -->

## Description
Resolve issues with parameter caching for engine creation by making sure
parameters are always called in the same order with the default values
fixed

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2026-01-17 22:17:40 +01:00
Vasilije
1d674d459f
feat: Configurable batch size (#1941)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added configurable chunks-per-batch to control per-batch processing
size via CLI flag, API payload, and configuration; defaults are now
driven by config with an automatic fallback.

* **Style / Documentation**
* Updated contribution/style guidelines (formatting, line length,
string-quote rule, pre-commit note).

* **Tests**
* Updated CLI tests to verify propagation of the new chunks-per-batch
parameter.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-16 15:20:10 +01:00
Igor Ilic
e13877cb65 refactor: Update cache clear calls 2026-01-16 14:51:30 +01:00
Igor Ilic
9a9eacf8c8 refactor: update test with cache change 2026-01-16 14:11:06 +01:00
Igor Ilic
e24707c8da fix: Resolve issue with parameter caching for engine creation 2026-01-16 13:43:01 +01:00
Igor Ilic
2c29868f9a
Neo4j multiuser delete (#1985)
<!-- .github/pull_request_template.md -->

## Description
- Add delete ability for Neo4j Aura 
- Refactor Neo4j Aura to use aiohttp to make async requests and perform
better

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Performance Improvements**
* Neo4j Aura database operations are now asynchronous, eliminating
blocking requests and improving system responsiveness during dataset
management.
* Token retrieval and database provisioning workflows now use
non-blocking asynchronous calls.
  * Enhanced error handling for database API interactions.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-16 12:43:09 +01:00
Igor Ilic
4765f9e4a0
feat: multiquery triplet search (#1991)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
- Adds batch search support to `brute_force_triplet_search` with a new
`query_batch` parameter that accepts a list of queries in addition to
the existing single `query` parameter.
- Introduces a new `NodeEdgeVectorSearch` class that encapsulates vector
search operations, handling embedding and distance retrieval for both
single-query and batch-query modes.
- Returns `List[List[Edge]]` (one list per query) when using
`query_batch`, instead of the single `List[Edge]` format used for single
queries.
- Adds comprehensive test coverage including new test files and cases
for the `NodeEdgeVectorSearch` class, batch search functionality, and
edge cases for both single and batch modes.
- Refactors code by extracting vector search logic into the new class
and adding a helper function `_get_top_triplet_importances` to reduce
code duplication and improve maintainability.
## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added batch-query support to triplet search; batch returns per-query
nested results while single-query remains flat.
* Introduced a unified vector search controller to embed queries and
retrieve node/edge distances across collections.

* **Bug Fixes**
* Improved input validation and safer error handling for missing
collections and batch failures.
  * Stopped adding duplicate skeleton edge links after edge creation.

* **Tests**
* Added comprehensive unit and integration tests covering single/batch
flows and edge cases.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-14 14:40:29 +01:00
Igor Ilic
f09f66e90d
feat: Remove combined search (#1990)
- Remove use_combined_context parameter from search functions
- Remove CombinedSearchResult class from types module
- Update API routers to remove combined search support
- Remove prepare_combined_context helper function
- Update tutorial notebook to remove use_combined_context usage
- Simplify search return types to always return List[SearchResult]

This removes the combined search feature which aggregated results across
multiple datasets into a single response. Users can still search across
multiple datasets and get results per dataset.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Breaking Changes**
* Search API response simplified: combined-context result type removed
and the legacy combined-context request flag eliminated, changing
response shapes.

* **New Features**
  * dataset_name added to each search result for clearer attribution.

* **Refactor**
* Search logic and return shapes streamlined for access-control and
per-dataset flows; telemetry and request parameters aligned.

* **Tests**
* Combined-context related tests removed or updated to reflect
simplified behavior.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-14 11:31:31 +01:00
Igor Ilic
a27b4b5cd0 refactor: Add back verbose parameter to search 2026-01-13 21:11:57 +01:00
vasilije
bd03a43efa add fix 2026-01-13 17:56:55 +01:00
Vasilije
2b5804f2de
Merge branch 'dev' into remove-combined-search 2026-01-13 17:49:07 +01:00
Vasilije
d341aeabf4
Merge branch 'dev' into feature/cog-3504-multiquery-triplet-search 2026-01-13 17:47:35 +01:00
Igor Ilic
dd16ba89c3
Main merge vol9 (#1994)
<!-- .github/pull_request_template.md -->

## Description
Resolve conflict and merge commits from main to dev

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Add top_k to control number of search results
* Add verbose option to include/exclude detailed graphs in search output

* **Improvements**
  * Examples now use pretty-printed output for clearer readability
* Startup handles migration failures more gracefully with a fallback
initialization path

* **Documentation**
* Updated contributing guidance and added explicit run instructions for
examples

* **Chores**
  * Project version bumped to 0.5.1
  * Adjusted frontend framework version constraint

* **Tests**
  * Updated tests to exercise verbose search behavior

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-13 17:28:03 +01:00
Igor Ilic
48c8a2996f test: Update test search options with verbose mode 2026-01-13 16:27:58 +01:00
lxobr
08779398b0 fix: deduplicate skeleton edges 2026-01-13 16:15:49 +01:00
Igor Ilic
dce51efbe3 chore: ruff format and refactor on contributor PR 2026-01-13 15:10:21 +01:00
Igor Ilic
0d2f66fa1d
Merge branch 'dev' into main-merge-vol9 2026-01-13 15:00:29 +01:00
Igor Ilic
9e5ecffc6e chore: Update test 2026-01-13 14:55:19 +01:00
Vasilije
114b56d829
feat: Add usage frequency tracking for graph elements (#1992)
## Description

This PR adds usage frequency tracking to help identify which graph
elements (nodes) are most frequently accessed during user searches.

**Related Issue:** Closes [#1458]
**The Problem:**
When users search repeatedly, we had no way to track which pieces of
information were being referenced most often. This made it impossible
to:
- Prioritize popular content in search results
- Understand which topics users care about most
- Improve retrieval by boosting frequently-used nodes

**The Solution:**
I've implemented a system that tracks usage patterns by:
1. Leveraging the existing `save_interaction=True` flag in
`cognee.search()` which creates `CogneeUserInteraction` nodes
2. Following the `used_graph_element_to_answer` edges to see which graph
elements each search referenced
3. Counting how many times each element was accessed within a
configurable time window (default: 7 days)
4. Writing a `frequency_weight` property back to frequently-accessed
nodes

This gives us a simple numeric weight on nodes that reflects real usage
patterns, which can be used to improve search ranking, analytics
dashboards, or identifying trending topics.

**Key Design Decisions:**
- Time-windowed counting (not cumulative) - focuses on recent usage
patterns
- Configurable minimum threshold - filters out noise from rarely
accessed nodes
- Neo4j-first implementation using Cypher queries - works with our
primary production database
- Documented Kuzu limitation - requires schema changes, leaving for
future work as acceptable per team discussion

The implementation follows existing patterns in Cognee's memify pipeline
and can be run as a scheduled task or on-demand.

**Known Limitations:**
**Kuzu adapter not currently supported** - Kuzu requires properties to
be defined in the schema at node creation time, so dynamic property
updates don't work. I'm opening a separate issue to track Kuzu support,
which will require schema modifications in the Kuzu adapter. For now,
this feature works with Neo4j (our primary production database).

**Follow-up Issue:** #1993 

## Acceptance Criteria

**Core Functionality:**
-  `extract_usage_frequency()` correctly counts node access frequencies
from interaction data
-  `add_frequency_weights()` writes `frequency_weight` property to
Neo4j nodes
-  Time window filtering works (only counts recent interactions)
-  Minimum threshold filtering works (excludes rarely-used nodes)
-  Element type distribution tracked for analytics
-  Gracefully handles unsupported adapters (logs warning, doesn't
crash)

**Testing Verification:**
1. Run the end-to-end example with Neo4j:
   ```bash
   # Update .env for Neo4j
   GRAPH_DATABASE_PROVIDER=neo4j
   GRAPH_DATASET_HANDLER=neo4j_aura_dev
   
   python extract_usage_frequency_examplepy
   ```
   Should show frequencies extracted and applied to nodes

2. Verify in Neo4j Browser (http://localhost:7474):
   ```cypher
   MATCH (n) WHERE n.frequency_weight IS NOT NULL 
   RETURN n.frequency_weight, labels(n), n.text 
   ORDER BY n.frequency_weight DESC LIMIT 10
   ```
   Should return nodes with frequency weights

3. Run unit tests:
   ```bash
   python test_usage_frequency.py
   ```
   All tests pass (tests are adapter-agnostic and test core logic)

4. Test graceful handling with unsupported adapter:
   ```bash
   # Update .env for Kuzu
   GRAPH_DATABASE_PROVIDER=kuzu
   GRAPH_DATASET_HANDLER=kuzu
   
   python extract_usage_frequency_example.py
   ```
   Should log warning about Kuzu not being supported but not crash

**Files Added:**
- `cognee/tasks/memify/extract_usage_frequency.py` - Core implementation
(215 lines)
- `extract_usage_frequency_example.py` - Complete working example with
documentation
- `test_usage_frequency.py` - Unit tests for core logic
- Test utilities and Neo4j setup scripts for local development

**Tested With:**
- Neo4j 5.x (primary target, fully working)
- Kuzu (gracefully skips with warning)
- Python 3.10, 3.11
- Existing Cognee interaction tracking (save_interaction=True)

**What This Solves:**
This directly addresses the need for usage-based ranking mentioned in
[#1458]. Now teams can:
- See which information gets referenced most in their knowledge base
- Build analytics dashboards showing popular topics
- Weight search results by actual usage patterns
- Identify content that needs improvement (low frequency despite high
relevance)

## Type of Change

- [x] New feature (non-breaking change that adds functionality)

## Screenshots
**Output from running the E2E example showing frequency extraction:**
<img width="1125" height="664" alt="image"
src="https://github.com/user-attachments/assets/455c1ee4-525d-498b-8219-8f12a15292eb"
/>
<img width="1125" height="664" alt="image"
src="https://github.com/user-attachments/assets/64d5da31-85db-427b-b4b4-df47a9c12d6f"
/>
<img width="822" height="456" alt="image"
src="https://github.com/user-attachments/assets/69967354-d550-4818-9aff-a2273e48c5f3"
/>


**Neo4j Browser verification:**
```
✓ Found 6 nodes with frequency_weight in Neo4j!
Sample weighted nodes:
  - Weight: 37, Type: ['DocumentChunk']
  - Weight: 30, Type: ['Entity']
```

## Pre-submission Checklist

- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation

I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
* Added usage frequency extraction that aggregates interaction data and
weights frequently accessed graph elements.
* Frequency analysis supports configurable time windows, minimum
interaction thresholds, and element type filtering.
* Automatic frequency weight propagation to Neo4j, Kuzu, and generic
graph database backends.

* **Documentation**
* Added comprehensive example script demonstrating end-to-end usage
frequency extraction, weighting, and analysis.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-13 14:46:48 +01:00
Igor Ilic
86451cfbc2 chore: update test 2026-01-13 14:43:00 +01:00
Igor Ilic
3cfbaaaa9d chore: update lock file 2026-01-13 14:30:13 +01:00
Igor Ilic
dc48d2f992 refactor: set top_k value to 10 2026-01-13 14:24:31 +01:00
Igor Ilic
b689d330ac Merge branch 'main' into main-merge-vol9 2026-01-13 14:22:22 +01:00
lxobr
c609b73cda refactor: improve methods order 2026-01-13 11:22:04 +01:00
Christina_Raichel_Francis
a91d02cd71 Merge branch 'feature/graph-usage-frequency-tracking' into dev 2026-01-12 21:58:19 +00:00
Christina_Raichel_Francis
d09b6df241 feat: feat to support issue #1458 frequency weights addition for neo4j backend 2026-01-12 18:10:51 +00:00
lxobr
872795f0cc test: add integration test for brute_force_triplet_search 2026-01-12 18:16:30 +01:00
lxobr
e5d4100070 Merge branch 'dev' into feature/cog-3504-multiquery-support-in-brute_force_triplet_search 2026-01-12 15:09:47 +01:00
lxobr
1c8d0f6da1 chore: update tests and minor tweaks 2026-01-12 14:51:50 +01:00
lxobr
c20304a92a tests: update and expand test_brute_force_triplet_search.py and test_node_edge_vector_search.py 2026-01-12 14:40:55 +01:00
lxobr
7833189001 feat: enable batch search in brute_force_triplet_search 2026-01-12 14:40:17 +01:00
lxobr
5ac288afa3 chore: tweak type hints 2026-01-12 13:44:04 +01:00
lxobr
fce018d43d test: add tests for node_edge_vector_search.py 2026-01-12 13:38:24 +01:00
lxobr
701a92cdec feat: add batch search to node_edge_vector_search.py 2026-01-12 13:27:18 +01:00
Igor Ilic
47054200c9
Merge branch 'dev' into neo4j-multiuser-delete 2026-01-12 10:41:06 +01:00
vasilije
c2d8777aab test: remove obsolete combined search tests
- Remove 3 tests for use_combined_context from test_search.py
- Remove 1 test for use_combined_context from test_search_prepare_search_result_contract.py
- Simplify test_authorized_search_non_combined_delegates to test delegation only

These tests relied on the removed use_combined_context parameter and
CombinedSearchResult type. The functionality is no longer available
after removing combined search support.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-12 07:54:38 +01:00
vasilije
748b7aeaf5 refactor: remove combined search functionality
- Remove use_combined_context parameter from search functions
- Remove CombinedSearchResult class from types module
- Update API routers to remove combined search support
- Remove prepare_combined_context helper function
- Update tutorial notebook to remove use_combined_context usage
- Simplify search return types to always return List[SearchResult]

This removes the combined search feature which aggregated results
across multiple datasets into a single response. Users can still
search across multiple datasets and get results per dataset.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-12 07:43:57 +01:00
vasilije
b7d5bf5e9c test: update cognify test assertions to include chunks_per_batch parameter
Update all cognify test assertions to expect the chunks_per_batch=None
parameter that was added to the CLI command. This fixes three failing tests:
- test_execute_basic_cognify
- test_cognify_invalid_chunk_size
- test_cognify_nonexistent_ontology_file

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-11 16:22:06 +01:00
vasilije
98394fc264 docs: add code style rules for double quotes and pre-commit 2026-01-11 16:18:29 +01:00
vasilije
7d3450cb08 style: apply ruff formatting to use double quotes 2026-01-11 16:18:13 +01:00
Vasilije
6ec44f64f7
Merge branch 'dev' into COG-3146 2026-01-11 16:16:06 +01:00
vasilije
e38c33c1b5 fix: handle missing chunks_per_batch attribute in cognify CLI command
Fix AttributeError when args.chunks_per_batch is not present in the
argparse.Namespace object. Use getattr() with default value of None
to safely access the optional chunks_per_batch parameter.

This resolves test failures in test_cli_edge_cases.py where Namespace
objects were created without the chunks_per_batch attribute.

Changes:
- Use getattr(args, 'chunks_per_batch', None) instead of direct access
- Update test assertion to expect chunks_per_batch=None parameter

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-11 16:15:29 +01:00
vasilije
ab990f7c5c docs: add CLAUDE.md for Claude Code guidance
Add comprehensive CLAUDE.md file to guide future Claude Code instances working in this repository. Includes:
- Development commands (setup, testing, code quality)
- Architecture overview (ECL pipeline, data flows, key patterns)
- Complete configuration guide (LLM providers, databases, storage)
- All 15 search types with descriptions
- Extension points for custom functionality
- Troubleshooting common issues

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-11 16:04:11 +01:00
Vasilije
861c5e33da
refactor: add type hints for user_id and visualization server args (#1987)
<!-- .github/pull_request_template.md -->

## Description
Resolves #1986 

I addressed all `ANN001` errors in `cognee/shared/utils.py`.
Updated functions:
1. send_telemetry
2. start_visualization_server
3. _sanitize_nested_properties
4. embed_logo

While fixing these errors, i've noticed that the `send_telemetry`
function lacked a type hint for `user_id`. After analyzing the `User`
models and usage patterns in the codebase, I found that `user_id` is not
strictly a `str` but can also be a `uuid.UUID` object.

Therefore, I updated the type hint to `Union[str, uuid.UUID]` (importing
`uuid` and `typing.Union`) to accurately reflect the data structure and
improve type safety.

## Acceptance Criteria
* [x] The code passes static analysis (`ruff`) without `ANN001` errors
in `cognee/shared/utils.py`.
* [x] Correct imports (`uuid`, `Union`) are added and sorted.

[Check Steps]
1. Run 'uv run ruff check cognee/shared/utils.py --select ANN001'
5. Expected result: No errors found.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Refactor**
* Improved type annotations across telemetry and sanitization utilities
for safer handling of IDs and nested properties.
* Ensured additional properties are sanitized before telemetry is sent.
* Added explicit type hints for visualization startup and logo embedding
parameters for clearer IDE support.

This release contains no user-facing changes.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-10 15:27:52 +01:00
HectorSin
4189cda895 refactor: simplify type hint and add return type for sanitize function
Signed-off-by: HectorSin <kkang15634@ajou.ac.kr>
2026-01-10 21:25:25 +09:00
HectorSin
da5660b716 refactor: fix mutable default argument in send_telemetry
Signed-off-by: HectorSin <kkang15634@ajou.ac.kr>
2026-01-10 21:21:04 +09:00
HectorSin
46c12cc0ee refactor: resolve remaining ANN001 errors in utils.py
Signed-off-by: HectorSin <kkang15634@ajou.ac.kr>
2026-01-10 21:13:10 +09:00
HectorSin
ebf2aaaa5c refactor: add type hint for handler_class
Signed-off-by: HectorSin <kkang15634@ajou.ac.kr>
2026-01-10 20:58:08 +09:00
HectorSin
f73457ef72 refactor: add type hints for user_id and visualization server args
Signed-off-by: HectorSin <kkang15634@ajou.ac.kr>
2026-01-10 20:29:50 +09:00
Vasilije
2c23063da7
fix: Better error message if cognee is run without cognee.add and cognee.cognify (#1940)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Improved error messaging in search functionality with clearer,
actionable feedback when database or user configuration prerequisites
are not met
* Standardized error response format for consistent and informative
error reporting across search operations

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-10 11:21:52 +01:00
Vasilije
f2166be823
Merge branch 'dev' into COG-3146 2026-01-10 08:49:43 +01:00
Vasilije
f03ab671e6
fix: Remove Jon Doe enitity reference due to hallucination issues (#1939)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Clarifies and tightens coreference resolution guidance across
knowledge-graph prompt templates.
> 
> - Updates coreference rules to emphasize using the most complete,
human-readable identifiers consistently (`generate_graph_prompt*.txt`)
> - Tweaks examples, notably replacing the John Doe example with a
generic "X" case in the one-shot prompt
> - Minor wording/formatting cleanups; no code changes or logic
modifications
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
8499258272. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Refined entity resolution guidance in knowledge graph generation
prompts to use more generic instructions, improving flexibility and
consistency in how entities are identified throughout the system.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-10 08:49:23 +01:00
Vasilije
a3c04d30be
Merge branch 'dev' into COG-3283 2026-01-10 08:48:55 +01:00
Vasilije
7a421dd968
Chore: Update helm chart (#1984)
<!-- .github/pull_request_template.md -->

## Description
Updated example Helm chart:
* connected PostgreSQL + pgvector to Cognee
* Added required variables and secrets
* Tested with port forwarding
* Updated readme
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify): Cloud

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Documentation**
* Updated deployment guide with example setup instructions, deployment
commands, and port forwarding details for local access.

* **Configuration**
  * Added LLM model and provider configuration settings.
* Enhanced deployment with environment variables and memory resource
limits.
  * Implemented secure secret management for API keys.
  * Adjusted resource allocations for services.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-09 19:16:45 +01:00
Igor Ilic
05961221f4 refactor: use async requests library 2026-01-09 19:08:52 +01:00
Igor Ilic
268db003e3 feat: Add delete to neo4j aura 2026-01-09 18:10:54 +01:00
Pavel Zorin
fb4796204a Chore: Fix helm chart 2026-01-09 18:06:08 +01:00
Vasilije
6dc08eb5d0
Chore: Remove Lint and Format check in favor to pre-commit (#1983)
<!-- .github/pull_request_template.md -->

## Description
Removed `List and Format check` steps. It's all done by pre-commit
checks
## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify): Dev Experience

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Streamlined the continuous integration workflow by removing redundant
job steps from the automated testing pipeline. Unit tests and
integration tests continue to run as expected.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-09 15:39:01 +01:00
lxobr
58dd518690 chore: update tests 2026-01-09 14:54:10 +01:00
lxobr
fad75e21c1 refactor: minor tweaks 2026-01-09 14:53:47 +01:00
Pavel Zorin
aeb2f39fd8 Chore: Remove Lint and Format check in favor to pre-commit 2026-01-09 14:15:36 +01:00
Vasilije
14c5a306b3
fix: Resolve issue with distributed test (#1982)
<!-- .github/pull_request_template.md -->

## Description
Update poetry lock for distributed Cognee

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2026-01-09 12:54:41 +01:00
Igor Ilic
6db193ef36 fix: Resolve issue with distributed test 2026-01-09 11:20:16 +01:00
lxobr
c79af6c8cc refactor: brute_force_triplet_search.py and node_edge_vector_search.py 2026-01-09 10:05:44 +01:00
lxobr
876120853f refactor: brute_force_triplet_search.py with context class 2026-01-09 09:10:29 +01:00
Vasilije
beb8932fea
fix: Handle Dependabot security issues (#1968)
<!-- .github/pull_request_template.md -->

## Description
Fix security issue with langchain raised by Dependabot:
https://github.com/topoteretes/cognee/security/dependabot/73

Older version of langchain has an issue

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [X ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ X] **I have tested my changes thoroughly before submitting this PR**
- [X ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ X] My code follows the project's coding standards and style
guidelines
- [ X] I have added tests that prove my fix is effective or that my
feature works
- [ X] I have added necessary documentation (if applicable)
- [X ] All new and existing tests pass
- [X ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ X] I have linked any relevant issues in the description
- [ X] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Addresses Dependabot alerts by updating critical dependencies and
refreshing the Python lockfile.
> 
> - Adds `langchain-core` to optional deps and updates locked version to
`1.2.6` (introduces `uuid-utils`)
> - Tightens HTTP stack: raises `aiohttp` to `>=3.13.3`, adds `urllib3`
runtime dep (locked to `2.6.2`)
> - Bumps frontend `next` to `16.1.7`
> - Regenerates `uv.lock` with numerous package/version updates and
platform wheels; adjusts `kubernetes` to `33.1.0` with `oauthlib` dep
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
1eb4197f1a. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
  * Updated Next.js to 16.1.7.
  * Relaxed aiohttp dependency constraint.
  * Added urllib3 as a dependency.
  * Added langchain-core to optional dependencies.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-08 21:29:20 +01:00
Vasilije
c3c8961631
Merge branch 'dev' into ffix_sec 2026-01-08 21:29:02 +01:00
Vasilije
abc6faff34
fix: fix security issue (#1967)
<!-- .github/pull_request_template.md -->

## Description
Fix security issue reported by the user
https://github.com/topoteretes/cognee/issues/1950

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> - **Dependencies:** Adds `cbor2>=5.8.0` to `pyproject.toml`; updates
`uv.lock` (including version bump and wheels) to reflect new dependency.
> - **CI/Docs:** Refines `.github/pull_request_template.md` (simplified
change types; renamed `Screenshots` section to request proof of local
tests passing).
> - **Code cleanup:** Minor formatting changes in
`LiteLLMEmbeddingEngine.py` and `get_api_auth_backend.py` with no
functional impact.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
aa4ab1ed8a. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
  * Added the cbor2 serialization library to project dependencies.

* **Documentation**
* Updated the pull request template: simplified change-type options,
tightened acceptance criteria, expanded the pre-submission checklist
with additional verification items, and renamed/clarified the
screenshots section to request local test evidence.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-08 21:28:21 +01:00
Vasilije
ada0a2be4f
Merge branch 'dev' into fix_security_issue 2026-01-08 21:28:11 +01:00
Vasilije
b1ff473a38
COG-3395: Chore: pre-commit, pre-commit action, contribution guide update (#1979)
## Description
Revisited the `CONTRIBUTING.md`:
* Added the `Required tools`
* Pre-commit requirement. It replaces `ruff` and other linting guides
* Fixed `test_library.py` paths. Made sure that the testing guide is
complete and works
* Added a `pre-commit` step to `Pre-Test` workflow. It will fail if
`pre-commit` has issues and no other tests will be triggered
* Added a sufficient LLM configuration example for tests. Moved
`cognee/.env.example` to the project root for convenience

>>> Requires: https://github.com/topoteretes/cognee/pull/1980 <<<

## Acceptance Criteria
`pre-commit` action works 
Tested pre-commit locally. If a commit violates the rules - it rejects
it and fixes the issues. Then we need to `git commit ...` again.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify): CI and DevExp improvement

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Documentation**
* Expanded contributor guide with setup, required tools, testing
instructions, examples, and updated PR submission guidance.
* Updated pull-request checklist to reference contributing instructions.

* **Chores**
* Added three new local environment variables for LLM configuration and
updated example env file.
  * Added a pre-commit validation step to CI.
  * Updated ignore list to exclude a local environment file.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-08 19:59:20 +01:00
Pavel Zorin
3e602fdad7 Renamed the pre_test workflow 2026-01-08 19:19:11 +01:00
Pavel Zorin
15a88accac Chore: use pre-commit action 2026-01-08 19:19:11 +01:00
Pavel Zorin
b0fe1a8439 CI: Speed up pre-test workflow 2026-01-08 19:19:11 +01:00
Pavel Zorin
962ddf4257 Chore: pre-commit, pre-commit action, contribution guide update 2026-01-08 19:19:07 +01:00
Vasilije
fde921ca3e
chore: Remove trailing whitespaces in the project, fix YAMLs (#1980)
<!-- .github/pull_request_template.md -->

## Description
Removes trailing whitespaces from all files in the project. Needed by
https://github.com/topoteretes/cognee/pull/1979

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added `topK` parameter support in search functionality to control
result count (1-100).
  * Added Python tool configuration via mise.toml.

* **Documentation**
* Enhanced issue templates with improved UI metadata, labels, and
clearer guidance for bug reports, feature requests, and documentation
issues.
* Expanded CONTRIBUTING.md with comprehensive contribution guidelines
and community information.

* **Chores**
* Removed unused modules: `cognee.modules.retrieval` and
`cognee.tasks.temporal_graph`.
* Applied consistent formatting and whitespace normalization across
configuration files, workflows, and documentation.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-08 19:16:09 +01:00
lxobr
f9cb490ad9 refactor: brute_force_triplet_search.py 2026-01-08 17:20:28 +01:00
Pavel Zorin
7a48e22b13 chore: Remove trailing whitespaces in the project, fix YAMLs 2026-01-08 17:15:53 +01:00
vasilije
1eb4197f1a add uv lock 2026-01-08 16:05:36 +01:00
Vasilije
c50b5fa139
Merge branch 'dev' into fix_security_issue 2026-01-08 16:00:21 +01:00
Vasilije
42dc9351f2
Merge branch 'dev' into ffix_sec 2026-01-08 15:53:44 +01:00
Vasilije
5cf63617a1
Fix dev branch ci (#1978)
<!-- .github/pull_request_template.md -->

## Description
Resolve issues with CI for dev branch with slight contributor PR
refactors

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2026-01-08 15:49:59 +01:00
Igor Ilic
7de3356b1f fix: Resolve issue with migration order 2026-01-08 14:28:39 +01:00
Igor Ilic
00697c4491 chore: Update poetry lock 2026-01-08 14:21:05 +01:00
Vasilije
1772439ea5
Update aiohttp version in pyproject.toml 2026-01-08 13:49:39 +01:00
Igor Ilic
fd6a77deec refactor: Add TODO for missing llm config parameters 2026-01-08 13:31:25 +01:00
Igor Ilic
f3215e16f9 refactor: Remove silent handling of lifetime assignment 2026-01-08 12:51:11 +01:00
Igor Ilic
07b91f3a5f refactor: Remove comment from Dockerfile 2026-01-08 12:45:03 +01:00
vasilije
af72dd2fc2 fixes to ruff format 2026-01-07 16:26:36 +01:00
vasilije
aa4ab1ed8a reformat 2026-01-06 18:05:34 +01:00
vasilije
f1f955b76a fix 2026-01-06 18:03:43 +01:00
vasilije
555eef69e3 added update to pr template 2026-01-06 17:53:42 +01:00
vasilije
5c365abf66 added update to pr template 2026-01-06 17:53:38 +01:00
vasilije
295f623db3 fix security issue 2026-01-06 17:47:54 +01:00
Christina_Raichel_Francis
e0c7e68dd6 chore: removed inconsistency in node properties btw task, e2e example and test codes 2026-01-05 22:22:47 +00:00
Vasilije
34c6652939
add configurable JWT expiration, cookie domain, CORS origins, and service restart policies (#1956)
<!-- .github/pull_request_template.md -->

## Description
This PR introduces several configuration improvements to enhance the
application's flexibility and reliability. The changes make JWT token
expiration and cookie domain configurable via environment variables,
improve CORS configuration, and add container restart policies for
better uptime.

**JWT Token Expiration Configuration:**
- Added `JWT_LIFETIME_SECONDS` environment variable to configure JWT
token expiration time
- Set default expiration to 3600 seconds (1 hour) for both API and
client authentication backends
- Removed hardcoded expiration values in favor of environment-based
configuration
- Added documentation comments explaining the JWT strategy configuration

**Cookie Domain Configuration:**
- Added `AUTH_TOKEN_COOKIE_DOMAIN` environment variable to configure
cookie domain
- When not set or empty, cookie domain defaults to `None` allowing
cross-domain usage
- Added documentation explaining cookie expiration is handled by JWT
strategy
- Updated default_transport to use environment-based cookie domain

**CORS Configuration Enhancement:**
- Added `CORS_ALLOWED_ORIGINS` environment variable with default value
of `'*'`
- Configured frontend to use `NEXT_PUBLIC_BACKEND_API_URL` environment
variable
- Set default backend API URL to `http://localhost:8000`

**Docker Service Reliability:**
- Added `restart: always` policy to all services (cognee, frontend,
neo4j, chromadb, and postgres)
- This ensures services automatically restart on failure or system
reboot
- Improves container reliability and uptime in production and
development environments

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Services now automatically restart on failure for improved
reliability.

* **Configuration**
* Cookie domain for authentication is now configurable via environment
variable, defaulting to None if not set.
* JWT token lifetime is now configurable via environment variable, with
a 3600-second default.
* CORS allowed origins are now configurable with a default of all
origins (*).
* Frontend backend API URL is now configurable, defaulting to
http://localhost:8000.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-04 10:37:13 +01:00
Vasilije
4e881cd00f
Fix: Handle empty API key in LiteLLMEmbeddingEngine (#1959)
fix(embeddings): handle empty API key in LiteLLMEmbeddingEngine

- Add conditional check for empty API key to prevent authentication
errors- Set default API key to "EMPTY" when no valid key is provided-
This ensures proper fallback behavior when API key is not configured
```

<!-- .github/pull_request_template.md -->

## Description
This PR fixes an issue where the `LiteLLMEmbeddingEngine` throws an authentication error when the `EMBEDDING_API_KEY` environment variable is empty or not set. The error message indicated `"api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable"`.

Log Error: 2025-12-23T11:36:58.220908 [error    ] Error embedding text: litellm.AuthenticationError: AuthenticationError: OpenAIException - The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable [LiteLLMEmbeddingEngine]

**Root Cause**: When initializing the embedding engine, if the `api_key` parameter is an empty string, the underlying LiteLLM client doesn't treat it as "no key provided" but instead uses this empty string to make API requests, triggering authentication failure.

**Solution**: Added a conditional check in the code that creates the `LiteLLMEmbeddingEngine` instance. If the `EMBEDDING_API_KEY` read from configuration is empty (`None` or empty string), we explicitly set the `api_key` parameter passed to the engine constructor to a non-empty placeholder string `"EMPTY"`. This aligns with LiteLLM's handling of optional authentication and prevents exceptions in scenarios where keys are not required or need to be obtained from other sources

**How to Reproduce**:
   Configure the application with the following settings (as shown in the error log):
   EMBEDDING_PROVIDER="custom"
   EMBEDDING_MODEL="openai/Qwen/Qwen3-Embedding-xxx"
   EMBEDDING_ENDPOINT="xxxxx"  
   EMBEDDING_API_VERSION=""
   EMBEDDING_DIMENSIONS=1024
   EMBEDDING_MAX_TOKENS=16384
   EMBEDDING_BATCH_SIZE=10
   # If embedding key is not provided same key set for LLM_API_KEY will be used
   EMBEDDING_API_KEY=""


## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] I have tested my changes thoroughly before submitting this PR
- [x] This PR contains minimal changes necessary to address the issue/feature
- [ ] My code follows the project's coding standards and style guidelines
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **Bug Fixes**
  * Improved API key validation for the embedding service to properly handle blank or missing API keys, ensuring more reliable embedding generation and preventing potential service errors.

<sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-04 10:36:17 +01:00
Vasilije
90805dac36
add ChromaDB dependency to fix missing installation error (#1960)
<!-- .github/pull_request_template.md -->

## Description
This PR addresses a runtime error where the application fails because
ChromaDB is not installed. The error message `"ChromaDB is not
installed. Please install it with 'pip install chromadb'"` occurs when
attempting to use features that depend on ChromaDB.

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Updated dependency management to include chromadb in the build
configuration.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-04 10:35:45 +01:00
maozhen
570de517c5 ```
feat(Dockerfile): add chromadb support and China mirror option

- Add chromadb extra dependency to uv sync commands in Dockerfile- Include optional aliyun mirror configuration for users in China- Update dependency installation to include chromadb extra```
2026-01-04 15:22:21 +08:00
maozhen
2c79d693fd ```
fix(embeddings): handle empty API key in LiteLLMEmbeddingEngine

- Add conditional check for empty API key to prevent authentication errors- Set default API key to "EMPTY" when no valid key is provided- This ensures proper fallback behavior when API key is not configured
```
2026-01-04 15:18:43 +08:00
maozhen
e47fda4872 ```
fix(auth): add error handling for JWT lifetime configuration

- Add try-catch block to handle invalid JWT_LIFETIME_SECONDS environment variable
- Default to 360 seconds when environment variable is not a valid integer
- Apply same fix to both API and client authentication backendsdocs(docker): add security warning for CORS configuration

- Add comment warning about default CORS_ALLOWED_ORIGINS setting
- Emphasize need to override wildcard with specific domains in production
```
2026-01-04 11:08:42 +08:00
maozhen
5a77c36a95 ```
refactor(auth): remove redundant comments from JWT strategy configurationRemove duplicate comments that were explaining the JWT lifetime configuration
in both API and client authentication backends. The code remains functionallyunchanged but comments are cleaned up for better maintainability.
```
2026-01-04 11:08:32 +08:00
maozhen
a7b114725a ```
feat(auth): make JWT token expiration configurable via environment variable- Add JWT_LIFETIME_SECONDS environment variable to configure token expiration
- Set default expiration to3600 seconds (1 hour) for both API and client auth backends
- Remove hardcoded expiration values in favor of environment-based configuration
- Add documentation comments explaining the JWT strategy configuration

feat(auth): make cookie domain configurable via environment variable

- Add AUTH_TOKEN_COOKIE_DOMAIN environment variable to configure cookie domain
- When not set or empty, cookie domain defaults to None allowing cross-domain usage
- Add documentation explaining cookie expiration is handled by JWT strategy
- Update default_transport to use environment-based cookie domainfeat(docker): add CORS_ALLOWED_ORIGINS environment variable

- Add CORS_ALLOWED_ORIGINS environment variable with default value of '*'
- Configure frontend to use NEXT_PUBLIC_BACKEND_API_URL environment variable
- Set default backend API URL to http://localhost:8000

feat(docker): add restart policy to all services

- Add restart: always policy to cognee, frontend, neo4j, chromadb, and postgres services
- This ensures services automatically restart on failure or system reboot
- Improves container reliability and uptime```
2026-01-04 11:08:28 +08:00
Vasilije
a0f25f4f50
feat: redo notebook tutorials (#1922)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Two interactive tutorial notebooks added (Cognee Basics, Python
Development) with runnable code and rich markdown; MarkdownPreview for
rendered markdown; instance-aware notebook support and cloud proxy with
API key handling; notebook CRUD (create, save, run, delete).

* **Bug Fixes**
  * Improved authentication handling to treat 401/403 consistently.

* **Improvements**
* Auto-expanding text areas; better error propagation from dataset
operations; migration to allow toggling deletability for legacy tutorial
notebooks.

* **Tests**
  * Expanded tests for tutorial creation and loading.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-01 14:44:04 +01:00
vasilije
8965e31a58 reformat 2025-12-31 13:57:48 +01:00
Vasilije
e5341c5f49
Support Structured Outputs with Llama CPP using LiteLLM & Instructor (#1949)
<!-- .github/pull_request_template.md -->

## Description
This PR adds support for structured outputs with llama cpp using litellm
and instructor. It returns a Pydantic instance. Based on the github
issue described
[here](https://github.com/topoteretes/cognee/issues/1947).

It features the following:
- works for both local and server modes (OpenAI api compatible)
- defaults to `JSON` mode (**not JSON schema mode, which is too rigid**)
- uses existing patterns around logging & tenacity decorator consistent
with other adapters
- Respects max_completion_tokens / max_tokens

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

I used the script below to test it with the [Phi-3-mini-4k-instruct
model](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf).
This tests a basic structured data extraction and a more complex one
locally, then verifies that data extraction works in server mode.

There are instructors in the script on how to set up the models. If you
are testing this on a mac, run `brew install llama.cpp` to get llama cpp
working locally. If you don't have Apple silicon chips, you will need to
alter the script or the configs to run this on GPU.
```
"""
Comprehensive test script for LlamaCppAPIAdapter - Tests LOCAL and SERVER modes

SETUP INSTRUCTIONS:
===================

1. Download a small model (pick ONE):

   # Phi-3-mini (2.3GB, recommended - best balance)
   wget https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf

   # OR TinyLlama (1.1GB, smallest but lower quality)
   wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

2. For SERVER mode tests, start a server:
   python -m llama_cpp.server  --model ./Phi-3-mini-4k-instruct-q4.gguf --port 8080 --n_gpu_layers -1
"""

import asyncio
import os
from pydantic import BaseModel
from cognee.infrastructure.llm.structured_output_framework.litellm_instructor.llm.llama_cpp.adapter import (
    LlamaCppAPIAdapter,
)


class Person(BaseModel):
    """Simple test model for person extraction"""

    name: str
    age: int


class EntityExtraction(BaseModel):
    """Test model for entity extraction"""

    entities: list[str]
    summary: str


# Configuration - UPDATE THESE PATHS
MODEL_PATHS = [
    "./Phi-3-mini-4k-instruct-q4.gguf",
    "./tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
]


def find_model() -> str:
    """Find the first available model file"""
    for path in MODEL_PATHS:
        if os.path.exists(path):
            return path
    return None


async def test_local_mode():
    """Test LOCAL mode (in-process, no server needed)"""
    print("=" * 70)
    print("Test 1: LOCAL MODE (In-Process)")
    print("=" * 70)

    model_path = find_model()
    if not model_path:
        print(" No model found! Download a model first:")
        print()
        return False

    print(f"Using model: {model_path}")

    try:
        adapter = LlamaCppAPIAdapter(
            name="LlamaCpp-Local",
            model_path=model_path,  # Local mode parameter
            max_completion_tokens=4096,
            n_ctx=2048,
            n_gpu_layers=-1,  # 0 for CPU, -1 for all GPU layers
        )

        print(f"✓ Adapter initialized in {adapter.mode_type.upper()} mode")
        print("  Sending request...")

        result = await adapter.acreate_structured_output(
            text_input="John Smith is 30 years old",
            system_prompt="Extract the person's name and age.",
            response_model=Person,
        )

        print(f" Success!")
        print(f"   Name: {result.name}")
        print(f"   Age: {result.age}")
        print()
        return True
    except ImportError as e:
        print(f" ImportError: {e}")
        print("   Install llama-cpp-python: pip install llama-cpp-python")
        print()
        return False
    except Exception as e:
        print(f" Failed: {e}")
        print()
        return False


async def test_server_mode():
    """Test SERVER mode (localhost HTTP endpoint)"""
    print("=" * 70)
    print("Test 3: SERVER MODE (Localhost HTTP)")
    print("=" * 70)

    try:
        adapter = LlamaCppAPIAdapter(
            name="LlamaCpp-Server",
            endpoint="http://localhost:8080/v1",  # Server mode parameter
            api_key="dummy",
            model="Phi-3-mini-4k-instruct-q4.gguf",
            max_completion_tokens=1024,
            chat_format="phi-3"
        )

        print(f"✓ Adapter initialized in {adapter.mode_type.upper()} mode")
        print(f"  Endpoint: {adapter.endpoint}")
        print("  Sending request...")

        result = await adapter.acreate_structured_output(
            text_input="Sarah Johnson is 25 years old",
            system_prompt="Extract the person's name and age.",
            response_model=Person,
        )

        print(f" Success!")
        print(f"   Name: {result.name}")
        print(f"   Age: {result.age}")
        print()
        return True
    except Exception as e:
        print(f" Failed: {e}")
        print("   Make sure llama-cpp-python server is running on port 8080:")
        print("   python -m llama_cpp.server --model your-model.gguf --port 8080")
        print()
        return False


async def test_entity_extraction_local():
    """Test more complex extraction with local mode"""
    print("=" * 70)
    print("Test 2: Complex Entity Extraction (Local Mode)")
    print("=" * 70)

    model_path = find_model()
    if not model_path:
        print(" No model found!")
        print()
        return False

    try:
        adapter = LlamaCppAPIAdapter(
            name="LlamaCpp-Local",
            model_path=model_path,
            max_completion_tokens=1024,
            n_ctx=2048,
            n_gpu_layers=-1,
        )

        print(f"✓ Adapter initialized")
        print("  Sending complex extraction request...")

        result = await adapter.acreate_structured_output(
            text_input="Natural language processing (NLP) is a subfield of artificial intelligence (AI) and computer science.",
            system_prompt="Extract all technical entities mentioned and provide a brief summary.",
            response_model=EntityExtraction,
        )

        print(f" Success!")
        print(f"   Entities: {', '.join(result.entities)}")
        print(f"   Summary: {result.summary}")
        print()
        return True
    except Exception as e:
        print(f" Failed: {e}")
        print()
        return False


async def main():
    """Run all tests"""
    print("\n" + "🦙" * 35)
    print("Llama CPP Adapter - Comprehensive Test Suite")
    print("Testing LOCAL and SERVER modes")
    print("🦙" * 35 + "\n")

    results = {}

    # Test 1: Local mode (no server needed)
    print("=" * 70)
    print("PHASE 1: Testing LOCAL mode (in-process)")
    print("=" * 70)
    print()
    results["local_basic"] = await test_local_mode()

    results["local_complex"] = await test_entity_extraction_local()

    # Test 2: Server mode (requires server on 8080)
    print("\n" + "=" * 70)
    print("PHASE 2: Testing SERVER mode (requires server running)")
    print("=" * 70)
    print()
    results["server"] = await test_server_mode()

    # Summary
    print("\n" + "=" * 70)
    print("TEST SUMMARY")
    print("=" * 70)
    for test_name, passed in results.items():
        status = " PASSED" if passed else " FAILED"
        print(f"  {test_name:20s}: {status}")

    passed_count = sum(results.values())
    total_count = len(results)
    print()
    print(f"Total: {passed_count}/{total_count} tests passed")

    if passed_count == total_count:
        print("\n🎉 All tests passed! The adapter is working correctly.")
    elif results.get("local_basic"):
        print("\n✓ Local mode works! Server/cloud tests need llama-cpp-python server running.")
    else:
        print("\n⚠️  Please check setup instructions at the top of this file.")


if __name__ == "__main__":
    asyncio.run(main())

```

**The following screenshots show the tests passing**
<img width="622" height="149" alt="image"
src="https://github.com/user-attachments/assets/9df02f66-39a9-488a-96a6-dc79b47e3001"
/>

Test 1
<img width="939" height="750" alt="image"
src="https://github.com/user-attachments/assets/87759189-8fd2-450f-af7f-0364101a5690"
/>

Test 2
<img width="938" height="746" alt="image"
src="https://github.com/user-attachments/assets/61e423c0-3d41-4fde-acaf-ae77c3463d66"
/>

Test 3
<img width="944" height="232" alt="image"
src="https://github.com/user-attachments/assets/f7302777-2004-447c-a2fe-b12762241ba9"
/>


**note** I also tried to test it with the `TinyLlama-1.1B-Chat` model
but such a small model is bad at producing structured JSON consistently.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ X] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
see above

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [X] **I have tested my changes thoroughly before submitting this PR**
- [X] **This PR contains minimal changes necessary to address the
issue/feature**
- [X] My code follows the project's coding standards and style
guidelines
- [X] I have added tests that prove my fix is effective or that my
feature works
- [X] I have added necessary documentation (if applicable)
- [X] All new and existing tests pass
- [X] I have searched existing PRs to ensure this change hasn't been
submitted already
- [X] I have linked any relevant issues in the description
- [X] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Llama CPP integration supporting local (in-process) and server
(OpenAI‑compatible) modes.
* Selectable provider with configurable model path, context size, GPU
layers, and chat format.
* Asynchronous structured-output generation with rate limiting,
retries/backoff, and debug logging.

* **Chores**
  * Added llama-cpp-python dependency and bumped project version.

* **Documentation**
* CONTRIBUTING updated with a “Running Simple Example” walkthrough for
local/server usage.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-31 12:53:55 +01:00
dgarnitz
dd639fa967 update lock file 2025-12-30 16:59:59 -08:00
dgarnitz
d578971b60 add support for structured outputs with llamma cpp va instructor and litellm 2025-12-30 16:37:31 -08:00
vasilije
27f2aa03b3 added fixes to litellm 2025-12-28 21:48:01 +01:00
vasilije
d1b13b113f added batch size as an env variable option 2025-12-28 21:06:42 +01:00
vasilije
ce685557bb added fix that raises error if database doesnt exist 2025-12-28 20:26:47 +01:00
vasilije
8499258272 resolve jon doe issue 2025-12-28 20:00:29 +01:00
Vasilije
310e9e97ae
feat: list vector distance in cogneegraph (#1926)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

- `map_vector_distances_to_graph_nodes` and
`map_vector_distances_to_graph_edges` accept both single-query (flat
list) and multi-query (nested list) inputs.
- `query_list_length` controls the mode: omit it for single-query
behavior, or provide it to enable multi-query mode with strict length
validation and per-query results.
- `vector_distance` on `Node` and `Edge` is now a list (one distance per
query). Constructors set it to `None`, and `reset_distances` initializes
it at the start of each search.
- `Node.update_distance_for_query` and `Edge.update_distance_for_query`
are the only methods that write to `vector_distance`. They ensure the
list has enough elements and keep unmatched queries at the penalty
value.
- `triplet_distance_penalty` is the default distance value used
everywhere. Unmatched nodes/edges and missing scores all use this same
penalty for consistency.
- `edges_by_distance_key` is an index mapping edge labels to matching
edges. This lets us update all edges with the same label at once,
instead of scanning the full edge list repeatedly.
- `calculate_top_triplet_importances` returns `List[Edge]` for
single-query mode and `List[List[Edge]]` for multi-query mode.


## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [x] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Multi-query support for mapping/scoring node and edge distances and a
configurable triplet distance penalty.
* Distance-keyed edge indexing for more accurate distance-to-edge
matching.

* **Refactor**
* Vector distance metadata changed from scalars to per-query lists;
added reset/normalization and per-query update flows.
* Node/edge distance initialization now supports deferred/listed
distances.

* **Tests**
* Updated and expanded tests for multi-query flows, list-based
distances, edge-key handling, and related error cases.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-23 14:47:27 +01:00
Hande
5f8a3e24bd
refactor: restructure examples and starter kit into new-examples (#1862)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Documentation**
* Deprecated legacy examples and added a migration guide mapping old
paths to new locations
* Added a comprehensive new-examples README detailing configurations,
pipelines, demos, and migration notes

* **New Features**
* Added many runnable examples and demos: database configs,
embedding/LLM setups, permissions and access-control, custom pipelines
(organizational, product recommendation, code analysis, procurement),
multimedia, visualization, temporal/ontology demos, and a local UI
starter

* **Chores**
  * Updated CI/test entrypoints to use the new-examples layout

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
2025-12-20 02:07:28 +01:00
lxobr
f6c76ce19e chore: remove duplicate import 2025-12-19 16:24:49 +01:00
lxobr
c3cec818d7 fix: update tests 2025-12-19 16:22:47 +01:00
lxobr
9808077b4c nit: update variable names 2025-12-19 15:35:34 +01:00
Vasilije
9b2b1a9c13
chore: covering higher level search logic with tests (#1910)
<!-- .github/pull_request_template.md -->

## Description
This PR covers the higher level search.py logic with unit tests. As a
part of the implementation we fully cover the following core logic:

- search.py
- get_search_type_tools (with all the core search types)
- search - prepare_search_results contract (testing behavior from
search.py interface)

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Tests**
* Added comprehensive unit test coverage for search functionality,
including search type tool selection, search operations, and result
preparation workflows across multiple scenarios and edge cases.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-19 14:22:54 +01:00
Vasilije
16cf955497
feat: adds multitenant tests via pytest (#1923)
<!-- .github/pull_request_template.md -->

## Description
This PR changes the permission test in e2e tests to use pytest.

Introduces:

- fixtures for the environment setup 
- one eventloop for all pytest tests
- mocking for acreate_structured_output answer generation (for search)
- Asserts in permission test (before we use the example only)


## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Entity model now includes description and metadata fields for richer
entity information and indexing.

* **Tests**
* Expanded and restructured permission tests covering multi-tenant and
role-based access flows; improved test scaffolding and stability.
  * E2E test workflow now runs pytest with verbose output and INFO logs.

* **Bug Fixes**
* Access-tracking updates now commit transactions so access timestamps
persist.

* **Chores**
* General formatting, cleanup, and refactoring across modules and
maintenance scripts.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-19 14:16:01 +01:00
Igor Ilic
2c4f9b07ac fix: Resolve migration issue 2025-12-19 13:35:14 +01:00
lxobr
a85df53c74 chore: tweak mapping and scoring 2025-12-19 13:14:50 +01:00
Igor Ilic
3bc3f63362 fix: Resolve issues with migration 2025-12-19 11:55:13 +01:00
hajdul88
72dae0f79a fix linting 2025-12-19 10:38:44 +01:00
hajdul88
7cf93ea79d updates old no asserts test + yml 2025-12-19 10:32:45 +01:00
hajdul88
8214bdce5b Revert "changes pytest call in yaml"
This reverts commit 8a490b1c16.
2025-12-19 10:28:48 +01:00
hajdul88
4b71995a70 ruff 2025-12-19 10:25:24 +01:00
hajdul88
8a490b1c16 changes pytest call in yaml 2025-12-19 10:20:46 +01:00
hajdul88
9819b38058
Merge branch 'dev' into feature/cog-3536-multitenant-search-testing-automation 2025-12-19 10:06:02 +01:00
Vasilije
eb444ca18f
feat: Add a task that deletes the old data that has not been accessed in a while (#1751)
<!-- .github/pull_request_template.md -->  
  
## Description  
  
This PR implements a data deletion system for unused DataPoint models
based on last access tracking. The system tracks when data is accessed
during search operations and provides cleanup functionality to remove
data that hasn't been accessed within a configurable time threshold.
  
**Key Changes:**  
1. Added `last_accessed` timestamp field to the SQL `Data` model  
2. Added `last_accessed_at` timestamp field to the graph `DataPoint`
model
3. Implemented `update_node_access_timestamps()` function that updates
both graph nodes and SQL records during search operations
4. Created `cleanup_unused_data()` function with SQL-based deletion mode
for whole document cleanup
5. Added Alembic migration to add `last_accessed` column to the `data`
table
6. Integrated timestamp tracking into  in retrievers  
7. Added comprehensive end-to-end test for the cleanup functionality  
  
## Related Issues  
Fixes #[issue_number]  
  
## Type of Change  
- [x] New feature (non-breaking change that adds functionality)  
- [ ] Bug fix (non-breaking change that fixes an issue)  
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update  
- [ ] Code refactoring  
- [ ] Performance improvement  
  
## Database Changes  
- [x] This PR includes database schema changes  
- [x] Alembic migration included: `add_last_accessed_to_data`  
- [x] Migration adds `last_accessed` column to `data` table  
- [x] Migration includes backward compatibility (nullable column)  
- [x] Migration tested locally  
  
## Implementation Details  
  
### Files Modified:  
1. **cognee/modules/data/models/Data.py** - Added `last_accessed` column
2. **cognee/infrastructure/engine/models/DataPoint.py** - Added
`last_accessed_at` field
3. **cognee/modules/retrieval/chunks_retriever.py** - Integrated
timestamp tracking in `get_context()`
4. **cognee/modules/retrieval/utils/update_node_access_timestamps.py**
(new file) - Core tracking logic
5. **cognee/tasks/cleanup/cleanup_unused_data.py** (new file) - Cleanup
implementation
6. **alembic/versions/[revision]_add_last_accessed_to_data.py** (new
file) - Database migration
7. **cognee/tests/test_cleanup_unused_data.py** (new file) - End-to-end
test
  
### Key Functions:  
- `update_node_access_timestamps(items)` - Updates timestamps in both
graph and SQL
- `cleanup_unused_data(minutes_threshold, dry_run, text_doc)` - Main
cleanup function
- SQL-based cleanup mode uses `cognee.delete()` for proper document
deletion
  
## Testing  
- [x] Added end-to-end test: `test_textdocument_cleanup_with_sql()`  
- [x] Test covers: add → cognify → search → timestamp verification →
aging → cleanup → deletion verification
- [x] Test verifies cleanup across all storage systems (SQL, graph,
vector)
- [x] All existing tests pass  
- [x] Manual testing completed  
  
## Screenshots/Videos  
N/A - Backend functionality  
  
## Pre-submission Checklist  
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)  
- [x] All new and existing tests pass  
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description  
- [x] My commits have clear and descriptive messages  
  
## Breaking Changes  
None - This is a new feature that doesn't affect existing functionality.
  

## DCO Affirmation  
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
Resolves #1335 

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added access timestamp tracking to monitor when data is last
retrieved.
* Introduced automatic cleanup of unused data based on configurable time
thresholds and access history.
* Retrieval operations now update access timestamps to ensure accurate
tracking of data usage.

* **Tests**
* Added integration test validating end-to-end cleanup workflow across
storage layers.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-19 09:47:31 +01:00
Vasilije
3055ed89c8
test: set up triggers for docs and community tests on new main release (#1780)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Added tests that just run scripts we have in the docs, in our guides
section (Essentials and Customizing Cognee). This is a start of a test
suite regarding docs, to make sure new releases don't break the scripts
we have written in our docs.
The new workflow only runs on releases, and is part of the Release Test
Workflow we have.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Enhanced release workflow automation with improved coordination
between repositories.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-19 09:40:10 +01:00
hajdul88
ee967ae3fa feat: adds grant permission checks + tenant + role scenarios 2025-12-19 09:31:56 +01:00
hajdul88
976ac78e5e ruff 2025-12-19 07:53:36 +01:00
hajdul88
ef7ebc0748 feat: adds user1 and user 2 dataset read tests 2025-12-19 07:53:08 +01:00
Boris Arzentar
3311db55bf
fix: typos in text and error handling 2025-12-18 22:52:09 +01:00
Boris Arzentar
672a776df5
Merge remote-tracking branch 'origin/dev' into feature/cog-3550-simplify-tutorial-notebook 2025-12-18 17:33:25 +01:00
Boris Arzentar
edb541505c
fix: lint errors and ignore tutorial python files when linting 2025-12-18 17:33:21 +01:00
hajdul88
3e47de5ea0 ruff ruff 2025-12-18 17:33:15 +01:00
hajdul88
9c04f46572 feat: adds new permission test fixtures and setup til cognify 2025-12-18 17:31:32 +01:00
hajdul88
ef51dcfb7a
Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-4 2025-12-18 16:10:16 +01:00
hajdul88
4f07adee66
chore: fixes get_raw_data endpoint and adds s3 support (#1916)
<!-- .github/pull_request_template.md -->

## Description
This PR fixes get_raw_data endpoint in get_dataset_router

- Fixes local path access
- Adds s3 access
- Covers new fixed functionality with unit tests

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Streaming support for remote S3 data locations so large dataset files
can be retrieved efficiently.
  * Improved handling of local and remote file paths for downloads.

* **Improvements**
  * Standardized error responses for missing datasets or data files.

* **Tests**
* Added unit tests covering local file downloads and S3 streaming,
including content and attachment header verification.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-18 16:10:05 +01:00
Pavel Zorin
a70ce2785b Release v0.5.1.dev0 2025-12-18 16:07:19 +01:00
Boris Arzentar
d127381262
Merge remote-tracking branch 'origin/dev' into feature/cog-3550-simplify-tutorial-notebook 2025-12-18 15:28:56 +01:00
Boris Arzentar
f93d414e94
feat: simplify the current tutorial and add cognee basics tutorial 2025-12-18 15:28:45 +01:00
lxobr
c1ea7a8cc2 fix: improve graph distance mapping 2025-12-18 14:52:35 +01:00
hajdul88
8602ba1e93
Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-4 2025-12-18 13:25:19 +01:00
Vasilije
4d03fcfa9e
fix: Fix connection encoding (#1917)
<!-- .github/pull_request_template.md -->

## Description
Resolve issue with special characters like '#' and '@' in passwords for
Postgres

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Improved internal database connection handling for relational and
vector databases to enhance system stability and code maintainability.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-17 22:04:09 +01:00
Vasilije
2ef8094666
feat: Add custom label by contributor: apenade (#1913)
<!-- .github/pull_request_template.md -->

## Description
Add ability to define custom labels for Data in Cognee. Initial PR by
contributor: apenade

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added support for labeling individual data items during ingestion
workflows
* Expanded the add API to accept data items with optional custom labels
for better organization
* Labels are persisted and retrievable when accessing dataset
information
* Enhanced data retrieval to include label information in API responses

* **Tests**
* Added comprehensive end-to-end tests validating custom data labeling
functionality

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-17 21:21:40 +01:00
Igor Ilic
d352ff0c28
Merge branch 'dev' into fix-connection-encoding 2025-12-17 21:08:45 +01:00
Igor Ilic
6e5e79f434 fix: Resolve connection issue with postgres when special characters are present 2025-12-17 21:07:23 +01:00
lxobr
46ff01021a feat: add multi-query support to score calculation 2025-12-17 19:09:02 +01:00
Christina_Raichel_Francis
931c5f3096 refactor: add test and example script 2025-12-17 18:02:35 +00:00
lxobr
69ab8e7ede feat: add multi-query support to graph distance mapping 2025-12-17 18:14:57 +01:00
lxobr
cc7ca45e73 feat: make vector_distance list based 2025-12-17 15:48:24 +01:00
Andrej Milicevic
929d88557e Merge branch 'dev' into feature/cog-3213-docs-set-up-guide-script-tests 2025-12-17 13:52:45 +01:00
Andrej Milicevic
431a83247f chore: remove unnecessary 'on push' setting 2025-12-17 13:50:43 +01:00
Andrej Milicevic
6958b4edd4 feat: add the triggers to release, after pypi publishing 2025-12-17 13:50:03 +01:00
Andrej Milicevic
a5a7ae2564 test: remove if 2025-12-17 13:16:46 +01:00
Andrej Milicevic
601f74db4f test: remove dependency from community trigger 2025-12-17 13:15:43 +01:00
Andrej Milicevic
e92d8f57b5 feat: add comunity test trigger 2025-12-17 13:14:14 +01:00
hajdul88
d8b4411aac
Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-4 2025-12-17 12:30:47 +01:00
hajdul88
f79ba53e1d
COG-3532 chore: retriever test reorganization + adding new tests (unit) (STEP 2) (#1892)
<!-- .github/pull_request_template.md -->

This PR restructures/adds unittests for the retrieval module. (STEP 2)

-Added missing unit tests for all core retrieval business logic

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
* Expanded and refactored retrieval module test suites with
comprehensive unit test coverage for ChunksRetriever,
SummariesRetriever, RagCompletionRetriever, TripletRetriever,
GraphCompletionRetriever, TemporalRetriever, and related components.
* Added new test modules for completion utilities, graph summary
retrieval, and user feedback functionality.
* Improved test robustness with edge case handling and error scenario
coverage.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-17 12:30:15 +01:00
Christina_Raichel_Francis
ee29dd1f81 refactor: update cognee tasks to add frequency tracking script 2025-12-17 10:36:59 +00:00
Andrej Milicevic
999e6c0981 Merge branch 'dev' into feature/cog-3213-docs-set-up-guide-script-tests 2025-12-17 11:01:25 +01:00
hajdul88
b0454b49a9 Merge branch 'feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-4' of github.com:topoteretes/cognee into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-4 2025-12-17 10:35:12 +01:00
hajdul88
94d5175570 feat: adds unit test for the prepare search result - search contract 2025-12-17 10:34:57 +01:00
hajdul88
623126eec1
Merge branch 'feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-3' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-4 2025-12-17 10:07:58 +01:00
Igor Ilic
cc872fc8de refactor: format PR 2025-12-16 21:04:15 +01:00
Igor Ilic
233afdd0a9
Merge branch 'dev' into add-custom-label-apenade 2025-12-16 21:01:42 +01:00
Igor Ilic
b77961b0f1 fix: Resolve issues with data label PR, add tests and upgrade migration 2025-12-16 20:59:17 +01:00
hajdul88
8f8f4c0b63
Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-3 2025-12-16 20:18:54 +01:00
Igor Ilic
56b03c89f3 Merge branch 'dev' into add-custom-label-apenade 2025-12-16 19:15:30 +01:00
Vasilije
aeda1d8eba
Test audio image transcription (#1911)
<!-- .github/pull_request_template.md -->

## Description
Run CI/CD for audio/image transcription PR from contributor
@rajeevrajeshuni

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
  * Added audio transcription capability across LLM providers.
  * Added image transcription and description capability.
  * Enhanced observability and monitoring for AI operations.

* **Breaking Changes**
* Removed synchronous structured output method; use asynchronous
alternative instead.

* **Refactor**
* Unified LLM provider architecture for improved consistency and
maintainability.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-16 19:06:38 +01:00
hajdul88
1500b1c693
Merge branch 'feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-3' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-4 2025-12-16 17:58:04 +01:00
hajdul88
18d0a41850 Update test_search.py 2025-12-16 17:49:43 +01:00
hajdul88
4ff2a35476 chore: moves unit tests into their correct directory 2025-12-16 17:33:20 +01:00
Igor Ilic
8027263e8b refactor: remove unused import 2025-12-16 16:54:27 +01:00
Igor Ilic
f27d07d902 refactor: remove mandatory transcription and image methods in LLMInterface 2025-12-16 16:53:56 +01:00
hajdul88
789fa90790 chore: covering search.py behavior with unit tests 2025-12-16 16:39:31 +01:00
Igor Ilic
3e041ec12f refactor: format code 2025-12-16 16:28:30 +01:00
Igor Ilic
f2cb68dd5e refactor: use async image and transcription handling 2025-12-16 16:27:13 +01:00
Igor Ilic
d92d6b9d8f refactor: remove optional return value 2025-12-16 16:02:15 +01:00
hajdul88
7892b48afe Update test_get_search_type_tools.py 2025-12-16 15:59:15 +01:00
Igor Ilic
a52873a71f refactor: make return type mandatory for transcription 2025-12-16 15:54:25 +01:00
hajdul88
48c2040f3d Delete test_get_search_type_tools_integration.py 2025-12-16 15:45:32 +01:00
hajdul88
757d5fca65
Merge branch 'feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-3' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-4 2025-12-16 15:43:05 +01:00
Igor Ilic
13c034e2e4
Merge branch 'dev' into issue-1767 2025-12-16 15:41:28 +01:00
hajdul88
89ef7d7d15 feat: adds integration test for community registered retriever case 2025-12-16 15:41:13 +01:00
hajdul88
c61ff60e40 feat: add unit tests for get_search_type_tools 2025-12-16 15:37:33 +01:00
hajdul88
0d5b284147
Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-3 2025-12-16 15:18:01 +01:00
Vasilije
12e6ad152e
fix(api): pass run_in_background parameter to memify function (#1847)
## Summary

The `run_in_background` parameter was defined in `MemifyPayloadDTO` but
was never passed to the `cognee_memify` function call, making the
parameter effectively unused.

## Changes

This fix passes the `run_in_background` parameter from the payload to
the `cognee_memify` function so users can actually run memify operations
in the background.

## Testing

- `uv run ruff check cognee/api/v1/memify/routers/get_memify_router.py`
- All checks passed
- `uv run ruff format cognee/api/v1/memify/routers/get_memify_router.py`
- No changes needed

## DCO

I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Fixed background execution flag for memify operations to be properly
applied when requested. The background execution setting is now
correctly propagated through the system, ensuring operations run as
intended.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-16 15:14:52 +01:00
hajdul88
aad4d0cdde
Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-3 2025-12-16 14:56:24 +01:00
Vasilije
bad9f09f4c
Merge main vol9 (#1908)
<!-- .github/pull_request_template.md -->

## Description
Merge PRs from main to dev

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
  * Version bumped to 0.5.0 across project packages
* Simplified release workflow by removing conditional release paths,
ensuring consistent publishing process for all releases

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-16 14:54:14 +01:00
Vasilije
42e711fe3a
test: Resolve silent mcp test (#1907)
<!-- .github/pull_request_template.md -->

## Description
Resolve issue with silent MCP test

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Tests**
* Modified test failure handling to enforce zero failures during test
runs by aborting upon test failures.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-16 14:53:53 +01:00
Vasilije
412b6467da
feat(database): add connect_args support to SqlAlchemyAdapter (#1861)
- Add optional connect_args parameter to __init__ method
- Support DATABASE_CONNECT_ARGS environment variable for JSON-based
configuration
- Enable custom connection parameters for all database engines (SQLite
and PostgreSQL)
- Maintain backward compatibility with existing code
- Add proper error handling and validation for environment variable
parsing

<!-- .github/pull_request_template.md -->
## Description
The intent of this PR is to make the database initialization more
flexible and configurable. In order to do this, the system will support
a new DATABASE_CONNECT_ARGS environment variable that takes JSON-based
configuration,. This enhancement will allow custom connection parameters
to be passed to any supported database engine, including SQLite and
PostgreSQL,. To guarantee that the environment variable is parsed
securely and consistently, appropriate error handling and validation
will also be added.

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [x] Breaking change (fix or feature that would cause existing
functionality to change)
- [x] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Advanced database connection configuration through the optional
DATABASE_CONNECT_ARGS environment variable, supporting custom settings
such as SSL certificates and timeout configurations.
* Custom connection arguments can now be passed to relational database
adapters.

* **Tests**
* Comprehensive unit test suite for database connection argument parsing
and validation.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-16 14:50:27 +01:00
Vasilije
9a509e4e2e
test: Add permission example test with running s3 file system (#1879)
<!-- .github/pull_request_template.md -->

## Description
Add test for S3 file system functioning with permissions example (covers
multi-user mode, permissions and multi tenancy use cases)

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
  * Renamed a CI test step to clarify its label in permissions testing.
* Added a new CI workflow job to validate permissions against S3-backed
storage, including a database service and an end-to-end example
execution to verify S3-specific environment and storage behavior.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-16 14:49:45 +01:00
Igor Ilic
7f1a51fcdc Merge branch 'main' into merge-main-vol9 2025-12-16 13:39:51 +01:00
hajdul88
646894d7c5 Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-3 2025-12-16 12:04:11 +01:00
Igor Ilic
21407dd9ed test: Resolve silent mcp test 2025-12-16 12:03:49 +01:00
hajdul88
b4aaa7faef
chore: retriever test reorganization + adding new tests (smoke e2e) (STEP 1.5) (#1888)
<!-- .github/pull_request_template.md -->

This PR restructures the end-to-end tests for the multi-database search
layer to improve maintainability, consistency, and coverage across
supported Python versions and database settings.

Key Changes

-Migrates the existing E2E tests to pytest for a more standard and
extensible testing framework.
-Introduces pytest fixtures to centralize and reuse test setup logic.
-Implements proper event loop management to support multiple
asynchronous pytest tests reliably.
-Improves SQLAlchemy handling in tests, ensuring clean setup and
teardown of database state.
-Extends multi-database E2E test coverage across all supported Python
versions.

Benefits

-Cleaner and more modular test structure.
-Reduced duplication and clearer test intent through fixtures.
-More reliable async test execution.
-Better alignment with our supported Python version matrix.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
* Expanded end-to-end test suite for the search database with
comprehensive setup/teardown, new session-scoped fixtures, and multiple
tests validating graph/vector consistency, retriever contexts, triplet
metadata, search result shapes, side effects, and feedback-weight
behavior.

* **Chores**
* CI updated to run matrixed test jobs across multiple Python versions
and standardize test execution for more consistent, parallelized runs.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-16 11:59:33 +01:00
hajdul88
4e8845c117
chore: retriever test reorganization + adding new tests (integration) (STEP 1) (#1881)
<!-- .github/pull_request_template.md -->

## Description
This PR restructures/adds integration and unit tests for the retrieval
module.

-Old integration tests were updated and moved under unit tests +
fixtures added
-Added missing unit tests for all core retrieval business logic
-Covered 100% of the core retrievers with tests
-Minor changes (dead code deletion, typo fixed)

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Changes**
* TripletRetriever now returns up to 5 results by default (was 1),
providing richer context.

* **Tests**
* Reorganized test coverage: many unit tests removed and replaced with
comprehensive integration tests across retrieval components (graph,
chunks, RAG, summaries, temporal, triplets, structured output).

* **Chores**
  * Simplified triplet formatting logic and removed debug output.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-16 11:11:29 +01:00
hajdul88
fa035f42f4 chore: adds back accidentally deleted structured output test 2025-12-12 16:47:58 +01:00
Igor Ilic
7cf6f08283 chore: update test credentials 2025-12-12 15:29:21 +01:00
hajdul88
fd23c75c09 chore: adds new Unit tests for retrievers 2025-12-12 14:44:41 +01:00
Igor Ilic
0cde551226
Merge branch 'dev' into add-s3-permissions-test 2025-12-12 13:22:50 +01:00
Andrej Milicevic
0f4cf15d58 test: fix docs test trigger 2025-12-11 16:24:47 +01:00
Andrej Milicevic
41edeb0cf8 test: change target repo name 2025-12-11 16:01:26 +01:00
Andrej Milicevic
cd60ae3174 test: remove docs tests. add trigger to docs repo 2025-12-11 15:25:44 +01:00
Chinmay Bhosale
0d96606fb2
Merge pull request #8 from chinu0609/delete-last-acessed
fix: only document level deletion
2025-12-11 18:17:04 +05:30
chinu0609
2485c3f5f0 fix: only document level deletion 2025-12-11 12:48:06 +05:30
hiyan
f48df27fc8 fix(db): url-encode postgres credentials to handle special characters 2025-12-11 10:32:45 +05:30
rajeevrajeshuni
6260f9eb82 strandardizing return type for transcription and some CR changes 2025-12-11 06:53:36 +05:30
Chinmay Bhosale
e654bcb081
Merge pull request #7 from chinu0609/delete-last-acessed
fix: only document level deletion
2025-12-10 22:44:06 +05:30
chinu0609
829a6f0d04 fix: only document level deletion 2025-12-10 22:41:01 +05:30
Igor Ilic
2067c459e3
Merge branch 'dev' into add-s3-permissions-test 2025-12-10 17:39:58 +01:00
Igor Ilic
4d0f132822 chore: Remove AWS url 2025-12-10 16:22:40 +01:00
Igor Ilic
ab20443330 chore: Change s3 bucket for permission example 2025-12-10 12:35:58 +01:00
rajeevrajeshuni
d57d188459 resolving merge conflicts 2025-12-10 10:52:10 +05:30
rajeevrajeshuni
8e5f14da78 resolving merge conflicts 2025-12-10 10:49:23 +05:30
Igor Ilic
7972e39653
Merge branch 'dev' into main 2025-12-09 20:40:33 +01:00
Chinmay Bhosale
6ecf719632
Merge pull request #6 from chinu0609/delete-last-acessed
fix: flag to enable and disable last_accessed
2025-12-10 00:03:16 +05:30
ketanjain3
2de1bd977d
Merge branch 'dev' into feature/sqlalchemy-custom-connect-args 2025-12-09 23:53:06 +05:30
Igor Ilic
032a74a409 chore: add postgres dependency for cicd test 2025-12-09 17:56:34 +01:00
Igor Ilic
28faf7ce04 test: Add permission example test with running s3 file system 2025-12-09 17:53:18 +01:00
ketanjain7981
e1d313a46b move DATABASE_CONNECT_ARGS parsing to RelationalConfig
Signed-off-by: ketanjain7981 <ketan.jain@think41.com>
2025-12-09 10:15:36 +05:30
ketanjain3
654a573454
Merge branch 'dev' into feature/sqlalchemy-custom-connect-args 2025-12-04 23:47:39 +05:30
Boris
ec744f01cc
Merge branch 'dev' into fix/memify-run-in-background-clean 2025-12-03 14:55:32 +01:00
Boris
b3f9795d2e
Merge branch 'dev' into feature/cog-3213-docs-set-up-guide-script-tests 2025-12-03 12:03:31 +01:00
ketanjain7981
f26b490a8f refactor: improve test isolation and add connect_args precedence
Signed-off-by: ketanjain7981 <ketan.jain@think41.com>
2025-12-03 00:27:51 +05:30
ketanjain3
1f98d50870
Merge branch 'dev' into feature/sqlalchemy-custom-connect-args 2025-12-03 00:15:03 +05:30
ketanjain7981
a7da9c7d65 test: verify logger warning for invalid JSON in SQLAlchemyAdapter
Signed-off-by: ketanjain7981 <ketan.jain@think41.com>
2025-12-02 23:35:35 +05:30
ketanjain7981
4f3a1bcf01 test: add unit tests for SQLAlchemyAdapter connection arguments
Signed-off-by: ketanjain7981 <ketan.jain@think41.com>
2025-12-02 23:25:47 +05:30
chinu0609
5f00abf3e4 fix: fallback and document deletion 2025-12-02 22:25:03 +05:30
ketanjain7981
3f53534c99 refactor(database): simplify to env var only for connect_args
- Remove unused connect_args parameter from __init__
- Programmatic parameter was dead code (never called by users)
- Users call get_relational_engine() which doesn't expose connect_args
- Keep DATABASE_CONNECT_ARGS env var support (actually used in production)
- Simplify implementation and reduce complexity
- Update docstring to reflect env-var-only approach
- Add production examples to docstring

Signed-off-by: ketanjain7981 <ketan.jain@think41.com>
2025-12-02 21:12:07 +05:30
ketanjain7981
c892265644 fix(database): address CodeRabbit review feedback
- Add comprehensive docstring for __init__ method to meet 80% coverage requirement
- Fix security issue: remove sensitive data from log messages
- Fix merge precedence: programmatic args now correctly override env vars
- Fix SQLite timeout order: user-specified timeout now overrides default 30s
- Clarify precedence in docstring documentation

Signed-off-by: ketanjain7981 <ketan.jain@think41.com>
2025-12-02 21:01:43 +05:30
ketanjain7981
f9b16e508d feat(database): add connect_args support to SqlAlchemyAdapter
- Add optional connect_args parameter to __init__ method
- Support DATABASE_CONNECT_ARGS environment variable for JSON-based configuration
- Enable custom connection parameters for all database engines (SQLite and PostgreSQL)
- Maintain backward compatibility with existing code
- Add proper error handling and validation for environment variable parsing

Signed-off-by: ketanjain7981 <ketan.jain@think41.com>
2025-12-02 20:30:09 +05:30
chinu0609
6a4d31356b fix: using graph projection instead of conditions 2025-12-02 18:55:47 +05:30
Boris
3cda1af29d
Merge branch 'dev' into main 2025-12-01 11:33:22 +01:00
Mike Potter
73d84129de fix(api): pass run_in_background parameter to memify function
The run_in_background parameter was defined in MemifyPayloadDTO but was
never passed to the cognee_memify function call, making the parameter
effectively unused. This fix passes the parameter so users can actually
run memify operations in the background.

Signed-off-by: Mike Potter <mpotter1@gmail.com>
2025-11-28 12:40:53 -05:00
Rajeev Rajeshuni
57195fb5a1
Merge branch 'dev' into issue-1767 2025-11-28 17:24:55 +05:30
chinu0609
12ce80005c fix: generalized queries 2025-11-26 17:32:50 +05:30
rajeevrajeshuni
09fbf22768 uv lock version revert 2025-11-25 12:24:30 +05:30
rajeevrajeshuni
02b1778658 Adding support for audio/image transcription for all other providers 2025-11-25 12:22:15 +05:30
chinu0609
5cb6510205 fix: import 2025-11-24 13:12:46 +05:30
chinu0609
d6da7a999b Merge branch 'dev' of https://github.com/topoteretes/cognee 2025-11-24 12:54:48 +05:30
chinu0609
b52c1a1e25 fix: flag to enable and disable last_accessed 2025-11-24 12:50:39 +05:30
chinu0609
53d3b50f93 Merge remote-tracking branch 'upstream/dev' 2025-11-22 14:54:10 +05:30
chinu0609
43290af1b2 fix: set last_acessed to current timestamp 2025-11-19 21:00:16 +05:30
apenade
a072773995
Update alembic/versions/a1b2c3d4e5f6_add_label_column_to_data.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-11-19 16:02:27 +01:00
chinu0609
5fac3b40b9 fix: test file for cleanup unused data 2025-11-18 22:26:59 +05:30
chinu0609
7bd7079aac fix: vecto_engine.delte_data_points 2025-11-18 22:17:23 +05:30
apenade
a451fb8c5a
Update cognee/tasks/ingestion/data_item.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-11-18 10:45:31 +01:00
apenade
8ea83e4a26
Update alembic/versions/a1b2c3d4e5f6_add_label_column_to_data.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-11-18 10:44:49 +01:00
Andrés Peña del Río
82d48663bb Add custom label support to Data model (#1769) 2025-11-17 23:27:31 +01:00
Andrej Milicevic
1e56d6dc38 chore: ruff format 2025-11-12 13:42:53 +01:00
Andrej Milicevic
503bdc34f3 test: add tests to workflows 2025-11-12 13:20:23 +01:00
Andrej Milicevic
b328aeff12 Merge branch 'dev' into feature/cog-3213-docs-set-up-guide-script-tests 2025-11-12 12:26:38 +01:00
Andrej Milicevic
ac3300760b test: add search tests docs 2025-11-12 12:26:28 +01:00
chinu0609
d351c9a009 fix: return chunk payload 2025-11-10 21:58:01 +05:30
chinu0609
84bd2f38f7 fix: remove uneccessary imports 2025-11-07 12:12:46 +05:30
chinu0609
84c8e07ddd fix: remove uneccessary imports 2025-11-07 12:03:17 +05:30
chinu0609
85a2bac062 fix: min to days 2025-11-06 23:18:25 +05:30
chinu0609
ce4a5c8311 Merge branch 'main' of github.com:chinu0609/cognee 2025-11-06 23:05:12 +05:30
chinu0609
b327756e5f Merge remote-tracking branch 'upstream/dev' 2025-11-06 23:03:45 +05:30
Chinmay Bhosale
bd71540d75
Merge pull request #5 from chinu0609/delete-last-acessed
feat: adding cleanup function and adding update_node_acess_timestamps…
2025-11-06 23:02:18 +05:30
chinu0609
fdf037b3d0 fix: min to days 2025-11-06 23:00:56 +05:30
chinu0609
c5f0c4af87 fix: add text_doc flag 2025-11-05 20:22:17 +05:30
chinu0609
ff263c0132 fix: add column check in migration 2025-11-05 18:40:58 +05:30
chinu0609
9041a804ec fix: add text_doc flag 2025-11-05 18:32:49 +05:30
chinu0609
3c0e915812 fix: removing hard relations 2025-11-05 12:25:51 +05:30
chinu0609
d34fd9237b feat: adding last_acessed in the Data model 2025-11-04 22:04:32 +05:30
Andrej Milicevic
90d10e6f9a test: Add docs tests. Initial commit, still WIP. 2025-11-03 15:31:09 +01:00
chinu0609
5080e8f8a5 feat: genarlizing getting entities from triplets 2025-11-03 00:59:04 +05:30
chinu0609
f1afd1f0a2 feat: adding cleanup function and adding update_node_acess_timestamps in completion retriever and graph_completion retriever 2025-10-31 15:49:34 +05:30
Chinmay Bhosale
4b43afcdab
Merge pull request #4 from chinu0609/delete-last-acessed
Delete last acessed
2025-10-31 00:25:33 +05:30
chinu0609
6f06e4a5eb fix: removing node_type and try except 2025-10-31 00:17:13 +05:30
chinu0609
13396871c9 Merge branch 'delete-last-acessed' of github.com:chinu0609/cognee into delete-last-acessed 2025-10-31 00:12:10 +05:30
chinu0609
5f6f0502c8 fix: removing last_acessed_at from individual model and adding it to DataPoint 2025-10-31 00:00:18 +05:30
chinu0609
7d4804ff7b Merge remote-tracking branch 'upstream/dev' into delete-last-acessed 2025-10-29 20:22:34 +05:30
chinu0609
3f27c5592b feat: adding last_accessed_at field to the models and updating the retrievers to update the timestamp 2025-10-29 20:17:27 +05:30
chinu0609
3372679f7b feat: adding last_accessed_at field to the models and updating the retrievers to update the timestamp 2025-10-29 20:12:14 +05:30
305 changed files with 27998 additions and 8737 deletions

View file

@ -3,7 +3,7 @@
language: en
early_access: false
enable_free_tier: true
reviews:
reviews:
profile: chill
instructions: >-
# Code Review Instructions
@ -118,10 +118,10 @@ reviews:
- E117
- D208
line_length: 100
dummy_variable_rgx: '^(_.*|junk|extra)$' # Variables starting with '_' or named 'junk' or 'extras', are considered dummy variables
dummy_variable_rgx: '^(_.*|junk|extra)$' # Variables starting with '_' or named 'junk' or 'extras', are considered dummy variables
markdownlint:
enabled: true
yamllint:
enabled: true
chat:
auto_reply: true
auto_reply: true

View file

@ -2,4 +2,8 @@
# Example:
# CORS_ALLOWED_ORIGINS="https://yourdomain.com,https://another.com"
# For local development, you might use:
# CORS_ALLOWED_ORIGINS="http://localhost:3000"
# CORS_ALLOWED_ORIGINS="http://localhost:3000"
LLM_API_KEY="your-openai-api-key"
LLM_MODEL="openai/gpt-4o-mini"
LLM_PROVIDER="openai"

View file

@ -91,6 +91,15 @@ DB_NAME=cognee_db
#DB_USERNAME=cognee
#DB_PASSWORD=cognee
# -- Advanced: Custom database connection arguments (optional) ---------------
# Pass additional connection parameters as JSON. Useful for SSL, timeouts, etc.
# Examples:
# For PostgreSQL with SSL:
# DATABASE_CONNECT_ARGS='{"sslmode": "require", "connect_timeout": 10}'
# For SQLite with custom timeout:
# DATABASE_CONNECT_ARGS='{"timeout": 60}'
#DATABASE_CONNECT_ARGS='{}'
################################################################################
# 🕸️ Graph Database settings
################################################################################

View file

@ -28,4 +28,4 @@ secret-scan:
- path: 'docker-compose.yml'
comment: 'Development docker compose with test credentials (neo4j/pleaseletmein, postgres cognee/cognee)'
- path: 'deployment/helm/docker-compose-helm.yml'
comment: 'Helm deployment docker compose with test postgres credentials (cognee/cognee)'
comment: 'Helm deployment docker compose with test postgres credentials (cognee/cognee)'

View file

@ -8,7 +8,7 @@ body:
attributes:
value: |
Thanks for taking the time to fill out this bug report! Please provide a clear and detailed description.
- type: textarea
id: description
attributes:
@ -17,7 +17,7 @@ body:
placeholder: Describe the bug in detail...
validations:
required: true
- type: textarea
id: reproduction
attributes:
@ -29,7 +29,7 @@ body:
3. See error...
validations:
required: true
- type: textarea
id: expected
attributes:
@ -38,7 +38,7 @@ body:
placeholder: Describe what you expected...
validations:
required: true
- type: textarea
id: actual
attributes:
@ -47,7 +47,7 @@ body:
placeholder: Describe what actually happened...
validations:
required: true
- type: textarea
id: environment
attributes:
@ -61,7 +61,7 @@ body:
- Database: [e.g. Neo4j]
validations:
required: true
- type: textarea
id: logs
attributes:
@ -71,7 +71,7 @@ body:
render: shell
validations:
required: false
- type: textarea
id: additional
attributes:
@ -80,7 +80,7 @@ body:
placeholder: Any additional information...
validations:
required: false
- type: checkboxes
id: checklist
attributes:

View file

@ -8,7 +8,7 @@ body:
attributes:
value: |
Thanks for helping improve our documentation! Please provide details about the documentation issue or improvement.
- type: dropdown
id: doc-type
attributes:
@ -22,7 +22,7 @@ body:
- New documentation request
validations:
required: true
- type: textarea
id: location
attributes:
@ -31,7 +31,7 @@ body:
placeholder: https://cognee.ai/docs/... or specific file/section
validations:
required: true
- type: textarea
id: issue
attributes:
@ -40,7 +40,7 @@ body:
placeholder: The documentation is unclear about...
validations:
required: true
- type: textarea
id: suggestion
attributes:
@ -49,7 +49,7 @@ body:
placeholder: I suggest changing this to...
validations:
required: false
- type: textarea
id: additional
attributes:
@ -58,7 +58,7 @@ body:
placeholder: Additional context...
validations:
required: false
- type: checkboxes
id: checklist
attributes:
@ -71,4 +71,3 @@ body:
required: true
- label: I have specified the location of the documentation issue
required: true

View file

@ -8,7 +8,7 @@ body:
attributes:
value: |
Thanks for suggesting a new feature! Please provide a clear and detailed description of your idea.
- type: textarea
id: problem
attributes:
@ -17,7 +17,7 @@ body:
placeholder: I'm always frustrated when...
validations:
required: true
- type: textarea
id: solution
attributes:
@ -26,7 +26,7 @@ body:
placeholder: I would like to see...
validations:
required: true
- type: textarea
id: alternatives
attributes:
@ -35,7 +35,7 @@ body:
placeholder: I have also considered...
validations:
required: false
- type: textarea
id: use-case
attributes:
@ -44,7 +44,7 @@ body:
placeholder: This feature would help me...
validations:
required: true
- type: textarea
id: implementation
attributes:
@ -53,7 +53,7 @@ body:
placeholder: This could be implemented by...
validations:
required: false
- type: textarea
id: additional
attributes:
@ -62,7 +62,7 @@ body:
placeholder: Additional context...
validations:
required: false
- type: checkboxes
id: checklist
attributes:
@ -75,4 +75,3 @@ body:
required: true
- label: I have described my specific use case
required: true

View file

@ -34,14 +34,14 @@ runs:
-e NEO4J_apoc_export_file_enabled=true \
-e NEO4J_apoc_import_file_enabled=true \
neo4j:${{ inputs.neo4j-version }}
- name: Wait for Neo4j to be ready
shell: bash
run: |
echo "Waiting for Neo4j to start..."
timeout=60
counter=0
while [ $counter -lt $timeout ]; do
if docker exec neo4j-test cypher-shell -u neo4j -p "${{ inputs.neo4j-password }}" "RETURN 1" > /dev/null 2>&1; then
echo "Neo4j is ready!"
@ -51,13 +51,13 @@ runs:
sleep 2
counter=$((counter + 2))
done
if [ $counter -ge $timeout ]; then
echo "Neo4j failed to start within $timeout seconds"
docker logs neo4j-test
exit 1
fi
- name: Verify GDS is available
shell: bash
run: |

View file

@ -8,5 +8,3 @@ lxobr
pazone
siillee
vasilije1990

View file

@ -10,26 +10,21 @@ DO NOT use AI-generated descriptions. We want to understand your thought process
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to test it locally;
* Proof that it's sufficiently tested.
-->
## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):
## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->
## Screenshots
<!-- ADD SCREENSHOT OF LOCAL TESTS PASSING-->
## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **I have tested my changes thoroughly before submitting this PR** (See `CONTRIBUTING.md`)
- [ ] **This PR contains minimal changes necessary to address the issue/feature**
- [ ] My code follows the project's coding standards and style guidelines
- [ ] I have added tests that prove my fix is effective or that my feature works

View file

@ -3,7 +3,7 @@ tag-template: 'v$NEXT_PATCH_VERSION'
categories:
- title: 'Features'
labels: ['feature', 'enhancement']
labels: ['feature', 'enhancement']
- title: 'Bug Fixes'
labels: ['bug', 'fix']
- title: 'Maintenance'

View file

@ -34,43 +34,6 @@ env:
ENV: 'dev'
jobs:
lint:
name: Run Linting
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: ${{ inputs.python-version }}
- name: Run Linting
uses: astral-sh/ruff-action@v2
format-check:
name: Run Formatting Check
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: ${{ inputs.python-version }}
- name: Run Formatting Check
uses: astral-sh/ruff-action@v2
with:
args: "format --check"
unit-tests:
name: Run Unit Tests
runs-on: ubuntu-22.04

View file

@ -31,54 +31,54 @@ WORKFLOWS=(
for workflow in "${WORKFLOWS[@]}"; do
if [ -f "$workflow" ]; then
echo "Processing $workflow..."
# Create a backup
cp "$workflow" "${workflow}.bak"
# Check if the file begins with a workflow_call trigger
if grep -q "workflow_call:" "$workflow"; then
echo "$workflow already has workflow_call trigger, skipping..."
continue
fi
# Get the content after the 'on:' section
on_line=$(grep -n "^on:" "$workflow" | cut -d ':' -f1)
if [ -z "$on_line" ]; then
echo "Warning: No 'on:' section found in $workflow, skipping..."
continue
fi
# Create a new file with the modified content
{
# Copy the part before 'on:'
head -n $((on_line-1)) "$workflow"
# Add the new on: section that only includes workflow_call
echo "on:"
echo " workflow_call:"
echo " secrets:"
echo " inherit: true"
# Find where to continue after the original 'on:' section
next_section=$(awk "NR > $on_line && /^[a-z]/ {print NR; exit}" "$workflow")
if [ -z "$next_section" ]; then
next_section=$(wc -l < "$workflow")
next_section=$((next_section+1))
fi
# Copy the rest of the file starting from the next section
tail -n +$next_section "$workflow"
} > "${workflow}.new"
# Replace the original with the new version
mv "${workflow}.new" "$workflow"
echo "Modified $workflow to only run when called from test-suites.yml"
else
echo "Warning: $workflow not found, skipping..."
fi
done
echo "Finished modifying workflows!"
echo "Finished modifying workflows!"

View file

@ -45,4 +45,4 @@ jobs:
cache-to: type=registry,ref=cognee/cognee:buildcache,mode=max
- name: Image digest
run: echo ${{ steps.build.outputs.digest }}
run: echo ${{ steps.build.outputs.digest }}

View file

@ -288,7 +288,7 @@ jobs:
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_permissions.py
run: uv run pytest cognee/tests/test_permissions.py -v --log-level=INFO
test-multi-tenancy:
name: Test multi tenancy with different situations in Cognee
@ -315,6 +315,31 @@ jobs:
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_multi_tenancy.py
test-data-label:
name: Test adding of label for data in Cognee
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
- name: Run custom data label test
env:
ENV: 'dev'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_custom_data_label.py
test-graph-edges:
name: Test graph edge ingestion
runs-on: ubuntu-22.04

View file

@ -257,7 +257,7 @@ jobs:
with:
python-version: '3.11.x'
- name: Run Memify Tests
- name: Run Permissions Example
env:
ENV: 'dev'
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
@ -270,6 +270,65 @@ jobs:
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./examples/python/permissions_example.py
test-s3-permissions-example: # Make sure permission and multi-user mode work with S3 file system
name: Run Permissions Example
runs-on: ubuntu-22.04
defaults:
run:
shell: bash
services:
postgres: # Using postgres to avoid storing and using SQLite from S3
image: pgvector/pgvector:pg17
env:
POSTGRES_USER: cognee
POSTGRES_PASSWORD: cognee
POSTGRES_DB: cognee_db
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
- 5432:5432
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
extra-dependencies: "postgres aws"
- name: Run S3 Permissions Example
env:
ENV: 'dev'
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
STORAGE_BACKEND: 's3'
AWS_REGION: eu-west-1
AWS_ENDPOINT_URL: https://s3-eu-west-1.amazonaws.com
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_S3_DEV_USER_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_S3_DEV_USER_SECRET_KEY }}
STORAGE_BUCKET_NAME: github-runner-cognee-tests
DATA_ROOT_DIRECTORY: "s3://github-runner-cognee-tests/cognee/data"
SYSTEM_ROOT_DIRECTORY: "s3://github-runner-cognee-tests/cognee/system"
DB_PROVIDER: 'postgres'
DB_NAME: 'cognee_db'
DB_HOST: '127.0.0.1'
DB_PORT: 5432
DB_USERNAME: cognee
DB_PASSWORD: cognee
run: uv run python ./examples/python/permissions_example.py
test_docling_add:
name: Run Add with Docling Test
runs-on: macos-15

View file

@ -72,5 +72,3 @@ jobs:
} catch (error) {
core.warning(`Failed to add label: ${error.message}`);
}

View file

@ -66,5 +66,3 @@ jobs:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_S3_DEV_USER_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_S3_DEV_USER_SECRET_KEY }}
run: uv run python ./cognee/tests/test_load.py

View file

@ -5,7 +5,7 @@ permissions:
contents: read
jobs:
check-uv-lock:
name: Validate uv lockfile and project metadata
name: Lockfile and Pre-commit Hooks
runs-on: ubuntu-22.04
steps:
- name: Check out repository
@ -17,6 +17,9 @@ jobs:
uses: astral-sh/setup-uv@v4
with:
enable-cache: true
- name: Validate uv lockfile and project metadata
run: uv lock --check || { echo "'uv lock --check' failed."; echo "Run 'uv lock' and push your changes."; exit 1; }
- name: Run pre-commit hooks
uses: pre-commit/action@v3.0.1

View file

@ -42,10 +42,10 @@ jobs:
echo "tag=${TAG}" >> "$GITHUB_OUTPUT"
echo "version=${VERSION}" >> "$GITHUB_OUTPUT"
git tag "${TAG}"
git push origin "${TAG}"
- name: Create GitHub Release
uses: softprops/action-gh-release@v2
@ -54,8 +54,8 @@ jobs:
generate_release_notes: true
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
release-pypi-package:
release-pypi-package:
needs: release-github
name: Release PyPI Package from ${{ inputs.flavour }}
permissions:
@ -67,25 +67,25 @@ jobs:
uses: actions/checkout@v4
with:
ref: ${{ inputs.flavour }}
- name: Install uv
uses: astral-sh/setup-uv@v7
- name: Install Python
run: uv python install
- name: Install dependencies
run: uv sync --locked --all-extras
- name: Build distributions
run: uv build
- name: Publish ${{ inputs.flavour }} release to PyPI
env:
UV_PUBLISH_TOKEN: ${{ secrets.PYPI_TOKEN }}
run: uv publish
release-docker-image:
release-docker-image:
needs: release-github
name: Release Docker Image from ${{ inputs.flavour }}
permissions:
@ -128,7 +128,7 @@ jobs:
context: .
platforms: linux/amd64,linux/arm64
push: true
tags: |
tags: |
cognee/cognee:${{ needs.release-github.outputs.version }}
cognee/cognee:latest
labels: |
@ -136,3 +136,31 @@ jobs:
flavour=${{ inputs.flavour }}
cache-from: type=registry,ref=cognee/cognee:buildcache
cache-to: type=registry,ref=cognee/cognee:buildcache,mode=max
trigger-docs-test-suite:
needs: release-pypi-package
if: ${{ inputs.flavour == 'main' }}
runs-on: ubuntu-22.04
steps:
- name: Trigger docs tests
run: |
curl -L -X POST \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${{ secrets.REPO_DISPATCH_PAT_TOKEN }}" \
-H "X-GitHub-Api-Version: 2022-11-28" \
https://api.github.com/repos/topoteretes/cognee-docs/dispatches \
-d '{"event_type":"new-main-release","client_payload":{"caller_repo":"'"${GITHUB_REPOSITORY}"'"}}'
trigger-community-test-suite:
needs: release-pypi-package
if: ${{ inputs.flavour == 'main' }}
runs-on: ubuntu-22.04
steps:
- name: Trigger community tests
run: |
curl -L -X POST \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${{ secrets.REPO_DISPATCH_PAT_TOKEN }}" \
-H "X-GitHub-Api-Version: 2022-11-28" \
https://api.github.com/repos/topoteretes/cognee-community/dispatches \
-d '{"event_type":"new-main-release","client_payload":{"caller_repo":"'"${GITHUB_REPOSITORY}"'"}}'

View file

@ -14,4 +14,4 @@ jobs:
load-tests:
name: Load Tests
uses: ./.github/workflows/load_tests.yml
secrets: inherit
secrets: inherit

View file

@ -11,12 +11,21 @@ on:
type: string
default: "all"
description: "Which vector databases to test (comma-separated list or 'all')"
python-versions:
required: false
type: string
default: '["3.10", "3.11", "3.12", "3.13"]'
description: "Python versions to test (JSON array)"
jobs:
run-kuzu-lance-sqlite-search-tests:
name: Search test for Kuzu/LanceDB/Sqlite
name: Search test for Kuzu/LanceDB/Sqlite (Python ${{ matrix.python-version }})
runs-on: ubuntu-22.04
if: ${{ inputs.databases == 'all' || contains(inputs.databases, 'kuzu/lance/sqlite') }}
strategy:
matrix:
python-version: ${{ fromJSON(inputs.python-versions) }}
fail-fast: false
steps:
- name: Check out
uses: actions/checkout@v4
@ -26,7 +35,7 @@ jobs:
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: ${{ inputs.python-version }}
python-version: ${{ matrix.python-version }}
- name: Dependencies already installed
run: echo "Dependencies already installed in setup"
@ -45,13 +54,16 @@ jobs:
GRAPH_DATABASE_PROVIDER: 'kuzu'
VECTOR_DB_PROVIDER: 'lancedb'
DB_PROVIDER: 'sqlite'
run: uv run python ./cognee/tests/test_search_db.py
run: uv run pytest cognee/tests/test_search_db.py -v --log-level=INFO
run-neo4j-lance-sqlite-search-tests:
name: Search test for Neo4j/LanceDB/Sqlite
name: Search test for Neo4j/LanceDB/Sqlite (Python ${{ matrix.python-version }})
runs-on: ubuntu-22.04
if: ${{ inputs.databases == 'all' || contains(inputs.databases, 'neo4j/lance/sqlite') }}
strategy:
matrix:
python-version: ${{ fromJSON(inputs.python-versions) }}
fail-fast: false
steps:
- name: Check out
uses: actions/checkout@v4
@ -61,7 +73,7 @@ jobs:
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: ${{ inputs.python-version }}
python-version: ${{ matrix.python-version }}
- name: Setup Neo4j with GDS
uses: ./.github/actions/setup_neo4j
@ -88,12 +100,16 @@ jobs:
GRAPH_DATABASE_URL: ${{ steps.neo4j.outputs.neo4j-url }}
GRAPH_DATABASE_USERNAME: ${{ steps.neo4j.outputs.neo4j-username }}
GRAPH_DATABASE_PASSWORD: ${{ steps.neo4j.outputs.neo4j-password }}
run: uv run python ./cognee/tests/test_search_db.py
run: uv run pytest cognee/tests/test_search_db.py -v --log-level=INFO
run-kuzu-pgvector-postgres-search-tests:
name: Search test for Kuzu/PGVector/Postgres
name: Search test for Kuzu/PGVector/Postgres (Python ${{ matrix.python-version }})
runs-on: ubuntu-22.04
if: ${{ inputs.databases == 'all' || contains(inputs.databases, 'kuzu/pgvector/postgres') }}
strategy:
matrix:
python-version: ${{ fromJSON(inputs.python-versions) }}
fail-fast: false
services:
postgres:
image: pgvector/pgvector:pg17
@ -117,7 +133,7 @@ jobs:
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: ${{ inputs.python-version }}
python-version: ${{ matrix.python-version }}
extra-dependencies: "postgres"
- name: Dependencies already installed
@ -143,12 +159,16 @@ jobs:
DB_PORT: 5432
DB_USERNAME: cognee
DB_PASSWORD: cognee
run: uv run python ./cognee/tests/test_search_db.py
run: uv run pytest cognee/tests/test_search_db.py -v --log-level=INFO
run-neo4j-pgvector-postgres-search-tests:
name: Search test for Neo4j/PGVector/Postgres
name: Search test for Neo4j/PGVector/Postgres (Python ${{ matrix.python-version }})
runs-on: ubuntu-22.04
if: ${{ inputs.databases == 'all' || contains(inputs.databases, 'neo4j/pgvector/postgres') }}
strategy:
matrix:
python-version: ${{ fromJSON(inputs.python-versions) }}
fail-fast: false
services:
postgres:
image: pgvector/pgvector:pg17
@ -172,7 +192,7 @@ jobs:
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: ${{ inputs.python-version }}
python-version: ${{ matrix.python-version }}
extra-dependencies: "postgres"
- name: Setup Neo4j with GDS
@ -205,4 +225,4 @@ jobs:
DB_PORT: 5432
DB_USERNAME: cognee
DB_PASSWORD: cognee
run: uv run python ./cognee/tests/test_search_db.py
run: uv run pytest cognee/tests/test_search_db.py -v --log-level=INFO

View file

@ -10,7 +10,7 @@ on:
required: false
type: string
default: '["3.10.x", "3.12.x", "3.13.x"]'
os:
os:
required: false
type: string
default: '["ubuntu-22.04", "macos-15", "windows-latest"]'

View file

@ -173,4 +173,4 @@ jobs:
EMBEDDING_MODEL: "amazon.titan-embed-text-v2:0"
EMBEDDING_DIMENSIONS: "1024"
EMBEDDING_MAX_TOKENS: "8191"
run: uv run python ./examples/python/simple_example.py
run: uv run python ./examples/python/simple_example.py

View file

@ -18,11 +18,11 @@ env:
RUNTIME__LOG_LEVEL: ERROR
ENV: 'dev'
jobs:
jobs:
pre-test:
name: basic checks
uses: ./.github/workflows/pre_test.yml
basic-tests:
name: Basic Tests
uses: ./.github/workflows/basic_tests.yml

2
.gitignore vendored
View file

@ -147,6 +147,8 @@ venv/
ENV/
env.bak/
venv.bak/
mise.toml
deployment/helm/values-local.yml
# Spyder project settings
.spyderproject

View file

@ -6,4 +6,4 @@ pull_request_rules:
actions:
backport:
branches:
- main
- main

View file

@ -7,6 +7,7 @@ repos:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
exclude: ^deployment/helm/templates/
- id: check-added-large-files
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.

View file

@ -128,5 +128,3 @@ MCP server and Frontend:
## CI Mirrors Local Commands
Our GitHub Actions run the same ruff checks and pytest suites shown above (`.github/workflows/basic_tests.yml` and related workflows). Use the commands in this document locally to minimize CI surprises.

590
CLAUDE.md Normal file
View file

@ -0,0 +1,590 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Cognee is an open-source AI memory platform that transforms raw data into persistent knowledge graphs for AI agents. It replaces traditional RAG (Retrieval-Augmented Generation) with an ECL (Extract, Cognify, Load) pipeline combining vector search, graph databases, and LLM-powered entity extraction.
**Requirements**: Python 3.9 - 3.12
## Development Commands
### Setup
```bash
# Create virtual environment (recommended: uv)
uv venv && source .venv/bin/activate
# Install with pip, poetry, or uv
uv pip install -e .
# Install with dev dependencies
uv pip install -e ".[dev]"
# Install with specific extras
uv pip install -e ".[postgres,neo4j,docs,chromadb]"
# Set up pre-commit hooks
pre-commit install
```
### Available Installation Extras
- **postgres** / **postgres-binary** - PostgreSQL + PGVector support
- **neo4j** - Neo4j graph database support
- **neptune** - AWS Neptune support
- **chromadb** - ChromaDB vector database
- **docs** - Document processing (unstructured library)
- **scraping** - Web scraping (Tavily, BeautifulSoup, Playwright)
- **langchain** - LangChain integration
- **llama-index** - LlamaIndex integration
- **anthropic** - Anthropic Claude models
- **gemini** - Google Gemini models
- **ollama** - Ollama local models
- **mistral** - Mistral AI models
- **groq** - Groq API support
- **llama-cpp** - Llama.cpp local inference
- **huggingface** - HuggingFace transformers
- **aws** - S3 storage backend
- **redis** - Redis caching
- **graphiti** - Graphiti-core integration
- **baml** - BAML structured output
- **dlt** - Data load tool (dlt) integration
- **docling** - Docling document processing
- **codegraph** - Code graph extraction
- **evals** - Evaluation tools
- **deepeval** - DeepEval testing framework
- **posthog** - PostHog analytics
- **monitoring** - Sentry + Langfuse observability
- **distributed** - Modal distributed execution
- **dev** - All development tools (pytest, mypy, ruff, etc.)
- **debug** - Debugpy for debugging
### Testing
```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=cognee --cov-report=html
# Run specific test file
pytest cognee/tests/test_custom_model.py
# Run specific test function
pytest cognee/tests/test_custom_model.py::test_function_name
# Run async tests
pytest -v cognee/tests/integration/
# Run unit tests only
pytest cognee/tests/unit/
# Run integration tests only
pytest cognee/tests/integration/
```
### Code Quality
```bash
# Run ruff linter
ruff check .
# Run ruff formatter
ruff format .
# Run both linting and formatting (pre-commit)
pre-commit run --all-files
# Type checking with mypy
mypy cognee/
# Run pylint
pylint cognee/
```
### Running Cognee
```bash
# Using Python SDK
python examples/python/simple_example.py
# Using CLI
cognee-cli add "Your text here"
cognee-cli cognify
cognee-cli search "Your query"
cognee-cli delete --all
# Launch full stack with UI
cognee-cli -ui
```
## Architecture Overview
### Core Workflow: add → cognify → search/memify
1. **add()** - Ingest data (files, URLs, text) into datasets
2. **cognify()** - Extract entities/relationships and build knowledge graph
3. **search()** - Query knowledge using various retrieval strategies
4. **memify()** - Enrich graph with additional context and rules
### Key Architectural Patterns
#### 1. Pipeline-Based Processing
All data flows through task-based pipelines (`cognee/modules/pipelines/`). Tasks are composable units that can run sequentially or in parallel. Example pipeline tasks: `classify_documents`, `extract_graph_from_data`, `add_data_points`.
#### 2. Interface-Based Database Adapters
Multiple backends are supported through adapter interfaces:
- **Graph**: Kuzu (default), Neo4j, Neptune via `GraphDBInterface`
- **Vector**: LanceDB (default), ChromaDB, PGVector via `VectorDBInterface`
- **Relational**: SQLite (default), PostgreSQL
Key files:
- `cognee/infrastructure/databases/graph/graph_db_interface.py`
- `cognee/infrastructure/databases/vector/vector_db_interface.py`
#### 3. Multi-Tenant Access Control
User → Dataset → Data hierarchy with permission-based filtering. Enable with `ENABLE_BACKEND_ACCESS_CONTROL=True`. Each user+dataset combination can have isolated graph/vector databases (when using supported backends: Kuzu, LanceDB, SQLite, Postgres).
### Layer Structure
```
API Layer (cognee/api/v1/)
Main Functions (add, cognify, search, memify)
Pipeline Orchestrator (cognee/modules/pipelines/)
Task Execution Layer (cognee/tasks/)
Domain Modules (graph, retrieval, ingestion, etc.)
Infrastructure Adapters (LLM, databases)
External Services (OpenAI, Kuzu, LanceDB, etc.)
```
### Critical Data Flow Paths
#### ADD: Data Ingestion
`add()``resolve_data_directories``ingest_data``save_data_item_to_storage` → Create Dataset + Data records in relational DB
Key files: `cognee/api/v1/add/add.py`, `cognee/tasks/ingestion/ingest_data.py`
#### COGNIFY: Knowledge Graph Construction
`cognify()``classify_documents``extract_chunks_from_documents``extract_graph_from_data` (LLM extracts entities/relationships using Instructor) → `summarize_text``add_data_points` (store in graph + vector DBs)
Key files:
- `cognee/api/v1/cognify/cognify.py`
- `cognee/tasks/graph/extract_graph_from_data.py`
- `cognee/tasks/storage/add_data_points.py`
#### SEARCH: Retrieval
`search(query_text, query_type)` → route to retriever type → filter by permissions → return results
Available search types (from `cognee/modules/search/types/SearchType.py`):
- **GRAPH_COMPLETION** (default) - Graph traversal + LLM completion
- **GRAPH_SUMMARY_COMPLETION** - Uses pre-computed summaries with graph context
- **GRAPH_COMPLETION_COT** - Chain-of-thought reasoning over graph
- **GRAPH_COMPLETION_CONTEXT_EXTENSION** - Extended context graph retrieval
- **TRIPLET_COMPLETION** - Triplet-based (subject-predicate-object) search
- **RAG_COMPLETION** - Traditional RAG with chunks
- **CHUNKS** - Vector similarity search over chunks
- **CHUNKS_LEXICAL** - Lexical (keyword) search over chunks
- **SUMMARIES** - Search pre-computed document summaries
- **CYPHER** - Direct Cypher query execution (requires `ALLOW_CYPHER_QUERY=True`)
- **NATURAL_LANGUAGE** - Natural language to structured query
- **TEMPORAL** - Time-aware graph search
- **FEELING_LUCKY** - Automatic search type selection
- **FEEDBACK** - User feedback-based refinement
- **CODING_RULES** - Code-specific search rules
Key files:
- `cognee/api/v1/search/search.py`
- `cognee/modules/retrieval/context_providers/TripletSearchContextProvider.py`
- `cognee/modules/search/types/SearchType.py`
### Core Data Models
#### Engine Models (`cognee/infrastructure/engine/models/`)
- **DataPoint** - Base class for all graph nodes (versioned, with metadata)
- **Edge** - Graph relationships (source, target, relationship type)
- **Triplet** - (Subject, Predicate, Object) representation
#### Graph Models (`cognee/shared/data_models.py`)
- **KnowledgeGraph** - Container for nodes and edges
- **Node** - Entity (id, name, type, description)
- **Edge** - Relationship (source_node_id, target_node_id, relationship_name)
### Key Infrastructure Components
#### LLM Gateway (`cognee/infrastructure/llm/LLMGateway.py`)
Unified interface for multiple LLM providers: OpenAI, Anthropic, Gemini, Ollama, Mistral, Bedrock. Uses Instructor for structured output extraction.
#### Embedding Engines
Factory pattern for embeddings: `cognee/infrastructure/databases/vector/embeddings/get_embedding_engine.py`
#### Document Loaders
Support for PDF, DOCX, CSV, images, audio, code files in `cognee/infrastructure/files/`
## Important Configuration
### Environment Setup
Copy `.env.template` to `.env` and configure:
```bash
# Minimal setup (defaults to OpenAI + local file-based databases)
LLM_API_KEY="your_openai_api_key"
LLM_MODEL="openai/gpt-4o-mini" # Default model
```
**Important**: If you configure only LLM or only embeddings, the other defaults to OpenAI. Ensure you have a working OpenAI API key, or configure both to avoid unexpected defaults.
Default databases (no extra setup needed):
- **Relational**: SQLite (metadata and state storage)
- **Vector**: LanceDB (embeddings for semantic search)
- **Graph**: Kuzu (knowledge graph and relationships)
All stored in `.venv` by default. Override with `DATA_ROOT_DIRECTORY` and `SYSTEM_ROOT_DIRECTORY`.
### Switching Databases
#### Relational Databases
```bash
# PostgreSQL (requires postgres extra: pip install cognee[postgres])
DB_PROVIDER=postgres
DB_HOST=localhost
DB_PORT=5432
DB_USERNAME=cognee
DB_PASSWORD=cognee
DB_NAME=cognee_db
```
#### Vector Databases
Supported: lancedb (default), pgvector, chromadb, qdrant, weaviate, milvus
```bash
# ChromaDB (requires chromadb extra)
VECTOR_DB_PROVIDER=chromadb
# PGVector (requires postgres extra)
VECTOR_DB_PROVIDER=pgvector
VECTOR_DB_URL=postgresql://cognee:cognee@localhost:5432/cognee_db
```
#### Graph Databases
Supported: kuzu (default), neo4j, neptune, kuzu-remote
```bash
# Neo4j (requires neo4j extra: pip install cognee[neo4j])
GRAPH_DATABASE_PROVIDER=neo4j
GRAPH_DATABASE_URL=bolt://localhost:7687
GRAPH_DATABASE_NAME=neo4j
GRAPH_DATABASE_USERNAME=neo4j
GRAPH_DATABASE_PASSWORD=yourpassword
# Remote Kuzu
GRAPH_DATABASE_PROVIDER=kuzu-remote
GRAPH_DATABASE_URL=http://localhost:8000
GRAPH_DATABASE_USERNAME=your_username
GRAPH_DATABASE_PASSWORD=your_password
```
### LLM Provider Configuration
Supported providers: OpenAI (default), Azure OpenAI, Google Gemini, Anthropic, AWS Bedrock, Ollama, LM Studio, Custom (OpenAI-compatible APIs)
#### OpenAI (Recommended - Minimal Setup)
```bash
LLM_API_KEY="your_openai_api_key"
LLM_MODEL="openai/gpt-4o-mini" # or gpt-4o, gpt-4-turbo, etc.
LLM_PROVIDER="openai"
```
#### Azure OpenAI
```bash
LLM_PROVIDER="azure"
LLM_MODEL="azure/gpt-4o-mini"
LLM_ENDPOINT="https://YOUR-RESOURCE.openai.azure.com/openai/deployments/gpt-4o-mini"
LLM_API_KEY="your_azure_api_key"
LLM_API_VERSION="2024-12-01-preview"
```
#### Google Gemini (requires gemini extra)
```bash
LLM_PROVIDER="gemini"
LLM_MODEL="gemini/gemini-2.0-flash-exp"
LLM_API_KEY="your_gemini_api_key"
```
#### Anthropic Claude (requires anthropic extra)
```bash
LLM_PROVIDER="anthropic"
LLM_MODEL="claude-3-5-sonnet-20241022"
LLM_API_KEY="your_anthropic_api_key"
```
#### Ollama (Local - requires ollama extra)
```bash
LLM_PROVIDER="ollama"
LLM_MODEL="llama3.1:8b"
LLM_ENDPOINT="http://localhost:11434/v1"
LLM_API_KEY="ollama"
EMBEDDING_PROVIDER="ollama"
EMBEDDING_MODEL="nomic-embed-text:latest"
EMBEDDING_ENDPOINT="http://localhost:11434/api/embed"
HUGGINGFACE_TOKENIZER="nomic-ai/nomic-embed-text-v1.5"
```
#### Custom / OpenRouter / vLLM
```bash
LLM_PROVIDER="custom"
LLM_MODEL="openrouter/google/gemini-2.0-flash-lite-preview-02-05:free"
LLM_ENDPOINT="https://openrouter.ai/api/v1"
LLM_API_KEY="your_api_key"
```
#### AWS Bedrock (requires aws extra)
```bash
LLM_PROVIDER="bedrock"
LLM_MODEL="anthropic.claude-3-sonnet-20240229-v1:0"
AWS_REGION="us-east-1"
AWS_ACCESS_KEY_ID="your_access_key"
AWS_SECRET_ACCESS_KEY="your_secret_key"
# Optional for temporary credentials:
# AWS_SESSION_TOKEN="your_session_token"
```
#### LLM Rate Limiting
```bash
LLM_RATE_LIMIT_ENABLED=true
LLM_RATE_LIMIT_REQUESTS=60 # Requests per interval
LLM_RATE_LIMIT_INTERVAL=60 # Interval in seconds
```
#### Instructor Mode (Structured Output)
```bash
# LLM_INSTRUCTOR_MODE controls how structured data is extracted
# Each LLM has its own default (e.g., gpt-4o models use "json_schema_mode")
# Override if needed:
LLM_INSTRUCTOR_MODE="json_schema_mode" # or "tool_call", "md_json", etc.
```
### Structured Output Framework
```bash
# Use Instructor (default, via litellm)
STRUCTURED_OUTPUT_FRAMEWORK="instructor"
# Or use BAML (requires baml extra: pip install cognee[baml])
STRUCTURED_OUTPUT_FRAMEWORK="baml"
BAML_LLM_PROVIDER=openai
BAML_LLM_MODEL="gpt-4o-mini"
BAML_LLM_API_KEY="your_api_key"
```
### Storage Backend
```bash
# Local filesystem (default)
STORAGE_BACKEND="local"
# S3 (requires aws extra: pip install cognee[aws])
STORAGE_BACKEND="s3"
STORAGE_BUCKET_NAME="your-bucket-name"
AWS_REGION="us-east-1"
AWS_ACCESS_KEY_ID="your_access_key"
AWS_SECRET_ACCESS_KEY="your_secret_key"
DATA_ROOT_DIRECTORY="s3://your-bucket/cognee/data"
SYSTEM_ROOT_DIRECTORY="s3://your-bucket/cognee/system"
```
## Extension Points
### Adding New Functionality
1. **New Task Type**: Create task function in `cognee/tasks/`, return Task object, register in pipeline
2. **New Database Backend**: Implement `GraphDBInterface` or `VectorDBInterface` in `cognee/infrastructure/databases/`
3. **New LLM Provider**: Add configuration in LLM config (uses litellm)
4. **New Document Processor**: Extend loaders in `cognee/modules/data/processing/`
5. **New Search Type**: Add to `SearchType` enum and implement retriever in `cognee/modules/retrieval/`
6. **Custom Graph Models**: Define Pydantic models extending `DataPoint` in your code
### Working with Ontologies
Cognee supports ontology-based entity extraction to ground knowledge graphs in standardized semantic frameworks (e.g., OWL ontologies).
Configuration:
```bash
ONTOLOGY_RESOLVER=rdflib # Default: uses rdflib and OWL files
MATCHING_STRATEGY=fuzzy # Default: fuzzy matching with 80% similarity
ONTOLOGY_FILE_PATH=/path/to/your/ontology.owl # Full path to ontology file
```
Implementation: `cognee/modules/ontology/`
## Branching Strategy
**IMPORTANT**: Always branch from `dev`, not `main`. The `dev` branch is the active development branch.
```bash
git checkout dev
git pull origin dev
git checkout -b feature/your-feature-name
```
## Code Style
- **Formatter**: Ruff (configured in `pyproject.toml`)
- **Line length**: 100 characters
- **String quotes**: Use double quotes `"` not single quotes `'` (enforced by ruff-format)
- **Pre-commit hooks**: Run ruff linting and formatting automatically
- **Type hints**: Encouraged (mypy checks enabled)
- **Important**: Always run `pre-commit run --all-files` before committing to catch formatting issues
## Testing Strategy
Tests are organized in `cognee/tests/`:
- `unit/` - Unit tests for individual modules
- `integration/` - Full pipeline integration tests
- `cli_tests/` - CLI command tests
- `tasks/` - Task-specific tests
When adding features, add corresponding tests. Integration tests should cover the full add → cognify → search flow.
## API Structure
FastAPI application with versioned routes under `cognee/api/v1/`:
- `/add` - Data ingestion
- `/cognify` - Knowledge graph processing
- `/search` - Query interface
- `/memify` - Graph enrichment
- `/datasets` - Dataset management
- `/users` - Authentication (if `REQUIRE_AUTHENTICATION=True`)
- `/visualize` - Graph visualization server
## Python SDK Entry Points
Main functions exported from `cognee/__init__.py`:
- `add(data, dataset_name)` - Ingest data
- `cognify(datasets)` - Build knowledge graph
- `search(query_text, query_type)` - Query knowledge
- `memify(extraction_tasks, enrichment_tasks)` - Enrich graph
- `delete(data_id)` - Remove data
- `config()` - Configuration management
- `datasets()` - Dataset operations
All functions are async - use `await` or `asyncio.run()`.
## Security Considerations
Several security environment variables in `.env`:
- `ACCEPT_LOCAL_FILE_PATH` - Allow local file paths (default: True)
- `ALLOW_HTTP_REQUESTS` - Allow HTTP requests from Cognee (default: True)
- `ALLOW_CYPHER_QUERY` - Allow raw Cypher queries (default: True)
- `REQUIRE_AUTHENTICATION` - Enable API authentication (default: False)
- `ENABLE_BACKEND_ACCESS_CONTROL` - Multi-tenant isolation (default: True)
For production deployments, review and tighten these settings.
## Common Patterns
### Creating a Custom Pipeline Task
```python
from cognee.modules.pipelines.tasks.Task import Task
async def my_custom_task(data):
# Your logic here
processed_data = process(data)
return processed_data
# Use in pipeline
task = Task(my_custom_task)
```
### Accessing Databases Directly
```python
from cognee.infrastructure.databases.graph import get_graph_engine
from cognee.infrastructure.databases.vector import get_vector_engine
graph_engine = await get_graph_engine()
vector_engine = await get_vector_engine()
```
### Using LLM Gateway
```python
from cognee.infrastructure.llm.get_llm_client import get_llm_client
llm_client = get_llm_client()
response = await llm_client.acreate_structured_output(
text_input="Your prompt",
system_prompt="System instructions",
response_model=YourPydanticModel
)
```
## Key Concepts
### Datasets
Datasets are project-level containers that support organization, permissions, and isolated processing workflows. Each user can have multiple datasets with different access permissions.
```python
# Create/use a dataset
await cognee.add(data, dataset_name="my_project")
await cognee.cognify(datasets=["my_project"])
```
### DataPoints
Atomic knowledge units that form the foundation of graph structures. All graph nodes extend the `DataPoint` base class with versioning and metadata support.
### Permissions System
Multi-tenant architecture with users, roles, and Access Control Lists (ACLs):
- Read, write, delete, and share permissions per dataset
- Enable with `ENABLE_BACKEND_ACCESS_CONTROL=True`
- Supports isolated databases per user+dataset (Kuzu, LanceDB, SQLite, Postgres)
### Graph Visualization
Launch visualization server:
```bash
# Via CLI
cognee-cli -ui # Launches full stack with UI at http://localhost:3000
# Via Python
from cognee.api.v1.visualize import start_visualization_server
await start_visualization_server(port=8080)
```
## Debugging & Troubleshooting
### Debug Configuration
- Set `LITELLM_LOG="DEBUG"` for verbose LLM logs (default: "ERROR")
- Enable debug mode: `ENV="development"` or `ENV="debug"`
- Disable telemetry: `TELEMETRY_DISABLED=1`
- Check logs in structured format (uses structlog)
- Use `debugpy` optional dependency for debugging: `pip install cognee[debug]`
### Common Issues
**Ollama + OpenAI Embeddings NoDataError**
- Issue: Mixing Ollama with OpenAI embeddings can cause errors
- Solution: Configure both LLM and embeddings to use the same provider, or ensure `HUGGINGFACE_TOKENIZER` is set when using Ollama
**LM Studio Structured Output**
- Issue: LM Studio requires explicit instructor mode
- Solution: Set `LLM_INSTRUCTOR_MODE="json_schema_mode"` (or appropriate mode)
**Default Provider Fallback**
- Issue: Configuring only LLM or only embeddings defaults the other to OpenAI
- Solution: Always configure both LLM and embedding providers, or ensure valid OpenAI API key
**Permission Denied on Search**
- Behavior: Returns empty list rather than error (prevents information leakage)
- Solution: Check dataset permissions and user access rights
**Database Connection Issues**
- Check: Verify database URLs, credentials, and that services are running
- Docker users: Use `DB_HOST=host.docker.internal` for local databases
**Rate Limiting Errors**
- Enable client-side rate limiting: `LLM_RATE_LIMIT_ENABLED=true`
- Adjust limits: `LLM_RATE_LIMIT_REQUESTS` and `LLM_RATE_LIMIT_INTERVAL`
## Resources
- [Documentation](https://docs.cognee.ai/)
- [Discord Community](https://discord.gg/NQPKmU5CCg)
- [GitHub Issues](https://github.com/topoteretes/cognee/issues)
- [Example Notebooks](examples/python/)
- [Research Paper](https://arxiv.org/abs/2505.24478) - Optimizing knowledge graphs for LLM reasoning

View file

@ -1,16 +1,16 @@
> [!IMPORTANT]
> **Note for contributors:** When branching out, create a new branch from the `dev` branch.
# 🎉 Welcome to **cognee**!
# 🎉 Welcome to **cognee**!
We're excited that you're interested in contributing to our project!
We want to ensure that every user and contributor feels welcome, included and supported to participate in cognee community.
We're excited that you're interested in contributing to our project!
We want to ensure that every user and contributor feels welcome, included and supported to participate in cognee community.
This guide will help you get started and ensure your contributions can be efficiently integrated into the project.
## 🌟 Quick Links
- [Code of Conduct](CODE_OF_CONDUCT.md)
- [Discord Community](https://discord.gg/bcy8xFAtfd)
- [Discord Community](https://discord.gg/bcy8xFAtfd)
- [Issue Tracker](https://github.com/topoteretes/cognee/issues)
- [Cognee Docs](https://docs.cognee.ai)
@ -62,6 +62,11 @@ Looking for a place to start? Try filtering for [good first issues](https://gith
## 2. 🛠️ Development Setup
### Required tools
* [Python](https://www.python.org/downloads/)
* [uv](https://docs.astral.sh/uv/getting-started/installation/)
* pre-commit: `uv run pip install pre-commit && pre-commit install`
### Fork and Clone
1. Fork the [**cognee**](https://github.com/topoteretes/cognee) repository
@ -93,8 +98,26 @@ git checkout -b feature/your-feature-name
4. **Commits**: Write clear commit messages
### Running Tests
Rename `.env.example` into `.env` and provide your OPENAI_API_KEY as LLM_API_KEY
```shell
python cognee/cognee/tests/test_library.py
uv run python cognee/tests/test_library.py
```
### Running Simple Example
Rename `.env.example` into `.env` and provide your OPENAI_API_KEY as LLM_API_KEY
Make sure to run ```shell uv sync ``` in the root cloned folder or set up a virtual environment to run cognee
```shell
python examples/python/simple_example.py
```
or
```shell
uv run python examples/python/simple_example.py
```
### Running Simple Example
@ -106,7 +129,7 @@ Make sure to run ```shell uv sync ``` in the root cloned folder or set up a virt
```shell
python cognee/cognee/examples/python/simple_example.py
```
or
or
```shell
uv run python cognee/cognee/examples/python/simple_example.py
@ -114,8 +137,7 @@ uv run python cognee/cognee/examples/python/simple_example.py
## 4. 📤 Submitting Changes
1. Install ruff on your system
2. Run ```ruff format .``` and ``` ruff check ``` and fix the issues
1. Make sure that `pre-commit` and hooks are installed. See `Required tools` section for more information. Try executing `pre-commit run` if you are not sure.
3. Push your changes:
```shell
git add .

View file

@ -32,7 +32,7 @@ COPY README.md pyproject.toml uv.lock entrypoint.sh ./
# Install the project's dependencies using the lockfile and settings
RUN --mount=type=cache,target=/root/.cache/uv \
uv sync --extra debug --extra api --extra postgres --extra neo4j --extra llama-index --extra ollama --extra mistral --extra groq --extra anthropic --frozen --no-install-project --no-dev --no-editable
uv sync --extra debug --extra api --extra postgres --extra neo4j --extra llama-index --extra ollama --extra mistral --extra groq --extra anthropic --extra chromadb --frozen --no-install-project --no-dev --no-editable
# Copy Alembic configuration
COPY alembic.ini /app/alembic.ini
@ -43,7 +43,7 @@ COPY alembic/ /app/alembic
COPY ./cognee /app/cognee
COPY ./distributed /app/distributed
RUN --mount=type=cache,target=/root/.cache/uv \
uv sync --extra debug --extra api --extra postgres --extra neo4j --extra llama-index --extra ollama --extra mistral --extra groq --extra anthropic --frozen --no-dev --no-editable
uv sync --extra debug --extra api --extra postgres --extra neo4j --extra llama-index --extra ollama --extra mistral --extra groq --extra anthropic --extra chromadb --frozen --no-dev --no-editable
FROM python:3.12-slim-bookworm

View file

@ -65,11 +65,14 @@ Use your data to build personalized and dynamic memory for AI Agents. Cognee let
## About Cognee
Cognee is an open-source tool and platform that transforms your raw data into persistent and dynamic AI memory for Agents. It combines vector search with graph databases to make your documents both searchable by meaning and connected by relationships.
Cognee offers default memory creation and search which we describe bellow. But with Cognee you can build your own!
Cognee is an open-source tool and platform that transforms your raw data into persistent and dynamic AI memory for Agents. It combines vector search with graph databases to make your documents both searchable by meaning and connected by relationships.
You can use Cognee in two ways:
### Cognee Open Source:
1. [Self-host Cognee Open Source](https://docs.cognee.ai/getting-started/installation), which stores all data locally by default.
2. [Connect to Cognee Cloud](https://platform.cognee.ai/), and get the same OSS stack on managed infrastructure for easier development and productionization.
### Cognee Open Source (self-hosted):
- Interconnects any type of data — including past conversations, files, images, and audio transcriptions
- Replaces traditional RAG systems with a unified memory layer built on graphs and vectors
@ -77,6 +80,11 @@ Cognee offers default memory creation and search which we describe bellow. But w
- Provides Pythonic data pipelines for ingestion from 30+ data sources
- Offers high customizability through user-defined tasks, modular pipelines, and built-in search endpoints
### Cognee Cloud (managed):
- Hosted web UI dashboard
- Automatic version updates
- Resource usage analytics
- GDPR compliant, enterprise-grade security
## Basic Usage & Feature Guide
@ -111,7 +119,7 @@ To integrate other LLM providers, see our [LLM Provider Documentation](https://d
### Step 3: Run the Pipeline
Cognee will take your documents, generate a knowledge graph from them and then query the graph based on combined relationships.
Cognee will take your documents, generate a knowledge graph from them and then query the graph based on combined relationships.
Now, run a minimal pipeline:
@ -150,7 +158,7 @@ As you can see, the output is generated from the document we previously stored i
Cognee turns documents into AI memory.
```
### Use the Cognee CLI
### Use the Cognee CLI
As an alternative, you can get started with these essential commands:

View file

@ -1 +1 @@
Generic single-database configuration with an async dbapi.
Generic single-database configuration with an async dbapi.

View file

@ -0,0 +1,52 @@
"""Enable delete for old tutorial notebooks
Revision ID: 1a58b986e6e1
Revises: 46a6ce2bd2b2
Create Date: 2025-12-17 11:04:44.414259
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = "1a58b986e6e1"
down_revision: Union[str, None] = "e1ec1dcb50b6"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def change_tutorial_deletable_flag(deletable: bool) -> None:
bind = op.get_bind()
inspector = sa.inspect(bind)
if "notebooks" not in inspector.get_table_names():
return
columns = {col["name"] for col in inspector.get_columns("notebooks")}
required_columns = {"name", "deletable"}
if not required_columns.issubset(columns):
return
notebooks = sa.table(
"notebooks",
sa.Column("name", sa.String()),
sa.Column("deletable", sa.Boolean()),
)
tutorial_name = "Python Development with Cognee Tutorial 🧠"
bind.execute(
notebooks.update().where(notebooks.c.name == tutorial_name).values(deletable=deletable)
)
def upgrade() -> None:
change_tutorial_deletable_flag(True)
def downgrade() -> None:
change_tutorial_deletable_flag(False)

View file

@ -0,0 +1,38 @@
"""Add label column to data table
Revision ID: a1b2c3d4e5f6
Revises: 211ab850ef3d
Create Date: 2025-11-17 17:54:32.123456
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = "a1b2c3d4e5f6"
down_revision: Union[str, None] = "46a6ce2bd2b2"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def _get_column(inspector, table, name, schema=None):
for col in inspector.get_columns(table, schema=schema):
if col["name"] == name:
return col
return None
def upgrade() -> None:
conn = op.get_bind()
insp = sa.inspect(conn)
label_column = _get_column(insp, "data", "label")
if not label_column:
op.add_column("data", sa.Column("label", sa.String(), nullable=True))
def downgrade() -> None:
op.drop_column("data", "label")

View file

@ -0,0 +1,51 @@
"""add_last_accessed_to_data
Revision ID: e1ec1dcb50b6
Revises: 211ab850ef3d
Create Date: 2025-11-04 21:45:52.642322
"""
import os
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = "e1ec1dcb50b6"
down_revision: Union[str, None] = "a1b2c3d4e5f6"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def _get_column(inspector, table, name, schema=None):
for col in inspector.get_columns(table, schema=schema):
if col["name"] == name:
return col
return None
def upgrade() -> None:
conn = op.get_bind()
insp = sa.inspect(conn)
last_accessed_column = _get_column(insp, "data", "last_accessed")
if not last_accessed_column:
# Always create the column for schema consistency
op.add_column("data", sa.Column("last_accessed", sa.DateTime(timezone=True), nullable=True))
# Only initialize existing records if feature is enabled
enable_last_accessed = os.getenv("ENABLE_LAST_ACCESSED", "false").lower() == "true"
if enable_last_accessed:
op.execute("UPDATE data SET last_accessed = CURRENT_TIMESTAMP")
def downgrade() -> None:
conn = op.get_bind()
insp = sa.inspect(conn)
last_accessed_column = _get_column(insp, "data", "last_accessed")
if last_accessed_column:
op.drop_column("data", "last_accessed")

View file

@ -43,10 +43,10 @@ Saiba mais sobre os [casos de uso](https://docs.cognee.ai/use-cases) e [avaliaç
## Funcionalidades
- Conecte e recupere suas conversas passadas, documentos, imagens e transcrições de áudio
- Reduza alucinações, esforço de desenvolvimento e custos
- Carregue dados em bancos de dados de grafos e vetores usando apenas Pydantic
- Transforme e organize seus dados enquanto os coleta de mais de 30 fontes diferentes
- Conecte e recupere suas conversas passadas, documentos, imagens e transcrições de áudio
- Reduza alucinações, esforço de desenvolvimento e custos
- Carregue dados em bancos de dados de grafos e vetores usando apenas Pydantic
- Transforme e organize seus dados enquanto os coleta de mais de 30 fontes diferentes
## Primeiros Passos
@ -108,7 +108,7 @@ if __name__ == '__main__':
Exemplo do output:
```
O Processamento de Linguagem Natural (NLP) é um campo interdisciplinar e transdisciplinar que envolve ciência da computação e recuperação de informações. Ele se concentra na interação entre computadores e a linguagem humana, permitindo que as máquinas compreendam e processem a linguagem natural.
```
Visualização do grafo:

View file

@ -141,7 +141,7 @@ if __name__ == '__main__':
2. Простая демонстрация GraphRAG
[Видео](https://github.com/user-attachments/assets/d80b0776-4eb9-4b8e-aa22-3691e2d44b8f)
3. Cognee с Ollama
3. Cognee с Ollama
[Видео](https://github.com/user-attachments/assets/8621d3e8-ecb8-4860-afb2-5594f2ee17db)
## Правила поведения

View file

@ -114,7 +114,7 @@ if __name__ == '__main__':
示例输出:
```
自然语言处理NLP是计算机科学和信息检索的跨学科领域。它关注计算机和人类语言之间的交互使机器能够理解和处理自然语言。
```
图形可视化:
<a href="https://rawcdn.githack.com/topoteretes/cognee/refs/heads/main/assets/graph_visualization.html"><img src="https://rawcdn.githack.com/topoteretes/cognee/refs/heads/main/assets/graph_visualization.png" width="100%" alt="图形可视化"></a>

File diff suppressed because it is too large Load diff

View file

@ -9,14 +9,15 @@
"lint": "next lint"
},
"dependencies": {
"@auth0/nextjs-auth0": "^4.13.1",
"@auth0/nextjs-auth0": "^4.14.0",
"classnames": "^2.5.1",
"culori": "^4.0.1",
"d3-force-3d": "^3.0.6",
"next": "16.1.1",
"react": "^19.2.0",
"react-dom": "^19.2.0",
"next": "^16.1.1",
"react": "^19.2.3",
"react-dom": "^19.2.3",
"react-force-graph-2d": "^1.27.1",
"react-markdown": "^10.1.0",
"uuid": "^9.0.1"
},
"devDependencies": {

View file

@ -55,7 +55,7 @@ export default function CogneeAddWidget({ onData, useCloud = false }: CogneeAddW
setTrue: setProcessingFilesInProgress,
setFalse: setProcessingFilesDone,
} = useBoolean(false);
const handleAddFiles = (dataset: Dataset, event: ChangeEvent<HTMLInputElement>) => {
event.stopPropagation();

View file

@ -111,7 +111,7 @@ export default function GraphControls({ data, isAddNodeFormOpen, onGraphShapeCha
const [isAuthShapeChangeEnabled, setIsAuthShapeChangeEnabled] = useState(true);
const shapeChangeTimeout = useRef<number | null>(null);
useEffect(() => {
onGraphShapeChange(DEFAULT_GRAPH_SHAPE);

View file

@ -57,7 +57,7 @@ export default function GraphVisualization({ ref, data, graphControls, className
// Initial size calculation
handleResize();
// ResizeObserver
// ResizeObserver
const resizeObserver = new ResizeObserver(() => {
handleResize();
});
@ -216,7 +216,7 @@ export default function GraphVisualization({ ref, data, graphControls, className
}, [data, graphRef]);
const [graphShape, setGraphShape] = useState<string>();
const zoomToFit: ForceGraphMethods["zoomToFit"] = (
durationMs?: number,
padding?: number,
@ -227,15 +227,15 @@ export default function GraphVisualization({ ref, data, graphControls, className
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return undefined as any;
}
return graphRef.current.zoomToFit?.(durationMs, padding, nodeFilter);
};
useImperativeHandle(ref, () => ({
zoomToFit,
setGraphShape,
}));
return (
<div ref={containerRef} className={classNames("w-full h-full", className)} id="graph-container">

View file

@ -1373,4 +1373,4 @@
"padding": 20
}
}
}
}

View file

@ -15,6 +15,8 @@ import AddDataToCognee from "./AddDataToCognee";
import NotebooksAccordion from "./NotebooksAccordion";
import CogneeInstancesAccordion from "./CogneeInstancesAccordion";
import InstanceDatasetsAccordion from "./InstanceDatasetsAccordion";
import cloudFetch from "@/modules/instances/cloudFetch";
import localFetch from "@/modules/instances/localFetch";
interface DashboardProps {
user?: {
@ -26,6 +28,17 @@ interface DashboardProps {
accessToken: string;
}
const cogneeInstances = {
cloudCognee: {
name: "CloudCognee",
fetch: cloudFetch,
},
localCognee: {
name: "LocalCognee",
fetch: localFetch,
}
};
export default function Dashboard({ accessToken }: DashboardProps) {
fetch.setAccessToken(accessToken);
const { user } = useAuthenticatedUser();
@ -38,7 +51,7 @@ export default function Dashboard({ accessToken }: DashboardProps) {
updateNotebook,
saveNotebook,
removeNotebook,
} = useNotebooks();
} = useNotebooks(cogneeInstances.localCognee);
useEffect(() => {
if (!notebooks.length) {

View file

@ -134,7 +134,7 @@ export default function DatasetsAccordion({
} = useBoolean(false);
const [datasetToRemove, setDatasetToRemove] = useState<Dataset | null>(null);
const handleDatasetRemove = (dataset: Dataset) => {
setDatasetToRemove(dataset);
openRemoveDatasetModal();

View file

@ -3,6 +3,7 @@ import { useCallback, useEffect } from "react";
import { fetch, isCloudEnvironment, useBoolean } from "@/utils";
import { checkCloudConnection } from "@/modules/cloud";
import { setApiKey } from "@/modules/instances/cloudFetch";
import { CaretIcon, CloseIcon, CloudIcon, LocalCogneeIcon } from "@/ui/Icons";
import { CTAButton, GhostButton, IconButton, Input, Modal } from "@/ui/elements";
@ -24,6 +25,7 @@ export default function InstanceDatasetsAccordion({ onDatasetsChange }: Instance
const checkConnectionToCloudCognee = useCallback((apiKey?: string) => {
if (apiKey) {
fetch.setApiKey(apiKey);
setApiKey(apiKey);
}
return checkCloudConnection()
.then(setCloudCogneeConnected)

View file

@ -45,7 +45,7 @@ export default function Plan() {
<div className="bg-white rounded-xl px-5 py-5 mb-2">
Affordable and transparent pricing
</div>
<div className="grid grid-cols-3 gap-x-2.5">
<div className="pt-13 py-4 px-5 mb-2.5 rounded-tl-xl rounded-tr-xl bg-white h-full">
<div>Basic</div>

View file

@ -40,7 +40,7 @@ export default function useChat(dataset: Dataset) {
setTrue: disableSearchRun,
setFalse: enableSearchRun,
} = useBoolean(false);
const refreshChat = useCallback(async () => {
const data = await fetchMessages();
return setMessages(data);

View file

@ -46,7 +46,7 @@ function useDatasets(useCloud = false) {
// checkDatasetStatuses(datasets);
// }, 50000);
// }, [fetchDatasetStatuses]);
// useEffect(() => {
// return () => {
// if (statusTimeout.current !== null) {
@ -95,6 +95,7 @@ function useDatasets(useCloud = false) {
})
.catch((error) => {
console.error('Error fetching datasets:', error);
throw error;
});
}, [useCloud]);

View file

@ -0,0 +1,59 @@
import handleServerErrors from "@/utils/handleServerErrors";
// let numberOfRetries = 0;
const cloudApiUrl = process.env.NEXT_PUBLIC_CLOUD_API_URL || "http://localhost:8001";
let apiKey: string | null = process.env.NEXT_PUBLIC_COGWIT_API_KEY || null;
export function setApiKey(newApiKey: string) {
apiKey = newApiKey;
};
export default async function cloudFetch(url: URL | RequestInfo, options: RequestInit = {}): Promise<Response> {
// function retry(lastError: Response) {
// if (numberOfRetries >= 1) {
// return Promise.reject(lastError);
// }
// numberOfRetries += 1;
// return global.fetch("/auth/token")
// .then(() => {
// return fetch(url, options);
// });
// }
const authHeaders = {
"Authorization": `X-Api-Key ${apiKey}`,
};
return global.fetch(
cloudApiUrl + "/api" + (typeof url === "string" ? url : url.toString()).replace("/v1", ""),
{
...options,
headers: {
...options.headers,
...authHeaders,
} as HeadersInit,
credentials: "include",
},
)
.then((response) => handleServerErrors(response, null, true))
.catch((error) => {
if (error.message === "NEXT_REDIRECT") {
throw error;
}
if (error.detail === undefined) {
return Promise.reject(
new Error("No connection to the server.")
);
}
return Promise.reject(error);
});
// .finally(() => {
// numberOfRetries = 0;
// });
}

View file

@ -0,0 +1,27 @@
import handleServerErrors from "@/utils/handleServerErrors";
const localApiUrl = process.env.NEXT_PUBLIC_LOCAL_API_URL || "http://localhost:8000";
export default async function localFetch(url: URL | RequestInfo, options: RequestInit = {}): Promise<Response> {
return global.fetch(
localApiUrl + "/api" + (typeof url === "string" ? url : url.toString()),
{
...options,
credentials: "include",
},
)
.then((response) => handleServerErrors(response, null, false))
.catch((error) => {
if (error.message === "NEXT_REDIRECT") {
throw error;
}
if (error.detail === undefined) {
return Promise.reject(
new Error("No connection to the server.")
);
}
return Promise.reject(error);
});
}

View file

@ -0,0 +1,4 @@
export interface CogneeInstance {
name: string;
fetch: typeof global.fetch;
}

View file

@ -0,0 +1,13 @@
import { CogneeInstance } from "@/modules/instances/types";
export default function createNotebook(notebookName: string, instance: CogneeInstance) {
return instance.fetch("/v1/notebooks/", {
body: JSON.stringify({ name: notebookName }),
method: "POST",
headers: {
"Content-Type": "application/json",
},
}).then((response: Response) =>
response.ok ? response.json() : Promise.reject(response)
);
}

View file

@ -0,0 +1,7 @@
import { CogneeInstance } from "@/modules/instances/types";
export default function deleteNotebook(notebookId: string, instance: CogneeInstance) {
return instance.fetch(`/v1/notebooks/${notebookId}`, {
method: "DELETE",
});
}

View file

@ -0,0 +1,12 @@
import { CogneeInstance } from "@/modules/instances/types";
export default function getNotebooks(instance: CogneeInstance) {
return instance.fetch("/v1/notebooks/", {
method: "GET",
headers: {
"Content-Type": "application/json",
},
}).then((response: Response) =>
response.ok ? response.json() : Promise.reject(response)
);
}

View file

@ -0,0 +1,14 @@
import { Cell } from "@/ui/elements/Notebook/types";
import { CogneeInstance } from "@/modules/instances/types";
export default function runNotebookCell(notebookId: string, cell: Cell, instance: CogneeInstance) {
return instance.fetch(`/v1/notebooks/${notebookId}/${cell.id}/run`, {
body: JSON.stringify({
content: cell.content,
}),
method: "POST",
headers: {
"Content-Type": "application/json",
},
}).then((response: Response) => response.json());
}

View file

@ -0,0 +1,13 @@
import { CogneeInstance } from "@/modules/instances/types";
export default function saveNotebook(notebookId: string, notebookData: object, instance: CogneeInstance) {
return instance.fetch(`/v1/notebooks/${notebookId}`, {
body: JSON.stringify(notebookData),
method: "PUT",
headers: {
"Content-Type": "application/json",
},
}).then((response: Response) =>
response.ok ? response.json() : Promise.reject(response)
);
}

View file

@ -1,20 +1,18 @@
import { useCallback, useState } from "react";
import { fetch, isCloudEnvironment } from "@/utils";
import { Cell, Notebook } from "@/ui/elements/Notebook/types";
import { CogneeInstance } from "@/modules/instances/types";
import createNotebook from "./createNotebook";
import deleteNotebook from "./deleteNotebook";
import getNotebooks from "./getNotebooks";
import runNotebookCell from "./runNotebookCell";
import { default as persistNotebook } from "./saveNotebook";
function useNotebooks() {
function useNotebooks(instance: CogneeInstance) {
const [notebooks, setNotebooks] = useState<Notebook[]>([]);
const addNotebook = useCallback((notebookName: string) => {
return fetch("/v1/notebooks", {
body: JSON.stringify({ name: notebookName }),
method: "POST",
headers: {
"Content-Type": "application/json",
},
}, isCloudEnvironment())
.then((response) => response.json())
.then((notebook) => {
return createNotebook(notebookName, instance)
.then((notebook: Notebook) => {
setNotebooks((notebooks) => [
...notebooks,
notebook,
@ -22,36 +20,29 @@ function useNotebooks() {
return notebook;
});
}, []);
}, [instance]);
const removeNotebook = useCallback((notebookId: string) => {
return fetch(`/v1/notebooks/${notebookId}`, {
method: "DELETE",
}, isCloudEnvironment())
return deleteNotebook(notebookId, instance)
.then(() => {
setNotebooks((notebooks) =>
notebooks.filter((notebook) => notebook.id !== notebookId)
);
});
}, []);
}, [instance]);
const fetchNotebooks = useCallback(() => {
return fetch("/v1/notebooks", {
headers: {
"Content-Type": "application/json",
},
}, isCloudEnvironment())
.then((response) => response.json())
return getNotebooks(instance)
.then((notebooks) => {
setNotebooks(notebooks);
return notebooks;
})
.catch((error) => {
console.error("Error fetching notebooks:", error);
console.error("Error fetching notebooks:", error.detail);
throw error
});
}, []);
}, [instance]);
const updateNotebook = useCallback((updatedNotebook: Notebook) => {
setNotebooks((existingNotebooks) =>
@ -64,20 +55,13 @@ function useNotebooks() {
}, []);
const saveNotebook = useCallback((notebook: Notebook) => {
return fetch(`/v1/notebooks/${notebook.id}`, {
body: JSON.stringify({
name: notebook.name,
cells: notebook.cells,
}),
method: "PUT",
headers: {
"Content-Type": "application/json",
},
}, isCloudEnvironment())
.then((response) => response.json())
}, []);
return persistNotebook(notebook.id, {
name: notebook.name,
cells: notebook.cells,
}, instance);
}, [instance]);
const runCell = useCallback((notebook: Notebook, cell: Cell, cogneeInstance: string) => {
const runCell = useCallback((notebook: Notebook, cell: Cell) => {
setNotebooks((existingNotebooks) =>
existingNotebooks.map((existingNotebook) =>
existingNotebook.id === notebook.id ? {
@ -89,20 +73,11 @@ function useNotebooks() {
error: undefined,
} : existingCell
),
} : notebook
} : existingNotebook
)
);
return fetch(`/v1/notebooks/${notebook.id}/${cell.id}/run`, {
body: JSON.stringify({
content: cell.content,
}),
method: "POST",
headers: {
"Content-Type": "application/json",
},
}, cogneeInstance === "cloud")
.then((response) => response.json())
return runNotebookCell(notebook.id, cell, instance)
.then((response) => {
setNotebooks((existingNotebooks) =>
existingNotebooks.map((existingNotebook) =>
@ -115,11 +90,11 @@ function useNotebooks() {
error: response.error,
} : existingCell
),
} : notebook
} : existingNotebook
)
);
});
}, []);
}, [instance]);
return {
notebooks,

View file

@ -7,4 +7,4 @@ export default function GitHubIcon({ width = 24, height = 24, color = 'currentCo
</g>
</svg>
);
}
}

View file

@ -46,7 +46,7 @@ export default function Header({ user }: HeaderProps) {
checkMCPConnection();
const interval = setInterval(checkMCPConnection, 30000);
return () => clearInterval(interval);
}, [setMCPConnected, setMCPDisconnected]);

View file

@ -90,7 +90,7 @@ export default function SearchView() {
scrollToBottom();
setSearchInputValue("");
// Pass topK to sendMessage
sendMessage(chatInput, searchType, topK)
.then(scrollToBottom)
@ -171,4 +171,4 @@ export default function SearchView() {
</form>
</div>
);
}
}

View file

@ -1,3 +1,2 @@
export { default as Modal } from "./Modal";
export { default as useModal } from "./useModal";

View file

@ -0,0 +1,76 @@
import { memo } from "react";
import ReactMarkdown from "react-markdown";
interface MarkdownPreviewProps {
content: string;
className?: string;
}
function MarkdownPreview({ content, className = "" }: MarkdownPreviewProps) {
return (
<div className={`min-h-24 max-h-96 overflow-y-auto p-4 prose prose-sm max-w-none ${className}`}>
<ReactMarkdown
components={{
h1: ({ children }) => <h1 className="text-2xl font-bold mt-4 mb-2">{children}</h1>,
h2: ({ children }) => <h2 className="text-xl font-bold mt-3 mb-2">{children}</h2>,
h3: ({ children }) => <h3 className="text-lg font-bold mt-3 mb-2">{children}</h3>,
h4: ({ children }) => <h4 className="text-base font-bold mt-2 mb-1">{children}</h4>,
h5: ({ children }) => <h5 className="text-sm font-bold mt-2 mb-1">{children}</h5>,
h6: ({ children }) => <h6 className="text-xs font-bold mt-2 mb-1">{children}</h6>,
p: ({ children }) => <p className="mb-2">{children}</p>,
ul: ({ children }) => <ul className="list-disc list-inside mb-2 ml-4">{children}</ul>,
ol: ({ children }) => <ol className="list-decimal list-inside mb-2 ml-4">{children}</ol>,
li: ({ children }) => <li className="mb-1">{children}</li>,
blockquote: ({ children }) => (
<blockquote className="border-l-4 border-gray-300 pl-4 italic my-2">{children}</blockquote>
),
code: ({ className, children, ...props }) => {
const isInline = !className;
return isInline ? (
<code className="bg-gray-100 px-1 py-0.5 rounded text-sm font-mono" {...props}>
{children}
</code>
) : (
<code className="block bg-gray-100 p-2 rounded text-sm font-mono overflow-x-auto" {...props}>
{children}
</code>
);
},
pre: ({ children }) => (
<pre className="bg-gray-100 p-2 rounded text-sm font-mono overflow-x-auto mb-2">
{children}
</pre>
),
a: ({ href, children }) => (
<a href={href} className="text-blue-600 hover:underline" target="_blank" rel="noopener noreferrer">
{children}
</a>
),
strong: ({ children }) => <strong className="font-bold">{children}</strong>,
em: ({ children }) => <em className="italic">{children}</em>,
hr: () => <hr className="my-4 border-gray-300" />,
table: ({ children }) => (
<div className="overflow-x-auto my-2">
<table className="min-w-full border border-gray-300">{children}</table>
</div>
),
thead: ({ children }) => <thead className="bg-gray-100">{children}</thead>,
tbody: ({ children }) => <tbody>{children}</tbody>,
tr: ({ children }) => <tr className="border-b border-gray-300">{children}</tr>,
th: ({ children }) => (
<th className="border border-gray-300 px-4 py-2 text-left font-bold">
{children}
</th>
),
td: ({ children }) => (
<td className="border border-gray-300 px-4 py-2">{children}</td>
),
}}
>
{content}
</ReactMarkdown>
</div>
);
}
export default memo(MarkdownPreview);

View file

@ -2,15 +2,17 @@
import { v4 as uuid4 } from "uuid";
import classNames from "classnames";
import { Fragment, MouseEvent, RefObject, useCallback, useEffect, useRef, useState } from "react";
import { Fragment, MouseEvent, MutableRefObject, useCallback, useEffect, useRef, useState, memo } from "react";
import { useModal } from "@/ui/elements/Modal";
import { CaretIcon, CloseIcon, PlusIcon } from "@/ui/Icons";
import { IconButton, PopupMenu, TextArea, Modal, GhostButton, CTAButton } from "@/ui/elements";
import PopupMenu from "@/ui/elements/PopupMenu";
import { IconButton, TextArea, Modal, GhostButton, CTAButton } from "@/ui/elements";
import { GraphControlsAPI } from "@/app/(graph)/GraphControls";
import GraphVisualization, { GraphVisualizationAPI } from "@/app/(graph)/GraphVisualization";
import NotebookCellHeader from "./NotebookCellHeader";
import MarkdownPreview from "./MarkdownPreview";
import { Cell, Notebook as NotebookType } from "./types";
interface NotebookProps {
@ -19,7 +21,186 @@ interface NotebookProps {
updateNotebook: (updatedNotebook: NotebookType) => void;
}
interface NotebookCellProps {
cell: Cell;
index: number;
isOpen: boolean;
isMarkdownEditMode: boolean;
onToggleOpen: () => void;
onToggleMarkdownEdit: () => void;
onContentChange: (value: string) => void;
onCellRun: (cell: Cell, cogneeInstance: string) => Promise<void>;
onCellRename: (cell: Cell) => void;
onCellRemove: (cell: Cell) => void;
onCellUp: (cell: Cell) => void;
onCellDown: (cell: Cell) => void;
onCellAdd: (afterCellIndex: number, cellType: "markdown" | "code") => void;
}
const NotebookCell = memo(function NotebookCell({
cell,
index,
isOpen,
isMarkdownEditMode,
onToggleOpen,
onToggleMarkdownEdit,
onContentChange,
onCellRun,
onCellRename,
onCellRemove,
onCellUp,
onCellDown,
onCellAdd,
}: NotebookCellProps) {
return (
<Fragment>
<div className="flex flex-row rounded-xl border-1 border-gray-100">
<div className="flex flex-col flex-1 relative">
{cell.type === "code" ? (
<>
<div className="absolute left-[-1.35rem] top-2.5">
<IconButton className="p-[0.25rem] m-[-0.25rem]" onClick={onToggleOpen}>
<CaretIcon className={classNames("transition-transform", isOpen ? "rotate-0" : "rotate-180")} />
</IconButton>
</div>
<NotebookCellHeader
cell={cell}
runCell={onCellRun}
renameCell={onCellRename}
removeCell={onCellRemove}
moveCellUp={onCellUp}
moveCellDown={onCellDown}
className="rounded-tl-xl rounded-tr-xl"
/>
{isOpen && (
<>
<TextArea
value={cell.content}
onChange={onContentChange}
isAutoExpanding
name="cellInput"
placeholder="Type your code here..."
className="resize-none min-h-36 max-h-96 overflow-y-auto rounded-tl-none rounded-tr-none rounded-bl-xl rounded-br-xl border-0 !outline-0"
/>
<div className="flex flex-col bg-gray-100 overflow-x-auto max-w-full">
{cell.result && (
<div className="px-2 py-2">
output: <CellResult content={cell.result} />
</div>
)}
{!!cell.error?.length && (
<div className="px-2 py-2">
error: {cell.error}
</div>
)}
</div>
</>
)}
</>
) : (
<>
<div className="absolute left-[-1.35rem] top-2.5">
<IconButton className="p-[0.25rem] m-[-0.25rem]" onClick={onToggleOpen}>
<CaretIcon className={classNames("transition-transform", isOpen ? "rotate-0" : "rotate-180")} />
</IconButton>
</div>
<NotebookCellHeader
cell={cell}
renameCell={onCellRename}
removeCell={onCellRemove}
moveCellUp={onCellUp}
moveCellDown={onCellDown}
className="rounded-tl-xl rounded-tr-xl"
/>
{isOpen && (
<div className="relative rounded-tl-none rounded-tr-none rounded-bl-xl rounded-br-xl border-0 overflow-hidden">
<GhostButton
onClick={onToggleMarkdownEdit}
className="absolute top-2 right-2.5 text-xs leading-[1] !px-2 !py-1 !h-auto"
>
{isMarkdownEditMode ? "Preview" : "Edit"}
</GhostButton>
{isMarkdownEditMode ? (
<TextArea
value={cell.content}
onChange={onContentChange}
isAutoExpanding
name="markdownInput"
placeholder="Type your markdown here..."
className="resize-none min-h-24 max-h-96 overflow-y-auto rounded-tl-none rounded-tr-none rounded-bl-xl rounded-br-xl border-0 !outline-0 !bg-gray-50"
/>
) : (
<MarkdownPreview content={cell.content} className="!bg-gray-50" />
)}
</div>
)}
</>
)}
</div>
</div>
<div className="ml-[-1.35rem]">
<PopupMenu
openToRight={true}
triggerElement={<PlusIcon />}
triggerClassName="p-[0.25rem] m-[-0.25rem]"
>
<div className="flex flex-col gap-0.5">
<button
onClick={() => onCellAdd(index, "markdown")}
className="hover:bg-gray-100 w-full text-left px-2 cursor-pointer"
>
<span>text</span>
</button>
</div>
<div
onClick={() => onCellAdd(index, "code")}
className="hover:bg-gray-100 w-full text-left px-2 cursor-pointer"
>
<span>code</span>
</div>
</PopupMenu>
</div>
</Fragment>
);
});
export default function Notebook({ notebook, updateNotebook, runCell }: NotebookProps) {
const [openCells, setOpenCells] = useState(new Set(notebook.cells.map((c: Cell) => c.id)));
const [markdownEditMode, setMarkdownEditMode] = useState<Set<string>>(new Set());
const toggleCellOpen = useCallback((id: string) => {
setOpenCells((prev) => {
const newState = new Set(prev);
if (newState.has(id)) {
newState.delete(id)
} else {
newState.add(id);
}
return newState;
});
}, []);
const toggleMarkdownEditMode = useCallback((id: string) => {
setMarkdownEditMode((prev) => {
const newState = new Set(prev);
if (newState.has(id)) {
newState.delete(id);
} else {
newState.add(id);
}
return newState;
});
}, []);
useEffect(() => {
if (notebook.cells.length === 0) {
const newCell: Cell = {
@ -34,7 +215,7 @@ export default function Notebook({ notebook, updateNotebook, runCell }: Notebook
});
toggleCellOpen(newCell.id)
}
}, [notebook, updateNotebook]);
}, [notebook, updateNotebook, toggleCellOpen]);
const handleCellRun = useCallback((cell: Cell, cogneeInstance: string) => {
return runCell(notebook, cell, cogneeInstance);
@ -43,7 +224,7 @@ export default function Notebook({ notebook, updateNotebook, runCell }: Notebook
const handleCellAdd = useCallback((afterCellIndex: number, cellType: "markdown" | "code") => {
const newCell: Cell = {
id: uuid4(),
name: "new cell",
name: cellType === "markdown" ? "Markdown Cell" : "Code Cell",
type: cellType,
content: "",
};
@ -59,7 +240,7 @@ export default function Notebook({ notebook, updateNotebook, runCell }: Notebook
toggleCellOpen(newCell.id);
updateNotebook(newNotebook);
}, [notebook, updateNotebook]);
}, [notebook, updateNotebook, toggleCellOpen]);
const removeCell = useCallback((cell: Cell, event?: MouseEvent) => {
event?.preventDefault();
@ -81,14 +262,12 @@ export default function Notebook({ notebook, updateNotebook, runCell }: Notebook
openCellRemoveConfirmModal(cell);
}, [openCellRemoveConfirmModal]);
const handleCellInputChange = useCallback((notebook: NotebookType, cell: Cell, value: string) => {
const newCell = {...cell, content: value };
const handleCellInputChange = useCallback((cellId: string, value: string) => {
updateNotebook({
...notebook,
cells: notebook.cells.map((cell: Cell) => (cell.id === newCell.id ? newCell : cell)),
cells: notebook.cells.map((cell: Cell) => (cell.id === cellId ? {...cell, content: value} : cell)),
});
}, [updateNotebook]);
}, [notebook, updateNotebook]);
const handleCellUp = useCallback((cell: Cell) => {
const index = notebook.cells.indexOf(cell);
@ -131,133 +310,28 @@ export default function Notebook({ notebook, updateNotebook, runCell }: Notebook
}
}, [notebook, updateNotebook]);
const [openCells, setOpenCells] = useState(new Set(notebook.cells.map((c: Cell) => c.id)));
const toggleCellOpen = (id: string) => {
setOpenCells((prev) => {
const newState = new Set(prev);
if (newState.has(id)) {
newState.delete(id)
} else {
newState.add(id);
}
return newState;
});
};
return (
<>
<div className="bg-white rounded-xl flex flex-col gap-0.5 px-7 py-5 flex-1">
<div className="mb-5">{notebook.name}</div>
{notebook.cells.map((cell: Cell, index) => (
<Fragment key={cell.id}>
<div key={cell.id} className="flex flex-row rounded-xl border-1 border-gray-100">
<div className="flex flex-col flex-1 relative">
{cell.type === "code" ? (
<>
<div className="absolute left-[-1.35rem] top-2.5">
<IconButton className="p-[0.25rem] m-[-0.25rem]" onClick={toggleCellOpen.bind(null, cell.id)}>
<CaretIcon className={classNames("transition-transform", openCells.has(cell.id) ? "rotate-0" : "rotate-180")} />
</IconButton>
</div>
<NotebookCellHeader
cell={cell}
runCell={handleCellRun}
renameCell={handleCellRename}
removeCell={handleCellRemove}
moveCellUp={handleCellUp}
moveCellDown={handleCellDown}
className="rounded-tl-xl rounded-tr-xl"
/>
{openCells.has(cell.id) && (
<>
<TextArea
value={cell.content}
onChange={handleCellInputChange.bind(null, notebook, cell)}
// onKeyUp={handleCellRunOnEnter}
isAutoExpanding
name="cellInput"
placeholder="Type your code here..."
contentEditable={true}
className="resize-none min-h-36 max-h-96 overflow-y-auto rounded-tl-none rounded-tr-none rounded-bl-xl rounded-br-xl border-0 !outline-0"
/>
<div className="flex flex-col bg-gray-100 overflow-x-auto max-w-full">
{cell.result && (
<div className="px-2 py-2">
output: <CellResult content={cell.result} />
</div>
)}
{!!cell.error?.length && (
<div className="px-2 py-2">
error: {cell.error}
</div>
)}
</div>
</>
)}
</>
) : (
<>
<div className="absolute left-[-1.35rem] top-2.5">
<IconButton className="p-[0.25rem] m-[-0.25rem]" onClick={toggleCellOpen.bind(null, cell.id)}>
<CaretIcon className={classNames("transition-transform", openCells.has(cell.id) ? "rotate-0" : "rotate-180")} />
</IconButton>
</div>
<NotebookCellHeader
cell={cell}
renameCell={handleCellRename}
removeCell={handleCellRemove}
moveCellUp={handleCellUp}
moveCellDown={handleCellDown}
className="rounded-tl-xl rounded-tr-xl"
/>
{openCells.has(cell.id) && (
<TextArea
value={cell.content}
onChange={handleCellInputChange.bind(null, notebook, cell)}
// onKeyUp={handleCellRunOnEnter}
isAutoExpanding
name="cellInput"
placeholder="Type your text here..."
contentEditable={true}
className="resize-none min-h-24 max-h-96 overflow-y-auto rounded-tl-none rounded-tr-none rounded-bl-xl rounded-br-xl border-0 !outline-0"
/>
)}
</>
)}
</div>
</div>
<div className="ml-[-1.35rem]">
<PopupMenu
openToRight={true}
triggerElement={<PlusIcon />}
triggerClassName="p-[0.25rem] m-[-0.25rem]"
>
<div className="flex flex-col gap-0.5">
<button
onClick={() => handleCellAdd(index, "markdown")}
className="hover:bg-gray-100 w-full text-left px-2 cursor-pointer"
>
<span>text</span>
</button>
</div>
<div
onClick={() => handleCellAdd(index, "code")}
className="hover:bg-gray-100 w-full text-left px-2 cursor-pointer"
>
<span>code</span>
</div>
</PopupMenu>
</div>
</Fragment>
<NotebookCell
key={cell.id}
cell={cell}
index={index}
isOpen={openCells.has(cell.id)}
isMarkdownEditMode={markdownEditMode.has(cell.id)}
onToggleOpen={() => toggleCellOpen(cell.id)}
onToggleMarkdownEdit={() => toggleMarkdownEditMode(cell.id)}
onContentChange={(value) => handleCellInputChange(cell.id, value)}
onCellRun={handleCellRun}
onCellRename={handleCellRename}
onCellRemove={handleCellRemove}
onCellUp={handleCellUp}
onCellDown={handleCellDown}
onCellAdd={handleCellAdd}
/>
))}
</div>
@ -288,6 +362,10 @@ function CellResult({ content }: { content: [] }) {
getSelectedNode: () => null,
});
if (content.length === 0) {
return <span>OK</span>;
}
for (const line of content) {
try {
if (Array.isArray(line)) {
@ -298,7 +376,7 @@ function CellResult({ content }: { content: [] }) {
<span className="text-sm pl-2 mb-4">reasoning graph</span>
<GraphVisualization
data={transformInsightsGraphData(line)}
ref={graphRef as RefObject<GraphVisualizationAPI>}
ref={graphRef as MutableRefObject<GraphVisualizationAPI>}
graphControls={graphControls}
className="min-h-80"
/>
@ -346,7 +424,7 @@ function CellResult({ content }: { content: [] }) {
<span className="text-sm pl-2 mb-4">reasoning graph (datasets: {datasetName})</span>
<GraphVisualization
data={transformToVisualizationData(graph)}
ref={graphRef as RefObject<GraphVisualizationAPI>}
ref={graphRef as MutableRefObject<GraphVisualizationAPI>}
graphControls={graphControls}
className="min-h-80"
/>
@ -356,8 +434,7 @@ function CellResult({ content }: { content: [] }) {
}
}
}
if (typeof(line) === "object" && line["result"] && typeof(line["result"]) === "string") {
else if (typeof(line) === "object" && line["result"] && typeof(line["result"]) === "string") {
const datasets = Array.from(
// eslint-disable-next-line @typescript-eslint/no-explicit-any
new Set(Object.values(line["datasets"]).map((dataset: any) => dataset.name))
@ -369,39 +446,46 @@ function CellResult({ content }: { content: [] }) {
<span className="block px-2 py-2 whitespace-normal">{line["result"]}</span>
</div>
);
}
if (typeof(line) === "object" && line["graphs"]) {
Object.entries<{ nodes: []; edges: []; }>(line["graphs"]).forEach(([datasetName, graph]) => {
parsedContent.push(
<div key={datasetName} className="w-full h-full bg-white">
<span className="text-sm pl-2 mb-4">reasoning graph (datasets: {datasetName})</span>
<GraphVisualization
data={transformToVisualizationData(graph)}
ref={graphRef as RefObject<GraphVisualizationAPI>}
graphControls={graphControls}
className="min-h-80"
/>
</div>
);
});
}
if (typeof(line) === "object" && line["result"] && typeof(line["result"]) === "object") {
if (line["graphs"]) {
Object.entries<{ nodes: []; edges: []; }>(line["graphs"]).forEach(([datasetName, graph]) => {
parsedContent.push(
<div key={datasetName} className="w-full h-full bg-white">
<span className="text-sm pl-2 mb-4">reasoning graph (datasets: {datasetName})</span>
<GraphVisualization
data={transformToVisualizationData(graph)}
ref={graphRef as MutableRefObject<GraphVisualizationAPI>}
graphControls={graphControls}
className="min-h-80"
/>
</div>
);
});
}
}
else if (typeof(line) === "object" && line["result"] && typeof(line["result"]) === "object") {
parsedContent.push(
<pre className="px-2 w-full h-full bg-white text-sm" key={String(line).slice(0, -10)}>
{JSON.stringify(line["result"], null, 2)}
</pre>
)
}
if (typeof(line) === "string") {
else if (typeof(line) === "object") {
parsedContent.push(
<pre className="px-2 w-full h-full bg-white text-sm" key={String(line).slice(0, -10)}>
{JSON.stringify(line, null, 2)}
</pre>
)
}
else if (typeof(line) === "string") {
parsedContent.push(
<pre className="px-2 w-full h-full bg-white text-sm whitespace-normal" key={String(line).slice(0, -10)}>
{line}
</pre>
)
}
} catch (error) {
console.error(error);
} catch {
// It is fine if we don't manage to parse the output line, we show it as it is.
parsedContent.push(
<pre className="px-2 w-full h-full bg-white text-sm whitespace-normal" key={String(line).slice(0, -10)}>
{line}
@ -415,7 +499,6 @@ function CellResult({ content }: { content: [] }) {
{item}
</div>
));
};
function transformToVisualizationData(graph: { nodes: [], edges: [] }) {
@ -451,7 +534,7 @@ function transformInsightsGraphData(triplets: Triplet[]) {
target: string,
label: string,
}
} = {};
} = {};
for (const triplet of triplets) {
nodes[triplet[0].id] = {
@ -471,7 +554,7 @@ function transformInsightsGraphData(triplets: Triplet[]) {
label: triplet[1]["relationship_name"],
};
}
return {
nodes: Object.values(nodes),
links: Object.values(links),

View file

@ -1,9 +1,12 @@
"use client";
import { useState } from "react";
import classNames from "classnames";
import { isCloudEnvironment, useBoolean } from "@/utils";
import { PlayIcon } from "@/ui/Icons";
import { PopupMenu, IconButton } from "@/ui/elements";
import PopupMenu from "@/ui/elements/PopupMenu";
import { IconButton } from "@/ui/elements";
import { LoadingIndicator } from "@/ui/App";
import { Cell } from "./types";
@ -39,7 +42,7 @@ export default function NotebookCellHeader({
if (runCell) {
setIsRunningCell();
runCell(cell, runInstance)
.then(() => {
.finally(() => {
setIsNotRunningCell();
});
}
@ -53,7 +56,7 @@ export default function NotebookCellHeader({
{isRunningCell ? <LoadingIndicator /> : <IconButton onClick={handleCellRun}><PlayIcon /></IconButton>}
</>
)}
<span className="ml-4">{cell.name}</span>
<span className="ml-4">{cell.type === "markdown" ? "Markdown Cell" : cell.name}</span>
</div>
<div className="pr-4 flex flex-row items-center gap-8">
{runCell && (

View file

@ -1,12 +1,12 @@
"use client";
import classNames from "classnames";
import { InputHTMLAttributes, useCallback, useEffect, useLayoutEffect, useRef } from "react"
import { InputHTMLAttributes, useCallback, useEffect, useRef } from "react"
interface TextAreaProps extends Omit<InputHTMLAttributes<HTMLTextAreaElement>, "onChange"> {
isAutoExpanding?: boolean; // Set to true to enable auto-expanding text area behavior. Default is false.
value: string;
onChange: (value: string) => void;
value?: string;
onChange?: (value: string) => void;
}
export default function TextArea({
@ -19,95 +19,81 @@ export default function TextArea({
placeholder = "",
onKeyUp,
...props
}: TextAreaProps) {
const handleTextChange = useCallback((event: Event) => {
const fakeTextAreaElement = event.target as HTMLDivElement;
const newValue = fakeTextAreaElement.innerText;
}: TextAreaProps) {
const textareaRef = useRef<HTMLTextAreaElement>(null);
const maxHeightRef = useRef<number | null>(null);
const throttleTimeoutRef = useRef<number | null>(null);
const lastAdjustTimeRef = useRef<number>(0);
const THROTTLE_MS = 250; // 4 calculations per second
const adjustHeight = useCallback(() => {
if (!isAutoExpanding || !textareaRef.current) return;
const textarea = textareaRef.current;
// Cache maxHeight on first calculation
if (maxHeightRef.current === null) {
const computedStyle = getComputedStyle(textarea);
maxHeightRef.current = computedStyle.maxHeight === "none"
? Infinity
: parseInt(computedStyle.maxHeight) || Infinity;
}
// Reset height to auto to get the correct scrollHeight
textarea.style.height = "auto";
// Set height to scrollHeight, but respect max-height
const scrollHeight = textarea.scrollHeight;
textarea.style.height = `${Math.min(scrollHeight, maxHeightRef.current)}px`;
lastAdjustTimeRef.current = Date.now();
}, [isAutoExpanding]);
const handleChange = useCallback((event: React.ChangeEvent<HTMLTextAreaElement>) => {
const newValue = event.target.value;
onChange?.(newValue);
}, [onChange]);
const handleKeyUp = useCallback((event: Event) => {
if (onKeyUp) {
onKeyUp(event as unknown as React.KeyboardEvent<HTMLTextAreaElement>);
}
}, [onKeyUp]);
// Throttle height adjustments to avoid blocking typing
if (isAutoExpanding) {
const now = Date.now();
const timeSinceLastAdjust = now - lastAdjustTimeRef.current;
const handleTextAreaFocus = (event: React.FocusEvent<HTMLDivElement>) => {
if (event.target.innerText.trim() === placeholder) {
event.target.innerText = "";
}
};
const handleTextAreaBlur = (event: React.FocusEvent<HTMLDivElement>) => {
if (value === "") {
event.target.innerText = placeholder;
}
};
const handleChange = (event: React.ChangeEvent<HTMLTextAreaElement>) => {
onChange(event.target.value);
};
const fakeTextAreaRef = useRef<HTMLDivElement>(null);
useLayoutEffect(() => {
const fakeTextAreaElement = fakeTextAreaRef.current;
if (fakeTextAreaElement && fakeTextAreaElement.innerText.trim() !== "") {
fakeTextAreaElement.innerText = placeholder;
}
}, [placeholder]);
useLayoutEffect(() => {
const fakeTextAreaElement = fakeTextAreaRef.current;
if (fakeTextAreaElement) {
fakeTextAreaElement.addEventListener("input", handleTextChange);
fakeTextAreaElement.addEventListener("keyup", handleKeyUp);
}
return () => {
if (fakeTextAreaElement) {
fakeTextAreaElement.removeEventListener("input", handleTextChange);
fakeTextAreaElement.removeEventListener("keyup", handleKeyUp);
if (timeSinceLastAdjust >= THROTTLE_MS) {
adjustHeight();
} else {
if (throttleTimeoutRef.current !== null) {
clearTimeout(throttleTimeoutRef.current);
}
throttleTimeoutRef.current = window.setTimeout(() => {
adjustHeight();
throttleTimeoutRef.current = null;
}, THROTTLE_MS - timeSinceLastAdjust);
}
};
}, [handleKeyUp, handleTextChange]);
}
}, [onChange, isAutoExpanding, adjustHeight]);
useEffect(() => {
const fakeTextAreaElement = fakeTextAreaRef.current;
const textAreaText = fakeTextAreaElement?.innerText;
if (fakeTextAreaElement && (value === "" || value === "\n")) {
fakeTextAreaElement.innerText = placeholder;
return;
if (isAutoExpanding && textareaRef.current) {
adjustHeight();
}
}, [value, isAutoExpanding, adjustHeight]);
if (fakeTextAreaElement && textAreaText !== value) {
fakeTextAreaElement.innerText = value;
}
}, [placeholder, value]);
useEffect(() => {
return () => {
if (throttleTimeoutRef.current !== null) {
clearTimeout(throttleTimeoutRef.current);
}
};
}, []);
return isAutoExpanding ? (
<>
<div
ref={fakeTextAreaRef}
contentEditable="true"
role="textbox"
aria-multiline="true"
className={classNames("block w-full rounded-md bg-white px-4 py-4 text-base text-gray-900 outline-1 -outline-offset-1 outline-gray-300 placeholder:text-gray-400 focus:outline-2 focus:-outline-offset-2 focus:outline-indigo-600", className)}
onFocus={handleTextAreaFocus}
onBlur={handleTextAreaBlur}
/>
</>
) : (
return (
<textarea
ref={isAutoExpanding ? textareaRef : undefined}
name={name}
style={style}
value={value}
placeholder={placeholder}
className={classNames("block w-full rounded-md bg-white px-4 py-4 text-base text-gray-900 outline-1 -outline-offset-1 outline-gray-300 placeholder:text-gray-400 focus:outline-2 focus:-outline-offset-2 focus:outline-indigo-600", className)}
onChange={handleChange}
onKeyUp={onKeyUp}
{...props}
/>
)

View file

@ -10,4 +10,4 @@ export { default as NeutralButton } from "./NeutralButton";
export { default as StatusIndicator } from "./StatusIndicator";
export { default as StatusDot } from "./StatusDot";
export { default as Accordion } from "./Accordion";
export { default as Notebook } from "./Notebook";
export { default as Notebook } from "./Notebook";

View file

@ -57,7 +57,7 @@ export default async function fetch(url: string, options: RequestInit = {}, useC
new Error("Backend server is not responding. Please check if the server is running.")
);
}
if (error.detail === undefined) {
return Promise.reject(
new Error("No connection to the server.")
@ -74,7 +74,7 @@ export default async function fetch(url: string, options: RequestInit = {}, useC
fetch.checkHealth = async () => {
const maxRetries = 5;
const retryDelay = 1000; // 1 second
for (let i = 0; i < maxRetries; i++) {
try {
const response = await global.fetch(`${backendApiUrl.replace("/api", "")}/health`);
@ -90,7 +90,7 @@ fetch.checkHealth = async () => {
await new Promise(resolve => setTimeout(resolve, retryDelay));
}
}
throw new Error("Backend server is not responding after multiple attempts");
};

View file

@ -1,8 +1,12 @@
import { redirect } from "next/navigation";
export default function handleServerErrors(response: Response, retry?: (response: Response) => Promise<Response>, useCloud?: boolean): Promise<Response> {
export default function handleServerErrors(
response: Response,
retry: ((response: Response) => Promise<Response>) | null = null,
useCloud: boolean = false,
): Promise<Response> {
return new Promise((resolve, reject) => {
if (response.status === 401 && !useCloud) {
if ((response.status === 401 || response.status === 403) && !useCloud) {
if (retry) {
return retry(response)
.catch(() => {

View file

@ -105,14 +105,14 @@ If you'd rather run cognee-mcp in a container, you have two options:
```bash
# For HTTP transport (recommended for web deployments)
docker run -e TRANSPORT_MODE=http --env-file ./.env -p 8000:8000 --rm -it cognee/cognee-mcp:main
# For SSE transport
# For SSE transport
docker run -e TRANSPORT_MODE=sse --env-file ./.env -p 8000:8000 --rm -it cognee/cognee-mcp:main
# For stdio transport (default)
docker run -e TRANSPORT_MODE=stdio --env-file ./.env --rm -it cognee/cognee-mcp:main
```
**Installing optional dependencies at runtime:**
You can install optional dependencies when running the container by setting the `EXTRAS` environment variable:
```bash
# Install a single optional dependency group at runtime
@ -122,7 +122,7 @@ If you'd rather run cognee-mcp in a container, you have two options:
--env-file ./.env \
-p 8000:8000 \
--rm -it cognee/cognee-mcp:main
# Install multiple optional dependency groups at runtime (comma-separated)
docker run \
-e TRANSPORT_MODE=sse \
@ -131,7 +131,7 @@ If you'd rather run cognee-mcp in a container, you have two options:
-p 8000:8000 \
--rm -it cognee/cognee-mcp:main
```
**Available optional dependency groups:**
- `aws` - S3 storage support
- `postgres` / `postgres-binary` - PostgreSQL database support
@ -160,7 +160,7 @@ If you'd rather run cognee-mcp in a container, you have two options:
# With stdio transport (default)
docker run -e TRANSPORT_MODE=stdio --env-file ./.env --rm -it cognee/cognee-mcp:main
```
**With runtime installation of optional dependencies:**
```bash
# Install optional dependencies from Docker Hub image
@ -357,7 +357,7 @@ You can configure both transports simultaneously for testing:
"url": "http://localhost:8000/sse"
},
"cognee-http": {
"type": "http",
"type": "http",
"url": "http://localhost:8000/mcp"
}
}

View file

@ -7,11 +7,11 @@ echo "Environment: $ENVIRONMENT"
# Install optional dependencies if EXTRAS is set
if [ -n "$EXTRAS" ]; then
echo "Installing optional dependencies: $EXTRAS"
# Get the cognee version that's currently installed
COGNEE_VERSION=$(uv pip show cognee | grep "Version:" | awk '{print $2}')
echo "Current cognee version: $COGNEE_VERSION"
# Build the extras list for cognee
IFS=',' read -ra EXTRA_ARRAY <<< "$EXTRAS"
# Combine base extras from pyproject.toml with requested extras
@ -28,11 +28,11 @@ if [ -n "$EXTRAS" ]; then
fi
fi
done
echo "Installing cognee with extras: $ALL_EXTRAS"
echo "Running: uv pip install 'cognee[$ALL_EXTRAS]==$COGNEE_VERSION'"
uv pip install "cognee[$ALL_EXTRAS]==$COGNEE_VERSION"
# Verify installation
echo ""
echo "✓ Optional dependencies installation completed"
@ -93,19 +93,19 @@ if [ -n "$API_URL" ]; then
if echo "$API_URL" | grep -q "localhost" || echo "$API_URL" | grep -q "127.0.0.1"; then
echo "⚠️ Warning: API_URL contains localhost/127.0.0.1"
echo " Original: $API_URL"
# Try to use host.docker.internal (works on Mac/Windows and recent Linux with Docker Desktop)
FIXED_API_URL=$(echo "$API_URL" | sed 's/localhost/host.docker.internal/g' | sed 's/127\.0\.0\.1/host.docker.internal/g')
echo " Converted to: $FIXED_API_URL"
echo " This will work on Mac/Windows/Docker Desktop."
echo " On Linux without Docker Desktop, you may need to:"
echo " - Use --network host, OR"
echo " - Set API_URL=http://172.17.0.1:8000 (Docker bridge IP)"
API_URL="$FIXED_API_URL"
fi
API_ARGS="--api-url $API_URL"
if [ -n "$API_TOKEN" ]; then
API_ARGS="$API_ARGS --api-token $API_TOKEN"

View file

@ -151,7 +151,7 @@ class CogneeClient:
query_type: str,
datasets: Optional[List[str]] = None,
system_prompt: Optional[str] = None,
top_k: int = 5,
top_k: int = 10,
) -> Any:
"""
Search the knowledge graph.

View file

@ -627,8 +627,7 @@ class TestModel:
print(f"Failed: {failed}")
print(f"Success Rate: {(passed / total_tests * 100):.1f}%")
if failed > 0:
print(f"\n ⚠️ {failed} test(s) failed - review results above for details")
assert failed == 0, f"\n ⚠️ {failed} test(s) failed - review results above for details"
async def main():

4183
cognee-mcp/uv.lock generated

File diff suppressed because it is too large Load diff

View file

@ -16,4 +16,4 @@ EMBEDDING_API_VERSION=""
GRAPHISTRY_USERNAME=""
GRAPHISTRY_PASSWORD=""
GRAPHISTRY_PASSWORD=""

View file

@ -1,8 +1,20 @@
# ⚠️ DEPRECATED - Go to `new-examples/` Instead
This starter kit is deprecated. Its examples have been integrated into the `/new-examples/` folder.
| Old Location | New Location |
|--------------|--------------|
| `src/pipelines/default.py` | none |
| `src/pipelines/low_level.py` | `new-examples/custom_pipelines/organizational_hierarchy/` |
| `src/pipelines/custom-model.py` | `new-examples/demos/custom_graph_model_entity_schema_definition.py` |
| `src/data/` | Included in `new-examples/custom_pipelines/organizational_hierarchy/data/` |
----------
# Cognee Starter Kit
Welcome to the <a href="https://github.com/topoteretes/cognee">cognee</a> Starter Repo! This repository is designed to help you get started quickly by providing a structured dataset and pre-built data pipelines using cognee to build powerful knowledge graphs.
You can use this repo to ingest, process, and visualize data in minutes.
You can use this repo to ingest, process, and visualize data in minutes.
By following this guide, you will:
@ -68,7 +80,7 @@ Custom model uses custom pydantic model for graph extraction. This script catego
python src/pipelines/custom-model.py
```
## Graph preview
## Graph preview
cognee provides a visualize_graph function that will render the graph for you.

View file

@ -10,13 +10,14 @@ from cognee.modules.pipelines.layers.reset_dataset_pipeline_run_status import (
)
from cognee.modules.engine.operations.setup import setup
from cognee.tasks.ingestion import ingest_data, resolve_data_directories
from cognee.tasks.ingestion.data_item import DataItem
from cognee.shared.logging_utils import get_logger
logger = get_logger()
async def add(
data: Union[BinaryIO, list[BinaryIO], str, list[str]],
data: Union[BinaryIO, list[BinaryIO], str, list[str], DataItem, list[DataItem]],
dataset_name: str = "main_dataset",
user: User = None,
node_set: Optional[List[str]] = None,

View file

@ -252,7 +252,7 @@ async def get_default_tasks( # TODO: Find out a better way to do this (Boris's
chunk_size: int = None,
config: Config = None,
custom_prompt: Optional[str] = None,
chunks_per_batch: int = 100,
chunks_per_batch: int = None,
**kwargs,
) -> list[Task]:
if config is None:
@ -272,12 +272,14 @@ async def get_default_tasks( # TODO: Find out a better way to do this (Boris's
"ontology_config": {"ontology_resolver": get_default_ontology_resolver()}
}
if chunks_per_batch is None:
chunks_per_batch = 100
cognify_config = get_cognify_config()
embed_triplets = cognify_config.triplet_embedding
if chunks_per_batch is None:
chunks_per_batch = (
cognify_config.chunks_per_batch if cognify_config.chunks_per_batch is not None else 100
)
default_tasks = [
Task(classify_documents),
Task(
@ -308,7 +310,7 @@ async def get_default_tasks( # TODO: Find out a better way to do this (Boris's
async def get_temporal_tasks(
user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: int = 10
user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: int = None
) -> list[Task]:
"""
Builds and returns a list of temporal processing tasks to be executed in sequence.
@ -330,7 +332,10 @@ async def get_temporal_tasks(
list[Task]: A list of Task objects representing the temporal processing pipeline.
"""
if chunks_per_batch is None:
chunks_per_batch = 10
from cognee.modules.cognify.config import get_cognify_config
configured = get_cognify_config().chunks_per_batch
chunks_per_batch = configured if configured is not None else 10
temporal_tasks = [
Task(classify_documents),

View file

@ -46,6 +46,11 @@ class CognifyPayloadDTO(InDTO):
examples=[[]],
description="Reference to one or more previously uploaded ontologies",
)
chunks_per_batch: Optional[int] = Field(
default=None,
description="Number of chunks to process per task batch in Cognify (overrides default).",
examples=[10, 20, 50, 100],
)
def get_cognify_router() -> APIRouter:
@ -146,6 +151,7 @@ def get_cognify_router() -> APIRouter:
config=config_to_use,
run_in_background=payload.run_in_background,
custom_prompt=payload.custom_prompt,
chunks_per_batch=payload.chunks_per_batch,
)
# If any cognify run errored return JSONResponse with proper error status code

View file

@ -7,7 +7,9 @@ from fastapi import status
from fastapi import APIRouter
from fastapi.encoders import jsonable_encoder
from fastapi import HTTPException, Query, Depends
from fastapi.responses import JSONResponse, FileResponse
from fastapi.responses import JSONResponse, FileResponse, StreamingResponse
from urllib.parse import urlparse
from pathlib import Path
from cognee.api.DTO import InDTO, OutDTO
from cognee.infrastructure.databases.relational import get_relational_engine
@ -44,6 +46,7 @@ class DatasetDTO(OutDTO):
class DataDTO(OutDTO):
id: UUID
name: str
label: Optional[str] = None
created_at: datetime
updated_at: Optional[datetime] = None
extension: str
@ -475,6 +478,40 @@ def get_datasets_router() -> APIRouter:
message=f"Data ({data_id}) not found in dataset ({dataset_id})."
)
return data.raw_data_location
raw_location = data.raw_data_location
if raw_location.startswith("file://"):
from cognee.infrastructure.files.utils.get_data_file_path import get_data_file_path
raw_location = get_data_file_path(raw_location)
if raw_location.startswith("s3://"):
from cognee.infrastructure.files.utils.open_data_file import open_data_file
from cognee.infrastructure.utils.run_async import run_async
parsed = urlparse(raw_location)
download_name = Path(parsed.path).name or data.name
media_type = data.mime_type or "application/octet-stream"
async def file_iterator(chunk_size: int = 1024 * 1024):
async with open_data_file(raw_location, mode="rb") as file:
while True:
chunk = await run_async(file.read, chunk_size)
if not chunk:
break
yield chunk
return StreamingResponse(
file_iterator(),
media_type=media_type,
headers={"Content-Disposition": f'attachment; filename="{download_name}"'},
)
path = Path(raw_location)
if not path.is_file():
raise DataNotFoundError(message=f"Raw file not found on disk for data ({data_id}).")
return FileResponse(path=path)
return router

View file

@ -90,6 +90,7 @@ def get_memify_router() -> APIRouter:
dataset=payload.dataset_id if payload.dataset_id else payload.dataset_name,
node_name=payload.node_name,
user=user,
run_in_background=payload.run_in_background,
)
if isinstance(memify_run, PipelineRunErrored):

View file

@ -6,14 +6,16 @@ from fastapi import Depends, APIRouter
from fastapi.responses import JSONResponse
from fastapi.encoders import jsonable_encoder
from cognee.modules.search.types import SearchType, SearchResult, CombinedSearchResult
from cognee.modules.search.types import SearchType, SearchResult
from cognee.api.DTO import InDTO, OutDTO
from cognee.modules.users.exceptions.exceptions import PermissionDeniedError
from cognee.modules.users.exceptions.exceptions import PermissionDeniedError, UserNotFoundError
from cognee.modules.users.models import User
from cognee.modules.search.operations import get_history
from cognee.modules.users.methods import get_authenticated_user
from cognee.shared.utils import send_telemetry
from cognee import __version__ as cognee_version
from cognee.infrastructure.databases.exceptions import DatabaseNotCreatedError
from cognee.exceptions import CogneeValidationError
# Note: Datasets sent by name will only map to datasets owned by the request sender
@ -29,7 +31,7 @@ class SearchPayloadDTO(InDTO):
node_name: Optional[list[str]] = Field(default=None, example=[])
top_k: Optional[int] = Field(default=10)
only_context: bool = Field(default=False)
use_combined_context: bool = Field(default=False)
verbose: bool = Field(default=False)
def get_search_router() -> APIRouter:
@ -72,7 +74,7 @@ def get_search_router() -> APIRouter:
except Exception as error:
return JSONResponse(status_code=500, content={"error": str(error)})
@router.post("", response_model=Union[List[SearchResult], CombinedSearchResult, List])
@router.post("", response_model=Union[List[SearchResult], List])
async def search(payload: SearchPayloadDTO, user: User = Depends(get_authenticated_user)):
"""
Search for nodes in the graph database.
@ -116,7 +118,7 @@ def get_search_router() -> APIRouter:
"node_name": payload.node_name,
"top_k": payload.top_k,
"only_context": payload.only_context,
"use_combined_context": payload.use_combined_context,
"verbose": payload.verbose,
"cognee_version": cognee_version,
},
)
@ -133,11 +135,22 @@ def get_search_router() -> APIRouter:
system_prompt=payload.system_prompt,
node_name=payload.node_name,
top_k=payload.top_k,
verbose=payload.verbose,
only_context=payload.only_context,
use_combined_context=payload.use_combined_context,
)
return jsonable_encoder(results)
except (DatabaseNotCreatedError, UserNotFoundError, CogneeValidationError) as e:
# Return a clear 422 with actionable guidance instead of leaking a stacktrace
status_code = getattr(e, "status_code", 422)
return JSONResponse(
status_code=status_code,
content={
"error": "Search prerequisites not met",
"detail": str(e),
"hint": "Run `await cognee.add(...)` then `await cognee.cognify()` before searching.",
},
)
except PermissionDeniedError:
return []
except Exception as error:

View file

@ -4,13 +4,16 @@ from typing import Union, Optional, List, Type
from cognee.infrastructure.databases.graph import get_graph_engine
from cognee.modules.engine.models.node_set import NodeSet
from cognee.modules.users.models import User
from cognee.modules.search.types import SearchResult, SearchType, CombinedSearchResult
from cognee.modules.search.types import SearchResult, SearchType
from cognee.modules.users.methods import get_default_user
from cognee.modules.search.methods import search as search_function
from cognee.modules.data.methods import get_authorized_existing_datasets
from cognee.modules.data.exceptions import DatasetNotFoundError
from cognee.context_global_variables import set_session_user_context_variable
from cognee.shared.logging_utils import get_logger
from cognee.infrastructure.databases.exceptions import DatabaseNotCreatedError
from cognee.exceptions import CogneeValidationError
from cognee.modules.users.exceptions.exceptions import UserNotFoundError
logger = get_logger()
@ -29,12 +32,11 @@ async def search(
save_interaction: bool = False,
last_k: Optional[int] = 1,
only_context: bool = False,
use_combined_context: bool = False,
session_id: Optional[str] = None,
wide_search_top_k: Optional[int] = 100,
triplet_distance_penalty: Optional[float] = 3.5,
verbose: bool = False,
) -> Union[List[SearchResult], CombinedSearchResult]:
) -> List[SearchResult]:
"""
Search and query the knowledge graph for insights, information, and connections.
@ -179,7 +181,18 @@ async def search(
datasets = [datasets]
if user is None:
user = await get_default_user()
try:
user = await get_default_user()
except (DatabaseNotCreatedError, UserNotFoundError) as error:
# Provide a clear, actionable message instead of surfacing low-level stacktraces
raise CogneeValidationError(
message=(
"Search prerequisites not met: no database/default user found. "
"Initialize Cognee before searching by:\n"
"• running `await cognee.add(...)` followed by `await cognee.cognify()`."
),
name="SearchPreconditionError",
) from error
await set_session_user_context_variable(user)
@ -203,7 +216,6 @@ async def search(
save_interaction=save_interaction,
last_k=last_k,
only_context=only_context,
use_combined_context=use_combined_context,
session_id=session_id,
wide_search_top_k=wide_search_top_k,
triplet_distance_penalty=triplet_distance_penalty,

View file

@ -71,7 +71,7 @@ def get_sync_router() -> APIRouter:
-H "Content-Type: application/json" \\
-H "Cookie: auth_token=your-token" \\
-d '{"dataset_ids": ["123e4567-e89b-12d3-a456-426614174000", "456e7890-e12b-34c5-d678-901234567000"]}'
# Sync all user datasets (empty request body or null dataset_ids)
curl -X POST "http://localhost:8000/api/v1/sync" \\
-H "Content-Type: application/json" \\
@ -88,7 +88,7 @@ def get_sync_router() -> APIRouter:
- **413 Payload Too Large**: Dataset too large for current cloud plan
- **429 Too Many Requests**: Rate limit exceeded
## Notes
## Notes
- Sync operations run in the background - you get an immediate response
- Use the returned run_id to track progress (status API coming soon)
- Large datasets are automatically chunked for efficient transfer
@ -179,7 +179,7 @@ def get_sync_router() -> APIRouter:
```
## Example Responses
**No running syncs:**
```json
{

View file

@ -21,7 +21,7 @@ binary streams, then stores them in a specified dataset for further processing.
Supported Input Types:
- **Text strings**: Direct text content
- **File paths**: Local file paths (absolute paths starting with "/")
- **File paths**: Local file paths (absolute paths starting with "/")
- **File URLs**: "file:///absolute/path" or "file://relative/path"
- **S3 paths**: "s3://bucket-name/path/to/file"
- **Lists**: Multiple files or text strings in a single call

View file

@ -62,6 +62,11 @@ After successful cognify processing, use `cognee search` to query the knowledge
parser.add_argument(
"--verbose", "-v", action="store_true", help="Show detailed progress information"
)
parser.add_argument(
"--chunks-per-batch",
type=int,
help="Number of chunks to process per task batch (try 50 for large single documents).",
)
def execute(self, args: argparse.Namespace) -> None:
try:
@ -111,6 +116,7 @@ After successful cognify processing, use `cognee search` to query the knowledge
chunk_size=args.chunk_size,
ontology_file_path=args.ontology_file,
run_in_background=args.background,
chunks_per_batch=getattr(args, "chunks_per_batch", None),
)
return result
except Exception as e:

View file

@ -17,7 +17,7 @@ The `cognee config` command allows you to view and modify configuration settings
You can:
- View all current configuration settings
- Get specific configuration values
- Get specific configuration values
- Set configuration values
- Unset (reset to default) specific configuration values
- Reset all configuration to defaults

View file

@ -24,7 +24,6 @@ async def get_graph_engine() -> GraphDBInterface:
return graph_client
@lru_cache
def create_graph_engine(
graph_database_provider,
graph_file_path,
@ -35,6 +34,35 @@ def create_graph_engine(
graph_database_port="",
graph_database_key="",
graph_dataset_database_handler="",
):
"""
Wrapper function to call create graph engine with caching.
For a detailed description, see _create_graph_engine.
"""
return _create_graph_engine(
graph_database_provider,
graph_file_path,
graph_database_url,
graph_database_name,
graph_database_username,
graph_database_password,
graph_database_port,
graph_database_key,
graph_dataset_database_handler,
)
@lru_cache
def _create_graph_engine(
graph_database_provider,
graph_file_path,
graph_database_url="",
graph_database_name="",
graph_database_username="",
graph_database_password="",
graph_database_port="",
graph_database_key="",
graph_dataset_database_handler="",
):
"""
Create a graph engine based on the specified provider type.

View file

@ -1,11 +1,13 @@
import os
import aiohttp
import asyncio
import requests
import base64
import hashlib
from uuid import UUID
from typing import Optional
from urllib.parse import urlparse
from cryptography.fernet import Fernet
from aiohttp import BasicAuth
from cognee.infrastructure.databases.graph import get_graph_config
from cognee.modules.users.models import User, DatasetDatabase
@ -23,7 +25,6 @@ class Neo4jAuraDevDatasetDatabaseHandler(DatasetDatabaseHandlerInterface):
Quality of life improvements:
- Allow configuration of different Neo4j Aura plans and regions.
- Requests should be made async, currently a blocking requests library is used.
"""
@classmethod
@ -49,6 +50,7 @@ class Neo4jAuraDevDatasetDatabaseHandler(DatasetDatabaseHandlerInterface):
graph_db_name = f"{dataset_id}"
# Client credentials and encryption
# Note: Should not be used as class variables so that they are not persisted in memory longer than needed
client_id = os.environ.get("NEO4J_CLIENT_ID", None)
client_secret = os.environ.get("NEO4J_CLIENT_SECRET", None)
tenant_id = os.environ.get("NEO4J_TENANT_ID", None)
@ -63,22 +65,13 @@ class Neo4jAuraDevDatasetDatabaseHandler(DatasetDatabaseHandlerInterface):
"NEO4J_CLIENT_ID, NEO4J_CLIENT_SECRET, and NEO4J_TENANT_ID environment variables must be set to use Neo4j Aura DatasetDatabase Handling."
)
# Make the request with HTTP Basic Auth
def get_aura_token(client_id: str, client_secret: str) -> dict:
url = "https://api.neo4j.io/oauth/token"
data = {"grant_type": "client_credentials"} # sent as application/x-www-form-urlencoded
resp = requests.post(url, data=data, auth=(client_id, client_secret))
resp.raise_for_status() # raises if the request failed
return resp.json()
resp = get_aura_token(client_id, client_secret)
resp_token = await cls._get_aura_token(client_id, client_secret)
url = "https://api.neo4j.io/v1/instances"
headers = {
"accept": "application/json",
"Authorization": f"Bearer {resp['access_token']}",
"Authorization": f"Bearer {resp_token['access_token']}",
"Content-Type": "application/json",
}
@ -96,31 +89,38 @@ class Neo4jAuraDevDatasetDatabaseHandler(DatasetDatabaseHandlerInterface):
"cloud_provider": "gcp",
}
response = requests.post(url, headers=headers, json=payload)
async def _create_database_instance_request():
async with aiohttp.ClientSession() as session:
async with session.post(url, headers=headers, json=payload) as resp:
resp.raise_for_status()
return await resp.json()
resp_create = await _create_database_instance_request()
graph_db_name = "neo4j" # Has to be 'neo4j' for Aura
graph_db_url = response.json()["data"]["connection_url"]
graph_db_key = resp["access_token"]
graph_db_username = response.json()["data"]["username"]
graph_db_password = response.json()["data"]["password"]
graph_db_url = resp_create["data"]["connection_url"]
graph_db_key = resp_token["access_token"]
graph_db_username = resp_create["data"]["username"]
graph_db_password = resp_create["data"]["password"]
async def _wait_for_neo4j_instance_provisioning(instance_id: str, headers: dict):
# Poll until the instance is running
status_url = f"https://api.neo4j.io/v1/instances/{instance_id}"
status = ""
for attempt in range(30): # Try for up to ~5 minutes
status_resp = requests.get(
status_url, headers=headers
) # TODO: Use async requests with httpx
status = status_resp.json()["data"]["status"]
if status.lower() == "running":
return
await asyncio.sleep(10)
async with aiohttp.ClientSession() as session:
async with session.get(status_url, headers=headers) as resp:
resp.raise_for_status()
status_resp = await resp.json()
status = status_resp["data"]["status"]
if status.lower() == "running":
return
await asyncio.sleep(10)
raise TimeoutError(
f"Neo4j instance '{graph_db_name}' did not become ready within 5 minutes. Status: {status}"
)
instance_id = response.json()["data"]["id"]
instance_id = resp_create["data"]["id"]
await _wait_for_neo4j_instance_provisioning(instance_id, headers)
encrypted_db_password_bytes = cipher.encrypt(graph_db_password.encode())
@ -165,4 +165,39 @@ class Neo4jAuraDevDatasetDatabaseHandler(DatasetDatabaseHandlerInterface):
@classmethod
async def delete_dataset(cls, dataset_database: DatasetDatabase):
pass
# Get dataset database information and credentials
dataset_database = await cls.resolve_dataset_connection_info(dataset_database)
parsed_url = urlparse(dataset_database.graph_database_url)
instance_id = parsed_url.hostname.split(".")[0]
url = f"https://api.neo4j.io/v1/instances/{instance_id}"
# Get access token for Neo4j Aura API
# Client credentials
client_id = os.environ.get("NEO4J_CLIENT_ID", None)
client_secret = os.environ.get("NEO4J_CLIENT_SECRET", None)
resp = await cls._get_aura_token(client_id, client_secret)
headers = {
"accept": "application/json",
"Authorization": f"Bearer {resp['access_token']}",
"Content-Type": "application/json",
}
async with aiohttp.ClientSession() as session:
async with session.delete(url, headers=headers) as resp:
resp.raise_for_status()
return await resp.json()
@classmethod
async def _get_aura_token(cls, client_id: str, client_secret: str) -> dict:
url = "https://api.neo4j.io/oauth/token"
data = {"grant_type": "client_credentials"} # sent as application/x-www-form-urlencoded
async with aiohttp.ClientSession() as session:
async with session.post(
url, data=data, auth=BasicAuth(client_id, client_secret)
) as resp:
resp.raise_for_status()
return await resp.json()

View file

@ -290,7 +290,7 @@ class NeptuneAnalyticsAdapter(NeptuneGraphDB, VectorDBInterface):
query_string = f"""
CALL neptune.algo.vectors.topKByEmbeddingWithFiltering({{
topK: {limit},
embedding: {embedding},
embedding: {embedding},
nodeFilter: {{ equals: {{property: '{self._COLLECTION_PREFIX}', value: '{collection_name}'}} }}
}}
)
@ -299,7 +299,7 @@ class NeptuneAnalyticsAdapter(NeptuneGraphDB, VectorDBInterface):
if with_vector:
query_string += """
WITH node, score, id(node) as node_id
WITH node, score, id(node) as node_id
MATCH (n)
WHERE id(n) = id(node)
CALL neptune.algo.vectors.get(n)

View file

@ -1,4 +1,5 @@
import os
import json
import pydantic
from typing import Union
from functools import lru_cache
@ -19,6 +20,7 @@ class RelationalConfig(BaseSettings):
db_username: Union[str, None] = None # "cognee"
db_password: Union[str, None] = None # "cognee"
db_provider: str = "sqlite"
database_connect_args: Union[str, None] = None
model_config = SettingsConfigDict(env_file=".env", extra="allow")
@ -30,6 +32,17 @@ class RelationalConfig(BaseSettings):
databases_directory_path = os.path.join(base_config.system_root_directory, "databases")
self.db_path = databases_directory_path
# Parse database_connect_args if provided as JSON string
if self.database_connect_args and isinstance(self.database_connect_args, str):
try:
parsed_args = json.loads(self.database_connect_args)
if isinstance(parsed_args, dict):
self.database_connect_args = parsed_args
else:
self.database_connect_args = {}
except json.JSONDecodeError:
self.database_connect_args = {}
return self
def to_dict(self) -> dict:
@ -40,7 +53,8 @@ class RelationalConfig(BaseSettings):
--------
- dict: A dictionary containing database configuration settings including db_path,
db_name, db_host, db_port, db_username, db_password, and db_provider.
db_name, db_host, db_port, db_username, db_password, db_provider, and
database_connect_args.
"""
return {
"db_path": self.db_path,
@ -50,6 +64,7 @@ class RelationalConfig(BaseSettings):
"db_username": self.db_username,
"db_password": self.db_password,
"db_provider": self.db_provider,
"database_connect_args": self.database_connect_args,
}

View file

@ -1,3 +1,4 @@
from sqlalchemy import URL
from .sqlalchemy.SqlAlchemyAdapter import SQLAlchemyAdapter
from functools import lru_cache
@ -11,6 +12,7 @@ def create_relational_engine(
db_username: str,
db_password: str,
db_provider: str,
database_connect_args: dict = None,
):
"""
Create a relational database engine based on the specified parameters.
@ -29,6 +31,7 @@ def create_relational_engine(
- db_password (str): The password for database authentication, required for
PostgreSQL.
- db_provider (str): The type of database provider (e.g., 'sqlite' or 'postgres').
- database_connect_args (dict, optional): Database driver connection arguments.
Returns:
--------
@ -43,12 +46,19 @@ def create_relational_engine(
# Test if asyncpg is available
import asyncpg
connection_string = (
f"postgresql+asyncpg://{db_username}:{db_password}@{db_host}:{db_port}/{db_name}"
# Handle special characters in username and password like # or @
connection_string = URL.create(
"postgresql+asyncpg",
username=db_username,
password=db_password,
host=db_host,
port=int(db_port),
database=db_name,
)
except ImportError:
raise ImportError(
"PostgreSQL dependencies are not installed. Please install with 'pip install cognee\"[postgres]\"' or 'pip install cognee\"[postgres-binary]\"' to use PostgreSQL functionality."
)
return SQLAlchemyAdapter(connection_string)
return SQLAlchemyAdapter(connection_string, connect_args=database_connect_args)

View file

@ -29,10 +29,31 @@ class SQLAlchemyAdapter:
functions.
"""
def __init__(self, connection_string: str):
def __init__(self, connection_string: str, connect_args: dict = None):
"""
Initialize the SQLAlchemy adapter with connection settings.
Parameters:
-----------
connection_string (str): The database connection string (e.g., 'sqlite:///path/to/db'
or 'postgresql://user:pass@host:port/db').
connect_args (dict, optional): Database driver connection arguments.
Configuration is loaded from RelationalConfig.database_connect_args, which reads
from the DATABASE_CONNECT_ARGS environment variable.
Examples:
PostgreSQL with SSL:
DATABASE_CONNECT_ARGS='{"sslmode": "require", "connect_timeout": 10}'
SQLite with custom timeout:
DATABASE_CONNECT_ARGS='{"timeout": 60}'
"""
self.db_path: str = None
self.db_uri: str = connection_string
# Use provided connect_args (already parsed from config)
final_connect_args = connect_args or {}
if "sqlite" in connection_string:
[prefix, db_path] = connection_string.split("///")
self.db_path = db_path
@ -53,7 +74,7 @@ class SQLAlchemyAdapter:
self.engine = create_async_engine(
connection_string,
poolclass=NullPool,
connect_args={"timeout": 30},
connect_args={**{"timeout": 30}, **final_connect_args},
)
else:
self.engine = create_async_engine(
@ -63,6 +84,7 @@ class SQLAlchemyAdapter:
pool_recycle=280,
pool_pre_ping=True,
pool_timeout=280,
connect_args=final_connect_args,
)
self.sessionmaker = async_sessionmaker(bind=self.engine, expire_on_commit=False)

View file

@ -1,3 +1,5 @@
from sqlalchemy import URL
from .supported_databases import supported_databases
from .embeddings import get_embedding_engine
from cognee.infrastructure.databases.graph.config import get_graph_context_config
@ -5,7 +7,6 @@ from cognee.infrastructure.databases.graph.config import get_graph_context_confi
from functools import lru_cache
@lru_cache
def create_vector_engine(
vector_db_provider: str,
vector_db_url: str,
@ -13,6 +14,29 @@ def create_vector_engine(
vector_db_port: str = "",
vector_db_key: str = "",
vector_dataset_database_handler: str = "",
):
"""
Wrapper function to call create vector engine with caching.
For a detailed description, see _create_vector_engine.
"""
return _create_vector_engine(
vector_db_provider,
vector_db_url,
vector_db_name,
vector_db_port,
vector_db_key,
vector_dataset_database_handler,
)
@lru_cache
def _create_vector_engine(
vector_db_provider: str,
vector_db_url: str,
vector_db_name: str,
vector_db_port: str = "",
vector_db_key: str = "",
vector_dataset_database_handler: str = "",
):
"""
Create a vector database engine based on the specified provider.
@ -66,8 +90,13 @@ def create_vector_engine(
if not (db_host and db_port and db_name and db_username and db_password):
raise EnvironmentError("Missing requred pgvector credentials!")
connection_string: str = (
f"postgresql+asyncpg://{db_username}:{db_password}@{db_host}:{db_port}/{db_name}"
connection_string = URL.create(
"postgresql+asyncpg",
username=db_username,
password=db_password,
host=db_host,
port=int(db_port),
database=db_name,
)
try:

View file

@ -14,6 +14,8 @@ from tenacity import (
)
import litellm
import os
from urllib.parse import urlparse
import httpx
from cognee.infrastructure.databases.vector.embeddings.EmbeddingEngine import EmbeddingEngine
from cognee.infrastructure.databases.exceptions import EmbeddingException
from cognee.infrastructure.llm.tokenizer.HuggingFace import (
@ -79,10 +81,26 @@ class LiteLLMEmbeddingEngine(EmbeddingEngine):
enable_mocking = str(enable_mocking).lower()
self.mock = enable_mocking in ("true", "1", "yes")
# Validate provided custom embedding endpoint early to avoid long hangs later
if self.endpoint:
try:
parsed = urlparse(self.endpoint)
except Exception:
parsed = None
if not parsed or parsed.scheme not in ("http", "https") or not parsed.netloc:
logger.error(
"Invalid EMBEDDING_ENDPOINT configured: '%s'. Expected a URL starting with http:// or https://",
str(self.endpoint),
)
raise EmbeddingException(
"Invalid EMBEDDING_ENDPOINT. Please set a valid URL (e.g., https://host:port) "
"via environment variable EMBEDDING_ENDPOINT."
)
@retry(
stop=stop_after_delay(128),
stop=stop_after_delay(30),
wait=wait_exponential_jitter(2, 128),
retry=retry_if_not_exception_type(litellm.exceptions.NotFoundError),
retry=retry_if_not_exception_type((litellm.exceptions.NotFoundError, EmbeddingException)),
before_sleep=before_sleep_log(logger, logging.DEBUG),
reraise=True,
)
@ -111,12 +129,16 @@ class LiteLLMEmbeddingEngine(EmbeddingEngine):
return [data["embedding"] for data in response["data"]]
else:
async with embedding_rate_limiter_context_manager():
response = await litellm.aembedding(
model=self.model,
input=text,
api_key=self.api_key,
api_base=self.endpoint,
api_version=self.api_version,
# Ensure each attempt does not hang indefinitely
response = await asyncio.wait_for(
litellm.aembedding(
model=self.model,
input=text,
api_key=self.api_key,
api_base=self.endpoint,
api_version=self.api_version,
),
timeout=30.0,
)
return [data["embedding"] for data in response.data]
@ -154,6 +176,27 @@ class LiteLLMEmbeddingEngine(EmbeddingEngine):
logger.error("Context window exceeded for embedding text: %s", str(error))
raise error
except asyncio.TimeoutError as e:
# Per-attempt timeout likely an unreachable endpoint
logger.error(
"Embedding endpoint timed out. EMBEDDING_ENDPOINT='%s'. "
"Verify that the endpoint is reachable and correct.",
str(self.endpoint),
)
raise EmbeddingException(
"Embedding request timed out. Check EMBEDDING_ENDPOINT connectivity."
) from e
except (httpx.ConnectError, httpx.ReadTimeout) as e:
logger.error(
"Failed to connect to embedding endpoint. EMBEDDING_ENDPOINT='%s'. "
"Ensure the URL is correct and the server is running.",
str(self.endpoint),
)
raise EmbeddingException(
"Cannot connect to embedding endpoint. Check EMBEDDING_ENDPOINT."
) from e
except (
litellm.exceptions.BadRequestError,
litellm.exceptions.NotFoundError,
@ -162,8 +205,15 @@ class LiteLLMEmbeddingEngine(EmbeddingEngine):
raise EmbeddingException(f"Failed to index data points using model {self.model}") from e
except Exception as error:
logger.error("Error embedding text: %s", str(error))
raise error
# Fall back to a clear, actionable message for connectivity/misconfiguration issues
logger.error(
"Error embedding text: %s. EMBEDDING_ENDPOINT='%s'.",
str(error),
str(self.endpoint),
)
raise EmbeddingException(
"Embedding failed due to an unexpected error. Verify EMBEDDING_ENDPOINT and provider settings."
) from error
def get_vector_size(self) -> int:
"""

View file

@ -37,19 +37,6 @@ class LLMGateway:
**kwargs,
)
@staticmethod
def create_structured_output(
text_input: str, system_prompt: str, response_model: Type[BaseModel]
) -> BaseModel:
from cognee.infrastructure.llm.structured_output_framework.litellm_instructor.llm.get_llm_client import (
get_llm_client,
)
llm_client = get_llm_client()
return llm_client.create_structured_output(
text_input=text_input, system_prompt=system_prompt, response_model=response_model
)
@staticmethod
def create_transcript(input) -> Coroutine:
from cognee.infrastructure.llm.structured_output_framework.litellm_instructor.llm.get_llm_client import (

View file

@ -10,4 +10,4 @@ Extraction rules:
5. Current-time references ("now", "current", "today"): If the query explicitly refers to the present, set both starts_at and ends_at to now (the ingestion timestamp).
6. "Who is" and "Who was" questions: These imply a general identity or biographical inquiry without a specific temporal scope. Set both starts_at and ends_at to None.
7. Ordering rule: Always ensure the earlier date is assigned to starts_at and the later date to ends_at.
8. No temporal information: If no valid or inferable time reference is found, set both starts_at and ends_at to None.
8. No temporal information: If no valid or inferable time reference is found, set both starts_at and ends_at to None.

Some files were not shown because too many files have changed in this diff Show more