Compare commits

...
Sign in to create a new pull request.

418 commits

Author SHA1 Message Date
Vasilije
a505eef95b
Enhance README with Cognee description
Added a brief description of Cognee's features and capabilities.
2026-01-17 21:00:31 +00:00
Vasilije
8cd3aab1ef
Clarify Cognee Open Source and Cloud descriptions
Updated the README to clarify the differences between Cognee Open Source and Cognee Cloud offerings.
2026-01-17 20:58:55 +00:00
Vasilije
8a96a351e2
chore(deps): bump the npm_and_yarn group across 1 directory with 2 updates (#1974)
Bumps the npm_and_yarn group with 2 updates in the /cognee-frontend
directory: [next](https://github.com/vercel/next.js) and
[preact](https://github.com/preactjs/preact).

Updates `next` from 16.0.4 to 16.1.1
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vercel/next.js/releases">next's
releases</a>.</em></p>
<blockquote>
<h2>v16.1.1</h2>
<blockquote>
<p>[!NOTE]
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>Turbopack: Create junction points instead of symlinks on Windows (<a
href="https://redirect.github.com/vercel/next.js/issues/87606">#87606</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/sokra"><code>@​sokra</code></a> and <a
href="https://github.com/ztanner"><code>@​ztanner</code></a> for
helping!</p>
<h2>v16.1.1-canary.16</h2>
<h3>Core Changes</h3>
<ul>
<li>Add maximum size limit for postponed body parsing: <a
href="https://redirect.github.com/vercel/next.js/issues/88175">#88175</a></li>
<li>metadata: use fixed segment in dynamic routes with static metadata
files: <a
href="https://redirect.github.com/vercel/next.js/issues/88113">#88113</a></li>
<li>feat: add --experimental-cpu-prof flag for dev, build, and start: <a
href="https://redirect.github.com/vercel/next.js/issues/87946">#87946</a></li>
<li>Add experimental option to use no-cache instead of no-store in dev:
<a
href="https://redirect.github.com/vercel/next.js/issues/88182">#88182</a></li>
</ul>
<h3>Misc Changes</h3>
<ul>
<li>fix: move conductor.json to repo root for proper detection: <a
href="https://redirect.github.com/vercel/next.js/issues/88184">#88184</a></li>
<li>Turbopack: Update to swc_core v50.2.3: <a
href="https://redirect.github.com/vercel/next.js/issues/87841">#87841</a></li>
<li>Update generateMetadata in client component error: <a
href="https://redirect.github.com/vercel/next.js/issues/88172">#88172</a></li>
<li>ci: run stats on canary pushes for historical trend tracking: <a
href="https://redirect.github.com/vercel/next.js/issues/88157">#88157</a></li>
<li>perf: improve stats thresholds to reduce CI noise: <a
href="https://redirect.github.com/vercel/next.js/issues/88158">#88158</a></li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/wyattjoh"><code>@​wyattjoh</code></a>, <a
href="https://github.com/bgw"><code>@​bgw</code></a>, <a
href="https://github.com/timneutkens"><code>@​timneutkens</code></a>, <a
href="https://github.com/feedthejim"><code>@​feedthejim</code></a>, and
<a href="https://github.com/huozhi"><code>@​huozhi</code></a> for
helping!</p>
<h2>v16.1.1-canary.15</h2>
<h3>Core Changes</h3>
<ul>
<li>add compilation error for taint when not enabled: <a
href="https://redirect.github.com/vercel/next.js/issues/88173">#88173</a></li>
<li>feat(next/image)!: add <code>images.maximumResponseBody</code>
config: <a
href="https://redirect.github.com/vercel/next.js/issues/88183">#88183</a></li>
</ul>
<h3>Misc Changes</h3>
<ul>
<li>Update Rspack production test manifest: <a
href="https://redirect.github.com/vercel/next.js/issues/88137">#88137</a></li>
<li>Update Rspack development test manifest: <a
href="https://redirect.github.com/vercel/next.js/issues/88138">#88138</a></li>
<li>Fix compile error when running next-custom-transform tests: <a
href="https://redirect.github.com/vercel/next.js/issues/83715">#83715</a></li>
<li>chore: add Conductor configuration for parallel development: <a
href="https://redirect.github.com/vercel/next.js/issues/88116">#88116</a></li>
<li>[docs] add get_routes in mcp available tools: <a
href="https://redirect.github.com/vercel/next.js/issues/88181">#88181</a></li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/vercel-release-bot"><code>@​vercel-release-bot</code></a>,
<a href="https://github.com/ztanner"><code>@​ztanner</code></a>, <a
href="https://github.com/mischnic"><code>@​mischnic</code></a>, <a
href="https://github.com/wyattjoh"><code>@​wyattjoh</code></a>, <a
href="https://github.com/huozhi"><code>@​huozhi</code></a>, and <a
href="https://github.com/styfle"><code>@​styfle</code></a> for
helping!</p>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3aa53984e9"><code>3aa5398</code></a>
v16.1.1</li>
<li><a
href="d1bd5b5810"><code>d1bd5b5</code></a>
Turbopack: Create junction points instead of symlinks on Windows (<a
href="https://redirect.github.com/vercel/next.js/issues/87606">#87606</a>)</li>
<li><a
href="a67ee72788"><code>a67ee72</code></a>
setup release branch</li>
<li><a
href="34916762cd"><code>3491676</code></a>
v16.1.0</li>
<li><a
href="58e8f8c7e5"><code>58e8f8c</code></a>
v16.1.0-canary.34</li>
<li><a
href="8a8a00d5d0"><code>8a8a00d</code></a>
Revert &quot;Move next-env.d.ts to dist dir&quot; (<a
href="https://redirect.github.com/vercel/next.js/issues/87311">#87311</a>)</li>
<li><a
href="3284587f8e"><code>3284587</code></a>
v16.1.0-canary.33</li>
<li><a
href="25da5f0426"><code>25da5f0</code></a>
Move next-env.d.ts to dist dir (<a
href="https://redirect.github.com/vercel/next.js/issues/86752">#86752</a>)</li>
<li><a
href="aa8a243e72"><code>aa8a243</code></a>
feat: use Rspack persistent cache by default (<a
href="https://redirect.github.com/vercel/next.js/issues/81399">#81399</a>)</li>
<li><a
href="754db28e52"><code>754db28</code></a>
bundle analyzer: remove geist font in favor of system ui fonts (<a
href="https://redirect.github.com/vercel/next.js/issues/87292">#87292</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/vercel/next.js/compare/v16.0.4...v16.1.1">compare
view</a></li>
</ul>
</details>
<br />

Updates `preact` from 10.27.2 to 10.28.2
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/preactjs/preact/releases">preact's
releases</a>.</em></p>
<blockquote>
<h2>10.28.2</h2>
<h2>Fixes</h2>
<ul>
<li>Enforce strict equality for VNode object constructors</li>
</ul>
<h2>10.28.1</h2>
<h2>Fixes</h2>
<ul>
<li>Fix erroneous diffing w/ growing list (<a
href="https://redirect.github.com/preactjs/preact/issues/4975">#4975</a>,
thanks <a
href="https://github.com/JoviDeCroock"><code>@​JoviDeCroock</code></a>)</li>
</ul>
<h2>10.28.0</h2>
<h2>Types</h2>
<ul>
<li>Updates dangerouslySetInnerHTML type so future TS will accept
Trusted… (<a
href="https://redirect.github.com/preactjs/preact/issues/4931">#4931</a>,
thanks <a
href="https://github.com/lukewarlow"><code>@​lukewarlow</code></a>)</li>
<li>Adds snap events (<a
href="https://redirect.github.com/preactjs/preact/issues/4947">#4947</a>,
thanks <a
href="https://github.com/argyleink"><code>@​argyleink</code></a>)</li>
<li>Remove missed jsx duplicates (<a
href="https://redirect.github.com/preactjs/preact/issues/4950">#4950</a>,
thanks <a
href="https://github.com/rschristian"><code>@​rschristian</code></a>)</li>
<li>Fix scroll events (<a
href="https://redirect.github.com/preactjs/preact/issues/4949">#4949</a>,
thanks <a
href="https://github.com/rschristian"><code>@​rschristian</code></a>)</li>
</ul>
<h2>Fixes</h2>
<ul>
<li>Fix cascading renders with signals (<a
href="https://redirect.github.com/preactjs/preact/issues/4966">#4966</a>,
thanks <a
href="https://github.com/JoviDeCroock"><code>@​JoviDeCroock</code></a>)</li>
<li>add <code>commpat/server.browser</code> entry (<a
href="https://redirect.github.com/preactjs/preact/issues/4941">#4941</a>
&amp; <a
href="https://redirect.github.com/preactjs/preact/issues/4940">#4940</a>,
thanks <a
href="https://github.com/marvinhagemeister"><code>@​marvinhagemeister</code></a>)</li>
<li>Avoid lazy components without result going in throw loop (<a
href="https://redirect.github.com/preactjs/preact/issues/4937">#4937</a>,
thanks <a
href="https://github.com/JoviDeCroock"><code>@​JoviDeCroock</code></a>)</li>
</ul>
<h2>Performance</h2>
<ul>
<li>Backport some v11 optimizations (<a
href="https://redirect.github.com/preactjs/preact/issues/4967">#4967</a>,
thanks <a
href="https://github.com/JoviDeCroock"><code>@​JoviDeCroock</code></a>)</li>
</ul>
<h2>10.27.3</h2>
<h2>Fixes</h2>
<ul>
<li>Enforce strict equality for VNode object constructors</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="6f914464b3"><code>6f91446</code></a>
10.28.2</li>
<li><a
href="37c3e030ab"><code>37c3e03</code></a>
Strict equality check on constructor (<a
href="https://redirect.github.com/preactjs/preact/issues/4985">#4985</a>)</li>
<li><a
href="6670a4a70b"><code>6670a4a</code></a>
chore: Adjust TS linting setup (<a
href="https://redirect.github.com/preactjs/preact/issues/4982">#4982</a>)</li>
<li><a
href="2af522b2c2"><code>2af522b</code></a>
10.28.1 (<a
href="https://redirect.github.com/preactjs/preact/issues/4978">#4978</a>)</li>
<li><a
href="f7693b72ec"><code>f7693b7</code></a>
Fix erroneous diffing w/ growing list (<a
href="https://redirect.github.com/preactjs/preact/issues/4975">#4975</a>)</li>
<li><a
href="b36b6a7148"><code>b36b6a7</code></a>
10.28.0 (<a
href="https://redirect.github.com/preactjs/preact/issues/4968">#4968</a>)</li>
<li><a
href="4d40e96f43"><code>4d40e96</code></a>
Backport some v11 optimizations (<a
href="https://redirect.github.com/preactjs/preact/issues/4967">#4967</a>)</li>
<li><a
href="7b74b406e2"><code>7b74b40</code></a>
Fix cascading renders with signals (<a
href="https://redirect.github.com/preactjs/preact/issues/4966">#4966</a>)</li>
<li><a
href="3ab5c6fbbb"><code>3ab5c6f</code></a>
Updates dangerouslySetInnerHTML type so future TS will accept Trusted…
(<a
href="https://redirect.github.com/preactjs/preact/issues/4931">#4931</a>)</li>
<li><a
href="ff30c2b5c4"><code>ff30c2b</code></a>
Adds snap events (<a
href="https://redirect.github.com/preactjs/preact/issues/4947">#4947</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/preactjs/preact/compare/10.27.2...10.28.2">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's major version (unless you unignore this specific
dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's minor version (unless you unignore this specific
dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR
and stop Dependabot creating any more for the specific dependency
(unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore
conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will
remove the ignore condition of the specified dependency and ignore
conditions
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/topoteretes/cognee/network/alerts).

</details>
2026-01-08 15:55:56 +01:00
Vasilije
39613997d6
docs: clarify dev branching and fix contributing text (#1976)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [x] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2026-01-08 15:53:18 +01:00
Vasilije
a5fc6165c1
refactor: Use same default_k value in MCP as for Cognee (#1977)
<!-- .github/pull_request_template.md -->

Set default top_k value for MCP to be the same as the Cognee default
top_k value

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Improvements**
* Search operations now return 10 results by default instead of 5,
providing more comprehensive search results.

* **Style**
* Minor internal formatting cleanup in the client path with no
user-visible behavior changes.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-08 15:52:15 +01:00
Igor Ilic
69fe35bdee refactor: add ruff formatting 2026-01-08 13:32:15 +01:00
Igor Ilic
be738df88a refactor: Use same default_k value in MCP as for Cognee 2026-01-08 12:47:42 +01:00
Babar Ali
01a39dff22 docs: clarify dev branching and fix contributing text
Signed-off-by: Babar Ali <148423037+Babarali2k21@users.noreply.github.com>
2026-01-08 10:15:42 +01:00
dependabot[bot]
53f96f3e29
chore(deps): bump the npm_and_yarn group across 1 directory with 2 updates
Bumps the npm_and_yarn group with 2 updates in the /cognee-frontend directory: [next](https://github.com/vercel/next.js) and [preact](https://github.com/preactjs/preact).


Updates `next` from 16.0.4 to 16.1.1
- [Release notes](https://github.com/vercel/next.js/releases)
- [Changelog](https://github.com/vercel/next.js/blob/canary/release.js)
- [Commits](https://github.com/vercel/next.js/compare/v16.0.4...v16.1.1)

Updates `preact` from 10.27.2 to 10.28.2
- [Release notes](https://github.com/preactjs/preact/releases)
- [Commits](https://github.com/preactjs/preact/compare/10.27.2...10.28.2)

---
updated-dependencies:
- dependency-name: next
  dependency-version: 16.1.1
  dependency-type: direct:production
  dependency-group: npm_and_yarn
- dependency-name: preact
  dependency-version: 10.28.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-01-07 19:36:40 +00:00
Vasilije
b339529621
Fix: Add top_k parameter support to MCP search tool (#1954)
## Problem

The MCP search wrapper doesn't expose the `top_k` parameter, causing
critical performance and usability issues:

- **Unlimited result returns**: CHUNKS search returns 113KB+ responses
with hundreds of text chunks
- **Extreme performance degradation**: GRAPH_COMPLETION takes 30+
seconds to complete
- **Context window exhaustion**: Responses quickly consume the entire
context budget
- **Production unusability**: Search functionality is impractical for
real-world MCP client usage

### Root Cause

The MCP tool definition in `server.py` doesn't expose the `top_k`
parameter that exists in the underlying `cognee.search()` API.
Additionally, `cognee_client.py` ignores the parameter in direct mode
(line 194).

## Solution

This PR adds proper `top_k` parameter support throughout the MCP call
chain:

### Changes

1. **server.py (line 319)**: Add `top_k: int = 5` parameter to MCP
`search` tool
2. **server.py (line 428)**: Update `search_task` signature to accept
`top_k`
3. **server.py (line 433)**: Pass `top_k` to `cognee_client.search()`
4. **server.py (line 468)**: Pass `top_k` to `search_task` call
5. **cognee_client.py (line 194)**: Forward `top_k` parameter to
`cognee.search()`

### Parameter Flow
```
MCP Client (Claude Code, etc.)
    ↓ search(query, type, top_k=5)
server.py::search()
    ↓
server.py::search_task()
    ↓
cognee_client.search()
    ↓
cognee.search()  ← Core library
```

## Impact

### Performance Improvements

| Metric | Before | After (top_k=5) | Improvement |
|--------|--------|-----------------|-------------|
| Response Size (CHUNKS) | 113KB+ | ~3KB | 97% reduction |
| Response Size (GRAPH_COMPLETION) | 100KB+ | ~5KB | 95% reduction |
| Latency (GRAPH_COMPLETION) | 30+ seconds | 2-5 seconds | 80-90% faster
|
| Context Window Usage | Rapidly exhausted | Sustainable | Dramatic
improvement |

### User Control

Users can now control result granularity:
- `top_k=3` - Quick answers, minimal context
- `top_k=5` - Balanced (default)
- `top_k=10` - More comprehensive
- `top_k=20` - Maximum context (still reasonable)

### Backward Compatibility

 **Fully backward compatible**
- Default `top_k=5` maintains sensible behavior
- Existing MCP clients work without changes
- No breaking API changes

## Testing

### Code Review
-  All function signatures updated correctly
-  Parameter properly threaded through call chain
-  Default value provides sensible behavior
-  No syntax errors or type issues

### Production Usage
-  Patches in production use since issue discovery
-  Confirmed dramatic performance improvements
-  Successfully tested with CHUNKS, GRAPH_COMPLETION, and
RAG_COMPLETION search types
-  Vertex AI backend compatibility validated

## Additional Context

This issue particularly affects users of:
- Non-OpenAI LLM backends (Vertex AI, Claude, etc.)
- Production MCP deployments
- Context-sensitive applications (Claude Code, etc.)

The fix enables Cognee MCP to be practically usable in production
environments where context window management and response latency are
critical.

---

**Generated with** [Claude Code](https://claude.com/claude-code)

**Co-Authored-By**: Claude Sonnet 4.5 <noreply@anthropic.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Search queries now support a configurable results limit (defaults to
5), letting users control how many results are returned.
* The results limit is consistently applied across search modes so
returned results match the requested maximum.

* **Documentation**
* Clarified description of the results limit and its impact on result
sets and context usage.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-03 10:57:10 +01:00
AnveshJarabani
6a5ba70ced
docs: Add comprehensive docstrings and fix default top_k consistency
Address PR feedback from CodeRabbit AI:
- Add detailed docstring for search_task internal function
- Document top_k parameter in main search function docstring
- Fix default top_k inconsistency (was 10 in client, now 5 everywhere)
- Clarify performance implications of different top_k values

Changes:
- server.py: Add top_k parameter documentation and search_task docstring
- cognee_client.py: Change default top_k from 10 to 5 for consistency

This ensures consistent behavior across the MCP call chain and
provides clear guidance for users on choosing appropriate top_k values.
2026-01-03 01:33:13 -06:00
AnveshJarabani
7ee36f883b
Fix: Add top_k parameter support to MCP search tool
## Problem
The MCP search wrapper doesn't expose the top_k parameter, causing:
- Unlimited result returns (113KB+ responses)
- Extremely slow search performance (30+ seconds for GRAPH_COMPLETION)
- Context window exhaustion in production use

## Solution
1. Add top_k parameter (default=5) to MCP search tool in server.py
2. Thread parameter through search_task internal function
3. Forward top_k to cognee_client.search() call
4. Update cognee_client.py to pass top_k to core cognee.search()

## Impact
- **Performance**: 97% reduction in response size (113KB → 3KB)
- **Latency**: 80-90% faster (30s → 2-5s for GRAPH_COMPLETION)
- **Backward Compatible**: Default top_k=5 maintains existing behavior
- **User Control**: Configurable from top_k=3 (quick) to top_k=20 (comprehensive)

## Testing
-  Code review validates proper parameter threading
-  Backward compatible (default value ensures no breaking changes)
-  Production usage confirms performance improvements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-03 01:27:16 -06:00
Vasilije
5b42b21af5
Enhance CONTRIBUTING.md with example setup instructions
Added instructions for running a simple example and setting up the environment.
2025-12-29 18:00:08 +01:00
Vasilije
1061258fde
Fix Python 3.12 SyntaxError caused by JS regex escape sequences (#1934)
### Fix Python 3.12 SyntaxError caused by invalid escape sequences

#### Problem
Importing `cognee` on Python 3.12+ raises:

SyntaxError: invalid escape sequence '\s'

This occurs because JavaScript regex patterns (e.g. `/\s+/g`) are
embedded
inside a regular Python triple-quoted string, which Python 3.12 strictly
validates.

#### Solution
Converted the HTML template string to a raw string (`r"""`) so Python
does not
interpret JavaScript regex escape sequences.

#### Files Changed
- cognee/modules/visualization/cognee_network_visualization.py

Fixes #1929


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Improved stability of the network visualization module through
enhanced HTML template handling.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-23 14:23:25 +01:00
Uday Gupta
7019a91f7c Fix Python 3.12 SyntaxError caused by JS regex escape sequences 2025-12-23 15:51:07 +05:30
Vasilije
e0644285d4
fix: Resolve issue with migrations for docker (#1932)
<!-- .github/pull_request_template.md -->

## Description
Try database creation if migrations can't run in Cognee

Until we rework the way migrations work in Cognee we can't create the
database first and then run migrations so this is the best we can do
until then

This will unfortunately eat all potential migrations issues, we should
make it work without eating exceptions when we rework migrations

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Improved database migration resilience with automatic fallback
initialization when migrations encounter unexpected errors
* Enhanced startup error handling to ensure graceful recovery during
database setup
* Database initialization now attempts automatic recovery before
reporting startup failures
* Added explicit success messaging when migrations complete without
issues

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-22 15:41:55 +01:00
Igor Ilic
f1526a6660 fix: Resolve issue with migrations for docker 2025-12-22 14:54:11 +01:00
Vasilije
d8d3844805
refactor: Update examples to use pprint (#1921)
<!-- .github/pull_request_template.md -->

## Description
Update examples to use pprint

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Style**
* Updated Python examples to use prettier, more readable output
formatting for search and example results.
* **Documentation**
* README updated to reflect improved example output presentation, making
demonstrations easier to read and verify.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-18 17:59:24 +01:00
Pavel Zorin
23d55a45d4 Release v0.5.1 2025-12-18 16:14:47 +01:00
Igor Ilic
1724997683 docs: Update README.md 2025-12-18 14:46:21 +01:00
Igor Ilic
eda9f26b2b
Merge branch 'main' into human-readable-search 2025-12-18 14:24:09 +01:00
Igor Ilic
cc41ef853c refactor: Update examples to use pprint 2025-12-18 14:17:24 +01:00
Vasilije
4caac4b8f0
refactor: Make graphs return optional in search (#1919)
<!-- .github/pull_request_template.md -->

## Description
- Have search results be more human readable by making graphs return
information optional

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added an optional verbose mode to search. When enabled, results
include additional graph details; disabled by default for cleaner
responses.

* **Tests**
* Added unit tests verifying access-controlled search returns correctly
shaped results for both verbose and non-verbose modes, including
presence/absence of graph details.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-18 13:50:33 +01:00
Igor Ilic
b5949580de refactor: add note about verbose in combined context search 2025-12-18 13:45:20 +01:00
Igor Ilic
986b93fee4 docs: add docstring update for search 2025-12-18 13:24:39 +01:00
Igor Ilic
31e491bc88 test: Add test for verbose search 2025-12-18 13:04:17 +01:00
Igor Ilic
f2bc7ca992 refactor: change comment 2025-12-18 12:00:06 +01:00
Igor Ilic
dd9aad90cb refactor: Make graphs return optional 2025-12-18 11:57:40 +01:00
Vasilije
c7e94e296e
fix: Fix MCP on main branch (#1906)
<!-- .github/pull_request_template.md -->

## Description
Fixes MCP on main branch

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
  * Updated core dependency to a newer version.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-16 12:22:21 +01:00
Igor Ilic
109b889b68 fix: Fix MCP on main branch 2025-12-16 10:47:10 +01:00
Pavel Zorin
fdbee66985 Release: add push to main docker workflow 2025-12-15 20:46:20 +01:00
Pavel Zorin
077e0a4937
Release 0.5.0: uv lock (#1902)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-12-15 20:35:48 +01:00
Pavel Zorin
40e1b9bfed mcp: uv lock 2025-12-15 20:34:51 +01:00
Pavel Zorin
4424ebf3c9 Release 0.5.0: uv lock 2025-12-15 20:33:02 +01:00
Pavel Zorin
96107b9f2f
Bump cognee and MCP versions to 0.5.0 (#1901)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-12-15 20:30:19 +01:00
Pavel Zorin
b42ea166a8 Release: Bump cognee and MCP versions to 0.5.0 2025-12-15 20:29:29 +01:00
Pavel Zorin
4936551a90 CI: Update release workflow 2025-12-15 20:18:51 +01:00
Pavel Zorin
405c925b70
Release 0.5.0 Merge dev to main (#1900)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-12-15 19:57:39 +01:00
Pavel Zorin
78028b819f update dev uv.lock 2025-12-15 18:42:02 +01:00
Pavel Zorin
67af8a7cb4
Bump version from 0.5.0.dev0 to 0.5.0.dev1 2025-12-15 18:36:15 +01:00
hajdul88
622f8fa79e
chore: introduces 1 file upload in ontology endpoint (#1899)
<!-- .github/pull_request_template.md -->

## Description
This PR fixes the ontology upload endpoint by forcing 1 file upload at
the time. Tests are adjusted in both server start and ontology endpoint
unit test. API was tested.

Do not merge it together with
https://github.com/topoteretes/cognee/pull/1898 its either that or this
one.



## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **API Changes**
* Ontology upload now accepts exactly one file per request; field
renamed from "descriptions" to "description" and validated as a plain
string.
* Stricter form validation and tighter 400/500 error handling for
malformed submissions.

* **Tests**
* Tests converted to real HTTP-style interactions using a shared test
client and dependency overrides.
* Payloads now use plain string fields; added coverage for single-file
constraints and specific error responses.

* **Style**
  * Minor formatting cleanups with no functional impact.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 18:30:35 +01:00
Igor Ilic
14d9540d1b
feat: Add database deletion on dataset delete (#1893)
<!-- .github/pull_request_template.md -->

## Description
- Add support for database deletion when dataset is deleted
- Simplify dataset handler usage in Cognee

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Improved dataset deletion: stronger authorization checks and reliable
removal of associated graph and vector storage.

* **Tests**
* Added end-to-end test to verify complete dataset deletion and cleanup
of all related storage components.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 18:15:48 +01:00
Vasilije
6b86f423ff
COG-3546: Initial release pipeline (#1883)
<!-- .github/pull_request_template.md -->

## Description

### Inputs: 
`flavour`: `dev` or `main`
`test_mode`: `Boolean`. Aka Dry Run. If true, it won't affect public
indices or repositories

### Jobs
* Create GitHub Release
* Release PyPI Package
* Release Docker Image

[Test
run](https://github.com/topoteretes/cognee/actions/runs/20167985120)

<img width="551" height="149" alt="Screenshot 2025-12-12 at 14 31 06"
src="https://github.com/user-attachments/assets/37648a35-0af8-4474-b051-4a81f8f8cfe7"
/>

#### Create GitHub Release
Gets the `version` from `pyproject.toml`
Creates a tag and release based on the version. The version in
`pyproject.toml` **must** be already correct!
If the version not updated and tag already exists - the process will
take no effect and be failed.

Test release was deleted. Here is the screenshot:
<img width="818" height="838" alt="Screenshot 2025-12-12 at 14 19 06"
src="https://github.com/user-attachments/assets/c2009f9f-1411-494e-8268-6489f6bdd959"
/>

#### Release PyPI Package

Publishes the `dist` artifacts to pypi.org. If `test_mode` enabled - it
will publish it to test.pypi.org
([example](https://test.pypi.org/project/cognee/0.5.0.dev0/)). <<<
That's how I tested it.

#### Release Docker Image
**IMPORTANT!**
Builds the image and tags it with the version from `pyproject.toml`. For
example:
* `cognee/cognee:0.5.1.dev0`
* `cognee/cognee:0.5.1`
ONLY **main** flavor adds the `latest` tag!. If a user does `docker pull
cognee/cognee:latest` - the latest `main` release image will be pulled

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify): CI

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Added an automated release pipeline that creates versioned GitHub
releases, publishes Python packages to TestPyPI/PyPI, and builds/pushes
Docker images.
* Supports selectable dev/main flavors, includes flavor-specific image
tagging and labels, wires version/tag outputs for downstream steps, and
offers a test-mode dry run that skips external publishing.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 17:13:40 +01:00
Vasilije
821852f2c3
feat: add support to pass custom parameters in llm adapters during cognify (#1802)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
A user raised this issue/feature request:
https://github.com/topoteretes/cognee/issues/1784
This PR implements this in a way, but I may have misunderstood the
request. If I'm correct, I think it makes sense to support this, if our
users need it.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-12-15 17:09:13 +01:00
Andrej Milicevic
433170fe09 merge dev 2025-12-15 17:06:20 +01:00
hajdul88
bad22ba26b
chore: adds id generation to memify triplet embedding pipeline (#1895)
<!-- .github/pull_request_template.md -->

## Description
This PR adds id generation to the Triplet objects in triplet embedding
memify pipeline. In some edge cases duplicated elements could have been
ingested into the collection



## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **Enhancements**
* Relationship data now includes unique identifiers for improved
tracking and data management capabilities.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 15:45:35 +01:00
Vasilije
69e36cc834
feat: add bedrock as supported llm provider (#1830)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Added support for AWS Bedrock, and the models that are available there.
This was a contributor PR that was never finished, so now I polished it
up and made it work.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added AWS Bedrock as a new LLM provider with support for multiple
authentication methods.
* Integrated three new AI models: Claude 4.5 Sonnet, Claude 4.5 Haiku,
and Amazon Nova Lite.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 14:33:57 +01:00
Vasilije
d5bd75c627
test: remove anything codify related from mcp test (#1890)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
After removing codify, I didn't remove codify-related stuff from mcp
test. Hopefully now I've removed everything that was failing.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
  * Removed test coverage for codify and developer rules functionality.
* Updated search functionality test to exclude additional search type
filtering.

* **Chores**
  * Increased default logging verbosity from ERROR to INFO level.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 14:33:22 +01:00
Igor Ilic
c94225f505
fix: make ontology key an optional param in cognify (#1894)
<!-- .github/pull_request_template.md -->

## Description
Make ontology key optional in Swagger and None by default (it was
"string" by default before change which was causing issues when running
cognify endpoint)

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Documentation**
* Enhanced API documentation with additional examples and validation
metadata to improve request clarity and validation guidance.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 14:30:22 +01:00
Pavel Zorin
a6bc27afaa Cleanup 2025-12-12 17:31:54 +01:00
Pavel Zorin
14ff94f269 Initial release pipeline 2025-12-12 17:09:10 +01:00
Andrej Milicevic
116b6f1eeb chore: formatting 2025-12-12 13:46:16 +01:00
Andrej Milicevic
a225d7fc61 test: revert some changes 2025-12-12 13:44:58 +01:00
Igor Ilic
127d9860df
feat: Add dataset database handler info (#1887)
<!-- .github/pull_request_template.md -->

## Description
Add info on dataset database handler used for dataset database

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Datasets now record their assigned vector and graph database handlers,
allowing per-dataset backend selection.

* **Chores**
  * Database schema expanded to store handler identifiers per dataset.
* Deletion/cleanup processes now use dataset-level handler info for
accurate removal across backends.

* **Tests**
* Tests updated to include and validate the new handler fields in
dataset creation outputs.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-12 13:22:03 +01:00
Igor Ilic
ede884e0b0
feat: make pipeline processing cache optional (#1876)
<!-- .github/pull_request_template.md -->

## Description
Make the pipeline cache mechanism optional, have it turned off by
default but use it for add and cognify like it has been used until now

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [ x I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Introduced pipeline caching across ingestion, processing, and custom
pipeline flows with per-run controls to enable or disable caching.
  * Added an option for incremental loading in custom pipeline runs.

* **Behavior Changes**
* One pipeline path now explicitly bypasses caching by default to always
re-run when invoked.
* Disabling cache forces re-processing instead of early exit; cache
reset still enables re-execution.

* **Tests**
* Added tests validating caching, non-caching, and cache-reset
re-execution behavior.

* **Chores**
  * Added CI job to run pipeline caching tests.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-12 13:11:31 +01:00
Andrej Milicevic
a337f4e54c test: testing logger 2025-12-12 13:02:55 +01:00
Andrej Milicevic
bce6094010 test: change logger 2025-12-12 12:43:54 +01:00
Andrej Milicevic
c48b274571 test: remove delete error from mcp test 2025-12-12 11:53:40 +01:00
Andrej Milicevic
3b8a607b5f test: fix errors in mcp test 2025-12-12 11:37:27 +01:00
Igor Ilic
7b3d997a06
Merge main vol7 (#1891)
<!-- .github/pull_request_template.md -->

## Description
Add commits from main to dev branch

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Removed permission validation checks from the data processing
pipeline, streamlining the overall workflow and reducing processing
steps.
* Updated task sequences across task handlers to reflect the removal of
the validation step.

* **Documentation**
* Updated processing pipeline documentation and example code to reflect
the new streamlined task sequence.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-11 20:11:54 +01:00
Igor Ilic
59f8d12fa3 Merge branch 'main' into merge-main-vol7 2025-12-11 19:11:24 +01:00
Andrej Milicevic
e211e66275 chore: remove quick option to isolate mcp CI test 2025-12-11 18:29:17 +01:00
Andrej Milicevic
0f50c993ac chore: add quick option to isolate mcp CI test 2025-12-11 18:20:07 +01:00
Andrej Milicevic
248ba74592 test: remove codify-related stuff from mcp test 2025-12-11 18:18:42 +01:00
Andrej Milicevic
af8c5bedcc feat: add kwargs to other adapters 2025-12-11 17:47:23 +01:00
Igor Ilic
46ddd4fd12
feat: add dataset database handler logic and neo4j/lancedb/kuzu handlers (#1776)
<!-- .github/pull_request_template.md -->

## Description
Add ability to use multi tenant multi user mode with Neo4j

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

* **New Features**
* Multi-user support with per-dataset database isolation enabled by
default, allowing backend access control for secure data separation.
* Configurable database handlers via environment variables
(GRAPH_DATASET_DATABASE_HANDLER, VECTOR_DATASET_DATABASE_HANDLER) for
flexible deployment options.

* **Chores**
* Database schema migration to support per-user dataset database
configurations.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-11 14:15:20 +01:00
Igor Ilic
0a1ed79340 refactor: change neo4j_aura to neo4j_aura_dev 2025-12-11 13:05:23 +01:00
Pavel Zorin
fe7e97be45
Chore: Remove Ontology file size limit. Code duplications (#1880)
<!-- .github/pull_request_template.md -->

## Description
We received a complaint about the 10MB file size limit. 
Removed code duplications
More strict types
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Support for supplying optional per-file descriptions when uploading
multiple ontologies.

* **Improvements**
* Removed the 10MB file size limit for ontology uploads, allowing larger
files.
* Streamlined and more robust upload handling with improved per-file
validation and safer upload behavior.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-11 10:49:55 +01:00
Pavel Zorin
88f61f9bdb Added filename check 2025-12-10 17:24:31 +01:00
hajdul88
001fbe699e
feat: Adds edge centered payload and embedding structure during ingestion (#1853)
<!-- .github/pull_request_template.md -->

## Description
This pull request introduces edge‑centered payloads to the ingestion
process. Payloads are stored in the Triplet_text collection which is
compatible with the triplet_embedding memify pipeline.

Changes in This PR:

- Refactored custom edge handling, from now on they can be passed to the
add_data_points method so the ingestion is centralized and is happening
in one place.
- Added private methods to handle edge centered payload creation inside
the add_data_points.py
- Added unit tests to cover the new functionality
- Added integration tests
- Added e2e tests

Acceptance Criteria and Testing
Scenario 1:
-Set TRIPLET_EMBEDDING env var to True
-Run prune, add, cognify
-Verify the vector DB contains a non empty Triplet_text collection and
the number of triplets are matching with the number of edges in the
graph database
-Use the new triplet_completion search type and confirm it works
correctly.

Scenario 2:
-Set TRIPLET_EMBEDDING env var to True
-Run prune, add, cognify
-Verify the vector DB does not have the Triplet_text collection 
-You should receive an error indicating that the Triplet_text is not
available


## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Triplet embeddings supported—embeddings created from graph edges plus
connected node text
  * Ability to supply custom edges when adding data points
  * New configuration toggle to enable/disable triplet embedding

* **Tests**
* Added comprehensive unit and end-to-end tests for edge-centered
payloads and triplet embedding
  * New CI job to run the edge-centered payload e2e test

* **Bug Fixes**
* Adjusted server start behavior to surface process output in parent
logs

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Pavel Zorin <pazonec@yandex.ru>
2025-12-10 17:10:06 +01:00
Pavel Zorin
2ca194c28f fix format 2025-12-09 18:22:44 +01:00
Pavel Zorin
d932ee4bd9 Specify file type 2025-12-09 17:58:34 +01:00
Pavel Zorin
d0b914acaa Chore: Remove Ontology file size limit. Code duplications 2025-12-09 17:55:43 +01:00
Vasilije
49f7c5188c
feat: avoid double edge vector search in triplet search (#1877)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Eliminates double vector search for edges by ensuring all edge lookups
happen once in the retrieval layer.
- `brute_force_triplet_search`: Always includes
"EdgeType_relationship_name" in collections
- `CogneeGraph.map_vector_distances_to_graph_edges`: Removed internal
vector search fallback; only maps provided distances.
- Tests updated to reflect the new behavior.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [x] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Ensured relationship edges are automatically included in search
collections, improving search completeness and accuracy.

* **Refactor**
* Simplified graph edge distance mapping logic by removing unnecessary
external dependencies, resulting in more efficient edge processing
during retrieval operations.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-09 13:23:57 +01:00
lxobr
c04d255aca feat: remove secondary search 2025-12-08 17:29:25 +01:00
Vasilije
75fea8dcc8
Removed check_permissions_on_dataset.py and related references (#1786)
<!-- .github/pull_request_template.md -->

## Description
This PR removes the obsolete `check_permissions_on_dataset` task and all
its related imports and usages across the codebase.
The authorization logic is now handled earlier in the pipeline, so this
task is no longer needed.
These changes simplify the default Cognify pipeline and make the code
cleaner and easier to maintain.

### Changes Made
- Removed `cognee/tasks/documents/check_permissions_on_dataset.py` 
- Removed import from `cognee/tasks/documents/__init__.py` 
- Removed import and usage in `cognee/api/v1/cognify/cognify.py` 
- Removed import and usage in
`cognee/eval_framework/corpus_builder/task_getters/get_cascade_graph_tasks.py`
- Updated comments in
`cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py`
(index positions changed)
- Removed usage in `notebooks/cognee_demo.ipynb` 
- Updated documentation in `examples/python/simple_example.py` (process
description)

---

## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [x] Other (please specify): Task removal / cleanup of deprecated
function

---

## Pre-submission Checklist
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue**
- [x] My code follows the project's coding standards and style
guidelines
- [ ] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description (Closes
#1771)
- [x] My commits have clear and descriptive messages

---

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-12-08 05:43:42 +01:00
Vasilije
7a3138edf8
fix: remove double quotes from llmconfig str params (#1758)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Recently a few cases cryptic errors like in issue #1721 have occurred
across cognee use cases.

Debugging #1721 however, I found out that if LLM_API_KEY happens to have
`"` quotation marks as part of it's value, for example, when already
part of the ENV

<img width="1014" height="507" alt="Screenshot 2025-11-07 at 16 58 22"
src="https://github.com/user-attachments/assets/54b7cbb0-5bdc-4b40-b2b1-aed6c5d3d886"
/>

Then it makes it's way into Cognee and gets treated as part of the API
key.

By default, we do not do sanitization nor cleanup.

While most of the time quotation marks get handled for us:
1. `export KEY="VALUE"` will strip it
2. python dotenv will strip it if read from `.env`

But issues like https://github.com/docker/cli/issues/3630 and #1721
demonstrate that we have to have some handling on our end instead of
assuming it's stripped.

## This PR

This PR sets up a list of string params we want to strip + some that we
may want to.

We may want to avoid doing this for all params, which is why I went with
selective approach.

TODO: add testing

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Configuration values with surrounding quotes are now automatically
normalized and cleaned during system initialization, ensuring consistent
and predictable data handling across all configuration parameters.

* **Tests**
* Added comprehensive unit tests to validate automatic quote removal
from configuration values, covering various scenarios including quoted,
unquoted, empty, and edge cases with mixed and internal quotes.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-08 05:10:23 +01:00
Vasilije
40bbdd1ac7
fix: install nvm and node for -ui cli command (#1836)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Enhanced Node.js and npm environment management for improved system
compatibility on Unix-like platforms.

* **Chores**
* Updated Next.js to v16, React to v19.2, and Auth0 SDK to v4.13.1 for
compatibility and performance improvements.
  * Removed CrewAI workflow trigger component.
  * Removed user feedback submission form.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-08 05:09:49 +01:00
Vasilije
52b0029fbf
fix: Resolve issue with BAML rate limit handling (#1813)
<!-- .github/pull_request_template.md -->

## Description
Add rate limit handling for BAML

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Improvements**
* Centralized rate limiting for LLM and embedding requests to better
manage API throughput.
* Retry policies adjusted: longer initial backoff and reduced max
retries to improve stability under rate limits.

* **Chores**
* Refactored rate-limiting implementation from decorator-based to
context-manager usage across services.

* **Tests**
* Unit tests and mocks updated to reflect the new context-manager
rate-limiting approach.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-08 05:07:14 +01:00
Igor Ilic
67a4d40257 chore: regen poetry lock file 2025-12-05 19:51:26 +01:00
Igor Ilic
2f572ae509 test: Update embeding limiter test 2025-12-05 19:18:48 +01:00
Igor Ilic
a66b2ceeca refactor: reduce ammount of retry attempts for baml llm calls 2025-12-05 18:58:59 +01:00
Igor Ilic
7deaa6e8e9 feat: Add RPM limiting to Cognee 2025-12-05 18:56:34 +01:00
Igor Ilic
0c97a400b0 feat: Add RPM control 2025-12-05 15:40:24 +01:00
Igor Ilic
5d0586da28
Merge branch 'dev' into baml-rate-limit-handling 2025-12-05 13:24:07 +01:00
hajdul88
d5bf5cf4e9
fix: fixes lancedb batch handling (#1872)
<!-- .github/pull_request_template.md -->

## Description
Fixes lancedb batch handling issue. Duplicated elements could appear in
the collections when duplicates happen in the same insert
batch.

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Improved data integrity by implementing deduplication logic to
eliminate duplicate entries and ensure only the latest version is
retained.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-05 12:26:45 +01:00
Vasilije
9571641199
refactor: move codify pipeline out of main repo (#1738)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
This PR removes codify, and the code graph pipeline, out of the
repository. It also introduces a Custom Pipeline interface, which can be
used in the future to define custom pipelines.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-12-04 23:10:39 -08:00
Igor Ilic
4afde917a9
Main merge vol4 (#1856)
<!-- .github/pull_request_template.md -->

## Description
Merge main changes into dev branch

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Documentation**
* Overhauled README: renamed tagline, clarified product positioning,
reorganized Get Started into Open Source and Cloud paths, streamlined
Quickstart, refreshed demos, navigation, and citation/contributing
sections.

* **New Features**
* Added MCP tools for developer rules management, interaction logging,
and enhanced search modes (SUMMARIES, CYPHER, FEELING_LUCKY).

* **Bug Fixes**
  * Improved embedding handling to support alternate response formats.

* **Chores**
  * Updated test/dev dependency versions.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-04 17:48:11 +01:00
Pavel Zorin
386e0c8234
chore: uv lock check in pre-test workflow (#1773)
<!-- .github/pull_request_template.md -->
`.github/actions/cognee_setup/action.yml` - makes `uv.lock` rebuild
disabled by default.
`.github/workflows/pre_test.yml` - lightweight checks that signal that
PR is not ready for testing. For that moment it verifies that `uv.lock`
corresponds to `pyproject.toml`. For the case of failure the run
provides the instructions how to fix the `uv.lock`

Warning: made an exclusion for python 3.13 dependencies. This doesn't
look good to me

## Description
Make
## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify): CI improvement

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
  * Made uv lockfile rebuilding optional in CI (disabled by default).
* Added a pre-validation workflow to check lockfile integrity before
tests.
* Ensured pre-test validation runs before basic and end-to-end test
jobs.
* Updated dependencies to support Python 3.13 with version-specific lxml
constraints and adjusted docs dependency structure.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-04 15:50:39 +01:00
Pavel Zorin
7033b63cd4 update poetry lock 2025-12-04 13:26:53 +01:00
Pavel Zorin
6d1f1d183a Removed python-version from pre-test 2025-12-04 11:51:32 +01:00
Pavel Zorin
ed0c0d1823 Potential fix for code scanning alert no. 407: Workflow does not contain permissions
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-12-04 11:51:32 +01:00
Pavel Zorin
a50810240b Vesion exclusion for lxml and python 3.13 2025-12-04 11:51:28 +01:00
Pavel Zorin
c7a2138966 CI: uv lock check. Pre-test workflow. 2025-12-04 11:47:35 +01:00
Igor Ilic
7d7f8a249a
Merge branch 'dev' into main-merge-vol4 2025-12-04 10:32:10 +01:00
Igor Ilic
f1c5b9a55f fix: Resolve DB caching issues when deleting databases 2025-12-03 18:05:47 +01:00
Igor Ilic
fd84edeb74 refactor: change getting of tables during deletion 2025-12-03 15:43:41 +01:00
Boris
8cad9ef225
Merge branch 'dev' into feature/cog-3409-add-bedrock-as-supported-llm-provider 2025-12-03 14:58:00 +01:00
Igor Ilic
45f32f8bfd
Merge branch 'dev' into multi-tenant-neo4j 2025-12-03 14:37:13 +01:00
Igor Ilic
1961efcc33 fix: Handle scenario when there is no relational database on prune time 2025-12-03 14:27:06 +01:00
Igor Ilic
f4078d1247 feat: Add ability to delete lance and kuzu datasets, add prune to work with multi user mode 2025-12-03 13:10:18 +01:00
Igor Ilic
5698c609f5 test: Update tests with regards to auto scaling changes 2025-12-03 11:47:10 +01:00
Boris Arzentar
0d2e84f58e
test: test_strip_quotes_from_strings 2025-12-03 10:59:17 +01:00
Boris
3288ef01a4
Merge branch 'dev' into fix/remove-double-quotes-from-llmconfig-str-params 2025-12-03 10:05:49 +01:00
hajdul88
d4d190ac2b
feature: adds triplet embedding via memify (#1832)
<!-- .github/pull_request_template.md -->

## Description
This PR introduces triplet embeddings via a new
create_triplet_embeddings memify pipeline.
The pipeline reads the graph in batches, extracts properties from graph
elements based on their datapoint types, and generates combined triplet
embeddings. These embeddings are stored in the vector database as a new
collection.

Changes in This PR:

-Added a new create_triplet_embeddings memify pipeline.
-Added a new get_triplet_datapoints memify task.
-Introduced a new triplet_completion search type.
-Added full test coverage
--Unit tests: memify task, pipeline, and retriever
--Integration tests: memify task, pipeline, and retriever
--End-to-end tests: updated session history tests and multi-DB search
tests; added tests for triplet_completion and memify pipeline execution

Acceptance Criteria and Testing
Scenario 1:
-Run default add, cognify pipelines
-Run create triplet embeddings memify pipeline
-Verify the vector DB contains a non empty Triplet_text collection.
-Use the new triplet_completion search type and confirm it works
correctly.

Scenario 2:
-Run the default add and cognify pipelines.
-Do not run the triplet embeddings memify pipeline.
-Attempt to use the triplet_completion search type.
-You should receive an error indicating that the triplet embeddings
memify pipeline must be executed first.


## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Triplet-based search with LLM-powered completions (TRIPLET_COMPLETION)
* Batch triplet retrieval and a triplet embeddings pipeline for
extraction, indexing, and optional background processing
* Context retrieval from triplet embeddings with optional caching and
conversation-history support
  * New Triplet data type exposed for indexing and search

* **Examples**
* End-to-end example demonstrating triplet embeddings extraction and
TRIPLET_COMPLETION search

* **Tests**
* Unit and integration tests covering triplet extraction, retrieval,
embedding pipeline, and completion flows

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Pavel Zorin <pazonec@yandex.ru>
2025-12-02 18:27:08 +01:00
Igor Ilic
4e8b2ffc3e chore: Update lock files 2025-12-02 17:51:30 +01:00
Igor Ilic
c7810e9fdb Merge branch 'dev' into main-merge-vol4 2025-12-02 17:40:09 +01:00
Igor Ilic
1282905888 feat: add password encryption for Neo4j 2025-12-02 16:34:16 +01:00
Igor Ilic
2d45db9e0d
Fix distributed issues with latest pydantic version (#1859)
<!-- .github/pull_request_template.md -->

## Description
Resolve distributed issues with poetry lock

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Updated dependency version constraints to improve compatibility and
flexibility with Pydantic and aiofiles packages.
* Removed unused development dependency from the project configuration.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-02 16:08:26 +01:00
Igor Ilic
92448767fe refactor: remove done TODOs 2025-12-02 14:29:51 +01:00
Igor Ilic
702cdb45be
Merge branch 'dev' into multi-tenant-neo4j 2025-12-02 13:11:37 +01:00
Igor Ilic
dbcb35a6da chore: remove unused imports, add optional for delete dataset statement 2025-12-02 13:09:45 +01:00
Boris Arzentar
0ff836b6dd
fix: install latest nvm version 2025-12-02 10:48:28 +01:00
Boris Arzentar
5fe6a17cfd
fix: resolve nvm when not in path 2025-12-02 10:43:57 +01:00
Boris Arzentar
5ee5ae294a
Merge remote-tracking branch 'origin/dev' into feature/cog-3441-cognee-cli-ui-fix 2025-12-01 20:23:01 +01:00
Andrej Milicevic
d473ef12ae fix: small changes based on PR comments 2025-12-01 18:32:55 +01:00
Igor Ilic
8e67471d1e
Merge branch 'dev' into main-merge-vol4 2025-12-01 17:43:46 +01:00
Vasilije
c17f838034
CI: 32 GB machine for Ollama tests (#1857)
<!-- .github/pull_request_template.md -->

## Description
Recently the Llama test became failing with `model requires more system
memory (8.9 GiB) than is available (8.4 GiB)`. Due to `cgroup`
configuration, only 8 GBs are available for containers running on
`buildjet-4vcpu-ubuntu-2204`.
The decision is to change the the machine to
`buildjet-8vcpu-ubuntu-2204`. it costs 0.0016 $ per minute.

Unconfidently changed the model to `phi3:mini`. Any other ideas are
welcome.
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-12-01 08:01:24 -08:00
Boris
2df84dba27
Merge branch 'dev' into fix/remove-double-quotes-from-llmconfig-str-params 2025-12-01 16:03:00 +01:00
Igor Ilic
5cfc7b1761 chore: Disable backend access control when not supported 2025-12-01 15:58:19 +01:00
Pavel Zorin
e480acaa7c
COG-3437: Chore: CodeRabbit config (#1833)
## Description
Adds `.coderabbit.yml` that tunes CodeRabbit reviews
## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify): chore

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Added a project-wide review and tooling configuration to standardize
reviews, path-specific guidance, automated incremental reviews, chat
auto-replies, and integrations with linters/validators.
* Configured review behaviors (auto-review, abort-on-close, high-level
summaries, placeholders) and path filtering to focus checks where
needed.

* **Note**
  * No user-visible changes.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-01 15:29:49 +01:00
Pavel Zorin
ba9ca46574 Increase the machine size 2025-12-01 15:23:59 +01:00
Igor Ilic
362aa8df5c
Merge branch 'main' into baml-rate-limit-handling 2025-12-01 15:12:27 +01:00
Igor Ilic
2e493cea4c chore: Disable multi user mode for tests that can't run it 2025-12-01 15:07:01 +01:00
Igor Ilic
524e3e8232 chore: Update lock files 2025-12-01 14:52:01 +01:00
Pavel Zorin
7c9a78abea CI: Smaller embedding model for Ollama test 2025-12-01 14:39:45 +01:00
Igor Ilic
859e98b494 fix: resolve issue with poetry and dev dependency 2025-12-01 13:47:16 +01:00
Boris
76d054b6a5
Merge branch 'dev' into feature/cog-3156-move-codify-pipeline-out-of-main-repo 2025-12-01 11:21:34 +01:00
Igor Ilic
0bb4ece4d8 Merge branch 'main' into main-merge-vol4 2025-12-01 11:16:59 +01:00
Boris
5ce1af8cc0
Merge branch 'dev' into fix/remove-double-quotes-from-llmconfig-str-params 2025-12-01 10:09:53 +01:00
Igor Ilic
d81d63390f test: Add test for dataset database handler creation 2025-11-28 16:33:46 +01:00
Igor Ilic
a0c5867977 chore: disable backend access control 2025-11-28 14:56:33 +01:00
Igor Ilic
7e0be8f167 chore: disable backend access control for tests not supporting mode 2025-11-28 13:48:01 +01:00
Igor Ilic
7844b9a3a5 Merge branch 'multi-tenant-neo4j' of github.com:topoteretes/cognee into multi-tenant-neo4j 2025-11-28 13:12:07 +01:00
Igor Ilic
ed9b774448 chore: disable backend access control for deduplication test 2025-11-28 13:11:45 +01:00
Igor Ilic
0c825b96ff
Merge branch 'dev' into multi-tenant-neo4j 2025-11-28 12:55:48 +01:00
Vasilije
00b60aed6c
backport: Adds lance-namespace version fix to toml (fixes lancedb issue with 0.2.0 lance-namespace version) + crawler ingetration test url fix (#1842)
<!-- .github/pull_request_template.md -->

## Description
Implements a quick fix for the lance-namespace 0.0.21 to 0.2.0 release
issue with lancedb. Later this has to be revisited if they fix it on
their side, for now we fixed the lance-namespace version to the previous
one.


**If Lancedb fixes the issue on their side this can be closed**


Additionally cherry picking crawler integration test fixes from dev

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-27 10:47:00 -08:00
Igor Ilic
45841824d0 chore: Update Cognee version 2025-11-27 19:29:44 +01:00
Igor Ilic
ddf802ff54 chore: Add migration of unique constraint for SQLite 2025-11-27 18:38:00 +01:00
Andrej Milicevic
aa8afefe8a feat: add kwargs to cognify and related tasks 2025-11-27 17:05:37 +01:00
Andrej Milicevic
c649900042 Merge branch 'dev' into feature/cog-3396-add-support-to-pass-custom-parameters-in-openai-adapter 2025-11-27 16:59:43 +01:00
Andrej Milicevic
c1857a50fa fix: remove new custom pipelien interface 2025-11-27 14:58:07 +01:00
Andrej Milicevic
f776f04ee0 feat: add registration and use of custom retrievers 2025-11-27 14:55:22 +01:00
hajdul88
0fd939ca2b updating url again 2025-11-27 13:28:48 +01:00
hajdul88
8b61e1baa2 fix: adds lance-namespace version fix to toml + fixes lancedb max version 2025-11-27 12:33:04 +01:00
Igor Ilic
1ff6a72fc7 refactor: set default value to empty dictionary 2025-11-26 16:45:18 +01:00
hajdul88
508165e883
feature: Introduces wide subgraph search in graph completion and improves QA speed (#1736)
<!-- .github/pull_request_template.md -->

This PR introduces wide vector and graph structure filtering
capabilities. With these changes, the graph completion retriever and all
retrievers that inherit from it will now filter relevant vector elements
and subgraphs based on the query. This improvement significantly
increases search speed for large graphs while maintaining—and in some
cases slightly improving—accuracy.

Changes in This PR:

-Introduced new wide_search_top_k parameter: Controls the initial search
space size

-Added graph adapter level filtering method: Enables relevant subgraph
filtering while maintaining backward compatibility. For community or
custom graph adapters that don't implement this method, the system
gracefully falls back to the original search behavior.

-Updated modal dashboard and evaluation framework: Fixed compatibility
issues.
Added comprehensive unit tests: Introduced unit tests for
brute_force_triplet_search (previously untested) and expanded the
CogneeGraph test suite.

Integration tests: Existing integration tests verify end-to-end search
functionality (no changes required).

Acceptance Criteria and Testing

To verify the new search behavior, run search queries with different
wide_search_top_k parameters while logging is enabled:
None: Triggers a full graph search (default behavior)
1: Projects a minimal subgraph (demonstrates maximum filtering)
Custom values: Test intermediate levels of filtering

Internal Testing and results:
Performance and accuracy benchmarks are available upon request. The
implementation demonstrates measurable improvements in query latency for
large graphs without sacrificing result quality.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [x] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
None

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Pavel Zorin <pazonec@yandex.ru>
2025-11-26 15:18:53 +01:00
Andrej Milicevic
700362a233 fix: fix model names and test names 2025-11-26 13:35:56 +01:00
Boris Arzentar
ca271c5dbb
fix: lint error 2025-11-26 12:43:57 +01:00
Andrej Milicevic
02b9fa485c fix: remove random addition to pyproject file 2025-11-26 12:33:22 +01:00
Andrej Milicevic
0fe16939c1 remove code_graph example after dev merge 2025-11-26 12:32:17 +01:00
Boris Arzentar
2f06c3a97e
fix: install nvm and node for -ui cli command 2025-11-26 12:24:14 +01:00
Andrej Milicevic
5a2a5f64d2 merge dev 2025-11-26 11:04:11 +01:00
Andrej Milicevic
7c5a17ecb5 test: add extra dependency to bedrock ci test 2025-11-26 11:02:36 +01:00
Pavel Zorin
39c6eba571 coderabbit fix 2025-11-25 18:09:43 +01:00
Igor Ilic
cf9edf2663 chore: Add migration for new dataset database model field 2025-11-25 18:03:35 +01:00
Igor Ilic
69777ef0a5 feat: Add ability to handle custom connection resolution to avoid storing security critical data in rel dbx 2025-11-25 17:53:21 +01:00
Pavel Zorin
0f8cec64d5 fix comments 2025-11-25 17:51:32 +01:00
Pavel Zorin
ff20f021cc fix comments 2025-11-25 17:49:56 +01:00
Pavel Zorin
e46c0c4f6c CodeRabbit config 2025-11-25 17:37:33 +01:00
Igor Ilic
5f3b776406 chore: add todo for enhancing db connections 2025-11-25 16:38:34 +01:00
Igor Ilic
2e02aafbae refactor: Remove unused imports 2025-11-25 15:55:36 +01:00
Igor Ilic
593f17fcdc refactor: Add better handling of configuration for dataset to database handler 2025-11-25 15:41:01 +01:00
Andrej Milicevic
9e652a3a93 fix: use uv instead of poetry in CI tests 2025-11-25 13:34:18 +01:00
Andrej Milicevic
4c6bed885e chore: ruff format 2025-11-25 13:02:26 +01:00
Andrej Milicevic
f22330a7b6 Merge branch 'dev' into feature/bedrock-llm-provider 2025-11-25 13:02:07 +01:00
Andrej Milicevic
e0d48c043a fix: fixes to adapter and tests 2025-11-25 12:58:07 +01:00
Igor Ilic
64a3ee96c4 refactor: Create new abstraction for dataset database mapping and handling 2025-11-24 20:31:28 +01:00
hajdul88
c2c64a417c
fix: fixes ontology api endpoint tests + poetry lock(#1824)
<!-- .github/pull_request_template.md -->

## Description

This PR fixes the failing CI tests related to the new ontology api
endpoint.


<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-24 17:44:51 +01:00
Andrej Milicevic
d97acba78e Merge branch 'dev' into feature/bedrock-llm-provider 2025-11-24 16:48:45 +01:00
Andrej Milicevic
f732fbf55f merge dev 2025-11-24 16:47:59 +01:00
Andrej Milicevic
3b78eb88bd fix: use s3 config 2025-11-24 16:38:23 +01:00
Boris
58fa95605a
version: 0.5.0.dev0 (#1827)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-24 16:19:13 +01:00
Boris Arzentar
af8c55e82b
version: 0.5.0.dev0 2025-11-24 16:16:47 +01:00
Vasilije
2a005d6e1f
feat: make notebook cognee usage identical across uis (#1747)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-22 14:50:01 -08:00
Vasilije
e81613ea6e
feat: add ontology endpoint in REST API (#1724)
## Description

This PR resolves #1446 by adding support to upload ontology files and
refer to them in the cognee POST request.

  Implementation Details:
- New endpoint: POST /api/v1/ontologies for ontology file upload with a
simple key parameter
  that can be referenced in POST cognify requests
- File storage: Ontology files are stored in /tmp/ontologies/{user_id}/
with metadata
  management
- New service: OntologyService created for file management and metadata
handling
- Resolver: RDFLibOntologyResolver modified to handle file-like objects.


## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [X] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [X] **I have tested my changes thoroughly before submitting this PR**
- [X] **This PR contains minimal changes necessary to address the
issue/feature**
- [X] My code follows the project's coding standards and style
guidelines
- [X] I have added tests that prove my fix is effective or that my
feature works
- [X] I have added necessary documentation (if applicable)
- [X] All new and existing tests pass
- [X] I have searched existing PRs to ensure this change hasn't been
submitted already
- [X] I have linked any relevant issues in the description
- [X] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-22 14:49:09 -08:00
Vasilije
2f2a4487f0
feat: csv ingestion & chunking (#1574)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Create a dedicated CSV ingestion path with a custom loader and custom
chunker that preserves row-column relationships in the produced chunks.
#1348

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [x] Breaking change (fix or feature that would cause existing
functionality to change)
- [x] Documentation update
- [x] Code refactoring
- [x] Performance improvement
- [x] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-22 14:48:27 -08:00
Vasilije
bcf1d4890f
feat: add instructor mode env variable and config parameter (#1789)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Added a variable to control which instructor mode we use. The defaults
for each adapter are used, but a user can override this if the set the
`LLM_INSTRUCTOR_MODE` env variable.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-22 14:18:40 -08:00
Boris
ae227106ef
Merge branch 'dev' into feature/cog-3155-engineering-make-notebook-cognee-usage-identical-across-uis 2025-11-22 12:19:52 +01:00
Pavel Zorin
d99a7fffdd
ci(Mergify): configuration update (#1820)
This change has been made by @pazone from the Mergify config editor.
2025-11-21 17:59:57 +01:00
Pavel Zorin
a7d0132e16 ci(Mergify): configuration update
Signed-off-by: Pavel Zorin <pazonec@yandex.ru>
2025-11-21 17:59:15 +01:00
Pavel Zorin
a472db5db5
CI: mergify config (#1818)
<!-- .github/pull_request_template.md -->

## Description
Use case: If you merge(d) a PR to `dev` and also it's good to have it in
`main` ASAP. For example, tests fix.
Just add `backport_main` label to the PR (even if it's already merged).
The same PR will be created but targeted to `main`.

It's called
[Backport](https://docs.mergify.com/workflow/actions/backport/). If the
backport has conflicts - it will have a `conflicts` label. Be careful
with that

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-21 16:57:41 +01:00
Pavel Zorin
dd87acc004
CI: Release drafter configuration (#1817)
<!-- .github/pull_request_template.md -->

## Description
[Release drafter](https://github.com/release-drafter/release-drafter)
template configuration. Will generate and update release draft on each
PR merge to dev

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify): CI

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-21 16:57:28 +01:00
Andrej Milicevic
204f9c2e4a fix: PR comment changes 2025-11-21 16:20:19 +01:00
Pavel Zorin
efb9739ac6 CI: mergify config 2025-11-21 15:23:35 +01:00
Pavel Zorin
83e0876066 added CI label 2025-11-21 15:04:32 +01:00
Pavel Zorin
0b20f13117 CI: Release drafter configuration 2025-11-21 15:01:09 +01:00
Igor Ilic
0800810713 refactor: remove print statements 2025-11-20 18:46:02 +01:00
Igor Ilic
68d81a9125 refactor: Update multi-user database dataset creation mechanism 2025-11-20 18:37:15 +01:00
hajdul88
2176ec16b8
chore: changes url for crawler tests (#1816)
<!-- .github/pull_request_template.md -->

Updates crawler test url to avoid blocking and unavailable sites in CI.

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-20 17:03:36 +01:00
Andrej Milicevic
4e880eca84 chore: update env template 2025-11-20 15:47:22 +01:00
Vasilije
3fe354e34e
Handle multiple response formats in OllamaEmbeddingEngine (#1735)
The OllamaEmbeddingEngine is compatible with OpenAI

<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [*] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [* ] **I have tested my changes thoroughly before submitting this PR**
- [*] **This PR contains minimal changes necessary to address the
issue/feature**
- [*] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-20 05:19:43 -08:00
Igor Ilic
a8950b244f
Merge branch 'main' into baml-rate-limit-handling 2025-11-20 13:27:22 +01:00
Pavel Zorin
f996ecffcc
docs(mcp): update available tools documentation (#1740)
<!-- .github/pull_request_template.md -->

## Description
- Add missing tools: cognify_add_developer_rules, get_developer_rules,
save_interaction
- Enhance search tool description with all supported search types
- Documentation now accurately reflects all 10 tools implemented in
server.py

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [x] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-20 13:20:42 +01:00
Fahad Shoaib
8cfb6c41ee fix: remove async from ontology endpoint test functions 2025-11-20 15:54:09 +05:00
Fahad Shoaib
30e3971d44 fix: add auth headers to ontology upload request and enhance ontology content 2025-11-20 15:36:15 +05:00
Igor Ilic
7360729db1 fix: Resolve issue with BAML rate limit handling 2025-11-19 23:04:19 +01:00
hajdul88
fe55071849
Feature/cog 3407 fixing integration test in ci (#1810)
<!-- .github/pull_request_template.md -->

## Description
This PR should fix the web crawler integration test issue in our CI

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-19 15:33:50 +01:00
Boris
4888cd54b8
Merge branch 'dev' into feature/cog-3155-engineering-make-notebook-cognee-usage-identical-across-uis 2025-11-17 19:25:00 +01:00
Andrej Milicevic
0a4b1068a2 feat: add kwargs to openai adapter functions 2025-11-17 17:42:22 +01:00
BOBDA DJIMO ARMEL HYACINTHE
4fcbc96e85 Merge branch 'fix/update-mcp-tools-documentation' of https://github.com/armelhbobdad/cognee into fix/update-mcp-tools-documentation 2025-11-17 19:59:19 +04:00
Armel BOBDA
53532921cf
Update cognee-mcp/README.md
Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com>
2025-11-17 19:56:57 +04:00
BOBDA DJIMO ARMEL HYACINTHE
a5aa58a7d7 docs(mcp): correct tool name in README
- Update the tool name from cognify_add_developer_rules to cognee_add_developer_rules.
2025-11-17 19:55:32 +04:00
Armel BOBDA
900dd38625
Merge branch 'main' into fix/update-mcp-tools-documentation 2025-11-17 15:54:35 +04:00
hajdul88
ccea3c39d4
Merge branch 'dev' into feature/ontology-param-rest 2025-11-17 09:40:37 +01:00
EricXiao
688857ccc3
Merge branch 'dev' into feat/csv-ingestion 2025-11-17 14:42:20 +08:00
EricXiao
983bfae4fc chore: remove unnecessary csv file type
Signed-off-by: EricXiao <taoiaox@gmail.com>
2025-11-17 14:41:55 +08:00
Fahad Shoaib
1ded09d0f9 fix: fixed test ontology file content. Added tests to support multiple files and improved validation. 2025-11-15 00:06:55 +05:00
Fahad Shoaib
01f1c099cc test: enhance server start test with ontology upload verification
- Extend test_cognee_server_start to upload ontology and verify integration
- Move test_ontology_endpoint from tests/ to tests/unit/api/
2025-11-14 22:20:54 +05:00
Fahad Shoaib
844b8d635a feat: enhance ontology handling to support multiple uploads and retrievals 2025-11-14 22:13:00 +05:00
Pavel Zorin
de6b842d02
Fix: Remove cognee script from pyproject.toml (#1785)
<!-- .github/pull_request_template.md -->

## Description
Removes `cognee` script entry from the pyproject.toml. This might affect
the MCP server startup.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-14 13:04:51 +01:00
Andrej Milicevic
205f5a9e0c fix: Fix based on PR comments 2025-11-14 11:05:39 +01:00
Vasilije
6b88f925af
Fix: MCP remove cognee.add() preprequisite from the doc (#1788)
<!-- .github/pull_request_template.md -->

## Description
This prerequisite is not required because `cognee.add()` is invoked in
the `cognify`. This confuses Claude
## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-14 10:09:43 +01:00
EricXiao
01f9dd957c
Merge branch 'dev' into feat/csv-ingestion 2025-11-14 15:24:31 +08:00
EricXiao
661c194f97 fix: Resolve issue with csv suffix classification
Signed-off-by: EricXiao <taoiaox@gmail.com>
2025-11-14 15:21:47 +08:00
EricXiao
4c609d6074 Merge branch 'dev' into feat/csv-ingestion
Signed-off-by: EricXiao <taoiaox@gmail.com>
2025-11-14 14:46:11 +08:00
Andrej Milicevic
2337d36f7b feat: add variable to control instructor mode 2025-11-13 18:25:07 +01:00
Pavel Zorin
c6454338f9 Fix: MCP remove cognee.add() preprequisite from the doc 2025-11-13 17:35:16 +01:00
Andrej Milićević
f882fb7e0e
fix: remove duplicate mistral adapter creation (#1787)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-13 16:59:45 +01:00
Andrej Milicevic
3b7d030817 fix: remove duplicate mistral adapter creation 2025-11-13 16:06:07 +01:00
martin0731
3acb581bd0 Removed check_permissions_on_dataset.py and related references 2025-11-13 08:31:15 -05:00
Pavel Zorin
f9cde2f375 Fix: Remove cognee script from pyproject.toml 2025-11-13 13:35:07 +01:00
Igor Ilic
d566541a4b
Merge branch 'dev' into multi-tenant-neo4j 2025-11-12 21:32:40 +01:00
Igor Ilic
a5bd504daa
Relational DB migration test search (#1752)
<!-- .github/pull_request_template.md -->

## Description
Add deterministic Cognee search test after rel DB migration. 
Test gathers all relevant relationships regarding Customers and their
Invoices from relational DB that was migrated and then tries to get the
same results with Cognee search.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-12 21:32:22 +01:00
Igor Ilic
6bb642d6b8 refactor: Start adding multi-user functions to db interfaces 2025-11-12 21:24:40 +01:00
Igor Ilic
0176cd5a68 refactor: Add todo point 2025-11-12 18:01:44 +01:00
Igor Ilic
b017fcc8d0 refactor: Make neo4j auto scaling more readable 2025-11-12 17:58:27 +01:00
Pavel Zorin
d328f2f612
Chore: Acceptance Criteria for PRs (#1781)
<!-- .github/pull_request_template.md -->

## Description
Added Acceptance Criteria to Pull Request template

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify): chore

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-12 15:40:33 +01:00
Daulet Amirkhanov
056424f244
feat: fs-cache (#1645)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

Implement File-Based Version of the Redis Cache Adapter

Description and acceptance criteria:

This PR introduces a file-based cache adapter as an alternative to the
existing Redis-based adapter. It provides the same core functionality
for caching session data and maintaining context across multiple user
interactions but stores data locally in files instead of Redis.

Because the shared Kùzu lock mechanism relies on Redis, it is not
supported in this implementation. If a lock is configured, the adapter
will raise an error to prevent misconfiguration.

You can test this adapter by enabling caching with the following
settings:

caching=True
cache_backend="fs"

When running multiple searches in a session, the system should correctly
maintain conversational context. For example:

- What is XY?
- Are you sure?
- What was my first question?

In this case, the adapter should preserve previous user–Cognee
interactions within the cache file so that follow-up queries remain
context-aware.


## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-11-12 15:34:30 +01:00
Pavel Zorin
b716c2e3c4 Chore: Acceptance Criteria for PRs 2025-11-12 13:35:04 +01:00
Vasilije
b6a507b435
fix: Resolve issue with empty node set (#1744)
<!-- .github/pull_request_template.md -->

## Description
Transform default node set value needed for FastAPI to default value
needed by Cognee

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-11 21:53:50 +01:00
Vasilije
466c942922
refactor: Disable telemetry for all non telemetry tests (#1777)
<!-- .github/pull_request_template.md -->

## Description
Explicitely set env to dev for each test even if it's set on a workflow
level to avoid a chance to not have it set in a new workflow in the
future, add setting of env to dev where it was missing. Add
"test-machine" .anon_id for every cicd call

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-11 20:12:18 +01:00
Igor Ilic
a0a14e7ccc refactor: Update dataset database class 2025-11-11 20:05:47 +01:00
Igor Ilic
a06c58a8ae Merge branch 'dev' into multi-tenant-neo4j 2025-11-11 20:04:22 +01:00
Igor Ilic
ba693b7ef4 chore: add shell to setting of anon_id in gh action 2025-11-11 19:58:30 +01:00
Igor Ilic
cc0e1a83ab refactor: Disable telemetry for all non telemetry tests 2025-11-11 19:55:29 +01:00
Igor Ilic
432d4a1578 feat: Add initial multi tenant neo4j support 2025-11-11 19:44:34 +01:00
Andrej Milićević
8e8aecb76f
feat: enable multi user for falkor (#1689)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Added multi-user support for Falkor. Adding support for the rest of the
graph dbs should be a bit easier after this first one, especially since
Falkor is hybrid.
There are a few things code quality wise that might need changing, I am
open to suggestions.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Andrej Milicevic <milicevi@Andrejs-MacBook-Pro.local>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
2025-11-11 17:03:48 +01:00
Igor Ilic
a8706a2fd2 Merge branch 'feature/cog-3245-enable-multi-user-for-falkor' of github.com:topoteretes/cognee into feature/cog-3245-enable-multi-user-for-falkor 2025-11-11 15:13:17 +01:00
Igor Ilic
6a64023876 fix: Update vector db url properly 2025-11-11 15:12:58 +01:00
Andrej Milicevic
ac6c3ef9de fix: fix names, add falkor to constants 2025-11-11 15:07:59 +01:00
Andrej Milicevic
cfc131307f Merge branch 'feature/cog-3245-enable-multi-user-for-falkor' of github.com:topoteretes/cognee into feature/cog-3245-enable-multi-user-for-falkor 2025-11-11 14:22:55 +01:00
Andrej Milicevic
4f5771230e fix: PR comment changes 2025-11-11 14:22:42 +01:00
Igor Ilic
41b844a31c
Apply suggestion from @dexters1 2025-11-11 13:56:59 +01:00
Igor Ilic
20d49eeb76
Apply suggestion from @dexters1 2025-11-11 13:56:35 +01:00
Igor Ilic
bb8de7b336
Apply suggestion from @dexters1 2025-11-11 13:56:16 +01:00
Igor Ilic
011a7fb60b fix: Resolve multi user migration 2025-11-11 13:53:19 +01:00
Andrej Milicevic
ed2d687135 fix: changes based on PR comments 2025-11-11 13:52:34 +01:00
Igor Ilic
322c134e7e
Merge branch 'dev' into feature/cog-3245-enable-multi-user-for-falkor 2025-11-11 13:01:07 +01:00
Vasilije
78b825f338
feat: Add multi-tenancy (#1560)
<!-- .github/pull_request_template.md -->

## Description
Add multi-tenancy to Cognee

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-11 12:55:27 +01:00
Igor Ilic
76bf99b8a6
Merge branch 'dev' into multi-tenancy 2025-11-11 12:51:19 +01:00
Igor Ilic
b5f94c889d
Update cognee/modules/users/permissions/methods/get_all_user_permission_datasets.py
Co-authored-by: Boris <boris@topoteretes.com>
2025-11-11 12:51:09 +01:00
Boris
30430ed122
Merge branch 'dev' into feature/cog-3155-engineering-make-notebook-cognee-usage-identical-across-uis 2025-11-11 12:16:28 +01:00
Andrej Milićević
b5a09dfb53
test: fix weighted edges example (#1745)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Changed the key used for openai, since we changed the keys in the
secrets.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-11 11:39:57 +01:00
Igor Ilic
3710eec94f refactor: update docstring message 2025-11-10 16:23:34 +01:00
Andrej Milićević
7e96f02326
feat:add emptiness check to neptune adapter (#1757)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Add emptiness check to Neptune adapter

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-10 12:34:01 +01:00
Boris
280ce626a8
Merge branch 'dev' into feature/cog-3155-engineering-make-notebook-cognee-usage-identical-across-uis 2025-11-10 11:05:16 +01:00
Vasilije
487635b71b
Update Python version range in README 2025-11-09 11:42:45 +01:00
Vasilije
6192d49dd6
fix: added release update version (#1764)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> <sup>[Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) is
generating a summary for commit
44f0498b75. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
2025-11-09 09:44:34 +01:00
vasilije
44f0498b75 added release update version 2025-11-09 09:44:07 +01:00
Vasilije
7557e6f10b
Cherry-pick: Fix cypher search (#1739) (#1763)
<!-- .github/pull_request_template.md -->

## Description
Resolve issue with cypher search by encoding the return value from the
cypher query into JSON. Uses fastapi json encoder


## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
Example of result now with Cypher search with the following query "MATCH
(src)-[rel]->(nbr) RETURN src, rel" on Simple example:
```
{
   "search_result":[
      [
         [
            {
               "_id":{
                  "offset":0,
                  "table":0
               },
               "_label":"Node",
               "id":"87372381-a9fe-5b82-9c92-3f5dbab1bc35",
               "name":"",
               "type":"DocumentChunk",
               "created_at":"2025-11-05T14:12:46.707597",
               "updated_at":"2025-11-05T14:12:54.801747",
               "properties":"{\"created_at\": 1762351945009, \"updated_at\": 1762351945009, \"ontology_valid\": false, \"version\": 1, \"topological_rank\": 0, \"metadata\": {\"index_fields\": [\"text\"]}, \"belongs_to_set\": null, \"text\": \"\\n    Natural language processing (NLP) is an interdisciplinary\\n    subfield of computer science and information retrieval.\\n    \", \"chunk_size\": 48, \"chunk_index\": 0, \"cut_type\": \"paragraph_end\"}"
            },
            {
               "_src":{
                  "offset":0,
                  "table":0
               },
               "_dst":{
                  "offset":1,
                  "table":0
               },
               "_label":"EDGE",
               "_id":{
                  "offset":0,
                  "table":1
               },
               "relationship_name":"contains",
               "created_at":"2025-11-05T14:12:47.217590",
               "updated_at":"2025-11-05T14:12:55.193003",
               "properties":"{\"source_node_id\": \"87372381-a9fe-5b82-9c92-3f5dbab1bc35\", \"target_node_id\": \"bc338a39-64d6-549a-acec-da60846dd90d\", \"relationship_name\": \"contains\", \"updated_at\": \"2025-11-05 14:12:54\", \"relationship_type\": \"contains\", \"edge_text\": \"relationship_name: contains; entity_name: natural language processing (nlp); entity_description: An interdisciplinary subfield of computer science and information retrieval concerned with interactions between computers and human (natural) languages.\"}"
            }
         ]
      ]
   ],
   "dataset_id":"UUID(""af4b1c1c-90fc-59b7-952c-1da9bbde370c"")",
   "dataset_name":"main_dataset",
   "graphs":"None"
}
```
Relates to https://github.com/topoteretes/cognee/pull/1725 Issue:
https://github.com/topoteretes/cognee/issues/1723

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-09 06:48:14 +01:00
Pavel Zorin
007c7d403e Fix cypher search (#1739)
<!-- .github/pull_request_template.md -->

## Description
Resolve issue with cypher search by encoding the return value from the
cypher query into JSON. Uses fastapi json encoder


## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
Example of result now with Cypher search with the following query "MATCH
(src)-[rel]->(nbr) RETURN src, rel" on Simple example:
```
{
   "search_result":[
      [
         [
            {
               "_id":{
                  "offset":0,
                  "table":0
               },
               "_label":"Node",
               "id":"87372381-a9fe-5b82-9c92-3f5dbab1bc35",
               "name":"",
               "type":"DocumentChunk",
               "created_at":"2025-11-05T14:12:46.707597",
               "updated_at":"2025-11-05T14:12:54.801747",
               "properties":"{\"created_at\": 1762351945009, \"updated_at\": 1762351945009, \"ontology_valid\": false, \"version\": 1, \"topological_rank\": 0, \"metadata\": {\"index_fields\": [\"text\"]}, \"belongs_to_set\": null, \"text\": \"\\n    Natural language processing (NLP) is an interdisciplinary\\n    subfield of computer science and information retrieval.\\n    \", \"chunk_size\": 48, \"chunk_index\": 0, \"cut_type\": \"paragraph_end\"}"
            },
            {
               "_src":{
                  "offset":0,
                  "table":0
               },
               "_dst":{
                  "offset":1,
                  "table":0
               },
               "_label":"EDGE",
               "_id":{
                  "offset":0,
                  "table":1
               },
               "relationship_name":"contains",
               "created_at":"2025-11-05T14:12:47.217590",
               "updated_at":"2025-11-05T14:12:55.193003",
               "properties":"{\"source_node_id\": \"87372381-a9fe-5b82-9c92-3f5dbab1bc35\", \"target_node_id\": \"bc338a39-64d6-549a-acec-da60846dd90d\", \"relationship_name\": \"contains\", \"updated_at\": \"2025-11-05 14:12:54\", \"relationship_type\": \"contains\", \"edge_text\": \"relationship_name: contains; entity_name: natural language processing (nlp); entity_description: An interdisciplinary subfield of computer science and information retrieval concerned with interactions between computers and human (natural) languages.\"}"
            }
         ]
      ]
   ],
   "dataset_id":"UUID(""af4b1c1c-90fc-59b7-952c-1da9bbde370c"")",
   "dataset_name":"main_dataset",
   "graphs":"None"
}
```
Relates to https://github.com/topoteretes/cognee/pull/1725
Issue: https://github.com/topoteretes/cognee/issues/1723

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-08 22:54:29 +01:00
Pavel Zorin
6335e91620
Fix cypher search (#1739)
<!-- .github/pull_request_template.md -->

## Description
Resolve issue with cypher search by encoding the return value from the
cypher query into JSON. Uses fastapi json encoder


## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
Example of result now with Cypher search with the following query "MATCH
(src)-[rel]->(nbr) RETURN src, rel" on Simple example:
```
{
   "search_result":[
      [
         [
            {
               "_id":{
                  "offset":0,
                  "table":0
               },
               "_label":"Node",
               "id":"87372381-a9fe-5b82-9c92-3f5dbab1bc35",
               "name":"",
               "type":"DocumentChunk",
               "created_at":"2025-11-05T14:12:46.707597",
               "updated_at":"2025-11-05T14:12:54.801747",
               "properties":"{\"created_at\": 1762351945009, \"updated_at\": 1762351945009, \"ontology_valid\": false, \"version\": 1, \"topological_rank\": 0, \"metadata\": {\"index_fields\": [\"text\"]}, \"belongs_to_set\": null, \"text\": \"\\n    Natural language processing (NLP) is an interdisciplinary\\n    subfield of computer science and information retrieval.\\n    \", \"chunk_size\": 48, \"chunk_index\": 0, \"cut_type\": \"paragraph_end\"}"
            },
            {
               "_src":{
                  "offset":0,
                  "table":0
               },
               "_dst":{
                  "offset":1,
                  "table":0
               },
               "_label":"EDGE",
               "_id":{
                  "offset":0,
                  "table":1
               },
               "relationship_name":"contains",
               "created_at":"2025-11-05T14:12:47.217590",
               "updated_at":"2025-11-05T14:12:55.193003",
               "properties":"{\"source_node_id\": \"87372381-a9fe-5b82-9c92-3f5dbab1bc35\", \"target_node_id\": \"bc338a39-64d6-549a-acec-da60846dd90d\", \"relationship_name\": \"contains\", \"updated_at\": \"2025-11-05 14:12:54\", \"relationship_type\": \"contains\", \"edge_text\": \"relationship_name: contains; entity_name: natural language processing (nlp); entity_description: An interdisciplinary subfield of computer science and information retrieval concerned with interactions between computers and human (natural) languages.\"}"
            }
         ]
      ]
   ],
   "dataset_id":"UUID(""af4b1c1c-90fc-59b7-952c-1da9bbde370c"")",
   "dataset_name":"main_dataset",
   "graphs":"None"
}
```
Relates to https://github.com/topoteretes/cognee/pull/1725
Issue: https://github.com/topoteretes/cognee/issues/1723

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-08 22:34:27 +01:00
Igor Ilic
475e6a5b16
Merge branch 'dev' into fix-cypher-search 2025-11-08 19:44:20 +01:00
Andrej Milicevic
21b1f6b39c fix: remove add_ and get_developer_rules 2025-11-07 18:28:30 +01:00
Daulet Amirkhanov
05c984f98f feat: enhance LLMConfig to selectively strip quotes from specific string fields 2025-11-07 16:55:49 +00:00
Daulet Amirkhanov
c069dd276e feat: add model validator to strip quotes from string fields in LLMConfig 2025-11-07 16:51:11 +00:00
Andrej Milicevic
234b13a8c9 feat:add emptiness check to neptune adapter 2025-11-07 17:39:42 +01:00
Igor Ilic
d6e2bd132b refactor: Remove testme comment 2025-11-07 16:37:37 +01:00
Igor Ilic
8a72de8f86 Merge branch 'dev' into multi-tenancy 2025-11-07 15:52:40 +01:00
Igor Ilic
9fb7f2c4cf Merge branch 'dev' into multi-tenancy 2025-11-07 15:51:44 +01:00
Igor Ilic
59f758d5c2 feat: Add test for multi tenancy, add ability to share name for dataset across tenants for one user 2025-11-07 15:50:49 +01:00
Pavel Zorin
f7095c1c12
CI: Initial Release test workflow (#1749)
<!-- .github/pull_request_template.md -->

## Description
Created `Release Test Workflow` that for now executes load tests. It's
being triggered manually and on a PR to `main`.
NOTE: Release workflow is not in required checks because it contains new
and unstable tests. We will need to watch it ourselves.

The `Load Test` workflow is being called from the `Release Test
Workflow` and also manually on demand.
Limited load test duration to 60 minutes.

<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify):
CI improvement
## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-07 14:09:28 +01:00
Pavel Zorin
7d5d19347a CI: removed unnecessary ulimit 2025-11-07 11:11:05 +01:00
Andrej Milićević
df102e99a9
feat: add structured output support to retrievers (#1734)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Add structured output support to all completion-based retrievers. If the
response model is not supplied, a string is used as default

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-07 10:21:52 +01:00
Andrej Milicevic
4ab53c9d64 changes based on PR comments 2025-11-07 10:00:17 +01:00
Igor Ilic
b0a4f775f4 Merge branch 'multi-tenancy' of github.com:topoteretes/cognee into multi-tenancy 2025-11-06 19:12:29 +01:00
Igor Ilic
96c8bba580 refactor: Add db creation as step in MCP creation 2025-11-06 19:12:09 +01:00
Igor Ilic
5dbfea5084
Merge branch 'dev' into multi-tenancy 2025-11-06 18:55:18 +01:00
Igor Ilic
7dec6bfded refactor: Add migrations as part of python package 2025-11-06 18:10:04 +01:00
Andrej Milicevic
72ba8d0dcb chore: ruff format 2025-11-06 17:12:33 +01:00
Andrej Milicevic
da5055a0a9 test: add one test that covers all retrievers. delete others 2025-11-06 17:11:15 +01:00
Igor Ilic
61e1c2903f fix: Remove issue with default user creation 2025-11-06 17:00:46 +01:00
Igor Ilic
bcc59cf9a0 fix: Remove default user creation 2025-11-06 16:57:59 +01:00
Igor Ilic
0d68175167 fix: remove database creation from migrations 2025-11-06 16:53:22 +01:00
Igor Ilic
efb46c99f9 fix: resolve issue with sqlite migration 2025-11-06 16:47:42 +01:00
Pavel Zorin
49fbad9ec0 CI: Run release workflow on PR to main 2025-11-06 16:44:43 +01:00
Igor Ilic
c146de3a4d fix: Remove creation of database and db tables from env.py 2025-11-06 16:41:00 +01:00
Boris
40748427f6
Merge branch 'dev' into feature/cog-3155-engineering-make-notebook-cognee-usage-identical-across-uis 2025-11-06 16:37:36 +01:00
Igor Ilic
ef3a382669 refactor: use batch insert for SQLite as well 2025-11-06 16:23:54 +01:00
Pavel Zorin
d5caba144c CI: Initial Release test workflow 2025-11-06 16:01:39 +01:00
lxobr
5bc83968f8
feature: text chunker with overlap (#1732)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
- Implements `TextChunkerWithOverlap` with configurable
`chunk_overlap_ratio`
- Abstracts chunk_data generation via `get_chunk_data` callable
(defaults to `chunk_by_paragraph`)
- Parametrized tests verify `TextChunker` and `TextChunkerWithOverlap`
(0% overlap) produce identical output for all edge cases.
- Overlap-specific tests validate `TextChunkerWithOverlap` behavior 

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-11-06 15:22:48 +01:00
Igor Ilic
ac751bacf0 fix: Resolve SQLite migration issue 2025-11-06 14:51:25 +01:00
Igor Ilic
ac6dd08855 fix: Resolve issue with sqlite index creation 2025-11-06 14:35:26 +01:00
Vasilije
26c18e9db2
chore: update videos (#1748)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-06 14:14:01 +01:00
hajdul88
c0e5ce04ce
Fix: fixes session history test for multiuser mode (#1746)
<!-- .github/pull_request_template.md -->

## Description
Fixes failing session history test

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-06 14:13:55 +01:00
Hande
a6db2811d5 chore: update videos 2025-11-06 13:58:18 +01:00
Andrej Milicevic
62c599a499 test: add dev to CI config for weighted edges tests 2025-11-06 13:13:35 +01:00
Andrej Milicevic
6b81559bb6 test: changed endpoints in other tests 2025-11-06 13:04:01 +01:00
Boris Arzentar
a058250c95
fix: add cognee to the local run environment 2025-11-06 13:03:11 +01:00
Andrej Milicevic
286584eef0 test: change embedding config 2025-11-06 12:35:37 +01:00
Andrej Milicevic
4567ffcfff test: change endpoint 2025-11-06 12:27:10 +01:00
Andrej Milicevic
5dc8d4e8a1 test: remove workflow_dispatch 2025-11-06 11:52:39 +01:00
Andrej Milicevic
cdb06d2157 test: add workflow_dispatch for test purposes 2025-11-06 11:50:58 +01:00
Andrej Milicevic
cc60141c8d test: change key for wighted edges example 2025-11-06 11:46:34 +01:00
Igor Ilic
5271ee4901 fix: Resolve issue with empty node set 2025-11-06 11:12:12 +01:00
hajdul88
79bd2b2576 chore: fixes ruff formatting 2025-11-06 09:48:01 +01:00
hajdul88
cc058b4482
Merge branch 'dev' into feature/ontology-param-rest 2025-11-06 09:40:04 +01:00
Vasilije
69c7aa2559
test: add load tests (#1573)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Added a load test to out codebase. The test runs N adds of a pdf, then
cognifies them and runs N searches. Cognify and the searches are
measured, with certain constraints on how fast they should be. We can
tweak the values if necessary, these are values for the gpt-5-mini
model.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Introduce a load test for S3 ingest, cognify, and concurrent searches
with timing thresholds, and wire it into CI.
> 
> - **Tests**:
> - Add `cognee/tests/test_load.py` to measure end-to-end load: prunes
data/system, ingests from `s3://cognee-test-load-s3-bucket`, runs
`cognify` then concurrent GRAPH_COMPLETION searches, records timings
across reps, and asserts avg ≤ 8m and each run ≤ 10m.
> - **CI**:
> - Add `test-load` job in `.github/workflows/e2e_tests.yml`: installs
AWS deps, raises file descriptor limit, configures S3/env secrets, and
executes the new load test.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
c7598122bb. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
2025-11-06 08:42:01 +01:00
Vasilije
8cc55ac0b2
refactor: Enable multi user mode by default if graph and vector db pr… (#1695)
…oviders support it

<!-- .github/pull_request_template.md -->

## Description
Enable multi user mode by default for supported graph and vector DBs

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-06 08:40:06 +01:00
Igor Ilic
ce64f242b7 refactor: add droping of index as well 2025-11-05 18:04:05 +01:00
Igor Ilic
1ef5805c57 fix: Resolve issue with sync migration 2025-11-05 17:50:13 +01:00
Andrej Milicevic
215ef7f3c2 test: add retriever tests 2025-11-05 17:29:40 +01:00
Igor Ilic
9b6cbaf389 chore: Add multi tenant migration 2025-11-05 17:24:11 +01:00
BOBDA DJIMO ARMEL HYACINTHE
41b973a61e docs(mcp): update available tools documentation
- Add missing tools: cognify_add_developer_rules, get_developer_rules, save_interaction
- Enhance search tool description with all supported search types
- Documentation now accurately reflects all 10 tools implemented in server.py
2025-11-05 20:19:52 +04:00
Igor Ilic
c4807a0c67 refactor: Use user_tenants table to update 2025-11-05 16:14:37 +01:00
Igor Ilic
fa4c50f972 fix: Resolve issue with sync migration not working for postgresql 2025-11-05 16:05:33 +01:00
Igor Ilic
1c142734b4 refactor: change import to top of file 2025-11-05 15:22:23 +01:00
Igor Ilic
1b4aa5c67b fix: resolve issue with cypher search 2025-11-05 15:15:15 +01:00
Vasilije
c7598122bb
Merge branch 'dev' into feature/cog-3165-add-load-tests 2025-11-05 14:49:29 +01:00
Igor Ilic
9fc4199958 fix: Resolve issue with cleaning acl table 2025-11-05 13:18:47 +01:00
Andrej Milicevic
18e4bb48fd refactor: remove code and repository related tasks 2025-11-05 13:02:56 +01:00
Andrej Milicevic
c481b87d58 refactor: Remove codify and code_graph pipeline from main repo 2025-11-05 12:56:17 +01:00
Igor Ilic
1643b13c95 chore: add table creation for multi-tenancy to migration 2025-11-05 12:43:01 +01:00
Igor Ilic
6a7d8ba106
Merge branch 'dev' into multi-tenancy 2025-11-05 12:17:49 +01:00
Gao,Wei
72c20c256d
Handle multiple response formats in OllamaEmbeddingEngine
The OllamaEmbeddingEngine is compatible with OpenAI
2025-11-05 17:44:08 +08:00
Igor Ilic
c2aaec2a82 refactor: Resolve issue with permissions example 2025-11-04 23:34:51 +01:00
Igor Ilic
7782f246d3 refactor: Update permissions example to work with new changes 2025-11-04 20:54:00 +01:00
Igor Ilic
f002d3bf0e refactor: Update permissions example 2025-11-04 20:24:16 +01:00
Igor Ilic
db2a32dd17 test: Resolve issue permission example 2025-11-04 19:17:02 +01:00
Igor Ilic
fb102f29a8 chore: Add alembic migration for multi-tenant system 2025-11-04 19:03:56 +01:00
Igor Ilic
a6487cfdc1
Merge branch 'dev' into multi-tenancy 2025-11-04 18:01:19 +01:00
Igor Ilic
cd32b492a4 refactor: Add filtering of non current tenant results when authorizing dataset 2025-11-04 17:56:01 +01:00
Igor Ilic
f4117c42e9 fix: Resolve issue with entity extraction test 2025-11-04 16:43:41 +01:00
Igor Ilic
69ee8ae0a9 Merge branch 'multi-tenancy' of github.com:topoteretes/cognee into multi-tenancy 2025-11-04 16:42:55 +01:00
Igor Ilic
64c7b857d6
Merge branch 'dev' into multi-tenancy 2025-11-04 16:42:51 +01:00
Andrej Milicevic
33b0516381 test: fix completion tests 2025-11-04 15:27:03 +01:00
Andrej Milicevic
7e3c24100b refactor: add structured output to completion retrievers 2025-11-04 15:09:33 +01:00
Igor Ilic
9d771acc24 refactor: filter out search results 2025-11-04 13:35:50 +01:00
Igor Ilic
ea675f29d6 fix: Resolve typo in accessing dictionary for dataset_id 2025-11-04 13:15:49 +01:00
Igor Ilic
ac257dca1d refactor: Account for async change for identify function 2025-11-04 13:13:42 +01:00
Igor Ilic
ff388179fb feat: Add dataset_id calculation that handles legacy dataset_id 2025-11-04 13:11:57 +01:00
Igor Ilic
b0f85c9e99 feat: add legacy and modern data_id calculating 2025-11-04 13:01:10 +01:00
Igor Ilic
e3b707a0c2 refactor: Change variable names, add setting of current tenant to be optional for tenant creation 2025-11-04 12:20:17 +01:00
Igor Ilic
46c509778f refactor: Rename access control functions 2025-11-04 12:06:16 +01:00
Igor Ilic
53521c2068
Update cognee/context_global_variables.py
Co-authored-by: Pavel Zorin <pazonec@yandex.ru>
2025-11-03 19:42:51 +01:00
Igor Ilic
c81d06d364
Update cognee/context_global_variables.py
Co-authored-by: Pavel Zorin <pazonec@yandex.ru>
2025-11-03 19:37:52 +01:00
Andrej Milicevic
a7d63df98c test: add extra aws dependency to load test 2025-11-03 18:15:18 +01:00
Andrej Milicevic
eb8df45dab test: increase file descriptor limit on workflow load test 2025-11-03 18:10:19 +01:00
Andrej Milicevic
d0a3bfd39f merged dev into this branch 2025-11-03 17:08:08 +01:00
Andrej Milicevic
4424bdc764 test: fix path based on pr comment 2025-11-03 17:06:51 +01:00
Igor Ilic
2ab2cffd07 chore: update test_search_db to work with all graph providers 2025-11-03 16:37:03 +01:00
Igor Ilic
baac00923c
Merge branch 'dev' into enable-multi-user-mode-default 2025-11-03 15:57:06 +01:00
Fahad Shoaib
a4a9e76246 feat: add ontology endpoint in REST API
- Add POST /api/v1/ontologies endpoint for file upload
  - Add GET /api/v1/ontologies endpoint for listing ontologies
  - Implement OntologyService for file management and metadata
  - Integrate ontology_key parameter in cognify endpoint
  - Update RDFLibOntologyResolver to support file-like objects
  - Add essential test suite for ontology endpoints
2025-11-02 17:05:03 +05:00
Vasilije
1ab8869af1
Merge branch 'dev' into multi-tenancy 2025-11-02 12:13:46 +01:00
Vasilije
1e531385a6
CI: Fix ollama tests (#1713)
<!-- .github/pull_request_template.md -->
Ollama tests were failing during to incorrect embeddings API endpoint

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-02 12:12:44 +01:00
Vasilije
b1fdb71b1e
docs: Fix Readme Instructions (#1706)
- fix grammar, spelling and semantics
- update headings
- add new copy to mirror website copy
- update product names
2025-11-01 07:23:32 +01:00
Igor Ilic
f368a1a4d5 fix: set tests to not use multi-user mode 2025-10-31 20:10:05 +01:00
David Myriel
e69ff7c855 remove agents 2025-10-31 18:36:36 +01:00
David Myriel
4e03406cb6 fix errors 2025-10-31 17:47:28 +01:00
David Myriel
0427586211 fix tagline and description 2025-10-31 17:39:57 +01:00
David Myriel
3472f62a84 bash 2025-10-31 17:00:23 +01:00
David Myriel
a068f3536b fixes 2025-10-31 16:58:21 +01:00
David Myriel
79f5201d6a fixes 2025-10-31 16:52:11 +01:00
Igor Ilic
4c8b821197 fix: resolve test failing 2025-10-31 14:55:52 +01:00
Igor Ilic
00a1fe71d7 fix: Use multi-user mode search 2025-10-31 14:33:07 +01:00
Igor Ilic
3c09433ade fix: Resolve docling test 2025-10-31 13:57:12 +01:00
Igor Ilic
41bbf5fdd8
Merge branch 'dev' into multi-tenancy 2025-10-30 18:12:21 +01:00
Igor Ilic
1b483276b0 fix: disable backend access control for rel db test 2025-10-30 18:04:27 +01:00
Igor Ilic
45bb3130c6 fix: Use same dataset name accross cognee calls 2025-10-30 17:40:00 +01:00
Igor Ilic
e061f34a28 fix: Resolve issue with dataset names for example 2025-10-30 17:13:10 +01:00
Igor Ilic
9d8430cfb0 refactor: Update unit tests for require auth 2025-10-30 16:52:04 +01:00
Igor Ilic
50977db946 Merge branch 'enable-multi-user-mode-default' of github.com:topoteretes/cognee into enable-multi-user-mode-default 2025-10-30 16:26:05 +01:00
Igor Ilic
ee0ecd52d8 refactor: Rewrite tests to work with multi-user mode by default 2025-10-30 16:25:34 +01:00
David Myriel
7ee6cc8eb8 fix docs 2025-10-30 16:21:41 +01:00
Andrej Milicevic
4b0b9bfc53 chore: ruff format 2025-10-30 16:06:15 +01:00
Andrej Milicevic
28f28f06dd fix: added vector db name to test configs 2025-10-30 16:04:33 +01:00
Andrej Milicevic
ce925615fe fix: fix small naming error 2025-10-30 15:32:27 +01:00
Andrej Milicevic
908d329127 feat: add alembic migrations 2025-10-30 15:15:41 +01:00
Igor Ilic
8d4eed6101
Merge branch 'dev' into enable-multi-user-mode-default 2025-10-30 14:20:30 +01:00
Igor Ilic
eec96e4f1f refactor: fix search result for library test 2025-10-29 19:14:53 +01:00
Igor Ilic
82eb6e4ae3 Merge branch 'enable-multi-user-mode-default' of github.com:topoteretes/cognee into enable-multi-user-mode-default 2025-10-29 17:37:14 +01:00
Igor Ilic
d1581e9eba refactor: disable permissions for code graph example 2025-10-29 17:36:56 +01:00
Igor Ilic
858ae99060
Merge branch 'dev' into enable-multi-user-mode-default 2025-10-29 17:29:47 +01:00
Igor Ilic
6572cf5cb9 refactor: use boolean instead of string 2025-10-29 16:35:44 +01:00
Igor Ilic
6a7660a7c1 refactor: Return logs folder 2025-10-29 16:31:42 +01:00
Andrej Milicevic
70f3ced15a fix: PR comment fixes 2025-10-29 16:30:13 +01:00
Igor Ilic
fb7e74eaa8 refactor: Enable multi user mode by default if graph and vector db providers support it 2025-10-29 16:28:09 +01:00
Andrej Milicevic
c3f0cb95da fix: delete unnecessary comments, add to config 2025-10-28 18:06:04 +01:00
Andrej Milicevic
9c9395851c chore: ruff formatting 2025-10-28 18:01:32 +01:00
Andrej Milicevic
bbcd8baf3a feature: add multi-user for Falkor db 2025-10-28 17:56:32 +01:00
Andrej Milicevic
2b083dd0f1 small changes to load test 2025-10-28 09:27:33 +01:00
Andrej Milicevic
897fbd2f09 load test now uses s3 bucket 2025-10-27 15:42:09 +01:00
Andrej Milicevic
813ee94836 Initial commit, still wip 2025-10-27 08:12:37 +01:00
Igor Ilic
90e11b676e
Merge branch 'dev' into multi-tenancy 2025-10-25 00:01:45 +02:00
EricXiao
8566516cec chore: Remove local test code
Signed-off-by: EricXiao <taoiaox@gmail.com>
2025-10-22 16:59:07 +08:00
EricXiao
742866b4c9 feat: csv ingestion loader & chunk
Signed-off-by: EricXiao <taoiaox@gmail.com>
2025-10-22 16:56:46 +08:00
Andrej Milicevic
c5648e6337 test: Add load test. 2025-10-22 09:22:11 +02:00
Igor Ilic
f3f710f9b9
Merge branch 'dev' into multi-tenancy 2025-10-21 15:49:09 +02:00
Andrej Milicevic
4ff0b3407e Merge branch 'dev' into feature/cog-3165-add-load-tests 2025-10-21 13:39:24 +02:00
Igor Ilic
dfb6398aea
Merge branch 'dev' into multi-tenancy 2025-10-20 15:08:39 +02:00
Igor Ilic
8e875d2061 Merge branch 'multi-tenancy' of github.com:topoteretes/cognee into multi-tenancy 2025-10-20 15:07:43 +02:00
Igor Ilic
6934692e1b refactor: Enable selection of default single user tenant 2025-10-20 15:07:13 +02:00
Igor Ilic
a0a4cdea20
Merge branch 'dev' into multi-tenancy 2025-10-20 14:45:07 +02:00
Igor Ilic
4f874deace feat: Add tenant select method/endpoint for users 2025-10-19 23:50:17 +02:00
Igor Ilic
d6bb95e379 fix: load tenants and roles when creating user 2025-10-19 21:57:39 +02:00
Igor Ilic
13f0423a55 refactor: Add better TODO message 2025-10-19 21:35:50 +02:00
Igor Ilic
12785e31ea fix: Resolve issue with adding user to tenants 2025-10-19 21:11:14 +02:00
Igor Ilic
9a2a84905c Merge branch 'multi-tenancy' of github.com:topoteretes/cognee into multi-tenancy 2025-10-19 20:13:36 +02:00
Igor Ilic
0c4e3e1f52 fix: Load tenants to default user 2025-10-19 20:13:22 +02:00
Igor Ilic
68ee8cc1bf
Merge branch 'dev' into multi-tenancy 2025-10-19 19:32:24 +02:00
Igor Ilic
a8ff50ceae feat: Initial multi-tenancy commit 2025-10-17 18:09:01 +02:00
Andrej Milicevic
c16459d236 test: Add prune step to the test 2025-10-15 17:58:05 +02:00
Andrej Milicevic
ac5118ee34 test:Add load test 2025-10-15 17:28:51 +02:00
xdurawa
c91d1ff0ae Remove documentation changes as requested by reviewers
- Reverted README.md to original state
- Reverted cognee-starter-kit/README.md to original state
- Documentation will be updated separately by maintainers
2025-09-03 01:34:35 -04:00
xavierdurawa
06a3458982
Merge branch 'dev' into feature/bedrock-llm-provider 2025-09-02 23:08:38 -04:00
xdurawa
3821966d89 Merge feature/bedrock_llm_client into feature/bedrock-llm-provider
- Resolved conflicts in test_suites.yml to include both OpenRouter and Bedrock tests
- Updated add.py to include bedrock provider with latest dev branch model defaults
- Resolved uv.lock conflicts by using dev branch version
2025-09-01 20:55:20 -04:00
xdurawa
2a46208569 Adding AWS Bedrock support as a LLM provider
Signed-off-by: xdurawa <xavierdurawa@gmail.com>
2025-09-01 20:47:26 -04:00
285 changed files with 22756 additions and 13548 deletions

127
.coderabbit.yaml Normal file
View file

@ -0,0 +1,127 @@
# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
# .coderabbit.yaml
language: en
early_access: false
enable_free_tier: true
reviews:
profile: chill
instructions: >-
# Code Review Instructions
- Ensure the code follows best practices and coding standards.
- For **Python** code, follow
[PEP 20](https://www.python.org/dev/peps/pep-0020/) and
[CEP-8](https://gist.github.com/reactive-firewall/b7ee98df9e636a51806e62ef9c4ab161)
standards.
# Documentation Review Instructions
- Verify that documentation and comments are clear and comprehensive.
- Verify that documentation and comments are free of spelling mistakes.
# Test Code Review Instructions
- Ensure that test code is automated, comprehensive, and follows testing best practices.
- Verify that all critical functionality is covered by tests.
- Ensure that test code follow
[CEP-8](https://gist.github.com/reactive-firewall/d840ee9990e65f302ce2a8d78ebe73f6)
# Misc.
- Confirm that the code meets the project's requirements and objectives.
- Confirm that copyright years are up-to date whenever a file is changed.
request_changes_workflow: false
high_level_summary: true
high_level_summary_placeholder: '@coderabbitai summary'
auto_title_placeholder: '@coderabbitai'
review_status: true
poem: false
collapse_walkthrough: false
sequence_diagrams: false
changed_files_summary: true
path_filters: ['!*.xc*/**', '!node_modules/**', '!dist/**', '!build/**', '!.git/**', '!venv/**', '!__pycache__/**']
path_instructions:
- path: README.md
instructions: >-
1. Consider the file 'README.md' the overview/introduction of the project.
Also consider the 'README.md' file the first place to look for project documentation.
2. When reviewing the file 'README.md' it should be linted with help
from the tools `markdownlint` and `languagetool`, pointing out any issues.
3. You may assume the file 'README.md' will contain GitHub flavor Markdown.
- path: '**/*.py'
instructions: >-
When reviewing Python code for this project:
1. Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
2. As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
3. As a style convention, consider the code style advocated in [CEP-8](https://gist.github.com/reactive-firewall/b7ee98df9e636a51806e62ef9c4ab161) and evaluate suggested changes for code style compliance.
4. As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
5. As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.
- path: cognee/tests/*
instructions: >-
When reviewing test code:
1. Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
2. As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
3. As a style convention, consider the code style advocated in [CEP-8](https://gist.github.com/reactive-firewall/b7ee98df9e636a51806e62ef9c4ab161) and evaluate suggested changes for code style compliance, pointing out any violations discovered.
4. As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
5. As a project rule, Python source files with names prefixed by the string "test_" and located in the project's "tests" directory are the project's unit-testing code. It is safe, albeit a heuristic, to assume these are considered part of the project's minimal acceptance testing unless a justifying exception to this assumption is documented.
6. As a project rule, any files without extensions and with names prefixed by either the string "check_" or the string "test_", and located in the project's "tests" directory, are the project's non-unit test code. "Non-unit test" in this context refers to any type of testing other than unit testing, such as (but not limited to) functional testing, style linting, regression testing, etc. It can also be assumed that non-unit testing code is usually written as Bash shell scripts.
- path: requirements.txt
instructions: >-
* The project's own Python dependencies are recorded in 'requirements.txt' for production code.
* The project's testing-specific Python dependencies are recorded in 'tests/requirements.txt' and are used for testing the project.
* The project's documentation-specific Python dependencies are recorded in 'docs/requirements.txt' and are used only for generating Python-focused documentation for the project. 'docs/requirements.txt' may be absent if not applicable.
Consider these 'requirements.txt' files the records of truth regarding project dependencies.
- path: .github/**
instructions: >-
* When the project is hosted on GitHub: All GitHub-specific configurations, templates, and tools should be found in the '.github' directory tree.
* 'actionlint' erroneously generates false positives when dealing with GitHub's `${{ ... }}` syntax in conditionals.
* 'actionlint' erroneously generates incorrect solutions when suggesting the removal of valid `${{ ... }}` syntax.
abort_on_close: true
auto_review:
enabled: true
auto_incremental_review: true
ignore_title_keywords: []
labels: []
drafts: false
base_branches:
- dev
- main
tools:
shellcheck:
enabled: true
ruff:
enabled: true
configuration:
extend_select:
- E # Pycodestyle errors (style issues)
- F # PyFlakes codes (logical errors)
- W # Pycodestyle warnings
- N # PEP 8 naming conventions
ignore:
- W191
- W391
- E117
- D208
line_length: 100
dummy_variable_rgx: '^(_.*|junk|extra)$' # Variables starting with '_' or named 'junk' or 'extras', are considered dummy variables
markdownlint:
enabled: true
yamllint:
enabled: true
chat:
auto_reply: true

View file

@ -21,6 +21,10 @@ LLM_PROVIDER="openai"
LLM_ENDPOINT=""
LLM_API_VERSION=""
LLM_MAX_TOKENS="16384"
# Instructor's modes determine how structured data is requested from and extracted from LLM responses
# You can change this type (i.e. mode) via this env variable
# Each LLM has its own default value, e.g. gpt-5 models have "json_schema_mode"
LLM_INSTRUCTOR_MODE=""
EMBEDDING_PROVIDER="openai"
EMBEDDING_MODEL="openai/text-embedding-3-large"
@ -93,6 +97,8 @@ DB_NAME=cognee_db
# Default (local file-based)
GRAPH_DATABASE_PROVIDER="kuzu"
# Handler for multi-user access control mode, it handles how should the mapping/creation of separate DBs be handled per Cognee dataset
GRAPH_DATASET_DATABASE_HANDLER="kuzu"
# -- To switch to Remote Kuzu uncomment and fill these: -------------------------------------------------------------
#GRAPH_DATABASE_PROVIDER="kuzu"
@ -117,6 +123,8 @@ VECTOR_DB_PROVIDER="lancedb"
# Not needed if a cloud vector database is not used
VECTOR_DB_URL=
VECTOR_DB_KEY=
# Handler for multi-user access control mode, it handles how should the mapping/creation of separate DBs be handled per Cognee dataset
VECTOR_DATASET_DATABASE_HANDLER="lancedb"
################################################################################
# 🧩 Ontology resolver settings
@ -169,8 +177,9 @@ REQUIRE_AUTHENTICATION=False
# Vector: LanceDB
# Graph: KuzuDB
#
# It enforces LanceDB and KuzuDB use and uses them to create databases per Cognee user + dataset
ENABLE_BACKEND_ACCESS_CONTROL=False
# It enforces creation of databases per Cognee user + dataset. Does not work with some graph and database providers.
# Disable mode when using not supported graph/vector databases.
ENABLE_BACKEND_ACCESS_CONTROL=True
################################################################################
# ☁️ Cloud Sync Settings

View file

@ -10,6 +10,10 @@ inputs:
description: "Additional extra dependencies to install (space-separated)"
required: false
default: ""
rebuild-lockfile:
description: "Whether to rebuild the uv lockfile"
required: false
default: "false"
runs:
using: "composite"
@ -26,6 +30,7 @@ runs:
enable-cache: true
- name: Rebuild uv lockfile
if: ${{ inputs.rebuild-lockfile == 'true' }}
shell: bash
run: |
rm uv.lock
@ -42,3 +47,8 @@ runs:
done
fi
uv sync --extra api --extra docs --extra evals --extra codegraph --extra ollama --extra dev --extra neo4j --extra redis $EXTRA_ARGS
- name: Add telemetry identifier for telemetry test and in case telemetry is enabled by accident
shell: bash
run: |
echo "test-machine" > .anon_id

View file

@ -6,6 +6,14 @@ Please provide a clear, human-generated description of the changes in this PR.
DO NOT use AI-generated descriptions. We want to understand your thought process and reasoning.
-->
## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to test it locally;
* Proof that it's sufficiently tested.
-->
## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)

20
.github/release-drafter.yml vendored Normal file
View file

@ -0,0 +1,20 @@
name-template: 'v$NEXT_PATCH_VERSION'
tag-template: 'v$NEXT_PATCH_VERSION'
categories:
- title: 'Features'
labels: ['feature', 'enhancement']
- title: 'Bug Fixes'
labels: ['bug', 'fix']
- title: 'Maintenance'
labels: ['chore', 'refactor', 'ci']
change-template: '- $TITLE (#$NUMBER) @$AUTHOR'
template: |
## Whats Changed
$CHANGES
## Contributors
$CONTRIBUTORS

View file

@ -75,6 +75,7 @@ jobs:
name: Run Unit Tests
runs-on: ubuntu-22.04
env:
ENV: 'dev'
LLM_PROVIDER: openai
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
@ -104,6 +105,7 @@ jobs:
name: Run Integration Tests
runs-on: ubuntu-22.04
env:
ENV: 'dev'
LLM_PROVIDER: openai
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
@ -132,6 +134,7 @@ jobs:
name: Run Simple Examples
runs-on: ubuntu-22.04
env:
ENV: 'dev'
LLM_PROVIDER: openai
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
@ -161,6 +164,7 @@ jobs:
name: Run Simple Examples BAML
runs-on: ubuntu-22.04
env:
ENV: 'dev'
STRUCTURED_OUTPUT_FRAMEWORK: "BAML"
BAML_LLM_PROVIDER: openai
BAML_LLM_MODEL: ${{ secrets.OPENAI_MODEL }}
@ -193,32 +197,3 @@ jobs:
- name: Run Simple Examples
run: uv run python ./examples/python/simple_example.py
graph-tests:
name: Run Basic Graph Tests
runs-on: ubuntu-22.04
env:
LLM_PROVIDER: openai
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
EMBEDDING_PROVIDER: openai
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
steps:
- name: Check out repository
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: ${{ inputs.python-version }}
- name: Run Graph Tests
run: uv run python ./examples/python/code_graph_example.py --repo_path ./cognee/tasks/graph

View file

@ -39,6 +39,7 @@ jobs:
name: CLI Unit Tests
runs-on: ubuntu-22.04
env:
ENV: 'dev'
LLM_PROVIDER: openai
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
@ -66,6 +67,7 @@ jobs:
name: CLI Integration Tests
runs-on: ubuntu-22.04
env:
ENV: 'dev'
LLM_PROVIDER: openai
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
@ -93,6 +95,7 @@ jobs:
name: CLI Functionality Tests
runs-on: ubuntu-22.04
env:
ENV: 'dev'
LLM_PROVIDER: openai
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}

View file

@ -60,7 +60,8 @@ jobs:
- name: Run Neo4j Example
env:
ENV: dev
ENV: 'dev'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
@ -95,7 +96,7 @@ jobs:
- name: Run Kuzu Example
env:
ENV: dev
ENV: 'dev'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
@ -141,7 +142,8 @@ jobs:
- name: Run PGVector Example
env:
ENV: dev
ENV: 'dev'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}

View file

@ -47,6 +47,7 @@ jobs:
- name: Run Distributed Cognee (Modal)
env:
ENV: 'dev'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}

View file

@ -147,6 +147,7 @@ jobs:
- name: Run Deduplication Example
env:
ENV: 'dev'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }} # Test needs OpenAI endpoint to handle multimedia
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
@ -211,6 +212,56 @@ jobs:
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_parallel_databases.py
test-dataset-database-handler:
name: Test dataset database handlers in Cognee
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
- name: Run dataset databases handler test
env:
ENV: 'dev'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_dataset_database_handler.py
test-dataset-database-deletion:
name: Test dataset database deletion in Cognee
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
- name: Run dataset databases deletion test
env:
ENV: 'dev'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_dataset_delete.py
test-permissions:
name: Test permissions with different situations in Cognee
runs-on: ubuntu-22.04
@ -226,7 +277,7 @@ jobs:
- name: Dependencies already installed
run: echo "Dependencies already installed in setup"
- name: Run parallel databases test
- name: Run permissions test
env:
ENV: 'dev'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
@ -239,6 +290,31 @@ jobs:
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_permissions.py
test-multi-tenancy:
name: Test multi tenancy with different situations in Cognee
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
- name: Run multi tenancy test
env:
ENV: 'dev'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_multi_tenancy.py
test-graph-edges:
name: Test graph edge ingestion
runs-on: ubuntu-22.04
@ -308,7 +384,7 @@ jobs:
python-version: '3.11.x'
extra-dependencies: "postgres redis"
- name: Run Concurrent subprocess access test (Kuzu/Lancedb/Postgres)
- name: Run Concurrent subprocess access test (Kuzu/Lancedb/Postgres/Redis)
env:
ENV: dev
LLM_MODEL: ${{ secrets.LLM_MODEL }}
@ -321,6 +397,7 @@ jobs:
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
GRAPH_DATABASE_PROVIDER: 'kuzu'
CACHING: true
CACHE_BACKEND: 'redis'
SHARED_KUZU_LOCK: true
DB_PROVIDER: 'postgres'
DB_NAME: 'cognee_db'
@ -386,8 +463,37 @@ jobs:
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_feedback_enrichment.py
run_conversation_sessions_test:
name: Conversation sessions test
test-edge-centered-payload:
name: Test Cognify - Edge Centered Payload
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
- name: Dependencies already installed
run: echo "Dependencies already installed in setup"
- name: Run Edge Centered Payload Test
env:
ENV: 'dev'
TRIPLET_EMBEDDING: True
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_edge_centered_payload.py
run_conversation_sessions_test_redis:
name: Conversation sessions test (Redis)
runs-on: ubuntu-latest
defaults:
run:
@ -427,7 +533,60 @@ jobs:
python-version: '3.11.x'
extra-dependencies: "postgres redis"
- name: Run Conversation session tests
- name: Run Conversation session tests (Redis)
env:
ENV: 'dev'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
GRAPH_DATABASE_PROVIDER: 'kuzu'
CACHING: true
CACHE_BACKEND: 'redis'
DB_PROVIDER: 'postgres'
DB_NAME: 'cognee_db'
DB_HOST: '127.0.0.1'
DB_PORT: 5432
DB_USERNAME: cognee
DB_PASSWORD: cognee
run: uv run python ./cognee/tests/test_conversation_history.py
run_conversation_sessions_test_fs:
name: Conversation sessions test (FS)
runs-on: ubuntu-latest
defaults:
run:
shell: bash
services:
postgres:
image: pgvector/pgvector:pg17
env:
POSTGRES_USER: cognee
POSTGRES_PASSWORD: cognee
POSTGRES_DB: cognee_db
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
- 5432:5432
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
extra-dependencies: "postgres"
- name: Run Conversation session tests (FS)
env:
ENV: dev
LLM_MODEL: ${{ secrets.LLM_MODEL }}
@ -440,6 +599,7 @@ jobs:
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
GRAPH_DATABASE_PROVIDER: 'kuzu'
CACHING: true
CACHE_BACKEND: 'fs'
DB_PROVIDER: 'postgres'
DB_NAME: 'cognee_db'
DB_HOST: '127.0.0.1'
@ -447,3 +607,30 @@ jobs:
DB_USERNAME: cognee
DB_PASSWORD: cognee
run: uv run python ./cognee/tests/test_conversation_history.py
run-pipeline-cache-test:
name: Test Pipeline Caching
runs-on: ubuntu-22.04
steps:
- name: Check out
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
- name: Run Pipeline Cache Test
env:
ENV: 'dev'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_pipeline_cache.py

View file

@ -21,6 +21,7 @@ jobs:
- name: Run Multimedia Example
env:
ENV: 'dev'
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: uv run python ./examples/python/multimedia_example.py
@ -40,6 +41,7 @@ jobs:
- name: Run Evaluation Framework Example
env:
ENV: 'dev'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
@ -69,6 +71,8 @@ jobs:
- name: Run Descriptive Graph Metrics Example
env:
ENV: 'dev'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
@ -99,6 +103,7 @@ jobs:
- name: Run Dynamic Steps Tests
env:
ENV: 'dev'
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
@ -124,6 +129,7 @@ jobs:
- name: Run Temporal Example
env:
ENV: 'dev'
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
@ -149,6 +155,7 @@ jobs:
- name: Run Ontology Demo Example
env:
ENV: 'dev'
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
@ -174,6 +181,7 @@ jobs:
- name: Run Agentic Reasoning Example
env:
ENV: 'dev'
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
@ -199,6 +207,7 @@ jobs:
- name: Run Memify Tests
env:
ENV: 'dev'
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
@ -224,6 +233,7 @@ jobs:
- name: Run Custom Pipeline Example
env:
ENV: 'dev'
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
@ -249,6 +259,7 @@ jobs:
- name: Run Memify Tests
env:
ENV: 'dev'
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
@ -274,6 +285,7 @@ jobs:
- name: Run Docling Test
env:
ENV: 'dev'
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}

View file

@ -78,6 +78,7 @@ jobs:
- name: Run default Neo4j
env:
ENV: 'dev'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}

70
.github/workflows/load_tests.yml vendored Normal file
View file

@ -0,0 +1,70 @@
name: Load tests
permissions:
contents: read
on:
workflow_dispatch:
workflow_call:
secrets:
LLM_MODEL:
required: true
LLM_ENDPOINT:
required: true
LLM_API_KEY:
required: true
LLM_API_VERSION:
required: true
EMBEDDING_MODEL:
required: true
EMBEDDING_ENDPOINT:
required: true
EMBEDDING_API_KEY:
required: true
EMBEDDING_API_VERSION:
required: true
OPENAI_API_KEY:
required: true
AWS_ACCESS_KEY_ID:
required: true
AWS_SECRET_ACCESS_KEY:
required: true
jobs:
test-load:
name: Test Load
runs-on: ubuntu-22.04
timeout-minutes: 60
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
extra-dependencies: "aws"
- name: Verify File Descriptor Limit
run: ulimit -n
- name: Run Load Test
env:
ENV: 'dev'
ENABLE_BACKEND_ACCESS_CONTROL: True
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
STORAGE_BACKEND: s3
AWS_REGION: eu-west-1
AWS_ENDPOINT_URL: https://s3-eu-west-1.amazonaws.com
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_S3_DEV_USER_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_S3_DEV_USER_SECRET_KEY }}
run: uv run python ./cognee/tests/test_load.py

22
.github/workflows/pre_test.yml vendored Normal file
View file

@ -0,0 +1,22 @@
on:
workflow_call:
permissions:
contents: read
jobs:
check-uv-lock:
name: Validate uv lockfile and project metadata
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install uv
uses: astral-sh/setup-uv@v4
with:
enable-cache: true
- name: Validate uv lockfile and project metadata
run: uv lock --check || { echo "'uv lock --check' failed."; echo "Run 'uv lock' and push your changes."; exit 1; }

138
.github/workflows/release.yml vendored Normal file
View file

@ -0,0 +1,138 @@
name: release.yml
on:
workflow_dispatch:
inputs:
flavour:
required: true
default: dev
type: choice
options:
- dev
- main
description: Dev or Main release
jobs:
release-github:
name: Create GitHub Release from ${{ inputs.flavour }}
outputs:
tag: ${{ steps.create_tag.outputs.tag }}
version: ${{ steps.create_tag.outputs.version }}
permissions:
contents: write
runs-on: ubuntu-latest
steps:
- name: Check out ${{ inputs.flavour }}
uses: actions/checkout@v4
with:
ref: ${{ inputs.flavour }}
- name: Install uv
uses: astral-sh/setup-uv@v7
- name: Create and push git tag
id: create_tag
run: |
VERSION="$(uv version --short)"
TAG="v${VERSION}"
echo "Tag to create: ${TAG}"
git config user.name "github-actions[bot]"
git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
echo "tag=${TAG}" >> "$GITHUB_OUTPUT"
echo "version=${VERSION}" >> "$GITHUB_OUTPUT"
git tag "${TAG}"
git push origin "${TAG}"
- name: Create GitHub Release
uses: softprops/action-gh-release@v2
with:
tag_name: ${{ steps.create_tag.outputs.tag }}
generate_release_notes: true
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
release-pypi-package:
needs: release-github
name: Release PyPI Package from ${{ inputs.flavour }}
permissions:
contents: read
runs-on: ubuntu-latest
steps:
- name: Check out ${{ inputs.flavour }}
uses: actions/checkout@v4
with:
ref: ${{ inputs.flavour }}
- name: Install uv
uses: astral-sh/setup-uv@v7
- name: Install Python
run: uv python install
- name: Install dependencies
run: uv sync --locked --all-extras
- name: Build distributions
run: uv build
- name: Publish ${{ inputs.flavour }} release to PyPI
env:
UV_PUBLISH_TOKEN: ${{ secrets.PYPI_TOKEN }}
run: uv publish
release-docker-image:
needs: release-github
name: Release Docker Image from ${{ inputs.flavour }}
permissions:
contents: read
runs-on: ubuntu-latest
steps:
- name: Check out ${{ inputs.flavour }}
uses: actions/checkout@v4
with:
ref: ${{ inputs.flavour }}
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Log in to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Build and push Dev Docker Image
if: ${{ inputs.flavour == 'dev' }}
uses: docker/build-push-action@v5
with:
context: .
platforms: linux/amd64,linux/arm64
push: true
tags: cognee/cognee:${{ needs.release-github.outputs.version }}
labels: |
version=${{ needs.release-github.outputs.version }}
flavour=${{ inputs.flavour }}
cache-from: type=registry,ref=cognee/cognee:buildcache
cache-to: type=registry,ref=cognee/cognee:buildcache,mode=max
- name: Build and push Main Docker Image
if: ${{ inputs.flavour == 'main' }}
uses: docker/build-push-action@v5
with:
context: .
platforms: linux/amd64,linux/arm64
push: true
tags: |
cognee/cognee:${{ needs.release-github.outputs.version }}
cognee/cognee:latest
labels: |
version=${{ needs.release-github.outputs.version }}
flavour=${{ inputs.flavour }}
cache-from: type=registry,ref=cognee/cognee:buildcache
cache-to: type=registry,ref=cognee/cognee:buildcache,mode=max

17
.github/workflows/release_test.yml vendored Normal file
View file

@ -0,0 +1,17 @@
# Long-running, heavy and resource-consuming tests for release validation
name: Release Test Workflow
permissions:
contents: read
on:
workflow_dispatch:
pull_request:
branches:
- main
jobs:
load-tests:
name: Load Tests
uses: ./.github/workflows/load_tests.yml
secrets: inherit

View file

@ -84,6 +84,7 @@ jobs:
GRAPH_DATABASE_PROVIDER: 'neo4j'
VECTOR_DB_PROVIDER: 'lancedb'
DB_PROVIDER: 'sqlite'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
GRAPH_DATABASE_URL: ${{ steps.neo4j.outputs.neo4j-url }}
GRAPH_DATABASE_USERNAME: ${{ steps.neo4j.outputs.neo4j-username }}
GRAPH_DATABASE_PASSWORD: ${{ steps.neo4j.outputs.neo4j-password }}
@ -135,6 +136,7 @@ jobs:
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
GRAPH_DATABASE_PROVIDER: 'kuzu'
VECTOR_DB_PROVIDER: 'pgvector'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
DB_PROVIDER: 'postgres'
DB_NAME: 'cognee_db'
DB_HOST: '127.0.0.1'
@ -197,6 +199,7 @@ jobs:
GRAPH_DATABASE_URL: ${{ steps.neo4j.outputs.neo4j-url }}
GRAPH_DATABASE_USERNAME: ${{ steps.neo4j.outputs.neo4j-username }}
GRAPH_DATABASE_PASSWORD: ${{ steps.neo4j.outputs.neo4j-password }}
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
DB_NAME: cognee_db
DB_HOST: 127.0.0.1
DB_PORT: 5432

View file

@ -72,6 +72,7 @@ jobs:
- name: Run Temporal Graph with Neo4j (lancedb + sqlite)
env:
ENV: 'dev'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.OPENAI_MODEL }}
LLM_ENDPOINT: ${{ secrets.OPENAI_ENDPOINT }}
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
@ -123,6 +124,7 @@ jobs:
- name: Run Temporal Graph with Kuzu (postgres + pgvector)
env:
ENV: dev
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.OPENAI_MODEL }}
LLM_ENDPOINT: ${{ secrets.OPENAI_ENDPOINT }}
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
@ -189,6 +191,7 @@ jobs:
- name: Run Temporal Graph with Neo4j (postgres + pgvector)
env:
ENV: dev
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.OPENAI_MODEL }}
LLM_ENDPOINT: ${{ secrets.OPENAI_ENDPOINT }}
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}

View file

@ -84,3 +84,93 @@ jobs:
EMBEDDING_DIMENSIONS: "3072"
EMBEDDING_MAX_TOKENS: "8191"
run: uv run python ./examples/python/simple_example.py
test-bedrock-api-key:
name: Run Bedrock API Key Test
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
extra-dependencies: "aws"
- name: Run Bedrock API Key Simple Example
env:
LLM_PROVIDER: "bedrock"
LLM_API_KEY: ${{ secrets.BEDROCK_API_KEY }}
LLM_MODEL: "eu.anthropic.claude-sonnet-4-5-20250929-v1:0"
LLM_MAX_TOKENS: "16384"
AWS_REGION_NAME: "eu-west-1"
EMBEDDING_PROVIDER: "bedrock"
EMBEDDING_API_KEY: ${{ secrets.BEDROCK_API_KEY }}
EMBEDDING_MODEL: "amazon.titan-embed-text-v2:0"
EMBEDDING_DIMENSIONS: "1024"
EMBEDDING_MAX_TOKENS: "8191"
run: uv run python ./examples/python/simple_example.py
test-bedrock-aws-credentials:
name: Run Bedrock AWS Credentials Test
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
extra-dependencies: "aws"
- name: Run Bedrock AWS Credentials Simple Example
env:
LLM_PROVIDER: "bedrock"
LLM_MODEL: "eu.anthropic.claude-sonnet-4-5-20250929-v1:0"
LLM_MAX_TOKENS: "16384"
AWS_REGION_NAME: "eu-west-1"
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
EMBEDDING_PROVIDER: "bedrock"
EMBEDDING_API_KEY: ${{ secrets.BEDROCK_API_KEY }}
EMBEDDING_MODEL: "amazon.titan-embed-text-v2:0"
EMBEDDING_DIMENSIONS: "1024"
EMBEDDING_MAX_TOKENS: "8191"
run: uv run python ./examples/python/simple_example.py
test-bedrock-aws-profile:
name: Run Bedrock AWS Profile Test
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
extra-dependencies: "aws"
- name: Configure AWS Profile
run: |
mkdir -p ~/.aws
cat > ~/.aws/credentials << EOF
[bedrock-test]
aws_access_key_id = ${{ secrets.AWS_ACCESS_KEY_ID }}
aws_secret_access_key = ${{ secrets.AWS_SECRET_ACCESS_KEY }}
EOF
- name: Run Bedrock AWS Profile Simple Example
env:
LLM_PROVIDER: "bedrock"
LLM_MODEL: "eu.anthropic.claude-sonnet-4-5-20250929-v1:0"
LLM_MAX_TOKENS: "16384"
AWS_PROFILE_NAME: "bedrock-test"
AWS_REGION_NAME: "eu-west-1"
EMBEDDING_PROVIDER: "bedrock"
EMBEDDING_MODEL: "amazon.titan-embed-text-v2:0"
EMBEDDING_DIMENSIONS: "1024"
EMBEDDING_MAX_TOKENS: "8191"
run: uv run python ./examples/python/simple_example.py

View file

@ -7,13 +7,8 @@ jobs:
run_ollama_test:
# needs 16 Gb RAM for phi4
runs-on: buildjet-4vcpu-ubuntu-2204
# services:
# ollama:
# image: ollama/ollama
# ports:
# - 11434:11434
# needs 32 Gb RAM for phi4 in a container
runs-on: buildjet-8vcpu-ubuntu-2204
steps:
- name: Checkout repository
@ -28,14 +23,6 @@ jobs:
run: |
uv add torch
# - name: Install ollama
# run: curl -fsSL https://ollama.com/install.sh | sh
# - name: Run ollama
# run: |
# ollama serve --openai &
# ollama pull llama3.2 &
# ollama pull avr/sfr-embedding-mistral:latest
- name: Start Ollama container
run: |
docker run -d --name ollama -p 11434:11434 ollama/ollama

View file

@ -18,15 +18,21 @@ env:
RUNTIME__LOG_LEVEL: ERROR
ENV: 'dev'
jobs:
jobs:
pre-test:
name: basic checks
uses: ./.github/workflows/pre_test.yml
basic-tests:
name: Basic Tests
uses: ./.github/workflows/basic_tests.yml
needs: [ pre-test ]
secrets: inherit
e2e-tests:
name: End-to-End Tests
uses: ./.github/workflows/e2e_tests.yml
needs: [ pre-test ]
secrets: inherit
distributed-tests:

View file

@ -92,6 +92,7 @@ jobs:
- name: Run PGVector Tests
env:
ENV: 'dev'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
@ -127,4 +128,4 @@ jobs:
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_lancedb.py
run: uv run python ./cognee/tests/test_lancedb.py

View file

@ -2,7 +2,7 @@ name: Weighted Edges Tests
on:
push:
branches: [ main, weighted_edges ]
branches: [ main, dev, weighted_edges ]
paths:
- 'cognee/modules/graph/utils/get_graph_from_model.py'
- 'cognee/infrastructure/engine/models/Edge.py'
@ -10,7 +10,7 @@ on:
- 'examples/python/weighted_edges_example.py'
- '.github/workflows/weighted_edges_tests.yml'
pull_request:
branches: [ main ]
branches: [ main, dev ]
paths:
- 'cognee/modules/graph/utils/get_graph_from_model.py'
- 'cognee/infrastructure/engine/models/Edge.py'
@ -32,7 +32,7 @@ jobs:
env:
LLM_PROVIDER: openai
LLM_MODEL: gpt-5-mini
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
steps:
- name: Check out repository
@ -67,14 +67,13 @@ jobs:
env:
LLM_PROVIDER: openai
LLM_MODEL: gpt-5-mini
LLM_ENDPOINT: https://api.openai.com/v1/
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_ENDPOINT: https://api.openai.com/v1
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
LLM_API_VERSION: "2024-02-01"
EMBEDDING_PROVIDER: openai
EMBEDDING_MODEL: text-embedding-3-small
EMBEDDING_ENDPOINT: https://api.openai.com/v1/
EMBEDDING_API_KEY: ${{ secrets.LLM_API_KEY }}
EMBEDDING_API_VERSION: "2024-02-01"
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
steps:
- name: Check out repository
uses: actions/checkout@v4
@ -95,6 +94,7 @@ jobs:
- name: Run Weighted Edges Tests
env:
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
GRAPH_DATABASE_PROVIDER: ${{ matrix.graph_db_provider }}
GRAPH_DATABASE_URL: ${{ matrix.graph_db_provider == 'neo4j' && steps.neo4j.outputs.neo4j-url || '' }}
GRAPH_DATABASE_USERNAME: ${{ matrix.graph_db_provider == 'neo4j' && steps.neo4j.outputs.neo4j-username || '' }}
@ -108,14 +108,14 @@ jobs:
env:
LLM_PROVIDER: openai
LLM_MODEL: gpt-5-mini
LLM_ENDPOINT: https://api.openai.com/v1/
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_ENDPOINT: https://api.openai.com/v1
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
LLM_API_VERSION: "2024-02-01"
EMBEDDING_PROVIDER: openai
EMBEDDING_MODEL: text-embedding-3-small
EMBEDDING_ENDPOINT: https://api.openai.com/v1/
EMBEDDING_API_KEY: ${{ secrets.LLM_API_KEY }}
EMBEDDING_API_VERSION: "2024-02-01"
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
steps:
- name: Check out repository
uses: actions/checkout@v4
@ -166,5 +166,3 @@ jobs:
uses: astral-sh/ruff-action@v2
with:
args: "format --check cognee/modules/graph/utils/get_graph_from_model.py cognee/tests/unit/interfaces/graph/test_weighted_edges.py examples/python/weighted_edges_example.py"

9
.mergify.yml Normal file
View file

@ -0,0 +1,9 @@
pull_request_rules:
- name: Backport to main when backport_main label is set
conditions:
- label=backport_main
- base=dev
actions:
backport:
branches:
- main

View file

@ -71,7 +71,7 @@ git clone https://github.com/<your-github-username>/cognee.git
cd cognee
```
In case you are working on Vector and Graph Adapters
1. Fork the [**cognee**](https://github.com/topoteretes/cognee-community) repository
1. Fork the [**cognee-community**](https://github.com/topoteretes/cognee-community) repository
2. Clone your fork:
```shell
git clone https://github.com/<your-github-username>/cognee-community.git
@ -97,6 +97,21 @@ git checkout -b feature/your-feature-name
python cognee/cognee/tests/test_library.py
```
### Running Simple Example
Change .env.example into .env and provide your OPENAI_API_KEY as LLM_API_KEY
Make sure to run ```shell uv sync ``` in the root cloned folder or set up a virtual environment to run cognee
```shell
python cognee/cognee/examples/python/simple_example.py
```
or
```shell
uv run python cognee/cognee/examples/python/simple_example.py
```
## 4. 📤 Submitting Changes
1. Install ruff on your system

163
README.md
View file

@ -5,27 +5,27 @@
<br />
cognee - Memory for AI Agents in 6 lines of code
Cognee - Accurate and Persistent AI Memory
<p align="center">
<a href="https://www.youtube.com/watch?v=1bezuvLwJmw&t=2s">Demo</a>
.
<a href="https://cognee.ai">Learn more</a>
<a href="https://docs.cognee.ai/">Docs</a>
.
<a href="https://cognee.ai">Learn More</a>
·
<a href="https://discord.gg/NQPKmU5CCg">Join Discord</a>
·
<a href="https://www.reddit.com/r/AIMemory/">Join r/AIMemory</a>
.
<a href="https://docs.cognee.ai/">Docs</a>
.
<a href="https://github.com/topoteretes/cognee-community">cognee community repo</a>
<a href="https://github.com/topoteretes/cognee-community">Community Plugins & Add-ons</a>
</p>
[![GitHub forks](https://img.shields.io/github/forks/topoteretes/cognee.svg?style=social&label=Fork&maxAge=2592000)](https://GitHub.com/topoteretes/cognee/network/)
[![GitHub stars](https://img.shields.io/github/stars/topoteretes/cognee.svg?style=social&label=Star&maxAge=2592000)](https://GitHub.com/topoteretes/cognee/stargazers/)
[![GitHub commits](https://badgen.net/github/commits/topoteretes/cognee)](https://GitHub.com/topoteretes/cognee/commit/)
[![Github tag](https://badgen.net/github/tag/topoteretes/cognee)](https://github.com/topoteretes/cognee/tags/)
[![GitHub tag](https://badgen.net/github/tag/topoteretes/cognee)](https://github.com/topoteretes/cognee/tags/)
[![Downloads](https://static.pepy.tech/badge/cognee)](https://pepy.tech/project/cognee)
[![License](https://img.shields.io/github/license/topoteretes/cognee?colorA=00C586&colorB=000000)](https://github.com/topoteretes/cognee/blob/main/LICENSE)
[![Contributors](https://img.shields.io/github/contributors/topoteretes/cognee?colorA=00C586&colorB=000000)](https://github.com/topoteretes/cognee/graphs/contributors)
@ -41,11 +41,7 @@
</a>
</p>
Build dynamic memory for Agents and replace RAG using scalable, modular ECL (Extract, Cognify, Load) pipelines.
Use your data to build personalized and dynamic memory for AI Agents. Cognee lets you replace RAG with scalable and modular ECL (Extract, Cognify, Load) pipelines.
<p align="center">
🌐 Available Languages
@ -53,7 +49,7 @@ Build dynamic memory for Agents and replace RAG using scalable, modular ECL (Ext
<!-- Keep these links. Translations will automatically update with the README. -->
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=de">Deutsch</a> |
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=es">Español</a> |
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=fr">français</a> |
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=fr">Français</a> |
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=ja">日本語</a> |
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=ko">한국어</a> |
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=pt">Português</a> |
@ -67,73 +63,62 @@ Build dynamic memory for Agents and replace RAG using scalable, modular ECL (Ext
</div>
</div>
## About Cognee
Cognee is an open-source tool and platform that transforms your raw data into persistent and dynamic AI memory for Agents. It combines vector search with graph databases to make your documents both searchable by meaning and connected by relationships.
Cognee offers default memory creation and search which we describe bellow. But with Cognee you can build your own!
## Get Started
### Cognee Open Source:
Get started quickly with a Google Colab <a href="https://colab.research.google.com/drive/12Vi9zID-M3fpKpKiaqDBvkk98ElkRPWy?usp=sharing">notebook</a> , <a href="https://deepnote.com/workspace/cognee-382213d0-0444-4c89-8265-13770e333c02/project/cognee-demo-78ffacb9-5832-4611-bb1a-560386068b30/notebook/Notebook-1-75b24cda566d4c24ab348f7150792601?utm_source=share-modal&utm_medium=product-shared-content&utm_campaign=notebook&utm_content=78ffacb9-5832-4611-bb1a-560386068b30">Deepnote notebook</a> or <a href="https://github.com/topoteretes/cognee/tree/main/cognee-starter-kit">starter repo</a>
- Interconnects any type of data — including past conversations, files, images, and audio transcriptions
- Replaces traditional RAG systems with a unified memory layer built on graphs and vectors
- Reduces developer effort and infrastructure cost while improving quality and precision
- Provides Pythonic data pipelines for ingestion from 30+ data sources
- Offers high customizability through user-defined tasks, modular pipelines, and built-in search endpoints
## About cognee
## Basic Usage & Feature Guide
cognee works locally and stores your data on your device.
Our hosted solution is just our deployment of OSS cognee on Modal, with the goal of making development and productionization easier.
To learn more, [check out this short, end-to-end Colab walkthrough](https://colab.research.google.com/drive/12Vi9zID-M3fpKpKiaqDBvkk98ElkRPWy?usp=sharing) of Cognee's core features.
Self-hosted package:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/12Vi9zID-M3fpKpKiaqDBvkk98ElkRPWy?usp=sharing)
- Interconnects any kind of documents: past conversations, files, images, and audio transcriptions
- Replaces RAG systems with a memory layer based on graphs and vectors
- Reduces developer effort and cost, while increasing quality and precision
- Provides Pythonic data pipelines that manage data ingestion from 30+ data sources
- Is highly customizable with custom tasks, pipelines, and a set of built-in search endpoints
## Quickstart
Hosted platform:
- Includes a managed UI and a [hosted solution](https://www.cognee.ai)
Lets try Cognee in just a few lines of code. For detailed setup and configuration, see the [Cognee Docs](https://docs.cognee.ai/getting-started/installation#environment-configuration).
### Prerequisites
- Python 3.10 to 3.13
## Self-Hosted (Open Source)
### Step 1: Install Cognee
### 📦 Installation
You can install Cognee using either **pip**, **poetry**, **uv** or any other python package manager..
Cognee supports Python 3.10 to 3.12
#### With uv
You can install Cognee with **pip**, **poetry**, **uv**, or your preferred Python package manager.
```bash
uv pip install cognee
```
Detailed instructions can be found in our [docs](https://docs.cognee.ai/getting-started/installation#environment-configuration)
### 💻 Basic Usage
#### Setup
```
### Step 2: Configure the LLM
```python
import os
os.environ["LLM_API_KEY"] = "YOUR OPENAI_API_KEY"
```
Alternatively, create a `.env` file using our [template](https://github.com/topoteretes/cognee/blob/main/.env.template).
You can also set the variables by creating .env file, using our <a href="https://github.com/topoteretes/cognee/blob/main/.env.template">template.</a>
To use different LLM providers, for more info check out our <a href="https://docs.cognee.ai/setup-configuration/llm-providers">documentation</a>
To integrate other LLM providers, see our [LLM Provider Documentation](https://docs.cognee.ai/setup-configuration/llm-providers).
### Step 3: Run the Pipeline
#### Simple example
Cognee will take your documents, generate a knowledge graph from them and then query the graph based on combined relationships.
##### Python
This script will run the default pipeline:
Now, run a minimal pipeline:
```python
import cognee
import asyncio
from pprint import pprint
async def main():
@ -147,80 +132,72 @@ async def main():
await cognee.memify()
# Query the knowledge graph
results = await cognee.search("What does cognee do?")
results = await cognee.search("What does Cognee do?")
# Display the results
for result in results:
print(result)
pprint(result)
if __name__ == '__main__':
asyncio.run(main())
```
Example output:
```
As you can see, the output is generated from the document we previously stored in Cognee:
```bash
Cognee turns documents into AI memory.
```
##### Via CLI
Let's get the basics covered
### Use the Cognee CLI
```
As an alternative, you can get started with these essential commands:
```bash
cognee-cli add "Cognee turns documents into AI memory."
cognee-cli cognify
cognee-cli search "What does cognee do?"
cognee-cli search "What does Cognee do?"
cognee-cli delete --all
```
or run
```
To open the local UI, run:
```bash
cognee-cli -ui
```
## Demos & Examples
</div>
See Cognee in action:
### Persistent Agent Memory
[Cognee Memory for LangGraph Agents](https://github.com/user-attachments/assets/e113b628-7212-4a2b-b288-0be39a93a1c3)
### Simple GraphRAG
[Watch Demo](https://github.com/user-attachments/assets/f2186b2e-305a-42b0-9c2d-9f4473f15df8)
### Cognee with Ollama
[Watch Demo](https://github.com/user-attachments/assets/39672858-f774-4136-b957-1e2de67b8981)
### Hosted Platform
## Community & Support
Get up and running in minutes with automatic updates, analytics, and enterprise security.
### Contributing
We welcome contributions from the community! Your input helps make Cognee better for everyone. See [`CONTRIBUTING.md`](CONTRIBUTING.md) to get started.
1. Sign up on [cogwit](https://www.cognee.ai)
2. Add your API key to local UI and sync your data to Cogwit
### Code of Conduct
We're committed to fostering an inclusive and respectful community. Read our [Code of Conduct](https://github.com/topoteretes/cognee/blob/main/CODE_OF_CONDUCT.md) for guidelines.
## Research & Citation
## Demos
1. Cogwit Beta demo:
[Cogwit Beta](https://github.com/user-attachments/assets/fa520cd2-2913-4246-a444-902ea5242cb0)
2. Simple GraphRAG demo
[Simple GraphRAG demo](https://github.com/user-attachments/assets/d80b0776-4eb9-4b8e-aa22-3691e2d44b8f)
3. cognee with Ollama
[cognee with local models](https://github.com/user-attachments/assets/8621d3e8-ecb8-4860-afb2-5594f2ee17db)
## Contributing
Your contributions are at the core of making this a true open source project. Any contributions you make are **greatly appreciated**. See [`CONTRIBUTING.md`](CONTRIBUTING.md) for more information.
## Code of Conduct
We are committed to making open source an enjoyable and respectful experience for our community. See <a href="https://github.com/topoteretes/cognee/blob/main/CODE_OF_CONDUCT.md"><code>CODE_OF_CONDUCT</code></a> for more information.
## Citation
We now have a paper you can cite:
We recently published a research paper on optimizing knowledge graphs for LLM reasoning:
```bibtex
@misc{markovic2025optimizinginterfaceknowledgegraphs,

View file

@ -87,11 +87,6 @@ db_engine = get_relational_engine()
print("Using database:", db_engine.db_uri)
if "sqlite" in db_engine.db_uri:
from cognee.infrastructure.utils.run_sync import run_sync
run_sync(db_engine.create_database())
config.set_section_option(
config.config_ini_section,
"SQLALCHEMY_DATABASE_URI",

View file

@ -10,6 +10,7 @@ from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
@ -26,7 +27,34 @@ def upgrade() -> None:
connection = op.get_bind()
inspector = sa.inspect(connection)
if op.get_context().dialect.name == "postgresql":
syncstatus_enum = postgresql.ENUM(
"STARTED", "IN_PROGRESS", "COMPLETED", "FAILED", "CANCELLED", name="syncstatus"
)
syncstatus_enum.create(op.get_bind(), checkfirst=True)
if "sync_operations" not in inspector.get_table_names():
if op.get_context().dialect.name == "postgresql":
syncstatus = postgresql.ENUM(
"STARTED",
"IN_PROGRESS",
"COMPLETED",
"FAILED",
"CANCELLED",
name="syncstatus",
create_type=False,
)
else:
syncstatus = sa.Enum(
"STARTED",
"IN_PROGRESS",
"COMPLETED",
"FAILED",
"CANCELLED",
name="syncstatus",
create_type=False,
)
# Table doesn't exist, create it normally
op.create_table(
"sync_operations",
@ -34,15 +62,7 @@ def upgrade() -> None:
sa.Column("run_id", sa.Text(), nullable=True),
sa.Column(
"status",
sa.Enum(
"STARTED",
"IN_PROGRESS",
"COMPLETED",
"FAILED",
"CANCELLED",
name="syncstatus",
create_type=False,
),
syncstatus,
nullable=True,
),
sa.Column("progress_percentage", sa.Integer(), nullable=True),

View file

@ -0,0 +1,333 @@
"""Expand dataset database with json connection field
Revision ID: 46a6ce2bd2b2
Revises: 76625596c5c3
Create Date: 2025-11-25 17:56:28.938931
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = "46a6ce2bd2b2"
down_revision: Union[str, None] = "76625596c5c3"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
graph_constraint_name = "dataset_database_graph_database_name_key"
vector_constraint_name = "dataset_database_vector_database_name_key"
TABLE_NAME = "dataset_database"
def _get_column(inspector, table, name, schema=None):
for col in inspector.get_columns(table, schema=schema):
if col["name"] == name:
return col
return None
def _recreate_table_without_unique_constraint_sqlite(op, insp):
"""
SQLite cannot drop unique constraints on individual columns. We must:
1. Create a new table without the unique constraints.
2. Copy data from the old table.
3. Drop the old table.
4. Rename the new table.
"""
conn = op.get_bind()
# Create new table definition (without unique constraints)
op.create_table(
f"{TABLE_NAME}_new",
sa.Column("owner_id", sa.UUID()),
sa.Column("dataset_id", sa.UUID(), primary_key=True, nullable=False),
sa.Column("vector_database_name", sa.String(), nullable=False),
sa.Column("graph_database_name", sa.String(), nullable=False),
sa.Column("vector_database_provider", sa.String(), nullable=False),
sa.Column("graph_database_provider", sa.String(), nullable=False),
sa.Column(
"vector_dataset_database_handler",
sa.String(),
unique=False,
nullable=False,
server_default="lancedb",
),
sa.Column(
"graph_dataset_database_handler",
sa.String(),
unique=False,
nullable=False,
server_default="kuzu",
),
sa.Column("vector_database_url", sa.String()),
sa.Column("graph_database_url", sa.String()),
sa.Column("vector_database_key", sa.String()),
sa.Column("graph_database_key", sa.String()),
sa.Column(
"graph_database_connection_info",
sa.JSON(),
nullable=False,
server_default=sa.text("'{}'"),
),
sa.Column(
"vector_database_connection_info",
sa.JSON(),
nullable=False,
server_default=sa.text("'{}'"),
),
sa.Column("created_at", sa.DateTime()),
sa.Column("updated_at", sa.DateTime()),
sa.ForeignKeyConstraint(["dataset_id"], ["datasets.id"], ondelete="CASCADE"),
sa.ForeignKeyConstraint(["owner_id"], ["principals.id"], ondelete="CASCADE"),
)
# Copy data into new table
conn.execute(
sa.text(f"""
INSERT INTO {TABLE_NAME}_new
SELECT
owner_id,
dataset_id,
vector_database_name,
graph_database_name,
vector_database_provider,
graph_database_provider,
vector_dataset_database_handler,
graph_dataset_database_handler,
vector_database_url,
graph_database_url,
vector_database_key,
graph_database_key,
COALESCE(graph_database_connection_info, '{{}}'),
COALESCE(vector_database_connection_info, '{{}}'),
created_at,
updated_at
FROM {TABLE_NAME}
""")
)
# Drop old table
op.drop_table(TABLE_NAME)
# Rename new table
op.rename_table(f"{TABLE_NAME}_new", TABLE_NAME)
def _recreate_table_with_unique_constraint_sqlite(op, insp):
"""
SQLite cannot drop unique constraints on individual columns. We must:
1. Create a new table without the unique constraints.
2. Copy data from the old table.
3. Drop the old table.
4. Rename the new table.
"""
conn = op.get_bind()
# Create new table definition (without unique constraints)
op.create_table(
f"{TABLE_NAME}_new",
sa.Column("owner_id", sa.UUID()),
sa.Column("dataset_id", sa.UUID(), primary_key=True, nullable=False),
sa.Column("vector_database_name", sa.String(), nullable=False, unique=True),
sa.Column("graph_database_name", sa.String(), nullable=False, unique=True),
sa.Column("vector_database_provider", sa.String(), nullable=False),
sa.Column("graph_database_provider", sa.String(), nullable=False),
sa.Column(
"vector_dataset_database_handler",
sa.String(),
unique=False,
nullable=False,
server_default="lancedb",
),
sa.Column(
"graph_dataset_database_handler",
sa.String(),
unique=False,
nullable=False,
server_default="kuzu",
),
sa.Column("vector_database_url", sa.String()),
sa.Column("graph_database_url", sa.String()),
sa.Column("vector_database_key", sa.String()),
sa.Column("graph_database_key", sa.String()),
sa.Column(
"graph_database_connection_info",
sa.JSON(),
nullable=False,
server_default=sa.text("'{}'"),
),
sa.Column(
"vector_database_connection_info",
sa.JSON(),
nullable=False,
server_default=sa.text("'{}'"),
),
sa.Column("created_at", sa.DateTime()),
sa.Column("updated_at", sa.DateTime()),
sa.ForeignKeyConstraint(["dataset_id"], ["datasets.id"], ondelete="CASCADE"),
sa.ForeignKeyConstraint(["owner_id"], ["principals.id"], ondelete="CASCADE"),
)
# Copy data into new table
conn.execute(
sa.text(f"""
INSERT INTO {TABLE_NAME}_new
SELECT
owner_id,
dataset_id,
vector_database_name,
graph_database_name,
vector_database_provider,
graph_database_provider,
vector_dataset_database_handler,
graph_dataset_database_handler,
vector_database_url,
graph_database_url,
vector_database_key,
graph_database_key,
COALESCE(graph_database_connection_info, '{{}}'),
COALESCE(vector_database_connection_info, '{{}}'),
created_at,
updated_at
FROM {TABLE_NAME}
""")
)
# Drop old table
op.drop_table(TABLE_NAME)
# Rename new table
op.rename_table(f"{TABLE_NAME}_new", TABLE_NAME)
def upgrade() -> None:
conn = op.get_bind()
insp = sa.inspect(conn)
unique_constraints = insp.get_unique_constraints(TABLE_NAME)
vector_database_connection_info_column = _get_column(
insp, "dataset_database", "vector_database_connection_info"
)
if not vector_database_connection_info_column:
op.add_column(
"dataset_database",
sa.Column(
"vector_database_connection_info",
sa.JSON(),
unique=False,
nullable=False,
server_default=sa.text("'{}'"),
),
)
vector_dataset_database_handler = _get_column(
insp, "dataset_database", "vector_dataset_database_handler"
)
if not vector_dataset_database_handler:
# Add LanceDB as the default graph dataset database handler
op.add_column(
"dataset_database",
sa.Column(
"vector_dataset_database_handler",
sa.String(),
unique=False,
nullable=False,
server_default="lancedb",
),
)
graph_database_connection_info_column = _get_column(
insp, "dataset_database", "graph_database_connection_info"
)
if not graph_database_connection_info_column:
op.add_column(
"dataset_database",
sa.Column(
"graph_database_connection_info",
sa.JSON(),
unique=False,
nullable=False,
server_default=sa.text("'{}'"),
),
)
graph_dataset_database_handler = _get_column(
insp, "dataset_database", "graph_dataset_database_handler"
)
if not graph_dataset_database_handler:
# Add Kuzu as the default graph dataset database handler
op.add_column(
"dataset_database",
sa.Column(
"graph_dataset_database_handler",
sa.String(),
unique=False,
nullable=False,
server_default="kuzu",
),
)
with op.batch_alter_table("dataset_database", schema=None) as batch_op:
# Drop the unique constraint to make unique=False
graph_constraint_to_drop = None
for uc in unique_constraints:
# Check if the constraint covers ONLY the target column
if uc["name"] == graph_constraint_name:
graph_constraint_to_drop = uc["name"]
break
vector_constraint_to_drop = None
for uc in unique_constraints:
# Check if the constraint covers ONLY the target column
if uc["name"] == vector_constraint_name:
vector_constraint_to_drop = uc["name"]
break
if (
vector_constraint_to_drop
and graph_constraint_to_drop
and op.get_context().dialect.name == "postgresql"
):
# PostgreSQL
batch_op.drop_constraint(graph_constraint_name, type_="unique")
batch_op.drop_constraint(vector_constraint_name, type_="unique")
if op.get_context().dialect.name == "sqlite":
conn = op.get_bind()
# Fun fact: SQLite has hidden auto indexes for unique constraints that can't be dropped or accessed directly
# So we need to check for them and drop them by recreating the table (altering column also won't work)
result = conn.execute(sa.text("PRAGMA index_list('dataset_database')"))
rows = result.fetchall()
unique_auto_indexes = [row for row in rows if row[3] == "u"]
for row in unique_auto_indexes:
result = conn.execute(sa.text(f"PRAGMA index_info('{row[1]}')"))
index_info = result.fetchall()
if index_info[0][2] == "vector_database_name":
# In case a unique index exists on vector_database_name, drop it and the graph_database_name one
_recreate_table_without_unique_constraint_sqlite(op, insp)
def downgrade() -> None:
conn = op.get_bind()
insp = sa.inspect(conn)
if op.get_context().dialect.name == "sqlite":
_recreate_table_with_unique_constraint_sqlite(op, insp)
elif op.get_context().dialect.name == "postgresql":
with op.batch_alter_table("dataset_database", schema=None) as batch_op:
# Re-add the unique constraint to return to unique=True
batch_op.create_unique_constraint(graph_constraint_name, ["graph_database_name"])
with op.batch_alter_table("dataset_database", schema=None) as batch_op:
# Re-add the unique constraint to return to unique=True
batch_op.create_unique_constraint(vector_constraint_name, ["vector_database_name"])
op.drop_column("dataset_database", "vector_database_connection_info")
op.drop_column("dataset_database", "graph_database_connection_info")
op.drop_column("dataset_database", "vector_dataset_database_handler")
op.drop_column("dataset_database", "graph_dataset_database_handler")

View file

@ -23,11 +23,8 @@ depends_on: Union[str, Sequence[str], None] = "8057ae7329c2"
def upgrade() -> None:
try:
await_only(create_default_user())
except UserAlreadyExists:
pass # It's fine if the default user already exists
pass
def downgrade() -> None:
await_only(delete_user("default_user@example.com"))
pass

View file

@ -0,0 +1,98 @@
"""Expand dataset database for multi user
Revision ID: 76625596c5c3
Revises: 211ab850ef3d
Create Date: 2025-10-30 12:55:20.239562
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = "76625596c5c3"
down_revision: Union[str, None] = "c946955da633"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def _get_column(inspector, table, name, schema=None):
for col in inspector.get_columns(table, schema=schema):
if col["name"] == name:
return col
return None
def upgrade() -> None:
conn = op.get_bind()
insp = sa.inspect(conn)
vector_database_provider_column = _get_column(
insp, "dataset_database", "vector_database_provider"
)
if not vector_database_provider_column:
op.add_column(
"dataset_database",
sa.Column(
"vector_database_provider",
sa.String(),
unique=False,
nullable=False,
server_default="lancedb",
),
)
graph_database_provider_column = _get_column(
insp, "dataset_database", "graph_database_provider"
)
if not graph_database_provider_column:
op.add_column(
"dataset_database",
sa.Column(
"graph_database_provider",
sa.String(),
unique=False,
nullable=False,
server_default="kuzu",
),
)
vector_database_url_column = _get_column(insp, "dataset_database", "vector_database_url")
if not vector_database_url_column:
op.add_column(
"dataset_database",
sa.Column("vector_database_url", sa.String(), unique=False, nullable=True),
)
graph_database_url_column = _get_column(insp, "dataset_database", "graph_database_url")
if not graph_database_url_column:
op.add_column(
"dataset_database",
sa.Column("graph_database_url", sa.String(), unique=False, nullable=True),
)
vector_database_key_column = _get_column(insp, "dataset_database", "vector_database_key")
if not vector_database_key_column:
op.add_column(
"dataset_database",
sa.Column("vector_database_key", sa.String(), unique=False, nullable=True),
)
graph_database_key_column = _get_column(insp, "dataset_database", "graph_database_key")
if not graph_database_key_column:
op.add_column(
"dataset_database",
sa.Column("graph_database_key", sa.String(), unique=False, nullable=True),
)
def downgrade() -> None:
op.drop_column("dataset_database", "vector_database_provider")
op.drop_column("dataset_database", "graph_database_provider")
op.drop_column("dataset_database", "vector_database_url")
op.drop_column("dataset_database", "graph_database_url")
op.drop_column("dataset_database", "vector_database_key")
op.drop_column("dataset_database", "graph_database_key")

View file

@ -18,11 +18,8 @@ depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
db_engine = get_relational_engine()
# we might want to delete this
await_only(db_engine.create_database())
pass
def downgrade() -> None:
db_engine = get_relational_engine()
await_only(db_engine.delete_database())
pass

View file

@ -144,44 +144,58 @@ def _create_data_permission(conn, user_id, data_id, permission_name):
)
def _get_column(inspector, table, name, schema=None):
for col in inspector.get_columns(table, schema=schema):
if col["name"] == name:
return col
return None
def upgrade() -> None:
conn = op.get_bind()
insp = sa.inspect(conn)
# Recreate ACLs table with default permissions set to datasets instead of documents
op.drop_table("acls")
dataset_id_column = _get_column(insp, "acls", "dataset_id")
if not dataset_id_column:
# Recreate ACLs table with default permissions set to datasets instead of documents
op.drop_table("acls")
acls_table = op.create_table(
"acls",
sa.Column("id", UUID, primary_key=True, default=uuid4),
sa.Column(
"created_at", sa.DateTime(timezone=True), default=lambda: datetime.now(timezone.utc)
),
sa.Column(
"updated_at", sa.DateTime(timezone=True), onupdate=lambda: datetime.now(timezone.utc)
),
sa.Column("principal_id", UUID, sa.ForeignKey("principals.id")),
sa.Column("permission_id", UUID, sa.ForeignKey("permissions.id")),
sa.Column("dataset_id", UUID, sa.ForeignKey("datasets.id", ondelete="CASCADE")),
)
acls_table = op.create_table(
"acls",
sa.Column("id", UUID, primary_key=True, default=uuid4),
sa.Column(
"created_at", sa.DateTime(timezone=True), default=lambda: datetime.now(timezone.utc)
),
sa.Column(
"updated_at",
sa.DateTime(timezone=True),
onupdate=lambda: datetime.now(timezone.utc),
),
sa.Column("principal_id", UUID, sa.ForeignKey("principals.id")),
sa.Column("permission_id", UUID, sa.ForeignKey("permissions.id")),
sa.Column("dataset_id", UUID, sa.ForeignKey("datasets.id", ondelete="CASCADE")),
)
# Note: We can't use any Cognee model info to gather data (as it can change) in database so we must use our own table
# definition or load what is in the database
dataset_table = _define_dataset_table()
datasets = conn.execute(sa.select(dataset_table)).fetchall()
# Note: We can't use any Cognee model info to gather data (as it can change) in database so we must use our own table
# definition or load what is in the database
dataset_table = _define_dataset_table()
datasets = conn.execute(sa.select(dataset_table)).fetchall()
if not datasets:
return
if not datasets:
return
acl_list = []
acl_list = []
for dataset in datasets:
acl_list.append(_create_dataset_permission(conn, dataset.owner_id, dataset.id, "read"))
acl_list.append(_create_dataset_permission(conn, dataset.owner_id, dataset.id, "write"))
acl_list.append(_create_dataset_permission(conn, dataset.owner_id, dataset.id, "share"))
acl_list.append(_create_dataset_permission(conn, dataset.owner_id, dataset.id, "delete"))
for dataset in datasets:
acl_list.append(_create_dataset_permission(conn, dataset.owner_id, dataset.id, "read"))
acl_list.append(_create_dataset_permission(conn, dataset.owner_id, dataset.id, "write"))
acl_list.append(_create_dataset_permission(conn, dataset.owner_id, dataset.id, "share"))
acl_list.append(
_create_dataset_permission(conn, dataset.owner_id, dataset.id, "delete")
)
if acl_list:
op.bulk_insert(acls_table, acl_list)
if acl_list:
op.bulk_insert(acls_table, acl_list)
def downgrade() -> None:

View file

@ -0,0 +1,137 @@
"""Multi Tenant Support
Revision ID: c946955da633
Revises: 211ab850ef3d
Create Date: 2025-11-04 18:11:09.325158
"""
from typing import Sequence, Union
from datetime import datetime, timezone
from uuid import uuid4
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = "c946955da633"
down_revision: Union[str, None] = "211ab850ef3d"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def _now():
return datetime.now(timezone.utc)
def _define_user_table() -> sa.Table:
table = sa.Table(
"users",
sa.MetaData(),
sa.Column(
"id",
sa.UUID,
sa.ForeignKey("principals.id", ondelete="CASCADE"),
primary_key=True,
nullable=False,
),
sa.Column("tenant_id", sa.UUID, sa.ForeignKey("tenants.id"), index=True, nullable=True),
)
return table
def _define_dataset_table() -> sa.Table:
# Note: We can't use any Cognee model info to gather data (as it can change) in database so we must use our own table
# definition or load what is in the database
table = sa.Table(
"datasets",
sa.MetaData(),
sa.Column("id", sa.UUID, primary_key=True, default=uuid4),
sa.Column("name", sa.Text),
sa.Column(
"created_at",
sa.DateTime(timezone=True),
default=lambda: datetime.now(timezone.utc),
),
sa.Column(
"updated_at",
sa.DateTime(timezone=True),
onupdate=lambda: datetime.now(timezone.utc),
),
sa.Column("owner_id", sa.UUID(), sa.ForeignKey("principals.id"), index=True),
sa.Column("tenant_id", sa.UUID(), sa.ForeignKey("tenants.id"), index=True, nullable=True),
)
return table
def _get_column(inspector, table, name, schema=None):
for col in inspector.get_columns(table, schema=schema):
if col["name"] == name:
return col
return None
def upgrade() -> None:
conn = op.get_bind()
insp = sa.inspect(conn)
dataset = _define_dataset_table()
user = _define_user_table()
if "user_tenants" not in insp.get_table_names():
# Define table with all necessary columns including primary key
user_tenants = op.create_table(
"user_tenants",
sa.Column("user_id", sa.UUID, sa.ForeignKey("users.id"), primary_key=True),
sa.Column("tenant_id", sa.UUID, sa.ForeignKey("tenants.id"), primary_key=True),
sa.Column(
"created_at", sa.DateTime(timezone=True), default=lambda: datetime.now(timezone.utc)
),
)
# Get all users with their tenant_id
user_data = conn.execute(
sa.select(user.c.id, user.c.tenant_id).where(user.c.tenant_id.isnot(None))
).fetchall()
# Insert into user_tenants table
if user_data:
op.bulk_insert(
user_tenants,
[
{"user_id": user_id, "tenant_id": tenant_id, "created_at": _now()}
for user_id, tenant_id in user_data
],
)
tenant_id_column = _get_column(insp, "datasets", "tenant_id")
if not tenant_id_column:
op.add_column("datasets", sa.Column("tenant_id", sa.UUID(), nullable=True))
# Build subquery, select users.tenant_id for each dataset.owner_id
tenant_id_from_dataset_owner = (
sa.select(user.c.tenant_id).where(user.c.id == dataset.c.owner_id).scalar_subquery()
)
if op.get_context().dialect.name == "sqlite":
# If column doesn't exist create new original_extension column and update from values of extension column
with op.batch_alter_table("datasets") as batch_op:
batch_op.execute(
dataset.update().values(
tenant_id=tenant_id_from_dataset_owner,
)
)
else:
conn = op.get_bind()
conn.execute(dataset.update().values(tenant_id=tenant_id_from_dataset_owner))
op.create_index(op.f("ix_datasets_tenant_id"), "datasets", ["tenant_id"])
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.drop_table("user_tenants")
op.drop_index(op.f("ix_datasets_tenant_id"), table_name="datasets")
op.drop_column("datasets", "tenant_id")
# ### end Alembic commands ###

File diff suppressed because it is too large Load diff

View file

@ -9,13 +9,13 @@
"lint": "next lint"
},
"dependencies": {
"@auth0/nextjs-auth0": "^4.6.0",
"@auth0/nextjs-auth0": "^4.13.1",
"classnames": "^2.5.1",
"culori": "^4.0.1",
"d3-force-3d": "^3.0.6",
"next": "15.3.3",
"react": "^19.0.0",
"react-dom": "^19.0.0",
"next": "16.1.1",
"react": "^19.2.0",
"react-dom": "^19.2.0",
"react-force-graph-2d": "^1.27.1",
"uuid": "^9.0.1"
},
@ -24,11 +24,11 @@
"@tailwindcss/postcss": "^4.1.7",
"@types/culori": "^4.0.0",
"@types/node": "^20",
"@types/react": "^18",
"@types/react-dom": "^18",
"@types/react": "^19",
"@types/react-dom": "^19",
"@types/uuid": "^9.0.8",
"eslint": "^9",
"eslint-config-next": "^15.3.3",
"eslint-config-next": "^16.0.4",
"eslint-config-prettier": "^10.1.5",
"tailwindcss": "^4.1.7",
"typescript": "^5"

View file

@ -1,119 +0,0 @@
import { useState } from "react";
import { fetch } from "@/utils";
import { v4 as uuid4 } from "uuid";
import { LoadingIndicator } from "@/ui/App";
import { CTAButton, Input } from "@/ui/elements";
interface CrewAIFormPayload extends HTMLFormElement {
username1: HTMLInputElement;
username2: HTMLInputElement;
}
interface CrewAITriggerProps {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
onData: (data: any) => void;
// eslint-disable-next-line @typescript-eslint/no-explicit-any
onActivity: (activities: any) => void;
}
export default function CrewAITrigger({ onData, onActivity }: CrewAITriggerProps) {
const [isCrewAIRunning, setIsCrewAIRunning] = useState(false);
const handleRunCrewAI = (event: React.FormEvent<CrewAIFormPayload>) => {
event.preventDefault();
const formElements = event.currentTarget;
const crewAIConfig = {
username1: formElements.username1.value,
username2: formElements.username2.value,
};
const backendApiUrl = process.env.NEXT_PUBLIC_BACKEND_API_URL;
const wsUrl = backendApiUrl.replace(/^http(s)?/, "ws");
const websocket = new WebSocket(`${wsUrl}/v1/crewai/subscribe`);
onActivity([{ id: uuid4(), timestamp: Date.now(), activity: "Dispatching hiring crew agents" }]);
websocket.onmessage = (event) => {
const data = JSON.parse(event.data);
if (data.status === "PipelineRunActivity") {
onActivity([data.payload]);
return;
}
onData({
nodes: data.payload.nodes,
links: data.payload.edges,
});
const nodes_type_map: { [key: string]: number } = {};
for (let i = 0; i < data.payload.nodes.length; i++) {
const node = data.payload.nodes[i];
if (!nodes_type_map[node.type]) {
nodes_type_map[node.type] = 0;
}
nodes_type_map[node.type] += 1;
}
const activityMessage = Object.entries(nodes_type_map).reduce((message, [type, count]) => {
return `${message}\n | ${type}: ${count}`;
}, "Graph updated:");
onActivity([{
id: uuid4(),
timestamp: Date.now(),
activity: activityMessage,
}]);
if (data.status === "PipelineRunCompleted") {
websocket.close();
}
};
onData(null);
setIsCrewAIRunning(true);
return fetch("/v1/crewai/run", {
method: "POST",
body: JSON.stringify(crewAIConfig),
headers: {
"Content-Type": "application/json",
},
})
.then(response => response.json())
.then(() => {
onActivity([{ id: uuid4(), timestamp: Date.now(), activity: "Hiring crew agents made a decision" }]);
})
.catch(() => {
onActivity([{ id: uuid4(), timestamp: Date.now(), activity: "Hiring crew agents had problems while executing" }]);
})
.finally(() => {
websocket.close();
setIsCrewAIRunning(false);
});
};
return (
<form className="w-full flex flex-col gap-2" onSubmit={handleRunCrewAI}>
<h1 className="text-2xl text-white">Cognee Dev Mexican Standoff</h1>
<span className="text-white">Agents compare GitHub profiles, and make a decision who is a better developer</span>
<div className="flex flex-row gap-2">
<div className="flex flex-col w-full flex-1/2">
<label className="block mb-1 text-white" htmlFor="username1">GitHub username</label>
<Input name="username1" type="text" placeholder="Github Username" required defaultValue="hajdul88" />
</div>
<div className="flex flex-col w-full flex-1/2">
<label className="block mb-1 text-white" htmlFor="username2">GitHub username</label>
<Input name="username2" type="text" placeholder="Github Username" required defaultValue="lxobr" />
</div>
</div>
<CTAButton type="submit" disabled={isCrewAIRunning} className="whitespace-nowrap">
Start Mexican Standoff
{isCrewAIRunning && <LoadingIndicator />}
</CTAButton>
</form>
);
}

View file

@ -6,7 +6,6 @@ import { NodeObject, LinkObject } from "react-force-graph-2d";
import { ChangeEvent, useEffect, useImperativeHandle, useRef, useState } from "react";
import { DeleteIcon } from "@/ui/Icons";
// import { FeedbackForm } from "@/ui/Partials";
import { CTAButton, Input, NeutralButton, Select } from "@/ui/elements";
interface GraphControlsProps {
@ -111,7 +110,7 @@ export default function GraphControls({ data, isAddNodeFormOpen, onGraphShapeCha
};
const [isAuthShapeChangeEnabled, setIsAuthShapeChangeEnabled] = useState(true);
const shapeChangeTimeout = useRef<number | null>();
const shapeChangeTimeout = useRef<number | null>(null);
useEffect(() => {
onGraphShapeChange(DEFAULT_GRAPH_SHAPE);
@ -230,12 +229,6 @@ export default function GraphControls({ data, isAddNodeFormOpen, onGraphShapeCha
)}
</>
{/* )} */}
{/* {selectedTab === "feedback" && (
<div className="flex flex-col gap-2">
<FeedbackForm onSuccess={() => {}} />
</div>
)} */}
</div>
</>
);

View file

@ -1,6 +1,6 @@
"use client";
import { useCallback, useRef, useState, MutableRefObject } from "react";
import { useCallback, useRef, useState, RefObject } from "react";
import Link from "next/link";
import { TextLogo } from "@/ui/App";
@ -47,11 +47,11 @@ export default function GraphView() {
updateData(newData);
}, []);
const graphRef = useRef<GraphVisualizationAPI>();
const graphRef = useRef<GraphVisualizationAPI>(null);
const graphControls = useRef<GraphControlsAPI>();
const graphControls = useRef<GraphControlsAPI>(null);
const activityLog = useRef<ActivityLogAPI>();
const activityLog = useRef<ActivityLogAPI>(null);
return (
<main className="flex flex-col h-full">
@ -74,21 +74,18 @@ export default function GraphView() {
<div className="w-full h-full relative overflow-hidden">
<GraphVisualization
key={data?.nodes.length}
ref={graphRef as MutableRefObject<GraphVisualizationAPI>}
ref={graphRef as RefObject<GraphVisualizationAPI>}
data={data}
graphControls={graphControls as MutableRefObject<GraphControlsAPI>}
graphControls={graphControls as RefObject<GraphControlsAPI>}
/>
<div className="absolute top-2 left-2 flex flex-col gap-2">
<div className="bg-gray-500 pt-4 pr-4 pb-4 pl-4 rounded-md w-sm">
<CogneeAddWidget onData={onDataChange} />
</div>
{/* <div className="bg-gray-500 pt-4 pr-4 pb-4 pl-4 rounded-md w-sm">
<CrewAITrigger onData={onDataChange} onActivity={(activities) => activityLog.current?.updateActivityLog(activities)} />
</div> */}
<div className="bg-gray-500 pt-4 pr-4 pb-4 pl-4 rounded-md w-sm">
<h2 className="text-xl text-white mb-4">Activity Log</h2>
<ActivityLog ref={activityLog as MutableRefObject<ActivityLogAPI>} />
<ActivityLog ref={activityLog as RefObject<ActivityLogAPI>} />
</div>
</div>
@ -96,7 +93,7 @@ export default function GraphView() {
<div className="bg-gray-500 pt-4 pr-4 pb-4 pl-4 rounded-md w-110">
<GraphControls
data={data}
ref={graphControls as MutableRefObject<GraphControlsAPI>}
ref={graphControls as RefObject<GraphControlsAPI>}
isAddNodeFormOpen={isAddNodeFormOpen}
onFitIntoView={() => graphRef.current!.zoomToFit(1000, 50)}
onGraphShapeChange={(shape) => graphRef.current!.setGraphShape(shape)}

View file

@ -1,7 +1,7 @@
"use client";
import classNames from "classnames";
import { MutableRefObject, useEffect, useImperativeHandle, useRef, useState, useCallback } from "react";
import { RefObject, useEffect, useImperativeHandle, useRef, useState, useCallback } from "react";
import { forceCollide, forceManyBody } from "d3-force-3d";
import dynamic from "next/dynamic";
import { GraphControlsAPI } from "./GraphControls";
@ -16,9 +16,9 @@ const ForceGraph = dynamic(() => import("react-force-graph-2d"), {
import type { ForceGraphMethods, GraphData, LinkObject, NodeObject } from "react-force-graph-2d";
interface GraphVisuzaliationProps {
ref: MutableRefObject<GraphVisualizationAPI>;
ref: RefObject<GraphVisualizationAPI>;
data?: GraphData<NodeObject, LinkObject>;
graphControls: MutableRefObject<GraphControlsAPI>;
graphControls: RefObject<GraphControlsAPI>;
className?: string;
}
@ -205,7 +205,7 @@ export default function GraphVisualization({ ref, data, graphControls, className
// eslint-disable-next-line @typescript-eslint/no-unused-vars
function handleDagError(loopNodeIds: (string | number)[]) {}
const graphRef = useRef<ForceGraphMethods>();
const graphRef = useRef<ForceGraphMethods>(null);
useEffect(() => {
if (data && graphRef.current) {
@ -224,6 +224,7 @@ export default function GraphVisualization({ ref, data, graphControls, className
) => {
if (!graphRef.current) {
console.warn("GraphVisualization: graphRef not ready yet");
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return undefined as any;
}
@ -239,7 +240,7 @@ export default function GraphVisualization({ ref, data, graphControls, className
return (
<div ref={containerRef} className={classNames("w-full h-full", className)} id="graph-container">
<ForceGraph
ref={graphRef}
ref={graphRef as RefObject<ForceGraphMethods>}
width={dimensions.width}
height={dimensions.height}
dagMode={graphShape as unknown as undefined}

View file

@ -1,8 +1,8 @@
"use server";
"use client";
import Dashboard from "./Dashboard";
export default async function Page() {
export default function Page() {
const accessToken = "";
return (

View file

@ -1,3 +1,3 @@
export { default } from "./dashboard/page";
// export const dynamic = "force-dynamic";
export const dynamic = "force-dynamic";

View file

@ -13,7 +13,6 @@ export interface Dataset {
function useDatasets(useCloud = false) {
const [datasets, setDatasets] = useState<Dataset[]>([]);
// eslint-disable-next-line @typescript-eslint/no-explicit-any
// const statusTimeout = useRef<any>(null);
// const fetchDatasetStatuses = useCallback((datasets: Dataset[]) => {

View file

@ -2,7 +2,7 @@ import { NextResponse, type NextRequest } from "next/server";
// import { auth0 } from "./modules/auth/auth0";
// eslint-disable-next-line @typescript-eslint/no-unused-vars
export async function middleware(request: NextRequest) {
export async function proxy(request: NextRequest) {
// if (process.env.USE_AUTH0_AUTHORIZATION?.toLowerCase() === "true") {
// if (request.nextUrl.pathname === "/auth/token") {
// return NextResponse.next();

View file

@ -1,69 +0,0 @@
"use client";
import { useState } from "react";
import { LoadingIndicator } from "@/ui/App";
import { fetch, useBoolean } from "@/utils";
import { CTAButton, TextArea } from "@/ui/elements";
interface SignInFormPayload extends HTMLFormElement {
feedback: HTMLTextAreaElement;
}
interface FeedbackFormProps {
onSuccess: () => void;
}
export default function FeedbackForm({ onSuccess }: FeedbackFormProps) {
const {
value: isSubmittingFeedback,
setTrue: disableFeedbackSubmit,
setFalse: enableFeedbackSubmit,
} = useBoolean(false);
const [feedbackError, setFeedbackError] = useState<string | null>(null);
const signIn = (event: React.FormEvent<SignInFormPayload>) => {
event.preventDefault();
const formElements = event.currentTarget;
setFeedbackError(null);
disableFeedbackSubmit();
fetch("/v1/crewai/feedback", {
method: "POST",
body: JSON.stringify({
feedback: formElements.feedback.value,
}),
headers: {
"Content-Type": "application/json",
},
})
.then(response => response.json())
.then(() => {
onSuccess();
formElements.feedback.value = "";
})
.catch(error => setFeedbackError(error.detail))
.finally(() => enableFeedbackSubmit());
};
return (
<form onSubmit={signIn} className="flex flex-col gap-2">
<div className="flex flex-col gap-2">
<div className="mb-4">
<label className="block text-white" htmlFor="feedback">Feedback on agent&apos;s reasoning</label>
<TextArea id="feedback" name="feedback" type="text" placeholder="Your feedback" />
</div>
</div>
<CTAButton type="submit">
<span>Submit feedback</span>
{isSubmittingFeedback && <LoadingIndicator />}
</CTAButton>
{feedbackError && (
<span className="text-s text-white">{feedbackError}</span>
)}
</form>
)
}

View file

@ -3,4 +3,3 @@ export { default as Footer } from "./Footer/Footer";
export { default as SearchView } from "./SearchView/SearchView";
export { default as IFrameView } from "./IFrameView/IFrameView";
// export { default as Explorer } from "./Explorer/Explorer";
export { default as FeedbackForm } from "./FeedbackForm";

View file

@ -2,7 +2,7 @@
import { v4 as uuid4 } from "uuid";
import classNames from "classnames";
import { Fragment, MouseEvent, MutableRefObject, useCallback, useEffect, useRef, useState } from "react";
import { Fragment, MouseEvent, RefObject, useCallback, useEffect, useRef, useState } from "react";
import { useModal } from "@/ui/elements/Modal";
import { CaretIcon, CloseIcon, PlusIcon } from "@/ui/Icons";
@ -282,7 +282,7 @@ export default function Notebook({ notebook, updateNotebook, runCell }: Notebook
function CellResult({ content }: { content: [] }) {
const parsedContent = [];
const graphRef = useRef<GraphVisualizationAPI>();
const graphRef = useRef<GraphVisualizationAPI>(null);
const graphControls = useRef<GraphControlsAPI>({
setSelectedNode: () => {},
getSelectedNode: () => null,
@ -298,7 +298,7 @@ function CellResult({ content }: { content: [] }) {
<span className="text-sm pl-2 mb-4">reasoning graph</span>
<GraphVisualization
data={transformInsightsGraphData(line)}
ref={graphRef as MutableRefObject<GraphVisualizationAPI>}
ref={graphRef as RefObject<GraphVisualizationAPI>}
graphControls={graphControls}
className="min-h-80"
/>
@ -346,7 +346,7 @@ function CellResult({ content }: { content: [] }) {
<span className="text-sm pl-2 mb-4">reasoning graph (datasets: {datasetName})</span>
<GraphVisualization
data={transformToVisualizationData(graph)}
ref={graphRef as MutableRefObject<GraphVisualizationAPI>}
ref={graphRef as RefObject<GraphVisualizationAPI>}
graphControls={graphControls}
className="min-h-80"
/>
@ -377,7 +377,7 @@ function CellResult({ content }: { content: [] }) {
<span className="text-sm pl-2 mb-4">reasoning graph (datasets: {datasetName})</span>
<GraphVisualization
data={transformToVisualizationData(graph)}
ref={graphRef as MutableRefObject<GraphVisualizationAPI>}
ref={graphRef as RefObject<GraphVisualizationAPI>}
graphControls={graphControls}
className="min-h-80"
/>

View file

@ -33,7 +33,7 @@ export default function NotebookCellHeader({
setFalse: setIsNotRunningCell,
} = useBoolean(false);
const [runInstance, setRunInstance] = useState<string>(isCloudEnvironment() ? "cloud" : "local");
const [runInstance] = useState<string>(isCloudEnvironment() ? "cloud" : "local");
const handleCellRun = () => {
if (runCell) {

View file

@ -14,7 +14,7 @@
"moduleResolution": "bundler",
"resolveJsonModule": true,
"isolatedModules": true,
"jsx": "preserve",
"jsx": "react-jsx",
"incremental": true,
"plugins": [
{
@ -32,7 +32,8 @@
"next-env.d.ts",
"**/*.ts",
"**/*.tsx",
".next/types/**/*.ts"
".next/types/**/*.ts",
".next/dev/types/**/*.ts"
],
"exclude": [
"node_modules"

View file

@ -445,16 +445,22 @@ The MCP server exposes its functionality through tools. Call them from any MCP c
- **cognify**: Turns your data into a structured knowledge graph and stores it in memory
- **cognee_add_developer_rules**: Ingest core developer rule files into memory
- **codify**: Analyse a code repository, build a code graph, stores it in memory
- **search**: Query memory supports GRAPH_COMPLETION, RAG_COMPLETION, CODE, CHUNKS
- **list_data**: List all datasets and their data items with IDs for deletion operations
- **delete**: Delete specific data from a dataset (supports soft/hard deletion modes)
- **get_developer_rules**: Retrieve all developer rules that were generated based on previous interactions
- **list_data**: List all datasets and their data items with IDs for deletion operations
- **save_interaction**: Logs user-agent interactions and query-answer pairs
- **prune**: Reset cognee for a fresh start (removes all data)
- **search**: Query memory supports GRAPH_COMPLETION, RAG_COMPLETION, CODE, CHUNKS, SUMMARIES, CYPHER, and FEELING_LUCKY
- **cognify_status / codify_status**: Track pipeline progress
**Data Management Examples:**

View file

@ -1,6 +1,6 @@
[project]
name = "cognee-mcp"
version = "0.4.0"
version = "0.5.0"
description = "Cognee MCP server"
readme = "README.md"
requires-python = ">=3.10"
@ -9,7 +9,7 @@ dependencies = [
# For local cognee repo usage remove comment bellow and add absolute path to cognee. Then run `uv sync --reinstall` in the mcp folder on local cognee changes.
#"cognee[postgres,codegraph,gemini,huggingface,docs,neo4j] @ file:/Users/igorilic/Desktop/cognee",
# TODO: Remove gemini from optional dependecnies for new Cognee version after 0.3.4
"cognee[postgres,docs,neo4j]==0.3.7",
"cognee[postgres,docs,neo4j]==0.5.0",
"fastmcp>=2.10.0,<3.0.0",
"mcp>=1.12.0,<2.0.0",
"uv>=0.6.3,<1.0.0",

View file

@ -151,7 +151,7 @@ class CogneeClient:
query_type: str,
datasets: Optional[List[str]] = None,
system_prompt: Optional[str] = None,
top_k: int = 10,
top_k: int = 5,
) -> Any:
"""
Search the knowledge graph.
@ -192,7 +192,7 @@ class CogneeClient:
with redirect_stdout(sys.stderr):
results = await self.cognee.search(
query_type=SearchType[query_type.upper()], query_text=query_text
query_type=SearchType[query_type.upper()], query_text=query_text, top_k=top_k
)
return results

View file

@ -90,97 +90,6 @@ async def health_check(request):
return JSONResponse({"status": "ok"})
@mcp.tool()
async def cognee_add_developer_rules(
base_path: str = ".", graph_model_file: str = None, graph_model_name: str = None
) -> list:
"""
Ingest core developer rule files into Cognee's memory layer.
This function loads a predefined set of developer-related configuration,
rule, and documentation files from the base repository and assigns them
to the special 'developer_rules' node set in Cognee. It ensures these
foundational files are always part of the structured memory graph.
Parameters
----------
base_path : str
Root path to resolve relative file paths. Defaults to current directory.
graph_model_file : str, optional
Optional path to a custom schema file for knowledge graph generation.
graph_model_name : str, optional
Optional class name to use from the graph_model_file schema.
Returns
-------
list
A message indicating how many rule files were scheduled for ingestion,
and how to check their processing status.
Notes
-----
- Each file is processed asynchronously in the background.
- Files are attached to the 'developer_rules' node set.
- Missing files are skipped with a logged warning.
"""
developer_rule_paths = [
".cursorrules",
".cursor/rules",
".same/todos.md",
".windsurfrules",
".clinerules",
"CLAUDE.md",
".sourcegraph/memory.md",
"AGENT.md",
"AGENTS.md",
]
async def cognify_task(file_path: str) -> None:
with redirect_stdout(sys.stderr):
logger.info(f"Starting cognify for: {file_path}")
try:
await cognee_client.add(file_path, node_set=["developer_rules"])
model = None
if graph_model_file and graph_model_name:
if cognee_client.use_api:
logger.warning(
"Custom graph models are not supported in API mode, ignoring."
)
else:
from cognee.shared.data_models import KnowledgeGraph
model = load_class(graph_model_file, graph_model_name)
await cognee_client.cognify(graph_model=model)
logger.info(f"Cognify finished for: {file_path}")
except Exception as e:
logger.error(f"Cognify failed for {file_path}: {str(e)}")
raise ValueError(f"Failed to cognify: {str(e)}")
tasks = []
for rel_path in developer_rule_paths:
abs_path = os.path.join(base_path, rel_path)
if os.path.isfile(abs_path):
tasks.append(asyncio.create_task(cognify_task(abs_path)))
else:
logger.warning(f"Skipped missing developer rule file: {abs_path}")
log_file = get_log_file_location()
return [
types.TextContent(
type="text",
text=(
f"Started cognify for {len(tasks)} developer rule files in background.\n"
f"All are added to the `developer_rules` node set.\n"
f"Use `cognify_status` or check logs at {log_file} to monitor progress."
),
)
]
@mcp.tool()
async def cognify(
data: str, graph_model_file: str = None, graph_model_name: str = None, custom_prompt: str = None
@ -194,7 +103,6 @@ async def cognify(
Prerequisites:
- **LLM_API_KEY**: Must be configured (required for entity extraction and graph generation)
- **Data Added**: Must have data previously added via `cognee.add()`
- **Vector Database**: Must be accessible for embeddings storage
- **Graph Database**: Must be accessible for relationship storage
@ -408,76 +316,7 @@ async def save_interaction(data: str) -> list:
@mcp.tool()
async def codify(repo_path: str) -> list:
"""
Analyze and generate a code-specific knowledge graph from a software repository.
This function launches a background task that processes the provided repository
and builds a code knowledge graph. The function returns immediately while
the processing continues in the background due to MCP timeout constraints.
Parameters
----------
repo_path : str
Path to the code repository to analyze. This can be a local file path or a
relative path to a repository. The path should point to the root of the
repository or a specific directory within it.
Returns
-------
list
A list containing a single TextContent object with information about the
background task launch and how to check its status.
Notes
-----
- The function launches a background task and returns immediately
- The code graph generation may take significant time for larger repositories
- Use the codify_status tool to check the progress of the operation
- Process results are logged to the standard Cognee log file
- All stdout is redirected to stderr to maintain MCP communication integrity
"""
if cognee_client.use_api:
error_msg = "❌ Codify operation is not available in API mode. Please use direct mode for code graph pipeline."
logger.error(error_msg)
return [types.TextContent(type="text", text=error_msg)]
async def codify_task(repo_path: str):
# NOTE: MCP uses stdout to communicate, we must redirect all output
# going to stdout ( like the print function ) to stderr.
with redirect_stdout(sys.stderr):
logger.info("Codify process starting.")
from cognee.api.v1.cognify.code_graph_pipeline import run_code_graph_pipeline
results = []
async for result in run_code_graph_pipeline(repo_path, False):
results.append(result)
logger.info(result)
if all(results):
logger.info("Codify process finished succesfully.")
else:
logger.info("Codify process failed.")
asyncio.create_task(codify_task(repo_path))
log_file = get_log_file_location()
text = (
f"Background process launched due to MCP timeout limitations.\n"
f"To check current codify status use the codify_status tool\n"
f"or you can check the log file at: {log_file}"
)
return [
types.TextContent(
type="text",
text=text,
)
]
@mcp.tool()
async def search(search_query: str, search_type: str) -> list:
async def search(search_query: str, search_type: str, top_k: int = 10) -> list:
"""
Search and query the knowledge graph for insights, information, and connections.
@ -550,6 +389,13 @@ async def search(search_query: str, search_type: str) -> list:
The search_type is case-insensitive and will be converted to uppercase.
top_k : int, optional
Maximum number of results to return (default: 10).
Controls the amount of context retrieved from the knowledge graph.
- Lower values (3-5): Faster, more focused results
- Higher values (10-20): More comprehensive, but slower and more context-heavy
Helps manage response size and context window usage in MCP clients.
Returns
-------
list
@ -586,13 +432,32 @@ async def search(search_query: str, search_type: str) -> list:
"""
async def search_task(search_query: str, search_type: str) -> str:
"""Search the knowledge graph"""
async def search_task(search_query: str, search_type: str, top_k: int) -> str:
"""
Internal task to execute knowledge graph search with result formatting.
Handles the actual search execution and formats results appropriately
for MCP clients based on the search type and execution mode (API vs direct).
Parameters
----------
search_query : str
The search query in natural language
search_type : str
Type of search to perform (GRAPH_COMPLETION, CHUNKS, etc.)
top_k : int
Maximum number of results to return
Returns
-------
str
Formatted search results as a string, with format depending on search_type
"""
# NOTE: MCP uses stdout to communicate, we must redirect all output
# going to stdout ( like the print function ) to stderr.
with redirect_stdout(sys.stderr):
search_results = await cognee_client.search(
query_text=search_query, query_type=search_type
query_text=search_query, query_type=search_type, top_k=top_k
)
# Handle different result formats based on API vs direct mode
@ -626,49 +491,10 @@ async def search(search_query: str, search_type: str) -> list:
else:
return str(search_results)
search_results = await search_task(search_query, search_type)
search_results = await search_task(search_query, search_type, top_k)
return [types.TextContent(type="text", text=search_results)]
@mcp.tool()
async def get_developer_rules() -> list:
"""
Retrieve all developer rules that were generated based on previous interactions.
This tool queries the Cognee knowledge graph and returns a list of developer
rules.
Parameters
----------
None
Returns
-------
list
A list containing a single TextContent object with the retrieved developer rules.
The format is plain text containing the developer rules in bulletpoints.
Notes
-----
- The specific logic for fetching rules is handled internally.
- This tool does not accept any parameters and is intended for simple rule inspection use cases.
"""
async def fetch_rules_from_cognee() -> str:
"""Collect all developer rules from Cognee"""
with redirect_stdout(sys.stderr):
if cognee_client.use_api:
logger.warning("Developer rules retrieval is not available in API mode")
return "Developer rules retrieval is not available in API mode"
developer_rules = await get_existing_rules(rules_nodeset_name="coding_agent_rules")
return developer_rules
rules_text = await fetch_rules_from_cognee()
return [types.TextContent(type="text", text=rules_text)]
@mcp.tool()
async def list_data(dataset_id: str = None) -> list:
"""
@ -954,48 +780,6 @@ async def cognify_status():
return [types.TextContent(type="text", text=error_msg)]
@mcp.tool()
async def codify_status():
"""
Get the current status of the codify pipeline.
This function retrieves information about current and recently completed codify operations
in the codebase dataset. It provides details on progress, success/failure status, and statistics
about the processed code repositories.
Returns
-------
list
A list containing a single TextContent object with the status information as a string.
The status includes information about active and completed jobs for the cognify_code_pipeline.
Notes
-----
- The function retrieves pipeline status specifically for the "cognify_code_pipeline" on the "codebase" dataset
- Status information includes job progress, execution time, and completion status
- The status is returned in string format for easy reading
- This operation is not available in API mode
"""
with redirect_stdout(sys.stderr):
try:
from cognee.modules.data.methods.get_unique_dataset_id import get_unique_dataset_id
from cognee.modules.users.methods import get_default_user
user = await get_default_user()
status = await cognee_client.get_pipeline_status(
[await get_unique_dataset_id("codebase", user)], "cognify_code_pipeline"
)
return [types.TextContent(type="text", text=str(status))]
except NotImplementedError:
error_msg = "❌ Pipeline status is not available in API mode"
logger.error(error_msg)
return [types.TextContent(type="text", text=error_msg)]
except Exception as e:
error_msg = f"❌ Failed to get codify status: {str(e)}"
logger.error(error_msg)
return [types.TextContent(type="text", text=error_msg)]
def node_to_string(node):
node_data = ", ".join(
[f'{key}: "{value}"' for key, value in node.items() if key in ["id", "name"]]
@ -1096,6 +880,10 @@ async def main():
# Skip migrations when in API mode (the API server handles its own database)
if not args.no_migration and not args.api_url:
from cognee.modules.engine.operations.setup import setup
await setup()
# Run Alembic migrations from the main cognee directory where alembic.ini is located
logger.info("Running database migrations...")
migration_result = subprocess.run(

View file

@ -3,7 +3,7 @@
Test client for Cognee MCP Server functionality.
This script tests all the tools and functions available in the Cognee MCP server,
including cognify, codify, search, prune, status checks, and utility functions.
including cognify, search, prune, status checks, and utility functions.
Usage:
# Set your OpenAI API key first
@ -23,6 +23,7 @@ import tempfile
import time
from contextlib import asynccontextmanager
from cognee.shared.logging_utils import setup_logging
from logging import ERROR, INFO
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
@ -35,7 +36,7 @@ from src.server import (
load_class,
)
# Set timeout for cognify/codify to complete in
# Set timeout for cognify to complete in
TIMEOUT = 5 * 60 # 5 min in seconds
@ -151,12 +152,9 @@ DEBUG = True
expected_tools = {
"cognify",
"codify",
"search",
"prune",
"cognify_status",
"codify_status",
"cognee_add_developer_rules",
"list_data",
"delete",
}
@ -247,106 +245,6 @@ DEBUG = True
}
print(f"{test_name} test failed: {e}")
async def test_codify(self):
"""Test the codify functionality using MCP client."""
print("\n🧪 Testing codify functionality...")
try:
async with self.mcp_server_session() as session:
codify_result = await session.call_tool(
"codify", arguments={"repo_path": self.test_repo_dir}
)
start = time.time() # mark the start
while True:
try:
# Wait a moment
await asyncio.sleep(5)
# Check if codify processing is finished
status_result = await session.call_tool("codify_status", arguments={})
if hasattr(status_result, "content") and status_result.content:
status_text = (
status_result.content[0].text
if status_result.content
else str(status_result)
)
else:
status_text = str(status_result)
if str(PipelineRunStatus.DATASET_PROCESSING_COMPLETED) in status_text:
break
elif time.time() - start > TIMEOUT:
raise TimeoutError("Codify did not complete in 5min")
except DatabaseNotCreatedError:
if time.time() - start > TIMEOUT:
raise TimeoutError("Database was not created in 5min")
self.test_results["codify"] = {
"status": "PASS",
"result": codify_result,
"message": "Codify executed successfully",
}
print("✅ Codify test passed")
except Exception as e:
self.test_results["codify"] = {
"status": "FAIL",
"error": str(e),
"message": "Codify test failed",
}
print(f"❌ Codify test failed: {e}")
async def test_cognee_add_developer_rules(self):
"""Test the cognee_add_developer_rules functionality using MCP client."""
print("\n🧪 Testing cognee_add_developer_rules functionality...")
try:
async with self.mcp_server_session() as session:
result = await session.call_tool(
"cognee_add_developer_rules", arguments={"base_path": self.test_data_dir}
)
start = time.time() # mark the start
while True:
try:
# Wait a moment
await asyncio.sleep(5)
# Check if developer rule cognify processing is finished
status_result = await session.call_tool("cognify_status", arguments={})
if hasattr(status_result, "content") and status_result.content:
status_text = (
status_result.content[0].text
if status_result.content
else str(status_result)
)
else:
status_text = str(status_result)
if str(PipelineRunStatus.DATASET_PROCESSING_COMPLETED) in status_text:
break
elif time.time() - start > TIMEOUT:
raise TimeoutError(
"Cognify of developer rules did not complete in 5min"
)
except DatabaseNotCreatedError:
if time.time() - start > TIMEOUT:
raise TimeoutError("Database was not created in 5min")
self.test_results["cognee_add_developer_rules"] = {
"status": "PASS",
"result": result,
"message": "Developer rules addition executed successfully",
}
print("✅ Developer rules test passed")
except Exception as e:
self.test_results["cognee_add_developer_rules"] = {
"status": "FAIL",
"error": str(e),
"message": "Developer rules test failed",
}
print(f"❌ Developer rules test failed: {e}")
async def test_search_functionality(self):
"""Test the search functionality with different search types using MCP client."""
print("\n🧪 Testing search functionality...")
@ -359,7 +257,11 @@ DEBUG = True
# Go through all Cognee search types
for search_type in SearchType:
# Don't test these search types
if search_type in [SearchType.NATURAL_LANGUAGE, SearchType.CYPHER]:
if search_type in [
SearchType.NATURAL_LANGUAGE,
SearchType.CYPHER,
SearchType.TRIPLET_COMPLETION,
]:
break
try:
async with self.mcp_server_session() as session:
@ -681,9 +583,6 @@ class TestModel:
test_name="Cognify2",
)
await self.test_codify()
await self.test_cognee_add_developer_rules()
# Test list_data and delete functionality
await self.test_list_data()
await self.test_delete()
@ -739,7 +638,5 @@ async def main():
if __name__ == "__main__":
from logging import ERROR
logger = setup_logging(log_level=ERROR)
asyncio.run(main())

7633
cognee-mcp/uv.lock generated

File diff suppressed because it is too large Load diff

View file

@ -21,8 +21,9 @@ from cognee.api.v1.notebooks.routers import get_notebooks_router
from cognee.api.v1.permissions.routers import get_permissions_router
from cognee.api.v1.settings.routers import get_settings_router
from cognee.api.v1.datasets.routers import get_datasets_router
from cognee.api.v1.cognify.routers import get_code_pipeline_router, get_cognify_router
from cognee.api.v1.cognify.routers import get_cognify_router
from cognee.api.v1.search.routers import get_search_router
from cognee.api.v1.ontologies.routers.get_ontology_router import get_ontology_router
from cognee.api.v1.memify.routers import get_memify_router
from cognee.api.v1.add.routers import get_add_router
from cognee.api.v1.delete.routers import get_delete_router
@ -263,6 +264,8 @@ app.include_router(
app.include_router(get_datasets_router(), prefix="/api/v1/datasets", tags=["datasets"])
app.include_router(get_ontology_router(), prefix="/api/v1/ontologies", tags=["ontologies"])
app.include_router(get_settings_router(), prefix="/api/v1/settings", tags=["settings"])
app.include_router(get_visualize_router(), prefix="/api/v1/visualize", tags=["visualize"])
@ -275,10 +278,6 @@ app.include_router(get_responses_router(), prefix="/api/v1/responses", tags=["re
app.include_router(get_sync_router(), prefix="/api/v1/sync", tags=["sync"])
codegraph_routes = get_code_pipeline_router()
if codegraph_routes:
app.include_router(codegraph_routes, prefix="/api/v1/code-pipeline", tags=["code-pipeline"])
app.include_router(
get_users_router(),
prefix="/api/v1/users",

View file

@ -155,7 +155,7 @@ async def add(
- LLM_API_KEY: API key for your LLM provider (OpenAI, Anthropic, etc.)
Optional:
- LLM_PROVIDER: "openai" (default), "anthropic", "gemini", "ollama", "mistral"
- LLM_PROVIDER: "openai" (default), "anthropic", "gemini", "ollama", "mistral", "bedrock"
- LLM_MODEL: Model name (default: "gpt-5-mini")
- DEFAULT_USER_EMAIL: Custom default user email
- DEFAULT_USER_PASSWORD: Custom default user password
@ -205,6 +205,7 @@ async def add(
pipeline_name="add_pipeline",
vector_db_config=vector_db_config,
graph_db_config=graph_db_config,
use_pipeline_cache=True,
incremental_loading=incremental_loading,
data_per_batch=data_per_batch,
):

View file

@ -82,7 +82,9 @@ def get_add_router() -> APIRouter:
datasetName,
user=user,
dataset_id=datasetId,
node_set=node_set if node_set else None,
node_set=node_set
if node_set != [""]
else None, # Transform default node_set endpoint value to None
)
if isinstance(add_run, PipelineRunErrored):

View file

@ -1,119 +0,0 @@
import os
import pathlib
import asyncio
from typing import Optional
from cognee.shared.logging_utils import get_logger, setup_logging
from cognee.modules.observability.get_observe import get_observe
from cognee.api.v1.search import SearchType, search
from cognee.api.v1.visualize.visualize import visualize_graph
from cognee.modules.cognify.config import get_cognify_config
from cognee.modules.pipelines import run_tasks
from cognee.modules.pipelines.tasks.task import Task
from cognee.modules.users.methods import get_default_user
from cognee.shared.data_models import KnowledgeGraph
from cognee.modules.data.methods import create_dataset
from cognee.tasks.documents import classify_documents, extract_chunks_from_documents
from cognee.tasks.graph import extract_graph_from_data
from cognee.tasks.ingestion import ingest_data
from cognee.tasks.repo_processor import get_non_py_files, get_repo_file_dependencies
from cognee.tasks.storage import add_data_points
from cognee.tasks.summarization import summarize_text
from cognee.infrastructure.llm import get_max_chunk_tokens
from cognee.infrastructure.databases.relational import get_relational_engine
observe = get_observe()
logger = get_logger("code_graph_pipeline")
@observe
async def run_code_graph_pipeline(
repo_path,
include_docs=False,
excluded_paths: Optional[list[str]] = None,
supported_languages: Optional[list[str]] = None,
):
import cognee
from cognee.low_level import setup
await cognee.prune.prune_data()
await cognee.prune.prune_system(metadata=True)
await setup()
cognee_config = get_cognify_config()
user = await get_default_user()
detailed_extraction = True
tasks = [
Task(
get_repo_file_dependencies,
detailed_extraction=detailed_extraction,
supported_languages=supported_languages,
excluded_paths=excluded_paths,
),
# Task(summarize_code, task_config={"batch_size": 500}), # This task takes a long time to complete
Task(add_data_points, task_config={"batch_size": 30}),
]
if include_docs:
# This tasks take a long time to complete
non_code_tasks = [
Task(get_non_py_files, task_config={"batch_size": 50}),
Task(ingest_data, dataset_name="repo_docs", user=user),
Task(classify_documents),
Task(extract_chunks_from_documents, max_chunk_size=get_max_chunk_tokens()),
Task(
extract_graph_from_data,
graph_model=KnowledgeGraph,
task_config={"batch_size": 50},
),
Task(
summarize_text,
summarization_model=cognee_config.summarization_model,
task_config={"batch_size": 50},
),
]
dataset_name = "codebase"
# Save dataset to database
db_engine = get_relational_engine()
async with db_engine.get_async_session() as session:
dataset = await create_dataset(dataset_name, user, session)
if include_docs:
non_code_pipeline_run = run_tasks(
non_code_tasks, dataset.id, repo_path, user, "cognify_pipeline"
)
async for run_status in non_code_pipeline_run:
yield run_status
async for run_status in run_tasks(
tasks, dataset.id, repo_path, user, "cognify_code_pipeline", incremental_loading=False
):
yield run_status
if __name__ == "__main__":
async def main():
async for run_status in run_code_graph_pipeline("REPO_PATH"):
print(f"{run_status.pipeline_run_id}: {run_status.status}")
file_path = os.path.join(
pathlib.Path(__file__).parent, ".artifacts", "graph_visualization.html"
)
await visualize_graph(file_path)
search_results = await search(
query_type=SearchType.CODE,
query_text="How is Relationship weight calculated?",
)
for file in search_results:
print(file["name"])
logger = setup_logging(name="code_graph_pipeline")
asyncio.run(main())

View file

@ -3,6 +3,7 @@ from pydantic import BaseModel
from typing import Union, Optional
from uuid import UUID
from cognee.modules.cognify.config import get_cognify_config
from cognee.modules.ontology.ontology_env_config import get_ontology_env_config
from cognee.shared.logging_utils import get_logger
from cognee.shared.data_models import KnowledgeGraph
@ -19,7 +20,6 @@ from cognee.modules.ontology.get_default_ontology_resolver import (
from cognee.modules.users.models import User
from cognee.tasks.documents import (
check_permissions_on_dataset,
classify_documents,
extract_chunks_from_documents,
)
@ -53,6 +53,7 @@ async def cognify(
custom_prompt: Optional[str] = None,
temporal_cognify: bool = False,
data_per_batch: int = 20,
**kwargs,
):
"""
Transform ingested data into a structured knowledge graph.
@ -78,12 +79,11 @@ async def cognify(
Processing Pipeline:
1. **Document Classification**: Identifies document types and structures
2. **Permission Validation**: Ensures user has processing rights
3. **Text Chunking**: Breaks content into semantically meaningful segments
4. **Entity Extraction**: Identifies key concepts, people, places, organizations
5. **Relationship Detection**: Discovers connections between entities
6. **Graph Construction**: Builds semantic knowledge graph with embeddings
7. **Content Summarization**: Creates hierarchical summaries for navigation
2. **Text Chunking**: Breaks content into semantically meaningful segments
3. **Entity Extraction**: Identifies key concepts, people, places, organizations
4. **Relationship Detection**: Discovers connections between entities
5. **Graph Construction**: Builds semantic knowledge graph with embeddings
6. **Content Summarization**: Creates hierarchical summaries for navigation
Graph Model Customization:
The `graph_model` parameter allows custom knowledge structures:
@ -224,6 +224,7 @@ async def cognify(
config=config,
custom_prompt=custom_prompt,
chunks_per_batch=chunks_per_batch,
**kwargs,
)
# By calling get pipeline executor we get a function that will have the run_pipeline run in the background or a function that we will need to wait for
@ -238,6 +239,7 @@ async def cognify(
vector_db_config=vector_db_config,
graph_db_config=graph_db_config,
incremental_loading=incremental_loading,
use_pipeline_cache=True,
pipeline_name="cognify_pipeline",
data_per_batch=data_per_batch,
)
@ -251,6 +253,7 @@ async def get_default_tasks( # TODO: Find out a better way to do this (Boris's
config: Config = None,
custom_prompt: Optional[str] = None,
chunks_per_batch: int = 100,
**kwargs,
) -> list[Task]:
if config is None:
ontology_config = get_ontology_env_config()
@ -272,9 +275,11 @@ async def get_default_tasks( # TODO: Find out a better way to do this (Boris's
if chunks_per_batch is None:
chunks_per_batch = 100
cognify_config = get_cognify_config()
embed_triplets = cognify_config.triplet_embedding
default_tasks = [
Task(classify_documents),
Task(check_permissions_on_dataset, user=user, permissions=["write"]),
Task(
extract_chunks_from_documents,
max_chunk_size=chunk_size or get_max_chunk_tokens(),
@ -286,12 +291,17 @@ async def get_default_tasks( # TODO: Find out a better way to do this (Boris's
config=config,
custom_prompt=custom_prompt,
task_config={"batch_size": chunks_per_batch},
**kwargs,
), # Generate knowledge graphs from the document chunks.
Task(
summarize_text,
task_config={"batch_size": chunks_per_batch},
),
Task(add_data_points, task_config={"batch_size": chunks_per_batch}),
Task(
add_data_points,
embed_triplets=embed_triplets,
task_config={"batch_size": chunks_per_batch},
),
]
return default_tasks
@ -305,14 +315,13 @@ async def get_temporal_tasks(
The pipeline includes:
1. Document classification.
2. Dataset permission checks (requires "write" access).
3. Document chunking with a specified or default chunk size.
4. Event and timestamp extraction from chunks.
5. Knowledge graph extraction from events.
6. Batched insertion of data points.
2. Document chunking with a specified or default chunk size.
3. Event and timestamp extraction from chunks.
4. Knowledge graph extraction from events.
5. Batched insertion of data points.
Args:
user (User, optional): The user requesting task execution, used for permission checks.
user (User, optional): The user requesting task execution.
chunker (Callable, optional): A text chunking function/class to split documents. Defaults to TextChunker.
chunk_size (int, optional): Maximum token size per chunk. If not provided, uses system default.
chunks_per_batch (int, optional): Number of chunks to process in a single batch in Cognify
@ -325,7 +334,6 @@ async def get_temporal_tasks(
temporal_tasks = [
Task(classify_documents),
Task(check_permissions_on_dataset, user=user, permissions=["write"]),
Task(
extract_chunks_from_documents,
max_chunk_size=chunk_size or get_max_chunk_tokens(),

View file

@ -1,2 +1 @@
from .get_cognify_router import get_cognify_router
from .get_code_pipeline_router import get_code_pipeline_router

View file

@ -1,90 +0,0 @@
import json
from cognee.shared.logging_utils import get_logger
from fastapi import APIRouter
from fastapi.responses import JSONResponse
from cognee.api.DTO import InDTO
from cognee.modules.retrieval.code_retriever import CodeRetriever
from cognee.modules.storage.utils import JSONEncoder
logger = get_logger()
class CodePipelineIndexPayloadDTO(InDTO):
repo_path: str
include_docs: bool = False
class CodePipelineRetrievePayloadDTO(InDTO):
query: str
full_input: str
def get_code_pipeline_router() -> APIRouter:
try:
import cognee.api.v1.cognify.code_graph_pipeline
except ModuleNotFoundError:
logger.error("codegraph dependencies not found. Skipping codegraph API routes.")
return None
router = APIRouter()
@router.post("/index", response_model=None)
async def code_pipeline_index(payload: CodePipelineIndexPayloadDTO):
"""
Run indexation on a code repository.
This endpoint processes a code repository to create a knowledge graph
of the codebase structure, dependencies, and relationships.
## Request Parameters
- **repo_path** (str): Path to the code repository
- **include_docs** (bool): Whether to include documentation files (default: false)
## Response
No content returned. Processing results are logged.
## Error Codes
- **409 Conflict**: Error during indexation process
"""
from cognee.api.v1.cognify.code_graph_pipeline import run_code_graph_pipeline
try:
async for result in run_code_graph_pipeline(payload.repo_path, payload.include_docs):
logger.info(result)
except Exception as error:
return JSONResponse(status_code=409, content={"error": str(error)})
@router.post("/retrieve", response_model=list[dict])
async def code_pipeline_retrieve(payload: CodePipelineRetrievePayloadDTO):
"""
Retrieve context from the code knowledge graph.
This endpoint searches the indexed code repository to find relevant
context based on the provided query.
## Request Parameters
- **query** (str): Search query for code context
- **full_input** (str): Full input text for processing
## Response
Returns a list of relevant code files and context as JSON.
## Error Codes
- **409 Conflict**: Error during retrieval process
"""
try:
query = (
payload.full_input.replace("cognee ", "")
if payload.full_input.startswith("cognee ")
else payload.full_input
)
retriever = CodeRetriever()
retrieved_files = await retriever.get_context(query)
return json.dumps(retrieved_files, cls=JSONEncoder)
except Exception as error:
return JSONResponse(status_code=409, content={"error": str(error)})
return router

View file

@ -41,6 +41,11 @@ class CognifyPayloadDTO(InDTO):
custom_prompt: Optional[str] = Field(
default="", description="Custom prompt for entity extraction and graph generation"
)
ontology_key: Optional[List[str]] = Field(
default=None,
examples=[[]],
description="Reference to one or more previously uploaded ontologies",
)
def get_cognify_router() -> APIRouter:
@ -68,6 +73,7 @@ def get_cognify_router() -> APIRouter:
- **dataset_ids** (Optional[List[UUID]]): List of existing dataset UUIDs to process. UUIDs allow processing of datasets not owned by the user (if permitted).
- **run_in_background** (Optional[bool]): Whether to execute processing asynchronously. Defaults to False (blocking).
- **custom_prompt** (Optional[str]): Custom prompt for entity extraction and graph generation. If provided, this prompt will be used instead of the default prompts for knowledge graph extraction.
- **ontology_key** (Optional[List[str]]): Reference to one or more previously uploaded ontology files to use for knowledge graph construction.
## Response
- **Blocking execution**: Complete pipeline run information with entity counts, processing duration, and success/failure status
@ -82,7 +88,8 @@ def get_cognify_router() -> APIRouter:
{
"datasets": ["research_papers", "documentation"],
"run_in_background": false,
"custom_prompt": "Extract entities focusing on technical concepts and their relationships. Identify key technologies, methodologies, and their interconnections."
"custom_prompt": "Extract entities focusing on technical concepts and their relationships. Identify key technologies, methodologies, and their interconnections.",
"ontology_key": ["medical_ontology_v1"]
}
```
@ -108,13 +115,35 @@ def get_cognify_router() -> APIRouter:
)
from cognee.api.v1.cognify import cognify as cognee_cognify
from cognee.api.v1.ontologies.ontologies import OntologyService
try:
datasets = payload.dataset_ids if payload.dataset_ids else payload.datasets
config_to_use = None
if payload.ontology_key:
ontology_service = OntologyService()
ontology_contents = ontology_service.get_ontology_contents(
payload.ontology_key, user
)
from cognee.modules.ontology.ontology_config import Config
from cognee.modules.ontology.rdf_xml.RDFLibOntologyResolver import (
RDFLibOntologyResolver,
)
from io import StringIO
ontology_streams = [StringIO(content) for content in ontology_contents]
config_to_use: Config = {
"ontology_config": {
"ontology_resolver": RDFLibOntologyResolver(ontology_file=ontology_streams)
}
}
cognify_run = await cognee_cognify(
datasets,
user,
config=config_to_use,
run_in_background=payload.run_in_background,
custom_prompt=payload.custom_prompt,
)

View file

@ -208,14 +208,14 @@ def get_datasets_router() -> APIRouter:
},
)
from cognee.modules.data.methods import get_dataset, delete_dataset
from cognee.modules.data.methods import delete_dataset
dataset = await get_dataset(user.id, dataset_id)
dataset = await get_authorized_existing_datasets([dataset_id], "delete", user)
if dataset is None:
raise DatasetNotFoundError(message=f"Dataset ({str(dataset_id)}) not found.")
await delete_dataset(dataset)
await delete_dataset(dataset[0])
@router.delete(
"/{dataset_id}/data/{data_id}",

View file

@ -0,0 +1,4 @@
from .ontologies import OntologyService
from .routers.get_ontology_router import get_ontology_router
__all__ = ["OntologyService", "get_ontology_router"]

View file

@ -0,0 +1,158 @@
import os
import json
import tempfile
from pathlib import Path
from datetime import datetime, timezone
from typing import Optional, List
from dataclasses import dataclass
from fastapi import UploadFile
@dataclass
class OntologyMetadata:
ontology_key: str
filename: str
size_bytes: int
uploaded_at: str
description: Optional[str] = None
class OntologyService:
def __init__(self):
pass
@property
def base_dir(self) -> Path:
return Path(tempfile.gettempdir()) / "ontologies"
def _get_user_dir(self, user_id: str) -> Path:
user_dir = self.base_dir / str(user_id)
user_dir.mkdir(parents=True, exist_ok=True)
return user_dir
def _get_metadata_path(self, user_dir: Path) -> Path:
return user_dir / "metadata.json"
def _load_metadata(self, user_dir: Path) -> dict:
metadata_path = self._get_metadata_path(user_dir)
if metadata_path.exists():
with open(metadata_path, "r") as f:
return json.load(f)
return {}
def _save_metadata(self, user_dir: Path, metadata: dict):
metadata_path = self._get_metadata_path(user_dir)
with open(metadata_path, "w") as f:
json.dump(metadata, f, indent=2)
async def upload_ontology(
self, ontology_key: str, file: UploadFile, user, description: Optional[str] = None
) -> OntologyMetadata:
if not file.filename:
raise ValueError("File must have a filename")
if not file.filename.lower().endswith(".owl"):
raise ValueError("File must be in .owl format")
user_dir = self._get_user_dir(str(user.id))
metadata = self._load_metadata(user_dir)
if ontology_key in metadata:
raise ValueError(f"Ontology key '{ontology_key}' already exists")
content = await file.read()
file_path = user_dir / f"{ontology_key}.owl"
with open(file_path, "wb") as f:
f.write(content)
ontology_metadata = {
"filename": file.filename,
"size_bytes": len(content),
"uploaded_at": datetime.now(timezone.utc).isoformat(),
"description": description,
}
metadata[ontology_key] = ontology_metadata
self._save_metadata(user_dir, metadata)
return OntologyMetadata(
ontology_key=ontology_key,
filename=file.filename,
size_bytes=len(content),
uploaded_at=ontology_metadata["uploaded_at"],
description=description,
)
async def upload_ontologies(
self,
ontology_key: List[str],
files: List[UploadFile],
user,
descriptions: Optional[List[str]] = None,
) -> List[OntologyMetadata]:
"""
Upload ontology files with their respective keys.
Args:
ontology_key: List of unique keys for each ontology
files: List of UploadFile objects (same length as keys)
user: Authenticated user
descriptions: Optional list of descriptions for each file
Returns:
List of OntologyMetadata objects for uploaded files
Raises:
ValueError: If keys duplicate, file format invalid, or array lengths don't match
"""
if len(ontology_key) != len(files):
raise ValueError("Number of keys must match number of files")
if len(set(ontology_key)) != len(ontology_key):
raise ValueError("Duplicate ontology keys not allowed")
results = []
for i, (key, file) in enumerate(zip(ontology_key, files)):
results.append(
await self.upload_ontology(
ontology_key=key,
file=file,
user=user,
description=descriptions[i] if descriptions else None,
)
)
return results
def get_ontology_contents(self, ontology_key: List[str], user) -> List[str]:
"""
Retrieve ontology content for one or more keys.
Args:
ontology_key: List of ontology keys to retrieve (can contain single item)
user: Authenticated user
Returns:
List of ontology content strings
Raises:
ValueError: If any ontology key not found
"""
user_dir = self._get_user_dir(str(user.id))
metadata = self._load_metadata(user_dir)
contents = []
for key in ontology_key:
if key not in metadata:
raise ValueError(f"Ontology key '{key}' not found")
file_path = user_dir / f"{key}.owl"
if not file_path.exists():
raise ValueError(f"Ontology file for key '{key}' not found")
with open(file_path, "r", encoding="utf-8") as f:
contents.append(f.read())
return contents
def list_ontologies(self, user) -> dict:
user_dir = self._get_user_dir(str(user.id))
return self._load_metadata(user_dir)

View file

@ -0,0 +1,109 @@
from fastapi import APIRouter, File, Form, UploadFile, Depends, Request
from fastapi.responses import JSONResponse
from typing import Optional, List
from cognee.modules.users.models import User
from cognee.modules.users.methods import get_authenticated_user
from cognee.shared.utils import send_telemetry
from cognee import __version__ as cognee_version
from ..ontologies import OntologyService
def get_ontology_router() -> APIRouter:
router = APIRouter()
ontology_service = OntologyService()
@router.post("", response_model=dict)
async def upload_ontology(
request: Request,
ontology_key: str = Form(...),
ontology_file: UploadFile = File(...),
description: Optional[str] = Form(None),
user: User = Depends(get_authenticated_user),
):
"""
Upload a single ontology file for later use in cognify operations.
## Request Parameters
- **ontology_key** (str): User-defined identifier for the ontology.
- **ontology_file** (UploadFile): Single OWL format ontology file
- **description** (Optional[str]): Optional description for the ontology.
## Response
Returns metadata about the uploaded ontology including key, filename, size, and upload timestamp.
## Error Codes
- **400 Bad Request**: Invalid file format, duplicate key, multiple files uploaded
- **500 Internal Server Error**: File system or processing errors
"""
send_telemetry(
"Ontology Upload API Endpoint Invoked",
user.id,
additional_properties={
"endpoint": "POST /api/v1/ontologies",
"cognee_version": cognee_version,
},
)
try:
# Enforce: exactly one uploaded file for "ontology_file"
form = await request.form()
uploaded_files = form.getlist("ontology_file")
if len(uploaded_files) != 1:
raise ValueError("Only one ontology_file is allowed")
if ontology_key.strip().startswith(("[", "{")):
raise ValueError("ontology_key must be a string")
if description is not None and description.strip().startswith(("[", "{")):
raise ValueError("description must be a string")
result = await ontology_service.upload_ontology(
ontology_key=ontology_key,
file=ontology_file,
user=user,
description=description,
)
return {
"uploaded_ontologies": [
{
"ontology_key": result.ontology_key,
"filename": result.filename,
"size_bytes": result.size_bytes,
"uploaded_at": result.uploaded_at,
"description": result.description,
}
]
}
except ValueError as e:
return JSONResponse(status_code=400, content={"error": str(e)})
except Exception as e:
return JSONResponse(status_code=500, content={"error": str(e)})
@router.get("", response_model=dict)
async def list_ontologies(user: User = Depends(get_authenticated_user)):
"""
List all uploaded ontologies for the authenticated user.
## Response
Returns a dictionary mapping ontology keys to their metadata including filename, size, and upload timestamp.
## Error Codes
- **500 Internal Server Error**: File system or processing errors
"""
send_telemetry(
"Ontology List API Endpoint Invoked",
user.id,
additional_properties={
"endpoint": "GET /api/v1/ontologies",
"cognee_version": cognee_version,
},
)
try:
metadata = ontology_service.list_ontologies(user)
return metadata
except Exception as e:
return JSONResponse(status_code=500, content={"error": str(e)})
return router

View file

@ -1,15 +1,20 @@
from uuid import UUID
from typing import List
from typing import List, Union
from fastapi import APIRouter, Depends
from fastapi.responses import JSONResponse
from cognee.modules.users.models import User
from cognee.api.DTO import InDTO
from cognee.modules.users.methods import get_authenticated_user
from cognee.shared.utils import send_telemetry
from cognee import __version__ as cognee_version
class SelectTenantDTO(InDTO):
tenant_id: UUID | None = None
def get_permissions_router() -> APIRouter:
permissions_router = APIRouter()
@ -226,4 +231,39 @@ def get_permissions_router() -> APIRouter:
status_code=200, content={"message": "Tenant created.", "tenant_id": str(tenant_id)}
)
@permissions_router.post("/tenants/select")
async def select_tenant(payload: SelectTenantDTO, user: User = Depends(get_authenticated_user)):
"""
Select current tenant.
This endpoint selects a tenant with the specified UUID. Tenants are used
to organize users and resources in multi-tenant environments, providing
isolation and access control between different groups or organizations.
Sending a null/None value as tenant_id selects his default single user tenant
## Request Parameters
- **tenant_id** (Union[UUID, None]): UUID of the tenant to select, If null/None is provided use the default single user tenant
## Response
Returns a success message along with selected tenant id.
"""
send_telemetry(
"Permissions API Endpoint Invoked",
user.id,
additional_properties={
"endpoint": f"POST /v1/permissions/tenants/{str(payload.tenant_id)}",
"tenant_id": str(payload.tenant_id),
},
)
from cognee.modules.users.tenants.methods import select_tenant as select_tenant_method
await select_tenant_method(user_id=user.id, tenant_id=payload.tenant_id)
return JSONResponse(
status_code=200,
content={"message": "Tenant selected.", "tenant_id": str(payload.tenant_id)},
)
return permissions_router

View file

@ -31,6 +31,9 @@ async def search(
only_context: bool = False,
use_combined_context: bool = False,
session_id: Optional[str] = None,
wide_search_top_k: Optional[int] = 100,
triplet_distance_penalty: Optional[float] = 3.5,
verbose: bool = False,
) -> Union[List[SearchResult], CombinedSearchResult]:
"""
Search and query the knowledge graph for insights, information, and connections.
@ -121,6 +124,8 @@ async def search(
session_id: Optional session identifier for caching Q&A interactions. Defaults to 'default_session' if None.
verbose: If True, returns detailed result information including graph representation (when possible).
Returns:
list: Search results in format determined by query_type:
@ -200,6 +205,9 @@ async def search(
only_context=only_context,
use_combined_context=use_combined_context,
session_id=session_id,
wide_search_top_k=wide_search_top_k,
triplet_distance_penalty=triplet_distance_penalty,
verbose=verbose,
)
return filtered_search_results

View file

@ -0,0 +1,360 @@
import os
import platform
import subprocess
import tempfile
from pathlib import Path
import requests
from cognee.shared.logging_utils import get_logger
logger = get_logger()
def get_nvm_dir() -> Path:
"""
Get the nvm directory path following standard nvm installation logic.
Uses XDG_CONFIG_HOME if set, otherwise falls back to ~/.nvm.
"""
xdg_config_home = os.environ.get("XDG_CONFIG_HOME")
if xdg_config_home:
return Path(xdg_config_home) / "nvm"
return Path.home() / ".nvm"
def get_nvm_sh_path() -> Path:
"""
Get the path to nvm.sh following standard nvm installation logic.
"""
return get_nvm_dir() / "nvm.sh"
def check_nvm_installed() -> bool:
"""
Check if nvm (Node Version Manager) is installed.
"""
try:
# Check if nvm is available in the shell
# nvm is typically sourced in shell config files, so we need to check via shell
if platform.system() == "Windows":
# On Windows, nvm-windows uses a different approach
result = subprocess.run(
["nvm", "version"],
capture_output=True,
text=True,
timeout=10,
shell=True,
)
else:
# On Unix-like systems, nvm is a shell function, so we need to source it
# First check if nvm.sh exists
nvm_path = get_nvm_sh_path()
if not nvm_path.exists():
logger.debug(f"nvm.sh not found at {nvm_path}")
return False
# Try to source nvm and check version, capturing errors
result = subprocess.run(
["bash", "-c", f"source {nvm_path} && nvm --version"],
capture_output=True,
text=True,
timeout=10,
)
if result.returncode != 0:
# Log the error to help diagnose configuration issues
if result.stderr:
logger.debug(f"nvm check failed: {result.stderr.strip()}")
return False
return result.returncode == 0
except Exception as e:
logger.debug(f"Exception checking nvm: {str(e)}")
return False
def install_nvm() -> bool:
"""
Install nvm (Node Version Manager) on Unix-like systems.
"""
if platform.system() == "Windows":
logger.error("nvm installation on Windows requires nvm-windows.")
logger.error(
"Please install nvm-windows manually from: https://github.com/coreybutler/nvm-windows"
)
return False
logger.info("Installing nvm (Node Version Manager)...")
try:
# Download and install nvm
nvm_install_script = "https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh"
logger.info(f"Downloading nvm installer from {nvm_install_script}...")
response = requests.get(nvm_install_script, timeout=60)
response.raise_for_status()
# Create a temporary script file
with tempfile.NamedTemporaryFile(mode="w", suffix=".sh", delete=False) as f:
f.write(response.text)
install_script_path = f.name
try:
# Make the script executable and run it
os.chmod(install_script_path, 0o755)
result = subprocess.run(
["bash", install_script_path],
capture_output=True,
text=True,
timeout=120,
)
if result.returncode == 0:
logger.info("✓ nvm installed successfully")
# Source nvm in current shell session
nvm_dir = get_nvm_dir()
if nvm_dir.exists():
return True
else:
logger.warning(
f"nvm installation completed but nvm directory not found at {nvm_dir}"
)
return False
else:
logger.error(f"nvm installation failed: {result.stderr}")
return False
finally:
# Clean up temporary script
try:
os.unlink(install_script_path)
except Exception:
pass
except requests.exceptions.RequestException as e:
logger.error(f"Failed to download nvm installer: {str(e)}")
return False
except Exception as e:
logger.error(f"Failed to install nvm: {str(e)}")
return False
def install_node_with_nvm() -> bool:
"""
Install the latest Node.js version using nvm.
Returns True if installation succeeds, False otherwise.
"""
if platform.system() == "Windows":
logger.error("Node.js installation via nvm on Windows requires nvm-windows.")
logger.error("Please install Node.js manually from: https://nodejs.org/")
return False
logger.info("Installing latest Node.js version using nvm...")
try:
# Source nvm and install latest Node.js
nvm_path = get_nvm_sh_path()
if not nvm_path.exists():
logger.error(f"nvm.sh not found at {nvm_path}. nvm may not be properly installed.")
return False
nvm_source_cmd = f"source {nvm_path}"
install_cmd = f"{nvm_source_cmd} && nvm install node"
result = subprocess.run(
["bash", "-c", install_cmd],
capture_output=True,
text=True,
timeout=300, # 5 minutes timeout for Node.js installation
)
if result.returncode == 0:
logger.info("✓ Node.js installed successfully via nvm")
# Set as default version
use_cmd = f"{nvm_source_cmd} && nvm alias default node"
subprocess.run(
["bash", "-c", use_cmd],
capture_output=True,
text=True,
timeout=30,
)
# Add nvm to PATH for current session
# This ensures node/npm are available in subsequent commands
nvm_dir = get_nvm_dir()
if nvm_dir.exists():
# Update PATH for current process
nvm_bin = nvm_dir / "versions" / "node"
# Find the latest installed version
if nvm_bin.exists():
versions = sorted(nvm_bin.iterdir(), reverse=True)
if versions:
latest_node_bin = versions[0] / "bin"
if latest_node_bin.exists():
current_path = os.environ.get("PATH", "")
os.environ["PATH"] = f"{latest_node_bin}:{current_path}"
return True
else:
logger.error(f"Failed to install Node.js: {result.stderr}")
return False
except subprocess.TimeoutExpired:
logger.error("Timeout installing Node.js (this can take several minutes)")
return False
except Exception as e:
logger.error(f"Error installing Node.js: {str(e)}")
return False
def check_node_npm() -> tuple[bool, str]: # (is_available, error_message)
"""
Check if Node.js and npm are available.
If not available, attempts to install nvm and Node.js automatically.
"""
try:
# Check Node.js - try direct command first, then with nvm if needed
result = subprocess.run(["node", "--version"], capture_output=True, text=True, timeout=10)
if result.returncode != 0:
# If direct command fails, try with nvm sourced (in case nvm is installed but not in PATH)
nvm_path = get_nvm_sh_path()
if nvm_path.exists():
result = subprocess.run(
["bash", "-c", f"source {nvm_path} && node --version"],
capture_output=True,
text=True,
timeout=10,
)
if result.returncode != 0 and result.stderr:
logger.debug(f"Failed to source nvm or run node: {result.stderr.strip()}")
if result.returncode != 0:
# Node.js is not installed, try to install it
logger.info("Node.js is not installed. Attempting to install automatically...")
# Check if nvm is installed
if not check_nvm_installed():
logger.info("nvm is not installed. Installing nvm first...")
if not install_nvm():
return (
False,
"Failed to install nvm. Please install Node.js manually from https://nodejs.org/",
)
# Install Node.js using nvm
if not install_node_with_nvm():
return (
False,
"Failed to install Node.js. Please install Node.js manually from https://nodejs.org/",
)
# Verify installation after automatic setup
# Try with nvm sourced first
nvm_path = get_nvm_sh_path()
if nvm_path.exists():
result = subprocess.run(
["bash", "-c", f"source {nvm_path} && node --version"],
capture_output=True,
text=True,
timeout=10,
)
if result.returncode != 0 and result.stderr:
logger.debug(
f"Failed to verify node after installation: {result.stderr.strip()}"
)
else:
result = subprocess.run(
["node", "--version"], capture_output=True, text=True, timeout=10
)
if result.returncode != 0:
nvm_path = get_nvm_sh_path()
return (
False,
f"Node.js installation completed but node command is not available. Please restart your terminal or source {nvm_path}",
)
node_version = result.stdout.strip()
logger.debug(f"Found Node.js version: {node_version}")
# Check npm - handle Windows PowerShell scripts
if platform.system() == "Windows":
# On Windows, npm might be a PowerShell script, so we need to use shell=True
result = subprocess.run(
["npm", "--version"], capture_output=True, text=True, timeout=10, shell=True
)
else:
# On Unix-like systems, if we just installed via nvm, we may need to source nvm
# Try direct command first
result = subprocess.run(
["npm", "--version"], capture_output=True, text=True, timeout=10
)
if result.returncode != 0:
# Try with nvm sourced
nvm_path = get_nvm_sh_path()
if nvm_path.exists():
result = subprocess.run(
["bash", "-c", f"source {nvm_path} && npm --version"],
capture_output=True,
text=True,
timeout=10,
)
if result.returncode != 0 and result.stderr:
logger.debug(f"Failed to source nvm or run npm: {result.stderr.strip()}")
if result.returncode != 0:
return False, "npm is not installed or not in PATH"
npm_version = result.stdout.strip()
logger.debug(f"Found npm version: {npm_version}")
return True, f"Node.js {node_version}, npm {npm_version}"
except subprocess.TimeoutExpired:
return False, "Timeout checking Node.js/npm installation"
except FileNotFoundError:
# Node.js is not installed, try to install it
logger.info("Node.js is not found. Attempting to install automatically...")
# Check if nvm is installed
if not check_nvm_installed():
logger.info("nvm is not installed. Installing nvm first...")
if not install_nvm():
return (
False,
"Failed to install nvm. Please install Node.js manually from https://nodejs.org/",
)
# Install Node.js using nvm
if not install_node_with_nvm():
return (
False,
"Failed to install Node.js. Please install Node.js manually from https://nodejs.org/",
)
# Retry checking Node.js after installation
try:
result = subprocess.run(
["node", "--version"], capture_output=True, text=True, timeout=10
)
if result.returncode == 0:
node_version = result.stdout.strip()
# Check npm
nvm_path = get_nvm_sh_path()
if nvm_path.exists():
result = subprocess.run(
["bash", "-c", f"source {nvm_path} && npm --version"],
capture_output=True,
text=True,
timeout=10,
)
if result.returncode == 0:
npm_version = result.stdout.strip()
return True, f"Node.js {node_version}, npm {npm_version}"
elif result.stderr:
logger.debug(f"Failed to source nvm or run npm: {result.stderr.strip()}")
except Exception as e:
logger.debug(f"Exception retrying node/npm check: {str(e)}")
return False, "Node.js/npm not found. Please install Node.js from https://nodejs.org/"
except Exception as e:
return False, f"Error checking Node.js/npm: {str(e)}"

View file

@ -0,0 +1,50 @@
import platform
import subprocess
from pathlib import Path
from typing import List
from cognee.shared.logging_utils import get_logger
from .node_setup import get_nvm_sh_path
logger = get_logger()
def run_npm_command(cmd: List[str], cwd: Path, timeout: int = 300) -> subprocess.CompletedProcess:
"""
Run an npm command, ensuring nvm is sourced if needed (Unix-like systems only).
Returns the CompletedProcess result.
"""
if platform.system() == "Windows":
# On Windows, use shell=True for npm commands
return subprocess.run(
cmd,
cwd=cwd,
capture_output=True,
text=True,
timeout=timeout,
shell=True,
)
else:
# On Unix-like systems, try direct command first
result = subprocess.run(
cmd,
cwd=cwd,
capture_output=True,
text=True,
timeout=timeout,
)
# If it fails and nvm might be installed, try with nvm sourced
if result.returncode != 0:
nvm_path = get_nvm_sh_path()
if nvm_path.exists():
nvm_cmd = f"source {nvm_path} && {' '.join(cmd)}"
result = subprocess.run(
["bash", "-c", nvm_cmd],
cwd=cwd,
capture_output=True,
text=True,
timeout=timeout,
)
if result.returncode != 0 and result.stderr:
logger.debug(f"npm command failed with nvm: {result.stderr.strip()}")
return result

View file

@ -15,6 +15,8 @@ import shutil
from cognee.shared.logging_utils import get_logger
from cognee.version import get_cognee_version
from .node_setup import check_node_npm, get_nvm_dir, get_nvm_sh_path
from .npm_utils import run_npm_command
logger = get_logger()
@ -285,48 +287,6 @@ def find_frontend_path() -> Optional[Path]:
return None
def check_node_npm() -> tuple[bool, str]:
"""
Check if Node.js and npm are available.
Returns (is_available, error_message)
"""
try:
# Check Node.js
result = subprocess.run(["node", "--version"], capture_output=True, text=True, timeout=10)
if result.returncode != 0:
return False, "Node.js is not installed or not in PATH"
node_version = result.stdout.strip()
logger.debug(f"Found Node.js version: {node_version}")
# Check npm - handle Windows PowerShell scripts
if platform.system() == "Windows":
# On Windows, npm might be a PowerShell script, so we need to use shell=True
result = subprocess.run(
["npm", "--version"], capture_output=True, text=True, timeout=10, shell=True
)
else:
result = subprocess.run(
["npm", "--version"], capture_output=True, text=True, timeout=10
)
if result.returncode != 0:
return False, "npm is not installed or not in PATH"
npm_version = result.stdout.strip()
logger.debug(f"Found npm version: {npm_version}")
return True, f"Node.js {node_version}, npm {npm_version}"
except subprocess.TimeoutExpired:
return False, "Timeout checking Node.js/npm installation"
except FileNotFoundError:
return False, "Node.js/npm not found. Please install Node.js from https://nodejs.org/"
except Exception as e:
return False, f"Error checking Node.js/npm: {str(e)}"
def install_frontend_dependencies(frontend_path: Path) -> bool:
"""
Install frontend dependencies if node_modules doesn't exist.
@ -341,24 +301,7 @@ def install_frontend_dependencies(frontend_path: Path) -> bool:
logger.info("Installing frontend dependencies (this may take a few minutes)...")
try:
# Use shell=True on Windows for npm commands
if platform.system() == "Windows":
result = subprocess.run(
["npm", "install"],
cwd=frontend_path,
capture_output=True,
text=True,
timeout=300, # 5 minutes timeout
shell=True,
)
else:
result = subprocess.run(
["npm", "install"],
cwd=frontend_path,
capture_output=True,
text=True,
timeout=300, # 5 minutes timeout
)
result = run_npm_command(["npm", "install"], frontend_path, timeout=300)
if result.returncode == 0:
logger.info("Frontend dependencies installed successfully")
@ -642,6 +585,21 @@ def start_ui(
env["HOST"] = "localhost"
env["PORT"] = str(port)
# If nvm is installed, ensure it's available in the environment
nvm_path = get_nvm_sh_path()
if platform.system() != "Windows" and nvm_path.exists():
# Add nvm to PATH for the subprocess
nvm_dir = get_nvm_dir()
# Find the latest Node.js version installed via nvm
nvm_versions = nvm_dir / "versions" / "node"
if nvm_versions.exists():
versions = sorted(nvm_versions.iterdir(), reverse=True)
if versions:
latest_node_bin = versions[0] / "bin"
if latest_node_bin.exists():
current_path = env.get("PATH", "")
env["PATH"] = f"{latest_node_bin}:{current_path}"
# Start the development server
logger.info(f"Starting frontend server at http://localhost:{port}")
logger.info("This may take a moment to compile and start...")
@ -659,14 +617,26 @@ def start_ui(
shell=True,
)
else:
process = subprocess.Popen(
["npm", "run", "dev"],
cwd=frontend_path,
env=env,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
preexec_fn=os.setsid if hasattr(os, "setsid") else None,
)
# On Unix-like systems, use bash with nvm sourced if available
if nvm_path.exists():
# Use bash to source nvm and run npm
process = subprocess.Popen(
["bash", "-c", f"source {nvm_path} && npm run dev"],
cwd=frontend_path,
env=env,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
preexec_fn=os.setsid if hasattr(os, "setsid") else None,
)
else:
process = subprocess.Popen(
["npm", "run", "dev"],
cwd=frontend_path,
env=env,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
preexec_fn=os.setsid if hasattr(os, "setsid") else None,
)
# Start threads to stream frontend output with prefix
_stream_process_output(process, "stdout", "[FRONTEND]", "\033[33m") # Yellow

View file

@ -22,7 +22,7 @@ relationships, and creates semantic connections for enhanced search and reasonin
Processing Pipeline:
1. **Document Classification**: Identifies document types and structures
2. **Permission Validation**: Ensures user has processing rights
2. **Permission Validation**: Ensures user has processing rights
3. **Text Chunking**: Breaks content into semantically meaningful segments
4. **Entity Extraction**: Identifies key concepts, people, places, organizations
5. **Relationship Detection**: Discovers connections between entities
@ -97,6 +97,13 @@ After successful cognify processing, use `cognee search` to query the knowledge
chunker_class = LangchainChunker
except ImportError:
fmt.warning("LangchainChunker not available, using TextChunker")
elif args.chunker == "CsvChunker":
try:
from cognee.modules.chunking.CsvChunker import CsvChunker
chunker_class = CsvChunker
except ImportError:
fmt.warning("CsvChunker not available, using TextChunker")
result = await cognee.cognify(
datasets=datasets,

View file

@ -26,7 +26,7 @@ SEARCH_TYPE_CHOICES = [
]
# Chunker choices
CHUNKER_CHOICES = ["TextChunker", "LangchainChunker"]
CHUNKER_CHOICES = ["TextChunker", "LangchainChunker", "CsvChunker"]
# Output format choices
OUTPUT_FORMAT_CHOICES = ["json", "pretty", "simple"]

View file

@ -4,7 +4,10 @@ from typing import Union
from uuid import UUID
from cognee.base_config import get_base_config
from cognee.infrastructure.databases.vector.config import get_vectordb_config
from cognee.infrastructure.databases.graph.config import get_graph_config
from cognee.infrastructure.databases.utils import get_or_create_dataset_database
from cognee.infrastructure.databases.utils import resolve_dataset_database_connection_info
from cognee.infrastructure.files.storage.config import file_storage_config
from cognee.modules.users.methods import get_user
@ -19,6 +22,67 @@ async def set_session_user_context_variable(user):
session_user.set(user)
def multi_user_support_possible():
graph_db_config = get_graph_config()
vector_db_config = get_vectordb_config()
graph_handler = graph_db_config.graph_dataset_database_handler
vector_handler = vector_db_config.vector_dataset_database_handler
from cognee.infrastructure.databases.dataset_database_handler import (
supported_dataset_database_handlers,
)
if graph_handler not in supported_dataset_database_handlers:
raise EnvironmentError(
"Unsupported graph dataset to database handler configured. Cannot add support for multi-user access control mode. Please use a supported graph dataset to database handler or set the environment variables ENABLE_BACKEND_ACCESS_CONTROL to false to switch off multi-user access control mode.\n"
f"Selected graph dataset to database handler: {graph_handler}\n"
f"Supported dataset to database handlers: {list(supported_dataset_database_handlers.keys())}\n"
)
if vector_handler not in supported_dataset_database_handlers:
raise EnvironmentError(
"Unsupported vector dataset to database handler configured. Cannot add support for multi-user access control mode. Please use a supported vector dataset to database handler or set the environment variables ENABLE_BACKEND_ACCESS_CONTROL to false to switch off multi-user access control mode.\n"
f"Selected vector dataset to database handler: {vector_handler}\n"
f"Supported dataset to database handlers: {list(supported_dataset_database_handlers.keys())}\n"
)
if (
supported_dataset_database_handlers[graph_handler]["handler_provider"]
!= graph_db_config.graph_database_provider
):
raise EnvironmentError(
"The selected graph dataset to database handler does not work with the configured graph database provider. Cannot add support for multi-user access control mode. Please use a supported graph dataset to database handler or set the environment variables ENABLE_BACKEND_ACCESS_CONTROL to false to switch off multi-user access control mode.\n"
f"Selected graph database provider: {graph_db_config.graph_database_provider}\n"
f"Selected graph dataset to database handler: {graph_handler}\n"
f"Supported dataset to database handlers: {list(supported_dataset_database_handlers.keys())}\n"
)
if (
supported_dataset_database_handlers[vector_handler]["handler_provider"]
!= vector_db_config.vector_db_provider
):
raise EnvironmentError(
"The selected vector dataset to database handler does not work with the configured vector database provider. Cannot add support for multi-user access control mode. Please use a supported vector dataset to database handler or set the environment variables ENABLE_BACKEND_ACCESS_CONTROL to false to switch off multi-user access control mode.\n"
f"Selected vector database provider: {vector_db_config.vector_db_provider}\n"
f"Selected vector dataset to database handler: {vector_handler}\n"
f"Supported dataset to database handlers: {list(supported_dataset_database_handlers.keys())}\n"
)
return True
def backend_access_control_enabled():
backend_access_control = os.environ.get("ENABLE_BACKEND_ACCESS_CONTROL", None)
if backend_access_control is None:
# If backend access control is not defined in environment variables,
# enable it by default if graph and vector DBs can support it, otherwise disable it
return multi_user_support_possible()
elif backend_access_control.lower() == "true":
# If enabled, ensure that the current graph and vector DBs can support it
return multi_user_support_possible()
return False
async def set_database_global_context_variables(dataset: Union[str, UUID], user_id: UUID):
"""
If backend access control is enabled this function will ensure all datasets have their own databases,
@ -38,16 +102,17 @@ async def set_database_global_context_variables(dataset: Union[str, UUID], user_
"""
base_config = get_base_config()
if not os.getenv("ENABLE_BACKEND_ACCESS_CONTROL", "false").lower() == "true":
if not backend_access_control_enabled():
return
user = await get_user(user_id)
# To ensure permissions are enforced properly all datasets will have their own databases
dataset_database = await get_or_create_dataset_database(dataset, user)
# Ensure that all connection info is resolved properly
dataset_database = await resolve_dataset_database_connection_info(dataset_database)
base_config = get_base_config()
data_root_directory = os.path.join(
base_config.data_root_directory, str(user.tenant_id or user.id)
)
@ -56,19 +121,31 @@ async def set_database_global_context_variables(dataset: Union[str, UUID], user_
)
# Set vector and graph database configuration based on dataset database information
# TODO: Add better handling of vector and graph config accross Cognee.
# LRU_CACHE takes into account order of inputs, if order of inputs is changed it will be registered as a new DB adapter
vector_config = {
"vector_db_url": os.path.join(
databases_directory_path, dataset_database.vector_database_name
),
"vector_db_key": "",
"vector_db_provider": "lancedb",
"vector_db_provider": dataset_database.vector_database_provider,
"vector_db_url": dataset_database.vector_database_url,
"vector_db_key": dataset_database.vector_database_key,
"vector_db_name": dataset_database.vector_database_name,
}
graph_config = {
"graph_database_provider": "kuzu",
"graph_database_provider": dataset_database.graph_database_provider,
"graph_database_url": dataset_database.graph_database_url,
"graph_database_name": dataset_database.graph_database_name,
"graph_database_key": dataset_database.graph_database_key,
"graph_file_path": os.path.join(
databases_directory_path, dataset_database.graph_database_name
),
"graph_database_username": dataset_database.graph_database_connection_info.get(
"graph_database_username", ""
),
"graph_database_password": dataset_database.graph_database_connection_info.get(
"graph_database_password", ""
),
"graph_dataset_database_handler": "",
"graph_database_port": "",
}
storage_config = {

View file

@ -0,0 +1,29 @@
FROM python:3.11-slim
# Set environment variables
ENV PIP_NO_CACHE_DIR=true
ENV PATH="${PATH}:/root/.poetry/bin"
ENV PYTHONPATH=/app
ENV SKIP_MIGRATIONS=true
# System dependencies
RUN apt-get update && apt-get install -y \
gcc \
libpq-dev \
git \
curl \
build-essential \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY pyproject.toml poetry.lock README.md /app/
RUN pip install poetry
RUN poetry config virtualenvs.create false
RUN poetry install --extras distributed --extras evals --extras deepeval --no-root
COPY cognee/ /app/cognee
COPY distributed/ /app/distributed

View file

@ -35,6 +35,16 @@ class AnswerGeneratorExecutor:
retrieval_context = await retriever.get_context(query_text)
search_results = await retriever.get_completion(query_text, retrieval_context)
############
#:TODO This is a quick fix until we don't structure retriever results properly but lets not leave it like this...this is needed now due to the changed combined retriever structure..
if isinstance(retrieval_context, list):
retrieval_context = await retriever.convert_retrieved_objects_to_context(
triplets=retrieval_context
)
if isinstance(search_results, str):
search_results = [search_results]
#############
answer = {
"question": query_text,
"answer": search_results[0],

View file

@ -35,7 +35,7 @@ async def create_and_insert_answers_table(questions_payload):
async def run_question_answering(
params: dict, system_prompt="answer_simple_question.txt", top_k: Optional[int] = None
params: dict, system_prompt="answer_simple_question_benchmark.txt", top_k: Optional[int] = None
) -> List[dict]:
if params.get("answering_questions"):
logger.info("Question answering started...")

View file

@ -8,7 +8,6 @@ from cognee.modules.users.models import User
from cognee.shared.data_models import KnowledgeGraph
from cognee.shared.utils import send_telemetry
from cognee.tasks.documents import (
check_permissions_on_dataset,
classify_documents,
extract_chunks_from_documents,
)
@ -31,7 +30,6 @@ async def get_cascade_graph_tasks(
cognee_config = get_cognify_config()
default_tasks = [
Task(classify_documents),
Task(check_permissions_on_dataset, user=user, permissions=["write"]),
Task(
extract_chunks_from_documents, max_chunk_tokens=get_max_chunk_tokens()
), # Extract text chunks based on the document type.

View file

@ -30,8 +30,8 @@ async def get_no_summary_tasks(
ontology_file_path=None,
) -> List[Task]:
"""Returns default tasks without summarization tasks."""
# Get base tasks (0=classify, 1=check_permissions, 2=extract_chunks)
base_tasks = await get_default_tasks_by_indices([0, 1, 2], chunk_size, chunker)
# Get base tasks (0=classify, 1=extract_chunks)
base_tasks = await get_default_tasks_by_indices([0, 1], chunk_size, chunker)
ontology_adapter = RDFLibOntologyResolver(ontology_file=ontology_file_path)
@ -51,8 +51,8 @@ async def get_just_chunks_tasks(
chunk_size: int = None, chunker=TextChunker, user=None
) -> List[Task]:
"""Returns default tasks with only chunk extraction and data points addition."""
# Get base tasks (0=classify, 1=check_permissions, 2=extract_chunks)
base_tasks = await get_default_tasks_by_indices([0, 1, 2], chunk_size, chunker)
# Get base tasks (0=classify, 1=extract_chunks)
base_tasks = await get_default_tasks_by_indices([0, 1], chunk_size, chunker)
add_data_points_task = Task(add_data_points, task_config={"batch_size": 10})

View file

@ -14,7 +14,7 @@ class EvalConfig(BaseSettings):
# Question answering params
answering_questions: bool = True
qa_engine: str = "cognee_completion" # Options: 'cognee_completion' or 'cognee_graph_completion' or 'cognee_graph_completion_cot' or 'cognee_graph_completion_context_extension'
qa_engine: str = "cognee_graph_completion" # Options: 'cognee_completion' or 'cognee_graph_completion' or 'cognee_graph_completion_cot' or 'cognee_graph_completion_context_extension'
# Evaluation params
evaluating_answers: bool = True
@ -25,7 +25,7 @@ class EvalConfig(BaseSettings):
"EM",
"f1",
] # Use only 'correctness' for DirectLLM
deepeval_model: str = "gpt-5-mini"
deepeval_model: str = "gpt-4o-mini"
# Metrics params
calculate_metrics: bool = True

View file

@ -2,7 +2,6 @@ import modal
import os
import asyncio
import datetime
import hashlib
import json
from cognee.shared.logging_utils import get_logger
from cognee.eval_framework.eval_config import EvalConfig
@ -10,6 +9,9 @@ from cognee.eval_framework.corpus_builder.run_corpus_builder import run_corpus_b
from cognee.eval_framework.answer_generation.run_question_answering_module import (
run_question_answering,
)
import pathlib
from os import path
from modal import Image
from cognee.eval_framework.evaluation.run_evaluation_module import run_evaluation
from cognee.eval_framework.metrics_dashboard import create_dashboard
@ -38,22 +40,19 @@ def read_and_combine_metrics(eval_params: dict) -> dict:
app = modal.App("modal-run-eval")
image = (
modal.Image.from_dockerfile(path="Dockerfile_modal", force_build=False)
.copy_local_file("pyproject.toml", "pyproject.toml")
.copy_local_file("poetry.lock", "poetry.lock")
.env(
{
"ENV": os.getenv("ENV"),
"LLM_API_KEY": os.getenv("LLM_API_KEY"),
"OPENAI_API_KEY": os.getenv("OPENAI_API_KEY"),
}
)
.pip_install("protobuf", "h2", "deepeval", "gdown", "plotly")
image = Image.from_dockerfile(
path=pathlib.Path(path.join(path.dirname(__file__), "Dockerfile")).resolve(),
force_build=False,
).add_local_python_source("cognee")
@app.function(
image=image,
max_containers=10,
timeout=86400,
volumes={"/data": vol},
secrets=[modal.Secret.from_name("eval_secrets")],
)
@app.function(image=image, concurrency_limit=10, timeout=86400, volumes={"/data": vol})
async def modal_run_eval(eval_params=None):
"""Runs evaluation pipeline and returns combined metrics results."""
if eval_params is None:
@ -105,18 +104,7 @@ async def main():
configs = [
EvalConfig(
task_getter_type="Default",
number_of_samples_in_corpus=10,
benchmark="HotPotQA",
qa_engine="cognee_graph_completion",
building_corpus_from_scratch=True,
answering_questions=True,
evaluating_answers=True,
calculate_metrics=True,
dashboard=True,
),
EvalConfig(
task_getter_type="Default",
number_of_samples_in_corpus=10,
number_of_samples_in_corpus=25,
benchmark="TwoWikiMultiHop",
qa_engine="cognee_graph_completion",
building_corpus_from_scratch=True,
@ -127,7 +115,7 @@ async def main():
),
EvalConfig(
task_getter_type="Default",
number_of_samples_in_corpus=10,
number_of_samples_in_corpus=25,
benchmark="Musique",
qa_engine="cognee_graph_completion",
building_corpus_from_scratch=True,

View file

@ -1,6 +1,6 @@
from pydantic_settings import BaseSettings, SettingsConfigDict
from functools import lru_cache
from typing import Optional
from typing import Optional, Literal
class CacheConfig(BaseSettings):
@ -15,6 +15,7 @@ class CacheConfig(BaseSettings):
- agentic_lock_timeout: Maximum time (in seconds) to wait for the lock release.
"""
cache_backend: Literal["redis", "fs"] = "fs"
caching: bool = False
shared_kuzu_lock: bool = False
cache_host: str = "localhost"
@ -28,6 +29,7 @@ class CacheConfig(BaseSettings):
def to_dict(self) -> dict:
return {
"cache_backend": self.cache_backend,
"caching": self.caching,
"shared_kuzu_lock": self.shared_kuzu_lock,
"cache_host": self.cache_host,

View file

@ -0,0 +1,151 @@
import asyncio
import json
import os
from datetime import datetime
import time
import threading
import diskcache as dc
from cognee.infrastructure.databases.cache.cache_db_interface import CacheDBInterface
from cognee.infrastructure.databases.exceptions.exceptions import (
CacheConnectionError,
SharedKuzuLockRequiresRedisError,
)
from cognee.infrastructure.files.storage.get_storage_config import get_storage_config
from cognee.shared.logging_utils import get_logger
logger = get_logger("FSCacheAdapter")
class FSCacheAdapter(CacheDBInterface):
def __init__(self):
default_key = "sessions_db"
storage_config = get_storage_config()
data_root_directory = storage_config["data_root_directory"]
cache_directory = os.path.join(data_root_directory, ".cognee_fs_cache", default_key)
os.makedirs(cache_directory, exist_ok=True)
self.cache = dc.Cache(directory=cache_directory)
self.cache.expire()
logger.debug(f"FSCacheAdapter initialized with cache directory: {cache_directory}")
def acquire_lock(self):
"""Lock acquisition is not available for filesystem cache backend."""
message = "Shared Kuzu lock requires Redis cache backend."
logger.error(message)
raise SharedKuzuLockRequiresRedisError()
def release_lock(self):
"""Lock release is not available for filesystem cache backend."""
message = "Shared Kuzu lock requires Redis cache backend."
logger.error(message)
raise SharedKuzuLockRequiresRedisError()
async def add_qa(
self,
user_id: str,
session_id: str,
question: str,
context: str,
answer: str,
ttl: int | None = 86400,
):
try:
session_key = f"agent_sessions:{user_id}:{session_id}"
qa_entry = {
"time": datetime.utcnow().isoformat(),
"question": question,
"context": context,
"answer": answer,
}
existing_value = self.cache.get(session_key)
if existing_value is not None:
value: list = json.loads(existing_value)
value.append(qa_entry)
else:
value = [qa_entry]
self.cache.set(session_key, json.dumps(value), expire=ttl)
except Exception as e:
error_msg = f"Unexpected error while adding Q&A to diskcache: {str(e)}"
logger.error(error_msg)
raise CacheConnectionError(error_msg) from e
async def get_latest_qa(self, user_id: str, session_id: str, last_n: int = 5):
session_key = f"agent_sessions:{user_id}:{session_id}"
value = self.cache.get(session_key)
if value is None:
return None
entries = json.loads(value)
return entries[-last_n:] if len(entries) > last_n else entries
async def get_all_qas(self, user_id: str, session_id: str):
session_key = f"agent_sessions:{user_id}:{session_id}"
value = self.cache.get(session_key)
if value is None:
return None
return json.loads(value)
async def close(self):
if self.cache is not None:
self.cache.expire()
self.cache.close()
async def main():
adapter = FSCacheAdapter()
session_id = "demo_session"
user_id = "demo_user_id"
print("\nAdding sample Q/A pairs...")
await adapter.add_qa(
user_id,
session_id,
"What is Redis?",
"Basic DB context",
"Redis is an in-memory data store.",
)
await adapter.add_qa(
user_id,
session_id,
"Who created Redis?",
"Historical context",
"Salvatore Sanfilippo (antirez).",
)
print("\nLatest QA:")
latest = await adapter.get_latest_qa(user_id, session_id)
print(json.dumps(latest, indent=2))
print("\nLast 2 QAs:")
last_two = await adapter.get_latest_qa(user_id, session_id, last_n=2)
print(json.dumps(last_two, indent=2))
session_id = "session_expire_demo"
await adapter.add_qa(
user_id,
session_id,
"What is Redis?",
"Database context",
"Redis is an in-memory data store.",
)
await adapter.add_qa(
user_id,
session_id,
"Who created Redis?",
"History context",
"Salvatore Sanfilippo (antirez).",
)
print(await adapter.get_all_qas(user_id, session_id))
await adapter.close()
if __name__ == "__main__":
asyncio.run(main())

View file

@ -1,9 +1,11 @@
"""Factory to get the appropriate cache coordination engine (e.g., Redis)."""
from functools import lru_cache
import os
from typing import Optional
from cognee.infrastructure.databases.cache.config import get_cache_config
from cognee.infrastructure.databases.cache.cache_db_interface import CacheDBInterface
from cognee.infrastructure.databases.cache.fscache.FsCacheAdapter import FSCacheAdapter
config = get_cache_config()
@ -33,20 +35,28 @@ def create_cache_engine(
Returns:
--------
- CacheDBInterface: An instance of the appropriate cache adapter. :TODO: Now we support only Redis. later if we add more here we can split the logic
- CacheDBInterface: An instance of the appropriate cache adapter.
"""
if config.caching:
from cognee.infrastructure.databases.cache.redis.RedisAdapter import RedisAdapter
return RedisAdapter(
host=cache_host,
port=cache_port,
username=cache_username,
password=cache_password,
lock_name=lock_key,
timeout=agentic_lock_expire,
blocking_timeout=agentic_lock_timeout,
)
if config.cache_backend == "redis":
return RedisAdapter(
host=cache_host,
port=cache_port,
username=cache_username,
password=cache_password,
lock_name=lock_key,
timeout=agentic_lock_expire,
blocking_timeout=agentic_lock_timeout,
)
elif config.cache_backend == "fs":
return FSCacheAdapter()
else:
raise ValueError(
f"Unsupported cache backend: '{config.cache_backend}'. "
f"Supported backends are: 'redis', 'fs'"
)
else:
return None

View file

@ -0,0 +1,3 @@
from .dataset_database_handler_interface import DatasetDatabaseHandlerInterface
from .supported_dataset_database_handlers import supported_dataset_database_handlers
from .use_dataset_database_handler import use_dataset_database_handler

View file

@ -0,0 +1,80 @@
from typing import Optional
from uuid import UUID
from abc import ABC, abstractmethod
from cognee.modules.users.models.User import User
from cognee.modules.users.models.DatasetDatabase import DatasetDatabase
class DatasetDatabaseHandlerInterface(ABC):
@classmethod
@abstractmethod
async def create_dataset(cls, dataset_id: Optional[UUID], user: Optional[User]) -> dict:
"""
Return a dictionary with database connection/resolution info for a graph or vector database for the given dataset.
Function can auto handle deploying of the actual database if needed, but is not necessary.
Only providing connection info is sufficient, this info will be mapped when trying to connect to the provided dataset in the future.
Needed for Cognee multi-tenant/multi-user and backend access control support.
Dictionary returned from this function will be used to create a DatasetDatabase row in the relational database.
From which internal mapping of dataset -> database connection info will be done.
The returned dictionary is stored verbatim in the relational database and is later passed to
resolve_dataset_connection_info() at connection time. For safe credential handling, prefer
returning only references to secrets or role identifiers, not plaintext credentials.
Each dataset needs to map to a unique graph or vector database when backend access control is enabled to facilitate a separation of concern for data.
Args:
dataset_id: UUID of the dataset if needed by the database creation logic
user: User object if needed by the database creation logic
Returns:
dict: Connection info for the created graph or vector database instance.
"""
pass
@classmethod
async def resolve_dataset_connection_info(
cls, dataset_database: DatasetDatabase
) -> DatasetDatabase:
"""
Resolve runtime connection details for a datasets backing graph/vector database.
Function is intended to be overwritten to implement custom logic for resolving connection info.
This method is invoked right before the application opens a connection for a given dataset.
It receives the DatasetDatabase row that was persisted when create_dataset() ran and must
return a modified instance of DatasetDatabase with concrete connection parameters that the client/driver can use.
Do not update these new DatasetDatabase values in the relational database to avoid storing secure credentials.
In case of separate graph and vector database handlers, each handler should implement its own logic for resolving
connection info and only change parameters related to its appropriate database, the resolution function will then
be called one after another with the updated DatasetDatabase value from the previous function as the input.
Typical behavior:
- If the DatasetDatabase row already contains raw connection fields (e.g., host/port/db/user/password
or api_url/api_key), return them as-is.
- If the row stores only references (e.g., secret IDs, vault paths, cloud resource ARNs/IDs, IAM
roles, SSO tokens), resolve those references by calling the appropriate secret manager or provider
API to obtain short-lived credentials and assemble the final connection DatasetDatabase object.
- Do not persist any resolved or decrypted secrets back to the relational database. Return them only
to the caller.
Args:
dataset_database: DatasetDatabase row from the relational database
Returns:
DatasetDatabase: Updated instance with resolved connection info
"""
return dataset_database
@classmethod
@abstractmethod
async def delete_dataset(cls, dataset_database: DatasetDatabase) -> None:
"""
Delete the graph or vector database for the given dataset.
Function should auto handle deleting of the actual database or send a request to the proper service to delete/mark the database as not needed for the given dataset.
Needed for maintaining a database for Cognee multi-tenant/multi-user and backend access control.
Args:
dataset_database: DatasetDatabase row containing connection/resolution info for the graph or vector database to delete.
"""
pass

View file

@ -0,0 +1,18 @@
from cognee.infrastructure.databases.graph.neo4j_driver.Neo4jAuraDevDatasetDatabaseHandler import (
Neo4jAuraDevDatasetDatabaseHandler,
)
from cognee.infrastructure.databases.vector.lancedb.LanceDBDatasetDatabaseHandler import (
LanceDBDatasetDatabaseHandler,
)
from cognee.infrastructure.databases.graph.kuzu.KuzuDatasetDatabaseHandler import (
KuzuDatasetDatabaseHandler,
)
supported_dataset_database_handlers = {
"neo4j_aura_dev": {
"handler_instance": Neo4jAuraDevDatasetDatabaseHandler,
"handler_provider": "neo4j",
},
"lancedb": {"handler_instance": LanceDBDatasetDatabaseHandler, "handler_provider": "lancedb"},
"kuzu": {"handler_instance": KuzuDatasetDatabaseHandler, "handler_provider": "kuzu"},
}

View file

@ -0,0 +1,10 @@
from .supported_dataset_database_handlers import supported_dataset_database_handlers
def use_dataset_database_handler(
dataset_database_handler_name, dataset_database_handler, dataset_database_provider
):
supported_dataset_database_handlers[dataset_database_handler_name] = {
"handler_instance": dataset_database_handler,
"handler_provider": dataset_database_provider,
}

View file

@ -148,3 +148,19 @@ class CacheConnectionError(CogneeConfigurationError):
status_code: int = status.HTTP_503_SERVICE_UNAVAILABLE,
):
super().__init__(message, name, status_code)
class SharedKuzuLockRequiresRedisError(CogneeConfigurationError):
"""
Raised when shared Kuzu locking is requested without configuring the Redis backend.
"""
def __init__(
self,
message: str = (
"Shared Kuzu lock requires Redis cache backend. Configure Redis to enable shared Kuzu locking."
),
name: str = "SharedKuzuLockRequiresRedisError",
status_code: int = status.HTTP_400_BAD_REQUEST,
):
super().__init__(message, name, status_code)

View file

@ -26,6 +26,7 @@ class GraphConfig(BaseSettings):
- graph_database_username
- graph_database_password
- graph_database_port
- graph_database_key
- graph_file_path
- graph_model
- graph_topology
@ -41,10 +42,12 @@ class GraphConfig(BaseSettings):
graph_database_username: str = ""
graph_database_password: str = ""
graph_database_port: int = 123
graph_database_key: str = ""
graph_file_path: str = ""
graph_filename: str = ""
graph_model: object = KnowledgeGraph
graph_topology: object = KnowledgeGraph
graph_dataset_database_handler: str = "kuzu"
model_config = SettingsConfigDict(env_file=".env", extra="allow", populate_by_name=True)
# Model validator updates graph_filename and path dynamically after class creation based on current database provider
@ -90,10 +93,12 @@ class GraphConfig(BaseSettings):
"graph_database_username": self.graph_database_username,
"graph_database_password": self.graph_database_password,
"graph_database_port": self.graph_database_port,
"graph_database_key": self.graph_database_key,
"graph_file_path": self.graph_file_path,
"graph_model": self.graph_model,
"graph_topology": self.graph_topology,
"model_config": self.model_config,
"graph_dataset_database_handler": self.graph_dataset_database_handler,
}
def to_hashable_dict(self) -> dict:
@ -116,7 +121,9 @@ class GraphConfig(BaseSettings):
"graph_database_username": self.graph_database_username,
"graph_database_password": self.graph_database_password,
"graph_database_port": self.graph_database_port,
"graph_database_key": self.graph_database_key,
"graph_file_path": self.graph_file_path,
"graph_dataset_database_handler": self.graph_dataset_database_handler,
}

View file

@ -33,6 +33,8 @@ def create_graph_engine(
graph_database_username="",
graph_database_password="",
graph_database_port="",
graph_database_key="",
graph_dataset_database_handler="",
):
"""
Create a graph engine based on the specified provider type.
@ -69,6 +71,7 @@ def create_graph_engine(
graph_database_url=graph_database_url,
graph_database_username=graph_database_username,
graph_database_password=graph_database_password,
database_name=graph_database_name,
)
if graph_database_provider == "neo4j":

View file

@ -398,3 +398,18 @@ class GraphDBInterface(ABC):
- node_id (Union[str, UUID]): Unique identifier of the node for which to retrieve connections.
"""
raise NotImplementedError
@abstractmethod
async def get_filtered_graph_data(
self, attribute_filters: List[Dict[str, List[Union[str, int]]]]
) -> Tuple[List[Node], List[EdgeData]]:
"""
Retrieve nodes and edges filtered by the provided attribute criteria.
Parameters:
-----------
- attribute_filters: A list of dictionaries where keys are attribute names and values
are lists of attribute values to filter by.
"""
raise NotImplementedError

View file

@ -0,0 +1,81 @@
import os
from uuid import UUID
from typing import Optional
from cognee.infrastructure.databases.graph.get_graph_engine import create_graph_engine
from cognee.base_config import get_base_config
from cognee.modules.users.models import User
from cognee.modules.users.models import DatasetDatabase
from cognee.infrastructure.databases.dataset_database_handler import DatasetDatabaseHandlerInterface
class KuzuDatasetDatabaseHandler(DatasetDatabaseHandlerInterface):
"""
Handler for interacting with Kuzu Dataset databases.
"""
@classmethod
async def create_dataset(cls, dataset_id: Optional[UUID], user: Optional[User]) -> dict:
"""
Create a new Kuzu instance for the dataset. Return connection info that will be mapped to the dataset.
Args:
dataset_id: Dataset UUID
user: User object who owns the dataset and is making the request
Returns:
dict: Connection details for the created Kuzu instance
"""
from cognee.infrastructure.databases.graph.config import get_graph_config
graph_config = get_graph_config()
if graph_config.graph_database_provider != "kuzu":
raise ValueError(
"KuzuDatasetDatabaseHandler can only be used with Kuzu graph database provider."
)
graph_db_name = f"{dataset_id}.pkl"
graph_db_url = graph_config.graph_database_url
graph_db_key = graph_config.graph_database_key
graph_db_username = graph_config.graph_database_username
graph_db_password = graph_config.graph_database_password
return {
"graph_database_name": graph_db_name,
"graph_database_url": graph_db_url,
"graph_database_provider": graph_config.graph_database_provider,
"graph_database_key": graph_db_key,
"graph_dataset_database_handler": "kuzu",
"graph_database_connection_info": {
"graph_database_username": graph_db_username,
"graph_database_password": graph_db_password,
},
}
@classmethod
async def delete_dataset(cls, dataset_database: DatasetDatabase):
base_config = get_base_config()
databases_directory_path = os.path.join(
base_config.system_root_directory, "databases", str(dataset_database.owner_id)
)
graph_file_path = os.path.join(
databases_directory_path, dataset_database.graph_database_name
)
graph_engine = create_graph_engine(
graph_database_provider=dataset_database.graph_database_provider,
graph_database_url=dataset_database.graph_database_url,
graph_database_name=dataset_database.graph_database_name,
graph_database_key=dataset_database.graph_database_key,
graph_file_path=graph_file_path,
graph_database_username=dataset_database.graph_database_connection_info.get(
"graph_database_username", ""
),
graph_database_password=dataset_database.graph_database_connection_info.get(
"graph_database_password", ""
),
graph_dataset_database_handler="",
graph_database_port="",
)
await graph_engine.delete_graph()

View file

@ -12,6 +12,7 @@ from contextlib import asynccontextmanager
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Any, List, Union, Optional, Tuple, Type
from cognee.exceptions import CogneeValidationError
from cognee.shared.logging_utils import get_logger
from cognee.infrastructure.utils.run_sync import run_sync
from cognee.infrastructure.files.storage import get_file_storage
@ -1186,6 +1187,11 @@ class KuzuAdapter(GraphDBInterface):
A tuple with two elements: a list of tuples of (node_id, properties) and a list of
tuples of (source_id, target_id, relationship_name, properties).
"""
import time
start_time = time.time()
try:
nodes_query = """
MATCH (n:Node)
@ -1249,6 +1255,11 @@ class KuzuAdapter(GraphDBInterface):
},
)
)
retrieval_time = time.time() - start_time
logger.info(
f"Retrieved {len(nodes)} nodes and {len(edges)} edges in {retrieval_time:.2f} seconds"
)
return formatted_nodes, formatted_edges
except Exception as e:
logger.error(f"Failed to get graph data: {e}")
@ -1417,6 +1428,92 @@ class KuzuAdapter(GraphDBInterface):
formatted_edges.append((source_id, target_id, rel_type, props))
return formatted_nodes, formatted_edges
async def get_id_filtered_graph_data(self, target_ids: list[str]):
"""
Retrieve graph data filtered by specific node IDs, including their direct neighbors
and only edges where one endpoint matches those IDs.
Returns:
nodes: List[dict] -> Each dict includes "id" and all node properties
edges: List[dict] -> Each dict includes "source", "target", "type", "properties"
"""
import time
start_time = time.time()
try:
if not target_ids:
logger.warning("No target IDs provided for ID-filtered graph retrieval.")
return [], []
if not all(isinstance(x, str) for x in target_ids):
raise CogneeValidationError("target_ids must be a list of strings")
query = """
MATCH (n:Node)-[r]->(m:Node)
WHERE n.id IN $target_ids OR m.id IN $target_ids
RETURN n.id, {
name: n.name,
type: n.type,
properties: n.properties
}, m.id, {
name: m.name,
type: m.type,
properties: m.properties
}, r.relationship_name, r.properties
"""
result = await self.query(query, {"target_ids": target_ids})
if not result:
logger.info("No data returned for the supplied IDs")
return [], []
nodes_dict = {}
edges = []
for n_id, n_props, m_id, m_props, r_type, r_props_raw in result:
if n_props.get("properties"):
try:
additional_props = json.loads(n_props["properties"])
n_props.update(additional_props)
del n_props["properties"]
except json.JSONDecodeError:
logger.warning(f"Failed to parse properties JSON for node {n_id}")
if m_props.get("properties"):
try:
additional_props = json.loads(m_props["properties"])
m_props.update(additional_props)
del m_props["properties"]
except json.JSONDecodeError:
logger.warning(f"Failed to parse properties JSON for node {m_id}")
nodes_dict[n_id] = (n_id, n_props)
nodes_dict[m_id] = (m_id, m_props)
edge_props = {}
if r_props_raw:
try:
edge_props = json.loads(r_props_raw)
except (json.JSONDecodeError, TypeError):
logger.warning(f"Failed to parse edge properties for {n_id}->{m_id}")
source_id = edge_props.get("source_node_id", n_id)
target_id = edge_props.get("target_node_id", m_id)
edges.append((source_id, target_id, r_type, edge_props))
retrieval_time = time.time() - start_time
logger.info(
f"ID-filtered retrieval: {len(nodes_dict)} nodes and {len(edges)} edges in {retrieval_time:.2f}s"
)
return list(nodes_dict.values()), edges
except Exception as e:
logger.error(f"Error during ID-filtered graph data retrieval: {str(e)}")
raise
async def get_graph_metrics(self, include_optional=False) -> Dict[str, Any]:
"""
Get metrics on graph structure and connectivity.
@ -1908,3 +2005,134 @@ class KuzuAdapter(GraphDBInterface):
time_ids_list = [item[0] for item in time_nodes]
return ", ".join(f"'{uid}'" for uid in time_ids_list)
async def get_triplets_batch(self, offset: int, limit: int) -> list[dict[str, Any]]:
"""
Retrieve a batch of triplets (start_node, relationship, end_node) from the graph.
Parameters:
-----------
- offset (int): Number of triplets to skip before returning results.
- limit (int): Maximum number of triplets to return.
Returns:
--------
- list[dict[str, Any]]: A list of triplets, where each triplet is a dictionary
with keys: 'start_node', 'relationship_properties', 'end_node'.
Raises:
-------
- ValueError: If offset or limit are negative.
- Exception: Re-raises any exceptions from query execution.
"""
if offset < 0:
raise ValueError(f"Offset must be non-negative, got {offset}")
if limit < 0:
raise ValueError(f"Limit must be non-negative, got {limit}")
query = """
MATCH (start_node:Node)-[relationship:EDGE]->(end_node:Node)
RETURN {
start_node: {
id: start_node.id,
name: start_node.name,
type: start_node.type,
properties: start_node.properties
},
relationship_properties: {
relationship_name: relationship.relationship_name,
properties: relationship.properties
},
end_node: {
id: end_node.id,
name: end_node.name,
type: end_node.type,
properties: end_node.properties
}
} AS triplet
SKIP $offset LIMIT $limit
"""
try:
results = await self.query(query, {"offset": offset, "limit": limit})
except Exception as e:
logger.error(f"Failed to execute triplet query: {str(e)}")
logger.error(f"Query: {query}")
logger.error(f"Parameters: offset={offset}, limit={limit}")
raise
triplets = []
for idx, row in enumerate(results):
try:
if not row or len(row) == 0:
logger.warning(f"Skipping empty row at index {idx} in triplet batch")
continue
if not isinstance(row[0], dict):
logger.warning(
f"Skipping invalid row at index {idx}: expected dict, got {type(row[0])}"
)
continue
triplet = row[0]
if "start_node" not in triplet:
logger.warning(f"Skipping triplet at index {idx}: missing 'start_node' key")
continue
if not isinstance(triplet["start_node"], dict):
logger.warning(f"Skipping triplet at index {idx}: 'start_node' is not a dict")
continue
triplet["start_node"] = self._parse_node_properties(triplet["start_node"].copy())
if "relationship_properties" not in triplet:
logger.warning(
f"Skipping triplet at index {idx}: missing 'relationship_properties' key"
)
continue
if not isinstance(triplet["relationship_properties"], dict):
logger.warning(
f"Skipping triplet at index {idx}: 'relationship_properties' is not a dict"
)
continue
rel_props = triplet["relationship_properties"].copy()
relationship_name = rel_props.get("relationship_name") or ""
if rel_props.get("properties"):
try:
parsed_props = json.loads(rel_props["properties"])
if isinstance(parsed_props, dict):
rel_props.update(parsed_props)
del rel_props["properties"]
else:
logger.warning(
f"Parsed relationship properties is not a dict for triplet at index {idx}"
)
except (json.JSONDecodeError, TypeError) as e:
logger.warning(
f"Failed to parse relationship properties JSON for triplet at index {idx}: {e}"
)
rel_props["relationship_name"] = relationship_name
triplet["relationship_properties"] = rel_props
if "end_node" not in triplet:
logger.warning(f"Skipping triplet at index {idx}: missing 'end_node' key")
continue
if not isinstance(triplet["end_node"], dict):
logger.warning(f"Skipping triplet at index {idx}: 'end_node' is not a dict")
continue
triplet["end_node"] = self._parse_node_properties(triplet["end_node"].copy())
triplets.append(triplet)
except Exception as e:
logger.error(f"Error processing triplet at index {idx}: {e}", exc_info=True)
continue
return triplets

View file

@ -0,0 +1,168 @@
import os
import asyncio
import requests
import base64
import hashlib
from uuid import UUID
from typing import Optional
from cryptography.fernet import Fernet
from cognee.infrastructure.databases.graph import get_graph_config
from cognee.modules.users.models import User, DatasetDatabase
from cognee.infrastructure.databases.dataset_database_handler import DatasetDatabaseHandlerInterface
class Neo4jAuraDevDatasetDatabaseHandler(DatasetDatabaseHandlerInterface):
"""
Handler for a quick development PoC integration of Cognee multi-user and permission mode with Neo4j Aura databases.
This handler creates a new Neo4j Aura instance for each Cognee dataset created.
Improvements needed to be production ready:
- Secret management for client credentials, currently secrets are encrypted and stored in the Cognee relational database,
a secret manager or a similar system should be used instead.
Quality of life improvements:
- Allow configuration of different Neo4j Aura plans and regions.
- Requests should be made async, currently a blocking requests library is used.
"""
@classmethod
async def create_dataset(cls, dataset_id: Optional[UUID], user: Optional[User]) -> dict:
"""
Create a new Neo4j Aura instance for the dataset. Return connection info that will be mapped to the dataset.
Args:
dataset_id: Dataset UUID
user: User object who owns the dataset and is making the request
Returns:
dict: Connection details for the created Neo4j instance
"""
graph_config = get_graph_config()
if graph_config.graph_database_provider != "neo4j":
raise ValueError(
"Neo4jAuraDevDatasetDatabaseHandler can only be used with Neo4j graph database provider."
)
graph_db_name = f"{dataset_id}"
# Client credentials and encryption
client_id = os.environ.get("NEO4J_CLIENT_ID", None)
client_secret = os.environ.get("NEO4J_CLIENT_SECRET", None)
tenant_id = os.environ.get("NEO4J_TENANT_ID", None)
encryption_env_key = os.environ.get("NEO4J_ENCRYPTION_KEY", "test_key")
encryption_key = base64.urlsafe_b64encode(
hashlib.sha256(encryption_env_key.encode()).digest()
)
cipher = Fernet(encryption_key)
if client_id is None or client_secret is None or tenant_id is None:
raise ValueError(
"NEO4J_CLIENT_ID, NEO4J_CLIENT_SECRET, and NEO4J_TENANT_ID environment variables must be set to use Neo4j Aura DatasetDatabase Handling."
)
# Make the request with HTTP Basic Auth
def get_aura_token(client_id: str, client_secret: str) -> dict:
url = "https://api.neo4j.io/oauth/token"
data = {"grant_type": "client_credentials"} # sent as application/x-www-form-urlencoded
resp = requests.post(url, data=data, auth=(client_id, client_secret))
resp.raise_for_status() # raises if the request failed
return resp.json()
resp = get_aura_token(client_id, client_secret)
url = "https://api.neo4j.io/v1/instances"
headers = {
"accept": "application/json",
"Authorization": f"Bearer {resp['access_token']}",
"Content-Type": "application/json",
}
# TODO: Maybe we can allow **kwargs parameter forwarding for cases like these
# Too allow different configurations between datasets
payload = {
"version": "5",
"region": "europe-west1",
"memory": "1GB",
"name": graph_db_name[
0:29
], # TODO: Find better name to name Neo4j instance within 30 character limit
"type": "professional-db",
"tenant_id": tenant_id,
"cloud_provider": "gcp",
}
response = requests.post(url, headers=headers, json=payload)
graph_db_name = "neo4j" # Has to be 'neo4j' for Aura
graph_db_url = response.json()["data"]["connection_url"]
graph_db_key = resp["access_token"]
graph_db_username = response.json()["data"]["username"]
graph_db_password = response.json()["data"]["password"]
async def _wait_for_neo4j_instance_provisioning(instance_id: str, headers: dict):
# Poll until the instance is running
status_url = f"https://api.neo4j.io/v1/instances/{instance_id}"
status = ""
for attempt in range(30): # Try for up to ~5 minutes
status_resp = requests.get(
status_url, headers=headers
) # TODO: Use async requests with httpx
status = status_resp.json()["data"]["status"]
if status.lower() == "running":
return
await asyncio.sleep(10)
raise TimeoutError(
f"Neo4j instance '{graph_db_name}' did not become ready within 5 minutes. Status: {status}"
)
instance_id = response.json()["data"]["id"]
await _wait_for_neo4j_instance_provisioning(instance_id, headers)
encrypted_db_password_bytes = cipher.encrypt(graph_db_password.encode())
encrypted_db_password_string = encrypted_db_password_bytes.decode()
return {
"graph_database_name": graph_db_name,
"graph_database_url": graph_db_url,
"graph_database_provider": "neo4j",
"graph_database_key": graph_db_key,
"graph_dataset_database_handler": "neo4j_aura_dev",
"graph_database_connection_info": {
"graph_database_username": graph_db_username,
"graph_database_password": encrypted_db_password_string,
},
}
@classmethod
async def resolve_dataset_connection_info(
cls, dataset_database: DatasetDatabase
) -> DatasetDatabase:
"""
Resolve and decrypt connection info for the Neo4j dataset database.
In this case, decrypt the password stored in the database.
Args:
dataset_database: DatasetDatabase instance containing encrypted connection info.
"""
encryption_env_key = os.environ.get("NEO4J_ENCRYPTION_KEY", "test_key")
encryption_key = base64.urlsafe_b64encode(
hashlib.sha256(encryption_env_key.encode()).digest()
)
cipher = Fernet(encryption_key)
graph_db_password = cipher.decrypt(
dataset_database.graph_database_connection_info["graph_database_password"].encode()
).decode()
dataset_database.graph_database_connection_info["graph_database_password"] = (
graph_db_password
)
return dataset_database
@classmethod
async def delete_dataset(cls, dataset_database: DatasetDatabase):
pass

View file

@ -8,7 +8,7 @@ from neo4j import AsyncSession
from neo4j import AsyncGraphDatabase
from neo4j.exceptions import Neo4jError
from contextlib import asynccontextmanager
from typing import Optional, Any, List, Dict, Type, Tuple
from typing import Optional, Any, List, Dict, Type, Tuple, Coroutine
from cognee.infrastructure.engine import DataPoint
from cognee.modules.engine.utils.generate_timestamp_datapoint import date_to_int
@ -964,6 +964,63 @@ class Neo4jAdapter(GraphDBInterface):
logger.error(f"Error during graph data retrieval: {str(e)}")
raise
async def get_id_filtered_graph_data(self, target_ids: list[str]):
"""
Retrieve graph data filtered by specific node IDs, including their direct neighbors
and only edges where one endpoint matches those IDs.
This version uses a single Cypher query for efficiency.
"""
import time
start_time = time.time()
try:
if not target_ids:
logger.warning("No target IDs provided for ID-filtered graph retrieval.")
return [], []
query = """
MATCH ()-[r]-()
WHERE startNode(r).id IN $target_ids
OR endNode(r).id IN $target_ids
WITH DISTINCT r, startNode(r) AS a, endNode(r) AS b
RETURN
properties(a) AS n_properties,
properties(b) AS m_properties,
type(r) AS type,
properties(r) AS properties
"""
result = await self.query(query, {"target_ids": target_ids})
nodes_dict = {}
edges = []
for record in result:
n_props = record["n_properties"]
m_props = record["m_properties"]
r_props = record["properties"]
r_type = record["type"]
nodes_dict[n_props["id"]] = (n_props["id"], n_props)
nodes_dict[m_props["id"]] = (m_props["id"], m_props)
source_id = r_props.get("source_node_id", n_props["id"])
target_id = r_props.get("target_node_id", m_props["id"])
edges.append((source_id, target_id, r_type, r_props))
retrieval_time = time.time() - start_time
logger.info(
f"ID-filtered retrieval: {len(nodes_dict)} nodes and {len(edges)} edges in {retrieval_time:.2f}s"
)
return list(nodes_dict.values()), edges
except Exception as e:
logger.error(f"Error during ID-filtered graph data retrieval: {str(e)}")
raise
async def get_nodeset_subgraph(
self, node_type: Type[Any], node_name: List[str]
) -> Tuple[List[Tuple[int, dict]], List[Tuple[int, int, str, dict]]]:
@ -1470,3 +1527,25 @@ class Neo4jAdapter(GraphDBInterface):
time_ids_list = [item["id"] for item in time_nodes if "id" in item]
return ", ".join(f"'{uid}'" for uid in time_ids_list)
async def get_triplets_batch(self, offset: int, limit: int) -> list[dict[str, Any]]:
"""
Retrieve a batch of triplets (start_node, relationship, end_node) from the graph.
Parameters:
-----------
- offset (int): Number of triplets to skip before returning results.
- limit (int): Maximum number of triplets to return.
Returns:
--------
- list[dict[str, Any]]: A list of triplets.
"""
query = f"""
MATCH (start_node:`{BASE_LABEL}`)-[relationship]->(end_node:`{BASE_LABEL}`)
RETURN start_node, properties(relationship) AS relationship_properties, end_node
SKIP $offset LIMIT $limit
"""
results = await self.query(query, {"offset": offset, "limit": limit})
return results

View file

@ -416,6 +416,15 @@ class NeptuneAnalyticsAdapter(NeptuneGraphDB, VectorDBInterface):
self._client.query(f"MATCH (n :{self._VECTOR_NODE_LABEL}) DETACH DELETE n")
pass
async def is_empty(self) -> bool:
query = """
MATCH (n)
RETURN true
LIMIT 1;
"""
query_result = await self._client.query(query)
return len(query_result) == 0
@staticmethod
def _get_scored_result(
item: dict, with_vector: bool = False, with_score: bool = False

View file

@ -1 +1,4 @@
from .get_or_create_dataset_database import get_or_create_dataset_database
from .resolve_dataset_database_connection_info import resolve_dataset_database_connection_info
from .get_graph_dataset_database_handler import get_graph_dataset_database_handler
from .get_vector_dataset_database_handler import get_vector_dataset_database_handler

Some files were not shown because too many files have changed in this diff Show more