Compare commits


243 commits

Author SHA1 Message Date
Vasilije
a505eef95b
Enhance README with Cognee description
Added a brief description of Cognee's features and capabilities.
2026-01-17 21:00:31 +00:00
Vasilije
8cd3aab1ef
Clarify Cognee Open Source and Cloud descriptions
Updated the README to clarify the differences between Cognee Open Source and Cognee Cloud offerings.
2026-01-17 20:58:55 +00:00
Vasilije
8a96a351e2
chore(deps): bump the npm_and_yarn group across 1 directory with 2 updates (#1974)
Bumps the npm_and_yarn group with 2 updates in the /cognee-frontend
directory: [next](https://github.com/vercel/next.js) and
[preact](https://github.com/preactjs/preact).

Updates `next` from 16.0.4 to 16.1.1
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vercel/next.js/releases">next's
releases</a>.</em></p>
<blockquote>
<h2>v16.1.1</h2>
<blockquote>
<p>[!NOTE]
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>Turbopack: Create junction points instead of symlinks on Windows (<a
href="https://redirect.github.com/vercel/next.js/issues/87606">#87606</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/sokra"><code>@​sokra</code></a> and <a
href="https://github.com/ztanner"><code>@​ztanner</code></a> for
helping!</p>
<h2>v16.1.1-canary.16</h2>
<h3>Core Changes</h3>
<ul>
<li>Add maximum size limit for postponed body parsing: <a
href="https://redirect.github.com/vercel/next.js/issues/88175">#88175</a></li>
<li>metadata: use fixed segment in dynamic routes with static metadata
files: <a
href="https://redirect.github.com/vercel/next.js/issues/88113">#88113</a></li>
<li>feat: add --experimental-cpu-prof flag for dev, build, and start: <a
href="https://redirect.github.com/vercel/next.js/issues/87946">#87946</a></li>
<li>Add experimental option to use no-cache instead of no-store in dev:
<a
href="https://redirect.github.com/vercel/next.js/issues/88182">#88182</a></li>
</ul>
<h3>Misc Changes</h3>
<ul>
<li>fix: move conductor.json to repo root for proper detection: <a
href="https://redirect.github.com/vercel/next.js/issues/88184">#88184</a></li>
<li>Turbopack: Update to swc_core v50.2.3: <a
href="https://redirect.github.com/vercel/next.js/issues/87841">#87841</a></li>
<li>Update generateMetadata in client component error: <a
href="https://redirect.github.com/vercel/next.js/issues/88172">#88172</a></li>
<li>ci: run stats on canary pushes for historical trend tracking: <a
href="https://redirect.github.com/vercel/next.js/issues/88157">#88157</a></li>
<li>perf: improve stats thresholds to reduce CI noise: <a
href="https://redirect.github.com/vercel/next.js/issues/88158">#88158</a></li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/wyattjoh"><code>@​wyattjoh</code></a>, <a
href="https://github.com/bgw"><code>@​bgw</code></a>, <a
href="https://github.com/timneutkens"><code>@​timneutkens</code></a>, <a
href="https://github.com/feedthejim"><code>@​feedthejim</code></a>, and
<a href="https://github.com/huozhi"><code>@​huozhi</code></a> for
helping!</p>
<h2>v16.1.1-canary.15</h2>
<h3>Core Changes</h3>
<ul>
<li>add compilation error for taint when not enabled: <a
href="https://redirect.github.com/vercel/next.js/issues/88173">#88173</a></li>
<li>feat(next/image)!: add <code>images.maximumResponseBody</code>
config: <a
href="https://redirect.github.com/vercel/next.js/issues/88183">#88183</a></li>
</ul>
<h3>Misc Changes</h3>
<ul>
<li>Update Rspack production test manifest: <a
href="https://redirect.github.com/vercel/next.js/issues/88137">#88137</a></li>
<li>Update Rspack development test manifest: <a
href="https://redirect.github.com/vercel/next.js/issues/88138">#88138</a></li>
<li>Fix compile error when running next-custom-transform tests: <a
href="https://redirect.github.com/vercel/next.js/issues/83715">#83715</a></li>
<li>chore: add Conductor configuration for parallel development: <a
href="https://redirect.github.com/vercel/next.js/issues/88116">#88116</a></li>
<li>[docs] add get_routes in mcp available tools: <a
href="https://redirect.github.com/vercel/next.js/issues/88181">#88181</a></li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/vercel-release-bot"><code>@​vercel-release-bot</code></a>,
<a href="https://github.com/ztanner"><code>@​ztanner</code></a>, <a
href="https://github.com/mischnic"><code>@​mischnic</code></a>, <a
href="https://github.com/wyattjoh"><code>@​wyattjoh</code></a>, <a
href="https://github.com/huozhi"><code>@​huozhi</code></a>, and <a
href="https://github.com/styfle"><code>@​styfle</code></a> for
helping!</p>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3aa53984e9"><code>3aa5398</code></a>
v16.1.1</li>
<li><a
href="d1bd5b5810"><code>d1bd5b5</code></a>
Turbopack: Create junction points instead of symlinks on Windows (<a
href="https://redirect.github.com/vercel/next.js/issues/87606">#87606</a>)</li>
<li><a
href="a67ee72788"><code>a67ee72</code></a>
setup release branch</li>
<li><a
href="34916762cd"><code>3491676</code></a>
v16.1.0</li>
<li><a
href="58e8f8c7e5"><code>58e8f8c</code></a>
v16.1.0-canary.34</li>
<li><a
href="8a8a00d5d0"><code>8a8a00d</code></a>
Revert &quot;Move next-env.d.ts to dist dir&quot; (<a
href="https://redirect.github.com/vercel/next.js/issues/87311">#87311</a>)</li>
<li><a
href="3284587f8e"><code>3284587</code></a>
v16.1.0-canary.33</li>
<li><a
href="25da5f0426"><code>25da5f0</code></a>
Move next-env.d.ts to dist dir (<a
href="https://redirect.github.com/vercel/next.js/issues/86752">#86752</a>)</li>
<li><a
href="aa8a243e72"><code>aa8a243</code></a>
feat: use Rspack persistent cache by default (<a
href="https://redirect.github.com/vercel/next.js/issues/81399">#81399</a>)</li>
<li><a
href="754db28e52"><code>754db28</code></a>
bundle analyzer: remove geist font in favor of system ui fonts (<a
href="https://redirect.github.com/vercel/next.js/issues/87292">#87292</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/vercel/next.js/compare/v16.0.4...v16.1.1">compare
view</a></li>
</ul>
</details>
<br />

Updates `preact` from 10.27.2 to 10.28.2
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/preactjs/preact/releases">preact's
releases</a>.</em></p>
<blockquote>
<h2>10.28.2</h2>
<h2>Fixes</h2>
<ul>
<li>Enforce strict equality for VNode object constructors</li>
</ul>
<h2>10.28.1</h2>
<h2>Fixes</h2>
<ul>
<li>Fix erroneous diffing w/ growing list (<a
href="https://redirect.github.com/preactjs/preact/issues/4975">#4975</a>,
thanks <a
href="https://github.com/JoviDeCroock"><code>@​JoviDeCroock</code></a>)</li>
</ul>
<h2>10.28.0</h2>
<h2>Types</h2>
<ul>
<li>Updates dangerouslySetInnerHTML type so future TS will accept
Trusted… (<a
href="https://redirect.github.com/preactjs/preact/issues/4931">#4931</a>,
thanks <a
href="https://github.com/lukewarlow"><code>@​lukewarlow</code></a>)</li>
<li>Adds snap events (<a
href="https://redirect.github.com/preactjs/preact/issues/4947">#4947</a>,
thanks <a
href="https://github.com/argyleink"><code>@​argyleink</code></a>)</li>
<li>Remove missed jsx duplicates (<a
href="https://redirect.github.com/preactjs/preact/issues/4950">#4950</a>,
thanks <a
href="https://github.com/rschristian"><code>@​rschristian</code></a>)</li>
<li>Fix scroll events (<a
href="https://redirect.github.com/preactjs/preact/issues/4949">#4949</a>,
thanks <a
href="https://github.com/rschristian"><code>@​rschristian</code></a>)</li>
</ul>
<h2>Fixes</h2>
<ul>
<li>Fix cascading renders with signals (<a
href="https://redirect.github.com/preactjs/preact/issues/4966">#4966</a>,
thanks <a
href="https://github.com/JoviDeCroock"><code>@​JoviDeCroock</code></a>)</li>
<li>add <code>compat/server.browser</code> entry (<a
href="https://redirect.github.com/preactjs/preact/issues/4941">#4941</a>
&amp; <a
href="https://redirect.github.com/preactjs/preact/issues/4940">#4940</a>,
thanks <a
href="https://github.com/marvinhagemeister"><code>@​marvinhagemeister</code></a>)</li>
<li>Avoid lazy components without result going in throw loop (<a
href="https://redirect.github.com/preactjs/preact/issues/4937">#4937</a>,
thanks <a
href="https://github.com/JoviDeCroock"><code>@​JoviDeCroock</code></a>)</li>
</ul>
<h2>Performance</h2>
<ul>
<li>Backport some v11 optimizations (<a
href="https://redirect.github.com/preactjs/preact/issues/4967">#4967</a>,
thanks <a
href="https://github.com/JoviDeCroock"><code>@​JoviDeCroock</code></a>)</li>
</ul>
<h2>10.27.3</h2>
<h2>Fixes</h2>
<ul>
<li>Enforce strict equality for VNode object constructors</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="6f914464b3"><code>6f91446</code></a>
10.28.2</li>
<li><a
href="37c3e030ab"><code>37c3e03</code></a>
Strict equality check on constructor (<a
href="https://redirect.github.com/preactjs/preact/issues/4985">#4985</a>)</li>
<li><a
href="6670a4a70b"><code>6670a4a</code></a>
chore: Adjust TS linting setup (<a
href="https://redirect.github.com/preactjs/preact/issues/4982">#4982</a>)</li>
<li><a
href="2af522b2c2"><code>2af522b</code></a>
10.28.1 (<a
href="https://redirect.github.com/preactjs/preact/issues/4978">#4978</a>)</li>
<li><a
href="f7693b72ec"><code>f7693b7</code></a>
Fix erroneous diffing w/ growing list (<a
href="https://redirect.github.com/preactjs/preact/issues/4975">#4975</a>)</li>
<li><a
href="b36b6a7148"><code>b36b6a7</code></a>
10.28.0 (<a
href="https://redirect.github.com/preactjs/preact/issues/4968">#4968</a>)</li>
<li><a
href="4d40e96f43"><code>4d40e96</code></a>
Backport some v11 optimizations (<a
href="https://redirect.github.com/preactjs/preact/issues/4967">#4967</a>)</li>
<li><a
href="7b74b406e2"><code>7b74b40</code></a>
Fix cascading renders with signals (<a
href="https://redirect.github.com/preactjs/preact/issues/4966">#4966</a>)</li>
<li><a
href="3ab5c6fbbb"><code>3ab5c6f</code></a>
Updates dangerouslySetInnerHTML type so future TS will accept Trusted…
(<a
href="https://redirect.github.com/preactjs/preact/issues/4931">#4931</a>)</li>
<li><a
href="ff30c2b5c4"><code>ff30c2b</code></a>
Adds snap events (<a
href="https://redirect.github.com/preactjs/preact/issues/4947">#4947</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/preactjs/preact/compare/10.27.2...10.28.2">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's major version (unless you unignore this specific
dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's minor version (unless you unignore this specific
dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR
and stop Dependabot creating any more for the specific dependency
(unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore
conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will
remove the ignore condition of the specified dependency and ignore
conditions
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/topoteretes/cognee/network/alerts).

</details>
2026-01-08 15:55:56 +01:00
Vasilije
39613997d6
docs: clarify dev branching and fix contributing text (#1976)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [x] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2026-01-08 15:53:18 +01:00
Vasilije
a5fc6165c1
refactor: Use same default_k value in MCP as for Cognee (#1977)
<!-- .github/pull_request_template.md -->

Set default top_k value for MCP to be the same as the Cognee default
top_k value

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Improvements**
* Search operations now return 10 results by default instead of 5,
providing more comprehensive search results.

* **Style**
* Minor internal formatting cleanup in the client path with no
user-visible behavior changes.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-08 15:52:15 +01:00
Igor Ilic
69fe35bdee refactor: add ruff formatting 2026-01-08 13:32:15 +01:00
Igor Ilic
be738df88a refactor: Use same default_k value in MCP as for Cognee 2026-01-08 12:47:42 +01:00
Babar Ali
01a39dff22 docs: clarify dev branching and fix contributing text
Signed-off-by: Babar Ali <148423037+Babarali2k21@users.noreply.github.com>
2026-01-08 10:15:42 +01:00
dependabot[bot]
53f96f3e29
chore(deps): bump the npm_and_yarn group across 1 directory with 2 updates
Bumps the npm_and_yarn group with 2 updates in the /cognee-frontend directory: [next](https://github.com/vercel/next.js) and [preact](https://github.com/preactjs/preact).


Updates `next` from 16.0.4 to 16.1.1
- [Release notes](https://github.com/vercel/next.js/releases)
- [Changelog](https://github.com/vercel/next.js/blob/canary/release.js)
- [Commits](https://github.com/vercel/next.js/compare/v16.0.4...v16.1.1)

Updates `preact` from 10.27.2 to 10.28.2
- [Release notes](https://github.com/preactjs/preact/releases)
- [Commits](https://github.com/preactjs/preact/compare/10.27.2...10.28.2)

---
updated-dependencies:
- dependency-name: next
  dependency-version: 16.1.1
  dependency-type: direct:production
  dependency-group: npm_and_yarn
- dependency-name: preact
  dependency-version: 10.28.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-01-07 19:36:40 +00:00
Vasilije
b339529621
Fix: Add top_k parameter support to MCP search tool (#1954)
## Problem

The MCP search wrapper doesn't expose the `top_k` parameter, causing
critical performance and usability issues:

- **Unlimited result returns**: CHUNKS search returns 113KB+ responses
with hundreds of text chunks
- **Extreme performance degradation**: GRAPH_COMPLETION takes 30+
seconds to complete
- **Context window exhaustion**: Responses quickly consume the entire
context budget
- **Production unusability**: Search functionality is impractical for
real-world MCP client usage

### Root Cause

The MCP tool definition in `server.py` doesn't expose the `top_k`
parameter that exists in the underlying `cognee.search()` API.
Additionally, `cognee_client.py` ignores the parameter in direct mode
(line 194).

## Solution

This PR adds proper `top_k` parameter support throughout the MCP call
chain:

### Changes

1. **server.py (line 319)**: Add `top_k: int = 5` parameter to MCP
`search` tool
2. **server.py (line 428)**: Update `search_task` signature to accept
`top_k`
3. **server.py (line 433)**: Pass `top_k` to `cognee_client.search()`
4. **server.py (line 468)**: Pass `top_k` to `search_task` call
5. **cognee_client.py (line 194)**: Forward `top_k` parameter to
`cognee.search()`

### Parameter Flow
```
MCP Client (Claude Code, etc.)
    ↓ search(query, type, top_k=5)
server.py::search()
    ↓
server.py::search_task()
    ↓
cognee_client.search()
    ↓
cognee.search()  ← Core library
```
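The threading in the numbered changes can be sketched with stand-in functions. Everything below mirrors the description but is illustrative only, not the actual `server.py` or `cognee_client.py` code:

```python
# Minimal sketch of the top_k threading described above; all function
# names are hypothetical stand-ins for the real MCP call chain.

def core_search(query: str, top_k: int) -> list[str]:
    # Stand-in for cognee.search(): return at most top_k results.
    corpus = [f"result {i} for {query!r}" for i in range(100)]
    return corpus[:top_k]

def client_search(query: str, top_k: int = 5) -> list[str]:
    # Stand-in for cognee_client.search(): forward top_k instead of
    # silently dropping it (the bug at line 194 of cognee_client.py).
    return core_search(query, top_k=top_k)

def mcp_search_tool(query: str, top_k: int = 5) -> list[str]:
    # Stand-in for the MCP `search` tool: expose top_k to MCP clients.
    return client_search(query, top_k=top_k)

print(len(mcp_search_tool("what is cognee?")))           # default cap
print(len(mcp_search_tool("what is cognee?", top_k=3)))  # caller override
```

The point of the fix is that the default now lives at the tool boundary and every layer forwards the value unchanged.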

## Impact

### Performance Improvements

| Metric | Before | After (top_k=5) | Improvement |
|--------|--------|-----------------|-------------|
| Response Size (CHUNKS) | 113KB+ | ~3KB | 97% reduction |
| Response Size (GRAPH_COMPLETION) | 100KB+ | ~5KB | 95% reduction |
| Latency (GRAPH_COMPLETION) | 30+ seconds | 2-5 seconds | 80-90% faster |
| Context Window Usage | Rapidly exhausted | Sustainable | Dramatic improvement |

### User Control

Users can now control result granularity:
- `top_k=3` - Quick answers, minimal context
- `top_k=5` - Balanced (default)
- `top_k=10` - More comprehensive
- `top_k=20` - Maximum context (still reasonable)

### Backward Compatibility

**Fully backward compatible**
- Default `top_k=5` maintains sensible behavior
- Existing MCP clients work without changes
- No breaking API changes

## Testing

### Code Review
- All function signatures updated correctly
- Parameter properly threaded through call chain
- Default value provides sensible behavior
- No syntax errors or type issues

### Production Usage
- Patches in production use since issue discovery
- Confirmed dramatic performance improvements
- Successfully tested with CHUNKS, GRAPH_COMPLETION, and RAG_COMPLETION search types
- Vertex AI backend compatibility validated

## Additional Context

This issue particularly affects users of:
- Non-OpenAI LLM backends (Vertex AI, Claude, etc.)
- Production MCP deployments
- Context-sensitive applications (Claude Code, etc.)

The fix enables Cognee MCP to be practically usable in production
environments where context window management and response latency are
critical.

---

**Generated with** [Claude Code](https://claude.com/claude-code)

**Co-Authored-By**: Claude Sonnet 4.5 <noreply@anthropic.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Search queries now support a configurable results limit (defaults to
5), letting users control how many results are returned.
* The results limit is consistently applied across search modes so
returned results match the requested maximum.

* **Documentation**
* Clarified description of the results limit and its impact on result
sets and context usage.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-03 10:57:10 +01:00
AnveshJarabani
6a5ba70ced
docs: Add comprehensive docstrings and fix default top_k consistency
Address PR feedback from CodeRabbit AI:
- Add detailed docstring for search_task internal function
- Document top_k parameter in main search function docstring
- Fix default top_k inconsistency (was 10 in client, now 5 everywhere)
- Clarify performance implications of different top_k values

Changes:
- server.py: Add top_k parameter documentation and search_task docstring
- cognee_client.py: Change default top_k from 10 to 5 for consistency

This ensures consistent behavior across the MCP call chain and
provides clear guidance for users on choosing appropriate top_k values.
2026-01-03 01:33:13 -06:00
AnveshJarabani
7ee36f883b
Fix: Add top_k parameter support to MCP search tool
## Problem
The MCP search wrapper doesn't expose the top_k parameter, causing:
- Unlimited result returns (113KB+ responses)
- Extremely slow search performance (30+ seconds for GRAPH_COMPLETION)
- Context window exhaustion in production use

## Solution
1. Add top_k parameter (default=5) to MCP search tool in server.py
2. Thread parameter through search_task internal function
3. Forward top_k to cognee_client.search() call
4. Update cognee_client.py to pass top_k to core cognee.search()

## Impact
- **Performance**: 97% reduction in response size (113KB → 3KB)
- **Latency**: 80-90% faster (30s → 2-5s for GRAPH_COMPLETION)
- **Backward Compatible**: Default top_k=5 maintains existing behavior
- **User Control**: Configurable from top_k=3 (quick) to top_k=20 (comprehensive)

## Testing
- Code review validates proper parameter threading
- Backward compatible (default value ensures no breaking changes)
- Production usage confirms performance improvements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-03 01:27:16 -06:00
Vasilije
5b42b21af5
Enhance CONTRIBUTING.md with example setup instructions
Added instructions for running a simple example and setting up the environment.
2025-12-29 18:00:08 +01:00
Vasilije
1061258fde
Fix Python 3.12 SyntaxError caused by JS regex escape sequences (#1934)
### Fix Python 3.12 SyntaxError caused by invalid escape sequences

#### Problem
Importing `cognee` on Python 3.12+ raises:

SyntaxError: invalid escape sequence '\s'

This occurs because JavaScript regex patterns (e.g. `/\s+/g`) are
embedded
inside a regular Python triple-quoted string, which Python 3.12 strictly
validates.

#### Solution
Converted the HTML template string to a raw string (`r"""`) so Python
does not
interpret JavaScript regex escape sequences.
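A reduced sketch of the fix; the template content below is invented for illustration, not the actual visualization template:

```python
# Embedding a JS regex such as /\s+/g in a normal Python string makes
# "\s" an invalid escape sequence. Marking the template as a raw string
# keeps the backslash verbatim, so Python never tries to interpret it.
HTML_TEMPLATE = r"""
<script>
  // JS regex: collapse runs of whitespace
  const collapsed = text.replace(/\s+/g, " ");
</script>
"""

# The backslash survives as a literal character in the raw string:
assert "\\s+" in HTML_TEMPLATE
```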

#### Files Changed
- cognee/modules/visualization/cognee_network_visualization.py

Fixes #1929


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Improved stability of the network visualization module through
enhanced HTML template handling.


<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-23 14:23:25 +01:00
Uday Gupta
7019a91f7c Fix Python 3.12 SyntaxError caused by JS regex escape sequences 2025-12-23 15:51:07 +05:30
Vasilije
e0644285d4
fix: Resolve issue with migrations for docker (#1932)
<!-- .github/pull_request_template.md -->

## Description
Fall back to creating the database when migrations can't run in Cognee.

Until we rework how migrations work in Cognee, we can't create the
database first and then run migrations, so this is the best we can do
for now.

Unfortunately this swallows all potential migration errors; once
migrations are reworked we should make this path fail loudly instead of
eating exceptions.
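The fallback pattern this describes can be sketched as follows; `run_migrations` and `create_database` are placeholder callables, not the real Cognee functions:

```python
def ensure_database(run_migrations, create_database, log=print):
    """Run migrations; if they fail for any reason, fall back to
    creating the database from scratch.

    As noted above, this swallows genuine migration errors too — a
    known trade-off until migrations are reworked.
    """
    try:
        run_migrations()
        log("migrations completed successfully")
    except Exception as error:
        log(f"migrations failed ({error}); creating database instead")
        create_database()


# Demo with a migration step that fails because the database is missing:
def failing_migrations():
    raise RuntimeError("database does not exist")

events = []
ensure_database(failing_migrations, lambda: events.append("created"),
                log=events.append)
print(events)
```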

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Improved database migration resilience with automatic fallback
initialization when migrations encounter unexpected errors
* Enhanced startup error handling to ensure graceful recovery during
database setup
* Database initialization now attempts automatic recovery before
reporting startup failures
* Added explicit success messaging when migrations complete without
issues


<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-22 15:41:55 +01:00
Igor Ilic
f1526a6660 fix: Resolve issue with migrations for docker 2025-12-22 14:54:11 +01:00
Vasilije
d8d3844805
refactor: Update examples to use pprint (#1921)
<!-- .github/pull_request_template.md -->

## Description
Update examples to use pprint
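For context, the readability difference `pprint` makes on nested search results; the data below is invented:

```python
from pprint import pprint

result = {
    "query": "what is cognee?",
    "results": [
        {"text": "Cognee builds memory graphs", "score": 0.91},
        {"text": "Cognee supports multiple backends", "score": 0.87},
    ],
}

print(result)             # one long, hard-to-scan line
pprint(result, width=50)  # one key per line, nested lists indented
```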

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Style**
* Updated Python examples to use prettier, more readable output
formatting for search and example results.
* **Documentation**
* README updated to reflect improved example output presentation, making
demonstrations easier to read and verify.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-18 17:59:24 +01:00
Pavel Zorin
23d55a45d4 Release v0.5.1 2025-12-18 16:14:47 +01:00
Igor Ilic
1724997683 docs: Update README.md 2025-12-18 14:46:21 +01:00
Igor Ilic
eda9f26b2b
Merge branch 'main' into human-readable-search 2025-12-18 14:24:09 +01:00
Igor Ilic
cc41ef853c refactor: Update examples to use pprint 2025-12-18 14:17:24 +01:00
Vasilije
4caac4b8f0
refactor: Make graphs return optional in search (#1919)
<!-- .github/pull_request_template.md -->

## Description
- Make search results more human readable by returning graph
information only when explicitly requested
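A sketch of the optional graph payload; the `verbose` parameter name comes from the release notes, while the result shape is illustrative only:

```python
def format_search_result(answer, graph_context, verbose=False):
    # By default return only the human-readable answer; include the raw
    # graph details only when the caller asks for them.
    result = {"answer": answer}
    if verbose:
        result["graph_context"] = graph_context
    return result

graph = {"nodes": 42, "edges": 108}
concise = format_search_result("Cognee links your documents", graph)
detailed = format_search_result("Cognee links your documents", graph,
                                verbose=True)
print(sorted(concise))   # answer only
print(sorted(detailed))  # answer plus graph details
```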

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added an optional verbose mode to search. When enabled, results
include additional graph details; disabled by default for cleaner
responses.

* **Tests**
* Added unit tests verifying access-controlled search returns correctly
shaped results for both verbose and non-verbose modes, including
presence/absence of graph details.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-18 13:50:33 +01:00
Igor Ilic
b5949580de refactor: add note about verbose in combined context search 2025-12-18 13:45:20 +01:00
Igor Ilic
986b93fee4 docs: add docstring update for search 2025-12-18 13:24:39 +01:00
Igor Ilic
31e491bc88 test: Add test for verbose search 2025-12-18 13:04:17 +01:00
Igor Ilic
f2bc7ca992 refactor: change comment 2025-12-18 12:00:06 +01:00
Igor Ilic
dd9aad90cb refactor: Make graphs return optional 2025-12-18 11:57:40 +01:00
Vasilije
c7e94e296e
fix: Fix MCP on main branch (#1906)
<!-- .github/pull_request_template.md -->

## Description
Fixes MCP on main branch

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
  * Updated core dependency to a newer version.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-16 12:22:21 +01:00
Igor Ilic
109b889b68 fix: Fix MCP on main branch 2025-12-16 10:47:10 +01:00
Pavel Zorin
fdbee66985 Release: add push to main docker workflow 2025-12-15 20:46:20 +01:00
Pavel Zorin
077e0a4937
Release 0.5.0: uv lock (#1902)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-12-15 20:35:48 +01:00
Pavel Zorin
40e1b9bfed mcp: uv lock 2025-12-15 20:34:51 +01:00
Pavel Zorin
4424ebf3c9 Release 0.5.0: uv lock 2025-12-15 20:33:02 +01:00
Pavel Zorin
96107b9f2f
Bump cognee and MCP versions to 0.5.0 (#1901)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-12-15 20:30:19 +01:00
Pavel Zorin
b42ea166a8 Release: Bump cognee and MCP versions to 0.5.0 2025-12-15 20:29:29 +01:00
Pavel Zorin
4936551a90 CI: Update release workflow 2025-12-15 20:18:51 +01:00
Pavel Zorin
405c925b70
Release 0.5.0 Merge dev to main (#1900)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-12-15 19:57:39 +01:00
Pavel Zorin
78028b819f update dev uv.lock 2025-12-15 18:42:02 +01:00
Pavel Zorin
67af8a7cb4
Bump version from 0.5.0.dev0 to 0.5.0.dev1 2025-12-15 18:36:15 +01:00
hajdul88
622f8fa79e
chore: introduces 1 file upload in ontology endpoint (#1899)
<!-- .github/pull_request_template.md -->

## Description
This PR fixes the ontology upload endpoint by enforcing one file upload at
a time. Tests are adjusted in both the server start and ontology endpoint
unit tests. The API was tested.

Do not merge this together with
https://github.com/topoteretes/cognee/pull/1898; it's either that one or
this one.



## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **API Changes**
* Ontology upload now accepts exactly one file per request; field
renamed from "descriptions" to "description" and validated as a plain
string.
* Stricter form validation and tighter 400/500 error handling for
malformed submissions.

* **Tests**
* Tests converted to real HTTP-style interactions using a shared test
client and dependency overrides.
* Payloads now use plain string fields; added coverage for single-file
constraints and specific error responses.

* **Style**
  * Minor formatting cleanups with no functional impact.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 18:30:35 +01:00
Igor Ilic
14d9540d1b
feat: Add database deletion on dataset delete (#1893)
<!-- .github/pull_request_template.md -->

## Description
- Add support for database deletion when a dataset is deleted
- Simplify dataset handler usage in Cognee

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Improved dataset deletion: stronger authorization checks and reliable
removal of associated graph and vector storage.

* **Tests**
* Added end-to-end test to verify complete dataset deletion and cleanup
of all related storage components.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 18:15:48 +01:00
Vasilije
6b86f423ff
COG-3546: Initial release pipeline (#1883)
<!-- .github/pull_request_template.md -->

## Description

### Inputs: 
`flavour`: `dev` or `main`
`test_mode`: `Boolean`, i.e. a dry run. If true, it won't affect public
indices or repositories

### Jobs
* Create GitHub Release
* Release PyPI Package
* Release Docker Image

[Test
run](https://github.com/topoteretes/cognee/actions/runs/20167985120)

<img width="551" height="149" alt="Screenshot 2025-12-12 at 14 31 06"
src="https://github.com/user-attachments/assets/37648a35-0af8-4474-b051-4a81f8f8cfe7"
/>

#### Create GitHub Release
Gets the `version` from `pyproject.toml` and creates a tag and release
based on it. The version in `pyproject.toml` **must** already be correct!
If the version has not been updated and the tag already exists, the
process has no effect and fails.

Test release was deleted. Here is the screenshot:
<img width="818" height="838" alt="Screenshot 2025-12-12 at 14 19 06"
src="https://github.com/user-attachments/assets/c2009f9f-1411-494e-8268-6489f6bdd959"
/>

#### Release PyPI Package

Publishes the `dist` artifacts to pypi.org. If `test_mode` is enabled, it
publishes them to test.pypi.org
([example](https://test.pypi.org/project/cognee/0.5.0.dev0/)). <<<
That's how I tested it.

#### Release Docker Image
**IMPORTANT!**
Builds the image and tags it with the version from `pyproject.toml`. For
example:
* `cognee/cognee:0.5.1.dev0`
* `cognee/cognee:0.5.1`
ONLY the **main** flavor adds the `latest` tag! If a user does `docker pull
cognee/cognee:latest`, the latest `main` release image will be pulled
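The tagging rule above can be sketched as a small helper (a hypothetical illustration, assuming the version string comes from `pyproject.toml`; the real logic lives in the GitHub Actions workflow, not in Python):

```python
def docker_tags(version: str, flavour: str) -> list[str]:
    """Return the Docker tags to push for a release.

    Sketch of the rule described above: every release is tagged with its
    version, and only the `main` flavour also gets the `latest` tag.
    """
    tags = [f"cognee/cognee:{version}"]
    if flavour == "main":
        tags.append("cognee/cognee:latest")
    return tags

print(docker_tags("0.5.1.dev0", "dev"))   # version tag only
print(docker_tags("0.5.1", "main"))       # version tag plus latest
```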

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify): CI

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Added an automated release pipeline that creates versioned GitHub
releases, publishes Python packages to TestPyPI/PyPI, and builds/pushes
Docker images.
* Supports selectable dev/main flavors, includes flavor-specific image
tagging and labels, wires version/tag outputs for downstream steps, and
offers a test-mode dry run that skips external publishing.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 17:13:40 +01:00
Vasilije
821852f2c3
feat: add support to pass custom parameters in llm adapters during cognify (#1802)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
A user raised this issue/feature request:
https://github.com/topoteretes/cognee/issues/1784
This PR implements this in a way, but I may have misunderstood the
request. If I'm correct, I think it makes sense to support this, if our
users need it.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-12-15 17:09:13 +01:00
Andrej Milicevic
433170fe09 merge dev 2025-12-15 17:06:20 +01:00
hajdul88
bad22ba26b
chore: adds id generation to memify triplet embedding pipeline (#1895)
<!-- .github/pull_request_template.md -->

## Description
This PR adds ID generation to the Triplet objects in the triplet embedding
memify pipeline. In some edge cases, duplicated elements could have been
ingested into the collection.
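The deduplication idea described above can be sketched with deterministic, content-derived IDs (a hypothetical illustration using `uuid5`; the actual Triplet fields and ID scheme in the pipeline may differ):

```python
import uuid

def triplet_id(subject: str, predicate: str, obj: str) -> uuid.UUID:
    """Derive a deterministic ID from a triplet's content.

    uuid5 is a stable hash in UUID form: the same
    (subject, predicate, object) always yields the same ID, so
    re-ingesting an identical triplet cannot create a duplicate entry.
    """
    return uuid.uuid5(uuid.NAMESPACE_OID, f"{subject}|{predicate}|{obj}")

a = triplet_id("cognee", "supports", "bedrock")
b = triplet_id("cognee", "supports", "bedrock")
assert a == b  # identical triplets collapse to one ID
```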



## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **Enhancements**
* Relationship data now includes unique identifiers for improved
tracking and data management capabilities.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 15:45:35 +01:00
Vasilije
69e36cc834
feat: add bedrock as supported llm provider (#1830)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Added support for AWS Bedrock and the models available there.
This was a contributor PR that was never finished, so I polished it
up and made it work.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added AWS Bedrock as a new LLM provider with support for multiple
authentication methods.
* Integrated three new AI models: Claude 4.5 Sonnet, Claude 4.5 Haiku,
and Amazon Nova Lite.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 14:33:57 +01:00
Vasilije
d5bd75c627
test: remove anything codify related from mcp test (#1890)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
After removing codify, I didn't remove codify-related stuff from mcp
test. Hopefully now I've removed everything that was failing.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
  * Removed test coverage for codify and developer rules functionality.
* Updated search functionality test to exclude additional search type
filtering.

* **Chores**
  * Increased default logging verbosity from ERROR to INFO level.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 14:33:22 +01:00
Igor Ilic
c94225f505
fix: make ontology key an optional param in cognify (#1894)
<!-- .github/pull_request_template.md -->

## Description
Make the ontology key optional in Swagger and None by default (it
defaulted to "string" before this change, which caused issues when running
the cognify endpoint)

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Documentation**
* Enhanced API documentation with additional examples and validation
metadata to improve request clarity and validation guidance.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-15 14:30:22 +01:00
Pavel Zorin
a6bc27afaa Cleanup 2025-12-12 17:31:54 +01:00
Pavel Zorin
14ff94f269 Initial release pipeline 2025-12-12 17:09:10 +01:00
Andrej Milicevic
116b6f1eeb chore: formatting 2025-12-12 13:46:16 +01:00
Andrej Milicevic
a225d7fc61 test: revert some changes 2025-12-12 13:44:58 +01:00
Igor Ilic
127d9860df
feat: Add dataset database handler info (#1887)
<!-- .github/pull_request_template.md -->

## Description
Add info on the database handler used for the dataset database

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Datasets now record their assigned vector and graph database handlers,
allowing per-dataset backend selection.

* **Chores**
  * Database schema expanded to store handler identifiers per dataset.
* Deletion/cleanup processes now use dataset-level handler info for
accurate removal across backends.

* **Tests**
* Tests updated to include and validate the new handler fields in
dataset creation outputs.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-12 13:22:03 +01:00
Igor Ilic
ede884e0b0
feat: make pipeline processing cache optional (#1876)
<!-- .github/pull_request_template.md -->

## Description
Make the pipeline cache mechanism optional: it is turned off by default,
but add and cognify continue to use it as they have until now.
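The intended behavior can be sketched with a toy cache; the class and parameter names below are illustrative, not cognee's actual API:

```python
import hashlib


class PipelineRunner:
    """Toy model of an optional pipeline processing cache."""

    def __init__(self):
        self._cache = {}
        self.executions = 0

    def run(self, pipeline_name, payload, use_cache=False):
        # Caching is opt-in per run and off by default.
        key = (pipeline_name, hashlib.sha256(payload.encode()).hexdigest())
        if use_cache and key in self._cache:
            return self._cache[key]  # cache hit: skip re-processing
        self.executions += 1
        result = f"processed:{payload}"
        if use_cache:
            self._cache[key] = result
        return result
```

Pipelines such as add and cognify would opt in to keep their existing behavior, while every other pipeline re-runs on each invocation.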

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Introduced pipeline caching across ingestion, processing, and custom
pipeline flows with per-run controls to enable or disable caching.
  * Added an option for incremental loading in custom pipeline runs.

* **Behavior Changes**
* One pipeline path now explicitly bypasses caching by default to always
re-run when invoked.
* Disabling cache forces re-processing instead of early exit; cache
reset still enables re-execution.

* **Tests**
* Added tests validating caching, non-caching, and cache-reset
re-execution behavior.

* **Chores**
  * Added CI job to run pipeline caching tests.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-12 13:11:31 +01:00
Andrej Milicevic
a337f4e54c test: testing logger 2025-12-12 13:02:55 +01:00
Andrej Milicevic
bce6094010 test: change logger 2025-12-12 12:43:54 +01:00
Andrej Milicevic
c48b274571 test: remove delete error from mcp test 2025-12-12 11:53:40 +01:00
Andrej Milicevic
3b8a607b5f test: fix errors in mcp test 2025-12-12 11:37:27 +01:00
Igor Ilic
7b3d997a06
Merge main vol7 (#1891)
<!-- .github/pull_request_template.md -->

## Description
Add commits from main to dev branch

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Removed permission validation checks from the data processing
pipeline, streamlining the overall workflow and reducing processing
steps.
* Updated task sequences across task handlers to reflect the removal of
the validation step.

* **Documentation**
* Updated processing pipeline documentation and example code to reflect
the new streamlined task sequence.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-11 20:11:54 +01:00
Igor Ilic
59f8d12fa3 Merge branch 'main' into merge-main-vol7 2025-12-11 19:11:24 +01:00
Andrej Milicevic
e211e66275 chore: remove quick option to isolate mcp CI test 2025-12-11 18:29:17 +01:00
Andrej Milicevic
0f50c993ac chore: add quick option to isolate mcp CI test 2025-12-11 18:20:07 +01:00
Andrej Milicevic
248ba74592 test: remove codify-related stuff from mcp test 2025-12-11 18:18:42 +01:00
Andrej Milicevic
af8c5bedcc feat: add kwargs to other adapters 2025-12-11 17:47:23 +01:00
Igor Ilic
46ddd4fd12
feat: add dataset database handler logic and neo4j/lancedb/kuzu handlers (#1776)
<!-- .github/pull_request_template.md -->

## Description
Add ability to use multi tenant multi user mode with Neo4j

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

* **New Features**
* Multi-user support with per-dataset database isolation enabled by
default, allowing backend access control for secure data separation.
* Configurable database handlers via environment variables
(GRAPH_DATASET_DATABASE_HANDLER, VECTOR_DATASET_DATABASE_HANDLER) for
flexible deployment options.

* **Chores**
* Database schema migration to support per-user dataset database
configurations.
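As a sketch, the per-dataset backends could be selected via the environment like this (the handler values `neo4j` and `lancedb` are illustrative; only the variable names come from this PR):

```python
import os

# Handler values are illustrative; only the variable names come from the PR.
os.environ["GRAPH_DATASET_DATABASE_HANDLER"] = "neo4j"
os.environ["VECTOR_DATASET_DATABASE_HANDLER"] = "lancedb"
```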

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-11 14:15:20 +01:00
Igor Ilic
0a1ed79340 refactor: change neo4j_aura to neo4j_aura_dev 2025-12-11 13:05:23 +01:00
Pavel Zorin
fe7e97be45
Chore: Remove Ontology file size limit. Code duplications (#1880)
<!-- .github/pull_request_template.md -->

## Description
We received a complaint about the 10MB file size limit, so the limit has
been removed.
Also removed code duplications and introduced stricter types.
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Support for supplying optional per-file descriptions when uploading
multiple ontologies.

* **Improvements**
* Removed the 10MB file size limit for ontology uploads, allowing larger
files.
* Streamlined and more robust upload handling with improved per-file
validation and safer upload behavior.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-11 10:49:55 +01:00
Pavel Zorin
88f61f9bdb Added filename check 2025-12-10 17:24:31 +01:00
hajdul88
001fbe699e
feat: Adds edge centered payload and embedding structure during ingestion (#1853)
<!-- .github/pull_request_template.md -->

## Description
This pull request introduces edge-centered payloads to the ingestion
process. Payloads are stored in the Triplet_text collection, which is
compatible with the triplet_embedding memify pipeline.

Changes in This PR:

- Refactored custom edge handling; from now on, custom edges can be
passed to the add_data_points method so that ingestion is centralized in
one place.
- Added private methods to handle edge centered payload creation inside
the add_data_points.py
- Added unit tests to cover the new functionality
- Added integration tests
- Added e2e tests

Acceptance Criteria and Testing
Scenario 1:
- Set the TRIPLET_EMBEDDING env var to True
- Run prune, add, cognify
- Verify the vector DB contains a non-empty Triplet_text collection and
that the number of triplets matches the number of edges in the graph
database
- Use the new triplet_completion search type and confirm it works
correctly.

Scenario 2:
- Set the TRIPLET_EMBEDDING env var to False
- Run prune, add, cognify
- Verify the vector DB does not have the Triplet_text collection
- Using the triplet_completion search type should produce an error
indicating that Triplet_text is not available
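As a sketch of what Scenario 1 inspects, an edge-centered payload pairs each edge with the text of its connected nodes; the exact payload format below is an assumption, not the actual implementation in add_data_points.py:

```python
def build_triplet_text(source_name, relationship_name, target_name):
    # Hypothetical payload format: "<source> <relationship> <target>".
    return f"{source_name} {relationship_name.replace('_', ' ')} {target_name}"


edges = [
    ("Cognee", "is_a", "memory engine"),
    ("Cognee", "supports", "Neo4j"),
]
# One Triplet_text entry per graph edge, which is what Scenario 1 verifies.
triplet_texts = [build_triplet_text(*edge) for edge in edges]
```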


## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Triplet embeddings supported—embeddings created from graph edges plus
connected node text
  * Ability to supply custom edges when adding data points
  * New configuration toggle to enable/disable triplet embedding

* **Tests**
* Added comprehensive unit and end-to-end tests for edge-centered
payloads and triplet embedding
  * New CI job to run the edge-centered payload e2e test

* **Bug Fixes**
* Adjusted server start behavior to surface process output in parent
logs

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Pavel Zorin <pazonec@yandex.ru>
2025-12-10 17:10:06 +01:00
Pavel Zorin
2ca194c28f fix format 2025-12-09 18:22:44 +01:00
Pavel Zorin
d932ee4bd9 Specify file type 2025-12-09 17:58:34 +01:00
Pavel Zorin
d0b914acaa Chore: Remove Ontology file size limit. Code duplications 2025-12-09 17:55:43 +01:00
Vasilije
49f7c5188c
feat: avoid double edge vector search in triplet search (#1877)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Eliminates double vector search for edges by ensuring all edge lookups
happen once in the retrieval layer.
- `brute_force_triplet_search`: Always includes
"EdgeType_relationship_name" in collections
- `CogneeGraph.map_vector_distances_to_graph_edges`: Removed internal
vector search fallback; only maps provided distances.
- Tests updated to reflect the new behavior.
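The first change can be sketched as follows; the default collection name `Entity_name` is an assumption, only `EdgeType_relationship_name` comes from this PR:

```python
def resolve_collections(collections=None):
    """Ensure the edge collection is searched exactly once up front."""
    collections = list(collections or ["Entity_name"])
    if "EdgeType_relationship_name" not in collections:
        collections.append("EdgeType_relationship_name")
    return collections
```

With edge distances always fetched in the retrieval layer, the mapping step can simply consume them instead of running its own fallback search.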

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [x] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Ensured relationship edges are automatically included in search
collections, improving search completeness and accuracy.

* **Refactor**
* Simplified graph edge distance mapping logic by removing unnecessary
external dependencies, resulting in more efficient edge processing
during retrieval operations.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-09 13:23:57 +01:00
lxobr
c04d255aca feat: remove secondary search 2025-12-08 17:29:25 +01:00
Vasilije
75fea8dcc8
Removed check_permissions_on_dataset.py and related references (#1786)
<!-- .github/pull_request_template.md -->

## Description
This PR removes the obsolete `check_permissions_on_dataset` task and all
its related imports and usages across the codebase.
The authorization logic is now handled earlier in the pipeline, so this
task is no longer needed.
These changes simplify the default Cognify pipeline and make the code
cleaner and easier to maintain.

### Changes Made
- Removed `cognee/tasks/documents/check_permissions_on_dataset.py` 
- Removed import from `cognee/tasks/documents/__init__.py` 
- Removed import and usage in `cognee/api/v1/cognify/cognify.py` 
- Removed import and usage in
`cognee/eval_framework/corpus_builder/task_getters/get_cascade_graph_tasks.py`
- Updated comments in
`cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py`
(index positions changed)
- Removed usage in `notebooks/cognee_demo.ipynb` 
- Updated documentation in `examples/python/simple_example.py` (process
description)

---

## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [x] Other (please specify): Task removal / cleanup of deprecated
function

---

## Pre-submission Checklist
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue**
- [x] My code follows the project's coding standards and style
guidelines
- [ ] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description (Closes
#1771)
- [x] My commits have clear and descriptive messages

---

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-12-08 05:43:42 +01:00
Vasilije
7a3138edf8
fix: remove double quotes from llmconfig str params (#1758)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
Recently, a few cryptic errors like the one in issue #1721 have occurred
across cognee use cases.

Debugging #1721, however, I found out that if LLM_API_KEY happens to have
`"` quotation marks as part of its value, for example, when it is already
part of the ENV

<img width="1014" height="507" alt="Screenshot 2025-11-07 at 16 58 22"
src="https://github.com/user-attachments/assets/54b7cbb0-5bdc-4b40-b2b1-aed6c5d3d886"
/>

Then it makes its way into Cognee and gets treated as part of the API
key.

By default, we do not do any sanitization or cleanup.

Most of the time, quotation marks get handled for us:
1. `export KEY="VALUE"` will strip it
2. python dotenv will strip it if read from `.env`

However, issues like https://github.com/docker/cli/issues/3630 and #1721
demonstrate that we need some handling on our end instead of assuming
the quotes are stripped.

## This PR

This PR sets up a list of string params we want to strip + some that we
may want to.

We may want to avoid doing this for all params, which is why I went with
a selective approach.
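A minimal sketch of the stripping itself; the function name and the parameter list are illustrative, not the PR's actual code:

```python
# Illustrative selection of params to sanitize, not the PR's actual list.
STRIPPED_PARAMS = {"llm_api_key", "llm_endpoint", "llm_model"}


def strip_wrapping_quotes(value):
    """Remove one layer of matching surrounding quotes; keep internal ones."""
    if not isinstance(value, str) or len(value) < 2:
        return value
    if value[0] == value[-1] and value[0] in ('"', "'"):
        return value[1:-1]
    return value
```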

TODO: add testing

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Configuration values with surrounding quotes are now automatically
normalized and cleaned during system initialization, ensuring consistent
and predictable data handling across all configuration parameters.

* **Tests**
* Added comprehensive unit tests to validate automatic quote removal
from configuration values, covering various scenarios including quoted,
unquoted, empty, and edge cases with mixed and internal quotes.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-08 05:10:23 +01:00
Vasilije
40bbdd1ac7
fix: install nvm and node for -ui cli command (#1836)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Enhanced Node.js and npm environment management for improved system
compatibility on Unix-like platforms.

* **Chores**
* Updated Next.js to v16, React to v19.2, and Auth0 SDK to v4.13.1 for
compatibility and performance improvements.
  * Removed CrewAI workflow trigger component.
  * Removed user feedback submission form.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-08 05:09:49 +01:00
Vasilije
52b0029fbf
fix: Resolve issue with BAML rate limit handling (#1813)
<!-- .github/pull_request_template.md -->

## Description
Add rate limit handling for BAML

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Improvements**
* Centralized rate limiting for LLM and embedding requests to better
manage API throughput.
* Retry policies adjusted: longer initial backoff and reduced max
retries to improve stability under rate limits.

* **Chores**
* Refactored rate-limiting implementation from decorator-based to
context-manager usage across services.

* **Tests**
* Unit tests and mocks updated to reflect the new context-manager
rate-limiting approach.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-08 05:07:14 +01:00
Igor Ilic
67a4d40257 chore: regen poetry lock file 2025-12-05 19:51:26 +01:00
Igor Ilic
2f572ae509 test: Update embedding limiter test 2025-12-05 19:18:48 +01:00
Igor Ilic
a66b2ceeca refactor: reduce amount of retry attempts for baml llm calls 2025-12-05 18:58:59 +01:00
Igor Ilic
7deaa6e8e9 feat: Add RPM limiting to Cognee 2025-12-05 18:56:34 +01:00
Igor Ilic
0c97a400b0 feat: Add RPM control 2025-12-05 15:40:24 +01:00
Igor Ilic
5d0586da28
Merge branch 'dev' into baml-rate-limit-handling 2025-12-05 13:24:07 +01:00
hajdul88
d5bf5cf4e9
fix: fixes lancedb batch handling (#1872)
<!-- .github/pull_request_template.md -->

## Description
Fixes a LanceDB batch handling issue: duplicate elements could appear in
collections when duplicates occurred within the same insert batch.
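The fix amounts to last-write-wins deduplication within a batch before insert; this sketch assumes dict-shaped records keyed by `id`, which may differ from the adapter's actual types:

```python
def dedupe_batch(records):
    """Keep only the latest occurrence of each id within one insert batch."""
    latest = {}
    for record in records:
        latest[record["id"]] = record  # later entries overwrite earlier ones
    return list(latest.values())
```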

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Improved data integrity by implementing deduplication logic to
eliminate duplicate entries and ensure only the latest version is
retained.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-05 12:26:45 +01:00
Vasilije
9571641199
refactor: move codify pipeline out of main repo (#1738)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
This PR moves codify, and the code graph pipeline, out of the
repository. It also introduces a Custom Pipeline interface, which can be
used in the future to define custom pipelines.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-12-04 23:10:39 -08:00
Igor Ilic
4afde917a9
Main merge vol4 (#1856)
<!-- .github/pull_request_template.md -->

## Description
Merge main changes into dev branch

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Documentation**
* Overhauled README: renamed tagline, clarified product positioning,
reorganized Get Started into Open Source and Cloud paths, streamlined
Quickstart, refreshed demos, navigation, and citation/contributing
sections.

* **New Features**
* Added MCP tools for developer rules management, interaction logging,
and enhanced search modes (SUMMARIES, CYPHER, FEELING_LUCKY).

* **Bug Fixes**
  * Improved embedding handling to support alternate response formats.

* **Chores**
  * Updated test/dev dependency versions.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-04 17:48:11 +01:00
Pavel Zorin
386e0c8234
chore: uv lock check in pre-test workflow (#1773)
<!-- .github/pull_request_template.md -->
`.github/actions/cognee_setup/action.yml` - disables `uv.lock` rebuild
by default.
`.github/workflows/pre_test.yml` - lightweight checks that signal that a
PR is not ready for testing. For now it verifies that `uv.lock`
corresponds to `pyproject.toml`. On failure, the run provides
instructions on how to fix `uv.lock`.

Warning: made an exclusion for python 3.13 dependencies. This doesn't
look good to me
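
The lockfile check described above can be sketched as a minimal pre-test job. This is a hypothetical illustration, not the repo's actual `pre_test.yml`; it assumes a uv version where `uv lock --check` validates that `uv.lock` matches `pyproject.toml`, and the action versions shown are placeholders:

```yaml
# Hypothetical sketch of a lockfile pre-check job (illustrative names/versions).
name: pre-test
on: pull_request
permissions:
  contents: read
jobs:
  lockfile-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v4
      - name: Verify uv.lock matches pyproject.toml
        run: |
          # Fails if the lockfile is stale relative to pyproject.toml
          uv lock --check || {
            echo "uv.lock is out of date; run 'uv lock' locally and commit the result."
            exit 1
          }
```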

## Description
Make
## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify): CI improvement

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
  * Made uv lockfile rebuilding optional in CI (disabled by default).
* Added a pre-validation workflow to check lockfile integrity before
tests.
* Ensured pre-test validation runs before basic and end-to-end test
jobs.
* Updated dependencies to support Python 3.13 with version-specific lxml
constraints and adjusted docs dependency structure.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-04 15:50:39 +01:00
Pavel Zorin
7033b63cd4 update poetry lock 2025-12-04 13:26:53 +01:00
Pavel Zorin
6d1f1d183a Removed python-version from pre-test 2025-12-04 11:51:32 +01:00
Pavel Zorin
ed0c0d1823 Potential fix for code scanning alert no. 407: Workflow does not contain permissions
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-12-04 11:51:32 +01:00
Pavel Zorin
a50810240b Version exclusion for lxml and python 3.13 2025-12-04 11:51:28 +01:00
Pavel Zorin
c7a2138966 CI: uv lock check. Pre-test workflow. 2025-12-04 11:47:35 +01:00
Igor Ilic
7d7f8a249a
Merge branch 'dev' into main-merge-vol4 2025-12-04 10:32:10 +01:00
Igor Ilic
f1c5b9a55f fix: Resolve DB caching issues when deleting databases 2025-12-03 18:05:47 +01:00
Igor Ilic
fd84edeb74 refactor: change getting of tables during deletion 2025-12-03 15:43:41 +01:00
Boris
8cad9ef225
Merge branch 'dev' into feature/cog-3409-add-bedrock-as-supported-llm-provider 2025-12-03 14:58:00 +01:00
Igor Ilic
45f32f8bfd
Merge branch 'dev' into multi-tenant-neo4j 2025-12-03 14:37:13 +01:00
Igor Ilic
1961efcc33 fix: Handle scenario when there is no relational database on prune time 2025-12-03 14:27:06 +01:00
Igor Ilic
f4078d1247 feat: Add ability to delete lance and kuzu datasets, add prune to work with multi user mode 2025-12-03 13:10:18 +01:00
Igor Ilic
5698c609f5 test: Update tests with regards to auto scaling changes 2025-12-03 11:47:10 +01:00
Boris Arzentar
0d2e84f58e
test: test_strip_quotes_from_strings 2025-12-03 10:59:17 +01:00
Boris
3288ef01a4
Merge branch 'dev' into fix/remove-double-quotes-from-llmconfig-str-params 2025-12-03 10:05:49 +01:00
hajdul88
d4d190ac2b
feature: adds triplet embedding via memify (#1832)
<!-- .github/pull_request_template.md -->

## Description
This PR introduces triplet embeddings via a new
create_triplet_embeddings memify pipeline.
The pipeline reads the graph in batches, extracts properties from graph
elements based on their datapoint types, and generates combined triplet
embeddings. These embeddings are stored in the vector database as a new
collection.

Changes in This PR:

- Added a new create_triplet_embeddings memify pipeline.
- Added a new get_triplet_datapoints memify task.
- Introduced a new triplet_completion search type.
- Added full test coverage:
  - Unit tests: memify task, pipeline, and retriever
  - Integration tests: memify task, pipeline, and retriever
  - End-to-end tests: updated session history tests and multi-DB search
tests; added tests for triplet_completion and memify pipeline execution

Acceptance Criteria and Testing

Scenario 1:
- Run the default add and cognify pipelines.
- Run the create_triplet_embeddings memify pipeline.
- Verify the vector DB contains a non-empty Triplet_text collection.
- Use the new triplet_completion search type and confirm it works
correctly.

Scenario 2:
- Run the default add and cognify pipelines.
- Do not run the triplet embeddings memify pipeline.
- Attempt to use the triplet_completion search type.
- You should receive an error indicating that the triplet embeddings
memify pipeline must be executed first.
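
The pipeline's core step, flattening graph triplets into embeddable strings in batches, can be sketched roughly as follows. The `Triplet` shape and function names here are illustrative assumptions, not cognee's actual implementation:

```python
# Illustrative sketch only: the Triplet dataclass and helper names are
# hypothetical, not cognee's real datapoint types or memify task API.
from dataclasses import dataclass


@dataclass
class Triplet:
    subject: dict   # properties extracted from the source graph node
    relation: str   # edge label connecting the two nodes
    obj: dict       # properties extracted from the target graph node


def triplet_text(t: Triplet, text_fields=("name", "description")) -> str:
    """Flatten one (subject, relation, object) triplet into a single embeddable string."""
    def props(node: dict) -> str:
        return " ".join(str(node[f]) for f in text_fields if f in node)
    return f"{props(t.subject)} -[{t.relation}]-> {props(t.obj)}"


def batch_triplet_texts(triplets, batch_size=100):
    """Yield batches of triplet strings, mirroring batched graph reads."""
    batch = []
    for t in triplets:
        batch.append(triplet_text(t))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch
```

Each yielded batch would then be sent to the embedding engine and stored in the vector collection.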


## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Triplet-based search with LLM-powered completions (TRIPLET_COMPLETION)
* Batch triplet retrieval and a triplet embeddings pipeline for
extraction, indexing, and optional background processing
* Context retrieval from triplet embeddings with optional caching and
conversation-history support
  * New Triplet data type exposed for indexing and search

* **Examples**
* End-to-end example demonstrating triplet embeddings extraction and
TRIPLET_COMPLETION search

* **Tests**
* Unit and integration tests covering triplet extraction, retrieval,
embedding pipeline, and completion flows

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Pavel Zorin <pazonec@yandex.ru>
2025-12-02 18:27:08 +01:00
Igor Ilic
4e8b2ffc3e chore: Update lock files 2025-12-02 17:51:30 +01:00
Igor Ilic
c7810e9fdb Merge branch 'dev' into main-merge-vol4 2025-12-02 17:40:09 +01:00
Igor Ilic
1282905888 feat: add password encryption for Neo4j 2025-12-02 16:34:16 +01:00
Igor Ilic
2d45db9e0d
Fix distributed issues with latest pydantic version (#1859)
<!-- .github/pull_request_template.md -->

## Description
Resolve distributed issues with poetry lock

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Updated dependency version constraints to improve compatibility and
flexibility with Pydantic and aiofiles packages.
* Removed unused development dependency from the project configuration.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-02 16:08:26 +01:00
Igor Ilic
92448767fe refactor: remove done TODOs 2025-12-02 14:29:51 +01:00
Igor Ilic
702cdb45be
Merge branch 'dev' into multi-tenant-neo4j 2025-12-02 13:11:37 +01:00
Igor Ilic
dbcb35a6da chore: remove unused imports, add optional for delete dataset statement 2025-12-02 13:09:45 +01:00
Boris Arzentar
0ff836b6dd
fix: install latest nvm version 2025-12-02 10:48:28 +01:00
Boris Arzentar
5fe6a17cfd
fix: resolve nvm when not in path 2025-12-02 10:43:57 +01:00
Boris Arzentar
5ee5ae294a
Merge remote-tracking branch 'origin/dev' into feature/cog-3441-cognee-cli-ui-fix 2025-12-01 20:23:01 +01:00
Andrej Milicevic
d473ef12ae fix: small changes based on PR comments 2025-12-01 18:32:55 +01:00
Igor Ilic
8e67471d1e
Merge branch 'dev' into main-merge-vol4 2025-12-01 17:43:46 +01:00
Vasilije
c17f838034
CI: 32 GB machine for Ollama tests (#1857)
<!-- .github/pull_request_template.md -->

## Description
Recently the Llama test started failing with `model requires more system
memory (8.9 GiB) than is available (8.4 GiB)`. Due to the `cgroup`
configuration, only 8 GB are available for containers running on
`buildjet-4vcpu-ubuntu-2204`.
The decision is to change the machine to
`buildjet-8vcpu-ubuntu-2204`; it costs $0.0016 per minute.

Tentatively changed the model to `phi3:mini`. Any other ideas are
welcome.
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-12-01 08:01:24 -08:00
Boris
2df84dba27
Merge branch 'dev' into fix/remove-double-quotes-from-llmconfig-str-params 2025-12-01 16:03:00 +01:00
Igor Ilic
5cfc7b1761 chore: Disable backend access control when not supported 2025-12-01 15:58:19 +01:00
Pavel Zorin
e480acaa7c
COG-3437: Chore: CodeRabbit config (#1833)
## Description
Adds `.coderabbit.yml` that tunes CodeRabbit reviews
## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify): chore

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Added a project-wide review and tooling configuration to standardize
reviews, path-specific guidance, automated incremental reviews, chat
auto-replies, and integrations with linters/validators.
* Configured review behaviors (auto-review, abort-on-close, high-level
summaries, placeholders) and path filtering to focus checks where
needed.

* **Note**
  * No user-visible changes.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-01 15:29:49 +01:00
Pavel Zorin
ba9ca46574 Increase the machine size 2025-12-01 15:23:59 +01:00
Igor Ilic
362aa8df5c
Merge branch 'main' into baml-rate-limit-handling 2025-12-01 15:12:27 +01:00
Igor Ilic
2e493cea4c chore: Disable multi user mode for tests that can't run it 2025-12-01 15:07:01 +01:00
Igor Ilic
524e3e8232 chore: Update lock files 2025-12-01 14:52:01 +01:00
Pavel Zorin
7c9a78abea CI: Smaller embedding model for Ollama test 2025-12-01 14:39:45 +01:00
Igor Ilic
859e98b494 fix: resolve issue with poetry and dev dependency 2025-12-01 13:47:16 +01:00
Boris
76d054b6a5
Merge branch 'dev' into feature/cog-3156-move-codify-pipeline-out-of-main-repo 2025-12-01 11:21:34 +01:00
Igor Ilic
0bb4ece4d8 Merge branch 'main' into main-merge-vol4 2025-12-01 11:16:59 +01:00
Boris
5ce1af8cc0
Merge branch 'dev' into fix/remove-double-quotes-from-llmconfig-str-params 2025-12-01 10:09:53 +01:00
Igor Ilic
d81d63390f test: Add test for dataset database handler creation 2025-11-28 16:33:46 +01:00
Igor Ilic
a0c5867977 chore: disable backend access control 2025-11-28 14:56:33 +01:00
Igor Ilic
7e0be8f167 chore: disable backend access control for tests not supporting mode 2025-11-28 13:48:01 +01:00
Igor Ilic
7844b9a3a5 Merge branch 'multi-tenant-neo4j' of github.com:topoteretes/cognee into multi-tenant-neo4j 2025-11-28 13:12:07 +01:00
Igor Ilic
ed9b774448 chore: disable backend access control for deduplication test 2025-11-28 13:11:45 +01:00
Igor Ilic
0c825b96ff
Merge branch 'dev' into multi-tenant-neo4j 2025-11-28 12:55:48 +01:00
Vasilije
00b60aed6c
backport: Adds lance-namespace version fix to toml (fixes lancedb issue with 0.2.0 lance-namespace version) + crawler integration test url fix (#1842)
<!-- .github/pull_request_template.md -->

## Description
Implements a quick fix for the lance-namespace 0.0.21 to 0.2.0 release
issue with lancedb. This should be revisited later if they fix it on
their side; for now we pinned lance-namespace to the previous
version.


**If Lancedb fixes the issue on their side this can be closed**


Additionally, cherry-picks crawler integration test fixes from dev.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-27 10:47:00 -08:00
Igor Ilic
45841824d0 chore: Update Cognee version 2025-11-27 19:29:44 +01:00
Igor Ilic
ddf802ff54 chore: Add migration of unique constraint for SQLite 2025-11-27 18:38:00 +01:00
Andrej Milicevic
aa8afefe8a feat: add kwargs to cognify and related tasks 2025-11-27 17:05:37 +01:00
Andrej Milicevic
c649900042 Merge branch 'dev' into feature/cog-3396-add-support-to-pass-custom-parameters-in-openai-adapter 2025-11-27 16:59:43 +01:00
Andrej Milicevic
c1857a50fa fix: remove new custom pipeline interface 2025-11-27 14:58:07 +01:00
Andrej Milicevic
f776f04ee0 feat: add registration and use of custom retrievers 2025-11-27 14:55:22 +01:00
hajdul88
0fd939ca2b updating url again 2025-11-27 13:28:48 +01:00
hajdul88
8b61e1baa2 fix: adds lance-namespace version fix to toml + fixes lancedb max version 2025-11-27 12:33:04 +01:00
Igor Ilic
1ff6a72fc7 refactor: set default value to empty dictionary 2025-11-26 16:45:18 +01:00
Andrej Milicevic
700362a233 fix: fix model names and test names 2025-11-26 13:35:56 +01:00
Boris Arzentar
ca271c5dbb
fix: lint error 2025-11-26 12:43:57 +01:00
Andrej Milicevic
02b9fa485c fix: remove random addition to pyproject file 2025-11-26 12:33:22 +01:00
Andrej Milicevic
0fe16939c1 remove code_graph example after dev merge 2025-11-26 12:32:17 +01:00
Boris Arzentar
2f06c3a97e
fix: install nvm and node for -ui cli command 2025-11-26 12:24:14 +01:00
Andrej Milicevic
5a2a5f64d2 merge dev 2025-11-26 11:04:11 +01:00
Andrej Milicevic
7c5a17ecb5 test: add extra dependency to bedrock ci test 2025-11-26 11:02:36 +01:00
Pavel Zorin
39c6eba571 coderabbit fix 2025-11-25 18:09:43 +01:00
Igor Ilic
cf9edf2663 chore: Add migration for new dataset database model field 2025-11-25 18:03:35 +01:00
Igor Ilic
69777ef0a5 feat: Add ability to handle custom connection resolution to avoid storing security critical data in rel db 2025-11-25 17:53:21 +01:00
Pavel Zorin
0f8cec64d5 fix comments 2025-11-25 17:51:32 +01:00
Pavel Zorin
ff20f021cc fix comments 2025-11-25 17:49:56 +01:00
Pavel Zorin
e46c0c4f6c CodeRabbit config 2025-11-25 17:37:33 +01:00
Igor Ilic
5f3b776406 chore: add todo for enhancing db connections 2025-11-25 16:38:34 +01:00
Igor Ilic
2e02aafbae refactor: Remove unused imports 2025-11-25 15:55:36 +01:00
Igor Ilic
593f17fcdc refactor: Add better handling of configuration for dataset to database handler 2025-11-25 15:41:01 +01:00
Andrej Milicevic
9e652a3a93 fix: use uv instead of poetry in CI tests 2025-11-25 13:34:18 +01:00
Andrej Milicevic
4c6bed885e chore: ruff format 2025-11-25 13:02:26 +01:00
Andrej Milicevic
f22330a7b6 Merge branch 'dev' into feature/bedrock-llm-provider 2025-11-25 13:02:07 +01:00
Andrej Milicevic
e0d48c043a fix: fixes to adapter and tests 2025-11-25 12:58:07 +01:00
Igor Ilic
64a3ee96c4 refactor: Create new abstraction for dataset database mapping and handling 2025-11-24 20:31:28 +01:00
Andrej Milicevic
d97acba78e Merge branch 'dev' into feature/bedrock-llm-provider 2025-11-24 16:48:45 +01:00
Andrej Milicevic
f732fbf55f merge dev 2025-11-24 16:47:59 +01:00
Andrej Milicevic
3b78eb88bd fix: use s3 config 2025-11-24 16:38:23 +01:00
Pavel Zorin
d99a7fffdd
ci(Mergify): configuration update (#1820)
This change has been made by @pazone from the Mergify config editor.
2025-11-21 17:59:57 +01:00
Pavel Zorin
a7d0132e16 ci(Mergify): configuration update
Signed-off-by: Pavel Zorin <pazonec@yandex.ru>
2025-11-21 17:59:15 +01:00
Pavel Zorin
a472db5db5
CI: mergify config (#1818)
<!-- .github/pull_request_template.md -->

## Description
Use case: you merged a PR to `dev` and also want it in
`main` ASAP, for example a test fix.
Just add the `backport_main` label to the PR (even if it's already merged).
The same PR will be created, targeted at `main`.

It's called
[Backport](https://docs.mergify.com/workflow/actions/backport/). If the
backport has conflicts, it will get a `conflicts` label; be careful
with those.
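
A Mergify rule implementing this label-driven backport could look roughly like the following. This is a hedged sketch using Mergify's documented `backport` action, not necessarily the repo's actual `.mergify.yml`; the rule name is illustrative:

```yaml
# Illustrative Mergify config sketch (rule name is made up).
pull_request_rules:
  - name: backport to main when labeled backport_main
    conditions:
      - label=backport_main
    actions:
      backport:
        branches:
          - main
```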

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-21 16:57:41 +01:00
Pavel Zorin
dd87acc004
CI: Release drafter configuration (#1817)
<!-- .github/pull_request_template.md -->

## Description
[Release drafter](https://github.com/release-drafter/release-drafter)
template configuration. It will generate and update a release draft on
each PR merge to dev.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify): CI

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-21 16:57:28 +01:00
Pavel Zorin
efb9739ac6 CI: mergify config 2025-11-21 15:23:35 +01:00
Pavel Zorin
83e0876066 added CI label 2025-11-21 15:04:32 +01:00
Pavel Zorin
0b20f13117 CI: Release drafter configuration 2025-11-21 15:01:09 +01:00
Igor Ilic
0800810713 refactor: remove print statements 2025-11-20 18:46:02 +01:00
Igor Ilic
68d81a9125 refactor: Update multi-user database dataset creation mechanism 2025-11-20 18:37:15 +01:00
Vasilije
3fe354e34e
Handle multiple response formats in OllamaEmbeddingEngine (#1735)
The OllamaEmbeddingEngine is compatible with OpenAI
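
A normalization like the following is one way to handle both shapes; this is an illustrative sketch, not the actual adapter code. It assumes the Ollama `/api/embeddings` endpoint returns `{"embedding": [...]}`, the batch `/api/embed` endpoint returns `{"embeddings": [[...], ...]}`, and OpenAI-compatible endpoints return `{"data": [{"embedding": [...]}, ...]}`:

```python
# Illustrative sketch (not cognee's actual OllamaEmbeddingEngine code):
# normalize an embeddings response from several known shapes into a
# uniform list-of-vectors result.
def extract_embeddings(response: dict) -> list:
    if "data" in response:            # OpenAI-compatible shape
        return [item["embedding"] for item in response["data"]]
    if "embeddings" in response:      # Ollama /api/embed batch shape
        return response["embeddings"]
    if "embedding" in response:       # Ollama /api/embeddings single shape
        return [response["embedding"]]
    raise ValueError(f"Unrecognized embedding response keys: {list(response)}")
```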

<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-20 05:19:43 -08:00
Igor Ilic
a8950b244f
Merge branch 'main' into baml-rate-limit-handling 2025-11-20 13:27:22 +01:00
Pavel Zorin
f996ecffcc
docs(mcp): update available tools documentation (#1740)
<!-- .github/pull_request_template.md -->

## Description
- Add missing tools: cognify_add_developer_rules, get_developer_rules,
save_interaction
- Enhance search tool description with all supported search types
- Documentation now accurately reflects all 10 tools implemented in
server.py

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [x] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-20 13:20:42 +01:00
Igor Ilic
7360729db1 fix: Resolve issue with BAML rate limit handling 2025-11-19 23:04:19 +01:00
Andrej Milicevic
0a4b1068a2 feat: add kwargs to openai adapter functions 2025-11-17 17:42:22 +01:00
BOBDA DJIMO ARMEL HYACINTHE
4fcbc96e85 Merge branch 'fix/update-mcp-tools-documentation' of https://github.com/armelhbobdad/cognee into fix/update-mcp-tools-documentation 2025-11-17 19:59:19 +04:00
Armel BOBDA
53532921cf
Update cognee-mcp/README.md
Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com>
2025-11-17 19:56:57 +04:00
BOBDA DJIMO ARMEL HYACINTHE
a5aa58a7d7 docs(mcp): correct tool name in README
- Update the tool name from cognify_add_developer_rules to cognee_add_developer_rules.
2025-11-17 19:55:32 +04:00
Armel BOBDA
900dd38625
Merge branch 'main' into fix/update-mcp-tools-documentation 2025-11-17 15:54:35 +04:00
martin0731
3acb581bd0 Removed check_permissions_on_dataset.py and related references 2025-11-13 08:31:15 -05:00
Igor Ilic
d566541a4b
Merge branch 'dev' into multi-tenant-neo4j 2025-11-12 21:32:40 +01:00
Igor Ilic
6bb642d6b8 refactor: Start adding multi-user functions to db interfaces 2025-11-12 21:24:40 +01:00
Igor Ilic
0176cd5a68 refactor: Add todo point 2025-11-12 18:01:44 +01:00
Igor Ilic
b017fcc8d0 refactor: Make neo4j auto scaling more readable 2025-11-12 17:58:27 +01:00
Igor Ilic
a0a14e7ccc refactor: Update dataset database class 2025-11-11 20:05:47 +01:00
Igor Ilic
a06c58a8ae Merge branch 'dev' into multi-tenant-neo4j 2025-11-11 20:04:22 +01:00
Igor Ilic
432d4a1578 feat: Add initial multi tenant neo4j support 2025-11-11 19:44:34 +01:00
Igor Ilic
a8706a2fd2 Merge branch 'feature/cog-3245-enable-multi-user-for-falkor' of github.com:topoteretes/cognee into feature/cog-3245-enable-multi-user-for-falkor 2025-11-11 15:13:17 +01:00
Igor Ilic
6a64023876 fix: Update vector db url properly 2025-11-11 15:12:58 +01:00
Andrej Milicevic
ac6c3ef9de fix: fix names, add falkor to constants 2025-11-11 15:07:59 +01:00
Andrej Milicevic
cfc131307f Merge branch 'feature/cog-3245-enable-multi-user-for-falkor' of github.com:topoteretes/cognee into feature/cog-3245-enable-multi-user-for-falkor 2025-11-11 14:22:55 +01:00
Andrej Milicevic
4f5771230e fix: PR comment changes 2025-11-11 14:22:42 +01:00
Igor Ilic
41b844a31c
Apply suggestion from @dexters1 2025-11-11 13:56:59 +01:00
Igor Ilic
20d49eeb76
Apply suggestion from @dexters1 2025-11-11 13:56:35 +01:00
Igor Ilic
bb8de7b336
Apply suggestion from @dexters1 2025-11-11 13:56:16 +01:00
Igor Ilic
011a7fb60b fix: Resolve multi user migration 2025-11-11 13:53:19 +01:00
Andrej Milicevic
ed2d687135 fix: changes based on PR comments 2025-11-11 13:52:34 +01:00
Igor Ilic
322c134e7e
Merge branch 'dev' into feature/cog-3245-enable-multi-user-for-falkor 2025-11-11 13:01:07 +01:00
Vasilije
487635b71b
Update Python version range in README 2025-11-09 11:42:45 +01:00
Vasilije
6192d49dd6
fix: added release update version (#1764)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> <sup>[Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) is
generating a summary for commit
44f0498b75. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
2025-11-09 09:44:34 +01:00
vasilije
44f0498b75 added release update version 2025-11-09 09:44:07 +01:00
Vasilije
7557e6f10b
Cherry-pick: Fix cypher search (#1739) (#1763)
<!-- .github/pull_request_template.md -->

## Description
Resolve issue with cypher search by encoding the return value from the
cypher query into JSON. Uses fastapi json encoder
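
The fix described above uses FastAPI's `jsonable_encoder`; as a minimal sketch of the same idea using only the standard library (function and field names here are illustrative, not cognee's actual code), the raw graph rows containing UUIDs and datetimes can be round-tripped through a JSON fallback encoder:

```python
import json
from datetime import datetime
from uuid import UUID

def encode_cypher_result(rows):
    """Serialize raw graph-query rows (which may contain UUIDs and
    datetimes) into JSON-safe structures, mirroring what
    fastapi.encoders.jsonable_encoder does in the actual fix."""
    def fallback(value):
        # UUIDs and datetimes are not JSON-serializable by default.
        if isinstance(value, (UUID, datetime)):
            return str(value)
        raise TypeError(f"Cannot serialize {type(value)!r}")

    return json.loads(json.dumps(rows, default=fallback))

result = encode_cypher_result(
    [{"id": UUID("87372381-a9fe-5b82-9c92-3f5dbab1bc35"),
      "created_at": datetime(2025, 11, 5, 14, 12, 46)}]
)
```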


## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
Example of result now with Cypher search with the following query "MATCH
(src)-[rel]->(nbr) RETURN src, rel" on Simple example:
```
{
   "search_result":[
      [
         [
            {
               "_id":{
                  "offset":0,
                  "table":0
               },
               "_label":"Node",
               "id":"87372381-a9fe-5b82-9c92-3f5dbab1bc35",
               "name":"",
               "type":"DocumentChunk",
               "created_at":"2025-11-05T14:12:46.707597",
               "updated_at":"2025-11-05T14:12:54.801747",
               "properties":"{\"created_at\": 1762351945009, \"updated_at\": 1762351945009, \"ontology_valid\": false, \"version\": 1, \"topological_rank\": 0, \"metadata\": {\"index_fields\": [\"text\"]}, \"belongs_to_set\": null, \"text\": \"\\n    Natural language processing (NLP) is an interdisciplinary\\n    subfield of computer science and information retrieval.\\n    \", \"chunk_size\": 48, \"chunk_index\": 0, \"cut_type\": \"paragraph_end\"}"
            },
            {
               "_src":{
                  "offset":0,
                  "table":0
               },
               "_dst":{
                  "offset":1,
                  "table":0
               },
               "_label":"EDGE",
               "_id":{
                  "offset":0,
                  "table":1
               },
               "relationship_name":"contains",
               "created_at":"2025-11-05T14:12:47.217590",
               "updated_at":"2025-11-05T14:12:55.193003",
               "properties":"{\"source_node_id\": \"87372381-a9fe-5b82-9c92-3f5dbab1bc35\", \"target_node_id\": \"bc338a39-64d6-549a-acec-da60846dd90d\", \"relationship_name\": \"contains\", \"updated_at\": \"2025-11-05 14:12:54\", \"relationship_type\": \"contains\", \"edge_text\": \"relationship_name: contains; entity_name: natural language processing (nlp); entity_description: An interdisciplinary subfield of computer science and information retrieval concerned with interactions between computers and human (natural) languages.\"}"
            }
         ]
      ]
   ],
   "dataset_id":"UUID(""af4b1c1c-90fc-59b7-952c-1da9bbde370c"")",
   "dataset_name":"main_dataset",
   "graphs":"None"
}
```
Relates to https://github.com/topoteretes/cognee/pull/1725 Issue:
https://github.com/topoteretes/cognee/issues/1723

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

2025-11-09 06:48:14 +01:00
Pavel Zorin
007c7d403e Fix cypher search (#1739)
<!-- .github/pull_request_template.md -->

## Description
Resolve issue with cypher search by encoding the return value from the
cypher query into JSON. Uses fastapi json encoder


## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
Example of result now with Cypher search with the following query "MATCH
(src)-[rel]->(nbr) RETURN src, rel" on Simple example:
```
{
   "search_result":[
      [
         [
            {
               "_id":{
                  "offset":0,
                  "table":0
               },
               "_label":"Node",
               "id":"87372381-a9fe-5b82-9c92-3f5dbab1bc35",
               "name":"",
               "type":"DocumentChunk",
               "created_at":"2025-11-05T14:12:46.707597",
               "updated_at":"2025-11-05T14:12:54.801747",
               "properties":"{\"created_at\": 1762351945009, \"updated_at\": 1762351945009, \"ontology_valid\": false, \"version\": 1, \"topological_rank\": 0, \"metadata\": {\"index_fields\": [\"text\"]}, \"belongs_to_set\": null, \"text\": \"\\n    Natural language processing (NLP) is an interdisciplinary\\n    subfield of computer science and information retrieval.\\n    \", \"chunk_size\": 48, \"chunk_index\": 0, \"cut_type\": \"paragraph_end\"}"
            },
            {
               "_src":{
                  "offset":0,
                  "table":0
               },
               "_dst":{
                  "offset":1,
                  "table":0
               },
               "_label":"EDGE",
               "_id":{
                  "offset":0,
                  "table":1
               },
               "relationship_name":"contains",
               "created_at":"2025-11-05T14:12:47.217590",
               "updated_at":"2025-11-05T14:12:55.193003",
               "properties":"{\"source_node_id\": \"87372381-a9fe-5b82-9c92-3f5dbab1bc35\", \"target_node_id\": \"bc338a39-64d6-549a-acec-da60846dd90d\", \"relationship_name\": \"contains\", \"updated_at\": \"2025-11-05 14:12:54\", \"relationship_type\": \"contains\", \"edge_text\": \"relationship_name: contains; entity_name: natural language processing (nlp); entity_description: An interdisciplinary subfield of computer science and information retrieval concerned with interactions between computers and human (natural) languages.\"}"
            }
         ]
      ]
   ],
   "dataset_id":"UUID(""af4b1c1c-90fc-59b7-952c-1da9bbde370c"")",
   "dataset_name":"main_dataset",
   "graphs":"None"
}
```
Relates to https://github.com/topoteretes/cognee/pull/1725
Issue: https://github.com/topoteretes/cognee/issues/1723

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-08 22:54:29 +01:00
Andrej Milicevic
21b1f6b39c fix: remove add_ and get_developer_rules 2025-11-07 18:28:30 +01:00
Daulet Amirkhanov
05c984f98f feat: enhance LLMConfig to selectively strip quotes from specific string fields 2025-11-07 16:55:49 +00:00
Daulet Amirkhanov
c069dd276e feat: add model validator to strip quotes from string fields in LLMConfig 2025-11-07 16:51:11 +00:00
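
The two commits above add quote-stripping to LLMConfig; a rough sketch of the idea follows, assuming the real validator is a Pydantic `model_validator` on cognee's `LLMConfig` (the field names and standalone-function form here are illustrative only):

```python
# Sketch of the quote-stripping idea behind the LLMConfig validator.
# Field names are hypothetical; the real implementation targets
# specific string fields on cognee's Pydantic LLMConfig.
QUOTE_STRIPPED_FIELDS = {"llm_api_key", "llm_model", "llm_endpoint"}

def strip_wrapping_quotes(config: dict) -> dict:
    cleaned = dict(config)
    for field in QUOTE_STRIPPED_FIELDS & cleaned.keys():
        value = cleaned[field]
        # Values copied from .env files often arrive as '"gpt-4o"'.
        if (isinstance(value, str) and len(value) >= 2
                and value[0] == value[-1] and value[0] in "\"'"):
            cleaned[field] = value[1:-1]
    return cleaned

cfg = strip_wrapping_quotes({"llm_model": '"gpt-4o"', "other": '"keep"'})
```

Only whitelisted fields are touched, which matches the "selectively strip quotes from specific string fields" wording of the commit.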
Vasilije
26c18e9db2
chore: update videos (#1748)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-06 14:14:01 +01:00
Hande
a6db2811d5 chore: update videos 2025-11-06 13:58:18 +01:00
BOBDA DJIMO ARMEL HYACINTHE
41b973a61e docs(mcp): update available tools documentation
- Add missing tools: cognify_add_developer_rules, get_developer_rules, save_interaction
- Enhance search tool description with all supported search types
- Documentation now accurately reflects all 10 tools implemented in server.py
2025-11-05 20:19:52 +04:00
Andrej Milicevic
18e4bb48fd refactor: remove code and repository related tasks 2025-11-05 13:02:56 +01:00
Andrej Milicevic
c481b87d58 refactor: Remove codify and code_graph pipeline from main repo 2025-11-05 12:56:17 +01:00
Gao,Wei
72c20c256d
Handle multiple response formats in OllamaEmbeddingEngine
This keeps the OllamaEmbeddingEngine compatible with the OpenAI response format.
2025-11-05 17:44:08 +08:00
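
The compatibility issue above can be sketched as a normalizer that accepts both response shapes; the payload shapes below are illustrative assumptions, not the exact structures the engine receives:

```python
def extract_embeddings(response: dict) -> list:
    """Normalize an embeddings response that may be either
    OpenAI-style ({"data": [{"embedding": [...]}, ...]}) or
    Ollama-native ({"embeddings": [[...], ...]})."""
    if "data" in response:  # OpenAI-compatible endpoint
        return [item["embedding"] for item in response["data"]]
    if "embeddings" in response:  # Ollama /api/embed style
        return response["embeddings"]
    raise ValueError("Unrecognized embeddings response format")

openai_style = {"data": [{"embedding": [0.1, 0.2]}]}
ollama_style = {"embeddings": [[0.1, 0.2]]}
```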
Vasilije
1e531385a6
CI: Fix ollama tests (#1713)
<!-- .github/pull_request_template.md -->
Ollama tests were failing due to an incorrect embeddings API endpoint

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-02 12:12:44 +01:00
Vasilije
b1fdb71b1e
docs: Fix Readme Instructions (#1706)
- fix grammar, spelling and semantics
- update headings
- add new copy to mirror website copy
- update product names
2025-11-01 07:23:32 +01:00
David Myriel
e69ff7c855 remove agents 2025-10-31 18:36:36 +01:00
David Myriel
4e03406cb6 fix errors 2025-10-31 17:47:28 +01:00
David Myriel
0427586211 fix tagline and description 2025-10-31 17:39:57 +01:00
David Myriel
3472f62a84 bash 2025-10-31 17:00:23 +01:00
David Myriel
a068f3536b fixes 2025-10-31 16:58:21 +01:00
David Myriel
79f5201d6a fixes 2025-10-31 16:52:11 +01:00
David Myriel
7ee6cc8eb8 fix docs 2025-10-30 16:21:41 +01:00
Andrej Milicevic
4b0b9bfc53 chore: ruff format 2025-10-30 16:06:15 +01:00
Andrej Milicevic
28f28f06dd fix: added vector db name to test configs 2025-10-30 16:04:33 +01:00
Andrej Milicevic
ce925615fe fix: fix small naming error 2025-10-30 15:32:27 +01:00
Andrej Milicevic
908d329127 feat: add alembic migrations 2025-10-30 15:15:41 +01:00
Andrej Milicevic
70f3ced15a fix: PR comment fixes 2025-10-29 16:30:13 +01:00
Andrej Milicevic
c3f0cb95da fix: delete unnecessary comments, add to config 2025-10-28 18:06:04 +01:00
Andrej Milicevic
9c9395851c chore: ruff formatting 2025-10-28 18:01:32 +01:00
Andrej Milicevic
bbcd8baf3a feature: add multi-user for Falkor db 2025-10-28 17:56:32 +01:00
Andrej Milicevic
813ee94836 Initial commit, still wip 2025-10-27 08:12:37 +01:00
xdurawa
c91d1ff0ae Remove documentation changes as requested by reviewers
- Reverted README.md to original state
- Reverted cognee-starter-kit/README.md to original state
- Documentation will be updated separately by maintainers
2025-09-03 01:34:35 -04:00
xavierdurawa
06a3458982
Merge branch 'dev' into feature/bedrock-llm-provider 2025-09-02 23:08:38 -04:00
xdurawa
3821966d89 Merge feature/bedrock_llm_client into feature/bedrock-llm-provider
- Resolved conflicts in test_suites.yml to include both OpenRouter and Bedrock tests
- Updated add.py to include bedrock provider with latest dev branch model defaults
- Resolved uv.lock conflicts by using dev branch version
2025-09-01 20:55:20 -04:00
xdurawa
2a46208569 Adding AWS Bedrock support as a LLM provider
Signed-off-by: xdurawa <xavierdurawa@gmail.com>
2025-09-01 20:47:26 -04:00
171 changed files with 17436 additions and 13307 deletions


@ -3,8 +3,7 @@
language: en
early_access: false
enable_free_tier: true
reviews:
enabled: true
reviews:
profile: chill
instructions: >-
# Code Review Instructions
@ -28,7 +27,7 @@ reviews:
# Misc.
- Confirm that the code meets the project's requirements and objectives.
- Confirm that copyright years are up-to date whenever a file is changed.
request_changes_workflow: true
request_changes_workflow: false
high_level_summary: true
high_level_summary_placeholder: '@coderabbitai summary'
auto_title_placeholder: '@coderabbitai'
@ -103,31 +102,6 @@ reviews:
- dev
- main
tools:
languagetool:
enabled: true
language: en-US
configuration:
level: picky
mother_tongue: en
dictionary:
- 'reactive-firewall'
- 'CEP-9'
- 'CEP-8'
- 'CEP-7'
- 'CEP-5'
- 'Shellscript'
- 'bash'
disabled_rules:
- EN_QUOTES
- CONSECUTIVE_SPACES
enabled_rules:
- STYLE
- EN_CONTRACTION_SPELLING
- EN_WORD_COHERENCY
- IT_IS_OBVIOUS
- TWELFTH_OF_NEVER
- OXFORD_SPELLING
- PASSIVE_VOICE
shellcheck:
enabled: true
ruff:
@ -149,6 +123,5 @@ reviews:
enabled: true
yamllint:
enabled: true
configuration_file: ".yamllint.conf"
chat:
auto_reply: true


@ -97,6 +97,8 @@ DB_NAME=cognee_db
# Default (local file-based)
GRAPH_DATABASE_PROVIDER="kuzu"
# Handler for multi-user access control mode; it controls how the mapping/creation of separate DBs per Cognee dataset is handled
GRAPH_DATASET_DATABASE_HANDLER="kuzu"
# -- To switch to Remote Kuzu uncomment and fill these: -------------------------------------------------------------
#GRAPH_DATABASE_PROVIDER="kuzu"
@ -121,6 +123,8 @@ VECTOR_DB_PROVIDER="lancedb"
# Not needed if a cloud vector database is not used
VECTOR_DB_URL=
VECTOR_DB_KEY=
# Handler for multi-user access control mode; it controls how the mapping/creation of separate DBs per Cognee dataset is handled
VECTOR_DATASET_DATABASE_HANDLER="lancedb"
################################################################################
# 🧩 Ontology resolver settings


@ -10,6 +10,10 @@ inputs:
description: "Additional extra dependencies to install (space-separated)"
required: false
default: ""
rebuild-lockfile:
description: "Whether to rebuild the uv lockfile"
required: false
default: "false"
runs:
using: "composite"
@ -26,6 +30,7 @@ runs:
enable-cache: true
- name: Rebuild uv lockfile
if: ${{ inputs.rebuild-lockfile == 'true' }}
shell: bash
run: |
rm uv.lock

.github/release-drafter.yml (new file, 20 additions)

@ -0,0 +1,20 @@
name-template: 'v$NEXT_PATCH_VERSION'
tag-template: 'v$NEXT_PATCH_VERSION'
categories:
- title: 'Features'
labels: ['feature', 'enhancement']
- title: 'Bug Fixes'
labels: ['bug', 'fix']
- title: 'Maintenance'
labels: ['chore', 'refactor', 'ci']
change-template: '- $TITLE (#$NUMBER) @$AUTHOR'
template: |
## What's Changed
$CHANGES
## Contributors
$CONTRIBUTORS


@ -197,33 +197,3 @@ jobs:
- name: Run Simple Examples
run: uv run python ./examples/python/simple_example.py
graph-tests:
name: Run Basic Graph Tests
runs-on: ubuntu-22.04
env:
ENV: 'dev'
LLM_PROVIDER: openai
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
EMBEDDING_PROVIDER: openai
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
steps:
- name: Check out repository
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: ${{ inputs.python-version }}
- name: Run Graph Tests
run: uv run python ./examples/python/code_graph_example.py --repo_path ./cognee/tasks/graph


@ -61,6 +61,7 @@ jobs:
- name: Run Neo4j Example
env:
ENV: 'dev'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
@ -142,6 +143,7 @@ jobs:
- name: Run PGVector Example
env:
ENV: 'dev'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}


@ -47,6 +47,7 @@ jobs:
- name: Run Distributed Cognee (Modal)
env:
ENV: 'dev'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}


@ -147,6 +147,7 @@ jobs:
- name: Run Deduplication Example
env:
ENV: 'dev'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }} # Test needs OpenAI endpoint to handle multimedia
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
@ -211,6 +212,56 @@ jobs:
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_parallel_databases.py
test-dataset-database-handler:
name: Test dataset database handlers in Cognee
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
- name: Run dataset databases handler test
env:
ENV: 'dev'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_dataset_database_handler.py
test-dataset-database-deletion:
name: Test dataset database deletion in Cognee
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
- name: Run dataset databases deletion test
env:
ENV: 'dev'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_dataset_delete.py
test-permissions:
name: Test permissions with different situations in Cognee
runs-on: ubuntu-22.04
@ -412,6 +463,35 @@ jobs:
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_feedback_enrichment.py
test-edge-centered-payload:
name: Test Cognify - Edge Centered Payload
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
- name: Dependencies already installed
run: echo "Dependencies already installed in setup"
- name: Run Edge Centered Payload Test
env:
ENV: 'dev'
TRIPLET_EMBEDDING: True
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_edge_centered_payload.py
run_conversation_sessions_test_redis:
name: Conversation sessions test (Redis)
runs-on: ubuntu-latest
@ -527,3 +607,30 @@ jobs:
DB_USERNAME: cognee
DB_PASSWORD: cognee
run: uv run python ./cognee/tests/test_conversation_history.py
run-pipeline-cache-test:
name: Test Pipeline Caching
runs-on: ubuntu-22.04
steps:
- name: Check out
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
- name: Run Pipeline Cache Test
env:
ENV: 'dev'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_pipeline_cache.py


@ -72,6 +72,7 @@ jobs:
- name: Run Descriptive Graph Metrics Example
env:
ENV: 'dev'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}


@ -78,6 +78,7 @@ jobs:
- name: Run default Neo4j
env:
ENV: 'dev'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}

.github/workflows/pre_test.yml (new file, 22 additions)

@ -0,0 +1,22 @@
on:
workflow_call:
permissions:
contents: read
jobs:
check-uv-lock:
name: Validate uv lockfile and project metadata
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install uv
uses: astral-sh/setup-uv@v4
with:
enable-cache: true
- name: Validate uv lockfile and project metadata
run: uv lock --check || { echo "'uv lock --check' failed."; echo "Run 'uv lock' and push your changes."; exit 1; }

.github/workflows/release.yml (new file, 138 additions)

@ -0,0 +1,138 @@
name: release.yml
on:
workflow_dispatch:
inputs:
flavour:
required: true
default: dev
type: choice
options:
- dev
- main
description: Dev or Main release
jobs:
release-github:
name: Create GitHub Release from ${{ inputs.flavour }}
outputs:
tag: ${{ steps.create_tag.outputs.tag }}
version: ${{ steps.create_tag.outputs.version }}
permissions:
contents: write
runs-on: ubuntu-latest
steps:
- name: Check out ${{ inputs.flavour }}
uses: actions/checkout@v4
with:
ref: ${{ inputs.flavour }}
- name: Install uv
uses: astral-sh/setup-uv@v7
- name: Create and push git tag
id: create_tag
run: |
VERSION="$(uv version --short)"
TAG="v${VERSION}"
echo "Tag to create: ${TAG}"
git config user.name "github-actions[bot]"
git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
echo "tag=${TAG}" >> "$GITHUB_OUTPUT"
echo "version=${VERSION}" >> "$GITHUB_OUTPUT"
git tag "${TAG}"
git push origin "${TAG}"
- name: Create GitHub Release
uses: softprops/action-gh-release@v2
with:
tag_name: ${{ steps.create_tag.outputs.tag }}
generate_release_notes: true
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
release-pypi-package:
needs: release-github
name: Release PyPI Package from ${{ inputs.flavour }}
permissions:
contents: read
runs-on: ubuntu-latest
steps:
- name: Check out ${{ inputs.flavour }}
uses: actions/checkout@v4
with:
ref: ${{ inputs.flavour }}
- name: Install uv
uses: astral-sh/setup-uv@v7
- name: Install Python
run: uv python install
- name: Install dependencies
run: uv sync --locked --all-extras
- name: Build distributions
run: uv build
- name: Publish ${{ inputs.flavour }} release to PyPI
env:
UV_PUBLISH_TOKEN: ${{ secrets.PYPI_TOKEN }}
run: uv publish
release-docker-image:
needs: release-github
name: Release Docker Image from ${{ inputs.flavour }}
permissions:
contents: read
runs-on: ubuntu-latest
steps:
- name: Check out ${{ inputs.flavour }}
uses: actions/checkout@v4
with:
ref: ${{ inputs.flavour }}
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Log in to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Build and push Dev Docker Image
if: ${{ inputs.flavour == 'dev' }}
uses: docker/build-push-action@v5
with:
context: .
platforms: linux/amd64,linux/arm64
push: true
tags: cognee/cognee:${{ needs.release-github.outputs.version }}
labels: |
version=${{ needs.release-github.outputs.version }}
flavour=${{ inputs.flavour }}
cache-from: type=registry,ref=cognee/cognee:buildcache
cache-to: type=registry,ref=cognee/cognee:buildcache,mode=max
- name: Build and push Main Docker Image
if: ${{ inputs.flavour == 'main' }}
uses: docker/build-push-action@v5
with:
context: .
platforms: linux/amd64,linux/arm64
push: true
tags: |
cognee/cognee:${{ needs.release-github.outputs.version }}
cognee/cognee:latest
labels: |
version=${{ needs.release-github.outputs.version }}
flavour=${{ inputs.flavour }}
cache-from: type=registry,ref=cognee/cognee:buildcache
cache-to: type=registry,ref=cognee/cognee:buildcache,mode=max


@@ -72,6 +72,7 @@ jobs:
- name: Run Temporal Graph with Neo4j (lancedb + sqlite)
env:
ENV: 'dev'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.OPENAI_MODEL }}
LLM_ENDPOINT: ${{ secrets.OPENAI_ENDPOINT }}
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
@@ -123,6 +124,7 @@ jobs:
- name: Run Temporal Graph with Kuzu (postgres + pgvector)
env:
ENV: dev
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.OPENAI_MODEL }}
LLM_ENDPOINT: ${{ secrets.OPENAI_ENDPOINT }}
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
@@ -189,6 +191,7 @@ jobs:
- name: Run Temporal Graph with Neo4j (postgres + pgvector)
env:
ENV: dev
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.OPENAI_MODEL }}
LLM_ENDPOINT: ${{ secrets.OPENAI_ENDPOINT }}
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}


@@ -84,3 +84,93 @@ jobs:
EMBEDDING_DIMENSIONS: "3072"
EMBEDDING_MAX_TOKENS: "8191"
run: uv run python ./examples/python/simple_example.py
test-bedrock-api-key:
name: Run Bedrock API Key Test
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
extra-dependencies: "aws"
- name: Run Bedrock API Key Simple Example
env:
LLM_PROVIDER: "bedrock"
LLM_API_KEY: ${{ secrets.BEDROCK_API_KEY }}
LLM_MODEL: "eu.anthropic.claude-sonnet-4-5-20250929-v1:0"
LLM_MAX_TOKENS: "16384"
AWS_REGION_NAME: "eu-west-1"
EMBEDDING_PROVIDER: "bedrock"
EMBEDDING_API_KEY: ${{ secrets.BEDROCK_API_KEY }}
EMBEDDING_MODEL: "amazon.titan-embed-text-v2:0"
EMBEDDING_DIMENSIONS: "1024"
EMBEDDING_MAX_TOKENS: "8191"
run: uv run python ./examples/python/simple_example.py
test-bedrock-aws-credentials:
name: Run Bedrock AWS Credentials Test
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
extra-dependencies: "aws"
- name: Run Bedrock AWS Credentials Simple Example
env:
LLM_PROVIDER: "bedrock"
LLM_MODEL: "eu.anthropic.claude-sonnet-4-5-20250929-v1:0"
LLM_MAX_TOKENS: "16384"
AWS_REGION_NAME: "eu-west-1"
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
EMBEDDING_PROVIDER: "bedrock"
EMBEDDING_API_KEY: ${{ secrets.BEDROCK_API_KEY }}
EMBEDDING_MODEL: "amazon.titan-embed-text-v2:0"
EMBEDDING_DIMENSIONS: "1024"
EMBEDDING_MAX_TOKENS: "8191"
run: uv run python ./examples/python/simple_example.py
test-bedrock-aws-profile:
name: Run Bedrock AWS Profile Test
runs-on: ubuntu-22.04
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Cognee Setup
uses: ./.github/actions/cognee_setup
with:
python-version: '3.11.x'
extra-dependencies: "aws"
- name: Configure AWS Profile
run: |
mkdir -p ~/.aws
cat > ~/.aws/credentials << EOF
[bedrock-test]
aws_access_key_id = ${{ secrets.AWS_ACCESS_KEY_ID }}
aws_secret_access_key = ${{ secrets.AWS_SECRET_ACCESS_KEY }}
EOF
- name: Run Bedrock AWS Profile Simple Example
env:
LLM_PROVIDER: "bedrock"
LLM_MODEL: "eu.anthropic.claude-sonnet-4-5-20250929-v1:0"
LLM_MAX_TOKENS: "16384"
AWS_PROFILE_NAME: "bedrock-test"
AWS_REGION_NAME: "eu-west-1"
EMBEDDING_PROVIDER: "bedrock"
EMBEDDING_MODEL: "amazon.titan-embed-text-v2:0"
EMBEDDING_DIMENSIONS: "1024"
EMBEDDING_MAX_TOKENS: "8191"
run: uv run python ./examples/python/simple_example.py
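The three jobs above exercise the same example against Bedrock using three different authentication mechanisms; the only difference is which environment variables each job sets on top of a shared base. A minimal sketch of that split (values copied from the workflow; the angle-bracket strings are placeholders for the repository secrets, not real credentials):

```python
# Shared configuration every Bedrock job in the workflow sets.
common = {
    "LLM_PROVIDER": "bedrock",
    "LLM_MODEL": "eu.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "AWS_REGION_NAME": "eu-west-1",
    "EMBEDDING_PROVIDER": "bedrock",
    "EMBEDDING_MODEL": "amazon.titan-embed-text-v2:0",
}

# What each auth mode adds on top of the shared base.
auth_modes = {
    # 1. Long-lived Bedrock API key
    "api_key": {"LLM_API_KEY": "<BEDROCK_API_KEY>"},
    # 2. Standard AWS access key credentials
    "aws_credentials": {
        "AWS_ACCESS_KEY_ID": "<AWS_ACCESS_KEY_ID>",
        "AWS_SECRET_ACCESS_KEY": "<AWS_SECRET_ACCESS_KEY>",
    },
    # 3. A named profile written to ~/.aws/credentials
    "aws_profile": {"AWS_PROFILE_NAME": "bedrock-test"},
}

for mode, extra in auth_modes.items():
    env = {**common, **extra}
    print(mode, "->", sorted(extra))
```

In the workflow the real values come from GitHub secrets; each job then runs the same `./examples/python/simple_example.py` under its resulting environment.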


@@ -7,8 +7,8 @@ jobs:
run_ollama_test:
runs-on: buildjet-4vcpu-ubuntu-2204
# needs 32 GB RAM for phi4 in a container
runs-on: buildjet-8vcpu-ubuntu-2204
steps:
- name: Checkout repository
@@ -47,15 +47,15 @@ jobs:
- name: Pull required Ollama models
run: |
curl -X POST http://localhost:11434/api/pull -d '{"name": "phi3:mini"}'
curl -X POST http://localhost:11434/api/pull -d '{"name": "nomic-embed-text"}'
curl -X POST http://localhost:11434/api/pull -d '{"name": "phi4"}'
curl -X POST http://localhost:11434/api/pull -d '{"name": "avr/sfr-embedding-mistral:latest"}'
- name: Call ollama API
run: |
curl -X POST http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "phi3:mini",
"model": "phi4",
"stream": false,
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
@@ -65,7 +65,7 @@ jobs:
curl -X POST http://127.0.0.1:11434/api/embed \
-H "Content-Type: application/json" \
-d '{
"model": "nomic-embed-text",
"model": "avr/sfr-embedding-mistral:latest",
"input": "This is a test sentence to generate an embedding."
}'
@@ -82,10 +82,10 @@ jobs:
LLM_PROVIDER: "ollama"
LLM_API_KEY: "ollama"
LLM_ENDPOINT: "http://localhost:11434/v1/"
LLM_MODEL: "phi3:mini"
LLM_MODEL: "phi4"
EMBEDDING_PROVIDER: "ollama"
EMBEDDING_MODEL: "nomic-embed-text"
EMBEDDING_MODEL: "avr/sfr-embedding-mistral:latest"
EMBEDDING_ENDPOINT: "http://localhost:11434/api/embed"
EMBEDDING_DIMENSIONS: "768"
HUGGINGFACE_TOKENIZER: "nomic-ai/nomic-embed-text-v1"
EMBEDDING_DIMENSIONS: "4096"
HUGGINGFACE_TOKENIZER: "Salesforce/SFR-Embedding-Mistral"
run: uv run python ./examples/python/simple_example.py


@@ -18,15 +18,21 @@ env:
RUNTIME__LOG_LEVEL: ERROR
ENV: 'dev'
jobs:
pre-test:
name: basic checks
uses: ./.github/workflows/pre_test.yml
basic-tests:
name: Basic Tests
uses: ./.github/workflows/basic_tests.yml
needs: [ pre-test ]
secrets: inherit
e2e-tests:
name: End-to-End Tests
uses: ./.github/workflows/e2e_tests.yml
needs: [ pre-test ]
secrets: inherit
distributed-tests:


@@ -92,6 +92,7 @@ jobs:
- name: Run PGVector Tests
env:
ENV: 'dev'
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
LLM_MODEL: ${{ secrets.LLM_MODEL }}
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
@@ -127,4 +128,4 @@ jobs:
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
run: uv run python ./cognee/tests/test_lancedb.py


@@ -94,6 +94,7 @@ jobs:
- name: Run Weighted Edges Tests
env:
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
GRAPH_DATABASE_PROVIDER: ${{ matrix.graph_db_provider }}
GRAPH_DATABASE_URL: ${{ matrix.graph_db_provider == 'neo4j' && steps.neo4j.outputs.neo4j-url || '' }}
GRAPH_DATABASE_USERNAME: ${{ matrix.graph_db_provider == 'neo4j' && steps.neo4j.outputs.neo4j-username || '' }}
@@ -165,5 +166,3 @@ jobs:
uses: astral-sh/ruff-action@v2
with:
args: "format --check cognee/modules/graph/utils/get_graph_from_model.py cognee/tests/unit/interfaces/graph/test_weighted_edges.py examples/python/weighted_edges_example.py"

.mergify.yml Normal file

@@ -0,0 +1,9 @@
pull_request_rules:
- name: Backport to main when backport_main label is set
conditions:
- label=backport_main
- base=dev
actions:
backport:
branches:
- main


@@ -71,7 +71,7 @@ git clone https://github.com/<your-github-username>/cognee.git
cd cognee
```
If you are working on Vector and Graph Adapters:
1. Fork the [**cognee**](https://github.com/topoteretes/cognee-community) repository
1. Fork the [**cognee-community**](https://github.com/topoteretes/cognee-community) repository
2. Clone your fork:
```shell
git clone https://github.com/<your-github-username>/cognee-community.git
@@ -97,6 +97,21 @@ git checkout -b feature/your-feature-name
python cognee/cognee/tests/test_library.py
```
### Running Simple Example
Rename `.env.example` to `.env` and set your OpenAI API key as `LLM_API_KEY`.
Make sure to run `uv sync` in the root of the cloned folder, or set up a virtual environment, before running cognee.
```shell
python cognee/cognee/examples/python/simple_example.py
```
or
```shell
uv run python cognee/cognee/examples/python/simple_example.py
```
## 4. 📤 Submitting Changes
1. Install ruff on your system

README.md

@@ -5,27 +5,27 @@
<br />
cognee - Memory for AI Agents in 6 lines of code
Cognee - Accurate and Persistent AI Memory
<p align="center">
<a href="https://www.youtube.com/watch?v=1bezuvLwJmw&t=2s">Demo</a>
.
<a href="https://cognee.ai">Learn more</a>
<a href="https://docs.cognee.ai/">Docs</a>
.
<a href="https://cognee.ai">Learn More</a>
·
<a href="https://discord.gg/NQPKmU5CCg">Join Discord</a>
·
<a href="https://www.reddit.com/r/AIMemory/">Join r/AIMemory</a>
.
<a href="https://docs.cognee.ai/">Docs</a>
.
<a href="https://github.com/topoteretes/cognee-community">cognee community repo</a>
<a href="https://github.com/topoteretes/cognee-community">Community Plugins & Add-ons</a>
</p>
[![GitHub forks](https://img.shields.io/github/forks/topoteretes/cognee.svg?style=social&label=Fork&maxAge=2592000)](https://GitHub.com/topoteretes/cognee/network/)
[![GitHub stars](https://img.shields.io/github/stars/topoteretes/cognee.svg?style=social&label=Star&maxAge=2592000)](https://GitHub.com/topoteretes/cognee/stargazers/)
[![GitHub commits](https://badgen.net/github/commits/topoteretes/cognee)](https://GitHub.com/topoteretes/cognee/commit/)
[![Github tag](https://badgen.net/github/tag/topoteretes/cognee)](https://github.com/topoteretes/cognee/tags/)
[![GitHub tag](https://badgen.net/github/tag/topoteretes/cognee)](https://github.com/topoteretes/cognee/tags/)
[![Downloads](https://static.pepy.tech/badge/cognee)](https://pepy.tech/project/cognee)
[![License](https://img.shields.io/github/license/topoteretes/cognee?colorA=00C586&colorB=000000)](https://github.com/topoteretes/cognee/blob/main/LICENSE)
[![Contributors](https://img.shields.io/github/contributors/topoteretes/cognee?colorA=00C586&colorB=000000)](https://github.com/topoteretes/cognee/graphs/contributors)
@@ -41,11 +41,7 @@
</a>
</p>
Build dynamic memory for Agents and replace RAG using scalable, modular ECL (Extract, Cognify, Load) pipelines.
Use your data to build personalized and dynamic memory for AI Agents. Cognee lets you replace RAG with scalable and modular ECL (Extract, Cognify, Load) pipelines.
<p align="center">
🌐 Available Languages
@@ -53,7 +49,7 @@ Build dynamic memory for Agents and replace RAG using scalable, modular ECL (Ext
<!-- Keep these links. Translations will automatically update with the README. -->
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=de">Deutsch</a> |
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=es">Español</a> |
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=fr">français</a> |
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=fr">Français</a> |
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=ja">日本語</a> |
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=ko">한국어</a> |
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=pt">Português</a> |
@@ -67,73 +63,62 @@ Build dynamic memory for Agents and replace RAG using scalable, modular ECL (Ext
</div>
</div>
## About Cognee
Cognee is an open-source tool and platform that transforms your raw data into persistent and dynamic AI memory for Agents. It combines vector search with graph databases to make your documents both searchable by meaning and connected by relationships.
Cognee offers default memory creation and search, which we describe below. But with Cognee you can also build your own!
## Get Started
### Cognee Open Source:
Get started quickly with a Google Colab <a href="https://colab.research.google.com/drive/12Vi9zID-M3fpKpKiaqDBvkk98ElkRPWy?usp=sharing">notebook</a> , <a href="https://deepnote.com/workspace/cognee-382213d0-0444-4c89-8265-13770e333c02/project/cognee-demo-78ffacb9-5832-4611-bb1a-560386068b30/notebook/Notebook-1-75b24cda566d4c24ab348f7150792601?utm_source=share-modal&utm_medium=product-shared-content&utm_campaign=notebook&utm_content=78ffacb9-5832-4611-bb1a-560386068b30">Deepnote notebook</a> or <a href="https://github.com/topoteretes/cognee/tree/main/cognee-starter-kit">starter repo</a>
- Interconnects any type of data — including past conversations, files, images, and audio transcriptions
- Replaces traditional RAG systems with a unified memory layer built on graphs and vectors
- Reduces developer effort and infrastructure cost while improving quality and precision
- Provides Pythonic data pipelines for ingestion from 30+ data sources
- Offers high customizability through user-defined tasks, modular pipelines, and built-in search endpoints
## About cognee
## Basic Usage & Feature Guide
cognee works locally and stores your data on your device.
Our hosted solution is just our deployment of OSS cognee on Modal, with the goal of making development and productionization easier.
To learn more, [check out this short, end-to-end Colab walkthrough](https://colab.research.google.com/drive/12Vi9zID-M3fpKpKiaqDBvkk98ElkRPWy?usp=sharing) of Cognee's core features.
Self-hosted package:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/12Vi9zID-M3fpKpKiaqDBvkk98ElkRPWy?usp=sharing)
- Interconnects any kind of documents: past conversations, files, images, and audio transcriptions
- Replaces RAG systems with a memory layer based on graphs and vectors
- Reduces developer effort and cost, while increasing quality and precision
- Provides Pythonic data pipelines that manage data ingestion from 30+ data sources
- Is highly customizable with custom tasks, pipelines, and a set of built-in search endpoints
## Quickstart
Hosted platform:
- Includes a managed UI and a [hosted solution](https://www.cognee.ai)
Let's try Cognee in just a few lines of code. For detailed setup and configuration, see the [Cognee Docs](https://docs.cognee.ai/getting-started/installation#environment-configuration).
### Prerequisites
- Python 3.10 to 3.13
## Self-Hosted (Open Source)
### Step 1: Install Cognee
### 📦 Installation
You can install Cognee using either **pip**, **poetry**, **uv** or any other python package manager..
Cognee supports Python 3.10 to 3.12
#### With uv
You can install Cognee with **pip**, **poetry**, **uv**, or your preferred Python package manager.
```bash
uv pip install cognee
```
Detailed instructions can be found in our [docs](https://docs.cognee.ai/getting-started/installation#environment-configuration)
### 💻 Basic Usage
#### Setup
```
### Step 2: Configure the LLM
```python
import os
os.environ["LLM_API_KEY"] = "YOUR OPENAI_API_KEY"
```
Alternatively, create a `.env` file using our [template](https://github.com/topoteretes/cognee/blob/main/.env.template).
You can also set the variables by creating .env file, using our <a href="https://github.com/topoteretes/cognee/blob/main/.env.template">template.</a>
To use different LLM providers, for more info check out our <a href="https://docs.cognee.ai/setup-configuration/llm-providers">documentation</a>
To integrate other LLM providers, see our [LLM Provider Documentation](https://docs.cognee.ai/setup-configuration/llm-providers).
### Step 3: Run the Pipeline
#### Simple example
Cognee will take your documents, generate a knowledge graph from them, and then query the graph based on the combined relationships.
##### Python
This script will run the default pipeline:
Now, run a minimal pipeline:
```python
import cognee
import asyncio
from pprint import pprint
async def main():
@@ -147,80 +132,72 @@ async def main():
await cognee.memify()
# Query the knowledge graph
results = await cognee.search("What does cognee do?")
results = await cognee.search("What does Cognee do?")
# Display the results
for result in results:
print(result)
pprint(result)
if __name__ == '__main__':
asyncio.run(main())
```
Example output:
```
As you can see, the output is generated from the document we previously stored in Cognee:
```bash
Cognee turns documents into AI memory.
```
##### Via CLI
Let's get the basics covered
### Use the Cognee CLI
```
As an alternative, you can get started with these essential commands:
```bash
cognee-cli add "Cognee turns documents into AI memory."
cognee-cli cognify
cognee-cli search "What does cognee do?"
cognee-cli search "What does Cognee do?"
cognee-cli delete --all
```
or run
```
To open the local UI, run:
```bash
cognee-cli -ui
```
## Demos & Examples
</div>
See Cognee in action:
### Persistent Agent Memory
[Cognee Memory for LangGraph Agents](https://github.com/user-attachments/assets/e113b628-7212-4a2b-b288-0be39a93a1c3)
### Simple GraphRAG
[Watch Demo](https://github.com/user-attachments/assets/f2186b2e-305a-42b0-9c2d-9f4473f15df8)
### Cognee with Ollama
[Watch Demo](https://github.com/user-attachments/assets/39672858-f774-4136-b957-1e2de67b8981)
### Hosted Platform
## Community & Support
Get up and running in minutes with automatic updates, analytics, and enterprise security.
### Contributing
We welcome contributions from the community! Your input helps make Cognee better for everyone. See [`CONTRIBUTING.md`](CONTRIBUTING.md) to get started.
1. Sign up on [cogwit](https://www.cognee.ai)
2. Add your API key to local UI and sync your data to Cogwit
### Code of Conduct
We're committed to fostering an inclusive and respectful community. Read our [Code of Conduct](https://github.com/topoteretes/cognee/blob/main/CODE_OF_CONDUCT.md) for guidelines.
## Research & Citation
## Demos
1. Cogwit Beta demo:
[Cogwit Beta](https://github.com/user-attachments/assets/fa520cd2-2913-4246-a444-902ea5242cb0)
2. Simple GraphRAG demo
[Simple GraphRAG demo](https://github.com/user-attachments/assets/d80b0776-4eb9-4b8e-aa22-3691e2d44b8f)
3. cognee with Ollama
[cognee with local models](https://github.com/user-attachments/assets/8621d3e8-ecb8-4860-afb2-5594f2ee17db)
## Contributing
Your contributions are at the core of making this a true open source project. Any contributions you make are **greatly appreciated**. See [`CONTRIBUTING.md`](CONTRIBUTING.md) for more information.
## Code of Conduct
We are committed to making open source an enjoyable and respectful experience for our community. See <a href="https://github.com/topoteretes/cognee/blob/main/CODE_OF_CONDUCT.md"><code>CODE_OF_CONDUCT</code></a> for more information.
## Citation
We now have a paper you can cite:
We recently published a research paper on optimizing knowledge graphs for LLM reasoning:
```bibtex
@misc{markovic2025optimizinginterfaceknowledgegraphs,


@@ -0,0 +1,333 @@
"""Expand dataset database with json connection field
Revision ID: 46a6ce2bd2b2
Revises: 76625596c5c3
Create Date: 2025-11-25 17:56:28.938931
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = "46a6ce2bd2b2"
down_revision: Union[str, None] = "76625596c5c3"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
graph_constraint_name = "dataset_database_graph_database_name_key"
vector_constraint_name = "dataset_database_vector_database_name_key"
TABLE_NAME = "dataset_database"
def _get_column(inspector, table, name, schema=None):
for col in inspector.get_columns(table, schema=schema):
if col["name"] == name:
return col
return None
def _recreate_table_without_unique_constraint_sqlite(op, insp):
"""
SQLite cannot drop unique constraints on individual columns. We must:
1. Create a new table without the unique constraints.
2. Copy data from the old table.
3. Drop the old table.
4. Rename the new table.
"""
conn = op.get_bind()
# Create new table definition (without unique constraints)
op.create_table(
f"{TABLE_NAME}_new",
sa.Column("owner_id", sa.UUID()),
sa.Column("dataset_id", sa.UUID(), primary_key=True, nullable=False),
sa.Column("vector_database_name", sa.String(), nullable=False),
sa.Column("graph_database_name", sa.String(), nullable=False),
sa.Column("vector_database_provider", sa.String(), nullable=False),
sa.Column("graph_database_provider", sa.String(), nullable=False),
sa.Column(
"vector_dataset_database_handler",
sa.String(),
unique=False,
nullable=False,
server_default="lancedb",
),
sa.Column(
"graph_dataset_database_handler",
sa.String(),
unique=False,
nullable=False,
server_default="kuzu",
),
sa.Column("vector_database_url", sa.String()),
sa.Column("graph_database_url", sa.String()),
sa.Column("vector_database_key", sa.String()),
sa.Column("graph_database_key", sa.String()),
sa.Column(
"graph_database_connection_info",
sa.JSON(),
nullable=False,
server_default=sa.text("'{}'"),
),
sa.Column(
"vector_database_connection_info",
sa.JSON(),
nullable=False,
server_default=sa.text("'{}'"),
),
sa.Column("created_at", sa.DateTime()),
sa.Column("updated_at", sa.DateTime()),
sa.ForeignKeyConstraint(["dataset_id"], ["datasets.id"], ondelete="CASCADE"),
sa.ForeignKeyConstraint(["owner_id"], ["principals.id"], ondelete="CASCADE"),
)
# Copy data into new table
conn.execute(
sa.text(f"""
INSERT INTO {TABLE_NAME}_new
SELECT
owner_id,
dataset_id,
vector_database_name,
graph_database_name,
vector_database_provider,
graph_database_provider,
vector_dataset_database_handler,
graph_dataset_database_handler,
vector_database_url,
graph_database_url,
vector_database_key,
graph_database_key,
COALESCE(graph_database_connection_info, '{{}}'),
COALESCE(vector_database_connection_info, '{{}}'),
created_at,
updated_at
FROM {TABLE_NAME}
""")
)
# Drop old table
op.drop_table(TABLE_NAME)
# Rename new table
op.rename_table(f"{TABLE_NAME}_new", TABLE_NAME)
def _recreate_table_with_unique_constraint_sqlite(op, insp):
"""
SQLite cannot add unique constraints to existing columns in place. We must:
1. Create a new table with the unique constraints restored.
2. Copy data from the old table.
3. Drop the old table.
4. Rename the new table.
"""
conn = op.get_bind()
# Create new table definition (with unique constraints restored)
op.create_table(
f"{TABLE_NAME}_new",
sa.Column("owner_id", sa.UUID()),
sa.Column("dataset_id", sa.UUID(), primary_key=True, nullable=False),
sa.Column("vector_database_name", sa.String(), nullable=False, unique=True),
sa.Column("graph_database_name", sa.String(), nullable=False, unique=True),
sa.Column("vector_database_provider", sa.String(), nullable=False),
sa.Column("graph_database_provider", sa.String(), nullable=False),
sa.Column(
"vector_dataset_database_handler",
sa.String(),
unique=False,
nullable=False,
server_default="lancedb",
),
sa.Column(
"graph_dataset_database_handler",
sa.String(),
unique=False,
nullable=False,
server_default="kuzu",
),
sa.Column("vector_database_url", sa.String()),
sa.Column("graph_database_url", sa.String()),
sa.Column("vector_database_key", sa.String()),
sa.Column("graph_database_key", sa.String()),
sa.Column(
"graph_database_connection_info",
sa.JSON(),
nullable=False,
server_default=sa.text("'{}'"),
),
sa.Column(
"vector_database_connection_info",
sa.JSON(),
nullable=False,
server_default=sa.text("'{}'"),
),
sa.Column("created_at", sa.DateTime()),
sa.Column("updated_at", sa.DateTime()),
sa.ForeignKeyConstraint(["dataset_id"], ["datasets.id"], ondelete="CASCADE"),
sa.ForeignKeyConstraint(["owner_id"], ["principals.id"], ondelete="CASCADE"),
)
# Copy data into new table
conn.execute(
sa.text(f"""
INSERT INTO {TABLE_NAME}_new
SELECT
owner_id,
dataset_id,
vector_database_name,
graph_database_name,
vector_database_provider,
graph_database_provider,
vector_dataset_database_handler,
graph_dataset_database_handler,
vector_database_url,
graph_database_url,
vector_database_key,
graph_database_key,
COALESCE(graph_database_connection_info, '{{}}'),
COALESCE(vector_database_connection_info, '{{}}'),
created_at,
updated_at
FROM {TABLE_NAME}
""")
)
# Drop old table
op.drop_table(TABLE_NAME)
# Rename new table
op.rename_table(f"{TABLE_NAME}_new", TABLE_NAME)
def upgrade() -> None:
conn = op.get_bind()
insp = sa.inspect(conn)
unique_constraints = insp.get_unique_constraints(TABLE_NAME)
vector_database_connection_info_column = _get_column(
insp, "dataset_database", "vector_database_connection_info"
)
if not vector_database_connection_info_column:
op.add_column(
"dataset_database",
sa.Column(
"vector_database_connection_info",
sa.JSON(),
unique=False,
nullable=False,
server_default=sa.text("'{}'"),
),
)
vector_dataset_database_handler = _get_column(
insp, "dataset_database", "vector_dataset_database_handler"
)
if not vector_dataset_database_handler:
# Add LanceDB as the default vector dataset database handler
op.add_column(
"dataset_database",
sa.Column(
"vector_dataset_database_handler",
sa.String(),
unique=False,
nullable=False,
server_default="lancedb",
),
)
graph_database_connection_info_column = _get_column(
insp, "dataset_database", "graph_database_connection_info"
)
if not graph_database_connection_info_column:
op.add_column(
"dataset_database",
sa.Column(
"graph_database_connection_info",
sa.JSON(),
unique=False,
nullable=False,
server_default=sa.text("'{}'"),
),
)
graph_dataset_database_handler = _get_column(
insp, "dataset_database", "graph_dataset_database_handler"
)
if not graph_dataset_database_handler:
# Add Kuzu as the default graph dataset database handler
op.add_column(
"dataset_database",
sa.Column(
"graph_dataset_database_handler",
sa.String(),
unique=False,
nullable=False,
server_default="kuzu",
),
)
with op.batch_alter_table("dataset_database", schema=None) as batch_op:
# Drop the unique constraint to make unique=False
graph_constraint_to_drop = None
for uc in unique_constraints:
# Check if the constraint covers ONLY the target column
if uc["name"] == graph_constraint_name:
graph_constraint_to_drop = uc["name"]
break
vector_constraint_to_drop = None
for uc in unique_constraints:
# Check if the constraint covers ONLY the target column
if uc["name"] == vector_constraint_name:
vector_constraint_to_drop = uc["name"]
break
if (
vector_constraint_to_drop
and graph_constraint_to_drop
and op.get_context().dialect.name == "postgresql"
):
# PostgreSQL
batch_op.drop_constraint(graph_constraint_name, type_="unique")
batch_op.drop_constraint(vector_constraint_name, type_="unique")
if op.get_context().dialect.name == "sqlite":
conn = op.get_bind()
# Fun fact: SQLite has hidden auto indexes for unique constraints that can't be dropped or accessed directly
# So we need to check for them and drop them by recreating the table (altering column also won't work)
result = conn.execute(sa.text("PRAGMA index_list('dataset_database')"))
rows = result.fetchall()
unique_auto_indexes = [row for row in rows if row[3] == "u"]
for row in unique_auto_indexes:
result = conn.execute(sa.text(f"PRAGMA index_info('{row[1]}')"))
index_info = result.fetchall()
if index_info[0][2] == "vector_database_name":
# In case a unique index exists on vector_database_name, drop it and the graph_database_name one
_recreate_table_without_unique_constraint_sqlite(op, insp)
def downgrade() -> None:
conn = op.get_bind()
insp = sa.inspect(conn)
if op.get_context().dialect.name == "sqlite":
_recreate_table_with_unique_constraint_sqlite(op, insp)
elif op.get_context().dialect.name == "postgresql":
with op.batch_alter_table("dataset_database", schema=None) as batch_op:
# Re-add the unique constraint to return to unique=True
batch_op.create_unique_constraint(graph_constraint_name, ["graph_database_name"])
with op.batch_alter_table("dataset_database", schema=None) as batch_op:
# Re-add the unique constraint to return to unique=True
batch_op.create_unique_constraint(vector_constraint_name, ["vector_database_name"])
op.drop_column("dataset_database", "vector_database_connection_info")
op.drop_column("dataset_database", "graph_database_connection_info")
op.drop_column("dataset_database", "vector_dataset_database_handler")
op.drop_column("dataset_database", "graph_dataset_database_handler")
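The two `_recreate_table_*_sqlite` helpers above follow the standard SQLite workaround for constraints that cannot be altered in place: create, copy, drop, rename. A minimal self-contained sketch of the same four-step pattern, using an illustrative toy table rather than the real `dataset_database` schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.execute("INSERT INTO t (name) VALUES ('a')")

# 1. Create a new table without the unique constraint
conn.execute("CREATE TABLE t_new (id INTEGER PRIMARY KEY, name TEXT)")
# 2. Copy data from the old table
conn.execute("INSERT INTO t_new SELECT id, name FROM t")
# 3. Drop the old table
conn.execute("DROP TABLE t")
# 4. Rename the new table into place
conn.execute("ALTER TABLE t_new RENAME TO t")

# Duplicate names are now allowed, which the old UNIQUE constraint forbade
conn.execute("INSERT INTO t (name) VALUES ('a')")
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 2
```

The migration does the same thing, just with the full column list, the JSON defaults, and the foreign keys reproduced on the replacement table.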

File diff suppressed because it is too large.


@@ -9,13 +9,13 @@
"lint": "next lint"
},
"dependencies": {
"@auth0/nextjs-auth0": "^4.6.0",
"@auth0/nextjs-auth0": "^4.13.1",
"classnames": "^2.5.1",
"culori": "^4.0.1",
"d3-force-3d": "^3.0.6",
"next": "15.3.3",
"react": "^19.0.0",
"react-dom": "^19.0.0",
"next": "16.1.1",
"react": "^19.2.0",
"react-dom": "^19.2.0",
"react-force-graph-2d": "^1.27.1",
"uuid": "^9.0.1"
},
@@ -24,11 +24,11 @@
"@tailwindcss/postcss": "^4.1.7",
"@types/culori": "^4.0.0",
"@types/node": "^20",
"@types/react": "^18",
"@types/react-dom": "^18",
"@types/react": "^19",
"@types/react-dom": "^19",
"@types/uuid": "^9.0.8",
"eslint": "^9",
"eslint-config-next": "^15.3.3",
"eslint-config-next": "^16.0.4",
"eslint-config-prettier": "^10.1.5",
"tailwindcss": "^4.1.7",
"typescript": "^5"


@@ -1,119 +0,0 @@
import { useState } from "react";
import { fetch } from "@/utils";
import { v4 as uuid4 } from "uuid";
import { LoadingIndicator } from "@/ui/App";
import { CTAButton, Input } from "@/ui/elements";
interface CrewAIFormPayload extends HTMLFormElement {
username1: HTMLInputElement;
username2: HTMLInputElement;
}
interface CrewAITriggerProps {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
onData: (data: any) => void;
// eslint-disable-next-line @typescript-eslint/no-explicit-any
onActivity: (activities: any) => void;
}
export default function CrewAITrigger({ onData, onActivity }: CrewAITriggerProps) {
const [isCrewAIRunning, setIsCrewAIRunning] = useState(false);
const handleRunCrewAI = (event: React.FormEvent<CrewAIFormPayload>) => {
event.preventDefault();
const formElements = event.currentTarget;
const crewAIConfig = {
username1: formElements.username1.value,
username2: formElements.username2.value,
};
const backendApiUrl = process.env.NEXT_PUBLIC_BACKEND_API_URL;
const wsUrl = backendApiUrl.replace(/^http(s)?/, "ws");
const websocket = new WebSocket(`${wsUrl}/v1/crewai/subscribe`);
onActivity([{ id: uuid4(), timestamp: Date.now(), activity: "Dispatching hiring crew agents" }]);
websocket.onmessage = (event) => {
const data = JSON.parse(event.data);
if (data.status === "PipelineRunActivity") {
onActivity([data.payload]);
return;
}
onData({
nodes: data.payload.nodes,
links: data.payload.edges,
});
const nodes_type_map: { [key: string]: number } = {};
for (let i = 0; i < data.payload.nodes.length; i++) {
const node = data.payload.nodes[i];
if (!nodes_type_map[node.type]) {
nodes_type_map[node.type] = 0;
}
nodes_type_map[node.type] += 1;
}
const activityMessage = Object.entries(nodes_type_map).reduce((message, [type, count]) => {
return `${message}\n | ${type}: ${count}`;
}, "Graph updated:");
onActivity([{
id: uuid4(),
timestamp: Date.now(),
activity: activityMessage,
}]);
if (data.status === "PipelineRunCompleted") {
websocket.close();
}
};
onData(null);
setIsCrewAIRunning(true);
return fetch("/v1/crewai/run", {
method: "POST",
body: JSON.stringify(crewAIConfig),
headers: {
"Content-Type": "application/json",
},
})
.then(response => response.json())
.then(() => {
onActivity([{ id: uuid4(), timestamp: Date.now(), activity: "Hiring crew agents made a decision" }]);
})
.catch(() => {
onActivity([{ id: uuid4(), timestamp: Date.now(), activity: "Hiring crew agents had problems while executing" }]);
})
.finally(() => {
websocket.close();
setIsCrewAIRunning(false);
});
};
return (
<form className="w-full flex flex-col gap-2" onSubmit={handleRunCrewAI}>
<h1 className="text-2xl text-white">Cognee Dev Mexican Standoff</h1>
<span className="text-white">Agents compare GitHub profiles, and make a decision who is a better developer</span>
<div className="flex flex-row gap-2">
<div className="flex flex-col w-full flex-1/2">
<label className="block mb-1 text-white" htmlFor="username1">GitHub username</label>
<Input name="username1" type="text" placeholder="GitHub username" required defaultValue="hajdul88" />
</div>
<div className="flex flex-col w-full flex-1/2">
<label className="block mb-1 text-white" htmlFor="username2">GitHub username</label>
<Input name="username2" type="text" placeholder="GitHub username" required defaultValue="lxobr" />
</div>
</div>
<CTAButton type="submit" disabled={isCrewAIRunning} className="whitespace-nowrap">
Start Mexican Standoff
{isCrewAIRunning && <LoadingIndicator />}
</CTAButton>
</form>
);
}
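The submit handler above derives a WebSocket URL from the HTTP backend URL and tallies incoming graph nodes by type for the activity log. A minimal Python sketch of those two steps (function names are hypothetical, not part of the frontend code):

```python
import re
from collections import Counter

def to_ws_url(http_url: str) -> str:
    # https -> wss, http -> ws; keeping the trailing "s" means secure
    # pages upgrade to a secure WebSocket instead of plain ws.
    return re.sub(r"^http(s?)", r"ws\1", http_url)

def graph_update_message(nodes: list[dict]) -> str:
    # Count nodes per type and fold the tallies into one activity
    # line, mirroring the reduce over nodes_type_map above.
    counts = Counter(node["type"] for node in nodes)
    message = "Graph updated:"
    for node_type, count in counts.items():
        message += f"\n | {node_type}: {count}"
    return message
```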

View file

@ -6,7 +6,6 @@ import { NodeObject, LinkObject } from "react-force-graph-2d";
import { ChangeEvent, useEffect, useImperativeHandle, useRef, useState } from "react";
import { DeleteIcon } from "@/ui/Icons";
// import { FeedbackForm } from "@/ui/Partials";
import { CTAButton, Input, NeutralButton, Select } from "@/ui/elements";
interface GraphControlsProps {
@ -111,7 +110,7 @@ export default function GraphControls({ data, isAddNodeFormOpen, onGraphShapeCha
};
const [isAuthShapeChangeEnabled, setIsAuthShapeChangeEnabled] = useState(true);
const shapeChangeTimeout = useRef<number | null>();
const shapeChangeTimeout = useRef<number | null>(null);
useEffect(() => {
onGraphShapeChange(DEFAULT_GRAPH_SHAPE);
@ -230,12 +229,6 @@ export default function GraphControls({ data, isAddNodeFormOpen, onGraphShapeCha
)}
</>
{/* )} */}
{/* {selectedTab === "feedback" && (
<div className="flex flex-col gap-2">
<FeedbackForm onSuccess={() => {}} />
</div>
)} */}
</div>
</>
);

View file

@ -1,6 +1,6 @@
"use client";
import { useCallback, useRef, useState, MutableRefObject } from "react";
import { useCallback, useRef, useState, RefObject } from "react";
import Link from "next/link";
import { TextLogo } from "@/ui/App";
@ -47,11 +47,11 @@ export default function GraphView() {
updateData(newData);
}, []);
const graphRef = useRef<GraphVisualizationAPI>();
const graphRef = useRef<GraphVisualizationAPI>(null);
const graphControls = useRef<GraphControlsAPI>();
const graphControls = useRef<GraphControlsAPI>(null);
const activityLog = useRef<ActivityLogAPI>();
const activityLog = useRef<ActivityLogAPI>(null);
return (
<main className="flex flex-col h-full">
@ -74,21 +74,18 @@ export default function GraphView() {
<div className="w-full h-full relative overflow-hidden">
<GraphVisualization
key={data?.nodes.length}
ref={graphRef as MutableRefObject<GraphVisualizationAPI>}
ref={graphRef as RefObject<GraphVisualizationAPI>}
data={data}
graphControls={graphControls as MutableRefObject<GraphControlsAPI>}
graphControls={graphControls as RefObject<GraphControlsAPI>}
/>
<div className="absolute top-2 left-2 flex flex-col gap-2">
<div className="bg-gray-500 pt-4 pr-4 pb-4 pl-4 rounded-md w-sm">
<CogneeAddWidget onData={onDataChange} />
</div>
{/* <div className="bg-gray-500 pt-4 pr-4 pb-4 pl-4 rounded-md w-sm">
<CrewAITrigger onData={onDataChange} onActivity={(activities) => activityLog.current?.updateActivityLog(activities)} />
</div> */}
<div className="bg-gray-500 pt-4 pr-4 pb-4 pl-4 rounded-md w-sm">
<h2 className="text-xl text-white mb-4">Activity Log</h2>
<ActivityLog ref={activityLog as MutableRefObject<ActivityLogAPI>} />
<ActivityLog ref={activityLog as RefObject<ActivityLogAPI>} />
</div>
</div>
@ -96,7 +93,7 @@ export default function GraphView() {
<div className="bg-gray-500 pt-4 pr-4 pb-4 pl-4 rounded-md w-110">
<GraphControls
data={data}
ref={graphControls as MutableRefObject<GraphControlsAPI>}
ref={graphControls as RefObject<GraphControlsAPI>}
isAddNodeFormOpen={isAddNodeFormOpen}
onFitIntoView={() => graphRef.current!.zoomToFit(1000, 50)}
onGraphShapeChange={(shape) => graphRef.current!.setGraphShape(shape)}

View file

@ -1,7 +1,7 @@
"use client";
import classNames from "classnames";
import { MutableRefObject, useEffect, useImperativeHandle, useRef, useState, useCallback } from "react";
import { RefObject, useEffect, useImperativeHandle, useRef, useState, useCallback } from "react";
import { forceCollide, forceManyBody } from "d3-force-3d";
import dynamic from "next/dynamic";
import { GraphControlsAPI } from "./GraphControls";
@ -16,9 +16,9 @@ const ForceGraph = dynamic(() => import("react-force-graph-2d"), {
import type { ForceGraphMethods, GraphData, LinkObject, NodeObject } from "react-force-graph-2d";
interface GraphVisuzaliationProps {
ref: MutableRefObject<GraphVisualizationAPI>;
ref: RefObject<GraphVisualizationAPI>;
data?: GraphData<NodeObject, LinkObject>;
graphControls: MutableRefObject<GraphControlsAPI>;
graphControls: RefObject<GraphControlsAPI>;
className?: string;
}
@ -205,7 +205,7 @@ export default function GraphVisualization({ ref, data, graphControls, className
// eslint-disable-next-line @typescript-eslint/no-unused-vars
function handleDagError(loopNodeIds: (string | number)[]) {}
const graphRef = useRef<ForceGraphMethods>();
const graphRef = useRef<ForceGraphMethods>(null);
useEffect(() => {
if (data && graphRef.current) {
@ -224,6 +224,7 @@ export default function GraphVisualization({ ref, data, graphControls, className
) => {
if (!graphRef.current) {
console.warn("GraphVisualization: graphRef not ready yet");
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return undefined as any;
}
@ -239,7 +240,7 @@ export default function GraphVisualization({ ref, data, graphControls, className
return (
<div ref={containerRef} className={classNames("w-full h-full", className)} id="graph-container">
<ForceGraph
ref={graphRef}
ref={graphRef as RefObject<ForceGraphMethods>}
width={dimensions.width}
height={dimensions.height}
dagMode={graphShape as unknown as undefined}

View file

@ -1,8 +1,8 @@
"use server";
"use client";
import Dashboard from "./Dashboard";
export default async function Page() {
export default function Page() {
const accessToken = "";
return (

View file

@ -1,3 +1,3 @@
export { default } from "./dashboard/page";
// export const dynamic = "force-dynamic";
export const dynamic = "force-dynamic";

View file

@ -13,7 +13,6 @@ export interface Dataset {
function useDatasets(useCloud = false) {
const [datasets, setDatasets] = useState<Dataset[]>([]);
// eslint-disable-next-line @typescript-eslint/no-explicit-any
// const statusTimeout = useRef<any>(null);
// const fetchDatasetStatuses = useCallback((datasets: Dataset[]) => {

View file

@ -2,7 +2,7 @@ import { NextResponse, type NextRequest } from "next/server";
// import { auth0 } from "./modules/auth/auth0";
// eslint-disable-next-line @typescript-eslint/no-unused-vars
export async function middleware(request: NextRequest) {
export async function proxy(request: NextRequest) {
// if (process.env.USE_AUTH0_AUTHORIZATION?.toLowerCase() === "true") {
// if (request.nextUrl.pathname === "/auth/token") {
// return NextResponse.next();

View file

@ -1,69 +0,0 @@
"use client";
import { useState } from "react";
import { LoadingIndicator } from "@/ui/App";
import { fetch, useBoolean } from "@/utils";
import { CTAButton, TextArea } from "@/ui/elements";
interface SignInFormPayload extends HTMLFormElement {
feedback: HTMLTextAreaElement;
}
interface FeedbackFormProps {
onSuccess: () => void;
}
export default function FeedbackForm({ onSuccess }: FeedbackFormProps) {
const {
value: isSubmittingFeedback,
setTrue: disableFeedbackSubmit,
setFalse: enableFeedbackSubmit,
} = useBoolean(false);
const [feedbackError, setFeedbackError] = useState<string | null>(null);
const signIn = (event: React.FormEvent<SignInFormPayload>) => {
event.preventDefault();
const formElements = event.currentTarget;
setFeedbackError(null);
disableFeedbackSubmit();
fetch("/v1/crewai/feedback", {
method: "POST",
body: JSON.stringify({
feedback: formElements.feedback.value,
}),
headers: {
"Content-Type": "application/json",
},
})
.then(response => response.json())
.then(() => {
onSuccess();
formElements.feedback.value = "";
})
.catch(error => setFeedbackError(error.detail))
.finally(() => enableFeedbackSubmit());
};
return (
<form onSubmit={signIn} className="flex flex-col gap-2">
<div className="flex flex-col gap-2">
<div className="mb-4">
<label className="block text-white" htmlFor="feedback">Feedback on agent&apos;s reasoning</label>
<TextArea id="feedback" name="feedback" type="text" placeholder="Your feedback" />
</div>
</div>
<CTAButton type="submit">
<span>Submit feedback</span>
{isSubmittingFeedback && <LoadingIndicator />}
</CTAButton>
{feedbackError && (
<span className="text-s text-white">{feedbackError}</span>
)}
</form>
)
}

View file

@ -3,4 +3,3 @@ export { default as Footer } from "./Footer/Footer";
export { default as SearchView } from "./SearchView/SearchView";
export { default as IFrameView } from "./IFrameView/IFrameView";
// export { default as Explorer } from "./Explorer/Explorer";
export { default as FeedbackForm } from "./FeedbackForm";

View file

@ -2,7 +2,7 @@
import { v4 as uuid4 } from "uuid";
import classNames from "classnames";
import { Fragment, MouseEvent, MutableRefObject, useCallback, useEffect, useRef, useState } from "react";
import { Fragment, MouseEvent, RefObject, useCallback, useEffect, useRef, useState } from "react";
import { useModal } from "@/ui/elements/Modal";
import { CaretIcon, CloseIcon, PlusIcon } from "@/ui/Icons";
@ -282,7 +282,7 @@ export default function Notebook({ notebook, updateNotebook, runCell }: Notebook
function CellResult({ content }: { content: [] }) {
const parsedContent = [];
const graphRef = useRef<GraphVisualizationAPI>();
const graphRef = useRef<GraphVisualizationAPI>(null);
const graphControls = useRef<GraphControlsAPI>({
setSelectedNode: () => {},
getSelectedNode: () => null,
@ -298,7 +298,7 @@ function CellResult({ content }: { content: [] }) {
<span className="text-sm pl-2 mb-4">reasoning graph</span>
<GraphVisualization
data={transformInsightsGraphData(line)}
ref={graphRef as MutableRefObject<GraphVisualizationAPI>}
ref={graphRef as RefObject<GraphVisualizationAPI>}
graphControls={graphControls}
className="min-h-80"
/>
@ -346,7 +346,7 @@ function CellResult({ content }: { content: [] }) {
<span className="text-sm pl-2 mb-4">reasoning graph (datasets: {datasetName})</span>
<GraphVisualization
data={transformToVisualizationData(graph)}
ref={graphRef as MutableRefObject<GraphVisualizationAPI>}
ref={graphRef as RefObject<GraphVisualizationAPI>}
graphControls={graphControls}
className="min-h-80"
/>
@ -377,7 +377,7 @@ function CellResult({ content }: { content: [] }) {
<span className="text-sm pl-2 mb-4">reasoning graph (datasets: {datasetName})</span>
<GraphVisualization
data={transformToVisualizationData(graph)}
ref={graphRef as MutableRefObject<GraphVisualizationAPI>}
ref={graphRef as RefObject<GraphVisualizationAPI>}
graphControls={graphControls}
className="min-h-80"
/>

View file

@ -33,7 +33,7 @@ export default function NotebookCellHeader({
setFalse: setIsNotRunningCell,
} = useBoolean(false);
const [runInstance, setRunInstance] = useState<string>(isCloudEnvironment() ? "cloud" : "local");
const [runInstance] = useState<string>(isCloudEnvironment() ? "cloud" : "local");
const handleCellRun = () => {
if (runCell) {

View file

@ -14,7 +14,7 @@
"moduleResolution": "bundler",
"resolveJsonModule": true,
"isolatedModules": true,
"jsx": "preserve",
"jsx": "react-jsx",
"incremental": true,
"plugins": [
{
@ -32,7 +32,8 @@
"next-env.d.ts",
"**/*.ts",
"**/*.tsx",
".next/types/**/*.ts"
".next/types/**/*.ts",
".next/dev/types/**/*.ts"
],
"exclude": [
"node_modules"

View file

@ -445,16 +445,22 @@ The MCP server exposes its functionality through tools. Call them from any MCP c
- **cognify**: Turns your data into a structured knowledge graph and stores it in memory
- **cognee_add_developer_rules**: Ingest core developer rule files into memory
- **codify**: Analyzes a code repository, builds a code graph, and stores it in memory
- **search**: Query memory; supports GRAPH_COMPLETION, RAG_COMPLETION, CODE, CHUNKS
- **list_data**: List all datasets and their data items with IDs for deletion operations
- **delete**: Delete specific data from a dataset (supports soft/hard deletion modes)
- **get_developer_rules**: Retrieve all developer rules that were generated based on previous interactions
- **list_data**: List all datasets and their data items with IDs for deletion operations
- **save_interaction**: Logs user-agent interactions and query-answer pairs
- **prune**: Reset cognee for a fresh start (removes all data)
- **search**: Query memory; supports GRAPH_COMPLETION, RAG_COMPLETION, CODE, CHUNKS, SUMMARIES, CYPHER, and FEELING_LUCKY
- **cognify_status / codify_status**: Track pipeline progress
**Data Management Examples:**

View file

@ -1,6 +1,6 @@
[project]
name = "cognee-mcp"
version = "0.4.0"
version = "0.5.0"
description = "Cognee MCP server"
readme = "README.md"
requires-python = ">=3.10"
@ -9,7 +9,7 @@ dependencies = [
# For local cognee repo usage, remove the comment below and add the absolute path to cognee. Then run `uv sync --reinstall` in the mcp folder after local cognee changes.
#"cognee[postgres,codegraph,gemini,huggingface,docs,neo4j] @ file:/Users/igorilic/Desktop/cognee",
# TODO: Remove gemini from optional dependencies for new Cognee versions after 0.3.4
"cognee[postgres,docs,neo4j]==0.3.7",
"cognee[postgres,docs,neo4j]==0.5.0",
"fastmcp>=2.10.0,<3.0.0",
"mcp>=1.12.0,<2.0.0",
"uv>=0.6.3,<1.0.0",

View file

@ -151,7 +151,7 @@ class CogneeClient:
query_type: str,
datasets: Optional[List[str]] = None,
system_prompt: Optional[str] = None,
top_k: int = 10,
top_k: int = 5,
) -> Any:
"""
Search the knowledge graph.
@ -192,7 +192,7 @@ class CogneeClient:
with redirect_stdout(sys.stderr):
results = await self.cognee.search(
query_type=SearchType[query_type.upper()], query_text=query_text
query_type=SearchType[query_type.upper()], query_text=query_text, top_k=top_k
)
return results

View file

@ -90,97 +90,6 @@ async def health_check(request):
return JSONResponse({"status": "ok"})
@mcp.tool()
async def cognee_add_developer_rules(
base_path: str = ".", graph_model_file: str = None, graph_model_name: str = None
) -> list:
"""
Ingest core developer rule files into Cognee's memory layer.
This function loads a predefined set of developer-related configuration,
rule, and documentation files from the base repository and assigns them
to the special 'developer_rules' node set in Cognee. It ensures these
foundational files are always part of the structured memory graph.
Parameters
----------
base_path : str
Root path to resolve relative file paths. Defaults to current directory.
graph_model_file : str, optional
Optional path to a custom schema file for knowledge graph generation.
graph_model_name : str, optional
Optional class name to use from the graph_model_file schema.
Returns
-------
list
A message indicating how many rule files were scheduled for ingestion,
and how to check their processing status.
Notes
-----
- Each file is processed asynchronously in the background.
- Files are attached to the 'developer_rules' node set.
- Missing files are skipped with a logged warning.
"""
developer_rule_paths = [
".cursorrules",
".cursor/rules",
".same/todos.md",
".windsurfrules",
".clinerules",
"CLAUDE.md",
".sourcegraph/memory.md",
"AGENT.md",
"AGENTS.md",
]
async def cognify_task(file_path: str) -> None:
with redirect_stdout(sys.stderr):
logger.info(f"Starting cognify for: {file_path}")
try:
await cognee_client.add(file_path, node_set=["developer_rules"])
model = None
if graph_model_file and graph_model_name:
if cognee_client.use_api:
logger.warning(
"Custom graph models are not supported in API mode, ignoring."
)
else:
from cognee.shared.data_models import KnowledgeGraph
model = load_class(graph_model_file, graph_model_name)
await cognee_client.cognify(graph_model=model)
logger.info(f"Cognify finished for: {file_path}")
except Exception as e:
logger.error(f"Cognify failed for {file_path}: {str(e)}")
raise ValueError(f"Failed to cognify: {str(e)}")
tasks = []
for rel_path in developer_rule_paths:
abs_path = os.path.join(base_path, rel_path)
if os.path.isfile(abs_path):
tasks.append(asyncio.create_task(cognify_task(abs_path)))
else:
logger.warning(f"Skipped missing developer rule file: {abs_path}")
log_file = get_log_file_location()
return [
types.TextContent(
type="text",
text=(
f"Started cognify for {len(tasks)} developer rule files in background.\n"
f"All are added to the `developer_rules` node set.\n"
f"Use `cognify_status` or check logs at {log_file} to monitor progress."
),
)
]
@mcp.tool()
async def cognify(
data: str, graph_model_file: str = None, graph_model_name: str = None, custom_prompt: str = None
@ -407,76 +316,7 @@ async def save_interaction(data: str) -> list:
@mcp.tool()
async def codify(repo_path: str) -> list:
"""
Analyze and generate a code-specific knowledge graph from a software repository.
This function launches a background task that processes the provided repository
and builds a code knowledge graph. The function returns immediately while
the processing continues in the background due to MCP timeout constraints.
Parameters
----------
repo_path : str
Path to the code repository to analyze. This can be a local file path or a
relative path to a repository. The path should point to the root of the
repository or a specific directory within it.
Returns
-------
list
A list containing a single TextContent object with information about the
background task launch and how to check its status.
Notes
-----
- The function launches a background task and returns immediately
- The code graph generation may take significant time for larger repositories
- Use the codify_status tool to check the progress of the operation
- Process results are logged to the standard Cognee log file
- All stdout is redirected to stderr to maintain MCP communication integrity
"""
if cognee_client.use_api:
error_msg = "❌ Codify operation is not available in API mode. Please use direct mode for code graph pipeline."
logger.error(error_msg)
return [types.TextContent(type="text", text=error_msg)]
async def codify_task(repo_path: str):
# NOTE: MCP uses stdout to communicate; we must redirect all output
# going to stdout (like the print function) to stderr.
with redirect_stdout(sys.stderr):
logger.info("Codify process starting.")
from cognee.api.v1.cognify.code_graph_pipeline import run_code_graph_pipeline
results = []
async for result in run_code_graph_pipeline(repo_path, False):
results.append(result)
logger.info(result)
if all(results):
logger.info("Codify process finished successfully.")
else:
logger.info("Codify process failed.")
asyncio.create_task(codify_task(repo_path))
log_file = get_log_file_location()
text = (
f"Background process launched due to MCP timeout limitations.\n"
f"To check current codify status use the codify_status tool\n"
f"or you can check the log file at: {log_file}"
)
return [
types.TextContent(
type="text",
text=text,
)
]
@mcp.tool()
async def search(search_query: str, search_type: str) -> list:
async def search(search_query: str, search_type: str, top_k: int = 10) -> list:
"""
Search and query the knowledge graph for insights, information, and connections.
@ -549,6 +389,13 @@ async def search(search_query: str, search_type: str) -> list:
The search_type is case-insensitive and will be converted to uppercase.
top_k : int, optional
Maximum number of results to return (default: 10).
Controls the amount of context retrieved from the knowledge graph.
- Lower values (3-5): Faster, more focused results
- Higher values (10-20): More comprehensive, but slower and more context-heavy
Helps manage response size and context window usage in MCP clients.
Returns
-------
list
@ -585,13 +432,32 @@ async def search(search_query: str, search_type: str) -> list:
"""
async def search_task(search_query: str, search_type: str) -> str:
"""Search the knowledge graph"""
async def search_task(search_query: str, search_type: str, top_k: int) -> str:
"""
Internal task to execute knowledge graph search with result formatting.
Handles the actual search execution and formats results appropriately
for MCP clients based on the search type and execution mode (API vs direct).
Parameters
----------
search_query : str
The search query in natural language
search_type : str
Type of search to perform (GRAPH_COMPLETION, CHUNKS, etc.)
top_k : int
Maximum number of results to return
Returns
-------
str
Formatted search results as a string, with format depending on search_type
"""
# NOTE: MCP uses stdout to communicate; we must redirect all output
# going to stdout (like the print function) to stderr.
with redirect_stdout(sys.stderr):
search_results = await cognee_client.search(
query_text=search_query, query_type=search_type
query_text=search_query, query_type=search_type, top_k=top_k
)
# Handle different result formats based on API vs direct mode
@ -625,49 +491,10 @@ async def search(search_query: str, search_type: str) -> list:
else:
return str(search_results)
search_results = await search_task(search_query, search_type)
search_results = await search_task(search_query, search_type, top_k)
return [types.TextContent(type="text", text=search_results)]
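The `top_k` plumbing above only forwards the limit to `cognee_client.search`; conceptually it caps the ranked candidate list before the completion step, which is what drives the speed/coverage trade-off described in the docstring. A toy sketch of that capping (illustrative only, not the library's actual retriever):

```python
def take_top_k(scored_results: list[tuple[float, str]], top_k: int = 10) -> list[str]:
    # Rank candidates by score (descending) and keep only the first
    # top_k, trading recall for a smaller context window.
    ranked = sorted(scored_results, key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:top_k]]
```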
@mcp.tool()
async def get_developer_rules() -> list:
"""
Retrieve all developer rules that were generated based on previous interactions.
This tool queries the Cognee knowledge graph and returns a list of developer
rules.
Parameters
----------
None
Returns
-------
list
A list containing a single TextContent object with the retrieved developer rules.
The format is plain text containing the developer rules in bulletpoints.
Notes
-----
- The specific logic for fetching rules is handled internally.
- This tool does not accept any parameters and is intended for simple rule inspection use cases.
"""
async def fetch_rules_from_cognee() -> str:
"""Collect all developer rules from Cognee"""
with redirect_stdout(sys.stderr):
if cognee_client.use_api:
logger.warning("Developer rules retrieval is not available in API mode")
return "Developer rules retrieval is not available in API mode"
developer_rules = await get_existing_rules(rules_nodeset_name="coding_agent_rules")
return developer_rules
rules_text = await fetch_rules_from_cognee()
return [types.TextContent(type="text", text=rules_text)]
@mcp.tool()
async def list_data(dataset_id: str = None) -> list:
"""
@ -953,48 +780,6 @@ async def cognify_status():
return [types.TextContent(type="text", text=error_msg)]
@mcp.tool()
async def codify_status():
"""
Get the current status of the codify pipeline.
This function retrieves information about current and recently completed codify operations
in the codebase dataset. It provides details on progress, success/failure status, and statistics
about the processed code repositories.
Returns
-------
list
A list containing a single TextContent object with the status information as a string.
The status includes information about active and completed jobs for the cognify_code_pipeline.
Notes
-----
- The function retrieves pipeline status specifically for the "cognify_code_pipeline" on the "codebase" dataset
- Status information includes job progress, execution time, and completion status
- The status is returned in string format for easy reading
- This operation is not available in API mode
"""
with redirect_stdout(sys.stderr):
try:
from cognee.modules.data.methods.get_unique_dataset_id import get_unique_dataset_id
from cognee.modules.users.methods import get_default_user
user = await get_default_user()
status = await cognee_client.get_pipeline_status(
[await get_unique_dataset_id("codebase", user)], "cognify_code_pipeline"
)
return [types.TextContent(type="text", text=str(status))]
except NotImplementedError:
error_msg = "❌ Pipeline status is not available in API mode"
logger.error(error_msg)
return [types.TextContent(type="text", text=error_msg)]
except Exception as e:
error_msg = f"❌ Failed to get codify status: {str(e)}"
logger.error(error_msg)
return [types.TextContent(type="text", text=error_msg)]
def node_to_string(node):
node_data = ", ".join(
[f'{key}: "{value}"' for key, value in node.items() if key in ["id", "name"]]

View file

@ -3,7 +3,7 @@
Test client for Cognee MCP Server functionality.
This script tests all the tools and functions available in the Cognee MCP server,
including cognify, codify, search, prune, status checks, and utility functions.
including cognify, search, prune, status checks, and utility functions.
Usage:
# Set your OpenAI API key first
@ -23,6 +23,7 @@ import tempfile
import time
from contextlib import asynccontextmanager
from cognee.shared.logging_utils import setup_logging
from logging import ERROR, INFO
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
@ -35,7 +36,7 @@ from src.server import (
load_class,
)
# Set timeout for cognify/codify to complete in
# Set timeout for cognify to complete in
TIMEOUT = 5 * 60 # 5 min in seconds
@ -151,12 +152,9 @@ DEBUG = True
expected_tools = {
"cognify",
"codify",
"search",
"prune",
"cognify_status",
"codify_status",
"cognee_add_developer_rules",
"list_data",
"delete",
}
@ -247,106 +245,6 @@ DEBUG = True
}
print(f"{test_name} test failed: {e}")
async def test_codify(self):
"""Test the codify functionality using MCP client."""
print("\n🧪 Testing codify functionality...")
try:
async with self.mcp_server_session() as session:
codify_result = await session.call_tool(
"codify", arguments={"repo_path": self.test_repo_dir}
)
start = time.time() # mark the start
while True:
try:
# Wait a moment
await asyncio.sleep(5)
# Check if codify processing is finished
status_result = await session.call_tool("codify_status", arguments={})
if hasattr(status_result, "content") and status_result.content:
status_text = (
status_result.content[0].text
if status_result.content
else str(status_result)
)
else:
status_text = str(status_result)
if str(PipelineRunStatus.DATASET_PROCESSING_COMPLETED) in status_text:
break
elif time.time() - start > TIMEOUT:
raise TimeoutError("Codify did not complete in 5min")
except DatabaseNotCreatedError:
if time.time() - start > TIMEOUT:
raise TimeoutError("Database was not created in 5min")
self.test_results["codify"] = {
"status": "PASS",
"result": codify_result,
"message": "Codify executed successfully",
}
print("✅ Codify test passed")
except Exception as e:
self.test_results["codify"] = {
"status": "FAIL",
"error": str(e),
"message": "Codify test failed",
}
print(f"❌ Codify test failed: {e}")
async def test_cognee_add_developer_rules(self):
"""Test the cognee_add_developer_rules functionality using MCP client."""
print("\n🧪 Testing cognee_add_developer_rules functionality...")
try:
async with self.mcp_server_session() as session:
result = await session.call_tool(
"cognee_add_developer_rules", arguments={"base_path": self.test_data_dir}
)
start = time.time() # mark the start
while True:
try:
# Wait a moment
await asyncio.sleep(5)
# Check if developer rule cognify processing is finished
status_result = await session.call_tool("cognify_status", arguments={})
if hasattr(status_result, "content") and status_result.content:
status_text = (
status_result.content[0].text
if status_result.content
else str(status_result)
)
else:
status_text = str(status_result)
if str(PipelineRunStatus.DATASET_PROCESSING_COMPLETED) in status_text:
break
elif time.time() - start > TIMEOUT:
raise TimeoutError(
"Cognify of developer rules did not complete in 5min"
)
except DatabaseNotCreatedError:
if time.time() - start > TIMEOUT:
raise TimeoutError("Database was not created in 5min")
self.test_results["cognee_add_developer_rules"] = {
"status": "PASS",
"result": result,
"message": "Developer rules addition executed successfully",
}
print("✅ Developer rules test passed")
except Exception as e:
self.test_results["cognee_add_developer_rules"] = {
"status": "FAIL",
"error": str(e),
"message": "Developer rules test failed",
}
print(f"❌ Developer rules test failed: {e}")
async def test_search_functionality(self):
"""Test the search functionality with different search types using MCP client."""
print("\n🧪 Testing search functionality...")
@ -359,7 +257,11 @@ DEBUG = True
# Go through all Cognee search types
for search_type in SearchType:
# Don't test these search types
if search_type in [SearchType.NATURAL_LANGUAGE, SearchType.CYPHER]:
if search_type in [
SearchType.NATURAL_LANGUAGE,
SearchType.CYPHER,
SearchType.TRIPLET_COMPLETION,
]:
continue
try:
async with self.mcp_server_session() as session:
@ -681,9 +583,6 @@ class TestModel:
test_name="Cognify2",
)
await self.test_codify()
await self.test_cognee_add_developer_rules()
# Test list_data and delete functionality
await self.test_list_data()
await self.test_delete()
@ -739,7 +638,5 @@ async def main():
if __name__ == "__main__":
from logging import ERROR
logger = setup_logging(log_level=ERROR)
asyncio.run(main())

cognee-mcp/uv.lock generated

File diff suppressed because it is too large

View file

@ -21,7 +21,7 @@ from cognee.api.v1.notebooks.routers import get_notebooks_router
from cognee.api.v1.permissions.routers import get_permissions_router
from cognee.api.v1.settings.routers import get_settings_router
from cognee.api.v1.datasets.routers import get_datasets_router
from cognee.api.v1.cognify.routers import get_code_pipeline_router, get_cognify_router
from cognee.api.v1.cognify.routers import get_cognify_router
from cognee.api.v1.search.routers import get_search_router
from cognee.api.v1.ontologies.routers.get_ontology_router import get_ontology_router
from cognee.api.v1.memify.routers import get_memify_router
@ -278,10 +278,6 @@ app.include_router(get_responses_router(), prefix="/api/v1/responses", tags=["re
app.include_router(get_sync_router(), prefix="/api/v1/sync", tags=["sync"])
codegraph_routes = get_code_pipeline_router()
if codegraph_routes:
app.include_router(codegraph_routes, prefix="/api/v1/code-pipeline", tags=["code-pipeline"])
app.include_router(
get_users_router(),
prefix="/api/v1/users",

View file

@ -155,7 +155,7 @@ async def add(
- LLM_API_KEY: API key for your LLM provider (OpenAI, Anthropic, etc.)
Optional:
- LLM_PROVIDER: "openai" (default), "anthropic", "gemini", "ollama", "mistral"
- LLM_PROVIDER: "openai" (default), "anthropic", "gemini", "ollama", "mistral", "bedrock"
- LLM_MODEL: Model name (default: "gpt-5-mini")
- DEFAULT_USER_EMAIL: Custom default user email
- DEFAULT_USER_PASSWORD: Custom default user password
@ -205,6 +205,7 @@ async def add(
pipeline_name="add_pipeline",
vector_db_config=vector_db_config,
graph_db_config=graph_db_config,
use_pipeline_cache=True,
incremental_loading=incremental_loading,
data_per_batch=data_per_batch,
):

View file

@ -1,119 +0,0 @@
import os
import pathlib
import asyncio
from typing import Optional
from cognee.shared.logging_utils import get_logger, setup_logging
from cognee.modules.observability.get_observe import get_observe
from cognee.api.v1.search import SearchType, search
from cognee.api.v1.visualize.visualize import visualize_graph
from cognee.modules.cognify.config import get_cognify_config
from cognee.modules.pipelines import run_tasks
from cognee.modules.pipelines.tasks.task import Task
from cognee.modules.users.methods import get_default_user
from cognee.shared.data_models import KnowledgeGraph
from cognee.modules.data.methods import create_dataset
from cognee.tasks.documents import classify_documents, extract_chunks_from_documents
from cognee.tasks.graph import extract_graph_from_data
from cognee.tasks.ingestion import ingest_data
from cognee.tasks.repo_processor import get_non_py_files, get_repo_file_dependencies
from cognee.tasks.storage import add_data_points
from cognee.tasks.summarization import summarize_text
from cognee.infrastructure.llm import get_max_chunk_tokens
from cognee.infrastructure.databases.relational import get_relational_engine
observe = get_observe()
logger = get_logger("code_graph_pipeline")
@observe
async def run_code_graph_pipeline(
repo_path,
include_docs=False,
excluded_paths: Optional[list[str]] = None,
supported_languages: Optional[list[str]] = None,
):
import cognee
from cognee.low_level import setup
await cognee.prune.prune_data()
await cognee.prune.prune_system(metadata=True)
await setup()
cognee_config = get_cognify_config()
user = await get_default_user()
detailed_extraction = True
tasks = [
Task(
get_repo_file_dependencies,
detailed_extraction=detailed_extraction,
supported_languages=supported_languages,
excluded_paths=excluded_paths,
),
# Task(summarize_code, task_config={"batch_size": 500}), # This task takes a long time to complete
Task(add_data_points, task_config={"batch_size": 30}),
]
if include_docs:
# These tasks take a long time to complete
non_code_tasks = [
Task(get_non_py_files, task_config={"batch_size": 50}),
Task(ingest_data, dataset_name="repo_docs", user=user),
Task(classify_documents),
Task(extract_chunks_from_documents, max_chunk_size=get_max_chunk_tokens()),
Task(
extract_graph_from_data,
graph_model=KnowledgeGraph,
task_config={"batch_size": 50},
),
Task(
summarize_text,
summarization_model=cognee_config.summarization_model,
task_config={"batch_size": 50},
),
]
dataset_name = "codebase"
# Save dataset to database
db_engine = get_relational_engine()
async with db_engine.get_async_session() as session:
dataset = await create_dataset(dataset_name, user, session)
if include_docs:
non_code_pipeline_run = run_tasks(
non_code_tasks, dataset.id, repo_path, user, "cognify_pipeline"
)
async for run_status in non_code_pipeline_run:
yield run_status
async for run_status in run_tasks(
tasks, dataset.id, repo_path, user, "cognify_code_pipeline", incremental_loading=False
):
yield run_status
if __name__ == "__main__":
async def main():
async for run_status in run_code_graph_pipeline("REPO_PATH"):
print(f"{run_status.pipeline_run_id}: {run_status.status}")
file_path = os.path.join(
pathlib.Path(__file__).parent, ".artifacts", "graph_visualization.html"
)
await visualize_graph(file_path)
search_results = await search(
query_type=SearchType.CODE,
query_text="How is Relationship weight calculated?",
)
for file in search_results:
print(file["name"])
logger = setup_logging(name="code_graph_pipeline")
asyncio.run(main())

View file

@@ -3,6 +3,7 @@ from pydantic import BaseModel
from typing import Union, Optional
from uuid import UUID
from cognee.modules.cognify.config import get_cognify_config
from cognee.modules.ontology.ontology_env_config import get_ontology_env_config
from cognee.shared.logging_utils import get_logger
from cognee.shared.data_models import KnowledgeGraph
@@ -19,7 +20,6 @@ from cognee.modules.ontology.get_default_ontology_resolver import (
from cognee.modules.users.models import User
from cognee.tasks.documents import (
check_permissions_on_dataset,
classify_documents,
extract_chunks_from_documents,
)
@@ -53,6 +53,7 @@ async def cognify(
custom_prompt: Optional[str] = None,
temporal_cognify: bool = False,
data_per_batch: int = 20,
**kwargs,
):
"""
Transform ingested data into a structured knowledge graph.
@@ -78,12 +79,11 @@ async def cognify(
Processing Pipeline:
1. **Document Classification**: Identifies document types and structures
2. **Permission Validation**: Ensures user has processing rights
3. **Text Chunking**: Breaks content into semantically meaningful segments
4. **Entity Extraction**: Identifies key concepts, people, places, organizations
5. **Relationship Detection**: Discovers connections between entities
6. **Graph Construction**: Builds semantic knowledge graph with embeddings
7. **Content Summarization**: Creates hierarchical summaries for navigation
2. **Text Chunking**: Breaks content into semantically meaningful segments
3. **Entity Extraction**: Identifies key concepts, people, places, organizations
4. **Relationship Detection**: Discovers connections between entities
5. **Graph Construction**: Builds semantic knowledge graph with embeddings
6. **Content Summarization**: Creates hierarchical summaries for navigation
Graph Model Customization:
The `graph_model` parameter allows custom knowledge structures:
@@ -224,6 +224,7 @@ async def cognify(
config=config,
custom_prompt=custom_prompt,
chunks_per_batch=chunks_per_batch,
**kwargs,
)
# Getting the pipeline executor yields either a function that runs run_pipeline in the background or a function whose result we need to await
@@ -238,6 +239,7 @@ async def cognify(
vector_db_config=vector_db_config,
graph_db_config=graph_db_config,
incremental_loading=incremental_loading,
use_pipeline_cache=True,
pipeline_name="cognify_pipeline",
data_per_batch=data_per_batch,
)
@@ -251,6 +253,7 @@ async def get_default_tasks( # TODO: Find out a better way to do this (Boris's
config: Config = None,
custom_prompt: Optional[str] = None,
chunks_per_batch: int = 100,
**kwargs,
) -> list[Task]:
if config is None:
ontology_config = get_ontology_env_config()
@@ -272,9 +275,11 @@ async def get_default_tasks( # TODO: Find out a better way to do this (Boris's
if chunks_per_batch is None:
chunks_per_batch = 100
cognify_config = get_cognify_config()
embed_triplets = cognify_config.triplet_embedding
default_tasks = [
Task(classify_documents),
Task(check_permissions_on_dataset, user=user, permissions=["write"]),
Task(
extract_chunks_from_documents,
max_chunk_size=chunk_size or get_max_chunk_tokens(),
@@ -286,12 +291,17 @@ async def get_default_tasks( # TODO: Find out a better way to do this (Boris's
config=config,
custom_prompt=custom_prompt,
task_config={"batch_size": chunks_per_batch},
**kwargs,
), # Generate knowledge graphs from the document chunks.
Task(
summarize_text,
task_config={"batch_size": chunks_per_batch},
),
Task(add_data_points, task_config={"batch_size": chunks_per_batch}),
Task(
add_data_points,
embed_triplets=embed_triplets,
task_config={"batch_size": chunks_per_batch},
),
]
return default_tasks
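The `task_config={"batch_size": ...}` pattern above controls how many chunks each task consumes per step. A minimal stand-in (not cognee's actual `Task`/`run_tasks` implementation, just an illustration of the batching contract):

```python
from typing import Callable, List

class Task:
    """Illustrative stand-in: a callable plus a per-task batch size."""
    def __init__(self, run: Callable[[list], list], batch_size: int = 10):
        self.run = run
        self.batch_size = batch_size

def run_tasks(tasks: List[Task], items: list) -> list:
    # Feed each task's output into the next, batch_size items at a time.
    for task in tasks:
        out = []
        for i in range(0, len(items), task.batch_size):
            out.extend(task.run(items[i : i + task.batch_size]))
        items = out
    return items

chunks = [f"chunk-{i}" for i in range(5)]
result = run_tasks(
    [Task(lambda b: [c.upper() for c in b], batch_size=2),
     Task(lambda b: [c + "!" for c in b], batch_size=3)],
    chunks,
)
```

Raising a task's batch size trades memory for fewer calls; the change above threads `chunks_per_batch` into every stage, including the new `add_data_points` configuration.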
@@ -305,14 +315,13 @@ async def get_temporal_tasks(
The pipeline includes:
1. Document classification.
2. Dataset permission checks (requires "write" access).
3. Document chunking with a specified or default chunk size.
4. Event and timestamp extraction from chunks.
5. Knowledge graph extraction from events.
6. Batched insertion of data points.
2. Document chunking with a specified or default chunk size.
3. Event and timestamp extraction from chunks.
4. Knowledge graph extraction from events.
5. Batched insertion of data points.
Args:
user (User, optional): The user requesting task execution, used for permission checks.
user (User, optional): The user requesting task execution.
chunker (Callable, optional): A text chunking function/class to split documents. Defaults to TextChunker.
chunk_size (int, optional): Maximum token size per chunk. If not provided, uses system default.
chunks_per_batch (int, optional): Number of chunks to process in a single batch in Cognify
@@ -325,7 +334,6 @@ async def get_temporal_tasks(
temporal_tasks = [
Task(classify_documents),
Task(check_permissions_on_dataset, user=user, permissions=["write"]),
Task(
extract_chunks_from_documents,
max_chunk_size=chunk_size or get_max_chunk_tokens(),

View file

@@ -1,2 +1 @@
from .get_cognify_router import get_cognify_router
from .get_code_pipeline_router import get_code_pipeline_router

View file

@@ -1,90 +0,0 @@
import json
from cognee.shared.logging_utils import get_logger
from fastapi import APIRouter
from fastapi.responses import JSONResponse
from cognee.api.DTO import InDTO
from cognee.modules.retrieval.code_retriever import CodeRetriever
from cognee.modules.storage.utils import JSONEncoder
logger = get_logger()
class CodePipelineIndexPayloadDTO(InDTO):
repo_path: str
include_docs: bool = False
class CodePipelineRetrievePayloadDTO(InDTO):
query: str
full_input: str
def get_code_pipeline_router() -> APIRouter:
try:
import cognee.api.v1.cognify.code_graph_pipeline
except ModuleNotFoundError:
logger.error("codegraph dependencies not found. Skipping codegraph API routes.")
return None
router = APIRouter()
@router.post("/index", response_model=None)
async def code_pipeline_index(payload: CodePipelineIndexPayloadDTO):
"""
Run indexation on a code repository.
This endpoint processes a code repository to create a knowledge graph
of the codebase structure, dependencies, and relationships.
## Request Parameters
- **repo_path** (str): Path to the code repository
- **include_docs** (bool): Whether to include documentation files (default: false)
## Response
No content returned. Processing results are logged.
## Error Codes
- **409 Conflict**: Error during indexation process
"""
from cognee.api.v1.cognify.code_graph_pipeline import run_code_graph_pipeline
try:
async for result in run_code_graph_pipeline(payload.repo_path, payload.include_docs):
logger.info(result)
except Exception as error:
return JSONResponse(status_code=409, content={"error": str(error)})
@router.post("/retrieve", response_model=list[dict])
async def code_pipeline_retrieve(payload: CodePipelineRetrievePayloadDTO):
"""
Retrieve context from the code knowledge graph.
This endpoint searches the indexed code repository to find relevant
context based on the provided query.
## Request Parameters
- **query** (str): Search query for code context
- **full_input** (str): Full input text for processing
## Response
Returns a list of relevant code files and context as JSON.
## Error Codes
- **409 Conflict**: Error during retrieval process
"""
try:
query = (
payload.full_input.replace("cognee ", "")
if payload.full_input.startswith("cognee ")
else payload.full_input
)
retriever = CodeRetriever()
retrieved_files = await retriever.get_context(query)
return json.dumps(retrieved_files, cls=JSONEncoder)
except Exception as error:
return JSONResponse(status_code=409, content={"error": str(error)})
return router

View file

@@ -42,7 +42,9 @@ class CognifyPayloadDTO(InDTO):
default="", description="Custom prompt for entity extraction and graph generation"
)
ontology_key: Optional[List[str]] = Field(
default=None, description="Reference to one or more previously uploaded ontologies"
default=None,
examples=[[]],
description="Reference to one or more previously uploaded ontologies",
)

View file

@@ -208,14 +208,14 @@ def get_datasets_router() -> APIRouter:
},
)
from cognee.modules.data.methods import get_dataset, delete_dataset
from cognee.modules.data.methods import delete_dataset
dataset = await get_dataset(user.id, dataset_id)
dataset = await get_authorized_existing_datasets([dataset_id], "delete", user)
if dataset is None:
raise DatasetNotFoundError(message=f"Dataset ({str(dataset_id)}) not found.")
await delete_dataset(dataset)
await delete_dataset(dataset[0])
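`get_authorized_existing_datasets` returns a list, so if it yields an empty list (rather than `None`) for a missing dataset, the retained `if dataset is None` guard above never fires; an emptiness check covers both cases. A minimal sketch of the safer guard, with a hypothetical exception type:

```python
class DatasetNotFoundError(Exception):
    """Hypothetical stand-in for cognee's not-found error."""

def first_authorized_dataset(datasets: list, dataset_id: str):
    # An authorized lookup returns a (possibly empty) list, not None,
    # so test for emptiness rather than identity with None.
    if not datasets:
        raise DatasetNotFoundError(f"Dataset ({dataset_id}) not found.")
    return datasets[0]
```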
@router.delete(
"/{dataset_id}/data/{data_id}",

View file

@@ -5,6 +5,7 @@ from pathlib import Path
from datetime import datetime, timezone
from typing import Optional, List
from dataclasses import dataclass
from fastapi import UploadFile
@dataclass
@@ -45,8 +46,10 @@ class OntologyService:
json.dump(metadata, f, indent=2)
async def upload_ontology(
self, ontology_key: str, file, user, description: Optional[str] = None
self, ontology_key: str, file: UploadFile, user, description: Optional[str] = None
) -> OntologyMetadata:
if not file.filename:
raise ValueError("File must have a filename")
if not file.filename.lower().endswith(".owl"):
raise ValueError("File must be in .owl format")
@@ -57,8 +60,6 @@
raise ValueError(f"Ontology key '{ontology_key}' already exists")
content = await file.read()
if len(content) > 10 * 1024 * 1024:
raise ValueError("File size exceeds 10MB limit")
file_path = user_dir / f"{ontology_key}.owl"
with open(file_path, "wb") as f:
@@ -82,7 +83,11 @@
)
async def upload_ontologies(
self, ontology_key: List[str], files: List, user, descriptions: Optional[List[str]] = None
self,
ontology_key: List[str],
files: List[UploadFile],
user,
descriptions: Optional[List[str]] = None,
) -> List[OntologyMetadata]:
"""
Upload ontology files with their respective keys.
@@ -105,47 +110,17 @@
if len(set(ontology_key)) != len(ontology_key):
raise ValueError("Duplicate ontology keys not allowed")
if descriptions and len(descriptions) != len(files):
raise ValueError("Number of descriptions must match number of files")
results = []
user_dir = self._get_user_dir(str(user.id))
metadata = self._load_metadata(user_dir)
for i, (key, file) in enumerate(zip(ontology_key, files)):
if key in metadata:
raise ValueError(f"Ontology key '{key}' already exists")
if not file.filename.lower().endswith(".owl"):
raise ValueError(f"File '{file.filename}' must be in .owl format")
content = await file.read()
if len(content) > 10 * 1024 * 1024:
raise ValueError(f"File '{file.filename}' exceeds 10MB limit")
file_path = user_dir / f"{key}.owl"
with open(file_path, "wb") as f:
f.write(content)
ontology_metadata = {
"filename": file.filename,
"size_bytes": len(content),
"uploaded_at": datetime.now(timezone.utc).isoformat(),
"description": descriptions[i] if descriptions else None,
}
metadata[key] = ontology_metadata
results.append(
OntologyMetadata(
await self.upload_ontology(
ontology_key=key,
filename=file.filename,
size_bytes=len(content),
uploaded_at=ontology_metadata["uploaded_at"],
file=file,
user=user,
description=descriptions[i] if descriptions else None,
)
)
self._save_metadata(user_dir, metadata)
return results
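The refactor above replaces duplicated per-file validation with a loop that delegates to the single-file `upload_ontology`, so each check lives in one place. The shape of that pattern, sketched with a hypothetical async uploader:

```python
import asyncio
from typing import List, Optional

async def upload_one(key: str, filename: str, description: Optional[str]) -> dict:
    # Hypothetical single-item uploader; per-item validation lives here once.
    return {"key": key, "filename": filename, "description": description}

async def upload_many(keys: List[str], files: List[str],
                      descriptions: Optional[List[str]] = None) -> List[dict]:
    # The bulk variant only checks cross-item invariants, then delegates.
    if len(set(keys)) != len(keys):
        raise ValueError("Duplicate ontology keys not allowed")
    if descriptions and len(descriptions) != len(files):
        raise ValueError("Number of descriptions must match number of files")
    return [
        await upload_one(k, f, descriptions[i] if descriptions else None)
        for i, (k, f) in enumerate(zip(keys, files))
    ]

results = asyncio.run(upload_many(["a", "b"], ["a.owl", "b.owl"]))
```

One consequence of the delegation: metadata is saved per file inside the single-item path instead of once at the end of the batch.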
def get_ontology_contents(self, ontology_key: List[str], user) -> List[str]:

View file

@@ -1,4 +1,4 @@
from fastapi import APIRouter, File, Form, UploadFile, Depends, HTTPException
from fastapi import APIRouter, File, Form, UploadFile, Depends, Request
from fastapi.responses import JSONResponse
from typing import Optional, List
@@ -15,28 +15,25 @@ def get_ontology_router() -> APIRouter:
@router.post("", response_model=dict)
async def upload_ontology(
request: Request,
ontology_key: str = Form(...),
ontology_file: List[UploadFile] = File(...),
descriptions: Optional[str] = Form(None),
ontology_file: UploadFile = File(...),
description: Optional[str] = Form(None),
user: User = Depends(get_authenticated_user),
):
"""
Upload ontology files with their respective keys for later use in cognify operations.
Supports both single and multiple file uploads:
- Single file: ontology_key=["key"], ontology_file=[file]
- Multiple files: ontology_key=["key1", "key2"], ontology_file=[file1, file2]
Upload a single ontology file for later use in cognify operations.
## Request Parameters
- **ontology_key** (str): JSON array string of user-defined identifiers for the ontologies
- **ontology_file** (List[UploadFile]): OWL format ontology files
- **descriptions** (Optional[str]): JSON array string of optional descriptions
- **ontology_key** (str): User-defined identifier for the ontology.
- **ontology_file** (UploadFile): Single OWL format ontology file
- **description** (Optional[str]): Optional description for the ontology.
## Response
Returns metadata about uploaded ontologies including keys, filenames, sizes, and upload timestamps.
Returns metadata about the uploaded ontology including key, filename, size, and upload timestamp.
## Error Codes
- **400 Bad Request**: Invalid file format, duplicate keys, array length mismatches, file size exceeded
- **400 Bad Request**: Invalid file format, duplicate key, multiple files uploaded
- **500 Internal Server Error**: File system or processing errors
"""
send_telemetry(
@@ -49,16 +46,22 @@
)
try:
import json
# Enforce: exactly one uploaded file for "ontology_file"
form = await request.form()
uploaded_files = form.getlist("ontology_file")
if len(uploaded_files) != 1:
raise ValueError("Only one ontology_file is allowed")
ontology_keys = json.loads(ontology_key)
description_list = json.loads(descriptions) if descriptions else None
if ontology_key.strip().startswith(("[", "{")):
raise ValueError("ontology_key must be a string")
if description is not None and description.strip().startswith(("[", "{")):
raise ValueError("description must be a string")
if not isinstance(ontology_keys, list):
raise ValueError("ontology_key must be a JSON array")
results = await ontology_service.upload_ontologies(
ontology_keys, ontology_file, user, description_list
result = await ontology_service.upload_ontology(
ontology_key=ontology_key,
file=ontology_file,
user=user,
description=description,
)
return {
@@ -70,10 +73,9 @@
"uploaded_at": result.uploaded_at,
"description": result.description,
}
for result in results
]
}
except (json.JSONDecodeError, ValueError) as e:
except ValueError as e:
return JSONResponse(status_code=400, content={"error": str(e)})
except Exception as e:
return JSONResponse(status_code=500, content={"error": str(e)})
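Since the endpoint no longer parses JSON arrays, the guards above reject keys or descriptions that still look like serialized arrays or objects from the old multi-upload API. That check in isolation:

```python
from typing import Optional

def ensure_scalar(name: str, value: Optional[str]) -> None:
    # Reject values that look like leftover JSON arrays/objects,
    # e.g. '["key1", "key2"]' sent by an old multi-upload client.
    if value is not None and value.strip().startswith(("[", "{")):
        raise ValueError(f"{name} must be a string")

ensure_scalar("ontology_key", "my-ontology")  # OK
ensure_scalar("description", None)            # OK
```

It is a heuristic: a legitimate description beginning with `[` would also be rejected, which the endpoint accepts as the cost of a clear 400 for old clients.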

View file

@@ -33,6 +33,7 @@ async def search(
session_id: Optional[str] = None,
wide_search_top_k: Optional[int] = 100,
triplet_distance_penalty: Optional[float] = 3.5,
verbose: bool = False,
) -> Union[List[SearchResult], CombinedSearchResult]:
"""
Search and query the knowledge graph for insights, information, and connections.
@@ -123,6 +124,8 @@ async def search(
session_id: Optional session identifier for caching Q&A interactions. Defaults to 'default_session' if None.
verbose: If True, returns detailed result information including graph representation (when possible).
Returns:
list: Search results in format determined by query_type:
@@ -204,6 +207,7 @@ async def search(
session_id=session_id,
wide_search_top_k=wide_search_top_k,
triplet_distance_penalty=triplet_distance_penalty,
verbose=verbose,
)
return filtered_search_results

View file

@@ -0,0 +1,360 @@
import os
import platform
import subprocess
import tempfile
from pathlib import Path
import requests
from cognee.shared.logging_utils import get_logger
logger = get_logger()
def get_nvm_dir() -> Path:
"""
Get the nvm directory path following standard nvm installation logic.
Uses XDG_CONFIG_HOME if set, otherwise falls back to ~/.nvm.
"""
xdg_config_home = os.environ.get("XDG_CONFIG_HOME")
if xdg_config_home:
return Path(xdg_config_home) / "nvm"
return Path.home() / ".nvm"
def get_nvm_sh_path() -> Path:
"""
Get the path to nvm.sh following standard nvm installation logic.
"""
return get_nvm_dir() / "nvm.sh"
def check_nvm_installed() -> bool:
"""
Check if nvm (Node Version Manager) is installed.
"""
try:
# Check if nvm is available in the shell
# nvm is typically sourced in shell config files, so we need to check via shell
if platform.system() == "Windows":
# On Windows, nvm-windows uses a different approach
result = subprocess.run(
["nvm", "version"],
capture_output=True,
text=True,
timeout=10,
shell=True,
)
else:
# On Unix-like systems, nvm is a shell function, so we need to source it
# First check if nvm.sh exists
nvm_path = get_nvm_sh_path()
if not nvm_path.exists():
logger.debug(f"nvm.sh not found at {nvm_path}")
return False
# Try to source nvm and check version, capturing errors
result = subprocess.run(
["bash", "-c", f"source {nvm_path} && nvm --version"],
capture_output=True,
text=True,
timeout=10,
)
if result.returncode != 0:
# Log the error to help diagnose configuration issues
if result.stderr:
logger.debug(f"nvm check failed: {result.stderr.strip()}")
return False
return result.returncode == 0
except Exception as e:
logger.debug(f"Exception checking nvm: {str(e)}")
return False
def install_nvm() -> bool:
"""
Install nvm (Node Version Manager) on Unix-like systems.
"""
if platform.system() == "Windows":
logger.error("nvm installation on Windows requires nvm-windows.")
logger.error(
"Please install nvm-windows manually from: https://github.com/coreybutler/nvm-windows"
)
return False
logger.info("Installing nvm (Node Version Manager)...")
try:
# Download and install nvm
nvm_install_script = "https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh"
logger.info(f"Downloading nvm installer from {nvm_install_script}...")
response = requests.get(nvm_install_script, timeout=60)
response.raise_for_status()
# Create a temporary script file
with tempfile.NamedTemporaryFile(mode="w", suffix=".sh", delete=False) as f:
f.write(response.text)
install_script_path = f.name
try:
# Make the script executable and run it
os.chmod(install_script_path, 0o755)
result = subprocess.run(
["bash", install_script_path],
capture_output=True,
text=True,
timeout=120,
)
if result.returncode == 0:
logger.info("✓ nvm installed successfully")
# Source nvm in current shell session
nvm_dir = get_nvm_dir()
if nvm_dir.exists():
return True
else:
logger.warning(
f"nvm installation completed but nvm directory not found at {nvm_dir}"
)
return False
else:
logger.error(f"nvm installation failed: {result.stderr}")
return False
finally:
# Clean up temporary script
try:
os.unlink(install_script_path)
except Exception:
pass
except requests.exceptions.RequestException as e:
logger.error(f"Failed to download nvm installer: {str(e)}")
return False
except Exception as e:
logger.error(f"Failed to install nvm: {str(e)}")
return False
def install_node_with_nvm() -> bool:
"""
Install the latest Node.js version using nvm.
Returns True if installation succeeds, False otherwise.
"""
if platform.system() == "Windows":
logger.error("Node.js installation via nvm on Windows requires nvm-windows.")
logger.error("Please install Node.js manually from: https://nodejs.org/")
return False
logger.info("Installing latest Node.js version using nvm...")
try:
# Source nvm and install latest Node.js
nvm_path = get_nvm_sh_path()
if not nvm_path.exists():
logger.error(f"nvm.sh not found at {nvm_path}. nvm may not be properly installed.")
return False
nvm_source_cmd = f"source {nvm_path}"
install_cmd = f"{nvm_source_cmd} && nvm install node"
result = subprocess.run(
["bash", "-c", install_cmd],
capture_output=True,
text=True,
timeout=300, # 5 minutes timeout for Node.js installation
)
if result.returncode == 0:
logger.info("✓ Node.js installed successfully via nvm")
# Set as default version
use_cmd = f"{nvm_source_cmd} && nvm alias default node"
subprocess.run(
["bash", "-c", use_cmd],
capture_output=True,
text=True,
timeout=30,
)
# Add nvm to PATH for current session
# This ensures node/npm are available in subsequent commands
nvm_dir = get_nvm_dir()
if nvm_dir.exists():
# Update PATH for current process
nvm_bin = nvm_dir / "versions" / "node"
# Find the latest installed version
if nvm_bin.exists():
versions = sorted(nvm_bin.iterdir(), reverse=True)
if versions:
latest_node_bin = versions[0] / "bin"
if latest_node_bin.exists():
current_path = os.environ.get("PATH", "")
os.environ["PATH"] = f"{latest_node_bin}:{current_path}"
return True
else:
logger.error(f"Failed to install Node.js: {result.stderr}")
return False
except subprocess.TimeoutExpired:
logger.error("Timeout installing Node.js (this can take several minutes)")
return False
except Exception as e:
logger.error(f"Error installing Node.js: {str(e)}")
return False
def check_node_npm() -> tuple[bool, str]: # (is_available, error_message)
"""
Check if Node.js and npm are available.
If not available, attempts to install nvm and Node.js automatically.
"""
try:
# Check Node.js - try direct command first, then with nvm if needed
result = subprocess.run(["node", "--version"], capture_output=True, text=True, timeout=10)
if result.returncode != 0:
# If direct command fails, try with nvm sourced (in case nvm is installed but not in PATH)
nvm_path = get_nvm_sh_path()
if nvm_path.exists():
result = subprocess.run(
["bash", "-c", f"source {nvm_path} && node --version"],
capture_output=True,
text=True,
timeout=10,
)
if result.returncode != 0 and result.stderr:
logger.debug(f"Failed to source nvm or run node: {result.stderr.strip()}")
if result.returncode != 0:
# Node.js is not installed, try to install it
logger.info("Node.js is not installed. Attempting to install automatically...")
# Check if nvm is installed
if not check_nvm_installed():
logger.info("nvm is not installed. Installing nvm first...")
if not install_nvm():
return (
False,
"Failed to install nvm. Please install Node.js manually from https://nodejs.org/",
)
# Install Node.js using nvm
if not install_node_with_nvm():
return (
False,
"Failed to install Node.js. Please install Node.js manually from https://nodejs.org/",
)
# Verify installation after automatic setup
# Try with nvm sourced first
nvm_path = get_nvm_sh_path()
if nvm_path.exists():
result = subprocess.run(
["bash", "-c", f"source {nvm_path} && node --version"],
capture_output=True,
text=True,
timeout=10,
)
if result.returncode != 0 and result.stderr:
logger.debug(
f"Failed to verify node after installation: {result.stderr.strip()}"
)
else:
result = subprocess.run(
["node", "--version"], capture_output=True, text=True, timeout=10
)
if result.returncode != 0:
nvm_path = get_nvm_sh_path()
return (
False,
f"Node.js installation completed but node command is not available. Please restart your terminal or source {nvm_path}",
)
node_version = result.stdout.strip()
logger.debug(f"Found Node.js version: {node_version}")
# Check npm - handle Windows PowerShell scripts
if platform.system() == "Windows":
# On Windows, npm might be a PowerShell script, so we need to use shell=True
result = subprocess.run(
["npm", "--version"], capture_output=True, text=True, timeout=10, shell=True
)
else:
# On Unix-like systems, if we just installed via nvm, we may need to source nvm
# Try direct command first
result = subprocess.run(
["npm", "--version"], capture_output=True, text=True, timeout=10
)
if result.returncode != 0:
# Try with nvm sourced
nvm_path = get_nvm_sh_path()
if nvm_path.exists():
result = subprocess.run(
["bash", "-c", f"source {nvm_path} && npm --version"],
capture_output=True,
text=True,
timeout=10,
)
if result.returncode != 0 and result.stderr:
logger.debug(f"Failed to source nvm or run npm: {result.stderr.strip()}")
if result.returncode != 0:
return False, "npm is not installed or not in PATH"
npm_version = result.stdout.strip()
logger.debug(f"Found npm version: {npm_version}")
return True, f"Node.js {node_version}, npm {npm_version}"
except subprocess.TimeoutExpired:
return False, "Timeout checking Node.js/npm installation"
except FileNotFoundError:
# Node.js is not installed, try to install it
logger.info("Node.js is not found. Attempting to install automatically...")
# Check if nvm is installed
if not check_nvm_installed():
logger.info("nvm is not installed. Installing nvm first...")
if not install_nvm():
return (
False,
"Failed to install nvm. Please install Node.js manually from https://nodejs.org/",
)
# Install Node.js using nvm
if not install_node_with_nvm():
return (
False,
"Failed to install Node.js. Please install Node.js manually from https://nodejs.org/",
)
# Retry checking Node.js after installation
try:
result = subprocess.run(
["node", "--version"], capture_output=True, text=True, timeout=10
)
if result.returncode == 0:
node_version = result.stdout.strip()
# Check npm
nvm_path = get_nvm_sh_path()
if nvm_path.exists():
result = subprocess.run(
["bash", "-c", f"source {nvm_path} && npm --version"],
capture_output=True,
text=True,
timeout=10,
)
if result.returncode == 0:
npm_version = result.stdout.strip()
return True, f"Node.js {node_version}, npm {npm_version}"
elif result.stderr:
logger.debug(f"Failed to source nvm or run npm: {result.stderr.strip()}")
except Exception as e:
logger.debug(f"Exception retrying node/npm check: {str(e)}")
return False, "Node.js/npm not found. Please install Node.js from https://nodejs.org/"
except Exception as e:
return False, f"Error checking Node.js/npm: {str(e)}"
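`get_nvm_dir` above mirrors nvm's own install logic: `$XDG_CONFIG_HOME/nvm` when set, otherwise `~/.nvm`. A testable restatement that takes the environment as a parameter instead of reading `os.environ` directly (an assumption made purely for illustration):

```python
from pathlib import Path
from typing import Mapping

def nvm_dir_for(env: Mapping[str, str]) -> Path:
    # Same precedence as the helper above: XDG_CONFIG_HOME wins if set.
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:
        return Path(xdg) / "nvm"
    return Path(env["HOME"]) / ".nvm"
```

Because the environment is injected, both branches can be exercised without touching the real home directory.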

View file

@@ -0,0 +1,50 @@
import platform
import subprocess
from pathlib import Path
from typing import List
from cognee.shared.logging_utils import get_logger
from .node_setup import get_nvm_sh_path
logger = get_logger()
def run_npm_command(cmd: List[str], cwd: Path, timeout: int = 300) -> subprocess.CompletedProcess:
"""
Run an npm command, ensuring nvm is sourced if needed (Unix-like systems only).
Returns the CompletedProcess result.
"""
if platform.system() == "Windows":
# On Windows, use shell=True for npm commands
return subprocess.run(
cmd,
cwd=cwd,
capture_output=True,
text=True,
timeout=timeout,
shell=True,
)
else:
# On Unix-like systems, try direct command first
result = subprocess.run(
cmd,
cwd=cwd,
capture_output=True,
text=True,
timeout=timeout,
)
# If it fails and nvm might be installed, try with nvm sourced
if result.returncode != 0:
nvm_path = get_nvm_sh_path()
if nvm_path.exists():
nvm_cmd = f"source {nvm_path} && {' '.join(cmd)}"
result = subprocess.run(
["bash", "-c", nvm_cmd],
cwd=cwd,
capture_output=True,
text=True,
timeout=timeout,
)
if result.returncode != 0 and result.stderr:
logger.debug(f"npm command failed with nvm: {result.stderr.strip()}")
return result
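`run_npm_command` above encodes a "try the command directly, then retry through a shell with nvm sourced" policy. Stripped of the subprocess details, with the two runners injected so the control flow is testable:

```python
from typing import Callable, Optional

def run_with_fallback(direct: Callable[[], int],
                      fallback: Optional[Callable[[], int]]) -> int:
    # Mirror run_npm_command: only consult the fallback when the direct
    # invocation fails AND a fallback exists (i.e. nvm.sh is present).
    code = direct()
    if code != 0 and fallback is not None:
        code = fallback()
    return code
```

The helper returns whatever exit code the last attempt produced, exactly as the real function returns the final `CompletedProcess`.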

View file

@@ -15,6 +15,8 @@ import shutil
from cognee.shared.logging_utils import get_logger
from cognee.version import get_cognee_version
from .node_setup import check_node_npm, get_nvm_dir, get_nvm_sh_path
from .npm_utils import run_npm_command
logger = get_logger()
@@ -285,48 +287,6 @@ def find_frontend_path() -> Optional[Path]:
return None
def check_node_npm() -> tuple[bool, str]:
"""
Check if Node.js and npm are available.
Returns (is_available, error_message)
"""
try:
# Check Node.js
result = subprocess.run(["node", "--version"], capture_output=True, text=True, timeout=10)
if result.returncode != 0:
return False, "Node.js is not installed or not in PATH"
node_version = result.stdout.strip()
logger.debug(f"Found Node.js version: {node_version}")
# Check npm - handle Windows PowerShell scripts
if platform.system() == "Windows":
# On Windows, npm might be a PowerShell script, so we need to use shell=True
result = subprocess.run(
["npm", "--version"], capture_output=True, text=True, timeout=10, shell=True
)
else:
result = subprocess.run(
["npm", "--version"], capture_output=True, text=True, timeout=10
)
if result.returncode != 0:
return False, "npm is not installed or not in PATH"
npm_version = result.stdout.strip()
logger.debug(f"Found npm version: {npm_version}")
return True, f"Node.js {node_version}, npm {npm_version}"
except subprocess.TimeoutExpired:
return False, "Timeout checking Node.js/npm installation"
except FileNotFoundError:
return False, "Node.js/npm not found. Please install Node.js from https://nodejs.org/"
except Exception as e:
return False, f"Error checking Node.js/npm: {str(e)}"
def install_frontend_dependencies(frontend_path: Path) -> bool:
"""
Install frontend dependencies if node_modules doesn't exist.
@@ -341,24 +301,7 @@ def install_frontend_dependencies(frontend_path: Path) -> bool:
logger.info("Installing frontend dependencies (this may take a few minutes)...")
try:
# Use shell=True on Windows for npm commands
if platform.system() == "Windows":
result = subprocess.run(
["npm", "install"],
cwd=frontend_path,
capture_output=True,
text=True,
timeout=300, # 5 minutes timeout
shell=True,
)
else:
result = subprocess.run(
["npm", "install"],
cwd=frontend_path,
capture_output=True,
text=True,
timeout=300, # 5 minutes timeout
)
result = run_npm_command(["npm", "install"], frontend_path, timeout=300)
if result.returncode == 0:
logger.info("Frontend dependencies installed successfully")
@@ -642,6 +585,21 @@ def start_ui(
env["HOST"] = "localhost"
env["PORT"] = str(port)
# If nvm is installed, ensure it's available in the environment
nvm_path = get_nvm_sh_path()
if platform.system() != "Windows" and nvm_path.exists():
# Add nvm to PATH for the subprocess
nvm_dir = get_nvm_dir()
# Find the latest Node.js version installed via nvm
nvm_versions = nvm_dir / "versions" / "node"
if nvm_versions.exists():
versions = sorted(nvm_versions.iterdir(), reverse=True)
if versions:
latest_node_bin = versions[0] / "bin"
if latest_node_bin.exists():
current_path = env.get("PATH", "")
env["PATH"] = f"{latest_node_bin}:{current_path}"
# Start the development server
logger.info(f"Starting frontend server at http://localhost:{port}")
logger.info("This may take a moment to compile and start...")
@@ -659,14 +617,26 @@
shell=True,
)
else:
process = subprocess.Popen(
["npm", "run", "dev"],
cwd=frontend_path,
env=env,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
preexec_fn=os.setsid if hasattr(os, "setsid") else None,
)
# On Unix-like systems, use bash with nvm sourced if available
if nvm_path.exists():
# Use bash to source nvm and run npm
process = subprocess.Popen(
["bash", "-c", f"source {nvm_path} && npm run dev"],
cwd=frontend_path,
env=env,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
preexec_fn=os.setsid if hasattr(os, "setsid") else None,
)
else:
process = subprocess.Popen(
["npm", "run", "dev"],
cwd=frontend_path,
env=env,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
preexec_fn=os.setsid if hasattr(os, "setsid") else None,
)
# Start threads to stream frontend output with prefix
_stream_process_output(process, "stdout", "[FRONTEND]", "\033[33m") # Yellow

View file

@ -4,9 +4,10 @@ from typing import Union
from uuid import UUID
from cognee.base_config import get_base_config
from cognee.infrastructure.databases.vector.config import get_vectordb_context_config
from cognee.infrastructure.databases.graph.config import get_graph_context_config
from cognee.infrastructure.databases.vector.config import get_vectordb_config
from cognee.infrastructure.databases.graph.config import get_graph_config
from cognee.infrastructure.databases.utils import get_or_create_dataset_database
from cognee.infrastructure.databases.utils import resolve_dataset_database_connection_info
from cognee.infrastructure.files.storage.config import file_storage_config
from cognee.modules.users.methods import get_user
@ -16,22 +17,59 @@ vector_db_config = ContextVar("vector_db_config", default=None)
graph_db_config = ContextVar("graph_db_config", default=None)
session_user = ContextVar("session_user", default=None)
VECTOR_DBS_WITH_MULTI_USER_SUPPORT = ["lancedb", "falkor"]
GRAPH_DBS_WITH_MULTI_USER_SUPPORT = ["kuzu", "falkor"]
async def set_session_user_context_variable(user):
session_user.set(user)
def multi_user_support_possible():
graph_db_config = get_graph_context_config()
vector_db_config = get_vectordb_context_config()
return (
graph_db_config["graph_database_provider"] in GRAPH_DBS_WITH_MULTI_USER_SUPPORT
and vector_db_config["vector_db_provider"] in VECTOR_DBS_WITH_MULTI_USER_SUPPORT
graph_db_config = get_graph_config()
vector_db_config = get_vectordb_config()
graph_handler = graph_db_config.graph_dataset_database_handler
vector_handler = vector_db_config.vector_dataset_database_handler
from cognee.infrastructure.databases.dataset_database_handler import (
supported_dataset_database_handlers,
)
if graph_handler not in supported_dataset_database_handlers:
raise EnvironmentError(
"Unsupported graph dataset to database handler configured. Cannot add support for multi-user access control mode. Please use a supported graph dataset to database handler or set the environment variable ENABLE_BACKEND_ACCESS_CONTROL to false to switch off multi-user access control mode.\n"
f"Selected graph dataset to database handler: {graph_handler}\n"
f"Supported dataset to database handlers: {list(supported_dataset_database_handlers.keys())}\n"
)
if vector_handler not in supported_dataset_database_handlers:
raise EnvironmentError(
"Unsupported vector dataset to database handler configured. Cannot add support for multi-user access control mode. Please use a supported vector dataset to database handler or set the environment variable ENABLE_BACKEND_ACCESS_CONTROL to false to switch off multi-user access control mode.\n"
f"Selected vector dataset to database handler: {vector_handler}\n"
f"Supported dataset to database handlers: {list(supported_dataset_database_handlers.keys())}\n"
)
if (
supported_dataset_database_handlers[graph_handler]["handler_provider"]
!= graph_db_config.graph_database_provider
):
raise EnvironmentError(
"The selected graph dataset to database handler does not work with the configured graph database provider. Cannot add support for multi-user access control mode. Please use a supported graph dataset to database handler or set the environment variable ENABLE_BACKEND_ACCESS_CONTROL to false to switch off multi-user access control mode.\n"
f"Selected graph database provider: {graph_db_config.graph_database_provider}\n"
f"Selected graph dataset to database handler: {graph_handler}\n"
f"Supported dataset to database handlers: {list(supported_dataset_database_handlers.keys())}\n"
)
if (
supported_dataset_database_handlers[vector_handler]["handler_provider"]
!= vector_db_config.vector_db_provider
):
raise EnvironmentError(
"The selected vector dataset to database handler does not work with the configured vector database provider. Cannot add support for multi-user access control mode. Please use a supported vector dataset to database handler or set the environment variable ENABLE_BACKEND_ACCESS_CONTROL to false to switch off multi-user access control mode.\n"
f"Selected vector database provider: {vector_db_config.vector_db_provider}\n"
f"Selected vector dataset to database handler: {vector_handler}\n"
f"Supported dataset to database handlers: {list(supported_dataset_database_handlers.keys())}\n"
)
return True
def backend_access_control_enabled():
backend_access_control = os.environ.get("ENABLE_BACKEND_ACCESS_CONTROL", None)
@ -41,12 +79,7 @@ def backend_access_control_enabled():
return multi_user_support_possible()
elif backend_access_control.lower() == "true":
# If enabled, ensure that the current graph and vector DBs can support it
multi_user_support = multi_user_support_possible()
if not multi_user_support:
raise EnvironmentError(
"ENABLE_BACKEND_ACCESS_CONTROL is set to true but the current graph and/or vector databases do not support multi-user access control. Please use supported databases or disable backend access control."
)
return True
return multi_user_support_possible()
return False
@ -76,6 +109,8 @@ async def set_database_global_context_variables(dataset: Union[str, UUID], user_
# To ensure permissions are enforced properly, all datasets have their own databases
dataset_database = await get_or_create_dataset_database(dataset, user)
# Ensure that all connection info is resolved properly
dataset_database = await resolve_dataset_database_connection_info(dataset_database)
base_config = get_base_config()
data_root_directory = os.path.join(
@ -86,6 +121,8 @@ async def set_database_global_context_variables(dataset: Union[str, UUID], user_
)
# Set vector and graph database configuration based on dataset database information
# TODO: Add better handling of vector and graph config across Cognee.
# LRU_CACHE takes the order of inputs into account; if the order of inputs changes, it is registered as a new DB adapter
vector_config = {
"vector_db_provider": dataset_database.vector_database_provider,
"vector_db_url": dataset_database.vector_database_url,
@ -101,6 +138,14 @@ async def set_database_global_context_variables(dataset: Union[str, UUID], user_
"graph_file_path": os.path.join(
databases_directory_path, dataset_database.graph_database_name
),
"graph_database_username": dataset_database.graph_database_connection_info.get(
"graph_database_username", ""
),
"graph_database_password": dataset_database.graph_database_connection_info.get(
"graph_database_password", ""
),
"graph_dataset_database_handler": "",
"graph_database_port": "",
}
storage_config = {

View file

@ -8,7 +8,6 @@ from cognee.modules.users.models import User
from cognee.shared.data_models import KnowledgeGraph
from cognee.shared.utils import send_telemetry
from cognee.tasks.documents import (
check_permissions_on_dataset,
classify_documents,
extract_chunks_from_documents,
)
@ -31,7 +30,6 @@ async def get_cascade_graph_tasks(
cognee_config = get_cognify_config()
default_tasks = [
Task(classify_documents),
Task(check_permissions_on_dataset, user=user, permissions=["write"]),
Task(
extract_chunks_from_documents, max_chunk_tokens=get_max_chunk_tokens()
), # Extract text chunks based on the document type.

View file

@ -30,8 +30,8 @@ async def get_no_summary_tasks(
ontology_file_path=None,
) -> List[Task]:
"""Returns default tasks without summarization tasks."""
# Get base tasks (0=classify, 1=check_permissions, 2=extract_chunks)
base_tasks = await get_default_tasks_by_indices([0, 1, 2], chunk_size, chunker)
# Get base tasks (0=classify, 1=extract_chunks)
base_tasks = await get_default_tasks_by_indices([0, 1], chunk_size, chunker)
ontology_adapter = RDFLibOntologyResolver(ontology_file=ontology_file_path)
@ -51,8 +51,8 @@ async def get_just_chunks_tasks(
chunk_size: int = None, chunker=TextChunker, user=None
) -> List[Task]:
"""Returns default tasks with only chunk extraction and data points addition."""
# Get base tasks (0=classify, 1=check_permissions, 2=extract_chunks)
base_tasks = await get_default_tasks_by_indices([0, 1, 2], chunk_size, chunker)
# Get base tasks (0=classify, 1=extract_chunks)
base_tasks = await get_default_tasks_by_indices([0, 1], chunk_size, chunker)
add_data_points_task = Task(add_data_points, task_config={"batch_size": 10})

View file

@ -0,0 +1,3 @@
from .dataset_database_handler_interface import DatasetDatabaseHandlerInterface
from .supported_dataset_database_handlers import supported_dataset_database_handlers
from .use_dataset_database_handler import use_dataset_database_handler

View file

@ -0,0 +1,80 @@
from typing import Optional
from uuid import UUID
from abc import ABC, abstractmethod
from cognee.modules.users.models.User import User
from cognee.modules.users.models.DatasetDatabase import DatasetDatabase
class DatasetDatabaseHandlerInterface(ABC):
@classmethod
@abstractmethod
async def create_dataset(cls, dataset_id: Optional[UUID], user: Optional[User]) -> dict:
"""
Return a dictionary with database connection/resolution info for a graph or vector database for the given dataset.
The function may also provision the actual database, but doing so is optional.
Returning connection info alone is sufficient; this info will be mapped when connecting to the given dataset in the future.
Needed for Cognee multi-tenant/multi-user and backend access control support.
The dictionary returned from this function is used to create a DatasetDatabase row in the relational database,
from which the internal mapping of dataset -> database connection info is done.
The returned dictionary is stored verbatim in the relational database and is later passed to
resolve_dataset_connection_info() at connection time. For safe credential handling, prefer
returning only references to secrets or role identifiers, not plaintext credentials.
When backend access control is enabled, each dataset must map to a unique graph or vector database to enforce data separation.
Args:
dataset_id: UUID of the dataset if needed by the database creation logic
user: User object if needed by the database creation logic
Returns:
dict: Connection info for the created graph or vector database instance.
"""
pass
@classmethod
async def resolve_dataset_connection_info(
cls, dataset_database: DatasetDatabase
) -> DatasetDatabase:
"""
Resolve runtime connection details for a dataset's backing graph/vector database.
This method is intended to be overridden with custom logic for resolving connection info.
It is invoked right before the application opens a connection for a given dataset.
It receives the DatasetDatabase row that was persisted when create_dataset() ran and must
return a modified DatasetDatabase instance with concrete connection parameters that the client/driver can use.
Do not write these resolved values back to the relational database; that would persist sensitive credentials.
When separate graph and vector database handlers are used, each handler should implement its own resolution
logic and change only the parameters of its own database; the resolution functions are then called one after
another, each receiving the DatasetDatabase value updated by the previous one as input.
Typical behavior:
- If the DatasetDatabase row already contains raw connection fields (e.g., host/port/db/user/password
or api_url/api_key), return them as-is.
- If the row stores only references (e.g., secret IDs, vault paths, cloud resource ARNs/IDs, IAM
roles, SSO tokens), resolve those references by calling the appropriate secret manager or provider
API to obtain short-lived credentials and assemble the final connection DatasetDatabase object.
- Do not persist any resolved or decrypted secrets back to the relational database. Return them only
to the caller.
Args:
dataset_database: DatasetDatabase row from the relational database
Returns:
DatasetDatabase: Updated instance with resolved connection info
"""
return dataset_database
@classmethod
@abstractmethod
async def delete_dataset(cls, dataset_database: DatasetDatabase) -> None:
"""
Delete the graph or vector database for the given dataset.
The function should either delete the actual database itself or request that the appropriate service delete it (or mark it as no longer needed) for the given dataset.
Needed for database lifecycle maintenance in Cognee multi-tenant/multi-user and backend access control mode.
Args:
dataset_database: DatasetDatabase row containing connection/resolution info for the graph or vector database to delete.
"""
pass
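The three-method contract above (create, resolve, delete) can be illustrated with a self-contained sketch. The `HandlerInterface` and `InMemoryHandler` classes below are hypothetical stand-ins that mirror the interface's shape without importing the real `User`/`DatasetDatabase` models; plain dicts replace the ORM row:

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Optional
from uuid import UUID, uuid4


class HandlerInterface(ABC):
    # Mirrors the abstract interface above with dicts standing in for DatasetDatabase.
    @classmethod
    @abstractmethod
    async def create_dataset(cls, dataset_id: Optional[UUID], user) -> dict: ...

    @classmethod
    async def resolve_dataset_connection_info(cls, dataset_database: dict) -> dict:
        return dataset_database  # default: connection info is already concrete

    @classmethod
    @abstractmethod
    async def delete_dataset(cls, dataset_database: dict) -> None: ...


class InMemoryHandler(HandlerInterface):
    _stores: dict = {}  # toy stand-in for actual database instances

    @classmethod
    async def create_dataset(cls, dataset_id, user) -> dict:
        cls._stores[str(dataset_id)] = {}
        # Return only mapping info, no plaintext credentials
        return {
            "graph_database_name": str(dataset_id),
            "graph_database_provider": "in_memory",
        }

    @classmethod
    async def delete_dataset(cls, dataset_database) -> None:
        cls._stores.pop(dataset_database["graph_database_name"], None)


async def demo():
    dataset_id = uuid4()
    info = await InMemoryHandler.create_dataset(dataset_id, user=None)
    resolved = await InMemoryHandler.resolve_dataset_connection_info(info)
    await InMemoryHandler.delete_dataset(resolved)
    return info["graph_database_provider"], len(InMemoryHandler._stores)


print(asyncio.run(demo()))  # ('in_memory', 0)
```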

View file

@ -0,0 +1,18 @@
from cognee.infrastructure.databases.graph.neo4j_driver.Neo4jAuraDevDatasetDatabaseHandler import (
Neo4jAuraDevDatasetDatabaseHandler,
)
from cognee.infrastructure.databases.vector.lancedb.LanceDBDatasetDatabaseHandler import (
LanceDBDatasetDatabaseHandler,
)
from cognee.infrastructure.databases.graph.kuzu.KuzuDatasetDatabaseHandler import (
KuzuDatasetDatabaseHandler,
)
supported_dataset_database_handlers = {
"neo4j_aura_dev": {
"handler_instance": Neo4jAuraDevDatasetDatabaseHandler,
"handler_provider": "neo4j",
},
"lancedb": {"handler_instance": LanceDBDatasetDatabaseHandler, "handler_provider": "lancedb"},
"kuzu": {"handler_instance": KuzuDatasetDatabaseHandler, "handler_provider": "kuzu"},
}

View file

@ -0,0 +1,10 @@
from .supported_dataset_database_handlers import supported_dataset_database_handlers
def use_dataset_database_handler(
dataset_database_handler_name, dataset_database_handler, dataset_database_provider
):
supported_dataset_database_handlers[dataset_database_handler_name] = {
"handler_instance": dataset_database_handler,
"handler_provider": dataset_database_provider,
}
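`use_dataset_database_handler` simply mutates the module-level registry dict keyed by handler name. A standalone sketch of that registration pattern, with a local dict and the illustrative names `register_handler`/`MyPgVectorHandler` (not part of the codebase):

```python
# Local stand-in for supported_dataset_database_handlers
supported_handlers = {
    "kuzu": {"handler_instance": object, "handler_provider": "kuzu"},
}


def register_handler(name, handler, provider):
    # Same shape as use_dataset_database_handler: name -> instance + provider
    supported_handlers[name] = {
        "handler_instance": handler,
        "handler_provider": provider,
    }


class MyPgVectorHandler:  # placeholder custom handler class
    pass


register_handler("my_pgvector", MyPgVectorHandler, "pgvector")
print(sorted(supported_handlers))  # ['kuzu', 'my_pgvector']
```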

View file

@ -47,6 +47,7 @@ class GraphConfig(BaseSettings):
graph_filename: str = ""
graph_model: object = KnowledgeGraph
graph_topology: object = KnowledgeGraph
graph_dataset_database_handler: str = "kuzu"
model_config = SettingsConfigDict(env_file=".env", extra="allow", populate_by_name=True)
# Model validator updates graph_filename and path dynamically after class creation based on current database provider
@ -97,6 +98,7 @@ class GraphConfig(BaseSettings):
"graph_model": self.graph_model,
"graph_topology": self.graph_topology,
"model_config": self.model_config,
"graph_dataset_database_handler": self.graph_dataset_database_handler,
}
def to_hashable_dict(self) -> dict:
@ -121,6 +123,7 @@ class GraphConfig(BaseSettings):
"graph_database_port": self.graph_database_port,
"graph_database_key": self.graph_database_key,
"graph_file_path": self.graph_file_path,
"graph_dataset_database_handler": self.graph_dataset_database_handler,
}

View file

@ -34,6 +34,7 @@ def create_graph_engine(
graph_database_password="",
graph_database_port="",
graph_database_key="",
graph_dataset_database_handler="",
):
"""
Create a graph engine based on the specified provider type.

View file

@ -0,0 +1,81 @@
import os
from uuid import UUID
from typing import Optional
from cognee.infrastructure.databases.graph.get_graph_engine import create_graph_engine
from cognee.base_config import get_base_config
from cognee.modules.users.models import User
from cognee.modules.users.models import DatasetDatabase
from cognee.infrastructure.databases.dataset_database_handler import DatasetDatabaseHandlerInterface
class KuzuDatasetDatabaseHandler(DatasetDatabaseHandlerInterface):
"""
Handler for interacting with Kuzu Dataset databases.
"""
@classmethod
async def create_dataset(cls, dataset_id: Optional[UUID], user: Optional[User]) -> dict:
"""
Create a new Kuzu instance for the dataset. Return connection info that will be mapped to the dataset.
Args:
dataset_id: Dataset UUID
user: User object who owns the dataset and is making the request
Returns:
dict: Connection details for the created Kuzu instance
"""
from cognee.infrastructure.databases.graph.config import get_graph_config
graph_config = get_graph_config()
if graph_config.graph_database_provider != "kuzu":
raise ValueError(
"KuzuDatasetDatabaseHandler can only be used with Kuzu graph database provider."
)
graph_db_name = f"{dataset_id}.pkl"
graph_db_url = graph_config.graph_database_url
graph_db_key = graph_config.graph_database_key
graph_db_username = graph_config.graph_database_username
graph_db_password = graph_config.graph_database_password
return {
"graph_database_name": graph_db_name,
"graph_database_url": graph_db_url,
"graph_database_provider": graph_config.graph_database_provider,
"graph_database_key": graph_db_key,
"graph_dataset_database_handler": "kuzu",
"graph_database_connection_info": {
"graph_database_username": graph_db_username,
"graph_database_password": graph_db_password,
},
}
@classmethod
async def delete_dataset(cls, dataset_database: DatasetDatabase):
base_config = get_base_config()
databases_directory_path = os.path.join(
base_config.system_root_directory, "databases", str(dataset_database.owner_id)
)
graph_file_path = os.path.join(
databases_directory_path, dataset_database.graph_database_name
)
graph_engine = create_graph_engine(
graph_database_provider=dataset_database.graph_database_provider,
graph_database_url=dataset_database.graph_database_url,
graph_database_name=dataset_database.graph_database_name,
graph_database_key=dataset_database.graph_database_key,
graph_file_path=graph_file_path,
graph_database_username=dataset_database.graph_database_connection_info.get(
"graph_database_username", ""
),
graph_database_password=dataset_database.graph_database_connection_info.get(
"graph_database_password", ""
),
graph_dataset_database_handler="",
graph_database_port="",
)
await graph_engine.delete_graph()

View file

@ -2005,3 +2005,134 @@ class KuzuAdapter(GraphDBInterface):
time_ids_list = [item[0] for item in time_nodes]
return ", ".join(f"'{uid}'" for uid in time_ids_list)
async def get_triplets_batch(self, offset: int, limit: int) -> list[dict[str, Any]]:
"""
Retrieve a batch of triplets (start_node, relationship, end_node) from the graph.
Parameters:
-----------
- offset (int): Number of triplets to skip before returning results.
- limit (int): Maximum number of triplets to return.
Returns:
--------
- list[dict[str, Any]]: A list of triplets, where each triplet is a dictionary
with keys: 'start_node', 'relationship_properties', 'end_node'.
Raises:
-------
- ValueError: If offset or limit are negative.
- Exception: Re-raises any exceptions from query execution.
"""
if offset < 0:
raise ValueError(f"Offset must be non-negative, got {offset}")
if limit < 0:
raise ValueError(f"Limit must be non-negative, got {limit}")
query = """
MATCH (start_node:Node)-[relationship:EDGE]->(end_node:Node)
RETURN {
start_node: {
id: start_node.id,
name: start_node.name,
type: start_node.type,
properties: start_node.properties
},
relationship_properties: {
relationship_name: relationship.relationship_name,
properties: relationship.properties
},
end_node: {
id: end_node.id,
name: end_node.name,
type: end_node.type,
properties: end_node.properties
}
} AS triplet
SKIP $offset LIMIT $limit
"""
try:
results = await self.query(query, {"offset": offset, "limit": limit})
except Exception as e:
logger.error(f"Failed to execute triplet query: {str(e)}")
logger.error(f"Query: {query}")
logger.error(f"Parameters: offset={offset}, limit={limit}")
raise
triplets = []
for idx, row in enumerate(results):
try:
if not row or len(row) == 0:
logger.warning(f"Skipping empty row at index {idx} in triplet batch")
continue
if not isinstance(row[0], dict):
logger.warning(
f"Skipping invalid row at index {idx}: expected dict, got {type(row[0])}"
)
continue
triplet = row[0]
if "start_node" not in triplet:
logger.warning(f"Skipping triplet at index {idx}: missing 'start_node' key")
continue
if not isinstance(triplet["start_node"], dict):
logger.warning(f"Skipping triplet at index {idx}: 'start_node' is not a dict")
continue
triplet["start_node"] = self._parse_node_properties(triplet["start_node"].copy())
if "relationship_properties" not in triplet:
logger.warning(
f"Skipping triplet at index {idx}: missing 'relationship_properties' key"
)
continue
if not isinstance(triplet["relationship_properties"], dict):
logger.warning(
f"Skipping triplet at index {idx}: 'relationship_properties' is not a dict"
)
continue
rel_props = triplet["relationship_properties"].copy()
relationship_name = rel_props.get("relationship_name") or ""
if rel_props.get("properties"):
try:
parsed_props = json.loads(rel_props["properties"])
if isinstance(parsed_props, dict):
rel_props.update(parsed_props)
del rel_props["properties"]
else:
logger.warning(
f"Parsed relationship properties is not a dict for triplet at index {idx}"
)
except (json.JSONDecodeError, TypeError) as e:
logger.warning(
f"Failed to parse relationship properties JSON for triplet at index {idx}: {e}"
)
rel_props["relationship_name"] = relationship_name
triplet["relationship_properties"] = rel_props
if "end_node" not in triplet:
logger.warning(f"Skipping triplet at index {idx}: missing 'end_node' key")
continue
if not isinstance(triplet["end_node"], dict):
logger.warning(f"Skipping triplet at index {idx}: 'end_node' is not a dict")
continue
triplet["end_node"] = self._parse_node_properties(triplet["end_node"].copy())
triplets.append(triplet)
except Exception as e:
logger.error(f"Error processing triplet at index {idx}: {e}", exc_info=True)
continue
return triplets
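A caller would typically page through `get_triplets_batch` until an empty batch signals the end of the graph. A minimal sketch of that loop, assuming an async fetch function with the same `(offset, limit)` signature (`fake_fetch` below stands in for the adapter method):

```python
import asyncio

# Toy data standing in for graph triplets
DATA = [{"start_node": {"id": i}} for i in range(25)]


async def fake_fetch(offset: int, limit: int):
    # Mimics get_triplets_batch's offset/limit pagination
    return DATA[offset : offset + limit]


async def iterate_all(batch_size: int = 10):
    offset, out = 0, []
    while True:
        batch = await fake_fetch(offset, batch_size)
        if not batch:  # empty batch means we are past the last triplet
            break
        out.extend(batch)
        offset += batch_size
    return out


print(len(asyncio.run(iterate_all())))  # 25
```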

View file

@ -0,0 +1,168 @@
import os
import asyncio
import requests
import base64
import hashlib
from uuid import UUID
from typing import Optional
from cryptography.fernet import Fernet
from cognee.infrastructure.databases.graph import get_graph_config
from cognee.modules.users.models import User, DatasetDatabase
from cognee.infrastructure.databases.dataset_database_handler import DatasetDatabaseHandlerInterface
class Neo4jAuraDevDatasetDatabaseHandler(DatasetDatabaseHandlerInterface):
"""
Handler for a quick development PoC integration of Cognee multi-user and permission mode with Neo4j Aura databases.
This handler creates a new Neo4j Aura instance for each Cognee dataset created.
Improvements needed to be production ready:
- Secret management for client credentials: secrets are currently encrypted and stored in the Cognee relational database;
a secret manager or a similar system should be used instead.
Quality of life improvements:
- Allow configuration of different Neo4j Aura plans and regions.
- Requests should be made async, currently a blocking requests library is used.
"""
@classmethod
async def create_dataset(cls, dataset_id: Optional[UUID], user: Optional[User]) -> dict:
"""
Create a new Neo4j Aura instance for the dataset. Return connection info that will be mapped to the dataset.
Args:
dataset_id: Dataset UUID
user: User object who owns the dataset and is making the request
Returns:
dict: Connection details for the created Neo4j instance
"""
graph_config = get_graph_config()
if graph_config.graph_database_provider != "neo4j":
raise ValueError(
"Neo4jAuraDevDatasetDatabaseHandler can only be used with Neo4j graph database provider."
)
graph_db_name = f"{dataset_id}"
# Client credentials and encryption
client_id = os.environ.get("NEO4J_CLIENT_ID", None)
client_secret = os.environ.get("NEO4J_CLIENT_SECRET", None)
tenant_id = os.environ.get("NEO4J_TENANT_ID", None)
encryption_env_key = os.environ.get("NEO4J_ENCRYPTION_KEY", "test_key")
encryption_key = base64.urlsafe_b64encode(
hashlib.sha256(encryption_env_key.encode()).digest()
)
cipher = Fernet(encryption_key)
if client_id is None or client_secret is None or tenant_id is None:
raise ValueError(
"NEO4J_CLIENT_ID, NEO4J_CLIENT_SECRET, and NEO4J_TENANT_ID environment variables must be set to use Neo4j Aura DatasetDatabase Handling."
)
# Make the request with HTTP Basic Auth
def get_aura_token(client_id: str, client_secret: str) -> dict:
url = "https://api.neo4j.io/oauth/token"
data = {"grant_type": "client_credentials"} # sent as application/x-www-form-urlencoded
resp = requests.post(url, data=data, auth=(client_id, client_secret))
resp.raise_for_status() # raises if the request failed
return resp.json()
resp = get_aura_token(client_id, client_secret)
url = "https://api.neo4j.io/v1/instances"
headers = {
"accept": "application/json",
"Authorization": f"Bearer {resp['access_token']}",
"Content-Type": "application/json",
}
# TODO: Maybe we can allow **kwargs parameter forwarding for cases like these
# To allow different configurations between datasets
payload = {
"version": "5",
"region": "europe-west1",
"memory": "1GB",
"name": graph_db_name[
0:29
], # TODO: Find a better scheme to name the Neo4j instance within the 30-character limit
"type": "professional-db",
"tenant_id": tenant_id,
"cloud_provider": "gcp",
}
response = requests.post(url, headers=headers, json=payload)
graph_db_name = "neo4j" # Has to be 'neo4j' for Aura
graph_db_url = response.json()["data"]["connection_url"]
graph_db_key = resp["access_token"]
graph_db_username = response.json()["data"]["username"]
graph_db_password = response.json()["data"]["password"]
async def _wait_for_neo4j_instance_provisioning(instance_id: str, headers: dict):
# Poll until the instance is running
status_url = f"https://api.neo4j.io/v1/instances/{instance_id}"
status = ""
for attempt in range(30): # Try for up to ~5 minutes
status_resp = requests.get(
status_url, headers=headers
) # TODO: Use async requests with httpx
status = status_resp.json()["data"]["status"]
if status.lower() == "running":
return
await asyncio.sleep(10)
raise TimeoutError(
f"Neo4j instance '{graph_db_name}' did not become ready within 5 minutes. Status: {status}"
)
instance_id = response.json()["data"]["id"]
await _wait_for_neo4j_instance_provisioning(instance_id, headers)
encrypted_db_password_bytes = cipher.encrypt(graph_db_password.encode())
encrypted_db_password_string = encrypted_db_password_bytes.decode()
return {
"graph_database_name": graph_db_name,
"graph_database_url": graph_db_url,
"graph_database_provider": "neo4j",
"graph_database_key": graph_db_key,
"graph_dataset_database_handler": "neo4j_aura_dev",
"graph_database_connection_info": {
"graph_database_username": graph_db_username,
"graph_database_password": encrypted_db_password_string,
},
}
@classmethod
async def resolve_dataset_connection_info(
cls, dataset_database: DatasetDatabase
) -> DatasetDatabase:
"""
Resolve and decrypt connection info for the Neo4j dataset database.
In this case, decrypt the password stored in the database.
Args:
dataset_database: DatasetDatabase instance containing encrypted connection info.
"""
encryption_env_key = os.environ.get("NEO4J_ENCRYPTION_KEY", "test_key")
encryption_key = base64.urlsafe_b64encode(
hashlib.sha256(encryption_env_key.encode()).digest()
)
cipher = Fernet(encryption_key)
graph_db_password = cipher.decrypt(
dataset_database.graph_database_connection_info["graph_database_password"].encode()
).decode()
dataset_database.graph_database_connection_info["graph_database_password"] = (
graph_db_password
)
return dataset_database
@classmethod
async def delete_dataset(cls, dataset_database: DatasetDatabase):
pass
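The encryption in this handler hinges on one derivation step: Fernet requires a 32-byte, urlsafe-base64-encoded key, so the arbitrary `NEO4J_ENCRYPTION_KEY` string is hashed with SHA-256 first. A sketch of just that step using only the standard library (the Fernet round-trip itself needs the third-party `cryptography` package and is omitted here):

```python
import base64
import hashlib


def derive_fernet_key(passphrase: str) -> bytes:
    # SHA-256 always yields exactly 32 bytes, regardless of passphrase length
    digest = hashlib.sha256(passphrase.encode()).digest()
    # Fernet expects the 32 bytes urlsafe-base64 encoded (44 characters)
    return base64.urlsafe_b64encode(digest)


key = derive_fernet_key("test_key")
print(len(key))  # 44
```

The derivation is deterministic, which is what lets `resolve_dataset_connection_info` rebuild the same cipher later from the environment variable alone.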

View file

@ -8,7 +8,7 @@ from neo4j import AsyncSession
from neo4j import AsyncGraphDatabase
from neo4j.exceptions import Neo4jError
from contextlib import asynccontextmanager
from typing import Optional, Any, List, Dict, Type, Tuple
from typing import Optional, Any, List, Dict, Type, Tuple, Coroutine
from cognee.infrastructure.engine import DataPoint
from cognee.modules.engine.utils.generate_timestamp_datapoint import date_to_int
@ -1527,3 +1527,25 @@ class Neo4jAdapter(GraphDBInterface):
time_ids_list = [item["id"] for item in time_nodes if "id" in item]
return ", ".join(f"'{uid}'" for uid in time_ids_list)
async def get_triplets_batch(self, offset: int, limit: int) -> list[dict[str, Any]]:
"""
Retrieve a batch of triplets (start_node, relationship, end_node) from the graph.
Parameters:
-----------
- offset (int): Number of triplets to skip before returning results.
- limit (int): Maximum number of triplets to return.
Returns:
--------
- list[dict[str, Any]]: A list of triplets.
"""
query = f"""
MATCH (start_node:`{BASE_LABEL}`)-[relationship]->(end_node:`{BASE_LABEL}`)
RETURN start_node, properties(relationship) AS relationship_properties, end_node
SKIP $offset LIMIT $limit
"""
results = await self.query(query, {"offset": offset, "limit": limit})
return results

View file

@ -1 +1,4 @@
from .get_or_create_dataset_database import get_or_create_dataset_database
from .resolve_dataset_database_connection_info import resolve_dataset_database_connection_info
from .get_graph_dataset_database_handler import get_graph_dataset_database_handler
from .get_vector_dataset_database_handler import get_vector_dataset_database_handler

View file

@ -0,0 +1,10 @@
from cognee.modules.users.models.DatasetDatabase import DatasetDatabase
def get_graph_dataset_database_handler(dataset_database: DatasetDatabase) -> dict:
from cognee.infrastructure.databases.dataset_database_handler.supported_dataset_database_handlers import (
supported_dataset_database_handlers,
)
handler = supported_dataset_database_handlers[dataset_database.graph_dataset_database_handler]
return handler

View file

@ -1,11 +1,9 @@
import os
from uuid import UUID
from typing import Union
from typing import Union, Optional
from sqlalchemy import select
from sqlalchemy.exc import IntegrityError
from cognee.base_config import get_base_config
from cognee.modules.data.methods import create_dataset
from cognee.infrastructure.databases.relational import get_relational_engine
from cognee.infrastructure.databases.vector import get_vectordb_config
@ -15,6 +13,53 @@ from cognee.modules.users.models import DatasetDatabase
from cognee.modules.users.models import User
async def _get_vector_db_info(dataset_id: UUID, user: User) -> dict:
vector_config = get_vectordb_config()
from cognee.infrastructure.databases.dataset_database_handler.supported_dataset_database_handlers import (
supported_dataset_database_handlers,
)
handler = supported_dataset_database_handlers[vector_config.vector_dataset_database_handler]
return await handler["handler_instance"].create_dataset(dataset_id, user)
async def _get_graph_db_info(dataset_id: UUID, user: User) -> dict:
graph_config = get_graph_config()
from cognee.infrastructure.databases.dataset_database_handler.supported_dataset_database_handlers import (
supported_dataset_database_handlers,
)
handler = supported_dataset_database_handlers[graph_config.graph_dataset_database_handler]
return await handler["handler_instance"].create_dataset(dataset_id, user)
async def _existing_dataset_database(
dataset_id: UUID,
user: User,
) -> Optional[DatasetDatabase]:
"""
Check if a DatasetDatabase row already exists for the given owner + dataset.
Return the row if it exists; otherwise return None.
Args:
dataset_id:
user:
Returns:
DatasetDatabase or None
"""
db_engine = get_relational_engine()
async with db_engine.get_async_session() as session:
stmt = select(DatasetDatabase).where(
DatasetDatabase.owner_id == user.id,
DatasetDatabase.dataset_id == dataset_id,
)
existing: DatasetDatabase = await session.scalar(stmt)
return existing
async def get_or_create_dataset_database(
dataset: Union[str, UUID],
user: User,
@ -25,6 +70,8 @@ async def get_or_create_dataset_database(
If the row already exists, it is fetched and returned.
Otherwise a new one is created atomically and returned.
DatasetDatabase row contains connection and provider info for vector and graph databases.
Parameters
----------
user : User
@ -36,59 +83,26 @@ async def get_or_create_dataset_database(
dataset_id = await get_unique_dataset_id(dataset, user)
vector_config = get_vectordb_config()
graph_config = get_graph_config()
# If dataset is given as name make sure the dataset is created first
if isinstance(dataset, str):
async with db_engine.get_async_session() as session:
await create_dataset(dataset, user, session)
# Note: for hybrid databases both graph and vector DB name have to be the same
if graph_config.graph_database_provider == "kuzu":
graph_db_name = f"{dataset_id}.pkl"
else:
graph_db_name = f"{dataset_id}"
# If dataset database already exists return it
existing_dataset_database = await _existing_dataset_database(dataset_id, user)
if existing_dataset_database:
return existing_dataset_database
if vector_config.vector_db_provider == "lancedb":
vector_db_name = f"{dataset_id}.lance.db"
else:
vector_db_name = f"{dataset_id}"
base_config = get_base_config()
databases_directory_path = os.path.join(
base_config.system_root_directory, "databases", str(user.id)
)
# Determine vector database URL
if vector_config.vector_db_provider == "lancedb":
vector_db_url = os.path.join(databases_directory_path, vector_config.vector_db_name)
else:
vector_db_url = vector_config.vector_database_url
# Determine graph database URL
graph_config_dict = await _get_graph_db_info(dataset_id, user)
vector_config_dict = await _get_vector_db_info(dataset_id, user)
async with db_engine.get_async_session() as session:
# Create dataset if it doesn't exist
if isinstance(dataset, str):
dataset = await create_dataset(dataset, user, session)
# Try to fetch an existing row first
stmt = select(DatasetDatabase).where(
DatasetDatabase.owner_id == user.id,
DatasetDatabase.dataset_id == dataset_id,
)
existing: DatasetDatabase = await session.scalar(stmt)
if existing:
return existing
# If there is no existing row, build a new one
record = DatasetDatabase(
owner_id=user.id,
dataset_id=dataset_id,
vector_database_name=vector_db_name,
graph_database_name=graph_db_name,
vector_database_provider=vector_config.vector_db_provider,
graph_database_provider=graph_config.graph_database_provider,
vector_database_url=vector_db_url,
graph_database_url=graph_config.graph_database_url,
vector_database_key=vector_config.vector_db_key,
graph_database_key=graph_config.graph_database_key,
**graph_config_dict, # Unpack graph db config
**vector_config_dict, # Unpack vector db config
)
try:
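The hunk is truncated at `try:`, which presumably guards the commit against a concurrent writer inserting the same row first. The general insert-or-fetch idiom it follows can be sketched with an in-memory stand-in (this is not the repo's SQLAlchemy code; names are illustrative):

```python
import threading

_rows = {}  # (owner_id, dataset_id) -> row dict
_lock = threading.Lock()


def get_or_create(owner_id, dataset_id, factory):
    """Insert-or-fetch: the first writer wins, racers get the existing row."""
    with _lock:
        key = (owner_id, dataset_id)
        if key in _rows:
            # Analogous to the early `if existing: return existing` above.
            return _rows[key]
        # Analogous to committing the new DatasetDatabase row; in the real
        # code the race is resolved by catching the unique-constraint error.
        _rows[key] = factory()
        return _rows[key]
```

With a database, the lock is replaced by a unique constraint on `(owner_id, dataset_id)` and an `IntegrityError` handler that re-fetches the winner's row.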

View file

@ -0,0 +1,10 @@
from cognee.modules.users.models.DatasetDatabase import DatasetDatabase
def get_vector_dataset_database_handler(dataset_database: DatasetDatabase) -> dict:
from cognee.infrastructure.databases.dataset_database_handler.supported_dataset_database_handlers import (
supported_dataset_database_handlers,
)
handler = supported_dataset_database_handlers[dataset_database.vector_dataset_database_handler]
return handler

View file

@ -0,0 +1,30 @@
from cognee.infrastructure.databases.utils.get_graph_dataset_database_handler import (
get_graph_dataset_database_handler,
)
from cognee.infrastructure.databases.utils.get_vector_dataset_database_handler import (
get_vector_dataset_database_handler,
)
from cognee.modules.users.models.DatasetDatabase import DatasetDatabase
async def resolve_dataset_database_connection_info(
dataset_database: DatasetDatabase,
) -> DatasetDatabase:
"""
Resolve the connection info for the given DatasetDatabase instance.
Resolve both vector and graph database connection info and return the updated DatasetDatabase instance.
Args:
dataset_database: DatasetDatabase instance
Returns:
DatasetDatabase instance with resolved connection info
"""
vector_dataset_database_handler = get_vector_dataset_database_handler(dataset_database)
graph_dataset_database_handler = get_graph_dataset_database_handler(dataset_database)
dataset_database = await vector_dataset_database_handler[
"handler_instance"
].resolve_dataset_connection_info(dataset_database)
dataset_database = await graph_dataset_database_handler[
"handler_instance"
].resolve_dataset_connection_info(dataset_database)
return dataset_database

View file

@ -28,6 +28,7 @@ class VectorConfig(BaseSettings):
vector_db_name: str = ""
vector_db_key: str = ""
vector_db_provider: str = "lancedb"
vector_dataset_database_handler: str = "lancedb"
model_config = SettingsConfigDict(env_file=".env", extra="allow")
@ -63,6 +64,7 @@ class VectorConfig(BaseSettings):
"vector_db_name": self.vector_db_name,
"vector_db_key": self.vector_db_key,
"vector_db_provider": self.vector_db_provider,
"vector_dataset_database_handler": self.vector_dataset_database_handler,
}

View file

@ -12,6 +12,7 @@ def create_vector_engine(
vector_db_name: str,
vector_db_port: str = "",
vector_db_key: str = "",
vector_dataset_database_handler: str = "",
):
"""
Create a vector database engine based on the specified provider.

View file

@ -17,6 +17,7 @@ from cognee.infrastructure.databases.exceptions import EmbeddingException
from cognee.infrastructure.llm.tokenizer.TikToken import (
TikTokenTokenizer,
)
from cognee.shared.rate_limiting import embedding_rate_limiter_context_manager
litellm.set_verbose = False
logger = get_logger("FastembedEmbeddingEngine")
@ -68,7 +69,7 @@ class FastembedEmbeddingEngine(EmbeddingEngine):
@retry(
stop=stop_after_delay(128),
wait=wait_exponential_jitter(2, 128),
wait=wait_exponential_jitter(8, 128),
retry=retry_if_not_exception_type(litellm.exceptions.NotFoundError),
before_sleep=before_sleep_log(logger, logging.DEBUG),
reraise=True,
@ -96,11 +97,12 @@ class FastembedEmbeddingEngine(EmbeddingEngine):
if self.mock:
return [[0.0] * self.dimensions for _ in text]
else:
embeddings = self.embedding_model.embed(
text,
batch_size=len(text),
parallel=None,
)
async with embedding_rate_limiter_context_manager():
embeddings = self.embedding_model.embed(
text,
batch_size=len(text),
parallel=None,
)
return list(embeddings)
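The change wraps the embedding call in `embedding_rate_limiter_context_manager()`. Its implementation is not shown in this diff; one plausible shape is a semaphore-backed async context manager (the names and the concurrency limit here are illustrative assumptions, and module-level `asyncio.Semaphore` creation assumes Python 3.10+):

```python
import asyncio
from contextlib import asynccontextmanager

# Allow at most 2 in-flight embedding calls (illustrative limit).
_semaphore = asyncio.Semaphore(2)


@asynccontextmanager
async def embedding_limiter():
    # Acquire a slot before calling the embedding backend; the slot is
    # released automatically when the `async with` block exits.
    async with _semaphore:
        yield


async def embed(text: str) -> list:
    async with embedding_limiter():
        return [0.0] * 3  # stand-in for the real embedding call
```

A context manager keeps the limiting concern out of each engine's call site, which is why the diff only adds one `async with` line per adapter.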

View file

@ -25,6 +25,7 @@ from cognee.infrastructure.llm.tokenizer.Mistral import (
from cognee.infrastructure.llm.tokenizer.TikToken import (
TikTokenTokenizer,
)
from cognee.shared.rate_limiting import embedding_rate_limiter_context_manager
litellm.set_verbose = False
logger = get_logger("LiteLLMEmbeddingEngine")
@ -109,13 +110,14 @@ class LiteLLMEmbeddingEngine(EmbeddingEngine):
response = {"data": [{"embedding": [0.0] * self.dimensions} for _ in text]}
return [data["embedding"] for data in response["data"]]
else:
response = await litellm.aembedding(
model=self.model,
input=text,
api_key=self.api_key,
api_base=self.endpoint,
api_version=self.api_version,
)
async with embedding_rate_limiter_context_manager():
response = await litellm.aembedding(
model=self.model,
input=text,
api_key=self.api_key,
api_base=self.endpoint,
api_version=self.api_version,
)
return [data["embedding"] for data in response.data]

View file

@ -18,10 +18,7 @@ from cognee.infrastructure.databases.vector.embeddings.EmbeddingEngine import Em
from cognee.infrastructure.llm.tokenizer.HuggingFace import (
HuggingFaceTokenizer,
)
from cognee.infrastructure.databases.vector.embeddings.embedding_rate_limiter import (
embedding_rate_limit_async,
embedding_sleep_and_retry_async,
)
from cognee.shared.rate_limiting import embedding_rate_limiter_context_manager
from cognee.shared.utils import create_secure_ssl_context
logger = get_logger("OllamaEmbeddingEngine")
@ -101,7 +98,7 @@ class OllamaEmbeddingEngine(EmbeddingEngine):
@retry(
stop=stop_after_delay(128),
wait=wait_exponential_jitter(2, 128),
wait=wait_exponential_jitter(8, 128),
retry=retry_if_not_exception_type(litellm.exceptions.NotFoundError),
before_sleep=before_sleep_log(logger, logging.DEBUG),
reraise=True,
@ -120,11 +117,15 @@ class OllamaEmbeddingEngine(EmbeddingEngine):
ssl_context = create_secure_ssl_context()
connector = aiohttp.TCPConnector(ssl=ssl_context)
async with aiohttp.ClientSession(connector=connector) as session:
async with session.post(
self.endpoint, json=payload, headers=headers, timeout=60.0
) as response:
data = await response.json()
return data["embeddings"][0]
async with embedding_rate_limiter_context_manager():
async with session.post(
self.endpoint, json=payload, headers=headers, timeout=60.0
) as response:
data = await response.json()
if "embeddings" in data:
return data["embeddings"][0]
else:
return data["data"][0]["embedding"]
def get_vector_size(self) -> int:
"""

View file

@ -1,544 +0,0 @@
import threading
import logging
import functools
import os
import time
import asyncio
import random
from cognee.shared.logging_utils import get_logger
from cognee.infrastructure.llm.config import get_llm_config
logger = get_logger()
# Common error patterns that indicate rate limiting
RATE_LIMIT_ERROR_PATTERNS = [
"rate limit",
"rate_limit",
"ratelimit",
"too many requests",
"retry after",
"capacity",
"quota",
"limit exceeded",
"tps limit exceeded",
"request limit exceeded",
"maximum requests",
"exceeded your current quota",
"throttled",
"throttling",
]
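Detection against these patterns is a plain substring match on the lowered error message; a minimal sketch of that check (helper name is hypothetical):

```python
RATE_LIMIT_ERROR_PATTERNS = [
    "rate limit", "rate_limit", "ratelimit", "too many requests",
    "retry after", "capacity", "quota", "limit exceeded",
    "throttled", "throttling",
]


def looks_like_rate_limit(error: Exception) -> bool:
    # Lowercase once, then test every known pattern as a substring.
    message = str(error).lower()
    return any(pattern in message for pattern in RATE_LIMIT_ERROR_PATTERNS)
```

String matching is brittle but provider-agnostic, which is why the list covers the phrasing of several APIs rather than specific exception types.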
# Default retry settings
DEFAULT_MAX_RETRIES = 5
DEFAULT_INITIAL_BACKOFF = 1.0 # seconds
DEFAULT_BACKOFF_FACTOR = 2.0 # exponential backoff multiplier
DEFAULT_JITTER = 0.1 # 10% jitter to avoid thundering herd
class EmbeddingRateLimiter:
"""
Rate limiter for embedding API calls.
This class implements a singleton pattern to ensure that rate limiting
is consistent across all embedding requests. It uses the limits
library with a moving window strategy to control request rates.
The rate limiter uses the same configuration as the LLM API rate limiter
but uses a separate key to track embedding API calls independently.
Public Methods:
- get_instance
- reset_instance
- hit_limit
- wait_if_needed
- async_wait_if_needed
Instance Variables:
- enabled
- requests_limit
- interval_seconds
- request_times
- lock
"""
_instance = None
lock = threading.Lock()
@classmethod
def get_instance(cls):
"""
Retrieve the singleton instance of the EmbeddingRateLimiter.
This method ensures that only one instance of the class exists and
is thread-safe. It lazily initializes the instance if it doesn't
already exist.
Returns:
--------
The singleton instance of the EmbeddingRateLimiter class.
"""
if cls._instance is None:
with cls.lock:
if cls._instance is None:
cls._instance = cls()
return cls._instance
@classmethod
def reset_instance(cls):
"""
Reset the singleton instance of the EmbeddingRateLimiter.
This method is thread-safe and sets the instance to None, allowing
for a new instance to be created when requested again.
"""
with cls.lock:
cls._instance = None
def __init__(self):
config = get_llm_config()
self.enabled = config.embedding_rate_limit_enabled
self.requests_limit = config.embedding_rate_limit_requests
self.interval_seconds = config.embedding_rate_limit_interval
self.request_times = []
self.lock = threading.Lock()
logging.info(
f"EmbeddingRateLimiter initialized: enabled={self.enabled}, "
f"requests_limit={self.requests_limit}, interval_seconds={self.interval_seconds}"
)
def hit_limit(self) -> bool:
"""
Check if the current request would exceed the rate limit.
This method checks if the rate limiter is enabled and evaluates
the number of requests made in the elapsed interval.
Returns:
--------
- bool: True if the rate limit would be exceeded, False otherwise.
"""
if not self.enabled:
return False
current_time = time.time()
with self.lock:
# Remove expired request times
cutoff_time = current_time - self.interval_seconds
self.request_times = [t for t in self.request_times if t > cutoff_time]
# Check if adding a new request would exceed the limit
if len(self.request_times) >= self.requests_limit:
logger.info(
f"Rate limit hit: {len(self.request_times)} requests in the last {self.interval_seconds} seconds"
)
return True
# Otherwise, we're under the limit
return False
def wait_if_needed(self) -> float:
"""
Block until a request can be made without exceeding the rate limit.
This method will wait if the current request would exceed the
rate limit and returns the time waited in seconds.
Returns:
--------
- float: Time waited in seconds before a request is allowed.
"""
if not self.enabled:
return 0
wait_time = 0
start_time = time.time()
while self.hit_limit():
time.sleep(0.5) # Poll every 0.5 seconds
wait_time = time.time() - start_time
# Record this request
with self.lock:
self.request_times.append(time.time())
return wait_time
async def async_wait_if_needed(self) -> float:
"""
Asynchronously wait until a request can be made without exceeding the rate limit.
This method will wait if the current request would exceed the
rate limit and returns the time waited in seconds.
Returns:
--------
- float: Time waited in seconds before a request is allowed.
"""
if not self.enabled:
return 0
wait_time = 0
start_time = time.time()
while self.hit_limit():
await asyncio.sleep(0.5) # Poll every 0.5 seconds
wait_time = time.time() - start_time
# Record this request
with self.lock:
self.request_times.append(time.time())
return wait_time
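The `hit_limit` logic above is a moving-window counter: drop timestamps older than the interval, then compare the remaining count against the limit. A condensed, thread-unsafe sketch (the explicit `now` parameter is added here purely for testability):

```python
import time


class MovingWindowLimiter:
    """Condensed version of the hit_limit / record logic above."""

    def __init__(self, requests_limit, interval_seconds):
        self.requests_limit = requests_limit
        self.interval_seconds = interval_seconds
        self.request_times = []

    def hit_limit(self, now=None):
        now = time.time() if now is None else now
        cutoff = now - self.interval_seconds
        # Drop timestamps that fell out of the window, then compare.
        self.request_times = [t for t in self.request_times if t > cutoff]
        return len(self.request_times) >= self.requests_limit

    def record(self, now=None):
        self.request_times.append(time.time() if now is None else now)
```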
def embedding_rate_limit_sync(func):
"""
Apply rate limiting to a synchronous embedding function.
Parameters:
-----------
- func: Function to decorate with rate limiting logic.
Returns:
--------
Returns the decorated function that applies rate limiting.
"""
@functools.wraps(func)
def wrapper(*args, **kwargs):
"""
Wrap the given function with rate limiting logic to control the embedding API usage.
Checks if the rate limit has been exceeded before allowing the function to execute. If
the limit is hit, it logs a warning and raises an EmbeddingException. Otherwise, it
updates the request count and proceeds to call the original function.
Parameters:
-----------
- *args: Variable length argument list for the wrapped function.
- **kwargs: Keyword arguments for the wrapped function.
Returns:
--------
Returns the result of the wrapped function if rate limiting conditions are met.
"""
limiter = EmbeddingRateLimiter.get_instance()
# Check if rate limiting is enabled and if we're at the limit
if limiter.hit_limit():
error_msg = "Embedding API rate limit exceeded"
logger.warning(error_msg)
# Create a custom embedding rate limit exception
from cognee.infrastructure.databases.exceptions import EmbeddingException
raise EmbeddingException(error_msg)
# Add this request to the counter and proceed
limiter.wait_if_needed()
return func(*args, **kwargs)
return wrapper
def embedding_rate_limit_async(func):
"""
Decorator that applies rate limiting to an asynchronous embedding function.
Parameters:
-----------
- func: Async function to decorate.
Returns:
--------
Returns the decorated async function that applies rate limiting.
"""
@functools.wraps(func)
async def wrapper(*args, **kwargs):
"""
Handle function calls with embedding rate limiting.
This asynchronous wrapper checks if the embedding API rate limit is exceeded before
allowing the function to execute. If the limit is exceeded, it logs a warning and raises
an EmbeddingException. If not, it waits as necessary and proceeds with the function
call.
Parameters:
-----------
- *args: Positional arguments passed to the wrapped function.
- **kwargs: Keyword arguments passed to the wrapped function.
Returns:
--------
Returns the result of the wrapped function after handling rate limiting.
"""
limiter = EmbeddingRateLimiter.get_instance()
# Check if rate limiting is enabled and if we're at the limit
if limiter.hit_limit():
error_msg = "Embedding API rate limit exceeded"
logger.warning(error_msg)
# Create a custom embedding rate limit exception
from cognee.infrastructure.databases.exceptions import EmbeddingException
raise EmbeddingException(error_msg)
# Add this request to the counter and proceed
await limiter.async_wait_if_needed()
return await func(*args, **kwargs)
return wrapper
def embedding_sleep_and_retry_sync(max_retries=5, base_backoff=1.0, jitter=0.5):
"""
Add retry with exponential backoff for synchronous embedding functions.
Parameters:
-----------
- max_retries: Maximum number of retries before giving up. (default 5)
- base_backoff: Base backoff time in seconds for retry intervals. (default 1.0)
- jitter: Jitter factor to randomize the backoff time to avoid collision. (default
0.5)
Returns:
--------
A decorator that retries the wrapped function on rate limit errors, applying
exponential backoff with jitter.
"""
def decorator(func):
"""
Wraps a function to apply retry logic on rate limit errors.
Parameters:
-----------
- func: The function to be wrapped with retry logic.
Returns:
--------
Returns the wrapped function with retry logic applied.
"""
@functools.wraps(func)
def wrapper(*args, **kwargs):
"""
Retry the execution of a function with backoff on failure due to rate limit errors.
This wrapper function will call the specified function and if it raises an exception, it
will handle retries according to defined conditions. It will check the environment for a
DISABLE_RETRIES flag to determine whether to retry or propagate errors immediately
during tests. If the error is identified as a rate limit error, it will apply an
exponential backoff strategy with jitter before retrying, up to a maximum number of
retries. If the retries are exhausted, it raises the last encountered error.
Parameters:
-----------
- *args: Positional arguments passed to the wrapped function.
- **kwargs: Keyword arguments passed to the wrapped function.
Returns:
--------
Returns the result of the wrapped function if successful; otherwise, raises the last
error encountered after maximum retries are exhausted.
"""
# If DISABLE_RETRIES is set, don't retry for testing purposes
disable_retries = os.environ.get("DISABLE_RETRIES", "false").lower() in (
"true",
"1",
"yes",
)
retries = 0
last_error = None
while retries <= max_retries:
try:
return func(*args, **kwargs)
except Exception as e:
# Check if this is a rate limit error
error_str = str(e).lower()
error_type = type(e).__name__
is_rate_limit = any(
pattern in error_str.lower() for pattern in RATE_LIMIT_ERROR_PATTERNS
)
if disable_retries:
# For testing, propagate the exception immediately
raise
if is_rate_limit and retries < max_retries:
# Calculate backoff with jitter
backoff = (
base_backoff * (2**retries) * (1 + random.uniform(-jitter, jitter))
)
logger.warning(
f"Embedding rate limit hit, retrying in {backoff:.2f}s "
f"(attempt {retries + 1}/{max_retries}): "
f"({error_str!r}, {error_type!r})"
)
time.sleep(backoff)
retries += 1
last_error = e
else:
# Not a rate limit error or max retries reached, raise
raise
# If we exit the loop due to max retries, raise the last error
if last_error:
raise last_error
return wrapper
return decorator
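The backoff expression used in both decorators is `base_backoff * (2**retries) * (1 + random.uniform(-jitter, jitter))`: exponential growth with multiplicative jitter of ±50% by default. Isolated as a helper (hypothetical name):

```python
import random


def backoff_seconds(retries, base_backoff=1.0, jitter=0.5):
    """Exponential backoff with multiplicative jitter, as in the decorators above."""
    # With the defaults, attempt r waits somewhere in [0.5 * 2**r, 1.5 * 2**r].
    return base_backoff * (2 ** retries) * (1 + random.uniform(-jitter, jitter))
```

Multiplicative jitter keeps the randomization proportional to the delay, so retries desynchronize at every attempt instead of only the early ones.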
def embedding_sleep_and_retry_async(max_retries=5, base_backoff=1.0, jitter=0.5):
"""
Add retry logic with exponential backoff for asynchronous embedding functions.
This decorator retries the wrapped asynchronous function upon encountering rate limit
errors, utilizing exponential backoff with optional jitter to space out retry attempts.
It allows for a maximum number of retries before giving up and raising the last error
encountered.
Parameters:
-----------
- max_retries: Maximum number of retries allowed before giving up. (default 5)
- base_backoff: Base amount of time in seconds to wait before retrying after a rate
limit error. (default 1.0)
- jitter: Amount of randomness to add to the backoff duration to help mitigate burst
issues on retries. (default 0.5)
Returns:
--------
Returns a decorated asynchronous function that implements the retry logic on rate
limit errors.
"""
def decorator(func):
"""
Handle retries for an async function with exponential backoff and jitter.
Parameters:
-----------
- func: An asynchronous function to be wrapped with retry logic.
Returns:
--------
Returns the wrapper function that manages the retry behavior for the wrapped async
function.
"""
@functools.wraps(func)
async def wrapper(*args, **kwargs):
"""
Handle retries for an async function with exponential backoff and jitter.
If the environment variable DISABLE_RETRIES is set to true, 1, or yes, the function will
not retry on errors.
It attempts to call the wrapped function until it succeeds or the maximum number of
retries is reached. If an exception occurs, it checks if it's a rate limit error to
determine if a retry is needed.
Parameters:
-----------
- *args: Positional arguments passed to the wrapped function.
- **kwargs: Keyword arguments passed to the wrapped function.
Returns:
--------
Returns the result of the wrapped async function if successful; raises the last
encountered error if all retries fail.
"""
# If DISABLE_RETRIES is set, don't retry for testing purposes
disable_retries = os.environ.get("DISABLE_RETRIES", "false").lower() in (
"true",
"1",
"yes",
)
retries = 0
last_error = None
while retries <= max_retries:
try:
return await func(*args, **kwargs)
except Exception as e:
# Check if this is a rate limit error
error_str = str(e).lower()
error_type = type(e).__name__
is_rate_limit = any(
pattern in error_str.lower() for pattern in RATE_LIMIT_ERROR_PATTERNS
)
if disable_retries:
# For testing, propagate the exception immediately
raise
if is_rate_limit and retries < max_retries:
# Calculate backoff with jitter
backoff = (
base_backoff * (2**retries) * (1 + random.uniform(-jitter, jitter))
)
logger.warning(
f"Embedding rate limit hit, retrying in {backoff:.2f}s "
f"(attempt {retries + 1}/{max_retries}): "
f"({error_str!r}, {error_type!r})"
)
await asyncio.sleep(backoff)
retries += 1
last_error = e
else:
# Not a rate limit error or max retries reached, raise
raise
# If we exit the loop due to max retries, raise the last error
if last_error:
raise last_error
return wrapper
return decorator

View file

@ -193,6 +193,8 @@ class LanceDBAdapter(VectorDBInterface):
for (data_point_index, data_point) in enumerate(data_points)
]
lance_data_points = list({dp.id: dp for dp in lance_data_points}.values())
async with self.VECTOR_DB_LOCK:
await (
collection.merge_insert("id")
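Just before the locked `merge_insert`, the adapter deduplicates points with `list({dp.id: dp for dp in lance_data_points}.values())`. Because later dict assignments overwrite earlier ones, the last occurrence of each id wins while first-seen key order is preserved. A standalone illustration (the toy `Point` type is not the adapter's data model):

```python
from dataclasses import dataclass


@dataclass
class Point:
    id: str
    payload: str


def dedupe_by_id(points):
    # Keyed by id: later duplicates overwrite earlier ones, so the last
    # occurrence of each id survives; key order follows first insertion.
    return list({p.id: p for p in points}.values())
```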

View file

@ -0,0 +1,50 @@
import os
from uuid import UUID
from typing import Optional
from cognee.infrastructure.databases.vector.create_vector_engine import create_vector_engine
from cognee.modules.users.models import User
from cognee.modules.users.models import DatasetDatabase
from cognee.base_config import get_base_config
from cognee.infrastructure.databases.vector import get_vectordb_config
from cognee.infrastructure.databases.dataset_database_handler import DatasetDatabaseHandlerInterface
class LanceDBDatasetDatabaseHandler(DatasetDatabaseHandlerInterface):
"""
Handler for interacting with LanceDB Dataset databases.
"""
@classmethod
async def create_dataset(cls, dataset_id: Optional[UUID], user: Optional[User]) -> dict:
vector_config = get_vectordb_config()
base_config = get_base_config()
if vector_config.vector_db_provider != "lancedb":
raise ValueError(
"LanceDBDatasetDatabaseHandler can only be used with LanceDB vector database provider."
)
databases_directory_path = os.path.join(
base_config.system_root_directory, "databases", str(user.id)
)
vector_db_name = f"{dataset_id}.lance.db"
return {
"vector_database_provider": vector_config.vector_db_provider,
"vector_database_url": os.path.join(databases_directory_path, vector_db_name),
"vector_database_key": vector_config.vector_db_key,
"vector_database_name": vector_db_name,
"vector_dataset_database_handler": "lancedb",
}
@classmethod
async def delete_dataset(cls, dataset_database: DatasetDatabase):
vector_engine = create_vector_engine(
vector_db_provider=dataset_database.vector_database_provider,
vector_db_url=dataset_database.vector_database_url,
vector_db_key=dataset_database.vector_database_key,
vector_db_name=dataset_database.vector_database_name,
)
await vector_engine.prune()

View file

@ -2,6 +2,8 @@ from typing import List, Protocol, Optional, Union, Any
from abc import abstractmethod
from cognee.infrastructure.engine import DataPoint
from .models.PayloadSchema import PayloadSchema
from uuid import UUID
from cognee.modules.users.models import User
class VectorDBInterface(Protocol):
@ -217,3 +219,36 @@ class VectorDBInterface(Protocol):
- Any: The schema object suitable for this vector database
"""
return model_type
@classmethod
async def create_dataset(cls, dataset_id: Optional[UUID], user: Optional[User]) -> dict:
"""
Return a dictionary with connection info for a vector database for the given dataset.
The implementation may also deploy the actual database if needed, but that is not required; returning connection info alone is sufficient, and this info is mapped whenever the dataset is connected to later.
Needed for Cognee multi-tenant/multi-user and backend access control support.
The dictionary returned from this function is used to create a DatasetDatabase row in the relational database, from which the internal mapping of dataset -> database connection info is resolved.
When backend access control is enabled, each dataset must map to a unique vector database to enforce separation of concerns for data.
Args:
dataset_id: UUID of the dataset if needed by the database creation logic
user: User object if needed by the database creation logic
Returns:
dict: Connection info for the created vector database instance.
"""
pass
async def delete_dataset(self, dataset_id: UUID, user: User) -> None:
"""
Delete the vector database for the given dataset.
The implementation should either delete the actual database itself or request deletion from the appropriate service.
Needed to maintain per-dataset databases for Cognee multi-tenant/multi-user and backend access control support.
Args:
dataset_id: UUID of the dataset
user: User object
"""
pass

View file

@ -9,6 +9,8 @@ class S3Config(BaseSettings):
aws_access_key_id: Optional[str] = None
aws_secret_access_key: Optional[str] = None
aws_session_token: Optional[str] = None
aws_profile_name: Optional[str] = None
aws_bedrock_runtime_endpoint: Optional[str] = None
model_config = SettingsConfigDict(env_file=".env", extra="allow")

View file

@ -11,7 +11,7 @@ class LLMGateway:
@staticmethod
def acreate_structured_output(
text_input: str, system_prompt: str, response_model: Type[BaseModel]
text_input: str, system_prompt: str, response_model: Type[BaseModel], **kwargs
) -> Coroutine:
llm_config = get_llm_config()
if llm_config.structured_output_framework.upper() == "BAML":
@ -31,7 +31,10 @@ class LLMGateway:
llm_client = get_llm_client()
return llm_client.acreate_structured_output(
text_input=text_input, system_prompt=system_prompt, response_model=response_model
text_input=text_input,
system_prompt=system_prompt,
response_model=response_model,
**kwargs,
)
@staticmethod

View file

@ -74,6 +74,41 @@ class LLMConfig(BaseSettings):
model_config = SettingsConfigDict(env_file=".env", extra="allow")
@model_validator(mode="after")
def strip_quotes_from_strings(self) -> "LLMConfig":
"""
Strip surrounding quotes from specific string fields that often come from
environment variables with extra quotes (e.g., via Docker's --env-file).
Only applies to known config keys where quotes are invalid or cause issues.
"""
string_fields_to_strip = [
"llm_api_key",
"llm_endpoint",
"llm_api_version",
"baml_llm_api_key",
"baml_llm_endpoint",
"baml_llm_api_version",
"fallback_api_key",
"fallback_endpoint",
"fallback_model",
"llm_provider",
"llm_model",
"baml_llm_provider",
"baml_llm_model",
]
cls = self.__class__
for field_name in string_fields_to_strip:
if field_name not in cls.model_fields:
continue
value = getattr(self, field_name, None)
if isinstance(value, str) and len(value) >= 2:
if value[0] == value[-1] and value[0] in ("'", '"'):
setattr(self, field_name, value[1:-1])
return self
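The validator's core check strips exactly one pair of matching surrounding quotes and leaves every other value untouched, including mismatched or unquoted strings. Extracted as a plain function (hypothetical name):

```python
def strip_surrounding_quotes(value):
    """Mirror of the validator logic: strip one matching pair of quotes."""
    if isinstance(value, str) and len(value) >= 2:
        # Only strip when the first and last characters are the same quote.
        if value[0] == value[-1] and value[0] in ("'", '"'):
            return value[1:-1]
    return value
```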
def model_post_init(self, __context) -> None:
"""Initialize the BAML registry after the model is created."""
# Check if BAML is selected as structured output framework but not available

View file

@ -10,7 +10,7 @@ from cognee.infrastructure.llm.config import (
async def extract_content_graph(
content: str, response_model: Type[BaseModel], custom_prompt: Optional[str] = None
content: str, response_model: Type[BaseModel], custom_prompt: Optional[str] = None, **kwargs
):
if custom_prompt:
system_prompt = custom_prompt
@ -30,7 +30,7 @@ async def extract_content_graph(
system_prompt = render_prompt(prompt_path, {}, base_directory=base_directory)
content_graph = await LLMGateway.acreate_structured_output(
content, system_prompt, response_model
content, system_prompt, response_model, **kwargs
)
return content_graph

View file

@ -1,7 +1,15 @@
import asyncio
from typing import Type
from cognee.shared.logging_utils import get_logger
from pydantic import BaseModel
from tenacity import (
retry,
stop_after_delay,
wait_exponential_jitter,
retry_if_not_exception_type,
before_sleep_log,
)
from cognee.shared.logging_utils import get_logger
from cognee.infrastructure.llm.config import get_llm_config
from cognee.infrastructure.llm.structured_output_framework.baml.baml_src.extraction.create_dynamic_baml_type import (
create_dynamic_baml_type,
@ -10,12 +18,18 @@ from cognee.infrastructure.llm.structured_output_framework.baml.baml_client.type
TypeBuilder,
)
from cognee.infrastructure.llm.structured_output_framework.baml.baml_client import b
from pydantic import BaseModel
from cognee.shared.rate_limiting import llm_rate_limiter_context_manager
import logging
logger = get_logger()
@retry(
stop=stop_after_delay(128),
wait=wait_exponential_jitter(8, 128),
before_sleep=before_sleep_log(logger, logging.DEBUG),
reraise=True,
)
async def acreate_structured_output(
text_input: str, system_prompt: str, response_model: Type[BaseModel]
):
@ -45,11 +59,12 @@ async def acreate_structured_output(
tb = TypeBuilder()
type_builder = create_dynamic_baml_type(tb, tb.ResponseModel, response_model)
result = await b.AcreateStructuredOutput(
text_input=text_input,
system_prompt=system_prompt,
baml_options={"client_registry": config.baml_registry, "tb": type_builder},
)
async with llm_rate_limiter_context_manager():
result = await b.AcreateStructuredOutput(
text_input=text_input,
system_prompt=system_prompt,
baml_options={"client_registry": config.baml_registry, "tb": type_builder},
)
# Transform BAML response to the proper pydantic response model
if response_model is str:

View file

@ -15,6 +15,7 @@ from tenacity import (
from cognee.infrastructure.llm.structured_output_framework.litellm_instructor.llm.llm_interface import (
LLMInterface,
)
from cognee.shared.rate_limiting import llm_rate_limiter_context_manager
from cognee.infrastructure.llm.config import get_llm_config
logger = get_logger()
@ -45,13 +46,13 @@ class AnthropicAdapter(LLMInterface):
@retry(
stop=stop_after_delay(128),
wait=wait_exponential_jitter(2, 128),
wait=wait_exponential_jitter(8, 128),
retry=retry_if_not_exception_type(litellm.exceptions.NotFoundError),
before_sleep=before_sleep_log(logger, logging.DEBUG),
reraise=True,
)
async def acreate_structured_output(
self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
self, text_input: str, system_prompt: str, response_model: Type[BaseModel], **kwargs
) -> BaseModel:
"""
Generate a response from a user query.
@ -69,17 +70,17 @@ class AnthropicAdapter(LLMInterface):
- BaseModel: An instance of BaseModel containing the structured response.
"""
return await self.aclient(
model=self.model,
max_tokens=4096,
max_retries=5,
messages=[
{
"role": "user",
"content": f"""Use the given format to extract information
from the following input: {text_input}. {system_prompt}""",
}
],
response_model=response_model,
)
async with llm_rate_limiter_context_manager():
return await self.aclient(
model=self.model,
max_tokens=4096,
max_retries=2,
messages=[
{
"role": "user",
"content": f"""Use the given format to extract information
from the following input: {text_input}. {system_prompt}""",
}
],
response_model=response_model,
)

View file

@ -0,0 +1,5 @@
"""Bedrock LLM adapter module."""
from .adapter import BedrockAdapter
__all__ = ["BedrockAdapter"]

View file

@@ -0,0 +1,153 @@
import litellm
import instructor
from typing import Type
from pydantic import BaseModel
from litellm.exceptions import ContentPolicyViolationError
from instructor.exceptions import InstructorRetryException
from cognee.infrastructure.llm.LLMGateway import LLMGateway
from cognee.infrastructure.llm.structured_output_framework.litellm_instructor.llm.llm_interface import (
LLMInterface,
)
from cognee.infrastructure.llm.exceptions import (
ContentPolicyFilterError,
MissingSystemPromptPathError,
)
from cognee.infrastructure.files.storage.s3_config import get_s3_config
from cognee.infrastructure.llm.structured_output_framework.litellm_instructor.llm.rate_limiter import (
rate_limit_async,
rate_limit_sync,
sleep_and_retry_async,
sleep_and_retry_sync,
)
from cognee.modules.observability.get_observe import get_observe
observe = get_observe()
class BedrockAdapter(LLMInterface):
"""
Adapter for AWS Bedrock API with support for three authentication methods:
1. API Key (Bearer Token)
2. AWS Credentials (access key + secret key)
3. AWS Profile (boto3 credential chain)
"""
name = "Bedrock"
model: str
api_key: str
default_instructor_mode = "json_schema_mode"
MAX_RETRIES = 5
def __init__(
self,
model: str,
api_key: str = None,
max_completion_tokens: int = 16384,
streaming: bool = False,
instructor_mode: str = None,
):
self.instructor_mode = instructor_mode if instructor_mode else self.default_instructor_mode
self.aclient = instructor.from_litellm(
litellm.acompletion, mode=instructor.Mode(self.instructor_mode)
)
self.client = instructor.from_litellm(litellm.completion)
self.model = model
self.api_key = api_key
self.max_completion_tokens = max_completion_tokens
self.streaming = streaming
def _create_bedrock_request(
self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
) -> dict:
"""Create Bedrock request with authentication."""
request_params = {
"model": self.model,
"custom_llm_provider": "bedrock",
"drop_params": True,
"messages": [
{"role": "user", "content": text_input},
{"role": "system", "content": system_prompt},
],
"response_model": response_model,
"max_retries": self.MAX_RETRIES,
"max_completion_tokens": self.max_completion_tokens,
"stream": self.streaming,
}
s3_config = get_s3_config()
# Add authentication parameters
if self.api_key:
request_params["api_key"] = self.api_key
elif s3_config.aws_access_key_id and s3_config.aws_secret_access_key:
request_params["aws_access_key_id"] = s3_config.aws_access_key_id
request_params["aws_secret_access_key"] = s3_config.aws_secret_access_key
if s3_config.aws_session_token:
request_params["aws_session_token"] = s3_config.aws_session_token
elif s3_config.aws_profile_name:
request_params["aws_profile_name"] = s3_config.aws_profile_name
if s3_config.aws_region:
request_params["aws_region_name"] = s3_config.aws_region
# Add optional parameters
if s3_config.aws_bedrock_runtime_endpoint:
request_params["aws_bedrock_runtime_endpoint"] = s3_config.aws_bedrock_runtime_endpoint
return request_params
@observe(as_type="generation")
@sleep_and_retry_async()
@rate_limit_async
async def acreate_structured_output(
self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
) -> BaseModel:
"""Generate structured output from AWS Bedrock API."""
try:
request_params = self._create_bedrock_request(text_input, system_prompt, response_model)
return await self.aclient.chat.completions.create(**request_params)
except (
ContentPolicyViolationError,
InstructorRetryException,
) as error:
if (
isinstance(error, InstructorRetryException)
and "content management policy" not in str(error).lower()
):
raise error
raise ContentPolicyFilterError(
f"The provided input contains content that is not aligned with our content policy: {text_input}"
)
@observe
@sleep_and_retry_sync()
@rate_limit_sync
def create_structured_output(
self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
) -> BaseModel:
"""Generate structured output from AWS Bedrock API (synchronous)."""
request_params = self._create_bedrock_request(text_input, system_prompt, response_model)
return self.client.chat.completions.create(**request_params)
def show_prompt(self, text_input: str, system_prompt: str) -> str:
"""Format and display the prompt for a user query."""
if not text_input:
text_input = "No user input provided."
if not system_prompt:
raise MissingSystemPromptPathError()
system_prompt = LLMGateway.read_query_prompt(system_prompt)
formatted_prompt = (
f"""System Prompt:\n{system_prompt}\n\nUser Input:\n{text_input}\n"""
if system_prompt
else None
)
return formatted_prompt
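The `_create_bedrock_request` helper above resolves credentials in a fixed order: explicit API key, then AWS access/secret keys (optionally with a session token), then an AWS profile, with the region applied independently. That precedence can be captured as a small pure function; this sketch uses plain dicts in place of the real `get_s3_config()` object:

```python
def resolve_bedrock_auth(api_key, s3_config):
    """Mirror the credential precedence in BedrockAdapter._create_bedrock_request (sketch)."""
    params = {}
    if api_key:
        params["api_key"] = api_key
    elif s3_config.get("aws_access_key_id") and s3_config.get("aws_secret_access_key"):
        params["aws_access_key_id"] = s3_config["aws_access_key_id"]
        params["aws_secret_access_key"] = s3_config["aws_secret_access_key"]
        if s3_config.get("aws_session_token"):
            params["aws_session_token"] = s3_config["aws_session_token"]
    elif s3_config.get("aws_profile_name"):
        params["aws_profile_name"] = s3_config["aws_profile_name"]
    # Region is set regardless of which auth method won.
    if s3_config.get("aws_region"):
        params["aws_region_name"] = s3_config["aws_region"]
    return params
```

An explicit API key wins even when AWS keys are also configured, which keeps the bearer-token path deterministic.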


@@ -13,6 +13,7 @@ from cognee.infrastructure.llm.structured_output_framework.litellm_instructor.ll
LLMInterface,
)
import logging
from cognee.shared.rate_limiting import llm_rate_limiter_context_manager
from cognee.shared.logging_utils import get_logger
from tenacity import (
retry,
@@ -73,13 +74,13 @@ class GeminiAdapter(LLMInterface):
@retry(
stop=stop_after_delay(128),
wait=wait_exponential_jitter(2, 128),
wait=wait_exponential_jitter(8, 128),
retry=retry_if_not_exception_type(litellm.exceptions.NotFoundError),
before_sleep=before_sleep_log(logger, logging.DEBUG),
reraise=True,
)
async def acreate_structured_output(
self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
self, text_input: str, system_prompt: str, response_model: Type[BaseModel], **kwargs
) -> BaseModel:
"""
Generate a response from a user query.
@@ -105,24 +106,25 @@ class GeminiAdapter(LLMInterface):
"""
try:
return await self.aclient.chat.completions.create(
model=self.model,
messages=[
{
"role": "user",
"content": f"""{text_input}""",
},
{
"role": "system",
"content": system_prompt,
},
],
api_key=self.api_key,
max_retries=5,
api_base=self.endpoint,
api_version=self.api_version,
response_model=response_model,
)
async with llm_rate_limiter_context_manager():
return await self.aclient.chat.completions.create(
model=self.model,
messages=[
{
"role": "user",
"content": f"""{text_input}""",
},
{
"role": "system",
"content": system_prompt,
},
],
api_key=self.api_key,
max_retries=2,
api_base=self.endpoint,
api_version=self.api_version,
response_model=response_model,
)
except (
ContentFilterFinishReasonError,
ContentPolicyViolationError,
@@ -140,23 +142,24 @@ class GeminiAdapter(LLMInterface):
)
try:
return await self.aclient.chat.completions.create(
model=self.fallback_model,
messages=[
{
"role": "user",
"content": f"""{text_input}""",
},
{
"role": "system",
"content": system_prompt,
},
],
max_retries=5,
api_key=self.fallback_api_key,
api_base=self.fallback_endpoint,
response_model=response_model,
)
async with llm_rate_limiter_context_manager():
return await self.aclient.chat.completions.create(
model=self.fallback_model,
messages=[
{
"role": "user",
"content": f"""{text_input}""",
},
{
"role": "system",
"content": system_prompt,
},
],
max_retries=2,
api_key=self.fallback_api_key,
api_base=self.fallback_endpoint,
response_model=response_model,
)
except (
ContentFilterFinishReasonError,
ContentPolicyViolationError,

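Each adapter's retry policy is also retuned here, from `wait_exponential_jitter(2, 128)` to `wait_exponential_jitter(8, 128)`, i.e. a larger initial backoff under the same 128s cap. A rough stdlib approximation of that backoff curve (tenacity's actual formula is `initial * 2**attempt` plus up to 1s of uniform jitter, capped at the maximum; this is a sketch, not tenacity):

```python
import random

def exponential_jitter_waits(initial: float, maximum: float, attempts: int, seed: int = 0):
    """Approximate tenacity's wait_exponential_jitter backoff schedule (sketch)."""
    rng = random.Random(seed)
    return [min(initial * 2 ** n + rng.uniform(0, 1), maximum) for n in range(attempts)]

old_waits = exponential_jitter_waits(2, 128, 6)  # previous tuning: 2s initial
new_waits = exponential_jitter_waits(8, 128, 6)  # tuning in this diff: 8s initial
```

With an 8s start the schedule hits the 128s ceiling by the fifth attempt, so fewer fast retries hammer an already rate-limited provider.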

@@ -13,6 +13,7 @@ from cognee.infrastructure.llm.structured_output_framework.litellm_instructor.ll
LLMInterface,
)
import logging
from cognee.shared.rate_limiting import llm_rate_limiter_context_manager
from cognee.shared.logging_utils import get_logger
from tenacity import (
retry,
@@ -73,13 +74,13 @@ class GenericAPIAdapter(LLMInterface):
@retry(
stop=stop_after_delay(128),
wait=wait_exponential_jitter(2, 128),
wait=wait_exponential_jitter(8, 128),
retry=retry_if_not_exception_type(litellm.exceptions.NotFoundError),
before_sleep=before_sleep_log(logger, logging.DEBUG),
reraise=True,
)
async def acreate_structured_output(
self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
self, text_input: str, system_prompt: str, response_model: Type[BaseModel], **kwargs
) -> BaseModel:
"""
Generate a response from a user query.
@@ -105,23 +106,24 @@ class GenericAPIAdapter(LLMInterface):
"""
try:
return await self.aclient.chat.completions.create(
model=self.model,
messages=[
{
"role": "user",
"content": f"""{text_input}""",
},
{
"role": "system",
"content": system_prompt,
},
],
max_retries=5,
api_key=self.api_key,
api_base=self.endpoint,
response_model=response_model,
)
async with llm_rate_limiter_context_manager():
return await self.aclient.chat.completions.create(
model=self.model,
messages=[
{
"role": "user",
"content": f"""{text_input}""",
},
{
"role": "system",
"content": system_prompt,
},
],
max_retries=2,
api_key=self.api_key,
api_base=self.endpoint,
response_model=response_model,
)
except (
ContentFilterFinishReasonError,
ContentPolicyViolationError,
@@ -139,23 +141,24 @@ class GenericAPIAdapter(LLMInterface):
) from error
try:
return await self.aclient.chat.completions.create(
model=self.fallback_model,
messages=[
{
"role": "user",
"content": f"""{text_input}""",
},
{
"role": "system",
"content": system_prompt,
},
],
max_retries=5,
api_key=self.fallback_api_key,
api_base=self.fallback_endpoint,
response_model=response_model,
)
async with llm_rate_limiter_context_manager():
return await self.aclient.chat.completions.create(
model=self.fallback_model,
messages=[
{
"role": "user",
"content": f"""{text_input}""",
},
{
"role": "system",
"content": system_prompt,
},
],
max_retries=2,
api_key=self.fallback_api_key,
api_base=self.fallback_endpoint,
response_model=response_model,
)
except (
ContentFilterFinishReasonError,
ContentPolicyViolationError,


@@ -24,6 +24,7 @@ class LLMProvider(Enum):
- CUSTOM: Represents a custom provider option.
- GEMINI: Represents the Gemini provider.
- MISTRAL: Represents the Mistral AI provider.
- BEDROCK: Represents the AWS Bedrock provider.
"""
OPENAI = "openai"
@@ -32,6 +33,7 @@ class LLMProvider(Enum):
CUSTOM = "custom"
GEMINI = "gemini"
MISTRAL = "mistral"
BEDROCK = "bedrock"
def get_llm_client(raise_api_key_error: bool = True):
@@ -154,7 +156,7 @@
)
elif provider == LLMProvider.MISTRAL:
if llm_config.llm_api_key is None:
if llm_config.llm_api_key is None and raise_api_key_error:
raise LLMAPIKeyNotSetError()
from cognee.infrastructure.llm.structured_output_framework.litellm_instructor.llm.mistral.adapter import (
@@ -169,5 +171,21 @@
instructor_mode=llm_config.llm_instructor_mode.lower(),
)
elif provider == LLMProvider.BEDROCK:
# No API key check here: Bedrock can also authenticate via AWS
# credentials or an AWS profile, so a missing key is not an error.
from cognee.infrastructure.llm.structured_output_framework.litellm_instructor.llm.bedrock.adapter import (
BedrockAdapter,
)
return BedrockAdapter(
model=llm_config.llm_model,
api_key=llm_config.llm_api_key,
max_completion_tokens=max_completion_tokens,
streaming=llm_config.llm_streaming,
instructor_mode=llm_config.llm_instructor_mode.lower(),
)
else:
raise UnsupportedLLMProviderError(provider)
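`get_llm_client` is an enum-dispatch factory, and the new `BEDROCK` branch follows the same shape as the existing provider branches. The pattern in miniature, with hypothetical client classes standing in for the cognee adapters:

```python
from enum import Enum

class LLMProvider(Enum):
    OPENAI = "openai"
    BEDROCK = "bedrock"

class OpenAIClient:
    """Hypothetical stand-in for OpenAIAdapter."""

class BedrockClient:
    """Hypothetical stand-in for BedrockAdapter."""

_FACTORIES = {
    LLMProvider.OPENAI: OpenAIClient,
    LLMProvider.BEDROCK: BedrockClient,
}

def get_client(provider_name: str):
    # Enum() raises ValueError for unknown names, mirroring
    # the UnsupportedLLMProviderError branch above.
    provider = LLMProvider(provider_name)
    return _FACTORIES[provider]()
```

Keeping provider-specific imports inside each branch, as the real function does, avoids paying import cost for adapters that are never used.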


@@ -10,6 +10,7 @@ from cognee.infrastructure.llm.structured_output_framework.litellm_instructor.ll
LLMInterface,
)
from cognee.infrastructure.llm.config import get_llm_config
from cognee.shared.rate_limiting import llm_rate_limiter_context_manager
import logging
from tenacity import (
@@ -62,13 +63,13 @@ class MistralAdapter(LLMInterface):
@retry(
stop=stop_after_delay(128),
wait=wait_exponential_jitter(2, 128),
wait=wait_exponential_jitter(8, 128),
retry=retry_if_not_exception_type(litellm.exceptions.NotFoundError),
before_sleep=before_sleep_log(logger, logging.DEBUG),
reraise=True,
)
async def acreate_structured_output(
self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
self, text_input: str, system_prompt: str, response_model: Type[BaseModel], **kwargs
) -> BaseModel:
"""
Generate a response from the user query.
@@ -97,13 +98,14 @@
},
]
try:
response = await self.aclient.chat.completions.create(
model=self.model,
max_tokens=self.max_completion_tokens,
max_retries=5,
messages=messages,
response_model=response_model,
)
async with llm_rate_limiter_context_manager():
response = await self.aclient.chat.completions.create(
model=self.model,
max_tokens=self.max_completion_tokens,
max_retries=2,
messages=messages,
response_model=response_model,
)
if response.choices and response.choices[0].message.content:
content = response.choices[0].message.content
return response_model.model_validate_json(content)


@@ -11,6 +11,8 @@ from cognee.infrastructure.llm.structured_output_framework.litellm_instructor.ll
)
from cognee.infrastructure.files.utils.open_data_file import open_data_file
from cognee.shared.logging_utils import get_logger
from cognee.shared.rate_limiting import llm_rate_limiter_context_manager
from tenacity import (
retry,
stop_after_delay,
@@ -68,13 +70,13 @@ class OllamaAPIAdapter(LLMInterface):
@retry(
stop=stop_after_delay(128),
wait=wait_exponential_jitter(2, 128),
wait=wait_exponential_jitter(8, 128),
retry=retry_if_not_exception_type(litellm.exceptions.NotFoundError),
before_sleep=before_sleep_log(logger, logging.DEBUG),
reraise=True,
)
async def acreate_structured_output(
self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
self, text_input: str, system_prompt: str, response_model: Type[BaseModel], **kwargs
) -> BaseModel:
"""
Generate a structured output from the LLM using the provided text and system prompt.
@@ -95,33 +97,33 @@
- BaseModel: A structured output that conforms to the specified response model.
"""
response = self.aclient.chat.completions.create(
model=self.model,
messages=[
{
"role": "user",
"content": f"{text_input}",
},
{
"role": "system",
"content": system_prompt,
},
],
max_retries=5,
response_model=response_model,
)
async with llm_rate_limiter_context_manager():
response = await self.aclient.chat.completions.create(
model=self.model,
messages=[
{
"role": "user",
"content": f"{text_input}",
},
{
"role": "system",
"content": system_prompt,
},
],
max_retries=2,
response_model=response_model,
)
return response
@retry(
stop=stop_after_delay(128),
wait=wait_exponential_jitter(2, 128),
wait=wait_exponential_jitter(8, 128),
retry=retry_if_not_exception_type(litellm.exceptions.NotFoundError),
before_sleep=before_sleep_log(logger, logging.DEBUG),
reraise=True,
)
async def create_transcript(self, input_file: str) -> str:
async def create_transcript(self, input_file: str, **kwargs) -> str:
"""
Generate an audio transcript from a user query.
@@ -160,7 +162,7 @@
before_sleep=before_sleep_log(logger, logging.DEBUG),
reraise=True,
)
async def transcribe_image(self, input_file: str) -> str:
async def transcribe_image(self, input_file: str, **kwargs) -> str:
"""
Transcribe content from an image using base64 encoding.


@@ -22,6 +22,7 @@ from cognee.infrastructure.llm.structured_output_framework.litellm_instructor.ll
from cognee.infrastructure.llm.exceptions import (
ContentPolicyFilterError,
)
from cognee.shared.rate_limiting import llm_rate_limiter_context_manager
from cognee.infrastructure.files.utils.open_data_file import open_data_file
from cognee.modules.observability.get_observe import get_observe
from cognee.shared.logging_utils import get_logger
@@ -105,13 +106,13 @@ class OpenAIAdapter(LLMInterface):
@observe(as_type="generation")
@retry(
stop=stop_after_delay(128),
wait=wait_exponential_jitter(2, 128),
wait=wait_exponential_jitter(8, 128),
retry=retry_if_not_exception_type(litellm.exceptions.NotFoundError),
before_sleep=before_sleep_log(logger, logging.DEBUG),
reraise=True,
)
async def acreate_structured_output(
self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
self, text_input: str, system_prompt: str, response_model: Type[BaseModel], **kwargs
) -> BaseModel:
"""
Generate a response from a user query.
@@ -135,34 +136,9 @@
"""
try:
return await self.aclient.chat.completions.create(
model=self.model,
messages=[
{
"role": "user",
"content": f"""{text_input}""",
},
{
"role": "system",
"content": system_prompt,
},
],
api_key=self.api_key,
api_base=self.endpoint,
api_version=self.api_version,
response_model=response_model,
max_retries=self.MAX_RETRIES,
)
except (
ContentFilterFinishReasonError,
ContentPolicyViolationError,
InstructorRetryException,
) as e:
if not (self.fallback_model and self.fallback_api_key):
raise e
try:
async with llm_rate_limiter_context_manager():
return await self.aclient.chat.completions.create(
model=self.fallback_model,
model=self.model,
messages=[
{
"role": "user",
@@ -173,11 +149,40 @@ class OpenAIAdapter(LLMInterface):
"content": system_prompt,
},
],
api_key=self.fallback_api_key,
# api_base=self.fallback_endpoint,
api_key=self.api_key,
api_base=self.endpoint,
api_version=self.api_version,
response_model=response_model,
max_retries=self.MAX_RETRIES,
**kwargs,
)
except (
ContentFilterFinishReasonError,
ContentPolicyViolationError,
InstructorRetryException,
) as e:
if not (self.fallback_model and self.fallback_api_key):
raise e
try:
async with llm_rate_limiter_context_manager():
return await self.aclient.chat.completions.create(
model=self.fallback_model,
messages=[
{
"role": "user",
"content": f"""{text_input}""",
},
{
"role": "system",
"content": system_prompt,
},
],
api_key=self.fallback_api_key,
# api_base=self.fallback_endpoint,
response_model=response_model,
max_retries=self.MAX_RETRIES,
**kwargs,
)
except (
ContentFilterFinishReasonError,
ContentPolicyViolationError,
@@ -202,7 +207,7 @@
reraise=True,
)
def create_structured_output(
self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
self, text_input: str, system_prompt: str, response_model: Type[BaseModel], **kwargs
) -> BaseModel:
"""
Generate a response from a user query.
@@ -242,6 +247,7 @@
api_version=self.api_version,
response_model=response_model,
max_retries=self.MAX_RETRIES,
**kwargs,
)
@retry(
@@ -251,7 +257,7 @@
before_sleep=before_sleep_log(logger, logging.DEBUG),
reraise=True,
)
async def create_transcript(self, input):
async def create_transcript(self, input, **kwargs):
"""
Generate an audio transcript from a user query.
@@ -278,6 +284,7 @@
api_base=self.endpoint,
api_version=self.api_version,
max_retries=self.MAX_RETRIES,
**kwargs,
)
return transcription
@@ -289,7 +296,7 @@
before_sleep=before_sleep_log(logger, logging.DEBUG),
reraise=True,
)
async def transcribe_image(self, input) -> BaseModel:
async def transcribe_image(self, input, **kwargs) -> BaseModel:
"""
Generate a transcription of an image from a user query.
@@ -334,4 +341,5 @@
api_version=self.api_version,
max_completion_tokens=300,
max_retries=self.MAX_RETRIES,
**kwargs,
)
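The restructured OpenAI adapter flow is: enter the rate limiter, try the primary model, and only fall back to `fallback_model` when the failure is content-policy shaped. Stripped of the litellm/instructor details, the control flow looks like this (exception and function names are hypothetical stand-ins):

```python
class ContentPolicyError(Exception):
    """Stand-in for the content-policy exceptions caught in the adapter."""

def structured_output(call_primary, call_fallback=None):
    """Try the primary model; fall back only on content-policy failures (sketch)."""
    try:
        return call_primary()
    except ContentPolicyError:
        if call_fallback is None:
            raise  # no fallback configured: surface the original error
        return call_fallback()

def filtered_primary():
    raise ContentPolicyError("filtered")

result = structured_output(filtered_primary, call_fallback=lambda: "fallback-answer")
```

Ordinary errors (timeouts, auth failures) are deliberately not caught here; they propagate to the tenacity `@retry` decorator instead.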


@@ -0,0 +1,53 @@
from typing import Any
from cognee import memify
from cognee.context_global_variables import (
set_database_global_context_variables,
)
from cognee.exceptions import CogneeValidationError
from cognee.modules.data.methods import get_authorized_existing_datasets
from cognee.shared.logging_utils import get_logger
from cognee.modules.pipelines.tasks.task import Task
from cognee.modules.users.models import User
from cognee.tasks.memify.get_triplet_datapoints import get_triplet_datapoints
from cognee.tasks.storage import index_data_points
logger = get_logger("create_triplet_embeddings")
async def create_triplet_embeddings(
user: User,
dataset: str = "main_dataset",
run_in_background: bool = False,
triplets_batch_size: int = 100,
) -> dict[str, Any]:
dataset_to_write = await get_authorized_existing_datasets(
user=user, datasets=[dataset], permission_type="write"
)
if not dataset_to_write:
raise CogneeValidationError(
message=f"User does not have write access to dataset: {dataset}",
log=False,
)
await set_database_global_context_variables(
dataset_to_write[0].id, dataset_to_write[0].owner_id
)
extraction_tasks = [Task(get_triplet_datapoints, triplets_batch_size=triplets_batch_size)]
enrichment_tasks = [
Task(index_data_points, task_config={"batch_size": triplets_batch_size}),
]
result = await memify(
extraction_tasks=extraction_tasks,
enrichment_tasks=enrichment_tasks,
dataset=dataset_to_write[0].id,
data=[{}],
user=user,
run_in_background=run_in_background,
)
return result
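`create_triplet_embeddings` wires two task lists into `memify`: extraction tasks produce triplet datapoints and enrichment tasks index them. Reduced to a stdlib sketch with stub tasks (the real `memify` signature carries datasets, users, and background execution on top of this), the two-phase shape is:

```python
import asyncio

async def run_pipeline(extraction_tasks, enrichment_tasks, data):
    """Run extraction tasks, then feed their output to enrichment tasks (sketch)."""
    for task in extraction_tasks:
        data = await task(data)
    for task in enrichment_tasks:
        data = await task(data)
    return data

async def get_triplet_datapoints(docs):
    # Stub extraction task: turn each document into a triplet datapoint.
    return [f"triplet:{d}" for d in docs]

async def index_data_points(points):
    # Stub enrichment task: embed/index the datapoints.
    return {"indexed": len(points)}

result = asyncio.run(run_pipeline([get_triplet_datapoints], [index_data_points], ["a", "b"]))
```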


@@ -8,12 +8,14 @@ import os
class CognifyConfig(BaseSettings):
classification_model: object = DefaultContentPrediction
summarization_model: object = SummarizedContent
triplet_embedding: bool = False
model_config = SettingsConfigDict(env_file=".env", extra="allow")
def to_dict(self) -> dict:
return {
"classification_model": self.classification_model,
"summarization_model": self.summarization_model,
"triplet_embedding": self.triplet_embedding,
}
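The new `triplet_embedding` flag defaults to `False` and, through `SettingsConfigDict(env_file=".env")`, can be flipped from the environment. A dependency-free stand-in for that pattern using `os.environ` (pydantic-settings does the real parsing in cognee; the env var name here is hypothetical):

```python
import os
from dataclasses import dataclass, field

def _env_bool(name: str, default: bool) -> bool:
    """Parse a boolean env var the way settings libraries commonly do (sketch)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")

@dataclass
class CognifyConfig:
    # Mirrors the new flag: off unless the environment enables it.
    triplet_embedding: bool = field(
        default_factory=lambda: _env_bool("TRIPLET_EMBEDDING", False)
    )

    def to_dict(self) -> dict:
        return {"triplet_embedding": self.triplet_embedding}

os.environ["TRIPLET_EMBEDDING"] = "true"
enabled = CognifyConfig()
```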

Some files were not shown because too many files have changed in this diff.