Compare commits
4 commits
main
...
feat/add-e
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
277b5972d0 | ||
|
|
5fc4f10988 | ||
|
|
b5852a2da7 | ||
|
|
de39d5c49a |
390 changed files with 16066 additions and 30285 deletions
127
.coderabbit.yaml
127
.coderabbit.yaml
|
|
@ -1,127 +0,0 @@
|
|||
# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
|
||||
# .coderabbit.yaml
|
||||
language: en
|
||||
early_access: false
|
||||
enable_free_tier: true
|
||||
reviews:
|
||||
profile: chill
|
||||
instructions: >-
|
||||
# Code Review Instructions
|
||||
|
||||
- Ensure the code follows best practices and coding standards.
|
||||
- For **Python** code, follow
|
||||
[PEP 20](https://www.python.org/dev/peps/pep-0020/) and
|
||||
[CEP-8](https://gist.github.com/reactive-firewall/b7ee98df9e636a51806e62ef9c4ab161)
|
||||
standards.
|
||||
|
||||
# Documentation Review Instructions
|
||||
- Verify that documentation and comments are clear and comprehensive.
|
||||
- Verify that documentation and comments are free of spelling mistakes.
|
||||
|
||||
# Test Code Review Instructions
|
||||
- Ensure that test code is automated, comprehensive, and follows testing best practices.
|
||||
- Verify that all critical functionality is covered by tests.
|
||||
- Ensure that test code follow
|
||||
[CEP-8](https://gist.github.com/reactive-firewall/d840ee9990e65f302ce2a8d78ebe73f6)
|
||||
|
||||
# Misc.
|
||||
- Confirm that the code meets the project's requirements and objectives.
|
||||
- Confirm that copyright years are up-to date whenever a file is changed.
|
||||
request_changes_workflow: false
|
||||
high_level_summary: true
|
||||
high_level_summary_placeholder: '@coderabbitai summary'
|
||||
auto_title_placeholder: '@coderabbitai'
|
||||
review_status: true
|
||||
poem: false
|
||||
collapse_walkthrough: false
|
||||
sequence_diagrams: false
|
||||
changed_files_summary: true
|
||||
path_filters: ['!*.xc*/**', '!node_modules/**', '!dist/**', '!build/**', '!.git/**', '!venv/**', '!__pycache__/**']
|
||||
path_instructions:
|
||||
- path: README.md
|
||||
instructions: >-
|
||||
1. Consider the file 'README.md' the overview/introduction of the project.
|
||||
Also consider the 'README.md' file the first place to look for project documentation.
|
||||
|
||||
2. When reviewing the file 'README.md' it should be linted with help
|
||||
from the tools `markdownlint` and `languagetool`, pointing out any issues.
|
||||
|
||||
3. You may assume the file 'README.md' will contain GitHub flavor Markdown.
|
||||
- path: '**/*.py'
|
||||
instructions: >-
|
||||
When reviewing Python code for this project:
|
||||
|
||||
1. Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
|
||||
|
||||
2. As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
|
||||
|
||||
3. As a style convention, consider the code style advocated in [CEP-8](https://gist.github.com/reactive-firewall/b7ee98df9e636a51806e62ef9c4ab161) and evaluate suggested changes for code style compliance.
|
||||
|
||||
4. As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
|
||||
|
||||
5. As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.
|
||||
- path: cognee/tests/*
|
||||
instructions: >-
|
||||
When reviewing test code:
|
||||
|
||||
1. Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
|
||||
|
||||
2. As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
|
||||
|
||||
3. As a style convention, consider the code style advocated in [CEP-8](https://gist.github.com/reactive-firewall/b7ee98df9e636a51806e62ef9c4ab161) and evaluate suggested changes for code style compliance, pointing out any violations discovered.
|
||||
|
||||
4. As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
|
||||
|
||||
5. As a project rule, Python source files with names prefixed by the string "test_" and located in the project's "tests" directory are the project's unit-testing code. It is safe, albeit a heuristic, to assume these are considered part of the project's minimal acceptance testing unless a justifying exception to this assumption is documented.
|
||||
|
||||
6. As a project rule, any files without extensions and with names prefixed by either the string "check_" or the string "test_", and located in the project's "tests" directory, are the project's non-unit test code. "Non-unit test" in this context refers to any type of testing other than unit testing, such as (but not limited to) functional testing, style linting, regression testing, etc. It can also be assumed that non-unit testing code is usually written as Bash shell scripts.
|
||||
- path: requirements.txt
|
||||
instructions: >-
|
||||
* The project's own Python dependencies are recorded in 'requirements.txt' for production code.
|
||||
|
||||
* The project's testing-specific Python dependencies are recorded in 'tests/requirements.txt' and are used for testing the project.
|
||||
|
||||
* The project's documentation-specific Python dependencies are recorded in 'docs/requirements.txt' and are used only for generating Python-focused documentation for the project. 'docs/requirements.txt' may be absent if not applicable.
|
||||
|
||||
Consider these 'requirements.txt' files the records of truth regarding project dependencies.
|
||||
- path: .github/**
|
||||
instructions: >-
|
||||
* When the project is hosted on GitHub: All GitHub-specific configurations, templates, and tools should be found in the '.github' directory tree.
|
||||
|
||||
* 'actionlint' erroneously generates false positives when dealing with GitHub's `${{ ... }}` syntax in conditionals.
|
||||
|
||||
* 'actionlint' erroneously generates incorrect solutions when suggesting the removal of valid `${{ ... }}` syntax.
|
||||
abort_on_close: true
|
||||
auto_review:
|
||||
enabled: true
|
||||
auto_incremental_review: true
|
||||
ignore_title_keywords: []
|
||||
labels: []
|
||||
drafts: false
|
||||
base_branches:
|
||||
- dev
|
||||
- main
|
||||
tools:
|
||||
shellcheck:
|
||||
enabled: true
|
||||
ruff:
|
||||
enabled: true
|
||||
configuration:
|
||||
extend_select:
|
||||
- E # Pycodestyle errors (style issues)
|
||||
- F # PyFlakes codes (logical errors)
|
||||
- W # Pycodestyle warnings
|
||||
- N # PEP 8 naming conventions
|
||||
ignore:
|
||||
- W191
|
||||
- W391
|
||||
- E117
|
||||
- D208
|
||||
line_length: 100
|
||||
dummy_variable_rgx: '^(_.*|junk|extra)$' # Variables starting with '_' or named 'junk' or 'extras', are considered dummy variables
|
||||
markdownlint:
|
||||
enabled: true
|
||||
yamllint:
|
||||
enabled: true
|
||||
chat:
|
||||
auto_reply: true
|
||||
|
|
@ -21,10 +21,6 @@ LLM_PROVIDER="openai"
|
|||
LLM_ENDPOINT=""
|
||||
LLM_API_VERSION=""
|
||||
LLM_MAX_TOKENS="16384"
|
||||
# Instructor's modes determine how structured data is requested from and extracted from LLM responses
|
||||
# You can change this type (i.e. mode) via this env variable
|
||||
# Each LLM has its own default value, e.g. gpt-5 models have "json_schema_mode"
|
||||
LLM_INSTRUCTOR_MODE=""
|
||||
|
||||
EMBEDDING_PROVIDER="openai"
|
||||
EMBEDDING_MODEL="openai/text-embedding-3-large"
|
||||
|
|
@ -32,10 +28,11 @@ EMBEDDING_ENDPOINT=""
|
|||
EMBEDDING_API_VERSION=""
|
||||
EMBEDDING_DIMENSIONS=3072
|
||||
EMBEDDING_MAX_TOKENS=8191
|
||||
EMBEDDING_BATCH_SIZE=36
|
||||
# If embedding key is not provided same key set for LLM_API_KEY will be used
|
||||
#EMBEDDING_API_KEY="your_api_key"
|
||||
|
||||
# Note: OpenAI support up to 2048 elements and Gemini supports a maximum of 100 elements in an embedding batch,
|
||||
# Cognee sets the optimal batch size for OpenAI and Gemini, but a custom size can be defined if necessary for other models
|
||||
#EMBEDDING_BATCH_SIZE=2048
|
||||
|
||||
# If using BAML structured output these env variables will be used
|
||||
BAML_LLM_PROVIDER=openai
|
||||
|
|
@ -97,8 +94,6 @@ DB_NAME=cognee_db
|
|||
|
||||
# Default (local file-based)
|
||||
GRAPH_DATABASE_PROVIDER="kuzu"
|
||||
# Handler for multi-user access control mode, it handles how should the mapping/creation of separate DBs be handled per Cognee dataset
|
||||
GRAPH_DATASET_DATABASE_HANDLER="kuzu"
|
||||
|
||||
# -- To switch to Remote Kuzu uncomment and fill these: -------------------------------------------------------------
|
||||
#GRAPH_DATABASE_PROVIDER="kuzu"
|
||||
|
|
@ -123,8 +118,6 @@ VECTOR_DB_PROVIDER="lancedb"
|
|||
# Not needed if a cloud vector database is not used
|
||||
VECTOR_DB_URL=
|
||||
VECTOR_DB_KEY=
|
||||
# Handler for multi-user access control mode, it handles how should the mapping/creation of separate DBs be handled per Cognee dataset
|
||||
VECTOR_DATASET_DATABASE_HANDLER="lancedb"
|
||||
|
||||
################################################################################
|
||||
# 🧩 Ontology resolver settings
|
||||
|
|
@ -177,9 +170,8 @@ REQUIRE_AUTHENTICATION=False
|
|||
# Vector: LanceDB
|
||||
# Graph: KuzuDB
|
||||
#
|
||||
# It enforces creation of databases per Cognee user + dataset. Does not work with some graph and database providers.
|
||||
# Disable mode when using not supported graph/vector databases.
|
||||
ENABLE_BACKEND_ACCESS_CONTROL=True
|
||||
# It enforces LanceDB and KuzuDB use and uses them to create databases per Cognee user + dataset
|
||||
ENABLE_BACKEND_ACCESS_CONTROL=False
|
||||
|
||||
################################################################################
|
||||
# ☁️ Cloud Sync Settings
|
||||
|
|
@ -251,16 +243,15 @@ LITELLM_LOG="ERROR"
|
|||
|
||||
########## Local LLM via Ollama ###############################################
|
||||
|
||||
|
||||
#LLM_API_KEY ="ollama"
|
||||
#LLM_MODEL="llama3.1:8b"
|
||||
#LLM_PROVIDER="ollama"
|
||||
#LLM_ENDPOINT="http://localhost:11434/v1"
|
||||
#EMBEDDING_PROVIDER="ollama"
|
||||
#EMBEDDING_MODEL="nomic-embed-text:latest"
|
||||
#EMBEDDING_ENDPOINT="http://localhost:11434/api/embed"
|
||||
#EMBEDDING_DIMENSIONS=768
|
||||
#HUGGINGFACE_TOKENIZER="nomic-ai/nomic-embed-text-v1.5"
|
||||
#EMBEDDING_MODEL="avr/sfr-embedding-mistral:latest"
|
||||
#EMBEDDING_ENDPOINT="http://localhost:11434/api/embeddings"
|
||||
#EMBEDDING_DIMENSIONS=4096
|
||||
#HUGGINGFACE_TOKENIZER="Salesforce/SFR-Embedding-Mistral"
|
||||
|
||||
########## OpenRouter (also free) #########################################################
|
||||
|
||||
|
|
|
|||
12
.github/actions/cognee_setup/action.yml
vendored
12
.github/actions/cognee_setup/action.yml
vendored
|
|
@ -10,10 +10,6 @@ inputs:
|
|||
description: "Additional extra dependencies to install (space-separated)"
|
||||
required: false
|
||||
default: ""
|
||||
rebuild-lockfile:
|
||||
description: "Whether to rebuild the uv lockfile"
|
||||
required: false
|
||||
default: "false"
|
||||
|
||||
runs:
|
||||
using: "composite"
|
||||
|
|
@ -30,7 +26,6 @@ runs:
|
|||
enable-cache: true
|
||||
|
||||
- name: Rebuild uv lockfile
|
||||
if: ${{ inputs.rebuild-lockfile == 'true' }}
|
||||
shell: bash
|
||||
run: |
|
||||
rm uv.lock
|
||||
|
|
@ -46,9 +41,4 @@ runs:
|
|||
EXTRA_ARGS="$EXTRA_ARGS --extra $extra"
|
||||
done
|
||||
fi
|
||||
uv sync --extra api --extra docs --extra evals --extra codegraph --extra ollama --extra dev --extra neo4j --extra redis $EXTRA_ARGS
|
||||
|
||||
- name: Add telemetry identifier for telemetry test and in case telemetry is enabled by accident
|
||||
shell: bash
|
||||
run: |
|
||||
echo "test-machine" > .anon_id
|
||||
uv sync --extra api --extra docs --extra evals --extra codegraph --extra ollama --extra dev --extra neo4j $EXTRA_ARGS
|
||||
|
|
|
|||
8
.github/pull_request_template.md
vendored
8
.github/pull_request_template.md
vendored
|
|
@ -6,14 +6,6 @@ Please provide a clear, human-generated description of the changes in this PR.
|
|||
DO NOT use AI-generated descriptions. We want to understand your thought process and reasoning.
|
||||
-->
|
||||
|
||||
## Acceptance Criteria
|
||||
<!--
|
||||
* Key requirements to the new feature or modification;
|
||||
* Proof that the changes work and meet the requirements;
|
||||
* Include instructions on how to verify the changes. Describe how to test it locally;
|
||||
* Proof that it's sufficiently tested.
|
||||
-->
|
||||
|
||||
## Type of Change
|
||||
<!-- Please check the relevant option -->
|
||||
- [ ] Bug fix (non-breaking change that fixes an issue)
|
||||
|
|
|
|||
20
.github/release-drafter.yml
vendored
20
.github/release-drafter.yml
vendored
|
|
@ -1,20 +0,0 @@
|
|||
name-template: 'v$NEXT_PATCH_VERSION'
|
||||
tag-template: 'v$NEXT_PATCH_VERSION'
|
||||
|
||||
categories:
|
||||
- title: 'Features'
|
||||
labels: ['feature', 'enhancement']
|
||||
- title: 'Bug Fixes'
|
||||
labels: ['bug', 'fix']
|
||||
- title: 'Maintenance'
|
||||
labels: ['chore', 'refactor', 'ci']
|
||||
|
||||
change-template: '- $TITLE (#$NUMBER) @$AUTHOR'
|
||||
template: |
|
||||
## What’s Changed
|
||||
|
||||
$CHANGES
|
||||
|
||||
## Contributors
|
||||
|
||||
$CONTRIBUTORS
|
||||
44
.github/workflows/basic_tests.yml
vendored
44
.github/workflows/basic_tests.yml
vendored
|
|
@ -75,7 +75,6 @@ jobs:
|
|||
name: Run Unit Tests
|
||||
runs-on: ubuntu-22.04
|
||||
env:
|
||||
ENV: 'dev'
|
||||
LLM_PROVIDER: openai
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
|
|
@ -105,7 +104,6 @@ jobs:
|
|||
name: Run Integration Tests
|
||||
runs-on: ubuntu-22.04
|
||||
env:
|
||||
ENV: 'dev'
|
||||
LLM_PROVIDER: openai
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
|
|
@ -125,7 +123,6 @@ jobs:
|
|||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
extra-dependencies: "scraping"
|
||||
|
||||
- name: Run Integration Tests
|
||||
run: uv run pytest cognee/tests/integration/
|
||||
|
|
@ -134,7 +131,6 @@ jobs:
|
|||
name: Run Simple Examples
|
||||
runs-on: ubuntu-22.04
|
||||
env:
|
||||
ENV: 'dev'
|
||||
LLM_PROVIDER: openai
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
|
|
@ -164,13 +160,12 @@ jobs:
|
|||
name: Run Simple Examples BAML
|
||||
runs-on: ubuntu-22.04
|
||||
env:
|
||||
ENV: 'dev'
|
||||
STRUCTURED_OUTPUT_FRAMEWORK: "BAML"
|
||||
BAML_LLM_PROVIDER: openai
|
||||
BAML_LLM_MODEL: ${{ secrets.OPENAI_MODEL }}
|
||||
BAML_LLM_ENDPOINT: ${{ secrets.OPENAI_ENDPOINT }}
|
||||
BAML_LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
# BAML_LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
BAML_LLM_PROVIDER: azure-openai
|
||||
BAML_LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
BAML_LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
BAML_LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
BAML_LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
|
||||
LLM_PROVIDER: openai
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
|
|
@ -197,3 +192,32 @@ jobs:
|
|||
|
||||
- name: Run Simple Examples
|
||||
run: uv run python ./examples/python/simple_example.py
|
||||
|
||||
graph-tests:
|
||||
name: Run Basic Graph Tests
|
||||
runs-on: ubuntu-22.04
|
||||
env:
|
||||
LLM_PROVIDER: openai
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
|
||||
EMBEDDING_PROVIDER: openai
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
with:
|
||||
fetch-depth: 0
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
|
||||
- name: Run Graph Tests
|
||||
run: uv run python ./examples/python/code_graph_example.py --repo_path ./cognee/tasks/graph
|
||||
|
|
|
|||
3
.github/workflows/cli_tests.yml
vendored
3
.github/workflows/cli_tests.yml
vendored
|
|
@ -39,7 +39,6 @@ jobs:
|
|||
name: CLI Unit Tests
|
||||
runs-on: ubuntu-22.04
|
||||
env:
|
||||
ENV: 'dev'
|
||||
LLM_PROVIDER: openai
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
|
|
@ -67,7 +66,6 @@ jobs:
|
|||
name: CLI Integration Tests
|
||||
runs-on: ubuntu-22.04
|
||||
env:
|
||||
ENV: 'dev'
|
||||
LLM_PROVIDER: openai
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
|
|
@ -95,7 +93,6 @@ jobs:
|
|||
name: CLI Functionality Tests
|
||||
runs-on: ubuntu-22.04
|
||||
env:
|
||||
ENV: 'dev'
|
||||
LLM_PROVIDER: openai
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
|
|
|
|||
8
.github/workflows/db_examples_tests.yml
vendored
8
.github/workflows/db_examples_tests.yml
vendored
|
|
@ -60,8 +60,7 @@ jobs:
|
|||
|
||||
- name: Run Neo4j Example
|
||||
env:
|
||||
ENV: 'dev'
|
||||
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
|
||||
ENV: dev
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
|
|
@ -96,7 +95,7 @@ jobs:
|
|||
|
||||
- name: Run Kuzu Example
|
||||
env:
|
||||
ENV: 'dev'
|
||||
ENV: dev
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
|
|
@ -142,8 +141,7 @@ jobs:
|
|||
|
||||
- name: Run PGVector Example
|
||||
env:
|
||||
ENV: 'dev'
|
||||
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
|
||||
ENV: dev
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
|
|
|
|||
1
.github/workflows/distributed_test.yml
vendored
1
.github/workflows/distributed_test.yml
vendored
|
|
@ -47,7 +47,6 @@ jobs:
|
|||
- name: Run Distributed Cognee (Modal)
|
||||
env:
|
||||
ENV: 'dev'
|
||||
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
|
|
|
|||
20
.github/workflows/dockerhub-mcp.yml
vendored
20
.github/workflows/dockerhub-mcp.yml
vendored
|
|
@ -7,29 +7,14 @@ on:
|
|||
|
||||
jobs:
|
||||
docker-build-and-push:
|
||||
runs-on:
|
||||
group: Default
|
||||
labels:
|
||||
- docker_build_runner
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
- name: Check and free disk space before build
|
||||
run: |
|
||||
echo "=== Before cleanup ==="
|
||||
df -h
|
||||
echo "Removing unused preinstalled SDKs to free space..."
|
||||
sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc || true
|
||||
docker system prune -af || true
|
||||
echo "=== After cleanup ==="
|
||||
df -h
|
||||
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
with:
|
||||
buildkitd-flags: --root /tmp/buildkit
|
||||
|
||||
- name: Log in to Docker Hub
|
||||
uses: docker/login-action@v3
|
||||
|
|
@ -49,7 +34,7 @@ jobs:
|
|||
|
||||
- name: Build and push
|
||||
id: build
|
||||
uses: docker/build-push-action@v6
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: .
|
||||
platforms: linux/amd64,linux/arm64
|
||||
|
|
@ -60,6 +45,5 @@ jobs:
|
|||
cache-from: type=registry,ref=cognee/cognee-mcp:buildcache
|
||||
cache-to: type=registry,ref=cognee/cognee-mcp:buildcache,mode=max
|
||||
|
||||
|
||||
- name: Image digest
|
||||
run: echo ${{ steps.build.outputs.digest }}
|
||||
|
|
|
|||
372
.github/workflows/e2e_tests.yml
vendored
372
.github/workflows/e2e_tests.yml
vendored
|
|
@ -1,6 +1,4 @@
|
|||
name: Reusable Integration Tests
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
|
|
@ -147,7 +145,6 @@ jobs:
|
|||
- name: Run Deduplication Example
|
||||
env:
|
||||
ENV: 'dev'
|
||||
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
|
||||
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }} # Test needs OpenAI endpoint to handle multimedia
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
|
|
@ -212,56 +209,6 @@ jobs:
|
|||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
run: uv run python ./cognee/tests/test_parallel_databases.py
|
||||
|
||||
test-dataset-database-handler:
|
||||
name: Test dataset database handlers in Cognee
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
|
||||
- name: Run dataset databases handler test
|
||||
env:
|
||||
ENV: 'dev'
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
run: uv run python ./cognee/tests/test_dataset_database_handler.py
|
||||
|
||||
test-dataset-database-deletion:
|
||||
name: Test dataset database deletion in Cognee
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
|
||||
- name: Run dataset databases deletion test
|
||||
env:
|
||||
ENV: 'dev'
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
run: uv run python ./cognee/tests/test_dataset_delete.py
|
||||
|
||||
test-permissions:
|
||||
name: Test permissions with different situations in Cognee
|
||||
runs-on: ubuntu-22.04
|
||||
|
|
@ -277,7 +224,7 @@ jobs:
|
|||
- name: Dependencies already installed
|
||||
run: echo "Dependencies already installed in setup"
|
||||
|
||||
- name: Run permissions test
|
||||
- name: Run parallel databases test
|
||||
env:
|
||||
ENV: 'dev'
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
|
|
@ -290,31 +237,6 @@ jobs:
|
|||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
run: uv run python ./cognee/tests/test_permissions.py
|
||||
|
||||
test-multi-tenancy:
|
||||
name: Test multi tenancy with different situations in Cognee
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
|
||||
- name: Run multi tenancy test
|
||||
env:
|
||||
ENV: 'dev'
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
run: uv run python ./cognee/tests/test_multi_tenancy.py
|
||||
|
||||
test-graph-edges:
|
||||
name: Test graph edge ingestion
|
||||
runs-on: ubuntu-22.04
|
||||
|
|
@ -342,295 +264,3 @@ jobs:
|
|||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
run: uv run python ./cognee/tests/test_edge_ingestion.py
|
||||
|
||||
run_concurrent_subprocess_access_test:
|
||||
name: Concurrent Subprocess access test
|
||||
runs-on: ubuntu-latest
|
||||
defaults:
|
||||
run:
|
||||
shell: bash
|
||||
services:
|
||||
postgres:
|
||||
image: pgvector/pgvector:pg17
|
||||
env:
|
||||
POSTGRES_USER: cognee
|
||||
POSTGRES_PASSWORD: cognee
|
||||
POSTGRES_DB: cognee_db
|
||||
options: >-
|
||||
--health-cmd pg_isready
|
||||
--health-interval 10s
|
||||
--health-timeout 5s
|
||||
--health-retries 5
|
||||
ports:
|
||||
- 5432:5432
|
||||
|
||||
redis:
|
||||
image: redis:7
|
||||
ports:
|
||||
- 6379:6379
|
||||
options: >-
|
||||
--health-cmd "redis-cli ping"
|
||||
--health-interval 5s
|
||||
--health-timeout 3s
|
||||
--health-retries 5
|
||||
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
extra-dependencies: "postgres redis"
|
||||
|
||||
- name: Run Concurrent subprocess access test (Kuzu/Lancedb/Postgres/Redis)
|
||||
env:
|
||||
ENV: dev
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
GRAPH_DATABASE_PROVIDER: 'kuzu'
|
||||
CACHING: true
|
||||
CACHE_BACKEND: 'redis'
|
||||
SHARED_KUZU_LOCK: true
|
||||
DB_PROVIDER: 'postgres'
|
||||
DB_NAME: 'cognee_db'
|
||||
DB_HOST: '127.0.0.1'
|
||||
DB_PORT: 5432
|
||||
DB_USERNAME: cognee
|
||||
DB_PASSWORD: cognee
|
||||
run: uv run python ./cognee/tests/test_concurrent_subprocess_access.py
|
||||
|
||||
test-entity-extraction:
|
||||
name: Test Entity Extraction
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
|
||||
- name: Dependencies already installed
|
||||
run: echo "Dependencies already installed in setup"
|
||||
|
||||
- name: Run Entity Extraction Test
|
||||
env:
|
||||
ENV: 'dev'
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
run: uv run python ./cognee/tests/tasks/entity_extraction/entity_extraction_test.py
|
||||
|
||||
test-feedback-enrichment:
|
||||
name: Test Feedback Enrichment
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
|
||||
- name: Dependencies already installed
|
||||
run: echo "Dependencies already installed in setup"
|
||||
|
||||
- name: Run Feedback Enrichment Test
|
||||
env:
|
||||
ENV: 'dev'
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
run: uv run python ./cognee/tests/test_feedback_enrichment.py
|
||||
|
||||
test-edge-centered-payload:
|
||||
name: Test Cognify - Edge Centered Payload
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
|
||||
- name: Dependencies already installed
|
||||
run: echo "Dependencies already installed in setup"
|
||||
|
||||
- name: Run Edge Centered Payload Test
|
||||
env:
|
||||
ENV: 'dev'
|
||||
TRIPLET_EMBEDDING: True
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
run: uv run python ./cognee/tests/test_edge_centered_payload.py
|
||||
|
||||
run_conversation_sessions_test_redis:
|
||||
name: Conversation sessions test (Redis)
|
||||
runs-on: ubuntu-latest
|
||||
defaults:
|
||||
run:
|
||||
shell: bash
|
||||
services:
|
||||
postgres:
|
||||
image: pgvector/pgvector:pg17
|
||||
env:
|
||||
POSTGRES_USER: cognee
|
||||
POSTGRES_PASSWORD: cognee
|
||||
POSTGRES_DB: cognee_db
|
||||
options: >-
|
||||
--health-cmd pg_isready
|
||||
--health-interval 10s
|
||||
--health-timeout 5s
|
||||
--health-retries 5
|
||||
ports:
|
||||
- 5432:5432
|
||||
|
||||
redis:
|
||||
image: redis:7
|
||||
ports:
|
||||
- 6379:6379
|
||||
options: >-
|
||||
--health-cmd "redis-cli ping"
|
||||
--health-interval 5s
|
||||
--health-timeout 3s
|
||||
--health-retries 5
|
||||
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
extra-dependencies: "postgres redis"
|
||||
|
||||
- name: Run Conversation session tests (Redis)
|
||||
env:
|
||||
ENV: 'dev'
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
GRAPH_DATABASE_PROVIDER: 'kuzu'
|
||||
CACHING: true
|
||||
CACHE_BACKEND: 'redis'
|
||||
DB_PROVIDER: 'postgres'
|
||||
DB_NAME: 'cognee_db'
|
||||
DB_HOST: '127.0.0.1'
|
||||
DB_PORT: 5432
|
||||
DB_USERNAME: cognee
|
||||
DB_PASSWORD: cognee
|
||||
run: uv run python ./cognee/tests/test_conversation_history.py
|
||||
|
||||
run_conversation_sessions_test_fs:
|
||||
name: Conversation sessions test (FS)
|
||||
runs-on: ubuntu-latest
|
||||
defaults:
|
||||
run:
|
||||
shell: bash
|
||||
services:
|
||||
postgres:
|
||||
image: pgvector/pgvector:pg17
|
||||
env:
|
||||
POSTGRES_USER: cognee
|
||||
POSTGRES_PASSWORD: cognee
|
||||
POSTGRES_DB: cognee_db
|
||||
options: >-
|
||||
--health-cmd pg_isready
|
||||
--health-interval 10s
|
||||
--health-timeout 5s
|
||||
--health-retries 5
|
||||
ports:
|
||||
- 5432:5432
|
||||
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
extra-dependencies: "postgres"
|
||||
|
||||
- name: Run Conversation session tests (FS)
|
||||
env:
|
||||
ENV: dev
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
GRAPH_DATABASE_PROVIDER: 'kuzu'
|
||||
CACHING: true
|
||||
CACHE_BACKEND: 'fs'
|
||||
DB_PROVIDER: 'postgres'
|
||||
DB_NAME: 'cognee_db'
|
||||
DB_HOST: '127.0.0.1'
|
||||
DB_PORT: 5432
|
||||
DB_USERNAME: cognee
|
||||
DB_PASSWORD: cognee
|
||||
run: uv run python ./cognee/tests/test_conversation_history.py
|
||||
|
||||
run-pipeline-cache-test:
|
||||
name: Test Pipeline Caching
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- name: Check out
|
||||
uses: actions/checkout@v4
|
||||
with:
|
||||
fetch-depth: 0
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
|
||||
- name: Run Pipeline Cache Test
|
||||
env:
|
||||
ENV: 'dev'
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
run: uv run python ./cognee/tests/test_pipeline_cache.py
|
||||
|
|
|
|||
112
.github/workflows/examples_tests.yml
vendored
112
.github/workflows/examples_tests.yml
vendored
|
|
@ -21,7 +21,6 @@ jobs:
|
|||
|
||||
- name: Run Multimedia Example
|
||||
env:
|
||||
ENV: 'dev'
|
||||
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
run: uv run python ./examples/python/multimedia_example.py
|
||||
|
|
@ -41,7 +40,6 @@ jobs:
|
|||
|
||||
- name: Run Evaluation Framework Example
|
||||
env:
|
||||
ENV: 'dev'
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
|
|
@ -71,8 +69,6 @@ jobs:
|
|||
|
||||
- name: Run Descriptive Graph Metrics Example
|
||||
env:
|
||||
ENV: 'dev'
|
||||
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
|
|
@ -103,7 +99,6 @@ jobs:
|
|||
|
||||
- name: Run Dynamic Steps Tests
|
||||
env:
|
||||
ENV: 'dev'
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
|
|
@ -115,84 +110,6 @@ jobs:
|
|||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
run: uv run python ./examples/python/dynamic_steps_example.py
|
||||
|
||||
test-temporal-example:
|
||||
name: Run Temporal Tests
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
|
||||
- name: Run Temporal Example
|
||||
env:
|
||||
ENV: 'dev'
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
run: uv run python ./examples/python/temporal_example.py
|
||||
|
||||
test-ontology-example:
|
||||
name: Run Ontology Tests
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
|
||||
- name: Run Ontology Demo Example
|
||||
env:
|
||||
ENV: 'dev'
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
run: uv run python ./examples/python/ontology_demo_example.py
|
||||
|
||||
test-agentic-reasoning:
|
||||
name: Run Agentic Reasoning Tests
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
|
||||
- name: Run Agentic Reasoning Example
|
||||
env:
|
||||
ENV: 'dev'
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
run: uv run python ./examples/python/agentic_reasoning_procurement_example.py
|
||||
|
||||
test-memify:
|
||||
name: Run Memify Example
|
||||
runs-on: ubuntu-22.04
|
||||
|
|
@ -207,7 +124,6 @@ jobs:
|
|||
|
||||
- name: Run Memify Tests
|
||||
env:
|
||||
ENV: 'dev'
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
|
|
@ -219,32 +135,6 @@ jobs:
|
|||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
run: uv run python ./examples/python/memify_coding_agent_example.py
|
||||
|
||||
test-custom-pipeline:
|
||||
name: Run Custom Pipeline Example
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
|
||||
- name: Run Custom Pipeline Example
|
||||
env:
|
||||
ENV: 'dev'
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
run: uv run python ./examples/python/run_custom_pipeline_example.py
|
||||
|
||||
test-permissions-example:
|
||||
name: Run Permissions Example
|
||||
runs-on: ubuntu-22.04
|
||||
|
|
@ -259,7 +149,6 @@ jobs:
|
|||
|
||||
- name: Run Memify Tests
|
||||
env:
|
||||
ENV: 'dev'
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
|
|
@ -285,7 +174,6 @@ jobs:
|
|||
|
||||
- name: Run Docling Test
|
||||
env:
|
||||
ENV: 'dev'
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
|
|
|
|||
1
.github/workflows/graph_db_tests.yml
vendored
1
.github/workflows/graph_db_tests.yml
vendored
|
|
@ -78,7 +78,6 @@ jobs:
|
|||
- name: Run default Neo4j
|
||||
env:
|
||||
ENV: 'dev'
|
||||
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
|
|
|
|||
70
.github/workflows/load_tests.yml
vendored
70
.github/workflows/load_tests.yml
vendored
|
|
@ -1,70 +0,0 @@
|
|||
name: Load tests
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
workflow_call:
|
||||
secrets:
|
||||
LLM_MODEL:
|
||||
required: true
|
||||
LLM_ENDPOINT:
|
||||
required: true
|
||||
LLM_API_KEY:
|
||||
required: true
|
||||
LLM_API_VERSION:
|
||||
required: true
|
||||
EMBEDDING_MODEL:
|
||||
required: true
|
||||
EMBEDDING_ENDPOINT:
|
||||
required: true
|
||||
EMBEDDING_API_KEY:
|
||||
required: true
|
||||
EMBEDDING_API_VERSION:
|
||||
required: true
|
||||
OPENAI_API_KEY:
|
||||
required: true
|
||||
AWS_ACCESS_KEY_ID:
|
||||
required: true
|
||||
AWS_SECRET_ACCESS_KEY:
|
||||
required: true
|
||||
|
||||
jobs:
|
||||
test-load:
|
||||
name: Test Load
|
||||
runs-on: ubuntu-22.04
|
||||
timeout-minutes: 60
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
extra-dependencies: "aws"
|
||||
|
||||
- name: Verify File Descriptor Limit
|
||||
run: ulimit -n
|
||||
|
||||
- name: Run Load Test
|
||||
env:
|
||||
ENV: 'dev'
|
||||
ENABLE_BACKEND_ACCESS_CONTROL: True
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
STORAGE_BACKEND: s3
|
||||
AWS_REGION: eu-west-1
|
||||
AWS_ENDPOINT_URL: https://s3-eu-west-1.amazonaws.com
|
||||
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_S3_DEV_USER_KEY_ID }}
|
||||
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_S3_DEV_USER_SECRET_KEY }}
|
||||
run: uv run python ./cognee/tests/test_load.py
|
||||
|
||||
|
||||
22
.github/workflows/pre_test.yml
vendored
22
.github/workflows/pre_test.yml
vendored
|
|
@ -1,22 +0,0 @@
|
|||
on:
|
||||
workflow_call:
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
jobs:
|
||||
check-uv-lock:
|
||||
name: Validate uv lockfile and project metadata
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
with:
|
||||
fetch-depth: 0
|
||||
|
||||
- name: Install uv
|
||||
uses: astral-sh/setup-uv@v4
|
||||
with:
|
||||
enable-cache: true
|
||||
|
||||
- name: Validate uv lockfile and project metadata
|
||||
run: uv lock --check || { echo "'uv lock --check' failed."; echo "Run 'uv lock' and push your changes."; exit 1; }
|
||||
138
.github/workflows/release.yml
vendored
138
.github/workflows/release.yml
vendored
|
|
@ -1,138 +0,0 @@
|
|||
name: release.yml
|
||||
on:
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
flavour:
|
||||
required: true
|
||||
default: dev
|
||||
type: choice
|
||||
options:
|
||||
- dev
|
||||
- main
|
||||
description: Dev or Main release
|
||||
|
||||
jobs:
|
||||
release-github:
|
||||
name: Create GitHub Release from ${{ inputs.flavour }}
|
||||
outputs:
|
||||
tag: ${{ steps.create_tag.outputs.tag }}
|
||||
version: ${{ steps.create_tag.outputs.version }}
|
||||
permissions:
|
||||
contents: write
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
- name: Check out ${{ inputs.flavour }}
|
||||
uses: actions/checkout@v4
|
||||
with:
|
||||
ref: ${{ inputs.flavour }}
|
||||
- name: Install uv
|
||||
uses: astral-sh/setup-uv@v7
|
||||
|
||||
- name: Create and push git tag
|
||||
id: create_tag
|
||||
run: |
|
||||
VERSION="$(uv version --short)"
|
||||
TAG="v${VERSION}"
|
||||
|
||||
echo "Tag to create: ${TAG}"
|
||||
|
||||
git config user.name "github-actions[bot]"
|
||||
git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
|
||||
|
||||
echo "tag=${TAG}" >> "$GITHUB_OUTPUT"
|
||||
echo "version=${VERSION}" >> "$GITHUB_OUTPUT"
|
||||
|
||||
git tag "${TAG}"
|
||||
git push origin "${TAG}"
|
||||
|
||||
|
||||
- name: Create GitHub Release
|
||||
uses: softprops/action-gh-release@v2
|
||||
with:
|
||||
tag_name: ${{ steps.create_tag.outputs.tag }}
|
||||
generate_release_notes: true
|
||||
env:
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
|
||||
release-pypi-package:
|
||||
needs: release-github
|
||||
name: Release PyPI Package from ${{ inputs.flavour }}
|
||||
permissions:
|
||||
contents: read
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
- name: Check out ${{ inputs.flavour }}
|
||||
uses: actions/checkout@v4
|
||||
with:
|
||||
ref: ${{ inputs.flavour }}
|
||||
|
||||
- name: Install uv
|
||||
uses: astral-sh/setup-uv@v7
|
||||
|
||||
- name: Install Python
|
||||
run: uv python install
|
||||
|
||||
- name: Install dependencies
|
||||
run: uv sync --locked --all-extras
|
||||
|
||||
- name: Build distributions
|
||||
run: uv build
|
||||
|
||||
- name: Publish ${{ inputs.flavour }} release to PyPI
|
||||
env:
|
||||
UV_PUBLISH_TOKEN: ${{ secrets.PYPI_TOKEN }}
|
||||
run: uv publish
|
||||
|
||||
release-docker-image:
|
||||
needs: release-github
|
||||
name: Release Docker Image from ${{ inputs.flavour }}
|
||||
permissions:
|
||||
contents: read
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
- name: Check out ${{ inputs.flavour }}
|
||||
uses: actions/checkout@v4
|
||||
with:
|
||||
ref: ${{ inputs.flavour }}
|
||||
|
||||
- name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
|
||||
- name: Log in to Docker Hub
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKER_USERNAME }}
|
||||
password: ${{ secrets.DOCKER_PASSWORD }}
|
||||
|
||||
- name: Build and push Dev Docker Image
|
||||
if: ${{ inputs.flavour == 'dev' }}
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: .
|
||||
platforms: linux/amd64,linux/arm64
|
||||
push: true
|
||||
tags: cognee/cognee:${{ needs.release-github.outputs.version }}
|
||||
labels: |
|
||||
version=${{ needs.release-github.outputs.version }}
|
||||
flavour=${{ inputs.flavour }}
|
||||
cache-from: type=registry,ref=cognee/cognee:buildcache
|
||||
cache-to: type=registry,ref=cognee/cognee:buildcache,mode=max
|
||||
|
||||
- name: Build and push Main Docker Image
|
||||
if: ${{ inputs.flavour == 'main' }}
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: .
|
||||
platforms: linux/amd64,linux/arm64
|
||||
push: true
|
||||
tags: |
|
||||
cognee/cognee:${{ needs.release-github.outputs.version }}
|
||||
cognee/cognee:latest
|
||||
labels: |
|
||||
version=${{ needs.release-github.outputs.version }}
|
||||
flavour=${{ inputs.flavour }}
|
||||
cache-from: type=registry,ref=cognee/cognee:buildcache
|
||||
cache-to: type=registry,ref=cognee/cognee:buildcache,mode=max
|
||||
17
.github/workflows/release_test.yml
vendored
17
.github/workflows/release_test.yml
vendored
|
|
@ -1,17 +0,0 @@
|
|||
# Long-running, heavy and resource-consuming tests for release validation
|
||||
name: Release Test Workflow
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
pull_request:
|
||||
branches:
|
||||
- main
|
||||
|
||||
jobs:
|
||||
load-tests:
|
||||
name: Load Tests
|
||||
uses: ./.github/workflows/load_tests.yml
|
||||
secrets: inherit
|
||||
78
.github/workflows/scorecard.yml
vendored
78
.github/workflows/scorecard.yml
vendored
|
|
@ -1,78 +0,0 @@
|
|||
# This workflow uses actions that are not certified by GitHub. They are provided
|
||||
# by a third-party and are governed by separate terms of service, privacy
|
||||
# policy, and support documentation.
|
||||
|
||||
name: Scorecard supply-chain security
|
||||
on:
|
||||
# For Branch-Protection check. Only the default branch is supported. See
|
||||
# https://github.com/ossf/scorecard/blob/main/docs/checks.md#branch-protection
|
||||
branch_protection_rule:
|
||||
# To guarantee Maintained check is occasionally updated. See
|
||||
# https://github.com/ossf/scorecard/blob/main/docs/checks.md#maintained
|
||||
schedule:
|
||||
- cron: '35 8 * * 2'
|
||||
push:
|
||||
branches: [ "main" ]
|
||||
|
||||
# Declare default permissions as read only.
|
||||
permissions: read-all
|
||||
|
||||
jobs:
|
||||
analysis:
|
||||
name: Scorecard analysis
|
||||
runs-on: ubuntu-latest
|
||||
# `publish_results: true` only works when run from the default branch. conditional can be removed if disabled.
|
||||
if: github.event.repository.default_branch == github.ref_name || github.event_name == 'pull_request'
|
||||
permissions:
|
||||
# Needed to upload the results to code-scanning dashboard.
|
||||
security-events: write
|
||||
# Needed to publish results and get a badge (see publish_results below).
|
||||
id-token: write
|
||||
# Uncomment the permissions below if installing in a private repository.
|
||||
# contents: read
|
||||
# actions: read
|
||||
|
||||
steps:
|
||||
- name: "Checkout code"
|
||||
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
|
||||
with:
|
||||
persist-credentials: false
|
||||
|
||||
- name: "Run analysis"
|
||||
uses: ossf/scorecard-action@f49aabe0b5af0936a0987cfb85d86b75731b0186 # v2.4.1
|
||||
with:
|
||||
results_file: results.sarif
|
||||
results_format: sarif
|
||||
# (Optional) "write" PAT token. Uncomment the `repo_token` line below if:
|
||||
# - you want to enable the Branch-Protection check on a *public* repository, or
|
||||
# - you are installing Scorecard on a *private* repository
|
||||
# To create the PAT, follow the steps in https://github.com/ossf/scorecard-action?tab=readme-ov-file#authentication-with-fine-grained-pat-optional.
|
||||
# repo_token: ${{ secrets.SCORECARD_TOKEN }}
|
||||
|
||||
# Public repositories:
|
||||
# - Publish results to OpenSSF REST API for easy access by consumers
|
||||
# - Allows the repository to include the Scorecard badge.
|
||||
# - See https://github.com/ossf/scorecard-action#publishing-results.
|
||||
# For private repositories:
|
||||
# - `publish_results` will always be set to `false`, regardless
|
||||
# of the value entered here.
|
||||
publish_results: true
|
||||
|
||||
# (Optional) Uncomment file_mode if you have a .gitattributes with files marked export-ignore
|
||||
# file_mode: git
|
||||
|
||||
# Upload the results as artifacts (optional). Commenting out will disable uploads of run results in SARIF
|
||||
# format to the repository Actions tab.
|
||||
- name: "Upload artifact"
|
||||
uses: actions/upload-artifact@4cec3d8aa04e39d1a68397de0c4cd6fb9dce8ec1 # v4.6.1
|
||||
with:
|
||||
name: SARIF file
|
||||
path: results.sarif
|
||||
retention-days: 5
|
||||
|
||||
# Upload the results to GitHub's code scanning dashboard (optional).
|
||||
# Commenting out will disable upload of results to your repo's Code Scanning dashboard
|
||||
- name: "Upload to code-scanning"
|
||||
uses: github/codeql-action/upload-sarif@v3
|
||||
with:
|
||||
sarif_file: results.sarif
|
||||
3
.github/workflows/search_db_tests.yml
vendored
3
.github/workflows/search_db_tests.yml
vendored
|
|
@ -84,7 +84,6 @@ jobs:
|
|||
GRAPH_DATABASE_PROVIDER: 'neo4j'
|
||||
VECTOR_DB_PROVIDER: 'lancedb'
|
||||
DB_PROVIDER: 'sqlite'
|
||||
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
|
||||
GRAPH_DATABASE_URL: ${{ steps.neo4j.outputs.neo4j-url }}
|
||||
GRAPH_DATABASE_USERNAME: ${{ steps.neo4j.outputs.neo4j-username }}
|
||||
GRAPH_DATABASE_PASSWORD: ${{ steps.neo4j.outputs.neo4j-password }}
|
||||
|
|
@ -136,7 +135,6 @@ jobs:
|
|||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
GRAPH_DATABASE_PROVIDER: 'kuzu'
|
||||
VECTOR_DB_PROVIDER: 'pgvector'
|
||||
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
|
||||
DB_PROVIDER: 'postgres'
|
||||
DB_NAME: 'cognee_db'
|
||||
DB_HOST: '127.0.0.1'
|
||||
|
|
@ -199,7 +197,6 @@ jobs:
|
|||
GRAPH_DATABASE_URL: ${{ steps.neo4j.outputs.neo4j-url }}
|
||||
GRAPH_DATABASE_USERNAME: ${{ steps.neo4j.outputs.neo4j-username }}
|
||||
GRAPH_DATABASE_PASSWORD: ${{ steps.neo4j.outputs.neo4j-password }}
|
||||
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
|
||||
DB_NAME: cognee_db
|
||||
DB_HOST: 127.0.0.1
|
||||
DB_PORT: 5432
|
||||
|
|
|
|||
31
.github/workflows/temporal_graph_tests.yml
vendored
31
.github/workflows/temporal_graph_tests.yml
vendored
|
|
@ -34,9 +34,10 @@ jobs:
|
|||
- name: Run Temporal Graph with Kuzu (lancedb + sqlite)
|
||||
env:
|
||||
ENV: 'dev'
|
||||
LLM_MODEL: ${{ secrets.OPENAI_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.OPENAI_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
|
|
@ -72,10 +73,10 @@ jobs:
|
|||
- name: Run Temporal Graph with Neo4j (lancedb + sqlite)
|
||||
env:
|
||||
ENV: 'dev'
|
||||
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
|
||||
LLM_MODEL: ${{ secrets.OPENAI_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.OPENAI_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
|
|
@ -124,10 +125,10 @@ jobs:
|
|||
- name: Run Temporal Graph with Kuzu (postgres + pgvector)
|
||||
env:
|
||||
ENV: dev
|
||||
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
|
||||
LLM_MODEL: ${{ secrets.OPENAI_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.OPENAI_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
|
|
@ -191,10 +192,10 @@ jobs:
|
|||
- name: Run Temporal Graph with Neo4j (postgres + pgvector)
|
||||
env:
|
||||
ENV: dev
|
||||
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
|
||||
LLM_MODEL: ${{ secrets.OPENAI_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.OPENAI_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
|
|
|
|||
|
|
@ -9,11 +9,7 @@ on:
|
|||
python-versions:
|
||||
required: false
|
||||
type: string
|
||||
default: '["3.10.x", "3.12.x", "3.13.x"]'
|
||||
os:
|
||||
required: false
|
||||
type: string
|
||||
default: '["ubuntu-22.04", "macos-15", "windows-latest"]'
|
||||
default: '["3.10.x", "3.11.x", "3.12.x"]'
|
||||
secrets:
|
||||
LLM_PROVIDER:
|
||||
required: true
|
||||
|
|
@ -44,11 +40,10 @@ jobs:
|
|||
run-unit-tests:
|
||||
name: Unit tests ${{ matrix.python-version }} on ${{ matrix.os }}
|
||||
runs-on: ${{ matrix.os }}
|
||||
timeout-minutes: 60
|
||||
strategy:
|
||||
matrix:
|
||||
python-version: ${{ fromJSON(inputs.python-versions) }}
|
||||
os: ${{ fromJSON(inputs.os) }}
|
||||
os: [ubuntu-22.04, macos-15, windows-latest]
|
||||
fail-fast: false
|
||||
steps:
|
||||
- name: Check out
|
||||
|
|
@ -81,11 +76,10 @@ jobs:
|
|||
run-integration-tests:
|
||||
name: Integration tests ${{ matrix.python-version }} on ${{ matrix.os }}
|
||||
runs-on: ${{ matrix.os }}
|
||||
timeout-minutes: 60
|
||||
strategy:
|
||||
matrix:
|
||||
python-version: ${{ fromJSON(inputs.python-versions) }}
|
||||
os: ${{ fromJSON(inputs.os) }}
|
||||
os: [ ubuntu-22.04, macos-15, windows-latest ]
|
||||
fail-fast: false
|
||||
steps:
|
||||
- name: Check out
|
||||
|
|
@ -118,11 +112,10 @@ jobs:
|
|||
run-library-test:
|
||||
name: Library test ${{ matrix.python-version }} on ${{ matrix.os }}
|
||||
runs-on: ${{ matrix.os }}
|
||||
timeout-minutes: 60
|
||||
strategy:
|
||||
matrix:
|
||||
python-version: ${{ fromJSON(inputs.python-versions) }}
|
||||
os: ${{ fromJSON(inputs.os) }}
|
||||
os: [ ubuntu-22.04, macos-15, windows-latest ]
|
||||
fail-fast: false
|
||||
steps:
|
||||
- name: Check out
|
||||
|
|
@ -155,11 +148,10 @@ jobs:
|
|||
run-build-test:
|
||||
name: Build test ${{ matrix.python-version }} on ${{ matrix.os }}
|
||||
runs-on: ${{ matrix.os }}
|
||||
timeout-minutes: 60
|
||||
strategy:
|
||||
matrix:
|
||||
python-version: ${{ fromJSON(inputs.python-versions) }}
|
||||
os: ${{ fromJSON(inputs.os) }}
|
||||
os: [ ubuntu-22.04, macos-15, windows-latest ]
|
||||
fail-fast: false
|
||||
steps:
|
||||
- name: Check out
|
||||
|
|
@ -185,11 +177,10 @@ jobs:
|
|||
run-soft-deletion-test:
|
||||
name: Soft Delete test ${{ matrix.python-version }} on ${{ matrix.os }}
|
||||
runs-on: ${{ matrix.os }}
|
||||
timeout-minutes: 60
|
||||
strategy:
|
||||
matrix:
|
||||
python-version: ${{ fromJSON(inputs.python-versions) }}
|
||||
os: ${{ fromJSON(inputs.os) }}
|
||||
os: [ ubuntu-22.04, macos-15, windows-latest ]
|
||||
fail-fast: false
|
||||
steps:
|
||||
- name: Check out
|
||||
|
|
@ -202,13 +193,6 @@ jobs:
|
|||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
|
||||
- name: Path setup
|
||||
if: ${{ matrix.os }} == 'windows-latest'
|
||||
shell: bash
|
||||
run: |
|
||||
PATH=$(printf '%s' "$PATH" | tr ':' $'\n' | grep -vi '/git/usr/bin' | paste -sd: -)
|
||||
export PATH
|
||||
|
||||
- name: Run Soft Deletion Tests
|
||||
env:
|
||||
ENV: 'dev'
|
||||
|
|
@ -223,11 +207,10 @@ jobs:
|
|||
run-hard-deletion-test:
|
||||
name: Hard Delete test ${{ matrix.python-version }} on ${{ matrix.os }}
|
||||
runs-on: ${{ matrix.os }}
|
||||
timeout-minutes: 60
|
||||
strategy:
|
||||
matrix:
|
||||
python-version: ${{ fromJSON(inputs.python-versions) }}
|
||||
os: ${{ fromJSON(inputs.os) }}
|
||||
os: [ ubuntu-22.04, macos-15, windows-latest ]
|
||||
fail-fast: false
|
||||
steps:
|
||||
- name: Check out
|
||||
|
|
|
|||
90
.github/workflows/test_llms.yml
vendored
90
.github/workflows/test_llms.yml
vendored
|
|
@ -84,93 +84,3 @@ jobs:
|
|||
EMBEDDING_DIMENSIONS: "3072"
|
||||
EMBEDDING_MAX_TOKENS: "8191"
|
||||
run: uv run python ./examples/python/simple_example.py
|
||||
|
||||
test-bedrock-api-key:
|
||||
name: Run Bedrock API Key Test
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
extra-dependencies: "aws"
|
||||
|
||||
- name: Run Bedrock API Key Simple Example
|
||||
env:
|
||||
LLM_PROVIDER: "bedrock"
|
||||
LLM_API_KEY: ${{ secrets.BEDROCK_API_KEY }}
|
||||
LLM_MODEL: "eu.anthropic.claude-sonnet-4-5-20250929-v1:0"
|
||||
LLM_MAX_TOKENS: "16384"
|
||||
AWS_REGION_NAME: "eu-west-1"
|
||||
EMBEDDING_PROVIDER: "bedrock"
|
||||
EMBEDDING_API_KEY: ${{ secrets.BEDROCK_API_KEY }}
|
||||
EMBEDDING_MODEL: "amazon.titan-embed-text-v2:0"
|
||||
EMBEDDING_DIMENSIONS: "1024"
|
||||
EMBEDDING_MAX_TOKENS: "8191"
|
||||
run: uv run python ./examples/python/simple_example.py
|
||||
|
||||
test-bedrock-aws-credentials:
|
||||
name: Run Bedrock AWS Credentials Test
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
extra-dependencies: "aws"
|
||||
|
||||
- name: Run Bedrock AWS Credentials Simple Example
|
||||
env:
|
||||
LLM_PROVIDER: "bedrock"
|
||||
LLM_MODEL: "eu.anthropic.claude-sonnet-4-5-20250929-v1:0"
|
||||
LLM_MAX_TOKENS: "16384"
|
||||
AWS_REGION_NAME: "eu-west-1"
|
||||
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
|
||||
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
|
||||
EMBEDDING_PROVIDER: "bedrock"
|
||||
EMBEDDING_API_KEY: ${{ secrets.BEDROCK_API_KEY }}
|
||||
EMBEDDING_MODEL: "amazon.titan-embed-text-v2:0"
|
||||
EMBEDDING_DIMENSIONS: "1024"
|
||||
EMBEDDING_MAX_TOKENS: "8191"
|
||||
run: uv run python ./examples/python/simple_example.py
|
||||
|
||||
test-bedrock-aws-profile:
|
||||
name: Run Bedrock AWS Profile Test
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Cognee Setup
|
||||
uses: ./.github/actions/cognee_setup
|
||||
with:
|
||||
python-version: '3.11.x'
|
||||
extra-dependencies: "aws"
|
||||
|
||||
- name: Configure AWS Profile
|
||||
run: |
|
||||
mkdir -p ~/.aws
|
||||
cat > ~/.aws/credentials << EOF
|
||||
[bedrock-test]
|
||||
aws_access_key_id = ${{ secrets.AWS_ACCESS_KEY_ID }}
|
||||
aws_secret_access_key = ${{ secrets.AWS_SECRET_ACCESS_KEY }}
|
||||
EOF
|
||||
|
||||
- name: Run Bedrock AWS Profile Simple Example
|
||||
env:
|
||||
LLM_PROVIDER: "bedrock"
|
||||
LLM_MODEL: "eu.anthropic.claude-sonnet-4-5-20250929-v1:0"
|
||||
LLM_MAX_TOKENS: "16384"
|
||||
AWS_PROFILE_NAME: "bedrock-test"
|
||||
AWS_REGION_NAME: "eu-west-1"
|
||||
EMBEDDING_PROVIDER: "bedrock"
|
||||
EMBEDDING_MODEL: "amazon.titan-embed-text-v2:0"
|
||||
EMBEDDING_DIMENSIONS: "1024"
|
||||
EMBEDDING_MAX_TOKENS: "8191"
|
||||
run: uv run python ./examples/python/simple_example.py
|
||||
21
.github/workflows/test_ollama.yml
vendored
21
.github/workflows/test_ollama.yml
vendored
|
|
@ -7,8 +7,13 @@ jobs:
|
|||
|
||||
run_ollama_test:
|
||||
|
||||
# needs 32 Gb RAM for phi4 in a container
|
||||
runs-on: buildjet-8vcpu-ubuntu-2204
|
||||
# needs 16 Gb RAM for phi4
|
||||
runs-on: buildjet-4vcpu-ubuntu-2204
|
||||
# services:
|
||||
# ollama:
|
||||
# image: ollama/ollama
|
||||
# ports:
|
||||
# - 11434:11434
|
||||
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
|
|
@ -23,6 +28,14 @@ jobs:
|
|||
run: |
|
||||
uv add torch
|
||||
|
||||
# - name: Install ollama
|
||||
# run: curl -fsSL https://ollama.com/install.sh | sh
|
||||
# - name: Run ollama
|
||||
# run: |
|
||||
# ollama serve --openai &
|
||||
# ollama pull llama3.2 &
|
||||
# ollama pull avr/sfr-embedding-mistral:latest
|
||||
|
||||
- name: Start Ollama container
|
||||
run: |
|
||||
docker run -d --name ollama -p 11434:11434 ollama/ollama
|
||||
|
|
@ -62,7 +75,7 @@ jobs:
|
|||
{ "role": "user", "content": "Whatever I say, answer with Yes." }
|
||||
]
|
||||
}'
|
||||
curl -X POST http://127.0.0.1:11434/api/embed \
|
||||
curl -X POST http://127.0.0.1:11434/v1/embeddings \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "avr/sfr-embedding-mistral:latest",
|
||||
|
|
@ -85,7 +98,7 @@ jobs:
|
|||
LLM_MODEL: "phi4"
|
||||
EMBEDDING_PROVIDER: "ollama"
|
||||
EMBEDDING_MODEL: "avr/sfr-embedding-mistral:latest"
|
||||
EMBEDDING_ENDPOINT: "http://localhost:11434/api/embed"
|
||||
EMBEDDING_ENDPOINT: "http://localhost:11434/api/embeddings"
|
||||
EMBEDDING_DIMENSIONS: "4096"
|
||||
HUGGINGFACE_TOKENIZER: "Salesforce/SFR-Embedding-Mistral"
|
||||
run: uv run python ./examples/python/simple_example.py
|
||||
|
|
|
|||
35
.github/workflows/test_suites.yml
vendored
35
.github/workflows/test_suites.yml
vendored
|
|
@ -1,6 +1,4 @@
|
|||
name: Test Suites
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
on:
|
||||
push:
|
||||
|
|
@ -18,21 +16,15 @@ env:
|
|||
RUNTIME__LOG_LEVEL: ERROR
|
||||
ENV: 'dev'
|
||||
|
||||
jobs:
|
||||
pre-test:
|
||||
name: basic checks
|
||||
uses: ./.github/workflows/pre_test.yml
|
||||
|
||||
jobs:
|
||||
basic-tests:
|
||||
name: Basic Tests
|
||||
uses: ./.github/workflows/basic_tests.yml
|
||||
needs: [ pre-test ]
|
||||
secrets: inherit
|
||||
|
||||
e2e-tests:
|
||||
name: End-to-End Tests
|
||||
uses: ./.github/workflows/e2e_tests.yml
|
||||
needs: [ pre-test ]
|
||||
secrets: inherit
|
||||
|
||||
distributed-tests:
|
||||
|
|
@ -88,22 +80,12 @@ jobs:
|
|||
uses: ./.github/workflows/notebooks_tests.yml
|
||||
secrets: inherit
|
||||
|
||||
different-os-tests-basic:
|
||||
name: OS and Python Tests Ubuntu
|
||||
different-operating-systems-tests:
|
||||
name: Operating System and Python Tests
|
||||
needs: [basic-tests, e2e-tests]
|
||||
uses: ./.github/workflows/test_different_operating_systems.yml
|
||||
with:
|
||||
python-versions: '["3.10.x", "3.11.x", "3.12.x", "3.13.x"]'
|
||||
os: '["ubuntu-22.04"]'
|
||||
secrets: inherit
|
||||
|
||||
different-os-tests-extended:
|
||||
name: OS and Python Tests Extended
|
||||
needs: [basic-tests, e2e-tests]
|
||||
uses: ./.github/workflows/test_different_operating_systems.yml
|
||||
with:
|
||||
python-versions: '["3.13.x"]'
|
||||
os: '["macos-15", "windows-latest"]'
|
||||
python-versions: '["3.10.x", "3.11.x", "3.12.x"]'
|
||||
secrets: inherit
|
||||
|
||||
# Matrix-based vector database tests
|
||||
|
|
@ -153,8 +135,7 @@ jobs:
|
|||
e2e-tests,
|
||||
graph-db-tests,
|
||||
notebook-tests,
|
||||
different-os-tests-basic,
|
||||
different-os-tests-extended,
|
||||
different-operating-systems-tests,
|
||||
vector-db-tests,
|
||||
example-tests,
|
||||
llm-tests,
|
||||
|
|
@ -174,8 +155,7 @@ jobs:
|
|||
cli-tests,
|
||||
graph-db-tests,
|
||||
notebook-tests,
|
||||
different-os-tests-basic,
|
||||
different-os-tests-extended,
|
||||
different-operating-systems-tests,
|
||||
vector-db-tests,
|
||||
example-tests,
|
||||
db-examples-tests,
|
||||
|
|
@ -196,8 +176,7 @@ jobs:
|
|||
"${{ needs.cli-tests.result }}" == "success" &&
|
||||
"${{ needs.graph-db-tests.result }}" == "success" &&
|
||||
"${{ needs.notebook-tests.result }}" == "success" &&
|
||||
"${{ needs.different-os-tests-basic.result }}" == "success" &&
|
||||
"${{ needs.different-os-tests-extended.result }}" == "success" &&
|
||||
"${{ needs.different-operating-systems-tests.result }}" == "success" &&
|
||||
"${{ needs.vector-db-tests.result }}" == "success" &&
|
||||
"${{ needs.example-tests.result }}" == "success" &&
|
||||
"${{ needs.db-examples-tests.result }}" == "success" &&
|
||||
|
|
|
|||
3
.github/workflows/vector_db_tests.yml
vendored
3
.github/workflows/vector_db_tests.yml
vendored
|
|
@ -92,7 +92,6 @@ jobs:
|
|||
- name: Run PGVector Tests
|
||||
env:
|
||||
ENV: 'dev'
|
||||
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
|
||||
LLM_MODEL: ${{ secrets.LLM_MODEL }}
|
||||
LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
|
|
@ -128,4 +127,4 @@ jobs:
|
|||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
run: uv run python ./cognee/tests/test_lancedb.py
|
||||
run: uv run python ./cognee/tests/test_lancedb.py
|
||||
36
.github/workflows/weighted_edges_tests.yml
vendored
36
.github/workflows/weighted_edges_tests.yml
vendored
|
|
@ -2,7 +2,7 @@ name: Weighted Edges Tests
|
|||
|
||||
on:
|
||||
push:
|
||||
branches: [ main, dev, weighted_edges ]
|
||||
branches: [ main, weighted_edges ]
|
||||
paths:
|
||||
- 'cognee/modules/graph/utils/get_graph_from_model.py'
|
||||
- 'cognee/infrastructure/engine/models/Edge.py'
|
||||
|
|
@ -10,7 +10,7 @@ on:
|
|||
- 'examples/python/weighted_edges_example.py'
|
||||
- '.github/workflows/weighted_edges_tests.yml'
|
||||
pull_request:
|
||||
branches: [ main, dev ]
|
||||
branches: [ main ]
|
||||
paths:
|
||||
- 'cognee/modules/graph/utils/get_graph_from_model.py'
|
||||
- 'cognee/infrastructure/engine/models/Edge.py'
|
||||
|
|
@ -32,7 +32,7 @@ jobs:
|
|||
env:
|
||||
LLM_PROVIDER: openai
|
||||
LLM_MODEL: gpt-5-mini
|
||||
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
|
||||
steps:
|
||||
- name: Check out repository
|
||||
|
|
@ -67,13 +67,14 @@ jobs:
|
|||
env:
|
||||
LLM_PROVIDER: openai
|
||||
LLM_MODEL: gpt-5-mini
|
||||
LLM_ENDPOINT: https://api.openai.com/v1
|
||||
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
LLM_ENDPOINT: https://api.openai.com/v1/
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: "2024-02-01"
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
EMBEDDING_PROVIDER: openai
|
||||
EMBEDDING_MODEL: text-embedding-3-small
|
||||
EMBEDDING_ENDPOINT: https://api.openai.com/v1/
|
||||
EMBEDDING_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
EMBEDDING_API_VERSION: "2024-02-01"
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
|
|
@ -94,7 +95,6 @@ jobs:
|
|||
|
||||
- name: Run Weighted Edges Tests
|
||||
env:
|
||||
ENABLE_BACKEND_ACCESS_CONTROL: 'false'
|
||||
GRAPH_DATABASE_PROVIDER: ${{ matrix.graph_db_provider }}
|
||||
GRAPH_DATABASE_URL: ${{ matrix.graph_db_provider == 'neo4j' && steps.neo4j.outputs.neo4j-url || '' }}
|
||||
GRAPH_DATABASE_USERNAME: ${{ matrix.graph_db_provider == 'neo4j' && steps.neo4j.outputs.neo4j-username || '' }}
|
||||
|
|
@ -108,14 +108,14 @@ jobs:
|
|||
env:
|
||||
LLM_PROVIDER: openai
|
||||
LLM_MODEL: gpt-5-mini
|
||||
LLM_ENDPOINT: https://api.openai.com/v1
|
||||
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
LLM_ENDPOINT: https://api.openai.com/v1/
|
||||
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
LLM_API_VERSION: "2024-02-01"
|
||||
EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
|
||||
EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
|
||||
EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
|
||||
EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
|
||||
|
||||
EMBEDDING_PROVIDER: openai
|
||||
EMBEDDING_MODEL: text-embedding-3-small
|
||||
EMBEDDING_ENDPOINT: https://api.openai.com/v1/
|
||||
EMBEDDING_API_KEY: ${{ secrets.LLM_API_KEY }}
|
||||
EMBEDDING_API_VERSION: "2024-02-01"
|
||||
steps:
|
||||
- name: Check out repository
|
||||
uses: actions/checkout@v4
|
||||
|
|
@ -166,3 +166,5 @@ jobs:
|
|||
uses: astral-sh/ruff-action@v2
|
||||
with:
|
||||
args: "format --check cognee/modules/graph/utils/get_graph_from_model.py cognee/tests/unit/interfaces/graph/test_weighted_edges.py examples/python/weighted_edges_example.py"
|
||||
|
||||
|
||||
|
|
@ -1,9 +0,0 @@
|
|||
pull_request_rules:
|
||||
- name: Backport to main when backport_main label is set
|
||||
conditions:
|
||||
- label=backport_main
|
||||
- base=dev
|
||||
actions:
|
||||
backport:
|
||||
branches:
|
||||
- main
|
||||
132
AGENTS.md
132
AGENTS.md
|
|
@ -1,132 +0,0 @@
|
|||
## Repository Guidelines
|
||||
|
||||
This document summarizes how to work with the cognee repository: how it’s organized, how to build, test, lint, and contribute. It mirrors our actual tooling and CI while providing quick commands for local development.
|
||||
|
||||
## Project Structure & Module Organization
|
||||
|
||||
- `cognee/`: Core Python library and API.
|
||||
- `api/`: FastAPI application and versioned routers (add, cognify, memify, search, delete, users, datasets, responses, visualize, settings, sync, update, checks).
|
||||
- `cli/`: CLI entry points and subcommands invoked via `cognee` / `cognee-cli`.
|
||||
- `infrastructure/`: Databases, LLM providers, embeddings, loaders, and storage adapters.
|
||||
- `modules/`: Domain logic (graph, retrieval, ontology, users, processing, observability, etc.).
|
||||
- `tasks/`: Reusable tasks (e.g., code graph, web scraping, storage). Extend with new tasks here.
|
||||
- `eval_framework/`: Evaluation utilities and adapters.
|
||||
- `shared/`: Cross-cutting helpers (logging, settings, utils).
|
||||
- `tests/`: Unit, integration, CLI, and end-to-end tests organized by feature.
|
||||
- `__main__.py`: Entrypoint to route to CLI.
|
||||
- `cognee-mcp/`: Model Context Protocol server exposing cognee as MCP tools (SSE/HTTP/stdio). Contains its own README and Dockerfile.
|
||||
- `cognee-frontend/`: Next.js UI for local development and demos.
|
||||
- `distributed/`: Utilities for distributed execution (Modal, workers, queues).
|
||||
- `examples/`: Example scripts demonstrating the public APIs and features (graph, code graph, multimodal, permissions, etc.).
|
||||
- `notebooks/`: Jupyter notebooks for demos and tutorials.
|
||||
- `alembic/`: Database migrations for relational backends.
|
||||
|
||||
Notes:
|
||||
- Co-locate feature-specific helpers under their respective package (`modules/`, `infrastructure/`, or `tasks/`).
|
||||
- Extend the system by adding new tasks, loaders, or retrievers rather than modifying core pipeline mechanisms.
|
||||
|
||||
## Build, Test, and Development Commands
|
||||
|
||||
Python (root) – requires Python >= 3.10 and < 3.14. We recommend `uv` for speed and reproducibility.
|
||||
|
||||
- Create/refresh env and install dev deps:
|
||||
```bash
|
||||
uv sync --dev --all-extras --reinstall
|
||||
```
|
||||
|
||||
- Run the CLI (examples):
|
||||
```bash
|
||||
uv run cognee-cli add "Cognee turns documents into AI memory."
|
||||
uv run cognee-cli cognify
|
||||
uv run cognee-cli search "What does cognee do?"
|
||||
uv run cognee-cli -ui # Launches UI, backend API, and MCP server together
|
||||
```
|
||||
|
||||
- Start the FastAPI server directly:
|
||||
```bash
|
||||
uv run python -m cognee.api.client
|
||||
```
|
||||
|
||||
- Run tests (CI mirrors these commands):
|
||||
```bash
|
||||
uv run pytest cognee/tests/unit/ -v
|
||||
uv run pytest cognee/tests/integration/ -v
|
||||
```
|
||||
|
||||
- Lint and format (ruff):
|
||||
```bash
|
||||
uv run ruff check .
|
||||
uv run ruff format .
|
||||
```
|
||||
|
||||
- Optional static type checks (mypy):
|
||||
```bash
|
||||
uv run mypy cognee/
|
||||
```
|
||||
|
||||
MCP Server (`cognee-mcp/`):
|
||||
|
||||
- Install and run locally:
|
||||
```bash
|
||||
cd cognee-mcp
|
||||
uv sync --dev --all-extras --reinstall
|
||||
uv run python src/server.py # stdio (default)
|
||||
uv run python src/server.py --transport sse
|
||||
uv run python src/server.py --transport http --host 127.0.0.1 --port 8000 --path /mcp
|
||||
```
|
||||
|
||||
- API Mode (connect to a running Cognee API):
|
||||
```bash
|
||||
uv run python src/server.py --transport sse --api-url http://localhost:8000 --api-token YOUR_TOKEN
|
||||
```
|
||||
|
||||
- Docker quickstart (examples): see `cognee-mcp/README.md` for full details
|
||||
```bash
|
||||
docker run -e TRANSPORT_MODE=http --env-file ./.env -p 8000:8000 --rm -it cognee/cognee-mcp:main
|
||||
```
|
||||
|
||||
Frontend (`cognee-frontend/`):
|
||||
```bash
|
||||
cd cognee-frontend
|
||||
npm install
|
||||
npm run dev # Next.js dev server
|
||||
npm run lint # ESLint
|
||||
npm run build && npm start
|
||||
```
|
||||
|
||||
## Coding Style & Naming Conventions
|
||||
|
||||
Python:
|
||||
- 4-space indentation, modules and functions in `snake_case`, classes in `PascalCase`.
|
||||
- Public APIs should be type-annotated where practical.
|
||||
- Use `ruff format` before committing; `ruff check` enforces import hygiene and style (line-length 100 configured in `pyproject.toml`).
|
||||
- Prefer explicit, structured error handling. Use shared logging utilities in `cognee.shared.logging_utils`.
|
||||
|
||||
MCP server and Frontend:
|
||||
- Follow the local `README.md` and ESLint/TypeScript configuration in `cognee-frontend/`.
|
||||
|
||||
## Testing Guidelines
|
||||
|
||||
- Place Python tests under `cognee/tests/`.
|
||||
- Unit tests: `cognee/tests/unit/`
|
||||
- Integration tests: `cognee/tests/integration/`
|
||||
- CLI tests: `cognee/tests/cli_tests/`
|
||||
- Name test files `test_*.py`. Use `pytest.mark.asyncio` for async tests.
|
||||
- Avoid external state; rely on test fixtures and the CI-provided env vars when LLM/embedding providers are required. See CI workflows under `.github/workflows/` for expected environment variables.
|
||||
- When adding public APIs, provide/update targeted examples under `examples/python/`.
|
||||
|
||||
## Commit & Pull Request Guidelines
|
||||
|
||||
- Use clear, imperative subjects (≤ 72 chars) and conventional commit styling in PR titles. Our CI validates semantic PR titles (see `.github/workflows/pr_lint`). Examples:
|
||||
- `feat(graph): add temporal edge weighting`
|
||||
- `fix(api): handle missing auth cookie`
|
||||
- `docs: update installation instructions`
|
||||
- Reference related issues/discussions in the PR body and provide brief context.
|
||||
- PRs should describe scope, list local test commands run, and mention any impacts on MCP server or UI if applicable.
|
||||
- Sign commits and affirm the DCO (see `CONTRIBUTING.md`).
|
||||
|
||||
## CI Mirrors Local Commands
|
||||
|
||||
Our GitHub Actions run the same ruff checks and pytest suites shown above (`.github/workflows/basic_tests.yml` and related workflows). Use the commands in this document locally to minimize CI surprises.
|
||||
|
||||
|
||||
|
|
@ -71,7 +71,7 @@ git clone https://github.com/<your-github-username>/cognee.git
|
|||
cd cognee
|
||||
```
|
||||
In case you are working on Vector and Graph Adapters
|
||||
1. Fork the [**cognee-community**](https://github.com/topoteretes/cognee-community) repository
|
||||
1. Fork the [**cognee**](https://github.com/topoteretes/cognee-community) repository
|
||||
2. Clone your fork:
|
||||
```shell
|
||||
git clone https://github.com/<your-github-username>/cognee-community.git
|
||||
|
|
@ -97,21 +97,6 @@ git checkout -b feature/your-feature-name
|
|||
python cognee/cognee/tests/test_library.py
|
||||
```
|
||||
|
||||
### Running Simple Example
|
||||
|
||||
Change .env.example into .env and provide your OPENAI_API_KEY as LLM_API_KEY
|
||||
|
||||
Make sure to run ```shell uv sync ``` in the root cloned folder or set up a virtual environment to run cognee
|
||||
|
||||
```shell
|
||||
python cognee/cognee/examples/python/simple_example.py
|
||||
```
|
||||
or
|
||||
|
||||
```shell
|
||||
uv run python cognee/cognee/examples/python/simple_example.py
|
||||
```
|
||||
|
||||
## 4. 📤 Submitting Changes
|
||||
|
||||
1. Install ruff on your system
|
||||
|
|
|
|||
167
README.md
167
README.md
|
|
@ -5,27 +5,27 @@
|
|||
|
||||
<br />
|
||||
|
||||
Cognee - Accurate and Persistent AI Memory
|
||||
cognee - Memory for AI Agents in 6 lines of code
|
||||
|
||||
<p align="center">
|
||||
<a href="https://www.youtube.com/watch?v=1bezuvLwJmw&t=2s">Demo</a>
|
||||
.
|
||||
<a href="https://docs.cognee.ai/">Docs</a>
|
||||
.
|
||||
<a href="https://cognee.ai">Learn More</a>
|
||||
<a href="https://cognee.ai">Learn more</a>
|
||||
·
|
||||
<a href="https://discord.gg/NQPKmU5CCg">Join Discord</a>
|
||||
·
|
||||
<a href="https://www.reddit.com/r/AIMemory/">Join r/AIMemory</a>
|
||||
.
|
||||
<a href="https://github.com/topoteretes/cognee-community">Community Plugins & Add-ons</a>
|
||||
<a href="https://docs.cognee.ai/">Docs</a>
|
||||
.
|
||||
<a href="https://github.com/topoteretes/cognee-community">cognee community repo</a>
|
||||
</p>
|
||||
|
||||
|
||||
[](https://GitHub.com/topoteretes/cognee/network/)
|
||||
[](https://GitHub.com/topoteretes/cognee/stargazers/)
|
||||
[](https://GitHub.com/topoteretes/cognee/commit/)
|
||||
[](https://github.com/topoteretes/cognee/tags/)
|
||||
[](https://github.com/topoteretes/cognee/tags/)
|
||||
[](https://pepy.tech/project/cognee)
|
||||
[](https://github.com/topoteretes/cognee/blob/main/LICENSE)
|
||||
[](https://github.com/topoteretes/cognee/graphs/contributors)
|
||||
|
|
@ -41,7 +41,11 @@
|
|||
</a>
|
||||
</p>
|
||||
|
||||
Use your data to build personalized and dynamic memory for AI Agents. Cognee lets you replace RAG with scalable and modular ECL (Extract, Cognify, Load) pipelines.
|
||||
|
||||
|
||||
|
||||
|
||||
Build dynamic memory for Agents and replace RAG using scalable, modular ECL (Extract, Cognify, Load) pipelines.
|
||||
|
||||
<p align="center">
|
||||
🌐 Available Languages
|
||||
|
|
@ -49,7 +53,7 @@ Use your data to build personalized and dynamic memory for AI Agents. Cognee let
|
|||
<!-- Keep these links. Translations will automatically update with the README. -->
|
||||
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=de">Deutsch</a> |
|
||||
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=es">Español</a> |
|
||||
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=fr">Français</a> |
|
||||
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=fr">français</a> |
|
||||
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=ja">日本語</a> |
|
||||
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=ko">한국어</a> |
|
||||
<a href="https://www.readme-i18n.com/topoteretes/cognee?lang=pt">Português</a> |
|
||||
|
|
@ -63,62 +67,73 @@ Use your data to build personalized and dynamic memory for AI Agents. Cognee let
|
|||
</div>
|
||||
</div>
|
||||
|
||||
## About Cognee
|
||||
|
||||
Cognee is an open-source tool and platform that transforms your raw data into persistent and dynamic AI memory for Agents. It combines vector search with graph databases to make your documents both searchable by meaning and connected by relationships.
|
||||
Cognee offers default memory creation and search which we describe bellow. But with Cognee you can build your own!
|
||||
|
||||
|
||||
### Cognee Open Source:
|
||||
## Get Started
|
||||
|
||||
- Interconnects any type of data — including past conversations, files, images, and audio transcriptions
|
||||
- Replaces traditional RAG systems with a unified memory layer built on graphs and vectors
|
||||
- Reduces developer effort and infrastructure cost while improving quality and precision
|
||||
- Provides Pythonic data pipelines for ingestion from 30+ data sources
|
||||
- Offers high customizability through user-defined tasks, modular pipelines, and built-in search endpoints
|
||||
Get started quickly with a Google Colab <a href="https://colab.research.google.com/drive/1jHbWVypDgCLwjE71GSXhRL3YxYhCZzG1?usp=sharing">notebook</a> , <a href="https://deepnote.com/workspace/cognee-382213d0-0444-4c89-8265-13770e333c02/project/cognee-demo-78ffacb9-5832-4611-bb1a-560386068b30/notebook/Notebook-1-75b24cda566d4c24ab348f7150792601?utm_source=share-modal&utm_medium=product-shared-content&utm_campaign=notebook&utm_content=78ffacb9-5832-4611-bb1a-560386068b30">Deepnote notebook</a> or <a href="https://github.com/topoteretes/cognee/tree/main/cognee-starter-kit">starter repo</a>
|
||||
|
||||
|
||||
## Basic Usage & Feature Guide
|
||||
## About cognee
|
||||
|
||||
To learn more, [check out this short, end-to-end Colab walkthrough](https://colab.research.google.com/drive/12Vi9zID-M3fpKpKiaqDBvkk98ElkRPWy?usp=sharing) of Cognee's core features.
|
||||
cognee works locally and stores your data on your device.
|
||||
Our hosted solution is just our deployment of OSS cognee on Modal, with the goal of making development and productionization easier.
|
||||
|
||||
[](https://colab.research.google.com/drive/12Vi9zID-M3fpKpKiaqDBvkk98ElkRPWy?usp=sharing)
|
||||
Self-hosted package:
|
||||
|
||||
## Quickstart
|
||||
- Interconnects any kind of documents: past conversations, files, images, and audio transcriptions
|
||||
- Replaces RAG systems with a memory layer based on graphs and vectors
|
||||
- Reduces developer effort and cost, while increasing quality and precision
|
||||
- Provides Pythonic data pipelines that manage data ingestion from 30+ data sources
|
||||
- Is highly customizable with custom tasks, pipelines, and a set of built-in search endpoints
|
||||
|
||||
Let’s try Cognee in just a few lines of code. For detailed setup and configuration, see the [Cognee Docs](https://docs.cognee.ai/getting-started/installation#environment-configuration).
|
||||
Hosted platform:
|
||||
- Includes a managed UI and a [hosted solution](https://www.cognee.ai)
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- Python 3.10 to 3.13
|
||||
|
||||
### Step 1: Install Cognee
|
||||
## Self-Hosted (Open Source)
|
||||
|
||||
You can install Cognee with **pip**, **poetry**, **uv**, or your preferred Python package manager.
|
||||
|
||||
### 📦 Installation
|
||||
|
||||
You can install Cognee using either **pip**, **poetry**, **uv** or any other python package manager.
|
||||
|
||||
Cognee supports Python 3.10 to 3.12
|
||||
|
||||
#### With uv
|
||||
|
||||
```bash
|
||||
uv pip install cognee
|
||||
```
|
||||
|
||||
### Step 2: Configure the LLM
|
||||
```python
|
||||
Detailed instructions can be found in our [docs](https://docs.cognee.ai/getting-started/installation#environment-configuration)
|
||||
|
||||
### 💻 Basic Usage
|
||||
|
||||
#### Setup
|
||||
|
||||
```
|
||||
import os
|
||||
os.environ["LLM_API_KEY"] = "YOUR OPENAI_API_KEY"
|
||||
|
||||
```
|
||||
Alternatively, create a `.env` file using our [template](https://github.com/topoteretes/cognee/blob/main/.env.template).
|
||||
|
||||
To integrate other LLM providers, see our [LLM Provider Documentation](https://docs.cognee.ai/setup-configuration/llm-providers).
|
||||
You can also set the variables by creating .env file, using our <a href="https://github.com/topoteretes/cognee/blob/main/.env.template">template.</a>
|
||||
To use different LLM providers, for more info check out our <a href="https://docs.cognee.ai/setup-configuration/llm-providers">documentation</a>
|
||||
|
||||
### Step 3: Run the Pipeline
|
||||
|
||||
Cognee will take your documents, generate a knowledge graph from them and then query the graph based on combined relationships.
|
||||
#### Simple example
|
||||
|
||||
Now, run a minimal pipeline:
|
||||
|
||||
|
||||
##### Python
|
||||
|
||||
This script will run the default pipeline:
|
||||
|
||||
```python
|
||||
import cognee
|
||||
import asyncio
|
||||
from pprint import pprint
|
||||
|
||||
|
||||
async def main():
|
||||
|
|
@ -132,81 +147,89 @@ async def main():
|
|||
await cognee.memify()
|
||||
|
||||
# Query the knowledge graph
|
||||
results = await cognee.search("What does Cognee do?")
|
||||
results = await cognee.search("What does cognee do?")
|
||||
|
||||
# Display the results
|
||||
for result in results:
|
||||
pprint(result)
|
||||
print(result)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
asyncio.run(main())
|
||||
|
||||
```
|
||||
|
||||
As you can see, the output is generated from the document we previously stored in Cognee:
|
||||
|
||||
```bash
|
||||
Cognee turns documents into AI memory.
|
||||
Example output:
|
||||
```
|
||||
Cognee turns documents into AI memory.
|
||||
|
||||
### Use the Cognee CLI
|
||||
```
|
||||
##### Via CLI
|
||||
|
||||
As an alternative, you can get started with these essential commands:
|
||||
Let's get the basics covered
|
||||
|
||||
```bash
|
||||
```
|
||||
cognee-cli add "Cognee turns documents into AI memory."
|
||||
|
||||
cognee-cli cognify
|
||||
|
||||
cognee-cli search "What does Cognee do?"
|
||||
cognee-cli search "What does cognee do?"
|
||||
cognee-cli delete --all
|
||||
|
||||
```
|
||||
|
||||
To open the local UI, run:
|
||||
```bash
|
||||
or run
|
||||
```
|
||||
cognee-cli -ui
|
||||
```
|
||||
|
||||
## Demos & Examples
|
||||
|
||||
See Cognee in action:
|
||||
|
||||
### Persistent Agent Memory
|
||||
|
||||
[Cognee Memory for LangGraph Agents](https://github.com/user-attachments/assets/e113b628-7212-4a2b-b288-0be39a93a1c3)
|
||||
|
||||
### Simple GraphRAG
|
||||
|
||||
[Watch Demo](https://github.com/user-attachments/assets/f2186b2e-305a-42b0-9c2d-9f4473f15df8)
|
||||
|
||||
### Cognee with Ollama
|
||||
|
||||
[Watch Demo](https://github.com/user-attachments/assets/39672858-f774-4136-b957-1e2de67b8981)
|
||||
</div>
|
||||
|
||||
|
||||
## Community & Support
|
||||
### Hosted Platform
|
||||
|
||||
### Contributing
|
||||
We welcome contributions from the community! Your input helps make Cognee better for everyone. See [`CONTRIBUTING.md`](CONTRIBUTING.md) to get started.
|
||||
Get up and running in minutes with automatic updates, analytics, and enterprise security.
|
||||
|
||||
### Code of Conduct
|
||||
1. Sign up on [cogwit](https://www.cognee.ai)
|
||||
2. Add your API key to local UI and sync your data to Cogwit
|
||||
|
||||
We're committed to fostering an inclusive and respectful community. Read our [Code of Conduct](https://github.com/topoteretes/cognee/blob/main/CODE_OF_CONDUCT.md) for guidelines.
|
||||
|
||||
## Research & Citation
|
||||
|
||||
We recently published a research paper on optimizing knowledge graphs for LLM reasoning:
|
||||
|
||||
## Demos
|
||||
|
||||
1. Cogwit Beta demo:
|
||||
|
||||
[Cogwit Beta](https://github.com/user-attachments/assets/fa520cd2-2913-4246-a444-902ea5242cb0)
|
||||
|
||||
2. Simple GraphRAG demo
|
||||
|
||||
[Simple GraphRAG demo](https://github.com/user-attachments/assets/d80b0776-4eb9-4b8e-aa22-3691e2d44b8f)
|
||||
|
||||
3. cognee with Ollama
|
||||
|
||||
[cognee with local models](https://github.com/user-attachments/assets/8621d3e8-ecb8-4860-afb2-5594f2ee17db)
|
||||
|
||||
|
||||
## Contributing
|
||||
Your contributions are at the core of making this a true open source project. Any contributions you make are **greatly appreciated**. See [`CONTRIBUTING.md`](CONTRIBUTING.md) for more information.
|
||||
|
||||
|
||||
## Code of Conduct
|
||||
|
||||
We are committed to making open source an enjoyable and respectful experience for our community. See <a href="https://github.com/topoteretes/cognee/blob/main/CODE_OF_CONDUCT.md"><code>CODE_OF_CONDUCT</code></a> for more information.
|
||||
|
||||
## Citation
|
||||
|
||||
We now have a paper you can cite:
|
||||
|
||||
```bibtex
|
||||
@misc{markovic2025optimizinginterfaceknowledgegraphs,
|
||||
title={Optimizing the Interface Between Knowledge Graphs and LLMs for Complex Reasoning},
|
||||
title={Optimizing the Interface Between Knowledge Graphs and LLMs for Complex Reasoning},
|
||||
author={Vasilije Markovic and Lazar Obradovic and Laszlo Hajdu and Jovan Pavlovic},
|
||||
year={2025},
|
||||
eprint={2505.24478},
|
||||
archivePrefix={arXiv},
|
||||
primaryClass={cs.AI},
|
||||
url={https://arxiv.org/abs/2505.24478},
|
||||
url={https://arxiv.org/abs/2505.24478},
|
||||
}
|
||||
```
|
||||
|
|
|
|||
|
|
@ -87,6 +87,11 @@ db_engine = get_relational_engine()
|
|||
|
||||
print("Using database:", db_engine.db_uri)
|
||||
|
||||
if "sqlite" in db_engine.db_uri:
|
||||
from cognee.infrastructure.utils.run_sync import run_sync
|
||||
|
||||
run_sync(db_engine.create_database())
|
||||
|
||||
config.set_section_option(
|
||||
config.config_ini_section,
|
||||
"SQLALCHEMY_DATABASE_URI",
|
||||
|
|
|
|||
|
|
@ -10,7 +10,6 @@ from typing import Sequence, Union
|
|||
|
||||
from alembic import op
|
||||
import sqlalchemy as sa
|
||||
from sqlalchemy.dialects import postgresql
|
||||
|
||||
|
||||
# revision identifiers, used by Alembic.
|
||||
|
|
@ -27,34 +26,7 @@ def upgrade() -> None:
|
|||
connection = op.get_bind()
|
||||
inspector = sa.inspect(connection)
|
||||
|
||||
if op.get_context().dialect.name == "postgresql":
|
||||
syncstatus_enum = postgresql.ENUM(
|
||||
"STARTED", "IN_PROGRESS", "COMPLETED", "FAILED", "CANCELLED", name="syncstatus"
|
||||
)
|
||||
syncstatus_enum.create(op.get_bind(), checkfirst=True)
|
||||
|
||||
if "sync_operations" not in inspector.get_table_names():
|
||||
if op.get_context().dialect.name == "postgresql":
|
||||
syncstatus = postgresql.ENUM(
|
||||
"STARTED",
|
||||
"IN_PROGRESS",
|
||||
"COMPLETED",
|
||||
"FAILED",
|
||||
"CANCELLED",
|
||||
name="syncstatus",
|
||||
create_type=False,
|
||||
)
|
||||
else:
|
||||
syncstatus = sa.Enum(
|
||||
"STARTED",
|
||||
"IN_PROGRESS",
|
||||
"COMPLETED",
|
||||
"FAILED",
|
||||
"CANCELLED",
|
||||
name="syncstatus",
|
||||
create_type=False,
|
||||
)
|
||||
|
||||
# Table doesn't exist, create it normally
|
||||
op.create_table(
|
||||
"sync_operations",
|
||||
|
|
@ -62,7 +34,15 @@ def upgrade() -> None:
|
|||
sa.Column("run_id", sa.Text(), nullable=True),
|
||||
sa.Column(
|
||||
"status",
|
||||
syncstatus,
|
||||
sa.Enum(
|
||||
"STARTED",
|
||||
"IN_PROGRESS",
|
||||
"COMPLETED",
|
||||
"FAILED",
|
||||
"CANCELLED",
|
||||
name="syncstatus",
|
||||
create_type=False,
|
||||
),
|
||||
nullable=True,
|
||||
),
|
||||
sa.Column("progress_percentage", sa.Integer(), nullable=True),
|
||||
|
|
|
|||
|
|
@ -1,333 +0,0 @@
|
|||
"""Expand dataset database with json connection field
|
||||
|
||||
Revision ID: 46a6ce2bd2b2
|
||||
Revises: 76625596c5c3
|
||||
Create Date: 2025-11-25 17:56:28.938931
|
||||
|
||||
"""
|
||||
|
||||
from typing import Sequence, Union
|
||||
|
||||
from alembic import op
|
||||
import sqlalchemy as sa
|
||||
|
||||
|
||||
# revision identifiers, used by Alembic.
|
||||
revision: str = "46a6ce2bd2b2"
|
||||
down_revision: Union[str, None] = "76625596c5c3"
|
||||
branch_labels: Union[str, Sequence[str], None] = None
|
||||
depends_on: Union[str, Sequence[str], None] = None
|
||||
|
||||
graph_constraint_name = "dataset_database_graph_database_name_key"
|
||||
vector_constraint_name = "dataset_database_vector_database_name_key"
|
||||
TABLE_NAME = "dataset_database"
|
||||
|
||||
|
||||
def _get_column(inspector, table, name, schema=None):
|
||||
for col in inspector.get_columns(table, schema=schema):
|
||||
if col["name"] == name:
|
||||
return col
|
||||
return None
|
||||
|
||||
|
||||
def _recreate_table_without_unique_constraint_sqlite(op, insp):
|
||||
"""
|
||||
SQLite cannot drop unique constraints on individual columns. We must:
|
||||
1. Create a new table without the unique constraints.
|
||||
2. Copy data from the old table.
|
||||
3. Drop the old table.
|
||||
4. Rename the new table.
|
||||
"""
|
||||
conn = op.get_bind()
|
||||
|
||||
# Create new table definition (without unique constraints)
|
||||
op.create_table(
|
||||
f"{TABLE_NAME}_new",
|
||||
sa.Column("owner_id", sa.UUID()),
|
||||
sa.Column("dataset_id", sa.UUID(), primary_key=True, nullable=False),
|
||||
sa.Column("vector_database_name", sa.String(), nullable=False),
|
||||
sa.Column("graph_database_name", sa.String(), nullable=False),
|
||||
sa.Column("vector_database_provider", sa.String(), nullable=False),
|
||||
sa.Column("graph_database_provider", sa.String(), nullable=False),
|
||||
sa.Column(
|
||||
"vector_dataset_database_handler",
|
||||
sa.String(),
|
||||
unique=False,
|
||||
nullable=False,
|
||||
server_default="lancedb",
|
||||
),
|
||||
sa.Column(
|
||||
"graph_dataset_database_handler",
|
||||
sa.String(),
|
||||
unique=False,
|
||||
nullable=False,
|
||||
server_default="kuzu",
|
||||
),
|
||||
sa.Column("vector_database_url", sa.String()),
|
||||
sa.Column("graph_database_url", sa.String()),
|
||||
sa.Column("vector_database_key", sa.String()),
|
||||
sa.Column("graph_database_key", sa.String()),
|
||||
sa.Column(
|
||||
"graph_database_connection_info",
|
||||
sa.JSON(),
|
||||
nullable=False,
|
||||
server_default=sa.text("'{}'"),
|
||||
),
|
||||
sa.Column(
|
||||
"vector_database_connection_info",
|
||||
sa.JSON(),
|
||||
nullable=False,
|
||||
server_default=sa.text("'{}'"),
|
||||
),
|
||||
sa.Column("created_at", sa.DateTime()),
|
||||
sa.Column("updated_at", sa.DateTime()),
|
||||
sa.ForeignKeyConstraint(["dataset_id"], ["datasets.id"], ondelete="CASCADE"),
|
||||
sa.ForeignKeyConstraint(["owner_id"], ["principals.id"], ondelete="CASCADE"),
|
||||
)
|
||||
|
||||
# Copy data into new table
|
||||
conn.execute(
|
||||
sa.text(f"""
|
||||
INSERT INTO {TABLE_NAME}_new
|
||||
SELECT
|
||||
owner_id,
|
||||
dataset_id,
|
||||
vector_database_name,
|
||||
graph_database_name,
|
||||
vector_database_provider,
|
||||
graph_database_provider,
|
||||
vector_dataset_database_handler,
|
||||
graph_dataset_database_handler,
|
||||
vector_database_url,
|
||||
graph_database_url,
|
||||
vector_database_key,
|
||||
graph_database_key,
|
||||
COALESCE(graph_database_connection_info, '{{}}'),
|
||||
COALESCE(vector_database_connection_info, '{{}}'),
|
||||
created_at,
|
||||
updated_at
|
||||
FROM {TABLE_NAME}
|
||||
""")
|
||||
)
|
||||
|
||||
# Drop old table
|
||||
op.drop_table(TABLE_NAME)
|
||||
|
||||
# Rename new table
|
||||
op.rename_table(f"{TABLE_NAME}_new", TABLE_NAME)
|
||||
|
||||
|
||||
def _recreate_table_with_unique_constraint_sqlite(op, insp):
|
||||
"""
|
||||
SQLite cannot drop unique constraints on individual columns. We must:
|
||||
1. Create a new table without the unique constraints.
|
||||
2. Copy data from the old table.
|
||||
3. Drop the old table.
|
||||
4. Rename the new table.
|
||||
"""
|
||||
conn = op.get_bind()
|
||||
|
||||
# Create new table definition (without unique constraints)
|
||||
op.create_table(
|
||||
f"{TABLE_NAME}_new",
|
||||
sa.Column("owner_id", sa.UUID()),
|
||||
sa.Column("dataset_id", sa.UUID(), primary_key=True, nullable=False),
|
||||
sa.Column("vector_database_name", sa.String(), nullable=False, unique=True),
|
||||
sa.Column("graph_database_name", sa.String(), nullable=False, unique=True),
|
||||
sa.Column("vector_database_provider", sa.String(), nullable=False),
|
||||
sa.Column("graph_database_provider", sa.String(), nullable=False),
|
||||
sa.Column(
|
||||
"vector_dataset_database_handler",
|
||||
sa.String(),
|
||||
unique=False,
|
||||
nullable=False,
|
||||
server_default="lancedb",
|
||||
),
|
||||
sa.Column(
|
||||
"graph_dataset_database_handler",
|
||||
sa.String(),
|
||||
unique=False,
|
||||
nullable=False,
|
||||
server_default="kuzu",
|
||||
),
|
||||
sa.Column("vector_database_url", sa.String()),
|
||||
sa.Column("graph_database_url", sa.String()),
|
||||
sa.Column("vector_database_key", sa.String()),
|
||||
sa.Column("graph_database_key", sa.String()),
|
||||
sa.Column(
|
||||
"graph_database_connection_info",
|
||||
sa.JSON(),
|
||||
nullable=False,
|
||||
server_default=sa.text("'{}'"),
|
||||
),
|
||||
sa.Column(
|
||||
"vector_database_connection_info",
|
||||
sa.JSON(),
|
||||
nullable=False,
|
||||
server_default=sa.text("'{}'"),
|
||||
),
|
||||
sa.Column("created_at", sa.DateTime()),
|
||||
sa.Column("updated_at", sa.DateTime()),
|
||||
sa.ForeignKeyConstraint(["dataset_id"], ["datasets.id"], ondelete="CASCADE"),
|
||||
sa.ForeignKeyConstraint(["owner_id"], ["principals.id"], ondelete="CASCADE"),
|
||||
)
|
||||
|
||||
# Copy data into new table
|
||||
conn.execute(
|
||||
sa.text(f"""
|
||||
INSERT INTO {TABLE_NAME}_new
|
||||
SELECT
|
||||
owner_id,
|
||||
dataset_id,
|
||||
vector_database_name,
|
||||
graph_database_name,
|
||||
vector_database_provider,
|
||||
graph_database_provider,
|
||||
vector_dataset_database_handler,
|
||||
graph_dataset_database_handler,
|
||||
vector_database_url,
|
||||
graph_database_url,
|
||||
vector_database_key,
|
||||
graph_database_key,
|
||||
COALESCE(graph_database_connection_info, '{{}}'),
|
||||
COALESCE(vector_database_connection_info, '{{}}'),
|
||||
created_at,
|
||||
updated_at
|
||||
FROM {TABLE_NAME}
|
||||
""")
|
||||
)
|
||||
|
||||
# Drop old table
|
||||
op.drop_table(TABLE_NAME)
|
||||
|
||||
# Rename new table
|
||||
op.rename_table(f"{TABLE_NAME}_new", TABLE_NAME)
|
||||
|
||||
|
||||
def upgrade() -> None:
|
||||
conn = op.get_bind()
|
||||
insp = sa.inspect(conn)
|
||||
|
||||
unique_constraints = insp.get_unique_constraints(TABLE_NAME)
|
||||
|
||||
vector_database_connection_info_column = _get_column(
|
||||
insp, "dataset_database", "vector_database_connection_info"
|
||||
)
|
||||
if not vector_database_connection_info_column:
|
||||
op.add_column(
|
||||
"dataset_database",
|
||||
sa.Column(
|
||||
"vector_database_connection_info",
|
||||
sa.JSON(),
|
||||
unique=False,
|
||||
nullable=False,
|
||||
server_default=sa.text("'{}'"),
|
||||
),
|
||||
)
|
||||
|
||||
vector_dataset_database_handler = _get_column(
|
||||
insp, "dataset_database", "vector_dataset_database_handler"
|
||||
)
|
||||
if not vector_dataset_database_handler:
|
||||
# Add LanceDB as the default graph dataset database handler
|
||||
op.add_column(
|
||||
"dataset_database",
|
||||
sa.Column(
|
||||
"vector_dataset_database_handler",
|
||||
sa.String(),
|
||||
unique=False,
|
||||
nullable=False,
|
||||
server_default="lancedb",
|
||||
),
|
||||
)
|
||||
|
||||
graph_database_connection_info_column = _get_column(
|
||||
insp, "dataset_database", "graph_database_connection_info"
|
||||
)
|
||||
if not graph_database_connection_info_column:
|
||||
op.add_column(
|
||||
"dataset_database",
|
||||
sa.Column(
|
||||
"graph_database_connection_info",
|
||||
sa.JSON(),
|
||||
unique=False,
|
||||
nullable=False,
|
||||
server_default=sa.text("'{}'"),
|
||||
),
|
||||
)
|
||||
|
||||
graph_dataset_database_handler = _get_column(
|
||||
insp, "dataset_database", "graph_dataset_database_handler"
|
||||
)
|
||||
if not graph_dataset_database_handler:
|
||||
# Add Kuzu as the default graph dataset database handler
|
||||
op.add_column(
|
||||
"dataset_database",
|
||||
sa.Column(
|
||||
"graph_dataset_database_handler",
|
||||
sa.String(),
|
||||
unique=False,
|
||||
nullable=False,
|
||||
server_default="kuzu",
|
||||
),
|
||||
)
|
||||
|
||||
with op.batch_alter_table("dataset_database", schema=None) as batch_op:
|
||||
# Drop the unique constraint to make unique=False
|
||||
graph_constraint_to_drop = None
|
||||
for uc in unique_constraints:
|
||||
# Check if the constraint covers ONLY the target column
|
||||
if uc["name"] == graph_constraint_name:
|
||||
graph_constraint_to_drop = uc["name"]
|
||||
break
|
||||
|
||||
vector_constraint_to_drop = None
|
||||
for uc in unique_constraints:
|
||||
# Check if the constraint covers ONLY the target column
|
||||
if uc["name"] == vector_constraint_name:
|
||||
vector_constraint_to_drop = uc["name"]
|
||||
break
|
||||
|
||||
if (
|
||||
vector_constraint_to_drop
|
||||
and graph_constraint_to_drop
|
||||
and op.get_context().dialect.name == "postgresql"
|
||||
):
|
||||
# PostgreSQL
|
||||
batch_op.drop_constraint(graph_constraint_name, type_="unique")
|
||||
batch_op.drop_constraint(vector_constraint_name, type_="unique")
|
||||
|
||||
if op.get_context().dialect.name == "sqlite":
|
||||
conn = op.get_bind()
|
||||
# Fun fact: SQLite has hidden auto indexes for unique constraints that can't be dropped or accessed directly
|
||||
# So we need to check for them and drop them by recreating the table (altering column also won't work)
|
||||
result = conn.execute(sa.text("PRAGMA index_list('dataset_database')"))
|
||||
rows = result.fetchall()
|
||||
unique_auto_indexes = [row for row in rows if row[3] == "u"]
|
||||
for row in unique_auto_indexes:
|
||||
result = conn.execute(sa.text(f"PRAGMA index_info('{row[1]}')"))
|
||||
index_info = result.fetchall()
|
||||
if index_info[0][2] == "vector_database_name":
|
||||
# In case a unique index exists on vector_database_name, drop it and the graph_database_name one
|
||||
_recreate_table_without_unique_constraint_sqlite(op, insp)
|
||||
|
||||
|
||||
def downgrade() -> None:
|
||||
conn = op.get_bind()
|
||||
insp = sa.inspect(conn)
|
||||
|
||||
if op.get_context().dialect.name == "sqlite":
|
||||
_recreate_table_with_unique_constraint_sqlite(op, insp)
|
||||
elif op.get_context().dialect.name == "postgresql":
|
||||
with op.batch_alter_table("dataset_database", schema=None) as batch_op:
|
||||
# Re-add the unique constraint to return to unique=True
|
||||
batch_op.create_unique_constraint(graph_constraint_name, ["graph_database_name"])
|
||||
|
||||
with op.batch_alter_table("dataset_database", schema=None) as batch_op:
|
||||
# Re-add the unique constraint to return to unique=True
|
||||
batch_op.create_unique_constraint(vector_constraint_name, ["vector_database_name"])
|
||||
|
||||
op.drop_column("dataset_database", "vector_database_connection_info")
|
||||
op.drop_column("dataset_database", "graph_database_connection_info")
|
||||
op.drop_column("dataset_database", "vector_dataset_database_handler")
|
||||
op.drop_column("dataset_database", "graph_dataset_database_handler")
|
||||
|
|
@ -23,8 +23,11 @@ depends_on: Union[str, Sequence[str], None] = "8057ae7329c2"
|
|||
|
||||
|
||||
def upgrade() -> None:
|
||||
pass
|
||||
try:
|
||||
await_only(create_default_user())
|
||||
except UserAlreadyExists:
|
||||
pass # It's fine if the default user already exists
|
||||
|
||||
|
||||
def downgrade() -> None:
|
||||
pass
|
||||
await_only(delete_user("default_user@example.com"))
|
||||
|
|
|
|||
|
|
@ -1,98 +0,0 @@
|
|||
"""Expand dataset database for multi user
|
||||
|
||||
Revision ID: 76625596c5c3
|
||||
Revises: 211ab850ef3d
|
||||
Create Date: 2025-10-30 12:55:20.239562
|
||||
|
||||
"""
|
||||
|
||||
from typing import Sequence, Union
|
||||
|
||||
from alembic import op
|
||||
import sqlalchemy as sa
|
||||
|
||||
|
||||
# revision identifiers, used by Alembic.
|
||||
revision: str = "76625596c5c3"
|
||||
down_revision: Union[str, None] = "c946955da633"
|
||||
branch_labels: Union[str, Sequence[str], None] = None
|
||||
depends_on: Union[str, Sequence[str], None] = None
|
||||
|
||||
|
||||
def _get_column(inspector, table, name, schema=None):
|
||||
for col in inspector.get_columns(table, schema=schema):
|
||||
if col["name"] == name:
|
||||
return col
|
||||
return None
|
||||
|
||||
|
||||
def upgrade() -> None:
|
||||
conn = op.get_bind()
|
||||
insp = sa.inspect(conn)
|
||||
|
||||
vector_database_provider_column = _get_column(
|
||||
insp, "dataset_database", "vector_database_provider"
|
||||
)
|
||||
if not vector_database_provider_column:
|
||||
op.add_column(
|
||||
"dataset_database",
|
||||
sa.Column(
|
||||
"vector_database_provider",
|
||||
sa.String(),
|
||||
unique=False,
|
||||
nullable=False,
|
||||
server_default="lancedb",
|
||||
),
|
||||
)
|
||||
|
||||
graph_database_provider_column = _get_column(
|
||||
insp, "dataset_database", "graph_database_provider"
|
||||
)
|
||||
if not graph_database_provider_column:
|
||||
op.add_column(
|
||||
"dataset_database",
|
||||
sa.Column(
|
||||
"graph_database_provider",
|
||||
sa.String(),
|
||||
unique=False,
|
||||
nullable=False,
|
||||
server_default="kuzu",
|
||||
),
|
||||
)
|
||||
|
||||
vector_database_url_column = _get_column(insp, "dataset_database", "vector_database_url")
|
||||
if not vector_database_url_column:
|
||||
op.add_column(
|
||||
"dataset_database",
|
||||
sa.Column("vector_database_url", sa.String(), unique=False, nullable=True),
|
||||
)
|
||||
|
||||
graph_database_url_column = _get_column(insp, "dataset_database", "graph_database_url")
|
||||
if not graph_database_url_column:
|
||||
op.add_column(
|
||||
"dataset_database",
|
||||
sa.Column("graph_database_url", sa.String(), unique=False, nullable=True),
|
||||
)
|
||||
|
||||
vector_database_key_column = _get_column(insp, "dataset_database", "vector_database_key")
|
||||
if not vector_database_key_column:
|
||||
op.add_column(
|
||||
"dataset_database",
|
||||
sa.Column("vector_database_key", sa.String(), unique=False, nullable=True),
|
||||
)
|
||||
|
||||
graph_database_key_column = _get_column(insp, "dataset_database", "graph_database_key")
|
||||
if not graph_database_key_column:
|
||||
op.add_column(
|
||||
"dataset_database",
|
||||
sa.Column("graph_database_key", sa.String(), unique=False, nullable=True),
|
||||
)
|
||||
|
||||
|
||||
def downgrade() -> None:
|
||||
op.drop_column("dataset_database", "vector_database_provider")
|
||||
op.drop_column("dataset_database", "graph_database_provider")
|
||||
op.drop_column("dataset_database", "vector_database_url")
|
||||
op.drop_column("dataset_database", "graph_database_url")
|
||||
op.drop_column("dataset_database", "vector_database_key")
|
||||
op.drop_column("dataset_database", "graph_database_key")
|
||||
|
|
@ -18,8 +18,11 @@ depends_on: Union[str, Sequence[str], None] = None
|
|||
|
||||
|
||||
def upgrade() -> None:
|
||||
pass
|
||||
db_engine = get_relational_engine()
|
||||
# we might want to delete this
|
||||
await_only(db_engine.create_database())
|
||||
|
||||
|
||||
def downgrade() -> None:
|
||||
pass
|
||||
db_engine = get_relational_engine()
|
||||
await_only(db_engine.delete_database())
|
||||
|
|
|
|||
|
|
@ -144,58 +144,44 @@ def _create_data_permission(conn, user_id, data_id, permission_name):
|
|||
)
|
||||
|
||||
|
||||
def _get_column(inspector, table, name, schema=None):
|
||||
for col in inspector.get_columns(table, schema=schema):
|
||||
if col["name"] == name:
|
||||
return col
|
||||
return None
|
||||
|
||||
|
||||
def upgrade() -> None:
|
||||
conn = op.get_bind()
|
||||
insp = sa.inspect(conn)
|
||||
|
||||
dataset_id_column = _get_column(insp, "acls", "dataset_id")
|
||||
if not dataset_id_column:
|
||||
# Recreate ACLs table with default permissions set to datasets instead of documents
|
||||
op.drop_table("acls")
|
||||
# Recreate ACLs table with default permissions set to datasets instead of documents
|
||||
op.drop_table("acls")
|
||||
|
||||
acls_table = op.create_table(
|
||||
"acls",
|
||||
sa.Column("id", UUID, primary_key=True, default=uuid4),
|
||||
sa.Column(
|
||||
"created_at", sa.DateTime(timezone=True), default=lambda: datetime.now(timezone.utc)
|
||||
),
|
||||
sa.Column(
|
||||
"updated_at",
|
||||
sa.DateTime(timezone=True),
|
||||
onupdate=lambda: datetime.now(timezone.utc),
|
||||
),
|
||||
sa.Column("principal_id", UUID, sa.ForeignKey("principals.id")),
|
||||
sa.Column("permission_id", UUID, sa.ForeignKey("permissions.id")),
|
||||
sa.Column("dataset_id", UUID, sa.ForeignKey("datasets.id", ondelete="CASCADE")),
|
||||
)
|
||||
acls_table = op.create_table(
|
||||
"acls",
|
||||
sa.Column("id", UUID, primary_key=True, default=uuid4),
|
||||
sa.Column(
|
||||
"created_at", sa.DateTime(timezone=True), default=lambda: datetime.now(timezone.utc)
|
||||
),
|
||||
sa.Column(
|
||||
"updated_at", sa.DateTime(timezone=True), onupdate=lambda: datetime.now(timezone.utc)
|
||||
),
|
||||
sa.Column("principal_id", UUID, sa.ForeignKey("principals.id")),
|
||||
sa.Column("permission_id", UUID, sa.ForeignKey("permissions.id")),
|
||||
sa.Column("dataset_id", UUID, sa.ForeignKey("datasets.id", ondelete="CASCADE")),
|
||||
)
|
||||
|
||||
# Note: We can't use any Cognee model info to gather data (as it can change) in database so we must use our own table
|
||||
# definition or load what is in the database
|
||||
dataset_table = _define_dataset_table()
|
||||
datasets = conn.execute(sa.select(dataset_table)).fetchall()
|
||||
# Note: We can't use any Cognee model info to gather data (as it can change) in database so we must use our own table
|
||||
# definition or load what is in the database
|
||||
dataset_table = _define_dataset_table()
|
||||
datasets = conn.execute(sa.select(dataset_table)).fetchall()
|
||||
|
||||
if not datasets:
|
||||
return
|
||||
if not datasets:
|
||||
return
|
||||
|
||||
acl_list = []
|
||||
acl_list = []
|
||||
|
||||
for dataset in datasets:
|
||||
acl_list.append(_create_dataset_permission(conn, dataset.owner_id, dataset.id, "read"))
|
||||
acl_list.append(_create_dataset_permission(conn, dataset.owner_id, dataset.id, "write"))
|
||||
acl_list.append(_create_dataset_permission(conn, dataset.owner_id, dataset.id, "share"))
|
||||
acl_list.append(
|
||||
_create_dataset_permission(conn, dataset.owner_id, dataset.id, "delete")
|
||||
)
|
||||
for dataset in datasets:
|
||||
acl_list.append(_create_dataset_permission(conn, dataset.owner_id, dataset.id, "read"))
|
||||
acl_list.append(_create_dataset_permission(conn, dataset.owner_id, dataset.id, "write"))
|
||||
acl_list.append(_create_dataset_permission(conn, dataset.owner_id, dataset.id, "share"))
|
||||
acl_list.append(_create_dataset_permission(conn, dataset.owner_id, dataset.id, "delete"))
|
||||
|
||||
if acl_list:
|
||||
op.bulk_insert(acls_table, acl_list)
|
||||
if acl_list:
|
||||
op.bulk_insert(acls_table, acl_list)
|
||||
|
||||
|
||||
def downgrade() -> None:
|
||||
|
|
|
|||
|
|
@ -1,137 +0,0 @@
|
|||
"""Multi Tenant Support
|
||||
|
||||
Revision ID: c946955da633
|
||||
Revises: 211ab850ef3d
|
||||
Create Date: 2025-11-04 18:11:09.325158
|
||||
|
||||
"""
|
||||
|
||||
from typing import Sequence, Union
|
||||
from datetime import datetime, timezone
|
||||
from uuid import uuid4
|
||||
|
||||
from alembic import op
|
||||
import sqlalchemy as sa
|
||||
|
||||
# revision identifiers, used by Alembic.
|
||||
revision: str = "c946955da633"
|
||||
down_revision: Union[str, None] = "211ab850ef3d"
|
||||
branch_labels: Union[str, Sequence[str], None] = None
|
||||
depends_on: Union[str, Sequence[str], None] = None
|
||||
|
||||
|
||||
def _now():
|
||||
return datetime.now(timezone.utc)
|
||||
|
||||
|
||||
def _define_user_table() -> sa.Table:
|
||||
table = sa.Table(
|
||||
"users",
|
||||
sa.MetaData(),
|
||||
sa.Column(
|
||||
"id",
|
||||
sa.UUID,
|
||||
sa.ForeignKey("principals.id", ondelete="CASCADE"),
|
||||
primary_key=True,
|
||||
nullable=False,
|
||||
),
|
||||
sa.Column("tenant_id", sa.UUID, sa.ForeignKey("tenants.id"), index=True, nullable=True),
|
||||
)
|
||||
return table
|
||||
|
||||
|
||||
def _define_dataset_table() -> sa.Table:
|
||||
# Note: We can't use any Cognee model info to gather data (as it can change) in database so we must use our own table
|
||||
# definition or load what is in the database
|
||||
table = sa.Table(
|
||||
"datasets",
|
||||
sa.MetaData(),
|
||||
sa.Column("id", sa.UUID, primary_key=True, default=uuid4),
|
||||
sa.Column("name", sa.Text),
|
||||
sa.Column(
|
||||
"created_at",
|
||||
sa.DateTime(timezone=True),
|
||||
default=lambda: datetime.now(timezone.utc),
|
||||
),
|
||||
sa.Column(
|
||||
"updated_at",
|
||||
sa.DateTime(timezone=True),
|
||||
onupdate=lambda: datetime.now(timezone.utc),
|
||||
),
|
||||
sa.Column("owner_id", sa.UUID(), sa.ForeignKey("principals.id"), index=True),
|
||||
sa.Column("tenant_id", sa.UUID(), sa.ForeignKey("tenants.id"), index=True, nullable=True),
|
||||
)
|
||||
|
||||
return table
|
||||
|
||||
|
||||
def _get_column(inspector, table, name, schema=None):
|
||||
for col in inspector.get_columns(table, schema=schema):
|
||||
if col["name"] == name:
|
||||
return col
|
||||
return None
|
||||
|
||||
|
||||
def upgrade() -> None:
|
||||
conn = op.get_bind()
|
||||
insp = sa.inspect(conn)
|
||||
|
||||
dataset = _define_dataset_table()
|
||||
user = _define_user_table()
|
||||
|
||||
if "user_tenants" not in insp.get_table_names():
|
||||
# Define table with all necessary columns including primary key
|
||||
user_tenants = op.create_table(
|
||||
"user_tenants",
|
||||
sa.Column("user_id", sa.UUID, sa.ForeignKey("users.id"), primary_key=True),
|
||||
sa.Column("tenant_id", sa.UUID, sa.ForeignKey("tenants.id"), primary_key=True),
|
||||
sa.Column(
|
||||
"created_at", sa.DateTime(timezone=True), default=lambda: datetime.now(timezone.utc)
|
||||
),
|
||||
)
|
||||
|
||||
# Get all users with their tenant_id
|
||||
user_data = conn.execute(
|
||||
sa.select(user.c.id, user.c.tenant_id).where(user.c.tenant_id.isnot(None))
|
||||
).fetchall()
|
||||
|
||||
# Insert into user_tenants table
|
||||
if user_data:
|
||||
op.bulk_insert(
|
||||
user_tenants,
|
||||
[
|
||||
{"user_id": user_id, "tenant_id": tenant_id, "created_at": _now()}
|
||||
for user_id, tenant_id in user_data
|
||||
],
|
||||
)
|
||||
|
||||
tenant_id_column = _get_column(insp, "datasets", "tenant_id")
|
||||
if not tenant_id_column:
|
||||
op.add_column("datasets", sa.Column("tenant_id", sa.UUID(), nullable=True))
|
||||
|
||||
# Build subquery, select users.tenant_id for each dataset.owner_id
|
||||
tenant_id_from_dataset_owner = (
|
||||
sa.select(user.c.tenant_id).where(user.c.id == dataset.c.owner_id).scalar_subquery()
|
||||
)
|
||||
|
||||
if op.get_context().dialect.name == "sqlite":
|
||||
# If column doesn't exist create new original_extension column and update from values of extension column
|
||||
with op.batch_alter_table("datasets") as batch_op:
|
||||
batch_op.execute(
|
||||
dataset.update().values(
|
||||
tenant_id=tenant_id_from_dataset_owner,
|
||||
)
|
||||
)
|
||||
else:
|
||||
conn = op.get_bind()
|
||||
conn.execute(dataset.update().values(tenant_id=tenant_id_from_dataset_owner))
|
||||
|
||||
op.create_index(op.f("ix_datasets_tenant_id"), "datasets", ["tenant_id"])
|
||||
|
||||
|
||||
def downgrade() -> None:
|
||||
# ### commands auto generated by Alembic - please adjust! ###
|
||||
op.drop_table("user_tenants")
|
||||
op.drop_index(op.f("ix_datasets_tenant_id"), table_name="datasets")
|
||||
op.drop_column("datasets", "tenant_id")
|
||||
# ### end Alembic commands ###
|
||||
2512
cognee-frontend/package-lock.json
generated
2512
cognee-frontend/package-lock.json
generated
File diff suppressed because it is too large
Load diff
|
|
@ -9,13 +9,13 @@
|
|||
"lint": "next lint"
|
||||
},
|
||||
"dependencies": {
|
||||
"@auth0/nextjs-auth0": "^4.13.1",
|
||||
"@auth0/nextjs-auth0": "^4.6.0",
|
||||
"classnames": "^2.5.1",
|
||||
"culori": "^4.0.1",
|
||||
"d3-force-3d": "^3.0.6",
|
||||
"next": "16.1.1",
|
||||
"react": "^19.2.0",
|
||||
"react-dom": "^19.2.0",
|
||||
"next": "15.3.3",
|
||||
"react": "^19.0.0",
|
||||
"react-dom": "^19.0.0",
|
||||
"react-force-graph-2d": "^1.27.1",
|
||||
"uuid": "^9.0.1"
|
||||
},
|
||||
|
|
@ -24,11 +24,11 @@
|
|||
"@tailwindcss/postcss": "^4.1.7",
|
||||
"@types/culori": "^4.0.0",
|
||||
"@types/node": "^20",
|
||||
"@types/react": "^19",
|
||||
"@types/react-dom": "^19",
|
||||
"@types/react": "^18",
|
||||
"@types/react-dom": "^18",
|
||||
"@types/uuid": "^9.0.8",
|
||||
"eslint": "^9",
|
||||
"eslint-config-next": "^16.0.4",
|
||||
"eslint-config-next": "^15.3.3",
|
||||
"eslint-config-prettier": "^10.1.5",
|
||||
"tailwindcss": "^4.1.7",
|
||||
"typescript": "^5"
|
||||
|
|
|
|||
119
cognee-frontend/src/app/(graph)/CrewAITrigger.tsx
Normal file
119
cognee-frontend/src/app/(graph)/CrewAITrigger.tsx
Normal file
|
|
@ -0,0 +1,119 @@
|
|||
import { useState } from "react";
|
||||
import { fetch } from "@/utils";
|
||||
import { v4 as uuid4 } from "uuid";
|
||||
import { LoadingIndicator } from "@/ui/App";
|
||||
import { CTAButton, Input } from "@/ui/elements";
|
||||
|
||||
interface CrewAIFormPayload extends HTMLFormElement {
|
||||
username1: HTMLInputElement;
|
||||
username2: HTMLInputElement;
|
||||
}
|
||||
|
||||
interface CrewAITriggerProps {
|
||||
// eslint-disable-next-line @typescript-eslint/no-explicit-any
|
||||
onData: (data: any) => void;
|
||||
// eslint-disable-next-line @typescript-eslint/no-explicit-any
|
||||
onActivity: (activities: any) => void;
|
||||
}
|
||||
|
||||
export default function CrewAITrigger({ onData, onActivity }: CrewAITriggerProps) {
|
||||
const [isCrewAIRunning, setIsCrewAIRunning] = useState(false);
|
||||
|
||||
const handleRunCrewAI = (event: React.FormEvent<CrewAIFormPayload>) => {
|
||||
event.preventDefault();
|
||||
const formElements = event.currentTarget;
|
||||
|
||||
const crewAIConfig = {
|
||||
username1: formElements.username1.value,
|
||||
username2: formElements.username2.value,
|
||||
};
|
||||
|
||||
const backendApiUrl = process.env.NEXT_PUBLIC_BACKEND_API_URL;
|
||||
const wsUrl = backendApiUrl.replace(/^http(s)?/, "ws");
|
||||
|
||||
const websocket = new WebSocket(`${wsUrl}/v1/crewai/subscribe`);
|
||||
|
||||
onActivity([{ id: uuid4(), timestamp: Date.now(), activity: "Dispatching hiring crew agents" }]);
|
||||
|
||||
websocket.onmessage = (event) => {
|
||||
const data = JSON.parse(event.data);
|
||||
|
||||
if (data.status === "PipelineRunActivity") {
|
||||
onActivity([data.payload]);
|
||||
return;
|
||||
}
|
||||
|
||||
onData({
|
||||
nodes: data.payload.nodes,
|
||||
links: data.payload.edges,
|
||||
});
|
||||
|
||||
const nodes_type_map: { [key: string]: number } = {};
|
||||
|
||||
for (let i = 0; i < data.payload.nodes.length; i++) {
|
||||
const node = data.payload.nodes[i];
|
||||
if (!nodes_type_map[node.type]) {
|
||||
nodes_type_map[node.type] = 0;
|
||||
}
|
||||
nodes_type_map[node.type] += 1;
|
||||
}
|
||||
|
||||
const activityMessage = Object.entries(nodes_type_map).reduce((message, [type, count]) => {
|
||||
return `${message}\n | ${type}: ${count}`;
|
||||
}, "Graph updated:");
|
||||
|
||||
onActivity([{
|
||||
id: uuid4(),
|
||||
timestamp: Date.now(),
|
||||
activity: activityMessage,
|
||||
}]);
|
||||
|
||||
if (data.status === "PipelineRunCompleted") {
|
||||
websocket.close();
|
||||
}
|
||||
};
|
||||
|
||||
onData(null);
|
||||
setIsCrewAIRunning(true);
|
||||
|
||||
return fetch("/v1/crewai/run", {
|
||||
method: "POST",
|
||||
body: JSON.stringify(crewAIConfig),
|
||||
headers: {
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
})
|
||||
.then(response => response.json())
|
||||
.then(() => {
|
||||
onActivity([{ id: uuid4(), timestamp: Date.now(), activity: "Hiring crew agents made a decision" }]);
|
||||
})
|
||||
.catch(() => {
|
||||
onActivity([{ id: uuid4(), timestamp: Date.now(), activity: "Hiring crew agents had problems while executing" }]);
|
||||
})
|
||||
.finally(() => {
|
||||
websocket.close();
|
||||
setIsCrewAIRunning(false);
|
||||
});
|
||||
};
|
||||
|
||||
return (
|
||||
<form className="w-full flex flex-col gap-2" onSubmit={handleRunCrewAI}>
|
||||
<h1 className="text-2xl text-white">Cognee Dev Mexican Standoff</h1>
|
||||
<span className="text-white">Agents compare GitHub profiles, and make a decision who is a better developer</span>
|
||||
<div className="flex flex-row gap-2">
|
||||
<div className="flex flex-col w-full flex-1/2">
|
||||
<label className="block mb-1 text-white" htmlFor="username1">GitHub username</label>
|
||||
<Input name="username1" type="text" placeholder="Github Username" required defaultValue="hajdul88" />
|
||||
</div>
|
||||
<div className="flex flex-col w-full flex-1/2">
|
||||
<label className="block mb-1 text-white" htmlFor="username2">GitHub username</label>
|
||||
<Input name="username2" type="text" placeholder="Github Username" required defaultValue="lxobr" />
|
||||
</div>
|
||||
</div>
|
||||
<CTAButton type="submit" disabled={isCrewAIRunning} className="whitespace-nowrap">
|
||||
Start Mexican Standoff
|
||||
{isCrewAIRunning && <LoadingIndicator />}
|
||||
</CTAButton>
|
||||
</form>
|
||||
);
|
||||
}
|
||||
|
|
@ -6,6 +6,7 @@ import { NodeObject, LinkObject } from "react-force-graph-2d";
|
|||
import { ChangeEvent, useEffect, useImperativeHandle, useRef, useState } from "react";
|
||||
|
||||
import { DeleteIcon } from "@/ui/Icons";
|
||||
// import { FeedbackForm } from "@/ui/Partials";
|
||||
import { CTAButton, Input, NeutralButton, Select } from "@/ui/elements";
|
||||
|
||||
interface GraphControlsProps {
|
||||
|
|
@ -110,7 +111,7 @@ export default function GraphControls({ data, isAddNodeFormOpen, onGraphShapeCha
|
|||
};
|
||||
|
||||
const [isAuthShapeChangeEnabled, setIsAuthShapeChangeEnabled] = useState(true);
|
||||
const shapeChangeTimeout = useRef<number | null>(null);
|
||||
const shapeChangeTimeout = useRef<number | null>();
|
||||
|
||||
useEffect(() => {
|
||||
onGraphShapeChange(DEFAULT_GRAPH_SHAPE);
|
||||
|
|
@ -229,6 +230,12 @@ export default function GraphControls({ data, isAddNodeFormOpen, onGraphShapeCha
|
|||
)}
|
||||
</>
|
||||
{/* )} */}
|
||||
|
||||
{/* {selectedTab === "feedback" && (
|
||||
<div className="flex flex-col gap-2">
|
||||
<FeedbackForm onSuccess={() => {}} />
|
||||
</div>
|
||||
)} */}
|
||||
</div>
|
||||
</>
|
||||
);
|
||||
|
|
|
|||
|
|
@ -1,6 +1,6 @@
|
|||
"use client";
|
||||
|
||||
import { useCallback, useRef, useState, RefObject } from "react";
|
||||
import { useCallback, useRef, useState, MutableRefObject } from "react";
|
||||
|
||||
import Link from "next/link";
|
||||
import { TextLogo } from "@/ui/App";
|
||||
|
|
@ -47,11 +47,11 @@ export default function GraphView() {
|
|||
updateData(newData);
|
||||
}, []);
|
||||
|
||||
const graphRef = useRef<GraphVisualizationAPI>(null);
|
||||
const graphRef = useRef<GraphVisualizationAPI>();
|
||||
|
||||
const graphControls = useRef<GraphControlsAPI>(null);
|
||||
const graphControls = useRef<GraphControlsAPI>();
|
||||
|
||||
const activityLog = useRef<ActivityLogAPI>(null);
|
||||
const activityLog = useRef<ActivityLogAPI>();
|
||||
|
||||
return (
|
||||
<main className="flex flex-col h-full">
|
||||
|
|
@ -74,18 +74,21 @@ export default function GraphView() {
|
|||
<div className="w-full h-full relative overflow-hidden">
|
||||
<GraphVisualization
|
||||
key={data?.nodes.length}
|
||||
ref={graphRef as RefObject<GraphVisualizationAPI>}
|
||||
ref={graphRef as MutableRefObject<GraphVisualizationAPI>}
|
||||
data={data}
|
||||
graphControls={graphControls as RefObject<GraphControlsAPI>}
|
||||
graphControls={graphControls as MutableRefObject<GraphControlsAPI>}
|
||||
/>
|
||||
|
||||
<div className="absolute top-2 left-2 flex flex-col gap-2">
|
||||
<div className="bg-gray-500 pt-4 pr-4 pb-4 pl-4 rounded-md w-sm">
|
||||
<CogneeAddWidget onData={onDataChange} />
|
||||
</div>
|
||||
{/* <div className="bg-gray-500 pt-4 pr-4 pb-4 pl-4 rounded-md w-sm">
|
||||
<CrewAITrigger onData={onDataChange} onActivity={(activities) => activityLog.current?.updateActivityLog(activities)} />
|
||||
</div> */}
|
||||
<div className="bg-gray-500 pt-4 pr-4 pb-4 pl-4 rounded-md w-sm">
|
||||
<h2 className="text-xl text-white mb-4">Activity Log</h2>
|
||||
<ActivityLog ref={activityLog as RefObject<ActivityLogAPI>} />
|
||||
<ActivityLog ref={activityLog as MutableRefObject<ActivityLogAPI>} />
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
|
@ -93,7 +96,7 @@ export default function GraphView() {
|
|||
<div className="bg-gray-500 pt-4 pr-4 pb-4 pl-4 rounded-md w-110">
|
||||
<GraphControls
|
||||
data={data}
|
||||
ref={graphControls as RefObject<GraphControlsAPI>}
|
||||
ref={graphControls as MutableRefObject<GraphControlsAPI>}
|
||||
isAddNodeFormOpen={isAddNodeFormOpen}
|
||||
onFitIntoView={() => graphRef.current!.zoomToFit(1000, 50)}
|
||||
onGraphShapeChange={(shape) => graphRef.current!.setGraphShape(shape)}
|
||||
|
|
|
|||
|
|
@ -1,7 +1,7 @@
|
|||
"use client";
|
||||
|
||||
import classNames from "classnames";
|
||||
import { RefObject, useEffect, useImperativeHandle, useRef, useState, useCallback } from "react";
|
||||
import { MutableRefObject, useEffect, useImperativeHandle, useRef, useState, useCallback } from "react";
|
||||
import { forceCollide, forceManyBody } from "d3-force-3d";
|
||||
import dynamic from "next/dynamic";
|
||||
import { GraphControlsAPI } from "./GraphControls";
|
||||
|
|
@ -16,9 +16,9 @@ const ForceGraph = dynamic(() => import("react-force-graph-2d"), {
|
|||
import type { ForceGraphMethods, GraphData, LinkObject, NodeObject } from "react-force-graph-2d";
|
||||
|
||||
interface GraphVisuzaliationProps {
|
||||
ref: RefObject<GraphVisualizationAPI>;
|
||||
ref: MutableRefObject<GraphVisualizationAPI>;
|
||||
data?: GraphData<NodeObject, LinkObject>;
|
||||
graphControls: RefObject<GraphControlsAPI>;
|
||||
graphControls: MutableRefObject<GraphControlsAPI>;
|
||||
className?: string;
|
||||
}
|
||||
|
||||
|
|
@ -205,7 +205,7 @@ export default function GraphVisualization({ ref, data, graphControls, className
|
|||
// eslint-disable-next-line @typescript-eslint/no-unused-vars
|
||||
function handleDagError(loopNodeIds: (string | number)[]) {}
|
||||
|
||||
const graphRef = useRef<ForceGraphMethods>(null);
|
||||
const graphRef = useRef<ForceGraphMethods>();
|
||||
|
||||
useEffect(() => {
|
||||
if (data && graphRef.current) {
|
||||
|
|
@ -224,7 +224,6 @@ export default function GraphVisualization({ ref, data, graphControls, className
|
|||
) => {
|
||||
if (!graphRef.current) {
|
||||
console.warn("GraphVisualization: graphRef not ready yet");
|
||||
// eslint-disable-next-line @typescript-eslint/no-explicit-any
|
||||
return undefined as any;
|
||||
}
|
||||
|
||||
|
|
@ -240,7 +239,7 @@ export default function GraphVisualization({ ref, data, graphControls, className
|
|||
return (
|
||||
<div ref={containerRef} className={classNames("w-full h-full", className)} id="graph-container">
|
||||
<ForceGraph
|
||||
ref={graphRef as RefObject<ForceGraphMethods>}
|
||||
ref={graphRef}
|
||||
width={dimensions.width}
|
||||
height={dimensions.height}
|
||||
dagMode={graphShape as unknown as undefined}
|
||||
|
|
|
|||
|
|
@ -1,8 +1,8 @@
|
|||
"use client";
|
||||
"use server";
|
||||
|
||||
import Dashboard from "./Dashboard";
|
||||
|
||||
export default function Page() {
|
||||
export default async function Page() {
|
||||
const accessToken = "";
|
||||
|
||||
return (
|
||||
|
|
|
|||
|
|
@ -1,3 +1,3 @@
|
|||
export { default } from "./dashboard/page";
|
||||
|
||||
export const dynamic = "force-dynamic";
|
||||
// export const dynamic = "force-dynamic";
|
||||
|
|
|
|||
|
|
@ -2,7 +2,7 @@ import { NextResponse, type NextRequest } from "next/server";
|
|||
// import { auth0 } from "./modules/auth/auth0";
|
||||
|
||||
// eslint-disable-next-line @typescript-eslint/no-unused-vars
|
||||
export async function proxy(request: NextRequest) {
|
||||
export async function middleware(request: NextRequest) {
|
||||
// if (process.env.USE_AUTH0_AUTHORIZATION?.toLowerCase() === "true") {
|
||||
// if (request.nextUrl.pathname === "/auth/token") {
|
||||
// return NextResponse.next();
|
||||
|
|
@ -13,6 +13,7 @@ export interface Dataset {
|
|||
|
||||
function useDatasets(useCloud = false) {
|
||||
const [datasets, setDatasets] = useState<Dataset[]>([]);
|
||||
// eslint-disable-next-line @typescript-eslint/no-explicit-any
|
||||
// const statusTimeout = useRef<any>(null);
|
||||
|
||||
// const fetchDatasetStatuses = useCallback((datasets: Dataset[]) => {
|
||||
|
|
|
|||
69
cognee-frontend/src/ui/Partials/FeedbackForm.tsx
Normal file
69
cognee-frontend/src/ui/Partials/FeedbackForm.tsx
Normal file
|
|
@ -0,0 +1,69 @@
|
|||
"use client";
|
||||
|
||||
import { useState } from "react";
|
||||
import { LoadingIndicator } from "@/ui/App";
|
||||
import { fetch, useBoolean } from "@/utils";
|
||||
import { CTAButton, TextArea } from "@/ui/elements";
|
||||
|
||||
interface SignInFormPayload extends HTMLFormElement {
|
||||
feedback: HTMLTextAreaElement;
|
||||
}
|
||||
|
||||
interface FeedbackFormProps {
|
||||
onSuccess: () => void;
|
||||
}
|
||||
|
||||
export default function FeedbackForm({ onSuccess }: FeedbackFormProps) {
|
||||
const {
|
||||
value: isSubmittingFeedback,
|
||||
setTrue: disableFeedbackSubmit,
|
||||
setFalse: enableFeedbackSubmit,
|
||||
} = useBoolean(false);
|
||||
|
||||
const [feedbackError, setFeedbackError] = useState<string | null>(null);
|
||||
|
||||
const signIn = (event: React.FormEvent<SignInFormPayload>) => {
|
||||
event.preventDefault();
|
||||
const formElements = event.currentTarget;
|
||||
|
||||
setFeedbackError(null);
|
||||
disableFeedbackSubmit();
|
||||
|
||||
fetch("/v1/crewai/feedback", {
|
||||
method: "POST",
|
||||
body: JSON.stringify({
|
||||
feedback: formElements.feedback.value,
|
||||
}),
|
||||
headers: {
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
})
|
||||
.then(response => response.json())
|
||||
.then(() => {
|
||||
onSuccess();
|
||||
formElements.feedback.value = "";
|
||||
})
|
||||
.catch(error => setFeedbackError(error.detail))
|
||||
.finally(() => enableFeedbackSubmit());
|
||||
};
|
||||
|
||||
return (
|
||||
<form onSubmit={signIn} className="flex flex-col gap-2">
|
||||
<div className="flex flex-col gap-2">
|
||||
<div className="mb-4">
|
||||
<label className="block text-white" htmlFor="feedback">Feedback on agent's reasoning</label>
|
||||
<TextArea id="feedback" name="feedback" type="text" placeholder="Your feedback" />
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<CTAButton type="submit">
|
||||
<span>Submit feedback</span>
|
||||
{isSubmittingFeedback && <LoadingIndicator />}
|
||||
</CTAButton>
|
||||
|
||||
{feedbackError && (
|
||||
<span className="text-s text-white">{feedbackError}</span>
|
||||
)}
|
||||
</form>
|
||||
)
|
||||
}
|
||||
|
|
@ -3,3 +3,4 @@ export { default as Footer } from "./Footer/Footer";
|
|||
export { default as SearchView } from "./SearchView/SearchView";
|
||||
export { default as IFrameView } from "./IFrameView/IFrameView";
|
||||
// export { default as Explorer } from "./Explorer/Explorer";
|
||||
export { default as FeedbackForm } from "./FeedbackForm";
|
||||
|
|
|
|||
|
|
@ -2,7 +2,7 @@
|
|||
|
||||
import { v4 as uuid4 } from "uuid";
|
||||
import classNames from "classnames";
|
||||
import { Fragment, MouseEvent, RefObject, useCallback, useEffect, useRef, useState } from "react";
|
||||
import { Fragment, MouseEvent, MutableRefObject, useCallback, useEffect, useRef, useState } from "react";
|
||||
|
||||
import { useModal } from "@/ui/elements/Modal";
|
||||
import { CaretIcon, CloseIcon, PlusIcon } from "@/ui/Icons";
|
||||
|
|
@ -282,7 +282,7 @@ export default function Notebook({ notebook, updateNotebook, runCell }: Notebook
|
|||
function CellResult({ content }: { content: [] }) {
|
||||
const parsedContent = [];
|
||||
|
||||
const graphRef = useRef<GraphVisualizationAPI>(null);
|
||||
const graphRef = useRef<GraphVisualizationAPI>();
|
||||
const graphControls = useRef<GraphControlsAPI>({
|
||||
setSelectedNode: () => {},
|
||||
getSelectedNode: () => null,
|
||||
|
|
@ -298,7 +298,7 @@ function CellResult({ content }: { content: [] }) {
|
|||
<span className="text-sm pl-2 mb-4">reasoning graph</span>
|
||||
<GraphVisualization
|
||||
data={transformInsightsGraphData(line)}
|
||||
ref={graphRef as RefObject<GraphVisualizationAPI>}
|
||||
ref={graphRef as MutableRefObject<GraphVisualizationAPI>}
|
||||
graphControls={graphControls}
|
||||
className="min-h-80"
|
||||
/>
|
||||
|
|
@ -346,7 +346,7 @@ function CellResult({ content }: { content: [] }) {
|
|||
<span className="text-sm pl-2 mb-4">reasoning graph (datasets: {datasetName})</span>
|
||||
<GraphVisualization
|
||||
data={transformToVisualizationData(graph)}
|
||||
ref={graphRef as RefObject<GraphVisualizationAPI>}
|
||||
ref={graphRef as MutableRefObject<GraphVisualizationAPI>}
|
||||
graphControls={graphControls}
|
||||
className="min-h-80"
|
||||
/>
|
||||
|
|
@ -377,7 +377,7 @@ function CellResult({ content }: { content: [] }) {
|
|||
<span className="text-sm pl-2 mb-4">reasoning graph (datasets: {datasetName})</span>
|
||||
<GraphVisualization
|
||||
data={transformToVisualizationData(graph)}
|
||||
ref={graphRef as RefObject<GraphVisualizationAPI>}
|
||||
ref={graphRef as MutableRefObject<GraphVisualizationAPI>}
|
||||
graphControls={graphControls}
|
||||
className="min-h-80"
|
||||
/>
|
||||
|
|
|
|||
|
|
@ -33,7 +33,7 @@ export default function NotebookCellHeader({
|
|||
setFalse: setIsNotRunningCell,
|
||||
} = useBoolean(false);
|
||||
|
||||
const [runInstance] = useState<string>(isCloudEnvironment() ? "cloud" : "local");
|
||||
const [runInstance, setRunInstance] = useState<string>(isCloudEnvironment() ? "cloud" : "local");
|
||||
|
||||
const handleCellRun = () => {
|
||||
if (runCell) {
|
||||
|
|
|
|||
|
|
@ -14,7 +14,7 @@
|
|||
"moduleResolution": "bundler",
|
||||
"resolveJsonModule": true,
|
||||
"isolatedModules": true,
|
||||
"jsx": "react-jsx",
|
||||
"jsx": "preserve",
|
||||
"incremental": true,
|
||||
"plugins": [
|
||||
{
|
||||
|
|
@ -32,8 +32,7 @@
|
|||
"next-env.d.ts",
|
||||
"**/*.ts",
|
||||
"**/*.tsx",
|
||||
".next/types/**/*.ts",
|
||||
".next/dev/types/**/*.ts"
|
||||
".next/types/**/*.ts"
|
||||
],
|
||||
"exclude": [
|
||||
"node_modules"
|
||||
|
|
|
|||
|
|
@ -110,47 +110,6 @@ If you'd rather run cognee-mcp in a container, you have two options:
|
|||
# For stdio transport (default)
|
||||
docker run -e TRANSPORT_MODE=stdio --env-file ./.env --rm -it cognee/cognee-mcp:main
|
||||
```
|
||||
|
||||
**Installing optional dependencies at runtime:**
|
||||
|
||||
You can install optional dependencies when running the container by setting the `EXTRAS` environment variable:
|
||||
```bash
|
||||
# Install a single optional dependency group at runtime
|
||||
docker run \
|
||||
-e TRANSPORT_MODE=http \
|
||||
-e EXTRAS=aws \
|
||||
--env-file ./.env \
|
||||
-p 8000:8000 \
|
||||
--rm -it cognee/cognee-mcp:main
|
||||
|
||||
# Install multiple optional dependency groups at runtime (comma-separated)
|
||||
docker run \
|
||||
-e TRANSPORT_MODE=sse \
|
||||
-e EXTRAS=aws,postgres,neo4j \
|
||||
--env-file ./.env \
|
||||
-p 8000:8000 \
|
||||
--rm -it cognee/cognee-mcp:main
|
||||
```
|
||||
|
||||
**Available optional dependency groups:**
|
||||
- `aws` - S3 storage support
|
||||
- `postgres` / `postgres-binary` - PostgreSQL database support
|
||||
- `neo4j` - Neo4j graph database support
|
||||
- `neptune` - AWS Neptune support
|
||||
- `chromadb` - ChromaDB vector store support
|
||||
- `scraping` - Web scraping capabilities
|
||||
- `distributed` - Modal distributed execution
|
||||
- `langchain` - LangChain integration
|
||||
- `llama-index` - LlamaIndex integration
|
||||
- `anthropic` - Anthropic models
|
||||
- `groq` - Groq models
|
||||
- `mistral` - Mistral models
|
||||
- `ollama` / `huggingface` - Local model support
|
||||
- `docs` - Document processing
|
||||
- `codegraph` - Code analysis
|
||||
- `monitoring` - Sentry & Langfuse monitoring
|
||||
- `redis` - Redis support
|
||||
- And more (see [pyproject.toml](https://github.com/topoteretes/cognee/blob/main/pyproject.toml) for full list)
|
||||
2. **Pull from Docker Hub** (no build required):
|
||||
```bash
|
||||
# With HTTP transport (recommended for web deployments)
|
||||
|
|
@ -160,17 +119,6 @@ If you'd rather run cognee-mcp in a container, you have two options:
|
|||
# With stdio transport (default)
|
||||
docker run -e TRANSPORT_MODE=stdio --env-file ./.env --rm -it cognee/cognee-mcp:main
|
||||
```
|
||||
|
||||
**With runtime installation of optional dependencies:**
|
||||
```bash
|
||||
# Install optional dependencies from Docker Hub image
|
||||
docker run \
|
||||
-e TRANSPORT_MODE=http \
|
||||
-e EXTRAS=aws,postgres \
|
||||
--env-file ./.env \
|
||||
-p 8000:8000 \
|
||||
--rm -it cognee/cognee-mcp:main
|
||||
```
|
||||
|
||||
### **Important: Docker vs Direct Usage**
|
||||
**Docker uses environment variables**, not command line arguments:
|
||||
|
|
@ -445,22 +393,16 @@ The MCP server exposes its functionality through tools. Call them from any MCP c
|
|||
|
||||
- **cognify**: Turns your data into a structured knowledge graph and stores it in memory
|
||||
|
||||
- **cognee_add_developer_rules**: Ingest core developer rule files into memory
|
||||
|
||||
- **codify**: Analyse a code repository, build a code graph, stores it in memory
|
||||
|
||||
- **delete**: Delete specific data from a dataset (supports soft/hard deletion modes)
|
||||
|
||||
- **get_developer_rules**: Retrieve all developer rules that were generated based on previous interactions
|
||||
- **search**: Query memory – supports GRAPH_COMPLETION, RAG_COMPLETION, CODE, CHUNKS
|
||||
|
||||
- **list_data**: List all datasets and their data items with IDs for deletion operations
|
||||
|
||||
- **save_interaction**: Logs user-agent interactions and query-answer pairs
|
||||
- **delete**: Delete specific data from a dataset (supports soft/hard deletion modes)
|
||||
|
||||
- **prune**: Reset cognee for a fresh start (removes all data)
|
||||
|
||||
- **search**: Query memory – supports GRAPH_COMPLETION, RAG_COMPLETION, CODE, CHUNKS, SUMMARIES, CYPHER, and FEELING_LUCKY
|
||||
|
||||
- **cognify_status / codify_status**: Track pipeline progress
|
||||
|
||||
**Data Management Examples:**
|
||||
|
|
|
|||
|
|
@ -4,42 +4,6 @@ set -e # Exit on error
|
|||
echo "Debug mode: $DEBUG"
|
||||
echo "Environment: $ENVIRONMENT"
|
||||
|
||||
# Install optional dependencies if EXTRAS is set
|
||||
if [ -n "$EXTRAS" ]; then
|
||||
echo "Installing optional dependencies: $EXTRAS"
|
||||
|
||||
# Get the cognee version that's currently installed
|
||||
COGNEE_VERSION=$(uv pip show cognee | grep "Version:" | awk '{print $2}')
|
||||
echo "Current cognee version: $COGNEE_VERSION"
|
||||
|
||||
# Build the extras list for cognee
|
||||
IFS=',' read -ra EXTRA_ARRAY <<< "$EXTRAS"
|
||||
# Combine base extras from pyproject.toml with requested extras
|
||||
ALL_EXTRAS=""
|
||||
for extra in "${EXTRA_ARRAY[@]}"; do
|
||||
# Trim whitespace
|
||||
extra=$(echo "$extra" | xargs)
|
||||
# Add to extras list if not already present
|
||||
if [[ ! "$ALL_EXTRAS" =~ (^|,)"$extra"(,|$) ]]; then
|
||||
if [ -z "$ALL_EXTRAS" ]; then
|
||||
ALL_EXTRAS="$extra"
|
||||
else
|
||||
ALL_EXTRAS="$ALL_EXTRAS,$extra"
|
||||
fi
|
||||
fi
|
||||
done
|
||||
|
||||
echo "Installing cognee with extras: $ALL_EXTRAS"
|
||||
echo "Running: uv pip install 'cognee[$ALL_EXTRAS]==$COGNEE_VERSION'"
|
||||
uv pip install "cognee[$ALL_EXTRAS]==$COGNEE_VERSION"
|
||||
|
||||
# Verify installation
|
||||
echo ""
|
||||
echo "✓ Optional dependencies installation completed"
|
||||
else
|
||||
echo "No optional dependencies specified"
|
||||
fi
|
||||
|
||||
# Set default transport mode if not specified
|
||||
TRANSPORT_MODE=${TRANSPORT_MODE:-"stdio"}
|
||||
echo "Transport mode: $TRANSPORT_MODE"
|
||||
|
|
|
|||
|
|
@ -1,6 +1,6 @@
|
|||
[project]
|
||||
name = "cognee-mcp"
|
||||
version = "0.5.0"
|
||||
version = "0.4.0"
|
||||
description = "Cognee MCP server"
|
||||
readme = "README.md"
|
||||
requires-python = ">=3.10"
|
||||
|
|
@ -9,7 +9,7 @@ dependencies = [
|
|||
# For local cognee repo usage remove comment bellow and add absolute path to cognee. Then run `uv sync --reinstall` in the mcp folder on local cognee changes.
|
||||
#"cognee[postgres,codegraph,gemini,huggingface,docs,neo4j] @ file:/Users/igorilic/Desktop/cognee",
|
||||
# TODO: Remove gemini from optional dependecnies for new Cognee version after 0.3.4
|
||||
"cognee[postgres,docs,neo4j]==0.5.0",
|
||||
"cognee[postgres,codegraph,gemini,huggingface,docs,neo4j]==0.3.4",
|
||||
"fastmcp>=2.10.0,<3.0.0",
|
||||
"mcp>=1.12.0,<2.0.0",
|
||||
"uv>=0.6.3,<1.0.0",
|
||||
|
|
@ -38,5 +38,4 @@ dev = [
|
|||
allow-direct-references = true
|
||||
|
||||
[project.scripts]
|
||||
cognee = "src:main"
|
||||
cognee-mcp = "src:main_mcp"
|
||||
cognee-mcp = "src:main"
|
||||
|
|
@ -37,10 +37,12 @@ async def run():
|
|||
|
||||
toolResult = await session.call_tool("prune", arguments={})
|
||||
|
||||
toolResult = await session.call_tool("cognify", arguments={})
|
||||
toolResult = await session.call_tool(
|
||||
"codify", arguments={"repo_path": "SOME_REPO_PATH"}
|
||||
)
|
||||
|
||||
toolResult = await session.call_tool(
|
||||
"search", arguments={"search_type": "GRAPH_COMPLETION"}
|
||||
"search", arguments={"search_type": "CODE", "search_query": "exceptions"}
|
||||
)
|
||||
|
||||
print(f"Cognify result: {toolResult.content}")
|
||||
|
|
|
|||
|
|
@ -151,7 +151,7 @@ class CogneeClient:
|
|||
query_type: str,
|
||||
datasets: Optional[List[str]] = None,
|
||||
system_prompt: Optional[str] = None,
|
||||
top_k: int = 5,
|
||||
top_k: int = 10,
|
||||
) -> Any:
|
||||
"""
|
||||
Search the knowledge graph.
|
||||
|
|
@ -192,7 +192,7 @@ class CogneeClient:
|
|||
|
||||
with redirect_stdout(sys.stderr):
|
||||
results = await self.cognee.search(
|
||||
query_type=SearchType[query_type.upper()], query_text=query_text, top_k=top_k
|
||||
query_type=SearchType[query_type.upper()], query_text=query_text
|
||||
)
|
||||
return results
|
||||
|
||||
|
|
|
|||
|
|
@ -90,6 +90,97 @@ async def health_check(request):
|
|||
return JSONResponse({"status": "ok"})
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
async def cognee_add_developer_rules(
|
||||
base_path: str = ".", graph_model_file: str = None, graph_model_name: str = None
|
||||
) -> list:
|
||||
"""
|
||||
Ingest core developer rule files into Cognee's memory layer.
|
||||
|
||||
This function loads a predefined set of developer-related configuration,
|
||||
rule, and documentation files from the base repository and assigns them
|
||||
to the special 'developer_rules' node set in Cognee. It ensures these
|
||||
foundational files are always part of the structured memory graph.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
base_path : str
|
||||
Root path to resolve relative file paths. Defaults to current directory.
|
||||
|
||||
graph_model_file : str, optional
|
||||
Optional path to a custom schema file for knowledge graph generation.
|
||||
|
||||
graph_model_name : str, optional
|
||||
Optional class name to use from the graph_model_file schema.
|
||||
|
||||
Returns
|
||||
-------
|
||||
list
|
||||
A message indicating how many rule files were scheduled for ingestion,
|
||||
and how to check their processing status.
|
||||
|
||||
Notes
|
||||
-----
|
||||
- Each file is processed asynchronously in the background.
|
||||
- Files are attached to the 'developer_rules' node set.
|
||||
- Missing files are skipped with a logged warning.
|
||||
"""
|
||||
|
||||
developer_rule_paths = [
|
||||
".cursorrules",
|
||||
".cursor/rules",
|
||||
".same/todos.md",
|
||||
".windsurfrules",
|
||||
".clinerules",
|
||||
"CLAUDE.md",
|
||||
".sourcegraph/memory.md",
|
||||
"AGENT.md",
|
||||
"AGENTS.md",
|
||||
]
|
||||
|
||||
async def cognify_task(file_path: str) -> None:
|
||||
with redirect_stdout(sys.stderr):
|
||||
logger.info(f"Starting cognify for: {file_path}")
|
||||
try:
|
||||
await cognee_client.add(file_path, node_set=["developer_rules"])
|
||||
|
||||
model = None
|
||||
if graph_model_file and graph_model_name:
|
||||
if cognee_client.use_api:
|
||||
logger.warning(
|
||||
"Custom graph models are not supported in API mode, ignoring."
|
||||
)
|
||||
else:
|
||||
from cognee.shared.data_models import KnowledgeGraph
|
||||
|
||||
model = load_class(graph_model_file, graph_model_name)
|
||||
|
||||
await cognee_client.cognify(graph_model=model)
|
||||
logger.info(f"Cognify finished for: {file_path}")
|
||||
except Exception as e:
|
||||
logger.error(f"Cognify failed for {file_path}: {str(e)}")
|
||||
raise ValueError(f"Failed to cognify: {str(e)}")
|
||||
|
||||
tasks = []
|
||||
for rel_path in developer_rule_paths:
|
||||
abs_path = os.path.join(base_path, rel_path)
|
||||
if os.path.isfile(abs_path):
|
||||
tasks.append(asyncio.create_task(cognify_task(abs_path)))
|
||||
else:
|
||||
logger.warning(f"Skipped missing developer rule file: {abs_path}")
|
||||
log_file = get_log_file_location()
|
||||
return [
|
||||
types.TextContent(
|
||||
type="text",
|
||||
text=(
|
||||
f"Started cognify for {len(tasks)} developer rule files in background.\n"
|
||||
f"All are added to the `developer_rules` node set.\n"
|
||||
f"Use `cognify_status` or check logs at {log_file} to monitor progress."
|
||||
),
|
||||
)
|
||||
]
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
async def cognify(
|
||||
data: str, graph_model_file: str = None, graph_model_name: str = None, custom_prompt: str = None
|
||||
|
|
@ -103,6 +194,7 @@ async def cognify(
|
|||
|
||||
Prerequisites:
|
||||
- **LLM_API_KEY**: Must be configured (required for entity extraction and graph generation)
|
||||
- **Data Added**: Must have data previously added via `cognee.add()`
|
||||
- **Vector Database**: Must be accessible for embeddings storage
|
||||
- **Graph Database**: Must be accessible for relationship storage
|
||||
|
||||
|
|
@ -316,7 +408,76 @@ async def save_interaction(data: str) -> list:
|
|||
|
||||
|
||||
@mcp.tool()
|
||||
async def search(search_query: str, search_type: str, top_k: int = 10) -> list:
|
||||
async def codify(repo_path: str) -> list:
|
||||
"""
|
||||
Analyze and generate a code-specific knowledge graph from a software repository.
|
||||
|
||||
This function launches a background task that processes the provided repository
|
||||
and builds a code knowledge graph. The function returns immediately while
|
||||
the processing continues in the background due to MCP timeout constraints.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
repo_path : str
|
||||
Path to the code repository to analyze. This can be a local file path or a
|
||||
relative path to a repository. The path should point to the root of the
|
||||
repository or a specific directory within it.
|
||||
|
||||
Returns
|
||||
-------
|
||||
list
|
||||
A list containing a single TextContent object with information about the
|
||||
background task launch and how to check its status.
|
||||
|
||||
Notes
|
||||
-----
|
||||
- The function launches a background task and returns immediately
|
||||
- The code graph generation may take significant time for larger repositories
|
||||
- Use the codify_status tool to check the progress of the operation
|
||||
- Process results are logged to the standard Cognee log file
|
||||
- All stdout is redirected to stderr to maintain MCP communication integrity
|
||||
"""
|
||||
|
||||
if cognee_client.use_api:
|
||||
error_msg = "❌ Codify operation is not available in API mode. Please use direct mode for code graph pipeline."
|
||||
logger.error(error_msg)
|
||||
return [types.TextContent(type="text", text=error_msg)]
|
||||
|
||||
async def codify_task(repo_path: str):
|
||||
# NOTE: MCP uses stdout to communicate, we must redirect all output
|
||||
# going to stdout ( like the print function ) to stderr.
|
||||
with redirect_stdout(sys.stderr):
|
||||
logger.info("Codify process starting.")
|
||||
from cognee.api.v1.cognify.code_graph_pipeline import run_code_graph_pipeline
|
||||
|
||||
results = []
|
||||
async for result in run_code_graph_pipeline(repo_path, False):
|
||||
results.append(result)
|
||||
logger.info(result)
|
||||
if all(results):
|
||||
logger.info("Codify process finished succesfully.")
|
||||
else:
|
||||
logger.info("Codify process failed.")
|
||||
|
||||
asyncio.create_task(codify_task(repo_path))
|
||||
|
||||
log_file = get_log_file_location()
|
||||
text = (
|
||||
f"Background process launched due to MCP timeout limitations.\n"
|
||||
f"To check current codify status use the codify_status tool\n"
|
||||
f"or you can check the log file at: {log_file}"
|
||||
)
|
||||
|
||||
return [
|
||||
types.TextContent(
|
||||
type="text",
|
||||
text=text,
|
||||
)
|
||||
]
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
async def search(search_query: str, search_type: str) -> list:
|
||||
"""
|
||||
Search and query the knowledge graph for insights, information, and connections.
|
||||
|
||||
|
|
@ -389,13 +550,6 @@ async def search(search_query: str, search_type: str, top_k: int = 10) -> list:
|
|||
|
||||
The search_type is case-insensitive and will be converted to uppercase.
|
||||
|
||||
top_k : int, optional
|
||||
Maximum number of results to return (default: 10).
|
||||
Controls the amount of context retrieved from the knowledge graph.
|
||||
- Lower values (3-5): Faster, more focused results
|
||||
- Higher values (10-20): More comprehensive, but slower and more context-heavy
|
||||
Helps manage response size and context window usage in MCP clients.
|
||||
|
||||
Returns
|
||||
-------
|
||||
list
|
||||
|
|
@ -432,32 +586,13 @@ async def search(search_query: str, search_type: str, top_k: int = 10) -> list:
|
|||
|
||||
"""
|
||||
|
||||
async def search_task(search_query: str, search_type: str, top_k: int) -> str:
|
||||
"""
|
||||
Internal task to execute knowledge graph search with result formatting.
|
||||
|
||||
Handles the actual search execution and formats results appropriately
|
||||
for MCP clients based on the search type and execution mode (API vs direct).
|
||||
|
||||
Parameters
|
||||
----------
|
||||
search_query : str
|
||||
The search query in natural language
|
||||
search_type : str
|
||||
Type of search to perform (GRAPH_COMPLETION, CHUNKS, etc.)
|
||||
top_k : int
|
||||
Maximum number of results to return
|
||||
|
||||
Returns
|
||||
-------
|
||||
str
|
||||
Formatted search results as a string, with format depending on search_type
|
||||
"""
|
||||
async def search_task(search_query: str, search_type: str) -> str:
|
||||
"""Search the knowledge graph"""
|
||||
# NOTE: MCP uses stdout to communicate, we must redirect all output
|
||||
# going to stdout ( like the print function ) to stderr.
|
||||
with redirect_stdout(sys.stderr):
|
||||
search_results = await cognee_client.search(
|
||||
query_text=search_query, query_type=search_type, top_k=top_k
|
||||
query_text=search_query, query_type=search_type
|
||||
)
|
||||
|
||||
# Handle different result formats based on API vs direct mode
|
||||
|
|
@ -491,10 +626,49 @@ async def search(search_query: str, search_type: str, top_k: int = 10) -> list:
|
|||
else:
|
||||
return str(search_results)
|
||||
|
||||
search_results = await search_task(search_query, search_type, top_k)
|
||||
search_results = await search_task(search_query, search_type)
|
||||
return [types.TextContent(type="text", text=search_results)]
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
async def get_developer_rules() -> list:
|
||||
"""
|
||||
Retrieve all developer rules that were generated based on previous interactions.
|
||||
|
||||
This tool queries the Cognee knowledge graph and returns a list of developer
|
||||
rules.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
None
|
||||
|
||||
Returns
|
||||
-------
|
||||
list
|
||||
A list containing a single TextContent object with the retrieved developer rules.
|
||||
The format is plain text containing the developer rules in bulletpoints.
|
||||
|
||||
Notes
|
||||
-----
|
||||
- The specific logic for fetching rules is handled internally.
|
||||
- This tool does not accept any parameters and is intended for simple rule inspection use cases.
|
||||
"""
|
||||
|
||||
async def fetch_rules_from_cognee() -> str:
|
||||
"""Collect all developer rules from Cognee"""
|
||||
with redirect_stdout(sys.stderr):
|
||||
if cognee_client.use_api:
|
||||
logger.warning("Developer rules retrieval is not available in API mode")
|
||||
return "Developer rules retrieval is not available in API mode"
|
||||
|
||||
developer_rules = await get_existing_rules(rules_nodeset_name="coding_agent_rules")
|
||||
return developer_rules
|
||||
|
||||
rules_text = await fetch_rules_from_cognee()
|
||||
|
||||
return [types.TextContent(type="text", text=rules_text)]
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
async def list_data(dataset_id: str = None) -> list:
|
||||
"""
|
||||
|
|
@ -780,6 +954,48 @@ async def cognify_status():
|
|||
return [types.TextContent(type="text", text=error_msg)]
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
async def codify_status():
|
||||
"""
|
||||
Get the current status of the codify pipeline.
|
||||
|
||||
This function retrieves information about current and recently completed codify operations
|
||||
in the codebase dataset. It provides details on progress, success/failure status, and statistics
|
||||
about the processed code repositories.
|
||||
|
||||
Returns
|
||||
-------
|
||||
list
|
||||
A list containing a single TextContent object with the status information as a string.
|
||||
The status includes information about active and completed jobs for the cognify_code_pipeline.
|
||||
|
||||
Notes
|
||||
-----
|
||||
- The function retrieves pipeline status specifically for the "cognify_code_pipeline" on the "codebase" dataset
|
||||
- Status information includes job progress, execution time, and completion status
|
||||
- The status is returned in string format for easy reading
|
||||
- This operation is not available in API mode
|
||||
"""
|
||||
with redirect_stdout(sys.stderr):
|
||||
try:
|
||||
from cognee.modules.data.methods.get_unique_dataset_id import get_unique_dataset_id
|
||||
from cognee.modules.users.methods import get_default_user
|
||||
|
||||
user = await get_default_user()
|
||||
status = await cognee_client.get_pipeline_status(
|
||||
[await get_unique_dataset_id("codebase", user)], "cognify_code_pipeline"
|
||||
)
|
||||
return [types.TextContent(type="text", text=str(status))]
|
||||
except NotImplementedError:
|
||||
error_msg = "❌ Pipeline status is not available in API mode"
|
||||
logger.error(error_msg)
|
||||
return [types.TextContent(type="text", text=error_msg)]
|
||||
except Exception as e:
|
||||
error_msg = f"❌ Failed to get codify status: {str(e)}"
|
||||
logger.error(error_msg)
|
||||
return [types.TextContent(type="text", text=error_msg)]
|
||||
|
||||
|
||||
def node_to_string(node):
|
||||
node_data = ", ".join(
|
||||
[f'{key}: "{value}"' for key, value in node.items() if key in ["id", "name"]]
|
||||
|
|
@ -880,10 +1096,6 @@ async def main():
|
|||
|
||||
# Skip migrations when in API mode (the API server handles its own database)
|
||||
if not args.no_migration and not args.api_url:
|
||||
from cognee.modules.engine.operations.setup import setup
|
||||
|
||||
await setup()
|
||||
|
||||
# Run Alembic migrations from the main cognee directory where alembic.ini is located
|
||||
logger.info("Running database migrations...")
|
||||
migration_result = subprocess.run(
|
||||
|
|
|
|||
|
|
@ -3,7 +3,7 @@
|
|||
Test client for Cognee MCP Server functionality.
|
||||
|
||||
This script tests all the tools and functions available in the Cognee MCP server,
|
||||
including cognify, search, prune, status checks, and utility functions.
|
||||
including cognify, codify, search, prune, status checks, and utility functions.
|
||||
|
||||
Usage:
|
||||
# Set your OpenAI API key first
|
||||
|
|
@ -23,7 +23,6 @@ import tempfile
|
|||
import time
|
||||
from contextlib import asynccontextmanager
|
||||
from cognee.shared.logging_utils import setup_logging
|
||||
from logging import ERROR, INFO
|
||||
|
||||
from mcp import ClientSession, StdioServerParameters
|
||||
from mcp.client.stdio import stdio_client
|
||||
|
|
@ -36,7 +35,7 @@ from src.server import (
|
|||
load_class,
|
||||
)
|
||||
|
||||
# Set timeout for cognify to complete in
|
||||
# Set timeout for cognify/codify to complete in
|
||||
TIMEOUT = 5 * 60 # 5 min in seconds
|
||||
|
||||
|
||||
|
|
@ -152,9 +151,12 @@ DEBUG = True
|
|||
|
||||
expected_tools = {
|
||||
"cognify",
|
||||
"codify",
|
||||
"search",
|
||||
"prune",
|
||||
"cognify_status",
|
||||
"codify_status",
|
||||
"cognee_add_developer_rules",
|
||||
"list_data",
|
||||
"delete",
|
||||
}
|
||||
|
|
@ -245,6 +247,106 @@ DEBUG = True
|
|||
}
|
||||
print(f"❌ {test_name} test failed: {e}")
|
||||
|
||||
async def test_codify(self):
|
||||
"""Test the codify functionality using MCP client."""
|
||||
print("\n🧪 Testing codify functionality...")
|
||||
try:
|
||||
async with self.mcp_server_session() as session:
|
||||
codify_result = await session.call_tool(
|
||||
"codify", arguments={"repo_path": self.test_repo_dir}
|
||||
)
|
||||
|
||||
start = time.time() # mark the start
|
||||
while True:
|
||||
try:
|
||||
# Wait a moment
|
||||
await asyncio.sleep(5)
|
||||
|
||||
# Check if codify processing is finished
|
||||
status_result = await session.call_tool("codify_status", arguments={})
|
||||
if hasattr(status_result, "content") and status_result.content:
|
||||
status_text = (
|
||||
status_result.content[0].text
|
||||
if status_result.content
|
||||
else str(status_result)
|
||||
)
|
||||
else:
|
||||
status_text = str(status_result)
|
||||
|
||||
if str(PipelineRunStatus.DATASET_PROCESSING_COMPLETED) in status_text:
|
||||
break
|
||||
elif time.time() - start > TIMEOUT:
|
||||
raise TimeoutError("Codify did not complete in 5min")
|
||||
except DatabaseNotCreatedError:
|
||||
if time.time() - start > TIMEOUT:
|
||||
raise TimeoutError("Database was not created in 5min")
|
||||
|
||||
self.test_results["codify"] = {
|
||||
"status": "PASS",
|
||||
"result": codify_result,
|
||||
"message": "Codify executed successfully",
|
||||
}
|
||||
print("✅ Codify test passed")
|
||||
|
||||
except Exception as e:
|
||||
self.test_results["codify"] = {
|
||||
"status": "FAIL",
|
||||
"error": str(e),
|
||||
"message": "Codify test failed",
|
||||
}
|
||||
print(f"❌ Codify test failed: {e}")
|
||||
|
||||
async def test_cognee_add_developer_rules(self):
|
||||
"""Test the cognee_add_developer_rules functionality using MCP client."""
|
||||
print("\n🧪 Testing cognee_add_developer_rules functionality...")
|
||||
try:
|
||||
async with self.mcp_server_session() as session:
|
||||
result = await session.call_tool(
|
||||
"cognee_add_developer_rules", arguments={"base_path": self.test_data_dir}
|
||||
)
|
||||
|
||||
start = time.time() # mark the start
|
||||
while True:
|
||||
try:
|
||||
# Wait a moment
|
||||
await asyncio.sleep(5)
|
||||
|
||||
# Check if developer rule cognify processing is finished
|
||||
status_result = await session.call_tool("cognify_status", arguments={})
|
||||
if hasattr(status_result, "content") and status_result.content:
|
||||
status_text = (
|
||||
status_result.content[0].text
|
||||
if status_result.content
|
||||
else str(status_result)
|
||||
)
|
||||
else:
|
||||
status_text = str(status_result)
|
||||
|
||||
if str(PipelineRunStatus.DATASET_PROCESSING_COMPLETED) in status_text:
|
||||
break
|
||||
elif time.time() - start > TIMEOUT:
|
||||
raise TimeoutError(
|
||||
"Cognify of developer rules did not complete in 5min"
|
||||
)
|
||||
except DatabaseNotCreatedError:
|
||||
if time.time() - start > TIMEOUT:
|
||||
raise TimeoutError("Database was not created in 5min")
|
||||
|
||||
self.test_results["cognee_add_developer_rules"] = {
|
||||
"status": "PASS",
|
||||
"result": result,
|
||||
"message": "Developer rules addition executed successfully",
|
||||
}
|
||||
print("✅ Developer rules test passed")
|
||||
|
||||
except Exception as e:
|
||||
self.test_results["cognee_add_developer_rules"] = {
|
||||
"status": "FAIL",
|
||||
"error": str(e),
|
||||
"message": "Developer rules test failed",
|
||||
}
|
||||
print(f"❌ Developer rules test failed: {e}")
|
||||
|
||||
async def test_search_functionality(self):
|
||||
"""Test the search functionality with different search types using MCP client."""
|
||||
print("\n🧪 Testing search functionality...")
|
||||
|
|
@ -257,11 +359,7 @@ DEBUG = True
|
|||
# Go through all Cognee search types
|
||||
for search_type in SearchType:
|
||||
# Don't test these search types
|
||||
if search_type in [
|
||||
SearchType.NATURAL_LANGUAGE,
|
||||
SearchType.CYPHER,
|
||||
SearchType.TRIPLET_COMPLETION,
|
||||
]:
|
||||
if search_type in [SearchType.NATURAL_LANGUAGE, SearchType.CYPHER]:
|
||||
break
|
||||
try:
|
||||
async with self.mcp_server_session() as session:
|
||||
|
|
@ -583,6 +681,9 @@ class TestModel:
|
|||
test_name="Cognify2",
|
||||
)
|
||||
|
||||
await self.test_codify()
|
||||
await self.test_cognee_add_developer_rules()
|
||||
|
||||
# Test list_data and delete functionality
|
||||
await self.test_list_data()
|
||||
await self.test_delete()
|
||||
|
|
@ -638,5 +739,7 @@ async def main():
|
|||
|
||||
|
||||
if __name__ == "__main__":
|
||||
from logging import ERROR
|
||||
|
||||
logger = setup_logging(log_level=ERROR)
|
||||
asyncio.run(main())
|
||||
|
|
|
|||
4611
cognee-mcp/uv.lock
generated
4611
cognee-mcp/uv.lock
generated
File diff suppressed because it is too large
Load diff
|
|
@ -19,7 +19,6 @@ from .api.v1.add import add
|
|||
from .api.v1.delete import delete
|
||||
from .api.v1.cognify import cognify
|
||||
from .modules.memify import memify
|
||||
from .modules.run_custom_pipeline import run_custom_pipeline
|
||||
from .api.v1.update import update
|
||||
from .api.v1.config.config import config
|
||||
from .api.v1.datasets.datasets import datasets
|
||||
|
|
|
|||
|
|
@ -21,9 +21,8 @@ from cognee.api.v1.notebooks.routers import get_notebooks_router
|
|||
from cognee.api.v1.permissions.routers import get_permissions_router
|
||||
from cognee.api.v1.settings.routers import get_settings_router
|
||||
from cognee.api.v1.datasets.routers import get_datasets_router
|
||||
from cognee.api.v1.cognify.routers import get_cognify_router
|
||||
from cognee.api.v1.cognify.routers import get_code_pipeline_router, get_cognify_router
|
||||
from cognee.api.v1.search.routers import get_search_router
|
||||
from cognee.api.v1.ontologies.routers.get_ontology_router import get_ontology_router
|
||||
from cognee.api.v1.memify.routers import get_memify_router
|
||||
from cognee.api.v1.add.routers import get_add_router
|
||||
from cognee.api.v1.delete.routers import get_delete_router
|
||||
|
|
@ -40,8 +39,6 @@ from cognee.api.v1.users.routers import (
|
|||
)
|
||||
from cognee.modules.users.methods.get_authenticated_user import REQUIRE_AUTHENTICATION
|
||||
|
||||
# Ensure application logging is configured for container stdout/stderr
|
||||
setup_logging()
|
||||
logger = get_logger()
|
||||
|
||||
if os.getenv("ENV", "prod") == "prod":
|
||||
|
|
@ -77,9 +74,6 @@ async def lifespan(app: FastAPI):
|
|||
|
||||
await get_default_user()
|
||||
|
||||
# Emit a clear startup message for docker logs
|
||||
logger.info("Backend server has started")
|
||||
|
||||
yield
|
||||
|
||||
|
||||
|
|
@ -264,8 +258,6 @@ app.include_router(
|
|||
|
||||
app.include_router(get_datasets_router(), prefix="/api/v1/datasets", tags=["datasets"])
|
||||
|
||||
app.include_router(get_ontology_router(), prefix="/api/v1/ontologies", tags=["ontologies"])
|
||||
|
||||
app.include_router(get_settings_router(), prefix="/api/v1/settings", tags=["settings"])
|
||||
|
||||
app.include_router(get_visualize_router(), prefix="/api/v1/visualize", tags=["visualize"])
|
||||
|
|
@ -278,6 +270,10 @@ app.include_router(get_responses_router(), prefix="/api/v1/responses", tags=["re
|
|||
|
||||
app.include_router(get_sync_router(), prefix="/api/v1/sync", tags=["sync"])
|
||||
|
||||
codegraph_routes = get_code_pipeline_router()
|
||||
if codegraph_routes:
|
||||
app.include_router(codegraph_routes, prefix="/api/v1/code-pipeline", tags=["code-pipeline"])
|
||||
|
||||
app.include_router(
|
||||
get_users_router(),
|
||||
prefix="/api/v1/users",
|
||||
|
|
|
|||
|
|
@ -1,5 +1,6 @@
|
|||
from uuid import UUID
|
||||
from typing import Union, BinaryIO, List, Optional, Any
|
||||
from typing import Union, BinaryIO, List, Optional, Dict, Any
|
||||
from pydantic import BaseModel
|
||||
from cognee.modules.users.models import User
|
||||
from cognee.modules.pipelines import Task, run_pipeline
|
||||
from cognee.modules.pipelines.layers.resolve_authorized_user_dataset import (
|
||||
|
|
@ -11,9 +12,18 @@ from cognee.modules.pipelines.layers.reset_dataset_pipeline_run_status import (
|
|||
from cognee.modules.engine.operations.setup import setup
|
||||
from cognee.tasks.ingestion import ingest_data, resolve_data_directories
|
||||
from cognee.shared.logging_utils import get_logger
|
||||
from .preprocessors import get_preprocessor_registry, PreprocessorContext
|
||||
|
||||
logger = get_logger()
|
||||
|
||||
try:
|
||||
from .preprocessors.web_preprocessor import register_web_preprocessor
|
||||
|
||||
register_web_preprocessor()
|
||||
except ImportError:
|
||||
logger.debug("Web preprocessor not available")
|
||||
pass
|
||||
|
||||
|
||||
async def add(
|
||||
data: Union[BinaryIO, list[BinaryIO], str, list[str]],
|
||||
|
|
@ -23,9 +33,10 @@ async def add(
|
|||
vector_db_config: dict = None,
|
||||
graph_db_config: dict = None,
|
||||
dataset_id: Optional[UUID] = None,
|
||||
preferred_loaders: Optional[List[Union[str, dict[str, dict[str, Any]]]]] = None,
|
||||
preferred_loaders: List[str] = None,
|
||||
incremental_loading: bool = True,
|
||||
data_per_batch: Optional[int] = 20,
|
||||
preprocessors: Optional[List[str]] = None,
|
||||
**preprocessor_params,
|
||||
):
|
||||
"""
|
||||
Add data to Cognee for knowledge graph processing.
|
||||
|
|
@ -82,9 +93,9 @@ async def add(
|
|||
vector_db_config: Optional configuration for vector database (for custom setups).
|
||||
graph_db_config: Optional configuration for graph database (for custom setups).
|
||||
dataset_id: Optional specific dataset UUID to use instead of dataset_name.
|
||||
extraction_rules: Optional dictionary of rules (e.g., CSS selectors, XPath) for extracting specific content from web pages using BeautifulSoup
|
||||
tavily_config: Optional configuration for Tavily API, including API key and extraction settings
|
||||
soup_crawler_config: Optional configuration for BeautifulSoup crawler, specifying concurrency, crawl delay, and extraction rules.
|
||||
preprocessors: Optional list of preprocessor names to run. If None, no preprocessors will run.
|
||||
Available preprocessors: ["web_preprocessor"] for handling URLs.
|
||||
**preprocessor_params: Additional parameters passed to preprocessors (e.g., extraction_rules, tavily_config, soup_crawler_config for web preprocessor).
|
||||
|
||||
Returns:
|
||||
PipelineRunInfo: Information about the ingestion pipeline execution including:
|
||||
|
|
@ -140,14 +151,14 @@ async def add(
|
|||
"description": "p",
|
||||
"more_info": "a[href*='more-info']"
|
||||
}
|
||||
await cognee.add("https://example.com",extraction_rules=extraction_rules)
|
||||
await cognee.add("https://example.com", preprocessors=["web_preprocessor"], extraction_rules=extraction_rules)
|
||||
|
||||
# Add a single url and tavily extract ingestion method
|
||||
Make sure to set TAVILY_API_KEY = YOUR_TAVILY_API_KEY as a environment variable
|
||||
await cognee.add("https://example.com")
|
||||
await cognee.add("https://example.com", preprocessors=["web_preprocessor"], tavily_config=your_config)
|
||||
|
||||
# Add multiple urls
|
||||
await cognee.add(["https://example.com","https://books.toscrape.com"])
|
||||
await cognee.add(["https://example.com","https://books.toscrape.com"], preprocessors=["web_preprocessor"])
|
||||
```
|
||||
|
||||
Environment Variables:
|
||||
|
|
@ -155,7 +166,7 @@ async def add(
|
|||
- LLM_API_KEY: API key for your LLM provider (OpenAI, Anthropic, etc.)
|
||||
|
||||
Optional:
|
||||
- LLM_PROVIDER: "openai" (default), "anthropic", "gemini", "ollama", "mistral", "bedrock"
|
||||
- LLM_PROVIDER: "openai" (default), "anthropic", "gemini", "ollama", "mistral"
|
||||
- LLM_MODEL: Model name (default: "gpt-5-mini")
|
||||
- DEFAULT_USER_EMAIL: Custom default user email
|
||||
- DEFAULT_USER_PASSWORD: Custom default user password
|
||||
|
|
@ -164,14 +175,38 @@ async def add(
|
|||
- TAVILY_API_KEY: YOUR_TAVILY_API_KEY
|
||||
|
||||
"""
|
||||
if preferred_loaders is not None:
|
||||
transformed = {}
|
||||
for item in preferred_loaders:
|
||||
if isinstance(item, dict):
|
||||
transformed.update(item)
|
||||
else:
|
||||
transformed[item] = {}
|
||||
preferred_loaders = transformed
|
||||
|
||||
registry = get_preprocessor_registry()
|
||||
preprocessor_context = PreprocessorContext(
|
||||
data=data,
|
||||
dataset_name=dataset_name,
|
||||
user=user,
|
||||
node_set=node_set,
|
||||
vector_db_config=vector_db_config,
|
||||
graph_db_config=graph_db_config,
|
||||
dataset_id=dataset_id,
|
||||
preferred_loaders=preferred_loaders,
|
||||
incremental_loading=incremental_loading,
|
||||
extra_params={
|
||||
**preprocessor_params,
|
||||
},
|
||||
)
|
||||
|
||||
try:
|
||||
processed_context = await registry.process_with_selected_preprocessors(
|
||||
preprocessor_context, preprocessors or []
|
||||
)
|
||||
data = processed_context.data
|
||||
dataset_name = processed_context.dataset_name
|
||||
user = processed_context.user
|
||||
node_set = processed_context.node_set
|
||||
vector_db_config = processed_context.vector_db_config
|
||||
graph_db_config = processed_context.graph_db_config
|
||||
dataset_id = processed_context.dataset_id
|
||||
preferred_loaders = processed_context.preferred_loaders
|
||||
incremental_loading = processed_context.incremental_loading
|
||||
except Exception as e:
|
||||
logger.error(f"Preprocessor processing failed: {str(e)}")
|
||||
|
||||
tasks = [
|
||||
Task(resolve_data_directories, include_subdirectories=True),
|
||||
|
|
@ -205,9 +240,7 @@ async def add(
|
|||
pipeline_name="add_pipeline",
|
||||
vector_db_config=vector_db_config,
|
||||
graph_db_config=graph_db_config,
|
||||
use_pipeline_cache=True,
|
||||
incremental_loading=incremental_loading,
|
||||
data_per_batch=data_per_batch,
|
||||
):
|
||||
pipeline_run_info = run_info
|
||||
|
||||
|
|
|
|||
16
cognee/api/v1/add/preprocessors/__init__.py
Normal file
16
cognee/api/v1/add/preprocessors/__init__.py
Normal file
|
|
@ -0,0 +1,16 @@
|
|||
"""
|
||||
Preprocessor system for the cognee add function.
|
||||
|
||||
This module provides a plugin architecture that allows preprocessors to be easily
|
||||
plugged into the add() function without modifying core code.
|
||||
"""
|
||||
|
||||
from .base import Preprocessor, PreprocessorRegistry, PreprocessorContext
|
||||
from .registry import get_preprocessor_registry
|
||||
|
||||
__all__ = [
|
||||
"Preprocessor",
|
||||
"PreprocessorRegistry",
|
||||
"get_preprocessor_registry",
|
||||
"PreprocessorContext",
|
||||
]
|
||||
171
cognee/api/v1/add/preprocessors/base.py
Normal file
171
cognee/api/v1/add/preprocessors/base.py
Normal file
|
|
@ -0,0 +1,171 @@
|
|||
"""
|
||||
Base classes for the cognee add preprocessor system.
|
||||
"""
|
||||
|
||||
from uuid import UUID
|
||||
from abc import ABC, abstractmethod
|
||||
from typing import Any, Dict, List, Optional, Union, BinaryIO
|
||||
from pydantic import BaseModel
|
||||
|
||||
from cognee.modules.users.models import User
|
||||
from cognee.shared.logging_utils import get_logger
|
||||
|
||||
logger = get_logger()
|
||||
|
||||
|
||||
class PreprocessorContext(BaseModel):
|
||||
"""Context passed to preprocessors during processing."""
|
||||
|
||||
model_config = {"arbitrary_types_allowed": True}
|
||||
|
||||
data: Union[BinaryIO, List[BinaryIO], str, List[str]]
|
||||
dataset_name: str
|
||||
user: Optional[User] = None
|
||||
node_set: Optional[List[str]] = None
|
||||
vector_db_config: Optional[Dict] = None
|
||||
graph_db_config: Optional[Dict] = None
|
||||
dataset_id: Optional[UUID] = None
|
||||
preferred_loaders: Optional[List[str]] = None
|
||||
incremental_loading: bool = True
|
||||
extra_params: Dict[str, Any] = {}
|
||||
|
||||
|
||||
class PreprocessorResult(BaseModel):
|
||||
"""Result returned by preprocessors."""
|
||||
|
||||
modified_context: Optional[PreprocessorContext] = None
|
||||
stop_processing: bool = False
|
||||
error: Optional[str] = None
|
||||
|
||||
|
||||
class Preprocessor(ABC):
|
||||
"""
|
||||
Base class for all cognee add preprocessors.
|
||||
|
||||
Preprocessors can modify the processing context, add custom logic,
|
||||
or handle specific data types before main pipeline processing.
|
||||
"""
|
||||
|
||||
@property
|
||||
@abstractmethod
|
||||
def name(self) -> str:
|
||||
"""Unique name for this preprocessor."""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def can_handle(self, context: PreprocessorContext) -> bool:
|
||||
"""
|
||||
Check if this preprocessor can handle the given context.
|
||||
|
||||
Args:
|
||||
context: The current processing context
|
||||
|
||||
Returns:
|
||||
True if this preprocessor should process this context
|
||||
"""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
async def process(self, context: PreprocessorContext) -> PreprocessorResult:
|
||||
"""
|
||||
Process the given context.
|
||||
|
||||
Args:
|
||||
context: The current processing context
|
||||
|
||||
Returns:
|
||||
PreprocessorResult with any modifications or errors
|
||||
"""
|
||||
pass
|
||||
|
||||
def __str__(self) -> str:
|
||||
return f"{self.__class__.__name__}(name={self.name}"
|
||||
|
||||
|
||||
class PreprocessorRegistry:
|
||||
"""Registry for managing and executing preprocessors."""
|
||||
|
||||
def __init__(self):
|
||||
self._preprocessors: List[Preprocessor] = []
|
||||
|
||||
def register(self, preprocessor: Preprocessor) -> None:
|
||||
"""Register a preprocessor."""
|
||||
if not isinstance(preprocessor, Preprocessor):
|
||||
raise TypeError(
|
||||
f"Preprocessor must inherit from Preprocessor, got {type(preprocessor)}"
|
||||
)
|
||||
|
||||
if any(prep.name == preprocessor.name for prep in self._preprocessors):
|
||||
raise ValueError(f"Preprocessor with name '{preprocessor.name}' already registered")
|
||||
|
||||
self._preprocessors.append(preprocessor)
|
||||
|
||||
def unregister(self, name: str) -> bool:
|
||||
"""Unregister a preprocessor by name."""
|
||||
for i, prep in enumerate(self._preprocessors):
|
||||
if prep.name == name:
|
||||
del self._preprocessors[i]
|
||||
return True
|
||||
return False
|
||||
|
||||
def get_preprocessors(self) -> List[Preprocessor]:
|
||||
"""Get all registered preprocessors ordered by priority."""
|
||||
return self._preprocessors.copy()
|
||||
|
||||
def get_applicable_preprocessors(self, context: PreprocessorContext) -> List[Preprocessor]:
|
||||
"""Get preprocessors that can handle the given context."""
|
||||
return [prep for prep in self._preprocessors if prep.can_handle(context)]
|
||||
|
||||
async def process_with_selected_preprocessors(
|
||||
self, context: PreprocessorContext, preprocessor_names: List[str]
|
||||
) -> PreprocessorContext:
|
||||
"""
|
||||
Process context through only the specified preprocessors.
|
||||
|
||||
Args:
|
||||
context: The initial context
|
||||
preprocessor_names: List of preprocessor names to run
|
||||
|
||||
Returns:
|
||||
The final processed context
|
||||
|
||||
Raises:
|
||||
Exception: If any preprocessor encounters an error or preprocessor name not found
|
||||
"""
|
||||
current_context = context
|
||||
|
||||
selected_preprocessors: List[Preprocessor] = []
|
||||
for name in preprocessor_names:
|
||||
preprocessor = next((prep for prep in self._preprocessors if prep.name == name), None)
|
||||
if preprocessor is None:
|
||||
available_names = [prep.name for prep in self._preprocessors]
|
||||
raise ValueError(
|
||||
f"Preprocessor '{name}' not found. Available preprocessors: {available_names}"
|
||||
)
|
||||
selected_preprocessors.append(preprocessor)
|
||||
|
||||
for preprocessor in selected_preprocessors:
|
||||
if not preprocessor.can_handle(current_context):
|
||||
logger.warning(
|
||||
f"Preprocessor '{preprocessor.name}' cannot handle current context, skipping"
|
||||
)
|
||||
continue
|
||||
|
||||
try:
|
||||
result = await preprocessor.process(current_context)
|
||||
|
||||
if result.error:
|
||||
raise Exception(f"Preprocessor '{preprocessor.name}' failed: {result.error}")
|
||||
|
||||
if result.modified_context:
|
||||
current_context = result.modified_context
|
||||
|
||||
if result.stop_processing:
|
||||
break
|
||||
|
||||
except Exception as e:
|
||||
raise Exception(
|
||||
f"Preprocessor '{preprocessor.name}' encountered an error: {str(e)}"
|
||||
) from e
|
||||
|
||||
return current_context
|
||||
12
cognee/api/v1/add/preprocessors/registry.py
Normal file
12
cognee/api/v1/add/preprocessors/registry.py
Normal file
|
|
@ -0,0 +1,12 @@
|
|||
"""
|
||||
Global preprocessor registry for cognee add function.
|
||||
"""
|
||||
|
||||
from .base import PreprocessorRegistry
|
||||
|
||||
_registry = PreprocessorRegistry()
|
||||
|
||||
|
||||
def get_preprocessor_registry() -> PreprocessorRegistry:
|
||||
"""Get the global preprocessor registry."""
|
||||
return _registry
|
||||
98
cognee/api/v1/add/preprocessors/web_preprocessor.py
Normal file
98
cognee/api/v1/add/preprocessors/web_preprocessor.py
Normal file
|
|
@ -0,0 +1,98 @@
|
|||
"""
|
||||
Web preprocessor for handling URL inputs in the cognee add function.
|
||||
|
||||
This preprocessor handles web URLs by setting up appropriate crawling configurations
|
||||
and modifying the processing context for web content.
|
||||
"""
|
||||
|
||||
import os
|
||||
from urllib.parse import urlparse
|
||||
from typing import Union, BinaryIO
|
||||
|
||||
from .base import Preprocessor, PreprocessorContext, PreprocessorResult
|
||||
|
||||
try:
|
||||
from cognee.tasks.web_scraper.config import TavilyConfig, SoupCrawlerConfig
|
||||
from cognee.context_global_variables import (
|
||||
tavily_config as tavily,
|
||||
soup_crawler_config as soup_crawler,
|
||||
)
|
||||
|
||||
WEB_SCRAPER_AVAILABLE = True
|
||||
except ImportError:
|
||||
WEB_SCRAPER_AVAILABLE = False
|
||||
|
||||
|
||||
class WebPreprocessor(Preprocessor):
|
||||
"""Preprocessor for handling web URL inputs."""
|
||||
|
||||
@property
|
||||
def name(self) -> str:
|
||||
return "web_preprocessor"
|
||||
|
||||
def _is_http_url(self, item: Union[str, BinaryIO]) -> bool:
|
||||
"""Check if an item is an HTTP/HTTPS URL."""
|
||||
http_schemes = {"http", "https"}
|
||||
return isinstance(item, str) and urlparse(item).scheme in http_schemes
|
||||
|
||||
def can_handle(self, context: PreprocessorContext) -> bool:
|
||||
"""Check if this preprocessor can handle the given context."""
|
||||
if not WEB_SCRAPER_AVAILABLE:
|
||||
return False
|
||||
|
||||
if self._is_http_url(context.data):
|
||||
return True
|
||||
|
||||
if isinstance(context.data, list):
|
||||
return any(self._is_http_url(item) for item in context.data)
|
||||
|
||||
return False
|
||||
|
||||
async def process(self, context: PreprocessorContext) -> PreprocessorResult:
|
||||
"""Process web URLs by setting up crawling configurations."""
|
||||
try:
|
||||
extraction_rules = context.extra_params.get("extraction_rules")
|
||||
tavily_config_param = context.extra_params.get("tavily_config")
|
||||
soup_crawler_config_param = context.extra_params.get("soup_crawler_config")
|
||||
|
||||
if not soup_crawler_config_param and extraction_rules:
|
||||
soup_crawler_config_param = SoupCrawlerConfig(extraction_rules=extraction_rules)
|
||||
|
||||
if not tavily_config_param and os.getenv("TAVILY_API_KEY"):
|
||||
tavily_config_param = TavilyConfig(api_key=os.getenv("TAVILY_API_KEY"))
|
||||
|
||||
if soup_crawler_config_param:
|
||||
soup_crawler.set(soup_crawler_config_param)
|
||||
|
||||
tavily.set(tavily_config_param)
|
||||
|
||||
modified_context = context.model_copy()
|
||||
|
||||
if self._is_http_url(context.data):
|
||||
modified_context.node_set = (
|
||||
["web_content"] if not context.node_set else context.node_set + ["web_content"]
|
||||
)
|
||||
elif isinstance(context.data, list) and any(
|
||||
self._is_http_url(item) for item in context.data
|
||||
):
|
||||
modified_context.node_set = (
|
||||
["web_content"] if not context.node_set else context.node_set + ["web_content"]
|
||||
)
|
||||
|
||||
return PreprocessorResult(modified_context=modified_context)
|
||||
|
||||
except Exception as e:
|
||||
return PreprocessorResult(error=f"Failed to configure web scraping: {str(e)}")
|
||||
|
||||
|
||||
def register_web_preprocessor() -> None:
|
||||
"""Register the web preprocessor with the global registry."""
|
||||
from .registry import get_preprocessor_registry
|
||||
|
||||
registry = get_preprocessor_registry()
|
||||
|
||||
if WEB_SCRAPER_AVAILABLE:
|
||||
try:
|
||||
registry.register(WebPreprocessor())
|
||||
except ValueError:
|
||||
pass
|
||||
|
|
@ -10,7 +10,6 @@ from cognee.modules.users.methods import get_authenticated_user
|
|||
from cognee.shared.utils import send_telemetry
|
||||
from cognee.modules.pipelines.models import PipelineRunErrored
|
||||
from cognee.shared.logging_utils import get_logger
|
||||
from cognee import __version__ as cognee_version
|
||||
|
||||
logger = get_logger()
|
||||
|
||||
|
|
@ -64,11 +63,7 @@ def get_add_router() -> APIRouter:
|
|||
send_telemetry(
|
||||
"Add API Endpoint Invoked",
|
||||
user.id,
|
||||
additional_properties={
|
||||
"endpoint": "POST /v1/add",
|
||||
"node_set": node_set,
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
additional_properties={"endpoint": "POST /v1/add", "node_set": node_set},
|
||||
)
|
||||
|
||||
from cognee.api.v1.add import add as cognee_add
|
||||
|
|
@ -82,9 +77,7 @@ def get_add_router() -> APIRouter:
|
|||
datasetName,
|
||||
user=user,
|
||||
dataset_id=datasetId,
|
||||
node_set=node_set
|
||||
if node_set != [""]
|
||||
else None, # Transform default node_set endpoint value to None
|
||||
node_set=node_set if node_set else None,
|
||||
)
|
||||
|
||||
if isinstance(add_run, PipelineRunErrored):
|
||||
|
|
|
|||
119
cognee/api/v1/cognify/code_graph_pipeline.py
Normal file
119
cognee/api/v1/cognify/code_graph_pipeline.py
Normal file
|
|
@ -0,0 +1,119 @@
|
|||
import os
|
||||
import pathlib
|
||||
import asyncio
|
||||
from typing import Optional
|
||||
from cognee.shared.logging_utils import get_logger, setup_logging
|
||||
from cognee.modules.observability.get_observe import get_observe
|
||||
|
||||
from cognee.api.v1.search import SearchType, search
|
||||
from cognee.api.v1.visualize.visualize import visualize_graph
|
||||
from cognee.modules.cognify.config import get_cognify_config
|
||||
from cognee.modules.pipelines import run_tasks
|
||||
from cognee.modules.pipelines.tasks.task import Task
|
||||
from cognee.modules.users.methods import get_default_user
|
||||
from cognee.shared.data_models import KnowledgeGraph
|
||||
from cognee.modules.data.methods import create_dataset
|
||||
from cognee.tasks.documents import classify_documents, extract_chunks_from_documents
|
||||
from cognee.tasks.graph import extract_graph_from_data
|
||||
from cognee.tasks.ingestion import ingest_data
|
||||
from cognee.tasks.repo_processor import get_non_py_files, get_repo_file_dependencies
|
||||
|
||||
from cognee.tasks.storage import add_data_points
|
||||
from cognee.tasks.summarization import summarize_text
|
||||
from cognee.infrastructure.llm import get_max_chunk_tokens
|
||||
from cognee.infrastructure.databases.relational import get_relational_engine
|
||||
|
||||
observe = get_observe()
|
||||
|
||||
logger = get_logger("code_graph_pipeline")
|
||||
|
||||
|
||||
@observe
|
||||
async def run_code_graph_pipeline(
|
||||
repo_path,
|
||||
include_docs=False,
|
||||
excluded_paths: Optional[list[str]] = None,
|
||||
supported_languages: Optional[list[str]] = None,
|
||||
):
|
||||
import cognee
|
||||
from cognee.low_level import setup
|
||||
|
||||
await cognee.prune.prune_data()
|
||||
await cognee.prune.prune_system(metadata=True)
|
||||
await setup()
|
||||
|
||||
cognee_config = get_cognify_config()
|
||||
user = await get_default_user()
|
||||
detailed_extraction = True
|
||||
|
||||
tasks = [
|
||||
Task(
|
||||
get_repo_file_dependencies,
|
||||
detailed_extraction=detailed_extraction,
|
||||
supported_languages=supported_languages,
|
||||
excluded_paths=excluded_paths,
|
||||
),
|
||||
# Task(summarize_code, task_config={"batch_size": 500}), # This task takes a long time to complete
|
||||
Task(add_data_points, task_config={"batch_size": 30}),
|
||||
]
|
||||
|
||||
if include_docs:
|
||||
# This tasks take a long time to complete
|
||||
non_code_tasks = [
|
||||
Task(get_non_py_files, task_config={"batch_size": 50}),
|
||||
Task(ingest_data, dataset_name="repo_docs", user=user),
|
||||
Task(classify_documents),
|
||||
Task(extract_chunks_from_documents, max_chunk_size=get_max_chunk_tokens()),
|
||||
Task(
|
||||
extract_graph_from_data,
|
||||
graph_model=KnowledgeGraph,
|
||||
task_config={"batch_size": 50},
|
||||
),
|
||||
Task(
|
||||
summarize_text,
|
||||
summarization_model=cognee_config.summarization_model,
|
||||
task_config={"batch_size": 50},
|
||||
),
|
||||
]
|
||||
|
||||
dataset_name = "codebase"
|
||||
|
||||
# Save dataset to database
|
||||
db_engine = get_relational_engine()
|
||||
async with db_engine.get_async_session() as session:
|
||||
dataset = await create_dataset(dataset_name, user, session)
|
||||
|
||||
if include_docs:
|
||||
non_code_pipeline_run = run_tasks(
|
||||
non_code_tasks, dataset.id, repo_path, user, "cognify_pipeline"
|
||||
)
|
||||
async for run_status in non_code_pipeline_run:
|
||||
yield run_status
|
||||
|
||||
async for run_status in run_tasks(
|
||||
tasks, dataset.id, repo_path, user, "cognify_code_pipeline", incremental_loading=False
|
||||
):
|
||||
yield run_status
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
||||
async def main():
|
||||
async for run_status in run_code_graph_pipeline("REPO_PATH"):
|
||||
print(f"{run_status.pipeline_run_id}: {run_status.status}")
|
||||
|
||||
file_path = os.path.join(
|
||||
pathlib.Path(__file__).parent, ".artifacts", "graph_visualization.html"
|
||||
)
|
||||
await visualize_graph(file_path)
|
||||
|
||||
search_results = await search(
|
||||
query_type=SearchType.CODE,
|
||||
query_text="How is Relationship weight calculated?",
|
||||
)
|
||||
|
||||
for file in search_results:
|
||||
print(file["name"])
|
||||
|
||||
logger = setup_logging(name="code_graph_pipeline")
|
||||
asyncio.run(main())
|
||||
|
|
@ -3,7 +3,6 @@ from pydantic import BaseModel
|
|||
from typing import Union, Optional
|
||||
from uuid import UUID
|
||||
|
||||
from cognee.modules.cognify.config import get_cognify_config
|
||||
from cognee.modules.ontology.ontology_env_config import get_ontology_env_config
|
||||
from cognee.shared.logging_utils import get_logger
|
||||
from cognee.shared.data_models import KnowledgeGraph
|
||||
|
|
@ -20,6 +19,7 @@ from cognee.modules.ontology.get_default_ontology_resolver import (
|
|||
from cognee.modules.users.models import User
|
||||
|
||||
from cognee.tasks.documents import (
|
||||
check_permissions_on_dataset,
|
||||
classify_documents,
|
||||
extract_chunks_from_documents,
|
||||
)
|
||||
|
|
@ -44,7 +44,6 @@ async def cognify(
|
|||
graph_model: BaseModel = KnowledgeGraph,
|
||||
chunker=TextChunker,
|
||||
chunk_size: int = None,
|
||||
chunks_per_batch: int = None,
|
||||
config: Config = None,
|
||||
vector_db_config: dict = None,
|
||||
graph_db_config: dict = None,
|
||||
|
|
@ -52,8 +51,6 @@ async def cognify(
|
|||
incremental_loading: bool = True,
|
||||
custom_prompt: Optional[str] = None,
|
||||
temporal_cognify: bool = False,
|
||||
data_per_batch: int = 20,
|
||||
**kwargs,
|
||||
):
|
||||
"""
|
||||
Transform ingested data into a structured knowledge graph.
|
||||
|
|
@ -79,11 +76,12 @@ async def cognify(
|
|||
|
||||
Processing Pipeline:
|
||||
1. **Document Classification**: Identifies document types and structures
|
||||
2. **Text Chunking**: Breaks content into semantically meaningful segments
|
||||
3. **Entity Extraction**: Identifies key concepts, people, places, organizations
|
||||
4. **Relationship Detection**: Discovers connections between entities
|
||||
5. **Graph Construction**: Builds semantic knowledge graph with embeddings
|
||||
6. **Content Summarization**: Creates hierarchical summaries for navigation
|
||||
2. **Permission Validation**: Ensures user has processing rights
|
||||
3. **Text Chunking**: Breaks content into semantically meaningful segments
|
||||
4. **Entity Extraction**: Identifies key concepts, people, places, organizations
|
||||
5. **Relationship Detection**: Discovers connections between entities
|
||||
6. **Graph Construction**: Builds semantic knowledge graph with embeddings
|
||||
7. **Content Summarization**: Creates hierarchical summaries for navigation
|
||||
|
||||
Graph Model Customization:
|
||||
The `graph_model` parameter allows custom knowledge structures:
|
||||
|
|
@ -107,7 +105,6 @@ async def cognify(
|
|||
Formula: min(embedding_max_completion_tokens, llm_max_completion_tokens // 2)
|
||||
Default limits: ~512-8192 tokens depending on models.
|
||||
Smaller chunks = more granular but potentially fragmented knowledge.
|
||||
chunks_per_batch: Number of chunks to be processed in a single batch in Cognify tasks.
|
||||
vector_db_config: Custom vector database configuration for embeddings storage.
|
||||
graph_db_config: Custom graph database configuration for relationship storage.
|
||||
run_in_background: If True, starts processing asynchronously and returns immediately.
|
||||
|
|
@ -212,19 +209,10 @@ async def cognify(
|
|||
}
|
||||
|
||||
if temporal_cognify:
|
||||
tasks = await get_temporal_tasks(
|
||||
user=user, chunker=chunker, chunk_size=chunk_size, chunks_per_batch=chunks_per_batch
|
||||
)
|
||||
tasks = await get_temporal_tasks(user, chunker, chunk_size)
|
||||
else:
|
||||
tasks = await get_default_tasks(
|
||||
user=user,
|
||||
graph_model=graph_model,
|
||||
chunker=chunker,
|
||||
chunk_size=chunk_size,
|
||||
config=config,
|
||||
custom_prompt=custom_prompt,
|
||||
chunks_per_batch=chunks_per_batch,
|
||||
**kwargs,
|
||||
user, graph_model, chunker, chunk_size, config, custom_prompt
|
||||
)
|
||||
|
||||
# By calling get pipeline executor we get a function that will have the run_pipeline run in the background or a function that we will need to wait for
|
||||
|
|
@ -239,9 +227,7 @@ async def cognify(
|
|||
vector_db_config=vector_db_config,
|
||||
graph_db_config=graph_db_config,
|
||||
incremental_loading=incremental_loading,
|
||||
use_pipeline_cache=True,
|
||||
pipeline_name="cognify_pipeline",
|
||||
data_per_batch=data_per_batch,
|
||||
)
|
||||
|
||||
|
||||
|
|
@ -252,8 +238,6 @@ async def get_default_tasks( # TODO: Find out a better way to do this (Boris's
|
|||
chunk_size: int = None,
|
||||
config: Config = None,
|
||||
custom_prompt: Optional[str] = None,
|
||||
chunks_per_batch: int = 100,
|
||||
**kwargs,
|
||||
) -> list[Task]:
|
||||
if config is None:
|
||||
ontology_config = get_ontology_env_config()
|
||||
|
|
@ -272,14 +256,9 @@ async def get_default_tasks( # TODO: Find out a better way to do this (Boris's
|
|||
"ontology_config": {"ontology_resolver": get_default_ontology_resolver()}
|
||||
}
|
||||
|
||||
if chunks_per_batch is None:
|
||||
chunks_per_batch = 100
|
||||
|
||||
cognify_config = get_cognify_config()
|
||||
embed_triplets = cognify_config.triplet_embedding
|
||||
|
||||
default_tasks = [
|
||||
Task(classify_documents),
|
||||
Task(check_permissions_on_dataset, user=user, permissions=["write"]),
|
||||
Task(
|
||||
extract_chunks_from_documents,
|
||||
max_chunk_size=chunk_size or get_max_chunk_tokens(),
|
||||
|
|
@ -290,58 +269,51 @@ async def get_default_tasks( # TODO: Find out a better way to do this (Boris's
|
|||
graph_model=graph_model,
|
||||
config=config,
|
||||
custom_prompt=custom_prompt,
|
||||
task_config={"batch_size": chunks_per_batch},
|
||||
**kwargs,
|
||||
task_config={"batch_size": 10},
|
||||
), # Generate knowledge graphs from the document chunks.
|
||||
Task(
|
||||
summarize_text,
|
||||
task_config={"batch_size": chunks_per_batch},
|
||||
),
|
||||
Task(
|
||||
add_data_points,
|
||||
embed_triplets=embed_triplets,
|
||||
task_config={"batch_size": chunks_per_batch},
|
||||
task_config={"batch_size": 10},
|
||||
),
|
||||
Task(add_data_points, task_config={"batch_size": 10}),
|
||||
]
|
||||
|
||||
return default_tasks
|
||||
|
||||
|
||||
async def get_temporal_tasks(
|
||||
user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: int = 10
|
||||
user: User = None, chunker=TextChunker, chunk_size: int = None
|
||||
) -> list[Task]:
|
||||
"""
|
||||
Builds and returns a list of temporal processing tasks to be executed in sequence.
|
||||
|
||||
The pipeline includes:
|
||||
1. Document classification.
|
||||
2. Document chunking with a specified or default chunk size.
|
||||
3. Event and timestamp extraction from chunks.
|
||||
4. Knowledge graph extraction from events.
|
||||
5. Batched insertion of data points.
|
||||
2. Dataset permission checks (requires "write" access).
|
||||
3. Document chunking with a specified or default chunk size.
|
||||
4. Event and timestamp extraction from chunks.
|
||||
5. Knowledge graph extraction from events.
|
||||
6. Batched insertion of data points.
|
||||
|
||||
Args:
|
||||
user (User, optional): The user requesting task execution.
|
||||
user (User, optional): The user requesting task execution, used for permission checks.
|
||||
chunker (Callable, optional): A text chunking function/class to split documents. Defaults to TextChunker.
|
||||
chunk_size (int, optional): Maximum token size per chunk. If not provided, uses system default.
|
||||
chunks_per_batch (int, optional): Number of chunks to process in a single batch in Cognify
|
||||
|
||||
Returns:
|
||||
list[Task]: A list of Task objects representing the temporal processing pipeline.
|
||||
"""
|
||||
if chunks_per_batch is None:
|
||||
chunks_per_batch = 10
|
||||
|
||||
temporal_tasks = [
|
||||
Task(classify_documents),
|
||||
Task(check_permissions_on_dataset, user=user, permissions=["write"]),
|
||||
Task(
|
||||
extract_chunks_from_documents,
|
||||
max_chunk_size=chunk_size or get_max_chunk_tokens(),
|
||||
chunker=chunker,
|
||||
),
|
||||
Task(extract_events_and_timestamps, task_config={"batch_size": chunks_per_batch}),
|
||||
Task(extract_events_and_timestamps, task_config={"chunk_size": 10}),
|
||||
Task(extract_knowledge_graph_from_events),
|
||||
Task(add_data_points, task_config={"batch_size": chunks_per_batch}),
|
||||
Task(add_data_points, task_config={"batch_size": 10}),
|
||||
]
|
||||
|
||||
return temporal_tasks
|
||||
|
|
|
|||
|
|
@ -1 +1,2 @@
|
|||
from .get_cognify_router import get_cognify_router
|
||||
from .get_code_pipeline_router import get_code_pipeline_router
|
||||
|
|
|
|||
90
cognee/api/v1/cognify/routers/get_code_pipeline_router.py
Normal file
90
cognee/api/v1/cognify/routers/get_code_pipeline_router.py
Normal file
|
|
@ -0,0 +1,90 @@
|
|||
import json
|
||||
from cognee.shared.logging_utils import get_logger
|
||||
from fastapi import APIRouter
|
||||
from fastapi.responses import JSONResponse
|
||||
from cognee.api.DTO import InDTO
|
||||
from cognee.modules.retrieval.code_retriever import CodeRetriever
|
||||
from cognee.modules.storage.utils import JSONEncoder
|
||||
|
||||
|
||||
logger = get_logger()
|
||||
|
||||
|
||||
class CodePipelineIndexPayloadDTO(InDTO):
|
||||
repo_path: str
|
||||
include_docs: bool = False
|
||||
|
||||
|
||||
class CodePipelineRetrievePayloadDTO(InDTO):
|
||||
query: str
|
||||
full_input: str
|
||||
|
||||
|
||||
def get_code_pipeline_router() -> APIRouter:
|
||||
try:
|
||||
import cognee.api.v1.cognify.code_graph_pipeline
|
||||
except ModuleNotFoundError:
|
||||
logger.error("codegraph dependencies not found. Skipping codegraph API routes.")
|
||||
return None
|
||||
|
||||
router = APIRouter()
|
||||
|
||||
@router.post("/index", response_model=None)
|
||||
async def code_pipeline_index(payload: CodePipelineIndexPayloadDTO):
|
||||
"""
|
||||
Run indexation on a code repository.
|
||||
|
||||
This endpoint processes a code repository to create a knowledge graph
|
||||
of the codebase structure, dependencies, and relationships.
|
||||
|
||||
## Request Parameters
|
||||
- **repo_path** (str): Path to the code repository
|
||||
- **include_docs** (bool): Whether to include documentation files (default: false)
|
||||
|
||||
## Response
|
||||
No content returned. Processing results are logged.
|
||||
|
||||
## Error Codes
|
||||
- **409 Conflict**: Error during indexation process
|
||||
"""
|
||||
from cognee.api.v1.cognify.code_graph_pipeline import run_code_graph_pipeline
|
||||
|
||||
try:
|
||||
async for result in run_code_graph_pipeline(payload.repo_path, payload.include_docs):
|
||||
logger.info(result)
|
||||
except Exception as error:
|
||||
return JSONResponse(status_code=409, content={"error": str(error)})
|
||||
|
||||
@router.post("/retrieve", response_model=list[dict])
|
||||
async def code_pipeline_retrieve(payload: CodePipelineRetrievePayloadDTO):
|
||||
"""
|
||||
Retrieve context from the code knowledge graph.
|
||||
|
||||
This endpoint searches the indexed code repository to find relevant
|
||||
context based on the provided query.
|
||||
|
||||
## Request Parameters
|
||||
- **query** (str): Search query for code context
|
||||
- **full_input** (str): Full input text for processing
|
||||
|
||||
## Response
|
||||
Returns a list of relevant code files and context as JSON.
|
||||
|
||||
## Error Codes
|
||||
- **409 Conflict**: Error during retrieval process
|
||||
"""
|
||||
try:
|
||||
query = (
|
||||
payload.full_input.replace("cognee ", "")
|
||||
if payload.full_input.startswith("cognee ")
|
||||
else payload.full_input
|
||||
)
|
||||
|
||||
retriever = CodeRetriever()
|
||||
retrieved_files = await retriever.get_context(query)
|
||||
|
||||
return json.dumps(retrieved_files, cls=JSONEncoder)
|
||||
except Exception as error:
|
||||
return JSONResponse(status_code=409, content={"error": str(error)})
|
||||
|
||||
return router
|
||||
|
|
@ -29,7 +29,7 @@ from cognee.modules.pipelines.queues.pipeline_run_info_queues import (
|
|||
)
|
||||
from cognee.shared.logging_utils import get_logger
|
||||
from cognee.shared.utils import send_telemetry
|
||||
from cognee import __version__ as cognee_version
|
||||
|
||||
|
||||
logger = get_logger("api.cognify")
|
||||
|
||||
|
|
@ -41,11 +41,6 @@ class CognifyPayloadDTO(InDTO):
|
|||
custom_prompt: Optional[str] = Field(
|
||||
default="", description="Custom prompt for entity extraction and graph generation"
|
||||
)
|
||||
ontology_key: Optional[List[str]] = Field(
|
||||
default=None,
|
||||
examples=[[]],
|
||||
description="Reference to one or more previously uploaded ontologies",
|
||||
)
|
||||
|
||||
|
||||
def get_cognify_router() -> APIRouter:
|
||||
|
|
@ -73,7 +68,6 @@ def get_cognify_router() -> APIRouter:
|
|||
- **dataset_ids** (Optional[List[UUID]]): List of existing dataset UUIDs to process. UUIDs allow processing of datasets not owned by the user (if permitted).
|
||||
- **run_in_background** (Optional[bool]): Whether to execute processing asynchronously. Defaults to False (blocking).
|
||||
- **custom_prompt** (Optional[str]): Custom prompt for entity extraction and graph generation. If provided, this prompt will be used instead of the default prompts for knowledge graph extraction.
|
||||
- **ontology_key** (Optional[List[str]]): Reference to one or more previously uploaded ontology files to use for knowledge graph construction.
|
||||
|
||||
## Response
|
||||
- **Blocking execution**: Complete pipeline run information with entity counts, processing duration, and success/failure status
|
||||
|
|
@ -88,8 +82,7 @@ def get_cognify_router() -> APIRouter:
|
|||
{
|
||||
"datasets": ["research_papers", "documentation"],
|
||||
"run_in_background": false,
|
||||
"custom_prompt": "Extract entities focusing on technical concepts and their relationships. Identify key technologies, methodologies, and their interconnections.",
|
||||
"ontology_key": ["medical_ontology_v1"]
|
||||
"custom_prompt": "Extract entities focusing on technical concepts and their relationships. Identify key technologies, methodologies, and their interconnections."
|
||||
}
|
||||
```
|
||||
|
||||
|
|
@ -105,7 +98,6 @@ def get_cognify_router() -> APIRouter:
|
|||
user.id,
|
||||
additional_properties={
|
||||
"endpoint": "POST /v1/cognify",
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
|
|
@ -115,35 +107,13 @@ def get_cognify_router() -> APIRouter:
|
|||
)
|
||||
|
||||
from cognee.api.v1.cognify import cognify as cognee_cognify
|
||||
from cognee.api.v1.ontologies.ontologies import OntologyService
|
||||
|
||||
try:
|
||||
datasets = payload.dataset_ids if payload.dataset_ids else payload.datasets
|
||||
config_to_use = None
|
||||
|
||||
if payload.ontology_key:
|
||||
ontology_service = OntologyService()
|
||||
ontology_contents = ontology_service.get_ontology_contents(
|
||||
payload.ontology_key, user
|
||||
)
|
||||
|
||||
from cognee.modules.ontology.ontology_config import Config
|
||||
from cognee.modules.ontology.rdf_xml.RDFLibOntologyResolver import (
|
||||
RDFLibOntologyResolver,
|
||||
)
|
||||
from io import StringIO
|
||||
|
||||
ontology_streams = [StringIO(content) for content in ontology_contents]
|
||||
config_to_use: Config = {
|
||||
"ontology_config": {
|
||||
"ontology_resolver": RDFLibOntologyResolver(ontology_file=ontology_streams)
|
||||
}
|
||||
}
|
||||
|
||||
cognify_run = await cognee_cognify(
|
||||
datasets,
|
||||
user,
|
||||
config=config_to_use,
|
||||
run_in_background=payload.run_in_background,
|
||||
custom_prompt=payload.custom_prompt,
|
||||
)
|
||||
|
|
|
|||
|
|
@ -1,5 +1,4 @@
|
|||
from uuid import UUID
|
||||
from cognee.modules.data.methods import has_dataset_data
|
||||
from cognee.modules.users.methods import get_default_user
|
||||
from cognee.modules.ingestion import discover_directory_datasets
|
||||
from cognee.modules.pipelines.operations.get_pipeline_status import get_pipeline_status
|
||||
|
|
@ -27,16 +26,6 @@ class datasets:
|
|||
|
||||
return await get_dataset_data(dataset.id)
|
||||
|
||||
@staticmethod
|
||||
async def has_data(dataset_id: str) -> bool:
|
||||
from cognee.modules.data.methods import get_dataset
|
||||
|
||||
user = await get_default_user()
|
||||
|
||||
dataset = await get_dataset(user.id, dataset_id)
|
||||
|
||||
return await has_dataset_data(dataset.id)
|
||||
|
||||
@staticmethod
|
||||
async def get_status(dataset_ids: list[UUID]) -> dict:
|
||||
return await get_pipeline_status(dataset_ids, pipeline_name="cognify_pipeline")
|
||||
|
|
|
|||
|
|
@ -24,7 +24,6 @@ from cognee.modules.users.permissions.methods import (
|
|||
from cognee.modules.graph.methods import get_formatted_graph_data
|
||||
from cognee.modules.pipelines.models import PipelineRunStatus
|
||||
from cognee.shared.utils import send_telemetry
|
||||
from cognee import __version__ as cognee_version
|
||||
|
||||
logger = get_logger()
|
||||
|
||||
|
|
@ -101,7 +100,6 @@ def get_datasets_router() -> APIRouter:
|
|||
user.id,
|
||||
additional_properties={
|
||||
"endpoint": "GET /v1/datasets",
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
|
|
@ -149,7 +147,6 @@ def get_datasets_router() -> APIRouter:
|
|||
user.id,
|
||||
additional_properties={
|
||||
"endpoint": "POST /v1/datasets",
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
|
|
@ -204,18 +201,17 @@ def get_datasets_router() -> APIRouter:
|
|||
additional_properties={
|
||||
"endpoint": f"DELETE /v1/datasets/{str(dataset_id)}",
|
||||
"dataset_id": str(dataset_id),
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
from cognee.modules.data.methods import delete_dataset
|
||||
from cognee.modules.data.methods import get_dataset, delete_dataset
|
||||
|
||||
dataset = await get_authorized_existing_datasets([dataset_id], "delete", user)
|
||||
dataset = await get_dataset(user.id, dataset_id)
|
||||
|
||||
if dataset is None:
|
||||
raise DatasetNotFoundError(message=f"Dataset ({str(dataset_id)}) not found.")
|
||||
|
||||
await delete_dataset(dataset[0])
|
||||
await delete_dataset(dataset)
|
||||
|
||||
@router.delete(
|
||||
"/{dataset_id}/data/{data_id}",
|
||||
|
|
@ -250,7 +246,6 @@ def get_datasets_router() -> APIRouter:
|
|||
"endpoint": f"DELETE /v1/datasets/{str(dataset_id)}/data/{str(data_id)}",
|
||||
"dataset_id": str(dataset_id),
|
||||
"data_id": str(data_id),
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
|
|
@ -332,7 +327,6 @@ def get_datasets_router() -> APIRouter:
|
|||
additional_properties={
|
||||
"endpoint": f"GET /v1/datasets/{str(dataset_id)}/data",
|
||||
"dataset_id": str(dataset_id),
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
|
|
@ -393,7 +387,6 @@ def get_datasets_router() -> APIRouter:
|
|||
additional_properties={
|
||||
"endpoint": "GET /v1/datasets/status",
|
||||
"datasets": [str(dataset_id) for dataset_id in datasets],
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
|
|
@ -440,7 +433,6 @@ def get_datasets_router() -> APIRouter:
|
|||
"endpoint": f"GET /v1/datasets/{str(dataset_id)}/data/{str(data_id)}/raw",
|
||||
"dataset_id": str(dataset_id),
|
||||
"data_id": str(data_id),
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
|
|
|
|||
|
|
@ -6,7 +6,6 @@ from cognee.shared.logging_utils import get_logger
|
|||
from cognee.modules.users.models import User
|
||||
from cognee.modules.users.methods import get_authenticated_user
|
||||
from cognee.shared.utils import send_telemetry
|
||||
from cognee import __version__ as cognee_version
|
||||
|
||||
logger = get_logger()
|
||||
|
||||
|
|
@ -40,7 +39,6 @@ def get_delete_router() -> APIRouter:
|
|||
"endpoint": "DELETE /v1/delete",
|
||||
"dataset_id": str(dataset_id),
|
||||
"data_id": str(data_id),
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
|
|
|
|||
|
|
@ -12,7 +12,6 @@ from cognee.modules.users.methods import get_authenticated_user
|
|||
from cognee.shared.utils import send_telemetry
|
||||
from cognee.modules.pipelines.models import PipelineRunErrored
|
||||
from cognee.shared.logging_utils import get_logger
|
||||
from cognee import __version__ as cognee_version
|
||||
|
||||
logger = get_logger()
|
||||
|
||||
|
|
@ -74,7 +73,7 @@ def get_memify_router() -> APIRouter:
|
|||
send_telemetry(
|
||||
"Memify API Endpoint Invoked",
|
||||
user.id,
|
||||
additional_properties={"endpoint": "POST /v1/memify", "cognee_version": cognee_version},
|
||||
additional_properties={"endpoint": "POST /v1/memify"},
|
||||
)
|
||||
|
||||
if not payload.dataset_id and not payload.dataset_name:
|
||||
|
|
|
|||
|
|
@ -1,4 +0,0 @@
|
|||
from .ontologies import OntologyService
|
||||
from .routers.get_ontology_router import get_ontology_router
|
||||
|
||||
__all__ = ["OntologyService", "get_ontology_router"]
|
||||
|
|
@ -1,158 +0,0 @@
|
|||
import os
|
||||
import json
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
from datetime import datetime, timezone
|
||||
from typing import Optional, List
|
||||
from dataclasses import dataclass
|
||||
from fastapi import UploadFile
|
||||
|
||||
|
||||
@dataclass
|
||||
class OntologyMetadata:
|
||||
ontology_key: str
|
||||
filename: str
|
||||
size_bytes: int
|
||||
uploaded_at: str
|
||||
description: Optional[str] = None
|
||||
|
||||
|
||||
class OntologyService:
|
||||
def __init__(self):
|
||||
pass
|
||||
|
||||
@property
|
||||
def base_dir(self) -> Path:
|
||||
return Path(tempfile.gettempdir()) / "ontologies"
|
||||
|
||||
def _get_user_dir(self, user_id: str) -> Path:
|
||||
user_dir = self.base_dir / str(user_id)
|
||||
user_dir.mkdir(parents=True, exist_ok=True)
|
||||
return user_dir
|
||||
|
||||
def _get_metadata_path(self, user_dir: Path) -> Path:
|
||||
return user_dir / "metadata.json"
|
||||
|
||||
def _load_metadata(self, user_dir: Path) -> dict:
|
||||
metadata_path = self._get_metadata_path(user_dir)
|
||||
if metadata_path.exists():
|
||||
with open(metadata_path, "r") as f:
|
||||
return json.load(f)
|
||||
return {}
|
||||
|
||||
def _save_metadata(self, user_dir: Path, metadata: dict):
|
||||
metadata_path = self._get_metadata_path(user_dir)
|
||||
with open(metadata_path, "w") as f:
|
||||
json.dump(metadata, f, indent=2)
|
||||
|
||||
async def upload_ontology(
|
||||
self, ontology_key: str, file: UploadFile, user, description: Optional[str] = None
|
||||
) -> OntologyMetadata:
|
||||
if not file.filename:
|
||||
raise ValueError("File must have a filename")
|
||||
if not file.filename.lower().endswith(".owl"):
|
||||
raise ValueError("File must be in .owl format")
|
||||
|
||||
user_dir = self._get_user_dir(str(user.id))
|
||||
metadata = self._load_metadata(user_dir)
|
||||
|
||||
if ontology_key in metadata:
|
||||
raise ValueError(f"Ontology key '{ontology_key}' already exists")
|
||||
|
||||
content = await file.read()
|
||||
|
||||
file_path = user_dir / f"{ontology_key}.owl"
|
||||
with open(file_path, "wb") as f:
|
||||
f.write(content)
|
||||
|
||||
ontology_metadata = {
|
||||
"filename": file.filename,
|
||||
"size_bytes": len(content),
|
||||
"uploaded_at": datetime.now(timezone.utc).isoformat(),
|
||||
"description": description,
|
||||
}
|
||||
metadata[ontology_key] = ontology_metadata
|
||||
self._save_metadata(user_dir, metadata)
|
||||
|
||||
return OntologyMetadata(
|
||||
ontology_key=ontology_key,
|
||||
filename=file.filename,
|
||||
size_bytes=len(content),
|
||||
uploaded_at=ontology_metadata["uploaded_at"],
|
||||
description=description,
|
||||
)
|
||||
|
||||
async def upload_ontologies(
|
||||
self,
|
||||
ontology_key: List[str],
|
||||
files: List[UploadFile],
|
||||
user,
|
||||
descriptions: Optional[List[str]] = None,
|
||||
) -> List[OntologyMetadata]:
|
||||
"""
|
||||
Upload ontology files with their respective keys.
|
||||
|
||||
Args:
|
||||
ontology_key: List of unique keys for each ontology
|
||||
files: List of UploadFile objects (same length as keys)
|
||||
user: Authenticated user
|
||||
descriptions: Optional list of descriptions for each file
|
||||
|
||||
Returns:
|
||||
List of OntologyMetadata objects for uploaded files
|
||||
|
||||
Raises:
|
||||
ValueError: If keys duplicate, file format invalid, or array lengths don't match
|
||||
"""
|
||||
if len(ontology_key) != len(files):
|
||||
raise ValueError("Number of keys must match number of files")
|
||||
|
||||
if len(set(ontology_key)) != len(ontology_key):
|
||||
raise ValueError("Duplicate ontology keys not allowed")
|
||||
|
||||
results = []
|
||||
|
||||
for i, (key, file) in enumerate(zip(ontology_key, files)):
|
||||
results.append(
|
||||
await self.upload_ontology(
|
||||
ontology_key=key,
|
||||
file=file,
|
||||
user=user,
|
||||
description=descriptions[i] if descriptions else None,
|
||||
)
|
||||
)
|
||||
return results
|
||||
|
||||
def get_ontology_contents(self, ontology_key: List[str], user) -> List[str]:
|
||||
"""
|
||||
Retrieve ontology content for one or more keys.
|
||||
|
||||
Args:
|
||||
ontology_key: List of ontology keys to retrieve (can contain single item)
|
||||
user: Authenticated user
|
||||
|
||||
Returns:
|
||||
List of ontology content strings
|
||||
|
||||
Raises:
|
||||
ValueError: If any ontology key not found
|
||||
"""
|
||||
user_dir = self._get_user_dir(str(user.id))
|
||||
metadata = self._load_metadata(user_dir)
|
||||
|
||||
contents = []
|
||||
for key in ontology_key:
|
||||
if key not in metadata:
|
||||
raise ValueError(f"Ontology key '{key}' not found")
|
||||
|
||||
file_path = user_dir / f"{key}.owl"
|
||||
if not file_path.exists():
|
||||
raise ValueError(f"Ontology file for key '{key}' not found")
|
||||
|
||||
with open(file_path, "r", encoding="utf-8") as f:
|
||||
contents.append(f.read())
|
||||
return contents
|
||||
|
||||
def list_ontologies(self, user) -> dict:
|
||||
user_dir = self._get_user_dir(str(user.id))
|
||||
return self._load_metadata(user_dir)
|
||||
|
|
@ -1,109 +0,0 @@
|
|||
from fastapi import APIRouter, File, Form, UploadFile, Depends, Request
|
||||
from fastapi.responses import JSONResponse
|
||||
from typing import Optional, List
|
||||
|
||||
from cognee.modules.users.models import User
|
||||
from cognee.modules.users.methods import get_authenticated_user
|
||||
from cognee.shared.utils import send_telemetry
|
||||
from cognee import __version__ as cognee_version
|
||||
from ..ontologies import OntologyService
|
||||
|
||||
|
||||
def get_ontology_router() -> APIRouter:
|
||||
router = APIRouter()
|
||||
ontology_service = OntologyService()
|
||||
|
||||
@router.post("", response_model=dict)
|
||||
async def upload_ontology(
|
||||
request: Request,
|
||||
ontology_key: str = Form(...),
|
||||
ontology_file: UploadFile = File(...),
|
||||
description: Optional[str] = Form(None),
|
||||
user: User = Depends(get_authenticated_user),
|
||||
):
|
||||
"""
|
||||
Upload a single ontology file for later use in cognify operations.
|
||||
|
||||
## Request Parameters
|
||||
- **ontology_key** (str): User-defined identifier for the ontology.
|
||||
- **ontology_file** (UploadFile): Single OWL format ontology file
|
||||
- **description** (Optional[str]): Optional description for the ontology.
|
||||
|
||||
## Response
|
||||
Returns metadata about the uploaded ontology including key, filename, size, and upload timestamp.
|
||||
|
||||
## Error Codes
|
||||
- **400 Bad Request**: Invalid file format, duplicate key, multiple files uploaded
|
||||
- **500 Internal Server Error**: File system or processing errors
|
||||
"""
|
||||
send_telemetry(
|
||||
"Ontology Upload API Endpoint Invoked",
|
||||
user.id,
|
||||
additional_properties={
|
||||
"endpoint": "POST /api/v1/ontologies",
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
try:
|
||||
# Enforce: exactly one uploaded file for "ontology_file"
|
||||
form = await request.form()
|
||||
uploaded_files = form.getlist("ontology_file")
|
||||
if len(uploaded_files) != 1:
|
||||
raise ValueError("Only one ontology_file is allowed")
|
||||
|
||||
if ontology_key.strip().startswith(("[", "{")):
|
||||
raise ValueError("ontology_key must be a string")
|
||||
if description is not None and description.strip().startswith(("[", "{")):
|
||||
raise ValueError("description must be a string")
|
||||
|
||||
result = await ontology_service.upload_ontology(
|
||||
ontology_key=ontology_key,
|
||||
file=ontology_file,
|
||||
user=user,
|
||||
description=description,
|
||||
)
|
||||
|
||||
return {
|
||||
"uploaded_ontologies": [
|
||||
{
|
||||
"ontology_key": result.ontology_key,
|
||||
"filename": result.filename,
|
||||
"size_bytes": result.size_bytes,
|
||||
"uploaded_at": result.uploaded_at,
|
||||
"description": result.description,
|
||||
}
|
||||
]
|
||||
}
|
||||
except ValueError as e:
|
||||
return JSONResponse(status_code=400, content={"error": str(e)})
|
||||
except Exception as e:
|
||||
return JSONResponse(status_code=500, content={"error": str(e)})
|
||||
|
||||
@router.get("", response_model=dict)
|
||||
async def list_ontologies(user: User = Depends(get_authenticated_user)):
|
||||
"""
|
||||
List all uploaded ontologies for the authenticated user.
|
||||
|
||||
## Response
|
||||
Returns a dictionary mapping ontology keys to their metadata including filename, size, and upload timestamp.
|
||||
|
||||
## Error Codes
|
||||
- **500 Internal Server Error**: File system or processing errors
|
||||
"""
|
||||
send_telemetry(
|
||||
"Ontology List API Endpoint Invoked",
|
||||
user.id,
|
||||
additional_properties={
|
||||
"endpoint": "GET /api/v1/ontologies",
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
try:
|
||||
metadata = ontology_service.list_ontologies(user)
|
||||
return metadata
|
||||
except Exception as e:
|
||||
return JSONResponse(status_code=500, content={"error": str(e)})
|
||||
|
||||
return router
|
||||
|
|
@ -1,18 +1,12 @@
|
|||
from uuid import UUID
|
||||
from typing import List, Union
|
||||
from typing import List
|
||||
|
||||
from fastapi import APIRouter, Depends
|
||||
from fastapi.responses import JSONResponse
|
||||
|
||||
from cognee.modules.users.models import User
|
||||
from cognee.api.DTO import InDTO
|
||||
from cognee.modules.users.methods import get_authenticated_user
|
||||
from cognee.shared.utils import send_telemetry
|
||||
from cognee import __version__ as cognee_version
|
||||
|
||||
|
||||
class SelectTenantDTO(InDTO):
|
||||
tenant_id: UUID | None = None
|
||||
|
||||
|
||||
def get_permissions_router() -> APIRouter:
|
||||
|
|
@ -54,7 +48,6 @@ def get_permissions_router() -> APIRouter:
|
|||
"endpoint": f"POST /v1/permissions/datasets/{str(principal_id)}",
|
||||
"dataset_ids": str(dataset_ids),
|
||||
"principal_id": str(principal_id),
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
|
|
@ -96,7 +89,6 @@ def get_permissions_router() -> APIRouter:
|
|||
additional_properties={
|
||||
"endpoint": "POST /v1/permissions/roles",
|
||||
"role_name": role_name,
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
|
|
@ -141,7 +133,6 @@ def get_permissions_router() -> APIRouter:
|
|||
"endpoint": f"POST /v1/permissions/users/{str(user_id)}/roles",
|
||||
"user_id": str(user_id),
|
||||
"role_id": str(role_id),
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
|
|
@ -184,7 +175,6 @@ def get_permissions_router() -> APIRouter:
|
|||
"endpoint": f"POST /v1/permissions/users/{str(user_id)}/tenants",
|
||||
"user_id": str(user_id),
|
||||
"tenant_id": str(tenant_id),
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
|
|
@ -219,7 +209,6 @@ def get_permissions_router() -> APIRouter:
|
|||
additional_properties={
|
||||
"endpoint": "POST /v1/permissions/tenants",
|
||||
"tenant_name": tenant_name,
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
|
|
@ -231,39 +220,4 @@ def get_permissions_router() -> APIRouter:
|
|||
status_code=200, content={"message": "Tenant created.", "tenant_id": str(tenant_id)}
|
||||
)
|
||||
|
||||
@permissions_router.post("/tenants/select")
|
||||
async def select_tenant(payload: SelectTenantDTO, user: User = Depends(get_authenticated_user)):
|
||||
"""
|
||||
Select current tenant.
|
||||
|
||||
This endpoint selects a tenant with the specified UUID. Tenants are used
|
||||
to organize users and resources in multi-tenant environments, providing
|
||||
isolation and access control between different groups or organizations.
|
||||
|
||||
Sending a null/None value as tenant_id selects his default single user tenant
|
||||
|
||||
## Request Parameters
|
||||
- **tenant_id** (Union[UUID, None]): UUID of the tenant to select, If null/None is provided use the default single user tenant
|
||||
|
||||
## Response
|
||||
Returns a success message along with selected tenant id.
|
||||
"""
|
||||
send_telemetry(
|
||||
"Permissions API Endpoint Invoked",
|
||||
user.id,
|
||||
additional_properties={
|
||||
"endpoint": f"POST /v1/permissions/tenants/{str(payload.tenant_id)}",
|
||||
"tenant_id": str(payload.tenant_id),
|
||||
},
|
||||
)
|
||||
|
||||
from cognee.modules.users.tenants.methods import select_tenant as select_tenant_method
|
||||
|
||||
await select_tenant_method(user_id=user.id, tenant_id=payload.tenant_id)
|
||||
|
||||
return JSONResponse(
|
||||
status_code=200,
|
||||
content={"message": "Tenant selected.", "tenant_id": str(payload.tenant_id)},
|
||||
)
|
||||
|
||||
return permissions_router
|
||||
|
|
|
|||
|
|
@ -13,7 +13,6 @@ from cognee.modules.users.models import User
|
|||
from cognee.modules.search.operations import get_history
|
||||
from cognee.modules.users.methods import get_authenticated_user
|
||||
from cognee.shared.utils import send_telemetry
|
||||
from cognee import __version__ as cognee_version
|
||||
|
||||
|
||||
# Note: Datasets sent by name will only map to datasets owned by the request sender
|
||||
|
|
@ -62,7 +61,9 @@ def get_search_router() -> APIRouter:
|
|||
send_telemetry(
|
||||
"Search API Endpoint Invoked",
|
||||
user.id,
|
||||
additional_properties={"endpoint": "GET /v1/search", "cognee_version": cognee_version},
|
||||
additional_properties={
|
||||
"endpoint": "GET /v1/search",
|
||||
},
|
||||
)
|
||||
|
||||
try:
|
||||
|
|
@ -117,7 +118,6 @@ def get_search_router() -> APIRouter:
|
|||
"top_k": payload.top_k,
|
||||
"only_context": payload.only_context,
|
||||
"use_combined_context": payload.use_combined_context,
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
|
|
|
|||
|
|
@ -1,7 +1,6 @@
|
|||
from uuid import UUID
|
||||
from typing import Union, Optional, List, Type
|
||||
|
||||
from cognee.infrastructure.databases.graph import get_graph_engine
|
||||
from cognee.modules.engine.models.node_set import NodeSet
|
||||
from cognee.modules.users.models import User
|
||||
from cognee.modules.search.types import SearchResult, SearchType, CombinedSearchResult
|
||||
|
|
@ -9,10 +8,6 @@ from cognee.modules.users.methods import get_default_user
|
|||
from cognee.modules.search.methods import search as search_function
|
||||
from cognee.modules.data.methods import get_authorized_existing_datasets
|
||||
from cognee.modules.data.exceptions import DatasetNotFoundError
|
||||
from cognee.context_global_variables import set_session_user_context_variable
|
||||
from cognee.shared.logging_utils import get_logger
|
||||
|
||||
logger = get_logger()
|
||||
|
||||
|
||||
async def search(
|
||||
|
|
@ -30,10 +25,6 @@ async def search(
|
|||
last_k: Optional[int] = 1,
|
||||
only_context: bool = False,
|
||||
use_combined_context: bool = False,
|
||||
session_id: Optional[str] = None,
|
||||
wide_search_top_k: Optional[int] = 100,
|
||||
triplet_distance_penalty: Optional[float] = 3.5,
|
||||
verbose: bool = False,
|
||||
) -> Union[List[SearchResult], CombinedSearchResult]:
|
||||
"""
|
||||
Search and query the knowledge graph for insights, information, and connections.
|
||||
|
|
@ -122,10 +113,6 @@ async def search(
|
|||
|
||||
save_interaction: Save interaction (query, context, answer connected to triplet endpoints) results into the graph or not
|
||||
|
||||
session_id: Optional session identifier for caching Q&A interactions. Defaults to 'default_session' if None.
|
||||
|
||||
verbose: If True, returns detailed result information including graph representation (when possible).
|
||||
|
||||
Returns:
|
||||
list: Search results in format determined by query_type:
|
||||
|
||||
|
|
@ -181,8 +168,6 @@ async def search(
|
|||
if user is None:
|
||||
user = await get_default_user()
|
||||
|
||||
await set_session_user_context_variable(user)
|
||||
|
||||
# Transform string based datasets to UUID - String based datasets can only be found for current user
|
||||
if datasets is not None and [all(isinstance(dataset, str) for dataset in datasets)]:
|
||||
datasets = await get_authorized_existing_datasets(datasets, "read", user)
|
||||
|
|
@ -204,10 +189,6 @@ async def search(
|
|||
last_k=last_k,
|
||||
only_context=only_context,
|
||||
use_combined_context=use_combined_context,
|
||||
session_id=session_id,
|
||||
wide_search_top_k=wide_search_top_k,
|
||||
triplet_distance_penalty=triplet_distance_penalty,
|
||||
verbose=verbose,
|
||||
)
|
||||
|
||||
return filtered_search_results
|
||||
|
|
|
|||
|
|
@ -12,7 +12,6 @@ from cognee.modules.sync.methods import get_running_sync_operations_for_user, ge
|
|||
from cognee.shared.utils import send_telemetry
|
||||
from cognee.shared.logging_utils import get_logger
|
||||
from cognee.api.v1.sync import SyncResponse
|
||||
from cognee import __version__ as cognee_version
|
||||
from cognee.context_global_variables import set_database_global_context_variables
|
||||
|
||||
logger = get_logger()
|
||||
|
|
@ -100,7 +99,6 @@ def get_sync_router() -> APIRouter:
|
|||
user.id,
|
||||
additional_properties={
|
||||
"endpoint": "POST /v1/sync",
|
||||
"cognee_version": cognee_version,
|
||||
"dataset_ids": [str(id) for id in request.dataset_ids]
|
||||
if request.dataset_ids
|
||||
else "*",
|
||||
|
|
@ -207,7 +205,6 @@ def get_sync_router() -> APIRouter:
|
|||
user.id,
|
||||
additional_properties={
|
||||
"endpoint": "GET /v1/sync/status",
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
|
|
|
|||
|
|
@ -1,360 +0,0 @@
|
|||
import os
|
||||
import platform
|
||||
import subprocess
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
|
||||
import requests
|
||||
|
||||
from cognee.shared.logging_utils import get_logger
|
||||
|
||||
logger = get_logger()
|
||||
|
||||
|
||||
def get_nvm_dir() -> Path:
|
||||
"""
|
||||
Get the nvm directory path following standard nvm installation logic.
|
||||
Uses XDG_CONFIG_HOME if set, otherwise falls back to ~/.nvm.
|
||||
"""
|
||||
xdg_config_home = os.environ.get("XDG_CONFIG_HOME")
|
||||
if xdg_config_home:
|
||||
return Path(xdg_config_home) / "nvm"
|
||||
return Path.home() / ".nvm"
|
||||
|
||||
|
||||
def get_nvm_sh_path() -> Path:
|
||||
"""
|
||||
Get the path to nvm.sh following standard nvm installation logic.
|
||||
"""
|
||||
return get_nvm_dir() / "nvm.sh"
|
||||
|
||||
|
||||
def check_nvm_installed() -> bool:
|
||||
"""
|
||||
Check if nvm (Node Version Manager) is installed.
|
||||
"""
|
||||
try:
|
||||
# Check if nvm is available in the shell
|
||||
# nvm is typically sourced in shell config files, so we need to check via shell
|
||||
if platform.system() == "Windows":
|
||||
# On Windows, nvm-windows uses a different approach
|
||||
result = subprocess.run(
|
||||
["nvm", "version"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=10,
|
||||
shell=True,
|
||||
)
|
||||
else:
|
||||
# On Unix-like systems, nvm is a shell function, so we need to source it
|
||||
# First check if nvm.sh exists
|
||||
nvm_path = get_nvm_sh_path()
|
||||
if not nvm_path.exists():
|
||||
logger.debug(f"nvm.sh not found at {nvm_path}")
|
||||
return False
|
||||
|
||||
# Try to source nvm and check version, capturing errors
|
||||
result = subprocess.run(
|
||||
["bash", "-c", f"source {nvm_path} && nvm --version"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=10,
|
||||
)
|
||||
|
||||
if result.returncode != 0:
|
||||
# Log the error to help diagnose configuration issues
|
||||
if result.stderr:
|
||||
logger.debug(f"nvm check failed: {result.stderr.strip()}")
|
||||
return False
|
||||
|
||||
return result.returncode == 0
|
||||
except Exception as e:
|
||||
logger.debug(f"Exception checking nvm: {str(e)}")
|
||||
return False
|
||||
|
||||
|
||||
def install_nvm() -> bool:
|
||||
"""
|
||||
Install nvm (Node Version Manager) on Unix-like systems.
|
||||
"""
|
||||
if platform.system() == "Windows":
|
||||
logger.error("nvm installation on Windows requires nvm-windows.")
|
||||
logger.error(
|
||||
"Please install nvm-windows manually from: https://github.com/coreybutler/nvm-windows"
|
||||
)
|
||||
return False
|
||||
|
||||
logger.info("Installing nvm (Node Version Manager)...")
|
||||
|
||||
try:
|
||||
# Download and install nvm
|
||||
nvm_install_script = "https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh"
|
||||
logger.info(f"Downloading nvm installer from {nvm_install_script}...")
|
||||
|
||||
response = requests.get(nvm_install_script, timeout=60)
|
||||
response.raise_for_status()
|
||||
|
||||
# Create a temporary script file
|
||||
with tempfile.NamedTemporaryFile(mode="w", suffix=".sh", delete=False) as f:
|
||||
f.write(response.text)
|
||||
install_script_path = f.name
|
||||
|
||||
try:
|
||||
# Make the script executable and run it
|
||||
os.chmod(install_script_path, 0o755)
|
||||
result = subprocess.run(
|
||||
["bash", install_script_path],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=120,
|
||||
)
|
||||
|
||||
if result.returncode == 0:
|
||||
logger.info("✓ nvm installed successfully")
|
||||
# Source nvm in current shell session
|
||||
nvm_dir = get_nvm_dir()
|
||||
if nvm_dir.exists():
|
||||
return True
|
||||
else:
|
||||
logger.warning(
|
||||
f"nvm installation completed but nvm directory not found at {nvm_dir}"
|
||||
)
|
||||
return False
|
||||
else:
|
||||
logger.error(f"nvm installation failed: {result.stderr}")
|
||||
return False
|
||||
finally:
|
||||
# Clean up temporary script
|
||||
try:
|
||||
os.unlink(install_script_path)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
except requests.exceptions.RequestException as e:
|
||||
logger.error(f"Failed to download nvm installer: {str(e)}")
|
||||
return False
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to install nvm: {str(e)}")
|
||||
return False
|
||||
|
||||
|
||||
def install_node_with_nvm() -> bool:
|
||||
"""
|
||||
Install the latest Node.js version using nvm.
|
||||
Returns True if installation succeeds, False otherwise.
|
||||
"""
|
||||
if platform.system() == "Windows":
|
||||
logger.error("Node.js installation via nvm on Windows requires nvm-windows.")
|
||||
logger.error("Please install Node.js manually from: https://nodejs.org/")
|
||||
return False
|
||||
|
||||
logger.info("Installing latest Node.js version using nvm...")
|
||||
|
||||
try:
|
||||
# Source nvm and install latest Node.js
|
||||
nvm_path = get_nvm_sh_path()
|
||||
if not nvm_path.exists():
|
||||
logger.error(f"nvm.sh not found at {nvm_path}. nvm may not be properly installed.")
|
||||
return False
|
||||
|
||||
nvm_source_cmd = f"source {nvm_path}"
|
||||
install_cmd = f"{nvm_source_cmd} && nvm install node"
|
||||
|
||||
result = subprocess.run(
|
||||
["bash", "-c", install_cmd],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=300, # 5 minutes timeout for Node.js installation
|
||||
)
|
||||
|
||||
if result.returncode == 0:
|
||||
logger.info("✓ Node.js installed successfully via nvm")
|
||||
|
||||
# Set as default version
|
||||
use_cmd = f"{nvm_source_cmd} && nvm alias default node"
|
||||
subprocess.run(
|
||||
["bash", "-c", use_cmd],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=30,
|
||||
)
|
||||
|
||||
# Add nvm to PATH for current session
|
||||
# This ensures node/npm are available in subsequent commands
|
||||
nvm_dir = get_nvm_dir()
|
||||
if nvm_dir.exists():
|
||||
# Update PATH for current process
|
||||
nvm_bin = nvm_dir / "versions" / "node"
|
||||
# Find the latest installed version
|
||||
if nvm_bin.exists():
|
||||
versions = sorted(nvm_bin.iterdir(), reverse=True)
|
||||
if versions:
|
||||
latest_node_bin = versions[0] / "bin"
|
||||
if latest_node_bin.exists():
|
||||
current_path = os.environ.get("PATH", "")
|
||||
os.environ["PATH"] = f"{latest_node_bin}:{current_path}"
|
||||
|
||||
return True
|
||||
else:
|
||||
logger.error(f"Failed to install Node.js: {result.stderr}")
|
||||
return False
|
||||
|
||||
except subprocess.TimeoutExpired:
|
||||
logger.error("Timeout installing Node.js (this can take several minutes)")
|
||||
return False
|
||||
except Exception as e:
|
||||
logger.error(f"Error installing Node.js: {str(e)}")
|
||||
return False
|
||||
|
||||
|
||||
def check_node_npm() -> tuple[bool, str]: # (is_available, error_message)
|
||||
"""
|
||||
Check if Node.js and npm are available.
|
||||
If not available, attempts to install nvm and Node.js automatically.
|
||||
"""
|
||||
|
||||
try:
|
||||
# Check Node.js - try direct command first, then with nvm if needed
|
||||
result = subprocess.run(["node", "--version"], capture_output=True, text=True, timeout=10)
|
||||
if result.returncode != 0:
|
||||
# If direct command fails, try with nvm sourced (in case nvm is installed but not in PATH)
|
||||
nvm_path = get_nvm_sh_path()
|
||||
if nvm_path.exists():
|
||||
result = subprocess.run(
|
||||
["bash", "-c", f"source {nvm_path} && node --version"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=10,
|
||||
)
|
||||
if result.returncode != 0 and result.stderr:
|
||||
logger.debug(f"Failed to source nvm or run node: {result.stderr.strip()}")
|
||||
if result.returncode != 0:
|
||||
# Node.js is not installed, try to install it
|
||||
logger.info("Node.js is not installed. Attempting to install automatically...")
|
||||
|
||||
# Check if nvm is installed
|
||||
if not check_nvm_installed():
|
||||
logger.info("nvm is not installed. Installing nvm first...")
|
||||
if not install_nvm():
|
||||
return (
|
||||
False,
|
||||
"Failed to install nvm. Please install Node.js manually from https://nodejs.org/",
|
||||
)
|
||||
|
||||
# Install Node.js using nvm
|
||||
if not install_node_with_nvm():
|
||||
return (
|
||||
False,
|
||||
"Failed to install Node.js. Please install Node.js manually from https://nodejs.org/",
|
||||
)
|
||||
|
||||
# Verify installation after automatic setup
|
||||
# Try with nvm sourced first
|
||||
nvm_path = get_nvm_sh_path()
|
||||
if nvm_path.exists():
|
||||
result = subprocess.run(
|
||||
["bash", "-c", f"source {nvm_path} && node --version"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=10,
|
||||
)
|
||||
if result.returncode != 0 and result.stderr:
|
||||
logger.debug(
|
||||
f"Failed to verify node after installation: {result.stderr.strip()}"
|
||||
)
|
||||
else:
|
||||
result = subprocess.run(
|
||||
["node", "--version"], capture_output=True, text=True, timeout=10
|
||||
)
|
||||
if result.returncode != 0:
|
||||
nvm_path = get_nvm_sh_path()
|
||||
return (
|
||||
False,
|
||||
f"Node.js installation completed but node command is not available. Please restart your terminal or source {nvm_path}",
|
||||
)
|
||||
|
||||
node_version = result.stdout.strip()
|
||||
logger.debug(f"Found Node.js version: {node_version}")
|
||||
|
||||
# Check npm - handle Windows PowerShell scripts
|
||||
if platform.system() == "Windows":
|
||||
# On Windows, npm might be a PowerShell script, so we need to use shell=True
|
||||
result = subprocess.run(
|
||||
["npm", "--version"], capture_output=True, text=True, timeout=10, shell=True
|
||||
)
|
||||
else:
|
||||
# On Unix-like systems, if we just installed via nvm, we may need to source nvm
|
||||
# Try direct command first
|
||||
result = subprocess.run(
|
||||
["npm", "--version"], capture_output=True, text=True, timeout=10
|
||||
)
|
||||
if result.returncode != 0:
|
||||
# Try with nvm sourced
|
||||
nvm_path = get_nvm_sh_path()
|
||||
if nvm_path.exists():
|
||||
result = subprocess.run(
|
||||
["bash", "-c", f"source {nvm_path} && npm --version"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=10,
|
||||
)
|
||||
if result.returncode != 0 and result.stderr:
|
||||
logger.debug(f"Failed to source nvm or run npm: {result.stderr.strip()}")
|
||||
|
||||
if result.returncode != 0:
|
||||
return False, "npm is not installed or not in PATH"
|
||||
|
||||
npm_version = result.stdout.strip()
|
||||
logger.debug(f"Found npm version: {npm_version}")
|
||||
|
||||
return True, f"Node.js {node_version}, npm {npm_version}"
|
||||
|
||||
except subprocess.TimeoutExpired:
|
||||
return False, "Timeout checking Node.js/npm installation"
|
||||
except FileNotFoundError:
|
||||
# Node.js is not installed, try to install it
|
||||
logger.info("Node.js is not found. Attempting to install automatically...")
|
||||
|
||||
# Check if nvm is installed
|
||||
if not check_nvm_installed():
|
||||
logger.info("nvm is not installed. Installing nvm first...")
|
||||
if not install_nvm():
|
||||
return (
|
||||
False,
|
||||
"Failed to install nvm. Please install Node.js manually from https://nodejs.org/",
|
||||
)
|
||||
|
||||
# Install Node.js using nvm
|
||||
if not install_node_with_nvm():
|
||||
return (
|
||||
False,
|
||||
"Failed to install Node.js. Please install Node.js manually from https://nodejs.org/",
|
||||
)
|
||||
|
||||
# Retry checking Node.js after installation
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["node", "--version"], capture_output=True, text=True, timeout=10
|
||||
)
|
||||
if result.returncode == 0:
|
||||
node_version = result.stdout.strip()
|
||||
# Check npm
|
||||
nvm_path = get_nvm_sh_path()
|
||||
if nvm_path.exists():
|
||||
result = subprocess.run(
|
||||
["bash", "-c", f"source {nvm_path} && npm --version"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=10,
|
||||
)
|
||||
if result.returncode == 0:
|
||||
npm_version = result.stdout.strip()
|
||||
return True, f"Node.js {node_version}, npm {npm_version}"
|
||||
elif result.stderr:
|
||||
logger.debug(f"Failed to source nvm or run npm: {result.stderr.strip()}")
|
||||
except Exception as e:
|
||||
logger.debug(f"Exception retrying node/npm check: {str(e)}")
|
||||
|
||||
return False, "Node.js/npm not found. Please install Node.js from https://nodejs.org/"
|
||||
except Exception as e:
|
||||
return False, f"Error checking Node.js/npm: {str(e)}"
|
||||
|
|
@ -1,50 +0,0 @@
|
|||
import platform
|
||||
import subprocess
|
||||
from pathlib import Path
|
||||
from typing import List
|
||||
|
||||
from cognee.shared.logging_utils import get_logger
|
||||
from .node_setup import get_nvm_sh_path
|
||||
|
||||
logger = get_logger()
|
||||
|
||||
|
||||
def run_npm_command(cmd: List[str], cwd: Path, timeout: int = 300) -> subprocess.CompletedProcess:
|
||||
"""
|
||||
Run an npm command, ensuring nvm is sourced if needed (Unix-like systems only).
|
||||
Returns the CompletedProcess result.
|
||||
"""
|
||||
if platform.system() == "Windows":
|
||||
# On Windows, use shell=True for npm commands
|
||||
return subprocess.run(
|
||||
cmd,
|
||||
cwd=cwd,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=timeout,
|
||||
shell=True,
|
||||
)
|
||||
else:
|
||||
# On Unix-like systems, try direct command first
|
||||
result = subprocess.run(
|
||||
cmd,
|
||||
cwd=cwd,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=timeout,
|
||||
)
|
||||
# If it fails and nvm might be installed, try with nvm sourced
|
||||
if result.returncode != 0:
|
||||
nvm_path = get_nvm_sh_path()
|
||||
if nvm_path.exists():
|
||||
nvm_cmd = f"source {nvm_path} && {' '.join(cmd)}"
|
||||
result = subprocess.run(
|
||||
["bash", "-c", nvm_cmd],
|
||||
cwd=cwd,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=timeout,
|
||||
)
|
||||
if result.returncode != 0 and result.stderr:
|
||||
logger.debug(f"npm command failed with nvm: {result.stderr.strip()}")
|
||||
return result
|
||||
|
|
@ -15,8 +15,6 @@ import shutil
|
|||
|
||||
from cognee.shared.logging_utils import get_logger
|
||||
from cognee.version import get_cognee_version
|
||||
from .node_setup import check_node_npm, get_nvm_dir, get_nvm_sh_path
|
||||
from .npm_utils import run_npm_command
|
||||
|
||||
logger = get_logger()
|
||||
|
||||
|
|
@ -287,6 +285,48 @@ def find_frontend_path() -> Optional[Path]:
|
|||
return None
|
||||
|
||||
|
||||
def check_node_npm() -> tuple[bool, str]:
|
||||
"""
|
||||
Check if Node.js and npm are available.
|
||||
Returns (is_available, error_message)
|
||||
"""
|
||||
|
||||
try:
|
||||
# Check Node.js
|
||||
result = subprocess.run(["node", "--version"], capture_output=True, text=True, timeout=10)
|
||||
if result.returncode != 0:
|
||||
return False, "Node.js is not installed or not in PATH"
|
||||
|
||||
node_version = result.stdout.strip()
|
||||
logger.debug(f"Found Node.js version: {node_version}")
|
||||
|
||||
# Check npm - handle Windows PowerShell scripts
|
||||
if platform.system() == "Windows":
|
||||
# On Windows, npm might be a PowerShell script, so we need to use shell=True
|
||||
result = subprocess.run(
|
||||
["npm", "--version"], capture_output=True, text=True, timeout=10, shell=True
|
||||
)
|
||||
else:
|
||||
result = subprocess.run(
|
||||
["npm", "--version"], capture_output=True, text=True, timeout=10
|
||||
)
|
||||
|
||||
if result.returncode != 0:
|
||||
return False, "npm is not installed or not in PATH"
|
||||
|
||||
npm_version = result.stdout.strip()
|
||||
logger.debug(f"Found npm version: {npm_version}")
|
||||
|
||||
return True, f"Node.js {node_version}, npm {npm_version}"
|
||||
|
||||
except subprocess.TimeoutExpired:
|
||||
return False, "Timeout checking Node.js/npm installation"
|
||||
except FileNotFoundError:
|
||||
return False, "Node.js/npm not found. Please install Node.js from https://nodejs.org/"
|
||||
except Exception as e:
|
||||
return False, f"Error checking Node.js/npm: {str(e)}"
|
||||
|
||||
|
||||
def install_frontend_dependencies(frontend_path: Path) -> bool:
|
||||
"""
|
||||
Install frontend dependencies if node_modules doesn't exist.
|
||||
|
|
@ -301,7 +341,24 @@ def install_frontend_dependencies(frontend_path: Path) -> bool:
|
|||
logger.info("Installing frontend dependencies (this may take a few minutes)...")
|
||||
|
||||
try:
|
||||
result = run_npm_command(["npm", "install"], frontend_path, timeout=300)
|
||||
# Use shell=True on Windows for npm commands
|
||||
if platform.system() == "Windows":
|
||||
result = subprocess.run(
|
||||
["npm", "install"],
|
||||
cwd=frontend_path,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=300, # 5 minutes timeout
|
||||
shell=True,
|
||||
)
|
||||
else:
|
||||
result = subprocess.run(
|
||||
["npm", "install"],
|
||||
cwd=frontend_path,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=300, # 5 minutes timeout
|
||||
)
|
||||
|
||||
if result.returncode == 0:
|
||||
logger.info("Frontend dependencies installed successfully")
|
||||
|
|
@ -446,7 +503,7 @@ def start_ui(
|
|||
if start_mcp:
|
||||
logger.info("Starting Cognee MCP server with Docker...")
|
||||
try:
|
||||
image = "cognee/cognee-mcp:main"
|
||||
image = "cognee/cognee-mcp:feature-standalone-mcp" # TODO: change to "cognee/cognee-mcp:main" right before merging into main
|
||||
subprocess.run(["docker", "pull", image], check=True)
|
||||
|
||||
import uuid
|
||||
|
|
@ -481,7 +538,9 @@ def start_ui(
|
|||
env_file = os.path.join(cwd, ".env")
|
||||
docker_cmd.extend(["--env-file", env_file])
|
||||
|
||||
docker_cmd.append(image)
|
||||
docker_cmd.append(
|
||||
image
|
||||
) # TODO: change to "cognee/cognee-mcp:main" right before merging into main
|
||||
|
||||
mcp_process = subprocess.Popen(
|
||||
docker_cmd,
|
||||
|
|
@ -585,21 +644,6 @@ def start_ui(
|
|||
env["HOST"] = "localhost"
|
||||
env["PORT"] = str(port)
|
||||
|
||||
# If nvm is installed, ensure it's available in the environment
|
||||
nvm_path = get_nvm_sh_path()
|
||||
if platform.system() != "Windows" and nvm_path.exists():
|
||||
# Add nvm to PATH for the subprocess
|
||||
nvm_dir = get_nvm_dir()
|
||||
# Find the latest Node.js version installed via nvm
|
||||
nvm_versions = nvm_dir / "versions" / "node"
|
||||
if nvm_versions.exists():
|
||||
versions = sorted(nvm_versions.iterdir(), reverse=True)
|
||||
if versions:
|
||||
latest_node_bin = versions[0] / "bin"
|
||||
if latest_node_bin.exists():
|
||||
current_path = env.get("PATH", "")
|
||||
env["PATH"] = f"{latest_node_bin}:{current_path}"
|
||||
|
||||
# Start the development server
|
||||
logger.info(f"Starting frontend server at http://localhost:{port}")
|
||||
logger.info("This may take a moment to compile and start...")
|
||||
|
|
@ -617,26 +661,14 @@ def start_ui(
|
|||
shell=True,
|
||||
)
|
||||
else:
|
||||
# On Unix-like systems, use bash with nvm sourced if available
|
||||
if nvm_path.exists():
|
||||
# Use bash to source nvm and run npm
|
||||
process = subprocess.Popen(
|
||||
["bash", "-c", f"source {nvm_path} && npm run dev"],
|
||||
cwd=frontend_path,
|
||||
env=env,
|
||||
stdout=subprocess.PIPE,
|
||||
stderr=subprocess.PIPE,
|
||||
preexec_fn=os.setsid if hasattr(os, "setsid") else None,
|
||||
)
|
||||
else:
|
||||
process = subprocess.Popen(
|
||||
["npm", "run", "dev"],
|
||||
cwd=frontend_path,
|
||||
env=env,
|
||||
stdout=subprocess.PIPE,
|
||||
stderr=subprocess.PIPE,
|
||||
preexec_fn=os.setsid if hasattr(os, "setsid") else None,
|
||||
)
|
||||
process = subprocess.Popen(
|
||||
["npm", "run", "dev"],
|
||||
cwd=frontend_path,
|
||||
env=env,
|
||||
stdout=subprocess.PIPE,
|
||||
stderr=subprocess.PIPE,
|
||||
preexec_fn=os.setsid if hasattr(os, "setsid") else None,
|
||||
)
|
||||
|
||||
# Start threads to stream frontend output with prefix
|
||||
_stream_process_output(process, "stdout", "[FRONTEND]", "\033[33m") # Yellow
|
||||
|
|
|
|||
|
|
@ -9,7 +9,6 @@ from cognee.shared.logging_utils import get_logger
|
|||
from cognee.modules.users.models import User
|
||||
from cognee.modules.users.methods import get_authenticated_user
|
||||
from cognee.shared.utils import send_telemetry
|
||||
from cognee import __version__ as cognee_version
|
||||
from cognee.modules.pipelines.models.PipelineRunInfo import (
|
||||
PipelineRunErrored,
|
||||
)
|
||||
|
|
@ -65,7 +64,6 @@ def get_update_router() -> APIRouter:
|
|||
"dataset_id": str(dataset_id),
|
||||
"data_id": str(data_id),
|
||||
"node_set": str(node_set),
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
|
|
|
|||
|
|
@ -1,5 +1,5 @@
|
|||
from uuid import UUID
|
||||
from typing import Union, BinaryIO, List, Optional, Any
|
||||
from typing import Union, BinaryIO, List, Optional
|
||||
|
||||
from cognee.modules.users.models import User
|
||||
from cognee.api.v1.delete import delete
|
||||
|
|
@ -15,7 +15,7 @@ async def update(
|
|||
node_set: Optional[List[str]] = None,
|
||||
vector_db_config: dict = None,
|
||||
graph_db_config: dict = None,
|
||||
preferred_loaders: dict[str, dict[str, Any]] = None,
|
||||
preferred_loaders: List[str] = None,
|
||||
incremental_loading: bool = True,
|
||||
):
|
||||
"""
|
||||
|
|
|
|||
|
|
@ -8,7 +8,6 @@ from cognee.modules.users.models import User
|
|||
|
||||
from cognee.context_global_variables import set_database_global_context_variables
|
||||
from cognee.shared.utils import send_telemetry
|
||||
from cognee import __version__ as cognee_version
|
||||
|
||||
logger = get_logger()
|
||||
|
||||
|
|
@ -47,7 +46,6 @@ def get_visualize_router() -> APIRouter:
|
|||
additional_properties={
|
||||
"endpoint": "GET /v1/visualize",
|
||||
"dataset_id": str(dataset_id),
|
||||
"cognee_version": cognee_version,
|
||||
},
|
||||
)
|
||||
|
||||
|
|
|
|||
|
|
@ -1,5 +1,4 @@
|
|||
import os
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
from functools import lru_cache
|
||||
from cognee.root_dir import get_absolute_path, ensure_absolute_path
|
||||
|
|
@ -12,9 +11,6 @@ class BaseConfig(BaseSettings):
|
|||
data_root_directory: str = get_absolute_path(".data_storage")
|
||||
system_root_directory: str = get_absolute_path(".cognee_system")
|
||||
cache_root_directory: str = get_absolute_path(".cognee_cache")
|
||||
logs_root_directory: str = os.getenv(
|
||||
"COGNEE_LOGS_DIR", str(os.path.join(os.path.dirname(os.path.dirname(__file__)), "logs"))
|
||||
)
|
||||
monitoring_tool: object = Observer.NONE
|
||||
|
||||
@pydantic.model_validator(mode="after")
|
||||
|
|
@ -34,8 +30,6 @@ class BaseConfig(BaseSettings):
|
|||
# Require absolute paths for root directories
|
||||
self.data_root_directory = ensure_absolute_path(self.data_root_directory)
|
||||
self.system_root_directory = ensure_absolute_path(self.system_root_directory)
|
||||
self.logs_root_directory = ensure_absolute_path(self.logs_root_directory)
|
||||
|
||||
# Set monitoring tool based on available keys
|
||||
if self.langfuse_public_key and self.langfuse_secret_key:
|
||||
self.monitoring_tool = Observer.LANGFUSE
|
||||
|
|
@ -55,7 +49,6 @@ class BaseConfig(BaseSettings):
|
|||
"system_root_directory": self.system_root_directory,
|
||||
"monitoring_tool": self.monitoring_tool,
|
||||
"cache_root_directory": self.cache_root_directory,
|
||||
"logs_root_directory": self.logs_root_directory,
|
||||
}
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@ -22,7 +22,7 @@ relationships, and creates semantic connections for enhanced search and reasonin
|
|||
|
||||
Processing Pipeline:
|
||||
1. **Document Classification**: Identifies document types and structures
|
||||
2. **Permission Validation**: Ensures user has processing rights
|
||||
2. **Permission Validation**: Ensures user has processing rights
|
||||
3. **Text Chunking**: Breaks content into semantically meaningful segments
|
||||
4. **Entity Extraction**: Identifies key concepts, people, places, organizations
|
||||
5. **Relationship Detection**: Discovers connections between entities
|
||||
|
|
@ -97,13 +97,6 @@ After successful cognify processing, use `cognee search` to query the knowledge
|
|||
chunker_class = LangchainChunker
|
||||
except ImportError:
|
||||
fmt.warning("LangchainChunker not available, using TextChunker")
|
||||
elif args.chunker == "CsvChunker":
|
||||
try:
|
||||
from cognee.modules.chunking.CsvChunker import CsvChunker
|
||||
|
||||
chunker_class = CsvChunker
|
||||
except ImportError:
|
||||
fmt.warning("CsvChunker not available, using TextChunker")
|
||||
|
||||
result = await cognee.cognify(
|
||||
datasets=datasets,
|
||||
|
|
|
|||
|
|
@ -26,7 +26,7 @@ SEARCH_TYPE_CHOICES = [
|
|||
]
|
||||
|
||||
# Chunker choices
|
||||
CHUNKER_CHOICES = ["TextChunker", "LangchainChunker", "CsvChunker"]
|
||||
CHUNKER_CHOICES = ["TextChunker", "LangchainChunker"]
|
||||
|
||||
# Output format choices
|
||||
OUTPUT_FORMAT_CHOICES = ["json", "pretty", "simple"]
|
||||
|
|
|
|||
|
|
@ -4,10 +4,7 @@ from typing import Union
|
|||
from uuid import UUID
|
||||
|
||||
from cognee.base_config import get_base_config
|
||||
from cognee.infrastructure.databases.vector.config import get_vectordb_config
|
||||
from cognee.infrastructure.databases.graph.config import get_graph_config
|
||||
from cognee.infrastructure.databases.utils import get_or_create_dataset_database
|
||||
from cognee.infrastructure.databases.utils import resolve_dataset_database_connection_info
|
||||
from cognee.infrastructure.files.storage.config import file_storage_config
|
||||
from cognee.modules.users.methods import get_user
|
||||
|
||||
|
|
@ -15,72 +12,8 @@ from cognee.modules.users.methods import get_user
|
|||
# for different async tasks, threads and processes
|
||||
vector_db_config = ContextVar("vector_db_config", default=None)
|
||||
graph_db_config = ContextVar("graph_db_config", default=None)
|
||||
session_user = ContextVar("session_user", default=None)
|
||||
|
||||
|
||||
async def set_session_user_context_variable(user):
|
||||
session_user.set(user)
|
||||
|
||||
|
||||
def multi_user_support_possible():
|
||||
graph_db_config = get_graph_config()
|
||||
vector_db_config = get_vectordb_config()
|
||||
|
||||
graph_handler = graph_db_config.graph_dataset_database_handler
|
||||
vector_handler = vector_db_config.vector_dataset_database_handler
|
||||
from cognee.infrastructure.databases.dataset_database_handler import (
|
||||
supported_dataset_database_handlers,
|
||||
)
|
||||
|
||||
if graph_handler not in supported_dataset_database_handlers:
|
||||
raise EnvironmentError(
|
||||
"Unsupported graph dataset to database handler configured. Cannot add support for multi-user access control mode. Please use a supported graph dataset to database handler or set the environment variables ENABLE_BACKEND_ACCESS_CONTROL to false to switch off multi-user access control mode.\n"
|
||||
f"Selected graph dataset to database handler: {graph_handler}\n"
|
||||
f"Supported dataset to database handlers: {list(supported_dataset_database_handlers.keys())}\n"
|
||||
)
|
||||
|
||||
if vector_handler not in supported_dataset_database_handlers:
|
||||
raise EnvironmentError(
|
||||
"Unsupported vector dataset to database handler configured. Cannot add support for multi-user access control mode. Please use a supported vector dataset to database handler or set the environment variables ENABLE_BACKEND_ACCESS_CONTROL to false to switch off multi-user access control mode.\n"
|
||||
f"Selected vector dataset to database handler: {vector_handler}\n"
|
||||
f"Supported dataset to database handlers: {list(supported_dataset_database_handlers.keys())}\n"
|
||||
)
|
||||
|
||||
if (
|
||||
supported_dataset_database_handlers[graph_handler]["handler_provider"]
|
||||
!= graph_db_config.graph_database_provider
|
||||
):
|
||||
raise EnvironmentError(
|
||||
"The selected graph dataset to database handler does not work with the configured graph database provider. Cannot add support for multi-user access control mode. Please use a supported graph dataset to database handler or set the environment variables ENABLE_BACKEND_ACCESS_CONTROL to false to switch off multi-user access control mode.\n"
|
||||
f"Selected graph database provider: {graph_db_config.graph_database_provider}\n"
|
||||
f"Selected graph dataset to database handler: {graph_handler}\n"
|
||||
f"Supported dataset to database handlers: {list(supported_dataset_database_handlers.keys())}\n"
|
||||
)
|
||||
|
||||
if (
|
||||
supported_dataset_database_handlers[vector_handler]["handler_provider"]
|
||||
!= vector_db_config.vector_db_provider
|
||||
):
|
||||
raise EnvironmentError(
|
||||
"The selected vector dataset to database handler does not work with the configured vector database provider. Cannot add support for multi-user access control mode. Please use a supported vector dataset to database handler or set the environment variables ENABLE_BACKEND_ACCESS_CONTROL to false to switch off multi-user access control mode.\n"
|
||||
f"Selected vector database provider: {vector_db_config.vector_db_provider}\n"
|
||||
f"Selected vector dataset to database handler: {vector_handler}\n"
|
||||
f"Supported dataset to database handlers: {list(supported_dataset_database_handlers.keys())}\n"
|
||||
)
|
||||
|
||||
return True
|
||||
|
||||
|
||||
def backend_access_control_enabled():
|
||||
backend_access_control = os.environ.get("ENABLE_BACKEND_ACCESS_CONTROL", None)
|
||||
if backend_access_control is None:
|
||||
# If backend access control is not defined in environment variables,
|
||||
# enable it by default if graph and vector DBs can support it, otherwise disable it
|
||||
return multi_user_support_possible()
|
||||
elif backend_access_control.lower() == "true":
|
||||
# If enabled, ensure that the current graph and vector DBs can support it
|
||||
return multi_user_support_possible()
|
||||
return False
|
||||
soup_crawler_config = ContextVar("soup_crawler_config", default=None)
|
||||
tavily_config = ContextVar("tavily_config", default=None)
|
||||
|
||||
|
||||
async def set_database_global_context_variables(dataset: Union[str, UUID], user_id: UUID):
|
||||
|
|
@ -102,17 +35,16 @@ async def set_database_global_context_variables(dataset: Union[str, UUID], user_
|
|||
|
||||
"""
|
||||
|
||||
if not backend_access_control_enabled():
|
||||
base_config = get_base_config()
|
||||
|
||||
if not os.getenv("ENABLE_BACKEND_ACCESS_CONTROL", "false").lower() == "true":
|
||||
return
|
||||
|
||||
user = await get_user(user_id)
|
||||
|
||||
# To ensure permissions are enforced properly all datasets will have their own databases
|
||||
dataset_database = await get_or_create_dataset_database(dataset, user)
|
||||
# Ensure that all connection info is resolved properly
|
||||
dataset_database = await resolve_dataset_database_connection_info(dataset_database)
|
||||
|
||||
base_config = get_base_config()
|
||||
data_root_directory = os.path.join(
|
||||
base_config.data_root_directory, str(user.tenant_id or user.id)
|
||||
)
|
||||
|
|
@ -121,31 +53,19 @@ async def set_database_global_context_variables(dataset: Union[str, UUID], user_
|
|||
)
|
||||
|
||||
# Set vector and graph database configuration based on dataset database information
|
||||
# TODO: Add better handling of vector and graph config accross Cognee.
|
||||
# LRU_CACHE takes into account order of inputs, if order of inputs is changed it will be registered as a new DB adapter
|
||||
vector_config = {
|
||||
"vector_db_provider": dataset_database.vector_database_provider,
|
||||
"vector_db_url": dataset_database.vector_database_url,
|
||||
"vector_db_key": dataset_database.vector_database_key,
|
||||
"vector_db_name": dataset_database.vector_database_name,
|
||||
"vector_db_url": os.path.join(
|
||||
databases_directory_path, dataset_database.vector_database_name
|
||||
),
|
||||
"vector_db_key": "",
|
||||
"vector_db_provider": "lancedb",
|
||||
}
|
||||
|
||||
graph_config = {
|
||||
"graph_database_provider": dataset_database.graph_database_provider,
|
||||
"graph_database_url": dataset_database.graph_database_url,
|
||||
"graph_database_name": dataset_database.graph_database_name,
|
||||
"graph_database_key": dataset_database.graph_database_key,
|
||||
"graph_database_provider": "kuzu",
|
||||
"graph_file_path": os.path.join(
|
||||
databases_directory_path, dataset_database.graph_database_name
|
||||
),
|
||||
"graph_database_username": dataset_database.graph_database_connection_info.get(
|
||||
"graph_database_username", ""
|
||||
),
|
||||
"graph_database_password": dataset_database.graph_database_connection_info.get(
|
||||
"graph_database_password", ""
|
||||
),
|
||||
"graph_dataset_database_handler": "",
|
||||
"graph_database_port": "",
|
||||
}
|
||||
|
||||
storage_config = {
|
||||
|
|
|
|||
|
|
@ -1,29 +0,0 @@
|
|||
FROM python:3.11-slim
|
||||
|
||||
# Set environment variables
|
||||
ENV PIP_NO_CACHE_DIR=true
|
||||
ENV PATH="${PATH}:/root/.poetry/bin"
|
||||
ENV PYTHONPATH=/app
|
||||
ENV SKIP_MIGRATIONS=true
|
||||
|
||||
# System dependencies
|
||||
RUN apt-get update && apt-get install -y \
|
||||
gcc \
|
||||
libpq-dev \
|
||||
git \
|
||||
curl \
|
||||
build-essential \
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
|
||||
WORKDIR /app
|
||||
|
||||
COPY pyproject.toml poetry.lock README.md /app/
|
||||
|
||||
RUN pip install poetry
|
||||
|
||||
RUN poetry config virtualenvs.create false
|
||||
|
||||
RUN poetry install --extras distributed --extras evals --extras deepeval --no-root
|
||||
|
||||
COPY cognee/ /app/cognee
|
||||
COPY distributed/ /app/distributed
|
||||
|
|
@ -35,16 +35,6 @@ class AnswerGeneratorExecutor:
|
|||
retrieval_context = await retriever.get_context(query_text)
|
||||
search_results = await retriever.get_completion(query_text, retrieval_context)
|
||||
|
||||
############
|
||||
#:TODO This is a quick fix until we don't structure retriever results properly but lets not leave it like this...this is needed now due to the changed combined retriever structure..
|
||||
if isinstance(retrieval_context, list):
|
||||
retrieval_context = await retriever.convert_retrieved_objects_to_context(
|
||||
triplets=retrieval_context
|
||||
)
|
||||
|
||||
if isinstance(search_results, str):
|
||||
search_results = [search_results]
|
||||
#############
|
||||
answer = {
|
||||
"question": query_text,
|
||||
"answer": search_results[0],
|
||||
|
|
|
|||
|
|
@ -35,7 +35,7 @@ async def create_and_insert_answers_table(questions_payload):
|
|||
|
||||
|
||||
async def run_question_answering(
|
||||
params: dict, system_prompt="answer_simple_question_benchmark.txt", top_k: Optional[int] = None
|
||||
params: dict, system_prompt="answer_simple_question.txt", top_k: Optional[int] = None
|
||||
) -> List[dict]:
|
||||
if params.get("answering_questions"):
|
||||
logger.info("Question answering started...")
|
||||
|
|
|
|||
|
|
@ -8,6 +8,7 @@ from cognee.modules.users.models import User
|
|||
from cognee.shared.data_models import KnowledgeGraph
|
||||
from cognee.shared.utils import send_telemetry
|
||||
from cognee.tasks.documents import (
|
||||
check_permissions_on_dataset,
|
||||
classify_documents,
|
||||
extract_chunks_from_documents,
|
||||
)
|
||||
|
|
@ -30,6 +31,7 @@ async def get_cascade_graph_tasks(
|
|||
cognee_config = get_cognify_config()
|
||||
default_tasks = [
|
||||
Task(classify_documents),
|
||||
Task(check_permissions_on_dataset, user=user, permissions=["write"]),
|
||||
Task(
|
||||
extract_chunks_from_documents, max_chunk_tokens=get_max_chunk_tokens()
|
||||
), # Extract text chunks based on the document type.
|
||||
|
|
|
|||
Some files were not shown because too many files have changed in this diff Show more
Loading…
Add table
Reference in a new issue