Commit graph

114 commits

Author SHA1 Message Date
andikarachman
db0818cd33 feat(translation): implement multilingual content translation task
- Add translation module with OpenAI, Google, Azure provider support
- Implement language detection using langdetect
- Add TranslatedContent and LanguageMetadata models
- Integrate translation task into cognify pipeline
- Add auto_translate parameter to cognify() function
- Preserve original text alongside translations
- Support custom translation providers and target languages
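
A minimal sketch of the detect-then-translate flow this commit describes, where the original text is preserved alongside the translation. `TranslationProvider`, `TranslationResult`, `translate_if_needed`, and `DictProvider` are hypothetical names, not the module's actual API. In practice `detect_language` could wrap `langdetect.detect`, and the toy dictionary provider stands in for an OpenAI, Google, or Azure backend.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class TranslationProvider(Protocol):
    """Hypothetical interface: anything that can translate text between languages."""

    def translate(self, text: str, source_lang: str, target_lang: str) -> str: ...


@dataclass
class TranslationResult:
    """Hypothetical record keeping the original text next to its translation."""

    original_text: str
    translated_text: str
    source_lang: str
    target_lang: str
    was_translated: bool


def translate_if_needed(
    text: str,
    detect_language: Callable[[str], str],
    provider: TranslationProvider,
    target_lang: str = "en",
) -> TranslationResult:
    source_lang = detect_language(text)
    if source_lang == target_lang:
        # Already in the target language: keep the text unchanged.
        return TranslationResult(text, text, source_lang, target_lang, False)
    translated = provider.translate(text, source_lang, target_lang)
    # The original text is stored alongside the translation, never replaced.
    return TranslationResult(text, translated, source_lang, target_lang, True)


class DictProvider:
    """Toy provider for illustration: looks up whole sentences in a dictionary."""

    def __init__(self, table: Dict[str, str]) -> None:
        self.table = table

    def translate(self, text: str, source_lang: str, target_lang: str) -> str:
        return self.table.get(text, text)
```

Because providers are matched structurally through the `Protocol`, a custom provider only needs a `translate` method with the same shape.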
2026-01-01 15:46:53 +07:00
Igor Ilic
14d9540d1b
feat: Add database deletion on dataset delete (#1893)

## Description
- Add support for database deletion when dataset is deleted
- Simplify dataset handler usage in Cognee

## Type of Change
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


## Summary by CodeRabbit

* **Bug Fixes**
  * Improved dataset deletion: stronger authorization checks and reliable removal of associated graph and vector storage.

* **Tests**
  * Added end-to-end test to verify complete dataset deletion and cleanup of all related storage components.

2025-12-15 18:15:48 +01:00
Andrej Milicevic
433170fe09 merge dev 2025-12-15 17:06:20 +01:00
Igor Ilic
ede884e0b0
feat: make pipeline processing cache optional (#1876)

## Description
Make the pipeline cache mechanism optional. It is turned off by
default, but add and cognify keep using it as they have until now.
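
A minimal sketch of the opt-in caching behavior described above, assuming a simple in-memory record of completed runs keyed by pipeline name and dataset. The names `run_with_optional_cache`, `use_pipeline_cache`, and `reset_pipeline_cache` are hypothetical illustrations, not Cognee's actual API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory record of completed (pipeline, dataset) runs.
_completed_runs: Dict[Tuple[str, str], str] = {}


def run_with_optional_cache(
    pipeline_name: str,
    dataset_id: str,
    tasks: List[Callable[[], None]],
    use_pipeline_cache: bool = False,  # caching is off by default
) -> str:
    key = (pipeline_name, dataset_id)
    if use_pipeline_cache and key in _completed_runs:
        # Cache hit: exit early without re-running the tasks.
        return "already_completed"
    # Cache disabled (or no earlier run): always re-process.
    for task in tasks:
        task()
    _completed_runs[key] = "completed"
    return "completed"


def reset_pipeline_cache(pipeline_name: str, dataset_id: str) -> None:
    # Forget the earlier run so the next cached call re-executes.
    _completed_runs.pop((pipeline_name, dataset_id), None)
```

Under this sketch, add and cognify would opt in by passing `use_pipeline_cache=True`, while every other caller re-runs by default.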

## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


## Summary by CodeRabbit

* **New Features**
  * Introduced pipeline caching across ingestion, processing, and custom pipeline flows, with per-run controls to enable or disable caching.
  * Added an option for incremental loading in custom pipeline runs.

* **Behavior Changes**
  * One pipeline path now explicitly bypasses caching by default so that it always re-runs when invoked.
  * Disabling the cache forces re-processing instead of an early exit; resetting the cache still enables re-execution.

* **Tests**
  * Added tests validating caching, non-caching, and cache-reset re-execution behavior.

* **Chores**
  * Added CI job to run pipeline caching tests.

2025-12-12 13:11:31 +01:00
Igor Ilic
59f8d12fa3 Merge branch 'main' into merge-main-vol7 2025-12-11 19:11:24 +01:00
hajdul88
001fbe699e
feat: Adds edge centered payload and embedding structure during ingestion (#1853)

## Description
This pull request introduces edge-centered payloads to the ingestion
process. Payloads are stored in the Triplet_text collection, which is
compatible with the triplet_embedding memify pipeline.

Changes in This PR:

- Refactored custom edge handling: custom edges can now be passed to the
add_data_points method, so ingestion is centralized in one place.
- Added private methods inside add_data_points.py to handle edge-centered
payload creation
- Added unit tests to cover the new functionality
- Added integration tests
- Added e2e tests
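
To make the edge-centered payload idea concrete, here is a minimal sketch that builds one text payload per graph edge from the source node text, the relationship name, and the target node text. The `Edge` and `TripletPayload` types, `build_triplet_payloads`, and the space-joined text template are assumptions for illustration; the actual payload format in add_data_points.py and the embedding call itself are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Edge:
    """Hypothetical graph edge between two node ids."""

    source_id: str
    target_id: str
    relationship_name: str


@dataclass(frozen=True)
class TripletPayload:
    """Hypothetical payload whose text would be embedded into Triplet_text."""

    source_id: str
    target_id: str
    text: str


def build_triplet_payloads(
    node_text: Dict[str, str], edges: List[Edge]
) -> List[TripletPayload]:
    payloads = []
    for edge in edges:
        # One payload per edge: source node text, relationship, target node text.
        text = " ".join(
            (node_text[edge.source_id], edge.relationship_name, node_text[edge.target_id])
        )
        payloads.append(TripletPayload(edge.source_id, edge.target_id, text))
    return payloads
```

Emitting exactly one payload per edge is what lets the acceptance check compare the number of triplets with the number of edges in the graph database.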

Acceptance Criteria and Testing
Scenario 1:
- Set the TRIPLET_EMBEDDING env var to True
- Run prune, add, cognify
- Verify that the vector DB contains a non-empty Triplet_text collection
and that the number of triplets matches the number of edges in the
graph database
- Use the new triplet_completion search type and confirm it works
correctly.

Scenario 2:
- Set the TRIPLET_EMBEDDING env var to False
- Run prune, add, cognify
- Verify that the vector DB does not have the Triplet_text collection
- You should receive an error indicating that the Triplet_text collection
is not available


## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


## Summary by CodeRabbit

* **New Features**
  * Triplet embeddings are supported: embeddings are created from graph edges plus the text of their connected nodes
  * Ability to supply custom edges when adding data points
  * New configuration toggle to enable/disable triplet embedding

* **Tests**
  * Added comprehensive unit and end-to-end tests for edge-centered payloads and triplet embedding
  * New CI job to run the edge-centered payload e2e test

* **Bug Fixes**
  * Adjusted server start behavior to surface process output in parent logs


---------

Co-authored-by: Pavel Zorin <pazonec@yandex.ru>
2025-12-10 17:10:06 +01:00
Andrej Milicevic
aa8afefe8a feat: add kwargs to cognify and related tasks 2025-11-27 17:05:37 +01:00
martin0731
3acb581bd0 Removed check_permissions_on_dataset.py and related references 2025-11-13 08:31:15 -05:00
vasilije
0402619ed7 add merge 2025-10-21 07:24:12 +02:00
Igor Ilic
2e1bfe78b1 refactor: rename variable to be more understandable 2025-10-15 20:26:59 +02:00
Igor Ilic
a210bd5905 refactor: rename chunk_batch_size to chunks_per_batch 2025-10-15 20:24:36 +02:00
Igor Ilic
3a9022a26c refactor: Rename batch size for tasks to chunk batch size 2025-10-15 20:22:29 +02:00
Igor Ilic
2fb06e0729 refactor: forwarding of data batch size rework 2025-10-15 20:18:48 +02:00
Igor Ilic
5663c3fe3a refactor: add batch size param to temporal graphs 2025-10-15 17:38:18 +02:00
Igor Ilic
1b28f13743 refactor: Optimize Cognee speed 2025-10-15 13:32:17 +02:00
Igor Ilic
417015d9a9 Merge branch 'dev' into embedding-rate-limiter 2025-10-14 20:39:10 +02:00
Igor Ilic
84a23756f5 fix: Change chunk_size to batch_size for temporal task 2025-10-14 14:25:38 +02:00
Igor Ilic
eb631a23ad refactor: set default numbers that are more reasonable 2025-10-14 13:57:41 +02:00
Igor Ilic
757d745b5d refactor: Optimize cognification speed 2025-10-10 17:12:09 +02:00
Igor Ilic
abfcbc69d6 refactor: Have embedding calls run in async gather 2025-10-10 15:36:36 +02:00
Daulet Amirkhanov
63a1463073 Deprecate SearchType.INSIGHTS, replace all references to default search type - SearchType.GRAPH_COMPLETION 2025-10-08 12:13:59 +01:00
hajdul88
2f225c9e03 feat: adds ontology resolver env handling 2025-09-19 12:54:33 +02:00
hajdul88
94373e5a01 feat: adds new config structure based on requirements 2025-09-18 17:24:23 +02:00
hajdul88
d2c7980e83 chore: updates mutable default param 2025-09-17 14:14:39 +02:00
hajdul88
e815a3fc14 chore: changes ontology file path parameter to the new config structure 2025-09-17 14:12:47 +02:00
hajdul88
93a383b56a feat: adds matching strategies and moves resolver 2025-09-17 12:23:30 +02:00
hajdul88
f651991c86 feat: adds base class + renames rdflib implementation 2025-09-17 12:02:38 +02:00
hajdul88
1970106f1e chore: adds docstrings 2025-08-29 16:07:18 +02:00
hajdul88
140437acf1 ruff fix 2025-08-27 19:23:29 +02:00
hajdul88
34ff4ad9da fix: circular dep fix 2025-08-27 19:21:49 +02:00
hajdul88
70727332ee ruff format 2025-08-27 18:08:16 +02:00
hajdul88
678173dad4
Merge branch 'dev' into feature/cog-2746-time-graph-to-cognify 2025-08-27 18:07:20 +02:00
hajdul88
58a3be7c12 ruff format 2025-08-27 18:04:58 +02:00
hajdul88
3482f353a9 chore: adds extract kg from events and changes temporal tasks call 2025-08-27 18:02:57 +02:00
hajdul88
2ec22567c3 feat: adds temporal tasks to cognify 2025-08-27 15:18:47 +02:00
Vasilije
62afced9a5
feat: Added custom prompt to cognify (#1278)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-08-27 14:10:21 +02:00
Boris
6e5acec292
refactor: make run_pipeline a high-level api for running pipelines (#1294)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-08-27 09:49:20 +02:00
hajdul88
42d33fcd00
fix: fixes search test behaviour and adds comments to new pipeline executor logic (#1293)

## Description
fix: fixes search test behaviour and adds comments to new pipeline
executor logic

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-08-26 15:52:10 +02:00
hajdul88
d91b0f6aa3
feature: adds pipeline execution layer to cognify (#1291)

## Description
feature: adds pipeline execution layer to cognify

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-08-26 14:07:19 +02:00
vasilije
d69669b527 added ability to send custom prompts to cognify 2025-08-22 12:37:51 +02:00
vasilije
1bd40f1401 renamed max tokens 2025-08-17 12:39:51 +02:00
hajdul88
544e08930b feat: removing invalidValueErrors 2025-08-13 14:42:57 +02:00
Igor Ilic
14ba3e8829
feat: Enable async execution of data items for incremental loading (#1092)

## Description
Attempt at making incremental loading run async

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-29 10:39:31 -04:00
Igor Ilic
01bab3f0c7
Fix cognify endpoint (#1105)

## Description
Have cognify run in background

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-18 16:37:04 +02:00
Igor Ilic
e51de46163
feat: Add test for permissions, change Cognee search return value (#1058)

## Description
Add tests for permissions for Cognee

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-08 13:33:03 +02:00
Vasilije
c936f5e0a3
feat: adding docstrings (#1045)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-03 21:24:47 +02:00
Igor Ilic
14be2a5f5d
feat: Add dataset_id to pipeline run info and status (#1009)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-06-30 11:53:17 +02:00
Boris
773b15a645
feat: websockets for pipeline update streaming (#851)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-06-11 20:29:26 +02:00
Igor Ilic
1ed6cfd918
feat: new Dataset permissions (#869)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
Co-authored-by: Boris <boris@topoteretes.com>
2025-06-06 14:20:57 +02:00
Boris
0f3522eea6
fix: cognee docker image (#820)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-05-15 10:05:27 +02:00