Commit graph

534 commits

Author SHA1 Message Date
hajdul88
885f7c3f99 chore: fixing graph elements tests 2025-08-13 14:58:56 +02:00
hajdul88
38329da0e8
Merge branch 'dev' into feature/cog-2717-add-better-error-management-to-cognee 2025-08-13 14:11:56 +02:00
Igor Ilic
beea2f5e0a
Incremental loading migration (#1238)

## Description
Add a relational database migration for incremental loading, and change
incremental loading to process documents one at a time instead of all
together asynchronously

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-08-13 07:58:09 -04:00
hajdul88
6dcd59c73c feat: Adds changes to cognee graph part 2025-08-13 13:07:04 +02:00
hajdul88
9fb9f68c42 adds new base errors to retrieval exceptions 2025-08-13 12:36:31 +02:00
hajdul88
5bc00f1143 feat: adds new search classes to search.py 2025-08-13 12:29:35 +02:00
hajdul88
6870bba5a9 feat: adds new error to delete 2025-08-13 12:03:18 +02:00
hajdul88
7bd2660d08 feat: setting base classes of data exceptions to the new ones 2025-08-13 11:58:32 +02:00
EricXiao
815d639132
fix: graph visualization access for users with read permissions (#1220)

## Description
This PR fixes graph visualization access for users with read permissions
(https://github.com/topoteretes/cognee/issues/1182)

- Add permission checks for graph visualization endpoints to ensure
users can only access datasets they have permission to view
- Create get_dataset_with_permissions method to validate user access
before returning a dataset
- Remove redundant dataset existence validation in datasets router and
delegate permission checking to graph data retrieval
- Add comprehensive test suite for graph visualization permissions
covering owner access and permission granting scenarios
- Update get_formatted_graph_data() to use dataset owner's ID for
context
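
As a rough illustration of the permission check described above (not the code in this PR), the sketch below uses an in-memory store. `Dataset`, `PermissionDeniedError`, and `get_dataset_if_readable` are hypothetical names chosen for this example; the actual `get_dataset_with_permissions` method may look quite different.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


class PermissionDeniedError(Exception):
    """Hypothetical error raised when a user may not read a dataset."""


@dataclass
class Dataset:
    # Hypothetical stand-in for the real dataset model.
    id: UUID
    owner_id: UUID
    readers: set[UUID] = field(default_factory=set)


def get_dataset_if_readable(
    datasets: dict[UUID, Dataset], user_id: UUID, dataset_id: UUID
) -> Dataset:
    """Return the dataset only if the user owns it or has read permission."""
    dataset = datasets.get(dataset_id)
    # A missing dataset and a forbidden one raise the same error, so callers
    # cannot probe for dataset IDs they are not allowed to see.
    if dataset is None or (
        user_id != dataset.owner_id and user_id not in dataset.readers
    ):
        raise PermissionDeniedError(f"No read access to dataset {dataset_id}")
    return dataset
```
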
## Testing
Tests can be run with:
```bash
pytest -s cognee/tests/test_graph_visualization_permissions.py
```

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Signed-off-by: EricXiao <taoiaox@gmail.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-08-08 20:42:57 +02:00
lxobr
6dbd8e85a1
feat: dynamic multiple edges in datapoints (#1212)

## Description
- Improved list handling, removed `.index` logic from
`get_graph_from_model`, transitioned to fully datapoint-oriented
processing
- Streamlined datapoint iteration by introducing `_datapoints_generator`
with nested loops
- Generalized field processing to handle mixed lists: `[DataPoint,
(Edge, DataPoint), (Edge, [DataPoint])]`, allowing dynamic multiple
edges generation
- Small improvements and refactorings
- Added tests to `test_get_graph_from_model_flexible_edges()` covering
weighted edges and dynamic multiple edges
- Created `dynamic_multiple_edges_example.py` demonstrating dynamic
multiple edges
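
To make the mixed-list shape above concrete, here is a minimal sketch of how such a list could be flattened into `(edge, target)` pairs. The `DataPoint` and `Edge` classes below are simplified stand-ins and `iter_edge_targets` is a hypothetical helper, not the `_datapoints_generator` from this PR.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class DataPoint:
    # Simplified stand-in for the real DataPoint model.
    name: str


@dataclass
class Edge:
    # Simplified stand-in; the field names are assumptions for this example.
    relationship_type: str
    weight: Optional[float] = None


def iter_edge_targets(items: list) -> Iterator[tuple[Optional[Edge], DataPoint]]:
    """Flatten [DataPoint, (Edge, DataPoint), (Edge, [DataPoint])] into pairs."""
    for item in items:
        if isinstance(item, DataPoint):
            # Bare data point: no explicit edge metadata.
            yield None, item
        elif isinstance(item, tuple) and len(item) == 2 and isinstance(item[0], Edge):
            edge, target = item
            # One edge may point at a single data point or at several.
            targets = target if isinstance(target, list) else [target]
            for point in targets:
                yield edge, point
        else:
            raise TypeError(f"Unsupported field item: {item!r}")
```

With one shared `Edge` and a list of targets, the helper yields one pair per target, which is the "dynamic multiple edges" case.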

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-08-07 14:50:45 +02:00
Vasilije
dabd0912f8
feat: Cog 2082 add BAML to cognee (#1054)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Signed-off-by: Raj2604 <rajmandhare26@gmail.com>
Co-authored-by: Daulet Amirkhanov <damirkhanov01@gmail.com>
Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
Co-authored-by: Matea Pesic <80577904+matea16@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
Co-authored-by: Raj Mandhare <96978537+Raj2604@users.noreply.github.com>
Co-authored-by: Pedro Thompson <thompsonp17@hotmail.com>
Co-authored-by: Pedro Henrique Thompson Furtado <pedrothompson@petrobras.com.br>
2025-08-06 10:41:47 +02:00
Igor Ilic
8d4ed35cbe
Fix low level pipeline (#1203)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-08-05 17:01:48 +02:00
EricXiao
fc7a91d991
feature: implement FEELING_LUCKY search type (#1178)

## Description
This PR implements the 'FEELING_LUCKY' search type, which intelligently
routes user queries to the most appropriate search retriever, addressing
[#1162](https://github.com/topoteretes/cognee/issues/1162).

- Implement the new search type FEELING_LUCKY
- Add the select_search_type function to analyze queries and choose the
proper search type
- Integrate with an LLM for intelligent search type determination
- Add logging for the search type selection process
- Support fallback to RAG_COMPLETION when the LLM selection fails
- Add tests for the new search type

## How it works
When a user selects the 'FEELING_LUCKY' search type, the system first
sends their natural language query to an LLM-based classifier. This
classifier analyzes the query's intent (e.g., is it asking for a
relationship, a summary, or a factual answer?) and selects the optimal
SearchType, such as 'INSIGHTS' or 'GRAPH_COMPLETION'. The main search
function then proceeds using this dynamically selected type. If the
classification process fails, it gracefully falls back to the default
'RAG_COMPLETION' type.
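
The routing and fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this PR: `choose_search_type` and its `classify` callback are hypothetical, and the enum lists only the search types named in this description.

```python
from enum import Enum
from typing import Callable


class SearchType(Enum):
    # Only the members mentioned in this description; the real enum may differ.
    FEELING_LUCKY = "FEELING_LUCKY"
    INSIGHTS = "INSIGHTS"
    GRAPH_COMPLETION = "GRAPH_COMPLETION"
    RAG_COMPLETION = "RAG_COMPLETION"


def choose_search_type(query: str, classify: Callable[[str], str]) -> SearchType:
    """Map a query to a concrete search type, falling back to RAG_COMPLETION.

    `classify` stands in for the LLM call and returns a search type name.
    """
    try:
        chosen = SearchType(classify(query).strip().upper())
    except Exception:
        # Classifier error or an unknown label: use the default type.
        return SearchType.RAG_COMPLETION
    if chosen is SearchType.FEELING_LUCKY:
        # Never route back to the router itself.
        return SearchType.RAG_COMPLETION
    return chosen
```

Passing the classifier in as a callable keeps the sketch testable without a real LLM.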

## Testing
Tests can be run with:
```bash
python -m pytest cognee/tests/unit/modules/search/search_methods_test.py -k "feeling_lucky" -v
```

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

Signed-off-by: EricXiao <taoiaox@gmail.com>
2025-08-02 16:30:08 +02:00
Igor Ilic
14ba3e8829
feat: Enable async execution of data items for incremental loading (#1092)

## Description
Attempt at making incremental loading run async

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-29 10:39:31 -04:00
hajdul88
f78af0cec3
feature: solve edge embedding duplicates in edge collection + retriever optimization (#1151)

## Description
feature: solve edge embedding duplicates in edge collection + retriever
optimization

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-07-29 12:35:38 +02:00
hajdul88
9157d3c2dd
feature: cover current context structure with unit test and add time logging to vector collection retrievals (#1144)

## Description
Cover current context structure with unit test so it is not changed
accidentally in the future

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-25 13:04:43 +02:00
Igor Ilic
dbdf04c089
Data model migration (#1143)

## Description
Data model migration for new release

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-24 15:03:16 +02:00
Vasilije
1885ab9e88
chore: Cog 2354 add logging (#1115)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-07-24 13:27:27 +02:00
Boris
d6727a1b4a
fix: UnstructuredDocument read method (#1141)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-24 13:23:27 +02:00
Vasilije
daa4e9acc4
fix: Remove weaviate (#1139)
<!-- .github/pull_request_template.md -->

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-23 19:34:35 +02:00
hajdul88
2b1c17404c
Feature: optimizes query embedding and edge collection search (#1126)

## Description
Optimizes query embedding by reducing the number of query embedding
calls and avoids multiple edge collection searches when they are
available.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-23 11:47:22 +02:00
Igor Ilic
59594e01ac
fix: add missing await for getting default user (#1131)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-23 06:47:38 +02:00
Igor Ilic
022c96de55
refactor: simplify endpoint default values (#1123)

## Description
Simplify Cognee endpoints so default dataset ID will be None

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-22 09:09:44 -04:00
Pedro Thompson
115585ee9c
enhancement: Optimizing embedding calls in brute_force_search (#1101)
@Vasilije1990

- Use query_vector instead of query_text in brute_force_search

## Description

[Here](ef1aecd835/cognee/modules/retrieval/utils/brute_force_triplet_search.py (L163))
brute_force_search uses the vector engine to perform the same search,
with the same query text, across multiple collections, so the number of
embedding calls grows needlessly with the number of collections being
searched.

Since the
[search](ef1aecd835/cognee/infrastructure/databases/vector/vector_db_interface.py (L85))
interface is already designed to accept precomputed query vectors, I’m
submitting an optimization to brute_force_search to take advantage of
this.

If this is considered good practice, it might be worth implementing a
direct query_vector argument in
[map_vector_distances_to_graph_edges](ef1aecd835/cognee/modules/graph/cognee_graph/CogneeGraph.py (L135)),
and using it both
[here](ef1aecd835/cognee/modules/retrieval/utils/brute_force_triplet_search.py (L179))
and in any future uses of map_vector_distances_to_graph_edges.
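
The idea above, embedding the query once and reusing the vector for every collection, can be sketched as below. This is a simplified illustration, not the code in this PR: `search_collections`, `embed`, and `search` are hypothetical names, and the real vector engine interface may take different arguments.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

Vector = list[float]


async def search_collections(
    query_text: str,
    collections: Sequence[str],
    embed: Callable[[str], Awaitable[Vector]],
    search: Callable[[str, Vector], Awaitable[list]],
) -> dict[str, list]:
    """Embed the query once, then search every collection with that vector."""
    # One embedding call, regardless of how many collections are searched.
    query_vector = await embed(query_text)
    # Each search receives the precomputed vector instead of the raw text.
    results = await asyncio.gather(
        *(search(name, query_vector) for name in collections)
    )
    return dict(zip(collections, results))
```

With N collections this makes one embedding call instead of N, while the searches themselves still run concurrently.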

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Pedro Henrique Thompson Furtado <pedrothompson@petrobras.com.br>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Daulet Amirkhanov <damirkhanov01@gmail.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-07-22 13:50:25 +02:00
hajdul88
dad7da2e7b
fix: Fixes missing entity to entity edges (#1118)

## Description
Fixes missing entity to entity edges

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-22 11:48:56 +02:00
Boris
c5bd6bed40
fix: s3 file storage (#1095)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-16 20:36:18 +02:00
Boris
46c4463cb2
feat: s3 storage (#988)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: vasilije <vas.markovic@gmail.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-07-14 21:47:08 +02:00
Vasilije
4bcb893a54
feat: Weighted edges (#1068)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
2025-07-14 21:26:25 +02:00
Igor Ilic
f68fd59b95
feat: Data size info tracking (#1088)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-14 19:03:58 +02:00
vasilije
02f7f4bf78 formatting 2025-07-13 20:39:55 +02:00
vasilije
bd892652ad add info 2025-07-13 18:22:46 +02:00
Boris Arzentar
9c5f1a2686
fix: break circular data points in a graph model 2025-07-09 00:33:23 +02:00
Boris Arzentar
66427e725c
fix: remove obsolete files and fix unit tests 2025-07-08 22:47:09 +02:00
Boris Arzentar
340a61b20a
Merge remote-tracking branch 'origin/dev' into feat/modal-parallelization 2025-07-08 22:13:22 +02:00
Igor Ilic
e51de46163
feat: Add test for permissions, change Cognee search return value (#1058)

## Description
Add tests for permissions for Cognee

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-08 13:33:03 +02:00
Boris Arzentar
a4a2742c52
fix: add retries 2025-07-08 10:26:06 +02:00
Boris Arzentar
fa5ea44345
Merge remote-tracking branch 'origin/dev' into feat/modal-parallelization 2025-07-06 21:03:10 +02:00
Boris Arzentar
685d282f5c
fix: add error handling 2025-07-06 21:03:02 +02:00
hajdul88
3c3c89a140
fix: Adds graceful handling quick fix for damaged pdf files (#1047)

## Description
Adds a quick fix that gracefully handles damaged PDF files

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-06 13:09:42 +02:00
Boris Arzentar
f8f1bb3576
fix: add queue for data points saving 2025-07-04 18:26:22 +02:00
Boris Arzentar
4eba76ca1f
Merge remote-tracking branch 'origin/dev' into feat/modal-parallelization 2025-07-04 15:37:57 +02:00
Boris Arzentar
00dd3b8d97
fix: run cognee distributed with modal 2025-07-04 15:28:05 +02:00
Vasilije
ada3f7b086
fix: Logger suppression and database logs (#1041)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
2025-07-03 20:08:27 +02:00
Boris Arzentar
86bd3e4a5a
Merge remote-tracking branch 'origin/dev' into feat/modal-parallelization 2025-07-02 11:28:22 +02:00
Igor Ilic
58aeb03688 fix: resolve issue with write permission on datasets not owned by current user 2025-06-30 19:15:18 +02:00
Igor Ilic
0e02f75636 fix: Resolve pipeline id issue 2025-06-30 16:00:42 +02:00
Boris
da14497ddc
fix: authorize in swagger (#1034)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-06-30 15:56:30 +02:00
Boris Arzentar
64edb38c43
fix: add custom openauth schema 2025-06-30 15:09:13 +02:00
Boris Arzentar
72ac4bce43
Merge remote-tracking branch 'origin/dev' into fix/aithorize-in-swagger 2025-06-30 14:34:10 +02:00
Boris Arzentar
ce8203e2d3
fix: update cognee in mcp and use stdio server 2025-06-30 13:42:13 +02:00