Commit graph

108 commits

Author SHA1 Message Date
Daulet Amirkhanov
cd285d2f56 ruff format 2025-09-03 14:09:33 +01:00
Daulet Amirkhanov
de9bb495bc tests: update tests with suggested changes 2025-09-03 14:05:59 +01:00
Daulet Amirkhanov
f0e8f8cc47 refactor: use patch decorators instead of context managers 2025-09-03 13:58:18 +01:00
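
A minimal sketch of the two `unittest.mock.patch` styles this commit switches between. The patched target (`os.getcwd`) and the function under test are illustrative stand-ins. They are not taken from the cognee test suite.

```python
import os
import unittest
from unittest.mock import patch


def current_directory_name() -> str:
    # Illustrative function under test: it depends on os.getcwd, which the tests replace.
    return os.path.basename(os.getcwd())


class ContextManagerStyle(unittest.TestCase):
    def test_name(self):
        # The patch is active only inside the `with` block.
        with patch("os.getcwd", return_value="/tmp/project"):
            self.assertEqual(current_directory_name(), "project")


class DecoratorStyle(unittest.TestCase):
    # The patch is active for the whole test; the mock is injected as an argument.
    @patch("os.getcwd", return_value="/tmp/project")
    def test_name(self, mock_getcwd):
        self.assertEqual(current_directory_name(), "project")
        mock_getcwd.assert_called_once()
```

The decorator form avoids nesting several `with` blocks when a test patches more than one target.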
Vasilije
ea0edc7056
Merge branch 'dev' into feat/make-authentication-optional 2025-09-02 15:19:18 +02:00
hajdul88
d336511c57 ruff fix 2025-09-01 15:31:30 +02:00
hajdul88
9df440c020 feat: adds time extraction + unit tests for temporal retriever 2025-09-01 15:18:29 +02:00
Daulet Amirkhanov
2a3ec5f762 keep get_authenticated_user and move conditional auth 2025-09-01 13:06:38 +01:00
hajdul88
4e9c0810c2
Merge branch 'dev' into feature/cog-2746-time-graph-to-cognify 2025-08-29 18:21:45 +02:00
Igor Ilic
4b1681d856 Merge branch 'dev' into optional-search-flags 2025-08-29 17:02:11 +02:00
Igor Ilic
21f688385b feat: Add nodeset as default node type 2025-08-29 12:53:29 +02:00
Igor Ilic
5bfae7a36b refactor: Resolve unit tests failing for search 2025-08-29 10:30:49 +02:00
hajdul88
2d2a7d69d3 fix: adjusting test to the new Optional DocumentChunk property 2025-08-27 19:08:01 +02:00
Daulet Amirkhanov
1b643c8355 format: ruff format 2025-08-27 16:39:30 +01:00
Daulet Amirkhanov
f786780a20 tests: add unit tests for endpoints and conditional auth 2025-08-27 16:39:30 +01:00
hajdul88
372181d8c1 fix: fixes unit test 2025-08-19 09:43:34 +02:00
hajdul88
b6be61776a fix: fixes tests 2025-08-18 13:50:21 +02:00
hajdul88
78fb415892 chore: changes context return value in tests 2025-08-18 13:40:33 +02:00
hajdul88
da40365932 ruff formatting 2025-08-13 15:15:39 +02:00
hajdul88
d1bfeaa0f2 fix: fixes search unit test error expectation 2025-08-13 15:00:25 +02:00
hajdul88
885f7c3f99 chore: fixing graph elements tests 2025-08-13 14:58:56 +02:00
hajdul88
544e08930b feat: removing invalidValueErrors 2025-08-13 14:42:57 +02:00
EricXiao
fc7a91d991
feature: implement FEELING_LUCKY search type (#1178)

## Description
This PR implements the 'FEELING_LUCKY' search type, which intelligently
routes user queries to the most appropriate search retriever, addressing
[#1162](https://github.com/topoteretes/cognee/issues/1162).

- Implement the new search type FEELING_LUCKY
- Add the `select_search_type` function to analyze queries and choose the
proper search type
- Integrate with an LLM for intelligent search type determination
- Add logging for the search type selection process
- Support fallback to RAG_COMPLETION when the LLM selection fails
- Add tests for the new search type

## How it works
When a user selects the 'FEELING_LUCKY' search type, the system first
sends their natural language query to an LLM-based classifier. This
classifier analyzes the query's intent (e.g., is it asking for a
relationship, a summary, or a factual answer?) and selects the optimal
SearchType, such as 'INSIGHTS' or 'GRAPH_COMPLETION'. The main search
function then proceeds using this dynamically selected type. If the
classification process fails, it gracefully falls back to the default
'RAG_COMPLETION' type.
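
The routing described above can be sketched as follows. This is a minimal sketch under stated assumptions:

- `choose_search_type` is a hypothetical stand-in for the PR's `select_search_type`.
- `classify` is a placeholder for the LLM call.
- The enum lists only the types named in this description.
- The real signatures are not reproduced here.

```python
import logging
from enum import Enum
from typing import Callable

logger = logging.getLogger(__name__)


class SearchType(Enum):
    # Only the types named in this PR description; the real enum has more members.
    INSIGHTS = "INSIGHTS"
    GRAPH_COMPLETION = "GRAPH_COMPLETION"
    RAG_COMPLETION = "RAG_COMPLETION"
    FEELING_LUCKY = "FEELING_LUCKY"


def choose_search_type(query: str, classify: Callable[[str], str]) -> SearchType:
    """Ask a classifier (hypothetically an LLM call) which search type fits the query."""
    try:
        selected = SearchType(classify(query).strip().upper())
        if selected is SearchType.FEELING_LUCKY:
            # Routing back to the router would loop, so treat it as a failure.
            raise ValueError("classifier returned FEELING_LUCKY")
        logger.info("Selected search type %s for query %r", selected.name, query)
        return selected
    except Exception as error:  # classifier error or unknown label
        logger.warning("Search type selection failed (%s); falling back to RAG_COMPLETION", error)
        return SearchType.RAG_COMPLETION


def resolve_search_type(
    requested: SearchType, query: str, classify: Callable[[str], str]
) -> SearchType:
    # Only FEELING_LUCKY is rerouted; every other type is used as requested.
    if requested is SearchType.FEELING_LUCKY:
        return choose_search_type(query, classify)
    return requested
```

For example, `resolve_search_type(SearchType.FEELING_LUCKY, "How is A related to B?", lambda q: "insights")` returns `SearchType.INSIGHTS`. A classifier that raises, or that answers with an unknown label, yields `SearchType.RAG_COMPLETION`.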

## Testing
Tests can be run with:
```bash
python -m pytest cognee/tests/unit/modules/search/search_methods_test.py -k "feeling_lucky" -v
```

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

Signed-off-by: EricXiao <taoiaox@gmail.com>
2025-08-02 16:30:08 +02:00
hajdul88
9157d3c2dd
feature: cover current context structure with unit test and add time logging to vector collection retrievals (#1144)

## Description
Covers the current context structure with a unit test so that it is not
changed accidentally in the future.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-25 13:04:43 +02:00
Boris
46c4463cb2
feat: s3 storage (#988)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: vasilije <vas.markovic@gmail.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-07-14 21:47:08 +02:00
Boris Arzentar
66427e725c
fix: remove obsolete files and fix unit tests 2025-07-08 22:47:09 +02:00
Vasilije
ada3f7b086
fix: Logger suppression and database logs (#1041)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
2025-07-03 20:08:27 +02:00
Igor Ilic
2c7eecc93d refactor: format file 2025-06-30 12:26:43 +02:00
Igor Ilic
0d75b6dc76 Merge branch 'main' into main-merge 2025-06-30 12:24:24 +02:00
Hashem Aldhaheri
fd77e92cc4
Fix: Handle file:// URLs in open_data_file function (#1019)
## Summary
This PR fixes an asymmetry issue where files saved with `file://`
prefixes could not be read back, causing "file not found" errors.

## Problem
The Cognee framework has a bug where:
- `save_data_to_file.py` adds `file://` prefix when saving files
- `open_data_file.py` doesn't handle the `file://` prefix when reading
files
- This causes saved files to appear as "lost" with cryptic "file not
found" errors

## Solution
Added proper handling for `file://` URLs in `open_data_file.py` by:
- Checking if the file path starts with `"file://"`
- Stripping the prefix using `replace("file://", "", 1)`
- Following the same pattern as S3 URL handling
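
A minimal sketch of the prefix handling, assuming a plain local `open()` call. `strip_file_prefix` and `open_local_file` are hypothetical names, and the S3 branch of the real `open_data_file` is omitted.

```python
from contextlib import contextmanager
from typing import IO, Iterator, Optional

FILE_PREFIX = "file://"


def strip_file_prefix(file_path: str) -> str:
    # Remove only the first "file://", as replace("file://", "", 1) does.
    if file_path.startswith(FILE_PREFIX):
        return file_path.replace(FILE_PREFIX, "", 1)
    return file_path


@contextmanager
def open_local_file(
    file_path: str, mode: str = "rb", encoding: Optional[str] = None
) -> Iterator[IO]:
    # Hypothetical helper: local files only; plain paths pass through unchanged.
    with open(strip_file_prefix(file_path), mode=mode, encoding=encoding) as file:
        yield file
```

Because only the first `file://` is stripped, a doubly prefixed path such as `file://file:///tmp/x` becomes `file:///tmp/x`. That is the behaviour of `replace(..., 1)` in the multiple-prefix edge case.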

## Changes
- Modified
`cognee/modules/data/processing/document_types/open_data_file.py` to
handle `file://` URLs
- Added comprehensive unit tests in
`cognee/tests/unit/modules/data/test_open_data_file.py`

## Testing
Added 6 test cases covering:
- Regular file paths (ensuring backward compatibility)
- file:// URLs in text mode
- file:// URLs in binary mode
- file:// URLs with specific encoding
- Nonexistent files with file:// URLs
- Edge case with multiple file:// prefixes

All tests pass successfully.

## Notes
- This is a minimal fix that maintains backward compatibility
- The fix follows the existing pattern used for S3 URL handling
- No breaking changes to the API

I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

Signed-off-by: Hashem Aldhaheri <aenawi@gmail.com>
2025-06-30 11:55:34 +02:00
hajdul88
d1a9cab17d
Feature: Set default database to Kuzu (#1022)

## Description
Sets the default graph database to Kuzu and removes the NetworkX adapter,
which is now provided as a community repo adapter.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-06-27 08:50:58 +02:00
hajdul88
acdcb0e8d9
feat: replace Owlready2 with RDFLib (#981)

## Description
Replaces Owlready2 with RDFLib

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Igor Ilic <igorilic03@gmail.com>
2025-06-17 14:49:53 +02:00
Igor Ilic
1ed6cfd918
feat: new Dataset permissions (#869)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
Co-authored-by: Boris <boris@topoteretes.com>
2025-06-06 14:20:57 +02:00
Vasilije
b58d7d44f3
fix: 0.1.41 Release (#894)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Signed-off-by: Diego B Theuerkauf <diego.theuerkauf@tuebingen.mpg.de>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
Co-authored-by: Boris <boris@topoteretes.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com>
Co-authored-by: Matea Pesic <80577904+matea16@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Daniel Molnar <soobrosa@gmail.com>
Co-authored-by: Diego Baptista Theuerkauf <34717973+diegoabt@users.noreply.github.com>
Co-authored-by: Dmitrii Galkin <36552323+dm1tryG@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions@users.noreply.github.com>
2025-05-31 02:19:29 +02:00
hajdul88
965033e161
Feat: Adds subgraph retriever to graph based completion searches (#874)

## Description
Adds a subgraph retriever to graph-based completion searches.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-05-27 11:40:47 +02:00
hajdul88
d6639217c3
Feat: Adds context extension search (#865)

## Description
Adds context extension search

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-05-22 18:25:43 +02:00
hajdul88
e0798ff25f
Feat: Adds chain of thought retriever (#864)

## Description
Adds chain of thought retriever

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-05-22 13:24:56 +02:00
Boris
0aac93e9c4
Merge dev to main (#827)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: vasilije <vas.markovic@gmail.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com>
Co-authored-by: Matea Pesic <80577904+matea16@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Daniel Molnar <soobrosa@gmail.com>
Co-authored-by: Diego Baptista Theuerkauf <34717973+diegoabt@users.noreply.github.com>
2025-05-15 13:15:49 +02:00
Boris
5970d964cf
feat: pass context argument to tasks that require it (#788)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-04-30 12:32:40 +02:00
Boris
cd9c4897a4
feat: remove get_distance_from_collection_names and adapt search (#766)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-04-30 11:11:07 +02:00
Boris
675b66175f
test: make search unit tests deterministic (#726)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Daniel Molnar <soobrosa@gmail.com>
2025-04-18 21:55:24 +02:00
Igor Ilic
da332e85fe
Add top k [COG-1862] (#743)

## Description
Adds the ability to define top-k for the Cognee search types Insights, RAG
and Graph Completion.
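
To illustrate what a top-k limit does, here is a sketch. It assumes results arrive as `(id, distance)` pairs, where a smaller distance means a better match. `top_k_results` is a hypothetical helper, not the cognee API.

```python
import heapq
from typing import List, Sequence, Tuple


def top_k_results(scored: Sequence[Tuple[str, float]], top_k: int = 5) -> List[Tuple[str, float]]:
    """Return the top_k (id, distance) pairs with the smallest distance."""
    if top_k <= 0:
        return []
    # nsmallest matches sorted(...)[:top_k], so ties keep their input order.
    return heapq.nsmallest(top_k, scored, key=lambda item: item[1])
```

`heapq.nsmallest` avoids sorting the whole result list when `top_k` is much smaller than the number of candidates.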

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-04-17 14:01:35 +02:00
alekszievr
936fcf7cd7
chore: handle empty distance list in brute force search [cog-1424] (#654)

## Description
- handle empty distance list in brute force search
- unit tests
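
The guard can be illustrated with a sketch. Purely for illustration, it assumes the brute-force search min-max normalizes distances; `normalize_distances` is a hypothetical name. Without the early return, `min()` on an empty list raises `ValueError`.

```python
from typing import List


def normalize_distances(distances: List[float]) -> List[float]:
    """Min-max normalize distances to [0, 1]; an empty input yields an empty output."""
    if not distances:
        # Guard: min()/max() raise ValueError on an empty list.
        return []
    low, high = min(distances), max(distances)
    if high == low:
        # All distances equal: avoid division by zero.
        return [0.0 for _ in distances]
    return [(d - low) / (high - low) for d in distances]
```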

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-03-25 15:50:02 +01:00
lxobr
ee88fcf5d3
feat: reimplement resolve_edges_to_text with cleaner formatting (#652)

## Description
- Optimized to deduplicate nodes appearing in multiple triplets,
  avoiding redundant text repetition
- Reimplemented `resolve_edges_to_text` with cleaner formatting
  - Added `_top_n_words` method for extracting frequent words from text
  - Created `_get_title` function to generate titles from text content
    based on first words and word frequency
  - Extracted node processing logic to `_get_nodes` helper method
  - Created dedicated `stop_words` utility with common English stopwords
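
A sketch of the three ideas listed above, under stated assumptions:

- The stop-word set is a tiny hypothetical stand-in for the new `stop_words` utility.
- The title format is invented for illustration.
- `top_n_words`, `get_title` and `unique_nodes` are illustrative reimplementations, not the PR's `_top_n_words`, `_get_title` and `_get_nodes`.

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical small stop-word set; the PR adds a fuller `stop_words` utility.
STOP_WORDS: Set[str] = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}


def top_n_words(text: str, top_n: int = 3, stop_words: Set[str] = STOP_WORDS) -> List[str]:
    # Lowercase word tokens, drop stop words, keep the most frequent ones
    # (ties are ordered by first appearance).
    words = [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in stop_words]
    return [word for word, _ in Counter(words).most_common(top_n)]


def get_title(text: str, first_n_words: int = 4, top_n: int = 3) -> str:
    # Invented format: the first few words, then the most frequent words in brackets.
    first_words = " ".join(text.split()[:first_n_words])
    return f"{first_words}... [{', '.join(top_n_words(text, top_n))}]"


def unique_nodes(triplets: Iterable[Tuple[str, str, str]]) -> List[str]:
    # Each node id is emitted once, in first-seen order, however many triplets mention it.
    seen: dict = {}
    for source, _relation, target in triplets:
        seen.setdefault(source, None)
        seen.setdefault(target, None)
    return list(seen)
```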

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin

## Summary by CodeRabbit

- **New Features**
  - Improved text output formatting that organizes content into clearly
    defined sections for enhanced readability.
  - Enhanced text processing capabilities, including refined title
    generation and key phrase extraction.
  - Introduced a comprehensive utility for managing common stop words,
    further optimizing text analysis.

- **Bug Fixes**
  - Updated tests to ensure accurate validation of new functionalities and
    improved existing test coverage.

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-03-20 14:52:04 +01:00
alekszievr
164cb581ec
test: test retrievers [cog-1433] (#635)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin

## Summary by CodeRabbit

- **Chores**
  - Removed unused code to streamline internal processes.

- **Tests**
  - Added a comprehensive suite of tests to validate core retrieval and
    search functionalities.
  - Improved validation of response generation, context handling, and
    error scenarios to ensure consistent and reliable performance.

These improvements enhance overall system stability and maintainability,
contributing to a smoother experience for end-users.

---------

Co-authored-by: vasilije <vas.markovic@gmail.com>
2025-03-20 10:18:21 +01:00
hajdul88
6fcfb3c398
feat: productionizing ontology solution [COG-1401] (#623)

## Description
This PR contains the ontology feature integrated into cognify

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin

## Summary by CodeRabbit

- **New Features**
  - Enhanced ontology management with the introduction of the
    `OntologyResolver` class for improved data handling and querying.
  - Expanded ontology framework now provides enriched coverage of
    technology and automotive domains, including new entities and
    relationships.
  - Updated entity models now include a validation flag to support
    improved data integrity.
  - Added support for specifying an ontology file path in relevant
    functions to enhance flexibility.

- **Refactor**
  - Streamlined integration of ontology processing across data extraction
    and workflow routines.

- **Chores**
  - Updated project dependencies to include `owlready2` for advanced
    ontology functionality.

- **Tests**
  - Introduced a new test suite for the `OntologyResolver` class to
    validate its functionality under various conditions.
2025-03-12 14:31:19 +01:00
lxobr
9cc357ac1c
Feat/cog 1365 unify retrievers (#572)

## Description
- Created the `BaseRetriever` class to unify all the retrievers and
  searches.
- Implemented seven specialized retrievers (summaries, chunks,
  completions, graph, graph-summary, insights, code) with consistent
  `get_context`/`get_completion` interfaces.
- Added a JSON context dumping feature to the current completion
  implementations to enable context comparisons.
- Built a comparison framework to validate old vs. new implementations.
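
The shared interface described above can be sketched like this. The async method signatures are an assumption. `KeywordChunksRetriever` is a hypothetical toy retriever, not one of the seven retrievers added by this PR.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class BaseRetriever(ABC):
    """Assumed common interface: fetch context for a query, then build a completion."""

    @abstractmethod
    async def get_context(self, query: str) -> Any: ...

    @abstractmethod
    async def get_completion(self, query: str, context: Optional[Any] = None) -> Any: ...


class KeywordChunksRetriever(BaseRetriever):
    """Hypothetical retriever over an in-memory list of text chunks."""

    def __init__(self, chunks: List[str]):
        self.chunks = chunks

    async def get_context(self, query: str) -> List[str]:
        # Keep chunks that share at least one lowercase word with the query.
        terms = set(query.lower().split())
        return [chunk for chunk in self.chunks if terms & set(chunk.lower().split())]

    async def get_completion(self, query: str, context: Optional[Any] = None) -> List[str]:
        # Chunk retrieval has no LLM step here: the completion is the context itself.
        if context is None:
            context = await self.get_context(query)
        return context


if __name__ == "__main__":
    retriever = KeywordChunksRetriever(["Kuzu is a graph database", "RDFLib parses ontologies"])
    print(asyncio.run(retriever.get_completion("graph storage")))
```

Separating `get_context` from `get_completion` makes it possible to dump and compare the retrieved context independently of the final answer.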
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin

## Summary by CodeRabbit

- **New Features**
  - Introduced multiple retrieval classes for enhanced search
    capabilities, including `BaseRetriever`, `ChunksRetriever`,
    `CodeRetriever`, `CompletionRetriever`, `GraphCompletionRetriever`,
    `GraphSummaryCompletionRetriever`, `InsightsRetriever`, and
    `SummariesRetriever`.
  - Enhanced query completions with optional context saving for improved
    data persistence.
  - Implemented advanced tools to compare retrieval outcomes across
    different implementations.

- **Refactor**
  - Streamlined internal module organization and updated references for
    increased maintainability and consistency.
  - Added comments indicating future maintenance tasks related to code
    merging.

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-02-27 12:13:21 +01:00
Boris
f75e35c337
fix: custom model pipeline (#508)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin

## Summary by CodeRabbit

- **New Features**
  - Graph visualizations now allow exporting to a user-specified file path
    for more flexible output management.
  - The text embedding process has been enhanced with an additional
    tokenizer option for improved performance.
  - A new `ExtendableDataPoint` class has been introduced for future
    extensions.
  - New JSON files for companies and individuals have been added to
    facilitate testing and data processing.

- **Improvements**
  - Search functionality now uses updated identifiers for more reliable
    content retrieval.
  - Metadata handling has been streamlined across various classes by
    removing unnecessary type specifications.
  - Enhanced serialization of properties in the Neo4j adapter for improved
    handling of complex structures.
  - The setup process for databases has been improved with a new
    asynchronous setup function.

- **Chores**
  - Dependency and configuration updates improve overall stability and
    performance.
2025-02-08 02:00:15 +01:00
hajdul88
bcd326518d
feat: implements graph visualization method for cognee (#493)

## Description
This PR improves the visualization endpoint.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin

## Summary by CodeRabbit

- **New Features**
  - Launched an enhanced interactive network visualization utility that
    renders dynamic, browser-based graphs. It generates an HTML file
    containing the interactive visualization and shows an on-screen
    confirmation.
2025-02-06 11:22:17 +01:00
vasilije
60c8fd103b ruff format 2025-01-05 19:09:08 +01:00
hajdul88
6d85165189
Feature/cog 539 implementing additional retriever approaches (#262)
* fix: refactor get_graph_from_model to return nodes and edges correctly

* fix: add missing params

* fix: remove complex zip usage

* fix: add edges to data_point properties

* fix: handle rate limit error coming from llm model

* fix: fixes lost edges and nodes in get_graph_from_model

* fix: fixes database pruning issue in pgvector

* fix: fixes database pruning issue in pgvector (#261)

* feat: adds code summary embeddings to vector DB

* fix: cognee_demo notebook pipeline is not saving summaries

* feat: implements first version of codegraph retriever

* chore: implements minor changes mostly to make the code production ready

* fix: turns off raising duplicated edges unit test as we have these in our current codegraph generation

* feat: implements unit tests for description to codepart search

* fix: fixes edge property inconsistent access in codepart retriever

* chore: implements more precise typing for get_attribute method for cogneegraph

* chore: adds spacing to tests and changes the cogneegraph getter names

---------

Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
2024-12-10 11:07:06 +01:00