Commit graph

39 commits

Author SHA1 Message Date
hajdul88
fd23c75c09 chore: adds new Unit tests for retrievers 2025-12-12 14:44:41 +01:00
lxobr
c04d255aca feat: remove secondary search 2025-12-08 17:29:25 +01:00
hajdul88
d4d190ac2b
feature: adds triplet embedding via memify (#1832)

## Description
This PR introduces triplet embeddings via a new
create_triplet_embeddings memify pipeline.
The pipeline reads the graph in batches, extracts properties from graph
elements based on their datapoint types, and generates combined triplet
embeddings. These embeddings are stored in the vector database as a new
collection.
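
As a rough illustration of the flow described above (batch read, property extraction by datapoint type, combined embedding), here is a minimal standalone sketch. It does not use the actual pipeline code: `TEXT_FIELD_BY_TYPE`, `triplet_text`, `batched`, `embed_triplets`, and the `embed` callable are all hypothetical names chosen for this example, and the text format of a combined triplet is an assumption.

```python
from typing import Callable, Iterable, Iterator

# Assumed shape of a triplet: (source node, relationship name, target node).
Triplet = tuple[dict, str, dict]

# Hypothetical mapping from datapoint type to the property holding its text.
TEXT_FIELD_BY_TYPE = {"Entity": "name", "DocumentChunk": "text"}


def node_text(node: dict) -> str:
    """Pick the embeddable property of a node based on its datapoint type."""
    field = TEXT_FIELD_BY_TYPE.get(node.get("type", ""), "name")
    return str(node.get(field, ""))


def triplet_text(triplet: Triplet) -> str:
    """Combine both nodes and the relationship into one string to embed."""
    source, relation, target = triplet
    return f"{node_text(source)} -> {relation} -> {node_text(target)}"


def batched(items: Iterable[Triplet], size: int) -> Iterator[list[Triplet]]:
    """Yield the triplets in lists of at most `size` elements."""
    batch: list[Triplet] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def embed_triplets(
    triplets: Iterable[Triplet],
    embed: Callable[[list[str]], list[list[float]]],
    batch_size: int = 2,
) -> list[tuple[str, list[float]]]:
    """Embed triplets batch by batch and return (text, vector) pairs."""
    collection: list[tuple[str, list[float]]] = []
    for batch in batched(triplets, batch_size):
        texts = [triplet_text(t) for t in batch]
        collection.extend(zip(texts, embed(texts)))
    return collection
```

In a real setup the returned pairs would be written to the vector database collection; here they are simply returned so the batching is easy to inspect.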

Changes in This PR:

- Added a new create_triplet_embeddings memify pipeline.
- Added a new get_triplet_datapoints memify task.
- Introduced a new triplet_completion search type.
- Added full test coverage:
  - Unit tests: memify task, pipeline, and retriever
  - Integration tests: memify task, pipeline, and retriever
  - End-to-end tests: updated session history tests and multi-DB search
    tests; added tests for triplet_completion and memify pipeline execution

Acceptance Criteria and Testing

Scenario 1:
- Run the default add and cognify pipelines.
- Run the create triplet embeddings memify pipeline.
- Verify that the vector DB contains a non-empty Triplet_text collection.
- Use the new triplet_completion search type and confirm that it works
  correctly.

Scenario 2:
- Run the default add and cognify pipelines.
- Do not run the triplet embeddings memify pipeline.
- Attempt to use the triplet_completion search type.
- You should receive an error indicating that the triplet embeddings
  memify pipeline must be executed first.
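
The two scenarios can be pictured with a small standalone sketch of the guard they imply. This is not the retriever from this PR: the exception class, `retrieve_triplet_context`, and the word-overlap ranking (a stand-in for vector similarity) are assumptions made for illustration; only the `Triplet_text` collection name comes from the description above.

```python
TRIPLET_COLLECTION = "Triplet_text"


class TripletEmbeddingsMissingError(RuntimeError):
    """Hypothetical error raised when the triplet collection is absent or empty."""


def retrieve_triplet_context(collections: dict, query: str, top_k: int = 3) -> list:
    """Return the top_k triplet texts for a query, or fail with a clear message."""
    triplet_texts = collections.get(TRIPLET_COLLECTION)
    if not triplet_texts:
        # Scenario 2: the memify pipeline has not produced the collection yet.
        raise TripletEmbeddingsMissingError(
            "Triplet_text collection not found: run the triplet embeddings "
            "memify pipeline before using triplet_completion."
        )

    query_words = set(query.lower().split())

    def overlap(text: str) -> int:
        # Word overlap stands in for a real vector similarity score.
        return len(query_words & set(text.lower().split()))

    # Scenario 1: rank the stored triplets and keep the best matches.
    return sorted(triplet_texts, key=overlap, reverse=True)[:top_k]
```

Failing early with an explicit message keeps the second scenario distinguishable from a search that simply found nothing.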


## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


## Summary by CodeRabbit

* **New Features**
  * Triplet-based search with LLM-powered completions (TRIPLET_COMPLETION)
  * Batch triplet retrieval and a triplet embeddings pipeline for
    extraction, indexing, and optional background processing
  * Context retrieval from triplet embeddings with optional caching and
    conversation-history support
  * New Triplet data type exposed for indexing and search

* **Examples**
  * End-to-end example demonstrating triplet embeddings extraction and
    TRIPLET_COMPLETION search

* **Tests**
  * Unit and integration tests covering triplet extraction, retrieval,
    embedding pipeline, and completion flows


---------

Co-authored-by: Pavel Zorin <pazonec@yandex.ru>
2025-12-02 18:27:08 +01:00
hajdul88
508165e883
feature: Introduces wide subgraph search in graph completion and improves QA speed (#1736)

This PR introduces wide vector and graph structure filtering
capabilities. With these changes, the graph completion retriever and all
retrievers that inherit from it filter relevant vector elements
and subgraphs based on the query. This significantly
increases search speed for large graphs while maintaining, and in some
cases slightly improving, accuracy.

Changes in This PR:

- Introduced a new wide_search_top_k parameter: controls the initial
  search space size.
- Added a graph adapter level filtering method: enables relevant subgraph
  filtering while maintaining backward compatibility. For community or
  custom graph adapters that don't implement this method, the system
  gracefully falls back to the original search behavior.
- Updated modal dashboard and evaluation framework: fixed compatibility
  issues.
- Added comprehensive unit tests: introduced unit tests for
  brute_force_triplet_search (previously untested) and expanded the
  CogneeGraph test suite.
- Integration tests: existing integration tests verify end-to-end search
  functionality (no changes required).

Acceptance Criteria and Testing

To verify the new search behavior, run search queries with different
wide_search_top_k values while logging is enabled:
- None: triggers a full graph search (default behavior)
- 1: projects a minimal subgraph (demonstrates maximum filtering)
- Custom values: test intermediate levels of filtering
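
The interaction between wide_search_top_k and the adapter fallback can be sketched as follows. This is a simplified standalone model, not the adapter interface from this PR: the class names and the `get_graph` and `get_filtered_subgraph` methods are hypothetical, and edges are plain tuples.

```python
from typing import Optional

# Assumed edge shape: (source id, relationship, target id).
Edge = tuple[str, str, str]


class FullGraphAdapter:
    """Stand-in for an adapter that can only load the whole graph."""

    def __init__(self, edges: list[Edge]):
        self.edges = edges

    def get_graph(self) -> list[Edge]:
        return list(self.edges)


class FilteringAdapter(FullGraphAdapter):
    """Stand-in for an adapter that also implements subgraph filtering."""

    def get_filtered_subgraph(self, node_ids: list[str]) -> list[Edge]:
        wanted = set(node_ids)
        return [e for e in self.edges if e[0] in wanted or e[2] in wanted]


def select_search_space(
    adapter, ranked_node_ids: list[str], wide_search_top_k: Optional[int]
) -> list[Edge]:
    """Choose the edges the retriever will search over."""
    if wide_search_top_k is None:
        # None keeps the original behavior: search the full graph.
        return adapter.get_graph()
    filter_subgraph = getattr(adapter, "get_filtered_subgraph", None)
    if filter_subgraph is None:
        # Adapters without the filtering method fall back to the full graph.
        return adapter.get_graph()
    # Project the subgraph around the best wide_search_top_k vector hits.
    return filter_subgraph(ranked_node_ids[:wide_search_top_k])
```

With `wide_search_top_k=1` only edges touching the single best-ranked node are kept, which corresponds to the "maximum filtering" case in the list above.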

Internal Testing and results:
Performance and accuracy benchmarks are available upon request. The
implementation demonstrates measurable improvements in query latency for
large graphs without sacrificing result quality.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [x] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
None

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Pavel Zorin <pazonec@yandex.ru>
2025-11-26 15:18:53 +01:00
Andrej Milicevic
4ab53c9d64 changes based on PR comments 2025-11-07 10:00:17 +01:00
Andrej Milicevic
72ba8d0dcb chore: ruff format 2025-11-06 17:12:33 +01:00
Andrej Milicevic
da5055a0a9 test: add one test that covers all retrievers. delete others 2025-11-06 17:11:15 +01:00
Andrej Milicevic
215ef7f3c2 test: add retriever tests 2025-11-05 17:29:40 +01:00
Andrej Milicevic
33b0516381 test: fix completion tests 2025-11-04 15:27:03 +01:00
Andrej Milicevic
7e3c24100b refactor: add structured output to completion retrievers 2025-11-04 15:09:33 +01:00
lxobr
46e6d87c1f Merge branch 'dev' into feature/cog-3187-feedback-enrichment-merge-test 2025-10-23 11:31:23 +02:00
lxobr
1e1fac3261 feat: allow structured output in the cot retriever 2025-10-20 23:43:41 +02:00
hajdul88
49e9d7dc27 chore: renames conversation history save method 2025-10-20 10:28:03 +02:00
hajdul88
e9f4e2000f feat: adds e2e conversation history test 2025-10-17 14:15:18 +02:00
hajdul88
ebb5b94265 chore: unit test fix for cache mocking 2025-10-17 11:06:34 +02:00
hajdul88
30a31889d0 ruff format 2025-10-17 10:30:35 +02:00
hajdul88
339de5a0b8 test fix 2025-10-17 10:25:26 +02:00
hajdul88
47cce90112 test fix 2025-10-17 10:18:39 +02:00
hajdul88
16b073bf8c ruff fix 2025-10-17 10:06:24 +02:00
hajdul88
4a03572f7c feat: adds unit test to conversation history save 2025-10-17 10:06:08 +02:00
Daulet Amirkhanov
63a1463073 Deprecate SearchType.INSIGHTS, replace all references to default search type - SearchType.GRAPH_COMPLETION 2025-10-08 12:13:59 +01:00
Igor Ilic
3b9415ee88 test: Resolve failing unit tests 2025-09-25 17:27:11 +02:00
Igor Ilic
94bc0ef47f tests: Resolve failing search tests 2025-09-11 23:16:25 +02:00
Boris
b1643414d2
feat: implement combined context search (#1341)

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-09-10 16:33:08 +02:00
hajdul88
d336511c57 ruff fix 2025-09-01 15:31:30 +02:00
hajdul88
9df440c020 feat: adds time extraction + unit tests for temporal retriever 2025-09-01 15:18:29 +02:00
hajdul88
2d2a7d69d3 fix: adjusting test to the new Optional DocumentChunk property 2025-08-27 19:08:01 +02:00
hajdul88
78fb415892 chore: changes context return value in tests 2025-08-18 13:40:33 +02:00
hajdul88
9157d3c2dd
feature: cover current context structure with unit test and add time logging to vector collection retrievals (#1144)

## Description
Cover current context structure with unit test so it is not changed
accidentally in the future

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-07-25 13:04:43 +02:00
Boris
46c4463cb2
feat: s3 storage (#988)

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: vasilije <vas.markovic@gmail.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-07-14 21:47:08 +02:00
hajdul88
d1a9cab17d
Feature: Set default database to Kuzu (#1022)

## Description
Set default db to kuzu and remove networkx adapter due to community repo
adapter

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-06-27 08:50:58 +02:00
hajdul88
d6639217c3
Feat: Adds context extension search (#865)

## Description
Adds context extension search

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-05-22 18:25:43 +02:00
hajdul88
e0798ff25f
Feat: Adds chain of thought retriever (#864)

## Description
Adds chain of thought retriever

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-05-22 13:24:56 +02:00
Boris
cd9c4897a4
feat: remove get_distance_from_collection_names and adapt search (#766)

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-04-30 11:11:07 +02:00
Boris
675b66175f
test: make search unit tests deterministic (#726)

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Daniel Molnar <soobrosa@gmail.com>
2025-04-18 21:55:24 +02:00
Igor Ilic
da332e85fe
Add top k [COG-1862] (#743)

## Description
Add ability to define top-k for Cognee search types Insights, RAG and
GRAPH Completion

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-04-17 14:01:35 +02:00
alekszievr
936fcf7cd7
chore: handle empty distance list in brute force search [cog-1424] (#654)

## Description
- handle empty distance list in brute force search
- unit tests
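
The commit message does not say where the empty list caused trouble. One common failure point is a min/max based normalization step, so the sketch below assumes that shape; `normalize_distances` is a hypothetical helper, not the function changed in this commit.

```python
def normalize_distances(distances: list) -> list:
    """Scale distances to the range [0, 1], tolerating empty and constant input."""
    if not distances:
        # min() and max() raise ValueError on an empty sequence, so return early.
        return []
    lowest, highest = min(distances), max(distances)
    if highest == lowest:
        # All distances are equal; avoid dividing by zero.
        return [0.0 for _ in distances]
    return [(d - lowest) / (highest - lowest) for d in distances]
```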

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-03-25 15:50:02 +01:00
lxobr
ee88fcf5d3
feat: reimplement resolve_edges_to_text with cleaner formatting (#652)

## Description
<!-- Provide a clear description of the changes in this PR -->
- Optimized to deduplicate nodes appearing in multiple triplets,
  avoiding redundant text repetition
- Reimplemented `resolve_edges_to_text` with cleaner formatting
  - Added `_top_n_words` method for extracting frequent words from text
  - Created `_get_title` function to generate titles from text content
    based on first words and word frequency
  - Extracted node processing logic to `_get_nodes` helper method
  - Created dedicated `stop_words` utility with common English stopwords
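
A minimal sketch of how a title could be built from the first words of a text plus its most frequent non-stopwords. It borrows the helper names from the list above but is not the code from this PR: the signatures, the tiny `STOP_WORDS` set, and the title format are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop word set; the real utility is described as larger.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def top_n_words(text: str, n: int = 3) -> list:
    """Return the n most frequent words in text, ignoring stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


def get_title(text: str, first_n_words: int = 5, top_n: int = 3) -> str:
    """Build a title from the first words of text and its frequent words."""
    first_words = " ".join(text.split()[:first_n_words])
    frequent = ", ".join(top_n_words(text, top_n))
    return f"{first_words}... [{frequent}]"
```

A title like this lets a node's text be printed once under a short label and then referenced by that label in each triplet, which is one way to avoid repeating long node text.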

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
  - Improved text output formatting that organizes content into clearly
    defined sections for enhanced readability.
  - Enhanced text processing capabilities, including refined title
    generation and key phrase extraction.
  - Introduced a comprehensive utility for managing common stop words,
    further optimizing text analysis.

- **Bug Fixes**
  - Updated tests to ensure accurate validation of new functionalities and
    improved existing test coverage.

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-03-20 14:52:04 +01:00
alekszievr
164cb581ec
test: test retrievers [cog-1433] (#635)

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- **Chores**
  - Removed unused code to streamline internal processes.

- **Tests**
  - Added a comprehensive suite of tests to validate core retrieval and
    search functionalities.
  - Improved validation of response generation, context handling, and
    error scenarios to ensure consistent and reliable performance.

These improvements enhance overall system stability and maintainability,
contributing to a smoother experience for end-users.


---------

Co-authored-by: vasilije <vas.markovic@gmail.com>
2025-03-20 10:18:21 +01:00