cognee

Author	SHA1	Message	Date
hajdul88	fd23c75c09	chore: adds new Unit tests for retrievers	2025-12-12 14:44:41 +01:00
lxobr	c04d255aca	feat: remove secondary search	2025-12-08 17:29:25 +01:00
hajdul88	d4d190ac2b	feature: adds triplet embedding via memify (#1832) ## Description This PR introduces triplet embeddings via a new create_triplet_embeddings memify pipeline. The pipeline reads the graph in batches, extracts properties from graph elements based on their datapoint types, and generates combined triplet embeddings. These embeddings are stored in the vector database as a new collection. Changes in This PR: - Added a new create_triplet_embeddings memify pipeline. - Added a new get_triplet_datapoints memify task. - Introduced a new triplet_completion search type. - Added full test coverage: -- Unit tests: memify task, pipeline, and retriever. -- Integration tests: memify task, pipeline, and retriever. -- End-to-end tests: updated the session history and multi-DB search tests; added tests for triplet_completion and memify pipeline execution. Acceptance Criteria and Testing: Scenario 1: - Run the default add and cognify pipelines. - Run the create_triplet_embeddings memify pipeline. - Verify that the vector DB contains a non-empty Triplet_text collection. - Use the new triplet_completion search type and confirm that it works correctly. Scenario 2: - Run the default add and cognify pipelines. - Do not run the triplet embeddings memify pipeline. - Attempt to use the triplet_completion search type. - You should receive an error indicating that the triplet embeddings memify pipeline must be executed first.
## Type of Change - [ ] Bug fix (non-breaking change that fixes an issue) - [x] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Pre-submission Checklist - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
## Summary by CodeRabbit * New Features * Triplet-based search with LLM-powered completions (TRIPLET_COMPLETION) * Batch triplet retrieval and a triplet embeddings pipeline for extraction, indexing, and optional background processing * Context retrieval from triplet embeddings with optional caching and conversation-history support * New Triplet data type exposed for indexing and search * Examples * End-to-end example demonstrating triplet embeddings extraction and TRIPLET_COMPLETION search * Tests * Unit and integration tests covering triplet extraction, retrieval, embedding pipeline, and completion flows --------- Co-authored-by: Pavel Zorin <pazonec@yandex.ru>	2025-12-02 18:27:08 +01:00
hajdul88	508165e883	feature: Introduces wide subgraph search in graph completion and improves QA speed (#1736) This PR introduces wide vector and graph structure filtering. With these changes, the graph completion retriever and all retrievers that inherit from it filter the relevant vector elements and subgraphs based on the query. This significantly increases search speed for large graphs while maintaining, and in some cases slightly improving, accuracy. Changes in This PR: - Introduced a new wide_search_top_k parameter: controls the size of the initial search space. - Added a graph-adapter-level filtering method: enables relevant subgraph filtering while maintaining backward compatibility. Community or custom graph adapters that don't implement this method gracefully fall back to the original search behavior. - Updated the modal dashboard and evaluation framework: fixed compatibility issues. - Added comprehensive unit tests: introduced unit tests for brute_force_triplet_search (previously untested) and expanded the CogneeGraph test suite. - Integration tests: existing integration tests verify end-to-end search functionality (no changes required). Acceptance Criteria and Testing: To verify the new search behavior, run search queries with different wide_search_top_k values while logging is enabled: - None: triggers a full graph search (default behavior). - 1: projects a minimal subgraph (maximum filtering). - Custom values: test intermediate levels of filtering. Internal testing and results: performance and accuracy benchmarks are available upon request. The implementation shows measurable improvements in query latency for large graphs without sacrificing result quality.
## Type of Change - [ ] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [x] Code refactoring - [x] Performance improvement - [ ] Other (please specify): ## Screenshots/Videos (if applicable) None ## Pre-submission Checklist - [x] I have tested my changes thoroughly before submitting this PR - [x] This PR contains minimal changes necessary to address the issue/feature - [x] My code follows the project's coding standards and style guidelines - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have added necessary documentation (if applicable) - [x] All new and existing tests pass - [x] I have searched existing PRs to ensure this change hasn't been submitted already - [x] I have linked any relevant issues in the description - [x] My commits have clear and descriptive messages ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Pavel Zorin <pazonec@yandex.ru>	2025-11-26 15:18:53 +01:00
Andrej Milicevic	4ab53c9d64	changes based on PR comments	2025-11-07 10:00:17 +01:00
Andrej Milicevic	72ba8d0dcb	chore: ruff format	2025-11-06 17:12:33 +01:00
Andrej Milicevic	da5055a0a9	test: add one test that covers all retrievers. delete others	2025-11-06 17:11:15 +01:00
Andrej Milicevic	215ef7f3c2	test: add retriever tests	2025-11-05 17:29:40 +01:00
Andrej Milicevic	33b0516381	test: fix completion tests	2025-11-04 15:27:03 +01:00
Andrej Milicevic	7e3c24100b	refactor: add structured output to completion retrievers	2025-11-04 15:09:33 +01:00
lxobr	46e6d87c1f	Merge branch 'dev' into feature/cog-3187-feedback-enrichment-merge-test	2025-10-23 11:31:23 +02:00
lxobr	1e1fac3261	feat: allow structured output in the cot retriever	2025-10-20 23:43:41 +02:00
hajdul88	49e9d7dc27	chore: renames conversation history save method	2025-10-20 10:28:03 +02:00
hajdul88	e9f4e2000f	feat: adds e2e conversation history test	2025-10-17 14:15:18 +02:00
hajdul88	ebb5b94265	chore: unit test fix for cache mocking	2025-10-17 11:06:34 +02:00
hajdul88	30a31889d0	ruff format	2025-10-17 10:30:35 +02:00
hajdul88	339de5a0b8	test fix	2025-10-17 10:25:26 +02:00
hajdul88	47cce90112	test fix	2025-10-17 10:18:39 +02:00
hajdul88	16b073bf8c	ruff fix	2025-10-17 10:06:24 +02:00
hajdul88	4a03572f7c	feat: adds unit test to conversation history save	2025-10-17 10:06:08 +02:00
Daulet Amirkhanov	63a1463073	Deprecate `SearchType.INSIGHTS`, replace all references to default search type - `SearchType.GRAPH_COMPLETION`	2025-10-08 12:13:59 +01:00
Igor Ilic	3b9415ee88	test: Resolve failing unit tests	2025-09-25 17:27:11 +02:00
Igor Ilic	94bc0ef47f	tests: Resolve failing search tests	2025-09-11 23:16:25 +02:00
Boris	b1643414d2	feat: implement combined context search (#1341) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-09-10 16:33:08 +02:00
hajdul88	d336511c57	ruff fix	2025-09-01 15:31:30 +02:00
hajdul88	9df440c020	feat: adds time extraction + unit tests for temporal retriever	2025-09-01 15:18:29 +02:00
hajdul88	2d2a7d69d3	fix: adjusting test to the new Optional DocumentChunk property	2025-08-27 19:08:01 +02:00
hajdul88	78fb415892	chore: changes context return value in tests	2025-08-18 13:40:33 +02:00
hajdul88	9157d3c2dd	feature: cover current context structure with unit test and add time logging to vector collection retrievals (#1144) ## Description Covers the current context structure with a unit test so that it is not changed accidentally in the future. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-07-25 13:04:43 +02:00
Boris	46c4463cb2	feat: s3 storage (#988) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: vasilije <vas.markovic@gmail.com> Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>	2025-07-14 21:47:08 +02:00
hajdul88	d1a9cab17d	Feature: Set default database to Kuzu (#1022) ## Description Sets the default graph database to Kuzu and removes the NetworkX adapter, which is now covered by the community adapters repository. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-06-27 08:50:58 +02:00
hajdul88	d6639217c3	Feat: Adds context extension search (#865) ## Description Adds context extension search. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-05-22 18:25:43 +02:00
hajdul88	e0798ff25f	Feat: Adds chain of thought retriever (#864) ## Description Adds a chain of thought retriever. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-05-22 13:24:56 +02:00
Boris	cd9c4897a4	feat: remove get_distance_from_collection_names and adapt search (#766) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-04-30 11:11:07 +02:00
Boris	675b66175f	test: make search unit tests deterministic (#726) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Daniel Molnar <soobrosa@gmail.com>	2025-04-18 21:55:24 +02:00
Igor Ilic	da332e85fe	Add top k [COG-1862] (#743) ## Description Adds the ability to define top-k for the Cognee search types Insights, RAG, and Graph Completion. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-04-17 14:01:35 +02:00
alekszievr	936fcf7cd7	chore: handle empty distance list in brute force search [cog-1424] (#654) ## Description - Handle an empty distance list in brute force search - Add unit tests ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>	2025-03-25 15:50:02 +01:00
lxobr	ee88fcf5d3	feat: reimplement `resolve_edges_to_text` with cleaner formatting (#652) ## Description - Optimized to deduplicate nodes appearing in multiple triplets, avoiding redundant text repetition - Reimplemented `resolve_edges_to_text` with cleaner formatting - Added `_top_n_words` method for extracting frequent words from text - Created `_get_title` function to generate titles from text content based on first words and word frequency - Extracted node processing logic to `_get_nodes` helper method - Created dedicated `stop_words` utility with common English stopwords ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit - New Features - Improved text output formatting that organizes content into clearly defined sections for enhanced readability. - Enhanced text processing capabilities, including refined title generation and key phrase extraction. - Introduced a comprehensive utility for managing common stop words, further optimizing text analysis. - Bug Fixes - Updated tests to ensure accurate validation of new functionalities and improved existing test coverage. --------- Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>	2025-03-20 14:52:04 +01:00
alekszievr	164cb581ec	test: test retrievers [cog-1433] (#635) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit - Chores - Removed unused code to streamline internal processes. - Tests - Added a comprehensive suite of tests to validate core retrieval and search functionalities. - Improved validation of response generation, context handling, and error scenarios to ensure consistent and reliable performance. These improvements enhance overall system stability and maintainability, contributing to a smoother experience for end-users. --------- Co-authored-by: vasilije <vas.markovic@gmail.com>	2025-03-20 10:18:21 +01:00

39 commits