Commit graph

41 commits

Author SHA1 Message Date
chinu0609
ab33a7863b Merge branch 'delete-last-acessed' of github.com:chinu0609/cognee into delete-last-acessed 2025-12-26 09:13:55 +05:30
chinu0609
4860d1c550 fix: implementing deletion in search.py 2025-12-26 09:13:29 +05:30
Boris
3cda1af29d
Merge branch 'dev' into main 2025-12-01 11:33:22 +01:00
hajdul88
508165e883
feature: Introduces wide subgraph search in graph completion and improves QA speed (#1736)

This PR introduces wide vector and graph structure filtering
capabilities. With these changes, the graph completion retriever and all
retrievers that inherit from it will now filter relevant vector elements
and subgraphs based on the query. This improvement significantly
increases search speed for large graphs while maintaining—and in some
cases slightly improving—accuracy.

Changes in This PR:

- Introduced new wide_search_top_k parameter: Controls the initial search
space size

- Added graph adapter level filtering method: Enables relevant subgraph
filtering while maintaining backward compatibility. For community or
custom graph adapters that don't implement this method, the system
gracefully falls back to the original search behavior.

- Updated modal dashboard and evaluation framework: Fixed compatibility
issues.

- Added comprehensive unit tests: Introduced unit tests for
brute_force_triplet_search (previously untested) and expanded the
CogneeGraph test suite.

Integration tests: Existing integration tests verify end-to-end search
functionality (no changes required).
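
The fallback described for adapters without a filtering method could look
roughly like the sketch below. It is only an illustration: the adapter
classes, `get_graph_data`, `get_filtered_graph_data`, and `project_graph` are
hypothetical names chosen for this example, not the project's actual API.

```python
import asyncio
from typing import Optional


class LegacyAdapter:
    """Hypothetical adapter that only knows how to return the full graph."""

    async def get_graph_data(self):
        return ["a", "b", "c", "d"]


class FilteringAdapter(LegacyAdapter):
    """Hypothetical adapter that can also project a subgraph by node id."""

    async def get_filtered_graph_data(self, node_ids):
        wanted = set(node_ids)
        return [n for n in await self.get_graph_data() if n in wanted]


async def project_graph(adapter, relevant_ids: Optional[list] = None):
    """Use subgraph filtering when available, else fall back to the full graph."""
    filter_fn = getattr(adapter, "get_filtered_graph_data", None)
    if relevant_ids is None or filter_fn is None:
        # No filter requested, or the adapter predates the filtering method.
        return await adapter.get_graph_data()
    return await filter_fn(relevant_ids)


filtered = asyncio.run(project_graph(FilteringAdapter(), ["b", "d"]))
fallback = asyncio.run(project_graph(LegacyAdapter(), ["b", "d"]))
```

Checking for the method at call time keeps older or community adapters
working unchanged, at the cost of searching the whole graph for them.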

Acceptance Criteria and Testing

To verify the new search behavior, run search queries with different
wide_search_top_k parameters while logging is enabled:
- None: Triggers a full graph search (default behavior)
- 1: Projects a minimal subgraph (demonstrates maximum filtering)
- Custom values: Test intermediate levels of filtering

Internal Testing and results:
Performance and accuracy benchmarks are available upon request. The
implementation demonstrates measurable improvements in query latency for
large graphs without sacrificing result quality.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [x] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
None

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Pavel Zorin <pazonec@yandex.ru>
2025-11-26 15:18:53 +01:00
chinu0609
53d3b50f93 Merge remote-tracking branch 'upstream/dev' 2025-11-22 14:54:10 +05:30
Andrej Milicevic
da5055a0a9 test: add one test that covers all retrievers. delete others 2025-11-06 17:11:15 +01:00
Andrej Milicevic
7e3c24100b refactor: add structured output to completion retrievers 2025-11-04 15:09:33 +01:00
chinu0609
5080e8f8a5 feat: generalizing getting entities from triplets 2025-11-03 00:59:04 +05:30
chinu0609
f1afd1f0a2 feat: adding cleanup function and adding update_node_acess_timestamps in completion retriever and graph_completion retriever 2025-10-31 15:49:34 +05:30
Daulet Amirkhanov
ee7db762e6 log warning and early exit when graph is empty and is queried 2025-10-22 13:21:51 +01:00
hajdul88
49e9d7dc27 chore: renames conversation history save method 2025-10-20 10:28:03 +02:00
hajdul88
9e9489c858 feat: adds conversation history to context if caching is enabled 2025-10-16 17:48:50 +02:00
hajdul88
96f2a2f22b ruff format 2025-10-16 15:54:48 +02:00
hajdul88
0e4c4907e9 feat: centralizes session caching in util function 2025-10-16 10:46:19 +02:00
hajdul88
66280442ac ruff formatting 2025-10-15 18:02:10 +02:00
hajdul88
b36772e8bf feat: adds session_id to all retrievers + updates docstrings 2025-10-15 18:01:13 +02:00
hajdul88
0aa64403c5 feat: basic session behavior (only graph completion now just to save) 2025-10-15 17:51:47 +02:00
hajdul88
faeca138d9
fix: fixes distributed pipeline (#1454)

## Description
This PR fixes the distributed pipeline and updates core parts of the
distributed logic.

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [x] Performance improvement
- [ ] Other (please specify):

## Changes Made
Fixes distributed pipeline:
- Changed spawning logic and added incremental loading to
run_tasks_diistributed
- Added batching to consumer nodes
- Fixed consumer stopping criteria by adding a stop signal and handling it
- Changed edge embedding solution to avoid huge network load in a
multi-container environment
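
As a rough illustration of the batching and stop-signal items above, a
consumer loop might be shaped like this. It is a sketch under assumptions: the
`STOP` sentinel, `consume_in_batches`, and the in-process `queue.Queue` are
stand-ins chosen for the example, not the pipeline's real queue or API.

```python
import queue

STOP = object()  # hypothetical sentinel that plays the role of the stop signal


def consume_in_batches(q, batch_size, handle_batch):
    """Drain q in fixed-size batches until the STOP sentinel arrives."""
    batch = []
    while True:
        item = q.get()
        if item is STOP:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            handle_batch(batch)
            batch = []  # start a fresh list so handled batches stay intact
    if batch:
        # Flush the final partial batch before exiting.
        handle_batch(batch)


q = queue.Queue()
for i in range(5):
    q.put(i)
q.put(STOP)

batches = []
consume_in_batches(q, 2, batches.append)
```

An explicit sentinel gives the consumer a definite exit condition, so it does
not have to guess from an empty queue whether more work is still coming.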

## Testing
Tested it by running 1GB on modal + manually

## Screenshots/Videos (if applicable)
None

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## Related Issues
None

## Additional Notes
None

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Boris <boris@topoteretes.com>
Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
2025-10-09 14:06:25 +02:00
Igor Ilic
a9c507b36e fix: Remove creation of default user during search 2025-09-23 18:43:05 +02:00
Boris
351deb0314
fix: UI (#1397)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-09-12 20:06:44 +02:00
Igor Ilic
e5381e110f fix: Return search backward compatibility 2025-09-11 21:08:13 +02:00
Boris
b1643414d2
feat: implement combined context search (#1341)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-09-10 16:33:08 +02:00
Boris
aaa1776293
feat: implement new local UI (#1279)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Daulet Amirkhanov <damirkhanov01@gmail.com>
2025-09-05 15:39:04 +02:00
Igor Ilic
2847569616 feat: memify next iteration 2025-09-03 16:08:32 +02:00
Igor Ilic
2915698d60 feat: Add only_context and system prompt flags for search 2025-08-28 13:43:37 +02:00
Igor Ilic
ac87e62adb feat: Save search flag progress 2025-08-28 10:52:08 +02:00
hajdul88
4a5d5f70d0 feat: adds feedback weights to edges 2025-08-19 16:50:21 +02:00
hajdul88
b6be61776a fix: fixes tests 2025-08-18 13:50:21 +02:00
hajdul88
711c805c83 feat: adds cognee-user interactions to search 2025-08-18 13:14:06 +02:00
Vasilije
1885ab9e88
chore: Cog 2354 add logging (#1115)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-07-24 13:27:27 +02:00
Boris
773b15a645
feat: websockets for pipeline update streaming (#851)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-06-11 20:29:26 +02:00
hajdul88
ac0711defb
Adds description to context in graph completion and its subclasses (#934)

## Description
Adds description to context in graph completion and its subclasses

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-06-09 15:54:07 +02:00
Daniel Molnar
ff997f48b5
Docstring modules. (#877)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-05-27 21:33:58 +02:00
hajdul88
965033e161
Feat: Adds subgraph retriever to graph based completion searches (#874)

## Description
Adds subgraph retriever to graph based completion searches

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-05-27 11:40:47 +02:00
Boris
cd9c4897a4
feat: remove get_distance_from_collection_names and adapt search (#766)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-04-30 11:11:07 +02:00
Boris
675b66175f
test: make search unit tests deterministic (#726)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Daniel Molnar <soobrosa@gmail.com>
2025-04-18 21:55:24 +02:00
hajdul88
119fa1eb73
feat: adds graph completion retriever fix (#676)

## Description
Adds graph completion retriever fix

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
2025-03-28 17:39:56 +01:00
lxobr
ee88fcf5d3
feat: reimplement resolve_edges_to_text with cleaner formatting (#652)
## Description
- Optimized to deduplicate nodes appearing in multiple triplets,
avoiding redundant text repetition
- Reimplemented `resolve_edges_to_text` with cleaner formatting
  - Added `_top_n_words` method for extracting frequent words from text
- Created `_get_title` function to generate titles from text content
based on first words and word frequency
  - Extracted node processing logic to `_get_nodes` helper method
  - Created dedicated `stop_words` utility with common English stopwords
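
For readers unfamiliar with the approach, the title generation described above
(first words plus frequent non-stop words) can be sketched as follows. This is
an illustration only: the function names mirror the description but the
signatures, the tiny stop-word set, and the title format are assumptions, not
the code from this PR.

```python
import re
from collections import Counter

# Tiny illustrative set; the PR describes a fuller list of English stopwords.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}


def top_n_words(text, n=3, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop words in text, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return [word for word, _ in counts.most_common(n)]


def get_title(text, first_n_words=4, top_n=2):
    """Build a short title from the first words plus the most frequent words."""
    first_words = " ".join(text.split()[:first_n_words])
    keywords = ", ".join(top_n_words(text, top_n))
    return f"{first_words}... [{keywords}]"


title = get_title("Graphs store nodes and edges; graphs connect nodes", first_n_words=2)
```

Filtering stop words before counting keeps words such as "the" and "and" from
dominating the keyword part of the title.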

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Improved text output formatting that organizes content into clearly
defined sections for enhanced readability.
- Enhanced text processing capabilities, including refined title
generation and key phrase extraction.
- Introduced a comprehensive utility for managing common stop words,
further optimizing text analysis.
  
- **Bug Fixes**
- Updated tests to ensure accurate validation of new functionalities and
improved existing test coverage.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-03-20 14:52:04 +01:00
lxobr
ac0156514d
feat: COG-1523 add top_k in run_question_answering (#625)
## Description
- Expose top_k as an optional argument of run_question_answering
- Update retrievers to handle the parameters

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced answer generation and document retrieval capabilities by
introducing an optional parameter that allows users to specify the
number of top results. This improvement adds flexibility when retrieving
question responses and associated context, adapting the output based on
user preference.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-03-10 10:55:31 +01:00
lxobr
3d4312577e
fix: Use DataPoint instead of ExtendableDataPoint in get_all_subclasses (#588)
## Description
- Use DataPoint instead of ExtendableDataPoint when calling
get_all_subclasses in the get_triplets function of the
GraphCompletionRetriever
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Refactor**
- Updated the internal data handling for retrieving information,
ensuring a more consistent and reliable output for end-users.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-27 19:05:09 +01:00
lxobr
9cc357ac1c
Feat/cog 1365 unify retrievers (#572)
## Description
- Created the `BaseRetriever` class to unify all the retrievers and
searches.
- Implemented seven specialized retrievers (summaries, chunks,
completions, graph, graph-summary, insights, code) with consistent
get_context/get_completion interfaces.
- Added json context dumping feature in the current completion
implementations to enable context comparisons.
- Built a comparison framework to validate old vs new implementations.
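
A minimal sketch of what a shared get_context/get_completion interface can
look like is given below. The method signatures and the
`KeywordChunksRetriever` subclass are assumptions made for illustration; only
the `BaseRetriever` name and the two method names come from the description
above.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Optional


class BaseRetriever(ABC):
    """Shared interface: fetch context, then turn it into a completion."""

    @abstractmethod
    async def get_context(self, query: str) -> Any: ...

    @abstractmethod
    async def get_completion(self, query: str, context: Optional[Any] = None) -> Any: ...


class KeywordChunksRetriever(BaseRetriever):
    """Hypothetical retriever over an in-memory list of text chunks."""

    def __init__(self, chunks):
        self.chunks = chunks

    async def get_context(self, query):
        terms = query.lower().split()
        return [c for c in self.chunks if any(t in c.lower() for t in terms)]

    async def get_completion(self, query, context=None):
        # Reuse a caller-supplied context, otherwise retrieve it first.
        if context is None:
            context = await self.get_context(query)
        return context


retriever = KeywordChunksRetriever(["Graphs have nodes.", "Vectors have norms."])
result = asyncio.run(retriever.get_completion("nodes"))
```

Accepting an optional context in get_completion lets callers inspect or
compare retrieved context separately from the completion step.
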
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced multiple retrieval classes for enhanced search
capabilities, including `BaseRetriever`, `ChunksRetriever`,
`CodeRetriever`, `CompletionRetriever`, `GraphCompletionRetriever`,
`GraphSummaryCompletionRetriever`, `InsightsRetriever`, and
`SummariesRetriever`.
- Enhanced query completions with optional context saving for improved
data persistence.
- Implemented advanced tools to compare retrieval outcomes across
different implementations.

- **Refactor**
- Streamlined internal module organization and updated references for
increased maintainability and consistency.
- Added comments indicating future maintenance tasks related to code
merging.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-02-27 12:13:21 +01:00