cognee

Author	SHA1	Message	Date
Daniel Molnar	d27f847753	Transition to new retrievers, update searches (#585) ## Description Delete legacy search implementations after migrating to new retriever classes ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced search and retrieval capabilities, providing improved context resolution for code queries, completions, summaries, and graph connections. - Refactor - Shifted to a modular, object-oriented approach that consolidates query logic and streamlines error management for a more robust and scalable experience. - Bug Fixes - Improved error handling for unsupported search types and retrieval operations.	2025-02-27 15:25:24 +01:00
lxobr	9cc357ac1c	Feat/cog 1365 unify retrievers (#572) ## Description - Created the `BaseRetriever` class to unify all the retrievers and searches. - Implemented seven specialized retrievers (summaries, chunks, completions, graph, graph-summary, insights, code) with consistent get_context/get_completion interfaces. - Added a JSON context-dumping feature to the current completion implementations to enable context comparisons. - Built a comparison framework to validate old vs. new implementations. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced multiple retrieval classes for enhanced search capabilities, including `BaseRetriever`, `ChunksRetriever`, `CodeRetriever`, `CompletionRetriever`, `GraphCompletionRetriever`, `GraphSummaryCompletionRetriever`, `InsightsRetriever`, and `SummariesRetriever`. - Enhanced query completions with optional context saving for improved data persistence. - Implemented tools to compare retrieval outcomes across different implementations. - Refactor - Streamlined internal module organization and updated references for increased maintainability and consistency. - Added comments indicating future maintenance tasks related to code merging. --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-02-27 12:13:21 +01:00
Boris	ada466879e	fix: add default params to run_tasks (#563) ## Description ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced the task execution process by enabling default values for certain parameters, allowing users to trigger task processing without supplying every input explicitly. - Bug Fixes - Adjusted asynchronous handling for the `retrieved_edges_to_string` function to ensure proper execution flow in various components. - Documentation - Updated markdown formatting in the Jupyter notebook for improved readability and structure. --------- Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>	2025-02-19 20:18:51 +01:00
Boris	f75e35c337	fix: custom model pipeline (#508) ## Description ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features • Graph visualizations can now be exported to a user-specified file path for more flexible output management. • The text embedding process has been enhanced with an additional tokenizer option. • A new `ExtendableDataPoint` class has been introduced for future extensions. • New JSON files for companies and individuals have been added to facilitate testing and data processing. - Improvements • Search functionality now uses updated identifiers for more reliable content retrieval. • Metadata handling has been streamlined across various classes by removing unnecessary type specifications. • Enhanced serialization of properties in the Neo4j adapter for improved handling of complex structures. • The database setup process has been improved with a new asynchronous setup function. - Chores • Dependency and configuration updates improve overall stability and performance.	2025-02-08 02:00:15 +01:00
Igor Ilic	5fe7ff9883	refactor: Refactor search so graph completion is used by default (#505) ## Description Refactor search so the query type doesn't need to be provided, making it simpler for new users ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Refactor - Improved the search interface by standardizing parameter usage with explicit keyword arguments for specifying search types, enhancing clarity and consistency. - Tests - Updated test cases and example integrations to align with the revised search parameters, ensuring consistent behavior and reliable validation of search outcomes.	2025-02-07 17:16:34 +01:00
Vasilije	4d3acc358a	fix: mcp improvements (#472) ## Description ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Dependency Update - Downgraded `mcp` package version from 1.2.0 to 1.1.3 - Updated `cognee` dependency to include additional features with `cognee[codegraph]` - New Features - Introduced a new tool, "codify", for transforming codebases into knowledge graphs - Enhanced the existing "search" tool to accept a new parameter for search type - Improvements - Streamlined search functionality with a new modular approach - Added a new asynchronous function for retrieving and formatting code parts - Documentation - Updated import paths for `SearchType` in various modules and tests to reflect structural changes - Code Cleanup - Removed legacy search module and associated classes/functions - Refined data transfer object classes for consistency and clarity --------- Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>	2025-02-04 08:47:31 +01:00
alekszievr	4e3a666b33	Feat: Save and load contexts and answers for eval (#462) * feat: make tasks a configurable argument in the cognify function * fix: add data points task * eval on random samples instead of the first few * Save and load contexts and answers * Fix random seed usage and handle empty descriptions * include insights search in cognee option * create output dir if it doesn't exist --------- Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>	2025-01-22 16:17:01 +01:00
alekszievr	75bc7f67eb	feat: Add incremental eval option to paramset (#446) * QA eval dataset as argument, with hotpot and 2wikimultihop as options. JSON schema validation for datasets. * Load dataset file by filename, outsource utilities * restructure metric selection * Add comprehensiveness, diversity and empowerment metrics * add promptfoo as an option * refactor RAG solution in eval;2C * LLM as a judge metrics implemented in a uniform way * Use requests.get instead of wget * clean up promptfoo config template * minor fixes * get promptfoo path instead of hardcoding * minor fixes * Add LLM as a judge prompts * Support 4 different rag options in eval * Minor refactor and logger usage * feat: make tasks a configurable argument in the cognify function * Run eval on a set of parameters and save results as json and png * fix: add data points task * script for running all param combinations * enable context provider to get tasks as param * bugfix in simple rag * Incremental eval of cognee pipeline * potential fix: single asyncio run * temp fix: exclude insights * Remove insights, have single asyncio run, refactor * Include incremental eval in accepted paramsets * minor fixes * handle pipeline slices in utils * Handle insights and customize search types * Handle retrieved edges more safely * bugfix * fix simple rag --------- Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com> Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>	2025-01-17 18:04:31 +01:00
alekszievr	2e010f8dd1	Incremental eval of cognee pipeline (#445) * QA eval dataset as argument, with hotpot and 2wikimultihop as options. JSON schema validation for datasets. * Load dataset file by filename, outsource utilities * restructure metric selection * Add comprehensiveness, diversity and empowerment metrics * add promptfoo as an option * refactor RAG solution in eval;2C * LLM as a judge metrics implemented in a uniform way * Use requests.get instead of wget * clean up promptfoo config template * minor fixes * get promptfoo path instead of hardcoding * minor fixes * Add LLM as a judge prompts * Support 4 different rag options in eval * Minor refactor and logger usage * feat: make tasks a configurable argument in the cognify function * Run eval on a set of parameters and save results as json and png * fix: add data points task * script for running all param combinations * enable context provider to get tasks as param * bugfix in simple rag * Incremental eval of cognee pipeline * potential fix: single asyncio run * temp fix: exclude insights * Remove insights, have single asyncio run, refactor * minor fixes * handle pipeline slices in utils * include all options in params json --------- Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com> Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>	2025-01-17 14:16:48 +01:00
alekszievr	8ec1e48ff6	Run eval on a set of parameters and save them as png and json (#443) * QA eval dataset as argument, with hotpot and 2wikimultihop as options. JSON schema validation for datasets. * Load dataset file by filename, outsource utilities * restructure metric selection * Add comprehensiveness, diversity and empowerment metrics * add promptfoo as an option * refactor RAG solution in eval;2C * LLM as a judge metrics implemented in a uniform way * Use requests.get instead of wget * clean up promptfoo config template * minor fixes * get promptfoo path instead of hardcoding * minor fixes * Add LLM as a judge prompts * Support 4 different rag options in eval * Minor refactor and logger usage * Run eval on a set of parameters and save results as json and png * script for running all param combinations * bugfix in simple rag * potential fix: single asyncio run * temp fix: exclude insights * Remove insights, have single asyncio run, refactor --------- Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>	2025-01-17 00:18:51 +01:00
alekszievr	3494521cae	Support 4 different rag options in eval (#439) * QA eval dataset as argument, with hotpot and 2wikimultihop as options. JSON schema validation for datasets. * Load dataset file by filename, outsource utilities * restructure metric selection * Add comprehensiveness, diversity and empowerment metrics * add promptfoo as an option * refactor RAG solution in eval;2C * LLM as a judge metrics implemented in a uniform way * Use requests.get instead of wget * clean up promptfoo config template * minor fixes * get promptfoo path instead of hardcoding * minor fixes * Add LLM as a judge prompts * Support 4 different rag options in eval * Minor refactor and logger usage	2025-01-15 15:34:13 +01:00

11 commits