<!-- .github/pull_request_template.md -->
## Description
Delete legacy search implementations after migrating to new retriever
classes
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
  - Enhanced search and retrieval capabilities, providing improved context
    resolution for code queries, completions, summaries, and graph
    connections.
- **Refactor**
  - Shifted to a modular, object-oriented approach that consolidates query
    logic and streamlines error management for a more robust and scalable
    experience.
- **Bug Fixes**
  - Improved error handling for unsupported search types and retrieval
    operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Description
- Created the `BaseRetriever` class to unify all retrievers and
searches.
- Implemented seven specialized retrievers (summaries, chunks,
completions, graph, graph-summary, insights, code) with consistent
`get_context`/`get_completion` interfaces.
- Added a JSON context-dumping feature to the current completion
implementations to enable context comparisons.
- Built a comparison framework to validate the old implementations
against the new ones.
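
The shared interface described above could look roughly like the following. This is a minimal sketch, not the code from this PR: only the names `BaseRetriever`, `ChunksRetriever`, `get_context`, and `get_completion` come from the description, while the async signatures, the optional `context` argument, and the in-memory substring matching are assumptions made for illustration.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Optional


class BaseRetriever(ABC):
    """Assumed shape of the common retriever interface."""

    @abstractmethod
    async def get_context(self, query: str) -> Any:
        """Fetch the raw context relevant to the query."""

    @abstractmethod
    async def get_completion(self, query: str, context: Optional[Any] = None) -> Any:
        """Return the final result, fetching context first if none is passed."""


class ChunksRetriever(BaseRetriever):
    """Toy retriever over an in-memory list; stands in for a real vector search."""

    def __init__(self, chunks: list[str]):
        self.chunks = chunks

    async def get_context(self, query: str) -> list[str]:
        # Naive case-insensitive substring match (illustrative assumption).
        return [c for c in self.chunks if query.lower() in c.lower()]

    async def get_completion(
        self, query: str, context: Optional[list[str]] = None
    ) -> list[str]:
        # Reuse a caller-supplied context, otherwise retrieve it here.
        if context is None:
            context = await self.get_context(query)
        return context


retriever = ChunksRetriever(["Graphs link entities.", "Chunks are text spans."])
result = asyncio.run(retriever.get_completion("chunks"))
print(result)
```

Keeping `get_context` separate from `get_completion` is what makes a context comparison between old and new implementations possible without re-running the completion step.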
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
  - Introduced multiple retrieval classes for enhanced search
    capabilities, including `BaseRetriever`, `ChunksRetriever`,
    `CodeRetriever`, `CompletionRetriever`, `GraphCompletionRetriever`,
    `GraphSummaryCompletionRetriever`, `InsightsRetriever`, and
    `SummariesRetriever`.
  - Enhanced query completions with optional context saving for improved
    data persistence.
  - Implemented advanced tools to compare retrieval outcomes across
    different implementations.
- **Refactor**
  - Streamlined internal module organization and updated references for
    increased maintainability and consistency.
  - Added comments indicating future maintenance tasks related to code
    merging.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
  - Enhanced the task execution process by enabling default values for
    certain parameters, allowing users to trigger task processing without
    supplying every input explicitly.
- **Bug Fixes**
  - Adjusted asynchronous handling for the `retrieved_edges_to_string`
    function to ensure proper execution flow in various components.
- **Documentation**
  - Updated markdown formatting in the Jupyter notebook for improved
    readability and structure.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
  - Graph visualizations now allow exporting to a user-specified file path
    for more flexible output management.
  - The text embedding process has been enhanced with an additional
    tokenizer option for improved performance.
  - A new `ExtendableDataPoint` class has been introduced for future
    extensions.
  - New JSON files for companies and individuals have been added to
    facilitate testing and data processing.
- **Improvements**
  - Search functionality now uses updated identifiers for more reliable
    content retrieval.
  - Metadata handling has been streamlined across various classes by
    removing unnecessary type specifications.
  - Enhanced serialization of properties in the Neo4j adapter for improved
    handling of complex structures.
  - The setup process for databases has been improved with a new
    asynchronous setup function.
- **Chores**
  - Dependency and configuration updates improve overall stability and
    performance.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Description
Refactor search so that the query type no longer has to be provided,
which makes it simpler for new users.
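
A call signature along these lines would give that behavior. This is a hedged sketch only: `SearchType` is named elsewhere in this log, but the enum members shown, the parameter names `query_text` and `query_type`, the choice of default, and the returned dictionary are all assumptions for illustration.

```python
import asyncio
from enum import Enum


class SearchType(Enum):
    # Hypothetical subset of search types.
    GRAPH_COMPLETION = "GRAPH_COMPLETION"
    CHUNKS = "CHUNKS"
    SUMMARIES = "SUMMARIES"


async def search(
    query_text: str,
    *,
    query_type: SearchType = SearchType.GRAPH_COMPLETION,  # assumed default
) -> dict:
    """Report which search would run; a real version would call a retriever."""
    return {"query": query_text, "type": query_type.value}


# New users can omit the type entirely...
default_result = asyncio.run(search("What is a retriever?"))
# ...while others select one with an explicit keyword argument.
explicit_result = asyncio.run(
    search("What is a retriever?", query_type=SearchType.CHUNKS)
)
print(default_result["type"], explicit_result["type"])
```

Making `query_type` keyword-only keeps call sites readable and prevents the query text and the search type from being swapped by position.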
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Refactor**
  - Improved the search interface by standardizing parameter usage with
    explicit keyword arguments for specifying search types, enhancing
    clarity and consistency.
- **Tests**
  - Updated test cases and example integrations to align with the revised
    search parameters, ensuring consistent behavior and reliable validation
    of search outcomes.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Dependency Update**
  - Downgraded `mcp` package version from 1.2.0 to 1.1.3
  - Updated `cognee` dependency to include additional features with
    `cognee[codegraph]`
- **New Features**
  - Introduced a new tool, "codify", for transforming codebases into
    knowledge graphs
  - Enhanced the existing "search" tool to accept a new parameter for
    search type
- **Improvements**
  - Streamlined search functionality with a new modular approach
  - Added new asynchronous function for retrieving and formatting code
    parts
- **Documentation**
  - Updated import paths for `SearchType` in various modules and tests to
    reflect structural changes
- **Code Cleanup**
  - Removed legacy search module and associated classes/functions
  - Refined data transfer object classes for consistency and clarity
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
* feat: make tasks a configurable argument in the cognify function
* fix: add data points task
* eval on random samples instead of first couple
* Save and load contexts and answers
* Fix random seed usage and handle empty descriptions
* include insights search in cognee option
* create output dir if it doesn't exist
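
The evaluation changes listed above (seeded random sampling, saving and loading answers, creating the output directory on demand) can be sketched as follows. The helper names `sample_instances`, `save_answers`, and `load_answers`, the default seed, and the file name are hypothetical and not taken from the actual eval code.

```python
import json
import random
import tempfile
from pathlib import Path


def sample_instances(instances, num_samples, seed=42):
    """Pick a reproducible random subset instead of the first few items."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    return rng.sample(instances, min(num_samples, len(instances)))


def save_answers(answers, out_dir, filename="answers.json"):
    """Write answers as JSON, creating the output directory if needed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / filename
    path.write_text(json.dumps(answers, indent=2), encoding="utf-8")
    return path


def load_answers(path):
    """Read back answers previously written by save_answers."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


questions = [f"question-{i}" for i in range(10)]
picked = sample_instances(questions, 3)

with tempfile.TemporaryDirectory() as tmp:
    saved = save_answers({"picked": picked}, Path(tmp) / "eval" / "run1")
    restored = load_answers(saved)
```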
---------
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. JSON schema validation for datasets.
* Load dataset file by filename, outsource utilities
* restructure metric selection
* Add comprehensiveness, diversity and empowerment metrics
* add promptfoo as an option
* refactor RAG solution in eval
* LLM as a judge metrics implemented in a uniform way
* Use requests.get instead of wget
* clean up promptfoo config template
* minor fixes
* get promptfoo path instead of hardcoding
* minor fixes
* Add LLM as a judge prompts
* Support 4 different rag options in eval
* Minor refactor and logger usage
* feat: make tasks a configurable argument in the cognify function
* Run eval on a set of parameters and save results as json and png
* fix: add data points task
* script for running all param combinations
* enable context provider to get tasks as param
* bugfix in simple rag
* Incremental eval of cognee pipeline
* potential fix: single asyncio run
* temp fix: exclude insights
* Remove insights, have single asyncio run, refactor
* Include incremental eval in accepted paramsets
* minor fixes
* handle pipeline slices in utils
* Handle insights and customize search types
* Handle retrieved edges more safely
* bugfix
* fix simple rag
---------
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. JSON schema validation for datasets.
* Load dataset file by filename, outsource utilities
* restructure metric selection
* Add comprehensiveness, diversity and empowerment metrics
* add promptfoo as an option
* refactor RAG solution in eval
* LLM as a judge metrics implemented in a uniform way
* Use requests.get instead of wget
* clean up promptfoo config template
* minor fixes
* get promptfoo path instead of hardcoding
* minor fixes
* Add LLM as a judge prompts
* Support 4 different rag options in eval
* Minor refactor and logger usage
* feat: make tasks a configurable argument in the cognify function
* Run eval on a set of parameters and save results as json and png
* fix: add data points task
* script for running all param combinations
* enable context provider to get tasks as param
* bugfix in simple rag
* Incremental eval of cognee pipeline
* potential fix: single asyncio run
* temp fix: exclude insights
* Remove insights, have single asyncio run, refactor
* minor fixes
* handle pipeline slices in utils
* include all options in params json
---------
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. JSON schema validation for datasets.
* Load dataset file by filename, outsource utilities
* restructure metric selection
* Add comprehensiveness, diversity and empowerment metrics
* add promptfoo as an option
* refactor RAG solution in eval
* LLM as a judge metrics implemented in a uniform way
* Use requests.get instead of wget
* clean up promptfoo config template
* minor fixes
* get promptfoo path instead of hardcoding
* minor fixes
* Add LLM as a judge prompts
* Support 4 different rag options in eval
* Minor refactor and logger usage
* Run eval on a set of parameters and save results as json and png
* script for running all param combinations
* bugfix in simple rag
* potential fix: single asyncio run
* temp fix: exclude insights
* Remove insights, have single asyncio run, refactor
---------
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. JSON schema validation for datasets.
* Load dataset file by filename, outsource utilities
* restructure metric selection
* Add comprehensiveness, diversity and empowerment metrics
* add promptfoo as an option
* refactor RAG solution in eval
* LLM as a judge metrics implemented in a uniform way
* Use requests.get instead of wget
* clean up promptfoo config template
* minor fixes
* get promptfoo path instead of hardcoding
* minor fixes
* Add LLM as a judge prompts
* Support 4 different rag options in eval
* Minor refactor and logger usage