Commit graph

2211 commits

Author SHA1 Message Date
alekszievr
433264d4e4
feat: Add context evaluation to eval framework [COG-1366] (#586)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Introduced a class-based retrieval mechanism to enhance answer
generation with improved context extraction and completion.
- Added a new evaluation metric for contextual relevancy and an option
to enable context evaluation during the evaluation process.

- **Refactor**
- Transitioned from a function-based answer resolver to a more modular
retriever approach to improve extensibility.

- **Tests**
- Updated tests to align with the new answer generation and evaluation
process.

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Daniel Molnar <soobrosa@gmail.com>
Co-authored-by: Boris <boris@topoteretes.com>
2025-03-05 16:40:24 +01:00
lxobr
f033f733b5
feat: entity brute force triplet search [COG-1325] (#589)

## Description
- Refactored `brute_force_triplet_search`, extracting memory projection.
- Built **TripletSearchContextProvider** (extends
**BaseContextProvider**) to create a single memory projection and
perform a triplet search for each entity.
- Refactored `entity_completion` into **EntityCompletionRetriever**
(extends **BaseRetriever**).
- Added **SummarizedTripletSearchContextProvider** (extends
**TripletSearchContextProvider**) for an alternative summarized output
format.
- Developed and tested an example showcasing both context providers,
comparing raw triplets, summaries, and standard search results.
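
The provider hierarchy described above could look roughly like the sketch below. It is only an illustration: every class name ending in `Sketch`, the `_project` and `_format` helpers, and the in-memory edge list are assumptions, and a plain dictionary lookup stands in for the actual brute-force triplet search.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (source, relationship, target)


class ContextProviderSketch(ABC):
    """Hypothetical stand-in for a base context provider."""

    @abstractmethod
    def get_context(self, entities: List[str]) -> str: ...


class TripletContextProviderSketch(ContextProviderSketch):
    def __init__(self, edges: List[Triplet]) -> None:
        self.edges = edges
        self.projection_builds = 0  # counts how often the projection is built

    def _project(self) -> Dict[str, List[Triplet]]:
        """Build one in-memory index from node name to its triplets."""
        self.projection_builds += 1
        index: Dict[str, List[Triplet]] = {}
        for triplet in self.edges:
            source, _, target = triplet
            index.setdefault(source, []).append(triplet)
            if target != source:
                index.setdefault(target, []).append(triplet)
        return index

    def _format(self, triplets: List[Triplet]) -> str:
        return "\n".join(f"{s} -[{r}]-> {t}" for s, r, t in triplets)

    def get_context(self, entities: List[str]) -> str:
        projection = self._project()  # built once, reused for every entity
        found: List[Triplet] = []
        for entity in entities:
            for triplet in projection.get(entity, []):
                if triplet not in found:
                    found.append(triplet)
        return self._format(found)


class SummarizedTripletContextProviderSketch(TripletContextProviderSketch):
    """Same search, alternative (summarized) output format."""

    def _format(self, triplets: List[Triplet]) -> str:
        sources = sorted({s for s, _, _ in triplets})
        return f"{len(triplets)} related facts about: " + ", ".join(sources)


edges = [("Alice", "knows", "Bob"), ("Bob", "works_at", "Acme")]
raw_provider = TripletContextProviderSketch(edges)
raw_context = raw_provider.get_context(["Alice", "Bob"])
summary_context = SummarizedTripletContextProviderSketch(edges).get_context(["Alice", "Bob"])
```

Overriding only the formatting step keeps the search logic in one place, which mirrors the "extends" relationship listed above.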
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Enhanced text summarization now delivers clearer, more concise
overviews of search results.
- Improved search performance with optimized context retrieval and
memory reuse for faster, more reliable results.
- Introduced advanced entity-based completion for generating more
relevant, context-aware responses.

- **Refactor**
- Streamlined internal workflows and error handling to ensure a smoother
overall experience.

---------

Co-authored-by: Boris <boris@topoteretes.com>
2025-03-05 11:17:58 +01:00
Daniel Molnar
7bac2303cc
chore: Be explicit on extras to install in Docker (#598)

## Description
Be explicit on extras to install in Docker.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- **New Features**
- Introduced a configurable option to install only selected dependency
extras, allowing for a more tailored build experience.

- **Chores**
- Improved clarity in the build instructions regarding environment
configuration.

2025-03-04 17:18:57 +01:00
hajdul88
3e93dbe264
fix: add currying to question_answering_non_parallel (#602)
…l to avoid additional params


Introduces lambda currying in question answering non parallel function
to avoid unnecessary params
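
As a minimal sketch of the idea (strictly speaking, partial application through a lambda): the extra arguments are bound up front so the sequential loop only ever passes the question. The names `answer_questions` and `resolve_with_context` are hypothetical and not taken from the code base.

```python
from typing import Callable, List


def answer_questions(questions: List[str], answer_resolver: Callable[[str], str]) -> List[str]:
    """Call a one-argument resolver for every question, one after another."""
    return [answer_resolver(question) for question in questions]


def resolve_with_context(question: str, context: str, system_prompt: str) -> str:
    """Hypothetical resolver that needs extra parameters besides the question."""
    return f"[{system_prompt}] {question} (context: {context})"


# Bind the extra parameters once; the loop above never needs to know about them.
curried = lambda question: resolve_with_context(question, "docs", "be brief")

answers = answer_questions(["What is cognee?"], curried)
```

`functools.partial(resolve_with_context, context="docs", system_prompt="be brief")` would achieve the same binding without a lambda.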

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **Refactor**
- Streamlined the question-answering process for cleaner, more efficient
query handling.
- Updated the handling of parameters in the answer generation process,
allowing for a more dynamic integration of context.
- Simplified test setups by reducing the number of parameters involved
in the mock answer resolver.
2025-03-04 16:09:53 +01:00
Igor Ilic
cade574bbf
Change data models for gemini (#600)

## Description
Change Gemini adapter and data models so Gemini can use custom data
models

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Introduced provider-specific enhancements with updated data
representations, including improved node labeling and enriched summary
and description fields for graph displays.
- Improved configuration management by automatically loading environment
settings for better LLM operations.

- **Refactor**
- Streamlined response handling with a simplified approach for defining
output formats.
- Updated error handling by removing the try-except block for dotenv
imports.
2025-03-04 14:09:28 +01:00
hajdul88
5eef212668
Allowing parallel edges in graph projection when using graph completion search (#599)

## Description
Allows parallel edges in graph projection when using graph completion
search
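
To illustrate what allowing parallel edges means, here is a small sketch under stated assumptions: `project_edges` and its `allow_parallel` flag are hypothetical, and the duplicate check shown for the `False` case is only a guess at what a projection without parallel edges would do.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, str]  # (source, target, relationship_name)


def project_edges(
    edges: Iterable[Edge], allow_parallel: bool = True
) -> Dict[str, List[Tuple[str, str]]]:
    """Build an adjacency list; optionally keep several edges per node pair."""
    adjacency: Dict[str, List[Tuple[str, str]]] = {}
    seen_pairs = set()
    for source, target, relationship in edges:
        if not allow_parallel:
            # Assumed duplicate check: keep only the first edge per node pair.
            if (source, target) in seen_pairs:
                continue
            seen_pairs.add((source, target))
        adjacency.setdefault(source, []).append((target, relationship))
    return adjacency


edges = [("a", "b", "likes"), ("a", "b", "follows")]
with_parallel = project_edges(edges)
without_parallel = project_edges(edges, allow_parallel=False)
```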

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- **Refactor**
- Streamlined the process for updating connections within the
application’s graph. The update now ensures that every connection is
consistently recorded and propagated without performing duplicate
checks.

2025-03-04 12:37:26 +01:00
hajdul88
e3f3d49a3b
Feature/cog 1312 integrating evaluation framework into dreamify (#562)

## Description
This PR contains eval framework changes due to the autooptimizer
integration

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
  - Enhanced answer generation now returns structured answer details.
  - Search functionality accepts configurable prompt inputs.
  - Option to generate a metrics dashboard from evaluations.
  - Corpus building tasks now support adjustable chunk settings for
    greater flexibility.
  - New task retrieval functionality allows for flexible task
    configuration.
  - Introduced new methods for creating and managing metrics dashboards.

- **Refactor/Chore**
  - Streamlined API signatures and reorganized module interfaces for
    better consistency.
  - Updated import paths to reflect new module structure.

- **Tests**
  - Updated test scenarios to align with new configurations and parameter
    adjustments.
2025-03-03 19:55:47 +01:00
Boris Arzentar
933c7c86c2 version: v0.1.32 2025-03-03 19:17:55 +01:00
alekszievr
6d7a68dbba
Feat: Store descriptive metrics identified by pipeline run id [cog-1260] (#582)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Introduced a new analytic capability that calculates descriptive graph
metrics for pipeline runs when enabled.
- Updated the execution flow to include an option for activating the
graph metrics step.

- **Chores**
- Removed the previous mechanism for storing descriptive metrics to
streamline the system.

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2025-03-03 19:09:35 +01:00
Boris
10e4bfb6ab
fix: cognee mcp docker [COG-1470] (#595)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **Chores**
  - Enhanced deployment and build processes to improve system reliability
    and simplify dependency management.

- **New Features**
  - Added a new dependency (`uv>=0.6.3`) to support enhanced
    functionality.
  - Updated extra dependencies for `codegraph` to include the
    `transformers` library.
  - Improved logging on server startup for clearer operational feedback.

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-03-03 19:04:41 +01:00
Igor Ilic
9305f43d8e
Revert "feat: Change Cognee data models to work with Gemini [COG-1352]" (#596)
Reverts topoteretes/cognee#594

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Enhanced AI responses now deliver structured JSON output with clearly
defined sections, improving clarity and consistency.
- Standardized knowledge graph definitions provide a uniform
representation, simplifying integration and interpretation.



2025-03-03 17:52:51 +01:00
Igor Ilic
195685a44f
feat: Change Cognee data models to work with Gemini [COG-1352] (#594)

## Description
Change data models and Gemini adapter so it can run custom ontologies

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- **New Features**
- Improved AI response handling now provides more direct and reliable
output.
- Enhanced knowledge graph displays now include additional descriptive
details under advanced configurations.

- **Refactor**
- Streamlined processing logic reduces complexity and improves
consistency.
- Updated data structures now adapt automatically based on your AI
service configuration for a smoother experience.

2025-03-03 16:20:23 +01:00
lxobr
bee04cad86
Feat/cog 1331 modal run eval (#576)

## Description
- Split metrics dashboard into two modules: calculator (statistics) and
generator (visualization)
- Added aggregate metrics as a new phase in evaluation pipeline
- Created modal example to run multiple evaluations in parallel and
collect results into a single combined output
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Enhanced metrics reporting with improved visualizations, including
histogram and confidence interval plots.
- Introduced an asynchronous evaluation process that supports parallel
execution and streamlined result aggregation.
- Added new configuration options to control metrics calculation and
aggregated output storage.

- **Refactor**
- Restructured dashboard generation and evaluation workflows into a more
modular, maintainable design.
- Improved error handling and logging for better feedback during
evaluation processes.

- **Bug Fixes**
- Updated test cases to ensure accurate validation of the new dashboard
generation and metrics calculation functionalities.
2025-03-03 14:22:32 +01:00
Hande
8874ddad2e
feat: cog-1320 Minimal LLM-Based Entity Extraction (#590)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Introduced an expert entity extraction feature that extracts
significant named entities from text and provides structured output with
essential details.
- Rolled out customizable prompt templates for both system instructions
and user input to standardize the extraction process.
- Integrated a robust language model–based extractor with comprehensive
error handling to ensure reliable and consistent results.

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
2025-03-03 13:22:29 +01:00
Igor Ilic
2323fd0c94
feat: Add gemini ollama support for cognee-mcp [COG-1408] (#583)

## Description
Add gemini ollama support for cognee mcp

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Expanded the system’s capabilities by updating its underlying
integrations, providing enhanced functionality and performance
improvements for end-users.
2025-03-01 19:51:48 +01:00
lxobr
ca2cbfab91
feat: add direct llm eval adapter (#591)

## Description
• Created DirectLLMEvalAdapter - a lightweight alternative to DeepEval
for answer evaluation
• Added evaluation prompt files defining scoring criteria and format
• Made adapter selectable via evaluation_engine = "DirectLLM" in config,
supports "correctness" metric only
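
A direct LLM correctness check of this kind might be sketched as below. The prompt wording, the JSON reply format, the `direct_llm_correctness` function, and the `llm` callable are all assumptions made for illustration; only the 0 to 1 score with a short justification follows the description of this change.

```python
import json
from typing import Callable, Dict, Union

# Hypothetical prompt; the real prompt files are not reproduced here.
EVAL_PROMPT = (
    "Question: {question}\n"
    "Reference answer: {golden_answer}\n"
    "Model answer: {answer}\n"
    "Reply with JSON containing 'score' (0 to 1) and 'reason'."
)


def direct_llm_correctness(
    question: str, answer: str, golden_answer: str, llm: Callable[[str], str]
) -> Dict[str, Union[float, str]]:
    """Ask an LLM to grade an answer and clamp the score into [0, 1]."""
    raw_reply = llm(
        EVAL_PROMPT.format(question=question, golden_answer=golden_answer, answer=answer)
    )
    parsed = json.loads(raw_reply)
    score = min(1.0, max(0.0, float(parsed["score"])))
    return {"score": score, "reason": str(parsed.get("reason", ""))}


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real model call."""
    return '{"score": 1.4, "reason": "matches the reference"}'


result = direct_llm_correctness("Capital of France?", "Paris", "Paris", fake_llm)
```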
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- **New Features**
- Introduced a new evaluation method that compares model responses
against a reference answer using structured prompt templates. This
approach enables automated scoring (ranging from 0 to 1) along with
brief justifications.
  
- **Enhancements**
- Updated the configuration to clearly distinguish between evaluation
options, providing end-users with a more transparent and reliable
assessment process.

2025-03-01 19:50:20 +01:00
Vasilije
c496bb485c
feat: Draft ollama test (#566)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **Tests**
- Introduced new automated testing workflows for Ollama and Gemini,
triggered by pull requests and manual dispatch.
- The Ollama workflow sets up the service and executes a simple example
test to enhance continuous integration.
- Enhanced dependency update workflow with new triggers for push and
pull request events, and added an optional debug logging parameter.
- Added new capabilities for audio and image transcription within the
Ollama API adapter.

---------

Co-authored-by: Daniel Molnar <soobrosa@gmail.com>
2025-02-28 20:15:12 +01:00
lxobr
3d4312577e
fix: Use DataPoint instead of ExtendableDataPoint in get_all_subclasses (#588)

## Description
- Use DataPoint instead of ExtendableDataPoint when calling
get_all_subclasses in the get_triplets function of the
GraphCompletionRetriever
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- **Refactor**
- Updated the internal data handling for retrieving information,
ensuring a more consistent and reliable output for end-users.

2025-02-27 19:05:09 +01:00
Boris Arzentar
653f5e40dd version: v0.1.31 2025-02-27 18:19:34 +01:00
Boris
e8ab5b4797
fix: tiktoken upgrade (#587)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- **Chores**
- Removed an outdated internal tracking reference to streamline
maintenance.
- Upgraded a key dependency to its latest stable release, delivering
enhanced performance and reliability for a smoother user experience.

2025-02-27 18:16:11 +01:00
Daniel Molnar
d27f847753
Transition to new retrievers, update searches (#585)

## Description
Delete legacy search implementations after migrating to new retriever
classes

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Enhanced search and retrieval capabilities, providing improved context
resolution for code queries, completions, summaries, and graph
connections.
  
- **Refactor**
- Shifted to a modular, object-oriented approach that consolidates query
logic and streamlines error management for a more robust and scalable
experience.
  
- **Bug Fixes**
- Improved error handling for unsupported search types and retrieval
operations.
2025-02-27 15:25:24 +01:00
Igor Ilic
f9b6630024
chore: Add ollama optional depdendency (#584)

## Description
Add ollama optional dependency

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- **Chores**
- Updated the project’s dependency configuration to include an
additional optional package for enhanced transformation functionality.

2025-02-27 15:09:58 +01:00
lxobr
4b7c21d7d8
feat: retrieve golden contexts [COG-1364] (#579)

## Description
• Added load_golden_context parameter to BaseBenchmarkAdapter's abstract
load_corpus method, establishing a common interface for retrieving
supporting evidence
• Refactored HotpotQAAdapter with a modular design: introduced
_get_metadata_field_name method to handle dataset-specific fields
(making it extensible for child classes), implemented get golden context
functionality.
• Refactored TwoWikiMultihopAdapter to inherit from HotpotQAAdapter,
overriding only the necessary methods while reusing parent's
functionality
• Added golden context support to MusiqueQAAdapter with their
decomposition-based format
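
The adapter layering described above could be sketched like this. The class names, the item fields (`context`, `supporting_facts`, `evidences`), and the return shape of `load_corpus` are assumptions for illustration; only the `load_golden_context` flag and the `_get_metadata_field_name` hook come from the description.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class BenchmarkAdapterSketch(ABC):
    """Hypothetical base adapter with a shared golden-context switch."""

    @abstractmethod
    def load_corpus(
        self, items: List[Dict[str, Any]], load_golden_context: bool = False
    ) -> Tuple[List[str], List[Dict[str, Any]]]: ...


class HotpotLikeAdapter(BenchmarkAdapterSketch):
    def _get_metadata_field_name(self) -> str:
        return "supporting_facts"  # assumed dataset-specific field name

    def load_corpus(self, items, load_golden_context=False):
        corpus: List[str] = []
        qa_pairs: List[Dict[str, Any]] = []
        for item in items:
            corpus.extend(item["context"])
            pair = {"question": item["question"], "answer": item["answer"]}
            if load_golden_context:
                pair["golden_context"] = item[self._get_metadata_field_name()]
            qa_pairs.append(pair)
        return corpus, qa_pairs


class TwoWikiLikeAdapter(HotpotLikeAdapter):
    """Child adapter: overrides only the dataset-specific field name."""

    def _get_metadata_field_name(self) -> str:
        return "evidences"  # assumed field name


items = [{
    "question": "q", "answer": "a", "context": ["c1", "c2"],
    "supporting_facts": ["c1"], "evidences": ["c2"],
}]
corpus, hotpot_pairs = HotpotLikeAdapter().load_corpus(items, load_golden_context=True)
_, two_wiki_pairs = TwoWikiLikeAdapter().load_corpus(items, load_golden_context=True)
_, plain_pairs = HotpotLikeAdapter().load_corpus(items)
```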
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Introduced an option to include additional context during corpus
loading, enhancing the quality and flexibility of generated QA pairs.
- **Refactor**
- Streamlined and modularized the processing workflow across different
adapters for improved consistency and maintainability.
- Updated metadata extraction to refine the display of contextual
information.
- Shifted focus in the `TwoWikiMultihopAdapter` from corpus loading to
context extraction.
2025-02-27 13:25:47 +01:00
alekszievr
4c3c811c1e
test: eval_framework/evaluation unit tests [cog-1234] (#575)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **Tests**
- Added a suite of tests to validate evaluation logic under various
scenarios, including handling of valid inputs and error conditions.
- Introduced comprehensive tests verifying the accuracy of evaluation
metrics, ensuring reliable scoring and error management.
- Created a new test suite for the `DeepEvalAdapter`, covering
correctness, unsupported metrics, and error handling.
- Added unit tests for `ExactMatchMetric` and `F1ScoreMetric`,
parameterized for various test cases.
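
For orientation, exact match and token-level F1 are commonly defined as in the sketch below. This assumes the usual QA-benchmark definitions with simple lowercase and whitespace normalization; the actual `ExactMatchMetric` and `F1ScoreMetric` may normalize differently, and the function names here are hypothetical.

```python
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately simple normalization)."""
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> float:
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall."""
    pred_tokens, ref_tokens = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, the prediction "the cat sat" against the reference "the cat" shares two tokens, giving precision 2/3, recall 1, and F1 of 0.8.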
2025-02-27 13:24:47 +01:00
Igor Ilic
c9aee6fbf4
test: Add testing of cognee telemetry (#573)

## Description
Add testing of cognee telemetry

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **Tests**
- Introduced an automated testing process for telemetry components,
running unit tests across multiple environments to ensure consistent
performance. The workflow efficiently manages test execution and error
reporting, speeding up development cycles.

- **Chores**
- Enhanced dependency management and cleanup procedures, significantly
contributing to overall system stability, faster feedback cycles, and
improved release quality.
2025-02-27 13:23:16 +01:00
lxobr
9cc357ac1c
Feat/cog 1365 unify retrievers (#572)

## Description
- Created the `BaseRetriever` class to unify all the retrievers and
searches.
- Implemented seven specialized retrievers (summaries, chunks,
completions, graph, graph-summary, insights, code) with consistent
get_context/get_completion interfaces.
- Added json context dumping feature in the current completion
implementations to enable context comparisons.
- Built a comparison framework to validate old vs new implementations.
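
A unified retriever interface of this shape might look like the following sketch. The method names `get_context` and `get_completion` come from the description; the async signatures, the `BaseRetrieverSketch` and `KeywordChunksRetriever` classes, and the keyword matching are assumptions for illustration only.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class BaseRetrieverSketch(ABC):
    """Hypothetical common interface shared by all retrievers."""

    @abstractmethod
    async def get_context(self, query: str) -> Any: ...

    @abstractmethod
    async def get_completion(self, query: str, context: Optional[Any] = None) -> Any: ...


class KeywordChunksRetriever(BaseRetrieverSketch):
    """Toy retriever: returns stored chunks that share a word with the query."""

    def __init__(self, chunks: List[str]) -> None:
        self.chunks = chunks

    async def get_context(self, query: str) -> List[str]:
        words = set(query.lower().split())
        return [chunk for chunk in self.chunks if words & set(chunk.lower().split())]

    async def get_completion(self, query: str, context: Optional[Any] = None) -> Any:
        # Reuse a supplied context, otherwise fetch it through the same interface.
        if context is None:
            context = await self.get_context(query)
        return context


retriever = KeywordChunksRetriever(["graphs store edges", "vectors store embeddings"])
result = asyncio.run(retriever.get_completion("what are edges"))
```

Because every retriever exposes the same two methods, callers can swap implementations and compare their contexts without changing call sites.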
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Introduced multiple retrieval classes for enhanced search
capabilities, including `BaseRetriever`, `ChunksRetriever`,
`CodeRetriever`, `CompletionRetriever`, `GraphCompletionRetriever`,
`GraphSummaryCompletionRetriever`, `InsightsRetriever`, and
`SummariesRetriever`.
- Enhanced query completions with optional context saving for improved
data persistence.
- Implemented advanced tools to compare retrieval outcomes across
different implementations.

- **Refactor**
- Streamlined internal module organization and updated references for
increased maintainability and consistency.
- Added comments indicating future maintenance tasks related to code
merging.

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-02-27 12:13:21 +01:00
Boris Arzentar
86b34657aa version: v0.1.30 2025-02-26 21:48:59 +01:00
Boris Arzentar
c2c70a7d22 fix: remove postgres and neo4j from mcp setup 2025-02-26 20:30:16 +01:00
Boris Arzentar
8932a5868c fix: add missing system dependencies 2025-02-26 20:25:26 +01:00
Boris Arzentar
915384a944 fix: change context of docker build 2025-02-26 20:22:07 +01:00
Boris
711ae8e675
feat: codegraph improvements and new CODE search [COG-1351] (#581)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- **New Features**
  - Introduced an automated deployment workflow to build and push
    container images.
  - Updated dependency management to include additional database support.
- **Refactor**
  - Enhanced asynchronous operations and logging in the server for
    improved performance.
  - Optimized extraction and retrieval processes for code-related data.
- **Chores**
  - Streamlined build configurations and startup scripts for greater
    reliability.


---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
2025-02-26 20:15:02 +01:00
alekszievr
f6ced4122a
Test: test eval dashboard generation [COG-1234] (#570)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **Tests**
- Introduced a new test suite for validating the metrics dashboard
generation.
- Added tests for the `bootstrap_ci` function to ensure accurate
calculations and handling of various input scenarios.
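
For readers unfamiliar with the technique, a percentile bootstrap confidence interval can be sketched as follows. This is not the project's `bootstrap_ci`; the function name, its parameters, and the fixed seed are assumptions chosen to keep the example reproducible.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def bootstrap_ci_sketch(
    scores: Sequence[float],
    num_samples: int = 1000,
    confidence: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) using the percentile bootstrap."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    resampled_means: List[float] = sorted(
        mean(rng.choices(scores, k=len(scores))) for _ in range(num_samples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = resampled_means[int(tail * num_samples)]
    upper = resampled_means[min(int((1.0 - tail) * num_samples), num_samples - 1)]
    return mean(scores), lower, upper


constant_result = bootstrap_ci_sketch([0.5] * 10)
mixed_mean, mixed_lower, mixed_upper = bootstrap_ci_sketch([0, 1, 0, 1, 1, 0, 1, 1])
```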
2025-02-26 12:45:34 +01:00
Vasilije
4b777cf214
feat: add validation to llm env variables (#558)
… needed

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Implemented enhanced configuration validation for environment-based
settings. Now, if any configuration parameter is provided via the
environment, all required settings must be present. This improvement
helps catch misconfigurations early, reducing potential errors and
ensuring a smoother, more reliable user experience. These proactive
measures significantly enhance overall system stability and performance.
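
The all-or-nothing rule described above can be sketched as follows. The variable names in `REQUIRED_LLM_VARS` and the `validate_llm_env` function are hypothetical; the actual settings and validation hook may differ.

```python
from typing import Mapping

# Hypothetical variable names, used only for illustration.
REQUIRED_LLM_VARS = ("LLM_PROVIDER", "LLM_MODEL", "LLM_API_KEY")


def validate_llm_env(env: Mapping[str, str]) -> None:
    """Raise if the LLM settings are only partially provided via the environment."""
    provided = [name for name in REQUIRED_LLM_VARS if env.get(name)]
    missing = [name for name in REQUIRED_LLM_VARS if not env.get(name)]
    # No variables set means defaults apply; a partial set is a misconfiguration.
    if provided and missing:
        raise ValueError(f"Missing LLM settings: {', '.join(missing)}")
```

In practice the function would be called with `os.environ` at startup, so a half-configured environment fails early instead of at the first LLM call.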

---------

Co-authored-by: Boris <boris@topoteretes.com>
2025-02-26 06:44:45 +01:00
lxobr
1cb83312fe
feat: add experimental cognify pipeline [COG-1293] (#541)

## Description
- Integrate experimental tasks into the evaluation framework
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Introduced interactive prompt templates for extracting graph nodes,
edge triplets, and relationship names, resulting in more comprehensive
and accurate knowledge graphs.
- Added asynchronous processes to efficiently handle document data and
integrate graph components.
- Launched cascade graph task options to offer enhanced flexibility in
task management workflows.
- Added new functionality for extracting content nodes and relationship
names from text.

- **Refactor**
- Streamlined configurations for prompt processing and task
initialization, improving overall modularity and system stability.
- Updated task getter mechanisms to utilize function-based approaches
for improved flexibility.

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-02-25 16:14:27 +01:00
lxobr
55411ff44b
feat: entity completion skeleton [COG-1318] (#552)

## Description
- Modular implementation of entity completion search
- Added base classes that define entity extractors and context providers
- Created dummy implementations that return test data
- Set up adapters that let us switch between different entity extractors
and context providers using strings
- Added configuration class to control which implementations to use
- Entity completion: query → find entities → get context → interact with
LLM → return answer
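
The flow in the last bullet can be sketched end to end with dummy components, in the spirit of the test-data implementations mentioned above. All function names, the capitalized-word heuristic, and the `llm` callable are assumptions made for illustration.

```python
from typing import Callable, List


def extract_entities(query: str) -> List[str]:
    """Dummy extractor: treats capitalized words as entities."""
    return [word.strip("?.,") for word in query.split() if word[:1].isupper()]


def get_context(entities: List[str]) -> str:
    """Dummy context provider returning canned test data."""
    return "; ".join(f"{entity} is a known entity" for entity in entities)


def entity_completion_sketch(query: str, llm: Callable[[str], str]) -> str:
    """query -> find entities -> get context -> interact with LLM -> answer."""
    entities = extract_entities(query)
    if not entities:
        return "No entities found."
    context = get_context(entities)
    return llm(f"Context: {context}\nQuestion: {query}")


# A trivial stand-in for the LLM call: it simply upper-cases the prompt.
answer = entity_completion_sketch("where is Berlin?", llm=lambda prompt: prompt.upper())
```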
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Enhanced the query completion experience with integrated language
model response generation, improved validation, and robust error
handling.
- Introduced sample modules for context retrieval and entity extraction
that simulate key processing steps.
- Established foundational abstractions to support flexible context and
entity handling strategies.

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-02-25 16:07:48 +01:00
alekszievr
a788875117
test: answer generation [COG-1234] (#569)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Tests**
- Introduced a new asynchronous test to validate the answer generation
functionality, ensuring that generated responses align with the provided
question-answer pairs.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-25 12:21:36 +01:00
Vasilije
452eaf0735
Update README.md
Update .env handling
2025-02-24 22:56:18 +01:00
Boris
9a1e03e403
fix: simplify installation in readme (#577)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Documentation**
- Enhanced overall clarity and layout of the guide.
- Updated text alignment and visual elements, including an updated logo.
- Revised header hierarchy for a more intuitive reading experience.
- Added detailed installation instructions with specific database
support.
- Reorganized contributing guidelines and the code of conduct for
improved structure.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-24 20:36:22 +01:00
Igor Ilic
4f354ba534
fix: reuse PostgreSQL database connections (#574)
<!-- .github/pull_request_template.md -->

## Description
Fix PostgreSQL database connection problems

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Refactor**
- Improved the system’s database connection process to enhance
compatibility across multiple relational databases. The application now
dynamically selects the optimal connection method—reusing established
connections when possible—to ensure improved stability and performance
without affecting the public interface.
- Streamlined the creation of the embedding engine by removing it as a
parameter and generating it internally.
- Removed dependency on the embedding engine in the vector engine
retrieval process.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-24 20:35:40 +01:00
Vasilije
6e567445b5
Update README.md
2025-02-21 18:51:24 +01:00
alekszievr
a61df966c6
feat: use external chunker [cog-1354] (#551)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced a modular content chunking interface that offers flexible
text segmentation with configurable chunk size and overlap.
- Added new chunkers for enhanced text processing, including
`LangchainChunker` and improved `TextChunker`.
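
As a rough illustration of what "configurable chunk size and overlap" means, the sketch below splits text into word-based chunks that share a few words with their neighbours. It is not the `LangchainChunker` or `TextChunker` implementation: `chunk_words` is a hypothetical helper, and counting size in whitespace-separated words is an assumption made for the example.

```python
def chunk_words(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words. Hypothetical helper for
    illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

With `chunk_size=3` and `overlap=1`, the text `"a b c d e"` yields `["a b c", "c d e"]`: the word `c` is repeated so that context carries across the chunk boundary.
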

- **Refactor**
- Unified the chunk extraction mechanism across various document types
for improved consistency and type safety.
- Updated method signatures to enhance clarity and type safety regarding
chunker usage.
- Enhanced error handling and logging during text segmentation to guide
adjustments when content exceeds limits.

- **Bug Fixes**
- Adjusted expected output in tests to reflect changes in chunking logic
and configurations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-21 14:10:59 +01:00
hajdul88
eba1515127
feat: quick fix dynamic collection handling in search (#567) [COG-1369]
<!-- .github/pull_request_template.md -->

## Description
Fixes dynamic collection mapping in graph completion search

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Refactor**
- Adjusted graph processing to remove extraneous notifications when
expected data elements are absent.
- Updated query processing to ensure a more consistent selection of
related data types.
- Streamlined database error handling by aligning exception management
with standard practices.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-21 13:45:42 +01:00
SJ
fd3b15fb58
fix: entrypoint.sh to not fail on first docker up, improved handling of migrations, signals and errors. (#546)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->
In its current form, the entrypoint.sh script runs but fails with
exit code 3 on the first docker compose up. Running docker compose up
a second time does not throw the same error, and the application then
works fine. These changes improve the first-time user experience, as
well as signal handling, error handling, and database migrations.

Summary of Changes:
1- entrypoint.sh to not fail with exit code 3 on first docker up.
2- Improved error and signal handling with set -e.
3- Improved database migration, verification and error handling. Avoids
schema version mismatch and ensures db schema is always in sync with
application code.
4- Added exec before Gunicorn commands to ensure proper signal handling.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Improved error handling for smoother database migrations and startup.
- Updated process management to ensure reliable application launch.
- Optimized worker configuration and introduced a startup delay to
guarantee database readiness.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: soekja <soekja@users.noreply.github.com>
Co-authored-by: soekja <soekja@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-02-21 01:28:15 +01:00
alekszievr
28f92f661e
Test: Mock file download and open in musique adapter (#571)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Tests**
- Enhanced test coverage to improve adapter instantiation and data
loading reliability.
- Updated mock testing logic to ensure robust content handling.
- Removed an outdated test focused on data limit validation.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-20 16:11:19 +01:00
alekszievr
97db017708
Test: test corpus builder [cog-1234] (#564)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Enhanced the continuous integration workflows with updated dependency
management and environment configurations for improved test stability.
  
- **Tests**
- Added parameterized unit tests to verify corpus loading and structure,
ensuring more robust handling of test data.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-20 15:16:58 +01:00
alekszievr
17231de5d0
Test: Parse context pieces separately in MusiqueQAAdapter and adjust tests [cog-1234] (#561)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Tests**
- Updated evaluation checks by removing assertions related to the
relationship between `corpus_list` and `qa_pairs`, now focusing solely
on `qa_pairs` limits.

- **Refactor**
- Improved content processing to append each paragraph individually to
`corpus_list`, enhancing clarity in data structure.
- Simplified type annotations in the `load_corpus` method across
multiple adapters, ensuring consistency in return types.

- **Chores**
- Updated dependency installation commands in GitHub Actions workflows
for Python 3.10, 3.11, and 3.12 to include additional evaluation-related
dependencies.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-02-20 14:23:53 +01:00
lxobr
e25c7c93fe
fix: correctly add nodes to chunks [COG-1370] (#568)
<!-- .github/pull_request_template.md -->

## Description
- Fix expand_with_nodes_and_edges to correctly add nodes to chunks
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Refactor**
- Enhanced the internal processing for data associations to ensure more
reliable and consistent handling of connections.
- Streamlined the logic to better manage edge cases, improving overall
stability and error handling.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-20 12:52:34 +01:00
Igor Ilic
f2e0f47565
fix: test llm connection with gemini (#557)
<!-- .github/pull_request_template.md -->

## Description
Temporary fix for Gemini LLM until they allow empty dictionaries in
model schema definition

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- AI responses now adjust their format dynamically based on the type of
output, providing a streamlined text display when appropriate.
- Extended processing time improves the handling of longer operations
for a more reliable interaction.

- **Bug Fixes**
- Enhanced error management during connectivity tests ensures a more
robust and stable user experience.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Boris <boris@topoteretes.com>
2025-02-20 11:41:29 +01:00
Boris
45f7c63322
fix: notebooks errors (#565)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Automatically creates a blank graph when a file isn’t found, ensuring
smoother operations.
- Updated demonstration notebooks with dynamic configurations, including
refined search operations and input prompts.
- Introduced optional support for additional graph functionalities via
an integrated dependency.

- **Refactor**
- Streamlined processing by eliminating duplicate steps and simplifying
graph rendering workflows.

- **Chores**
- Updated environment configurations and upgraded the Python runtime for
improved performance and consistency.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-19 14:07:11 -08:00
Boris Arzentar
811e932cae
version: v0.1.29
2025-02-19 20:19:51 +01:00