Commit graph

396 commits

Author SHA1 Message Date
Vasilije
5aca3f091e
fix: Doesn't drop entire PG database, just cleans public schema - Cog 1947 (#760)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Boris <boris@topoteretes.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
2025-04-26 00:00:45 +02:00
Igor Ilic
f404386df5
fix: hotfix 0.1.38 (#765)

## Description
- db_engine was not gathered dynamically; with this fix, a change of the
system directory is handled correctly
- Added top_k to all search types
- Reduced delete test threshold 
- Updated MCP version and info

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Boris <boris@topoteretes.com>
2025-04-23 12:04:48 +02:00
Vasilije
bb7eaa017b
feat: Group DataPoints into NodeSets (#680)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
2025-04-19 20:21:04 +02:00
Boris
675b66175f
test: make search unit tests deterministic (#726)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Daniel Molnar <soobrosa@gmail.com>
2025-04-18 21:55:24 +02:00
Igor Ilic
ba2de9bb22
fix: HuggingFace tokenizer (#752)

## Description
Resolve issue noticed by [RyabykinIlya](https://github.com/RyabykinIlya)
where too many HuggingFace requests have been sent due to the embedding
engine not working as a singleton per config
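
A minimal sketch of the "singleton per config" idea described above, not the project's actual code: the `EmbeddingConfig` fields, the `EmbeddingEngine` class, and `get_embedding_engine` are hypothetical names chosen for illustration. It assumes the config can be made hashable, so that `functools.lru_cache` can return one engine per distinct config instead of building a new one (and re-fetching the tokenizer) on every call.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class EmbeddingConfig:
    # Hypothetical fields; frozen=True makes instances hashable.
    provider: str
    model: str


class EmbeddingEngine:
    # Counts constructions so the caching effect is observable.
    instances_created = 0

    def __init__(self, config: EmbeddingConfig) -> None:
        self.config = config
        # Stands in for costly setup such as downloading a tokenizer.
        EmbeddingEngine.instances_created += 1


@lru_cache(maxsize=None)
def get_embedding_engine(config: EmbeddingConfig) -> EmbeddingEngine:
    # Equal configs hash the same, so they share one cached engine.
    return EmbeddingEngine(config)
```

With this shape, two calls with equal configs return the same object, while a config that differs in any field gets its own engine.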

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Signed-off-by: Ryabykin Ilya <ryabykinia@sibur.ru>
Co-authored-by: greshish <ryabykinia@yandex.ru>
Co-authored-by: Ryabykin Ilya <ryabykinia@sibur.ru>
2025-04-17 17:07:36 +02:00
Daniel Molnar
9ba12b25ef
feat: add delete by document (#668)

## Description
Delete by document.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
2025-04-17 15:42:10 +02:00
hajdul88
0121a2b5fc
feature: Adds S3 functionality (#731)

## Description
Adds S3 support


## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-04-17 08:56:40 +02:00
Igor Ilic
a036787ad1
Embedding string fix [COG-1900] (#742)

## Description
Allow embedding of big strings to support full row embedding in SQL
databases

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-04-16 22:39:06 +02:00
Vasilije
4e9ca94e78
feat: Adding rate limiting (#709)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2025-04-16 12:03:46 +02:00
Igor Ilic
22b363b297
tests: Add gh action to test relational db migration [COG-1591] (#718)

## Description
Add relational db migration action 

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-04-11 14:02:44 +02:00
Igor Ilic
c4a6c94675
fix: Resolve duplicate chunk issue for PGVector [COG-895] (#705)

## Description
Resolve issues with duplicate chunks for PGVector

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-04-07 18:03:36 +02:00
Igor Ilic
f4856b4413
Mcp add search (#702)

## Description
- Fix Ollama endpoint issue
- Fix COMPLETION and GRAPH COMPLETION MCP use

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-04-04 19:11:07 +02:00
lxobr
8207dc8643
feat: make graph creation prompt configurable (#686)

## Description
- Added new graph creation prompts
- Exposed graph creation prompts in .cognify via get_default tasks
- Exposed graph creation prompts in eval framework
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-04-03 11:14:33 +02:00
James
edea54c5c3
fix: convert file path to str (#693)
## Description

Fix an error where the file path was an int and therefore had no `.split` method; it is not clear why the path is an int.

## DCO Affirmation

I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-04-02 12:35:19 +02:00
Boris
daed8d51f5
fix: add pipeline_name to PipelineRun and change logging default to ERROR (#675)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-03-29 14:55:34 +01:00
Dmitrii Galkin
de5b7f2044
feat: Natural Language Retriever (text2cypher) (#663)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin

I added one example "get all connected nodes to entity"

---------

Co-authored-by: Boris <boris@topoteretes.com>
2025-03-27 15:44:39 +01:00
Igor Ilic
9f587a01a4
feat: Relational db to graph db [COG-1468] (#644)

## Description
Add ability to migrate relational database to graph database

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
2025-03-26 11:40:06 +01:00
Daniel Molnar
73db1a5a53
fix: human readable logs (#658)

## Description
Introducing structlog.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
2025-03-25 11:54:40 +01:00
Boris
d192d1fe20
chore: remove unused dependencies and make some optional (#661)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
2025-03-25 10:19:52 +01:00
hajdul88
24e0805f50
chore: deletes error log when there is no collection. Using dynamic c… (#651)
…ollection handling its not an error


## Description
Deletes error logging from ChromaDB adapter

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- **Refactor**
- Updated internal error handling to ensure more consistent responses
during unforeseen issues. This change streamlines the system’s approach
to managing errors, reducing unnecessary internal error logs while
maintaining reliable operations and a stable user experience. These
refinements contribute to improved system stability and efficient error
management. Internal operations are now better optimized to handle
unexpected scenarios gracefully.

2025-03-18 11:17:23 +01:00
Daniel Molnar
69950a04dd
feat: Kuzu integration (#628)

## Description
Let's scope it out.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Introduced support for the Kuzu graph database provider, enhancing
graph operations and data management capabilities.
- Added a comprehensive adapter for Kuzu, facilitating various graph
database operations.
- Expanded the enumeration of graph database types to include Kuzu.

- **Tests**
- Launched comprehensive asynchronous tests to validate the new Kuzu
graph integration’s performance and reliability.

- **Chores**
- Updated dependency settings and continuous integration workflows to
include the Kuzu provider, ensuring smoother deployments and improved
system quality.
- Enhanced configuration documentation to clarify Kuzu database
requirements.
- Modified Dockerfile to include Kuzu in the installation extras.

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2025-03-13 17:47:09 +01:00
Dmitrii Galkin
e147fa5bde
feat: Add support for ChromaDB (#622)

## Description

# Add Support for ChromaDB

## Summary
This PR adds support for ChromaDB as a vector database option in the
Cognee application. ChromaDB is a modern, open-source embedding database
designed for AI applications.

## Changes
- Created a new ChromaDBAdapter implementation for vector database
operations
- Added comprehensive test suite for ChromaDB functionality
- Updated docker-compose.yml to include ChromaDB service
- Modified environment configuration to support ChromaDB settings
- Updated vector engine creation logic to support ChromaDB as an option

## Technical Details
- Implemented `ChromaDBAdapter.py` (347 lines) with full CRUD operations
for vector data
- Created test suite (`test_chromadb.py`) with 171 lines of test
coverage
- Updated vector engine creation process to dynamically select ChromaDB
when configured
- Modified settings router to accommodate new database option
- Updated environment template with ChromaDB configuration options

## Docker Changes
- Added ChromaDB service to docker-compose.yml with appropriate
configuration

This PR enhances Cognee's flexibility by providing an alternative vector
database option, allowing users to choose the most appropriate database
for their specific use case.



## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin

Tested with UI + tests.


## Summary by CodeRabbit

- **New Features**
- Expanded vector database integration by adding support for ChromaDB,
enabling enhanced data management and search functionalities.
- **Tests**
- Added automated tests to validate the ChromaDB integration and related
operations.
- **Chores**
- Updated configuration guidance and dependency management to include
ChromaDB.
- Provided an optional container deployment template for ChromaDB.
- Added a new entry to ignore the `.chromadb_data/` directory in version
control.
- Introduced a new GitHub Actions workflow for testing ChromaDB
integration.

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-03-13 15:13:04 +01:00
Boris
5345626e6a
fix: add proper node labels (#607)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Improved backend data organization with automatic categorization of
stored items for enhanced search and retrieval.
- Launched a product recommendation system that analyzes customer data
and preferences to suggest top products.
- Introduced a sample dataset showcasing customer profiles, preferences,
and product interactions for demonstration purposes.
2025-03-06 13:30:13 +01:00
Igor Ilic
cade574bbf
Change data models for gemini (#600)

## Description
Change Gemini adapter and data models so Gemini can use custom data
models

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Introduced provider-specific enhancements with updated data
representations, including improved node labeling and enriched summary
and description fields for graph displays.
- Improved configuration management by automatically loading environment
settings for better LLM operations.

- **Refactor**
- Streamlined response handling with a simplified approach for defining
output formats.
- Updated error handling by removing the try-except block for dotenv
imports.
2025-03-04 14:09:28 +01:00
Igor Ilic
9305f43d8e
Revert "feat: Change Cognee data models to work with Gemini [COG-1352]" (#596)
Reverts topoteretes/cognee#594

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Enhanced AI responses now deliver structured JSON output with clearly
defined sections, improving clarity and consistency.
- Standardized knowledge graph definitions provide a uniform
representation, simplifying integration and interpretation.



2025-03-03 17:52:51 +01:00
Igor Ilic
195685a44f
feat: Change Cognee data models to work with Gemini [COG-1352] (#594)

## Description
Change data models and Gemini adapter so it can run custom ontologies

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- **New Features**
- Improved AI response handling now provides more direct and reliable
output.
- Enhanced knowledge graph displays now include additional descriptive
details under advanced configurations.

- **Refactor**
- Streamlined processing logic reduces complexity and improves
consistency.
- Updated data structures now adapt automatically based on your AI
service configuration for a smoother experience.

2025-03-03 16:20:23 +01:00
Hande
8874ddad2e
feat: cog-1320 Minimal LLM-Based Entity Extraction (#590)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Introduced an expert entity extraction feature that extracts
significant named entities from text and provides structured output with
essential details.
- Rolled out customizable prompt templates for both system instructions
and user input to standardize the extraction process.
- Integrated a robust language model–based extractor with comprehensive
error handling to ensure reliable and consistent results.

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
2025-03-03 13:22:29 +01:00
lxobr
ca2cbfab91
feat: add direct llm eval adapter (#591)

## Description
• Created DirectLLMEvalAdapter - a lightweight alternative to DeepEval
for answer evaluation
• Added evaluation prompt files defining scoring criteria and format
• Made adapter selectable via evaluation_engine = "DirectLLM" in config,
supports "correctness" metric only
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- **New Features**
- Introduced a new evaluation method that compares model responses
against a reference answer using structured prompt templates. This
approach enables automated scoring (ranging from 0 to 1) along with
brief justifications.
  
- **Enhancements**
- Updated the configuration to clearly distinguish between evaluation
options, providing end-users with a more transparent and reliable
assessment process.

2025-03-01 19:50:20 +01:00
Vasilije
c496bb485c
feat: Draft ollama test (#566)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **Tests**
- Introduced new automated testing workflows for Ollama and Gemini,
triggered by pull requests and manual dispatch.
- The Ollama workflow sets up the service and executes a simple example
test to enhance continuous integration.
- Enhanced dependency update workflow with new triggers for push and
pull request events, and added an optional debug logging parameter.
- Added new capabilities for audio and image transcription within the
Ollama API adapter.

---------

Co-authored-by: Daniel Molnar <soobrosa@gmail.com>
2025-02-28 20:15:12 +01:00
Boris
711ae8e675
feat: codegraph improvements and new CODE search [COG-1351] (#581)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- **New Features**
- Introduced an automated deployment workflow to build and push
container images.
- Updated dependency management to include additional database support.
- **Refactor**
- Enhanced asynchronous operations and logging in the server for
improved performance.
- Optimized extraction and retrieval processes for code-related data.
- **Chores**
- Streamlined build configurations and startup scripts for greater
reliability.


---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
2025-02-26 20:15:02 +01:00
Vasilije
4b777cf214
feat: add validation to llm env variables (#558)
… needed


## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Implemented enhanced configuration validation for environment-based
settings. Now, if any configuration parameter is provided via the
environment, all required settings must be present. This improvement
helps catch misconfigurations early, reducing potential errors and
ensuring a smoother, more reliable user experience. These proactive
measures significantly enhance overall system stability and performance.
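
The all-or-nothing rule described above could look roughly like the sketch below. It is an illustration under assumptions, not the project's implementation: the function name `validate_llm_env` and the variable list `REQUIRED_LLM_VARS` are hypothetical.

```python
import os

# Hypothetical list of required settings, for illustration only.
REQUIRED_LLM_VARS = ("LLM_PROVIDER", "LLM_MODEL", "LLM_API_KEY")


def validate_llm_env(env=None, required=REQUIRED_LLM_VARS):
    """Return True if all settings come from the environment, False if none do.

    Raises ValueError when only some of the required settings are present.
    """
    env = os.environ if env is None else env
    provided = [name for name in required if env.get(name)]
    missing = [name for name in required if not env.get(name)]
    if provided and missing:
        raise ValueError(
            "Incomplete LLM configuration: set "
            + ", ".join(missing)
            + " or unset "
            + ", ".join(provided)
        )
    return bool(provided)
```

Passing a plain dict as `env` keeps the check easy to test without touching the real process environment.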

---------

Co-authored-by: Boris <boris@topoteretes.com>
2025-02-26 06:44:45 +01:00
lxobr
1cb83312fe
feat: add experimental cognify pipeline [COG-1293] (#541)

## Description
- Integrate experimental tasks into the evaluation framework
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Introduced interactive prompt templates for extracting graph nodes,
edge triplets, and relationship names, resulting in more comprehensive
and accurate knowledge graphs.
- Added asynchronous processes to efficiently handle document data and
integrate graph components.
- Launched cascade graph task options to offer enhanced flexibility in
task management workflows.
- Added new functionality for extracting content nodes and relationship
names from text.

- **Refactor**
- Streamlined configurations for prompt processing and task
initialization, improving overall modularity and system stability.
- Updated task getter mechanisms to utilize function-based approaches
for improved flexibility.

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-02-25 16:14:27 +01:00
lxobr
55411ff44b
feat: entity completion skeleton [COG-1318] (#552)

## Description
- Modular implementation of entity completion search
- Added base classes that define entity extractors and context providers
- Created dummy implementations that return test data
- Set up adapters that let us switch between different entity extractors
and context providers using strings
- Added configuration class to control which implementations to use
- Entity completion: query → find entities → get context → interact with
LLM → return answer
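
The flow listed above (query, entities, context, LLM, answer) can be sketched as follows. Every class and function name here is hypothetical, and the sketch is synchronous for brevity; it is not the code added in this PR and may differ from it in shape.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class BaseEntityExtractor(ABC):
    @abstractmethod
    def extract_entities(self, query: str) -> List[str]:
        """Return the entity names found in the query."""


class BaseContextProvider(ABC):
    @abstractmethod
    def get_context(self, entities: List[str]) -> str:
        """Return context text for the given entities."""


class DummyEntityExtractor(BaseEntityExtractor):
    def extract_entities(self, query: str) -> List[str]:
        return ["Albert Einstein"]  # fixed test data


class DummyContextProvider(BaseContextProvider):
    def get_context(self, entities: List[str]) -> str:
        return "; ".join(f"{name}: test context" for name in entities)


# String keys select the implementation, as the description suggests.
EXTRACTORS: Dict[str, Type[BaseEntityExtractor]] = {"dummy": DummyEntityExtractor}
PROVIDERS: Dict[str, Type[BaseContextProvider]] = {"dummy": DummyContextProvider}


def entity_completion(
    query: str,
    llm: Callable[[str], str],
    extractor: str = "dummy",
    provider: str = "dummy",
) -> str:
    entities = EXTRACTORS[extractor]().extract_entities(query)
    context = PROVIDERS[provider]().get_context(entities)
    # The LLM is any callable that maps a prompt to an answer.
    return llm(f"Context: {context}\nQuestion: {query}")
```

Keeping the LLM as a plain callable lets the pipeline be exercised with a stub before a real model is wired in.
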
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Enhanced the query completion experience with integrated language
model response generation, improved validation, and robust error
handling.
- Introduced sample modules for context retrieval and entity extraction
that simulate key processing steps.
- Established foundational abstractions to support flexible context and
entity handling strategies.

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-02-25 16:07:48 +01:00
Igor Ilic
4f354ba534
fix: reuse PostgreSQL database connections (#574)

## Description
Fix PostgreSQL database connection problems

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **Refactor**
- Improved the system’s database connection process to enhance
compatibility across multiple relational databases. The application now
dynamically selects the optimal connection method—reusing established
connections when possible—to ensure improved stability and performance
without affecting the public interface.
- Streamlined the creation of the embedding engine by removing it as a
parameter and generating it internally.
- Removed dependency on the embedding engine in the vector engine
retrieval process.
2025-02-24 20:35:40 +01:00
hajdul88
eba1515127
feat: quick fix dynamic collection handling in search (#567) [COG-1369]

## Description
Fixes search dynamic collection mapping in graph completion search

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **Refactor**
- Adjusted graph processing to remove extraneous notifications when
expected data elements are absent.
- Updated query processing to ensure a more consistent selection of
related data types.
- Streamlined database error handling by aligning exception management
with standard practices.
2025-02-21 13:45:42 +01:00
Igor Ilic
f2e0f47565
fix: test llm connection with gemini (#557)

## Description
Temporary fix for Gemini LLM until they allow empty dictionaries in
model schema definition

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- **New Features**
- AI responses now adjust their format dynamically based on the type of
output, providing a streamlined text display when appropriate.
- Extended processing time improves the handling of longer operations
for a more reliable interaction.

- **Bug Fixes**
- Enhanced error management during connectivity tests ensures a more
robust and stable user experience.


---------

Co-authored-by: Boris <boris@topoteretes.com>
2025-02-20 11:41:29 +01:00
Boris
45f7c63322
fix: notebooks errors (#565)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- **New Features**
- Automatically creates a blank graph when a file isn’t found, ensuring
smoother operations.
- Updated demonstration notebooks with dynamic configurations, including
refined search operations and input prompts.
- Introduced optional support for additional graph functionalities via
an integrated dependency.

- **Refactor**
- Streamlined processing by eliminating duplicate steps and simplifying
graph rendering workflows.

- **Chores**
- Updated environment configurations and upgraded the Python runtime for
improved performance and consistency.

2025-02-19 14:07:11 -08:00
alekszievr
e56d86b410
feat: Implement optional neo4j metrics and improve tests [cog-1262] (#556)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
- Enhanced graph analytics now offer detailed metrics—including shortest
path lengths, diameter, and clustering coefficients—to provide deeper
insights.
- Added new functions for creating connected test graphs and validating
metrics against predefined ground truth values.
- Introduced a new JSON file containing metrics for connected and
disconnected graph structures.

- **Improvements**
- Updated how graphs are projected to consistently use undirected
representations, ensuring more accurate and reliable metric
calculations.
- Streamlined metric consistency checks across different graph
processing methods for robust, reliable results.
- Simplified testing logic by consolidating metric assertions into a
single function call.
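
For readers unfamiliar with the metrics named above, here is a small pure-Python sketch of how diameter, average shortest path length, and average clustering coefficient can be computed on an undirected projection of an edge list. It is not the Neo4j-based implementation from this PR; the function names are hypothetical, and the path metrics assume a connected graph with at least two nodes and no self-loops.

```python
from collections import deque
from itertools import combinations


def to_undirected(edges):
    """Project directed (source, target) pairs onto an undirected adjacency map."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter_and_avg_path(adj):
    """Longest and mean shortest path over all ordered node pairs."""
    lengths = []
    for source in adj:
        distances = bfs_distances(adj, source)
        lengths.extend(d for node, d in distances.items() if node != source)
    return max(lengths), sum(lengths) / len(lengths)


def avg_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for neighbors in adj.values():
        k = len(neighbors)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)
```

Small hand-checked graphs, such as a triangle with one extra tail node, make convenient ground truth for validating metric code of this kind.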

- **Chores**
- Removed unnecessary secret variables from the workflow configuration,
potentially affecting access to certain resources.
- Updated secret management to include the new `OPENAI_API_KEY`.
2025-02-19 16:24:59 +01:00
hajdul88
0bcaf5c477
Feature/cog 1358 local ollama model support for cognee (#555)

This PR contains the ollama specific llm adapter together with the
embedding engine.

Tested with the following models:

`LLM_API_KEY="ollama"
llm_model = "llama3.1:8b"
LLM_PROVIDER = "ollama"
llm_endpoint = "http://localhost:11434/v1"
EMBEDDING_PROVIDER="ollama"
EMBEDDING_MODEL="avr/sfr-embedding-mistral:latest"
EMBEDDING_ENDPOINT="http://localhost:11434/api/embeddings"
EMBEDDING_DIMENSIONS=4096
HUGGINGFACE_TOKENIZER="Salesforce/SFR-Embedding-Mistral"`

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- **New Features**
- Introduced a new embedding option that leverages an external provider
for asynchronous text processing.
- Added enhanced language model integration using a dedicated adapter to
improve interaction quality.

- **Enhancements**
- Expanded configuration settings to include a new tokenizer option.
- Updated provider selection logic to incorporate the additional
embedding and language model features.


---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: vasilije <vas.markovic@gmail.com>
2025-02-19 02:54:04 +01:00
alekszievr
4efdb29187
Summarize retrieved edges to compact string [COG-1181] (#522)
<!-- .github/pull_request_template.md -->

## Description
Summarize retrieved edges into a compact string with no redundancies.
Example:
**Before summarization:**


CV example:

visual innovations -- employs -- visual innovations
---

CV 4: Not Relevant
Name: David Thompson
Contact Information:

Email: david.thompson@example.com
Phone: (555) 456-7890
Summary:

Creative Graphic Designer with over 8 years of experience in visual
design and branding. Proficient in Adobe Creative Suite and passionate
about creating compelling visuals.

Education:

B.F.A. in Graphic Design, Rhode Island School of Design (2012)
Experience:

Senior Graphic Designer, CreativeWorks Agency (2015 – Present)
Led design projects for clients in various industries.
Created branding materials that increased client engagement by 30%.
Graphic Designer, Visual Innovations (2012 – 2015)
Designed marketing collateral, including brochures, logos, and websites.
Collaborated with the marketing team to develop cohesive brand
strategies.
Skills:

Design Software: Adobe Photoshop, Illustrator, InDesign
Web Design: HTML, CSS
Specialties: Branding and Identity, Typography
 -- contains -- creativeworks agency
---

CV 4: Not Relevant
Name: David Thompson
Contact Information:

Email: david.thompson@example.com
Phone: (555) 456-7890
Summary:

Creative Graphic Designer with over 8 years of experience in visual
design and branding. Proficient in Adobe Creative Suite and passionate
about creating compelling visuals.

Education:

B.F.A. in Graphic Design, Rhode Island School of Design (2012)
Experience:

Senior Graphic Designer, CreativeWorks Agency (2015 – Present)
Led design projects for clients in various industries.
Created branding materials that increased client engagement by 30%.
Graphic Designer, Visual Innovations (2012 – 2015)
Designed marketing collateral, including brochures, logos, and websites.
Collaborated with the marketing team to develop cohesive brand
strategies.
Skills:

Design Software: Adobe Photoshop, Illustrator, InDesign
Web Design: HTML, CSS
Specialties: Branding and Identity, Typography
 -- contains -- visual innovations
---

CV 4: Not Relevant
Name: David Thompson
Contact Information:

Email: david.thompson@example.com
Phone: (555) 456-7890
Summary:

Creative Graphic Designer with over 8 years of experience in visual
design and branding. Proficient in Adobe Creative Suite and passionate
about creating compelling visuals.

Education:

B.F.A. in Graphic Design, Rhode Island School of Design (2012)
Experience:

Senior Graphic Designer, CreativeWorks Agency (2015 – Present)
Led design projects for clients in various industries.
Created branding materials that increased client engagement by 30%.
Graphic Designer, Visual Innovations (2012 – 2015)
Designed marketing collateral, including brochures, logos, and websites.
Collaborated with the marketing team to develop cohesive brand
strategies.
Skills:

Design Software: Adobe Photoshop, Illustrator, InDesign
Web Design: HTML, CSS
Specialties: Branding and Identity, Typography
 -- contains -- rhode island school of design
---
Experienced Graphic Designer with over 8 years in visual design and
branding, specializing in Adobe Creative Suite and enthusiastic about
producing engaging visuals. -- made_from --
CV 4: Not Relevant
Name: David Thompson
Contact Information:

Email: david.thompson@example.com
Phone: (555) 456-7890
Summary:

Creative Graphic Designer with over 8 years of experience in visual
design and branding. Proficient in Adobe Creative Suite and passionate
about creating compelling visuals.

Education:

B.F.A. in Graphic Design, Rhode Island School of Design (2012)
Experience:

Senior Graphic Designer, CreativeWorks Agency (2015 – Present)
Led design projects for clients in various industries.
Created branding materials that increased client engagement by 30%.
Graphic Designer, Visual Innovations (2012 – 2015)
Designed marketing collateral, including brochures, logos, and websites.
Collaborated with the marketing team to develop cohesive brand
strategies.
Skills:

Design Software: Adobe Photoshop, Illustrator, InDesign
Web Design: HTML, CSS
Specialties: Branding and Identity, Typography

**After summarization:**

David Thompson is a Creative Graphic Designer with over 8 years of
experience in visual design and branding, proficient in Adobe Creative
Suite and passionate about creating compelling visuals. He holds a
B.F.A. in Graphic Design from the Rhode Island School of Design (2012).
His experience includes working as a Senior Graphic Designer at
CreativeWorks Agency (2015 – Present), where he led design projects and
created branding materials that increased client engagement by 30%, and
as a Graphic Designer at Visual Innovations (2012 – 2015), where he
designed marketing collateral and collaborated with the marketing team
to develop cohesive brand strategies. His skills include design software
such as Adobe Photoshop, Illustrator, and InDesign, as well as web
design in HTML and CSS, with specialties in Branding and Identity and
Typography.

1. David Thompson employs his skills in visual design and branding.
2. David Thompson contains experience from CreativeWorks Agency.
3. David Thompson contains experience from Visual Innovations.
4. David Thompson made his qualifications from the Rhode Island School
of Design.
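
The redundancy in the "before" example comes from repeating the full source node text for every edge. The sketch below shows a deterministic way to remove that repetition by grouping edges under their source node. It is not the summarization this PR implements (the example output above is natural-language text); `compact_edges` and its `(source, relation, target)` tuple input are assumptions made for illustration.

```python
def compact_edges(edges):
    """Group (source, relation, target) triples so each source appears once."""
    grouped = {}
    for source, relation, target in edges:
        relations = grouped.setdefault(source.strip(), [])
        pair = (relation.strip(), target.strip())
        # Skip exact duplicate edges while keeping first-seen order.
        if pair not in relations:
            relations.append(pair)

    blocks = []
    for source, relations in grouped.items():
        lines = [source]
        lines.extend(f"  -- {rel} -- {tgt}" for rel, tgt in relations)
        blocks.append("\n".join(lines))
    # Reuse the '---' separator that the example above places between edges.
    return "\n---\n".join(blocks)
```

With the CV example, the long CV text would be emitted once, followed by one short line per `contains` edge.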

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Introduced a summarization engine that converts relationship-based
    inputs into concise, natural sentences.
  - Expanded search capabilities with a new query option that generates
    graph summaries, providing insightful and aggregated results from graph
    data.
  - Enhanced asynchronous processing for improved performance in handling
    graph data queries and summarization.
  - Added flexibility in specifying string conversion methods for graph
    edge retrieval.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Boris <boris@topoteretes.com>
2025-02-18 17:29:55 +01:00
Boris
f9e6dcf837
fix: simplify code pipeline (#529)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit


- **New Features**
  - Enhanced code search and dependency analysis for improved accuracy.
  - Introduced a new high-performance text embedding option.
  - Added an additional execution entry point for code graph processing.
  - New optional parameters for flexible property selection in retrieval
    functions.
  - Introduced new classes for handling import statements, function
    definitions, and class definitions.
  - Updated embedding engine selection based on configuration options.

- **Bug Fixes**
  - Improved error handling in search operations and database queries for
    a more stable user experience.
  - Enhanced error logging for source code parsing.

- **Refactor**
  - Streamlined asynchronous processing and refactored internal dependency
    extraction.
  - Updated configuration and integration settings to enhance overall
    reliability.
  - Restructured functions for simplified dependency handling.

- **Chores**
  - Upgraded and reorganized dependency management with optional libraries
    for extended functionality.
  - Added new secret parameters for embedding configuration in workflow
    settings.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: vasilije <vas.markovic@gmail.com>
2025-02-12 23:58:48 +01:00
hajdul88
1b630366c9
Adds types property to pydantic Datapoint inherited classes (#523)
<!-- .github/pull_request_template.md -->

## Description
This PR adds types to the `DataPoint` pydantic class and fixes
visualization colors.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Added a `type` field to the `DataPoint` model for clearer data
    classification.
  - Enhanced color mapping in visualizations by assigning a distinct color
    to "TextSummary" nodes.

- **Refactor**
  - Improved default settings for version control and ordering to ensure
    consistent data behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-11 19:23:19 +01:00
hajdul88
6a0c0e3ef8
feat: Cognee evaluation framework development (#498)
<!-- .github/pull_request_template.md -->

This PR contains the evaluation framework for cognee.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Expanded evaluation framework now integrates asynchronous corpus
    building, question answering, and performance evaluation with adaptive
    benchmarks for improved metrics (correctness, exact match, and F1
    score).

- **Infrastructure**
  - Added database integration for persistent storage of questions,
    answers, and metrics.
  - Launched an interactive metrics dashboard featuring advanced
    visualizations.
  - Introduced an automated testing workflow for continuous quality
    assurance.

- **Documentation**
  - Updated guidelines for generating concise, clear answers.
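
The summary above names exact match and F1 among the metrics. As a sketch of what these usually mean in question answering, the code below uses the common token-overlap definitions. The normalization rules and function names are assumptions; the framework in this PR may compute the metrics differently.

```python
import re
from collections import Counter


def normalize(text: str) -> list:
    # Lowercase, replace punctuation with spaces, and split into tokens.
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def exact_match(prediction: str, reference: str) -> float:
    # 1.0 only when the normalized token sequences are identical.
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Exact match rewards only fully correct answers, while F1 gives partial credit for overlapping tokens, which is why the two are typically reported together.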
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-11 16:31:54 +01:00
Boris
8f84713b54
fix: support structured data conversion to data points (#512)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
  - Introduced version tracking and enhanced metadata in core data models
    for improved data consistency.

- **Bug Fixes**
  - Improved error handling during graph data loading to prevent
    disruptions from unexpected identifier formats.

- **Refactor**
  - Centralized identifier parsing and streamlined model definitions,
    ensuring smoother and more consistent operations across search,
    retrieval, and indexing workflows.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-10 17:16:13 +01:00
Boris
f75e35c337
fix: custom model pipeline (#508)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit


- **New Features**
  - Graph visualizations now allow exporting to a user-specified file path
    for more flexible output management.
  - The text embedding process has been enhanced with an additional
    tokenizer option for improved performance.
  - A new `ExtendableDataPoint` class has been introduced for future
    extensions.
  - New JSON files for companies and individuals have been added to
    facilitate testing and data processing.

- **Improvements**
  - Search functionality now uses updated identifiers for more reliable
    content retrieval.
  - Metadata handling has been streamlined across various classes by
    removing unnecessary type specifications.
  - Enhanced serialization of properties in the Neo4j adapter for improved
    handling of complex structures.
  - The setup process for databases has been improved with a new
    asynchronous setup function.

- **Chores**
  - Dependency and configuration updates improve overall stability and
    performance.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-08 02:00:15 +01:00
alekszievr
2e842652be
Fix diameter and shortest path calculation in networkx adapter [COG-1201] (#507)
…nnected graph

<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Bug Fixes**
  - Enhanced reliability of graph metric calculations to gracefully handle
    unexpected inputs, ensuring smoother and uninterrupted graph analysis
    for end-users.
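
Diameter and shortest-path metrics can fail when some node pairs have no path between them. One way to keep them defined, assuming the fix concerns graphs that are not fully connected, is to measure only pairs that can reach each other. The sketch below is stdlib-only and is not the adapter's code (which is built on networkx); all function names are hypothetical.

```python
from collections import deque


def to_undirected(edges):
    # Build an adjacency map that ignores edge direction.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def bfs_distances(adjacency, start):
    # Hop counts from start to every node in the same component.
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def diameter_and_avg_path(edges):
    """Return (largest component diameter, mean distance over reachable pairs)."""
    adjacency = to_undirected(edges)
    diameter, total, pairs = 0, 0, 0
    for node in adjacency:
        for other, distance in bfs_distances(adjacency, node).items():
            if other != node:
                diameter = max(diameter, distance)
                total += distance
                pairs += 1
    # With no reachable pairs the average is undefined, so return None.
    average = total / pairs if pairs else None
    return diameter, average
```

Running one BFS per node is quadratic in graph size, which is acceptable for a sketch but may be too slow for large graphs.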

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-08 00:15:26 +01:00
alekszievr
8396fed9a1
feat: metrics in neo4j adapter [COG-1082] (#487)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Enhanced graph management capabilities allow users to verify graph
    existence, project complete graphs, and remove graphs, delivering more
    comprehensive graph insights.

- **Refactor**
  - Adjusted default task behavior for streamlined performance.
  - Updated timestamp handling to ensure accurate and consistent record
    tracking.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2025-02-07 15:58:43 +01:00
Igor Ilic
df163b0431
Add pydantic settings checker (#497)
<!-- .github/pull_request_template.md -->

## Description
- Add a test of the embedding and LLM models at the beginning of cognee
  use.
- Fix an issue with async use of the relational database.
- Refactor the cache mechanism for all databases so that changes in
  config are reflected in the get functions.
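
One common way to make a cached getter follow config changes is to key the cache on the config values instead of caching a parameterless call. The sketch below illustrates that idea with `functools.lru_cache`; `create_engine`, `get_engine`, and the dict-based config are hypothetical stand-ins, not the functions changed in this PR.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def create_engine(provider: str, url: str):
    # Stand-in for an expensive engine object; cached per (provider, url).
    return {"provider": provider, "url": url}


def get_engine(config: dict):
    # Read the current config on every call, so a changed value produces
    # a new cache key and therefore a new engine.
    return create_engine(config["provider"], config["url"])
```

Calling `get_engine` twice with an unchanged config returns the same object, while changing `config["url"]` between calls yields a fresh engine.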

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
  - Introduced connection testing for language and embedding services at
    startup, ensuring improved reliability during data addition.

- **Refactor**
  - Streamlined engine initialization across multiple database systems to
    enhance performance and clarity.
  - Improved parameter handling and caching strategies for faster, more
    consistent operations.
  - Updated record identifiers for more robust and unique data storage.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: holchan <61059652+holchan@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2025-02-04 23:18:27 +01:00
Igor Ilic
1260fc7db0
fix: Add reraising of general exception handling in cognee [COG-1062] (#490)
<!-- .github/pull_request_template.md -->

## Description
Add re-raising of errors in general exception handling.
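
A minimal sketch of the pattern described here: a broad `except` block logs the failure and then re-raises, so callers still see the original exception. The `run_step` wrapper is a hypothetical name used only for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def run_step(step, *args):
    try:
        return step(*args)
    except Exception as error:
        # Log with traceback, then propagate instead of swallowing the error.
        step_name = getattr(step, "__name__", repr(step))
        logger.error("Step %s failed: %s", step_name, error, exc_info=True)
        raise
```

A bare `raise` keeps the original exception type and traceback, which a `raise error` from a different frame or a wrapped generic exception would obscure.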

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Bug Fixes & Stability Improvements**
  - Enhanced error handling throughout the system, ensuring issues during
    operations like server startup, data processing, and graph management
    are properly logged and reported.

- **Refactor**
  - Standardized logging practices replace basic output statements,
    improving traceability and providing better insights for
    troubleshooting.

- **New Features**
  - Updated search functionality now returns only unique results,
    enhancing data consistency and the overall user experience.
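
Returning only unique search results can be done with an order-preserving deduplication pass. The sketch below is an assumption about the general approach, not the code in this PR; `unique_results` and its `key` parameter are hypothetical.

```python
def unique_results(results, key=lambda item: item):
    """Drop repeated results while keeping the first occurrence of each."""
    seen = set()
    unique = []
    for item in results:
        marker = key(item)  # must be hashable, e.g. an id or a string
        if marker not in seen:
            seen.add(marker)
            unique.append(item)
    return unique
```

Passing a `key` such as `lambda r: r["id"]` allows deduplicating dict-shaped results, which are not hashable themselves.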
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: holchan <61059652+holchan@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2025-02-04 10:51:05 +01:00
alekszievr
2858a674f5
feat: Calculate graph metrics for networkx graph [COG-1082] (#484)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Enabled an option to retrieve more detailed metrics, providing
    comprehensive analytics for graph and descriptive data.

- **Refactor**
  - Standardized the way metrics are obtained across components for
    consistent behavior and improved data accuracy.

- **Chore**
  - Made internal enhancements to support optional detailed metric
    calculations, streamlining system performance and ensuring future
    scalability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2025-02-03 18:05:53 +01:00