<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
---------
Co-authored-by: Daniel Molnar <soobrosa@gmail.com>
<!-- .github/pull_request_template.md -->
## Description
Adds S3 support
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
---------
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
<!-- .github/pull_request_template.md -->
## Description
- Added new graph creation prompts
- Exposed graph creation prompts in .cognify via get_default tasks
- Exposed graph creation prompts in eval framework
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
---------
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
<!-- .github/pull_request_template.md -->
## Description
Added one example: "get all connected nodes to entity".
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
---------
Co-authored-by: Boris <boris@topoteretes.com>
<!-- .github/pull_request_template.md -->
## Description
Introducing structlog.
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- .github/pull_request_template.md -->
## Description
Change Gemini adapter and data models so Gemini can use custom data
models
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced provider-specific enhancements with updated data
representations, including improved node labeling and enriched summary
and description fields for graph displays.
- Improved configuration management by automatically loading environment
settings for better LLM operations.
- **Refactor**
- Streamlined response handling with a simplified approach for defining
output formats.
- Updated error handling by removing the try-except block for dotenv
imports.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Reverts topoteretes/cognee#594
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Enhanced AI responses now deliver structured JSON output with clearly
defined sections, improving clarity and consistency.
- Standardized knowledge graph definitions provide a uniform
representation, simplifying integration and interpretation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->
## Description
Change data models and Gemini adapter so it can run custom ontologies
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Improved AI response handling now provides more direct and reliable
output.
- Enhanced knowledge graph displays now include additional descriptive
details under advanced configurations.
- **Refactor**
- Streamlined processing logic reduces complexity and improves
consistency.
- Updated data structures now adapt automatically based on your AI
service configuration for a smoother experience.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced an expert entity extraction feature that extracts
significant named entities from text and provides structured output with
essential details.
- Rolled out customizable prompt templates for both system instructions
and user input to standardize the extraction process.
- Integrated a robust language model–based extractor with comprehensive
error handling to ensure reliable and consistent results.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
<!-- .github/pull_request_template.md -->
## Description
• Created DirectLLMEvalAdapter - a lightweight alternative to DeepEval
for answer evaluation
• Added evaluation prompt files defining scoring criteria and format
• Made adapter selectable via evaluation_engine = "DirectLLM" in config,
supports "correctness" metric only
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced a new evaluation method that compares model responses
against a reference answer using structured prompt templates. This
approach enables automated scoring (ranging from 0 to 1) along with
brief justifications.
- **Enhancements**
- Updated the configuration to clearly distinguish between evaluation
options, providing end-users with a more transparent and reliable
assessment process.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Tests**
- Introduced new automated testing workflows for Ollama and Gemini,
triggered by pull requests and manual dispatch.
- The Ollama workflow sets up the service and executes a simple example
test to enhance continuous integration.
- Enhanced dependency update workflow with new triggers for push and
pull request events, and added an optional debug logging parameter.
- Added new capabilities for audio and image transcription within the
Ollama API adapter.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Daniel Molnar <soobrosa@gmail.com>
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced an automated deployment workflow to build and push
container images.
- Updated dependency management to include additional database support.
- **Refactor**
- Enhanced asynchronous operations and logging in the server for
improved performance.
- Optimized extraction and retrieval processes for code-related data.
- **Chores**
- Streamlined build configurations and startup scripts for greater
reliability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
… needed
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Implemented enhanced configuration validation for environment-based
settings. Now, if any configuration parameter is provided via the
environment, all required settings must be present. This improvement
helps catch misconfigurations early, reducing potential errors and
ensuring a smoother, more reliable user experience. These proactive
measures significantly enhance overall system stability and performance.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
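The all-or-nothing rule described in the release note above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the setting names in `REQUIRED_KEYS` and the function `load_env_config` are hypothetical.

```python
import os
from typing import Mapping, Optional

# Hypothetical setting names, chosen only for illustration.
REQUIRED_KEYS = ("APP_DB_HOST", "APP_DB_PORT", "APP_DB_NAME")


def load_env_config(env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Return the settings found in the environment, or None if none are set.

    Raises ValueError when only part of the required group is present.
    """
    env = os.environ if env is None else env
    present = {key: env[key] for key in REQUIRED_KEYS if env.get(key)}

    if not present:
        # Nothing was provided via the environment: the caller keeps defaults.
        return None

    missing = [key for key in REQUIRED_KEYS if key not in present]
    if missing:
        raise ValueError(
            "Incomplete environment configuration, missing: " + ", ".join(missing)
        )
    return present
```

Failing at load time, rather than on first use, is what surfaces a half-configured environment early.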
---------
Co-authored-by: Boris <boris@topoteretes.com>
<!-- .github/pull_request_template.md -->
## Description
- Integrate experimental tasks into the evaluation framework
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced interactive prompt templates for extracting graph nodes,
edge triplets, and relationship names, resulting in more comprehensive
and accurate knowledge graphs.
- Added asynchronous processes to efficiently handle document data and
integrate graph components.
- Launched cascade graph task options to offer enhanced flexibility in
task management workflows.
- Added new functionality for extracting content nodes and relationship
names from text.
- **Refactor**
- Streamlined configurations for prompt processing and task
initialization, improving overall modularity and system stability.
- Updated task getter mechanisms to utilize function-based approaches
for improved flexibility.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
<!-- .github/pull_request_template.md -->
## Description
Temporary fix for Gemini LLM until Gemini allows empty dictionaries in
model schema definitions
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- AI responses now adjust their format dynamically based on the type of
output, providing a streamlined text display when appropriate.
- Extended processing time improves the handling of longer operations
for a more reliable interaction.
- **Bug Fixes**
- Enhanced error management during connectivity tests ensures a more
robust and stable user experience.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Boris <boris@topoteretes.com>
<!-- .github/pull_request_template.md -->
This PR contains the Ollama-specific LLM adapter together with the
embedding engine.
Tested with the following configuration:
`LLM_API_KEY="ollama"
llm_model = "llama3.1:8b"
LLM_PROVIDER = "ollama"
llm_endpoint = "http://localhost:11434/v1"
EMBEDDING_PROVIDER="ollama"
EMBEDDING_MODEL="avr/sfr-embedding-mistral:latest"
EMBEDDING_ENDPOINT="http://localhost:11434/api/embeddings"
EMBEDDING_DIMENSIONS=4096
HUGGINGFACE_TOKENIZER="Salesforce/SFR-Embedding-Mistral"`
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced a new embedding option that leverages an external provider
for asynchronous text processing.
- Added enhanced language model integration using a dedicated adapter to
improve interaction quality.
- **Enhancements**
- Expanded configuration settings to include a new tokenizer option.
- Updated provider selection logic to incorporate the additional
embedding and language model features.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: vasilije <vas.markovic@gmail.com>
<!-- .github/pull_request_template.md -->
## Description
Summarize retrieved edges into a compact string with no redundancies.
Example:
**Before summarization:**
CV example:
visual innovations -- employs -- visual innovations
---
CV 4: Not Relevant
Name: David Thompson
Contact Information:
Email: david.thompson@example.com
Phone: (555) 456-7890
Summary:
Creative Graphic Designer with over 8 years of experience in visual
design and branding. Proficient in Adobe Creative Suite and passionate
about creating compelling visuals.
Education:
B.F.A. in Graphic Design, Rhode Island School of Design (2012)
Experience:
Senior Graphic Designer, CreativeWorks Agency (2015 – Present)
Led design projects for clients in various industries.
Created branding materials that increased client engagement by 30%.
Graphic Designer, Visual Innovations (2012 – 2015)
Designed marketing collateral, including brochures, logos, and websites.
Collaborated with the marketing team to develop cohesive brand
strategies.
Skills:
Design Software: Adobe Photoshop, Illustrator, InDesign
Web Design: HTML, CSS
Specialties: Branding and Identity, Typography
-- contains -- creativeworks agency
---
CV 4: Not Relevant
Name: David Thompson
Contact Information:
Email: david.thompson@example.com
Phone: (555) 456-7890
Summary:
Creative Graphic Designer with over 8 years of experience in visual
design and branding. Proficient in Adobe Creative Suite and passionate
about creating compelling visuals.
Education:
B.F.A. in Graphic Design, Rhode Island School of Design (2012)
Experience:
Senior Graphic Designer, CreativeWorks Agency (2015 – Present)
Led design projects for clients in various industries.
Created branding materials that increased client engagement by 30%.
Graphic Designer, Visual Innovations (2012 – 2015)
Designed marketing collateral, including brochures, logos, and websites.
Collaborated with the marketing team to develop cohesive brand
strategies.
Skills:
Design Software: Adobe Photoshop, Illustrator, InDesign
Web Design: HTML, CSS
Specialties: Branding and Identity, Typography
-- contains -- visual innovations
---
CV 4: Not Relevant
Name: David Thompson
Contact Information:
Email: david.thompson@example.com
Phone: (555) 456-7890
Summary:
Creative Graphic Designer with over 8 years of experience in visual
design and branding. Proficient in Adobe Creative Suite and passionate
about creating compelling visuals.
Education:
B.F.A. in Graphic Design, Rhode Island School of Design (2012)
Experience:
Senior Graphic Designer, CreativeWorks Agency (2015 – Present)
Led design projects for clients in various industries.
Created branding materials that increased client engagement by 30%.
Graphic Designer, Visual Innovations (2012 – 2015)
Designed marketing collateral, including brochures, logos, and websites.
Collaborated with the marketing team to develop cohesive brand
strategies.
Skills:
Design Software: Adobe Photoshop, Illustrator, InDesign
Web Design: HTML, CSS
Specialties: Branding and Identity, Typography
-- contains -- rhode island school of design
---
Experienced Graphic Designer with over 8 years in visual design and
branding, specializing in Adobe Creative Suite and enthusiastic about
producing engaging visuals. -- made_from --
CV 4: Not Relevant
Name: David Thompson
Contact Information:
Email: david.thompson@example.com
Phone: (555) 456-7890
Summary:
Creative Graphic Designer with over 8 years of experience in visual
design and branding. Proficient in Adobe Creative Suite and passionate
about creating compelling visuals.
Education:
B.F.A. in Graphic Design, Rhode Island School of Design (2012)
Experience:
Senior Graphic Designer, CreativeWorks Agency (2015 – Present)
Led design projects for clients in various industries.
Created branding materials that increased client engagement by 30%.
Graphic Designer, Visual Innovations (2012 – 2015)
Designed marketing collateral, including brochures, logos, and websites.
Collaborated with the marketing team to develop cohesive brand
strategies.
Skills:
Design Software: Adobe Photoshop, Illustrator, InDesign
Web Design: HTML, CSS
Specialties: Branding and Identity, Typography
**After summarization:**
David Thompson is a Creative Graphic Designer with over 8 years of
experience in visual design and branding, proficient in Adobe Creative
Suite and passionate about creating compelling visuals. He holds a
B.F.A. in Graphic Design from the Rhode Island School of Design (2012).
His experience includes working as a Senior Graphic Designer at
CreativeWorks Agency (2015 – Present), where he led design projects and
created branding materials that increased client engagement by 30%, and
as a Graphic Designer at Visual Innovations (2012 – 2015), where he
designed marketing collateral and collaborated with the marketing team
to develop cohesive brand strategies. His skills include design software
such as Adobe Photoshop, Illustrator, and InDesign, as well as web
design in HTML and CSS, with specialties in Branding and Identity and
Typography.
1. David Thompson employs his skills in visual design and branding.
2. David Thompson contains experience from CreativeWorks Agency.
3. David Thompson contains experience from Visual Innovations.
4. David Thompson made his qualifications from the Rhode Island School
of Design.
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced a summarization engine that converts relationship-based
inputs into concise, natural sentences.
- Expanded search capabilities with a new query option that generates
graph summaries, providing insightful and aggregated results from graph
data.
- Enhanced asynchronous processing for improved performance in handling
graph data queries and summarization.
- Added flexibility in specifying string conversion methods for graph
edge retrieval.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Boris <boris@topoteretes.com>
<!-- .github/pull_request_template.md -->
This PR contains the development of the evaluation framework for cognee.
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Expanded evaluation framework now integrates asynchronous corpus
building, question answering, and performance evaluation with adaptive
benchmarks for improved metrics (correctness, exact match, and F1
score).
- **Infrastructure**
- Added database integration for persistent storage of questions,
answers, and metrics.
- Launched an interactive metrics dashboard featuring advanced
visualizations.
- Introduced an automated testing workflow for continuous quality
assurance.
- **Documentation**
- Updated guidelines for generating concise, clear answers.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
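For reference, exact match and F1 for question answering are commonly computed over normalized tokens. The sketch below assumes a SQuAD-style token-overlap F1 and a simple lowercase/whitespace normalization; the PR does not state which variant the framework uses, and the function names are illustrative.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately simple normalization)."""
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 when the normalized token sequences are identical, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall against the reference."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return 1.0 if pred_tokens == ref_tokens else 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Unlike exact match, F1 gives partial credit when an answer contains the reference plus extra words.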
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
• Graph visualizations now allow exporting to a user-specified file path
for more flexible output management.
• The text embedding process has been enhanced with an additional
tokenizer option for improved performance.
• A new `ExtendableDataPoint` class has been introduced for future
extensions.
• New JSON files for companies and individuals have been added to
facilitate testing and data processing.
- **Improvements**
• Search functionality now uses updated identifiers for more reliable
content retrieval.
• Metadata handling has been streamlined across various classes by
removing unnecessary type specifications.
• Enhanced serialization of properties in the Neo4j adapter for improved
handling of complex structures.
• The setup process for databases has been improved with a new
asynchronous setup function.
- **Chores**
• Dependency and configuration updates improve overall stability and
performance.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->
## Description
- Add a test of the embedding and LLM models at the start of cognee use
- Fix an issue with async use of the relational database
- Refactor the cache handling for all databases so that config changes
are reflected in the get functions
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced connection testing for language and embedding services at
startup, ensuring improved reliability during data addition.
- **Refactor**
- Streamlined engine initialization across multiple database systems to
enhance performance and clarity.
- Improved parameter handling and caching strategies for faster, more
consistent operations.
- Updated record identifiers for more robust and unique data storage.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: holchan <61059652+holchan@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
<!-- .github/pull_request_template.md -->
## Description
PR to test Gemini PR from holchan
1. Add Gemini LLM and Gemini Embedding support
2. Fix CodeGraph issue with chunks being bigger than maximum token value
3. Add Tokenizer adapters to CodeGraph
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Added support for the Gemini LLM provider.
- Expanded LLM configuration options.
- Introduced a new GitHub Actions workflow for multimetric QA
evaluation.
- Added new environment variables for LLM and embedding configurations
across various workflows.
- **Bug Fixes**
- Improved error handling in various components.
- Updated tokenization and embedding processes.
- Removed warning related to missing `dict` method in data items.
- **Refactor**
- Simplified token extraction and decoding methods.
- Updated tokenizer interfaces.
- Removed deprecated dependencies.
- Enhanced retry logic and error handling in embedding processes.
- **Documentation**
- Updated configuration comments and settings.
- **Chores**
- Updated GitHub Actions workflows to accommodate new secrets and
environment variables.
- Modified evaluation parameters.
- Adjusted dependency management for optional libraries.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: holchan <61059652+holchan@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
TikToken, which only supports OpenAI models, is currently the default tokenizer.
This is an initial commit in an attempt to add Cognee tokenizing support for multiple LLMs.
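One way to decouple token counting from a single provider is a small adapter interface. The sketch below is only an illustration of that idea: `TokenizerAdapter`, `WhitespaceTokenizer`, and `fits_in_context` are hypothetical names and are not taken from the Cognee codebase.

```python
from typing import List, Protocol


class TokenizerAdapter(Protocol):
    """Hypothetical interface that each provider-specific tokenizer would satisfy."""

    def encode(self, text: str) -> List[str]: ...

    def count_tokens(self, text: str) -> int: ...


class WhitespaceTokenizer:
    """Stand-in tokenizer that splits on whitespace, used here for illustration."""

    def encode(self, text: str) -> List[str]:
        return text.split()

    def count_tokens(self, text: str) -> int:
        return len(self.encode(text))


def fits_in_context(tokenizer: TokenizerAdapter, text: str, max_tokens: int) -> bool:
    """Check a text against a token budget without knowing the provider."""
    return tokenizer.count_tokens(text) <= max_tokens
```

Callers such as `fits_in_context` depend only on the interface, so a provider-specific tokenizer can be swapped in without touching them.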
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. JSON schema validation for datasets.
* Load dataset file by filename, outsource utilities
* restructure metric selection
* Add comprehensiveness, diversity and empowerment metrics
* add promptfoo as an option
* refactor RAG solution in eval
* LLM as a judge metrics implemented in a uniform way
* Use requests.get instead of wget
* clean up promptfoo config template
* minor fixes
* get promptfoo path instead of hardcoding
* minor fixes
* Add LLM as a judge prompts
* Minor refactor and logger usage