Commit graph

355 commits

Author SHA1 Message Date
hajdul88
1b630366c9
Adds types property to pydantic Datapoint inherited classes (#523)

## Description
This PR adds types to the `DataPoint` pydantic class and fixes
visualization colors.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
  - Added a `type` field to the `DataPoint` model for clearer data
    classification.
  - Enhanced color mapping in visualizations by assigning a distinct color
    to "TextSummary" nodes.

- **Refactor**
  - Improved default settings for version control and ordering to ensure
    consistent data behavior.
2025-02-11 19:23:19 +01:00
hajdul88
6a0c0e3ef8
feat: Cognee evaluation framework development (#498)

This PR contains the evaluation framework development for cognee

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
  - Expanded evaluation framework now integrates asynchronous corpus
    building, question answering, and performance evaluation with adaptive
    benchmarks for improved metrics (correctness, exact match, and F1
    score).

- **Infrastructure**
  - Added database integration for persistent storage of questions,
    answers, and metrics.
  - Launched an interactive metrics dashboard featuring advanced
    visualizations.
  - Introduced an automated testing workflow for continuous quality
    assurance.

- **Documentation**
  - Updated guidelines for generating concise, clear answers.
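
The summary above names exact match and F1 score as evaluation metrics. The following is a minimal sketch of how such metrics are commonly computed for question answering, not the code from this PR; the function names and the normalization rule (lowercasing, stripping punctuation) are assumptions made for illustration.

```python
import re
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split into tokens."""
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of token precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Exact match rewards only fully correct answers, while token-level F1 gives partial credit when the prediction contains the reference plus extra words.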
2025-02-11 16:31:54 +01:00
Boris
8f84713b54
fix: support structured data conversion to data points (#512)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- New Features
  - Introduced version tracking and enhanced metadata in core data models
    for improved data consistency.

- Bug Fixes
  - Improved error handling during graph data loading to prevent
    disruptions from unexpected identifier formats.

- Refactor
  - Centralized identifier parsing and streamlined model definitions,
    ensuring smoother and more consistent operations across search,
    retrieval, and indexing workflows.

2025-02-10 17:16:13 +01:00
Boris
f75e35c337
fix: custom model pipeline (#508)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit


- **New Features**
  - Graph visualizations now allow exporting to a user-specified file path
    for more flexible output management.
  - The text embedding process has been enhanced with an additional
    tokenizer option for improved performance.
  - A new `ExtendableDataPoint` class has been introduced for future
    extensions.
  - New JSON files for companies and individuals have been added to
    facilitate testing and data processing.

- **Improvements**
  - Search functionality now uses updated identifiers for more reliable
    content retrieval.
  - Metadata handling has been streamlined across various classes by
    removing unnecessary type specifications.
  - Enhanced serialization of properties in the Neo4j adapter for improved
    handling of complex structures.
  - The setup process for databases has been improved with a new
    asynchronous setup function.

- **Chores**
  - Dependency and configuration updates improve overall stability and
    performance.

2025-02-08 02:00:15 +01:00
alekszievr
2e842652be
Fix diameter and shortest path calculation in networkx adapter [COG-1201] (#507)
…nnected graph


## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- Bug Fixes
  - Enhanced reliability of graph metric calculations to gracefully handle
    unexpected inputs, ensuring smoother and uninterrupted graph analysis
    for end-users.
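
The commit title is truncated, so the exact case it fixes is not stated here. As an illustration only, assuming the problematic input is a graph that is not connected (where a plain diameter is undefined), one common approach is to compute the diameter of the largest connected component. This sketch uses a plain adjacency dictionary and breadth-first search rather than the networkx adapter from the PR; all names are hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop counts from source to every node reachable from it."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def connected_components(adjacency):
    """Node sets of each component of an undirected graph."""
    seen = set()
    components = []
    for node in adjacency:
        if node not in seen:
            component = set(bfs_distances(adjacency, node))
            seen |= component
            components.append(component)
    return components


def safe_diameter(adjacency):
    """Diameter of the largest component, or None for an empty graph."""
    components = connected_components(adjacency)
    if not components:
        return None
    largest = max(components, key=len)
    return max(max(bfs_distances(adjacency, node).values()) for node in largest)
```

The adjacency dictionary is assumed to be symmetric (each undirected edge listed in both directions).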

2025-02-08 00:15:26 +01:00
alekszievr
8396fed9a1
feat: metrics in neo4j adapter [COG-1082] (#487)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
  - Enhanced graph management capabilities allow users to verify graph
    existence, project complete graphs, and remove graphs, delivering more
    comprehensive graph insights.

- **Refactor**
  - Adjusted default task behavior for streamlined performance.
  - Updated timestamp handling to ensure accurate and consistent record
    tracking.

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2025-02-07 15:58:43 +01:00
Igor Ilic
df163b0431
Add pydantic settings checker (#497)

## Description
- Add a test of the embedding and LLM models at the beginning of cognee use
- Fix an issue with relational database async use
- Refactor the cache handling for all databases so that changes in config
  are reflected in the get functions

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin



## Summary by CodeRabbit

- **New Features**
  - Introduced connection testing for language and embedding services at
    startup, ensuring improved reliability during data addition.

- **Refactor**
  - Streamlined engine initialization across multiple database systems to
    enhance performance and clarity.
  - Improved parameter handling and caching strategies for faster, more
    consistent operations.
  - Updated record identifiers for more robust and unique data storage.


---------

Co-authored-by: holchan <61059652+holchan@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2025-02-04 23:18:27 +01:00
Igor Ilic
1260fc7db0
fix: Add reraising of general exception handling in cognee [COG-1062] (#490)

## Description
Add re-raising of errors in general exception handling 
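
A minimal sketch of the log-and-re-raise pattern this description refers to, using only the standard `logging` module. The wrapper name `run_step` is hypothetical and is not taken from the cognee codebase.

```python
import logging

logger = logging.getLogger(__name__)


def run_step(step, *args):
    """Run a callable, log any failure, then re-raise it unchanged."""
    try:
        return step(*args)
    except Exception as error:
        # Log with the traceback, then propagate so callers still see the error.
        logger.error(
            "Step %s failed: %s",
            getattr(step, "__name__", step),
            error,
            exc_info=True,
        )
        raise
```

A bare `raise` keeps the original exception type and traceback, so callers can still handle specific errors instead of receiving a silent `None`.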

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **Bug Fixes & Stability Improvements**
  - Enhanced error handling throughout the system, ensuring issues during
    operations like server startup, data processing, and graph management
    are properly logged and reported.

- **Refactor**
  - Standardized logging practices replace basic output statements,
    improving traceability and providing better insights for
    troubleshooting.

- **New Features**
  - Updated search functionality now returns only unique results,
    enhancing data consistency and the overall user experience.
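
The summary mentions that search now returns only unique results. How the PR achieves this is not stated; the sketch below shows one common order-preserving approach, with a hypothetical `key` parameter for choosing what counts as a duplicate.

```python
def unique_results(results, key=lambda item: item):
    """Drop later duplicates while keeping the first occurrence and the order."""
    seen = set()
    unique = []
    for item in results:
        marker = key(item)
        if marker not in seen:
            seen.add(marker)
            unique.append(item)
    return unique
```

For results that are dictionaries, a call such as `unique_results(hits, key=lambda hit: hit["id"])` would deduplicate by an assumed `id` field.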

---------

Co-authored-by: holchan <61059652+holchan@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2025-02-04 10:51:05 +01:00
alekszievr
2858a674f5
feat: Calculate graph metrics for networkx graph [COG-1082] (#484)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
  - Enabled an option to retrieve more detailed metrics, providing
    comprehensive analytics for graph and descriptive data.

- **Refactor**
  - Standardized the way metrics are obtained across components for
    consistent behavior and improved data accuracy.

- **Chore**
  - Made internal enhancements to support optional detailed metric
    calculations, streamlining system performance and ensuring future
    scalability.

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2025-02-03 18:05:53 +01:00
alekszievr
5119992fd8
feat: Add graph metrics getter in graph db interface and adapters [COG-1082] (#483)
Dummy implementation of graph metrics to demonstrate what the interface
will look like


## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
  - Introduced asynchronous functionality for retrieving comprehensive
    graph metrics, including counts and connectivity details, across
    different systems.

- **Refactor**
  - Streamlined metrics processing and storage by shifting to direct
    retrieval from the graph engine.
  - Updated naming conventions for the `GraphMetrics` database table and
    reorganized module imports to enhance internal consistency.

- **Chores**
  - Removed dataset deletion functionalities while introducing the ability
    to store descriptive metrics.

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2025-02-03 15:25:04 +01:00
Igor Ilic
8879f3fbbe
feat: Add gemini support [COG-1023] (#485)

## Description
PR to test Gemini PR from holchan

1. Add Gemini LLM and Gemini Embedding support 
2. Fix CodeGraph issue with chunks being bigger than maximum token value
3. Add Tokenizer adapters to CodeGraph

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


## Summary by CodeRabbit

- **New Features**
    - Added support for the Gemini LLM provider.
    - Expanded LLM configuration options.
    - Introduced a new GitHub Actions workflow for multimetric QA
      evaluation.
    - Added new environment variables for LLM and embedding configurations
      across various workflows.

- **Bug Fixes**
    - Improved error handling in various components.
    - Updated tokenization and embedding processes.
    - Removed warning related to missing `dict` method in data items.

- **Refactor**
    - Simplified token extraction and decoding methods.
    - Updated tokenizer interfaces.
    - Removed deprecated dependencies.
    - Enhanced retry logic and error handling in embedding processes.

- **Documentation**
    - Updated configuration comments and settings.

- **Chores**
    - Updated GitHub Actions workflows to accommodate new secrets and
      environment variables.
    - Modified evaluation parameters.
    - Adjusted dependency management for optional libraries.

---------

Co-authored-by: holchan <61059652+holchan@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2025-01-31 18:03:23 +01:00
hajdul88
f843c256e4
feat: Use unwind for batch edge save and add unit tests for get_graph_from_model
* feat: adds some unit tests for get_graph_from_model

* feat: updates neo4j add_edges cypher and deletes shallow get_graph_from_model

* fix: fixing merge conflict false resolve

* chore: deletes old only_root unit test
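
The commit above switches edge saving to Cypher's `UNWIND`, which lets one statement process a whole list of rows instead of issuing one query per edge. The query text below is a hypothetical sketch, not the statement from the commit: the `RELATED` relationship type, the property names, and the helper `build_edge_rows` are assumptions for illustration.

```python
# Hypothetical Cypher: one statement that iterates over a list parameter.
ADD_EDGES_QUERY = """
UNWIND $edges AS edge
MATCH (source {id: edge.source_id})
MATCH (target {id: edge.target_id})
MERGE (source)-[r:RELATED {name: edge.name}]->(target)
SET r += edge.properties
"""


def build_edge_rows(edges):
    """Turn (source_id, target_id, name, properties) tuples into UNWIND rows."""
    return [
        {
            "source_id": str(source_id),
            "target_id": str(target_id),
            "name": name,
            "properties": dict(properties or {}),
        }
        for source_id, target_id, name, properties in edges
    ]


rows = build_edge_rows(
    [
        (1, 2, "mentions", {"weight": 0.5}),
        (2, 3, "contains", None),
    ]
)
# With the official neo4j Python driver, the batch would be sent in one call,
# for example: session.run(ADD_EDGES_QUERY, edges=rows)
```

Sending the rows as a single parameter keeps the query text constant and reduces the number of round trips to the database.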
2025-01-31 13:14:04 +01:00
Igor Ilic
860218632f refactor: add suggestions from PR
Add suggestions made by CodeRabbit on pull request
2025-01-28 17:15:25 +01:00
Igor Ilic
a8644e0bd7 feat: Use litellm max token size as default for model, if model exists in litellm 2025-01-28 17:00:47 +01:00
Igor Ilic
4e56cd64a1 refactor: Add max chunk tokens to code graph pipeline 2025-01-28 15:33:34 +01:00
Igor Ilic
3db7f85c9c feat: Add max_chunk_tokens value to chunkers
Add formula and forwarding of max_chunk_tokens value through Cognee
2025-01-28 14:32:00 +01:00
Igor Ilic
49f60971bb Merge branch 'dev' into COG-970-refactor-tokenizing 2025-01-28 10:12:55 +01:00
Igor Ilic
0a9f1349f2 refactor: Change variable and function names based on PR comments
Change variable and function names based on PR comments
2025-01-28 10:10:29 +01:00
Boris
8da81c1de3
Merge branch 'dev' into pgvector-add-normalization 2025-01-27 11:31:24 +01:00
Boris
0c2c5870df
fix: use low_level server for cognee mcp server (#470)
* fix: revert to older mcp version

* fix: use low_level server for the mcp

* fix: styling errors

* fix: mcp cognify arguments

* fix: ruff errors
2025-01-26 12:52:48 +01:00
Igor Ilic
89d4b7a5c4
Merge branch 'dev' into pgvector-add-normalization 2025-01-24 19:24:39 +01:00
Igor Ilic
23ecf245ed fix: Return string conversion to resolve traceback 2025-01-24 19:20:55 +01:00
Igor Ilic
b0cec3fcaa refactor: Remove conversion to string 2025-01-24 19:03:57 +01:00
Igor Ilic
ffbb387580
Merge branch 'dev' into fix-insert-data 2025-01-24 18:55:41 +01:00
Igor Ilic
7dea1d54d7 refactor: Add specific max token values to embedding models 2025-01-23 18:18:45 +01:00
Igor Ilic
6d5679f9d2 Merge branch 'dev' into COG-970-refactor-tokenizing 2025-01-23 18:14:49 +01:00
Igor Ilic
1319944dcd docs: Update .env.template to include llm and embedding options 2025-01-23 18:05:45 +01:00
Igor Ilic
b686376c54 feat: Add gemini tokenizer to cognee 2025-01-23 17:55:04 +01:00
Igor Ilic
294ed1d960 feat: Add HuggingFace Tokenizer support 2025-01-23 16:52:35 +01:00
Igor Ilic
2e1a48e22c docs: Add usage example of function 2025-01-23 15:13:46 +01:00
Igor Ilic
de19016494 fix: Add flag to allow SQLite to use foreign keys 2025-01-23 15:10:27 +01:00
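
SQLite does not enforce foreign keys unless `PRAGMA foreign_keys` is switched on for each connection. The sketch below demonstrates that behavior with the standard `sqlite3` module; it is not the code from the commit above, and the table names and helper functions are hypothetical.

```python
import sqlite3


def open_connection(enforce_foreign_keys: bool) -> sqlite3.Connection:
    """Open an in-memory database with foreign key enforcement on or off."""
    conn = sqlite3.connect(":memory:")
    # The pragma applies per connection and must run outside a transaction.
    setting = "ON" if enforce_foreign_keys else "OFF"
    conn.execute(f"PRAGMA foreign_keys = {setting}")
    conn.execute("CREATE TABLE datasets (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE data ("
        "id INTEGER PRIMARY KEY, "
        "dataset_id INTEGER REFERENCES datasets(id))"
    )
    return conn


def can_insert_orphan(conn: sqlite3.Connection) -> bool:
    """Try to insert a row that points at a dataset that does not exist."""
    try:
        conn.execute("INSERT INTO data (dataset_id) VALUES (42)")
        return True
    except sqlite3.IntegrityError:
        return False
```

With enforcement off the orphan row is accepted silently, which is why the flag matters for data integrity.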
Igor Ilic
d4453e4a1d fix: Add support for SQLite and PostgreSQL for inserting data in SQLAlchemyAdapter 2025-01-23 14:59:02 +01:00
Boris Arzentar
e577276d91 Merge remote-tracking branch 'origin/dev' into feat/COG-1058-fastmcp 2025-01-23 11:46:25 +01:00
Boris Arzentar
00f302c37a feat: use fastmcp for mcp server 2025-01-23 11:45:40 +01:00
Igor Ilic
9f6a0ba783
Merge branch 'dev' into pgvector-add-normalization 2025-01-23 11:11:43 +01:00
Igor Ilic
93249c72c5 fix: Initial commit to resolve issue with using tokenizer based on LLMs
Currently TikToken is used for tokenizing by default which is only supported by OpenAI,
this is an initial commit in an attempt to add Cognee tokenizing support for multiple LLMs
2025-01-21 19:53:22 +01:00
Igor Ilic
bd3a5a758c
Merge branch 'dev' into COG-793-metadata-rework 2025-01-20 18:06:21 +01:00
Igor Ilic
49ad292592 refactor: Reduce complexity of metadata handling
Have foreign metadata be a table column in data instead of its own table to reduce complexity

Refactor COG-793
2025-01-20 16:39:05 +01:00
hajdul88
813a03c6e2
Merge branch 'dev' into pgvector-add-normalization 2025-01-20 13:46:50 +01:00
Igor Ilic
2546844787 feat: Add normalization to PGVector search
Add normalization to PGVector search results
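
The commit does not say which normalization is applied. As one possible reading, the sketch below rescales raw distances to the range 0 to 1 with min-max normalization so that scores are comparable across queries; the function name and the choice of min-max scaling are assumptions.

```python
def normalize_distances(distances):
    """Rescale distances to [0, 1]; the closest result gets 0.0."""
    if not distances:
        return []
    lowest, highest = min(distances), max(distances)
    if highest == lowest:
        # All results are equally distant, so there is nothing to spread out.
        return [0.0 for _ in distances]
    span = highest - lowest
    return [(distance - lowest) / span for distance in distances]
```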
2025-01-20 13:42:39 +01:00
hajdul88
bf70705ed0 Fix: fixes networkx failed to load graph from file error 2025-01-20 12:19:34 +01:00
hajdul88
6e691885e6
Merge branch 'dev' into feature/cog-186-run-cognee-on-windows 2025-01-17 09:06:00 +01:00
hajdul88
935763b08d fix: fixing changed lancedb search + pruning 2025-01-16 17:32:44 +01:00
vasilije
0a02886d76 Update format 2025-01-16 13:28:35 +01:00
Vasilije
6c1c8abc26
Update adapter.py 2025-01-16 13:26:37 +01:00
Vasilije
b6e82dfb4f
Update adapter.py 2025-01-16 13:24:42 +01:00
Vasilije
0a9c9438ed
Update DataPoint.py 2025-01-16 13:23:48 +01:00
Vasilije
58526a6a24
Merge branch 'dev' into COG-748 2025-01-16 13:19:09 +01:00
hajdul88
9e63bacaa7
Merge branch 'dev' into feature/cog-761-project-graphiti-graph-to-memory 2025-01-15 11:49:10 +01:00
alekszievr
6653d73556
Feat/cog 950 improve metric selection (#435)
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets.

* Load dataset file by filename, outsource utilities

* restructure metric selection

* Add comprehensiveness, diversity and empowerment metrics

* add promptfoo as an option

* refactor RAG solution in eval

* LLM as a judge metrics implemented in a uniform way

* Use requests.get instead of wget

* clean up promptfoo config template

* minor fixes

* get promptfoo path instead of hardcoding

* minor fixes

* Add LLM as a judge prompts

* Minor refactor and logger usage
2025-01-15 10:45:55 +01:00