Commit graph

253 commits

Author SHA1 Message Date
Dmitrii Galkin
e147fa5bde
feat: Add support for ChromaDB (#622)
<!-- .github/pull_request_template.md -->

## Description

# Add Support for ChromaDB

## Summary
This PR adds support for ChromaDB as a vector database option in the
Cognee application. ChromaDB is a modern, open-source embedding database
designed for AI applications.

## Changes
- Created a new ChromaDBAdapter implementation for vector database
operations
- Added comprehensive test suite for ChromaDB functionality
- Updated docker-compose.yml to include ChromaDB service
- Modified environment configuration to support ChromaDB settings
- Updated vector engine creation logic to support ChromaDB as an option

## Technical Details
- Implemented `ChromaDBAdapter.py` (347 lines) with full CRUD operations
for vector data
- Created test suite (`test_chromadb.py`) with 171 lines of test
coverage
- Updated vector engine creation process to dynamically select ChromaDB
when configured
- Modified settings router to accommodate new database option
- Updated environment template with ChromaDB configuration options
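
The dynamic selection described above could look roughly like the sketch below. Every name in it (`VectorEngineConfig`, `create_vector_engine`, the `ADAPTERS` registry, the `"chromadb"` provider string, and both stub adapter classes) is an illustrative assumption, not the code added in this PR.

```python
from dataclasses import dataclass


@dataclass
class VectorEngineConfig:
    # Hypothetical settings object; the real configuration fields may differ.
    provider: str = "default"
    url: str = ""
    api_key: str = ""


class DefaultVectorAdapter:
    """Stand-in for an already supported adapter (stub for illustration)."""

    def __init__(self, url: str, api_key: str):
        self.url = url
        self.api_key = api_key


class ChromaDBAdapter:
    """Stand-in for the new adapter; the real one implements CRUD operations."""

    def __init__(self, url: str, api_key: str):
        self.url = url
        self.api_key = api_key


# Registry that maps a configured provider name to an adapter class.
ADAPTERS = {
    "default": DefaultVectorAdapter,
    "chromadb": ChromaDBAdapter,
}


def create_vector_engine(config: VectorEngineConfig):
    """Instantiate the adapter that matches the configured provider."""
    try:
        adapter_class = ADAPTERS[config.provider.lower()]
    except KeyError:
        raise ValueError(
            f"Unsupported vector database provider: {config.provider}"
        ) from None
    return adapter_class(url=config.url, api_key=config.api_key)
```

With a registry like this, supporting another database means adding one entry rather than another branch in the creation logic.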

## Docker Changes
- Added ChromaDB service to docker-compose.yml with appropriate
configuration

This PR enhances Cognee's flexibility by providing an alternative vector
database option, allowing users to choose the most appropriate database
for their specific use case.



## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin

Tested with UI + tests.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Expanded vector database integration by adding support for ChromaDB,
enabling enhanced data management and search functionalities.
- **Tests**
- Added automated tests to validate the ChromaDB integration and related
operations.
- **Chores**
- Updated configuration guidance and dependency management to include
ChromaDB.
  - Provided an optional container deployment template for ChromaDB.
- Added a new entry to ignore the `.chromadb_data/` directory in version
control.
- Introduced a new GitHub Actions workflow for testing ChromaDB
integration.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-03-13 15:13:04 +01:00
Igor Ilic
88ed411f03
feat: user authorization [COG-1189] (#593)
<!-- .github/pull_request_template.md -->

## Description
Added user authorization through a JWT header and reworked the user and
related RBAC models to accommodate a future user permission system.
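
As a rough illustration of authorization through a JWT header, the sketch below signs and verifies an HS256 token using only the standard library. The PR does not say which library, algorithm, or claims it uses, so `create_token`, `user_from_authorization_header`, the `sub`/`exp` claims, and the `Bearer` scheme are all assumptions; a production system would normally rely on a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode a base64url segment, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: str, signing_input: str) -> bytes:
    return hmac.new(
        secret.encode("utf-8"), signing_input.encode("utf-8"), hashlib.sha256
    ).digest()


def create_token(user_id: str, secret: str, ttl_seconds: int = 3600) -> str:
    """Build a signed token with hypothetical `sub` and `exp` claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(secret, signing_input))}"


def user_from_authorization_header(header_value: str, secret: str) -> str:
    """Return the user id from an `Authorization: Bearer <token>` value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "bearer" or token.count(".") != 2:
        raise PermissionError("Malformed Authorization header")
    signing_input, _, signature = token.rpartition(".")
    # Compare signatures in constant time before trusting the payload.
    if not hmac.compare_digest(_sign(secret, signing_input), _b64url_decode(signature)):
        raise PermissionError("Invalid token signature")
    payload = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if payload["exp"] < time.time():
        raise PermissionError("Token expired")
    return payload["sub"]
```

The signature is checked before the payload is parsed, so a tampered token is rejected without its claims ever being read.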

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Introduced an automated workflow to validate server startup.
  - Added secure JWT token generation for improved session handling.
- Enabled a new structure for permission management with role and
tenant-based controls, including endpoints for creating roles, tenants,
and assigning permissions.
- Added methods for assigning default permissions to roles, tenants, and
users.
- Introduced new classes for managing default permissions for roles,
tenants, and users.

- **Refactor**
- Streamlined authentication and user management flows with enhanced
error handling.

- **Tests**
- Upgraded integration tests with improved database initialization and
data pruning for a more stable environment.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-03-13 13:33:42 +01:00
lxobr
38d527ceac
fix: expose chunk_size for eval framework [COG-1546] (#634)
<!-- .github/pull_request_template.md -->

## Description
- Exposed chunk_size in get_default_tasks in cognify
- Reintegrated chunk_size in corpus building in eval framework
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Introduced an optional configuration parameter to allow users to set
custom processing segment sizes. This enhances flexibility in managing
content processing and task execution, enabling more dynamic control
over resource handling during corpus creation and related operations.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-03-12 16:13:20 +01:00
hajdul88
6fcfb3c398
feat: productionizing ontology solution [COG-1401] (#623)
<!-- .github/pull_request_template.md -->

## Description
This PR contains the ontology feature integrated into cognify

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced ontology management with the introduction of the
`OntologyResolver` class for improved data handling and querying.
- Expanded ontology framework now provides enriched coverage of
technology and automotive domains, including new entities and
relationships.
- Updated entity models now include a validation flag to support
improved data integrity.
- Added support for specifying an ontology file path in relevant
functions to enhance flexibility.

- **Refactor**
- Streamlined integration of ontology processing across data extraction
and workflow routines.

- **Chores**
- Updated project dependencies to include `owlready2` for advanced
ontology functionality.
  
- **Tests**
- Introduced a new test suite for the `OntologyResolver` class to
validate its functionality under various conditions.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-03-12 14:31:19 +01:00
alekszievr
c1f7b667d1
feat: Eliminate the use of max_chunk_tokens and use a unified max_chunk_size instead [cog-1381] (#626)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Refactor**
- Simplified text processing by unifying multiple size-related
parameters into a single metric across chunking and extraction
functionalities.
- Streamlined logic for text segmentation by removing redundant
calculations and checks, resulting in a more consistent chunk management
process.
- **Chores**
  - Removed the `modal` package as a dependency.
- **Documentation**
- Updated the README.md to include a new demo video link and clarified
default environment variable settings.
- Enhanced the CONTRIBUTING.md to improve clarity and engagement for
potential contributors.
- **Bug Fixes**
- Improved handling of sentence-ending punctuation in text processing to
include additional characters.
- **Version Update**
  - Updated project version to 0.1.33 in the pyproject.toml file.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-03-12 14:03:41 +01:00
hajdul88
e3f3d49a3b
Feature/cog 1312 integrating evaluation framework into dreamify (#562)
<!-- .github/pull_request_template.md -->

## Description
This PR contains eval framework changes due to the autooptimizer
integration

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Enhanced answer generation now returns structured answer details.
  - Search functionality accepts configurable prompt inputs.
  - Option to generate a metrics dashboard from evaluations.
- Corpus building tasks now support adjustable chunk settings for
greater flexibility.
- New task retrieval functionality allows for flexible task
configuration.
  - Introduced new methods for creating and managing metrics dashboards.

- **Refactor/Chore**
- Streamlined API signatures and reorganized module interfaces for
better consistency.
  - Updated import paths to reflect new module structure.

- **Tests**
- Updated test scenarios to align with new configurations and parameter
adjustments.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-03-03 19:55:47 +01:00
Daniel Molnar
d27f847753
Transition to new retrievers, update searches (#585)
<!-- .github/pull_request_template.md -->

## Description
Delete legacy search implementations after migrating to new retriever
classes

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced search and retrieval capabilities, providing improved context
resolution for code queries, completions, summaries, and graph
connections.
  
- **Refactor**
- Shifted to a modular, object-oriented approach that consolidates query
logic and streamlines error management for a more robust and scalable
experience.
  
- **Bug Fixes**
- Improved error handling for unsupported search types and retrieval
operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-27 15:25:24 +01:00
Boris
711ae8e675
feat: codegraph improvements and new CODE search [COG-1351] (#581)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Introduced an automated deployment workflow to build and push
container images.
	- Updated dependency management to include additional database support.
- **Refactor**
- Enhanced asynchronous operations and logging in the server for
improved performance.
	- Optimized extraction and retrieval processes for code-related data.
- **Chores**
- Streamlined build configurations and startup scripts for greater
reliability.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
2025-02-26 20:15:02 +01:00
alekszievr
2a167fa1ab
feat: externalize chunkers [cog-1354] (#547)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced document chunk extraction for improved processing consistency
across multiple formats.

- **Refactor**
- Streamlined the configuration for text chunking by replacing indirect
mappings with a direct instantiation approach across document types.
- Updated method signatures across various document classes to accept
chunker class references instead of string identifiers.

- **Chores**
- Removed legacy configuration utilities related to document chunking to
simplify processing.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Boris <boris@topoteretes.com>
2025-02-19 13:26:11 +01:00
Igor Ilic
46e026f77f
Cognee gui [COG-1307] (#530)
<!-- .github/pull_request_template.md -->

## Description
Add a simple GUI to add documents to Cognee and use GRAPH_COMPLETION
search to get answers

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced an interactive file search interface with intuitive
controls. Users can easily upload files, enter search terms, and view
results in a unified display with clear notifications during processing.
  
- **Chores**
- Updated project dependencies to include `pyside6` and `qasync` for
enhanced GUI functionality.
- Refined background query processing to improve the accuracy and
relevance of search outcomes.
- Improved code readability with formatting enhancements in the search
function.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-14 15:51:33 +01:00
SJ
a602094598
feat: Update parameters in search API route to match search function parameters order (#528)
<!-- .github/pull_request_template.md -->

## Description
**Updated handling of SearchType through the chain:**

1. Router receives JSON with searchType Enum, example: `"searchType": "CHUNKS"`
2. FastAPI converts to SearchType enum via SearchPayloadDTO
3. search_v2.py expects SearchType enum
4. search.py takes SearchType enum and extracts value
5. log_query.py takes string value
6. Query model stores string in database

**get_search_router.py**

- Matched the exact field name from JSON payload, searchType instead of
search_type, in the SearchPayloadDTO class.
- Changed cognee_search() params to use payload.query and
payload.searchType

**search.py**

- Changed query_type to SearchType
- Changed log_query to accept the query_type.value parameter instead of
str(query_type)
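
The switch from `str(query_type)` to `query_type.value` matters because the two give different strings for a plain Python `Enum`. The sketch below shows the difference; the `SearchType` members listed are only the ones named in this log, their values are assumed to equal their names, and the two helper functions are hypothetical.

```python
from enum import Enum


class SearchType(Enum):
    # Assumed subset of the real enum, with values assumed equal to names.
    CHUNKS = "CHUNKS"
    GRAPH_COMPLETION = "GRAPH_COMPLETION"


def parse_search_type(raw: str) -> SearchType:
    """Convert the JSON string to the enum; unknown values raise ValueError."""
    return SearchType(raw)


def query_type_for_log(query_type: SearchType) -> str:
    """Return the plain string that gets stored with the logged query."""
    return query_type.value


search_type = parse_search_type("CHUNKS")
print(query_type_for_log(search_type))  # the member's value
print(str(search_type))                 # the qualified member name instead
```

Storing the value keeps the database column aligned with the strings clients send in the `searchType` field.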

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Refactor**
- Updated the search functionality to improve consistency and
reliability.
- Enhanced validation by switching to stricter search type checks,
ensuring only valid search types are processed.
- Maintained robust error handling for uninterrupted search operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2025-02-13 21:04:31 +01:00
Boris Arzentar
d0d8559453 fix: consolidate api/sdk/mcp search 2025-02-13 13:15:39 +01:00
Boris
f9e6dcf837
fix: simplify code pipeline (#529)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit


- **New Features**
  - Enhanced code search and dependency analysis for improved accuracy.
  - Introduced a new high-performance text embedding option.
  - Added an additional execution entry point for code graph processing.
- New optional parameters for flexible property selection in retrieval
functions.
- Introduced new classes for handling import statements, function
definitions, and class definitions.
  - Updated embedding engine selection based on configuration options.

- **Bug Fixes**
- Improved error handling in search operations and database queries for
a more stable user experience.
  - Enhanced error logging for source code parsing.

- **Refactor**
- Streamlined asynchronous processing and refactored internal dependency
extraction.
- Updated configuration and integration settings to enhance overall
reliability.
  - Restructured functions for simplified dependency handling.

- **Chores**
- Upgraded and reorganized dependency management with optional libraries
for extended functionality.
- Added new secret parameters for embedding configuration in workflow
settings.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: vasilije <vas.markovic@gmail.com>
2025-02-12 23:58:48 +01:00
Vasilije
9ba2e0d6c1
chore: Fix and update visualization (#518)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced enhanced visualization capabilities that let users launch a
dedicated server for visual displays.
  
- **Documentation**
- Updated several interactive notebooks to include execution outputs and
expanded explanatory content for better user guidance.
  
- **Style**
- Refined formatting and layout across notebooks to ensure consistent
presentation and improved readability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2025-02-11 19:25:01 +01:00
alekszievr
05ba29af01
Feat: log pipeline status and pass it through pipeline [COG-1214] (#501)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced pipeline execution now provides consolidated status feedback
with improved telemetry for start, completion, and error events.
- Automatic generation of unique dataset identifiers offers clearer task
and pipeline run associations.

- **Refactor**
- Task execution has been streamlined with explicit parameter handling
for more structured pipeline processing.
- Interactive examples and demos now return results directly, making
integration and monitoring more accessible.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
2025-02-11 16:41:40 +01:00
Boris
8f84713b54
fix: support structured data conversion to data points (#512)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- New Features
- Introduced version tracking and enhanced metadata in core data models
for improved data consistency.
  
- Bug Fixes
- Improved error handling during graph data loading to prevent
disruptions from unexpected identifier formats.
  
- Refactor
- Centralized identifier parsing and streamlined model definitions,
ensuring smoother and more consistent operations across search,
retrieval, and indexing workflows.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-10 17:16:13 +01:00
Boris
f75e35c337
fix: custom model pipeline (#508)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit


- **New Features**
• Graph visualizations now allow exporting to a user-specified file path
for more flexible output management.
• The text embedding process has been enhanced with an additional
tokenizer option for improved performance.
• A new `ExtendableDataPoint` class has been introduced for future
extensions.
• New JSON files for companies and individuals have been added to
facilitate testing and data processing.

- **Improvements**
• Search functionality now uses updated identifiers for more reliable
content retrieval.
• Metadata handling has been streamlined across various classes by
removing unnecessary type specifications.
• Enhanced serialization of properties in the Neo4j adapter for improved
handling of complex structures.
• The setup process for databases has been improved with a new
asynchronous setup function.

- **Chores**
• Dependency and configuration updates improve overall stability and
performance.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-08 02:00:15 +01:00
Igor Ilic
5fe7ff9883
refactor: Refactor search so graph completion is used by default (#505)
<!-- .github/pull_request_template.md -->

## Description
Refactor search so that the query type no longer needs to be provided,
which makes it simpler for new users
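
A minimal sketch of the idea, assuming the search function takes a `query_type` keyword that defaults to graph completion. The signature, the `query_text` parameter name, and the returned dictionary are illustrative only; the real function is not shown in this log.

```python
from enum import Enum


class SearchType(Enum):
    # Assumed subset of the real enum.
    GRAPH_COMPLETION = "GRAPH_COMPLETION"
    CHUNKS = "CHUNKS"


def search(query_text: str, query_type: SearchType = SearchType.GRAPH_COMPLETION) -> dict:
    """Describe the search request; a real implementation would run it."""
    return {"query": query_text, "type": query_type.value}


# New users can omit the type; others pass it as an explicit keyword.
default_request = search("What is Cognee?")
chunk_request = search("What is Cognee?", query_type=SearchType.CHUNKS)
```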

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Refactor**
- Improved the search interface by standardizing parameter usage with
explicit keyword arguments for specifying search types, enhancing
clarity and consistency.
- **Tests**
- Updated test cases and example integrations to align with the revised
search parameters, ensuring consistent behavior and reliable validation
of search outcomes.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-07 17:16:34 +01:00
alekszievr
8396fed9a1
feat: metrics in neo4j adapter [COG-1082] (#487)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced graph management capabilities allow users to verify graph
existence, project complete graphs, and remove graphs, delivering more
comprehensive graph insights.
  
- **Refactor**
  - Adjusted default task behavior for streamlined performance.
- Updated timestamp handling to ensure accurate and consistent record
tracking.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2025-02-07 15:58:43 +01:00
hajdul88
bcd326518d
feat: implements graph visualization method for cognee (#493)
<!-- .github/pull_request_template.md -->

## Description
This PR contains the improvement of the visualization endpoint

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Launched an enhanced interactive network visualization utility that
renders dynamic, browser-based graphs. The new feature simplifies
execution by directly generating an HTML file showcasing the
visualization—complete with interactive elements and an on-screen
confirmation—providing a more intuitive and efficient experience.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-06 11:22:17 +01:00
Igor Ilic
df163b0431
Add pydantic settings checker (#497)
<!-- .github/pull_request_template.md -->

## Description
- Add a test of the embedding and LLM models at the beginning of cognee use
- Fix an issue with relational database async use
- Refactor handling of the cache mechanism for all databases so changes in
config can be reflected in get functions
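
One way to make cached engines follow configuration changes is to cache on the configuration values rather than on a no-argument getter. The sketch below assumes that approach; `GraphConfig`, `GraphEngine`, `create_graph_engine`, and `get_graph_engine` are hypothetical names, and the PR may solve this differently.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass
class GraphConfig:
    # Hypothetical config object; field names are assumptions.
    provider: str = "networkx"
    url: str = ""


class GraphEngine:
    """Stand-in for a database engine that is expensive to construct."""

    def __init__(self, provider: str, url: str):
        self.provider = provider
        self.url = url


@lru_cache
def create_graph_engine(provider: str, url: str) -> GraphEngine:
    # The cache key is the argument tuple, so new settings build a new engine.
    return GraphEngine(provider, url)


def get_graph_engine(config: GraphConfig) -> GraphEngine:
    """Read the current config on every call and reuse a matching engine."""
    return create_graph_engine(config.provider, config.url)
```

If the getter itself were cached without arguments, the first engine would be returned forever, no matter how the configuration changed afterwards.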

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Introduced connection testing for language and embedding services at
startup, ensuring improved reliability during data addition.
  
- **Refactor**
- Streamlined engine initialization across multiple database systems to
enhance performance and clarity.
- Improved parameter handling and caching strategies for faster, more
consistent operations.
  - Updated record identifiers for more robust and unique data storage.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: holchan <61059652+holchan@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2025-02-04 23:18:27 +01:00
Igor Ilic
1260fc7db0
fix: Add reraising of general exception handling in cognee [COG-1062] (#490)
<!-- .github/pull_request_template.md -->

## Description
Add re-raising of errors in general exception handling 
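
The pattern can be sketched as follows: log inside the general handler, then use a bare `raise` so the caller still sees the original exception and traceback. The `run_step` helper is hypothetical and only illustrates the pattern.

```python
import logging

logger = logging.getLogger(__name__)


def run_step(step, *args):
    """Run a callable, log any failure, and let the error propagate."""
    try:
        return step(*args)
    except Exception as error:
        logger.error(
            "Step %s failed: %s",
            getattr(step, "__name__", repr(step)),
            error,
            exc_info=True,
        )
        # Bare raise re-raises the same exception with its original traceback.
        raise
```

Without the final `raise`, the handler would swallow the error and the caller would continue as if the step had succeeded.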

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Bug Fixes & Stability Improvements**
- Enhanced error handling throughout the system, ensuring issues during
operations like server startup, data processing, and graph management
are properly logged and reported.

- **Refactor**
- Standardized logging practices replace basic output statements,
improving traceability and providing better insights for
troubleshooting.

- **New Features**
- Updated search functionality now returns only unique results,
enhancing data consistency and the overall user experience.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: holchan <61059652+holchan@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2025-02-04 10:51:05 +01:00
Vasilije
4d3acc358a
fix: mcp improvements (#472)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Dependency Update**
	- Downgraded `mcp` package version from 1.2.0 to 1.1.3
- Updated `cognee` dependency to include additional features with
`cognee[codegraph]`

- **New Features**
- Introduced a new tool, "codify", for transforming codebases into
knowledge graphs
- Enhanced the existing "search" tool to accept a new parameter for
search type

- **Improvements**
	- Streamlined search functionality with a new modular approach
- Added new asynchronous function for retrieving and formatting code
parts

- **Documentation**
- Updated import paths for `SearchType` in various modules and tests to
reflect structural changes

- **Code Cleanup**
	- Removed legacy search module and associated classes/functions
	- Refined data transfer object classes for consistency and clarity
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
2025-02-04 08:47:31 +01:00
alekszievr
2858a674f5
feat: Calculate graph metrics for networkx graph [COG-1082] (#484)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enabled an option to retrieve more detailed metrics, providing
comprehensive analytics for graph and descriptive data.

- **Refactor**
- Standardized the way metrics are obtained across components for
consistent behavior and improved data accuracy.
  
- **Chore**
- Made internal enhancements to support optional detailed metric
calculations, streamlining system performance and ensuring future
scalability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2025-02-03 18:05:53 +01:00
alekszievr
5119992fd8
feat: Add graph metrics getter in graph db interface and adapters [COG-1082] (#483)
Dummy implementation of graph metrics to demonstrate what the interface
will look like

<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced asynchronous functionality for retrieving comprehensive
graph metrics, including counts and connectivity details, across
different systems.
  
- **Refactor**
- Streamlined metrics processing and storage by shifting to direct
retrieval from the graph engine.
- Updated naming conventions for the `GraphMetrics` database table and
reorganized module imports to enhance internal consistency.
  
- **Chores**
- Removed dataset deletion functionalities while introducing the ability
to store descriptive metrics.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2025-02-03 15:25:04 +01:00
Igor Ilic
8879f3fbbe
feat: Add gemini support [COG-1023] (#485)
<!-- .github/pull_request_template.md -->

## Description
PR to test Gemini PR from holchan

1. Add Gemini LLM and Gemini Embedding support 
2. Fix CodeGraph issue with chunks being bigger than maximum token value
3. Add Tokenizer adapters to CodeGraph

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
    - Added support for the Gemini LLM provider.
    - Expanded LLM configuration options.
- Introduced a new GitHub Actions workflow for multimetric QA
evaluation.
- Added new environment variables for LLM and embedding configurations
across various workflows.

- **Bug Fixes**
    - Improved error handling in various components.
    - Updated tokenization and embedding processes.
    - Removed warning related to missing `dict` method in data items.

- **Refactor**
    - Simplified token extraction and decoding methods.
    - Updated tokenizer interfaces.
    - Removed deprecated dependencies.
    - Enhanced retry logic and error handling in embedding processes.

- **Documentation**
    - Updated configuration comments and settings.

- **Chores**
- Updated GitHub Actions workflows to accommodate new secrets and
environment variables.
    - Modified evaluation parameters.
    - Adjusted dependency management for optional libraries.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: holchan <61059652+holchan@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2025-01-31 18:03:23 +01:00
hajdul88
f843c256e4
feat: Use unwind for batch edge save and add unit tests for get_graph_from_model
* feat: adds some unit tests for get_graph_from_model

* feat: updates neo4j add_edges cypher and deletes shallow get_graph_from_model

* fix: fixing merge conflict false resolve

* chore: deletes old only_root unit test
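
A batch edge save with `UNWIND` sends one parameterized query per relationship type instead of one query per edge. The sketch below only builds the query text and parameters; the function name, the `id` node property, and the edge tuple layout are assumptions, and the Cypher in the actual commit may differ.

```python
from collections import defaultdict


def build_add_edges_queries(edges):
    """Group edges by relationship name and build one UNWIND query per group.

    `edges` is an iterable of (source_id, target_id, relationship_name,
    properties) tuples; this layout is assumed for illustration.
    """
    grouped = defaultdict(list)
    for source_id, target_id, relationship_name, properties in edges:
        grouped[relationship_name].append(
            {"source": source_id, "target": target_id, "properties": dict(properties)}
        )

    queries = []
    for relationship_name, rows in grouped.items():
        # The type is interpolated into the query text, so validate it first.
        if not relationship_name.isidentifier():
            raise ValueError(f"Unsafe relationship name: {relationship_name!r}")
        query = (
            "UNWIND $edges AS edge\n"
            "MATCH (source {id: edge.source})\n"
            "MATCH (target {id: edge.target})\n"
            f"MERGE (source)-[r:`{relationship_name}`]->(target)\n"
            "SET r += edge.properties"
        )
        queries.append((query, {"edges": rows}))
    return queries
```

Each returned pair can be passed to a driver call that accepts a query string and a parameter dictionary.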
2025-01-31 13:14:04 +01:00
alekszievr
a79f7133fd
Feat: add number of tokens and descriptive graph metrics to metric table [COG-1132] (#481)
* Count the number of tokens in documents

* save token count to relational db

* Add metrics to metric table

* Store list as json instead of array in relational db table

* Sum in sql instead of python

* Unify naming

* Return data_points in descriptive metric calculation task
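
The "sum in SQL instead of Python" step can be illustrated with the standard library's `sqlite3` module. The `data` table and its `dataset_id` and `token_count` columns are hypothetical; the commit's actual schema is not shown here.

```python
import sqlite3


def create_demo_database() -> sqlite3.Connection:
    """Create an in-memory table with hypothetical per-document token counts."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE data (id INTEGER PRIMARY KEY, dataset_id TEXT, token_count INTEGER)"
    )
    connection.executemany(
        "INSERT INTO data (dataset_id, token_count) VALUES (?, ?)",
        [("docs", 120), ("docs", 80), ("code", 40)],
    )
    return connection


def total_token_count(connection: sqlite3.Connection, dataset_id: str) -> int:
    """Let the database aggregate instead of fetching every row into Python."""
    row = connection.execute(
        "SELECT COALESCE(SUM(token_count), 0) FROM data WHERE dataset_id = ?",
        (dataset_id,),
    ).fetchone()
    return row[0]
```

`COALESCE` turns the `NULL` that `SUM` returns for an empty result set into `0`, so callers always receive a number.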

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2025-01-30 12:39:14 +01:00
Igor Ilic
6f8cbdbf1c
Merge branch 'dev' into COG-970-refactor-tokenizing 2025-01-28 15:44:57 +01:00
Igor Ilic
4e56cd64a1 refactor: Add max chunk tokens to code graph pipeline 2025-01-28 15:33:34 +01:00
Igor Ilic
3db7f85c9c feat: Add max_chunk_tokens value to chunkers
Add formula and forwarding of max_chunk_tokens value through Cognee
2025-01-28 14:32:00 +01:00
Boris Arzentar
3320bc8f2c feat: add codegraph related API endpoints 2025-01-28 10:08:59 +01:00
Igor Ilic
93249c72c5 fix: Initial commit to resolve issue with using tokenizer based on LLMs
Currently, TikToken is used for tokenizing by default, but it is only supported by OpenAI.
This is an initial commit in an attempt to add Cognee tokenizing support for multiple LLMs.
2025-01-21 19:53:22 +01:00
Igor Ilic
0c7c1d7503 refactor: Refactor ingestion to only have one ingestion task 2025-01-20 14:33:47 +01:00
lxobr
65a0c98455
COG-989 feat: make tasks a configurable argument in the cognify function (#442)
* feat: make tasks a configurable argument in the cognify function

* fix: add data points task

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-01-17 10:20:57 +01:00
vasilije
cb7b2d311e Fix for now 2025-01-16 21:36:25 +01:00
Rita Aleksziev
a11b914f39 Merge branch 'dev' into COG-949 2025-01-10 10:02:56 +01:00
Igor Ilic
6b57bfc4cb feat: Add ability to change graph database configuration through cognee 2025-01-09 16:41:18 +01:00
Rita Aleksziev
626bc76f5c Set max_tokens in config 2025-01-09 12:53:26 +01:00
hajdul88
341f30fcdc fix: Fixes ruff formatting 2025-01-09 12:00:49 +01:00
hajdul88
fe57eb69e7
Merge branch 'dev' into feature/cog-967-adding-graph-completion-feature-to-cognee 2025-01-09 11:07:19 +01:00
Rita Aleksziev
5635da6e38 Adjust unit tests 2025-01-09 10:53:03 +01:00
hajdul88
d39140f28b feat: implements the first version of graph based completion in search 2025-01-08 16:10:29 +01:00
Rita Aleksziev
97814e334f Get embedding engine instead of passing it in code chunking. 2025-01-08 13:45:04 +01:00
Rita Aleksziev
34a9267f41 Get embedding engine instead of passing it. Get it from vector engine instead of direct getter. 2025-01-08 13:23:17 +01:00
vasilije
41b1486cff Fix visualization 2025-01-08 13:13:52 +01:00
hajdul88
18c8bc3c33
Merge branch 'dev' into COG-adding_html_graph_render 2025-01-08 10:44:11 +01:00
alekszievr
0dec704445
Merge branch 'dev' into COG-949 2025-01-08 10:21:07 +01:00
vasilije
61897c57e8 Fix visualization 2025-01-07 15:25:16 +01:00
vasilije
2d10065166 Fix visualization 2025-01-07 15:21:44 +01:00