cognee

Author	SHA1	Message	Date
Dmitrii Galkin	e147fa5bde	feat: Add support for ChromaDB (#622 ) <!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> # Add Support for ChromaDB ## Summary This PR adds support for ChromaDB as a vector database option in the Cognee application. ChromaDB is a modern, open-source embedding database designed for AI applications. ## Changes - Created a new ChromaDBAdapter implementation for vector database operations - Added comprehensive test suite for ChromaDB functionality - Updated docker-compose.yml to include ChromaDB service - Modified environment configuration to support ChromaDB settings - Updated vector engine creation logic to support ChromaDB as an option ## Technical Details - Implemented `ChromaDBAdapter.py` (347 lines) with full CRUD operations for vector data - Created test suite (`test_chromadb.py`) with 171 lines of test coverage - Updated vector engine creation process to dynamically select ChromaDB when configured - Modified settings router to accommodate new database option - Updated environment template with ChromaDB configuration options ## Docker Changes - Added ChromaDB service to docker-compose.yml with appropriate configuration This PR enhances Cognee's flexibility by providing an alternative vector database option, allowing users to choose the most appropriate database for their specific use case. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin Tested with UI + tests. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Expanded vector database integration by adding support for Chromadb, enabling enhanced data management and search functionalities. - Tests - Added automated tests to validate the Chromadb integration and related operations. - Chores - Updated configuration guidance and dependency management to include Chromadb. - Provided an optional container deployment template for Chromadb. - Added a new entry to ignore the `.chromadb_data/` directory in version control. - Introduced a new GitHub Actions workflow for testing Chromadb integration. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com> Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>	2025-03-13 15:13:04 +01:00
Igor Ilic	88ed411f03	feat: user authorization [COG-1189] (#593 ) <!-- .github/pull_request_template.md --> ## Description Added user authorization through JWT header, reworked user and relevant RBAC models to accompany future User Permission system. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Introduced an automated workflow to validate server startup. - Added secure JWT token generation for improved session handling. - Enabled a new structure for permission management with role and tenant-based controls, including endpoints for creating roles, tenants, and assigning permissions. - Added methods for assigning default permissions to roles, tenants, and users. - Introduced new classes for managing default permissions for roles, tenants, and users. - Refactor - Streamlined authentication and user management flows with enhanced error handling. - Tests - Upgraded integration tests with improved database initialization and data pruning for a more stable environment. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>	2025-03-13 13:33:42 +01:00
lxobr	38d527ceac	fix: expose chunk_size for eval framework [COG-1546] (#634 ) <!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> - Exposed chunk_size in get_default_tasks in cognify - Reintegrated chunk_size in corpus building in eval framework ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Introduced an optional configuration parameter to allow users to set custom processing segment sizes. This enhances flexibility in managing content processing and task execution, enabling more dynamic control over resource handling during corpus creation and related operations. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-03-12 16:13:20 +01:00
hajdul88	6fcfb3c398	feat: productionizing ontology solution [COG-1401] (#623 ) <!-- .github/pull_request_template.md --> ## Description This PR contains the ontology feature integrated into cognify ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Enhanced ontology management with the introduction of the `OntologyResolver` class for improved data handling and querying. - Expanded ontology framework now provides enriched coverage of technology and automotive domains, including new entities and relationships. - Updated entity models now include a validation flag to support improved data integrity. - Added support for specifying an ontology file path in relevant functions to enhance flexibility. - Refactor - Streamlined integration of ontology processing across data extraction and workflow routines. - Chores - Updated project dependencies to include `owlready2` for advanced ontology functionality. - Tests - Introduced a new test suite for the `OntologyResolver` class to validate its functionality under various conditions. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-03-12 14:31:19 +01:00
alekszievr	c1f7b667d1	feat: Eliminate the use of max_chunk_tokens and use a unified max_chunk_size instead [cog-1381] (#626 ) <!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Refactor - Simplified text processing by unifying multiple size-related parameters into a single metric across chunking and extraction functionalities. - Streamlined logic for text segmentation by removing redundant calculations and checks, resulting in a more consistent chunk management process. - Chores - Removed the `modal` package as a dependency. - Documentation - Updated the README.md to include a new demo video link and clarified default environment variable settings. - Enhanced the CONTRIBUTING.md to improve clarity and engagement for potential contributors. - Bug Fixes - Improved handling of sentence-ending punctuation in text processing to include additional characters. - Version Update - Updated project version to 0.1.33 in the pyproject.toml file. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-03-12 14:03:41 +01:00
hajdul88	e3f3d49a3b	Feature/cog 1312 integrating evaluation framework into dreamify (#562 ) <!-- .github/pull_request_template.md --> ## Description This PR contains eval framework changes due to the autooptimizer integration ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Enhanced answer generation now returns structured answer details. - Search functionality accepts configurable prompt inputs. - Option to generate a metrics dashboard from evaluations. - Corpus building tasks now support adjustable chunk settings for greater flexibility. - New task retrieval functionality allows for flexible task configuration. - Introduced new methods for creating and managing metrics dashboards. - Refactor/Chore - Streamlined API signatures and reorganized module interfaces for better consistency. - Updated import paths to reflect new module structure. - Tests - Updated test scenarios to align with new configurations and parameter adjustments. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-03-03 19:55:47 +01:00
Daniel Molnar	d27f847753	Transition to new retrievers, update searches (#585 ) <!-- .github/pull_request_template.md --> ## Description Delete legacy search implementations after migrating to new retriever classes ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Enhanced search and retrieval capabilities, providing improved context resolution for code queries, completions, summaries, and graph connections. - Refactor - Shifted to a modular, object-oriented approach that consolidates query logic and streamlines error management for a more robust and scalable experience. - Bug Fixes - Improved error handling for unsupported search types and retrieval operations. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-02-27 15:25:24 +01:00
Boris	711ae8e675	feat: codegraph improvements and new CODE search [COG-1351] (#581 ) <!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Introduced an automated deployment workflow to build and push container images. - Updated dependency management to include additional database support. - Refactor - Enhanced asynchronous operations and logging in the server for improved performance. - Optimized extraction and retrieval processes for code-related data. - Chores - Streamlined build configurations and startup scripts for greater reliability. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com> Co-authored-by: Igor Ilic <igorilic03@gmail.com>	2025-02-26 20:15:02 +01:00
alekszievr	2a167fa1ab	feat: externalize chunkers [cog-1354] (#547 ) <!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Enhanced document chunk extraction for improved processing consistency across multiple formats. - Refactor - Streamlined the configuration for text chunking by replacing indirect mappings with a direct instantiation approach across document types. - Updated method signatures across various document classes to accept chunker class references instead of string identifiers. - Chores - Removed legacy configuration utilities related to document chunking to simplify processing. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Boris <boris@topoteretes.com>	2025-02-19 13:26:11 +01:00
Igor Ilic	46e026f77f	Cognee gui [COG-1307] (#530 ) <!-- .github/pull_request_template.md --> ## Description Add a simple GUI to add documents to Cognee and use GRAPH_COMPLETION search to get answers ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Introduced an interactive file search interface with intuitive controls. Users can easily upload files, enter search terms, and view results in a unified display with clear notifications during processing. - Chores - Updated project dependencies to include `pyside6` and `qasync` for enhanced GUI functionality. - Refined background query processing to improve the accuracy and relevance of search outcomes. - Improved code readability with formatting enhancements in the search function. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-02-14 15:51:33 +01:00
SJ	a602094598	feat: Update parameters in search API route to match search function parameters order (#528 ) <!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> Updated handling of SearchType through the chain: Router receives JSON with searchType Enum, example: "searchType": "CHUNKS" FastAPI converts to SearchType enum via SearchPayloadDTO search_v2.py expects SearchType enum search.py takes SearchType enum and extracts value log_query.py takes string value Query model stores string in database get_search_router.py Matched the exact field name from JSON payload searchType instead of search_type in the SearchPayloadDTO class. Changed cognee_search() params to use payload.query and payload.searchType search.py Changed query_type to SearchType log_query to accept query_type.value parameter instead of str(query_type) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Refactor - Updated the search functionality to improve consistency and reliability. - Enhanced validation by switching to stricter search type checks, ensuring only valid search types are processed. - Maintained robust error handling for uninterrupted search operations. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com>	2025-02-13 21:04:31 +01:00
Boris Arzentar	d0d8559453	fix: consolidate api/sdk/mcp search	2025-02-13 13:15:39 +01:00
Boris	f9e6dcf837	fix: simplify code pipeline (#529 ) <!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Enhanced code search and dependency analysis for improved accuracy. - Introduced a new high-performance text embedding option. - Added an additional execution entry point for code graph processing. - New optional parameters for flexible property selection in retrieval functions. - Introduced new classes for handling import statements, function definitions, and class definitions. - Updated embedding engine selection based on configuration options. - Bug Fixes - Improved error handling in search operations and database queries for a more stable user experience. - Enhanced error logging for source code parsing. - Refactor - Streamlined asynchronous processing and refactored internal dependency extraction. - Updated configuration and integration settings to enhance overall reliability. - Restructured functions for simplified dependency handling. - Chores - Upgraded and reorganized dependency management with optional libraries for extended functionality. - Added new secret parameters for embedding configuration in workflow settings. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: vasilije <vas.markovic@gmail.com>	2025-02-12 23:58:48 +01:00
Vasilije	9ba2e0d6c1	chore: Fix and update visualization (#518 ) <!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Introduced enhanced visualization capabilities that let users launch a dedicated server for visual displays. - Documentation - Updated several interactive notebooks to include execution outputs and expanded explanatory content for better user guidance. - Style - Refined formatting and layout across notebooks to ensure consistent presentation and improved readability. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>	2025-02-11 19:25:01 +01:00
alekszievr	05ba29af01	Feat: log pipeline status and pass it through pipeline [COG-1214] (#501 ) <!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Enhanced pipeline execution now provides consolidated status feedback with improved telemetry for start, completion, and error events. - Automatic generation of unique dataset identifiers offers clearer task and pipeline run associations. - Refactor - Task execution has been streamlined with explicit parameter handling for more structured pipeline processing. - Interactive examples and demos now return results directly, making integration and monitoring more accessible. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>	2025-02-11 16:41:40 +01:00
Boris	8f84713b54	fix: support structured data conversion to data points (#512 ) <!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Introduced version tracking and enhanced metadata in core data models for improved data consistency. - Bug Fixes - Improved error handling during graph data loading to prevent disruptions from unexpected identifier formats. - Refactor - Centralized identifier parsing and streamlined model definitions, ensuring smoother and more consistent operations across search, retrieval, and indexing workflows. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-02-10 17:16:13 +01:00
Boris	f75e35c337	fix: custom model pipeline (#508 ) <!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features • Graph visualizations now allow exporting to a user-specified file path for more flexible output management. • The text embedding process has been enhanced with an additional tokenizer option for improved performance. • A new `ExtendableDataPoint` class has been introduced for future extensions. • New JSON files for companies and individuals have been added to facilitate testing and data processing. - Improvements • Search functionality now uses updated identifiers for more reliable content retrieval. • Metadata handling has been streamlined across various classes by removing unnecessary type specifications. • Enhanced serialization of properties in the Neo4j adapter for improved handling of complex structures. • The setup process for databases has been improved with a new asynchronous setup function. - Chores • Dependency and configuration updates improve overall stability and performance. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-02-08 02:00:15 +01:00
Igor Ilic	5fe7ff9883	refactor: Refactor search so graph completion is used by default (#505 ) <!-- .github/pull_request_template.md --> ## Description Refactor search so query type doesn't need to be provided to make it simpler for new users ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Refactor - Improved the search interface by standardizing parameter usage with explicit keyword arguments for specifying search types, enhancing clarity and consistency. - Tests - Updated test cases and example integrations to align with the revised search parameters, ensuring consistent behavior and reliable validation of search outcomes. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-02-07 17:16:34 +01:00
alekszievr	8396fed9a1	feat: metrics in neo4j adapter [COG-1082] (#487 ) <!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Enhanced graph management capabilities allow users to verify graph existence, project complete graphs, and remove graphs, delivering more comprehensive graph insights. - Refactor - Adjusted default task behavior for streamlined performance. - Updated timestamp handling to ensure accurate and consistent record tracking. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>	2025-02-07 15:58:43 +01:00
hajdul88	bcd326518d	feat: implements graph visualization method for cognee (#493 ) <!-- .github/pull_request_template.md --> ## Description This PR contains the improvement of the visualization endpoint ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Launched an enhanced interactive network visualization utility that renders dynamic, browser-based graphs. The new feature simplifies execution by directly generating an HTML file showcasing the visualization—complete with interactive elements and an on-screen confirmation—providing a more intuitive and efficient experience. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-02-06 11:22:17 +01:00
Igor Ilic	df163b0431	Add pydantic settings checker (#497 ) <!-- .github/pull_request_template.md --> ## Description Add test of embedding and LLM model at beginning of cognee use Fix issue with relational database async use Refactor handling of cache mechanism for all databases so changes in config can be reflected in get functions ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Introduced connection testing for language and embedding services at startup, ensuring improved reliability during data addition. - Refactor - Streamlined engine initialization across multiple database systems to enhance performance and clarity. - Improved parameter handling and caching strategies for faster, more consistent operations. - Updated record identifiers for more robust and unique data storage. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: holchan <61059652+holchan@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com>	2025-02-04 23:18:27 +01:00
Igor Ilic	1260fc7db0	fix: Add reraising of general exception handling in cognee [COG-1062] (#490 ) <!-- .github/pull_request_template.md --> ## Description Add re-raising of errors in general exception handling ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Bug Fixes & Stability Improvements - Enhanced error handling throughout the system, ensuring issues during operations like server startup, data processing, and graph management are properly logged and reported. - Refactor - Standardized logging practices replace basic output statements, improving traceability and providing better insights for troubleshooting. - New Features - Updated search functionality now returns only unique results, enhancing data consistency and the overall user experience. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: holchan <61059652+holchan@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com>	2025-02-04 10:51:05 +01:00
Vasilije	4d3acc358a	fix: mcp improvements (#472 ) <!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Dependency Update - Downgraded `mcp` package version from 1.2.0 to 1.1.3 - Updated `cognee` dependency to include additional features with `cognee[codegraph]` - New Features - Introduced a new tool, "codify", for transforming codebases into knowledge graphs - Enhanced the existing "search" tool to accept a new parameter for search type - Improvements - Streamlined search functionality with a new modular approach - Added new asynchronous function for retrieving and formatting code parts - Documentation - Updated import paths for `SearchType` in various modules and tests to reflect structural changes - Code Cleanup - Removed legacy search module and associated classes/functions - Refined data transfer object classes for consistency and clarity <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>	2025-02-04 08:47:31 +01:00
alekszievr	2858a674f5	feat: Calculate graph metrics for networkx graph [COG-1082] (#484 ) <!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Enabled an option to retrieve more detailed metrics, providing comprehensive analytics for graph and descriptive data. - Refactor - Standardized the way metrics are obtained across components for consistent behavior and improved data accuracy. - Chore - Made internal enhancements to support optional detailed metric calculations, streamlining system performance and ensuring future scalability. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>	2025-02-03 18:05:53 +01:00
alekszievr	5119992fd8	feat: Add graph metrics getter in graph db interface and adapters [COG-1082] (#483 ) Dummy implementation of graph metrics to demonstrate how the interface will look like <!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Introduced asynchronous functionality for retrieving comprehensive graph metrics, including counts and connectivity details, across different systems. - Refactor - Streamlined metrics processing and storage by shifting to direct retrieval from the graph engine. - Updated naming conventions for the `GraphMetrics` database table and reorganized module imports to enhance internal consistency. - Chores - Removed dataset deletion functionalities while introducing the ability to store descriptive metrics. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>	2025-02-03 15:25:04 +01:00
Igor Ilic	8879f3fbbe	feat: Add gemini support [COG-1023] (#485 ) <!-- .github/pull_request_template.md --> ## Description PR to test Gemini PR from holchan 1. Add Gemini LLM and Gemini Embedding support 2. Fix CodeGraph issue with chunks being bigger than maximum token value 3. Add Tokenizer adapters to CodeGraph ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added support for the Gemini LLM provider. - Expanded LLM configuration options. - Introduced a new GitHub Actions workflow for multimetric QA evaluation. - Added new environment variables for LLM and embedding configurations across various workflows. - Bug Fixes - Improved error handling in various components. - Updated tokenization and embedding processes. - Removed warning related to missing `dict` method in data items. - Refactor - Simplified token extraction and decoding methods. - Updated tokenizer interfaces. - Removed deprecated dependencies. - Enhanced retry logic and error handling in embedding processes. - Documentation - Updated configuration comments and settings. - Chores - Updated GitHub Actions workflows to accommodate new secrets and environment variables. - Modified evaluation parameters. - Adjusted dependency management for optional libraries. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: holchan <61059652+holchan@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com>	2025-01-31 18:03:23 +01:00
hajdul88	f843c256e4	feat: Use unwind for batch edge save and add unit tests for get_graph_from_model * feat: adds some unit tests for get_graph_from_model * feat: updates neo4j add_edges cypher and deletes shallow get_graph_from_model * fix: fixing merge conflict false resolve * chore: deletes old only_root unit test	2025-01-31 13:14:04 +01:00
alekszievr	a79f7133fd	Feat: add number of tokens and descriptive graph metrics to metric table [COG-1132] (#481 ) * Count the number of tokens in documents * save token count to relational db * Add metrics to metric table * Store list as json instead of array in relational db table * Sum in sql instead of python * Unify naming * Return data_points in descriptive metric calculation task --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>	2025-01-30 12:39:14 +01:00
Igor Ilic	6f8cbdbf1c	Merge branch 'dev' into COG-970-refactor-tokenizing	2025-01-28 15:44:57 +01:00
Igor Ilic	4e56cd64a1	refactor: Add max chunk tokens to code graph pipeline	2025-01-28 15:33:34 +01:00
Igor Ilic	3db7f85c9c	feat: Add max_chunk_tokens value to chunkers Add formula and forwarding of max_chunk_tokens value through Cognee	2025-01-28 14:32:00 +01:00
Boris Arzentar	3320bc8f2c	feat: add codegraph related API endpoints	2025-01-28 10:08:59 +01:00
Igor Ilic	93249c72c5	fix: Initial commit to resolve issue with using tokenizer based on LLMs Currently TikToken is used for tokenizing by default which is only supported by OpenAI, this is an initial commit in an attempt to add Cognee tokenizing support for multiple LLMs	2025-01-21 19:53:22 +01:00
Igor Ilic	0c7c1d7503	refactor: Refactor ingestion to only have one ingestion task	2025-01-20 14:33:47 +01:00
lxobr	65a0c98455	COG-989 feat: make tasks a configurable argument in the cognify function (#442 ) * feat: make tasks a configurable argument in the cognify function * fix: add data points task --------- Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>	2025-01-17 10:20:57 +01:00
vasilije	cb7b2d311e	Fix for now	2025-01-16 21:36:25 +01:00
Rita Aleksziev	a11b914f39	Merge branch 'dev' into COG-949	2025-01-10 10:02:56 +01:00
Igor Ilic	6b57bfc4cb	feat: Add ability to change graph database configuration through cognee	2025-01-09 16:41:18 +01:00
Rita Aleksziev	626bc76f5c	Set max_tokens in config	2025-01-09 12:53:26 +01:00
hajdul88	341f30fcdc	fix: Fixes ruff formatting	2025-01-09 12:00:49 +01:00
hajdul88	fe57eb69e7	Merge branch 'dev' into feature/cog-967-adding-graph-completion-feature-to-cognee	2025-01-09 11:07:19 +01:00
Rita Aleksziev	5635da6e38	Adjust unit tests	2025-01-09 10:53:03 +01:00
hajdul88	d39140f28b	feat: implements the first version of graph based completion in search	2025-01-08 16:10:29 +01:00
Rita Aleksziev	97814e334f	Get embedding engine instead of passing it in code chunking.	2025-01-08 13:45:04 +01:00
Rita Aleksziev	34a9267f41	Get embedding engine instead of passing it. Get it from vector engine instead of direct getter.	2025-01-08 13:23:17 +01:00
vasilije	41b1486cff	Fix visualization	2025-01-08 13:13:52 +01:00
hajdul88	18c8bc3c33	Merge branch 'dev' into COG-adding_html_graph_render	2025-01-08 10:44:11 +01:00
alekszievr	0dec704445	Merge branch 'dev' into COG-949	2025-01-08 10:21:07 +01:00
vasilije	61897c57e8	Fix visualization	2025-01-07 15:25:16 +01:00
vasilije	2d10065166	Fix visualization	2025-01-07 15:21:44 +01:00

253 commits