cognee

Author	SHA1	Message	Date
Igor Ilic	dbdf04c089	Data model migration (#1143) ## Description Data model migration for new release ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-07-24 15:03:16 +02:00
Boris	d6727a1b4a	fix: UnstructuredDocument read method (#1141) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-07-24 13:23:27 +02:00
Boris	c5bd6bed40	fix: s3 file storage (#1095) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-07-16 20:36:18 +02:00
Boris	46c4463cb2	feat: s3 storage (#988) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: vasilije <vas.markovic@gmail.com> Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>	2025-07-14 21:47:08 +02:00
Igor Ilic	f68fd59b95	feat: Data size info tracking (#1088) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-07-14 19:03:58 +02:00
Boris Arzentar	fa5ea44345	Merge remote-tracking branch 'origin/dev' into feat/modal-parallelization	2025-07-06 21:03:10 +02:00
Boris Arzentar	685d282f5c	fix: add error handling	2025-07-06 21:03:02 +02:00
hajdul88	3c3c89a140	fix: Adds graceful handling quick fix for damaged pdf files (#1047) ## Description fix: Adds graceful handling quick fix for damaged pdf files ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-07-06 13:09:42 +02:00
Boris Arzentar	4eba76ca1f	Merge remote-tracking branch 'origin/dev' into feat/modal-parallelization	2025-07-04 15:37:57 +02:00
Vasilije	ada3f7b086	fix: Logger suppression and database logs (#1041) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com> Co-authored-by: Igor Ilic <igorilic03@gmail.com>	2025-07-03 20:08:27 +02:00
Boris Arzentar	86bd3e4a5a	Merge remote-tracking branch 'origin/dev' into feat/modal-parallelization	2025-07-02 11:28:22 +02:00
Igor Ilic	0d75b6dc76	Merge branch 'main' into main-merge	2025-06-30 12:24:24 +02:00
Hashem Aldhaheri	fd77e92cc4	Fix: Handle file:// URLs in open_data_file function (#1019) ## Summary This PR fixes an asymmetry issue where files saved with `file://` prefixes could not be read back, causing "file not found" errors. ## Problem The Cognee framework has a bug where: - `save_data_to_file.py` adds `file://` prefix when saving files - `open_data_file.py` doesn't handle the `file://` prefix when reading files - This causes saved files to appear as "lost" with cryptic "file not found" errors ## Solution Added proper handling for `file://` URLs in `open_data_file.py` by: - Checking if the file path starts with `"file://"` - Stripping the prefix using `replace("file://", "", 1)` - Following the same pattern as S3 URL handling ## Changes - Modified `cognee/modules/data/processing/document_types/open_data_file.py` to handle `file://` URLs - Added comprehensive unit tests in `cognee/tests/unit/modules/data/test_open_data_file.py` ## Testing Added 6 test cases covering: - Regular file paths (ensuring backward compatibility) - file:// URLs in text mode - file:// URLs in binary mode - file:// URLs with specific encoding - Nonexistent files with file:// URLs - Edge case with multiple file:// prefixes All tests pass successfully. ## Notes - This is a minimal fix that maintains backward compatibility - The fix follows the existing pattern used for S3 URL handling - No breaking changes to the API I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. Signed-off-by: Hashem Aldhaheri <aenawi@gmail.com>	2025-06-30 11:55:34 +02:00
Igor Ilic	14be2a5f5d	feat: Add dataset_id to pipeline run info and status (#1009) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-06-30 11:53:17 +02:00
hajdul88	21a4217301	Feature: Makes s3 pathway imports optional so cognee can run without s3fs (#978) ## Description Makes s3 pathway imports optional so cognee can run without s3fs ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-06-13 08:53:30 +02:00
Igor Ilic	1ed6cfd918	feat: new Dataset permissions (#869) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Boris Arzentar <borisarzentar@gmail.com> Co-authored-by: Boris <boris@topoteretes.com>	2025-06-06 14:20:57 +02:00
Boris	0aac93e9c4	Merge dev to main (#827) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: vasilije <vas.markovic@gmail.com> Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com> Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com> Co-authored-by: Igor Ilic <igorilic03@gmail.com> Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com> Co-authored-by: Matea Pesic <80577904+matea16@users.noreply.github.com> Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com> Co-authored-by: Daniel Molnar <soobrosa@gmail.com> Co-authored-by: Diego Baptista Theuerkauf <34717973+diegoabt@users.noreply.github.com>	2025-05-15 13:15:49 +02:00
Igor Ilic	966e337500	feat: add MCP check status tool [COG-1784] (#793) ## Description Added tools to check current cognify and codify status ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-05-13 12:09:14 -04:00
Boris Arzentar	a1e605ca97	fix: batch datapoints on save to limit bandwidth size	2025-05-12 11:28:13 +02:00
Igor Ilic	60da1c899e	fix: graph prompt path (#769) ## Description Fix graph prompt path ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Evain Arthur <arthur.evain35@gmail.com> Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com>	2025-04-23 12:03:51 +02:00
Vasilije	bb7eaa017b	feat: Group DataPoints into NodeSets (#680) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com> Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>	2025-04-19 20:21:04 +02:00
Boris	675b66175f	test: make search unit tests deterministic (#726) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Daniel Molnar <soobrosa@gmail.com>	2025-04-18 21:55:24 +02:00
Daniel Molnar	9ba12b25ef	feat: add delete by document (#668) ## Description Delete by document. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin	2025-04-17 15:42:10 +02:00
hajdul88	0121a2b5fc	feature: Adds S3 functionality (#731) ## Description Adds S3 support ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-04-17 08:56:40 +02:00
lxobr	8207dc8643	feat: make graph creation prompt configurable (#686) ## Description - Added new graph creation prompts - Exposed graph creation prompts in .cognify via get_default tasks - Exposed graph creation prompts in eval framework ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>	2025-04-03 11:14:33 +02:00
Boris	ebf1f81b35	fix: code cleanup [COG-781] (#667) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin	2025-03-26 18:32:43 +01:00
Daniel Molnar	73db1a5a53	fix: human readable logs (#658) ## Description Introducing structlog. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin	2025-03-25 11:54:40 +01:00
Boris	d192d1fe20	chore: remove unused dependencies and make some optional (#661) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin	2025-03-25 10:19:52 +01:00
Igor Ilic	7bf30f7373	fix: Cognee backend fixes (#659) ## Description Cognee backend fixes ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Improved handling of `tenant_id` in JWT payload for enhanced type safety. - Unique identifier generation for datasets now considers the owner ID, allowing for multiple users to share the same dataset name. - Bug Fixes - Disabled user role permissions in the permission check logic temporarily during a rework. - Refactor - Simplified dependencies by removing unnecessary model imports. - Updated parameter name from `tenant` to `tenant_id` for clarity in JWT creation.	2025-03-20 21:51:35 +01:00
Igor Ilic	88ed411f03	feat: user authorization [COG-1189] (#593) ## Description Added user authorization through JWT header, reworked user and relevant RBAC models to accompany future User Permission system. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced an automated workflow to validate server startup. - Added secure JWT token generation for improved session handling. - Enabled a new structure for permission management with role and tenant-based controls, including endpoints for creating roles, tenants, and assigning permissions. - Added methods for assigning default permissions to roles, tenants, and users. - Introduced new classes for managing default permissions for roles, tenants, and users. - Refactor - Streamlined authentication and user management flows with enhanced error handling. - Tests - Upgraded integration tests with improved database initialization and data pruning for a more stable environment. --------- Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>	2025-03-13 13:33:42 +01:00
alekszievr	c1f7b667d1	feat: Eliminate the use of max_chunk_tokens and use a unified max_chunk_size instead [cog-1381] (#626) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Refactor - Simplified text processing by unifying multiple size-related parameters into a single metric across chunking and extraction functionalities. - Streamlined logic for text segmentation by removing redundant calculations and checks, resulting in a more consistent chunk management process. - Chores - Removed the `modal` package as a dependency. - Documentation - Updated the README.md to include a new demo video link and clarified default environment variable settings. - Enhanced the CONTRIBUTING.md to improve clarity and engagement for potential contributors. - Bug Fixes - Improved handling of sentence-ending punctuation in text processing to include additional characters. - Version Update - Updated project version to 0.1.33 in the pyproject.toml file.	2025-03-12 14:03:41 +01:00
lxobr	f033f733b5	feat: entity brute force triplet search [COG-1325] (#589) ## Description - Refactored `brute_force_triplet_search`, extracting memory projection. - Built TripletSearchContextProvider (extends BaseContextProvider) to create a single memory projection and perform a triplet search for each entity. - Refactored `entity_completion` into EntityCompletionRetriever (extends BaseRetriever). - Added SummarizedTripletSearchContextProvider (extends TripletSearchContextProvider) for an alternative summarized output format. - Developed and tested an example showcasing both context providers, comparing raw triplets, summaries, and standard search results. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced text summarization now delivers clearer, more concise overviews of search results. - Improved search performance with optimized context retrieval and memory reuse for faster, more reliable results. - Introduced advanced entity-based completion for generating more relevant, context-aware responses. - Refactor - Streamlined internal workflows and error handling to ensure a smoother overall experience. --------- Co-authored-by: Boris <boris@topoteretes.com>	2025-03-05 11:17:58 +01:00
alekszievr	6d7a68dbba	Feat: Store descriptive metrics identified by pipeline run id [cog-1260] (#582) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced a new analytic capability that calculates descriptive graph metrics for pipeline runs when enabled. - Updated the execution flow to include an option for activating the graph metrics step. - Chores - Removed the previous mechanism for storing descriptive metrics to streamline the system. --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com>	2025-03-03 19:09:35 +01:00
alekszievr	a61df966c6	feat: use external chunker [cog-1354] (#551) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced a modular content chunking interface that offers flexible text segmentation with configurable chunk size and overlap. - Added new chunkers for enhanced text processing, including `LangchainChunker` and improved `TextChunker`. - Refactor - Unified the chunk extraction mechanism across various document types for improved consistency and type safety. - Updated method signatures to enhance clarity and type safety regarding chunker usage. - Enhanced error handling and logging during text segmentation to guide adjustments when content exceeds limits. - Bug Fixes - Adjusted expected output in tests to reflect changes in chunking logic and configurations.	2025-02-21 14:10:59 +01:00
alekszievr	2a167fa1ab	feat: externalize chunkers [cog-1354] (#547) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced document chunk extraction for improved processing consistency across multiple formats. - Refactor - Streamlined the configuration for text chunking by replacing indirect mappings with a direct instantiation approach across document types. - Updated method signatures across various document classes to accept chunker class references instead of string identifiers. - Chores - Removed legacy configuration utilities related to document chunking to simplify processing. --------- Co-authored-by: Boris <boris@topoteretes.com>	2025-02-19 13:26:11 +01:00
hajdul88	6a0c0e3ef8	feat: Cognee evaluation framework development (#498) This PR contains the evaluation framework development for cognee ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Expanded evaluation framework now integrates asynchronous corpus building, question answering, and performance evaluation with adaptive benchmarks for improved metrics (correctness, exact match, and F1 score). - Infrastructure - Added database integration for persistent storage of questions, answers, and metrics. - Launched an interactive metrics dashboard featuring advanced visualizations. - Introduced an automated testing workflow for continuous quality assurance. - Documentation - Updated guidelines for generating concise, clear answers.	2025-02-11 16:31:54 +01:00
Boris	f75e35c337	fix: custom model pipeline (#508) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features • Graph visualizations now allow exporting to a user-specified file path for more flexible output management. • The text embedding process has been enhanced with an additional tokenizer option for improved performance. • A new `ExtendableDataPoint` class has been introduced for future extensions. • New JSON files for companies and individuals have been added to facilitate testing and data processing. - Improvements • Search functionality now uses updated identifiers for more reliable content retrieval. • Metadata handling has been streamlined across various classes by removing unnecessary type specifications. • Enhanced serialization of properties in the Neo4j adapter for improved handling of complex structures. • The setup process for databases has been improved with a new asynchronous setup function. - Chores • Dependency and configuration updates improve overall stability and performance.	2025-02-08 02:00:15 +01:00
alekszievr	8396fed9a1	feat: metrics in neo4j adapter [COG-1082] (#487) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced graph management capabilities allow users to verify graph existence, project complete graphs, and remove graphs, delivering more comprehensive graph insights. - Refactor - Adjusted default task behavior for streamlined performance. - Updated timestamp handling to ensure accurate and consistent record tracking. --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>	2025-02-07 15:58:43 +01:00
Igor Ilic	1260fc7db0	fix: Add reraising of general exception handling in cognee [COG-1062] (#490) ## Description Add re-raising of errors in general exception handling ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Bug Fixes & Stability Improvements - Enhanced error handling throughout the system, ensuring issues during operations like server startup, data processing, and graph management are properly logged and reported. - Refactor - Standardized logging practices replace basic output statements, improving traceability and providing better insights for troubleshooting. - New Features - Updated search functionality now returns only unique results, enhancing data consistency and the overall user experience. --------- Co-authored-by: holchan <61059652+holchan@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com>	2025-02-04 10:51:05 +01:00
alekszievr	2858a674f5	feat: Calculate graph metrics for networkx graph [COG-1082] (#484) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enabled an option to retrieve more detailed metrics, providing comprehensive analytics for graph and descriptive data. - Refactor - Standardized the way metrics are obtained across components for consistent behavior and improved data accuracy. - Chore - Made internal enhancements to support optional detailed metric calculations, streamlining system performance and ensuring future scalability. --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>	2025-02-03 18:05:53 +01:00
alekszievr	5119992fd8	feat: Add graph metrics getter in graph db interface and adapters [COG-1082] (#483) Dummy implementation of graph metrics to demonstrate what the interface will look like ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced asynchronous functionality for retrieving comprehensive graph metrics, including counts and connectivity details, across different systems. - Refactor - Streamlined metrics processing and storage by shifting to direct retrieval from the graph engine. - Updated naming conventions for the `GraphMetrics` database table and reorganized module imports to enhance internal consistency. - Chores - Removed dataset deletion functionalities while introducing the ability to store descriptive metrics. --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>	2025-02-03 15:25:04 +01:00
alekszievr	a79f7133fd	Feat: add number of tokens and descriptive graph metrics to metric table [COG-1132] (#481) * Count the number of tokens in documents * save token count to relational db * Add metrics to metric table * Store list as json instead of array in relational db table * Sum in sql instead of python * Unify naming * Return data_points in descriptive metric calculation task --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>	2025-01-30 12:39:14 +01:00
alekszievr	edae2771a5	Count the number of tokens in documents [COG-1071] (#476) * Count the number of tokens in documents * save token count to relational db --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>	2025-01-29 11:29:09 +01:00
Igor Ilic	860218632f	refactor: add suggestions from PR Add suggestions made by CodeRabbit on pull request	2025-01-28 17:15:25 +01:00
Igor Ilic	710ca78d6e	Merge branch 'dev' into COG-970-refactor-tokenizing	2025-01-28 16:31:11 +01:00
alekszievr	98f0f60980	Feat: [cog-1089] Define pydantic models for descriptive graph metrics and input metrics (#466) * feat: make tasks a configurable argument in the cognify function * fix: add data points task * Define pydantic models for descriptive graph metrics and input metrics * remove to_json method * Use just one MetricData class instead of two --------- Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>	2025-01-28 16:11:31 +01:00
Igor Ilic	3db7f85c9c	feat: Add max_chunk_tokens value to chunkers Add formula and forwarding of max_chunk_tokens value through Cognee	2025-01-28 14:32:00 +01:00
Igor Ilic	6d5679f9d2	Merge branch 'dev' into COG-970-refactor-tokenizing	2025-01-23 18:14:49 +01:00
Igor Ilic	80e67b0619	refactor: Rename foreign to external metadata Rename foreign metadata to external metadata for metadata coming outside of Cognee	2025-01-22 16:07:35 +01:00
Igor Ilic	93249c72c5	fix: Initial commit to resolve issue with using tokenizer based on LLMs Currently TikToken is used for tokenizing by default which is only supported by OpenAI, this is an initial commit in an attempt to add Cognee tokenizing support for multiple LLMs	2025-01-21 19:53:22 +01:00


140 commits
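
Note on commit fd77e92cc4: its message describes checking for a leading `file://` and stripping it with `replace("file://", "", 1)` before opening the file. A minimal sketch of that check is shown below. The helper name `strip_file_scheme` is hypothetical and is not taken from the repository; the actual change lives in `open_data_file.py` and may differ in detail. This sketch removes only one leading prefix and does not handle hosts or percent-encoding in file URLs.

```python
def strip_file_scheme(path: str) -> str:
    """Return a plain filesystem path, removing one leading file:// prefix.

    Hypothetical helper for illustration only; it mirrors the approach
    described in the commit message, not the repository's actual code.
    """
    if path.startswith("file://"):
        # Remove only the first occurrence so the rest of the path is untouched.
        return path.replace("file://", "", 1)
    # Regular paths pass through unchanged (backward compatibility).
    return path
```

A caller would pass the result to the built-in `open`, for example `open(strip_file_scheme(location), mode="r", encoding="utf-8")`, where `location` is whatever path or URL was stored at save time.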