cognee

Author	SHA1	Message	Date
Daniel Molnar	bb68d6a0df	Docstring tasks. (#878) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-05-27 21:33:16 +02:00
Boris	f9e6dcf837	fix: simplify code pipeline (#529) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit - New Features: Enhanced code search and dependency analysis for improved accuracy. Introduced a new high-performance text embedding option. Added an additional execution entry point for code graph processing. New optional parameters for flexible property selection in retrieval functions. Introduced new classes for handling import statements, function definitions, and class definitions. Updated embedding engine selection based on configuration options. - Bug Fixes: Improved error handling in search operations and database queries for a more stable user experience. Enhanced error logging for source code parsing. - Refactor: Streamlined asynchronous processing and refactored internal dependency extraction. Updated configuration and integration settings to enhance overall reliability. Restructured functions for simplified dependency handling. - Chores: Upgraded and reorganized dependency management with optional libraries for extended functionality. Added new secret parameters for embedding configuration in workflow settings. --------- Co-authored-by: vasilije <vas.markovic@gmail.com>	2025-02-12 23:58:48 +01:00
Boris	8f84713b54	fix: support structured data conversion to data points (#512) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit - New Features: Introduced version tracking and enhanced metadata in core data models for improved data consistency. - Bug Fixes: Improved error handling during graph data loading to prevent disruptions from unexpected identifier formats. - Refactor: Centralized identifier parsing and streamlined model definitions, ensuring smoother and more consistent operations across search, retrieval, and indexing workflows.	2025-02-10 17:16:13 +01:00
Boris	f75e35c337	fix: custom model pipeline (#508) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. ## Summary by CodeRabbit - New Features: Graph visualizations now allow exporting to a user-specified file path for more flexible output management. The text embedding process has been enhanced with an additional tokenizer option for improved performance. A new `ExtendableDataPoint` class has been introduced for future extensions. New JSON files for companies and individuals have been added to facilitate testing and data processing. - Improvements: Search functionality now uses updated identifiers for more reliable content retrieval. Metadata handling has been streamlined across various classes by removing unnecessary type specifications. Enhanced serialization of properties in the Neo4j adapter for improved handling of complex structures. The setup process for databases has been improved with a new asynchronous setup function. - Chores: Dependency and configuration updates improve overall stability and performance.	2025-02-08 02:00:15 +01:00
hajdul88	56cc223302	feat: adds pydantic types to graph layer models	2025-01-09 16:46:41 +01:00
vasilije	60c8fd103b	ruff format	2025-01-05 19:09:08 +01:00
lxobr	262deee26e	Cog 813 source code chunks (#383) * fix: pass the list of all CodeFiles to enrichment task * feat: introduce SourceCodeChunk, update metadata * feat: get_source_code_chunks code graph pipeline task * feat: integrate get_source_code_chunks task, comment out summarize_code * Fix code summarization (#387) * feat: update data models * feat: naive parse long strings in source code * fix: get_non_py_files instead of get_non_code_files * fix: limit recursion, add comment * handle embedding empty input error (#398) * feat: robustly handle CodeFile source code * refactor: sort imports * todo: add support for other embedding models * feat: add custom logger * feat: add robustness to get_source_code_chunks Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * feat: improve embedding exceptions * refactor: format indents, rename module --------- Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2024-12-26 13:53:38 +01:00
alekszievr	bfa0f06fb4	Add type to DataPoint metadata (#364) * Add type to DataPoint metadata * Add missing index_fields * Use DataPoint UUID type in pgvector create_data_points * Make _metadata mandatory everywhere	2024-12-16 16:27:03 +01:00
hajdul88	6d85165189	Feature/cog 539 implementing additional retriever approaches (#262) * fix: refactor get_graph_from_model to return nodes and edges correctly * fix: add missing params * fix: remove complex zip usage * fix: add edges to data_point properties * fix: handle rate limit error coming from llm model * fix: fixes lost edges and nodes in get_graph_from_model * fix: fixes database pruning issue in pgvector * fix: fixes database pruning issue in pgvector (#261) * feat: adds code summary embeddings to vector DB * fix: cognee_demo notebook pipeline is not saving summaries * feat: implements first version of codegraph retriever * chore: implements minor changes mostly to make the code production ready * fix: turns off raising duplicated edges unit test as we have these in our current codegraph generation * feat: implements unit tests for description to codepart search * fix: fixes edge property inconsistent access in codepart retriever * chore: implements more precise typing for get_attribute method for cogneegraph * chore: adds spacing to tests and changes the cogneegraph getter names --------- Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>	2024-12-10 11:07:06 +01:00
Boris	348610e73c	fix: refactor get_graph_from_model to return nodes and edges correctly (#257) * fix: handle rate limit error coming from llm model * fix: fixes lost edges and nodes in get_graph_from_model * fix: fixes database pruning issue in pgvector (#261) * fix: cognee_demo notebook pipeline is not saving summaries --------- Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>	2024-12-06 12:52:01 +01:00
0xideas	80b06c3acb	test: Test for code graph enrichment task Co-authored-by: lxobr <lazar@topoteretes.com>	2024-11-24 19:24:47 +01:00
Igor Ilic	15b7b8ef2b	fix: Resolve issue with table names in SQL commands. Some SQL commands require lowercase characters in table names unless the table name is wrapped in quotes. Renamed all new tables to use lowercase. Fix COG-677	2024-11-20 14:54:35 +01:00
Leon Luithlen	66fb2948f8	Small cleanup pull request	2024-11-12 15:37:03 +01:00

13 commits