cognee

Author	SHA1	Message	Date
Igor Ilic	f1144abc54	refactor: remove LLMGateway usage where not needed	2025-09-09 13:50:16 +02:00
Igor Ilic	89b51a244d	feat: Add baml dynamic typing	2025-09-09 13:12:59 +02:00
hajdul88	affbc557d2	chore: ruff formatting	2025-08-14 14:17:35 +02:00
hajdul88	63d071f0d8	feat: adds input checks for add datapoints and summarization tasks	2025-08-14 14:17:13 +02:00
Vasilije	dabd0912f8	feat: Cog 2082 add BAML to cognee (#1054) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Signed-off-by: Raj2604 <rajmandhare26@gmail.com> Co-authored-by: Daulet Amirkhanov <damirkhanov01@gmail.com> Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com> Co-authored-by: Igor Ilic <igorilic03@gmail.com> Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com> Co-authored-by: Matea Pesic <80577904+matea16@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions@users.noreply.github.com> Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com> Co-authored-by: Boris Arzentar <borisarzentar@gmail.com> Co-authored-by: Raj Mandhare <96978537+Raj2604@users.noreply.github.com> Co-authored-by: Pedro Thompson <thompsonp17@hotmail.com> Co-authored-by: Pedro Henrique Thompson Furtado <pedrothompson@petrobras.com.br>	2025-08-06 10:41:47 +02:00
Daniel Molnar	bb68d6a0df	Docstring tasks. (#878) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-05-27 21:33:16 +02:00
Igor Ilic	af276b8999	feat: Add initial cognee pipeline simplification [COG-1705] (#670) ## Description Simplify Cognee pipeline usage for users ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin	2025-04-17 14:02:12 +02:00
Boris	9536395468	Revert "feat: pipeline tasks needs mapping" (#717) Reverts topoteretes/cognee#690 I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-04-10 12:10:12 +02:00
Boris	0ce6fad24a	feat: pipeline tasks needs mapping (#690) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-04-03 10:52:59 +02:00
Daniel Molnar	d27f847753	Transition to new retrievers, update searches (#585) ## Description Delete legacy search implementations after migrating to new retriever classes ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced search and retrieval capabilities, providing improved context resolution for code queries, completions, summaries, and graph connections. - Refactor - Shifted to a modular, object-oriented approach that consolidates query logic and streamlines error management for a more robust and scalable experience. - Bug Fixes - Improved error handling for unsupported search types and retrieval operations.	2025-02-27 15:25:24 +01:00
lxobr	9cc357ac1c	Feat/cog 1365 unify retrievers (#572) ## Description - Created the `BaseRetriever` class to unify all the retrievers and searches. - Implemented seven specialized retrievers (summaries, chunks, completions, graph, graph-summary, insights, code) with consistent get_context/get_completion interfaces. - Added json context dumping feature in the current completion implementations to enable context comparisons. - Built a comparison framework to validate old vs new implementations. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced multiple retrieval classes for enhanced search capabilities, including `BaseRetriever`, `ChunksRetriever`, `CodeRetriever`, `CompletionRetriever`, `GraphCompletionRetriever`, `GraphSummaryCompletionRetriever`, `InsightsRetriever`, and `SummariesRetriever`. - Enhanced query completions with optional context saving for improved data persistence. - Implemented advanced tools to compare retrieval outcomes across different implementations. - Refactor - Streamlined internal module organization and updated references for increased maintainability and consistency. - Added comments indicating future maintenance tasks related to code merging. --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-02-27 12:13:21 +01:00
Boris	f9e6dcf837	fix: simplify code pipeline (#529) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced code search and dependency analysis for improved accuracy. - Introduced a new high-performance text embedding option. - Added an additional execution entry point for code graph processing. - New optional parameters for flexible property selection in retrieval functions. - Introduced new classes for handling import statements, function definitions, and class definitions. - Updated embedding engine selection based on configuration options. - Bug Fixes - Improved error handling in search operations and database queries for a more stable user experience. - Enhanced error logging for source code parsing. - Refactor - Streamlined asynchronous processing and refactored internal dependency extraction. - Updated configuration and integration settings to enhance overall reliability. - Restructured functions for simplified dependency handling. - Chores - Upgraded and reorganized dependency management with optional libraries for extended functionality. - Added new secret parameters for embedding configuration in workflow settings. --------- Co-authored-by: vasilije <vas.markovic@gmail.com>	2025-02-12 23:58:48 +01:00
Boris	8f84713b54	fix: support structured data conversion to data points (#512) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced version tracking and enhanced metadata in core data models for improved data consistency. - Bug Fixes - Improved error handling during graph data loading to prevent disruptions from unexpected identifier formats. - Refactor - Centralized identifier parsing and streamlined model definitions, ensuring smoother and more consistent operations across search, retrieval, and indexing workflows.	2025-02-10 17:16:13 +01:00
Boris	f75e35c337	fix: custom model pipeline (#508) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features • Graph visualizations now allow exporting to a user-specified file path for more flexible output management. • The text embedding process has been enhanced with an additional tokenizer option for improved performance. • A new `ExtendableDataPoint` class has been introduced for future extensions. • New JSON files for companies and individuals have been added to facilitate testing and data processing. - Improvements • Search functionality now uses updated identifiers for more reliable content retrieval. • Metadata handling has been streamlined across various classes by removing unnecessary type specifications. • Enhanced serialization of properties in the Neo4j adapter for improved handling of complex structures. • The setup process for databases has been improved with a new asynchronous setup function. - Chores • Dependency and configuration updates improve overall stability and performance.	2025-02-08 02:00:15 +01:00
hajdul88	56cc223302	feat: adds pydantic types to graph layer models	2025-01-09 16:46:41 +01:00
vasilije	60c8fd103b	ruff format	2025-01-05 19:09:08 +01:00
lxobr	262deee26e	Cog 813 source code chunks (#383) * fix: pass the list of all CodeFiles to enrichment task * feat: introduce SourceCodeChunk, update metadata * feat: get_source_code_chunks code graph pipeline task * feat: integrate get_source_code_chunks task, comment out summarize_code * Fix code summarization (#387) * feat: update data models * feat: naive parse long strings in source code * fix: get_non_py_files instead of get_non_code_files * fix: limit recursion, add comment * handle embedding empty input error (#398) * feat: robustly handle CodeFile source code * refactor: sort imports * todo: add support for other embedding models * feat: add custom logger * feat: add robustness to get_source_code_chunks Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * feat: improve embedding exceptions * refactor: format indents, rename module --------- Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2024-12-26 13:53:38 +01:00
hajdul88	4689e55e68	feat: Adds mock summary for codegraph pipeline	2024-12-18 16:42:48 +01:00
alekszievr	9afd0ece63	Structured code summarization (#375) * feat: turn summarize_code into generator * feat: extract run_code_graph_pipeline, update the pipeline * feat: minimal code graph example * refactor: update argument * refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline * refactor: indentation and whitespace nits * refactor: add deprecated use comments and warnings * Structured code summarization * add missing prompt file * Remove summarization_model argument from summarize_code and fix typehinting * minor refactors --------- Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com> Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com> Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com>	2024-12-17 13:05:47 +01:00
lxobr	da5e3ab24d	COG 870 Remove duplicate edges from the code graph (#293) * feat: turn summarize_code into generator * feat: extract run_code_graph_pipeline, update the pipeline * feat: minimal code graph example * refactor: update argument * refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline * refactor: indentation and whitespace nits * refactor: add deprecated use comments and warnings --------- Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com> Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com>	2024-12-17 12:02:25 +01:00
alekszievr	bfa0f06fb4	Add type to DataPoint metadata (#364) * Add type to DataPoint metadata * Add missing index_fields * Use DataPoint UUID type in pgvector create_data_points * Make _metadata mandatory everywhere	2024-12-16 16:27:03 +01:00
hajdul88	6d85165189	Feature/cog 539 implementing additional retriever approaches (#262) * fix: refactor get_graph_from_model to return nodes and edges correctly * fix: add missing params * fix: remove complex zip usage * fix: add edges to data_point properties * fix: handle rate limit error coming from llm model * fix: fixes lost edges and nodes in get_graph_from_model * fix: fixes database pruning issue in pgvector * fix: fixes database pruning issue in pgvector (#261) * feat: adds code summary embeddings to vector DB * fix: cognee_demo notebook pipeline is not saving summaries * feat: implements first version of codegraph retriever * chore: implements minor changes mostly to make the code production ready * fix: turns off raising duplicated edges unit test as we have these in our current codegraph generation * feat: implements unit tests for description to codepart search * fix: fixes edge property inconsistent access in codepart retriever * chore: implements more precise typing for get_attribute method for cogneegraph * chore: adds spacing to tests and changes the cogneegraph getter names --------- Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>	2024-12-10 11:07:06 +01:00
Boris	348610e73c	fix: refactor get_graph_from_model to return nodes and edges correctly (#257) * fix: handle rate limit error coming from llm model * fix: fixes lost edges and nodes in get_graph_from_model * fix: fixes database pruning issue in pgvector (#261) * fix: cognee_demo notebook pipeline is not saving summaries --------- Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>	2024-12-06 12:52:01 +01:00
Boris	64b8aac86f	feat: code graph swe integration Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com> Co-authored-by: hande-k <handekafkas7@gmail.com> Co-authored-by: Igor Ilic <igorilic03@gmail.com> Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com> Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>	2024-11-27 09:32:29 +01:00
0xideas	0fb47ba23d	feat: COG-548-create-code-graph-to-kg-task (#7) Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>	2024-11-24 20:50:32 +01:00
0xideas	80b06c3acb	test: Test for code graph enrichment task Co-authored-by: lxobr <lazar@topoteretes.com>	2024-11-24 19:24:47 +01:00
Igor Ilic	15b7b8ef2b	fix: Resolve issue with table names in SQL commands. Some SQL commands require lowercase characters in table names unless table name is wrapped in quotes. Renamed all new tables to use lowercase. Fix COG-677	2024-11-20 14:54:35 +01:00
Leon Luithlen	66fb2948f8	Small cleanup pull request	2024-11-12 15:37:03 +01:00
Boris	52180eb6b5	feat: COG-184 add falkordb (#192) * feat: add falkordb adapter --------- Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>	2024-11-11 18:20:52 +01:00
Boris	dc187a81d7	feat: migrate search to tasks (#144) * fix: don't return anything on health endpoint * feat: add alembic migrations * feat: align search types with the data we store and migrate search to tasks	2024-10-07 14:41:35 +02:00

30 commits