Commit graph

30 commits

Author SHA1 Message Date
Igor Ilic
f1144abc54 refactor: remove LLMGateway usage where not needed 2025-09-09 13:50:16 +02:00
Igor Ilic
89b51a244d feat: Add baml dynamic typing 2025-09-09 13:12:59 +02:00
hajdul88
affbc557d2 chore: ruff formatting 2025-08-14 14:17:35 +02:00
hajdul88
63d071f0d8 feat: adds input checks for add datapoints and summarization tasks 2025-08-14 14:17:13 +02:00
Vasilije
dabd0912f8
feat: Cog 2082 add BAML to cognee (#1054)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Signed-off-by: Raj2604 <rajmandhare26@gmail.com>
Co-authored-by: Daulet Amirkhanov <damirkhanov01@gmail.com>
Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
Co-authored-by: Matea Pesic <80577904+matea16@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
Co-authored-by: Raj Mandhare <96978537+Raj2604@users.noreply.github.com>
Co-authored-by: Pedro Thompson <thompsonp17@hotmail.com>
Co-authored-by: Pedro Henrique Thompson Furtado <pedrothompson@petrobras.com.br>
2025-08-06 10:41:47 +02:00
Daniel Molnar
bb68d6a0df
Docstring tasks. (#878)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-05-27 21:33:16 +02:00
Igor Ilic
af276b8999
feat: Add initial cognee pipeline simplification [COG-1705] (#670)
<!-- .github/pull_request_template.md -->

## Description
Simplify Cognee pipeline usage for users

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
2025-04-17 14:02:12 +02:00
Boris
9536395468
Revert "feat: pipeline tasks needs mapping" (#717)
Reverts topoteretes/cognee#690

I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-04-10 12:10:12 +02:00
Boris
0ce6fad24a
feat: pipeline tasks needs mapping (#690)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-04-03 10:52:59 +02:00
Daniel Molnar
d27f847753
Transition to new retrievers, update searches (#585)
<!-- .github/pull_request_template.md -->

## Description
Delete legacy search implementations after migrating to new retriever
classes

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced search and retrieval capabilities, providing improved context
resolution for code queries, completions, summaries, and graph
connections.
  
- **Refactor**
- Shifted to a modular, object-oriented approach that consolidates query
logic and streamlines error management for a more robust and scalable
experience.
  
- **Bug Fixes**
- Improved error handling for unsupported search types and retrieval
operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-27 15:25:24 +01:00
lxobr
9cc357ac1c
Feat/cog 1365 unify retrievers (#572)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->
- Created the `BaseRetriever` class to unify all the retrievers and
searches.
- Implemented seven specialized retrievers (summaries, chunks,
completions, graph, graph-summary, insights, code) with consistent
get_context/get_completion interfaces.
- Added json context dumping feature in the current completion
implementations to enable context comparisons.
- Built a comparison framework to validate old vs new implementations.
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced multiple retrieval classes for enhanced search
capabilities, including `BaseRetriever`, `ChunksRetriever`,
`CodeRetriever`, `CompletionRetriever`, `GraphCompletionRetriever`,
`GraphSummaryCompletionRetriever`, `InsightsRetriever`, and
`SummariesRetriever`.
- Enhanced query completions with optional context saving for improved
data persistence.
- Implemented advanced tools to compare retrieval outcomes across
different implementations.

- **Refactor**
- Streamlined internal module organization and updated references for
increased maintainability and consistency.
- Added comments indicating future maintenance tasks related to code
merging.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-02-27 12:13:21 +01:00
Boris
f9e6dcf837
fix: simplify code pipeline (#529)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit


- **New Features**
  - Enhanced code search and dependency analysis for improved accuracy.
  - Introduced a new high-performance text embedding option.
  - Added an additional execution entry point for code graph processing.
- New optional parameters for flexible property selection in retrieval
functions.
- Introduced new classes for handling import statements, function
definitions, and class definitions.
  - Updated embedding engine selection based on configuration options.

- **Bug Fixes**
- Improved error handling in search operations and database queries for
a more stable user experience.
  - Enhanced error logging for source code parsing.

- **Refactor**
- Streamlined asynchronous processing and refactored internal dependency
extraction.
- Updated configuration and integration settings to enhance overall
reliability.
  - Restructured functions for simplified dependency handling.

- **Chores**
- Upgraded and reorganized dependency management with optional libraries
for extended functionality.
- Added new secret parameters for embedding configuration in workflow
settings.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: vasilije <vas.markovic@gmail.com>
2025-02-12 23:58:48 +01:00
Boris
8f84713b54
fix: support structured data conversion to data points (#512)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- New Features
- Introduced version tracking and enhanced metadata in core data models
for improved data consistency.
  
- Bug Fixes
- Improved error handling during graph data loading to prevent
disruptions from unexpected identifier formats.
  
- Refactor
- Centralized identifier parsing and streamlined model definitions,
ensuring smoother and more consistent operations across search,
retrieval, and indexing workflows.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-10 17:16:13 +01:00
Boris
f75e35c337
fix: custom model pipeline (#508)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit


- **New Features**
• Graph visualizations now allow exporting to a user-specified file path
for more flexible output management.
• The text embedding process has been enhanced with an additional
tokenizer option for improved performance.
• A new `ExtendableDataPoint` class has been introduced for future
extensions.
• New JSON files for companies and individuals have been added to
facilitate testing and data processing.

- **Improvements**
• Search functionality now uses updated identifiers for more reliable
content retrieval.
• Metadata handling has been streamlined across various classes by
removing unnecessary type specifications.
• Enhanced serialization of properties in the Neo4j adapter for improved
handling of complex structures.
• The setup process for databases has been improved with a new
asynchronous setup function.

- **Chores**
• Dependency and configuration updates improve overall stability and
performance.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-08 02:00:15 +01:00
hajdul88
56cc223302 feat: adds pydantic types to graph layer models 2025-01-09 16:46:41 +01:00
vasilije
60c8fd103b ruff format 2025-01-05 19:09:08 +01:00
lxobr
262deee26e
Cog 813 source code chunks (#383)
* fix: pass the list of all CodeFiles to enrichment task

* feat: introduce SourceCodeChunk, update metadata

* feat: get_source_code_chunks code graph pipeline task

* feat: integrate get_source_code_chunks task, comment out summarize_code

* Fix code summarization (#387)

* feat: update data models

* feat: naive parse long strings in source code

* fix: get_non_py_files instead of get_non_code_files

* fix: limit recursion, add comment

* handle embedding empty input error (#398)

* feat: robustly handle CodeFile source code

* refactor: sort imports

* todo: add support for other embedding models

* feat: add custom logger

* feat: add robustness to get_source_code_chunks

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat: improve embedding exceptions

* refactor: format indents, rename module

---------

Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2024-12-26 13:53:38 +01:00
hajdul88
4689e55e68 feat: Adds mock summary for codegraph pipeline 2024-12-18 16:42:48 +01:00
alekszievr
9afd0ece63
Structured code summarization (#375)
* feat: turn summarize_code into generator

* feat: extract run_code_graph_pipeline, update the pipeline

* feat: minimal code graph example

* refactor: update argument

* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline

* refactor: indentation and whitespace nits

* refactor: add deprecated use comments and warnings

* Structured code summarization

* add missing prompt file

* Remove summarization_model argument from summarize_code and fix typehinting

* minor refactors

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2024-12-17 13:05:47 +01:00
lxobr
da5e3ab24d
COG 870 Remove duplicate edges from the code graph (#293)
* feat: turn summarize_code into generator

* feat: extract run_code_graph_pipeline, update the pipeline

* feat: minimal code graph example

* refactor: update argument

* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline

* refactor: indentation and whitespace nits

* refactor: add deprecated use comments and warnings

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2024-12-17 12:02:25 +01:00
alekszievr
bfa0f06fb4
Add type to DataPoint metadata (#364)
* Add type to DataPoint metadata

* Add missing index_fields

* Use DataPoint UUID type in pgvector create_data_points

* Make _metadata mandatory everywhere
2024-12-16 16:27:03 +01:00
hajdul88
6d85165189
Feature/cog 539 implementing additional retriever approaches (#262)
* fix: refactor get_graph_from_model to return nodes and edges correctly

* fix: add missing params

* fix: remove complex zip usage

* fix: add edges to data_point properties

* fix: handle rate limit error coming from llm model

* fix: fixes lost edges and nodes in get_graph_from_model

* fix: fixes database pruning issue in pgvector

* fix: fixes database pruning issue in pgvector (#261)

* feat: adds code summary embeddings to vector DB

* fix: cognee_demo notebook pipeline is not saving summaries

* feat: implements first version of codegraph retriever

* chore: implements minor changes mostly to make the code production ready

* fix: turns off raising duplicated edges unit test as we have these in our current codegraph generation

* feat: implements unit tests for description to codepart search

* fix: fixes edge property inconsistent access in codepart retriever

* chore: implements more precise typing for get_attribute method for cogneegraph

* chore: adds spacing to tests and changes the cogneegraph getter names

---------

Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
2024-12-10 11:07:06 +01:00
Boris
348610e73c
fix: refactor get_graph_from_model to return nodes and edges correctly (#257)
* fix: handle rate limit error coming from llm model

* fix: fixes lost edges and nodes in get_graph_from_model

* fix: fixes database pruning issue in pgvector (#261)

* fix: cognee_demo notebook pipeline is not saving summaries

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2024-12-06 12:52:01 +01:00
Boris
64b8aac86f
feat: code graph swe integration
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: hande-k <handekafkas7@gmail.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2024-11-27 09:32:29 +01:00
0xideas
0fb47ba23d
feat: COG-548-create-code-graph-to-kg-task (#7)
Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
2024-11-24 20:50:32 +01:00
0xideas
80b06c3acb
test: Test for code graph enrichment task
Co-authored-by: lxobr <lazar@topoteretes.com>
2024-11-24 19:24:47 +01:00
Igor Ilic
15b7b8ef2b fix: Resolve issue with table names in SQL commands
Some SQL commands require lowercase characters in table names unless table name is wrapped in quotes. Renamed all new tables to use lowercase

Fix COG-677
2024-11-20 14:54:35 +01:00
Leon Luithlen
66fb2948f8 Small cleanup pull request 2024-11-12 15:37:03 +01:00
Boris
52180eb6b5
feat: COG-184 add falkordb (#192)
* feat: add falkordb adapter

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2024-11-11 18:20:52 +01:00
Boris
dc187a81d7
feat: migrate search to tasks (#144)
* fix: don't return anything on health endpoint

* feat: add alembic migrations

* feat: align search types with the data we store and migrate search to tasks
2024-10-07 14:41:35 +02:00