Commit graph

152 commits

Author SHA1 Message Date
hajdul88
18c8bc3c33
Merge branch 'dev' into COG-adding_html_graph_render 2025-01-08 10:44:11 +01:00
hajdul88
bd644a1434 fix: Fixes duplicated edges in cognify by limiting the recursion depth in add datapoints 2025-01-07 13:33:05 +01:00
vasilije
76a0aa7e8b Fix linter issues 2025-01-05 19:48:35 +01:00
vasilije
649fcf2ba8 Fix linter issues 2025-01-05 19:21:09 +01:00
vasilije
60c8fd103b ruff format 2025-01-05 19:09:08 +01:00
lxobr
262deee26e
Cog 813 source code chunks (#383)
* fix: pass the list of all CodeFiles to enrichment task

* feat: introduce SourceCodeChunk, update metadata

* feat: get_source_code_chunks code graph pipeline task

* feat: integrate get_source_code_chunks task, comment out summarize_code

* Fix code summarization (#387)

* feat: update data models

* feat: naive parse long strings in source code

* fix: get_non_py_files instead of get_non_code_files

* fix: limit recursion, add comment

* handle embedding empty input error (#398)

* feat: robustly handle CodeFile source code

* refactor: sort imports

* todo: add support for other embedding models

* feat: add custom logger

* feat: add robustness to get_source_code_chunks

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat: improve embedding exceptions

* refactor: format indents, rename module

---------

Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2024-12-26 13:53:38 +01:00
alekszievr
de2394c392
Ingest non-code files (#395)
* Ingest non-code files

* Fixing review findings
2024-12-20 14:06:40 +01:00
hajdul88
4689e55e68 feat: Adds mock summary for codegraph pipeline 2024-12-18 16:42:48 +01:00
hajdul88
852532fcad fix: changes back the max workers to 12 2024-12-18 16:39:44 +01:00
hajdul88
75b98e0dc6 feat: deletes executor limit from get_repo_file_dependencies 2024-12-18 14:18:38 +01:00
alekszievr
9afd0ece63
Structured code summarization (#375)
* feat: turn summarize_code into generator

* feat: extract run_code_graph_pipeline, update the pipeline

* feat: minimal code graph example

* refactor: update argument

* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline

* refactor: indentation and whitespace nits

* refactor: add deprecated use comments and warnings

* Structured code summarization

* add missing prompt file

* Remove summarization_model argument from summarize_code and fix typehinting

* minor refactors

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2024-12-17 13:05:47 +01:00
lxobr
da5e3ab24d
COG 870 Remove duplicate edges from the code graph (#293)
* feat: turn summarize_code into generator

* feat: extract run_code_graph_pipeline, update the pipeline

* feat: minimal code graph example

* refactor: update argument

* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline

* refactor: indentation and whitespace nits

* refactor: add deprecated use comments and warnings

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2024-12-17 12:02:25 +01:00
hajdul88
9e7ab6492a
feat: outsources chunking parameters to extract chunk from documents … (#289)
* feat: outsources chunking parameters to extract chunk from documents task
2024-12-17 11:31:31 +01:00
alekszievr
bfa0f06fb4
Add type to DataPoint metadata (#364)
* Add type to DataPoint metadata

* Add missing index_fields

* Use DataPoint UUID type in pgvector create_data_points

* Make _metadata mandatory everywhere
2024-12-16 16:27:03 +01:00
lxobr
5360093097
COG-810 Implement a top-down dependency graph builder tool (#268)
* feat: parse repo to call graph

* Update/repo_processor/top_down_repo_parse.py task

* fix: minor improvements

* feat: file parsing jedi script optimisation

---------
2024-12-16 16:02:39 +01:00
Igor Ilic
924759a599 refactor: Rename query compute to query completion
Rename searching type from compute to completion

Refactor COG-656
2024-12-13 17:03:38 +01:00
Igor Ilic
67585d0ab1 feat: Add simple instruction for system prompt
Add simple instruction for system prompt

Feature COG-656
2024-12-13 15:30:24 +01:00
Igor Ilic
9c3e2422f3 feat: Add compute search to cognee
Add compute search to cognee which makes searches human readable

Feature COG-656
2024-12-13 15:18:33 +01:00
Igor Ilic
92d0122b46 fix: Remove data handling based on type in resolving directory function
No need to handle different data types when resolving directories; focus only on handling the case when the input is a directory

Fix COG-656
2024-12-13 09:55:47 +01:00
Igor Ilic
7100a4994a feat: Add resolving of directories as task for the add pipeline
Add resolving of directories as task for the add pipeline

Feature COG-656
2024-12-12 17:04:49 +01:00
Igor Ilic
9b4af85474 fix: Resolve issue with text being submitted as data
Add support for text data to resolving data directory task

Fix COG-656
2024-12-12 13:31:20 +01:00
Igor Ilic
d9d90d91ae chore: Remove comments from code
Remove code comments that are not needed

Chore COG-656
2024-12-11 16:49:34 +01:00
Igor Ilic
f3ce7be885 feat: Add ability to send directories with data to cognee
Add ability to send data directories to cognee

Feature COG-656
2024-12-11 14:31:54 +01:00
hajdul88
6d85165189
Feature/cog 539 implementing additional retriever approaches (#262)
* fix: refactor get_graph_from_model to return nodes and edges correctly

* fix: add missing params

* fix: remove complex zip usage

* fix: add edges to data_point properties

* fix: handle rate limit error coming from llm model

* fix: fixes lost edges and nodes in get_graph_from_model

* fix: fixes database pruning issue in pgvector

* fix: fixes database pruning issue in pgvector (#261)

* feat: adds code summary embeddings to vector DB

* fix: cognee_demo notebook pipeline is not saving summaries

* feat: implements first version of codegraph retriever

* chore: implements minor changes mostly to make the code production ready

* fix: turns off raising duplicated edges unit test as we have these in our current codegraph generation

* feat: implements unit tests for description to codepart search

* fix: fixes edge property inconsistent access in codepart retriever

* chore: implements more precise typing for get_attribute method for cogneegraph

* chore: adds spacing to tests and changes the cogneegraph getter names

---------

Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
2024-12-10 11:07:06 +01:00
Igor Ilic
62db3f8598 feat: Remove the need for libmagic for unstructured documents
Remove the need for libmagic for unstructured documents by providing mime_type information

Feature COG-685
2024-12-08 14:37:50 +01:00
Igor Ilic
78214456a6 feat: Add unstructured document handler
Added unstructured library and handling of certain document types through their library

Feature COG-685
2024-12-06 17:50:22 +01:00
Boris
9429e5e1f5
Merge branch 'main' into COG-505-data-dataset-model-changes 2024-12-06 12:53:32 +01:00
Boris
348610e73c
fix: refactor get_graph_from_model to return nodes and edges correctly (#257)
* fix: handle rate limit error coming from llm model

* fix: fixes lost edges and nodes in get_graph_from_model

* fix: fixes database pruning issue in pgvector (#261)

* fix: cognee_demo notebook pipeline is not saving summaries

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2024-12-06 12:52:01 +01:00
Igor Ilic
e80377b729 refactor: Move hash calculation of file to util
Moved hash calculation of file to shared utils, added better typing

Refactor COG-505
2024-12-05 20:33:30 +01:00
Igor Ilic
387002d8ca Merge branch 'COG-505-data-dataset-model-changes' of github.com:topoteretes/cognee into COG-505-data-dataset-model-changes 2024-12-05 19:26:17 +01:00
Igor Ilic
813b76c9c2 test: Add test for text deduplication
Added end to end test for text deduplication

Test COG-505
2024-12-05 19:25:50 +01:00
Igor Ilic
349ddfe794
Merge branch 'main' into COG-505-data-dataset-model-changes 2024-12-05 17:10:43 +01:00
Igor Ilic
378e7b81a5 fix: Fix merge of data for dlt
Resolve issue with dlt data not being merged for data_id

Fix COG-505
2024-12-05 17:03:36 +01:00
Igor Ilic
f5b5e56cc1 feat: Add deduplication of data
Data is deduplicated per user, so if a user tries to add data that already exists, they are redirected to the existing data in the database

Feature COG-505
2024-12-05 16:38:44 +01:00
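
The per-user deduplication described in the commit above can be pictured with a minimal sketch. It is not cognee's implementation: the `DataRegistry` class, its `add` method, the in-memory dictionary, and the use of SHA-256 are all assumptions made for illustration. It only shows the idea that identical content added by the same user resolves to the existing record, while the same content from another user gets its own record.

```python
import hashlib
import uuid


class DataRegistry:
    """Hypothetical in-memory registry (illustration only, not cognee's data model)."""

    def __init__(self):
        # Maps (user_id, content_hash) -> id of the stored data record.
        self._records = {}

    def add(self, user_id: str, content: bytes) -> uuid.UUID:
        # Assumption: content identity is decided by a SHA-256 digest.
        content_hash = hashlib.sha256(content).hexdigest()
        key = (user_id, content_hash)
        if key in self._records:
            # The user already added this content: point to the existing record.
            return self._records[key]
        record_id = uuid.uuid4()
        self._records[key] = record_id
        return record_id


registry = DataRegistry()
alice_first = registry.add("alice", b"quarterly report")
alice_again = registry.add("alice", b"quarterly report")
bob_first = registry.add("bob", b"quarterly report")
```

Here `alice_again` equals `alice_first`, because the second call finds the existing entry, while `bob_first` is a separate id because the key includes the user.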
hajdul88
68c3f42ab8
Merge branch 'main' into feature/cog-717-create-edge-embeddings-in-vector-databases 2024-12-05 09:08:37 +01:00
hajdul88
36a5a27f10
Merge branch 'main' into feature/cog-717-create-edge-embeddings-in-vector-databases 2024-12-04 18:15:46 +01:00
alekszievr
cedb9d6608
Merge branch 'main' into feat/COG-711-temporal-awareness-task 2024-12-04 17:31:50 +01:00
alekszievr
ac62e9809a
Skip empty files in get repo file dependencies (#254)
Co-authored-by: Rita Aleksziev <alekszievr@gmail..com>
2024-12-04 17:29:07 +01:00
Igor Ilic
0ce254b262 feat: Add text deduplication
If text is added to cognee it will be saved by hash so the same text can't be stored multiple times

Feature COG-505
2024-12-04 17:19:29 +01:00
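
Saving text by hash, as the commit above describes, can be sketched roughly as follows. The `TextStore` class, its method names, and the choice of SHA-256 are hypothetical and are not taken from the cognee codebase. The sketch only illustrates why keying storage on a content hash prevents the same text from being stored more than once.

```python
import hashlib


class TextStore:
    """Hypothetical store keyed by content hash (illustration only)."""

    def __init__(self):
        # Maps content hash -> stored text.
        self._items = {}

    def add(self, text: str) -> str:
        # Assumption: SHA-256 over the UTF-8 bytes identifies the text.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # setdefault keeps the first copy and ignores later identical ones.
        self._items.setdefault(key, text)
        return key

    def __len__(self) -> int:
        return len(self._items)


store = TextStore()
first_key = store.add("hello cognee")
second_key = store.add("hello cognee")
other_key = store.add("different text")
```

Adding the same string twice yields the same key and leaves a single stored entry, so `len(store)` counts only distinct texts.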
Rita Aleksziev
5d71059f6c fix credentials 2024-12-04 17:00:48 +01:00
alekszievr
0d2a9e9e17
Merge branch 'main' into feat/COG-711-temporal-awareness-task 2024-12-04 16:34:11 +01:00
Rita Aleksziev
dd94781033 Integrate graphiti's functionality as Tasks 2024-12-04 16:33:26 +01:00
Vasilije
080143cdad
Merge branch 'main' into feature/cog-717-create-edge-embeddings-in-vector-databases 2024-12-04 16:33:08 +01:00
alekszievr
df8fc829f9
check if repo path exists before starting the pipeline (#252)
Co-authored-by: Rita Aleksziev <alekszievr@gmail..com>
2024-12-04 16:25:05 +01:00
Igor Ilic
6bb0f3d8f2 feat: Add ability to store single data instance in multiple datasets
Added ability to store single data instance in multiple datasets

Feature COG-505
2024-12-04 15:53:25 +01:00
hajdul88
c20ee11e80 feat: implements graph edge indexing 2024-12-04 15:37:48 +01:00
Igor Ilic
0a0b030df5 fix: Resolve issue when metadata is updated
Resolve issue when attempting to update metadata related to data

Fix
2024-12-04 14:03:01 +01:00
Igor Ilic
ceebcdb251 fix: Resolve issue with llama index type resolution
Resolve issue with llama index type resolution

Fix
2024-12-04 11:29:27 +01:00
Igor Ilic
61aebf79e0 fix: Resolve issue with dlt for ingest_data_with_metadata
Resolve issue caused by dlt for ingest_data_with_metadata task

Fix
2024-12-04 11:14:30 +01:00
Boris Arzentar
0b8b270933 fix: make get_embeddable_data static 2024-12-03 21:47:23 +01:00