hajdul88
18c8bc3c33
Merge branch 'dev' into COG-adding_html_graph_render
2025-01-08 10:44:11 +01:00
hajdul88
bd644a1434
fix: Fixes duplicated edges in cognify by limiting the recursion depth in add datapoints
2025-01-07 13:33:05 +01:00
vasilije
76a0aa7e8b
Fix linter issues
2025-01-05 19:48:35 +01:00
vasilije
649fcf2ba8
Fix linter issues
2025-01-05 19:21:09 +01:00
vasilije
60c8fd103b
ruff format
2025-01-05 19:09:08 +01:00
lxobr
262deee26e
Cog 813 source code chunks (#383)
* fix: pass the list of all CodeFiles to enrichment task
* feat: introduce SourceCodeChunk, update metadata
* feat: get_source_code_chunks code graph pipeline task
* feat: integrate get_source_code_chunks task, comment out summarize_code
* Fix code summarization (#387)
* feat: update data models
* feat: naively parse long strings in source code
* fix: get_non_py_files instead of get_non_code_files
* fix: limit recursion, add comment
* handle embedding empty input error (#398)
* feat: robustly handle CodeFile source code
* refactor: sort imports
* todo: add support for other embedding models
* feat: add custom logger
* feat: add robustness to get_source_code_chunks
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* feat: improve embedding exceptions
* refactor: format indents, rename module
---------
Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2024-12-26 13:53:38 +01:00
alekszievr
de2394c392
Ingest non-code files (#395)
* Ingest non-code files
* Fixing review findings
2024-12-20 14:06:40 +01:00
hajdul88
4689e55e68
feat: Adds mock summary for codegraph pipeline
2024-12-18 16:42:48 +01:00
hajdul88
852532fcad
fix: changes the max workers back to 12
2024-12-18 16:39:44 +01:00
hajdul88
75b98e0dc6
feat: deletes executor limit from get_repo_file_dependencies
2024-12-18 14:18:38 +01:00
alekszievr
9afd0ece63
Structured code summarization (#375)
* feat: turn summarize_code into generator
* feat: extract run_code_graph_pipeline, update the pipeline
* feat: minimal code graph example
* refactor: update argument
* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline
* refactor: indentation and whitespace nits
* refactor: add deprecated use comments and warnings
* Structured code summarization
* add missing prompt file
* Remove summarization_model argument from summarize_code and fix type hinting
* minor refactors
---------
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2024-12-17 13:05:47 +01:00
lxobr
da5e3ab24d
COG 870 Remove duplicate edges from the code graph (#293)
* feat: turn summarize_code into generator
* feat: extract run_code_graph_pipeline, update the pipeline
* feat: minimal code graph example
* refactor: update argument
* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline
* refactor: indentation and whitespace nits
* refactor: add deprecated use comments and warnings
---------
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2024-12-17 12:02:25 +01:00
hajdul88
9e7ab6492a
feat: outsources chunking parameters to extract chunk from documents … (#289)
* feat: outsources chunking parameters to extract chunk from documents task
2024-12-17 11:31:31 +01:00
alekszievr
bfa0f06fb4
Add type to DataPoint metadata (#364)
* Add type to DataPoint metadata
* Add missing index_fields
* Use DataPoint UUID type in pgvector create_data_points
* Make _metadata mandatory everywhere
2024-12-16 16:27:03 +01:00
lxobr
5360093097
COG-810 Implement a top-down dependency graph builder tool (#268)
* feat: parse repo to call graph
* Update /repo_processor/top_down_repo_parse.py task
* fix: minor improvements
* feat: file parsing jedi script optimisation
---------
2024-12-16 16:02:39 +01:00
Igor Ilic
924759a599
refactor: Rename query compute to query completion
Rename searching type from compute to completion
Refactor COG-656
2024-12-13 17:03:38 +01:00
Igor Ilic
67585d0ab1
feat: Add simple instruction for system prompt
Add simple instruction for system prompt
Feature COG-656
2024-12-13 15:30:24 +01:00
Igor Ilic
9c3e2422f3
feat: Add compute search to cognee
Add compute search to cognee, which makes search results human-readable
Feature COG-656
2024-12-13 15:18:33 +01:00
Igor Ilic
92d0122b46
fix: Remove data handling based on type in resolving directory function
No need to handle different data types when resolving directories; focus only on handling the case where the input is a directory
Fix COG-656
2024-12-13 09:55:47 +01:00
Igor Ilic
7100a4994a
feat: Add resolving of directories as task for the add pipeline
Add resolving of directories as task for the add pipeline
Feature COG-656
2024-12-12 17:04:49 +01:00
Igor Ilic
9b4af85474
fix: Resolve issue with text being submitted as data
Add support for text data to resolving data directory task
Fix COG-656
2024-12-12 13:31:20 +01:00
Igor Ilic
d9d90d91ae
chore: Remove comments from code
Remove code comments that are not needed
Chore COG-656
2024-12-11 16:49:34 +01:00
Igor Ilic
f3ce7be885
feat: Add ability to send directories with data to cognee
Add ability to send data directories to cognee
Feature COG-656
2024-12-11 14:31:54 +01:00
hajdul88
6d85165189
Feature/cog 539 implementing additional retriever approaches (#262)
* fix: refactor get_graph_from_model to return nodes and edges correctly
* fix: add missing params
* fix: remove complex zip usage
* fix: add edges to data_point properties
* fix: handle rate limit error coming from llm model
* fix: fixes lost edges and nodes in get_graph_from_model
* fix: fixes database pruning issue in pgvector
* fix: fixes database pruning issue in pgvector (#261)
* feat: adds code summary embeddings to vector DB
* fix: cognee_demo notebook pipeline is not saving summaries
* feat: implements first version of codegraph retriever
* chore: implements minor changes, mostly to make the code production-ready
* fix: turns off the unit test that raises on duplicated edges, as our current codegraph generation still produces them
* feat: implements unit tests for description to codepart search
* fix: fixes inconsistent edge property access in codepart retriever
* chore: implements more precise typing for get_attribute method for cogneegraph
* chore: adds spacing to tests and changes the cogneegraph getter names
---------
Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
2024-12-10 11:07:06 +01:00
Igor Ilic
62db3f8598
feat: Remove the need for libmagic for unstructured documents
Remove the need for libmagic for unstructured documents by providing mime_type information
Feature COG-685
2024-12-08 14:37:50 +01:00
Igor Ilic
78214456a6
feat: Add unstructured document handler
Added the unstructured library and handling of certain document types through it
Feature COG-685
2024-12-06 17:50:22 +01:00
Boris
9429e5e1f5
Merge branch 'main' into COG-505-data-dataset-model-changes
2024-12-06 12:53:32 +01:00
Boris
348610e73c
fix: refactor get_graph_from_model to return nodes and edges correctly (#257)
* fix: handle rate limit error coming from llm model
* fix: fixes lost edges and nodes in get_graph_from_model
* fix: fixes database pruning issue in pgvector (#261)
* fix: cognee_demo notebook pipeline is not saving summaries
---------
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2024-12-06 12:52:01 +01:00
Igor Ilic
e80377b729
refactor: Move hash calculation of file to util
Moved hash calculation of file to shared utils, added better typing
Refactor COG-505
2024-12-05 20:33:30 +01:00
Igor Ilic
387002d8ca
Merge branch 'COG-505-data-dataset-model-changes' of github.com:topoteretes/cognee into COG-505-data-dataset-model-changes
2024-12-05 19:26:17 +01:00
Igor Ilic
813b76c9c2
test: Add test for text deduplication
Added end-to-end test for text deduplication
Test COG-505
2024-12-05 19:25:50 +01:00
Igor Ilic
349ddfe794
Merge branch 'main' into COG-505-data-dataset-model-changes
2024-12-05 17:10:43 +01:00
Igor Ilic
378e7b81a5
fix: Fix merge of data for dlt
Resolve issue with dlt data not being merged for data_id
Fix COG-505
2024-12-05 17:03:36 +01:00
Igor Ilic
f5b5e56cc1
feat: Add deduplication of data
Data is deduplicated per user, so if a user tries to add data that already exists, they are redirected to the existing data in the database
Feature COG-505
2024-12-05 16:38:44 +01:00
hajdul88
68c3f42ab8
Merge branch 'main' into feature/cog-717-create-edge-embeddings-in-vector-databases
2024-12-05 09:08:37 +01:00
hajdul88
36a5a27f10
Merge branch 'main' into feature/cog-717-create-edge-embeddings-in-vector-databases
2024-12-04 18:15:46 +01:00
alekszievr
cedb9d6608
Merge branch 'main' into feat/COG-711-temporal-awareness-task
2024-12-04 17:31:50 +01:00
alekszievr
ac62e9809a
Skip empty files in get repo file dependencies (#254)
Co-authored-by: Rita Aleksziev <alekszievr@gmail..com>
2024-12-04 17:29:07 +01:00
Igor Ilic
0ce254b262
feat: Add text deduplication
Text added to cognee is saved by its hash, so the same text can't be stored multiple times
Feature COG-505
2024-12-04 17:19:29 +01:00
Rita Aleksziev
5d71059f6c
fix credentials
2024-12-04 17:00:48 +01:00
alekszievr
0d2a9e9e17
Merge branch 'main' into feat/COG-711-temporal-awareness-task
2024-12-04 16:34:11 +01:00
Rita Aleksziev
dd94781033
Integrate graphiti's functionality as Tasks
2024-12-04 16:33:26 +01:00
Vasilije
080143cdad
Merge branch 'main' into feature/cog-717-create-edge-embeddings-in-vector-databases
2024-12-04 16:33:08 +01:00
alekszievr
df8fc829f9
check if repo path exists before starting the pipeline (#252)
Co-authored-by: Rita Aleksziev <alekszievr@gmail..com>
2024-12-04 16:25:05 +01:00
Igor Ilic
6bb0f3d8f2
feat: Add ability to store single data instance in multiple datasets
...
Added the ability to store a single data instance in multiple datasets
Feature COG-505
2024-12-04 15:53:25 +01:00
hajdul88
c20ee11e80
feat: implements graph edge indexing
2024-12-04 15:37:48 +01:00
Igor Ilic
0a0b030df5
fix: Resolve issue when metadata is updated
Resolve issue when attempting to update metadata related to data
Fix
2024-12-04 14:03:01 +01:00
Igor Ilic
ceebcdb251
fix: Resolve issue with llama index type resolution
Resolve issue with llama index type resolution
Fix
2024-12-04 11:29:27 +01:00
Igor Ilic
61aebf79e0
fix: Resolve issue with dlt for ingest_data_with_metadata
Resolve issue caused by dlt for ingest_data_with_metadata task
Fix
2024-12-04 11:14:30 +01:00
Boris Arzentar
0b8b270933
fix: make get_embeddable_data static
2024-12-03 21:47:23 +01:00