Igor Ilic
c5e15d715a
Merge pull request #417 from topoteretes/feature/cog-967-adding-graph-completion-feature-to-cognee
feat: implements the first version of graph based completion in search
2025-01-09 16:23:22 +01:00
Igor Ilic
6b6cc0f1d4
fix: Add fix for accessing dictionary elements that don't exist
Use get for the text key instead of direct access, to handle the case where the text key doesn't exist
2025-01-09 16:06:26 +01:00
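The fix above swaps direct key access for `dict.get`. The sketch below shows the general Python pattern. It is not the actual cognee code: the function name `get_text` and the payload shape are hypothetical.

```python
from typing import Any, Dict, Optional


def get_text(payload: Dict[str, Any]) -> Optional[str]:
    # Hypothetical helper: payload["text"] would raise KeyError when the key
    # is missing, whereas dict.get returns None (or a supplied default).
    return payload.get("text")


print(get_text({"text": "hello"}))  # hello
print(get_text({"id": 1}))          # None
```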
Igor Ilic
b733590724
fix: Remove logger from __init__.py file
2025-01-09 12:26:14 +01:00
Igor Ilic
1989296b01
fix: Resolve profiler issue with partial and recursive logger imports
Resolve issue for profiler with partial and recursive logger imports
2025-01-09 12:17:42 +01:00
Rita Aleksziev
cdaae161a8
Handle circular import
2025-01-09 12:08:42 +01:00
hajdul88
341f30fcdc
fix: Fixes ruff formatting
2025-01-09 12:00:49 +01:00
hajdul88
fe57eb69e7
Merge branch 'dev' into feature/cog-967-adding-graph-completion-feature-to-cognee
2025-01-09 11:07:19 +01:00
Rita Aleksziev
5635da6e38
Adjust unit tests
2025-01-09 10:53:03 +01:00
hajdul88
d39140f28b
feat: implements the first version of graph based completion in search
2025-01-08 16:10:29 +01:00
Rita Aleksziev
97814e334f
Get embedding engine instead of passing it in code chunking.
2025-01-08 13:45:04 +01:00
Rita Aleksziev
34a9267f41
Get embedding engine instead of passing it. Get it from vector engine instead of direct getter.
2025-01-08 13:23:17 +01:00
vasilije
41b1486cff
Fix visualization
2025-01-08 13:13:52 +01:00
hajdul88
18c8bc3c33
Merge branch 'dev' into COG-adding_html_graph_render
2025-01-08 10:44:11 +01:00
alekszievr
0dec704445
Merge branch 'dev' into COG-949
2025-01-08 10:21:07 +01:00
Rita Aleksziev
fb13a1b61a
Handle azure models as well
2025-01-07 15:00:58 +01:00
Rita Aleksziev
a774191ed3
Adjust AudioDocument and handle None token limit
2025-01-07 13:38:23 +01:00
hajdul88
bd644a1434
fix: Fixes duplicated edges in cognify by limiting the recursion depth in add datapoints
2025-01-07 13:33:05 +01:00
alekszievr
4802567871
Overcome ContextWindowExceededError by checking token count while chunking (#413)
2025-01-07 11:46:46 +01:00
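The idea behind #413 is to check the token count while chunking, so that no chunk exceeds the model's context window. The sketch below illustrates that general idea under stated assumptions. Whitespace-separated words stand in for model tokens, and the function name `chunk_by_token_limit` and the parameter `max_tokens` are hypothetical. A real implementation would count tokens with the embedding model's own tokenizer.

```python
from typing import Iterator, List


def chunk_by_token_limit(text: str, max_tokens: int) -> Iterator[str]:
    # Hypothetical sketch: whitespace-split words approximate model tokens.
    if max_tokens < 1:
        raise ValueError("max_tokens must be positive")
    current: List[str] = []
    for token in text.split():
        # Emit the current chunk before it would exceed the limit.
        if len(current) == max_tokens:
            yield " ".join(current)
            current = []
        current.append(token)
    if current:
        yield " ".join(current)


print(list(chunk_by_token_limit("a b c d e", 2)))  # ['a b', 'c d', 'e']
```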
lxobr
4cee9a16ce
fix: add allowed extensions
2025-01-06 11:22:45 +01:00
vasilije
76a0aa7e8b
Fix linter issues
2025-01-05 19:48:35 +01:00
vasilije
649fcf2ba8
Fix linter issues
2025-01-05 19:21:09 +01:00
vasilije
60c8fd103b
ruff format
2025-01-05 19:09:08 +01:00
lxobr
262deee26e
Cog 813 source code chunks (#383)
* fix: pass the list of all CodeFiles to enrichment task
* feat: introduce SourceCodeChunk, update metadata
* feat: get_source_code_chunks code graph pipeline task
* feat: integrate get_source_code_chunks task, comment out summarize_code
* Fix code summarization (#387)
* feat: update data models
* feat: naive parse long strings in source code
* fix: get_non_py_files instead of get_non_code_files
* fix: limit recursion, add comment
* handle embedding empty input error (#398)
* feat: robustly handle CodeFile source code
* refactor: sort imports
* todo: add support for other embedding models
* feat: add custom logger
* feat: add robustness to get_source_code_chunks
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* feat: improve embedding exceptions
* refactor: format indents, rename module
---------
Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2024-12-26 13:53:38 +01:00
alekszievr
de2394c392
Ingest non-code files (#395)
* Ingest non-code files
* Fixing review findings
2024-12-20 14:06:40 +01:00
hajdul88
4689e55e68
feat: Adds mock summary for codegraph pipeline
2024-12-18 16:42:48 +01:00
hajdul88
852532fcad
fix: changes the max workers back to 12
2024-12-18 16:39:44 +01:00
hajdul88
75b98e0dc6
feat: deletes executor limit from get_repo_file_dependencies
2024-12-18 14:18:38 +01:00
alekszievr
9afd0ece63
Structured code summarization (#375)
* feat: turn summarize_code into generator
* feat: extract run_code_graph_pipeline, update the pipeline
* feat: minimal code graph example
* refactor: update argument
* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline
* refactor: indentation and whitespace nits
* refactor: add deprecated use comments and warnings
* Structured code summarization
* add missing prompt file
* Remove summarization_model argument from summarize_code and fix type hinting
* minor refactors
---------
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2024-12-17 13:05:47 +01:00
lxobr
da5e3ab24d
COG 870 Remove duplicate edges from the code graph (#293)
* feat: turn summarize_code into generator
* feat: extract run_code_graph_pipeline, update the pipeline
* feat: minimal code graph example
* refactor: update argument
* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline
* refactor: indentation and whitespace nits
* refactor: add deprecated use comments and warnings
---------
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2024-12-17 12:02:25 +01:00
hajdul88
9e7ab6492a
feat: outsources chunking parameters to extract chunk from documents … (#289)
* feat: outsources chunking parameters to extract chunk from documents task
2024-12-17 11:31:31 +01:00
alekszievr
bfa0f06fb4
Add type to DataPoint metadata (#364)
* Add type to DataPoint metadata
* Add missing index_fields
* Use DataPoint UUID type in pgvector create_data_points
* Make _metadata mandatory everywhere
2024-12-16 16:27:03 +01:00
lxobr
5360093097
COG-810 Implement a top-down dependency graph builder tool (#268)
* feat: parse repo to call graph
* Update/repo_processor/top_down_repo_parse.py task
* fix: minor improvements
* feat: file parsing jedi script optimisation
---------
2024-12-16 16:02:39 +01:00
Igor Ilic
924759a599
refactor: Rename query compute to query completion
Rename search type from compute to completion
Refactor COG-656
2024-12-13 17:03:38 +01:00
Igor Ilic
67585d0ab1
feat: Add simple instruction for system prompt
Add simple instruction for system prompt
Feature COG-656
2024-12-13 15:30:24 +01:00
Igor Ilic
9c3e2422f3
feat: Add compute search to cognee
Add compute search to cognee, which makes search results human-readable
Feature COG-656
2024-12-13 15:18:33 +01:00
Igor Ilic
92d0122b46
fix: Remove data handling based on type in resolving directory function
No need to handle different data types in resolving directories, focus on just handling case when it's a directory
Fix COG-656
2024-12-13 09:55:47 +01:00
Igor Ilic
7100a4994a
feat: Add resolving of directories as task for the add pipeline
Add resolving of directories as task for the add pipeline
Feature COG-656
2024-12-12 17:04:49 +01:00
Igor Ilic
9b4af85474
fix: Resolve issue with text being submitted as data
Add support for text data to resolving data directory task
Fix COG-656
2024-12-12 13:31:20 +01:00
Igor Ilic
d9d90d91ae
chore: Remove comments from code
Remove code comments that are not needed
Chore COG-656
2024-12-11 16:49:34 +01:00
Igor Ilic
f3ce7be885
feat: Add ability to send directories with data to cognee
Add ability to send data directories to cognee
Feature COG-656
2024-12-11 14:31:54 +01:00
hajdul88
6d85165189
Feature/cog 539 implementing additional retriever approaches (#262)
* fix: refactor get_graph_from_model to return nodes and edges correctly
* fix: add missing params
* fix: remove complex zip usage
* fix: add edges to data_point properties
* fix: handle rate limit error coming from llm model
* fix: fixes lost edges and nodes in get_graph_from_model
* fix: fixes database pruning issue in pgvector
* fix: fixes database pruning issue in pgvector (#261)
* feat: adds code summary embeddings to vector DB
* fix: cognee_demo notebook pipeline is not saving summaries
* feat: implements first version of codegraph retriever
* chore: implements minor changes mostly to make the code production ready
* fix: turns off raising duplicated edges unit test as we have these in our current codegraph generation
* feat: implements unit tests for description to codepart search
* fix: fixes edge property inconsistent access in codepart retriever
* chore: implements more precise typing for get_attribute method for cogneegraph
* chore: adds spacing to tests and changes the cogneegraph getter names
---------
Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
2024-12-10 11:07:06 +01:00
Igor Ilic
62db3f8598
feat: Remove the need for libmagic for unstructured documents
Remove the need for libmagic for unstructured documents by providing mime_type information
Feature COG-685
2024-12-08 14:37:50 +01:00
Igor Ilic
78214456a6
feat: Add unstructured document handler
Add the unstructured library and handle certain document types through it
Feature COG-685
2024-12-06 17:50:22 +01:00
Boris
9429e5e1f5
Merge branch 'main' into COG-505-data-dataset-model-changes
2024-12-06 12:53:32 +01:00
Boris
348610e73c
fix: refactor get_graph_from_model to return nodes and edges correctly (#257)
* fix: handle rate limit error coming from llm model
* fix: fixes lost edges and nodes in get_graph_from_model
* fix: fixes database pruning issue in pgvector (#261)
* fix: cognee_demo notebook pipeline is not saving summaries
---------
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2024-12-06 12:52:01 +01:00
Igor Ilic
e80377b729
refactor: Move hash calculation of file to util
Moved hash calculation of file to shared utils, added better typing
Refactor COG-505
2024-12-05 20:33:30 +01:00
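This refactor moves file hash calculation into a shared util. The sketch below shows one common way to hash a file's contents: read it in fixed-size blocks so that large files never need to fit in memory. The helper name `get_file_content_hash`, the choice of MD5 and the 64 KiB block size are all assumptions for illustration. The log does not state which hash cognee actually uses.

```python
import hashlib
import io
from typing import BinaryIO


def get_file_content_hash(file_obj: BinaryIO, chunk_size: int = 65536) -> str:
    # Hypothetical helper: MD5 is assumed here purely for illustration.
    digest = hashlib.md5()
    # Read fixed-size blocks until read() returns b"" (end of file).
    for block in iter(lambda: file_obj.read(chunk_size), b""):
        digest.update(block)
    return digest.hexdigest()


# Identical content always yields the same hash, which is what makes
# content-based deduplication possible.
print(get_file_content_hash(io.BytesIO(b"hello")))
```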
Igor Ilic
387002d8ca
Merge branch 'COG-505-data-dataset-model-changes' of github.com:topoteretes/cognee into COG-505-data-dataset-model-changes
2024-12-05 19:26:17 +01:00
Igor Ilic
813b76c9c2
test: Add test for text deduplication
Added end to end test for text deduplication
Test COG-505
2024-12-05 19:25:50 +01:00
Igor Ilic
349ddfe794
Merge branch 'main' into COG-505-data-dataset-model-changes
2024-12-05 17:10:43 +01:00
Igor Ilic
378e7b81a5
fix: Fix merge of data for dlt
Resolve issue with dlt data not being merged for data_id
Fix COG-505
2024-12-05 17:03:36 +01:00