Commit graph

260 commits

Author SHA1 Message Date
Rita Aleksziev
872bc89648 Format with Ruff 0.9.0 2025-01-10 15:11:00 +01:00
hajdul88
06e8d2268b Fix: fixes unit test for codepart search 2025-01-10 13:52:26 +01:00
hajdul88
6177d04b44 feat: implements code retriever 2025-01-10 13:03:34 +01:00
hajdul88
9604d95ba5 feat: adds basic retriever for swe bench 2025-01-09 19:54:58 +01:00
hajdul88
56cc223302 feat: adds pydantic types to graph layer models 2025-01-09 16:46:41 +01:00
Rita Aleksziev
626bc76f5c Set max_tokens in config 2025-01-09 12:53:26 +01:00
Rita Aleksziev
abb3ea6d21 Adjust integration tests 2025-01-09 11:31:16 +01:00
Rita Aleksziev
5635da6e38 Adjust unit tests 2025-01-09 10:53:03 +01:00
Rita Aleksziev
34a9267f41 Get embedding engine instead of passing it. Get it from vector engine instead of direct getter. 2025-01-08 13:23:17 +01:00
hajdul88
18c8bc3c33 Merge branch 'dev' into COG-adding_html_graph_render 2025-01-08 10:44:11 +01:00
alekszievr
0dec704445 Merge branch 'dev' into COG-949 2025-01-08 10:21:07 +01:00
Rita Aleksziev
a774191ed3 Adjust AudioDocument and handle None token limit 2025-01-07 13:38:23 +01:00
hajdul88
bd644a1434 fix: Fixes duplicated edges in cognify by limiting the recursion depth in add datapoints 2025-01-07 13:33:05 +01:00
alekszievr
fbf8fc93bf Merge branch 'dev' into COG-949 2025-01-07 13:01:16 +01:00
alekszievr
4802567871 Overcome ContextWindowExceededError by checking token count while chunking (#413) 2025-01-07 11:46:46 +01:00
lxobr
dbc33a6478 fix: adhere UnstructuredDocument.read() to Document 2025-01-06 11:23:55 +01:00
vasilije
76a0aa7e8b Fix linter issues 2025-01-05 19:48:35 +01:00
vasilije
6dafe73a6b Fix linter issues 2025-01-05 19:24:55 +01:00
vasilije
649fcf2ba8 Fix linter issues 2025-01-05 19:21:09 +01:00
vasilije
60c8fd103b ruff format 2025-01-05 19:09:08 +01:00
Igor Ilic
a4fe33ce92 Merge branch 'dev' into COG-475-local-file-endpoint-deletion 2024-12-20 15:25:10 +01:00
alekszievr
291f1c5a55 Handle retryerrors in code summary (#396)
* Handle retryerrors in code summary

* Log instead of print
2024-12-20 15:21:10 +01:00
Igor Ilic
6cb7fef411 Merge branch 'dev' into COG-475-local-file-endpoint-deletion 2024-12-19 17:34:42 +01:00
Igor Ilic
c139d52938 feat: Add deletion of local files made by cognee through data endpoint
Delete local files made by cognee when deleting data from database through endpoint

Feature COG-475
2024-12-19 16:35:35 +01:00
hajdul88
4689e55e68 feat: Adds mock summary for codegraph pipeline 2024-12-18 16:42:48 +01:00
Igor Ilic
f6800b979e feat: Add deletion of local files when deleting data
Delete local files when deleting data from cognee

Feature COG-475
2024-12-18 15:26:13 +01:00
Igor Ilic
48825d0d84 chore: Resolve typo in getting documents code
Resolve typo in code

chore COG-912
2024-12-17 14:22:51 +01:00
Igor Ilic
8b09358552 Merge branch 'dev' into COG-912-search-by-dataset 2024-12-17 13:22:13 +01:00
alekszievr
9afd0ece63 Structured code summarization (#375)
* feat: turn summarize_code into generator

* feat: extract run_code_graph_pipeline, update the pipeline

* feat: minimal code graph example

* refactor: update argument

* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline

* refactor: indentation and whitespace nits

* refactor: add deprecated use comments and warnings

* Structured code summarization

* add missing prompt file

* Remove summarization_model argument from summarize_code and fix typehinting

* minor refactors

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2024-12-17 13:05:47 +01:00
Igor Ilic
af335fafe3 test: Added test for getting of documents for search
Added test to verify getting documents related to datasets intended for search

Test COG-912
2024-12-17 12:11:24 +01:00
hajdul88
9e7ab6492a feat: outsources chunking parameters to extract chunk from documents … (#289)
* feat: outsources chunking parameters to extract chunk from documents task
2024-12-17 11:31:31 +01:00
Igor Ilic
630ab556db feat: Add search by dataset for cognee
Added ability to search by datasets for cognee users

Feature COG-912
2024-12-17 11:20:22 +01:00
alekszievr
bfa0f06fb4 Add type to DataPoint metadata (#364)
* Add type to DataPoint metadata

* Add missing index_fields

* Use DataPoint UUID type in pgvector create_data_points

* Make _metadata mandatory everywhere
2024-12-16 16:27:03 +01:00
Igor Ilic
35b1f7d26a chore: Update typo in code
Update typo in string in code

Chore COG-656
2024-12-13 17:08:05 +01:00
Igor Ilic
11634cb58d feat: Add unauth access error to getting data
Raise unauth access error when trying to read data without access

Feature COG-656
2024-12-13 16:54:53 +01:00
Igor Ilic
43187e4d63 feat: Add user verification for accessing data
Verify user has access to data before returning it

Feature COG-656
2024-12-13 13:54:45 +01:00
Igor Ilic
b8ba436dba fix: Resolve issue with adding permissions to groups
Resolve issue with adding permissions to groups

Fix COG-656
2024-12-13 12:37:01 +01:00
Igor Ilic
eddfc17861 fix: Rewrite endpoint to add users to groups
Rewrote endpoint which adds users to groups

Fix COG-656
2024-12-13 12:13:42 +01:00
Igor Ilic
d4e2eb717a fix: fix existing edge check
Resolve issue with UUID concat by casting to string

Fix COG-656
2024-12-11 16:04:31 +01:00
hajdul88
6d85165189 Feature/cog 539 implementing additional retriever approaches (#262)
* fix: refactor get_graph_from_model to return nodes and edges correctly

* fix: add missing params

* fix: remove complex zip usage

* fix: add edges to data_point properties

* fix: handle rate limit error coming from llm model

* fix: fixes lost edges and nodes in get_graph_from_model

* fix: fixes database pruning issue in pgvector

* fix: fixes database pruning issue in pgvector (#261)

* feat: adds code summary embeddings to vector DB

* fix: cognee_demo notebook pipeline is not saving summaries

* feat: implements first version of codegraph retriever

* chore: implements minor changes mostly to make the code production ready

* fix: turns off raising duplicated edges unit test as we have these in our current codegraph generation

* feat: implements unit tests for description to codepart search

* fix: fixes edge property inconsistent access in codepart retriever

* chore: implements more precise typing for get_attribute method for cogneegraph

* chore: adds spacing to tests and changes the cogneegraph getter names

---------

Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
2024-12-10 11:07:06 +01:00
Igor Ilic
344865f1a4 Merge branch 'main' into COG-685-more-document-types 2024-12-09 10:22:26 +01:00
Igor Ilic
07d9330e4a feat: Add UnstructuredLibraryImportError
Added exception when unstructured library is called but not installed

Feature COG-685
2024-12-08 14:53:19 +01:00
Igor Ilic
62db3f8598 feat: Remove the need for libmagic for unstructured documents
Remove the need for libmagic for unstructured documents by providing mime_type information

Feature COG-685
2024-12-08 14:37:50 +01:00
Igor Ilic
78214456a6 feat: Add unstructured document handler
Added unstructured library and handling of certain document types through their library

Feature COG-685
2024-12-06 17:50:22 +01:00
alekszievr
f30bf35f92 Merge branch 'main' into feat/COG-418-log-config-to-telemetry 2024-12-06 16:11:56 +01:00
alekszievr
e6def6423c Merge branch 'main' into feat/COG-418-log-config-to-telemetry 2024-12-06 13:58:38 +01:00
Igor Ilic
d7fa9f3cfd Merge branch 'COG-505-data-dataset-model-changes' of github.com:topoteretes/cognee into COG-505-data-dataset-model-changes 2024-12-06 13:49:07 +01:00
Igor Ilic
cc6fbe2a5f refactor: Add space to ingest function
Add space and newline to ingest function

Refactor COG-505
2024-12-06 13:48:39 +01:00
Rita Aleksziev
462fcef240 move config getter into cognee/modules/pipelines/operations/run_tasks.py and make the indentation a bit more readable 2024-12-06 13:38:54 +01:00
Rita Aleksziev
dbfa91b635 Add cognee config to telemetry 2024-12-06 12:55:25 +01:00