Commit graph

1484 commits

Author SHA1 Message Date
Igor Ilic
07d9330e4a feat: Add UnstructuredLibraryImportError
Added exception raised when the unstructured library is called but not installed

Feature COG-685
2024-12-08 14:53:19 +01:00
Igor Ilic
53b7806ccb chore: Update pyproject file with unstructured library
Add unstructured library as docs optional extension to pyproject.toml

Chore COG-685
2024-12-08 14:42:08 +01:00
Igor Ilic
62db3f8598 feat: Remove the need for libmagic for unstructured documents
Remove the need for libmagic for unstructured documents by providing mime_type information

Feature COG-685
2024-12-08 14:37:50 +01:00
Igor Ilic
78214456a6 feat: Add unstructured document handler
Added unstructured library and handling of certain document types through that library

Feature COG-685
2024-12-06 17:50:22 +01:00
Igor Ilic
8415279cb2
Merge pull request #260 from topoteretes/COG-505-data-dataset-model-changes
Cog 505 data dataset model changes
2024-12-06 14:42:35 +01:00
Igor Ilic
d7fa9f3cfd Merge branch 'COG-505-data-dataset-model-changes' of github.com:topoteretes/cognee into COG-505-data-dataset-model-changes 2024-12-06 13:49:07 +01:00
Igor Ilic
cc6fbe2a5f refactor: Add space to ingest function
Add space and newline to ingest function

Refactor COG-505
2024-12-06 13:48:39 +01:00
Boris
9429e5e1f5
Merge branch 'main' into COG-505-data-dataset-model-changes 2024-12-06 12:53:32 +01:00
Boris
348610e73c
fix: refactor get_graph_from_model to return nodes and edges correctly (#257)
* fix: handle rate limit error coming from llm model

* fix: fixes lost edges and nodes in get_graph_from_model

* fix: fixes database pruning issue in pgvector (#261)

* fix: cognee_demo notebook pipeline is not saving summaries

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2024-12-06 12:52:01 +01:00
Igor Ilic
351ce92001
Merge pull request #263 from topoteretes/gh-actions-all-branches
test: Update gh actions so they can run outside of PR to main
2024-12-06 12:04:47 +01:00
Igor Ilic
d254471023 test: Update gh actions so they can run outside of PR to main
Allow github actions to run on PRs that aren't targeting main

Test
2024-12-06 11:09:26 +01:00
Igor Ilic
1e098ae70d refactor: Add error handling to hash util
Added error handling for reading the file in hash util

Refactor COG-505
2024-12-05 20:54:55 +01:00
Igor Ilic
e80377b729 refactor: Move hash calculation of file to util
Moved hash calculation of file to shared utils, added better typing

Refactor COG-505
2024-12-05 20:33:30 +01:00
Igor Ilic
9ba5d49e69 test: Fix test for multimedia deduplication
Add missing function to get data from database to multimedia deduplication test

Test COG-505
2024-12-05 20:09:29 +01:00
Igor Ilic
add6730b9e test: Add testing of dataset data table content
Add testing of dataset data table content

Test COG-505
2024-12-05 19:37:12 +01:00
Igor Ilic
387002d8ca Merge branch 'COG-505-data-dataset-model-changes' of github.com:topoteretes/cognee into COG-505-data-dataset-model-changes 2024-12-05 19:26:17 +01:00
Igor Ilic
813b76c9c2 test: Add test for text deduplication
Added end to end test for text deduplication

Test COG-505
2024-12-05 19:25:50 +01:00
Igor Ilic
349ddfe794
Merge branch 'main' into COG-505-data-dataset-model-changes 2024-12-05 17:10:43 +01:00
Igor Ilic
378e7b81a5 fix: Fix merge of data for dlt
Resolve issue with dlt data not being merged for data_id

Fix COG-505
2024-12-05 17:03:36 +01:00
Igor Ilic
f5b5e56cc1 feat: Add deduplication of data
Data is deduplicated per user: if a user tries to add data that already exists, it is redirected to the existing data in the database

Feature COG-505
2024-12-05 16:38:44 +01:00
hajdul88
acf036818e
Merge pull request #251 from topoteretes/feature/cog-717-create-edge-embeddings-in-vector-databases
Creates edge embeddings collection
2024-12-05 09:13:11 +01:00
hajdul88
68c3f42ab8
Merge branch 'main' into feature/cog-717-create-edge-embeddings-in-vector-databases 2024-12-05 09:08:37 +01:00
Vasilije
c4ad473861
Merge pull request #253 from topoteretes/feat/COG-711-temporal-awareness-task
Integrate graphiti's temporal awareness functionality as Tasks
2024-12-04 20:50:03 +01:00
Vasilije
b571fb5626
Merge branch 'main' into feat/COG-711-temporal-awareness-task 2024-12-04 20:49:36 +01:00
hajdul88
7f192e1c2b
Merge branch 'main' into feature/cog-717-create-edge-embeddings-in-vector-databases 2024-12-04 20:49:30 +01:00
Vasilije
7223b2c83b
Merge pull request #256 from topoteretes/fix-notebook-gh-actions
chore: Fix issue with notebook github actions
2024-12-04 20:42:18 +01:00
Igor Ilic
6be025e3d4 chore: Attempt to fix issue with notebook github actions
Attempt to resolve issue with running notebooks in github actions

Chore
2024-12-04 20:36:23 +01:00
hajdul88
59035c3f45 fix: puts index_graph_edges unit tests under unit test directory 2024-12-04 19:32:15 +01:00
hajdul88
e6bf428db5 feat: implements tests for index_graph_edges method 2024-12-04 19:27:09 +01:00
hajdul88
36a5a27f10
Merge branch 'main' into feature/cog-717-create-edge-embeddings-in-vector-databases 2024-12-04 18:15:46 +01:00
hajdul88
f444ae21b4 fix: adds back the ids to the nodes after node_link_graph 2024-12-04 18:14:08 +01:00
alekszievr
cedb9d6608
Merge branch 'main' into feat/COG-711-temporal-awareness-task 2024-12-04 17:31:50 +01:00
alekszievr
ac62e9809a
Skip empty files in get repo file dependencies (#254)
Co-authored-by: Rita Aleksziev <alekszievr@gmail..com>
2024-12-04 17:29:07 +01:00
Igor Ilic
0ce254b262 feat: Add text deduplication
Text added to cognee is saved by hash, so the same text can't be stored multiple times

Feature COG-505
2024-12-04 17:19:29 +01:00
Rita Aleksziev
5d71059f6c fix credentials 2024-12-04 17:00:48 +01:00
alekszievr
0d2a9e9e17
Merge branch 'main' into feat/COG-711-temporal-awareness-task 2024-12-04 16:34:11 +01:00
Rita Aleksziev
dd94781033 Integrate graphiti's functionality as Tasks 2024-12-04 16:33:26 +01:00
Vasilije
080143cdad
Merge branch 'main' into feature/cog-717-create-edge-embeddings-in-vector-databases 2024-12-04 16:33:08 +01:00
alekszievr
df8fc829f9
check if repo path exists before starting the pipeline (#252)
Co-authored-by: Rita Aleksziev <alekszievr@gmail..com>
2024-12-04 16:25:05 +01:00
Igor Ilic
6bb0f3d8f2 feat: Add ability to store single data instance in multiple datasets
Added ability to store single data instance in multiple datasets

Feature COG-505
2024-12-04 15:53:25 +01:00
hajdul88
c20ee11e80 feat: implements graph edge indexing 2024-12-04 15:37:48 +01:00
hajdul88
46ee513f6c chore: deletes comment from dynamic_steps_example 2024-12-04 14:59:01 +01:00
Igor Ilic
d793a5d9b5
Merge pull request #249 from topoteretes/fix-metadata-update
fix: Resolve issue when metadata is updated
2024-12-04 14:14:04 +01:00
Igor Ilic
0a0b030df5 fix: Resolve issue when metadata is updated
Resolve issue when attempting to update metadata related to data

Fix
2024-12-04 14:03:01 +01:00
Igor Ilic
3699b0dccb
Merge pull request #248 from topoteretes/fix-milvus-adapter
fix: Resolve issue with embedding data points for Milvus
2024-12-04 12:20:08 +01:00
Igor Ilic
58b17e5738 fix: Resolve issue with embedding data points for Milvus
Resolve issue with embedding data points for Milvus

Fix
2024-12-04 12:12:25 +01:00
Vasilije
1a963f1dc8
Merge pull request #247 from topoteretes/fix-dlt-for-metadata
Fix dlt for metadata
2024-12-04 12:05:08 +01:00
Igor Ilic
c505ee5f98
Merge branch 'main' into fix-dlt-for-metadata 2024-12-04 11:56:41 +01:00
Igor Ilic
ceebcdb251 fix: Resolve issue with llama index type resolution
Resolve issue with llama index type resolution

Fix
2024-12-04 11:29:27 +01:00
Boris Arzentar
4678aaef52 Merge remote-tracking branch 'origin/main' 2024-12-04 11:16:16 +01:00