Commit graph

53 commits

Author SHA1 Message Date
Leon Luithlen
d6a6a9eaba Return sentence_cut instead of word in chunk_by_paragraph 2024-11-14 15:03:09 +01:00
0xideas
8b681529b1
Update cognee/tasks/chunks/chunk_by_paragraph.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2024-11-14 14:42:15 +01:00
Leon Luithlen
73f24f9e4d Fix sentence_cut return value in inappropriate places 2024-11-14 14:40:42 +01:00
Leon Luithlen
eaf9167fa1 Change chunk_by_word to collect newlines in prior words 2024-11-14 14:19:34 +01:00
Leon Luithlen
57d8149732 Save paragraph_ids in chunk_by_paragraph 2024-11-14 13:59:54 +01:00
Leon Luithlen
6721eaee83 Fix chunk_index bug in chunk_by_paragraph 2024-11-14 13:50:40 +01:00
0xideas
f2206a09c0
Update cognee/tasks/chunks/chunk_by_word.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2024-11-14 13:16:17 +01:00
Leon Luithlen
d90698305b Simplify chunk_by_word 2024-11-14 09:43:10 +01:00
Leon Luithlen
45a60b7f19 Remove assert and move is_real_paragraph_end outside loop 2024-11-13 16:35:47 +01:00
Leon Luithlen
b787407db7 Add more adversarial examples 2024-11-13 16:23:14 +01:00
Leon Luithlen
9ea2634480 Replace word_count with maximum_length in if clause 2024-11-13 15:53:44 +01:00
Leon Luithlen
9b2fb09c59 Fix PdfDocument test, give chunk_by_sentence a maximum_length arg 2024-11-13 15:39:17 +01:00
Leon Luithlen
f8e5b529c3 Add maximum_length argument to chunk_sentences 2024-11-13 15:35:03 +01:00
Leon Luithlen
ce498d97dd Refactor chunk_by_paragraph to be isomorphic 2024-11-13 15:35:03 +01:00
Leon Luithlen
ab55a73d18 Adapt chunk_by_sentence to isomorphic chunk_by_word 2024-11-13 15:35:03 +01:00
Leon Luithlen
c054e897a3 Make chunk_by_word isomorphic 2024-11-13 15:35:03 +01:00
Leon Luithlen
6f0637a028 Small cosmetic changes 2024-11-13 15:35:02 +01:00
Leon Luithlen
cd80525420 Revert to EXTENSION_TO_DOCUMENT_CLASS implementation of classify_documents 2024-11-13 14:32:10 +01:00
Leon Luithlen
826de0edbf Remove orphan dictionary 2024-11-12 16:47:28 +01:00
Leon Luithlen
83995fa548 Try old version of classify_documents 2024-11-12 16:47:28 +01:00
Leon Luithlen
8107709e98 Remove duplicate pdf key 2024-11-12 16:47:28 +01:00
Leon Luithlen
fbd011560a Rebase onto main 2024-11-12 16:47:28 +01:00
Leon Luithlen
d7ffef1979 Remove old __tests__ folders 2024-11-12 16:47:28 +01:00
Leon Luithlen
86e726d741 Complete migrating unit tests 2024-11-12 16:47:28 +01:00
Leon Luithlen
66fb2948f8 Small cleanup pull request 2024-11-12 15:37:03 +01:00
Leon Luithlen
adaf69c127 Re-add infer_data_ontology models 2024-11-12 09:05:51 +01:00
Boris Arzentar
b1b6b79ca4 fix: convert qdrant search results to ScoredPoint 2024-11-12 09:01:03 +01:00
Boris Arzentar
68700f32c7 fix: add code graph generation pipeline 2024-11-12 09:01:03 +01:00
Boris Arzentar
e1e5e7336a fix: remove unused import 2024-11-12 09:01:03 +01:00
Boris Arzentar
7ea5f638fe fix: add summaries to the graph 2024-11-12 09:01:03 +01:00
Boris Arzentar
a2b1087c84 feat: add FalkorDB integration 2024-11-12 09:01:01 +01:00
Boris
52180eb6b5
feat: COG-184 add falkordb (#192)
* feat: add falkordb adapter

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2024-11-11 18:20:52 +01:00
Vasilije
16ee97cb68 fixes 2024-11-08 12:01:47 +01:00
Leon Luithlen
cb205069bc Replace parse_obj with model_validate 2024-11-04 17:20:01 +01:00
Leon Luithlen
79642cfca1 Replace update_forward_refs with model_rebuild calls 2024-11-04 15:53:40 +01:00
Igor Ilic
3567e0d7e7 fix: Fix chunk naive llm classifier
Fixed chunk naive llm classifier uuid issue, added fix for deletion of data points for LanceDB

Fix #COG-472
2024-10-31 00:42:18 +01:00
Boris
2f832b190c
fix: various fixes for the deployment
* fix: remove groups from UserRead model

* fix: add missing system dependencies for postgres

* fix: change vector db provider environment variable name

* fix: WeaviateAdapter retrieve bug

* fix: correctly return data point objects from retrieve method

* fix: align graph object properties

* feat: add node example
2024-10-22 11:26:48 +02:00
Boris
dc187a81d7
feat: migrate search to tasks (#144)
* fix: don't return anything on health endpoint

* feat: add alembic migrations

* feat: align search types with the data we store and migrate search to tasks
2024-10-07 14:41:35 +02:00
Igor Ilic
fcd60861ba
fix: Fix Jupyter Notebook (#142)
* fix: resolve issue with dlt sqlalchemy usage
Cognee database configuration information was not handled properly by dlt; a new dlt handler
module was made to handle database configuration propagation.

* fix: resolve issue with jupyter notebook

The notebook used the cognee add function in an outdated way; updated it to
work with the latest state of the cognee add function, which doesn't return output.

* fix: Remove empty DB_PATH argument from .env.template

Empty value for DB_PATH in the .env file overrides default value for path intended to be used by cognee.

---------
2024-10-07 12:58:54 +02:00
Boris
01582d7a55
feat: split add into tasks and use pipeline architecture (#141)
* feat: split add into tasks and use pipeline architecture
2024-09-30 14:09:20 +02:00
Boris
58db1ac2c8
chore: increase the lib version (#138) 2024-09-21 17:57:35 +02:00
Boris
a9433e9283
feat: add sqlalchemy as dlt destination (#137)
* feat: add sqlalchemy as dlt destination

* Fix the demo, update Readme

* fix: add 1.5 notebook

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2024-09-21 15:58:28 +02:00
Boris
e1a0b55a21
feat: user authentication in routes (#133)
* feat: require logged in user in routes
2024-09-08 21:12:49 +02:00
Boris
94a674a088
feat: split document reader from chunker (#131)
* fix: abstract chunking into a separate class

* fix: yield merged text from text chunker

* fix: split python version tests

* fix: change postgres live check

* fix: remove unnecessary code

* fix: update checkout action

* fix: update setup-python action

* fix: add PG_USER env variable

* fix: make sure relationship_name is used everywhere

* fix: remove duplicate import
2024-08-19 14:36:10 +02:00
Vasilije
e80d39167b Enable different chunking methods 2024-08-08 19:59:26 +02:00
Vasilije
4675a8f323 Refactor of the tasks 2024-08-08 17:10:43 +02:00
Vasilije
156c7bec68 Refactor of the tasks 2024-08-08 13:47:03 +02:00
Vasilije
85160da387 Refactor of the tasks 2024-08-08 13:37:55 +02:00
Vasilije
2e367198cd Task updates and updates to SQLAlchemy Adapter 2024-08-07 18:21:14 +02:00
Vasilije
557014e06b Task updates and updates to SQLAlchemy Adapter 2024-08-07 13:29:53 +02:00