Leon Luithlen
d6a6a9eaba
Return sentence_cut instead of word in chunk_by_paragraph
2024-11-14 15:03:09 +01:00
0xideas
8b681529b1
Update cognee/tasks/chunks/chunk_by_paragraph.py
...
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2024-11-14 14:42:15 +01:00
Leon Luithlen
73f24f9e4d
Fix sentence_cut return value in inappropriate places
2024-11-14 14:40:42 +01:00
Leon Luithlen
eaf9167fa1
Change chunk_by_word to collect newlines in prior words
2024-11-14 14:19:34 +01:00
Leon Luithlen
57d8149732
Save paragraph_ids in chunk_by_paragraph
2024-11-14 13:59:54 +01:00
Leon Luithlen
6721eaee83
Fix chunk_index bug in chunk_by_paragraph
2024-11-14 13:50:40 +01:00
0xideas
f2206a09c0
Update cognee/tasks/chunks/chunk_by_word.py
...
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2024-11-14 13:16:17 +01:00
Leon Luithlen
d90698305b
Simplify chunk_by_word
2024-11-14 09:43:10 +01:00
Leon Luithlen
45a60b7f19
Remove assert and move is_real_paragraph_end outside loop
2024-11-13 16:35:47 +01:00
Leon Luithlen
b787407db7
Add more adversarial examples
2024-11-13 16:23:14 +01:00
Leon Luithlen
9ea2634480
Replace word_count with maximum_length in if clause
2024-11-13 15:53:44 +01:00
Leon Luithlen
9b2fb09c59
Fix PdfDocument test, give chunk_by_sentence a maximum_length arg
2024-11-13 15:39:17 +01:00
Leon Luithlen
f8e5b529c3
Add maximum_length argument to chunk_sentences
2024-11-13 15:35:03 +01:00
Leon Luithlen
ce498d97dd
Refactor chunk_by_paragraph to be isomorphic
2024-11-13 15:35:03 +01:00
Leon Luithlen
ab55a73d18
Adapt chunk_by_sentence to isomorphic chunk_by_word
2024-11-13 15:35:03 +01:00
Leon Luithlen
c054e897a3
Make chunk_by_word isomorphic
2024-11-13 15:35:03 +01:00
Leon Luithlen
6f0637a028
Small cosmetic changes
2024-11-13 15:35:02 +01:00
Leon Luithlen
cd80525420
Revert to EXTENSION_TO_DOCUMENT_CLASS implementation of classify_documents
2024-11-13 14:32:10 +01:00
Leon Luithlen
826de0edbf
Remove orphan dictionary
2024-11-12 16:47:28 +01:00
Leon Luithlen
83995fa548
Try old version of classify_documents
2024-11-12 16:47:28 +01:00
Leon Luithlen
8107709e98
Remove duplicate pdf key
2024-11-12 16:47:28 +01:00
Leon Luithlen
fbd011560a
Rebase onto main
2024-11-12 16:47:28 +01:00
Leon Luithlen
d7ffef1979
Remove old __tests__ folders
2024-11-12 16:47:28 +01:00
Leon Luithlen
86e726d741
Complete migrating unit tests
2024-11-12 16:47:28 +01:00
Leon Luithlen
66fb2948f8
Small cleanup pull request
2024-11-12 15:37:03 +01:00
Leon Luithlen
adaf69c127
Readd infer_data_ontology models
2024-11-12 09:05:51 +01:00
Boris Arzentar
b1b6b79ca4
fix: convert qdrant search results to ScoredPoint
2024-11-12 09:01:03 +01:00
Boris Arzentar
68700f32c7
fix: add code graph generation pipeline
2024-11-12 09:01:03 +01:00
Boris Arzentar
e1e5e7336a
fix: remove unused import
2024-11-12 09:01:03 +01:00
Boris Arzentar
7ea5f638fe
fix: add summaries to the graph
2024-11-12 09:01:03 +01:00
Boris Arzentar
a2b1087c84
feat: add FalkorDB integration
2024-11-12 09:01:01 +01:00
Boris
52180eb6b5
feat: COG-184 add falkordb (#192)
...
* feat: add falkordb adapter
---------
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2024-11-11 18:20:52 +01:00
Vasilije
16ee97cb68
fixes
2024-11-08 12:01:47 +01:00
Leon Luithlen
cb205069bc
Replace parse_obj with model_validate
2024-11-04 17:20:01 +01:00
Leon Luithlen
79642cfca1
Replace update_forward_refs with model_rebuild calls
2024-11-04 15:53:40 +01:00
Igor Ilic
3567e0d7e7
fix: Fix chunk naive llm classifier
...
Fixed chunk naive llm classifier uuid issue, added fix for deletion of data points for LanceDB
Fix #COG-472
2024-10-31 00:42:18 +01:00
Boris
2f832b190c
fix: various fixes for the deployment
...
* fix: remove groups from UserRead model
* fix: add missing system dependencies for postgres
* fix: change vector db provider environment variable name
* fix: WeaviateAdapter retrieve bug
* fix: correctly return data point objects from retrieve method
* fix: align graph object properties
* feat: add node example
2024-10-22 11:26:48 +02:00
Boris
dc187a81d7
feat: migrate search to tasks (#144)
...
* fix: don't return anything on health endpoint
* feat: add alembic migrations
* feat: align search types with the data we store and migrate search to tasks
2024-10-07 14:41:35 +02:00
Igor Ilic
fcd60861ba
fix: Fix Jupyter Notebook (#142)
...
* fix: resolve issue with dlt sqlalchemy usage
Cognee database configuration information was not handled properly by dlt, so a new dlt handler
module was made to handle database configuration propagation.
* fix: resolve issue with jupyter notebook
The notebook used the cognee add function the old way; updated it to
work with the latest state of the cognee add function, which doesn't return output.
* fix: Remove empty DB_PATH argument from .env.template
Empty value for DB_PATH in the .env file overrides default value for path intended to be used by cognee.
---------
2024-10-07 12:58:54 +02:00
Boris
01582d7a55
feat: split add into tasks and use pipeline architecture (#141)
...
* feat: split add into tasks and use pipeline architecture
2024-09-30 14:09:20 +02:00
Boris
58db1ac2c8
chore: increase the lib version (#138)
2024-09-21 17:57:35 +02:00
Boris
a9433e9283
feat: add sqlalchemy as dlt destination (#137)
...
* feat: add sqlalchemy as dlt destination
* Fix the demo, update Readme
* fix: add 1.5 notebook
---------
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2024-09-21 15:58:28 +02:00
Boris
e1a0b55a21
feat: user authentication in routes (#133)
...
* feat: require logged in user in routes
2024-09-08 21:12:49 +02:00
Boris
94a674a088
feat: split document reader from chunker (#131)
...
* fix: abstract chunking into a separate class
* fix: yield merged text from text chunker
* fix: split python version tests
* fix: change postgres live check
* fix: remove unnecessary code
* fix: update checkout action
* fix: update setup-python action
* fix: add PG_USER env variable
* fix: make sure relationship_name is used everywhere
* fix: remove duplicate import
2024-08-19 14:36:10 +02:00
Vasilije
e80d39167b
Enable different chunking methods
2024-08-08 19:59:26 +02:00
Vasilije
4675a8f323
Refactor of the tasks
2024-08-08 17:10:43 +02:00
Vasilije
156c7bec68
Refactor of the tasks
2024-08-08 13:47:03 +02:00
Vasilije
85160da387
Refactor of the tasks
2024-08-08 13:37:55 +02:00
Vasilije
2e367198cd
Task updates and updates to SQLAlchemy Adapter
2024-08-07 18:21:14 +02:00
Vasilije
557014e06b
Task updates and updates to SQLAlchemy Adapter
2024-08-07 13:29:53 +02:00