Commit graph

864 commits

Author SHA1 Message Date
alekszievr
edae2771a5
Count the number of tokens in documents [COG-1071] (#476)
* Count the number of tokens in documents

* save token count to relational db

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2025-01-29 11:29:09 +01:00
Igor Ilic
860218632f refactor: add suggestions from PR
Add suggestions made by CodeRabbit on pull request
2025-01-28 17:15:25 +01:00
Igor Ilic
a8644e0bd7 feat: Use litellm max token size as default for model, if model exists in litellm 2025-01-28 17:00:47 +01:00
Igor Ilic
710ca78d6e
Merge branch 'dev' into COG-970-refactor-tokenizing 2025-01-28 16:31:11 +01:00
alekszievr
98f0f60980
Feat: [cog-1089] Define pydantic models for descriptive graph metrics and input metrics (#466)
* feat: make tasks a configurable argument in the cognify function

* fix: add data points task

* Define pydantic models for descriptive graph metrics and input metrics

* remove to_json method

* Use just one MetricData class instead of two

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
2025-01-28 16:11:31 +01:00
Igor Ilic
6f8cbdbf1c
Merge branch 'dev' into COG-970-refactor-tokenizing 2025-01-28 15:44:57 +01:00
Igor Ilic
4e56cd64a1 refactor: Add max chunk tokens to code graph pipeline 2025-01-28 15:33:34 +01:00
Igor Ilic
dc0450d30e test: Update document tests regarding max chunk tokens 2025-01-28 15:21:43 +01:00
Igor Ilic
41544369af test: Change test_by_paragraph tests to accommodate the change 2025-01-28 14:47:17 +01:00
Igor Ilic
3db7f85c9c feat: Add max_chunk_tokens value to chunkers
Add formula and forwarding of max_chunk_tokens value through Cognee
2025-01-28 14:32:00 +01:00
Igor Ilic
49f60971bb Merge branch 'dev' into COG-970-refactor-tokenizing 2025-01-28 10:12:55 +01:00
Boris Arzentar
f811ab44e0 Merge remote-tracking branch 'origin/dev' into feat/COG-1060-code-pipeline-endpoints 2025-01-28 10:10:38 +01:00
Igor Ilic
0a9f1349f2 refactor: Change variable and function names based on PR comments
Change variable and function names based on PR comments
2025-01-28 10:10:29 +01:00
Boris Arzentar
3320bc8f2c feat: add codegraph related API endpoints 2025-01-28 10:08:59 +01:00
Boris
8da81c1de3
Merge branch 'dev' into pgvector-add-normalization 2025-01-27 11:31:24 +01:00
Boris
0c2c5870df
fix: use low_level server for cognee mcp server (#470)
* fix: revert to older mcp version

* fix: use low_level server for the mcp

* fix: styling errors

* fix: mcp cognify arguments

* fix: ruff errors
2025-01-26 12:52:48 +01:00
Igor Ilic
89d4b7a5c4
Merge branch 'dev' into pgvector-add-normalization 2025-01-24 19:24:39 +01:00
Igor Ilic
23ecf245ed fix: Return string conversion to resolve traceback 2025-01-24 19:20:55 +01:00
Igor Ilic
b0cec3fcaa refactor: Remove conversion to string 2025-01-24 19:03:57 +01:00
Igor Ilic
ffbb387580
Merge branch 'dev' into fix-insert-data 2025-01-24 18:55:41 +01:00
Igor Ilic
77a72851fc Merge branch 'dev' into COG-970-refactor-tokenizing 2025-01-24 18:34:50 +01:00
Igor Ilic
cdc992750a test: Add github action to test code graph 2025-01-24 18:12:16 +01:00
Igor Ilic
902979c1de refactor: Refactor get source code chunks based on tokenizer rework 2025-01-24 13:40:10 +01:00
Igor Ilic
844d99cb72 docs: Remove commented code 2025-01-23 18:24:26 +01:00
Igor Ilic
7dea1d54d7 refactor: Add specific max token values to embedding models 2025-01-23 18:18:45 +01:00
Igor Ilic
6d5679f9d2 Merge branch 'dev' into COG-970-refactor-tokenizing 2025-01-23 18:14:49 +01:00
Igor Ilic
1319944dcd docs: Update .env.template to include llm and embedding options 2025-01-23 18:05:45 +01:00
Igor Ilic
b686376c54 feat: Add gemini tokenizer to cognee 2025-01-23 17:55:04 +01:00
Igor Ilic
294ed1d960 feat: Add HuggingFace Tokenizer support 2025-01-23 16:52:35 +01:00
Igor Ilic
2e1a48e22c docs: Add usage example of function 2025-01-23 15:13:46 +01:00
Igor Ilic
de19016494 fix: Add flag to allow SQLite to use foreign keys 2025-01-23 15:10:27 +01:00
Igor Ilic
d4453e4a1d fix: Add support for SQLite and PostgreSQL for inserting data in SQLAlchemyAdapter 2025-01-23 14:59:02 +01:00
Boris Arzentar
e577276d91 Merge remote-tracking branch 'origin/dev' into feat/COG-1058-fastmcp 2025-01-23 11:46:25 +01:00
Boris Arzentar
00f302c37a feat: use fastmcp for mcp server 2025-01-23 11:45:40 +01:00
Igor Ilic
9f6a0ba783
Merge branch 'dev' into pgvector-add-normalization 2025-01-23 11:11:43 +01:00
Igor Ilic
40c0279ec5 Merge branch 'COG-793-metadata-rework' of github.com:topoteretes/cognee into COG-793-metadata-rework 2025-01-22 16:13:11 +01:00
Igor Ilic
80e67b0619 refactor: Rename foreign to external metadata
Rename foreign metadata to external metadata for metadata coming outside of Cognee
2025-01-22 16:07:35 +01:00
Igor Ilic
93249c72c5 fix: Initial commit to resolve issue with using tokenizer based on LLMs
Currently TikToken is used for tokenizing by default which is only supported by OpenAI,
this is an initial commit in an attempt to add Cognee tokenizing support for multiple LLMs
2025-01-21 19:53:22 +01:00
Igor Ilic
655ab0b8cc
Merge branch 'dev' into COG-793-metadata-rework 2025-01-21 18:20:49 +01:00
Vasilije
c9536f97a5
Merge pull request #451 from topoteretes/add_docstrings
chore: add docstrings and typing to cognee tasks
2025-01-21 14:07:19 +01:00
Igor Ilic
bd3a5a758c
Merge branch 'dev' into COG-793-metadata-rework 2025-01-20 18:06:21 +01:00
Igor Ilic
4196a4ce89 refactor: Update test to be up to date with current metadata refactor effort 2025-01-20 17:53:54 +01:00
Igor Ilic
5c17501bb8 refactor: add missing foreign_metadata attr to tests 2025-01-20 17:38:28 +01:00
Igor Ilic
ab8d95cc30 refactor: As neo4j can't support dictionaries, add foreign metadata as string 2025-01-20 17:28:14 +01:00
Igor Ilic
49ad292592 refactor: Reduce complexity of metadata handling
Have foreign metadata be a table column in data instead of its own table to reduce complexity

Refactor COG-793
2025-01-20 16:39:05 +01:00
Igor Ilic
0c7c1d7503 refactor: Refactor ingestion to only have one ingestion task 2025-01-20 14:33:47 +01:00
hajdul88
813a03c6e2
Merge branch 'dev' into pgvector-add-normalization 2025-01-20 13:46:50 +01:00
Igor Ilic
2546844787 feat: Add normalization to PGVector search
Add normalization to PGVector search results
2025-01-20 13:42:39 +01:00
hajdul88
bf70705ed0 Fix: fixes networkx failed to load graph from file error 2025-01-20 12:19:34 +01:00
Igor Ilic
e7f24548dd
Merge branch 'dev' into add_docstrings 2025-01-17 17:00:23 +01:00