Commit graph

109 commits

Author SHA1 Message Date
Daniel Molnar
91f3cd9ef7
fix: notebooks (#818)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-05-13 18:13:26 +02:00
Diego Baptista Theuerkauf
13bb244746
feat: Create notebook to show how to compute ranks from graph (#771)
<!-- .github/pull_request_template.md -->

## Description
As discussed with @hande-k and Lazar, I've created a short demo to
illustrate how to get the pagerank rankings from the knowledge graph
given the nx engine. This is a POC, and a first of step towards solving
#643 .

Please let me know what you think, and how to proceed from here. :)

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Boris <boris@topoteretes.com>
Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-05-13 16:36:51 +02:00
Daniel Molnar
9ba12b25ef
feat: add delete by document (#668)
<!-- .github/pull_request_template.md -->

## Description
Delete by document.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
2025-04-17 15:42:10 +02:00
Vasilije
228fba8096
fix: Refactor notebooks (#720)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
2025-04-11 10:23:22 +02:00
Boris
9536395468
Revert "feat: pipeline tasks needs mapping" (#717)
Reverts topoteretes/cognee#690

I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-04-10 12:10:12 +02:00
Igor Ilic
f5cd39c09d
fix: Resolve failing test for RAG_COMPLETION, add RAG_COMPLETION to MCP (#706)
<!-- .github/pull_request_template.md -->

## Description
Resolve failing test for RAG_COMPLETION, add RAG_COMPLETION to MCP

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

Co-authored-by: Boris <boris@topoteretes.com>
2025-04-07 18:13:15 +02:00
Vasilije
cd0d321eda
feat: Rename COMPLETION to RAG_COMPLETION (#701)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-04-07 11:46:48 +02:00
Boris
0ce6fad24a
feat: pipeline tasks needs mapping (#690)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-04-03 10:52:59 +02:00
Vasilije
4f72b597c9
ontology notebook added (#671)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
2025-03-31 18:59:23 +02:00
Daniel Molnar
73db1a5a53
fix: human readable logs (#658)
<!-- .github/pull_request_template.md -->

## Description
Introducing scructlog.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
2025-03-25 11:54:40 +01:00
alekszievr
c1f7b667d1
feat: Eliminate the use of max_chunk_tokens and use a unified max_chunk_size instead [cog-1381] (#626)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Refactor**
- Simplified text processing by unifying multiple size-related
parameters into a single metric across chunking and extraction
functionalities.
- Streamlined logic for text segmentation by removing redundant
calculations and checks, resulting in a more consistent chunk management
process.
- **Chores**
  - Removed the `modal` package as a dependency.
- **Documentation**
- Updated the README.md to include a new demo video link and clarified
default environment variable settings.
- Enhanced the CONTRIBUTING.md to improve clarity and engagement for
potential contributors.
- **Bug Fixes**
- Improved handling of sentence-ending punctuation in text processing to
include additional characters.
- **Version Update**
  - Updated project version to 0.1.33 in the pyproject.toml file.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-03-12 14:03:41 +01:00
hibajamal
56427f287e
Demo for relational db with cognee (#620)
<!-- .github/pull_request_template.md -->

## Description
This demo uses pydantic models and dlt to pull data from the Pokémon API
and structure it into a relational format. By feeding this structured
data into cognee, it makes searching across multiple tables easier and
more intuitive, thanks to the relational model.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Introduced a comprehensive Pokémon data processing pipeline, available
as both a Python script and an interactive Jupyter Notebook.
- Enabled asynchronous operations for efficient data collection and
querying, including an integrated search functionality.
- Improved error handling and data validation during the data fetching
and processing stages for a smoother user experience.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-03-08 20:33:42 +01:00
Hande
65d0f7317c
replace video in readme (#617)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Documentation**
- Updated the demo reference in the documentation by replacing an
embedded video thumbnail with a simplified "Learn about cognee" text
link.
  
- **Chores**
- Integrated a minor internal update to align a related data component
with the latest project state, with no visible impact on functionality.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-03-07 17:01:38 +01:00
Boris
e8ab5b4797
fix: tiktoken upgrade (#587)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Removed an outdated internal tracking reference to streamline
maintenance.
- Upgraded a key dependency to its latest stable release, delivering
enhanced performance and reliability for a smoother user experience.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-27 18:16:11 +01:00
Daniel Molnar
d27f847753
Transition to new retrievers, update searches (#585)
<!-- .github/pull_request_template.md -->

## Description
Delete legacy search implementations after migrating to new retriever
classes

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced search and retrieval capabilities, providing improved context
resolution for code queries, completions, summaries, and graph
connections.
  
- **Refactor**
- Shifted to a modular, object-oriented approach that consolidates query
logic and streamlines error management for a more robust and scalable
experience.
  
- **Bug Fixes**
- Improved error handling for unsupported search types and retrieval
operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-27 15:25:24 +01:00
lxobr
9cc357ac1c
Feat/cog 1365 unify retrievers (#572)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->
- Created the `BaseRetriever` class to unify all the retrievers and
searches.
- Implemented seven specialized retrievers (summaries, chunks,
completions, graph, graph-summary, insights, code) with consistent
get_context/get_completion interfaces.
- Added json context dumping feature in the current completion
implementations to enable context comparisons.
- Built a comparison framework to validate old vs new implementations.
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced multiple retrieval classes for enhanced search
capabilities, including `BaseRetriever`, `ChunksRetriever`,
`CodeRetriever`, `CompletionRetriever`, `GraphCompletionRetriever`,
`GraphSummaryCompletionRetriever`, `InsightsRetriever`, and
`SummariesRetriever`.
- Enhanced query completions with optional context saving for improved
data persistence.
- Implemented advanced tools to compare retrieval outcomes across
different implementations.

- **Refactor**
- Streamlined internal module organization and updated references for
increased maintainability and consistency.
- Added comments indicating future maintenance tasks related to code
merging.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-02-27 12:13:21 +01:00
Boris
45f7c63322
fix: notebooks errors (#565)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Automatically creates a blank graph when a file isn’t found, ensuring
smoother operations.
- Updated demonstration notebooks with dynamic configurations, including
refined search operations and input prompts.
- Introduced optional support for additional graph functionalities via
an integrated dependency.

- **Refactor**
- Streamlined processing by eliminating duplicate steps and simplifying
graph rendering workflows.

- **Chores**
- Updated environment configurations and upgraded the Python runtime for
improved performance and consistency.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-19 14:07:11 -08:00
Boris
ada466879e
fix: add default params to run_tasks (#563)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced the task execution process by enabling default values for
certain parameters, allowing users to trigger task processing without
supplying every input explicitly.
  
- **Bug Fixes**
- Adjusted asynchronous handling for the `retrieved_edges_to_string`
function to ensure proper execution flow in various components.

- **Documentation**
- Updated markdown formatting in the Jupyter notebook for improved
readability and structure.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-02-19 20:18:51 +01:00
Igor Ilic
09b9255639
GraphRAG vs RAG notebook (#503)
<!-- .github/pull_request_template.md -->

## Description
GraphRAG vs RAG cognee notebook

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Tests**
- Implemented automated validations to continuously monitor and ensure
the reliability of our interactive notebook features. These improvements
enhance overall stability and performance, enabling a more consistent
and dependable user experience.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Boris <boris@topoteretes.com>
2025-02-18 01:21:14 +01:00
Vasilije
2072c7a081
feat: improve tests add macos runners (#540)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Improved automated testing setups to run across multiple operating
systems (Ubuntu and macOS) for Python 3.10, 3.11, and 3.12.
- Enhanced compatibility and coverage across diverse environments,
ensuring a more robust validation process.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: soekja <wes.hubert@gmail.com>
Co-authored-by: Boris <boris@topoteretes.com>
2025-02-15 04:19:19 +01:00
Boris Arzentar
fd9101af34 Merge remote-tracking branch 'origin/dev' 2025-02-13 00:02:02 +01:00
Boris
f9e6dcf837
fix: simplify code pipeline (#529)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit


- **New Features**
  - Enhanced code search and dependency analysis for improved accuracy.
  - Introduced a new high-performance text embedding option.
  - Added an additional execution entry point for code graph processing.
- New optional parameters for flexible property selection in retrieval
functions.
- Introduced new classes for handling import statements, function
definitions, and class definitions.
  - Updated embedding engine selection based on configuration options.

- **Bug Fixes**
- Improved error handling in search operations and database queries for
a more stable user experience.
  - Enhanced error logging for source code parsing.

- **Refactor**
- Streamlined asynchronous processing and refactored internal dependency
extraction.
- Updated configuration and integration settings to enhance overall
reliability.
  - Restructured functions for simplified dependency handling.

- **Chores**
- Upgraded and reorganized dependency management with optional libraries
for extended functionality.
- Added new secret parameters for embedding configuration in workflow
settings.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: vasilije <vas.markovic@gmail.com>
2025-02-12 23:58:48 +01:00
Vasilije
9ba2e0d6c1
chore: Fix and update visualization (#518)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced enhanced visualization capabilities that let users launch a
dedicated server for visual displays.
  
- **Documentation**
- Updated several interactive notebooks to include execution outputs and
expanded explanatory content for better user guidance.
  
- **Style**
- Refined formatting and layout across notebooks to ensure consistent
presentation and improved readability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2025-02-11 19:25:01 +01:00
Igor Ilic
2a04fa3738
fix: Resolve llama-index integration issue with new cognee version [ COG-1270] (#515)
<!-- .github/pull_request_template.md -->

## Description
Change version to latest llama index cognee integration version which
has a proper fix for the failing notebook

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated an AI integration dependency to version 0.1.3 in both the
testing workflow and the Jupyter notebook, ensuring that the environment
uses the latest version for improved consistency during tests.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-02-11 18:28:31 +01:00
alekszievr
05ba29af01
Feat: log pipeline status and pass it through pipeline [COG-1214] (#501)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced pipeline execution now provides consolidated status feedback
with improved telemetry for start, completion, and error events.
- Automatic generation of unique dataset identifiers offers clearer task
and pipeline run associations.

- **Refactor**
- Task execution has been streamlined with explicit parameter handling
for more structured pipeline processing.
- Interactive examples and demos now return results directly, making
integration and monitoring more accessible.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
2025-02-11 16:41:40 +01:00
Igor Ilic
3850e9c7a1
Cognee simple document example (#521)
<!-- .github/pull_request_template.md -->

## Description
Notebook and python example for cognee simple example

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Introduced an interactive demo showcasing asynchronous document
processing and querying for key insights from a sample text.
- **Documentation**
- Added an in-depth, step-by-step guide in a Jupyter Notebook that walks
users through setup, configuration, querying, and visualizing processed
data.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-11 13:58:35 +01:00
Boris Arzentar
28f5422811 Merge remote-tracking branch 'origin/dev' 2025-02-08 02:02:37 +01:00
Boris
f75e35c337
fix: custom model pipeline (#508)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit


- **New Features**
• Graph visualizations now allow exporting to a user-specified file path
for more flexible output management.
• The text embedding process has been enhanced with an additional
tokenizer option for improved performance.
• A new `ExtendableDataPoint` class has been introduced for future
extensions.
• New JSON files for companies and individuals have been added to
facilitate testing and data processing.

- **Improvements**
• Search functionality now uses updated identifiers for more reliable
content retrieval.
• Metadata handling has been streamlined across various classes by
removing unnecessary type specifications.
• Enhanced serialization of properties in the Neo4j adapter for improved
handling of complex structures.
• The setup process for databases has been improved with a new
asynchronous setup function.

- **Chores**
• Dependency and configuration updates improve overall stability and
performance.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-08 02:00:15 +01:00
Igor Ilic
5fe7ff9883
refactor: Refactor search so graph completion is used by default (#505)
<!-- .github/pull_request_template.md -->

## Description
Refactor search so query type doesn't need to be provided to make it
simpler for new users

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Refactor**
- Improved the search interface by standardizing parameter usage with
explicit keyword arguments for specifying search types, enhancing
clarity and consistency.
- **Tests**
- Updated test cases and example integrations to align with the revised
search parameters, ensuring consistent behavior and reliable validation
of search outcomes.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-07 17:16:34 +01:00
Boris Arzentar
a01d153971 Merge remote-tracking branch 'origin/dev' 2025-02-01 15:30:01 +01:00
Igor Ilic
3e29c3d8f2 docs: Update notebook to work with changes to max chunk tokens 2025-01-28 15:38:38 +01:00
Hande
0fb19ca21d
docs: update readme with "How Cognee Solves Real-World Pain Points" 2025-01-27 11:36:11 +01:00
hande-k
52e5b5c6f4 edit cognee_hotpot_eval.ipynb 2025-01-24 13:46:50 +01:00
alekszievr
bc1b05437e
Merge branch 'dev' into cog-1069-update-notebooks-evals 2025-01-23 20:32:30 +01:00
hande-k
cdecf5fb8f add short decsription in md 2025-01-23 11:17:48 +01:00
hande-k
343de01d5a update notebooks with latest eval 2025-01-23 11:11:51 +01:00
Igor Ilic
77f0b45a0d refactor: Resolve issue with notebook after metadata refactor
Resolve issue with LlamaIndex notebook after refactor
2025-01-20 18:02:57 +01:00
hajdul88
9e63bacaa7
Merge branch 'dev' into feature/cog-761-project-graphiti-graph-to-memory 2025-01-15 11:49:10 +01:00
hajdul88
1db44de7de feat: adds graphiti demo notebook 2025-01-15 11:45:06 +01:00
Igor Ilic
259414add0 docs: Update LlamaIndex integration notebook 2025-01-14 15:32:27 +01:00
Boris
0f97f8f71b
Version 0.1.22 (#438)
* Revert "fix: Add metadata reflection fix to sqlite as well"

This reverts commit 394a0b2dfb.

* COG-810 Implement a top-down dependency graph builder tool (#268)

* feat: parse repo to call graph

* Update/repo_processor/top_down_repo_parse.py task

* fix: minor improvements

* feat: file parsing jedi script optimisation

---------

* Add type to DataPoint metadata (#364)

* Add missing index_fields

* Use DataPoint UUID type in pgvector create_data_points

* Make _metadata mandatory everywhere

* feat: Add search by dataset for cognee

Added ability to search by datasets for cognee users

Feature COG-912

* feat: outsources chunking parameters to extract chunk from documents … (#289)

* feat: outsources chunking parameters to extract chunk from documents task

* fix: Remove backend lock from UI

Removed lock that prevented using multiple datasets in cognify

Fix COG-912

* COG 870 Remove duplicate edges from the code graph (#293)

* feat: turn summarize_code into generator

* feat: extract run_code_graph_pipeline, update the pipeline

* feat: minimal code graph example

* refactor: update argument

* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline

* refactor: indentation and whitespace nits

* refactor: add deprecated use comments and warnings

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>

* test: Added test for getting of documents for search

Added test to verify getting documents related to datasets intended for search

Test COG-912

* Structured code summarization (#375)

* feat: turn summarize_code into generator

* feat: extract run_code_graph_pipeline, update the pipeline

* feat: minimal code graph example

* refactor: update argument

* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline

* refactor: indentation and whitespace nits

* refactor: add deprecated use comments and warnings

* Structured code summarization

* add missing prompt file

* Remove summarization_model argument from summarize_code and fix typehinting

* minor refactors

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>

* fix: Resolve issue with cognify router graph model default value

Resolve issue with default value for graph model in cognify endpoint

Fix

* chore: Resolve typo in getting documents code

Resolve typo in code

chore COG-912

* Update .github/workflows/dockerhub.yml

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Update .github/workflows/dockerhub.yml

* Update .github/workflows/dockerhub.yml

* Update .github/workflows/dockerhub.yml

* Update get_cognify_router.py

* fix: Resolve syntax issue with cognify router

Resolve syntax issue with cognify router

Fix

* feat: Add ruff pre-commit hook for linting and formatting

Added formatting and linting on pre-commit hook

Feature COG-650

* chore: Update ruff lint options in pyproject file

Update ruff lint options in pyproject file

Chore

* test: Add ruff linter github action

Added linting check with ruff in github actions

Test COG-650

* feat: deletes executor limit from get_repo_file_dependencies

* feat: implements mock feature in LiteLLM engine

* refactor: Remove changes to cognify router

Remove changes to cognify router

Refactor COG-650

* fix: fixing boolean env for github actions

* test: Add test for ruff format for cognee code

Test if code is formatted for cognee

Test COG-650

* refactor: Rename ruff gh actions

Rename ruff gh actions to be more understandable

Refactor COG-650

* chore: Remove checking of ruff lint and format on push

Remove checking of ruff lint and format on push

Chore COG-650

* feat: Add deletion of local files when deleting data

Delete local files when deleting data from cognee

Feature COG-475

* fix: changes back the max workers to 12

* feat: Adds mock summary for codegraph pipeline

* refacotr: Add current development status

Save current development status

Refactor

* Fix langfuse

* Fix langfuse

* Fix langfuse

* Add evaluation notebook

* Rename eval notebook

* chore: Add temporary state of development

Add temp development state to branch

Chore

* fix: Add poetry.lock file, make langfuse mandatory

Added langfuse as mandatory dependency, added poetry.lock file

Fix

* Fix: fixes langfuse config settings

* feat: Add deletion of local files made by cognee through data endpoint

Delete local files made by cognee when deleting data from database through endpoint

Feature COG-475

* test: Revert changes on test_pgvector

Revert changes on test_pgvector which were made to test deletion of local files

Test COG-475

* chore: deletes the old test for the codegraph pipeline

* test: Add test to verify deletion of local files

Added test that checks local files created by cognee will be deleted and those not created by cognee won't

Test COG-475

* chore: deletes unused old version of the codegraph

* chore: deletes unused imports from code_graph_pipeline

* Ingest non-code files

* Fixing review findings

* Ingest non-code files (#395)

* Ingest non-code files

* Fixing review findings

* test: Update test regarding message

Update assertion message, add veryfing of file existence

* Handle retryerrors in code summary (#396)

* Handle retryerrors in code summary

* Log instead of print

* fix: updates the acreate_structured_output

* chore: Add logging to sentry when file which should exist can't be found

Log to sentry that a file which should exist can't be found

Chore COG-475

* Fix diagram

* fix: refactor mcp

* Add Smithery CLI installation instructions and badge

* Move readme

* Update README.md

* Update README.md

* Cog 813 source code chunks (#383)

* fix: pass the list of all CodeFiles to enrichment task

* feat: introduce SourceCodeChunk, update metadata

* feat: get_source_code_chunks code graph pipeline task

* feat: integrate get_source_code_chunks task, comment out summarize_code

* Fix code summarization (#387)

* feat: update data models

* feat: naive parse long strings in source code

* fix: get_non_py_files instead of get_non_code_files

* fix: limit recursion, add comment

* handle embedding empty input error (#398)

* feat: robustly handle CodeFile source code

* refactor: sort imports

* todo: add support for other embedding models

* feat: add custom logger

* feat: add robustness to get_source_code_chunks

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat: improve embedding exceptions

* refactor: format indents, rename module

---------

Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Fix diagram

* Fix diagram

* Fix instructions

* Fix instructions

* adding and fixing files

* Update README.md

* ruff format

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Implement PR review

* Comment out profiling

* Comment out profiling

* Comment out profiling

* fix: add allowed extensions

* fix: adhere UnstructuredDocument.read() to Document

* feat: time code graph run and add mock support

* Fix ollama, work on visualization

* fix: Fixes faulty logging format and sets up error logging in dynamic steps example

* Overcome ContextWindowExceededError by checking token count while chunking (#413)

* fix: Fixes duplicated edges in cognify by limiting the recursion depth in add datapoints

* Adjust AudioDocument and handle None token limit

* Handle azure models as well

* Fix visualization

* Fix visualization

* Fix visualization

* Add clean logging to code graph example

* Remove setting envvars from arg

* fix: fixes create_cognee_style_network_with_logo unit test

* fix: removes accidental remained print

* Fix visualization

* Fix visualization

* Fix visualization

* Get embedding engine instead of passing it. Get it from vector engine instead of direct getter.

* Fix visualization

* Fix visualization

* Fix poetry issues

* Get embedding engine instead of passing it in code chunking.

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* chore: Update version of poetry install action

* chore: Update action to trigger on pull request for any branch

* chore: Remove if in github action to allow triggering on push

* chore: Remove if condition to allow gh actions to trigger on push to PR

* chore: Update poetry version in github actions

* chore: Set fixed ubuntu version to 22.04

* chore: Update py lint to use ubuntu 22.04

* chore: update ubuntu version to 22.04

* feat: implements the first version of graph based completion in search

* chore: Update python 3.9 gh action to use 3.12 instead

* chore: Update formatting of utils.py

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Adjust integration tests

* fix: Fixes ruff formatting

* Handle circular import

* fix: Resolve profiler issue with partial and recursive logger imports

Resolve issue for profiler with partial and recursive logger imports

* fix: Remove logger from __init__.py file

* test: Test profiling on HEAD branch

* test: Return profiler to base branch

* Set max_tokens in config

* Adjust SWE-bench script to code graph pipeline call

* Adjust SWE-bench script to code graph pipeline call

* fix: Add fix for accessing dictionary elements that don't exits

Using get for the text key instead of direct access to handle situation if the text key doesn't exist

* feat: Add ability to change graph database configuration through cognee

* feat: adds pydantic types to graph layer models

* test: Test ubuntu 24.04

* test: change all actions to ubuntu-latest

* feat: adds basic retriever for swe bench

* Match Ruff version in config to the one in github actions

* feat: implements code retreiver

* Fix: fixes unit test for codepart search

* Format with Ruff 0.9.0

* Fix: deleting incorrect repo path

* docs: Add LlamaIndex Cognee integration notebook

Added LlamaIndex Cognee integration notebook

* test: Add github action for testing llama index cognee integration notebook

* fix: resolve issue with langfuse dependency installation when integrating cognee in different packages

* version: Increase version to 0.1.21

* fix: update dependencies of the mcp server

* Update README.md

* Fix: Fixes logging setup

* feat: deletes on the fly embeddings as uses edge collections

* fix: Change nbformat on llama index integration notebook

* fix: Resolve api key issue with llama index integration notebook

* fix: Attempt to resolve issue with Ubuntu 24.04 segmentation fault

* version: Increase version to 0.1.22

---------

Co-authored-by: vasilije <vas.markovic@gmail.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Rita Aleksziev <alekszievr@gmail.com>
Co-authored-by: Henry Mao <1828968+calclavia@users.noreply.github.com>
2025-01-13 22:36:42 +01:00
Igor Ilic
adee79d7a5 fix: Change nbformat on llama index integration notebook 2025-01-13 11:54:05 +01:00
Boris
fbc9cefdab
Version 0.1.21 (#431)
* feat: Add error handling in case user is already part of database and permission already given to group

Added error handling in case permission is already given to group and user is already part of group

Feature COG-656

* feat: Add user verification for accessing data

Verify user has access to data before returning it

Feature COG-656

* feat: Add compute search to cognee

Add compute search to cognee which makes searches human readable

Feature COG-656

* feat: Add simple instruction for system prompt

Add simple instruction for system prompt

Feature COG-656

* pass pydantic model tocognify

* feat: Add unauth access error to getting data

Raise unauth access error when trying to read data without access

Feature COG-656

* refactor: Rename query compute to query completion

Rename searching type from compute to completion

Refactor COG-656

* chore: Update typo in code

Update typo in string in code

Chore COG-656

* Add mcp to cognee

* Add simple README

* Update cognee-mcp/mcpcognee/__main__.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Create dockerhub.yml

* Update get_cognify_router.py

* fix: Resolve reflection issue when running cognee a second time after pruning data

When running cognee a second time after pruning data some metadata doesn't get pruned.
This makes cognee believe some tables exist that have been deleted

Fix

* fix: Add metadata reflection fix to sqlite as well

Added fix when reflecting metadata to sqlite as well

Fix

* update

* Revert "fix: Add metadata reflection fix to sqlite as well"

This reverts commit 394a0b2dfb.

* COG-810 Implement a top-down dependency graph builder tool (#268)

* feat: parse repo to call graph

* Update/repo_processor/top_down_repo_parse.py task

* fix: minor improvements

* feat: file parsing jedi script optimisation

---------

* Add type to DataPoint metadata (#364)

* Add type to DataPoint metadata

* Add missing index_fields

* Use DataPoint UUID type in pgvector create_data_points

* Make _metadata mandatory everywhere

* Fixes

* Fixes to our demo

* feat: Add search by dataset for cognee

Added ability to search by datasets for cognee users

Feature COG-912

* feat: outsources chunking parameters to extract chunk from documents … (#289)

* feat: outsources chunking parameters to extract chunk from documents task

* fix: Remove backend lock from UI

Removed lock that prevented using multiple datasets in cognify

Fix COG-912

* COG 870 Remove duplicate edges from the code graph (#293)

* feat: turn summarize_code into generator

* feat: extract run_code_graph_pipeline, update the pipeline

* feat: minimal code graph example

* refactor: update argument

* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline

* refactor: indentation and whitespace nits

* refactor: add deprecated use comments and warnings

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>

* test: Added test for getting of documents for search

Added test to verify getting documents related to datasets intended for search

Test COG-912

* Structured code summarization (#375)

* feat: turn summarize_code into generator

* feat: extract run_code_graph_pipeline, update the pipeline

* feat: minimal code graph example

* refactor: update argument

* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline

* refactor: indentation and whitespace nits

* refactor: add deprecated use comments and warnings

* Structured code summarization

* add missing prompt file

* Remove summarization_model argument from summarize_code and fix typehinting

* minor refactors

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>

* fix: Resolve issue with cognify router graph model default value

Resolve issue with default value for graph model in cognify endpoint

Fix

* chore: Resolve typo in getting documents code

Resolve typo in code

chore COG-912

* Update .github/workflows/dockerhub.yml

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Update .github/workflows/dockerhub.yml

* Update .github/workflows/dockerhub.yml

* Update .github/workflows/dockerhub.yml

* Update get_cognify_router.py

* fix: Resolve syntax issue with cognify router

Resolve syntax issue with cognify router

Fix

* feat: Add ruff pre-commit hook for linting and formatting

Added formatting and linting on pre-commit hook

Feature COG-650

* chore: Update ruff lint options in pyproject file

Update ruff lint options in pyproject file

Chore

* test: Add ruff linter github action

Added linting check with ruff in github actions

Test COG-650

* feat: deletes executor limit from get_repo_file_dependencies

* feat: implements mock feature in LiteLLM engine

* refactor: Remove changes to cognify router

Remove changes to cognify router

Refactor COG-650

* fix: fixing boolean env for github actions

* test: Add test for ruff format for cognee code

Test if code is formatted for cognee

Test COG-650

* refactor: Rename ruff gh actions

Rename ruff gh actions to be more understandable

Refactor COG-650

* chore: Remove checking of ruff lint and format on push

Remove checking of ruff lint and format on push

Chore COG-650

* feat: Add deletion of local files when deleting data

Delete local files when deleting data from cognee

Feature COG-475

* fix: changes back the max workers to 12

* feat: Adds mock summary for codegraph pipeline

* refacotr: Add current development status

Save current development status

Refactor

* Fix langfuse

* Fix langfuse

* Fix langfuse

* Add evaluation notebook

* Rename eval notebook

* chore: Add temporary state of development

Add temp development state to branch

Chore

* fix: Add poetry.lock file, make langfuse mandatory

Added langfuse as mandatory dependency, added poetry.lock file

Fix

* Fix: fixes langfuse config settings

* feat: Add deletion of local files made by cognee through data endpoint

Delete local files made by cognee when deleting data from database through endpoint

Feature COG-475

* test: Revert changes on test_pgvector

Revert changes on test_pgvector which were made to test deletion of local files

Test COG-475

* chore: deletes the old test for the codegraph pipeline

* test: Add test to verify deletion of local files

Added test that checks local files created by cognee will be deleted and those not created by cognee won't

Test COG-475

* chore: deletes unused old version of the codegraph

* chore: deletes unused imports from code_graph_pipeline

* Ingest non-code files

* Fixing review findings

* Ingest non-code files (#395)

* Ingest non-code files

* Fixing review findings

* test: Update test regarding message

Update assertion message, add veryfing of file existence

* Handle retryerrors in code summary (#396)

* Handle retryerrors in code summary

* Log instead of print

* fix: updates the acreate_structured_output

* chore: Add logging to sentry when file which should exist can't be found

Log to sentry that a file which should exist can't be found

Chore COG-475

* Fix diagram

* fix: refactor mcp

* Add Smithery CLI installation instructions and badge

* Move readme

* Update README.md

* Update README.md

* Cog 813 source code chunks (#383)

* fix: pass the list of all CodeFiles to enrichment task

* feat: introduce SourceCodeChunk, update metadata

* feat: get_source_code_chunks code graph pipeline task

* feat: integrate get_source_code_chunks task, comment out summarize_code

* Fix code summarization (#387)

* feat: update data models

* feat: naive parse long strings in source code

* fix: get_non_py_files instead of get_non_code_files

* fix: limit recursion, add comment

* handle embedding empty input error (#398)

* feat: robustly handle CodeFile source code

* refactor: sort imports

* todo: add support for other embedding models

* feat: add custom logger

* feat: add robustness to get_source_code_chunks

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat: improve embedding exceptions

* refactor: format indents, rename module

---------

Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Fix diagram

* Fix instructions

* adding and fixing files

* Update README.md

* ruff format

* Fix linter issues

* Implement PR review

* Comment out profiling

* fix: add allowed extensions

* fix: adhere UnstructuredDocument.read() to Document

* feat: time code graph run and add mock support

* Fix ollama, work on visualization

* fix: Fixes faulty logging format and sets up error logging in dynamic steps example

* Overcome ContextWindowExceededError by checking token count while chunking (#413)

* fix: Fixes duplicated edges in cognify by limiting the recursion depth in add datapoints

* Adjust AudioDocument and handle None token limit

* Handle azure models as well

* Add clean logging to code graph example

* Remove setting envvars from arg

* fix: fixes create_cognee_style_network_with_logo unit test

* fix: removes accidental remained print

* Get embedding engine instead of passing it. Get it from vector engine instead of direct getter.

* Fix visualization

* Get embedding engine instead of passing it in code chunking.

* Fix poetry issues

* chore: Update version of poetry install action

* chore: Update action to trigger on pull request for any branch

* chore: Remove if in github action to allow triggering on push

* chore: Remove if condition to allow gh actions to trigger on push to PR

* chore: Update poetry version in github actions

* chore: Set fixed ubuntu version to 22.04

* chore: Update py lint to use ubuntu 22.04

* chore: update ubuntu version to 22.04

* feat: implements the first version of graph based completion in search

* chore: Update python 3.9 gh action to use 3.12 instead

* chore: Update formatting of utils.py

* Fix poetry issues

* Adjust integration tests

* fix: Fixes ruff formatting

* Handle circular import

* fix: Resolve profiler issue with partial and recursive logger imports

Resolve issue for profiler with partial and recursive logger imports

* fix: Remove logger from __init__.py file

* test: Test profiling on HEAD branch

* test: Return profiler to base branch

* Set max_tokens in config

* Adjust SWE-bench script to code graph pipeline call

* Adjust SWE-bench script to code graph pipeline call

* fix: Add fix for accessing dictionary elements that don't exits

Using get for the text key instead of direct access to handle situation if the text key doesn't exist

* feat: Add ability to change graph database configuration through cognee

* feat: adds pydantic types to graph layer models

* feat: adds basic retriever for swe bench

* Match Ruff version in config to the one in github actions

* feat: implements code retreiver

* Fix: fixes unit test for codepart search

* Format with Ruff 0.9.0

* Fix: deleting incorrect repo path

* fix: resolve issue with langfuse dependency installation when integrating cognee in different packages

* version: Increase version to 0.1.21

---------

Co-authored-by: Igor Ilic <igorilic03@gmail.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Rita Aleksziev <alekszievr@gmail.com>
Co-authored-by: vasilije <vas.markovic@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Henry Mao <1828968+calclavia@users.noreply.github.com>
2025-01-10 19:37:50 +01:00
Igor Ilic
9a4613a9dd docs: Add LlamaIndex Cognee integration notebook
Added LlamaIndex Cognee integration notebook
2025-01-10 16:49:23 +01:00
vasilije
cbd15b98a5 Fix linter issues 2025-01-05 20:24:04 +01:00
vasilije
2675836149 Fix linter issues 2025-01-05 20:17:49 +01:00
vasilije
60c8fd103b ruff format 2025-01-05 19:09:08 +01:00
Vasilije
d3739d91c9
Merge branch 'dev' into test 2024-12-19 19:05:26 +01:00
vasilije
ae886ba585 Fix langfuse 2024-12-19 19:03:41 +01:00
vasilije
f9fea0a0c8 Fix langfuse 2024-12-19 19:03:26 +01:00