Commit graph

2375 commits

Author SHA1 Message Date
Igor Ilic
f1c5b9a55f fix: Resolve DB caching issues when deleting databases 2025-12-03 18:05:47 +01:00
Igor Ilic
fd84edeb74 refactor: change getting of tables during deletion 2025-12-03 15:43:41 +01:00
Boris
8cad9ef225
Merge branch 'dev' into feature/cog-3409-add-bedrock-as-supported-llm-provider 2025-12-03 14:58:00 +01:00
Boris
ec744f01cc
Merge branch 'dev' into fix/memify-run-in-background-clean 2025-12-03 14:55:32 +01:00
Igor Ilic
45f32f8bfd
Merge branch 'dev' into multi-tenant-neo4j 2025-12-03 14:37:13 +01:00
Igor Ilic
1961efcc33 fix: Handle scenario when there is no relational database on prune time 2025-12-03 14:27:06 +01:00
Igor Ilic
f4078d1247 feat: Add ability to delete lance and kuzu datasets, add prune to work with multi user mode 2025-12-03 13:10:18 +01:00
Igor Ilic
5698c609f5 test: Update tests with regard to auto scaling changes 2025-12-03 11:47:10 +01:00
Boris Arzentar
0d2e84f58e
test: test_strip_quotes_from_strings 2025-12-03 10:59:17 +01:00
Boris
3288ef01a4
Merge branch 'dev' into fix/remove-double-quotes-from-llmconfig-str-params 2025-12-03 10:05:49 +01:00
ketanjain7981
f26b490a8f refactor: improve test isolation and add connect_args precedence
Signed-off-by: ketanjain7981 <ketan.jain@think41.com>
2025-12-03 00:27:51 +05:30
ketanjain3
1f98d50870
Merge branch 'dev' into feature/sqlalchemy-custom-connect-args 2025-12-03 00:15:03 +05:30
ketanjain7981
a7da9c7d65 test: verify logger warning for invalid JSON in SQLAlchemyAdapter
Signed-off-by: ketanjain7981 <ketan.jain@think41.com>
2025-12-02 23:35:35 +05:30
ketanjain7981
4f3a1bcf01 test: add unit tests for SQLAlchemyAdapter connection arguments
Signed-off-by: ketanjain7981 <ketan.jain@think41.com>
2025-12-02 23:25:47 +05:30
hajdul88
d4d190ac2b
feature: adds triplet embedding via memify (#1832)

## Description
This PR introduces triplet embeddings via a new
create_triplet_embeddings memify pipeline.
The pipeline reads the graph in batches, extracts properties from graph
elements based on their datapoint types, and generates combined triplet
embeddings. These embeddings are stored in the vector database as a new
collection.
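
As a rough illustration of the flow described above, the sketch below reads triplets in batches, builds one text per triplet from node and edge properties, and stores one embedding per triplet. Every name in it (`Triplet`, `triplet_text`, `batched`, `embed_in_batches`, and the toy `fake_embed`) is hypothetical and is not the cognee API; the actual pipeline picks properties per datapoint type and uses the configured embedding engine and vector database.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Triplet:
    source: dict   # properties of the source node
    relation: str  # relationship name on the edge
    target: dict   # properties of the target node


def triplet_text(t: Triplet, keys: tuple[str, ...] = ("name", "description")) -> str:
    """Join the selected node properties and the relation into one string."""
    def describe(node: dict) -> str:
        return " ".join(str(node[k]) for k in keys if node.get(k))
    return f"{describe(t.source)} -> {t.relation} -> {describe(t.target)}"


def batched(items: Iterable[Triplet], size: int) -> Iterator[list[Triplet]]:
    """Yield lists of at most `size` items (stand-in for paged graph reads)."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def embed_in_batches(
    triplets: Iterable[Triplet],
    embed: Callable[[list[str]], list[list[float]]],
    batch_size: int = 2,
) -> dict[str, list[float]]:
    """Return a mapping of triplet text to its embedding vector."""
    collection: dict[str, list[float]] = {}
    for batch in batched(triplets, batch_size):
        texts = [triplet_text(t) for t in batch]
        for text, vector in zip(texts, embed(texts)):
            collection[text] = vector
    return collection


def fake_embed(texts: list[str]) -> list[list[float]]:
    """Toy embedding (text length, word count), for demonstration only."""
    return [[float(len(t)), float(len(t.split()))] for t in texts]
```

In this sketch the returned dictionary plays the role of the new vector collection; an empty result would correspond to the case where the pipeline has not been run yet.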

Changes in This PR:

- Added a new create_triplet_embeddings memify pipeline.
- Added a new get_triplet_datapoints memify task.
- Introduced a new triplet_completion search type.
- Added full test coverage:
  - Unit tests: memify task, pipeline, and retriever
  - Integration tests: memify task, pipeline, and retriever
  - End-to-end tests: updated session history tests and multi-DB search
    tests; added tests for triplet_completion and memify pipeline execution

Acceptance Criteria and Testing

Scenario 1:
- Run the default add and cognify pipelines.
- Run the create_triplet_embeddings memify pipeline.
- Verify that the vector DB contains a non-empty Triplet_text collection.
- Use the new triplet_completion search type and confirm that it works
  correctly.

Scenario 2:
- Run the default add and cognify pipelines.
- Do not run the triplet embeddings memify pipeline.
- Attempt to use the triplet_completion search type.
- You should receive an error indicating that the triplet embeddings
  memify pipeline must be executed first.


## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


## Summary by CodeRabbit

* **New Features**
  * Triplet-based search with LLM-powered completions (TRIPLET_COMPLETION)
  * Batch triplet retrieval and a triplet embeddings pipeline for
    extraction, indexing, and optional background processing
  * Context retrieval from triplet embeddings with optional caching and
    conversation-history support
  * New Triplet data type exposed for indexing and search

* **Examples**
  * End-to-end example demonstrating triplet embeddings extraction and
    TRIPLET_COMPLETION search

* **Tests**
  * Unit and integration tests covering triplet extraction, retrieval,
    embedding pipeline, and completion flows


---------

Co-authored-by: Pavel Zorin <pazonec@yandex.ru>
2025-12-02 18:27:08 +01:00
ketanjain7981
3f53534c99 refactor(database): simplify to env var only for connect_args
- Remove unused connect_args parameter from __init__
- Programmatic parameter was dead code (never called by users)
- Users call get_relational_engine() which doesn't expose connect_args
- Keep DATABASE_CONNECT_ARGS env var support (actually used in production)
- Simplify implementation and reduce complexity
- Update docstring to reflect env-var-only approach
- Add production examples to docstring

Signed-off-by: ketanjain7981 <ketan.jain@think41.com>
2025-12-02 21:12:07 +05:30
Igor Ilic
1282905888 feat: add password encryption for Neo4j 2025-12-02 16:34:16 +01:00
ketanjain7981
c892265644 fix(database): address CodeRabbit review feedback
- Add comprehensive docstring for __init__ method to meet 80% coverage requirement
- Fix security issue: remove sensitive data from log messages
- Fix merge precedence: programmatic args now correctly override env vars
- Fix SQLite timeout order: user-specified timeout now overrides default 30s
- Clarify precedence in docstring documentation
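
The precedence rules listed in this commit can be sketched as below. This is a minimal illustration, not the adapter's code: the helper name `resolve_connect_args` and its signature are hypothetical, and only the `DATABASE_CONNECT_ARGS` variable name, its JSON format, and the 30 s SQLite default are taken from the commit messages in this log. The later commit 3f53534c99 drops the programmatic parameter, which would leave only the first two levels.

```python
import json
import logging
import os
from typing import Any, Mapping, Optional

logger = logging.getLogger(__name__)


def resolve_connect_args(
    provider: str,
    programmatic: Optional[dict[str, Any]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict[str, Any]:
    """Merge connection arguments: defaults < env var < programmatic."""
    env = os.environ if env is None else env
    merged: dict[str, Any] = {}

    # Default goes in first so that a user-specified timeout overrides it.
    if provider == "sqlite":
        merged["timeout"] = 30

    raw = env.get("DATABASE_CONNECT_ARGS")
    if raw:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            # Never log the raw value: it may contain credentials.
            logger.warning("DATABASE_CONNECT_ARGS is not valid JSON; ignoring it")
            parsed = {}
        if isinstance(parsed, dict):
            merged.update(parsed)
        else:
            logger.warning("DATABASE_CONNECT_ARGS must be a JSON object; ignoring it")

    # Programmatic arguments win over the environment variable.
    merged.update(programmatic or {})
    return merged
```

The resulting dictionary is the kind of value that SQLAlchemy's `create_engine` accepts through its `connect_args` keyword.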

Signed-off-by: ketanjain7981 <ketan.jain@think41.com>
2025-12-02 21:01:43 +05:30
ketanjain7981
f9b16e508d feat(database): add connect_args support to SqlAlchemyAdapter
- Add optional connect_args parameter to __init__ method
- Support DATABASE_CONNECT_ARGS environment variable for JSON-based configuration
- Enable custom connection parameters for all database engines (SQLite and PostgreSQL)
- Maintain backward compatibility with existing code
- Add proper error handling and validation for environment variable parsing

Signed-off-by: ketanjain7981 <ketan.jain@think41.com>
2025-12-02 20:30:09 +05:30
Igor Ilic
92448767fe refactor: remove done TODOs 2025-12-02 14:29:51 +01:00
Igor Ilic
dbcb35a6da chore: remove unused imports, add optional for delete dataset statement 2025-12-02 13:09:45 +01:00
Boris Arzentar
0ff836b6dd
fix: install latest nvm version 2025-12-02 10:48:28 +01:00
Boris Arzentar
5fe6a17cfd
fix: resolve nvm when not in path 2025-12-02 10:43:57 +01:00
Boris Arzentar
5ee5ae294a
Merge remote-tracking branch 'origin/dev' into feature/cog-3441-cognee-cli-ui-fix 2025-12-01 20:23:01 +01:00
Andrej Milicevic
d473ef12ae fix: small changes based on PR comments 2025-12-01 18:32:55 +01:00
Igor Ilic
362aa8df5c
Merge branch 'main' into baml-rate-limit-handling 2025-12-01 15:12:27 +01:00
Boris
76d054b6a5
Merge branch 'dev' into feature/cog-3156-move-codify-pipeline-out-of-main-repo 2025-12-01 11:21:34 +01:00
Igor Ilic
0bb4ece4d8 Merge branch 'main' into main-merge-vol4 2025-12-01 11:16:59 +01:00
Boris
5ce1af8cc0
Merge branch 'dev' into fix/remove-double-quotes-from-llmconfig-str-params 2025-12-01 10:09:53 +01:00
Mike Potter
73d84129de fix(api): pass run_in_background parameter to memify function
The run_in_background parameter was defined in MemifyPayloadDTO but was
never passed to the cognee_memify function call, making the parameter
effectively unused. This fix passes the parameter so users can actually
run memify operations in the background.
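
A stripped-down sketch of the bug class fixed here: a field exists on the request payload but is never forwarded to the function that would honor it. The class and functions below are hypothetical stand-ins and do not reproduce the actual `MemifyPayloadDTO` or `cognee_memify` definitions.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class PayloadSketch:
    """Hypothetical stand-in for the request payload."""
    dataset: str
    run_in_background: bool = False


async def memify_sketch(dataset: str, run_in_background: bool = False) -> str:
    """Hypothetical stand-in: reports which mode it was asked to use."""
    return "background" if run_in_background else "blocking"


async def endpoint_before_fix(payload: PayloadSketch) -> str:
    # The flag is accepted on the payload but silently dropped here.
    return await memify_sketch(payload.dataset)


async def endpoint_after_fix(payload: PayloadSketch) -> str:
    # The flag is forwarded, so the caller's choice takes effect.
    return await memify_sketch(
        payload.dataset, run_in_background=payload.run_in_background
    )
```

Calling both endpoints with `run_in_background=True` shows the difference: only the second one passes the flag through.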

Signed-off-by: Mike Potter <mpotter1@gmail.com>
2025-11-28 12:40:53 -05:00
Igor Ilic
d81d63390f test: Add test for dataset database handler creation 2025-11-28 16:33:46 +01:00
Igor Ilic
0c825b96ff
Merge branch 'dev' into multi-tenant-neo4j 2025-11-28 12:55:48 +01:00
Andrej Milicevic
aa8afefe8a feat: add kwargs to cognify and related tasks 2025-11-27 17:05:37 +01:00
Andrej Milicevic
c649900042 Merge branch 'dev' into feature/cog-3396-add-support-to-pass-custom-parameters-in-openai-adapter 2025-11-27 16:59:43 +01:00
Andrej Milicevic
c1857a50fa fix: remove new custom pipeline interface 2025-11-27 14:58:07 +01:00
Andrej Milicevic
f776f04ee0 feat: add registration and use of custom retrievers 2025-11-27 14:55:22 +01:00
hajdul88
0fd939ca2b updating url again 2025-11-27 13:28:48 +01:00
Igor Ilic
1ff6a72fc7 refactor: set default value to empty dictionary 2025-11-26 16:45:18 +01:00
hajdul88
508165e883
feature: Introduces wide subgraph search in graph completion and improves QA speed (#1736)

This PR introduces wide vector and graph structure filtering
capabilities. With these changes, the graph completion retriever and all
retrievers that inherit from it will now filter relevant vector elements
and subgraphs based on the query. This improvement significantly
increases search speed for large graphs while maintaining—and in some
cases slightly improving—accuracy.

Changes in This PR:

- Introduced a new wide_search_top_k parameter: controls the initial
  search space size.

- Added a graph-adapter-level filtering method: enables relevant subgraph
  filtering while maintaining backward compatibility. For community or
  custom graph adapters that don't implement this method, the system
  gracefully falls back to the original search behavior.

- Updated the modal dashboard and evaluation framework: fixed
  compatibility issues.

- Added comprehensive unit tests: introduced unit tests for
  brute_force_triplet_search (previously untested) and expanded the
  CogneeGraph test suite.

- Integration tests: existing integration tests verify end-to-end search
  functionality (no changes required).

Acceptance Criteria and Testing

To verify the new search behavior, run search queries with different
wide_search_top_k values while logging is enabled:
- None: triggers a full graph search (default behavior)
- 1: projects a minimal subgraph (demonstrates maximum filtering)
- Custom values: test intermediate levels of filtering
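
The behavior of the three settings above, including the fallback for adapters without a filtering method, can be illustrated with a small sketch. All names here (`LegacyAdapter`, `FilteringAdapter`, `get_filtered_subgraph`, `wide_search`) are hypothetical and chosen for illustration; they are not the interfaces added by this PR, and the scoring is a plain distance lookup rather than a real vector search.

```python
from typing import Optional

Edge = tuple[str, str, str]  # (source id, relation, target id)


class LegacyAdapter:
    """Hypothetical adapter without a filtering method."""

    def __init__(self, edges: list[Edge]):
        self.edges = edges

    def get_graph(self) -> list[Edge]:
        return list(self.edges)


class FilteringAdapter(LegacyAdapter):
    """Hypothetical adapter that can project a subgraph around given nodes."""

    def get_filtered_subgraph(self, node_ids: set[str]) -> list[Edge]:
        return [e for e in self.edges if e[0] in node_ids or e[2] in node_ids]


def wide_search(
    adapter: LegacyAdapter,
    distances: dict[str, float],
    wide_search_top_k: Optional[int],
) -> list[Edge]:
    """Return the edges to rank; `distances` maps node id to vector distance."""
    filter_fn = getattr(adapter, "get_filtered_subgraph", None)
    if wide_search_top_k is None or filter_fn is None:
        # Full graph search: no limit requested, or the adapter cannot filter.
        return adapter.get_graph()
    # Keep the k closest nodes (smallest distance) as the initial search space.
    closest = sorted(distances, key=distances.get)[:wide_search_top_k]
    return filter_fn(set(closest))
```

With `wide_search_top_k=1` the sketch keeps only edges touching the single closest node, while an adapter lacking the filtering method always receives the full graph.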

Internal Testing and results:
Performance and accuracy benchmarks are available upon request. The
implementation demonstrates measurable improvements in query latency for
large graphs without sacrificing result quality.

## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [x] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Pavel Zorin <pazonec@yandex.ru>
2025-11-26 15:18:53 +01:00
Andrej Milicevic
700362a233 fix: fix model names and test names 2025-11-26 13:35:56 +01:00
Boris Arzentar
ca271c5dbb
fix: lint error 2025-11-26 12:43:57 +01:00
Boris Arzentar
2f06c3a97e
fix: install nvm and node for -ui cli command 2025-11-26 12:24:14 +01:00
Andrej Milicevic
5a2a5f64d2 merge dev 2025-11-26 11:04:11 +01:00
Igor Ilic
cf9edf2663 chore: Add migration for new dataset database model field 2025-11-25 18:03:35 +01:00
Igor Ilic
69777ef0a5 feat: Add ability to handle custom connection resolution to avoid storing security-critical data in the relational DB 2025-11-25 17:53:21 +01:00
Igor Ilic
5f3b776406 chore: add todo for enhancing db connections 2025-11-25 16:38:34 +01:00
Igor Ilic
2e02aafbae refactor: Remove unused imports 2025-11-25 15:55:36 +01:00
Igor Ilic
593f17fcdc refactor: Add better handling of configuration for dataset to database handler 2025-11-25 15:41:01 +01:00
Andrej Milicevic
4c6bed885e chore: ruff format 2025-11-25 13:02:26 +01:00
Andrej Milicevic
f22330a7b6 Merge branch 'dev' into feature/bedrock-llm-provider 2025-11-25 13:02:07 +01:00