Commit graph

2480 commits

Author SHA1 Message Date
Igor Ilic
fd6a77deec refactor: Add TODO for missing llm config parameters 2026-01-08 13:31:25 +01:00
Igor Ilic
f3215e16f9 refactor: Remove silent handling of lifetime assignment 2026-01-08 12:51:11 +01:00
Igor Ilic
07b91f3a5f refactor: Remove comment from Dockerfile 2026-01-08 12:45:03 +01:00
vasilije
af72dd2fc2 fixes to ruff format 2026-01-07 16:26:36 +01:00
Vasilije
34c6652939
add configurable JWT expiration, cookie domain, CORS origins, and service restart policies (#1956)
<!-- .github/pull_request_template.md -->

## Description
This PR introduces several configuration improvements to enhance the
application's flexibility and reliability. The changes make JWT token
expiration and cookie domain configurable via environment variables,
improve CORS configuration, and add container restart policies for
better uptime.

**JWT Token Expiration Configuration:**
- Added `JWT_LIFETIME_SECONDS` environment variable to configure JWT
token expiration time
- Set default expiration to 3600 seconds (1 hour) for both API and
client authentication backends
- Removed hardcoded expiration values in favor of environment-based
configuration
- Added documentation comments explaining the JWT strategy configuration
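
A minimal sketch of how an environment-driven lifetime could be read. Only the `JWT_LIFETIME_SECONDS` variable name and the 3600-second default come from this PR; the helper name `read_jwt_lifetime_seconds` and the choice to raise on a non-integer value are assumptions made for illustration, not the project's code.

```python
import os

DEFAULT_JWT_LIFETIME_SECONDS = 3600  # default stated in this PR (1 hour)


def read_jwt_lifetime_seconds(env=None):
    """Return the JWT lifetime in seconds (hypothetical helper).

    Falls back to the default when the variable is unset or blank.
    A value that is not a positive integer raises ValueError, so a
    misconfiguration is visible instead of being silently replaced.
    """
    env = os.environ if env is None else env
    raw = env.get("JWT_LIFETIME_SECONDS", "").strip()
    if not raw:
        return DEFAULT_JWT_LIFETIME_SECONDS
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"JWT_LIFETIME_SECONDS must be an integer, got {raw!r}")
    if value <= 0:
        raise ValueError("JWT_LIFETIME_SECONDS must be positive")
    return value
```

Passing a plain dict as `env` keeps the helper easy to exercise without touching the real process environment.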

**Cookie Domain Configuration:**
- Added `AUTH_TOKEN_COOKIE_DOMAIN` environment variable to configure
cookie domain
- When not set or empty, cookie domain defaults to `None` allowing
cross-domain usage
- Added documentation explaining cookie expiration is handled by JWT
strategy
- Updated default_transport to use environment-based cookie domain
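
The "not set or empty means `None`" rule above could look roughly like this. The helper name `read_cookie_domain` is hypothetical; only the `AUTH_TOKEN_COOKIE_DOMAIN` variable and the `None` default are taken from this PR.

```python
import os


def read_cookie_domain(env=None):
    """Return the configured cookie domain, or None when unset or blank.

    Hypothetical helper used only to illustrate the defaulting rule.
    """
    env = os.environ if env is None else env
    value = env.get("AUTH_TOKEN_COOKIE_DOMAIN", "").strip()
    return value or None
```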

**CORS Configuration Enhancement:**
- Added `CORS_ALLOWED_ORIGINS` environment variable with default value
of `'*'`
- Configured frontend to use `NEXT_PUBLIC_BACKEND_API_URL` environment
variable
- Set default backend API URL to `http://localhost:8000`
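
As a sketch of the CORS default, the snippet below turns `CORS_ALLOWED_ORIGINS` into a list of origins. The comma-separated format and the helper name `read_cors_allowed_origins` are assumptions; the PR only states the variable name and its `'*'` default.

```python
import os


def read_cors_allowed_origins(env=None):
    """Parse CORS_ALLOWED_ORIGINS into a list of origins (hypothetical helper).

    Assumes a comma-separated value; falls back to ["*"] when the
    variable is unset or contains no usable entries.
    """
    env = os.environ if env is None else env
    raw = env.get("CORS_ALLOWED_ORIGINS", "*")
    origins = [origin.strip() for origin in raw.split(",") if origin.strip()]
    return origins or ["*"]
```

In production the wildcard would normally be replaced with an explicit list such as `https://app.example.com,https://admin.example.com`.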

**Docker Service Reliability:**
- Added `restart: always` policy to all services (cognee, frontend,
neo4j, chromadb, and postgres)
- This ensures services automatically restart on failure or system
reboot
- Improves container reliability and uptime in production and
development environments

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Services now automatically restart on failure for improved
    reliability.

* **Configuration**
  * Cookie domain for authentication is now configurable via environment
    variable, defaulting to None if not set.
  * JWT token lifetime is now configurable via environment variable, with
    a 3600-second default.
  * CORS allowed origins are now configurable with a default of all
    origins (*).
  * Frontend backend API URL is now configurable, defaulting to
    http://localhost:8000.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-04 10:37:13 +01:00
maozhen
2c79d693fd ```
fix(embeddings): handle empty API key in LiteLLMEmbeddingEngine

- Add conditional check for empty API key to prevent authentication errors
- Set default API key to "EMPTY" when no valid key is provided
- This ensures proper fallback behavior when API key is not configured
```
2026-01-04 15:18:43 +08:00
maozhen
e47fda4872 ```
fix(auth): add error handling for JWT lifetime configuration

- Add try-catch block to handle invalid JWT_LIFETIME_SECONDS environment variable
- Default to 360 seconds when environment variable is not a valid integer
- Apply same fix to both API and client authentication backends

docs(docker): add security warning for CORS configuration

- Add comment warning about default CORS_ALLOWED_ORIGINS setting
- Emphasize need to override wildcard with specific domains in production
```
2026-01-04 11:08:42 +08:00
maozhen
5a77c36a95 ```
refactor(auth): remove redundant comments from JWT strategy configuration

Remove duplicate comments that were explaining the JWT lifetime configuration
in both API and client authentication backends. The code remains functionally
unchanged but comments are cleaned up for better maintainability.
```
2026-01-04 11:08:32 +08:00
maozhen
a7b114725a ```
feat(auth): make JWT token expiration configurable via environment variable

- Add JWT_LIFETIME_SECONDS environment variable to configure token expiration
- Set default expiration to 3600 seconds (1 hour) for both API and client auth backends
- Remove hardcoded expiration values in favor of environment-based configuration
- Add documentation comments explaining the JWT strategy configuration

feat(auth): make cookie domain configurable via environment variable

- Add AUTH_TOKEN_COOKIE_DOMAIN environment variable to configure cookie domain
- When not set or empty, cookie domain defaults to None allowing cross-domain usage
- Add documentation explaining cookie expiration is handled by JWT strategy
- Update default_transport to use environment-based cookie domain

feat(docker): add CORS_ALLOWED_ORIGINS environment variable

- Add CORS_ALLOWED_ORIGINS environment variable with default value of '*'
- Configure frontend to use NEXT_PUBLIC_BACKEND_API_URL environment variable
- Set default backend API URL to http://localhost:8000

feat(docker): add restart policy to all services

- Add restart: always policy to cognee, frontend, neo4j, chromadb, and postgres services
- This ensures services automatically restart on failure or system reboot
- Improves container reliability and uptime
```
2026-01-04 11:08:28 +08:00
Vasilije
a0f25f4f50
feat: redo notebook tutorials (#1922)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Two interactive tutorial notebooks added (Cognee Basics, Python
    Development) with runnable code and rich markdown; MarkdownPreview for
    rendered markdown; instance-aware notebook support and cloud proxy with
    API key handling; notebook CRUD (create, save, run, delete).

* **Bug Fixes**
  * Improved authentication handling to treat 401/403 consistently.

* **Improvements**
  * Auto-expanding text areas; better error propagation from dataset
    operations; migration to allow toggling deletability for legacy tutorial
    notebooks.

* **Tests**
  * Expanded tests for tutorial creation and loading.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-01 14:44:04 +01:00
vasilije
8965e31a58 reformat 2025-12-31 13:57:48 +01:00
dgarnitz
d578971b60 add support for structured outputs with llama cpp via instructor and litellm 2025-12-30 16:37:31 -08:00
vasilije
27f2aa03b3 added fixes to litellm 2025-12-28 21:48:01 +01:00
Vasilije
310e9e97ae
feat: list vector distance in cogneegraph (#1926)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

- `map_vector_distances_to_graph_nodes` and
`map_vector_distances_to_graph_edges` accept both single-query (flat
list) and multi-query (nested list) inputs.
- `query_list_length` controls the mode: omit it for single-query
behavior, or provide it to enable multi-query mode with strict length
validation and per-query results.
- `vector_distance` on `Node` and `Edge` is now a list (one distance per
query). Constructors set it to `None`, and `reset_distances` initializes
it at the start of each search.
- `Node.update_distance_for_query` and `Edge.update_distance_for_query`
are the only methods that write to `vector_distance`. They ensure the
list has enough elements and keep unmatched queries at the penalty
value.
- `triplet_distance_penalty` is the default distance value used
everywhere. Unmatched nodes/edges and missing scores all use this same
penalty for consistency.
- `edges_by_distance_key` is an index mapping edge labels to matching
edges. This lets us update all edges with the same label at once,
instead of scanning the full edge list repeatedly.
- `calculate_top_triplet_importances` returns `List[Edge]` for
single-query mode and `List[List[Edge]]` for multi-query mode.
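
The per-query list behavior described above can be sketched as follows. This `Node` is a simplified stand-in, not the project's class: the method names come from this description, but their signatures, the `query_count` and `penalty` parameters, and the penalty value used in the usage lines are assumptions.

```python
from typing import List, Optional


class Node:
    """Simplified stand-in for a graph node (illustrative only)."""

    def __init__(self, node_id: str) -> None:
        self.id = node_id
        # Constructors leave the list unset; it is filled at search start.
        self.vector_distance: Optional[List[float]] = None

    def reset_distances(self, query_count: int, penalty: float) -> None:
        """Start a search with every query slot at the penalty value."""
        self.vector_distance = [penalty] * query_count

    def update_distance_for_query(
        self, query_index: int, score: float, query_count: int, penalty: float
    ) -> None:
        """Write one query's distance, padding missing slots with the penalty."""
        if not 0 <= query_index < query_count:
            raise ValueError("query_index must be within query_count")
        if self.vector_distance is None:
            self.vector_distance = []
        missing = query_count - len(self.vector_distance)
        if missing > 0:
            self.vector_distance.extend([penalty] * missing)
        self.vector_distance[query_index] = score


# Usage: three queries, only the second one matched this node.
triplet_distance_penalty = 3.5  # assumed value, for illustration only
node = Node("a")
node.reset_distances(query_count=3, penalty=triplet_distance_penalty)
node.update_distance_for_query(1, 0.2, query_count=3, penalty=triplet_distance_penalty)
```

After these calls the unmatched first and third slots still hold the penalty, which is the consistency property the description emphasizes.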


## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [x] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Multi-query support for mapping/scoring node and edge distances and a
    configurable triplet distance penalty.
  * Distance-keyed edge indexing for more accurate distance-to-edge
    matching.

* **Refactor**
  * Vector distance metadata changed from scalars to per-query lists;
    added reset/normalization and per-query update flows.
  * Node/edge distance initialization now supports deferred/listed
    distances.

* **Tests**
  * Updated and expanded tests for multi-query flows, list-based
    distances, edge-key handling, and related error cases.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-23 14:47:27 +01:00
lxobr
f6c76ce19e chore: remove duplicate import 2025-12-19 16:24:49 +01:00
lxobr
c3cec818d7 fix: update tests 2025-12-19 16:22:47 +01:00
lxobr
9808077b4c nit: update variable names 2025-12-19 15:35:34 +01:00
Vasilije
9b2b1a9c13
chore: covering higher level search logic with tests (#1910)
<!-- .github/pull_request_template.md -->

## Description
This PR covers the higher level search.py logic with unit tests. As a
part of the implementation we fully cover the following core logic:

- search.py
- get_search_type_tools (with all the core search types)
- search - prepare_search_results contract (testing behavior from
search.py interface)

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Tests**
  * Added comprehensive unit test coverage for search functionality,
    including search type tool selection, search operations, and result
    preparation workflows across multiple scenarios and edge cases.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-19 14:22:54 +01:00
lxobr
a85df53c74 chore: tweak mapping and scoring 2025-12-19 13:14:50 +01:00
hajdul88
72dae0f79a fix linting 2025-12-19 10:38:44 +01:00
hajdul88
7cf93ea79d updates old no asserts test + yml 2025-12-19 10:32:45 +01:00
hajdul88
4b71995a70 ruff 2025-12-19 10:25:24 +01:00
hajdul88
9819b38058
Merge branch 'dev' into feature/cog-3536-multitenant-search-testing-automation 2025-12-19 10:06:02 +01:00
Vasilije
eb444ca18f
feat: Add a task that deletes the old data that has not been accessed in a while (#1751)
<!-- .github/pull_request_template.md -->  
  
## Description  
  
This PR implements a data deletion system for unused DataPoint models
based on last access tracking. The system tracks when data is accessed
during search operations and provides cleanup functionality to remove
data that hasn't been accessed within a configurable time threshold.
  
**Key Changes:**  
1. Added `last_accessed` timestamp field to the SQL `Data` model  
2. Added `last_accessed_at` timestamp field to the graph `DataPoint`
model
3. Implemented `update_node_access_timestamps()` function that updates
both graph nodes and SQL records during search operations
4. Created `cleanup_unused_data()` function with SQL-based deletion mode
for whole document cleanup
5. Added Alembic migration to add `last_accessed` column to the `data`
table
6. Integrated timestamp tracking into retrievers  
7. Added comprehensive end-to-end test for the cleanup functionality  
  
## Related Issues  
Fixes #[issue_number]  
  
## Type of Change  
- [x] New feature (non-breaking change that adds functionality)  
- [ ] Bug fix (non-breaking change that fixes an issue)  
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update  
- [ ] Code refactoring  
- [ ] Performance improvement  
  
## Database Changes  
- [x] This PR includes database schema changes  
- [x] Alembic migration included: `add_last_accessed_to_data`  
- [x] Migration adds `last_accessed` column to `data` table  
- [x] Migration includes backward compatibility (nullable column)  
- [x] Migration tested locally  
  
## Implementation Details  
  
### Files Modified:  
1. **cognee/modules/data/models/Data.py** - Added `last_accessed` column
2. **cognee/infrastructure/engine/models/DataPoint.py** - Added
`last_accessed_at` field
3. **cognee/modules/retrieval/chunks_retriever.py** - Integrated
timestamp tracking in `get_context()`
4. **cognee/modules/retrieval/utils/update_node_access_timestamps.py**
(new file) - Core tracking logic
5. **cognee/tasks/cleanup/cleanup_unused_data.py** (new file) - Cleanup
implementation
6. **alembic/versions/[revision]_add_last_accessed_to_data.py** (new
file) - Database migration
7. **cognee/tests/test_cleanup_unused_data.py** (new file) - End-to-end
test
  
### Key Functions:  
- `update_node_access_timestamps(items)` - Updates timestamps in both
graph and SQL
- `cleanup_unused_data(minutes_threshold, dry_run, text_doc)` - Main
cleanup function
- SQL-based cleanup mode uses `cognee.delete()` for proper document
deletion
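
A rough sketch of the threshold logic behind such a cleanup, under stated assumptions: `DataRecord`, `cleanup_unused`, and the return shape are hypothetical, and skipping rows whose `last_accessed` is still null is an illustrative choice rather than the project's documented behavior. Only the `last_accessed` field, the minutes-based threshold, and the dry-run idea come from this PR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


@dataclass
class DataRecord:
    """Hypothetical stand-in for a row of the `data` table."""

    id: str
    last_accessed: Optional[datetime]  # nullable, as in the migration


def cleanup_unused(
    records: List[DataRecord],
    minutes_threshold: int,
    dry_run: bool = True,
    now: Optional[datetime] = None,
) -> Tuple[List[str], List[DataRecord]]:
    """Return (ids selected for deletion, records remaining afterwards).

    With dry_run=True nothing is removed; the selection is only reported.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=minutes_threshold)
    stale_ids = [
        record.id
        for record in records
        if record.last_accessed is not None and record.last_accessed < cutoff
    ]
    if dry_run:
        return stale_ids, list(records)
    stale = set(stale_ids)
    return stale_ids, [record for record in records if record.id not in stale]
```

Running with `dry_run=True` first gives a preview of what a real run would delete.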
  
## Testing  
- [x] Added end-to-end test: `test_textdocument_cleanup_with_sql()`  
- [x] Test covers: add → cognify → search → timestamp verification →
aging → cleanup → deletion verification
- [x] Test verifies cleanup across all storage systems (SQL, graph,
vector)
- [x] All existing tests pass  
- [x] Manual testing completed  
  
## Screenshots/Videos  
N/A - Backend functionality  
  
## Pre-submission Checklist  
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)  
- [x] All new and existing tests pass  
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description  
- [x] My commits have clear and descriptive messages  
  
## Breaking Changes  
None - This is a new feature that doesn't affect existing functionality.
  

## DCO Affirmation  
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
Resolves #1335 

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
  * Added access timestamp tracking to monitor when data is last
    retrieved.
  * Introduced automatic cleanup of unused data based on configurable time
    thresholds and access history.
  * Retrieval operations now update access timestamps to ensure accurate
    tracking of data usage.

* **Tests**
  * Added integration test validating end-to-end cleanup workflow across
    storage layers.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-19 09:47:31 +01:00
hajdul88
ee967ae3fa feat: adds grant permission checks + tenant + role scenarios 2025-12-19 09:31:56 +01:00
hajdul88
976ac78e5e ruff 2025-12-19 07:53:36 +01:00
hajdul88
ef7ebc0748 feat: adds user1 and user 2 dataset read tests 2025-12-19 07:53:08 +01:00
Boris Arzentar
3311db55bf
fix: typos in text and error handling 2025-12-18 22:52:09 +01:00
Boris Arzentar
672a776df5
Merge remote-tracking branch 'origin/dev' into feature/cog-3550-simplify-tutorial-notebook 2025-12-18 17:33:25 +01:00
Boris Arzentar
edb541505c
fix: lint errors and ignore tutorial python files when linting 2025-12-18 17:33:21 +01:00
hajdul88
3e47de5ea0 ruff ruff 2025-12-18 17:33:15 +01:00
hajdul88
9c04f46572 feat: adds new permission test fixtures and setup til cognify 2025-12-18 17:31:32 +01:00
hajdul88
ef51dcfb7a
Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-4 2025-12-18 16:10:16 +01:00
hajdul88
4f07adee66
chore: fixes get_raw_data endpoint and adds s3 support (#1916)
<!-- .github/pull_request_template.md -->

## Description
This PR fixes the `get_raw_data` endpoint in `get_dataset_router`:

- Fixes local path access
- Adds s3 access
- Covers the newly fixed functionality with unit tests
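
One way to tell local paths from S3 locations before deciding how to serve a file is to inspect the URL scheme. The helper name `classify_data_location` and the handling of a `file://` prefix are assumptions for illustration; the PR itself only states that local path access was fixed and S3 access was added.

```python
from urllib.parse import urlparse


def classify_data_location(location: str):
    """Return (kind, bucket, path) for a raw data location (hypothetical helper).

    kind is "s3" for s3:// URLs and "local" for everything else.
    """
    parsed = urlparse(location)
    if parsed.scheme == "s3":
        # For s3://bucket/key the bucket is the netloc and the key is the path.
        return ("s3", parsed.netloc, parsed.path.lstrip("/"))
    if parsed.scheme == "file":
        return ("local", None, parsed.path)
    # Plain filesystem paths are passed through unchanged.
    return ("local", None, location)
```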

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Streaming support for remote S3 data locations so large dataset files
    can be retrieved efficiently.
  * Improved handling of local and remote file paths for downloads.

* **Improvements**
  * Standardized error responses for missing datasets or data files.

* **Tests**
  * Added unit tests covering local file downloads and S3 streaming,
    including content and attachment header verification.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-18 16:10:05 +01:00
Boris Arzentar
d127381262
Merge remote-tracking branch 'origin/dev' into feature/cog-3550-simplify-tutorial-notebook 2025-12-18 15:28:56 +01:00
Boris Arzentar
f93d414e94
feat: simplify the current tutorial and add cognee basics tutorial 2025-12-18 15:28:45 +01:00
lxobr
c1ea7a8cc2 fix: improve graph distance mapping 2025-12-18 14:52:35 +01:00
hajdul88
8602ba1e93
Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-4 2025-12-18 13:25:19 +01:00
Vasilije
4d03fcfa9e
fix: Fix connection encoding (#1917)
<!-- .github/pull_request_template.md -->

## Description
Resolves a connection issue that occurred when Postgres passwords
contained special characters such as '#' and '@'.
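
To illustrate why such characters matter: in a connection URL, `@` separates credentials from the host and `#` starts a fragment, so they need percent-encoding inside the password. The sketch below uses only the standard library; the helper name `build_postgres_url` and the example credentials are hypothetical, and this is not necessarily how the fix was implemented in the codebase.

```python
from urllib.parse import quote, urlsplit


def build_postgres_url(user, password, host, port, database):
    """Build a Postgres URL with percent-encoded credentials (hypothetical helper)."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


url = build_postgres_url("cognee", "p@ss#word", "localhost", 5432, "cognee_db")
# url == "postgresql://cognee:p%40ss%23word@localhost:5432/cognee_db"
parts = urlsplit(url)  # host and port now parse as intended
```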

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
  * Improved internal database connection handling for relational and
    vector databases to enhance system stability and code maintainability.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-17 22:04:09 +01:00
Vasilije
2ef8094666
feat: Add custom label by contributor: apenade (#1913)
<!-- .github/pull_request_template.md -->

## Description
Add ability to define custom labels for Data in Cognee. Initial PR by
contributor: apenade

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
  * Added support for labeling individual data items during ingestion
    workflows
  * Expanded the add API to accept data items with optional custom labels
    for better organization
  * Labels are persisted and retrievable when accessing dataset
    information
  * Enhanced data retrieval to include label information in API responses

* **Tests**
  * Added comprehensive end-to-end tests validating custom data labeling
    functionality

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-17 21:21:40 +01:00
Igor Ilic
d352ff0c28
Merge branch 'dev' into fix-connection-encoding 2025-12-17 21:08:45 +01:00
Igor Ilic
6e5e79f434 fix: Resolve connection issue with postgres when special characters are present 2025-12-17 21:07:23 +01:00
lxobr
46ff01021a feat: add multi-query support to score calculation 2025-12-17 19:09:02 +01:00
lxobr
69ab8e7ede feat: add multi-query support to graph distance mapping 2025-12-17 18:14:57 +01:00
lxobr
cc7ca45e73 feat: make vector_distance list based 2025-12-17 15:48:24 +01:00
hajdul88
f79ba53e1d
COG-3532 chore: retriever test reorganization + adding new tests (unit) (STEP 2) (#1892)
<!-- .github/pull_request_template.md -->

This PR restructures and adds unit tests for the retrieval module (STEP 2).

- Added missing unit tests for all core retrieval business logic

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
  * Expanded and refactored retrieval module test suites with
    comprehensive unit test coverage for ChunksRetriever,
    SummariesRetriever, RagCompletionRetriever, TripletRetriever,
    GraphCompletionRetriever, TemporalRetriever, and related components.
  * Added new test modules for completion utilities, graph summary
    retrieval, and user feedback functionality.
  * Improved test robustness with edge case handling and error scenario
    coverage.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-12-17 12:30:15 +01:00
hajdul88
b0454b49a9 Merge branch 'feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-4' of github.com:topoteretes/cognee into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-4 2025-12-17 10:35:12 +01:00
hajdul88
94d5175570 feat: adds unit test for the prepare search result - search contract 2025-12-17 10:34:57 +01:00
hajdul88
623126eec1
Merge branch 'feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-3' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-4 2025-12-17 10:07:58 +01:00
Igor Ilic
cc872fc8de refactor: format PR 2025-12-16 21:04:15 +01:00