cognee

Author	SHA1	Message	Date
vasilije	06dda4e4b4	test(graph): add unit tests for relational deletion helpers	2026-01-02 14:08:01 +01:00
Vasilije	a0f25f4f50	feat: redo notebook tutorials (#1922)
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
## Summary by CodeRabbit
* New Features
  * Two interactive tutorial notebooks added (Cognee Basics, Python Development) with runnable code and rich markdown; MarkdownPreview for rendered markdown; instance-aware notebook support and cloud proxy with API key handling; notebook CRUD (create, save, run, delete).
* Bug Fixes
  * Improved authentication handling to treat 401/403 consistently.
* Improvements
  * Auto-expanding text areas; better error propagation from dataset operations; migration to allow toggling deletability for legacy tutorial notebooks.
* Tests
  * Expanded tests for tutorial creation and loading.	2026-01-01 14:44:04 +01:00
vasilije	8965e31a58	reformat	2025-12-31 13:57:48 +01:00
Vasilije	e5341c5f49	Support Structured Outputs with Llama CPP using LiteLLM & Instructor (#1949)
## Description
This PR adds support for structured outputs with llama cpp using litellm and instructor. It returns a Pydantic instance. Based on the GitHub issue described [here](https://github.com/topoteretes/cognee/issues/1947). It features the following:
- works for both local and server modes (OpenAI API compatible)
- defaults to `JSON` mode (not JSON schema mode, which is too rigid)
- uses existing patterns around logging & tenacity decorator consistent with other adapters
- respects max_completion_tokens / max_tokens
## Acceptance Criteria
I used the script below to test it with the [Phi-3-mini-4k-instruct model](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf). This tests a basic structured data extraction and a more complex one locally, then verifies that data extraction works in server mode. There are instructions in the script on how to set up the models. If you are testing this on a Mac, run `brew install llama.cpp` to get llama cpp working locally. If you don't have Apple silicon chips, you will need to alter the script or the configs to run this on GPU.
```
"""
Comprehensive test script for LlamaCppAPIAdapter - Tests LOCAL and SERVER modes

SETUP INSTRUCTIONS:
===================
1. Download a small model (pick ONE):
   # Phi-3-mini (2.3GB, recommended - best balance)
   wget https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf
   # OR TinyLlama (1.1GB, smallest but lower quality)
   wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
2. For SERVER mode tests, start a server:
   python -m llama_cpp.server --model ./Phi-3-mini-4k-instruct-q4.gguf --port 8080 --n_gpu_layers -1
"""

import asyncio
import os
from typing import Optional

from pydantic import BaseModel
from cognee.infrastructure.llm.structured_output_framework.litellm_instructor.llm.llama_cpp.adapter import (
    LlamaCppAPIAdapter,
)


class Person(BaseModel):
    """Simple test model for person extraction"""

    name: str
    age: int


class EntityExtraction(BaseModel):
    """Test model for entity extraction"""

    entities: list[str]
    summary: str


# Configuration - UPDATE THESE PATHS
MODEL_PATHS = [
    "./Phi-3-mini-4k-instruct-q4.gguf",
    "./tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
]


def find_model() -> Optional[str]:
    """Find the first available model file, or None if no model is present"""
    for path in MODEL_PATHS:
        if os.path.exists(path):
            return path
    return None


async def test_local_mode():
    """Test LOCAL mode (in-process, no server needed)"""
    print("=" * 70)
    print("Test 1: LOCAL MODE (In-Process)")
    print("=" * 70)
    model_path = find_model()
    if not model_path:
        print("❌ No model found! Download a model first:")
        print()
        return False
    print(f"Using model: {model_path}")
    try:
        adapter = LlamaCppAPIAdapter(
            name="LlamaCpp-Local",
            model_path=model_path,  # Local mode parameter
            max_completion_tokens=4096,
            n_ctx=2048,
            n_gpu_layers=-1,  # 0 for CPU, -1 for all GPU layers
        )
        print(f"✓ Adapter initialized in {adapter.mode_type.upper()} mode")
        print(" Sending request...")
        result = await adapter.acreate_structured_output(
            text_input="John Smith is 30 years old",
            system_prompt="Extract the person's name and age.",
            response_model=Person,
        )
        print("✅ Success!")
        print(f" Name: {result.name}")
        print(f" Age: {result.age}")
        print()
        return True
    except ImportError as e:
        print(f"❌ ImportError: {e}")
        print(" Install llama-cpp-python: pip install llama-cpp-python")
        print()
        return False
    except Exception as e:
        print(f"❌ Failed: {e}")
        print()
        return False


async def test_server_mode():
    """Test SERVER mode (localhost HTTP endpoint)"""
    print("=" * 70)
    print("Test 3: SERVER MODE (Localhost HTTP)")
    print("=" * 70)
    try:
        adapter = LlamaCppAPIAdapter(
            name="LlamaCpp-Server",
            endpoint="http://localhost:8080/v1",  # Server mode parameter
            api_key="dummy",
            model="Phi-3-mini-4k-instruct-q4.gguf",
            max_completion_tokens=1024,
            chat_format="phi-3",
        )
        print(f"✓ Adapter initialized in {adapter.mode_type.upper()} mode")
        print(f" Endpoint: {adapter.endpoint}")
        print(" Sending request...")
        result = await adapter.acreate_structured_output(
            text_input="Sarah Johnson is 25 years old",
            system_prompt="Extract the person's name and age.",
            response_model=Person,
        )
        print("✅ Success!")
        print(f" Name: {result.name}")
        print(f" Age: {result.age}")
        print()
        return True
    except Exception as e:
        print(f"❌ Failed: {e}")
        print(" Make sure llama-cpp-python server is running on port 8080:")
        print(" python -m llama_cpp.server --model your-model.gguf --port 8080")
        print()
        return False


async def test_entity_extraction_local():
    """Test more complex extraction with local mode"""
    print("=" * 70)
    print("Test 2: Complex Entity Extraction (Local Mode)")
    print("=" * 70)
    model_path = find_model()
    if not model_path:
        print("❌ No model found!")
        print()
        return False
    try:
        adapter = LlamaCppAPIAdapter(
            name="LlamaCpp-Local",
            model_path=model_path,
            max_completion_tokens=1024,
            n_ctx=2048,
            n_gpu_layers=-1,
        )
        print("✓ Adapter initialized")
        print(" Sending complex extraction request...")
        result = await adapter.acreate_structured_output(
            text_input="Natural language processing (NLP) is a subfield of artificial intelligence (AI) and computer science.",
            system_prompt="Extract all technical entities mentioned and provide a brief summary.",
            response_model=EntityExtraction,
        )
        print("✅ Success!")
        print(f" Entities: {', '.join(result.entities)}")
        print(f" Summary: {result.summary}")
        print()
        return True
    except Exception as e:
        print(f"❌ Failed: {e}")
        print()
        return False


async def main():
    """Run all tests"""
    print("\n" + "🦙" * 35)
    print("Llama CPP Adapter - Comprehensive Test Suite")
    print("Testing LOCAL and SERVER modes")
    print("🦙" * 35 + "\n")
    results = {}

    # Test 1: Local mode (no server needed)
    print("=" * 70)
    print("PHASE 1: Testing LOCAL mode (in-process)")
    print("=" * 70)
    print()
    results["local_basic"] = await test_local_mode()
    results["local_complex"] = await test_entity_extraction_local()

    # Test 2: Server mode (requires server on 8080)
    print("\n" + "=" * 70)
    print("PHASE 2: Testing SERVER mode (requires server running)")
    print("=" * 70)
    print()
    results["server"] = await test_server_mode()

    # Summary
    print("\n" + "=" * 70)
    print("TEST SUMMARY")
    print("=" * 70)
    for test_name, passed in results.items():
        status = "✅ PASSED" if passed else "❌ FAILED"
        print(f" {test_name:20s}: {status}")
    passed_count = sum(results.values())
    total_count = len(results)
    print()
    print(f"Total: {passed_count}/{total_count} tests passed")
    if passed_count == total_count:
        print("\n🎉 All tests passed! The adapter is working correctly.")
    elif results.get("local_basic"):
        print("\n✓ Local mode works! Server/cloud tests need llama-cpp-python server running.")
    else:
        print("\n⚠️ Please check setup instructions at the top of this file.")


if __name__ == "__main__":
    asyncio.run(main())
```
The following screenshots show the tests passing
<img width="622" height="149" alt="image" src="https://github.com/user-attachments/assets/9df02f66-39a9-488a-96a6-dc79b47e3001" />
Test 1
<img width="939" height="750" alt="image" src="https://github.com/user-attachments/assets/87759189-8fd2-450f-af7f-0364101a5690" />
Test 2
<img width="938" height="746" alt="image" src="https://github.com/user-attachments/assets/61e423c0-3d41-4fde-acaf-ae77c3463d66" />
Test 3
<img width="944" height="232" alt="image" src="https://github.com/user-attachments/assets/f7302777-2004-447c-a2fe-b12762241ba9" />
Note: I also tried to test it with the `TinyLlama-1.1B-Chat` model, but such a small model is bad at producing structured JSON consistently.
## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):
## Screenshots/Videos (if applicable)
See above.
## Pre-submission Checklist
- [x] I have tested my changes thoroughly before submitting this PR
- [x] This PR contains minimal changes necessary to address the issue/feature
- [x] My code follows the project's coding standards and style guidelines
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
## Summary by CodeRabbit
* New Features
  * Llama CPP integration supporting local (in-process) and server (OpenAI‑compatible) modes.
  * Selectable provider with configurable model path, context size, GPU layers, and chat format.
  * Asynchronous structured-output generation with rate limiting, retries/backoff, and debug logging.
* Chores
  * Added llama-cpp-python dependency and bumped project version.
* Documentation
  * CONTRIBUTING updated with a “Running Simple Example” walkthrough for local/server usage.	2025-12-31 12:53:55 +01:00
dgarnitz	dd639fa967	update lock file	2025-12-30 16:59:59 -08:00
dgarnitz	d578971b60	add support for structured outputs with llama cpp via instructor and litellm	2025-12-30 16:37:31 -08:00
vasilije	27f2aa03b3	added fixes to litellm	2025-12-28 21:48:01 +01:00
Vasilije	310e9e97ae	feat: list vector distance in cogneegraph (#1926)
## Description
- `map_vector_distances_to_graph_nodes` and `map_vector_distances_to_graph_edges` accept both single-query (flat list) and multi-query (nested list) inputs.
- `query_list_length` controls the mode: omit it for single-query behavior, or provide it to enable multi-query mode with strict length validation and per-query results.
- `vector_distance` on `Node` and `Edge` is now a list (one distance per query). Constructors set it to `None`, and `reset_distances` initializes it at the start of each search.
- `Node.update_distance_for_query` and `Edge.update_distance_for_query` are the only methods that write to `vector_distance`. They ensure the list has enough elements and keep unmatched queries at the penalty value.
- `triplet_distance_penalty` is the default distance value used everywhere. Unmatched nodes/edges and missing scores all use this same penalty for consistency.
- `edges_by_distance_key` is an index mapping edge labels to matching edges. This lets us update all edges with the same label at once, instead of scanning the full edge list repeatedly.
- `calculate_top_triplet_importances` returns `List[Edge]` for single-query mode and `List[List[Edge]]` for multi-query mode.
## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [x] Performance improvement
- [ ] Other (please specify):
## Pre-submission Checklist
- [x] I have tested my changes thoroughly before submitting this PR
- [x] This PR contains minimal changes necessary to address the issue/feature
- [x] My code follows the project's coding standards and style guidelines
- [x] I have added tests that prove my fix is effective or that my feature works
- [ ] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
## Summary by CodeRabbit
* New Features
  * Multi-query support for mapping/scoring node and edge distances and a configurable triplet distance penalty.
  * Distance-keyed edge indexing for more accurate distance-to-edge matching.
* Refactor
  * Vector distance metadata changed from scalars to per-query lists; added reset/normalization and per-query update flows.
  * Node/edge distance initialization now supports deferred/listed distances.
* Tests
  * Updated and expanded tests for multi-query flows, list-based distances, edge-key handling, and related error cases.	2025-12-23 14:47:27 +01:00
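The per-query distance list described in the commit above can be illustrated with a rough sketch. This is not the cognee implementation: the `Node` class below is a simplified stand-in, the method signatures are guesses, and the penalty value `3.5` is an assumed placeholder, since the commit message does not state the real default.

```python
from typing import List, Optional

# Assumed placeholder; the actual default of triplet_distance_penalty is not given above.
TRIPLET_DISTANCE_PENALTY = 3.5


class Node:
    """Hypothetical, simplified node that keeps one vector distance per query."""

    def __init__(self, node_id: str) -> None:
        self.id = node_id
        # The constructor leaves the distances unset, as the commit describes.
        self.vector_distance: Optional[List[float]] = None

    def reset_distances(
        self, query_count: int, penalty: float = TRIPLET_DISTANCE_PENALTY
    ) -> None:
        """Start a search with every query at the penalty value."""
        self.vector_distance = [penalty] * query_count

    def update_distance_for_query(
        self,
        query_index: int,
        score: float,
        query_count: int,
        penalty: float = TRIPLET_DISTANCE_PENALTY,
    ) -> None:
        """Write one score, padding the list so unmatched queries keep the penalty."""
        if self.vector_distance is None:
            self.vector_distance = []
        missing = query_count - len(self.vector_distance)
        if missing > 0:
            self.vector_distance.extend([penalty] * missing)
        self.vector_distance[query_index] = score


node = Node("a")
node.update_distance_for_query(query_index=1, score=0.2, query_count=3)
print(node.vector_distance)  # [3.5, 0.2, 3.5]
```

Routing every write through a single method keeps the padding rule in one place, which appears to be the motivation for making `update_distance_for_query` the only writer.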
Hande	5f8a3e24bd	refactor: restructure examples and starter kit into new-examples (#1862)
## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
## Summary by CodeRabbit
* Documentation
  * Deprecated legacy examples and added a migration guide mapping old paths to new locations
  * Added a comprehensive new-examples README detailing configurations, pipelines, demos, and migration notes
* New Features
  * Added many runnable examples and demos: database configs, embedding/LLM setups, permissions and access-control, custom pipelines (organizational, product recommendation, code analysis, procurement), multimedia, visualization, temporal/ontology demos, and a local UI starter
* Chores
  * Updated CI/test entrypoints to use the new-examples layout
---------
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>	2025-12-20 02:07:28 +01:00
lxobr	f6c76ce19e	chore: remove duplicate import	2025-12-19 16:24:49 +01:00
lxobr	c3cec818d7	fix: update tests	2025-12-19 16:22:47 +01:00
lxobr	9808077b4c	nit: update variable names	2025-12-19 15:35:34 +01:00
Vasilije	9b2b1a9c13	chore: covering higher level search logic with tests (#1910)
## Description
This PR covers the higher level search.py logic with unit tests. As a part of the implementation we fully cover the following core logic:
- search.py
- get_search_type_tools (with all the core search types)
- search
- prepare_search_results contract (testing behavior from search.py interface)
## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):
## Pre-submission Checklist
- [x] I have tested my changes thoroughly before submitting this PR
- [x] This PR contains minimal changes necessary to address the issue/feature
- [x] My code follows the project's coding standards and style guidelines
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
## Summary by CodeRabbit
* Tests
  * Added comprehensive unit test coverage for search functionality, including search type tool selection, search operations, and result preparation workflows across multiple scenarios and edge cases.	2025-12-19 14:22:54 +01:00
Vasilije	16cf955497	feat: adds multitenant tests via pytest (#1923)
## Description
This PR changes the permission test in the e2e tests to use pytest. It introduces:
- fixtures for the environment setup
- one event loop for all pytest tests
- mocking of acreate_structured_output answer generation (for search)
- asserts in the permission test (previously we only ran the example)
## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):
## Pre-submission Checklist
- [x] I have tested my changes thoroughly before submitting this PR
- [x] This PR contains minimal changes necessary to address the issue/feature
- [x] My code follows the project's coding standards and style guidelines
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
## Summary by CodeRabbit
* New Features
  * Entity model now includes description and metadata fields for richer entity information and indexing.
* Tests
  * Expanded and restructured permission tests covering multi-tenant and role-based access flows; improved test scaffolding and stability.
  * E2E test workflow now runs pytest with verbose output and INFO logs.
* Bug Fixes
  * Access-tracking updates now commit transactions so access timestamps persist.
* Chores
  * General formatting, cleanup, and refactoring across modules and maintenance scripts.	2025-12-19 14:16:01 +01:00
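The mocking idea mentioned in the PR above (replacing `acreate_structured_output` so that search tests do not call a real LLM) might look roughly like the sketch below. It uses only the standard library's `unittest.mock.AsyncMock`; `answer_question` and `run_mocked_search` are hypothetical names for illustration and are not taken from the cognee test suite.

```python
import asyncio
from unittest.mock import AsyncMock


async def answer_question(llm_client, question: str) -> str:
    """Hypothetical search step that delegates answer generation to an LLM client."""
    return await llm_client.acreate_structured_output(
        text_input=question,
        system_prompt="Answer the question.",
        response_model=str,
    )


def run_mocked_search() -> str:
    # AsyncMock makes the attribute awaitable, so no real model is ever called.
    llm_client = AsyncMock()
    llm_client.acreate_structured_output.return_value = "mocked answer"

    answer = asyncio.run(answer_question(llm_client, "Who can read dataset A?"))

    # Verify the code under test actually awaited the mocked method once.
    llm_client.acreate_structured_output.assert_awaited_once()
    return answer


print(run_mocked_search())
```

In a pytest suite the mock would more likely be installed through a fixture or `monkeypatch`, but the awaitable stub itself works the same way.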
Igor Ilic	2c4f9b07ac	fix: Resolve migration issue	2025-12-19 13:35:14 +01:00
lxobr	a85df53c74	chore: tweak mapping and scoring	2025-12-19 13:14:50 +01:00
Igor Ilic	3bc3f63362	fix: Resolve issues with migration	2025-12-19 11:55:13 +01:00
hajdul88	72dae0f79a	fix linting	2025-12-19 10:38:44 +01:00
hajdul88	7cf93ea79d	updates old no asserts test + yml	2025-12-19 10:32:45 +01:00
hajdul88	8214bdce5b	Revert "changes pytest call in yaml" This reverts commit `8a490b1c16`.	2025-12-19 10:28:48 +01:00
hajdul88	4b71995a70	ruff	2025-12-19 10:25:24 +01:00
hajdul88	8a490b1c16	changes pytest call in yaml	2025-12-19 10:20:46 +01:00
hajdul88	9819b38058	Merge branch 'dev' into feature/cog-3536-multitenant-search-testing-automation	2025-12-19 10:06:02 +01:00
Vasilije	eb444ca18f	feat: Add a task that deletes the old data that has not been accessed in a while (#1751)
## Description
This PR implements a data deletion system for unused DataPoint models based on last access tracking. The system tracks when data is accessed during search operations and provides cleanup functionality to remove data that hasn't been accessed within a configurable time threshold.
Key Changes:
1. Added `last_accessed` timestamp field to the SQL `Data` model
2. Added `last_accessed_at` timestamp field to the graph `DataPoint` model
3. Implemented `update_node_access_timestamps()` function that updates both graph nodes and SQL records during search operations
4. Created `cleanup_unused_data()` function with SQL-based deletion mode for whole document cleanup
5. Added Alembic migration to add `last_accessed` column to the `data` table
6. Integrated timestamp tracking into retrievers
7. Added comprehensive end-to-end test for the cleanup functionality
## Related Issues
Fixes #[issue_number]
## Type of Change
- [x] New feature (non-breaking change that adds functionality)
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
## Database Changes
- [x] This PR includes database schema changes
- [x] Alembic migration included: `add_last_accessed_to_data`
- [x] Migration adds `last_accessed` column to `data` table
- [x] Migration includes backward compatibility (nullable column)
- [x] Migration tested locally
## Implementation Details
### Files Modified:
1. cognee/modules/data/models/Data.py - Added `last_accessed` column
2. cognee/infrastructure/engine/models/DataPoint.py - Added `last_accessed_at` field
3. cognee/modules/retrieval/chunks_retriever.py - Integrated timestamp tracking in `get_context()`
4. cognee/modules/retrieval/utils/update_node_access_timestamps.py (new file) - Core tracking logic
5. cognee/tasks/cleanup/cleanup_unused_data.py (new file) - Cleanup implementation
6. alembic/versions/[revision]_add_last_accessed_to_data.py (new file) - Database migration
7. cognee/tests/test_cleanup_unused_data.py (new file) - End-to-end test
### Key Functions:
- `update_node_access_timestamps(items)` - Updates timestamps in both graph and SQL
- `cleanup_unused_data(minutes_threshold, dry_run, text_doc)` - Main cleanup function
- SQL-based cleanup mode uses `cognee.delete()` for proper document deletion
## Testing
- [x] Added end-to-end test: `test_textdocument_cleanup_with_sql()`
- [x] Test covers: add → cognify → search → timestamp verification → aging → cleanup → deletion verification
- [x] Test verifies cleanup across all storage systems (SQL, graph, vector)
- [x] All existing tests pass
- [x] Manual testing completed
## Screenshots/Videos
N/A - Backend functionality
## Pre-submission Checklist
- [x] I have tested my changes thoroughly before submitting this PR
- [x] This PR contains minimal changes necessary to address the issue/feature
- [x] My code follows the project's coding standards and style guidelines
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages
## Breaking Changes
None - This is a new feature that doesn't affect existing functionality.
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
Resolves #1335
## Summary by CodeRabbit
* New Features
  * Added access timestamp tracking to monitor when data is last retrieved.
  * Introduced automatic cleanup of unused data based on configurable time thresholds and access history.
  * Retrieval operations now update access timestamps to ensure accurate tracking of data usage.
* Tests
  * Added integration test validating end-to-end cleanup workflow across storage layers.	2025-12-19 09:47:31 +01:00
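The threshold-based cleanup described in the PR above can be sketched with plain in-memory records. This is only an illustration of the idea, not the code in `cleanup_unused_data.py`: the `Record` class and the `cleanup_unused` function are hypothetical, and skipping records that have never been accessed (a `None` timestamp) is an assumption made here, since the PR text does not say how the nullable column is treated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class Record:
    """Hypothetical stand-in for a row with a nullable last_accessed column."""

    id: str
    last_accessed: Optional[datetime] = None


def cleanup_unused(
    records: Dict[str, Record],
    minutes_threshold: int,
    dry_run: bool = True,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return ids not accessed within the threshold; delete them unless dry_run."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=minutes_threshold)
    # Assumption: rows with no recorded access are left alone rather than deleted.
    stale = [
        record.id
        for record in records.values()
        if record.last_accessed is not None and record.last_accessed < cutoff
    ]
    if not dry_run:
        for record_id in stale:
            del records[record_id]
    return stale


now = datetime(2025, 12, 19, tzinfo=timezone.utc)
records = {
    "old": Record("old", now - timedelta(hours=2)),
    "fresh": Record("fresh", now - timedelta(minutes=1)),
    "never": Record("never"),
}
print(cleanup_unused(records, minutes_threshold=60, dry_run=True, now=now))  # ['old']
```

Running with `dry_run=True` first reports what would be removed without touching the data, which mirrors the `dry_run` parameter listed among the key functions.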
Vasilije	3055ed89c8	test: set up triggers for docs and community tests on new main release (#1780)
## Description
Added tests that just run scripts we have in the docs, in our guides section (Essentials and Customizing Cognee). This is a start of a test suite regarding docs, to make sure new releases don't break the scripts we have written in our docs. The new workflow only runs on releases, and is part of the Release Test Workflow we have.
## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
## Summary by CodeRabbit
* Chores
  * Enhanced release workflow automation with improved coordination between repositories.	2025-12-19 09:40:10 +01:00
hajdul88	ee967ae3fa	feat: adds grant permission checks + tenant + role scenarios	2025-12-19 09:31:56 +01:00
hajdul88	976ac78e5e	ruff	2025-12-19 07:53:36 +01:00
hajdul88	ef7ebc0748	feat: adds user1 and user 2 dataset read tests	2025-12-19 07:53:08 +01:00
Boris Arzentar	3311db55bf	fix: typos in text and error handling	2025-12-18 22:52:09 +01:00
Boris Arzentar	672a776df5	Merge remote-tracking branch 'origin/dev' into feature/cog-3550-simplify-tutorial-notebook	2025-12-18 17:33:25 +01:00
Boris Arzentar	edb541505c	fix: lint errors and ignore tutorial python files when linting	2025-12-18 17:33:21 +01:00
hajdul88	3e47de5ea0	ruff ruff	2025-12-18 17:33:15 +01:00
hajdul88	9c04f46572	feat: adds new permission test fixtures and setup til cognify	2025-12-18 17:31:32 +01:00
hajdul88	ef51dcfb7a	Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-4	2025-12-18 16:10:16 +01:00
hajdul88	4f07adee66	chore: fixes get_raw_data endpoint and adds s3 support (#1916) ## Description This PR fixes get_raw_data endpoint in get_dataset_router - Fixes local path access - Adds s3 access - Covers new fixed functionality with unit tests ## Type of Change - [x] Bug fix (non-breaking change that fixes an issue) ## Summary by CodeRabbit * New Features * Streaming support for remote S3 data locations so large dataset files can be retrieved efficiently. * Improved handling of local and remote file paths for downloads. * Improvements * Standardized error responses for missing datasets or data files. * Tests * Added unit tests covering local file downloads and S3 streaming, including content and attachment header verification.	2025-12-18 16:10:05 +01:00
Pavel Zorin	a70ce2785b	Release v0.5.1.dev0	2025-12-18 16:07:19 +01:00
Boris Arzentar	d127381262	Merge remote-tracking branch 'origin/dev' into feature/cog-3550-simplify-tutorial-notebook	2025-12-18 15:28:56 +01:00
Boris Arzentar	f93d414e94	feat: simplify the current tutorial and add cognee basics tutorial	2025-12-18 15:28:45 +01:00
lxobr	c1ea7a8cc2	fix: improve graph distance mapping	2025-12-18 14:52:35 +01:00
hajdul88	8602ba1e93	Merge branch 'dev' into feature/cog-3532-empower-test_search-db-retrievers-tests-reorg-4	2025-12-18 13:25:19 +01:00
Vasilije	4d03fcfa9e	fix: Fix connection encoding (#1917) ## Description Resolve issue with special characters like '#' and '@' in passwords for Postgres ## Summary by CodeRabbit * Refactor * Improved internal database connection handling for relational and vector databases to enhance system stability and code maintainability.	2025-12-17 22:04:09 +01:00
Vasilije	2ef8094666	feat: Add custom label by contributor: apenade (#1913) ## Description Add ability to define custom labels for Data in Cognee. Initial PR by contributor: apenade ## Summary by CodeRabbit * New Features * Added support for labeling individual data items during ingestion workflows * Expanded the add API to accept data items with optional custom labels for better organization * Labels are persisted and retrievable when accessing dataset information * Enhanced data retrieval to include label information in API responses * Tests * Added comprehensive end-to-end tests validating custom data labeling functionality	2025-12-17 21:21:40 +01:00
Igor Ilic	d352ff0c28	Merge branch 'dev' into fix-connection-encoding	2025-12-17 21:08:45 +01:00
Igor Ilic	6e5e79f434	fix: Resolve connection issue with postgres when special characters are present	2025-12-17 21:07:23 +01:00
lxobr	46ff01021a	feat: add multi-query support to score calculation	2025-12-17 19:09:02 +01:00
lxobr	69ab8e7ede	feat: add multi-query support to graph distance mapping	2025-12-17 18:14:57 +01:00
lxobr	cc7ca45e73	feat: make vector_distance list based	2025-12-17 15:48:24 +01:00
Andrej Milicevic	929d88557e	Merge branch 'dev' into feature/cog-3213-docs-set-up-guide-script-tests	2025-12-17 13:52:45 +01:00
Andrej Milicevic	431a83247f	chore: remove unnecessary 'on push' setting	2025-12-17 13:50:43 +01:00
Andrej Milicevic	6958b4edd4	feat: add the triggers to release, after pypi publishing	2025-12-17 13:50:03 +01:00

4737 commits