<!-- .github/pull_request_template.md -->
## Description
This PR introduces triplet embeddings via a new
create_triplet_embeddings memify pipeline.
The pipeline reads the graph in batches, extracts properties from graph
elements based on their datapoint types, and generates combined triplet
embeddings. These embeddings are stored in the vector database as a new
collection.
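
As a rough illustration of the flow described above (batched graph reads, type-based property extraction, combined triplet text), here is a minimal, self-contained sketch. It does not use cognee's API: `NODES`, `EDGES`, `EMBEDDABLE_FIELDS`, `batched`, `node_text`, and `triplet_texts` are all hypothetical names, and the in-memory dictionaries stand in for the real graph engine.

```python
from itertools import islice

# Hypothetical in-memory graph; these names are NOT taken from the cognee codebase.
NODES = {
    "node1": {"type": "Person", "name": "Alice"},
    "node2": {"type": "Person", "name": "Bob"},
    "node3": {"type": "Company", "name": "Tech Corp"},
}
EDGES = [("node1", "knows", "node2"), ("node2", "works_at", "node3")]

# Assumed mapping from datapoint type to the properties worth embedding.
EMBEDDABLE_FIELDS = {"Person": ["name"], "Company": ["name"]}


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def node_text(node):
    """Join the embeddable properties of a node, chosen by its type."""
    fields = EMBEDDABLE_FIELDS.get(node["type"], [])
    return " ".join(str(node[field]) for field in fields if field in node)


def triplet_texts(edges, nodes, batch_size=100):
    """Walk the edges in batches and build one combined text per triplet."""
    for batch in batched(edges, batch_size):
        for source_id, relation, target_id in batch:
            source = node_text(nodes[source_id])
            target = node_text(nodes[target_id])
            yield {
                "from_node_id": source_id,
                "to_node_id": target_id,
                "text": f"{source} {relation} {target}",
            }


triplets = list(triplet_texts(EDGES, NODES, batch_size=1))
```

In the actual pipeline each combined text would then be embedded and written to the vector database; that step is omitted here because it depends on the configured embedding engine.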
Changes in This PR:
- Added a new create_triplet_embeddings memify pipeline.
- Added a new get_triplet_datapoints memify task.
- Introduced a new triplet_completion search type.
- Added full test coverage:
  - Unit tests: memify task, pipeline, and retriever
  - Integration tests: memify task, pipeline, and retriever
  - End-to-end tests: updated session history tests and multi-DB search tests; added tests for triplet_completion and memify pipeline execution
Acceptance Criteria and Testing
Scenario 1:
- Run the default add and cognify pipelines.
- Run the create triplet embeddings memify pipeline.
- Verify that the vector DB contains a non-empty Triplet_text collection.
- Use the new triplet_completion search type and confirm it works correctly.

Scenario 2:
- Run the default add and cognify pipelines.
- Do not run the triplet embeddings memify pipeline.
- Attempt to use the triplet_completion search type.
- Expect an error indicating that the triplet embeddings memify pipeline must be executed first.
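
The two scenarios amount to a guard on the Triplet_text collection. The sketch below shows that behaviour with a toy in-memory store. It is an assumption-laden illustration, not cognee's retriever: `InMemoryVectorStore`, `TripletCollectionMissingError`, and `triplet_context` are hypothetical, and a substring match stands in for vector similarity search.

```python
class TripletCollectionMissingError(Exception):
    """Hypothetical error raised when triplet embeddings were never created."""


class InMemoryVectorStore:
    """Toy stand-in for a vector database: collection name -> list of texts."""

    def __init__(self):
        self.collections = {}

    def add(self, name, texts):
        self.collections.setdefault(name, []).extend(texts)

    def get(self, name):
        return self.collections.get(name, [])


def triplet_context(store, query, collection="Triplet_text"):
    """Return triplet texts matching the query, or fail if none were indexed."""
    texts = store.get(collection)
    if not texts:
        # Scenario 2: the memify pipeline has not been run yet.
        raise TripletCollectionMissingError(
            "Run the triplet embeddings memify pipeline before searching."
        )
    # Scenario 1: substring match used here in place of similarity search.
    return [text for text in texts if query.lower() in text.lower()]


store = InMemoryVectorStore()
store.add("Triplet_text", ["Alice knows Bob", "Bob works at Tech Corp"])
alice_context = triplet_context(store, "Alice")
```

Calling `triplet_context` on a fresh `InMemoryVectorStore()` raises the error instead of returning an empty result, which mirrors the expectation in Scenario 2.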
## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):
## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->
## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
  * Triplet-based search with LLM-powered completions (TRIPLET_COMPLETION)
  * Batch triplet retrieval and a triplet embeddings pipeline for extraction, indexing, and optional background processing
  * Context retrieval from triplet embeddings with optional caching and conversation-history support
  * New Triplet data type exposed for indexing and search
* **Examples**
  * End-to-end example demonstrating triplet embeddings extraction and TRIPLET_COMPLETION search
* **Tests**
  * Unit and integration tests covering triplet extraction, retrieval, embedding pipeline, and completion flows

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Pavel Zorin <pazonec@yandex.ru>
84 lines
2.5 KiB
Python
import os
import pytest
import pathlib
import pytest_asyncio
import cognee

from cognee.low_level import setup
from cognee.tasks.storage import add_data_points
from cognee.modules.retrieval.exceptions.exceptions import NoDataError
from cognee.modules.retrieval.triplet_retriever import TripletRetriever
from cognee.modules.engine.models import Triplet


@pytest_asyncio.fixture
async def setup_test_environment_with_triplets():
    """Set up a clean test environment with triplets."""
    base_dir = pathlib.Path(__file__).parent.parent.parent.parent
    system_directory_path = str(base_dir / ".cognee_system/test_triplet_retriever_context_simple")
    data_directory_path = str(base_dir / ".data_storage/test_triplet_retriever_context_simple")

    cognee.config.system_root_directory(system_directory_path)
    cognee.config.data_root_directory(data_directory_path)

    await cognee.prune.prune_data()
    await cognee.prune.prune_system(metadata=True)
    await setup()

    triplet1 = Triplet(
        from_node_id="node1",
        to_node_id="node2",
        text="Alice knows Bob",
    )
    triplet2 = Triplet(
        from_node_id="node2",
        to_node_id="node3",
        text="Bob works at Tech Corp",
    )

    triplets = [triplet1, triplet2]
    await add_data_points(triplets)

    yield

    try:
        await cognee.prune.prune_data()
        await cognee.prune.prune_system(metadata=True)
    except Exception:
        pass


@pytest_asyncio.fixture
async def setup_test_environment_empty():
    """Set up a clean test environment without triplets."""
    base_dir = pathlib.Path(__file__).parent.parent.parent.parent
    system_directory_path = str(
        base_dir / ".cognee_system/test_triplet_retriever_context_empty_collection"
    )
    data_directory_path = str(
        base_dir / ".data_storage/test_triplet_retriever_context_empty_collection"
    )

    cognee.config.system_root_directory(system_directory_path)
    cognee.config.data_root_directory(data_directory_path)

    await cognee.prune.prune_data()
    await cognee.prune.prune_system(metadata=True)

    yield

    try:
        await cognee.prune.prune_data()
        await cognee.prune.prune_system(metadata=True)
    except Exception:
        pass


@pytest.mark.asyncio
async def test_triplet_retriever_context_simple(setup_test_environment_with_triplets):
    """Integration test: verify TripletRetriever can retrieve triplet context."""
    retriever = TripletRetriever(top_k=5)

    context = await retriever.get_context("Alice")

    assert "Alice knows Bob" in context, "Failed to get Alice triplet"