Merge branch 'main' of github.com:getzep/graphiti into nba-langgraph
commit 8a72fe7cec

14 changed files with 346 additions and 265 deletions

.github/workflows/cla.yml (vendored, 2 changes)
@@ -26,7 +26,7 @@ jobs:
          path-to-document: "https://github.com/getzep/graphiti/blob/main/Zep-CLA.md" # e.g. a CLA or a DCO document
          # branch should not be protected
          branch: "main"
-         allowlist: paul-paliychuk,prasmussen15,danielchalef,dependabot[bot],ellipsis-dev[bot]
+         allowlist: paul-paliychuk,prasmussen15,danielchalef,dependabot[bot],ellipsis-dev

          # the followings are the optional inputs - If the optional inputs are not given, then default values will be taken
          #remote-organization-name: enter the remote organization name where the signatures should be stored (Default is storing the signatures in the same repository)
.github/workflows/release.yml (vendored, new file, 38 lines)

@@ -0,0 +1,38 @@
name: Release to PyPI

on:
  push:
    tags: ["v*.*.*"]

env:
  POETRY_VERSION: "1.8.3"

jobs:
  release:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: write
    environment:
      name: release
      url: https://pypi.org/p/zep-cloud
    steps:
      - uses: actions/checkout@v4
      - name: Install poetry
        run: pipx install poetry==$POETRY_VERSION
      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"
          cache: "poetry"
      - name: Compare pyproject version with tag
        run: |
          TAG_VERSION=${GITHUB_REF#refs/tags/}
          if [ "$TAG_VERSION" != "v$(poetry version --short)" ]; then
            echo "Tag version $TAG_VERSION does not match the project version $(poetry version --short)"
            exit 1
          fi
      - name: Build project for distribution
        run: poetry build
      - name: Publish package distributions to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
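The "Compare pyproject version with tag" step guards against tagging a release that does not match `pyproject.toml`. The same check can be mirrored locally before pushing a tag; a hedged Python sketch (the tag value is an example, and only `poetry version --short` from the workflow above is relied on):

```python
import subprocess

# Hypothetical pre-push check mirroring the workflow's tag/version guard.
tag = "v0.1.0"  # the tag you are about to push (example value)
version = subprocess.run(
    ["poetry", "version", "--short"], capture_output=True, text=True, check=True
).stdout.strip()
if tag != f"v{version}":
    raise SystemExit(f"Tag {tag} does not match project version {version}")
```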
README.md (266 changes)

@@ -1,14 +1,32 @@
# graphiti: Temporal Knowledge Graphs for Agentic Applications
<div align="center">

# graphiti

## Temporal Knowledge Graphs for Agentic Applications

<br />

[](https://discord.com/invite/W8Kw6bsgXQ)
[](https://github.com/getzep/graphiti/actions/workflows/lint.yml)
[](https://github.com/getzep/graphiti/actions/workflows/unit_tests.yml)
[](https://github.com/getzep/graphiti/actions/workflows/typecheck.yml)
[](https://codespaces.new/getzep/graphiti)

graphiti builds dynamic, temporally-aware knowledge graphs that represent complex, evolving relationships between entities over time. graphiti ingests both unstructured and structured data and the resulting graph may be queried using a fusion of time, full-text, semantic, and graph algorithm approaches.
<br />

TODO: Add video here.
</div>

graphiti builds dynamic, temporally aware knowledge graphs that represent complex, evolving relationships between entities over time. graphiti ingests both unstructured and structured data and the resulting graph may be queried using a fusion of time, full-text, semantic, and graph algorithm approaches.

<br />

<p align="center">
    <img src="/images/graphiti-intro-slides-stock-2.gif" alt="graphiti demo slides" width="700px">
</p>

<br />

With graphiti, you can build LLM applications such as:

- Assistants that learn from user interactions, fusing personal knowledge with dynamic data from business systems like CRMs and billing platforms.
@@ -27,12 +45,15 @@ graphiti is designed for dynamic data and agentic use:
- **Fast**: Search results in < 100ms, with latency primarily determined by the 3rd-party embedding API call.
- **Schema Consistency**: Maintains a coherent graph structure by reusing existing schema, preventing unnecessary proliferation of node and edge types.

## graphiti and Zep Memory

-graphiti powers the core of Zep's next-generation memory layer for LLM-powered Assistants and Agents.
+graphiti powers the core of [Zep's memory layer](https://www.getzep.com) for LLM-powered Assistants and Agents.

We're excited to open-source graphiti, believing its potential reaches far beyond memory applications.

## Installation

Requirements:
@@ -48,16 +69,22 @@ Optional:
> [!TIP]
> The simplest way to install Neo4j is via [Neo4j Desktop](https://neo4j.com/download/). It provides a user-friendly interface to manage Neo4j instances and databases.

-`pip install graphiti-core`
+```bash
+pip install graphiti-core
+```

or

-`poetry add graphiti-core`
+```bash
+poetry add graphiti-core
+```

## Quick Start

> [!IMPORTANT]
-> Zep uses OpenAI for LLM inference and embedding. Ensure that an `OPENAI_API_KEY` is set in your environment. Support for Anthropic and Groq LLM inferences is available, too.
+> graphiti uses OpenAI for LLM inference and embedding. Ensure that an `OPENAI_API_KEY` is set in your environment. Support for Anthropic and Groq LLM inferences is available, too.

```python
from graphiti_core import Graphiti
@@ -67,6 +94,9 @@ from datetime import datetime
# Initialize Graphiti
graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password")

# Initialize the graph database with graphiti's indices. This only needs to be done once.
graphiti.build_indices_and_constraints()

# Add episodes
episodes = [
    "Kamala Harris is the Attorney General of California. She was previously "
@@ -118,223 +148,27 @@ await client.search('Who was the California Attorney General?', center_node_uuid
graphiti.close()
```
## Documentation

### Adding Episodes
Visit the Zep knowledge base for graphiti [Guides and API documentation](https://help.getzep.com/graphiti/graphiti).

Episodes represent a single data ingestion event. An `episode` is itself a node, and any nodes identified while ingesting the
episode are related to the episode via `MENTIONS` edges.

Episodes enable querying for information at a point in time and understanding the provenance of nodes and their edge relationships.

Supported episode types:

- `text`: Unstructured text data
- `message`: Conversational messages of the format `speaker: message...`
- `json`: Structured data, processed distinctly from the other types

The graph below was generated using the code in the [Quick Start](#quick-start). Each "podcast" is an individual episode.


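Because an episode is itself a node connected to what it mentions by `MENTIONS` edges, point-in-time provenance queries reduce to fetching the episodes known at a timestamp and walking those edges. A minimal sketch using helper names that appear later in this diff (`retrieve_episodes`, `get_mentioned_nodes`); the exact signatures are inferred from the calls in `hybrid_search` and may differ:

```python
from datetime import datetime

# Hypothetical provenance lookup: fetch the episodes known as of `timestamp`,
# then the entity nodes those episodes mention via MENTIONS edges.
# Assumes retrieve_episodes and get_mentioned_nodes are imported from the
# search utilities shown further down in this diff.
async def nodes_known_at(driver, timestamp: datetime, num_episodes: int = 3):
    episodes = await retrieve_episodes(driver, timestamp, num_episodes)
    return await get_mentioned_nodes(driver, episodes)
```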
#### Adding a `text` or `message` Episode

Using the `EpisodeType.text` type:

```python
await graphiti.add_episode(
    name="tech_innovation_article",
    episode_body=(
        "MIT researchers have unveiled 'ClimateNet', an AI system capable of predicting "
        "climate patterns with unprecedented accuracy. Early tests show it can forecast "
        "major weather events up to three weeks in advance, potentially revolutionizing "
        "disaster preparedness and agricultural planning."
    ),
    source=EpisodeType.text,
    # A description of the source (e.g., "podcast", "news article")
    source_description="Technology magazine article",
    # The timestamp for when this episode occurred or was created
    reference_time=datetime(2023, 11, 15, 9, 30),
    # Additional metadata about the episode (optional)
    metadata={
        "author": "Zara Patel",
        "publication": "Tech Horizons Monthly",
        "word_count": 39
    }
)
```

Using the `EpisodeType.message` type supports passing in multi-turn conversations in the `episode_body`.

The text should be structured in `{role/name}: {message}` pairs.

```python
await graphiti.add_episode(
    name="Customer_Support_Interaction_1",
    episode_body=(
        "Customer: Hi, I'm having trouble with my Allbirds shoes. "
        "The sole is coming off after only 2 months of use.\n"
        "Support: I'm sorry to hear that. Can you please provide your order number?"
    ),
    source=EpisodeType.message,
    source_description="Customer support chat",
    reference_time=datetime(2024, 3, 15, 14, 45),
    metadata={
        "channel": "Live Chat",
        "agent_id": "SP001",
        "customer_id": "C12345"
    }
)
```

#### Adding an Episode using structured data in JSON format

JSON documents can be arbitrarily nested. However, it's advisable to keep documents compact, as they must fit within your LLM's context window.

> [!TIP]
> For large data imports, consider using the `add_episode_bulk` API to efficiently add multiple episodes at once.

```python
product_data = {
    "id": "PROD001",
    "name": "Men's SuperLight Wool Runners",
    "color": "Dark Grey",
    "sole_color": "Medium Grey",
    "material": "Wool",
    "technology": "SuperLight Foam",
    "price": 125.00,
    "in_stock": True,
    "last_updated": "2024-03-15T10:30:00Z"
}

# Add the episode to the graph
await graphiti.add_episode(
    name="Product Update - PROD001",
    episode_body=product_data,  # Pass the Python dictionary directly
    source=EpisodeType.json,
    source_description="Allbirds product catalog update",
    reference_time=datetime.now(),
    metadata={
        "update_type": "product_info",
        "catalog_version": "v2.3"
    }
)
```

#### Loading Episodes in Bulk

Graphiti offers `add_episode_bulk` for efficient batch ingestion of episodes, significantly outperforming `add_episode` for large datasets. This method is highly recommended for bulk loading.

> [!WARNING]
> Use `add_episode_bulk` only for populating empty graphs or when edge invalidation is not required. The bulk ingestion pipeline does not perform edge invalidation operations.

```python
product_data = [
    {
        "id": "PROD001",
        "name": "Men's SuperLight Wool Runners",
        "color": "Dark Grey",
        "sole_color": "Medium Grey",
        "material": "Wool",
        "technology": "SuperLight Foam",
        "price": 125.00,
        "in_stock": True,
        "last_updated": "2024-03-15T10:30:00Z"
    },
    ...
    {
        "id": "PROD0100",
        "name": "Kids Wool Runner-up Mizzles",
        "color": "Natural Grey",
        "sole_color": "Orange",
        "material": "Wool",
        "technology": "Water-repellent",
        "price": 80.00,
        "in_stock": True,
        "last_updated": "2024-03-17T14:45:00Z"
    }
]

# Prepare the episodes for bulk loading
bulk_episodes = [
    RawEpisode(
        name=f"Product Update - {product['id']}",
        content=json.dumps(product),
        source=EpisodeType.json,
        source_description="Allbirds product catalog update",
        reference_time=datetime.now()
    )
    for product in product_data
]

await graphiti.add_episode_bulk(bulk_episodes)
```
### Searching graphiti's graph

The examples below demonstrate two search approaches in the graphiti library:

1. **Hybrid Search:**

   ```python
   await graphiti.search(query)
   ```

   Combines semantic similarity and BM25 retrieval, reranked using Reciprocal Rank Fusion.

   Example: Does a broad retrieval of facts related to Allbirds Wool Runners and Jane's purchase.

2. **Node Distance Reranking:**

   ```python
   await client.search(query, focal_node_uuid)
   ```

   Extends Hybrid Search above by prioritizing results based on proximity to a specified node in the graph.

   Example: Focuses on Jane-specific information, highlighting her wool allergy.

Node Distance Reranking is particularly useful for entity-specific queries, providing more contextually relevant results. It weights facts by their closeness to the focal node, emphasizing information directly related to the entity of interest.

This dual approach allows for both broad exploration and targeted, entity-specific information retrieval from the knowledge graph.
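Since the README names Reciprocal Rank Fusion without showing it, a quick illustration: each candidate's fused score is the sum of 1 / (k + rank) over every result list it appears in. A minimal sketch; k = 60 is the conventional constant, and graphiti's actual implementation may differ:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    # Sum reciprocal ranks for each item across all result lists,
    # then order by fused score, best first.
    scores: defaultdict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, item in enumerate(results, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)
```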
```python
query = "Can Jane wear Allbirds Wool Runners?"
jane_node_uuid = "123e4567-e89b-12d3-a456-426614174000"

def print_facts(edges):
    print("\n".join([edge.fact for edge in edges]))

# Hybrid Search
results = await graphiti.search(query)
print_facts(results)

> The Allbirds Wool Runners are sold by Allbirds.
> Men's SuperLight Wool Runners - Dark Grey (Medium Grey Sole) has a runner silhouette.
> Jane purchased SuperLight Wool Runners.

# Hybrid Search with Node Distance Reranking
results = await client.search(query, jane_node_uuid)
print_facts(results)

> Jane purchased SuperLight Wool Runners.
> Jane is allergic to wool.
> The Allbirds Wool Runners are sold by Allbirds.
```

## Status and Roadmap

-wip, but endavour to not break API.
+graphiti is under active development. We aim to maintain API stability while working on:
-graphiti is under active development. Areas we're actively working on:
+- [X] Implementing node and edge CRUD operations
+- [ ] Improving performance and scalability
+- [ ] Achieving good performance with different LLM and embedding models
+- [ ] Creating a dedicated embedder interface
+- [ ] Supporting custom graph schemas:
+  - Allow developers to provide their own defined node and edge classes when ingesting episodes
+  - Enable more flexible knowledge representation tailored to specific use cases
+- [ ] Enhancing retrieval capabilities with more robust and configurable options
+- [ ] Expanding test coverage to ensure reliability and catch edge cases

-- performance and scalability
-- search improvements
-- node CRUD

-TODO: expand on the above. Include ROADMAP

-Latency scales sublinearly with graph size, with a cap

## Contributing
graphiti_core/edges.py

@@ -23,6 +23,7 @@ from uuid import uuid4
from neo4j import AsyncDriver
from pydantic import BaseModel, Field

from graphiti_core.helpers import parse_db_date
from graphiti_core.llm_client.config import EMBEDDING_DIM
from graphiti_core.nodes import Node

@@ -38,6 +39,9 @@ class Edge(BaseModel, ABC):
    @abstractmethod
    async def save(self, driver: AsyncDriver): ...

    @abstractmethod
    async def delete(self, driver: AsyncDriver): ...

    def __hash__(self):
        return hash(self.uuid)

@@ -46,6 +50,9 @@ class Edge(BaseModel, ABC):
            return self.uuid == other.uuid
        return False

    @classmethod
    async def get_by_uuid(cls, driver: AsyncDriver, uuid: str): ...


class EpisodicEdge(Edge):
    async def save(self, driver: AsyncDriver):

@@ -66,9 +73,48 @@ class EpisodicEdge(Edge):

        return result

    async def delete(self, driver: AsyncDriver):
        result = await driver.execute_query(
            """
        MATCH (n:Episodic)-[e:MENTIONS {uuid: $uuid}]->(m:Entity)
        DELETE e
        """,
            uuid=self.uuid,
        )

        # TODO: Neo4j doesn't support variables for edge types and labels.
        # Right now we have all edge nodes as type RELATES_TO
        logger.info(f'Deleted Edge: {self.uuid}')

        return result

    @classmethod
    async def get_by_uuid(cls, driver: AsyncDriver, uuid: str):
        records, _, _ = await driver.execute_query(
            """
        MATCH (n:Episodic)-[e:MENTIONS {uuid: $uuid}]->(m:Entity)
        RETURN
            e.uuid As uuid,
            n.uuid AS source_node_uuid,
            m.uuid AS target_node_uuid,
            e.created_at AS created_at
        """,
            uuid=uuid,
        )

        edges: list[EpisodicEdge] = []

        for record in records:
            edges.append(
                EpisodicEdge(
                    uuid=record['uuid'],
                    source_node_uuid=record['source_node_uuid'],
                    target_node_uuid=record['target_node_uuid'],
                    created_at=record['created_at'].to_native(),
                )
            )

        logger.info(f'Found Edge: {uuid}')

        return edges[0]


class EntityEdge(Edge):

@@ -97,7 +143,7 @@ class EntityEdge(Edge):
        self.fact_embedding = embedding[:EMBEDDING_DIM]

        end = time()
-        logger.info(f'embedded {text} in {end-start} ms')
+        logger.info(f'embedded {text} in {end - start} ms')

        return embedding

@@ -127,3 +173,60 @@ class EntityEdge(Edge):
        logger.info(f'Saved edge to neo4j: {self.uuid}')

        return result

    async def delete(self, driver: AsyncDriver):
        result = await driver.execute_query(
            """
        MATCH (n:Entity)-[e:RELATES_TO {uuid: $uuid}]->(m:Entity)
        DELETE e
        """,
            uuid=self.uuid,
        )

        logger.info(f'Deleted Edge: {self.uuid}')

        return result

    @classmethod
    async def get_by_uuid(cls, driver: AsyncDriver, uuid: str):
        records, _, _ = await driver.execute_query(
            """
        MATCH (n:Entity)-[e:RELATES_TO {uuid: $uuid}]->(m:Entity)
        RETURN
            e.uuid AS uuid,
            n.uuid AS source_node_uuid,
            m.uuid AS target_node_uuid,
            e.created_at AS created_at,
            e.name AS name,
            e.fact AS fact,
            e.fact_embedding AS fact_embedding,
            e.episodes AS episodes,
            e.expired_at AS expired_at,
            e.valid_at AS valid_at,
            e.invalid_at AS invalid_at
        """,
            uuid=uuid,
        )

        edges: list[EntityEdge] = []

        for record in records:
            edges.append(
                EntityEdge(
                    uuid=record['uuid'],
                    source_node_uuid=record['source_node_uuid'],
                    target_node_uuid=record['target_node_uuid'],
                    fact=record['fact'],
                    name=record['name'],
                    episodes=record['episodes'],
                    fact_embedding=record['fact_embedding'],
                    created_at=record['created_at'].to_native(),
                    expired_at=parse_db_date(record['expired_at']),
                    valid_at=parse_db_date(record['valid_at']),
                    invalid_at=parse_db_date(record['invalid_at']),
                )
            )

        logger.info(f'Found Edge: {uuid}')

        return edges[0]
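Taken together, edges now have a symmetric lifecycle: `save`, `get_by_uuid`, and `delete`. A sketch of the round trip these methods enable, with a made-up UUID; the same pattern is exercised by the integration test further down this diff:

```python
# Hypothetical round trip, inside an async context: load an edge by UUID, then remove it.
edge = await EntityEdge.get_by_uuid(driver, '123e4567-e89b-12d3-a456-426614174000')
await edge.delete(driver)
```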
graphiti_core/helpers.py (new file, 7 lines)

@@ -0,0 +1,7 @@
from datetime import datetime

from neo4j import time as neo4j_time


def parse_db_date(neo_date: neo4j_time.DateTime | None) -> datetime | None:
    return neo_date.to_native() if neo_date else None
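This centralizes the `DateTime`-to-`datetime` conversion that was previously private to the search utilities (see the hunk that deletes it below). A quick usage sketch; the `neo4j_time.DateTime` constructor arguments are illustrative:

```python
from datetime import datetime

from neo4j import time as neo4j_time

from graphiti_core.helpers import parse_db_date

# A Neo4j DateTime read from the driver converts to a native datetime...
db_value = neo4j_time.DateTime(2024, 3, 15, 10, 30, 0)
assert isinstance(parse_db_date(db_value), datetime)
# ...and a NULL column comes back as None instead of raising.
assert parse_db_date(None) is None
```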
graphiti_core/llm_client/__init__.py

@@ -1,7 +1,5 @@
-from .anthropic_client import AnthropicClient
from .client import LLMClient
from .config import LLMConfig
-from .groq_client import GroqClient
from .openai_client import OpenAIClient

-__all__ = ['LLMClient', 'OpenAIClient', 'LLMConfig', 'AnthropicClient', 'GroqClient']
+__all__ = ['LLMClient', 'OpenAIClient', 'LLMConfig']
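With those re-exports gone, anything that previously imported `AnthropicClient` or `GroqClient` from the package root would need the submodule paths instead; a sketch, assuming the module layout shown in the removed lines above:

```python
# Import directly from the submodules rather than from graphiti_core.llm_client.
from graphiti_core.llm_client.anthropic_client import AnthropicClient
from graphiti_core.llm_client.groq_client import GroqClient
```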
graphiti_core/nodes.py

@@ -75,6 +75,9 @@ class Node(BaseModel, ABC):
    @abstractmethod
    async def save(self, driver: AsyncDriver): ...

    @abstractmethod
    async def delete(self, driver: AsyncDriver): ...

    def __hash__(self):
        return hash(self.uuid)

@@ -83,6 +86,9 @@ class Node(BaseModel, ABC):
            return self.uuid == other.uuid
        return False

    @classmethod
    async def get_by_uuid(cls, driver: AsyncDriver, uuid: str): ...


class EpisodicNode(Node):
    source: EpisodeType = Field(description='source type')

@@ -111,13 +117,58 @@ class EpisodicNode(Node):
            created_at=self.created_at,
            valid_at=self.valid_at,
            source=self.source.value,
            _database='neo4j',
        )

        logger.info(f'Saved Node to neo4j: {self.uuid}')

        return result

    async def delete(self, driver: AsyncDriver):
        result = await driver.execute_query(
            """
        MATCH (n:Episodic {uuid: $uuid})
        DETACH DELETE n
        """,
            uuid=self.uuid,
        )

        logger.info(f'Deleted Node: {self.uuid}')

        return result

    @classmethod
    async def get_by_uuid(cls, driver: AsyncDriver, uuid: str):
        records, _, _ = await driver.execute_query(
            """
        MATCH (e:Episodic {uuid: $uuid})
        RETURN e.content as content,
            e.created_at as created_at,
            e.valid_at as valid_at,
            e.uuid as uuid,
            e.name as name,
            e.source_description as source_description,
            e.source as source
        """,
            uuid=uuid,
        )

        episodes = [
            EpisodicNode(
                content=record['content'],
                created_at=record['created_at'].to_native().timestamp(),
                valid_at=(record['valid_at'].to_native()),
                uuid=record['uuid'],
                source=EpisodeType.from_str(record['source']),
                name=record['name'],
                source_description=record['source_description'],
            )
            for record in records
        ]

        logger.info(f'Found Node: {uuid}')

        return episodes[0]


class EntityNode(Node):
    name_embedding: list[float] | None = Field(default=None, description='embedding of the name')

@@ -153,3 +204,47 @@ class EntityNode(Node):
        logger.info(f'Saved Node to neo4j: {self.uuid}')

        return result

    async def delete(self, driver: AsyncDriver):
        result = await driver.execute_query(
            """
        MATCH (n:Entity {uuid: $uuid})
        DETACH DELETE n
        """,
            uuid=self.uuid,
        )

        logger.info(f'Deleted Node: {self.uuid}')

        return result

    @classmethod
    async def get_by_uuid(cls, driver: AsyncDriver, uuid: str):
        records, _, _ = await driver.execute_query(
            """
        MATCH (n:Entity {uuid: $uuid})
        RETURN
            n.uuid As uuid,
            n.name AS name,
            n.created_at AS created_at,
            n.summary AS summary
        """,
            uuid=uuid,
        )

        nodes: list[EntityNode] = []

        for record in records:
            nodes.append(
                EntityNode(
                    uuid=record['uuid'],
                    name=record['name'],
                    labels=['Entity'],
                    created_at=record['created_at'].to_native(),
                    summary=record['summary'],
                )
            )

        logger.info(f'Found Node: {uuid}')

        return nodes[0]
graphiti_core/search/search.py

@@ -20,7 +20,7 @@ from enum import Enum
from time import time

from neo4j import AsyncDriver
-from pydantic import BaseModel
+from pydantic import BaseModel, Field

from graphiti_core.edges import EntityEdge
from graphiti_core.llm_client.config import EMBEDDING_DIM

@@ -49,8 +49,8 @@ class Reranker(Enum):


class SearchConfig(BaseModel):
-    num_edges: int = 10
-    num_nodes: int = 10
+    num_edges: int = Field(default=10)
+    num_nodes: int = Field(default=10)
    num_episodes: int = EPISODE_WINDOW_LEN
    search_methods: list[SearchMethod]
    reranker: Reranker | None
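For reference, a config built against the fields shown here might look as follows; the `SearchMethod` members come from the branches of `hybrid_search` below, and `reranker=None` relies on the `Reranker | None` annotation:

```python
# A sketch of constructing the search configuration.
config = SearchConfig(
    num_edges=10,
    num_nodes=10,
    search_methods=[SearchMethod.bm25, SearchMethod.cosine_similarity],
    reranker=None,  # hybrid_search checks this when more than one method is used
)
```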
@@ -63,12 +63,12 @@ class SearchResults(BaseModel):


async def hybrid_search(
-        driver: AsyncDriver,
-        embedder,
-        query: str,
-        timestamp: datetime,
-        config: SearchConfig,
-        center_node_uuid: str | None = None,
+    driver: AsyncDriver,
+    embedder,
+    query: str,
+    timestamp: datetime,
+    config: SearchConfig,
+    center_node_uuid: str | None = None,
) -> SearchResults:
    start = time()

@@ -79,11 +79,11 @@ async def hybrid_search(
    search_results = []

    if config.num_episodes > 0:
-        episodes.extend(await retrieve_episodes(driver, timestamp))
+        episodes.extend(await retrieve_episodes(driver, timestamp, config.num_episodes))
        nodes.extend(await get_mentioned_nodes(driver, episodes))

    if SearchMethod.bm25 in config.search_methods:
-        text_search = await edge_fulltext_search(query, driver)
+        text_search = await edge_fulltext_search(query, driver, 2 * config.num_edges)
        search_results.append(text_search)

    if SearchMethod.cosine_similarity in config.search_methods:

@@ -94,7 +94,9 @@ async def hybrid_search(
        .embedding[:EMBEDDING_DIM]
    )

-    similarity_search = await edge_similarity_search(search_vector, driver)
+    similarity_search = await edge_similarity_search(
+        search_vector, driver, 2 * config.num_edges
+    )
    search_results.append(similarity_search)

    if len(search_results) > 1 and config.reranker is None:
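Both retrieval branches now overfetch, asking each backend for `2 * config.num_edges` candidates so that deduplication and reranking still have enough material to fill `num_edges` slots; the updated unit tests at the bottom of this diff pin exactly that doubling. A toy illustration of the idea (the function and names are hypothetical, not part of this diff):

```python
def overfetch_then_trim(backend_results: list[list[str]], limit: int) -> list[str]:
    # Each inner list was fetched with k = 2 * limit; merge, dedupe
    # in rank order, then trim back down to the requested limit.
    seen: set[str] = set()
    merged: list[str] = []
    for results in backend_results:
        for item in results:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged[:limit]
```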
graphiti_core/search/search_utils.py

@@ -3,13 +3,12 @@ import logging
import re
import typing
from collections import defaultdict
from datetime import datetime
from time import time

from neo4j import AsyncDriver
-from neo4j import time as neo4j_time

from graphiti_core.edges import EntityEdge
+from graphiti_core.helpers import parse_db_date
from graphiti_core.nodes import EntityNode, EpisodicNode

logger = logging.getLogger(__name__)

@@ -17,10 +16,6 @@ logger = logging.getLogger(__name__)
RELEVANT_SCHEMA_LIMIT = 3


-def parse_db_date(neo_date: neo4j_time.DateTime | None) -> datetime | None:
-    return neo_date.to_native() if neo_date else None


async def get_mentioned_nodes(driver: AsyncDriver, episodes: list[EpisodicNode]):
    episode_uuids = [episode.uuid for episode in episodes]
    records, _, _ = await driver.execute_query(

@@ -106,7 +101,7 @@ async def edge_similarity_search(
    # vector similarity search over embedded facts
    records, _, _ = await driver.execute_query(
        """
-    CALL db.index.vector.queryRelationships("fact_embedding", 5, $search_vector)
+    CALL db.index.vector.queryRelationships("fact_embedding", $limit, $search_vector)
    YIELD relationship AS r, score
    MATCH (n)-[r:RELATES_TO]->(m)
    RETURN

@@ -121,7 +116,7 @@ async def edge_similarity_search(
        r.expired_at AS expired_at,
        r.valid_at AS valid_at,
        r.invalid_at AS invalid_at
-    ORDER BY score DESC LIMIT $limit
+    ORDER BY score DESC
    """,
        search_vector=search_vector,
        limit=limit,

@@ -316,8 +311,11 @@ async def hybrid_node_search(
    relevant_node_uuids = set()

    results = await asyncio.gather(
-        *[entity_fulltext_search(q, driver, limit or RELEVANT_SCHEMA_LIMIT) for q in queries],
-        *[entity_similarity_search(e, driver, limit or RELEVANT_SCHEMA_LIMIT) for e in embeddings],
+        *[entity_fulltext_search(q, driver, 2 * (limit or RELEVANT_SCHEMA_LIMIT)) for q in queries],
+        *[
+            entity_similarity_search(e, driver, 2 * (limit or RELEVANT_SCHEMA_LIMIT))
+            for e in embeddings
+        ],
    )

    for result in results:
images/graphiti-intro-slides-stock-2.gif (new binary file, 739 KiB; not shown)

poetry.lock (generated, 5 changes)

@@ -1,4 +1,4 @@
-# This file is automatically @generated by Poetry 1.8.3 and should not be changed by hand.
+# This file is automatically @generated by Poetry 1.7.1 and should not be changed by hand.

[[package]]
name = "annotated-types"

@@ -1804,7 +1804,6 @@ description = "Nvidia JIT LTO Library"
optional = false
python-versions = ">=3"
files = [
    {file = "nvidia_nvjitlink_cu12-12.6.20-py3-none-manylinux2014_aarch64.whl", hash = "sha256:84fb38465a5bc7c70cbc320cfd0963eb302ee25a5e939e9f512bbba55b6072fb"},
    {file = "nvidia_nvjitlink_cu12-12.6.20-py3-none-manylinux2014_x86_64.whl", hash = "sha256:562ab97ea2c23164823b2a89cb328d01d45cb99634b8c65fe7cd60d14562bd79"},
    {file = "nvidia_nvjitlink_cu12-12.6.20-py3-none-win_amd64.whl", hash = "sha256:ed3c43a17f37b0c922a919203d2d36cbef24d41cc3e6b625182f8b58203644f6"},
]

@@ -3758,4 +3757,4 @@ test = ["websockets"]
[metadata]
lock-version = "2.0"
python-versions = "^3.10"
-content-hash = "001663dfc8078ad473675c994b15191db1f53a844e23f40ffa4a704379a61132"
+content-hash = "425298614e73ee8b1333343bdf27e876849d56b98671d1ad7b30ca6c6584d203"
pyproject.toml

@@ -1,10 +1,11 @@
[tool.poetry]
name = "graphiti-core"
-version = "0.0.1"
+version = "0.1.0"
description = "A temporal graph building library"
authors = [
    "Paul Paliychuk <paul@getzep.com>",
    "Preston Rasmussen <preston@getzep.com>",
    "Daniel Chalef <daniel@getzep.com>",
]
readme = "README.md"
license = "Apache-2.0"

@@ -20,9 +21,7 @@ fastapi = "^0.112.0"
neo4j = "^5.23.0"
sentence-transformers = "^3.0.1"
diskcache = "^5.6.3"
arrow = "^1.3.0"
openai = "^1.38.0"
-anthropic = "^0.34.1"
tenacity = "^9.0.0"

[tool.poetry.dev-dependencies]

@@ -36,6 +35,7 @@ ruff = "^0.6.2"
pydantic = "^2.8.2"
mypy = "^1.11.1"
groq = "^0.9.0"
+anthropic = "^0.34.1"
ipykernel = "^6.29.5"
jupyterlab = "^4.2.4"
diskcache-stubs = "^5.6.3.6.20240818"
@@ -22,8 +22,6 @@ from datetime import datetime

import pytest
from dotenv import load_dotenv
-from neo4j import AsyncGraphDatabase
-from openai import OpenAI

from graphiti_core.edges import EntityEdge, EpisodicEdge
from graphiti_core.graphiti import Graphiti

@@ -74,7 +72,7 @@ def format_context(facts):
@pytest.mark.asyncio
async def test_graphiti_init():
    logger = setup_logging()
-    graphiti = Graphiti(NEO4J_URI, NEO4j_USER, NEO4j_PASSWORD, None)
+    graphiti = Graphiti(NEO4J_URI, NEO4j_USER, NEO4j_PASSWORD)

    edges = await graphiti.search('Freakenomics guest')

@@ -92,11 +90,9 @@ async def test_graphiti_init():

@pytest.mark.asyncio
async def test_graph_integration():
-    driver = AsyncGraphDatabase.driver(
-        NEO4J_URI,
-        auth=(NEO4j_USER, NEO4j_PASSWORD),
-    )
-    embedder = OpenAI().embeddings
+    client = Graphiti(NEO4J_URI, NEO4j_USER, NEO4j_PASSWORD)
+    embedder = client.llm_client.get_embedder()
+    driver = client.driver

    now = datetime.now()
    episode = EpisodicNode(

@@ -139,10 +135,21 @@ async def test_graph_integration():
        invalid_at=now,
    )

-    entity_edge.generate_embedding(embedder)
+    await entity_edge.generate_embedding(embedder)

    nodes = [episode, alice_node, bob_node]
    edges = [episodic_edge_1, episodic_edge_2, entity_edge]

    # test save
    await asyncio.gather(*[node.save(driver) for node in nodes])
    await asyncio.gather(*[edge.save(driver) for edge in edges])

    # test get
    assert await EpisodicNode.get_by_uuid(driver, episode.uuid) is not None
    assert await EntityNode.get_by_uuid(driver, alice_node.uuid) is not None
    assert await EpisodicEdge.get_by_uuid(driver, episodic_edge_1.uuid) is not None
    assert await EntityEdge.get_by_uuid(driver, entity_edge.uuid) is not None

    # test delete
    await asyncio.gather(*[node.delete(driver) for node in nodes])
    await asyncio.gather(*[edge.delete(driver) for edge in edges])
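Note that `generate_embedding` is now awaited, implying it became a coroutine; a caller embedding several edges could therefore batch the work. A sketch, inside an async context, with `entity_edges` as a hypothetical list:

```python
import asyncio

# Hypothetical: embed a batch of entity edges concurrently.
await asyncio.gather(*[edge.generate_embedding(embedder) for edge in entity_edges])
```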
@@ -113,8 +113,8 @@ async def test_hybrid_node_search_with_limit():
    assert mock_fulltext_search.call_count == 1
    assert mock_similarity_search.call_count == 1
    # Verify that the limit was passed to the search functions
-    mock_fulltext_search.assert_called_with('Test', mock_driver, 1)
-    mock_similarity_search.assert_called_with([0.1, 0.2, 0.3], mock_driver, 1)
+    mock_fulltext_search.assert_called_with('Test', mock_driver, 2)
+    mock_similarity_search.assert_called_with([0.1, 0.2, 0.3], mock_driver, 2)


@pytest.mark.asyncio

@@ -148,5 +148,5 @@ async def test_hybrid_node_search_with_limit_and_duplicates():
    assert set(node.name for node in results) == {'Alice', 'Bob', 'Charlie'}
    assert mock_fulltext_search.call_count == 1
    assert mock_similarity_search.call_count == 1
-    mock_fulltext_search.assert_called_with('Test', mock_driver, 2)
-    mock_similarity_search.assert_called_with([0.1, 0.2, 0.3], mock_driver, 2)
+    mock_fulltext_search.assert_called_with('Test', mock_driver, 4)
+    mock_similarity_search.assert_called_with([0.1, 0.2, 0.3], mock_driver, 4)