From 383a2c22aff3838cd4c46ccb9d98ed1a6bd5b3de Mon Sep 17 00:00:00 2001
From: Daniel Chalef <131175+danielchalef@users.noreply.github.com>
Date: Tue, 27 Aug 2024 12:56:17 -0700
Subject: [PATCH] Update README.md - docs to docs site (#60)

* Update README.md - docs to docs site

* Update README.md

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
---
 README.md | 210 ++----------------------------------------------------
 1 file changed, 7 insertions(+), 203 deletions(-)

diff --git a/README.md b/README.md
index 69e3b608..3c98389f 100644
--- a/README.md
+++ b/README.md
@@ -69,11 +69,15 @@ Optional:
 > [!TIP]
 > The simplest way to install Neo4j is via [Neo4j Desktop](https://neo4j.com/download/). It provides a user-friendly interface to manage Neo4j instances and databases.
 
-`pip install graphiti-core`
+```bash
+pip install graphiti-core
+```
 
 or
 
-`poetry add graphiti-core`
+```bash
+poetry add graphiti-core
+```
 
 
 
@@ -145,207 +149,7 @@ graphiti.close()
 
 ## Documentation
 
-### Adding Episodes
-
-Episodes represent a single data ingestion event. An `episode` is itself a node, and any nodes identified while ingesting the
-episode are related to the episode via `MENTIONS` edges.
-
-Episodes enable querying for information at a point in time and understanding the provenance of nodes and their edge relationships.
-
-Supported episode types:
-
-- `text`: Unstructured text data
-- `message`: Conversational messages of the format `speaker: message...`
-- `json`: Structured data, processed distinctly from the other types
-
-The graph below was generated using the code in the [Quick Start](#quick-start). Each "podcast" is an individual episode.
-
-![Simple Graph Visualization](images/simple_graph.svg)
-
-#### Adding a `text` or `message` Episode
-
-Using the `EpisodeType.text` type:
-
-```python
-await graphiti.add_episode(
-    name="tech_innovation_article",
-    episode_body=(
-        "MIT researchers have unveiled 'ClimateNet', an AI system capable of predicting "
-        "climate patterns with unprecedented accuracy. Early tests show it can forecast "
-        "major weather events up to three weeks in advance, potentially revolutionizing "
-        "disaster preparedness and agricultural planning."
-    ),
-    source=EpisodeType.text,
-    # A description of the source (e.g., "podcast", "news article")
-    source_description="Technology magazine article",
-    # The timestamp for when this episode occurred or was created
-    reference_time=datetime(2023, 11, 15, 9, 30),
-    # Additional metadata about the episode (optional)
-    metadata={
-        "author": "Zara Patel",
-        "publication": "Tech Horizons Monthly",
-        "word_count": 39
-    }
-)
-```
-
-Using the `EpisodeType.message` type supports passing in multi-turn conversations in the `episode_body`.
-
-The text should be structured in `{role/name}: {message}` pairs.
-
-```python
-await graphiti.add_episode(
-    name="Customer_Support_Interaction_1",
-    episode_body=(
-        "Customer: Hi, I'm having trouble with my Allbirds shoes. "
-        "The sole is coming off after only 2 months of use.\n"
-        "Support: I'm sorry to hear that. Can you please provide your order number?"
-    ),
-    source=EpisodeType.message,
-    source_description="Customer support chat",
-    reference_time=datetime(2024, 3, 15, 14, 45),
-    metadata={
-        "channel": "Live Chat",
-        "agent_id": "SP001",
-        "customer_id": "C12345"
-    }
-)
-```
-
-#### Adding an Epsiode using structured data in JSON format
-
-JSON documents can be arbitrarily nested. However, it's advisable to keep documents compact, as they must fit within your LLM's context window.
-
-> [!TIP]
-> For large data imports, consider using the `add_episode_bulk` API to efficiently add multiple episodes at once.
-
-```python
-product_data = {
-    "id": "PROD001",
-    "name": "Men's SuperLight Wool Runners",
-    "color": "Dark Grey",
-    "sole_color": "Medium Grey",
-    "material": "Wool",
-    "technology": "SuperLight Foam",
-    "price": 125.00,
-    "in_stock": True,
-    "last_updated": "2024-03-15T10:30:00Z"
-}
-
-# Add the episode to the graph
-await graphiti.add_episode(
-    name="Product Update - PROD001",
-    episode_body=product_data,  # Pass the Python dictionary directly
-    source=EpisodeType.json,
-    source_description="Allbirds product catalog update",
-    reference_time=datetime.now(),
-    metadata={
-        "update_type": "product_info",
-        "catalog_version": "v2.3"
-    }
-)
-```
-
-#### Loading Episodes in Bulk
-
-Graphiti offers `add_episode_bulk` for efficient batch ingestion of episodes, significantly outperforming `add_episode` for large datasets. This method is highly recommended for bulk loading.
-
-> [!WARNING]
-> Use `add_episode_bulk` only for populating empty graphs or when edge invalidation is not required. The bulk ingestion pipeline does not perform edge invalidation operations.
-
-```python
-product_data = [
-    {
-        "id": "PROD001",
-        "name": "Men's SuperLight Wool Runners",
-        "color": "Dark Grey",
-        "sole_color": "Medium Grey",
-        "material": "Wool",
-        "technology": "SuperLight Foam",
-        "price": 125.00,
-        "in_stock": true,
-        "last_updated": "2024-03-15T10:30:00Z"
-    },
-    ...
-    {
-        "id": "PROD0100",
-        "name": "Kids Wool Runner-up Mizzles",
-        "color": "Natural Grey",
-        "sole_color": "Orange",
-        "material": "Wool",
-        "technology": "Water-repellent",
-        "price": 80.00,
-        "in_stock": true,
-        "last_updated": "2024-03-17T14:45:00Z"
-    }
-]
-
-# Prepare the episodes for bulk loading
-bulk_episodes = [
-    RawEpisode(
-        name=f"Product Update - {product['id']}",
-        content=json.dumps(product),
-        source=EpisodeType.json,
-        source_description="Allbirds product catalog update",
-        reference_time=datetime.now()
-    )
-    for product in product_data
-]
-
-await graphiti.add_episode_bulk(bulk_episodes)
-```
-
-### Searching graphiti's graph
-
-The examples below demonstrate two search approaches in the graphiti library:
-
-1. **Hybrid Search:**
-
-   ```python
-   await graphiti.search(query)
-   ```
-
-   Combines semantic similarity and BM25 retrieval, reranked using Reciprocal Rank Fusion.
-
-   Example: Does a broad retrieval of facts related to Allbirds Wool Runners and Jane's purchase.
-
-2. **Node Distance Reranking:**
-
-   ```python
-   await client.search(query, focal_node_uuid)
-   ```
-
-   Extends Hybrid Search above by prioritizing results based on proximity to a specified node in the graph.
-
-   Example: Focuses on Jane-specific information, highlighting her wool allergy.
-
-Node Distance Reranking is particularly useful for entity-specific queries, providing more contextually relevant results. It weights facts by their closeness to the focal node, emphasizing information directly related to the entity of interest.
-
-This dual approach allows for both broad exploration and targeted, entity-specific information retrieval from the knowledge graph.
-
-```python
-query = "Can Jane wear Allbirds Wool Runners?"
-jane_node_uuid = "123e4567-e89b-12d3-a456-426614174000"
-
-def print_facts(edges):
-    print("\n".join([edge.fact for edge in edges]))
-
-# Hybrid Search
-results = await graphiti.search(query)
-print_facts(results)
-
-> The Allbirds Wool Runners are sold by Allbirds.
-> Men's SuperLight Wool Runners - Dark Grey (Medium Grey Sole) has a runner silhouette.
-> Jane purchased SuperLight Wool Runners.
-
-# Hybrid Search with Node Distance Reranking
-await client.search(query, jane_node_uuid)
-print_facts(results)
-
-> Jane purchased SuperLight Wool Runners.
-> Jane is allergic to wool.
-> The Allbirds Wool Runners are sold by Allbirds.
-```
+Visit the Zep knowledge base for graphiti [Guides and API documentation](https://help.getzep.com/graphiti/graphiti).
 
 ## Status and Roadmap
 