Naseem Ali 2025-12-12 10:42:23 +08:00 committed by GitHub
commit ab5bb68c0d
6 changed files with 1534 additions and 2 deletions


@@ -292,7 +292,7 @@ A full list of LightRAG init parameters:
| **workspace** | str | Workspace name for data isolation between different LightRAG Instances | |
| **kv_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`,`PGKVStorage`,`RedisKVStorage`,`MongoKVStorage` | `JsonKVStorage` |
| **vector_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`,`PGVectorStorage`,`MilvusVectorDBStorage`,`ChromaVectorDBStorage`,`FaissVectorDBStorage`,`MongoVectorDBStorage`,`QdrantVectorDBStorage` | `NanoVectorDBStorage` |
-| **graph_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`,`Neo4JStorage`,`PGGraphStorage`,`AGEStorage` | `NetworkXStorage` |
+| **graph_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`,`Neo4JStorage`,`FalkorDBStorage`,`PGGraphStorage`,`AGEStorage` | `NetworkXStorage` |
| **doc_status_storage** | `str` | Storage type for document processing status. Supported types: `JsonDocStatusStorage`,`PGDocStatusStorage`,`MongoDocStatusStorage` | `JsonDocStatusStorage` |
| **chunk_token_size** | `int` | Maximum token size per chunk when splitting documents | `1200` |
| **chunk_overlap_token_size** | `int` | Overlap token size between two chunks when splitting documents | `100` |
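The two chunking parameters interact as a sliding window: each chunk starts `chunk_token_size - chunk_overlap_token_size` tokens after the previous one. A minimal illustrative sketch (not LightRAG's actual chunker; the helper name is hypothetical):

```python
# Illustrative only: how chunk_token_size and chunk_overlap_token_size
# combine when splitting a token sequence into overlapping chunks.
def chunk_tokens(tokens, chunk_token_size=1200, chunk_overlap_token_size=100):
    step = chunk_token_size - chunk_overlap_token_size  # 1100 with the defaults
    return [tokens[i:i + chunk_token_size] for i in range(0, len(tokens), step)]

tokens = list(range(3000))  # stand-in for real token ids
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # [1200, 1200, 800]
print(chunks[1][0])              # 1100 -> second chunk begins 100 tokens back
```

Each chunk after the first repeats the last 100 tokens of its predecessor, so entity mentions spanning a chunk boundary are seen in full at least once.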
@@ -855,6 +855,49 @@ see test_neo4j.py for a working example.
</details>
<details>
<summary> <b>Using FalkorDB for Storage</b> </summary>
* FalkorDB is a high-performance graph database, delivered as a Redis-compatible module, that supports the Cypher query language
* Running FalkorDB in Docker is recommended for seamless local testing
* See: https://hub.docker.com/r/falkordb/falkordb
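The environment variables below can also be read programmatically. A small sketch of the `FALKORDB_*` settings this section describes (the helper name and defaults are assumptions for illustration, not LightRAG's API):

```python
import os

# Hypothetical helper: collect the FALKORDB_* variables documented above,
# falling back to the defaults this section mentions.
def falkordb_settings(namespace="lightrag"):
    return {
        "host": os.environ.get("FALKORDB_HOST", "localhost"),
        "port": int(os.environ.get("FALKORDB_PORT", "6379")),
        "username": os.environ.get("FALKORDB_USERNAME"),  # optional
        "password": os.environ.get("FALKORDB_PASSWORD"),  # optional
        # optional, defaults to the namespace per the docs above
        "graph_name": os.environ.get("FALKORDB_GRAPH_NAME", namespace),
    }

os.environ["FALKORDB_PORT"] = "6380"
os.environ.pop("FALKORDB_GRAPH_NAME", None)
print(falkordb_settings()["port"])        # 6380
print(falkordb_settings()["graph_name"])  # lightrag
```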
```bash
export FALKORDB_HOST="localhost"
export FALKORDB_PORT="6379"
export FALKORDB_PASSWORD="password"          # optional
export FALKORDB_USERNAME="username"          # optional
export FALKORDB_GRAPH_NAME="lightrag_graph"  # optional, defaults to namespace
```

```python
# Setup logger for LightRAG
setup_logger("lightrag", level="INFO")

# When you launch the project, be sure to override the default graph storage
# (NetworkX) by specifying graph_storage="FalkorDBStorage".

# Initialize LightRAG with the FalkorDB implementation
async def initialize_rag():
    rag = LightRAG(
        working_dir=WORKING_DIR,
        llm_model_func=gpt_4o_mini_complete,  # Use the gpt_4o_mini_complete LLM
        graph_storage="FalkorDBStorage",  # <----------- override KG default
    )
    # Initialize database connections
    await rag.initialize_storages()
    # Initialize pipeline status for document processing
    await initialize_pipeline_status()
    return rag
```
see examples/falkordb_example.py for a working example.
</details>
<details>
<summary> <b>Using PostgreSQL Storage</b> </summary>
For production-level scenarios you will most likely want to leverage an enterprise solution. PostgreSQL can provide a one-stop solution as KV store, vector DB (pgvector), and graph DB (Apache AGE). PostgreSQL version 16.6 or higher is supported.
@@ -969,8 +1012,9 @@ The `workspace` parameter ensures data isolation between different LightRAG inst
- **For Qdrant vector database, data isolation is achieved through payload-based partitioning (Qdrant's recommended multitenancy approach):** `QdrantVectorDBStorage` uses shared collections with payload filtering for unlimited workspace scalability.
- **For relational databases, data isolation is achieved by adding a `workspace` field to the tables for logical data separation:** `PGKVStorage`, `PGVectorStorage`, `PGDocStatusStorage`.
- **For the Neo4j graph database, logical data isolation is achieved through labels:** `Neo4JStorage`
- **For the FalkorDB graph database, logical data isolation is achieved through labels:** `FalkorDBStorage`
-To maintain compatibility with legacy data, the default workspace for PostgreSQL non-graph storage is `default` and, for PostgreSQL AGE graph storage is null, for Neo4j graph storage is `base` when no workspace is configured. For all external storages, the system provides dedicated workspace environment variables to override the common `WORKSPACE` environment variable configuration. These storage-specific workspace environment variables are: `REDIS_WORKSPACE`, `MILVUS_WORKSPACE`, `QDRANT_WORKSPACE`, `MONGODB_WORKSPACE`, `POSTGRES_WORKSPACE`, `NEO4J_WORKSPACE`.
+To maintain compatibility with legacy data, when no workspace is configured the default workspace is `default` for PostgreSQL non-graph storage, null for PostgreSQL AGE graph storage, and `base` for both Neo4j and FalkorDB graph storage. For all external storages, the system provides dedicated workspace environment variables that override the common `WORKSPACE` environment variable. These storage-specific variables are: `REDIS_WORKSPACE`, `MILVUS_WORKSPACE`, `QDRANT_WORKSPACE`, `MONGODB_WORKSPACE`, `POSTGRES_WORKSPACE`, `NEO4J_WORKSPACE`, `FALKORDB_WORKSPACE`.
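The override precedence described here can be sketched as a small pure function (the helper name is hypothetical; `base` is the FalkorDB/Neo4j legacy fallback named above):

```python
import os

# Sketch of the documented precedence: storage-specific variable
# (e.g. FALKORDB_WORKSPACE) > common WORKSPACE > legacy default.
def effective_workspace(specific_var, default="base"):
    return os.environ.get(specific_var) or os.environ.get("WORKSPACE") or default

os.environ.pop("FALKORDB_WORKSPACE", None)
os.environ.pop("WORKSPACE", None)
print(effective_workspace("FALKORDB_WORKSPACE"))  # base  (legacy fallback)

os.environ["WORKSPACE"] = "team_a"
print(effective_workspace("FALKORDB_WORKSPACE"))  # team_a

os.environ["FALKORDB_WORKSPACE"] = "falkor_only"
print(effective_workspace("FALKORDB_WORKSPACE"))  # falkor_only
```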
### AGENTS.md -- Guiding Coding Agents


@@ -326,6 +326,7 @@ OLLAMA_EMBEDDING_NUM_CTX=8192
# LIGHTRAG_DOC_STATUS_STORAGE=JsonDocStatusStorage
# LIGHTRAG_GRAPH_STORAGE=NetworkXStorage
# LIGHTRAG_VECTOR_STORAGE=NanoVectorDBStorage
# LIGHTRAG_GRAPH_STORAGE=FalkorDBStorage
### Redis Storage (Recommended for production deployment)
# LIGHTRAG_KV_STORAGE=RedisKVStorage
@@ -412,6 +413,12 @@ NEO4J_KEEP_ALIVE=true
### DB-specific workspace should not be set; kept for compatibility only
### NEO4J_WORKSPACE=forced_workspace_name
# FalkorDB Configuration
FALKORDB_URI=falkordb://xxxxxxxx.falkordb.cloud
FALKORDB_GRAPH_NAME=lightrag_graph
# FALKORDB_HOST=localhost
# FALKORDB_PORT=6379
### MongoDB Configuration
MONGO_URI=mongodb://root:root@localhost:27017/
#MONGO_URI=mongodb+srv://xxxx


@@ -0,0 +1,130 @@
#!/usr/bin/env python
"""
Example of using LightRAG with FalkorDB - Updated Version
=========================================================
Fixed imports and modern LightRAG syntax.
Prerequisites:
1. FalkorDB running: docker run -p 6379:6379 falkordb/falkordb:latest
2. OpenAI API key in .env file
3. Required packages: pip install lightrag falkordb openai python-dotenv
"""
import asyncio
import os
from dotenv import load_dotenv
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed
from lightrag.kg.shared_storage import initialize_pipeline_status
# Load environment variables
load_dotenv()
async def main():
"""Example usage of LightRAG with FalkorDB"""
# Set up environment for FalkorDB
os.environ.setdefault("FALKORDB_HOST", "localhost")
os.environ.setdefault("FALKORDB_PORT", "6379")
os.environ.setdefault("FALKORDB_GRAPH_NAME", "lightrag_example")
os.environ.setdefault("FALKORDB_WORKSPACE", "example_workspace")
# Initialize LightRAG with FalkorDB
rag = LightRAG(
working_dir="./falkordb_example",
llm_model_func=gpt_4o_mini_complete, # Updated function name
embedding_func=openai_embed, # Updated function name
graph_storage="FalkorDBStorage", # Specify FalkorDB backend
)
# Initialize storage connections
await rag.initialize_storages()
await initialize_pipeline_status()
# Example text to process
sample_text = """
FalkorDB is a high-performance graph database built on Redis.
It supports OpenCypher queries and provides excellent performance for graph operations.
LightRAG can now use FalkorDB as its graph storage backend, enabling scalable
knowledge graph operations with Redis-based persistence. This integration
allows developers to leverage both the speed of Redis and the power of
graph databases for advanced AI applications.
"""
print("Inserting text into LightRAG with FalkorDB backend...")
await rag.ainsert(sample_text)
# Check what was created
storage = rag.chunk_entity_relation_graph
nodes = await storage.get_all_nodes()
edges = await storage.get_all_edges()
print(f"Knowledge graph created: {len(nodes)} nodes, {len(edges)} edges")
print("\nQuerying the knowledge graph...")
# Test different query modes
questions = [
"What is FalkorDB and how does it relate to LightRAG?",
"What are the benefits of using Redis with graph databases?",
"How does FalkorDB support OpenCypher queries?",
]
for i, question in enumerate(questions, 1):
print(f"\n--- Question {i} ---")
print(f"Q: {question}")
try:
response = await rag.aquery(
question, param=QueryParam(mode="hybrid", top_k=3)
)
print(f"A: {response}")
except Exception as e:
print(f"Error querying: {e}")
# Show some graph statistics
print("\n--- Graph Statistics ---")
try:
all_labels = await storage.get_all_labels()
print(f"Unique entities: {len(all_labels)}")
if nodes:
print("Sample entities:")
for i, node in enumerate(nodes[:3]):
entity_id = node.get("entity_id", "Unknown")
entity_type = node.get("entity_type", "Unknown")
print(f" {i+1}. {entity_id} ({entity_type})")
if edges:
print("Sample relationships:")
for i, edge in enumerate(edges[:2]):
source = edge.get("source", "Unknown")
target = edge.get("target", "Unknown")
print(f"  {i+1}. {source} -> {target}")
except Exception as e:
print(f"Error getting statistics: {e}")
if __name__ == "__main__":
print("LightRAG with FalkorDB Example")
print("==============================")
print("Note: This requires FalkorDB running on localhost:6379")
print(
"You can start FalkorDB with: docker run -p 6379:6379 falkordb/falkordb:latest"
)
print()
# Check OpenAI API key
if not os.getenv("OPENAI_API_KEY"):
print("❌ Please set your OpenAI API key in .env file!")
print(" Create a .env file with: OPENAI_API_KEY=your-actual-api-key")
exit(1)
try:
asyncio.run(main())
except KeyboardInterrupt:
print("\n👋 Example interrupted. Goodbye!")
except Exception as e:
print(f"\n💥 Unexpected error: {e}")
print("🔧 Make sure FalkorDB is running and your .env file is configured")


@@ -0,0 +1,279 @@
import os
import xml.etree.ElementTree as ET
import falkordb
# Constants
WORKING_DIR = "./dickens"
BATCH_SIZE_NODES = 500
BATCH_SIZE_EDGES = 100
# FalkorDB connection credentials
FALKORDB_HOST = "localhost"
FALKORDB_PORT = 6379
FALKORDB_GRAPH_NAME = "dickens_graph"
def xml_to_json(xml_file):
try:
tree = ET.parse(xml_file)
root = tree.getroot()
# Print the root element's tag and attributes to confirm the file has been correctly loaded
print(f"Root element: {root.tag}")
print(f"Root attributes: {root.attrib}")
data = {"nodes": [], "edges": []}
# Use namespace
namespace = {"": "http://graphml.graphdrawing.org/xmlns"}
for node in root.findall(".//node", namespace):
node_data = {
"id": node.get("id").strip('"'),
"entity_type": node.find("./data[@key='d1']", namespace).text.strip('"')
if node.find("./data[@key='d1']", namespace) is not None
else "",
"description": node.find("./data[@key='d2']", namespace).text
if node.find("./data[@key='d2']", namespace) is not None
else "",
"source_id": node.find("./data[@key='d3']", namespace).text
if node.find("./data[@key='d3']", namespace) is not None
else "",
}
data["nodes"].append(node_data)
for edge in root.findall(".//edge", namespace):
edge_data = {
"source": edge.get("source").strip('"'),
"target": edge.get("target").strip('"'),
"weight": float(edge.find("./data[@key='d5']", namespace).text)
if edge.find("./data[@key='d5']", namespace) is not None
else 1.0,
"description": edge.find("./data[@key='d6']", namespace).text
if edge.find("./data[@key='d6']", namespace) is not None
else "",
"keywords": edge.find("./data[@key='d7']", namespace).text
if edge.find("./data[@key='d7']", namespace) is not None
else "",
"source_id": edge.find("./data[@key='d8']", namespace).text
if edge.find("./data[@key='d8']", namespace) is not None
else "",
}
data["edges"].append(edge_data)
return data
except ET.ParseError as e:
print(f"Error parsing XML: {e}")
return None
except Exception as e:
print(f"Unexpected error: {e}")
return None
def insert_nodes_and_edges_to_falkordb(data):
"""Insert graph data into FalkorDB"""
try:
# Connect to FalkorDB
db = falkordb.FalkorDB(host=FALKORDB_HOST, port=FALKORDB_PORT)
graph = db.select_graph(FALKORDB_GRAPH_NAME)
print(f"Connected to FalkorDB at {FALKORDB_HOST}:{FALKORDB_PORT}")
print(f"Using graph: {FALKORDB_GRAPH_NAME}")
nodes = data["nodes"]
edges = data["edges"]
print(f"Total nodes to insert: {len(nodes)}")
print(f"Total edges to insert: {len(edges)}")
# Insert nodes in batches
for i in range(0, len(nodes), BATCH_SIZE_NODES):
batch_nodes = nodes[i : i + BATCH_SIZE_NODES]
# Build UNWIND query for batch insert
query = """
UNWIND $nodes AS node
CREATE (n:Entity {
entity_id: node.id,
entity_type: node.entity_type,
description: node.description,
source_id: node.source_id
})
"""
graph.query(query, {"nodes": batch_nodes})
print(f"Inserted nodes {i+1} to {min(i + BATCH_SIZE_NODES, len(nodes))}")
# Insert edges in batches
for i in range(0, len(edges), BATCH_SIZE_EDGES):
batch_edges = edges[i : i + BATCH_SIZE_EDGES]
# Build UNWIND query for batch insert
query = """
UNWIND $edges AS edge
MATCH (source:Entity {entity_id: edge.source})
MATCH (target:Entity {entity_id: edge.target})
CREATE (source)-[r:DIRECTED {
weight: edge.weight,
description: edge.description,
keywords: edge.keywords,
source_id: edge.source_id
}]->(target)
"""
graph.query(query, {"edges": batch_edges})
print(f"Inserted edges {i+1} to {min(i + BATCH_SIZE_EDGES, len(edges))}")
print("Data insertion completed successfully!")
# Print some statistics
node_count_result = graph.query("MATCH (n:Entity) RETURN count(n) AS count")
edge_count_result = graph.query(
"MATCH ()-[r:DIRECTED]->() RETURN count(r) AS count"
)
node_count = (
node_count_result.result_set[0][0] if node_count_result.result_set else 0
)
edge_count = (
edge_count_result.result_set[0][0] if edge_count_result.result_set else 0
)
print("Final statistics:")
print(f"- Nodes in database: {node_count}")
print(f"- Edges in database: {edge_count}")
except Exception as e:
print(f"Error inserting data into FalkorDB: {e}")
def query_graph_data():
"""Query and display some sample data from FalkorDB"""
try:
# Connect to FalkorDB
db = falkordb.FalkorDB(host=FALKORDB_HOST, port=FALKORDB_PORT)
graph = db.select_graph(FALKORDB_GRAPH_NAME)
print("\n=== Sample Graph Data ===")
# Get some sample nodes
query = (
"MATCH (n:Entity) RETURN n.entity_id, n.entity_type, n.description LIMIT 5"
)
result = graph.query(query)
print("\nSample Nodes:")
if result.result_set:
for record in result.result_set:
print(f"- {record[0]} ({record[1]}): {record[2][:100]}...")
# Get some sample edges
query = """
MATCH (a:Entity)-[r:DIRECTED]->(b:Entity)
RETURN a.entity_id, b.entity_id, r.weight, r.description
LIMIT 5
"""
result = graph.query(query)
print("\nSample Edges:")
if result.result_set:
for record in result.result_set:
print(
f"- {record[0]} -> {record[1]} (weight: {record[2]}): {record[3][:100]}..."
)
# Get node degree statistics
query = """
MATCH (n:Entity)
OPTIONAL MATCH (n)-[r]-()
WITH n, count(r) AS degree
RETURN min(degree) AS min_degree, max(degree) AS max_degree, avg(degree) AS avg_degree
"""
result = graph.query(query)
print("\nNode Degree Statistics:")
if result.result_set:
record = result.result_set[0]
print(f"- Min degree: {record[0]}")
print(f"- Max degree: {record[1]}")
print(f"- Avg degree: {record[2]:.2f}")
except Exception as e:
print(f"Error querying FalkorDB: {e}")
def clear_graph():
"""Clear all data from the FalkorDB graph"""
try:
db = falkordb.FalkorDB(host=FALKORDB_HOST, port=FALKORDB_PORT)
graph = db.select_graph(FALKORDB_GRAPH_NAME)
# Delete all nodes and relationships
graph.query("MATCH (n) DETACH DELETE n")
print("Graph cleared successfully!")
except Exception as e:
print(f"Error clearing graph: {e}")
def main():
xml_file = os.path.join(WORKING_DIR, "graph_chunk_entity_relation.graphml")
if not os.path.exists(xml_file):
print(
f"Error: File {xml_file} not found. Please ensure the GraphML file exists."
)
print(
"This file is typically generated by LightRAG after processing documents."
)
return
print("FalkorDB Graph Visualization Example")
print("====================================")
print(f"Processing file: {xml_file}")
print(f"FalkorDB connection: {FALKORDB_HOST}:{FALKORDB_PORT}")
print(f"Graph name: {FALKORDB_GRAPH_NAME}")
print()
# Parse XML to JSON
print("1. Parsing GraphML file...")
data = xml_to_json(xml_file)
if data is None:
print("Failed to parse XML file.")
return
print(f" Found {len(data['nodes'])} nodes and {len(data['edges'])} edges")
# Ask user what to do
while True:
print("\nOptions:")
print("1. Clear existing graph data")
print("2. Insert data into FalkorDB")
print("3. Query sample data")
print("4. Exit")
choice = input("\nSelect an option (1-4): ").strip()
if choice == "1":
print("\n1. Clearing existing graph data...")
clear_graph()
elif choice == "2":
print("\n2. Inserting data into FalkorDB...")
insert_nodes_and_edges_to_falkordb(data)
elif choice == "3":
print("\n3. Querying sample data...")
query_graph_data()
elif choice == "4":
print("Goodbye!")
break
else:
print("Invalid choice. Please try again.")
if __name__ == "__main__":
main()


@@ -12,6 +12,7 @@ STORAGE_IMPLEMENTATIONS = {
"implementations": [
"NetworkXStorage",
"Neo4JStorage",
"FalkorDBStorage",
"PGGraphStorage",
"MongoGraphStorage",
"MemgraphStorage",
@@ -54,6 +55,7 @@ STORAGE_ENV_REQUIREMENTS: dict[str, list[str]] = {
# Graph Storage Implementations
"NetworkXStorage": [],
"Neo4JStorage": ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"],
"FalkorDBStorage": ["FALKORDB_HOST", "FALKORDB_PORT"],
"MongoGraphStorage": [
"MONGO_URI",
"MONGO_DATABASE",
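A registry like `STORAGE_ENV_REQUIREMENTS` lends itself to failing fast when a backend's required variables are missing. A minimal sketch, assuming a hypothetical checker (only the three entries shown above are reproduced):

```python
import os

# Excerpt of the registry from the diff above.
STORAGE_ENV_REQUIREMENTS = {
    "NetworkXStorage": [],
    "Neo4JStorage": ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"],
    "FalkorDBStorage": ["FALKORDB_HOST", "FALKORDB_PORT"],
}

def missing_env(storage_name):
    """Return the required variables that are not set for this backend."""
    required = STORAGE_ENV_REQUIREMENTS.get(storage_name, [])
    return [v for v in required if not os.environ.get(v)]

os.environ["FALKORDB_HOST"] = "localhost"
os.environ.pop("FALKORDB_PORT", None)
print(missing_env("FalkorDBStorage"))  # ['FALKORDB_PORT']
print(missing_env("NetworkXStorage"))  # []
```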
@@ -100,6 +102,7 @@ STORAGES = {
"NanoVectorDBStorage": ".kg.nano_vector_db_impl",
"JsonDocStatusStorage": ".kg.json_doc_status_impl",
"Neo4JStorage": ".kg.neo4j_impl",
"FalkorDBStorage": ".kg.falkordb_impl",
"MilvusVectorDBStorage": ".kg.milvus_impl",
"MongoKVStorage": ".kg.mongo_impl",
"MongoDocStatusStorage": ".kg.mongo_impl",
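The `STORAGES` mapping above (class name to module path) suggests a lazy-import pattern: the backend module is only imported when that storage is actually selected. A sketch of that pattern, using stdlib modules as stand-ins for the real `.kg.*` modules so it runs anywhere:

```python
import importlib

# Stand-in registry: in LightRAG the values are paths like ".kg.falkordb_impl".
STORAGES = {
    "OrderedDict": "collections",
    "JSONDecoder": "json",
}

def load_storage_class(name):
    """Resolve a storage class from the registry on first use."""
    module = importlib.import_module(STORAGES[name])
    return getattr(module, name)

cls = load_storage_class("OrderedDict")
print(cls.__name__)  # OrderedDict
```

Deferring the import this way means users without, say, the `falkordb` package installed pay no cost unless they select `FalkorDBStorage`.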

lightrag/kg/falkordb_impl.py (new file, 1069 lines)

File diff suppressed because it is too large.