Update README.md

Vasilije, 2024-08-11 14:26:49 +02:00, committed by GitHub
parent e494ec6c9e · commit b49553ab1c
# cognee
We build for developers who need a reliable, production-ready data layer for AI applications.
<p>
<p>
<i>Developer-friendly framework for creating a reliable data layer for AI applications using graph and vector stores.</i>
</p>
<p>
</a>
</p>
![Cognee Demo](assets/cognee_demo.gif)
cognee implements scalable, modular data pipelines that allow for the creation of an LLM-enriched data layer using graph and vector stores.
Try it in a Google Colab <a href="https://colab.research.google.com/drive/1jayZ5JRwDaUGFvCw9UZySBG-iB9gpYfu?usp=sharing">notebook</a> or have a look at our <a href="https://topoteretes.github.io/cognee">documentation</a>.
or
```
import cognee
cognee.config.llm_api_key = "YOUR_OPENAI_API_KEY"
```
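Rather than hardcoding the key, you can read it from the environment. A minimal sketch — the `LLM_API_KEY` variable name here is illustrative, not necessarily one cognee reads:

```python
import os

# Illustrative only: read the key from an environment variable
# instead of committing it to source control.
api_key = os.environ.get("LLM_API_KEY", "sk-demo-key")

def mask(secret: str) -> str:
    # Show only the first 3 characters when logging configuration.
    return secret[:3] + "*" * (len(secret) - 3)

print(mask(api_key))
```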
You can use different LLM providers, for more info check out our <a href="https://topoteretes.github.io/cognee">documentation</a>
In the next step make sure to launch a Postgres instance. Here is an example from our docker-compose:
```
postgres:
  image: postgres:latest
  container_name: postgres
  environment:
    POSTGRES_USER: cognee
    POSTGRES_PASSWORD: cognee
    POSTGRES_DB: cognee_db
  volumes:
    - postgres_data:/var/lib/postgresql/data
  ports:
    - 5432:5432
  networks:
    - cognee-network
```
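For reference, the compose values above translate into the following Postgres connection URL (a sketch; check the documentation for the exact settings cognee reads):

```python
# Build the Postgres DSN implied by the docker-compose service above.
user, password, db = "cognee", "cognee", "cognee_db"
host, port = "localhost", 5432

dsn = f"postgresql://{user}:{password}@{host}:{port}/{db}"
print(dsn)  # postgresql://cognee:cognee@localhost:5432/cognee_db
```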
If you are using NetworkX, create an account on Graphistry to visualize results:
```
cognee.config.set_graphistry_password = "YOUR_PASSWORD"
```
(Optional) To run the UI, run:
```
docker-compose up cognee
```
Then navigate to `localhost:3000/wizard`.
You can also use Ollama or Anyscale as your LLM provider. For more info on local models check our [docs](https://topoteretes.github.io/cognee)
### Run
Make sure to launch the Postgres instance first. Navigate to the cognee folder and run:
```
docker compose up postgres
```
Run the default cognee pipeline:
```
import cognee

text = """Natural language processing (NLP) is an interdisciplinary
subfield of computer science and information retrieval"""

await cognee.add([text], "example_dataset") # Add a new piece of information
await cognee.cognify() # Use LLMs and cognee to create a knowledge graph
search_results = await cognee.search("SIMILARITY", {'query': 'Tell me about NLP'}) # Query cognee for insights
print(search_results)
```
Add alternative data types:
```
cognee.add("file://{absolute_path_to_file}", dataset_name)
```
Or
```
cognee.add("data://{absolute_path_to_directory}", dataset_name)
# This is useful if you have a directory with files organized in subdirectories.
# You can target which directory to add by providing dataset_name.
# Example:
# root
# / \
# reports bills
# / \
# 2024 2023
#
# cognee.add("data://{absolute_path_to_root}", "reports.2024")
# This will add just directory 2024 under reports.
```
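To illustrate how such a dotted dataset name can address a subdirectory, here is a hypothetical resolver — not cognee's actual implementation, just the mapping the comment above describes:

```python
from pathlib import PurePosixPath

def dataset_to_subpath(root: str, dataset_name: str) -> str:
    # Map a dotted dataset name to the subdirectory it targets,
    # e.g. "reports.2024" -> "<root>/reports/2024".
    return str(PurePosixPath(root, *dataset_name.split(".")))

print(dataset_to_subpath("/data/root", "reports.2024"))  # /data/root/reports/2024
```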
### Create your pipelines
The cognee framework consists of tasks that can be grouped into pipelines. Each task can be an independent piece of business logic that is tied to other tasks to form a pipeline.
Here is an example of how it looks for a default cognify pipeline:
1. To prepare the data for the pipeline run, we first need to add it to our metastore and normalize it.
Start with:
```
docker compose up postgres
```
And then run:
```
text = """Natural language processing (NLP) is an interdisciplinary
subfield of computer science and information retrieval"""
await cognee.add([text], "example_dataset") # Add a new piece of information
```
Read more [here](docs/index.md#run).
2. In the next step we create a task. A task can contain any business logic we need; the important part is that it is encapsulated in one function.
Here we show an example of creating a naive LLM classifier that takes a Pydantic model and then stores the data in both the graph and vector stores after analyzing each chunk.
We provided just a snippet for reference, but feel free to check out the implementation in our repo.
```
async def chunk_naive_llm_classifier(data_chunks: list[DocumentChunk], classification_model: Type[BaseModel]):
    if len(data_chunks) == 0:
        return data_chunks

    chunk_classifications = await asyncio.gather(
        *[extract_categories(chunk.text, classification_model) for chunk in data_chunks],
    )

    classification_data_points = []

    for chunk_index, chunk in enumerate(data_chunks):
        chunk_classification = chunk_classifications[chunk_index]
        classification_data_points.append(uuid5(NAMESPACE_OID, chunk_classification.label.type))

        for classification_subclass in chunk_classification.label.subclass:
            classification_data_points.append(uuid5(NAMESPACE_OID, classification_subclass.value))

    vector_engine = get_vector_engine()

    class Keyword(BaseModel):
        uuid: str
        text: str
        chunk_id: str
        document_id: str

    collection_name = "classification"

    if await vector_engine.has_collection(collection_name):
        existing_data_points = await vector_engine.retrieve(
            collection_name,
            list(set(classification_data_points)),
        ) if len(classification_data_points) > 0 else []

        existing_points_map = {point.id: True for point in existing_data_points}
    else:
        existing_points_map = {}
        await vector_engine.create_collection(collection_name, payload_schema=Keyword)

    data_points = []
    nodes = []
    edges = []

    for (chunk_index, data_chunk) in enumerate(data_chunks):
        chunk_classification = chunk_classifications[chunk_index]
        classification_type_label = chunk_classification.label.type
        classification_type_id = uuid5(NAMESPACE_OID, classification_type_label)
        ...
```
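Note how the snippet derives point IDs with `uuid5`: the same label always hashes to the same UUID, which is what makes the existence check against the vector store possible. A minimal illustration:

```python
from uuid import NAMESPACE_OID, uuid5

labels = ["science", "nlp", "science"]
ids = [uuid5(NAMESPACE_OID, label) for label in labels]

# The same label always yields the same deterministic id, so re-classifying
# a chunk cannot create duplicate points in the vector store.
assert ids[0] == ids[2]
print(len(set(ids)))  # 2
```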
To see the existing tasks, have a look at the `cognee.tasks` module.
3. Once we have our tasks, it is time to group them into a pipeline.
This snippet shows how a group of tasks can be added to a pipeline, and how they can pass the information forward from one to another.
```
tasks = [
    Task(document_to_ontology, root_node_id = root_node_id),
    Task(source_documents_to_chunks, parent_node_id = root_node_id), # Classify documents and save them as nodes in the graph db, extract text chunks based on the document type
    Task(chunk_to_graph_decomposition, topology_model = KnowledgeGraph, task_config = { "batch_size": 10 }), # Set the graph topology for the document chunk data
    Task(chunks_into_graph, graph_model = KnowledgeGraph, collection_name = "entities"), # Generate knowledge graphs from the document chunks and attach them to the chunk nodes
    Task(chunk_update_check, collection_name = "chunks"), # Find all affected chunks, so we don't process unchanged chunks
    Task(
        save_chunks_to_store,
        collection_name = "chunks",
    ), # Save the document chunks in the vector db and as nodes in the graph db (connected to the document node and to each other)
    run_tasks_parallel([
        Task(
            chunk_extract_summary,
            summarization_model = cognee_config.summarization_model,
            collection_name = "chunk_summaries",
        ), # Summarize the document chunks
        Task(
            chunk_naive_llm_classifier,
            classification_model = cognee_config.classification_model,
        ),
    ]),
    Task(chunk_remove_disconnected), # Remove the obsolete document chunks
]

pipeline = run_tasks(tasks, documents)
```
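Conceptually, the runner threads each task's output into the next task's input. A toy, sequential sketch of that handoff — cognee's actual `run_tasks` is more capable, and `split_into_chunks` / `count_chunks` are made-up example tasks:

```python
import asyncio

class Task:
    """Wrap an async function together with its fixed parameters."""
    def __init__(self, fn, **params):
        self.fn = fn
        self.params = params

async def run_tasks(tasks, data):
    # Each task receives the previous task's output (sequential sketch only).
    for task in tasks:
        data = await task.fn(data, **task.params)
    return data

async def split_into_chunks(text, size):
    return [text[i:i + size] for i in range(0, len(text), size)]

async def count_chunks(chunks):
    return len(chunks)

result = asyncio.run(run_tasks([Task(split_into_chunks, size=4), Task(count_chunks)], "natural language"))
print(result)  # 4
```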
To see the working code, check the default pipeline in `cognee.api.v1.cognify` in our repo.
## Vector retrieval, Graphs and LLMs
Cognee supports a variety of tools and services for different operations:
- **Modular**: Cognee is modular by nature, using tasks grouped into pipelines
- **Local Setup**: By default, LanceDB runs locally with NetworkX and OpenAI.
- **Language Models (LLMs)**: You can use either Anyscale or Ollama as your LLM provider.
- **Graph Stores**: In addition to NetworkX, Neo4j is also supported for graph storage.
- **User management**: Create individual user graphs and manage permissions
## Demo
Check out our demo notebook [here](https://github.com/topoteretes/cognee/blob/ma
## How it works
![Image](assets/architecture.png)
## Star History