Fix diagram

This commit is contained in: parent c0ad53cde7 · commit b73a35fec6

3 changed files with 4 additions and 83 deletions

README.md (87 lines changed)

@@ -193,93 +193,14 @@ if __name__ == '__main__':

When you run this script, you will see step-by-step messages in the console that help you trace the execution flow and understand what the script is doing at each stage.

A version of this example is here: `examples/python/simple_example.py`

### Create your own memory store

### Understand our architecture

The cognee framework consists of tasks that can be grouped into pipelines.

Each task can be an independent part of business logic that can be tied to other tasks to form a pipeline.

These tasks persist data into your memory store, enabling you to search for relevant context from past conversations, documents, or any other data you have stored.
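
In other words, a task is just a single callable that receives data and returns data for the next task in the pipeline. The sketch below is a hypothetical, minimal task: `normalize_documents` is not part of cognee, it only illustrates the shape such a function can take.

```python
# Hypothetical example task: any business logic wrapped in one function.
# Not part of cognee; it only shows the shape a task can have.
async def normalize_documents(documents: list[str]) -> list[str]:
    # Normalize whitespace and casing before handing the data to the next task
    return [" ".join(doc.split()).lower() for doc in documents]
```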

<div align="center">
  <img src="assets/cognee_diagram.png" alt="cognee concept diagram" width="50%" />
</div>

### Example: Classify your documents

Here is an example of how it looks for a default cognify pipeline:

1. To prepare the data for the pipeline run, first we need to add it to our metastore and normalize it. Start with:

```python
import cognee

text = """Natural language processing (NLP) is an interdisciplinary
subfield of computer science and information retrieval"""

await cognee.add(text)  # Add a new piece of information
```

2. In the next step we create a task. The task can be any business logic we need, but the important part is that it should be encapsulated in one function.

Here we show an example of creating a naive LLM classifier that takes a Pydantic model and then stores the data in both the graph and vector stores after analyzing each chunk.

We provide just a snippet for reference, but feel free to check out the implementation in our repo.

```python
import asyncio
from typing import Type
from uuid import NAMESPACE_OID, uuid5

from pydantic import BaseModel

# DocumentChunk, extract_categories and get_vector_engine are cognee internals;
# see the repo for their definitions.

async def chunk_naive_llm_classifier(
    data_chunks: list[DocumentChunk],
    classification_model: Type[BaseModel]
):
    # Extract classifications asynchronously
    chunk_classifications = await asyncio.gather(
        *(extract_categories(chunk.text, classification_model) for chunk in data_chunks)
    )

    # Collect classification data points using a set to avoid duplicates
    classification_data_points = {
        uuid5(NAMESPACE_OID, cls.label.type)
        for cls in chunk_classifications
    } | {
        uuid5(NAMESPACE_OID, subclass.value)
        for cls in chunk_classifications
        for subclass in cls.label.subclass
    }

    vector_engine = get_vector_engine()
    collection_name = "classification"

    # Define the payload schema
    class Keyword(BaseModel):
        uuid: str
        text: str
        chunk_id: str
        document_id: str

    # Ensure the collection exists and retrieve existing data points
    if not await vector_engine.has_collection(collection_name):
        await vector_engine.create_collection(collection_name, payload_schema=Keyword)
        existing_points_map = {}
    else:
        existing_points_map = {}

    ...  # the rest (storing classifications in the graph and vector stores) is in the repo

    return data_chunks
```
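
Judging from how the snippet reads `cls.label.type` and iterates `cls.label.subclass`, a compatible `classification_model` might look roughly like the hypothetical sketch below. The names are illustrative only, not cognee's actual model:

```python
from enum import Enum
from pydantic import BaseModel

# Hypothetical label values; the real model in the cognee repo defines its own taxonomy.
class ContentSubclass(str, Enum):
    ARTICLE = "article"
    TUTORIAL = "tutorial"

class ContentLabel(BaseModel):
    type: str                        # coarse category, read as cls.label.type
    subclass: list[ContentSubclass]  # finer labels, read via subclass.value

class ContentClassification(BaseModel):
    label: ContentLabel
```

A model shaped like this could then be passed to the task as `classification_model=ContentClassification`.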

We have many tasks that can be used in your pipelines, and you can also create your own tasks to fit your business logic.

3. Once we have our tasks, it is time to group them into a pipeline.

This simplified snippet demonstrates how tasks can be added to a pipeline and how they can pass information forward from one to another.

```python
tasks = [
    Task(
        chunk_naive_llm_classifier,
        classification_model=cognee_config.classification_model,
    ),
    # ... other tasks that make up the pipeline
]

pipeline = run_tasks(tasks, documents)
```
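
If `run_tasks` yields results as an async generator (an assumption here; check `cognee.api.v1.cognify` in the repo for the actual behavior), consuming the pipeline can look like this:

```python
# Iterate the pipeline to execute the tasks and observe intermediate results
async for result in pipeline:
    print(result)
```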

To see the working code, check the default pipeline in `cognee.api.v1.cognify` in our repo.

## Vector retrieval, Graphs and LLMs

Binary file not shown. Before: 77 KiB

BIN assets/cognee_diagram.png (new file)
Binary file not shown. After: 428 KiB