diff --git a/cognee/api/v1/config/config.py b/cognee/api/v1/config/config.py
index 8d8e34996..83b725453 100644
--- a/cognee/api/v1/config/config.py
+++ b/cognee/api/v1/config/config.py
@@ -57,6 +57,14 @@ class config():
     def llm_provider(llm_provider: str):
         graph_config.llm_provider = llm_provider
 
+    @staticmethod
+    def llm_endpoint(llm_endpoint: str):
+        graph_config.llm_endpoint = llm_endpoint
+
+    @staticmethod
+    def llm_model(llm_model: str):
+        graph_config.llm_model = llm_model
+
     @staticmethod
     def intra_layer_score_treshold(intra_layer_score_treshold: str):
         cognify_config.intra_layer_score_treshold =intra_layer_score_treshold
diff --git a/docs/api_reference.md b/docs/api_reference.md
index 532c5e3e8..4e29ea265 100644
--- a/docs/api_reference.md
+++ b/docs/api_reference.md
@@ -23,9 +23,9 @@ The config class in this module offers a series of static methods to configure t
 
 Import the module as follows:
 
 ```python
-from cognee.config import config
+import cognee
 ```
 
 ## Methods
 
@@ -35,7 +35,7 @@ Sets the root directory of the system where essential system files and operation
 system_root_directory (str): The path to set as the system's root directory.
 Example:
 ```python
-config.system_root_directory('/path/to/system/root')
+cognee.config.system_root_directory('/path/to/system/root')
 ```
 
 ### data_root_directory(data_root_directory: str)
@@ -45,8 +45,8 @@
 data_root_directory (str): The path to set as the data root directory.
 Example:
 ```python
-
-config.data_root_directory('/path/to/data/root')
+import cognee
+cognee.config.data_root_directory('/path/to/data/root')
 ```
 
 ### set_classification_model(classification_model: object)
@@ -56,7 +56,8 @@
 classification_model (object): The Pydantic model to use for classification. Check cognee.shared.data_models for existing models.
 Example:
 ```python
-config.set_classification_model(model)
+import cognee
+cognee.config.set_classification_model(model)
 ```
 
 set_summarization_model(summarization_model: object)
@@ -66,7 +67,8 @@
 summarization_model (object): The model to use for summarization. Check cognee.shared.data_models for existing models.
 Example:
 ```python
-config.set_summarization_model(my_summarization_model)
+import cognee
+cognee.config.set_summarization_model(my_summarization_model)
 ```
 
 ### set_llm_model(llm_model: object)
@@ -74,7 +76,8 @@ Determines the model to handle LLMs.
 Parameters:
 llm_model (object): The model to use for LLMs.
 Example:
 ```python
-config.set_llm_model("OpenAI")
+import cognee
+cognee.config.set_llm_model("openai")
 ```
 
 ### set_graph_engine(graph_engine: object)
@@ -85,7 +88,7 @@
 Example:
 ```python
 from cognee.shared.data_models import GraphDBType
 
-config.set_graph_engine(GraphDBType.NEO4J)
+cognee.config.set_graph_engine(GraphDBType.NEO4J)
 ```
 
@@ -122,62 +125,5 @@ For each API endpoint, provide the following details:
   "status": "OK"
 }
 ```
-### Endpoint 2: Add
-- URL: /Add
-- Method: POST
-- Auth Required: Yes | No
-- Description: This endpoint is responsible for adding data to the graph.
-#### Parameters
-| Name | Type | Required | Description |
-| --- |--------------------------------------------------| --- | --- |
-| data | Union[str, BinaryIO, List[Union[str, BinaryIO]]] | Yes | The data to be added|
-| dataset_id | UUID | Yes | The ID of the dataset. |
-| dataset_name | String | Yes | The name of the dataset.|
-
-
-
-#### Response
-```json
-{
-  "response": "data"
-}
-```
-
-### Endpoint 3: Cognify
-- URL: /cognify
-- Method: POST
-- Auth Required: Yes | No
-- Description: This endpoint is responsible for the cognitive processing of the content.
-
-#### Parameters
-| Name | Type | Required | Description |
-| --- |--------------------------------------------------| --- | --- |
-| datasets | Union[str, List[str]] | Yes | The data to be added|
-
-
-#### Response
-```json
-{
-  "response": "data"
-}
-```
-
-
-### Endpoint 4: search
-- URL: /search
-- Method: POST
-- Auth Required: No
-- Description: This endpoint is responsible for searching for nodes in the graph.
-#### Parameters
-| Name | Type | Required | Description |
-| --- | --- | --- | --- |
-| query_params | Dict[str, Any] | Yes | Description of the parameter. |
-
-
-#### Response
-```json
-{
-  "response": "data"
-}
-```
\ No newline at end of file
+More endpoints are available in the FastAPI server; their documentation is in progress.
diff --git a/docs/conceptual_overview.md b/docs/conceptual_overview.md
index 1852ef683..3e4041626 100644
--- a/docs/conceptual_overview.md
+++ b/docs/conceptual_overview.md
@@ -3,33 +3,41 @@
 ## Introduction
 
 !!! info "What is cognee?"
-    cognee is a framework for data processing that enables LLMs to produce for deterministic and traceable outputs.
+    cognee is a data processing framework that enables LLMs to produce deterministic and traceable outputs.
 
-cognee focuses on creating tools that assist developers in introducing greater predictability and management into their Retrieval-Augmented Generation (RAG) workflows through the use of graph architectures, vector stores and auto-optimizing pipelines.
+cognee assists developers in introducing greater predictability and management into their Retrieval-Augmented Generation (RAG) workflows through the use of graph architectures, vector stores, and auto-optimizing pipelines.
 
-Displaying this information as a graph is the clearest method to grasp the content of your documents. Crucially, using a graph allows for the systematic navigation and extraction of data from documents based on your grasp of a document's organization, an idea often termed 'document hierarchies'.
+Displaying information as a graph is the clearest way to grasp the content of your documents. Crucially, graphs allow systematic navigation and extraction of data from documents based on their hierarchy.
 
 ## Core Concepts
 
 ### Concept 1: Data Pipelines
-Most of the data we provide to a system can be understood as unstructured, semi-structured or structured. Rows from a database would belong to structured data, jsons to semi-structured data and logs could be unstructured.
-To organize and process this data, we need to make sure to have custom loaders for all data types and also to unify and organize the data well together.
+Most of the data we provide to a system can be categorized as unstructured, semi-structured, or structured. Rows from a database are structured data, JSON files are semi-structured, and logs that we feed into the system are unstructured.
+To organize and process this data, we need custom loaders for each data type, which help us unify and organize it properly.
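+
+A minimal sketch of what such type-aware loading can look like (illustrative only; the routing-by-suffix helper below is an assumption for this example, not cognee's actual loader API):
+
+```python
+import csv
+import json
+from pathlib import Path
+
+def load_any(path: str):
+    """Route a source file to a loader based on its (assumed) data type."""
+    suffix = Path(path).suffix.lower()
+    if suffix == ".csv":  # structured: fixed schema, row-oriented
+        with open(path, newline="") as file:
+            return list(csv.DictReader(file))
+    if suffix == ".json":  # semi-structured: parse the key-value tree
+        with open(path) as file:
+            return json.load(file)
+    # unstructured: hand the raw text to downstream LLM steps
+    with open(path, errors="ignore") as file:
+        return file.read()
+```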
![Data Pipelines](img/pipelines.png)
Data Pipeline Example
-In the example above, we have a data pipeline that imports the data from various sources, normalizes it, and stores it in a database. It also creates relevant identifiers and relationships between the data.
+In the example above, we have a pipeline in which data has been imported from various sources, normalized, and stored in a database. Relevant identifiers and relationships between the data are also created in this process.
+
+To create an effective data pipeline for processing structured, semi-structured, and unstructured data, it is crucial to understand each type's specific handling and processing needs. Let's expand on the concepts involved in setting up such a pipeline.
+
+#### Data Types and Their Handling
+
+- Structured Data: This includes data that adheres to a fixed schema, such as rows in a relational database or data in CSV files. Processing structured data typically involves SQL queries for extraction, transformations through simple functions or procedures, and loading into destination tables or databases.
+
+- Semi-structured Data: JSON files, XML, or even some APIs' responses fit this category. These data types don't have a rigid schema, but they do have organizational properties that can be exploited. Semi-structured data often requires parsers that can navigate its structure (like trees for XML or key-value pairs for JSON) to extract the necessary information. Libraries such as json in Python or lxml for XML handling can be very useful here.
+
+- Unstructured Data: This category includes text files, logs, or even images and videos. There is no schema to rely on, so processing typically falls to NLP techniques and LLMs that extract structure, summaries, and entities from the raw content.
+
+
 ### Concept 2: Data Enrichment with LLMs
-LLMs are adept at processing unstructured data. We can easily extract summaries, keywords, and other useful information from documents.
+LLMs are adept at processing unstructured data. They can easily extract summaries, keywords, and other useful information from documents.
 We use function calling with Pydantic models to extract the data and dspy to train our functions.
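+
+As a rough sketch, that extraction step can look like the snippet below, pairing a Pydantic model with the Instructor library (named elsewhere in these docs as an inspiration); the model name, prompt, and `DocumentSummary` fields are illustrative assumptions, not cognee's internals:
+
+```python
+import instructor
+from openai import OpenAI
+from pydantic import BaseModel
+
+class DocumentSummary(BaseModel):
+    """The structure we ask the LLM to fill in via function calling."""
+    summary: str
+    keywords: list[str]
+
+# Patching the client makes it parse and validate responses against the model.
+client = instructor.patch(OpenAI())
+
+document_text = "cognee is a framework for building deterministic LLM pipelines."
+result = client.chat.completions.create(
+    model="gpt-3.5-turbo",
+    response_model=DocumentSummary,
+    messages=[{"role": "user", "content": "Summarize: " + document_text}],
+)
+print(result.summary, result.keywords)
+```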
![Data Enrichment](img/enrichment.png)
Data Enrichment Example
-We decompose content into graphs, allowing us to more precisely map out the relationships between entities and concepts.
-
+We decompose the loaded content into graphs, allowing us to more precisely map out the relationships between entities and concepts.
 
 ### Concept 3: Linguistic Analysis
 LLMs are probabilistic models, meaning they can make mistakes. To mitigate this, we can use a combination of NLP and LLMs to determine how to analyze the data and score each part of the text.
@@ -61,7 +69,7 @@ This involves bombarding the RAG system with hundreds of synthetic questions, en
 This method paves the way for developing self-improving memory engines that can adapt to new data and user feedback.
 
 ## Architecture Overview
-A high-level diagram of the cognee's architecture, illustrating the main components and their interactions.
+A high-level diagram of cognee's architecture, illustrating the main components and their interactions.
 
![Architecture](img/architecture.png)

@@ -80,11 +88,7 @@ Main components:
 
 ## How It Fits Into Your Projects
 
 !!! info "How cognee fits into your projects"
-    cognee is a self-contained library that simplifies the process of loading and structuring LLM context. It can be integrated into your data pipelines to enhance your AI applications.
-
-By integrating cognee into your data pipelines, you can leverage the power of LLMs, knowledge graphs, and vector retrieval to enhance your AI applications.
+    cognee is a self-contained library that simplifies the process of loading and structuring data for LLMs.
+
+By integrating cognee into your data pipelines, you can leverage the power of LLMs, knowledge graphs, and vector retrieval to create accurate and explainable AI solutions.
 cognee provides a self-contained library that simplifies the process of loading and structuring LLM context, enabling you to create accurate and explainable AI solutions.
-
-Check out some [case studies](case_studies.md) to see how cognee has been used in real-world applications.
-
diff --git a/docs/img/bad_architecture.png b/docs/img/bad_architecture.png
new file mode 100644
index 000000000..66c350802
Binary files /dev/null and b/docs/img/bad_architecture.png differ
diff --git a/docs/img/good_architecture.png b/docs/img/good_architecture.png
new file mode 100644
index 000000000..acf8c5dc5
Binary files /dev/null and b/docs/img/good_architecture.png differ
diff --git a/docs/index.md b/docs/index.md
index 0d560ba80..068b319d9 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -18,8 +18,7 @@
 
 ### Let's learn about cogneeHub!
 
-cogneeHub is a free and open-sourced learning platform for those interested in creating deterministic LLM outputs.
-We help people with using graphs, LLMs and adding vector retrieval to their ML stack.
+cogneeHub is a free, open-source learning platform for those interested in creating deterministic LLM outputs. We help developers use graphs, LLMs, and vector retrieval in their machine learning stack.
 
 - **Get started** — [Get started with cognee quickly and try it out for yourself.](quickstart.md)
 - **Conceptual Overview** — Learn about the [core concepts](conceptual_overview.md) of cognee and how it fits into your projects.
diff --git a/docs/local_models.md b/docs/local_models.md
index 52064c755..f83f2f03b 100644
--- a/docs/local_models.md
+++ b/docs/local_models.md
@@ -10,7 +10,7 @@ You'll need to run the local model on your machine or use one of the providers h
 
 Set up Ollama by following instructions on [Ollama website](https://ollama.com/)
 
-Set the environment variable to use the model
+Set the environment variable in your .env file to use the model
 
 ```bash
 LLM_PROVIDER = 'ollama'
@@ -19,18 +19,15 @@
 ```
 
 Otherwise, you can set the configuration for the model:
 
-```bash
-from cognee.infrastructure import infrastructure_config
-infrastructure_config.set_config({
-    "llm_provider": 'ollama'
-})
+```python
+cognee.config.llm_provider = 'ollama'
 ```
 
 You can also set the HOST and model name:
 
-```bash
-CUSTOM_OLLAMA_ENDPOINT= "http://localhost:11434/v1"
-CUSTOM_OLLAMA_MODEL = "mistral:instruct"
+```python
+cognee.config.llm_endpoint = "http://localhost:11434/v1"
+cognee.config.llm_model = "mistral:instruct"
 ```
 
@@ -43,17 +40,14 @@ LLM_PROVIDER = 'custom'
 ```
 
 Otherwise, you can set the configuration for the model:
 
-```bash
-from cognee.infrastructure import infrastructure_config
-infrastructure_config.set_config({
-    "llm_provider": 'custom'
-})
+```python
+cognee.config.llm_provider = 'custom'
 ```
 
 You can also set the HOST and model name:
 
 ```bash
-CUSTOM_LLM_MODEL = "mistralai/Mixtral-8x7B-Instruct-v0.1"
-CUSTOM_ENDPOINT = "https://api.endpoints.anyscale.com/v1"
-CUSTOM_LLM_API_KEY = "your_api_key"
+LLM_MODEL="mistralai/Mixtral-8x7B-Instruct-v0.1"
+LLM_ENDPOINT="https://api.endpoints.anyscale.com/v1"
+LLM_API_KEY="your_api_key"
 ```
 
 You can set the same way HOST and model name for any other provider that has an API endpoint.
diff --git a/docs/quickstart.md b/docs/quickstart.md
index affa97d0f..ae7ab68a5 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -13,10 +13,13 @@ You can also use Ollama or Anyscale as your LLM provider. For more info on local
 
 ```
 import os
-os.environ["WEAVIATE_URL"] = "YOUR_WEAVIATE_URL"
-os.environ["WEAVIATE_API_KEY"] = "YOUR_WEAVIATE_API_KEY"
+os.environ["LLM_API_KEY"] = "YOUR_OPENAI_API_KEY"
+```
+or
+```
+import cognee
+cognee.config.llm_api_key = "YOUR_OPENAI_API_KEY"
 
-os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
 ```
 
 ## Run
diff --git a/docs/why.md b/docs/why.md
index 3ca396387..040cefb76 100644
--- a/docs/why.md
+++ b/docs/why.md
@@ -1,28 +1,29 @@
 # Why use cognee?
 
+cognee is one of the first OSS tools that enables easy, scalable, and flexible use of LLMs to process large volumes of documents with a GraphRAG approach.
+
 LLMs don't have a semantic layer, and they don't have a way to understand the data they are processing. This is where cognee comes in.
 
 We let you define logical structures for your data and then use these structures to guide the LLMs to process the data in a way that makes sense to you.
 
+cognee helps you avoid the overly complicated set of tools and processes that usually yields only somewhat reliable output, taking you
+
+***From***
+
+![Bad Architecture](img/bad_architecture.png)
+
+***To***
+
+![Good Architecture](img/good_architecture.png)
+
 ??? note "Why use cognee?"
-    Its hard to answer the question of why use cognee without answering why you need thin LLM frameworks in the first place.:
-    - **Cost effective** — cognee extends the capabilities of your LLMs without the need for expensive data processing tools. 
-    - **Self contained** — cognee runs as a library and is simple to use
-    - **Easy to use** — cognee is simple to use and can be used by anyone with a basic understanding of Python
-    - **Flexible** — cognee can be used to structure data in any way you want, and can be used to structure data in any way you want. We rely on the work done by Pydantic and are inspired by the Instructor library, which is a simple way to structure data for LLMs.
+    It's hard to answer the question of why use cognee without answering why you need thin LLM frameworks in the first place. :)
+    - **Cost-effective** — cognee extends the capabilities of your LLMs without the need for expensive data processing tools.
+    - **Self-contained** — cognee runs as a simple-to-use library, meaning you can add it to your application easily.
+    - **Easy to use** — navigate graphs instead of embeddings to understand your data faster and better.
+    - **Flexible** — cognee lets you control your input and provide your own Pydantic data models, as shown in the sketch below.
 
-## Bring your own data model
-
-If you are building an AI vertical, most of the time you will have a specific data model that you want to use. Cognee lets you bring your own data model and use it to structure your data in a way that makes sense to you.
-
-
-## Data processing
-
-With dlt you can avoid all the boilerplate code that comes with data processing. We let you define logical structures for your data and then use these structures, deduplicated, incremental and replayable
-
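+
+As an example, bringing your own Pydantic data model can look like this (a sketch assembled from the config and quickstart docs; the `SupportTicket` model and the dataset name are hypothetical):
+
+```python
+import asyncio
+
+import cognee
+from pydantic import BaseModel
+
+# Hypothetical data model for a support-ticket vertical.
+class SupportTicket(BaseModel):
+    product: str
+    severity: str
+    summary: str
+
+async def main():
+    cognee.config.llm_api_key = "YOUR_OPENAI_API_KEY"
+    # Classify incoming content with our own model instead of the default one.
+    cognee.config.set_classification_model(SupportTicket)
+    await cognee.add("The checkout page crashes when a coupon is applied.", "support_tickets")
+    await cognee.cognify()
+
+asyncio.run(main())
+```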