diff --git a/cognee/api/client.py b/cognee/api/client.py
index 74cefcb8e..9672dc47e 100644
--- a/cognee/api/client.py
+++ b/cognee/api/client.py
@@ -232,13 +232,15 @@ async def add(
datasetId,
)
return JSONResponse(
- status_code=200,
- content="OK"
+ status_code = 200,
+ content = {
+ "message": "OK"
+ }
)
except Exception as error:
return JSONResponse(
- status_code=409,
- content={"error": str(error)}
+ status_code = 409,
+ content = {"error": str(error)}
)
class CognifyPayload(BaseModel):
@@ -252,7 +254,9 @@ async def cognify(payload: CognifyPayload):
await cognee_cognify(payload.datasets)
return JSONResponse(
status_code = 200,
- content = "OK"
+ content = {
+ "message": "OK"
+ }
)
except Exception as error:
return JSONResponse(
diff --git a/docs/api_reference.md b/docs/api_reference.md
index c87028c52..8fe7ccbc6 100644
--- a/docs/api_reference.md
+++ b/docs/api_reference.md
@@ -1,122 +1,262 @@
-# cognee API Reference
+# Cognee API Reference
## Overview
+The Cognee API provides a set of endpoints for managing datasets, performing cognitive tasks, and configuring various settings in the system. The API is built on FastAPI and includes multiple routes to handle different functionalities. This reference outlines the available endpoints and their usage.
-The Cognee API has:
+## Base URL
-1. Python library configuration entry points
-2. FastAPI server
+The base URL for all API requests is determined by the server's deployment environment. Typically, this will be:
+- **Development**: `http://localhost:8000`
+- **Production**: depends on your deployment setup.
-## Python Library
+## Endpoints
-# Module: cognee.config
+### 1. Root
-This module provides functionalities to configure various aspects of the system's operation in the cognee library.
-It interfaces with a set of Pydantic settings singleton classes to manage the system's configuration.
+- **URL**: `/`
+- **Method**: `GET`
+- **Auth Required**: No
+- **Description**: Root endpoint that returns a welcome message.
+
+ **Response**:
+ ```json
+ {
+ "message": "Hello, World, I am alive!"
+ }
+ ```
-## Overview
-The config class in this module offers a series of static methods to configure the system's directories, various machine learning models, and other parameters.
+### 2. Health Check
+- **URL**: `/health`
+- **Method**: `GET`
+- **Auth Required**: No
+- **Description**: Health check endpoint that returns the server status.
+
+ **Response**:
+ ```json
+ {
+ "status": "OK"
+ }
+ ```
-## Methods
+### 3. Get Datasets
+- **URL**: `/datasets`
+- **Method**: `GET`
+- **Auth Required**: No
+- **Description**: Retrieve a list of available datasets.
+
+ **Response**:
+ ```json
+ [
+ {
+ "id": "dataset_id_1",
+ "name": "Dataset Name 1",
+ "description": "Description of Dataset 1",
+ ...
+ },
+ ...
+ ]
+ ```
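+
+ For illustration, a minimal Python sketch of calling this endpoint (it assumes the development base URL above and uses the `requests` library):
+ ```python
+ import requests
+
+ # List the available datasets (assumes a local development server).
+ response = requests.get("http://localhost:8000/datasets")
+ response.raise_for_status()
+ for dataset in response.json():
+     print(dataset["id"], dataset["name"])
+ ```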
-### system_root_directory(system_root_directory: str)
-Sets the root directory of the system where essential system files and operations are managed. Parameters:
-system_root_directory (str): The path to set as the system's root directory.
-Example:
-```python
-cognee.config.system_root_directory('/path/to/system/root')
-```
+### 4. Delete Dataset
-### data_root_directory(data_root_directory: str)
-Sets the directory for storing data used and generated by the system.
-Parameters:
-data_root_directory (str): The path to set as the data root directory.
+- **URL**: `/datasets/{dataset_id}`
+- **Method**: `DELETE`
+- **Auth Required**: No
+- **Description**: Delete a specific dataset by its ID.
+
+ **Path Parameters**:
+ - `dataset_id`: The ID of the dataset to delete.
+
+ **Response**:
+ ```json
+ {
+ "status": "OK"
+ }
+ ```
-Example:
-```python
-import cognee
-cognee.config.data_root_directory('/path/to/data/root')
-```
+### 5. Get Dataset Graph
-### set_classification_model(classification_model: object)
-Assigns a machine learning model for classification tasks within the system.
-Parameters:
-classification_model (object): The Pydantic model to use for classification.
-Check cognee.shared.data_models for existing models.
-Example:
-```python
-import cognee
-cognee.config.set_classification_model(model)
-```
+- **URL**: `/datasets/{dataset_id}/graph`
+- **Method**: `GET`
+- **Auth Required**: No
+- **Description**: Retrieve the graph visualization URL for a specific dataset.
+
+ **Path Parameters**:
+ - `dataset_id`: The ID of the dataset.
+
+ **Response**:
+ ```json
+ "http://example.com/path/to/graph"
+ ```
-set_summarization_model(summarization_model: object)
-Sets the Pydantic model to be used for summarization tasks.
-Parameters:
-summarization_model (object): The model to use for summarization.
-Check cognee.shared.data_models for existing models.
-Example:
-```python
-import cognee
-cognee.config.set_summarization_model(my_summarization_model)
-```
+### 6. Get Dataset Data
-### set_llm_model(llm_model: object)
-Determines the model to handle LLMs. Parameters:
-llm_model (object): The model to use for LLMs.
-Example:
-```python
-import cognee
-cognee.config.set_llm_model("openai")
-```
+- **URL**: `/datasets/{dataset_id}/data`
+- **Method**: `GET`
+- **Auth Required**: No
+- **Description**: Retrieve data associated with a specific dataset.
+
+ **Path Parameters**:
+ - `dataset_id`: The ID of the dataset.
+
+ **Response**:
+ ```json
+ [
+ {
+ "data_id": "data_id_1",
+ "content": "Data content here",
+ ...
+ },
+ ...
+ ]
+ ```
-### graph_database_provider(graph_engine: string)
-Sets the engine to manage graph processing tasks.
-Parameters:
-graph_database_provider (object): The engine for graph tasks.
-Example:
-```python
-from cognee.shared.data_models import GraphDBType
+### 7. Get Dataset Status
-cognee.config.set_graph_engine(GraphDBType.NEO4J)
-```
+- **URL**: `/datasets/status`
+- **Method**: `GET`
+- **Auth Required**: No
+- **Description**: Retrieve the status of one or more datasets.
+
+ **Query Parameters**:
+ - `dataset`: A list of dataset IDs to check status for.
+
+ **Response**:
+ ```json
+ {
+ "dataset_id_1": "Status 1",
+ "dataset_id_2": "Status 2",
+ ...
+ }
+ ```
+### 8. Get Raw Data
+- **URL**: `/datasets/{dataset_id}/data/{data_id}/raw`
+- **Method**: `GET`
+- **Auth Required**: No
+- **Description**: Retrieve the raw data file for a specific data entry in a dataset.
+
+ **Path Parameters**:
+ - `dataset_id`: The ID of the dataset.
+ - `data_id`: The ID of the data entry.
+
+ **Response**: Raw file download.
-## API
+### 9. Add Data
+- **URL**: `/add`
+- **Method**: `POST`
+- **Auth Required**: No
+- **Description**: Add new data to a dataset. The data can be uploaded from a file or a URL.
+
+ **Form Parameters**:
+ - `datasetId`: The ID of the dataset to add data to.
+ - `data`: A list of files to upload.
+ **Request** (`multipart/form-data`):
+ ```
+ datasetId: "ID_OF_THE_DATASET_TO_PUT_DATA_IN"   (optional, "main" is used as the default)
+ data: one or more files to upload
+ ```
+
+ **Response**:
+ ```json
+ {
+ "message": "OK"
+ }
+ ```
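+
+ For illustration, uploading a file from Python could look like the sketch below; the base URL, dataset ID, and file path are placeholders, and the field names follow the form parameters above:
+ ```python
+ import requests
+
+ # Upload a single file into the "main" dataset (assumes a local development server).
+ with open("test_data/artificial-intelligence.pdf", "rb") as pdf_file:
+     response = requests.post(
+         "http://localhost:8000/add",
+         data={"datasetId": "main"},
+         files=[("data", pdf_file)],
+     )
+ print(response.json())  # {"message": "OK"} on success
+ ```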
-For each API endpoint, provide the following details:
+### 10. Cognify
+- **URL**: `/cognify`
+- **Method**: `POST`
+- **Auth Required**: No
+- **Description**: Perform cognitive processing on the specified datasets.
+
+ **Request Body**:
+ ```json
+ {
+ "datasets": ["ID_OF_THE_DATASET_1", "ID_OF_THE_DATASET_2", ...]
+ }
+ ```
+
+ **Response**:
+ ```json
+ {
+ "message": "OK"
+ }
+ ```
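+
+ For illustration, triggering processing from Python (the dataset ID is a placeholder):
+ ```python
+ import requests
+
+ # Run cognitive processing on one dataset (assumes a local development server).
+ response = requests.post(
+     "http://localhost:8000/cognify",
+     json={"datasets": ["ID_OF_THE_DATASET_1"]},
+ )
+ print(response.json())  # {"message": "OK"} on success
+ ```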
+### 11. Search
-### Endpoint 1: Root
-- URL: /add
-- Method: POST
-- Auth Required: No
-- Description: Root endpoint that returns a welcome message.
+- **URL**: `/search`
+- **Method**: `POST`
+- **Auth Required**: No
+- **Description**: Search for nodes in the graph based on the provided query parameters.
+
+ **Request Body**:
+ ```json
+ {
+ "query_params": [{
+ "query": "QUERY_TO_MATCH_DATA",
+ "searchType": "SIMILARITY", // or TRAVERSE, ADJACENT, SUMMARY
+ }]
+ }
+ ```
+ **Response**
+ ```json
+ {
+ "results": [
+ {
+ "node_id": "node_id_1",
+ "attributes": {...},
+ ...
+ },
+ ...
+ ]
+ }
+ ```
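+
+ For illustration, issuing a search request from Python (the query text is a placeholder):
+ ```python
+ import requests
+
+ # Search for nodes matching the query (assumes a local development server).
+ payload = {
+     "query_params": [{
+         "query": "QUERY_TO_MATCH_DATA",
+         "searchType": "SIMILARITY",
+     }]
+ }
+ response = requests.post("http://localhost:8000/search", json=payload)
+ print(response.json())
+ ```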
-#### Response
-```json
-{
- "message": "Hello, World, I am alive!"
-}
-```
+### 12. Get Settings
-### Endpoint 1: Health Check
-- URL: /health
-- Method: GET
-- Auth Required: No
-- Description: Health check endpoint that returns the server status.
-#### Response
-```json
-{
- "status": "OK"
-}
-```
+- **URL**: `/settings`
+- **Method**: `GET`
+- **Auth Required**: No
+- **Description**: Retrieve the current system settings.
+
+ **Response**:
+ ```json
+ {
+ "llm": {...},
+ "vectorDB": {...},
+ ...
+ }
+ ```
-More endpoints are available in the FastAPI server. Documentation is in progress
+### 13. Save Settings
+
+- **URL**: `/settings`
+- **Method**: `POST`
+- **Auth Required**: No
+- **Description**: Save new settings for the system, including LLM and vector DB configurations.
+
+ **Request Body**:
+ - `llm`: Optional. The configuration for the LLM provider.
+ - `vectorDB`: Optional. The configuration for the vector database provider.
+
+ **Response**:
+ ```json
+ {
+ "status": "OK"
+ }
+ ```
diff --git a/docs/assets/favicon.png b/docs/assets/favicon.png
new file mode 100644
index 000000000..c3c39b7ed
Binary files /dev/null and b/docs/assets/favicon.png differ
diff --git a/docs/assets/logo.png b/docs/assets/logo.png
new file mode 100644
index 000000000..c37c647ea
Binary files /dev/null and b/docs/assets/logo.png differ
diff --git a/docs/conceptual_overview.md b/docs/conceptual_overview.md
index 3e4041626..f82ac22bc 100644
--- a/docs/conceptual_overview.md
+++ b/docs/conceptual_overview.md
@@ -32,21 +32,14 @@ Data Types and Their Handling
### Concept 2: Data Enrichment with LLMs
-LLMs are adept at processing unstructured data. They can easily extract summaries, keywords, and other useful information from documents. We use function calling with Pydantic models to extract the data and dspy to train our functions.
+LLMs are adept at processing unstructured data. They can easily extract summaries, keywords, and other useful information from documents. We use function calling with Pydantic models to extract information from the unstructured data.

Data Enrichment Example
We decompose the loaded content into graphs, allowing us to more precisely map out the relationships between entities and concepts.
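+
+A minimal sketch of what such an extraction schema can look like (the model below is illustrative, not cognee's actual schema; cognee ships its own models in cognee.shared.data_models):
+
+```python
+from pydantic import BaseModel
+
+# Illustrative extraction schema, not cognee's actual one.
+class EnrichedDocument(BaseModel):
+    summary: str
+    keywords: list[str]
+
+# Via function calling, the LLM is asked to return its output as this structured type
+# instead of free-form text, so the result can be validated and written into the graph.
+```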
-### Concept 3: Linguistic Analysis
-LLMs are probabilistic models, meaning they can make mistakes.
-To mitigate this, we can use a combination of NLP and LLMs to determine how to analyze the data and score each part of the text.
-
-
-Linguistic analysis
-
-### Concept 4: Graphs
+### Concept 3: Graphs
Knowledge graphs simply map out knowledge, linking specific facts and their connections.
When Large Language Models (LLMs) process text, they infer these links, leading to occasional inaccuracies due to their probabilistic nature.
@@ -57,11 +50,12 @@ This structured approach can extend beyond concepts to document layouts, pages,

Graph Structure
-### Concept 5: Vector and Graph Retrieval
+
+### Concept 4: Vector and Graph Retrieval
Cognee lets you use multiple vector and graph retrieval methods to find the most relevant information.
!!! info "Learn more?"
Check out learning materials to see how you can use these methods in your projects.
-### Concept 6: Auto-Optimizing Pipelines
+### Concept 5: Auto-Optimizing Pipelines
Integrating knowledge graphs into Retrieval-Augmented Generation (RAG) pipelines leads to an intriguing outcome: the system's adeptness at contextual understanding allows it to be evaluated in a way Machine Learning (ML) engineers are accustomed to.
This involves bombarding the RAG system with hundreds of synthetic questions, enabling the knowledge graph to evolve and refine its context autonomously over time.
@@ -80,10 +74,9 @@ Main components:
- **Data Pipelines**: Responsible for ingesting, processing, and transforming data from various sources.
- **LLMs**: Large Language Models that process unstructured data and generate text.
-- **Graphs**: Knowledge graphs that represent relationships between entities and concepts.
-- **Vector Stores**: Databases that store vector representations of data for efficient retrieval.
-- **dspy module**: Pipelines that automatically adjust based on feedback and data changes.
-- **Search wrapper**: Retrieves relevant information from the knowledge graph and vector stores.
+- **Graph Store**: Knowledge graphs that represent relationships between entities and concepts.
+- **Vector Store**: Database that stores vector representations of data for efficient retrieval.
+- **Search**: Retrieves relevant information from the knowledge graph and vector stores.
## How It Fits Into Your Projects
diff --git a/docs/configuration.md b/docs/configuration.md
new file mode 100644
index 000000000..a49e8f84e
--- /dev/null
+++ b/docs/configuration.md
@@ -0,0 +1,93 @@
+# Configuration
+
+## 🚀 Configure Vector and Graph Stores
+
+You can configure the vector and graph stores using environment variables in your .env file or programmatically.
+We use [Pydantic Settings](https://docs.pydantic.dev/latest/concepts/pydantic_settings/#dotenv-env-support) to load and validate the configuration.
+
+There is a global configuration object (cognee.config), as well as individual configurations at the pipeline and data store levels.
+
+Check available configuration options:
+```python
+from cognee.infrastructure.databases.vector import get_vectordb_config
+from cognee.infrastructure.databases.graph.config import get_graph_config
+from cognee.infrastructure.databases.relational import get_relational_config
+print(get_vectordb_config().to_dict())
+print(get_graph_config().to_dict())
+print(get_relational_config().to_dict())
+
+```
+
+Set the environment variables in your .env file and Pydantic will pick them up:
+
+```bash
+GRAPH_DATABASE_PROVIDER = 'neo4j'
+
+```
+Alternatively, you can set the configuration in code:
+
+```python
+import cognee
+
+cognee.config.llm_provider = 'ollama'
+```
+
+## 🚀 Getting Started with Local Models
+
+You'll need to run the local model on your machine or use one of the providers hosting the model.
+!!! note "Model recommendation"
+    We had some success with Mixtral, but 7B models did not work well. We recommend using Mixtral for now.
+
+### Ollama
+
+Set up Ollama by following instructions on [Ollama website](https://ollama.com/)
+
+
+Set the environment variable in your .env file to use the model:
+
+```bash
+LLM_PROVIDER = 'ollama'
+
+```
+Alternatively, you can set the provider in code:
+
+```python
+import cognee
+
+cognee.config.llm_provider = 'ollama'
+
+```
+You can also set the host and model name:
+
+```python
+
+cognee.config.llm_endpoint = "http://localhost:11434/v1"
+cognee.config.llm_model = "mistral:instruct"
+```
+
+
+### Anyscale
+
+Set the environment variable in your .env file to use a provider with its own API endpoint:
+
+```bash
+LLM_PROVIDER = 'custom'
+
+```
+Alternatively, you can set the provider in code:
+
+```python
+import cognee
+
+cognee.config.llm_provider = 'custom'
+
+```
+You can also set the model name, endpoint, and API key:
+```bash
+LLM_MODEL = "mistralai/Mixtral-8x7B-Instruct-v0.1"
+LLM_ENDPOINT = "https://api.endpoints.anyscale.com/v1"
+LLM_API_KEY = "your_api_key"
+```
+
+You can set the endpoint and model name in the same way for any other provider that offers an API endpoint.
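+
+As a sketch, the programmatic equivalent of the Anyscale setup above might look like this (the `llm_api_key` attribute name is an assumption that mirrors the `LLM_API_KEY` environment variable):
+
+```python
+import cognee
+
+# Assumed attribute names, mirroring the environment variables above.
+cognee.config.llm_provider = 'custom'
+cognee.config.llm_model = "mistralai/Mixtral-8x7B-Instruct-v0.1"
+cognee.config.llm_endpoint = "https://api.endpoints.anyscale.com/v1"
+cognee.config.llm_api_key = "your_api_key"
+```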
diff --git a/docs/data_ingestion.md b/docs/data_ingestion.md
new file mode 100644
index 000000000..ff2e7309b
--- /dev/null
+++ b/docs/data_ingestion.md
@@ -0,0 +1,46 @@
+# How data ingestion with cognee works
+
+## Why bother with data ingestion?
+
+In order to use cognee, you need to ingest data into the cognee data store.
+This data can be events, customer data, or third-party data.
+
+In order to build reliable models and pipelines, we need to structure and process various types of datasets and data sources in the same way.
+Some of the operations like normalization, deduplication, and data cleaning are common across all data sources.
+
+
+This is where cognee comes in. It provides a unified interface to ingest data from various sources and process it in a consistent way.
+For this, we use dlt (data load tool), which is part of the cognee infrastructure.
+
+
+## Example
+
+Let's say you have a dataset of customer reviews in a PDF file. You want to ingest this data into cognee and use it to enrich the context you provide to an LLM.
+
+You can use the following code to ingest the data:
+
+```python
+import os
+import pathlib
+
+import cognee
+
+dataset_name = "artificial_intelligence"
+
+# Resolve the PDF next to this script and add it to the dataset.
+ai_text_file_path = os.path.join(pathlib.Path(__file__).parent, "test_data/artificial-intelligence.pdf")
+await cognee.add([ai_text_file_path], dataset_name)  # run inside an async function
+
+```
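+
+Once the data is added, the usual next step is to run cognitive processing on the same dataset. A sketch (it assumes the `dataset_name` from above and an async context):
+
+```python
+# Build the knowledge graph and vector index from the ingested dataset.
+await cognee.cognify([dataset_name])
+```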
+
+cognee ingests data through dlt, which lets you load from:
+
+1. SQL databases. Supports PostgreSQL, MySQL, MS SQL Server, BigQuery, Redshift, and more.
+2. REST API generic source. Loads data from REST APIs using declarative configuration.
+3. OpenAPI source generator. Generates a source from an OpenAPI 3.x spec using the REST API source.
+4. Cloud and local storage. Retrieves data from AWS S3, Google Cloud Storage, Azure Blob Storage, local files, and more.
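+
+To make dlt's role concrete, here is a minimal standalone dlt pipeline (plain dlt usage for illustration; it is not part of cognee's public API and loads into a local DuckDB file):
+
+```python
+import dlt
+
+# A tiny dlt pipeline: load a couple of records into a local DuckDB destination.
+pipeline = dlt.pipeline(
+    pipeline_name="customer_reviews",
+    destination="duckdb",
+    dataset_name="reviews",
+)
+load_info = pipeline.run(
+    [{"review_id": 1, "text": "Great product!"}, {"review_id": 2, "text": "Could be better."}],
+    table_name="reviews_raw",
+)
+print(load_info)
+```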
+
+## What happens under the hood?
+
+We use dlt as a loader to ingest data into the cognee metadata store. We can ingest data from various sources like SQL databases, REST APIs, OpenAPI specs, and cloud storage.
+This enables us to have a common data model we can then use to build models and pipelines.
+The models and pipelines we build in this way end up in the cognee data store, which is a unified interface to access the data.
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md
index 4755a8e71..7787521b2 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,381 +1,31 @@
-# cognee
+# New to cognee?
+The getting started guide covers adding a cognee data store to your AI app, sending data, identifying users, extracting actions and insights, and interconnecting separate datasets.
+[Get started](quickstart.md)
-#### Deterministic LLMs Outputs for AI Engineers
+## Ingest Data
+Learn how to manage the ingestion of events, customer data, or third-party data for use with cognee.
+[Explore](data_ingestion.md)
-_Open-source framework for loading and structuring LLM context to create accurate and explainable AI solutions using knowledge graphs and vector stores_
+## Tasks and Pipelines
+Analyze and enrich your data and improve LLM answers with a series of tasks and pipelines.
+[Learn about tasks](templates.md)
----
+## API
+Push or pull data to build custom functionality or create bespoke views for your business needs.
+[Explore](api_reference.md)
-[](https://twitter.com/tricalt)
+## Resources
-[](https://pypi.python.org/pypi/cognee)
+- [Research](research.md)
+- [Community](https://discord.gg/52QTb5JK){:target="_blank"}
-
-[](https://github.com/topoteretes/cognee)
-
-
-### Let's learn about cogneeHub!
-
-
-cogneeHub is a free, open-source learning platform for those interested in creating deterministic LLM outputs. We help developers by using graphs, LLMs, and adding vector retrieval to their Machine Learning stack.
-
-
-- **Get started** — [Get started with cognee quickly and try it out for yourself.](quickstart.md)
-
-- **Conceptual Overview** — Learn about the [core concepts](conceptual_overview.md) of cognee and how it fits into your projects.
-
-- **Data Engineering and LLMOps** — Learn about some [data engineering and llmops](data_engineering_llm_ops.md) core concepts that will help you build better AI apps.
-
-- **RAGs** — We provide easy-to-follow [learning materials](rags.md) to help you learn about RAGs.
-
-- **Research** — A list of resources to help you learn more about [cognee and LLM memory research](research.md)
-
-- **Blog** — A blog where you can read about the [latest news and updates](blog/index.md) about cognee.
-
-- **Support** — [Book time](https://www.cognee.ai/#bookTime) with our team.
-
-
-[//]: # (- **Case Studies** — Read about [case studies](case_studies.md) that show how cognee can be used in real-world applications.)
-
-
-### Vision
-
-
-
-
-
-
-### Architecture
-
-
-
-
-### Why use cognee?
-
-
-The question of using cognee is fundamentally a question of why to have deterministic outputs for your llm workflows.
-
-
-1. **Cost-effective** — cognee extends the capabilities of your LLMs without the need for expensive data processing tools.
-
-
-2. **Self-contained** — cognee runs as a library and is simple to use
-
-
-3. **Interpretable** — Navigate graphs instead of embeddings to understand your data.
-
-
-4. **User Guided** — cognee lets you control your input and provide your own Pydantic data models
-
-
-
-
-## License
-
-
-This project is licensed under the terms of the Apache License 2.0.
-
-
-[//]: # ()
-
-[//]: # ()
-[//]: # ()
-[//]: # (# New to cognee?)
-
-[//]: # ()
-[//]: # ()
-[//]: # (The getting started guide covers adding a GraphRAG data store to your AI app, sending events, identifying users, extracting actions and insights, and interconnecting separate datasets.)
-
-[//]: # ()
-[//]: # ()
-[//]: # (