# Add
> Ingesting and preparing data for processing in Cognee
## What is the add operation
The `.add` operation is how you bring content into Cognee. It takes your files, directories, or raw text, normalizes them into plain text, and registers them in a dataset that Cognee can later expand into vectors and graphs with [Cognify](../main-operations/cognify).
* **Ingestion-only**: no embeddings, no graph yet
* **Flexible input**: raw text, local files, directories, or S3 URIs
* **Normalized storage**: everything is turned into text and stored consistently
* **Deduplicated**: Cognee uses content hashes to avoid duplicates
* **Dataset-first**: everything you add goes into a dataset
* Datasets are how Cognee keeps different collections organized (e.g. "research-papers", "customer-reports")
* Each dataset has its own ID, owner, and permissions for access control
* You can read more about them below
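As a minimal sketch (assuming Cognee is installed and configured; the dataset name and inputs are illustrative), ingestion looks like:

```python
import asyncio

async def ingest() -> None:
    import cognee  # imported lazily so the sketch stays self-contained

    # Raw text and a local file can be mixed in a single call;
    # everything lands in the same dataset, ready for cognify.
    await cognee.add(
        [
            "Cognee turns documents into knowledge graphs.",
            "/path/to/report.pdf",  # illustrative path
        ],
        dataset_name="research-papers",
    )

# Run with: asyncio.run(ingest())
```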
## Where add fits
* First step before you run [Cognify](../main-operations/cognify)
* Use it to **create a dataset** from scratch, or **append new data** over time
* Ideal for both local experiments and programmatic ingestion from storage (e.g. S3)
## What happens under the hood
1. **Expand your input**
* Directories are walked, S3 paths are expanded, raw text is passed through
* Result: a flat list of items (files, text, handles)
2. **Ingest and register**
* Files are saved into Cognee's storage and converted to text
* Cognee computes a stable content hash to prevent duplicates
* Each item becomes a record in the database and is attached to your dataset
* **Text extraction**: Converts various file formats into plain text
* **Metadata preservation**: Keeps file information like source, creation date, and format
* **Content normalization**: Ensures consistent text encoding and formatting
3. **Return a summary**
* You get a pipeline run info object that tells you where everything went and which dataset is ready for the next step
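The expand-and-deduplicate steps above can be sketched in plain Python (illustrative only — Cognee's real loaders also handle S3 prefixes, binary formats, and database registration, and the helper names here are hypothetical):

```python
import hashlib
from pathlib import Path

def expand_inputs(items):
    """Step 1: flatten raw text, files, and directories into a list of texts."""
    out = []
    for item in items:
        p = Path(str(item))
        if p.is_dir():
            # Directories are walked recursively
            out.extend(f.read_text() for f in sorted(p.rglob("*")) if f.is_file())
        elif p.is_file():
            out.append(p.read_text())
        else:
            out.append(str(item))  # treat anything else as raw text
    return out

def dedupe_by_content_hash(texts):
    """Step 2: keep one copy per distinct content hash."""
    seen, unique = set(), []
    for t in texts:
        h = hashlib.sha256(t.encode("utf-8")).hexdigest()
        if h not in seen:
            seen.add(h)
            unique.append(t)
    return unique
```

The stable content hash is what lets you re-run `.add` on the same sources without creating duplicate records.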
## After add finishes
After `.add` completes, your data is ready for the next stage:
* **Files are safely stored** in Cognee's storage system with metadata preserved
* **Database records** track each ingested item and link it to your dataset
* **Dataset is prepared** for transformation with [Cognify](../main-operations/cognify) — which will chunk, embed, and connect everything
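Put together, the add-then-cognify flow looks roughly like this (a sketch assuming Cognee is configured; the dataset name and text are illustrative):

```python
import asyncio

async def build_knowledge_base() -> None:
    import cognee  # lazy import; sketch assumes cognee is installed

    # Step 1: ingest (this page).
    await cognee.add(
        "Q3 revenue grew 12% year over year.",
        dataset_name="customer-reports",
    )
    # Step 2: transform the same dataset into chunks, embeddings, and a graph.
    await cognee.cognify(datasets=["customer-reports"])

# Run with: asyncio.run(build_knowledge_base())
```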
## Further details
### Input types
* Mix and match: `["some text", "/path/to/file.pdf", "s3://bucket/data.csv"]`
* Works with directories (recursively), S3 prefixes, and file handles
* Local and cloud sources are normalized into the same format
### Supported formats
* **Text**: `.txt, .md, .csv, .json, …`
* **PDF**: `.pdf`
* **Images**: common formats like `.png, .jpg, .gif, .webp, …`
* **Audio**: `.mp3, .wav, .flac, …`
* **Office docs**: `.docx, .pptx, .xlsx, …`
* **Docling**: Cognee can also ingest the `DoclingDocument` format. Any format that [Docling](https://github.com/docling-project/docling) supports as input can be converted, then passed on to Cognee's add.
* Cognee chooses the right loader for each format under the hood
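One way to route Docling-supported inputs into Cognee is to convert first and pass the extracted text to `.add` (a sketch assuming both `docling` and `cognee` are installed; the path and dataset name are illustrative):

```python
import asyncio

async def add_via_docling(path: str) -> None:
    # Lazy imports keep the sketch self-contained.
    from docling.document_converter import DocumentConverter
    import cognee

    # Convert any Docling-supported input (PDF, DOCX, HTML, ...) to text,
    # then hand the result to add like any other content.
    result = DocumentConverter().convert(path)
    await cognee.add(
        result.document.export_to_markdown(),
        dataset_name="docling-imports",
    )

# Run with: asyncio.run(add_via_docling("report.pdf"))
```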
### Datasets
* A dataset is your "knowledge base" — a grouping of related data that makes sense together
* Datasets are **first-class objects in Cognee's database** with their own ID, name, owner, and permissions
* They provide **scope**: `.add` writes into a dataset, [Cognify](../main-operations/cognify) processes per-dataset
* Think of them as separate shelves in your library — e.g., a "research-papers" dataset and a "customer-reports" dataset
* If you name a dataset that doesn't exist, Cognee creates it for you; if you don't specify, a default one is used
### Users and ownership
* Every dataset and data item belongs to a user
* If you don't pass a user, Cognee creates/uses a default one
* Ownership controls who can later read, write, or share that dataset
### Node sets
* Optional labels to group or tag data on ingestion
* Example: `node_set=["AI", "FinTech"]`
* Useful later when you want to focus on subgraphs
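Passing labels at ingestion looks like this (a sketch assuming Cognee is configured; the text, dataset name, and labels are illustrative):

```python
import asyncio

async def add_with_labels() -> None:
    import cognee  # lazy import; sketch assumes cognee is installed

    # node_set tags the ingested item so later graph queries can
    # focus on the "AI" / "FinTech" subgraphs.
    await cognee.add(
        "FinTech startups are adopting AI-driven risk models.",
        dataset_name="research-papers",
        node_set=["AI", "FinTech"],
    )

# Run with: asyncio.run(add_with_labels())
```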