
Pipelines

Orchestrating tasks into coordinated workflows for data processing

What pipelines are

Pipelines coordinate ordered Tasks into a reproducible workflow. Default Cognee operations like Add and Cognify run on top of the same execution layer. You typically do not call low-level functions directly; you trigger pipelines through these operations.
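
For example, the two calls below each trigger a pipeline under the hood. This is a minimal sketch; defaults such as the target dataset depend on your configuration.

```python
import asyncio

import cognee


async def main():
    # Add runs the ingestion pipeline for this piece of text.
    await cognee.add("Cognee turns documents into a knowledge graph.")

    # Cognify runs the processing pipeline over everything added above.
    await cognee.cognify()


asyncio.run(main())
```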

Prerequisites

  • Dataset: a container (name or UUID) where your data is stored and processed. Every document added to cognee belongs to a dataset.
  • User: the identity for ownership and access control. A default user is created and used if none is provided.
  • Users and datasets are described in more detail in the dedicated sections below
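
A minimal sketch of how both prerequisites show up in code. The `get_default_user` helper's import path and the exact keyword names (`dataset_name`, `datasets`, `user`) are assumptions about the current API; both the dataset and the user can be omitted to fall back to the defaults.

```python
import asyncio

import cognee
# Assumed import path for the default-user helper -- verify against the cognee source.
from cognee.modules.users.methods import get_default_user


async def main():
    # If no user is passed to add/cognify, this default user is created and used anyway.
    user = await get_default_user()

    await cognee.add("Quarterly report text", dataset_name="reports", user=user)
    await cognee.cognify(datasets=["reports"], user=user)


asyncio.run(main())
```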

How pipelines run

Somewhat unsurprisingly, the function used to run pipelines is called `run_pipeline`.

Cognee uses a layered execution model: a single call to `run_pipeline` orchestrates multi-dataset processing by running per-file pipelines through the sequence of tasks.

  • Statuses are yielded as the pipeline runs and written to databases where appropriate
  • User access to datasets and files is carefully verified at each layer
  • Pipeline run information includes dataset IDs, completion status, and error handling
  • Background execution uses queues to manage status updates and avoid database conflicts
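
The sketch below shows roughly what a direct call could look like. The import path (`cognee.modules.pipelines`) and the parameter names are assumptions based on the behavior described on this page, not a guaranteed API; check the source for the current signature.

```python
import asyncio

import cognee
# Assumed import path -- check the cognee source for the exact location of these helpers.
from cognee.modules.pipelines import Task, run_pipeline


async def annotate(data):
    # A minimal custom task: tasks are plain (async) callables whose output
    # is passed to the next task in the ordered sequence.
    return data


async def main():
    await cognee.add("Some raw text to process", dataset_name="demo")

    # run_pipeline yields status updates as each dataset moves through the tasks.
    # The parameter names below (tasks, datasets, pipeline_name) are assumptions
    # based on the description above, not a guaranteed signature.
    async for run_info in run_pipeline(
        tasks=[Task(annotate)],
        datasets=["demo"],
        pipeline_name="custom_pipeline",
    ):
        print(run_info)


asyncio.run(main())
```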

Execution layers

  • Innermost layer: individual task execution with telemetry and recursive task running in batches
  • Middle layer: per-dataset pipeline management and task orchestration
  • Outermost layer: multi-dataset orchestration and overall pipeline execution
  • Execution modes: blocking (wait for completion) or background (return immediately with a "started" status)

Customizing pipelines

  • Use [Cognify](../main-operations/cognify) with custom tasks after [Add](../main-operations/add)
  • Modify transformation steps without touching low-level functions; avoid going below `run_pipeline`
  • Custom tasks let you extend or replace default behavior

Users

  • Identity: represents who owns and acts on data. If omitted, a default user is used
  • Ownership: every ingested item is tied to a user; content is deduplicated per owner
  • Permissions: enforced per dataset (read/write/delete/share) during processing and API access

Datasets

  • Container: a named or UUID-scoped collection of related data and derived knowledge
  • Scoping: Add writes into a specific dataset; Cognify processes the dataset(s) you pass (see the sketch at the end of this page)
  • Lifecycle: new names create datasets and grant the calling user permissions; UUIDs let you target existing datasets (given permission)

Learn more

  • Learn about the individual processing units that make up pipelines
  • Understand the structured outputs that pipelines produce
  • See how pipelines are used in Add, Cognify, and Search workflows
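
For reference, here is the dataset-scoping sketch mentioned above. It is a minimal, non-authoritative example; the `dataset_name` and `datasets` keyword arguments are assumptions based on the descriptions on this page.

```python
import asyncio

import cognee


async def main():
    # Add writes into a specific dataset; new names create the datasets
    # and grant the calling user permissions on them.
    await cognee.add("Notes about topic A", dataset_name="topic_a")
    await cognee.add("Notes about topic B", dataset_name="topic_b")

    # Cognify processes only the dataset(s) you pass; "topic_b" stays untouched here.
    await cognee.cognify(datasets=["topic_a"])


asyncio.run(main())
```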
