cognee/docs/kr/core-concepts/further-concepts/datasets.md
HectorSin fbead80a36 docs: setup documentation structure for i18n (en/ko)
Signed-off-by: HectorSin <kkang15634@ajou.ac.kr>
2026-01-14 12:17:24 +09:00


Datasets

Project-level containers for organization, permissions, and processing

What is a dataset in Cognee?

A dataset is a named container that groups documents and their metadata. It is the main boundary for:

  • Organizing content
  • Running pipelines
  • Applying permissions

**Dataset isolation** requires specific configuration. See the [permissions system](../permissions-system/datasets#dataset-isolation) documentation for access control requirements and supported database setups.

  • Add:

    • Direct new content into a specific dataset (by name or ID)
    • If the dataset doesn't exist, Cognee creates it and associates your permissions with it
    • Items ingested are linked to that dataset and deduplicated within it
  • Cognify:

    • Choose which dataset(s) to transform into a knowledge graph
    • Cognee loads each dataset's content, checks your access rights, and runs the pipeline per dataset
    • If none are specified, Cognify processes all datasets you're authorized to use
    • Progress is tracked per dataset for reliable re-runs
  • Search:

    • Queries can be scoped by dataset
    • Results and metrics remain separated by dataset
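
The Add and Search behavior above can be illustrated with a small in-memory model. This is a minimal sketch, not Cognee's implementation or API: `ToyStore`, its methods, and the choice of a SHA-256 content hash for deduplication are all assumptions made for illustration. It only shows the scoping rules: a dataset is created on first use, duplicates are dropped within a dataset but not across datasets, and search results stay grouped by dataset.

```python
import hashlib
from collections import defaultdict


class ToyStore:
    """Toy stand-in for dataset-scoped storage (hypothetical, not Cognee's API)."""

    def __init__(self):
        # dataset name -> {content hash: text}
        self.datasets = defaultdict(dict)

    def add(self, text, dataset_name):
        """Add text to a dataset, creating it on first use. Return True if stored."""
        # Assumption: deduplication is keyed on a hash of the content.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        items = self.datasets[dataset_name]
        if key in items:
            return False  # duplicate within this dataset, skipped
        items[key] = text
        return True

    def search(self, term, datasets=None):
        """Return {dataset name: [matching texts]}, limited to the given datasets."""
        names = datasets if datasets is not None else list(self.datasets)
        return {
            name: [t for t in self.datasets[name].values() if term.lower() in t.lower()]
            for name in names
            if name in self.datasets
        }


store = ToyStore()
store.add("Graph databases store relationships.", "research")
store.add("Graph databases store relationships.", "research")   # skipped as a duplicate
store.add("Graph databases store relationships.", "marketing")  # kept: different dataset
results = store.search("graph", datasets=["research"])
```

Here `results` contains only the `research` entry, even though `marketing` holds the same text, because the query was scoped to one dataset.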

Access control

  • Permissions (read, write, share, delete) are enforced at the dataset level
  • Share one dataset with a team, keep another private
  • Independently manage who can modify or distribute content
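
As a rough illustration of dataset-level enforcement, the sketch below keeps a set of granted permissions per user on each dataset. The class `ToyDataset`, its `grant` and `allows` methods, and the user names are hypothetical; only the four permission names come from the text above. Cognee's actual permission model may be structured differently.

```python
from dataclasses import dataclass, field

# The four dataset-level permissions named in this page.
PERMISSIONS = {"read", "write", "share", "delete"}


@dataclass
class ToyDataset:
    """Hypothetical dataset record that carries its own access grants."""

    name: str
    # user -> set of permissions granted on this dataset
    grants: dict = field(default_factory=dict)

    def grant(self, user, *permissions):
        unknown = set(permissions) - PERMISSIONS
        if unknown:
            raise ValueError(f"unknown permissions: {sorted(unknown)}")
        self.grants.setdefault(user, set()).update(permissions)

    def allows(self, user, permission):
        return permission in self.grants.get(user, set())


team_docs = ToyDataset("team_docs")
private_notes = ToyDataset("private_notes")

# Alice owns both datasets; Bob can only read the shared one.
team_docs.grant("alice", "read", "write", "share", "delete")
team_docs.grant("bob", "read")
private_notes.grant("alice", "read", "write", "share", "delete")
```

Because grants live on the dataset, sharing `team_docs` with Bob has no effect on `private_notes`.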

Incremental processing

  • Processing status is tracked per dataset
  • After you add more data, Cognify focuses on new or changed items
  • Cognify skips what's already completed for that dataset
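
One way to picture per-dataset incremental processing is a status table keyed by dataset and item. The sketch below is an assumption-laden toy, not how Cognify is implemented: `ToyPipeline`, the item ids, and the use of a content hash to detect changed items are all hypothetical.

```python
import hashlib


class ToyPipeline:
    """Hypothetical pipeline that remembers what it finished, per dataset."""

    def __init__(self):
        # dataset name -> {item id: content hash at the last completed run}
        self.processed = {}

    def run(self, dataset_name, items):
        """Process only new or changed items and return their ids."""
        done = self.processed.setdefault(dataset_name, {})
        processed_now = []
        for item_id, text in items.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if done.get(item_id) == digest:
                continue  # unchanged and already completed for this dataset
            # Real processing work would happen here.
            done[item_id] = digest
            processed_now.append(item_id)
        return processed_now


pipeline = ToyPipeline()
docs = {"a": "first document", "b": "second document"}
first_run = pipeline.run("research", docs)

docs["c"] = "third document"           # new item
docs["a"] = "first document, revised"  # changed item
second_run = pipeline.run("research", docs)

# Status is tracked per dataset, so another dataset starts from scratch.
other_run = pipeline.run("marketing", docs)
```

The second run on `research` touches only the changed item `a` and the new item `c`, while the first run on `marketing` processes everything.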

Datasets vs NodeSets

Datasets scope storage, permissions, and pipeline execution; NodeSets are semantic tags within a dataset.

  • During Add, you can label items with one or more NodeSet names (e.g., "AI", "FinTech")
  • Cognify propagates those labels into the graph by creating NodeSet nodes and linking derived chunks and entities via belongs_to_set relationships
  • This lets you slice a single dataset's graph by topic or team without creating new datasets, while dataset-level permissions still control overall access
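
The labeling described above can be sketched as a list of graph edges. Only the `belongs_to_set` relationship name and the example labels come from the text; `ToyGraph`, the `NodeSet:` id prefix, and the chunk ids are hypothetical, and the real graph schema may differ.

```python
class ToyGraph:
    """Hypothetical edge list for one dataset's graph."""

    def __init__(self):
        self.edges = []  # (source node, relationship, target node)

    def tag(self, nodes, node_sets):
        """Link each derived node to one NodeSet node per label."""
        for node in nodes:
            for name in node_sets:
                self.edges.append((node, "belongs_to_set", f"NodeSet:{name}"))

    def members(self, node_set):
        """Return the nodes linked to the given NodeSet, sorted by id."""
        target = f"NodeSet:{node_set}"
        return sorted(
            source
            for source, relationship, node in self.edges
            if relationship == "belongs_to_set" and node == target
        )


graph = ToyGraph()
graph.tag(["chunk-1", "chunk-2"], node_sets=["AI"])
graph.tag(["chunk-3"], node_sets=["AI", "FinTech"])

ai_nodes = graph.members("AI")
fintech_nodes = graph.members("FinTech")
```

Both slices come from the same dataset's graph, so whoever can read the dataset can query either NodeSet.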
