cognee/new-examples/configurations/database_examples/chromadb_vector_database_configuration.py
Hande 5f8a3e24bd
refactor: restructure examples and starter kit into new-examples (#1862)
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Documentation**
* Deprecated legacy examples and added a migration guide mapping old
paths to new locations
* Added a comprehensive new-examples README detailing configurations,
pipelines, demos, and migration notes

* **New Features**
* Added many runnable examples and demos: database configs,
embedding/LLM setups, permissions and access-control, custom pipelines
(organizational, product recommendation, code analysis, procurement),
multimedia, visualization, temporal/ontology demos, and a local UI
starter

* **Chores**
  * Updated CI/test entrypoints to use the new-examples layout

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
2025-12-20 02:07:28 +01:00

89 lines
3.3 KiB
Python

import os
import pathlib
import asyncio
import cognee
from cognee.modules.search.types import SearchType
async def main():
"""
Example script demonstrating how to use Cognee with ChromaDB
This example:
1. Configures Cognee to use ChromaDB as vector database
2. Sets up data directories
3. Adds sample data to Cognee
4. Processes (cognifies) the data
5. Performs different types of searches
"""
# Configure ChromaDB as the vector database provider
cognee.config.set_vector_db_config(
{
"vector_db_url": "http://localhost:8000", # Default ChromaDB server URL
"vector_db_key": "", # ChromaDB doesn't require an API key by default
"vector_db_provider": "chromadb", # Specify ChromaDB as provider
}
)
# Set up data directories for storing documents and system files
# You should adjust these paths to your needs
current_dir = pathlib.Path(__file__).parent
data_directory_path = str(current_dir / "data_storage")
cognee.config.data_root_directory(data_directory_path)
cognee_directory_path = str(current_dir / "cognee_system")
cognee.config.system_root_directory(cognee_directory_path)
# Clean any existing data (optional)
await cognee.prune.prune_data()
await cognee.prune.prune_system(metadata=True)
# Create a dataset
dataset_name = "chromadb_example"
# Add sample text to the dataset
sample_text = """ChromaDB is an open-source embedding database.
It allows users to store and query embeddings and their associated metadata.
ChromaDB can be deployed in various ways: in-memory, on disk via sqlite, or as a persistent service.
It is designed to be fast, scalable, and easy to use, making it a popular choice for AI applications.
The database is built to handle vector search efficiently, which is essential for semantic search applications.
ChromaDB supports multiple distance metrics for vector similarity search and can be integrated with various ML frameworks."""
# Add the sample text to the dataset
await cognee.add([sample_text], dataset_name)
# Process the added document to extract knowledge
await cognee.cognify([dataset_name])
# Now let's perform some searches
# 1. Search for insights related to "ChromaDB"
insights_results = await cognee.search(
query_type=SearchType.GRAPH_COMPLETION, query_text="ChromaDB"
)
print("\nInsights about ChromaDB:")
for result in insights_results:
print(f"- {result}")
# 2. Search for text chunks related to "vector search"
chunks_results = await cognee.search(
query_type=SearchType.CHUNKS, query_text="vector search", datasets=[dataset_name]
)
print("\nChunks about vector search:")
for result in chunks_results:
print(f"- {result}")
# 3. Get graph completion related to databases
graph_completion_results = await cognee.search(
query_type=SearchType.GRAPH_COMPLETION, query_text="database"
)
print("\nGraph completion for databases:")
for result in graph_completion_results:
print(f"- {result}")
# Clean up (optional)
# await cognee.prune.prune_data()
# await cognee.prune.prune_system(metadata=True)
if __name__ == "__main__":
asyncio.run(main())