<!-- .github/pull_request_template.md -->
## Description
Github Actions job:
https://github.com/topoteretes/cognee/actions/runs/18199627173/job/51815009426?pr=1493
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):
## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->
## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
## Description
This pull request introduces a preview step to the `cognee delete`
command, fulfilling the requirements of issue #1366
When a user runs the delete command, it now first queries the database
to calculate the scope of the deletion and presents a summary (number of
datasets, data entries, users) before asking for final confirmation.
This improves the safety and usability of the command, preventing
accidental data loss.
This PR also adds the `--force` flag to bypass the preview, which is
useful for scripting and automation.
## Type of Change
- [x] New feature (non-breaking change that adds functionality)
- [ ] Bug fix (non-breaking change that fixes an issue)
## Changes Made
- **`cognee/cli/commands/delete_command.py`**: Modified to include the
preview logic. It now calls the counting function, displays the results,
and proceeds with deletion only after confirmation.
- **`cognee/modules/data/methods/get_deletion_counts.py`**: Added this
new file to contain the logic for querying the database and calculating
the deletion counts for datasets, data entries, and users.
## Testing
I have tested the changes through **Manual CLI Testing**: I ran the
`cognee delete` command with the `--dataset-name`, `--user-id`, and
`--all` flags to manually verify that the preview output is correct.
### Terminal Output
Here are screenshots of the command working with the all possible flags:
<img width="1898" height="1087" alt="cognee1"
src="https://github.com/user-attachments/assets/939aa4d0-748c-45e4-a2a6-f5e7982c1fc0"
/>
<img width="1788" height="748" alt="cognee2"
src="https://github.com/user-attachments/assets/213884be-cce1-4007-90f9-5e6d3a302ced"
/>
## Pre-submission Checklist
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my feature works
- [ ] I have not added or changed documentation (as it was not required
for this CLI change)
- [x] I have searched existing PRs to ensure this change has been
submitted already
- [x] I have linked the relevant issue in the description
## Related Issues
Fixes#1366
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
This commit introduces the preview functionality for the command. The preview displays a summary of what will be deleted before asking for user confirmation.
The feature is fully functional for the following flags:
- / : Correctly counts the number of data entries within the specified dataset.
- : Correctly counts the total number of datasets, data entries, and users in the system.
The logic for the flag is a work in progress. The current implementation uses a placeholder and needs a method to query a user directly by their ID to be completed.
<!-- .github/pull_request_template.md -->
## Description
Remove MacOS13 support and CI/CD tests
## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [x] Other (please specify): Remove MacOS13 support
## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
- Changed "mistralai==1.9.10" to "mistralai>=1.9.10" for more flexible versioning.
- Removed "mistralai" from the optional dependencies under "mistral".
- Expanded the "docs" dependency to include "pdf" support.
- Added "mistralai==1.9.10" to the dependencies in pyproject.toml.
- Updated sdist entries in uv.lock to remove unnecessary upload-time fields for various packages.
- Ensured consistency in package specifications across the project files.
<!-- .github/pull_request_template.md -->
## Description
ruff formatting
## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):
## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->
## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md -->
## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
I've just added a new PDF parser, AdvancedPdfLoader. It uses the
unstructured library and does a much better job of handling PDFs,
especially with its layout-aware parsing, table preservation, and image
handling.
I also built in a safeguard: if unstructured isn't installed or throws
an error, it'll automatically fall back to the old PyPdfLoader so it
won't just crash. All the related unit tests and project dependencies
are taken care of, too.
https://github.com/topoteretes/cognee/issues/1342
## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [x] Performance improvement
- [ ] Other (please specify):
## Changes Made
<!-- List the specific changes made in this PR -->
- Added AdvancedPdfLoader class for enhanced PDF processing using the
unstructured library.
- Integrated fallback mechanism to PyPdfLoader in case of unstructured
library import failure or exceptions.
- Updated supported loaders to include AdvancedPdfLoader.
- Added unit tests for AdvancedPdfLoader to ensure functionality and
error handling.
- Updated poetry.lock and pyproject.toml to include new dependencies and
versions.
## Testing
<!-- Describe how you tested your changes -->
pytest -v ./cognee/tests/test_advanced_pdf_loader.py
## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->
## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages
## Related Issues
<!-- Link any related issues using "Fixes #issue_number" or "Relates to
#issue_number" -->
## Additional Notes
<!-- Add any additional notes, concerns, or context for reviewers -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.