Commit graph

3693 commits

Author SHA1 Message Date
Geoff-Robin
fcd91a9709 Added self as an argument to all previous methods that were static methods 2025-10-07 21:51:26 +05:30
Boris
c2698094c6
fix: frontend process output streaming stuck due to incorrect output (#1503)

## Description

When opening the frontend subprocess, `cognee-cli -ui` accidentally enabled
decoding of the process output into text.

## What happens exactly

On the surface, the frontend UI gets stuck loading.

The frontend process hangs because its output is not being processed (we're
expecting bytes).

## This change

This change removes the `text=True` that was added to the frontend subprocess.
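
A minimal sketch of the behavior described above, not the actual cognee code: without `text=True`, `subprocess.Popen` yields `bytes` from its pipes, so a reader that expects bytes keeps working and decodes explicitly only where text is needed. The `stream_output` helper and the demo command are hypothetical stand-ins for the frontend process.

```python
import subprocess
import sys


def stream_output(command):
    """Run a command and return its stdout lines, decoded explicitly from bytes."""
    lines = []
    # No text=True here: the pipe yields bytes, which is what the reader expects.
    with subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    ) as process:
        for raw_line in process.stdout:
            # raw_line is bytes; decode at the point where text is actually needed.
            lines.append(raw_line.decode("utf-8", errors="replace").rstrip())
    return lines


# Hypothetical stand-in for the frontend process.
demo_command = [sys.executable, "-c", "print('frontend ready')"]
```

Calling `stream_output(demo_command)` reads the child's output line by line until the pipe closes, so the child never blocks on an unread pipe.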

## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-07 18:13:40 +02:00
Igor Ilic
38cdacbcb6
fix: Resolve issue with Gemini adapter (#1494)

## Description
Resolve Gemini adapter issues:
1. Resolve the embedding batch issue.
2. Resolve slowness caused by the Gemini tokenizer sending text word by word
to Google's API to count tokens (token counting for Gemini now uses OpenAI's
local tokenizer).
3. Update the deprecated library and move to instructor.
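
The tokenizer change in item 2 can be sketched roughly as follows. This is an illustration under assumptions, not the adapter's code: `count_tokens_locally` and the regex-based `simple_encode` are hypothetical stand-ins, and in practice a local tokenizer's encode function would be passed in instead. The point is that the count is computed in-process with one call per text, rather than one remote request per word.

```python
import re
from typing import Callable, List


def simple_encode(text: str) -> List[str]:
    """Hypothetical stand-in for a local tokenizer: splits into words and punctuation."""
    return re.findall(r"\w+|[^\w\s]", text)


def count_tokens_locally(text: str, encode: Callable[[str], list] = simple_encode) -> int:
    """Count tokens with a single in-process call and no network round trips."""
    return len(encode(text))


sample = "Count tokens locally, not word by word."
token_count = count_tokens_locally(sample)
```

Keeping `encode` as a parameter lets the stand-in be swapped for a real local tokenizer without changing the callers.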

## Type of Change
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-07 18:04:18 +02:00
Geoff-Robin
3d53e8d6f1 Removed print statement that I used for debugging 2025-10-07 20:59:19 +05:30
Geoff-Robin
d91ffa2ad6 Removed staticmethod decorator from bs4_crawler.py, kwargs from the function signature in save_data_item_to_storage.py, removed unused imports in ingest_data.py and added robots_cache_ttl as a config field in BeautifulSoupCrawler. 2025-10-07 20:56:23 +05:30
Igor Ilic
9ae3c97aef
Merge branch 'dev' into update-endpoint 2025-10-07 11:27:07 +02:00
Geoff-Robin
fdf85628c7 Added uv.lock again 2025-10-07 01:40:19 +05:30
Geoff-Robin
f71cf774d2 . 2025-10-07 01:34:40 +05:30
Geoff-Robin
902f9a3b6a Changed cognee-mcp\pyproject.toml 2025-10-07 01:26:09 +05:30
Geoff-Robin
b5a1957b0f Regenerate uv.lock after merge 2025-10-07 01:22:39 +05:30
Geoff-Robin
5dcd7e512f Changes uv.lock 2025-10-07 01:09:41 +05:30
Geoff-Robin
1f36dd3d71 Solved nitpick comments 2025-10-06 19:44:54 +05:30
Geoff-Robin
54f2580f2d Solved more nitpick comments 2025-10-06 19:02:11 +05:30
Daulet Amirkhanov
263a8f4376 fix: frontend process output streaming stuck due to incorrect output 2025-10-06 14:18:39 +01:00
Daulet Amirkhanov
24d0bec025
Merge branch 'dev' into chore/update-cognee-ui-cli-mcp-docker-image 2025-10-06 14:13:03 +01:00
Geoff-Robin
1c0e0f0fe1 Solved more nitpick comments 2025-10-06 18:32:10 +05:30
Geoff-Robin
d4ce340cb5 Removed unused imports 2025-10-06 18:31:08 +05:30
Geoff-Robin
7fe1de770d Remove assignment to unused variable graph_db 2025-10-06 18:29:58 +05:30
Geoff-Robin
0a9b624010 changed return type for fetch_page_content to Dict[str,str] 2025-10-06 18:27:54 +05:30
Geoff-Robin
3c9e5f830b Solved more nitpick comments 2025-10-06 18:16:31 +05:30
Geoff-Robin
791e38b2c0 Solved more nitpick comments 2025-10-06 18:00:20 +05:30
Geoff-Robin
1b5c099f8b CodeRabbit reviews solved 2025-10-06 17:15:25 +05:30
Geoff-Robin
ae740eda96 Added related documentation 2025-10-06 04:23:10 +05:30
Geoff-Robin
667bbd775e Added cron job and removed obvious comments 2025-10-06 04:12:32 +05:30
Geoff-Robin
4d5146c802 Added Documentation 2025-10-06 04:00:15 +05:30
Geoff-Robin
0f64f6804d Done adding cron job web scraping 2025-10-06 03:45:09 +05:30
Geoff-Robin
e5633bc368 corrected F402 error pointed out by ruff check 2025-10-06 03:44:24 +05:30
Geoff-Robin
f449fce0f1 Done with scraping_task successfully 2025-10-06 02:27:20 +05:30
Geoff-Robin
f148b1df89 Added support for multiple base_url extraction 2025-10-05 20:13:44 +05:30
Geoff-Robin
77ea7c4b1d Added APScheduler 2025-10-05 20:02:02 +05:30
Geoff-Robin
c2aa95521c removed structured argument 2025-10-05 20:00:19 +05:30
Geoff-Robin
2cba31a086 Tested and Debugged scraping usage in cognee.add() pipeline 2025-10-04 21:26:25 +05:30
Geoff-Robin
ab6fc65406 Added global context for bs4crawler and tavily config 2025-10-04 19:40:37 +05:30
Geoff-Robin
da7ebc4574 Removed asyncio import 2025-10-04 15:10:46 +05:30
Geoff-Robin
fbef6675bc removed unused Dict import from typing 2025-10-04 15:10:05 +05:30
Geoff-Robin
20fb77316c Done with integration with add workflow when incremental_loading is set to False 2025-10-04 15:01:13 +05:30
Geoff-Robin
1ab9d24cf0 Changed bs4_connector.py to bs4_crawler.py 2025-10-03 12:33:13 +05:30
Daulet Amirkhanov
ee45afed42
Fix test_cli_edge_cases, test_delete_all_with_user_id unit test (#1493)
## Description

GitHub Actions job:
https://github.com/topoteretes/cognee/actions/runs/18199627173/job/51815009426?pr=1493


## Type of Change
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-02 18:00:00 +01:00
Daulet Amirkhanov
38070c489b fix test_cli_edge_cases.py, test_delete_all_with_user_id unit test 2025-10-02 17:44:01 +01:00
Geoff-Robin
edd119ef97 first iteration of bs4_connector.py done 2025-10-02 22:04:50 +05:30
Daulet Amirkhanov
2efce6949b
Feature/delete preview (#1385)
## Description

This pull request introduces a preview step to the `cognee delete`
command, fulfilling the requirements of issue #1366

When a user runs the delete command, it now first queries the database
to calculate the scope of the deletion and presents a summary (number of
datasets, data entries, users) before asking for final confirmation.
This improves the safety and usability of the command, preventing
accidental data loss.

This PR also adds the `--force` flag to bypass the preview, which is
useful for scripting and automation.
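
The preview-then-confirm flow described above could look roughly like the sketch below. It is only an illustration under assumptions: `DeletionCounts`, `format_preview`, and `confirm_deletion` are hypothetical names, and the real command's prompts and output format may differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DeletionCounts:
    """Hypothetical summary of what a delete would remove."""
    datasets: int
    data_entries: int
    users: int


def format_preview(counts: DeletionCounts) -> str:
    """Build the one-line summary shown before asking for confirmation."""
    return (
        "About to delete: "
        f"{counts.datasets} dataset(s), "
        f"{counts.data_entries} data entries, "
        f"{counts.users} user(s)"
    )


def confirm_deletion(
    counts: DeletionCounts,
    force: bool = False,
    ask: Callable[[str], str] = input,
) -> bool:
    """Return True if the deletion should proceed."""
    if force:
        # With --force, skip both the preview and the prompt.
        return True
    print(format_preview(counts))
    answer = ask("Proceed? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

Passing the prompt function as `ask` keeps the confirmation step testable without a terminal, which also suits the scripting use case that `--force` targets.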

## Type of Change

- [x] New feature (non-breaking change that adds functionality)
- [ ] Bug fix (non-breaking change that fixes an issue)

## Changes Made

- **`cognee/cli/commands/delete_command.py`**: Modified to include the
preview logic. It now calls the counting function, displays the results,
and proceeds with deletion only after confirmation.
- **`cognee/modules/data/methods/get_deletion_counts.py`**: Added this
new file to contain the logic for querying the database and calculating
the deletion counts for datasets, data entries, and users.

## Testing

I have tested the changes through **Manual CLI Testing**: I ran the
`cognee delete` command with the `--dataset-name`, `--user-id`, and
`--all` flags to manually verify that the preview output is correct.

### Terminal Output
Here are screenshots of the command working with all possible flags:
<img width="1898" height="1087" alt="cognee1"
src="https://github.com/user-attachments/assets/939aa4d0-748c-45e4-a2a6-f5e7982c1fc0"
/>
<img width="1788" height="748" alt="cognee2"
src="https://github.com/user-attachments/assets/213884be-cce1-4007-90f9-5e6d3a302ced"
/>

## Pre-submission Checklist

- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my feature works
- [ ] I have not added or changed documentation (as it was not required
for this CLI change)
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked the relevant issue in the description

## Related Issues

Fixes #1366 

## DCO Affirmation

I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-02 15:21:57 +01:00
Daulet Amirkhanov
a92f4bdf3f fix: update failing tests and refactor delete_preview implementation 2025-10-02 15:05:39 +01:00
Daulet Amirkhanov
d5dd6c2fc2
Merge branch 'dev' into feature/delete-preview 2025-10-02 12:02:16 +01:00
Andrej Milicevic
a744f8d435 test: Rollback pgvector test. Was failing for some reason. 2025-10-02 09:54:30 +02:00
shehab-badawy
9c87a10848 feat: Add delete preview for --dataset-name and --all flags
This commit introduces the preview functionality for the delete command. The preview displays a summary of what will be deleted before asking for user confirmation.

The feature is fully functional for the following flags:
- `--dataset-name`: Correctly counts the number of data entries within the specified dataset.
- `--all`: Correctly counts the total number of datasets, data entries, and users in the system.

The logic for the user ID flag is a work in progress. The current implementation uses a placeholder and needs a method to query a user directly by their ID to be completed.
2025-10-02 01:44:11 -04:00
Geoff-Robin
4979f43fc0 Added playwright as a dependency 2025-10-02 02:21:33 +05:30
Geoff-Robin
c283977035 switched httpx AsyncClient to fetch webpage 2025-10-02 02:01:46 +05:30
Geoff-Robin
60499c439c Added logging 2025-10-02 01:54:56 +05:30
Geoff-Robin
925bd38195 Setup models.py and utils.py 2025-10-02 01:32:00 +05:30
Geoff-Robin
70a2cc9d65 removed scrapy and added bs4 2025-10-02 01:28:48 +05:30