cognee/cognee
Hashem Aldhaheri fd77e92cc4
Fix: Handle file:// URLs in open_data_file function (#1019)
## Summary
This PR fixes an asymmetry issue where files saved with `file://`
prefixes could not be read back, causing "file not found" errors.

## Problem
The Cognee framework has a bug where:
- `save_data_to_file.py` adds `file://` prefix when saving files
- `open_data_file.py` doesn't handle the `file://` prefix when reading
files
- This causes saved files to appear as "lost" with cryptic "file not
found" errors

## Solution
Added proper handling for `file://` URLs in `open_data_file.py` by:
- Checking if the file path starts with `"file://"`
- Stripping the prefix using `replace("file://", "", 1)`
- Following the same pattern as S3 URL handling
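
The prefix handling described above can be sketched roughly as follows. This is a minimal illustration, not the actual body of `open_data_file.py`: the names `strip_file_scheme` and `open_local_file` are hypothetical, and the S3 branch is left out.

```python
FILE_SCHEME = "file://"


def strip_file_scheme(file_path: str) -> str:
    """Return a plain filesystem path, dropping one leading file:// prefix.

    Hypothetical helper; the real function in cognee may be structured differently.
    """
    if file_path.startswith(FILE_SCHEME):
        # The count argument (1) removes only the first occurrence, so any
        # later "file://" substring in the path is left untouched.
        return file_path.replace(FILE_SCHEME, "", 1)
    # Regular paths pass through unchanged (backward compatibility).
    return file_path


def open_local_file(file_path: str, mode: str = "rb", encoding=None, **kwargs):
    """Open a local file given either a plain path or a file:// URL."""
    return open(strip_file_scheme(file_path), mode=mode, encoding=encoding, **kwargs)
```

Because the check uses `startswith`, a path that merely contains `file://` somewhere in the middle is not modified.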

## Changes
- Modified
`cognee/modules/data/processing/document_types/open_data_file.py` to
handle `file://` URLs
- Added comprehensive unit tests in
`cognee/tests/unit/modules/data/test_open_data_file.py`

## Testing
Added 6 test cases covering:
- Regular file paths (ensuring backward compatibility)
- file:// URLs in text mode
- file:// URLs in binary mode
- file:// URLs with specific encoding
- Nonexistent files with file:// URLs
- Edge case with multiple file:// prefixes

All tests pass successfully.

## Notes
- This is a minimal fix that maintains backward compatibility
- The fix follows the existing pattern used for S3 URL handling
- No breaking changes to the API

I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

Signed-off-by: Hashem Aldhaheri <aenawi@gmail.com>
2025-06-30 11:55:34 +02:00

| Name | Last commit | Date |
| --- | --- | --- |
| api | fix: default user (#908) | 2025-06-05 15:38:06 +02:00 |
| eval_framework | fix: 0.1.41 Release (#894) | 2025-05-31 02:19:29 +02:00 |
| exceptions | Merge dev to main (#827) | 2025-05-15 13:15:49 +02:00 |
| infrastructure | fix: 0.1.41 Release (#894) | 2025-05-31 02:19:29 +02:00 |
| modules | Fix: Handle file:// URLs in open_data_file function (#1019) | 2025-06-30 11:55:34 +02:00 |
| notebooks | feat: Group DataPoints into NodeSets (#680) | 2025-04-19 20:21:04 +02:00 |
| shared | refactor: Update rel db example (#985) | 2025-06-13 18:33:40 +02:00 |
| tasks | fix: 0.1.41 Release (#894) | 2025-05-31 02:19:29 +02:00 |
| tests | Fix: Handle file:// URLs in open_data_file function (#1019) | 2025-06-30 11:55:34 +02:00 |
| __init__.py | fix: 0.1.41 Release (#894) | 2025-05-31 02:19:29 +02:00 |
| base_config.py | Merge dev to main (#827) | 2025-05-15 13:15:49 +02:00 |
| fetch_secret.py | ruff format | 2025-01-05 19:09:08 +01:00 |
| get_token.py | fix: Cognee backend fixes (#659) | 2025-03-20 21:51:35 +01:00 |
| low_level.py | fix: custom model pipeline (#508) | 2025-02-08 02:00:15 +01:00 |
| pipelines.py | feat: expose cognee.pipelines (#125) | 2024-07-27 10:01:44 +02:00 |
| root_dir.py | ruff format | 2025-01-05 19:09:08 +01:00 |
| version.py | fix: 0.1.41 Release (#894) | 2025-05-31 02:19:29 +02:00 |