Commit graph

4555 commits

Author SHA1 Message Date
chinu0609
ebb9a1b102 fix: change /api/embeddings to /api/embed in .env.template 2025-10-22 21:36:53 +05:30
Daulet Amirkhanov
6f5915a362
Merge branch 'dev' into revert-1567-revert-1548-fix/search-without-prior-cognify 2025-10-22 17:06:11 +01:00
Vasilije
6c9b3d6385
Merge branch 'dev' into refactor/update-web-parsing 2025-10-22 18:05:54 +02:00
Vasilije
dfba2f80db
Update code for Ollama API compatibility with newer version (#1578)
## Description

While testing Cognee with the latest version of Ollama, I discovered two
breaking changes that prevented proper functionality:

1. **Ollama API response key change**: The key in the embeddings API response
has been renamed from `embedding` to `embeddings` in newer Ollama versions
2. **Vector dimension handling**: The `create_lance_data_point` method
was receiving vectors as nested lists `[[...]]` instead of a flat list.
Added validation to flatten the vector when this occurs.

These changes ensure compatibility with the latest Ollama release while
maintaining the expected behavior.
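
The two fixes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: `extract_embedding` and `flatten_vector` are hypothetical helper names, and the payload is assumed to be the already-decoded JSON body of a single-input embeddings request.

```python
from typing import Any, Sequence


def flatten_vector(vector: Sequence[Any]) -> list[float]:
    """Unwrap a vector nested as [[...]] into a flat list of floats."""
    # A single nested list means one embedding wrapped in a batch of size 1.
    if len(vector) == 1 and isinstance(vector[0], (list, tuple)):
        vector = vector[0]
    # Anything still nested is a real batch, which this sketch does not handle.
    if any(isinstance(x, (list, tuple)) for x in vector):
        raise ValueError("expected a single embedding, got a batch")
    return [float(x) for x in vector]


def extract_embedding(payload: dict[str, Any]) -> list[float]:
    """Return one flat embedding from an Ollama-style response body.

    Hypothetical helper: accepts both the newer `embeddings` key and the
    older `embedding` key so either server version works.
    """
    if "embeddings" in payload:
        vector = payload["embeddings"]
    elif "embedding" in payload:
        vector = payload["embedding"]
    else:
        raise KeyError("response has neither 'embeddings' nor 'embedding'")
    return flatten_vector(vector)
```

Checking the plural key first and falling back to the singular one keeps older Ollama servers working without a version check.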


## Type of Change
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)

## Pre-submission Checklist
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-22 18:02:56 +02:00
Vasilije
9d515d4e82
Merge branch 'dev' into main 2025-10-22 18:02:36 +02:00
Daulet Amirkhanov
9abb78efc6
Merge branch 'dev' into revert-1567-revert-1548-fix/search-without-prior-cognify 2025-10-22 16:41:38 +01:00
Daulet Amirkhanov
66345988a9
Merge branch 'dev' into refactor/update-web-parsing 2025-10-22 16:41:01 +01:00
Andrej Milićević
ec84253a87
test: Fix baml ci tests (#1576)

## Description
The model, endpoint, and API key for BAML tests were changed because
they had issues with the new endpoint. Now they use OpenAI.

## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)

## Pre-submission Checklist
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-22 08:36:20 -07:00
Chinmay Bhosale
18c45acff0
Merge pull request #2 from chinu0609/fix-for-newer-version-ollama
Fix for newer version ollama
2025-10-22 20:59:26 +05:30
chinu0609
e6ab4bbeee fix: reverting the lancedb change 2025-10-22 20:58:45 +05:30
chinu0609
7b31b86f10 fix: reverting the lancedb change 2025-10-22 20:55:59 +05:30
Daulet Amirkhanov
73e81542b5 tests: remove redundant test 2025-10-22 16:23:41 +01:00
chinu0609
61b884d1e9 Merge branch 'dev' of https://github.com/topoteretes/cognee 2025-10-22 20:33:07 +05:30
Chinmay Bhosale
8b9e30408c
Merge pull request #1 from chinu0609/fix-for-newer-version-ollama
fix: Update code for Ollama API compatibility with newer version
2025-10-22 20:06:49 +05:30
chinu0609
b47cb7462d fix: Update code for Ollama API compatibility with newer version 2025-10-22 19:55:45 +05:30
hajdul88
8d9ee07083 chore: regenerating lock files 2025-10-22 15:31:57 +02:00
hajdul88
8e59b1e933
Merge branch 'dev' into fix-baml-ci-tests 2025-10-22 15:00:17 +02:00
Daulet Amirkhanov
e4cbbcbf51 Add hint log for when data is added but not cognified 2025-10-22 13:21:51 +01:00
Daulet Amirkhanov
ee7db762e6 log warning and early exit when graph is empty and is queried 2025-10-22 13:21:51 +01:00
Daulet Amirkhanov
44345e7cf3 Revert "Revert "fix: search without prior cognify"" 2025-10-22 13:21:51 +01:00
Daulet Amirkhanov
b9afc54233 add test cases for tavily 2025-10-22 13:15:56 +01:00
Daulet Amirkhanov
925323fb35 add test for cognee.add() when tavily is used 2025-10-22 13:15:56 +01:00
Daulet Amirkhanov
5288ab4ab4 tests: fix failing tests 2025-10-22 13:15:56 +01:00
Daulet Amirkhanov
ab6a0ef11c beautiful soup loader: define default comprehensive extraction_rules 2025-10-22 13:15:56 +01:00
Daulet Amirkhanov
344fbbdc29 refactor: make preferred_loaders easier to define on user facing api 2025-10-22 13:15:55 +01:00
Andrej Milicevic
1d1c7d21f7 change model 2025-10-22 14:06:43 +02:00
Andrej Milicevic
a7c74de208 changed temperature for baml 2025-10-22 13:11:26 +02:00
Andrej Milicevic
bb756a0e19 remove api version 2025-10-22 12:42:23 +02:00
Andrej Milicevic
1ecea0a955 change endpoint 2025-10-22 12:39:31 +02:00
Vasilije
738759bc5b
(WIP) Fix/fix web parsing (#1552)

## Description

This PR (using TDD):
1. Splits the web crawling implementation into separate fetching and
parsing (loader) steps
2. Fetching is used in `save_data_item_to_storage`, with default
settings
3. The loader scrapes the fetched HTML and saves the extracted text as a
txt file (`html_hash.html` -> `html_hash.txt`), similar to how we
process pdf files
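
The fetch/load split can be illustrated with a small sketch. It is not the code from this PR: `save_fetched_html` and `load_html_as_text` are hypothetical names, the standard library `html.parser` stands in for BeautifulSoup so the example has no dependencies, and using SHA-256 of the page content as `html_hash` is an assumption.

```python
import hashlib
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def save_fetched_html(html: str, storage_dir: Path) -> Path:
    """Fetching stage output: store raw HTML as <html_hash>.html."""
    # Assumption: the hash is derived from the page content.
    html_hash = hashlib.sha256(html.encode("utf-8")).hexdigest()
    path = storage_dir / f"{html_hash}.html"
    path.write_text(html, encoding="utf-8")
    return path


def load_html_as_text(html_path: Path) -> Path:
    """Loader stage: parse stored HTML and write <html_hash>.txt beside it."""
    parser = _TextExtractor()
    parser.feed(html_path.read_text(encoding="utf-8"))
    parser.close()
    txt_path = html_path.with_suffix(".txt")
    txt_path.write_text("\n".join(parser.parts), encoding="utf-8")
    return txt_path
```

Because the loader only reads a stored `.html` file, it can be tested without network access and reused for HTML produced by any fetcher.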

## Type of Change
- [x] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)

## Pre-submission Checklist
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-22 11:57:40 +02:00
EricXiao
8566516cec chore: Remove local test code
Signed-off-by: EricXiao <taoiaox@gmail.com>
2025-10-22 16:59:07 +08:00
EricXiao
742866b4c9 feat: csv ingestion loader & chunk
Signed-off-by: EricXiao <taoiaox@gmail.com>
2025-10-22 16:56:46 +08:00
Andrej Milicevic
5f2e9bd84b fix: change baml llm provider 2025-10-22 10:37:37 +02:00
Andrej Milicevic
da05671fd7 test: changed the api key of baml tests 2025-10-22 10:23:34 +02:00
Andrej Milicevic
c5648e6337 test: Add load test. 2025-10-22 09:22:11 +02:00
Daulet Amirkhanov
10e4fd7681 Make BS4 loader compatible with tavily fetcher 2025-10-21 23:46:21 +01:00
Daulet Amirkhanov
20c9e5498b skip tavily in Github CI for now 2025-10-21 23:27:18 +01:00
Daulet Amirkhanov
a35bcecdf9 refactor tavily_crawler test 2025-10-21 23:13:40 +01:00
Daulet Amirkhanov
3f5c09eb45 lazy load cron_web_scraper_task and web_scraper_task 2025-10-21 23:11:01 +01:00
Daulet Amirkhanov
f02aa1abfc ruff format 2025-10-21 23:02:25 +01:00
Daulet Amirkhanov
0f6aac19e8 TDD: add test cases and finish loading stage 2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
6895813ae8 tests: name integration tests more meaningfully 2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
ed4eba4c44 add back in-code comments for ingest_data 2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
03b4547b7f validate e2e - urls are saved as htmls, and loaders are selected correctly 2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
f84e31c626 bs4_loader.py -> beautiful_soup_loader.py, add to supported loaders 2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
322ef156cb redefine preferred_loaders param to allow for args per loader 2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
7210198f2e implement bs4_loader.py methods, aside from load 2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
16e1c60925 move bs4 html parsing into bs4_loader 2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
9d9969676f Separate BeautifulSoup crawling from fetching 2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
a7ff188018 add crawler tests 2025-10-21 22:47:22 +01:00