Commit graph

466 commits

Author SHA1 Message Date
EricXiao
4c609d6074 Merge branch 'dev' into feat/csv-ingestion
Signed-off-by: EricXiao <taoiaox@gmail.com>
2025-11-14 14:46:11 +08:00
Igor Ilic
9fb7f2c4cf Merge branch 'dev' into multi-tenancy 2025-11-07 15:51:44 +01:00
Andrej Milićević
df102e99a9
feat: add structured output support to retrievers (#1734)

## Description
Add structured output support to all completion-based retrievers. If no
response model is supplied, a string is used by default.

## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-07 10:21:52 +01:00
Igor Ilic
5dbfea5084
Merge branch 'dev' into multi-tenancy 2025-11-06 18:55:18 +01:00
hajdul88
c0e5ce04ce
Fix: fixes session history test for multiuser mode (#1746)

## Description
Fixes failing session history test

## Type of Change
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-06 14:13:55 +01:00
Igor Ilic
6a7d8ba106
Merge branch 'dev' into multi-tenancy 2025-11-05 12:17:49 +01:00
hajdul88
eaf8d718b0
feat: introduces memify pipeline to save cache sessions into cognee (#1731)

## Description
This PR introduces a new memify pipeline to save cache sessions in
cognee. The QA sessions are added to the main knowledge base as separate
documents.


## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
None

## Pre-submission Checklist
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-11-05 10:27:54 +01:00
Andrej Milicevic
7e3c24100b refactor: add structured output to completion retrievers 2025-11-04 15:09:33 +01:00
Igor Ilic
ac257dca1d refactor: Account for async change for identify function 2025-11-04 13:13:42 +01:00
lxobr
6223ecf05b
feat: optimize repeated entity extraction (#1682)

## Description

- Added an `edge_text` field to edges that auto-fills from
`relationship_type` if not provided.
- Contains edges now store descriptions for better embedding.
- Updated and refactored indexing so that `edge_text` gets embedded and
exposed.
- Updated retrieval to use the new embeddings.
- Added a test to verify edge_text exists in the graph with the correct
format.
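
The `edge_text` fallback described in the first bullet can be sketched roughly as follows. This is only an illustration, not the code from this PR: the `Edge` dataclass and its `source_id` and `target_id` fields are hypothetical; only the names `edge_text` and `relationship_type` come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Edge:
    """Hypothetical edge model; only edge_text and relationship_type are named in the PR."""

    source_id: str
    target_id: str
    relationship_type: str
    edge_text: Optional[str] = None

    def __post_init__(self) -> None:
        # Fall back to the relationship type when no explicit text is given,
        # so every edge has something to embed.
        if not self.edge_text:
            self.edge_text = self.relationship_type


default_edge = Edge("doc_1", "chunk_1", "contains")
described_edge = Edge("doc_1", "chunk_1", "contains", edge_text="doc_1 contains chunk_1")
```

With this fallback, indexing code can embed `edge_text` without checking for missing values.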

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [x] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
2025-10-30 13:56:06 +01:00
Igor Ilic
a18370a0fc refactor: Remove reference to specific database row 2025-10-26 23:35:00 +01:00
Igor Ilic
90ca9bc8d1 refactor: Change id to be the relational database ID and not graph node ID 2025-10-26 20:51:38 +01:00
lxobr
7a08e13a20 chore: further expand logging 2025-10-23 18:36:51 +02:00
lxobr
b09e4b7cc4 chore: adhere to memify input convention 2025-10-23 17:48:21 +02:00
lxobr
2d6188523a chore: minor improvements 2025-10-23 17:11:01 +02:00
lxobr
46e6d87c1f Merge branch 'dev' into feature/cog-3187-feedback-enrichment-merge-test 2025-10-23 11:31:23 +02:00
Daulet Amirkhanov
3e2dbd1846 Update deprecated Exception status codes 2025-10-22 17:38:41 +01:00
EricXiao
742866b4c9 feat: csv ingestion loader & chunk
Signed-off-by: EricXiao <taoiaox@gmail.com>
2025-10-22 16:56:46 +08:00
Daulet Amirkhanov
3f5c09eb45 lazy load cron_web_scraper_task and web_scraper_task 2025-10-21 23:11:01 +01:00
Daulet Amirkhanov
ed4eba4c44 add back in-code comments for ingest_data 2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
03b4547b7f validate e2e - urls are saved as htmls, and loaders are selected correctly 2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
322ef156cb redefine preferred_loaders param to allow for args per loader 2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
16e1c60925 move bs4 html parsing into bs4_loader 2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
9d9969676f Separate BeautifulSoup crawling from fetching 2025-10-21 22:47:52 +01:00
Daulet Amirkhanov
5035c872a7 refactor: update web scraper configurations and simplify fetch logic 2025-10-21 22:47:22 +01:00
Daulet Amirkhanov
95e735d397 remove fetchers_config, use default configs for Tavily and BeautifulSoup 2025-10-21 22:46:50 +01:00
Daulet Amirkhanov
085e81c082 Clean up - remove UnsupportedPathSchemeError 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
f7c2187ce7 remove loaders_config as it's not in use 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
fc660b46bb remove web_url_loader since there is no logic post fetching for loader 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
d7417d9b06 refactor: move url data fetching logic into save_data_item_to_storage 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
17b33ab443 feat: web_url_fetcher 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
a0f760a3d1 refactor: remove redundant filestream arg from LoaderEngine.load_file(...) 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
b5190c90f1 add logging for crawling status; add cap to the crawl_delay from robots.txt
- Not advising to use the cap, but giving an option to be able to configure it
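
An optional cap on a `robots.txt` crawl delay, as this commit describes, might look like the sketch below. The function name, its parameters, and the default delay value are assumptions for illustration and are not taken from the repository.

```python
from typing import Optional


def effective_crawl_delay(
    robots_delay: Optional[float],
    default_delay: float = 0.5,
    max_delay: Optional[float] = None,
) -> float:
    """Return the delay (in seconds) to wait between requests.

    robots_delay: the Crawl-delay value from robots.txt, or None if absent.
    default_delay: hypothetical fallback used when robots.txt sets no delay.
    max_delay: optional cap; None (the default) means the robots.txt value
        is honored as-is.
    """
    delay = default_delay if robots_delay is None else robots_delay
    if max_delay is not None:
        # The cap is opt-in: it only applies when explicitly configured.
        delay = min(delay, max_delay)
    return delay
```

Leaving `max_delay` unset keeps the crawler fully compliant with the site's requested delay, which matches the commit's note that the cap is offered as an option, not as a recommendation.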
2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
9b802f651b fix: web_url_loader load_data should yield stored_path 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
d0f3e224cb refactor ingest_data to accommodate non-FS data items 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
2e7ff0b01b remove redundant HtmlContent class in save_data_item_to_storage 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
36364285b2 tests: fix failing tests 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
185600fe17 revert url_crawler changes to cognee.add(), and update web_url_loader.load() 2025-10-21 22:46:49 +01:00
Daulet Amirkhanov
95106d5914 fix: ensure web urls correctly go through ingest_data and reach loaders 2025-10-21 22:46:49 +01:00
hajdul88
46b19ad02c
Merge branch 'dev' into feature/cog-3187-feedback-enrichment 2025-10-21 11:13:32 +02:00
lxobr
70c0a98055 chore: use cot retriever only 2025-10-21 01:39:35 +02:00
lxobr
590c3ad7ec feat: use datapoints only 2025-10-21 01:30:08 +02:00
lxobr
8e580bd3d3 fix: create enrichments 2025-10-21 00:57:42 +02:00
lxobr
834cf8b113 feat: create_enrichments.py 2025-10-21 00:34:12 +02:00
lxobr
ce418828b4 feat: generate improved answers 2025-10-20 23:45:18 +02:00
lxobr
97eb89386e feat: generate improved answers temp 2025-10-20 20:07:16 +02:00
lxobr
78fca9feb7 feat: extract feedback interactions 2025-10-20 20:07:03 +02:00
lxobr
44ec814256 feat: feedback enrichment preparation 2025-10-20 12:48:11 +02:00
Igor Ilic
b4cebf4435
Merge branch 'dev' into embedding-rate-limiter 2025-10-15 15:29:36 +02:00
Igor Ilic
417015d9a9 Merge branch 'dev' into embedding-rate-limiter 2025-10-14 20:39:10 +02:00