cognee

Author	SHA1	Message	Date
Boris	6e5acec292	refactor: make run_pipeline a high-level api for running pipelines (#1294) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-08-27 09:49:20 +02:00
vasilije	b0e3f89340	move to gpt5	2025-08-17 12:19:34 +02:00
Vasilije	dabd0912f8	feat: Cog 2082 add BAML to cognee (#1054) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Signed-off-by: Raj2604 <rajmandhare26@gmail.com> Co-authored-by: Daulet Amirkhanov <damirkhanov01@gmail.com> Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com> Co-authored-by: Igor Ilic <igorilic03@gmail.com> Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com> Co-authored-by: Matea Pesic <80577904+matea16@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions@users.noreply.github.com> Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com> Co-authored-by: Boris Arzentar <borisarzentar@gmail.com> Co-authored-by: Raj Mandhare <96978537+Raj2604@users.noreply.github.com> Co-authored-by: Pedro Thompson <thompsonp17@hotmail.com> Co-authored-by: Pedro Henrique Thompson Furtado <pedrothompson@petrobras.com.br>	2025-08-06 10:41:47 +02:00
Boris	46c4463cb2	feat: s3 storage (#988) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: vasilije <vas.markovic@gmail.com> Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>	2025-07-14 21:47:08 +02:00
Boris	773b15a645	feat: websockets for pipeline update streaming (#851) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com> Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com> Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com> Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com> Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>	2025-06-11 20:29:26 +02:00
lxobr	3da893c131	fix: deepeval retry (#918) ## Description - Implemented retries when deepeval's evaluation fails - Updated metric aggregation to ignore Nones ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-06-09 15:15:09 +02:00
Igor Ilic	1ed6cfd918	feat: new Dataset permissions (#869) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Boris Arzentar <borisarzentar@gmail.com> Co-authored-by: Boris <boris@topoteretes.com>	2025-06-06 14:20:57 +02:00
hajdul88	d6639217c3	Feat: Adds context extension search (#865) ## Description Adds context extension search ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-05-22 18:25:43 +02:00
hajdul88	e0798ff25f	Feat: Adds chain of thought retriever (#864) ## Description Adds chain of thought retriever ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-05-22 13:24:56 +02:00
hajdul88	7eee769251	Feat: Adds dashboard application to parallel modal evals (#847) ## Description Adds dashboard application to parallel modal evals to enable fast retriever development/evaluation ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>	2025-05-21 09:07:02 +02:00
hajdul88	5c36a5dd8a	feat: Adds modal parallel evaluation for retriever development (#844) ## Description Adds modal parallel evaluation for retriever development ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-05-20 15:16:13 +02:00
Igor Ilic	af276b8999	feat: Add initial cognee pipeline simplification [COG-1705] (#670) ## Description Simplify Cognee pipeline usage for users ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin	2025-04-17 14:02:12 +02:00
lxobr	d1eab97102	feature: tighten run_tasks_base (#730) ## Description - Extracted run_tasks_base function into a new file run_tasks_base.py. - Extracted four executors that execute core logic based on the task type. - Extracted a task handler/wrapper that safely executes the core logic with logging and telemetry. - Fixed the inconsistency with the batches of size 1. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-04-16 09:19:03 +02:00
Boris	9536395468	Revert "feat: pipeline tasks needs mapping" (#717) Reverts topoteretes/cognee#690 I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-04-10 12:10:12 +02:00
lxobr	e12242b9d0	fix: get default tasks (#700) ## Description - Fixed get_no_summary_tasks and get_just_chunks_tasks to work with the new tasks and pipelines - Chore: fixed the pokemon example to work with the new tasks and pipelines ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-04-07 08:46:02 +02:00
lxobr	8207dc8643	feat: make graph creation prompt configurable (#686) ## Description - Added new graph creation prompts - Exposed graph creation prompts in .cognify via get_default tasks - Exposed graph creation prompts in eval framework ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>	2025-04-03 11:14:33 +02:00
Boris	0ce6fad24a	feat: pipeline tasks needs mapping (#690) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.	2025-04-03 10:52:59 +02:00
Boris	ebf1f81b35	fix: code cleanup [COG-781] (#667) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin	2025-03-26 18:32:43 +01:00
Daniel Molnar	73db1a5a53	fix: human readable logs (#658) ## Description Introducing structlog. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin	2025-03-25 11:54:40 +01:00
Boris	d192d1fe20	chore: remove unused dependencies and make some optional (#661) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin	2025-03-25 10:19:52 +01:00
lxobr	cad9e0ce44	Feat: cog 1491 pipeline steps in eval (#641) ## Description - Created get_default_tasks_by_indices to filter default tasks by specific indices - Added get_no_summary_tasks function to skip summarization tasks - Added get_just_chunks_tasks function for chunk extraction and data points only - Added NO_SUMMARIES and JUST_CHUNKS to the TaskGetters enum ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - The evaluation configuration now includes expanded task retrieval options. Users can choose customized modes that bypass summarization or focus solely on extracting data chunks, offering a more tailored evaluation experience. - Enhanced asynchronous task processing brings increased flexibility and smoother performance during task selection.	2025-03-14 14:20:39 +01:00
lxobr	daf7d4ae26	feat: COG-1526 instance filter in eval (#627) ## Description - Added _filter_instances to BaseBenchmarkAdapter supporting filtering by IDs, indices, or JSON files. - Updated HotpotQAAdapter and MusiqueQAAdapter to use the base class filtering. - Added instance_filter parameter to corpus builder pipeline. - Extracted _get_raw_corpus method in both adapters for better code organization ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Corpus loading and building now support a flexible filtering option, allowing users to apply custom criteria to tailor the retrieved data. - Refactor - The extraction process has been reorganized to separately handle text content and associated metadata, enhancing clarity and overall workflow efficiency.	2025-03-13 14:23:13 +01:00
lxobr	38d527ceac	fix: expose chunk_size for eval framework [COG-1546] (#634) ## Description - Exposed chunk_size in get_default_tasks in cognify - Reintegrated chunk_size in corpus building in eval framework ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced an optional configuration parameter to allow users to set custom processing segment sizes. This enhances flexibility in managing content processing and task execution, enabling more dynamic control over resource handling during corpus creation and related operations.	2025-03-12 16:13:20 +01:00
alekszievr	c1f7b667d1	feat: Eliminate the use of max_chunk_tokens and use a unified max_chunk_size instead [cog-1381] (#626) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Refactor - Simplified text processing by unifying multiple size-related parameters into a single metric across chunking and extraction functionalities. - Streamlined logic for text segmentation by removing redundant calculations and checks, resulting in a more consistent chunk management process. - Chores - Removed the `modal` package as a dependency. - Documentation - Updated the README.md to include a new demo video link and clarified default environment variable settings. - Enhanced the CONTRIBUTING.md to improve clarity and engagement for potential contributors. - Bug Fixes - Improved handling of sentence-ending punctuation in text processing to include additional characters. - Version Update - Updated project version to 0.1.33 in the pyproject.toml file.	2025-03-12 14:03:41 +01:00
alekszievr	7b5bd7897f	Feat: evaluate retrieved context against golden context [cog-1481] (#619) ## Description - Compare retrieved context to golden context using deepeval's summarization metric - Display relevant fields to each metric on metrics dashboard Example output: ![image](https://github.com/user-attachments/assets/9facf716-b2ab-4573-bfdf-7b343d2a57c5) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced context handling in answer generation and corpus building to include extended details. - Introduced a new context coverage metric for deeper evaluation insights. - Upgraded the evaluation dashboard with dynamic presentation of metric details. - Added a new parameter to support loading golden context in corpus loading methods. - Bug Fixes - Improved clarity in how answers are structured and appended in the answer generation process.	2025-03-10 15:27:48 +01:00
lxobr	ac0156514d	feat: COG-1523 add top_k in run_question_answering (#625) ## Description - Expose top_k as an optional argument of run_question_answering - Update retrievers to handle the parameters ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced answer generation and document retrieval capabilities by introducing an optional parameter that allows users to specify the number of top results. This improvement adds flexibility when retrieving question responses and associated context, adapting the output based on user preference.	2025-03-10 10:55:31 +01:00
vasilije	9d783675e0	Revert "First AI pass at layered graph builder" This reverts commit `1cbcbbd55a`.	2025-03-05 19:48:53 -08:00
vasilije	1cbcbbd55a	First AI pass at layered graph builder	2025-03-05 19:37:45 -08:00
alekszievr	433264d4e4	feat: Add context evaluation to eval framework [COG-1366] (#586) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced a class-based retrieval mechanism to enhance answer generation with improved context extraction and completion. - Added a new evaluation metric for contextual relevancy and an option to enable context evaluation during the evaluation process. - Refactor - Transitioned from a function-based answer resolver to a more modular retriever approach to improve extensibility. - Tests - Updated tests to align with the new answer generation and evaluation process. --------- Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Daniel Molnar <soobrosa@gmail.com> Co-authored-by: Boris <boris@topoteretes.com>	2025-03-05 16:40:24 +01:00
hajdul88	3e93dbe264	fix: add currying to question_answering_non_parallel (#602) …l to avoid additional params Introduces lambda currying in question answering non parallel function to avoid unnecessary params ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Refactor - Streamlined the question-answering process for cleaner, more efficient query handling. - Updated the handling of parameters in the answer generation process, allowing for a more dynamic integration of context. - Simplified test setups by reducing the number of parameters involved in the mock answer resolver.	2025-03-04 16:09:53 +01:00
hajdul88	e3f3d49a3b	Feature/cog 1312 integrating evaluation framework into dreamify (#562) ## Description This PR contains eval framework changes due to the autooptimizer integration ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced answer generation now returns structured answer details. - Search functionality accepts configurable prompt inputs. - Option to generate a metrics dashboard from evaluations. - Corpus building tasks now support adjustable chunk settings for greater flexibility. - New task retrieval functionality allows for flexible task configuration. - Introduced new methods for creating and managing metrics dashboards. - Refactor/Chore - Streamlined API signatures and reorganized module interfaces for better consistency. - Updated import paths to reflect new module structure. - Tests - Updated test scenarios to align with new configurations and parameter adjustments.	2025-03-03 19:55:47 +01:00

31 commits