cognee/.github/workflows/test_suites.yml
Chaitany 96eb0d448a
feat(#1357): Lexical chunk retriever (#1392)
<!-- .github/pull_request_template.md -->

## Description
<!-- 
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
I Implemented Lexical Chunk Retriever In the LexicalRetriever class is
Inherite the BaseRetriever and The DocumentChunk are lazy loaded when
first time query is made because it save time during object
initialization
and the function get_context and the get_completion are Implemented same
as the ChunksRetriever the only diffrence is that the DocumentChunk are
converted to match the output type of the ChunksRetriever using function
get_own_properties in the utils.

## Type of Change
<!-- Please check the relevant option -->
- [-] Bug fix (non-breaking change that fixes an issue)
- [-] New feature (non-breaking change that adds functionality)
- [-] Breaking change (fix or feature that would cause existing
functionality to change)
- [-] Documentation update
- [-] Code refactoring
- [-] Performance improvement
- [-] Other (please specify):

## Changes Made
<!-- List the specific changes made in this PR -->
- Added LexicalRetriever base class with customizable tokenizer & scorer
     - Implemented caching of DocumentChunk tokens and payloads 
- Added robust initialization with error handling and logging -
Implemented get_context with top_k ranking and optional scores
- Implemented get_completion consistent with BaseRetriever interface
- Added JaccardChunksRetriever demo using set/multiset Jaccard
similarity
- Support for stopwords and multiset frequency-aware similarity -
Integrated logging for initialization, scoring, and retrieval

## Testing

- Manual tests: initialized retriever, retrieved chunks with toy corpus
    - Edge cases: empty corpus, empty query, scorer/tokenizer errors 
    - Verified Jaccard similarity results for single/multiset cases 
    - Code formatted and linted


## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [-] **I have tested my changes thoroughly before submitting this PR**
- [-] **This PR contains minimal changes necessary to address the
issue/feature**
- [-] My code follows the project's coding standards and style
guidelines
- [-] I have added tests that prove my fix is effective or that my
feature works
- [-] I have added necessary documentation (if applicable)
- [-] All new and existing tests pass
- [-] I have searched existing PRs to ensure this change hasn't been
submitted already
- [-] I have linked any relevant issues in the description
- [-] My commits have clear and descriptive messages

## Related Issues
<!-- Link any related issues using "Fixes #issue_number" or "Relates to
#issue_number" -->
Relates to  #1392
## Additional Notes
<!-- Add any additional notes, concerns, or context for reviewers -->
Int the cognee/modules/chunking/models/DocumentChunk.py
don't remove the optional  from is_part_of attributes.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Andrej Milicevic <milicevicandrej@yahoo.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
2025-09-19 18:24:33 +02:00

186 lines
5.2 KiB
YAML

name: Test Suites
on:
push:
branches: [ main, dev ]
pull_request:
branches: [ main, dev ]
types: [opened, synchronize, reopened, labeled]
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
env:
RUNTIME__LOG_LEVEL: ERROR
ENV: 'dev'
jobs:
basic-tests:
name: Basic Tests
uses: ./.github/workflows/basic_tests.yml
secrets: inherit
e2e-tests:
name: End-to-End Tests
uses: ./.github/workflows/e2e_tests.yml
secrets: inherit
cli-tests:
name: CLI Tests
uses: ./.github/workflows/cli_tests.yml
secrets: inherit
docker-compose-test:
name: Docker Compose Test
needs: [basic-tests, e2e-tests]
uses: ./.github/workflows/docker_compose.yml
secrets: inherit
docker-ci-test:
name: Docker CI test
needs: [basic-tests, e2e-tests]
uses: ./.github/workflows/backend_docker_build_test.yml
secrets: inherit
graph-db-tests:
name: Graph Database Tests
needs: [basic-tests, e2e-tests]
uses: ./.github/workflows/graph_db_tests.yml
secrets: inherit
temporal-graph-tests:
name: Temporal Graph Test
needs: [ basic-tests, e2e-tests, graph-db-tests ]
uses: ./.github/workflows/temporal_graph_tests.yml
secrets: inherit
search-db-tests:
name: Search Test on Different DBs
needs: [basic-tests, e2e-tests, graph-db-tests]
uses: ./.github/workflows/search_db_tests.yml
secrets: inherit
relational-db-migration-tests:
name: Relational DB Migration Tests
needs: [basic-tests, e2e-tests, graph-db-tests]
uses: ./.github/workflows/relational_db_migration_tests.yml
secrets: inherit
notebook-tests:
name: Notebook Tests
needs: [basic-tests, e2e-tests]
uses: ./.github/workflows/notebooks_tests.yml
secrets: inherit
different-operating-systems-tests:
name: Operating System and Python Tests
needs: [basic-tests, e2e-tests]
uses: ./.github/workflows/test_different_operating_systems.yml
with:
python-versions: '["3.10.x", "3.11.x", "3.12.x"]'
secrets: inherit
# Matrix-based vector database tests
vector-db-tests:
name: Vector DB Tests
needs: [basic-tests, e2e-tests]
uses: ./.github/workflows/vector_db_tests.yml
secrets: inherit
# Matrix-based example tests
example-tests:
name: Example Tests
needs: [basic-tests, e2e-tests]
uses: ./.github/workflows/examples_tests.yml
secrets: inherit
mcp-test:
name: MCP Tests
needs: [basic-tests, e2e-tests]
uses: ./.github/workflows/test_mcp.yml
secrets: inherit
db-examples-tests:
name: DB Examples Tests
needs: [vector-db-tests, graph-db-tests, relational-db-migration-tests]
uses: ./.github/workflows/db_examples_tests.yml
secrets: inherit
s3-file-storage-test:
name: S3 File Storage Test
needs: [basic-tests, e2e-tests]
uses: ./.github/workflows/test_s3_file_storage.yml
secrets: inherit
# Additional LLM tests
llm-tests:
name: LLM Test Suite
needs: [ basic-tests, e2e-tests ]
uses: ./.github/workflows/test_llms.yml
secrets: inherit
# Ollama tests moved to the end
ollama-tests:
name: Ollama Tests
needs: [
basic-tests,
e2e-tests,
graph-db-tests,
notebook-tests,
different-operating-systems-tests,
vector-db-tests,
example-tests,
llm-tests,
mcp-test,
relational-db-migration-tests,
docker-compose-test,
docker-ci-test,
]
uses: ./.github/workflows/test_ollama.yml
secrets: inherit
notify:
name: Test Completion Status
needs: [
basic-tests,
e2e-tests,
cli-tests,
graph-db-tests,
notebook-tests,
different-operating-systems-tests,
vector-db-tests,
example-tests,
db-examples-tests,
mcp-test,
llm-tests,
ollama-tests,
relational-db-migration-tests,
docker-compose-test,
docker-ci-test,
]
runs-on: ubuntu-latest
if: always()
steps:
- name: Check Status
run: |
if [[ "${{ needs.basic-tests.result }}" == "success" &&
"${{ needs.e2e-tests.result }}" == "success" &&
"${{ needs.cli-tests.result }}" == "success" &&
"${{ needs.graph-db-tests.result }}" == "success" &&
"${{ needs.notebook-tests.result }}" == "success" &&
"${{ needs.different-operating-systems-tests.result }}" == "success" &&
"${{ needs.vector-db-tests.result }}" == "success" &&
"${{ needs.example-tests.result }}" == "success" &&
"${{ needs.db-examples-tests.result }}" == "success" &&
"${{ needs.relational-db-migration-tests.result }}" == "success" &&
"${{ needs.llm-tests.result }}" == "success" &&
"${{ needs.docker-compose-test.result }}" == "success" &&
"${{ needs.docker-ci-test.result }}" == "success" &&
"${{ needs.ollama-tests.result }}" == "success" ]]; then
echo "All test suites completed successfully!"
else
echo "One or more test suites failed."
exit 1
fi