test: add load tests (#1573)

## Description

Added a load test to our codebase. The test runs N adds of a PDF, then cognifies them and runs N searches. Cognify and the searches are timed, with constraints on how long they may take. We can tweak the values if necessary; the current values are for the gpt-5-mini model.

## Type of Change

- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)

## Pre-submission Checklist

- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the issue/feature**
- [ ] My code follows the project's coding standards and style guidelines
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

---

> [!NOTE]
> Introduce a load test for S3 ingest, cognify, and concurrent searches with timing thresholds, and wire it into CI.
>
> - **Tests**:
>   - Add `cognee/tests/test_load.py` to measure end-to-end load: prunes data/system, ingests from `s3://cognee-test-load-s3-bucket`, runs `cognify` then concurrent GRAPH_COMPLETION searches, records timings across reps, and asserts avg ≤ 8m and each run ≤ 10m.
> - **CI**:
>   - Add `test-load` job in `.github/workflows/e2e_tests.yml`: installs AWS deps, raises file descriptor limit, configures S3/env secrets, and executes the new load test.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit c7598122bb. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
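
The threshold rule described above (an average of at most 8 minutes, and every repetition at most 10 minutes) can be sketched as a small helper that reports which limit was exceeded. This is only an illustration, not the code in `test_load.py`; `check_timings` and its parameter names are hypothetical.

```python
from statistics import mean


def check_timings(recorded_seconds, average_limit_minutes=8, upper_limit_minutes=10):
    """Return a list of violation messages; an empty list means all limits held.

    Hypothetical helper: the names and defaults mirror the values quoted in
    the PR description, but the function itself is not part of cognee.
    """
    violations = []

    # Limit 1: the mean over all repetitions must stay under the average limit.
    average = mean(recorded_seconds)
    if average > average_limit_minutes * 60:
        violations.append(
            f"average {average:.1f}s exceeds {average_limit_minutes} min"
        )

    # Limit 2: no single repetition may exceed the upper boundary.
    for index, seconds in enumerate(recorded_seconds, start=1):
        if seconds > upper_limit_minutes * 60:
            violations.append(
                f"rep {index} took {seconds:.1f}s, over {upper_limit_minutes} min"
            )

    return violations
```

Returning messages instead of using a bare `assert` makes a failing CI log show which repetition or which limit was responsible.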
2025-11-06 08:42:01 +01:00 · 2025-11-06 08:42:01 +01:00 · 69c7aa2559
commit 69c7aa2559
parent 8cc55ac0b2 c7598122bb
2 changed files with 103 additions and 0 deletions
--- a/.github/workflows/e2e_tests.yml
+++ b/.github/workflows/e2e_tests.yml
@@ -447,3 +447,44 @@ jobs:
          DB_USERNAME: cognee
          DB_PASSWORD: cognee
        run: uv run python ./cognee/tests/test_conversation_history.py
+
+  test-load:
+    name: Test Load
+    runs-on: ubuntu-22.04
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v4
+
+      - name: Cognee Setup
+        uses: ./.github/actions/cognee_setup
+        with:
+          python-version: '3.11.x'
+          extra-dependencies: "aws"
+
+      - name: Set File Descriptor Limit
+        run: sudo prlimit --pid $$ --nofile=4096:4096
+
+      - name: Verify File Descriptor Limit
+        run: ulimit -n
+
+      - name: Dependencies already installed
+        run: echo "Dependencies already installed in setup"
+
+      - name: Run Load Test
+        env:
+          ENV: 'dev'
+          ENABLE_BACKEND_ACCESS_CONTROL: True
+          LLM_MODEL: ${{ secrets.LLM_MODEL }}
+          LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
+          LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
+          LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
+          EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
+          EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
+          EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
+          EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
+          STORAGE_BACKEND: s3
+          AWS_REGION: eu-west-1
+          AWS_ENDPOINT_URL: https://s3-eu-west-1.amazonaws.com
+          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_S3_DEV_USER_KEY_ID }}
+          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_S3_DEV_USER_SECRET_KEY }}
+        run: uv run python ./cognee/tests/test_load.py
--- a/cognee/tests/test_load.py
+++ b/cognee/tests/test_load.py
@@ -0,0 +1,62 @@
+import os
+import pathlib
+import asyncio
+import time
+
+import cognee
+from cognee.modules.search.types import SearchType
+from cognee.shared.logging_utils import get_logger
+
+logger = get_logger()
+
+
+async def process_and_search(num_of_searches):
+    start_time = time.time()
+
+    await cognee.cognify()
+
+    await asyncio.gather(
+        *[
+            cognee.search(
+                query_text="Tell me about the document", query_type=SearchType.GRAPH_COMPLETION
+            )
+            for _ in range(num_of_searches)
+        ]
+    )
+
+    end_time = time.time()
+
+    return end_time - start_time
+
+
+async def main():
+    data_directory_path = os.path.join(pathlib.Path(__file__).parent, ".data_storage/test_load")
+    cognee.config.data_root_directory(data_directory_path)
+
+    cognee_directory_path = os.path.join(pathlib.Path(__file__).parent, ".cognee_system/test_load")
+    cognee.config.system_root_directory(cognee_directory_path)
+
+    num_of_pdfs = 10
+    num_of_reps = 5
+    upper_boundary_minutes = 10
+    average_minutes = 8
+
+    recorded_times = []
+    for _ in range(num_of_reps):
+        await cognee.prune.prune_data()
+        await cognee.prune.prune_system(metadata=True)
+
+        s3_input = "s3://cognee-test-load-s3-bucket"
+        await cognee.add(s3_input)
+
+        recorded_times.append(await process_and_search(num_of_pdfs))
+
+    average_recorded_time = sum(recorded_times) / len(recorded_times)
+
+    assert average_recorded_time <= average_minutes * 60
+
+    assert all(rec_time <= upper_boundary_minutes * 60 for rec_time in recorded_times)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
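
To try the timing pattern locally without an LLM or S3 access, the following minimal sketch times a batch of concurrent coroutines in the same shape as `process_and_search`. `fake_search`, `timed_gather`, and `run_demo` are hypothetical stand-ins, not part of cognee. The sketch also assumes `time.perf_counter()` (a monotonic clock) is preferable to `time.time()` for measuring intervals; the committed test uses `time.time()`.

```python
import asyncio
import time


async def fake_search(delay_seconds):
    # Hypothetical stand-in for cognee.search: just waits, then returns a value.
    await asyncio.sleep(delay_seconds)
    return "ok"


async def timed_gather(coroutines):
    # Run all coroutines concurrently and measure the elapsed interval
    # with a monotonic clock, which is unaffected by system clock changes.
    start = time.perf_counter()
    results = await asyncio.gather(*coroutines)
    return results, time.perf_counter() - start


def run_demo(num_of_searches=10, delay_seconds=0.1):
    # Build the batch of coroutines and drive them from a fresh event loop.
    return asyncio.run(
        timed_gather([fake_search(delay_seconds) for _ in range(num_of_searches)])
    )
```

Because the coroutines run concurrently, the elapsed time for ten 0.1-second waits should be close to 0.1 seconds rather than 1 second, which is what makes the pattern suitable for a load test of concurrent searches.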