Commit graph

30 commits

Author SHA1 Message Date
Daniel Chalef
e72f81092e
Separate unit, database, and API integration tests (#997)
* Separate unit and integration tests to allow external contributors

This change addresses the issue where external contributor PRs fail unit
tests because GitHub secrets (API keys) are unavailable to external PRs
for security reasons.

Changes:
- Split GitHub Actions workflow into two jobs:
  - unit-tests: Runs without API keys or database connections (all PRs)
  - integration-tests: Runs only for internal contributors with API keys
- Renamed test_bge_reranker_client.py to test_bge_reranker_client_int.py
  to follow naming convention for integration tests
- Unit tests now skip all tests requiring databases or API keys
- Integration tests properly separated into:
  - Database integration tests (no API keys)
  - API integration tests (requires OPENAI_API_KEY, etc.)

The unit-tests job now:
- Runs for all PRs (internal and external)
- Requires no GitHub secrets
- Disables all database drivers
- Excludes all integration test files
- Passes 93 tests successfully

The integration-tests job:
- Only runs for internal contributors (same repo PRs or pushes to main)
- Has access to GitHub secrets
- Tests database operations and API integrations
- Uses conditional: github.event.pull_request.head.repo.full_name == github.repository

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Separate database tests from API integration tests

Restructured the workflow into three distinct jobs:

1. unit-tests: Runs on all PRs, no external dependencies (93 tests)
   - No API keys required
   - No database connections required
   - Fast execution

2. database-integration-tests: Runs on all PRs with databases (NEW)
   - Requires Neo4j and FalkorDB services
   - No API keys required
   - Tests database operations without external API calls
   - Includes: test_graphiti_mock.py, test_falkordb_driver.py,
     and utils/maintenance tests

3. api-integration-tests: Runs only for internal contributors
   - Requires API keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.)
   - Conditional execution for same-repo PRs only
   - Tests that make actual API calls to LLM providers

This ensures external contributor PRs can run both unit tests and
database integration tests successfully, while API integration tests
requiring secrets only run for internal contributors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Disable Kuzu in CI database integration tests

Kuzu requires downloading extensions from external URLs which fails in CI
environment due to network restrictions. Disable Kuzu for database and API
integration tests.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Use pytest -k filter to skip Kuzu tests instead of DISABLE_KUZU

The original workflow used -k "neo4j" to filter tests. Kuzu requires
downloading FTS extensions from external URLs which fails in CI. Use
-k "neo4j or falkordb" to run tests against available databases while
skipping Kuzu parametrized tests.

This maintains the same test coverage as the original workflow while
properly separating unit, database, and API integration tests.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Upgrade Kuzu to v0.11.3+ to fix FTS extension download issue

Kuzu v0.11.3+ has FTS extension pre-installed, eliminating the need to
download it from external URLs. This fixes the "Could not establish
connection" error when trying to download libfts.kuzu_extension in CI.

Changes:
- Upgrade kuzu dependency from >=0.11.2 to >=0.11.3
- Remove pytest -k filters to run all database tests (Neo4j, FalkorDB, Kuzu)
- FTS extension is now available immediately without network calls

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Move pure unit tests from database integration to unit test job

The reviewer correctly identified that test_bulk_utils.py,
test_edge_operations.py, and test_node_operations.py are pure unit tests
using only mocks - they don't require database connections.

Changes:
- Removed tests/utils/maintenance/ from ignore list (too broad)
- Added specific ignore for test_temporal_operations_int.py (true integration test)
- Moved test_bulk_utils.py, test_edge_operations.py, test_node_operations.py to unit tests
- Kept test_graphiti_mock.py in database integration (uses real graph_driver fixture)

This reduces database integration test time and properly categorizes tests.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Skip flaky LLM-based tests in test_temporal_operations_int.py

- test_get_edge_contradictions_multiple_existing
- test_invalidate_edges_partial_update

These tests rely on OpenAI LLM responses for edge contradiction detection and produce non-deterministic results.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Use pytest -k filter for API integration tests

Replace explicit file listing with `pytest tests/ -k "_int"` to automatically discover all integration tests in any subdirectory. This improves maintainability by eliminating the need to manually update the workflow when adding new integration test files.

Excludes:
- tests/driver/ (runs separately in database-integration-tests)
- tests/test_graphiti_mock.py (runs separately in database-integration-tests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Rename workflow from "Unit Tests" to "Tests"

The workflow now runs multiple test types (unit, database integration, and API integration), so "Tests" is a more accurate name.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-12 09:07:24 -07:00
Daniel Chalef
b28bd92c16
Remove ensure_ascii configuration parameter (#969)
* Remove ensure_ascii configuration parameter

- Changed to_prompt_json default from ensure_ascii=True to False
- Removed ensure_ascii parameter from Graphiti.__init__ and GraphitiClients
- Removed ensure_ascii from all function signatures and context dictionaries
- Removed ensure_ascii from all test files
- All JSON serialization now preserves Unicode characters by default
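
The effect of switching the default can be illustrated with the standard library alone. This is a minimal sketch, not the project's code: `to_prompt_json_sketch` is a hypothetical stand-in for `to_prompt_json`, and it assumes the helper ultimately delegates to `json.dumps`.

```python
import json


def to_prompt_json_sketch(data: object, ensure_ascii: bool = False) -> str:
    """Hypothetical stand-in for to_prompt_json; assumes it wraps json.dumps."""
    return json.dumps(data, ensure_ascii=ensure_ascii)


payload = {"name": "Zoë", "city": "東京"}

# With ensure_ascii=True, non-ASCII characters become \uXXXX escape sequences.
escaped = json.dumps(payload, ensure_ascii=True)

# With the new default, the characters are kept as-is in the output string.
preserved = to_prompt_json_sketch(payload)

if __name__ == "__main__":
    print(escaped)
    print(preserved)
```

Both strings decode to the same object; the difference is only in how many characters (and, for an LLM prompt, likely how many tokens) the serialized text takes.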

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* format

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-02 15:10:57 -07:00
Daniel Chalef
644aa2b967
feat: Add optional callback to control node summary generation (#959)
Add NodeSummaryFilter callback parameter to extract_attributes_from_nodes
and extract_attributes_from_node functions, allowing consumers to
selectively skip summary regeneration for specific nodes.

This enables downstream applications to implement custom logic for
throttling or filtering which nodes should have summaries regenerated,
reducing unnecessary LLM calls and token costs.

Key changes:
- Add NodeSummaryFilter type alias: Callable[[EntityNode], Awaitable[bool]]
- Update extract_attributes_from_nodes with optional should_summarize_node parameter
- Update extract_attributes_from_node with conditional summary generation logic
- Add 5 comprehensive test cases covering callback functionality
- Maintain full backwards compatibility (default None = all summaries generated)
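
A rough sketch of how such a callback might be consumed follows. Only the `NodeSummaryFilter` alias and the `should_summarize_node` parameter name come from the message above; the `EntityNode` dataclass, `summarize_nodes`, and `skip_if_already_summarized` are hypothetical stand-ins, and the summary string is a placeholder for the real LLM call.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityNode:
    """Hypothetical stand-in for the real EntityNode model."""
    name: str
    summary: str = ""


# Type alias as described in the commit message.
NodeSummaryFilter = Callable[[EntityNode], Awaitable[bool]]


async def summarize_nodes(
    nodes: list[EntityNode],
    should_summarize_node: Optional[NodeSummaryFilter] = None,
) -> list[str]:
    """Regenerate summaries, skipping nodes the callback rejects.

    Returns the names of the nodes whose summaries were regenerated.
    """
    updated: list[str] = []
    for node in nodes:
        # Default None means every node gets a fresh summary.
        if should_summarize_node is None or await should_summarize_node(node):
            node.summary = f"summary of {node.name}"  # placeholder for an LLM call
            updated.append(node.name)
    return updated


async def skip_if_already_summarized(node: EntityNode) -> bool:
    """Example filter: only summarize nodes that have no summary yet."""
    return not node.summary


if __name__ == "__main__":
    demo = [EntityNode("alice"), EntityNode("bob", summary="existing")]
    print(asyncio.run(summarize_nodes(demo, skip_if_already_summarized)))
```

Making the callback awaitable lets a consumer consult a cache or rate limiter before deciding, at the cost of requiring `async def` even for trivial filters.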

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-01 16:17:48 -07:00
Daniel Chalef
420676faf2
fix: Prevent duplicate edge facts within same episode (#955)
* fix: Prevent duplicate edge facts within same episode

This fixes three related bugs that allowed verbatim duplicate edge facts:

1. Fixed LLM deduplication: Changed related_edges_context to use integer
   indices instead of UUIDs, matching the EdgeDuplicate model expectations.

2. Fixed batch deduplication: Removed the episode skip in dedupe_edges_bulk
   that prevented comparing edges from the same episode. Added a self-comparison
   guard to prevent an edge from being compared against itself.

3. Added fast-path deduplication: Added exact string matching before parallel
   processing in resolve_extracted_edges to catch within-episode duplicates
   early, preventing race conditions where concurrent edges can't see each other.
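
The fast path in item 3 can be sketched as a sequential pass that runs before any concurrent work. This is an illustration under assumptions, not the code in resolve_extracted_edges: the `Edge` dataclass, the `dedupe_exact` name, the choice to key on source and target UUIDs, and the whitespace/case normalization are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """Hypothetical minimal edge; the real model has more fields."""
    source_uuid: str
    target_uuid: str
    fact: str


def _normalize(text: str) -> str:
    # Assumption: collapse whitespace and ignore case before comparing.
    return " ".join(text.lower().split())


def dedupe_exact(edges: list[Edge]) -> list[Edge]:
    """Drop edges whose endpoints and normalized fact were already seen.

    Running this sequentially first means later parallel resolution never
    receives two identical edges that cannot see each other.
    """
    seen: set[tuple[str, str, str]] = set()
    unique: list[Edge] = []
    for edge in edges:
        key = (edge.source_uuid, edge.target_uuid, _normalize(edge.fact))
        if key in seen:
            continue  # verbatim duplicate within the same batch
        seen.add(key)
        unique.append(edge)
    return unique


if __name__ == "__main__":
    batch = [
        Edge("n1", "n2", "Alice works at Acme"),
        Edge("n1", "n2", "Alice  works at  Acme"),
        Edge("n1", "n3", "Alice works at Acme"),
    ]
    print([e.fact for e in dedupe_exact(batch)])
```

The first occurrence wins, so input order is preserved for the surviving edges.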

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* test: Add tests for edge deduplication fixes

Added three tests to verify the edge deduplication fixes:

1. test_dedupe_edges_bulk_deduplicates_within_episode: Verifies that
   dedupe_edges_bulk now compares edges from the same episode after
   removing the `if i == j: continue` check.

2. test_resolve_extracted_edge_uses_integer_indices_for_duplicates:
   Validates that the LLM receives integer indices for duplicate
   detection and correctly processes returned duplicate_facts.

3. test_resolve_extracted_edges_fast_path_deduplication: Confirms that
   the fast-path exact string matching deduplicates identical edges
   before parallel processing, preventing race conditions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Remove unused variables flagged by ruff

- Remove unused loop variable 'j' in bulk_utils.py
- Remove unused return value 'edges_by_episode' in test
- Replace unused 'edge_uuid' with '_' in test loop

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-01 07:30:30 -07:00
Daniel Chalef
f2c4c97362
Allow Edge extraction to keep discovered edge labels (#950)
* chore: Update dependencies and enhance edge resolution logic

- Add new dependencies: boto3, opensearch-py, and langchain-aws to pyproject.toml.
- Modify Graphiti class to handle additional parameters in edge resolution.
- Improve edge type handling in deduplication logic by introducing custom edge type names.
- Enhance tests for edge resolution to cover new scenarios and ensure correct behavior.

This update improves the flexibility and functionality of edge operations while ensuring compatibility with new libraries.

* refactor: Clean up test_edge_operations.py and format response returns

- Remove unnecessary stubs for opensearchpy module.
- Format return values in llm_client.generate_response for consistency.
- Enhance readability by ensuring proper indentation and structure in test cases.

This refactor improves the clarity and maintainability of the test suite for edge operations.

* bump version to 0.30.0pre5 and enhance docstring for resolve_extracted_edge function

- Update version in pyproject.toml to 0.30.0pre5.
- Add detailed docstring to resolve_extracted_edge function in edge_operations.py, clarifying parameters and return values.

This update improves documentation clarity for the edge resolution process.
2025-09-29 21:32:47 -07:00
Daniel Chalef
3fcd587276
fix: Add edge type validation based on node labels (#948)
* fix: Add edge type validation based on node labels

- Add DEFAULT_EDGE_NAME constant for 'RELATES_TO'
- Implement pre-resolution validation to reset invalid edge names
- Add post-resolution validation for LLM-returned fact types
- Rename parameter from edge_types to edge_type_candidates for clarity
- Add comprehensive tests for validation scenarios

This ensures edges conform to edge_type_map constraints and prevents
misclassification when edge types don't match node label pairs.
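
As a hedged illustration of the validation idea: the sketch below assumes edge_type_map maps a (source label, target label) pair to the edge type names allowed between them, which the message does not spell out. `validate_edge_name` is a hypothetical helper; only `DEFAULT_EDGE_NAME` and its value come from the commit.

```python
DEFAULT_EDGE_NAME = "RELATES_TO"


def validate_edge_name(
    edge_name: str,
    source_labels: list[str],
    target_labels: list[str],
    edge_type_map: dict[tuple[str, str], list[str]],
) -> str:
    """Return edge_name if some label pair permits it, else the default.

    Assumed shape: edge_type_map[(source_label, target_label)] lists the
    edge type names allowed between nodes carrying those labels.
    """
    for source_label in source_labels:
        for target_label in target_labels:
            allowed = edge_type_map.get((source_label, target_label), [])
            if edge_name in allowed:
                return edge_name
    # No label pair allows this type, so reset it to the generic name.
    return DEFAULT_EDGE_NAME


if __name__ == "__main__":
    type_map = {("Person", "Company"): ["WORKS_AT"]}
    print(validate_edge_name("WORKS_AT", ["Person"], ["Company"], type_map))
    print(validate_edge_name("WORKS_AT", ["Person"], ["Person"], type_map))
```

The same check could run twice, once before resolution on the extracted name and once after on whatever type the LLM returns, which matches the pre- and post-resolution steps listed above.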

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: Bump version to 0.30.0pre4

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-29 16:35:00 -07:00
Daniel Chalef
d7828d48d8
Fix index out of range errors in LLM deduplication responses (#939)
* add tests for llm dedupe guardrails

* document llm dedupe guardrails
2025-09-26 14:57:48 -07:00
Daniel Chalef
9aee3174bd
Refactor batch deduplication logic to enhance node resolution and track duplicate pairs (#929) (#936)
* Refactor deduplication logic to enhance node resolution and track duplicate pairs (#929)

* Simplify deduplication process in bulk_utils by reusing canonical nodes.
* Update dedup_helpers to store duplicate pairs during resolution.
* Modify node_operations to append duplicate pairs when resolving nodes.
* Add tests to verify deduplication behavior and ensure correct state updates.

* revert to concurrent dedup with fanout and then reconciliation

* add performance note for deduplication loop in bulk_utils

* enhance deduplication logic in bulk_utils to handle missing canonical nodes gracefully

* Update graphiti_core/utils/bulk_utils.py

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

* refactor deduplication logic in bulk_utils to use directed union-find for canonical UUID resolution

* implement _build_directed_uuid_map for efficient UUID resolution in bulk_utils

* document directed union-find lookup in bulk_utils for clarity
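
For readers unfamiliar with the technique, here is a minimal sketch of a directed union-find that maps every UUID to its canonical UUID. It is not the `_build_directed_uuid_map` implementation from bulk_utils; `build_directed_uuid_map_sketch` and the (duplicate, canonical) pair shape are assumptions made for illustration.

```python
def build_directed_uuid_map_sketch(pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Map each UUID to its canonical UUID given (duplicate, canonical) pairs.

    "Directed" here means the union always points the duplicate's root at
    the canonical side's root, rather than choosing by rank or size.
    """
    parent: dict[str, str] = {}

    def find(uuid: str) -> str:
        parent.setdefault(uuid, uuid)
        # Walk up to the root of this UUID's chain.
        root = uuid
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the walk directly at the root.
        while parent[uuid] != root:
            parent[uuid], uuid = root, parent[uuid]
        return root

    for duplicate, canonical in pairs:
        canonical_root = find(canonical)
        duplicate_root = find(duplicate)
        # Linking root to root keeps the structure acyclic.
        parent[duplicate_root] = canonical_root

    return {uuid: find(uuid) for uuid in list(parent)}


if __name__ == "__main__":
    print(build_directed_uuid_map_sketch([("a", "b"), ("b", "c"), ("d", "c")]))
```

Chains such as a -> b -> c collapse to a single lookup, so callers can resolve any duplicate to its canonical node without following intermediate hops.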

---------

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
2025-09-26 08:40:18 -07:00
Daniel Chalef
7c469e8e2b
Improve node deduplication w/ deterministic matching, LLM fallbacks (#929)
* add repository guidelines and project structure documentation

* update neo4j image version and modify test command to disable specific databases

* implement deduplication helpers and integrate with node operations

* refactor string formatting to use single quotes in node operations

* enhance deduplication helpers with UUID indexing and update resolution logic

* implement exact fact matching (#931)
2025-09-25 07:13:19 -07:00
Preston Rasmussen
9422b6f5fb
Node dedupe efficiency (#490)
* update resolve extracted edge

* updated edge resolution

* dedupe nodes update

* single pass node resolution

* updates

* mypy updates

* Update graphiti_core/prompts/dedupe_nodes.py

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* remove unused imports

* mypy

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
2025-05-15 13:56:33 -04:00
Preston Rasmussen
fd9969b5a1
Update dedupe prompt (#457)
* improve dedupe logic

* cut summary length

* update unit tests
2025-05-07 23:23:31 -04:00
Preston Rasmussen
1193b25fa3
add_episode() refactor (#421)
* temporal updates

* update resolve nodes

* dedupe edge updates

* edge dedupe

* extract attributes

* update dynamic pydantic model

* first pass of extract node attributes

* no errors

* bug fixes

* bug fixes

* prompt updates

* prompt updates

* updates

* updates

* remove unused imports

* update tests based on changes

* remove unused import
2025-04-30 12:08:52 -04:00
Daniel Chalef
0f6ac57dab
chore: update version to 0.9.3 and restructure dependencies (#338)
* Bump version from 0.9.0 to 0.9.1 in pyproject.toml and update google-genai dependency to >=0.1.0

* Bump version from 0.9.1 to 0.9.2 in pyproject.toml

* Update google-genai dependency version to >=0.8.0 in pyproject.toml

* loc file

* Update pyproject.toml to version 0.9.3, restructure dependencies, and modify author format. Remove outdated Google API key note from README.md.

* upgrade poetry and ruff
2025-04-08 20:47:38 -07:00
Preston Rasmussen
9efa6762d7
entity typo (#274)
2025-02-24 12:44:17 -05:00
Preston Rasmussen
088029a80c
node label filters (#265)
* node label filters

* update

* add search filters

* updates

* bump versions

* update tests

* test update
2025-02-21 12:38:01 -05:00
Daniel Chalef
445dccc021
refactor: use utc_now() for consistent UTC datetime handling (#234)
* ensure utc timezones

* fix: dep cycle

---------

Co-authored-by: paulpaliychuk <pavlo.paliychuk.ca@gmail.com>
2024-12-09 10:36:04 -08:00
Preston Rasmussen
3199e893ed
add_fact endpoint (#207)
* add_fact endpoint

* bump version

* add edge invalidation

* update
2024-11-06 09:12:21 -05:00
Preston Rasmussen
e15c872900
Fix edge invalidation (#174)
* update edge operations

* add new tests
2024-10-07 11:45:31 -04:00
Preston Rasmussen
d7c20c1f59
Search refactor + Community search (#111)
* WIP

* WIP

* WIP

* community search

* WIP

* WIP

* integration tested

* tests

* tests

* mypy

* mypy

* format
2024-09-16 14:03:05 -04:00
Preston Rasmussen
42fb590606
Add group ids (#89)
* set and retrieve group ids

* update add episode with group id support

* add episode and search functional

* update bulk

* mypy updates

* remove unused imports

* update unit tests

* unit tests

* add optional uuid field

* format

* mypy

* ellipsis
2024-09-06 12:33:42 -04:00
Preston Rasmussen
06d8d9359f
Add Missing Node and edge CRUD (#51)
* add CRUD operations and fix search limit bugs

* format

* update tests

* update tests to double limit call

* add default field

* format

* import correct field
2024-08-27 16:18:01 -04:00
Daniel Chalef
2d0705fc1b
Add get_nodes_by_query method to Graphiti class (#49)
* Add get_nodes_by_query method to Graphiti class

Add a method to the Graphiti class that wraps `get_relevant_nodes` and returns a list of nodes given a query.

* Add `get_nodes_by_query` method to the `Graphiti` class in `graphiti_core/graphiti.py`.
* Import `generate_embedding` from `graphiti_core/llm_client/utils.py`.
* Use `generate_embedding` to generate an embedding for the query.
* Call `get_relevant_nodes` with the generated embedding and return the relevant nodes.

Add an embedding function to `llm_client/utils.py`.

* Add `generate_embedding` function to `graphiti_core/llm_client/utils.py`.
* Accept an embedder and model_id as parameters.
* Generate an embedding for the given text and return it.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/getzep/graphiti?shareId=XXXX-XXXX-XXXX-XXXX).

* address comments left by @danielchalef on #49 (Add get_nodes_by_query method to Graphiti class);

* fix ellipsis name in cla config

* feat: Add get_nodes_by_query method to Graphiti class

* chore: Cleanup unused files, add hybrid node search, add tests

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: paulpaliychuk <pavlo.paliychuk.ca@gmail.com>
2024-08-26 20:00:28 -07:00
Pavlo Paliychuk
6e8c964aef
chore: Add comments to graphiti methods (#40)
* chore: Add comments to graphiti methods

* chore: Update int test name + add header to test files

* chore: Add comments to episode type
2024-08-26 13:11:50 -04:00
Pavlo Paliychuk
0ed7739bc0
Controlled example (#37)
* chore: Add romeo runner

* fix: Linter

* dedupe fixes

* wip

* wip dump

* allbirds

* chore: Update romeo parser

* chore: Anthropic model fix

* allbirds runner

* format

* wip

* mypy updates

* update

* remove r

* update tests

* format

* wip

* wip

* wip

* chore: Strategically update the message

* chore: Add romeo runner

* fix: Linter

* wip

* wip dump

* chore: Update romeo parser

* chore: Anthropic model fix

* wip

* allbirds

* allbirds runner

* format

* wip

* wip

* mypy updates

* update

* remove r

* update tests

* format

* wip

* chore: Strategically update the message

* rebase and fix import issues

* Update package imports for graphiti_core in examples and utils

* nits

* chore: Update OpenAI GPT-4o model to gpt-4o-2024-08-06

* implement groq

* improvements & linting

* cleanup and nits

* Refactor package imports for graphiti_core in examples and utils

* Refactor package imports for graphiti_core in examples and utils

* chore: Nuke unused examples

* chore: Nuke unused examples

* chore: Only run type check on graphiti_core

* fix unit tests

* reformat

* unit test

* fix: Unit tests

* test: Add coverage for extract_date_strings_from_edge

* lint

* remove commented code

---------

Co-authored-by: prestonrasmussen <prasmuss15@gmail.com>
Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>
2024-08-26 10:30:22 -04:00
Daniel Chalef
c5e52153c4
chore: Fix packaging (#38)
* feat: Update project name and description

The project name and description in the `pyproject.toml` file have been updated to reflect the changes made to the project.

* chore: Update pyproject.toml to include core package

The `pyproject.toml` file has been updated to include the `core` package in the list of packages. This change ensures that the `core` package is included when building the project.

* fix imports

* fix imports
2024-08-25 10:07:50 -07:00
Pavlo Paliychuk
605219f8c7
feat: Add real world dates extraction (#26)
* feat: Add real world dates extraction

* fix: Linter

* fix: 💄 mypy errors

* chore: handle invalid dates returned by the llm

* chore: Polish prompt

* reformat

* style: 💄 reformat
2024-08-23 14:18:45 -04:00
Pavlo Paliychuk
8a55f48f5e
Fix temporal invalidation unit tests (#23)
* wip

* wip

* wip

* fix: Linter errors

* fix formatting

* chore: fix ruff

* fix: Duplication

* chore: Fix unit tests for temporal invalidation

* attempt to fix unit tests

* fix: format

---------

Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>
2024-08-22 19:02:20 -04:00
Daniel Chalef
73ec0146ff
ruff action (#17)
* ruff action

* chore: Update Python version to 3.10 in lint.yml workflow

* fix lint and formatting

* cleanup
2024-08-22 13:06:42 -07:00
Daniel Chalef
50da9d0f31
format and linting (#18)
* Makefile and format

* fix podcast stuff

* refactor: update import statement for transcript_parser in podcast_runner.py

* format and linting

* chore: Update import statements and remove unused code in maintenance module
2024-08-22 12:26:13 -07:00
Pavlo Paliychuk
a6fd0ddb75
feat: Initial version of temporal invalidation + tests (#8)
* feat: Initial version of temporal invalidation + tests

* fix: don't run int tests on CI

* fix: don't run int tests on CI

* fix: don't run int tests on CI

* fix: time of day issue

* fix: running non int tests in ci

* fix: running non int tests in ci

* fix: running non int tests in ci

* fix: running non int tests in ci

* fix: running non int tests in ci

* fix: running non int tests in ci

* fix: running non int tests in ci

* revert: Tests structural changes

* chore: Remove idea file

* chore: Get rid of NodesWithEdges class and define a triplet type instead
2024-08-20 16:29:19 -04:00