Enhanced the edge deduplication prompts to better recognize semantically
equivalent facts that use different phrasings:
- Self-referential relationships ("X is a sub-agency of X" = "X is its own sub-agency")
- Active vs passive voice ("A awarded contract to B" = "B received contract from A")
- Numeric format equivalence ($1M = $1,000,000)
- Entity aliases (DoD = Department of Defense)
Added integration tests that verify the LLM correctly identifies semantic
duplicates with the improved prompts.
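
A minimal sketch of how the four categories above could be captured as paired test cases. This is not the test code from this change: `DUPLICATE_PAIRS`, `missed_duplicates`, and the `is_duplicate` callable (a stand-in for whatever wraps the LLM dedupe prompt) are hypothetical names, and the example sentences for the numeric and alias cases are illustrative.

```python
from typing import Callable, List, Tuple

# Each pair states the same fact in two phrasings and should be judged a duplicate.
# The first two pairs come from the description above; the last two are
# illustrative sentences built around the "$1M" and "DoD" examples.
DUPLICATE_PAIRS: List[Tuple[str, str]] = [
    ("X is a sub-agency of X", "X is its own sub-agency"),
    ("A awarded contract to B", "B received contract from A"),
    ("The contract is worth $1M", "The contract is worth $1,000,000"),
    ("DoD issued the award", "Department of Defense issued the award"),
]


def missed_duplicates(
    is_duplicate: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Return the pairs that the given judge failed to flag as duplicates.

    `is_duplicate` is a hypothetical callable; in an integration test it
    would call the LLM with the dedupe prompt and parse its verdict.
    """
    return [(a, b) for a, b in DUPLICATE_PAIRS if not is_duplicate(a, b)]
```

An integration test could then assert that `missed_duplicates(llm_judge)` is empty, which reports every failing pair at once rather than stopping at the first.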