<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Enhanced code search and dependency analysis for improved accuracy.
- Introduced a new high-performance text embedding option.
- Added an additional execution entry point for code graph processing.
- New optional parameters for flexible property selection in retrieval
functions.
- Introduced new classes for handling import statements, function
definitions, and class definitions.
- Updated embedding engine selection based on configuration options.
- **Bug Fixes**
- Improved error handling in search operations and database queries for
a more stable user experience.
- Enhanced error logging for source code parsing.
- **Refactor**
- Streamlined asynchronous processing and refactored internal dependency
extraction.
- Updated configuration and integration settings to enhance overall
reliability.
- Restructured functions for simplified dependency handling.
- **Chores**
- Upgraded and reorganized dependency management with optional libraries
for extended functionality.
- Added new secret parameters for embedding configuration in workflow
settings.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: vasilije <vas.markovic@gmail.com>
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- New Features
- Introduced version tracking and enhanced metadata in core data models
for improved data consistency.
- Bug Fixes
- Improved error handling during graph data loading to prevent
disruptions from unexpected identifier formats.
- Refactor
- Centralized identifier parsing and streamlined model definitions,
ensuring smoother and more consistent operations across search,
retrieval, and indexing workflows.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
• Graph visualizations now allow exporting to a user-specified file path
for more flexible output management.
• The text embedding process has been enhanced with an additional
tokenizer option for improved performance.
• A new `ExtendableDataPoint` class has been introduced for future
extensions.
• New JSON files for companies and individuals have been added to
facilitate testing and data processing.
- **Improvements**
• Search functionality now uses updated identifiers for more reliable
content retrieval.
• Metadata handling has been streamlined across various classes by
removing unnecessary type specifications.
• Enhanced serialization of properties in the Neo4j adapter for improved
handling of complex structures.
• The setup process for databases has been improved with a new
asynchronous setup function.
- **Chores**
• Dependency and configuration updates improve overall stability and
performance.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
* Add type to DataPoint metadata
* Add missing index_fields
* Use DataPoint UUID type in pgvector create_data_points
* Make _metadata mandatory everywhere
* fix: refactor get_graph_from_model to return nodes and edges correctly
* fix: add missing params
* fix: remove complex zip usage
* fix: add edges to data_point properties
* fix: handle rate limit error coming from llm model
* fix: fixes lost edges and nodes in get_graph_from_model
* fix: fixes database pruning issue in pgvector
* fix: fixes database pruning issue in pgvector (#261)
* feat: adds code summary embeddings to vector DB
* fix: cognee_demo notebook pipeline is not saving summaries
* feat: implements first version of codegraph retriever
* chore: implements minor changes mostly to make the code production ready
* fix: turns off raising duplicated edges unit test as we have these in our current codegraph generation
* feat: implements unit tests for description to codepart search
* fix: fixes edge property inconsistent access in codepart retriever
* chore: implements more precise typing for get_attribute method for cogneegraph
* chore: adds spacing to tests and changes the cogneegraph getter names
---------
Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
Some SQL commands require lowercase characters in table names unless table name is wrapped in quotes. Renamed all new tables to use lowercase
Fix COG-677