cognee

Author	SHA1	Message	Date
lxobr	ee88fcf5d3	feat: reimplement `resolve_edges_to_text` with cleaner formatting (#652) ## Description - Optimized to deduplicate nodes appearing in multiple triplets, avoiding redundant text repetition - Reimplemented `resolve_edges_to_text` with cleaner formatting - Added `_top_n_words` method for extracting frequent words from text - Created `_get_title` function to generate titles from text content based on first words and word frequency - Extracted node processing logic to `_get_nodes` helper method - Created dedicated `stop_words` utility with common English stopwords ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Improved text output formatting that organizes content into clearly defined sections for enhanced readability. - Enhanced text processing capabilities, including refined title generation and key phrase extraction. - Introduced a comprehensive utility for managing common stop words, further optimizing text analysis. - Bug Fixes - Updated tests to ensure accurate validation of new functionalities and improved existing test coverage. --------- Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>	2025-03-20 14:52:04 +01:00
alekszievr	164cb581ec	test: test retrievers [cog-1433] (#635) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Chores - Removed unused code to streamline internal processes. - Tests - Added a comprehensive suite of tests to validate core retrieval and search functionalities. - Improved validation of response generation, context handling, and error scenarios to ensure consistent and reliable performance. These improvements enhance overall system stability and maintainability, contributing to a smoother experience for end-users. --------- Co-authored-by: vasilije <vas.markovic@gmail.com>	2025-03-20 10:18:21 +01:00
hajdul88	1c65682242	feat: adds cypher search to retrievers module (#648) ## Description Exposes the query method of the adapter in the search interface for Kuzu and Neo4j (cypher compatible adapters) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced a new cypher-based search option that expands the app's search functionality. - Enabled asynchronous processing for advanced query execution. - Enhanced error messaging for unsupported search types and query execution issues. - Added a new enumeration value for `CYPHER` to support the new search type.	2025-03-19 15:01:40 +01:00
Igor Ilic	88ed411f03	feat: user authorization [COG-1189] (#593) ## Description Added user authorization through JWT header, reworked user and relevant RBAC models to accompany future User Permission system. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced an automated workflow to validate server startup. - Added secure JWT token generation for improved session handling. - Enabled a new structure for permission management with role and tenant-based controls, including endpoints for creating roles, tenants, and assigning permissions. - Added methods for assigning default permissions to roles, tenants, and users. - Introduced new classes for managing default permissions for roles, tenants, and users. - Refactor - Streamlined authentication and user management flows with enhanced error handling. - Tests - Upgraded integration tests with improved database initialization and data pruning for a more stable environment. --------- Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>	2025-03-13 13:33:42 +01:00
hajdul88	6fcfb3c398	feat: productionizing ontology solution [COG-1401] (#623) ## Description This PR contains the ontology feature integrated into cognify ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced ontology management with the introduction of the `OntologyResolver` class for improved data handling and querying. - Expanded ontology framework now provides enriched coverage of technology and automotive domains, including new entities and relationships. - Updated entity models now include a validation flag to support improved data integrity. - Added support for specifying an ontology file path in relevant functions to enhance flexibility. - Refactor - Streamlined integration of ontology processing across data extraction and workflow routines. - Chores - Updated project dependencies to include `owlready2` for advanced ontology functionality. - Tests - Introduced a new test suite for the `OntologyResolver` class to validate its functionality under various conditions.	2025-03-12 14:31:19 +01:00
alekszievr	c1f7b667d1	feat: Eliminate the use of max_chunk_tokens and use a unified max_chunk_size instead [cog-1381] (#626) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Refactor - Simplified text processing by unifying multiple size-related parameters into a single metric across chunking and extraction functionalities. - Streamlined logic for text segmentation by removing redundant calculations and checks, resulting in a more consistent chunk management process. - Chores - Removed the `modal` package as a dependency. - Documentation - Updated the README.md to include a new demo video link and clarified default environment variable settings. - Enhanced the CONTRIBUTING.md to improve clarity and engagement for potential contributors. - Bug Fixes - Improved handling of sentence-ending punctuation in text processing to include additional characters. - Version Update - Updated project version to 0.1.33 in the pyproject.toml file.	2025-03-12 14:03:41 +01:00
lxobr	ac0156514d	feat: COG-1523 add top_k in run_question_answering (#625) ## Description - Expose top_k as an optional argument of run_question_answering - Update retrievers to handle the parameters ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced answer generation and document retrieval capabilities by introducing an optional parameter that allows users to specify the number of top results. This improvement adds flexibility when retrieving question responses and associated context, adapting the output based on user preference.	2025-03-10 10:55:31 +01:00
vasilije	9d783675e0	Revert "First AI pass at layered graph builder" This reverts commit `1cbcbbd55a`.	2025-03-05 19:48:53 -08:00
vasilije	1cbcbbd55a	First AI pass at layered graph builder	2025-03-05 19:37:45 -08:00
lxobr	f033f733b5	feat: entity brute force triplet search [COG-1325] (#589) ## Description - Refactored `brute_force_triplet_search`, extracting memory projection. - Built TripletSearchContextProvider (extends BaseContextProvider) to create a single memory projection and perform a triplet search for each entity. - Refactored `entity_completion` into EntityCompletionRetriever (extends BaseRetriever). - Added SummarizedTripletSearchContextProvider (extends TripletSearchContextProvider) for an alternative summarized output format. - Developed and tested an example showcasing both context providers, comparing raw triplets, summaries, and standard search results. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced text summarization now delivers clearer, more concise overviews of search results. - Improved search performance with optimized context retrieval and memory reuse for faster, more reliable results. - Introduced advanced entity-based completion for generating more relevant, context-aware responses. - Refactor - Streamlined internal workflows and error handling to ensure a smoother overall experience. --------- Co-authored-by: Boris <boris@topoteretes.com>	2025-03-05 11:17:58 +01:00
hajdul88	5eef212668	Allowing parallel edges in graph projection when using graph completion search (#599) ## Description Allows parallel edges in graph projection when using graph completion search ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Refactor - Streamlined the process for updating connections within the application’s graph. The update now ensures that every connection is consistently recorded and propagated without performing duplicate checks.	2025-03-04 12:37:26 +01:00
hajdul88	e3f3d49a3b	Feature/cog 1312 integrating evaluation framework into dreamify (#562) ## Description This PR contains eval framework changes due to the autooptimizer integration ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced answer generation now returns structured answer details. - Search functionality accepts configurable prompt inputs. - Option to generate a metrics dashboard from evaluations. - Corpus building tasks now support adjustable chunk settings for greater flexibility. - New task retrieval functionality allows for flexible task configuration. - Introduced new methods for creating and managing metrics dashboards. - Refactor/Chore - Streamlined API signatures and reorganized module interfaces for better consistency. - Updated import paths to reflect new module structure. - Tests - Updated test scenarios to align with new configurations and parameter adjustments.	2025-03-03 19:55:47 +01:00
alekszievr	6d7a68dbba	Feat: Store descriptive metrics identified by pipeline run id [cog-1260] (#582) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced a new analytic capability that calculates descriptive graph metrics for pipeline runs when enabled. - Updated the execution flow to include an option for activating the graph metrics step. - Chores - Removed the previous mechanism for storing descriptive metrics to streamline the system. --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com>	2025-03-03 19:09:35 +01:00
lxobr	3d4312577e	fix: Use DataPoint instead of ExtendableDataPoint in get_all_subclasses (#588) ## Description - Use DataPoint instead of ExtendableDataPoint when calling get_all_subclasses in the get_triplets function of the GraphCompletionRetriever ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Refactor - Updated the internal data handling for retrieving information, ensuring a more consistent and reliable output for end-users.	2025-02-27 19:05:09 +01:00
Daniel Molnar	d27f847753	Transition to new retrievers, update searches (#585) ## Description Delete legacy search implementations after migrating to new retriever classes ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced search and retrieval capabilities, providing improved context resolution for code queries, completions, summaries, and graph connections. - Refactor - Shifted to a modular, object-oriented approach that consolidates query logic and streamlines error management for a more robust and scalable experience. - Bug Fixes - Improved error handling for unsupported search types and retrieval operations.	2025-02-27 15:25:24 +01:00
lxobr	9cc357ac1c	Feat/cog 1365 unify retrievers (#572) ## Description - Created the `BaseRetriever` class to unify all the retrievers and searches. - Implemented seven specialized retrievers (summaries, chunks, completions, graph, graph-summary, insights, code) with consistent get_context/get_completion interfaces. - Added json context dumping feature in the current completion implementations to enable context comparisons. - Built a comparison framework to validate old vs new implementations. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced multiple retrieval classes for enhanced search capabilities, including `BaseRetriever`, `ChunksRetriever`, `CodeRetriever`, `CompletionRetriever`, `GraphCompletionRetriever`, `GraphSummaryCompletionRetriever`, `InsightsRetriever`, and `SummariesRetriever`. - Enhanced query completions with optional context saving for improved data persistence. - Implemented advanced tools to compare retrieval outcomes across different implementations. - Refactor - Streamlined internal module organization and updated references for increased maintainability and consistency. - Added comments indicating future maintenance tasks related to code merging. --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-02-27 12:13:21 +01:00
Boris	711ae8e675	feat: codegraph improvements and new CODE search [COG-1351] (#581) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced an automated deployment workflow to build and push container images. - Updated dependency management to include additional database support. - Refactor - Enhanced asynchronous operations and logging in the server for improved performance. - Optimized extraction and retrieval processes for code-related data. - Chores - Streamlined build configurations and startup scripts for greater reliability. --------- Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com> Co-authored-by: Igor Ilic <igorilic03@gmail.com>	2025-02-26 20:15:02 +01:00
alekszievr	a61df966c6	feat: use external chunker [cog-1354] (#551) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced a modular content chunking interface that offers flexible text segmentation with configurable chunk size and overlap. - Added new chunkers for enhanced text processing, including `LangchainChunker` and improved `TextChunker`. - Refactor - Unified the chunk extraction mechanism across various document types for improved consistency and type safety. - Updated method signatures to enhance clarity and type safety regarding chunker usage. - Enhanced error handling and logging during text segmentation to guide adjustments when content exceeds limits. - Bug Fixes - Adjusted expected output in tests to reflect changes in chunking logic and configurations.	2025-02-21 14:10:59 +01:00
hajdul88	eba1515127	feat: quick fix dynamic collection handling in search (#567) [COG-1369] ## Description Fixes search dynamic collection mapping in graph completion search ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Refactor - Adjusted graph processing to remove extraneous notifications when expected data elements are absent. - Updated query processing to ensure a more consistent selection of related data types. - Streamlined database error handling by aligning exception management with standard practices.	2025-02-21 13:45:42 +01:00
lxobr	e25c7c93fe	fix: correctly add nodes to chunks [COG-1370] (#568) ## Description - Fix expand_with_nodes_and_edges to correctly add nodes to chunks ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Refactor - Enhanced the internal processing for data associations to ensure more reliable and consistent handling of connections. - Streamlined the logic to better manage edge cases, improving overall stability and error handling.	2025-02-20 12:52:34 +01:00
Boris	ada466879e	fix: add default params to run_tasks (#563) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced the task execution process by enabling default values for certain parameters, allowing users to trigger task processing without supplying every input explicitly. - Bug Fixes - Adjusted asynchronous handling for the `retrieved_edges_to_string` function to ensure proper execution flow in various components. - Documentation - Updated markdown formatting in the Jupyter notebook for improved readability and structure. --------- Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>	2025-02-19 20:18:51 +01:00
alekszievr	2a167fa1ab	feat: externalize chunkers [cog-1354] (#547) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced document chunk extraction for improved processing consistency across multiple formats. - Refactor - Streamlined the configuration for text chunking by replacing indirect mappings with a direct instantiation approach across document types. - Updated method signatures across various document classes to accept chunker class references instead of string identifiers. - Chores - Removed legacy configuration utilities related to document chunking to simplify processing. --------- Co-authored-by: Boris <boris@topoteretes.com>	2025-02-19 13:26:11 +01:00
alekszievr	4efdb29187	Summarize retrieved edges to compact string [COG-1181] (#522) ## Description Summarize retrieved edges to compact string with no redundancies. Example: Before summarization: CV example: visual innovations -- employs -- visual innovations --- CV 4: Not Relevant Name: David Thompson Contact Information: Email: david.thompson@example.com Phone: (555) 456-7890 Summary: Creative Graphic Designer with over 8 years of experience in visual design and branding. Proficient in Adobe Creative Suite and passionate about creating compelling visuals. Education: B.F.A. in Graphic Design, Rhode Island School of Design (2012) Experience: Senior Graphic Designer, CreativeWorks Agency (2015 – Present) Led design projects for clients in various industries. Created branding materials that increased client engagement by 30%. Graphic Designer, Visual Innovations (2012 – 2015) Designed marketing collateral, including brochures, logos, and websites. Collaborated with the marketing team to develop cohesive brand strategies. Skills: Design Software: Adobe Photoshop, Illustrator, InDesign Web Design: HTML, CSS Specialties: Branding and Identity, Typography -- contains -- creativeworks agency --- CV 4: Not Relevant Name: David Thompson Contact Information: Email: david.thompson@example.com Phone: (555) 456-7890 Summary: Creative Graphic Designer with over 8 years of experience in visual design and branding. Proficient in Adobe Creative Suite and passionate about creating compelling visuals. Education: B.F.A. in Graphic Design, Rhode Island School of Design (2012) Experience: Senior Graphic Designer, CreativeWorks Agency (2015 – Present) Led design projects for clients in various industries. Created branding materials that increased client engagement by 30%. Graphic Designer, Visual Innovations (2012 – 2015) Designed marketing collateral, including brochures, logos, and websites.
Collaborated with the marketing team to develop cohesive brand strategies. Skills: Design Software: Adobe Photoshop, Illustrator, InDesign Web Design: HTML, CSS Specialties: Branding and Identity, Typography -- contains -- visual innovations --- CV 4: Not Relevant Name: David Thompson Contact Information: Email: david.thompson@example.com Phone: (555) 456-7890 Summary: Creative Graphic Designer with over 8 years of experience in visual design and branding. Proficient in Adobe Creative Suite and passionate about creating compelling visuals. Education: B.F.A. in Graphic Design, Rhode Island School of Design (2012) Experience: Senior Graphic Designer, CreativeWorks Agency (2015 – Present) Led design projects for clients in various industries. Created branding materials that increased client engagement by 30%. Graphic Designer, Visual Innovations (2012 – 2015) Designed marketing collateral, including brochures, logos, and websites. Collaborated with the marketing team to develop cohesive brand strategies. Skills: Design Software: Adobe Photoshop, Illustrator, InDesign Web Design: HTML, CSS Specialties: Branding and Identity, Typography -- contains -- rhode island school of design --- Experienced Graphic Designer with over 8 years in visual design and branding, specializing in Adobe Creative Suite and enthusiastic about producing engaging visuals. -- made_from -- CV 4: Not Relevant Name: David Thompson Contact Information: Email: david.thompson@example.com Phone: (555) 456-7890 Summary: Creative Graphic Designer with over 8 years of experience in visual design and branding. Proficient in Adobe Creative Suite and passionate about creating compelling visuals. Education: B.F.A. in Graphic Design, Rhode Island School of Design (2012) Experience: Senior Graphic Designer, CreativeWorks Agency (2015 – Present) Led design projects for clients in various industries. Created branding materials that increased client engagement by 30%. 
Graphic Designer, Visual Innovations (2012 – 2015) Designed marketing collateral, including brochures, logos, and websites. Collaborated with the marketing team to develop cohesive brand strategies. Skills: Design Software: Adobe Photoshop, Illustrator, InDesign Web Design: HTML, CSS Specialties: Branding and Identity, Typography After summarization: David Thompson is a Creative Graphic Designer with over 8 years of experience in visual design and branding, proficient in Adobe Creative Suite and passionate about creating compelling visuals. He holds a B.F.A. in Graphic Design from the Rhode Island School of Design (2012). His experience includes working as a Senior Graphic Designer at CreativeWorks Agency (2015 – Present), where he led design projects and created branding materials that increased client engagement by 30%, and as a Graphic Designer at Visual Innovations (2012 – 2015), where he designed marketing collateral and collaborated with the marketing team to develop cohesive brand strategies. His skills include design software such as Adobe Photoshop, Illustrator, and InDesign, as well as web design in HTML and CSS, with specialties in Branding and Identity and Typography. 1. David Thompson employs his skills in visual design and branding. 2. David Thompson contains experience from CreativeWorks Agency. 3. David Thompson contains experience from Visual Innovations. 4. David Thompson made his qualifications from the Rhode Island School of Design. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced a summarization engine that converts relationship-based inputs into concise, natural sentences.
- Expanded search capabilities with a new query option that generates graph summaries, providing insightful and aggregated results from graph data. - Enhanced asynchronous processing for improved performance in handling graph data queries and summarization. - Added flexibility in specifying string conversion methods for graph edge retrieval. --------- Co-authored-by: Boris <boris@topoteretes.com>	2025-02-18 17:29:55 +01:00
SJ	d05b49863c	Creation of default user to have is_superuser=True by default (#539) ## Description God mode turned on by default for the default user creation. is_superuser=True in create_default_user.py ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - The default user is now created with elevated (superuser) privileges, which may affect access control and permissions.	2025-02-15 03:09:40 +01:00
SJ	a602094598	feat: Update parameters in search API route to match search function parameters order (#528) ## Description Updated handling of SearchType through the chain: - Router receives JSON with searchType Enum, example: "searchType": "CHUNKS" - FastAPI converts to SearchType enum via SearchPayloadDTO - search_v2.py expects SearchType enum - search.py takes SearchType enum and extracts value - log_query.py takes string value - Query model stores string in database. get_search_router.py: - Matched the exact field name from JSON payload, searchType instead of search_type, in the SearchPayloadDTO class. - Changed cognee_search() params to use payload.query and payload.searchType. search.py: - Changed query_type to SearchType - Changed log_query to accept query_type.value parameter instead of str(query_type) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Refactor - Updated the search functionality to improve consistency and reliability. - Enhanced validation by switching to stricter search type checks, ensuring only valid search types are processed. - Maintained robust error handling for uninterrupted search operations. --------- Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com>	2025-02-13 21:04:31 +01:00
Boris Arzentar	20cbcbf52b	fix: ruff error	2025-02-13 14:41:11 +01:00
Boris Arzentar	d0d8559453	fix: consolidate api/sdk/mcp search	2025-02-13 13:15:39 +01:00
Boris	f9e6dcf837	fix: simplify code pipeline (#529 ) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced code search and dependency analysis for improved accuracy. - Introduced a new high-performance text embedding option. - Added an additional execution entry point for code graph processing. - New optional parameters for flexible property selection in retrieval functions. - Introduced new classes for handling import statements, function definitions, and class definitions. - Updated embedding engine selection based on configuration options. - Bug Fixes - Improved error handling in search operations and database queries for a more stable user experience. - Enhanced error logging for source code parsing. - Refactor - Streamlined asynchronous processing and refactored internal dependency extraction. - Updated configuration and integration settings to enhance overall reliability. - Restructured functions for simplified dependency handling. - Chores - Upgraded and reorganized dependency management with optional libraries for extended functionality. - Added new secret parameters for embedding configuration in workflow settings. --------- Co-authored-by: vasilije <vas.markovic@gmail.com>	2025-02-12 23:58:48 +01:00
hajdul88	1b630366c9	Adds types property to pydantic Datapoint inherited classes (#523 ) ## Description This PR adds types to DataPoint pydantic class + fixes visualization colors ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Added a `type` field to the `DataPoint` model for clearer data classification. - Enhanced color mapping in visualizations by assigning a distinct color to "TextSummary" nodes. - Refactor - Improved default settings for version control and ordering to ensure consistent data behavior.	2025-02-11 19:23:19 +01:00
Igor Ilic	ebd1d2adbf	fix: Resolve issue with UUID in telemetry (#524 ) ## Description Fixes sending of UUID through telemetry ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Bug Fixes - Enhanced telemetry logging by ensuring identifiers are consistently formatted. This improvement helps prevent type-related issues during logging and boosts overall reliability without affecting task execution or user-facing functionality.	2025-02-11 18:31:53 +01:00
alekszievr	05ba29af01	Feat: log pipeline status and pass it through pipeline [COG-1214] (#501 ) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced pipeline execution now provides consolidated status feedback with improved telemetry for start, completion, and error events. - Automatic generation of unique dataset identifiers offers clearer task and pipeline run associations. - Refactor - Task execution has been streamlined with explicit parameter handling for more structured pipeline processing. - Interactive examples and demos now return results directly, making integration and monitoring more accessible. --------- Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>	2025-02-11 16:41:40 +01:00
hajdul88	6a0c0e3ef8	feat: Cognee evaluation framework development (#498 ) This PR contains the evaluation framework development for cognee ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Expanded evaluation framework now integrates asynchronous corpus building, question answering, and performance evaluation with adaptive benchmarks for improved metrics (correctness, exact match, and F1 score). - Infrastructure - Added database integration for persistent storage of questions, answers, and metrics. - Launched an interactive metrics dashboard featuring advanced visualizations. - Introduced an automated testing workflow for continuous quality assurance. - Documentation - Updated guidelines for generating concise, clear answers.	2025-02-11 16:31:54 +01:00
Boris	8f84713b54	fix: support structured data conversion to data points (#512 ) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced version tracking and enhanced metadata in core data models for improved data consistency. - Bug Fixes - Improved error handling during graph data loading to prevent disruptions from unexpected identifier formats. - Refactor - Centralized identifier parsing and streamlined model definitions, ensuring smoother and more consistent operations across search, retrieval, and indexing workflows.	2025-02-10 17:16:13 +01:00
Boris	f75e35c337	fix: custom model pipeline (#508 ) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features • Graph visualizations now allow exporting to a user-specified file path for more flexible output management. • The text embedding process has been enhanced with an additional tokenizer option for improved performance. • A new `ExtendableDataPoint` class has been introduced for future extensions. • New JSON files for companies and individuals have been added to facilitate testing and data processing. - Improvements • Search functionality now uses updated identifiers for more reliable content retrieval. • Metadata handling has been streamlined across various classes by removing unnecessary type specifications. • Enhanced serialization of properties in the Neo4j adapter for improved handling of complex structures. • The setup process for databases has been improved with a new asynchronous setup function. - Chores • Dependency and configuration updates improve overall stability and performance.	2025-02-08 02:00:15 +01:00
Igor Ilic	6be6b3d222	fix: resolve key error for visualization, handle ValueError for jedi … (#509 ) …and move duplicate edge information to debug log ## Description Fix visualization bug Handle ValueError for CodeGraph Move debug information from print to debug logs ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Bug Fixes - Enhanced error handling across several modules to ensure smoother operation when unexpected conditions occur. - Updated diagnostic and logging mechanisms to provide more robust system feedback and reduce potential disruptions. - Improved robustness in the deletion of properties to prevent runtime errors related to missing keys. - Added additional exception handling for better analysis of code entities.	2025-02-08 00:46:46 +01:00
alekszievr	8396fed9a1	feat: metrics in neo4j adapter [COG-1082] (#487 ) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enhanced graph management capabilities allow users to verify graph existence, project complete graphs, and remove graphs, delivering more comprehensive graph insights. - Refactor - Adjusted default task behavior for streamlined performance. - Updated timestamp handling to ensure accurate and consistent record tracking. --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>	2025-02-07 15:58:43 +01:00
hajdul88	bcd326518d	feat: implements graph visualization method for cognee (#493 ) ## Description This PR contains the improvement of the visualization endpoint ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Launched an enhanced interactive network visualization utility that renders dynamic, browser-based graphs. The new feature simplifies execution by directly generating an HTML file showcasing the visualization—complete with interactive elements and an on-screen confirmation—providing a more intuitive and efficient experience.	2025-02-06 11:22:17 +01:00
Igor Ilic	1260fc7db0	fix: Add reraising of general exception handling in cognee [COG-1062] (#490 ) ## Description Add re-raising of errors in general exception handling ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Bug Fixes & Stability Improvements - Enhanced error handling throughout the system, ensuring issues during operations like server startup, data processing, and graph management are properly logged and reported. - Refactor - Standardized logging practices replace basic output statements, improving traceability and providing better insights for troubleshooting. - New Features - Updated search functionality now returns only unique results, enhancing data consistency and the overall user experience. --------- Co-authored-by: holchan <61059652+holchan@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com>	2025-02-04 10:51:05 +01:00
Vasilije	4d3acc358a	fix: mcp improvements (#472 ) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Dependency Update - Downgraded `mcp` package version from 1.2.0 to 1.1.3 - Updated `cognee` dependency to include additional features with `cognee[codegraph]` - New Features - Introduced a new tool, "codify", for transforming codebases into knowledge graphs - Enhanced the existing "search" tool to accept a new parameter for search type - Improvements - Streamlined search functionality with a new modular approach - Added new asynchronous function for retrieving and formatting code parts - Documentation - Updated import paths for `SearchType` in various modules and tests to reflect structural changes - Code Cleanup - Removed legacy search module and associated classes/functions - Refined data transfer object classes for consistency and clarity --------- Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>	2025-02-04 08:47:31 +01:00
alekszievr	2858a674f5	feat: Calculate graph metrics for networkx graph [COG-1082] (#484 ) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Enabled an option to retrieve more detailed metrics, providing comprehensive analytics for graph and descriptive data. - Refactor - Standardized the way metrics are obtained across components for consistent behavior and improved data accuracy. - Chore - Made internal enhancements to support optional detailed metric calculations, streamlining system performance and ensuring future scalability. --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>	2025-02-03 18:05:53 +01:00
alekszievr	5119992fd8	feat: Add graph metrics getter in graph db interface and adapters [COG-1082] (#483 ) Dummy implementation of graph metrics to demonstrate what the interface will look like ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced asynchronous functionality for retrieving comprehensive graph metrics, including counts and connectivity details, across different systems. - Refactor - Streamlined metrics processing and storage by shifting to direct retrieval from the graph engine. - Updated naming conventions for the `GraphMetrics` database table and reorganized module imports to enhance internal consistency. - Chores - Removed dataset deletion functionalities while introducing the ability to store descriptive metrics. --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>	2025-02-03 15:25:04 +01:00
Igor Ilic	8879f3fbbe	feat: Add gemini support [COG-1023] (#485 ) ## Description PR to test Gemini PR from holchan 1. Add Gemini LLM and Gemini Embedding support 2. Fix CodeGraph issue with chunks being bigger than maximum token value 3. Add Tokenizer adapters to CodeGraph ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Added support for the Gemini LLM provider. - Expanded LLM configuration options. - Introduced a new GitHub Actions workflow for multimetric QA evaluation. - Added new environment variables for LLM and embedding configurations across various workflows. - Bug Fixes - Improved error handling in various components. - Updated tokenization and embedding processes. - Removed warning related to missing `dict` method in data items. - Refactor - Simplified token extraction and decoding methods. - Updated tokenizer interfaces. - Removed deprecated dependencies. - Enhanced retry logic and error handling in embedding processes. - Documentation - Updated configuration comments and settings. - Chores - Updated GitHub Actions workflows to accommodate new secrets and environment variables. - Modified evaluation parameters. - Adjusted dependency management for optional libraries. --------- Co-authored-by: holchan <61059652+holchan@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com>	2025-01-31 18:03:23 +01:00
hajdul88	f843c256e4	feat: Use unwind for batch edge save and add unit tests for get_graph_from_model * feat: adds some unit tests for get_graph_from_model * feat: updates neo4j add_edges cypher and deletes shallow get_graph_from_model * fix: fixing merge conflict false resolve * chore: deletes old only_root unit test	2025-01-31 13:14:04 +01:00
alekszievr	a79f7133fd	Feat: add number of tokens and descriptive graph metrics to metric table [COG-1132] (#481 ) * Count the number of tokens in documents * save token count to relational db * Add metrics to metric table * Store list as json instead of array in relational db table * Sum in sql instead of python * Unify naming * Return data_points in descriptive metric calculation task --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>	2025-01-30 12:39:14 +01:00
alekszievr	edae2771a5	Count the number of tokens in documents [COG-1071] (#476 ) * Count the number of tokens in documents * save token count to relational db --------- Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>	2025-01-29 11:29:09 +01:00
Igor Ilic	860218632f	refactor: add suggestions from PR Add suggestions made by CodeRabbit on pull request	2025-01-28 17:15:25 +01:00
Igor Ilic	710ca78d6e	Merge branch 'dev' into COG-970-refactor-tokenizing	2025-01-28 16:31:11 +01:00
alekszievr	98f0f60980	Feat: [cog-1089] Define pydantic models for descriptive graph metrics and input metrics (#466 ) * feat: make tasks a configurable argument in the cognify function * fix: add data points task * Define pydantic models for descriptive graph metrics and input metrics * remove to_json method * Use just one MetricData class instead of two --------- Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>	2025-01-28 16:11:31 +01:00
Igor Ilic	6f8cbdbf1c	Merge branch 'dev' into COG-970-refactor-tokenizing	2025-01-28 15:44:57 +01:00
Igor Ilic	3db7f85c9c	feat: Add max_chunk_tokens value to chunkers Add formula and forwarding of max_chunk_tokens value through Cognee	2025-01-28 14:32:00 +01:00


470 commits