* Revert "fix: Add metadata reflection fix to sqlite as well"
This reverts commit 394a0b2dfb.
* COG-810 Implement a top-down dependency graph builder tool (#268)
* feat: parse repo to call graph
* Update/repo_processor/top_down_repo_parse.py task
* fix: minor improvements
* feat: file parsing jedi script optimisation
---------
* Add type to DataPoint metadata (#364)
* Add missing index_fields
* Use DataPoint UUID type in pgvector create_data_points
* Make _metadata mandatory everywhere
* feat: Add search by dataset for cognee
Added ability to search by datasets for cognee users
Feature COG-912
* feat: outsources chunking parameters to extract chunk from documents … (#289)
* feat: outsources chunking parameters to extract chunk from documents task
* fix: Remove backend lock from UI
Removed lock that prevented using multiple datasets in cognify
Fix COG-912
* COG 870 Remove duplicate edges from the code graph (#293)
* feat: turn summarize_code into generator
* feat: extract run_code_graph_pipeline, update the pipeline
* feat: minimal code graph example
* refactor: update argument
* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline
* refactor: indentation and whitespace nits
* refactor: add deprecated use comments and warnings
---------
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
* test: Added test for getting of documents for search
Added test to verify getting documents related to datasets intended for search
Test COG-912
* Structured code summarization (#375)
* feat: turn summarize_code into generator
* feat: extract run_code_graph_pipeline, update the pipeline
* feat: minimal code graph example
* refactor: update argument
* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline
* refactor: indentation and whitespace nits
* refactor: add deprecated use comments and warnings
* Structured code summarization
* add missing prompt file
* Remove summarization_model argument from summarize_code and fix typehinting
* minor refactors
---------
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
* fix: Resolve issue with cognify router graph model default value
Resolve issue with default value for graph model in cognify endpoint
Fix
* chore: Resolve typo in getting documents code
Resolve typo in code
chore COG-912
* Update .github/workflows/dockerhub.yml
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Update .github/workflows/dockerhub.yml
* Update .github/workflows/dockerhub.yml
* Update .github/workflows/dockerhub.yml
* Update get_cognify_router.py
* fix: Resolve syntax issue with cognify router
Resolve syntax issue with cognify router
Fix
* feat: Add ruff pre-commit hook for linting and formatting
Added formatting and linting on pre-commit hook
Feature COG-650
* chore: Update ruff lint options in pyproject file
Update ruff lint options in pyproject file
Chore
* test: Add ruff linter github action
Added linting check with ruff in github actions
Test COG-650
* feat: deletes executor limit from get_repo_file_dependencies
* feat: implements mock feature in LiteLLM engine
* refactor: Remove changes to cognify router
Remove changes to cognify router
Refactor COG-650
* fix: fixing boolean env for github actions
* test: Add test for ruff format for cognee code
Test if code is formatted for cognee
Test COG-650
* refactor: Rename ruff gh actions
Rename ruff gh actions to be more understandable
Refactor COG-650
* chore: Remove checking of ruff lint and format on push
Remove checking of ruff lint and format on push
Chore COG-650
* feat: Add deletion of local files when deleting data
Delete local files when deleting data from cognee
Feature COG-475
* fix: changes back the max workers to 12
* feat: Adds mock summary for codegraph pipeline
* refactor: Add current development status
Save current development status
Refactor
* Fix langfuse
* Fix langfuse
* Fix langfuse
* Add evaluation notebook
* Rename eval notebook
* chore: Add temporary state of development
Add temp development state to branch
Chore
* fix: Add poetry.lock file, make langfuse mandatory
Added langfuse as mandatory dependency, added poetry.lock file
Fix
* Fix: fixes langfuse config settings
* feat: Add deletion of local files made by cognee through data endpoint
Delete local files made by cognee when deleting data from database through endpoint
Feature COG-475
* test: Revert changes on test_pgvector
Revert changes on test_pgvector which were made to test deletion of local files
Test COG-475
* chore: deletes the old test for the codegraph pipeline
* test: Add test to verify deletion of local files
Added test that checks local files created by cognee will be deleted and those not created by cognee won't
Test COG-475
* chore: deletes unused old version of the codegraph
* chore: deletes unused imports from code_graph_pipeline
* Ingest non-code files
* Fixing review findings
* Ingest non-code files (#395)
* Ingest non-code files
* Fixing review findings
* test: Update test regarding message
Update assertion message, add verifying of file existence
* Handle retryerrors in code summary (#396)
* Handle retryerrors in code summary
* Log instead of print
* fix: updates the acreate_structured_output
* chore: Add logging to sentry when file which should exist can't be found
Log to sentry that a file which should exist can't be found
Chore COG-475
* Fix diagram
* fix: refactor mcp
* Add Smithery CLI installation instructions and badge
* Move readme
* Update README.md
* Update README.md
* Cog 813 source code chunks (#383)
* fix: pass the list of all CodeFiles to enrichment task
* feat: introduce SourceCodeChunk, update metadata
* feat: get_source_code_chunks code graph pipeline task
* feat: integrate get_source_code_chunks task, comment out summarize_code
* Fix code summarization (#387)
* feat: update data models
* feat: naive parse long strings in source code
* fix: get_non_py_files instead of get_non_code_files
* fix: limit recursion, add comment
* handle embedding empty input error (#398)
* feat: robustly handle CodeFile source code
* refactor: sort imports
* todo: add support for other embedding models
* feat: add custom logger
* feat: add robustness to get_source_code_chunks
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* feat: improve embedding exceptions
* refactor: format indents, rename module
---------
Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Fix diagram
* Fix diagram
* Fix instructions
* Fix instructions
* adding and fixing files
* Update README.md
* ruff format
* Fix linter issues
* Fix linter issues
* Fix linter issues
* Fix linter issues
* Fix linter issues
* Fix linter issues
* Fix linter issues
* Fix linter issues
* Fix linter issues
* Fix linter issues
* Implement PR review
* Comment out profiling
* Comment out profiling
* Comment out profiling
* fix: add allowed extensions
* fix: adhere UnstructuredDocument.read() to Document
* feat: time code graph run and add mock support
* Fix ollama, work on visualization
* fix: Fixes faulty logging format and sets up error logging in dynamic steps example
* Overcome ContextWindowExceededError by checking token count while chunking (#413)
* fix: Fixes duplicated edges in cognify by limiting the recursion depth in add datapoints
* Adjust AudioDocument and handle None token limit
* Handle azure models as well
* Fix visualization
* Fix visualization
* Fix visualization
* Add clean logging to code graph example
* Remove setting envvars from arg
* fix: fixes create_cognee_style_network_with_logo unit test
* fix: removes an accidentally left-over print
* Fix visualization
* Fix visualization
* Fix visualization
* Get embedding engine instead of passing it. Get it from vector engine instead of direct getter.
* Fix visualization
* Fix visualization
* Fix poetry issues
* Get embedding engine instead of passing it in code chunking.
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* chore: Update version of poetry install action
* chore: Update action to trigger on pull request for any branch
* chore: Remove if in github action to allow triggering on push
* chore: Remove if condition to allow gh actions to trigger on push to PR
* chore: Update poetry version in github actions
* chore: Set fixed ubuntu version to 22.04
* chore: Update py lint to use ubuntu 22.04
* chore: update ubuntu version to 22.04
* feat: implements the first version of graph based completion in search
* chore: Update python 3.9 gh action to use 3.12 instead
* chore: Update formatting of utils.py
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Adjust integration tests
* fix: Fixes ruff formatting
* Handle circular import
* fix: Resolve profiler issue with partial and recursive logger imports
Resolve issue for profiler with partial and recursive logger imports
* fix: Remove logger from __init__.py file
* test: Test profiling on HEAD branch
* test: Return profiler to base branch
* Set max_tokens in config
* Adjust SWE-bench script to code graph pipeline call
* Adjust SWE-bench script to code graph pipeline call
* fix: Add fix for accessing dictionary elements that don't exist
Using get for the text key instead of direct access to handle the situation where the text key doesn't exist
* feat: Add ability to change graph database configuration through cognee
* feat: adds pydantic types to graph layer models
* test: Test ubuntu 24.04
* test: change all actions to ubuntu-latest
* feat: adds basic retriever for swe bench
* Match Ruff version in config to the one in github actions
* feat: implements code retriever
* Fix: fixes unit test for codepart search
* Format with Ruff 0.9.0
* Fix: deleting incorrect repo path
* docs: Add LlamaIndex Cognee integration notebook
Added LlamaIndex Cognee integration notebook
* test: Add github action for testing llama index cognee integration notebook
* fix: resolve issue with langfuse dependency installation when integrating cognee in different packages
* version: Increase version to 0.1.21
* fix: update dependencies of the mcp server
* Update README.md
* Fix: Fixes logging setup
* feat: deletes on the fly embeddings as uses edge collections
* fix: Change nbformat on llama index integration notebook
* fix: Resolve api key issue with llama index integration notebook
* fix: Attempt to resolve issue with Ubuntu 24.04 segmentation fault
* version: Increase version to 0.1.22
---------
Co-authored-by: vasilije <vas.markovic@gmail.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Rita Aleksziev <alekszievr@gmail.com>
Co-authored-by: Henry Mao <1828968+calclavia@users.noreply.github.com>
207 lines
6.2 KiB
Python
import cognee
import asyncio
import logging

from cognee.api.v1.search import SearchType
from cognee.shared.utils import setup_logging

job_1 = """
CV 1: Relevant
Name: Dr. Emily Carter
Contact Information:

Email: emily.carter@example.com
Phone: (555) 123-4567
Summary:

Senior Data Scientist with over 8 years of experience in machine learning and predictive analytics. Expertise in developing advanced algorithms and deploying scalable models in production environments.

Education:

Ph.D. in Computer Science, Stanford University (2014)
B.S. in Mathematics, University of California, Berkeley (2010)
Experience:

Senior Data Scientist, InnovateAI Labs (2016 – Present)
Led a team in developing machine learning models for natural language processing applications.
Implemented deep learning algorithms that improved prediction accuracy by 25%.
Collaborated with cross-functional teams to integrate models into cloud-based platforms.
Data Scientist, DataWave Analytics (2014 – 2016)
Developed predictive models for customer segmentation and churn analysis.
Analyzed large datasets using Hadoop and Spark frameworks.
Skills:

Programming Languages: Python, R, SQL
Machine Learning: TensorFlow, Keras, Scikit-Learn
Big Data Technologies: Hadoop, Spark
Data Visualization: Tableau, Matplotlib
"""

job_2 = """
CV 2: Relevant
Name: Michael Rodriguez
Contact Information:

Email: michael.rodriguez@example.com
Phone: (555) 234-5678
Summary:

Data Scientist with a strong background in machine learning and statistical modeling. Skilled in handling large datasets and translating data into actionable business insights.

Education:

M.S. in Data Science, Carnegie Mellon University (2013)
B.S. in Computer Science, University of Michigan (2011)
Experience:

Senior Data Scientist, Alpha Analytics (2017 – Present)
Developed machine learning models to optimize marketing strategies.
Reduced customer acquisition cost by 15% through predictive modeling.
Data Scientist, TechInsights (2013 – 2017)
Analyzed user behavior data to improve product features.
Implemented A/B testing frameworks to evaluate product changes.
Skills:

Programming Languages: Python, Java, SQL
Machine Learning: Scikit-Learn, XGBoost
Data Visualization: Seaborn, Plotly
Databases: MySQL, MongoDB
"""


job_3 = """
CV 3: Relevant
Name: Sarah Nguyen
Contact Information:

Email: sarah.nguyen@example.com
Phone: (555) 345-6789
Summary:

Data Scientist specializing in machine learning with 6 years of experience. Passionate about leveraging data to drive business solutions and improve product performance.

Education:

M.S. in Statistics, University of Washington (2014)
B.S. in Applied Mathematics, University of Texas at Austin (2012)
Experience:

Data Scientist, QuantumTech (2016 – Present)
Designed and implemented machine learning algorithms for financial forecasting.
Improved model efficiency by 20% through algorithm optimization.
Junior Data Scientist, DataCore Solutions (2014 – 2016)
Assisted in developing predictive models for supply chain optimization.
Conducted data cleaning and preprocessing on large datasets.
Skills:

Programming Languages: Python, R
Machine Learning Frameworks: PyTorch, Scikit-Learn
Statistical Analysis: SAS, SPSS
Cloud Platforms: AWS, Azure
"""


job_4 = """
CV 4: Not Relevant
Name: David Thompson
Contact Information:

Email: david.thompson@example.com
Phone: (555) 456-7890
Summary:

Creative Graphic Designer with over 8 years of experience in visual design and branding. Proficient in Adobe Creative Suite and passionate about creating compelling visuals.

Education:

B.F.A. in Graphic Design, Rhode Island School of Design (2012)
Experience:

Senior Graphic Designer, CreativeWorks Agency (2015 – Present)
Led design projects for clients in various industries.
Created branding materials that increased client engagement by 30%.
Graphic Designer, Visual Innovations (2012 – 2015)
Designed marketing collateral, including brochures, logos, and websites.
Collaborated with the marketing team to develop cohesive brand strategies.
Skills:

Design Software: Adobe Photoshop, Illustrator, InDesign
Web Design: HTML, CSS
Specialties: Branding and Identity, Typography
"""


job_5 = """
CV 5: Not Relevant
Name: Jessica Miller
Contact Information:

Email: jessica.miller@example.com
Phone: (555) 567-8901
Summary:

Experienced Sales Manager with a strong track record in driving sales growth and building high-performing teams. Excellent communication and leadership skills.

Education:

B.A. in Business Administration, University of Southern California (2010)
Experience:

Sales Manager, Global Enterprises (2015 – Present)
Managed a sales team of 15 members, achieving a 20% increase in annual revenue.
Developed sales strategies that expanded customer base by 25%.
Sales Representative, Market Leaders Inc. (2010 – 2015)
Consistently exceeded sales targets and received the 'Top Salesperson' award in 2013.
Skills:

Sales Strategy and Planning
Team Leadership and Development
CRM Software: Salesforce, Zoho
Negotiation and Relationship Building
"""


async def main(enable_steps):
    # Step 1: Reset data and system state
    if enable_steps.get("prune_data"):
        await cognee.prune.prune_data()
        print("Data pruned.")

    if enable_steps.get("prune_system"):
        await cognee.prune.prune_system(metadata=True)
        print("System pruned.")

    # Step 2: Add text
    if enable_steps.get("add_text"):
        text_list = [job_1, job_2, job_3, job_4, job_5]
        for text in text_list:
            await cognee.add(text)
            print(f"Added text: {text[:35]}...")

    # Step 3: Create knowledge graph
    if enable_steps.get("cognify"):
        await cognee.cognify()
        print("Knowledge graph created.")

    # Step 4: Query insights
    if enable_steps.get("retriever"):
        search_results = await cognee.search(
            SearchType.GRAPH_COMPLETION, query_text="Who has experience in design tools?"
        )
        print(search_results)


if __name__ == "__main__":
    setup_logging(logging.ERROR)

    rebuild_kg = True
    retrieve = True
    steps_to_enable = {
        "prune_data": rebuild_kg,
        "prune_system": rebuild_kg,
        "add_text": rebuild_kg,
        "cognify": rebuild_kg,
        "retriever": retrieve,
    }

    asyncio.run(main(steps_to_enable))
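The toggle pattern in the script (a dict of step names mapped to booleans, checked with `enable_steps.get(...)`) can be exercised without a live cognee backend by stubbing the pipeline steps. This is a minimal sketch under that assumption: `run_steps` and `_noop` are hypothetical helpers written for illustration, not part of the cognee API.

```python
import asyncio


async def run_steps(enable_steps, steps):
    """Run each named async step whose toggle is True; return the names executed."""
    executed = []
    for name, coro_fn in steps.items():
        if enable_steps.get(name):
            await coro_fn()
            executed.append(name)
    return executed


async def _noop():
    # Stand-in for a real pipeline step such as cognee.cognify()
    pass


toggles = {"prune_data": False, "add_text": True, "cognify": True, "retriever": True}
steps = {name: _noop for name in ["prune_data", "add_text", "cognify", "retriever"]}
ran = asyncio.run(run_steps(toggles, steps))
print(ran)  # ['add_text', 'cognify', 'retriever']
```

Because `dict` preserves insertion order, the enabled steps run in exactly the order they are declared, which is what the `rebuild_kg` / `retrieve` flags in the script rely on.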