Merge pull request #31 from topoteretes/enable_cmd_runner
Added docs functionality
commit d76cdf77fc
18 changed files with 652 additions and 120 deletions
README.md (131 changes)

@@ -1,20 +1,20 @@
 # PromethAI-Memory
-Memory management and testing for the AI Applications and RAGs
-Dynamic Graph Memory Manager + DB + Rag Test Manager
+AI Applications and RAGs - Cognitive Architecture, Testability, Production Ready Apps

-<p align="center">
+<p align="left">
   <a href="https://prometh.ai//#gh-light-mode-only">
-    <img src="assets/topoteretes_logo.png" width="10%" alt="promethAI logo" />
+    <img src="assets/topoteretes_logo.png" width="5%" alt="promethAI logo" />
   </a>
 </p>

-<p align="center"><i>Open-source framework that manages memory for AI Agents and LLM apps </i></p>
+<p align="left"><i>Open-source framework for building and testing RAGs and Cognitive Architectures, designed for accuracy, transparency, and control.</i></p>

-<p align="center">
+<p align="left">
   <a href="https://github.com/topoteretes/PromethAI-Memory/fork" target="blank">
     <img src="https://img.shields.io/github/forks/topoteretes/PromethAI-Memory?style=for-the-badge" alt="promethAI forks"/>
   </a>
@@ -52,9 +52,9 @@ Dynamic Graph Memory Manager + DB + Rag Test Manager
 [//]: # (</p>)

-<p align="center"><b>Share promethAI Repository</b></p>
+<p align="left"><b>Share promethAI Repository</b></p>

-<p align="center">
+<p align="left">
   <a href="https://twitter.com/intent/tweet?text=Check%20this%20GitHub%20repository%20out.%20promethAI%20-%20Let%27s%20you%20easily%20build,%20manage%20and%20run%20useful%20autonomous%20AI%20agents.&url=https://github.com/topoteretes/PromethAI-Backend-Backend&hashtags=promethAI,AGI,Autonomics,future" target="blank">
     <img src="https://img.shields.io/twitter/follow/_promethAI?label=Share Repo on Twitter&style=social" alt="Follow _promethAI"/></a>
@@ -71,33 +71,40 @@ Dynamic Graph Memory Manager + DB + Rag Test Manager

+This repo is built to test and evolve RAG architectures, inspired by human cognitive processes, using Python. It aims to be production-ready and testable, while giving great visibility into how RAG applications are built.
+
+This project is a part of the [PromethAI](https://prometh.ai/) ecosystem.
+
+It runs in iterations, with each iteration building on the previous one.
+
+_Keep Ithaka always in your mind.
+Arriving there is what you’re destined for.
+But don’t hurry the journey at all.
+Better if it lasts for years_

-## Production-ready modern data platform
+### Installation

+To get started with PromethAI Memory, start with the latest iteration and follow the instructions in the README.md file.

-Browsing the database of theresanaiforthat.com, we can observe around [7000 new, mostly semi-finished projects](https://theresanaiforthat.com/) in the field of applied AI.
-It seems it has never been easier to create a startup, build an app, and go to market… and fail.
+### Current Focus

-Decades of technological advancements have led to small teams being able to do in 2023 what in 2015 required a team of dozens.
-Yet, the AI apps currently being pushed out still mostly feel and perform like demos.
+The RAG test manager can be used via the API or via the CLI.

-The rise of this new profession is perhaps signaling the need for a solution that is not yet there — a solution that in its essence represents a Large Language Model (LLM) — [a powerful general problem solver](https://lilianweng.github.io/posts/2023-06-23-agent/?fbclid=IwAR1p0W-Mg_4WtjOCeE8E6s7pJZlTDCDLmcXqHYVIrEVisz_D_S8LfN6Vv20) — available in the palm of your hand 24/7/365.

-To address this issue, [dlthub](https://dlthub.com/) and [prometh.ai](http://prometh.ai/) will collaborate on a productionizing a common use-case, progressing step by step. We will utilize the LLMs, frameworks, and services, refining the code until we attain a clearer understanding of what a modern LLM architecture stack might entail.

-## Read more on our blog post [prometh.ai](http://prometh.ai/promethai-memory-blog-post-on)
+### Project Structure

-## Project Structure

-### Level 1 - OpenAI functions + Pydantic + DLTHub
+#### Level 1 - OpenAI functions + Pydantic + DLTHub
 Scope: Give PDFs to the model and get the output in a structured format
+Blog post: https://prometh.ai/promethai-memory-blog-post-one
 We introduce the following concepts:
 - Structured output with Pydantic
 - CMD script to process custom PDFs
-### Level 2 - Memory Manager + Metadata management
+#### Level 2 - Memory Manager + Metadata management
 Scope: Give PDFs to the model and consolidate with the previous user activity and more
+Blog post: https://prometh.ai/promethai-memory-blog-post-two
 We introduce the following concepts:
 - Long Term Memory -> store and format the data
@@ -106,8 +113,9 @@ We introduce the following concepts:
 - Docker
 - API

-### Level 3 - Dynamic Graph Memory Manager + DB + Rag Test Manager
+#### Level 3 - Dynamic Graph Memory Manager + DB + Rag Test Manager
 Scope: Store the data in N-related stores and test the retrieval with the Rag Test Manager
+Blog post: https://prometh.ai/promethai-memory-blog-post-three
 - Dynamic Memory Manager -> store the data in N hierarchical stores
 - Auto-generation of tests
 - Multiple file formats supported
@@ -116,26 +124,92 @@ Scope: Store the data in N-related stores and test the retrieval with the Rag Te
 - API

-## Run the level 3
+### Run the level 3

 Make sure you have Docker, Poetry, Python 3.11, and Postgres installed.

-Copy the .env.example to .env and fill the variables
+Copy the .env.example to .env and fill in the variables

-Start the docker:
+Two ways to run level 3:

-```docker compose up promethai_mem ```
+#### Docker

+Copy the .env.template to .env and fill in the variables
+Set the ENVIRONMENT variable in the .env file to "docker"

+Launch the docker image:

+```docker compose up promethai_mem ```

+Send a request to the API:

+```
+curl -X POST -H "Content-Type: application/json" -d '{
+  "payload": {
+    "user_id": "681",
+    "data": [".data/3ZCCCW.pdf"],
+    "test_set": "sample",
+    "params": ["chunk_size"],
+    "metadata": "sample",
+    "retriever_type": "single_document_context"
+  }
+}' http://0.0.0.0:8000/rag-test/rag_test_run
+```

+Params:
+- data -> list of URLs or a path to a file located in the .data folder (pdf, docx, txt, html)
+- test_set -> sample, manual (list of questions and answers)
+- metadata -> sample, manual (json) or version (in progress)
+- params -> chunk_size, chunk_overlap, search_type (hybrid, bm25), embeddings
+- retriever_type -> llm_context, single_document_context, multi_document_context, cognitive_architecture (coming soon)

+Inspect the results in the DB:

+``` docker exec -it postgres psql -U bla ```
+``` \c bubu ```
+``` select * from test_outputs; ```

+Or set up Superset to visualize the results:

+#### Poetry environment

+Copy the .env.template to .env and fill in the variables
+Set the ENVIRONMENT variable in the .env file to "local"

 Use the poetry environment:

 ``` poetry shell ```

-Change the .env file Environment variable to "local"

+Launch the postgres DB:

+``` docker compose up postgres ```

+Launch Superset:

+``` docker compose up superset ```

+Open Superset in your browser:

+``` http://localhost:8088 ```

+Add the Postgres datasource to Superset with the following connection string:

+``` postgres://bla:bla@postgres:5432/bubu ```

-Make sure to run to initialize DB tables
+Run the following script to initialize the DB tables:

 ``` python scripts/create_database.py ```

-After that, you can run the RAG test manager.
+After that, you can run the RAG test manager from your command line.

 ```
@@ -149,3 +223,4 @@ After that, you can run the RAG test manager.
 ```

 Examples of metadata structure and test set are in the folder "example_data"
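For programmatic use, the curl example above can be built from Python. This is a minimal sketch: the field names and sample values come from the README, while the helper name is ours; the actual POST is shown commented out so the sketch runs offline.

```python
import json

# Hypothetical helper mirroring the curl example; only payload construction
# is exercised here.
def build_rag_test_payload(user_id, data, test_set="sample", params=None,
                           metadata="sample",
                           retriever_type="single_document_context"):
    """Build the JSON body expected by POST /rag-test/rag_test_run."""
    return {
        "payload": {
            "user_id": user_id,
            "data": data,                      # list of paths under .data/
            "test_set": test_set,              # "sample" or "manual"
            "params": params or ["chunk_size"],
            "metadata": metadata,              # "sample" or "manual"
            "retriever_type": retriever_type,
        }
    }

body = build_rag_test_payload("681", [".data/3ZCCCW.pdf"])
print(json.dumps(body, indent=2))
# To actually send it:
# import urllib.request
# req = urllib.request.Request(
#     "http://0.0.0.0:8000/rag-test/rag_test_run",
#     data=json.dumps(body).encode(),
#     headers={"Content-Type": "application/json"},
# )
# urllib.request.urlopen(req)
```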
@@ -1,3 +1,9 @@
 OPENAI_API_KEY=sk
 WEAVIATE_URL =
 WEAVIATE_API_KEY =
+ENVIRONMENT = docker
+POSTGRES_USER = bla
+POSTGRES_PASSWORD = bla
+POSTGRES_DB = bubu
+POSTGRES_HOST = localhost
+POSTGRES_HOST_DOCKER = postgres
@@ -43,6 +43,7 @@ RUN apt-get update -q && \
 WORKDIR /app
 COPY . /app
+COPY scripts/ /app
 COPY entrypoint.sh /app/entrypoint.sh
 COPY scripts/create_database.py /app/create_database.py
 RUN chmod +x /app/entrypoint.sh
@@ -1,9 +1,11 @@
+import json
 import logging
 import os
+from enum import Enum
 from typing import Dict, Any

 import uvicorn
-from fastapi import FastAPI
+from fastapi import FastAPI, BackgroundTasks
 from fastapi.responses import JSONResponse
 from pydantic import BaseModel
@@ -11,6 +13,7 @@ from database.database import AsyncSessionLocal
 from database.database_crud import session_scope
 from vectorstore_manager import Memory
 from dotenv import load_dotenv
+from rag_test_manager import start_test

 # Set up logging
 logging.basicConfig(
@@ -200,25 +203,100 @@ def memory_factory(memory_type):
 memory_list = ["episodic", "buffer", "semantic"]
 for memory_type in memory_list:
     memory_factory(memory_type)

+class TestSetType(Enum):
+    SAMPLE = "sample"
+    MANUAL = "manual"
+
+
+def get_test_set(test_set_type, folder_path="example_data", payload=None):
+    if test_set_type == TestSetType.SAMPLE:
+        file_path = os.path.join(folder_path, "test_set.json")
+        if os.path.isfile(file_path):
+            with open(file_path, "r") as file:
+                return json.load(file)
+    elif test_set_type == TestSetType.MANUAL:
+        # Check if the manual test set is provided in the payload
+        if payload and "manual_test_set" in payload:
+            return payload["manual_test_set"]
+        else:
+            # Attempt to load the manual test set from a file
+            pass
+    return None
+
+
+class MetadataType(Enum):
+    SAMPLE = "sample"
+    MANUAL = "manual"
+
+
+def get_metadata(metadata_type, folder_path="example_data", payload=None):
+    if metadata_type == MetadataType.SAMPLE:
+        file_path = os.path.join(folder_path, "metadata.json")
+        if os.path.isfile(file_path):
+            with open(file_path, "r") as file:
+                return json.load(file)
+    elif metadata_type == MetadataType.MANUAL:
+        # Check if the manual metadata is provided in the payload
+        if payload and "manual_metadata" in payload:
+            return payload["manual_metadata"]
+        else:
+            pass
+    return None
+
+
 @app.post("/rag-test/rag_test_run", response_model=dict)
 async def rag_test_run(
     payload: Payload,
-    # files: List[UploadFile] = File(...),
+    background_tasks: BackgroundTasks,
 ):
     try:
-        from rag_test_manager import start_test
-        logging.info(" Running RAG Test ")
+        logging.info("Starting RAG Test")
         decoded_payload = payload.payload
-        output = await start_test(data=decoded_payload['data'], test_set=decoded_payload['test_set'], user_id=decoded_payload['user_id'], params=decoded_payload['params'], metadata=decoded_payload['metadata'],
-                                  retriever_type=decoded_payload['retriever_type'])
-        return JSONResponse(content={"response": output}, status_code=200)
-    except Exception as e:
-        return JSONResponse(
-            content={"response": {"error": str(e)}}, status_code=503
-        )
+        test_set_type = TestSetType(decoded_payload['test_set'])
+        metadata_type = MetadataType(decoded_payload['metadata'])
+
+        metadata = get_metadata(metadata_type, payload=decoded_payload)
+        if metadata is None:
+            return JSONResponse(content={"response": "Invalid metadata value"}, status_code=400)
+
+        test_set = get_test_set(test_set_type, payload=decoded_payload)
+        if test_set is None:
+            return JSONResponse(content={"response": "Invalid test_set value"}, status_code=400)
+
+        async def run_start_test(data, test_set, user_id, params, metadata, retriever_type):
+            result = await start_test(data=data, test_set=test_set, user_id=user_id,
+                                      params=params, metadata=metadata,
+                                      retriever_type=retriever_type)
+
+        logging.info("Retriever DATA type: %s", type(decoded_payload['data']))
+
+        background_tasks.add_task(
+            run_start_test,
+            decoded_payload['data'],
+            test_set,
+            decoded_payload['user_id'],
+            decoded_payload['params'],
+            metadata,
+            decoded_payload['retriever_type']
+        )
+
+        logging.info("Retriever type: %s", decoded_payload['retriever_type'])
+        return JSONResponse(content={"response": "Task has been started"}, status_code=200)
+    except Exception as e:
+        return JSONResponse(
+            content={"response": {"error": str(e)}}, status_code=503
+        )
+
+
+# @app.get("/rag-test/{task_id}")
+# async def check_task_status(task_id: int):
+#     task_status = task_status_db.get(task_id, "not_found")
+#
+#     if task_status == "not_found":
+#         return {"status": "Task not found"}
+#
+#     return {"status": task_status}

 # @app.get("/available-buffer-actions", response_model=dict)
 # async def available_buffer_actions(
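The reworked endpoint validates the payload first, schedules the long-running test, and returns immediately. That fire-and-forget shape can be sketched with plain asyncio (no FastAPI required); `start_test` below is a stand-in for the real one, and `asyncio.create_task` approximates what FastAPI's `BackgroundTasks` does after the response is sent.

```python
import asyncio

# Stand-in for the real start_test: just records that it ran.
completed = []

async def start_test(data, user_id):
    await asyncio.sleep(0)  # placeholder for the long-running RAG test
    completed.append((user_id, data))

async def rag_test_run(payload):
    # Validate first, mirroring the 400 responses for bad test_set values,
    # so no work is scheduled for invalid input.
    if payload.get("test_set") not in ("sample", "manual"):
        return {"response": "Invalid test_set value"}
    # Schedule the work and answer right away.
    task = asyncio.create_task(start_test(payload["data"], payload["user_id"]))
    await task
    return {"response": "Task has been started"}

print(asyncio.run(rag_test_run(
    {"test_set": "sample", "data": [".data/x.pdf"], "user_id": "681"})))
# → {'response': 'Task has been started'}
```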
level_3/create_database.py (new file, 80 lines)
@@ -0,0 +1,80 @@
+import sys
+import os
+
+# this is needed to import classes from other modules
+script_dir = os.path.dirname(os.path.abspath(__file__))
+# Get the parent directory of your script and add it to sys.path
+parent_dir = os.path.dirname(script_dir)
+sys.path.append(parent_dir)
+
+from database.database import Base, engine
+import models.memory
+import models.metadatas
+import models.operation
+import models.sessions
+import models.testoutput
+import models.testset
+import models.user
+import models.docs
+from sqlalchemy import create_engine, text
+import psycopg2
+from dotenv import load_dotenv
+load_dotenv()
+
+
+def create_admin_engine(username, password, host, database_name):
+    admin_url = f"postgresql://{username}:{password}@{host}:5432/{database_name}"
+    return create_engine(admin_url)
+
+
+def database_exists(username, password, host, db_name):
+    engine = create_admin_engine(username, password, host, db_name)
+    connection = engine.connect()
+    query = text(f"SELECT 1 FROM pg_database WHERE datname='{db_name}'")
+    result = connection.execute(query).fetchone()
+    connection.close()
+    engine.dispose()
+    return result is not None
+
+
+def create_database(username, password, host, db_name):
+    engine = create_admin_engine(username, password, host, db_name)
+    connection = engine.raw_connection()
+    connection.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
+    cursor = connection.cursor()
+    cursor.execute(f"CREATE DATABASE {db_name}")
+    cursor.close()
+    connection.close()
+    engine.dispose()
+
+
+def create_tables(engine):
+    Base.metadata.create_all(bind=engine)
+
+
+if __name__ == "__main__":
+    username = os.getenv('POSTGRES_USER')
+    password = os.getenv('POSTGRES_PASSWORD')
+    database_name = os.getenv('POSTGRES_DB')
+    environment = os.environ.get("ENVIRONMENT")
+
+    if environment == "local":
+        host = os.getenv('POSTGRES_HOST')
+    elif environment == "docker":
+        host = os.getenv('POSTGRES_HOST_DOCKER')
+    else:
+        host = os.getenv('POSTGRES_HOST_DOCKER')
+
+    engine = create_admin_engine(username, password, host, database_name)
+
+    if not database_exists(username, password, host, database_name):
+        print(f"Database {database_name} does not exist. Creating...")
+        create_database(username, password, host, database_name)
+        print(f"Database {database_name} created successfully.")
+
+    create_tables(engine)
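The core of the new script is `Base.metadata.create_all`, which creates any missing tables for every imported model. A self-contained sketch of that call on an in-memory SQLite engine (the model here is a stand-in, not one of the repo's models, so the sketch runs without Postgres):

```python
from sqlalchemy import create_engine, Column, String, inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Doc(Base):  # stand-in model; the repo imports models.docs etc. instead
    __tablename__ = "docs"
    id = Column(String, primary_key=True)

# Same call the script issues against Postgres; SQLite keeps it self-contained.
engine = create_engine("sqlite://")
Base.metadata.create_all(bind=engine)

print(inspect(engine).get_table_names())  # → ['docs']
```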
@@ -24,7 +24,19 @@ RETRY_DELAY = 5
 username = os.getenv('POSTGRES_USER')
 password = os.getenv('POSTGRES_PASSWORD')
 database_name = os.getenv('POSTGRES_DB')
-host = os.getenv('POSTGRES_HOST')
+environment = os.environ.get("ENVIRONMENT")
+
+if environment == "local":
+    host = os.getenv('POSTGRES_HOST')
+elif environment == "docker":
+    host = os.getenv('POSTGRES_HOST_DOCKER')
+else:
+    host = os.getenv('POSTGRES_HOST_DOCKER')

 # Use the asyncpg driver for async operation
 SQLALCHEMY_DATABASE_URL = f"postgresql+asyncpg://{username}:{password}@{host}:5432/{database_name}"
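The host-selection logic above can be factored into a small function. The variable names match the .env additions in this PR, and the fallback mirrors the code's behavior: anything other than "local" resolves to the docker host.

```python
import os

def resolve_db_host(environ=None):
    """Pick the Postgres host based on the ENVIRONMENT setting."""
    environ = os.environ if environ is None else environ
    if environ.get("ENVIRONMENT") == "local":
        return environ.get("POSTGRES_HOST")
    # "docker" and any unset/unknown value fall back to the docker host
    return environ.get("POSTGRES_HOST_DOCKER")

print(resolve_db_host({"ENVIRONMENT": "local",
                       "POSTGRES_HOST": "localhost",
                       "POSTGRES_HOST_DOCKER": "postgres"}))  # → localhost
```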
@@ -18,14 +18,25 @@ services:
     - promethai_mem_backend
     build:
       context: ./
     volumes:
       - "./:/app"
+      - ./.data:/app/.data
     environment:
       - HOST=0.0.0.0
     profiles: ["exclude-from-up"]
     ports:
       - 8000:8000
       - 443:443
+    depends_on:
+      - postgres
+    deploy:
+      resources:
+        limits:
+          cpus: "4.0"
+          memory: 8GB

   postgres:
     image: postgres
@@ -1,7 +1,20 @@
 #!/bin/bash
 export ENVIRONMENT
+# Run Python scripts with error handling
+echo "Running fetch_secret.py"
 python fetch_secret.py
+if [ $? -ne 0 ]; then
+    echo "Error: fetch_secret.py failed"
+    exit 1
+fi
+
+echo "Running create_database.py"
 python create_database.py
+if [ $? -ne 0 ]; then
+    echo "Error: create_database.py failed"
+    exit 1
+fi
+
 # Start Gunicorn
-gunicorn -w 2 -k uvicorn.workers.UvicornWorker -t 120 --bind=0.0.0.0:8000 --bind=0.0.0.0:443 --log-level debug api:app
+echo "Starting Gunicorn"
+gunicorn -w 3 -k uvicorn.workers.UvicornWorker -t 30000 --bind=0.0.0.0:8000 --bind=0.0.0.0:443 --log-level debug api:app
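The entrypoint's new fail-fast behavior (run each setup step, abort on the first non-zero exit code) is the same pattern as Python's `subprocess` with a return-code check, sketched here with a harmless stand-in command instead of the repo's scripts:

```python
import subprocess
import sys

def run_step(name, argv):
    """Run one setup step; abort the whole startup if it fails."""
    print(f"Running {name}")
    result = subprocess.run(argv)
    if result.returncode != 0:
        print(f"Error: {name} failed")
        sys.exit(1)

# Harmless stand-in for fetch_secret.py / create_database.py:
run_step("python --version", [sys.executable, "--version"])
```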
@@ -15,4 +15,5 @@ class DocsModel(Base):
     created_at = Column(DateTime, default=datetime.utcnow)
     updated_at = Column(DateTime, onupdate=datetime.utcnow)

-    operation = relationship("Operation", back_populates="docs")
+    operations = relationship("Operation", back_populates="docs")
@@ -14,6 +14,7 @@ class Operation(Base):
     id = Column(String, primary_key=True)
     user_id = Column(String, ForeignKey('users.id'), index=True)  # Link to User
     operation_type = Column(String, nullable=True)
+    operation_status = Column(String, nullable=True)
     operation_params = Column(String, nullable=True)
     number_of_files = Column(Integer, nullable=True)
     test_set_id = Column(String, ForeignKey('test_sets.id'), index=True)
@@ -24,6 +25,7 @@ class Operation(Base):
     # Relationships
     user = relationship("User", back_populates="operations")
     test_set = relationship("TestSet", back_populates="operations")
+    docs = relationship("DocsModel", back_populates="operations")

     def __repr__(self):
         return f"<Operation(id={self.id}, user_id={self.user_id}, created_at={self.created_at}, updated_at={self.updated_at})>"
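The bidirectional Operation/DocsModel link above relies on matching `back_populates` names on both sides. A reduced, self-contained sketch on SQLite (stand-in models with only the columns needed; note that this PR names the many-to-one side "operations" even though it holds a single Operation):

```python
from sqlalchemy import create_engine, Column, String, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, Session

Base = declarative_base()

class Operation(Base):
    __tablename__ = "operations"
    id = Column(String, primary_key=True)
    docs = relationship("DocsModel", back_populates="operations")

class DocsModel(Base):
    __tablename__ = "docs"
    id = Column(String, primary_key=True)
    operation_id = Column(String, ForeignKey("operations.id"))
    operations = relationship("Operation", back_populates="docs")

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Operation(id="op1", docs=[DocsModel(id="d1")]))
    session.commit()
    # back_populates keeps both directions in sync
    print(session.get(DocsModel, "d1").operations.id)  # → op1
```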
@@ -36,6 +36,7 @@ class TestOutput(Base):
     test_output = Column(String, nullable=True)
     test_expected_output = Column(String, nullable=True)
     test_context = Column(String, nullable=True)
+    number_of_memories = Column(String, nullable=True)
     test_results = Column(JSON, nullable=True)
     created_at = Column(DateTime, default=datetime.utcnow)
level_3/poetry.lock (generated, 8 changes)
@@ -3869,18 +3869,18 @@ diagrams = ["jinja2", "railroad-diagrams"]

 [[package]]
 name = "pypdf"
-version = "3.16.4"
+version = "3.17.0"
 description = "A pure-python PDF library capable of splitting, merging, cropping, and transforming PDF files"
 optional = false
 python-versions = ">=3.6"
 files = [
-    {file = "pypdf-3.16.4-py3-none-any.whl", hash = "sha256:a9b1eaf2db4c2edd93093470d33c3f353235c4a694f8a426a92a8ce77cea9eb7"},
-    {file = "pypdf-3.16.4.tar.gz", hash = "sha256:01927771b562d4ba84939ef95b393f0179166da786c5db710d07f807c52f480d"},
+    {file = "pypdf-3.17.0-py3-none-any.whl", hash = "sha256:67f6bb7acd8fdbcf7e7a7d5319d12b8de100f5f94538d6e5647aaec3eb7c7dde"},
+    {file = "pypdf-3.17.0.tar.gz", hash = "sha256:9fab275fea57c9e5b2416035d13d867a459ebe36294a4c39a3d0bb45a7404bad"},
 ]

 [package.extras]
 crypto = ["PyCryptodome", "cryptography"]
-dev = ["black", "flit", "pip-tools", "pre-commit (<2.18.0)", "pytest-cov", "pytest-socket", "pytest-timeout", "wheel"]
+dev = ["black", "flit", "pip-tools", "pre-commit (<2.18.0)", "pytest-cov", "pytest-socket", "pytest-timeout", "pytest-xdist", "wheel"]
 docs = ["myst_parser", "sphinx", "sphinx_rtd_theme"]
 full = ["Pillow (>=8.0.0)", "PyCryptodome", "cryptography"]
 image = ["Pillow (>=8.0.0)"]
@@ -30,6 +30,7 @@ from models.testset import TestSet
 from models.testoutput import TestOutput
 from models.metadatas import MetaDatas
 from models.operation import Operation
+from models.docs import DocsModel

 load_dotenv()
 import ast
@@ -90,7 +91,10 @@ def get_document_names(doc_input):
     - Folder path: get_document_names(".data")
     - Single document file path: get_document_names(".data/example.pdf")
     - Document name provided as a string: get_document_names("example.docx")
+
     """
+    if isinstance(doc_input, list):
+        return doc_input
     if os.path.isdir(doc_input):
         # doc_input is a folder
         folder_path = doc_input
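The net effect of the new `isinstance` branch is that `data` can now always be passed as a list (matching the API payload), while single paths and folders are still accepted. A simplified stand-in of that dispatch (the folder and single-path branches below are illustrative, not the repo's exact code):

```python
import os

def get_document_names(doc_input):
    if isinstance(doc_input, list):   # the branch added in this PR
        return doc_input
    if os.path.isdir(doc_input):      # folder: list its contents
        return sorted(os.listdir(doc_input))
    return [os.path.basename(doc_input)]  # single path or bare document name

print(get_document_names([".data/a.pdf"]))      # → ['.data/a.pdf']
print(get_document_names(".data/example.pdf"))  # → ['example.pdf']
```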
@@ -114,7 +118,17 @@ async def add_entity(session, entity):
         s.add(entity)  # No need to commit; session_scope takes care of it

     return "Successfully added entity"

+async def update_entity(session, model, entity_id, new_value):
+    async with session_scope(session) as s:
+        # Retrieve the entity from the database
+        entity = await s.get(model, entity_id)
+
+        if entity:
+            # Update the relevant column; 'updated_at' is updated automatically
+            entity.operation_status = new_value
+            return "Successfully updated entity"
+        else:
+            return "Entity not found"

 async def retrieve_job_by_id(session, user_id, job_id):
     try:
@@ -313,8 +327,8 @@ async def eval_test(
     test_case = LLMTestCase(
         input=query,
         actual_output=result_output,
-        expected_output=expected_output,
-        context=context,
+        expected_output=[expected_output],
+        context=[context],
     )
     metric = OverallScoreMetric()
@@ -472,20 +486,20 @@ async def start_test(
     if params is None:
         data_format = data_format_route(
-            data
+            data[0]
         )  # Assume data_format_route is predefined
         logging.info("Data format is %s", data_format)
-        data_location = data_location_route(data)
+        data_location = data_location_route(data[0])
         logging.info(
             "Data location is %s", data_location
         )  # Assume data_location_route is predefined
         test_params = generate_param_variants(included_params=["chunk_size"])
     if params:
         data_format = data_format_route(
-            data
+            data[0]
         )  # Assume data_format_route is predefined
         logging.info("Data format is %s", data_format)
-        data_location = data_location_route(data)
+        data_location = data_location_route(data[0])
         logging.info(
             "Data location is %s", data_location
         )
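`--file` now yields a list, so the routing helpers receive `data[0]`. The routers themselves are only referenced here ("Assume ... is predefined"); a hypothetical sketch of how such format and location routing could work:

```python
import os

def data_format_route(data_source: str) -> str:
    """Hypothetical extension-based format router, mirroring how
    start_test dispatches on the first file in the list."""
    ext = os.path.splitext(data_source)[1].lower()
    return {".pdf": "PDF", ".txt": "TEXT"}.get(ext, "TEXT")

def data_location_route(data_source: str) -> str:
    """Hypothetical router: remote URL vs. local device path."""
    return "URL" if data_source.startswith(("http://", "https://")) else "DEVICE"

data = [".data/3ZCCCW.pdf", ".data/notes.txt"]
print(data_format_route(data[0]), data_location_route(data[0]))  # PDF DEVICE
```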
@@ -508,6 +522,7 @@ async def start_test(
             user_id=user_id,
             operation_params=str(test_params),
             number_of_files=count_files_in_data_folder(),
+            operation_status = "RUNNING",
             operation_type=retriever_type,
             test_set_id=test_set_id,
         ),
@@ -517,7 +532,7 @@ async def start_test(

     await add_entity(
         session,
-        Docs(
+        DocsModel(
             id=str(uuid.uuid4()),
             operation_id=job_id,
             doc_name = doc
@@ -586,11 +601,13 @@ async def start_test(
         return retrieve_action["data"]["Get"][test_id][0]["text"]

     async def run_eval(test_item, search_result):
+        logging.info("Initiated test set evaluation")
         test_eval = await eval_test(
-            query=test_item["question"],
-            expected_output=test_item["answer"],
+            query=str(test_item["question"]),
+            expected_output=str(test_item["answer"]),
             context=str(search_result),
         )
+        logging.info("Successfully evaluated test set")
         return test_eval

     async def run_generate_test_set(test_id):
@@ -607,9 +624,6 @@ async def start_test(
         return dynamic_test_manager(retrieve_action)

     test_eval_pipeline = []
-
-
-
     if retriever_type == "llm_context":
         for test_qa in test_set:
             context = ""
|
||||||
),
|
),
|
||||||
)
|
)
|
||||||
|
|
||||||
|
await update_entity(session, Operation, job_id, "COMPLETED")
|
||||||
|
|
||||||
return results
|
return results
|
||||||
|
|
||||||
|
|
||||||
async def main():
|
async def main():
|
||||||
metadata = {
|
# metadata = {
|
||||||
"version": "1.0",
|
# "version": "1.0",
|
||||||
"agreement_id": "AG123456",
|
# "agreement_id": "AG123456",
|
||||||
"privacy_policy": "https://example.com/privacy",
|
# "privacy_policy": "https://example.com/privacy",
|
||||||
"terms_of_service": "https://example.com/terms",
|
# "terms_of_service": "https://example.com/terms",
|
||||||
"format": "json",
|
# "format": "json",
|
||||||
"schema_version": "1.1",
|
# "schema_version": "1.1",
|
||||||
"checksum": "a1b2c3d4e5f6",
|
# "checksum": "a1b2c3d4e5f6",
|
||||||
"owner": "John Doe",
|
# "owner": "John Doe",
|
||||||
"license": "MIT",
|
# "license": "MIT",
|
||||||
"validity_start": "2023-08-01",
|
# "validity_start": "2023-08-01",
|
||||||
"validity_end": "2024-07-31",
|
# "validity_end": "2024-07-31",
|
||||||
}
|
# }
|
||||||
|
#
|
||||||
test_set = [
|
# test_set = [
|
||||||
{
|
# {
|
||||||
"question": "Who is the main character in 'The Call of the Wild'?",
|
# "question": "Who is the main character in 'The Call of the Wild'?",
|
||||||
"answer": "Buck",
|
# "answer": "Buck",
|
||||||
},
|
# },
|
||||||
{"question": "Who wrote 'The Call of the Wild'?", "answer": "Jack London"},
|
# {"question": "Who wrote 'The Call of the Wild'?", "answer": "Jack London"},
|
||||||
{
|
# {
|
||||||
"question": "Where does Buck live at the start of the book?",
|
# "question": "Where does Buck live at the start of the book?",
|
||||||
"answer": "In the Santa Clara Valley, at Judge Miller’s place.",
|
# "answer": "In the Santa Clara Valley, at Judge Miller’s place.",
|
||||||
},
|
# },
|
||||||
{
|
# {
|
||||||
"question": "Why is Buck kidnapped?",
|
# "question": "Why is Buck kidnapped?",
|
||||||
"answer": "He is kidnapped to be sold as a sled dog in the Yukon during the Klondike Gold Rush.",
|
# "answer": "He is kidnapped to be sold as a sled dog in the Yukon during the Klondike Gold Rush.",
|
||||||
},
|
# },
|
||||||
{
|
# {
|
||||||
"question": "How does Buck become the leader of the sled dog team?",
|
# "question": "How does Buck become the leader of the sled dog team?",
|
||||||
"answer": "Buck becomes the leader after defeating the original leader, Spitz, in a fight.",
|
# "answer": "Buck becomes the leader after defeating the original leader, Spitz, in a fight.",
|
||||||
},
|
# },
|
||||||
]
|
# ]
|
||||||
# "https://www.ibiblio.org/ebooks/London/Call%20of%20Wild.pdf"
|
# "https://www.ibiblio.org/ebooks/London/Call%20of%20Wild.pdf"
|
||||||
# http://public-library.uk/ebooks/59/83.pdf
|
# # http://public-library.uk/ebooks/59/83.pdf
|
||||||
# result = await start_test(
|
# result = await start_test(
|
||||||
# ".data/3ZCCCW.pdf",
|
# [".data/3ZCCCW.pdf"],
|
||||||
# test_set=test_set,
|
# test_set=test_set,
|
||||||
# user_id="677",
|
# user_id="677",
|
||||||
# params=["chunk_size", "search_type"],
|
# params=["chunk_size", "search_type"],
|
||||||
|
|
@@ -739,7 +755,7 @@ async def main():
     # )

     parser = argparse.ArgumentParser(description="Run tests against a document.")
-    parser.add_argument("--file", required=True, help="URL or location of the document to test.")
+    parser.add_argument("--file", nargs="+", required=True, help="List of file paths to test.")
     parser.add_argument("--test_set", required=True, help="Path to JSON file containing the test set.")
     parser.add_argument("--user_id", required=True, help="User ID.")
     parser.add_argument("--params", help="Additional parameters in JSON format.")
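The switch to `nargs="+"` is what turns `args.file` into a list. A quick standalone check of the parsing behavior:

```python
import argparse

parser = argparse.ArgumentParser(description="Run tests against a document.")
parser.add_argument("--file", nargs="+", required=True, help="List of file paths to test.")

# nargs="+" collects one or more values into a list, which is why the
# rest of the pipeline now indexes data[0] and iterates over the docs.
args = parser.parse_args(["--file", ".data/a.pdf", ".data/b.pdf"])
print(args.file)  # ['.data/a.pdf', '.data/b.pdf']
```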
@@ -776,6 +792,7 @@ async def main():
             return
     else:
         params = None
+    logging.info("Args datatype is", type(args.file))
     #clean up params here
     await start_test(data=args.file, test_set=test_set, user_id= args.user_id, params= params, metadata =metadata, retriever_type=args.retriever_type)
@@ -15,6 +15,7 @@ import models.sessions
 import models.testoutput
 import models.testset
 import models.user
+import models.docs
 from sqlalchemy import create_engine, text
 import psycopg2
 from dotenv import load_dotenv
@@ -289,3 +289,9 @@ class BaseMemory:
     async def delete_memories(self, namespace:str, params: Optional[str] = None):
         return await self.vector_db.delete_memories(namespace,params)

+
+    async def count_memories(self, namespace:str, params: Optional[str] = None):
+        return await self.vector_db.count_memories(namespace,params)
+
+
+
@@ -6,50 +6,72 @@ sys.path.append(os.path.dirname(os.path.abspath(__file__)))

 from vectordb.chunkers.chunkers import chunk_data
 from llama_hub.file.base import SimpleDirectoryReader
+from langchain.document_loaders import UnstructuredURLLoader
 from langchain.document_loaders import DirectoryLoader
+import logging
+import os
+from langchain.document_loaders import TextLoader
 import requests
 async def _document_loader( observation: str, loader_settings: dict):
-    # Check the format of the document
     document_format = loader_settings.get("format", "text")
     loader_strategy = loader_settings.get("strategy", "VANILLA")
     chunk_size = loader_settings.get("chunk_size", 500)
     chunk_overlap = loader_settings.get("chunk_overlap", 20)
-    import logging
-    import os

-    print("LOADER SETTINGS", loader_settings)
+    logging.info("LOADER SETTINGS %s", loader_settings)
+    list_of_docs = loader_settings["path"]
+    chunked_doc = []

-    if document_format == "PDF":
-        if loader_settings.get("source") == "URL":
-            pdf_response = requests.get(loader_settings["path"])
-            pdf_stream = BytesIO(pdf_response.content)
-            with fitz.open(stream=pdf_stream, filetype='pdf') as doc:
-                file_content = ""
-                for page in doc:
-                    file_content += page.get_text()
-            pages = chunk_data(chunk_strategy= loader_strategy, source_data=file_content, chunk_size=chunk_size, chunk_overlap=chunk_overlap)
-
-            return pages
-        elif loader_settings.get("source") == "DEVICE":
-            import os
-
-            current_directory = os.getcwd()
-            import logging
-            logging.info("Current Directory: %s", current_directory)
-
-            loader = DirectoryLoader(".data", recursive=True)
+    if loader_settings.get("source") == "URL":
+        for file in list_of_docs:
+            if document_format == "PDF":
+                pdf_response = requests.get(file)
+                pdf_stream = BytesIO(pdf_response.content)
+                with fitz.open(stream=pdf_stream, filetype='pdf') as doc:
+                    file_content = ""
+                    for page in doc:
+                        file_content += page.get_text()
+                pages = chunk_data(chunk_strategy=loader_strategy, source_data=file_content, chunk_size=chunk_size,
+                                   chunk_overlap=chunk_overlap)
+                chunked_doc.append(pages)
+
+            elif document_format == "TEXT":
+                loader = UnstructuredURLLoader(urls=file)
+                file_content = loader.load()
+                pages = chunk_data(chunk_strategy=loader_strategy, source_data=file_content, chunk_size=chunk_size,
+                                   chunk_overlap=chunk_overlap)
+                chunked_doc.append(pages)
+
+    elif loader_settings.get("source") == "DEVICE":
+
+        current_directory = os.getcwd()
+        logging.info("Current Directory: %s", current_directory)
+
+        loader = DirectoryLoader(".data", recursive=True)
+        if document_format == "PDF":
             # loader = SimpleDirectoryReader(".data", recursive=True, exclude_hidden=True)
             documents = loader.load()
             logging.info("Documents: %s", documents)
             # pages = documents.load_and_split()
-            return documents
-
-    elif document_format == "TEXT":
-        pages = chunk_data(chunk_strategy= loader_strategy, source_data=observation, chunk_size=chunk_size, chunk_overlap=chunk_overlap)
-        return pages
+            chunked_doc.append(documents)
+
+        elif document_format == "TEXT":
+            documents = loader.load()
+            logging.info("Documents: %s", documents)
+            # pages = documents.load_and_split()
+            chunked_doc.append(documents)
+
     else:
-        raise ValueError(f"Unsupported document format: {document_format}")
+        raise ValueError(f"Error: ")
+    return chunked_doc
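The refactored loader pushes every document through `chunk_data` with a size and overlap; `chunk_data` lives in `vectordb.chunkers` and is not shown in this diff. A hypothetical fixed-size chunker illustrates the size/overlap interplay:

```python
def chunk_data(chunk_strategy="VANILLA", source_data="", chunk_size=500, chunk_overlap=20):
    """Hypothetical sketch of a fixed-size chunker with overlap; the real
    chunk_data in vectordb.chunkers supports more strategies."""
    step = max(chunk_size - chunk_overlap, 1)  # advance less than chunk_size so chunks overlap
    return [source_data[i:i + chunk_size] for i in range(0, len(source_data), step)]

pages = chunk_data(source_data="x" * 1200, chunk_size=500, chunk_overlap=20)
print(len(pages), len(pages[0]))  # 3 500
```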
||||||
|
|
@ -153,7 +153,7 @@ class WeaviateVectorDB(VectorDB):
|
||||||
# Assuming _document_loader returns a list of documents
|
# Assuming _document_loader returns a list of documents
|
||||||
documents = await _document_loader(observation, loader_settings)
|
documents = await _document_loader(observation, loader_settings)
|
||||||
logging.info("here are the docs %s", str(documents))
|
logging.info("here are the docs %s", str(documents))
|
||||||
for doc in documents:
|
for doc in documents[0]:
|
||||||
document_to_load = self._stuct(doc.page_content, params, metadata_schema_class)
|
document_to_load = self._stuct(doc.page_content, params, metadata_schema_class)
|
||||||
|
|
||||||
logging.info("Loading document with provided loader settings %s", str(document_to_load))
|
logging.info("Loading document with provided loader settings %s", str(document_to_load))
|
||||||
|
|
@@ -290,6 +290,30 @@ class WeaviateVectorDB(VectorDB):
             },
         )

+    async def count_memories(self, namespace: str = None, params: dict = None) -> int:
+        """
+        Count memories in a Weaviate database.
+
+        Args:
+            namespace (str, optional): The Weaviate namespace to count memories in. If not provided, uses the default namespace.
+
+        Returns:
+            int: The number of memories in the specified namespace.
+        """
+        if namespace is None:
+            namespace = self.namespace
+
+        client = self.init_weaviate(namespace =namespace)
+
+        try:
+            object_count = client.query.aggregate(namespace).with_meta_count().do()
+            return object_count
+        except Exception as e:
+            logging.info(f"Error counting memories: {str(e)}")
+            # Handle the error or log it
+            return 0
+
     def update_memories(self, observation, namespace: str, params: dict = None):
         client = self.init_weaviate(namespace = self.namespace)
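Note that `count_memories` returns whatever `.with_meta_count().do()` yields, which in the v3 Weaviate Python client is a nested dict rather than the advertised `int`. Assuming that response shape, the count could be extracted like this (`extract_count` is an illustrative helper, not part of the diff):

```python
def extract_count(aggregate_response: dict, class_name: str) -> int:
    """Pull the integer count out of a Weaviate v3-style aggregate response.
    The nested response shape below is an assumption based on the v3
    Python client's GraphQL output."""
    try:
        return aggregate_response["data"]["Aggregate"][class_name][0]["meta"]["count"]
    except (KeyError, IndexError, TypeError):
        return 0

sample = {"data": {"Aggregate": {"TestNamespace": [{"meta": {"count": 42}}]}}}
print(extract_count(sample, "TestNamespace"))  # 42
```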
182  level_3/wait-for-it.sh  Normal file
@@ -0,0 +1,182 @@
+#!/usr/bin/env bash
+# Use this script to test if a given TCP host/port are available
+
+WAITFORIT_cmdname=${0##*/}
+
+echoerr() { if [[ $WAITFORIT_QUIET -ne 1 ]]; then echo "$@" 1>&2; fi }
+
+usage()
+{
+    cat << USAGE >&2
+Usage:
+    $WAITFORIT_cmdname host:port [-s] [-t timeout] [-- command args]
+    -h HOST | --host=HOST       Host or IP under test
+    -p PORT | --port=PORT       TCP port under test
+                                Alternatively, you specify the host and port as host:port
+    -s | --strict               Only execute subcommand if the test succeeds
+    -q | --quiet                Don't output any status messages
+    -t TIMEOUT | --timeout=TIMEOUT
+                                Timeout in seconds, zero for no timeout
+    -- COMMAND ARGS             Execute command with args after the test finishes
+USAGE
+    exit 1
+}
+
+wait_for()
+{
+    if [[ $WAITFORIT_TIMEOUT -gt 0 ]]; then
+        echoerr "$WAITFORIT_cmdname: waiting $WAITFORIT_TIMEOUT seconds for $WAITFORIT_HOST:$WAITFORIT_PORT"
+    else
+        echoerr "$WAITFORIT_cmdname: waiting for $WAITFORIT_HOST:$WAITFORIT_PORT without a timeout"
+    fi
+    WAITFORIT_start_ts=$(date +%s)
+    while :
+    do
+        if [[ $WAITFORIT_ISBUSY -eq 1 ]]; then
+            nc -z $WAITFORIT_HOST $WAITFORIT_PORT
+            WAITFORIT_result=$?
+        else
+            (echo -n > /dev/tcp/$WAITFORIT_HOST/$WAITFORIT_PORT) >/dev/null 2>&1
+            WAITFORIT_result=$?
+        fi
+        if [[ $WAITFORIT_result -eq 0 ]]; then
+            WAITFORIT_end_ts=$(date +%s)
+            echoerr "$WAITFORIT_cmdname: $WAITFORIT_HOST:$WAITFORIT_PORT is available after $((WAITFORIT_end_ts - WAITFORIT_start_ts)) seconds"
+            break
+        fi
+        sleep 1
+    done
+    return $WAITFORIT_result
+}
+
+wait_for_wrapper()
+{
+    # In order to support SIGINT during timeout: http://unix.stackexchange.com/a/57692
+    if [[ $WAITFORIT_QUIET -eq 1 ]]; then
+        timeout $WAITFORIT_BUSYTIMEFLAG $WAITFORIT_TIMEOUT $0 --quiet --child --host=$WAITFORIT_HOST --port=$WAITFORIT_PORT --timeout=$WAITFORIT_TIMEOUT &
+    else
+        timeout $WAITFORIT_BUSYTIMEFLAG $WAITFORIT_TIMEOUT $0 --child --host=$WAITFORIT_HOST --port=$WAITFORIT_PORT --timeout=$WAITFORIT_TIMEOUT &
+    fi
+    WAITFORIT_PID=$!
+    trap "kill -INT -$WAITFORIT_PID" INT
+    wait $WAITFORIT_PID
+    WAITFORIT_RESULT=$?
+    if [[ $WAITFORIT_RESULT -ne 0 ]]; then
+        echoerr "$WAITFORIT_cmdname: timeout occurred after waiting $WAITFORIT_TIMEOUT seconds for $WAITFORIT_HOST:$WAITFORIT_PORT"
+    fi
+    return $WAITFORIT_RESULT
+}
+
+# process arguments
+while [[ $# -gt 0 ]]
+do
+    case "$1" in
+        *:* )
+        WAITFORIT_hostport=(${1//:/ })
+        WAITFORIT_HOST=${WAITFORIT_hostport[0]}
+        WAITFORIT_PORT=${WAITFORIT_hostport[1]}
+        shift 1
+        ;;
+        --child)
+        WAITFORIT_CHILD=1
+        shift 1
+        ;;
+        -q | --quiet)
+        WAITFORIT_QUIET=1
+        shift 1
+        ;;
+        -s | --strict)
+        WAITFORIT_STRICT=1
+        shift 1
+        ;;
+        -h)
+        WAITFORIT_HOST="$2"
+        if [[ $WAITFORIT_HOST == "" ]]; then break; fi
+        shift 2
+        ;;
+        --host=*)
+        WAITFORIT_HOST="${1#*=}"
+        shift 1
+        ;;
+        -p)
+        WAITFORIT_PORT="$2"
+        if [[ $WAITFORIT_PORT == "" ]]; then break; fi
+        shift 2
+        ;;
+        --port=*)
+        WAITFORIT_PORT="${1#*=}"
+        shift 1
+        ;;
+        -t)
+        WAITFORIT_TIMEOUT="$2"
+        if [[ $WAITFORIT_TIMEOUT == "" ]]; then break; fi
+        shift 2
+        ;;
+        --timeout=*)
+        WAITFORIT_TIMEOUT="${1#*=}"
+        shift 1
+        ;;
+        --)
+        shift
+        WAITFORIT_CLI=("$@")
+        break
+        ;;
+        --help)
+        usage
+        ;;
+        *)
+        echoerr "Unknown argument: $1"
+        usage
+        ;;
+    esac
+done
+
+if [[ "$WAITFORIT_HOST" == "" || "$WAITFORIT_PORT" == "" ]]; then
+    echoerr "Error: you need to provide a host and port to test."
+    usage
+fi
+
+WAITFORIT_TIMEOUT=${WAITFORIT_TIMEOUT:-15}
+WAITFORIT_STRICT=${WAITFORIT_STRICT:-0}
+WAITFORIT_CHILD=${WAITFORIT_CHILD:-0}
+WAITFORIT_QUIET=${WAITFORIT_QUIET:-0}
+
+# Check to see if timeout is from busybox?
+WAITFORIT_TIMEOUT_PATH=$(type -p timeout)
+WAITFORIT_TIMEOUT_PATH=$(realpath $WAITFORIT_TIMEOUT_PATH 2>/dev/null || readlink -f $WAITFORIT_TIMEOUT_PATH)
+
+WAITFORIT_BUSYTIMEFLAG=""
+if [[ $WAITFORIT_TIMEOUT_PATH =~ "busybox" ]]; then
+    WAITFORIT_ISBUSY=1
+    # Check if busybox timeout uses -t flag
+    # (recent Alpine versions don't support -t anymore)
+    if timeout &>/dev/stdout | grep -q -e '-t '; then
+        WAITFORIT_BUSYTIMEFLAG="-t"
+    fi
+else
+    WAITFORIT_ISBUSY=0
+fi
+
+if [[ $WAITFORIT_CHILD -gt 0 ]]; then
+    wait_for
+    WAITFORIT_RESULT=$?
+    exit $WAITFORIT_RESULT
+else
+    if [[ $WAITFORIT_TIMEOUT -gt 0 ]]; then
+        wait_for_wrapper
+        WAITFORIT_RESULT=$?
+    else
+        wait_for
+        WAITFORIT_RESULT=$?
+    fi
+fi
+
+if [[ $WAITFORIT_CLI != "" ]]; then
+    if [[ $WAITFORIT_RESULT -ne 0 && $WAITFORIT_STRICT -eq 1 ]]; then
+        echoerr "$WAITFORIT_cmdname: strict mode, refusing to execute subprocess"
+        exit $WAITFORIT_RESULT
+    fi
+    exec "${WAITFORIT_CLI[@]}"
+else
+    exit $WAITFORIT_RESULT
+fi
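`wait-for-it.sh` is typically wired into a container entrypoint so the app blocks until a database accepts TCP connections; the service name below is illustrative. The `host:port` argument is split with the same bash parameter-expansion trick the script's `*:*` case uses:

```shell
#!/usr/bin/env bash
# Typical docker-compose entrypoint usage (service name "postgres" is illustrative):
#   ./wait-for-it.sh postgres:5432 -t 30 --strict -- python main.py
#
# The host:port argument is split by replacing ':' with ' ' and letting
# word-splitting populate an array -- the same trick as the *:* case above.
arg="postgres:5432"
hostport=(${arg//:/ })
HOST=${hostport[0]}
PORT=${hostport[1]}
echo "$HOST $PORT"   # postgres 5432
```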