* feat: Implement multi-tenant architecture with tenant and knowledge base models
  - Added data models for tenants, knowledge bases, and related configurations.
  - Introduced role and permission management for users in the multi-tenant system.
  - Created a service layer for managing tenants and knowledge bases, including CRUD operations.
  - Developed a tenant-aware instance manager for LightRAG with caching and isolation features.
  - Added a migration script to transition existing workspace-based deployments to the new multi-tenant architecture.
* chore: ignore lightrag/api/webui/assets/ directory
* chore: stop tracking lightrag/api/webui/assets (ignore in .gitignore)
* feat: Initialize LightRAG Multi-Tenant Stack with PostgreSQL
  - Added README.md for project overview, setup instructions, and architecture details.
  - Created docker-compose.yml to define services: PostgreSQL, Redis, LightRAG API, and Web UI.
  - Introduced env.example for environment variable configuration.
  - Implemented init-postgres.sql for PostgreSQL schema initialization with multi-tenant support.
  - Added reproduce_issue.py for testing default tenant access via API.
* feat: Enhance TenantSelector and update related components for improved multi-tenant support
* feat: Enhance testing capabilities and update documentation
  - Updated Makefile to include new test commands for various modes (compatibility, isolation, multi-tenant, security, coverage, and dry-run).
  - Modified API health check endpoint in Makefile to reflect new port configuration.
  - Updated QUICK_START.md and README.md to reflect changes in service URLs and ports.
  - Added environment variables for testing modes in env.example.
  - Introduced run_all_tests.sh script to automate testing across different modes.
  - Created conftest.py for pytest configuration, including database fixtures and mock services.
  - Implemented database helper functions for streamlined database operations in tests.
  - Added test collection hooks to skip tests based on the current MULTITENANT_MODE.
* feat: Implement multi-tenant support with demo mode enabled by default
  - Added multi-tenant configuration to the environment and Docker setup.
  - Created pre-configured demo tenants (acme-corp and techstart) for testing.
  - Updated API endpoints to support tenant-specific data access.
  - Enhanced Makefile commands for better service management and database operations.
  - Introduced user-tenant membership system with role-based access control.
  - Added comprehensive documentation for multi-tenant setup and usage.
  - Fixed issues with document visibility in multi-tenant environments.
  - Implemented necessary database migrations for user memberships and legacy support.
* feat(audit): Add final audit report for multi-tenant implementation
  - Documented overall assessment, architecture overview, test results, security findings, and recommendations.
  - Included detailed findings on critical security issues and architectural concerns.

  fix(security): Implement security fixes based on audit findings
  - Removed global RAG fallback and enforced strict tenant context.
  - Configured super-admin access and required user authentication for tenant access.
  - Cleared localStorage on logout and improved error handling in WebUI.

  chore(logs): Create task logs for audit and security fixes implementation
  - Documented actions, decisions, and next steps for both audit and security fixes.
  - Summarized test results and remaining recommendations.
  chore(scripts): Enhance development stack management scripts
  - Added scripts for cleaning, starting, and stopping the development stack.
  - Improved output messages and ensured graceful shutdown of services.

  feat(starter): Initialize PostgreSQL with AGE extension support
  - Created initialization scripts for PostgreSQL extensions including uuid-ossp, vector, and AGE.
  - Ensured successful installation and verification of extensions.
* feat: Implement auto-select for first tenant and KB on initial load in WebUI
  - Removed WEBUI_INITIAL_STATE_FIX.md as the issue is resolved.
  - Added useTenantInitialization hook to automatically select the first available tenant and KB on app load.
  - Integrated the new hook into the Root component of the WebUI.
  - Updated RetrievalTesting component to ensure a KB is selected before allowing user interaction.
  - Created end-to-end tests for multi-tenant isolation and real service interactions.
  - Added scripts for starting, stopping, and cleaning the development stack.
  - Enhanced API and tenant routes to support tenant-specific pipeline status initialization.
  - Updated constants for backend URL to reflect the correct port.
  - Improved error handling and logging in various components.
* feat: Add multi-tenant support with enhanced E2E testing scripts and client functionality
* update client
* Add integration and unit tests for multi-tenant API, models, security, and storage
  - Implement integration tests for tenant and knowledge base management endpoints in `test_tenant_api_routes.py`.
  - Create unit tests for tenant isolation, model validation, and role permissions in `test_tenant_models.py`.
  - Add security tests to enforce role-based permissions and context validation in `test_tenant_security.py`.
  - Develop tests for tenant-aware storage operations and context isolation in `test_tenant_storage_phase3.py`.
* feat(e2e): Implement OpenAI model support and database reset functionality
* Add comprehensive test suite for gpt-5-nano compatibility
  - Introduced tests for parameter normalization, embeddings, and entity extraction.
  - Implemented direct API testing for gpt-5-nano.
  - Validated .env configuration loading and OpenAI API connectivity.
  - Analyzed reasoning token overhead with various token limits.
  - Documented test procedures and expected outcomes in README files.
  - Ensured all tests pass for production readiness.
* kg(postgres_impl): ensure AGE extension is loaded in session and configure graph initialization
* dev: add hybrid dev helper scripts, Makefile, docker-compose.dev-db and local development docs
* feat(dev): add dev helper scripts and local development documentation for hybrid setup
* feat(multi-tenant): add detailed specifications and logs for multi-tenant improvements, including UX, backend handling, and ingestion pipeline
* feat(migration): add generated tenant/kb columns, indexes, triggers; drop unused tables; update schema and docs
* test(backward-compat): adapt tests to new StorageNameSpace/TenantService APIs (use concrete dummy storages)
* chore: multi-tenant and UX updates - docs, webui, storage, tenant service adjustments
* tests: stabilize integration tests + skip external services; fix multi-tenant API behavior and idempotency
  - gpt5_nano_compatibility: add pytest-asyncio markers, skip when OPENAI key missing, prevent module-level asyncio.run collection, add conftest
  - Ollama tests: add server availability check and skip markers; avoid pytest collection warnings by renaming helper classes
  - Graph storage tests: rename interactive test functions to avoid pytest collection
  - Document & Tenant routes: support external_ids for idempotency; ensure HTTPExceptions are re-raised
  - LightRAG core: support external_ids in apipeline_enqueue_documents and idempotent logic
  - Tests updated to match API changes (tenant routes & document routes)
  - Add logs and scripts for inspection and audit
678 lines · 27 KiB · Python
from ..utils import verbose_debug, VERBOSE_DEBUG
import sys
import os
import logging

if sys.version_info < (3, 9):
    from typing import AsyncIterator
else:
    from collections.abc import AsyncIterator
import pipmaster as pm  # Pipmaster for dynamic library install

# install specific modules
if not pm.is_installed("openai"):
    pm.install("openai")

from openai import (
    AsyncOpenAI,
    APIConnectionError,
    RateLimitError,
    APITimeoutError,
)
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)
from lightrag.utils import (
    wrap_embedding_func_with_attrs,
    safe_unicode_decode,
    logger,
)
from lightrag.types import GPTKeywordExtractionFormat
from lightrag.api import __api_version__

import numpy as np
import base64
from typing import Any, Union

from dotenv import load_dotenv

# Use the .env file in the current working directory, so each LightRAG
# instance can load a different .env file.
# OS environment variables take precedence over values from the .env file.
load_dotenv(dotenv_path=".env", override=False)


class InvalidResponseError(Exception):
    """Custom exception class for triggering retry mechanism"""

    pass


def create_openai_async_client(
    api_key: str | None = None,
    base_url: str | None = None,
    client_configs: dict[str, Any] = None,
) -> AsyncOpenAI:
    """Create an AsyncOpenAI client with the given configuration.

    Args:
        api_key: OpenAI API key. If None, uses the OPENAI_API_KEY environment variable.
        base_url: Base URL for the OpenAI API. If None, uses the default OpenAI API URL.
        client_configs: Additional configuration options for the AsyncOpenAI client.
            These will override any default configurations but will be overridden by
            explicit parameters (api_key, base_url).

    Returns:
        An AsyncOpenAI client instance.
    """
    if not api_key:
        api_key = os.environ["OPENAI_API_KEY"]

    default_headers = {
        "User-Agent": f"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_8) LightRAG/{__api_version__}",
        "Content-Type": "application/json",
    }

    if client_configs is None:
        client_configs = {}

    # Create a merged config dict with precedence: explicit params > client_configs > defaults
    merged_configs = {
        **client_configs,
        "default_headers": default_headers,
        "api_key": api_key,
    }

    if base_url is not None:
        merged_configs["base_url"] = base_url
    else:
        merged_configs["base_url"] = os.environ.get(
            "OPENAI_API_BASE", "https://api.openai.com/v1"
        )

    return AsyncOpenAI(**merged_configs)

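
# --- Illustrative usage sketch (not part of the module) ---------------------
# The merge order above means an explicit api_key/base_url argument wins over
# anything in client_configs, which in turn overrides the defaults
# (default_headers, the OPENAI_API_BASE fallback). The timeout/max_retries
# values below are placeholder assumptions, not recommendations.
def _example_create_client() -> AsyncOpenAI:  # pragma: no cover - illustrative only
    return create_openai_async_client(
        base_url="https://api.openai.com/v1",  # explicit param: overrides client_configs
        client_configs={"timeout": 60.0, "max_retries": 0},  # forwarded to AsyncOpenAI
    )
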
def _normalize_openai_kwargs_for_model(model: str, kwargs: dict[str, Any]) -> None:
    """
    Normalize OpenAI API parameters based on the model being used.

    This function handles model-specific parameter requirements:
    - gpt-5-nano uses 'max_completion_tokens' instead of 'max_tokens'
    - gpt-5-nano uses reasoning tokens which consume from the token budget
    - gpt-5-nano doesn't support custom temperature values
    - Other models support both parameters

    Args:
        model: The model name (e.g., 'gpt-5-nano', 'gpt-4o', 'gpt-4o-mini')
        kwargs: The API parameters dict to normalize (modified in-place)
    """
    # Handle max_tokens vs max_completion_tokens conversion for gpt-5 models
    if model.startswith("gpt-5"):
        # gpt-5-nano and variants use max_completion_tokens
        if "max_tokens" in kwargs and "max_completion_tokens" not in kwargs:
            # If only max_tokens is set, move it to max_completion_tokens
            max_tokens = kwargs.pop("max_tokens")
            # For gpt-5-nano, we need to account for reasoning tokens
            # Increase buffer to ensure actual content is generated
            # Reasoning typically uses 1.5-2x the actual content tokens needed
            kwargs["max_completion_tokens"] = int(max(max_tokens * 2.5, 300))
        else:
            # If both are set, remove max_tokens (it's not supported)
            max_tokens = kwargs.pop("max_tokens", None)
            if max_tokens and "max_completion_tokens" in kwargs:
                # If max_completion_tokens is already set and seems too small, increase it
                if kwargs["max_completion_tokens"] < 300:
                    kwargs["max_completion_tokens"] = int(
                        max(kwargs["max_completion_tokens"] * 2.5, 300)
                    )

        # Ensure a minimum token budget for gpt-5-nano due to reasoning overhead
        if "max_completion_tokens" in kwargs:
            if kwargs["max_completion_tokens"] < 300:
                # Minimum 300 tokens to account for reasoning (reasoning can be expensive)
                original = kwargs["max_completion_tokens"]
                kwargs["max_completion_tokens"] = 300
                logger.debug(
                    f"Increased max_completion_tokens from {original} to 300 for {model} (reasoning overhead)"
                )

    # Handle temperature constraint for gpt-5 models
    if model.startswith("gpt-5"):
        # gpt-5-nano requires default temperature (doesn't support custom values)
        # Remove any custom temperature setting
        if "temperature" in kwargs:
            kwargs.pop("temperature")
            logger.debug(f"Removed custom temperature for {model}: uses default")

    logger.debug(f"Normalized parameters for {model}: {kwargs}")

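
# --- Illustrative sketch (not part of the module) ---------------------------
# Shows how the in-place normalization above rewrites kwargs for a gpt-5 model
# while leaving other models untouched. The numeric values are arbitrary.
def _example_normalize_kwargs() -> None:  # pragma: no cover - illustrative only
    gpt5_kwargs = {"max_tokens": 100, "temperature": 0.2}
    _normalize_openai_kwargs_for_model("gpt-5-nano", gpt5_kwargs)
    # gpt5_kwargs is now {"max_completion_tokens": 300}: max_tokens was converted
    # with a reasoning buffer (max(100 * 2.5, 300)) and the custom temperature removed.

    gpt4o_kwargs = {"max_tokens": 100, "temperature": 0.2}
    _normalize_openai_kwargs_for_model("gpt-4o", gpt4o_kwargs)
    # gpt4o_kwargs is unchanged: non gpt-5 models keep max_tokens and temperature.
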
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    retry=(
        retry_if_exception_type(RateLimitError)
        | retry_if_exception_type(APIConnectionError)
        | retry_if_exception_type(APITimeoutError)
        | retry_if_exception_type(InvalidResponseError)
    ),
)
async def openai_complete_if_cache(
    model: str,
    prompt: str,
    system_prompt: str | None = None,
    history_messages: list[dict[str, Any]] | None = None,
    enable_cot: bool = False,
    base_url: str | None = None,
    api_key: str | None = None,
    token_tracker: Any | None = None,
    **kwargs: Any,
) -> str:
    """Complete a prompt using OpenAI's API with caching support and Chain of Thought (COT) integration.

    This function supports automatic integration of reasoning content (chain of thought) from models
    that provide Chain of Thought capabilities. The reasoning content is seamlessly integrated into
    the response using <think>...</think> tags.

    Note on `reasoning_content`: This feature relies on a DeepSeek-style `reasoning_content`
    field in the API response, which may be provided by OpenAI-compatible endpoints that support
    Chain of Thought.

    COT Integration Rules:
    1. COT content is accepted only when regular content is empty and `reasoning_content` has content.
    2. COT processing stops when regular content becomes available.
    3. If both `content` and `reasoning_content` are present simultaneously, reasoning is ignored.
    4. If both fields have content from the start, COT is never activated.
    5. For streaming: COT content is inserted into the content stream with <think> tags.
    6. For non-streaming: COT content is prepended to regular content with <think> tags.

    Args:
        model: The OpenAI model to use.
        prompt: The prompt to complete.
        system_prompt: Optional system prompt to include.
        history_messages: Optional list of previous messages in the conversation.
        base_url: Optional base URL for the OpenAI API.
        api_key: Optional OpenAI API key. If None, uses the OPENAI_API_KEY environment variable.
        token_tracker: Optional token usage tracker for monitoring API usage.
        enable_cot: Whether to enable Chain of Thought (COT) processing. Default is False.
        **kwargs: Additional keyword arguments to pass to the OpenAI API.
            Special kwargs:
            - openai_client_configs: Dict of configuration options for the AsyncOpenAI client.
                These will be passed to the client constructor but will be overridden by
                explicit parameters (api_key, base_url).
            - hashing_kv: Will be removed from kwargs before passing to OpenAI.
            - keyword_extraction: Will be removed from kwargs before passing to OpenAI.

    Returns:
        The completed text (with integrated COT content if available) or an async iterator
        of text chunks if streaming. COT content is wrapped in <think>...</think> tags.

    Raises:
        InvalidResponseError: If the response from OpenAI is invalid or empty.
        APIConnectionError: If there is a connection error with the OpenAI API.
        RateLimitError: If the OpenAI API rate limit is exceeded.
        APITimeoutError: If the OpenAI API request times out.
    """
    if history_messages is None:
        history_messages = []

    # Set openai logger level to INFO when VERBOSE_DEBUG is off
    if not VERBOSE_DEBUG and logger.level == logging.DEBUG:
        logging.getLogger("openai").setLevel(logging.INFO)

    # Remove special kwargs that shouldn't be passed to OpenAI
    kwargs.pop("hashing_kv", None)
    kwargs.pop("keyword_extraction", None)

    # Extract client configuration options
    client_configs = kwargs.pop("openai_client_configs", {})

    # Create the OpenAI client
    openai_async_client = create_openai_async_client(
        api_key=api_key,
        base_url=base_url,
        client_configs=client_configs,
    )

    # Prepare messages
    messages: list[dict[str, Any]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(history_messages)
    messages.append({"role": "user", "content": prompt})

    logger.debug("===== Entering func of LLM =====")
    logger.debug(f"Model: {model} Base URL: {base_url}")
    logger.debug(f"Client Configs: {client_configs}")
    logger.debug(f"Additional kwargs: {kwargs}")
    logger.debug(f"Num of history messages: {len(history_messages)}")
    verbose_debug(f"System prompt: {system_prompt}")
    verbose_debug(f"Query: {prompt}")
    logger.debug("===== Sending Query to LLM =====")

    messages = kwargs.pop("messages", messages)

    # Normalize API parameters based on model requirements
    _normalize_openai_kwargs_for_model(model, kwargs)

    try:
        # Don't use async with context manager, use client directly
        if "response_format" in kwargs:
            response = await openai_async_client.beta.chat.completions.parse(
                model=model, messages=messages, **kwargs
            )
        else:
            response = await openai_async_client.chat.completions.create(
                model=model, messages=messages, **kwargs
            )
    except APIConnectionError as e:
        logger.error(f"OpenAI API Connection Error: {e}")
        await openai_async_client.close()  # Ensure client is closed
        raise
    except RateLimitError as e:
        logger.error(f"OpenAI API Rate Limit Error: {e}")
        await openai_async_client.close()  # Ensure client is closed
        raise
    except APITimeoutError as e:
        logger.error(f"OpenAI API Timeout Error: {e}")
        await openai_async_client.close()  # Ensure client is closed
        raise
    except Exception as e:
        logger.error(
            f"OpenAI API Call Failed,\nModel: {model},\nParams: {kwargs}, Got: {e}"
        )
        await openai_async_client.close()  # Ensure client is closed
        raise

    if hasattr(response, "__aiter__"):

        async def inner():
            # Track if we've started iterating
            iteration_started = False
            final_chunk_usage = None

            # COT (Chain of Thought) state tracking
            cot_active = False
            cot_started = False
            initial_content_seen = False

            try:
                iteration_started = True
                async for chunk in response:
                    # Check if this chunk has usage information (final chunk)
                    if hasattr(chunk, "usage") and chunk.usage:
                        final_chunk_usage = chunk.usage
                        logger.debug(
                            f"Received usage info in streaming chunk: {chunk.usage}"
                        )

                    # Check if choices exists and is not empty
                    if not hasattr(chunk, "choices") or not chunk.choices:
                        logger.warning(f"Received chunk without choices: {chunk}")
                        continue

                    # Check if delta exists
                    if not hasattr(chunk.choices[0], "delta"):
                        # This might be the final chunk, continue to check for usage
                        continue

                    delta = chunk.choices[0].delta
                    content = getattr(delta, "content", None)
                    reasoning_content = getattr(delta, "reasoning_content", None)

                    # Handle COT logic for streaming (only if enabled)
                    if enable_cot:
                        if content is not None and content != "":
                            # Regular content is present
                            if not initial_content_seen:
                                initial_content_seen = True
                                # If both content and reasoning_content are present initially, don't start COT
                                if (
                                    reasoning_content is not None
                                    and reasoning_content != ""
                                ):
                                    cot_active = False
                                    cot_started = False

                            # If COT was active, end it
                            if cot_active:
                                yield "</think>"
                                cot_active = False

                            # Process regular content
                            if r"\u" in content:
                                content = safe_unicode_decode(content.encode("utf-8"))
                            yield content

                        elif reasoning_content is not None and reasoning_content != "":
                            # Only reasoning content is present
                            if not initial_content_seen and not cot_started:
                                # Start COT if we haven't seen initial content yet
                                if not cot_active:
                                    yield "<think>"
                                    cot_active = True
                                    cot_started = True

                            # Process reasoning content if COT is active
                            if cot_active:
                                if r"\u" in reasoning_content:
                                    reasoning_content = safe_unicode_decode(
                                        reasoning_content.encode("utf-8")
                                    )
                                yield reasoning_content
                    else:
                        # COT disabled, only process regular content
                        if content is not None and content != "":
                            if r"\u" in content:
                                content = safe_unicode_decode(content.encode("utf-8"))
                            yield content

                    # If neither content nor reasoning_content, continue to next chunk
                    if content is None and reasoning_content is None:
                        continue

                # Ensure COT is properly closed if still active after stream ends
                if enable_cot and cot_active:
                    yield "</think>"
                    cot_active = False

                # After streaming is complete, track token usage
                if token_tracker and final_chunk_usage:
                    # Use actual usage from the API
                    token_counts = {
                        "prompt_tokens": getattr(final_chunk_usage, "prompt_tokens", 0),
                        "completion_tokens": getattr(
                            final_chunk_usage, "completion_tokens", 0
                        ),
                        "total_tokens": getattr(final_chunk_usage, "total_tokens", 0),
                    }
                    token_tracker.add_usage(token_counts)
                    logger.debug(f"Streaming token usage (from API): {token_counts}")
                elif token_tracker:
                    logger.debug("No usage information available in streaming response")
            except Exception as e:
                # Ensure COT is properly closed before handling exception
                if enable_cot and cot_active:
                    try:
                        yield "</think>"
                        cot_active = False
                    except Exception as close_error:
                        logger.warning(
                            f"Failed to close COT tag during exception handling: {close_error}"
                        )

                logger.error(f"Error in stream response: {str(e)}")
                # Try to clean up resources if possible
                if (
                    iteration_started
                    and hasattr(response, "aclose")
                    and callable(getattr(response, "aclose", None))
                ):
                    try:
                        await response.aclose()
                        logger.debug("Successfully closed stream response after error")
                    except Exception as close_error:
                        logger.warning(
                            f"Failed to close stream response: {close_error}"
                        )
                # Ensure client is closed in case of exception
                await openai_async_client.close()
                raise
            finally:
                # Final safety check for unclosed COT tags
                if enable_cot and cot_active:
                    try:
                        yield "</think>"
                        cot_active = False
                    except Exception as final_close_error:
                        logger.warning(
                            f"Failed to close COT tag in finally block: {final_close_error}"
                        )

                # Ensure resources are released even if no exception occurs
                if (
                    iteration_started
                    and hasattr(response, "aclose")
                    and callable(getattr(response, "aclose", None))
                ):
                    try:
                        await response.aclose()
                        logger.debug("Successfully closed stream response")
                    except Exception as close_error:
                        logger.warning(
                            f"Failed to close stream response in finally block: {close_error}"
                        )

                # This prevents resource leaks since the caller doesn't handle closing
                try:
                    await openai_async_client.close()
                    logger.debug(
                        "Successfully closed OpenAI client for streaming response"
                    )
                except Exception as client_close_error:
                    logger.warning(
                        f"Failed to close OpenAI client in streaming finally block: {client_close_error}"
                    )

        return inner()

    else:
        try:
            if (
                not response
                or not response.choices
                or not hasattr(response.choices[0], "message")
            ):
                logger.error("Invalid response from OpenAI API")
                await openai_async_client.close()  # Ensure client is closed
                raise InvalidResponseError("Invalid response from OpenAI API")

            message = response.choices[0].message
            content = getattr(message, "content", None)
            reasoning_content = getattr(message, "reasoning_content", None)

            # Handle COT logic for non-streaming responses (only if enabled)
            final_content = ""

            if enable_cot:
                # Check if we should include reasoning content
                should_include_reasoning = False
                if reasoning_content and reasoning_content.strip():
                    if not content or content.strip() == "":
                        # Case 1: Only reasoning content, should include COT
                        should_include_reasoning = True
                        final_content = (
                            content or ""
                        )  # Use empty string if content is None
                    else:
                        # Case 3: Both content and reasoning_content present, ignore reasoning
                        should_include_reasoning = False
                        final_content = content
                else:
                    # No reasoning content, use regular content
                    final_content = content or ""

                # Apply COT wrapping if needed
                if should_include_reasoning:
                    if r"\u" in reasoning_content:
                        reasoning_content = safe_unicode_decode(
                            reasoning_content.encode("utf-8")
                        )
                    final_content = f"<think>{reasoning_content}</think>{final_content}"
            else:
                # COT disabled, only use regular content
                final_content = content or ""

            # Validate final content
            if not final_content or final_content.strip() == "":
                logger.error("Received empty content from OpenAI API")
                await openai_async_client.close()  # Ensure client is closed
                raise InvalidResponseError("Received empty content from OpenAI API")

            # Apply Unicode decoding to final content if needed
            if r"\u" in final_content:
                final_content = safe_unicode_decode(final_content.encode("utf-8"))

            if token_tracker and hasattr(response, "usage"):
                token_counts = {
                    "prompt_tokens": getattr(response.usage, "prompt_tokens", 0),
                    "completion_tokens": getattr(
                        response.usage, "completion_tokens", 0
                    ),
                    "total_tokens": getattr(response.usage, "total_tokens", 0),
                }
                token_tracker.add_usage(token_counts)

            logger.debug(f"Response content len: {len(final_content)}")
            verbose_debug(f"Response: {response}")

            return final_content
        finally:
            # Ensure client is closed in all cases for non-streaming responses
            await openai_async_client.close()

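
# --- Illustrative usage sketch (not part of the module) ---------------------
# Minimal non-streaming call with a throwaway token tracker; the only method
# the code above requires from a tracker is add_usage(dict). The model name,
# prompt, and max_tokens value are placeholder assumptions.
class _ExampleTokenTracker:  # pragma: no cover - illustrative only
    def __init__(self) -> None:
        self.usages: list[dict[str, int]] = []

    def add_usage(self, counts: dict[str, int]) -> None:
        self.usages.append(counts)


async def _example_complete_if_cache() -> str:  # pragma: no cover - illustrative only
    tracker = _ExampleTokenTracker()
    result = await openai_complete_if_cache(
        "gpt-4o-mini",
        "Summarize LightRAG in one sentence.",
        system_prompt="You are a concise assistant.",
        enable_cot=False,
        token_tracker=tracker,
        max_tokens=200,  # left as max_tokens: non gpt-5 models are not rewritten
    )
    logger.debug(f"Tracked usage entries: {len(tracker.usages)}")
    return result
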
async def openai_complete(
    prompt,
    system_prompt=None,
    history_messages=None,
    keyword_extraction=False,
    **kwargs,
) -> Union[str, AsyncIterator[str]]:
    if history_messages is None:
        history_messages = []
    keyword_extraction = kwargs.pop("keyword_extraction", None)
    if keyword_extraction:
        kwargs["response_format"] = "json"
    model_name = kwargs["hashing_kv"].global_config["llm_model_name"]
    return await openai_complete_if_cache(
        model_name,
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        **kwargs,
    )


async def gpt_4o_complete(
    prompt,
    system_prompt=None,
    history_messages=None,
    enable_cot: bool = False,
    keyword_extraction=False,
    **kwargs,
) -> str:
    if history_messages is None:
        history_messages = []
    keyword_extraction = kwargs.pop("keyword_extraction", None)
    if keyword_extraction:
        kwargs["response_format"] = GPTKeywordExtractionFormat
    return await openai_complete_if_cache(
        "gpt-4o",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        enable_cot=enable_cot,
        **kwargs,
    )


async def gpt_4o_mini_complete(
    prompt,
    system_prompt=None,
    history_messages=None,
    enable_cot: bool = False,
    keyword_extraction=False,
    **kwargs,
) -> str:
    if history_messages is None:
        history_messages = []
    keyword_extraction = kwargs.pop("keyword_extraction", None)
    if keyword_extraction:
        kwargs["response_format"] = GPTKeywordExtractionFormat
    return await openai_complete_if_cache(
        "gpt-4o-mini",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        enable_cot=enable_cot,
        **kwargs,
    )


async def nvidia_openai_complete(
    prompt,
    system_prompt=None,
    history_messages=None,
    enable_cot: bool = False,
    keyword_extraction=False,
    **kwargs,
) -> str:
    if history_messages is None:
        history_messages = []
    kwargs.pop("keyword_extraction", None)
    result = await openai_complete_if_cache(
        "nvidia/llama-3.1-nemotron-70b-instruct",  # context length 128k
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        enable_cot=enable_cot,
        base_url="https://integrate.api.nvidia.com/v1",
        **kwargs,
    )
    return result

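
# --- Illustrative usage sketch (not part of the module) ---------------------
# The wrappers above only fix the model name and forward everything else to
# openai_complete_if_cache. Passing keyword_extraction=True sets
# response_format=GPTKeywordExtractionFormat, which routes the request through
# beta.chat.completions.parse; enable_cot controls <think>-tag integration.
# The prompts below are placeholders.
async def _example_wrappers() -> None:  # pragma: no cover - illustrative only
    # Plain completion with Chain of Thought integration enabled.
    answer = await gpt_4o_mini_complete(
        "What is a knowledge graph?",
        enable_cot=True,
    )
    logger.debug(f"Answer length: {len(answer)}")

    # Structured keyword extraction via GPTKeywordExtractionFormat.
    keywords = await gpt_4o_mini_complete(
        "Extract keywords from: LightRAG combines vector search with graph retrieval.",
        keyword_extraction=True,
    )
    logger.debug(f"Keyword payload: {keywords}")
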
@wrap_embedding_func_with_attrs(embedding_dim=1536)
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=60),
    retry=(
        retry_if_exception_type(RateLimitError)
        | retry_if_exception_type(APIConnectionError)
        | retry_if_exception_type(APITimeoutError)
    ),
)
async def openai_embed(
    texts: list[str],
    model: str = "text-embedding-3-small",
    base_url: str = None,
    api_key: str = None,
    client_configs: dict[str, Any] = None,
) -> np.ndarray:
    """Generate embeddings for a list of texts using OpenAI's API.

    Args:
        texts: List of texts to embed.
        model: The OpenAI embedding model to use.
        base_url: Optional base URL for the OpenAI API.
        api_key: Optional OpenAI API key. If None, uses the OPENAI_API_KEY environment variable.
        client_configs: Additional configuration options for the AsyncOpenAI client.
            These will override any default configurations but will be overridden by
            explicit parameters (api_key, base_url).

    Returns:
        A numpy array of embeddings, one per input text.

    Raises:
        APIConnectionError: If there is a connection error with the OpenAI API.
        RateLimitError: If the OpenAI API rate limit is exceeded.
        APITimeoutError: If the OpenAI API request times out.
    """
    # Create the OpenAI client
    openai_async_client = create_openai_async_client(
        api_key=api_key, base_url=base_url, client_configs=client_configs
    )

    async with openai_async_client:
        response = await openai_async_client.embeddings.create(
            model=model, input=texts, encoding_format="base64"
        )
        return np.array(
            [
                np.array(dp.embedding, dtype=np.float32)
                if isinstance(dp.embedding, list)
                else np.frombuffer(base64.b64decode(dp.embedding), dtype=np.float32)
                for dp in response.data
            ]
        )
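
# --- Illustrative usage sketch (not part of the module) ---------------------
# openai_embed returns one float32 vector per input text; with the default
# text-embedding-3-small model the declared embedding_dim is 1536. Assumes an
# OPENAI_API_KEY is configured; the input strings are placeholders.
async def _example_embed() -> None:  # pragma: no cover - illustrative only
    vectors = await openai_embed(["hello world", "lightrag multi-tenant"])
    assert vectors.shape == (2, 1536)
    assert vectors.dtype == np.float32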