ragflow/rag/utils
pyyuhao c8c3b756b0
Feat: Adds OpenSearch2.19.1 as the vector_database support (#7140)
### What problem does this PR solve?

This PR adds support for the latest OpenSearch 2.19.1 as a storage
engine and search engine option for RAGFlow.

### Main Benefit

1. OpenSearch 2.19.1 is licensed under the Apache v2.0 License, which is
more permissive than Elasticsearch's license.
2. For search, OpenSearch 2.19.1 supports full-text search, vector
search, and hybrid search, with a schema similar to Elasticsearch's.
3. For storage, OpenSearch 2.19.1 stores text and vectors, with a schema
quite similar to Elasticsearch's.

### Changes

- Add an OpenSearch Python connector: rag/utils/opensearch_coon.py.
This needed many adaptations, since the schema and API/methods differ
between ES and OpenSearch in many ways (the gap is especially
significant for knn_search).
- Adapt the static config by changing:
conf/service_conf.yaml, api/settings.py, rag/settings.py
- Support the store and search schema differences between OpenSearch
and ES: conf/os_mapping.json
- Add the OpenSearch Python SDK: pyproject.toml
- Add Docker config for OpenSearch 2.19.1:
docker/.env, docker/docker-compose-base.yml, docker/service_conf.yaml.template
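
To illustrate the knn_search gap mentioned in the changes, here is a minimal sketch of how the same vector query is shaped for each engine. It is not the code in rag/utils/opensearch_coon.py; the helper names and the `q_1024_vec` field are hypothetical, and it assumes the Elasticsearch 8.x top-level `knn` section and the OpenSearch k-NN plugin's `knn` query clause.

```python
from typing import Any, Dict, List


def es_knn_body(field: str, vector: List[float], k: int,
                num_candidates: int) -> Dict[str, Any]:
    """Elasticsearch 8.x style: kNN is a top-level section of the search body."""
    return {
        "size": k,
        "knn": {
            "field": field,
            "query_vector": vector,
            "k": k,
            "num_candidates": num_candidates,
        },
    }


def opensearch_knn_body(field: str, vector: List[float], k: int) -> Dict[str, Any]:
    """OpenSearch k-NN plugin style: kNN is a query clause keyed by field name."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": vector, "k": k}}},
    }


# Hypothetical field name and a toy 3-dimensional vector, for illustration only.
vec = [0.1, 0.2, 0.3]
es_body = es_knn_body("q_1024_vec", vec, k=5, num_candidates=50)
os_body = opensearch_knn_body("q_1024_vec", vec, k=5)
```

The field name moves from a value (`"field": ...`) in the Elasticsearch body to a key in the OpenSearch body, and the vector key changes from `query_vector` to `vector`, so a connector cannot reuse one request body for both engines.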

### How to use
- ES remains the default doc/search engine; I didn't change that
priority. OpenSearch is used only when
DOC_ENGINE=${DOC_ENGINE:-opensearch} is set in docker/.env.
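
The default-engine behaviour described above can be sketched as follows. This is an assumption about how a settings module might read the variable, not the actual code in api/settings.py or rag/settings.py; `resolve_doc_engine` and `SUPPORTED_ENGINES` are hypothetical names, and the set of accepted values is assumed.

```python
import os
from typing import Mapping, Optional

# Assumed set of engine names; only "elasticsearch" as the default and
# "opensearch" as the opt-in value come from the description above.
SUPPORTED_ENGINES = ("elasticsearch", "opensearch", "infinity")


def resolve_doc_engine(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the doc engine name, falling back to Elasticsearch when unset or empty."""
    env = os.environ if env is None else env
    engine = (env.get("DOC_ENGINE") or "elasticsearch").strip().lower()
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"Unsupported DOC_ENGINE: {engine!r}")
    return engine


default_engine = resolve_doc_engine({})
chosen_engine = resolve_doc_engine({"DOC_ENGINE": "opensearch"})
```

Passing the environment as a mapping keeps the lookup easy to test without touching the real process environment.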


### Others
Our team tested many documents in our environment with OpenSearch as
the vector database, and it works very well.
All of the OpenSearch config is required.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Yongteng Lei <yongtengrey@outlook.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2025-04-24 16:03:31 +08:00
__init__.py Fix: type violations. (#6262) 2025-03-19 12:12:34 +08:00
azure_sas_conn.py refactor: no need to inherit in python3 clean the code (#5659) 2025-03-05 18:03:53 +08:00
azure_spn_conn.py refactor: no need to inherit in python3 clean the code (#5659) 2025-03-05 18:03:53 +08:00
doc_store_conn.py Update comments (#4569) 2025-01-21 20:52:28 +08:00
es_conn.py Fix: Handle the case of deleting empty blocks. Update the relevant message (#6643) 2025-04-02 19:20:17 +08:00
infinity_conn.py Fix: knowledge graph resolution with infinity raise error tokenizing in specific situations (#7048) 2025-04-17 16:15:21 +08:00
minio_conn.py fix:  Remove unnecessary minio initialization (#6544) 2025-03-27 09:54:25 +08:00
opensearch_coon.py Feat: Adds OpenSearch2.19.1 as the vector_database support (#7140) 2025-04-24 16:03:31 +08:00
oss_conn.py refactor: no need to inherit in python3 clean the code (#5659) 2025-03-05 18:03:53 +08:00
redis_conn.py feat: Recover pending tasks while pod restart. (#7073) 2025-04-19 16:18:51 +08:00
s3_conn.py Fix: don't modify S3 file name when not using prefix_path (#7152) 2025-04-21 11:55:50 +08:00
storage_factory.py Feat: Accessing Alibaba Cloud OSS with Amazon S3 SDK (#5438) 2025-02-27 17:02:42 +08:00
tavily_conn.py Feat: apply LLM to optimize citations. (#5935) 2025-03-11 19:56:21 +08:00