An intelligent search system built on txtai
Multi-Fusion Search Engine is an intelligent search system built on the txtai framework. It supports semantic search, keyword search, and hybrid search over both documents and images. The system uses a decoupled frontend/backend architecture, providing a modern web interface and a full-featured API service.
┌─────────────────────────────────────────────────────────────┐
│                  Multi-Fusion Search Engine                 │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────┐ │
│ │  Frontend   │ │  API Layer  │ │ Core Logic  │ │  Data   │ │
│ │ React+Vite  │ │   FastAPI   │ │Core Modules │ │Database │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────┘ │
├─────────────────────────────────────────────────────────────┤
│                      txtai AI Framework                     │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────┐ │
│ │ Embeddings  │ │  Pipeline   │ │  Workflow   │ │ Vectors │ │
│ │ Embedding DB│ │ Processing  │ │  Workflows  │ │ Models  │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────┘ │
└─────────────────────────────────────────────────────────────┘
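The hybrid search mentioned above typically merges vector-similarity scores with keyword (e.g. BM25) scores. Below is a minimal sketch of weighted score fusion, assuming both score sets are min-max normalized first; the function names and the `alpha` weight are illustrative, not the project's actual API:

```python
def normalize(scores):
    """Min-max normalize a dict of {doc_id: score} into the [0, 1] range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_merge(semantic, keyword, alpha=0.7):
    """Combine semantic and keyword scores; alpha weights the semantic side."""
    semantic, keyword = normalize(semantic), normalize(keyword)
    docs = set(semantic) | set(keyword)
    merged = {d: alpha * semantic.get(d, 0.0) + (1 - alpha) * keyword.get(d, 0.0)
              for d in docs}
    return sorted(merged.items(), key=lambda x: x[1], reverse=True)

# Example: doc1 leads on semantics, doc2 on keywords
semantic = {"doc1": 0.92, "doc2": 0.55, "doc3": 0.40}
keyword = {"doc2": 7.1, "doc3": 2.3}
print(hybrid_merge(semantic, keyword))
```

Tuning `alpha` toward 1.0 favors semantic matches; toward 0.0 it favors exact keyword hits.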
git clone <repository-url>
cd multi-fusion-search
# Backend dependencies
cd app
pip install -r requirements.txt -i https://mirrors.cloud.tencent.com/pypi/simple
# Frontend dependencies
cd frontend
npm config set registry https://mirrors.cloud.tencent.com/npm
npm install
# Start the backend (port 8888)
cd app && python main.py
# Start the frontend (port 8889)
cd app/frontend && npm run dev
curl -X POST "http://localhost:8888/api/knowledge-bases/" \
-H "Content-Type: application/json" \
-d '{
"name": "我的知识库",
"description": "知识库描述",
"source_path": "/path/to/documents"
}'
curl -X POST "http://localhost:8888/api/search/{kb_id}" \
-H "Content-Type: application/json" \
-d '{
"query": "搜索关键词",
"size": 10,
"include_content": true
}'
✅ Tests passing: all search features work as expected
🟢 Status: running
# Build the Docker image
docker build -f app/Dockerfile -t docker.cnb.cool/aiedulab/txtai/multi-fusion-search .
# Run the container
docker run -d \
-p 8888:8888 \
-p 8889:8889 \
-v /path/to/data:/app/data \
--name multi-fusion-search \
docker.cnb.cool/aiedulab/txtai/multi-fusion-search
# Push the image
docker push docker.cnb.cool/aiedulab/txtai/multi-fusion-search
.
├── app/                        # Main application directory
│   ├── api/                    # API routing layer
│   │   ├── knowledge_base.py   # Knowledge base management API
│   │   ├── search.py           # Search service API
│   │   ├── document.py         # Document management API
│   │   ├── task.py             # Task management API
│   │   └── spaces.py           # Space-based file management API
│   ├── core/                   # Core business logic layer
│   │   ├── knowledge_base.py   # Knowledge base manager
│   │   ├── search_engine.py    # Hybrid search engine
│   │   ├── document_processor.py # Document processor
│   │   ├── task_scheduler.py   # Task scheduler
│   │   ├── task_executor.py    # Task executor
│   │   ├── vector_store.py     # Vector store wrapper
│   │   ├── model_manager.py    # AI model manager
│   │   └── space_service.py    # Space service
│   ├── frontend/               # React frontend
│   │   ├── src/
│   │   │   ├── components/     # React components
│   │   │   ├── pages/          # Page components
│   │   │   ├── hooks/          # Custom hooks
│   │   │   ├── lib/            # Utility libraries
│   │   │   └── types/          # TypeScript types
│   │   ├── package.json        # Frontend dependencies
│   │   └── vite.config.ts      # Vite configuration
│   ├── models/                 # AI model files
│   │   ├── qwen3-embedding-4b/ # Primary embedding model
│   │   ├── jina-clip-v2/       # CLIP image model
│   │   └── blip-image-captioning-large/ # BLIP captioning model
│   ├── schemas/                # Data model definitions
│   │   ├── knowledge_base.py   # Knowledge base data models
│   │   ├── search.py           # Search data models
│   │   └── document.py         # Document data models
│   ├── data/                   # Sample data and documents
│   ├── main.py                 # Backend application entry point
│   ├── config.py               # Configuration management
│   └── requirements.txt        # Python dependencies
├── src/                        # txtai source code
│   └── python/txtai/           # txtai core framework
│       ├── embeddings/         # Embedding databases
│       ├── pipeline/           # Processing pipelines
│       ├── workflow/           # Workflows
│       ├── vectors/            # Vector models
│       ├── ann/                # Approximate nearest neighbor indexes
│       └── database/           # Database integrations
├── examples/                   # txtai examples and tutorials
├── docs/                       # Project documentation
│   ├── 多融合搜索引擎技术架构详解.md
│   ├── txtai实现原理深度解析.md
│   ├── 系统架构说明.md
│   └── API接口文档.md
├── test/                       # Test scripts
├── docker/                     # Docker-related files
├── Dockerfile                  # Docker build file
└── README.md                   # Project README
cd app
python main.py
# View logs
tail -f main.log
cd app/frontend
npm run dev
# Hot reload supported
Visit http://localhost:8888/docs for the full API documentation.
1. Create a feature branch (`git checkout -b feature/AmazingFeature`)
2. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
3. Push the branch (`git push origin feature/AmazingFeature`)

This project is open source under the Apache License 2.0.
- app/main.log - detailed runtime logs
- /api/health - system health check
- /api/stats - system performance statistics
- /docs - interactive API documentation

Version: v2.0.0
Last updated: 2025-09-22
Development status: ✅ Production ready
Test coverage: 85%+
Documentation completeness: 95%+
# app.yml
embeddings:
  path: sentence-transformers/all-MiniLM-L6-v2
CONFIG=app.yml uvicorn "txtai.api:app"
curl -X GET "http://localhost:8000/search?query=positive"
The following sections introduce common txtai use cases. A comprehensive set of over 60 example notebooks and applications are also available.
Build semantic/similarity/vector/neural search applications.

Traditional search systems use keywords to find data. Semantic search has an understanding of natural language and identifies results that have the same meaning, not necessarily the same keywords.

Get started with the following examples.
| Notebook | Description |
|---|---|
| Introducing txtai ▶️ | Overview of the functionality provided by txtai |
| Similarity search with images | Embed images and text into the same space for search |
| Build a QA database | Question matching with semantic search |
| Semantic Graphs | Explore topics, data connectivity and run network analysis |
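Under the hood, semantic search embeds queries and documents into the same vector space and ranks by similarity rather than keyword overlap. A toy illustration with hand-made vectors and cosine similarity; real systems obtain these vectors from an embedding model such as the ones txtai wraps:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hand-made 3-d "embeddings": dimensions loosely stand for
# (weather, finance, sports) topical content
docs = {
    "Storm expected this weekend": [0.9, 0.1, 0.0],
    "Markets rally on rate cut": [0.0, 0.95, 0.1],
    "Team wins championship final": [0.1, 0.0, 0.9],
}

query = [0.85, 0.05, 0.1]  # e.g. "will it rain tomorrow" - no shared keywords
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # the weather document, despite zero keyword overlap
```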
Autonomous agents, retrieval augmented generation (RAG), chat with your data, pipelines and workflows that interface with large language models (LLMs).

See below to learn more.
| Notebook | Description |
|---|---|
| Prompt templates and task chains | Build model prompts and connect tasks together with workflows |
| Integrate LLM frameworks | Integrate llama.cpp, LiteLLM and custom generation frameworks |
| Build knowledge graphs with LLMs | Build knowledge graphs with LLM-driven entity extraction |
| Parsing the stars with txtai | Explore an astronomical knowledge graph of known stars, planets, galaxies |
Agents connect embeddings, pipelines, workflows and other agents together to autonomously solve complex problems.

txtai agents are built on top of the smolagents framework. This supports all LLMs txtai supports (Hugging Face, llama.cpp, OpenAI / Claude / AWS Bedrock via LiteLLM).
See the link below to learn more.
| Notebook | Description |
|---|---|
| Analyzing Hugging Face Posts with Graphs and Agents | Explore a rich dataset with Graph Analysis and Agents |
| Granting autonomy to agents | Agents that iteratively solve problems as they see fit |
| Analyzing LinkedIn Company Posts with Graphs and Agents | Exploring how to improve social media engagement with AI |
Retrieval augmented generation (RAG) reduces the risk of LLM hallucinations by constraining the output with a knowledge base as context. RAG is commonly used to "chat with your data".

A novel feature of txtai is that it can provide both an answer and source citation.
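The retrieve-then-generate flow, including citations, can be sketched in a few lines: retrieve top-k passages, number them, and build a prompt that instructs the model to cite sources. The prompt wording and data structures here are illustrative; txtai's RAG pipeline handles this internally:

```python
def build_rag_prompt(question, passages):
    """Build an LLM prompt that constrains answers to numbered context passages."""
    context = "\n".join(f"[{i + 1}] {p['text']} (source: {p['source']})"
                        for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below. "
        "Cite sources as [n].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Stand-ins for passages returned by a retrieval step
passages = [
    {"text": "txtai supports hybrid search.", "source": "docs/search.md"},
    {"text": "RAG constrains LLM output with retrieved context.", "source": "docs/rag.md"},
]
prompt = build_rag_prompt("What does RAG do?", passages)
print(prompt)
```

Because each passage carries a numbered source, the model's `[n]` citations can be mapped back to the original documents.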
| Notebook | Description |
|---|---|
| Build RAG pipelines with txtai | Guide on retrieval augmented generation including how to create citations |
| Chunking your data for RAG | Extract, chunk and index content for effective retrieval |
| Advanced RAG with graph path traversal | Graph path traversal to collect complex sets of data for advanced RAG |
| Speech to Speech RAG ▶️ | Full cycle speech to speech workflow with RAG |
Language model workflows, also known as semantic workflows, connect language models together to build intelligent applications.

While LLMs are powerful, there are plenty of smaller, more specialized models that work better and faster for specific tasks. This includes models for extractive question-answering, automatic summarization, text-to-speech, transcription and translation.
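Conceptually, a workflow is a chain of tasks where the output of one feeds the next. A minimal stand-in for that idea in plain Python; txtai's `Workflow`/`Task` classes add batching, streaming, and more on top of this, and the lambda "models" below are placeholders for real pipeline steps:

```python
def run_workflow(tasks, data):
    """Feed a batch of items through each task in order, like a pipeline."""
    for task in tasks:
        data = [task(item) for item in data]
    return data

# Illustrative stand-ins for specialized models: normalize text, then "summarize"
clean = lambda text: " ".join(text.split()).lower()
summarize = lambda text: text[:20]  # pretend the first 20 chars is a summary

result = run_workflow([clean, summarize], ["  Hello   WORLD from   txtai  "])
print(result)  # ['hello world from txt']
```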
| Notebook | Description |
|---|---|
| Run pipeline workflows ▶️ | Simple yet powerful constructs to efficiently process data |
| Building abstractive text summaries | Run abstractive text summarization |
| Transcribe audio to text | Convert audio files to text |
| Translate text between languages | Streamline machine translation and language detection |

The easiest way to install is via pip and PyPI
pip install txtai
Python 3.10+ is supported. Using a Python virtual environment is recommended.
See the detailed install instructions for more information covering optional dependencies, environment specific prerequisites, installing from source, conda support and how to run with containers.

See the table below for the current recommended models. These models all allow commercial use and offer a blend of speed and performance.
Models can be loaded as either a path from the Hugging Face Hub or a local directory. Model paths are optional, defaults are loaded when not specified. For tasks with no recommended model, txtai uses the default models as shown in the Hugging Face Tasks guide.
See the following links to learn more.
The following applications are powered by txtai.

| Application | Description |
|---|---|
| rag | Retrieval Augmented Generation (RAG) application |
| ragdata | Build knowledge bases for RAG |
| paperai | AI for medical and scientific papers |
| annotateai | Automatically annotate papers with LLMs |
In addition to this list, there are also many other open-source projects, published research and closed proprietary/commercial projects that have built on txtai in production.

Full documentation on txtai including configuration settings for embeddings, pipelines, workflows, API and a FAQ with common questions/issues is available.
For those who would like to contribute to txtai, please see this guide.