English | 中文
Streams to River is an English learning application. It records, extracts, and manages English words, sentences, and related contexts encountered in daily life, and schedules periodic review and memorization based on the Ebbinghaus Forgetting Curve.
During development, TRAE was extensively used for code development, debugging, annotation, and unit test writing. Through Coze workflow, capabilities such as image-to-text, real-time chat, speech recognition, and word highlighting were quickly integrated.
Streams to River V2 is a word learning and language processing microservice system built on the Hertz and Kitex frameworks. The system provides a complete solution from API services to RPC implementation, including core functional modules such as user authentication, word management, review progress tracking, real-time chat, speech recognition, and image-to-text conversion, using MySQL and Redis for data storage and cache optimization.
The system is designed to provide users with a comprehensive language learning platform, enhancing learning effectiveness and user experience by combining traditional word learning methods with modern AI technology. The system supports features such as word addition, querying, tag management, review progress tracking, and intelligent chat, and integrates multimodal processing capabilities such as speech recognition and image-to-text conversion to provide users with richer and more convenient learning methods.
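As a concrete illustration of the Ebbinghaus-style review scheduling mentioned above, the next review time can be derived from how many times a word has already been reviewed. The sketch below is illustrative Go only; the interval values and function names are assumptions, not the project's actual schedule.

```go
package review

import "time"

// Illustrative Ebbinghaus-style spacing: intervals grow with each review.
// These values are assumptions for the sketch, not the project's real schedule.
var reviewIntervals = []time.Duration{
	5 * time.Minute, 30 * time.Minute, 12 * time.Hour,
	24 * time.Hour, 2 * 24 * time.Hour, 4 * 24 * time.Hour,
	7 * 24 * time.Hour, 15 * 24 * time.Hour,
}

// nextReviewTime returns when a word should next be reviewed,
// given how many times it has already been reviewed.
func nextReviewTime(reviewCount int, now time.Time) time.Time {
	if reviewCount >= len(reviewIntervals) {
		reviewCount = len(reviewIntervals) - 1
	}
	return now.Add(reviewIntervals[reviewCount])
}
```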
The system adopts a front-end and back-end separated microservice architecture, mainly divided into an API service layer (built on Hertz) and an RPC service layer (built on Kitex). The main technologies and frameworks are summarized below:

| Category | Technology/Framework | Description |
|---|---|---|
| HTTP Framework | Hertz | High-performance Golang HTTP framework for building API services |
| RPC Framework | Kitex | High-performance, highly extensible Golang RPC framework for building microservices |
| Data Storage | MySQL | Relational database for persistent storage of user data, word information, etc. |
| Cache Service | Redis | In-memory database for caching hot data to improve system performance |
| Communication Protocol | HTTP/RESTful | For communication between front-end and API service layer |
| | RPC | For communication between API service layer and RPC service layer |
| | WebSocket | For real-time communication, such as speech recognition service |
| | Server-Sent Events (SSE) | For streaming communication, such as real-time chat functionality |
| AI/ML Integration | Large Language Model (LLM) | For intelligent chat, content generation, and word highlighting |
| | Speech Recognition (ASR) | For converting speech to text |
| | Image Processing | For image-to-text functionality |
| Monitoring and Observability | OpenTelemetry | For system monitoring, metrics collection, and performance analysis |
| Security | JWT | For user authentication and authorization |
| Deployment and Service Discovery | Service Registration and Discovery | For microservice registration and discovery |
| | Dynamic Configuration Management | For dynamic management of system configuration |
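To make the API-layer / RPC-layer split concrete, the sketch below shows what a Hertz handler in the API service layer could look like. The route, response shape, and the commented-out Kitex call are assumptions for illustration, not the project's actual handlers.

```go
package main

import (
	"context"

	"github.com/cloudwego/hertz/pkg/app"
	"github.com/cloudwego/hertz/pkg/app/server"
	"github.com/cloudwego/hertz/pkg/common/utils"
	"github.com/cloudwego/hertz/pkg/protocol/consts"
)

func main() {
	h := server.Default() // Hertz HTTP server for the API service layer

	// Hypothetical endpoint: look up a word. In the real system the handler
	// would forward the request to the RPC service layer through a generated
	// Kitex client rather than answering directly.
	h.GET("/api/words/:name", func(ctx context.Context, c *app.RequestContext) {
		name := c.Param("name")
		// resp, err := wordClient.GetWord(ctx, &word.GetWordReq{Name: name}) // Kitex call (project-specific)
		c.JSON(consts.StatusOK, utils.H{"word": name})
	})

	h.Spin()
}
```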
The user management module is responsible for user registration, login, and information management, with the following main functions:
The word learning system is the core functional module of the system, responsible for word management, review, and tag management, with the following main functions:
The intelligent chat module is based on large language models (LLM) and provides real-time chat functionality with the following main features:
The multimodal processing module integrates speech recognition and image-to-text functionality, providing users with multiple input methods:
The documentation service module provides API documentation and usage guides for the system, with the following main functions:
The system monitoring and management module is responsible for system monitoring, configuration management, and log processing, with the following main functions:
For more information, please refer to repome
Update config file: stream2river

```yaml
LLM:
  ChatModel:
    # Go to the Volcano Ark platform
    # (https://console.volcengine.com/ark/region:ark+cn-beijing/model/detail?Id=doubao-1-5-pro-32k)
    # to apply for the latest Doubao Pro text model and fill in the api_key and model_id it provides.
    APIKey: ""
    Model: ""

Coze:
  BaseURL: "https://api.coze.cn"
  # The following fields are configured with reference to rpcservice/biz/chat/coze/README.md
  WorkflowID: ""
  Auth: ""
  Token: ""
  ClientID:
  PublishKey:
  PrivateKey:
```
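For reference, these values might be read into Go structs roughly as in the sketch below, which assumes gopkg.in/yaml.v3 and hypothetical struct and field names; the project's actual configuration loader may differ.

```go
package conf

import (
	"os"

	"gopkg.in/yaml.v3"
)

// Config mirrors the ChatModel and Coze sections shown above (hypothetical shape).
type Config struct {
	LLM struct {
		ChatModel struct {
			APIKey string `yaml:"APIKey"`
			Model  string `yaml:"Model"`
		} `yaml:"ChatModel"`
	} `yaml:"LLM"`
	Coze struct {
		BaseURL    string `yaml:"BaseURL"`
		WorkflowID string `yaml:"WorkflowID"`
		Auth       string `yaml:"Auth"`
		Token      string `yaml:"Token"`
	} `yaml:"Coze"`
}

// Load reads and parses the stream2river config file from the given path.
func Load(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var c Config
	if err := yaml.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	return &c, nil
}
```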
Update config file: stream2river

```yaml
LLM:
  AsrModel:
    # Read the "Sentence Recognition" access document first (https://www.volcengine.com/docs/6561/80816),
    # enable the sentence recognition capability on the Volcano platform
    # (https://console.volcengine.com/speech/service/15), and fill in the AppID / Token / Cluster it provides.
    AppID: ""
    Token: ""
    Cluster: ""
  VisionModel:
    # Go to the Volcano Ark platform
    # (https://console.volcengine.com/ark/region:ark+cn-beijing/model/detail?Id=doubao-1-5-vision-lite)
    # to apply for Doubao's latest Vision Lite model and fill in the api_key and model_id it provides.
    APIKey: ""
    Model: ""

# JWT_SECRET is used to sign and verify JWT tokens. It must be a long, random string:
# use at least 32 bytes (256 bits) of random data.
# You can generate a secure random string with:
#   openssl rand -base64 32
#   or, in Python: import secrets; print(secrets.token_urlsafe(32))
JWT_SECRET: your_secret_key
```
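As an illustration of how JWT_SECRET is used, a token could be signed as in the hedged sketch below; it assumes the github.com/golang-jwt/jwt/v5 package and hypothetical claims, which may differ from the project's actual authentication code.

```go
package auth

import (
	"time"

	"github.com/golang-jwt/jwt/v5"
)

// issueToken signs a JWT for the given user with the shared JWT_SECRET.
// The claim names and the 24-hour expiry are assumptions for this sketch.
func issueToken(secret string, userID int64) (string, error) {
	claims := jwt.MapClaims{
		"user_id": userID,
		"exp":     time.Now().Add(24 * time.Hour).Unix(),
	}
	token := jwt.NewWithClaims(jwt.SigningMethodHS256, claims)
	return token.SignedString([]byte(secret))
}
```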
Before running the backend services, make sure you have installed docker and docker-compose. For more information, see https://docs.docker.com/engine/install/ and https://docs.docker.com/compose/install/ .
After starting the docker service, run ./dockerfile/run.sh in the project root directory to start the backend services.
Refer to the client/README.md document.
Refer to the Coze Config document.
Implement a function for retrieving the words to be recited. The basic logic is as follows:
- From the "words_recite_record" table, select all records of the current user (whose user_id is passed as a parameter) whose "next_review_time" is earlier than the current time.
- For each record, obtain detailed information from the "words" table using their "word_id".
- For each record, generate three types of review questions. Each question contains the question stem and four answer options.
- The first type: Select the correct Chinese meaning. The question stem is the "word_name" from the "words" table. The options consist of two parts: the "explanations" value from the "words" table, plus 3 answers drawn at random from the "answer_list" table. To pick them, first find the records in "answer_list" whose "user" equals the current user, then randomly select 3 order_ids between 1 and the maximum order_id and use the "description" field of those records as options. Exclusion logic must also be implemented (e.g. the distractors must not duplicate the correct answer).
- The second type: Select the correct English meaning, defined as the constant "CHOOSE_EN". The logic is similar to the first type, except that the question stem is the "explanations" from the "words" table and the options are "word_name" values from "answer_list".
- The third type: Select the correct Chinese meaning based on the pronunciation, defined as the constant "PRONOUNCE_CHOOSE". The logic is also similar to the first type, except that the question stem is the "pronounce_us" from the "words" table and the options are "description" values from "answer_list".
Generated Code: review_list.go
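The generated file is not reproduced here. As an illustration only, the logic described above could be assembled roughly as in the following sketch, which assumes GORM-style data access and hypothetical struct, key-column, and helper names; the actual review_list.go may look quite different.

```go
package review

import (
	"math/rand"
	"time"

	"gorm.io/gorm"
)

// ReciteRecord, Word, and Question mirror the columns referenced above;
// the struct shapes and the words-table key column are assumptions.
type ReciteRecord struct {
	WordID         int64
	UserID         int64
	NextReviewTime time.Time
}

type Word struct {
	WordName     string
	Explanations string
}

type Question struct {
	Type    string
	Stem    string
	Options []string
}

// randomDistractors picks up to 3 descriptions from answer_list for the user,
// excluding the correct answer; retrying on order_id collisions or gaps is
// omitted for brevity.
func randomDistractors(db *gorm.DB, userID int64, exclude string) ([]string, error) {
	var maxOrder int64
	if err := db.Table("answer_list").
		Where("user = ?", userID).
		Select("MAX(order_id)").
		Row().Scan(&maxOrder); err != nil {
		return nil, err
	}
	if maxOrder <= 0 {
		return nil, nil
	}
	ids := []int64{rand.Int63n(maxOrder) + 1, rand.Int63n(maxOrder) + 1, rand.Int63n(maxOrder) + 1}
	var descs []string
	if err := db.Table("answer_list").
		Where("user = ? AND order_id IN ?", userID, ids).
		Pluck("description", &descs).Error; err != nil {
		return nil, err
	}
	out := make([]string, 0, 3)
	for _, d := range descs {
		if d != exclude {
			out = append(out, d)
		}
	}
	return out, nil
}

// ReviewQuestions builds the first question type for each due word; CHOOSE_EN
// and PRONOUNCE_CHOOSE follow the same pattern with the stem and option
// columns swapped as described above.
func ReviewQuestions(db *gorm.DB, userID int64) ([]Question, error) {
	// Select the user's due records from words_recite_record.
	var recs []ReciteRecord
	if err := db.Table("words_recite_record").
		Where("user_id = ? AND next_review_time < ?", userID, time.Now()).
		Find(&recs).Error; err != nil {
		return nil, err
	}
	questions := make([]Question, 0, len(recs))
	for _, r := range recs {
		// Look up word details; the key column name is an assumption.
		var w Word
		if err := db.Table("words").Where("word_id = ?", r.WordID).Take(&w).Error; err != nil {
			return nil, err
		}
		distractors, err := randomDistractors(db, userID, w.Explanations)
		if err != nil {
			return nil, err
		}
		questions = append(questions, Question{
			Type:    "CHOOSE_ZH", // hypothetical; only CHOOSE_EN and PRONOUNCE_CHOOSE are named above
			Stem:    w.WordName,
			Options: append([]string{w.Explanations}, distractors...),
		})
	}
	return questions, nil
}
```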
Copyright (c) 2025 Bytedance Ltd. and/or its affiliates. All rights reserved.
Licensed under the MIT license.