English | 中文
Streams to River is an English learning application. It records, extracts, and manages English words, sentences, and related contexts encountered in daily life, and schedules periodic review and memorization based on the Ebbinghaus Forgetting Curve.
During development, TRAE was extensively used for code development, debugging, annotation, and unit test writing. Through Coze workflow, capabilities such as image-to-text, real-time chat, speech recognition, and word highlighting were quickly integrated.
Streams to River V2 is a word learning and language processing microservice system built on the Hertz and Kitex frameworks. The system provides a complete solution from API services to RPC implementation, including core functional modules such as user authentication, word management, review progress tracking, real-time chat, speech recognition, and image-to-text conversion, using MySQL and Redis for data storage and cache optimization.
The system is designed to provide users with a comprehensive language learning platform, enhancing learning effectiveness and user experience by combining traditional word learning methods with modern AI technology. The system supports features such as word addition, querying, tag management, review progress tracking, and intelligent chat, and integrates multimodal processing capabilities such as speech recognition and image-to-text conversion to provide users with richer and more convenient learning methods.
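As a concrete illustration of the Ebbinghaus-style review scheduling mentioned above, the next review time can be derived from how many times a word has already been reviewed. The sketch below is illustrative Go only; the interval values and function names are assumptions, not the project's actual schedule.

```go
package review

import "time"

// Illustrative Ebbinghaus-style spacing: intervals grow with each review.
// These values are assumptions for the sketch, not the project's real schedule.
var reviewIntervals = []time.Duration{
	5 * time.Minute, 30 * time.Minute, 12 * time.Hour,
	24 * time.Hour, 2 * 24 * time.Hour, 4 * 24 * time.Hour,
	7 * 24 * time.Hour, 15 * 24 * time.Hour,
}

// nextReviewTime returns when a word should next be reviewed,
// given how many times it has already been reviewed.
func nextReviewTime(reviewCount int, now time.Time) time.Time {
	if reviewCount >= len(reviewIntervals) {
		reviewCount = len(reviewIntervals) - 1
	}
	return now.Add(reviewIntervals[reviewCount])
}
```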
The system adopts a front-end and back-end separated microservice architecture, mainly divided into an API service layer (built on Hertz) and an RPC service layer (built on Kitex). The main technologies and frameworks are summarized below:

| Category | Technology/Framework | Description |
|---|---|---|
| HTTP Framework | Hertz | High-performance Golang HTTP framework for building API services |
| RPC Framework | Kitex | High-performance, highly extensible Golang RPC framework for building microservices |
| Data Storage | MySQL | Relational database for persistent storage of user data, word information, etc. |
| Cache Service | Redis | In-memory database for caching hot data to improve system performance |
| Communication Protocol | HTTP/RESTful | For communication between front-end and API service layer |
| | RPC | For communication between API service layer and RPC service layer |
| | WebSocket | For real-time communication, such as speech recognition service |
| | Server-Sent Events (SSE) | For streaming communication, such as real-time chat functionality |
| AI/ML Integration | Large Language Model (LLM) | For intelligent chat, content generation, and word highlighting |
| | Speech Recognition (ASR) | For converting speech to text |
| | Image Processing | For image-to-text functionality |
| Monitoring and Observability | OpenTelemetry | For system monitoring, metrics collection, and performance analysis |
| Security | JWT | For user authentication and authorization |
| Deployment and Service Discovery | Service Registration and Discovery | For microservice registration and discovery |
| | Dynamic Configuration Management | For dynamic management of system configuration |
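To make the API-layer / RPC-layer split concrete, the sketch below shows what a Hertz handler in the API service layer could look like. The route, response shape, and the commented-out Kitex call are assumptions for illustration, not the project's actual handlers.

```go
package main

import (
	"context"

	"github.com/cloudwego/hertz/pkg/app"
	"github.com/cloudwego/hertz/pkg/app/server"
	"github.com/cloudwego/hertz/pkg/common/utils"
	"github.com/cloudwego/hertz/pkg/protocol/consts"
)

func main() {
	h := server.Default() // Hertz HTTP server for the API service layer

	// Hypothetical endpoint: look up a word. In the real system the handler
	// would forward the request to the RPC service layer through a generated
	// Kitex client rather than answering directly.
	h.GET("/api/words/:name", func(ctx context.Context, c *app.RequestContext) {
		name := c.Param("name")
		// resp, err := wordClient.GetWord(ctx, &word.GetWordReq{Name: name}) // Kitex call (project-specific)
		c.JSON(consts.StatusOK, utils.H{"word": name})
	})

	h.Spin()
}
```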
The user management module is responsible for user registration, login, and information management, with the following main functions:
The word learning system is the core functional module of the system, responsible for word management, review, and tag management, with the following main functions:
The intelligent chat module is based on large language models (LLM) and provides real-time chat functionality with the following main features:
The multimodal processing module integrates speech recognition and image-to-text functionality, providing users with multiple input methods:
The documentation service module provides API documentation and usage guides for the system, with the following main functions:
The system monitoring and management module is responsible for system monitoring, configuration management, and log processing, with the following main functions:
For more information, please refer to repome
Update config file: stream2river

```yaml
LLM:
  ChatModel:
    # Go to the Volcano Ark platform
    # (https://console.volcengine.com/ark/region:ark+cn-beijing/model/detail?Id=doubao-1-5-pro-32k)
    # to apply for the latest Doubao Pro text model and fill in the api_key and model_id it provides.
    APIKey: ""
    Model: ""

Coze:
  BaseURL: "https://api.coze.cn"
  # The following fields are configured with reference to rpcservice/biz/chat/coze/README.md
  WorkflowID: ""
  Auth: ""
  Token: ""
  ClientID:
  PublishKey:
  PrivateKey:
```
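For reference, these values might be read into Go structs roughly as in the sketch below, which assumes gopkg.in/yaml.v3 and hypothetical struct and field names; the project's actual configuration loader may differ.

```go
package conf

import (
	"os"

	"gopkg.in/yaml.v3"
)

// Config mirrors the ChatModel and Coze sections shown above (hypothetical shape).
type Config struct {
	LLM struct {
		ChatModel struct {
			APIKey string `yaml:"APIKey"`
			Model  string `yaml:"Model"`
		} `yaml:"ChatModel"`
	} `yaml:"LLM"`
	Coze struct {
		BaseURL    string `yaml:"BaseURL"`
		WorkflowID string `yaml:"WorkflowID"`
		Auth       string `yaml:"Auth"`
		Token      string `yaml:"Token"`
	} `yaml:"Coze"`
}

// Load reads and parses the stream2river config file from the given path.
func Load(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var c Config
	if err := yaml.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	return &c, nil
}
```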
Update config file: stream2river

```yaml
LLM:
  AsrModel:
    # Read the "Sentence Recognition" access document first (https://www.volcengine.com/docs/6561/80816),
    # enable the sentence recognition capability on the Volcano platform
    # (https://console.volcengine.com/speech/service/15), and fill in the AppID / Token / Cluster it provides.
    AppID: ""
    Token: ""
    Cluster: ""
  VisionModel:
    # Go to the Volcano Ark platform
    # (https://console.volcengine.com/ark/region:ark+cn-beijing/model/detail?Id=doubao-1-5-vision-lite)
    # to apply for Doubao's latest Vision Lite model and fill in the api_key and model_id it provides.
    APIKey: ""
    Model: ""

# JWT_SECRET is used to sign and verify JWT tokens. It must be a long, random string:
# use at least 32 bytes (256 bits) of random data.
# You can generate a secure random string with:
#   openssl rand -base64 32
#   or, in Python: import secrets; print(secrets.token_urlsafe(32))
JWT_SECRET: your_secret_key
```
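As an illustration of how JWT_SECRET is used, a token could be signed as in the hedged sketch below; it assumes the github.com/golang-jwt/jwt/v5 package and hypothetical claims, which may differ from the project's actual authentication code.

```go
package auth

import (
	"time"

	"github.com/golang-jwt/jwt/v5"
)

// issueToken signs a JWT for the given user with the shared JWT_SECRET.
// The claim names and the 24-hour expiry are assumptions for this sketch.
func issueToken(secret string, userID int64) (string, error) {
	claims := jwt.MapClaims{
		"user_id": userID,
		"exp":     time.Now().Add(24 * time.Hour).Unix(),
	}
	token := jwt.NewWithClaims(jwt.SigningMethodHS256, claims)
	return token.SignedString([]byte(secret))
}
```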
Before running the backend services, make sure you have installed docker and docker-compose. For more information, see https://docs.docker.com/engine/install/ and https://docs.docker.com/compose/install/ .
After starting the docker service, run ./dockerfile/run.sh in the project root directory to start the backend services.
Refer to the client/README.md document.
Refer to the Coze Config document.
Implement a function for retrieving the words to be recited. The basic logic is as follows:
- From the "words_recite_record" table, select all records of the current user (whose user_id is passed as a parameter) whose "next_review_time" is earlier than the current time.
- For each record, obtain detailed information from the "words" table using their "word_id".
- For each record, generate three types of review questions. Each question contains the question stem and four answer options.
- The first type: Select the correct Chinese meaning. The question stem is the "word_name" from the "words" table. The options consist of two parts: the "explanations" value from the "words" table, plus 3 answers drawn at random from the "answer_list" table. To pick them, first find the records in "answer_list" whose "user" equals the current user, then randomly select 3 order_ids between 1 and the maximum order_id and use the "description" field of those records as options. Exclusion logic must also be implemented (e.g. the distractors must not duplicate the correct answer).
- The second type: Select the correct English meaning, defined as the constant "CHOOSE_EN". The logic is similar to the first type, except that the question stem is the "explanations" from the "words" table and the options are "word_name" values from "answer_list".
- The third type: Select the correct Chinese meaning based on the pronunciation, defined as the constant "PRONOUNCE_CHOOSE". The logic is also similar to the first type, except that the question stem is the "pronounce_us" from the "words" table and the options are "description" values from "answer_list".
Generated Code: review_list.go
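The generated file is not reproduced here. As an illustration only, the logic described above could be assembled roughly as in the following sketch, which assumes GORM-style data access and hypothetical struct, key-column, and helper names; the actual review_list.go may look quite different.

```go
package review

import (
	"math/rand"
	"time"

	"gorm.io/gorm"
)

// ReciteRecord, Word, and Question mirror the columns referenced above;
// the struct shapes and the words-table key column are assumptions.
type ReciteRecord struct {
	WordID         int64
	UserID         int64
	NextReviewTime time.Time
}

type Word struct {
	WordName     string
	Explanations string
}

type Question struct {
	Type    string
	Stem    string
	Options []string
}

// randomDistractors picks up to 3 descriptions from answer_list for the user,
// excluding the correct answer; retrying on order_id collisions or gaps is
// omitted for brevity.
func randomDistractors(db *gorm.DB, userID int64, exclude string) ([]string, error) {
	var maxOrder int64
	if err := db.Table("answer_list").
		Where("user = ?", userID).
		Select("MAX(order_id)").
		Row().Scan(&maxOrder); err != nil {
		return nil, err
	}
	if maxOrder <= 0 {
		return nil, nil
	}
	ids := []int64{rand.Int63n(maxOrder) + 1, rand.Int63n(maxOrder) + 1, rand.Int63n(maxOrder) + 1}
	var descs []string
	if err := db.Table("answer_list").
		Where("user = ? AND order_id IN ?", userID, ids).
		Pluck("description", &descs).Error; err != nil {
		return nil, err
	}
	out := make([]string, 0, 3)
	for _, d := range descs {
		if d != exclude {
			out = append(out, d)
		}
	}
	return out, nil
}

// ReviewQuestions builds the first question type for each due word; CHOOSE_EN
// and PRONOUNCE_CHOOSE follow the same pattern with the stem and option
// columns swapped as described above.
func ReviewQuestions(db *gorm.DB, userID int64) ([]Question, error) {
	// Select the user's due records from words_recite_record.
	var recs []ReciteRecord
	if err := db.Table("words_recite_record").
		Where("user_id = ? AND next_review_time < ?", userID, time.Now()).
		Find(&recs).Error; err != nil {
		return nil, err
	}
	questions := make([]Question, 0, len(recs))
	for _, r := range recs {
		// Look up word details; the key column name is an assumption.
		var w Word
		if err := db.Table("words").Where("word_id = ?", r.WordID).Take(&w).Error; err != nil {
			return nil, err
		}
		distractors, err := randomDistractors(db, userID, w.Explanations)
		if err != nil {
			return nil, err
		}
		questions = append(questions, Question{
			Type:    "CHOOSE_ZH", // hypothetical; only CHOOSE_EN and PRONOUNCE_CHOOSE are named above
			Stem:    w.WordName,
			Options: append([]string{w.Explanations}, distractors...),
		})
	}
	return questions, nil
}
```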
Copyright (c) 2025 Bytedance Ltd. and/or its affiliates. All rights reserved.
Licensed under the MIT license.