Public

WeChat Login

Code Issues Pull requests Events Packages Insights

be20_调整适配cnb与cs

Podcast-Generator/README_EN.md

hex2077<justlikemaki@qq.com>

feat: 添加日语支持并优化国际化功能

f64cd498

PreviewCode viewBlame

🎙️ Simple Podcast Generator

Easily transform your ideas into lively and engaging multi-person dialogue podcasts with one click! 中文版本

This is a powerful script tool that leverages the wisdom of OpenAI API to generate insightful podcast scripts, and through TTS (Text-to-Speech) API services, transforms cold text into warm audio. You just need to provide a topic, and leave the rest to it!

✨ The podcast script generation logic of this project is deeply inspired by the SurfSense project. We would like to express our heartfelt thanks to their open-source contributions!

✨ Core Highlights

🤖 AI-Driven Scripts: Automatically create high-quality, in-depth podcast dialogue scripts with the power of OpenAI models.
👥 Multi-Character Support: Freely define multiple podcast characters (such as host, guest), and assign unique TTS voices to each character.
🔌 Flexible TTS Integration: Seamlessly connect to your self-hosted or third-party TTS services through simple API URL configuration.
🔊 Intelligent Audio Synthesis: Automatically splice voice segments of each character accurately, and support volume and speed adjustment to synthesize a complete, smooth podcast audio file (.wav format).
⌨️ Convenient Command-Line Interface: Provides clear command-line parameters, giving you full control over every aspect of the podcast generation process.

🛠️ Installation Guide

📝 Prerequisites

Python 3.x
- Please ensure Python 3 is installed on your system.
FFmpeg
- This project depends on FFmpeg for audio merging. Please visit the FFmpeg official website to download and install.
- Important Note: After installation, please ensure the ffmpeg command has been added to your system's environment variables (PATH) so that the script can call it normally.

🐍 Python Dependencies

Open your terminal or command prompt and install the required Python libraries using pip:


pip install requests openai pydub msgpack

Dependency Explanation:

requests: Used to send HTTP requests to TTS service APIs

openai: Used to interact with OpenAI API to generate podcast scripts

pydub: Used for audio processing, such as adjusting volume and speed

msgpack: Used for efficient data serialization with certain TTS services (such as Fish Audio)

🚀 Quick Start

1. Prepare Input Files

Before running, please ensure the following files are ready:

input.txt: Enter your podcast topic or core idea in this file.
prompt/prompt-overview.txt: System prompt used to guide AI in generating the podcast overall outline.
prompt/prompt-podscript.txt: System prompt used to guide AI in generating detailed dialogue scripts. It contains dynamic placeholders (such as {{numSpeakers}}, {{turnPattern}}), which the script will automatically replace.

2. Configure TTS Services and Characters

TTS configuration files (such as edge-tts.json) are stored in the config/ directory. This file defines the TTS service API interface, podcast characters (podUsers) and their corresponding voices (voices).

3. Run the Script

Execute the following command in the project root directory:


python podcast_generator.py [optional parameters]

Optional Parameters

--api-key <YOUR_OPENAI_API_KEY>: Your OpenAI API key. If not provided, it will be read from the configuration file or OPENAI_API_KEY environment variable.
--base-url <YOUR_OPENAI_BASE_URL>: Proxy address of the OpenAI API. If not provided, it will be read from the configuration file or OPENAI_BASE_URL environment variable.
--model <OPENAI_MODEL_NAME>: Specify the OpenAI model to use (such as gpt-4o, gpt-4-turbo). Default value is gpt-3.5-turbo.
--threads <NUMBER_OF_THREADS>: Specify the number of parallel threads for audio generation (default is 1) to improve processing speed.
--output-language <LANGUAGE_CODE>: Specify the output language of the podcast script (default is Chinese).
--usetime <TIME_DURATION>: Specify the time length of the podcast script (default is 10 minutes).

Running Example


# Use gpt-4o model, edge-tts service and 4 threads to generate podcast
python podcast_generator.py --api-key sk-xxxxxx --model gpt-4o --tts-provider edge --threads 4

5. Using Web API (main.py)

This project also provides a FastAPI-based web service that allows you to generate podcasts through HTTP requests.

Start Web Service


python main.py

By default, the service will run on http://localhost:8000.

API Endpoints

Generate Podcast - POST /generate-podcast
- Parameters:
  - api_key: OpenAI API key
  - base_url: OpenAI API base URL (optional)
  - model: OpenAI model name (optional)
  - input_txt_content: Input text content
  - tts_providers_config_content: TTS provider configuration content
  - podUsers_json_content: Podcast user JSON configuration
  - threads: Number of threads (optional, default is 1)
  - tts_provider: TTS provider name (optional, default is "index-tts")
Get Podcast Generation Status - GET /podcast-status
- Requires X-Auth-Id header
Download Podcast - GET /download-podcast/
- Parameters:
  - file_name: Name of the file to download
Get Voice List - GET /get-voices
- Parameters:
  - tts_provider: TTS provider name (optional, default is "tts")

API Usage Example


# After starting the service, use curl to send a request to generate podcast
curl -X POST "http://localhost:8000/generate-podcast" \
  -H "X-Auth-Id: your-auth-id" \
  -F "api_key=sk-xxxxxx" \
  -F "model=gpt-4o" \
  -F "input_txt_content=The future development of artificial intelligence" \
  -F "tts_providers_config_content={\"index\": {\"api_key\": \"your-api-key\"}}" \
  -F "podUsers_json_content=[{\"code\":\"zh-CN-XiaoxiaoNeural\",\"role\":\"Host\"}],\"voices\":[{\"name\":\"Xiaoxiao\",\"code\":\"zh-CN-XiaoxiaoNeural\"}]" \
  -F "threads=4" \
  -F "tts_provider=index-tts"

4. Customizing AI Prompts (`custom` code block)

To provide more detailed AI instructions or add specific context, you can embed custom code blocks in the input.txt file. The content in this code block will be used as additional instructions, built into the core prompt for podcast script generation (prompt-podscript.txt), thereby influencing the AI's generation behavior.

Usage: In the input.txt file, define your custom content anywhere using the following format:


```custom-begin
Additional instructions or context you want to provide to the AI, for example:
- "Please ensure the discussion includes an in-depth analysis of [specific concept]."
- "Please add some humor to the conversation, especially jokes about [a certain topic]."
- "All characters' speeches must be brief, with each sentence not exceeding two lines."
```custom-end

🌐 Web Application (Next.js)

In addition to the command-line script and FastAPI service, this project also provides a fully functional web user interface. This interface aims to provide a more intuitive and convenient podcast generation and management experience, exposing complex backend functions through friendly frontend operations to users.

✨ Core Features

Web Operation Interface: Intuitive and friendly web interface that makes the podcast generation process clear at a glance.
Micro User System Integration: Supports user login, registration, points and billing functions, building a complete user ecosystem.
Podcast Creation and Configuration: Allows users to enter topics through forms and configure TTS characters, volume and speed parameters.
Real-time Progress Tracking: Displays the status and progress of podcast generation.
Podcast Playback and Management: Integrates an audio player for users to listen to generated podcasts and may provide functions for managing historical podcasts.
API Interaction: Seamless communication with the backend Python service through APIs, including podcast generation, status queries, and audio streaming.

🚀 Quick Start (Web)

Install Node.js: Please ensure Node.js is installed on your system (LTS version recommended).
Install Dependencies: Enter the web/ directory and install all frontend dependencies.
```
cd web/
npm install
# or yarn install
```
Start Development Server:
```
npm run dev
# or yarn dev
```
The web application will start at http://localhost:3000 (default).

Build Production Environment:


npm run build
# or yarn build
npm run start
# or yarn start

🐳 Docker Deployment

This project supports deployment via Docker. For detailed information, please refer to Docker Usage Guide.

🌍 Internationalization (i18n) Support

This project supports multilingual interfaces, currently supporting English (en), Chinese (zh-CN), and Japanese (ja).

📁 Language File Structure

Language files are located in the web/public/locales/ directory, grouped by language code:

web/public/locales/en/common.json - English translation
web/public/locales/zh-CN/common.json - Chinese translation
web/public/locales/ja/common.json - Japanese translation

🛠️ Adding New Languages

Create a new language folder in the web/public/locales/ directory, for example fr/
Copy the common.json file to the new folder
Translate all key-value pairs in the file
Update the languages variable in the web/src/i18n/settings.ts file

🌐 Language Switching

Users can automatically switch languages through the URL path or browser language settings:

http://localhost:3000/en/ - English interface
http://localhost:3000/zh-CN/ - Chinese interface
http://localhost:3000/ja/ - Japanese interface

⚙️ Configuration File Details

`config/[tts-provider].json` (TTS Character and Voice Configuration)

This is your core TTS configuration file, with the filename corresponding to the provider specified by the --tts-provider parameter. It tells the script how to work with the TTS service.


{
  "podUsers": [
    {
      "code": "zh-CN-XiaoxiaoNeural",
      "role": "Host"
    },
    {
      "code": "zh-CN-YunxiNeural",
      "role": "Tech Expert"
    }
  ],
  "voices": [
    {
      "name": "XiaoMin",
      "code": "yue-CN-XiaoMinNeural",
      "volume_adjustment": 1.0, 
      "speed_adjustment": 5.0
    }
  ],
  "apiUrl": "http://localhost:5000/api/tts?text={{text}}&voiceCode={{voiceCode}}",
  "turnPattern": "random",
  "tts_max_retries": 3
}

podUsers: Defines the characters in the podcast. Each character's code must correspond to a valid voice in the voices list.
voices: Defines all available TTS voices.
- volume_adjustment (optional): Volume adjustment (dB). For example, 6.0 increases volume by 6dB.
- speed_adjustment (optional): Speed adjustment (%). For example, 10.0 increases speed by 10%.
apiUrl: Your TTS service API endpoint. {{text}} and {{voiceCode}} are placeholders.
turnPattern: Defines the turn-taking mode for character dialogue, such as random (random) or sequential (sequential).
tts_max_retries (optional): Maximum number of retries when TTS API calls fail (default is 3).

`config/tts_providers.json` (TTS Provider Authentication)

This file is used to centrally manage authentication information (such as API keys) for all TTS service providers.


{
  "index": { "api_key": null },
  "edge": { "api_key": null },
  "doubao": { "X-Api-App-Id": "null", "X-Api-Access-Key": "null" },
  "fish": { "api_key": "null" },
  "minimax": { "group_id": "null", "api_key": "null" },
  "gemini": { "api_key": "null" }
}

Note: In actual use, please replace "null" with valid authentication information. You can create a tts_providers-local.json to store real keys, which has been ignored by .gitignore.

🔌 Supported TTS Services

This project is designed to be highly flexible and supports multiple TTS services.

Provider	Type	Support Status
Index-TTS	Local	✅ Supported
Edge-TTS	Local	✅ Supported
Doubao	Network	✅ Supported
Minimax	Network	✅ Supported
Fish Audio	Network	✅ Supported
Gemini	Network	✅ Supported
OpenAI TTS	Network	Planned
Azure TTS	Network	Planned

🎉 Output Results

All successfully generated podcast audio files will be automatically saved in the output/ directory. The filename format is podcast_ plus the timestamp when it was generated, for example podcast_1678886400.wav.

🎧 Sample Audio

You can find sample podcast audio generated using different TTS services in the example/ folder.

TTS Service	Listen Link
Edge TTS	▶️ edgeTTS.wav
Index TTS	▶️ indexTTS.wav
Doubao TTS	▶️ doubaoTTS.wav
Minimax	▶️ minimax.wav
Fish Audio	▶️ fish.wav
Gemini TTS	▶️ geminiTTS.wav

📂 File Structure


.
├── config/                  # ⚙️ Configuration directory
│   ├── doubao-tts.json      # ... (configuration for each TTS provider)
│   └── tts_providers.json   # Unified TTS authentication file
├── server/                  # 🐍 Backend service directory
│   ├── main.py              # FastAPI Web API entry: Provides RESTful APIs for podcast generation, status query, audio download, manages task lifecycle, and performs data cleanup.
│   ├── podcast_generator.py # Core podcast generation logic: Responsible for interacting with OpenAI API to generate podcast scripts, calling TTS adapters to convert text to speech, and using FFmpeg to merge audio files.
│   ├── tts_adapters.py      # TTS adapter: Encapsulates interaction logic with different TTS services (such as Index-TTS, Edge-TTS, Doubao, Minimax, Fish Audio, Gemini).
│   ├── openai_cli.py        # OpenAI command-line tool
│   └── ...                  # Other backend files
├── web/                     # 🌐 Frontend Web Application Directory (Next.js)
│   ├── public/              # Static resources
│   ├── src/                 # Source code
│   │   ├── app/             # Next.js route pages
│   │   ├── components/      # React components
│   │   ├── hooks/           # React Hooks
│   │   ├── lib/             # Library files (authentication, database, API, etc.)
│   │   └── types/           # TypeScript type definitions
│   ├── package.json         # Frontend dependencies
│   ├── next.config.js       # Next.js configuration
│   └── ...                  # Other frontend files
├── prompt/                  # 🧠 AI prompt directory
│   ├── prompt-overview.txt
│   └── prompt-podscript.txt
├── example/                 # 🎧 Sample audio directory
├── output/                  # 🎉 Output audio directory
├── input.txt                # 🎙️ Podcast topic input file
├── README.md                # 📄 Project documentation (Chinese)
└── README_EN.md             # 📄 Project documentation (English)

📝 Disclaimer

License: This project is licensed under GPL-3.0.
No Warranty: This software is provided "as is" without any express or implied warranties.
Liability Limitation: Under no circumstances shall the authors or copyright holders be liable for any damages arising from the use of this software.
Third-Party Services: Users bear the risks and responsibilities of using third-party services (such as OpenAI API, TTS services) on their own.
Usage Purpose: This project is for learning and research purposes only. Please comply with all applicable laws and regulations.
Final Interpretation Rights: We reserve the right to modify this disclaimer at any time.