Banana Slides

简体中文 • English

A native AI PPT generation application based on nano banana pro 🍌
From idea to presentation in minutes—no tedious formatting, voice-driven modifications, moving towards true "Vibe PPT"

🚀 Online Demo | 📖 Documentation | Deployment

If this project is helpful to you, feel free to Star 🌟 & Fork 🍴

✨ Project Origin

Have you ever found yourself in this dilemma: a presentation is due tomorrow, but your slides are still blank; your mind is full of brilliant ideas, yet your enthusiasm is drained by tedious layout and design?

We long to quickly create presentations that are both professional and aesthetically pleasing. While traditional AI PPT generation apps generally meet the need for "speed," they still face the following issues:

1️⃣ Limited to preset templates with no flexibility to adjust styles
2️⃣ Low creative freedom, making multi-round revisions difficult
3️⃣ Similar visual output, leading to severe homogenization
4️⃣ Low-quality assets that lack relevance
5️⃣ Disjointed text-image layouts with poor design aesthetics

These shortcomings make it difficult for traditional AI PPT generators to satisfy our dual needs for "speed" and "beauty." Even those claiming to be "Vibe PPT" are, in my eyes, far from having enough "Vibe."

However, the emergence of the nano banana🍌 model has changed everything. I tried using 🍌pro to generate PPT pages and found that the results were exceptional in terms of quality, aesthetics, and consistency. It can accurately render almost all text requested in the prompts while following the style of reference images. So why not build a native "Vibe PPT" application based on 🍌pro?

👨‍💻 Use Cases

Beginners: Quickly generate beautiful PPTs with zero barrier to entry, no design experience required, and no more hassle of choosing templates.
PPT Professionals: Use AI-generated layouts and combinations of text and visual elements as references to quickly gain design inspiration.
Educators: Rapidly transform teaching content into illustrated lesson plan PPTs to enhance classroom effectiveness.
Students: Complete assignment presentations quickly, focusing energy on content rather than formatting and aesthetics.
Business Professionals: Quickly visualize business proposals and product introductions with rapid adaptation to multiple scenarios.

🎯 Goal: Lower the barrier to PPT creation, enabling everyone to quickly create beautiful and professional presentations.

🎨 Result Examples



Software Development Best Practices	DeepSeek-V3.2 Technical Showcase

R&D and Industrialization of Intelligent Production Equipment for Prepared Meals	The Evolution of Money: From Shells to Banknotes

See more Use Cases

🎯 Features

1. Flexible and Diverse Creative Paths

Supports three starting methods—Ideas, Outlines, and Page Descriptions—to suit various creative workflows.

One-Sentence Generation: Simply input a topic, and the AI will automatically generate a clearly structured outline and page-by-page content descriptions.
Natural Language Editing: Supports modifying outlines or descriptions via natural language (e.g., "Change the third page to a case study"), with the AI responding and adjusting in real-time.
Outline/Description Mode: Supports both one-click batch generation and manual fine-tuning of details.

2. Powerful Asset Parsing Capabilities

Multi-format Support: Upload PDF/Docx/MD/Txt and other files, and the system automatically parses the content in the background.
Intelligent Extraction: Automatically identifies key points, image links, and chart information within the text, providing rich materials for generation.
Style Reference: Supports uploading reference images or templates to customize the PPT style.

3. "Vibe"-style Natural Language Modification

No longer restricted by complex menus and buttons: issue modification commands directly in natural language.

Local Inpainting: Perform verbal modifications on unsatisfactory areas (e.g., "Change this chart to a pie chart").
Full-page Optimization: Generate high-definition, stylistically consistent pages based on nano banana pro🍌.

4. Out-of-the-box Format Export

Multi-format Support: One-click export to standard PPTX or PDF files.
Perfect Fit: Default 16:9 aspect ratio, no manual layout adjustments needed, ready for direct presentation.

5. Fully Editable PPTX Export (Beta Iteration)

Export images as high-fidelity, clean-background PPT pages with freely editable images and text
For related updates, see https://github.com/Anionex/banana-slides/issues/121

6. One-click Export for Explainer Videos

One-click conversion of slides into presentation videos (MP4) with AI voiceovers and subtitles
AI automatically generates natural, spoken voiceovers based on page descriptions and content
Supports configuration of various expression styles, multiple languages, and a wide range of voices

🌟 Feature Comparison with NotebookLM Slide Deck

| Feature | NotebookLM | This Project |
| --- | --- | --- |
| Page Limit | 15 pages | Unlimited |
| Secondary Editing | Modify via prompts | Selection editing + Verbal editing |
| Adding Assets | Cannot add after generation | Add freely after generation |
| Export Formats | Supports PDF, (non-editable image) PPTX | Export as PDF, (image or editable) PPTX, presentation video |
| Watermark | Watermarks in free version | No watermarks, freely add/remove elements |

Note: This comparison may become outdated as new features are added.

🔥 Recent Updates

[2026-04-25]: Asset Toolbox launched — Added full-image editing, marquee editing (overlay/replace), and smart erasure modes on top of existing asset generation, providing a unified entry point for one-stop operation.
[2026-04-25]: Support for account binding via OpenAI official OAuth login. Once bound, Codex can be used directly as a text/image generation provider without manually entering an API Key (based on OpenAI official OAuth PKCE authorization flow, non-reverse engineered).
[2026-04-25]: Support for saving custom text style description templates. Can be named, color-coded, and persistently reused, eliminating the need to re-enter every time.
[2026-04-23]: Added support for the gpt-image-2 model. The editable background effect during export has also been improved due to model capability upgrades (select "Generative Acquisition" in Settings -> Export Options -> Background Acquisition).
[2026-04-11]: Support for CLI operations and addition of agent skills.
[2026-03]: Added several features and optimizations, such as additional fields, multi-aspect ratio settings, etc.
[2026-02-09]: New features and optimizations
- New Features
  - Support for pasting images in the home page, outline, and description cards for immediate recognition, providing a better interactive experience.
  - Manual outline chapter editing: Support for manually adjusting the chapter (part) a page belongs to.
  - Docker multi-architecture: Image support for amd64 / arm64 builds.
  - i18n + Dark Mode: Added Chinese/English switching; support for Light/Dark/System theme; dark mode adaptation for all components.
- Fixes and Experience Optimizations
  - Fixed export-related 500 errors, reference file association timing, outline/page data misalignment, task polling errors, infinite polling in description generation, image preview memory leaks, and partial failure handling in bulk deletion.
  - Optimized format example prompts, HTTP error message copy, Modal closing experience, cleaned up old project localStorage, and removed redundant prompts for first-time project creation.
  - Several other optimizations and fixes.
[2026-01-04]: v0.4.0 Release: Comprehensive upgrade for editable PPTX export:
- Support for maximizing the restoration of font size, color, bold, and other styles of text within images;
- Added recognition of text content within tables;
- More precise text size and position restoration logic;
- Optimized export workflow, significantly reducing the occurrence of residual text on background images after export;
- Support for page multi-selection logic, allowing flexible selection of specific pages for generation and export.
- For detailed effects and usage, see https://github.com/Anionex/banana-slides/issues/121

🗺️ Roadmap

| Status | Milestones |
| --- | --- |
| ✅ Completed | Create PPT via three paths: idea, outline, and page description |
| ✅ Completed | Parse Markdown-formatted images in text |
| ✅ Completed | Add more assets to a single PPT slide |
| ✅ Completed | Vibe verbal editing for selected areas on a single PPT slide |
| ✅ Completed | Asset module: Asset generation, uploading, etc. |
| ✅ Completed | Support for uploading and parsing multiple file types |
| ✅ Completed | Support Vibe verbal adjustment of outlines and descriptions |
| ✅ Completed | Initial support for exporting editable .pptx files |
| 🔄 In Progress | Support for multi-layer, precise background removal in editable .pptx exports |
| 🔄 In Progress | Web search |
| 🔄 In Progress | Agent mode |
| ✅ Completed | TTS narration video export (CN/EN/JP multi-voice, subtitles, Ken Burns effects) |
| 🚍 Partial | Optimize front-end loading speed |
| 🧭 Planned | Online playback functionality |
| 🧭 Planned | Simple animations and slide transitions |
| 🚍 Partial | Multi-language support |

📦 Usage

(New) One-click deployment using application templates

This is the simplest method, requiring no Docker installation or project downloading. You can access the application immediately after creation.

Deploy and launch this application with one click via Rainyun (High bandwidth, ideal for high-definition image generation and downloading. New users get a 15-day free trial.)

Step-by-step Tutorial

Coming soon

Using Docker Compose 🐳

Quickly start front-end and back-end services via Docker Compose.

📒 Windows/Mac User Instructions

If you are using Windows or macOS, please install Docker Desktop first and ensure Docker is running (Windows users can check the system tray icon; macOS users can check the menu bar icon). Then follow the same steps as described in the documentation.

Tip: If you encounter issues, Windows users should enable the WSL 2 backend in Docker Desktop settings (recommended). Also, ensure ports 3000 and 5000 are not occupied.
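The port requirement in the tip above can be checked before starting the containers. The following is a minimal sketch, not part of the project: `port_is_free` is a hypothetical helper that simply tries to connect to the port on the local machine and treats a refused connection as "free".

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    # 3000 = frontend, 5000 = backend API (the ports named in this README).
    for port in (3000, 5000):
        state = "free" if port_is_free(port) else "IN USE"
        print(f"port {port}: {state}")
```

If a port is reported as in use, stop the process holding it before running the Docker Compose commands.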

Clone the repository

git clone https://github.com/Anionex/banana-slides
cd banana-slides

Configure environment variables

Create the .env file (refer to .env.example):

cp .env.example .env

(Optional, can also be configured in the UI after startup; click here for the tutorial) Edit the .env file to configure the necessary environment variables:

The LLM API in this project follows the AIHubMix platform format. It is recommended to use AIHubMix (click here to visit) to obtain API keys and reduce migration costs.
Note: The Google Nano Banana Pro model API has higher costs; please be mindful of usage expenses.


```env
# AI Provider Configuration Format (gemini / openai / vertex)
AI_PROVIDER_FORMAT=gemini

# Gemini Format Configuration (Used when AI_PROVIDER_FORMAT=gemini)
GOOGLE_API_KEY=your-api-key-here
GOOGLE_API_BASE=https://generativelanguage.googleapis.com
# Proxy Example: https://aihubmix.com/gemini

# OpenAI Format Configuration (Used when AI_PROVIDER_FORMAT=openai)
OPENAI_API_KEY=your-api-key-here
OPENAI_API_BASE=https://api.openai.com/v1
# Proxy Example: https://aihubmix.com/v1

# Vertex AI Configuration (AI_PROVIDER_FORMAT=vertex)
# GCP Project and Service Account Key Required
# VERTEX_PROJECT_ID=your-gcp-project-id
# VERTEX_LOCATION=global
# GOOGLE_APPLICATION_CREDENTIALS=./gcp-service-account.json

# Lazyllm Format Configuration (used when AI_PROVIDER_FORMAT=lazyllm)
# Select vendors for text and image generation
TEXT_MODEL_SOURCE=deepseek        # Text generation model provider
IMAGE_MODEL_SOURCE=doubao         # Image editing model provider
IMAGE_CAPTION_MODEL_SOURCE=qwen   # Image captioning model provider

# API Keys for Each Provider (Only configure the ones you want to use)
DOUBAO_API_KEY=your-doubao-api-key            # Volcengine/Doubao
DEEPSEEK_API_KEY=your-deepseek-api-key        # DeepSeek
QWEN_API_KEY=your-qwen-api-key                # Alibaba Cloud/Qwen
GLM_API_KEY=your-glm-api-key                  # Zhipu GLM
SILICONFLOW_API_KEY=your-siliconflow-api-key  # SiliconFlow
SENSENOVA_API_KEY=your-sensenova-api-key      # SenseTime SenseNova
MINIMAX_API_KEY=your-minimax-api-key          # MiniMax
...
```

For better editable-export results, use the new configuration method: obtain an API key from the Baidu Intelligent Cloud Platform (click here to enter) and set it as BAIDU_API_KEY in the .env file (the free usage quota is sufficient). See https://github.com/Anionex/banana-slides/issues/121 for details.
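Before starting the services, it can help to confirm that the .env file contains the keys that match the chosen AI_PROVIDER_FORMAT. The sketch below is illustrative only: `parse_env`, `missing_keys`, and the `REQUIRED_KEYS` mapping are hypothetical names, and the mapping is an assumption inferred from the template above, not taken from the project's code.

```python
from typing import Dict, List

# Assumption: required keys per provider, inferred from the .env template.
REQUIRED_KEYS = {
    "gemini": ["GOOGLE_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "vertex": ["VERTEX_PROJECT_ID", "VERTEX_LOCATION", "GOOGLE_APPLICATION_CREDENTIALS"],
}
PLACEHOLDER = "your-api-key-here"


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop trailing inline comments such as "deepseek   # Text model".
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        values[key.strip()] = value
    return values


def missing_keys(env: Dict[str, str]) -> List[str]:
    """List keys that are absent, empty, or still set to the placeholder."""
    provider = env.get("AI_PROVIDER_FORMAT", "")
    if not provider:
        return ["AI_PROVIDER_FORMAT"]
    required = REQUIRED_KEYS.get(provider, [])
    return [k for k in required if not env.get(k) or env[k] == PLACEHOLDER]
```

A typical use would be `missing_keys(parse_env(open(".env").read()))`. An empty list means the keys assumed for the selected provider are filled in. Providers not listed in the mapping (for example lazyllm) are not checked by this sketch.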

📒 Vertex AI Configuration Guide (for GCP Users)

Google Cloud Vertex AI allows calling Gemini models through a GCP service account, and new users can use trial credits. Configuration steps:

Go to the GCP Console, create a service account, and download the JSON format key file.
Save the key file as gcp-service-account.json in the project root directory.

Set the following in .env:

AI_PROVIDER_FORMAT=vertex
VERTEX_PROJECT_ID=your-gcp-project-id
VERTEX_LOCATION=global

If deploying with Docker, you also need to uncomment the relevant sections in docker-compose.yml, mount the key file into the container, and set the GOOGLE_APPLICATION_CREDENTIALS environment variable.

The gemini-3-* series models require VERTEX_LOCATION=global.
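When filling in VERTEX_PROJECT_ID, the value can be read from the downloaded key file itself, since GCP service account keys carry `type` and `project_id` fields. This is a small sketch under that assumption. `project_id_from_key` is a hypothetical helper and is not part of the project.

```python
import json
from pathlib import Path


def project_id_from_key(key_json: str) -> str:
    """Return the project_id from the text of a GCP service account key file."""
    data = json.loads(key_json)
    if data.get("type") != "service_account":
        raise ValueError(f"not a service account key (type={data.get('type')!r})")
    project_id = data.get("project_id")
    if not project_id:
        raise ValueError("key file has no project_id field")
    return project_id


if __name__ == "__main__":
    key_path = Path("gcp-service-account.json")  # file name used in this README
    if key_path.exists():
        print("VERTEX_PROJECT_ID=" + project_id_from_key(key_path.read_text(encoding="utf-8")))
    else:
        print(f"{key_path} not found in the current directory")
```

Run it from the project root so that the relative path matches the location described in the steps above.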

Start the Service

⚡ Use Pre-built Images (Recommended)

The project provides pre-built frontend and backend images on Docker Hub (synced with the latest version of the main branch), allowing you to skip the local build steps for rapid deployment:


# Launch with Pre-built Images (No need to build from scratch)

docker compose -f docker-compose.prod.yml up -d

Image names:

anoinex/banana-slides-frontend:latest
anoinex/banana-slides-backend:latest

Build images from scratch

docker compose up -d

TIP

If you encounter network issues, you can uncomment the mirror source configurations in the .env file and then rerun the startup command:

# Uncomment the following in the .env file to use domestic mirror sources
DOCKER_REGISTRY=docker.1ms.run/
GHCR_REGISTRY=ghcr.nju.edu.cn/
APT_MIRROR=mirrors.aliyun.com
PYPI_INDEX_URL=https://mirrors.cloud.tencent.com/pypi/simple
NPM_REGISTRY=https://registry.npmmirror.com/

Access the Application

Frontend: http://localhost:3000
Backend API: http://localhost:5000

View Logs

View Backend Logs (Last 200 Lines)

docker logs --tail 200 banana-slides-backend

View Backend Logs in Real-time (Last 100 Lines)

docker logs -f --tail 100 banana-slides-backend

View Frontend Logs (Last 100 Lines)

docker logs --tail 100 banana-slides-frontend

Stop Services

docker compose down

Update Project

Using Pre-built Images (docker-compose.prod.yml)

docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d

Using Local Build (docker-compose.yml)

Note: This method does not apply if you have manually modified the code. First revert the code to the state it was in when you pulled it.

git pull 
docker compose down
docker compose build --no-cache
docker compose up -d

Note: Thanks to our fellow developer @ShellMonster for providing the Newbie Deployment Tutorial. It is specially designed for beginners without any server deployment experience. You can click the link to view.

Deploy from Source

Environment Requirements

Python 3.10 or higher
uv - Python package manager
Node.js 16+ and npm
FFmpeg - Required for exporting explanation videos; must include support for libass / ass subtitle filters.
Valid Google Gemini API key
(Optional) LibreOffice - Required when uploading PPTX files using the "PPT Remake" feature to convert PPTX to PDF. It is recommended to convert PPTX to PDF locally before uploading. Reason: Server-side rendering with LibreOffice may cause layout displacement due to missing fonts (such as Microsoft YaHei, Calibri, etc.) and cannot fully restore certain special effects. LibreOffice is not required if you upload PDF files directly. For Docker users who still need PPTX upload support within the container, run:
```
docker exec -it banana-slides-backend bash -c "apt-get update && apt-get install -y libreoffice-impress && rm -rf /var/lib/apt/lists/*"
```
Note: LibreOffice installed via this method will be lost when the container is rebuilt and will need to be reinstalled.
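The FFmpeg requirement listed above (a build with libass / ass subtitle filter support) can be verified by inspecting the output of `ffmpeg -filters`. The following is a rough sketch: `filter_names` and `has_ass_filter` are hypothetical helpers, and the sketch assumes that the presence of the `ass` filter is a good enough signal. Which filter the project actually invokes is not stated here.

```python
import shutil
import subprocess
from typing import Set


def filter_names(filters_output: str) -> Set[str]:
    """Extract filter names from the text printed by `ffmpeg -filters`."""
    names: Set[str] = set()
    for line in filters_output.splitlines():
        parts = line.split()
        # Filter rows look like: " ... ass   V->V   Render ASS subtitles ..."
        if len(parts) >= 3 and "->" in parts[2]:
            names.add(parts[1])
    return names


def has_ass_filter(filters_output: str) -> bool:
    """Return True if the listing contains the libass-backed `ass` filter."""
    return "ass" in filter_names(filters_output)


if __name__ == "__main__":
    if shutil.which("ffmpeg") is None:
        print("ffmpeg not found on PATH")
    else:
        result = subprocess.run(
            ["ffmpeg", "-hide_banner", "-filters"],
            capture_output=True, text=True, check=False,
        )
        print("ass filter available:", has_ass_filter(result.stdout))
```

If the filter is missing, install an FFmpeg build that includes libass, as described in the dependency installation commands in the backend installation steps.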

Backend Installation

Clone the repository

git clone https://github.com/Anionex/banana-slides
cd banana-slides

Install uv (if not already installed)

curl -LsSf https://astral.sh/uv/install.sh | sh

Install dependencies

Run the following command in the project root directory:


```bash
# macOS (Homebrew)
brew install ffmpeg-full
brew unlink ffmpeg 2>/dev/null || true
brew link --overwrite --force ffmpeg-full

# Ubuntu / Debian
sudo apt-get update
sudo apt-get install -y ffmpeg libass9

# Then install Python dependencies
uv sync
```

This will automatically install all dependencies based on pyproject.toml.

Configure environment variables

Copy the environment variable template:

cp .env.example .env

Then, following the method described above, open and edit the `.env` file to configure your API key.

Frontend Installation

Enter the frontend directory

cd frontend

Install dependencies

npm install

Configure the API address

The frontend will automatically connect to the backend service at http://localhost:5000. To modify this, please edit src/api/client.ts.

Start the Backend Service

(Optional) If you have important local data, it is recommended to back up the database before upgrading:

cp backend/instance/database.db backend/instance/database.db.bak

Note: In the default configuration, templates, assets, and final products are stored in the uploads/ folder.

cd backend
uv run alembic upgrade head && uv run python app.py

The backend service will start at http://localhost:5000.

Visit http://localhost:5000/health to verify that the service is running correctly.
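The same health check can be scripted, for example to wait for the backend before starting the frontend. This is a minimal sketch that assumes only what is stated above, namely that /health answers when the service is up. It treats HTTP 200 as healthy, which is an assumption about the endpoint, and `backend_is_healthy` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def backend_is_healthy(url: str = "http://localhost:5000/health", timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    # Bypass any system proxy settings: the backend is a local service.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False


if __name__ == "__main__":
    print("backend healthy:", backend_is_healthy())
```

A result of False usually means the backend has not finished starting or exited with an error. The terminal output of the backend process is the first place to look.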

Start Frontend Development Server

cd frontend
npm run dev

The frontend development server will start at http://localhost:3000.

Open your browser to access and use the application.

🛠️ Technical Architecture

Front-end Technology Stack

React 18 + TypeScript + Vite 5 + Zustand

Backend Tech Stack

Python 3.10+ + Flask 3.0 + uv + SQLite

Communication Group

To facilitate communication and mutual assistance, this WeChat group has been created.

Feel free to suggest new features or provide feedback. I will also answer your questions ~~at my own pace~~.

Follow the author on social media, where I share information about this project and AI:

🔧 FAQ

See the official documentation

🤝 Contributing Guide

Contributions to this project via Issues and Pull Requests are welcome!

Important: Please read CONTRIBUTING.md before contributing.

📄 License

This project is open-sourced under the GNU Affero General Public License v3.0 (AGPL-3.0). It can be freely used for non-commercial purposes such as personal learning, research, experimentation, education, or non-profit scientific research activities.

🚀 Sponsor

Thanks to AIHubMix for sponsoring this project

Thanks to AI Huo Bao for sponsoring this project

Acknowledgements

Project Contributors:

Linux.do: A new ideal community

Sponsor

Open source is not easy 🙏 If you find this project valuable, feel free to buy the developer a coffee ☕️

Thanks to the following friends for their generous sponsorship and support:

@雅俗共赏, @曹峥, @以年观日, @John, @胡yun星Ethan, @azazo1, @刘聪NLP, @🍟, @苍何, @万瑾, @biubiu, @law, @方源, @寒松Falcon

If you have any questions regarding the sponsorship list, please contact the author.