
Banana Slides


简体中文  •  English


A native AI PPT generation application based on nano banana pro 🍌
From idea to presentation in minutes: no tedious formatting, just verbal modifications, moving toward a true "Vibe PPT"

🚀 Online Demo  |  📖 Documentation  |  Deployment

If this project is helpful to you, feel free to Star 🌟 & Fork 🍴

✨ Project Origin

Have you ever found yourself in this dilemma: a presentation is due tomorrow, but your slides are still blank; your mind is full of brilliant ideas, yet your enthusiasm is drained by tedious layout and design?

We long to quickly create presentations that are both professional and aesthetically pleasing. While traditional AI PPT generation apps generally meet the need for "speed," they still face the following issues:

  • 1️⃣ Limited to preset templates with no flexibility to adjust styles
  • 2️⃣ Low creative freedom, making multi-round revisions difficult
  • 3️⃣ Similar visual output, leading to severe homogenization
  • 4️⃣ Low-quality assets that lack relevance
  • 5️⃣ Disjointed text-image layouts with poor design aesthetics

These shortcomings make it difficult for traditional AI PPT generators to satisfy our dual needs for "speed" and "beauty." Even those claiming to be "Vibe PPT" are, in my eyes, far from having enough "Vibe."

However, the emergence of the nano banana🍌 model has changed everything. I tried using 🍌pro to generate PPT pages and found that the results were exceptional in terms of quality, aesthetics, and consistency. It can accurately render almost all text requested in the prompts while following the style of reference images. So why not build a native "Vibe PPT" application based on 🍌pro?

👨‍💻 Use Cases

  1. Beginners: Quickly generate beautiful PPTs with zero barrier to entry, no design experience required, and no more hassle of choosing templates.
  2. PPT Professionals: Use AI-generated layouts and combinations of text and visual elements as references to quickly gain design inspiration.
  3. Educators: Rapidly transform teaching content into illustrated lesson plan PPTs to enhance classroom effectiveness.
  4. Students: Complete assignment presentations quickly, focusing energy on content rather than formatting and aesthetics.
  5. Business Professionals: Quickly visualize business proposals and product introductions with rapid adaptation to multiple scenarios.

🎯 Goal: Lower the barrier to PPT creation, enabling everyone to quickly create beautiful and professional presentations.

🎨 Result Examples

  • Case 3: Software Development Best Practices
  • Case 2: DeepSeek-V3.2 Technical Showcase
  • Case 4: R&D and Industrialization of Intelligent Production Equipment for Prepared Meals
  • Case 1: The Evolution of Money: From Shells to Banknotes

See more Use Cases

🎯 Features

1. Flexible and Diverse Creative Paths

Supports three starting methods—Ideas, Outlines, and Page Descriptions—to suit various creative workflows.

  • One-Sentence Generation: Simply input a topic, and the AI will automatically generate a clearly structured outline and page-by-page content descriptions.
  • Natural Language Editing: Supports modifying outlines or descriptions via natural language (e.g., "Change the third page to a case study"), with the AI responding and adjusting in real-time.
  • Outline/Description Mode: Supports both one-click batch generation and manual fine-tuning of details.

2. Powerful Asset Parsing Capabilities

  • Multi-format Support: Upload PDF/Docx/MD/Txt and other files, and the system automatically parses the content in the background.
  • Intelligent Extraction: Automatically identifies key points, image links, and chart information within the text, providing rich materials for generation.
  • Style Reference: Supports uploading reference images or templates to customize the PPT style.
File Parsing and Material Processing

3. "Vibe"-style Natural Language Modification

No longer restricted by complex menus and buttons: issue modification commands directly in natural language.

  • Local Inpainting: Perform verbal modifications on unsatisfactory areas (e.g., "Change this chart to a pie chart").
  • Full-page Optimization: Generate high-definition, stylistically consistent pages based on nano banana pro🍌.

4. Out-of-the-box Format Export

  • Multi-format Support: One-click export to standard PPTX or PDF files.
  • Perfect Fit: Default 16:9 aspect ratio, no manual layout adjustments needed, ready for direct presentation.
PPT and PDF Export

5. Fully Editable PPTX Export (Beta Iteration)

6. One-click Export for Explainer Videos

  • One-click conversion of slides into presentation videos (MP4) with AI voiceovers and subtitles
  • AI automatically generates natural, spoken voiceovers based on page descriptions and content
  • Supports configuration of various expression styles, multiple languages, and a wide range of voices

🌟 Feature Comparison with NotebookLM Slide Deck

| Feature | NotebookLM | This Project |
| --- | --- | --- |
| Page Limit | 15 pages | Unlimited |
| Secondary Editing | Modify via prompts | Selection editing + Verbal editing |
| Adding Assets | Cannot add after generation | Add freely after generation |
| Export Formats | Supports PDF, (non-editable image) PPTX | Export as PDF, (image or editable) PPTX, presentation video |
| Watermark | Watermarks in free version | No watermarks, freely add/remove elements |

Note: This comparison may become outdated as new features are added.

🔥 Recent Updates

  • [2026-04-25]: Asset Toolbox launched — Added full-image editing, marquee editing (overlay/replace), and smart erasure modes on top of existing asset generation, providing a unified entry point for one-stop operation.
  • [2026-04-25]: Support for account binding via OpenAI official OAuth login. Once bound, Codex can be used directly as a text/image generation provider without manually entering an API Key (based on OpenAI official OAuth PKCE authorization flow, non-reverse engineered).
  • [2026-04-25]: Support for saving custom text style description templates. Can be named, color-coded, and persistently reused, eliminating the need to re-enter every time.
  • [2026-04-23]: Added support for the gpt-image-2 model. The editable background effect during export has also been improved due to model capability upgrades (select "Generative Acquisition" in Settings -> Export Options -> Background Acquisition).
  • [2026-04-11]: Support for CLI operations and addition of agent skills.
  • [2026-03]: Added several features and optimizations, such as additional fields, multi-aspect ratio settings, etc.
  • [2026-02-09]: New features and optimizations
    • New Features
      • Support for pasting images in the home page, outline, and description cards for immediate recognition, providing a better interactive experience.
      • Manual outline chapter editing: Support for manually adjusting the chapter (part) a page belongs to.
      • Docker multi-architecture: Image support for amd64 / arm64 builds.
      • i18n + Dark Mode: Added Chinese/English switching; support for Light/Dark/System theme; dark mode adaptation for all components.
    • Fixes and Experience Optimizations
      • Fixed export-related 500 errors, reference file association timing, outline/page data misalignment, task polling errors, infinite polling in description generation, image preview memory leaks, and partial failure handling in bulk deletion.
      • Optimized format example prompts, HTTP error message copy, Modal closing experience, cleaned up old project localStorage, and removed redundant prompts for first-time project creation.
      • Several other optimizations and fixes.
  • [2026-01-04]: v0.4.0 Release: Comprehensive upgrade for editable PPTX export:
    • Support for maximizing the restoration of font size, color, bold, and other styles of text within images;
    • Added recognition of text content within tables;
    • More precise text size and position restoration logic;
    • Optimized export workflow, significantly reducing the occurrence of residual text on background images after export;
    • Support for page multi-selection logic, allowing flexible selection of specific pages for generation and export.
    • For detailed effects and usage, see https://github.com/Anionex/banana-slides/issues/121

🗺️ Roadmap

| Status | Milestones |
| --- | --- |
| ✅ Completed | Create PPT via three paths: idea, outline, and page description |
| ✅ Completed | Parse Markdown-formatted images in text |
| ✅ Completed | Add more assets to a single PPT slide |
| ✅ Completed | Vibe verbal editing for selected areas on a single PPT slide |
| ✅ Completed | Asset module: Asset generation, uploading, etc. |
| ✅ Completed | Support for uploading and parsing multiple file types |
| ✅ Completed | Support Vibe verbal adjustment of outlines and descriptions |
| ✅ Completed | Initial support for exporting editable .pptx files |
| 🔄 In Progress | Support for multi-layer, precise background removal in editable .pptx exports |
| 🔄 In Progress | Web search |
| 🔄 In Progress | Agent mode |
| ✅ Completed | TTS narration video export (CN/EN/JP multi-voice, subtitles, Ken Burns effects) |
| 🚍 Partial | Optimize front-end loading speed |
| 🧭 Planned | Online playback functionality |
| 🧭 Planned | Simple animations and slide transitions |
| 🚍 Partial | Multi-language support |

📦 Usage

(New) One-click deployment using application templates

This is the simplest method, requiring no Docker installation or project downloading. You can access the application immediately after creation.

  1. Deploy and launch this application with one click via Rainyun (High bandwidth, ideal for high-definition image generation and downloading. New users get a 15-day free trial.)

Deploy on Rainyun

  2. Coming soon

Using Docker Compose 🐳

Quickly start front-end and back-end services via Docker Compose.

📒 Windows/Mac User Instructions

If you are using Windows or macOS, please install Docker Desktop first and ensure Docker is running (Windows users can check the system tray icon; macOS users can check the menu bar icon). Then follow the same steps as described in the documentation.

Tip: If you encounter issues, Windows users should enable the WSL 2 backend in Docker Desktop settings (recommended). Also, ensure ports 3000 and 5000 are not occupied.

  1. Clone the repository
git clone https://github.com/Anionex/banana-slides
cd banana-slides
  2. Configure environment variables

Create the .env file (refer to .env.example):

cp .env.example .env

(Optional, can also be configured in the UI after startup; click here for the tutorial) Edit the .env file to configure the necessary environment variables:


The LLM API in this project follows the AIHubMix platform format. It is recommended to use AIHubMix (click here to visit) to obtain API keys and reduce migration costs.
Note: The Google Nano Banana Pro model API has higher costs; please be mindful of usage expenses.


```env
# AI provider format (gemini / openai / vertex / lazyllm)
AI_PROVIDER_FORMAT=gemini

# Gemini format configuration (used when AI_PROVIDER_FORMAT=gemini)
GOOGLE_API_KEY=your-api-key-here
GOOGLE_API_BASE=https://generativelanguage.googleapis.com
# Proxy example: https://aihubmix.com/gemini

# OpenAI format configuration (used when AI_PROVIDER_FORMAT=openai)
OPENAI_API_KEY=your-api-key-here
OPENAI_API_BASE=https://api.openai.com/v1
# Proxy example: https://aihubmix.com/v1

# Vertex AI configuration (used when AI_PROVIDER_FORMAT=vertex)
# A GCP project and a service account key are required
# VERTEX_PROJECT_ID=your-gcp-project-id
# VERTEX_LOCATION=global
# GOOGLE_APPLICATION_CREDENTIALS=./gcp-service-account.json

# Lazyllm format configuration (used when AI_PROVIDER_FORMAT=lazyllm)
# Select vendors for text and image generation
TEXT_MODEL_SOURCE=deepseek        # Text generation model provider
IMAGE_MODEL_SOURCE=doubao         # Image editing model provider
IMAGE_CAPTION_MODEL_SOURCE=qwen   # Image captioning model provider

# API keys for each provider (only configure the ones you want to use)
DOUBAO_API_KEY=your-doubao-api-key            # Volcengine/Doubao
DEEPSEEK_API_KEY=your-deepseek-api-key        # DeepSeek
QWEN_API_KEY=your-qwen-api-key                # Alibaba Cloud/Qwen
GLM_API_KEY=your-glm-api-key                  # Zhipu GLM
SILICONFLOW_API_KEY=your-siliconflow-api-key  # SiliconFlow
SENSENOVA_API_KEY=your-sensenova-api-key      # SenseTime SenseNova
MINIMAX_API_KEY=your-minimax-api-key          # MiniMax
...
```

For better editable-export results, use the new editable export configuration method: obtain an API key from the Baidu Intelligent Cloud Platform and fill it in the BAIDU_API_KEY field in the .env file (the free usage quota is sufficient). See https://github.com/Anionex/banana-slides/issues/121 for details.

📒 Vertex AI Configuration Guide (for GCP Users)

Google Cloud Vertex AI allows calling Gemini models through a GCP service account, and new users can use trial credits. Configuration steps:

  1. Go to the GCP Console, create a service account, and download the JSON format key file.
  2. Save the key file as gcp-service-account.json in the project root directory.
  3. Set the following in .env:
    AI_PROVIDER_FORMAT=vertex
    VERTEX_PROJECT_ID=your-gcp-project-id
    VERTEX_LOCATION=global
    
  4. If deploying with Docker, you also need to uncomment the relevant sections in docker-compose.yml, mount the key file into the container, and set the GOOGLE_APPLICATION_CREDENTIALS environment variable.

The gemini-3-* series models require VERTEX_LOCATION=global.
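
Before starting the services, it can help to confirm that the keys for the selected provider format are actually filled in. The following is a minimal sketch, not part of the project: the function names (`env_value`, `required_keys`, `check_env`) are hypothetical, and the choice of which keys count as required per format is an assumption drawn from the template above. Values that still start with `your-` are treated as unfilled placeholders.

```bash
# Hypothetical helpers, not shipped with the project.

# Print the value assigned to KEY ($2) in FILE ($1); the last uncommented
# assignment wins, and trailing inline comments are stripped.
env_value() {
  sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" |
    sed -e 's/[[:space:]][[:space:]]*#.*$//' -e 's/[[:space:]]*$//' |
    tail -n 1
}

# Map a provider format to the keys this sketch treats as required (assumption).
required_keys() {
  case "$1" in
    gemini)  echo "GOOGLE_API_KEY" ;;
    openai)  echo "OPENAI_API_KEY" ;;
    vertex)  echo "VERTEX_PROJECT_ID VERTEX_LOCATION" ;;
    lazyllm) echo "TEXT_MODEL_SOURCE IMAGE_MODEL_SOURCE" ;;
    *)       return 1 ;;
  esac
}

# Exit status 0 if every required key has a non-placeholder value, 1 otherwise.
check_env() (
  file="$1"
  format=$(env_value "$file" AI_PROVIDER_FORMAT)
  keys=$(required_keys "$format") || {
    echo "unknown AI_PROVIDER_FORMAT: '$format'"
    exit 1
  }
  status=0
  for key in $keys; do
    value=$(env_value "$file" "$key")
    case "$value" in
      "" | your-*) echo "missing: $key"; status=1 ;;
    esac
  done
  exit "$status"
)
```

One way to use it is to gate the startup command on the check, for example `check_env .env && docker compose -f docker-compose.prod.yml up -d`.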

  3. Start the Service

⚡ Use Pre-built Images (Recommended)

The project provides pre-built frontend and backend images on Docker Hub (synced with the latest version of the main branch), allowing you to skip the local build steps for rapid deployment:


# Launch with Pre-built Images (No need to build from scratch)

docker compose -f docker-compose.prod.yml up -d

Image names:

  • anoinex/banana-slides-frontend:latest
  • anoinex/banana-slides-backend:latest

Build images from scratch

docker compose up -d

TIP

If you encounter network issues, you can uncomment the mirror source configurations in the .env file and then rerun the startup command:

# Uncomment the following in the .env file to use domestic mirror sources
DOCKER_REGISTRY=docker.1ms.run/
GHCR_REGISTRY=ghcr.nju.edu.cn/
APT_MIRROR=mirrors.aliyun.com
PYPI_INDEX_URL=https://mirrors.cloud.tencent.com/pypi/simple
NPM_REGISTRY=https://registry.npmmirror.com/
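
Uncommenting the mirror settings can also be scripted. The sketch below assumes the five settings appear in the .env file as lines commented out with a leading `#`, as the tip implies; `enable_mirrors` is a hypothetical name, not a project command. It edits the file in place and keeps a backup with a `.bak` suffix.

```bash
# Hypothetical helper: uncomment the mirror settings in the .env file given as $1.
# Other comments and settings are left untouched; the original is saved as "$1.bak".
enable_mirrors() {
  sed -i.bak -E \
    's/^[[:space:]]*#[[:space:]]*(DOCKER_REGISTRY|GHCR_REGISTRY|APT_MIRROR|PYPI_INDEX_URL|NPM_REGISTRY)=/\1=/' \
    "$1"
}
```

Running `enable_mirrors .env` and then rerunning the startup command applies the change; restoring `.env.bak` undoes it.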
  4. Access the Application
  5. View Logs

View Backend Logs (Last 200 Lines)

docker logs --tail 200 banana-slides-backend

View Backend Logs in Real-time (Last 100 Lines)

docker logs -f --tail 100 banana-slides-backend

View Frontend Logs (Last 100 Lines)

docker logs --tail 100 banana-slides-frontend
  6. Stop Services
docker compose down
  7. Update Project

Using Pre-built Images (docker-compose.prod.yml)

docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d

Using Local Build (docker-compose.yml)

Note: If you have manually modified the code, this method is not applicable. You must first revert the code to the version it was when pulled.

git pull 
docker compose down
docker compose build --no-cache
docker compose up -d

Note: Thanks to our fellow developer @ShellMonster for providing the Newbie Deployment Tutorial. It is specially designed for beginners without any server deployment experience. You can click the link to view.

Deploy from Source

Environment Requirements

  • Python 3.10 or higher
  • uv - Python package manager
  • Node.js 16+ and npm
  • FFmpeg - Required for exporting explanation videos; must include support for libass / ass subtitle filters.
  • Valid Google Gemini API key
  • (Optional) LibreOffice - Required when uploading PPTX files using the "PPT Remake" feature to convert PPTX to PDF. It is recommended to convert PPTX to PDF locally before uploading. Reason: Server-side rendering with LibreOffice may cause layout displacement due to missing fonts (such as Microsoft YaHei, Calibri, etc.) and cannot fully restore certain special effects. LibreOffice is not required if you upload PDF files directly. For Docker users who still need PPTX upload support within the container, run:
    docker exec -it banana-slides-backend bash -c "apt-get update && apt-get install -y libreoffice-impress && rm -rf /var/lib/apt/lists/*"
    

    Note: LibreOffice installed via this method will be lost when the container is rebuilt and will need to be reinstalled.
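
Whether a given FFmpeg build includes the subtitle filters mentioned in the requirements can be checked from its filter listing. This is a small sketch under the assumption that `ffmpeg -filters` prints one filter per line with the filter name in the second column; `has_ass_filter` is a hypothetical helper name.

```bash
# Hypothetical helper: read `ffmpeg -filters` output on stdin and succeed
# if the "ass" or "subtitles" filter is listed (both depend on libass).
has_ass_filter() {
  awk '$2 == "ass" || $2 == "subtitles" { found = 1 } END { exit !found }'
}
```

A typical call would be `ffmpeg -hide_banner -filters 2>/dev/null | has_ass_filter && echo "libass filters available"`. If nothing is printed, the installed build probably lacks libass support and video export with subtitles may fail.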

Backend Installation

  1. Clone the repository
git clone https://github.com/Anionex/banana-slides
cd banana-slides
  2. Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
  3. Install dependencies

Run the following commands in the project root directory:

```bash
# macOS (Homebrew)
brew install ffmpeg-full
brew unlink ffmpeg 2>/dev/null || true
brew link --overwrite --force ffmpeg-full

# Ubuntu / Debian
sudo apt-get update
sudo apt-get install -y ffmpeg libass9

# Then install Python dependencies
uv sync
```

This will automatically install all dependencies based on pyproject.toml.

  4. Configure environment variables

Copy the environment variable template:

cp .env.example .env

Then, following the method described above, open and edit the .env file to configure your API key.


Frontend Installation

  1. Enter the frontend directory
cd frontend
  2. Install dependencies
npm install
  3. Configure the API address

The frontend will automatically connect to the backend service at http://localhost:5000. To modify this, please edit src/api/client.ts.

Start the Backend Service

(Optional) If you have important local data, it is recommended to back up the database before upgrading:
cp backend/instance/database.db backend/instance/database.db.bak

Note: In the default configuration, templates, assets, and final products are stored in the uploads/ folder.

cd backend
uv run alembic upgrade head && uv run python app.py

The backend service will start at http://localhost:5000.

Visit http://localhost:5000/health to verify that the service is running correctly.
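
The same check can be done from a terminal, which is handy because the backend may need a few seconds after startup before it answers. The sketch below polls the health URL with curl until it responds successfully or the attempts run out; `wait_for_health` and its default of 30 attempts one second apart are illustrative assumptions, not project behavior.

```bash
# Hypothetical helper: poll URL ($1) until it returns a successful HTTP status.
# $2 = number of attempts (default 30), $3 = delay in seconds between attempts (default 1).
wait_for_health() (
  url="$1"
  tries="${2:-30}"
  delay="${3:-1}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl fail on HTTP error statuses; --max-time bounds each attempt.
    if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
      exit 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  exit 1
)
```

For example, `wait_for_health http://localhost:5000/health && echo "backend is up"` returns once the service is reachable and exits with a non-zero status otherwise.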

Start Frontend Development Server

cd frontend
npm run dev

The frontend development server will start at http://localhost:3000.

Open your browser to access and use the application.

🛠️ Technical Architecture

Front-end Technology Stack

React 18 + TypeScript + Vite 5 + Zustand

Backend Tech Stack

Python 3.10+ + Flask 3.0 + uv + SQLite

Communication Group

To facilitate communication and mutual assistance, this WeChat group has been created.

Feel free to suggest new features or provide feedback. I will also answer your questions at my own pace.


Follow the author on social media, where I share information about this project and AI:

X (Twitter) | Xiaohongshu | Bilibili

🔧 FAQ

See the official documentation

🤝 Contributing Guide

Welcome to contribute to this project via Issue and Pull Request!

Important: Please read CONTRIBUTING.md before contributing.

📄 License

This project is open-sourced under the GNU Affero General Public License v3.0 (AGPL-3.0). It can be freely used for non-commercial purposes such as personal learning, research, experimentation, education, or non-profit scientific research activities.

🚀 Sponsor


AIHubMix

Thanks to AIHubMix for sponsoring this project



Thanks to AI Huo Bao for sponsoring this project

Acknowledgements

  • Project Contributors:

Contributors

Sponsor

Open source is not easy 🙏 If you find this project valuable, feel free to buy the developer a coffee ☕️


Thanks to the following friends for their generous sponsorship and support:

@雅俗共赏, @曹峥, @以年观日, @John, @胡yun星Ethan, @azazo1, @刘聪NLP, @🍟, @苍何, @万瑾, @biubiu, @law, @方源, @寒松Falcon

If you have any questions regarding the sponsorship list, please contact the author.

📈 Project Statistics

Star History Chart