
A powerful tool for creating fine-tuning datasets for Large Language Models
If you like this project, please give it a Star ⭐️, or buy the author a cup of coffee (Support the author ❤️)!
Easy Dataset is a specialized application designed to streamline the creation of fine-tuning datasets for Large Language Models (LLMs). It offers an intuitive interface for uploading domain-specific files, intelligently splitting content, generating questions, and producing high-quality training data for model fine-tuning.
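To make "splitting content" concrete, here is a minimal sketch of boundary-aware chunking. It assumes plain text with blank lines between paragraphs and a hypothetical `maxChars` limit; `splitIntoChunks` is an illustration only, not Easy Dataset's own splitter (which lives in `lib/text-splitter/` and may work quite differently).

```javascript
// Hypothetical sketch: split text into chunks of at most `maxChars` characters,
// preferring paragraph boundaries (blank lines). Not Easy Dataset's actual splitter.
function splitIntoChunks(text, maxChars = 1500) {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");
  const paragraphs = text
    .split(/\n\s*\n/) // blank lines separate paragraphs
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks = [];
  let current = "";
  for (const para of paragraphs) {
    // A single oversized paragraph is hard-split into fixed-size slices.
    if (para.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < para.length; i += maxChars) {
        chunks.push(para.slice(i, i + maxChars));
      }
      continue;
    }
    // Keep appending paragraphs (joined by a blank line) while under the limit.
    const candidate = current ? `${current}\n\n${para}` : para;
    if (candidate.length <= maxChars) {
      current = candidate;
    } else {
      chunks.push(current);
      current = para;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

For example, `splitIntoChunks("aaa\n\nbbb\n\nccc", 8)` keeps the first two paragraphs together and starts a new chunk for the third. Real splitters often also add overlap between chunks or split on sentences or Markdown headings; those refinements are left out here.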
With Easy Dataset, you can transform your domain knowledge into structured datasets using any LLM API that follows the OpenAI format, making the fine-tuning process accessible and efficient.
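"OpenAI-format compatible" generally means the provider accepts OpenAI's Chat Completions request shape: `POST {baseUrl}/chat/completions` with a JSON body containing `model` and `messages`. The sketch below builds such a request for question generation; `buildChatRequest`, its parameters, and the prompt wording are hypothetical and are not taken from Easy Dataset's prompt templates.

```javascript
// Hypothetical sketch of an OpenAI-format Chat Completions request.
// Assumes the provider implements `POST {baseUrl}/chat/completions`.
function buildChatRequest({ baseUrl, apiKey, model, chunk, questionCount = 3 }) {
  // Strip trailing slashes so "…/v1/" and "…/v1" give the same URL.
  const url = `${baseUrl.replace(/\/+$/, "")}/chat/completions`;
  const body = {
    model,
    messages: [
      { role: "system", content: "You write questions that the given text answers." },
      {
        role: "user",
        content: `Write ${questionCount} questions answerable from this text:\n\n${chunk}`,
      },
    ],
  };
  const headers = { "Content-Type": "application/json" };
  // Local OpenAI-compatible servers may not require a key, so it is optional here.
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`;
  return { url, init: { method: "POST", headers, body: JSON.stringify(body) } };
}
```

On Node.js 18+ the result can be sent with `fetch(url, init)`; in the OpenAI response format the generated text is found at `choices[0].message.content`. Pointing `baseUrl` at another compatible server (for example Ollama's `http://localhost:11434/v1`) changes nothing else in the request.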

| Windows | MacOS | Linux |
|---------|-------|-------|
| Setup.exe | Intel / M (Apple Silicon) | AppImage |
Clone the repository:
```bash
git clone https://github.com/ConardLi/easy-dataset.git
cd easy-dataset
```
Install dependencies:
```bash
npm install
```
Build the project and start the server:
```bash
npm run build
npm run start
```
If you want to build the image yourself, you can use the Dockerfile in the project root directory:
Clone the repository:
```bash
git clone https://github.com/ConardLi/easy-dataset.git
cd easy-dataset
```
Build the Docker image:
```bash
docker build -t easy-dataset .
```
Run the container:
```bash
docker run -d -p 1717:1717 -v {YOUR_LOCAL_DB_PATH}:/app/local-db --name easy-dataset easy-dataset
```
Note: Replace {YOUR_LOCAL_DB_PATH} with the actual path where you want to store the local database.
Open your browser and navigate to http://localhost:1717
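Because `npm run start` and `docker run -d` return before the server may be ready, a small polling helper can wait for the page to answer before you open it. This is a hypothetical sketch, not part of the project; it assumes Node.js 18+ (global `fetch`) and the port 1717 used above.

```javascript
// Hypothetical helper: poll a URL until it returns a 2xx status or attempts run out.
// Assumes Node.js 18+ for the global `fetch`.
async function waitForServer(url = "http://localhost:1717", attempts = 30, delayMs = 1000) {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch(url);
      if (res.ok) return true; // server answered successfully
    } catch {
      // Connection refused or reset: the server is not up yet.
    }
    // Wait before the next attempt (no sleep after the final one).
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

Usage: `waitForServer().then((up) => console.log(up ? "Ready: http://localhost:1717" : "Server did not start"));`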
```
easy-dataset/
├── app/                                # Next.js application directory
│   ├── api/                            # API routes
│   │   ├── llm/                        # LLM API integration
│   │   │   ├── ollama/                 # Ollama API integration
│   │   │   └── openai/                 # OpenAI API integration
│   │   ├── projects/                   # Project management APIs
│   │   │   ├── [projectId]/            # Project-specific operations
│   │   │   │   ├── chunks/             # Text chunk operations
│   │   │   │   ├── datasets/           # Dataset generation and management
│   │   │   │   │   └── optimize/       # Dataset optimization API
│   │   │   │   ├── generate-questions/ # Batch question generation
│   │   │   │   ├── questions/          # Question management
│   │   │   │   └── split/              # Text splitting operations
│   │   │   └── user/                   # User-specific project operations
│   ├── projects/                       # Front-end project pages
│   │   └── [projectId]/                # Project-specific pages
│   │       ├── datasets/               # Dataset management UI
│   │       ├── questions/              # Question management UI
│   │       ├── settings/               # Project settings UI
│   │       └── text-split/             # Text processing UI
│   └── page.js                         # Home page
├── components/                         # React components
│   ├── datasets/                       # Dataset-related components
│   ├── home/                           # Home page components
│   ├── projects/                       # Project management components
│   ├── questions/                      # Question management components
│   └── text-split/                     # Text processing components
├── lib/                                # Core libraries and utilities
│   ├── db/                             # Database operations
│   ├── i18n/                           # Internationalization
│   ├── llm/                            # LLM integration
│   │   ├── common/                     # Common LLM utilities
│   │   ├── core/                       # Core LLM client
│   │   └── prompts/                    # Prompt templates
│   │       ├── answer.js               # Answer generation prompts (Chinese)
│   │       ├── answerEn.js             # Answer generation prompts (English)
│   │       ├── question.js             # Question generation prompts (Chinese)
│   │       ├── questionEn.js           # Question generation prompts (English)
│   │       └── ... other prompts
│   └── text-splitter/                  # Text splitting utilities
├── locales/                            # Internationalization resources
│   ├── en/                             # English translations
│   └── zh-CN/                          # Chinese translations
├── public/                             # Static assets
│   └── imgs/                           # Image resources
└── local-db/                           # Local file-based database
    └── projects/                       # Project data storage
```
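The `app/` folders above appear to follow Next.js App Router conventions, where each folder is a URL segment and a bracketed folder such as `[projectId]` is a dynamic segment. Assuming that, the sketch below shows how a folder path maps to a URL. `appPathToUrl` is a hypothetical helper; it does not handle catch-all segments like `[...slug]`, and the HTTP methods each API route accepts are not documented in this tree.

```javascript
// Hypothetical sketch of Next.js App Router path-to-URL mapping:
// folders become segments, "[name]" is filled from `params`,
// and "(group)" folders are omitted from the URL.
function appPathToUrl(appPath, params = {}) {
  const segments = appPath
    .replace(/^app(\/|$)/, "") // drop the leading "app/" directory
    .split("/")
    .filter((s) => s.length > 0 && !/^\(.*\)$/.test(s)); // skip empty and route-group segments
  const resolved = segments.map((seg) => {
    const m = seg.match(/^\[(.+)\]$/); // dynamic segment like [projectId]
    if (!m) return seg;
    if (!(m[1] in params)) throw new Error(`missing param: ${m[1]}`);
    return encodeURIComponent(String(params[m[1]]));
  });
  return "/" + resolved.join("/");
}
```

For example, `appPathToUrl("app/api/projects/[projectId]/chunks", { projectId: "demo" })` yields `/api/projects/demo/chunks`, and `app/projects/[projectId]/text-split` corresponds to the text processing page for that project.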
For detailed documentation on all features and APIs, please visit our Documentation Site.
We welcome contributions from the community! If you'd like to contribute to Easy Dataset, please follow these steps:
1. Create a new branch (`git checkout -b feature/amazing-feature`)
2. Commit your changes (`git commit -m 'Add some amazing feature'`)
3. Push to the branch (`git push origin feature/amazing-feature`)

Please make sure to update tests as appropriate and adhere to the existing coding style.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.