An automated code security audit system powered by large language models (LLMs), helping developers quickly identify security vulnerabilities in code and providing remediation suggestions.
This project enforces three inviolable "iron rules" on AI audit behavior, designed to eliminate hallucination and fabrication so that every vulnerability report is evidence-based:
| Rule | Prohibited Behavior | Correct Approach | Violation Consequence |
|---|---|---|---|
| Rule 1: No Guessing File Paths | Referencing file paths from memory or speculation | Only reference code content actually provided to the AI | All analysis based on non-existent files is invalid |
| Rule 2: No Fabricating Code Snippets | Describing code from impression, or referencing code not actually seen | Must reference actual line numbers and content from provided code | All vulnerability analysis based on fabricated code is invalid |
| Rule 3: No Reporting Vulnerabilities in Unseen Code | Reporting vulnerabilities without actually seeing the code | See code → Analyze code → Then report vulnerabilities | Vulnerability reports for unseen code are directly marked invalid |
Violating any iron rule will invalidate the audit results.
Design motivation: Large language models are prone to "hallucinations" in code audit scenarios — fabricating non-existent file paths, inventing code snippets that never appeared, or asserting vulnerabilities in files they never read. These three iron rules constrain AI behavior at the highest priority, anchoring the audit process to actual code evidence, thereby ensuring report credibility and traceability.
Code defect exists ≠ Vulnerability is exploitable
The system requires the AI to verify the following 9 dimensions based on actual code before reporting each vulnerability:
| # | Verification Dimension | Description |
|---|---|---|
| 1 | Defect Authenticity | Whether the code defect truly exists, whether overlooked upstream protections exist |
| 2 | Path Reachability | Whether the code path is reachable (excluding dead code, legacy code, unsatisfied conditions) |
| 3 | Input Reachability | Whether user input can actually reach the danger point |
| 4 | Practical Exploitability | Whether an attacker can exploit it in a real environment |
| 5 | Systematic Design | Whether the pattern is a systematic framework design rather than an individual oversight |
| 6 | Source Type | Whether the source is external user input rather than trusted server-side code |
| 7 | Self-Attack Test | Whether the privileges required to trigger the issue already exceed what the vulnerability itself would grant |
| 8 | Design Intent | Whether the behavior is the framework's intended design rather than a defect |
| 9 | Runtime Feasibility | Whether the theoretical attack is feasible in the actual runtime environment |
Incorrectly labeling secure code as a "vulnerability" is misleading. Accuracy over quantity — one accurate vulnerability report is far better than ten false positives.
Every potential vulnerability detected by AI must pass four progressive rounds of challenge verification before being written to the final report. Failure in any round affects the final determination:
```
Potential Vuln → Round 0 → Round 1 → Round 2 → Round 3 → Final Verdict
                    │          │          │          │
                    ▼          ▼          ▼          ▼
              Reachability  Code Logic  Data Flow  Exploitability
```
Detailed rules for each round:
| Round | Name | Challenge Question | Pass Condition | Elimination Condition |
|---|---|---|---|---|
| Round 0 | Reachability & Design Intent | Is the code path reachable? | Path reachable and not design behavior | Dead code / Legacy code / Intended design behavior |
| Round 1 | Code Logic Challenge | Does the dangerous code pattern truly exist? | Confidence is MEDIUM or HIGH | Confidence is LOW → Direct elimination |
| Round 2 | Data Flow Challenge | Can user input reach the danger point? | Clear data flow description exists (>10 chars) | Data flow unclear or non-existent |
| Round 3 | Exploitability Challenge | Can an attacker construct an effective attack? | Confidence HIGH or severity CRITICAL | Insufficient confidence for regular vulns (composite vulns have exemption rules) |
Final verdict criteria:
| Rounds Passed | Verdict Status | Action |
|---|---|---|
| 4/4 passed | passed — Confirmed vulnerability | Written to final report |
| 2-3/4 passed | partial — Under observation | Written to report with pending verification note |
| 0-1/4 passed | failed — False positive | Filtered out, excluded from final report |
Composite function vulnerabilities (involving cross-file data flows) enjoy special exemption rules in Round 3 — they pass as long as confidence is not LOW, because cross-file vulnerabilities are inherently harder to confirm but often more damaging.
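The verdict table above reduces to a small scoring function. The sketch below is an illustrative reconstruction, not the project's actual implementation; the name `finalVerdict` and the boolean-array input are assumptions.

```javascript
// Illustrative sketch of the final-verdict rules (not the actual implementation).
// `rounds` is an array of four booleans: did the vuln pass Rounds 0-3?
function finalVerdict(rounds) {
  const passed = rounds.filter(Boolean).length;
  if (passed === 4) return 'passed';  // confirmed → written to final report
  if (passed >= 2) return 'partial';  // under observation → report with note
  return 'failed';                    // false positive → filtered out
}

console.log(finalVerdict([true, true, true, true]));   // → "passed"
console.log(finalVerdict([true, true, false, false])); // → "partial"
```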
| Vulnerability Type | CWE ID | Default Level | Description |
|---|---|---|---|
| SQL Injection | CWE-89 | HIGH | Detects SQL concatenation vulnerabilities, distinguishes parameterized queries (safe) from string concatenation (dangerous) |
| XSS (Cross-Site Scripting) | CWE-79 | HIGH | Detects reflected, stored, and DOM-based XSS, covers innerHTML/v-html/dangerouslySetInnerHTML |
| Hardcoded Secrets | CWE-798 | HIGH | Detects API Keys/Tokens/Passwords/Private Keys/Connection Strings, auto-excludes placeholders and env variable references |
| Command Injection | CWE-78 | CRITICAL | Detects dangerous calls like eval/exec/system/child_process |
| Path Traversal | CWE-22 | HIGH | Detects file path manipulation vulnerabilities |
| SSRF | CWE-918 | HIGH | Detects Server-Side Request Forgery vulnerabilities |
| Insecure Deserialization | CWE-502 | HIGH | Detects pickle.loads/ObjectInputStream/unserialize, etc. |
| Authentication Flaws | CWE-287 | HIGH | Detects authentication/authorization implementation flaws |
| Sensitive Data Exposure | CWE-200 | MEDIUM | Detects error message leakage, sensitive data in logs |
| XXE | CWE-611 | HIGH | Detects XML External Entity injection |
| Insecure Randomness | CWE-330 | LOW | Detects weak random number generators in security contexts |
| Prototype Pollution | CWE-1321 | HIGH | Detects JavaScript prototype chain pollution |
| CSRF | CWE-352 | MEDIUM | Detects Cross-Site Request Forgery |
| IDOR | CWE-639 | MEDIUM | Detects Insecure Direct Object References |
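As an illustration of the distinction the SQL injection check draws (the table's first row), the following contrasts string concatenation with a parameterized query. The query shapes and function names are generic examples, not code from this project.

```javascript
// DANGEROUS: string concatenation — user input becomes part of the SQL text.
function unsafeQuery(username) {
  return "SELECT * FROM users WHERE name = '" + username + "'";
}

// SAFE: parameterized query — the value travels separately from the SQL text,
// so the driver can never interpret it as SQL. (Generic placeholder syntax.)
function safeQuery(username) {
  return { sql: 'SELECT * FROM users WHERE name = ?', params: [username] };
}

// A classic payload turns the unsafe version into a tautology:
const q = unsafeQuery("' OR '1'='1");
// q is now: SELECT * FROM users WHERE name = '' OR '1'='1'
```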
The system pays special attention to cross-function, cross-file composite security issues:
| Pattern | Description |
|---|---|
| Cross-Function Data Flow Taint | Function A receives user input without sanitization → passes to Function B → Function B uses it in dangerous operations |
| Privilege Escalation Chain | Normal user modifies state via Function A → bypasses Function B's permission checks |
| Race Condition (TOCTOU) | Function A checks permissions → Function B modifies data after check but before operation |
| Error Handling Leak Chain | Function A's exception is caught by Function B → Function B returns error details to client |
| Auth/AuthZ Bypass Combo | Certain function call combinations skip intermediate authentication/authorization checks |
| Prototype Pollution Propagation | Object merge in Function A is polluted → affects Function B's logic decisions |
| Second-Order Injection | Function A stores unsanitized user input to database → Function B reads and uses it in dangerous operations |
| Callback/Event-Driven Vulnerability | Data passing between event handler functions lacks validation |
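The second-order injection pattern from the table can be sketched as two functions separated by a storage hop. The in-memory `db` map stands in for a real database, and all names here are hypothetical.

```javascript
// Illustrative sketch of second-order injection (all names are hypothetical).
const db = new Map(); // stands in for a real database

// Function A: stores user input WITHOUT sanitization — looks harmless locally.
function saveNickname(userId, nickname) {
  db.set(userId, nickname);
}

// Function B: later reads the stored value and concatenates it into a query —
// the injection only materializes here, one hop away from the tainted source.
function buildGreetingQuery(userId) {
  const nickname = db.get(userId);
  return "SELECT banner FROM greetings WHERE nick = '" + nickname + "'";
}

saveNickname(1, "x'; DROP TABLE greetings; --");
const tainted = buildGreetingQuery(1);
// Neither function is obviously wrong in isolation; the composite flow is.
```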
```
┌──────────────────────────────────────────────────────┐
│                  Nginx (Port 80)                     │
│          Static Assets + API Reverse Proxy           │
├────────────────────┬─────────────────────────────────┤
│  React SPA         │  Express API (Port 3001)        │
│  TypeScript + Vite │  Node.js 20 + MongoDB Driver    │
│  Tailwind + DaisyUI│  AI Model Calls (OpenAI compat.)│
└────────────────────┴──────────┬──────────────────────┘
                                │
                       ┌────────┴────────┐
                       │    MongoDB 7    │
                       │  Data Storage   │
                       └─────────────────┘
```
| Layer | Technology |
|---|---|
| Frontend Framework | React 19 + TypeScript |
| Build Tool | Vite 6 |
| UI Styling | Tailwind CSS 3 + DaisyUI 4 |
| Routing | React Router 6 |
| Backend | Node.js 20 + Express 4 |
| Database | MongoDB 7 |
| AI Models | OpenAI-compatible API (Claude / GPT / DeepSeek / Qwen / Hunyuan, etc.) |
| Deployment | Docker Compose (Nginx + Node.js + MongoDB) |
```
AI_code_review_agent/
├── src/                            # Frontend source code
│   ├── components/                 # Reusable components
│   │   ├── FileUpload.tsx          # File upload (ZIP drag & drop)
│   │   ├── TaskProgress.tsx        # Task progress & real-time logs
│   │   ├── ReportViewer.tsx        # Audit report viewer
│   │   ├── VulnerabilityCard.tsx   # Vulnerability detail card
│   │   ├── CodeHighlight.tsx       # Code syntax highlighting
│   │   ├── Navbar.tsx              # Top navigation bar
│   │   └── Footer.tsx              # Footer
│   ├── pages/                      # Page components
│   │   ├── HomePage.tsx            # Home page (upload entry)
│   │   ├── TaskPage.tsx            # Task details (progress/logs/report)
│   │   ├── HistoryPage.tsx         # Audit history
│   │   └── SettingsPage.tsx        # Model configuration management
│   ├── types/audit.ts              # TypeScript type definitions
│   ├── utils/api.ts                # API request utilities
│   ├── App.tsx                     # App routing entry
│   └── index.css                   # Global styles
├── server/                         # Backend API service
│   ├── src/
│   │   ├── index.js                # Express entry + route mounting
│   │   ├── routes/
│   │   │   ├── tasks.js            # Task CRUD + report data
│   │   │   ├── audit.js            # Trigger security audit
│   │   │   ├── report.js           # Trigger report generation
│   │   │   └── model-configs.js    # Model configuration management
│   │   ├── services/
│   │   │   ├── ai.js               # AI model calls (OpenAI compatible)
│   │   │   ├── analyzeCode.js      # ZIP extraction + code chunking
│   │   │   ├── securityAudit.js    # Audit engine (concurrency/retry/resume)
│   │   │   └── generateReport.js   # Report generation (Markdown + JSON)
│   │   └── utils/db.js             # MongoDB connection & indexes
│   ├── Dockerfile                  # Backend container config
│   ├── .env.example                # Environment variable template
│   └── package.json
├── shared/prompts/                 # AI audit prompt templates
├── docker-compose.yml              # Docker Compose orchestration
├── Dockerfile                      # Frontend multi-stage build (Vite → Nginx)
├── nginx.conf                      # Nginx reverse proxy config
├── .env                            # Frontend environment variables
└── package.json                    # Frontend dependencies
```
```shell
# Clone the project
git clone <repo-url>
cd AI_code_review_agent

# Build and start all services
docker compose up -d

# View logs
docker compose logs -f
```
After startup, visit http://localhost:8080
Custom port:
```shell
APP_PORT=3000 docker compose up -d
```
Stop services:
```shell
docker compose down
```
For development scenarios where you modify code and need hot-reload debugging.
1. Start MongoDB

```shell
docker run -d --name mongo -p 27017:27017 mongo:7
```

2. Start Backend

```shell
cd server
npm install

# Create .env (or copy .env.example)
cat > .env << EOF
MONGODB_URI=mongodb://localhost:27017/ai_code_review
PORT=3001
DATA_DIR=./data
EOF

npm run dev
```
3. Start Frontend (new terminal)

```shell
# Return to project root
npm install
npm run dev
```
The frontend dev server starts at http://localhost:5173 by default, with API requests automatically proxied to backend localhost:3001.
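The dev-server proxy described above is typically configured through Vite's `server.proxy` option. The fragment below is an illustrative sketch; the project's actual Vite config may differ.

```javascript
// Illustrative Vite dev-server proxy (the project's actual config may differ).
// Forwards /api/* requests from the Vite dev server to the Express backend.
export default {
  server: {
    proxy: {
      '/api': {
        target: 'http://localhost:3001',
        changeOrigin: true,
      },
    },
  },
};
```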
Configure an AI model on the Settings page before first use.
View all audit tasks on the "History" page, with pagination and deletion support.
Frontend (.env):
| Variable | Default | Description |
|---|---|---|
| VITE_API_BASE_URL | /api | API request path prefix |
Backend (server/.env):
| Variable | Default | Description |
|---|---|---|
| MONGODB_URI | mongodb://mongo:27017/ai_code_review | MongoDB connection URI |
| PORT | 3001 | Backend service port |
| DATA_DIR | /app/data | Upload files and report storage path |
Docker Compose:
| Variable | Default | Description |
|---|---|---|
| APP_PORT | 8080 | Externally exposed access port |
| Parameter | Value | Description |
|---|---|---|
| Batch Concurrency | 2 | Number of code chunks audited simultaneously per batch |
| Single Run Limit | 100 chunks | Maximum code chunks per single execution |
| AI Request Timeout | 150s (increments to 210s) | Initial 150s, +30s per retry |
| Max Retries | 2 | Retry count after individual chunk failure |
| Large Chunk Threshold | 120 lines | Auto-split chunks exceeding this line count |
| Safe Exit Threshold | 540s | Saves progress on timeout for later continuation |
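The timeout and retry parameters above imply a simple escalation schedule, sketched here. This is an illustrative reconstruction; the function name and attempt numbering are assumptions.

```javascript
// Illustrative sketch of the timeout escalation implied by the table above.
const BASE_TIMEOUT_MS = 150000; // 150s initial AI request timeout
const STEP_MS = 30000;          // +30s per retry
const MAX_RETRIES = 2;          // retries after an individual chunk fails

// attempt 0 is the first try; attempts 1..MAX_RETRIES are retries.
function timeoutForAttempt(attempt) {
  return BASE_TIMEOUT_MS + Math.min(attempt, MAX_RETRIES) * STEP_MS;
}

console.log(timeoutForAttempt(0) / 1000); // → 150
console.log(timeoutForAttempt(2) / 1000); // → 210
```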
10 built-in preset templates, plus support for any OpenAI API-compatible model:
| Model | Description |
|---|---|
| GPT-4o | OpenAI flagship model |
| GPT-4o Mini | OpenAI lightweight model |
| Claude Opus 4 | Anthropic flagship model |
| Claude Sonnet 4 | Anthropic cost-effective model |
| DeepSeek V3 | DeepSeek general-purpose model |
| DeepSeek R1 | DeepSeek reasoning model |
| Qwen Max | Alibaba Qwen flagship model |
| Hunyuan Turbo | Tencent Hunyuan high-performance reasoning model |
| Hunyuan Pro | Tencent Hunyuan best-effect model |
| Custom Model | Any endpoint compatible with OpenAI Chat Completions API |
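All of the presets above use the same OpenAI-compatible Chat Completions request shape. The sketch below builds such a request as a plain object; it follows the public OpenAI API's field names, not this project's internal code, and the URLs and helper name are hypothetical.

```javascript
// Builds an OpenAI-compatible Chat Completions request (illustrative sketch).
// Any preset model works as long as baseUrl/apiKey/model match its provider.
function buildChatRequest({ baseUrl, apiKey, model }, messages) {
  return {
    url: `${baseUrl.replace(/\/$/, '')}/chat/completions`,
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages }),
  };
}

const req = buildChatRequest(
  { baseUrl: 'https://api.example.com/v1', apiKey: 'sk-...', model: 'gpt-4o' },
  [{ role: 'user', content: 'Audit this code chunk.' }]
);
// req.url → "https://api.example.com/v1/chat/completions"
```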
| Collection | Purpose |
|---|---|
| audit_tasks | Audit task info (status, file paths, tech stack, etc.) |
| audit_results | Audit results (vulnerability lists stored per file) |
| audit_logs | Audit logs (real-time progress, thinking process, etc.) |
| audit_code_files | Extracted code chunks |
| audit_vulnerabilities | Temporary vulnerability data (cleaned after report generation) |
| model_configs | AI model configurations |
```shell
# Docker operations
docker compose up -d              # Start services
docker compose down               # Stop services
docker compose logs -f            # View logs
docker compose up -d --build      # Rebuild and start

# Local development
npm run dev                       # Start frontend dev server
npm run build                     # Build frontend
npm run lint                      # ESLint check
npm run format                    # Prettier formatting
cd server && npm run dev          # Start backend dev server
```
This project is licensed under the Apache License 2.0.