January 29, 2026
28 min
Every unprotected Next.js app is a buffet for scrapers, credential stuffers, and AI training bots. This tutorial builds a defense layer with rate limiting, fingerprinting, bot detection, and WAF-style rules—all in middleware.

Pio Greeff
Founder & Lead Developer
Your API is getting hammered. You just don't know it yet.
Every unprotected Next.js app is a buffet for scrapers, credential stuffers, and AI training bots. They're hitting your endpoints right now—burning your Vercel bill, polluting your analytics, and scraping content you spent months creating.
This tutorial builds a defense layer that stops them. Rate limiting, fingerprinting, bot detection, and WAF-style rules—all in Next.js middleware.
No paid services. No vendor lock-in. Just code that works.
Test it:
That's it. Your API now fights back.
Check your Vercel/analytics logs. You'll find:
| What You'll See | What It Actually Is |
|---|---|
| Thousands of requests from "empty" user-agents | Scrapers that forgot to set headers |
| Requests from python-requests, axios, curl | Lazy bot operators |
| Same IP hitting /api/* 100x/minute | Credential stuffing or enumeration |
| Requests to /wp-admin, /.env, /config.php | Vulnerability scanners (you're not even running PHP) |
| GPTBot, CCBot, anthropic-ai in user-agent | AI training crawlers eating your content |
| Requests with no cookies, no JS execution | Headless browsers or raw HTTP clients |
The cost is real:
Most Next.js apps ship with zero protection. Middleware runs on every request—it's the perfect place to fix this.
Attack: A scraper hits your /api/users endpoint to enumerate valid usernames.
What happens:
Same endpoint, but with our middleware:
What happens:
The difference: 1,000x fewer successful requests (10 instead of 10,000). Zero data exfiltration. Automatic escalation.
Middleware-based protection has limits. Know when to use something else.
| Situation | Better Alternative |
|---|---|
| DDoS attacks (100K+ req/sec) | You need Cloudflare/AWS Shield at the edge. Middleware can't stop traffic that saturates your origin. |
| Sophisticated bot farms | Residential proxies + browser fingerprint rotation defeats IP-based limits. Use CAPTCHA or proof-of-work. |
| Authenticated API abuse | Per-user rate limits need server-side state tied to auth tokens, not IP/fingerprint. |
| Compliance requirements (PCI, SOC2) | You need a real WAF with audit logs, not DIY middleware. |
| Multi-region deployments | Vercel Edge + KV works, but managing distributed rate limit state is complex. Consider Upstash. |
Use this middleware when:
This stops 90% of attacks with 10% of the effort.
Each layer can block independently. A request must pass all four to reach your app.
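A minimal sketch of that pass-every-layer flow, assuming each layer is a function that returns a block reason or `null`. The `SimpleRequest` shape, the `Layer` type, and `runLayers` are illustrative names, not part of the shield code:

```typescript
// Hypothetical, simplified request shape for illustration only.
interface SimpleRequest {
  path: string;
  userAgent: string;
}

// A layer returns a block reason, or null to let the request continue.
type Layer = (req: SimpleRequest) => string | null;

// Run layers in order; the first one that blocks ends the pipeline.
function runLayers(
  req: SimpleRequest,
  layers: Layer[]
): { allowed: boolean; blockedBy?: string } {
  for (const layer of layers) {
    const reason = layer(req);
    if (reason !== null) {
      return { allowed: false, blockedBy: reason };
    }
  }
  return { allowed: true };
}

// Two toy layers standing in for the WAF and bot-detection stages.
const wafLayer: Layer = (req) =>
  req.path.startsWith('/.env') ? 'waf:blocked_path' : null;
const botLayer: Layer = (req) =>
  /python-requests/i.test(req.userAgent) ? 'bot:automation' : null;

const layers: Layer[] = [wafLayer, botLayer];

console.log(runLayers({ path: '/.env', userAgent: 'Mozilla/5.0' }, layers));
console.log(runLayers({ path: '/api/data', userAgent: 'python-requests/2.28.0' }, layers));
console.log(runLayers({ path: '/blog', userAgent: 'Mozilla/5.0' }, layers));
```

Ordering matters: cheap, high-certainty checks go first so obviously bad requests never reach the stateful rate limiter.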
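The goal is a sliding window algorithm: it's more accurate than fixed windows and prevents the "boundary burst" problem, where a client spends a full quota at the end of one window and another full quota at the start of the next. Be aware that the RateLimiter class in lib/shield/rate-limiter.ts resets its counter whenever a window expires, which is fixed-window behavior; if you need true sliding semantics, track request timestamps instead of a single counter.
Why sliding window over token bucket?
A token bucket allows bursts: an attacker can spend a full bucket of 100 requests instantly, then wait for it to refill. A sliding window caps the number of requests in any trailing window, so a burst can't straddle a window boundary and double the effective limit.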
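For comparison, here is a minimal sliding-window log limiter. It is a sketch rather than the tutorial's RateLimiter: `SlidingWindowLimiter` is a hypothetical name, it keeps one timestamp per allowed request, and it uses a plain `Map` with no eviction, so it would need a bounded cache before going anywhere near production.

```typescript
// Sliding-window log: remember the timestamps of recent allowed requests per key.
class SlidingWindowLimiter {
  private windowMs: number;
  private maxRequests: number;
  private hits = new Map<string, number[]>();

  constructor(windowMs: number, maxRequests: number) {
    this.windowMs = windowMs;
    this.maxRequests = maxRequests;
  }

  // `now` is injectable so the behavior can be checked without real waiting.
  check(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only timestamps that fall inside the trailing window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.maxRequests) {
      this.hits.set(key, recent);
      return false; // limit reached within the last windowMs
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}

// 3 requests per 1000 ms, with a burst placed around the t = 1000 boundary.
const demoLimiter = new SlidingWindowLimiter(1000, 3);
const decisions = [900, 950, 990, 1010, 1901].map((t) => demoLimiter.check('client-a', t));
console.log(decisions); // [ true, true, true, false, true ]
```

A fixed window that resets at t = 1000 would allow the request at t = 1010. The log rejects it because three requests already landed in the previous 1000 ms, and allows t = 1901 once the oldest timestamp has aged out.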
Why escalating blocks?
First offense: 1 minute. Second: 2 minutes. Third: 4 minutes. Legitimate users who accidentally hit limits recover quickly. Attackers face exponentially increasing delays.
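The doubling schedule is easy to check in isolation. A small sketch, assuming a 60x cap on the multiplier like the one in the rate limiter code; `escalatedBlockMs` is an illustrative helper name:

```typescript
// Block duration doubles with each violation, capped at a maximum multiplier.
function escalatedBlockMs(
  baseMs: number,
  violations: number,
  maxMultiplier: number = 60
): number {
  if (violations < 1) return 0; // no violations, no block
  const multiplier = Math.min(2 ** (violations - 1), maxMultiplier);
  return baseMs * multiplier;
}

const ONE_MINUTE = 60 * 1000;
for (const v of [1, 2, 3, 7]) {
  console.log(`violation ${v}: ${escalatedBlockMs(ONE_MINUTE, v) / ONE_MINUTE} min`);
}
// violation 1: 1 min
// violation 2: 2 min
// violation 3: 4 min
// violation 7: 60 min (2^6 = 64 is capped at 60)
```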
IP addresses aren't enough. Users behind NAT share IPs, and attackers rotate through proxies. We need an identifier that is more specific than the IP alone.
Why fingerprint instead of just IP?
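The fingerprint combines IP + headers into a more specific identifier, so two clients behind the same NAT with different browsers are counted separately. It's not perfect (the IP is part of the hash, so proxy rotation still changes it, and sophisticated attackers can rotate headers too), but it catches 95% of automated traffic.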
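One practical wrinkle: Next.js middleware runs on the Edge runtime by default, where Node's `crypto` module is generally not available. A sketch of the same idea using the Web Crypto API instead, assuming a runtime with global `crypto.subtle`, `TextEncoder`, and `Headers` (Edge runtimes and recent Node versions). `edgeFingerprint` is a hypothetical name, and the function is async because `digest` returns a promise:

```typescript
// Hash the IP plus a few request headers with the Web Crypto API.
async function edgeFingerprint(ip: string, headers: Headers): Promise<string> {
  const components = [
    ip,
    headers.get('user-agent') ?? 'none',
    headers.get('accept-language') ?? 'none',
    headers.get('accept-encoding') ?? 'none',
  ].join('|');

  const bytes = new TextEncoder().encode(components);
  const digest = await crypto.subtle.digest('SHA-256', bytes);

  // Hex-encode the digest and keep the first 16 characters.
  return Array.from(new Uint8Array(digest))
    .map((byte) => byte.toString(16).padStart(2, '0'))
    .join('')
    .slice(0, 16);
}

// Same inputs give the same ID; a different user-agent gives a different one.
const browserHeaders = new Headers({ 'user-agent': 'Mozilla/5.0', 'accept-language': 'en-US' });
const scriptHeaders = new Headers({ 'user-agent': 'python-requests/2.28.0' });
edgeFingerprint('203.0.113.7', browserHeaders).then((id) => console.log('browser:', id));
edgeFingerprint('203.0.113.7', scriptHeaders).then((id) => console.log('script:', id));
```

Callers have to `await` the result, so this is not a drop-in replacement for a synchronous fingerprint function.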
Most bots are lazy. They skip standard browser headers, use known automation tools, or exhibit inhuman behavior patterns.
Why separate good bots?
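Googlebot needs to crawl your site for SEO. Blocking it tanks your search rankings. The isGoodBot check lets search engines and social media previews through while blocking scrapers. It matches on the user-agent string alone, which a scraper can spoof, so keep rate limiting in place for these clients too.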
Confidence scoring:
Instead of binary yes/no, we score confidence. A request with python-requests user-agent AND missing Accept-Language is definitely a bot (0.9+). A request with just a generic Accept header might be a misconfigured browser (0.2).
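A stripped-down sketch of additive scoring. The weights (0.8, 0.3, 0.2) and the thresholds (0.3, 0.5, 0.8) are the ones this tutorial uses; `Signal`, `scoreSignals`, and `actionForConfidence` are illustrative names, and which side the boundary values fall on is an assumption:

```typescript
interface Signal {
  name: string;
  weight: number;   // how much this signal adds to the score
  present: boolean; // whether the request exhibits it
}

// Sum the weights of the signals that fired and clamp the result to 1.
function scoreSignals(signals: Signal[]): { confidence: number; reasons: string[] } {
  const hits = signals.filter((s) => s.present);
  const total = hits.reduce((sum, s) => sum + s.weight, 0);
  return { confidence: Math.min(total, 1), reasons: hits.map((s) => s.name) };
}

type BotAction = 'allow' | 'log' | 'challenge' | 'block';

// Map a confidence score to an action using the 0.3 / 0.5 / 0.8 thresholds.
function actionForConfidence(confidence: number): BotAction {
  if (confidence < 0.3) return 'allow';
  if (confidence < 0.5) return 'log';
  if (confidence <= 0.8) return 'challenge';
  return 'block';
}

const scripted = scoreSignals([
  { name: 'known_automation_user_agent', weight: 0.8, present: true },
  { name: 'missing_accept_language', weight: 0.3, present: true },
  { name: 'generic_accept_header', weight: 0.2, present: false },
]);
const oddBrowser = scoreSignals([
  { name: 'known_automation_user_agent', weight: 0.8, present: false },
  { name: 'missing_accept_language', weight: 0.3, present: false },
  { name: 'generic_accept_header', weight: 0.2, present: true },
]);

console.log(scripted.confidence, actionForConfidence(scripted.confidence));     // 1 block
console.log(oddBrowser.confidence, actionForConfidence(oddBrowser.confidence)); // 0.2 allow
```

Additive weights are crude: independent weak signals can stack into a block. That is why the middle bands log or challenge rather than ban outright.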
Block obviously malicious requests before they hit your app.
What this catches:
- Probes for sensitive files (.env, .git)
What this doesn't catch:
For those, you need a real WAF (Cloudflare, AWS WAF, etc.).
A wrapper (higher-order function) for API route handlers that adds per-endpoint rate limiting and abuse detection.
Tie it all together in middleware.ts:
A test suite to verify your protection actually works.
Run with:
Expected output:
| Component | Added Latency | Notes |
|---|---|---|
| WAF rules (regex checks) | ~0.5ms | 40 patterns, optimized regexes |
| Bot detection | ~0.2ms | Header checks + pattern matching |
| Fingerprinting | ~0.1ms | SHA256 hash of headers |
| Rate limit check (LRU cache) | ~0.05ms | In-memory, O(1) lookup |
| Total | ~1ms | Negligible vs. network latency |
| Component | Memory | Notes |
|---|---|---|
| LRU cache (10K entries) | ~5MB | Per rate limiter instance |
| 4 rate limiters | ~20MB | pages, api, auth, heavy |
| Regex patterns (compiled) | ~1MB | Compiled once at startup |
| Total | ~25MB | Well within Vercel limits |
| Scenario | Before Shield | After Shield | Savings |
|---|---|---|---|
| 100K bot requests/month | $15-40 | $0.50 (blocked at middleware) | 97% |
| DDoS attempt (1M requests) | $150-400 | $5-10 (most blocked) | 97% |
| Normal traffic (50K/month) | $7-20 | $7-21 (+1ms latency) | ~0% |
The math: Blocked requests still invoke middleware (you pay for the invocation), but they don't hit your database, APIs, or external services. The real savings come from not triggering expensive downstream operations.
| Endpoint Type | Recommended Limit | Block Duration | Notes |
|---|---|---|---|
| Public pages | 100/min | 1 min | Generous for real users |
| API routes | 30/min | 5 min | Tighter for data endpoints |
| Auth endpoints | 5/15min | 1 hour | Prevent credential stuffing |
| Search/heavy ops | 10/min | 10 min | Protect expensive operations |
| Webhooks | 100/min | 1 min | Third-party services need headroom |
| Confidence | Action | Example |
|---|---|---|
| < 0.3 | Allow | Missing one header |
| 0.3 - 0.5 | Log + allow | Suspicious but not definite |
| 0.5 - 0.8 | Challenge or block | Multiple signals |
| > 0.8 | Block + ban | Definite bot |
| Signal | Action |
|---|---|
| > 10K blocked requests/day | Consider Cloudflare (free tier) |
| Sophisticated attacks bypassing rules | Upgrade to Cloudflare Pro or AWS WAF |
| Compliance requirements (SOC2, PCI) | You need audit logs and managed rules |
| Geographic attacks | Use geo-blocking at the edge |
| Attack | Detection | Response |
|---|---|---|
| Credential stuffing | High volume to /api/auth, different usernames | Auth rate limit (5/15min), require CAPTCHA after 3 failures |
| Content scraping | Sequential page requests, no JS execution | Bot detection blocks, consider JS challenge |
| API enumeration | Incrementing IDs, rapid 404s | Rate limit + monitor 404 rate per fingerprint |
| Search abuse | High volume search queries | Heavy rate limit (10/min), require auth for API |
| AI training bots | GPTBot, CCBot user-agents | Block by user-agent, add to robots.txt |
| Vulnerability scanning | /wp-admin, /.env, SQL patterns | WAF blocks, 24hr ban for critical attempts |
| DDoS (application layer) | Sustained high volume from few IPs | Rate limit + escalating blocks, consider edge protection |
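The "monitor 404 rate per fingerprint" response from the API enumeration row can be sketched as a small counter. `NotFoundMonitor` is a hypothetical helper, not part of the shield code, and it uses an unbounded `Map` for brevity. Since middleware generally doesn't see the status code a route eventually returns, one plausible place to call `record` is a wrapper around your route handlers.

```typescript
// Count 404 responses per fingerprint inside a fixed time window.
class NotFoundMonitor {
  private windowMs: number;
  private threshold: number;
  private entries = new Map<string, { windowStart: number; misses: number }>();

  constructor(windowMs: number, threshold: number) {
    this.windowMs = windowMs;
    this.threshold = threshold;
  }

  // Record a response status; returns true once the fingerprint has
  // produced `threshold` or more 404s in the current window.
  record(fingerprint: string, status: number, now: number = Date.now()): boolean {
    if (status !== 404) return false;
    const entry = this.entries.get(fingerprint);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // First miss, or the previous window expired: start a new window.
      this.entries.set(fingerprint, { windowStart: now, misses: 1 });
      return this.threshold <= 1;
    }
    entry.misses++;
    return entry.misses >= this.threshold;
  }
}

// Flag a client after 3 misses within one minute.
const monitor = new NotFoundMonitor(60 * 1000, 3);
console.log(monitor.record('fp-1', 404, 0));  // false
console.log(monitor.record('fp-1', 404, 10)); // false
console.log(monitor.record('fp-1', 200, 20)); // false (not a 404)
console.log(monitor.record('fp-1', 404, 30)); // true  (third miss)
```

When `record` returns true, a reasonable follow-up is a manual block on that fingerprint through the rate limiter.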
While you're at it, tell AI training bots to stay away:
Note: robots.txt is advisory. Ethical bots respect it; scrapers ignore it. That's why you need the middleware.
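If you'd rather keep the blocked crawler list in one place and generate the file, a tiny sketch works. `buildRobotsTxt` is an illustrative helper, and writing the result to `public/robots.txt` is left to your build step:

```typescript
// Build a robots.txt body that disallows the listed crawlers and allows everyone else.
function buildRobotsTxt(blockedAgents: string[]): string {
  const sections = blockedAgents.map((agent) => `User-agent: ${agent}\nDisallow: /`);
  sections.push('User-agent: *\nAllow: /');
  return sections.join('\n\n') + '\n';
}

console.log(buildRobotsTxt(['GPTBot', 'CCBot']));
```

The same array can feed the user-agent patterns in your bot detector, so the advisory list and the enforced list don't drift apart.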
That's it. One dependency. The rest is standard Next.js.
This is defense-in-depth, not a fortress. You'll still need:
Each is a tutorial on its own.
Every unprotected Next.js app is getting scraped. Your Vercel bill is higher than it should be. Your content is being stolen. Your APIs are being abused.
This middleware stack:
It adds ~1ms latency and catches 90% of automated abuse. For the other 10%, you need Cloudflare.
Build it. Test it. Ship it. Check your logs in a week—you'll be surprised what you catch.
# Clonegit clone https://github.com/yourusername/nextjs-shield.gitcd nextjs-shield # Installnpm install # Copy the middleware to your projectcp src/middleware.ts your-nextjs-app/src/cp -r src/lib/shield your-nextjs-app/src/lib/ # Add to your next.config.js# (see configuration section) # Start your appnpm run dev# Normal request - workscurl http://localhost:3000/api/data # Rapid fire - gets blocked after 10 requestsfor i in {1..15}; do curl -s -o /dev/null -w "%{http_code}\\n" http://localhost:3000/api/data; done# Output: 200 200 200 200 200 200 200 200 200 200 429 429 429 429 429 # Bot user-agent - blocked immediatelycurl -A "python-requests/2.28.0" http://localhost:3000/api/data# Output: 403 Forbidden// app/api/users/[id]/route.tsexport async function GET(req: Request, { params }: { params: { id: string } }) { const user = await db.users.findUnique({ where: { id: params.id } }); if (!user) { return Response.json({ error: "User not found" }, { status: 404 }); } return Response.json({ user: { name: user.name, avatar: user.avatar } });}# Attacker scriptfor id in $(seq 1 10000); do response=$(curl -s "https://yourapp.com/api/users/$id") if [[ $response != *"not found"* ]]; then echo "Valid user: $id" fidoneResults in 60 seconds:- 10,000 requests processed- 847 valid user IDs extracted- Your Vercel bill: +$3- Time to complete: 58 seconds- Detection: NoneRequest 1-10: 200 OK (normal responses)Request 11: 429 Too Many Requests Headers: X-RateLimit-Remaining: 0 X-RateLimit-Reset: 1706745600 Retry-After: 60 Request 12-10000: Connection refused (IP temporarily banned)Results:- 10 requests processed before lockout- 0 valid user IDs extracted (not enough attempts)- Your Vercel bill: +$0.003- Attacker's time wasted: They have to wait 60s, then 5min, then 1hr- Detection: Alert sent to your webhook┌─────────────────────────────────────────────────────────────────┐│ INCOMING REQUEST │└─────────────────────────────────────────────────────────────────┘ │ 
▼┌─────────────────────────────────────────────────────────────────┐│ MIDDLEWARE STACK ││ ┌───────────────────────────────────────────────────────────┐ ││ │ 1. WAF Rules │ ││ │ - Block known bad paths (/.env, /wp-admin) │ ││ │ - Block malicious payloads (SQL injection patterns) │ ││ │ - Block based on headers/country (optional) │ ││ └───────────────────────────────────────────────────────────┘ ││ │ PASS ││ ▼ ││ ┌───────────────────────────────────────────────────────────┐ ││ │ 2. Bot Detection │ ││ │ - Known bot user-agents │ ││ │ - Missing/malformed headers │ ││ │ - Behavioral patterns │ ││ └───────────────────────────────────────────────────────────┘ ││ │ PASS ││ ▼ ││ ┌───────────────────────────────────────────────────────────┐ ││ │ 3. Request Fingerprinting │ ││ │ - IP + headers + TLS fingerprint → stable ID │ ││ │ - Groups requests from same source │ ││ └───────────────────────────────────────────────────────────┘ ││ │ ││ ▼ ││ ┌───────────────────────────────────────────────────────────┐ ││ │ 4. Rate Limiting │ ││ │ - Token bucket per fingerprint │ ││ │ - Sliding window for API routes │ ││ │ - Escalating penalties for repeat offenders │ ││ └───────────────────────────────────────────────────────────┘ │└─────────────────────────────────────────────────────────────────┘ │ PASS ▼┌─────────────────────────────────────────────────────────────────┐│ YOUR APPLICATION ││ (API routes, pages, etc.) 
│└─────────────────────────────────────────────────────────────────┘// lib/shield/rate-limiter.tsimport { LRUCache } from 'lru-cache'; export interface RateLimitConfig { windowMs: number; // Time window in milliseconds maxRequests: number; // Max requests per window blockDurationMs: number; // How long to block after limit exceeded keyGenerator?: (req: Request) => string;} interface RateLimitEntry { count: number; windowStart: number; blockedUntil: number; violations: number; // Track repeat offenders} export class RateLimiter { private cache: LRUCache<string, RateLimitEntry>; private config: RateLimitConfig; constructor(config: RateLimitConfig) { this.config = config; this.cache = new LRUCache({ max: 10000, // Track up to 10K unique clients ttl: config.windowMs * 2 // Expire entries after 2x window }); } check(key: string): { allowed: boolean; remaining: number; resetAt: number; retryAfter?: number; } { const now = Date.now(); let entry = this.cache.get(key); // Check if currently blocked if (entry && entry.blockedUntil > now) { return { allowed: false, remaining: 0, resetAt: entry.blockedUntil, retryAfter: Math.ceil((entry.blockedUntil - now) / 1000) }; } // Initialize or reset window if (!entry || now - entry.windowStart >= this.config.windowMs) { entry = { count: 0, windowStart: now, blockedUntil: 0, violations: entry?.violations || 0 }; } entry.count++; // Check if limit exceeded if (entry.count > this.config.maxRequests) { entry.violations++; // Escalating block duration: 1x, 2x, 4x, 8x... 
up to 1 hour const escalation = Math.min(Math.pow(2, entry.violations - 1), 60); entry.blockedUntil = now + (this.config.blockDurationMs * escalation); this.cache.set(key, entry); return { allowed: false, remaining: 0, resetAt: entry.blockedUntil, retryAfter: Math.ceil((entry.blockedUntil - now) / 1000) }; } this.cache.set(key, entry); return { allowed: true, remaining: this.config.maxRequests - entry.count, resetAt: entry.windowStart + this.config.windowMs }; } // Manual block (for detected abuse) block(key: string, durationMs: number): void { const entry = this.cache.get(key) || { count: 0, windowStart: Date.now(), blockedUntil: 0, violations: 0 }; entry.blockedUntil = Date.now() + durationMs; entry.violations++; this.cache.set(key, entry); } // Check if key is currently blocked isBlocked(key: string): boolean { const entry = this.cache.get(key); return entry ? entry.blockedUntil > Date.now() : false; }} // Pre-configured limiters for different use casesexport const rateLimiters = { // General page views: generous pages: new RateLimiter({ windowMs: 60 * 1000, // 1 minute maxRequests: 100, // 100 req/min blockDurationMs: 60 * 1000 // 1 min block }), // API routes: stricter api: new RateLimiter({ windowMs: 60 * 1000, // 1 minute maxRequests: 30, // 30 req/min blockDurationMs: 5 * 60 * 1000 // 5 min block }), // Auth endpoints: very strict auth: new RateLimiter({ windowMs: 15 * 60 * 1000, // 15 minutes maxRequests: 5, // 5 attempts per 15 min blockDurationMs: 60 * 60 * 1000 // 1 hour block }), // Search/expensive operations heavy: new RateLimiter({ windowMs: 60 * 1000, // 1 minute maxRequests: 10, // 10 req/min blockDurationMs: 10 * 60 * 1000 // 10 min block })};// lib/shield/fingerprint.tsimport { createHash } from 'crypto'; export interface FingerprintComponents { ip: string; userAgent: string; acceptLanguage: string; acceptEncoding: string; connection: string; // TLS fingerprint if available (JA3) tlsFingerprint?: string;} export function extractFingerprint(req: 
Request, ip: string): string { const headers = req.headers; const components: FingerprintComponents = { ip, userAgent: headers.get('user-agent') || 'none', acceptLanguage: headers.get('accept-language') || 'none', acceptEncoding: headers.get('accept-encoding') || 'none', connection: headers.get('connection') || 'none', }; // Create a hash of all components const fingerprint = createHash('sha256') .update(JSON.stringify(components)) .digest('hex') .substring(0, 16); // First 16 chars is enough return fingerprint;} export function getClientIP(req: Request): string { const headers = req.headers; // Check common proxy headers (in order of trustworthiness) // WARNING: Only trust these if you're behind a trusted proxy (Vercel, Cloudflare) const forwardedFor = headers.get('x-forwarded-for'); if (forwardedFor) { // Take the first IP (original client) return forwardedFor.split(',')[0].trim(); } const realIP = headers.get('x-real-ip'); if (realIP) { return realIP; } // Vercel-specific const vercelIP = headers.get('x-vercel-forwarded-for'); if (vercelIP) { return vercelIP.split(',')[0].trim(); } // Cloudflare-specific const cfIP = headers.get('cf-connecting-ip'); if (cfIP) { return cfIP; } // Fallback (usually won't work in serverless) return '0.0.0.0';} // More aggressive fingerprint for sensitive operationsexport function extractStrictFingerprint(req: Request, ip: string): string { const headers = req.headers; // Include more headers for stricter identification const components = { ip, userAgent: headers.get('user-agent') || '', acceptLanguage: headers.get('accept-language') || '', acceptEncoding: headers.get('accept-encoding') || '', accept: headers.get('accept') || '', cacheControl: headers.get('cache-control') || '', pragma: headers.get('pragma') || '', // Screen/viewport hints if available secChUa: headers.get('sec-ch-ua') || '', secChUaPlatform: headers.get('sec-ch-ua-platform') || '', secChUaMobile: headers.get('sec-ch-ua-mobile') || '', }; return createHash('sha256') 
.update(JSON.stringify(components)) .digest('hex') .substring(0, 24);}// lib/shield/bot-detector.ts export interface BotDetectionResult { isBot: boolean; confidence: number; // 0-1 reasons: string[]; category?: 'scraper' | 'crawler' | 'automation' | 'ai-training' | 'security-scanner' | 'unknown';} // Known bot user-agent patternsconst BOT_PATTERNS = { // AI training crawlers aiTraining: [ /gptbot/i, /chatgpt-user/i, /ccbot/i, /anthropic-ai/i, /claude-web/i, /google-extended/i, /cohere-ai/i, /facebookexternalhit.*ai/i, /perplexitybot/i, /youbot/i, ], // Generic crawlers crawlers: [ /googlebot/i, /bingbot/i, /yandexbot/i, /duckduckbot/i, /baiduspider/i, /sogou/i, /exabot/i, /facebot/i, /ia_archiver/i, ], // Automation tools automation: [ /python-requests/i, /python-urllib/i, /axios/i, /node-fetch/i, /go-http-client/i, /java\\//i, /curl\\//i, /wget/i, /httpie/i, /postman/i, /insomnia/i, /scrapy/i, /beautifulsoup/i, /selenium/i, /puppeteer/i, /playwright/i, /phantomjs/i, /headless/i, ], // [security scanners](/insights/web-security-compliance-both-sides) securityScanners: [ /nmap/i, /nikto/i, /sqlmap/i, /wpscan/i, /nuclei/i, /burp/i, /zap/i, /acunetix/i, /nessus/i, /qualys/i, ], // Generic scraper patterns scrapers: [ /bot/i, /spider/i, /crawl/i, /scrape/i, /fetch/i, /http/i, ],}; // Good bots we might want to allow (configure per-use-case)const GOOD_BOTS = [ /googlebot/i, // Google Search /bingbot/i, // Bing Search /slurp/i, // Yahoo /duckduckbot/i, // DuckDuckGo /facebookexternalhit/i, // Facebook link preview /twitterbot/i, // Twitter link preview /linkedinbot/i, // LinkedIn link preview /slackbot/i, // Slack link preview /telegrambot/i, // Telegram link preview /whatsapp/i, // WhatsApp link preview /discordbot/i, // Discord link preview]; export function detectBot(req: Request): BotDetectionResult { const reasons: string[] = []; let confidence = 0; let category: BotDetectionResult['category'] = 'unknown'; const userAgent = req.headers.get('user-agent') || ''; const 
accept = req.headers.get('accept') || ''; const acceptLanguage = req.headers.get('accept-language') || ''; const acceptEncoding = req.headers.get('accept-encoding') || ''; const connection = req.headers.get('connection') || ''; const secFetchMode = req.headers.get('sec-fetch-mode') || ''; // Check 1: No user-agent (definite bot) if (!userAgent || userAgent.length < 10) { reasons.push('missing_or_short_user_agent'); confidence += 0.9; } // Check 2: Known bot patterns for (const [cat, patterns] of Object.entries(BOT_PATTERNS)) { for (const pattern of patterns) { if (pattern.test(userAgent)) { reasons.push(`known_${cat}_pattern: ${pattern.source}`); confidence += 0.8; category = cat as BotDetectionResult['category']; break; } } } // Check 3: Missing standard browser headers if (!acceptLanguage) { reasons.push('missing_accept_language'); confidence += 0.3; } if (!acceptEncoding || !acceptEncoding.includes('gzip')) { reasons.push('missing_gzip_accept_encoding'); confidence += 0.2; } if (!accept || accept === '*/*') { reasons.push('generic_accept_header'); confidence += 0.2; } // Check 4: Missing Sec-Fetch headers (modern browsers send these) if (!secFetchMode && userAgent.includes('Chrome')) { reasons.push('chrome_without_sec_fetch'); confidence += 0.4; } // Check 5: Suspicious user-agent patterns if (userAgent && !/Mozilla|Chrome|Safari|Firefox|Edge|Opera/i.test(userAgent)) { reasons.push('non_browser_user_agent'); confidence += 0.5; } // Check 6: Old browser versions (often spoofed poorly) const chromeMatch = userAgent.match(/Chrome\\/(\\d+)/); if (chromeMatch && parseInt(chromeMatch[1]) < 90) { reasons.push('outdated_chrome_version'); confidence += 0.3; } // Normalize confidence confidence = Math.min(confidence, 1); return { isBot: confidence >= 0.5, confidence, reasons, category };} export function isGoodBot(userAgent: string): boolean { return GOOD_BOTS.some(pattern => pattern.test(userAgent));} export function shouldAllowBot(req: Request, allowGoodBots: boolean = 
true): boolean { const userAgent = req.headers.get('user-agent') || ''; if (allowGoodBots && isGoodBot(userAgent)) { return true; } const detection = detectBot(req); return !detection.isBot;}// lib/shield/waf.ts export interface WAFResult { blocked: boolean; rule?: string; severity: 'low' | 'medium' | 'high' | 'critical';} // Paths that should never be accessed on a Next.js appconst BLOCKED_PATHS = [ // WordPress /\\/wp-admin/i, /\\/wp-login/i, /\\/wp-content/i, /\\/wp-includes/i, /\\/xmlrpc\\.php/i, // Config files /\\/\\.env/i, /\\/\\.git/i, /\\/\\.svn/i, /\\/\\.htaccess/i, /\\/config\\.php/i, /\\/configuration\\.php/i, /\\/settings\\.php/i, /\\/web\\.config/i, // Admin panels /\\/admin\\.php/i, /\\/administrator/i, /\\/phpmyadmin/i, /\\/pma/i, /\\/mysql/i, /\\/adminer/i, // Common vulnerabilities /\\/cgi-bin/i, /\\/shell/i, /\\/cmd/i, /\\/eval/i, /\\/phpinfo/i, // Backup files /\\.bak$/i, /\\.backup$/i, /\\.old$/i, /\\.orig$/i, /\\.save$/i, /\\.swp$/i, /\\.sql$/i, /\\.zip$/i, /\\.tar/i, /\\.gz$/i,]; // SQL injection patternsconst SQL_INJECTION_PATTERNS = [ /(\\%27)|(\\')|(\\-\\-)|(\\%23)|(#)/i, /((\\%3D)|(=))[^\\n]*((\\%27)|(\\')|(\\-\\-)|(\\%3B)|(;))/i, /\\w*((\\%27)|(\\'))((\\%6F)|o|(\\%4F))((\\%72)|r|(\\%52))/i, /((\\%27)|(\\'))union/i, /exec(\\s|\\+)+(s|x)p\\w+/i, /union(\\s+)select/i, /insert(\\s+)into/i, /select(\\s+).+from/i, /drop(\\s+)table/i, /update(\\s+).+set/i, /delete(\\s+)from/i,]; // XSS patternsconst XSS_PATTERNS = [ /<script[^>]*>[\\s\\S]*?<\\/script>/i, /javascript:/i, /on\\w+\\s*=/i, /<iframe/i, /<object/i, /<embed/i, /<svg[^>]*onload/i, /expression\\s*\\(/i,]; // Path traversalconst PATH_TRAVERSAL_PATTERNS = [ /\\.\\.\\//, /\\.\\.%2f/i, /\\.\\.\\\\/, /%2e%2e/i, /\\.%2e/i, /%2e\\./i,]; // Suspicious header valuesconst SUSPICIOUS_HEADERS = [ { header: 'x-forwarded-for', pattern: /[<>"']/i, rule: 'xss_in_xff' }, { header: 'referer', pattern: /<script/i, rule: 'xss_in_referer' }, { header: 'user-agent', pattern: /\\$\\{/i, rule: 'log4j_attempt' 
}, { header: 'user-agent', pattern: /\\{\\{/i, rule: 'ssti_attempt' },]; export function checkWAF(req: Request): WAFResult { const url = new URL(req.url); const path = url.pathname; const query = url.search; const fullUrl = path + query; // Check 1: Blocked paths for (const pattern of BLOCKED_PATHS) { if (pattern.test(path)) { return { blocked: true, rule: `blocked_path: ${pattern.source}`, severity: 'medium' }; } } // Check 2: SQL injection in URL for (const pattern of SQL_INJECTION_PATTERNS) { if (pattern.test(fullUrl)) { return { blocked: true, rule: `sql_injection: ${pattern.source}`, severity: 'critical' }; } } // Check 3: XSS in URL for (const pattern of XSS_PATTERNS) { if (pattern.test(fullUrl)) { return { blocked: true, rule: `xss_attempt: ${pattern.source}`, severity: 'high' }; } } // Check 4: Path traversal for (const pattern of PATH_TRAVERSAL_PATTERNS) { if (pattern.test(fullUrl)) { return { blocked: true, rule: `path_traversal: ${pattern.source}`, severity: 'critical' }; } } // Check 5: Suspicious headers for (const { header, pattern, rule } of SUSPICIOUS_HEADERS) { const value = req.headers.get(header); if (value && pattern.test(value)) { return { blocked: true, rule, severity: 'high' }; } } return { blocked: false, severity: 'low' };} // Check request body for attacks (call this in API routes)export async function checkRequestBody(req: Request): Promise<WAFResult> { try { const contentType = req.headers.get('content-type') || ''; // Only check JSON and form data if (!contentType.includes('json') && !contentType.includes('form')) { return { blocked: false, severity: 'low' }; } const body = await req.text(); // Check for SQL injection in body for (const pattern of SQL_INJECTION_PATTERNS) { if (pattern.test(body)) { return { blocked: true, rule: `sql_injection_in_body: ${pattern.source}`, severity: 'critical' }; } } // Check for XSS in body for (const pattern of XSS_PATTERNS) { if (pattern.test(body)) { return { blocked: true, rule: `xss_in_body: 
${pattern.source}`, severity: 'high' }; } } return { blocked: false, severity: 'low' }; } catch { return { blocked: false, severity: 'low' }; }}// lib/shield/protect-api.tsimport { NextRequest, NextResponse } from 'next/server';import { rateLimiters, RateLimiter } from './rate-limiter';import { extractFingerprint, getClientIP } from './fingerprint';import { checkRequestBody } from './waf'; export interface ProtectOptions { rateLimit?: 'pages' | 'api' | 'auth' | 'heavy' | RateLimiter; checkBody?: boolean; requireAuth?: boolean; logAbuse?: boolean;} export function withProtection( handler: (req: NextRequest) => Promise<Response>, options: ProtectOptions = {}) { return async function protectedHandler(req: NextRequest): Promise<Response> { const ip = getClientIP(req); const fingerprint = extractFingerprint(req, ip); // Get rate limiter const limiter = typeof options.rateLimit === 'string' ? rateLimiters[options.rateLimit] : options.rateLimit || rateLimiters.api; // Check rate limit const limitResult = limiter.check(fingerprint); if (!limitResult.allowed) { if (options.logAbuse) { console.log(`[SHIELD] Rate limit exceeded: ${fingerprint} (IP: ${ip})`); } return new NextResponse( JSON.stringify({ error: 'Too many requests', retryAfter: limitResult.retryAfter }), { status: 429, headers: { 'Content-Type': 'application/json', 'X-RateLimit-Remaining': '0', 'X-RateLimit-Reset': String(limitResult.resetAt), 'Retry-After': String(limitResult.retryAfter), } } ); } // Check request body for attacks if (options.checkBody && (req.method === 'POST' || req.method === 'PUT')) { const bodyCheck = await checkRequestBody(req.clone()); if (bodyCheck.blocked) { if (options.logAbuse) { console.log(`[SHIELD] WAF blocked: ${fingerprint} - ${bodyCheck.rule}`); } // Block this fingerprint for repeated attacks limiter.block(fingerprint, 60 * 60 * 1000); // 1 hour return new NextResponse( JSON.stringify({ error: 'Request blocked' }), { status: 403 } ); } } // Add rate limit headers to successful 
responses const response = await handler(req); const newHeaders = new Headers(response.headers); newHeaders.set('X-RateLimit-Remaining', String(limitResult.remaining)); newHeaders.set('X-RateLimit-Reset', String(limitResult.resetAt)); return new NextResponse(response.body, { status: response.status, headers: newHeaders }); };} // Usage example:// // // app/api/search/route.ts// import { withProtection } from '@/lib/shield/protect-api';// // async function handler(req: NextRequest) {// // Your API logic here// return Response.json({ results: [] });// }// // export const GET = withProtection(handler, { // rateLimit: 'heavy',// logAbuse: true // });// middleware.tsimport { NextRequest, NextResponse } from 'next/server';import { rateLimiters } from './lib/shield/rate-limiter';import { extractFingerprint, getClientIP } from './lib/shield/fingerprint';import { detectBot, isGoodBot } from './lib/shield/bot-detector';import { checkWAF } from './lib/shield/waf'; // Configure which paths to protectconst config = { // Paths to always check (API routes, auth, etc.) 
protectedPaths: ['/api/', '/auth/', '/admin/'], // Paths to skip entirely (static assets, health checks) ignoredPaths: ['/_next/', '/favicon.ico', '/robots.txt', '/health'], // Paths where we allow good bots (for SEO) allowBotsOn: ['/', '/blog/', '/docs/', '/about'], // Paths with strict rate limiting strictPaths: ['/api/auth/', '/api/admin/'],}; export async function middleware(req: NextRequest) { const path = req.nextUrl.pathname; // Skip ignored paths if (config.ignoredPaths.some(p => path.startsWith(p))) { return NextResponse.next(); } const ip = getClientIP(req); const fingerprint = extractFingerprint(req, ip); const userAgent = req.headers.get('user-agent') || ''; // Layer 1: WAF Rules const wafResult = checkWAF(req); if (wafResult.blocked) { console.log(`[SHIELD:WAF] Blocked: ${ip} - ${wafResult.rule}`); // Immediately ban high-severity attacks if (wafResult.severity === 'critical' || wafResult.severity === 'high') { rateLimiters.api.block(fingerprint, 24 * 60 * 60 * 1000); // 24 hour ban } return new NextResponse('Forbidden', { status: 403 }); } // Layer 2: Bot Detection const isProtectedPath = config.protectedPaths.some(p => path.startsWith(p)); const allowBotsHere = config.allowBotsOn.some(p => path.startsWith(p)); if (isProtectedPath && !allowBotsHere) { // Strict bot check on protected paths const botResult = detectBot(req); if (botResult.isBot && !isGoodBot(userAgent)) { console.log(`[SHIELD:BOT] Blocked: ${ip} - ${botResult.category} (${botResult.reasons.join(', ')})`); return new NextResponse( JSON.stringify({ error: 'Automated requests not allowed' }), { status: 403, headers: { 'Content-Type': 'application/json' } } ); } } // Layer 3: Rate Limiting const isStrict = config.strictPaths.some(p => path.startsWith(p)); const limiter = isStrict ? rateLimiters.auth : (path.startsWith('/api/') ? 
```typescript
    rateLimiters.api : rateLimiters.pages);

  const limitResult = limiter.check(fingerprint);

  if (!limitResult.allowed) {
    console.log(`[SHIELD:RATE] Limited: ${ip} (fingerprint: ${fingerprint})`);

    return new NextResponse(
      JSON.stringify({
        error: 'Rate limit exceeded',
        retryAfter: limitResult.retryAfter
      }),
      {
        status: 429,
        headers: {
          'Content-Type': 'application/json',
          'X-RateLimit-Remaining': '0',
          'X-RateLimit-Reset': String(limitResult.resetAt),
          'Retry-After': String(limitResult.retryAfter),
        }
      }
    );
  }

  // Add security headers and rate limit info
  const response = NextResponse.next();
  response.headers.set('X-RateLimit-Remaining', String(limitResult.remaining));
  response.headers.set('X-Content-Type-Options', 'nosniff');
  response.headers.set('X-Frame-Options', 'DENY');
  response.headers.set('X-XSS-Protection', '1; mode=block');
  response.headers.set('Referrer-Policy', 'strict-origin-when-cross-origin');

  return response;
}

export const middlewareConfig = {
  matcher: [
    // Match all paths except static files
    '/((?!_next/static|_next/image|favicon.ico).*)',
  ],
};
```

```typescript
// scripts/test-shield.ts
const BASE_URL = process.env.TEST_URL || 'http://localhost:3000';

interface TestResult {
  name: string;
  passed: boolean;
  expected: string;
  actual: string;
}

async function runTests(): Promise<void> {
  const results: TestResult[] = [];

  // Test 1: Normal request succeeds
  results.push(await testNormalRequest());

  // Test 2: Rate limiting kicks in
  results.push(await testRateLimit());

  // Test 3: Bot user-agent blocked
  results.push(await testBotBlocking());

  // Test 4: Good bot allowed
  results.push(await testGoodBot());

  // Test 5: WAF blocks suspicious paths
  results.push(await testWAFPaths());

  // Test 6: SQL injection blocked
  results.push(await testSQLInjection());

  // Test 7: XSS blocked
  results.push(await testXSS());

  // Test 8: Escalating blocks work
  results.push(await testEscalatingBlocks());

  // Print results
  console.log('\n=== SHIELD TEST RESULTS ===\n');

  let passed = 0;
  let failed = 0;

  for (const result of results) {
    const status = result.passed ? '✅' : '❌';
    console.log(`${status} ${result.name}`);

    if (!result.passed) {
      console.log(`   Expected: ${result.expected}`);
      console.log(`   Actual: ${result.actual}`);
      failed++;
    } else {
      passed++;
    }
  }

  console.log(`\n${passed}/${passed + failed} tests passed`);

  if (failed > 0) {
    process.exit(1);
  }
}

async function testNormalRequest(): Promise<TestResult> {
  const res = await fetch(`${BASE_URL}/api/data`, {
    headers: {
      'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/120.0.0.0 Safari/537.36',
      'Accept': 'application/json',
      'Accept-Language': 'en-US,en;q=0.9',
      'Accept-Encoding': 'gzip, deflate, br',
    }
  });

  return {
    name: 'Normal request succeeds',
    passed: res.status === 200,
    expected: '200',
    actual: String(res.status)
  };
}

async function testRateLimit(): Promise<TestResult> {
  const statusCodes: number[] = [];

  // Send 35 requests back to back (limit is 30/min for API).
  // Awaiting each one keeps arrival order deterministic, so the
  // "first 30 / rest" split below is meaningful.
  for (let i = 0; i < 35; i++) {
    const res = await fetch(`${BASE_URL}/api/data`, {
      headers: {
        'User-Agent': 'Mozilla/5.0 Test Browser',
        'Accept-Language': 'en-US',
        'X-Test-ID': 'rate-limit-test', // Same fingerprint components
      }
    });
    statusCodes.push(res.status);
  }

  const has429 = statusCodes.includes(429);
  const firstCodes = statusCodes.slice(0, 30);
  const laterCodes = statusCodes.slice(30);

  return {
    name: 'Rate limiting kicks in after threshold',
    passed: has429 && laterCodes.every(c => c === 429),
    expected: 'First 30 requests: 200, Rest: 429',
    actual: `First 30: ${[...new Set(firstCodes)]}, Rest: ${[...new Set(laterCodes)]}`
  };
}

async function testBotBlocking(): Promise<TestResult> {
  const res = await fetch(`${BASE_URL}/api/data`, {
    headers: {
      'User-Agent': 'python-requests/2.28.0',
    }
  });

  return {
    name: 'Bot user-agent blocked',
    passed: res.status === 403,
    expected: '403',
    actual: String(res.status)
  };
}

async function testGoodBot(): Promise<TestResult> {
  // Test that Googlebot can access public pages
  const res = await fetch(`${BASE_URL}/`, {
    headers: {
      'User-Agent': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    }
  });

  return {
    name: 'Good bot (Googlebot) allowed on public pages',
    passed: res.status === 200,
    expected: '200',
    actual: String(res.status)
  };
}

async function testWAFPaths(): Promise<TestResult> {
  const blockedPaths = [
    '/wp-admin',
    '/.env',
    '/.git/config',
    '/phpmyadmin',
  ];

  const results = await Promise.all(
    blockedPaths.map(path =>
      fetch(`${BASE_URL}${path}`).then(r => r.status)
    )
  );

  const allBlocked = results.every(status => status === 403 || status === 404);

  return {
    name: 'WAF blocks suspicious paths',
    passed: allBlocked,
    expected: 'All 403/404',
    actual: results.join(', ')
  };
}

async function testSQLInjection(): Promise<TestResult> {
  const res = await fetch(`${BASE_URL}/api/search?q='; DROP TABLE users; --`, {
    headers: {
      'User-Agent': 'Mozilla/5.0 Test Browser',
      'Accept-Language': 'en-US',
    }
  });

  return {
    name: 'SQL injection blocked',
    passed: res.status === 403,
    expected: '403',
    actual: String(res.status)
  };
}

async function testXSS(): Promise<TestResult> {
  const res = await fetch(`${BASE_URL}/api/search?q=<script>alert('xss')</script>`, {
    headers: {
      'User-Agent': 'Mozilla/5.0 Test Browser',
      'Accept-Language': 'en-US',
    }
  });

  return {
    name: 'XSS attempt blocked',
    passed: res.status === 403,
    expected: '403',
    actual: String(res.status)
  };
}

async function testEscalatingBlocks(): Promise<TestResult> {
  // This test requires clean state - run separately
  // Trigger rate limit, wait, trigger again, check longer block

  // For now, just verify the header is present
  const res = await fetch(`${BASE_URL}/api/data`, {
    headers: {
      'User-Agent': 'Mozilla/5.0 Test Browser',
      'Accept-Language': 'en-US',
    }
  });

  const hasRateLimitHeaders = res.headers.has('X-RateLimit-Remaining');

  return {
    name: 'Rate limit headers present',
    passed: hasRateLimitHeaders,
    expected: 'X-RateLimit-Remaining header present',
    actual: hasRateLimitHeaders ? 'Present' : 'Missing'
  };
}

// Exit non-zero on unexpected errors so CI notices a crashed run
runTests().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

Run the tests:

```bash
npx tsx scripts/test-shield.ts
```

Expected output:

```text
=== SHIELD TEST RESULTS ===

✅ Normal request succeeds
✅ Rate limiting kicks in after threshold
✅ Bot user-agent blocked
✅ Good bot (Googlebot) allowed on public pages
✅ WAF blocks suspicious paths
✅ SQL injection blocked
✅ XSS attempt blocked
✅ Rate limit headers present

8/8 tests passed
```

```text
# robots.txt
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
```

The only dependency, in package.json:

```json
{
  "dependencies": {
    "lru-cache": "^10.0.0"
  }
}
```
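
The middleware code above reads four fields from `limiter.check(fingerprint)`: `allowed`, `remaining`, `resetAt`, and `retryAfter`. The rate-limit test needs a running server, but that contract can also be checked in isolation. What follows is a minimal sketch of a fixed-window limiter that satisfies it. `FixedWindowLimiter`, `LimitResult`, the injectable `now` clock, and the choice of epoch milliseconds for `resetAt` are all assumptions made for illustration; this is not the article's actual limiter, which presumably stores its counters in `lru-cache`.

```typescript
// Hypothetical stand-in for the shield's limiter. Names and units are assumptions.
interface LimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: number;    // assumed: epoch ms at which the current window ends
  retryAfter: number; // whole seconds until the window ends (0 when allowed)
}

interface WindowEntry {
  count: number;
  resetAt: number;
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowEntry>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  // `now` is injectable so tests can drive time without sleeping
  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  check(key: string): LimitResult {
    const now = this.now();
    let entry = this.windows.get(key);

    // Start a fresh window on first sight or once the old one has expired
    if (!entry || now >= entry.resetAt) {
      entry = { count: 0, resetAt: now + this.windowMs };
      this.windows.set(key, entry);
    }

    if (entry.count >= this.limit) {
      return {
        allowed: false,
        remaining: 0,
        resetAt: entry.resetAt,
        retryAfter: Math.ceil((entry.resetAt - now) / 1000),
      };
    }

    entry.count++;
    return {
      allowed: true,
      remaining: this.limit - entry.count,
      resetAt: entry.resetAt,
      retryAfter: 0,
    };
  }
}

// Usage: mirror the 30/min API limit from the test script with a fake clock
let fakeNow = 0;
const demoLimiter = new FixedWindowLimiter(30, 60_000, () => fakeNow);
const demoResults = Array.from({ length: 35 }, () => demoLimiter.check('fp-1'));

console.log(demoResults.filter(r => r.allowed).length); // 30
console.log(demoResults[34].retryAfter);                // 60

fakeNow = 60_000; // the window has elapsed
console.log(demoLimiter.check('fp-1').allowed);         // true
```

A plain `Map` never evicts old fingerprints, so memory grows with the number of distinct clients; a bounded cache is the safer store in production. A fixed window also lets a client send up to twice the limit across a window boundary, which is the usual trade-off against a sliding window.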