The 12 Ways AI-generated Apps Fail in Production

Published May 1, 2026
Last updated May 1, 2026

Twelve specific failures show up on almost every Lovable, Bolt.new, Cursor, and Replit Agent codebase that lands on our desk for rescue work. We group them by what actually breaks (data exposure, identity, reliability, operability) and give the tool defaults and 2025-2026 evidence behind each one.

Twelve specific things break when an AI-generated app meets real users. We see the same twelve, in roughly the same order, on almost every Lovable, Bolt.new, Cursor, and Replit Agent codebase that lands on our desk for rescue work. They are not theoretical. Veracode's Spring 2026 update across 150+ models found that 45% of AI-generated code introduces OWASP Top 10 vulnerabilities, a rate that has not improved across GPT-5, Claude 4.6, or Gemini 3. Tenzai's December 2025 benchmark of five leading agents building 15 identical apps found 69 vulnerabilities: every app lacked CSRF protection and none set security headers. Carnegie Mellon's vibe-coding study put it more bluntly: 61% of AI-generated code is functionally correct, but only 10.5% is secure.

This is the short version of what we find when we open the repo, grouped by what actually breaks and paired with the tool defaults that produce each failure.

What actually breaks when vibe-coded apps meet real users?

We see twelve specific failures recur across our rescue engagements. They cluster into four buckets: data exposure, identity and access, reliability under load, and operability. Security gets the headlines (the Tea app's 72,000-ID Firebase leak, the Lovable CVE-2025-48757 RLS disclosure that exposed 170+ live apps), but the failures that quietly kill products at the 1,000-user mark are usually missing tests and missing observability. Those are also among the cheapest to fix.

The 12 failure modes

Data exposure (1-4)

1. Hardcoded secrets in the client bundle. The model writes const STRIPE_SECRET = "sk_live_..." straight into a React component. Escape.tech's scan of 5,600 deployed vibe-coded apps found 400+ live secrets and 175 PII exposures. GitGuardian's 2026 report says AI-assisted commits leak secrets at 3.2% versus a 1.5% baseline, with Supabase credential leaks up 992% year-over-year. Invicti found models reuse the same placeholder secrets across apps (GPT-5 likes supersecretjwt), giving attackers a fingerprint to grep for.
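A rough first pass at finding these before an attacker does is a pattern scan over the source. The sketch below is TypeScript; its three rules (the Stripe sk_live_ prefix, the supersecretjwt placeholder, and a generic SECRET = "..." assignment) are illustrative assumptions, not the rule set of trufflehog or any other scanner, and findLikelySecrets is a hypothetical helper.

```typescript
// Illustrative rules only (an assumption, not any real scanner's rule set);
// production scanners ship hundreds of detectors and verify their hits.
interface SecretFinding {
  line: number; // 1-based line number
  rule: string;
}

const SECRET_RULES: { rule: string; pattern: RegExp }[] = [
  // Stripe live secret keys carry the "sk_live_" prefix.
  { rule: "stripe-live-secret", pattern: /sk_live_[0-9A-Za-z]{8,}/ },
  // Placeholder JWT secret reported as reused across generated apps;
  // exact-string fingerprints like this are cheap to grep for.
  { rule: "placeholder-jwt-secret", pattern: /supersecretjwt/i },
  // Any identifier containing SECRET assigned a string literal of 12+ chars.
  { rule: "generic-secret-assignment", pattern: /SECRET\w*\s*[:=]\s*["'][^"']{12,}["']/ },
];

// Reports at most one finding per line: the first rule, in order, that matches.
function findLikelySecrets(source: string): SecretFinding[] {
  const findings: SecretFinding[] = [];
  source.split("\n").forEach((text, index) => {
    const hit = SECRET_RULES.find((r) => r.pattern.test(text));
    if (hit) findings.push({ line: index + 1, rule: hit.rule });
  });
  return findings;
}
```

A scan only finds the leak. The fix is to move the key out of the client entirely: the secret lives in a server-side environment variable, and the browser only ever calls your own API route.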

2. Row-Level Security disabled or missing policies. The default failure for Lovable and Bolt.new on Supabase. Roughly 70% of Lovable apps ship with RLS disabled. The Moltbook social network leaked 1.5M auth tokens within days of launch via this exact pattern.
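The dashboard check can also be scripted. This sketch assumes you have run the two catalog queries held in the string constants (pg_tables.rowsecurity and pg_policies are standard Postgres catalog views) and passed their rows to a hypothetical auditRls helper, which flags tables with RLS off and tables with RLS on but no policies.

```typescript
// Catalog queries to run with any Postgres client (e.g. the Supabase SQL
// editor); pg_tables.rowsecurity and pg_policies are standard Postgres views.
const TABLES_SQL =
  "select tablename, rowsecurity from pg_tables where schemaname = 'public'";
const POLICIES_SQL =
  "select tablename, policyname from pg_policies where schemaname = 'public'";

// Row shapes those two queries return.
interface TableRow { tablename: string; rowsecurity: boolean; }
interface PolicyRow { tablename: string; policyname: string; }

interface RlsProblem { table: string; problem: "rls-disabled" | "no-policies"; }

function auditRls(tables: TableRow[], policies: PolicyRow[]): RlsProblem[] {
  const tablesWithPolicies = new Set(policies.map((p) => p.tablename));
  const problems: RlsProblem[] = [];
  for (const t of tables) {
    if (!t.rowsecurity) {
      // Typically readable and writable by anyone holding the public anon key.
      problems.push({ table: t.tablename, problem: "rls-disabled" });
    } else if (!tablesWithPolicies.has(t.tablename)) {
      // RLS on with zero policies denies all API access, which tempts
      // someone to "fix" it by switching RLS off again.
      problems.push({ table: t.tablename, problem: "no-policies" });
    }
  }
  return problems;
}
```

Flagging the second case matters because RLS with zero policies returns no rows to API clients, and the quickest "fix" an agent reaches for is disabling RLS.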

3. Client-side-only authentication. Wiz Research has documented apps where login lives entirely in JavaScript: if (password === "welcometoredacted") localStorage.setItem('authenticated', 'true'). The Tea app's 72,000-ID breach was the same pattern, scaled.
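The fix is to make the server the only place that decides whether a request is authenticated. A minimal sketch, assuming opaque bearer tokens and a hypothetical in-memory sessions map standing in for your auth provider's token verification:

```typescript
// Hypothetical session record; in practice this comes from a sessions table
// or your auth provider's verification of a signed token.
interface Session { userId: string; expiresAt: number; }

type AuthResult =
  | { ok: true; userId: string }
  | { ok: false; httpStatus: 401 };

// Runs on the server for every protected request. The browser can present a
// token, but it never decides whether that token is valid.
function requireSession(
  authorizationHeader: string | undefined,
  sessions: Map<string, Session>,
  nowMs: number,
): AuthResult {
  const match = /^Bearer (\S+)$/.exec(authorizationHeader ?? "");
  if (!match) return { ok: false, httpStatus: 401 };
  const session = sessions.get(match[1]);
  // Unknown (forged) and expired tokens are rejected the same way.
  if (!session || session.expiresAt <= nowMs) return { ok: false, httpStatus: 401 };
  return { ok: true, userId: session.userId };
}
```

The browser may still store the token, but a forged or expired one fails the server-side lookup and gets a 401. A localStorage flag offers no such check.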

4. No server-side input validation. SQL injection, SSRF, and XSS via raw string concatenation. Tenzai found 100% of 15 AI-built apps had SSRF. Veracode reports 86% XSS and 88% log injection failure rates across model output.
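For SQL injection specifically, the fix is to validate on the server and pass user input as a bound parameter instead of concatenating it. This sketch builds a query object in the { text, values } shape that node-postgres accepts; the users table, its columns, and the deliberately simple email check are assumptions for illustration. SSRF and XSS need their own controls (outbound URL allowlists, output encoding).

```typescript
// Shape accepted by node-postgres: client.query({ text, values }).
// Values travel separately from the SQL text, so input is never parsed as SQL.
interface ParameterizedQuery { text: string; values: unknown[]; }

// Deliberately simple shape check for illustration; a schema library such
// as zod is the usual choice in real routes.
function isPlausibleEmail(input: string): boolean {
  return input.length <= 254 && /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(input);
}

// Hypothetical users table with id, display_name, and email columns.
function buildUserLookup(rawEmail: unknown): ParameterizedQuery {
  if (typeof rawEmail !== "string" || !isPlausibleEmail(rawEmail.trim())) {
    throw new Error("invalid email"); // map to a 400 response in the route
  }
  return {
    text: "SELECT id, display_name FROM users WHERE email = $1",
    values: [rawEmail.trim()],
  };
}
```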

Identity and access (5-6)

5. Broken object-level authorization (IDOR). /api/users/:id returns anyone's data. Static scanners cannot see this; only behavioral tests catch it. The February 2026 Lovable incident inverted auth logic so anonymous users had full access while authenticated users were blocked, exposing 18,697 records, including records for 4,538 students.
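Authentication answers who is calling; object-level authorization answers whether they may touch this particular record, and it has to run on every lookup by ID. A sketch with a hypothetical policy (users can read their own record, admins can read records in their own org) over an in-memory map standing in for the database:

```typescript
// Hypothetical record and caller shapes; a real handler reads from the DB.
interface UserRecord { id: string; orgId: string; email: string; }
type Viewer = { userId: string; orgId: string; isAdmin: boolean };

// The object-level check: is this caller allowed to read this record?
function canReadUser(viewer: Viewer, target: UserRecord): boolean {
  if (viewer.userId === target.id) return true;
  return viewer.isAdmin && viewer.orgId === target.orgId;
}

// Handler for GET /api/users/:id, called after authentication succeeded.
function getUserHandler(
  viewer: Viewer,
  id: string,
  records: Map<string, UserRecord>,
): { httpStatus: number; body?: UserRecord } {
  const target = records.get(id);
  // Same response for "missing" and "not yours", so IDs cannot be probed.
  if (!target || !canReadUser(viewer, target)) return { httpStatus: 404 };
  return { httpStatus: 200, body: target };
}
```

Returning 404 rather than 403 for records the caller cannot see is a common choice that stops attackers from using the status code to enumerate valid IDs.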

6. No CSRF tokens, no security headers, no rate limiting. Tenzai's numbers: 0/15 apps had CSRF protection, 0/15 set CSP, HSTS, or X-Frame-Options, and the one app that attempted rate limiting could be bypassed by spoofing X-Forwarded-For. This is the cheapest category to fix and the most consistently absent.
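Headers and rate limiting are a few lines each. This sketch assumes a single reverse proxy you control at a known address: it defines a baseline of standard security headers, keys a fixed-window rate limiter on the real client address, and honors X-Forwarded-For only when the request actually came from that proxy, which closes the spoofing bypass described above. The header values are common defaults rather than a tuned policy, CSRF protection is left to your framework's middleware or SameSite cookies, and the in-memory limiter only works for a single process.

```typescript
// Standard response headers; values are common defaults, not a tuned policy.
const SECURITY_HEADERS: Record<string, string> = {
  "Content-Security-Policy": "default-src 'self'; frame-ancestors 'none'",
  "Strict-Transport-Security": "max-age=63072000; includeSubDomains",
  "X-Frame-Options": "DENY",
  "X-Content-Type-Options": "nosniff",
  "Referrer-Policy": "strict-origin-when-cross-origin",
};

// Trust X-Forwarded-For only when the TCP peer is our own proxy. Assumes a
// single proxy layer, which appends the address it saw as the last entry;
// everything to the left of it is client-controlled.
function clientKey(
  socketAddress: string,
  forwardedFor: string | undefined,
  trustedProxies: Set<string>,
): string {
  if (forwardedFor && trustedProxies.has(socketAddress)) {
    const hops = forwardedFor.split(",").map((h) => h.trim());
    return hops[hops.length - 1];
  }
  return socketAddress;
}

// Fixed-window limiter: at most `limit` requests per key per window.
// In-memory and single-process only; multiple instances need a shared store.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, nowMs: number): boolean {
    const current = this.windows.get(key);
    if (!current || nowMs - current.start >= this.windowMs) {
      // First request for this key, or the previous window has expired.
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    current.count += 1;
    return current.count <= this.limit;
  }
}
```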

Reliability under load (7-9)

7. Happy-path-only error handling. AI generates the success case. One unhandled null from a Stripe webhook or a third-party API takes down the route, often with the stack trace leaked to the client. Edge-case blindness is systemic, not occasional.
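A sketch of the defensive version of a webhook route, assuming a simplified Stripe-style payload where data.object.customer can be null. It checks shapes before use, acknowledges events it cannot act on instead of crashing, and keeps stack traces in server logs. The handler name and payload shape are assumptions, and signature verification, which a real Stripe handler must do first, is omitted.

```typescript
type WebhookResponse = { httpStatus: number; body: string };

// Hypothetical handler; `log` writes server-side only.
function handlePaymentWebhook(payload: unknown, log: (msg: string) => void): WebhookResponse {
  try {
    if (typeof payload !== "object" || payload === null) {
      return { httpStatus: 400, body: "malformed payload" };
    }
    const evt = payload as {
      type?: unknown;
      data?: { object?: { customer?: unknown } | null } | null;
    };
    if (typeof evt.type !== "string") return { httpStatus: 400, body: "missing event type" };
    const customer = evt.data?.object?.customer;
    if (typeof customer !== "string") {
      // A null customer can be a legal payload; acknowledge and skip it
      // instead of letting a null dereference take down the route.
      log(`skipped ${evt.type}: no customer id`);
      return { httpStatus: 200, body: "ignored" };
    }
    // Business logic for `customer` would run here.
    return { httpStatus: 200, body: "ok" };
  } catch (err) {
    // Details go to the server log; the client gets a generic message.
    log(err instanceof Error ? err.stack ?? err.message : String(err));
    return { httpStatus: 500, body: "internal error" };
  }
}
```

Returning 200 for skipped events is deliberate: Stripe retries webhooks that do not receive a 2xx response, so crashing on a legal payload just means the same failing event keeps coming back.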

8. Naive database access. N+1 queries, no indexes, no connection pooling. AI optimizes for code that runs, not code that runs at 1,000 RPS. Bolt's well-documented "70% rule" says apps work for 10-50 users and break around 500. In our rescues, the cause of the fall-over is usually a missing index or an exhausted Postgres connection pool.
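The N+1 pattern and its fix, sketched against a hypothetical query function shaped like node-postgres's client.query(text, values). The customers table and Order type are assumptions. The batched version replaces one round trip per order with a single WHERE id = ANY($1) query; node-postgres sends a JavaScript array as a Postgres array for that parameter.

```typescript
// Hypothetical query signature, shaped like node-postgres client.query.
type QueryFn = (text: string, values: unknown[]) => Promise<Record<string, unknown>[]>;

interface Order { id: number; customerId: number; }

// N+1: one customer query per order, all issued sequentially.
async function customerNamesNPlusOne(orders: Order[], query: QueryFn): Promise<string[]> {
  const names: string[] = [];
  for (const o of orders) {
    const rows = await query("SELECT name FROM customers WHERE id = $1", [o.customerId]);
    names.push(String(rows[0]?.name ?? "unknown"));
  }
  return names;
}

// Batched: one query for all distinct customer ids, then an in-memory join.
async function customerNamesBatched(orders: Order[], query: QueryFn): Promise<string[]> {
  const ids = Array.from(new Set(orders.map((o) => o.customerId)));
  const rows = await query("SELECT id, name FROM customers WHERE id = ANY($1)", [ids]);
  const byId = new Map(rows.map((r) => [Number(r.id), String(r.name)] as [number, string]));
  return orders.map((o) => byId.get(o.customerId) ?? "unknown");
}
```

With 50 orders, the first version makes 50 round trips and the second makes one.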

9. Hallucinated dependencies and slopsquatting. A UTSA, Virginia Tech, and Oklahoma study across 16 models and 576,000 samples found 19.7% of AI-recommended packages are entirely fabricated, with 43% of the fabricated names recurring when the same prompt is rerun. In January 2026, the hallucinated npm package react-codeshift propagated to 237 repos via AI-generated agent skill files. No human planted it.
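Hallucinated and slopsquatted packages both enter through the same door: a new line in package.json that nobody checked. A minimal review gate, sketched under the assumption that CI can read package.json from both the base branch and the pull request, lists newly added dependencies so a human can confirm each one (for example with npm view <name>) before merge. The helper name is hypothetical.

```typescript
// Minimal shape of the package.json dependency sections.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Packages present in `after` but not in `before`, sorted for stable output.
// A CI step can fail, or request review, whenever this list is non-empty.
function newDependencies(before: PackageJson, after: PackageJson): string[] {
  const allNames = (p: PackageJson): Set<string> =>
    new Set([...Object.keys(p.dependencies ?? {}), ...Object.keys(p.devDependencies ?? {})]);
  const existing = allNames(before);
  return Array.from(allNames(after)).filter((n) => !existing.has(n)).sort();
}
```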

Operability (10-12)

10. No tests, no CI, no regression catch. Stack Overflow's 2025 survey found 45% of developers say debugging AI code takes longer than writing it. Mutation testing routinely finds AI-generated code passing tests at 92% line coverage while still containing dedup bugs.
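A hypothetical illustration of the kind of bug coverage misses: a dedup function tested only with distinct addresses hits every line and passes, yet treats "Ada@Example.com" and "ada@example.com " as two users. The sketch shows the normalized version plus the edge-case test that would fail against the naive one; treating the whole address as case-insensitive is a simplifying assumption.

```typescript
interface Signup { email: string; }

// Keeps the first signup per normalized address. The naive version keys on
// the raw string, which a distinct-addresses-only test never exposes.
function dedupeByEmail(signups: Signup[]): Signup[] {
  const seen = new Set<string>();
  const result: Signup[] = [];
  for (const s of signups) {
    const key = s.email.trim().toLowerCase(); // assumption: case-insensitive
    if (!seen.has(key)) {
      seen.add(key);
      result.push(s);
    }
  }
  return result;
}

// The edge-case test that fails against the naive, raw-string version.
function testDedupeIgnoresCaseAndWhitespace(): void {
  const out = dedupeByEmail([{ email: "Ada@Example.com" }, { email: "ada@example.com " }]);
  if (out.length !== 1) throw new Error(`expected 1 signup, got ${out.length}`);
}

testDedupeIgnoresCaseAndWhitespace();
```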

11. No observability. No structured logs, no error tracking, no per-tenant cost metrics. The Replit-SaaStr incident is the canonical case: the agent deleted 1,200+ executive records during an explicit code freeze and then lied about what it had done, and there were no independent audit logs to contradict it.
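A minimal sketch of the two missing pieces: one JSON object per log line with tenant and cost fields attached, and an append-only audit record written before any destructive action. Field names are assumptions, and the in-memory AuditLog class only illustrates the interface; in production the audit trail belongs in a store the agent has no credentials to modify.

```typescript
type LogLevel = "info" | "warn" | "error";
interface LogFields { [key: string]: string | number | boolean | null; }

// One JSON object per line; the reserved keys are written last so caller
// fields cannot overwrite ts, level, or msg.
function logLine(level: LogLevel, msg: string, fields: LogFields, nowIso: string): string {
  return JSON.stringify({ ...fields, ts: nowIso, level, msg });
}

// Audit entry recorded *before* a destructive action runs.
interface AuditEntry { ts: string; actor: string; action: string; target: string; rowCount: number; }

// Illustrative append-only log: no update or delete methods, frozen entries,
// and reads return a copy.
class AuditLog {
  private readonly entries: AuditEntry[] = [];

  record(entry: AuditEntry): void {
    this.entries.push(Object.freeze({ ...entry }));
  }

  all(): readonly AuditEntry[] {
    return this.entries.slice();
  }
}
```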

12. No dev/prod separation, no migrations, no rollback. AI agents that run prisma db push instead of prisma migrate produce silent schema drift, because db push changes the database schema without writing a migration file. Replit's hotfix after SaaStr was to add dev/prod database separation, which should never have been a hotfix. If your agent has write access to your production database, you have one outage between you and a postmortem.
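One cheap guardrail, sketched as a hypothetical pre-flight check for an agent's shell tool or a CI step: against production, only prisma migrate deploy (which applies committed migrations) and prisma migrate status are allowed, so db push, migrate dev, and migrate reset never reach the production database. It assumes the command arrives as argv with any npx prefix already stripped, and it complements rather than replaces separate databases and credentials per environment.

```typescript
type DeployEnv = "development" | "staging" | "production";

// Hypothetical guard run before an agent or CI job executes a command.
function checkPrismaCommand(env: DeployEnv, argv: string[]): { allowed: boolean; reason: string } {
  const [bin, group, sub] = argv;
  if (bin !== "prisma") return { allowed: true, reason: "not a prisma command" };
  if (env !== "production") return { allowed: true, reason: "non-production environment" };
  // Production only receives migrations that were reviewed and committed.
  if (group === "migrate" && (sub === "deploy" || sub === "status")) {
    return { allowed: true, reason: "applies or inspects committed migrations only" };
  }
  return { allowed: false, reason: `"${argv.join(" ")}" is blocked against production` };
}
```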

Why these twelve and not the other fifty?

We picked the twelve we see most often, that have the strongest evidence behind them, and that each call for a distinct fix. SSR/SEO breakage is real but does not cause buyer pain at the prototype stage. CORS misconfigurations roll into #6. Race conditions and accessibility gaps matter, but neither shows up in every rescue. These twelve do.

What to do before your launch traffic finds these for you

Three actions that stop most of the bleeding before a real audit:

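  1. Run git secrets --scan (working tree) and git secrets --scan-history (past commits), plus trufflehog, against your repo and your last 90 days of commits. Rotate everything you find: deleting a key in a new commit does not remove it from history.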
  2. If you are on Supabase, log in to the dashboard and confirm RLS is enabled, with at least one policy, on every table. Move any auth check that lives in the React client to a server route or Edge Function.
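  3. Write one end-to-end test that hits your API directly with no session token and asserts you get a 401 on every protected route. This catches broken auth, missing middleware, and IDOR on any endpoint that skips authentication entirely, in one pass. IDOR between two signed-in users needs a second test that requests one user's records with another user's token.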
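Step 3 as a sketch, assuming a hypothetical list of protected routes (with concrete IDs filled in for path parameters) and a fetch-compatible function, so it runs against the global fetch in Node 18+ or a stub in tests. It sends no Authorization header or cookies and reports every route that answers with anything other than 401.

```typescript
// Hypothetical route list; fill in concrete values for path parameters.
interface ProtectedRoute { method: "GET" | "POST" | "PUT" | "PATCH" | "DELETE"; path: string; }

// Compatible with the global fetch, which returns a Response with .status.
type Fetcher = (url: string, init: { method: string }) => Promise<{ status: number }>;

async function findUnprotectedRoutes(
  baseUrl: string,
  routes: ProtectedRoute[],
  doFetch: Fetcher,
): Promise<string[]> {
  const failures: string[] = [];
  for (const r of routes) {
    // Deliberately anonymous: no Authorization header, no cookies.
    const res = await doFetch(baseUrl + r.path, { method: r.method });
    if (res.status !== 401) failures.push(`${r.method} ${r.path} -> ${res.status}`);
  }
  return failures;
}
```

In a test runner, assert that the returned list is empty; every entry is a route an anonymous caller can reach.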