The 8 questions to ask a vibe-code rescue agency before you sign

Published: May 19, 2026

Last updated: May 19, 2026

Most agencies that have rebranded toward vibe-code rescue will not survive a serious technical interview. Here are the eight questions to ask, with the answer a production-engineering shop should give and the red flag patterns that disqualify a body shop.

How to read this list

A vibe-coded prototype that crashes in production has a specific cost curve. Apiiro's enterprise data shows AI-assisted code ships 322% more privilege-escalation paths and 153% more design flaws than human-written code. If your Lovable, Cursor, Bolt.new, or Replit Agent app is going to handle real users, real payments, or real data, you need a rescue engineer. Most agencies that have rebranded toward this category will not survive the eight questions below.

This list is written for technical founders evaluating multiple rescue shops at the same time. Each question is paired with the answer a production-engineering shop should give and the red flag pattern that disqualifies a body shop. Send it to every agency on your shortlist, including Bytewise.

Will you rebuild it in your stack, or harden the one I have?

Look for a default-to-harden posture with a written rewrite trigger. A good answer sounds like: "We harden the existing repo unless authz, data-access, or payments code crosses a specific threshold of unowned surface area. Here is the threshold." Red flag: an instant pitch for a full rebuild before anyone has seen the repository.

MetaCTO reports that roughly 60% of vibe-coded codebases are salvageable. The agency's incentives run opposite to yours: a rebuild keeps the meter running for months, while a harden ships in weeks. Ask for the written criterion that distinguishes the two, and ask to see the last three engagements where the shop chose to harden instead of rebuild. If they cannot name them, assume they default to rewrite.

What is your test-coverage target for this engagement, and how will you prove it on day 30?

Look for a coverage number scoped to specific code paths (authorization, payments, data access), the framework that produces the number (Vitest, Playwright, pytest), and a CI artifact delivered on a specific date. Red flag: "we do not believe in coverage targets" or, equally bad, "100% coverage."

The number that matters is not the global percentage. It is the percentage on the code paths that, when they break, will end your company. Ask the shop which paths they will instrument first and why.
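To make the gap between the global number and the number that matters concrete, here is a minimal sketch in Python. It assumes per-file line counts have already been pulled out of whatever coverage report the shop's framework produces. The directory prefixes, file names, and counts are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileCoverage:
    path: str
    covered_lines: int
    total_lines: int


# Hypothetical directories for the paths that can end the company:
# authorization, payments, and data access.
CRITICAL_PREFIXES = ("src/auth/", "src/payments/", "src/db/")


def coverage_pct(files: list[FileCoverage]) -> float:
    """Line coverage across the given files, as a percentage."""
    total = sum(f.total_lines for f in files)
    if total == 0:
        return 0.0
    covered = sum(f.covered_lines for f in files)
    return 100.0 * covered / total


def critical_coverage_pct(
    files: list[FileCoverage],
    prefixes: tuple[str, ...] = CRITICAL_PREFIXES,
) -> float:
    """Line coverage restricted to files under the critical prefixes."""
    critical = [f for f in files if f.path.startswith(prefixes)]
    return coverage_pct(critical)


# Made-up report: well-tested UI code hides thin coverage on auth and payments.
report = [
    FileCoverage("src/auth/session.py", 40, 100),
    FileCoverage("src/payments/checkout.py", 30, 100),
    FileCoverage("src/ui/banner.py", 290, 300),
]

print(f"global:   {coverage_pct(report):.1f}%")
print(f"critical: {critical_coverage_pct(report):.1f}%")
```

With these invented numbers the global figure is 72.0% while the critical-path figure is 35.0%. A shop that reports only the first number is reporting the wrong one.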

Who owns the GitHub org, the deployment pipeline, and the cloud account when you leave?

The only acceptable answer is: you do, all of it, from day one. The agency works inside your GitHub org, your cloud account, your secrets manager, your DNS, your Terraform state. The hand-off package includes documented ownership of every account, key, and pipeline.

Red flag patterns to watch for: "we host it on our infra during the engagement," "we use our deployment account for staging," "we have a shared Vercel team for clients." Each of these turns the rescue agency into a worse hostage-taker than the platform you are escaping. Lovable's bi-directional GitHub sync, Bolt's browser-only runtime, and Replit's coupled database and hosting each have their own lock-in profile. Replacing platform lock-in with agency lock-in is not a rescue.

Can you show me a real audit report from a previous engagement?

Look for a sanitized, written audit report with redacted client names and a finding structure that maps to production-readiness work. Red flag: "NDAs prevent us from sharing anything" with no sanitized alternative offered.

This is the difference between production engineering and a slide deck. Production engineering produces artifacts. If the shop you are evaluating cannot show you a written audit whose findings are specific enough to act on, they have not done the work before.

What is your row-level security and secrets-rotation posture on day one?

Look for a week-one discovery that enumerates RLS coverage across every Supabase or Postgres table, identifies every credential present in the repo history or shipped to the client, and produces a before/after table. Red flag: vague language like "we will review security as part of the engagement."

This question matters because of CVE-2025-48757, the Lovable RLS-default flaw that affected over 170 apps. It also matters because of Escape.tech's October 2025 scan of 5,600 vibe-coded apps, which surfaced over 2,000 vulnerabilities and 400+ exposed secrets, including medical records and bank account numbers. A shop that does not have a number for this question has not done the work.
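One way the RLS half of that discovery could look is sketched below. It assumes the shop pulls each table's RLS flag and policy count from the standard Postgres catalog (`pg_tables.rowsecurity` and the `pg_policies` view) and turns the rows into a coverage number and a status table. The query is shown for reference and is not executed here. The table names, the sample rows, and the restriction to the `public` schema are assumptions for illustration.

```python
from dataclasses import dataclass

# Catalog query a discovery script might run against Postgres.
# Shown for reference only; limiting it to 'public' is an assumption.
RLS_QUERY = """
SELECT t.tablename,
       t.rowsecurity,
       count(p.policyname) AS policy_count
FROM pg_tables AS t
LEFT JOIN pg_policies AS p
  ON p.schemaname = t.schemaname AND p.tablename = t.tablename
WHERE t.schemaname = 'public'
GROUP BY t.tablename, t.rowsecurity
ORDER BY t.tablename;
"""


@dataclass(frozen=True)
class TableRls:
    table: str
    rls_enabled: bool
    policy_count: int


def classify(row: TableRls) -> str:
    """Coarse status for one table; policies still need a human read."""
    if not row.rls_enabled:
        return "EXPOSED: RLS disabled"
    if row.policy_count == 0:
        # With RLS on and no policies, Postgres denies non-owner roles by default.
        return "LOCKED: RLS on, no policies"
    return "REVIEW: RLS on, read each policy"


def rls_coverage_pct(rows: list[TableRls]) -> float:
    """Share of tables with RLS enabled, as a percentage."""
    if not rows:
        return 0.0
    return 100.0 * sum(r.rls_enabled for r in rows) / len(rows)


def rls_report(rows: list[TableRls]) -> str:
    """Plain-text status table, one line per database table."""
    lines = [f"{'table':<12} {'policies':<9} status"]
    for r in rows:
        lines.append(f"{r.table:<12} {r.policy_count:<9} {classify(r)}")
    return "\n".join(lines)


# Hypothetical rows, in the shape the query above would return.
rows = [
    TableRls("audit_log", True, 0),
    TableRls("invoices", False, 0),
    TableRls("messages", False, 0),
    TableRls("profiles", True, 2),
]

print(rls_report(rows))
print(f"RLS coverage: {rls_coverage_pct(rows):.0f}% of tables")
```

Run once in week one and again at hand-off, the two reports form the before/after table. Note that an enabled flag is only the start: a permissive policy on an RLS-enabled table can expose as much as no RLS at all, which is why the sketch marks those tables for review rather than as safe.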

What do you instrument, and how will I know in production when something breaks?

Look for an observability spine with named tools (Sentry, Datadog, OpenTelemetry), SLOs with specific targets (p95 latency, error rate budget), alert routing to a real on-call rotation, and a runbook artifact you can read. Red flag: "we will add monitoring at the end."
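For readers who want the arithmetic behind "p95 latency" and "error rate budget," here is a minimal sketch. It uses the nearest-rank definition of a percentile and a simple request-based error budget. Real monitoring tools may compute both differently, and the 99.9% target and the sample numbers are assumptions for illustration.

```python
def p95(latencies_ms: list[float]) -> float:
    """95th-percentile latency using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # ceil(0.95 * n) in integer arithmetic, to avoid float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def error_budget_remaining(total: int, failed: int, slo_target: float) -> float:
    """Fraction of the error budget still unspent; negative once it is blown.

    slo_target is the success-rate objective, e.g. 0.999 for 99.9%.
    """
    allowed_failures = (1.0 - slo_target) * total
    if allowed_failures <= 0:
        return 0.0 if failed == 0 else float("-inf")
    return 1.0 - failed / allowed_failures


# Hypothetical window: 100,000 requests, 40 failures, 99.9% success objective.
remaining = error_budget_remaining(total=100_000, failed=40, slo_target=0.999)
print(f"error budget remaining: {remaining:.0%}")

# Hypothetical latency samples in milliseconds.
samples = [float(ms) for ms in range(1, 101)]
print(f"p95 latency: {p95(samples):.0f} ms")
```

At a 99.9% objective, 100,000 requests allow roughly 100 failures, so 40 failures leave about 60% of the budget. A shop with a real observability spine can tell you which alert fires as that number falls and who it pages.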

Ask specifically about LLM cost caps. If your app calls OpenAI, Anthropic, or any inference provider, an uncapped endpoint is a credit-card-draining incident waiting to happen. A rescue shop that does not instrument LLM cost as a first-class metric is not a rescue shop for the AI era.
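A cost cap can be as small as the sketch below: a guard that prices each call from its token counts and refuses the call once a spending ceiling would be crossed. The class and method names are hypothetical, the per-token prices are placeholders rather than any provider's real rates, and a production version would also need a reset window and shared state across processes.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the cap."""


@dataclass
class CostCap:
    cap_usd: float
    spent_usd: float = 0.0  # export this as a metric and alert on it

    def charge(
        self,
        input_tokens: int,
        output_tokens: int,
        usd_per_1k_input: float,
        usd_per_1k_output: float,
    ) -> float:
        """Record the cost of one call, or raise if it would exceed the cap."""
        cost = (
            input_tokens / 1000 * usd_per_1k_input
            + output_tokens / 1000 * usd_per_1k_output
        )
        if self.spent_usd + cost > self.cap_usd:
            raise BudgetExceeded(
                f"call costing ${cost:.2f} would exceed cap of ${self.cap_usd:.2f}"
            )
        self.spent_usd += cost
        return cost


# Placeholder prices, chosen for round numbers only.
cap = CostCap(cap_usd=1.00)
cap.charge(1000, 1000, usd_per_1k_input=0.25, usd_per_1k_output=0.25)
cap.charge(1000, 1000, usd_per_1k_input=0.25, usd_per_1k_output=0.25)

try:
    cap.charge(1000, 1000, usd_per_1k_input=0.25, usd_per_1k_output=0.25)
except BudgetExceeded as exc:
    print(f"blocked: {exc}")
```

The design choice worth asking about is where the check runs. A guard like this belongs on the server, in front of the provider call, using an estimate made before the request is sent. A cap enforced only in the client or only on the provider's billing dashboard reacts after the money is gone.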

When you finish, what exactly do I get, and what happens in week 13?

Look for a fixed hand-off package, written, with specific contents: README, architecture document, on-call runbook, IaC repository (Terraform or equivalent), secrets inventory, alert routing config, post-rescue technical-debt log, and a recorded handoff call. Red flag: open-ended language like "we will keep going as long as you need us."

The economics of open-ended retainers favor the shop, not you. A real rescue has a defined end. The shop should be confident enough in its work to hand the codebase to another team. Ask whether the runbook is good enough that your next hire, on their first on-call shift, can resolve a production incident without paging the agency. If the answer is no, the work is not done.

What part of this will you sub-contract, and to whom?

This is the question most shops hope you do not ask. The honest answer is one of three: "none," "design or pen-testing through a named partner," or "we use offshore contractors for some implementation work." Any of those is acceptable if it is true. Red flag: discovering after kickoff that the "senior engineers" in the proposal are sales staff, and the actual work is being done by junior contractors you were never introduced to.

The principal engineer you meet in the scoping call should be the engineer who leads the work and writes the code. If the shop you are evaluating cannot tell you, by name and tenure, which engineer will write your authz code, you are buying a body shop.

How to score the answers

Two of these eight are non-negotiable. Question 3 (ownership of GitHub, deployment, and cloud accounts) and Question 4 (a real sanitized audit report) are pass or fail. A shop that cannot pass both is not a rescue shop regardless of how the other six go.

The other six sort shops into three tiers. A production-engineering shop will answer six or more of the eight with specific numbers, named tools, and written artifacts. A generalist agency will answer three or four with confident generalities and the rest with sales language. A vibe-coding body shop will pivot every question back to "we can rebuild this for you in our framework." Send this list to every shop on your shortlist. The differences will be obvious within an hour.
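If you are comparing several shops side by side, the scoring can be written down as a small function. This is a sketch under stated assumptions, not a definitive rubric: it counts specific answers across all eight questions, and it assigns the counts the tiers above do not name (five, or fewer than three) to the lower tier. The function name and labels are hypothetical.

```python
# Questions 3 (account ownership) and 4 (sanitized audit report) are pass or fail.
NON_NEGOTIABLE = (3, 4)


def score_shop(specific: dict[int, bool]) -> str:
    """Tier a shop from its answers.

    specific[q] is True when question q was answered with specific numbers,
    named tools, or written artifacts rather than generalities.
    """
    if set(specific) != set(range(1, 9)):
        raise ValueError("expected answers for questions 1 through 8")

    # Gate first: failing either non-negotiable ends the evaluation.
    if not all(specific[q] for q in NON_NEGOTIABLE):
        return "disqualified"

    strong = sum(specific.values())
    if strong >= 6:
        return "production-engineering shop"
    if strong >= 3:
        # Assumption: five specific answers is grouped with three or four.
        return "generalist agency"
    # Assumption: fewer than three specific answers is treated as a body shop.
    return "body shop"


# Hypothetical shop: specific on everything except questions 6 and 7.
answers = {q: True for q in range(1, 9)}
answers[6] = False
answers[7] = False
print(score_shop(answers))
```

The gate runs before the count on purpose. A shop that answers seven questions well but hosts your staging environment on its own account still fails.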