Why "done" needs a definition before anyone writes code
Acceptance criteria do one thing: they move the disagreement from delivery day, when it costs everything, to day 3, when it costs nothing. Every project has a gap between what the client pictured and what the developer heard. The only choice is when you discover it.
The symptoms of a missing definition of done are familiar to anyone who has hired developers: the demo that works on the freelancer’s machine and nowhere else, the "edge case" that turns out to be the main case, the version 2 invoice for things the client thought were in version 1, and the slow-motion argument conducted through milestone disputes on a platform.
The fix is not a longer contract. It is two short documents in plain language: a scope section that says what is in and, explicitly, what is out, and a numbered list of acceptance tests that define done as observable behavior rather than a feeling.
The scope section: plain sentences, with an explicit OUT list
Write scope as sentences a non-technical person can read aloud. The IN list is what everyone focuses on, but the OUT list is what saves the project: it is where mismatched assumptions go to die before they are expensive.
Note what the example below does: it names the known risk (a site behind aggressive bot protection) on day one and prices the uncertainty into the plan, instead of discovering it in week two and renegotiating under pressure.
Example: scope section from a real spec
Delivered format, client fictionalizedPROJECT: competitor-price monitoring scraper for an e-commerce retailer. SCOPE, IN PLAIN SENTENCES: build a monitor that checks 6 named competitor product pages daily, extracts price and stock status, stores history, and emails a digest when any price changes more than 2%. IN: the 6 sites listed in Appendix A; daily 06:00 ET run; CSV export; 12 months of history; email digest. OUT (explicitly): sites beyond the six; captcha-protected pages (flagged: Site #4 uses Cloudflare, see Risks); a web dashboard (quoted separately if wanted later). RISKS, NAMED NOW: Site #4 may block automated access. Fallback documented in the spec; if their protection hardens after delivery, that is new work, flagged in writing today so it is never a surprise.
Acceptance tests: the format, then a full example
A good acceptance test has three properties. It is observable: it describes something that happens, not a quality the code has. It is verifiable by a non-developer: the client can run the check themselves without reading code. And it covers failure, not just success, because how software behaves when the world changes is most of what you are buying.
The set below is the actual format from our scraper spec. AT-3 deserves attention: it is silent-failure insurance. The expensive version of a broken scraper is not the one that stops, it is the one that keeps writing wrong data with confidence. An acceptance test that requires the system to announce its own breakage is worth more than five tests of the happy path.
Example: acceptance tests (scraper project)
Delivered formatAT-1: a run on the agreed test date produces 6 rows with non-null prices, or an explicit per-site failure reason. (No silent gaps.) AT-2: manually edit a stored price in the database; the next run sends the change digest within 10 minutes of completion. (Proves the alert path end to end.) AT-3: against our saved test copy of Site #2 with its layout changed, the scraper reports "layout changed, needs selector update" instead of silently writing wrong data. (Silent-failure insurance.) AT-4: a fresh-machine setup, following the README literally, completes in under 15 minutes by a non-developer. (Proves the handover is real.) The client signs off against this list. Disagreements happen here, on day 3, not at delivery.
A second example set: API integration project
The pattern transfers to any project type. Here is the same discipline applied to a different common freelance job, connecting a client’s order system to a shipping provider’s API, so you can see which parts are structural and which are project-specific.
Structural in both sets: an end-to-end happy path with concrete numbers, a failure-mode test, a recovery test, and a handover test. Project-specific: everything else. Write your own set by asking one question per risk: "what would I check on delivery day to know this risk did not materialize?" Then move that check to the spec.
Example: acceptance tests (API integration project)
Copy and adaptAT-1: creating a test order in {OrderSystem} produces a shipping label in {Provider} within 2 minutes, with weight, dimensions and address matching the order exactly.
AT-2: when {Provider}’s API is unreachable (simulated by blocking the endpoint), orders queue locally and a single alert email is sent; no orders are lost, none are duplicated.
AT-3: when the API comes back, the queue drains automatically and each queued order produces exactly one label. (Recovery without manual surgery.)
AT-4: an order with an invalid address is rejected with a human-readable reason visible in {OrderSystem}, not a stack trace in a log nobody reads.
AT-5: the runbook’s "rotate the API key" procedure, followed literally by a non-developer, completes without downtime.Tie milestones to acceptance tests, and money to milestones
Acceptance criteria earn their keep when payment references them. Fixed price beats hourly for well-specified small projects precisely because the spec carries the definition of done; each milestone names which acceptance tests it satisfies, and payment for that milestone follows written acceptance against those tests.
From the scraper example: milestone 1 (day 5) covered sites 1 to 3, storage, and a manual run, satisfying AT-1 for those sites, with $290 of the balance due on written acceptance. Milestone 2 (day 9) covered the remaining sites including the documented Cloudflare fallback, the scheduler and the digest, satisfying AT-2 and AT-3, with $310 due. The deposit credited against milestone 1; total fixed at $1,090, agreed before any code.
The structure protects both sides. The client never pays for undefined work; the freelancer never argues about whether work is done, because "done" was numbered. If you are the client and a freelancer resists acceptance tests, that is signal. If you are the freelancer and the client cannot agree on them, you have discovered the project’s real risk early and for free.
The handover document: part of done, not a courtesy
Code that works on the developer’s machine, with no docs and no deploy notes, is a future invoice. The handover document is the last acceptance criterion, and AT-4 in the scraper set ("fresh-machine setup in under 15 minutes by a non-developer") exists to make it testable.
A real handover contains: setup as numbered commands tested on a clean machine, deployment notes (where it runs, how it restarts, where logs go), every secret listed in an example env file with a one-line explanation each, KNOWN LIMITS stated plainly (in the scraper: one site falls back to a delayed price feed when blocked; another cannot report stock status, per the OUT list), and WHAT BREAKS FIRST, with the alert behavior and the cost of the likely fix written down in advance.
This whole structure, spec with acceptance tests in front and tested handover behind, is how our development service runs every project, in four domains where we have working demos: scraping, document search, AI integrations and workflow automation. The complete sample below shows both documents at full length. Copy the format for your next freelance hire either way; the format is the part most projects are missing.
Common questions
What is the difference between acceptance criteria and a requirements document?
Requirements describe what should exist; acceptance criteria describe how you will verify it exists, as observable checks a non-developer can run. "The system must be reliable" is a requirement. "When the API is unreachable, orders queue and none are lost" is an acceptance criterion. Small projects can skip the formal requirements document; they should never skip the acceptance tests.
How many acceptance tests does a small project need?
Four to eight, typically: one end-to-end happy path with concrete numbers, one failure mode, one recovery behavior, one handover test, plus one per significant named risk. Past ten on a small project, the list is restating implementation details instead of defining done.
Who writes the acceptance criteria, client or freelancer?
The freelancer drafts them, because the freelancer knows what can fail; the client edits and signs, because the client knows what matters. The sign-off is the point: criteria one side wrote alone protect only that side. Budget an hour of back-and-forth on the draft; it is the highest-leverage hour in the project.