Why “Satisfied” Customers Still Leave
The customers you’re losing aren’t complaining

Welcome to The Ops Digest!
Each week, we drop no-BS insights and one AI prompt to cut wasted costs, tighten workflows, and eliminate manual grunt work.
Today: most manufacturers and distributors track NPS or CSAT and miss the customers who are already directing their next order to a competitor. Customer Effort Score catches what those metrics don't. And an LLM can score it from interactions you already have, no survey program required. Part 1 of a two-part series.

Stop Measuring Customer Satisfaction. Start Measuring Customer Effort.
Customer Effort Score came out of a contrarian research project at the Corporate Executive Board in 2010. Conventional wisdom at the time said great service meant delighting customers: go above and beyond, exceed expectations, create wow moments. The CEB team studied 125,000 B2B and B2C customers at more than 100 companies to test the idea. The finding was inconvenient. Delight barely moved loyalty. Customers whose expectations were met turned out to be nearly as loyal as customers whose expectations were exceeded.
But effort moved loyalty dramatically, in the wrong direction. Customers who had to work hard to resolve an issue became disloyal at much higher rates than customers whose issues got handled easily. The team published their findings in Harvard Business Review and a book called The Effortless Experience, and they proposed a single new question to capture what they had found: "To what extent do you agree the company made it easy to handle your request?" Scored 1 to 7, strongly disagree to strongly agree. They called it the Customer Effort Score, or CES.
See What Lower-Effort Order Entry Actually Feels Like
Y Meadows’ free Starter App lets you test the automation yourself, on your own time.
Upload one of your own orders - or try a sample - to watch Y Meadows extract every detail automatically and prepare an ERP-ready file.
Play around. Give it a messy one. A weird PDF. An Excel sheet someone saved sideways in 2019…
What the Research Shows
The numbers behind that finding are stark:
96% of customers who had a high-effort interaction reported being more disloyal afterward. Among customers who had a low-effort interaction, only 9% said the same. (Gartner)
Service interactions are roughly 4x more likely to push a customer toward disloyalty than toward loyalty. The downside risk of a hard interaction outweighs the upside of a great one. (Gartner)
Customer effort is 40% more accurate at predicting customer loyalty than customer satisfaction. That's per Andrew Schumacher, Senior Principal Advisor at Gartner. (Gartner)
94% of customers with low-effort experiences intend to repurchase. 88% intend to spend more.
CSAT and NPS don't catch any of this. CSAT asks how satisfied customers were. NPS asks whether they'd recommend you. The Customer Effort Score asks whether they had to work to get what they paid for. The third question turns out to be a much better predictor of whether they keep sending you orders.
An Honest Take
NPS asks your customers to predict their own future behavior. "Would you recommend us?" is a hypothetical question with no real consequence to answering. Manufacturing and distribution buyers are bad at it. The buyer at a homebuilder or a hospital procurement office gives you a 9 because you've been their supplier for three years and the rep is friendly. That score doesn't tell you whether they're already taking quotes from a competitor on the next big project.

Customer Effort Score is different. It asks about something the customer actually experienced in the last interaction. No abstraction. No prediction. Just: was this hard, or was this easy? That's the metric I'd track if I had to pick one. Run NPS or CSAT at the relationship level if you must. But effort is what predicts whether the customer sends you the next PO with no questions asked, or whether they quietly route it to whoever quoted them last week.
Build It: Score CES Without Sending a Survey
The practical problem with Customer Effort Score is the same as with any survey-based metric: B2B customers don't fill them out. Response rates in manufacturing and distribution typically run under 5%, and the customers who do respond are either the angriest or the most polite, neither of which is representative.
The workaround: score effort yourself, from interaction data you already have. An LLM can read an email thread, support ticket, or call summary and score it on the same seven-point scale a calibrated human would. You don't get a perfect Customer Effort Score. You get a directional one across 100% of your interactions instead of a precise one across 5%.
One AI concept this week: using an LLM as a scoring function with calibration examples. The trick is that the calibration examples define what "high effort" and "low effort" actually mean for your business, not Gartner's.
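If you'd rather drive this from a script than a Project, the same pattern is a few lines against the API. A minimal sketch, assuming the anthropic Python SDK and an API key in your environment; the model ID is an assumption, and the system prompt here is a trimmed stand-in for the full Step 2 prompt below:

import anthropic

# Trimmed stand-in for the full Step 2 prompt. The calibration examples
# are what anchor the 1-7 scale to your business.
SYSTEM = (
    "You score customer interactions on the Customer Effort Score (CES) "
    "scale: 1 = very hard, 7 = very easy. Output CES SCORE, two sentences "
    "of REASONING, and EFFORT SIGNALS.\n\n"
    "CALIBRATION\n"
    "LOW EFFORT (6-7): PO confirmed in minutes, tracking sent "
    "automatically, zero follow-up.\n"
    "HIGH EFFORT (1-3): days of silence, channel switches, customer "
    "asked to resend the order."
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def score_interaction(thread_text: str) -> str:
    # One interaction in, one scored block out.
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumption: substitute any current model ID
        max_tokens=500,
        system=SYSTEM,
        messages=[{"role": "user", "content": thread_text}],
    )
    return response.content[0].text

print(score_interaction("Customer emailed a PO Monday. No reply by Wednesday..."))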
Step 1: Create the Project
Go to Claude.ai → Projects → Create New Project. Name it "CES Scoring Desk."
Step 2: Paste this prompt into the project's custom instructions
You score customer interactions on the Customer Effort Score
(CES) scale.
CES asks: "To what extent do you agree the company made it
easy for the customer to handle their request?"
1 = Strongly disagree (very hard)
7 = Strongly agree (very easy)
Scores 5-7 count as "low effort" (good)
Scores 1-4 count as "high effort" (bad)
For each interaction provided, output:
1. CES SCORE (1-7)
2. REASONING (2 sentences citing specific friction or
smoothness)
3. EFFORT SIGNALS detected:
- Channel switches (email to portal to phone)
- Repeat contacts to resolve a single issue
- Time elapsed from first contact to resolution
- Information the customer had to repeat
- Tone shifts (neutral to frustrated)
4. JOURNEY TYPE: classify as one of: reorder, new order,
RMA/return, credit memo dispute, quote request, lead
time question, shipping issue, billing question, other
CALIBRATION
LOW EFFORT (score 6-7):
Customer placed a PO via email. Auto-confirmation in 5
minutes. Tracking link issued automatically when shipment
dropped. Zero follow-up needed. Resolution under 24 hours.
MEDIUM EFFORT (score 4-5):
Customer placed an order. Had to email twice asking for
tracking when the portal showed no data. Got tracking on
the second ask. Order arrived on time. Tone patient but
slightly tense.
HIGH EFFORT (score 1-3):
Customer emailed a PO Monday. No reply by Wednesday.
Customer followed up. Was told to call. Called twice,
got voicemail. Reached the rep Thursday. Rep didn't
have the order in the system, asked customer to resend.
Customer expressed frustration in writing. Order placed
Friday.
INPUT FORMAT
Each interaction is a labeled block:
[INTERACTION 001]
Customer: [name or ID]
Channels involved: [email / portal / phone / chat / mixed]
Date range: [first contact - resolution date]
Content: [paste full thread, ticket, or call summary]
OUTPUT FORMAT (one block per interaction)
Interaction: 001
CES score: 3
Reasoning: Customer had to follow up twice through
different channels because the portal showed no
shipping data when the order had actually shipped.
The final answer was correct but took three contacts
and 48 hours.
Effort signals: 2 channel switches (portal to email
to phone), 3 separate contacts to resolve, 48-hour
resolution time, customer language shifted from
neutral to frustrated.
Journey type: shipping issue
RULES
- Score based only on the content provided. Do not
assume context.
- If the interaction is incomplete (resolution unclear),
note that and score conservatively.
- Be consistent. Use the calibration examples as anchors.
- Never invent details that aren't in the source content.

Step 3: Connect Claude Cowork to your email
Most manufacturers and distributors run customer interactions through email, not through ticketing systems. That leaves the data you need scattered across hundreds of inbox threads, and copying and pasting each one into the project by hand is what kills these efforts before they start.
Claude Cowork handles the data pull. Cowork is Anthropic's desktop agent for non-developers. It connects to Gmail, Microsoft 365, Excel, and other tools, and you give it instructions in plain English. It reads emails, runs them through your project, and writes the results to a spreadsheet without you copy-pasting anything. Cowork is available on Claude Team, Enterprise, and Max plans. If you're on Claude Pro, you can still do the same thing manually by pasting 30 to 50 threads into the project five at a time. Slower, same result.
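If you take the manual path, a short script can at least handle the formatting. A sketch, assuming you've exported threads into Python dicts; the field names (customer, first_date, last_date, body) are hypothetical, so match them to whatever your export produces:

threads = [
    {
        "customer": "acmebuilders.com",  # hypothetical sample thread
        "first_date": "2025-01-06",
        "last_date": "2025-01-08",
        "body": "Full email thread text goes here...",
    },
]

def format_interaction(n: int, thread: dict) -> str:
    # Render one exported thread in the project's INPUT FORMAT.
    return (
        f"[INTERACTION {n:03d}]\n"
        f"Customer: {thread['customer']}\n"
        f"Channels involved: email\n"
        f"Date range: {thread['first_date']} - {thread['last_date']}\n"
        f"Content: {thread['body']}"
    )

# Paste five blocks at a time into the project, per the workflow above.
print("\n\n".join(format_interaction(i + 1, t) for i, t in enumerate(threads)))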
In Claude Desktop, go to Customize, then Connectors, and add Gmail or Microsoft 365 (whichever you use). Authenticate with your work account. That's the setup.
Step 4: Ask Cowork to pull and score the threads
Switch to Cowork mode in Claude Desktop and give it an instruction along these lines:
Pull all email threads from the last 60 days where the
sender's domain is one of my customer domains. My main
customer domains are: [list 5-10 of your largest customer
domains here].
For each thread:
1. Format it as an input block for the CES Scoring Desk
project, using this format:
[INTERACTION ###]
Customer: [sender domain or company name]
Channels involved: email
Date range: [first message - last message]
Content: [full thread]
2. Run each block through the CES Scoring Desk project.
3. Collect the results in an Excel file with columns:
Interaction ID, Customer, Date Range, CES Score,
Journey Type, Reasoning.
4. Save the file to my Desktop as
"CES_Baseline_[today's date].xlsx".
Skip internal-only threads, mass marketing or vendor
emails, and threads with fewer than 2 messages.Cowork will work through this and will probably ask you for clarification a few times on the first run, especially around which domains count as customers and which don't. That's fine. The first run is where you tune the inputs. The second run is where it starts saving you real time.
Step 5: Spot-check the scores before you trust the average
Open the Excel file. Sort by CES Score, lowest first, and read the reasoning column on the bottom 10. Ask: do these actually look like the high-effort interactions to you? Then sort highest first and do the same on the easy ones. You're looking for agreement between the score and what you'd score yourself if you read the thread cold.
If a thread scored a 2 because the customer needed three escalations and a five-day resolution, that's a real signal. If it scored a 2 mostly because the customer used some frustrated language but the issue actually resolved on the first reply, the calibration examples in your project need tuning. Adjust the LOW, MEDIUM, and HIGH anchors and rerun.
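If you'd rather pull the extremes in code than sort in Excel, a few lines of pandas do it. A sketch, assuming the file from Step 4 is on your Desktop; the date in the filename is a placeholder:

import pandas as pd

# Filename follows Step 4's naming convention; adjust the date.
df = pd.read_excel("CES_Baseline_2025-06-02.xlsx")

cols = ["Customer", "CES Score", "Reasoning"]
print(df.sort_values("CES Score").head(10)[cols])                   # hardest 10
print(df.sort_values("CES Score", ascending=False).head(10)[cols])  # easiest 10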
Don't skip this step. The spreadsheet looks authoritative. It isn't, until you've spot-checked the extremes.
Step 6: Calculate your CES
CES is the percentage of interactions scored 5, 6, or 7. If 100 interactions came back and 62 scored 5+, your CES is 62%.
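The arithmetic in pandas, if you're already in that spot-check session (same placeholder filename as above):

import pandas as pd

df = pd.read_excel("CES_Baseline_2025-06-02.xlsx")
low_effort = (df["CES Score"] >= 5).sum()  # scores 5-7 count as low effort
print(f"CES: {low_effort / len(df):.0%} ({low_effort} of {len(df)} scored 5+)")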
Gartner's best-in-class benchmark is 85% or higher. Most manufacturers and distributors who run this for the first time land in the 60-70% range. That's not the bad news. The bad news is they didn't know.
What to Watch Out For
This is a calibrated estimate, not a survey. The score reflects what a careful reader would conclude about the customer's effort, not what the customer would have said. Treat it as directional. Use it to compare journey types, rep workloads, and time periods, not as an absolute number to put in front of the board.
Calibration matters more than dataset size. Spend time tuning the LOW, MEDIUM, and HIGH examples to your business. Generic examples produce generic scores.
Data quality is the ceiling. If your CRM call notes are one line ("called customer, all good"), there's not enough signal for the model to score. Start with the richer sources, full email threads and complete ticket histories.
The Bottom Line
Most manufacturers and distributors are flying blind on customer effort. They send NPS or CSAT surveys quarterly, get back a number that doesn't move much, and assume the customer base is stable. Then a long-time account stops sending POs and nobody saw it coming.
Customer Effort Score is older than most of the metrics on your dashboard. It's also more accurate at predicting where the next order will or won't go. And you don't need a survey program to start using it. You need 30 to 50 customer email threads, a Claude project, and about an hour.
Next issue (Part 2): we'll cluster these scored interactions by journey type and rank them by where customer friction is concentrated. The journeys that matter for manufacturers and distributors are reorders, RMAs, credit disputes, quote follow-ups, and shipping issues. We'll also layer in sentiment, which Gartner's research shows is multiplicative rather than additive when paired with effort. That ranking is where the real ops decisions get made.

👇 👇 👇
If It’s Hard for Your Team, Your Customer Feels It
A customer sends a purchase order.
Your team has to review it, reformat it, and manually enter it into your system.
That “extra work” doesn’t stay internal - it shows up as:
Slower order confirmations
Follow-up emails to fix missing information
Occasional errors in the order