By the Happily.ai People Science team. Last updated: April 22, 2026. Calibrated against outcomes from 350+ growing companies and 10M+ workplace interactions.
A manager effectiveness evaluation is a structured assessment of how well a manager creates the conditions for their team to do good work. It measures behaviors and outcomes — not personality, not perception. Best for People leaders running a quarterly or semi-annual manager review and for CEOs who want a defensible, behavior-grounded answer to "which of our managers actually move the numbers?"
This guide gives you a 12-metric evaluation framework, the template structure, and a clear scoring rubric. The framework is calibrated against outcomes from 350+ companies and reflects the Gallup finding that managers account for at least 70% of the variance in team engagement — meaning the evaluation needs to be sharp enough to surface real differences between managers.
Why Most Manager Evaluations Don't Work
Three common failure modes:
- Evaluating personality, not behavior. "Strong communicator" is a perception. "Holds weekly 1:1s with 80%+ attendance" is a behavior. Only the second is actionable.
- Aggregating to a single 1–5 score. Composite ratings hide where a manager is excelling and where they need help. Useful evaluations always preserve the dimensional breakdown.
- Surveying only the manager's manager. Upward 360 data is the ground truth on manager effectiveness. Skipping it produces evaluations that miss the most important reality — what the team actually experiences.
A useful manager effectiveness evaluation combines behavioral data, team feedback, and manager-of-manager input — and reports them dimensionally rather than as a single score.
The 12-Metric Manager Effectiveness Framework
The 12 metrics below split into four dimensions: People, Performance, Process, and Growth. Each dimension carries equal weight in the composite.
People (4 metrics)
| Metric | What It Measures | Source |
|---|---|---|
| 1:1 cadence and attendance | Whether 1:1s happen as committed (target: 90%+ attendance) | Calendar / platform data |
| Team engagement (eNPS or DEBI) | Direct team-level signal | Engagement platform |
| Recognition behavior | Frequency and breadth of recognition given | Recognition platform |
| Regrettable team attrition | Departures of high-performers in last 12 months | HRIS |
Performance (3 metrics)
| Metric | What It Measures | Source |
|---|---|---|
| Goal achievement rate | % of team OKRs / quarterly goals met | OKR / performance system |
| Team productivity | Velocity, throughput, or domain-specific output | Domain system |
| Quality of work | Defect rate, customer NPS, or quality indicator | Domain system |
Process (3 metrics)
| Metric | What It Measures | Source |
|---|---|---|
| Meeting load | Hours / week spent in meetings (target: under 18 hrs) | Calendar |
| Response time to team feedback | Median days to respond to peer/team feedback | Platform data |
| Decision velocity | Median days from issue raised to resolution | Platform data |
Growth (2 metrics)
| Metric | What It Measures | Source |
|---|---|---|
| Team development cadence | % of team with active development plans | Performance platform |
| Internal mobility outcomes | Promotions / role changes within team in 12 months | HRIS |
Best for: a quarterly composite plus a deep-dive review on the lowest-scoring dimension. Annual evaluations on this framework are too slow to act on.
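The framework above is easy to hold in a small data model. Here is a minimal Python sketch of the scorecard structure, with each dimension carrying equal weight in the composite regardless of how many metrics it contains. The metric keys and the `Scorecard` class are illustrative names, not a published schema:

```python
from dataclasses import dataclass

# The four equally weighted dimensions and their metrics,
# mirroring the tables above. Metric keys are illustrative.
DIMENSIONS = {
    "People": [
        "one_on_one_cadence", "team_engagement",
        "recognition_behavior", "regrettable_attrition",
    ],
    "Performance": [
        "goal_achievement", "team_productivity", "work_quality",
    ],
    "Process": [
        "meeting_load", "feedback_response_time", "decision_velocity",
    ],
    "Growth": [
        "development_cadence", "internal_mobility",
    ],
}

@dataclass
class Scorecard:
    manager: str
    scores: dict  # metric key -> score on the 1-5 rubric

    def dimension_score(self, dimension):
        # Average the metrics inside one dimension.
        metrics = DIMENSIONS[dimension]
        return sum(self.scores[m] for m in metrics) / len(metrics)

    def composite(self):
        # Equal weight per dimension, NOT per metric: People (4 metrics)
        # counts the same as Growth (2 metrics).
        return sum(self.dimension_score(d) for d in DIMENSIONS) / len(DIMENSIONS)
```

Note the design choice the framework implies: because dimensions (not metrics) are equally weighted, a single Growth metric moves the composite twice as much as a single People metric.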
Scoring Rubric
For each metric, score on a 1–5 scale:
| Score | Meaning |
|---|---|
| 5 | Top quartile across the company |
| 4 | Above the company median |
| 3 | At the company median |
| 2 | Below the company median |
| 1 | Bottom quartile, intervention warranted |
Composite interpretation:
| Composite Score | Action |
|---|---|
| 4.2+ | Exceptional manager — promote, give bigger team, use as coach |
| 3.5–4.1 | Strong manager — invest in growth, expand scope incrementally |
| 2.8–3.4 | Average manager — pair with coaching support on weakest dimension |
| Below 2.8 | At-risk manager — structured 90-day intervention required |
The composite hides important detail. Always report the four dimensional scores alongside it.
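The rubric and action bands above translate directly into a small reporting helper. This Python sketch takes the four dimension scores as input and always returns them alongside the composite, per the rule above; the function names and the band wording are illustrative:

```python
def action_band(composite):
    """Map a composite to the action bands in the table above."""
    if composite >= 4.2:
        return "Exceptional - promote, give bigger team, use as coach"
    if composite >= 3.5:
        return "Strong - invest in growth, expand scope incrementally"
    if composite >= 2.8:
        return "Average - pair with coaching on weakest dimension"
    return "At-risk - structured 90-day intervention required"

def report(dimension_scores):
    """Report the four dimensional scores alongside the composite,
    plus the weakest dimension to coach to."""
    composite = sum(dimension_scores.values()) / len(dimension_scores)
    return {
        "dimensions": dimension_scores,
        "composite": round(composite, 2),
        "action": action_band(composite),
        "weakest_dimension": min(dimension_scores, key=dimension_scores.get),
    }

# A manager can land in the "Strong" band while one dimension
# urgently needs coaching -- which is why the composite alone misleads:
# report({"People": 4.5, "Performance": 3.8, "Process": 2.7, "Growth": 3.4})
```

The example in the comment is exactly the case the composite hides: a 3.6 overall with Process at 2.7.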
How to Run the Evaluation: 5-Step Process
Step 1 — Pull behavioral data (1–2 days). Most metrics above can be pulled from existing systems (HRIS, calendar, engagement platform, OKR tool). Don't re-collect what you already have.
Step 2 — Run a short upward survey (1 week). The team's direct experience matters. A 6-question upward survey covering 1:1 quality, feedback, recognition, decision-making, growth conversations, and team conditions takes 5 minutes per team member.
Step 3 — Synthesize into the 12-metric scorecard (roughly 2 hours per manager). At that pace, a dedicated People Ops analyst can score 3–4 managers per working day. AI-assisted scoring (e.g., via a platform) can do dozens per day.
Step 4 — Share the scorecard with the manager — and one level up (1 week). Each manager sees their own scorecard. Their direct manager sees it too. The scorecard is a coaching tool first, an evaluation second.
Step 5 — Coach to the weakest dimension. Pair the manager with a coach (human or AI) targeting the lowest-scoring dimension. Re-baseline at 90 days.
Adapting the Framework to Your Context
The 12-metric structure holds across companies, but weights and source systems shift. Five common adaptations:
| Context | What Changes | What Stays |
|---|---|---|
| Engineering / R&D-heavy company | Performance dimension uses cycle time, deployment frequency, defect rate as the productivity proxies. Goal-achievement weight goes down, quality weight goes up. | All four dimensions; upward survey |
| Customer-facing / GTM org | Performance pulls from CRM (quota attainment, NRR per AE manager). Process metric for "decision velocity" becomes more important. | Behavioral signals, attrition watch |
| Early-stage (≤100 employees) | Skip the OKR system metric if you don't have one. Replace with "% of direct reports who can articulate the team's top 3 priorities this quarter" — a 60-second pulse. | The four dimensions; quarterly cadence |
| Multi-layer org (manager-of-managers) | For M-of-M, the People dimension shifts from "team engagement" to "average team engagement across reports' teams" — and the Growth dimension becomes "internal-mobility outcomes within their org." | Composite + dimensional reporting |
| First time running this evaluation | Run it as a pilot on 8–12 managers, not the whole company. Scoring calibrates faster with a smaller cohort, and you can debug the upward survey before broad rollout. | Cadence intent (quarterly is the goal) |
If you do not have systems for half these metrics, start with the 4 you can pull cleanly. A partial scorecard run quarterly beats a complete scorecard run never.
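The partial-scorecard approach works with the same dimensional averaging: score only the metrics you can pull cleanly, and leave a dimension blank rather than guessing. A minimal Python sketch (metric keys are illustrative):

```python
DIMENSIONS = {
    "People": ["one_on_one_cadence", "team_engagement",
               "recognition_behavior", "regrettable_attrition"],
    "Performance": ["goal_achievement", "team_productivity", "work_quality"],
    "Process": ["meeting_load", "feedback_response_time", "decision_velocity"],
    "Growth": ["development_cadence", "internal_mobility"],
}

def partial_scorecard(scores):
    """Average only the metrics actually collected this quarter.
    A dimension with no data stays None -- report the gap, don't guess."""
    card = {}
    for dim, metrics in DIMENSIONS.items():
        present = [scores[m] for m in metrics if m in scores]
        card[dim] = sum(present) / len(present) if present else None
    return card

# Example: only 4 metrics pulled cleanly this quarter.
quarter_one = {
    "one_on_one_cadence": 4,     # calendar data
    "team_engagement": 3,        # engagement platform
    "regrettable_attrition": 5,  # HRIS
    "goal_achievement": 2,       # OKR tool
}
```

Leaving unscored dimensions as `None` (rather than imputing a 3) keeps the quarterly report honest about what was and wasn't measured.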
Common Mistakes to Avoid (Beyond the Three Above)
Five additional traps we observe in evaluations that fail to drive change:
- Treating the upward survey as a performance review. It is a coaching signal, not a verdict. Frame it that way to direct reports — response quality improves materially when the team trusts the data will not be used punitively.
- Calibrating across functions without context. A 3.4 People score in a high-growth Sales team is different from a 3.4 in a steady-state Engineering team. Calibrate within function before comparing across.
- Letting "Performance" silence the other dimensions. A manager who hits the number while the team falls apart is not a 5 — they are a 5 + 1 + 2 + 1. The composite punishes them; many companies still don't.
- No coaching loop. A scorecard with no follow-through trains managers to game the next one. Every below-3 dimension needs a named intervention by the next quarter.
- Surveying the team without telling them what changed. If managers receive scorecards, the team that gave the data deserves to see — at a minimum — the manager's commitment for the next 90 days. Otherwise the upward survey collapses by quarter 3.
For the broader manager-development view this evaluation feeds into, see our comprehensive leadership development plan template and manager effectiveness scorecard guide.
AI Prompts: Run, Score, and Coach With Your AI Tool
Templates from PDFs are obsolete — any modern LLM can build a manager-effectiveness rubric in 30 seconds. The five prompts below encode the 12-metric framework so the output is opinionated, dimensional, and actionable.
Prompt 1 — Build the upward-survey instrument tailored to your org
Generate a 6-question upward survey for manager effectiveness, tailored
to a [team size] [function] team in a [stage: early/growth/scale] company.
Each question must:
- Map to one of these dimensions: People, Performance, Process, Growth
- Use a 1–5 Likert scale OR a behaviorally specific multiple choice
- Avoid asking about feelings about the manager; ask about observable
behaviors and conditions
- Take the respondent under 60 seconds to answer
Then output a 7th open-ended question that surfaces the single highest-
leverage thing the manager could change in the next 90 days. Avoid
"what should your manager do differently?" — it produces venting,
not signal.
Prompt 2 — Score a manager from data you already have
Score the following manager on the 12-metric framework below. For each
metric, output: score (1–5), the data point that drove the score, and
one sentence of what to coach to next quarter.
12 metrics:
- People: 1:1 cadence/attendance, team eNPS or DEBI, recognition
behavior, regrettable team attrition
- Performance: goal-achievement rate, team productivity, work quality
- Process: meeting load, response time to team feedback, decision velocity
- Growth: team-development cadence, internal-mobility outcomes
Manager data:
- 1:1 attendance over last 8 weeks: [%]
- Team eNPS / DEBI score: [number]
- Recognitions given in 90 days: [count]
- Regrettable attrition (12 mo): [count]
- Q-over-Q goal achievement: [%]
- [add the rest]
After scoring, output the composite, the 4 dimensional sub-scores, and
the single highest-leverage coaching priority.
Prompt 3 — Generate the coaching plan for the lowest dimension
A manager scored:
- People: [X], Performance: [X], Process: [X], Growth: [X]
The weakest dimension is [name]. Generate a 90-day coaching plan that:
- Names 1–2 specific behaviors the manager should change (not concepts)
- Specifies the cadence at which each behavior should be practiced
- Names the leading indicator that would prove the behavior is changing
- Names the lagging indicator that would prove the team is responding
- Includes 1 ritual (weekly or biweekly) the manager should add to their
calendar starting next week
Avoid sending them to a generic leadership-development course.
Behavior change happens through cadence, not curriculum.
Prompt 4 — Diagnose a manager who scores well but loses high-performers
This manager scores 4.0+ on People, Performance, and Process — but has
lost 3 high-performers in the last 9 months. Two left for direct
competitors; one left for a different industry.
Diagnose the most likely root causes ranked by probability. Consider
specifically:
- Recognition that does not differentiate top performers
- Career-development conversations that are absent or generic
- Stretch assignments going to the wrong people
- Manager-of-manager not protecting the manager's headroom
Output 3 specific questions I should ask in the manager's next 1:1 to
test which root cause is operating. Each question should be answerable
without the manager being defensive.
Prompt 5 — Build the calibration script for the People-Ops review
Generate a 60-minute calibration agenda for our quarterly manager-
effectiveness review meeting. Attendees: People Ops lead, function
heads, CEO.
The meeting reviews [N] managers. Output:
- A pre-read structure each function head fills out (5 min per manager)
- The 3 calibration questions we ask for any manager scoring 3.4–4.0
(the band where most disagreement happens)
- A specific protocol for the manager whose composite is below 2.8
(the band that requires action)
- A "what we will NOT do in this meeting" list to prevent it
degrading into politics
Be direct. The goal is alignment on coaching priorities, not consensus
on rankings.
These prompts are useful because they impose the dimensional framework on the AI output. Generic prompts produce generic verdicts; framework-anchored prompts produce coaching plans.
What Most Manager Evaluations Get Wrong
Three traps to avoid:
- Annual cadence. A manager who is struggling in Q1 and gets evaluated in Q4 has lost 2–3 quarters of team experience and likely 1–2 high-performers. Quarterly is the slowest defensible cadence.
- Composite-only reporting. A manager scoring 3.6 may be 4.5 on People and 2.7 on Process — or the reverse. The intervention is completely different. Always preserve the dimensional view.
- Skipping upward feedback. The team's direct experience is the single most important data source on manager effectiveness. Evaluations that exclude it are evaluations of the manager's perceived behavior, not their actual behavior.
Happily.ai's Reported Results
These are Happily-reported outcomes from customer data across 350+ organizations and 10M+ workplace interactions:
- 97% daily adoption rate (vs. ~25% industry average for engagement / culture tooling)
- 40% turnover reduction, equivalent to roughly $480K/year savings for a 100-person company
- +48 point eNPS improvement in the first 12 months
- 9× trust multiplier observed for employees who give recognition vs. those who do not
For competitor outcomes, ask each vendor for their published case studies and verified customer references.
How Happily.ai Powers Manager Effectiveness Evaluations
Happily.ai is a Culture Activation platform that turns the 12-metric framework into an operating cadence. The platform delivers:
- Real-time behavioral data for the People, Process, and Growth metrics — pulled directly from team workflow
- Automated upward survey for the team-experience dimension
- AI-generated scorecard per manager, refreshed monthly
- Coaching nudges delivered to the manager weekly, targeted to their weakest dimension
- 97% daily adoption vs. 25% industry average
See how Happily evaluates manager effectiveness →
Frequently Asked Questions
Q: How do you evaluate a manager's effectiveness? A: Combine behavioral data (1:1 cadence, recognition behavior, response times), team-experience data (upward survey), and outcome data (engagement, attrition, goal achievement). Report dimensionally, not as a single composite. The 12-metric framework above is intentionally opinionated and structured for adoption.
Q: What's the best manager evaluation template? A: For a defensible quarterly evaluation, use the 12-metric scorecard above (4 dimensions: People, Performance, Process, Growth). It maps to behavioral data already in your existing systems and produces a coachable output.
Q: How often should you evaluate manager effectiveness? A: Quarterly is the slowest defensible cadence. Monthly behavioral signals (cadence and recognition behavior) plus a quarterly composite produces both speed and rigor.
Q: How do you measure manager performance objectively? A: Pull from systems already in your stack — calendar (1:1 cadence, meeting load), HRIS (attrition, mobility), engagement platform (eNPS), OKR tool (goal achievement). Combine with a short upward survey for the team-experience dimension.
Q: What's the difference between manager effectiveness and manager performance? A: Manager performance typically refers to the manager's individual contributor work + team output. Manager effectiveness refers specifically to whether the manager creates the conditions for their team to do good work. Effectiveness is the more important metric for sustained team outcomes.
Q: Can AI evaluate manager effectiveness? A: AI can dramatically accelerate the data-pulling and synthesis steps and can generate coaching nudges. The final evaluation should still involve a human reviewer — both for accuracy and for the relational context that AI can't see.
See a Manager Effectiveness Evaluation Built for 2026
Happily.ai delivers continuous behavioral signals on every manager, an automated upward-survey workflow, AI-generated scorecards, and weekly coaching nudges — all at 97% daily adoption.
For Citation
To cite this article: Happily.ai. (2026). Manager Effectiveness Evaluation: Free Template & 12-Metric Framework (2026). Available at https://happily.ai/blog/manager-effectiveness-evaluation-template/