By the Happily.ai People Science team. Last updated: April 22, 2026. Calibrated against outcomes from 350+ growing companies and 10M+ workplace interactions.
A manager effectiveness evaluation is a structured assessment of how well a manager creates the conditions for their team to do good work. It measures behaviors and outcomes — not personality, not perception. Best for People leaders running a quarterly or semi-annual manager review and for CEOs who want a defensible, behavior-grounded answer to "which of our managers actually move the numbers?"
This guide gives you a 12-metric evaluation framework, the template structure, and a clear scoring rubric. The framework is calibrated against outcomes from 350+ companies and reflects the Gallup finding that managers account for at least 70% of the variance in team engagement — meaning the evaluation needs to be sharp enough to surface real differences between managers.
Why Most Manager Evaluations Don't Work
Three common failure modes:
- Evaluating personality, not behavior. "Strong communicator" is a perception. "Holds weekly 1:1s with 80%+ attendance" is a behavior. Only the second is actionable.
- Aggregating to a single 1–5 score. Composite ratings hide where a manager is excelling and where they need help. Useful evaluations always preserve the dimensional breakdown.
- Surveying only the manager's manager. Upward 360 data is the ground truth on manager effectiveness. Skipping it produces evaluations that miss the most important reality — what the team actually experiences.
A useful manager effectiveness evaluation combines behavioral data, team feedback, and manager-of-manager input — and reports them dimensionally rather than as a single score.
The 12-Metric Manager Effectiveness Framework
The 12 metrics below split into four dimensions: People, Performance, Process, and Growth. Each dimension carries equal weight in the composite.
People (4 metrics)
| Metric | What It Measures | Source |
|---|---|---|
| 1:1 cadence and attendance | Whether 1:1s happen as committed (target: 90%+ attendance) | Calendar / platform data |
| Team engagement (eNPS or DEBI) | Direct team-level signal | Engagement platform |
| Recognition behavior | Frequency and breadth of recognition given | Recognition platform |
| Regrettable team attrition | Departures of high-performers in last 12 months | HRIS |
Performance (3 metrics)
| Metric | What It Measures | Source |
|---|---|---|
| Goal achievement rate | % of team OKRs / quarterly goals met | OKR / performance system |
| Team productivity | Velocity, throughput, or domain-specific output | Domain system |
| Quality of work | Defect rate, customer NPS, or quality indicator | Domain system |
Process (3 metrics)
| Metric | What It Measures | Source |
|---|---|---|
| Meeting load | Hours / week spent in meetings (target: under 18 hrs) | Calendar |
| Response time to team feedback | Median days to respond to peer/team feedback | Platform data |
| Decision velocity | Median days from issue raised to resolution | Platform data |
Growth (2 metrics)
| Metric | What It Measures | Source |
|---|---|---|
| Team development cadence | % of team with active development plans | Performance platform |
| Internal mobility outcomes | Promotions / role changes within team in 12 months | HRIS |
Best for: a quarterly composite plus a deep-dive review on the lowest-scoring dimension. Annual evaluations on this framework are too slow to act on.
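The framework above is easy to hold in a small data model. Here is a minimal Python sketch of the scorecard structure, with each dimension carrying equal weight in the composite regardless of how many metrics it contains. The metric keys and the `Scorecard` class are illustrative names, not a published schema:

```python
from dataclasses import dataclass

# The four equally weighted dimensions and their metrics,
# mirroring the tables above. Metric keys are illustrative.
DIMENSIONS = {
    "People": [
        "one_on_one_cadence", "team_engagement",
        "recognition_behavior", "regrettable_attrition",
    ],
    "Performance": [
        "goal_achievement", "team_productivity", "work_quality",
    ],
    "Process": [
        "meeting_load", "feedback_response_time", "decision_velocity",
    ],
    "Growth": [
        "development_cadence", "internal_mobility",
    ],
}

@dataclass
class Scorecard:
    manager: str
    scores: dict  # metric key -> score on the 1-5 rubric

    def dimension_score(self, dimension):
        # Average the metrics inside one dimension.
        metrics = DIMENSIONS[dimension]
        return sum(self.scores[m] for m in metrics) / len(metrics)

    def composite(self):
        # Equal weight per dimension, NOT per metric: People (4 metrics)
        # counts the same as Growth (2 metrics).
        return sum(self.dimension_score(d) for d in DIMENSIONS) / len(DIMENSIONS)
```

Note the design choice the framework implies: because dimensions (not metrics) are equally weighted, a single Growth metric moves the composite twice as much as a single People metric.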
Scoring Rubric
For each metric, score on a 1–5 scale:
| Score | Meaning |
|---|---|
| 5 | Top quartile across the company |
| 4 | Above the company median |
| 3 | At the company median |
| 2 | Below the company median |
| 1 | Bottom quartile, intervention warranted |
Composite interpretation:
| Composite Score | Action |
|---|---|
| 4.2+ | Exceptional manager — promote, give bigger team, use as coach |
| 3.5–4.1 | Strong manager — invest in growth, expand scope incrementally |
| 2.8–3.4 | Average manager — pair with coaching support on weakest dimension |
| Below 2.8 | At-risk manager — structured 90-day intervention required |
The composite hides important detail. Always report the four dimensional scores alongside it.
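The rubric and action bands above translate directly into a small reporting helper. This Python sketch takes the four dimension scores as input and always returns them alongside the composite, per the rule above; the function names and the band wording are illustrative:

```python
def action_band(composite):
    """Map a composite to the action bands in the table above."""
    if composite >= 4.2:
        return "Exceptional - promote, give bigger team, use as coach"
    if composite >= 3.5:
        return "Strong - invest in growth, expand scope incrementally"
    if composite >= 2.8:
        return "Average - pair with coaching on weakest dimension"
    return "At-risk - structured 90-day intervention required"

def report(dimension_scores):
    """Report the four dimensional scores alongside the composite,
    plus the weakest dimension to coach to."""
    composite = sum(dimension_scores.values()) / len(dimension_scores)
    return {
        "dimensions": dimension_scores,
        "composite": round(composite, 2),
        "action": action_band(composite),
        "weakest_dimension": min(dimension_scores, key=dimension_scores.get),
    }

# A manager can land in the "Strong" band while one dimension
# urgently needs coaching -- which is why the composite alone misleads:
# report({"People": 4.5, "Performance": 3.8, "Process": 2.7, "Growth": 3.4})
```

The example in the comment is exactly the case the composite hides: a 3.6 overall with Process at 2.7.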
How to Run the Evaluation: 5-Step Process
Step 1 — Pull behavioral data (1–2 days). Most metrics above can be pulled from existing systems (HRIS, calendar, engagement platform, OKR tool). Don't re-collect what you already have.
Step 2 — Run a short upward survey (1 week). The team's direct experience matters. A 6-question upward survey covering 1:1 quality, feedback, recognition, decision-making, growth conversations, and team conditions takes 5 minutes per team member.
Step 3 — Synthesize into the 12-metric scorecard (roughly 2 hours per manager). At that pace, a dedicated People Ops analyst can score 3–4 managers per working day. AI-assisted scoring (e.g., via a platform) can do dozens per day.
Step 4 — Share the scorecard with the manager — and one level up (1 week). Each manager sees their own scorecard. Their direct manager sees it too. The scorecard is a coaching tool first, an evaluation second.
Step 5 — Coach to the weakest dimension. Pair the manager with a coach (human or AI) targeting the lowest-scoring dimension. Re-baseline at 90 days.
Adapting the Framework to Your Context
The 12-metric structure holds across companies, but weights and source systems shift. Five common adaptations:
| Context | What Changes | What Stays |
|---|---|---|
| Engineering / R&D-heavy company | Performance dimension uses cycle time, deployment frequency, defect rate as the productivity proxies. Goal-achievement weight goes down, quality weight goes up. | All four dimensions; upward survey |
| Customer-facing / GTM org | Performance pulls from CRM (quota attainment, NRR per AE manager). Process metric for "decision velocity" becomes more important. | Behavioral signals, attrition watch |
| Early-stage (≤100 employees) | Skip the OKR system metric if you don't have one. Replace with "% of direct reports who can articulate the team's top 3 priorities this quarter" — a 60-second pulse. | The four dimensions; quarterly cadence |
| Multi-layer org (manager-of-managers) | For M-of-M, the People dimension shifts from "team engagement" to "average team engagement across reports' teams" — and the Growth dimension becomes "internal-mobility outcomes within their org." | Composite + dimensional reporting |
| First time running this evaluation | Run it as a pilot on 8–12 managers, not the whole company. Scoring calibrates faster with a smaller cohort, and you can debug the upward survey before broad rollout. | Cadence intent (quarterly is the goal) |
If you do not have systems for half these metrics, start with the 4 you can pull cleanly. A partial scorecard run quarterly beats a complete scorecard run never.
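The partial-scorecard approach works with the same dimensional averaging: score only the metrics you can pull cleanly, and leave a dimension blank rather than guessing. A minimal Python sketch (metric keys are illustrative):

```python
DIMENSIONS = {
    "People": ["one_on_one_cadence", "team_engagement",
               "recognition_behavior", "regrettable_attrition"],
    "Performance": ["goal_achievement", "team_productivity", "work_quality"],
    "Process": ["meeting_load", "feedback_response_time", "decision_velocity"],
    "Growth": ["development_cadence", "internal_mobility"],
}

def partial_scorecard(scores):
    """Average only the metrics actually collected this quarter.
    A dimension with no data stays None -- report the gap, don't guess."""
    card = {}
    for dim, metrics in DIMENSIONS.items():
        present = [scores[m] for m in metrics if m in scores]
        card[dim] = sum(present) / len(present) if present else None
    return card

# Example: only 4 metrics pulled cleanly this quarter.
quarter_one = {
    "one_on_one_cadence": 4,     # calendar data
    "team_engagement": 3,        # engagement platform
    "regrettable_attrition": 5,  # HRIS
    "goal_achievement": 2,       # OKR tool
}
```

Leaving unscored dimensions as `None` (rather than imputing a 3) keeps the quarterly report honest about what was and wasn't measured.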
Common Mistakes to Avoid (Beyond the Three Above)
Five additional traps we observe in evaluations that fail to drive change:
- Treating the upward survey as a performance review. It is a coaching signal, not a verdict. Frame it that way to direct reports — response quality improves materially when the team trusts the data will not be used punitively.
- Calibrating across functions without context. A 3.4 People score in a high-growth Sales team is different from a 3.4 in a steady-state Engineering team. Calibrate within function before comparing across.
- Letting "Performance" silence the other dimensions. A manager who hits the number while the team falls apart is not a 5 — they are a 5 + 1 + 2 + 1. The composite punishes them; many companies still don't.
- No coaching loop. A scorecard with no follow-through trains managers to game the next one. Every below-3 dimension needs a named intervention by the next quarter.
- Surveying the team without telling them what changed. If managers receive scorecards, the team that gave the data deserves to see — at a minimum — the manager's commitment for the next 90 days. Otherwise the upward survey collapses by quarter 3.
For the broader manager-development view this evaluation feeds into, see our comprehensive leadership development plan template and manager effectiveness scorecard guide.
AI Prompts: Run, Score, and Coach With Your AI Tool
Templates from PDFs are obsolete — any modern LLM can build a manager-effectiveness rubric in 30 seconds. The five prompts below encode the 12-metric framework so the output is opinionated, dimensional, and actionable.
Prompt 1 — Build the upward-survey instrument tailored to your org
Generate a 6-question upward survey for manager effectiveness, tailored
to a [team size] [function] team in a [stage: early/growth/scale] company.
Each question must:
- Map to one of these dimensions: People, Performance, Process, Growth
- Use a 1–5 Likert scale OR a behaviorally specific multiple choice
- Avoid asking about feelings about the manager; ask about observable
behaviors and conditions
- Take the respondent under 60 seconds to answer
Then output a 7th open-ended question that surfaces the single highest-
leverage thing the manager could change in the next 90 days. Avoid
"what should your manager do differently?" — it produces venting,
not signal.
Prompt 2 — Score a manager from data you already have
Score the following manager on the 12-metric framework below. For each
metric, output: score (1–5), the data point that drove the score, and
one sentence of what to coach to next quarter.
12 metrics:
- People: 1:1 cadence/attendance, team eNPS or DEBI, recognition
behavior, regrettable team attrition
- Performance: goal-achievement rate, team productivity, work quality
- Process: meeting load, response time to team feedback, decision velocity
- Growth: team-development cadence, internal-mobility outcomes
Manager data:
- 1:1 attendance over last 8 weeks: [%]
- Team eNPS / DEBI score: [number]
- Recognitions given in 90 days: [count]
- Regrettable attrition (12 mo): [count]
- Q-over-Q goal achievement: [%]
- [add the rest]
After scoring, output the composite, the 4 dimensional sub-scores, and
the single highest-leverage coaching priority.
Prompt 3 — Generate the coaching plan for the lowest dimension
A manager scored:
- People: [X], Performance: [X], Process: [X], Growth: [X]
The weakest dimension is [name]. Generate a 90-day coaching plan that:
- Names 1–2 specific behaviors the manager should change (not concepts)
- Specifies the cadence at which each behavior should be practiced
- Names the leading indicator that would prove the behavior is changing
- Names the lagging indicator that would prove the team is responding
- Includes 1 ritual (weekly or biweekly) the manager should add to their
calendar starting next week
Avoid sending them to a generic leadership-development course.
Behavior change happens through cadence, not curriculum.
Prompt 4 — Diagnose a manager who scores well but loses high-performers
This manager scores 4.0+ on People, Performance, and Process — but has
lost 3 high-performers in the last 9 months. Two left for direct
competitors; one left for a different industry.
Diagnose the most likely root causes ranked by probability. Consider
specifically:
- Recognition that does not differentiate top performers
- Career-development conversations that are absent or generic
- Stretch assignments going to the wrong people
- Manager-of-manager not protecting the manager's headroom
Output 3 specific questions I should ask in the manager's next 1:1 to
test which root cause is operating. Each question should be answerable
without the manager being defensive.
Prompt 5 — Build the calibration script for the People-Ops review
Generate a 60-minute calibration agenda for our quarterly manager-
effectiveness review meeting. Attendees: People Ops lead, function
heads, CEO.
The meeting reviews [N] managers. Output:
- A pre-read structure each function head fills out (5 min per manager)
- The 3 calibration questions we ask for any manager scoring 3.4–4.0
(the band where most disagreement happens)
- A specific protocol for the manager whose composite is below 2.8
(the band that requires action)
- A "what we will NOT do in this meeting" list to prevent it
degrading into politics
Be direct. The goal is alignment on coaching priorities, not consensus
on rankings.
These prompts are useful because they impose the dimensional framework on the AI output. Generic prompts produce generic verdicts; framework-anchored prompts produce coaching plans.
What Most Manager Evaluations Get Wrong
Three traps to avoid:
- Annual cadence. A manager who is struggling in Q1 and gets evaluated in Q4 has lost 2–3 quarters of team experience and likely 1–2 high-performers. Quarterly is the slowest defensible cadence.
- Composite-only reporting. A manager scoring 3.6 may be 4.5 on People and 2.7 on Process — or the reverse. The intervention is completely different. Always preserve the dimensional view.
- Skipping upward feedback. The team's direct experience is the single most important data source on manager effectiveness. Evaluations that exclude it are evaluations of the manager's perceived behavior, not their actual behavior.
Happily.ai's Reported Results
These are Happily-reported outcomes from customer data across 350+ organizations and 10M+ workplace interactions:
- 97% daily adoption rate (vs. ~25% industry average for engagement / culture tooling)
- 40% turnover reduction, equivalent to roughly $480K/year savings for a 100-person company
- +48 point eNPS improvement in the first 12 months
- 9× trust multiplier observed for employees who give recognition vs. those who do not
For competitor outcomes, ask each vendor for their published case studies and verified customer references.
How Happily.ai Powers Manager Effectiveness Evaluations
Happily.ai is a Culture Activation platform that turns the 12-metric framework into an operating cadence. The platform delivers:
- Real-time behavioral data for the People, Process, and Growth metrics — pulled directly from team workflow
- Automated upward survey for the team-experience dimension
- AI-generated scorecard per manager, refreshed monthly
- Coaching nudges delivered to the manager weekly, targeted to their weakest dimension
- 97% daily adoption vs. 25% industry average
See how Happily evaluates manager effectiveness →
Frequently Asked Questions
Q: How do you evaluate a manager's effectiveness? A: Combine behavioral data (1:1 cadence, recognition behavior, response times), team-experience data (upward survey), and outcome data (engagement, attrition, goal achievement). Report dimensionally, not as a single composite. The 12-metric framework above is intentionally opinionated and structured for adoption.
Q: What's the best manager evaluation template? A: For a defensible quarterly evaluation, use the 12-metric scorecard above (4 dimensions: People, Performance, Process, Growth). It maps to behavioral data already in your existing systems and produces a coachable output.
Q: How often should you evaluate manager effectiveness? A: Quarterly is the slowest defensible cadence. Monthly behavioral signals (cadence and recognition behavior) plus a quarterly composite produces both speed and rigor.
Q: How do you measure manager performance objectively? A: Pull from systems already in your stack — calendar (1:1 cadence, meeting load), HRIS (attrition, mobility), engagement platform (eNPS), OKR tool (goal achievement). Combine with a short upward survey for the team-experience dimension.
Q: What's the difference between manager effectiveness and manager performance? A: Manager performance typically refers to the manager's individual contributor work + team output. Manager effectiveness refers specifically to whether the manager creates the conditions for their team to do good work. Effectiveness is the more important metric for sustained team outcomes.
Q: Can AI evaluate manager effectiveness? A: AI can dramatically accelerate the data-pulling and synthesis steps and can generate coaching nudges. The final evaluation should still involve a human reviewer — both for accuracy and for the relational context that AI can't see.
See a Manager Effectiveness Evaluation Built for 2026
Happily.ai delivers continuous behavioral signals on every manager, an automated upward-survey workflow, AI-generated scorecards, and weekly coaching nudges — all at 97% daily adoption.
For Citation
To cite this article: Happily.ai. (2026). Manager Effectiveness Evaluation: Free Template & 12-Metric Framework (2026). Available at https://happily.ai/blog/manager-effectiveness-evaluation-template/