The Happily Workplace Dataset — Organizational Psychology Data for Research

Name: Happily Workplace Dataset
Creator: Happily.ai
Published: 2026-06-06

The Happily Workplace Dataset is a longitudinal, continuously collected record of employee well-being and workplace behavior, spanning 10M+ interactions across 350+ organizations since 2017. It pairs daily mood check-ins, validated WHO-5 well-being scores, eNPS, peer recognition and feedback networks, manager behavior, and AI-rated power skills with free-text feedback in Thai and English. It is available for license for academic, people-science, and network-science research. Licensing inquiries: tareef@happily.ai.

Most workplace datasets available to researchers are annual engagement surveys or one-time panels: a single snapshot of a moving picture. Happily's data is collected differently. Through an employee-experience platform that companies use every working day, the same people respond to short prompts on most working days, year after year. That produces a continuous, per-person record of mood, well-being, recognition, and feedback, rather than an end-of-year summary.

Because the same individuals respond repeatedly inside real reporting structures, the data is naturally longitudinal and relational. It supports questions that a survey cannot reach: how a habit holds or drifts across a full year, how one team diverges from another over many months, what a trajectory looks like in the 90 days before someone resigns, and how a behavior spreads from one person to the next through an org chart.

The studies on this site are built from this data. Institutions that want to analyze it directly can license it, de-identified and scoped to a research question. There is no public row-level download; every request is reviewed against privacy and customer commitments.

379,866 daily check-ins came from a single 12-month, 72-company cohort, one recent slice of a dataset that has been collected every working day since 2017.

Why this dataset is rare

Five things rarely appear together in workplace data, and they appear here at once:

Daily granularity. Mood and behavior are recorded most working days, not once a year, so short-window dynamics are visible.
Real workplaces, not a lab or a paid panel. The records are generated in the flow of work by employees at companies using the platform.
Network structure. Recognition and feedback are directed events (who recognizes or asks whom), layered on true reporting lines.
Paired text, numbers, and outcomes. Free-text feedback sits next to numeric ratings and later outcomes such as exit, so themes can be read against what happened.
Multilingual and clinically grounded. A largely Thai- and English-speaking population, underrepresented in published workplace research, measured with validated instruments such as the WHO-5.

The dataset at a glance

Scale

10M+ employee interactions across 350+ workplaces.

Time span

Continuous collection since 2017, nine years and counting. The published studies analyze recent windows; the full history is available for licensed extracts.

Structure

Per-person time series plus directed recognition and feedback networks on real reporting lines.

Population

Employees at companies using the Happily platform, primarily Thailand and the wider Southeast Asia region.

Languages

Free text is largely Thai, with English and mixed responses.

Instruments

WHO-5 Well-being Index, eNPS, daily check-ins, weekly stress, peer recognition and feedback, AI-rated power skills.

Unit of analysis

De-identified employee responses, delivered per person or aggregated to team, cohort, or company level. Every extract states its own sample.

Access

By license, scoped to a research question. De-identified extracts or aggregated cohort tables under a data-sharing agreement.

Formats

Tabular extracts (for example CSV or Parquet) with field-level documentation.

Citation

Happily Research (2026). The Happily Workplace Dataset. happily.ai/research/dataset/

Contact

tareef@happily.ai

What's in the data

The data is organized around a handful of recurring signals. Each is captured at the grain shown below, and most can be delivered as a per-person time series or aggregated to team, cohort, or company level.

Signal	Each record represents	What it carries	Cadence
Daily check-in	One employee on one day	Mood on a five-point scale, optional free text	Daily
Weekly stress	One employee, one pulse	Stress on a four-point scale, optional free-text source	Weekly
WHO-5 well-being	One employee, one assessment	Five sub-scores combined into a 0–100 index	Quarterly
eNPS	One employee, one survey	0–10 recommendation score, optional barrier text	Periodic
Peer recognition	One recognition event	Giver → receiver, points, optional values tag	Per event
Peer feedback	One feedback request	Requester → chosen giver, written feedback	Per event
Manager reply	One feedback item	Employee text, manager reply, AI quality flags	Per event
Power skills	One written feedback item	Six-skill volume score plus AI quality rating (0.5–5)	Per event
Performance review	One review	Goal and culture ratings	Periodic
Org hierarchy	One employee	Reporting line, multiple levels deep	Maintained
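As a rough illustration of the grain described above, the sketch below takes daily check-in rows and produces both views: a date-ordered series per person and a mean mood per team and day. The row layout (person_hash, team, day, mood) and the 1–5 mood coding are assumptions for illustration, not the documented extract schema.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical check-in rows: (person_hash, team, day, mood coded 1-5)
checkins = [
    ("p1", "ops", date(2025, 3, 3), 4),
    ("p2", "ops", date(2025, 3, 3), 2),
    ("p1", "ops", date(2025, 3, 4), 5),
    ("p3", "sales", date(2025, 3, 3), 3),
]


def per_person_series(rows):
    """Group check-ins into a date-ordered list of (day, mood) per person."""
    series = defaultdict(list)
    for person, _team, day, mood in rows:
        series[person].append((day, mood))
    return {person: sorted(points) for person, points in series.items()}


def team_daily_mean(rows):
    """Aggregate the same rows to one mean mood per (team, day)."""
    buckets = defaultdict(list)
    for _person, team, day, mood in rows:
        buckets[(team, day)].append(mood)
    return {key: mean(moods) for key, moods in buckets.items()}


series = per_person_series(checkins)
team_means = team_daily_mean(checkins)
```

In practice, any team-level table that leaves a controlled environment would also have to pass the minimum-group rules described under governance.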

Extracts are delivered as de-identified records: stable hashed identifiers in place of names, no company identities, with test and internal accounts removed. The schema and field-level documentation are shared with each engagement so the data can be loaded and analyzed directly.
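The page does not say how the stable hashed identifiers are computed. One common approach, sketched here as an assumption rather than Happily's method, is a keyed hash (HMAC-SHA256): the same person always maps to the same token, so longitudinal records still line up, but nobody without the key can recompute a token from a known email or ID. The field names (employee_id, name, email, company, account_type) are hypothetical.

```python
import hashlib
import hmac

# Hypothetical field names; the real schema is documented per engagement.
DIRECT_IDENTIFIERS = {"employee_id", "name", "email", "company"}
EXCLUDED_ACCOUNT_TYPES = {"test", "internal"}


def pseudonymize(raw_id: str, secret_key: bytes) -> str:
    """Map raw_id to a stable token with a keyed hash (HMAC-SHA256).

    The same input and key always give the same token, but the token
    cannot be recomputed from a known ID without the key.
    """
    digest = hmac.new(secret_key, raw_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated for readability (64 bits)


def de_identify(records: list[dict], secret_key: bytes) -> list[dict]:
    """Drop test/internal accounts, strip direct identifiers, add a hashed ID."""
    out = []
    for rec in records:
        if rec.get("account_type") in EXCLUDED_ACCOUNT_TYPES:
            continue  # excluded before anything else is derived from the record
        clean = {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}
        clean["person_hash"] = pseudonymize(rec["employee_id"], secret_key)
        out.append(clean)
    return out


KEY = b"replace-with-a-managed-secret"  # hypothetical; real keys stay out of code
rows = [
    {"employee_id": "e-001", "name": "A", "account_type": "employee", "mood": 4},
    {"employee_id": "qa-01", "name": "QA", "account_type": "test", "mood": 5},
]
cleaned = de_identify(rows, KEY)
```

A plain unsalted hash would also be stable, but anyone holding a list of emails could hash them and match tokens; keeping a secret key with the data owner closes that gap.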

Measurement instruments

Where possible the data uses established, externally validated measures rather than metrics invented in-house, which keeps results legible against work done elsewhere.

WHO-5 Well-being Index. A five-item scale from the World Health Organization, validated and used in clinical and academic research worldwide, scored 0–100 with established thresholds.
eNPS. The standard single-item measure of whether an employee would recommend their workplace, on a 0–10 scale, with an optional open-ended reason.
Daily check-in. A short daily prompt on how an employee feels, answered on a five-point scale.
Weekly stress. A four-point self-report of stress, with an optional free-text source.
Peer recognition and feedback. Structured records of who recognizes, and gives feedback to, whom, forming directed networks across teams and companies.
Power skills. Six human skills (critical thinking, self-awareness, optimism, leadership, initiative, empathy) extracted and rated from written feedback. Five of the six align with the top human skills in the World Economic Forum's Future of Jobs 2025 report.
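The two externally defined instruments have standard published scoring rules, sketched below for reference. The WHO-5 raw score is the sum of five items rated 0–5, multiplied by 4 to give 0–100, and a score of 50 or below is commonly used as a screening cut-off for poor well-being (Topp et al., 2015). eNPS follows the Net Promoter convention: percent promoters (9–10) minus percent detractors (0–6). Happily's own pipeline may add further handling, for example for partial responses, so treat this as the textbook definitions rather than the dataset's implementation.

```python
from typing import Sequence

# Score at or below which WHO-5 is commonly read as poor well-being.
WHO5_POOR_WELLBEING_CUTOFF = 50


def who5_index(item_scores: Sequence[int]) -> int:
    """WHO-5 percentage score: five items rated 0-5, summed (0-25), times 4."""
    if len(item_scores) != 5:
        raise ValueError("WHO-5 has exactly five items")
    if any(not 0 <= s <= 5 for s in item_scores):
        raise ValueError("each WHO-5 item is rated 0-5")
    return sum(item_scores) * 4


def enps(scores: Sequence[int]) -> float:
    """eNPS: % promoters (9-10) minus % detractors (0-6), from -100 to +100."""
    if not scores:
        raise ValueError("eNPS needs at least one response")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


score = who5_index([3, 3, 2, 4, 3])            # raw 15 -> 60
flagged = score <= WHO5_POOR_WELLBEING_CUTOFF   # False
team_enps = enps([10, 10, 9, 8, 6])            # 3 promoters, 1 detractor -> 40.0
```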

Data packs

The data is most useful when scoped to a question. We package it into named, de-identified data packs, each combining the signals a given research area needs. The sample sizes below are illustrative, drawn from published studies that analyzed only the most recent years of the record. The full archive reaches back to 2017, so a licensed extract can span far more history, and a larger population, than these figures suggest. Each extract is scoped to your design.

Nine years of depth

The complete dataset runs nine years deep, and longer time spans mean more within-person history: multi-year trajectories, full tenure arcs from onboarding to exit, and how habits hold across years.

Data pack	What's inside	Illustrative scale
Well-being Longitudinal	WHO-5 assessments, daily mood, and weekly stress as per-person time series	2,912 employees, 74 companies, 65,626 WHO-5 responses
Recognition & Trust Networks	Directed recognition and peer-feedback graphs layered on org hierarchy	3,446 employees across 31 companies
Manager Behavior & Team Outcomes	Manager reply rate, quality, and timing, with team engagement and cascade structure	633 managers, 60 companies
Attrition Early-Warning	Daily mood trajectories and text complaints paired with exit outcomes	7,717 employees (4,532 left, 3,185 stayed), 39+ organizations
Multilingual Text + Behavior	Thai and English free-text feedback paired with numeric ratings, outcomes, and thematic codes	34,803 eNPS responses; 1,681 coded barrier answers
Power Skills Development	AI-rated six-skill longitudinal panel with repeated measurements	2,630 individuals over 180+ days, 80 companies
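The introduction mentions reading the trajectory in the 90 days before someone resigns. A minimal sketch of that comparison for a single leaver follows, assuming a per-person list of (day, mood) pairs and a known exit date. The function names and the choice to compare against the preceding 90-day window are illustrative, not the method used in the published attrition study.

```python
from datetime import date, timedelta
from statistics import mean


def window_mean(series, start, end):
    """Mean mood over check-ins with start <= day < end, or None if there are none."""
    moods = [mood for day, mood in series if start <= day < end]
    return mean(moods) if moods else None


def pre_exit_shift(series, exit_day, window_days=90):
    """Mean mood in the final window before exit minus the window before that.

    Returns None when either window has no check-ins, so sparse histories
    stay visible instead of being silently treated as "no change".
    """
    width = timedelta(days=window_days)
    recent = window_mean(series, exit_day - width, exit_day)
    earlier = window_mean(series, exit_day - 2 * width, exit_day - width)
    if recent is None or earlier is None:
        return None
    return recent - earlier


# Hypothetical leaver: (day, mood coded 1-5)
history = [
    (date(2025, 2, 10), 4),
    (date(2025, 3, 1), 4),
    (date(2025, 5, 20), 2),
    (date(2025, 6, 10), 3),
]
shift = pre_exit_shift(history, exit_day=date(2025, 6, 30))  # 2.5 - 4 = -1.5
```

For people who stayed, a matched reference date could stand in for exit_day so that both groups are measured the same way.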

Custom cohorts

If your question does not map to a pack, we scope a bespoke de-identified extract: a particular time window, role or tenure band, set of signals, or network slice. Tell us the design and we will tell you what the data can and cannot support.

Governance, ethics, and de-identification

Research access runs on de-identified data, delivered as record-level extracts or aggregated tables. Individual employees are never identified, identifiers are hashed before any data leaves our systems, and test and internal accounts are excluded. Extracts respect minimum aggregation thresholds so that no result can single out a specific person, and company identities are not shared.

Each engagement is governed by a written data-sharing and license agreement that sets out permitted use, retention, and publication terms. We are glad to support an institution's ethics or IRB review and to align the data-sharing terms with it. Where a study needs a higher bar, analysis can be arranged against aggregated cohort tables rather than record-level extracts.

Publication and access safeguards
Safeguard	How it is applied
Direct identifiers	Names, emails, company identities, and internal IDs are excluded from research outputs. Stable hashes are used only inside controlled extracts when longitudinal analysis requires them.
Public aggregation	Published results describe groups, anonymized cohorts, or cross-company segments rather than named people or customers.
Minimum group coverage	Public cross-company segment results require at least three organizations in the segment.
Minimum claim size	Published statistical claims require at least 30 data points, and each study reports its actual sample.
Exclusions	Test and internal accounts are removed before analysis.
Text protection	Emails, phone numbers, and internal identifiers are redacted before text can enter a research output.
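Two of the safeguards above are numeric rules and one is a text filter, so they can be expressed as simple checks. The sketch below encodes the three-organization and 30-data-point thresholds from the table and a deliberately crude email and phone redaction. The regular expressions are assumptions, not Happily's filters, and internal identifiers are left out because their format is not described here.

```python
import re

MIN_ORGS_PER_SEGMENT = 3  # "at least three organizations in the segment"
MIN_DATA_POINTS = 30      # "at least 30 data points" per statistical claim

# Deliberately broad patterns: they prefer over-redaction (for example long
# digit runs such as dates) to letting a real phone number through.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def can_publish_segment(org_ids, n_points):
    """Check one cross-company segment against the two numeric rules."""
    return len(set(org_ids)) >= MIN_ORGS_PER_SEGMENT and n_points >= MIN_DATA_POINTS


def redact(text):
    """Mask emails, then phone-like digit runs, before text enters an output."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


ok = can_publish_segment(["org_a", "org_b", "org_c"], n_points=42)
clean = redact("Call me on +66 81 234 5678 or mail jane.doe@example.com")
```

Counting distinct organization IDs, rather than rows, is what keeps one large customer from satisfying the three-organization rule on its own.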

In short

The findings travel; the individuals do not. We share the structure and the signals needed to do real research, never an identifiable person or company.

How licensing works

Access is inquiry-based and scoped to your project rather than sold off a fixed price list.

Share your research question. Tell us the design, the signals you need, and the population you have in mind.
We scope the extract. We confirm what the data can support, propose a pack or custom cohort, and flag any limitations up front.
Agreement. We put a data-sharing and license agreement in place, aligned with your institution's ethics review.
Delivery. You receive de-identified extracts (for example CSV or Parquet), or access to aggregated cohort tables, with field-level documentation.

To start a conversation, write to tareef@happily.ai with a sketch of what you want to study.

What researchers have already found

The data has already supported a series of open studies, each with its own methodology box and limitations. They show the kinds of questions it can answer:

WHO-5 Dimensions, where rest is consistently the lowest-scoring well-being dimension across quarters.
Trust Networks, where 72% of the most-trusted people across 31 companies hold no management title.
Attrition signals, where complaint themes in daily text are associated with sharply different exit rates.
Power Skills, where optimism is the skill most strongly associated with receiving peer recognition.
The Leadership Cascade, where reply behavior tracks a manager's own boss but not their skip-level.

You can browse all studies to see the breadth of what has been published from this data.

Limitations

Customer selection. The population consists of organizations using Happily, not a probability sample of all workplaces.
Regional concentration. The collection is weighted toward Thailand and Southeast Asia. Generalizability depends on the research question and sample.
Participation. Responses come from employees who choose to answer a prompt. Missing observations can be meaningful and must be handled explicitly.
Platform context. Measures are collected inside a workplace product, and product design can influence when and how people respond.
Living collection. Collection-wide counts grow over time. Published results remain tied to their stated analysis window.
Observational evidence. Most studies can establish association and temporal pattern, not causation.
Access scope. The extract available for a project depends on consent, privacy, customer commitments, and the minimum data needed to answer the approved question.
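The participation point has a practical consequence: a working day without a check-in is an observation in its own right. One way to keep that visible, sketched here under assumptions, is to align each person's check-ins to a working-day calendar and carry missing days as None rather than dropping them. The Monday-to-Friday calendar is an assumption; it ignores public holidays and any company-specific schedules.

```python
from datetime import date, timedelta


def working_days(start, end):
    """Monday-Friday dates from start to end inclusive; holidays are ignored."""
    day = start
    while day <= end:
        if day.weekday() < 5:
            yield day
        day += timedelta(days=1)


def align_with_gaps(observed, start, end):
    """One (day, mood or None) entry per working day, keeping gaps explicit."""
    by_day = dict(observed)
    return [(d, by_day.get(d)) for d in working_days(start, end)]


def response_rate(aligned):
    """Share of working days with an answer."""
    if not aligned:
        return 0.0
    return sum(1 for _d, mood in aligned if mood is not None) / len(aligned)


# Hypothetical person: the Saturday answer falls outside the calendar.
observed = [(date(2025, 3, 3), 4), (date(2025, 3, 5), 3), (date(2025, 3, 8), 5)]
aligned = align_with_gaps(observed, date(2025, 3, 3), date(2025, 3, 9))
rate = response_rate(aligned)  # 2 answered out of 5 working days -> 0.4
```

Whether a gap is then modeled, imputed, or treated as a signal in its own right is a design decision for each study.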

How to cite this dataset

The dataset and the studies built on it are meant to be referenced. When you cite the dataset itself, please attribute it to Happily Research and link this page.

Citation

Happily Research (2026). The Happily Workplace Dataset. happily.ai/research/dataset/

To cite a specific finding, cite the individual study and link its page, for example Happily Research (2026). The Stress Sweet Spot. happily.ai/research/stress-sweet-spot/.

Frequently asked questions

What is the Happily Workplace Dataset?

The Happily Workplace Dataset is a longitudinal, continuously collected record of employee well-being and workplace behavior. It spans 10M+ interactions across 350+ organizations since 2017 and combines daily mood check-ins, WHO-5 well-being, eNPS, peer recognition and feedback networks, manager behavior, and AI-rated power skills.

What variables and measures does the dataset include?

It includes daily mood (5-point), weekly stress (4-point), the WHO-5 Well-Being Index (0–100), eNPS (0–10) with free-text reasons, directed peer recognition and feedback networks, manager reply rate and quality, AI-rated power skills, performance ratings, and multi-level organizational hierarchy.

How large is the dataset and what time period does it cover?

Collection has run continuously since 2017, nine years and counting, across 350+ workplaces and 10M+ employee interactions. Individual published studies analyze recent multi-year windows; licensed extracts can draw on the full historical depth.

Where are participants located and what languages are represented?

Participants are employees at companies using the Happily platform, primarily in Thailand and the wider Southeast Asia region, a workforce underrepresented in published workplace research. Free-text responses are largely Thai, with English and mixed-language text.

How can researchers access or license the dataset?

Access is by license and scoped to a research question. There is no public row-level download. Researchers email tareef@happily.ai with their design; Happily assesses what can be shared under privacy and customer commitments, then scopes a de-identified extract or aggregated cohort tables under a data-sharing agreement.

Is the data anonymized and how is privacy protected?

Yes, the data is de-identified. Direct identifiers are removed, test and internal accounts are excluded, public cross-company segments require at least three organizations, and statistical claims require at least 30 data points. Licensed extracts use stable hashes only where longitudinal analysis requires them and remain governed by the data-sharing agreement.

What license and terms apply, and can it support IRB review?

Each engagement runs under a written data-sharing and license agreement covering permitted use, retention, and publication. Happily supports an institution's ethics or IRB review and can provide aggregated cohort tables rather than record-level data where a higher bar is required.

How should I cite the Happily Workplace Dataset?

Cite it as: Happily Research (2026). The Happily Workplace Dataset. happily.ai/research/dataset/. For a specific study built on the data, cite that study and link its page.

References

Topp, C. W., Østergaard, S. D., Søndergaard, S., & Bech, P. (2015). The WHO-5 Well-Being Index: A Systematic Review of the Literature. Psychotherapy and Psychosomatics, 84(3), 167–176.
Reichheld, F. F. (2003). The One Number You Need to Grow. Harvard Business Review.
World Economic Forum (2025). The Future of Jobs Report 2025.