How accurate is migraine prediction?

Published research puts the realistic ceiling at roughly 0.60 to 0.70 AUC. The HAPRED-II 2026 trial reports about 0.59 for its cohort baseline, rising to ~0.66 for personalised models after a month of personal data. Hermly targets the same range. Anything claiming 90%+ accuracy in this domain is almost certainly overfit.

How long until predictions are useful?

Cohort predictions work from day one. Personal predictions noticeably improve after about 14 days of data and stabilise around day 30, mirroring the HAPRED-II 2026 study findings.

Does my health data leave my phone?

No. Sleep, HRV, cycle, attack records, pain logs, medications, and the personalised model weights all stay on your iPhone. Only your subscription state and anonymous event counters reach our servers.

Is Hermly a medical device?

No. Hermly is a wellness app. It is not FDA-cleared for diagnosis or treatment. Predictions are informational. Always discuss your migraine care with your doctor.

Methodology

How Hermly forecasts migraine attacks.

A 24-hour probability, on your iPhone, recomputed every time you open Today. Below is exactly how — what we read, what we don't, which research it builds on, and the accuracy ceiling we honestly can't yet beat.

Reading time · 7 min
Last updated · May 2026

In one paragraph

Hermly reads 26 daily signals from HealthKit, your menstrual cycle, WeatherKit, and an optional 1-tap stress check. A cohort machine-learning model trained on a consented research cohort gives the baseline. A per-user personalisation layer trained on-device adapts the prediction to you over the first 30 days. Your raw health data never leaves the phone. Realistic accuracy is around 0.66 AUC after a month of data — in line with the best peer-reviewed results in the field. Anything higher published in this space tends not to replicate.

The signal pipeline

From sensors to a single number.

Every refresh, Hermly fetches the day's signals from four sources, derives 26 features, and runs them through a two-layer model. Everything below the dashed line happens on your iPhone, in milliseconds, with no network call.

Figure 1 · Signal sources at top, two-layer model in the middle, conformal-wrapped output at the bottom. The dashed line is the device boundary — nothing below it makes a network call.

The 26 signals

What Hermly reads each day.

Drawn from four sources, processed into 26 raw features plus four derived "today vs. your baseline" deltas. The model handles missing values gracefully — a phone-only user still gets a useful prediction; an Apple Watch + Cycle user gets a more precise one.

From HealthKit (10)

HRV (24h average + 30d baseline ratio)
Sleep duration, efficiency, and deep-sleep fraction
Resting heart rate (today + baseline diff)
Wrist temperature anomaly (Watch Series 8+)
Activity / step counts vs. your baseline
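
The "today vs. your baseline" pairing used by several of these signals can be sketched as follows. This is a minimal illustration in Python for readability, not Hermly's shipping code; the function name `baseline_pair` and the sample HRV values are hypothetical, and the baseline is assumed to be a plain mean of the available history.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def baseline_pair(
    today: Optional[float],
    history: Sequence[Optional[float]],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (ratio, delta) of today's value against the history mean.

    Missing inputs produce None instead of a made-up number, so a
    downstream model can treat the feature as missing.
    """
    observed = [v for v in history if v is not None]
    if today is None or not observed:
        return None, None
    baseline = mean(observed)
    ratio = today / baseline if baseline != 0 else None
    return ratio, today - baseline


# Hypothetical HRV readings in milliseconds; one day in the history is missing.
hrv_ratio, hrv_delta = baseline_pair(42.0, [50.0, None, 54.0, 52.0])
print(hrv_ratio, hrv_delta)
```

Returning `None` for a missing reading, rather than zero or the baseline itself, keeps a gap in the data visible as a gap.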

From the menstrual cycle (3)

Cycle day (1–28+)
Phase (menstrual / follicular / ovulatory / luteal)
High-risk-window flag (perimenstrual + ovulatory days)

From WeatherKit (5)

Current barometric pressure
24-hour pressure change
Pressure-drop event flag (drop > 5 hPa)
Local-history pressure z-score (12-month window)
Humidity
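
As a rough sketch of how the three derived pressure features above could be computed from raw readings: the function name, argument names, and sample values below are assumptions for illustration, and the z-score uses the sample standard deviation of whatever local history is supplied. Only the 5 hPa drop threshold comes from the list above.

```python
from statistics import mean, stdev
from typing import Dict, Optional, Sequence


def pressure_features(
    current_hpa: float,
    hpa_24h_ago: float,
    history_hpa: Sequence[float],
    drop_threshold_hpa: float = 5.0,
) -> Dict[str, Optional[float]]:
    """Derive 24-hour change, drop-event flag, and local-history z-score."""
    change = current_hpa - hpa_24h_ago
    z_score = None
    if len(history_hpa) >= 2:
        spread = stdev(history_hpa)
        if spread > 0:
            z_score = (current_hpa - mean(history_hpa)) / spread
    return {
        "pressure_hpa": current_hpa,
        "change_24h_hpa": change,
        # A fall of more than the threshold counts as a drop event.
        "drop_event": change < -drop_threshold_hpa,
        "z_score": z_score,
    }


# Hypothetical readings: an 8 hPa fall against a short local history.
features = pressure_features(1005.0, 1013.0, [1010.0, 1014.0, 1018.0, 1012.0, 1016.0])
print(features)
```

With too little history to estimate a spread, the z-score is left as `None` so it reads as missing rather than as "normal".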

Temporal & history (4)

Day of week
Days since last attack
Attacks in past 7 days
Attacks in past 30 days
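
These four history features need nothing more than the attack log and a calendar. A minimal sketch, assuming the log is a list of attack dates and that each window covers today plus the preceding days (the exact window boundaries are our assumption here, and the function name is hypothetical):

```python
from datetime import date
from typing import Dict, Optional, Sequence


def history_features(today: date, attack_dates: Sequence[date]) -> Dict[str, Optional[int]]:
    """Derive day-of-week and attack-history counts from a log of attack dates."""
    past = [d for d in attack_dates if d <= today]

    def count_within(days: int) -> int:
        # Counts attacks on today and the (days - 1) days before it.
        return sum(1 for d in past if (today - d).days < days)

    return {
        "day_of_week": today.weekday(),  # Monday = 0
        "days_since_last_attack": (today - max(past)).days if past else None,
        "attacks_past_7d": count_within(7),
        "attacks_past_30d": count_within(30),
    }


# Hypothetical attack log.
log = [date(2025, 5, 10), date(2025, 5, 1), date(2025, 4, 20), date(2025, 3, 1)]
print(history_features(date(2025, 5, 13), log))
```

A user with an empty log gets `None` for days since last attack, not a large placeholder number.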

Optional self-report (4)

Daily perceived stress · 1 tap, 5 buttons, end-of-day reflection. Anchored on the 0–10 Likert used in the HAPRED-I research diary.
Recent attack flag (within 36 h) — derived from your own attack log; one of only two predictors in the published HAPRED-I model.

Stress is opt-in. The picker shows five words — Calm · Mild · Moderate · High · Severe — never a number. Skipping a day leaves the feature missing, never a fabricated "low stress" reading.
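
One way to encode the two self-report inputs is sketched below. The 0 to 4 ordinal mapping of the five words is an assumption made for illustration (the actual scale mapping is not specified here), and the function names are hypothetical. The key behaviour matches the text above: a skipped day stays missing.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence

STRESS_LEVELS = ("Calm", "Mild", "Moderate", "High", "Severe")


def encode_stress(choice: Optional[str]) -> Optional[int]:
    """Map the picked word to an ordinal 0..4; a skipped day stays None."""
    if choice is None:
        return None
    return STRESS_LEVELS.index(choice)


def had_attack_last_36h(now: datetime, attack_times: Sequence[datetime]) -> bool:
    """True if any logged attack started within the 36 hours before now."""
    window = timedelta(hours=36)
    return any(timedelta(0) <= now - t <= window for t in attack_times)


now = datetime(2025, 5, 13, 6, 51)
print(encode_stress("High"), encode_stress(None))
print(had_attack_last_36h(now, [datetime(2025, 5, 12, 1, 0)]))
```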

Architecture

Two layers, both on your device.

Cohort model

Trained on a consented beta cohort (50 participants, 90 days of HealthKit + diary data). The base learner is XGBoost (gradient-boosted decision trees), chosen because it natively handles missing values, which every multi-source health signal eventually has, and converts in a single step to a Core ML .mlpackage. The model ships with the app and updates over-the-air, but your data is never sent back to train it.

Why not a transformer? Tabular gradient boosting still beats deep learning on small-N tabular data, per Shwartz-Ziv & Armon (2022). We re-validated on our pilot data; XGBoost won.

Per-user personalisation

The cohort model is one size for everyone. To adapt to you, Hermly stacks a small logistic regression head on top, trained on-device from your own labelled days. The published HAPRED-II 2026 trial showed this style of continuous Bayesian update lifts AUC from 0.59 in the first two weeks to 0.66 after a month — meaningful, even if the ceiling stays modest.

Read the trial: HAPRED-II, Houle et al., medRxiv 2026.
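
To make the idea of a small per-user head concrete, here is a minimal sketch of a one-input logistic regression stacked on the cohort probability. It is not the HAPRED-II Bayesian update and not Hermly's on-device code: it uses plain stochastic gradient descent on log-loss as a stand-in, and the class name, learning rate, and sample values are all assumptions.

```python
import math


def logit(p: float) -> float:
    p = min(max(p, 1e-6), 1 - 1e-6)  # keep the log finite at 0 and 1
    return math.log(p / (1 - p))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class PersonalHead:
    """Logistic head on the cohort logit: p = sigmoid(a * logit(p_cohort) + b).

    Starting at a=1, b=0 means the head reproduces the cohort prediction
    until the user's own labelled days move it.
    """

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.a = 1.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, p_cohort: float) -> float:
        return sigmoid(self.a * logit(p_cohort) + self.b)

    def update(self, p_cohort: float, had_attack: bool) -> None:
        """One gradient step on log-loss for a single labelled day."""
        x = logit(p_cohort)
        error = self.predict(p_cohort) - float(had_attack)
        self.a -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


head = PersonalHead()
print(head.predict(0.30))          # equals the cohort value before any labels
for _ in range(14):                # hypothetical: 14 attack days at cohort p=0.30
    head.update(0.30, had_attack=True)
print(head.predict(0.30))          # moves upward as this user's labels come in
```

Initialising the head as the identity is a design choice worth noting: with no labels, the personal prediction is exactly the cohort prediction, so the first days are never worse than the baseline.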

Conformal interval

The point estimate ("73%") is the mid-point of a wider honest interval. Hermly wraps the model output in a split conformal prediction band so the UI can communicate uncertainty when it's high (early days, sparse signals). When the band is wide, we say so; when it's tight, we trust the number.

Method: Angelopoulos & Bates, "A Gentle Introduction to Conformal Prediction" (2021).
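
A minimal sketch of split conformal prediction, using the absolute residual between predicted probability and outcome as the nonconformity score (one standard choice described in the reference above). The calibration values are invented for illustration, the function names are hypothetical, and how Hermly chooses its score and coverage level is not specified here.

```python
import math
from typing import Sequence, Tuple


def conformal_halfwidth(
    cal_probs: Sequence[float],
    cal_labels: Sequence[int],
    alpha: float = 0.25,
) -> float:
    """Half-width of a band targeting (1 - alpha) coverage on held-out days."""
    scores = sorted(abs(y - p) for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected quantile rank
    if rank > n:
        return 1.0  # too few calibration days: the band is uninformative
    return scores[rank - 1]


def band(p: float, halfwidth: float) -> Tuple[float, float]:
    """Clip the interval around the point estimate to the valid range [0, 1]."""
    return max(0.0, p - halfwidth), min(1.0, p + halfwidth)


# Hypothetical calibration set: predicted probabilities and what happened.
cal_probs = [0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.8, 0.5, 0.2]
cal_labels = [0, 1, 0, 0, 1, 1, 1, 0, 0]
q = conformal_halfwidth(cal_probs, cal_labels)
print(q, band(0.73, q))
```

With only a handful of calibration days the band is very wide, which is the behaviour described above for early days and sparse signals.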

Accuracy, honestly

We won't quote 95%. The literature can't either.

Migraine prediction is hard because it is inherently noisy. Decades of self-report data show the realistic AUC ceiling for published models lands in the 0.60–0.70 band. Hermly's targets are anchored to those numbers, not to marketing claims.

AUC scale: 0.50 (chance) · 0.70 · 1.00 (perfect)

Holsteen 2020 · multi-trigger self-report (n=178) · 0.56
HAPRED-I external · stress + current state (n=230) · 0.59
HAPRED-II personalised · after 30 days (n=230) · 0.66
Stubberud 2023 · wearable + diary, ML hold-out (n=18) · 0.62
Hermly target · personalised after 30 days · ~0.65
Hermly stretch · personalised after 90 days · ~0.75

AUC = area under the ROC curve. 0.50 is no better than chance; 1.00 is perfect. Numbers above 0.85 in published mobile-app migraine literature usually involve sample-size or label-leakage issues — see the HAPRED-II discussion for a careful read.
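
For readers who want the definition in concrete terms: AUC is the probability that a randomly chosen attack day received a higher score than a randomly chosen attack-free day, with ties counted as half. A small sketch with made-up scores (the function name and data are illustrative only):

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (attack day, quiet day) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one attack day and one quiet day")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Six hypothetical days: 7 of the 9 attack/quiet pairs are ranked correctly.
print(auc([0.9, 0.4, 0.6, 0.3, 0.5, 0.2], [1, 0, 1, 1, 0, 0]))
```

Read this way, an AUC of 0.66 means the model ranks an attack day above a quiet day in about two out of three such pairs.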

On-device by design

Privacy isn't a policy. It's the architecture.

The cohort model trained from research-cohort data ships with the app. Your phone runs both the cohort inference and the personal head locally. Your sleep, HRV, cycle, attack records, pain logs, and personalised weights stay on your iPhone. Our servers never see them — they couldn't, even if subpoenaed.

What our servers do see: subscription state (free / trial / Pro), keyed by your anonymous Apple transaction ID; and anonymous event counters (e.g., "onboarding completed today") that contain no health values. Detailed list: our privacy promise.

Example Today screen (06:51, Tue · May 13): risk level ELEVATED. "Higher risk window. Pressure forecast to drop in 4h. Sleep last night was 5.4h, below your baseline." Contributors shown: Pressure −8 hPa today · Cycle day 26 · Sleep 5.4 h.

The science we built on

Six papers that shaped the model.

Hermly is engineering, not original research — we read what the field has published and built the most honest implementation we could of those ideas.

Houle et al. — HAPRED-II: Individualised Forecasting of Headache Attack Risk medRxiv 2026 · n=230 · 8-week prospective

External validation of a 2-feature parsimonious migraine forecaster. Cohort baseline AUC 0.59; per-user Bayesian updating lifts that to 0.66 after a month. The discussion section is a model of honest reporting.

What Hermly borrows: the Bayesian-update architecture (V1 personalisation), realistic AUC targets, base-rate-drift monitoring after launch, and the safety monitoring concept.

Houle et al. — HAPRED-I: Forecasting Individual Headache Attacks Using Perceived Stress Headache 2017 · n=95 · the original

The two-feature baseline: today's stress (Daily Stress Inventory) plus current headache state. AUC 0.65 on leave-one-out validation. Showed that adding more self-report predictors did not improve fit.

What Hermly borrows: the hadHeadacheLast36h "current state" predictor — one of only two features needed for a useful forecast — and the discipline to keep self-report scales tiny.

Lateef et al. — Sleep, Mood, Energy, and Stress as Headache Predictors Neurology 2024 · n=477 · 4×/day EMA

Decomposed each daily signal into person-mean and within-person Δ-from-mean. Showed both carried independent predictive signal. Energy had opposite-signed effects on morning vs. afternoon attacks — single-window models lose this.

What Hermly borrows: the within-person decomposition — every baseline-paired feature emits both a ratio and a delta — plus the v2 plan for separate AM/PM prediction heads.

Holsteen et al. — A Multivariable Prediction Model From Daily Trigger Exposures Headache 2020 · n=178

Tested the wider trigger set (caffeine, alcohol, sleep, stress, menstruation, self-prediction). Within-person C-statistic only 0.56, worse than the two-feature HAPRED-I model (stress plus current headache state). Negative result that shapes our architecture.

What Hermly borrows: the discipline to not add more daily self-report fields. More features ≠ more accuracy.

Stubberud et al. — Forecasting Migraine With Mobile + Wearable ML Cephalalgia 2023 · n=18

Random-forest model on heart rate, skin temperature, and muscle tension. Hold-out AUC 0.62. Calibration described as poor — sets a realistic expectation for raw wearable signals.

What Hermly borrows: calibration as a first-class evaluation metric (not just AUC), and Platt scaling for the per-user head.
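
Why calibration needs its own metric can be shown with a small example. The Brier score (mean squared error between predicted probability and outcome) is one common calibration-sensitive measure; using it here is our choice for illustration, and the two sets of predictions below are invented.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared gap between predicted probability and outcome; lower is better."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


labels = [1, 0, 1, 0]
# Both sets rank every attack day above every quiet day, so their AUC is identical.
well_calibrated = [0.8, 0.2, 0.6, 0.4]
overconfident = [0.99, 0.90, 0.95, 0.91]

print(brier_score(well_calibrated, labels))
print(brier_score(overconfident, labels))
```

The second model ranks days just as well but tells the user "90%" on quiet days, which a ranking metric alone would never reveal.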

Empatica/Gottesman — Smartwatch Autonomic Signals + Migraine 2025 · n=10

Best individualised AUROC 0.68 for next-day migraine. None of the five chronic-migraine participants had above-random performance — only the five episodic ones did.

What Hermly borrows: the chronic-frequency gate. When you're in chronic territory (≥15 attacks in 30 days), Hermly says so honestly instead of pretending to predict.
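
The gate itself is a simple rule. A sketch under the stated threshold of 15 attacks in 30 days; the function name, return shape, and mode labels are hypothetical:

```python
from typing import Dict, Optional, Union

CHRONIC_THRESHOLD_30D = 15  # attacks in the past 30 days, per the text above


def forecast_or_notice(
    attacks_past_30d: int,
    probability: float,
) -> Dict[str, Union[str, Optional[float]]]:
    """Suppress the forecast when attack frequency is in chronic territory."""
    if attacks_past_30d >= CHRONIC_THRESHOLD_30D:
        return {"mode": "chronic_notice", "probability": None}
    return {"mode": "forecast", "probability": probability}


print(forecast_or_notice(6, 0.41))
print(forecast_or_notice(17, 0.41))
```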

What Hermly is not

Three things we won't pretend to be.

Not a diagnostic tool

Hermly is a wellness app, not FDA-cleared. It does not diagnose migraine, classify subtype, or detect comorbidities. The Doctor Report is structured data for your conversation with a clinician — never a substitute for one.

Not a treatment recommender

The app does not tell you when to take medication. Even on a high-risk forecast, you'll see facts ("Pressure dropping", "Sleep below your baseline"), not instructions. Acute and preventive medication choices belong with you and your doctor.

Not always right

At ~0.66 AUC, the model is meaningfully better than chance and meaningfully worse than perfect. Some high-risk days pass without an attack. Some quiet days bring one. The UI tries to communicate this honestly — including when the prediction shouldn't be trusted at all.

Common questions

FAQ.

Does it work without an Apple Watch?

Yes. The phone-only path uses sleep, cycle, and weather to drive predictions. Adding a Watch adds heart-rate variability, resting heart rate, and wrist temperature, which improve accuracy for most users, but the app still works without one.

What if I forget to log attacks?

The personalisation layer needs your labels to learn. Skipped attacks aren't fatal — the cohort model still runs — but accuracy levels off rather than improving. The Apple Watch and Live Activity flows are designed so logging takes one tap, even mid-attack.

How long until predictions get useful?

Day one for the cohort baseline (drawn from research-cohort data). The per-user layer noticeably improves after about 14 days and stabilises around day 30, mirroring the HAPRED-II 2026 trajectory.

Why isn't it 95% accurate?

Because nothing in the published literature is. Migraine attacks emerge from interacting biological systems with substantial randomness. The realistic personalised AUC ceiling for a 24-hour forecast lands around 0.66–0.70 in the prospective studies cited above, and none of them goes higher. We'd rather be honest about that than oversell.

Can I see what features the model is using?

Yes. Today shows the three biggest contributors below the risk number, with their direction and value. The Doctor Report exports a richer breakdown. The full feature schema will be published alongside the open-source release.
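
Selecting the three biggest contributors can be sketched as a ranking by absolute contribution. How the per-feature contributions are computed is not described here, so the signed scores, feature names, and function name below are placeholders for illustration only.

```python
from typing import Dict, List, Tuple


def top_contributors(
    contributions: Dict[str, float],
    k: int = 3,
) -> List[Tuple[str, str, float]]:
    """Return the k features with the largest absolute contribution, with direction."""
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return [
        (name, "raises risk" if value > 0 else "lowers risk", value)
        for name, value in ranked[:k]
    ]


# Hypothetical signed contributions to today's risk number.
today = {
    "pressure_change_24h": 0.12,
    "sleep_duration": 0.08,
    "cycle_day": 0.05,
    "hrv_ratio": -0.03,
    "humidity": -0.01,
}
for row in top_contributors(today):
    print(row)
```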

Is the prediction model audited or peer-reviewed?

Not yet. The cohort model is being trained on a 50-person prospective beta (recruitment open). After launch we plan external validation comparable to the HAPRED-II protocol. Findings will be published whether they support the product or not.

Early access

Predictions on your phone.
Data that stays there too.

Hermly is in private beta. Leave your email for an invitation when the cohort opens further.
