From HealthKit (10)
- HRV (24h average + 30d baseline ratio)
- Sleep duration, efficiency, and deep-sleep fraction
- Resting heart rate (today + baseline diff)
- Wrist temperature anomaly (Watch Series 8+)
- Activity / step counts vs. your baseline
Methodology
A 24-hour probability, on your iPhone, recomputed every time you open Today. Below is exactly how — what we read, what we don't, which research it builds on, and the accuracy ceiling we honestly can't yet beat.
In one paragraph
Hermly reads 26 daily signals from HealthKit, your menstrual cycle, WeatherKit, and an optional 1-tap stress check. A cohort machine-learning model trained on a consented research cohort gives the baseline. A per-user personalisation layer trained on-device adapts the prediction to you over the first 30 days. Your raw health data never leaves the phone. Realistic accuracy is around 0.66 AUC after a month of data — in line with the best peer-reviewed results in the field. Anything higher published in this space tends not to replicate.
The signal pipeline
Every refresh, Hermly fetches the day's signals from four sources, derives 26 features, and runs them through a two-layer model. Everything below the dashed line happens on your iPhone, in milliseconds, with no network call.
The 26 signals
Drawn from four sources, processed into 26 raw features plus four derived "today vs. your baseline" deltas. The model handles missing values gracefully — a phone-only user still gets a useful prediction; an Apple Watch + Cycle user gets a more precise one.
Stress is opt-in. The picker shows five words — Calm · Mild · Moderate · High · Severe — never a number. Skipping a day leaves the feature missing, never a fabricated "low stress" reading.
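A minimal sketch of how a skipped stress check could stay missing rather than default to a low value. The ordinal values 0 to 4 and the function name `encode_stress` are assumptions for illustration; only the five picker words come from the text above.

```python
import math

# Ordinal encoding for the five picker words named in the text.
# The numeric values 0-4 are an assumption for illustration.
STRESS_LEVELS = {"Calm": 0, "Mild": 1, "Moderate": 2, "High": 3, "Severe": 4}

def encode_stress(choice):
    """Return an ordinal stress feature, or NaN when the day was skipped."""
    if choice is None:
        return math.nan  # missing stays missing; never defaulted to "Calm"
    return float(STRESS_LEVELS[choice])
```

Encoding a skipped day as NaN lets a model that handles missing values treat "no answer" differently from "Calm".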
Architecture
Trained on a consented beta cohort (50 participants, 90 days of HealthKit + diary data). The base learner is XGBoost (gradient-boosted decision trees), chosen because it natively handles missing values, which every multi-source health signal eventually has, and converts in one step into a Core ML .mlpackage. The model ships with the app and updates over-the-air, but never sees your data.
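To illustrate what native missing-value handling means in a boosted tree: each split carries a default direction, chosen during training, that missing values follow. The sketch below shows a single hypothetical split node in plain Python. The feature name `hrv_ratio`, the threshold, and the leaf scores are invented for illustration and are not taken from the shipped model.

```python
import math
from dataclasses import dataclass

@dataclass
class Split:
    feature: str             # feature name tested at this node
    threshold: float         # go left when value < threshold
    missing_goes_left: bool  # default direction, learned during training
    left_value: float        # leaf score on the left branch
    right_value: float       # leaf score on the right branch

    def score(self, row):
        value = row.get(self.feature, math.nan)
        if value is None or math.isnan(value):
            go_left = self.missing_goes_left  # missing: follow the default
        else:
            go_left = value < self.threshold
        return self.left_value if go_left else self.right_value

# Hypothetical node: a low HRV ratio raises the risk score.
node = Split("hrv_ratio", 0.9, missing_goes_left=False,
             left_value=0.4, right_value=-0.1)
```

A phone-only user with no HRV reading would follow the default branch here instead of being dropped or imputed.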
Why not a transformer? Tabular gradient boosting still beats deep learning on small-N tabular data, per Shwartz-Ziv & Armon (2022). We re-validated on our pilot data; XGBoost won.
The cohort model is one size for everyone. To adapt to you, Hermly stacks a small logistic regression head on top, trained on-device from your own labelled days. The published HAPRED-II 2026 trial showed this style of continuous Bayesian update lifts AUC from 0.59 in the first two weeks to 0.66 after a month — meaningful, even if the ceiling stays modest.
Read the trial: HAPRED-II, Houle et al., medRxiv 2026.
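A minimal sketch of a logistic head stacked on a cohort probability, assuming the head takes the cohort model's log-odds as its only input and learns by plain stochastic gradient descent. The class name `PersonalHead`, the learning rate, and the single-input design are assumptions; the update rule used in the app (described above as Bayesian) may differ.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

class PersonalHead:
    """Logistic head over the cohort model's log-odds.

    Starts as the identity (w=1, b=0), so before any labels arrive
    the output equals the cohort probability.
    """
    def __init__(self, learning_rate=0.05):
        self.w = 1.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, cohort_prob):
        return sigmoid(self.w * logit(cohort_prob) + self.b)

    def update(self, cohort_prob, had_attack):
        """One stochastic-gradient step on log loss for one labelled day."""
        x = logit(cohort_prob)
        error = self.predict(cohort_prob) - (1.0 if had_attack else 0.0)
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error

head = PersonalHead()
for _ in range(30):  # thirty labelled attack-free days at a 30% cohort forecast
    head.update(0.30, had_attack=False)
```

After a run of attack-free days at the same cohort forecast, `head.predict(0.30)` drops below 0.30, which is the kind of per-user adaptation the paragraph describes.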
The point estimate ("73%") is the mid-point of a wider honest interval. Hermly wraps the model output in a split conformal prediction band so the UI can communicate uncertainty when it's high (early days, sparse signals). When the band is wide, we say so; when it's tight, we trust the number.
Method: Angelopoulos & Bates, "A Gentle Introduction to Conformal Prediction" (2021).
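A minimal sketch of a split conformal band, assuming absolute residuals on a held-out calibration set as the nonconformity score. The text does not say which score Hermly uses, so the score, the function names, and the default `alpha` are assumptions.

```python
import math

def conformal_quantile(scores, alpha=0.1):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]

def prediction_band(prob, calib_probs, calib_labels, alpha=0.1):
    """Band around `prob` from held-out absolute residuals, clipped to [0, 1]."""
    scores = [abs(y - p) for p, y in zip(calib_probs, calib_labels)]
    q = conformal_quantile(scores, alpha)
    return max(0.0, prob - q), min(1.0, prob + q)
```

With very few calibration days the quantile is infinite and the band spans the whole range from 0 to 1, which matches the idea of showing wide uncertainty in the early days.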
Accuracy, honestly
Migraine prediction is hard because it is inherently noisy. Decades of self-report data show the realistic AUC ceiling for published models lands in the 0.60–0.70 band. Hermly's targets are anchored to those numbers, not to marketing claims.
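For readers unfamiliar with the metric: AUC is the probability that a randomly chosen attack day received a higher forecast than a randomly chosen attack-free day. A minimal sketch of that pairwise definition follows; the function name and the sample data in the usage note are illustrative.

```python
def auc(scores, labels):
    """Probability that a random attack day outranks a random attack-free day.

    Ties count as half a win. O(n_pos * n_neg), fine for small samples.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one day of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For example, `auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` gives 0.75: three of the four attack/non-attack pairs are ranked correctly. An AUC of 0.5 is chance and 1.0 is a perfect ranking.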
On-device by design
The cohort model, trained on research-cohort data, ships with the app. Your phone runs both the cohort inference and the personal head locally. Your sleep, HRV, cycle, attack records, pain logs, and personalised weights stay on your iPhone. Our servers never see them; they couldn't, even if subpoenaed.
What our servers do see: subscription state (free / trial / Pro), keyed by your anonymous Apple transaction ID; and anonymous event counters (e.g., "onboarding completed today") that contain no health values. Detailed list: our privacy promise.
The science we built on
Hermly is engineering, not original research — we read what the field has published and built the most honest implementation we could of those ideas.
External validation of a 2-feature parsimonious migraine forecaster. Cohort baseline AUC 0.59; per-user Bayesian updating lifts that to 0.66 after a month. The discussion section is a model of honest reporting.
What Hermly borrows: the Bayesian-update architecture (V1 personalisation), realistic AUC targets, base-rate-drift monitoring after launch, and the safety monitoring concept.
The two-feature baseline: today's stress (Daily Stress Inventory) plus current headache state. AUC 0.65 on leave-one-out validation. Showed that adding more self-report predictors did not improve fit.
What Hermly borrows: the hadHeadacheLast36h "current state" predictor, one of only two features needed for a useful forecast, and the discipline to keep self-report scales tiny.
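A minimal sketch of how a flag like hadHeadacheLast36h could be derived from logged attacks. It assumes the log stores headache start times and that the window is measured back from the moment of the forecast; both are assumptions, as is the Python function name.

```python
from datetime import datetime, timedelta

def had_headache_last_36h(headache_times, now):
    """True if any logged headache started within the 36 hours before `now`."""
    window_start = now - timedelta(hours=36)
    return any(window_start <= t <= now for t in headache_times)
```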
Decomposed each daily signal into person-mean and within-person Δ-from-mean. Showed both carried independent predictive signal. Energy had opposite-signed effects on morning vs. afternoon attacks — single-window models lose this.
What Hermly borrows: the within-person decomposition — every baseline-paired feature emits both a ratio and a delta — plus the v2 plan for separate AM/PM prediction heads.
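A minimal sketch of the within-person decomposition for one signal: the person-mean, today's delta from it, and today's ratio to it. The function name and the choice to skip missing days when computing the mean are assumptions.

```python
import math
from statistics import mean

def baseline_features(today, history):
    """Return (person_mean, delta, ratio) for one signal.

    `history` holds prior daily values; NaN entries (missing days) are skipped.
    """
    observed = [v for v in history if not math.isnan(v)]
    if not observed:
        return math.nan, math.nan, math.nan
    person_mean = mean(observed)
    if math.isnan(today):
        return person_mean, math.nan, math.nan
    delta = today - person_mean
    ratio = today / person_mean if person_mean != 0 else math.nan
    return person_mean, delta, ratio
```

For example, an HRV of 40 against a history averaging 50 yields a delta of -10 and a ratio of 0.8.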
Tested the wider trigger set (caffeine, alcohol, sleep, stress, menstruation, self-prediction). Within-person C-statistic was only 0.56, worse than the two-feature model (stress plus current headache state). A negative result that shapes our architecture.
What Hermly borrows: the discipline to not add more daily self-report fields. More features ≠ more accuracy.
Random-forest model on heart rate, skin temperature, and muscle tension. Hold-out AUC 0.62. Calibration described as poor — sets a realistic expectation for raw wearable signals.
What Hermly borrows: calibration as a first-class evaluation metric (not just AUC), and Platt scaling for the per-user head.
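A minimal sketch of Platt scaling: fit a slope `a` and intercept `b` so that sigmoid(a * logit + b) matches observed outcomes. It uses batch gradient descent on log loss in plain Python. The learning rate, step count, and function names are assumptions, not the app's settings.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(logits, labels, a=1.0, b=0.0):
    """Mean negative log-likelihood of sigmoid(a * logit + b)."""
    total = 0.0
    for x, y in zip(logits, labels):
        p = sigmoid(a * x + b)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(logits)

def fit_platt(logits, labels, learning_rate=0.1, steps=500):
    """Fit (a, b) by batch gradient descent, starting from the identity map."""
    a, b = 1.0, 0.0
    n = len(logits)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(logits, labels):
            error = sigmoid(a * x + b) - y
            grad_a += error * x
            grad_b += error
        a -= learning_rate * grad_a / n
        b -= learning_rate * grad_b / n
    return a, b
```

When raw scores are overconfident (say, log-odds of 3 on days where attacks happen only three times in four), the fitted slope comes out below 1, pulling forecasts back toward the observed rate.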
Best individualised AUROC 0.68 for next-day migraine. None of the five chronic-migraine participants had above-random performance — only the five episodic ones did.
What Hermly borrows: the chronic-frequency gate. When you're in chronic territory (≥15 attacks in 30 days), Hermly says so honestly instead of pretending to predict.
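A minimal sketch of the chronic-frequency gate, using the threshold from the text (15 or more in 30 days). Counting distinct attack days in a trailing window that includes today is an assumption, as are the names below.

```python
from datetime import date, timedelta

CHRONIC_THRESHOLD = 15  # attacks, from the text
WINDOW_DAYS = 30        # trailing window, from the text

def in_chronic_territory(attack_dates, today):
    """True when 15 or more attack days fall in the 30 days ending today."""
    window_start = today - timedelta(days=WINDOW_DAYS - 1)
    recent = {d for d in attack_dates if window_start <= d <= today}
    return len(recent) >= CHRONIC_THRESHOLD
```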
What Hermly is not
Hermly is a wellness app, not FDA-cleared. It does not diagnose migraine, classify subtype, or detect comorbidities. The Doctor Report is structured data for your conversation with a clinician — never a substitute for one.
The app does not tell you when to take medication. Even on a high-risk forecast, you'll see facts ("Pressure dropping", "Sleep below your baseline"), not instructions. Acute and preventive medication choices belong with you and your doctor.
At ~0.66 AUC, the model is meaningfully better than chance and meaningfully worse than perfect. Some high-risk days pass without an attack. Some quiet days bring one. The UI tries to communicate this honestly — including when the prediction shouldn't be trusted at all.
Common questions
Yes. The phone-only path uses sleep, cycle, and weather to drive predictions. Adding a Watch adds heart-rate variability, resting heart rate, and wrist temperature, which improve accuracy for most users, but the app still works without one.
The personalisation layer needs your labels to learn. Skipped attacks aren't fatal — the cohort model still runs — but accuracy levels off rather than improving. The Apple Watch and Live Activity flows are designed so logging takes one tap, even mid-attack.
Day one for the cohort baseline (drawn from research-cohort data). The per-user layer noticeably improves after about 14 days and stabilises around day 30, mirroring the HAPRED-II 2026 trajectory.
Because nothing in the published literature is. Migraine attacks emerge from interacting biological systems with substantial randomness. The realistic personalised AUC ceiling for a 24-hour forecast lands around 0.66–0.70 in every prospective study to date. We'd rather be honest about that than oversell.
Yes. Today shows the three biggest contributors below the risk number, with their direction and value. The Doctor Report exports a richer breakdown. The full feature schema will be published alongside the open-source release.
Not yet. The cohort model is being trained on a 50-person prospective beta (recruitment open). After launch we plan external validation comparable to the HAPRED-II protocol. Findings will be published whether they support the product or not.
Early access
Hermly is in private beta. Leave your email for an invitation when the cohort opens further.