#2939: Can a Security Camera Detect a Baby Not Moving?

Can AI tell when a baby is about to fall—or has stopped moving? We break down what's possible and what's not.

Featuring

Daniel

Corn

Herman

Episode Details

Episode ID: MWP-3109
Published: May 20
Duration: 34:44
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: deepseek-v4-pro
Topics: computer-vision ai-ethics child-development

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

A listener wrote in with a question that cuts to the heart of modern parenting and computer vision: can you teach a security camera to tell you when your kid is about to fall off the bed, and can you teach it to tell you when your kid has stopped moving entirely? The first feels like a hard engineering problem. The second feels like a hard engineering problem wrapped in an existential anxiety attack.

The episode unpacks three use cases in order of difficulty: awake detection, fall detection, and non-motion detection. Awake detection is surprisingly straightforward—it's a standard image classification task with clear visual features like eyes open versus closed. A binary classifier trained on fifty to a hundred labeled frames can tag each person detection with a sub-label, triggering a Home Assistant automation when the label flips to "awake" and stays there.

Fall detection is trickier. Object detection gives you a bounding box but tells you nothing about posture. The practical solution uses zone geometry: define a danger zone extending beyond the bed edge and trigger when the person bounding box enters it. For higher accuracy, pose estimation models like MediaPipe can identify keypoints—shoulders, hips, knees—though infant body proportions challenge models trained on adults. Synthetic training data from rendered 3D infant models offers a path forward.

The hardest problem is non-motion detection. An infant's chest moves two to five millimeters with each breath—a signal that compression pipelines and WiFi latency can easily bury in noise. Standard motion detection like MOG2 background subtraction is useless here because chest movement never exceeds the threshold. You'd need a temporal model—a 3D convolutional neural network analyzing ten-second clips—trained on your own camera and baby. The catch: you'd need training data of your baby not breathing, which you obviously don't have and don't want.

Beyond the engineering, the episode confronts a deeper wisdom from the listener: much baby monitoring technology "can just be unhelpful." A 2024 JAMA Pediatrics study found that 73% of parents reported increased anxiety from smart alerts, and 41% checked the monitoring app more than five times per night. Sometimes the most valuable insight isn't what you can build—it's what you choose not to.

A listener wrote in with a question that's basically two questions masquerading as one, and they both hit on something I think every parent who's ever pointed a camera at a crib has wondered. Can you teach a security camera to tell you when your kid is about to fall off the bed, and can you teach it to tell you when your kid has stopped moving entirely. The first one feels like a hard engineering problem. The second one feels like a hard engineering problem wrapped in an existential anxiety attack.

The listener's instinct about that second one is, I have to say, medically sound. The "I'm not sure this is actually helpful" part. That's not technophobia, that's good risk assessment. But the technical question underneath it is fascinating because it exposes something most people don't realize about how cameras work. Every motion detection system ever built is designed to trigger on change. A pixel was this color, now it's that color. That's the event. Absence of change is not an event. It's just the default state of the universe.

The camera is basically a dog that barks when someone walks past the window but has no concept of "the room is too quiet."

And that's the core of why the listener couldn't find anything that triggers on non-motion. The entire signal processing pipeline from the CMOS sensor to the H.264 encoder is optimized to say "something happened here." Nobody builds a pipeline that says "nothing happened here, and that nothing is concerning."

Which is a pretty good metaphor for parenting in general, honestly. You're wired to notice the crash, not the silence that precedes it.

Let's unpack both of these. The listener has Frigate set up, which is an open source NVR that does local AI inference on a Google Coral TPU. It's doing person detection right now, which means it can tell you "there is a human-shaped object in the crib." That's working. What it can't do is tell you the state of that human-shaped object. Is it standing precariously? Is it breathing? Is it awake? Those are all state classification problems, not object detection problems.

The listener framed this really well. Two specific use cases. One, kid standing at the edge of the bed about to fall. Two, kid has stopped moving. And then almost as an aside, a third one that's actually much simpler: kid is awake and ready to start the day. So let's take them in order of difficulty, which I think goes awake detection, then fall detection, then non-motion detection. Herman, why is "baby is awake" the easiest of the three?

Because it's a standard image classification task with clear visual features. Eyes open versus eyes closed. Body posture active versus body posture still. You don't need temporal information, you don't need to track changes over time. A single frame at reasonable resolution can give you a pretty good answer. Frigate already supports custom models through its sub-label system. You could train a binary classifier on maybe fifty to a hundred labeled frames per state, "asleep" and "awake," and deploy it on the Coral TPU. The model would tag each person detection with the sub-label, and you'd fire a Home Assistant automation when the label flips to "awake" and stays there for more than, say, five minutes.
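The "flips to awake and stays there" rule is a debounce over a stream of labels. A minimal sketch, assuming a classifier that emits a timestamped label per detection; the label names, the function name, and the sampling interval are illustrative and not part of Frigate or Home Assistant:

```python
from typing import Iterable, Optional, Tuple


def first_sustained_awake(
    samples: Iterable[Tuple[float, str]],
    hold_seconds: float = 300.0,
) -> Optional[float]:
    """Return the timestamp at which 'awake' has held for hold_seconds.

    samples: (timestamp_seconds, label) pairs in time order. The labels
    'awake' and 'asleep' are hypothetical sub-label names.
    Returns None if the label never stays 'awake' long enough.
    """
    awake_since = None
    for ts, label in samples:
        if label == "awake":
            if awake_since is None:
                awake_since = ts  # start of a new awake run
            if ts - awake_since >= hold_seconds:
                return ts
        else:
            awake_since = None  # any other label resets the run
    return None


# One classification per minute: a brief stir at 60 s, then a real wake-up.
samples = [
    (0, "asleep"), (60, "awake"), (120, "asleep"), (180, "awake"),
    (240, "awake"), (300, "awake"), (360, "awake"), (420, "awake"),
    (480, "awake"),
]
print(first_sustained_awake(samples))
```

The brief stir at 60 seconds is discarded because the next sample resets the run, which is the point of the hold time.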

That one's basically doable today with off-the-shelf tools and a weekend of labeling.

The listener mentioned getting a few reference frames of the baby asleep and awake, and that's genuinely all you'd need for a workable first pass. It won't be medical grade, but for "hey, your kid is up and probably wants breakfast," it's more than adequate.

Okay, so that's the easy one. Let's talk about the kid standing at the edge of the bed. The listener's intuition here was interesting. He said you can't exactly ask the baby to pose for training data. Which is true, and also funny, but it points to the real problem. This isn't object detection anymore, it's pose estimation.

Object detection gives you a bounding box. "There is a person here, and the box extends from X1 Y1 to X2 Y2." That bounding box tells you nothing about whether the person inside it is lying down, sitting, or doing a handstand on the edge of the mattress. For that you need keypoint detection, identifying specific body parts, shoulders, hips, knees, and their positions relative to each other and to the environment.

The environment here is the bed edge, which the camera doesn't inherently know about.

So you have two sub-problems. One, where are the baby's limbs? Two, where is the bed edge? The second one is actually easier. You define a zone in Frigate's camera configuration. A polygon that covers the bed surface, and then a "danger zone" that extends maybe thirty centimeters beyond the bed edge in the camera's two-dimensional view. If the person bounding box enters that danger zone, you trigger an alert. That's a two-dimensional bounding box check. It works with existing Frigate models today, no custom machine learning required.
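The danger-zone check is plain two-dimensional geometry. A minimal sketch, assuming pixel coordinates with y increasing downward and testing the bottom-center point of the bounding box against the zone polygon; the choice of anchor point and all coordinates are assumptions for illustration, and Frigate's own zone logic may differ:

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going right."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def box_in_zone(box: Box, zone: List[Point]) -> bool:
    """True if the bottom-center of the box falls inside the zone."""
    x1, _y1, x2, y2 = box
    anchor = ((x1 + x2) / 2.0, y2)
    return point_in_polygon(anchor, zone)


# Hypothetical strip just beyond the right-hand bed edge.
danger_zone = [(600, 200), (700, 200), (700, 500), (600, 500)]
print(box_in_zone((300, 250, 450, 400), danger_zone))  # middle of the bed
print(box_in_zone((580, 300, 680, 450), danger_zone))  # over the edge strip
```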

The cheap version is just zone geometry. If the kid's bounding box overlaps with the "about to fall" zone, ping the parents.

That's probably good enough for most people. But it has a failure mode. If the baby is lying down near the edge of the bed but not actually in danger of falling, the bounding box still overlaps the danger zone and you get a false alert. To fix that, you need actual pose estimation. You need to know whether the baby is horizontal or vertical relative to the bed edge.

That's where the "can't ask the baby to pose" problem bites you.

It does, but it's not insurmountable. There are generic pose estimation models, MediaPipe Pose, MoveNet, OpenPose, that work on humans of all ages. They're trained on adult data, which means they're less accurate on infant body proportions. Babies have different head-to-body ratios, different limb lengths, and they move in ways that adult models don't expect. But you can get reasonable results with some heuristics. For example, if the model detects shoulder and hip keypoints and the line between them is roughly vertical, and the lowest keypoint is within fifteen centimeters of the bed edge in image space, that's a standing-at-the-edge posture.
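That heuristic can be sketched in a few lines. This assumes a pose model has already produced named keypoints in pixel coordinates; the keypoint names, the tilt threshold, and the pixels-per-centimeter scale are hypothetical, and the fifteen-centimeter figure is the one quoted above:

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def torso_tilt_degrees(kp: Dict[str, Point]) -> float:
    """Angle between the shoulder-to-hip line and vertical (0 = upright)."""
    sx = (kp["left_shoulder"][0] + kp["right_shoulder"][0]) / 2.0
    sy = (kp["left_shoulder"][1] + kp["right_shoulder"][1]) / 2.0
    hx = (kp["left_hip"][0] + kp["right_hip"][0]) / 2.0
    hy = (kp["left_hip"][1] + kp["right_hip"][1]) / 2.0
    return math.degrees(math.atan2(abs(sx - hx), abs(sy - hy)))


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance in pixels from point p to segment a-b."""
    abx, aby = b[0] - a[0], b[1] - a[1]
    length_sq = abx * abx + aby * aby
    if length_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * abx + (p[1] - a[1]) * aby) / length_sq
    t = max(0.0, min(1.0, t))  # clamp to the segment
    return math.hypot(p[0] - (a[0] + t * abx), p[1] - (a[1] + t * aby))


def standing_at_edge(kp, edge_a, edge_b, px_per_cm,
                     max_tilt_deg=30.0, max_edge_cm=15.0) -> bool:
    """Upright torso AND lowest keypoint close to the bed edge."""
    lowest = max(kp.values(), key=lambda p: p[1])  # largest y is lowest
    near = distance_to_segment(lowest, edge_a, edge_b) <= max_edge_cm * px_per_cm
    return torso_tilt_degrees(kp) <= max_tilt_deg and near


edge = ((450, 100), (450, 500))  # hypothetical bed edge in the image
standing = {"left_shoulder": (400, 200), "right_shoulder": (440, 200),
            "left_hip": (405, 300), "right_hip": (435, 300),
            "left_knee": (420, 380)}
lying = {"left_shoulder": (300, 300), "right_shoulder": (300, 340),
         "left_hip": (400, 305), "right_hip": (400, 335)}
print(standing_at_edge(standing, *edge, px_per_cm=4.0))
print(standing_at_edge(lying, *edge, px_per_cm=4.0))
```

The lying case fails on the tilt test alone, which is exactly the false alert the bounding-box version cannot rule out.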

How do you train something like that without asking the baby to pose?

You generate three-dimensional models of infants in various poses, render them from different camera angles with realistic lighting, and use those rendered images as training data. There's been work on this, the SURREAL dataset for adult pose estimation was doing this years ago. For infants, it's sparser, but the technique transfers. You create a parametric model of a baby body, pose it in the positions you care about, render a few thousand images, and fine-tune your pose detector on those.

We're talking about generating fake babies in fake bedrooms to train a model that watches a real baby in a real bedroom.

The circle of life, Corn.

It's the glockenspiel of modern parenting.

I want to come back to the zone geometry approach though, because I think it's the practical takeaway here. The listener can implement the danger zone today with Frigate's existing configuration. Define the bed area, define a buffer zone around it, and trigger on person detected in the buffer zone. It'll have false positives, but for a ten-month-old who's just learning to stand, those false positives might actually be welcome. Better to check on a false alarm than miss a real fall.

That's fair. And it connects to something the listener said that I think is the most important line in the whole prompt. He said he's hesitant to suggest any of this practically because so much baby monitoring technology "can just be unhelpful." That's not a technical observation, that's wisdom.

And there's data backing it up. JAMA Pediatrics published a study in twenty twenty-four looking at consumer infant monitors with smart alerts. Seventy-three percent of parents reported increased anxiety from those alerts. Forty-one percent checked the monitoring app more than five times per night. That's not better sleep for anyone.

Five times per night. So you're waking up every ninety minutes to check an app that's supposed to help you sleep better.

And it gets worse when you look at what the alerts actually are. Most of them are false positives. Movement artifacts, compression noise, the baby rolling to face away from the camera. The system says "no motion detected" not because the baby stopped breathing but because the chest isn't visible anymore.

Which brings us to the third use case. The hard one. Non-motion detection. And I want to approach this carefully because the listener was right to flag it as potentially anxiety-inducing. But the engineering question underneath it is interesting. Can you reliably detect the absence of motion in a sleeping infant using only a consumer-grade IP camera?

Let's start with what you're actually trying to measure. An infant's chest during normal sleep moves about two to five millimeters with each breath. At ten months, the respiration rate is twenty to thirty breaths per minute. So you're looking for a rhythmic displacement of a few millimeters, roughly once every two to three seconds, through clothing and possibly a blanket, in low light, at a distance of maybe two meters, using a camera that was designed to watch your driveway.
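To put rough numbers on that, here is a back-of-the-envelope sketch. The breathing rates, displacement, and distance are the figures quoted above; the 1920-pixel image width and the 90-degree horizontal field of view are assumptions, and the calculation is a best case that treats the chest motion as lying entirely in the image plane:

```python
import math


def breath_period_seconds(breaths_per_minute: float) -> float:
    """Time between breaths at a given respiration rate."""
    return 60.0 / breaths_per_minute


def displacement_in_pixels(displacement_mm: float, distance_m: float,
                           image_width_px: int, hfov_deg: float) -> float:
    """Pixels covered by a sideways displacement at a given distance.

    Uses a simple pinhole model: the scene width at distance d is
    2 * d * tan(hfov / 2).
    """
    scene_width_mm = 2.0 * distance_m * 1000.0 * math.tan(math.radians(hfov_deg) / 2.0)
    return displacement_mm * image_width_px / scene_width_mm


for rate in (20, 30):
    print(rate, "breaths/min ->", breath_period_seconds(rate), "s per breath")

for mm in (2, 5):
    px = displacement_in_pixels(mm, distance_m=2.0, image_width_px=1920, hfov_deg=90.0)
    print(mm, "mm ->", round(px, 2), "px")
```

Under these assumptions the whole breathing signal spans roughly one to two and a half pixels, before compression and sensor noise are taken into account.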

When you put it that way, it sounds almost impossible.

It's not impossible, but the margin is terrifyingly thin. Here's the signal chain. The camera sensor captures raw pixel data. That data goes through a compression pipeline, H.264 or H.265, which is designed to throw away information that the human eye won't notice. Micro-movements in dark areas of the frame are exactly the kind of thing compression discards. Then the compressed stream goes over your network, possibly WiFi, which adds latency and can introduce additional artifacts. Then Frigate decodes it and runs inference.

By the time the algorithm sees the frame, the chest movement might have been compressed into oblivion.

At low bitrates, absolutely. A two-megapixel camera at four megabits per second in a dark room, the quantization noise can be larger than the actual chest displacement. The signal is literally smaller than the noise floor.

That's before we even get to the algorithmic challenge. Which is what, exactly?

The algorithmic challenge is that you're doing temporal anomaly detection. You need to establish a baseline motion signature for normal sleep, micro-movements, chest excursion, occasional limb twitches, and then detect when that signature drops below a threshold for some number of seconds. Frigate's default motion detection uses something called MOG2 background subtraction. It maintains a statistical model of each pixel's recent history and triggers when a pixel deviates from that model by more than a threshold, typically twenty-five out of two hundred fifty-five in pixel value. That's great for "someone walked into the room." It's useless for "the chest stopped moving" because the chest movement never exceeded the threshold in the first place.
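A toy version of the thresholding idea shows why a small periodic signal never trips it. This is a single-pixel running-average background model, a deliberate simplification and not MOG2 itself; the threshold of twenty-five is the figure quoted above, while the learning rate and the three-gray-level breathing amplitude are assumptions:

```python
import math
from typing import List


def motion_flags(values: List[float], threshold: float = 25.0,
                 learning_rate: float = 0.05) -> List[bool]:
    """Flag samples that deviate from a slowly adapting background.

    values: brightness of one pixel over time, on a 0 to 255 scale.
    """
    background = float(values[0])
    flags = []
    for v in values:
        flags.append(abs(v - background) > threshold)
        background += learning_rate * (v - background)  # adapt slowly
    return flags


# Hypothetical breathing signal: +/- 3 gray levels around 100.
breathing = [100 + 3 * math.sin(2 * math.pi * i / 25) for i in range(250)]
# Someone walks in: the pixel jumps from 100 to 180.
walk_in = [100.0] * 50 + [180.0] * 5

print(any(motion_flags(breathing)))
print(any(motion_flags(walk_in)))
```

Because the breathing signal never produces a flag, its disappearance cannot produce one either; the detector sees the same "no motion" before and after.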

You need a completely different approach.

You need a temporal model. Something that looks at a sequence of frames, not individual frames. A three-dimensional convolutional neural network, like I3D or X3D, that takes a ten-second clip as input and outputs a classification, "moving" or "still." You train it on labeled clips from your own camera, your own baby, your own lighting conditions. That's what makes this both feasible and impractical at the same time.

Feasible because you can do it. Impractical because you need to collect training data of your baby not breathing, which you obviously don't have and don't want.

There was a case on the Home Assistant subreddit a while back, someone trained a TensorFlow model on two hundred labeled clips of their sleeping infant using Frigate's recording export. They got ninety-two percent accuracy on non-motion detection. Which sounds great until you hear the false positive rate. Three false alarms per night, mostly from the baby rolling to face away from the camera so the chest wasn't visible.

Three times a night your phone buzzes and tells you your baby might have stopped breathing, and two or three of those times it's because the baby turned over. That's not a monitoring system, that's a sleep deprivation torture device.

That's the best-case homebrew scenario. The false negative problem is worse because you can't measure it without ground truth. You don't know how many real apnea events the system missed unless you have a medical-grade reference device to compare against.

Which the listener doesn't, and shouldn't need. A ten-month-old full-term infant has a very low risk of clinically significant apnea. The prevalence is about one to two percent in preterm infants, and it drops off sharply after six months in full-term babies.

So you're building a system to detect an event that is extremely rare in the population you're monitoring, using a sensor that is fundamentally noisy for the signal you're trying to measure. The base rate problem alone means most of your alerts will be false positives, even if your classifier is objectively good.
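The base rate problem can be made concrete with Bayes' rule. A small worked example with hypothetical numbers (a real event on one night in a thousand, and a classifier with 95 percent sensitivity and 95 percent specificity); none of these figures come from the episode:

```python
def positive_predictive_value(prevalence: float, sensitivity: float,
                              specificity: float) -> float:
    """Probability that an alert corresponds to a real event."""
    true_alerts = sensitivity * prevalence
    false_alerts = (1.0 - specificity) * (1.0 - prevalence)
    return true_alerts / (true_alerts + false_alerts)


# Hypothetical inputs, chosen only to illustrate the effect.
ppv = positive_predictive_value(prevalence=0.001, sensitivity=0.95,
                                specificity=0.95)
print(f"Share of alerts that are real: {ppv:.1%}")
```

With these inputs fewer than two alerts in a hundred are real, even though the classifier is right 95 percent of the time in both directions.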

This is the part where the engineer in me wants to solve it anyway, and the parent in me wants to throw the whole project in the trash.

Let me offer a middle ground. The listener mentioned the Owlet sock, which is the wearable monitor that measures heart rate and oxygen saturation through photoplethysmography, basically a light-based contact sensor on the foot. The FDA has sent Owlet warning letters, in twenty twenty-one and again in twenty twenty-three, for making unsubstantiated claims about apnea detection. If a dedicated medical sensor with direct skin contact can't reliably do this, a two-megapixel camera at two meters through a blanket is not going to do better.

That's a sobering benchmark.

It should be. But here's where I think the listener's instincts are actually pointing toward a better solution. He mentioned that you could look at "last movement detected X minutes ago" as a more practical approach. And he's right. That's trend monitoring, not binary alerting.

Explain the difference.

Binary alerting says "the baby has stopped moving, sound the alarm." Trend monitoring says "let's look at the motion score over the last thirty minutes and see if there's a sustained decline." In a NICU, nurses don't respond to single sensor events. They look at trends on the monitor. Heart rate trending down over twenty minutes is clinically significant. Heart rate dipping for five seconds is probably just the baby shifting position.

You're smoothing out the noise by extending the time window.

You're moving from real-time alerts to retrospective review, which is psychologically very different. The listener could implement something like this today using Frigate's motion heatmap feature. It shows activity levels over time in a defined zone. Check it in the morning. If you see a sustained flatline during the night, that's worth investigating. If you see the normal peaks and valleys of sleep cycles, you've got trend data without the anxiety of real-time alerts.
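The difference between the two approaches is easy to sketch. This assumes a list of per-minute motion scores exported from somewhere; the score scale, the thirty-minute window, and the floor value are illustrative, and the function names are not part of Frigate:

```python
from collections import deque
from typing import List


def flat_minutes(scores: List[float], window: int = 30,
                 floor: float = 0.5) -> List[int]:
    """Minute indexes where the mean of the last `window` scores is below floor.

    Only full windows are evaluated, so the first window - 1 minutes
    are never flagged.
    """
    buf = deque(maxlen=window)
    flagged = []
    for minute, score in enumerate(scores):
        buf.append(score)
        if len(buf) == window and sum(buf) / window < floor:
            flagged.append(minute)
    return flagged


def minutes_since_last_motion(scores: List[float]) -> int:
    """Minutes elapsed since the most recent non-zero score."""
    for back, score in enumerate(reversed(scores)):
        if score > 0:
            return back
    return len(scores)


# Hypothetical night: an hour of normal stirring, then 45 quiet minutes.
normal = [4, 0, 1, 0, 0, 3] * 10
night = normal + [0] * 45

print(flat_minutes(normal))            # ordinary sleep: nothing flagged
print(flat_minutes(night)[:1])         # first minute the trend goes flat
print(minutes_since_last_motion(night))
```

Single quiet minutes inside the normal pattern never trip the check; only a sustained decline across the whole window does, and the result is something to read in the morning rather than an alarm.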

That feels like the right balance. You get the information without the interruption.

It aligns with how clinical monitoring actually works. Hospitals use contact-based sensors, ECG electrodes and respiratory inductance plethysmography chest bands, for a reason. They're measuring the signal directly at the source. Camera-based respiration monitoring exists, companies like Oxitone and Ximind have FDA-cleared systems for adult sleep apnea screening, but those are designed for adults who don't move much and don't wear blankets over their faces. The margin for error with infants is much thinner, and the FDA has not cleared any camera-based system for infant apnea detection.

To summarize the non-motion question: can you build it? Technically, yes. Should you build it? Almost certainly not. Is there a simpler thing that gets you eighty percent of the value? Yes, and it's the motion heatmap you check in the morning.

Let me add one more technical detail that I think is important for anyone who's still tempted to try. The minimum camera specs matter a lot. If you're serious about detecting micro-movements, you need at least a four-megapixel sensor with good infrared night vision. H.265 encoding retains more detail in low light than H.264. You want Power over Ethernet, not WiFi, because WiFi adds compression artifacts and latency that kill the micro-movement signal. And you need the camera positioned so the baby's chest occupies at least ten by ten pixels in the frame at your working distance. At ten eighty p from two meters, that's roughly a twenty by twenty centimeter area, which is about right for an infant torso.

You need a Google Coral TPU for inference if you're running a custom temporal model, because doing three-dimensional convolutions on a CPU will melt it.

The Coral can handle MobileSAMv2 at about thirty frames per second on ten eighty p input as of earlier this year, which is impressive for a USB stick. But that's for segmentation, not temporal classification. A lightweight 3D CNN like X3D would probably run at five to ten frames per second on the Coral, which is enough for a ten-second classification window.

The hardware exists. The software is buildable. The training data is the bottleneck, and the psychological cost might exceed the safety benefit. That's the equation.

The listener already solved it intuitively in his prompt. He said he's "hesitant to actually suggest this practically." That hesitation is the healthiest thing in this entire conversation.

Let's talk about the psychological dimension more directly, because I think it's the real core of this episode. The listener mentioned that so much baby monitoring tech "can just be unhelpful." What does that mean, specifically?

It means the technology changes the parent's behavior in ways that are counterproductive. The JAMA study I mentioned, seventy-three percent increased anxiety, forty-one percent checking more than five times a night. That's not monitoring, that's doomscrolling your baby. And the mechanism is well understood in behavioral psychology. The app gives you mostly reassuring data, but occasionally it flags something concerning. You can't predict when, so you check compulsively.

It's a slot machine where the payout is "your child is still breathing."

That's exactly what it is. And the house always wins, except the house is your own anxiety.

There's also a false reassurance problem. If you have a system that claims to detect non-motion, and it doesn't go off, you might assume everything is fine and check less frequently than you otherwise would. But if the system has false negatives, which it will, you've actually reduced your effective monitoring.

This is the automation paradox. The better the system seems to work, the less attention you pay, and the more catastrophic the failure when it misses something. It's the same reason we don't let Tesla drivers sleep in the back seat. The automation isn't perfect, but the human attention that's supposed to back it up atrophies.

The most dangerous baby monitor is the one that works well enough to make you stop checking.

The most useful one is the one you can ignore. That's the design principle I'd propose. Every alert that turns out to be a false positive trains you to ignore the system. So design for high precision, few alerts with high confidence, not high recall, catching every possible event.
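Precision and recall fall straight out of alert counts. A small sketch comparing two hypothetical systems over a year; all counts are invented for illustration:

```python
from typing import Tuple


def precision_recall(true_pos: int, false_pos: int,
                     false_neg: int) -> Tuple[float, float]:
    """Precision: share of alerts that were real.
    Recall: share of real events that produced an alert."""
    alerts = true_pos + false_pos
    events = true_pos + false_neg
    precision = true_pos / alerts if alerts else 0.0
    recall = true_pos / events if events else 0.0
    return precision, recall


# Hypothetical year: two real events occurred.
chatty = precision_recall(true_pos=2, false_pos=700, false_neg=0)
quiet = precision_recall(true_pos=1, false_pos=1, false_neg=1)

print("chatty: precision %.3f, recall %.2f" % chatty)
print("quiet:  precision %.3f, recall %.2f" % quiet)
```

The chatty system catches everything but buries it under hundreds of false alarms; the quiet one misses an event but each alert is worth getting out of bed for. Which trade-off is acceptable is the design decision being argued here.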

Which brings us back to the "baby is awake" use case, because that's the one that actually fits this principle. High precision, low stakes, useful.

Here's what I'd tell the listener to do tonight. In Frigate, define a zone that covers the crib. Use the existing person detection model. Set up a Home Assistant automation that fires when a person is detected in that zone continuously for more than five minutes after, say, six in the morning. That's your "baby is up" notification. No custom machine learning, no training data, no false positives from chest movement detection. Just "there's a human in the crib and they've been moving around long enough that they're probably not going back to sleep."

That's elegant. And for the "about to fall" use case, the zone geometry approach you described earlier. Define a danger perimeter around the bed edge, trigger if the person bounding box enters it.

For non-motion, don't build an alert system. Use the motion heatmap for retrospective review. If you're worried about a particular night, check the timeline in the morning. Frigate stores recordings with motion data overlaid. You can see at a glance whether there were long flat periods.

The answer to the listener's prompt is basically: two of your three use cases are doable with existing tools and no custom models, and the third one is technically possible but probably counterproductive, and here's a safer alternative.

I want to emphasize that the listener's self-awareness about over-engineering is the right instinct. He said he's "probably not going to be doing this anytime soon," and I think that's correct. The best safety device for a ten-month-old is parental attention, not another notification. Technology should augment that attention, not replace it with anxiety.

There's one more dimension I want to touch on before we wrap up. The listener mentioned that these cameras were just lying around for security before the baby was born, and now they've become indispensable. There's something interesting about that transition. A security camera watching for intruders becomes a baby monitor watching for... At what point does monitoring become surveillance?

That's a real question, and it has developmental implications. A ten-month-old doesn't know or care that there's a camera. A three-year-old might. By the time the child is old enough to understand they're being watched, constant camera monitoring can affect their developing sense of privacy and autonomy. There's not a ton of research on this specifically, but the broader literature on childhood surveillance and trust is pretty clear. Kids need spaces where they're not being observed.

The system you build for a ten-month-old should probably have an expiration date.

And the listener seems like the kind of parent who'd recognize that moment when it arrives.

Let's pull this together into something actionable. Tier one, do it tonight: the "baby is awake" notification using existing Frigate person detection and zone-based Home Assistant automation. Tier two, do it this week: the bed edge danger zone using Frigate's zone configuration, with the understanding that it'll have false positives and that's probably fine for a newly mobile baby. Tier three, think very carefully before attempting: custom non-motion detection using temporal models, and probably just use the motion heatmap instead.

If you do attempt tier three, minimum hardware is a four-megapixel camera with H.265 and IR night vision, Power over Ethernet, Google Coral TPU, and a willingness to label hundreds of video clips of your sleeping baby. Plus the emotional fortitude to handle false alarms at three in the morning.

Which is a lot to ask of a sleep-deprived parent.

It's everything to ask.

The listener also mentioned cry detection in passing, and I think we should flag that as a whole separate episode. Audio event detection is a different signal processing domain with its own challenges, background noise, false positives from outside sounds, the fact that babies make a lot of weird noises that aren't crying. But it's a good prompt for another time.

And it's actually more tractable than video-based non-motion detection because audio classification on edge devices is a mature field. But we'll save that.

One thing I want to come back to before we close. The listener said something that stuck with me. He talked about calibrating sensitivity so false positives are rare enough that the system isn't majorly disruptive. That calibration problem is harder than it sounds because the acceptable false positive rate for a baby monitor is basically zero. If your phone buzzes once a month with a false "baby not breathing" alert, that's once a month you have a tiny heart attack. Over a year, that's twelve tiny heart attacks. That's not sustainable.

The medical device industry knows this. The reason FDA-cleared apnea monitors have such stringent requirements isn't just about sensitivity. It's about specificity. A false positive on an apnea alarm causes parental panic and unnecessary emergency room visits. The harm is real and measurable.

The calibration problem isn't just an engineering challenge. It's a harm reduction challenge.

The listener seems to understand that intuitively. The fact that he framed the whole question with caveats about anxiety tells me he's thought about this more clearly than most product managers at baby tech companies.

Alright, let's zoom out to the bigger picture. Where is this technology heading? The listener mentioned that Frigate is moving toward user-trainable models with a few frames and custom labeling. That's real. The May twenty twenty-six release of MobileSAMv2 running on Coral TPU means real-time segmentation is now available to home users. And there are vision-language models like YOLO-World and Grounding DINO that can do zero-shot detection, you type "baby about to fall off bed" as a text prompt and the model detects it without any task-specific training.

Those models are still too heavy for a Coral TPU, but they're getting lighter. I'd say we're twelve to eighteen months away from being able to run a zero-shot vision model on home hardware that can understand natural language queries about infant state. When that happens, the listener's use cases become trivial. You just describe what you're looking for in plain English, and the model watches for it.

Which is both exciting and slightly terrifying.

As all good technology should be. But even when the capability arrives, the psychological question remains. Just because you can get an alert for every conceivable infant state doesn't mean you should. The best baby monitor is still the one you can ignore.

The best parenting advice might be: trust your instincts, including the instinct that says "maybe I don't need to measure this."

The listener's prompt ends with his son making noise in the background, which feels like the universe's way of saying "the baby will let you know when something's wrong."

Babies are surprisingly good at that.

They really are.

To wrap this up for the listener. Your awake detection use case is doable tonight with zero custom models. Your fall detection use case is doable this week with zone geometry. Your non-motion detection use case is a research project with psychological side effects, and the motion heatmap is a better alternative. And your instinct to be hesitant about over-engineering all of this is the healthiest part of the whole equation.

That's the summary. And I'd add one general principle. Design your monitoring system for high precision, not high recall. Every false alarm trains you to ignore the system. A system that alerts you twice a year with high confidence is more valuable than one that alerts you twice a night with low confidence.

Now: Hilbert's daily fun fact.

Hilbert: The traditional Eritrean variant of kabaddi, known as "gedena," played in the seventeen twenties, used a scoring compound made from crushed acacia bark and fermented goat milk that chemically functioned as a weak protein-based adhesive, allowing players to temporarily mark opponents with a paste that would biodegrade within three hours.

I have so many questions and I'm going to ask none of them.

The biodegradation window feels oddly specific.

As vision models get smaller and faster, we're approaching a world where you can point a camera at anything and ask natural language questions about what's happening in the frame. That capability will arrive in home hardware soon. The harder question, the one this listener already figured out, is which questions are actually worth asking.

If you've got a weird prompt about home automation, AI, or the intersection of parenting and technology, send it to prompts at myweirdprompts dot com. We read every one.

This has been My Weird Prompts. I'm Herman Poppleberry.

I'm Corn. You can find us at myweirdprompts dot com or wherever you get your podcasts. Leave us a review if you enjoyed this, it helps.

See you next time.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.
