Satiety, Scarcity, and the Reinforcement Learning of Being Human
How We Eat, and What It Says About Us
There is something curious about the way we eat. Not just what we eat, but how we move toward food, the speed at which we consume it, and the feelings that orbit the act. I’ve been thinking about satiety lately, not as a nutritional metric, but as a philosophical, psychological, and computational signal. What is satiety, if not the body learning when it has enough?
On the surface, it seems simple: eat food, become full. But the deeper I thought, the more I began to see how the experience of eating, particularly plain, cold, unseasoned food, triggers a set of internal mechanisms that have more to do with attention, expectation, and behavior than just digestion.
It’s almost as if the body says: there’s no pleasure signal here, no need to over-engage.
Take cold boiled potatoes, for instance. Unseasoned, pulled from the fridge, dense and waxy. There’s nothing exciting about eating them. But what struck me was that these are precisely the foods that make me feel full the fastest. That observation becomes a cue, something like a policy update in my own behavior.
Reinforcement Learning and Eating Behavior
Here, the ideas from reinforcement learning come alive. In RL, an agent interacts with an environment, makes decisions, receives feedback (rewards), and updates its policy to improve outcomes. The framework doesn’t just apply to robots and algorithms. It applies to how we humans learn to eat, to live, to choose.
Eating high-palatability foods (hot, salted, seasoned, oiled) is a high-reward action. It gives immediate positive reinforcement, which strengthens the policy: eat more, eat faster. But it also interferes with the slower, long-term signal of satiety. The agent becomes biased toward short-term reward maximization.
Now consider the low-reward action: eating a cold, unsalted potato. The immediate reward is low, but the long-term signal, the feeling of fullness, the absence of hunger, the behavioral regulation that follows, is much stronger.
The agent learns that certain inputs, though not exciting, lead to better long-term regulation.
In RL terms, this resembles a problem with delayed or sparse rewards, where the payoff arrives well after the action that earned it.
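One way to make the contrast between immediate and delayed reward concrete is the discounted return, where a discount factor gamma controls how much the agent cares about the future. The sketch below is only an illustration: the meal names and every reward number are invented for this example, not measurements of anything.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma raised to its time step."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# Hypothetical reward sequences over five time steps (made-up numbers).
MEALS = {
    # Big immediate pleasure, then hunger returns.
    "seasoned_meal": [5.0, 0.0, 0.0, -1.0, -1.0],
    # Little pleasure now, but lasting fullness.
    "plain_potato": [0.5, 1.0, 1.5, 1.5, 1.5],
}

def preferred(gamma):
    """Name of the meal with the higher discounted return for this gamma."""
    return max(MEALS, key=lambda name: discounted_return(MEALS[name], gamma))

if __name__ == "__main__":
    for gamma in (0.1, 0.9):
        print(gamma, preferred(gamma))
```

With these assumed numbers, a short-sighted agent (gamma = 0.1) prefers the seasoned meal, while a far-sighted one (gamma = 0.9) prefers the plain potato. The only thing that changed is how heavily the future is weighted.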
Scarcity and the Real Reason We Break Routine
This isn’t just theoretical. In practice, I found myself instinctively reaching for the microwave, the salt, the hot sauce, not out of hunger, but out of a desire to increase the reward of the experience. But when I chose not to, when I just ate the cold potato, I noticed something else: my mind disengaged. I got bored. And that boredom was satiety showing up as disinterest. The novelty was gone, the dopamine spike was absent, and so I stopped.
Scarcity fuels attention. Scarcity drives value.
Our reward functions are shaped not just by taste but by scarcity, both environmental and psychological. In ancestral environments, salt, sugar, and fat were rare, so we evolved to crave them. But that logic extends beyond food. Today, we seek novelty itself as a scarce commodity: the new restaurant, the exotic flavor, the once-in-a-lifetime meal.
Training the System to Feel Enough
And yet, there is something simple but powerful in flipping that. When you train an agent in RL, you sometimes want to remove extra rewards so that it can focus on what actually matters. Maybe that’s what we need more of in life: stripping away the distractions, turning down the immediate reward, and letting the system learn what satiety actually feels like.
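A toy version of removing the extra reward might look like the sketch below. It assumes a learner that keeps a running value estimate for each meal and treats reward as satiety plus a weighted pleasure bonus. The meal signals, the `pleasure_weight` parameter, and the update rule are all hypothetical choices made for illustration.

```python
# Hypothetical per-meal signals (invented numbers, not measurements).
MEALS = {
    "seasoned": {"pleasure": 5.0, "satiety": 1.0},
    "plain": {"pleasure": 0.5, "satiety": 3.0},
}

def learn_values(pleasure_weight, steps=50, alpha=0.2):
    """Learn a value estimate per meal with an incremental update.

    Reward is satiety plus pleasure_weight times pleasure, so a weight
    of 0.0 strips the pleasure bonus out of the reward entirely.
    """
    q = {name: 0.0 for name in MEALS}
    for _ in range(steps):
        for name, signal in MEALS.items():
            reward = signal["satiety"] + pleasure_weight * signal["pleasure"]
            # Move the estimate a fraction alpha toward the observed reward.
            q[name] += alpha * (reward - q[name])
    return q

def greedy_choice(q):
    """Pick the meal with the highest learned value."""
    return max(q, key=q.get)

if __name__ == "__main__":
    print(greedy_choice(learn_values(pleasure_weight=1.0)))
    print(greedy_choice(learn_values(pleasure_weight=0.0)))
```

Under these assumptions, the learner settles on the seasoned meal while the pleasure bonus is part of the reward, and on the plain one once the bonus is removed and only satiety is left to learn from.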
In that sense, eating plainly isn’t just a nutritional decision. It’s a behavioral experiment. A test of how far I can detach the reward from the act, and still listen to the deeper signals of the system.
Final Thought
To be human, then, is to be a learning agent constantly caught between exploration and exploitation, novelty and nourishment. And maybe, just maybe, the most meaningful rewards are the ones that arise not from dopamine spikes, but from understanding when we’ve had enough.