Daniel sent us this one — and it's a natural follow-up to our modafinil episode. He's asking how addiction liability testing actually works in practice. You know, when scientists put an animal in a box and it presses a lever to get a drug, what are they really measuring, and how does that translate to a Schedule II versus Schedule IV decision? And then the bigger question: can we do this without animals at all? Because the FDA Modernization Act a few years back opened the door to alternatives, but the question is whether any of them actually work.
This is genuinely one of those places where the gap between what the public imagines and what actually happens in a lab is enormous. Most people picture a rat getting cocaine and scientists just sort of watching to see if it looks happy. That's not what's happening. What's happening is a brutally precise measurement of motivation — how much work an organism will do for a single dose of a drug. And it turns out that number predicts human abuse potential better than almost anything else we have.
Work ethic as addiction proxy. There's something almost comically Protestant about it.
It does feel that way. But the underlying logic is rock solid. The core assay is called drug self-administration, and the gold standard version uses something called a progressive ratio schedule. Here's how it works. You implant a catheter into a rat's jugular vein — or sometimes a non-human primate's — and connect it to an infusion pump. The animal is in an operant conditioning chamber, which is basically a box with a lever. Press the lever, get an infusion. At first, on a fixed ratio schedule, maybe one press equals one dose. The animal learns the association quickly if the drug is reinforcing. But fixed ratio doesn't tell you much about how badly the animal wants the drug — it just tells you the animal will take it when it's cheap.
You raise the price.
The number of lever presses required for each successive dose increases according to some exponential formula. Maybe dose one costs one press, dose two costs two, dose three costs four, dose four costs eight. In practice the escalation is usually a bit gentler than straight doubling, something like the equation presses equals five times e to the power of injection number times 0.2, minus five, rounded to the nearest whole press, which gives one, two, four, six, nine, twelve and so on. Eventually the animal hits a wall where the effort required exceeds its motivation. That wall is called the breakpoint: the highest number of lever presses the animal will perform for a single dose. And that number, that single number, is what everyone cares about.
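The escalation rule quoted above can be sketched in a few lines of Python. This is a minimal illustration, assuming the result is rounded to the nearest whole press. The names `pr_requirement` and `find_breakpoint`, and the idea of a fixed per-dose effort ceiling, are hypothetical simplifications for this sketch, not the procedure of any particular lab.

```python
import math

def pr_requirement(injection: int) -> int:
    """Lever presses required for the given injection (1-indexed).

    Uses the formula quoted in the conversation, 5 * e^(0.2 * n) - 5,
    rounded to the nearest whole press (the rounding is an assumption).
    """
    return round(5 * math.exp(0.2 * injection) - 5)

def find_breakpoint(max_presses_per_dose: int, max_injections: int = 100) -> int:
    """Highest requirement completed before the cost exceeds the ceiling.

    `max_presses_per_dose` is a hypothetical stand-in for motivation:
    the most presses the animal will make for a single infusion.
    """
    last_completed = 0
    for n in range(1, max_injections + 1):
        cost = pr_requirement(n)
        if cost > max_presses_per_dose:
            break
        last_completed = cost
    return last_completed

schedule = [pr_requirement(n) for n in range(1, 11)]
print(schedule)              # [1, 2, 4, 6, 9, 12, 15, 20, 25, 32]
print(find_breakpoint(100))  # 95
```

With a ceiling of 100 presses, the last requirement the simulated animal completes is 95, because the next step on the schedule asks for 118.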
Breakpoint is the price at which demand drops to zero.
That's exactly the right way to think about it. And the range across drugs is staggering. Cocaine consistently produces breakpoints in the range of eight thousand to twelve thousand lever presses per dose. Methamphetamine is similar, at seven to ten thousand. Heroin lands around three to five thousand. Modafinil comes in at roughly five hundred to eight hundred. Caffeine sits at two to four hundred. A saline placebo, which is basically salt water, produces breakpoints under a hundred. The animal will press a few times out of curiosity or boredom, but once the work requirement escalates, it stops. With cocaine, it will keep pressing thousands of times.
Ten thousand lever presses for a single hit of cocaine. That's an animal that has essentially abandoned all other priorities.
That's exactly what the data shows. In these experiments, if you give a rat unlimited access to cocaine on a fixed ratio schedule — meaning the price stays low — it will self-administer until it dies. It stops eating, stops sleeping, stops doing anything but pressing the lever. The progressive ratio paradigm is actually more humane in a sense, because the escalating cost forces the animal to stop before it kills itself. But it also gives you a much more nuanced picture of relative abuse liability.
Let's anchor this in the modafinil data. What specific study gave us that five to eight hundred breakpoint?
The key paper is Deroche-Gamonet and colleagues, published in Neuropsychopharmacology in 2014. They ran a series of self-administration experiments comparing modafinil to cocaine and amphetamine in rats. Modafinil did maintain self-administration, meaning animals would press the lever for it, which is already a meaningful finding, because not all drugs do. Most antidepressants don't. Antipsychotics don't. The fact that an animal will work for modafinil at all puts it in a category of drugs that have at least some reinforcing properties. But the breakpoint was dramatically lower than cocaine, roughly ten to fifteen percent of the cocaine breakpoint. That ratio, ten to fifteen percent, is essentially the quantitative basis for calling modafinil "mildly habit-forming."
That aligns with human clinical experience.
There are case reports of modafinil misuse — people taking four or five times the prescribed dose, crushing and snorting it, that kind of thing. But the prevalence is low, and the consequences are generally less severe than classical stimulant abuse. The animal data predicted that pattern before we had enough human epidemiological data to confirm it.
The model works. But I want to understand the machinery better. The rat has a catheter in its jugular vein. How do you know it's pressing the lever for the drug and not just pressing the lever because levers are fun?
That's exactly the right question, and it's why every self-administration study includes controls. The most important one is the saline substitution test. You train the animal to self-administer the drug, then you secretly replace the drug solution with saline. If lever pressing drops to near zero, you know the animal was pressing for the drug, not for the lever itself. You can also do a dose-response curve — if the animal adjusts its pressing rate when you change the dose, that's another sign that the drug is controlling the behavior. And then there's the yoked control, where a second animal receives infusions whenever the first animal presses the lever, but its own lever does nothing. The yoked animal doesn't develop persistent lever pressing, which tells you the contingency — the action-outcome relationship — is what drives the behavior.
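The saline substitution check described above boils down to comparing response rates before and after the switch. Here is a minimal sketch, assuming per-session lever press totals are available. The function name `substitution_drop` and all the numbers are made up for illustration.

```python
from statistics import mean

def substitution_drop(drug_sessions, saline_sessions):
    """Fractional fall in mean lever presses after the drug is swapped for saline."""
    drug_mean = mean(drug_sessions)
    saline_mean = mean(saline_sessions)
    return 1 - saline_mean / drug_mean

# Made-up session totals for one animal, for illustration only.
drug_presses = [40, 42, 38]   # sessions with drug in the syringe
saline_presses = [6, 4, 2]    # sessions after the covert switch to saline

drop = substitution_drop(drug_presses, saline_presses)
print(f"Responding fell by {drop:.0%}")
```

A drop close to one hundred percent is the pattern you would expect if the drug, and not the lever, was maintaining the behavior.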
It's not just that the drug feels good. It's that the drug teaches the animal that lever pressing works.
That's the operant conditioning part. And it's why self-administration is considered more predictive than something like conditioned place preference, which measures a different thing entirely. Conditioned place preference — CPP — is simpler. You have a two-chamber apparatus where one chamber is paired with the drug and the other with saline. After several pairings, you let the animal freely explore both chambers and measure how much time it spends on the drug-paired side. More time equals the drug was rewarding. Modafinil produces weak CPP compared to amphetamine — animals show a slight preference for the modafinil-paired chamber, but nothing like the overwhelming preference for the amphetamine-paired chamber. That aligns with the self-administration data. Weak but real.
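The conditioned place preference readout described above is just a share of time. A minimal sketch follows, assuming a two-chamber test where every second is spent on one side or the other. The function name `preference_fraction` and the session times are hypothetical.

```python
def preference_fraction(drug_side_seconds, saline_side_seconds):
    """Share of the free-exploration test spent in the drug-paired chamber."""
    total = drug_side_seconds + saline_side_seconds
    if total <= 0:
        raise ValueError("test session must have positive duration")
    return drug_side_seconds / total

# Hypothetical 15-minute test: 0.5 would mean no preference either way.
score = preference_fraction(540, 360)
print(round(score, 2))
```

Many labs also record a baseline session before conditioning and report the change from it. That step is left out here to keep the sketch aligned with what the conversation describes.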
Then there's a third model — drug discrimination.
Drug discrimination is about subjective effects. You train an animal to press one lever when it receives cocaine and a different lever when it receives saline. Once it learns the discrimination, you give it a test drug — say modafinil — and see which lever it presses. If it presses the cocaine lever, that means modafinil produces subjective effects similar to cocaine. The modafinil data here is interesting — animals partially generalize to the cocaine cue, but not fully. It's not a clean substitution. Which suggests modafinil's subjective effects overlap with cocaine but are distinct, probably because modafinil's mechanism is different — it's a weak dopamine transporter inhibitor, but it also affects orexin, histamine, and other systems.
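The "partial generalization" idea above is usually expressed as the percentage of responses made on the drug-paired lever. This sketch assumes cut-offs of 80 percent for full substitution and 20 percent for none. Those thresholds are a common convention assumed for illustration, not something stated in the episode, and the press counts are invented.

```python
def drug_lever_percent(drug_lever_presses, saline_lever_presses):
    """Percentage of all test-session presses made on the drug-paired lever."""
    total = drug_lever_presses + saline_lever_presses
    if total == 0:
        raise ValueError("no responses recorded")
    return 100 * drug_lever_presses / total

def classify_substitution(percent, full=80.0, none=20.0):
    """Label a test result; the 80/20 cut-offs are assumed, not from the episode."""
    if percent >= full:
        return "full substitution"
    if percent < none:
        return "no substitution"
    return "partial substitution"

pct = drug_lever_percent(55, 45)  # made-up counts for a test drug
print(pct, classify_substitution(pct))
```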
You've got three different behavioral models, each measuring a different dimension of abuse potential. Self-administration measures motivation. Conditioned place preference measures reward. Drug discrimination measures subjective similarity to known drugs of abuse. And modafinil scores low-to-moderate on all three. That's the "mildly habit-forming" profile in data terms.
And this is not just academic. This is what the FDA looks at. Under 21 CFR 314.50 and the FDA's 2017 Guidance for Industry on Assessment of Abuse Potential, any new molecular entity with central nervous system activity must undergo abuse potential assessment. The guidance lays out a tiered approach. Tier one is in vitro receptor binding and functional assays — does your drug bind to the dopamine transporter, the serotonin transporter, mu-opioid receptors, and so on. Tier two is animal behavioral models — the self-administration, CPP, and drug discrimination studies we just described. Tier three is human abuse potential studies — giving the drug to recreational drug users and asking them how much they like it. The data from all three tiers feeds into a single regulatory question.
Should this drug be scheduled under the Controlled Substances Act, and if so, which schedule? Schedule II is for drugs with high abuse potential and accepted medical use — that's cocaine, methamphetamine, oxycodone. Schedule III is moderate abuse potential — ketamine, anabolic steroids. Schedule IV is low abuse potential — that's where modafinil landed, alongside benzodiazepines. Schedule V is the lowest — cough syrups with codeine. And then there's unscheduled — no abuse potential worth regulating.
The breakpoint data maps onto those schedules.
It does, and the concordance is remarkably good. A 2013 meta-analysis by Horton and colleagues, published in Drug and Alcohol Dependence, looked at over sixty compounds and found about eighty-five percent concordance between animal self-administration models and human abuse liability as determined by clinical experience and epidemiological data. That's not perfect — there are false positives and false negatives — but for a behavioral assay that costs a fraction of a human trial, eighty-five percent is very strong.
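For anyone wondering what "concordance" means numerically, it is simply the fraction of compounds where the animal result and the human outcome agree. The sketch below uses a hypothetical split of 60 compounds chosen only to reproduce the eighty-five percent figure. The individual counts are not from the meta-analysis.

```python
def concordance(true_pos, true_neg, false_pos, false_neg):
    """Fraction of compounds where the animal result matches the human outcome."""
    total = true_pos + true_neg + false_pos + false_neg
    return (true_pos + true_neg) / total

# Hypothetical split of 60 compounds chosen to reproduce the quoted figure.
rate = concordance(true_pos=30, true_neg=21, false_pos=5, false_neg=4)
print(f"{rate:.0%}")
```

The same eighty-five percent could come from very different mixes of false positives and false negatives, which is why the failure cases matter as much as the headline number.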
Let's talk about those failures. Where does the model break?
The classic false positives are certain SSRIs, the selective serotonin reuptake inhibitors. Some of them show weak self-administration in animals, but in humans they have essentially zero abuse potential. Nobody is crushing and snorting Prozac. The likely explanation is that serotonin's role in reward is different across species, and the animal models were not designed to disentangle serotonergic from dopaminergic reinforcement.
The most notorious false negative is GHB, gamma-hydroxybutyrate. Early rodent self-administration studies showed very weak or no self-administration. Yet GHB turned out to have significant abuse potential in humans, particularly in club drug scenes and among bodybuilders. The problem appears to be pharmacokinetic. GHB has an extremely short half-life in rodents, much shorter than in humans, which means the drug clears before the animal can form a strong association between lever pressing and drug effect. If you adjust the dosing schedule to account for this metabolic difference, GHB suddenly looks more reinforcing in rodents. But nobody knew to do that in the early studies.
The model can miss things when the drug's pharmacokinetics differ substantially across species. Which is a known weakness of any animal-to-human translation.
And this gets to the broader question in the prompt — can we replace animal testing for abuse liability? Because the FDA Modernization Act 2.0, signed in December 2022 and effective in 2023, eliminated the 1938 mandate that required animal testing before human trials. It doesn't ban animal testing — that's a common misconception — but it says you no longer have to do it. You can use alternative methods instead. The question is whether any alternative method is actually good enough for abuse liability assessment.
The answer in mid-2026 is?
Not yet. But there are several approaches in development, and some are getting close. Let me walk through the main contenders. The simplest alternative is in vitro receptor panels. You take your drug and measure its binding affinity at a panel of targets: dopamine transporter, serotonin transporter, norepinephrine transporter, mu-opioid receptor, cannabinoid receptors, and so on. The logic is that abuse potential correlates with dopaminergic activity, so if your drug has high affinity for the dopamine transporter, it's probably abusable. Modafinil has weak DAT binding. Its inhibition constant, or Ki, is about four micromolar. Compare that to cocaine, which has a Ki of about 0.3 micromolar at DAT. A lower Ki means tighter binding, so that's a more than tenfold difference in binding affinity, and it roughly tracks the breakpoint ratio.
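The "more than tenfold" comparison is a one-line calculation from the two Ki values quoted above. The variable names here are just labels for this sketch.

```python
modafinil_ki_um = 4.0  # micromolar, as quoted in the conversation
cocaine_ki_um = 0.3    # micromolar, as quoted in the conversation

# A lower Ki means tighter binding, so the ratio shows how much
# more weakly modafinil binds DAT than cocaine does.
fold_weaker = modafinil_ki_um / cocaine_ki_um
print(f"{fold_weaker:.1f}-fold")
```

On these figures modafinil binds the dopamine transporter roughly thirteen times more weakly than cocaine.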
Binding affinity alone doesn't tell you whether the drug is a functional inhibitor, or whether it gets into the brain, or whether it has off-target effects that modulate the abuse potential.
Binding is necessary but not sufficient. A drug could bind tightly to DAT and be completely unable to cross the blood-brain barrier — no abuse potential. Or it could bind weakly but have active metabolites that are more potent — that's actually the case with some opioids. The in vitro panel gives you a screening tool, not a definitive answer. The FDA currently views it as tier one data — useful for flagging high-risk compounds early, but not a replacement for behavioral data.
What about human laboratory studies? If the goal is to predict human abuse potential, why not just test in humans?
We do, eventually. Human abuse potential studies — HAP studies — are tier three in the FDA framework. You recruit people with recreational drug experience, give them the test drug alongside a positive control like amphetamine and a placebo, and ask them to rate drug liking, willingness to take again, and subjective effects. These studies are highly predictive. The problem is you can't do them early in development. You need Phase one safety data first — you need to know the drug is safe enough to give to humans at all. And HAP studies are expensive, they take months, and they require specialized clinical sites. So they're not a replacement for early screening — they're a complement to it.
The gap is early screening. Something you can do before Phase one that gives you a reliable abuse liability signal without putting animals in operant chambers.
That's the holy grail. And the most promising approach right now is computational. Machine learning models trained on large datasets of compounds with known abuse liability. Here's a concrete example. In January 2026, a group from the NIH's National Center for Advancing Translational Sciences — NCATS — published a paper in Nature Machine Intelligence. They trained a model on twelve hundred compounds with known self-administration breakpoint data. The input was molecular structure — basically the chemical formula and three-dimensional conformation of each drug. The output was a predicted breakpoint. They then tested the model on two hundred novel compounds that were not in the training set. The model achieved seventy-eight percent accuracy in predicting whether a compound would produce breakpoints above or below five hundred lever presses.
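The accuracy figure above comes from turning a continuous prediction into a yes/no call at five hundred lever presses. Here is a minimal sketch of that scoring step. The function name `threshold_accuracy` and the five example compounds are invented for illustration and have nothing to do with the actual dataset.

```python
THRESHOLD = 500  # lever presses, the cut-off mentioned in the conversation

def threshold_accuracy(predicted, measured, threshold=THRESHOLD):
    """Share of compounds placed on the correct side of the threshold."""
    hits = sum((p > threshold) == (m > threshold)
               for p, m in zip(predicted, measured))
    return hits / len(measured)

# Made-up breakpoints for five compounds, for illustration only.
predicted = [9000, 650, 120, 480, 300]
measured = [10000, 400, 80, 700, 250]
acc = threshold_accuracy(predicted, measured)
print(acc)
```

Scoring this way ignores how far off a prediction is. A compound predicted at 480 and measured at 700 counts as a miss, while one predicted at 9000 and measured at 10000 counts as a hit.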
Seventy-eight percent. That's better than chance, worse than animals.
Significantly worse than animals, which are at eighty-five percent. But here's what's interesting. The model makes different kinds of errors than animals do. The false positives and false negatives don't fully overlap. So in principle, you could combine the computational prediction with the animal data and get better accuracy than either alone. Or you could use the computational model as a triage tool — run ten thousand virtual compounds through it, flag the five hundred that look most concerning, and only test those in animals.
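To see why non-overlapping errors are useful, here is a rough back-of-the-envelope calculation. It assumes the two methods make their mistakes completely independently on a yes/no question, which is a stronger assumption than the "don't fully overlap" described above, so treat the output as an optimistic bound and not a measured result.

```python
animal_acc = 0.85  # concordance quoted for animal models
model_acc = 0.78   # accuracy quoted for the machine learning model

# Assumption: the two methods err independently on a yes/no question.
both_right = animal_acc * model_acc
both_wrong = (1 - animal_acc) * (1 - model_acc)
p_agree = both_right + both_wrong
acc_when_agree = both_right / p_agree

print(f"agree on {p_agree:.1%} of compounds")
print(f"correct {acc_when_agree:.1%} of the time when they agree")
```

Under that assumption the two methods agree on about seventy percent of compounds and are right about ninety-five percent of the time when they do. The remaining disagreements are the cases that would need a closer look.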
That's the practical use case. Not replacing animal testing entirely, but reducing the number of animals needed by pre-screening computationally.
That's already happening. Several large pharmaceutical companies have internal machine learning models for abuse liability prediction. They use them to deprioritize compounds early in the pipeline. If a compound looks like it'll have a cocaine-level breakpoint, you might still develop it, because some abuse potential is expected for indications like pain, but you'll plan for the regulatory burden upfront. If it's supposed to be a non-addictive ADHD medication and the model flags it as high risk, you might kill the program before spending millions on animal studies.
The computational models are decision-support tools, not replacements. What about the more futuristic alternatives? I've heard brain organoids mentioned.
Organoids and microphysiological systems. This is where you grow human neurons in a dish — sometimes in three-dimensional structures that mimic brain circuits. The idea is you could build a dopaminergic reward circuit on a chip, expose it to a drug, and measure something like dopamine release or neuronal firing patterns as a proxy for abuse potential. The appeal is obvious — it's human tissue, so no species translation problem, and it avoids animal ethics concerns entirely. But as of mid-2026, there is no validated organoid-based assay for abuse liability. The technology is still experimental. The circuits don't fully recapitulate the complexity of a real brain's reward system, and we don't yet know which readout — dopamine release, firing rate, something else — best predicts human abuse potential.
We're years away from that being regulatory-grade.
At least five to ten years, probably more. And that's assuming the technology continues advancing at its current pace. There's also a validation problem that doesn't get talked about enough. To validate any alternative method — computational, organoid, whatever — you need a gold standard to compare it against. Right now, the gold standard is the animal self-administration data plus human clinical experience. But if your goal is to replace the animal data, what are you validating against? You end up in a circular logic problem where you're trying to prove your new method predicts what the old method predicted, but the whole point was that the old method might be wrong in ways you can't detect because you're using it as your truth standard.
That's a hard epistemological problem. How do you validate a replacement for the gold standard when the gold standard is the only thing you have?
The only way out is prospective validation. You use the new method to make predictions, then wait for real-world human data to accumulate and see if the predictions were correct. That takes years, sometimes decades, because abuse potential often only becomes apparent after a drug is on the market and being used by millions of people. The opioid crisis taught us that lesson painfully — oxycodone's abuse potential was underestimated by every model we had at the time.
Let me play this back. Animal self-administration models, particularly progressive ratio breakpoint, are the current gold standard with about eighty-five percent concordance with human abuse liability. They're imperfect — false positives from some serotonergic drugs, false negatives from drugs like GHB with weird pharmacokinetics. But they're the best we have. The alternatives are in vitro binding panels, which are cheap but insufficiently predictive on their own. Computational models, which are at seventy-eight percent accuracy and improving but not yet regulatory-grade. Human laboratory studies, which are highly predictive but can't be done early enough to replace screening. And organoids, which are promising but unvalidated. None of them can fully replace animal models today.
That's exactly where we are. And I should add — the FDA is actively working on this. They issued draft guidance on abuse potential assessment in March 2026, currently under review, with a final version expected late this year or early next year. The draft explicitly acknowledges alternative methods and provides a framework for how sponsors can submit data from computational or in vitro approaches. But it stops short of saying those methods can replace animal studies. It's more like: if you want to use alternatives, here's how to present the data, and we'll evaluate it on a case-by-case basis.
Which is regulatory language for "we're open to this, but the burden of proof is on you."
As it should be. The stakes are enormous. If you get abuse liability wrong in either direction, people suffer. False positive — you label a drug as high abuse potential when it's actually safe — and you might kill a promising treatment for depression or ADHD or chronic pain. Patients lose access to something that could help them. False negative — you miss a drug's abuse potential — and you get another OxyContin. You get a public health crisis that kills hundreds of thousands of people.
The asymmetry of error costs is brutal. A false positive costs hypothetical future patients a treatment option. A false negative costs real present patients their lives.
That's why the FDA is conservative about abandoning animal models. The models are imperfect, but they've been validated over decades against real-world outcomes. Any replacement needs to be at least as good, and ideally better, before regulators will accept it as a substitute rather than a supplement.
Let me ask a question from the other direction. The prompt asks how this data is used to inform drug development and safety. So you're a drug company. You've got a new compound that looks great in early efficacy screens — maybe it's a novel non-stimulant ADHD medication. You run the self-administration studies. What happens next?
The data goes into your Investigational New Drug application to the FDA. If the breakpoint is low — below, say, a thousand lever presses — you're probably in good shape. The FDA will still require human abuse potential studies in Phase two or three, but the animal data gives you a strong argument that the drug is unlikely to be scheduled above Schedule IV, or possibly not scheduled at all. That affects your development strategy — lower scheduling means fewer restrictions on prescribing, which means a larger potential market, which means the drug is more commercially viable. If the breakpoint is high — cocaine-level — you have a very different conversation. You might still develop the drug if it's for an indication where abuse potential is expected, like pain. But if it's for something like ADHD or depression, a high breakpoint is a major red flag. It doesn't necessarily kill the program, but it means you'll need to do extensive human abuse potential studies, you'll probably face Schedule II classification, and you'll need to build a Risk Evaluation and Mitigation Strategy — a REMS — into your development plan.
Which is expensive and limits the market.
A REMS can require special prescriber training, restricted distribution, patient registries, the works. For a small biotech company, that can make the difference between a viable product and a program that gets shelved.
The breakpoint number, that single metric, cascades into regulatory strategy, commercial viability, and ultimately whether a drug ever reaches patients.
That's why the animal models matter so much, and why the push for alternatives is so fraught. You're not just replacing a scientific assay. You're replacing a decision-making tool that has enormous consequences for public health and for the economics of drug development. If the alternative is even slightly less predictive, the downstream effects could be massive.
What's the timeline here? If I'm a listener in 2026, when should I expect the first drug to reach market with abuse liability data that came entirely from non-animal methods?
I'd say we're at least five years away from that, and probably more like ten. The NCATS machine learning model is promising, but seventy-eight percent accuracy isn't good enough for a stand-alone regulatory decision. I think what we'll see in the near term — the next three to five years — is hybrid approaches. Computational pre-screening to reduce animal numbers. Organoid data submitted as supplementary evidence alongside animal data. Human abuse potential studies done earlier and more efficiently. Gradually, as the alternative methods prove themselves against real-world outcomes, the reliance on animal models will decrease. But nobody is going to flip a switch and say "no more animal testing for abuse liability." It'll be a slow, evidence-driven transition.
Which is probably the right way to do it. The alternative is betting public health on unvalidated methods because they're more ethically appealing.
That's a bet I don't think anyone should be comfortable making. Animal testing raises genuine ethical concerns. I was a pediatrician, not a researcher, but I've thought about this a lot. At the same time, drug overdoses killed over a hundred thousand Americans in a single year at the peak, most of them involving opioids. Getting abuse liability wrong at scale is its own kind of ethical failure. The humane thing is to use the best tools available, while working aggressively to develop better ones.
To bring this back to the modafinil case that started the conversation. When we said in the previous episode that modafinil is mildly habit-forming, what we were really saying is: in progressive ratio self-administration studies, rats will press a lever about six hundred times for a dose of modafinil, compared to ten thousand times for cocaine and three hundred times for caffeine. That number — six hundred — places modafinil in a category of drugs that have real but limited reinforcing properties. It's enough to warrant Schedule IV classification and a warning about misuse potential. It's not enough to make it a drug of abuse on the scale of amphetamine or cocaine. And that conclusion is supported by multiple converging lines of evidence — self-administration, conditioned place preference, drug discrimination, and ultimately human clinical experience.
The larger point for listeners is this. When you hear a drug described as "non-addictive" or "mildly habit-forming" or "low abuse potential," those aren't marketing terms. They're regulatory conclusions backed by a specific, quantitative, decades-refined methodology. There's a number behind that label. A lever press count. And understanding where that number comes from — what it measures, what it doesn't measure, where it can be wrong — makes you a more informed consumer of drug information, whether you're a patient, a prescriber, or just someone trying to make sense of headlines about the latest wonder drug.
Ask what the breakpoint data actually showed. That's the takeaway.
Now: Hilbert's daily fun fact.
Hilbert: In the 1920s, explorers in Guyana discovered that the mycelial networks of certain Marasmius fungi produce faint crackling sounds as they transport water through their hyphae — a phenomenon mycologists now call "fungal cavitation acoustics," measurable at roughly twenty-five kilohertz, well above the range of human hearing but audible to bats and some rodents.
...fungi have a sound.
Bats can hear mushrooms drinking. Good to know.
Here's the open question I keep coming back to. As AI-driven drug discovery accelerates — and it is accelerating, hundreds of novel CNS compounds entering pipelines every year — the bottleneck shifts. Synthesis used to be the bottleneck. Now it's screening. Abuse liability screening, specifically. The pressure to develop faster, cheaper, animal-free methods is going to become overwhelming. The question isn't whether we'll eventually replace animal models for abuse liability. It's whether the replacements will be validated before they're deployed at scale. Or whether we'll learn about their failure modes the hard way.
That's exactly the tension. And it's not unique to abuse liability — it's the same dynamic playing out across toxicology, efficacy screening, all of preclinical drug development. The technology is racing ahead of the validation. The question is whether the regulatory framework can adapt fast enough to keep up without sacrificing the safety standards that took decades to build.
This has been My Weird Prompts. Thanks to our producer Hilbert Flumingtop. If you enjoyed this deep dive, you'd probably like our episode on how the DEA actually makes scheduling decisions — that's where we follow the data from the lab all the way to the Federal Register. Find it wherever you listen, or at myweirdprompts.
Until next time.