It's perfectly fine to generalize from small studies provided that:
1) You are generalizing on sound biological principles. In this case, the diet of the pig determines the IRMS delta. Shelby's witnesses made good points that we don't really know the diet of the pigs served at the truck, especially in light of the pandemic. We understand the difference between C3 and C4 plants and how they contribute to IRMS deltas -- we don't need to run a detailed study for every possible case. Otherwise you could never say anything with any confidence, since real-world scenarios are always different from controlled studies.
2) You don't overstate your confidence in the effect size. Yes, a study of 5 people is small, but that doesn't mean you can't learn anything from it, especially since 40% of the participants tested falsely positive. That suggests the test is not super reliable! Maybe you got really unlucky, the test is actually 99% reliable, and you just hit that 1% twice in 5 trials, but it should still raise your skepticism!
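A quick back-of-the-envelope check of that "really unlucky" scenario (a minimal sketch in Python; the 99% figure and the 5 trials are just the numbers from the sentence above, not anything taken from the study itself):

[code]
# If the test truly had a 1% false-positive rate, how likely is seeing
# 2 or more false positives in only 5 trials? (Hypothetical numbers from
# the discussion above, not from the study.)
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(prob_at_least(2, 5, 0.01))  # ~0.001, i.e. roughly a 1-in-1000 fluke
[/code]

If the test really were 99% reliable, a 2-out-of-5 result would be roughly a 1-in-1000 fluke, which is exactly why it should raise your skepticism.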
What it shouldn't do is make you say: "well, it's a small sample size so it's not going to shake my confidence in the test." You're effectively discarding the data for no reason.
The only result of such a finding should be to LOWER your confidence in the test and make you demand more rigorous studies. You are operating from an assumption that the test is perfect and forcing others to disprove it -- that's the inverse of the skepticism required for good science. If you see a result with a 40% false-positive rate, you should immediately be suspicious of the test.
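One way to put numbers on "lower your confidence" (a minimal sketch; the Beta prior below is a hypothetical stand-in for strong prior trust in the test, not anything from WADA or the study):

[code]
# Bayesian update of the believed false-positive rate after seeing 2 false
# positives in 5 trials. The Beta(1, 99) prior (mean ~1%) is a hypothetical
# "very confident in the test" starting belief.
def posterior_mean_fp_rate(false_pos, trials, prior_a=1.0, prior_b=99.0):
    """Posterior mean of a Beta(prior_a, prior_b) prior after binomial data."""
    return (prior_a + false_pos) / (prior_a + prior_b + trials)

print(posterior_mean_fp_rate(0, 0))  # prior mean ~0.01
print(posterior_mean_fp_rate(2, 5))  # posterior mean ~0.029, roughly tripled
[/code]

Even a prior heavily weighted toward trusting the test roughly triples its estimated false-positive rate after this data; what the data can't do is leave your confidence unchanged.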
P.S. You definitely can generalize from a sample of 5 sometimes if the effect size is large and the physical mechanism makes sense. Throw 5 people out of an airplane with no parachute and all 5 die; are you going to tell me that the danger of skydiving without a parachute needs a larger sample size?[/quote]
Haha! Your final point is a good one. There are very rare cases where the causal path is so obvious that generalising from the small sample is fine. That's not the case for the pig offal/nandrolone link, though, so it doesn't help Houlihan.
I'm not going to dismiss the two false positives found in that study, but you simply cannot conclude that the IRMS test has a 40% false-positive rate on the basis of that research - it's notable that the authors themselves never use that language. It's a call for further research. Again, you can qualify the findings of the study even further: the two false positives were both male, and both ate roasted wild boar meat. Who knows what the accuracy is like for women, or if the meat is prepared in a different way?
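To make "you can't conclude a 40% rate from n=5" concrete, here's a rough sketch of a 95% Wilson score interval for 2 false positives out of 5 (my own illustration, not the study authors' analysis):

[code]
# How precisely does 2-out-of-5 pin down a false-positive rate? A 95% Wilson
# score interval for the proportion. Illustrative only; not from the paper.
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

print(wilson_interval(2, 5))  # roughly (0.12, 0.77)
[/code]

A plausible range that wide is consistent with anything from "occasionally wrong" to "wrong most of the time", which is why the honest reading is a call for further research rather than a headline 40% figure.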
My assumption is that WADA has access to technical data that supports the accuracy of the IRMS test. I just can't believe that any statutory body would take decisions based on a test with a high level of inaccuracy; the organisation would have no legitimacy to act.