2025-03-01
8 min read

Building Healthcare AI at Scale: Lessons from the Trenches

The messy, complex, and rewarding reality of building AI systems that process 120k+ medical procedures daily.

AI/ML · Healthcare · Engineering

When I joined Mantys Healthcare AI, I thought I knew what "scale" meant.

I was wrong.

It’s one thing to build a system that handles a million requests a second for a social media app. It’s entirely another to build a system that handles 120,000 medical procedures a day, where a single error could mean a patient gets denied critical care or a hospital loses thousands of dollars.

The stakes are terrifyingly real. And that changes how you engineer.

The Messy Reality of Healthcare Data

If you’ve ever complained about parsing a messy CSV file, try dealing with medical records.

Healthcare data is the wild west. We don't get clean JSON APIs. We get PDFs faxed three times, scanned, and then emailed as a low-res image. We get handwritten notes from doctors scribbled in margins. We get insurance policies that are 100 pages long and contradict themselves on page 42.

Our first big hurdle wasn't building a smart model; it was building a system that could just read this stuff without choking.

We realized early on that standard OCR tools weren't going to cut it. We had to build multi-modal pipelines that could look at a document like a human does—understanding that the faint text in the corner is actually a critical denial code.
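To give a feel for the shape of that pipeline, here is a minimal sketch. The vision_model client, the prompt, and the field names are hypothetical placeholders rather than our production stack; the point is that the model sees the page image and the OCR text together.

# Hypothetical sketch of one multi-modal extraction step.
from dataclasses import dataclass

@dataclass
class PageInput:
    image_bytes: bytes   # rendered page image (keeps layout, stamps, margins)
    ocr_text: str        # plain-text layer from a standard OCR pass

def extract_fields(page: PageInput, vision_model) -> dict:
    prompt = (
        "Extract the procedure code, denial code, and payer name from this page. "
        "Pay attention to faint or handwritten text in the margins. "
        "OCR transcript for cross-checking:\n" + page.ocr_text
    )
    # Passing both the raw image and the OCR text means a low-contrast
    # denial code in a corner isn't silently dropped by the text layer.
    return vision_model.extract(image=page.image_bytes, prompt=prompt)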

The "98% Accuracy" Problem

One of my first major projects was our LLM evaluation framework. In most industries, 95% accuracy is an A+. In healthcare, 95% accuracy means you made errors on 6,000 procedures today. That is unacceptable.

We needed to get to 98%, and we needed to prove it.

We built a custom evaluation rig using CoVe (Chain-of-Verification) and G-Eval. But we didn't just trust the numbers. We also implemented log-probability methods, essentially a way to mathematically measure our confidence in every single extraction.

If our model is 99% sure, we let it pass. If it's 90% sure? It goes to a human.
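Here is a minimal sketch of that gate, assuming the model returns per-token log-probabilities for its extraction. The scoring function and the thresholds are illustrative, not our production values.

import math

def extraction_confidence(token_logprobs: list[float]) -> float:
    # Geometric mean of token probabilities: a simple proxy for how sure
    # the model was across the whole extracted answer.
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def route(token_logprobs: list[float], threshold: float = 0.98) -> str:
    confidence = extraction_confidence(token_logprobs)
    return "AUTO_APPROVE" if confidence >= threshold else "HUMAN_REVIEW"

# A confident extraction passes; a shaky one goes to a reviewer.
print(route([-0.01, -0.02, -0.005]))  # AUTO_APPROVE
print(route([-0.01, -0.90, -0.40]))   # HUMAN_REVIEW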

This "human-in-the-loop" approach was a mindset shift. We weren't trying to replace humans; we were trying to give them superpowers. Instead of manually reviewing 120,000 cases, our staff now only looks at the tricky 2% where their expertise actually matters.

Replacing Expertise with Software

Our Prior Authorization (PA) system was our moonshot.

PA is a notorious bottleneck. Hospitals have whole teams of experts who just memorize insurance rules. Does Blue Cross cover this MRI if the patient has had X-rays in the last 30 days?

Our goal was to codify that knowledge.

We built a system that ingests conflicting policies and patient history to make a determination. Watching this system work for the first time was magic. It was taking work that used to pile up for days and clearing it in seconds.
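A drastically simplified sketch of the determination step looks something like this. The rule shape, the payer name, and the 30-day window are made up for illustration; real policies are far messier, and ambiguous cases still go to a human.

from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class PolicyRule:
    payer: str
    procedure: str
    prior_imaging_window_days: int | None = None  # prerequisite imaging, if any

def determine(rule: PolicyRule, procedure: str, imaging_dates: list[date],
              today: date) -> str:
    if procedure != rule.procedure:
        return "NEEDS_HUMAN_REVIEW"          # no matching rule: don't guess
    if rule.prior_imaging_window_days is not None:
        window = timedelta(days=rule.prior_imaging_window_days)
        if not any(today - d <= window for d in imaging_dates):
            return "NEEDS_HUMAN_REVIEW"      # prerequisite imaging not found
    return "APPROVED"

# An MRI request with an X-ray 12 days ago clears a 30-day prerequisite.
rule = PolicyRule("ExamplePayer", "MRI", prior_imaging_window_days=30)
print(determine(rule, "MRI", [date(2025, 2, 17)], today=date(2025, 3, 1)))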

But it wasn't just about speed. It was about consistency. Humans get tired. They have bad days. Our API doesn't.

The Tech Stack: Boring is Better

You might expect our stack to be full of bleeding-edge, experimental tools.

It's not.

We use Python and FastAPI because they work. We use PostgreSQL because relational data is king in healthcare. We use Docker and AWS because we need compliance guarantees that we can take to the bank.

# Our philosophy: Keep the code simple, let the logic be complex.
class HealthcareEvaluationFramework:
    def __init__(self, cove_evaluator, domain_validator):
        # Evaluators are injected so the framework stays easy to test.
        self.cove_evaluator = cove_evaluator
        self.domain_validator = domain_validator

    def evaluate_extraction(self, document, extracted_data):
        # We run multiple checks because trust is earned, not assumed.
        cove_score = self.cove_evaluator.evaluate(document, extracted_data)
        domain_validity = self.domain_validator.validate(extracted_data)

        if cove_score < 0.95 or not domain_validity:
            # Anything we aren't sure about goes to a person, not to a payer.
            return self.flag_for_human_review()

        return "APPROVED"

    def flag_for_human_review(self):
        # Queue the case for an expert reviewer instead of auto-approving.
        return "NEEDS_HUMAN_REVIEW"

The innovation isn't in using some obscure new database; it's in how we orchestrate these boring tools to solve a wildly complex problem securely and reliably.
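In practice, "orchestration" mostly means wiring pieces like the evaluation framework above into plain FastAPI routes. This is a toy, self-contained sketch with the decision stubbed out, not our actual service.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ExtractionPayload(BaseModel):
    document_id: str
    extracted_data: dict

@app.post("/evaluations")
def evaluate(payload: ExtractionPayload) -> dict:
    # The real endpoint calls the evaluation framework; here the decision
    # is stubbed so the sketch runs on its own.
    status = "APPROVED" if payload.extracted_data else "NEEDS_HUMAN_REVIEW"
    return {"document_id": payload.document_id, "status": status}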

What Y Combinator Taught Us

Y Combinator pushes you to move fast. Healthcare pushes you to be careful.

Operating at the intersection of those two forces is where the magic happens. We learned to isolate the "dangerous" parts of our system (the decision engine) from the "safe" parts (the UI, the reporting). This allowed us to iterate on the interface daily while keeping the core logic locked down and compliant.
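In code, that isolation is mostly a narrow, boring interface between the two halves. A sketch of the idea, with hypothetical names:

from typing import Protocol

class DecisionEngine(Protocol):
    # The only door between the "safe" layers and the locked-down core.
    def decide(self, case_id: str) -> str: ...

def render_backlog_report(engine: DecisionEngine, case_ids: list[str]) -> dict:
    # UI and reporting code can change daily; it only ever sees case ids
    # and decision strings, never the policy logic behind them.
    return {case_id: engine.decide(case_id) for case_id in case_ids}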

The Future

We've recovered over $2.3M in pending claims for our partners. That's real money that goes back into buying equipment, hiring nurses, and treating patients.

Building healthcare AI is exhausting. The domain is dense, the regulations are headaches, and the data is ugly.

But when you see the tangible impact—when you see a hospital admin's face light up because their backlog is gone—it's the best job in the world.

If you're an engineer looking for a challenge that matters, come build in healthcare. We need you.


Harshavardhan is a Founding Engineer at Mantys Healthcare AI. He spends his days fighting with OCR pipelines and his nights dreaming about clean datasets.