Why Doctors Correcting AI Output May Be Unpaid Data Annotation Work

AI isn't solving the healthcare crisis. The healthcare workforce is working for AI.

Jim the AI Whisperer · 4 min read · May 2, 2026


I am chary of clinical AI, especially AI scribes. Patient conversations are arguably among the scarcest, and therefore most valuable, datasets in AI. I think clinicians may be going into this a little starry-eyed, assuming that HIPAA and the GDPR have shored up privacy protections. Clearly, they haven't.

In theory, a vendor could be fully HIPAA- and GDPR-compliant and still share de-identified data with third parties, who could then re-identify it if they hold (or can generate) the right metadata. De-anonymization with AI is remarkably easy; I've done it myself.
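The usual mechanism behind re-identification is a linkage attack: a dataset stripped of names can still carry quasi-identifiers (assumed here to be ZIP code, birth date and sex), and joining it to any auxiliary dataset that has names attached can single people out. The Python sketch below shows that idea with no AI involved. It is not the author's method, and every record, field name and the `link` helper are hypothetical.

```python
from collections import defaultdict

# Quasi-identifiers: fields that are not names but, combined, can single a
# person out. This particular choice of fields is an assumption for the sketch.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# Hypothetical clinical extract: names removed, quasi-identifiers left in place.
deidentified_notes = [
    {"zip": "02138", "birth_date": "1961-07-28", "sex": "F", "note": "hypertension follow-up"},
    {"zip": "02139", "birth_date": "1975-03-02", "sex": "M", "note": "post-op review"},
]

# Hypothetical auxiliary dataset (e.g. a public register) that does carry names.
auxiliary_register = [
    {"name": "Patient A", "zip": "02138", "birth_date": "1961-07-28", "sex": "F"},
    {"name": "Patient B", "zip": "02139", "birth_date": "1975-03-02", "sex": "M"},
    {"name": "Patient C", "zip": "02139", "birth_date": "1975-03-02", "sex": "M"},
]


def quasi_key(record):
    """Build the join key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(deidentified, auxiliary):
    """Return (note, name) pairs whose quasi-identifiers match exactly one person."""
    names_by_key = defaultdict(list)
    for person in auxiliary:
        names_by_key[quasi_key(person)].append(person["name"])

    matches = []
    for record in deidentified:
        candidates = names_by_key.get(quasi_key(record), [])
        if len(candidates) == 1:  # a unique match is a re-identification
            matches.append((record["note"], candidates[0]))
    return matches


if __name__ == "__main__":
    for note, name in link(deidentified_notes, auxiliary_register):
        print(f"{name}: {note}")
```

Only one note is re-identified here: the second matches two people in the register, which is the intuition behind k-anonymity (every combination of quasi-identifiers should match at least k people). Real attacks use fuzzier matching and far more fields.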

What can you do? I'd recommend reading the policies of clinical AI tools very thoroughly. Look for phrases like "improve services" or sharing with "related companies". Check the consumer protection reports on every tool you use. There's a saying in tech that if the software is free (or cheap), you're likely the product. With AI, you may be an unwitting data generator and AI trainer, and your patients are the training data.
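Scanning a long policy for those phrases can be partly automated. A minimal Python sketch, assuming you have the policy as plain text; the `flag_phrases` helper and its watch list are hypothetical and illustrative, and a keyword hit is only a prompt to read that clause closely, not a verdict on the tool.

```python
import re

# Illustrative watch list built from the advice above plus a few related terms.
# It is an assumption, not an exhaustive or authoritative list.
WATCH_PHRASES = [
    "improve services",
    "improve our services",
    "related companies",
    "affiliates",
    "de-identified",
    "train",  # prefix match, so it also catches "training" and "trained"
]


def flag_phrases(policy_text, phrases=WATCH_PHRASES, context=60):
    """Return (phrase, snippet) pairs for each case-insensitive match.

    Each snippet keeps up to `context` characters on either side of the match
    so the surrounding clause can be read in place.
    """
    hits = []
    for phrase in phrases:
        # Leading word boundary only, so "train" matches "training" as well.
        pattern = r"\b" + re.escape(phrase)
        for match in re.finditer(pattern, policy_text, flags=re.IGNORECASE):
            start = max(0, match.start() - context)
            end = min(len(policy_text), match.end() + context)
            snippet = " ".join(policy_text[start:end].split())  # collapse whitespace
            hits.append((phrase, snippet))
    return hits


if __name__ == "__main__":
    sample = (
        "We may share de-identified data with related companies "
        "to improve our services and train our models."
    )
    for phrase, snippet in flag_phrases(sample):
        print(f"[{phrase}] ...{snippet}...")
```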

Beyond policy review, medical students should understand how annotation labour is priced in AI supply chains. When a clinician corrects a model output, that correction can become a training signal, often without compensation, attribution, or governance review.
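To make that concrete, here is a minimal Python sketch of how a single scribe correction could be packaged as a training example. It assumes a hypothetical vendor pipeline: the `PreferenceRecord` schema and `correction_to_record` helper are illustrative inventions, loosely following the prompt/chosen/rejected layout used by some open-source preference-tuning tools, and no specific vendor is known to use them.

```python
import difflib
import json
from dataclasses import asdict, dataclass


@dataclass
class PreferenceRecord:
    """Hypothetical training example built from one clinician correction."""
    prompt: str        # the input the model saw, e.g. a transcript excerpt
    rejected: str      # the model's original draft
    chosen: str        # the clinician's corrected text
    edit_ratio: float  # 1 - difflib similarity; 0.0 means the draft was accepted as-is


def correction_to_record(prompt: str, model_draft: str, clinician_final: str) -> PreferenceRecord:
    """Package a model draft and its clinician correction as a chosen/rejected pair."""
    similarity = difflib.SequenceMatcher(None, model_draft, clinician_final).ratio()
    return PreferenceRecord(
        prompt=prompt,
        rejected=model_draft,
        chosen=clinician_final,
        edit_ratio=round(1.0 - similarity, 3),
    )


if __name__ == "__main__":
    record = correction_to_record(
        prompt="Patient: the chest pain started two days ago.",
        model_draft="Chest pain for 2 weeks.",
        clinician_final="Chest pain for 2 days.",
    )
    # One line of a hypothetical JSONL training file.
    print(json.dumps(asdict(record)))
```

The design point is that the clinician's edit, not the draft, carries the value: pairs like this are the kind of input that preference-tuning methods such as DPO train on, and a field like `edit_ratio` hints at how much expert work each record embodies.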

Institutions can mitigate this by negotiating data-use clauses, requiring transparency dashboards, and treating clinician review time as protected academic labour rather than invisible infrastructure for vendors.
