# MedQA: Fine-Tuning a Clinical AI on AMD ROCm — No CUDA Required
## The Idea

Medical question answering is one of those tasks where the stakes are genuinely high. A model that confidently picks the wrong answer on a clinical MCQ isn't just wrong — it's dangerous. At the same time, most open-source medical AI work assumes you have an NVIDIA GPU: CUDA is the default, and everything else is an afterthought.

This project challenges that assumption. MedQA is a LoRA fine-tuned clinical question-answering model built entirely on AMD hardware using ROCm. It takes a multiple-choice medical question and returns both the correct answer letter and a clinical explanation of the reasoning. The entire training pipeline — from data loading to adapter export — runs on an AMD Instinct MI300X without a single CUDA dependency.

- 🤗 Model on HuggingFace Hub: HK2184/medqa-qwen3-lora
- 🚀 Live Demo: HuggingFace Spaces
- 💻 GitHub: MedQA-Medical-AI-on-AMD-ROCm

## Why AMD ROCm?

The AMD Instinct MI300X is a remarkable piece of hardware: 192 GB of HBM3 memory in a single device. For LLM fine-tuning, VRAM is often the binding constraint — it dictates batch size, sequence length, and whether you need to quantize at all. With 192 GB available, we trained Qwen3-1.7B with LoRA in full fp16 without any 4-bit or 8-bit quantization hacks.

More importantly, the goal was to prove that the HuggingFace ecosystem — Transformers, PEFT, TRL, Accelerate — works seamlessly on ROCm. It does. The same training code that runs on CUDA runs on ROCm with three environment variables set:

```python
import os

os.environ["ROCR_VISIBLE_DEVICES"] = "0"
os.environ["HIP_VISIBLE_DEVICES"] = "0"
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "9.4.2"
```

That's it. No code changes. No custom kernels. No CUDA compatibility shims.

## The Dataset: MedMCQA

MedMCQA is a large-scale multiple-choice question dataset derived from Indian medical entrance exams (AIIMS and NEET PG). Each example contains:

- A clinical question
- Four answer options (A–D)
- The correct answer index
- An optional free-text explanation (the `exp` field)

For this project we used 2,000 training samples — a deliberately small slice to demonstrate that meaningful fine-tuning is achievable quickly. Training took approximately 5 minutes on the MI300X.

## Model: Qwen3-1.7B

The base model is Qwen/Qwen3-1.7B, a compact model from Alibaba's Qwen3 family. At 1.7 billion parameters it is cheap to fine-tune yet capable enough to produce coherent clinical reasoning, and it loads cleanly with HuggingFace Transformers using `trust_remote_code=True`.

## The Prompt Format

Consistency in prompt formatting is critical for instruction fine-tuning. Every training example and every inference call uses the same template:

```
### Question:
{question}

### Options:
A) {opa}
B) {opb}
C) {opc}
D) {opd}

### Answer:
{answer_letter}) {answer_text}

### Explanation:
{explanation}
```

During training the model sees the full sequence, including the answer and explanation. During inference we provide everything up to `### Answer:\n` and let the model complete from there.

## Training with LoRA

Rather than fine-tuning all 1.7 billion parameters, we use LoRA (Low-Rank Adaptation) via the PEFT library. LoRA injects small trainable rank-decomposition matrices into the attention layers, leaving the base weights frozen.

### LoRA Configuration

```python
from peft import LoraConfig, get_peft_model, …
```
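Building on those imports, here is a minimal, self-contained sketch of how the adapter could be configured and attached. The rank, alpha, target modules, and dropout values below are illustrative assumptions, not the project's published settings:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model in plain fp16: with 192 GB of HBM3 there is no need to quantize.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-1.7B",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

# Illustrative LoRA hyperparameters (assumed, not taken from the project).
lora_config = LoraConfig(
    r=16,                  # rank of the low-rank update matrices
    lora_alpha=32,         # scaling factor applied to the update
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the injected matrices are trainable
```

Wrapping the model with `get_peft_model` freezes every base weight; gradients flow only through the injected rank-decomposition matrices, which is what keeps the fine-tune fast and the exported adapter small.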
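Feeding the trainer means rendering each MedMCQA row into the template above. A sketch of that step follows; the dataset id (`openlifescienceai/medmcqa`) and the field names (`question`, `opa`–`opd`, `cop`, `exp`) come from the Hub version of the dataset, and `to_text` is a hypothetical helper rather than code from the project:

```python
from datasets import load_dataset

# 2,000-sample slice, matching the size used in this project.
raw = load_dataset("openlifescienceai/medmcqa", split="train[:2000]")

def to_text(ex):
    """Hypothetical helper: render one MedMCQA row into the prompt template."""
    options = [ex["opa"], ex["opb"], ex["opc"], ex["opd"]]
    letter = "ABCD"[ex["cop"]]  # "cop" holds the 0-indexed correct option
    return {
        "text": (
            f"### Question:\n{ex['question']}\n\n"
            "### Options:\n"
            f"A) {options[0]}\nB) {options[1]}\nC) {options[2]}\nD) {options[3]}\n\n"
            f"### Answer:\n{letter}) {options[ex['cop']]}\n\n"
            f"### Explanation:\n{ex['exp'] or ''}"
        )
    }

train_dataset = raw.map(to_text)
```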
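With the adapter attached and the data formatted, TRL's `SFTTrainer` can drive the run; it picks up the formatted prompts from the default `text` column. The training arguments below are placeholders, not the settings behind the 5-minute MI300X run:

```python
from trl import SFTConfig, SFTTrainer

# Placeholder hyperparameters; the project's actual settings may differ.
training_args = SFTConfig(
    output_dir="medqa-qwen3-lora",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=2e-4,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,                  # the PEFT-wrapped model from the earlier sketch
    args=training_args,
    train_dataset=train_dataset,  # the 2,000 formatted MedMCQA examples
)
trainer.train()
trainer.save_model("medqa-qwen3-lora")  # exports only the LoRA adapter weights
```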
