Developing a Heart Language Foundation Model
- Name: Developing a Heart Language Foundation Model (HLM)
- EuroHPC machine used: MeluXina
- Topic: Natural sciences; Computer and information sciences
Overview of the project
“Heart language is unique in its voice, attitude, accent, and grammar. We aim to develop a foundational heart language model that analyzes the specifics of each heart. Creating a generative AI solution to detect and classify arrhythmia through heart language analysis is challenging, and the research demands extensive high-performance computing (HPC) resources and expertise. Our business proposal is to develop a prototype heart language model for a heart monitoring solution in the EU, utilizing wearable ECG sensors and generative AI technology. This solution automatically alerts outpatients to dangerous arrhythmias, a service that is currently unavailable. The benefits include earlier hospital discharges, reduced hospitalization time, improved patient monitoring during everyday activities, prevention of severe heart damage, improved healthcare, and increased life expectancy. Such a solution would boost our sales by reaching more customers, including large healthcare providers.

We certified ViewECG as a software medical device (CE Mark) for heart monitoring in 2020. During development, we implemented various signal processing and AI-based algorithms to meet the performance requirements set by medical standards. The AI Cardiologist project (part of the ELISE Open Call, funded by the EU H2020 project no. 951847) provided proof of concept and a prototype of a generative AI solution to detect atrial fibrillation. The CardioHPC experiment (part of the EuroHPC Joint Undertaking through the FF4EuroHPC project, grant agreement No. 951745) provided sufficient HPC expertise for developing machine learning (ML) algorithms.

We faced challenges while developing additional generative AI models and constructing the heart language model. Our objective is to train the model on a comprehensive set of benchmark ECG data with various tokenizers, which requires significant computing resources. In the ELISE project, we utilized annotations rather than raw ECG samples, over 125 times less data, and developing an initial model still took six months using two modern GPUs (NVIDIA Ampere A100). We understood that creating the new generative AI solution would require 100 times more resources, achievable within a reasonable timeframe only through large-scale HPC.

Through the EuroHPC Benchmark Access call, we received 400 node hours over two months on the MeluXina GPU partition at LuxProvide in Luxembourg; this access was completed in January 2025. We were later granted six months of Development Access on MeluXina, completed in December 2025.”
How did EPICURE support the project and what were the benefits of the support?
“We requested EPICURE support to address scalability and performance challenges in running PyTorch-based foundation model training on EuroHPC infrastructure. The main challenge was migrating from single-node, DataParallel execution to an efficient multi-node, multi-GPU setup using PyTorch Distributed Data Parallel (DDP) under SLURM. EPICURE provided technical guidance and concrete implementations, including a containerized training environment using Apptainer, SLURM job submission scripts, and a torchrun-based launcher that correctly configures NCCL, ranks, and world size across nodes. Additionally, EPICURE optimized the training code by tuning DDP communication (e.g., bucket_cap_mb), enabling gradient accumulation and mixed precision, and integrating profiling tools. This support enabled scalable, synchronized training on MeluXina with improved throughput, stability, and resource utilization.
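To illustrate, a minimal sketch of such a torchrun-launched DDP setup might look as follows. The setup_ddp helper and the bucket_cap_mb value of 50 are hypothetical placeholders, not the project's actual code; the NCCL backend, the LOCAL_RANK environment variable set by torchrun, and the bucket_cap_mb parameter itself are standard PyTorch DDP machinery.

```python
# Hypothetical sketch of multi-node DDP initialization under torchrun.
# torchrun (launched once per node by a SLURM batch script) sets RANK,
# WORLD_SIZE, and LOCAL_RANK in the environment of every worker process.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def setup_ddp(model: torch.nn.Module) -> DDP:
    # NCCL is the standard backend for multi-node, multi-GPU training;
    # rank and world size are read from the torchrun environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    model = model.to(local_rank)
    # bucket_cap_mb groups gradients into buckets before each all-reduce;
    # raising it from the 25 MB default (50 here is a placeholder) trades
    # fewer, larger NCCL calls against communication/computation overlap.
    return DDP(model, device_ids=[local_rank], bucket_cap_mb=50)
```

Under SLURM, launching one torchrun process per node (with the node count, processes per node, and a shared rendezvous endpoint) is enough for a helper like this to pick up the correct rank on every worker.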
The support from EPICURE enabled optimized multi-node, multi-GPU distributed training on the MeluXina HPC platform across multiple configurations of CPUs, GPUs, and sliding window sizes, evaluated on 16 datasets and delivering significant HPC-level performance gains. Using the provided Apptainer container and SLURM/torchrun launcher, we achieved full GPU utilization with correct device binding across nodes. Performance analysis revealed that a sliding window size of 128 consistently delivered the best throughput, while CPU scaling beyond 16 cores provided no additional speedup. GPU scaling achieved sub-linear gains: two GPUs yielded a 1.6× speedup and four GPUs reached a maximum of 1.9×, below the theoretical 4× linear scaling. Key optimizations included reduced gradient communication overhead via DDP bucket tuning, gradient accumulation to maximize the effective batch size, and bfloat16 mixed precision to reduce the memory footprint and improve throughput. Real-time profiling with Torch Profiler enabled identification of CPU, GPU, and memory bottlenecks, while efficient DataLoader design (pinned memory, persistent workers, DistributedSampler) ensured high data throughput. Collectively, these optimizations enabled successful experimentation across all 16 datasets with diverse model configurations, reduced wall-clock training time, and improved energy efficiency per sample. They also highlighted both the strong potential of HPC for accelerating AI model development in healthcare and the need for further code optimization to improve GPU scaling efficiency.”
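As a concrete illustration of how these pieces fit together, here is a minimal training-loop sketch under stated assumptions: the train_one_epoch function, the model and dataset interfaces, the per-GPU batch size of 32, the eight DataLoader workers, and the accumulation factor of 4 are all placeholders, while DistributedSampler, pinned memory, persistent workers, bfloat16 autocast, and DDP's gradient synchronization are the standard PyTorch features named above.

```python
# Hypothetical sketch: gradient accumulation, bfloat16 mixed precision,
# and a DDP-aware DataLoader, combining the optimizations described above.
import torch
from torch.utils.data import DataLoader, Dataset, DistributedSampler


def train_one_epoch(model, optimizer, train_dataset: Dataset, epoch: int,
                    accum_steps: int = 4):  # placeholder accumulation factor
    sampler = DistributedSampler(train_dataset)  # shards data across ranks
    sampler.set_epoch(epoch)  # reshuffles the shards differently each epoch
    loader = DataLoader(
        train_dataset,
        batch_size=32,            # placeholder per-GPU batch size
        sampler=sampler,
        num_workers=8,            # placeholder worker count
        pin_memory=True,          # faster host-to-device copies
        persistent_workers=True,  # keep workers alive between epochs
    )

    model.train()
    optimizer.zero_grad(set_to_none=True)
    for step, (ecg, labels) in enumerate(loader):
        ecg = ecg.cuda(non_blocking=True)
        labels = labels.cuda(non_blocking=True)
        # bfloat16 autocast cuts the memory footprint and raises throughput
        # on A100-class GPUs without the loss scaling that float16 needs.
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            # Placeholder interface: assume the model returns the loss.
            loss = model(ecg, labels) / accum_steps
        # DDP all-reduces gradients during backward(); wrapping non-boundary
        # steps in model.no_sync() would skip that communication entirely.
        loss.backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)
```

Note that this sketch assumes the process group from the launcher setup above is already initialized, since DistributedSampler queries the default group for rank and world size.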
Additional references
Tudjarski, S., Gusev, M., & Kanoulas, E. (2025). Transformer-based heart language model with electrocardiogram annotations. Sci Rep 15, 5522. https://doi.org/10.1038/s41598-024-84270-x
Mileski, D., Petrovski, N., & Gusev, M. (2025). Scalability Evaluation of HPC Multi-GPU Training for ECG-based LLMs. MIPRO 48th ICT and Electronics Convention, 984-989. https://arxiv.org/pdf/2503.21033
Contact the project:
- Marjan Gusev (marjan.gusev@innovation.com.mk)
- Dimitar Mileski (dimitar.mileski@innovation.com.mk)
- contact@innovation.com.mk