EPICURE support

To benefit from EPICURE support, you must have an active EuroHPC Joint Undertaking (JU) allocation project. You can apply for an allocation and the associated support services through the open EuroHPC JU Access Calls.

We can guide you through the application process. Get in touch with us: epicure-applications@postit.csc.fi

You can usually get support within a few days after submitting a request. The process is straightforward: once your request is accepted, an initial meeting will be scheduled to discuss your needs and determine the next steps for support. You can submit your support request directly at https://eurohpcsupport.eu/apply-now/.

EPICURE provides Level 2 and Level 3 support in code enabling and scaling, performance analysis, benchmarking, code refactoring, and code optimization. More information about our support services is available on the EPICURE website: https://epicure-hpc.eu/support-services/

EPICURE does not provide Level 1 support. Level 1 support refers to basic customer support (e.g., troubleshooting minor issues).

We will guide you through this process, but here are some examples of how to do it:

LUMI: Go to https://my.lumi-supercomputer.eu/login/ > Project (left menu bar) > (project name) > team > Add. Add the email address the EPICURE team members gave you.

Karolina: All users must have an active IT4I account. Once their account is created, you can add them to your project using their login name: https://docs.it4i.cz/en/docs/general/get-project/applying-for-resources#authorize-collaborators-for-your-project. If you have any issues, please contact user support at: support[at]it4i.cz

General supercomputer-related questions

No. Some problems require in-depth support that is out of scope for EPICURE, for example when the issue lies in the core functionality of the code itself.

Yes. Installing user-level software is common and supported on HPC systems, as it is not possible to pre-install or provide every tool or application that users may require.

Therefore, there is a best practice for this (a short sketch follows the list):

  • Check the availability of the required software
    • Run ‘module avail’ or ‘module spider’, review the documentation or ask colleagues/HPC support.
    • If they are available, try to use them first, as the centrally provided versions are typically better configured and supported.
  • Install only in approved locations that you control
    • Check the documentation for your system to find out what is shared and what is backed up.
    • $HOME space is typically suited to small, non-shared installations.
    • $PROJECT/$GROUP space is typically suited to shared software.
    • $SCRATCH/$WORK space for large, temporary builds.
  • Check that no root access is required (e.g. the software can be installed in user space and does not need access to hardware counters).
  • Use the appropriate package managers
    • ‘pip’ works well when used with virtual environments or with the ‘--user’ install option.
    • micromamba / miniforge / etc.
    • Make sure to install environments in your own directory.
    • Check the HPC documentation for any restrictions or preferred Python environment setups.
  • Respect licensing requirements
    • Ensure that you are licensed to install/use the software.
    • Verify that licence servers are reachable from compute nodes, if needed.
  • Organise and manage your environment
    • Maintain a clear directory structure.
    • Provide module files for easy loading and version management.
  • Use HPC-friendly packaging tools where possible
    • Spack and EasyBuild simplify complex dependency management.
    • They help to ensure portability and reproducibility across teams.
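
As a short sketch of these steps, here is what checking for an existing module and installing a Python package into a project-owned virtual environment might look like; the module name, project path, and package name below are placeholders, so check your system's documentation for the correct locations:

# Check whether the software is already provided centrally
module spider cmake

# If it is not, create a virtual environment in your project space (path is a placeholder)
module load python
python -m venv /project/my_project/envs/mytools
source /project/my_project/envs/mytools/bin/activate

# Install into the environment rather than the system Python
pip install mypackage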

If you are unsure about the location, licensing or best practice, contact HPC (or EPICURE) support for assistance. They may even integrate your successfully installed software into the module system for shared use.

On LUMI, you should provide the license for the additional software you want to use.

Right now, Deucalion provides software mostly built with EasyBuild (https://easybuild.io/), with some exceptions for specific toolchains (such as Fujitsu’s proprietary toolchain in the ARM partition). We also support EESSI (https://www.eessi.io/) in all three partitions. If you need help creating your own EasyBuild configuration files, please contact the Deucalion customer support team!
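
As a minimal sketch of what using the EESSI stack typically looks like (the initialisation path and version directory follow the EESSI documentation and may differ on Deucalion; the module name is just an example):

# Initialise the EESSI software stack (path from the EESSI documentation; may differ locally)
source /cvmfs/software.eessi.io/versions/2023.06/init/bash

# Browse and load software provided through EESSI
module avail
module load GROMACS   # example module; check 'module avail' for what is actually provided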

A complete overview of available modules can be found here: https://docs.it4i.cz/en/docs/software/modules/modules-matrix

When you are allocated a project on an HPC system, you are granted specific quotas that determine how many resources you can use.

The first of these is compute time, which is often expressed as core-hours (node-hours/GPU-hours). Some HPC centres allow you to continue submitting jobs after the allocation has been used up, but with a reduced priority; others block submissions. Some centres also define how compute time may be consumed, e.g. on some HPC systems you can use the entire allocation at any time during the project period; on others you are expected to use resources linearly throughout the project time to ensure a fair and balanced system load. You should therefore always check the local policy to avoid unexpected interruptions in your work.

In addition to compute time quotas, you will have storage quotas, which are set differently on each filesystem. If you exceed the quota for any filesystem, you may be unable to create new files or directories until usage is reduced.

Finally, each filesystem also has an inode quota, which limits the number of files and directories that can be stored. Even if you have free storage capacity, you cannot create additional files once the inode limit is reached. Inode limits can be just as restrictive as storage size.

If you anticipate any violations of quota policies (e.g. large spikes in workload, significant increase in storage requirements, or workflows that generate many small files), contact HPC Support well in advance, as quota adjustments often require approval and planning.
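
How you check your quotas depends on the system and filesystem; system-specific examples follow below. On Lustre-based filesystems, commands along these lines are often available (the mount point is a placeholder):

# Show storage and inode usage and limits for your user on a Lustre filesystem
lfs quota -h -u $USER /scratch

# Show usage for your project/group, if quotas are set per group
lfs quota -h -g my_project /scratch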

Examples

MareNostrum 5:

Use this command to get information about all your accounts:

module load bsc

bsc_acct

Use this command to get the quota information for all your groups/accounts:

bsc_quota

Deucalion:

Run the ‘billing’ command to see how many hours you have left in your account. To get usage per user for a specific account, run billing -a <your account>.

For a supercomputer, we’d recommend Python’s built-in venv in most cases. Here’s why:

Advantages of venv on supercomputers:

  • Faster and lighter – No heavy dependency solver, minimal overhead
  • Better compatibility – Uses the system’s optimised Python builds that HPC staff have compiled with performance flags
  • Simpler – Easier for sysadmins to support and troubleshoot
  • Module system integration – Works well with environment modules (Lmod/TCL modules) that most HPC systems use
  • No conda conflicts – Avoids issues where conda’s packages override carefully-tuned system libraries

When conda might be better:

  • You need non-Python dependencies (like CUDA libraries, compilers) that aren’t available via modules
  • You’re using complex scientific stacks where conda handles tricky dependency chains (e.g., some bioinformatics workflows)
  • You need exact reproducibility across different systems

Best practice on HPC:

# Load optimized Python from modules
module load python/3.11

# Create venv
python -m venv myenv
source myenv/bin/activate

# Install what you need
pip install numpy scipy torch

Many supercomputers explicitly recommend against conda because it can interfere with their optimised software stack. Check your centre’s documentation first – they often have specific guidance and may provide pre-built environments for common frameworks.

Yes, you can. For documentation on using Vega, visit Vega Docs (https://en-vegadocs.vega.izum.si/). If you need additional support or access on Vega, you can contact the team at support@sling.si.

Many HPC systems set job runtime limits to ensure the system runs efficiently for everyone. Shorter jobs can be fitted more easily into open time slots, improving overall cluster usage. It also promotes fairness by preventing one user from monopolising machines for days, and less is wasted if a job crashes. Longer jobs (24+ hours) are possible on some systems, but they may be given a lower priority or require special approval. The best approach is to estimate your runtime realistically: overestimating it too much will keep your job waiting longer in the queue, while underestimating it will result in the job being terminated early.

If your workflow requires a runtime of more than 24 hours, there are several options available to you.

  • Implement checkpointing/restart functionality in your application so that it can periodically save its state. If the job reaches the time limit, you can resume from the last checkpoint rather than starting again from the beginning.
  • Divide your workflow into logical segments (e.g. time steps, iterations or data chunks) and submit them as separate jobs that start automatically once the previous one has finished. You can use Slurm’s job array or job dependency functionality for this [1,2] (see the sketch after this list). This keeps each job within the time limit while still enabling a long workflow to run.
  • Request access to longer queues (if available). Some systems offer special queues or partitions with extended limits for eligible workloads. These often require justification and approval.
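
As a minimal sketch of the second option, a chain of Slurm jobs linked with dependencies (the script name and number of segments are placeholders; see the Slurm documentation referenced below):

# Submit the first segment and capture its job ID
jobid=$(sbatch --parsable segment.sh)

# Submit the next segment so that it only starts after the previous one completes successfully
jobid=$(sbatch --parsable --dependency=afterok:$jobid segment.sh)

# Repeat (or loop in a script) for as many segments as the workflow needs
sbatch --dependency=afterok:$jobid segment.sh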

If you are unsure what to choose, or if none of the above options are feasible, contact the HPC support team for the system you are using and/or consult EPICURE support, who can help you identify the best strategy for your specific application.

[1] https://slurm.schedmd.com/job_array.html

[2] https://slurm.schedmd.com/sbatch.html

Jobs wait in the queue with a PD (Pending) status until the Slurm scheduler finds resources matching your request and can launch your job; this is normal. In the squeue output, the NODELIST(REASON) column shows why the job has not yet started.

Common job reason codes:

  • Priority: one or more higher-priority jobs are queued ahead of yours. Your job will eventually run; you can check the estimated StartTime with scontrol show job $JOBID (see the example after this list).
  • AssocGrpGRES: you are submitting a job to a partition you don’t have access to.
  • AssocGrpGRESMinutes: you have insufficient node-hours left in your monthly compute allocation for the partition you are requesting.
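
For example, the following commands show your queued jobs, the reason codes, and the scheduler’s estimated start times (exact output depends on the system’s Slurm configuration):

# List your own jobs; the NODELIST(REASON) column shows why a job is still pending
squeue -u $USER

# Show the scheduler's estimated start time for pending jobs
squeue -u $USER --start

# Detailed information for a single job, including StartTime and Reason
scontrol show job $JOBID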

If the job does not seem to start for a while, check the MeluXina Weather Report for any ongoing platform events announced by our teams. If no events are announced, raise a support ticket in our ServiceDesk.

  • Ensure that you are using the correct port (8822), e.g. ssh yourlogin@login.lxp.lu -p 8822 (see the ~/.ssh/config sketch after this list)
  • Ensure that your organization / network settings are not blocking access to port 8822
  • Ensure that you are connecting to the master login address (login.lxp.lu) and not a specific login node (login[01-04].lxp.lu) as it may be under maintenance
  • Check the MeluXina Weather Report for any ongoing platform events announced by our teams
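
A convenient way to apply these settings permanently is an entry in your ~/.ssh/config ('yourlogin' is a placeholder):

# ~/.ssh/config entry for MeluXina; 'yourlogin' is a placeholder
Host meluxina
    HostName login.lxp.lu
    Port 8822
    User yourlogin

You can then connect with a simple ssh meluxina.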

For security reasons, BSC clusters are unable to open outgoing connections to other machines, either internal (other BSC facilities) or external, but they will accept incoming connections.

At BSC, each user can have multiple QOS levels and accounts. To ensure proper accounting, every job must specify both the account and the QOS to be used. The following commands show which queues and accounts are available to you (a sample job-script header follows below):

module load bsc

bsc_queues

bsc_project list
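
In a job script, the account and QOS are then set with the standard Slurm directives; the values below are placeholders and should be replaced with those reported by the commands above:

#!/bin/bash
#SBATCH --job-name=example
#SBATCH --account=my_account   # placeholder: one of the accounts listed by 'bsc_project list'
#SBATCH --qos=my_qos           # placeholder: one of the QOS values listed by 'bsc_queues'
#SBATCH --time=00:30:00
#SBATCH --ntasks=1

srun ./my_application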

If your question concerns EPICURE support or the European HPC Application Support Portal, please contact pmo-epicure@postit.csc.fi.

If you have a specific question about our HPC systems, see the following websites or contact the support services of the HPC system in question:

Deucalion: https://docs.macc.fccn.pt/

Discoverer: https://docs.discoverer.bg/resource_overview.html

Jupiter: https://www.fz-juelich.de/en/jsc/jupiter/faq

Karolina: https://docs.it4i.cz/en/docs/general/support 

Leonardo: https://docs.hpc.cineca.it/faq.html

LUMI: https://www.lumi-supercomputer.eu/faq/

MareNostrum 5: https://www.bsc.es/supportkc/FAQ

MeluXina: https://docs.lxp.lu/FAQ/faq/

Vega: https://en-vegadocs.vega.izum.si/faq/