How-to Guide: Biomolecule structure prediction tools on Leonardo
Author: Leonardo Salicari (CINECA)
In this how-to guide, we show how to install and use four tools for bioinformatics and structural biology on Leonardo.
For more information on Python and virtual-environment usage on Leonardo, please refer to the dedicated documentation on the topic.
AlphaFold3 v3.0.1
Since September 2025, AF3 has been available as a module on Leonardo:
ml profile/bioinf alphafold/3.0.1
This exposes the AF3 Singularity container, which has already been tested. Non-commercial academic users can request access to the AF3 weights via the dedicated AlphaFold3 form. Furthermore, the public databases required for inference are available through the environment variable ALPHAFOLD_PUBLIC_DB. For more info, run ml show alphafold/3.0.1 (requires ml profile/bioinf).
Testing the installation and usage
This example assumes that you have already requested the AF3 weights and uploaded them to Leonardo, and that their path is stored in an environment variable called ALPHAFOLD_PRIVATE_MODELS (see example below). We create an example from a small protein sequence. First, we make an inputs directory and create a new file called input.json:
mkdir inputs
cd inputs
nano input.json
and paste the following input json:
{
  "name": "8UVY_A",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "QVQLQESGGGLVQAGGSLRLSCAASGIDVRIKTMAWYRQAPGKQRELLASVLVSGSTNYADPVKGRFTISRDNAKNTVYLQMNKLIPDDTAVYYCNTYGRLRRDVWGPGTQVTVSS"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
This sequence is taken from chain A of protein 8UVY.
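As an alternative to editing with nano, the same input file can be written non-interactively with a heredoc (the content is identical to the JSON above):

```shell
# Write the example AF3 input non-interactively; content matches the JSON above.
mkdir -p inputs
cat > inputs/input.json <<'EOF'
{
  "name": "8UVY_A",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "QVQLQESGGGLVQAGGSLRLSCAASGIDVRIKTMAWYRQAPGKQRELLASVLVSGSTNYADPVKGRFTISRDNAKNTVYLQMNKLIPDDTAVYYCNTYGRLRRDVWGPGTQVTVSS"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
EOF
```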
Finally, go back to the parent directory:
cd ..
To run the prediction in an interactive mode:
srun -A ACCOUNT_NAME -p boost_usr_prod -t 0:30:00 --gres=gpu:1 -N 1 -n 4 --cpus-per-task=2 --pty /bin/bash # enter interactive session
ml profile/bioinf alphafold/3.0.1
mkdir outputs
export ALPHAFOLD_PRIVATE_MODELS="/path/to/af3/weights" # FIXME
singularity run --nv \
-B $ALPHAFOLD_PRIVATE_MODELS:/root/models \
-B $ALPHAFOLD_PUBLIC_DB:/root/public_databases \
-B /leonardo/prod/opt \
-B $TMPDIR:/tmp \
-B "outputs:/output_dir" \
-B "inputs:/input_dir" \
$ALPHAFOLD_IMAGE \
python3 /app/alphafold/run_alphafold.py \
--model_dir=/root/models \
--db_dir=/root/public_databases \
--json_path=/input_dir/input.json \
--output_dir="/output_dir" \
--jackhmmer_n_cpu=$SLURM_CPUS_PER_TASK \
--nhmmer_n_cpu=$SLURM_CPUS_PER_TASK
NOTE: all uppercase variables are environment variables set when the AF3 module is loaded. The run should take around ten minutes. A healthy AF3 run's output contains the following:
…
Found local devices: [CudaDevice(id=0)], using device 0: cuda:0
…
Done processing fold input 8UVY_A.
Done processing 1 fold inputs.
which confirms that the GPU was detected and that the predicted structures were generated correctly in the outputs directory.
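If the run does not start as expected, it can help to first check that the relevant variables are set. The sketch below uses a small helper of our own (check_env is not part of the AF3 module; the ${!v} indirection requires bash):

```shell
# Our own sanity-check helper: print "ok"/"unset" for each variable name given.
check_env() {
  for v in "$@"; do
    if [ -z "${!v:-}" ]; then
      echo "unset: $v"
    else
      echo "ok: $v"
    fi
  done
}

# Variables used by the singularity command above (module-provided and user-set):
check_env ALPHAFOLD_IMAGE ALPHAFOLD_PUBLIC_DB ALPHAFOLD_PRIVATE_MODELS TMPDIR
```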
Boltz2 v2.2.0
Installation
ml python/3.11.7
ml cuda/12.3 # NOTE: use 12.3 version
python -m venv boltz
. boltz/bin/activate # activating venv
python -m pip install "boltz[cuda]==2.2.0" -U
NOTE: the -U flag upgrades all installed packages to their newest versions, hence a fresh venv is advised.
Testing the installation and usage
Download examples from the Git repository:
git clone https://github.com/jwohlwend/boltz.git
A guide on how to predict a structure can be found in the prediction documentation.
On the compute nodes, no internet connection is available. Therefore, to perform inference there, the required models must be downloaded in advance. The necessary files are:
mkdir .boltz
cd .boltz
wget https://huggingface.co/boltz-community/boltz-1/resolve/main/ccd.pkl
wget https://huggingface.co/boltz-community/boltz-2/resolve/main/mols.tar
wget https://model-gateway.boltz.bio/boltz1_conf.ckpt
wget https://huggingface.co/boltz-community/boltz-1/resolve/main/boltz1_conf.ckpt
wget https://model-gateway.boltz.bio/boltz2_conf.ckpt
wget https://huggingface.co/boltz-community/boltz-2/resolve/main/boltz2_conf.ckpt
wget https://model-gateway.boltz.bio/boltz2_aff.ckpt
wget https://huggingface.co/boltz-community/boltz-2/resolve/main/boltz2_aff.ckpt
For more details on the required files, see the Boltz source code. Note that some checkpoints above are listed with two alternative URLs (the Boltz model gateway and a Hugging Face mirror); a single copy of each file is sufficient.
NOTE: When running the example provided in the repository, one needs to modify the input .yaml file with the correct paths to the downloaded models.
Finally, the test can be run using an interactive session:
srun -A ACCOUNT_NAME -p boost_usr_prod -t 0:30:00 --gres=gpu:1 -N 1 -n 8 --pty /bin/bash # enter interactive session
ml python/3.11.7
ml cuda/12.3
. boltz/bin/activate # activating venv
boltz predict boltz/examples/prot_custom_msa.yaml --out_dir data/output_b/ --cache .boltz
This also serves as an example to run a prediction. For bigger predictions, it is advised to use the Slurm sbatch mode instead of the interactive session.
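For reference, such a batch run can be sketched as follows; the account name, time limit, and paths are placeholders to adapt, and the script simply mirrors the interactive commands above:

```shell
# Sketch of a Boltz job script (placeholders: ACCOUNT_NAME, time limit, paths).
cat > boltz_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=ACCOUNT_NAME
#SBATCH --partition=boost_usr_prod
#SBATCH --time=02:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --gres=gpu:1

ml python/3.11.7
ml cuda/12.3
. boltz/bin/activate   # activate the venv created above

boltz predict boltz/examples/prot_custom_msa.yaml \
    --out_dir data/output_b/ \
    --cache .boltz
EOF
# sbatch boltz_job.sh   # submit from a login node
```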
Some optimization tricks for NVIDIA accelerators, like the one used in the Leonardo Booster partition, can be found in the NVIDIA documentation.
NanoMelt v1.3.0
Installation
This software requires a conda-compatible package manager.
First, configure Miniconda following CINECA’s documentation on the topic. Importantly, the configured channels and their order are essential to avoid the download limitations of the anaconda channel.
When the conda command is available to the user, the installation is:
conda create -n env_nanomelt -y
conda activate env_nanomelt
ml cuda/12.6 # should be compatible with the version used to compile OpenMPI; if issues arise, use 12.2
conda install -c conda-forge openmm cuda-version=12 pdbfixer biopython -y
ml openmpi/4.1.6--gcc--12.2.0-cuda-12.2
conda install "openmpi=4.1.6=external_*"
conda-forge provides "dummy" MPI packages, such as openmpi=x.y.z=external_*, which are empty packages with no content. Installing one satisfies conda's dependency resolution without installing OpenMPI binaries, since an optimized build is already available on Leonardo through modules.
NOTE: At runtime, your environment must have the system MPI libraries on the dynamic link path (LD_LIBRARY_PATH or equivalent) so the program links against the system MPI. This variable is set automatically when you load the MPI module with ml openmpi.
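A quick way to verify this is to check that the MPI library directory appears on LD_LIBRARY_PATH, e.g. with a small helper of our own (on Leonardo, loading the openmpi module performs the export for you):

```shell
# Our own sketch: report whether a substring appears in LD_LIBRARY_PATH.
check_ld_path() {
  case ":${LD_LIBRARY_PATH:-}:" in
    *"$1"*) echo "found: $1" ;;
    *)      echo "missing: $1" ;;
  esac
}

# After `ml openmpi/...`, this should report "found":
check_ld_path openmpi
```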
Finally,
conda install -c bioconda anarci -y
python -m pip install nanomelt
On the first inference, NanoMelt needs to download its model parameters. It is best practice to do this using the Leonardo serial queue lrd_all_serial. To download the models:
srun -A YOUR_ACCOUNT -p lrd_all_serial -t 01:00:00 -N 1 -n 1 --pty /bin/bash
nanomelt predict --help
This will start the downloads in an interactive session.
Testing the installation and usage
Download test directory:
git clone https://gitlab.developers.cam.ac.uk/ch/sormanni/nanomelt.git
cd nanomelt/tests
Then, in an interactive session, load the modules and activate the conda environment:
srun -A YOUR_ACCOUNT -p boost_usr_prod -q boost_qos_dbg -t 0:30:00 --gres=gpu:1 -N 1 -n 8 --pty /bin/bash # enter interactive session
ml cuda/12.6
ml openmpi/4.1.6--gcc--12.2.0-cuda-12.2
conda activate env_nanomelt
Finally, to run the test:
nanomelt predict -i application6.fa -o testrun.csv -align -ncpu $SLURM_CPUS_ON_NODE
Similarly, this can be achieved in sbatch mode, using a job script and the environment variable $SLURM_CPUS_PER_TASK.
The expected output contains:
Will transfer ESM model to GPU
anarci_alignments_of_Fv_sequences RUNNING PARALLEL
myparallel.chunk_list_of_args on 6 cpus with 6 chunks of sizes [1, 1, 1, 1, 1, 1] type <class 'list'>
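The sbatch mode mentioned above can be sketched as a job script along these lines (account name and time limit are placeholders; the predict flags mirror the interactive command):

```shell
# Sketch of a NanoMelt job script (placeholders: YOUR_ACCOUNT, time limit).
cat > nanomelt_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=YOUR_ACCOUNT
#SBATCH --partition=boost_usr_prod
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:1

ml cuda/12.6
ml openmpi/4.1.6--gcc--12.2.0-cuda-12.2
conda activate env_nanomelt   # may require conda initialization in the job shell

nanomelt predict -i application6.fa -o testrun.csv -align -ncpu $SLURM_CPUS_PER_TASK
EOF
# sbatch nanomelt_job.sh   # submit from a login node
```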
Chai1 v0.6.1
Installation
ml python/3.11.7
ml cuda/12.6
python -m venv chai1 --system-site-packages
. chai1/bin/activate # activating venv
python -m pip install chai_lab==0.6.1 # pinned stable version
Testing the installation and usage
The test requires the GitHub repo; clone it on the login node, which has internet access:
git clone https://github.com/chaidiscovery/chai-lab.git
to obtain the examples directory.
We will test the installation with the example in chai-lab/examples/msas/predict_with_msas.py. Since Leonardo compute nodes have no access to the internet, the data needed to run this example must be downloaded beforehand. The required files are:
- conformers:
mkdir -p data/input
cd data/input
wget https://chaiassets.com/chai1-inference-depencencies/conformers_v1.apkl
- esm:
# pwd -> /path/to/data/input
mkdir esm
cd esm
wget https://chaiassets.com/chai1-inference-depencencies/esm2/traced_sdpa_esm2_t36_3B_UR50D_fp16.pt
- the model:
# pwd -> /path/to/data/input
mkdir models_v2
cd models_v2
wget https://chaiassets.com/chai1-inference-depencencies/models_v2/feature_embedding.pt
wget https://chaiassets.com/chai1-inference-depencencies/models_v2/bond_loss_input_proj.pt
wget https://chaiassets.com/chai1-inference-depencencies/models_v2/token_embedder.pt
wget https://chaiassets.com/chai1-inference-depencencies/models_v2/trunk.pt
wget https://chaiassets.com/chai1-inference-depencencies/models_v2/diffusion_module.pt
wget https://chaiassets.com/chai1-inference-depencencies/models_v2/confidence_head.pt
How to find the missing file URL
If additional data needs to be downloaded, the approach is the following:
- Read the traceback and find the function call before download_if_not_exists.
- In the source code, search for the line in which this method is called on a paths object.
- Check the URL used there and download the file from it.
Usually, hints about the file name appear in the traceback, while the URLs are in the source code.
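The source-code search described above can be done with a simple grep over the cloned repository; the helper name below is ours, and chaiassets.com is the base URL seen in the wget commands earlier:

```shell
# Our own helper sketch: list lines in a source tree mentioning the
# chaiassets.com download base URL.
find_asset_urls() {
  grep -rn "chaiassets.com" "$1" 2>/dev/null | head -n 20
}

# Example usage, assuming the repo was cloned in the current directory:
find_asset_urls chai-lab/chai_lab
```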
The final directory structure should be:
├── chai-lab
│ └── examples
│ └── msas
├── data
│ ├── input
│ │ ├── esm
│ │ └── models_v2
│ └── output
└── chai1
└── bin
└── activate
Now, modify the output dir in the example script:
output_dir = Path("/path/to/data/output") # previously: tmp_dir / "outputs"
Finally, the test can be run as:
srun -A ACCOUNT_NAME -p boost_usr_prod -t 0:30:00 --gres=gpu:1 -N 1 -n 8 --pty /bin/bash # enter interactive session
ml python/3.11.7
ml cuda/12.6
. ./chai1/bin/activate # activating venv
CHAI_DOWNLOADS_DIR=./data/input python ./chai-lab/examples/msas/predict_with_msas.py
In the last line, CHAI_DOWNLOADS_DIR tells the Chai runtime where to find the downloaded data. This example might take a few minutes.