Build a CycleCloud Slurm Cluster Environment with MySQL Integration and Scheduler HA
Introduction
In this guide, you will walk through setting up a CycleCloud environment that integrates with a Slurm Cluster. You will also integrate MySQL and an external NFS server, which together allow the cluster schedulers to run in a high availability configuration.
Architecture
With the following architecture, you can build a CycleCloud environment that combines a Slurm Cluster, an integrated MySQL database, and scheduler HA.
All the resources are deployed in the same resource group and virtual network. The CycleCloud Cluster is deployed in the subnet snet-cc, the Slurm Cluster in the subnet snet-slurm, the MySQL Server in the subnet snet-mysql, and the NFS Server in the subnet snet-anf.
The Slurm Cluster source for CycleCloud will be downloaded from the Azure-CycleCloud-Slurm GitHub repository.
Prerequisites
- Virtual network:
# | Virtual Network Name | Address Space |
---|---|---|
1 | vnet-jpe-cc-slurm-20230715 | 10.0.0.0/16 |
- Subnets:
# | Name | Address | NSG | Delegation |
---|---|---|---|---|
1 | snet-cc | 10.0.0.0/24 | nsg-snet-cc | – |
2 | snet-slurm | 10.0.1.0/24 | nsg-snet-slurm | – |
3 | snet-anf | 10.0.2.0/24 | nsg-snet-anf | Microsoft.NetApp/volumes |
4 | snet-pe | 10.0.3.0/24 | nsg-snet-pe | – |
5 | snet-worker | 10.0.4.0/24 | nsg-snet-worker | – |
6 | snet-mysql | 10.0.5.0/24 | nsg-snet-mysql | Microsoft.DBforMySQL/flexibleServers |
- Azure CLI commands to create the virtual network and subnets:
az group create -l japaneast -n rg-jpe-cc-slurm-20230715
az network nsg create -g rg-jpe-cc-slurm-20230715 -l japaneast -n nsg-snet-cc
az network nsg create -g rg-jpe-cc-slurm-20230715 -l japaneast -n nsg-snet-slurm
az network nsg create -g rg-jpe-cc-slurm-20230715 -l japaneast -n nsg-snet-anf
az network nsg create -g rg-jpe-cc-slurm-20230715 -l japaneast -n nsg-snet-pe
az network nsg create -g rg-jpe-cc-slurm-20230715 -l japaneast -n nsg-snet-worker
az network nsg create -g rg-jpe-cc-slurm-20230715 -l japaneast -n nsg-snet-mysql
az network vnet create \
--resource-group rg-jpe-cc-slurm-20230715 \
--name vnet-jpe-cc-slurm-20230715 \
--address-prefixes 10.0.0.0/16 \
--subnet-name snet-cc \
--subnet-prefixes 10.0.0.0/24 \
--network-security-group nsg-snet-cc
az network vnet subnet create \
--resource-group rg-jpe-cc-slurm-20230715 \
--vnet-name vnet-jpe-cc-slurm-20230715 \
--name snet-slurm \
--address-prefixes 10.0.1.0/24 \
--network-security-group nsg-snet-slurm
az network vnet subnet create \
--resource-group rg-jpe-cc-slurm-20230715 \
--vnet-name vnet-jpe-cc-slurm-20230715 \
--name snet-anf \
--address-prefixes 10.0.2.0/24 \
--network-security-group nsg-snet-anf \
--delegations Microsoft.NetApp/volumes
az network vnet subnet create \
--resource-group rg-jpe-cc-slurm-20230715 \
--vnet-name vnet-jpe-cc-slurm-20230715 \
--name snet-pe \
--address-prefixes 10.0.3.0/24 \
--network-security-group nsg-snet-pe
az network vnet subnet create \
--resource-group rg-jpe-cc-slurm-20230715 \
--vnet-name vnet-jpe-cc-slurm-20230715 \
--name snet-worker \
--address-prefixes 10.0.4.0/24 \
--network-security-group nsg-snet-worker
az network vnet subnet create \
--resource-group rg-jpe-cc-slurm-20230715 \
--vnet-name vnet-jpe-cc-slurm-20230715 \
--name snet-mysql \
--address-prefixes 10.0.5.0/24 \
--network-security-group nsg-snet-mysql \
--delegations Microsoft.DBforMySQL/flexibleServers
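Before running the commands above, it can help to confirm that the planned subnet prefixes do not collide. The following is a minimal sketch in plain POSIX shell, not part of the original procedure; the helper names `ip_to_int`, `cidr_range`, and `overlaps` are made up for this illustration, and the prefix list is copied from the commands above.

```shell
#!/bin/sh
# Sketch: check that the subnet prefixes used above do not overlap.
# All function names here are illustrative helpers, not Azure tooling.
set -eu

ip_to_int() {
  # Convert a dotted-quad address (a.b.c.d) to a 32-bit integer.
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

cidr_range() {
  # Print "first last" (integers, inclusive) for a block such as 10.0.1.0/24.
  base=$(ip_to_int "${1%/*}")
  size=$(( 1 << (32 - ${1#*/}) ))
  first=$(( base - base % size ))   # align the base address to the block size
  echo "$first $(( first + size - 1 ))"
}

overlaps() {
  # Return 0 (true) if the two CIDR blocks share at least one address.
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

subnets="10.0.0.0/24 10.0.1.0/24 10.0.2.0/24 10.0.3.0/24 10.0.4.0/24 10.0.5.0/24"
conflicts=0
rest=$subnets
for a in $subnets; do
  rest=${rest#*"$a"}               # compare each pair only once
  for b in $rest; do
    if overlaps "$a" "$b"; then
      echo "overlap: $a and $b"
      conflicts=$((conflicts + 1))
    fi
  done
done

if [ "$conflicts" -eq 0 ]; then
  echo "no overlapping subnets"
fi
```

The same helper shows why snet-mysql needs a /24 here: a /23 block containing 10.0.5.0 spans 10.0.4.0 through 10.0.5.255, so `overlaps 10.0.4.0/24 10.0.5.0/23` reports a collision with snet-worker.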
Deploy CycleCloud Cluster
- Refer to Create a CycleCloud Cluster to create a CycleCloud Cluster.
- After the CycleCloud Cluster is created, you can access the CycleCloud Web UI with the URL: http://cyclecloud-cluster-IP
Note: Before accessing the CycleCloud Web UI, you need to add an inbound rule to the NSG of the CycleCloud Cluster subnet. The inbound rules (HTTP/HTTPS) should allow access from your IP address to the CycleCloud Cluster subnet.
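The inbound rule described in the note can also be added from the Azure CLI. The sketch below is one possible way to do it, under several assumptions: the NSG is nsg-snet-cc from the prerequisites, the rule name and priority are arbitrary choices, and 203.0.113.10 is a placeholder address that you would replace with your own public IP. By default the script only prints the command; set DRY_RUN=0 to execute it.

```shell
#!/bin/sh
# Sketch: allow HTTP/HTTPS from a single client IP to the CycleCloud subnet NSG.
set -eu

RG="rg-jpe-cc-slurm-20230715"       # resource group used in this guide
NSG="nsg-snet-cc"                   # NSG attached to the CycleCloud subnet
MY_IP="${MY_IP:-203.0.113.10}"      # placeholder; replace with your public IP
DRY_RUN="${DRY_RUN:-1}"             # 1 = print the command, 0 = run it

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

allow_web_access() {
  # Rule name and priority are illustrative; pick values that fit your NSG.
  run az network nsg rule create \
    --resource-group "$RG" \
    --nsg-name "$NSG" \
    --name allow-cyclecloud-web \
    --priority 100 \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --source-address-prefixes "$MY_IP/32" \
    --destination-port-ranges 80 443
}

allow_web_access
```

Restricting the source to a single /32 keeps the web UI closed to everyone else.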
Create a Storage account for CycleCloud locker storage
(1) Create a storage account with a private endpoint enabled to serve as private CycleCloud locker storage.
Add a subscription to CycleCloud Cluster
(1) Before adding a subscription, enable the system-assigned identity of the CycleCloud Cluster VM.
(2) In the VM settings, click Identity, then click Azure role assignments to add a role assignment to the system-assigned identity.
(3) Assign the Contributor role at subscription scope to the system-assigned identity. The Contributor role grants more privileges than CycleCloud requires. If this is a security concern, you can assign a more restrictive role instead. Refer to CycleCloud Documentation to create a custom role for CycleCloud.
(4) After adding the role assignment, you can add a subscription to the CycleCloud Cluster. In the CycleCloud Web UI, click Add Subscription and enter the storage account created in the previous step. Save it as a CycleCloud subscription.
Prepare an NFS for Slurm cluster
Because we are building a high availability scheduler environment, we need to prepare an NFS server for the Slurm Cluster. The NFS server stores the Slurm configuration files and the Slurm state files. It can be a VM or a managed NFS service. In this case, we use Azure NetApp Files as the NFS service.
- Create Azure NetApp Files
(1) Create an Azure NetApp Files account and a capacity pool. Refer to Create an Azure NetApp Files service for the steps.
(2) Create two volumes in the capacity pool (sched and shared).
A. Set the Network features to Standard
B. Set the Protocol types to NFSv4.1
C. Set the Unix permissions to 0775
- Write down the NFS mount point of the two volumes. The NFS mount point is used to mount the volumes to the Slurm Cluster VMs.
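To confirm the recorded mount points before handing them to CycleCloud, you can try a manual mount from any Linux VM in the virtual network. The sketch below only prints the mount commands. The IP address 10.0.2.4 and the local directories under /mnt are assumptions for illustration, and the mount options are the ones commonly shown for NFSv4.1 volumes; check them against the mount instructions displayed for your own volumes.

```shell
#!/bin/sh
# Sketch: print NFSv4.1 mount commands for the two Azure NetApp Files volumes.
set -eu

ANF_IP="${ANF_IP:-10.0.2.4}"   # assumed address in snet-anf; use your volume's IP

mount_cmd() {
  # $1 = volume export name (sched or shared), $2 = local mount directory
  echo "mount -t nfs -o rw,hard,rsize=262144,wsize=262144,sec=sys,vers=4.1,tcp ${ANF_IP}:/$1 $2"
}

# The local directories are hypothetical scratch locations for a manual test.
mount_cmd sched /mnt/sched
mount_cmd shared /mnt/shared
```

Run the printed commands as root after creating the target directories, then unmount once you have verified read and write access.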
Create a MySQL Database Service
Refer to Create a MySQL Database Service to create a MySQL Flexible Server.
In the networking settings, select Private access (VNet Integration) as the Connectivity method.
Prepare a VM image for Slurm Cluster
In this tutorial, we use OpenLogic CentOS 7.9 generation 2 from Marketplace as a base OS image. You can use your own VM image as a base OS image.
- Create a VM from the base OS image. Log in to the VM and install the following packages.
sudo -i
yum update
wget -P /tmp https://raw.githubusercontent.com/themorey/cyclecloud-scripts/main/slurm-install.sh
bash /tmp/slurm-install.sh
After installing the packages, run the following commands to verify that the installation completed without errors.
which sinfo
id slurm
Deprovision the VM. Remove +user from the command if you want to keep the user account.
waagent -deprovision+user -force
There is an issue where the Scheduler VM does not register with Azure DNS successfully, so the following step is needed to mitigate it.
Perform this step after deprovisioning with waagent and before creating the VM image.
Check the /etc/sysconfig/network-scripts/ifcfg-eth0 file and remove the line DHCP_HOSTNAME=localhost.localdomain
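The DHCP_HOSTNAME line can be removed by hand in an editor, or with a one-line sed. The following sketch wraps the sed call in a hypothetical helper, `remove_dhcp_hostname`, and demonstrates it on a scratch file whose contents are invented for the demo, so nothing on the system is touched until you point it at the real file.

```shell
#!/bin/sh
# Sketch: delete any DHCP_HOSTNAME= line from an ifcfg file.
set -eu

remove_dhcp_hostname() {
  # $1 = path to the ifcfg file. A temporary .bak copy is written by sed
  # and removed again once the edit succeeds.
  sed -i.bak '/^DHCP_HOSTNAME=/d' "$1" && rm -f "$1.bak"
}

# Demo on a scratch file; the sample contents are illustrative only.
tmp=$(mktemp)
printf 'DEVICE=eth0\nBOOTPROTO=dhcp\nDHCP_HOSTNAME=localhost.localdomain\nONBOOT=yes\n' > "$tmp"
remove_dhcp_hostname "$tmp"
cat "$tmp"
```

On the image VM, the equivalent call as root would be `remove_dhcp_hostname /etc/sysconfig/network-scripts/ifcfg-eth0`.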
- Capture the VM as a VM image. Refer to Create a VM image for the steps.
Prepare a Slurm Cluster project template for CycleCloud
Log in to the CycleCloud Server and initialize the CycleCloud Server.
cyclecloud initialize
Log in to the CycleCloud Server and prepare the Azure-Slurm 2.7.3 project
(1) Get the Azure-Slurm 2.7.3 project from GitHub.
mkdir slurm273
(2) Build the project
cd slurm273
(3) Add the following lines to ~/slurm273/specs/default/chef/site-cookbooks/slurm/recipes/login.rb for the login server template
link '/etc/slurm/keep_alive.conf' do
- Preload the MySQL certificate
(1) On the CycleCloud Server, download the certificate to the ~/slurm273/specs/default/chef/site-cookbooks/slurm/files/default folder.
cd ~/slurm273/specs/default/chef/site-cookbooks/slurm/files/default
(2) Edit the ~/slurm273/specs/default/chef/site-cookbooks/slurm/recipes/accounting.rb file with the following changes.
A. Change remote_file to cookbook_file
B. Comment out the original source line
C. Add a new source pointing to DigiCertGlobalRootCA.crt.pem
cookbook_file '/etc/slurm/BaltimoreCyberTrustRoot.crt.pem' do
- Edit the Slurm Cluster template
(1) Edit the ~/slurm273/templates/slurm.txt file and change the following lines.
[[[cluster-init slurm:default:2.7.3]]]
(2) Run the following command to set slurm.install to false.
sed -i '/slurm_version$/a \\tslurm.install = false' ~/slurm273/templates/slurm.txt
- Upload the project files to CycleCloud locker storage.
cyclecloud locker list
Create a Slurm Cluster with CycleCloud
Log in to the CycleCloud Server and run the following command to create a Slurm Cluster.
cyclecloud import_cluster slurm-20230715 -c slurm -f ~/slurm273/templates/slurm.txt
Configure Slurm Cluster
- Log in to the CycleCloud web UI and edit the Slurm Cluster settings
(1) Configure the Network Attached Storage to use the NFS volumes created in the previous step. For the IP address, use the address from your NFS volume mount path.
(2) Configure the Advanced Settings to enable the Slurm HA Scheduler and MySQL HA Database.
(3) Set the Scheduler OS, HPC OS, and HTC OS to custom image and enter the VM image's resource ID.
(4) Uncheck the Return Proxy and Public Head Node settings.
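The custom image fields expect the full Azure resource ID of the captured image. For a managed image, the ID follows a fixed pattern, which the sketch below assembles from its parts. The subscription ID is a placeholder and the image name img-slurm-centos79 is hypothetical; if the image lives in a compute gallery instead, the ID has a different shape.

```shell
#!/bin/sh
# Sketch: build the resource ID of a managed VM image.
set -eu

image_resource_id() {
  # $1 = subscription ID, $2 = resource group, $3 = image name
  echo "/subscriptions/$1/resourceGroups/$2/providers/Microsoft.Compute/images/$3"
}

# Placeholder subscription ID and hypothetical image name.
image_resource_id 00000000-0000-0000-0000-000000000000 \
  rg-jpe-cc-slurm-20230715 img-slurm-centos79

# The same value can be read back from Azure with:
#   az image show -g <resource-group> -n <image-name> --query id -o tsv
```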
Now that all the components are ready, we can start using the Slurm Cluster.
Reference
- GitHub - Azure CycleCloud Slurm
- Quickstart - Install CycleCloud using the Marketplace image
- Create a custom role and managed identity for CycleCloud
- Create an NFS volume for Azure NetApp Files
- Quickstart: Use the Azure portal to create an Azure Database for MySQL - Flexible Server
- Remove machine specific information by deprovisioning or generalizing a VM before creating an image