Antilles Installation Guide (For EL7) 1.0.0

1. Overview

Introduction to Antilles

Antilles is infrastructure management software for high-performance computing (HPC). It provides features such as cluster management and monitoring, job scheduling and management, cluster user management, account management, and file system management.

With Antilles, users can centralize resource allocation in one supercomputing cluster and run HPC jobs. Users can work through the management system's Web interface in a browser, or from the command line after logging in to a cluster login node from a Linux shell.

Typical cluster deployment

This Guide is based on the typical cluster deployment that contains management, login, and compute nodes.

Elements in the cluster are described in the table below.

| Element | Description |
| --- | --- |
| Management node | Core of the HPC cluster, undertaking primary functions such as cluster management, monitoring, scheduling, strategy management, and user & account management. |
| Compute node | Completes computing tasks. |
| Login node | Connects the cluster to the external network or cluster. Users must use the login node to log in and upload application data, develop compilers, and submit scheduled tasks. |
| Parallel file system | Provides a shared storage function. It is connected to the cluster nodes through a high-speed network. Parallel file system setup is beyond the scope of this Guide. A simple NFS setup is used instead. |
| Nodes BMC interface | Used to access the node's BMC system. |
| Nodes eth interface | Used to manage nodes in cluster. It can also be used to transfer computing data. |
| High speed network interface | Optional. Used to support the parallel file system. It can also be used to transfer computing data. |

Note: Antilles also supports the cluster deployment that only contains the management and compute nodes. In this case, all Antilles modules installed on the login node need to be installed on the management node.

Operating environment

Cluster server:

Lenovo ThinkSystem servers

Operating system:

CentOS / Red Hat Enterprise Linux (RHEL) 7.5

Client requirements:

  • Hardware: CPU of 2.0 GHz or above, memory of 8 GB or above
  • Browser: Chrome (V 62.0 or higher) or Firefox (V 56.0 or higher) recommended
  • Display resolution: 1280 x 800 or above

Supported servers and chassis models

Antilles can be installed on certain servers, as listed in the table below.

| Product code | Machine type | Product name |
| --- | --- | --- |
| sd530 | 7X21 | Lenovo ThinkSystem SD530 (0.5U) |
| sr630 | 7X01, 7X02 | Lenovo ThinkSystem SR630 (1U) |
| sr650 | 7X05, 7X06 | Lenovo ThinkSystem SR650 (2U) |
| sd650 | 7X58 | Lenovo ThinkSystem SD650 (1U) |
| sr670 | 7Y36, 7Y37, 7Y38 | Lenovo ThinkSystem SR670 (2U) |

Antilles can be installed on certain chassis models, as listed in the table below.

| Product code | Machine type | Model name |
| --- | --- | --- |
| d2 | 7X20 | D2 Enclosure (2U) |
| n1200 | 5456, 5468, 5469 | NeXtScale n1200 (6U) |

2. Prepare the cluster environment

Prepare cluster OS and network environment

The operating system must be installed on all cluster nodes and the cluster network must be configured before you continue.

Prepare infrastructure software for nodes

Install the cluster infrastructure in accordance with the OHPC installation manual. The required software is listed in the table below.

| Software name | Component name | Version | Service name | Notes |
| --- | --- | --- | --- | --- |
| nfs | nfs-utils | 1.3.0 | nfs-server | |
| ntp | ntp | 4.2.6 | ntpd | |
| slurm | ohpc-slurm-server | 1.3.4 | munge, slurmctld | |
| | ohpc-slurm-client | 1.3.4 | munge, slurmd | |
| ganglia | ganglia-gmond-ohpc | 3.7.2 | gmond | |
| mpi | openmpi3-gnu7-ohpc | 3.0.0 | | At least one MPI type is required. Make sure compute nodes can run the mpirun command directly, without specifying its full path. |
| | mpich-gnu7-ohpc | 3.2.1 | | |
| | mvapich2-gnu7-ohpc | 2.2 | | |

If the cluster includes GPU nodes, CUDA must also be installed on those nodes. The recommended CUDA version is 9.1. Installation reference: https://developer.nvidia.com

Prepare Antilles Repo

Package Antilles into RPMs and create a repository for installing the Antilles modules.

Step 1. Download the Antilles code from GitHub.

Step 2. Run the following commands to package Antilles and create the Antilles repository:

cd /path/to/antilles
./packaging_antilles_el7.sh

Note: Replace /path/to/antilles with the actual path where the Antilles code is located.

After creating the Antilles repository, distribute the Antilles repo file to all nodes on which Antilles will be installed.

Prepare python modules Antilles requires

Before installing the Antilles packages, install the Python modules that they require. Each Antilles module has a requirements.txt file. Make sure that all the listed Python modules are installed and that their versions conform to the requirements file.
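
One way to work through the requirements files is sketched below. It assumes that pip is available on the node and that you pass in the path of the downloaded Antilles source tree. The function name is hypothetical and not part of Antilles, and the function only prints commands so that you can review them before running anything.

```shell
# Print one "pip install -r <file>" command for every requirements.txt
# found under the given source tree. Nothing is installed by this function.
print_requirements_installs() {
    src_dir="$1"
    find "$src_dir" -type f -name requirements.txt | sort |
        while IFS= read -r req; do
            printf 'pip install -r "%s"\n' "$req"
        done
}
```

For example, `print_requirements_installs /path/to/antilles` lists the commands, and piping that output to `sh` would run them.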

3. Install Antilles dependencies

Check infrastructure environment

Make sure that the cluster environment is ready.

Configure environment variables

Step 1. Log in to the management node.

Step 2. Run the following commands to configure environment variables for the entire installation process:

su root
cd ~
vi antilles_env.local

Step 3. Add the following content to the antilles_env.local file, adjusting the values to match your cluster:

# Management node hostname
sms_name="head"

# IP address of management node in the cluster intranet
sms_ip="192.168.0.1"

# Set the domain name
domain_name="hpc.com"

# Set OpenLDAP domain name
antilles_ldap_domain_name="dc=hpc,dc=com"

# set OpenLDAP domain component
antilles_ldap_domain_component="hpc"

Step 4. Save the changes to antilles_env.local.

Step 5. Run the following commands to make the configuration file take effect:

chmod 600 antilles_env.local
source antilles_env.local
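
Because later steps expand these variables inside sed and authconfig commands, an empty value can silently produce a broken configuration. The following is a minimal sketch of a sanity check, not an Antilles tool: the function name is hypothetical, and it only verifies that the five variables defined in this section are set and non-empty.

```shell
# Source the environment file in a subshell and report any variable from
# this section that is missing or empty. Returns non-zero on failure.
check_antilles_env() {
    env_file="$1"
    (
        vars="sms_name sms_ip domain_name antilles_ldap_domain_name antilles_ldap_domain_component"
        # Clear values inherited from the calling shell so only the file counts.
        unset $vars
        . "$env_file" || exit 1
        status=0
        for name in $vars; do
            eval "value=\${$name:-}"
            if [ -z "$value" ]; then
                echo "missing: $name"
                status=1
            fi
        done
        exit "$status"
    )
}
```

Pass the full path of the file, for example `check_antilles_env /root/antilles_env.local`, since the shell's `.` command searches PATH for names without a slash.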

Precheck

Make sure that the services below are already set up and running.

| Infrastructure software | Check command | Notes |
| --- | --- | --- |
| nfs | showmount -e ${sms_ip} | |
| slurm | sinfo | |
| ganglia | gstat -a | |
| mpi | module list | |
| cuda | nvidia-smi | Required only on GPU nodes |
| OpenHPC | yum repolist | |
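
The checks can be wrapped in a small helper so that failures stand out. This is a sketch under the assumption that a zero exit status is a good enough signal; it only confirms that each command runs successfully, not that the service is healthy. The function names are hypothetical. The `module list`, `nvidia-smi`, and `yum repolist` checks are left out because their output is best inspected by eye.

```shell
# Run one check command quietly and print a one-line result.
precheck() {
    label="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        echo "[ OK ] $label"
    else
        echo "[FAIL] $label"
        return 1
    fi
}

# Run the NFS, Slurm, and Ganglia checks from the table above.
# Expects sms_ip to be set (see antilles_env.local).
run_prechecks() {
    failed=0
    precheck nfs showmount -e "${sms_ip}" || failed=1
    precheck slurm sinfo || failed=1
    precheck ganglia gstat -a || failed=1
    return "$failed"
}
```

After sourcing antilles_env.local, calling `run_prechecks` prints one line per service and returns non-zero if any check failed.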

List of Antilles dependencies to be installed

Note: In the Installation node column, M stands for "Management node", L stands for "Login node", and C stands for "Compute node".

| Software name | Component name | Version | Service name | Installation node | Notes |
| --- | --- | --- | --- | --- | --- |
| rabbitmq | rabbitmq-server | 3.6.15 | rabbitmq-server | M | |
| postgresql | postgresql-server | 9.2.23 | postgresql | M | |
| influxdb | influxdb | 1.5.4 | influxdb | M | |
| confluent | confluent | 2.0.2 | confluent | M | |
| openldap | slapd-ssl-config | 1.0.0 | slapd | M | |
| | nss-pam-ldapd | 0.8.13 | nslcd | M, C, L | |
| | libuser | 0.60 | | M | |
| | libuser-python | 0.60 | | M | |
| gmond gpu plugin | gmond-ohpc-gpu-module | 1.0.0 | | C | Required only on the GPU node |

Install Antilles dependencies


Install RabbitMQ

Antilles uses RabbitMQ as a message broker.

Installation Reference: http://www.rabbitmq.com/


Install PostgreSQL

Antilles uses PostgreSQL as an object-relational database for data storage.

Installation Reference: https://www.postgresql.org/

After installation, run the following commands to configure PostgreSQL:

# Stop PostgreSQL on the management node
systemctl stop postgresql

# Initialize the database cluster. Change the password as needed.
su - postgres
echo "<PG_PASSWORD>" > /var/tmp/pwfile

# The value of the -U switch on the initdb command ("postgres") is the PostgreSQL username.
# Make a note of it, along with the other usernames and passwords used during this
# setup, as they are needed for the antilles-passwd-tool step later in the installation process.
initdb -U postgres --pwfile /var/tmp/pwfile /var/lib/pgsql/data

rm /var/tmp/pwfile
exit

# Starting PostgreSQL
systemctl enable postgresql
systemctl start postgresql

# Create Antilles database
psql -U postgres -c "CREATE DATABASE antilles;"

Install InfluxDB

Antilles uses InfluxDB as a time series database for storing monitoring data.

Installation Reference: https://www.influxdata.com/

After installation, run the following commands to create InfluxDB users:

# Start InfluxDB
systemctl enable influxdb
systemctl start influxdb

# Enter the InfluxDB shell and create an administrator user.
# Note that the password must be a quoted string; otherwise an error is reported.
influx
> create database antilles
> use antilles
> create user <INFLUX_USERNAME> with password '<INFLUX_PASSWORD>' with all privileges
> exit

# configuration
sed -i '/auth-enabled = false/a\ auth-enabled = true' /etc/influxdb/config.toml

# restart InfluxDB
systemctl restart influxdb

Install Confluent

Installation Reference: https://hpc.lenovo.com/yum/latest/

Run the following commands to configure Confluent:

# Start confluent
systemctl enable confluent
systemctl start confluent

# Create the Confluent account
source /etc/profile
confetty create /users/<CONFLUENT_USERNAME> password=<CONFLUENT_PASSWORD>

Configure user authentication

Note: If you have already configured OpenLDAP for the cluster, skip this section.

Install OpenLDAP-server

OpenLDAP is an open-source version of the lightweight directory access protocol. Antilles recommends using OpenLDAP to manage users; however, it also supports other authentication services compatible with Linux-PAM.

Run the following commands to install OpenLDAP:

# Install OpenLDAP
yum install -y slapd-ssl-config

# Run the following commands to modify the configuration file
sed -i "s/dc=hpc,dc=com/${antilles_ldap_domain_name}/" /usr/share/openldap-servers/antilles.ldif
sed -i "/dc:/s/hpc/${antilles_ldap_domain_component}/" /usr/share/openldap-servers/antilles.ldif
sed -i "s/dc=hpc,dc=com/${antilles_ldap_domain_name}/" /etc/openldap/slapd.conf
slapadd -v -l /usr/share/openldap-servers/antilles.ldif -f /etc/openldap/slapd.conf -b ${antilles_ldap_domain_name}

# Set the password
# Generate the password hash with the following command and enter <LDAP_PASSWORD> when prompted.
slappasswd

# Edit the file /etc/openldap/slapd.conf and replace the value of rootpw with the hash obtained.
rootpw <ENCRYPT_LDAP_PASSWORD>
chown -R ldap:ldap /var/lib/ldap
chown ldap:ldap /etc/openldap/slapd.conf

# Edit configuration files
vi /etc/sysconfig/slapd
# Please make sure the next two lines are uncommented
SLAPD_URLS="ldapi:/// ldap:/// ldaps:///"
SLAPD_OPTIONS="-f /etc/openldap/slapd.conf"

# Start OpenLDAP service
systemctl enable slapd
systemctl start slapd

# check service
systemctl status slapd

Install libuser

The libuser module is a recommended toolkit for OpenLDAP. The installation of this module is optional.

Step 1. Run the following command to install libuser:

yum install -y libuser libuser-python

Step 2. Run the following commands to configure libuser:

vi /etc/libuser.conf

[import]
login_defs = /etc/login.defs
default_useradd = /etc/default/useradd
[defaults]
crypt_style = sha512
modules = ldap
create_modules = ldap
[userdefaults]
LU_USERNAME = %n
LU_GIDNUMBER = %u
LU_GECOS = %n
# Adjust the option below to match your home directory layout
LU_HOMEDIRECTORY = /home/%n
LU_SHADOWNAME = %n
LU_SHADOWMIN = 0
LU_SHADOWMAX = 99999
[groupdefaults]
LU_GROUPNAME = %n
[files]
[shadow]
[ldap]
# Replace <LDAP_ADDRESS> with the management node IP
server = ldap://<LDAP_ADDRESS>
# Adjust the options below
# <DOMAIN> must be the same as ${antilles_ldap_domain_name} defined in antilles_env.local
basedn = <DOMAIN>
userBranch = ou=People
groupBranch = ou=Group
binddn = uid=admin,<DOMAIN>
bindtype = simple
[sasl]
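
The example values in this guide (domain_name "hpc.com" and antilles_ldap_domain_name "dc=hpc,dc=com") follow the common convention of turning each DNS label into a dc= component. Assuming your base DN follows the same convention, which is not a requirement of Antilles, a hypothetical helper such as the one below can derive the <DOMAIN> value and help avoid typos.

```shell
# Convert a DNS domain such as "hpc.com" into an LDAP base DN made of
# dc= components, one per label.
domain_to_basedn() {
    echo "$1" | sed -e 's/^/dc=/' -e 's/\./,dc=/g'
}
```

Comparing the output of `domain_to_basedn "$domain_name"` with `${antilles_ldap_domain_name}` is a quick consistency check before editing /etc/libuser.conf.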

Install OpenLDAP-client

Run the following command to configure the OpenLDAP client:

echo "TLS_REQCERT never" >> /etc/openldap/ldap.conf

Then distribute /etc/openldap/ldap.conf to all other nodes.

Install nss-pam-ldapd

nss-pam-ldapd provides a name service switch (NSS) module and a pluggable authentication module (PAM) for LDAP. Antilles uses it for user authentication.

Run the following commands to install nss-pam-ldapd on all nodes:

yum install -y nss-pam-ldapd authconfig
authconfig --useshadow --usemd5 --enablemkhomedir --disablecache --enablelocauthorize --disablesssd --disablesssdauth --enableforcelegacy --enableldap --enableldapauth --disableldaptls --ldapbasedn=${antilles_ldap_domain_name} --ldapserver="ldap://${sms_name}" --updateall
echo "rootpwmoddn uid=admin,${antilles_ldap_domain_name}" >> /etc/nslcd.conf

# Start management node service
systemctl enable nslcd
systemctl start nslcd

Install Gmond GPU plugin

On all GPU nodes, run the following commands to install Gmond GPU plug-in:

yum install -y gmond-ohpc-gpu-module

ls /etc/ganglia/conf.d/*.pyconf|grep -v nvidia|xargs rm

# Start gmond
systemctl restart gmond

4. Install Antilles

List of Antilles components to be installed

Note: In the Installation node column, M stands for "Management node", L stands for "Login node", and C stands for "Compute node".

| Software name | Component name | Version | Service name | Installation node | Notes |
| --- | --- | --- | --- | --- | --- |
| antilles-core | antilles-core | 1.0.0 | antilles | M | |
| antilles-portal | antilles-portal | 1.0.0 | | M, L | |
| antilles-core-extend | antilles-confluent-proxy | 1.0.0 | | M | |
| | antilles-vnc-proxy | 1.0.0 | | M | |
| antilles-env | antilles-env | 1.0.0 | | M, L | |
| antilles monitor | antilles-ganglia-mond | 1.0.0 | antilles-ganglia-mond | M | Cannot be installed together with antilles-icinga-mond |
| | antilles-icinga-mond | 1.0.0 | antilles-icinga-mond | M | Cannot be installed together with antilles-ganglia-mond |
| | antilles-icinga-plugin | 1.0.0 | | M, C, L | Required if you install antilles-icinga-mond |
| | antilles-confluent-mond | 1.0.0 | antilles-confluent-mond | M | |
| | antilles-vnc-mond | 1.0.0 | antilles-vnc-mond | C | Required if you need to run VNC |
| antilles alarm notification | antilles-sms-agent | 1.0.0 | antilles-sms-agent | L | Required if you need to send alerts via SMS |
| | antilles-wechat-agent | 1.0.0 | antilles-wechat-agent | L | Required if you need to send alerts via WeChat |
| | antilles-mail-agent | 1.0.0 | antilles-mail-agent | L | Required if you need to send alerts via e-mail |

Install Antilles on the management node

Step 1. Run the following command to install the Antilles module on the management node:

yum install -y antilles-core antilles-confluent-mond antilles-confluent-proxy antilles-env

Step 2. Perform the following optional steps as required:

| If you need to... | Run the following commands: |
| --- | --- |
| Use Ganglia for Antilles monitoring (provided that Ganglia is installed on the management node). Note: Antilles can be monitored by either Ganglia or Icinga2, and only one of the two can be installed. | yum install -y antilles-ganglia-mond |
| Use Icinga2 for Antilles monitoring (provided that Icinga2 is installed on the management node) | yum install -y antilles-icinga-mond antilles-icinga-plugin |
| Provide Web service on the management node | yum install -y antilles-portal |
| Use the VNC component | yum install -y antilles-vnc-proxy |

Step 3. Run the following command to restart services:

systemctl restart confluent

Install Antilles on the login node

Step 1. Run the following commands to install the Antilles module on the login node:

yum install -y antilles-env

Step 2. Perform the following steps on the login node as required:

| If you need to... | Run the following commands: |
| --- | --- |
| Provide Web service on the login node | yum install -y antilles-portal |
| Use Icinga2 for Antilles monitoring (provided that Icinga2 is installed on the login node) | yum install -y antilles-icinga-plugin |
| Provide e-mail service on the login node | yum install -y antilles-mail-agent |
| Provide SMS service on the login node | yum install -y antilles-sms-agent |
| Provide WeChat service on the login node | yum install -y antilles-wechat-agent |

Install Antilles on the compute node

Run the following command to install the Antilles module on all compute nodes:

yum install -y antilles-env

If Icinga2 is installed on all compute nodes, run the following command:

yum install -y antilles-icinga-plugin

5. Configure Antilles

Configure the service account

On the management node, use the tool antilles-passwd-tool.

If Icinga2 is not installed, follow the prompt below to enter the username and password for PostgreSQL, InfluxDB and Confluent:

antilles-passwd-tool

If Icinga2 is installed, follow the prompt below to enter the username and password for PostgreSQL, InfluxDB, Confluent and Icinga2 API:

antilles-passwd-tool --icinga

Configure cluster nodes

Step 1. Run the following command to import the cluster information to the system:

cp /etc/antilles/nodes.csv.example /etc/antilles/nodes.csv

Step 2. Run the following command to edit the cluster information file:

vi /etc/antilles/nodes.csv

Room information

Below is an example of the room information table.

| room | name | location_description |
| --- | --- | --- |
| | Shanghai Solution Room | Shanghai Zhangjiang |

Enter one entry of information for the fields name and location_description.

Logic group information

Managers can use logic groups to divide nodes in the cluster into groups. The logic groups do not impact the use of computer resources or permissions configurations.

Below is an example of the logic group information table.

| group | name |
| --- | --- |
| | login |

You need to enter at least one logic group name in the name field.

Room row information

Room row refers to the rack order in the room. Enter the information about the rack row where the cluster node is located.

Below is an example of the room row information table.

| row | name | index | belonging_room |
| --- | --- | --- | --- |
| | row1 | 1 | Shanghai Solution Room |

Enter at least one entry of row information in the fields below:

  • name: row name (must be unique in the same room)
  • index: row order (must be a positive integer and be unique in the same room)
  • belonging_room: name of the room where the row belongs

Rack information

Below is an example of the rack information table.

| rack | name | column | belonging_row |
| --- | --- | --- | --- |
| | rack1 | 1 | row1 |

Enter at least one entry of rack information in the fields below:

  • name: rack name (must be unique in the same room)
  • column: rack location column (must be a positive integer and be unique in the same row)
  • belonging_row: name of the row where the rack belongs

Chassis information

If there is a chassis in the cluster, enter the chassis information.

Below is an example of the chassis information table.

| chassis | name | belonging_rack | location_u_in_rack | machine_type |
| --- | --- | --- | --- | --- |
| | chassis1 | rack1 | 7 | 7X20 |

The fields in this table are described as follows:

  • name: chassis name (must be unique in the same room)
  • belonging_rack: rack location name (must use the name configured in the rack information table)
  • location_u_in_rack: location of the chassis base in the rack (Unit: U). In a standard cabinet, the value should be between 1 and 42. For example, a chassis base is located at 5U.
  • machine_type: chassis type (see Supported servers and chassis models )

Node information

Enter the information about all nodes in the cluster into the node information table. Due to its width, the example node information table is displayed in two split parts.

Part 1:

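| node | name | nodetype | immip | hostip | machine_type | ipmi_user | ipmi_pwd |
| --- | --- | --- | --- | --- | --- | --- | --- |
| | head | head | 10.240.212.13 | 127.0.0.1 | 7X58 | <BMC_USERNAME> | <BMC_PASSWORD> |

Part 2:

| belonging_service_node | belonging_rack | belonging_chassis | location_u | groups |
| --- | --- | --- | --- | --- |
| | rack1 | | 2 | login |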

The fields are described as follows:

  • name: node hostname (domain name not needed)
  • nodetype: head means management node; login means login node; compute means compute node
  • immip: IP address of the node’s BMC system
  • hostip: IP address of the node on the host network
  • machine_type: product name for the node (see Supported servers and chassis models )
  • ipmi_user: XCC (BMC) account for the node
  • ipmi_pwd: XCC (BMC) password for the node
  • belonging_service_node: large clusters require setting up a service node to which the node belongs. If there is no service node, leave the field blank
  • belonging_rack: name of the rack where the node is located (must use a name configured in the rack information table)
  • belonging_chassis: name of the chassis where the node is located (leave this field blank if the node is not located in a chassis). Configure the chassis name in the chassis information table
  • location_u: node location. If the node is located in the chassis, enter the slot in the chassis in which the node is located. If the node is located in a rack, enter the location of the node base in the rack (Unit: U)
  • groups: name of the node location logic group. One node can belong to multiple logic groups. Group names should be separated by ";". Configure the logic group name in the logic group information table
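
Typos in the nodetype, immip, or hostip fields are easy to make when editing nodes.csv by hand. The helpers below are a minimal sketch for checking single values against the rules described above. The function names are hypothetical, and the sketch does not parse nodes.csv itself, because the exact file layout should be taken from /etc/antilles/nodes.csv.example.

```shell
# Succeed if the argument is one of the node types listed above.
valid_nodetype() {
    case "$1" in
        head|login|compute) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if the argument is a dotted-quad IPv4 address (each part 0-255).
valid_ipv4() {
    echo "$1" | awk -F. '
        NF != 4 { exit 1 }
        { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }
    '
}
```

For example, `valid_ipv4 10.240.212.13 && valid_nodetype head` succeeds for the values in the sample row.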

Configure Antilles services

The Antilles service configuration file is located in /etc/antilles/antilles.ini. This configuration file controls the operating parameters for various Antilles background service components. You can modify this configuration file as needed.

If you change the configuration while Antilles is running, run the following command to restart Antilles so that the new configuration takes effect:

systemctl restart antilles

Note: Configurations not mentioned in the instructions in this section can be modified after consulting with service staff. Modifications made without a service consultation could result in a system failure.

Infrastructure configuration

The following part of the infrastructure configuration is modifiable:

domain = hpc.com  # Cluster domain settings

Database configuration

The following parts of the database configuration are modifiable:

db_host = 127.0.0.1          # PostgreSQL address
db_port = 5432               # PostgreSQL port
db_name = antilles           # PostgreSQL database name

influx_host = 127.0.0.1      # InfluxDB address
influx_port = 8086           # InfluxDB port
influx_database = antilles   # InfluxDB database name
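
To confirm which value is currently set before or after an edit, a small lookup helper can be handy. This is a sketch that assumes the `key = value  # comment` layout shown in this section. The function name is hypothetical, section headers are ignored, and values that themselves contain a `#` character are not handled.

```shell
# Print the value of KEY from an INI-style FILE, stripping trailing
# "# ..." comments and surrounding whitespace. Prints nothing if the
# key is absent; only the first match is returned.
ini_get() {
    file="$1"
    key="$2"
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" |
        sed -e 's/[[:space:]]*#.*$//' -e 's/[[:space:]]*$//' |
        head -n 1
}
```

For example, `ini_get /etc/antilles/antilles.ini db_port` prints the configured PostgreSQL port.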

Login configuration

The following part of the login configuration is modifiable:

login_fail_max_chance = 3    # Maximum number of login password error attempts

Attention: If user login failures exceed login_fail_max_chance, the system will suspend this user for 45 minutes. Suspended users cannot log in to the system even with the valid authentication information. Administrators, however, can resume a suspended user with a command line or Web portal. See Resume a user.

Storage configuration

The following part of the storage configuration is modifiable:

# Shared storage directory
# If strictly adhering to the shared directory configurations in this document,
# change to: share_dir = /home
share_dir = /home

Scheduler configuration

The following part of the scheduler configuration is modifiable:

# The scheduler configuration currently supports Slurm, LSF, and Torque. Slurm is the default.
scheduler_software = slurm

Alert configuration

Note: The configuration in this section is needed only when WeChat, SMS, and e-mail proxy modules are installed for the cluster. You can obtain <WECHAT_TEMPLATE_ID> from https://mp.weixin.qq.com/wiki?t=resource/res_main&id=mp1445241432.

The following part of the alert configuration is modifiable:

wechat_agent_url = http://127.0.0.1:18090   # WeChat proxy server address
wechat_template_id = <WECHAT_TEMPLATE_ID>   # WeChat notification template ID
sms_agent_url = http://127.0.0.1:18092      # SMS proxy server address
mail_agent_url = http://127.0.0.1:18091     # Email proxy server address

Confluent configuration

The following part of the confluent configuration is modifiable:

confluent_port = 4005        # Confluent port

User configuration

The following part of the user configuration is modifiable:

# user
use_libuser = false

The default value of use_libuser is "false". If you change this value to "true", run the following command to set the LDAP password:

# The command prompts you to enter the LDAP administrator password
# Use the LDAP_PASSWORD you configured in "Install OpenLDAP-server".
antilles setldappasswd
Please input your ldap password:
Please confirm the ldap password:

Configure Antilles components

antilles-vnc-mond

Step 1. Create a file named /var/tmp/vnc-mond.ini with the following content:

[vnc]
url=http://127.0.0.1:18083/session
timeout=30

Note: Replace 127.0.0.1 with the actual IP address of the management node.

Step 2. Distribute the configuration file /var/tmp/vnc-mond.ini to all compute nodes and place it at /etc/antilles/vnc-mond.ini.
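
Both steps can be scripted. The sketch below assumes root SSH access from the management node to the compute nodes and uses scp; the function names and the way node names are passed in are illustrative only, and any other distribution tool available in your cluster works just as well.

```shell
# Print the vnc-mond.ini content for the given management node address.
render_vnc_mond_ini() {
    mgmt_ip="$1"
    printf '[vnc]\nurl=http://%s:18083/session\ntimeout=30\n' "$mgmt_ip"
}

# Write the file locally, then copy it to each compute node named on the
# command line (assumes root SSH access to those hosts).
distribute_vnc_mond_ini() {
    mgmt_ip="$1"
    shift
    render_vnc_mond_ini "$mgmt_ip" > /var/tmp/vnc-mond.ini
    for node in "$@"; do
        scp /var/tmp/vnc-mond.ini "root@${node}:/etc/antilles/vnc-mond.ini"
    done
}
```

A call such as `distribute_vnc_mond_ini "$sms_ip" c1 c2` would cover two compute nodes, where c1 and c2 are placeholder hostnames.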

antilles-portal

To prevent conflicts, you may need to modify some configuration files on nodes where the antilles-portal module is installed, because this module provides external Web services on several ports.

  • /etc/nginx/nginx.conf

You can edit /etc/nginx/nginx.conf by changing the port to 8080:

listen 8080 default_server;
listen [::]:8080 default_server;

If you want to hide the server version information, modify /etc/nginx/nginx.conf by turning off server_tokens:

http{
    ......
    sendfile on;
    server_tokens off;
    ......
}

  • /etc/nginx/conf.d/https.conf

You can edit /etc/nginx/conf.d/https.conf by changing the https default port 443 to other ports:

listen <port> ssl http2;

Note: Ensure that the port is not used by other applications and is not blocked by the firewall.
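
To check whether a candidate port is already taken, you can inspect the listening sockets. The following sketch assumes the `ss` utility from the iproute package is available on the node. The helper name is hypothetical, and it only covers the "in use" part of the note, not the firewall.

```shell
# Read "ss -ltn" style output on stdin and succeed if some socket is
# listening on the given port. The local address is the 4th column and
# the port is whatever follows its last colon.
port_in_listing() {
    awk -v port="$1" '
        NR > 1 { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, `ss -ltn | port_in_listing 8443 && echo "port 8443 is already in use"`.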

  • /etc/nginx/conf.d/sites-available/antilles.conf

You can edit /etc/nginx/conf.d/sites-available/antilles.conf by replacing the first line with the following content:

set $antilles_host 127.0.0.1;

Note: If antilles-portal does not run on the management node, you can change 127.0.0.1 to the IP address of the management node.

  • /etc/antilles/portal.conf

You can edit /etc/antilles/portal.conf by adding custom shortcut links. Refer to /etc/antilles/portal.conf.example for the configuration format.

antilles-ganglia-mond

Note: Skip this section if Ganglia is not deployed in the cluster.

On a management node that uses the default ports, the /etc/antilles/ganglia-mond.conf file reads as follows:

influxdb {
    cfg_db_host 127.0.0.1
    cfg_db_port 5432
    cfg_db_name antilles
    host 127.0.0.1
    port 8086
    database antilles
    timeout 10
}

Make the following changes to the file:

  • Change cfg_db_host 127.0.0.1 and cfg_db_port 5432 to the actual IP address and port number of the PostgreSQL service.
  • Change the host 127.0.0.1 and port 8086 to the actual IP address and port number of the InfluxDB service.

antilles-icinga-mond

Note: Skip this section if Icinga2 is not deployed in the cluster.

Edit the file /etc/antilles/icinga-mond.ini:

  • Section [base] Change service = antilles to the actual service name defined in Icinga2 for Antilles.
  • Section [icinga] Change host = 127.0.0.1 and port = 5665 to the actual Icinga2 API service.
  • Section [postgresql] Change host = 127.0.0.1 and port = 5432 to the actual PostgreSQL service.
  • Section [influxdb] Change host = 127.0.0.1 and port = 8086 to the actual InfluxDB service.

See the example below:

[base]
service = antilles
sample_interval = 15
domain_filter =

[icinga]
host = 127.0.0.1
port = 5665
timeout = 30

[postgresql]
host = 127.0.0.1
port = 5432
database = antilles

[influxdb]
host = 127.0.0.1
port = 8086
database = antilles
timeout = 30

antilles-icinga-plugin

Note: Skip this section if Icinga2 is not installed in the cluster.

If antilles-icinga-plugin is installed in the cluster, Icinga2 should be configured to enable the plugin. Below are examples for how to configure Icinga2:

  • Define a new command in the command configuration file of Icinga2:

    object CheckCommand "antilles-monitor" {
      command = [PluginDir + "/antilles-icinga-plugin"]
      arguments = {
          "-a" = ""
      }
    }
    
  • Define a new service in the service configuration file of Icinga2:

    apply Service "antilles" {
      display_name = "antilles"
      check_command = "antilles-monitor"
      assign where host.address
    }
    

Notes:

  • For details about how to define the Icinga2 command and service, refer to https://www.icinga.com/docs/icinga2/latest/.
  • The "display_name" of the Icinga2 service must be the same as the "service" in the configuration file of antilles-icinga-mond.

antilles-confluent-proxy

On a management node that uses the default ports, the /etc/antilles/confluent-proxy.ini file reads as follows:

[DEFAULT]
# database
db_host = 127.0.0.1
db_port = 5432
db_name = antilles

Make the following changes to the file:

  • Change db_host 127.0.0.1 and db_port 5432 to the actual IP address and port number of the PostgreSQL service.
  • Change host 127.0.0.1 and port 8086 to the actual IP address and port number of the InfluxDB service.

If there are multiple Confluent instances in the cluster, configure the [app:main] section as follows:

[app:main]
use = cluster-confluent-proxy

Note: To change the information about the Confluent user, see Install Confluent. To create or update the user information, see Configure the service account.

antilles-confluent-mond

On a management node that uses the default ports, the /etc/antilles/confluent-mond.ini file reads as follows:

[database]
db_host = 127.0.0.1
db_port = 5432
db_name = antilles

[influxdb]
host = 127.0.0.1
port = 8086
database = antilles
timeout = 10

Make the following changes to the file:

  • Change db_host 127.0.0.1 and db_port 5432 to the actual IP address and port number of the PostgreSQL service.
  • Change host 127.0.0.1 and port 8086 to the actual IP address and port number of the InfluxDB service.

antilles-wechat-agent

Edit the file /etc/antilles/wechat-agent as follows:

# The configurations below should be changed based on the specific environment
appid = <APPID>
secret = <SECRET>

Note: For more information about <APPID> and <SECRET>, refer to https://mp.weixin.qq.com/wiki?t=resource/res_main&id=mp1445241432.

Initialize the system

Run the following command to initialize Antilles:

antilles init

Initialize users

Step 1. (Optional) Run the following command to add an LDAP user with a username and password:

luseradd <HPC_ADMIN_USERNAME> -P <HPC_ADMIN_PASSWORD>

When prompted for the LDAP password, enter the LDAP_PASSWORD you configured in Install OpenLDAP-server.

Step 2. Run the following command to import the user to Antilles:

# Import the user into Antilles as an administrator
antilles user_import -u <HPC_ADMIN_USERNAME> -r admin

6. Start and log in to Antilles

Start Antilles

If Ganglia is installed, run the following commands:

# If the management node has to provide web service, start Nginx on the management node.
systemctl enable nginx
systemctl start nginx

# If the login node has to provide web service, start Nginx on the login node.
systemctl enable nginx
systemctl start nginx

# Start Antilles-related services
systemctl start antilles-ganglia-mond
systemctl start antilles-confluent-mond

# Start Antilles
systemctl start antilles

If Icinga2 is installed, run the following commands:

# If the management node has to provide web service, start Nginx on the management node.
systemctl enable nginx
systemctl start nginx

# If the login node has to provide web service, start Nginx on the login node.
systemctl enable nginx
systemctl start nginx

# Start Antilles-related services
systemctl start antilles-icinga-mond
systemctl start antilles-confluent-mond

# Start Antilles
systemctl start antilles
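
The two sequences above differ only in the monitoring service. As a sketch, the choice can be captured in one helper that prints the start commands for the selected backend in the same order. The function name is hypothetical, and the Nginx steps are left out because they depend on which nodes provide the Web service.

```shell
# Print the service start commands for the chosen monitoring backend
# ("ganglia" or "icinga"), in the order used in this section.
antilles_start_commands() {
    case "$1" in
        ganglia|icinga) monitor="antilles-$1-mond" ;;
        *) echo "usage: antilles_start_commands ganglia|icinga" >&2; return 2 ;;
    esac
    for svc in "$monitor" antilles-confluent-mond antilles; do
        echo "systemctl start $svc"
    done
}
```

Running `antilles_start_commands icinga` prints the commands for review, and piping the output to `sh` executes them.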

Log in to Antilles

After the Antilles service is started, you can access Antilles by opening https://<ip of login node>:<port>/ in a Web browser.

Note: Replace <port> with the port number you set in /etc/nginx/conf.d/https.conf, as described in the section "antilles-portal".

If the installation is correct, the Antilles login page opens. You can log in using the LDAP account set in "Initialize users".

Troubleshooting

If you run into login problems, such as a forgotten password or a suspended account, you can use the commands that Antilles provides, as described below.

Change a user's role

Run the following command to change a user's role:

antilles user_changerole -u <ROLE_USERNAME> -r admin

Parameter interpretation:

  • -u Specify the username to modify
  • -r Specify the role to be set (admin/operator/user)

Resume a user

Run the following command to resume a user:

antilles user_resume <SUSPENDED_USERNAME>

Parameter interpretation:

  • Specify the username of the user to be resumed directly as the argument