HTCondor dockerized in three node roles: Master, Submitter and Executor. Ubuntu Trusty LTS is the base image and the HTCondor version refers to the latest stable release.
Supervisord is used to control the different spawned processes. Several features are implemented, as described below, such as Calico, Marathon and Onedata support.
- Simple Run
- Calico Support
- Onedata Support
- Marathon Support
- Healthchecks
- SSH access
- condor_config
- Logs
Master node:
$ docker run -d --name=condormaster dscnaf/htcondor-debian -m
$ docker exec -it condormaster ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
32: eth0@if33: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.2/16 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::42:acff:fe11:2/64 scope link
valid_lft forever preferred_lft forever
Submitter node:
$ docker run -d --name=condorsubmit dscnaf/htcondor-debian -s <MASTER_IP>
Then launch an arbitrary number of executors:
$ docker run -d --name=condorexecute dscnaf/htcondor-debian -e <MASTER_IP>
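If all containers run on the same Docker host, the master address can also be captured with docker inspect instead of being copied by hand; a minimal sketch using the container names above:
$ MASTER_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' condormaster)
$ docker run -d --name=condorexecute dscnaf/htcondor-debian -e $MASTER_IP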
Containers are agnostic about the network layer. The following test shows containers hosted on different nodes (calico-01 to calico-0(x)) communicating via the Calico network driver.
core@calico-01 ~ $ calicoctl pool add 192.168.0.0/16
core@calico-01 ~ $ calicoctl pool show
+----------------+---------+
| IPv4 CIDR | Options |
+----------------+---------+
| 192.168.0.0/16 | |
+----------------+---------+
+-----------+---------+
| IPv6 CIDR | Options |
+-----------+---------+
+-----------+---------+
core@calico-01 ~ $ docker network create --driver calico --ipam-driver calico calinet1
core@calico-01 ~ $ calicoctl profile calinet1 rule show
Inbound rules:
1 allow from tag calinet1
Outbound rules:
core@calico-01 ~ $ docker exec -it condormaster ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
15: cali0@if16: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP qlen 1000
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff
inet 192.168.142.0/32 scope global cali0
valid_lft forever preferred_lft forever
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
core@calico-0(x) ~ $ docker run -d --net=calinet1 --name=condorsubmit dscnaf/htcondor-debian -s 192.168.142.0
core@calico-0(x) ~ $ docker run -d --net=calinet1 --name=condorexecute dscnaf/htcondor-debian -e 192.168.142.0
core@calico-0(x) ~ $ docker exec -it condorexecute ping 192.168.142.0
PING 192.168.142.0 (192.168.142.0): 48 data bytes
56 bytes from 192.168.142.0: icmp_seq=0 ttl=62 time=0.048 ms
56 bytes from 192.168.142.0: icmp_seq=1 ttl=62 time=0.376 ms
^C--- 192.168.142.0 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.048/0.212/0.376/0.164 ms
The oneclient tool is available inside the containers for external data access, so users can mount the file system inside their sandbox during a job run. Requirements:
- oneprovider access
- external connectivity (--nat-outgoing is required when using Calico; see the pool sketch after the run command below)
- privileged containers (executors must be --privileged)
$ docker run -d --name=condor<TYPE> --privileged dscnaf/htcondor-debian -e <MASTER_IP>
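For the external connectivity requirement with Calico, outgoing NAT can be enabled on the pool. A minimal sketch, assuming the calicoctl 0.x syntax used in the Calico section above:
core@calico-01 ~ $ calicoctl pool add 192.168.0.0/16 --nat-outgoing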
## on submitter
john@submitter:~$ cat touch.sh
#!/bin/bash
set -ex
export ONECLIENT_AUTHORIZATION_TOKEN=xxxxxxxxxxxxxxxxxxxx
export PROVIDER_HOSTNAME=<ENDPOINT>
mkdir oneclient
oneclient --no_check_certificate --authentication token oneclient
cd oneclient/John\'s\ space
touch imhere.txt
cd ../..
fusermount -u oneclient
The unmount operation is still up to the user; be careful with hanging mountpoints on executor hosts.
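The script above can then be submitted as an ordinary HTCondor job from the submitter. A minimal sketch of a submit file for it (file names are illustrative), using the same submit syntax as in the test run further below:
john@submitter:~$ chmod u+x touch.sh
john@submitter:~$ cat > touch.sub << EOF
executable = touch.sh
log = touch.log
output = touch.out
error = touch.err
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
queue
EOF
john@submitter:~$ condor_submit touch.sub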
Inside the examples/marathon folder, several example .json files are stored for launching the containers in Mesos/Marathon clusters. The examples cover different optional features; please refer to the usage or to the specific sections. For any other requirement, please refer to the official Marathon docs.
The .json files can be injected via the GUI (in newer Marathon versions) or via the API as follows:
curl -XPOST -H "Content-Type: application/json" http://<MARATHON_IP>/v2/apps -d @<FILE.json>
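As a rough idea of what such a file contains, here is a minimal sketch of a Marathon app definition for the master role (field values are illustrative assumptions; the files in examples/marathon are the reference):
$ cat > htcondor-master.json << EOF
{
  "id": "/htcondor/master",
  "instances": 1,
  "cpus": 0.5,
  "mem": 512,
  "args": ["-m"],
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "dscnaf/htcondor-debian",
      "network": "HOST"
    }
  }
}
EOF
It can then be posted with the curl command above.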
Healthchecks are implemented using the HTCondor Python API. They simply check the presence of the processes expected for the container's specific role. Nevertheless, due to known bugs in the Mesos/Marathon platform, this feature does not yet work completely when using the Calico driver. These bugs are resolved in Mesos >= 1.0.0-rc3 and Marathon >= 1.2.0-RC8.
Healthcheck examples are also in the examples/marathon folder.
- Healthchecks are still primitive
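The repository implements the checks with the HTCondor Python API; as a rough bash equivalent of what a role-specific check does (process names are the standard HTCondor daemons, purely illustrative):
#!/bin/bash
# Illustrative sketch only: the actual checks use the HTCondor Python API.
# Verify that the daemons expected for this container's role are running.
pgrep -x condor_master > /dev/null || exit 1   # every role
pgrep -x condor_schedd > /dev/null || exit 1   # submitter role
# pgrep -x condor_startd > /dev/null || exit 1 # executor role
exit 0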
Containers are launched with the sshd daemon disabled by default. Nevertheless, it can be activated if needed (e.g. to reach the submitter container through its hosting node) via two methods:
- via password: use the -u (user) and -p (password) parameters. This injects a user without root privileges.
docker run -d --name=sub --net=htcondor dscnaf/htcondor-debian -s 192.168.0.152 -u john -p j0hn
- via certificate: using the -k (public key) parameter, the ssh service is activated and the public key is stored in /root/.ssh/authorized_keys. The public key must be reachable via the network; file system exchange is not possible.
docker run -d --name=sub --net=htcondor dscnaf/htcondor-debian -s 192.168.0.152 -k <url_to_public_key>
These two methods are not mutually exclusive.
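For example, password and key access can be combined in a single run (parameters as above):
docker run -d --name=sub --net=htcondor dscnaf/htcondor-debian -s 192.168.0.152 -u john -p j0hn -k <url_to_public_key>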
- Adding a Calico rule
[root@mesos-host ~]# calicoctl profile htcondor rule show
Inbound rules:
1 allow from tag htcondor
2 allow tcp to ports 5000
Outbound rules:
1 allow
[root@mesos-host ~]# calicoctl profile htcondor rule add inbound allow tcp to ports 22
[root@mesos-host ~]# calicoctl profile htcondor rule show
Inbound rules:
1 allow from tag htcondor
2 allow tcp to ports 5000
3 allow tcp to ports 22
Outbound rules:
1 allow
- Adding routing rules on the node hosting the submitter container
[root@mesos-host ~]# iptables -A PREROUTING -t nat -i <HOST_INTERFACE> -p tcp --dport <PORT> -j DNAT --to <CONTAINER_IP>:22
[root@mesos-host ~]# iptables -t nat -A OUTPUT -p tcp -o lo --dport <PORT> -j DNAT --to-destination <CONTAINER_IP>:22
e.g.:
[root@mesos-host ~]# iptables -A PREROUTING -t nat -i eth0 -p tcp --dport 2222 -j DNAT --to 192.168.0.26:22
[root@mesos-host ~]# iptables -t nat -A OUTPUT -p tcp -o lo --dport 2222 -j DNAT --to-destination 192.168.0.26:22
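The injected NAT rules can then be verified, e.g.:
[root@mesos-host ~]# iptables -t nat -L PREROUTING -n --line-numbers
[root@mesos-host ~]# iptables -t nat -L OUTPUT -n --line-numbers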
- External access
john@workstation:~$ ssh -p 2222 [email protected]
Password:
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 3.10.0-327.28.3.el7.x86_64 x86_64)
* Documentation: https://help.ubuntu.com/
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
Last login: Wed Sep 14 14:14:28 2016 from workstation
john@3211f3fc6b40:~$
Here 192.168.0.26 is the submitter container IP (via Calico) and 131.154.96.147 is the hosting node (mesos-host).
Note: this solution is also reported in the Calico docs.
The -c (configuration file) parameter permits injecting a different configuration file into the container instances. The configuration file must be reachable via the network; file system exchange is not possible.
docker run -d --name=sub --net=htcondor dscnaf/htcondor-debian -s 192.168.0.152 -c <url_to_condor_config>
The DAEMON_LIST = MASTER, @ROLE_DAEMONS@ and CONDOR_HOST = @CONDOR_HOST@ parameters must be present even in a custom condor_config template. A custom template is entirely the user's responsibility.
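A minimal sketch of a custom condor_config template that keeps the required placeholders (the additional settings are illustrative assumptions to be adapted by the user):
CONDOR_HOST = @CONDOR_HOST@
DAEMON_LIST = MASTER, @ROLE_DAEMONS@
# illustrative extra settings
ALLOW_WRITE = *
#NETWORK_INTERFACE = eth0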
docker logs <container_name>
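Since the processes inside each container are driven by supervisord, their state can also be inspected directly, e.g. (assuming supervisorctl is available in the image):
$ docker exec -it condormaster supervisorctl status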
core@calico-01 ~ $ docker exec -it condorsubmit bash
root@854b194757b8:/# useradd -m -s /bin/bash john
root@854b194757b8:/# su - john
john@854b194757b8:~$ cat > sleep.sh << EOF
#!/bin/bash
/bin/sleep 20
EOF
john@854b194757b8:~$ cat > sleep.sub << EOF
executable = sleep.sh
log = sleep.log
output = outfile.txt
error = errors.txt
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
queue
EOF
john@854b194757b8:~$ chmod u+x sleep.sh
john@854b194757b8:~$ condor_status
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
073b4a02ee6a LINUX X86_64 Unclaimed Idle 0.000 997 0+02:19:33
854b194757b8 LINUX X86_64 Unclaimed Idle 0.010 997 0+00:00:04
Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX 2 0 0 2 0 0 0
Total 2 0 0 2 0 0 0
john@854b194757b8:~$ condor_submit sleep.sub
Submitting job(s).
1 job(s) submitted to cluster 12.
john@854b194757b8:~$ condor_status
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
073b4a02ee6a LINUX X86_64 Unclaimed Idle 0.000 997 0+02:19:33
854b194757b8 LINUX X86_64 Claimed Busy 0.010 997 0+00:00:04
Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX 2 0 1 1 0 0 0
Total 2 0 1 1 0 0 0
john@854b194757b8:~$ condor_q
-- Schedd: 854b194757b8 : <192.168.142.1:56580?...
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
12.0 john 5/30 15:26 0+00:00:15 R 0 0.0 sleep.sh
1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
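When the job completes it disappears from condor_q, and the files declared in sleep.sub are written back to the submit directory; the finished job can still be inspected with condor_history:
john@854b194757b8:~$ condor_history
john@854b194757b8:~$ ls -l sleep.log outfile.txt errors.txt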
usage: $0 -m|-e master-address|-s master-address [-c url-to-config] [-k url-to-public-key] [-u inject user -p password]
Configure HTCondor role and start supervisord for this container.
OPTIONS:
-m configure container as HTCondor master
-e master-address configure container as HTCondor executor for the given master
-s master-address configure container as HTCondor submitter for the given master
-c url-to-config config file reference from http url.
-k url-to-public-key url to public key for ssh access to root
-u user inject a user without root privileges for submitting jobs via ssh access; -p password required
-p password user password (see -u attribute).
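For example, a submitter container that also injects a custom configuration and an ssh user (placeholders as above):
docker run -d --name=condorsubmit dscnaf/htcondor-debian -s <MASTER_IP> -c <url_to_condor_config> -u john -p j0hn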