Mike Fisk edited this page Dec 29, 2020 · 16 revisions

No real “installation” is required.

  1. Download the fm command (a Python script) to the computer you will use to launch computations.
    • You can also download a tarball
    • You may want to pick an older release rather than the current devel version
  2. Create a filemap.conf file describing the nodes you are using.
  3. Run “fm init” once to prepare necessary directory structures on each node.
  4. Copy “fm” to each node in the computation and make sure it is in your PATH on each node.

Dependencies

  • Linux
  • Python >= 2.4 (requires subprocess module) or 3.x
  • OpenSSH or some other remote shell mechanism (SLURM support is experimental)
  • rsync
  • bash
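As a quick sanity check before running “fm init”, the dependency list above can be tested on a node with a few lines of Python. This is only a sketch and not part of fm: the function names are hypothetical, `ssh` is assumed to be the OpenSSH client binary, and `shutil.which` requires Python 3.3 or newer.

```python
import shutil
import sys

# Command-line tools named in the dependency list; "ssh" stands in for OpenSSH.
REQUIRED_TOOLS = ("ssh", "rsync", "bash")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that `which` cannot find on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_supported(version=sys.version_info):
    """True for Python 2.4 or newer, which includes any 3.x release."""
    return tuple(version[:2]) >= (2, 4)


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("missing: " + ", ".join(absent))
    if not python_is_supported():
        print("Python 2.4 or newer is required")
```

Running the script on each node prints nothing when every tool is found; any line of output names what still needs to be installed.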

SSH setup

If you’re using ssh (the default) to communicate between nodes, you will need to set up some form of authentication that doesn’t prompt you for a password each time. There are several options:

  • Use an ssh keypair (a private and a public key). Copy the private key to all worker hosts as well as the “master”. Add the public key to every host’s `authorized_keys` file. Then use ssh-agent to cache the key in memory:
    • ssh-agent $SHELL
    • ssh-add
  • As root, configure HostbasedAuthentication.
  • Use a single sign-on infrastructure like Kerberos.
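Whichever option you choose, it can help to confirm that a login really is non-interactive before launching a computation. The sketch below is an illustration rather than anything fm provides: it relies on OpenSSH’s `BatchMode=yes` option, which makes ssh fail instead of prompting, and the function names and the host name `node1` are hypothetical.

```python
import subprocess


def batch_ssh_command(host):
    """Build an ssh command that fails instead of prompting for a password."""
    # BatchMode=yes disables interactive password and passphrase prompts;
    # ConnectTimeout keeps an unreachable host from stalling the check.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def can_login_without_prompt(host, run=subprocess.call):
    """Return True if `host` accepts a non-interactive login (exit status 0)."""
    return run(batch_ssh_command(host)) == 0


if __name__ == "__main__":
    # "node1" is a placeholder; substitute the hosts you actually use.
    for host in ["node1"]:
        status = "ok" if can_login_without_prompt(host) else "would prompt or fail"
        print(host + ": " + status)
```

A host reported as failing here would most likely also make fm stall or fail when it tries to reach that node over ssh.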

Also, for best performance, set MaxStartups in each node’s sshd_config file to a number larger than the number of nodes in your cluster configuration. Otherwise, sshd rate-limiting may slow down replication and shuffle operations. By default, MaxStartups allows only 10 new, as-yet-unauthenticated connections at a time.
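To see whether a node’s current setting is large enough, the MaxStartups line can be read out of sshd_config and compared with the node count. The following is a minimal sketch with hypothetical function names; it assumes the usual `/etc/ssh/sshd_config` location, treats a missing setting as the default of 10 mentioned above, and reads only the first number of the `start:rate:full` form.

```python
def max_startups_start(config_text):
    """Return the first MaxStartups 'start' value in sshd_config text, or None."""
    for line in config_text.splitlines():
        # Drop comments, then split the keyword from its argument.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0].lower() == "maxstartups":
            # The argument is either "N" or "start:rate:full"; keep the first number.
            return int(fields[1].split(":")[0])
    return None


def is_large_enough(config_text, node_count, default=10):
    """True if the configured (or default) limit is larger than node_count."""
    limit = max_startups_start(config_text)
    if limit is None:
        limit = default
    return limit > node_count


if __name__ == "__main__":
    # Assumed path; the node count of 32 is only an example value.
    with open("/etc/ssh/sshd_config") as handle:
        print(is_large_enough(handle.read(), node_count=32))
```

Keyword matching is case-insensitive and the first matching line wins, which mirrors how sshd reads its configuration. Settings inside `Match` blocks or included files are not handled by this sketch.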
