You can run Conio on Hadoop either natively, using a pseudo-distributed cluster, or in Docker containers.

Note: Conio on Hadoop requires native libraries, which are currently supported only on Linux. It is therefore not possible to run Conio natively on macOS, for example.
For Linux systems, perform the following steps:

- Download a supported Hadoop release from here.
- Set up a Single Node Cluster based on this document.
- Enable scheduling of Docker containers in YARN (e.g. by setting up the Linux Container Executor) and configure your Docker daemon accordingly.
- Build Conio and obtain the fat jar with its dependencies.
- Submit the yaml describing your Kubernetes object as a Conio application by using the Conio client:

      java -jar conio-1.0-SNAPSHOT-jar-with-dependencies.jar -yaml k8s-obj.yaml
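Enabling Docker containers in YARN is standard Hadoop configuration rather than anything Conio-specific. As a rough sketch, the relevant `yarn-site.xml` properties look like the following (an illustrative fragment for Hadoop 3.x; the property names come from Hadoop's Docker-on-YARN support, and the values must be adapted to your cluster):

```xml
<!-- yarn-site.xml fragment: illustrative only, assuming Hadoop 3.x -->
<property>
  <!-- Use the Linux Container Executor, required for Docker containers -->
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <!-- Allow the docker runtime in addition to the default one -->
  <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
  <value>default,docker</value>
</property>
<property>
  <!-- Networks that launched containers are permitted to join -->
  <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
  <value>host,bridge</value>
</property>
```

In addition, `container-executor.cfg` must enable the docker module and whitelist trusted registries; see the Hadoop documentation on launching applications in Docker containers for the full set of options.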
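For reference, the yaml passed with `-yaml` could be a minimal Kubernetes Pod spec along these lines (the name, image, and command below are placeholders; which Kubernetes object kinds Conio actually accepts is defined by the project):

```yaml
# k8s-obj.yaml — minimal illustrative Pod spec; all names are placeholders
apiVersion: v1
kind: Pod
metadata:
  name: conio-demo
spec:
  containers:
    - name: main
      image: busybox
      command: ["sh", "-c", "echo hello from YARN && sleep 30"]
```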
For running in Docker containers:

- Set up any dockerized Hadoop solution and configure it according to the pseudo-distributed mode described above,
- or use the conio-nano project, a fork of big-data-europe/docker-hadoop with the same configuration as above, but tailored to this project.
If you use conio-nano, you will likely need a command similar to the following to start the Conio client in a Docker container that has the required files mounted into it:

    docker run -it -a stdin -a stdout -a stderr --env-file hadoop.env --network docker-hadoop_default -v "$(pwd)/conio:/conio" conio/base:master -- sudo -u conio java -jar /conio/conio.jar -yaml /conio/pod.yaml -zookeeper <zookeeper address>

Note that the jar containing the dependencies has been renamed to conio.jar here.
In the NodeManager logs you may see mount failure errors. If you use Docker for Mac, enable sharing of the specific mounts in the Preferences > File Sharing tab: add /opt and /etc.