This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Support running Heron on Amazon Ecs. #1837

Open
wants to merge 62 commits into master
62 commits
370873e
Added Ecs Docker Host Support
ananthgs May 3, 2017
8a10b03
Adding the ecs scheduler
ananthgs May 3, 2017
cee04fe
AWS ECS schedulers added
ananthgs May 3, 2017
c5b7f3e
Adding AWS ECS files
ananthgs May 3, 2017
8c02617
adding ecs to build scripts
ananthgs May 3, 2017
725891d
adding ecs to build
ananthgs May 3, 2017
5336c3c
Adding AMI Check dependancy
ananthgs May 3, 2017
e368e9a
Adding ecs conf
ananthgs May 4, 2017
9a80784
Added Compose Commands
ananthgs May 4, 2017
0aae022
Cleaned up non ecs functions
ananthgs May 4, 2017
83bf1a1
Added temp file for compose
ananthgs May 4, 2017
f1379f6
Removed reference to temp dir
ananthgs May 4, 2017
c38322c
Removed Working Dir reference
ananthgs May 4, 2017
7e575b9
Removed commented lines for unused imports
ananthgs May 5, 2017
4c6de0c
emoved commented lines on unused imports
ananthgs May 5, 2017
fefbbc8
Removing un referenced EcsKeys class
ananthgs May 5, 2017
edaa5fa
Removed the host env setting if its an Amazon ECS instance
ananthgs May 5, 2017
8efc62e
remving had coded values
ananthgs May 12, 2017
0b74033
removing hard coded values and ports
ananthgs May 12, 2017
fceacfa
new file: EcsKey.java
ananthgs May 12, 2017
532fffe
modified: EcsLauncher.java
ananthgs May 12, 2017
2d08fe5
fixing java and other paths
ananthgs May 12, 2017
57d9c91
new file: ecs_compose_template.yaml
ananthgs May 12, 2017
3db478f
tunneling set to false
ananthgs May 13, 2017
4a36f40
update gethost to handle docker and ecs AMI
ananthgs May 13, 2017
36dd734
passing esc ami host param
ananthgs May 14, 2017
9582f54
adding kill to schedulers
ananthgs May 14, 2017
3a2af21
modified: EcsKey.java
ananthgs May 14, 2017
406f298
removed hard coding values
ananthgs May 14, 2017
5b16369
added list tasks context
ananthgs May 16, 2017
3f321cf
added List Tasks Keys
ananthgs May 16, 2017
ac7ca93
added getJobLinks with ecs Task ID's
ananthgs May 16, 2017
5204be2
adde jackson JSON jars for ecs Schedulers
ananthgs May 16, 2017
52f40bb
new file: heron/config/src/yaml/conf/ecs/setupEcs.sh
ananthgs May 16, 2017
e74b0ef
new file: heron/config/src/yaml/conf/ecs/set-ecs-cluster-name.sh
ananthgs May 16, 2017
0824fa0
new file: heron/config/src/yaml/conf/ecs/ecs-heron-policy.json
ananthgs May 16, 2017
d3a7df5
new file: heron/config/src/yaml/conf/ecs/ecs-heron-role.json
ananthgs May 16, 2017
f65f865
fixed echo messages
ananthgs May 16, 2017
15a91f7
Fixed Log messages
ananthgs May 16, 2017
96a150d
new file: EcsSchedulerTest.java
ananthgs May 19, 2017
f2e4267
modified: setupEcs.sh
ananthgs May 19, 2017
be514cb
fixed version number
ananthgs May 19, 2017
0ece9cb
fixed version number
ananthgs May 19, 2017
2ef6665
modified: set-ecs-cluster-name.sh
ananthgs May 19, 2017
c224c82
new file: README
ananthgs May 21, 2017
30f4348
Clean up file for AWS
ananthgs May 21, 2017
b362b4d
modified: ecs_compose_template.yaml
ananthgs May 21, 2017
4b05a87
modified: setupEcs.sh
ananthgs May 21, 2017
2d9dbec
modified: scheduler.yaml
ananthgs May 21, 2017
0119231
added cluster binary context
ananthgs May 21, 2017
986e7f7
Added scheduler realated changes for scheduler info to be put in zook…
ananthgs May 21, 2017
f13b137
Added files for on schedule testing
ananthgs May 21, 2017
4cc598b
JAVA home for enable AWS scheduler
ananthgs May 21, 2017
168a474
Updating with correct document link
ananthgs May 22, 2017
f19d354
Adding Ecs AMI check as non mandatory field
ananthgs May 22, 2017
034eb1a
modified: tools/rules/heron_core.bzl
ananthgs May 22, 2017
246b922
modified: scripts/packages/BUILD
ananthgs May 22, 2017
f6f52d9
merging with changes on upstream
ananthgs May 22, 2017
3767043
Merge remote-tracking branch 'upstream/master'
ananthgs May 22, 2017
84b669c
Fixed duplicate lines due to merge with head branch
ananthgs May 22, 2017
88d5ae6
Fixed Cluster name to ecs-heron-cluster
ananthgs May 22, 2017
2f1bde1
modified: heron/config/src/yaml/conf/ecs/setupEcs.sh
ananthgs May 22, 2017
5 changes: 5 additions & 0 deletions heron/config/src/yaml/BUILD
@@ -17,6 +17,11 @@ filegroup(
srcs = glob(["conf/local/*.yaml"]),
)

filegroup(
name = "conf-ecs-yaml",
srcs = glob(["conf/ecs/*.yaml"]),
)

filegroup(
name = "conf-aurora-yaml",
srcs = glob(["conf/aurora/*"]),
6 changes: 6 additions & 0 deletions heron/config/src/yaml/conf/ecs/README
@@ -0,0 +1,6 @@
This folder contains sample configs needed for running Heron on an AWS cluster.
Please follow the steps in this Google doc for the detailed setup and workflow:

https://docs.google.com/document/d/1ecbCuA46cIKPfY0SP0F1dcRlei4DIPz3pZ6ZSZ5zZgc/edit

Then you can run Heron on AWS!
21 changes: 21 additions & 0 deletions heron/config/src/yaml/conf/ecs/cleanUp.sh
@@ -0,0 +1,21 @@
#!/bin/bash

aws autoscaling delete-launch-configuration --launch-configuration-name ecs-heron-launch-configuration
echo "done delete-launch-configuration"
aws iam remove-role-from-instance-profile --instance-profile-name ecs-heron-instance-profile --role-name ecs-heron-role
echo "done iam remove-role-from-instance-profile"
aws iam delete-instance-profile --instance-profile-name ecs-heron-instance-profile
echo "done iam delete-instance-profile"
aws iam delete-role-policy --role-name ecs-heron-role --policy-name ecs-heron-policy
echo "done iam delete-role-policy "
aws iam delete-role --role-name ecs-heron-role
echo "done iam delete-role "
aws ec2 delete-key-pair --key-name ecs-heron-keypair
rm -f ecs-heron-key*.pem

GROUP_ID=$(aws ec2 describe-security-groups --query 'SecurityGroups[?GroupName==`ecs-heron-securitygroup`].GroupId' --output text)
aws ec2 delete-security-group --group-id "$GROUP_ID"
echo "done delete security-groups "

aws ecs delete-cluster --cluster ecs-heron-cluster
echo "done delete cluster "
28 changes: 28 additions & 0 deletions heron/config/src/yaml/conf/ecs/ecs-heron-policy.json
@@ -0,0 +1,28 @@
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ecs:CreateCluster",
"ecs:DeregisterContainerInstance",
"ecs:DiscoverPollEndpoint",
"ecs:Poll",
"ecs:RegisterContainerInstance",
"ecs:Submit*",
"ecs:ListClusters",
"ecs:ListContainerInstances",
"ecs:DescribeContainerInstances",
"ecs:ListServices",
"ecs:DescribeTasks",
"ecs:DescribeServices",
"ec2:DescribeInstances",
"ec2:DescribeTags",
"autoscaling:DescribeAutoScalingInstances"
],
"Resource": [
"*"
]
}
]
}
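To make the wildcard entry `ecs:Submit*` concrete, here is a minimal Python sketch that checks whether the policy above allows a given action. It approximates IAM action matching with `fnmatch` (case-insensitive, `*` as a wildcard) and only looks at `Allow` statements; real IAM evaluation also handles `Deny`, `NotAction`, resources and conditions, so `is_allowed` is a hypothetical helper, not an AWS API.

```python
import fnmatch

# Actions copied from ecs-heron-policy.json above (single Allow statement).
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecs:CreateCluster",
                "ecs:DeregisterContainerInstance",
                "ecs:DiscoverPollEndpoint",
                "ecs:Poll",
                "ecs:RegisterContainerInstance",
                "ecs:Submit*",
                "ecs:ListClusters",
                "ecs:ListContainerInstances",
                "ecs:DescribeContainerInstances",
                "ecs:ListServices",
                "ecs:DescribeTasks",
                "ecs:DescribeServices",
                "ec2:DescribeInstances",
                "ec2:DescribeTags",
                "autoscaling:DescribeAutoScalingInstances",
            ],
            "Resource": ["*"],
        }
    ],
}


def is_allowed(policy: dict, action: str) -> bool:
    """Hypothetical helper: approximate IAM action matching for Allow statements only."""
    statements = policy["Statement"]
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    wanted = action.lower()  # IAM action names are case-insensitive
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue  # this sketch ignores Deny statements entirely
        patterns = statement.get("Action", [])
        if isinstance(patterns, str):
            patterns = [patterns]
        if any(fnmatch.fnmatchcase(wanted, p.lower()) for p in patterns):
            return True
    return False


print(is_allowed(POLICY, "ecs:SubmitTaskStateChange"))  # True, via "ecs:Submit*"
print(is_allowed(POLICY, "ecs:RunTask"))                # False, not listed
```

The same helper also accepts the single-object `Statement` form used in ecs-heron-role.json, since it normalizes a dict to a one-element list.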
8 changes: 8 additions & 0 deletions heron/config/src/yaml/conf/ecs/ecs-heron-role.json
@@ -0,0 +1,8 @@
{
"Version": "2012-10-17",
"Statement": {
"Effect": "Allow",
"Principal": {"Service": "ec2.amazonaws.com"},
"Action": "sts:AssumeRole"
}
}
17 changes: 17 additions & 0 deletions heron/config/src/yaml/conf/ecs/ecs_compose_template.yaml
@@ -0,0 +1,17 @@
version: '2'
services:
CONTAINER_NUMBER:
image: ananthgs/onlyheronandubuntu
#command: ["sh", "-c", "mkdir /s3; cd /s3 ;aws s3 cp s3://herondockercal/TOPOLOGY_NAME/topology.tar.gz /s3 ;aws s3 cp s3://herondockercal/heron-core-testbuild-ubuntu14.04.tar.gz /s3 ;cd /s3; tar -zxvf topology.tar.gz; tar -zxvf heron-core-testbuild-ubuntu14.04.tar.gz; HERON_EXECUTOR ;"]
command: ["sh", "-c", "mkdir /s3; cd /s3 ;aws s3 cp s3://herondockercal/TOPOLOGY_NAME/topology.tar.gz /s3 ;aws s3 cp s3://herondockercal/heron-core.tar.gz /s3 ;cd /s3; tar -zxvf topology.tar.gz; tar -zxvf heron-core.tar.gz; HERON_EXECUTOR ;"]
networks:
- heron
ports:FREEPORTS
volumes:
- "herondata:/root/.herondata"
networks:
heron:
driver: bridge
volumes:
herondata:
driver: local
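The template is not valid Compose on its own: `CONTAINER_NUMBER`, `TOPOLOGY_NAME`, `HERON_EXECUTOR` and `FREEPORTS` are placeholders filled in per container. The substitution code is not part of this diff, so the Python sketch below only illustrates the idea. The function name `render_compose`, the `container_<n>` service naming, the example topology and executor command, and rendering `FREEPORTS` as an inline list of `"port:port"` mappings are all assumptions.

```python
from __future__ import annotations

# Hypothetical renderer for ecs_compose_template.yaml; the PR's own substitution
# logic lives in the scheduler code and may differ.


def render_compose(template: str, container: int, topology: str,
                   executor_cmd: str, ports: list[int]) -> str:
    # Assumption: each free port is published on the same host and container port.
    port_mappings = ", ".join(f'"{p}:{p}"' for p in ports)
    replacements = {
        "CONTAINER_NUMBER": f"container_{container}",  # assumed service naming
        "TOPOLOGY_NAME": topology,
        "HERON_EXECUTOR": executor_cmd,
        # "ports:FREEPORTS" becomes 'ports: ["6001:6001", ...]'
        "FREEPORTS": f" [{port_mappings}]",
    }
    rendered = template
    for placeholder, value in replacements.items():
        rendered = rendered.replace(placeholder, value)
    return rendered


# Trimmed copy of the template above, enough to show every placeholder.
TEMPLATE = """version: '2'
services:
  CONTAINER_NUMBER:
    image: ananthgs/onlyheronandubuntu
    command: ["sh", "-c", "aws s3 cp s3://herondockercal/TOPOLOGY_NAME/topology.tar.gz /s3; HERON_EXECUTOR ;"]
    ports:FREEPORTS
"""

print(render_compose(TEMPLATE, 0, "wordcount", "./heron-executor", [6001, 6002]))
```

Plain string replacement keeps the sketch short; a real implementation should quote `executor_cmd` carefully, because it is spliced into an `sh -c` string.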
272 changes: 272 additions & 0 deletions heron/config/src/yaml/conf/ecs/heron_internals.yaml
@@ -0,0 +1,272 @@
################################################################################
# Default values for various configs used inside Heron.
################################################################################
# All the config associated with time is in the unit of milliseconds,
# unless otherwise specified.
################################################################################
# All the config associated with data size is in the unit of bytes, unless
# otherwise specified.
################################################################################

################################################################################
# System level configs
################################################################################

### heron.* configs are general configurations across all components

# The relative path to the logging directory
heron.logging.directory: "log-files"

# The maximum log file size in MB
heron.logging.maximum.size.mb: 100

# The maximum number of log files
heron.logging.maximum.files: 5

# The interval in seconds after which to check if the tmaster location has been fetched or not
heron.check.tmaster.location.interval.sec: 120

# The interval in seconds to prune logging files in C++
heron.logging.prune.interval.sec: 300

# The interval in seconds to flush log files in C++
heron.logging.flush.interval.sec: 10

# The threshold level to log error
heron.logging.err.threshold: 3

# The interval in seconds for different components to export metrics to metrics manager
heron.metrics.export.interval.sec: 60

# The maximum count of exceptions in one MetricPublisherPublishMessage protobuf
heron.metrics.max.exceptions.per.message.count: 1024

################################################################################
# Configs related to Stream Manager, starts with heron.streammgr.*
################################################################################

# Maximum size in bytes of a packet to be sent out from the stream manager
heron.streammgr.packet.maximum.size.bytes: 102400

# The tuple cache (used for batching) can be drained in two ways:
# (a) Time based
# (b) size based

# The frequency in ms to drain the tuple cache in stream manager
heron.streammgr.cache.drain.frequency.ms: 10

# The size-based threshold in MB for draining the tuple cache
heron.streammgr.cache.drain.size.mb: 100
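The two drain triggers can be modeled as follows: the cache is flushed once `heron.streammgr.cache.drain.frequency.ms` has elapsed since the last drain, or as soon as the buffered data reaches `heron.streammgr.cache.drain.size.mb`. This is a minimal Python sketch, not the stream manager's C++ code; the class name `TupleCache`, the explicit `now_ms` clock argument, and treating 1 MB as 1024 × 1024 bytes are assumptions.

```python
MB = 1024 * 1024  # assumption: "MB" in the config means mebibytes


class TupleCache:
    """Hypothetical model of a tuple cache with time- and size-based drain triggers."""

    def __init__(self, drain_frequency_ms: int = 10, drain_size_mb: int = 100):
        self.drain_frequency_ms = drain_frequency_ms
        self.drain_size_bytes = drain_size_mb * MB
        self.buffer: list = []
        self.buffered_bytes = 0
        self.last_drain_ms = 0

    def add(self, data: bytes, now_ms: int) -> list:
        """Buffer one tuple; drain immediately if the size threshold is reached."""
        self.buffer.append(data)
        self.buffered_bytes += len(data)
        if self.buffered_bytes >= self.drain_size_bytes:
            return self._drain(now_ms)
        return []

    def tick(self, now_ms: int) -> list:
        """Called periodically; drain if the drain frequency has elapsed."""
        if now_ms - self.last_drain_ms >= self.drain_frequency_ms:
            return self._drain(now_ms)
        return []

    def _drain(self, now_ms: int) -> list:
        drained, self.buffer = self.buffer, []
        self.buffered_bytes = 0
        self.last_drain_ms = now_ms
        return drained
```

Whichever trigger fires first wins, so a small drain frequency bounds latency while the size threshold bounds memory.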

# For efficient acknowledgements
heron.streammgr.xormgr.rotatingmap.nbuckets: 3
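`heron.streammgr.xormgr.rotatingmap.nbuckets` refers to XOR-based acknowledgement tracking held in a rotating map. The Python sketch below shows the general technique, modeled on the Storm-style XOR acker: each root key starts with an initial value, every ack XORs an ID into it, and the tuple tree is complete when the value returns to zero; entries still present when their bucket rotates out are treated as timed out. `RotatingXorMap` and its methods are hypothetical names, and Heron's own XOR manager may differ in detail.

```python
from collections import deque


class RotatingXorMap:
    """Hypothetical rotating map of XOR checksums, one dict per bucket."""

    def __init__(self, nbuckets: int = 3):
        # Newest bucket on the left, oldest on the right.
        self.buckets = deque({} for _ in range(nbuckets))

    def create(self, root_key, initial_xor: int) -> None:
        self.buckets[0][root_key] = initial_xor

    def ack(self, root_key, xor_value: int) -> bool:
        """XOR a value into root_key's entry; return True once the tree is fully acked."""
        for bucket in self.buckets:
            if root_key in bucket:
                bucket[root_key] ^= xor_value
                if bucket[root_key] == 0:
                    del bucket[root_key]
                    return True
                return False
        return False  # unknown or already expired

    def rotate(self) -> list:
        """Drop the oldest bucket (its entries timed out) and start a fresh one."""
        expired = self.buckets.pop()
        self.buckets.appendleft({})
        return list(expired)
```

With `nbuckets = 3`, an entry that is never fully acked is expired on the third rotation after it was created, so its lifetime is roughly `nbuckets` rotation intervals.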

# The interval in seconds for the stream manager client to reconnect to other stream managers
heron.streammgr.client.reconnect.interval.sec: 1

# The interval in seconds for the stream manager client to reconnect to the tmaster
heron.streammgr.client.reconnect.tmaster.interval.sec: 10

# The maximum packet size in MB of stream manager's network options
heron.streammgr.network.options.maximum.packet.mb: 100

# The interval in seconds at which the stream manager sends heartbeats to the tmaster
heron.streammgr.tmaster.heartbeat.interval.sec: 10

# Maximum batch size in MB for the stream manager to read from a socket
heron.streammgr.connection.read.batch.size.mb: 1

# Maximum batch size in MB for the stream manager to write to a socket
heron.streammgr.connection.write.batch.size.mb: 1

# Number of times we should wait to see a buffer full while enqueueing data
# before declaring start of back pressure
heron.streammgr.network.backpressure.threshold: 3

# High-water mark, in MB, on the data that can be left outstanding on a connection
heron.streammgr.network.backpressure.highwatermark.mb: 100

# Low-water mark, in MB, on the data that can be left outstanding on a connection
heron.streammgr.network.backpressure.lowwatermark.mb: 50
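One way to read the three back-pressure settings together is as a hysteresis loop: back pressure starts only after the buffer has been seen full `heron.streammgr.network.backpressure.threshold` times in a row, and once started it stays on until the outstanding data drops to the low-water mark. The Python sketch below encodes that reading; `BackpressureTracker`, the "consecutive observations" interpretation of the threshold, and using the high-water mark as the definition of "full" are assumptions.

```python
MB = 1024 * 1024  # assumption: "MB" means mebibytes


class BackpressureTracker:
    """Hypothetical hysteresis model of the three back-pressure settings."""

    def __init__(self, threshold: int = 3, high_mb: int = 100, low_mb: int = 50):
        self.threshold = threshold
        self.high_bytes = high_mb * MB
        self.low_bytes = low_mb * MB
        self.full_observations = 0
        self.active = False

    def on_enqueue(self, outstanding_bytes: int) -> bool:
        """Record outstanding bytes after an enqueue; return whether back pressure is on."""
        if not self.active:
            if outstanding_bytes >= self.high_bytes:
                # Buffer seen "full"; only start back pressure after `threshold` in a row.
                self.full_observations += 1
                if self.full_observations >= self.threshold:
                    self.active = True
            else:
                self.full_observations = 0
        elif outstanding_bytes <= self.low_bytes:
            # Drained to the low-water mark: release back pressure.
            self.active = False
            self.full_observations = 0
        return self.active
```

The gap between the high (100 MB) and low (50 MB) marks keeps a connection from flapping in and out of back pressure when its backlog hovers near a single cutoff.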

################################################################################
# Configs related to Topology Master, starts with heron.tmaster.*
################################################################################

# The maximum interval in minutes of metrics to be kept in tmaster
heron.tmaster.metrics.collector.maximum.interval.min: 180

# The maximum number of retries when establishing the tmaster
heron.tmaster.establish.retry.times: 30

# The interval in seconds between retries when establishing the tmaster
heron.tmaster.establish.retry.interval.sec: 1

# Maximum packet size in MB of tmaster's network options to connect to stream managers
heron.tmaster.network.master.options.maximum.packet.mb: 16

# Maximum packet size in MB of tmaster's network options to connect to scheduler
heron.tmaster.network.controller.options.maximum.packet.mb: 1

# Maximum packet size in MB of tmaster's network options for stat queries
heron.tmaster.network.stats.options.maximum.packet.mb: 1

# The interval in seconds for the tmaster metrics collector to purge metrics
heron.tmaster.metrics.collector.purge.interval.sec: 60

# The maximum # of exceptions to be stored in tmetrics collector, to prevent potential OOM
heron.tmaster.metrics.collector.maximum.exception: 256

# Whether the metrics reporter should bind to all interfaces
heron.tmaster.metrics.network.bindallinterfaces: False

# The timeout in seconds for a stream manager, compared against (current time - last heartbeat time)
heron.tmaster.stmgr.state.timeout.sec: 60
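The comparison in `heron.tmaster.stmgr.state.timeout.sec` is simple arithmetic. With the 10-second stream manager heartbeat interval configured above (`heron.streammgr.tmaster.heartbeat.interval.sec`), a 60-second timeout spans six heartbeat intervals. A minimal Python sketch follows; the function name is hypothetical, and whether the real check uses `>` or `>=` is an assumption.

```python
def stmgr_timed_out(now_sec: float, last_heartbeat_sec: float,
                    timeout_sec: float = 60) -> bool:
    """Hypothetical check mirroring the comment: (now - last heartbeat) vs timeout."""
    return now_sec - last_heartbeat_sec > timeout_sec


HEARTBEAT_INTERVAL_SEC = 10  # heron.streammgr.tmaster.heartbeat.interval.sec
TIMEOUT_SEC = 60             # heron.tmaster.stmgr.state.timeout.sec
print(TIMEOUT_SEC // HEARTBEAT_INTERVAL_SEC)  # heartbeat intervals in one timeout: 6
print(stmgr_timed_out(now_sec=1070, last_heartbeat_sec=1000))  # True
```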

################################################################################
# Configs related to Metrics Manager, starts with heron.metricsmgr.*
################################################################################

# The size of packets to read from socket will be determined by the minimum of:
# (a) time based
# (b) size based

# Time based, the maximum batch time in ms for metricsmgr to read from socket
heron.metricsmgr.network.read.batch.time.ms: 16

# Size based, the maximum batch size in bytes to read from socket
heron.metricsmgr.network.read.batch.size.bytes: 32768

# The size of packets to write to socket will be determined by the minimum of
# (a) time based
# (b) size based

# Time based, the maximum batch time in ms for metricsmgr to write to socket
heron.metricsmgr.network.write.batch.time.ms: 16

# Size based, the maximum batch size in bytes to write to socket
heron.metricsmgr.network.write.batch.size.bytes: 32768
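The read and write rules above (stop at whichever of the time-based or size-based limit is reached first) describe a simple loop. The Python sketch below shows it for a read; `read_batch`, the `next_packet` callback (returning `None` when nothing is ready) and the injectable `clock_ms` are hypothetical. The size check runs before each read, so a batch can overshoot the byte budget by at most one packet; whether Heron behaves the same way is an assumption.

```python
import time
from typing import Callable, List, Optional


def read_batch(next_packet: Callable[[], Optional[bytes]],
               clock_ms: Callable[[], float] = lambda: time.monotonic() * 1000,
               batch_time_ms: float = 16,
               batch_size_bytes: int = 32768) -> List[bytes]:
    """Read packets until the time budget or the size budget is exhausted."""
    start = clock_ms()
    packets: List[bytes] = []
    total = 0
    while clock_ms() - start < batch_time_ms and total < batch_size_bytes:
        packet = next_packet()
        if packet is None:  # nothing more to read right now
            break
        packets.append(packet)
        total += len(packet)
    return packets
```

The same shape applies to the write side and to the instance's `heron.instance.network.*.batch.*` settings further down.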

# The maximum socket send buffer size in bytes
heron.metricsmgr.network.options.socket.send.buffer.size.bytes: 6553600

# The maximum socket receive buffer size in bytes for metricsmgr's network options
heron.metricsmgr.network.options.socket.received.buffer.size.bytes: 8738000

################################################################################
# Configs related to Heron Instance, starts with heron.instance.*
################################################################################

# The queue capacity (num of items) in bolt for buffering packets read from stream manager
heron.instance.internal.bolt.read.queue.capacity: 128

# The queue capacity (num of items) in bolt for buffering packets to write to stream manager
heron.instance.internal.bolt.write.queue.capacity: 128

# The queue capacity (num of items) in spout for buffering packets read from stream manager
heron.instance.internal.spout.read.queue.capacity: 1024

# The queue capacity (num of items) in spout for buffering packets to write to stream manager
heron.instance.internal.spout.write.queue.capacity: 128

# The queue capacity (num of items) for metrics packets to write to metrics manager
heron.instance.internal.metrics.write.queue.capacity: 128

# The size of packets read from stream manager will be determined by the minimum of
# (a) time based
# (b) size based

# Time based, the maximum batch time in ms for instance to read from stream manager per attempt
heron.instance.network.read.batch.time.ms: 16

# Size based, the maximum batch size in bytes to read from stream manager
heron.instance.network.read.batch.size.bytes: 32768

# The size of packets written to stream manager will be determined by the minimum of
# (a) time based
# (b) size based

# Time based, the maximum batch time in ms for instance to write to stream manager per attempt
heron.instance.network.write.batch.time.ms: 16

# Size based, the maximum batch size in bytes to write to stream manager
heron.instance.network.write.batch.size.bytes: 32768

# The maximum socket send buffer size in bytes
heron.instance.network.options.socket.send.buffer.size.bytes: 6553600

# The maximum socket receive buffer size in bytes for the instance's network options
heron.instance.network.options.socket.received.buffer.size.bytes: 8738000

# The maximum # of data tuples to batch in a HeronDataTupleSet protobuf
heron.instance.set.data.tuple.capacity: 1024

# The maximum size in bytes of data tuples to batch in a HeronDataTupleSet protobuf
heron.instance.set.data.tuple.size.bytes: 8388608

# The maximum # of control tuples to batch in a HeronControlTupleSet protobuf
heron.instance.set.control.tuple.capacity: 1024

# The maximum time in ms for a spout to process acknowledgements per attempt; the ack batch
# may also end early if there are no more ack tuples to process
heron.instance.ack.batch.time.ms: 128

# The maximum time in ms for a spout instance to emit tuples per attempt
heron.instance.emit.batch.time.ms: 16

# The maximum batch size in bytes for a spout to emit tuples per attempt
heron.instance.emit.batch.size.bytes: 32768

# The maximum time in ms for a bolt instance to execute tuples per attempt
heron.instance.execute.batch.time.ms: 16

# The maximum batch size in bytes for a bolt instance to execute tuples per attempt
heron.instance.execute.batch.size.bytes: 32768

# The time interval for an instance to check the state change,
# for example, the interval a spout uses to check whether activate/deactivate is invoked
heron.instance.state.check.interval.sec: 5

# The time to wait before the instance exits forcibly when an uncaught exception occurs
heron.instance.force.exit.timeout.ms: 2000

# Interval in seconds to reconnect to the stream manager, including the request timeout in connecting
heron.instance.reconnect.streammgr.interval.sec: 5
heron.instance.reconnect.streammgr.times: 60

# Interval in seconds to reconnect to the metrics manager, including the request timeout in connecting
heron.instance.reconnect.metricsmgr.interval.sec: 5
heron.instance.reconnect.metricsmgr.times: 60
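Each paired `interval.sec` and `times` setting above describes a bounded retry loop. A minimal Python sketch follows, with a hypothetical `connect` callback and an injectable `sleep` so it can be tested. Unlike the real setting, it does not model the connection request timeout that the comments say is included in the interval.

```python
import time
from typing import Callable


def connect_with_retries(connect: Callable[[], bool],
                         interval_sec: float = 5,
                         times: int = 60,
                         sleep: Callable[[float], None] = time.sleep) -> int:
    """Try to connect up to `times` times, waiting `interval_sec` between attempts.

    Returns the number of the successful attempt; raises if every attempt fails.
    """
    for attempt in range(1, times + 1):
        if connect():
            return attempt
        if attempt < times:  # no point sleeping after the last failure
            sleep(interval_sec)
    raise ConnectionError(f"could not connect after {times} attempts")
```

With the defaults above (5 seconds, 60 attempts), an instance keeps trying for roughly five minutes (59 waits of 5 seconds) before giving up.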

# The interval in seconds for an instance to sample its system metrics, for example, CPU load.
heron.instance.metrics.system.sample.interval.sec: 10

heron.instance.slave.fetch.pplan.interval.sec: 1

# For efficient acknowledgement
heron.instance.acknowledgement.nbuckets: 10

################################################################################
# For dynamically tuning the available sizes in the internal read & write queues
# to provide high performance while avoiding GC issues
################################################################################

# The expected size on read queue in bolt
heron.instance.tuning.expected.bolt.read.queue.size: 8

# The expected size on write queue in bolt
heron.instance.tuning.expected.bolt.write.queue.size: 8

# The expected size on read queue in spout
heron.instance.tuning.expected.spout.read.queue.size: 512

# The expected size on write queue in spout
heron.instance.tuning.expected.spout.write.queue.size: 8

# The expected size on metrics write queue
heron.instance.tuning.expected.metrics.write.queue.size: 8

heron.instance.tuning.current.sample.weight: 0.8

# Interval in ms to tune the size of in & out data queue in instance
heron.instance.tuning.interval.ms: 100
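The tuning block does not spell out how `heron.instance.tuning.current.sample.weight` is applied. A common reading of a "current sample weight" is an exponentially weighted moving average, in which each new queue-size sample counts for 0.8 and the running value for 0.2, with the smoothed size then compared against the expected sizes above. The Python sketch below shows only that assumed averaging step; `smoothed_queue_sizes` is a hypothetical name, and the comparison against `heron.instance.tuning.expected.bolt.read.queue.size` is illustrative.

```python
from typing import Iterable, List


def smoothed_queue_sizes(samples: Iterable[float], weight: float = 0.8,
                         initial: float = 0.0) -> List[float]:
    """Exponentially weighted average of observed queue sizes (assumed formula)."""
    value = initial
    history: List[float] = []
    for sample in samples:
        # A high weight (0.8) lets the average react quickly to the latest sample.
        value = weight * sample + (1.0 - weight) * value
        history.append(value)
    return history


EXPECTED_BOLT_READ_QUEUE_SIZE = 8  # heron.instance.tuning.expected.bolt.read.queue.size
for size in smoothed_queue_sizes([2, 12, 12]):
    status = "above expected" if size > EXPECTED_BOLT_READ_QUEUE_SIZE else "at or below expected"
    print(f"{size:.2f} {status}")
```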