
How to use this tool with Spark Structured Streaming? #5

Open · yzhan-te opened this issue Aug 9, 2019 · 10 comments

@yzhan-te commented Aug 9, 2019

Hello,

Is there a way to set up this tool to collect metrics for Spark Structured Streaming? Currently we have a Spark job that polls Kafka continuously, so we have to shut it down manually every time. When I do the manual shutdown, the shell script complains that the return code is bad, and no metrics are collected.
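
For context, the job is essentially the following (a simplified Scala sketch, not our real code; the broker address, topic name, and console sink are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("KafkaJob").getOrCreate()

    // Consume a Kafka topic continuously; runs until the job is killed.
    val query = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "events")                    // placeholder
      .load()
      .writeStream
      .format("console")
      .start()

    query.awaitTermination()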

Thanks!

@spektom (Owner) commented Aug 9, 2019

Hi,

The script should collect metrics regardless of exit code. Please try the following:

  • Start your Spark streaming application with --conf spark.streaming.stopGracefullyOnShutdown=true
  • Let the application run, and collect some metrics
  • Kill the application process by sending a SIGTERM signal to the Spark java process (something like this should work: ps -ef | grep spark | grep java | grep -v grep | awk '{print $2}' | xargs kill -SIGTERM)

See if metrics are collected this way. If the above doesn't help, please attach the output from the script here.
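
For Structured Streaming specifically, the stop can also be made graceful from inside the application by stopping the query in a JVM shutdown hook, so a SIGTERM lets Spark (and the profiler agent) finish cleanly instead of dying mid-batch. A minimal Scala sketch; the rate source and console sink are only illustrative:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("GracefulStop").getOrCreate()

    // Any streaming query; the source and sink don't matter for the
    // shutdown logic below.
    val query = spark.readStream
      .format("rate")
      .load()
      .writeStream
      .format("console")
      .start()

    // On SIGTERM the JVM runs its shutdown hooks; stopping the query here
    // lets the remaining hooks (including the profiler's) run normally.
    sys.addShutdownHook {
      query.stop()
    }

    query.awaitTermination()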

Thanks!

@yzhan-te (Author) commented Aug 9, 2019

Hi,

Thanks for replying! This is the command I'm using:

/usr/local/bin/spark-submit-flamegraph --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.2,com.thesamet.scalapb:scalapb-runtime_2.11:0.9.0,redis.clients:jedis:3.1.0,com.amazonaws:aws-java-sdk:1.7.4,com.typesafe:config:1.3.4,com.etsy:statsd-jvm-profiler:2.0.0 --conf spark.streaming.stopGracefullyOnShutdown=true --master yarn --deploy-mode cluster  --class com.package.SparkApp spark-scala_2.11-1.0-SNAPSHOT.jar cluster

And I am getting the following when stopping the app:

19/08/09 20:20:06 INFO ShutdownHookManager: Shutdown hook called
19/08/09 20:20:06 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-dca35798-2712-4f86-b376-8463cb5d38fa
19/08/09 20:20:07 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-5e901536-9a80-4d27-98bf-6892d0ad6429
[2019-08-09T20:20:07,136864792+0000] Spark has exited with bad exit code (130)
[2019-08-09T20:20:07,154308000+0000] Collecting profiling metrics
[2019-08-09T20:20:07,524389370+0000] No profiling metrics were recorded!

Also if I kill it with this command:

ps -ef | grep spark | grep java | grep -v grep | grep -v zeppelin | awk '{print $2}' | sudo xargs kill -SIGTERM

Then the script doesn't print anything at all; the output is just:

19/08/09 20:20:06 INFO ShutdownHookManager: Shutdown hook called
19/08/09 20:20:06 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-dca35798-2712-4f86-b376-8463cb5d38fa
19/08/09 20:20:07 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-5e901536-9a80-4d27-98bf-6892d0ad6429

@spektom (Owner) commented Aug 11, 2019

Can you please try the latest version from master? (f66976e)

@spektom (Owner) commented Aug 12, 2019

This should be fixed in the latest master.

spektom closed this as completed Aug 12, 2019
@yzhan-te (Author) commented Aug 12, 2019
Hi there,

I tried the latest version and it's still not working. When I stop the job, it sits in the "Collecting profiling metrics" state for about 5 seconds and then ends with "No profiling metrics were recorded!". Are there any specific calls I need to add to my code for it to collect metrics? Also, I am running this on an EMR cluster in yarn-cluster mode. Do I need to make any changes to the worker machines?

Thanks!

spektom reopened this Aug 12, 2019
@spektom (Owner) commented Aug 12, 2019

Hi @yzhan-te,

No special configuration is needed; it's added automatically by the script.
Things to check:

  • Look at the Spark UI of the running application and inspect the Java properties of the driver and executor processes. They must contain something like -javaagent:statsd-jvm-profiler.jar=server=<server>,port=<port>,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=sparkapp,tagMapping=spark. Alternatively, you can verify this by looking at the Java command-line parameters: ps -ef | grep java.
  • Check that server= and port= in that configuration contain a hostname and port that are reachable from the cluster nodes. The IP address is detected by the spark-submit-flamegraph script, and the port number is chosen randomly. This host and port identify the machine where the script runs; all Spark components report their metrics to it (see the sketch below).
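
A quick TCP reachability check can be run from one of the worker nodes; a minimal Scala sketch, where the host and port are placeholders for the values taken from the -javaagent string:

    import java.net.{InetSocketAddress, Socket}

    // Placeholders: substitute the server= and port= values from the
    // -javaagent string on the driver/executor command line.
    val host = "10.0.0.1"
    val port = 48081

    val socket = new Socket()
    try {
      // Throws if the profiler endpoint is not reachable within 3 seconds.
      socket.connect(new InetSocketAddress(host, port), 3000)
      println(s"$host:$port is reachable")
    } finally {
      socket.close()
    }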

Please report back if something is not as described above.
Thanks!

@yzhan-te (Author) commented Aug 12, 2019
Yep the settings are there:

-arg --driver-java-options --arg -javaagent:/home/hadoop/.spark-flamegraph/statsd-jvm-profiler.jar=server=<my_ip>,port=48081,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=sparkapp,tagMapping=spark --properties-file 

The port is reachable too. Are there any other things you want me to look at specifically?

@spektom (Owner) commented Aug 12, 2019

Can you look at the Java process command line (ps -ef | grep java) and check whether these Java properties are present?

@yzhan-te (Author) commented
Looks like it's also there:

hadoop    7290  7160 51 21:12 pts/0    00:00:27 /etc/alternatives/jre/bin/java -cp /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf/:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/spark/conf/:/usr/lib/spark/jars/*:/etc/hadoop/conf/ org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode cluster --conf spark.memory.storageFraction=0.1 --conf spark.memory.fraction=0.9 --conf spark.executor.extraJavaOptions=-javaagent:statsd-jvm-profiler.jar=server=<myip>,port=48081,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=sparkapp,tagMapping=spark --conf spark.streaming.stopGracefullyOnShutdown=true --class com.package.SparkApp --jars /home/hadoop/.spark-flamegraph/statsd-jvm-profiler.jar --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.2,com.thesamet.scalapb:scalapb-runtime_2.11:0.9.0,redis.clients:jedis:3.1.0,com.typesafe:config:1.3.4 --num-executors 4 --executor-cores 4 --executor-memory 3GB spark-scala_2.11-1.0-SNAPSHOT.jar cluster --driver-java-options -javaagent:/home/hadoop/.spark-flamegraph/statsd-jvm-profiler.jar=server=<myip>,port=48081,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=sparkapp,tagMapping=spark

@spektom (Owner) commented Aug 14, 2019 via email
