
How to use this tool with Spark Structured Streaming? #5

Open · yzhan-te opened this issue Aug 9, 2019 · 10 comments

@yzhan-te commented Aug 9, 2019

Hello,

Is there a way to set up this tool to collect metrics for Spark Structured Streaming? Currently we have a Spark job that polls Kafka continuously, so we have to shut it down manually every time. When I do the manual shutdown, the shell script complains that the return code is bad, and no metrics are collected.
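
For context, the job is essentially the following (a simplified Scala sketch, not our real code; the broker address, topic name, and console sink are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("KafkaJob").getOrCreate()

    // Consume a Kafka topic continuously; runs until the job is killed.
    val query = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "events")                    // placeholder
      .load()
      .writeStream
      .format("console")
      .start()

    query.awaitTermination()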

Thanks!

@spektom (Owner) commented Aug 9, 2019

Hi,

The script should collect metrics regardless of exit code. Please try the following:

  • Start your Spark streaming application with --conf spark.streaming.stopGracefullyOnShutdown=true
  • Let the application run, and collect some metrics
  • Kill the application process by sending a SIGTERM signal to the Spark java process (something like this should work: ps -ef | grep spark | grep java | grep -v grep | awk '{print $2}' | xargs kill -SIGTERM)

See if metrics are collected this way. If the above doesn't help, please attach the output from the script here.
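
For Structured Streaming specifically, the stop can also be made graceful from inside the application by stopping the query in a JVM shutdown hook, so a SIGTERM lets Spark (and the profiler agent) finish cleanly instead of dying mid-batch. A minimal Scala sketch; the rate source and console sink are only illustrative:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("GracefulStop").getOrCreate()

    // Any streaming query; the source and sink don't matter for the
    // shutdown logic below.
    val query = spark.readStream
      .format("rate")
      .load()
      .writeStream
      .format("console")
      .start()

    // On SIGTERM the JVM runs its shutdown hooks; stopping the query here
    // lets the remaining hooks (including the profiler's) run normally.
    sys.addShutdownHook {
      query.stop()
    }

    query.awaitTermination()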

Thanks!

@yzhan-te (Author) commented Aug 9, 2019

Hi,

Thanks for replying! This is the command I'm using:

/usr/local/bin/spark-submit-flamegraph --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.2,com.thesamet.scalapb:scalapb-runtime_2.11:0.9.0,redis.clients:jedis:3.1.0,com.amazonaws:aws-java-sdk:1.7.4,com.typesafe:config:1.3.4,com.etsy:statsd-jvm-profiler:2.0.0 --conf spark.streaming.stopGracefullyOnShutdown=true --master yarn --deploy-mode cluster  --class com.package.SparkApp spark-scala_2.11-1.0-SNAPSHOT.jar cluster

And I am getting the following when stopping the app:

19/08/09 20:20:06 INFO ShutdownHookManager: Shutdown hook called
19/08/09 20:20:06 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-dca35798-2712-4f86-b376-8463cb5d38fa
19/08/09 20:20:07 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-5e901536-9a80-4d27-98bf-6892d0ad6429
[2019-08-09T20:20:07,136864792+0000] Spark has exited with bad exit code (130)
[2019-08-09T20:20:07,154308000+0000] Collecting profiling metrics
[2019-08-09T20:20:07,524389370+0000] No profiling metrics were recorded!

Also if I kill it with this command:

ps -ef | grep spark | grep java | grep -v grep | grep -v zeppelin | awk '{print $2}' | sudo xargs kill -SIGTERM

Then the script doesn't print anything at all; the output is just:

19/08/09 20:20:06 INFO ShutdownHookManager: Shutdown hook called
19/08/09 20:20:06 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-dca35798-2712-4f86-b376-8463cb5d38fa
19/08/09 20:20:07 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-5e901536-9a80-4d27-98bf-6892d0ad6429

@spektom (Owner) commented Aug 11, 2019

Can you please try the latest version from master? (f66976e)

@spektom (Owner) commented Aug 12, 2019

This should be fixed in the latest master.

spektom closed this as completed Aug 12, 2019
@yzhan-te (Author) commented Aug 12, 2019
Hi there,

I tried the latest version and it's still not working. When I stop the job, it sits in the "Collecting profiling metrics" state for about 5 seconds and then ends with "No profiling metrics were recorded!". Are there any specific calls I need to add to my code for it to collect metrics? Also, I am running this on an EMR cluster in yarn-cluster mode. Do I need to make any changes to the worker machines?

Thanks!

spektom reopened this Aug 12, 2019
@spektom (Owner) commented Aug 12, 2019

Hi @yzhan-te,

No special configuration is needed; it's added automatically by the script.
Things to check:

  • Look at the Spark UI of the running application and inspect the Java properties of the driver and executor processes. They must contain something like -javaagent:statsd-jvm-profiler.jar=server=<server>,port=<port>,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=sparkapp,tagMapping=spark. Alternatively, you can verify this by looking at the Java command-line parameters: ps -ef | grep java.
  • Check that server= and port= in that configuration contain a hostname and port that are reachable from the cluster nodes. The IP address is detected by the spark-submit-flamegraph script, and the port number is chosen randomly. This host and port identify the machine where the script runs; all Spark components report their metrics to it (see the sketch below).
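
A quick TCP reachability check can be run from one of the worker nodes; a minimal Scala sketch, where the host and port are placeholders for the values taken from the -javaagent string:

    import java.net.{InetSocketAddress, Socket}

    // Placeholders: substitute the server= and port= values from the
    // -javaagent string on the driver/executor command line.
    val host = "10.0.0.1"
    val port = 48081

    val socket = new Socket()
    try {
      // Throws if the profiler endpoint is not reachable within 3 seconds.
      socket.connect(new InetSocketAddress(host, port), 3000)
      println(s"$host:$port is reachable")
    } finally {
      socket.close()
    }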

Please report back if something is not as described above.
Thanks!

@yzhan-te (Author) commented Aug 12, 2019
Yep the settings are there:

-arg --driver-java-options --arg -javaagent:/home/hadoop/.spark-flamegraph/statsd-jvm-profiler.jar=server=<my_ip>,port=48081,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=sparkapp,tagMapping=spark --properties-file 

The port is reachable too. Are there any other things you want me to look at specifically?

@spektom (Owner) commented Aug 12, 2019

Can you look at the Java process command line (ps -ef | grep java) and check whether these Java properties are present?

@yzhan-te (Author) commented
Looks like it's also there:

hadoop    7290  7160 51 21:12 pts/0    00:00:27 /etc/alternatives/jre/bin/java -cp /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf/:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/spark/conf/:/usr/lib/spark/jars/*:/etc/hadoop/conf/ org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode cluster --conf spark.memory.storageFraction=0.1 --conf spark.memory.fraction=0.9 --conf spark.executor.extraJavaOptions=-javaagent:statsd-jvm-profiler.jar=server=<myip>,port=48081,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=sparkapp,tagMapping=spark --conf spark.streaming.stopGracefullyOnShutdown=true --class com.package.SparkApp --jars /home/hadoop/.spark-flamegraph/statsd-jvm-profiler.jar --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.2,com.thesamet.scalapb:scalapb-runtime_2.11:0.9.0,redis.clients:jedis:3.1.0,com.typesafe:config:1.3.4 --num-executors 4 --executor-cores 4 --executor-memory 3GB spark-scala_2.11-1.0-SNAPSHOT.jar cluster --driver-java-options -javaagent:/home/hadoop/.spark-flamegraph/statsd-jvm-profiler.jar=server=<myip>,port=48081,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=sparkapp,tagMapping=spark

@spektom (Owner) commented Aug 14, 2019 via email
