Optimisations: Rerouting and Threads
Special thanks to Stefan Neumeier (Technische Hochschule Ingolstadt).
By adding the line `<device.rerouting.threads value="X" />` to the configuration file and replacing X with the number of threads to use, sumo will compute rerouting in parallel using that many threads.
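As a rough illustration, the option can also be added to a configuration file programmatically. The sketch below assumes the entry may sit directly under the root `<configuration>` element (the text above does not say where it belongs), and the `net-file` entry with its file name is only a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

OPTION = "device.rerouting.threads"

# Hypothetical minimal configuration; the net-file value is a placeholder.
EXAMPLE_CFG = '<configuration><net-file value="example.net.xml"/></configuration>'


def set_rerouting_threads(cfg_xml: str, threads: int) -> str:
    """Return the configuration XML with the rerouting thread count set."""
    if threads < 0:
        raise ValueError("threads must be non-negative")
    root = ET.fromstring(cfg_xml)
    # Reuse an existing entry if there is one, otherwise append a new one
    # to the root element (assumed placement, see the note above).
    option = next(root.iter(OPTION), None)
    if option is None:
        option = ET.SubElement(root, OPTION)
    option.set("value", str(threads))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_rerouting_threads(EXAMPLE_CFG, 4))
```

Calling the function a second time with a different value updates the existing entry instead of adding a duplicate.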
To use this feature, sumo must be compiled with FOX.
If FOX is configured correctly, after running ./configure the config.log file should contain something like:
configure:16387: checking for fox-config
configure:16405: found /usr/bin/fox-config
configure:16417: result: /usr/bin/fox-config
configure:16448: checking fxver.h usability
configure:16448: g++ -c -O2 -DNDEBUG -I/usr/include/fox-1.6 conftest.cpp >&5
When building with make, the compiler output must contain "-I/usr/include/fox-1.6", which confirms that FOX is actually used.
If FOX is included in the build and the configuration file has been adapted, rerouting is computed with the number of threads given as the value. The conf_threads_LuSTScenario.sumo.cfg file contains all the configuration information and can be used with sumo -c conf_threads_LuSTScenario.sumo.cfg.
To verify that multiple threads are in use, run the command ps -o nlwp PID while sumo is executing.
The number shown is the number of lightweight processes (threads).
(For Mac OS X users, ps -M PID shows the threads corresponding to each task.)
The number specified in the configuration file is the number of additional threads spawned alongside the already running sumo thread. For example, setting the value to 4 makes sumo run with a total of 5 threads (4 for rerouting and 1 for sumo).
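The check described above can be scripted. This is a minimal sketch, assuming a Linux system where ps -o nlwp PID prints an NLWP header line followed by the count; the function names are made up for illustration.

```python
import subprocess


def parse_nlwp(ps_output: str) -> int:
    """Extract the thread count from the output of `ps -o nlwp PID`."""
    lines = [line.strip() for line in ps_output.splitlines() if line.strip()]
    # The first non-empty line is the NLWP header, the last one is the value.
    return int(lines[-1])


def expected_total_threads(rerouting_threads: int) -> int:
    """Rerouting threads plus the single main sumo thread."""
    return rerouting_threads + 1


def count_threads(pid: int) -> int:
    """Ask ps for the number of lightweight processes of a running PID."""
    result = subprocess.run(
        ["ps", "-o", "nlwp", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return parse_nlwp(result.stdout)
```

For a sumo process started with a value of 4, `count_threads(pid)` would be expected to match `expected_total_threads(4)`.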
The performance gains are shown below. The runs were executed on a machine with 32 cores and 60 GB of RAM. The baseline (100 percent of runtime) is the normal configuration without the threads parameter, which means no threads are spawned. "0 Threads" means that the threads parameter was set in the configuration, but with a value of 0. A configuration with 31 threads was also tested in order to fully utilise the 32 cores of the server: 31 threads for rerouting plus 1 thread for sumo, as explained above.
Number of threads | Time in seconds | Percent of baseline runtime |
---|---|---|
No config | 3,580 | 100 |
0 Threads | 3,575 | 99.86 |
1 Thread | 3,600 | 100.55 |
2 Threads | 2,301 | 64.27 |
4 Threads | 1,562 | 43.63 |
8 Threads | 1,152 | 32.17 |
16 Threads | 1,063 | 29.69 |
31 Threads | 946 | 26.42 |
32 Threads | 952 | 26.59 |
64 Threads | 941 | 26.28 |
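To make the table easier to interpret, the runtimes can be converted into speedup factors relative to the baseline. The sketch below uses only the measured values from the table; the helper names are chosen here for illustration.

```python
BASELINE_SECONDS = 3580  # "No config" row

# Thread count from the configuration -> measured runtime in seconds.
RUNTIMES = {0: 3575, 1: 3600, 2: 2301, 4: 1562, 8: 1152,
            16: 1063, 31: 946, 32: 952, 64: 941}


def percent_of_baseline(seconds: float, baseline: float = BASELINE_SECONDS) -> float:
    """Runtime as a percentage of the baseline runtime."""
    return 100.0 * seconds / baseline


def speedup(seconds: float, baseline: float = BASELINE_SECONDS) -> float:
    """How many times faster a run is compared to the baseline."""
    return baseline / seconds


if __name__ == "__main__":
    for threads, seconds in RUNTIMES.items():
        print(f"{threads:>2} threads: {percent_of_baseline(seconds):6.2f} % "
              f"of baseline, speedup {speedup(seconds):.2f}x")
```

In these measurements the speedup reaches roughly 3.8x at 64 threads, and most of the gain is already obtained with 8 threads.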
Note that using multiple threads leads to non-deterministic results, because it is not determined which thread computes the next step. The measurements showed the following:
Running the simulation multiple times on a server (4 cores) using 1 thread always led to the result:
Inserted: 138361 (Loaded: 138613)
Running: 104
Waiting: 252
Teleports: 46 (Collisions: 5, Jam: 10, Yield: 11, Wrong Lane: 20)
Emergency Stops: 8
The same result is obtained on other servers when the configuration file is left unmodified or when the value is set to 0 threads or 1 thread for rerouting. With more than 1 thread for rerouting, the results become non-deterministic.
Running the simulation on the same server (4 cores) using 4 threads led to the results:
Run No 1:
Inserted: 138361 (Loaded: 138613)
Running: 100
Waiting: 252
Teleports: 97 (Collisions: 3, Jam: 52, Yield: 22, Wrong Lane: 20)
Emergency Stops: 7
Run No 2:
Inserted: 138361 (Loaded: 138613)
Running: 101
Waiting: 252
Teleports: 41 (Collisions: 7, Jam: 9, Yield: 6, Wrong Lane: 19)
Emergency Stops: 7
Running the simulation on another server (32 cores) while using 32 threads led to the results:
Run No 1:
Inserted: 138361 (Loaded: 138613)
Running: 100
Waiting: 252
Teleports: 47 (Collisions: 2, Jam: 1, Yield: 14, Wrong Lane: 30)
Emergency Stops: 12
Run No 2:
Inserted: 138361 (Loaded: 138613)
Running: 96
Waiting: 252
Teleports: 35 (Collisions: 2, Jam: 1, Yield: 9, Wrong Lane: 23)
Emergency Stops: 9
As can be seen, the results differ from run to run when more than one thread is used. With no configuration (the config file does not contain the threads parameter, so no threads are spawned), 0 threads or 1 thread, the results are deterministic.
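Comparing runs by eye is error-prone, so a small script can list which summary fields differ. This is a sketch that assumes the summary has the textual form shown above; the two sample strings are the 4-thread runs from this page.

```python
import re

RUN_1 = """Inserted: 138361 (Loaded: 138613)
Running: 100
Waiting: 252
Teleports: 97 (Collisions: 3, Jam: 52, Yield: 22, Wrong Lane: 20)
Emergency Stops: 7"""

RUN_2 = """Inserted: 138361 (Loaded: 138613)
Running: 101
Waiting: 252
Teleports: 41 (Collisions: 7, Jam: 9, Yield: 6, Wrong Lane: 19)
Emergency Stops: 7"""


def parse_summary(text: str) -> dict:
    """Collect every `Name: integer` pair from a summary block."""
    pairs = re.findall(r"([A-Za-z ]+):\s*(\d+)", text)
    return {name.strip(): int(value) for name, value in pairs}


def differing_fields(run_a: str, run_b: str) -> list:
    """Return the sorted names of all fields whose values differ."""
    a, b = parse_summary(run_a), parse_summary(run_b)
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


if __name__ == "__main__":
    print(differing_fields(RUN_1, RUN_2))
```

An empty list means the two runs produced identical summaries, which is what the single-threaded runs would be expected to show.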
For normal simulation runs, this behaviour should be no problem.
The improvements shown here depend highly on the usage of rerouting.
If there is no rerouting, there will be no improvements.
If there is more rerouting, the improvements will be greater.