jorrit.schaap Posted October 8, 2014

Hi,

I'm investigating how to make physics deterministic. I started by setting physics.setStable(1) and using a fixed timestep (we modified the clock so that we control each physics time step). I also set the budget to 1e6 so that a step is never cut short by running out of budget. Also, this reported bug has to be fixed before any deterministic results can be achieved.

Starting two instances of the same sample (for example shapes/sphere_01) yields different outcomes, which is easy to check by visually comparing the end positions of the individual spheres in the two application instances.

I dug into the code and started logging the position of each sphere at each physics step to a file. Comparing the two files (one per application instance) shows that the initial state is equal, but that they diverge at a random moment during execution.

Delving deeper, I also logged which thread processed each island, and when. I noticed that just before the sphere positions in the two instances diverge, the instances execute the islands in a different order. So it seems to be a multithreading issue. I ran the samples with single-threaded physics many times, and then the results are always equal, so it is definitely a multithreading issue.

Going through the code, I did not find any race condition, or any interaction between the islands that might cause threading-related read/write issues.

My last suspicion is the CPU context switch. Maybe the FPU/SSE registers are not saved on a context switch, causing the calculations to diverge? What is your opinion on this hypothesis?

Best regards,
Jorrit
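The log-and-compare approach described in the post above could look roughly like the following. This is only a sketch: the function names and the log-line layout (step index, body index, position) are made up for illustration and are not taken from the engine or from the author's code. Positions are written as hexadecimal floats (`%a`) so that even a one-bit difference shows up in the text log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical log-line format: step index, body index, then the position
// as hexadecimal floats (%a), which print every bit of the value.
std::string format_sample(int step, int body, float x, float y, float z) {
    char buf[160];
    const int n = std::snprintf(buf, sizeof buf, "%d %d %a %a %a",
                                step, body, x, y, z);
    assert(n > 0 && n < static_cast<int>(sizeof buf));
    (void)n;  // silence the unused-variable warning when NDEBUG is defined
    return std::string(buf);
}

// Returns the 1-based number of the first line at which the two logs differ,
// or 0 if they are identical. One log ending before the other counts as a
// difference at the first missing line.
std::size_t first_divergence(std::istream &a, std::istream &b) {
    std::string line_a, line_b;
    std::size_t line_no = 0;
    for (;;) {
        const bool has_a = static_cast<bool>(std::getline(a, line_a));
        const bool has_b = static_cast<bool>(std::getline(b, line_b));
        if (!has_a && !has_b) return 0;  // both logs ended without a mismatch
        ++line_no;
        if (has_a != has_b || line_a != line_b) return line_no;
    }
}

// Convenience overload for logs that are already held in memory.
std::size_t first_divergence(const std::string &log_a, const std::string &log_b) {
    std::istringstream a(log_a), b(log_b);
    return first_divergence(a, b);
}
```

To use it, each instance would write one `format_sample(...)` line per sphere per physics step to its own file, and the two files would then be opened as `std::ifstream` and passed to `first_divergence`. The returned line number points at the first step and body where the runs disagree.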
ulf.schroeter Posted October 8, 2014

https://developer.unigine.com/forum/topic/1143-physics-fully-deterministic/?hl=+deterministic
jorrit.schaap Author Posted October 9, 2014

Dear Ulf,

thanks for the link. I had already read that forum discussion, and all statements in it agree with my findings above. Let me summarize the discussion and my findings, and then explain why I think single-threaded Unigine physics can be deterministic and multi-threaded Unigine physics can't (in the current implementation).

For deterministic physics you need a time-stepping algorithm with a fixed delta_time, computing v = v0 + a*dt. This yields the same results on the same machine every time you run it! Unigine ensures collisions and forces are solved iteratively with stable sorting, so this also gives the same results on the same machine every time you run it.

Using single-threaded physics and the above settings (fixed timestep, stable sort, unlimited budget), I can confirm that on the same machine Unigine physics IS deterministic. However, using multi-threaded physics, the results are NOT deterministic.

Digging into the code, I could not find any flaw in simultaneous read/write access between threads and body states. The code is nicely split into separate islands, which are solved in parallel. The only difference I found is that the ORDER in which the islands are executed can differ, which is logical because there is no definite order in which parallel threads run. That's fine; that's just how threads work, and should work.

Comparing two runs of the program, the calculations of the new velocities diverge at the moment the order of island solving (thread execution) differs. That is my new finding in the whole discussion.

The IEEE 754 standard ensures that floating-point calculations are in themselves deterministic. However, context switching does NOT save/restore FPU/SSE registers by default! This leads me to conclude that it is the context switch which "pollutes" the FPU/SSE registers, so that this difference accumulates between two runs.
There is a remedy: tell the CPU to save/restore the FPU/SSE registers on a context switch. This comes at a cost, though: extra bytes are needed to store the registers, and copying them takes time, so performance will degrade. I'll try to see if I can make this into a setting, so that people who want fast calculations can switch it off, and people who want deterministic multi-threaded physics can switch it on (at a cost).

Best regards,
Jorrit
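The fixed-timestep point in the post above (v = v0 + a*dt with a constant dt) can be sketched as follows. This is a minimal, single-threaded illustration with a hypothetical one-dimensional `Body` and a semi-implicit Euler update chosen for the example; it is not the engine's integrator.

```cpp
#include <cassert>

// Hypothetical 1D body, used only for this illustration.
struct Body {
    float position;
    float velocity;
};

// One fixed step: v = v0 + a*dt, then x = x0 + v*dt (semi-implicit Euler).
void step(Body &b, float acceleration, float dt) {
    b.velocity += acceleration * dt;
    b.position += b.velocity * dt;
}

// Runs a fixed number of fixed-size steps. The caller supplies dt and the
// step count, so nothing depends on wall-clock time or frame rate.
Body simulate(Body b, float acceleration, float dt, int steps) {
    assert(dt > 0.0f && steps >= 0);
    for (int i = 0; i < steps; ++i) {
        step(b, acceleration, dt);
    }
    return b;
}
```

Calling `simulate` twice with the same arguments in the same program performs the same sequence of floating-point operations, so the two results compare equal. Covering the same total time with a different step size (for example half as many steps of twice the length) gives a different end position, which is why the timestep has to be fixed before runs can be compared at all.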
ulf.schroeter Posted October 9, 2014

Just guessing, but I would dig into the engine's C++ floating-point compile settings (e.g. /fp:strict instead of /fp:fast)

https://www.google.de/url?sa=t&source=web&rct=j&ei=q1k2VK3FI6n_ygPjq4C4Dw&url=https://software.intel.com/sites/default/files/article/164389/fp-consistency-102511.pdf&ved=0CC4QFjAG&usg=AFQjCNFH1Gnz4kyfrhkME2IGpjJFs44RJQ
jorrit.schaap Author Posted October 9, 2014

Thanks. Interesting article. I'll give it a go as well with these different settings.
ulf.schroeter Posted October 9, 2014

Just saw that the previous article was for the Intel compiler. But for Microsoft Visual C++ there are similar settings/aspects:

http://msdn.microsoft.com/en-us/library/aa289157%28v=vs.71%29.aspx
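As background to the /fp:strict versus /fp:fast suggestion in this thread: floating-point addition is not associative, so the same three values can sum to different results depending on the order of operations. A fast floating-point mode allows the compiler to reorder such expressions, while a strict mode does not. The sketch below only demonstrates the order dependence itself. The function names are made up, and it makes no claim about what the engine's code actually does.

```cpp
#include <cassert>

// (a + b) + c. The volatile temporary forces the intermediate sum to be
// rounded and stored, so the compiler cannot regroup the expression.
double sum_left(double a, double b, double c) {
    volatile double t = a + b;
    return t + c;
}

// a + (b + c), with the same precaution.
double sum_right(double a, double b, double c) {
    volatile double t = b + c;
    return a + t;
}

// Usage example: identical inputs, different grouping, different result.
void check_reassociation_example() {
    // 1e20 + -1e20 is exactly 0, and 0 + 1 is 1.
    assert(sum_left(1e20, -1e20, 1.0) == 1.0);
    // -1e20 + 1 rounds back to -1e20 (1 is far below the spacing of
    // doubles at that magnitude), so the final sum is 0.
    assert(sum_right(1e20, -1e20, 1.0) == 0.0);
}
```

Each individual operation here is fully deterministic. The difference comes only from the order in which the operations are carried out, which is the kind of freedom these compiler settings control.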