ExBemined Posted February 26, 2016

Our code was suffering from bad performance, and when profiling it (using a C++ profiler) the Interpreter::get function kept showing up high in the list (see attachment). It seems that some time ago this function was changed from a trivial getter returning a static variable into a function that takes a SpinLock and keeps the active interpreter per thread in a map, to allow multithreading. Since the function is called very often, this adds up to a couple of milliseconds of overhead with our code base.

From what I understand, the lock and map are only there to make sure each thread has its own context, which is exactly what thread local storage does. Most compilers have native support for thread local storage, which is a lot faster than a map plus a lock. I did a quick test and it shaved several milliseconds off our frame time, and Interpreter::get disappeared from the list of heaviest functions in the profiler.

Would it be possible to change this (at least for compilers that support thread local storage) without any side effects? Our software seems to run fine with this change, but I want to be sure I didn't overlook a side effect that might appear later.
silent Posted February 26, 2016

Hi Michiel,

Is it possible to get your test scene (UnigineScript) to check it on our test farm? Thanks!

How to submit a good bug report
---
FTP server for test scenes and user uploads: ftp://files.unigine.com (user: upload, password: 6xYkd6vLYWjpW6SN)
ExBemined Posted February 29, 2016

I can't post the exact code, since it was basically our entire project where I noticed this. But it seems to be related mostly to the usage of pointers in script (both user and extern classes), so the following simple script should make the profiler show similar bottlenecks:

```
class Foo {
    void doNothing() {}
};

class Bar {
    void doStuff(Foo foo) { foo.doNothing(); }
};

Foo foo;
Bar bar;

int init() {
    foo = new Foo();
    bar = new Bar();
    return 1;
}

int update() {
    forloop(int i = 0; 50000) {
        bar.doStuff(foo);
    }
    return 1;
}
```

If I run this script in an otherwise nearly empty world, I get 132 fps with our unmodified binaries, and 197 fps with the binaries where I altered Interpreter::get to use a thread local static rather than a spinlock+map. That is about 2.5 milliseconds difference per frame. And according to the profiler there is still a lot of overhead in the Variable::set function due to the reference counting (grabObject and releaseObject), which together take about 40% of all the CPU samples in this test case.

From what I see, most of the overhead happens under the hood: I'm not actually assigning any pointers in this script, but internally every function call that takes pointers as parameters (including the this pointer) pushes variables onto the stack, which requires the reference counters to be updated. That in turn involves getting the interpreter, looking up the class object, looking up the instance, and increasing the counter for it and for each of its base classes. In our actual code base most code is object oriented, so a lot of pointers are involved (at least the this pointer) for every function call; during a single frame in a heavy scene the grabObject/releaseObject functions were invoked around 200000 times, making them quite large bottlenecks.
silent Posted February 29, 2016

Hi Michael,

Thank you for the test scene. I've added this task to our internal bug tracker. We will keep you updated.