UnigineScript performance


Our code was suffering from poor performance, and when I profiled it (using a C++ profiler) the Interpreter::get function kept showing up high in the list (see attachment). It seems that some time ago this function was changed from a trivial getter returning a static variable to a function that takes a SpinLock and keeps the active interpreter per thread in a map, to allow multithreading. But since the function is called a lot, this adds up to a couple of milliseconds of overhead in our code base.

 

From what I understand, this lock and map are there only to make sure each thread has its own context, which is basically what thread local storage does. Most compilers seem to have native support for thread local storage, which is a lot faster than using a map and a lock. I did a quick test and it shaved several milliseconds off our frame time, and the Interpreter::get function disappeared from the list of heaviest functions in the profiler. Would it be possible to change this (at least for compilers that support thread local storage) without any side effects? Our software seems to run fine with this change, but I want to be sure I didn't overlook some side effect that might appear from it.
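
For reference, the change I tested is roughly the following. This is only a sketch of the idea, not the actual engine source: the real Interpreter class, its SpinLock and the map layout are different, and every name here apart from Interpreter::get is a placeholder.

#include <map>
#include <thread>

class Interpreter;

// Old approach (simplified): a shared map from thread id to the active
// interpreter, protected by a spin lock that is taken on every call.
//
//	static SpinLock lock;
//	static std::map<std::thread::id, Interpreter *> interpreters;
//
//	Interpreter *Interpreter::get() {
//		ScopedSpinLock guard(lock);
//		return interpreters[std::this_thread::get_id()];
//	}

// Thread local approach: each thread keeps its own pointer, so the getter
// becomes a plain read with no lock and no map lookup.
static thread_local Interpreter *active_interpreter = nullptr;

Interpreter *interpreter_get() {
	return active_interpreter;
}

void interpreter_set(Interpreter *interpreter) {
	active_interpreter = interpreter;
}

The assumption I'm making is that an interpreter is only ever accessed from the thread it was set on; with per-thread storage a pointer set on one thread is simply invisible to the others, which as far as I can tell is exactly what the map keyed by thread id was emulating.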

[Attachment: profiler screenshot showing Interpreter::get among the heaviest functions]

I can't post the exact code, since I noticed this on basically our entire project. But it seems to be related mostly to the use of pointers in script (both user and extern classes), so the following simple script should make the profiler show similar bottlenecks:

class Foo {
	void doNothing() {}
};
class Bar {
	void doStuff( Foo foo ) {
		foo.doNothing();
	}
};
Foo foo;
Bar bar;

int init() {
	foo = new Foo();
	bar = new Bar();
	return 1;
}
	
int update() {
	forloop( int i = 0; 50000 ) {
		bar.doStuff( foo );
	}
	return 1;
}

If I run this script in an otherwise nearly empty world I get 132 fps with our unmodified binaries, and 197 fps with the binaries where I altered Interpreter::get to use a thread local static rather than a spinlock + map. That is a difference of about 2.5 milliseconds per frame (1000/132 ≈ 7.6 ms against 1000/197 ≈ 5.1 ms). And according to the profiler there is still a lot of overhead in the Variable::set function due to the reference counting (grabObject and releaseObject), which together account for about 40% of all CPU samples in this test case.

 

From what I see, most of the overhead happens under the hood, as I'm not actually assigning any pointers in this script. Internally, though, for every function call that takes pointers as parameters (including the this pointer), variables are pushed onto the stack, which requires the reference counters to be updated: getting the interpreter, looking up the class object, looking up the instance, and increasing the counter for it and for each of its base classes. In our actual code base most code is object oriented, so a lot of pointers are involved (at least the this pointer) for every function call; during a single frame in a heavy scene the grabObject/releaseObject functions were invoked around 200,000 times, making them quite a large bottleneck.
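
To make the mechanism clearer, here is purely illustrative pseudocode of what each pointer push appears to do. Only grabObject, releaseObject and Interpreter::get are real symbols from the profiler output; every other name and data structure below is made up by me and is not the actual engine code.

#include <unordered_map>

struct Instance {
	int ref_count = 0;
};

struct UserClass {
	UserClass *base = nullptr;							// parent class, if any
	std::unordered_map<int, Instance> instances;		// instance id -> refcount
};

struct Interpreter {
	static Interpreter *get();							// the getter discussed above
	std::unordered_map<int, UserClass> classes;			// class id -> class object

	// Invoked for every pointer argument (including the this pointer) pushed
	// on the script stack: find the class, find the instance, bump its
	// counter, and repeat for every base class.
	void grabObject(int class_id, int instance_id) {
		for (UserClass *c = &classes[class_id]; c != nullptr; c = c->base)
			c->instances[instance_id].ref_count++;
	}

	// releaseObject() does the inverse when the value is popped again.
};

So a single bar.doStuff(foo) call in the script above goes through the interpreter lookup plus grab/release pairs for the this pointers and the foo argument, which seems to be why this simple loop is enough to reproduce the problem.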
