Using The Disassembler To Highlight Optimization Targets
May 23, 2016
The changes from this blog post can be seen in our GitHub Pull Request: github.com/EpicGames/UnrealEngine/pull/3722
It’s sometimes amazing how many string functions show up in profiling runs of UE4 – whether running final games, commandlets (such as content cooking) or the editor.
The FPaths::IsRelative() function showed up on a recent test – this can be found in Paths.cpp:-
bool FPaths::IsRelative(const FString& InPath)
{
    const bool IsRooted = InPath.StartsWith(TEXT("\\"), ESearchCase::CaseSensitive) ||
        InPath.StartsWith(TEXT("/"), ESearchCase::CaseSensitive) ||
        InPath.StartsWith(TEXT("root:/")) |
        (InPath.Len() >= 2 && FChar::IsAlpha(InPath[0]) && InPath[1] == TEXT(':'));
    return !IsRooted;
}
It looks innocent enough, right..? Just some harmless string tests to determine whether InPath is a relative path (eg. “../engine/myfile.uasset”) or absolute (eg. “c:\\myfile.uasset”).
To look into this one, I did what I often do… fired up the debugger, set a breakpoint in the function … and then looked at the disassembly. It was at this point that the true horror of the situation became immediately apparent. Here’s a small piece of the disassembly:-
00007FF6DFF97B4E  mov ecx,2
00007FF6DFF97B53  xor edi,edi
00007FF6DFF97B55  xor edx,edx
00007FF6DFF97B57  mov r8d,ecx
00007FF6DFF97B5A  mov qword ptr [rsp+70h],rbx
00007FF6DFF97B5F  mov dword ptr [rbp+28h],edi
00007FF6DFF97B62  mov qword ptr [rbp-10h],rdi
00007FF6DFF97B66  mov qword ptr [rbp-8],2
00007FF6DFF97B6E  call DefaultCalculateSlack (07FF6DFEC7DD0h)
00007FF6DFF97B73  movsxd rcx,eax
00007FF6DFF97B76  mov rax,qword ptr [rbp-10h]
00007FF6DFF97B7A  mov dword ptr [rbp-4],ecx
00007FF6DFF97B7D  test rax,rax
00007FF6DFF97B80  jne FPaths::IsRelative+46h (07FF6DFF97B86h)
00007FF6DFF97B82  test ecx,ecx
00007FF6DFF97B84  je FPaths::IsRelative+5Bh (07FF6DFF97B9Bh)
00007FF6DFF97B86  mov rdx,rcx
00007FF6DFF97B89  xor r8d,r8d
00007FF6DFF97B8C  mov rcx,rax
00007FF6DFF97B8F  add rdx,rdx
00007FF6DFF97B92  call FMemory::Realloc (07FF6DFF04CB0h)
00007FF6DFF97B97  mov qword ptr [rbp-10h],rax
00007FF6DFF97B9B  lea rdx,[ToUpperAdjustmentTable+2ABCh (07FF6E1EFFB9Ch)]
00007FF6DFF97BA2  mov r8d,4
00007FF6DFF97BA8  mov rcx,rax
00007FF6DFF97BAB  call FGenericPlatformString::Memcpy (07FF6DFED3CA0h)
00007FF6DFF97BB0  lea rdx,[rbp-10h]
00007FF6DFF97BB4  xor r8d,r8d
00007FF6DFF97BB7  mov rcx,rsi
00007FF6DFF97BBA  mov ebx,1
00007FF6DFF97BBF  call FString::StartsWith (07FF6DFEDD440h)
00007FF6DFF97BC4  test al,al
00007FF6DFF97BC6  jne FPaths::IsRelative+1BBh (07FF6DFF97CFBh)
The code above accounts for just a quarter of the whole function. All of this just to execute a single line of the C++… scary!
Not only is the code long, but note the call to FMemory::Realloc(): that’s just one of three that occur in the full disassembly. Later in the code, and not shown in the snippet above, are calls to FMemory::Free() (three of these, too). And finally, StartsWith() isn’t exactly a cheap function to be calling here either (note that StartsWith() only has an FString implementation, so each TEXT() literal is first copied into a temporary FString).
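To make the hidden cost concrete outside the engine, here is a minimal sketch. TinyString is a hypothetical stand-in written for this illustration, not the engine's FString; it simply counts heap allocations so the temporary created by passing a string literal becomes visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical heap-backed string used only for illustration; NOT FString.
class TinyString {
public:
    // Running count of heap allocations made by TinyString constructors.
    static int& Allocations() {
        static int Count = 0;
        return Count;
    }

    // Deliberately implicit, to mirror the silent literal-to-string conversion.
    TinyString(const char* Text)
        : Length(std::strlen(Text)), Data(new char[Length + 1]) {
        std::memcpy(Data, Text, Length + 1);
        ++Allocations();
    }
    TinyString(const TinyString&) = delete;
    TinyString& operator=(const TinyString&) = delete;
    ~TinyString() { delete[] Data; }

    // The only overload takes another TinyString, so a literal must be converted first.
    bool StartsWith(const TinyString& Prefix) const {
        return Prefix.Length <= Length &&
               std::memcmp(Data, Prefix.Data, Prefix.Length) == 0;
    }

    std::size_t Len() const { return Length; }

    char operator[](std::size_t Index) const {
        assert(Index < Length);  // callers must check Len() before indexing
        return Data[Index];
    }

private:
    std::size_t Length;  // declared before Data so it is initialised first
    char* Data;
};

// Builds one temporary TinyString per call: an allocation, a copy and a free.
inline bool StartsWithSlashSlow(const TinyString& Path) {
    return Path.StartsWith("/");
}

// Compares the first character directly; no temporary is created.
inline bool StartsWithSlashFast(const TinyString& Path) {
    return Path.Len() >= 1 && Path[0] == '/';
}
```

Calling StartsWithSlashSlow() bumps the allocation counter once per call, while StartsWithSlashFast() leaves it untouched. That is the same pattern the Realloc/Memcpy/StartsWith sequence in the disassembly suggests.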
So… here’s what I did:-
- reduced the number of calls to StartsWith() by replacing tests such as InPath.StartsWith(TEXT("\\"), ESearchCase::CaseSensitive) with ((InPath[0] == '\\') && (InPath[1] == '\\')) (nb. you also need to check the length of InPath to make sure that we’re not accessing invalid memory);
- removed the remaining runtime TEXT() conversion by constructing the string once, as a static member, outside the function;
- wrapped one of the tests with WITH_EDITOR (the root pathing is only appropriate there).
My final code looked like this:-
// Paths.cpp:-
#if WITH_EDITOR
FString FPaths::RootPrefix = TEXT("root:/");
#endif // WITH_EDITOR

bool FPaths::IsRelative(const FString& InPath)
{
    const uint32 PathLen = InPath.Len();
    const bool IsRooted = PathLen &&
        ((InPath[0] == '/') ||
        (PathLen >= 2 && (
            ((InPath[0] == '\\') && (InPath[1] == '\\'))
            || (InPath[1] == ':' && FChar::IsAlpha(InPath[0]))
#if WITH_EDITOR
            || (InPath.StartsWith(RootPrefix))
#endif // WITH_EDITOR
            ))
        );
    return !IsRooted;
}
// Paths.h:-
private:
#if WITH_EDITOR
    static FString RootPrefix;
#endif // WITH_EDITOR
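For anyone wanting to try the logic outside the engine, the following is a rough standalone approximation of the non-editor branch only. IsRelativeSketch is a hypothetical name, std::wstring stands in for FString, and std::iswalpha stands in for FChar::IsAlpha; the WITH_EDITOR "root:/" test is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cwctype>
#include <string>

// Standalone approximation of the rewritten test (non-editor branch only).
// Hypothetical helper for illustration; it is not part of the engine.
inline bool IsRelativeSketch(const std::wstring& Path) {
    const std::size_t PathLen = Path.size();
    const bool IsRooted = PathLen != 0 &&
        ((Path[0] == L'/') ||                              // leading forward slash
         (PathLen >= 2 &&                                  // now safe to read Path[1]
          (((Path[0] == L'\\') && (Path[1] == L'\\')) ||   // two leading backslashes
           ((Path[1] == L':') &&                           // drive letter followed by a colon
            std::iswalpha(static_cast<std::wint_t>(Path[0])) != 0))));
    return !IsRooted;
}

// A few sanity checks that can be called from any test harness.
inline void RunIsRelativeSketchChecks() {
    assert(IsRelativeSketch(L"../engine/myfile.uasset"));
    assert(!IsRelativeSketch(L"c:\\myfile.uasset"));
    assert(!IsRelativeSketch(L"/engine/myfile.uasset"));
    assert(IsRelativeSketch(L""));
}
```

Every comparison is guarded by a length check, so the empty string and one-character paths never read out of bounds. With the checks written this way, a path with only a single leading backslash is reported as relative.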
Here’s how this all looks when we disassemble the new code:-
00007FF7147B695A  mov edx,dword ptr [r8+8]
00007FF7147B695E  mov rsi,rcx
00007FF7147B6961  test edx,edx
00007FF7147B6963  je FPaths::ConvertRelativePathToFull+67h (07FF7147B6997h)
00007FF7147B6965  dec edx
00007FF7147B6967  je FPaths::ConvertRelativePathToFull+67h (07FF7147B6997h)
00007FF7147B6969  mov rax,qword ptr [r8]
00007FF7147B696C  movzx ecx,word ptr [rax]
00007FF7147B696F  cmp cx,2Fh
00007FF7147B6973  je FPaths::ConvertRelativePathToFull+0A9h (07FF7147B69D9h)
00007FF7147B6975  cmp edx,2
00007FF7147B6978  jb FPaths::ConvertRelativePathToFull+67h (07FF7147B6997h)
00007FF7147B697A  cmp cx,5Ch
00007FF7147B697E  jne FPaths::ConvertRelativePathToFull+56h (07FF7147B6986h)
00007FF7147B6980  cmp word ptr [rax+2],cx
00007FF7147B6984  je FPaths::ConvertRelativePathToFull+0A9h (07FF7147B69D9h)
00007FF7147B6986  cmp word ptr [rax+2],3Ah
00007FF7147B698B  jne FPaths::ConvertRelativePathToFull+67h (07FF7147B6997h)
00007FF7147B698D  call qword ptr [__imp_iswalpha (07FF716557B48h)]
00007FF7147B6993  test eax,eax
00007FF7147B6995  jne FPaths::ConvertRelativePathToFull+0A9h (07FF7147B69D9h)
This is the -entire- function in a non-editor build. Much, much better I’m sure you’ll agree.
A nice side effect of optimising this function is that the compiler is now happy to inline it (here, into FPaths::ConvertRelativePathToFull(), as the labels in the disassembly show) – without us even specifying FORCEINLINE or INLINE!
My final tests showed a more than 20-fold performance increase for IsRelative(), with 10% of the code footprint of the old version.
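The post doesn’t say how those timings were taken. As one possible approach, and purely as an assumption, a small timing helper like the one below could be used to compare two versions of a function. MeanNanosecondsPerCall is a hypothetical name, and the sketch uses only the standard library rather than any engine profiling facilities.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: calls Fn() Iterations times and returns the mean
// wall-clock cost per call in nanoseconds. Fn must return something
// convertible to bool.
template <typename Callable>
double MeanNanosecondsPerCall(Callable Fn, std::size_t Iterations) {
    assert(Iterations > 0);  // a mean over zero calls is undefined

    using Clock = std::chrono::steady_clock;
    volatile bool Sink = false;  // stops the optimiser discarding the results

    const Clock::time_point Start = Clock::now();
    for (std::size_t Index = 0; Index < Iterations; ++Index) {
        Sink = Fn();
    }
    const Clock::time_point End = Clock::now();
    (void)Sink;

    const std::chrono::duration<double, std::nano> Elapsed = End - Start;
    return Elapsed.count() / static_cast<double>(Iterations);
}
```

Running the helper once with the old test and once with the new one, over the same set of paths and a large iteration count, gives a rough ratio. A micro-benchmark like this is sensitive to inlining and caching, so treat the numbers as indicative rather than exact.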
Credit(s): Robert Troughton (Coconut Lizard)
Status: Currently unimplemented in 4.12