The Sadness of the Ignored Null Pointer
July 14, 2016
You can view and download the code for this post from our Github Pull Request: https://github.com/EpicGames/UnrealEngine/pull/3728
Take a look at the following function (as always, I looked at this one due to it showing up in performance tests on our project)… see if you can see anything suspicious?
FORCEINLINE bool IsPendingKill() const { return GUObjectArray.IndexToObject(InternalIndex)->IsPendingKill(); }
Now take a look at IndexToObject(), used in the above:-
FORCEINLINE FUObjectItem* IndexToObject(int32 Index) { check(Index >= 0); if (Index < ObjObjects.Num()) { return const_cast<FUObjectItem*>(&ObjObjects[Index]); } return nullptr; }
Seen anything yet..? There’s something here which, assuming the code above isn’t wasting precious CPU cycles, isn’t quite right.
Yep, if Index is too large, we’re going to be returning nullptr. And IsPendingKill() isn’t picking that up, it’s jumping straight into FUObjectItem::IsPendingKill()… which would give an access violation. Here’s the assembly just to show that:-
0x1413738e6 movsxd rax, dword ptr [rbx+0xc] 0x1413738ea cmp eax, dword ptr [rip+0x19c2334] 0x1413738f0 jnl 0x1413738ff 0x1413738f2 shl rax, 0x4 0x1413738f6 add rax, qword ptr [rip+0x19c231b] 0x1413738fd jmp 0x141373901 0x1413738ff xor eax, eax 0x141373901 mov eax, dword ptr [rax+0x8] 0x141373904 shr eax, 0x1d 0x141373907 test al, 0x1
A quick explanation of this:-
- Lines 1-3 are dealing with “if (Index < ObjObjects.Num())”, jumping through to line 7 if Index is too large (jnl is “jump if not less”)
- Lines 4-5 work out the FUobjectItem pointer
- Lines 6 jumps over line 7 to line 8
- Line 7 sets eax to zero (xor’ing a register with itself is a short/fast way of doing that)
- Lines 8-9 grab the IsPendingKill value
- Line 10 tests it
Considering all of this, one of these 2 things must be true:-
- IsPendingKill() has a bug .. it should be checking for nullptr … we’d need to investigate whether the function should then return true or false in that case…;
- Index is never too big – so we could optimize the code perhaps?
For now, since the code has been this way for 15 months, and as far as I know no problems have surfaced as a result of it potentially being wrong, I’m going to assume that case (2) is the correct one… at the very least, if we assume that, we’re no worse off than we were before – if we ignore Index being too large, we’d likely just have an access violation or just a read from somewhere beyond the ObjObjects array.
While looking more into this, I found the following function close to IndexToObject():-
FORCEINLINE FUObjectItem* IndexToObjectUnsafeForGC(int32 Index) { return const_cast<FUObjectItem*>(&ObjObjects[Index]); }
This does exactly what we want, removing the test on Index… so we can go ahead and replace IsPendingKill() with a call to this instead:-
FORCEINLINE bool IsPendingKill() const { return GUObjectArray.IndexToObjectUnsafeForGC(InternalIndex)->IsPendingKill(); }
And, while we’re at it, the same changes can be made in several other functions… these are all in Engine\Source\Runtime\CoreUObject\Public\UObject\UObjectBaseUtility.h:-
FORCEINLINE void MarkPendingKill() { check(!IsRooted()); GUObjectArray.IndexToObjectUnsafeForGC(InternalIndex)->SetPendingKill(); }
FORCEINLINE void ClearPendingKill() { GUObjectArray.IndexToObjectUnsafeForGC(InternalIndex)->ClearPendingKill(); }
FORCEINLINE void AddToRoot() { GUObjectArray.IndexToObjectUnsafeForGC(InternalIndex)->SetRootSet(); }
FORCEINLINE void RemoveFromRoot() { GUObjectArray.IndexToObjectUnsafeForGC(InternalIndex)->ClearRootSet(); }
FORCEINLINE bool IsRooted() { return GUObjectArray.IndexToObjectUnsafeForGC(InternalIndex)->IsRootSet(); }
FORCEINLINE bool ThisThreadAtomicallyClearedRFUnreachable() { return GUObjectArray.IndexToObjectUnsafeForGC(InternalIndex)->ThisThreadAtomicallyClearedRFUnreachable(); }
FORCEINLINE bool IsUnreachable() const { return GUObjectArray.IndexToObjectUnsafeForGC(InternalIndex)->IsUnreachable(); }
FORCEINLINE bool IsPendingKillOrUnreachable() const { return GUObjectArray.IndexToObjectUnsafeForGC(InternalIndex)->HasAnyFlags(EInternalObjectFlags::PendingKill | EInternalObjectFlags::Unreachable); }
FORCEINLINE bool IsNative() const { return GUObjectArray.IndexToObjectUnsafeForGC(InternalIndex)->HasAnyFlags(EInternalObjectFlags::Native); }
FORCEINLINE void SetInternalFlags(EInternalObjectFlags FlagsToSet) const { GUObjectArray.IndexToObjectUnsafeForGC(InternalIndex)->SetFlags(FlagsToSet); }
FORCEINLINE EInternalObjectFlags GetInternalFlags() const { return GUObjectArray.IndexToObjectUnsafeForGC(InternalIndex)->GetFlags(); }
FORCEINLINE bool HasAnyInternalFlags(EInternalObjectFlags FlagsToCheck) const { return GUObjectArray.IndexToObjectUnsafeForGC(InternalIndex)->HasAnyFlags(FlagsToCheck); }
FORCEINLINE void ClearInternalFlags(EInternalObjectFlags FlagsToClear) const { GUObjectArray.IndexToObjectUnsafeForGC(InternalIndex)->ClearFlags(FlagsToClear); }
FORCEINLINE bool AtomicallyClearInternalFlags(EInternalObjectFlags FlagsToClear) const { return GUObjectArray.IndexToObjectUnsafeForGC(InternalIndex)->ThisThreadAtomicallyClearedFlag(FlagsToClear); }
Ok, sorry, when I started copy/pasting those, I honestly didn’t realize there were so many of them! But there you go…
As well as those, there are 2 more in Engine\Source\Runtime\CoreUObject\Private\UObject\UObjectBase.cpp… the first at the end of UObjectBase::DeferredRegister():-
check(!GUObjectArray.IsDisregardForGC(this) || GUObjectArray.IndexToObjectUnsafeForGC(InternalIndex)->IsRootSet());
And the next close to the end of UObjectBase::AddObject():-
if (InternalFlagsToSet != EInternalObjectFlags::None) { GUObjectArray.IndexToObjectUnsafeForGC(InternalIndex)->SetFlags(InternalFlagsToSet); }
And that’s it..! Here’s how the new code looks when disassembled – feel free to compare this to the 10-line version above:-
00007FF6A79A28F6 movsxd rax,dword ptr [rbp+0Ch] 00007FF6A79A28FA mov rdx,qword ptr [GUObjectArray+10h (07FF6A9684288h)] 00007FF6A79A2901 add rax,rax 00007FF6A79A2904 mov ecx,dword ptr [rdx+rax*8+8] 00007FF6A79A2908 shr ecx,1Dh 00007FF6A79A290B test cl,1
We’re down to 6 simple lines of assembly – with no branching logic at all. That’s a definite win.
As I say, I’m not 100% sure whether this is really the “right” thing to do … it would be great to get some input from someone at Epic (perhaps in the comments below)… but, as it stands, if there’s a bug here then there was a bug in the previous code too … that seems unlikely, given the amount of time that’s now passed since the “bug” would’ve been introduced.
Anyway, with the new code, we see a significant reduction in time spent in IsPendingKill() (since it’s now much simpler)… and with the other functions, too, of course – though they weren’t actually causing us any great performance concerns at the time that we tested (pre-emptive optimizations are cool too, though, right..?).
Credit(s): Robert Troughton (Coconut Lizard)
Status: Currently unimplemented in 4.12