skip to main content

Speeding Up The Time Limit Controls

 SpeedingUpTheTimeLimitControls

Analyzing IsTimeLimitExceeded()


When you have a process that takes a long time to complete, and that process is running on the main thread, it’s common to use some sort of time-limiting code to make sure that other threads aren’t completely stalled. UE4 has a few of these, usually using a version of IsTimeLimitExceeded(). The FLinkerLoad class is one – and that’s the one we’re going to look at here.

During a vTune run of our server build, we noticed that IsTimeLimitExceeded() itself was taking up quite a bit of CPU time… so we delved in to see whether it could be optimized. Here’s what we saw in vTune:-

image
image
COREUOBJECT_API bool IsTimeLimitExceeded( const TCHAR* CurrentTask, int32 Granularity = 1 );

The final thing I want to show you is an example instance from where the function is called:-

while (Status == LINKER_TimedOut && 
  (!bUseTimeLimit || (bUseFullTimeLimit && !IsTimeLimitExceeded(TEXT("Checking Full Timer"))))
  );

Now, let’s have a look at the resultant assembly code for this…

First, the caller:-

00007FF61932E789  cmp         ebx,2  
00007FF61932E78C  jne         FLinkerLoad::Tick+223h (07FF61932E7C3h)  
00007FF61932E78E  cmp         byte ptr [rdi+6C6h],bpl  
00007FF61932E795  je          FLinkerLoad::Tick+83h (07FF61932E623h)  
00007FF61932E79B  cmp         byte ptr [rdi+6C7h],bpl  
00007FF61932E7A2  je          FLinkerLoad::Tick+21Fh (07FF61932E7BFh)  
00007FF61932E7A4  lea         r8d,[rbx-1]  
00007FF61932E7A8  lea         rdx,[p+3FBCh (07FF61AED0D48h)]  
00007FF61932E7AF  mov         rcx,rdi  
00007FF61932E7B2  call        FLinkerLoad::IsTimeLimitExceeded (07FF61931B6D0h)  
00007FF61932E7B7  test        al,al  
00007FF61932E7B9  je          FLinkerLoad::Tick+83h (07FF61932E623h)

Now FLinkerLoad::IsTimeLimitExceeded() itself:-

00007FF61931B6D0  mov         qword ptr [rsp+10h],rbx  
00007FF61931B6D5  push        rdi  
00007FF61931B6D6  sub         rsp,40h  
00007FF61931B6DA  inc         dword ptr [rcx+6C8h]  
00007FF61931B6E0  cmp         byte ptr [rcx+6C5h],0  
00007FF61931B6E7  mov         rdi,rdx  
00007FF61931B6EA  mov         rbx,rcx  
00007FF61931B6ED  jne         FLinkerLoad::IsTimeLimitExceeded+0FAh (07FF61931B7CAh)  
00007FF61931B6F3  cmp         byte ptr [rcx+6C6h],0  
00007FF61931B6FA  je          FLinkerLoad::IsTimeLimitExceeded+0FAh (07FF61931B7CAh)  
00007FF61931B700  mov         eax,dword ptr [rcx+6C8h]  
00007FF61931B706  cdq  
00007FF61931B707  idiv        eax,r8d  
00007FF61931B70A  test        edx,edx  
00007FF61931B70C  jne         FLinkerLoad::IsTimeLimitExceeded+0FAh (07FF61931B7CAh)  
00007FF61931B712  lea         rcx,[rsp+50h]  
00007FF61931B717  call        qword ptr [__imp_QueryPerformanceCounter (07FF61ACA96E8h)]  
00007FF61931B71D  movss       xmm1,dword ptr [rbx+6CCh]  
00007FF61931B725  xorps       xmm2,xmm2  
00007FF61931B728  cvtsi2sd    xmm2,qword ptr [rsp+50h]  
00007FF61931B72F  cvtps2pd    xmm0,xmm1  
00007FF61931B732  mulsd       xmm2,mmword ptr [FGenericPlatformTime::SecondsPerCycle (07FF61B83BC10h)]  
00007FF61931B73A  addsd       xmm2,mmword ptr [__real@4170000000000000 (07FF61ACD2EF8h)]  
00007FF61931B742  subsd       xmm2,mmword ptr [rbx+6D0h]  
00007FF61931B74A  comisd      xmm2,xmm0  
00007FF61931B74E  cvtps2pd    xmm0,xmm1  
00007FF61931B751  mulsd       xmm0,mmword ptr [__real@4004000000000000 (07FF61AEE1150h)]  
00007FF61931B759  seta        al  
00007FF61931B75C  mov         byte ptr [rbx+6C5h],al  
00007FF61931B762  comisd      xmm2,xmm0  
00007FF61931B766  jbe         FLinkerLoad::IsTimeLimitExceeded+0E8h (07FF61931B7B8h)  
00007FF61931B768  cmp         byte ptr [LogStreaming (07FF61B963190h)],5  
00007FF61931B76F  jb          FLinkerLoad::IsTimeLimitExceeded+0E8h (07FF61931B7B8h)  
00007FF61931B771  mov         rax,qword ptr [LogStreaming+8h (07FF61B963198h)]  
00007FF61931B778  mulsd       xmm2,mmword ptr [__real@408f400000000000 (07FF61ACD2EE8h)]  
00007FF61931B780  lea         r8,[rsp+50h]  
00007FF61931B785  mov         qword ptr [rsp+50h],rax  
00007FF61931B78A  lea         rax,[p+7DA4h (07FF61AED4B30h)]  
00007FF61931B791  lea         rcx,[p+7F94h (07FF61AED4D20h)]  
00007FF61931B798  movsd       mmword ptr [rsp+30h],xmm2  
00007FF61931B79E  mov         r9d,5  
00007FF61931B7A4  mov         edx,328h  
00007FF61931B7A9  mov         qword ptr [rsp+28h],rdi  
00007FF61931B7AE  mov         qword ptr [rsp+20h],rax  
00007FF61931B7B3  call        FMsg::Logf_Internal__VA (07FF61921EA60h)
00007FF61931B7B8  movzx       eax,byte ptr [rbx+6C5h]  
00007FF61931B7BF  mov         rbx,qword ptr [rsp+58h]  
00007FF61931B7C4  add         rsp,40h  
00007FF61931B7C8  pop         rdi  
00007FF61931B7C9  ret  
00007FF61931B7CA  movzx       eax,byte ptr [rcx+6C5h]  
00007FF61931B7D1  mov         rbx,qword ptr [rsp+58h]  
00007FF61931B7D6  add         rsp,40h  
00007FF61931B7DA  pop         rdi  
00007FF61931B7DB  ret

That’s not an insignificant amount of code, really, for such a simple function… all of this is just to decide, on an interval (set by Granularity), whether we’ve spent enough time on the current operation. If Granularity is 1, which it is by default and is in most cases, including the case I’ve cited above, then we’ll check our time limit every time through.

So … looking at IsTimeLimitExceeded()’s disassembly, here’s a breakdown:-

  • Lines 1-3: initial setup
  • Line 4: IsTimeLimitExceededCallCount++
  • Lines 5-15: the first if() block
  • Lines 16-25: double CurrentTime = FPlatformTime::Seconds(); and bTimeLimitExceeded = CurrentTime – TickStartTime > TimeLimit;
  • Lines 26-31: if( (CurrentTime – TickStartTime) > (2.5 * TimeLimit) )
  • Lines 32-45: UE_LOG(LogStreaming, Log, TEXT(“FLinkerLoad: %s took (less than) %5.2f ms”), CurrentTask, (CurrentTime – TickStartTime) * 1000);
  • Lines 46-55: return bTimeLimitExceeded (nb. I really don’t know why this code is duplicated like this..).

Note in particular line 13 – this line is doing a divide just for the purpose of working out “(IsTimeLimitExceededCallCount % Granularity) == 0 )”. As mentioned before, though, Granularity is -usually- 1 … if only there was a way of getting this information to the compiler?

Optimizing IsTimeLimitExceeded()


There are a couple of things I identified as being optimization possibilities here:-

  • We can get rid of the divide by eliminating the Granularity calculation – if Granularity is <= 1 then we don’t need to bother with the call-count calculation at all;
  • The logging code, and checks around it, should be eliminated completely on test/release builds. Note that, as I mentioned earlier, we’re only seeing this at all because we set USE_LOGGING_IN_SHIPPING to be “1” rather than UE4’s default of “0”.

Anyway, here’s how I did it…

Firstly, I added a new define to the top of LinkerLoad.cpp:-

#define LOG_SINGLE_SLOW_LOADS_IN_ISTIMELIMITEDEXCEEDED !(UE_BUILD_TEST||UE_BUILD_SHIPPING)

Then I changed IsTimeLimitExceeded() to be:-

bool FLinkerLoad::IsTimeLimitExceeded( const TCHAR* CurrentTask, int32 Granularity )
{
  if ((!bTimeLimitExceeded)
    && (bUseTimeLimit)
    && ((Granularity <= 1) || ((++IsTimeLimitExceededCallCount % Granularity)== 0))
    )
  {
    double CurrentTime = FPlatformTime::Seconds();
    bTimeLimitExceeded = CurrentTime - TickStartTime > TimeLimit;
#if LOG_SINGLE_SLOW_LOADS_IN_ISTIMELIMITEDEXCEEDED
    if (!FPlatformProperties::HasEditorOnlyData())
    {
      // Log single operations that take longer than timelimit.
      if( (CurrentTime - TickStartTime) > (2.5 * TimeLimit) )
      {
        UE_LOG(LogStreaming, Log, TEXT("FLinkerLoad: %s took (less than) %5.2f ms"), 
          CurrentTask, 
          (CurrentTime - TickStartTime) * 1000);
      }
    }
#endif // LOG_SINGLE_SLOW_LOADS_IN_ISTIMELIMITEDEXCEEDED
  }
  return bTimeLimitExceeded;
}

Upon compiling, I was very happy to see that the compiler decided to inline all of this, making the final disassembly much cleaner. Here it is inlined into the caller:-

00007FF7797FCE7A  cmp         ebx,2  
00007FF7797FCE7D  jne         FLinkerLoad::Tick+21Ah (07FF7797FCF0Ah)  
00007FF7797FCE83  cmp         byte ptr [rdi+6C6h],sil  
00007FF7797FCE8A  je          FLinkerLoad::Tick+83h (07FF7797FCD73h)  
00007FF7797FCE90  cmp         byte ptr [rdi+6C7h],sil  
00007FF7797FCE97  je          FLinkerLoad::Tick+1FBh (07FF7797FCEEBh)  
00007FF7797FCE99  cmp         byte ptr [rdi+6C5h],sil  
00007FF7797FCEA0  jne         FLinkerLoad::Tick+1FBh (07FF7797FCEEBh)  
00007FF7797FCEA2  lea         rcx,[rsp+28h]  
00007FF7797FCEA7  call        qword ptr [__imp_QueryPerformanceCounter (07FF77B1796E8h)]  
00007FF7797FCEAD  movss       xmm0,dword ptr [rdi+6CCh]  
00007FF7797FCEB5  xorps       xmm1,xmm1  
00007FF7797FCEB8  cvtsi2sd    xmm1,qword ptr [rsp+28h]  
00007FF7797FCEBF  cvtps2pd    xmm0,xmm0  
00007FF7797FCEC2  mulsd       xmm1,mmword ptr [FGenericPlatformTime::SecondsPerCycle (07FF77BD0BC10h)]  
00007FF7797FCECA  addsd       xmm1,xmm7  
00007FF7797FCECE  subsd       xmm1,mmword ptr [rdi+6D0h]  
00007FF7797FCED6  comisd      xmm1,xmm0  
00007FF7797FCEDA  seta        al  
00007FF7797FCEDD  mov         byte ptr [rdi+6C5h],al  
00007FF7797FCEE3  test        al,al  
00007FF7797FCEE5  je          FLinkerLoad::Tick+83h (07FF7797FCD73h)

This is, of course, significantly faster. The same happened in the other parts of code calling FLinkerLoad::IsTimeLimitExceeded(), resulting in significantly improved load performance all round.

Credit(s): Robert Troughton (Coconut Lizard)
Status: Currently unimplemented in 4.12

Facebook Messenger Twitter Pinterest Whatsapp Email
Go to Top