Portrait Of A Serialize Perf Killer

Making sure that code is safe and free of crashes/bugs should always be considered a high priority. That being said, if you totally wrap your code up in cotton wool, you often won’t be letting it live up to its potential. Sometimes, the checks we implement cost far more in performance than they’re worth.

In this article, I want to look at what I would definitely call a “core” function of UE4: the Serialize() function. With our current project, that function is called something like 21,000,000 (21 MILLION) times just to load up into the frontend. Every little bit of data that’s needed from various packages is loaded through Serialize().

I actually assumed that the function would be incredibly well tuned already, given that it’s so important to so many components of UE4: the final published games, the editor and the cooker process all use Serialize() extensively.

I could end the article here and say, yeah, my assumption was right… except… it wasn’t.

Here’s FArchiveFileReaderGeneric::Serialize() (from FileManagerGeneric.cpp) …

void FArchiveFileReaderGeneric::Serialize( void* V, int64 Length )
{
  while( Length>0 )
  {
    int64 Copy = FMath::Min( Length, BufferBase+BufferCount-Pos );
    if( Copy<=0 )
    {
      if( Length >= ARRAY_COUNT( Buffer ) )
      {
        int64 Count=0;
        {
          ReadLowLevel(( uint8* )V, Length, Count );
        }
        if( Count!=Length )
        {
          TCHAR ErrorBuffer[1024];
          ArIsError = true;
          UE_LOG( LogFileManager, Warning, TEXT( "ReadFile failed: Count=%lld Length=%lld Error=%s for file %s" ), Count, Length, FPlatformMisc::GetSystemErrorMessage( ErrorBuffer, 1024, 0 ), *Filename );
        }
        Pos += Length;
        return;
      }
      InternalPrecache( Pos, MAX_int32 );
      Copy = FMath::Min( Length, BufferBase+BufferCount-Pos );
      if( Copy<=0 )
      {
        ArIsError = true;
        UE_LOG( LogFileManager, Error, TEXT( "ReadFile beyond EOF %lld+%lld/%lld for file %s" ), Pos, Length, Size, *Filename );
      }
      if( ArIsError )
      {
        return;
      }
    }
    FMemory::Memcpy( V, Buffer+Pos-BufferBase, Copy );
    Pos       += Copy;
    Length    -= Copy;
    V          =( uint8* )V + Copy;
  }
}
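
To see the shape of this function without the engine around it, here is a minimal standalone sketch of the same buffered-read pattern. It is not the UE4 implementation: FBufferedReaderSketch, its in-memory File and the Refills counter are hypothetical names made up for illustration, the 1024-byte buffer size is an assumption, and the large-read bypass and all logging are left out.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffered file reader. It reads from an
// in-memory "file" so the example needs no platform file handle.
class FBufferedReaderSketch
{
public:
    explicit FBufferedReaderSketch(std::vector<uint8_t> InFile) : File(std::move(InFile)) {}

    // Copies Length bytes into V. Returns false on a read beyond the end.
    bool Serialize(void* V, int64_t Length)
    {
        uint8_t* Out = static_cast<uint8_t*>(V);
        while (Length > 0)
        {
            // Fast path: how much of the request is already buffered?
            int64_t Copy = std::min(Length, BufferBase + BufferCount - Pos);
            if (Copy <= 0)
            {
                // Slow path: the buffer is exhausted, so refill it.
                const int64_t Remaining = static_cast<int64_t>(File.size()) - Pos;
                if (Remaining <= 0)
                {
                    return false; // read beyond EOF
                }
                BufferBase = Pos;
                BufferCount = std::min(static_cast<int64_t>(sizeof(Buffer)), Remaining);
                std::memcpy(Buffer, File.data() + Pos, static_cast<size_t>(BufferCount));
                ++Refills;
                Copy = std::min(Length, BufferCount);
            }
            std::memcpy(Out, Buffer + (Pos - BufferBase), static_cast<size_t>(Copy));
            Pos += Copy;
            Length -= Copy;
            Out += Copy;
        }
        return true;
    }

    int64_t Refills = 0; // number of times the slow path ran

private:
    std::vector<uint8_t> File;
    uint8_t Buffer[1024] = {}; // assumed buffer size
    int64_t Pos = 0;
    int64_t BufferBase = 0;
    int64_t BufferCount = 0;
};
```

Reading a 4096-byte file one byte at a time through this sketch triggers only four refills; every other call is just the min and the memcpy. For small reads, that is the ratio to keep in mind: almost every call takes the fast path, so anything that makes the function as a whole heavier is paid for on every one of those calls.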

Considering that every tiniest little bit of data comes individually through this function – including single bytes, words, dwords – the size of this alarmed me a little. Not only that, there are other functions being called from within – ReadLowLevel(), InternalPrecache(), Memcpy(), … of those, InternalPrecache() is the only one of any significant size:-

bool FArchiveFileReaderGeneric::InternalPrecache( int64 PrecacheOffset, int64 PrecacheSize )
{
  if( Pos == PrecacheOffset &&( !BufferBase || !BufferCount || BufferBase != Pos ) )
  {
    BufferBase = Pos;
    BufferCount = FMath::Min( FMath::Min( PrecacheSize,( int64 )( ARRAY_COUNT( Buffer ) -( Pos&( ARRAY_COUNT( Buffer )-1 ) ) ) ), Size-Pos );
    BufferCount = FMath::Max( BufferCount, 0LL ); // clamp to 0
    int64 Count = 0;
    {
      #if PLATFORM_DESKTOP
        if (BufferCount > ARRAY_COUNT( Buffer ) || BufferCount <= 0)
        {
          FText ErrorMessage, ErrorCaption;
          GConfig->GetText(TEXT("/Script/Engine.Engine"), TEXT("SerializationOutOfBoundsErrorMessage"), ErrorMessage, GEngineIni);
          GConfig->GetText(TEXT("/Script/Engine.Engine"), TEXT("SerializationOutOfBoundsErrorMessageCaption"), ErrorCaption, GEngineIni);
          UE_LOG( LogFileManager, Error, TEXT("Invalid BufferCount=%lld while reading %s. File is most likely corrupted. Please verify your installation. Pos=%lld, Size=%lld, PrecacheSize=%lld, PrecacheOffset=%lld"), BufferCount, *Filename, Pos, Size, PrecacheSize, PrecacheOffset );
          if (GLog)
          {
            GLog->Flush();
          }
          FPlatformMisc::MessageBoxExt(EAppMsgType::Ok, *ErrorMessage.ToString(), *ErrorCaption.ToString());
          check(false);
        }
      #else
        {
          UE_CLOG( BufferCount > ARRAY_COUNT( Buffer ) || BufferCount <= 0, LogFileManager, Fatal, TEXT("Invalid BufferCount=%lld while reading %s. File is most likely corrupted. Please verify your installation. Pos=%lld, Size=%lld, PrecacheSize=%lld, PrecacheOffset=%lld"), BufferCount, *Filename, Pos, Size, PrecacheSize, PrecacheOffset );
        }
      #endif
      ReadLowLevel( Buffer, BufferCount, Count );
    }
    if( Count!=BufferCount )
    {
      TCHAR ErrorBuffer[1024];
      ArIsError = true;
      UE_LOG( LogFileManager, Warning, TEXT( "ReadFile failed: Count=%lld BufferCount=%lld Error=%s" ), Count, BufferCount, FPlatformMisc::GetSystemErrorMessage( ErrorBuffer, 1024, 0 ) );
    }
  }
  return true;
}

Those are some pretty hefty functions, really… particularly considering how many times they’re likely to be called. Let me draw your attention to ErrorBuffer, which occurs in both functions (line 16 in Serialize() and line 33 in InternalPrecache()). It’s a 2k stack-allocated buffer (1024 TCHARs at 2 bytes each on Windows), and the space is reserved in the function’s stack frame whether or not we ever reach the error path. Not good.
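
For comparison, a different way to get the buffer out of the hot function is to keep the check but push the reporting into a separate function that the compiler is told not to inline. This is not the change made in this article, only a minimal sketch of the idea: ReportReadFailure(), CheckRead() and SKETCH_NOINLINE are hypothetical names, and plain snprintf/fputs stand in for UE_LOG.

```cpp
#include <cstdint>
#include <cstdio>

#if defined(_MSC_VER)
    #define SKETCH_NOINLINE __declspec(noinline)
#else
    #define SKETCH_NOINLINE __attribute__((noinline))
#endif

// Cold path: the large scratch buffer lives only in this function's frame,
// so it costs nothing unless a read actually fails.
SKETCH_NOINLINE void ReportReadFailure(int64_t Count, int64_t Length)
{
    char ErrorBuffer[1024];
    std::snprintf(ErrorBuffer, sizeof(ErrorBuffer),
                  "ReadFile failed: Count=%lld Length=%lld\n",
                  static_cast<long long>(Count), static_cast<long long>(Length));
    std::fputs(ErrorBuffer, stderr);
}

// Hot path: one compare and a rarely taken call, with no local array.
inline bool CheckRead(int64_t Count, int64_t Length)
{
    if (Count != Length)
    {
        ReportReadFailure(Count, Length);
        return false;
    }
    return true;
}
```

The check itself still runs on every call, so this trades a little of the speedup for keeping the diagnostics in all build configurations.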

Anyway, here’s what I did with all of this… simply put, I removed all of the error-checking code on “shipping” and “test” versions for the above functions and a couple of others. Arguably, I might’ve gotten away with doing the same for “development” – I will certainly be testing this out as I suspect it could give a significant boost for cook times.

At the top of FileManagerGeneric.cpp, I added USE_ERROR_CHECKING thus:-

#define USE_ERROR_CHECKING !(UE_BUILD_SHIPPING || UE_BUILD_TEST)
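
For anyone unfamiliar with how these flags combine, here is a small sketch assuming a Shipping build. In a real UE4 build the UE_BUILD_* macros are defined by the build system; they are hard-coded below only so that the snippet stands alone, and ErrorCheckingEnabled() is a hypothetical helper that exists purely to show which branch survives preprocessing.

```cpp
// Assumed values for a Shipping build (normally supplied by the build system).
#define UE_BUILD_SHIPPING 1
#define UE_BUILD_TEST 0

#define USE_ERROR_CHECKING !(UE_BUILD_SHIPPING || UE_BUILD_TEST)

// Hypothetical helper: reports which side of the #if was compiled in.
inline bool ErrorCheckingEnabled()
{
#if USE_ERROR_CHECKING
    return true;   // Debug / Development style builds
#else
    return false;  // Shipping / Test builds
#endif
}
```

With either flag set to 1 the macro evaluates to 0, so everything between #if USE_ERROR_CHECKING and the matching #else or #endif is removed before the compiler proper ever sees it.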

FArchiveFileReaderGeneric::Seek() became:-

void FArchiveFileReaderGeneric::Seek( int64 InPos )
{
#if USE_ERROR_CHECKING
  check( InPos>=0 );
  check( InPos<=Size );
  if( !SeekLowLevel( InPos ) )
  {
    TCHAR ErrorBuffer[1024];
    ArIsError = true;
    UE_LOG(LogFileManager, Error, TEXT("SetFilePointer on %s Failed %lld/%lld: %lld %s"), *Filename, InPos, Size, Pos, FPlatformMisc::GetSystemErrorMessage(ErrorBuffer, 1024, 0));
  }
#else
  SeekLowLevel(InPos);
#endif // USE_ERROR_CHECKING
  Pos         = InPos;
  BufferBase  = Pos;
  BufferCount = 0;
}

Here’re the changes to FArchiveFileReaderGeneric::InternalPrecache():-

    BufferCount = FMath::Max( BufferCount, 0LL ); // clamp to 0
    int64 Count; // BKP-MODS: Minor speedup
#if !USE_ERROR_CHECKING
    ReadLowLevel( Buffer, BufferCount, Count );
#else // USE_ERROR_CHECKING
    {
      #if PLATFORM_DESKTOP
        // Show a message box indicating, possible, corrupt data (desktop platforms only)
        if (BufferCount > ARRAY_COUNT( Buffer ) || BufferCount <= 0)
        {
... lots of code ...
      UE_LOG( LogFileManager, Warning, TEXT( "ReadFile failed: Count=%lld BufferCount=%lld Error=%s" ), Count, BufferCount, FPlatformMisc::GetSystemErrorMessage( ErrorBuffer, 1024, 0 ) );
    }
#endif
 }
 return true;

And, finally, FArchiveFileReaderGeneric::Serialize():-

void FArchiveFileReaderGeneric::Serialize( void* V, int64 Length )
{
  while( Length>0 )
  {
    int64 Copy = FMath::Min( Length, BufferBase+BufferCount-Pos );
    if( Copy<=0 )
    {
      if( Length >= ARRAY_COUNT( Buffer ) )
      {
        int64 Count; // BKP-MODS: Minor speedup
        {
          ReadLowLevel(( uint8* )V, Length, Count );
        }
#if USE_ERROR_CHECKING
        if( Count!=Length )
        {
...
        }
#endif // USE_ERROR_CHECKING
        Pos += Length;
        return;
      }
      InternalPrecache( Pos, MAX_int32 );
      Copy = FMath::Min( Length, BufferBase+BufferCount-Pos );
#if USE_ERROR_CHECKING
      if( Copy<=0 )
...
        return;
      }
#endif
    }
    FMemory::Memcpy( V, Buffer+Pos-BufferBase, Copy );
    Pos       += Copy;
    Length    -= Copy;
    V          =( uint8* )V + Copy;
  }
}
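
One caveat, going only by the listing above: with the Copy<=0 block compiled out, a read past the end of a truncated file appears to leave Copy at 0 after the refill, so Length never shrinks and the loop would not terminate. A minimal sketch, assuming a simplified model of the loop's bookkeeping, of keeping just the cheap branch while still dropping the logging (ConsumeSketch and its parameters are hypothetical, and no real I/O happens here):

```cpp
#include <algorithm>
#include <cstdint>

// Length    = bytes the caller asked for.
// Available = bytes the refill managed to buffer (0 at end of file).
// bIsError  = plays the role of ArIsError.
inline bool ConsumeSketch(int64_t Length, int64_t Available, bool& bIsError)
{
    while (Length > 0)
    {
        const int64_t Copy = std::min(Length, Available);
        if (Copy <= 0)
        {
            // One compare and one store: no log call, no stack buffer.
            bIsError = true;
            return false;
        }
        Length -= Copy;
        Available -= Copy;
    }
    return true;
}
```

The guard costs a single predictable branch on the refill path, which is far from the per-call overhead the rest of the removed code carried.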

When USE_ERROR_CHECKING is 0, we’re losing the bulk of those functions, the expensive part. This is effectively what InternalPrecache becomes to the compiler:-

bool FArchiveFileReaderGeneric::InternalPrecache( int64 PrecacheOffset, int64 PrecacheSize )
{
  if( Pos == PrecacheOffset &&( !BufferBase || !BufferCount || BufferBase != Pos ) )
  {
    BufferBase = Pos;
    BufferCount = FMath::Min( FMath::Min( PrecacheSize,( int64 )( ARRAY_COUNT( Buffer ) -( Pos&( ARRAY_COUNT( Buffer )-1 ) ) ) ), Size-Pos );
    BufferCount = FMath::Max( BufferCount, 0LL ); // clamp to 0
    int64 Count;
    ReadLowLevel( Buffer, BufferCount, Count );
  }
  return true;
}

That’s a LOT smaller/simpler, I’m sure you’ll agree. One huge bonus from this simplification is that, on shipping/test builds, the function ends up being inlined automatically and interleaved within Serialize() by the compiler. Additionally, by removing the 2k buffers from the stack, we remove the need for Visual Studio to add in the “buffer security check” code (I’ll be writing about that in a later blog) to all three of the functions above. These checks are usually fairly insignificant – but, you know, if you take something insignificant and do it 42 million times, it doesn’t stay that way.

All in all, these changes give a significant performance boost, with reduced load times across the board.

Credit(s): Robert Troughton (Coconut Lizard)
Status: Currently unimplemented in 4.12
