skip to main content

The Curious Incident of the Archive Corruption

AdobeStock_200479475

While doing a hardening pass on our internal memory profiling tool, we found some places in TArray, TMap, and FString deserialization code that were missing error checks. These missing checks can potentially become major issues if a file has been corrupted. Should deserialization of the data archive not be halted when reading a corrupt file, it can cause the program to allocate obscene amounts of memory!

FString


Currently the FString serialization operator << will read in the size of the FString saved during serialization and then create an array based on this size. However, if the archive is corrupt then the array size may be incorrect, potentially leading to the creation of a large array of characters.

The size parameter of an FString is an int32 value. So if an archive is corrupted, and the FString serialization continues, the underlying character array will create space for somewhere between 0 – 2.1GB of characters based on whatever junk if finds in the size parameter.

These deserialization issues become very prominent when an archive becomes corrupt and you’re attempting to serialize a TArray of FStrings, you’ll notice your used memory skyrockets!

Fortunately the error state of the archive is available during deserialization. So when the archive is loading it is simple to check that it is not currently in a state of error and skip over the deserialization if so.

 

FArchive& operator<<( FArchive& Ar, FString& A )
{
	if (Ar.IsLoading()
	   && !Ar.IsError() /* <- Add this */ )
	{
		int32 SaveNum;
		Ar << SaveNum;
		bool LoadUCS2Char = SaveNum < 0;
		if (LoadUCS2Char)
		{
			SaveNum = -SaveNum;
		}
		... Other FString serialisation code ...
		//Performs character array resize
		A.Data.Empty(SaveNum);
		... Other FString serialisation code ...
	}
}
friend FArchive& operator<<(FArchive& Ar, TArray& A)
{
	... Other TArray serialisation code ...
	int32 SerializeNum = (Ar.IsLoading() ? 0 : A.ArrayNum);
	Ar << SerializeNum;
	check(SerializeNum >= 0);
	if (!Ar.IsError() && SerializeNum >= 0 && ensure(!Ar.IsNetArchive() || SerializeNum <= MaxNetArraySerialize))
	{
		... Other TArray serialisation code ...
	}
}

The issue here is that we cannot know how much data was serialized into the file, and it’s completely legal to have saved a very large array. TArray still suffers the same issue as FString though, if the size parameter (SerializeNum) is corrupted a significant memory allocation could potentially be made. However, it is possible to determine if the size parameter is completely unrealistic and set an error on the archive if so. The size of each element is unknown at this point in deserialization, but must be a minimum of one byte. It follows that if our current position in the archive added to the size parameter is greater than the total size of the archive, then the size parameter has been corrupted.

friend FArchive& operator<<(FArchive& Ar, TArray& A)
{
	... Other TArray serialisation code ...
	int32 SerializeNum = (Ar.IsLoading() ? 0 : A.ArrayNum);
	Ar << SerializeNum;
	check(SerializeNum >= 0);
	/* Add from here */
	if (Ar.IsLoading() && Ar.Tell() + SerializeNum > Ar.TotalSize())
	{
		Ar.SetError();
		return Ar;
	}
	/* to here */
	if (!Ar.IsError() && SerializeNum >= 0 && ensure(!Ar.IsNetArchive() || SerializeNum <= MaxNetArraySerialize))
	{
		... Other TArray serialisation code ...
	}
}

As of UE4.23 FArchiveFromStructuredArchive uses the inherited TArray serialisation operator, but does not implement the TotalSize() function. A fix for this is included in our pull request, and detailed later in this post.

Another problem we found within TArray was that if an error was reported while deserializing a specific element, execution would continue until deserialization was finished. In order to exit early if this is the case, we have implemented a check that the archive was not in a state of error when an element is created.

friend FArchive& operator<<(FArchive& Ar, TArray& A)
{
	... Other TArray serialisation code ...
	int32 SerializeNum = (Ar.IsLoading() ? 0 : A.ArrayNum);
	Ar << SerializeNum;
	check(SerializeNum >= 0);
	if (!Ar.IsError() && SerializeNum >= 0 && ensure(!Ar.IsNetArchive() || SerializeNum <= MaxNetArraySerialize))
	{
		... Other TArray Serialisation code ...
		else if (Ar.IsLoading())
		{
			// Performs TArray resize
			A.Empty(SerializeNum);
			for (int32 i=0; i

TMap / TSparseArray


TMap, and the underlying TSparseArray that TMap is based upon, have some of the same problems as TArray. TSparseArray is susceptible to reading in an overly large size and attempting to allocate large chunks of memory, it also continues after an error is recorded on an element during deserialization. Analogous fixes to those outlined for TArray have been implemented for these data types.

friend FArchive& operator<<(FArchive& Ar,TSparseArray& Array)
{
	Array.CountBytes(Ar);
	if( Ar.IsLoading()
	  && !Ar.IsError() /* <- Add this */ )
	{
		// Load array.
		int32 NewNumElements = 0;
		Ar << NewNumElements;
		/* Add from here */
		if (NewNumElements < 0 || Ar.Tell() + NewNumElements > Ar.TotalSize())
		{
			Ar.SetError();
			return Ar;
		}
		/* to here */
		//Performs TSparseArray resize
		Array.Empty( NewNumElements );
		for(int32 ElementIndex = 0;ElementIndex < NewNumElements
		   && !Ar.IsError() /* <- Add this */ ;ElementIndex++)
		{
			Ar << *::new(Array.AddUninitialized())ElementType;
		}
		... Other TSprarseArray serialisation code ...
	}
}

FArchiveFromStructuredArchive::TotalSize()


Currently (UE4.23), the FArchiveFromStructuredArchive::TotalSize() function is marked explicitly as unsupported.

int64 FArchiveFromStructuredArchive::TotalSize()
{
	checkf(false, TEXT("FArchiveFromStructuredArchive does not support TotalSize()"));
	return FArchive::TotalSize();
}

However, other functions implemented by FArchiveFromStructuredArchive are able to determine the amount of data yet to be read in the archive (e.g. the Seek() function, shown below). The Seek() function implementation checks if the underlying archive is text format and if so, checks to make sure the provided position is not less than 0 and not greater than the size of the data buffer. If the archive is not in text format then it will use the underlying archives seek implementation.

While looking at this code though we noticed that a check assertion used inside the seek function was operating on the wrong variables. The assertion currently determines whether the current position is inside the bounds of the Buffer before overwriting the position value with the passed in position. The check call should be testing that the passed in position is not out of bounds before using this value to set the archive position.

void FArchiveFromStructuredArchive::Seek(int64 InPos)
{
	if (InnerArchive.IsTextFormat())
	{
		//Old check assertion
		//check(Pos >= 0 && Pos <= Buffer.Num());
		//Fixed check assertion
		check(InPos >= 0 && InPos <= Buffer.Num());
		Pos = InPos;
	}
	else
	{
		InnerArchive.Seek(InPos);
	}
}

Similar logic to that used in the Seek() function can then be used to determine the appropriate total size of the archive using the data buffer in text format or the underlying archive’s total size.

int64 FArchiveFromStructuredArchive::TotalSize()
{
	if (InnerArchive.IsTextFormat())
	{
		return Buffer.Num();
	}
	else
	{
		return InnerArchive.TotalSize();
	}
}

Summary


When all of the above error checks for FString, TArray, and TMap deserialization are implemented, any archive deserialization code will be significantly more robust, and archives will have a higher chance of halting/recovering when a file/data asset has been corrupted.

Credit(s): Sam Bendell (Coconut Lizard)
Help: Josef Gluyas, Gareth Martin (Coconut Lizard)
Status: Currently unfixed in UE4.23
Github Pull: https://github.com/EpicGames/UnrealEngine/pull/6258

Facebook Messenger Twitter Pinterest Whatsapp Email
Go to Top