The Smallest, Coolest Optimization
April 17, 2016
TLDR: Adding a single character in the UE4 codebase saved us 40% of runtime memory churn!
UPDATE: the fixes below were added to the code base in UE4.13
Sometimes, as an optimization engineer, you need to spend days working on a task that speeds up your code by a fraction of a percent. Other times, you have a “holy s###” moment, finding that a single character can make a dramatic difference – the change that I’m going to show you here is exactly one of these. As of 4.11, and the date of this post, this optimization is still valid.
What’s important to note about this change is that, while it looks like an obvious mistake that should’ve been easily spotted, with a codebase the size of Unreal Engine, it can be like looking for a needle in a haystack. I needed to use several optimization tools to isolate the problem code that I’m highlighting here – and I’m not just talking about VTune.
void FAnimNode_ModifyBone::EvaluateBoneTransforms(USkeletalMeshComponent* SkelComp, FCSPose<FCompactPose>& MeshBases, TArray<FBoneTransform>& OutBoneTransforms)
{
    check(OutBoneTransforms.Num() == 0);

    // the way we apply transform is same as FMatrix or FTransform
    // we apply scale first, and rotation, and translation
    // if you'd like to translate first, you'll need two nodes that first node does translate and second nodes to rotate.
    const FBoneContainer BoneContainer = MeshBases.GetPose().GetBoneContainer();
The single function above, of which I’ve only pasted the initial lines, was responsible for around 40% of all the memory allocations that we were seeing when profiling our project… with the majority of that cost coming from a single line of code:-
const FBoneContainer BoneContainer = MeshBases.GetPose().GetBoneContainer();
EvaluateBoneTransforms(), as it turned out, didn’t need (or want) a copy of the FBoneContainer at all – a reference would suffice. FBoneContainers can get pretty huge on fully fleshed out meshes – note the number of TArrays:-
struct ENGINE_API FBoneContainer
{
private:
    /** Array of RequiredBonesIndices. In increasing order. */
    TArray<FBoneIndexType> BoneIndicesArray;
    /** Array sized by Current RefPose. true if Bone is contained in RequiredBones array, false otherwise. */
    TBitArray<> BoneSwitchArray;

    /** Asset BoneIndicesArray was made for. Typically a SkeletalMesh. */
    TWeakObjectPtr<UObject> Asset;
    /** If Asset is a SkeletalMesh, this will be a pointer to it. Can be NULL if Asset is a USkeleton. */
    TWeakObjectPtr<USkeletalMesh> AssetSkeletalMesh;
    /** If Asset is a Skeleton that will be it. If Asset is a SkeletalMesh, that will be its Skeleton. */
    TWeakObjectPtr<USkeleton> AssetSkeleton;

    /** Pointer to RefSkeleton of Asset. */
    const FReferenceSkeleton* RefSkeleton;

    /** Mapping table between Skeleton Bone Indices and Pose Bone Indices. */
    TArray<int32> SkeletonToPoseBoneIndexArray;
    /** Mapping table between Pose Bone Indices and Skeleton Bone Indices. */
    TArray<int32> PoseToSkeletonBoneIndexArray;

    // Look up from skeleton to compact pose format
    TArray<int32> CompactPoseToSkeletonIndex;
    // Look up from compact pose format to skeleton
    TArray<FCompactPoseBoneIndex> SkeletonToCompactPose;
    // Compact pose format of Parent Bones (to save us converting to mesh space and back)
    TArray<FCompactPoseBoneIndex> CompactPoseParentBones;
    // Compact pose format of Ref Pose Bones (to save us converting to mesh space and back)
    TArray<FTransform> CompactPoseRefPoseBones;

    /** For debugging. */
    /** Disable Retargeting. Extract animation, but do not retarget it. */
    bool bDisableRetargeting;
    /** Disable animation compression, use RAW data instead. */
    bool bUseRAWData;
    /** Use Source Data that is imported that are not compressed. */
    bool bUseSourceData;
As you can expect, changing the code to use a reference gave us a HUGE gain. The fact that it wasn’t a reference in the first place was obviously human error – the programmer undoubtedly just missed the ampersand. Note that almost every other call to GetBoneContainer() used a reference – the only other function at fault being FAnimNode_ObserveBone::EvaluateBoneTransforms() (which can also be fixed – though this function is rarely used).
So, for completeness, the correct code should have looked like this:-
const FBoneContainer& BoneContainer = MeshBases.GetPose().GetBoneContainer();
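To see why that one character matters, here is a minimal standalone sketch in plain C++, not engine code. BoneContainerLike, PoseLike and the two helper functions are hypothetical stand-ins invented for illustration, and the sketch assumes (as the fix implies) that the getter returns a const reference. It counts how often the container gets copy-constructed under each form of the declaration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts copy-constructions of the stand-in container below.
int g_containerCopies = 0;

// Hypothetical stand-in for FBoneContainer: a struct owning several
// heap-allocated arrays, so every copy means fresh allocations.
struct BoneContainerLike {
    std::vector<int> parentIndices;
    std::vector<float> refPose;

    explicit BoneContainerLike(std::size_t boneCount)
        : parentIndices(boneCount), refPose(boneCount * 10) {}

    // Copying duplicates both arrays on the heap; we just count it here.
    BoneContainerLike(const BoneContainerLike& other)
        : parentIndices(other.parentIndices), refPose(other.refPose) {
        ++g_containerCopies;
    }
};

// Hypothetical stand-in for the pose object whose getter returns a
// const reference (assumed to mirror GetBoneContainer()).
struct PoseLike {
    BoneContainerLike container{64};
    const BoneContainerLike& GetBoneContainer() const { return container; }
};

// The "before" form: no ampersand, so each evaluation copies everything.
int CopiesWhenBoundByValue(const PoseLike& pose, int evaluations) {
    g_containerCopies = 0;
    std::size_t touched = 0;
    for (int i = 0; i < evaluations; ++i) {
        const BoneContainerLike container = pose.GetBoneContainer();
        touched += container.parentIndices.size();
    }
    (void)touched;
    return g_containerCopies;
}

// The "after" form: the ampersand binds to the existing object, no copy.
int CopiesWhenBoundByReference(const PoseLike& pose, int evaluations) {
    g_containerCopies = 0;
    std::size_t touched = 0;
    for (int i = 0; i < evaluations; ++i) {
        const BoneContainerLike& container = pose.GetBoneContainer();
        touched += container.parentIndices.size();
    }
    (void)touched;
    return g_containerCopies;
}
```

Calling CopiesWhenBoundByValue(pose, 100) on a default-constructed PoseLike reports one copy per evaluation, while CopiesWhenBoundByReference(pose, 100) reports none. In the real engine each of those copies would duplicate every array inside the container, which is where the allocations were coming from.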
That’s a single character change giving a 40% reduction in run-time memory allocations. Pretty cool, I’d say?
Credit(s): Robert Troughton (Coconut Lizard)
Status: Fixed in 4.12