The Quickening Of The Light Map UV Generation
June 19, 2017
Summary
In Unreal Engine 4, generating lightmap UVs for static meshes takes a very significant part of a project’s cook and startup time, at least until the Derived Data Cache (DDC) is populated. This is primarily due to the algorithms and methods used for UV packing. This method is primarily intended as a fallback if the artist working on the asset hasn’t created the lightmap (they’d do a better job naturally!) so when we have to automate the task we don’t really want it eating up so much time. Significant progress on addressing the speed of lightmap generation was made in the UE 4.16 release – but in today’s blog post, we’re going to show you that we can go further still, showing a nearly 5x increase in performance overall.
You can see our changes in the following Github Pull Requests:
github.com/EpicGames/UnrealEngine/pull/4791
and
github.com/EpicGames/UnrealEngine/pull/4792
Currently the engine uses a linear search to find the first fit and then follows this up with a binary search to see whether this can be improved upon – increasing the UV scale and testing to see whether everything still fits.
4.15 – Pixel by Pixel
Selected stats output in UE 4.15 for generating various DDC resources taken from Epic’s Boy with Kite demo (KiteDemo) – highlighting that StaticMeshes were a very significant factor in the time taken:-
DDC Resource Stats
Asset Type | Total Time (Sec) | Assets Built | MB Processed |
---|---|---|---|
StaticMesh | 2856.90 | 284 | 412.12 |
DistanceField | 600.47 | 288 | 567.33 |
Texture (Inline) | 73.28 | 4156 | 8859.84 |
SoundWave | 62.94 | 243 | 107.59 |
LandscapeCollision (Heightfield) | 47.69 | 2112 | 7140.62 |
Texture (Streaming) | 24.86 | 0 | 15254.14 |
PhysX (BodySetup) | 22.32 | 772 | 366.38 |
NavCollision | 10.67 | 362 | 25.77 |
MaterialShader | 5.01 | 764 | 85.20 |
GlobalShader | 0.08 | 2 | 2.29 |
And here’s the breakdown for a few significant meshes:-
Mesh | FindBestPacking (Sec) | PackCharts (Sec) | |||||||
---|---|---|---|---|---|---|---|---|---|
LOD0 | LOD1 | LOD2 | LOD3 | LOD0 | LOD1 | LOD2 | LOD3 | ||
ScotsPine_01 | 87 | 88 | 55.5 | 0.026 | 29.7 | 59.31 | 44.71 | 0.012 | |
HillTree_Tall_02 | 387 | 6.92 | 7.5 | 0.7 | 218 | 2.78 | 4 | 0.513 | |
HillTree_02 | 1761 | 18.4 | 19.7 | 0.044 | 1331 | 12.4 | 13.16 | 0.039 |
You can clearly see from the statistics above that a large percentage of time is spent finding the best UV packing for complex static meshes.
The process of finding a best pack involves two steps – scaling charts (groups of triangles with contiguous UVs) and packing charts. In order to pack a chart, the engine will do 8 orientation tests of the group of triangles (represented in a rectangular texture) and choose the best orientation in order to create as dense a packing as possible.
For each orientation test, the algorithm uses a software rasterization method, filling in a rectangular texture as it goes and scanning, from top-left to bottom-right, to find out whether there’s enough empty space to fit the chart. Orientation testing is carried out using an extremely expensive method based on 2D collision detection. This becomes exponentially more expensive as we deal with larger and larger lightmaps. For example, with a 1024×1024 layout texture and a 16×16 rect texture, we end up potentially with (1024 – 16) x (1024-16) x 16 x 16 collision tests per orientation. Furthermore, for each test there is also a check process (64bits at a time) and for a complex mesh there can be thousands of individual charts. For each additional chart added, potentially all of the previously fitted charts must be reorientated so the algorithm gets slower and slower as the index of our array of charts grows.
4.16 – Segments
Here are the timings from UE 4.16, highlighting that Epic realised the optimization opportunities and made a decent push at improving them:-
DDC Resource Stats
Asset Type | Total Time (Sec) | Assets Built | MB Processed |
---|---|---|---|
StaticMesh | 1879.41 | 284 | 410.09 |
DistanceField | 595.34 | 287 | 566.13 |
Texture (Inline) | 75.44 | 4167 | 9035.83 |
SoundWave | 65.71 | 243 | 107.59 |
LandscapeCollision (Heightfield) | 58.99 | 2112 | 7140.62 |
Texture (Streaming) | 22.50 | 0 | 15066.83 |
PhysX (BodySetup) | 21.79 | 771 | 365.91 |
NavCollision | 10.18 | 361 | 25.76 |
MaterialShader | 4.71 | 764 | 84.00 |
GlobalShader | 0.08 | 2 | 2.29 |
In UE 4.16, Epic have created a new data structure to represent a 2D texture. So each texture contains several rows and each row has both used and free segments (A segment is some continuous pixels in the same row).
The main benefit brought about by this data structure is that instead of scanning the layout texture pixel by pixel, now the engine can do it in segments. Because the number of segments in a row usually is much lower than the number of pixels in that row, the loop iterations can be reduced significantly especially for a high resolution texture.
As described above, each chart is represented by a rectangular texture which nominally has eight unique representations related to each other by rotational or mirroring operations. Another optimisation of the chart packing system in UE 4.16 is that four rotational orientations of a chart are rasterized and these four rasterized charts are mirrored to get the remaining four. As a result of doing this rasterization times can be reduced by a factor of four, although chart rasterization is a very fast process so the gain is relatively low.
And here are some mesh breakdowns again taken from KiteDemo:-
Mesh | FindBestPacking (Sec) | PackCharts (Sec) | |||||||
---|---|---|---|---|---|---|---|---|---|
LOD0 | LOD1 | LOD2 | LOD3 | LOD0 | LOD1 | LOD2 | LOD3 | ||
ScotsPine_01 | 10.8 | 9.54 | 3.02 | 0.009 | 3 | 6.45 | 2.42 | 0,004 | |
HillTree_Tall_02 | 238 | 1.27 | 1.13 | 0.08 | 157 | 0.86 | 0.64 | 0.046 | |
HillTree_02 | 1304 | 2.93 | 2.29 | 0.031 | 1005 | 2.33 | 1.66 | 0.028 |
It’s easy to see from the above results that the time required for chart packing has been reduced significantly, particularly with ScotsPine_01. The improvements for the other two meshes aren’t quite as good, however. This is due to fine details on these meshes needing lots of tiny charts – leading to a large number of segments per row and the optimization being less effective.
4.16 – Segments + Optimizations
In UE 4.16 the FLayoutUV::FindBestPacking() method iterates over a linear search method until a best fit scale rate is found, then up to six binary search iterations are performed, each of which should improve on the best fit scale rate (UVScalePass). On completion of the binary search UVScalePass was used as a parameter to generate a final packing result, effectively repeating the function calls carried out in the last binary search iteration. Instead of this we have cached the results of FindBestPacking(), along with UVScalePass. As a result, repeated chart packing operations can be avoided. We have also added a check to see whether the final binary search was successful, and if so the final chart packing step can be omitted. This approach can reduce the total time required for the FindBestPacking() function by around 10 percent depending on how many times the linear and binary searches are performed.
The second major optimisation we’ve implemented is asynchronous execution within the FLayoutUV::PackCharts() method. Instead of doing eight orientations tests of each and every chart, one by one, each test has been put into a task graph (for short running tasks) allowing parallel execution. Since all tests are run at the same time and each task graph can’t access the results of others, one of the optimisations introduced 4.16 has been removed – since it’s incompatible with asynchronous operations. This was a check on whether it’s impossible for the current orientation result to beat the best previous result. If this was the case then that particular iteration would be terminated early. We’ve also had to discard another optimisation introduced in 4.16, where the fitting operations had been altered to initially attempt to find a best fit on the last failed row of a chart rather than starting from the beginning (top-left), as it actually makes our algorithm a little slower.
Timings in UE 4.16 (KiteDemo) with the optimizations described above implemented:-
DDC Resource Stats
Asset Type | Total Time (Sec) | Assets Built | MB Processed |
---|---|---|---|
StaticMesh | 641.07 | 284 | 409.98 |
DistanceField | 514.33 | 287 | 570.72 |
Texture (Inline) | 40.86 | 3931 | 8245.62 |
SoundWave | 69.50 | 243 | 107.59 |
LandscapeCollision (Heightfield) | 62.76 | 2112 | 7140.62 |
Texture (Streaming) | 23.29 | 0 | 16418.68 |
PhysX (BodySetup) | 23.05 | 762 | 401.65 |
NavCollision | 10.54 | 351 | 25.69 |
MaterialShader | 5.18 | 764 | 79.75 |
GlobalShader | 0.08 | 2 | 2.29 |
And the mesh breakdowns again:-
Mesh | FindBestPacking (Sec) | PackCharts (Sec) | |||||||
---|---|---|---|---|---|---|---|---|---|
LOD0 | LOD1 | LOD2 | LOD3 | LOD0 | LOD1 | LOD2 | LOD3 | ||
ScotsPine_01 | 3.83 | 2.8 | 1.6 | 0.008 | 0.97 | 1.89 | 1.25 | ~0 | |
HillTree_Tall_02 | 50.3 | 0.399 | 0.421 | 0.035 | 26.66 | 0.162 | 0.179 | 0.016 | |
HillTree_02 | 366 | 0.980 | 1.03 | 0.044 | 223 | 0.680 | 0.658 | 0.041 |
Conclusion
Here’s a final comparison of the total timings from a cook of KiteDemo to highlight the performance of the three different methods:-
Method | Total Time (Sec) |
---|---|
4.15 Pixel by Pixel | 2856.90 |
4.16 Segments | 1879.41 |
4.16 Segments Optimized | 641.07 |
As can be seen in these test results, 4.16’s segments approach works much better than 4.15’s pixel one, particularly on meshes with high texture resolution and low chart numbers. For other meshes, containing larger numbers of charts, the improvements are not quite so high – largely due to the high segment count. The asynchronous execution approach that we’re proposing works well on those meshes (e.g. HillTree_02 with more than 9000 charts) because the 8 orientation tests of each chart can be performed in parallel – leading to an eight-fold speed gain.
In addition, the scaling of lightmap UVs generated by each algorithm can sometimes be slightly different. This may be due to the packing bias differences for each chart, or the rounding error of floating-point calculations. Regardless, all three of these approaches afford a dense UV lightmap without overlaps.
By default, the segments method is not enabled on KiteDemo meshes, but resaving these meshes should result in significant speed gains in level load timings. Users can resave the content by unchecking the generate lightmap UVs checkbox in the detail panel->build settings, saving the content, rechecking the box again and saving the content. If you are building the engine from source code, you can add the following line to the beginning of the PackCharts() function:-
LayoutVersion = ELightmapUVVersion::Latest;
to force the engine into using the segments method to pack charts.
Please let me know if this article was helpful. I ‘d love to hear any suggestions about it :).
Credits:
Programming/Research: Yuchen Mei
Mentoring: Gareth Martin, Joe Tidmarsh, Robert Troughton, Lee Clark (Coconut Lizard)
Blog Support: Josef Gluyas