skip to main content

The Quickening Of The Light Map UV Generation

TheQuickeningOfTheLightmap

Summary


In Unreal Engine 4, generating lightmap UVs for static meshes takes a very significant part of a project’s cook and startup time, at least until the Derived Data Cache (DDC) is populated. This is primarily due to the algorithms and methods used for UV packing. This method is primarily intended as a fallback if the artist working on the asset hasn’t created the lightmap (they’d do a better job naturally!) so when we have to automate the task we don’t really want it eating up so much time. Significant progress on addressing the speed of lightmap generation was made in the UE 4.16 release – but in today’s blog post, we’re going to show you that we can go further still, showing a nearly 5x increase in performance overall.

You can see our changes in the following Github Pull Requests:

github.com/EpicGames/UnrealEngine/pull/4791

and

github.com/EpicGames/UnrealEngine/pull/4792

Currently the engine uses a linear search to find the first fit and then follows this up with a binary search to see whether this can be improved upon – increasing the UV scale and testing to see whether everything still fits.

4.15 – Pixel by Pixel


Selected stats output in UE 4.15 for generating various DDC resources taken from Epic’s Boy with Kite demo (KiteDemo) – highlighting that StaticMeshes were a very significant factor in the time taken:-

 

DDC Resource Stats

Asset Type Total Time (Sec) Assets Built MB Processed
StaticMesh 2856.90 284 412.12
DistanceField 600.47 288 567.33
Texture (Inline) 73.28 4156 8859.84
SoundWave 62.94 243 107.59
LandscapeCollision (Heightfield) 47.69 2112 7140.62
Texture (Streaming) 24.86 0 15254.14
PhysX (BodySetup) 22.32 772 366.38
NavCollision 10.67 362 25.77
MaterialShader 5.01 764 85.20
GlobalShader 0.08 2 2.29

And here’s the breakdown for a few significant meshes:-

Mesh FindBestPacking (Sec)   PackCharts (Sec)
  LOD0 LOD1 LOD2 LOD3   LOD0 LOD1 LOD2 LOD3
ScotsPine_01 87 88 55.5 0.026   29.7 59.31 44.71 0.012
HillTree_Tall_02 387 6.92 7.5 0.7   218 2.78 4 0.513
HillTree_02 1761 18.4 19.7 0.044   1331 12.4 13.16 0.039

You can clearly see from the statistics above that a large percentage of time is spent finding the best UV packing for complex static meshes.

The process of finding a best pack involves two steps – scaling charts (groups of triangles with contiguous UVs) and packing charts. In order to pack a chart, the engine will do 8 orientation tests of the group of triangles (represented in a rectangular texture) and choose the best orientation in order to create as dense a packing as possible.

 

For each orientation test, the algorithm uses a software rasterization method, filling in a rectangular texture as it goes and scanning, from top-left to bottom-right, to find out whether there’s enough empty space to fit the chart. Orientation testing is carried out using an extremely expensive method based on 2D collision detection. This becomes exponentially more expensive as we deal with larger and larger lightmaps. For example, with a 1024×1024 layout texture and a 16×16 rect texture, we end up potentially with (1024 – 16) x (1024-16) x 16 x 16 collision tests per orientation. Furthermore, for each test there is also a check process (64bits at a time) and for a complex mesh there can be thousands of individual charts. For each additional chart added, potentially all of the previously fitted charts must be reorientated so the algorithm gets slower and slower as the index of our array of charts grows.

4.16 – Segments


Here are the timings from UE 4.16, highlighting that Epic realised the optimization opportunities and made a decent push at improving them:-

DDC Resource Stats

Asset Type Total Time (Sec) Assets Built MB Processed
StaticMesh 1879.41 284 410.09
DistanceField 595.34 287 566.13
Texture (Inline) 75.44 4167 9035.83
SoundWave 65.71 243 107.59
LandscapeCollision (Heightfield) 58.99 2112 7140.62
Texture (Streaming) 22.50 0 15066.83
PhysX (BodySetup) 21.79 771 365.91
NavCollision 10.18 361 25.76
MaterialShader 4.71 764 84.00
GlobalShader 0.08 2 2.29

In UE 4.16, Epic have created a new data structure to represent a 2D texture. So each texture contains several rows and each row has both used and free segments (A segment is some continuous pixels in the same row).

The main benefit brought about by this data structure is that instead of scanning the layout texture pixel by pixel, now the engine can do it in segments. Because the number of segments in a row usually is much lower than the number of pixels in that row, the loop iterations can be reduced significantly especially for a high resolution texture.

As described above, each chart is represented by a rectangular texture which nominally has eight unique representations related to each other by rotational or mirroring operations. Another optimisation of the chart packing system in UE 4.16 is that four rotational orientations of a chart are rasterized and these four rasterized charts are mirrored to get the remaining four. As a result of doing this rasterization times can be reduced by a factor of four, although chart rasterization is a very fast process so the gain is relatively low.

And here are some mesh breakdowns again taken from KiteDemo:-

Mesh FindBestPacking (Sec)   PackCharts (Sec)
  LOD0 LOD1 LOD2 LOD3   LOD0 LOD1 LOD2 LOD3
ScotsPine_01 10.8 9.54 3.02 0.009   3 6.45 2.42 0,004
HillTree_Tall_02 238 1.27 1.13 0.08   157 0.86 0.64 0.046
HillTree_02 1304 2.93 2.29 0.031   1005 2.33 1.66 0.028

It’s easy to see from the above results that the time required for chart packing has been reduced significantly, particularly with ScotsPine_01. The improvements for the other two meshes aren’t quite as good, however. This is due to fine details on these meshes needing lots of tiny charts – leading to a large number of segments per row and the optimization being less effective.

4.16 – Segments + Optimizations


In UE 4.16 the FLayoutUV::FindBestPacking() method iterates over a linear search method until a best fit scale rate is found, then up to six binary search iterations are performed, each of which should improve on the best fit scale rate (UVScalePass). On completion of the binary search UVScalePass was used as a parameter to generate a final packing result, effectively repeating the function calls carried out in the last binary search iteration. Instead of this we have cached the results of FindBestPacking(), along with UVScalePass. As a result, repeated chart packing operations can be avoided. We have also added a check to see whether the final binary search was successful, and if so the final chart packing step can be omitted. This approach can reduce the total time required for the FindBestPacking() function by around 10 percent depending on how many times the linear and binary searches are performed.

The second major optimisation we’ve implemented is asynchronous execution within the FLayoutUV::PackCharts() method. Instead of doing eight orientations tests of each and every chart, one by one, each test has been put into a task graph (for short running tasks) allowing parallel execution. Since all tests are run at the same time and each task graph can’t access the results of others, one of the optimisations introduced 4.16 has been removed – since it’s incompatible with asynchronous operations. This was a check on whether it’s impossible for the current orientation result to beat the best previous result. If this was the case then that particular iteration would be terminated early. We’ve also had to discard another optimisation introduced in 4.16, where the fitting operations had been altered to initially attempt to find a best fit on the last failed row of a chart rather than starting from the beginning (top-left), as it actually makes our algorithm a little slower.

Timings in UE 4.16 (KiteDemo) with the optimizations described above implemented:-

DDC Resource Stats

Asset Type Total Time (Sec) Assets Built MB Processed
StaticMesh 641.07 284 409.98
DistanceField 514.33 287 570.72
Texture (Inline) 40.86 3931 8245.62
SoundWave 69.50 243 107.59
LandscapeCollision (Heightfield) 62.76 2112 7140.62
Texture (Streaming) 23.29 0 16418.68
PhysX (BodySetup) 23.05 762 401.65
NavCollision 10.54 351 25.69
MaterialShader 5.18 764 79.75
GlobalShader 0.08 2 2.29

And the mesh breakdowns again:-

Mesh FindBestPacking (Sec)   PackCharts (Sec)
  LOD0 LOD1 LOD2 LOD3   LOD0 LOD1 LOD2 LOD3
ScotsPine_01 3.83 2.8 1.6 0.008   0.97 1.89 1.25 ~0
HillTree_Tall_02 50.3 0.399 0.421 0.035   26.66 0.162 0.179 0.016
HillTree_02 366 0.980 1.03 0.044   223 0.680 0.658 0.041

Conclusion


Here’s a final comparison of the total timings from a cook of KiteDemo to highlight the performance of the three different methods:-

Method Total Time (Sec)
4.15 Pixel by Pixel 2856.90
4.16 Segments 1879.41
4.16 Segments Optimized 641.07

As can be seen in these test results, 4.16’s segments approach works much better than 4.15’s pixel one, particularly on meshes with high texture resolution and low chart numbers. For other meshes, containing larger numbers of charts, the improvements are not quite so high – largely due to the high segment count. The asynchronous execution approach that we’re proposing works well on those meshes (e.g. HillTree_02 with more than 9000 charts) because the 8 orientation tests of each chart can be performed in parallel – leading to an eight-fold speed gain.

In addition, the scaling of lightmap UVs generated by each algorithm can sometimes be slightly different. This may be due to the packing bias differences for each chart, or the rounding error of floating-point calculations. Regardless, all three of these approaches afford a dense UV lightmap without overlaps.

By default, the segments method is not enabled on KiteDemo meshes, but resaving these meshes should result in significant speed gains in level load timings. Users can resave the content by unchecking the generate lightmap UVs checkbox in the detail panel->build settings, saving the content, rechecking the box again and saving the content. If you are building the engine from source code, you can add the following line to the beginning of the PackCharts() function:-

LayoutVersion = ELightmapUVVersion::Latest;

to force the engine into using the segments method to pack charts.

Please let me know if this article was helpful. I ‘d love to hear any suggestions about it :).

Credits:
Programming/Research: Yuchen Mei
Mentoring: Gareth Martin, Joe Tidmarsh, Robert Troughton, Lee Clark (Coconut Lizard)
Blog Support: Josef Gluyas

Facebook Messenger Twitter Pinterest Whatsapp Email
Go to Top