
Cache Points for Production-Scale Occlusion-Aware Many-Lights Sampling and Volumetric Scattering

07.27.2024 ADUCG RESEARCH


Yining Karl Li, Walt Disney Animation Studios, Burbank, USA, karl.li@disneyanimation.com

Peter Kutz∗, Adobe, San Francisco, USA, peter.kutz@gmail.com

Charlotte Zhu, Walt Disney Animation Studios, Burbank, USA, charlotte.zhu@disneyanimation.com

Wei-Feng Wayne Huang∗, NVIDIA, Los Angeles, USA, wahuang@nvidia.com

Gregory Nichols∗, Latitude AI, Pittsburgh, USA, greg@nichols.pro

David Adler, Walt Disney Animation Studios, Burbank, USA, david.adler@disneyanimation.com

Brent Burley, Walt Disney Animation Studios, Burbank, USA, brent.burley@disneyanimation.com

Daniel Teece, Walt Disney Animation Studios, Burbank, USA, daniel.teece@disneyanimation.com

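(a) Uniform Light (b) Locally Optimal (c) Cache Points (Ours)

(a) RMSE: 0.2280 Time: 10:16

(b) RMSE: 0.1436 Time: 2:03:51

(c) RMSE: 0.1443 Time: 13:13

Figure 1: A production scene from Us Again containing 4881396 lights (analytical lights, emissive triangles, and emissive volumes), rendered at 32 samples per pixel using uniform light selection (a), locally optimal light selection (b), and our cache points system (c). Uniform light selection produces a faster result but with poor convergence, while building a locally optimal light distribution for every path vertex produces a more converged result but is much slower. Our cache points system (c) produces a noise level similar to (b) while keeping performance close to (a). To show noise differences clearly, this figure does not include the post-render compositing present in the final production frame. © 2024 Disney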

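ABSTRACT

A hallmark capability that defines a renderer as a production renderer is the ability to scale to handle scenes of extreme complexity, including complex illumination cast by a vast number of light sources. In this paper, we present Cache Points, the system used by Disney's Hyperion Renderer to perform efficient unbiased importance sampling of direct illumination in scenes containing up to millions of light sources. Our cache points system includes a number of novel features. We build a spatial data structure over the points that light sampling will occur from, rather than over the lights themselves. We learn occlusion online and incorporate it into our importance sampling distribution. We also accelerate sampling in difficult volume scattering cases.

Over the past decade, our cache points system has seen extensive production usage on every CG feature film and animated short produced by Walt Disney Animation Studios, enabling artists to design lighting environments without concern for complexity. In this paper, we survey how the cache points system is built, how it works, how it impacts production lighting and artist workflows, and its role in the future of production rendering at Disney Animation.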

CCS CONCEPTS

• Computing methodologies → Rendering; Ray tracing.

KEYWORDS

path tracing, global illumination, light selection, importance sampling, volume rendering

ACM Reference Format:

Yining Karl Li, Charlotte Zhu, Gregory Nichols, Peter Kutz, Wei-Feng Wayne Huang, David Adler, Brent Burley, and Daniel Teece. 2024. Cache Points for Production-Scale Occlusion-Aware Many-Lights Sampling and Volumetric Scattering. In The Digital Production Symposium (DigiPro ’24), July 27, 2024, Denver, CO, USA. ACM, New York, NY, USA, 19 pages. https://doi.org/10.1145/3665320.3670993

## 1 INTRODUCTION

A major challenge in production rendering is light sampling in scenes containing anywhere from a handful of lights to hundreds of thousands or even millions of lights. Furthermore, the types of lighting scenarios that arise on any given production are often unpredictable. A key principle of Hyperion's design is to emphasize simplicity over flexibility [Burley et al. 2017]: we try not to burden users with non-artistic controls as much as possible. In accordance with this principle, we prefer systems that are as sufficiently and automatically robust to as many production scenarios as possible; in this paper, we present an in-depth description of our system for guiding direct light sampling in our production scenes, from the simplest to the most complex lighting scenarios. Our system, cache points, builds locally optimal estimates for light sampling weights and incorporates an online learning metric for local light visibility estimates. Our system is able to (1) combine local estimates for analytical lights, emissive geometry, and emissive volumes into a single combined system, and (2) provide unbiased direct light sampling guiding for both surface points and points inside of participating media. Additionally, we have extended our system for use in importance sampling volumetric in-scattering in participating media. While we have previously alluded to our cache points system for solving the many-lights sampling problem [Burley et al. 2018; Fong et al. 2017; Nichols and Eisenacher 2015] and have described cache points for volumetric scattering [Huang et al. 2021]

![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId22.jpg)

Figure 2: In addition to normal scenes with up to hundreds of lights (top), Big Hero 6 featured scenes with anywhere from thousands of lights (middle), to hundreds of thousands of lights (bottom). Our cache points system was designed to efficiently and robustly perform many-lights sampling in all of these scenarios. © 2024 Disney

## 2 MOTIVATION AND RELATED WORK

Motivation. Production renderers often have to handle scenes with extraordinary levels of lighting complexity. Disney’s Hyperion Renderer was first written for the production of Big Hero 6, which featured nighttime city scenes that contained as many as half a million small, bright, directional light sources (Figure 2). Later productions further increased the complexity level by including emissive mesh geometry and emissive volumes, resulting in scenes with millions of discrete light sources if each triangle in each emissive mesh and each emissive volume is considered separately (Figure 1). While sampling direct lighting from individual analytical lights using next event estimation and combining direct lighting samples with BSDF samples via multiple importance sampling [Veach 1998] is now well understood, automatically choosing which light to sample out of potentially millions of lights remains an active area of research.

Manual Light Grouping. One possible solution for managing lighting complexity is to have artists manually cull low-contribution lights from a scene, or to require artists to run pilot renders that gather light usage statistics and then build culling into the production pipeline [Vavilala 2019]. However, in the spirit of seeking simple and efficient user workflows, we prefer to avoid any solution that requires manual intervention or additional pipeline complexity before artists can focus on creative work. Instead, we seek automatic in-renderer solutions to handling many-lights scenarios.

Hierarchical Light Trees. Hierarchical tree data structures for light selection are one of the most commonly used solutions to the many-lights problem today [Fascione et al. 2018; Georgiev et al. 2018; Gospodnetić 2017; Keller et al. 2017; Kulla et al. 2018; Pharr et al. 2023]. The particular variant of a hierarchical light tree used in Kulla et al. [2018] is further described in detail by Conty et al. [2018]. While we do use a hierarchical light tree approach for clustering triangles within individual emissive meshes, we found that a hierarchy-based approach had difficulties with certain use cases for overall global many-lights sampling. Specifically, we noticed that narrow, highly directional IES profile based lights were challenging for light trees to handle. Incorporating complex occlusion information into light trees also proved to be intractable. Overall, we see hierarchical tree data structures as a different but largely complementary approach to our cache points; we discuss this further in Section 5.2.2.

Photon Mapping. The core of our cache points data structure consists of a large number of points scattered throughout the world space of the scene, which are placed in a KD-tree for fast nearest-neighbor lookups. At a high level, this data structure is very similar to a photon map [Jensen 1996, 2001]. However, our cache points differ from photon mapping in how the data structure is built, in what it stores, and in how it is used. Unlike photon mapping, which places photons at points throughout the scene using forwards light tracing from light sources, our cache points are placed throughout the scene using a combination of sampling points on surfaces and backwards path tracing from the camera. Our approach of storing cumulative distribution functions (CDFs) at each point has similarities with Jensen [1995]; however, instead of storing a rough approximation of irradiance for guiding importance sampling of indirect illumination, we store a high-quality estimate of direct illumination for use in many-lights sampling.
Hyperion contains an adaptive photon mapping system [Burley et al. 2018]; we discuss how photon mapping and cache points could potentially be integrated with one another in Section 6.

VPLs. The many-lights sampling problem historically has also been extensively studied as part of virtual point light (VPL) methods [Dachsbacher et al. 2014]. VPL methods focus on solving the indirect illumination problem by discretizing illumination into large numbers of virtual point lights at path vertices; the contributions from these point lights must then be summed. Performing this summing operation without having to loop over every point light is the focus of Lightcuts [Walter et al. 2005] and other related variations [Davidovič et al. 2012; Pantaleoni 2019]. Much like VPL methods, our method’s core data structure is an unstructured collection of points generated in part from path vertices, but beyond this, we consider the problem we are focused on to be orthogonal to the VPL literature. Instead of focusing on gathering illumination from individual light sources, we consider the problem of selecting an individual light out of a large set of possible candidates for next event estimation.

Learning Techniques. At a high level, our approach is very similar to the later work of Vevoda et al. [2018], which expands upon Vevoda et al. [2016]. They similarly use a learning technique to derive light selection probabilities incorporating visibility information. However, the details of their approach differ greatly from ours, including their use of a light hierarchy as the underlying data structure versus our use of unstructured points. Our approach likely also shares high-level similarities with Nichols [2016], although not enough detailed information about their approach is available for us to make a more thorough comparison.

ReSTIR. More recently, reservoir-based spatiotemporal importance resampling, or ReSTIR [Bitterli et al. 2020], has seen much active research.
The ReSTIR family of techniques [Lin et al. 2022; Wyman et al. 2023] builds upon resampled importance sampling [Talbot et al. 2005] to resample lights from candidate sample pools that are rapidly built by sharing samples spatially and temporally based on weighted reservoir sampling [Chao 1982], allowing for rapid, efficient sampling of extremely complex direct lighting scenarios in interactive GPU ray tracing contexts. ReSTIR is reliant on an initial candidate generation strategy for reservoir sampling; Boksansky et al. [2021] and Tokuyoshi et al. [2021] focus on this problem. Another recent work comparable to ReSTIR is Dittebrandt et al. [2023]. We see our method as potentially complementary to ReSTIR and similar approaches; if our method were to be adapted into a progressive format, it could serve as an initial candidate generation strategy, although additional consideration would need to be given to dynamic lights.

Path Guiding. Path guiding techniques such as OLPMM [Vorba et al. 2014], Practical Path Guiding [Müller 2019; Müller et al. 2017], Zero-Variance Random Walk Guiding [Herholz et al. 2019], and others [Guo et al. 2018; Herholz et al. 2016; Rath et al. 2020; Ruppert et al. 2020; Vorba et al. 2020] all rely on learning about illumination throughout the scene in an online process and building some variant of a spatial acceleration structure to guide sampling of indirect illumination. Path guiding techniques are distinct from many-lights sampling techniques in that path guiding focuses on indirect illumination, whereas many-lights sampling techniques focus on direct illumination; generally, path guiding and many-lights sampling techniques are orthogonal but complementary to each other. In Hyperion, we use a combination of our cache points system for direct illumination and optionally Practical Path Guiding for indirect illumination.
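## 3 CACHE POINTS FOR MANY-LIGHTS SAMPLING

The cache points system begins with a simple observation: if computation time and memory were not a concern, then the theoretically optimal light selection strategy (in terms of minimizing noise) would be to estimate, for every path vertex, the contribution of every single light while accounting for occlusion, and then to build a single probability distribution out of all of these estimates from which to draw a light to sample. We refer to this theoretically optimal strategy as locally optimal light selection. In practice, this approach is infeasible for all but the simplest scenes because of the computation time required; early tests on Big Hero 6 using this approach produced renders with extremely low noise levels, but render times were dominated by light selection. Furthermore, accounting for occlusion when building a locally optimal light selection strategy per path vertex means that the computation time required per path vertex can be highly variable; in cases with many lights and complex occlusion, the computation time can be orders of magnitude higher than that of a simple strategy (as seen in Figure 1), while in other cases it may be only a small multiple of that of a simple strategy (as seen in Figure 4).

![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId27.jpg)

(a) Production scene

![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId30.jpg)

(b) Cache point positions overlaid in yellow

Figure 3: A production scene from Encanto (a) containing 38720 lights (analytical lights, emissive triangles, and emissive volumes), sampled using cache points (b). Initial cache point placement suggested 388888 candidate cache point positions, which were pruned down to 48364 final cache point positions. For illustrative purposes, in this figure we only show cache point positions on surfaces; in practice, we also populate cache points volumetrically throughout space to support light sampling in participating media. © 2024 Disney

Instead of building a perfect locally optimal light sampling distribution at every path vertex, our system focuses on caching and reusing local light selection information in a sparse point-based data structure. Conceptually, our approach can be thought of as a lossy compression of the perfect locally optimal light sampling distribution. Since we use this distribution for sampling rather than for directly evaluating the direct lighting term, using an approximation of the locally optimal light sampling distribution remains unbiased, but trades increased noise for less computational work and memory usage. Our goal is therefore to get as close as possible to the quality of the perfect sampling distribution while using as little computational work and memory as possible.

## 3.1 Building and Initializing the Cache Points Data Structure

Building and initializing the cache points system proceeds through the following steps:

1) Generate an initial set of candidate spatial positions for placing cache points
2) Merge spatially similar candidate positions
3) Build a light distribution for each cache point
4) Merge neighboring cache points with similar light distributions
5) Blur light distributions across cache points

In this section, we describe each of these steps in detail.

3.1.1 Initial Candidate Point Generation. Our system begins by generating an initial set of 100,000 candidate points, randomly distributed within the respective bounding boxes of all objects in the scene. Candidate points are distributed in proportion to bounding box volume. Note that we do not place candidate points in the bounding boxes of objects where light sampling does not take place, such as area lights or objects with zero reflectance. We then trace a small number of pilot paths from the camera through the scene and randomly select a subset of path vertices from these pilot paths; additional candidate points are placed at these path vertices. This includes paths that enter and scatter inside of volumes.

To ensure a fixed upper bound on the amount of memory that the cache points system will use, we limit the number of candidate points generated from pilot path vertices to 1,000,000. This strategy gives us a good mix of points on surfaces and points distributed through empty space to account for participating media.

If our initial seeding process generates fewer than 10 cache points, we do not proceed with the rest of the cache point construction process and instead simply fall back to computing a perfect locally optimal light distribution for every path vertex.

Before proceeding further, we spatially sort all of these initial candidate points (up to 1.1 million) and assign each candidate point an index. We then place the candidate points into an initial KD-tree, which allows us to perform nearest-neighbor lookups for pruning the candidate point set down to the final set of points on which we will build light distributions. Next, we prune the candidate point set using a two-pass approach; the first pass takes place before we build any light distributions, and the second pass takes place after we have built a light distribution for every point that survives the first pass.

3.1.2 Merging Spatially Similar Points. In the first pruning pass, we merge points that fall within a minimum radius of an existing point. Our minimum radius heuristic is as follows: for each candidate point, we use a kNN search to find the 20 nearest neighbors within a 1e-5 unit radius and merge these points into a single surviving candidate point. We carry this out by first running a parallel loop that, for each point, finds its 20 nearest neighbors and then finds the point with the lowest index value; all points other than the one with the lowest index are atomically marked for deletion. Next, we run a serial loop that removes all points marked for deletion and rebuilds the KD-tree with the surviving points. The use of 20 as the number of neighbors in this step is somewhat arbitrary; empirically, this number seems to do well at balancing lookup efficiency (through avoiding bad clustering and reducing the total point count) against the time spent distributing and pruning points.

3.1.3 Building Light Distributions. Next, we build a light distribution at each point that survives the first pass. The light distribution at each cache point actually consists of two separate distributions: a nearby light distribution and a far light distribution. We use several metrics to determine whether a given light $e$ is near or far relative to a cache point $p$. First, we always treat infinite lights (dome lights, distant/solid-angle lights) as far. All finite lights that subtend a large solid angle relative to the cache point position are considered near to the cache point; we use the following fast heuristic to determine this: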

$$ \text{isNear}(e,p) = \text{distance}(x_{e},x_{p})^{2} \leq (r_{p} \cdot D)^{2} \quad (1) $$

where $x_{e}$ is the centroid of the light, $x_{p}$ is the position of cache point $p$, $r_{p}$ is the radius of cache point $p$, and $D$ is an adjustment term for the cache point separation distance, which we empirically determined should be set to 4.0 for best results.

We separate nearby lights from more distant lights because the irradiance from nearby lights can vary highly with respect to position and surface normal direction. The nearby light distribution places all of its members into a single bin, whereas the far light distribution actually consists of seven bins, corresponding to seven virtual sensors at the cache point position: six oriented planes facing the cardinal directions and one omnidirectional receiver centered at the point. We loop over all lights in the scene and estimate the total contribution that each light makes to each sensor while ignoring occlusion, and in each bin we store a list of between 4 and 256 lights that account for 97% of the energy arriving at the point [Shirley et al. 1996].

We do not precompute anything for the nearby light distribution; instead, at render-time we build a light selection PDF on-the-fly for each path vertex over the irradiance contributions from the nearby lights to the exact path vertex location. We describe this in more detail in Section 3.3. For each light in the six cardinal bins in the far light distribution, we directly store a precomputed irradiance estimate at the cache point location. For each light in the omnidirectional bin in the far light distribution, we directly store a precomputed direct fluence estimate at the cache point location. The omnidirectional bin serves a special purpose: for surfaces that have a well defined normal, estimating irradiance makes sense since irradiance is integrated over an oriented 2D surface, but for cases where a well defined normal is either difficult or impossible to define, we rely on a direct fluence estimate instead since direct fluence is integrated over a 3D sphere. Since curve-based hair tends to have extremely complex and rapidly changing surface orientations and volumetric participating media has no defined surface normal, we use direct fluence for driving light sampling for curves and volumes. At this stage we only estimate these values for each light but defer building CDFs and PDFs until render-time cache point lookup.
There are two reasons for this: first, as mentioned earlier, for nearby lights we can build a higher-quality sampling distribution on-the-fly at render-time, and second, Hyperion supports a sophisticated light linking system, which means that we cannot determine which lights to exclude from a light distribution until render-time evaluation of light linking relationships has been carried out.

3.1.4 Merging Neighboring Points with Similar Light Distributions. After building a light distribution at each cache point, we perform a second pruning step that merges nearby cache points that share similar light distributions. As in the first pruning step, for each cache point $p$, we gather the nearest 20 neighbors within a 1e-5 unit radius. We then calculate an average similarity metric $M_{avg}$ between cache point $p$’s light distribution and its nearest neighbors’ light distributions. The similarity metric is calculated as follows: for two given sets of lights $A$ and $B$, we find the intersection of $A$ and $B$ (meaning the set of lights that are common to both sets) and then calculate the similarity metric $M$ as:

$$ M = \frac{2 \cdot \text{size}_{\text{weighted}}(\text{intersection}(A,B))}{\text{size}(A) + \text{size}(B)} \quad (2) $$

We denote the size of the intersection of set $A$ and set $B$ as being weighted because we need to take into account the possibility that a given light may exist in both sets $A$ and $B$ but have different assigned probabilities in each set. So, instead of just counting up the number of lights in $\text{intersection}(A,B)$, we instead calculate a similarity percentage $S$ for each light, which we define as:

$$ S = 1 - \frac{\left| P_{B} - P_{A} \right|}{2 \cdot \left( \frac{P_{A} + P_{B}}{2} \right)} \quad (3) $$

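where $P_{A}$ and $P_{B}$ denote the probabilities for a given light in sets $A$ and $B$, respectively. This definition for $S$ works well so long as the probability for any given light is non-negative, which we guarantee since Hyperion’s lights only permit positive emission values; for systems where lights support negative emission values [Foundation 2024], a modified metric would be required. The weighted size of the intersection of sets $A$ and $B$ is then defined as the sum of $S$ for every light in the intersection. This approach for calculating a similarity metric between light distributions is somewhat ad-hoc, but in practice we have found that it works well. $M_{avg}$ is then defined as simply the sum of $M$ for every nearest neighbor cache point to $p$ divided by the number of nearest neighbors found.

Next, we calculate the average distance between cache point $p$ and its gathered nearest neighbors; we use this average distance as an initial guess for cache point $p$’s radius. We then adjust the cache point’s radius to account for how similar the cache point is to its nearest neighbors; this is done by simply multiplying the radius by $M_{avg}$. Since a smaller $M_{avg}$ value indicates that the light distribution at $p$ is relatively dissimilar to those of its nearest neighbors, meaning that the direct illumination radiance field in that region of space varies at a higher frequency, shrinking $p$’s region of influence makes sense, and vice versa. To prevent radius shrinkage from impacting the effectiveness of cache points, we limit the shrinkage to 25% of the original world-space size and guarantee a minimum projected screen-space size (equivalent to 3 pixels).

![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId34.jpg)

(a) Uniform Light (b) Locally Optimal (c) Cache Points (Ours)

![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId37.jpg)

(a) RMSE: 0.0896 Time: 12:02 (b) RMSE: 0.0321 Time: 17:48 (c) RMSE: 0.0320 Time: 15:01

Figure 4: A production scene from Encanto containing 38720 lights (analytical lights, emissive triangles, and emissive volumes) with complex occluding geometry and opacity masks, rendered at 32 samples per pixel using uniform light selection (a), locally optimal light selection (b), and our cache points system (c). In this case, building locally optimal distributions on the fly performs relatively well, while our cache points system is no worse in sampling quality and requires less render time. © 2024 Disney

Finally, we loop over the nearest neighbors previously found within $p$’s adjusted radius; for this subset of nearest neighbors, we keep the point with the lowest index and atomically mark all other points in the subset for deletion. Since the radius is adjusted according to the similarity between cache point light distributions, we consider points lying within the adjusted radius to have sufficiently similar light distributions that they can simply be merged.

All of the above operations in the second pruning step are carried out in a parallel loop over all cache points; a serial loop is then run to delete all cache points marked for deletion. Once we have our final set of cache points, we rebuild the KD-tree one last time for use during rendering.

3.1.5 Blurring Light Distributions Across Cache Points. After building light lists at each cache point and merging points with similar light distributions, the final step in building our cache points data structure is to blur, or aggregate, the far light distributions across cache points. This blurring step causes each cache point’s far light distribution to be influenced by the far light distributions of neighboring cache points, making the light distributions of all cache points more spatially conservative; this step is what allows us to safely look up only a single cache point per path vertex during path tracing. We do not blur nearby light distributions, since nearby light distributions typically vary at a higher frequency between neighboring cache points than far light distributions do.

For a given cache point $p$, the blurring step begins by finding the 16 nearest neighboring cache points to $p$ via kNN search. Note that in earlier steps we used 20 neighbors for kNN search operations, but in this step we choose 16; the reasoning for choosing 16 is as follows. In three dimensions, the maximum number of equally sized spheres that can be densely packed around another sphere of the same size is 12 [Dai et al. 2019; Hales et al. 2017], in either a face-centred cubic or hexagonal close packing configuration [Conway and Sloane 1999]. However, since our cache points have variable radii, we add an additional 4 points to the maximum perfect packing number of 12 as an empirically determined adjustment factor.

After gathering the 16 nearest neighbors, we next find the distance $d_{far}$ to the farthest gathered neighbor; we use $d_{far}$ to determine the relative contribution of each gathered cache point to cache point $p$. We assign cache point $p$ a relative weight of 1 and also assign the nearest gathered neighbor a relative weight of 1, while the farthest gathered neighbor is assigned a relative weight of 1/16. For a point $p_{n}$ between the nearest and farthest points, the relative weight is assigned as follows: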

$$ \text{weight}\left( p_{n} \right) = \text{mix}\left( 1,\frac{1}{16},\frac{\text{distance}\left( x,x_{n} \right)^{2}}{\left( d_{far} \right)^{2}} \right) \quad (4) $$
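A minimal sketch of the relative-weight computation in Equation 4, followed by normalization so that the weights sum to one. It assumes `mix(a, b, t)` is ordinary linear interpolation; the function names and the list-based layout are hypothetical and chosen only for illustration, not taken from Hyperion.

```python
def mix(a, b, t):
    """Linear interpolation (assumed meaning of mix): a at t=0, b at t=1."""
    return a * (1.0 - t) + b * t


def blur_weights(neighbor_distances):
    """Return (self_weight, neighbor_weights), normalized to sum to 1.

    neighbor_distances holds the distance from cache point p to each of its
    gathered neighbors (16 of them in the text above).
    """
    d_far = max(neighbor_distances)
    raw = []
    for d in neighbor_distances:
        # Squared-distance ratio from Equation 4; guard against d_far == 0.
        t = (d * d) / (d_far * d_far) if d_far > 0.0 else 0.0
        raw.append(mix(1.0, 1.0 / 16.0, t))
    total = 1.0 + sum(raw)  # cache point p itself has relative weight 1
    return 1.0 / total, [w / total for w in raw]
```

Because every neighbor receives a raw weight of at least 1/16, sixteen neighbors together always carry a raw weight of at least 1, which matches the weight of the cache point itself.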

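We then normalize the relative weights so that they sum to 1; the reason the farthest point is assigned a relative weight of 1/16 is to ensure that, after normalization, the 16 gathered neighbors of cache point $p$ account for at least half of $p$’s final blurred light distribution. For each of the 16 gathered neighbors, we then take all of the lights from that neighbor’s light distribution, multiply them by the neighbor’s normalized relative weight, and add those lights to $p$’s light distribution. After blurring light distributions across neighboring points, we leave the light distributions un-normalized and do not yet compute CDFs; we defer this step until light sampling actually takes place, since we interpolate light distributions between the cardinal direction bins.

![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId40.jpg)

![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId43.jpg)

Figure 5: A production scene from Encanto containing 4406 lights (analytical lights, emissive triangles, and emissive volumes) with complex occlusion, rendered at 32 samples per pixel using uniform light selection (a), cache points without learned visibility estimates (b), and cache points with learned visibility estimates (c). In this scene, using learned visibility estimates improves RMSE by 9.3% while adding almost no additional render time. © 2024 Disney

Note that in this step, in order to allow for parallel processing and to avoid repeatedly blurring the same points, we cannot carry out the blur in place in memory; instead, we need to write the output blurred cache points into a new set of cache points and then swap memory.

## 3.2 Online Learning of Visibility Estimates

3.2.1 Visibility Estimation by Tracking Sample Ratios. Cache points are initially built without considering occlusion information; during rendering, we improve sampling by learning a per-light visibility estimate at each cache point. Our approach is conceptually somewhat similar to importance caching [Georgiev et al. 2012]. Each time we select a light from a cache point, we increment an internal sample attempt counter for that light within that cache point. We then perform direct lighting, and in the event that the sample successfully reaches the light and receives a useful light contribution, we atomically increment an internal successful sample counter for that same light within that same cache point. Hyperion uses a batched wave-front path tracing architecture where the rendering process is divided up into a number of discrete iterations [Eisenacher et al. 2013]; between iterations, we use the ratio $R$ between successful samples $H_{success}$ and total sample attempts $H_{total}$ towards a given light to adjust the sampling weight of that light; lights with a lower ratio of successful sampling attempts are weighted down while lights with a higher ratio are weighted up. This process effectively corrects for cases where a bright light is initially identified as being important to a particular region of space but ends up not being important due to shadowing. The specific mechanism we use to assign every light $e$ in a cache point $p$ a visibility weight $W(p,e)$ based on the successful sampling attempt ratio $R$ is as follows: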

$$ R(p,e) = \frac{H_{success}(p,e) + 1}{H_{total}(p,e) + 1} \quad (5) $$

$$ W(p,e) = \begin{cases} \left( \frac{R(p,e)}{R_{min}} \right)^{2} & R(p,e) \leq R_{min} \\ 1.0 & \text{otherwise} \end{cases} \quad (6) $$
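Equations 5 and 6 can be sketched directly in code. This is an illustrative sketch only: the function names are hypothetical, the counters are passed in as plain numbers rather than the atomically updated per-light counters the text describes, and $R_{min}$ is set to 0.04, the default value the authors report.

```python
R_MIN = 0.04  # default minimum ratio threshold reported in the text


def visibility_ratio(successes, attempts):
    """Equation 5: the +1 terms keep the ratio strictly positive."""
    return (successes + 1.0) / (attempts + 1.0)


def visibility_weight(successes, attempts, r_min=R_MIN):
    """Equation 6: quadratic down-weighting at or below r_min, else 1.0."""
    r = visibility_ratio(successes, attempts)
    if r <= r_min:
        return (r / r_min) ** 2
    return 1.0
```

For a light that has been tried 99 times from a cache point without a single successful sample, the ratio is 1/100 and the weight is (0.01 / 0.04)^2 = 0.0625; the weight shrinks as failed attempts accumulate but stays above zero.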

$R_{min}$ is a minimum ratio threshold; when the ratio falls below $R_{min}$, we consider light $e$ to be effectively fully occluded at cache point $p$ and therefore a candidate for down-weighting. Based on production experience, we have experimentally determined that 0.04 is a good default value for $R_{min}$. Importantly, since cache points are distributed relatively sparsely and cannot capture high-frequency shadowing detail, in order to maintain unbiased results we must ensure that the visibility metric never drives a light’s selection probability completely to zero [Ward 1991]. To guarantee that down-weighting follows a relatively aggressive curve but never reaches exactly zero, we choose a quadratic falloff (Equation 6). A single scalar visibility weight per light and quadratic falloff are relatively simple, but even so, we have found that in complex shadowing cases, the use of our visibility metric combined with the relatively high density of cache points can noticeably improve noise over no visibility metric at all. For example, in a room lit by sunlight through small windows, our visibility estimate significantly reduces noise by weighting down the otherwise high selection probability for a bright sun in most of the room, while leaving the selection probability high in small pools of sunlight.

3.2.2 Blurring Visibility Estimates Across Cache Points. After we calculate an individual visibility weight per light per cache point, we then blur the visibility weights between cache points in a manner analogous to the light distribution blurring process described in Section 3.1.5. Using the same rationale as in Section 3.1.5, we choose the 16 closest points to the current point $p$ and assign the gathered neighbors relative weights using Equation 4. We sum up the relative weights of all of the neighbors plus 1.0 for $p$ and invert this sum to produce a normalization term that we multiply all of the relative weights by. For each light $e$ in $p$’s light distribution, we multiply $e$’s visibility weight $W(p,e)$ by $p$’s normalized relative weight, and then we loop through all of the neighbors; if a neighboring point’s light distribution also contains that light, we take the visibility weight $W(p_{n},e)$ for that light in that neighbor, multiply it by that neighbor’s normalized relative weight, and add the result to $p$’s visibility weight for $e$.
Blurring the visibility weights allows us to prevent sharp discontinuities in noise at the boundary between cache points when the visibility term to a given light changes dramatically between neighboring cache points.

The effect of the online learned visibility estimate system is that renders with complex occlusion initially converge slowly, but as the learning system’s quality improves during the renderer’s initial iterations, the convergence rate improves as the renderer is able to make better and better light selection decisions. Since the visibility estimate is just another metric that feeds into the cache point update mechanism, utilizing visibility estimates adds little to no additional overhead to the system (Figure 5).

## 3.3 Light Selection from Cache Points

At each path vertex, we perform light selection by first selecting the nearest cache point to the path vertex via kNN search; because the light distribution blurring step in the build process results in every cache point already containing information from its neighboring cache points, we can get away with only selecting a single cache point per path vertex. For volumetric scattering, in order to avoid excessive or redundant cache point lookups, we also store the previously used cache point and re-use it if the current path vertex is still within that cache point’s radius.

For a path vertex on a regular surface, we loop over the lights in the cache point’s nearby light distribution and evaluate the irradiance contribution for the exact path vertex location, and for lights in the further away cardinal bins, we approximate the irradiance by using the stored precomputed irradiance estimate at the cache point location. The cardinal bins are combined, weighting by similarity between each bin’s cardinal direction and the path vertex’s shading normal.
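One way to picture the combination of the six cardinal bins is the sketch below. The text does not specify the similarity measure, so this sketch assumes a clamped dot product between the shading normal and each cardinal direction, normalized so that the bin weights sum to one. The bin layout (one dictionary per direction mapping a light identifier to its stored irradiance estimate) and all names are hypothetical.

```python
CARDINAL_DIRECTIONS = [
    (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
    (0.0, 0.0, 1.0), (0.0, 0.0, -1.0),
]


def cardinal_bin_weights(normal):
    """Normalized per-bin weights from a unit shading normal.

    Assumption: similarity is max(0, dot(normal, direction)).
    """
    raw = [max(0.0, sum(n * d for n, d in zip(normal, direction)))
           for direction in CARDINAL_DIRECTIONS]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("shading normal must be non-zero")
    return [w / total for w in raw]


def combine_cardinal_bins(bins, normal):
    """Blend six {light_id: irradiance} bins into one un-normalized list."""
    combined = {}
    for bin_weight, bin_lights in zip(cardinal_bin_weights(normal), bins):
        if bin_weight == 0.0:
            continue  # bins facing away from the normal contribute nothing
        for light_id, irradiance in bin_lights.items():
            combined[light_id] = combined.get(light_id, 0.0) + bin_weight * irradiance
    return combined
```

With a normal pointing straight along +Z, only the +Z bin contributes, so the combined list equals that bin's stored estimates; a tilted normal blends the estimates of up to three bins.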
The weights for the cardinal bins are normalized to add up to one, which makes sure that the combined weighted cardinal bins still produce a normalized light distribution. We then assign a weight to each light in our combined list of lights from the near and far distributions; we directly use each light’s irradiance estimate as its un-normalized sampling weight. We then remove any lights that are disabled for the current path vertex through light linking and add back in any lights that are marked as exclusive to the current path vertex via light linking. Finally, we multiply the un-normalized sampling weight for each light by the cache point’s learned visibility weight for that light.

![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId48.jpg)

Figure 6: A simple 2D example of a cache point system and a ray looking up a light distribution from a cache point. Each yellow dot is a cache point, with the radius of each cache point indicated by the outer yellow circle surrounding the point; each cache point is labeled with a letter. Cache points D, E, and F were placed on the surface of the grey object by path vertices during the initial cache point population phase, while cache points A, B, and C were placed in empty space by being distributed inside of a bounding box. A yellow dotted line from each cache point leads to the most important light in the cache point’s light distribution. Lights 1, 2, and 3 are all equal intensity and the same size and shape. Cache points A, C, and D’s most important light is the closest light, Light 2, while cache points B and E’s most important light is their closest light, Light 1. Although Light 1 is closer to cache point F, cache point F’s visibility estimate over time has learned that Light 1 is occluded from cache point F, so cache point F’s most important light is now Light 3.
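When a ray enters the scene and hits the grey object at path vertex G, the renderer looks up the closest cache point to G via kNN search through the cache point KD-tree and finds cache point B. From cache point B, the renderer then learns that the most important light to sample at path vertex G is Light 1.

For a path vertex that belongs to volumetric scattering or that is on a curve with a hair shader [Chiang et al. 2016], we instead calculate the direct fluence for the exact path vertex location for nearby lights and use the omnidirectional bin in the far light distribution to get precomputed direct fluence estimates for far lights. The direct fluence estimates are then used as the un-normalized sampling weights, and then we apply the same light linking and visibility weight adjustments as in the regular surface case.

A CDF and PDF are then generated from the sampling weights, and the resulting probability distribution is used to select a light for next event estimation. Since the light distribution at each cache point accounts for at most 97% of the energy arriving at that point, we also give every path vertex a small probability of selecting a light uniformly at random from all of the lights in the scene. Because each cache point typically contains only a relatively small number of nearby important lights, scenes containing large numbers of lights become tractable to render efficiently, since the number of lights that the locally optimized light selection distribution needs to consider can be bounded by a small fixed maximum. Figure 6 shows a simple 2D example of a small cache point distribution and of the renderer using cache points to decide which light to importance sample at a path vertex.

An interesting side effect of our cache points approach is that, even though initializing the cache points system requires a small amount of additional overhead and looking up a cache point at each path vertex also has a cost, renders using cache points can sometimes still be competitive in render time with renders that use a suboptimal but computationally cheaper light selection strategy; Figure 5 shows such a case. This effect tends to appear in scenes with extremely complex occluding geometry; cache points can effectively guide the renderer away from needlessly casting shadow rays through complex occluding geometry, improving overall shadow ray traversal performance. Occluding geometry with opacity masks tends to make this performance difference even more pronounced, since evaluating opacity masks adds shading complexity on top of the added shadow ray traversal complexity.

At this point, the advantage of our approach in handling infinite lights becomes clear. In hierarchy-based approaches, infinite lights are typically a special case that must be handled in a special way, since infinite lights inherently have no definable surface area or spatial position, cannot have a well-defined bounding box, and therefore are not easily incorporated into a BVH or any other kind of spatial acceleration structure. As a result, hierarchy-based approaches typically must define some mechanism for deciding the ratio between light samples sent to finite lights in the light hierarchy and light samples sent to infinite lights. However, since our approach builds a spatial data structure over the points that we want to sample lights from rather than over the lights themselves, we can treat infinite lights just like any other finite light type in the far light distribution, and correctly weight infinite lights in the final per-path-vertex light selection PDF.

## 4 CACHE POINTS FOR VOLUMETRIC SCATTERING

In addition to using cache points to learn optimal local light selection distributions, we also use the cache points system to learn distributions for importance sampling volumetric in-scattering. Combined with extensions we have made to null-collision theory for efficiently gathering emission from heterogeneous volumes, our cache points strategy for volumetric in-scattering allows us to efficiently render previously difficult cases, such as low-extinction, high-emission volumes embedded in optically thin media (for example, flames), or thin anisotropic media with highly directional direct lighting (for example, god rays and light beams).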
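![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId52.jpg)

Figure 7: Production frames from some of our recent films depicting scenes containing large numbers of lights. Our artists routinely create large numbers of lights for effects ranging from sprawling cityscapes to magical sparkles and particles; cache points allow us to render all of these scenes efficiently. © 2024 Disney

![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId55.jpg)

![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId58.jpg)

Figure 8: Scattering sample weight calculation (a) and the distance sampling CDF built on the fly from cached PDFs (b).

## 4.1 Null-collision Review

To best describe how we use cache points for importance sampling volumetric in-scattering, we must first briefly review the role that in-scattering plays in volume rendering. We begin by reviewing the null-collision integral formulation of the volume rendering equation, which computes the radiance at position $\mathbf{x}$ along direction $\omega$ over a distance $d$: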

$$ L(\mathbf{x},\omega) = \int_{0}^{d}\bar{T}(\mathbf{x},\mathbf{y})\left( \mu_{a}(\mathbf{y})L_{e}(\mathbf{y},\omega) + \mu_{s}(\mathbf{y})L_{s}(\mathbf{y},\omega) + \mu_{n}(\mathbf{y})L(\mathbf{y},\omega) \right)dt, \quad (7) $$

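where $\mathbf{y} = \mathbf{x} - t\,\omega$. Null-collision techniques add fictitious null particles to a heterogeneous volume, producing a fictitious homogeneous volume through which free-flight distances along a ray can be sampled analytically via a PDF proportional to the combined transmittance $\bar{T}$, which is formed from a constant combined extinction coefficient $\bar{\mu}$. $\bar{\mu}$ is in turn the sum of the volume coefficients for the three possible event types: absorption $\mu_{a}$, scattering $\mu_{s}$, and null-collision $\mu_{n}$. A Monte Carlo estimator can then be used to selectively evaluate these events with probabilities $P_{s}$ (for scattering events), $P_{a}$ (for absorption events), and $P_{n}$ (for null-collision events).

## 4.2 Volumetric In-scattering Sampling

We use the cache points system to learn an important term in Equation 7: the product of scattering $\mu_{s}$ and in-scattered radiance $L_{s}$, which together give the result of volumetric in-scattering. During cache point initialization, as we loop over every light for every cache point, in addition to estimating the total contribution of each light at each cache point, we also compute a scattering sample weight $s$, which approximates the integral over solid angle of the product of $\mu_{s}(\mathbf{y})$, the incident radiance $L(\mathbf{y},\omega^{\prime})$, and the phase function $\rho(\mathbf{y},\omega,\omega^{\prime})$, where $\mathbf{y}$ is a randomly sampled position within the cache point’s radius of influence, and $\omega^{\prime}$ and $\omega$ are the directions from $\mathbf{y}$ to a randomly sampled position on the light and to the camera origin, respectively (Figure 8a). This weight $s$ is stored per light per cache point (Figure 8b).

During volumetric path tracing, we query the nearest cache points along the ray and use the weight $s$ stored in each cache point to build a piecewise-linear 1D CDF from which to draw scattering samples (Figure 8b). Since building this 1D CDF can represent a significant per-ray overhead in some cases, we currently only use this sampling strategy for direct lighting samples along camera rays. Restricting the use of this technique to camera rays still lets us significantly improve visually prominent single-scattering effects while keeping the overall performance overhead low.

To efficiently render cases such as high-order scattering in optically thick volumes, we combine our technique with conventional transmittance-based sampling using multiple importance sampling (MIS) [Miller et al. 2019]. We start by using the 1D CDF to pick a scattering point $x_{k}$, and then we use ratio tracking moving towards the scattering point to update the path’s PDF. This process is repeated until distance sampling steps the ray through the selected scattering point; we can represent this process as: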

$$
p_{cachepoint}(\bar{x}) = p_{select}(x_{k})\,\bar{T}(x_{0},x_{1})\,\bar{\mu}(x_{1})\,\bar{T}(x_{1},x_{2})\,\bar{\mu}(x_{2})\,\bar{T}(x_{2},x_{3})\ldots\bar{\mu}(x_{k-1})\,\bar{T}(x_{k-1},x_{k}) \qquad (8)
$$

where each $\bar{T}(x_{n},x_{n+1})\bar{\mu}(x_{n})$ is the result of step $n$. To formulate the same path using null-collision tracking to get the PDF, we use the sampled distance and $\bar{\mu}$ to compute $\bar{T}$, and we already know $P_{n}$ and $P_{s}$ based on our choice of tracking algorithm. For all of the path vertices found before our selected scattering point, we apply the PDF $P_{n}$, repeat the distance sampling process, and update the corresponding PDFs until we reach the selected scattering point and apply the PDF $P_{s}$; this process gives us:

$$
p_{null}(\bar{x}) = \bar{T}(x_{0},x_{1})\,\bar{\mu}(x_{1})\,P_{n}(x_{1})\,\bar{T}(x_{1},x_{2})\,\bar{\mu}(x_{2})\,P_{n}(x_{2})\ldots\bar{T}(x_{k-1},x_{k})\,\bar{\mu}(x_{k})\,P_{s}(x_{k}) \qquad (9)
$$

![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId63.jpg) ![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId66.jpg) ![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId69.jpg)

Figure 9: A scene consisting of bright lights embedded in a heterogeneous volume with low (top) and high (bottom) extinction coefficients. Null-collision tracking alone (left) does not work well with thinner volumes; our cache points based technique (middle) performs well with thin volumes but has trouble with thicker volumes. Combining both techniques through multiple importance sampling (right) efficiently samples both the thin and thick volume cases. Results shown are equal sample. © 2024 Disney

While the path PDFs represented by Equations 8 and 9 look very long, most of the terms cancel out to form a much simpler final expression for the MIS weight:

$$
p_{null}(\bar{x}) : p_{cachepoint}(\bar{x}) = P_{n}(x_{1})\ldots P_{n}(x_{k-1})\,\bar{\mu}(x_{k})\,P_{s}(x_{k}) : p_{select}(x_{k}) \qquad (10)
$$
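
To make the cancellation concrete, the following Python sketch evaluates both path PDFs for a short path with made-up numbers and compares their ratio against the compact form of the ratio above. The function names, the sample values, and the use of the balance heuristic for the final weight are illustrative assumptions; the text does not state which MIS heuristic Hyperion uses, and this is not the production implementation.

```python
import math

def path_pdf_cachepoint(p_select, t_bar, mu_bar):
    """Cache-point path PDF: p_select times every segment transmittance,
    times mu_bar at every vertex before the selected scattering point.
    t_bar[i] is the transmittance of segment (x_i, x_{i+1}); mu_bar[i] is
    the combined extinction at vertex x_{i+1}."""
    k = len(t_bar)
    pdf = p_select
    for i in range(k):
        pdf *= t_bar[i]
        if i < k - 1:  # no mu_bar factor at the selected vertex x_k
            pdf *= mu_bar[i]
    return pdf

def path_pdf_null(t_bar, mu_bar, p_null_events, p_scatter_k):
    """Null-collision path PDF: each step contributes T_bar * mu_bar times the
    event probability (P_n before x_k, P_s at x_k)."""
    k = len(t_bar)
    pdf = 1.0
    for i in range(k):
        event_prob = p_null_events[i] if i < k - 1 else p_scatter_k
        pdf *= t_bar[i] * mu_bar[i] * event_prob
    return pdf

def compact_ratio(mu_bar_k, p_null_events, p_scatter_k, p_select):
    """Simplified ratio p_null : p_cachepoint after cancellation."""
    return math.prod(p_null_events) * mu_bar_k * p_scatter_k / p_select

# Hypothetical three-segment path (k = 3) with a constant majorant of 2.0.
T_BAR = [0.8, 0.6, 0.9]
MU_BAR = [2.0, 2.0, 2.0]
P_NULL = [0.7, 0.5]      # P_n at x_1 and x_2
P_SCATTER_K = 0.3        # P_s at x_3
P_SELECT = 0.4           # probability of picking x_3 from the 1D CDF

full_ratio = (path_pdf_null(T_BAR, MU_BAR, P_NULL, P_SCATTER_K)
              / path_pdf_cachepoint(P_SELECT, T_BAR, MU_BAR))
short_ratio = compact_ratio(MU_BAR[-1], P_NULL, P_SCATTER_K, P_SELECT)

# Balance heuristic (an assumption): weight for the cache point strategy.
w_cachepoint = 1.0 / (1.0 + short_ratio)
```

Both `full_ratio` and `short_ratio` agree, so only the per-vertex event probabilities, the majorant at the scattering point, and the selection probability need to be tracked in order to form the weight.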

We demonstrate our technique working in conjunction with conventional null-scattering through MIS in an equal-sample comparison in Figure 9 and in an equal-time production comparison in Figure 10. Compared to equi-angular sampling [Kulla and Fajardo 2012], our cache points based method performs better when rendering highly anisotropic volumes since our approach effectively factors in the phase function term. Additionally, our approach bypasses the need to sample a light vertex before performing distance sampling; instead, we rely on the cache points system's scattering sample weight B as a global estimate of direct illumination.

![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId73.jpg)

Figure 10: A production scene from Raya and the Last Dragon containing a small bright light source embedded in a thin heterogeneous volume. Equal-time comparison of a conventional null-collision approach utilizing spectral-decomposition tracking (left) and incorporating our cache point based in-scattering sampling via MIS (right). Combining our cache points based in-scattering sampling with null-collision tracking produces a robust technique that works well both further from (top two rows) and closer to (bottom row) small bright light sources. © 2024 Disney

## 4.3 Volumetric Emission Sampling

In cases where highly emissive heterogeneous volumes are embedded in thin anisotropic media, our cache points approach to sampling volumetric in-scattering needs to be combined with a method for efficiently gathering volumetric emission. The need for an additional emission gathering method can be seen from how $\bar{\mu}$ is usually chosen: as a majorant of $\mu_{t} = \mu_{a} + \mu_{s}$, with medium events sampled with probabilities $P_{t} = \frac{\mu_{t}}{\bar{\mu}}$, $P_{a} = \frac{\mu_{a}}{\bar{\mu}}$, and $P_{s} = \frac{\mu_{s}}{\bar{\mu}}$. In cases where the volume's emission function is strongly uncorrelated with its extinction function, such as in highly emissive volumes with low extinction, null-tracking is likely to skip over potentially highly emissive regions entirely.

We make use of the observation from Kutz et al. [2017] that $\bar{\mu}$, $P_{a}$, $P_{s}$, and $P_{n}$ can be treated as arbitrary uncorrelated parameters as long as their contributions are counter-balanced by appropriate sample weights. To force ratio tracking [Novák et al. 2014] to take more steps in highly emissive regions, instead of setting $\bar{\mu}$ to the local maximum of $\mu_{t}$, we choose $\bar{\mu} = \max(\mu_{t}, \mu_{a}L_{e})$. We always set $\bar{\mu}$ to be smaller than the average voxel size, to avoid performing too many lookups within a single voxel. Since absorption and null-collision events do not require tracing a new ray, we set $P_{a} = P_{n} = 1$, which makes the tracker gather emission at every free-path sample and yields a higher-quality emission estimate per ray (Algorithm 1).

| Algorithm 1 |
| --- |
| function EVALUATEEMISSION(x, ω, d) |
| w ← 1, Le ← 0, t ← 0 |
| repeat |
| Δt ← −ln(1 − ζ) / μ̄ |
| x ← x − Δt × ω |
| Le ← Le + w × μa(x) × Le(x) / μ̄ |
| w ← w × μn(x) / μ̄ |
| until (t ← t + Δt) > d |
| return Le |
| end function |

Next, to make our technique usable for next event estimation, we need not only to better evaluate emission from heterogeneous volumes, but also to sample directions and evaluate the PDF of each sampled direction. To do this, we first extend Villemin & Hery [2013]: we use an emission-energy-distribution grid, which is just a coarser version of the volume, in order to make sure that more emissive regions of the volume have a higher chance of receiving light samples. In Villemin & Hery [2013], point sampling is used, but point sampling can be sub-optimal when emission is occluded by heavy smoke or when the emissive region is large; in these cases, high sample counts are required to capture emission details in glossy reflections. Instead, we use our emission-optimized tracker to evaluate every tracking point along the ray, gathering more information in each light sample and effectively performing line integration. Finally, in order to use MIS to combine BSDF samples with our emissive volume light samples in the solid angle domain [Simon et al. 2017], we track which cells in the emission-energy-distribution grid the light sample ray has passed through and integrate the PDFs stored in each of these cells using a Jacobian transform:

$$
\begin{aligned}
p_{\sigma}(\omega) &= \int_{0}^{\infty}p_{x}(t)\,t^{2}\,dt \\
&= P_{0}(t_{1}^{3} - t_{0}^{3})/3 + P_{1}(t_{2}^{3} - t_{1}^{3})/3 + P_{2}(t_{3}^{3} - t_{2}^{3})/3
\end{aligned} \qquad (11)
$$
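
The per-cell sum follows because the volume-measure PDF is constant inside each grid cell, so each cell contributes $P_{i}\int_{t_{i}}^{t_{i+1}}t^{2}\,dt = P_{i}(t_{i+1}^{3} - t_{i}^{3})/3$. The Python sketch below is a minimal illustration of that sum; the function name, the tuple layout for cells, and the sanity-check scene (a uniformly emissive sphere centered on the ray origin) are assumptions made for the example rather than details from Hyperion.

```python
import math

def pdf_solid_angle(cells):
    """Convert per-cell volume-measure PDFs into a solid-angle PDF.

    cells: iterable of (t_enter, t_exit, pdf_volume) tuples for the grid
    cells that the ray passes through, where pdf_volume is the constant
    PDF per unit volume stored in that cell.
    """
    p = 0.0
    for t_enter, t_exit, pdf_volume in cells:
        # Integral of pdf_volume * t^2 over [t_enter, t_exit]; the t^2
        # factor is the Jacobian between volume and solid-angle measure.
        p += pdf_volume * (t_exit ** 3 - t_enter ** 3) / 3.0
    return p

# Sanity check: points sampled uniformly inside a sphere of radius R centered
# on the ray origin should give a uniform directional PDF of 1 / (4 * pi),
# no matter how the ray is split into cells.
R = 2.0
PDF_VOLUME = 1.0 / (4.0 / 3.0 * math.pi * R ** 3)
CELLS = [(0.0, 0.5, PDF_VOLUME), (0.5, 1.25, PDF_VOLUME), (1.25, R, PDF_VOLUME)]
p_sigma = pdf_solid_angle(CELLS)
```

Because the cubic terms telescope when neighboring cells share the same PDF, the result depends only on the ray extent in that case; cells with differing PDFs simply weight their own segments.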

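We summarize this approach in Algorithm 2. We combine our cache point based volumetric in-scattering sampling approach and our volumetric emission sampling approach using MIS to produce a single unified volume integrator, allowing us to efficiently sample strong light sources embedded in heterogeneous volumes with either low or high extinction coefficients (Figure 11).

| Algorithm 2 |
| --- |
| function PDFEMISSION(x, ω) |
| p ← 0 |
| for voxel v along ray(x, ω) do |
| [t0, t1] ← v entry/exit |
| p ← p + (t1³ − t0³)/3 × pdf(v) |
| end for |
| return p |
| end function |

## 5 Production Experience and Discussion

We have used our cache points system as the default light selection strategy on every film rendered with Disney's Hyperion Renderer; over the past decade, we have rendered millions of final frames, and orders of magnitude more in-progress frames, with great success. Because the cache points system is enabled by default and all of its construction happens automatically at renderer startup, with no additional user intervention, input, or guidance, our artists do not need to be concerned with how many lights they place in a scene. The cache points system has allowed us to meet art direction requirements calling for extremely complex lighting on many of our recent films without any additional intervention from our TDs or rendering team; see Figure 7. For example, the pier sequence from Us Again (Figure 1) contains around 4.8 million lights; thanks to the cache points system, lighting and rendering this sequence was so routine and uneventful that we did not realize how many lights were in the scene until after the sequence was completed.

All of the above is possible because of the large amount of effort that went into making cache points production-ready. Building a production-ready system requires considering how the system fits into artists' day-to-day workflows and demonstrating clear advantages on real production cases; we discuss some of these topics in Section 5.1. Another part of what makes a system production-ready is simply the experience gained through actual use. In actual use, challenging cases and failure cases are often just as interesting as success cases (if not more so!); in Section 5.2, we provide several case studies of interesting challenges and failure cases encountered in production.

## 5.1 Performance Results and Determinism

When deploying a novel rendering technique in a production environment, the tradeoff between the additional overhead in render time and memory and the overall gains in convergence and artist workflow is a key part of deciding whether the technique is worth deploying. In addition, determinism and temporal coherence are always major concerns in production. In this section, we discuss both topics and present render time and memory measurements from real-world production examples.

![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId78.jpg) ![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId81.jpg)

RMSE: 0.0820 RMSE: 0.0594 RMSE: 0.0473

Figure 11: A production scene from Raya and the Last Dragon, lit by torches (modeled as emissive heterogeneous volumes) embedded in thin anisotropic heterogeneous fog. Equal-time comparison of a conventional null-collision approach utilizing spectral-decomposition tracking (left), incorporating our emission sampling strategy (middle), and additionally incorporating our scattering sampling strategy via MIS (right). In regions surrounding the emissive heterogeneous volumes (top two rows), combining our emission and scattering sampling strategies produces a significant improvement over emission sampling alone, while in regions lit by the emissive volumes but containing less thin fog (bottom row), MIS ensures that combining our emission and scattering sampling strategies performs no worse than emission sampling alone. © 2024 Disney

5.1.1 Results on Production Scenes. We selected three interesting real-world production scenes to present performance results. The first scene (Figure 1) is from our short film Us Again and was chosen because it demonstrates a combination of complex occlusion and a large number of lights: 4881396 emissive triangles, emissive volumes, and analytic lights in total. The second and third scenes are from our feature film Encanto. The second scene (Figure 4, with 38720 lights) is a case in which locally optimal light selection performs relatively well; we chose this scene to demonstrate that even in the best real-world case for locally optimal light selection, our cache points system still has better overall performance. The third scene (Figure 5, with 4406 lights) demonstrates a case in which our learned visibility estimates become important for keeping cache points robust under complex occlusion. All three scenes contain some amount of volumetric scattering. We took measurements on a system with dual 18-core Intel Xeon Gold 6254 processors; times are given as elapsed wall-clock time, and all measurements were taken on renders at 32 samples-per-pixel (SPP).

We present measurements for total render time (Table 1), root mean squared error (RMSE) (Table 2), and memory usage (Table 4). To quantify the performance of all of the sampling methods with a single useful number, we present results using time-to-unit-variance (TTUV), defined as variance multiplied by total render time (Table 3). This metric represents the number of minutes required to reach a variance of 1, which allows us to directly compare how quickly each technique converges. For all of the metrics we present here, lower values indicate better performance.

In all three of our real-world production examples, cache points have a clear advantage in time-to-unit-variance over both uniform light selection and locally optimal light selection, meaning that in these scenes cache points reach a given desired noise level faster than either alternative. Even in the scene from Figure 4, where locally optimal light selection performs well, cache points are still nearly 1.2 times faster in time-to-unit-variance, and in the scene from Figure 1, cache points are roughly an order of magnitude faster in both absolute wall-clock time and time-to-unit-variance. Compared to uniform light selection, cache points are anywhere from about two times to nearly an order of magnitude faster in time-to-unit-variance, which results from the cache points system keeping RMSE close to that of locally optimal light selection while keeping render times close to those of uniform light selection.

Using online learning for visibility estimates typically gives a modest but still useful improvement in time-to-unit-variance; however, when complex occlusion is present (Figure 5), visibility estimates provide a more significant 1.2 times speedup in time-to-unit-variance. Furthermore, the tests presented here are relatively low SPP renders, during which the visibility estimate system can only learn a coarse approximation of the visibility term; in practice, time-to-unit-variance improves further as the visibility estimates improve.

Building the cache points data structure does add some overhead to the renderer's time-to-first-pixel, and storing the cache points data structure requires additional memory; as shown in Table 1, time-to-first-pixel increases by roughly a minute, and as shown in Table 4, the additional memory overhead is typically on the order of a few GB. Relative to typical render times for production frames, an extra minute of startup time is generally not a significant concern. Because we set a hard upper limit on the number of cache points that the renderer will generate, we can provide a guaranteed, fixed upper bound on the added memory overhead. We have generally found this added memory overhead to be relatively small compared to the total memory usage of typical production scenes, and for all but the simplest scenes (such as a Cornell Box), cache points provide a large enough advantage in convergence speed to make the added memory overhead worthwhile.

Table 1: Render times at 32 SPP using uniform light selection, locally optimal light selection, and cache points. We show the time for only the cache points initialization process (build) separately from the time for the entire render including initialization (total):

| Time (32 SPP) | Us Again: 4881396 lights | Encanto: 38720 lights | Encanto: 4406 lights |
| --- | --- | --- | --- |
| Uniform selection | 10 min 16 s | 12 min 2 s | 19 min 10 s |
| Optimal selection | 123 min 51 s | 16 min 32 s | 24 min 21 s |
| Cache points (total) | 13 min 13.1 s | 15 min 0.5 s | 19 min 35 s |
| Cache points (build) | 1 min 32 s | 1 min 4 s | 34 s |
| Speedup vs. uniform selection | 0.29× slower | 0.37× slower | 0.02× slower |
| Speedup vs. optimal selection | 9.37× faster | 1.10× faster | 1.24× faster |

Table 2: Root mean squared error (RMSE) measured at 32 SPP using uniform light selection, locally optimal light selection, and cache points. For cache points, we show RMSE both without and with online learning of visibility estimates enabled:

| RMSE (32 SPP) | Us Again: 4881396 lights | Encanto: 38720 lights | Encanto: 4406 lights |
| --- | --- | --- | --- |
| Uniform selection | 0.2280 | 0.0896 | 0.0675 |
| Optimal selection | 0.1436 | 0.0321 | 0.0377 |
| Cache points (no visibility) | 0.1448 | 0.0339 | 0.0407 |
| Cache points | 0.1443 | 0.0320 | 0.0369 |

Table 3: Time-to-unit-variance (TTUV) measured at 32 SPP using uniform light selection, locally optimal light selection, and cache points. For cache points, we show TTUV both without and with online learning of visibility estimates enabled:

| TTUV (32 SPP) | Us Again: 4881396 lights | Encanto: 38720 lights | Encanto: 4406 lights |
| --- | --- | --- | --- |
| Uniform selection | 0.5337 | 0.0966 | 0.0873 |
| Optimal selection | 2.5552 | 0.0184 | 0.0346 |
| Cache points (no visibility) | 0.2769 | 0.0172 | 0.0324 |
| Cache points | 0.2753 | 0.0154 | 0.0267 |

Table 4: Memory usage for uniform light selection and cache points; the difference between the two is the total size of all data structures required by cache points. We do not list memory usage for locally optimal light selection separately, since those values are identical to uniform light selection:

| Memory usage | Us Again: 4881396 lights | Encanto: 38720 lights | Encanto: 4406 lights |
| --- | --- | --- | --- |
| Uniform selection | 33.62 GB | 45.0 GB | 85.6 GB |
| Cache points | 37.51 GB | 46.17 GB | 87.05 GB |
| Percent increase | 11.57% | 2.6% | 1.69% |

5.1.2 Determinism and Temporal Coherence. Given the same starting random seed and the same input scene, our cache points system is fully deterministic. Because the cache point initialization process is highly parallelized, great care must be taken to ensure that determinism is preserved after each parallelized step. This is especially true in the steps where we merge spatially similar candidate positions, merge neighboring cache points with similar light distributions, and blur light distributions between cache points; we detailed in earlier sections how we use sorting between steps and atomic operations to avoid race conditions and maintain determinism.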
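
As a toy illustration of why sorting between parallel steps helps preserve determinism, the Python sketch below merges per-light weights into per-cache-point totals. It is not Hyperion's code: the tuple layout, the function names, and the use of shuffling to mimic nondeterministic thread scheduling are all assumptions made for the example. Floating-point addition is not associative, so summing in arrival order can change the result from run to run, while sorting by a stable key first makes the total independent of arrival order.

```python
import random

def merge_in_arrival_order(contributions):
    """Sum weights per cache point in whatever order they arrive."""
    totals = {}
    for cache_point_id, light_id, weight in contributions:
        totals[cache_point_id] = totals.get(cache_point_id, 0.0) + weight
    return totals

def merge_sorted(contributions):
    """Sort by (cache_point_id, light_id, weight) before summing, so the
    summation order no longer depends on thread scheduling."""
    return merge_in_arrival_order(sorted(contributions))

def simulate_arrival(contributions, seed):
    """Stand-in for parallel workers finishing in an arbitrary order."""
    shuffled = list(contributions)
    random.Random(seed).shuffle(shuffled)
    return shuffled

# Hypothetical (cache_point_id, light_id, weight) records; the large and
# small magnitudes make the rounding behavior order-dependent.
CONTRIBUTIONS = [
    (0, 0, 1.0e16), (0, 1, 1.0), (0, 2, -1.0e16), (0, 3, 1.0),
    (1, 0, 0.1), (1, 1, 0.2), (1, 2, 0.3),
]

sorted_results = [
    merge_sorted(simulate_arrival(CONTRIBUTIONS, seed)) for seed in range(8)
]
```

Every entry of `sorted_results` is bit-identical regardless of the simulated arrival order, whereas feeding the same records to `merge_in_arrival_order` in a different order can produce a different total for cache point 0.
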
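
Currently, we take no additional steps to keep the cache point distribution temporally coherent across frames; we build a new cache point distribution for every frame. We do this for two reasons. First, because the system is already deterministic with respect to the random seed, small changes in the input scene generally lead only to small changes in the cache point distribution that correspond directly to the scene changes. Typically, when rendering with the same starting seed, this property is sufficient to keep adjacent frames consistent, and because cache points produce unbiased direct lighting sampling, any noise differences become irrelevant as the image converges. Second, in actual production we want adjacent frames to have independent sampling, so we give each frame in a shot a unique seed value. The reason is that our production lighting workflow relies extensively on the advanced cross-frame denoising capabilities of our in-house denoiser [Dahlberg et al. 2019; Vogels et al. 2018], which benefit from reusing and spreading unique samples per frame across multiple frames [Zimmer et al. 2015]. In production use, temporal coherence has not been a significant problem for our cache points system, and we have not needed to take further measures to avoid related issues. One rare exception is described in Section 5.2.1.

## 5.2 Case Studies

Over the course of many productions, we have occasionally run into interesting edge cases that challenge the cache points system. We detail some examples here.

5.2.1 Failure Case: Narrow Spotlights. On Strange World, we ran into problems with cache points in production scenes that used very narrow spotlights inside very large atmospheric volumes (Figure 12). Rendering with cache points produced frames in which the spotlight beams had gaps or breaks, and these artifacts were also temporally unstable across frames. As a result, artists sometimes had to manually disable the cache points system in these scenes, which is a rare occurrence in our productions.

Upon further investigation, we found that the problem was simply that we lacked cache points in those particular regions of the spotlights. For points within the artifact regions, the nearest cache points lay outside of the spotlight beam and therefore did not include the spotlight in their light distributions; the probability of sampling the spotlight from the light distribution was therefore zero, and the spotlight could only be sampled very rarely, through the low probability that we reserve for lights outside of a cache point's light distribution. Since cache points are initially distributed randomly within object bounding boxes, the chance of placing a cache point exactly inside the spotlight cone is fairly low, because the spotlight's angle is very small relative to the size of the surrounding volume. We do not consider light positions or directions when seeding cache points, so there is no guarantee that we will place cache points inside the spotlight cone. This explains why the artifacts are temporally unstable across frames: sometimes we get lucky and a cache point is well placed, and sometimes we do not.

One potential solution to this problem is to add new cache points in later iterations of the render. We could use ray hits from later iterations as new cache point candidates and accept them using a modified version of the method described in Section 3.1.2. We would then want to update the distributions of existing cache points to account for the newly added cache points.

5.2.2 Failure Case: Scaling to Billions of Lights. As mentioned in Section 2, while we do not use a light hierarchy for global many-lights sampling, we do use light hierarchies to select individual triangles within emissive meshes. The cache points system then treats each emissive mesh as a single light, instead of handling each emissive triangle directly in the cache point system's light distributions. One major motivation for this choice came from an early experiment during Moana, which placed every emissive triangle directly into the cache point system as a uniquely addressable light. While the experiment had no problems with light sampling at render time and produced the expected quality with no major impact on the computational cost of light sampling, it did result in extremely long cache point build times at renderer startup. The reason is that we currently simply iterate linearly over all lights in the scene when sorting lights into the near and far distributions and when determining which cardinal bins of the far distribution each light goes into. Since we currently perform these steps as linear loops, the cache points system's startup time scales poorly as the number of lights in the scene approaches enormous numbers (for example, billions). An alternative would be to use a spatial acceleration structure to cull the number of lights that need to be considered when building light distributions; one possibility is simply to place the centroid of every light into a KD-tree and perform kNN searches, while another possibility is to use a light hierarchy, not for many-lights sampling itself, but to drive the construction of the cache point light distributions.

5.2.3 Challenges in Volumetric Scattering. Using cache points to improve volumetric in-scattering sampling and volumetric emission sampling has significantly improved our ability to efficiently render complex and difficult volume setups (Figure 13). As the visual complexity and richness of our films continues to increase, our artists now routinely fill most scenes with some form of thin atmospheric volume to provide additional lighting detail and shaping. Torches or glowing magical effects embedded in thin volumes, as well as dramatic god rays, are all common scenarios for us; see Figure 13 for some recent production examples in which we use cache points for better volume sampling.

However, a major drawback of our current cache points solution for volumetric in-scattering and volumetric emission sampling is that the system significantly increases render time; the variance per sample per pixel (SPP) is much lower, but each sample also takes much longer to compute. As a result, while cache points for many-lights sampling are enabled by default, we have not yet been able to also enable cache points for volumetric in-scattering and volumetric emission sampling by default. Artists do not need to set or tune any additional parameters beyond simply deciding whether to enable the system when rendering volumes, but we would still prefer that artists not even need to decide whether to turn the system on or off. Ideally, we would either find a way to reduce the system's per-SPP computational cost, or derive a mechanism that allows the renderer itself to detect at render time whether the system helps in the current scene and to enable or disable the system automatically.

Another minor drawback of our volume solution is that the system relies on the cache point locations that already exist for many-lights sampling and does not further consider additional volumetric parameters when choosing cache point locations. While cache points are placed at locations where paths undergo volumetric scattering, we take no steps to increase cache point density in optically thin media, where large free-flight distances mean that cache points may be very sparsely distributed, and we do not consider the volumetric emission field at all when choosing cache point locations. For volumetric in-scattering and emission sampling, insufficient cache point coverage can lead to noise discontinuities similar to those discussed in Section 5.2.1, while overly dense cache point coverage can lead to an unnecessary increase in memory usage. For cases such as god rays, one potentially useful extension would be to add a third cache point seeding mechanism based on tracing photons from lights. For cases such as volumetric emission, one potentially useful extension would be to take into account the extents and densities of individual voxels when placing cache point locations within a volume's bounding box.

![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId85.jpg)

(a) Scene with cache points enabled

![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId88.jpg)

(b) Scene with cache points disabled

Figure 12: A stripped-down production scene from Strange World with cache points enabled (a) and disabled (b). The main focus of this render is several spotlights and a large atmospheric volume; all other geometry has been matted out. Note that while the render with cache points enabled is generally less noisy than the render with cache points disabled, the two spotlights in the middle have missing chunks (highlighted with red arrows). © 2024 Disney

## 6 Future Work

We have now used the cache points system successfully on many productions; to date, every production rendered with Disney's Hyperion Renderer has had the cache points system enabled by default. However, in the spirit of always seeking to improve artist workflows, we envision several possible improvements to the cache points system.

Interactivity and GPU Implementation. As interactivity, fast artistic iteration times, and optimal time-to-first-pixel become increasingly important across all user workflows that depend on the renderer, algorithms that require precomputation before the renderer can start tracing rays become less and less attractive; we are therefore interested in progressive formulations of cache points that work well in interactive use cases. Instead of building the entire cache point data structure up front, we envision progressively building the cache point data structure in parallel during the first few SPP of a render, which could be a promising way to speed up time-to-first-pixel while sacrificing only a small amount of noise improvement in the initial set of samples. Additionally, while our in-house CPU production renderer has used the cache points system for almost its entire existence, our in-house interactive GPU path tracer currently uses a combination of ReSTIR [Bitterli et al. 2020] and a light hierarchy without occlusion estimates [Estevez and Kulla 2018] for light selection; ideally we would prefer to have the same light selection strategy across both renderers. To this end, reformulating the cache points system to work well on the GPU for both build and sampling is a major point of interest for us. We would also like to experiment with using cache points as the initial light selection method used to drive ReSTIR.

Alternative Spatial Data Structures. Our cache points system currently stores points in a KD-tree; we are interested in investigating better spatial data structures that could reduce memory footprint, improve data structure build times, or improve point search times. While KD-trees may remain part of the initial cache points build process due to the need to perform multiple kNN search operations, during actual path tracing we only need to look up a single cache point per path vertex, which makes spatial data structures such as hash grids a promising alternative for storing the final render-time cache point distribution. For a GPU implementation, multi-resolution hash grids are especially attractive compared to KD-trees [Davidovič et al. 2014].

Improved Sampling Information. Our system currently does not utilize joint sampling to take into account the BSDF when performing light selection [Christensen et al. 2018]; adding this capability could potentially help in eliminating the need to sample large numbers of lights that do not line up with highly glossy BSDFs. In real-world production scenes, we expect that using a joint sampling approach in conjunction with our occlusion estimate approach could lead to significant sampling efficiency improvements.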
In a similar vein: we currently do not consider surface orientation when merging spatially neighboring cache points during the build process; a potential solution could be to factor in a surface normal for cache points placed on surfaces and allow otherwise nearby cache points to not be merged if they have highly divergent corresponding surface normals.

Combining with Path Guiding. We currently use two separate systems to guide direct and indirect lighting; for indirect lighting, we use Practical Path Guiding [Müller 2019; Müller et al. 2017]. Unifying systems for guiding direct and indirect lighting is a worthy goal; to this end, we have investigated combining Practical Path Guiding and the technique from Vevoda et al. [2018] with promising results.

Combining with Photon Mapping. Hyperion includes a photon mapping system for rendering refractive caustics. Because this photon mapping system was implemented after the core of the cache points system, our cache points system currently does not make use of any of the photon mapping system's capabilities, but the two systems appear to have complementary capabilities. For example, using photons from forward light tracing to generate candidate cache point locations is a natural extension of our existing approach and could help with cases such as the one in Section 5.2.1. Our photon mapping system uses an adaptive photon guiding technique that learns the photon emission function on lights over the course of the render, similar to Estevez and Kulla [2020a; 2020b]; combining this adaptive photon guiding with our learned occlusion estimates in cache points seems promising. Finally, our cache point locations are chosen up-front and not further refined throughout the course of the render; using a mechanism similar to the ones found in PPM [Hachisuka et al. 2008] and SPPM [Hachisuka and Jensen 2009] to progressively refine cache point locations is a possible approach.

![image](https://aduvfx-1252404142.cos.ap-beijing.myqcloud.com/posts/cache-points-for-production-scale-occlusion-aware-many-lights-sampling-and-volum/rId92.jpg)

Figure 13: Production frames from Raya and the Last Dragon utilizing our cache points system for sampling volumetric in-scattering and volumetric emission. © 2024 Disney

## 7 Conclusion

We have presented Cache Points, a many-lights sampling system used by Disney's Hyperion Renderer to render millions of production frames over the past decade.
We have described in detail how cache points are populated and novel aspects of the system such as online learning of visibility estimates, how they are used in light sampling at render-time, and how we have extended them for use in accelerating difficult volumetric scattering cases. We have also discussed some production experiences, real-world success cases, and real-world failure cases for our system, along with potential paths for future improvement.

## Acknowledgments

The techniques presented in this paper have seen continual improvement over the years, with many contributions made by both current and past members of the Hyperion development team. In addition to the authors, other past key contributors to the cache points system include Patrick Kelly, Ralf Habel, Ben Spencer, Benedikt Bitterli, and Matt Jen-Yuan Chiang. The authors are also thankful to Mackenzie Thompson, Andrew Bauer, Brian Green, Mark Lee, and Lea Reichardt from the Hyperion development team for their support of and feedback on this paper. We thank Jan Novák, Marios Papas, and Thomas Müller from Disney Research|Studios and Cliff Ramshaw and Julian Fong from Pixar's RenderMan development team for interesting and helpful discussions on the topics of many-lights sampling and volumetric scattering through optically thin media. We also thank Ivo Kondapaneni for his work implementing the technique of Vevoda et al. [2018] in an experimental branch of Hyperion, which has served as a useful comparison point. We are also thankful to our anonymous paper referees for their invaluable feedback. Finally, we give special thanks to the many artists and technical directors who have used Hyperion, whose feedback, suggestions, and collaboration over the years have influenced and shaped every aspect of the renderer, including the cache points system presented in this paper.

This article is licensed under Creative Commons BY-NC-ND 4.0.

BY-NC-ND: Attribution-NonCommercial-NoDerivatives
