What's New in Katana 3.5

Introduction

This document describes notable changes introduced in Katana 3.5. These changes include:

Table of Contents

Geolib3-MT Runtime

Katana node graphs, their Op trees and the scenes they subsequently create are incredibly flexible and varied. To evaluate these scenes optimally, an evaluation engine must be both efficient and flexible enough to handle the variety and complexity of scenes it’s possible to author using Katana. Geolib3-MT, the next generation of Katana's scene graph processing engine, provides more configuration, introspection and tuning options than previous versions of Geolib3 to meet the demands of increasingly complex and varied workloads. In this section we explore some of these options and how they can be used to improve scene traversal performance.

Configuration

Geolib3-MT can be configured via the RenderSettings node. All Geolib3-MT options live under the sceneTraversal heading.

sceneTraversal.maxCores

Determines how many logical cores Geolib3-MT will use during the scene traversal phase. Unlike previous versions, Geolib3-MT uses an internal thread pool to improve scene traversal time. The following diagram demonstrates the difference between Geolib3-MT and previous versions of Katana.

Note: The default value (0) causes Geolib3-MT to use all available logical cores on the host computer. Whilst the core Geolib3-MT processing engine scales well as the number of cores increases, individual Ops within an Op tree may not exhibit the same scaling characteristics. Consequently, it is possible that an increase in threads results in an increase (not a decrease) in scene traversal time. In this case, the new profiling tools available in Katana 3.5 can be used to identify these Ops so that their behaviour can be refactored or optimized. The same is true of Ops marked “thread unsafe”, as these require the acquisition of a Global Execution Lock (GEL), which further limits scene traversal scalability.
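The scaling caveat in the note above can be made concrete with Amdahl's law, a standard model that is not specific to Katana. The sketch below assumes a hypothetical fraction of traversal work that must run serially (for example, Ops that hold the GEL) and computes the best-case speed-up for a given thread count. It gives an upper bound only and does not model lock contention, which is what can make traversal slower as threads are added.

```python
def amdahl_speedup(serial_fraction, num_threads):
    """Best-case speed-up under Amdahl's law.

    serial_fraction is a hypothetical share (0.0 to 1.0) of traversal work
    that cannot run in parallel. It is an input to this model, not a value
    reported by Katana.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_threads)


if __name__ == "__main__":
    # Assume a quarter of the work is serial and vary the thread count.
    for threads in (1, 2, 4, 8, 16, 32):
        print(threads, round(amdahl_speedup(0.25, threads), 2))
```

With a serial fraction of 0.25 the speed-up can never reach 4, however many threads are used, which is one reason to consider a non-zero sceneTraversal.maxCores when profiling shows poorly scaling Ops.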

sceneTraversal.opTreeOptimizations

Geolib3-MT can perform a pre-processing step in which it examines the topology of the Op tree to identify constructs that can be potentially optimised. One such optimisation is the collapsing of sequences of Ops of the same type into a single instance of that Op. There are a number of benefits to this,

Note: The Op tree optimization pass is an experimental feature and is therefore turned off by default. It can be turned on via the sceneTraversal.opTreeOptimizations option on a RenderSettings node.

Further Information

The Op tree optimizer will attempt to collapse any chain of Ops of the same type if that Op type calls GeolibSetupInterface::setOpsCollapsible() during its setup() call. Callers of this function must specify the name of an attribute, which Geolib3 will pass to the Op's cook() call as an Op argument. This attribute will contain an ordered array of attributes (ordered from upstream Op to downstream Op) holding the collapsed Ops' arguments. The Op is then able to deal with this "batch" Op argument as appropriate.
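To illustrate the batching described above, here is a plain-Python model of a collapsed chain. It does not use the Katana API: the attribute name collapsedArgs, the dictionaries standing in for Op arguments, and the helper cook_collapsed are all hypothetical. It only shows why the upstream-to-downstream ordering matters when one Op instance applies the arguments of several.

```python
def cook_collapsed(location_attrs, op_args):
    """Apply one Op's arguments, or a collapsed batch of them, to a location.

    op_args stands in for the Op arguments seen at cook time. If it holds the
    (hypothetical) 'collapsedArgs' entry, that entry is an ordered list of the
    original Ops' arguments, upstream Op first. Otherwise op_args is treated
    as the arguments of a single, uncollapsed Op.
    """
    batch = op_args.get("collapsedArgs", [op_args])
    result = dict(location_attrs)  # leave the input untouched
    for args in batch:  # upstream first, so downstream Ops win
        result[args["attrName"]] = args["value"]
    return result


# Three attribute-setting Ops in a chain, listed upstream to downstream.
CHAIN = [
    {"attrName": "visible", "value": 0},
    {"attrName": "samples", "value": 4},
    {"attrName": "visible", "value": 1},
]

if __name__ == "__main__":
    print(cook_collapsed({}, {"collapsedArgs": CHAIN}))
```

Cooking the collapsed batch once gives the same attributes as cooking the three Ops one after another, because the last (most downstream) write to visible is the one that survives.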

sceneTraversal.verboseLogging

To avoid a large number of informational messages filling the Render Log, Geolib3-MT does not log messages related to performance by default. These messages can be enabled by turning on the sceneTraversal.verboseLogging option on a RenderSettings node. Currently these messages include information about:

sceneTraversal.cache Group

Geolib3-MT includes a number of settings to control the behaviour of the caching subsystem. The caching subsystem is responsible for the storage and retrieval of previously cooked scene graph locations, known as "cook results". These settings can be modified from the RenderSettings node on a project-by-project basis. Sensible defaults have been provided based on testing against production scale scenes. Further information about each of the settings is provided below,

Further Information

Caching, and the trade-off between memory usage and time to first pixel, can have a significant impact on scene traversal and rendering performance. Using the settings provided by Geolib3-MT, it's possible to tune the memory footprint during the scene traversal phase of rendering. Here are some considerations when deciding to experiment with these settings:

sceneTraversal.useCachePrepopulation

If turned on, Geolib3-MT will perform a traversal of the scene graph, populating an internal cache. The extent of this traversal is controlled by the settings under sceneTraversal.cachePrepopulation, which are explained below:

Based on the values of the above settings, on completion of the cache prepopulation phase the Geolib3-MT cache will have been pre-populated with either the whole scene graph or a subsection of it. Geolib3-MT has been optimised to give renderer plug-ins efficient access to this cache via the existing FnScenegraphIterator API. The cache is scalable and thread-safe, so we encourage renderer plug-in writers to access it concurrently to improve the performance of the scene build phase.

Note: If the Geolib3-MT cache is not fully populated, cache access (via FnScenegraphIterator) will result in a cache miss. In this case the requested location will be cooked using the calling thread.
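The cache-miss behaviour in the note above can be modelled in a few lines of Python. This is a toy cache, not the Geolib3-MT implementation or the FnScenegraphIterator API: the class CookCache, its methods and the cook function are hypothetical. It shows concurrent readers sharing one cache, with a miss cooked on whichever thread asked for the location.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CookCache:
    """Toy thread-safe cache of cook results keyed by location path."""

    def __init__(self, cook):
        self._cook = cook
        self._results = {}
        self._lock = threading.Lock()
        self.misses = 0

    def prepopulate(self, paths):
        # Meant to run before any reader threads start.
        for path in paths:
            self._results[path] = self._cook(path)

    def get(self, path):
        with self._lock:
            if path in self._results:
                return self._results[path]
            self.misses += 1
        # Cache miss: cook on the calling thread, outside the lock.
        result = self._cook(path)
        with self._lock:
            # If another thread cooked the same path first, keep its result.
            return self._results.setdefault(path, result)


if __name__ == "__main__":
    def cook(path):
        return {"path": path, "thread": threading.current_thread().name}

    cache = CookCache(cook)
    cache.prepopulate(["/root", "/root/world"])
    wanted = ["/root", "/root/world", "/root/world/geo"]
    with ThreadPoolExecutor(max_workers=4) as pool:
        for result in pool.map(cache.get, wanted):
            print(result["path"], "cooked on", result["thread"])
```

In this model the two prepopulated locations report the thread that ran prepopulate, while the missing location reports a worker thread, mirroring the cost a renderer thread pays when it requests a location that was not prepopulated.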

API Improvements

Various APIs have been extended to improve performance and memory management.

Analysis & Profiling Tools

Geolib3-MT Profiling

Geolib3-MT adds a new render type, called Preview Render with Profiling, designed to help track down performance problems in scene traversal. This performs a normal Preview Render, but also captures information about which Ops have run, the amount of CPU used by them to cook locations, and the amount of memory used for attributes and Lua scripts.

A Preview Render with Profiling outputs profiling data in two places:

Starting a Preview Render with Profiling

A Preview Render with Profiling can be started from the same menu as any other render, by choosing the Preview Render with Profiling command.

This option will be available for any renderer that already supports a Preview Render, and requires no additional work on the part of the renderer. If the renderer implements the finalize() method of the Geolib3-MT Runtime, these profiling reports will be created when the runtime is finalized; otherwise reports will be written when the render finishes.

What information is captured?

The name, type and numerical ID of the Op. Each Op has a name, a type and a unique numeric ID. For example, an OpScript Op may have the name op74, the type OpScript.Lua and the ID 77. Note that the name and ID need not correlate.

The name and type of the Katana node that spawned the Op. In cases where an Op is spawned directly by a Katana node, the name and type of that node are recorded. In cases where the Op was created implicitly, the node name will equal _NoName_ and the type will equal _NoType_. This occurs, for example, with MaterialFilenameResolve Ops: these Ops are created implicitly when a filename needs to be resolved, so no Katana node is identified as the creator.

Note: If sceneTraversal.opTreeOptimizations is turned on and chains of Ops are collapsed, the node name and type will be replaced with a string generated from the chain. If the chain has length t, formed of Ops of type opType, where Op k is named ok and is generated by a Katana node named nk, the general form of the string will be:

cop(o0(n0)->o1(n1)->...->ot(nt))

Note however that the format of this string is not guaranteed to remain fixed.

The total CPU time that Op spent cooking locations. Each Op will cook many locations, and the time spent doing this, across all scene traversal threads, is accumulated. Thus, CPU time should scale with the number of scene traversal threads when a scene is traversed in parallel. If this is not the case, there may be a thread-unsafe Op upstream of the Op in question.

The memory footprint of that Op. Each Op must allocate memory to cook locations, and the memory total per Op is aggregated. At present, only the following allocations are recorded:

Profiling Summary Report

A summary report will be written to the Render Log upon completion of a Preview Render with Profiling. This report is intended to give a high-level overview of the profile data, and contains:

The relevant section of an example Render Log is shown below:

Profiling JSON File

In addition to the summary report, a JSON file containing the raw profiling data is written to disk. The directory it is written to is determined by the --profiling-dir command-line option; if this is not set, it will be written to the temporary directory for the Katana session. If this directory does not exist, it will be created (if filesystem permissions allow). The filename takes the following format:

profile_<renderer>_previewRender_<datetime>.json

where:

The file contains a single JSON object with the following properties:

Property | Type | Description | Example
timestamp | string | ISO 8601 timestamp at which the profile file was written. | 2019-10-11T09:37:06Z
renderer | string | Name of the render plug-in. | dl
renderMethodName | string | Name of the render method; currently always “previewRender”. | previewRender
environment | object | An object containing values of various environment variables, including KATANA_RELEASE, KATANA_ROOT and KATANA_RESOURCES. | {"KATANA_RELEASE": "3.5v1", "KATANA_ROOT": "/opt/foundry/katana3.5", "KATANA_RESOURCES": "<unset>"}
profileMode | string | Name of the profile mode; currently always “basic”. | basic
ops | array | Array of objects describing resources consumed by each Op. | See below.
numOps | number | Length of the ops array. | 78
wallTime | number | Wall-clock time in seconds between render start and the profiling file being written; if the renderer implements finalize(), this equates to scene traversal time. | 46.85064
cpuTime | number | Sum of CPU time for all Ops, in seconds. | 91.39238
memoryUsed | number | Sum of memory footprints for all Ops, in bytes. | 10728607911
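As a sketch of how the top-level properties relate to one another, the Python below checks a loaded profile for internal consistency and derives the ratio of CPU time to wall-clock time. The function name summarize_profile and the sample values are illustrative only; the sample is a cut-down, hypothetical profile that keeps just the properties the checks need.

```python
import json
import math

# Hypothetical, cut-down profile; real files carry more properties per Op.
SAMPLE_PROFILE_JSON = """
{
  "numOps": 2,
  "wallTime": 2.5,
  "cpuTime": 4.0,
  "memoryUsed": 3072,
  "ops": [
    {"opId": 1, "cpuTime": 3.0, "memoryUsed": 2048},
    {"opId": 2, "cpuTime": 1.0, "memoryUsed": 1024}
  ]
}
"""


def summarize_profile(profile):
    """Cross-check the documented totals against the per-Op entries."""
    ops = profile["ops"]
    wall_time = profile["wallTime"]
    return {
        # numOps is documented as the length of the ops array.
        "opCountMatches": profile["numOps"] == len(ops),
        # cpuTime and memoryUsed are documented as sums over all Ops.
        "cpuTimeMatches": math.isclose(
            profile["cpuTime"], sum(op["cpuTime"] for op in ops), rel_tol=1e-6
        ),
        "memoryMatches": profile["memoryUsed"]
        == sum(op["memoryUsed"] for op in ops),
        # Average number of threads kept busy cooking, if wallTime is usable.
        "cpuToWallRatio": profile["cpuTime"] / wall_time if wall_time > 0 else None,
    }


SAMPLE_SUMMARY = summarize_profile(json.loads(SAMPLE_PROFILE_JSON))

if __name__ == "__main__":
    print(SAMPLE_SUMMARY)
```

For a real file, json.load() on the opened profile replaces json.loads() on the sample string. The ratio is only a rough guide: wallTime equates to scene traversal time only when the renderer implements finalize(), and a ratio well below the number of traversal threads may be worth investigating for thread-unsafe Ops.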

The ops property contains an array of objects of the following format, one for each Op that was executed during scene traversal.

Property | Type | Description | Example
opId | number | The unique integer identifier for the Op. | 23
opName | string | The unique name of the Op. | op223
opType | string | The type of the Op. | AttributeSet
nodeName | string | The name of the Katana node responsible for creating this Op, or _NoName_ if the Op was created implicitly. | RenderSettings_SetSamples
nodeType | string | The type of the Katana node responsible for creating this Op, or _NoType_ if the Op was created implicitly. | RenderSettings
cpuTime | number | The total time this Op spent cooking locations across all threads, in seconds. | 0.54512136
memoryUsed | number | The total memory footprint, as defined above, this Op used while cooking locations, in bytes. | 185378321
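One way to put the per-Op entries to work is to rank Ops by CPU time and to total CPU time per node type, which groups implicitly created Ops under _NoType_. The Python below is a minimal sketch: the helper names and the sample entries are hypothetical, and only the property names come from the table above.

```python
from collections import defaultdict

# Hypothetical entries in the documented per-Op format.
SAMPLE_OPS = [
    {"opId": 23, "opName": "op223", "opType": "AttributeSet",
     "nodeName": "RenderSettings_SetSamples", "nodeType": "RenderSettings",
     "cpuTime": 0.5, "memoryUsed": 2048},
    {"opId": 24, "opName": "op224", "opType": "MaterialFilenameResolve",
     "nodeName": "_NoName_", "nodeType": "_NoType_",
     "cpuTime": 2.0, "memoryUsed": 1024},
    {"opId": 25, "opName": "op225", "opType": "AttributeSet",
     "nodeName": "RenderSettings_SetCamera", "nodeType": "RenderSettings",
     "cpuTime": 0.25, "memoryUsed": 512},
]


def top_ops_by_cpu(ops, count=5):
    """Return the `count` entries with the largest cpuTime, largest first."""
    return sorted(ops, key=lambda op: op["cpuTime"], reverse=True)[:count]


def cpu_by_node_type(ops):
    """Total cpuTime per nodeType; implicit Ops fall under '_NoType_'."""
    totals = defaultdict(float)
    for op in ops:
        totals[op["nodeType"]] += op["cpuTime"]
    return dict(totals)


if __name__ == "__main__":
    for op in top_ops_by_cpu(SAMPLE_OPS, count=2):
        print(op["opName"], op["opType"], op["nodeName"], op["cpuTime"])
    print(cpu_by_node_type(SAMPLE_OPS))
```

With a real profile, the ops array from the loaded JSON object takes the place of SAMPLE_OPS. Sorting on memoryUsed instead of cpuTime gives the equivalent ranking for memory footprint.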

Experimental Features

Reusing the Render Process in a Batch Render

Other Notable Feature Enhancements