This document describes notable changes introduced in Katana 3.5. These changes include:
Katana node graphs, their Op trees and the scenes they subsequently create are incredibly flexible and varied. To optimally evaluate these scenes, an evaluation engine must be both efficient and flexible enough to handle the variety and complexity of scenes it’s possible to author using Katana. Geolib3-MT, the next generation of Katana scene graph processing engine, provides a greater degree of configuration, introspection and tuning options than previous versions of Geolib3 to meet the demands of increasingly complex and varied workloads. In this section we explore some of these options and how they can be leveraged to improve scene traversal performance.
Geolib3-MT can be configured via the RenderSettings node. All Geolib3-MT options live under the sceneTraversal heading.
Determines how many logical cores Geolib3-MT will use during the scene traversal phase. Unlike previous versions, Geolib3-MT uses an internal thread pool to improve scene traversal time. The following diagram demonstrates the difference between Geolib3-MT and previous versions of Katana.
Note: The default value (0) causes Geolib3-MT to use all available logical cores on the host computer. Whilst the core Geolib3-MT processing engine scales well as the number of cores increases, individual Ops within an Op tree may not exhibit the same scaling characteristics. Consequently, it is possible for an increase in threads to result in an increase (not a decrease) in scene traversal time. In this case, the new profiling tools available in Katana 3.5 can be used to identify these Ops so that their behaviour can be refactored or optimized. The same is true of Ops marked “thread unsafe”, as these require the acquisition of a Global Execution Lock (GEL), which further limits scene traversal scalability.
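The effect of Ops that serialise on the GEL can be estimated with Amdahl's law. The sketch below is a simplified model, not a Geolib3-MT API: the function name and the notion of a "serial fraction" are assumptions made for illustration, and the model ignores lock contention and scheduling overhead, so it cannot reproduce cases where adding threads makes traversal slower.

```python
def max_speedup(serial_fraction: float, threads: int) -> float:
    """Upper bound on traversal speedup according to Amdahl's law.

    serial_fraction is the assumed share of single-threaded traversal time
    spent in work that cannot overlap, for example inside thread-unsafe Ops
    that must hold the Global Execution Lock.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


def speedup_table(serial_fraction: float, thread_counts=(1, 2, 4, 8, 16, 32)):
    """Return (threads, bound) pairs for a range of thread counts."""
    return [(n, max_speedup(serial_fraction, n)) for n in thread_counts]
```

Under this model, a scene in which 20% of the cook time is serialised can never traverse more than five times faster than on a single thread, however many cores are available.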
Geolib3-MT can perform a pre-processing step in which it examines the topology of the Op tree to identify constructs that can potentially be optimised. One such optimisation is the collapsing of sequences of Ops of the same type into a single instance of that Op. There are a number of benefits to this:
Note: The Op tree optimization pass is an experimental feature and is therefore turned off by default. It can be turned on via the sceneTraversal.opTreeOptimizations option on a RenderSettings node.
The Op tree optimizer will attempt to collapse any chain of Ops of the same type if the Op calls GeolibSetupInterface::setOpsCollapsible() during its setup() call. Callers of this function must specify the name of an attribute, which Geolib3 will pass to the Op's cook() call as an Op argument. This attribute will contain an ordered array of attributes (ordered from upstream Op to downstream Op) containing the collapsed Ops' arguments. The Op is then able to deal with this "batch" Op argument as appropriate.
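The collapsing behaviour described above can be illustrated with a small toy model. This is a sketch only and does not use any Katana API: the Op class, the collapse_chain() function and the collapsed_attr_name field are hypothetical stand-ins for an Op, the optimizer pass and the attribute name an Op would pass to setOpsCollapsible().

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Any, Dict, List, Optional


@dataclass
class Op:
    """Toy stand-in for an Op in a linear chain (not a Katana class)."""
    op_type: str
    args: Dict[str, Any]
    # Name of the batch attribute the Op asked for; None means "not collapsible".
    collapsed_attr_name: Optional[str] = None


def collapse_chain(chain: List[Op]) -> List[Op]:
    """Merge runs of adjacent, collapsible Ops of the same type.

    chain is ordered upstream to downstream. Each merged Op receives a single
    argument, named by collapsed_attr_name, that holds the original argument
    sets in that same order.
    """
    result: List[Op] = []
    for (op_type, attr_name), run in groupby(
        chain, key=lambda op: (op.op_type, op.collapsed_attr_name)
    ):
        ops = list(run)
        if attr_name is None or len(ops) == 1:
            result.extend(ops)  # nothing to collapse in this run
            continue
        batch = [op.args for op in ops]
        result.append(Op(op_type, {attr_name: batch}, attr_name))
    return result
```

In this model, two adjacent AttributeSet Ops that both opted in with the same attribute name become one Op whose only argument is the ordered list of the original arguments, while an Op that did not opt in is left untouched and also breaks the run on either side of it.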
To avoid a large number of informational messages filling the Render Log, Geolib3-MT does not log messages related to performance by default. These messages can be enabled by turning on the sceneTraversal.verboseLogging option on a RenderSettings node. Currently these messages include information about:
Geolib3-MT includes a number of settings to control the behaviour of the caching subsystem. The caching subsystem is responsible for the storage and retrieval of previously cooked scene graph locations, known as "cook results". These settings can be modified from the RenderSettings node on a project-by-project basis. Sensible defaults have been provided based on testing against production-scale scenes. Further information about each of the settings is provided below:
Caching, and the trade-off between memory usage and time to first pixel, can have a significant impact on scene traversal and rendering performance. Using the settings provided by Geolib3-MT, it's possible to tune the memory footprint during the scene traversal phase of rendering. Here are some considerations when deciding to experiment with these settings:
If turned on, Geolib3-MT will perform a traversal of the scene graph, populating an internal cache. The extent of this traversal is controlled by the settings under sceneTraversal.cachePrepopulation, which are explained below:
Based on the values of the above settings, on completion of the cache prepopulation phase the Geolib3-MT cache will have been pre-populated with either the whole scene graph or a subsection of it. Geolib3-MT has been optimised to give renderer plug-ins efficient access to this cache via the existing FnScenegraphIterator API. The cache is scalable and thread-safe, so we encourage renderer plug-in writers to access it concurrently to improve the performance of the scene build phase.
Note: If the Geolib3-MT cache is not fully populated, accessing a location that is not yet cached (via FnScenegraphIterator) will result in a cache miss. In this case, the requested location will be cooked on the calling thread.
Various APIs have been extended to improve performance and memory management.
Some new functions named setup(), cleanup() and setRootIterator() have been added to the RenderBase class for renderer plug-ins in the plugin_apis/include/FnRender/plugin/RenderBase.h header. These enable render plug-ins to render multiple frames in a single instance instead of creating a new instance of the plug-in for each frame.
Geolib3-MT adds a new render type, called Preview Render with Profiling, designed to help track down performance problems in scene traversal. This performs a normal Preview Render, but also captures information about which Ops have run, the amount of CPU time they used to cook locations, and the amount of memory used for attributes and Lua scripts.
A Preview Render with Profiling outputs profiling data in two places:
A Preview Render with Profiling can be started from the same menu as any other render, by choosing the Preview Render with Profiling command.
This option will be available for any renderer that already supports a Preview Render, and requires no additional work on the part of the renderer. If the renderer implements the finalize() method of the Geolib3-MT Runtime, these profiling reports will be created when the runtime is finalized; otherwise reports will be written when the render finishes.
The name, type and numerical ID of the Op. Each Op has a name, a type and a unique numeric ID. For example, an OpScript Op may have the name op74, the type OpScript.Lua and the ID 77. Note that the name and ID need not correlate.
The name and type of the Katana node that spawned the Op. In cases where an Op is spawned directly by a Katana node, the name and type of that node are recorded. In cases where the Op was created implicitly, the node name will equal _NoName_ and the type will equal _NoType_. This occurs, for example, with MaterialFilenameResolve Ops: these Ops are created implicitly when a filename needs to be resolved, so no Katana node is identified as the creator.
Note: If sceneTraversal.opTreeOptimizations is turned on and chains of Ops are collapsed, the node name and type will be replaced with a string generated from the chain. If the chain has length t, formed of Ops of type opType, where Op k is named ok and is generated by a Katana node named nk, the general form of the string will be:
cop(o0(n0)->o1(n1)->...->ot(nt))
Note however that the format of this string is not guaranteed to remain fixed.
The total CPU time that Op spent cooking locations. Each Op will cook many locations, and the time spent doing this, across all scene traversal threads, is accumulated. Thus, CPU time should scale with number of scene traversal threads when a scene is traversed in parallel. If this is not the case, there may be a thread-unsafe Op upstream of the Op in question.
The memory footprint of that Op. Each Op must allocate memory to cook locations, and the memory total per Op is aggregated. At present, only the following allocations are recorded:
A summary report will be written to the Render Log upon completion of a Preview Render with Profiling. This report is intended to give a high-level overview of the profile data, and contains:
The relevant section of an example Render Log is shown below:
In addition to the summary report, a JSON file containing the raw profiling data is written to disk. The directory it is written to is determined by the --profiling-dir command-line option; if this is not set, it will be written to the temporary directory for the Katana session. If this directory does not exist, it will be created (if filesystem permissions allow). The filename takes the following format:
profile_<renderer>_previewRender_<datetime>.json
where:

The file contains a single JSON object with the following properties:
Property | Type | Description | Example |
---|---|---|---|
timestamp | string | ISO8601 timestamp at which the profile file was written. | 2019-10-11T09:37:06Z |
renderer | string | Name of the render plug-in. | dl |
renderMethodName | string | Name of the render method; currently always “previewRender”. | previewRender |
environment | object | An object containing the values of various environment variables, such as KATANA_RELEASE, KATANA_ROOT and KATANA_RESOURCES. | { "KATANA_RELEASE": "3.5v1", "KATANA_ROOT": "/opt/foundry/katana3.5", "KATANA_RESOURCES": "<unset>" } |
profileMode | string | Name of the profile mode; currently always “basic”. | basic |
ops | array | Array of objects describing resources consumed by each Op. | See below. |
numOps | number | Length of the ops array. | 78 |
wallTime | number | Wall-clock time in seconds between render start and the profiling file being written; if the renderer implements finalize(), this equates to scene traversal time. | 46.85064 |
cpuTime | number | Sum of CPU time for all Ops, in seconds. | 91.39238 |
memoryUsed | number | Sum of memory footprints for all Ops, in bytes. | 10728607911 |
The ops property contains an array of objects of the following format, one for each Op that was executed during scene traversal.
Property | Type | Description | Example |
---|---|---|---|
opId | number | The unique integer identifier for the Op. | 23 |
opName | string | The unique name of the Op. | op223 |
opType | string | The type of the Op. | AttributeSet |
nodeName | string | The name of the Katana node responsible for creating this Op, or _NoName_ if the Op was created implicitly. | RenderSettings_SetSamples |
nodeType | string | The type of the Katana node responsible for creating this Op, or _NoType_ if the Op was created implicitly. | RenderSettings |
cpuTime | number | The total time this Op spent cooking locations across all threads, in seconds. | 0.54512136 |
memoryUsed | number | The total memory footprint, as defined above, this Op used while cooking locations, in bytes. | 185378321 |
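As an illustration of how the profiling file might be consumed, the following sketch loads a profile and ranks Ops by CPU time. It relies only on the property names listed in the tables above; the function names are hypothetical and not part of Katana. The ratio of cpuTime to wallTime is offered as a rough indicator of how many threads were busy on average, and is only meaningful when wallTime equates to scene traversal time (that is, when the renderer implements finalize()).

```python
import json
from typing import Any, Dict, List


def load_profile(path: str) -> Dict[str, Any]:
    """Read a profiling JSON file written by a Preview Render with Profiling."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def slowest_ops(profile: Dict[str, Any], count: int = 5) -> List[Dict[str, Any]]:
    """Return the `count` Ops with the highest accumulated cpuTime."""
    ranked = sorted(profile["ops"], key=lambda op: op["cpuTime"], reverse=True)
    return ranked[:count]


def average_busy_threads(profile: Dict[str, Any]) -> float:
    """Rough parallelism estimate: total Op CPU time divided by wall time."""
    wall_time = profile["wallTime"]
    return profile["cpuTime"] / wall_time if wall_time > 0 else 0.0


def format_report(profile: Dict[str, Any], count: int = 5) -> str:
    """Build a short plain-text summary of the most expensive Ops."""
    lines = [
        "average busy threads: %.2f" % average_busy_threads(profile),
    ]
    for op in slowest_ops(profile, count):
        lines.append(
            "%s (%s) from node %s: %.3f s, %d bytes"
            % (op["opName"], op["opType"], op["nodeName"],
               op["cpuTime"], op["memoryUsed"])
        )
    return "\n".join(lines)
```

A typical use would be print(format_report(load_profile(path))), where path points at one of the profile_<renderer>_previewRender_<datetime>.json files. An average well below the configured thread count may indicate an Op that does not scale, or one that is marked thread-unsafe.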
The environment variable KATANA_EXPERIMENTAL_MONITOR_OVERLAY is no longer used. The Monitor Layer feature is now an experimental feature of the Viewer (Hydra) tab and can be turned on and off via a toggle button, toggled menu item, or keyboard shortcut (`).
A new mode for optimizing the launch of the renderboot process that hosts the render plug-in has been added to Katana's batch mode, which can be turned on by passing the new --reuse-render-process command-line option to the Katana executable (in addition to --batch). When running Katana in batch mode with this option turned on, multiple frames will be rendered with a single instance of renderboot hosting the render plug-in, rather than launching a new renderboot process for every frame. Using this option can greatly improve performance of rendering sequences of frames, as the render plug-in and all other Katana plug-ins don't have to be reloaded for every frame.
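A batch invocation that uses this mode could be assembled as in the sketch below. Only --batch and --reuse-render-process are taken from the text above; the --katana-file, --render-node and -t options, the executable name katana, and the helper function itself are assumptions made for illustration and should be checked against the batch mode documentation for the Katana version in use.

```python
from typing import List


def build_batch_command(
    katana_file: str,
    render_node: str,
    frame_range: str,
    reuse_render_process: bool = True,
    executable: str = "katana",  # assumed to be on PATH
) -> List[str]:
    """Build an argument list for rendering a frame range in batch mode."""
    command = [
        executable,
        "--batch",
        "--katana-file=%s" % katana_file,   # assumed option name
        "--render-node=%s" % render_node,   # assumed option name
        "-t", frame_range,                  # assumed frame range option
    ]
    if reuse_render_process:
        # Render all frames with a single renderboot instance.
        command.append("--reuse-render-process")
    return command
```

The returned list can be passed to subprocess.run() as is. Keeping the option behind a boolean makes it straightforward to time the same frame range with and without process reuse.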
TP 55592 / BZ 27736 - A new -V / --verbose command-line option for controlling the level of verbosity of logging informational messages has been added to the Katana executable:
-V LEVEL, --verbose=LEVEL The level of verbosity of logging informational messages. Defaults to 1. Set to 0 to suppress most informational messages.
TP 280477 - It is now possible to copy the textual representation of selected items from deferred group tree widgets in the Attributes tab to the clipboard. This type of widget is used, for example, for the lightList group attribute at /root/world and for the material.nodes group attribute on material locations that represent Network Materials.
Copyright © 2020 The Foundry Visionmongers Ltd.