Iop: Spatial Operators
=======================

The Iop class, as previously discussed, is the parent class, at some level in the inheritance tree, of all image operators. It 
encapsulates the fundamentals of the 2D image architecture in NUKE, from caching through to channels. Most of the specifics
of the Iop class itself have already been touched upon, so please read :ref:`2d-architecture` before working through this section.

Compared with more specialised image operators such as PixelIop or DrawIop, the Iop class provides less restricted image access 
functions, at the expense of requiring more supporting boilerplate code.

If your image processing algorithm cannot be expressed so that it needs only a single
row of input data to calculate the corresponding output row, then Iop is the way to go. An Iop is able to access image
data at any point on the input for a given output pixel, and is able to lock areas of the input image into memory (known as 
'tiles' and 'interests' in NUKE terminology). Of course, the larger the area you need to access, the greater the memory 
overhead of your Op. An algorithm should be reduced to its lowest possible requirement to ensure low memory overhead and
smooth operation on all hardware.

If you are just getting started with the NDK you may want to experiment first with PixelIops, before moving onto the more complex
Iop class.

* Intended for image processing algorithms that cannot be factored to rely solely on the corresponding input pixel or row (PixelIop), and that do not simply draw into a single channel (DrawIop).
* Able to access any input pixel.
* Able to lock areas of input images into memory and optionally enforce calculation of their contents at a particular time.

The Iop Class Specifics & Required Virtual Calls
------------------------------------------------

See the :ref:`2d-iop-call-order` section for a detailed overview of the calls used by Iop.

The three core calls required are:

.. cpp:function:: void Iop::_validate(bool for_real )
  
  Define what the output of your Op will be, for example the size of the output image and the channels produced.
  
  You should in turn call validate on all your inputs.

.. cpp:function:: void Iop::_request(int x, int y, int r, int t, ChannelMask, int count)

  Your Op will be called with a request area, which is the area of the image that a downstream Op will ask for during engine.
  
  For the requested area you need, in turn, to call request on all your inputs for the area required to produce the image size requested.
  
  For instance, if your Op does a blur with a kernel size of 20, you may need to request your input with an x,y,r,t that is 20 pixels bigger than the area requested.

.. cpp:function:: void Iop::engine(int y, int l, int r, ChannelMask channels, Row& row)

  Do the actual work.  Here you should get pixels from your inputs, process them and return them in 'row' for a given line as given in 'y'.  In 'row' you should fill from pixels starting at 'l' and finishing at 'r' for all 'channels'.

A Simple Iop example - AddInputs
--------------------------------

In this example we show a simple Iop that adds two inputs together.

Let's run through the main concepts. Open the file 'AddInputs.cpp' in the example directory to see the full example.

.. code-block:: c

    int max_inputs() const { return 2; }
    int min_inputs() const { return 2; }

Here we are overriding some optional virtuals to define how many input pipes to show in the DAG.

Max inputs says we will never get more than two inputs.
Min inputs says we will never have fewer than two inputs.

Min inputs also means that there will *always* be two inputs connected to this Op, even if the node in the DAG is not connected to anything.

The inputs, if disconnected, will in fact be instances of the Op 'Black', which produces black pixels.  This has a nice effect on our code, as we don't need to check for disconnected inputs in any part of our processing.


.. code-block:: c

  void AddInputs::_validate(bool for_real)
  {
    copy_info(); // copy bbox channels etc from input0
    merge_info(1); // merge info from input 1
  }

In the implementation of the validate function we first copy the info (that is, the channels, format and image size) from input0, the first input, by calling copy_info().

Then we merge the info from the other input.  It is important that we perform the merge, otherwise our output image size would only ever be the size of the Op connected to the first input.

Note that copy_info and merge_info implicitly call validate on each one of the inputs.
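
To illustrate why the merge matters, here is a minimal standalone sketch, assuming that the bounding box part of the merge amounts to taking the union of two boxes. ``BBox`` and ``merge_boxes`` are hypothetical names invented for this illustration and are not part of DDImage; the real merge_info also combines channels and other info.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for an image bounding box (NOT the DDImage Box class).
// x and y are the lower-left corner, r and t the upper-right corner.
struct BBox {
  int x, y, r, t;
};

// Union of two boxes: the smallest box that contains both of them.
inline BBox merge_boxes(const BBox& a, const BBox& b)
{
  assert(a.x <= a.r && a.y <= a.t);
  assert(b.x <= b.r && b.y <= b.t);
  return BBox{ std::min(a.x, b.x), std::min(a.y, b.y),
               std::max(a.r, b.r), std::max(a.t, b.t) };
}
```

If the second input extends beyond the first in any direction, only the merged box covers every pixel that the sum of the two inputs can touch.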

.. code-block:: c

  void AddInputs::_request(int x, int y, int r, int t, ChannelMask channels, int count)
  {
    // request from input 0 and input 1
    input(0)->request( x, y, r, t, channels, count );
    input(1)->request( x, y, r, t, channels, count );
  }

Our request is quite simple.  We are not accessing pixels spatially so we just request from both our inputs the same box and channels.

Note how we access the inputs by calling **input(inputNo)** which returns a pointer to each input Op.

.. code-block:: c

  void AddInputs::engine ( int y, int x, int r,
                                ChannelMask channels, Row& row )
  {
    // input 0 row
    row.get(input0(), y, x, r, channels);
   
    // input 1 row
    Row input1Row(x, r);
    input1Row.get(input1(), y, x, r, channels);
   
    foreach ( z, channels ) {
      const float* input1 = input1Row[z] + x;
      const float* input0  = row[z] + x;
      float* outptr = row.writable(z) + x;
      const float* end = outptr + (r - x);
   
      while (outptr < end) {
        *outptr++ = *input0++ + *input1++;
      }
    }
  }

Now this is the part that does the real work.  We first fetch from input 0 and input 1 the line required, from x through to r, for the given channels.  We reuse the output row to fetch input0 and create a new Row to hold input1's row.  Note we use the helper functions **input0()** and **input1()** to fetch the input Op references.  These functions are the equivalent of ``*input(0)`` and ``*input(1)``.

Next we loop through all the channels and simply add all the pixels from input0 and input1 together and output them into 'row'.
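
The pointer arithmetic in the inner loop can be tried outside NUKE. The following is a minimal sketch, assuming each channel of a row is a plain float buffer indexed by absolute pixel position; ``add_rows`` is a hypothetical helper written for this illustration, not a DDImage function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: adds two scanline buffers over the pixel range [x, r).
// As in the engine above, pointers are offset by x so that only the requested
// span is read and written; pixels outside [x, r) are left untouched.
inline void add_rows(const std::vector<float>& in0,
                     const std::vector<float>& in1,
                     std::vector<float>& out,
                     int x, int r)
{
  assert(0 <= x && x <= r);
  assert(static_cast<std::size_t>(r) <= in0.size());
  assert(static_cast<std::size_t>(r) <= in1.size());
  assert(static_cast<std::size_t>(r) <= out.size());

  const float* a = in0.data() + x;
  const float* b = in1.data() + x;
  float* outptr = out.data() + x;
  const float* end = outptr + (r - x);

  while (outptr < end) {
    *outptr++ = *a++ + *b++;
  }
}
```

The end pointer is computed from the output pointer and the span width ``r - x``, which mirrors how the engine bounds its loop.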


Working With Tiles: SimpleBlur
-------------------------------

Often when performing image calculations your engine call will need to access more than just one Row from its input to produce its output Row.  In order to do that NUKE has a concept of a **Tile**.

A **Tile** has accessor functions that allow you to access pixels as a 2D array of the given tile size.

It is important to note, as described in the :ref:`2d-architecture` section, that the fundamental unit of image processing in NUKE is still a **Row**.  When a **Tile** is created, NUKE creates a cache on the input that the **Tile** was created for (if one didn't exist already) and then fills the Rows required on that input to fill the **Tile**.  Those Rows are then locked in the NUKE cache for the existence of the **Tile** object.  You can then use accessor functions on the **Tile** to read the internal Rows in the cache for that input.

This also means when you have multiple threads all creating Tiles that overlap, quite often when the Tile is created many of the Rows are already in the cache for that Tile and only minimal extra processing occurs.

A final word of warning about Tiles and the request call.  It is very important that your **Tile** bounds *never* exceed the bounds requested from the input in **_request**.  Creating a tile that exceeds the requested area can have unexpected effects, including reading garbage pixels or even crashes.

Now let's look at an example, SimpleBlur.cpp.  This example does a simple box blur and covers most of the concepts of using **Tile**.

.. code-block:: c 

  int _size;
  
  SimpleBlur (Node* node) : Iop (node)
  {
    _size = 20;
  }

Firstly we have a hardcoded kernel size for our blur of 20 pixels, set in the constructor.

.. code-block:: c

  void SimpleBlur::_validate(bool for_real)
  {
    copy_info(); // copy bbox channels etc from input0, which will validate it.
    info_.pad( _size ); // grow the bounding box by the blur size
  }

In validate we copy the input size but then we actually grow our bounding box by the blur size.

Without this extra step our blur would have been cropped at the edges to the input size.

.. code-block:: c

  void SimpleBlur::_request(int x, int y, int r, int t, ChannelMask channels, int count)
  {
    // request extra pixels around the input
    input(0)->request( x - _size , y - _size , r + _size, t + _size, channels, count );
  }

Next we request from our input.  We also grow the request area by the blur size as we need this many extra pixels around the requested box from the input to do the blur.
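
The relationship between the padded request and the tiles that engine will later create can be checked with simple box arithmetic. This is a minimal sketch under the assumption that r and t are exclusive upper bounds; ``Area``, ``grow``, ``contains`` and ``tile_area_for_line`` are hypothetical names made up for this illustration and do not exist in DDImage.

```cpp
#include <cassert>

// Hypothetical rectangle type: x, y inclusive; r, t exclusive.
struct Area {
  int x, y, r, t;
};

// Pad an area by 'size' pixels on every side, as _request does for its input.
inline Area grow(const Area& a, int size)
{
  assert(size >= 0);
  return Area{ a.x - size, a.y - size, a.r + size, a.t + size };
}

// True if 'inner' lies completely inside 'outer'.
inline bool contains(const Area& outer, const Area& inner)
{
  return inner.x >= outer.x && inner.y >= outer.y &&
         inner.r <= outer.r && inner.t <= outer.t;
}

// The input area that an engine call for line y, pixels [x, r), reads
// when it builds a tile padded by 'size'.
inline Area tile_area_for_line(int y, int x, int r, int size)
{
  return Area{ x - size, y - size, r + size, y + size };
}
```

For any line inside the requested area, the tile area falls within the grown request. Without the padding in _request, the same tile would reach outside what was requested from the input.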

.. code-block:: c

  void SimpleBlur::engine ( int y, int x, int r,
                                ChannelMask channels, Row& row )
  {
   
    // make a tile for the current line with padding around it for the blur
    Tile tile( input0(), x - _size , y - _size , r + _size, y + _size , channels);  
    if ( aborted() ) {
      std::cerr << "Aborted!";
      return;
    }
    
    foreach ( z, channels ) {
      float* outptr = row.writable(z) + x;
      
      for( int cur = x ; cur < r; cur++ ) {
        float value = 0;
        float div = 0;
    
        if ( intersect( tile.channels(), z ) ) {  
          // a simple box blur
          for ( int px = -_size; px < _size; px++ ) {
            for ( int py = -_size; py < _size; py++ ) { 
              value += tile[z][ tile.clampy(y + py) ][ tile.clampx(cur + px) ];
              div++;
            }
          }
          if ( div )
            value /= div;
        }
        *outptr++ = value;
      }
    }
  }

The engine call first fetches a **Tile** of input0 around the current output line growing it by the blur size. 

After creating a **Tile** you *must* check for aborted() to check if filling the **Tile** failed.  If it did fail you should return immediately.

Then we loop through all pixels and channels, averaging the pixels in the Tile around each output pixel to do our box blur.  Note that the pixels in the tile can be accessed via a multi-dimensional array per channel.  Note also that the Tile is always indexed by absolute pixel location, not by offset into the Tile array.  For example, if the Tile is 9x9 pixels big starting at pixel x=101, y=101, then the first pixel is accessed as tile[z][101][101], **not** tile[z][0][0].
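
Absolute indexing and clamping can be illustrated without NUKE. The sketch below assumes a single channel stored row by row; ``MiniTile`` and ``box_average`` are hypothetical and written only for this illustration. They are not the DDImage **Tile** class, although the clamp helpers are named after the ones used in the engine above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel tile covering pixels [x, r) by [y, t).
class MiniTile {
public:
  MiniTile(int x, int y, int r, int t)
    : x_(x), y_(y), r_(r), t_(t),
      data_(static_cast<std::size_t>(r - x) * static_cast<std::size_t>(t - y), 0.0f)
  {
    assert(x < r && y < t);
  }

  // Clamp an absolute coordinate to the area covered by the tile.
  int clampx(int px) const { return std::min(std::max(px, x_), r_ - 1); }
  int clampy(int py) const { return std::min(std::max(py, y_), t_ - 1); }

  // Access by ABSOLUTE pixel position, row first then column.
  float& at(int py, int px) { return data_[index(py, px)]; }
  float at(int py, int px) const { return data_[index(py, px)]; }

private:
  std::size_t index(int py, int px) const
  {
    assert(px >= x_ && px < r_ && py >= y_ && py < t_);
    return static_cast<std::size_t>(py - y_) * static_cast<std::size_t>(r_ - x_)
         + static_cast<std::size_t>(px - x_);
  }

  int x_, y_, r_, t_;
  std::vector<float> data_;
};

// Average of the window around (cx, cy), using the same loop bounds as the
// SimpleBlur engine. Coordinates outside the tile are clamped to its edge.
inline float box_average(const MiniTile& tile, int cx, int cy, int size)
{
  float value = 0.0f;
  int div = 0;
  for (int px = -size; px < size; px++) {
    for (int py = -size; py < size; py++) {
      value += tile.at(tile.clampy(cy + py), tile.clampx(cx + px));
      div++;
    }
  }
  return div ? value / div : 0.0f;
}
```

With a tile constructed as ``MiniTile tile(101, 101, 110, 110)``, the first pixel is ``tile.at(101, 101)``, and asking for ``tile.at(0, 0)`` trips the bounds assertion.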

Note in this example we could have created a new Tile for every pixel rather than one big Tile up front for the whole Row.

.. note::
  
  Because **Tile** objects lock Rows into NUKE's internal cache, and those Rows cannot be freed under low memory conditions, it is best to keep their lifetime as short as possible and their size as small as possible.  If you require access to the entire image but can access it one Row at a time, it is better to loop through the image filling a **Row** object for every Row rather than creating a **Tile** object for the whole area.
 
Full Frame Processing and Interests
-----------------------------------

So far in this section we've discussed methods for accessing small subsections of the image to generate the current output
pixel. But what about the circumstance where you need to access the entire image for a single output pixel, as in optical flow based
motion estimation for example?

To do this we'll lock the first **engine()** thread, and in that thread force calculation of the entirety of the incoming image.  The same thread can then do whatever global calculation is required based on this image data, before unlocking and allowing the remaining engine threads to process based on these results.

The example we'll give is a 'Normalise' operator which analyses the entire input image to find the highest value and then normalises this value to 1.0.

.. code-block:: c

  void Normalise::engine ( int y, int x, int r,
                                ChannelMask channels, Row& row )
  {
    {
      Guard guard(_lock);
      if ( _firstTime ) {
        // do analysis.
        Format format = input0().format();

        // these useful format variables are used later
        const int fx = format.x();
        const int fy = format.y();
        const int fr = format.r();
        const int ft = format.t();

        const int height = ft - fy ;
        const int width = fr - fx ;
        
        ChannelSet readChannels = input0().info().channels();

        Interest interest( input0(), fx, fy, fr, ft, readChannels, true );
        interest.unlock();
        
        // fetch each row and find the highest number pixel
        _maxValue = 0; 
        for ( int ry = fy; ry < ft; ry++) {
          progressFraction( ry - fy, height );
          Row row( fx, fr );
          row.get( input0(), ry, fx, fr, readChannels );
          if ( aborted() )
            return;
            
          foreach( z, readChannels ) {
            const float *CUR = row[z] + fx;
            const float *END = row[z] + fr;
            while ( CUR < END ) {
              _maxValue = std::max( (float)*CUR, _maxValue );
              CUR++;
            }
          }
        }
        _firstTime = false;
      }
    } // end lock
    
    Row in( x,r);
    in.get( input0(), y, x, r, channels );
    if ( aborted() )
      return;
    
    foreach( z, channels ) {
      float *CUR = row.writable(z) + x;
      const float* inptr = in[z] + x;
      const float *END = row[z] + r;
      while ( CUR < END ) {
          *CUR++ = *inptr++ * ( 1. / _maxValue );
      }
    }
  }

First up, the two member variables **_firstTime** and **_lock** allow the engine threads to figure out whether their call is the first for this image, or whether another thread is already doing the necessary work. Since *engine* is multithreaded, we can't ascertain from variables such as the current row index whether this is the first worker thread called (the thread handling row 2 could be called before the thread handling row 1). Instead, we set **_firstTime** to true on initialisation and in *_validate*, and then check it at the beginning of the engine function. Alternative implementations could do the image access work in *open*. Note that using *_validate* for such image access is not advisable, as it will lock up the NUKE UI while the input image is calculated.

The **Guard** is a DDImage utility class defined in the DDImage **Thread.h** header. It's used to block all other threads attempting to acquire **_lock**, preventing them from doing anything until the guard is released when it goes out of scope.
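
The first-caller pattern can be sketched with the C++ standard library. This is a minimal illustration only, assuming ``std::mutex`` and ``std::lock_guard`` as stand-ins for the DDImage lock and **Guard**; ``NormaliseSketch`` and its members are hypothetical and exist nowhere in the NDK.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for an op whose engine needs a whole-image analysis.
class NormaliseSketch {
public:
  explicit NormaliseSketch(std::vector<std::vector<float>> image)
    : image_(std::move(image)) {}

  // Analogue of engine(): may be called from several threads, in any row order.
  std::vector<float> engine(int y)
  {
    assert(y >= 0 && static_cast<std::size_t>(y) < image_.size());
    {
      // Plays the role of Guard: every other caller waits here until released.
      std::lock_guard<std::mutex> guard(lock_);
      if (firstTime_) {
        // Only the first caller scans the whole image for its maximum.
        maxValue_ = 0.0f;
        for (const auto& row : image_)
          for (float v : row)
            maxValue_ = std::max(maxValue_, v);
        ++analysisRuns_;
        firstTime_ = false;
      }
    } // guard goes out of scope here, releasing the lock

    // Per-row work, done outside the lock so rows can proceed in parallel.
    std::vector<float> out(image_[static_cast<std::size_t>(y)]);
    if (maxValue_ > 0.0f)
      for (float& v : out)
        v /= maxValue_;
    return out;
  }

  int analysisRuns() const { return analysisRuns_; }

private:
  std::vector<std::vector<float>> image_;
  std::mutex lock_;
  bool firstTime_ = true;
  float maxValue_ = 0.0f;
  int analysisRuns_ = 0;
};
```

Whichever row happens to be requested first pays for the analysis, and every later call skips straight to the per-row work. The sketch also guards against a maximum of zero, which the listing above does not.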

In this circumstance we then use the first engine thread to pull the whole input image and find the highest value to use for our normalise calculation.  Instead of using a **Tile**, we pull each **Row** out one at a time, looping through the entire input image.

Before we fetch each **Row** however we add an interesting call that creates an **Interest** object.
   
.. code-block:: c

  Interest interest( input0(), fx, fy, fr, ft, readChannels, true );
  interest.unlock();
  
An **Interest** object is very similar to a **Tile** in that it locks an area of pixels into the NUKE Row cache.  The main difference is that, although it locks the area into the cache, it does not fill the cache straight away during the constructor like **Tile** does.

The final argument to **Interest** is interesting too.  Here we are setting the multi-thread flag to *true*.  What this does is start up some threads that will fill the cache for the interest area in the background.  The reason we are doing this is that we have effectively made NUKE single threaded by locking out all the render threads in our engine call.  Without this flag, we would have to fetch the input Rows one at a time from our single thread.  With this flag, the input cache is filled by multiple threads.

.. note::
   Both **Interest** and **Tile** have a multi-thread flag for their constructor to fill the cache from multiple threads.  **Never use this option unless you have locked all the render threads in engine like this example**.   Usually when the engine call is not locked Interests or Tiles overlap and other render threads are filling the cache effectively making the calls already multi-threaded.  Adding this multi-thread option in this case will severely degrade performance as multiple threads fight for access to the cache.
   
Finally, after the **Interest** is created we call 'unlock' on it.  This tells NUKE that if memory runs low it can still free lines from the cache if required.  In this case the **Interest** is more of a hint that we want NUKE to keep those rows in cache because we will access them all, *but* it's not critical that the whole interest area is in memory at once.

With these optimisations, the Normalise example here can still effectively normalise very large images even though it requires the whole input image in the first engine call.

.. note::
    
   Accessing the entire image involves precalculating the entire source image before work can begin on calculating the target image. This means that the op can have a large memory footprint and will break the scanline processing architecture, so in many circumstances it will appear 'slow' to users in the interface. If your algorithm can be refactored to not rely on the entire source image then it should be.
   

.. TODO: Sample Calls & Pixels in NUKE: Building IDistort

Exercise: Build a Median Node
-----------------------------

Now let's take what we've covered so far in this section and apply it. For this exercise, take the SimpleBlur.cpp source code in the NDK, get it building, then:

* Change its name to SimpleMedian, and ensure it can be created in the DAG.
* Add a knob to the node to control the Median size.
* Amend the engine to find the median value of the Tile and set the output Pixel as required.
* Amend the help and tooltip text to reflect this.
* Relax with a satisfying pint of beer.
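
As a starting point for the engine change, the median of a window of samples can be found as in the sketch below. This is one possible approach rather than a reference solution; ``median_of`` is a hypothetical helper, and collecting the samples from the Tile is left as part of the exercise.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns the median of a set of samples.
// The vector is taken by value because std::nth_element reorders it.
// For an even number of samples this returns the upper of the two
// middle values rather than their average.
inline float median_of(std::vector<float> samples)
{
  assert(!samples.empty());
  auto mid = samples.begin() + samples.size() / 2;
  std::nth_element(samples.begin(), mid, samples.end());
  return *mid;
}
```

``std::nth_element`` only partially sorts the samples, which is usually cheaper than a full sort when just the middle value is needed.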


