Scripted Data Transform Component

The Scripted Data Transform component uses a Groovy script to define a data transformation. The user who defines a pipeline containing this component must supply a script with the logic for transforming the objects that pass through it. The script should implement the com.datashyft.pipeline.scripting.ScriptedTransform interface, which defines the required transformInput method. As items arrive on the component’s input channels, they are passed to the script’s transformInput method, which returns a list of zero or more data items, each specifying the output channel it should be sent on. The items in the returned list are sent, in list order, on the output channels they specify. If an output item specifies a nonexistent output channel, the item is discarded.
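The routing behavior described above can be illustrated with a small, self-contained sketch. It is written in plain Java, which should port to a Groovy script with few changes. Every type here is a hypothetical stand-in rather than the DataShyft API: Item plays the role of an output item, dispatch imitates what the component does with the returned list, and the channel names "numbers" and "text" are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RoutingSketch {

    /** Hypothetical stand-in for an output item: a payload plus its target channel. */
    public static final class Item {
        final String channel;
        final String payload;

        Item(String channel, String payload) {
            this.channel = channel;
            this.payload = payload;
        }
    }

    /** Hypothetical script logic: one output item per comma-separated field. */
    public static List<Item> transformInput(String inputChannelName, String input) {
        List<Item> out = new ArrayList<>();
        if (input.isEmpty()) {
            return out; // returning zero items is a valid result
        }
        for (String field : input.split(",")) {
            // Numeric fields go to "numbers", everything else to "text".
            String channel = field.matches("\\d+") ? "numbers" : "text";
            out.add(new Item(channel, field));
        }
        return out;
    }

    /** Delivers items in list order and drops any whose channel is not declared. */
    public static Map<String, List<String>> dispatch(List<Item> items, Set<String> declaredChannels) {
        Map<String, List<String>> delivered = new LinkedHashMap<>();
        for (Item item : items) {
            if (!declaredChannels.contains(item.channel)) {
                continue; // nonexistent output channel: the item is discarded
            }
            delivered.computeIfAbsent(item.channel, k -> new ArrayList<>()).add(item.payload);
        }
        return delivered;
    }

    public static void main(String[] args) {
        // Only "numbers" is declared, so items routed to "text" are dropped.
        Set<String> declared = new HashSet<>(Arrays.asList("numbers"));
        System.out.println(dispatch(transformInput("input", "a,1,b,2"), declared));
    }
}
```

In this sketch a script that targets an undeclared channel loses those items silently, so it is worth checking channel names in a script against the channels configured on the component.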

Data Governance 

Data items returned from the Scripted Data Transform component are automatically configured with a parent-child relationship between the input data item and the output data items.

By default, if the output of the transform is a different object from the input item, the output item is registered as a derivative of the input item. If an input item produces no output from the transform, its ID is stored. When the component later returns one or more output items, those items are given as parents the DGS IDs of all input items received since the last output, and the stored ID list is reset. If the transform returns the input item unmodified, no registration is performed.
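The default bookkeeping can be pictured with the following sketch, written in plain Java with hypothetical names (ParentTrackingSketch, parentsFor). It is not the component's implementation and rests on three assumptions the text above does not settle: the current input counts among the "input items since the last output", an unmodified pass-through leaves the stored ID list untouched, and "same object" is approximated by comparing IDs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ParentTrackingSketch {

    // IDs of inputs seen since the last time the transform produced output.
    private final List<String> pendingParentIds = new ArrayList<>();

    /**
     * Returns the parent IDs to register for the outputs of one transform call.
     * An empty result means nothing is registered for this call.
     */
    public List<String> parentsFor(String inputId, List<String> outputIds) {
        if (outputIds.isEmpty()) {
            pendingParentIds.add(inputId); // no output: remember this input
            return Collections.emptyList();
        }
        // Assumption: an unmodified pass-through is a single output with the input's ID.
        boolean unmodified = outputIds.size() == 1 && outputIds.get(0).equals(inputId);
        if (unmodified) {
            return Collections.emptyList(); // same item returned: no registration
        }
        // Assumption: the current input is included along with the stored IDs.
        List<String> parents = new ArrayList<>(pendingParentIds);
        parents.add(inputId);
        pendingParentIds.clear(); // reset the stored ID list
        return parents;
    }
}
```

For example, if inputs a and b produce no output and input c then produces one item, that item's parents in this sketch are a, b, and c; the next input to produce output starts from an empty stored list.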

If the transform assigns a dgsParents tag to its output data items, the output items are registered as being derived from all the data items specified in the tag.  If the tag is present, the automatic registration behavior of the component is bypassed.

Auditing

Scripts can send events to the deployment’s audit logs using an AuditLogger object. If your script needs to send audit logs, extend the com.datashyft.pipeline.scripting.ScriptedAuditor abstract class, which provides access to an AuditLogger instance through its getAuditLogger() method. The AuditLogger object defines a single method, sendAuditLog(category, message), which sends a message to the deployment’s audit log. For example:

this.auditLogger.sendAuditLog(LogCategory.INFO, "Discarding an item")

The available categories are defined in the LogCategory class and include DEBUG, INFO, WARN, and ERROR.

Exception Handling

If the script cannot process a data item immediately, it can signal this to the deployment by throwing a ProcessAbortedException from the transformInput method. Throwing this exception causes the item to be requeued for later processing. If the exception includes a retry delay parameter, the component waits at least that long before retrying. The exception can also specify alternate output data to be emitted while the original input is requeued.
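The requeue-and-retry flow can be sketched as follows, in plain Java on a simulated clock so the example runs instantly. AbortedException is a hypothetical stand-in for ProcessAbortedException, whose real constructor and parameters are not shown on this page and may differ; the run method, the item "late", and the timing values are likewise invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class RequeueSketch {

    /** Hypothetical stand-in for ProcessAbortedException; the real signature may differ. */
    public static class AbortedException extends RuntimeException {
        final long retryDelayMillis;

        public AbortedException(String message, long retryDelayMillis) {
            super(message);
            this.retryDelayMillis = retryDelayMillis;
        }
    }

    /** Hypothetical script logic: the item "late" cannot be processed before time 500. */
    static String transformInput(String input, long now) {
        if (input.equals("late") && now < 500) {
            throw new AbortedException("dependency not ready", 200);
        }
        return input.toUpperCase();
    }

    /** Processes the queue on a simulated clock; aborted items go to the back of the queue. */
    public static List<String> run(List<String> inputs) {
        Deque<String> queue = new ArrayDeque<>(inputs);
        List<String> log = new ArrayList<>();
        long now = 0;
        while (!queue.isEmpty()) {
            String item = queue.poll();
            try {
                log.add(now + ":" + transformInput(item, now));
            } catch (AbortedException e) {
                queue.add(item);           // requeue the original input
                now += e.retryDelayMillis; // simulate waiting at least the retry delay
            }
        }
        return log;
    }
}
```

Calling run with the inputs "late" and "ok" processes "ok" first, after the initial abort, and "late" only once the simulated time has passed 500. A script that throws on every attempt would loop forever in this sketch, so a real script would want a condition that eventually clears.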

Example Script

import com.datashyft.core.model.dataobjects.PipelineData;
import com.datashyft.pipeline.util.AbstractGroovyComponent.AuditLogger;
import com.topiatechnology.mdci.strategies.DataOut;
import com.datashyft.pipeline.scripting.ScriptedTransform;

import java.util.ArrayList;
import java.util.List;

// @InputChannel("input", "PipelineData")
// @OutputChannel("output", "PipelineData")
class PassThroughTransform implements ScriptedTransform {

    List<DataOut> transformInput(
            String inputChannelName,
            PipelineData input) {

        // Items to emit; each DataOut specifies its output channel.
        // Returning the list empty, as written here, emits nothing.
        List<DataOut> objectsToOutput = new ArrayList<>();

        // Transform the object by inspecting input
        // and constructing/populating objectsToOutput

        return objectsToOutput;
    }
}

Input Channels

channelN — Custom input channels must be specified for receiving data. Input channel names must match corresponding output channels.

Note: An input channel must not be named Error. Doing so causes a startup error and places the pipeline in an error state.

See the Custom Input and Output Channels page for details on how to specify the custom input channels.

Output Channels

channelN — Custom output channels must be specified for data output. These channels output transformed PipelineData objects. See the Custom Input and Output Channels page for details on how to specify the custom output channels.

error — Outputs an ErrorOutput object for any errors that occur executing the transform.

Parameters

script — (String) The Groovy script to use to determine how a data item should be transformed.
