Scripted Data Transform Component
The Scripted Data Transform component uses a Groovy script to define a data transformation. The user who defines a pipeline containing this component must supply a script with the logic for transforming the objects that pass through it. The script should implement the com.datashyft.pipeline.scripting.ScriptedTransform interface, which defines the required transformInput method. Each item received on one of the component’s input channels is passed to the script’s transformInput method, which returns a list of zero or more data items, each naming the output channel it should be sent on. The items in the returned list are sent, in order, on the output channels they name. If an output item names a non-existent output channel, that item is discarded.
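The routing rule described above can be illustrated with a small, self-contained sketch in plain Java. It is not the component's implementation: OutItem and RoutingSketch are hypothetical stand-ins invented for this illustration, since the real DataOut API is not documented on this page. The sketch only shows the two stated behaviors: items keep their returned order, and items naming an undeclared channel are dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of how returned items are routed (not the real component code).
class RoutingSketch {
    // Groups payloads by output channel in returned order and
    // drops any item whose channel is not among the declared ones.
    static Map<String, List<Object>> route(List<OutItem> results, Set<String> declared) {
        Map<String, List<Object>> sent = new LinkedHashMap<>();
        for (OutItem item : results) {
            if (!declared.contains(item.channel)) {
                continue; // non-existent output channel: the item is discarded
            }
            List<Object> channelItems = sent.get(item.channel);
            if (channelItems == null) {
                channelItems = new ArrayList<>();
                sent.put(item.channel, channelItems);
            }
            channelItems.add(item.payload);
        }
        return sent;
    }

    public static void main(String[] args) {
        List<OutItem> results = new ArrayList<>();
        results.add(new OutItem("output", "a"));
        results.add(new OutItem("missing", "b")); // names an undeclared channel
        results.add(new OutItem("output", "c"));
        Set<String> declared = new HashSet<>(Arrays.asList("output", "error"));
        System.out.println(route(results, declared)); // prints {output=[a, c]}
    }
}

// Hypothetical stand-in for an output item: a payload plus the channel it names.
class OutItem {
    final String channel;
    final Object payload;

    OutItem(String channel, Object payload) {
        this.channel = channel;
        this.payload = payload;
    }
}
```

In this model the item addressed to "missing" never appears in the result, which mirrors the silent discard described above. A script that wants such items to be visible would need to check its channel names itself.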
Data Governance
The component automatically sets up the parent-child relationship between each input data item and the output data items returned from the transform.
By default, if the transform outputs an object that is different from the input item, the output item is registered as a derivative of the input item. If an input item does not generate any output from the transform, its ID is stored. When the component later returns one or more output items, those items are given, as their parents, the DGS IDs of all input items received since the last output, and the stored ID list is reset. If the transform returns the input item unmodified, no registration is performed.
If the transform assigns a dgsParents tag to its output data items, the output items are registered as being derived from all the data items specified in the tag. If the tag is present, the automatic registration behavior of the component is bypassed.
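The default registration rule (the case where no dgsParents tag is set) can be modeled with a short Java sketch. ParentTracker and its record method are hypothetical names invented here, not part of the product API. The sketch assumes that the input item that finally produces output counts among the parents, and it leaves out the pass-through case where the input is returned unmodified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the default parent registration (not the real component code).
class ParentTracker {
    // IDs of input items received since the last output.
    private final List<String> pendingParents = new ArrayList<>();

    // Returns output ID -> parent IDs registered by this call (empty if nothing was output).
    Map<String, List<String>> record(String inputId, List<String> outputIds) {
        pendingParents.add(inputId); // assumption: the current input is also a parent
        Map<String, List<String>> registered = new LinkedHashMap<>();
        if (outputIds.isEmpty()) {
            return registered; // no output: the ID stays stored for a later output
        }
        for (String outputId : outputIds) {
            registered.put(outputId, new ArrayList<>(pendingParents));
        }
        pendingParents.clear(); // the stored ID list is reset after an output
        return registered;
    }

    public static void main(String[] args) {
        ParentTracker tracker = new ParentTracker();
        List<String> none = Collections.emptyList();
        tracker.record("in-1", none);
        tracker.record("in-2", none);
        // prints {out-1=[in-1, in-2, in-3]}
        System.out.println(tracker.record("in-3", Arrays.asList("out-1")));
        // prints {out-2=[in-4]}
        System.out.println(tracker.record("in-4", Arrays.asList("out-2")));
    }
}
```

This shape matters for aggregating transforms: a script that buffers several inputs and emits one combined item gets all buffered inputs as parents without setting any tag.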
Auditing
Scripts can send events to the deployment’s audit logs through an AuditLogger object. If your script needs to write audit logs, extend the com.datashyft.pipeline.scripting.ScriptedAuditor abstract class, which provides access to an AuditLogger instance through its getAuditLogger() method. The AuditLogger object defines a single method, sendAuditLog(category, message), which sends a message to the deployment’s audit log. For example:
this.auditLogger.sendAuditLog(LogCategory.INFO, "Discarding an item")
The available categories are defined in the LogCategory class and include DEBUG, INFO, WARN, and ERROR.
Exception Handling
If the script cannot process a data item immediately, it can signal this to the deployment by throwing a ProcessAbortedException from the transformInput method. Throwing this exception causes the item to be requeued for later processing. If the exception includes a retry delay parameter, the component waits at least that long before retrying. The exception can also specify alternate output data to be emitted while requeuing the original input.
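The requeue-and-retry behavior can be illustrated with a simulated clock. This is a sketch under stated assumptions, not the deployment's scheduler: AbortSketchException is a hypothetical stand-in for ProcessAbortedException (whose real constructor is not shown on this page), RequeueSketch is invented for the illustration, and the alternate-output option is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of requeuing with a retry delay (not the real component code).
class RequeueSketch {
    static final long RETRY_DELAY_MILLIS = 100;

    // Hypothetical transform: aborts until the simulated clock reaches readyAt.
    static String transform(String item, long now, long readyAt) {
        if (now < readyAt) {
            throw new AbortSketchException(RETRY_DELAY_MILLIS);
        }
        return item.toUpperCase();
    }

    // Retries one item until it succeeds and returns the simulated time of each attempt.
    static List<Long> run(String item, long readyAt) {
        List<Long> attemptTimes = new ArrayList<>();
        long now = 0;
        while (true) {
            attemptTimes.add(now);
            try {
                transform(item, now, readyAt);
                return attemptTimes;
            } catch (AbortSketchException e) {
                now += e.retryDelayMillis; // wait at least the requested delay, then retry
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run("x", 250)); // prints [0, 100, 200, 300]
    }
}

// Hypothetical stand-in for ProcessAbortedException carrying a retry delay.
class AbortSketchException extends RuntimeException {
    final long retryDelayMillis;

    AbortSketchException(long retryDelayMillis) {
        super("item cannot be processed yet");
        this.retryDelayMillis = retryDelayMillis;
    }
}
```

Because the delay is a lower bound, a script should treat it as a hint for pacing retries and not as a precise schedule.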
Example Script
import com.datashyft.core.model.dataobjects.PipelineData;
import com.datashyft.pipeline.util.AbstractGroovyComponent.AuditLogger;
import com.topiatechnology.mdci.strategies.DataOut;
import com.datashyft.pipeline.scripting.ScriptedTransform;
import java.util.ArrayList;
import java.util.List;
// @InputChannel("input", "PipelineData")
// @OutputChannel("output", "PipelineData")
class PassThroughTransform implements ScriptedTransform {
    List<DataOut> transformInput(
            String inputChannelName,
            PipelineData input) {
        List<DataOut> objectsToOutput = new ArrayList<>();
        // Transform the object by inspecting input
        // and constructing/populating objectsToOutput
        return objectsToOutput;
    }
}
Input Channels
channelN — Custom input channels must be specified for receiving data. Input channel names must match corresponding output channels.
Note: It is illegal to create an input channel named Error. Doing so causes a startup error and places the pipeline in an error state.
See the Custom Input and Output Channels page for details on how to specify the custom input channels.
Output Channels
channelN — Custom output channels must be specified for data output. These channels output transformed PipelineData objects. See the Custom Input and Output Channels page for details on how to specify the custom output channels.
error — Outputs an ErrorOutput object for any errors that occur executing the transform.
Parameters
script — (String) The Groovy script to use to determine how a data item should be transformed.