Scripted Data Filter Component
The Scripted Data Filter component is designed to discard entire objects based on configured filter rules. It is a multi-input/multi-output component that outputs data to a channel whose name matches the input data’s channel. Inputs are typed — data of a mismatched type will be discarded, and an error will be generated on the Error Output Channel.
The Scripted Data Filter component uses a Groovy script to define its filtering rules. The script must implement the com.datashyft.pipeline.scripting.ScriptedDataFilter interface, which defines the required shouldFilter method. The shouldFilter method accepts two parameters, a String containing the channel name and a PipelineData object containing the data item, and returns a boolean. It should return true if the item should be filtered (dropped) and false if it should be passed through.
Data Governance
This component does not modify the Data Governance status of any items.
Auditing
Scripts can send events to the deployment’s audit logs using an AuditLogger object. If your script needs to send audit logs, extend the com.datashyft.pipeline.scripting.ScriptedAuditor abstract class, which provides access to an AuditLogger instance via its getAuditLogger() method. The AuditLogger object defines a single method, sendAuditLog(category, message), which sends a message to the deployment’s audit log. For example:
this.auditLogger.sendAuditLog(LogCategory.INFO, "Discarding an item")
The available categories are defined in the LogCategory class and include DEBUG, INFO, WARN, and ERROR.
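As a rough illustration of how auditing and filtering might fit together, the sketch below shows a filter that extends ScriptedAuditor and logs each discard. The stand-in types at the top are assumptions that only mimic the shapes described in this section so the sketch can run outside a deployment; the channel name "scratch" and the class name AuditingFilter are hypothetical. In a real script, the stand-ins would be replaced by imports of the DataShyft classes.

```groovy
// Stand-in types (assumptions) so the sketch runs outside a deployment.
// In a real script, remove these and import the DataShyft classes instead.
enum LogCategory { DEBUG, INFO, WARN, ERROR }
interface PipelineData {}
interface AuditLogger {
    void sendAuditLog(LogCategory category, String message)
}
interface ScriptedDataFilter {
    boolean shouldFilter(String inputChannelName, PipelineData input)
}
abstract class ScriptedAuditor {
    AuditLogger auditLogger   // Groovy generates getAuditLogger() for this property
}

class AuditingFilter extends ScriptedAuditor implements ScriptedDataFilter {
    // Hypothetical channel name whose items this sketch always drops.
    static final String DROPPED_CHANNEL = "scratch"

    boolean shouldFilter(String inputChannelName, PipelineData input) {
        boolean drop = (inputChannelName == DROPPED_CHANNEL)
        if (drop) {
            // Record each discard in the audit log.
            getAuditLogger().sendAuditLog(
                LogCategory.INFO,
                "Discarding an item from channel " + inputChannelName)
        }
        return drop
    }
}

// Local check only, not part of a deployed script: the logger prints
// instead of writing to a deployment's audit log.
def filter = new AuditingFilter(
    auditLogger: { c, m -> println("[" + c + "] " + m) } as AuditLogger)
assert filter.shouldFilter("scratch", null)
assert !filter.shouldFilter("data", null)
```

Logging only on the drop path keeps the audit log focused on items that disappear from the pipeline, which is usually what a reviewer wants to trace.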
Exception Handling
If the script cannot process a data item immediately, it can signal this to the deployment by throwing a ProcessAbortedException from the shouldFilter method. Throwing this exception causes the item to be requeued for later processing. If the exception includes a retry delay parameter, the component waits at least that long before retrying. The exception can also specify alternate output data to be emitted while requeuing the original input.
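The requeue behavior can be sketched as follows. This is only an illustration under stated assumptions: the constructor of ProcessAbortedException, the name retryDelayMillis, and the millisecond unit are all hypothetical, since this page does not give the exception's signature. The stand-in types exist so the sketch runs outside a deployment; check the real class before relying on any of these details.

```groovy
// Stand-in types (assumptions) so the sketch runs outside a deployment.
interface PipelineData {}
interface ScriptedDataFilter {
    boolean shouldFilter(String inputChannelName, PipelineData input)
}
// Hypothetical shape: the real constructor and the unit of the retry delay
// are not documented on this page.
class ProcessAbortedException extends RuntimeException {
    final long retryDelayMillis
    ProcessAbortedException(String message, long retryDelayMillis) {
        super(message)
        this.retryDelayMillis = retryDelayMillis
    }
}

class DeferringFilter implements ScriptedDataFilter {
    // Hypothetical readiness flag; a real script might instead check
    // whether reference data it depends on has been loaded.
    boolean lookupReady = false

    boolean shouldFilter(String inputChannelName, PipelineData input) {
        if (!lookupReady) {
            // Ask for the item to be requeued rather than deciding now.
            throw new ProcessAbortedException("Lookup data not ready", 5000L)
        }
        return false   // pass the item through once processing is possible
    }
}

// Local check only, not part of a deployed script.
def filter = new DeferringFilter()
try {
    filter.shouldFilter("data", null)
    assert false : "expected ProcessAbortedException"
} catch (ProcessAbortedException e) {
    println("Requeue requested, retry after " + e.retryDelayMillis + " ms")
}
```

Throwing instead of returning false avoids passing an item through before the script can actually evaluate it.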
Example Script
import com.datashyft.core.model.dataobjects.PipelineData;
import com.datashyft.pipeline.util.AbstractGroovyComponent.AuditLogger;
import com.datashyft.pipeline.scripting.ScriptedDataFilter;

// @InputChannel("data", "PipelineData")
// @OutputChannel("data", "PipelineData")
class ExampleFilter implements ScriptedDataFilter {
    boolean shouldFilter(
            String inputChannelName,
            PipelineData input) {
        boolean objectShouldBeFiltered = false;
        // Examine the input object and set objectShouldBeFiltered
        // to indicate if the object should be dropped (true) or
        // passed through (false).
        return objectShouldBeFiltered;
    }
}
Input Channels
channelN — Custom input channels must be defined for data to be received. The name of the input channel must match the name of a corresponding output channel. The component receives a PipelineData object on the channel and passes it to the script.
Note: It is illegal to create an input channel named Error. Doing so causes a startup error and places the pipeline in an error state.
See the Custom Input and Output Channels page for details on how to specify the custom input channels.
Output Channels
channelN — Custom output channels must be defined for data output. The name of the output channel must match a corresponding input channel. The component outputs PipelineData objects of the same type as the input channel. See the Custom Input and Output Channels page for details on how to specify the custom output channels.
error — Outputs ErrorOutput objects for any errors that occur while executing the filter script.
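To illustrate how matching input and output channel names might be used together, the sketch below declares two channel pairs and decides per channel whether items are dropped. The channel names "orders" and "telemetry" and the class name PerChannelFilter are hypothetical, and repeating the comment annotations once per channel is an assumption modeled on the Example Script; the Custom Input and Output Channels page is the authority on the actual syntax. The stand-in types exist only so the sketch runs outside a deployment.

```groovy
// Stand-in types (assumptions) so the sketch runs outside a deployment.
interface PipelineData {}
interface ScriptedDataFilter {
    boolean shouldFilter(String inputChannelName, PipelineData input)
}

// Each input channel has an output channel with the same name.
// @InputChannel("orders", "PipelineData")
// @InputChannel("telemetry", "PipelineData")
// @OutputChannel("orders", "PipelineData")
// @OutputChannel("telemetry", "PipelineData")
class PerChannelFilter implements ScriptedDataFilter {
    // Hypothetical rule: drop everything that arrives on these channels.
    static final Set<String> DROPPED_CHANNELS = ["telemetry"] as Set<String>

    boolean shouldFilter(String inputChannelName, PipelineData input) {
        return DROPPED_CHANNELS.contains(inputChannelName)
    }
}

// Local check only, not part of a deployed script.
def filter = new PerChannelFilter()
assert filter.shouldFilter("telemetry", null)
assert !filter.shouldFilter("orders", null)
```

A real rule would normally also inspect the PipelineData item; that part is omitted here because the item's accessors are not described on this page.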
Parameters
script — (String) The Groovy script to use to determine if a data item should be filtered or not.