Scripted Data Filter Component

The Scripted Data Filter component is designed to discard entire objects based on configured filter rules. It is a multi-input/multi-output component that outputs data to a channel whose name matches the input data’s channel. Inputs are typed — data of a mismatched type will be discarded, and an error will be generated on the Error Output Channel.

The Scripted Data Filter component uses a Groovy script to define its filtering rules. The script must implement the com.datashyft.pipeline.scripting.ScriptedDataFilter interface, which defines the required shouldFilter method. The shouldFilter method accepts two parameters, a String containing the channel name and a PipelineData object containing the data item, and returns a boolean: true if the item should be filtered (dropped), or false if it should be passed through.

Data Governance

This component does not modify the Data Governance status of any items.  

Auditing

Scripts can send events to the deployment’s audit logs using an AuditLogger object. If your script needs to send audit logs, extend the com.datashyft.pipeline.scripting.ScriptedAuditor abstract class, which provides access to an AuditLogger instance via its getAuditLogger() method. The AuditLogger object defines a single method, sendAuditLog(category, message), which sends a message to the deployment’s audit log. For example:

this.auditLogger.sendAuditLog(LogCategory.INFO, "Discarding an item")

The available categories are defined in the LogCategory class and include DEBUG, INFO, WARN, and ERROR.
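As a rough illustration of how a filter might combine auditing with a filter rule, the sketch below logs an INFO event whenever it drops an item. It is written in Java-style syntax, which should also be valid Groovy. Every type is a local stand-in declared only so the sketch runs on its own; in a real script they would come from the platform. The setAuditLogger hook, the RecordingAuditLogger class, and the "test-" channel prefix rule are hypothetical and are not part of the documented API.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-ins for the platform types named in this section (assumptions).
enum LogCategory { DEBUG, INFO, WARN, ERROR }

interface AuditLogger {
    void sendAuditLog(LogCategory category, String message);
}

class PipelineData { }

interface ScriptedDataFilter {
    boolean shouldFilter(String inputChannelName, PipelineData input);
}

abstract class ScriptedAuditor {
    private AuditLogger auditLogger;

    // Hypothetical hook: how the platform supplies the logger is not documented here.
    void setAuditLogger(AuditLogger logger) { this.auditLogger = logger; }

    AuditLogger getAuditLogger() { return auditLogger; }
}

// Test double that keeps audit messages in memory instead of sending them.
class RecordingAuditLogger implements AuditLogger {
    final List<String> entries = new ArrayList<String>();

    public void sendAuditLog(LogCategory category, String message) {
        entries.add(category.name() + ": " + message);
    }
}

class AuditingFilter extends ScriptedAuditor implements ScriptedDataFilter {
    public boolean shouldFilter(String inputChannelName, PipelineData input) {
        // Illustrative rule only: drop everything arriving on "test-" channels.
        boolean drop = inputChannelName.startsWith("test-");
        if (drop) {
            getAuditLogger().sendAuditLog(
                LogCategory.INFO, "Discarding an item from " + inputChannelName);
        }
        return drop;
    }
}
```

Logging only on the drop path keeps the audit log proportional to the number of discarded items rather than to total traffic.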

Exception Handling

If the script cannot process a data item immediately, it can signal this to the deployment by throwing a ProcessAbortedException from the shouldFilter method. Throwing this exception causes the item to be requeued for later processing. If the exception includes a retry delay parameter, the component waits at least that long before retrying. The exception can also specify alternate output data to be emitted while requeuing the original input.
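The requeue behavior described above might look like the following in a filter that depends on a resource that is not yet available. This is a sketch under stated assumptions, written in Java-style syntax that should also be valid Groovy. The ProcessAbortedException defined here is a local stand-in: the real class's package, constructor arguments, and delay units are not given in this document, so the (message, retryDelayMillis) constructor is hypothetical. The blocklist flag is likewise invented for illustration.

```java
// Local stand-ins so the sketch runs on its own (assumptions, not the platform's classes).
class PipelineData { }

interface ScriptedDataFilter {
    boolean shouldFilter(String inputChannelName, PipelineData input);
}

class ProcessAbortedException extends RuntimeException {
    private final long retryDelayMillis;

    // Hypothetical signature: a message plus a retry delay in milliseconds.
    ProcessAbortedException(String message, long retryDelayMillis) {
        super(message);
        this.retryDelayMillis = retryDelayMillis;
    }

    long getRetryDelayMillis() { return retryDelayMillis; }
}

class DeferringFilter implements ScriptedDataFilter {
    // Stands in for any dependency the filter needs before it can decide.
    private boolean blocklistLoaded = false;

    void markBlocklistLoaded() { blocklistLoaded = true; }

    public boolean shouldFilter(String inputChannelName, PipelineData input) {
        if (!blocklistLoaded) {
            // Ask for the item to be requeued instead of guessing a decision.
            throw new ProcessAbortedException("Blocklist not loaded yet", 5000L);
        }
        return false; // pass the item through once the dependency is ready
    }
}
```

Throwing rather than returning a default keeps the filter from silently passing or dropping items it could not actually evaluate.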

Example Script

import com.datashyft.core.model.dataobjects.PipelineData;
import com.datashyft.pipeline.util.AbstractGroovyComponent.AuditLogger;
import com.datashyft.pipeline.scripting.ScriptedDataFilter;

// @InputChannel("data", "PipelineData")
// @OutputChannel("data", "PipelineData")
class ExampleFilter implements ScriptedDataFilter {

  boolean shouldFilter(
        String inputChannelName,
        PipelineData input) {
    boolean objectShouldBeFiltered = false;

    // Examine the input object and set objectShouldBeFiltered
    // to indicate if the object should be dropped (true) or
    // passed through (false).

    return objectShouldBeFiltered;
  }
}
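To try a filter rule outside a pipeline, a small harness along the following lines may help. It is a simplified model of the documented behavior (items for which shouldFilter returns false are emitted on the channel with the same name as their input channel), not the component's actual implementation. It uses Java-style syntax that should also be valid Groovy. PipelineData and ScriptedDataFilter are local stand-ins, and the label field, DropEmptyLabels, and FilterHarness are hypothetical names introduced only for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Local stand-in; the label field is hypothetical and exists only for this sketch.
class PipelineData {
    final String label;

    PipelineData(String label) { this.label = label; }
}

interface ScriptedDataFilter {
    boolean shouldFilter(String inputChannelName, PipelineData input);
}

// Example rule: drop items whose label is empty.
class DropEmptyLabels implements ScriptedDataFilter {
    public boolean shouldFilter(String inputChannelName, PipelineData input) {
        return input.label.isEmpty();
    }
}

class FilterHarness {
    // Applies the filter per channel and returns the surviving items,
    // keyed by the same channel name they arrived on.
    static Map<String, List<PipelineData>> run(
            ScriptedDataFilter filter, Map<String, List<PipelineData>> inputs) {
        Map<String, List<PipelineData>> outputs =
            new LinkedHashMap<String, List<PipelineData>>();
        for (Map.Entry<String, List<PipelineData>> entry : inputs.entrySet()) {
            List<PipelineData> kept = new ArrayList<PipelineData>();
            for (PipelineData item : entry.getValue()) {
                if (!filter.shouldFilter(entry.getKey(), item)) {
                    kept.add(item);
                }
            }
            outputs.put(entry.getKey(), kept);
        }
        return outputs;
    }
}
```

Calling FilterHarness.run with a map from channel names to item lists returns the same channel names with only the items the filter did not drop.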

Input Channels

channelN — Custom input channels must be defined for data to be received. The name of the input channel must match the name of a corresponding output channel. The component receives a PipelineData object on the channel and passes it to the script.

Note: It is illegal to create an input channel named Error. Doing so causes a startup error and places the pipeline in an error state.

See the Custom Input and Output Channels page for details on how to specify the custom input channels.

Output Channels

channelN — Custom output channels must be defined for data output. The name of the output channel must match a corresponding input channel. The component outputs PipelineData objects of the same type as the input channel. See the Custom Input and Output Channels page for details on how to specify the custom output channels.

error — Outputs ErrorOutput objects for any errors that occur while executing the filter script.

Parameters

script — (String) The Groovy script to use to determine if a data item should be filtered or not.
