IDQ Parser Transformation
In this article we are going to cover the Parser transformation. It is one of the most important transformations used in IDQ. Parsing is a core function of any data quality tool, and IDQ provides rich parsing functionality to handle complex patterns.
A Parser transformation can be created in two modes:
- Token Parsing Mode
- Pattern Based Parsing
Token Based Parsing: It is used to parse strings that match token sets, regular expressions, or reference table entries.
We will use a simple example to create a token based Parser transformation. Suppose an email ID arrives in a field in the format "Name@company.domain" and we want to parse it and store the parts in multiple fields:
NAME
COMPANY_NAME
DOMAIN
Suppose the input data is as follows:
Rahul@gmail.com
Sachin@yahoo.com
Stuart@yahoo.co.uk
Step 1: Create a token based Parser transformation with the email ID as input. After creating the transformation, go to Properties, open the Strategies tab, and click New.
Step 2: Click Token Based.
Step 3: Select Regular Expression (since we want multiple output ports).
Step 4: Select the email parser, or create your own regular expression to parse other types of data.
Step 5: Create three output ports, click OK, and then click Finish.
The output of the Parser transformation has the name, company, and domain parsed into separate fields.
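To make the idea concrete outside the tool, here is a minimal Python sketch of what a regular expression with three capture groups does to the sample data. This is not IDQ code, and the expression below is an assumption made for illustration; the built-in email parser in IDQ may use a different expression. The function name `parse_email` is also made up for this example.

```python
import re

# Assumed expression for illustration only (not taken from IDQ):
# group 1 = text before "@", group 2 = text up to the first ".",
# group 3 = everything after that first "." (so "co.uk" stays together).
EMAIL_PATTERN = re.compile(r"^([^@\s]+)@([^.@\s]+)\.([^@\s]+)$")


def parse_email(value):
    """Split 'Name@company.domain' into three named fields, or return None."""
    match = EMAIL_PATTERN.match(value.strip())
    if match is None:
        return None  # value does not follow the expected format
    name, company, domain = match.groups()
    return {"NAME": name, "COMPANY_NAME": company, "DOMAIN": domain}


if __name__ == "__main__":
    for email in ["Rahul@gmail.com", "Sachin@yahoo.com", "Stuart@yahoo.co.uk"]:
        print(parse_email(email))
```

Each capture group plays the role of one output port, which is why the regular expression option fits when several output fields are needed.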
Pattern Based Parsing: Pattern based parsers are useful when the data needs to be split apart or sorted and it follows a moderately high number of easily recognized patterns.
A pattern based Parser transformation needs the output of a Label transformation, which provides two outputs: the labeled data and the tokenized data.
Suppose we have a field named PATTERN_DATA in the source that contains a name, an employee number, and a date, and we need to parse it into three separate fields.
Step 1: First create a Label transformation with a comma (,) as the delimiter, and create a new strategy with the properties below. In the second tab, choose the execution order and assign the labels.
Output of Label transformation will be
Step 2: Connect both the LabeledOutput and the tokenized data ports to the pattern based Parser transformation, and create three new output ports in the Ports tab.
Step 3: In the Patterns tab, define the patterns according to the labels assigned in the Label transformation.
You can then preview the parsed data, broken into three fields: NAME, EMPNO, and DOB.
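The label-then-pattern flow can be sketched roughly in Python as below. This is only an illustration of the concept, not how IDQ implements it. The label names (WORD, NUMBER, DATE), the date format, the sample record, and the function names are all assumptions made up for this example.

```python
import re

# Hypothetical label rules, tried in order (similar in spirit to an execution order).
LABEL_RULES = [
    ("DATE", re.compile(r"\d{1,2}/\d{1,2}/\d{4}")),
    ("NUMBER", re.compile(r"\d+")),
    ("WORD", re.compile(r"[A-Za-z]+")),
]

# Hypothetical pattern: a sequence of labels mapped to output port names.
PATTERNS = {
    ("WORD", "NUMBER", "DATE"): ("NAME", "EMPNO", "DOB"),
}


def label_tokens(value, delimiter=","):
    """Labeling step: return (labels, tokens) for one input value."""
    tokens = [token.strip() for token in value.split(delimiter)]
    labels = []
    for token in tokens:
        for name, rule in LABEL_RULES:
            if rule.fullmatch(token):
                labels.append(name)
                break
        else:
            labels.append("UNKNOWN")  # no rule matched this token
    return labels, tokens


def parse_by_pattern(value):
    """Parsing step: map tokens to output ports when the label sequence is known."""
    labels, tokens = label_tokens(value)
    ports = PATTERNS.get(tuple(labels))
    if ports is None:
        return None  # label sequence matches no defined pattern
    return dict(zip(ports, tokens))


if __name__ == "__main__":
    # Made-up sample record; the original post does not list its PATTERN_DATA values.
    print(parse_by_pattern("Rahul,1001,01/02/1990"))
```

The point of the sketch is the division of work: the labeling step only describes each token, and the parsing step decides where a token goes based on the sequence of labels, which is why the parser needs both the labeled and the tokenized data as input.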
Hope this post makes the Parser transformation clearer. In case of any questions, please send mail to support@ITNirvanas.com or leave a comment here.