
Monday, 18 November 2013

IDQ Interview Questions Part1


Q1. What is the difference between the PowerCenter Integration Service and the Data Integration Service?

The PowerCenter Integration Service is an application service that runs sessions and workflows.
The Data Integration Service is an application service that performs data integration tasks for the Analyst tool, the Developer tool, and external clients. The Analyst tool and the Developer tool send data integration task requests to the Data Integration Service to preview or run data profiles, SQL data services, and mappings. Commands from the command line or an external client send data integration task requests to the Data Integration Service to run SQL data services or web services.

Q2. What is the difference between the PowerCenter Repository Service and the Model Repository Service?

The PowerCenter application services and PowerCenter application clients use the PowerCenter Repository Service. The PowerCenter repository has folder-based security.

The other application services and clients, such as the Data Integration Service, the Analyst Service, the Developer tool, and the Analyst tool, use the Model Repository Service. The Model Repository Service has project-based security.
You can migrate some Model repository objects to the PowerCenter repository.

Q3. What is the difference between a mapplet in PowerCenter and a mapplet in the Developer tool?

  1. A mapplet in both PowerCenter and the Developer tool is a reusable object that contains a set of transformations. You can reuse the transformation logic in multiple mappings. 
  2. A PowerCenter mapplet can contain source definitions or Input transformations as the mapplet input. It must contain Output transformations as the mapplet output. 
  3. A Developer tool mapplet can contain data objects or Input transformations as the mapplet input. It can contain data objects or Output transformations as the mapplet output. 
 A mapplet in the Developer tool also has the following features:

  1. You can validate a mapplet as a rule. 
  2. You can use a rule in a profile. 
  3. A mapplet can contain other mapplets. 

Q4. What is the difference between a mapplet and a rule?

You can validate a mapplet as a rule. A rule is business logic that defines conditions applied to source data when you run a profile. You can validate a mapplet as a rule when the mapplet meets the following requirements:

  1. It contains an Input and Output transformation. 
  2. The mapplet does not contain active transformations. 
  3. It does not specify cardinality between input groups. 

Q5. What is the difference between a source and target in PowerCenter and a physical data object in the Developer tool?

In PowerCenter, you create a source definition to include as a mapping source. You create a target definition to include as a mapping target. In the Developer tool, you create a physical data object that you can use as a mapping source or target.



IDQ Functionality


Use IDQ to design and run processes that complete the following tasks:

Profile data : Profiling reveals the content and structure of data. Profiling is a key step in any data project, as it can identify strengths and weaknesses in data and help you define a project plan.

Create scorecards to review data quality : A scorecard is a graphical representation of the quality measurements in a profile.

Standardize data values : Standardize data to remove errors and inconsistencies that you find when you run a profile. You can standardize variations in punctuation, formatting, and spelling. For example, you can ensure that the city, state, and ZIP code values are consistent.
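Inside IDQ, standardization is configured through strategies and reference tables rather than code; purely for intuition, the same idea can be sketched in Python (the variant-to-standard mapping below is a made-up example, not Informatica content):

# Minimal sketch: map known variants of a value to one canonical form,
# mirroring what a reference-table-driven standardization strategy does.
STATE_VARIANTS = {"calif.": "CA", "california": "CA", "ca.": "CA"}

def standardize_state(value):
    cleaned = value.strip().lower()
    return STATE_VARIANTS.get(cleaned, value.strip().upper())

print(standardize_state(" Calif. "))    # CA
print(standardize_state("california"))  # CA
print(standardize_state("TX"))          # TX (unknown values pass through)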

Parse data :  Parsing reads a field composed of multiple values and creates a field for each value according to the type of information it contains. Parsing can also add information to records. For example, you can define a parsing operation to add units of measurement to product data.

Validate postal addresses : Address validation evaluates and enhances the accuracy and deliverability of postal address data. Address validation corrects errors in addresses and completes partial addresses by comparing address records against address reference data from national postal carriers. Address validation can also add postal information that speeds mail delivery and reduces mail costs.

Find duplicate records : Duplicate analysis calculates the degrees of similarity between records by comparing data from one or more fields in each record. You select the fields to be analyzed, and you select the comparison strategies to apply to the data. The Developer tool enables two types of duplicate analysis: field matching, which identifies similar or duplicate records, and identity matching, which identifies similar or duplicate identities in record data.
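Field matching in IDQ is configured through the Match transformation and its comparison strategies; as a rough illustration only, a similarity score between two field values can be sketched in Python with the standard library:

from difflib import SequenceMatcher

# Rough sketch of field matching: score how similar two name fields are.
# IDQ's matching algorithms are configured, not hand-coded; this only
# shows the idea of a similarity score between 0 and 1.
def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

print(similarity("John Smith", "Jon Smith"))    # high score, likely duplicate
print(similarity("John Smith", "Mary Jones"))   # low score, not a duplicate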

Create reference data tables : Informatica provides reference data that can enhance several types of data quality processes, including standardization and parsing. You can create reference tables using data from profile results.

Create and run data quality rules : Informatica provides rules that you can run or edit to meet your project objectives. You can create mapplets and validate them as rules in the Developer tool.

Collaborate with Informatica users : The Model repository stores reference data and rules, and this repository is available to users of the Developer tool and Analyst tool. Users can collaborate on projects, and different users can take ownership of objects at different stages of a project.

Export mappings to PowerCenter : You can export mappings to PowerCenter to reuse the metadata for physical data integration or to create web services.

Tuesday, 29 October 2013

IDQ Parser Transformation


In this article we are going to cover the Parser transformation. It is one of the most important transformations used in IDQ. Parsing is the core function of any data quality tool, and IDQ provides rich parsing functionality to handle complex patterns.

A Parser transformation can be created in two modes:


  • Token Parsing Mode 
  • Pattern-Based Parsing Mode


Token-Based Parsing : It is used to parse strings that match token sets, regular expressions, or reference table entries. We will use a simple example to create a token-based Parser transformation. Suppose we have an email ID coming in a field in the format "Name@company.domain" and we want to parse it and store it in multiple fields:
NAME, COMPANY_NAME, DOMAIN

Suppose we have input data coming in as below:

Rahul@gmail.com
Sachin@yahoo.com
Stuart@yahoo.co.uk

Step 1: Create a token-based Parser transformation with the email ID as input. After creating the transformation, go to the Strategies section of the Properties view and click New.



Step 2: Click on Token Based.

Step 3: Select Regular Expression (as we want multiple output ports).

Step 4: Select the email parser, or create your own regular expression to parse a different type of input.


Step 5: Create three output ports, click OK, and then click Finish.


Below is the output from the Parser transformation: the name, company, and domain parsed into separate fields.
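Outside IDQ, the same token-based split can be illustrated with a regular expression in Python; the pattern below is an assumption for illustration and is not the expression IDQ's prebuilt email parser uses:

import re

# Hypothetical pattern splitting "Name@company.domain" into three groups.
EMAIL_PATTERN = re.compile(r"^([^@]+)@([^.]+)\.(.+)$")

for email in ["Rahul@gmail.com", "Sachin@yahoo.com", "Stuart@yahoo.co.uk"]:
    match = EMAIL_PATTERN.match(email)
    if match:
        name, company_name, domain = match.groups()
        print(name, company_name, domain)   # e.g. Stuart yahoo co.uk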


Pattern-Based Parsing : Pattern-based parsers are useful when working with data that needs to be parsed apart or sorted, and that has a moderately high number of easily recognized patterns.
A pattern-based Parser transformation needs input from a Labeler transformation, which provides two outputs: the labeled data and the tokenized data.
Suppose we have a source field named PATTERN_DATA which contains a name, an employee number, and a date, and we need to parse it into three separate fields.
Step 1: Create a Labeler transformation with comma as the delimiter and the properties below, by creating a new strategy.


In the second tab, choose the execution order and assign the labels.


The output of the Labeler transformation will be:

Step 2: Connect both the labeled output and the tokenized data to the pattern-based Parser transformation, and create three new output ports in the Ports tab as shown below.

  
Step 3: In the Patterns tab, define the patterns below (as per the labels assigned in the Labeler transformation).


You can preview the Parser output broken into three fields: NAME, EMPNO, and DOB.
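For intuition only, the label-then-parse flow can be sketched in Python; the label codes, the delimiter, and the expected pattern below are assumptions and do not reflect IDQ's actual Labeler output:

import re

# Hypothetical labeling: tag each comma-separated token as WORD, NUM or DATE,
# then route tokens to output fields when the overall pattern matches.
def label(token):
    if re.fullmatch(r"\d{2}/\d{2}/\d{4}", token):
        return "DATE"
    if token.isdigit():
        return "NUM"
    return "WORD"

def parse(record):
    tokens = [t.strip() for t in record.split(",")]
    pattern = [label(t) for t in tokens]
    if pattern == ["WORD", "NUM", "DATE"]:          # expected pattern
        return dict(zip(["NAME", "EMPNO", "DOB"], tokens))
    return None                                     # unmatched pattern

print(parse("Rahul, 1001, 15/08/1985"))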

Hope this post makes the Parser transformation clearer. In case of any questions, please send a mail to support@ITNirvanas.com or leave a comment here.



Generating Dynamic Multiple Target files in Informatica

Recently we came across a scenario to generate multiple dynamic target files in Informatica. We receive vendor data from a legacy database in the table below:


  1. Invoice_ID
  2. Invoice_No
  3. Invoice_Amount
  4. Vendor_Id


We need to separate the details for each vendor into its own file so that we can pass the data to third-party vendors in separate files.


INVOICE_DETAILS


INVOICE_ID  INVOICE_NO  INVOICE_AMOUNT  VENDOR_ID
1           A01         100.00          10
2           A02         125.00          10
3           A03         150.00          10
4           B01         200.00          20
5           B02         500.00          20
6           D01         345.00          40
7           C01         230.00          30
8           D02         450.00          40
9           E01         40.00           50







To achieve this, we need to follow the steps below in the mapping.


Step 1: Import INVOICE_DETAILS, drag it into the mapping, connect all ports from the Source Qualifier to an Expression transformation, and use the SQL override below in the Source Qualifier:

SELECT INVOICE_DETAILS.INVOICE_ID, INVOICE_DETAILS.INVOICE_NO, INVOICE_DETAILS.INVOICE_AMOUNT, INVOICE_DETAILS.VENDOR_ID
FROM INVOICE_DETAILS
ORDER BY VENDOR_ID


Step 2: Create an Expression transformation with the 4 input ports plus the following ports, in this order:

  1. FILE_NAME (Output Port) = 'Vendor_'||VENDOR_ID||'Details.csv'
  2. CHECK_VENDOR_ID (Variable Port) = iif(VENDOR_ID <> PREV_VENDOR_ID, 1, 0)
  3. OUTPUT_VENDOR_ID (Output Port) = CHECK_VENDOR_ID
  4. PREV_VENDOR_ID (Variable Port) = VENDOR_ID

Because CHECK_VENDOR_ID is evaluated before PREV_VENDOR_ID is reassigned, it compares the current row's vendor with the previous row's vendor.




Step 3: Create a Transaction Control transformation (TC_GENERATE_FILES) with the logic below:

iif(OUTPUT_VENDOR_ID=1,TC_COMMIT_BEFORE,TC_CONTINUE_TRANSACTION)



Step 4: Create a flat file target with the same structure as the source table, and add a new field to the target using the option “Add FileName to this Table”.

Connect the FILE_NAME port coming through the Transaction Control transformation to the FileName port of the target.
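Purely as an illustration of what Steps 2 through 4 accomplish, the same vendor-wise file split can be sketched in Python; the file name follows the FILE_NAME expression above, while the rest of the code is an assumption:

import csv
from collections import defaultdict

# Invoice rows as returned by the SQL override (sorted by VENDOR_ID).
rows = [
    (1, "A01", 100.00, 10), (2, "A02", 125.00, 10), (3, "A03", 150.00, 10),
    (4, "B01", 200.00, 20), (5, "B02", 500.00, 20),
    (7, "C01", 230.00, 30),
    (6, "D01", 345.00, 40), (8, "D02", 450.00, 40),
    (9, "E01", 40.00, 50),
]

# Group by vendor, mirroring the vendor-change check in the Expression
# transformation and the TC_COMMIT_BEFORE in the Transaction Control.
by_vendor = defaultdict(list)
for row in rows:
    by_vendor[row[3]].append(row)

for vendor_id, vendor_rows in by_vendor.items():
    # File name built like the FILE_NAME port: 'Vendor_'||VENDOR_ID||'Details.csv'
    with open(f"Vendor_{vendor_id}Details.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["INVOICE_ID", "INVOICE_NO", "INVOICE_AMOUNT", "VENDOR_ID"])
        writer.writerows(vendor_rows)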



Step 5: Run the workflow, and the files below will be created.






Saturday, 26 October 2013

Informatica Port Order

Port Order

Informatica calculates ports in the following order:


  1. Input ports. Informatica calculates all input ports first, as they are not dependent on any other ports, so we can create input ports in any order.
  2. Variable ports. Variable ports can reference input ports and other variable ports, but not output ports. Because variable ports can reference input ports, Informatica calculates variable ports after input ports. Likewise, since a variable can reference other variables, the display order for variable ports is the same as the order in which the PowerCenter Server evaluates each variable.
Suppose we first calculate a commission variable and then need to calculate salary using that commission; in that case, the port that holds the commission must appear before the salary port (see the sketch after this list).
  3. Output ports. Because output ports can reference both input ports and variable ports, Informatica evaluates output ports last. The display order of output ports does not matter, as output ports cannot reference other output ports. Output ports should appear at the bottom of the list of ports.
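As an illustration only, the input-then-variable-then-output evaluation order can be mimicked in Python; the port names and commission formula below are made up:

# Hypothetical row processed by an Expression transformation.
row = {"IN_SALES": 1000.0, "IN_BASE_PAY": 500.0}   # input ports (evaluated first)

# Variable ports, evaluated in display order: commission must come
# before salary because salary references it.
v_commission = row["IN_SALES"] * 0.10
v_salary = row["IN_BASE_PAY"] + v_commission

# Output ports, evaluated last; they may reference inputs and variables.
out_salary = v_salary
print(out_salary)   # 600.0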



How to capture previous row values

Suppose we have to check whether an employee's department is the same as the previous employee's department. For that we can do the following (a sketch follows this list):

  1. Create two variable ports, VAL_OLD_DEPT and VAL_NEW_DEPT.
  2. Place VAL_OLD_DEPT before VAL_NEW_DEPT in the port list, after the input port IN_DEPT.
  3. In VAL_OLD_DEPT, assign IIF(ISNULL(VAL_NEW_DEPT), 'NODEPT', VAL_NEW_DEPT). Because VAL_OLD_DEPT is evaluated before VAL_NEW_DEPT is reassigned, it still holds the previous row's department.
  4. In VAL_NEW_DEPT, assign IN_DEPT.
  5. Create one output port that compares VAL_OLD_DEPT with IN_DEPT.
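A minimal Python sketch of this variable-port trick follows; the department values are made up, and the two variables simply persist across rows the way Informatica variable ports retain their values between rows:

# Simulate variable ports that retain their value between rows.
rows = [{"IN_DEPT": "HR"}, {"IN_DEPT": "HR"}, {"IN_DEPT": "IT"}]

val_old_dept = None
val_new_dept = None
for row in rows:
    # VAL_OLD_DEPT is evaluated first, so it still holds the previous row's dept.
    val_old_dept = "NODEPT" if val_new_dept is None else val_new_dept
    # VAL_NEW_DEPT is then updated to the current row's dept.
    val_new_dept = row["IN_DEPT"]
    # Output port: is this employee's department the same as the previous one's?
    o_same_dept = int(row["IN_DEPT"] == val_old_dept)
    print(row["IN_DEPT"], val_old_dept, o_same_dept)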

Thursday, 24 October 2013

Informatica 9 new features for developers


Informatica 9 has a lot of new features, including IDQ and Informatica Analyst. In this post we will focus on the features that are especially useful for developers.

Lookup Transformation cache updates: We can update the dynamic lookup cache based on the results of an expression. When the expression is true, we can add rows to or update rows in the lookup cache.


Multiple rows return: We can configure the Lookup transformation to return all rows that match a lookup condition. A Lookup transformation is an active transformation when it can return more than one row for any given input row.

SQL overrides for uncached lookups: In previous versions we could create a SQL override for cached lookups only. Now we can create a SQL override for an uncached lookup and include lookup ports in the SQL query.

Database deadlock resilience: In previous releases, when the Integration Service encountered a database deadlock during a lookup, the session failed. Effective in 9.0, the session does not fail. When a deadlock occurs, the Integration Service attempts to run the last statement in the lookup again. We can configure the number of retry attempts and the time period between attempts.

SQL transformation auto-commit for connections: We can enable auto-commit for each database connection. Each SQL statement in a query defines a transaction. A commit occurs when the SQL statement completes or when the next statement is executed, whichever comes first.

Session log file rollover: We can limit the size of session logs for real-time sessions, by time or by file size. We can also limit the number of log files for a session.

Passive mode for the SQL transformation: We can configure the SQL transformation to run in passive mode instead of active mode. When the SQL transformation runs in passive mode, it returns one output row for each input row.


XML Parser buffer validation: The XML Parser transformation can validate an XML document against a schema and route invalid XML to an error port. When the XML is not valid, the XML Parser transformation routes the XML and the error messages to a separate output group that we can connect to a target.

Wednesday, 23 October 2013

Informatica Scenario Based Interview Questions (Part1)


The interviewer asks, "How will you get the first 3 records from a flat file source?" You think to yourself, that's simple.

You reply, "We can use a variable port in an Expression transformation, increment it, and then use a Filter transformation to pass just the first three records."

The interviewer then pulls a new trick from his sleeve: "How will you get the last 3 records from a flat file source?"

Déjà vu…

You think, try to build some logic on the fly, and explain that maybe we can use this transformation or that one, and end up a bit confused.

I have tried to explain this with a simple example.

Suppose we have customer data coming in the format below and we want to load only the last 3 rows into the target.


Step 1: Create an Expression transformation (exp_Build_Logic), drag in all three input ports, and create three extra ports as below:

  • v_dummy (Output Port) - assign the value 1 to it
  • v_Total_Count (Variable Port) := v_Total_Count + 1
  • o_Total_Count (Output Port) := v_Total_Count






Step 2: Create an Aggregator transformation with all the ports and do not choose any port as a group-by port. With no group-by port, the Aggregator outputs a single row carrying the last row's values, so its o_Total_Count holds the total record count.





Step 3: Create a Joiner transformation (jnr_Agg_Expression) with Sorted Input enabled (you need to choose this option because you are joining two pipelines that originate from the same source).

Take two ports from the Aggregator and the rest from the Expression transformation, and join on the field v_dummy.


Output from joiner



Step 4: Create a Filter transformation after the Joiner with the following condition:
o_Total_Count_Agg - o_Total_Count1 <= 2


This will pass only the last three records.
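For intuition, the Expression + Aggregator + Joiner + Filter logic can be sketched in Python; the sample customer rows are hypothetical:

# Sample flat file rows (hypothetical customer data).
rows = [("C1", "Alice"), ("C2", "Bob"), ("C3", "Carol"),
        ("C4", "Dave"), ("C5", "Eve")]

# Expression transformation: running count per row (o_Total_Count).
numbered = [(row, i + 1) for i, row in enumerate(rows)]

# Aggregator with no group-by: the count from the last row = total rows.
total_count = numbered[-1][1]

# Joiner + Filter: keep rows where total_count - o_Total_Count <= 2,
# i.e. the last three records.
last_three = [row for row, count in numbered if total_count - count <= 2]
print(last_three)   # [('C3', 'Carol'), ('C4', 'Dave'), ('C5', 'Eve')]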

Step 5: The output will be:



Mapping



XML