Tuesday, March 27, 2012

DB2 lookup with parameters

Hi,

We have a package using a lookup query on DB2 to validate data from a file. Everything works fine, except for the lookup query that has to cache about 1,5 million rows.

Now I would like to specify parameters to that query to minimize the data being cached. I tried using parameters in the query, but I get an error:

"Provider cannot derive parameter information and SetParameterInfo has not been called."

Anyone had that problem?

I am using Microsoft's OLE DB Provider for DB2.

Thanks

Can you elaborate a litle more about how you are planning to use parameter to reduce the number of rows?

To reduce the number of rows, just add a where clause in the query (do not select a table from the dropdown list); also limit the number of columns in the select part of the query. 1.5 million rows should not be a problem; unless there is a limited amount of RAM and/or the row size of the lookup query is too big.

|||

One of the lookup column is the transaction date/time (transaction_ts).

I want to cache only the rows where transaction_ts is between the min and max values of the transaction_ts in the text file.

There are only about 15000 rows in the text file and 1,5 million in the lookup. Currently the lookup operation represents approx. 97% of the total execution time so I'm trying to see if it could be optimized...

Thanks

|||

Fleo,

Using paramters in the Lookup is not what you want. By using paramters in the lookup, the component would be using partial cache; which translate on issuing a query against the lookup table for each row passing trhough; hence degradating performance even more. A work around could be to build a view (with a filter in transaction_ts) using Execute SQl task and then configure the lookup component to use the view instead.

BTW, can you provide more details on the current settings of the lookup (query, data type of each column in the query, cache mode, etc)? I still think 1.5 million rows is not that much...

|||

Hi Rafael,

Thanks for the suggestion, it's a good idea. I could also build a view which is using fields in a "configuration" table and update those fields.

I know 1.5 million rows is not that much. Width of the row is int + datetime + char(19) = 31 bytes.

Can't SSIS use full cache even with parameters? I mean the values of the parameters do not depend upon the value of the current row. SSIS would be retrieving the same data every time...

|||

Fleo,

No, you cannot use parameter and full cache at the same time. If you think about it, a parameter gives the avility of running a new query for each row in the data pipeline; so even with full cache, the next row will require a new query to be issued; which would not make too much sense. A solution though wuld be to make the query property of the LK transformation 'expressionable' so the querye gets set dinamically at run time (1 query for all rows in a single package execution though). I Think the SSIS team has already thought about it, but I don't know when they could actualy make it available.

|||

Rafael Salas wrote:

If you think about it, a parameter gives the avility of running a new query for each row in the data pipeline; so even with full cache, the next row will require a new query to be issued; which would not make too much sense.

I'm not sure I get it. Or maybe I didn't explain it well...

Right now, without parameters and full caching, the data in cache corresponds to something like "SELECT col1, col2, col3 FROM Table1 WHERE account like '123%'.

What I would like to have in cache is something similar to "SELECT col1,... FROM Table1 WHERE ... AND TRANSACTION_TS BETWEEN ? AND ?. The 2 parameters would correspond to the MIN and MAX values of TRANSACTION_TS for the rows in the data pipeline.

The parameters are not set with values extracted from the current row so this query could be runned once.

I guess I don't understand the way SSIS handles lookups.

Thanks

|||fleo,
You can't cache records if the lookup cache depends on SQL to be dynamically built via the data flow stream. Lookups using full-cache are built BEFORE the data flow executes.|||

Ok...

The full cache is built before the execution while partial cache is issuing a query for each row in the data pipe.

The SQL for the lookup query cannot be configured.

So I conclude that what I'm trying to do is technologically impossible with the current implementation of lookups (?).

Thanks for your help guys

|||No, you can use lookups, just with no caching. Partial caching works by caching results, and if no match is found, then the SQL is executed against the table again to look for a match.|||

fleo wrote:

Ok...

The full cache is built before the execution while partial cache is issuing a query for each row in the data pipe.

The SQL for the lookup query cannot be configured.

So I conclude that what I'm trying to do is technologically impossible with the current implementation of lookups (?).

Thanks for your help guys

Don't forget you still have the create/drop view option....it is a intrusive aproach but it should do the trick.

No comments:

Post a Comment