Using 'rank over partition' in BIRT reports

One of the common reporting requests I get is provide a list of the most recent set of records out a series of updates or changes. For example, provide a list of all inventory items that have been added/changed in the previous month or all work orders that were set to the COMP status last week. The trouble with these types of requests is that they don't allow a report writer to rely on the current status of the record, they need to see what may have been a historical status. In the example of inventory records, monitoring the INVENTORY. CHANGEDATE will only show the last date the item was changed. The historical status change date (INVSTATUS.CHANGEDATE) is saved separate from the main inventory records. I can use this field to segregate records that have been created or altered in the previous month without relying on the current CHANGEDATE value.

select inventory.itemnum, invstatus.status, invstatus.changedate
from inventory 
join invstatus on invstatus.itemnum=inventory.itemnum and invstatus.location=inventory.location 
and invstatus.siteid =inventory.siteid and invstatus.itemsetid=inventory.itemsetid
where  invstatus.changedate >= dateadd(month,datediff(month,1,getdate()) +0,0) 
and invstatus.changedate < dateadd(month,datediff(month,1,getdate()) +1,0)
order by inventory.itemnum

Query results: ¹

Item Num	Status	Change Date
9716	PENDOBS	2015-01-02 11:48:30.05
11182	PENDOBS	2015-01-02 11:35:49.803
11183	PENDOBS	2015-01-02 11:35:49.983
11183	NONSTOCK	2015-01-06 16:12:12.91
11185	OBSOLETE	2015-01-03 19:31:50.003
11186	ACTIVE	2015-01-04 8:23:12.017

I got the results I wanted, but item 11183 happens to have had two status changes. I only want the most recent change of the two, so I'll have to add some filter function to get only the last status change. A common methods is to use a rank() function like this: rank() over (partition field1 order by field2).

So I updated the SQL selection statement:²

select inventory.itemnum, invstatus.status, invstatus.changedate, 
rank() over (partition by inventory.itemnum, inventory.siteid order by invstatus.changedate desc) as rank
from inventory 
join invstatus on invstatus.itemnum=inventory.itemnum and invstatus.location=inventory.location 
and invstatus.siteid =inventory.siteid and invstatus.itemsetid=inventory.itemsetid
where  invstatus.changedate >= dateadd(month,datediff(month,1,getdate()) +0,0) 
and invstatus.changedate < dateadd(month,datediff(month,1,getdate()) +1,0)
order by inventory.itemnum

Query results:

Item Num	Status	Change Date	Rank
9716	PENDOBS	2015-01-02 11:48:30.05	1
11182	PENDOBS	2015-01-02 11:35:49.803	1
11183	NONSTOCK	2015-01-06 16:12:12.91	1
11183	PENDOBS	2015-01-02 11:35:49.983	2
11185	OBSOLETE	2015-01-03 19:31:50.003	1
11186	ACTIVE	2015-01-04 8:23:12.017	1

So now that I've got the rank value, I should be able to update the SQL selection statement to filter out only results where rank = 1.

select inventory.itemnum, invstatus.status, invstatus.changedate, 
rank() over (partition by inventory.itemnum, inventory.siteid order by invstatus.changedate desc) as rank
from inventory 
join invstatus on invstatus.itemnum=inventory.itemnum and invstatus.location=inventory.location 
and invstatus.siteid =inventory.siteid and invstatus.itemsetid=inventory.itemsetid
where  invstatus.changedate >= dateadd(month,datediff(month,1,getdate()) +0,0) 
and invstatus.changedate < dateadd(month,datediff(month,1,getdate()) +1,0)
and rank = 1
order by inventory.siteid, inventory.itemnum

Run the SQL statement and you'll get an error in your SQL tool similar to this: Invalid column name 'rank'. The problem is the rank value isn't being read from the database, it's being generated as the SQL statement is processed, so it can't be used to filter out the results. If you remove the where clause statement and rank = 1 the SQL statement works again, but it goes back to showing all records that had the status changed this month.

Breaking down the rank() function

The rank() over (partition field1 order by field2 function works by adding a field to a select statement and generating a value based on how the partition order was set up. In the example above, I used the following setup:

rank() over (partition by inventory.itemnum, inventory.siteid order by invstatus.changedate desc) as rank

Breaking down each piece of the function and what it does:

rank() over: This is how you call the function to initiate SQL server to start ranking the returned records.
(partition by inventory.itemnum, inventory.siteid: This tells SQL server how the ranking should be broken up. In this case we stated an individual rank should be applied when the returned record has the same inventory.itemnum and inventory.siteid. If it was applied to only inventory.itemnum, you would get the ranking applied to an item used in multiple sites.
order by invstatus.changedate decs: This tells how the ranking should be ordered. In this case we wanted the ranking based on the descending order of the invstatus.changedate field. This is why the most recent change will be ranked 1 and older status changes will be ranked lower.
as rank: The rank() function itself doesn't have a column/field description, like inventory.itemnum. To reference the values obtained by the rank() function we use the as rank statement at the end to trick SQL server into giving the rank values a column name. This is how we later used the statement rank = 1 to filter out results.

So how can we use the rank() function in the where clause?

The trick is to basically run the selection twice. The first selection will get the records and applies the rank() function. The second selection will apply any filtering on the original results, like rank = 1.

Double Select Method

The first method is basically what I described before, run the selection twice and filter on the 2nd set of results. In this case we setup the original selection and then a second selection around it to filter on the rank() function results. We do this by encasing the original selection statement inside another select statement.

select itemnum, status, changedate from (
select inventory.itemnum, invstatus.status, invstatus.changedate, 
rank() over (partition by inventory.itemnum, inventory.siteid order by invstatus.changedate desc) as rank
from inventory 
join invstatus on invstatus.itemnum=inventory.itemnum and invstatus.location=inventory.location and 
invstatus.siteid =inventory.siteid and invstatus.itemsetid=inventory.itemsetid
where  INVSTATUS.CHANGEDATE >= dateadd(month,datediff(month,1,getdate()) +0,0) 
and INVSTATUS.CHANGEDATE < dateadd(month,datediff(month,1,getdate()) +1,0)
) a 
where rank = 1
order by itemnum

Query results:

Item Num	Status	Change Date
9716	PENDOBS	2015-01-02 11:48:30.05
11182	PENDOBS	2015-01-02 11:35:49.803
11183	NONSTOCK	2015-01-06 16:12:12.91
11185	OBSOLETE	2015-01-03 19:31:50.003
11186	ACTIVE	2015-01-04 8:23:12.017

Exactly what we wanted.

With Statement Method

The other method is to use a WITH AS select statement. The idea is similar to the first method in that the selection is set aside and tagged as its own pseudo table. Then a second selection is made from the pseudo table, including the filter statement rank = 1.

with invdetail as  (
select inventory.itemnum, invstatus.status, invstatus.changedate, 
rank() over (partition by inventory.itemnum, inventory.siteid order by invstatus.changedate desc) as rank
from inventory 
join invstatus on invstatus.itemnum=inventory.itemnum and invstatus.location=inventory.location 
and invstatus.siteid =inventory.siteid and invstatus.itemsetid=inventory.itemsetid
where  INVSTATUS.CHANGEDATE >= dateadd(month,datediff(month,1,getdate()) +0,0) 
and INVSTATUS.CHANGEDATE < dateadd(month,datediff(month,1,getdate()) +1,0)  
) 
select itemnum, status, changedate
from invdetail
where rank = 1
order by itemnum

Query results:

Item Num	Status	Change Date
9716	PENDOBS	2015-01-02 11:48:30.05
11182	PENDOBS	2015-01-02 11:35:49.803
11183	NONSTOCK	2015-01-06 16:12:12.91
11185	OBSOLETE	2015-01-03 19:31:50.003
11186	ACTIVE	2015-01-04 8:23:12.017

Again, the SQL statement gives the results we wanted.

Wrapping it up in BIRT

Now that we got the results we wanted, we place the same SQL statement into BIRT to give a report to show all of the inventory items that have had a status change:

This report will give us just the results where rank = 1 and the status was changed in the selected reporting period. No additional filter or settings are needed in the report itself.

The query actually runs on the current month, but I didn't have any data that demonstrated the duplicate issue from Dec 2014. ↩
The rank function included two order by functions so items will be listed by SITEID first and by ITEMNUM second. ↩

Breaking down the rank() function

Double Select Method

With Statement Method

Wrapping it up in BIRT

Comments