Thursday 26 April 2012

Nesting multiple levels with xmlelement and xmlagg


I did some work recently prototyping creating XML directly from a select statement. I came across many examples using xmlserialize, xmlelement and xmlagg.
But most of these examples were pretty simple and didn’t go very far into creating a nested structure.
I found examples that went to two levels and used xmlagg and a group by from the query like this.
SELECT  xmlserialize(document
    xmlelement("root",
      xmlagg(
        xmlelement("Location",
          xmlelement("LocationID", l.location_id),
          xmlelement("Street", l.street_address),
          xmlelement("Departments",
            xmlagg(
              xmlelement("Department",
                xmlelement("DepartmentID" , d.department_id),
                xmlelement("DepartmentName", d.department_name)
              )
            order by d.department_id
            )
          )
        )
      )
    ) version '1.0' 
  ) as "xml"
FROM  departments d 
  JOIN locations l ON (d.location_id = l.location_id)
WHERE  l.location_id in (1400, 1500, 1700)
GROUP BY l.location_id, l.street_address


Which works fine but I couldn’t see how to extend this method past two levels. When I tried to go one level deeper I got the error
ORA-00935: group function is nested too deeply
So I had to find another way to do this. Then I found another example which nested the SQL in the select statement.
So the equivalent of the above SQL is done like this.
SELECT  xmlserialize(document
    xmlelement("root",
      xmlagg(
        xmlelement("Location",
          xmlelement("LocationID", l.location_id),
          xmlelement("Street", l.street_address),
          xmlelement("Departments",
            (SELECT  
              xmlagg(
                xmlelement("Department",
                  xmlelement("DepartmentID" , d.department_id),
                  xmlelement("DepartmentName", d.department_name)
                )
              )
            FROM  departments d
            WHERE  d.location_id = l.location_id)
          )
        )
      )
    ) version '1.0' 
  ) as "xml"
FROM  locations l
WHERE  l.location_id in (1400, 1500, 1700)

FROM    locations l
WHERE l.location_id in (1400, 1500, 1700)
With the style like this we can then extend this to however many levels we need.
For example extending the above example to a third level looks like this.
SELECT  xmlserialize(document
    xmlelement("root",
      xmlagg(
        xmlelement("Location",
          xmlelement("LocationID", l.location_id),
          xmlelement("Street", l.street_address),
          xmlelement("Departments",
            (SELECT  
              xmlagg(
                xmlelement("Department",
                  xmlelement("DepartmentID" , d.department_id),
                  xmlelement("DepartmentName", d.department_name),
                  xmlelement("Employees",
                    (SELECT  
                      xmlagg(
                        xmlelement("Employee",
                          xmlelement("EmployeeID" , e.employee_id),
                          xmlelement("EmployeeFirstName", e.first_name),
                          xmlelement("EmployeeLastName", e.last_name)
                        )
                      )
                    FROM  employees e
                    WHERE  e.department_id = d.department_id)
                  )
                )
              )
            FROM  departments d
            WHERE  d.location_id = l.location_id)
          )
        )
      )
    ) version '1.0' 
  ) as "xml"
FROM  locations l
WHERE  l.location_id in (1400, 1500, 1700)

Tuesday 17 April 2012

CONNECT BY PRIOR ORDER SIBLINGS BY not ordering correctly

Came across a situation recently where a developer was using a connect by prior order by siblings and the order was all over the place.
It was a fairly lengthy query and I spent a fair amount of time looking at it before I realised there was a DISTINCT clause at the start which was not actually needed.
The DISTINCT completely wrecks the order.

Here is a simple example
SELECT LEVEL,
 a.employee_id,
 a.first_name,
 a.last_name,
 a.manager_id
FROM employees a
START WITH a.employee_id = 100
CONNECT BY NOCYCLE PRIOR a.employee_id = a.manager_id
ORDER SIBLINGS BY a.last_name
Now if we add DISTINCT
SELECT DISTINCT
 LEVEL,
 a.employee_id,
 a.first_name,
 a.last_name,
 a.manager_id
FROM employees a
START WITH a.employee_id = 100
CONNECT BY NOCYCLE PRIOR a.employee_id = a.manager_id
ORDER SIBLINGS BY a.last_name
The connect by prior seesm to be ignored and the order is based purely on the order siblings by a.last_name

I'm not sure if this is a bug. But it is something to be aware of if you think you need to put a distinct in when using a connect by prior.