In the past I tried to avoid using hierarchies in SQL, but I couldn't find another simple data structure for a system requirement: all elements must be children of other elements with no depth limit.
I was able to change the depth limit to 100.
So I started out with:
CREATE TABLE [dbo].[Hierarchy]( [ElementId] [int] IDENTITY(1,1) NOT NULL, [ParentId] [int] NULL, CONSTRAINT [PK_Hierarchy] PRIMARY KEY CLUSTERED ( [ElementId] ASC )
at first everything worked fine, I could traverse the tree with CTE without any problem, it was even fast for the current load, then came another system requirement, the tree size will be about 1 million records.
with cte as ( select ElementId, ParentId, 0 as Level from Hierarchy where ParentId is null union all select Hierarchy.ElementId, Hierarchy.ParentId, Level + 1 from Hierarchy join cte on cte.ElementId = Hierarchy.ParentId ) select * from cte
trying to traverse a tree of 1M records with CTE with only 15 levels took more than a production sql should take, even if the system's requirement should be live or near live.
We've been looking for solutions in every corner of the internet, graph databases (we'll still need to use these IDs in the SQL so passing the data back will be a design nightmare or we can move the portions or whole application, I guess these are other options), trying to find all sorts of algorithms (which will be error prune and might be a heavy toll on the development team), CLR functions and of curse scrapping the whole hierarchy idea (no can do, system requirements).
Eventually we came to the conclusion that keeping a cache will not be too costly on the memory, cpu and disk provided that the right indexes will be created and they will provide the best result and the fun part is, its very easy and fast to join the cache table on any other table we'll need.
But we had to compromise somewhere, we compromised on node parent changes, some of them might cause the entire tree to rebuild.
I built a method to quickly update the tree given an elementid and if elementid was not provided, it will rebuild the whole tree.
I can smell just one little problem, which will be solved when updating records, need to detect circular references.
CREATE TABLE [dbo].[HierarchyCache]( [OwnerId] [int] NOT NULL, [ElementId] [int] NOT NULL )
And the SQL code updating the HierarchyCache, note that I'm using merge instead of dropping the whole table or writing individual insert/delete.
declare @ElementId int = null --if no ElementId was supplied, it means we should go over all Hierarchy and rebuild the table if (@ElementId is null) begin with parentHierarchy (ElementId, ParentId , TopElementId) as ( select ElementId, ParentId, ElementId as TopElementId from Hierarchy with (nolock) union all select [Hierarchy].ElementId,[Hierarchy].ParentId , pu.TopElementId from Hierarchy with (nolock) inner join parentHierarchy as pu on pu.ParentId = [Hierarchy].ElementId ) --instead of updating the entire table, we're doing a merge, more efficient and easier on the disk merge into HierarchyCache as target using parentHierarchy on (parentHierarchy.TopElementId = target.OwnerId and parentHierarchy.ElementId = target.ElementId) when not matched by target then insert (ElementId, OwnerId) values (parentHierarchy.ElementId,parentHierarchy.TopElementId) when not matched by source then delete; end else begin --a ElementId was supplied, update only its tree with parentHierarchy (ElementId, ParentId , TopElementId) as ( select ElementId, ParentId , ElementId as TopElementId from Hierarchy with (nolock) where ElementId = @ElementId union all select [Hierarchy].ElementId,(case when (Hierarchy.ParentId = Hierarchy.ElementId) then null else Hierarchy.ParentId end) as ParentId , pu.TopElementId from Hierarchy with (nolock) inner join parentHierarchy as pu on pu.ParentId = [Hierarchy].ElementId ) --instead of updating the entire table, we're doing a merge, more efficient and easier on the disk merge into HierarchyCache as target using parentHierarchy on (parentHierarchy.TopElementId = target.OwnerId and parentHierarchy.ElementId = target.ElementId) when not matched by target then insert (ElementId, OwnerId) values (parentHierarchy.ElementId,parentHierarchy.TopElementId); end
0 comments:
Post a Comment