Last week, I wrote about a new report I added for the UDC data that shows project pairings “in the wild”. In a comment, Nick effectively killed my weekend. Nick, my wife’s not happy with you. I showed her your picture. Be careful.
The report can now be sorted on columns. Actually, this part was easy to implement.
The hard part started when I began sorting the data on the different columns: while I expected each pairing to be represented twice, some of them are only represented once. I expected to see, for example, a pairing of ‘ajdt’ and ‘platform’ along with a pairing of ‘platform’ and ‘ajdt’, but I only see one (the latter). I tried several different queries, and even tried assembling the data more manually (the final query is surprisingly simple, yet yields the same results as some of its more complicated cousins). I stumbled into two known MySQL bugs, and (possibly) one unknown one. To ensure that things stayed interesting, I opted to move the main query into a stored procedure, which put the Data Tools Project’s SQL editing support into full use (though it was complicated by the fact that I had never before created a stored procedure).
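To make the expectation concrete, here is a minimal sketch in Python of the symmetry I have in mind. It is not the MySQL query behind the report; the `usage` mapping, the function names, and the sample data are all hypothetical stand-ins for illustration. The idea is that counting ordered pairs per user always produces both (a, b) and (b, a), so any pairing that shows up in only one direction points at a problem in the query rather than in the data.

```python
from collections import Counter
from itertools import permutations


def count_pairings(usage):
    """Count users per ordered project pair.

    `usage` (hypothetical shape) maps a user id to the set of
    projects that user was seen using.
    """
    counts = Counter()
    for projects in usage.values():
        # permutations(..., 2) yields both (a, b) and (b, a),
        # so every pairing is represented twice, once per direction.
        for a, b in permutations(sorted(projects), 2):
            counts[(a, b)] += 1
    return counts


def missing_mirrors(rows):
    """Return pairs whose reversed pair is absent or has a different count.

    `rows` maps (project_a, project_b) to a user count, e.g. rows
    copied out of a report.
    """
    return sorted(
        (a, b) for (a, b), n in rows.items() if rows.get((b, a)) != n
    )


# Made-up sample data, not UDC numbers.
usage = {
    "user1": {"ajdt", "platform"},
    "user2": {"ajdt", "platform", "jdt"},
}
counts = count_pairings(usage)
```

Running `missing_mirrors` over rows taken from a report would list exactly the one-sided pairings, such as a ‘platform’ and ‘ajdt’ row with no ‘ajdt’ and ‘platform’ partner.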
I’ll keep hacking at it, but in the meantime, I’ve made the new version available anyway.
In my efforts to refine the query, I discovered that I had previously been doubling the actual numbers (in my defense, I did state that I thought they felt a little high). That’s been fixed.
Once I can figure out a reasonable home for the queries, I’ll make ’em available so that Nick can lose a weekend too.
I also spent a little time looking for a visualization package: something that will let me draw a diagram showing the relative proximities of the projects. I did stumble upon a neat little physics package that I figure I can use: each project can be represented as a particle, with the number of users represented as the strength of a spring joining the particles. I think it’ll look cool. But, unfortunately, it’s become low priority, at least for the next week or so.
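The particle-and-spring idea can be sketched in a few lines of Python. This is not the physics package I found, just a rough illustration under my own assumptions: the function name `spring_layout`, the parameters, and the sample positions are made up, every spring shares one rest length, and spring strengths are assumed to be scaled down to at most 1 (raw user counts would make this simple update rule unstable).

```python
import math


def spring_layout(positions, springs, rest_length=1.0, step=0.05, iterations=200):
    """Relax particles joined by springs.

    positions: {name: (x, y)} starting coordinates (hypothetical input).
    springs:   {(a, b): strength}, strength assumed normalized to <= 1.
    """
    pos = {name: [float(x), float(y)] for name, (x, y) in positions.items()}
    for _ in range(iterations):
        forces = {name: [0.0, 0.0] for name in pos}
        for (a, b), strength in springs.items():
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9  # avoid dividing by zero
            # Hooke's law: pull together when stretched past the rest
            # length, push apart when compressed below it.
            magnitude = strength * (dist - rest_length)
            fx = magnitude * dx / dist
            fy = magnitude * dy / dist
            forces[a][0] += fx
            forces[a][1] += fy
            forces[b][0] -= fx
            forces[b][1] -= fy
        # Move every particle a small step along its net force.
        for name, (fx, fy) in forces.items():
            pos[name][0] += step * fx
            pos[name][1] += step * fy
    return {name: (p[0], p[1]) for name, p in pos.items()}


# Two made-up projects joined by a single spring.
layout = spring_layout(
    {"ajdt": (0.0, 0.0), "platform": (3.0, 0.0)},
    {("ajdt", "platform"): 1.0},
)
```

With more than two projects, the stronger springs win out when the springs disagree, so projects with many users in common should settle nearer to each other than weakly connected ones. A real layout would probably also want some repulsion between unconnected particles, which this sketch leaves out.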
In the meantime, does anybody know of some decent software that can be used to represent this sort of information?