Thanks for sharing, that was refreshingly easy to digest.
Also surfacing this comment I noticed which points out some pretty big caveats with the paper:
> As a heavy Datalog user, here's my opinion of the paper.
> Generally, my objection to this work would be that almost all "Datalog" programs they evaluate (7 programs in total) are either just SQL (i.e., they only have joins, no recursion) or they are transitive closure (i.e., they only have linear recursion). This is not Datalog. This is provably (one of the few things we can prove in complexity theory) a lower complexity class than Datalog. So, there is very strong reason to believe that these results do not translate to full Datalog. Transitive closure can be sped up arbitrarily much, compared to more complex recursion.
> The authors only evaluate one program (out of the 7) that has complex recursion. That one is "context-sensitive program analysis" (CSPA). However:
> a) the souffle version is not optimized for complex recursion, I strongly suspect it can be sped up by 2 orders of magnitude, so I really doubt the GPU speedup for this one
> b) despite the name, the analysis is not "context-sensitive", but let's just say this is an oversight
> c) it's still just a 10-line program, hardly representative of full Datalog.
> Methodologically, I cannot reproduce anything from this paper, at least not without talking to the authors directly. The artifact link in the paper (https://file.io/YZE8MKx12iqX) is broken already. In the repo, most of the large inputs are replaced with git lfs links and the github repo doesn't seem to serve git lfs, e.g., https://github.com/harp-lab/gdlog/blob/main/data/cspa/httpd/...
Yannis Smaragdakis is one of the foremost experts in the field, and is responsible for Doop, which, as far as I know, is one of the main developments that led to a resurgence of interest in Datalog for program verification :) Him calling himself a "heavy Datalog user" is quite modest.
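For readers less familiar with the distinction the comment draws, here is a rough illustration in generic Datalog syntax (my own sketch, not taken from the paper's benchmarks):

```datalog
% Transitive closure: *linear* recursion -- the recursive predicate
% `path` appears only once in the body of the recursive rule.
path(X, Y) :- edge(X, Y).
path(X, Z) :- path(X, Y), edge(Y, Z).

% *Non-linear* recursion: `path2` appears twice in one rule body.
% Full Datalog also allows mutually recursive predicates; programs
% like this are what the comment means by "more complex recursion".
path2(X, Y) :- edge(X, Y).
path2(X, Z) :- path2(X, Y), path2(Y, Z).
```

The comment's point is that most of the evaluated benchmarks look like the first pattern (or have no recursion at all), which sits in a provably lower complexity class than programs using the second.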
I am the author of this paper, and I do not agree with Dr. Smaragdakis' comments. As far as I can tell, the root of his concern is that the paper did not target Soufflé Datalog, the specific Datalog dialect his group writes in. The criticism is totally fair in a sense, but I do not agree with you that these are "pretty big caveats" in our paper, for the reasons I address in my rebuttal to his comment. I will say, however, that his very engaging comments have pushed us to do significant follow-on work, which has now pushed our engines to scale to the kind of code he writes in Datalog, yielding very exciting results, and I am hoping that he will be satisfied when he sees it :-)
I will also mention that our group has follow-on work from this (I cannot share it widely for reviewing reasons, but a preprint is available if you would like to search for it) which significantly addresses Yannis' concerns. The engine cited here scales only to small programs (tens of lines): it does not support large, tricky queries, for interesting asymptotic reasons (which are shared by other Datalog engines based on binary joins, not unique to ours). Our new engines port a significantly more complex class of join algorithms to the GPU, and we have used these new algorithms (and our novel GPU-based implementation) to run 500-1000-line Datalog programs that beat all existing state-of-the-art program-analysis engines by 20-50x.
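(For context on the "asymptotic reasons" with binary joins, which apply to any binary-join engine, not just ours, the classic example is the triangle query:)

```datalog
% Triangle query: the standard worst case for binary join plans.
triangle(X, Y, Z) :- edge(X, Y), edge(Y, Z), edge(Z, X).

% Any plan built from two-way joins must first materialize a
% two-edge intermediate, e.g. the join of edge(X,Y) with edge(Y,Z),
% which can contain Theta(N^2) tuples on an N-edge input -- even
% though the final output can never exceed O(N^(3/2)) triangles.
% Worst-case optimal join algorithms match that output bound directly.
```

This is general background on join algorithms, not a description of our specific implementation.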
In sum, I strongly disagree with the "pretty big caveats" remark. Dr. Smaragdakis' comments are quite firm in nature, and I very much respect them. But I encourage you to check out my rebuttal and also (regarding scaling to larger subsets of Datalog and "real" programs) our recent follow-on work.
If you would like proof, please email me; we are happy to help you evaluate for yourself. My email is always open: kkmicins@syr.edu.
Regarding your conclusion: I always figured the reason WAITPKG seems kinda lame is that the only reason they ported it to the Core architecture was to make heterogeneous CPUs possible. It works better on Atom; on Core it does almost nothing, as you note. AMD's ISA extension was written from the ground up for their high-performance server core, which might explain why it's actually useful.