#xdb

2025-02-18

#PACMOD v3n1 is out, and Aaron's work is public!

With an assist from Zhuoyue's #XDB, Aaron demonstrated that a probabilistic database can use approximate query processing techniques not only to match the accuracy of other PDBs, but to return query results **faster** than a **deterministic** database.

Probabilistic databases, broadly, are database engines that deal with data specified as a probability distribution rather than as exact values. For this paper, we're mainly interested in computing the expectation of count queries (and, by extension, sums, averages, etc.).
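As a toy illustration (my own example, not from the paper): under a tuple-independent model, where each row exists with its own independent probability, the expected count falls out of linearity of expectation, with no need to enumerate possible worlds.

```python
# A tuple-independent probabilistic relation: each row exists
# independently with the given probability. Toy data, made up here.
rows = [
    ("alice", 0.9),   # (value, probability the row exists)
    ("bob",   0.5),
    ("carol", 0.2),
]

# Expected COUNT(*): by linearity of expectation, just the sum of the
# existence probabilities.
expected_count = sum(p for _, p in rows)
```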

The paper starts off dealing with a bit of a theoretical chip on our shoulder. The theorists all like to work with set-probabilistic databases, because query processing there is generally #P-hard. For a probabilistic database to ever be practical, we need to operate on bag relations, but these have been largely ignored by the theory folks, since even the hardest bag-PDB query runs in polynomial time. Turns out they're wrong, and there are some interesting complexity results in bag-PDBs after all :)

Aaron pulls out the fine-grained complexity guns, and we prove that, in general, evaluating a bag-PDB query is super-linearly slower than the analogous deterministic query. That is, if there exists an algorithm that runs the deterministic version of the query in $O(f(n))$ time, then the best algorithm for the bag-probabilistic query needs $\Omega(f(n) \cdot n^{\epsilon})$ time, for some as-yet-unknown fractional power $\epsilon$.

This left us with approximation. We noticed that we can compute the lineage of a query result within the requisite runtime complexity; what puts us over the deterministic runtime is computing the final query result expectations. So we developed a nice approximation algorithm for computing the expectation of a polynomial, implemented it, ran some benchmarks, and were about to call it a day when we noticed that constructing the full lineage formula was just slooooow (do not taunt the happy fun constants).
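To give a flavor of what "approximating the expectation of a lineage polynomial" means, here's a minimal Monte Carlo sketch. The polynomial, variable names, and probabilities are all made up for illustration; this is not the paper's actual algorithm.

```python
import random

# Hypothetical lineage polynomial for one query result: each variable is
# the (random) presence of a base tuple. Probabilities are invented.
probs = {"x1": 0.9, "x2": 0.5, "x3": 0.2}

def lineage(w):
    # e.g. the polynomial x1*x2 + x1*x3, as produced by a join + union
    return w["x1"] * w["x2"] + w["x1"] * w["x3"]

def estimate_expectation(n_samples=50_000, seed=42):
    """Average the lineage polynomial over sampled possible worlds."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Sample a possible world: each tuple is in or out, independently.
        w = {v: (1 if rng.random() < p else 0) for v, p in probs.items()}
        total += lineage(w)
    return total / n_samples
```

For this toy polynomial the exact expectation is 0.9·0.5 + 0.9·0.2 = 0.63, and the estimate converges to it.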

The turning point came when we started talking to Zhuoyue and realized that our sampling algorithm was more or less equivalent to #WanderJoin, albeit over provenance traces rather than raw data. Plugging into his XDB allowed us to more or less sidestep the provenance construction step, sample directly from the raw relations, and beat postgres :).
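For a rough idea of what a WanderJoin-style estimator looks like (toy data and my own sketch, not XDB's implementation): sample a random walk through the join, one relation per hop, and reweight each walk by the inverse of its sampling probability (Horvitz-Thompson style).

```python
import random
from collections import defaultdict

# Toy estimator for the join size |R ⋈ S| over join-key values.
R = [1, 1, 2, 3]          # join-key values in R
S = [1, 2, 2, 2, 4]       # join-key values in S

# An "index" on S (key -> matching tuples), which WanderJoin assumes
# exists for each hop of the walk.
S_index = defaultdict(list)
for s in S:
    S_index[s].append(s)

def wander_join_estimate(n_walks=20_000, seed=7):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walks):
        r = rng.choice(R)                  # hop 1: uniform tuple from R
        matches = S_index.get(r, [])
        if not matches:
            continue                       # dead-end walk contributes 0
        # The walk's probability is (1/|R|) * (1/|matches|), so its
        # Horvitz-Thompson weight is the reciprocal:
        total += len(R) * len(matches)
    return total / n_walks
```

Here the true join size is 5 (key 1 contributes 2·1 pairs, key 2 contributes 1·3), and the estimator is unbiased for it.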

dl.acm.org/doi/10.1145/3709691

2024-05-30

@vulncheck's XDB is really handy. It's a shame that so many guides still only recommend exploit-db.com.
#exploitdb #xdb

2024-04-27

Does @vulncheck have a way to report that one of their entries in XDB has been deleted from GitHub? Does their XDB code occasionally check the repo URLs to make sure they aren't 404ed?

github.com/admi-n/CVE-2024-340 is definitely 404ed.
#vulncheck #xdb

2023-05-25

Cool to see companies and researchers embracing distributing exploits/PoCs via git repos. This was, and still is, one of the core features of Ronin: the ability to install 3rd-party git repos of Ruby exploit/payload/recon/whatever code that can be searched and loaded.
csoonline.com/article/3697749/
#xdb #exploitdb #ronin #roninrb
