RE: https://mathstodon.xyz/@dginev/115500903499229848
🗓️ The November 2025 arXiv articles are now in ar5iv.
I got curious, so here are some recent papers using ar5iv data:
Connected Theorems: A Graph-Based Approach to Evaluating Mathematical Results
https://arxiv.org/abs/2508.17596
SciGA: A Comprehensive Dataset for Designing Graphical Abstracts in Academic Papers
https://arxiv.org/abs/2507.02212
ChatPD: An LLM-driven Paper-Dataset Networking System
https://arxiv.org/abs/2505.22349
PaSa: An LLM Agent for Comprehensive Academic Paper Search
https://arxiv.org/abs/2501.10120
I am happy that our university-hosted #ar5iv dataset has reached 100 verified downloaders in about one year since its release.
This is peanuts in a HuggingFace world, but my research group had several earlier attempts at distributing HTML5+MathML and this one went well.
P.S. Don't get me started on the zillions of crawls ar5iv has had apart from that though, sigh...
Thanks to everyone for using the dataset when you need bulk data!
@norbu is presenting #OpenAccess and trying to include that aspect at #arxiv.
I've been following the amazing work by @dginev on #ar5iv for quite a while, in case you missed it:
https://ar5iv.labs.arxiv.org/
There's still a lot of work left, but I really love seeing progress towards accessibility within the #TeX / #TeXLaTeX and, more generally, the #ScientificPublishing community, especially in showing arXiv that it's possible to improve at huge scale.