#ar5iv

2025-06-06

🗓️ The May 2025 arXiv articles are now in ar5iv.

#ar5iv #arxiv

2025-05-06

🗓️ The April 2025 arXiv articles are now in ar5iv.

#ar5iv #arXiv

2025-04-06

🗓️ The March 2025 arXiv articles are now in ar5iv.

#ar5iv #arXiv

2025-04-04

I am happy that our university hosting of the #ar5iv dataset has reached 100 verified downloaders, about 1 year after release.

This is peanuts in a HuggingFace world, but my research group had several earlier attempts at distributing HTML5+MathML and this one went well.

P.S. Don't get me started on the zillions of crawls ar5iv has had apart from that though, sigh...

Thanks to everyone for using the dataset when you need bulk data!

mathstodon.xyz/@dginev/1123603

2025-03-06

🗓️ The February 2025 arXiv articles are now in ar5iv.

#ar5iv #arXiv

2025-02-06

🗓️ The January 2025 arXiv articles are now in ar5iv.

#ar5iv #arXiv

2024-12-06

🗓️ The November 2024 arXiv articles are now in ar5iv.

#arXiv #ar5iv

2024-11-06

🗓️ The October 2024 arXiv articles are now in ar5iv.

#arXiv #ar5iv

2024-10-06

🗓️ The September 2024 arXiv articles are now in ar5iv.

#arXiv #ar5iv

2024-09-05

🗓️ The August 2024 arXiv articles are now in ar5iv.

#arXiv #ar5iv

2024-08-06

🗓️ The July 2024 arXiv articles are now in ar5iv.

arXiv hit another submissions record: July was the first month with more than 20,000 newly available LaTeX sources for conversion.

#ar5iv #arxiv

2024-07-20

@norbu is presenting on #OpenAccess and trying to include that aspect at #arxiv.

I've been following the amazing work by @dginev on #ar5iv for quite a while. In case you missed it:
ar5iv.labs.arxiv.org/

There's still a lot of work left, but I really love to see progress towards accessibility within the #TeX / #TeXLaTeX and the wider #ScientificPublishing community, especially since it shows arXiv that improvement is possible on a huge scale.

#TUG2024

2024-07-06

🗓️ The June 2024 arXiv articles are now in ar5iv.

#ar5iv #arXiv

2024-06-06

🗓️ The May 2024 arXiv articles are now in ar5iv.

#ar5iv #arXiv

2024-05-06

🗓️ The April 2024 arXiv articles are now in ar5iv.

#ar5iv #arXiv

2024-04-30

Announcing our new dataset:

ar5iv 04.2024
🔹2.1 million HTML documents
🔹1 billion formulas in MathML

sigmathling.kwarc.info/resourc

#ar5iv #arXiv

Image: ar5iv logo, date: 04.2024, license: C-UDA

2024-04-06

🗓️ The March 2024 arXiv articles are now in ar5iv.

Enjoy!

#ar5iv #arxiv

2024-03-25

🗓️ ar5iv has brand-new HTML today.

Regenerated with latexml v0.8.8, which led to resolving 30+ reported issues.

Success rate is at 75.33%, and HTML exists for 97.74% of articles.

New trade-off: experiment with lower image quality, reducing our HDD use from 4.8 TB to 2.7 TB.

Total ar5iv collection now comprises 2,152,821 HTML pages, and contains over a billion formulas.

More still to come, with the usual monthly update on April 5th. Enjoy!

ar5iv.labs.arxiv.org/

#ar5iv #arXiv
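
For a sense of scale, the figures in this post can be turned into two quick ratios: the relative storage saved by the lower image quality, and the average number of formulas per HTML page. The sketch below is only a back-of-the-envelope calculation. The variable and function names are hypothetical, and the exact formula count is taken from the companion post of the same day rather than from this one.

```python
# Figures quoted in the 2024-03-25 posts; all names here are illustrative.
hdd_before_tb = 4.8            # HDD use before lowering image quality
hdd_after_tb = 2.7             # HDD use after lowering image quality
html_pages = 2_152_821         # total HTML pages in the collection
math_elements = 1_059_794_660  # <math> count from the companion post


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original size that was saved."""
    return (before - after) / before


def average_per_page(total: int, pages: int) -> float:
    """Mean number of items per page."""
    return total / pages


saved = relative_reduction(hdd_before_tb, hdd_after_tb)
avg = average_per_page(math_elements, html_pages)

print(f"storage saved: {saved:.0%}")
print(f"formulas per page: {avg:.0f}")
```

Under these numbers the storage saving is roughly 44%, and the collection averages about 492 formulas per page.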

2024-03-25

How many <math> elements in ar5iv today?

1,059,794,660

We've finally passed a billion (10⁹) formulas!

#ar5iv #arXiv
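
One way such a count could be taken for a single HTML document is to tally the opening `<math>` tags while parsing. The sketch below uses only Python's standard library. It is an illustration under that assumption, not the method actually used for ar5iv, and the names `MathCounter` and `count_math` are hypothetical.

```python
from html.parser import HTMLParser


class MathCounter(HTMLParser):
    """Counts opening <math> tags seen while parsing an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag == "math":
            self.count += 1


def count_math(html_text: str) -> int:
    """Return the number of <math> elements in one HTML string."""
    parser = MathCounter()
    parser.feed(html_text)
    parser.close()
    return parser.count


sample = """<p>Let <math><mi>x</mi></math> and
<math display="block"><mi>y</mi><mo>=</mo><mn>2</mn></math>.</p>"""

print(count_math(sample))
```

Summing `count_math` over every page of a collection would give a corpus-wide total. For millions of files a streaming or parallel approach would likely be needed.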
