We use #grobid and the plos1000 #goldstandard as a baseline against which to compare the performance of LLM-based solutions.
Takeaways:
- Grobid is still the better choice for literature similar to what it was trained on (mostly English-language STEM scholarship), since it is much faster and less resource-intensive
- For footnoted literature, experiments with LLamore/#Gemini show 3x better performance than Grobid
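A claim like "3x better" is easiest to reproduce when the metric is pinned down. One common option is to score each system against the gold standard with micro F1 and then take the ratio of two systems' scores. The sketch below assumes that references are compared as tuples of normalized fields (for example, first-author surname and year). The `Reference` shape, the field choice, and the function names are hypothetical. They are not taken from Grobid, LLamore, or the plos1000 data, and they are not necessarily the metric behind the numbers above.

```python
from collections import Counter
from typing import Iterable

# Hypothetical reference shape: a tuple of fields, e.g. (first-author surname, year).
Reference = tuple[str, ...]


def normalize(ref: Reference) -> Reference:
    """Lowercase each field and collapse runs of whitespace."""
    return tuple(" ".join(field.lower().split()) for field in ref)


def f1_score(predicted: Iterable[Reference], gold: Iterable[Reference]) -> float:
    """Micro F1 of predicted references against gold references (multiset match)."""
    pred = Counter(normalize(r) for r in predicted)
    true = Counter(normalize(r) for r in gold)
    true_positives = sum((pred & true).values())  # & keeps the minimum count per key
    if true_positives == 0:
        return 0.0
    precision = true_positives / sum(pred.values())
    recall = true_positives / sum(true.values())
    return 2 * precision * recall / (precision + recall)


# Made-up toy data for illustration only (not results from the experiments above).
gold = [("Smith", "2019"), ("Rossi", "1987"), ("Weber", "1922"), ("Bloch", "1939")]
baseline = [("Smith", "2019"), ("Rossi", "1978")]  # one match, one wrong year
llm = [("Smith", "2019"), ("Rossi", "1987"), ("Weber", "1922")]  # three matches

ratio = f1_score(llm, gold) / f1_score(baseline, gold)
```

Exact tuple matching is deliberately strict. A real evaluation would likely add fuzzy matching for names and titles and report per-field scores alongside the overall ratio.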