Part of the argument against AI is that LLMs are somewhat less than accurate (colloquially, "shitty"), though in limited use cases they are fine. So how do you judge an LLM? I tried to answer that for myself. Warning: the test is tailored more to me and my hacker friends than to the world at large, but it should give you an idea of a direction to head on your own path.
Sample test: https://gitlab.com/nmrc/nmrc-ai-test
Blog: https://www.markloveless.net/blog/2025/6/3/using-ai-responsibly