#aievaluation

IB Teguh TM @teguhteja
2025-05-28

Master Python Ragas AI Evaluation! Learn to effectively assess your LLMs and RAG systems for top-tier performance. Full tutorial inside.

teguhteja.id/python-ragas-ai-e

Mr Tech King @mrtechking
2025-05-08

SWE-Bench, a hot AI coding test, faces a big question: is it being gamed? Models might ace it but flunk real tasks, showing we may be testing test-smarts, not true skill. Time for better AI evaluation.

Rethinking AI Tests: Building Benchmarks That Actually Work.
PPC Land @ppcland
2025-04-16

ICYMI: Google updates quality rater guidelines with AI content evaluation criteria: Google's latest guidelines provide clearer direction on evaluating AI-generated content and spam tactics. ppc.land/google-updates-qualit

PPC Land @ppcland
2025-04-15

Google updates quality rater guidelines with AI content evaluation criteria: Google's latest guidelines provide clearer direction on evaluating AI-generated content and spam tactics. ppc.land/google-updates-qualit

SAIL Research Network @SAILnetwork
2025-04-02

🎉 That's a wrap! The SAIL Spring School 2025 at Bielefeld University was an inspiring event, bringing together young researchers to explore AI evaluation beyond accuracy & precision.

🍕 A highlight: our poster & pizza session – Congrats to Kathrin Lammers & Thorben Markmann for winning Best Poster Awards! 👏

A big thank you to all speakers, participants & organizers! 🤝 See you at the next SAIL Spring School 2026 in Paderborn! 🚀

2024-10-25

"The #gamma GLM is a relatively assumption-light means of #modeling non-negative data, given gamma's flexibility.
[…]
"Explaining what is used and what is not used, despite merits and demerits […]: Loosely, the larger the internal literature in any field on modelling techniques, the less inclined people in that field seem to be to try something different."

Nick Cox, 2013: stats.stackexchange.com/questi

#normality #normalDistribution #Γ #modelling #dataDev #AIDev #ML #AIEvaluation #logNormal

2024-10-23

@datadon

"The following sections discuss several state-of-the-art interpretable and explainable #ML methods. The selection of works does not comprise an exhaustive survey of the literature. Instead, it is meant to illustrate the commonest properties and inductive biases behind interpretable models and [black-box] explanation methods using concrete instances."
wires.onlinelibrary.wiley.com/ 🧵

#interpretability #explainability #aiethics #compliance #taxonomy #ethicalai #aievaluation #linearRegression

2024-10-23

Model "#interpretability and [black-box] #explainability, although not necessary in many straightforward applications, become instrumental when the problem definition is incomplete and in the presence of additional desiderata, such as trust, causality, or fairness."

wires.onlinelibrary.wiley.com/

#aiethics #compliance #taxonomy #ethicalai #aievaluation

2024-09-10

I'm continuing my research into LLM evals for apps and single prompts, which IMO is one of the most challenging fields in machine learning right now. I'm excited to learn more about Arize Phoenix and their "open-source observability library".

#LLM #MachineLearning #AIEvaluation #Evaluation #AIMetrics #MLOps #AIInsights #ArizeAI #Phoenix #AIEthics #AITransparency #ResponsibleAI

I'm linking a very informative video of theirs that got me interested in what they made:
youtube.com/watch?v=9Ay0WcjrdG

2024-09-09

Today I'm trying out DeepEval, an open-source LLM evaluation framework.
The team behind it wrote a wonderful article about evaluations that is very informative but still easy to understand.

#AI #LLM #MachineLearning #AIEvaluation #NaturalLanguageProcessing #TechInsights

confident-ai.com/blog/llm-eval
