GPT5’s “safe completion” was previously called “safe answering”, and is included in the benchmark we developed to assess the “Harmfulness of Applying Off-the-Shelf Large Language Models to Programming Tasks”.
Today at #FSE2025:
* 14:40 @ Cosmos 3C “Expressing and Checking Statistical Assumptions”: 84% of surveyed notebooks violate assumptions: https://conf.researchr.org/details/fse-2025/fse-2025-research-papers/93/Expressing-and-Checking-Statistical-Assumptions
OR
* 14:50 at Pirsenteret “jAST: Analyzing and Modifying Java ASTs with Python”: https://conf.researchr.org/details/fse-2025/fse-2025-demonstrations/25/jAST-Analyzing-and-Modifying-Java-ASTs-with-Python
My #Fandango team at #FSE2025 / #ISSTA2025: Alexi Turcotte, Marius Smytzek, me, Pepe Zamudio, and Laura Plein. What is #Fandango? Watch this space on Thursday for our big 1.0 release announcement and/or attend Pepe’s presentation on Friday 16:00!
Join me in Cosmos 3C at 15:00 at #fse2025 TODAY for my talk on TerzoN and the Composite Oracle, which combines implicit, example-based, and property-based test oracles in a thoughtful and developer-focused way.
Come for a cool NaNofuzz sticker & stay to hear the results of our randomized controlled human trial of professional developers using TerzoN & fast-check.
Also at #FSE2025: Our paper assessing LLMs' capabilities to handle comments in languages other than English (we look at Chinese, Dutch, English, Greek, and Polish).
Today's metrics for assessing success "fail to reliably differentiate meaningful completions from random noise"
Too bad I can't join FSE this year. If you're there, check out our work analyzing how language models for code try to avoid giving 'harmful' responses (re intellectual property, malware, and biases).
Today at #FSE2025: Check out
* 14:20 @ Cosmos 3C: Bernd’s journal-first work on Information Flow Fuzzing: https://conf.researchr.org/details/fse-2025/fse-2025-journal-first/31/Presentation-Proposal-for-Finding-Information-Leaks-with-Information-Flow-Fuzzing and
* 15:12 @ Vega: Laura’s Student Research Competition talk on Predicting Software Changes from Desired Behavior Changes: https://conf.researchr.org/details/fse-2025/fse-2025-student-research-competition/4/Predicting-Software-Changes-from-Desired-Behavior-Changes
can’t wait to travel* to #fse2025 tmrw!!
*=it’s down the street from this taco place