
GPT5’s “safe completion” was previously called “safe answering”, and is included in the benchmark we developed to assess the “Harmfulness of Applying Off-the-Shelf Large Language Models to Programming Tasks”.

https://dl.acm.org/doi/abs/10.1145/3729380

#gpt5 #safecompletion #harmfulness #fse2025

Today at #FSE2025:

* 14:40 @ Cosmos 3C “Expressing and Checking Statistical Assumptions”: 84% of surveyed notebooks violate assumptions: https://conf.researchr.org/details/fse-2025/fse-2025-research-papers/93/Expressing-and-Checking-Statistical-Assumptions

OR

* 14:50 at Pirsenteret “jAST: Analyzing and Modifying Java ASTs with Python”: https://conf.researchr.org/details/fse-2025/fse-2025-demonstrations/25/jAST-Analyzing-and-Modifying-Java-ASTs-with-Python

My #Fandango team at #FSE2025 / #ISSTA2025: Alexi Turcotte, Marius Smytzek, me, Pepe Zamudio, and Laura Plein. What is #Fandango? Watch this space on Thursday for our big 1.0 release announcement and/or attend Pepe’s presentation on Friday 16:00!

<exchange> ::= <client:request> <server:response>
<request> ::= 0x1 <length> <payload> <padding>
<response> ::= 0x2 <length> <payload> <padding>
<length> ::= <uint16>
<payload> ::= <byte>*
<padding> ::= <byte>* 
where len(<payload>) == uint16(<length>)
where <response>.<payload> == <request>.<payload>
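
For readers who want to see what this grammar describes on the wire, here is a minimal plain-Python sketch. It does not use the Fandango API. The function names are made up for illustration, and the big-endian byte order for <uint16> is an assumption, since the grammar does not fix one.

```python
import struct

REQUEST_TAG = 0x1   # first byte of <request>
RESPONSE_TAG = 0x2  # first byte of <response>


def encode(tag: int, payload: bytes, padding: bytes = b"") -> bytes:
    """Build <tag> <length> <payload> <padding>; length is a big-endian uint16 (assumed)."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload does not fit into a uint16 length field")
    return bytes([tag]) + struct.pack(">H", len(payload)) + payload + padding


def decode(message: bytes) -> tuple[int, bytes, bytes]:
    """Split a message into (tag, payload, padding), enforcing len(<payload>) == <length>."""
    if len(message) < 3:
        raise ValueError("message too short for tag and length")
    tag = message[0]
    (length,) = struct.unpack(">H", message[1:3])
    payload = message[3:3 + length]
    if len(payload) != length:
        raise ValueError("payload shorter than declared length")
    padding = message[3 + length:]  # everything after the payload counts as padding
    return tag, payload, padding


def respond(request: bytes) -> bytes:
    """Hypothetical server side: echo the request payload back in a response."""
    tag, payload, _ = decode(request)
    if tag != REQUEST_TAG:
        raise ValueError("not a request")
    return encode(RESPONSE_TAG, payload)


def is_valid_exchange(request: bytes, response: bytes) -> bool:
    """Check the constraint <response>.<payload> == <request>.<payload>."""
    req_tag, req_payload, _ = decode(request)
    resp_tag, resp_payload, _ = decode(response)
    return (
        req_tag == REQUEST_TAG
        and resp_tag == RESPONSE_TAG
        and resp_payload == req_payload
    )
```

Calling `respond(encode(REQUEST_TAG, b"hi"))` yields a response carrying the same payload, which `is_valid_exchange` then accepts.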

Join me in Cosmos 3C at 15:00 at #fse2025 TODAY for my talk on TerzoN and the Composite Oracle, which combines implicit, example-based, and property-based test oracles in a thoughtful and developer-focused way.

Come for a cool NaNofuzz sticker & stay to hear the results of our randomized controlled human trial of professional developers using TerzoN & fast-check.

https://conf.researchr.org/details/fse-2025/fse-2025-research-papers/30/TerzoN-Human-in-the-Loop-Software-Testing-with-a-Composite-Oracle

Also at #FSE2025: Our paper assessing LLMs' capabilities to handle comments in languages other than English (we look at Chinese, Dutch, English, Greek, and Polish).

Today's metrics for assessing success "fail to reliably differentiate meaningful completions from random noise"

https://conf.researchr.org/details/promise-2025/promise-2025-papers/3/A-Qualitative-Investigation-into-LLM-Generated-Multilingual-Code-Comments-and-Automat

#llm #llm4code

Too bad I can't join FSE this year. If you're there, check out our work analyzing how language models for code try to avoid giving 'harmful' responses (re intellectual property, malware, and biases).

https://conf.researchr.org/details/fse-2025/fse-2025-research-papers/68/Code-Red-On-the-Harmfulness-of-Applying-Off-the-shelf-Large-Language-Models-to-Progr

#llm #alignment #fse2025

Today at #FSE2025: Check out

* 14:20 @ Cosmos 3C: Bernd’s journal-first work on Information Flow Fuzzing: https://conf.researchr.org/details/fse-2025/fse-2025-journal-first/31/Presentation-Proposal-for-Finding-Information-Leaks-with-Information-Flow-Fuzzing and
* 15:12 @ Vega: Laura’s Student Research Competition talk on Predicting Software Changes from Desired Behavior Changes: https://conf.researchr.org/details/fse-2025/fse-2025-student-research-competition/4/Predicting-Software-Changes-from-Desired-Behavior-Changes

Anybody on here at FSE or is it literally just me? #fse2025 #fse25

can’t wait to travel* to #fse2025 tmrw!!

*=it’s down the street from this taco place

#fse2025
