#bindiffing

2025-01-02

This is beautiful. #aor24 day 15 #bindiffing #radare2

Joxean Koret (@matalaz)joxean
2024-09-23

The code is already published on GitHub and can now use an already trained model to try to improve binary diffing results (matching). I haven't made a new release yet, as these changes are considered a bit experimental for now.

The datasets and tools for training and testing are here: github.com/joxeankoret/diaphor
And Diaphora is here: github.com/joxeankoret/diaphora

Pass the SALT Conferencepassthesaltcon@infosec.exchange
2024-01-31

SPONSORING

📢 We are very happy to announce the GOLD sponsorship by Quarkslab ( quarkslab.com ) 😍 We thank them a lot for their support since Day 1 of the conference in 2018 🙏

🚀 Among many other #opensource projects, Quarkslab recently published a ⚡ #binDiffing portal diffing.quarkslab.com/ . You will find there several of their open source #bindiffing tools, such as #Qbindiff and #Quokka, as well as many resources on the topic.

📝 "Through QLab's consulting expertise and R&D, and our software QFlow and QShield, we share and scale our knowledge by making it accessible to everyone. We believe that security is everyone's concern as there is no freedom if there is no security."

Joxean Koret (@matalaz)joxean
2024-01-09

It's very sad, but reading academic research about binary diffing or, as it's called in academia, binary code similarity analysis, is always a damn waste of time. It's either all fairy tales that cannot be proved or plainly false and/or wrong.

An example? One paper that I have re-read today says that and are mono-architecture and totally discard these tools for the paper. LOL.

Joxean Koret (@matalaz)joxean
2023-10-20

Fun Reverse Engineering problem du jour. A compilation unit is a set of functions. Cool. However, a function might belong to one or many compilation units.

For example, in , I used to think that once I have a compilation unit name for a function, that function belongs to just that one CU. However, if a function from, for example, a header file is in-lined inside a function, what compilation unit does that function belong to?

Joxean Koret (@matalaz)joxean
2023-10-11

Any cool bug on this Patch Tuesday? Anything cool to diff with and enhance the ability to try to find patched vulnerabilities?

Joxean Koret (@matalaz)joxean
2023-09-30

Did you know that Diaphora detects patch diffing sessions and tries to help find where vulnerabilities were fixed? Here are some examples for CVE-2020-1350 and CVE-2023-28231.

Diaphora showing the exact place where CVE-2020-1350 was fixed. Diaphora showing the exact place where CVE-2023-28231 was fixed.
Joxean Koret (@matalaz)joxean
2023-08-27

Today I realised that the oldest technology developed by me integrated into dates from 2009.

In case you are curious, it's DeepToad, a Python library for doing fuzzy hashing. This simplistic library calculates a set of 3 different hashes using a configurable block size (as opposed to, say, ssdeep, which doesn't work for this).

github.com/joxeankoret/deeptoad
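
To make the "configurable block size" idea concrete, here is a toy sketch of block-based fuzzy hashing. It is not DeepToad's algorithm and produces one hash rather than three; the names `toy_fuzzy_hash` and `similarity`, the 64-symbol alphabet, and the byte-sum reduction are all assumptions made purely for illustration.

```python
import string
from difflib import SequenceMatcher

# 64 printable symbols, one per possible block value (illustrative choice).
ALPHABET = string.ascii_letters + string.digits + "+/"


def toy_fuzzy_hash(data: bytes, block_size: int = 16) -> str:
    """Emit one symbol per block_size-byte block of data.

    Hypothetical scheme: each block is reduced to (sum of its bytes) mod 64.
    A local change in the input only alters the symbols of the blocks it
    touches, so similar inputs yield similar hashes.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    symbols = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        symbols.append(ALPHABET[sum(block) % 64])
    return "".join(symbols)


def similarity(hash_a: str, hash_b: str) -> float:
    """Return a 0.0 to 1.0 similarity ratio between two hash strings."""
    return SequenceMatcher(None, hash_a, hash_b).ratio()


if __name__ == "__main__":
    original = bytes((i * 7 + i // 16) % 256 for i in range(1024))
    patched = bytearray(original)
    patched[100] ^= 0x01  # flip one bit in a single block
    h1 = toy_fuzzy_hash(original, block_size=16)
    h2 = toy_fuzzy_hash(bytes(patched), block_size=16)
    print(h1)
    print(h2)
    print(similarity(h1, h2))
```

A smaller block size gives a longer, more fine-grained hash; a larger one gives a shorter, coarser hash. Being able to pick that trade-off per input is the property the post highlights.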

Joxean Koret (@matalaz)joxean
2023-07-14

Also, of even small is very slow and would only, probably, help for comparing binaries for the same (or compatible) architecture. And in order to compare binaries for the same architectures you have a myriad of different, not terribly slow, ways for doing .

Joxean Koret (@matalaz)joxean
2023-07-14

Dear everyone in academia (and maybe elsewhere) doing research: symbolic execution does not work for comparing different architectures, unless you are using *decompiled code* as input for your symbolic execution tool.

If you are using assembly or an IR (Intermediate Representation) based on assembly (like Ghidra's p-code, IDA's microcode, LLVM's IR, etc.), it will inevitably produce different outputs.

Your best IR for is pseudo-code, the decompiler's output.

Joxean Koret (@matalaz)joxean
2023-07-14

One more question regarding : Which binary diffing tools have you used?

Joxean Koret (@matalaz)joxean
2023-07-14

One question regarding : Have you ever used a tool called DeepBinDiff? I am not talking about "BinDiff" but rather about "DeepBinDiff".

Joxean Koret (@matalaz)joxean
2023-07-14
Joxean Koret (@matalaz)joxean
2023-07-02

Dear everyone in academia using "Machine Learning" for Binary Code Similarity Analysis (i.e., bindiffing): AI is bad for anything that requires exact results. It will generate a huge amount of false positives mixed with a varying degree of similar results, and its output is pretty hard to understand.

Joxean Koret (@matalaz)joxean
2023-05-02

with CVE-2023-28231. As explained in the linked blog post from @thezdi, the vulnerability was fixed by checking that the number of relay forward messages in "ProcessRelayForwardMessage()" is not greater than or equal to 32 (0x20), as shown in the following pseudo-code diffing:

zerodayinitiative.com/blog/202
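
As a rough illustration of what a pseudo-code diff of such a fix looks like, the sketch below diffs two hypothetical, heavily simplified pseudo-code listings with Python's `difflib`. Only the function name and the `>= 0x20` check come from the post; `relay_count`, `store_relay_message`, `ERROR`, and the overall body are invented for the example and are not the real decompiler output.

```python
import difflib

# Hypothetical, simplified pseudo-code. NOT the actual decompiled function.
UNPATCHED = [
    "ProcessRelayForwardMessage(msg):",
    "    store_relay_message(msg, relay_count)",
    "    relay_count += 1",
]
PATCHED = [
    "ProcessRelayForwardMessage(msg):",
    "    if relay_count >= 0x20:",
    "        return ERROR",
    "    store_relay_message(msg, relay_count)",
    "    relay_count += 1",
]


def pseudo_diff(old, new):
    """Return a unified diff (list of lines) between two pseudo-code listings."""
    return list(
        difflib.unified_diff(
            old, new, fromfile="unpatched", tofile="patched", lineterm=""
        )
    )


def added_lines(diff):
    """Extract the lines the patch added, skipping the '+++' file header."""
    return [
        line[1:]
        for line in diff
        if line.startswith("+") and not line.startswith("+++")
    ]


if __name__ == "__main__":
    diff = pseudo_diff(UNPATCHED, PATCHED)
    print("\n".join(diff))
    print("Added by the patch:", added_lines(diff))
```

In a patch diffing session, a newly added comparison against a small constant right before a copy or an index operation is exactly the kind of change worth looking at first.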

Joxean Koret (@matalaz)joxean
2023-03-19
Joxean Koret (@matalaz)joxean
2023-03-19

So, let's say that we have 2 functions in binary A matching 2 functions in binary B *but* both A functions and B functions have the exact same score for the 4 matches (and the same callers and callees). This looks like a complex match to resolve, right?

So, what do you think is (apparently) the best and simplest method in to determine which match is the appropriate one?

Joxean Koret (@matalaz)joxean
2023-03-15

@themoep
Thank you very much for the explanation! That makes sense.

In case you are curious, the idea is to try to build a better match scoring function for with . I.e., given 2 functions in 2 binaries, determine how close they are and generate a ratio.

So, with what you say, my guess is that I have to train mostly with bad results, as most of the time such a match scoring function is going to see false matches.
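
For reference, the kind of interface being discussed (two functions in, one ratio out) can be sketched with a plain, non-ML baseline. This is not Diaphora's scoring and not the trained model mentioned in the thread; `tokenize` and `match_ratio` are hypothetical names, and comparing pseudo-code token sequences with `difflib` is just one simple assumption about what the inputs could be.

```python
import re
from difflib import SequenceMatcher


def tokenize(pseudo_code: str) -> list:
    """Split pseudo-code into identifiers, numbers and single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", pseudo_code)


def match_ratio(func_a: str, func_b: str) -> float:
    """Score how close two functions are, from 0.0 (unrelated) to 1.0 (identical).

    Baseline only: compares token sequences of the two pseudo-code strings.
    """
    tokens_a = tokenize(func_a)
    tokens_b = tokenize(func_b)
    return SequenceMatcher(None, tokens_a, tokens_b, autojunk=False).ratio()


if __name__ == "__main__":
    a = "if (n >= 32) return -1; return copy(buf, n);"
    b = "if (n >= 0x20) return -1; return copy(buf, n);"
    c = "while (p) { p = p->next; }"
    print(match_ratio(a, b))  # near match
    print(match_ratio(a, c))  # false match candidate
```

A learned scorer would replace `match_ratio` while keeping the same shape, and pairs like `(a, c)` are the "bad results" that would dominate its training data.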

Joxean Koret (@matalaz)joxean
2023-03-10

What are acceptable diffing times, in your opinion, for medium to big binaries (i.e., diffing 2 kernels, something like 70k functions in each database)?
