In the next edition of the PrivateNLP '25 workshop, we'd like to include under-represented groups in the organizing team and/or the program committee! If that sparks your interest, please get in touch. Thanks for reposting :)
Full professor at Ruhr-Universität Bochum | Leading the Trustworthy Human Language Technologies group | Playing the bass | He/him/his
Share your cool research on privacy and submit to the 5th WS on Privacy in NLP - this year co-located with ACL in Bangkok in August! Submission deadline May 17, more in the CfP below:
https://sites.google.com/view/privatenlp/home/call-for-papers
"DP-NMT: Scalable Differentially Private Machine Translation"
TL;DR: A fast and scalable differentially private framework in JAX for transformer-based NMT
Paper: https://aclanthology.org/2024.eacl-demo.11/
Code: https://github.com/trusthlt/dp-nmt
(3/3)
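Frameworks like DP-NMT are built around DP-SGD: clip each example's gradient to a maximum L2 norm, sum, and add Gaussian noise calibrated to that norm. A minimal pure-Python sketch of that core step (illustrative only; function names and defaults here are my own, not DP-NMT's API -- the real JAX implementation is in the repo above):

```python
import math
import random

def clip_grad(grad, max_norm):
    # Per-example L2 clipping: scale down any gradient whose norm exceeds max_norm.
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]

def dp_sgd_step(per_example_grads, max_norm=1.0, noise_multiplier=1.0):
    # Clip every example's gradient, sum them, add Gaussian noise calibrated
    # to the clipping norm, then average -- the core DP-SGD update.
    clipped = [clip_grad(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + random.gauss(0.0, sigma) for s in summed]
    return [x / len(per_example_grads) for x in noisy]

update = dp_sgd_step([[0.5, -2.0], [3.0, 4.0]])
```

Because clipping bounds each example's contribution, the Gaussian noise yields a formal (ε, δ)-DP guarantee per step.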
"Answering legal questions from laymen in German civil law system"
TL;DR: Legal QA for laypeople in Germany - a new task & benchmark data
Paper: https://aclanthology.org/2024.eacl-long.122/
Code & Data: https://github.com/trusthlt/eacl24-german-legal-questions
(2/3)
TrustHLT is proudly presenting two papers at @eaclmeeting !!
I personally have FOMO :) but talk to Timour Igamberdiev if you're interested in differential privacy for NLP and to Mahammad Namazov if you're into legal NLP
(1/3)
If you're also reviewing for #starsem2024 , I've added their review template to the collection of "offline" markdown blank review forms -> feel free to reuse!
https://github.com/habernal/blank-peer-review-forms-nlp
@j2kun Cool -- looking forward to your new book. By the way, the first one was a game changer for me, many thanks for that!!
@tedted ...and even if it is, don't tell your VCs :) "vintage math" or "traditional math" sounds much cooler
@tedted "old-fashioned math" -- nice framing :)
"How to win #SemEval2024 Starter Pack"
1) GPT-4
2) ... eh, that's it
Is it good news or bad news for research?
DP-NMT just accepted to EACL'24 demo track! 🎉 Talk to us in Malta if you're interested in privacy and machine translation. Super proud of the DP-NMT team led by Timour Igamberdiev 💪
@drgroftehauge I have no idea what they use for which products. Here https://machinelearning.apple.com/research/learning-with-privacy-at-scale they mention 2, 4, and other values for various experiments.
Will you give me your sensitive text data if I promise you differential privacy? And if yes, how "strongly" (ε) do I have to protect it?
Laypeople *do* understand DP risks for different ε, and won't give you anything for ε>4.5
https://arxiv.org/abs/2307.06708
w/ Chris Weiß, Frauke Kreuter
@j2kun I think it should be the other way around: Learn software engineering first to better understand math (or, to understand math at all). C.S./C.Eng = structure, clarity, non-ambiguity. Math = messy code you inherited without docs :)
@leon Oh, these are hilarious!! Almost like genuine prof's replies :))
@leon If this is a joke, I don't get it, but if it's real - can you pls send me the code? :)
* With a couple of other sophisticated tricks and formal proofs, we need a much smaller privacy budget to get meaningful downstream performance!
Read more here: https://arxiv.org/abs/2302.07636
Try it yourself here (yeah, nothing beats full reproducibility and transparency in privacy-preserving NLP research): https://github.com/trusthlt/dp-bart-private-rewriting
(3/3)
We found that BART has a lot of redundancy in the last layer!
* You can "zero-out" up to 25% of the neurons -> it still regenerates the input
* This "pruning" can be learned on public data -> it greatly reduces the sensitivity
(2/n)
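The zero-out idea above, as a tiny pure-Python sketch (hypothetical names; in the paper the mask is learned on public data, here it is hard-coded):

```python
import math

def prune(latent, mask):
    # Zero out latent dims where the (learned; here hard-coded) mask is 0.
    return [z if m else 0.0 for z, m in zip(latent, mask)]

def l2(v):
    return math.sqrt(sum(x * x for x in v))

latent = [1.0, -0.5, 2.0, 0.25]
mask = [1, 0, 1, 0]          # half of the dims pruned
pruned = prune(latent, mask)
# Fewer active dims -> lower worst-case L2 norm of the (clipped) representation,
# i.e. lower sensitivity, so the DP mechanism can add less noise for the same ε.
```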
We push the boundaries of state-of-the-art text rewriting under local differential privacy for text classification!
The biggest problem with noisifying latent representations?
Small (SoTA) models? -> Utility gets much, much worse with noise
Big models? Huge sensitivity -> huge noise -> destroys utility
Retrain big models to be smaller in latent? -> Extremely costly
(1/n)
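Why sensitivity matters so much: in (local) DP, the noise scale is calibrated to sensitivity/ε for each release. A minimal Laplace-mechanism sketch of that generic principle (the paper uses its own DP-BART mechanism; this is only the textbook version):

```python
import math
import random

def laplace_mechanism(value, sensitivity, epsilon):
    # Noise scale b = sensitivity / epsilon: a bigger model with a larger-norm
    # latent (higher sensitivity), or a stricter (smaller) epsilon, means more noise.
    b = sensitivity / epsilon
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    # Inverse-CDF sampling of Laplace(0, b) noise.
    return value - b * sign * math.log(1.0 - 2.0 * abs(u))
```

With the same random draw, a 100x larger sensitivity yields exactly 100x larger noise -- which is why huge-sensitivity latents destroy utility.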
@tschfflr No need to get upset, just reply with a true consulting fee, that'll do