In the next edition of the PrivateNLP '25 workshop, we'd like to include under-represented groups in the organizing team and/or the program committee! If that sparks your interest, please get in touch. Thanks for reposting :)
Full professor at Ruhr-Universität Bochum | Leading the Trustworthy Human Language Technologies group | Playing the bass | He/him/his
Share your cool research on privacy and submit to the 5th WS on Privacy in NLP - this year co-located with ACL in Bangkok in August! Submission deadline May 17, more in the CfP below:
https://sites.google.com/view/privatenlp/home/call-for-papers
"DP-NMT: Scalable Differentially Private Machine Translation"
TL;DR: A fast and scalable differentially private framework in JAX for transformer-based NMT
Paper: https://aclanthology.org/2024.eacl-demo.11/
Code: https://github.com/trusthlt/dp-nmt
(3/3)
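Frameworks like DP-NMT are built around DP-SGD: clip each example's gradient to a maximum L2 norm, sum, and add Gaussian noise calibrated to that norm. A minimal pure-Python sketch of that core step (illustrative only; function names and defaults here are my own, not DP-NMT's API -- the real JAX implementation is in the repo above):

```python
import math
import random

def clip_grad(grad, max_norm):
    # Per-example L2 clipping: scale down any gradient whose norm exceeds max_norm.
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]

def dp_sgd_step(per_example_grads, max_norm=1.0, noise_multiplier=1.0):
    # Clip every example's gradient, sum them, add Gaussian noise calibrated
    # to the clipping norm, then average -- the core DP-SGD update.
    clipped = [clip_grad(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + random.gauss(0.0, sigma) for s in summed]
    return [x / len(per_example_grads) for x in noisy]

update = dp_sgd_step([[0.5, -2.0], [3.0, 4.0]])
```

Because clipping bounds each example's contribution, the Gaussian noise yields a formal (ε, δ)-DP guarantee per step.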
"Answering legal questions from laymen in German civil law system"
TL;DR: Legal QA for laypeople in Germany - a new task & benchmark data
Paper: https://aclanthology.org/2024.eacl-long.122/
Code & Data: https://github.com/trusthlt/eacl24-german-legal-questions
(2/3)
TrustHLT is proudly presenting two papers at @eaclmeeting !!
I personally have FOMO :) but talk to Timour Igamberdiev if you're interested in differential privacy for NLP and to Mahammad Namazov if you're into legal NLP
(1/3)
If you're also reviewing for #starsem2024 , I've added their review template to the collection of "offline" markdown blank review forms -> feel free to reuse!
https://github.com/habernal/blank-peer-review-forms-nlp
@j2kun Cool -- looking forward to your new book. By the way, the first one was a game changer for me, many thanks for that!!
@tedted ...and even if it is, don't tell your VCs :) "vintage math" or "traditional math" sounds much cooler
@tedted "old-fashioned math" -- nice framing :)
"How to win #SemEval2024 Starter Pack"
1) GPT-4
2) ... eh, that's it
Is it good news or bad news for research?
DP-NMT just accepted to EACL'24 demo track! 🎉 Talk to us in Malta if you're interested in privacy and machine translation. Super proud of the DP-NMT team led by Timour Igamberdiev 💪
@drgroftehauge I have no idea what they use for which products. Here https://machinelearning.apple.com/research/learning-with-privacy-at-scale they mention 2, 4, and other values for various experiments.
Will you give me your sensitive text data if I promise you differential privacy? And if yes, how "strongly" (ε) do I have to protect it?
Laypeople *do* understand DP risks for different ε, and won't give you anything for ε>4.5
https://arxiv.org/abs/2307.06708
w/ Chris Weiß, Frauke Kreuter
@j2kun I think it should be the other way around: Learn software engineering first to better understand math (or, to understand math at all). C.S./C.Eng = structure, clarity, non-ambiguity. Math = messy code you inherited without docs :)
@leon Oh, these are hilarious!! Almost like genuine prof's replies :))
@leon If this is a joke, I don't get it, but if it's real - can you pls send me the code? :)
* With a couple of other sophisticated tricks and formal proofs, we need a much smaller privacy budget to get meaningful downstream performance!
Read more here: https://arxiv.org/abs/2302.07636
Try it yourself here (yeah, nothing beats full reproducibility and transparency in privacy-preserving NLP research): https://github.com/trusthlt/dp-bart-private-rewriting
(3/3)
We found that BART has a lot of redundancy in the last layer!
* You can "zero-out" up to 25% of the neurons -> it still regenerates the input
* This "pruning" can be learned on public data -> it greatly reduces the sensitivity
(2/n)
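The zero-out idea above, as a tiny pure-Python sketch (hypothetical names; in the paper the mask is learned on public data, here it is hard-coded):

```python
import math

def prune(latent, mask):
    # Zero out latent dims where the (learned; here hard-coded) mask is 0.
    return [z if m else 0.0 for z, m in zip(latent, mask)]

def l2(v):
    return math.sqrt(sum(x * x for x in v))

latent = [1.0, -0.5, 2.0, 0.25]
mask = [1, 0, 1, 0]          # half of the dims pruned
pruned = prune(latent, mask)
# Fewer active dims -> lower worst-case L2 norm of the (clipped) representation,
# i.e. lower sensitivity, so the DP mechanism can add less noise for the same ε.
```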
We push the boundaries of state-of-the-art text rewriting under local differential privacy for text classification!
The biggest problem with noisifying latent representations?
Small (SoTA) models? -> Utility gets much, much worse with noise
Big models? Huge sensitivity -> huge noise -> destroys utility
Retrain big models to be smaller in latent? -> Extremely costly
(1/n)
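Why sensitivity matters so much: in (local) DP, the noise scale is calibrated to sensitivity/ε for each release. A minimal Laplace-mechanism sketch of that generic principle (the paper uses its own DP-BART mechanism; this is only the textbook version):

```python
import math
import random

def laplace_mechanism(value, sensitivity, epsilon):
    # Noise scale b = sensitivity / epsilon: a bigger model with a larger-norm
    # latent (higher sensitivity), or a stricter (smaller) epsilon, means more noise.
    b = sensitivity / epsilon
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    # Inverse-CDF sampling of Laplace(0, b) noise.
    return value - b * sign * math.log(1.0 - 2.0 * abs(u))
```

With the same random draw, a 100x larger sensitivity yields exactly 100x larger noise -- which is why huge-sensitivity latents destroy utility.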
@tschfflr No need to get upset, just reply with a true consulting fee, that'll do