
KDD '21: Understanding and Improving Fairness-Accuracy Trade-offs in Multi-Task Learning (Google)

Arthur Lee
4 min read · Jul 10, 2022

🤗 Recommendation system paper challenge (31/50)

Paper link: Google, KDD 2021

🤔 What problem do they solve?

Existing MTL (multi-task learning) methods do not optimize for fairness; so far, fairness optimization has mainly been studied in the STL (single-task learning) setting.

This paper tackles the problem of fairness in MTL.

😎 Contribution

New metrics defined:

  • ARFG: for each task, take the ratio of the MTL model's FPR gap to the STL model's FPR gap, then average across tasks
  • ARE: for each task, take the ratio of the MTL model's error rate to the STL model's error rate, then average across tasks
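A minimal sketch of how these relative metrics could be computed, assuming per-task FPR gaps and error rates for the MTL and STL models are already available (the helper name and sample numbers are my own, not from the paper):

```python
def average_relative_metric(mtl_values, stl_values):
    """Average, over tasks, of the ratio (MTL metric / STL metric).

    A value below 1.0 means the MTL model improves on the STL
    baseline for that metric on average; above 1.0 means it is worse.
    """
    assert len(mtl_values) == len(stl_values)
    ratios = [m / s for m, s in zip(mtl_values, stl_values)]
    return sum(ratios) / len(ratios)

# ARFG: per-task ratios of FPR gaps, averaged
arfg = average_relative_metric(mtl_values=[0.02, 0.06], stl_values=[0.04, 0.05])
# ARE: per-task ratios of error rates, averaged
are = average_relative_metric(mtl_values=[0.10, 0.20], stl_values=[0.10, 0.25])
```

With these toy numbers, task 1's FPR gap halves under MTL while task 2's grows, and the average ratio summarizes the net effect across tasks in a single number.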

Baseline method: Per-Task Fairness Treatment

Like standard MTL training, except that a fairness loss is also optimized for each task.

How is the fairness loss measured?

  • minimizing the correlation between group membership and the predictions over negative examples (make each task's predictions on negative examples as uncorrelated with the sensitive group as possible)
  • kernel-based distribution matching through Maximum Mean Discrepancy (MMD) over negative examples (minimize each task's MMD between subgroups)
  • minimizing the FPR gap directly in the loss (the actual goal)

Intuition: the more similar each task's predictions are across subgroups, the closer their FPRs will be.
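The kernel-based distribution-matching option can be sketched as an RBF-kernel MMD between two subgroups' prediction scores on negative examples; driving it toward zero pulls the two score distributions together, which in turn shrinks the FPR gap. The function names, kernel choice, and bandwidth below are my own assumptions, not the paper's exact formulation:

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel between 1-D score arrays x and y.
    d = x[:, None] - y[None, :]
    return np.exp(-(d ** 2) / (2 * sigma ** 2))

def mmd_squared(scores_a, scores_b, sigma=1.0):
    """Squared Maximum Mean Discrepancy between two score samples.

    Near 0 when the two distributions match; larger when they differ.
    Applied to *negative* examples of each subgroup, so minimizing it
    aligns the groups' false-positive behavior.
    """
    k_aa = rbf_kernel(scores_a, scores_a, sigma).mean()
    k_bb = rbf_kernel(scores_b, scores_b, sigma).mean()
    k_ab = rbf_kernel(scores_a, scores_b, sigma).mean()
    return k_aa + k_bb - 2 * k_ab

# Identical distributions -> MMD near 0; shifted scores -> clearly positive.
rng = np.random.default_rng(0)
a = rng.normal(0.2, 0.1, 200)   # negative-example scores, group A
b = rng.normal(0.2, 0.1, 200)   # group B, same distribution
c = rng.normal(0.6, 0.1, 200)   # group C, shifted scores
```

In practice this quantity would be added to the training loss as a differentiable regularizer (e.g., in TensorFlow or PyTorch) rather than computed post hoc with NumPy.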

Proposed method: Multi-Task-Aware Fairness Treatment (MTA-F)

Essentially the baseline method with task-specific parameter training added.

Comparison of the baseline and MTA-F

The difference is only in the fairness treatment (the "F"): for the shared layers, we want to train with data common to all tasks; for the task-specific layers, we only want to train with task-specific data.

For example, using the table below, suppose we construct shared-negative-data and specific-negative-data for task 1:

  • specific-negative-data: examples that are negative only for task 1 and positive for every other task, i.e. E
  • shared-negative-data: examples negative for task 1 minus the specific-negative-data, i.e. (EFGH − E) = FGH

With these datasets, we can run the training.
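A toy sketch of this data split for a two-task setup, assuming each example has a 0/1 label per task; the example ids follow the region names above, and the helper itself is my own illustration:

```python
def split_negatives(labels, task):
    """Partition the examples that are negative for `task` into
    task-specific negatives (negative for this task only, positive
    for all other tasks) and shared negatives (negative for this
    task and at least one other task).

    `labels` maps example id -> dict of task name -> 0/1 label.
    """
    specific, shared = [], []
    for ex_id, task_labels in labels.items():
        if task_labels[task] == 1:
            continue  # positive for this task: not a negative at all
        others_all_positive = all(
            v == 1 for t, v in task_labels.items() if t != task
        )
        (specific if others_all_positive else shared).append(ex_id)
    return specific, shared

labels = {
    "E": {"task1": 0, "task2": 1},  # negative only for task1
    "F": {"task1": 0, "task2": 0},  # negative for both tasks
    "P": {"task1": 1, "task2": 1},  # positive everywhere
}
spec, shar = split_negatives(labels, "task1")
```

Here `spec` contains only "E" (the specific-negative-data for task 1) and `shar` contains "F" (a shared negative), mirroring the E vs. FGH split described above.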

The detailed algorithm is shown below.

😮 Background:

Early on, deep learning models were trained for a single task, but business requirements and data considerations pushed the field toward multi-task learning:

  • business (given an image, we want to know not only whether the person is male or female, but also whether they are happy)
  • data (we have task A and task B, plus a large dataset in which some examples may lack labels for task A or for task B; we can still feed this data in and optimize task A and task B jointly)
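One common way to handle the partially-labeled data described above is a per-task label mask: each task's loss is computed only on examples that actually carry a label for that task. A minimal NumPy sketch (a general MTL practice, not the paper's specific recipe):

```python
import numpy as np

def masked_log_loss(pred, label, mask):
    """Mean binary log loss over the examples where mask == 1.

    Entries with mask == 0 are ignored entirely, so one batch can mix
    examples labeled for task A, for task B, or for both.
    """
    eps = 1e-7
    p = np.clip(pred, eps, 1 - eps)
    per_example = -(label * np.log(p) + (1 - label) * np.log(1 - p))
    return (per_example * mask).sum() / mask.sum()

pred_a  = np.array([0.9, 0.2, 0.5])
label_a = np.array([1.0, 0.0, 0.0])   # third label is just a placeholder
mask_a  = np.array([1.0, 1.0, 0.0])   # third example is unlabeled for task A
loss_a = masked_log_loss(pred_a, label_a, mask_a)
```

Because the masked entry contributes nothing to the sum, the placeholder label for the unlabeled example never influences training.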

Fairness metrics

Two metrics are commonly used:

  • all subgroups receive the same proportion of positive outcomes: the demographic-parity case
  • equal opportunity and equalized odds: equal TPR (true positive rate) and FPR (false positive rate) across different subgroups; more practical, and the metric this paper uses
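The FPR side of the equalized-odds condition can be sketched as: compute each subgroup's false positive rate over its true negatives and take the absolute difference; a gap of 0 means the condition holds. The helpers and toy labels below are my own illustration:

```python
def fpr(preds, labels):
    # False positive rate: fraction of true negatives (label 0)
    # that the model predicted positive (pred 1).
    negatives = [p for p, y in zip(preds, labels) if y == 0]
    return sum(negatives) / len(negatives)

def fpr_gap(preds_a, labels_a, preds_b, labels_b):
    """Absolute FPR difference between two subgroups."""
    return abs(fpr(preds_a, labels_a) - fpr(preds_b, labels_b))

# Group A: 1 of 2 true negatives flagged positive; group B: 0 of 2.
gap = fpr_gap([1, 0, 1, 0], [0, 0, 1, 1],
              [0, 0, 1, 1], [0, 0, 1, 1])
```

This per-task gap is exactly the quantity that the ARFG metric above compares between the MTL and STL models.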

Fairer representation learning

  • learning representations that carry less information about sensitive attributes, so that downstream models built on them are fairer

Fairness mitigation

  • single-task learning setting (pre-processing the data embeddings for downstream jobs, post-processing the model's predictions)
  • intervening in the model training process has also been popular, including adding fairness constraints or regularization

Multi-task learning

MTL typically shares a subset of parameters across tasks.

pros:

  • exploits task relatedness via inductive bias learning (learning a shared representation across related tasks is beneficial for harder tasks or tasks with limited training examples)
  • forces tasks to share model capacity (regularization) -> less overfitting, better generalization
  • provides a compact and efficient form of modeling that enables training and serving multiple prediction quantities in large-scale systems -> more scalable at serving time

cons:

  • one task may improve while another degrades
  • existing methods address this by reducing task-training conflicts and improving the Pareto frontier

Fairness in multi-task learning

As MTL has grown in popularity, fairness in MTL has attracted more and more research:

  • fairness in multi-task regression models, using a rank-based non-parametric independence test to improve fairness in ranking
  • multi-task learning enhanced with fairness constraints to jointly learn classifiers that leverage information between sensitive groups

Fairness-accuracy trade-off and Pareto fairness

Two kinds of trade-offs:

  • fairness and accuracy trade off against each other within a single task
  • accuracies trade off against each other across tasks

A few key questions:

  • How does inductive transfer in multi-task learning implicitly impact fairness?
  • How do we measure the fairness-accuracy trade-off for multi-task learning?
  • Can we achieve better Pareto efficiency in fairness across multiple tasks by exploiting the task relatedness and shared architecture specific to multi-task learning?

Takeaways

MTL keeps gaining popularity, and many large companies have adopted it, not only in NLP and CV: traditional recommender systems also consider MTL (click, comment, skip rate), as do news recommenders (engagement vs. freshness). From a business standpoint, there are many different objectives to meet.

Fairness is also receiving more and more attention: we want the model to learn each subgroup's metrics as well as possible, so it is not overly biased toward certain groups.

Both trade-offs are challenging. The authors combine the two into a new problem, then propose metrics and a method to solve it. This is a modeling paper (it does not touch much on serving or production), and it makes a good template for learning modeling.

Closing thoughts

I got a lot out of this paper, mainly background on MTL and fairness, two areas that were new to me. Machine learning evolves remarkably fast; every few years the world looks different. How to measure fairness is also a very interesting question in itself.

🙃 Other related blogs:

KDD 21':Learning to Embed Categorical Features without Embedding Tables for Recommendation (google)

KDD 19': Sampling-bias-corrected neural modeling for large corpus item recommendations

KDD 18': Notification Volume Control and Optimization System at Pinterest

CVPR19' Complete the Look: Scene-based Complementary Product Recommendation

NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence

NIPS’2017: Attention Is All You Need (Transformer)

KDD’19: Learning a Unified Embedding for Visual Search at Pinterest

BMVC19' Classification is a Strong Baseline for Deep Metric Learning

KDD’18: Graph Convolutional Neural Networks for Web-Scale Recommender Systems

🤩 Conference

ICCV: International Conference on Computer Vision

http://iccv2019.thecvf.com/submission/timeline

CVPR: Conference on Computer Vision and Pattern Recognition

http://cvpr2019.thecvf.com/

KDD 2020

https://www.kdd.org/kdd2020/

Top Conference Paper Challenge:

https://medium.com/@arthurlee_73761/top-conference-paper-challenge-2d7ca24115c6

My Website:

https://light0617.github.io/#/


Written by Arthur Lee, a machine learning engineer in the Bay Area, United States.