Fair Group Shapley: Partition-Invariant and Computationally Efficient Group Data Valuation

25 Apr, 2025, 4-5 pm, GHC 8102

Speaker: Ziqi Liu

Abstract: Group data valuation is increasingly important in real-world applications where data comes from different contributors (such as copyright owners, data providers, or institutions) and benefits such as model revenue need to be allocated fairly. Existing methods often adopt a “group-as-individual” approach that treats each group as a single unit, but we find that this leads to systematic biases, especially under unequal or strategic partitioning. In this talk, I’ll present a new formulation of the group Shapley value that defines a group’s value as the sum of its members’ individual contributions. The formulation preserves key axioms, including a new partition invariance axiom that ensures consistency across groupings. We also develop a computationally efficient approximation algorithm whose runtime is nearly linear in the dataset size. Through synthetic experiments and real-world applications such as copyright attribution in generative AI, we demonstrate that our method improves both fairness and computational efficiency over existing approaches. This is joint work with Kiljae Lee, Yuan Zhang, and Weijing Tang.
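
To make the contrast in the abstract concrete, here is a minimal sketch on a toy problem. It is not the speakers' algorithm: it computes exact Shapley values by brute-force enumeration (not the efficient approximation mentioned above), and the utility function, the `LABELS` data, and all function names are hypothetical choices made for illustration. The utility counts the distinct labels a set of points covers, so duplicate points are redundant.

```python
from itertools import permutations
from math import factorial


def shapley(players, utility):
    """Exact Shapley values: average marginal contribution over all orderings."""
    players = list(players)
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = []
        before = utility(coalition)
        for p in order:
            coalition.append(p)
            after = utility(coalition)
            totals[p] += after - before
            before = after
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


# Hypothetical toy data: points 0, 1, 2 are duplicates (label "a"); point 3 is unique.
LABELS = {0: "a", 1: "a", 2: "a", 3: "b"}


def point_utility(points):
    """Toy utility: number of distinct labels covered by the given points."""
    return len({LABELS[i] for i in points})


def sum_of_individuals(partition):
    """Group value = sum of the individual Shapley values of its members."""
    phi = shapley(LABELS.keys(), point_utility)
    return {name: sum(phi[i] for i in members) for name, members in partition.items()}


def group_as_individual(partition):
    """Group value = Shapley value when each group is treated as one player."""
    def group_utility(names):
        return point_utility([i for name in names for i in partition[name]])
    return shapley(partition.keys(), group_utility)


# Group A holds point 0 in both partitions; only the grouping of the others changes.
fine = {"A": [0], "B": [1], "C": [2], "D": [3]}
coarse = {"A": [0], "BC": [1, 2], "D": [3]}

for name, partition in [("fine", fine), ("coarse", coarse)]:
    print(name,
          "sum-of-individuals A =", round(sum_of_individuals(partition)["A"], 3),
          "| group-as-individual A =", round(group_as_individual(partition)["A"], 3))
```

In this toy setting, group A's sum-of-individuals value is 1/3 under both partitions, while its group-as-individual value moves from 1/3 to 1/2 when the two other duplicate holders are merged into one group, even though A's own data is unchanged.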