[2211.12620] Guarantees and Pitfalls of Threshold-based Auto-labeling


Download a PDF of the paper titled Guarantees and Pitfalls of Threshold-based Auto-labeling, by Harit Vishwakarma and three other authors


Abstract: Creating large-scale high-quality labeled datasets is a major bottleneck in supervised machine learning workflows. Threshold-based auto-labeling (TBAL), where validation data obtained from humans is used to find a confidence threshold above which the data is machine-labeled, reduces reliance on manual annotation. TBAL is emerging as a widely-used solution in practice. Given the long shelf-life and diverse usage of the resulting datasets, understanding when the data obtained by such auto-labeling systems can be relied on is crucial. This is the first work to analyze TBAL systems and derive sample complexity bounds on the amount of human-labeled validation data required for guaranteeing the quality of machine-labeled data. Our results provide two crucial insights. First, reasonable chunks of unlabeled data can be automatically and accurately labeled by seemingly bad models. Second, a hidden downside of TBAL systems is potentially prohibitive validation data usage. Together, these insights describe the promise and pitfalls of using such systems. We validate our theoretical guarantees with extensive experiments on synthetic and real datasets.
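
To make the thresholding idea in the abstract concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the function names `find_threshold` and `auto_label`, the `max_error` and `min_count` parameters, and the toy numbers are all assumptions made for illustration. The sketch picks the lowest confidence threshold at which the raw empirical error on human-labeled validation points stays within a target, then machine-labels only the unlabeled points at or above that threshold. It applies no statistical correction for the size of the validation set, so it does not carry the kind of guarantee the paper analyzes.

```python
from typing import List, Optional, Sequence, Tuple


def find_threshold(
    val_conf: Sequence[float],
    val_correct: Sequence[bool],
    max_error: float,
    min_count: int = 1,
) -> Optional[float]:
    """Return the smallest threshold t such that validation points with
    confidence >= t have empirical error <= max_error.

    Hypothetical helper: uses the raw empirical error only. Returns None
    when no candidate threshold meets the target.
    """
    # Candidate thresholds are the observed confidences, lowest first,
    # so the first one that qualifies auto-labels the most data.
    for t in sorted(set(val_conf)):
        kept = [ok for c, ok in zip(val_conf, val_correct) if c >= t]
        if len(kept) < min_count:
            break  # too few validation points left to estimate the error
        errors = sum(1 for ok in kept if not ok)
        if errors <= max_error * len(kept):
            return t
    return None


def auto_label(
    conf: Sequence[float],
    preds: Sequence[int],
    threshold: float,
) -> List[Tuple[int, int]]:
    """Return (index, predicted_label) for points with confidence >= threshold.

    Points below the threshold are left out; in a TBAL-style workflow they
    would be sent to human annotators.
    """
    return [(i, p) for i, (c, p) in enumerate(zip(conf, preds)) if c >= threshold]


# Toy validation set: model confidence and whether the model was correct.
val_conf = [0.55, 0.60, 0.70, 0.80, 0.90, 0.95]
val_correct = [False, True, False, True, True, True]

threshold = find_threshold(val_conf, val_correct, max_error=0.1)

# Toy unlabeled pool: model confidence and predicted class.
machine_labeled = auto_label([0.50, 0.85, 0.99], [1, 0, 1], threshold)
```

With only six validation points, the error estimate above the chosen threshold rests on three examples, which hints at why the amount of validation data matters for trusting the machine labels.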

Submission history

From: Harit Vishwakarma
Tue, 22 Nov 2022 22:53:17 UTC (12,869 KB)
Thu, 22 Feb 2024 02:47:53 UTC (13,643 KB)


