Information Leakage Detection through Approximate Bayes-optimal Prediction. (arXiv:2401.14283v1 [stat.ML])


In today’s data-driven world, the proliferation of publicly accessible
information intensifies the challenge of information leakage (IL), raising
security concerns. IL involves unintentionally exposing secret (sensitive)
information to unauthorized parties through systems’ observable information.
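
To make the definition concrete, leakage is commonly quantified as the mutual information I(S; O) between a secret S and an observable O, measured in bits. The sketch below computes it exactly for a small, fully known joint distribution. The noisy-bit "system" and both helper names are hypothetical illustrations, not part of the paper.

```python
import math
from collections import defaultdict

def mutual_information_bits(joint):
    """Exact I(S; O) in bits for a discrete joint distribution.

    `joint` maps (secret, observable) pairs to probabilities that sum to 1.
    """
    p_s = defaultdict(float)  # marginal distribution of the secret S
    p_o = defaultdict(float)  # marginal distribution of the observable O
    for (s, o), p in joint.items():
        p_s[s] += p
        p_o[o] += p
    return sum(
        p * math.log2(p / (p_s[s] * p_o[o]))
        for (s, o), p in joint.items()
        if p > 0  # zero-probability cells contribute nothing
    )

def noisy_bit_system(flip_prob):
    """Hypothetical system: a uniform secret bit is observed after being
    flipped with probability `flip_prob`."""
    return {
        (s, o): 0.5 * (flip_prob if s != o else 1.0 - flip_prob)
        for s in (0, 1)
        for o in (0, 1)
    }

for flip in (0.0, 0.1, 0.5):
    print(flip, round(mutual_information_bits(noisy_bit_system(flip)), 3))
```

A flip probability of 0.5 makes the observable independent of the secret (0 bits leaked), a flip probability of 0 leaks the full bit, and 0.1 leaks 1 - h(0.1) ≈ 0.531 bits, where h is the binary entropy function.
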
Conventional statistical approaches, which estimate mutual information (MI)
between observable and secret information to detect IL, face challenges
such as the curse of dimensionality, convergence, computational complexity, and
MI misestimation. Furthermore, emerging supervised machine learning (ML)
methods, though effective, are limited to binary system-sensitive information
and lack a comprehensive theoretical framework. To address these limitations,
we establish a theoretical framework using statistical learning theory and
information theory to accurately quantify and detect IL. We demonstrate that MI
can be accurately estimated by approximating the log-loss and accuracy of the
Bayes predictor. As the Bayes predictor is typically unknown in practice, we
propose to approximate it with the help of automated machine learning (AutoML).
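
The log-loss argument rests on the identity I(X;Y) = H(Y) - H(Y|X), together with the fact that the expected log-loss (in bits) of the Bayes-optimal probabilistic predictor equals H(Y|X). The sketch below turns a predictor's held-out class probabilities into an MI estimate and, for accuracy, uses Fano's inequality to obtain a lower bound on MI. Both are illustrations under stated assumptions: the Fano route is one standard way to link accuracy to MI and may not match the paper's exact estimator, and `fit_frequency_predictor` is a hypothetical, stdlib-only stand-in for the AutoML step that only works for a small discrete observable.

```python
import math
from collections import Counter, defaultdict

def entropy_bits(labels):
    """Empirical Shannon entropy H(Y) of the secret labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def fit_frequency_predictor(xs, ys, n_classes, alpha=1.0):
    """Hypothetical stand-in for an AutoML-selected model: a Laplace-smoothed
    estimate of p(y | x) for a discrete observable x.
    Assumes labels are integers 0..n_classes-1."""
    counts = defaultdict(lambda: [0] * n_classes)
    for x, y in zip(xs, ys):
        counts[x][y] += 1

    def predict_proba(x):
        row = counts.get(x, [0] * n_classes)  # unseen x -> uniform prediction
        total = sum(row) + alpha * n_classes
        return [(c + alpha) / total for c in row]

    return predict_proba

def mi_from_log_loss(labels, probs):
    """Estimate I(X;Y) as H(Y) minus the mean log-loss in bits, clipped at 0.

    For the Bayes predictor the expected log-loss equals H(Y|X). On held-out
    data an approximate predictor tends to have a higher log-loss, so the
    estimate tends to be conservative.
    """
    n = len(labels)
    log_loss = -sum(math.log2(max(p[y], 1e-15)) for y, p in zip(labels, probs)) / n
    return max(entropy_bits(labels) - log_loss, 0.0)

def mi_lower_bound_from_accuracy(labels, accuracy, n_classes):
    """Fano's inequality: I(X;Y) >= H(Y) - h(Pe) - Pe * log2(K - 1), clipped at 0."""
    pe = 1.0 - accuracy
    if pe <= 0.0 or pe >= 1.0:
        h_pe = 0.0
    else:
        h_pe = -(pe * math.log2(pe) + (1 - pe) * math.log2(1 - pe))
    return max(entropy_bits(labels) - h_pe - pe * math.log2(n_classes - 1), 0.0)

# Usage: the observable equals the secret, so close to 1 bit leaks.
train_x = [0, 0, 1, 1] * 25
train_y = list(train_x)
test_x = [0, 1] * 20
test_y = list(test_x)
predict = fit_frequency_predictor(train_x, train_y, n_classes=2)
probs = [predict(x) for x in test_x]
print(round(mi_from_log_loss(test_y, probs), 3))
```

The gap between the two functions mirrors the abstract's point: log-loss uses the full predicted distribution, whereas accuracy only yields a bound through Fano's inequality.
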
First, we compare our MI estimation approaches against existing baselines, using
synthetic datasets generated from the multivariate normal (MVN) distribution
with known MI. Second, we introduce a cut-off technique using one-sided
statistical tests to detect IL, employing the Holm-Bonferroni correction to
increase confidence in detection decisions. Our study evaluates IL detection
performance on real-world datasets, highlighting the effectiveness of the
Bayes predictor’s log-loss estimation, and finds that our proposed method
effectively estimates MI on synthetic datasets and thus detects IL accurately.
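
The detection step can be sketched as follows, under assumptions that go beyond the abstract: each MI estimator (or model) yields per-fold MI estimates, each estimator gets a one-sided test of "MI > 0", the resulting p-values are adjusted with the Holm-Bonferroni step-down procedure, and IL is flagged when at least `cutoff` hypotheses are rejected. The exact sign test here is a stdlib-only substitute for whichever one-sided test the paper uses, and `detect_leakage` and its `cutoff` parameter are hypothetical.

```python
import math

def sign_test_greater(values):
    """One-sided exact sign test. H0: the estimates are not systematically > 0.
    Zeros are discarded; returns P(Binomial(n, 0.5) >= #positives)."""
    nonzero = [v for v in values if v != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    k = sum(1 for v in nonzero if v > 0)
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n

def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure; returns reject flags in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):  # alpha/m, alpha/(m-1), ...
            reject[idx] = True
        else:
            break  # once one hypothesis is retained, all larger p-values are too
    return reject

def detect_leakage(estimates_per_method, alpha=0.05, cutoff=1):
    """Hypothetical rule: flag IL if at least `cutoff` corrected tests reject."""
    p_values = [sign_test_greater(v) for v in estimates_per_method]
    rejections = holm_bonferroni(p_values, alpha)
    return sum(rejections) >= cutoff, p_values, rejections

# Usage with made-up per-fold MI estimates from three hypothetical estimators.
folds = [[0.2] * 10, [0.1] * 10, [0.1, -0.1] * 5]
leak, p_values, rejections = detect_leakage(folds, alpha=0.05, cutoff=2)
print(leak, rejections)
```

Holm's procedure controls the family-wise error rate like a plain Bonferroni correction while rejecting at least as many hypotheses, which is why it suits combining several tests into one detection decision.
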
