Just Wing It: Optimal Estimation of Missing Mass in a Markovian Sequence




arXiv:2404.05819v1 Announce Type: new
Abstract: We study the problem of estimating the stationary mass (also called the unigram mass) that is missing from a single trajectory of a discrete-time, ergodic Markov chain. This problem has several applications; for example, estimating the stationary missing mass is critical for accurately smoothing probability estimates in sequence models. While the classical Good–Turing estimator from the 1950s has appealing properties for i.i.d. data, it is known to be biased in the Markov setting, and other heuristic estimators do not come equipped with guarantees. Operating in the general setting in which the size of the state space may be much larger than the length $n$ of the trajectory, we develop a linear-runtime estimator called \emph{Windowed Good–Turing} (\textsc{WingIt}) and show that its risk decays as $\widetilde{\mathcal{O}}(\mathsf{T_{mix}}/n)$, where $\mathsf{T_{mix}}$ denotes the mixing time of the chain in total variation distance. Notably, this rate is independent of the size of the state space and minimax-optimal up to a logarithmic factor in $n / \mathsf{T_{mix}}$. We also present a bound on the variance of the missing mass random variable, which may be of independent interest. We extend our estimator to approximate the stationary mass placed on elements occurring with small frequency in $X^n$. Finally, we demonstrate the efficacy of our estimators both in simulations on canonical chains and on sequences built from a popular natural language corpus.
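For readers unfamiliar with the baseline, the classical Good–Turing estimator referenced in the abstract estimates the missing mass as the fraction of the sample made up of symbols seen exactly once. The following is a minimal Python sketch of that i.i.d. baseline only; it is not the WingIt estimator, and the function names and the toy stationary distribution are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def good_turing_missing_mass(sequence):
    """Classical Good-Turing estimate of the missing mass.

    Returns (number of distinct symbols seen exactly once) / n.
    This is the i.i.d. baseline, not the windowed variant from the paper.
    """
    n = len(sequence)
    if n == 0:
        raise ValueError("sequence must be non-empty")
    counts = Counter(sequence)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


def true_missing_mass(sequence, stationary):
    """Stationary mass placed on states that never appear in the sequence.

    `stationary` is a hypothetical dict mapping state -> stationary probability.
    """
    seen = set(sequence)
    return sum(p for state, p in stationary.items() if state not in seen)


if __name__ == "__main__":
    sample = list("abracadabra")  # 'c' and 'd' each occur once, n = 11
    print(good_turing_missing_mass(sample))

    # Toy stationary distribution (an assumption for illustration only).
    pi = {"a": 0.5, "b": 0.3, "c": 0.2}
    print(true_missing_mass(["a", "b", "a"], pi))
```

On a Markov trajectory, consecutive samples are dependent, which is why the abstract notes that this plain estimator is biased in that setting.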



