Segmentation of sequences in Markov chains using penalized maximum likelihood
Change Point Detection. Penalized maximum likelihood. Sequence segmentation
The sequence segmentation problem aims at partitioning a sequence or a set of sequences into a finite number of distinct segments that are as homogeneous as possible. In this paper we consider the problem of segmenting a set of random sequences, with values in a finite alphabet $E$, into a finite number of independent blocks. Under the assumption that the data follows a Markov chain, the problem consists in estimating the number and position of independence (or change) points. We further assume that we have $n$ independent samples of size $m$, obtained from the concatenation of $k+1$ blocks of length $c_{j+1} - c_j$, each of the blocks being generated from the transition probability matrix $P_j$, with $j \in 0:k$. We define the set of true cut points by $C^{} = \{c^{}_1,\ldots,c^{}_k\}$, where these points represent the block change in the sequence. For this, we propose to use the penalized maximum likelihood criterion in order to simultaneously infer the number and position of the change points. The main result of our work is the strong consistency of the set of estimators of the cutoff points, this is, $\widehat{C} = C^$ for $n$ sufficiently large.