Alternative Title: Statistical Learning Theory of Parameter-Restricted Singular Models
Note (General): Statistical models used in machine learning are called learning machines. Learning machines are widely applied to predict unknown events and to discover knowledge in many fields, and machine learning has grown rapidly over the last several decades. Learning machines are used for statistical learning and inference and usually have hierarchical structures, which are effective for generalization to the real world. Statistical learning theory clarifies the generalization performance of learning machines.

Singular learning theory is a mathematical foundation for statistical inference with singular models. Typical hierarchical models, such as neural networks, tree and forest models, mixture models, matrix factorizations, and topic models, are statistically singular, since the map from a parameter to a probability density function is not one-to-one. Clarifying the generalization behavior of singular models is important for estimating sufficient sample sizes, designing models, and tuning hyperparameters. However, conventional statistical theory cannot be applied to these models, because their likelihoods cannot be approximated by any normal distribution.

Singular learning theory provides a general view of this problem: birational invariants of an analytic set (a.k.a. an algebraic variety) determine the generalization error. This set is defined as the zero set of the Kullback-Leibler (KL) divergence between the data-generating distribution and the model. Algebraic structures of statistical models are essential in singular learning theory; thus, it can be interpreted as an intersection of algebraic statistics and statistical learning theory.

One such invariant is the real log canonical threshold (RLCT). The RLCT is the negated maximum pole of a zeta function defined by an integral of the KL divergence, and it is determined for a concrete model by resolution of singularities. Indeed, algebraic statisticians and machine learning researchers have derived exact values or upper bounds of the RLCTs of several singular models. The theoretical value of the RLCT is useful for statistical model selection, such as the sBIC proposed by Drton and Plummer; in addition, Nagata proposed a tuning method for exchange Monte Carlo based on RLCTs.

On the other hand, from a practical point of view, the parameter region of a model is often restricted to improve interpretability. Non-negative matrix factorization (NMF) and latent Dirichlet allocation (LDA) are well-known examples of such parameter-restricted singular models. In general, these constraints change the generalization error. However, their quantitative effect has not yet been clarified for each singular model and condition, because the singularities of the analytic set above are also changed by the restriction of the parameter region.

In this dissertation, as a foundation for a singular learning theory of parameter-restricted statistical models, we theoretically study the asymptotic behavior of the Bayesian generalization error in NMF and LDA, two typical singular models whose parameter regions are constrained. For NMF, we derive an upper bound of the RLCT and a lower bound of the variational approximation error. For LDA, we prove that its RLCT is equal to that of matrix factorization with a simplex restriction and clarify the exact asymptotic form of the generalization error, i.e., we determine the exact value of the RLCT of LDA. These results give quantitative differences in the generalization error from matrix factorization whose parameter space is not restricted.
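For reference, the quantities named in the abstract can be written down in the standard notation of singular learning theory (following Watanabe); this is a sketch under common conventions, and the symbols q, p, \varphi, W, \lambda, and G_n are ours rather than taken from the record. With K(w) the KL divergence between the data-generating distribution q and the model p(\cdot\mid w), and \varphi a prior on the parameter set W,

\[
  K(w) = \int q(x)\,\log\frac{q(x)}{p(x \mid w)}\,dx,
  \qquad
  \zeta(z) = \int_{W} K(w)^{z}\,\varphi(w)\,dw.
\]

The zeta function \zeta(z) extends meromorphically to the complex plane; writing its maximum pole as z = -\lambda defines the RLCT \lambda, and the expected Bayesian generalization error then behaves as

\[
  \mathbb{E}[G_n] = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right)
  \qquad (n \to \infty),
\]

which is why determining \lambda for a concrete model yields the exact asymptotic form of the generalization error.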
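Likewise, the parameter restrictions in question can be illustrated schematically (a sketch under one common convention; the matrix sizes M, H, N and the factor symbols A, B are illustrative assumptions, not from the record). NMF constrains the factors of a matrix factorization entrywise, while the matrix factorization underlying LDA constrains the columns of the factors to probability simplexes:

\[
  \text{NMF:}\quad X \approx AB,
  \qquad A \in \mathbb{R}_{\ge 0}^{M \times H},\ \ B \in \mathbb{R}_{\ge 0}^{H \times N};
\]
\[
  \text{LDA:}\quad X = AB,
  \qquad a_{ij} \ge 0,\ \ \textstyle\sum_{i} a_{ij} = 1
  \ \text{ (and similarly for } B\text{)}.
\]

Both restrictions shrink the parameter region of the unrestricted factorization X = AB, which is how they alter the singularities of the zero set of the KL divergence and, through the RLCT, the generalization error.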
identifier: oai:t2r2.star.titech.ac.jp:50574456
Collection (particular): National Diet Library Digital Collections > Digitized Materials > Doctoral Dissertations
Date Accepted (W3CDTF): 2022-07-05T02:30:21+09:00
Data Provider (Database): National Diet Library : National Diet Library Digital Collections