Descriptors:
Regression coefficients, Bayesian estimation, shrinkage methods, lasso, and the ridge method for estimating regression coefficients
Persian abstract:
In many recent studies, regression models often involve a large number of predictor variables. In such problems, elementary methods such as least squares are not effective, and penalized regression must be used instead. Among these methods one can mention ridge, lasso, and elastic net regression; the lasso method, in addition to performing variable selection, also estimates the parameters of the regression model. Suppose the relationship between the dependent variable Y_i and the predictor variables X_{ij} is of the following form:

y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}, \qquad i = 1, \dots, n,

where β0 is the intercept and the βj, j = 1, ..., p, are the regression coefficients of the model, so there are k = p + 1 coefficients in total. In this regression model, if the number of observations is greater than the number of predictor variables (n ≥ p), the best-fitting line in (p+1)-dimensional space can be obtained using the least squares method presented by Frank and Friedman (1993) [30]. But in cases where the number of predictor variables (independent variables) exceeds the number of collected samples (a huge amount of modern data is of this kind; for example, one can point to the number of variables used to identify the genes in the human body), the least squares method is no longer effective.
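For reference, the standard penalized least squares criteria behind the three methods named above are written out here (they are not given in the abstract itself; λ, λ1, λ2 ≥ 0 denote tuning parameters):

\hat{\beta}_{\mathrm{ridge}} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2,

\hat{\beta}_{\mathrm{lasso}} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|,

\hat{\beta}_{\mathrm{enet}} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2.

The L1 penalty in the lasso and elastic net criteria is what allows some coefficients to be set exactly to zero, i.e. to perform variable selection.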
English abstract:
In many recent studies, regression models involve a large number of predictor variables. In such problems, the use of elementary methods such as least squares is not effective, and penalized regression should be used instead. Among these methods we can mention ridge, lasso, and elastic net regression; the lasso method, in addition to selecting variables, estimates the parameters of the regression model. Assume that the relationship between the dependent variable Y_i and the predictor variables X_{ij} is as follows:
y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}, \qquad i = 1, \dots, n,
where β0 is the intercept and βj, j = 1, ..., p, are the regression coefficients of the model. In this regression model, if the number of observations is greater than the number of predictor variables (n ≥ p), the best-fitting line in (p+1)-dimensional space can be obtained using the least squares method presented by Frank and Friedman (1993).
But in cases where the number of predictor variables (independent variables) is greater than the number of collected samples, as is the case for a vast amount of modern data (for example, the number of variables used to identify genes in the human body), the least squares method no longer works. Ridge and lasso regression models, which are also known as regularization methods, are widely used in machine learning and inverse problems; they introduce additional information to solve ill-posed problems and/or perform feature selection.
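As a brief illustration of this regime (a minimal sketch, not part of the thesis; it assumes NumPy and scikit-learn, and the variable names and penalty values are purely illustrative), the lasso selects a sparse subset of predictors even when p > n, whereas ridge only shrinks the coefficients:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Simulated high-dimensional data: n = 50 observations, p = 200 predictors,
# with only the first 5 predictors truly influencing the response.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = 1.0 + X @ beta_true + rng.normal(scale=0.5, size=n)

# Lasso (L1 penalty): shrinks coefficients and sets many of them exactly to zero,
# performing variable selection even though p > n.
lasso = Lasso(alpha=0.1).fit(X, y)
print("nonzero lasso coefficients:", np.sum(lasso.coef_ != 0))

# Ridge (L2 penalty): shrinks coefficients toward zero but keeps all of them nonzero.
ridge = Ridge(alpha=1.0).fit(X, y)
print("nonzero ridge coefficients:", np.sum(ridge.coef_ != 0))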
The ridge and lasso estimates of the linear regression parameters can be interpreted as Bayesian posterior estimates when the regression parameters are given Normal and independent Laplace (i.e., double-exponential) priors, respectively.
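In its standard form, this correspondence reads as follows (a sketch in our own notation, with the error variance σ² treated as fixed; it is not quoted from the thesis). Under the Gaussian likelihood y_i ~ N(x_i^T β, σ²), Normal priors β_j ~ N(0, τ²) give the posterior mode

\hat{\beta}_{\mathrm{MAP}} = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - x_i^{\top}\beta)^2 + \frac{\sigma^2}{\tau^2} \sum_{j=1}^{p} \beta_j^2,

i.e. the ridge estimate with λ = σ²/τ², while independent Laplace priors π(β_j) ∝ exp(-|β_j|/b) give

\hat{\beta}_{\mathrm{MAP}} = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - x_i^{\top}\beta)^2 + \frac{2\sigma^2}{b} \sum_{j=1}^{p} |\beta_j|,

i.e. the lasso estimate with λ = 2σ²/b.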
A significant challenge with these regularization approaches is that they assume the data are normally distributed, which makes them not robust to model misspecification. A Bayesian approach for ridge and lasso
models based on empirical likelihood is proposed. This method is semiparametric because it combines a nonparametric
model and a parametric model. Hence, problems with model misspecification are avoided. Under the Bayesian
empirical likelihood approach, the resulting posterior distribution lacks a closed form and has a nonconvex support,
which makes the implementation of traditional Markov chain Monte Carlo (MCMC) methods such as Gibbs sampling
and Metropolis–Hastings very challenging. To overcome the nonconvex optimization and nonconvergence problems, a tailored Metropolis–Hastings approach is implemented. Asymptotic Bayesian credible intervals are also derived.
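To illustrate the general idea, the following is a generic random-walk Metropolis–Hastings sketch for a Bayesian-lasso-style log-posterior under a Gaussian likelihood; it is not the tailored, empirical-likelihood-based sampler of the thesis, and all names and tuning values are illustrative.

import numpy as np

def log_posterior(beta, X, y, lam=1.0, sigma2=1.0):
    # Gaussian likelihood plus independent Laplace priors on the coefficients
    # (a Bayesian-lasso-style target, used here only for illustration).
    resid = y - X @ beta
    return -0.5 * resid @ resid / sigma2 - lam * np.sum(np.abs(beta))

def metropolis_hastings(X, y, n_iter=5000, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    beta = np.zeros(p)
    current = log_posterior(beta, X, y)
    samples = np.empty((n_iter, p))
    for t in range(n_iter):
        proposal = beta + step * rng.normal(size=p)      # random-walk proposal
        proposed = log_posterior(proposal, X, y)
        if np.log(rng.uniform()) < proposed - current:   # accept/reject step
            beta, current = proposal, proposed
        samples[t] = beta
    return samples

# Usage on simulated data: posterior means and simple percentile credible intervals.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(size=40)
draws = metropolis_hastings(X, y)[1000:]                 # drop burn-in draws
print("posterior means:", draws.mean(axis=0))
print("95% credible intervals:", np.percentile(draws, [2.5, 97.5], axis=0).T)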