Polynomial regression in Python

I looked into how to do polynomial regression in Python. Polynomial regression extends linear regression: instead of fitting a straight line to the data, you fit a polynomial curve.
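To make the connection to linear regression concrete: a polynomial model is still linear in its coefficients, so it can be fitted by ordinary least squares on the powers of x. Below is a minimal sketch with illustrative inputs of my own (not taken from any of the libraries discussed next):

```python
import numpy as np

# a polynomial model is linear in its coefficients, so ordinary
# least squares on the powers of x is enough to fit it
x = np.arange(6, dtype=float)   # illustrative inputs
y = 1.0 + 2.0 * x ** 2          # exact quadratic, for demonstration

# design matrix with columns [x^2, x^1, x^0] (np.vander: highest power first)
X = np.vander(x, 3)

coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)   # -> approximately [2., 0., 1.]
```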
Libraries that can compute polynomial regression
- numpy.polyfit — NumPy v1.15 Manual
- Orthogonal distance regression (scipy.odr) — SciPy v1.1.0 Reference Guide
  (scipy.odr has no single ready-made function for this, but it can apparently be used for polynomial regression if you put its pieces to work.)
- sklearn.preprocessing.PolynomialFeatures — scikit-learn 0.19.2 documentation
Running the actual calculation
The calculations below use the population data from 統計局ホームページ/日本の2-1 人口の推移と将来人口（エクセル:42KB） (the Statistics Bureau's table of Japan's past and projected population).
- Using numpy.polyfit

First, let's try numpy.polyfit:
```python
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt

x = np.array([1920, 1925, 1930, 1935, 1940, 1945, 1950, 1955, 1960, 1965,
              1970, 1975, 1980, 1985, 1990, 1995, 2000, 2005, 2007, 2008,
              2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2020, 2025,
              2030, 2035, 2045, 2055, 2065, 2075, 2085, 2095])
y = np.array([55963, 59737, 64450, 69254, 71933, 72147, 84115, 90077,
              94302, 99209, 104665, 111940, 117060, 121049, 123611, 125570,
              126926, 127768, 128033, 128084, 128032, 128057, 127834, 127593,
              127414, 127237, 127095, 126933, 125325, 122544, 119125, 115216,
              106421, 97441, 88077, 78564, 70381, 63125])

plt.scatter(x, y)

# fit with a 4th-degree polynomial using `np.polyfit`
plt.plot(x, np.poly1d(np.polyfit(x, y, 4))(x), label='d=4', color="green")

# fit with a 2nd-degree polynomial using `np.polyfit`
plt.plot(x, np.poly1d(np.polyfit(x, y, 2))(x), label='d=2', color="red")
```
```
array([ 9.61741508e-04, -7.71943642e+00,  2.32189763e+04, -3.10181359e+07,
        1.55281205e+10])
```
These are the coefficients of the 4th-degree fit, i.e. the array returned by np.polyfit(x, y, 4), with the highest-order term first.
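As a side note on using the result: the fitted polynomial can be evaluated at arbitrary years either by wrapping the coefficients in np.poly1d (as the plotting calls above already do) or with np.polyval. A small sketch, assuming the x and y arrays from the cell above are still in scope:

```python
# assumes x and y (the population arrays) from the cell above are in scope
import numpy as np

coeffs = np.polyfit(x, y, 4)     # highest-order coefficient first
p4 = np.poly1d(coeffs)           # callable polynomial object

print(p4(2000))                  # evaluate the fitted curve at one year
print(np.polyval(coeffs, 2000))  # the same, without building a poly1d
```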
- Using scipy.odr
I adapted python_tips/poly_lsq.py at master · tiagopereira/python_tips for Python 3. It uses scipy's odrpack to apply a polynomial least-squares fit to the data.
```python
from scipy.odr import odrpack as odr
from scipy.odr import models


def poly_lsq(x, y, n, verbose=False, itmax=200):
    '''
    Performs a polynomial least squares fit to the data, with errors!
    Uses scipy odrpack, but for least squares.

    IN:
       x, y (arrays) - data to fit
       n (int)       - polynomial order
       verbose       - can be 0, 1, 2 for different levels of output
                       (False or True are the same as 0 or 1)
       itmax (int)   - optional maximum number of iterations

    OUT:
       coeff - polynomial coefficients, lowest order first
       err   - standard error (1-sigma) on the coefficients

    --Tiago, 20071114
    '''
    # http://www.scipy.org/doc/api_docs/SciPy.odr.odrpack.html
    # see models.py and use ready made models!!!!
    func = models.polynomial(n)
    mydata = odr.Data(x, y)
    myodr = odr.ODR(mydata, func, maxit=itmax)

    # Set type of fit to least-squares:
    myodr.set_job(fit_type=2)
    if verbose == 2:
        myodr.set_iprint(final=2)

    fit = myodr.run()

    # Display results:
    if verbose:
        fit.pprint()
    if fit.stopreason[0] == 'Iteration limit reached':
        print('(WWW) poly_lsq: Iteration limit reached, result not reliable!')

    # Results and errors
    coeff = fit.beta[::-1]
    err = fit.sd_beta[::-1]
    return coeff, err


%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt

x = np.array([1920, 1925, 1930, 1935, 1940, 1945, 1950, 1955, 1960, 1965,
              1970, 1975, 1980, 1985, 1990, 1995, 2000, 2005, 2007, 2008,
              2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2020, 2025,
              2030, 2035, 2045, 2055, 2065, 2075, 2085, 2095])
y = np.array([55963, 59737, 64450, 69254, 71933, 72147, 84115, 90077,
              94302, 99209, 104665, 111940, 117060, 121049, 123611, 125570,
              126926, 127768, 128033, 128084, 128032, 128057, 127834, 127593,
              127414, 127237, 127095, 126933, 125325, 122544, 119125, 115216,
              106421, 97441, 88077, 78564, 70381, 63125])

plt.scatter(x, y)

coeff, err = poly_lsq(x, y, 4)
print("polynomial coefficients:", coeff)
print("standard error:", err)

# plot the 4th-degree fit
plt.plot(x, np.poly1d(coeff)(x), label='d=4', color="green")
```
```
polynomial coefficients: [ -2.42404329e-06   9.72801674e-03  -9.73287786e+00   7.75065922e+00
   1.00000000e+00]
standard error: [  8.35504229e-08   3.35038696e-04   3.35776615e-01   0.00000000e+00
   0.00000000e+00]
[<matplotlib.lines.Line2D at 0x1063f9f28>]
```
Like numpy.polyfit, poly_lsq performs a polynomial least-squares fit; the difference is that poly_lsq also returns a standard error for each coefficient.
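If you also want uncertainty estimates on the numpy side, np.polyfit can return the covariance matrix of the coefficients when called with cov=True; the square roots of its diagonal are 1-sigma errors. A sketch, under the assumption that the x and y arrays from the earlier cells are still defined:

```python
# assumes x and y (the population arrays) are defined as above
import numpy as np

# with cov=True, np.polyfit also returns the covariance matrix
# of the estimated coefficients
coeffs, cov = np.polyfit(x, y, 4, cov=True)

# 1-sigma standard errors on each coefficient (highest order first),
# comparable in spirit to the `err` array returned by poly_lsq
err = np.sqrt(np.diag(cov))
print(coeffs)
print(err)
```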
- Using sklearn.preprocessing.PolynomialFeatures
I put the following together with reference to Tech Tips: scikit-learnで線形回帰.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline

# .T turns the 1xN row vectors into Nx1 column vectors
# (scikit-learn expects one sample per row)
x = np.array([[1920, 1925, 1930, 1935, 1940, 1945, 1950, 1955, 1960, 1965,
               1970, 1975, 1980, 1985, 1990, 1995, 2000, 2005, 2007, 2008,
               2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2020, 2025,
               2030, 2035, 2045, 2055, 2065, 2075, 2085, 2095]]).T
y = np.array([[55963, 59737, 64450, 69254, 71933, 72147, 84115, 90077,
               94302, 99209, 104665, 111940, 117060, 121049, 123611, 125570,
               126926, 127768, 128033, 128084, 128032, 128057, 127834, 127593,
               127414, 127237, 127095, 126933, 125325, 122544, 119125, 115216,
               106421, 97441, 88077, 78564, 70381, 63125]]).T

# train a linear regression model on 4th-degree polynomial features
regr = Pipeline([
    ('poly', PolynomialFeatures(degree=4)),
    ('linear', LinearRegression())
])
regr.fit(x, y)

# make predictions
xt = np.linspace(1920, 2100, num=300, dtype='int').reshape(300, 1)
yt = regr.predict(xt)

# plot samples and regression result
plt.plot(x, y, 'o')
plt.plot(xt, yt)
plt.show()
```
This produces the same kind of 4th-degree fit as the scipy.odr example above.
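If you want to inspect the learned polynomial itself rather than just the predictions, the coefficients can be read out of the pipeline. A small sketch, assuming regr is the fitted Pipeline from the cell above:

```python
# assumes `regr` is the fitted Pipeline from the cell above
linear = regr.named_steps['linear']

# PolynomialFeatures(degree=4) expands x into [1, x, x^2, x^3, x^4],
# so coef_ holds one weight per column (lowest order first); the weight
# on the constant column overlaps with the separately fitted intercept_
print(linear.coef_)
print(linear.intercept_)
```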
References

The pages below were also consulted:
* numpy - polynomial regression using python - Stack Overflow
* 多項式回帰入門。
* 多項式回帰と
* 最小二乗法で
* Tech Tips: scikit-learnで線形回帰
That's all.