過去に、
データは
Prerequisites
The OS and Python version are as follows.
- OS

    !sw_vers

    ProductName: Mac OS X
    ProductVersion: 10.14.6
    BuildVersion: 18G87

- Python version
!python -V
Python 3.7.2
Install the required libraries

    !pip install --upgrade pip
    !pip install pandas
    !pip install google2pandas
    !pip install oauth2client

Requirement already up-to-date: pip in jupyter/.env_jupyter/lib/python3.7/site-packages (19.2.3) Requirement already satisfied: pandas in jupyter/.env_jupyter/lib/python3.7/site-packages (0.24.1) Requirement already satisfied: pytz>=2011k in jupyter/.env_jupyter/lib/python3.7/site-packages (from pandas) (2018.9) Requirement already satisfied: numpy>=1.12.0 in jupyter/.env_jupyter/lib/python3.7/site-packages (from pandas) (1.16.1) Requirement already satisfied: python-dateutil>=2.5.0 in jupyter/.env_jupyter/lib/python3.7/site-packages (from pandas) (2.7.5) Requirement already satisfied: six>=1.5 in jupyter/.env_jupyter/lib/python3.7/site-packages (from python-dateutil>=2.5.0->pandas) (1.12.0) Requirement already satisfied: google2pandas in jupyter/.env_jupyter/lib/python3.7/site-packages (0.1.1) Requirement already satisfied: pandas>=0.15 in jupyter/.env_jupyter/lib/python3.7/site-packages (from google2pandas) (0.24.1) Requirement already satisfied: numpy>=1.7 in jupyter/.env_jupyter/lib/python3.7/site-packages (from google2pandas) (1.16.1) Requirement already satisfied: httplib2 in jupyter/.env_jupyter/lib/python3.7/site-packages (from google2pandas) (0.12.1) Requirement already satisfied: google-api-python-client in jupyter/.env_jupyter/lib/python3.7/site-packages (from google2pandas) (1.7.8) Requirement already satisfied: pytz>=2011k in jupyter/.env_jupyter/lib/python3.7/site-packages (from pandas>=0.15->google2pandas) (2018.9) Requirement already satisfied: python-dateutil>=2.5.0 in jupyter/.env_jupyter/lib/python3.7/site-packages (from pandas>=0.15->google2pandas) (2.7.5) Requirement already satisfied: google-auth>=1.4.1 in jupyter/.env_jupyter/lib/python3.7/site-packages (from google-api-python-client->google2pandas) (1.6.2) Requirement already satisfied: uritemplate<4dev,>=3.0.0 in jupyter/.env_jupyter/lib/python3.7/site-packages (from google-api-python-client->google2pandas) (3.0.0) Requirement already satisfied: six<2dev,>=1.6.1 in 
jupyter/.env_jupyter/lib/python3.7/site-packages (from google-api-python-client->google2pandas) (1.12.0) Requirement already satisfied: google-auth-httplib2>=0.0.3 in jupyter/.env_jupyter/lib/python3.7/site-packages (from google-api-python-client->google2pandas) (0.0.3) Requirement already satisfied: rsa>=3.1.4 in jupyter/.env_jupyter/lib/python3.7/site-packages (from google-auth>=1.4.1->google-api-python-client->google2pandas) (4.0) Requirement already satisfied: cachetools>=2.0.0 in jupyter/.env_jupyter/lib/python3.7/site-packages (from google-auth>=1.4.1->google-api-python-client->google2pandas) (3.1.0) Requirement already satisfied: pyasn1-modules>=0.2.1 in jupyter/.env_jupyter/lib/python3.7/site-packages (from google-auth>=1.4.1->google-api-python-client->google2pandas) (0.2.4) Requirement already satisfied: pyasn1>=0.1.3 in jupyter/.env_jupyter/lib/python3.7/site-packages (from rsa>=3.1.4->google-auth>=1.4.1->google-api-python-client->google2pandas) (0.4.5) Requirement already satisfied: oauth2client in jupyter/.env_jupyter/lib/python3.7/site-packages (4.1.3) Requirement already satisfied: pyasn1-modules>=0.0.5 in jupyter/.env_jupyter/lib/python3.7/site-packages (from oauth2client) (0.2.4) Requirement already satisfied: six>=1.6.1 in jupyter/.env_jupyter/lib/python3.7/site-packages (from oauth2client) (1.12.0) Requirement already satisfied: pyasn1>=0.1.7 in jupyter/.env_jupyter/lib/python3.7/site-packages (from oauth2client) (0.4.5) Requirement already satisfied: rsa>=3.1.4 in jupyter/.env_jupyter/lib/python3.7/site-packages (from oauth2client) (4.0) Requirement already satisfied: httplib2>=0.9.1 in jupyter/.env_jupyter/lib/python3.7/site-packages (from oauth2client) (0.12.1)
How to use Google2Pandas
Google Analytics のgoogle2pandas
と
使用する
以下の
Google2Pandas で、
Fetch one month of scroll events and calculate how often a 100% read-through occurs
以下、
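How to fetch the data
I use google2pandas, a plugin for pandas.
I previously wrote up how to use google2pandas in "Google2Pandas で、 Google Analytics の データを pandas Dataframe に 変換する | Monotalk" (converting Google Analytics data to a pandas DataFrame with Google2Pandas). Please take a look at that as well if you are interested.
Scroll event values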
Scroll events are recorded in Google Analytics as follows.
- Event category: Scroll
- Event label: the percentage scrolled, recorded as xx%
- Event action: the page path on which the scroll event occurred
- Event value: not set
Fetch the Google Analytics data
Fetch one month of scroll event data.

    from google2pandas import *

    view_id = '103185238'
    query = {
        'reportRequests': [{
            'viewId': view_id,
            'dateRanges': [{'startDate': '30daysAgo', 'endDate': 'today'}],
            'dimensions': [
                {'name': 'ga:eventCategory'},
                {'name': 'ga:eventLabel'},
            ],
            'metrics': [{'expression': 'ga:totalEvents'}],
            'dimensionFilterClauses': [{
                'filters': [
                    {'dimensionName': 'ga:eventCategory', 'expressions': ['Scroll']}
                ]
            }]
        }]
    }
    conn = GoogleAnalyticsQueryV4(secrets='./ga_client.json')
    df = conn.execute_query(query)

    # Output
    df['totalEvents'] = df['totalEvents'].astype(int)
    ga_events = df.sort_values(['eventLabel'], ascending=True)
    ga_events

| | eventCategory | eventLabel | totalEvents |
---|---|---|---|
0 | Scroll | 0% | 4 |
1 | Scroll | 100% | 828 |
2 | Scroll | 20% | 17867 |
3 | Scroll | 40% | 14669 |
4 | Scroll | 60% | 9897 |
5 | Scroll | 80% | 3966 |
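The request body above is repeated several times in this post with only the dimensions, metrics, and filters changing. As a rough sketch, it could be built by a small helper. The function name `build_report_request` and its parameters are my own assumptions, not part of google2pandas; it only assembles the same dictionary shape used above.

```python
def build_report_request(view_id, dimensions, metrics, filters=None,
                         start_date="30daysAgo", end_date="today"):
    """Assemble a Reporting API v4 style request body as a plain dict.

    Hypothetical helper: it mirrors the dictionaries written by hand in
    this post and does not call any Google API itself.
    """
    request = {
        "viewId": view_id,
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "dimensions": [{"name": name} for name in dimensions],
        "metrics": [{"expression": expr} for expr in metrics],
    }
    if filters:
        # Every filter must match (AND), as in the later 100% query.
        request["dimensionFilterClauses"] = [{
            "operator": "AND",
            "filters": [
                {"dimensionName": name, "expressions": [expr]}
                for name, expr in filters.items()
            ],
        }]
    return {"reportRequests": [request]}


scroll_query = build_report_request(
    view_id="103185238",
    dimensions=["ga:eventCategory", "ga:eventLabel"],
    metrics=["ga:totalEvents"],
    filters={"ga:eventCategory": "Scroll"},
)
```

The resulting `scroll_query` has the same structure as the hand-written `query`, so it should be usable with `conn.execute_query(...)` in the same way.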
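The eventLabel values contain a % sign, so they are sorted as strings (100% comes right after 0%).
Strip the % and convert the values to integers.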

    # 1. ga_events['eventLabel'].str.split('%', expand=True) splits on '%' and returns a DataFrame
    # 2. [0] takes the first column, and astype(int) converts it to integers
    ga_events['eventLabel'] = ga_events['eventLabel'].str.split('%', expand=True)[0].astype(int)
    ga_events = ga_events.sort_values(['eventLabel'], ascending=True)
    ga_events

| | eventCategory | eventLabel | totalEvents |
---|---|---|---|
0 | Scroll | 0 | 4 |
2 | Scroll | 20 | 17867 |
3 | Scroll | 40 | 14669 |
4 | Scroll | 60 | 9897 |
5 | Scroll | 80 | 3966 |
1 | Scroll | 100 | 828 |
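To make the sorting problem concrete, here is a minimal sketch with the same six labels. It uses `str.rstrip('%')` rather than `str.split`, which is just an alternative I chose for the illustration; both remove the trailing percent sign.

```python
import pandas as pd

labels = pd.Series(["0%", "100%", "20%", "40%", "60%", "80%"])

# Sorted as text, "100%" lands between "0%" and "20%".
as_text = sorted(labels)

# Strip the percent sign, convert to int, then sort numerically.
as_int = labels.str.rstrip("%").astype(int).sort_values().tolist()
```

After the conversion the labels sort in the natural order from 0 to 100.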
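At this point each count includes every user who scrolled at least that far.
Subtract the next row to get the number of users who stopped at each depth.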

    # Subtract the next row's count from each row; the last row (100%) keeps its own value
    ga_events['totalEvents'] = ga_events['totalEvents'].diff(-1).fillna(ga_events['totalEvents'].tail(1))
    ga_events = ga_events.query('totalEvents > 0')

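The subtraction step can be checked by hand with the counts from the table above. The sketch below rebuilds that table as a small DataFrame; the column name `stoppedHere` is my own, added so the original counts stay visible.

```python
import pandas as pd

reached = pd.DataFrame({
    "eventLabel": [0, 20, 40, 60, 80, 100],
    "totalEvents": [4, 17867, 14669, 9897, 3966, 828],
})

# diff(-1) computes row[i] - row[i + 1]; the last row has no successor (NaN).
stopped = reached["totalEvents"].diff(-1)
# Everyone counted at 100% stopped at 100%, so keep that value as is.
stopped.iloc[-1] = reached["totalEvents"].iloc[-1]

reached["stoppedHere"] = stopped
# The 0% row becomes negative (4 - 17867) and is dropped.
result = reached[reached["stoppedHere"] > 0]
```

For example, 17867 - 14669 = 3198 users reached 20% but not 40%.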
- Calculate the percentages
Calculate each row's share of the total totalEvents.

    ga_events['perc'] = ga_events['totalEvents'] / ga_events['totalEvents'].sum() * 100
    ga_events


    jupyter/.env_jupyter/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
      """Entry point for launching an IPython kernel.

| | eventCategory | eventLabel | totalEvents | perc |
---|---|---|---|---|
2 | Scroll | 20 | 3198.0 | 17.898920 |
3 | Scroll | 40 | 4772.0 | 26.708457 |
4 | Scroll | 60 | 5931.0 | 33.195276 |
5 | Scroll | 80 | 3138.0 | 17.563105 |
1 | Scroll | 100 | 828.0 | 4.634242 |
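The SettingWithCopyWarning shown with this result most likely appears because a new column is assigned to a DataFrame that came out of a filter (`query`). A common way to avoid it is to take an explicit copy before adding columns. This is a minimal sketch with made-up numbers, not the data from this post.

```python
import pandas as pd

# Hypothetical values for illustration only.
df = pd.DataFrame({
    "eventLabel": [0, 20, 100],
    "totalEvents": [-5.0, 30.0, 10.0],
})

# .copy() makes the filtered result an independent DataFrame,
# so adding a column is unambiguous and does not touch df.
positive = df.query("totalEvents > 0").copy()
positive["perc"] = positive["totalEvents"] / positive["totalEvents"].sum() * 100
```

The original `df` is left unchanged, and `positive` carries the new `perc` column.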
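Reads that reach 100% account for about 4.6% of the total.
After working through all of this, I realized that I need to calculate the ratio against the number of page views instead.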
読了率20%以下で
ページの

    from google2pandas import *

    view_id = '103185238'

    # Total page views
    query = {
        'reportRequests': [{
            'viewId': view_id,
            'dateRanges': [{'startDate': '30daysAgo', 'endDate': 'today'}],
            'dimensions': [{'name': 'ga:pagePath'}],
            'metrics': [{'expression': 'ga:pageviews'}],
        }]
    }
    conn = GoogleAnalyticsQueryV4(secrets='./ga_client.json')
    df = conn.execute_query(query)
    df['pageviews'] = df['pageviews'].astype(int)
    pageview = df['pageviews'].sum()

    # Scroll events whose label matches 100
    query = {
        'reportRequests': [{
            'viewId': view_id,
            'dateRanges': [{'startDate': '30daysAgo', 'endDate': 'today'}],
            'dimensions': [
                {'name': 'ga:eventCategory'},
                {'name': 'ga:eventLabel'},
            ],
            'metrics': [{'expression': 'ga:totalEvents'}],
            'dimensionFilterClauses': [{
                'operator': 'AND',
                'filters': [
                    {'dimensionName': 'ga:eventCategory', 'expressions': ['Scroll']},
                    {'dimensionName': 'ga:eventLabel', 'expressions': ['100']}
                ]
            }]
        }]
    }
    conn = GoogleAnalyticsQueryV4(secrets='./ga_client.json')
    df = conn.execute_query(query)
    # Sum the column instead of calling int() on a Series
    df['totalEvents'].astype(int).sum() / pageview * 100

3.238392508778775
It comes to about 3%.
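The calculation itself is just the number of 100% scroll events divided by the number of page views. A tiny sketch, with a hypothetical function name and made-up inputs (not the figures from the report above):

```python
def read_through_rate(complete_events, pageviews):
    """Return 100%-scroll events as a percentage of page views."""
    if pageviews <= 0:
        raise ValueError("pageviews must be positive")
    return 100 * complete_events / pageviews


# Hypothetical numbers, not taken from the report above.
rate = read_through_rate(complete_events=50, pageviews=2000)
```

Note that the denominator matters: dividing by page views gives a lower figure than dividing by the scroll event totals, because page views also include visits that never fired a scroll event.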
Fetch one month of scroll events and check the correlation between the per-page read rate and page views
ページビューと
スクロールイベントの
Fetch the scroll events and convert them from long format to wide format

    from google2pandas import *

    view_id = '103185238'
    query = {
        'reportRequests': [{
            'viewId': view_id,
            'dateRanges': [{'startDate': '30daysAgo', 'endDate': 'today'}],
            'dimensions': [
                {'name': 'ga:eventAction'},
                {'name': 'ga:eventCategory'},
                {'name': 'ga:eventLabel'},
            ],
            'metrics': [{'expression': 'ga:totalEvents'}],
            'dimensionFilterClauses': [{
                'filters': [
                    {'dimensionName': 'ga:eventCategory', 'expressions': ['Scroll']}
                ]
            }]
        }]
    }
    conn = GoogleAnalyticsQueryV4(secrets='./ga_client.json')
    df = conn.execute_query(query)

    # Output
    df['totalEvents'] = df['totalEvents'].astype(int)
    ga_events = df.sort_values(['eventAction', 'eventLabel'], ascending=True)

    # 1. ga_events['eventLabel'].str.split('%', expand=True) splits on '%' and returns a DataFrame
    # 2. [0] takes the first column, and astype(int) converts it to integers
    ga_events['eventLabel'] = ga_events['eventLabel'].str.split('%', expand=True)[0].astype(int)
    ga_events = ga_events.sort_values(['eventAction', 'eventLabel'], ascending=True)

    # Convert totalEvents from long format to wide format
    pivot_ga_events = ga_events.pivot_table(values=['totalEvents'], index=['eventAction'], columns=['eventLabel'], aggfunc='sum', fill_value=0)

Remove a column level
The pivoted DataFrame has two column levels, so use droplevel() to remove the top one.

    # Drop the top column level
    pivot_ga_events.columns = pivot_ga_events.columns.droplevel()

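The pivot and the level removal are easier to follow on a tiny example. The page paths and counts below are made up; the `pivot_table` and `droplevel` calls are the same as in the code above.

```python
import pandas as pd

# Hypothetical long-format events: one row per (page, scroll depth).
events = pd.DataFrame({
    "eventAction": ["/a/", "/a/", "/b/", "/b/", "/b/"],
    "eventLabel": [20, 100, 20, 40, 100],
    "totalEvents": [10, 2, 7, 5, 1],
})

wide = events.pivot_table(values=["totalEvents"], index=["eventAction"],
                          columns=["eventLabel"], aggfunc="sum", fill_value=0)
levels_before = wide.columns.nlevels  # ('totalEvents', 20), ... is two levels

# Drop the outer 'totalEvents' level so the columns are just 20, 40, 100.
wide.columns = wide.columns.droplevel()
```

Missing combinations, such as `/a/` at 40, are filled with 0 by `fill_value=0`.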
Fetch the page view data and merge it with the event data
Fetch the page view data and merge it using pagePath and eventAction as the keys.

    # Fetch page views
    from google2pandas import *
    import pandas as pd

    view_id = '103185238'
    query = {
        'reportRequests': [{
            'viewId': view_id,
            'dateRanges': [{'startDate': '30daysAgo', 'endDate': 'today'}],
            'dimensions': [{'name': 'ga:pagePath'}],
            'metrics': [{'expression': 'ga:pageviews'}],
        }]
    }
    conn = GoogleAnalyticsQueryV4(secrets='./ga_client.json')
    df = conn.execute_query(query)

    merge_pd = pd.merge(df, pivot_ga_events, left_on='pagePath', right_on='eventAction')
    merge_pd

| | pagePath | pageviews | 0 | 20 | 40 | 60 | 80 | 100 |
---|---|---|---|---|---|---|---|---|
0 | / | 331 | 0 | 113 | 76 | 64 | 57 | 56 |
1 | /about/ | 31 | 0 | 13 | 14 | 14 | 15 | 14 |
2 | /ampoptimized/blog/In-the-browser,-wake-up-tex... | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
3 | /ampoptimized/blog/Make-the-transition-destina... | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
4 | /blog/ | 8 | 0 | 4 | 4 | 4 | 4 | 4 |
5 | /blog/404_errorpage_configration_on_wicket_dro... | 1 | 0 | 1 | 1 | 1 | 0 | 0 |
6 | /blog/AB-test-tool-Web-personalization-tool-RT... | 25 | 0 | 21 | 20 | 14 | 5 | 2 |
7 | /blog/About-formulas-in-Google-Spreadsheet-for... | 102 | 0 | 85 | 72 | 49 | 12 | 3 |
8 | /blog/About-JavaScript-keyboard-shortcut-library/ | 95 | 0 | 54 | 43 | 27 | 15 | 3 |
9 | /blog/About-searching-and-completing-a-string-... | 176 | 0 | 148 | 120 | 77 | 19 | 3 |
10 | /blog/About-statistical-information-of-blog-po... | 8 | 0 | 5 | 4 | 4 | 2 | 0 |
11 | /blog/About-txt-file-that-can-be-installed-on-... | 22 | 0 | 15 | 15 | 13 | 6 | 3 |
12 | /blog/add-configuration-on-django-compressor/ | 2 | 0 | 1 | 1 | 0 | 0 | 0 |
13 | /blog/Add-font-display-with-gulp-using-postcss... | 22 | 0 | 17 | 12 | 4 | 0 | 0 |
14 | /blog/at-comfasterxmljacksondatabindexcunrecog... | 22 | 0 | 11 | 11 | 8 | 3 | 0 |
15 | /blog/author/monotalk/ | 11 | 0 | 4 | 2 | 1 | 1 | 3 |
16 | /blog/Block-multiple-requests-of-Wicket-Ajax/ | 28 | 0 | 21 | 15 | 10 | 5 | 0 |
17 | /blog/BooleanFiled-Null-True-on-django/ | 11 | 0 | 10 | 9 | 4 | 1 | 0 |
18 | /blog/Calculate-Cauchy-distribution-in-Python/ | 31 | 0 | 20 | 15 | 13 | 1 | 0 |
19 | /blog/Calculate-coefficient-of-variation-with-... | 69 | 0 | 50 | 33 | 16 | 3 | 0 |
20 | /blog/Calculate-hypergeometric-distribution-wi... | 19 | 0 | 12 | 10 | 7 | 0 | 0 |
21 | /blog/Calculate-inequality-index-in-python/ | 69 | 0 | 45 | 35 | 18 | 5 | 0 |
22 | /blog/Calculate-lognormal-distribution-in-Python/ | 165 | 0 | 119 | 100 | 71 | 14 | 1 |
23 | /blog/Calculate-normal-distribution-with-python/ | 434 | 0 | 312 | 229 | 182 | 62 | 7 |
24 | /blog/Calculate-polynomial-regression-with-pyt... | 267 | 0 | 177 | 144 | 128 | 47 | 9 |
25 | /blog/Calculate-the-exponential-distribution-w... | 137 | 0 | 102 | 85 | 60 | 12 | 1 |
26 | /blog/Calculate-the-F-distribution-with-python/ | 67 | 0 | 42 | 36 | 24 | 6 | 2 |
27 | /blog/Calculate-the-gamma-distribution-with-Py... | 132 | 0 | 94 | 71 | 52 | 22 | 2 |
28 | /blog/Calculate-the-geometric-distribution-wit... | 35 | 0 | 30 | 30 | 23 | 5 | 2 |
29 | /blog/Calculate-the-interquartile-range-with-p... | 285 | 0 | 226 | 167 | 83 | 11 | 2 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
219 | /blog/wicket-about-wicketheader-items/ | 10 | 0 | 8 | 7 | 6 | 2 | 0 |
220 | /blog/Wicket-AjaxButton-Controls-behavior-when... | 28 | 0 | 17 | 16 | 12 | 3 | 2 |
221 | /blog/With-python-statsmodel-calculate-the-ver... | 128 | 0 | 81 | 62 | 43 | 25 | 4 |
222 | /blog/Write-Boolean-condition-directly-in-Ecli... | 4 | 0 | 3 | 3 | 1 | 0 | 0 |
223 | /blog/WSGIRequest-object-is-not-subscriptable-... | 110 | 0 | 85 | 71 | 28 | 5 | 0 |
224 | /blog/youtube-data-api-v3-java-paging-search/ | 9 | 0 | 5 | 4 | 4 | 3 | 0 |
225 | /categories/ | 14 | 0 | 2 | 2 | 2 | 2 | 1 |
226 | /ja/aada/ | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
227 | /ja/admin/ | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
228 | /ja/amp/amp/blog/Webpack-4-fontmin-webpack-Rem... | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
229 | /ja/ampoptimized/blog/monitore/ | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
230 | /ja/apis/ | 4 | 0 | 4 | 4 | 4 | 4 | 4 |
231 | /ja/apis/guessresult/ | 5 | 0 | 5 | 5 | 5 | 5 | 5 |
232 | /ja/apis/s/ | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
233 | /ja/blog/ | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
234 | /ja/blog/Default-findBugs-Rules-In-Github-Repo... | 2 | 0 | 1 | 1 | 1 | 1 | 1 |
235 | /ja/blog/google-search-console/ | 2 | 0 | 1 | 1 | 1 | 1 | 1 |
236 | /ja/blog/google-spread-sheet-/ | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
237 | /ja/blog/python-requests-post-/ | 4 | 0 | 4 | 4 | 4 | 4 | 4 |
238 | /ja/blog/python-requests-post/ | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
239 | /ja/blog/pyton/ | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
240 | /ja/guess_apis/guessresult/ | 2 | 0 | 2 | 2 | 2 | 2 | 2 |
241 | /ja/guessresult/ | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
242 | /ja/guessresult/adss/ | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
243 | /ja/manifest.json/ | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
244 | /ja/xyz_monotalk_api/guessresult/ | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
245 | /ja/xyz_monotalk_xyz/guessresult/blog/About-Ja... | 2 | 0 | 2 | 2 | 2 | 2 | 2 |
246 | /statistics/ | 31 | 0 | 10 | 7 | 7 | 5 | 3 |
247 | /xyz_monotalk_api/pages | 2 | 0 | 2 | 2 | 2 | 2 | 2 |
248 | /xyz_monotalk_api/posts | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
249 rows × 8 columns
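`pd.merge` performs an inner join by default, so pages that appear on only one side are silently dropped. If you want to see which pages failed to match, one option is an outer join with `indicator=True`. This is a sketch on made-up data, with the scroll table reduced to a single column.

```python
import pandas as pd

# Hypothetical page views and 100% scroll counts.
pageviews = pd.DataFrame({
    "pagePath": ["/a/", "/b/", "/c/"],
    "pageviews": [30, 12, 5],
})
scrolls = pd.DataFrame({
    "eventAction": ["/a/", "/b/", "/d/"],
    100: [3, 1, 2],
})

merged = pd.merge(pageviews, scrolls, left_on="pagePath",
                  right_on="eventAction", how="outer", indicator=True)

# Rows marked 'left_only' or 'right_only' had no partner on the other side.
unmatched = merged[merged["_merge"] != "both"]
```

Here `/c/` has page views but no scroll events, and `/d/` has scroll events but no page views, so both show up in `unmatched`.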
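Shape and adjust the data
Exclude pages with few page views.
Also exclude pages for which the scroll rate was not recorded properly.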

    merge_pd['pageviews'] = merge_pd['pageviews'].astype(int)
    # Exclude pages with few page views
    merge_pd = merge_pd.query('pageviews >= 5')
    # Exclude pages where the scroll rate was not recorded properly
    merge_pd = merge_pd[(merge_pd['pageviews'] - merge_pd[20]) < 200]
    del merge_pd[0]
    merge_pd['Read rate'] = (merge_pd[100] / merge_pd['pageviews'] * 100)
    merge_pd

| | pagePath | pageviews | 20 | 40 | 60 | 80 | 100 | Read rate |
---|---|---|---|---|---|---|---|---|
1 | /about/ | 31 | 13 | 14 | 14 | 15 | 14 | 45.161290 |
4 | /blog/ | 8 | 4 | 4 | 4 | 4 | 4 | 50.000000 |
6 | /blog/AB-test-tool-Web-personalization-tool-RT... | 25 | 21 | 20 | 14 | 5 | 2 | 8.000000 |
7 | /blog/About-formulas-in-Google-Spreadsheet-for... | 102 | 85 | 72 | 49 | 12 | 3 | 2.941176 |
8 | /blog/About-JavaScript-keyboard-shortcut-library/ | 95 | 54 | 43 | 27 | 15 | 3 | 3.157895 |
9 | /blog/About-searching-and-completing-a-string-... | 176 | 148 | 120 | 77 | 19 | 3 | 1.704545 |
10 | /blog/About-statistical-information-of-blog-po... | 8 | 5 | 4 | 4 | 2 | 0 | 0.000000 |
11 | /blog/About-txt-file-that-can-be-installed-on-... | 22 | 15 | 15 | 13 | 6 | 3 | 13.636364 |
13 | /blog/Add-font-display-with-gulp-using-postcss... | 22 | 17 | 12 | 4 | 0 | 0 | 0.000000 |
14 | /blog/at-comfasterxmljacksondatabindexcunrecog... | 22 | 11 | 11 | 8 | 3 | 0 | 0.000000 |
15 | /blog/author/monotalk/ | 11 | 4 | 2 | 1 | 1 | 3 | 27.272727 |
16 | /blog/Block-multiple-requests-of-Wicket-Ajax/ | 28 | 21 | 15 | 10 | 5 | 0 | 0.000000 |
17 | /blog/BooleanFiled-Null-True-on-django/ | 11 | 10 | 9 | 4 | 1 | 0 | 0.000000 |
18 | /blog/Calculate-Cauchy-distribution-in-Python/ | 31 | 20 | 15 | 13 | 1 | 0 | 0.000000 |
19 | /blog/Calculate-coefficient-of-variation-with-... | 69 | 50 | 33 | 16 | 3 | 0 | 0.000000 |
20 | /blog/Calculate-hypergeometric-distribution-wi... | 19 | 12 | 10 | 7 | 0 | 0 | 0.000000 |
21 | /blog/Calculate-inequality-index-in-python/ | 69 | 45 | 35 | 18 | 5 | 0 | 0.000000 |
22 | /blog/Calculate-lognormal-distribution-in-Python/ | 165 | 119 | 100 | 71 | 14 | 1 | 0.606061 |
23 | /blog/Calculate-normal-distribution-with-python/ | 434 | 312 | 229 | 182 | 62 | 7 | 1.612903 |
24 | /blog/Calculate-polynomial-regression-with-pyt... | 267 | 177 | 144 | 128 | 47 | 9 | 3.370787 |
25 | /blog/Calculate-the-exponential-distribution-w... | 137 | 102 | 85 | 60 | 12 | 1 | 0.729927 |
26 | /blog/Calculate-the-F-distribution-with-python/ | 67 | 42 | 36 | 24 | 6 | 2 | 2.985075 |
27 | /blog/Calculate-the-gamma-distribution-with-Py... | 132 | 94 | 71 | 52 | 22 | 2 | 1.515152 |
28 | /blog/Calculate-the-geometric-distribution-wit... | 35 | 30 | 30 | 23 | 5 | 2 | 5.714286 |
29 | /blog/Calculate-the-interquartile-range-with-p... | 285 | 226 | 167 | 83 | 11 | 2 | 0.701754 |
30 | /blog/Calculate-the-moment-with-scipy.stats.mo... | 20 | 18 | 16 | 13 | 2 | 0 | 0.000000 |
31 | /blog/Calculate-the-previous-term-growth-rate-... | 96 | 62 | 59 | 49 | 30 | 1 | 1.041667 |
32 | /blog/Calculate-the-probability-of-binomial-di... | 255 | 184 | 151 | 96 | 19 | 9 | 3.529412 |
33 | /blog/Calculate-uniform-distribution-in-Python/ | 115 | 76 | 45 | 42 | 20 | 4 | 3.478261 |
34 | /blog/Calculate-Weibull-distribution-in-Python/ | 192 | 121 | 107 | 96 | 40 | 7 | 3.645833 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
185 | /blog/understanding-of-aggregate-pipeline/ | 41 | 40 | 37 | 33 | 8 | 4 | 9.756098 |
186 | /blog/Update-workbox-webpack-plugin-from-v2-to... | 28 | 21 | 18 | 12 | 3 | 2 | 7.142857 |
187 | /blog/Updated-Japanese-translations-on-Mezzanine/ | 15 | 13 | 9 | 6 | 3 | 0 | 0.000000 |
188 | /blog/upgrade-from-workbox-v2-to-v4/ | 23 | 20 | 17 | 11 | 4 | 1 | 4.347826 |
193 | /blog/usage-of-image-class-in-wicket/ | 32 | 15 | 13 | 9 | 8 | 3 | 9.375000 |
194 | /blog/usage-of-modify-display-choice-name-on-d... | 69 | 56 | 49 | 17 | 7 | 2 | 2.898551 |
195 | /blog/usage-of-upsert-java-mongodb-driver/ | 11 | 10 | 10 | 6 | 1 | 1 | 9.090909 |
197 | /blog/Use-apache-ultimate-bad-bot-blocker-to-b... | 20 | 12 | 8 | 6 | 3 | 0 | 0.000000 |
198 | /blog/Use-data-studio-community-visualizations... | 56 | 46 | 44 | 44 | 21 | 3 | 5.357143 |
200 | /blog/Use-ifttt-and-chatwork-as-web-informatio... | 36 | 29 | 24 | 13 | 11 | 3 | 8.333333 |
202 | /blog/Use-the-Google-Place-API-as-a-facility-s... | 133 | 105 | 89 | 74 | 30 | 3 | 2.255639 |
203 | /blog/Use-trubolinks-for-sites-using-Django/ | 9 | 8 | 7 | 5 | 2 | 1 | 11.111111 |
204 | /blog/Use-UnCSS-with-gulp/ | 29 | 22 | 17 | 9 | 2 | 0 | 0.000000 |
205 | /blog/Use-Wicket-wicket-devutils-to-check-the-... | 16 | 6 | 5 | 4 | 2 | 0 | 0.000000 |
206 | /blog/uses-a-non-entity-orgeclipsepersistencee... | 13 | 8 | 8 | 5 | 1 | 1 | 7.692308 |
208 | /blog/using-resource-on-wicket-application/ | 25 | 18 | 17 | 14 | 8 | 1 | 4.000000 |
209 | /blog/Validate-security-settings-by-adding-dep... | 14 | 9 | 8 | 6 | 0 | 0 | 0.000000 |
211 | /blog/Verifying-the-vulnerability-of-blogs-bui... | 27 | 22 | 18 | 9 | 1 | 1 | 3.703704 |
212 | /blog/Viewed-from-a-programmer-point-of-view-G... | 20 | 15 | 12 | 8 | 0 | 0 | 0.000000 |
214 | /blog/Visualize-the-quality-of-the-blog-articl... | 11 | 6 | 6 | 4 | 1 | 0 | 0.000000 |
216 | /blog/Webpack-4-fontmin-webpack-Remove-unused-... | 28 | 23 | 16 | 10 | 0 | 0 | 0.000000 |
217 | /blog/Webpack-4-OptimizeCSSAssetsPlugin-Optimi... | 127 | 103 | 82 | 54 | 5 | 2 | 1.574803 |
219 | /blog/wicket-about-wicketheader-items/ | 10 | 8 | 7 | 6 | 2 | 0 | 0.000000 |
220 | /blog/Wicket-AjaxButton-Controls-behavior-when... | 28 | 17 | 16 | 12 | 3 | 2 | 7.142857 |
221 | /blog/With-python-statsmodel-calculate-the-ver... | 128 | 81 | 62 | 43 | 25 | 4 | 3.125000 |
223 | /blog/WSGIRequest-object-is-not-subscriptable-... | 110 | 85 | 71 | 28 | 5 | 0 | 0.000000 |
224 | /blog/youtube-data-api-v3-java-paging-search/ | 9 | 5 | 4 | 4 | 3 | 0 | 0.000000 |
225 | /categories/ | 14 | 2 | 2 | 2 | 2 | 1 | 7.142857 |
231 | /ja/apis/guessresult/ | 5 | 5 | 5 | 5 | 5 | 5 | 100.000000 |
246 | /statistics/ | 31 | 10 | 7 | 7 | 5 | 3 | 9.677419 |
180 rows × 8 columns
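Draw scatter plots and calculate correlation coefficients
Draw the scatter plots with scatter_matrix and calculate the correlation coefficients with corr.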
pd.plotting.scatter_matrix(merge_pd, alpha=0.8, figsize=(12,12), range_padding=0.5)
[Figure: 7 x 7 scatter matrix of pageviews, 20, 40, 60, 80, 100, and Read rate for all pages]
merge_pd.corr()
| | pageviews | 20 | 40 | 60 | 80 | 100 | Read rate |
---|---|---|---|---|---|---|---|
pageviews | 1.000000 | 0.977938 | 0.961896 | 0.961121 | 0.830760 | 0.659829 | -0.135975 |
20 | 0.977938 | 1.000000 | 0.993487 | 0.966516 | 0.752464 | 0.628549 | -0.137896 |
40 | 0.961896 | 0.993487 | 1.000000 | 0.961941 | 0.736928 | 0.640821 | -0.131964 |
60 | 0.961121 | 0.966516 | 0.961941 | 1.000000 | 0.819036 | 0.672964 | -0.114497 |
80 | 0.830760 | 0.752464 | 0.736928 | 0.819036 | 1.000000 | 0.753186 | -0.028989 |
100 | 0.659829 | 0.628549 | 0.640821 | 0.672964 | 0.753186 | 1.000000 | 0.342774 |
Read rate | -0.135975 | -0.137896 | -0.131964 | -0.114497 | -0.028989 | 0.342774 | 1.000000 |
The correlation coefficient between Read rate and pageviews is about -0.14, so there is essentially no correlation.
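For reference, `corr` computes the Pearson correlation coefficient by default. Depending on the pandas version, a text column such as pagePath is either skipped or rejected by `DataFrame.corr()`, so the sketch below selects the numeric columns first. The data is made up and deliberately perfectly linear.

```python
import pandas as pd

pages = pd.DataFrame({
    "pagePath": ["/a/", "/b/", "/c/", "/d/"],  # hypothetical pages
    "pageviews": [10, 20, 30, 40],
    "Read rate": [8.0, 6.0, 4.0, 2.0],
})

# Keep only numeric columns so the text column cannot interfere.
numeric = pages.select_dtypes(include="number")
corr_matrix = numeric.corr()  # Pearson by default

# The same coefficient for a single pair of columns.
r = pages["pageviews"].corr(pages["Read rate"])
```

Because Read rate falls by exactly 2 for every 10 extra page views in this toy data, the coefficient is -1 (up to floating-point rounding); the real data above is nowhere near that clean.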
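Look at the correlation for the top 20% of pages by page views
Across all pages there seems to be no correlation, so use the quantile function to extract the top 20% of pages by page views.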
m = merge_pd[merge_pd['pageviews'] >= merge_pd['pageviews'].quantile(0.8)]
pd.plotting.scatter_matrix(m,alpha=0.8, figsize=(12,12), range_padding=0.5)
[Figure: 7 x 7 scatter matrix of pageviews, 20, 40, 60, 80, 100, and Read rate for the top 20% of pages]
m.corr()
| | pageviews | 20 | 40 | 60 | 80 | 100 | Read rate |
---|---|---|---|---|---|---|---|
pageviews | 1.000000 | 0.928568 | 0.878365 | 0.896935 | 0.703414 | 0.579343 | -0.158547 |
20 | 0.928568 | 1.000000 | 0.980876 | 0.909730 | 0.492197 | 0.492201 | -0.193289 |
40 | 0.878365 | 0.980876 | 1.000000 | 0.888006 | 0.451727 | 0.525741 | -0.137850 |
60 | 0.896935 | 0.909730 | 0.888006 | 1.000000 | 0.636424 | 0.588516 | -0.078309 |
80 | 0.703414 | 0.492197 | 0.451727 | 0.636424 | 1.000000 | 0.741021 | 0.228411 |
100 | 0.579343 | 0.492201 | 0.525741 | 0.588516 | 0.741021 | 1.000000 | 0.643966 |
Read rate | -0.158547 | -0.193289 | -0.137850 | -0.078309 | 0.228411 | 0.643966 | 1.000000 |
No particular correlation is visible (about -0.16 between Read rate and pageviews).
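The quantile filter can be checked on a small example. With ten made-up pages, `quantile(0.8)` returns an interpolated threshold and the `>=` comparison keeps the pages above it.

```python
import pandas as pd

# Ten hypothetical pages with 10, 20, ..., 100 page views.
pages = pd.DataFrame({
    "pagePath": ["/p{}/".format(i) for i in range(1, 11)],
    "pageviews": list(range(10, 101, 10)),
})

# The default interpolation is linear: the 0.8 quantile lies between 80 and 90.
threshold = pages["pageviews"].quantile(0.8)
top = pages[pages["pageviews"] >= threshold]
```

Only the pages with 90 and 100 page views remain. Keep in mind that a top-20% or top-10% subset contains few rows, so its correlation coefficients are more sensitive to individual pages.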
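Look at the correlation for the top 10% of pages by page views
Extract the top 10% of pages by page views and check the correlation.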
m = merge_pd[merge_pd['pageviews'] >= merge_pd['pageviews'].quantile(0.90)]
pd.plotting.scatter_matrix(m,alpha=0.8, figsize=(12,12), range_padding=0.5)
[Figure: 7 x 7 scatter matrix of pageviews, 20, 40, 60, 80, 100, and Read rate for the top 10% of pages]
m.corr()
| | pageviews | 20 | 40 | 60 | 80 | 100 | Read rate |
---|---|---|---|---|---|---|---|
pageviews | 1.000000 | 0.873550 | 0.791892 | 0.819756 | 0.608373 | 0.590520 | 0.154224 |
20 | 0.873550 | 1.000000 | 0.970574 | 0.857356 | 0.268564 | 0.454755 | 0.073183 |
40 | 0.791892 | 0.970574 | 1.000000 | 0.829729 | 0.218985 | 0.511611 | 0.176578 |
60 | 0.819756 | 0.857356 | 0.829729 | 1.000000 | 0.470888 | 0.603951 | 0.283545 |
80 | 0.608373 | 0.268564 | 0.218985 | 0.470888 | 1.000000 | 0.737293 | 0.530483 |
100 | 0.590520 | 0.454755 | 0.511611 | 0.603951 | 0.737293 | 1.000000 | 0.859617 |
Read rate | 0.154224 | 0.073183 | 0.176578 | 0.283545 | 0.530483 | 0.859617 | 1.000000 |
For the top 10%, Read rate and pageviews show a weak positive correlation (about 0.15).
Summary
Pandas で
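Recording the scroll rate several times per page makes the data preparation for analysis tedious.
It seems better to change the Google Analytics setup so that the scroll rate is recorded only once.
As described in "#GTMTips: Fire Trigger When User Is About To Leave The Page | Simo Ahava's blog", sending the event from beforeunload should make it possible to record it just once, so I plan to try that.
Page views and the 100% read rate do not appear to be correlated.
On this blog, page views and the 100% read rate basically do not appear to be correlated.
The number of links on the page, the length of the article, and similar factors may play a role.
For the top 10%, the 100% read rate becomes positively correlated.
For the top 20% of pages, the 100% read rate is negatively correlated with page views (about -0.16), but for the top 10% of pages it becomes positively correlated (about 0.15).
This means that some pages make the top 20% yet have a low 100% read rate, and I am curious which pages those are.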
References
I referred to the following articles.
* pandasで
* python - How to calculate percentage with Pandas’ DataFrame - Stack Overflow
* Python: Pandasの
* python - Pandas: drop a level from a multi-level column index? - Stack Overflow
That's all.