분석하고싶은코코
fbprophet practice - Bitcoin
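  This post practices crawling a page that provides coin price data and forecasting prices from it.
Because this is practice, I pulled the data from Upbit instead of the page used in the original example. The code below targets Upbit's KRW-DOGE (Dogecoin) market.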
Forecast practice
- Forecasting Bitcoin data
- Fetching the data from a different site than the one in the tutorial and forecasting from it
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import time
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from bs4 import BeautifulSoup
from fbprophet import Prophet
from urllib.request import urlopen, Request
import set_matplotlib_hangul
import warnings
warnings.filterwarnings(action='ignore')
%matplotlib inline
Hangul Setting OK for Mac
The coin page is not a static page, so BeautifulSoup alone cannot fetch the rendered HTML source. Selenium is used instead.
In [2]:
url = "https://upbit.com/exchange?code=CRIX.UPBIT.KRW-DOGE"
driver = webdriver.Chrome('./driver/chromedriver 2')
driver.get(url)
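The call above passes the driver path positionally, which is the Selenium 3 style. Recent Selenium 4 releases expect the path to be wrapped in a `Service` object instead. A minimal sketch, assuming Selenium 4 is installed; `open_chrome` is a hypothetical helper name, and the imports sit inside the function only so that the helper can be defined without Selenium present:

```python
def open_chrome(driver_path, url):
    """Start Chrome through a Selenium 4 Service object and load url."""
    # Imported lazily so that defining this helper does not require Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    service = Service(executable_path=driver_path)
    driver = webdriver.Chrome(service=service)
    driver.get(url)
    return driver
```

With the notebook's values this would be called as `driver = open_chrome('./driver/chromedriver 2', url)`.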
In [4]:
from selenium.webdriver.common.by import By
# Scroll the element into view first - Selenium cannot interact with an element that is not visible on screen
xpath = '//*[@id="UpbitLayout"]/div[3]/div/section[1]/article[3]/span/div/div[1]/table'
variable = driver.find_element(By.XPATH, xpath)
driver.execute_script("return arguments[0].scrollIntoView();", variable)
variable.click()
In [5]:
xpath = '//*[@id="UpbitLayout"]/div[3]/div/section[1]/article[3]/span/div/div[1]'
variable = driver.find_element(By.XPATH, xpath)
for i in range(10):
    driver.execute_script("arguments[0].scrollBy(0,3000)", variable)
    time.sleep(2)
In [6]:
html = driver.page_source
In [7]:
driver.quit()
In [8]:
soup = BeautifulSoup(html, "html.parser")
table = soup.find_all("table", class_="ty01")[1]
table = table.find_all("tr")
date = []
price = []
movePrice = []
moveRatio = []
tradeCnt = []
year = 22
for data in table:
    cells = data.find_all("td")
    if not cells:
        continue  # skip rows without data cells (e.g. header rows)
    # Rows are newest-first, so reaching 12.31 means the previous year begins
    if cells[0].text == '12.31':
        year -= 1
    date.append(str(year) + '.' + cells[0].text)
    price.append(cells[1].text)
    # Prefix the change amount with a sign based on the up/down CSS class
    if cells[1].find(class_="down"):
        movePrice.append("-" + cells[2].text)
    elif cells[1].find(class_="up"):
        movePrice.append("+" + cells[2].text)
    else:
        movePrice.append('-')
    moveRatio.append(cells[3].text)
    tradeCnt.append(cells[4].text)
df = pd.DataFrame({
    "ds" : date,
    "price" : price,
    "movePrice" : movePrice,
    "moveRatio" : moveRatio,
    "tradeCnt" : tradeCnt
})
df
Out[8]:
| | ds | price | movePrice | moveRatio | tradeCnt |
|---|---|---|---|---|---|
| 0 | 22.11.10 | 120 | +13.00 | +12.15% | 2,588,022,928 | 
| 1 | 22.11.09 | 107 | -21.00 | -16.41% | 8,225,977,787 | 
| 2 | 22.11.08 | 128 | -28.00 | -17.95% | 5,642,309,810 | 
| 3 | 22.11.07 | 156 | -6.00 | -3.70% | 1,614,447,657 | 
| 4 | 22.11.06 | 162 | -12.00 | -6.90% | 1,352,634,001 | 
| ... | ... | ... | ... | ... | ... | 
| 545 | 21.05.14 | 693 | +77.00 | +12.50% | 10,570,378,481 | 
| 546 | 21.05.13 | 616 | +112 | +22.22% | 8,336,740,115 | 
| 547 | 21.05.12 | 504 | -103 | -16.97% | 3,357,814,191 | 
| 548 | 21.05.11 | 607 | +32.00 | +5.57% | 6,977,593,296 | 
| 549 | 21.05.10 | 575 | -128 | -18.21% | 5,902,763,500 | 
550 rows × 5 columns
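The loop above rebuilds the year by decrementing a counter each time it meets a `12.31` row, which depends on that exact row being present in the table. A slightly more defensive variant (not the method used in this post) detects the rollover from the month jumping upward between consecutive rows. `attach_years` and its sample input are hypothetical:

```python
from datetime import date


def attach_years(month_days, latest_year):
    """Turn 'MM.DD' strings, newest first, into full dates.

    Assumes the rows are in descending date order, as in the scraped
    table. The year drops by one whenever the month jumps upward
    (for example 01 -> 12), so a missing 12.31 row does not break it.
    """
    year = latest_year
    previous_month = None
    result = []
    for text in month_days:
        month, day = (int(part) for part in text.split("."))
        if previous_month is not None and month > previous_month:
            year -= 1
        previous_month = month
        result.append(date(year, month, day))
    return result


dates = attach_years(["01.02", "01.01", "12.30", "12.29"], 2022)
print(dates[0], dates[-1])
```

The four-digit `latest_year` is a design choice here; the notebook keeps a two-digit year and lets `pd.to_datetime` expand it.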
In [9]:
df["ds"] = pd.to_datetime(df["ds"], format="%y.%m.%d")
df.tail()
Out[9]:
| | ds | price | movePrice | moveRatio | tradeCnt |
|---|---|---|---|---|---|
| 545 | 2021-05-14 | 693 | +77.00 | +12.50% | 10,570,378,481 | 
| 546 | 2021-05-13 | 616 | +112 | +22.22% | 8,336,740,115 | 
| 547 | 2021-05-12 | 504 | -103 | -16.97% | 3,357,814,191 | 
| 548 | 2021-05-11 | 607 | +32.00 | +5.57% | 6,977,593,296 | 
| 549 | 2021-05-10 | 575 | -128 | -18.21% | 5,902,763,500 | 
In [10]:
df["price"] = df["price"].astype("float")
df["y"] = df["price"]
del df["price"]
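Only `price` is converted to a number here; `movePrice`, `moveRatio`, and `tradeCnt` remain strings such as `+13.00`, `+12.15%`, and `2,588,022,928`. If those columns were needed for analysis, a small parser along these lines could convert them. `to_number` is a hypothetical helper, and treating the bare `-` placeholder as a missing value is an assumption:

```python
def to_number(text):
    """Parse strings like '+13.00', '-16.41%' or '2,588,022,928' to float."""
    cleaned = text.strip().replace(",", "").rstrip("%")
    if cleaned in ("", "-", "+"):
        return None  # assumption: a bare sign means "no value"
    return float(cleaned)


print(to_number("2,588,022,928"), to_number("-16.41%"), to_number("+13.00"))
```

Applied to the frame, this could look like `df["tradeCnt"] = df["tradeCnt"].map(to_number)`. Percentages stay in percent units (`-16.41`, not `-0.1641`).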
In [11]:
m = Prophet(yearly_seasonality=True, daily_seasonality=True)
m.fit(df);
Initial log joint probability = -15.9553
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
      99       1451.07    0.00584195       296.736      0.6175      0.6175      120   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     175       1459.72   0.000269593       188.882    4.03e-06       0.001      255  LS failed, Hessian reset 
     199       1461.03    0.00358711       63.8951           1           1      281   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     299       1474.05     0.0031716       110.216        2.47       0.247      422   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     399       1478.85   0.000305736       97.4373      0.2389      0.8459      553   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     484       1484.47   0.000221251       308.968     7.7e-07       0.001      722  LS failed, Hessian reset 
     499       1489.43     0.0183161        759.03      0.9425      0.9425      741   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     576       1496.29   9.00996e-05        121.46   4.415e-07       0.001      871  LS failed, Hessian reset 
     599       1496.49   6.25553e-05       54.5008      0.8659      0.8659      899   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     609        1496.6   8.54379e-05       127.601   5.918e-07       0.001      968  LS failed, Hessian reset 
     677       1497.31   8.34962e-05       126.461   5.707e-07       0.001     1092  LS failed, Hessian reset 
     699       1497.72   8.92596e-05       52.0291      0.3538           1     1120   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     762       1498.08    0.00094956       134.156   1.198e-05       0.001     1248  LS failed, Hessian reset 
     795       1498.13   6.37134e-05       80.7997   7.224e-07       0.001     1327  LS failed, Hessian reset 
     799       1498.14   2.69417e-05       71.3225      0.9428      0.9428     1331   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     863       1498.19   0.000182081        131.09   1.902e-06       0.001     1449  LS failed, Hessian reset 
     899       1498.21   1.62809e-05       94.3705      0.4056      0.4056     1501   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     921       1498.21   2.86055e-05       92.2919   3.413e-07       0.001     1568  LS failed, Hessian reset 
     953       1498.21     9.021e-08       68.0956      0.5355      0.5355     1610   
Optimization terminated normally: 
  Convergence detected: relative gradient magnitude is below tolerance
In [12]:
future = m.make_future_dataframe(periods=30)
forecast = m.predict(future)
m.plot(forecast);
In [13]:
m.plot_components(forecast);
Comparing the actual last 30 days with a forecast made from the earlier data
In [14]:
df
Out[14]:
| | ds | movePrice | moveRatio | tradeCnt | y |
|---|---|---|---|---|---|
| 0 | 2022-11-10 | +13.00 | +12.15% | 2,588,022,928 | 120.0 | 
| 1 | 2022-11-09 | -21.00 | -16.41% | 8,225,977,787 | 107.0 | 
| 2 | 2022-11-08 | -28.00 | -17.95% | 5,642,309,810 | 128.0 | 
| 3 | 2022-11-07 | -6.00 | -3.70% | 1,614,447,657 | 156.0 | 
| 4 | 2022-11-06 | -12.00 | -6.90% | 1,352,634,001 | 162.0 | 
| ... | ... | ... | ... | ... | ... | 
| 545 | 2021-05-14 | +77.00 | +12.50% | 10,570,378,481 | 693.0 | 
| 546 | 2021-05-13 | +112 | +22.22% | 8,336,740,115 | 616.0 | 
| 547 | 2021-05-12 | -103 | -16.97% | 3,357,814,191 | 504.0 | 
| 548 | 2021-05-11 | +32.00 | +5.57% | 6,977,593,296 | 607.0 | 
| 549 | 2021-05-10 | -128 | -18.21% | 5,902,763,500 | 575.0 | 
550 rows × 5 columns
In [15]:
df_trunc = df[30:]
In [16]:
df_trunc.head()
Out[16]:
| | ds | movePrice | moveRatio | tradeCnt | y |
|---|---|---|---|---|---|
| 30 | 2022-10-11 | +1.30 | +1.53% | 376,319,092 | 86.5 | 
| 31 | 2022-10-10 | -3.20 | -3.62% | 227,676,691 | 85.2 | 
| 32 | 2022-10-09 | +0.4000 | +0.45% | 102,770,310 | 88.4 | 
| 33 | 2022-10-08 | -0.7000 | -0.79% | 157,678,047 | 88.0 | 
| 34 | 2022-10-07 | -1.70 | -1.88% | 369,270,539 | 88.7 | 
In [17]:
m = Prophet(yearly_seasonality=True, daily_seasonality=True)
m.fit(df_trunc);
Initial log joint probability = -12.2597
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
      99       1365.27    0.00961231        554.03           1           1      124   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     189       1390.26   0.000149221       211.509   4.973e-07       0.001      275  LS failed, Hessian reset 
     199       1393.75   0.000921735       212.989           1           1      287   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     233       1399.09   0.000162136       182.752   3.833e-07       0.001      365  LS failed, Hessian reset 
     295       1402.56   9.23932e-05        130.18   4.968e-07       0.001      495  LS failed, Hessian reset 
     299       1402.69    0.00388162       236.035           1           1      499   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     352       1407.87   0.000132523       199.276   6.053e-07       0.001      608  LS failed, Hessian reset 
     399       1415.13    0.00405186       605.383           1           1      670   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     475       1418.04    0.00109236        453.72   2.995e-06       0.001      802  LS failed, Hessian reset 
     499       1419.61    0.00395019       84.3746       3.144           1      834   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     576       1421.13   0.000289641       193.369   3.489e-07       0.001      992  LS failed, Hessian reset 
     596       1422.79   9.85651e-05       123.418   4.184e-07       0.001     1061  LS failed, Hessian reset 
     599       1422.98    0.00196055       157.696           1           1     1065   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     699       1426.51    0.00271886       129.709           1           1     1198   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     799       1429.13   8.46228e-05       119.088   1.039e-06       0.001     1371  LS failed, Hessian reset 
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     899       1432.31    0.00294548       145.666           1           1     1494   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
     980       1433.02   3.31502e-05       99.8096   3.736e-07       0.001     1628  LS failed, Hessian reset 
     999       1433.02   3.24128e-06       86.9645           1           1     1652   
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
    1036       1433.02   7.25061e-08       83.7924      0.6562      0.6562     1706   
Optimization terminated normally: 
  Convergence detected: relative gradient magnitude is below tolerance
In [18]:
future = m.make_future_dataframe(periods=30)
forecast = m.predict(future)
In [19]:
import seaborn as sns
In [20]:
type(df["ds"][0])
Out[20]:
pandas._libs.tslibs.timestamps.Timestamp
In [21]:
plt.figure(figsize=(20,10))
# sns.lineplot(df["ds"], df["y"], label="real")
sns.lineplot(x="ds", y="y", data=df, label='real')
# sns.lineplot(forecast["ds"], forecast["yhat"], label="perdict")
sns.lineplot(x="ds", y="yhat", data=forecast, label='predict')
plt.grid(True)
plt.legend()
plt.vlines(pd.to_datetime("22.10.10",format="%y.%m.%d"),0,700, color="green")
plt.text(pd.to_datetime("22.10.13",format="%y.%m.%d"),500, "----> forecast", size=15, color="red")
plt.title("Dogecoin price history and forecast for the most recent 30 days")
plt.xlabel("Date")
plt.ylabel("Price")
plt.show()
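The chart compares the two curves visually; a numeric error over the held-out 30 days makes the comparison easier to state. The sketch below computes mean absolute error (MAE) and mean absolute percentage error (MAPE) on the dates present in both series. `holdout_errors` and the toy numbers are hypothetical, not results from this notebook:

```python
def holdout_errors(actual, predicted):
    """Return (MAE, MAPE in percent) over the dates shared by both mappings.

    actual and predicted map a date key to a price. Dates where the actual
    price is zero are skipped for MAPE to avoid dividing by zero.
    """
    shared = sorted(set(actual) & set(predicted))
    if not shared:
        raise ValueError("no overlapping dates to compare")
    abs_errors = [abs(actual[d] - predicted[d]) for d in shared]
    mae = sum(abs_errors) / len(abs_errors)
    pct = [abs(actual[d] - predicted[d]) / abs(actual[d]) * 100
           for d in shared if actual[d] != 0]
    mape = sum(pct) / len(pct) if pct else float("nan")
    return mae, mape


# Toy values for illustration only.
actual = {"2022-10-12": 100.0, "2022-10-13": 80.0}
predicted = {"2022-10-12": 90.0, "2022-10-13": 100.0}
mae, mape = holdout_errors(actual, predicted)
print(mae, mape)
```

With the notebook's frames, the inputs could be built as `dict(zip(df["ds"], df["y"]))` and `dict(zip(forecast["ds"].tail(30), forecast["yhat"].tail(30)))`, so that only the forecast window is scored.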
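What I learned from this practice:
- Static pages can be crawled with BeautifulSoup alone, but crawling a dynamic page requires driving a browser with Selenium and reading the rendered source.
- The way Selenium's find_element is called seems to have changed a little: Selenium 4 uses `find_element(By.XPATH, ...)` in place of the older `find_element_by_*` helpers. The approach itself appears to be the same as before.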