Let us agree on one thing: we have all dreamed of predicting stock prices before investing our money. Unfortunately, it is not a simple task, as many factors have to be considered. In this article, we use Prophet, a machine learning tool, to forecast the evolution of a stock price.
Sales predictive analytics is an important part of modern business intelligence, and it can be a very complex problem, especially when some data is missing or outliers are present.
Predicting the sales of a product or the price evolution of a stock can be framed as a time series problem. Different models have been developed to get robust predictions: ARIMA, SARIMA, SARIMAX, GARCH, etc. Each of these methods makes assumptions about the data (periodicity, stationarity, ...) and uses statistical properties of the historical data. We are not going to walk th
At the end of the article, you will be able to:
- Explain the Prophet algorithm to your friends.
- Understand how it can be applied to a concrete use case.
- Start applying it to the different time series problems that you encounter in your studies/work.
PROPHET
Prophet is Facebook’s time series forecasting algorithm, released as open source software in 2017 with implementations in Python and R.
Prophet is a procedure for forecasting time series data. It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with daily data covering at least one year of history. Prophet is robust to missing data, shifts in the trend, and large outliers. As Prophet was developed by Facebook, it is optimized for the business forecasting tasks encountered there, which are characterized by:
- At least a year of history of hourly, daily or weekly observations.
- A reasonable number of missing observations or large outliers.
- Historical trend changes following a major event (a product launch, for instance). These trends grow non-linearly as they approach a saturation level.
Now that we have introduced Prophet, let’s see how it works. It is an additive regression model that exploits three types of information: trend, seasonality and a user-provided list of holidays.
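To make the additive structure concrete, here is a minimal sketch in plain Python. It is not Prophet's implementation: the function names, the linear trend, the sine-shaped weekly pattern and the holiday bump are all illustrative assumptions. The only point is that the forecast is the sum of a trend term, a seasonal term and a holiday term.

```python
import math

def trend(t):
    # Assumed linear trend: starts at 10 and grows by 0.05 per day.
    return 10.0 + 0.05 * t

def weekly_seasonality(t):
    # Assumed smooth weekly pattern with a period of 7 days.
    return 0.5 * math.sin(2 * math.pi * t / 7)

def holiday_effect(t, holiday_days, bump=1.0):
    # Assumed constant bump on the days listed as holidays.
    return bump if t in holiday_days else 0.0

def additive_forecast(t, holiday_days):
    # The model output is simply the sum of the three components.
    return trend(t) + weekly_seasonality(t) + holiday_effect(t, holiday_days)

holiday_days = {14}  # hypothetical holiday on day 14
for day in (0, 7, 14):
    print(day, round(additive_forecast(day, holiday_days), 3))
```

Because the components are added rather than multiplied, each one can be inspected on its own, which is what the component plots later in the article rely on.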
Trend
Prophet automatically detects changes in the trend by selecting changepoints, the points in the data where the growth rate shifts noticeably. This lets it fit a growth curve whose slope can change over time.
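The idea of a trend whose slope changes at a few points can be sketched as a piecewise-linear function. In the example below, the changepoint locations and slope adjustments are hard-coded, hypothetical values; Prophet estimates them from the data, so this only illustrates the shape of the resulting curve.

```python
def piecewise_linear_trend(t, base_level, base_rate, changepoints, rate_deltas):
    """Continuous piecewise-linear trend.

    The slope starts at base_rate and changes by rate_deltas[i]
    once t passes changepoints[i].
    """
    value = base_level + base_rate * t
    for cp, delta in zip(changepoints, rate_deltas):
        if t > cp:
            # Only the time elapsed since the changepoint gets the extra slope,
            # which keeps the curve continuous at the changepoint.
            value += delta * (t - cp)
    return value

changepoints = [10, 20]     # hypothetical changepoint locations (in days)
rate_deltas = [-0.5, 1.0]   # hypothetical slope adjustments at each changepoint

for t in (0, 10, 15, 20, 25):
    print(t, piecewise_linear_trend(t, 5.0, 1.0, changepoints, rate_deltas))
```

With these values the slope is 1.0 until day 10, 0.5 between days 10 and 20, and 1.5 afterwards.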
Seasonality
Prophet models the yearly seasonal component using Fourier series and the weekly seasonal component using dummy variables.
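As a rough illustration of what "using Fourier series" means, the sketch below builds sine and cosine features for a given period and combines them with a set of coefficients. The helper names and the coefficient values are made up for the example; in a fitted model the coefficients would be learned from the data.

```python
import math

def fourier_features(t, period, order):
    """Return [sin(2*pi*n*t/period), cos(2*pi*n*t/period)] for n = 1..order."""
    features = []
    for n in range(1, order + 1):
        angle = 2 * math.pi * n * t / period
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features

def seasonal_value(t, period, coefficients):
    # One coefficient per feature, so the order is half the number of coefficients.
    order = len(coefficients) // 2
    feats = fourier_features(t, period, order)
    return sum(c * f for c, f in zip(coefficients, feats))

yearly_period = 365.25                  # days in a year
coefficients = [0.8, -0.3, 0.2, 0.1]    # made-up coefficients, order 2

print(seasonal_value(0, yearly_period, coefficients))
print(seasonal_value(50, yearly_period, coefficients))
```

A higher order lets the seasonal curve change more quickly within the period, at the cost of a greater risk of fitting noise.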
Holidays
The user provides the list of holidays, which helps Prophet adapt to the data of interest. Holidays can affect the series because many products are launched around those dates and buyers are more inclined to spend.
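In the Python API, the list of holidays is passed as a pandas DataFrame with a "holiday" column and a "ds" column, optionally with "lower_window" and "upper_window" columns to extend the effect to neighbouring days. The sketch below builds such a table; the event names and dates are purely illustrative, and the model fitted later in this article does not use holidays.

```python
import pandas as pd

# Hypothetical events: names and dates are illustrative only.
holidays = pd.DataFrame({
    "holiday": ["christmas", "christmas", "product_launch"],
    "ds": pd.to_datetime(["2016-12-25", "2017-12-25", "2017-09-12"]),
    "lower_window": [-1, -1, 0],  # days before the date that are also affected
    "upper_window": [1, 1, 2],    # days after the date that are also affected
})
print(holidays)

# The table would then be handed to the model, for example:
#   from fbprophet import Prophet
#   m = Prophet(holidays=holidays)
```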
Predict stock prices – Real use case
The objective here is to predict the stock price evolution of a company listed on NYSE or NASDAQ. As in the article about loan repayment, the data is downloaded from Kaggle: Daily Historical Stock Prices.
It contains two datasets that are explored below. To conduct our experiments, we will use Python with the pandas and Prophet libraries.
Dataset Exploration
Stock information dataset
import pandas as pd
stock_information = pd.read_csv("historical_stocks.csv")
stock_information.head()
ticker | exchange | name | sector | industry |
PIH | NASDAQ | 1347 PROPERTY INSURANCE HOLDING, INC. | FINANCE | PROPERTY-CASUALTY INSURERS |
TURN | NASDAQ | 180 DEGREE CAPITAL CORP. | FINANCE | FINANCE/INVESTORS SERVICES |
In the first dataset, we have the description of each ticker: the exchange it is listed on, its name, its sector and the industry it belongs to.
There are 6460 unique tickers in the dataset. Let’s see the distribution of tickers by sector.
The tickers come from 13 different sectors: finance, consumer services, technology, public utilities, capital goods, basic industries, health care, consumer durables, energy, miscellaneous, transportation and consumer non-durables. The majority of tickers come from the financial sector.
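The counts above can be obtained with a couple of pandas calls. The sketch below runs on a tiny stand-in table rather than the real file: the first two rows come from the preview above, and the last two are made up for the example. On the real data, the same calls would be applied to stock_information.

```python
import pandas as pd

# Tiny stand-in for stock_information; the last two rows are made up.
sample = pd.DataFrame({
    "ticker": ["PIH", "TURN", "AAA", "BBB"],
    "sector": ["FINANCE", "FINANCE", "TECHNOLOGY", None],
})

# Number of distinct tickers
n_tickers = sample["ticker"].nunique()

# Tickers per sector; dropna=False keeps tickers with no sector in the counts
sector_counts = sample["sector"].value_counts(dropna=False)

print(n_tickers)
print(sector_counts)
```

Keeping missing values in the counts makes it easier to see whether an unlabeled group is being counted as a sector of its own.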
Stock price dataset
stock_prices = pd.read_csv("historical_stock_prices.csv")
stock_prices.head()
ticker | open | close | adj_close | low | high | volume | date |
AHH | 11.50 | 11.58 | 8.493155 | 11.25 | 11.68 | 4633900 | 2013-05-08 |
AHH | 11.66 | 11.55 | 8.471151 | 11.50 | 11.66 | 275800 | 2013-05-09 |
In the second dataset, we have the price evolution of each ticker: the open, low, high, closing and adjusted closing prices, plus the traded volume, for every trading day.
Predictions
To make our predictions, we will focus on the closing price of one ticker from the financial sector and study its time series. Our choice fell on PIH.
# Copy the slice so the date conversion below does not act on a view of stock_prices
data_PIH = stock_prices[stock_prices["ticker"] == "PIH"].copy()
data_PIH["date"] = pd.to_datetime(data_PIH["date"])
data_PIH.head()
ticker | open | close | adj_close | low | high | volume | date |
PIH | 8.00 | 7.95 | 7.95 | 7.90 | 8.50 | 642900 | 2014-04-01 |
PIH | 7.94 | 8.16 | 8.16 | 7.90 | 8.29 | 228400 | 2014-04-02 |
We have 1095 entries for this ticker, covering the period from 2014/04/01 to 2018/08/24.
Let’s now use Prophet to predict the prices for June, July and August 2018 based on the previous prices.
First, we create the training and test datasets.
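import datetime
import numpy as np
# Train on everything before June 1, 2018 and test on June 1 onwards
limit = np.datetime64(datetime.date(2018, 6, 1))
data_train = data_PIH[data_PIH.date < limit][["date","close"]].dropna()
data_test = data_PIH[data_PIH.date >= limit][["date","close"]].dropna()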
Second, we rename the training data columns to the names Prophet expects: “ds” for the dates and “y” for the values.
data_train.columns = ["ds", "y"]
Third, we fit Prophet on the training data and use the fitted model to make predictions for the three months covered by the test data. For this purpose, we use the “make_future_dataframe” method, which extends the training dates by the requested number of calendar days. Note that Prophet also predicts values for the past dates it was trained on, so we filter the output in order to compare it with the test data later.
from fbprophet import Prophet  # the package is named "prophet" from version 1.0 onwards
m = Prophet()
m.fit(data_train)
# Extend the training dates by 90 calendar days (weekends included)
future = m.make_future_dataframe(periods=90)
forecast = m.predict(future)
# Here we filter on the wanted dates as Prophet predicts the values of the past too
forecast_test = forecast[forecast.ds >= np.datetime64(datetime.date(2018,6,1))]
forecast_test[["ds","yhat","yhat_lower","yhat_upper"]].head()
What I like about Prophet is that it quantifies the uncertainty of its predictions. As you can see, it outputs the predicted value for the date in “ds” as “yhat”, along with the lower bound “yhat_lower” and the upper bound “yhat_upper” of the uncertainty interval (80% wide by default).
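One simple way to check whether these bounds are informative is to measure how often the observed price falls inside them. The sketch below does this on made-up numbers standing in for a forecast joined with the observed values; the column "y" and the frame name "joined" are assumptions for the example. The width of the interval itself is controlled by Prophet's "interval_width" argument.

```python
import pandas as pd

# Made-up values standing in for forecasts joined with observed prices.
joined = pd.DataFrame({
    "y":          [7.0, 7.4, 6.5, 7.1],
    "yhat_lower": [6.8, 6.9, 6.7, 6.9],
    "yhat_upper": [7.3, 7.3, 7.2, 7.4],
})

# True where the observed value lies within the predicted bounds
inside = (joined["y"] >= joined["yhat_lower"]) & (joined["y"] <= joined["yhat_upper"])

# Share of observations covered by the interval
coverage = inside.mean()
print(coverage)
```

A coverage far below the nominal interval width would suggest that the bounds are too optimistic for the series at hand.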
Let’s now use the “plot” method to visualize the forecasted values. The dots are ground truth values from the training data.


We notice that the forecast isn’t very accurate day by day, but Prophet succeeds in capturing the trend and the seasonality. It is a good forecast “on average”. To better understand how it works, we plot the major components (trend, weekly seasonality, and yearly seasonality).
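Both figures discussed here come from Prophet's built-in plotting methods, "plot" and "plot_components". The small wrapper below is a hypothetical helper written for this article; it assumes a fitted model and the forecast table produced by "predict".

```python
def plot_forecast_and_components(model, forecast):
    """Draw the forecast and its components with the model's built-in plots."""
    # Forecast curve, uncertainty band and training points
    fig_forecast = model.plot(forecast)
    # One panel per component: trend, weekly and yearly seasonality
    fig_components = model.plot_components(forecast)
    return fig_forecast, fig_components

# Usage with the fitted model m and the forecast computed earlier:
#   fig_forecast, fig_components = plot_forecast_and_components(m, forecast)
```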


The seasonality information can be useful for investors: for this ticker, the yearly seasonality suggests that buying before July and selling in September has tended to be lucrative over the training period.
Let’s now restrict our output to the dates that occur in the test data and compute the RMSE between the predictions and the ground truth.
We get an RMSE of 0.24, which means that our predictions are typically off by about 0.24 (e.g. if the true value is 7, we predict around 7.24 or 6.76).
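The filtering and the RMSE computation can be sketched as follows. The two small tables are made-up stand-ins for forecast_test and data_test, so the resulting number is not the 0.24 reported above; the names "compared", "errors" and "rmse" are introduced for the example.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for forecast_test and data_test.
forecast_test = pd.DataFrame({
    "ds": pd.to_datetime(["2018-06-01", "2018-06-02", "2018-06-03", "2018-06-04"]),
    "yhat": [7.0, 7.1, 7.1, 7.3],
})
data_test = pd.DataFrame({
    "date": pd.to_datetime(["2018-06-01", "2018-06-04"]),
    "close": [7.3, 6.9],
})

# The inner join keeps only the dates present in both tables,
# which drops the weekend days that Prophet predicted.
compared = forecast_test.merge(data_test, left_on="ds", right_on="date", how="inner")

# Root mean squared error between predictions and observed closing prices
errors = compared["yhat"] - compared["close"]
rmse = float(np.sqrt((errors ** 2).mean()))
print(len(compared), round(rmse, 3))
```

Because the errors are squared before averaging, a few large misses weigh more heavily in the RMSE than they would in a plain average of absolute errors.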
Let’s plot the predictions against the true values to get more insight into the quality of our forecast.


The plot confirms what we said earlier about Prophet being good on average. It cannot be accurate at the extreme values, which correspond to unexpected events impacting the stock price.
Conclusion
In this article, we saw a real use case of the Prophet algorithm, a time series forecasting algorithm developed by Facebook and open-sourced in 2017. This short analysis suggests that Prophet is better suited to long-term investment horizons, as it cannot be accurate in the very short term when unexpected events occur and impact the price.