Let us agree on one thing: we have all dreamed of predicting stock prices to guide our investments. Unfortunately, it is not a simple task, as many factors come into play. In this article, we use Prophet, a time series forecasting tool, to forecast the evolution of stock prices.

Predictive sales analytics is an important part of modern business intelligence, and it can be a very complex problem, especially when data is missing or outliers are present.

Predicting the sales of a product or the price evolution of a stock can be treated as a time series problem. Several models have been developed to produce robust predictions: ARIMA, SARIMA, SARIMAX, GARCH, etc. Each of these methods makes assumptions about the data (periodicity, stationarity, etc.) and uses statistical properties of the historical data. We are not going to walk through these methods, as they are well established and excellent tutorials already exist. Instead, we focus on a more recent model, often presented as state of the art, developed by Facebook’s research team: Prophet.

At the end of the article, you will be able to:

• Explain the Prophet algorithm to your friends.
• Understand how it can be applied to a concrete use case.
• Start applying it to the different time series problems that you encounter in your studies/work.

## PROPHET

Prophet is Facebook’s time series forecasting algorithm, released as open source software in 2017 with implementations in Python and R.

Prophet is a procedure for forecasting time series data. It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with daily data covering at least one year of history. Prophet is robust to missing data, shifts in the trend, and large outliers.

Because Prophet was developed at Facebook, it is optimized for the business forecasting tasks encountered there, which are characterized by:

• At least a year of history of hourly, daily or weekly observations.
• A reasonable number of missing observations or large outliers.
• Historical trend changes following a major event (a product launch, for instance), with non-linear growth that flattens as it approaches a saturation level.

Now that we have introduced Prophet, let’s see how it works. It is an additive regression model that exploits three types of information: trend, seasonality, and a user-provided list of holidays.
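The additive structure can be written compactly. The notation below is borrowed from the Prophet paper rather than from this article, so the symbols are an addition here: $g(t)$ is the trend, $s(t)$ the seasonality, $h(t)$ the holiday effect, and $\varepsilon_t$ the noise that the model does not explain.

$$
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
$$

Each of the three sections below describes one of these terms.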

### Trend

Prophet automatically detects changes in the trend by selecting points in the data, called changepoints, where the growth rate shifts markedly. This lets it fit a piecewise growth curve.
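To make the idea of changepoints concrete, here is a minimal NumPy sketch of a piecewise-linear trend, which is the shape of Prophet’s default trend. The function name, the changepoint location, and the rate adjustment are illustrative assumptions; Prophet estimates these quantities itself during fitting, and this is not its actual implementation.

```python
import numpy as np

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Continuous piecewise-linear trend (illustrative helper, not Prophet code).

    k: base growth rate, m: base offset,
    changepoints: times where the rate changes,
    deltas: rate adjustment applied from each changepoint onward.
    """
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    d = np.asarray(deltas, dtype=float)
    # a[i, j] is 1 when time t[i] is at or after changepoint s[j]
    a = (t[:, None] >= s[None, :]).astype(float)
    slope = k + a @ d
    # Offset correction that keeps the curve continuous at each changepoint
    offset = m + a @ (-s * d)
    return slope * t + offset

# Hypothetical example: the growth rate drops from 1.0 to 0.5 at t = 5
t = np.arange(10)
trend = piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[5.0], deltas=[-0.5])
```

In this toy example the trend grows by 1 per step before the changepoint and by 0.5 per step after it, without any jump at the changepoint itself.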

### Seasonality

Prophet models the yearly seasonal component using Fourier series and the weekly seasonal component using dummy variables.
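As a rough illustration of what “using Fourier series” means, the sketch below builds sine and cosine features for a yearly period. The function name and the order of 3 are assumptions chosen for readability, not values taken from this article; a seasonal curve is then a weighted sum of these columns, with the weights learned from the data.

```python
import numpy as np

def fourier_features(t_days, period, order):
    """Return a (len(t_days), 2 * order) matrix of sin/cos seasonal features."""
    t = np.asarray(t_days, dtype=float)
    columns = []
    for n in range(1, order + 1):
        angle = 2.0 * np.pi * n * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)

# Two years of daily time stamps, expressed as day counts
t_days = np.arange(730)
X_yearly = fourier_features(t_days, period=365.25, order=3)
```

A higher order lets the seasonal curve change more quickly within the year, at the cost of a higher risk of overfitting.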

### Holidays

The user provides a list of holidays so that Prophet can adapt to the data of interest. Holidays can affect the series: many products are launched around that time, and buyers are more inclined to spend.
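In practice, the holiday list is a pandas DataFrame with a `holiday` name column and a `ds` date column; the optional `lower_window` and `upper_window` columns extend the effect to the days around each date. The specific holidays and windows below are illustrative assumptions, not the ones used in this article.

```python
import pandas as pd

# Hypothetical holiday list for illustration only
holidays = pd.DataFrame({
    "holiday": ["christmas", "christmas", "independence_day", "independence_day"],
    "ds": pd.to_datetime(["2016-12-25", "2017-12-25", "2016-07-04", "2017-07-04"]),
    "lower_window": [-1, -1, 0, 0],  # also count the day before Christmas
    "upper_window": [1, 1, 0, 0],    # and the day after
})

# The frame would then be passed to the model constructor:
# m = Prophet(holidays=holidays)
```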

## Predict stock prices – Real use case

The objective here is to predict the stock price evolution of a company listed on the NYSE or NASDAQ. As in the article about loan repayment, the data comes from Kaggle: Daily Historical Stock Prices.

It contains two datasets, which are explored below. To conduct our experiments, we will use Python with the pandas and Prophet libraries.

### Dataset Exploration

#### Stock information dataset

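```python
import pandas as pd
stocks = pd.read_csv("historical_stocks.csv")  # assumed file name for the stock information dataset
```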
| ticker | exchange | name | sector | industry |
|---|---|---|---|---|
| PIH | NASDAQ | 1347 PROPERTY INSURANCE HOLDING, INC. | FINANCE | PROPERTY-CASUALTY INSURERS |
| TURN | NASDAQ | 180 DEGREE CAPITAL CORP. | FINANCE | FINANCE/INVESTORS SERVICES |

The first dataset describes the tickers: the exchange they are listed on, their name, their sector, and the industry they belong to.

There are 6460 unique tickers in the dataset. Let’s see the distribution of tickers by sector.

The tickers span 13 different sectors, including finance, consumer services, technology, public utilities, capital goods, basic industries, health care, consumer durables, energy, miscellaneous, transportation, and consumer non-durables. The majority of tickers come from the financial sector.
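Counts like these can be obtained with a couple of pandas calls. The sketch below runs on a tiny stand-in frame so that it is self-contained: PIH and TURN come from the preview above, while the `DEMO` tickers and their sectors are made up for illustration. On the real data, the same calls would be applied to the full stock information table.

```python
import pandas as pd

# Tiny stand-in for the stock information dataset (DEMO rows are hypothetical)
stocks_demo = pd.DataFrame({
    "ticker": ["PIH", "TURN", "DEMO1", "DEMO2", "DEMO3"],
    "sector": ["FINANCE", "FINANCE", "FINANCE", "TECHNOLOGY", "ENERGY"],
})

# Number of distinct tickers
n_tickers = stocks_demo["ticker"].nunique()

# Number of distinct tickers per sector, largest sector first
tickers_by_sector = (
    stocks_demo.groupby("sector")["ticker"]
    .nunique()
    .sort_values(ascending=False)
)
```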

#### Stock price dataset

```python
stock_prices = pd.read_csv("historical_stock_prices.csv")
```
| ticker | open | close | adj_close | low | high | volume | date |
|---|---|---|---|---|---|---|---|
| AHH | 11.50 | 11.58 | 8.493155 | 11.25 | 11.68 | 4633900 | 2013-05-08 |
| AHH | 11.66 | 11.55 | 8.471151 | 11.50 | 11.66 | 275800 | 2013-05-09 |

The second dataset contains the price history of the tickers: the open, low, high, closing, and adjusted closing prices, along with the traded volume, for every working day.

### Predictions

To make our predictions, we will focus on the closing price of one ticker from the financial sector and study its time series. Our choice fell on PIH.

```python
# Work on a copy so that adding a column does not trigger SettingWithCopyWarning
data_PIH = stock_prices[stock_prices["ticker"] == "PIH"].copy()
data_PIH["date"] = pd.to_datetime(data_PIH["date"])
```
| ticker | open | close | adj_close | low | high | volume | date |
|---|---|---|---|---|---|---|---|
| PIH | 8.00 | 7.95 | 7.95 | 7.90 | 8.50 | 642900 | 2014-04-01 |
| PIH | 7.94 | 8.16 | 8.16 | 7.90 | 8.29 | 228400 | 2014-04-02 |

We have 1095 entries for this ticker, covering the period from 2014/04/01 to 2018/08/24.

Let’s now use Prophet to predict the prices for June, July and August 2018 based on the previous prices.

First, we create the training and test datasets.

```python
import datetime
import numpy as np

limit = np.datetime64(datetime.date(2018, 6, 1))
data_train = data_PIH[data_PIH.date < limit][["date", "close"]].dropna()
data_test = data_PIH[data_PIH.date >= limit][["date", "close"]].dropna()  # >= keeps June 1 in the test set
```

Second, we adapt the training data column names to what Prophet expects: the date column must be named “ds” and the values column must be named “y”.

```python
data_train.columns = ["ds", "y"]
```

Third, we fit Prophet on the training data and use the fitted model to make predictions for the three months covered by the test data. For this purpose, we use the “make_future_dataframe” method. Be aware that Prophet also predicts values for the past dates it was trained on, so we filter the output in order to compare it with the test data later.

```python
from prophet import Prophet  # the package was named fbprophet before version 1.0

m = Prophet()
m.fit(data_train)

# Extend the training dates by 90 days into the future
future = m.make_future_dataframe(periods=90)

forecast = m.predict(future)

# Keep only the forecast horizon, as Prophet predicts the values of the past too
forecast_test = forecast[forecast.ds >= np.datetime64(datetime.date(2018, 6, 1))]
```
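One detail worth keeping in mind: by default, `make_future_dataframe` adds daily dates, so the 90 periods are calendar days and include weekends, when no prices are quoted. The pandas-only sketch below reproduces the dates involved, assuming the last training day is 2018-05-31 (an assumption; the article does not state it).

```python
import pandas as pd

# Assumption: last trading day in the training data
last_train_day = pd.Timestamp("2018-05-31")

# 90 consecutive calendar days after the last training day
future_days = pd.date_range(
    start=last_train_day + pd.Timedelta(days=1), periods=90, freq="D"
)

# Saturdays and Sundays have dayofweek 5 and 6
n_weekend_days = int((future_days.dayofweek >= 5).sum())
```

Because part of the horizon falls on non-trading days, predictions and test prices have to be matched on their dates before being compared.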

What I like about Prophet is that it gives information about the uncertainty of its predictions. For each date in “ds”, the forecast contains the predicted value “yhat”, together with the lower bound “yhat_lower” and the upper bound “yhat_upper” of the uncertainty interval (80% by default).

Let’s use the “plot” method now to visualize the forecasted values. The dots are ground truth values from the training data.

We notice that the forecast isn’t very accurate when it comes to day-to-day performance, but Prophet succeeds in capturing the trend and the seasonality. It is a good forecast “on average”. To better understand how it works, we plot the major components (trend, weekly seasonality, and yearly seasonality).
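Both figures come from methods of the fitted model: `plot` draws the forecast with its uncertainty band, and `plot_components` draws the trend and seasonality panels. The small wrapper below is a hypothetical convenience function, not part of Prophet; it only assumes that `model` is a fitted Prophet instance and `forecast` is the output of its `predict` method.

```python
def plot_forecast_and_components(model, forecast):
    """Return (forecast figure, components figure) for a fitted model.

    Hypothetical helper: it simply calls the model's own plotting methods.
    """
    fig_forecast = model.plot(forecast)
    fig_components = model.plot_components(forecast)
    return fig_forecast, fig_components

# Usage with the objects defined earlier in the article:
# fig_forecast, fig_components = plot_forecast_and_components(m, forecast)
```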

The seasonality information can be useful for investors: here, the yearly seasonality suggests that, for this ticker and over the period studied, it was lucrative to invest before July and sell in September.

Let’s now restrict our output to the dates present in the test data and compute the RMSE between the predictions and the ground truth.

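We get an RMSE of 0.24, which means that our predictions are typically off by about 0.24 (e.g., if the true value is 7, we predict roughly 7.24 or 6.76). Keep in mind that RMSE weights large errors more heavily than a plain average of absolute errors would.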
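The computation itself is short. The sketch below joins predictions and observations on their dates and then applies the RMSE formula. The helper name and the toy numbers are made up for illustration, and the frames are assumed to use Prophet’s “ds”, “yhat”, and “y” column names; this is not the exact code behind the 0.24 figure.

```python
import numpy as np
import pandas as pd

def rmse_on_common_dates(forecast, test):
    """RMSE between forecast['yhat'] and test['y'] on the dates present in both."""
    merged = forecast[["ds", "yhat"]].merge(test[["ds", "y"]], on="ds", how="inner")
    errors = merged["yhat"] - merged["y"]
    return float(np.sqrt(np.mean(errors ** 2)))

# Toy data: the forecast has one extra date (a Saturday) with no observed price
forecast_demo = pd.DataFrame({
    "ds": pd.to_datetime(["2018-06-01", "2018-06-02", "2018-06-04"]),
    "yhat": [7.0, 7.1, 7.2],
})
test_demo = pd.DataFrame({
    "ds": pd.to_datetime(["2018-06-01", "2018-06-04"]),
    "y": [7.3, 6.8],
})
rmse_demo = rmse_on_common_dates(forecast_demo, test_demo)
```

The inner join drops forecast dates that have no matching observation, such as weekends, so only real trading days contribute to the metric.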

Let’s plot the predictions against the true values to get more insight into the quality of our forecast.

The plot confirms what we said earlier about Prophet being good on average. It cannot be accurate at the extreme values, which correspond to unexpected events impacting the stock price.

## Conclusion

In this article, we saw a real use case of Prophet, a time series forecasting algorithm developed by Facebook and open-sourced in 2017. This short analysis suggests that Prophet is better suited to long-term investment horizons, as it cannot be accurate in the very short term when unexpected events occur and impact the price.

I am a Moroccan data scientist based in France who believes that the African continent has lost several development opportunities in the past and it shouldn’t miss the artificial intelligence revolution because faster and more efficient processes are needed nowadays.