6-Dec

In this update, we explain the ACF and PACF plots and share the results we obtained.

The snippet for ACF and PACF is:

The results we obtained are:

PACF:

 

ACF:

For ‘hotel_avg_daily_rate’, the Partial Autocorrelation Function (PACF) shows a significant spike at lag 1, with subsequent lags falling within the confidence interval, suggesting an AR(1) component for the SARIMA model. The gradual decay in the Autocorrelation Function (ACF) suggests that differencing (d) is needed to attain stationarity, and possibly a non-seasonal MA component.

In contrast, ‘hotel_occup_rate’ displays a strong initial spike in the PACF and significant seasonal spikes in the ACF, indicating potential seasonal MA components. This points to a SARIMA model with an AR(1) component and seasonal differencing, likely SARIMA(1,1,0)x(0,1,Q)12, where ‘Q’ corresponds to the significant seasonal lags observed in the ACF plot. The exact value for ‘Q’ would require further analysis of the seasonal lags, but the presence of clear seasonality suggests it would be non-zero.

In our next update, we will incorporate these values into our SARIMA model and discuss the results we get.

 

4-Dec

In this update, we are going to explore the SARIMA model in more depth and how it will be helpful for our project. SARIMA is a sophisticated time series forecasting model that we have been examining in our ongoing investigation of the BPDA economic indicators dataset. Understanding the link between the Autocorrelation Function (ACF), the Partial Autocorrelation Function (PACF), and the SARIMA model itself is essential to using the model effectively.

ACF shows the correlation of a time series with itself, lagged by x time units. In simpler terms, it tells us how well the current value of the series is related to its past values. PACF, on the other hand, reveals the partial correlation of a time series with its own lagged values, controlling for the values at all shorter lags. It isolates the impact of each lag from the others.

Selecting AR (Autoregressive) terms (p in SARIMA): The PACF plot is instrumental in determining the order of the AR part of the SARIMA model. Significant spikes in the PACF plot indicate potential AR terms. For instance, if the PACF cuts off after lag p, this suggests an AR(p) model.

Selecting MA (Moving Average) terms (q in SARIMA): The ACF plot helps identify the appropriate number of MA terms. A significant spike at lag q in the ACF plot suggests a MA(q) term in the model.

Seasonal Elements (P, D, Q, s in SARIMA): Similar principles apply to the seasonal components of SARIMA. Seasonal spikes in these plots can help determine the P and Q terms, with the lag at which these spikes occur guiding the selection of s (seasonal period).

Applying ACF and PACF analysis to project columns like “hotel_occup_rate” and “avg_daily_rate” will help us understand their underlying patterns and guide the design of our SARIMA models. For example, if the ‘hotel_occup_rate’ ACF plot shows a clear seasonal pattern and the PACF plot displays an abrupt cut-off, that would point to including seasonal components and a non-seasonal AR term in our SARIMA model.

In the next update, we will discuss how the Python code for ACF and PACF works and how it feeds into the SARIMA model.

1-Dec

I’m excited to talk about forecasting because it’s the next important part of our project. Forecasting is the process of making future predictions utilizing information from the past and present. It’s a commonly used technique to forecast consumer behavior, market trends, and the outcomes of governmental changes in several fields, including economics.

What is Forecasting?
Statistical models are used to forecast future values of a variable, such as sales volume, stock prices, or in this case, economic indicators. It’s like looking into a crystal ball and predicting the future by looking at historical patterns.

Why is Forecasting Important?
We forecast primarily in order to plan ahead and make educated decisions. Companies use forecasting to develop strategy, budget resources, and manage inventory. Governments use forecasts to plan policies and get ready for changes in the economy. More precise forecasting can result in more effective strategic planning and decision-making.

Where is Forecasting Used?
Many different sectors and companies use forecasting. It is essential for public planning and environmental management; retail organizations use it to forecast demand and manage stock levels; the transportation sector uses it to predict travel trends and optimize timetables and routes; and the financial industry uses it to anticipate market movements.

How is Forecasting Conducted?
Broadly defined, the process of forecasting comprises data collection and analysis, model selection, and forecast generation using the chosen model. The specific forecasting goals and the kind of data determine which model is best. The process could be as simple as extending the current trend into the future or as complex as using computers to predict shifts in the stock market.

One of the models we use for our project is SARIMA (Seasonal AutoRegressive Integrated Moving Average), also known as Seasonal ARIMA. It is an extension of the ARIMA model designed specifically to handle time series containing seasonal components. Because SARIMA is effective at capturing both the regular patterns and the periodic swings in the data, it is particularly helpful for datasets where seasonality is significant, like our economic indicators collection.

Our study will use SARIMA to forecast Boston's economic indicators. The model is an excellent match for our dataset, which shows both clear seasonal tendencies and broader long-run patterns. With SARIMA, we hope to produce precise and informative forecasts for a number of economic factors, including real estate values and job vacancies.

 

29-Nov

In this latest update, I’ll pick up right where I left off in the previous one. I’ll share how I integrated the Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests into my project using Python. Additionally, I’ll discuss the interpretations and insights I gained from the outcomes of these tests.

Here is the snippet of the Python Code:

Based on the output we can interpret the following:

Starting with the ADF test results, we find that the ‘logan_passengers’ and ‘total_jobs’ series have p-values of 0.985 and 0.948 respectively, which are well above the common significance level of 0.05. This means we cannot reject the null hypothesis of a unit root, suggesting that these series are non-stationary. The implications are significant — it indicates that the data may contain trends or unit roots that could affect the accuracy of our forecasts.

On the other hand, the ‘hotel_avg_daily_rate’ has an ADF statistic of -3.597 with a p-value of 0.0058, which is below the 0.05 threshold. This suggests that the time series is stationary, and we can reject the null hypothesis of a unit root presence.

Turning to the KPSS test, the scenario is somewhat reversed. For ‘logan_passengers’ and ‘total_jobs’, the KPSS statistic is above the critical value, and the p-value is at 0.01, indicating we can reject the null hypothesis of stationarity at the 1% level. This reinforces the ADF test results, confirming that these series are indeed non-stationary. However, ‘hotel_occup_rate’ and ‘hotel_avg_daily_rate’ showed KPSS p-values higher than 0.05, suggesting we cannot reject the null hypothesis, which is consistent with the series being stationary.

This dual approach of using both ADF and KPSS tests gives us a robust understanding of our time series data’s stationarity. For ‘hotel_avg_daily_rate’, which appears to be stationary, we can proceed with further analysis or forecasting without the need for differencing or detrending. However, for ‘logan_passengers’ and ‘total_jobs’, we might consider differencing the data to remove the non-stationarity or apply models that account for trends and seasonal components.

In our next update, we will explore forecasting and its uses in our project.

27-Nov

As we continue to explore Boston’s economy using the BPDA dataset, we’ve identified various trends and seasonal patterns within our key indicators. The next crucial step in our Time Series Analysis is to establish the stationarity of these time series. Stationarity is a vital concept which, in simple terms, means that the statistical characteristics of the series—like the average or variance—remain consistent over time.

Why does stationarity matter? In essence, most forecasting models rely on the assumption that the patterns we see in the past will hold true for the future. If a time series isn’t stationary, it can affect the accuracy of our predictions, leading to unreliable models.

To tackle this, we’ll introduce two popular statistical tests in our next update: the Augmented Dickey-Fuller test (ADF Test) and the Kwiatkowski-Phillips-Schmidt-Shin test (KPSS test).

  1. Augmented Dickey-Fuller (ADF) Test: This test is used to identify the presence of a unit root in the series, which indicates non-stationarity. If the test statistic is less than the critical value, we reject the null hypothesis of a unit root (non-stationarity), suggesting the time series is stationary.
  2. Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test: In contrast to the ADF test, the null hypothesis for the KPSS test is that the time series is stationary. If the test statistic is greater than the critical value, we reject the null hypothesis, indicating the presence of a trend (non-stationarity).

While we focus on the Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, we will also include a crucial component: the Autocorrelation Function (ACF). Integrating ACF with these tests provides a more comprehensive approach to analyzing time series data, like the economic indicators from Boston.

We’ll explore how ACF can be used in conjunction with the ADF and KPSS tests to check for stationarity. While ADF and KPSS tests help us identify whether the series is stationary, ACF will provide insights into the nature of the time dependency within the series. We’ll discuss how ACF helps in selecting and validating models, especially in the context of ARIMA models, where identifying the right order of autoregression is key.

In our next update, we will show how to include these tests in our project and discuss the results we get from them.

24-Nov

I’m pleased to bring you some key findings from the Time Series Analysis we conducted on the BPDA economic indicators dataset. After applying statistical techniques to analyze the data, some patterns and trends have emerged that are worth highlighting. Attached to this update, you’ll find visual representations of the trends we’ve observed.

In the job market, represented by ‘total_jobs’, there is a clear upward trajectory: more jobs are being created over time, suggesting a healthy economy and effective job-creation efforts. This steady increase is a good sign of a strong labor market.

The ‘hotel_avg_daily_rate’ shows a seasonal pattern of peaks and troughs: average daily rates are highest in summer and fall, probably because more people travel for vacations and events during those months.

Similarly, ‘hotel_occup_rate’ follows a seasonal pattern: occupancy is highest in the summer, likely because of tourism, while the lower rates in winter may reflect reduced travel.

Lastly, the ‘logan_passengers’ data displays a pattern with sharp increases and decreases throughout the year. This could be influenced by various factors, including vacation cycles, business travel trends, and broader economic factors.

In our next update, we’ll explore how to determine whether a time series, like the ones we’re studying, is stationary, using two common tests: the Augmented Dickey-Fuller test (ADF Test) and the Kwiatkowski-Phillips-Schmidt-Shin test (KPSS test).

 

22-Nov

Following our introduction to Time Series Analysis, I’m now ready to share how I’m applying this technique to our BPDA economic indicators project.

My primary goal is to uncover patterns and trends in Boston’s economy over time. To do this, I’m analyzing data points like employment rates, housing prices, passenger traffic at Logan Airport, and hotel occupancy rates. By examining these data points in a time-ordered sequence, we can observe how they have evolved and identify any recurring patterns.

Here’s how I am doing it: First, I’m looking for trends in the dataset. This means identifying whether certain economic indicators have been increasing, decreasing, or staying relatively constant over the years. For example, I might find that hotel occupancy rates have shown an upward trend during summer months, indicating peak tourist activity.

Next, I’m examining seasonality. This involves identifying patterns that repeat at regular time intervals. Do housing prices consistently go up in certain months? Is there a particular time of year when job numbers spike? Understanding these seasonal trends can be crucial for planning and policy-making.

I’m also on the lookout for any outliers or unusual data points that don’t fit the regular patterns. These could signify extraordinary events or changes in the economic environment of Boston.

A snippet of the Python code I used for the Time Series Analysis is attached.

In my upcoming update, I’ll share some of the key findings and insights from this analysis. Stay tuned for more detailed insights!