Research article
Forecasting tourism demand with multisource big data

https://doi.org/10.1016/j.annals.2020.102912

Highlights

  • This study forecasts weekly tourist arrivals to a national park in China.

  • Internet big data from a search engine and online review platforms are employed.

  • Findings suggest the superiority of multiple-source big data forecasting.

  • Forecasting based on online review data from multiple platforms is preferred.

Abstract

Based on internet big data from multiple sources (i.e., the Baidu search engine and two online review platforms, Ctrip and Qunar), this study forecasts tourist arrivals to Mount Siguniang, China. Key findings of this empirical study indicate that (a) tourism demand forecasting based on internet big data from a search engine and online review platforms can significantly improve forecasting performance; (b) compared with tourism demand forecasting based on single-source data from a search engine, demand forecasting based on multisource big data from a search engine and online review platforms demonstrates better performance; and (c) compared with tourism demand forecasting based on online review data from a single platform, forecasting performance based on multiple platforms is significantly better.

Introduction

Tourism demand forecasting plays an important role in the travel and tourism industry, and it provides important implications for destination policymakers and tourism practitioners (Colladon, Guardabascio, & Innarella, 2019). Predicting tourist arrivals is also important for the planning, operation, and management of tourist attractions (Huang, Zhang, & Ding, 2017). Specifically, Dergiades, Mavragani, and Pan (2018) stated that accurate tourism demand forecasting can benefit medium- to long-term marketing and tourism strategy development, pricing policies, investment plans and strategies, and allocation of limited resources. Given its importance, precise and timely tourism demand forecasting has become an increasingly popular topic in academic research.

Traditional tourism demand forecasting relies on structured statistical data published by governments. Yet such forecasting is inherently limited by the delayed and low-frequency publication of these data, which leads to inaccurate predictions (Huang et al., 2017). Internet big data offer a valuable opportunity to provide timely tourism demand forecasting and to increase forecasting accuracy. These data can measure and monitor tourist behaviors and satisfaction in a timely manner while overcoming lags in traditional forecasting methods (Huang et al., 2017). Therefore, internet big data are effective supplements to traditional data sources (Choi & Varian, 2012; Wamba, Akter, Edwards, Chopin, & Gnanzou, 2015). Yang, Pan, and Song (2014) contended that internet big data can reveal tourists' preferences and their changes in real time in addition to providing high-frequency information (e.g., daily or weekly). Up-to-date information on changes in tourist behavior compensates for a limitation of forecasting with traditional data: such methods often fail to forecast tourism demand accurately when one-off events change data patterns (Dergiades et al., 2018).

Extensive research has applied internet big data, such as search engine data or website traffic data, to forecast tourism demand. Several empirical studies have demonstrated the usefulness of search query data in improving the forecasting of tourism demand (Bangwayo-Skeete & Skeete, 2015; Li, Chen, Wang, & Ming, 2018; Li, Pan, Law, & Huang, 2017; Sun, Wei, Tsui, & Wang, 2019), hotel room demand (Pan, Wu, & Song, 2012), and tourist attraction demand (Huang et al., 2017; Peng, Liu, Wang, & Gu, 2017). Apart from search query data, website traffic data have also been found to improve the forecasting accuracy of hotel demand in a destination (Pan & Yang, 2017; Yang et al., 2014).

Tourism businesses and destinations can also gain useful insight from content analysis of social media data, such as online reviews, and customers prefer to trust peer-supplied reviews rather than information from service providers (Xiang, Schwartz, Gerdes Jr, & Uysal, 2015). Similarly, social media data can help practitioners anticipate rapid changes in tourists' preferences and popularity trends related to destinations and local attractions; such information can be gleaned from the number of online reviews and tourists' sentiments embedded within them. Some studies have revealed the usefulness of online reviews in forecasting product sales beyond tourism contexts (Dellarocas, Zhang, & Awad, 2007; Fan, Che, & Chen, 2017; Schneider & Gupta, 2016; Yu, Liu, Huang, & An, 2010). Accordingly, online reviews have been deemed highly important, with the potential to be incorporated into tourism demand predictions (Colladon et al., 2019).

Although previous studies have indicated that internet big data can greatly enhance tourism demand forecasting performance and offer valuable practical implications, several research gaps in tourism forecasting with such data should be addressed. First, most research has relied on volume-based search engine data or website traffic data for tourism demand forecasting; few studies have referred to volume- and sentiment-based social media data, which are much richer and can reflect tourists' attention and sentiments. Volume-based data also have their own shortcomings: a higher volume of website traffic does not necessarily reflect greater consumer interest in visiting a destination; in fact, the opposite may be true. For example, the Hong Kong protests in 2019 garnered increasing online attention, but the number of visitors to Hong Kong actually declined amidst safety concerns. Therefore, it would make sense to integrate volume-based and complementary sentiment-based variables when forecasting tourism demand. In particular, consumer-generated online reviews from online travel websites provide useful reflections of consumers' behaviors and satisfaction (Xiang et al., 2015; Ye, Law, & Gu, 2009), but this type of data has not yet been employed to forecast tourism demand. Second, most prior studies considered internet big data from a single source, either a search engine or the website of a specific destination marketing organization. Few studies have investigated tourism demand forecasting performance when big data from multiple sources are included in a single forecasting model. Overly narrow and insufficiently diverse data are a major cause of poor forecasting performance; models built on such data do not perform well across a variety of cases (Phillips, Dowling, Shaffer, Hodas, & Volkova, 2017). This limitation can be overcome by incorporating data from multiple, often complementary sources (Jia et al., 2016; Pan & Yang, 2017; Phillips et al., 2017).

On this basis, this study will address the following research question: Can incorporating internet big data, including search query data and online review data, into a model improve forecasting accuracy over a model using only internet search query data?

Section snippets

Common methods of tourism demand forecasting

Common approaches to tourism demand forecasting consist of time series models, econometric models, and artificial intelligence (AI) models (Li et al., 2017; Song & Li, 2008). Classical time series models include the naïve model, exponential smoothing models, autoregressive–moving-average (ARMA) models, and structural time series models (Peng, Song, & Crouch, 2014). Although time series models offer distinct advantages in forecasting accuracy, they seldom consider the influencing factors of
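To make two of the classical time series models named above concrete, the following is a minimal Python sketch of a naïve forecast and a simple exponential smoothing forecast. It is an illustration only, not the specification used in this study: the function names, the smoothing parameter `alpha`, and the arrival figures are all assumptions introduced here.

```python
from typing import List


def naive_forecast(history: List[float]) -> float:
    """Naive model: the next value is assumed to equal the last observed value."""
    return history[-1]


def exponential_smoothing_forecast(history: List[float], alpha: float) -> float:
    """Simple exponential smoothing: one-step-ahead forecast.

    The level starts at the first observation and is updated as
    level = alpha * observation + (1 - alpha) * level.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]
    for observation in history[1:]:
        level = alpha * observation + (1.0 - alpha) * level
    return level


# Made-up weekly arrival counts, used only to exercise the functions.
weekly_arrivals = [100.0, 120.0, 110.0, 130.0]

print(naive_forecast(weekly_arrivals))
print(exponential_smoothing_forecast(weekly_arrivals, alpha=0.5))
```

Both functions take only the past values of the series as input; neither has any argument through which an external predictor could enter.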

Methodology

We propose an integrated framework (see Fig. 1) to incorporate search query and online review data into tourism demand forecasting. This framework includes four steps: 1) data collection, 2) data processing and variable calculation, 3) model specification, and 4) model estimation and forecasting performance evaluation. In the first step, we collected three types of data: weekly tourist arrival data from a tourist attraction's official website, search query data from Baidu's search engine, and
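As a rough illustration of what the data processing and variable calculation step could involve for online reviews, the sketch below aggregates dated review records into weekly variables so that they match the weekly frequency of the arrival data. The record layout, the use of ISO weeks, the use of the mean star rating as a stand-in for sentiment, and the function name `weekly_review_variables` are assumptions made for this example; the study's actual variable definitions are not shown in this excerpt.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# (review date, star rating) pairs; invented for illustration only.
reviews: List[Tuple[date, float]] = [
    (date(2019, 1, 7), 5.0),   # Monday, ISO week 2 of 2019
    (date(2019, 1, 9), 3.0),   # Wednesday, ISO week 2 of 2019
    (date(2019, 1, 14), 4.0),  # Monday, ISO week 3 of 2019
]


def weekly_review_variables(
    records: List[Tuple[date, float]],
) -> Dict[Tuple[int, int], Tuple[int, float]]:
    """Return {(iso_year, iso_week): (review_count, mean_rating)}.

    review_count is a volume-based variable; mean_rating is a crude,
    assumed proxy for a sentiment-based variable.
    """
    buckets: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for day, rating in records:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        buckets[(iso[0], iso[1])].append(rating)
    return {
        week: (len(ratings), sum(ratings) / len(ratings))
        for week, ratings in buckets.items()
    }


for week, (count, mean_rating) in sorted(weekly_review_variables(reviews).items()):
    print(week, count, mean_rating)
```

Keying the result by ISO year and week is one simple way to join review variables to weekly arrivals and weekly search volumes; any consistent weekly calendar would serve the same purpose.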

Results

Two groups of comparisons were conducted to answer our research question (see Fig. 3). The first comparison group was used to test whether incorporating internet big data (i.e., search query data and online review data) into a single forecasting model could improve forecasting accuracy; the second comparison group was used to test whether combining online review data from Ctrip and Qunar into one forecasting model could improve the forecasting accuracy.
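A comparison of this kind amounts to computing the same error measure for each competing model on the same hold-out period and checking which is lower. The sketch below shows that calculation with two widely used measures, MAPE and RMSE. The choice of measures, the model labels, and all numbers are assumptions for illustration; they are not taken from the study's results.

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error, in the units of the series."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Made-up hold-out values and forecasts from two hypothetical models.
actual = [100.0, 200.0]
benchmark = [110.0, 180.0]   # e.g., a model without internet big data
augmented = [105.0, 190.0]   # e.g., a model with search and review variables

for name, forecast in (("benchmark", benchmark), ("augmented", augmented)):
    print(name, round(mape(actual, forecast), 2), round(rmse(actual, forecast), 2))
```

A lower error for the augmented model on such a comparison is descriptive evidence only; judging whether the difference is statistically significant would require a formal test on the forecast errors.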

To test whether incorporating search query

Conclusions and implications

Internet big data have revolutionized how tourism demand is forecast (Yang et al., 2014; Volchek et al., 2019). Based on the case of a national park in China, the empirical results of this study revealed that compared with the benchmark model without any internet big data variables, tourism demand forecasting incorporating internet big data from a search engine and online review platforms could significantly improve forecasting performance. Moreover, we found that compared with tourism demand

Acknowledgments

This paper and research project (Project Account Code: 5-ZJLT) are funded by a Research Grant of the Hospitality and Tourism Research Centre (HTRC Grant) of the School of Hotel and Tourism Management, The Hong Kong Polytechnic University. This paper is also supported by the National Natural Science Foundation of China (71761001) and the Hong Kong Scholars Program.

Hengyun Li is an assistant professor in the School of Hotel and Tourism Management at The Hong Kong Polytechnic University.

References (65)

  • W. Huang et al. Forecasting stock market movement direction with support vector machine. Computers & Operations Research (2005)

  • X. Huang et al. The Baidu Index: Uses in predicting tourism flows–A case study of the Forbidden City. Tourism Management (2017)

  • R. Law et al. Tourism demand forecasting: A deep learning approach. Annals of Tourism Research (2019)

  • S. Li et al. Effective tourist volume forecasting supported by PCA and improved BPNN using Baidu index. Tourism Management (2018)

  • X. Li et al. Forecasting tourism demand with composite search index. Tourism Management (2017)

  • P.F. Pai et al. An improved neural network model in forecasting arrivals. Annals of Tourism Research (2005)

  • P.F. Pai et al. A hybrid ARIMA and support vector machines model in stock price forecasting. Omega (2005)

  • B. Peng et al. A meta-analysis of international tourism demand forecasting and implications for practice. Tourism Management (2014)

  • R. Rivera. A dynamic linear model to forecast hotel registrations in Puerto Rico using Google Trends data. Tourism Management (2016)

  • M.J. Schneider et al. Forecasting sales of new and existing products using consumer reviews: A random projections approach. International Journal of Forecasting (2016)

  • H. Song et al. Tourism demand modelling and forecasting-A review of recent research. Tourism Management (2008)

  • S. Sun et al. Forecasting tourist arrivals with machine learning and internet search index. Tourism Management (2019)

  • Z. Xiang et al. What can big data and text analytics tell us about hotel guest experience and satisfaction? International Journal of Hospitality Management (2015)

  • X. Yang et al. Forecasting Chinese tourist volume with search engine data. Tourism Management (2015)

  • Y. Yang et al. Spatial-temporal forecasting of tourism demand. Annals of Tourism Research (2019)

  • Q. Ye et al. The impact of online user reviews on hotel room sales. International Journal of Hospitality Management (2009)

  • H. Akaike. A new look at the statistical model identification

  • A.G. Assaf et al. Modeling and forecasting regional tourism demand using the Bayesian global vector autoregressive (BGVAR) model. Journal of Travel Research (2019)

  • S. Bernard et al. Influence of hyperparameters on random forest accuracy

  • G.E.P. Box et al. Time series analysis, forecasting and control (1994)

  • L. Breiman. Bagging predictors. Machine Learning (1996)

  • L. Breiman. Random forests. Machine Learning (2001)

Mingming Hu is an assistant professor at Guangxi University and a postdoctoral fellow at The Hong Kong Polytechnic University.

Gang Li is professor of tourism economics at the University of Surrey in the UK.
