Research article
Forecasting tourism demand with multisource big data
Introduction
Tourism demand forecasting plays an important role in the travel and tourism industry, and it provides important implications for destination policymakers and tourism practitioners (Colladon, Guardabascio, & Innarella, 2019). Predicting tourist arrivals is also important for the planning, operation, and management of tourist attractions (Huang, Zhang, & Ding, 2017). Specifically, Dergiades, Mavragani, and Pan (2018) stated that accurate tourism demand forecasting can benefit medium- to long-term marketing and tourism strategy development, pricing policies, investment plans and strategies, and allocation of limited resources. Given its importance, precise and timely tourism demand forecasting has become an increasingly popular topic in academic research.
Traditional tourism demand forecasting relies on structured statistical data published by governments. Yet such forecasting is inherently limited by the delayed and low-frequency publication of these data, which leads to inaccurate predictions (Huang et al., 2017). Internet big data offer a valuable opportunity to provide timely tourism demand forecasting and to increase forecasting accuracy. These data can measure and monitor tourist behaviors and satisfaction in a timely manner while overcoming lags in traditional forecasting methods (Huang et al., 2017). Therefore, internet big data are effective supplements to traditional data sources (Choi & Varian, 2012; Wamba, Akter, Edwards, Chopin, & Gnanzou, 2015). Yang, Pan, and Song (2014) contended that internet big data can reveal tourists' preferences and their changes in real time in addition to providing high-frequency information (e.g., daily or weekly). Up-to-date information on changes in tourist behavior compensates for the limitations of forecasting with traditional data, which often fails to predict tourism demand accurately when one-off events change data patterns (Dergiades et al., 2018).
Extensive research has applied internet big data, such as search engine data or website traffic data, to forecast tourism demand. Several empirical studies have demonstrated the usefulness of search query data in improving the forecasting of tourism demand (Bangwayo-Skeete & Skeete, 2015; Li, Chen, Wang, & Ming, 2018; Li, Pan, Law, & Huang, 2017; Sun, Wei, Tsui, & Wang, 2019), hotel room demand (Pan, Wu, & Song, 2012), and tourist attraction demand (Huang et al., 2017; Peng, Liu, Wang, & Gu, 2017). Apart from search query data, website traffic data have also been found to improve the forecasting accuracy of hotel demand in a destination (Pan & Yang, 2017; Yang et al., 2014).
Tourism businesses and destinations can also gain useful insight from content analysis of social media data, such as online reviews; customers prefer to trust peer-supplied reviews over information from service providers (Xiang, Schwartz, Gerdes Jr, & Uysal, 2015). Social media data can also help practitioners anticipate rapid changes in tourists' preferences and popularity trends related to destinations and local attractions; such information can be gleaned from the number of online reviews and the sentiments tourists express in them. Some studies have revealed the usefulness of online reviews in forecasting product sales beyond tourism contexts (Dellarocas, Zhang, & Awad, 2007; Fan, Che, & Chen, 2017; Schneider & Gupta, 2016; Yu, Liu, Huang, & An, 2010). Accordingly, online reviews have been deemed highly important, with the potential to be incorporated into tourism demand predictions (Colladon et al., 2019).
Although previous studies have indicated that internet big data can greatly enhance tourism demand forecasting performance and offer valuable practical implications, several research gaps in tourism forecasting with such data should be addressed. First, most research has relied on volume-based search engine data or website traffic data for tourism demand forecasting; few studies have used volume- and sentiment-based social media data, which are much richer and can reflect both tourists' attention and their sentiments. Volume-based data alone have shortcomings: a higher volume of website traffic does not necessarily reflect greater consumer interest in visiting a destination; in fact, the opposite may be true. For example, the Hong Kong protests in 2019 garnered increasing online attention, but the number of visitors to Hong Kong actually declined amidst safety concerns. Therefore, it would make sense to integrate volume-based and complementary sentiment-based variables when forecasting tourism demand. In particular, consumer-generated online reviews from online travel websites provide useful reflections of consumers' behaviors and satisfaction (Xiang et al., 2015; Ye, Law, & Gu, 2009), but this type of data has not yet been employed to forecast tourism demand. Second, most prior studies considered internet big data from a single source, either a search engine or the website of a specific destination marketing organization; few have investigated tourism demand forecasting performance when big data from multiple sources are included in a single forecasting model. Overly narrow and insufficiently diverse data are major causes of poor model forecasting; under such circumstances, models do not perform well in a variety of cases (Phillips, Dowling, Shaffer, Hodas, & Volkova, 2017). This limitation can be overcome by incorporating data from multiple, often complementary sources (Jia et al., 2016; Pan & Yang, 2017; Phillips et al., 2017).
On this basis, this study will address the following research question: Can incorporating internet big data, including search query data and online review data, into a model improve forecasting accuracy over a model using only internet search query data?
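To make the idea of combining volume-based and sentiment-based variables concrete, the following minimal sketch aligns a weekly search volume series with weekly review volume and mean review rating. It is an illustration only, not the procedure used in this study: the function names, the use of the mean star rating as a sentiment proxy, the ISO-week alignment, and all data values are assumptions introduced here.

```python
from collections import defaultdict
from datetime import date


def week_key(day):
    """Return the ISO (year, week) pair used to align all weekly series."""
    iso = day.isocalendar()
    return (iso[0], iso[1])


def weekly_review_features(reviews):
    """Aggregate (date, rating) records into weekly volume and mean rating."""
    ratings_by_week = defaultdict(list)
    for day, rating in reviews:
        ratings_by_week[week_key(day)].append(rating)
    return {
        week: {"review_volume": len(r), "mean_rating": sum(r) / len(r)}
        for week, r in ratings_by_week.items()
    }


def merge_sources(search_volume, review_features):
    """Join search and review series on the week key (weeks present in both)."""
    shared_weeks = sorted(search_volume.keys() & review_features.keys())
    return [
        {"week": w, "search_volume": search_volume[w], **review_features[w]}
        for w in shared_weeks
    ]


# Hypothetical data: attention rises in the second week while ratings fall.
reviews = [(date(2019, 6, 3), 5), (date(2019, 6, 5), 4), (date(2019, 6, 10), 2)]
search_volume = {week_key(date(2019, 6, 3)): 120, week_key(date(2019, 6, 10)): 300}
rows = merge_sources(search_volume, weekly_review_features(reviews))
```

In this made-up example the second week shows more search activity but a lower mean rating, the kind of divergence that a volume-only predictor could not distinguish from growing interest in visiting.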
Common methods of tourism demand forecasting
Common approaches to tourism demand forecasting consist of time series models, econometric models, and artificial intelligence (AI) models (Li et al., 2017; Song & Li, 2008). Classical time series models include the naïve model, exponential smoothing model, autoregressive–moving-average (ARMA) models, and structural time series model (Peng, Song, & Crouch, 2014). Although time series models offer distinct advantages in forecasting accuracy, they seldom consider the influencing factors of
Methodology
We propose an integrated framework (see Fig. 1) to incorporate search query and online review data into tourism demand forecasting. This framework includes four steps: 1) data collection, 2) data processing and variable calculation, 3) model specification, and 4) model estimation and forecasting performance evaluation. In the first step, we collected three types of data: weekly tourist arrival data from a tourist attraction's official website, search query data from Baidu's search engine, and
Results
Two groups of comparisons were conducted to answer our research question (see Fig. 3). The first comparison group was used to test whether incorporating internet big data (i.e., search query data and online review data) into a single forecasting model could improve forecasting accuracy; the second comparison group was used to test whether combining online review data from Ctrip and Qunar into one forecasting model could improve the forecasting accuracy.
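The logic of such a comparison can be sketched as scoring each candidate model's out-of-sample forecasts against the observed arrivals and ranking the models by error. The sketch below is a hedged illustration: the choice of MAPE and RMSE as error measures, the model labels, and every number are assumptions made for this example and are not results from the study.

```python
from math import sqrt


def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def rmse(actual, forecast):
    """Root mean squared error, in the units of the series."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return sqrt(sum(squared) / len(squared))


def compare_models(actual, forecasts):
    """Score every model's forecasts against the same observed values."""
    return {
        name: {"mape": mape(actual, f), "rmse": rmse(actual, f)}
        for name, f in forecasts.items()
    }


# Hypothetical weekly arrivals and forecasts (made-up numbers).
actual = [1000, 1200, 900, 1100]
forecasts = {
    "benchmark": [900, 1000, 1000, 1000],
    "search_only": [950, 1100, 950, 1050],
    "search_plus_reviews": [980, 1180, 910, 1090],
}
scores = compare_models(actual, forecasts)
best = min(scores, key=lambda name: scores[name]["mape"])
```

The same function can serve the second comparison group by passing forecasts from models that use review data from one platform, the other platform, or both. Lower error alone does not establish a statistically significant improvement; a formal test of equal predictive accuracy would be a separate step.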
To test whether incorporating search query
Conclusions and implications
Internet big data have revolutionized how tourism demand is forecasted (Yang et al., 2014; Volchek et al., 2019). Based on the case of a national park in China, the empirical results of this study revealed that compared with the benchmark model without any internet big data variables, tourism demand forecasting incorporating internet big data from a search engine and online review platforms could significantly improve forecasting performance. Moreover, we found that compared with tourism demand
Acknowledgments
This paper and research project (Project Account Code: 5-ZJLT) are funded by a Research Grant of the Hospitality and Tourism Research Centre (HTRC Grant) of the School of Hotel and Tourism Management, The Hong Kong Polytechnic University. This paper is also supported by the National Natural Science Foundation of China (71761001) and the Hong Kong Scholars Program.
Hengyun Li is an assistant professor in the School of Hotel and Tourism Management at The Hong Kong Polytechnic University ([email protected]).
References (65)
- et al. A novel approach to model selection in tourism demand modelling. Tourism Management (2015)
- et al. Can Google data improve the forecasting performance of tourist arrivals? Mixed-data sampling approach. Tourism Management (2015)
- et al. Support vector regression with genetic algorithms in forecasting tourism demand. Tourism Management (2007)
- A comparison of three different approaches to tourist arrival forecasting. Tourism Management (2003)
- et al. Forecasting tourism demand to Catalonia: Neural networks vs. time series models. Economic Modelling (2014)
- et al. Exploring the value of online product reviews in forecasting sales: The case of motion pictures. Journal of Interactive Marketing (2007)
- et al. Google Trends and tourists' arrivals: Emerging biases and proposed corrections. Tourism Management (2018)
- et al. Product sales forecasting using online reviews and historical sales data: A method combining the Bass model and sentiment analysis. Journal of Business Research (2017)
- et al. Forecasting city arrivals with Google Analytics. Annals of Tourism Research (2016)
- et al. Testing the equality of prediction mean squared errors. International Journal of Forecasting (1997)
- Forecasting stock market movement direction with support vector machine. Computers & Operations Research
- The Baidu Index: Uses in predicting tourism flows–A case study of the Forbidden City. Tourism Management
- Tourism demand forecasting: A deep learning approach. Annals of Tourism Research
- Effective tourist volume forecasting supported by PCA and improved BPNN using Baidu index. Tourism Management
- Forecasting tourism demand with composite search index. Tourism Management
- An improved neural network model in forecasting arrivals. Annals of Tourism Research
- A hybrid ARIMA and support vector machines model in stock price forecasting. Omega
- A meta-analysis of international tourism demand forecasting and implications for practice. Tourism Management
- A dynamic linear model to forecast hotel registrations in Puerto Rico using Google Trends data. Tourism Management
- Forecasting sales of new and existing products using consumer reviews: A random projections approach. International Journal of Forecasting
- Tourism demand modelling and forecasting-A review of recent research. Tourism Management
- Forecasting tourist arrivals with machine learning and internet search index. Tourism Management
- What can big data and text analytics tell us about hotel guest experience and satisfaction? International Journal of Hospitality Management
- Forecasting Chinese tourist volume with search engine data. Tourism Management
- Spatial-temporal forecasting of tourism demand. Annals of Tourism Research
- The impact of online user reviews on hotel room sales. International Journal of Hospitality Management
- A new look at the statistical model identification
- Modeling and forecasting regional tourism demand using the Bayesian global vector autoregressive (BGVAR) model. Journal of Travel Research
- Influence of hyperparameters on random forest accuracy
- Time series analysis, forecasting and control
- Bagging predictors. Machine Learning
- Random forests. Machine Learning
Mingming Hu is an assistant professor at Guangxi University and a postdoctoral fellow at The Hong Kong Polytechnic University.
Gang Li is professor of tourism economics at the University of Surrey in the UK.