For our analyses, we collected data from two websites, tripadvisor.com and booking.com, which host user-generated content in the form of reviews of tourist structures.
The TripAdvisor dataset covers three kinds of structures: attractions, hotels and restaurants.
Number of reviews per structure type:
structure type | number of reviews |
---|---|
attraction | 34315 |
hotel | 71365 |
restaurant | 204308 |
The structure type "attraction" includes items such as "things to do" and "places to visit".
The Booking dataset includes only hotel reviews and was collected between June 2016 and August 2016. In particular, we considered reviews of hotels in Lucca, New York and Paris. For each city, we selected only hotels with more than 1000 reviews, regardless of the language in which the reviews were written.
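The hotel selection rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used to build the dataset: the record layout (dicts with `city`, `hotel_id` and `language` keys) and the names `select_hotels` and `MIN_REVIEWS` are hypothetical, and the sketch assumes that the review count of a hotel is taken from the collected records themselves.

```python
from collections import Counter

# Hypothetical constant: the text states "more than 1000 reviews".
MIN_REVIEWS = 1000


def select_hotels(reviews, min_reviews=MIN_REVIEWS):
    """Keep only reviews of hotels with more than `min_reviews` reviews.

    `reviews` is an iterable of dicts with the assumed keys "city" and
    "hotel_id". Hotels are counted per city, and the review language is
    deliberately ignored, as in the description above.
    """
    reviews = list(reviews)
    # Count the reviews of each (city, hotel) pair.
    counts = Counter((r["city"], r["hotel_id"]) for r in reviews)
    # Strictly greater than the threshold: "more than 1000 reviews".
    return [r for r in reviews if counts[(r["city"], r["hotel_id"])] > min_reviews]


def reviews_per_city(reviews):
    """Return the number of reviews per city for the given records."""
    return Counter(r["city"] for r in reviews)
```

Calling `reviews_per_city(select_hotels(records))` on the raw records would then give per-city totals in the same shape as the table of reviews per city.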
Number of reviews per city:
city | number of reviews |
---|---|
Lucca | 53730 |
New York | 182438 |
Paris | 93164 |