Review Date - Visit Date Analysis

Each review in the Tripadvisor dataset contains two different dates:

  • review date: the date on which the review was written. It is always available and provides day, month and year.
  • visit date: the date on which the structure was visited by the reviewer. It is not always available and provides only month and year.

We performed an analysis to estimate the difference between the review date and the visit date, that is, the time elapsed between the visit and the moment the review was written. For this purpose, we used only reviews for which both dates are available. Since the exact day of the visit is unavailable, we set the visit date to the first day of the corresponding month. As a consequence, the computed difference can overestimate the actual elapsed time by up to about one month.
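The computation described above can be sketched as follows. This is only a minimal illustration using the Python standard library, not the code used for the analysis: the function name `review_visit_delta` and the sample records are hypothetical.

```python
from datetime import date


def review_visit_delta(review_date, visit_year, visit_month):
    """Days between the review date and the first day of the visit month."""
    # The visit day is unknown, so the first day of the month is assumed.
    visit_date = date(visit_year, visit_month, 1)
    return (review_date - visit_date).days


# Hypothetical records: (review date, visit year, visit month).
# The visit fields are None when the visit date is missing.
reviews = [
    (date(2019, 7, 14), 2019, 7),
    (date(2019, 9, 2), 2019, 8),
    (date(2020, 1, 20), None, None),  # no visit date: excluded below
]

# Keep only reviews for which both dates are available.
deltas = [
    review_visit_delta(review, year, month)
    for review, year, month in reviews
    if year is not None and month is not None
]
print(deltas)  # [13, 32]
```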

The following graph shows the distribution of the time delta, computed for each review as the difference in days between the review date and the visit date. (figure: timedelta)

The following graph shows the cumulative distribution function of the time delta. (figure: timedelta cdf)
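For reference, an empirical cumulative distribution function of this kind can be computed as in the sketch below. The helper `ecdf` and the sample deltas are hypothetical and serve only to illustrate the idea: F(x) is the fraction of reviews whose delta is at most x days.

```python
from bisect import bisect_right


def ecdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F


# Hypothetical deltas in days, not taken from the dataset.
sample_deltas = [5, 13, 23, 23, 34, 40, 83, 120]
F = ecdf(sample_deltas)
print(F(23))  # 0.5
print(F(34))  # 0.625
```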

Descriptive statistics for the distribution (values in days, except for count):

statistic   value
count       281338.000000
mean        40.902345
std         62.461652
min         1.000000
25%         13.000000
50%         23.000000
75%         34.000000
80%         40.000000
90%         83.000000
max         497.000000

The table shows that for 50% of the reviews the difference is at most 23 days and for 75% of the reviews it is at most 34 days. The remaining 25% of reviews belong to the "long tail" of the distribution, which extends up to 497 days.
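Summary statistics of the kind shown in the table can be reproduced with a sketch like the following. It uses only the Python standard library and assumes linear interpolation between data points for the percentiles; the function `describe` and the sample values are hypothetical, and the original analysis may have used a different tool.

```python
import statistics


def describe(values, percentiles=(25, 50, 75, 80, 90)):
    """Summary statistics for a list of deltas expressed in days."""
    # With n=100, quantiles() returns 99 cut points: cuts[p - 1] is the
    # p-th percentile. "inclusive" treats min and max as the 0th and
    # 100th percentiles and interpolates linearly in between.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    summary = {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
    }
    for p in percentiles:
        summary[f"{p}%"] = cuts[p - 1]
    summary["max"] = max(values)
    return summary


# Hypothetical deltas in days, not taken from the dataset.
stats = describe([1, 13, 23, 34, 83])
for name, value in stats.items():
    print(f"{name:<6} {value:.6f}")
```

Reading the percentiles this way makes the interpretation explicit: the value reported for 75% is the delta that three quarters of the reviews do not exceed.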

This result indicates that the error introduced by using the review date instead of the visit date is limited to about a month for the majority of reviews.