Michela Fazzolari, Marinella Petrocchi, Alessandro Tommasi, Cesare Zavattari

This webpage contains additional material which, due to page limits, was not included in the research paper submitted to the Intl. Conference on Web Engineering (ICWE 2017). In particular, we report here the abstract and the structure of the paper, filling the sections with material that does not appear in the submitted paper.

Abstract

In this paper, we propose a novel approach for aggregating online reviews according to the opinions they express. Our methodology is unsupervised, since it does not rely on pre-labeled reviews, and agnostic, since it makes no assumption about the domain or the language of the review content. We do not adopt opinion mining techniques; rather, we propose a novel metric that measures the adherence of the review content to the domain terminology extracted from the review set. First, we demonstrate the informativeness of the adherence metric with respect to the score associated with a review. Then, we successfully apply this novel approach to group reviews according to the opinions they express. Our extensive experimental campaign was carried out on two large datasets of hotel reviews and product reviews, collected from Booking and Amazon, respectively.

Summary

  1. Introduction
  2. Review Adherence to Typical Terminology
    2.1 Extracting the Terminology
    2.2 Adherence Definition
  3. Datasets
  4. Experiments and Results
    4.1 Adherence Informativeness
    4.2 Good Opinions, Higher Adherence
    4.3 Extension to Different Languages
    4.4 Language-Agnostic Reviews Clustering
    4.5 Representative Terms in First and Last Bins
  5. Related Work
  6. Final Remarks

1. Introduction

In this paper, we propose an original approach to aggregating reviews with similar opinions. The proposed approach is unsupervised, since it does not rely on labelled reviews and training phases. Moreover, it is agnostic, requiring no prior knowledge of either the domain or the language of the reviews. Reviews are grouped by means of a newly introduced metric, called adherence, which measures how much a review text inherits from a reference terminology, automatically extracted from an unannotated corpus of reviews. Through an extensive experimental campaign over two large review datasets from Booking and Amazon, in different languages, we first demonstrate that adherence is informative, since it is correlated with the review score. Then, we exploit adherence to aggregate reviews according to their positiveness. A further analysis of such groups highlights the most characteristic terms therein. This leads to the additional result of learning the best and worst features of a product.

2. Review Adherence to Typical Terminology

In this section, we define the adherence metric. Adherence measures how much one review adheres to the reference terminology extracted from a review set. All the material for this section is in the submitted paper.

3. Datasets

We consider two large datasets, composed of reviews from Booking and Amazon. All the material for this section is in the submitted paper.

4. Experiments and Results

This section describes the experiments and their results.

4.1 Adherence Informativeness

First, we show the correlation between adherence values and review scores. Here, we report the plots of score vs adherence for both Booking and Amazon, in the balanced and in the unbalanced case. Note: the submitted paper reports only the plots for the balanced datasets.
Figure: Score vs Adherence, Amazon dataset, unbalanced
Figure: Score vs Adherence, Amazon dataset, balanced
Figure: Score vs Adherence, Booking dataset, unbalanced
Figure: Score vs Adherence, Booking dataset, balanced
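The plots relate each score to the adherence of the reviews carrying it. One way to condense such a relation into a single number is sketched below, assuming each review is reduced to a (score, adherence) pair. Pearson's coefficient is our illustrative choice, not necessarily the statistic used in the paper; `pearson` is a hypothetical helper and the toy values are invented, not taken from the datasets.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Hypothetical toy data: (score, adherence) pairs, not taken from the datasets.
reviews = [(1, 0.12), (2, 0.18), (3, 0.20), (4, 0.27), (5, 0.31)]
scores = [s for s, _ in reviews]
adherences = [a for _, a in reviews]
r = pearson(scores, adherences)
print(f"r = {r:.3f}")  # prints r = 0.990
```

A rank-based coefficient (Spearman) would be a reasonable alternative, since review scores are ordinal.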

Note that, when working on balanced bins, the typical terminology of the review corpus was recomputed, instead of reusing the one extracted from the original, unbalanced dataset. Thus, even for the least populated bin, which keeps all its reviews, the average adherence in the balanced dataset differs from the one in the unbalanced dataset (this holds for both the Booking and the Amazon datasets, as can be seen in the related figures). Also note that the under-sampling applied to the majority classes to balance the number of reviews leads to a deterioration of the results for both Amazon and Booking (compare the balanced and unbalanced graphs).
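Balancing by under-sampling the majority classes can be sketched as follows: every score bin is reduced, by random sampling, to the size of the least populated one. The review representation, the fixed seed and the helper name `undersample` are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: (score bin, review text).
Review = Tuple[int, str]


def undersample(reviews: List[Review], seed: int = 0) -> List[Review]:
    """Randomly under-sample every score bin to the size of the least
    populated one, so that all bins end up equally populated."""
    if not reviews:
        return []
    by_bin: Dict[int, List[Review]] = defaultdict(list)
    for review in reviews:
        by_bin[review[0]].append(review)
    target = min(len(items) for items in by_bin.values())
    rng = random.Random(seed)  # seeded for reproducible samples
    balanced: List[Review] = []
    for score in sorted(by_bin):
        # For the smallest bin, sample() returns all of its reviews, shuffled.
        balanced.extend(rng.sample(by_bin[score], target))
    return balanced


data = [(5, "great stay")] * 6 + [(3, "fine")] * 4 + [(1, "awful")] * 2
print(sorted(score for score, _ in undersample(data)))  # prints [1, 1, 3, 3, 5, 5]
```

The least populated bin is returned whole, so in this sketch its average adherence could change only through a terminology recomputed on the balanced set, which is not shown here.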

4.2 Good Opinions, Higher Adherence

These results complement those reported in the corresponding section of the submitted paper.

The following table shows the average adherence and the average standard deviation for six Amazon categories, when considering the unbalanced dataset.

Amazon, unbalanced dataset: average adherence (standard deviation) per score bin
Score bin | Bluetooth Headsets | Bluetooth Speakers | Magnifiers | Oral Irrigators | Screen Protectors | Unlocked Cell Phones
1 | 0.17 (0.07) | 0.16 (0.08) | 0.15 (0.08) | 0.17 (0.07) | 0.16 (0.07) | 0.15 (0.06)
2 | 0.18 (0.07) | 0.19 (0.07) | 0.15 (0.07) | 0.18 (0.07) | 0.16 (0.07) | 0.15 (0.06)
3 | 0.19 (0.07) | 0.20 (0.08) | 0.16 (0.08) | 0.19 (0.07) | 0.16 (0.07) | 0.16 (0.07)
4 | 0.21 (0.08) | 0.24 (0.09) | 0.17 (0.08) | 0.21 (0.08) | 0.18 (0.08) | 0.18 (0.08)
5 | 0.21 (0.09) | 0.26 (0.11) | 0.18 (0.09) | 0.22 (0.08) | 0.20 (0.09) | 0.19 (0.10)


The following table shows the average adherence and the average standard deviation for six Amazon categories, when considering the balanced dataset.

Amazon, balanced dataset: average adherence (standard deviation) per score bin
Score bin | Bluetooth Headsets | Bluetooth Speakers | Magnifiers | Oral Irrigators | Screen Protectors | Unlocked Cell Phones
1 | 0.17 (0.07) | 0.18 (0.08) | 0.16 (0.08) | 0.18 (0.07) | 0.17 (0.07) | 0.15 (0.06)
2 | 0.19 (0.07) | 0.20 (0.07) | 0.16 (0.08) | 0.19 (0.06) | 0.17 (0.07) | 0.16 (0.06)
3 | 0.19 (0.07) | 0.21 (0.08) | 0.16 (0.08) | 0.20 (0.07) | 0.16 (0.07) | 0.16 (0.07)
4 | 0.21 (0.08) | 0.23 (0.08) | 0.17 (0.08) | 0.22 (0.08) | 0.18 (0.08) | 0.18 (0.08)
5 | 0.21 (0.09) | 0.23 (0.09) | 0.18 (0.09) | 0.21 (0.08) | 0.19 (0.09) | 0.18 (0.09)


The following table shows the average adherence and the average standard deviation for six Booking cities, when considering the unbalanced dataset.

Booking, unbalanced dataset: average adherence (standard deviation) per score bin
Score bin | London | Los Angeles | New York | Paris | Rome | Sydney
1 | 0.18 (0.15) | 0.17 (0.13) | 0.17 (0.15) | 0.18 (0.14) | 0.18 (0.14) | 0.20 (0.16)
2 | 0.23 (0.16) | 0.21 (0.15) | 0.23 (0.16) | 0.24 (0.17) | 0.22 (0.15) | 0.21 (0.15)
3 | 0.27 (0.17) | 0.24 (0.16) | 0.26 (0.17) | 0.28 (0.18) | 0.26 (0.17) | 0.22 (0.15)
4 | 0.29 (0.19) | 0.27 (0.17) | 0.28 (0.18) | 0.32 (0.19) | 0.30 (0.18) | 0.26 (0.18)
5 | 0.30 (0.19) | 0.27 (0.18) | 0.28 (0.18) | 0.32 (0.19) | 0.32 (0.18) | 0.29 (0.19)


The following table shows the average adherence and the average standard deviation for six Booking cities, when considering the balanced dataset.

Booking, balanced dataset: average adherence (standard deviation) per score bin
Score bin | London | Los Angeles | New York | Paris | Rome | Sydney
1 | 0.24 (0.18) | 0.22 (0.15) | 0.25 (0.22) | 0.24 (0.18) | 0.24 (0.17) | 0.27 (0.23)
2 | 0.25 (0.16) | 0.23 (0.16) | 0.25 (0.17) | 0.26 (0.18) | 0.26 (0.17) | 0.25 (0.16)
3 | 0.27 (0.17) | 0.25 (0.16) | 0.27 (0.17) | 0.29 (0.18) | 0.29 (0.16) | 0.23 (0.14)
4 | 0.30 (0.19) | 0.27 (0.17) | 0.29 (0.19) | 0.32 (0.20) | 0.31 (0.18) | 0.28 (0.18)
5 | 0.30 (0.20) | 0.28 (0.19) | 0.29 (0.19) | 0.31 (0.18) | 0.32 (0.19) | 0.30 (0.21)


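As shown in the tables, the standard deviation within each bin is quite high. For instance, for London (unbalanced), the average adherence grows from 0.18 in bin 1 to 0.30 in bin 5, a gap smaller than the within-bin standard deviations (0.15 and 0.19, respectively). This suggests that, although adherence is correlated with the score, it is not a reliable measure for a single review. Rather, its informativeness should be exploited over an ensemble of reviews, as done in the rest of the submitted paper.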
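Per-bin figures like those in the tables can be computed with a sketch such as the following, assuming each review is reduced to a (score bin, adherence) pair. The paper does not state whether the population or the sample standard deviation is used; here we assume the population one (`statistics.pstdev`). We also report the standard error of the mean, σ/√n, which shrinks as a bin grows and hints at why an ensemble of reviews is more informative than a single one. `bin_stats` is a hypothetical helper.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def bin_stats(pairs: Iterable[Tuple[int, float]]) -> Dict[int, Tuple[float, float, float]]:
    """Map each score bin to (mean adherence, standard deviation, standard error)."""
    by_bin: Dict[int, List[float]] = defaultdict(list)
    for score, adherence in pairs:
        by_bin[score].append(adherence)
    result = {}
    for score in sorted(by_bin):
        values = by_bin[score]
        mean = statistics.mean(values)
        std = statistics.pstdev(values)  # population standard deviation (assumption)
        sem = std / math.sqrt(len(values))  # standard error of the mean
        result[score] = (mean, std, sem)
    return result


# Hypothetical toy data: (score bin, adherence) pairs.
toy = [(1, 0.10), (1, 0.30), (5, 0.20), (5, 0.40)]
for score, (mean, std, sem) in bin_stats(toy).items():
    print(score, round(mean, 2), round(std, 2), round(sem, 3))
```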

4.3 Extension to Different Languages

Here, we plot score vs adherence for the Booking dataset, considering reviews in Italian and in French. In the graphs that consider positive and negative content separately, dashed lines correspond to negative content. This material is entirely new and was not included in the submitted paper, due to page limits.
Figure: Score vs Adherence, Italian Booking dataset, all content
Figure: Score vs Adherence, Italian Booking dataset, positive/negative content
Figure: Score vs Adherence, French Booking dataset, all content
Figure: Score vs Adherence, French Booking dataset, positive/negative content

4.4 Language-Agnostic Reviews Clustering

Here, we report some material complementing Section 4.4 of the submitted paper. For items in both the Amazon and the Booking categories, the figures below show the average difference between the average adherence of the first and last bins, together with the average difference between the average scores of the same bins. In each graph, the x-axis reports the number of bins considered, whereas the y-axis reports the values of the average differences. The average differences for adherence are depicted with a solid red line, those for the score with a dashed blue line. The graphs show that, as the number of bins increases, the first and last bins contain reviews that describe the product in considerably different ways in terms of positiveness.

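Figure: Average differences vs number of bins, Bluetooth Headsets (Amazon)
Figure: Average differences vs number of bins, Bluetooth Speakers (Amazon)
Figure: Average differences vs number of bins, London (Booking)
Figure: Average differences vs number of bins, New York (Booking)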
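The bin construction itself is specified in Section 4.4 of the submitted paper. The Python sketch below only illustrates the two plotted quantities for a single item, under our own assumption that the item's reviews are sorted by adherence and split into k contiguous bins of nearly equal size. `split_into_bins` and `first_last_gaps` are hypothetical helpers; differences are taken as last bin minus first bin, and averaging them over items would give one point per value of k.

```python
from statistics import mean
from typing import List, Tuple

# Hypothetical representation of a review: (adherence, score).
Pair = Tuple[float, float]


def split_into_bins(items: List[Pair], k: int) -> List[List[Pair]]:
    """Sort pairs by adherence and split them into k contiguous bins
    whose sizes differ by at most one."""
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and the number of reviews")
    ordered = sorted(items, key=lambda pair: pair[0])
    size, extra = divmod(len(ordered), k)
    bins, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


def first_last_gaps(items: List[Pair], k: int) -> Tuple[float, float]:
    """Return (adherence gap, score gap) between the last and the first bin."""
    bins = split_into_bins(items, k)
    first, last = bins[0], bins[-1]
    adherence_gap = mean(a for a, _ in last) - mean(a for a, _ in first)
    score_gap = mean(s for _, s in last) - mean(s for _, s in first)
    return adherence_gap, score_gap


# Hypothetical toy reviews of one item.
items = [(0.1, 1), (0.2, 2), (0.3, 3), (0.4, 4), (0.5, 5), (0.6, 5)]
for k in (2, 3):
    adh_gap, score_gap = first_last_gaps(items, k)
    print(k, round(adh_gap, 2), round(score_gap, 2))
```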

4.5 Representative Terms in First and Last Bins

Here, we extend Section 4.5 of the submitted paper, reporting the most relevant terms that appear in the first and last bins of reviews, as formed in Section 4.4. By "most relevant" we mean terms that are 1) most frequent, 2) part of the relevant terminology, and 3) not overlapping between the two bins.
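A possible reading of the three criteria, written as a Python sketch: terms are counted per bin, only those in the extracted terminology are kept, and terms that make the top-n list of both bins are discarded. The naive tokenizer, the cut-off `n` and this way of resolving overlap are our assumptions; `relevant_terms` is a hypothetical helper, and the terminology set is taken as given (its extraction is described in the submitted paper).

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Naive lowercase whitespace tokenizer (illustrative assumption).
    return [t.strip(".,;:!?\"'()").lower() for t in text.split()]


def relevant_terms(first_bin: Iterable[str], last_bin: Iterable[str],
                   terminology: Set[str], n: int = 10) -> Tuple[List[str], List[str]]:
    """Most frequent terminology terms of each bin, minus those in both top-n lists."""
    first_counts = Counter(t for r in first_bin for t in tokenize(r) if t in terminology)
    last_counts = Counter(t for r in last_bin for t in tokenize(r) if t in terminology)
    top_first = [t for t, _ in first_counts.most_common(n)]
    top_last = [t for t, _ in last_counts.most_common(n)]
    shared = set(top_first) & set(top_last)  # overlapping terms are dropped
    return ([t for t in top_first if t not in shared],
            [t for t in top_last if t not in shared])


# Hypothetical toy bins and terminology.
terminology = {"room", "clean", "noisy", "staff", "friendly", "wifi"}
first = ["Noisy room, wifi broken", "The room was noisy"]
last = ["Clean room, friendly staff", "Very clean and friendly"]
print(relevant_terms(first, last, terminology))
```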

The following table shows the most relevant terms extracted for two Amazon products from the reviews that, on average, speak most negatively and most positively about the products themselves.

Product | Score (first bin) | Terms in the first bin (reviews with lower adherence) | Score (last bin) | Terms in the last bin (reviews with higher adherence)
B005XA0DNQ | 2.9 | refund, packaging, casing, disconnected, gift, battery, packaged, addition, hooked, plugging, shipping, hook, speaker, purpose, sounds, kitchen | 4.3 | compact, sound, great, retractable, portable, very, price, unbelievable, satisfied, product, easy, recommend, small, perfect, little, handy, size
B0083RXA86 | 2.4 | stereo, impressive, mostly, charger, button, product, charging, switch, louder, useless, price, usb | 4.6 | blue, charge, useful, very, coating, satisfied, quite, battery, enough, music, attachment, excellent, highly, quality, pleased

Similarly, the following table covers the five Amazon categories with the most reviews. It shows the most relevant terms extracted from the reviews in the first and last bins (which feature, respectively, the lowest and the highest average adherence).

Category | Most relevant terms in the bin with the lowest adherence | Most relevant terms in the bin with the highest adherence
Bluetooth Headsets | charge, item, amazon, hear, purchased, device, problem, bought | fit, clear, recommend, highly, music, easy, excellent, comfortable, quality
Bluetooth Speakers | reviews, speakers, gift, charge, volume, item, amazon, music, purchased, charging, bought | little, portable, excellent, bluetooth, easy, recommend, small, sounds, highly, quality, size
Screen Protectors | purchase, cover, bought, item, iphone, pack, instructions | perfect, samsung, clear, highly, fits, apply, perfectly
Unlocked Cell Phones | buy, phone, phones, seller, cell, everything, amazon, iphone, item, problem, bought | perfect, excellent, fast, card, easy, recommend, android, quality, sim
Appetite Control | try, tried, these, hungry, pills, reviews, products, oz, bottle, waste, eat, bought | loss, cambogia, lost, pounds, garcinia, recommend, diet, highly, definitely, lose, extract, exercise

The following table shows the most relevant terms extracted for some Booking cities, in English, Italian and French. For Italian and French, fewer examples are shown, due to the low number of reviews in those datasets.

City (language) | Most relevant terms in the bin with the lowest adherence | Most relevant terms in the bin with the highest adherence
London (English) | booked, bar, floor, wifi, stayed, stay, shower, receptionist, reception, booking, hotels | helpful, cleanliness, convenient, quiet, comfortable, beds, facilities, clean, excellent, friendly, size
New York (English) | booked, square, checked, floor, door, stayed, stay, reception, desk, reservation, booking, hotels | perfect, bathroom, helpful, cleanliness, comfortable, beds, clean, excellent, small, noisy, friendly, size
Rome (Italian) | albergo, prenotazione, servizi, stanze, hotel, struttura, soggiorno, reception, doccia, booking | disponibile, cortesia, gentile, confortevole, termini, ottima, gentilezza, abbondante, pulita, cordiale
Paris (French) | emplacement, gare, accueil, anglais, chambres, hotel, clients, avons, sol, londres, wifi, lit, chambre, réservation, hôtel, réception, payé, bruit, booking | salle, géographique, déjeuner, proche, confortable, petite, petit, qualité, quartier, très, métro, calme, propreté, literie, situation, propre, agréable, proximité, bain

5. Related Work

No additional material here.

6. Final Remarks

No additional material here.