Stars Lie: What Our Research Says About Ratings and Returns in E-Commerce

We analysed 2.1 million Amazon listings and 31,000 products from a Central European e-shop. Four of our studies in a nutshell: rating inflation, the predictive ceiling for returns, repairability as a signal — and when reputation fails.

Why we study e-commerce signals

Star ratings and reviews are today's dominant quality signal for online purchases. But what if that signal is losing its information content? Four of our studies examine what publicly visible product-page data really says about quality — and what it no longer can.

1. Rating inflation and the price-quality paradox on Amazon

This study analyses 2.1 million product listings from the Amazon Reviews 2023 corpus. The mean rating is 4.23 out of 5, and fewer than 1% of products fall below 3.0, so the usable part of the scale has compressed into a range of roughly one star (3.8 to 4.8). Review attention is heavily concentrated on a small share of products, and the link between price and visible review quality is weak.

Why it matters: the five-point scale is functionally a one-point scale. Average ratings therefore cannot serve as a comparison tool — and recommendation systems that ingest them as features are working with a collapsed measuring instrument.
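To make the idea of a compressed scale concrete, here is a minimal sketch of how such a summary could be computed from a list of per-product mean ratings. It is an illustration, not the study's pipeline: the function name, the use of the 5th and 95th percentiles as the "effective range", and the sample values are all assumptions.

```python
from statistics import mean, quantiles


def rating_compression(ratings, low_cutoff=3.0):
    """Summarise how much of the 1-5 scale a set of product ratings uses."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    # quantiles(n=20) returns 19 cut points: the first is the 5th
    # percentile, the last is the 95th.
    cuts = quantiles(ratings, n=20, method="inclusive")
    p05, p95 = cuts[0], cuts[-1]
    return {
        "mean": mean(ratings),
        "share_below_cutoff": sum(r < low_cutoff for r in ratings) / len(ratings),
        "p05": p05,
        "p95": p95,
        "effective_range": p95 - p05,  # width of the band most products sit in
    }


# Made-up ratings for illustration only; not taken from the study.
sample = [4.1, 4.3, 4.5, 3.9, 4.6, 4.2, 4.4, 4.0, 4.7, 2.8]
summary = rating_compression(sample)
print(summary)
```

An effective range close to 1.0 on a nominal five-point scale is what the text above describes as a one-point scale.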

2. The predictive ceiling: what returns reveal about information asymmetry

This study tests the upper bound of how accurately publicly visible product-page signals can predict return rates, using machine learning on listing-level data. The result: predictive performance has a structural ceiling. That is not a modelling failure but a finding about the market itself: public product-page information is too sparse to capture the real risk of buyer dissatisfaction.

Why it matters: information asymmetry in e-commerce is measurable. A buyer simply cannot tell from a product page what awaits them — which is the argument for external data signals like a repairability score.

3. Repairability as a signal: replaceability cues versus returns

A study of ~31,000 products with real return rates tests whether visible replaceability cues (typically the wired vs. battery-dependent distinction) are associated with lower return propensity. It is one of the first empirical tests of right-to-repair-adjacent signals against actual post-purchase outcomes.

Why it matters: right-to-repair policy assumes repairability cues correspond to better outcomes. Our data shows cheap, visible cues can be tested empirically — and that a single cue is no substitute for a full repairability score.
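One simple way to test a single visible cue against returns is to compare the return rates of two product groups. The sketch below uses a two-proportion z-test on pooled counts. It is an assumption about how such a comparison could be run, not the method used in the study, and the counts are invented.

```python
from math import sqrt
from statistics import NormalDist


def compare_return_rates(returned_a, sold_a, returned_b, sold_b):
    """Two-proportion z-test on return counts for two product groups."""
    if sold_a <= 0 or sold_b <= 0:
        raise ValueError("both groups need at least one unit sold")
    rate_a = returned_a / sold_a
    rate_b = returned_b / sold_b
    # Pooled return rate under the null hypothesis of no difference.
    pooled = (returned_a + returned_b) / (sold_a + sold_b)
    if not 0 < pooled < 1:
        raise ValueError("pooled rate must be strictly between 0 and 1")
    se = sqrt(pooled * (1 - pooled) * (1 / sold_a + 1 / sold_b))
    z = (rate_a - rate_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_a, rate_b, z, p_value


# Hypothetical counts: group A = wired products, group B = battery-dependent.
rate_wired, rate_battery, z, p = compare_return_rates(400, 10_000, 500, 10_000)
print(f"wired {rate_wired:.1%}, battery {rate_battery:.1%}, z={z:.2f}, p={p:.4f}")
```

A comparison like this treats every unit sold as independent and ignores category and price differences between the groups, so it shows an association at best. That limitation is one reason a single cue cannot stand in for a full repairability score.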

4. When reputation fails: high ratings, high returns

On the same dataset of 31,000 products across 12 categories from a Central European retailer, this study profiles products that combine high ratings with elevated return rates. On average, higher ratings are associated with fewer returns, but the mismatch is substantial and concentrates systematically among novelty products, premium-positioned items, and goods that depend on personal fit.

Why it matters: ratings are not a measure of realized quality but an imperfect attention-and-expectation signal. For some product types, they fail predictably.
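The kind of profiling described above can be sketched as a filter that flags highly rated products whose return rate is well above the median for their category. The field names, the 4.5 rating threshold, and the 1.5x multiplier are hypothetical choices for illustration, not the study's definitions.

```python
from collections import defaultdict
from statistics import median


def flag_mismatches(products, min_rating=4.5, return_ratio=1.5):
    """Return IDs of highly rated products with unusually high return rates.

    A product is flagged when its rating is at least min_rating and its
    return rate exceeds return_ratio times the median of its category.
    """
    by_category = defaultdict(list)
    for p in products:
        by_category[p["category"]].append(p["return_rate"])
    medians = {cat: median(rates) for cat, rates in by_category.items()}
    return [
        p["id"]
        for p in products
        if p["rating"] >= min_rating
        and p["return_rate"] > return_ratio * medians[p["category"]]
    ]


# Made-up products for illustration only.
products = [
    {"id": "A", "category": "shoes", "rating": 4.7, "return_rate": 0.30},
    {"id": "B", "category": "shoes", "rating": 4.6, "return_rate": 0.12},
    {"id": "C", "category": "shoes", "rating": 4.1, "return_rate": 0.10},
    {"id": "D", "category": "cables", "rating": 4.8, "return_rate": 0.02},
    {"id": "E", "category": "cables", "rating": 4.6, "return_rate": 0.03},
]
flagged = flag_mismatches(products)
print(flagged)  # ['A']
```

Comparing against the category median rather than a global threshold keeps categories with naturally high return rates, such as goods that depend on personal fit, from dominating the flagged set.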

The takeaway

Public e-commerce signals — stars, reviews, price — carry far less information than they appear to. Rating inflation has collapsed the scale, return prediction has a ceiling, and reputation fails systematically rather than randomly. Both consumers and e-shops need an external, objective data signal. That is precisely the role of our repairability score and e-shop API.


Pavel Kopczyk · 5 July 2026
