
OUR MISSION OF
RADICAL TRANSPARENCY

How We Test The Testers

Our True Score system identifies the true crème de la crème of products on the web. The magic behind a True Score is the careful synthesis of the most trusted expert and customer ratings.

Now let’s talk about experts and who counts as “trusted”. The promise of any publication should be to deliver accurate, relevant, and useful content that helps readers make informed decisions. That’s where the actual reviewers come in: they should pledge to provide an honest, properly tested review backed by their own quantitative test results.

So by testing the testers, we make sure that promise is kept. Spoiler alert: there are a lot of broken promises out there. To test the testers, we use our own quantitative Trust Score system, leaving no room for personal biases.


THE PUBLICATION
TRUST SCORE

A Publication Trust Score assesses how trustworthy an expert site is, and it serves as the bedrock for weighting the Expert Scores that help determine the True Scores. Expert sites can be complex and feature widely different ranges and types of content, so to score them as thoroughly and accurately as possible, we first have to calculate a publication’s Trust Score.

A Trust Score is a weighted score composed of two parts: a General Trust Score and a Category-Specific Trust Score. A dedicated researcher calculates these scores by evaluating a wide variety of aspects of a site to determine how trustworthy the publication is as a whole and for a specific product category. Once the evaluation process is complete, we reach out to the publication, inform them of their Trust Scores, and ask if we missed anything.

These questions are based on Google’s review criteria that give creators direction in how to structure an in-depth review. We explain the Publication Trust Score – Version 1.5 criteria in the following tables:

GENERAL TRUST CRITERIA – VERSION 1.5

These criteria assess foundational aspects of each publication, such as an About Us page, sponsorship/paid promotion disclosure, and a scoring system used on every review regardless of product category. We’ve provided a sample of Trust Score criteria below across all our criteria categories; the entire list is provided on another page.

| CRITERIA CATEGORY | DESCRIPTION |
| --- | --- |
| QUALIFICATION | Publication staff are real humans. |
| INTEGRITY | Publication promotes editorial integrity and prioritizes genuinely helping consumers. |
| CLARITY | Content is structured to effectively communicate product information to consumers. |
| QUANTIFICATION & SCORING | Publication uses a thorough, numerical scoring system to differentiate products from each other. |

| TRUST SCORE CRITERIA | POINT VALUE |
| --- | --- |
| Does the expert site have an About Us or Team Page? | 3.0 |
| Do they have an ethics statement? | 3.0 |
| Do they not have any sponsored content, paid promos, or advertorials? | 3.0 if Yes, 0 if No |
| Are there Categories of Performance ratings, scores, badges, etc.? | 0.5 |
| Are there Performance Criteria ratings out of 100.0? | 0.5 |
| Are multiple retailers’ product listings included? | 1.0 |
| Does the site provide a Numerical Score Methodology? | 1.0 |

CATEGORY-SPECIFIC TRUST CRITERIA – VERSION 1.5

These criteria assess category-specific aspects of each publication, such as how experienced an author is in the specific category, what type of media they present in their content, and how in-depth the product testing is. We’ve provided a sample of Trust Score criteria below across all our criteria categories; the entire list is provided on another page.

| CRITERIA CATEGORY | DESCRIPTION |
| --- | --- |
| QUALIFICATION | Publication staff are real humans. |
| VISUAL EVIDENCE | The content provides visual proof to support the reviewer’s claims. |
| TESTING EVIDENCE | The reviewer tested the product and provided their own quantitative measurements from their testing. |

| TRUST SCORE CRITERIA | POINT VALUE |
| --- | --- |
| Has the author on the Pillar Buying Guide (“best headphones”, “best tvs”, etc.) written at least 5 articles for the category on the site? | 5.0 |
| Does the review author have a public LinkedIn profile? | 2.0 |
| Does the content contain real-world, non-stock photos? | 10.0 |
| Did the reviewer demonstrate that the product was tested in a realistic usage scenario? | 10.0 |
| Does a category-specific test method exist, i.e. How We Test Headphones, How We Test Smartphones, etc.? | 5.0 |
| Do they provide correct units of measurement to help support that they actually tested? | 10.0 |
| Custom Questions For Each Category | 20.0 |

NOTE: All our categories will eventually be evaluated using our 2.0 criteria.

HOW WE CALCULATE A TRUE SCORE:

Before we can calculate a product’s True Score, we need the Expert and Customer Scores. So once all of the above criteria are evaluated, we calculate the Trust Score:

Trust Score = (Category-Specific Trust Score x 0.8) + (General Trust Score x 0.2)

We believe that category-specific review content is the core of these publications, which is why we weight that score to be worth 80% of the Trust Score. General Trust makes up the other 20%, and the two values are added together to form the Trust Score. We then average all of a single publication’s Trust Scores, and that average becomes the Publication Trust Score.

Calculating a Publication Trust Score Example:
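As a rough sketch of this arithmetic in code (the category names and scores below are hypothetical; only the 80/20 split and the final averaging come from the methodology above):

```python
# Minimal sketch of the Trust Score weighting described above.
# The category names and scores are hypothetical placeholders; only the
# 80/20 split and the final averaging come from the methodology text.

GENERAL_WEIGHT = 0.20
CATEGORY_WEIGHT = 0.80

def trust_score(general_trust: float, category_trust: float) -> float:
    """Blend a General Trust Score with one Category-Specific Trust Score."""
    return CATEGORY_WEIGHT * category_trust + GENERAL_WEIGHT * general_trust

def publication_trust_score(general_trust: float, category_trusts: dict) -> float:
    """Average the per-category Trust Scores to get the Publication Trust Score."""
    scores = [trust_score(general_trust, c) for c in category_trusts.values()]
    return sum(scores) / len(scores)

# Hypothetical example: a site scored 85% on General Trust and was
# researched in two product categories.
print(publication_trust_score(85.0, {"monitors": 90.0, "headphones": 70.0}))  # 81.0
```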

The Publication Trust Score determines the weight of the publication’s review score when calculating the Expert Score for a specific product. In essence, the higher the Publication Trust Score, the more weight that publication’s review has in our overall True Score for that product.

Calculating an Expert Score:

In this scenario, four publications have their own reviews and scores for the Dell S2721HGF monitor. They serve as Expert Sources for calculating our Expert Score for that model because they each give the product a numerical score and have passing Trust Scores. We do not include any publication that earned a Trust Score under 60% in the Expert Score calculation.

| Publication | Dell S2721HGF Monitor Expert Score |
| --- | --- |
| PC Mag | 4/5 |
| Laptop Mag | 4/5 |
| RTINGs | 7.3/10 |
| Trusted Reviews | 3.5/5 |

We then have to convert their review scores to our own scoring system, which uses a 0-100 scale; for example, 4/5 converts to 80% and 7.3/10 converts to 73%.

Through our Trust Score research, we’ve found that the Publication Trust Scores (PTS) for the four sites are (these numbers are for illustrative purposes and constantly changing):

| Publication | Dell S2721HGF Monitor Expert Score | Converted Expert Score | Publication Trust Score (PTS) |
| --- | --- | --- | --- |
| PC Mag | 4/5 | 80% | 92% |
| Laptop Mag | 4/5 | 80% | 68% |
| RTINGs | 7.3/10 | 73% | 103% |
| Trusted Reviews | 3.5/5 | 70% | 91% |

Publication Trust Scores posted here are for example purposes only. Trust Scores change as we improve our system.

Publications with higher Trust Scores get higher weights in the score. With a weighted average, we can give greater weight in the calculation to the publications with the highest Blended Trust Scores than to those with lower scores.

The number of publications we use in an Expert Score calculation can differ from product to product, since more publications may have reviewed one product than another.

To figure out the weighting distribution, we use a Rubric Weighting system. The Rubric Weighting system is basically the Trust Score converted from a percentage to a decimal. Here are some examples:

| Trust Score | Rubric Weight |
| --- | --- |
| 101% | 1.01 |
| 91% | 0.91 |
| 89% | 0.89 |
| 84% | 0.84 |

The final formula is a weighted average: each Converted Expert Score is multiplied by its publication’s Rubric Weight, the results are summed, and that sum is divided by the sum of the Rubric Weights:

Expert Score = ((80 x 0.92) + (80 x 0.68) + (73 x 1.03) + (70 x 0.91)) / (0.92 + 0.68 + 1.03 + 0.91)

266.89 / 3.54 = 75.39% = Expert Score
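A sketch of the same weighted-average calculation in code, using the numbers from the tables above (the raw-to-percentage conversion is treated here as a simple proportional rescaling, which matches the converted scores shown):

```python
# Sketch of the Expert Score weighted average for the Dell S2721HGF example.

reviews = [
    # (publication, raw score, scale max, publication trust score %)
    ("PC Mag",          4.0, 5,  92),
    ("Laptop Mag",      4.0, 5,  68),
    ("RTINGs",          7.3, 10, 103),
    ("Trusted Reviews", 3.5, 5,  91),
]

weighted_sum = 0.0
weight_total = 0.0
for name, raw, scale_max, pts in reviews:
    converted = raw / scale_max * 100  # convert to the 0-100 scale
    rubric_weight = pts / 100          # Trust Score expressed as a decimal
    weighted_sum += converted * rubric_weight
    weight_total += rubric_weight

expert_score = weighted_sum / weight_total
print(round(weighted_sum, 2), round(weight_total, 2), round(expert_score, 2))
# 266.89 3.54 75.39
```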

The Expert Score is then used alongside the Customer Score to calculate the Dell S2721HGF monitor’s True Score.

Calculating a Customer Score:

The second component of a True Score is the Customer Score, which is calculated by taking a weighted average of three large online retailers’ average customer scores for a single product. Let’s continue using the Dell S2721HGF monitor’s data as an example.

The Customer Score formula is a weighted average in which each retailer’s average rating is weighted by its number of customer reviews:

Customer Score = (Sum of each retailer’s Average Customer Score x its Number of Customer Reviews) / (Total Number of Customer Reviews), converted to the 0-100 scale

Here’s a table of the customer score data of the Dell S2721HGF monitor:

| Online Retailer | Dell S2721HGF Average Customer Score | Number of Customer Reviews |
| --- | --- | --- |
| Amazon | 4.80 | 2780 |
| Best Buy | 4.70 | 795 |
| Walmart | 4.70 | 1786 |

We need to convert the Customer Scores to the 0-100 logarithmic scale first.
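A sketch of the Customer Score calculation, assuming each retailer’s average rating is weighted by its review count and rescaled to 0-100 (this reproduces, to within rounding, the 95.05 figure used in the True Score calculation below):

```python
# Sketch of the Customer Score for the Dell S2721HGF, weighting each
# retailer's average rating by its number of customer reviews.

retailers = [
    # (retailer, average customer score out of 5, number of reviews)
    ("Amazon",   4.80, 2780),
    ("Best Buy", 4.70, 795),
    ("Walmart",  4.70, 1786),
]

weighted_sum = sum(score / 5 * 100 * count for _, score, count in retailers)
total_reviews = sum(count for _, _, count in retailers)

customer_score = weighted_sum / total_reviews
print(round(customer_score, 2))  # ~95.04, quoted as 95.05 in the example below
```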

Calculating the True Score v1.0:

This is our initial method of calculating True Scores, which uses a 75-25 weighting between the expert and customer scores. This version is live on the site but is in the process of being updated to version 2.0, which uses a probabilistic model. You can see a bit more about this here.

75% of the weight goes to the experts. We value the customer’s voice and their input on long-term product usage, but the problem of fake customer reviews is concerning, which is why customers are given only 25%.

Here’s the True Score formula:

True Score = (Expert Score x 0.75) + (Customer Score x 0.25)

We have all the components now, so the final calculation is very simple.

True Score = (75.39 x 0.75) + (95.05 x 0.25)

True Score = 80%

The True Score of the Dell S2721HGF monitor is 80%.
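For completeness, here is the same 75/25 blend as a tiny code sketch:

```python
# The v1.0 True Score is a fixed 75/25 blend of the two component scores.
def true_score_v1(expert_score: float, customer_score: float) -> float:
    return expert_score * 0.75 + customer_score * 0.25

print(round(true_score_v1(75.39, 95.05)))  # 80
```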

Calculating the True Score v2.0 – Bayesian Model:

Update: as of April 2024, all True Scores are calculated using True Score v2.0.

The product True Scores are soon going to be recalculated by a Bayesian model, which is a type of probabilistic model. In a Bayesian model, beliefs and uncertainties are expressed in terms of probabilities.

If statistics aren’t your forte, don’t fear, we’ll break it down for you. Imagine you have a question or a problem, but you’re not sure about the answer. For us, our question is, “What is the most accurate and true rating of a product, AKA a product’s True Score?” 

A Bayesian model helps us answer that question based on what we already think and the information we have. We first had two hypotheses on what a product True Score is:

  • One hypothesis was that the higher the Trust Score of a publication (meaning the more in-depth their testing and transparency is), the more accurate their product rating is.
  • The other hypothesis was that the greater the number of customer reviews a product has, the more accurate that average product rating is.

Then, we gathered evidence that relates to our hypotheses, which we did through our research to get the final Trust Scores of all sites plus our scraping of the biggest e-commerce retailers for their customer reviews.

The special thing about the model is that it uses probabilities. Probabilities are like chances or possibilities. Instead of saying the True Score is definitely true or false (which we can’t definitively determine), the model tells you how likely a certain True Score is to be true or false.

The final True Score we give to a product is the integer with the highest probability of being the True Score.
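As a deliberately simplified illustration (not our production model), imagine treating every expert review and the pooled customer rating as a noisy observation of the True Score, where more trusted or more heavily reviewed sources get higher precision. A simple Gaussian posterior then assigns a probability to each integer score, and we pick the most probable one. All variances and weights below are made-up illustrative values:

```python
import math

# Simplified illustration of "pick the integer with the highest probability".
# Each source is treated as a noisy observation of the latent True Score; its
# precision grows with its Trust Score or review volume. The real system uses
# Hierarchical Bayesian Models, not this toy.

observations = [
    # (observed score on the 0-100 scale, relative precision weight)
    (80, 0.92),   # PC Mag, weighted by Trust Score
    (80, 0.68),   # Laptop Mag
    (73, 1.03),   # RTINGs
    (70, 0.91),   # Trusted Reviews
    (95, 1.50),   # pooled customer score, weighted by review volume
]

BASE_VARIANCE = 100.0  # assumed noise of a weight-1.0 source (illustrative)

# Gaussian posterior over the latent True Score with a flat prior.
precision = sum(w / BASE_VARIANCE for _, w in observations)
posterior_mean = sum(x * w / BASE_VARIANCE for x, w in observations) / precision
posterior_sd = math.sqrt(1 / precision)

def normal_cdf(x: float) -> float:
    return 0.5 * (1 + math.erf((x - posterior_mean) / (posterior_sd * math.sqrt(2))))

# Probability mass the posterior assigns to each integer score 0..100.
probs = {k: normal_cdf(k + 0.5) - normal_cdf(k - 0.5) for k in range(101)}
true_score = max(probs, key=probs.get)
print(true_score)  # 81 for these illustrative numbers
```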

PUBLICATION TRUST SCORES – PHASE 2

The Trust Scores are a work in progress. Having completed Phase 1 of Publication Trust Score Research, we’re now working on Phase 2, which will build on the current Trust Scores and assess new criteria:

  • How many products have they tested?
  • How many Performance Criteria do they evaluate in reviews?
  • Have they reviewed five of the top brands of a specific category?
  • Do they review any newcomer brands’ products in a category?
  • Do they cover how a product has evolved from previous models?

Future phases are to come so that we can further refine the Publication Trust Scores.

MATCHING OUR CRITERIA TO GOOGLE’S

Gadget Review’s criteria were carefully selected to match up with Google’s published criteria on what makes up a quality review.

| CRITERIA CATEGORY | GOOGLE’S CRITERIA | GADGET REVIEW’S CRITERIA | POINT VALUE |
| --- | --- | --- | --- |
| QUALIFICATION | The writer or publication should demonstrate expertise within the category. Demonstrate that you are knowledgeable about what you are reviewing; show you are an expert. | Has the pillar author been writing for X amount of years? | 3.0 |
| | | Has the author on the Pillar Buying Guide (“best headphones”, “best tvs”, etc.) written at least 10 reviews for the category on the site? | 5.0 |
| | Share quantitative measurements about how a product measures up in various categories of performance. | Do they provide correct units of measurement to help support that they actually tested? | 10.0 |
| | | Custom Questions | 20.0 |
| Competitor Product Analysis | Compare a product against similar ones to help a consumer make the best buying decision possible. Explain what sets a product apart from its competitors. | Does the content contain feature comparison tables of similar products? | 10.0 |
| Category of Performance Analysis | Analyze the various aspects of the product to thoroughly determine its quality. Focus on the most important decision-making factors, based on your experience or expertise (for example, a car review might determine that fuel economy and safety are key decision-making factors and rate performance in those areas). | Are there Categories of Performance ratings, scores, badges, etc.? | 0.5 |
| Helpful Content Structure | Organize the content in a way that effectively communicates information to consumers. Evaluate the product from a user’s perspective. | Did the reviewer demonstrate that the product was tested in a realistic usage scenario? | 1.0 |
| | Consider including links to multiple sellers to give the reader the option to purchase from their merchant of choice. | Are multiple retailers’ product listings included? | 1.0 |

CHANGELOG

Dec 18, 2023: Version 2.0 True Score System

Our True Score system now utilizes Bayes’ Theorem, a much more comprehensive, probabilistic model that uses machine learning to adapt scores on the fly.

To attain the most accurate and insightful product score, we employ advanced Hierarchical Bayesian Models (HBMs). These models are adept at pooling information from a multitude of sources, both from customer-generated reviews and expert assessments. Expert review websites are carefully evaluated based on a rigorous and systematic ranking scheme, where each website undergoes a thorough analysis by seasoned professionals. These individual expert scores are then integrated across different review platforms.

The utilization of HBMs allows us to merge these expert evaluations coherently, accounting for possible variations and biases inherent in different reviewing methodologies. This consolidation of expert scores adds a layer of sophistication to our models, enabling us to glean deeper insights into the true quality of each product. The hierarchical structure of the models ensures that both expert and customer reviews are weighed appropriately, lending a balanced, unbiased viewpoint that is representative of both professional and public opinion.

Hierarchical Bayesian Models are sophisticated machine learning tools employed in a wide variety of contexts, including healthcare analytics and recommendation systems like those used by Spotify and Netflix. These models navigate complex data structures effectively, synthesizing multiple layers of information to produce robust conclusions. This is particularly vital when dealing with sparse or inconsistent data sets, as the models can “borrow strength” from the entire pool of data, thereby mitigating the impact of outliers or skewed distributions.

By leveraging the capabilities of HBMs, we achieve more than simple review aggregation; we offer a nuanced synthesis that accounts for variability, uncertainty, and the multidimensional nature of product quality. These models are designed to adaptively update as new data are accrued, offering a dynamic, continuously refined product evaluation. This state-of-the-art analytical approach provides an exhaustive, ongoing assessment of product quality, serving as an invaluable tool for market analysis and product development.
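As a toy illustration of that “borrowing strength” idea (a hand-rolled sketch, not our production Hierarchical Bayesian Model), a partial-pooling estimate shrinks products with very few reviews toward the category-wide average so that a handful of extreme ratings can’t dominate. The product names and numbers below are hypothetical:

```python
# Toy illustration of partial pooling ("borrowing strength"): sparse products
# are shrunk toward the category-wide average. Not the production HBM.

category = {
    # product: (average customer score on the 0-100 scale, number of reviews)
    "monitor_a": (95, 2800),
    "monitor_b": (90, 1200),
    "monitor_c": (60, 5),     # sparse data: only 5 reviews
}

PRIOR_STRENGTH = 50  # assumed "pseudo-review" count behind the category mean

category_mean = sum(score for score, _ in category.values()) / len(category)

for name, (score, n) in category.items():
    # Weight the product's own mean by its review count and the category
    # mean by the prior strength, then renormalize.
    pooled = (n * score + PRIOR_STRENGTH * category_mean) / (n + PRIOR_STRENGTH)
    print(name, round(pooled, 1))

# monitor_c moves from 60 toward the category mean (~81.7), while the
# well-reviewed monitors barely move.
```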

March 6, 2023: Version 2.0 Category Trust Score Created

Note: While it may seem odd that our v2.0 was created before v1.5, this is not an error! The intent of our v2.0 confidence research was to create a new and improved process with the Category side of our criteria, but it is a time-consuming, top-to-bottom process.

The General side of the criteria remained the same as it was in v1.0 aside from adjusted point values. We wanted to avoid leaving the gaps v1.0’s category research had, however, so v1.5 was created after the fact to update existing research with new data from some of the most important parts established in v2.0.

As of this writing (July 12, 2023), a majority of our current research uses the v1.5 model, but will be updated over time to v2.0.

  • The biggest change in this update is the addition of our Custom Questions to break out what were previously just two generic queries regarding quantitative and benchmarked testing.
    • In this system of “Custom Questions” (or CQs for short), for each category (e.g. best TVs, best gaming mice, etc.) we research leading reviews and buying guides from top authorities in the category like RTINGS, PCMag and GearLab to see what the experts test for. We also search ecommerce listings and Google keywords to see what consumers care about most. Informed by this research, we brainstorm 3-10 questions that outline what we should be looking for in each review or buying guide to prove that a given publication really knows their stuff. These fulfill the same purpose as the questions they replaced, in a more granular and specialized manner.
    • We try our best to keep these CQs focused on quantitative data that can be measured in hard numbers, concrete and defined units. However, for certain categories such as blenders or VPNs, the nature of the products and services in question means that the best we can reasonably do is to scrutinize what is reported on qualitatively, such as the texture of a blender’s smoothie or the user experience and security features of a VPN, in service of the same goal to determine who is and isn’t a trustworthy expert in the consumer space.
  • Previously, we were checking for real-world photos of products in either buying guides or product reviews. As of v2.0, we now check buying guides and reviews separately, as many publications will satisfy this question in the latter but not the former.
  • Formerly, great consideration was granted to how long the author of a given buying guide had been writing in the industry. Obviously, a well-tenured writing staff is generally a vote of confidence for a publication, but we found that during the research process, scores for the same site could vary wildly based on which article happened to be picked for research. A few steps were taken to rectify this problem:
    • Where before we had multiple incremental questions checking for a tenure ranging between 3 months and 10 years, we now only make one single check for an author who’s been writing for the category for at least one year.
    • Using formulaically assembled Google search links, we have standardized the way we search for articles on each site, helping to reduce variance in the research process (see the sketch after this list).
    • Instead of just one product review, we try to look for three in the category from each site, and separately denote whether each claims to test or not.
  • In addition to checking for the presence of a testing methodology in the category, we also check if it was marked with the date it was published or last updated. Many of the categories we cover include emerging or otherwise fast-moving technologies that improve near-daily, and so in the world of tech reviews it’s important to keep your information up to date. This is also a factor of transparency, which feeds into a publication’s overall trustworthiness. A testing methodology with no listed date on it is a hit to its veracity.
  • The score value of each criterion has been adjusted.
  • Along the same lines of our confidence research for publications, we implemented a similar process for reviewing YouTube videos and channels to be able to cover videos that appear in Google searches.
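As a hypothetical sketch of how a formulaically assembled search link can be built (the query template, site, and author below are placeholder examples, not our internal formula):

```python
from urllib.parse import urlencode

# Hypothetical example of a standardized, site-restricted Google search link
# for finding an author's category articles on a given publication.

def research_search_url(site: str, author: str, category: str) -> str:
    query = f'site:{site} "{author}" {category} review'
    return "https://www.google.com/search?" + urlencode({"q": query})

print(research_search_url("rtings.com", "Jane Doe", "headphones"))
# https://www.google.com/search?q=site%3Artings.com+%22Jane+Doe%22+headphones+review
```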

May 23, 2023: Version 1.5 Category Trust Score Created

  • Score weights adjusted to reflect our focus on evidence-based testing, both visual and reported through text data, charts, and graphs.
    • Point values of Phase 1.5 Research were converted to a 100-point scale.
    • Qualification Category of Performance changed from 22% to 10%
    • Visual Evidence Category of Performance changed from 33% to 30%
    • Methodology Category of Performance changed from 22% to 20%
    • Testing Proof Category of Performance changed from 22% to 40%
  • Removed questions from the Testing Proof section that are replaced and covered in greater detail by our new custom questions segment.
    • Quantitative product test results and/or category scores in PR and/or BG
    • BONUS: Quantitative benchmark (comparative) product tests in PR and/or BG exist
  • Added a question to Methodology to help show the focus of our research: getting to the bottom of who does and doesn’t test.
    • Do they claim to test?
  • Added questions to Testing Proof designed to look for proof that any claims of testing a publication makes are properly supported:
    • Do they provide correct units of measurement to help support that they actually tested?
    • Did the reviewer demonstrate that the product was tested in a realistic usage scenario? (e.g. assessing a pair of headphones’ sound quality with different genres of music, riding an e-bike on various inclines, blending ice in a blender, etc.)
    • Custom Questions (Up to 10 per category)
      • The criteria for this and all following custom questions are determined per category, relating to relevant aspects that should be tested to demonstrate technical understanding of the products; these are often tests of performance criteria by way of measurement or use case, such as testing for brightness, how well something crushes ice, or how much debris a vacuum picks up.
      • These questions are connected to performance criteria/categories of performance
      • Each category receives a different number of custom questions, but the sum point value of all of the custom questions is always the same, regardless of whether there are 3 or 10
    • Based on the custom questions and your own assessment, is their claim to test truthful?