
OUR MISSION OF
RADICAL TRANSPARENCY

How We Test The Testers

Our True Score system identifies the true crème de la crème of products on the web. The magic behind a True Score is the careful synthesis of the most trusted expert and customer ratings.

Now let’s talk about experts and who counts as “trusted”. The promise of any publication should be to deliver accurate, relevant, and useful content that helps readers make informed decisions. That’s where the actual reviewers come in: they should pledge to provide an honest, properly tested review backed by their own quantitative test results.

So by testing the testers, we make sure that promise is kept. Spoiler alert: there are a lot of broken promises out there. To test the testers, we use our own quantitative Trust Score system, leaving no room for personal biases.


THE PUBLICATION
TRUST SCORE

A Publication Trust Score assesses how trustworthy an expert site is, and it serves as the bedrock for weighting the Expert Scores that help determine the True Scores. Expert sites can be complex and feature widely different ranges and types of content, so to score them as thoroughly and accurately as possible, we first have to calculate a publication’s Trust Score.

A Trust Score is a weighted score composed of two parts: a General Trust Score and a Category-Specific Trust Score. A dedicated researcher calculates these scores by evaluating a wide variety of aspects of a site to determine how trustworthy the publication is as a whole and for a specific product category. Once the evaluation process is complete, we reach out to the publication, inform them of their Trust Scores, and ask if we missed anything.

These questions are based on Google’s review criteria that give creators direction in how to structure an in-depth review. We explain the Publication Trust Score – Version 1.5 criteria in the following tables:

GENERAL TRUST CRITERIA – VERSION 1.5

These criteria assess foundational aspects of each publication, such as an About Us page, sponsorship/paid promotion disclosure, and a scoring system used on every review regardless of product category. We’ve provided a sample of Trust Score criteria below across all our criteria categories; the entire list is provided on another page.

| CRITERIA CATEGORY | DESCRIPTION |
| --- | --- |
| QUALIFICATION | Publication staff are real humans. |
| INTEGRITY | Publication promotes editorial integrity and prioritizes genuinely helping consumers. |
| CLARITY | Content is structured to effectively communicate product information to consumers. |
| QUANTIFICATION & SCORING | Publication uses a thorough, numerical scoring system to differentiate products from each other. |

| TRUST SCORE CRITERIA | POINT VALUE |
| --- | --- |
| Does the expert site have an About Us or Team Page? | 3.0 |
| Do they have an ethics statement? | 3.0 |
| Do they not have any sponsored content, paid promos, or advertorials? | 3.0 if Yes, 0 if No |
| Are there Categories of Performance ratings, scores, badges, etc.? | 0.5 |
| Are there Performance Criteria ratings out of 100.0? | 0.5 |
| Are multiple retailers’ product listings included? | 1.0 |
| Does the site provide a Numerical Score Methodology? | 1.0 |

CATEGORY-SPECIFIC TRUST CRITERIA – VERSION 1.5

These criteria assess category-specific aspects of each publication, such as how experienced an author is in the specific category, what type of media they present in their content, and how in-depth the product testing is. We’ve provided a sample of Trust Score criteria below across all our criteria categories; the entire list is provided on another page.

| CRITERIA CATEGORY | DESCRIPTION |
| --- | --- |
| QUALIFICATION | Publication staff are real humans. |
| VISUAL EVIDENCE | The content provides visual proof to support the reviewer’s claims. |
| TESTING EVIDENCE | The reviewer tested the product and provided their own quantitative measurements from their testing. |

| TRUST SCORE CRITERIA | POINT VALUE |
| --- | --- |
| Has the author on the Pillar Buying Guide (“best headphones”, “best tvs”, etc.) written at least 5 articles for the category on the site? | 5.0 |
| Does the review author have a public LinkedIn profile? | 2.0 |
| Does the content contain real-world, non-stock photos? | 10.0 |
| Did the reviewer demonstrate that the product was tested in a realistic usage scenario? | 10.0 |
| Does a category-specific test method exist, i.e. How We Test Headphones, How We Test Smartphones, etc.? | 5.0 |
| Do they provide correct units of measurement to help support that they actually tested? | 10.0 |
| Custom Questions For Each Category | 20.0 |

NOTE: All our categories will eventually be evaluated using our 2.0 criteria.

HOW WE CALCULATE A TRUE SCORE:

Before we can calculate a product’s True Score, we need the Expert and Customer Scores. So once all of the above criteria are evaluated, we calculate the Trust Score:

Trust Score = (Category-Specific Trust Score x 0.8) + (General Trust Score x 0.2)

We believe that category-specific review content is the core of these publications, which is why we weight that score to be worth 80% of the Trust Score. General Trust makes up the other 20%, and the two values are added together to form the Trust Score. We then average all of a single publication’s Trust Scores, and that average becomes the Publication Trust Score.

Calculating a Publication Trust Score Example:
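As a rough sketch of this arithmetic in code (the category names and scores below are hypothetical; only the 80/20 split and the final averaging come from the methodology above):

```python
# Minimal sketch of the Trust Score weighting described above.
# The category names and scores are hypothetical placeholders; only the
# 80/20 split and the final averaging come from the methodology text.

GENERAL_WEIGHT = 0.20
CATEGORY_WEIGHT = 0.80

def trust_score(general_trust: float, category_trust: float) -> float:
    """Blend a General Trust Score with one Category-Specific Trust Score."""
    return CATEGORY_WEIGHT * category_trust + GENERAL_WEIGHT * general_trust

def publication_trust_score(general_trust: float, category_trusts: dict) -> float:
    """Average the per-category Trust Scores to get the Publication Trust Score."""
    scores = [trust_score(general_trust, c) for c in category_trusts.values()]
    return sum(scores) / len(scores)

# Hypothetical example: a site scored 85% on General Trust and was
# researched in two product categories.
print(publication_trust_score(85.0, {"monitors": 90.0, "headphones": 70.0}))  # 81.0
```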

The Publication Trust Score determines the weight of the publication’s review score when calculating the Expert Score for a specific product. In essence, the higher the Publication Trust Score, the more weight that publication’s review has in our overall True Score for that product.

Calculating an Expert Score:

In this scenario, four publications have their own reviews and scores for the Dell S2721HGF monitor. They serve as Expert Sources for calculating our Expert Score for that model because they each give the product a numerical score and have passing Trust Scores. We do not include any publication that earned a Trust Score under 60% in the Expert Score calculation.

| Publication | Dell S2721HGF Monitor Expert Score |
| --- | --- |
| PC Mag | 4/5 |
| Laptop Mag | 4/5 |
| RTINGs | 7.3/10 |
| Trusted Reviews | 3.5/5 |

We then have to convert their review scores to our own scoring system, which uses a 0-100 scale; for example, 4/5 converts to 80% and 7.3/10 converts to 73%.

Through our Trust Score research, we’ve found that the Publication Trust Scores (PTS) for the four sites are (these numbers are for illustrative purposes and constantly changing):

| Publication | Dell S2721HGF Monitor Expert Score | Converted Expert Score | Publication Trust Score (PTS) |
| --- | --- | --- | --- |
| PC Mag | 4/5 | 80% | 92% |
| Laptop Mag | 4/5 | 80% | 68% |
| RTINGs | 7.3/10 | 73% | 103% |
| Trusted Reviews | 3.5/5 | 70% | 91% |

Publication Trust Scores posted here are for example purposes only. Trust Scores change as we improve our system.

Publications with higher Trust Scores get higher weights in the score. With a weighted average, we can give greater weight in the calculation to the publications with the highest Blended Trust Scores than to those with lower scores.

The number of publications we use in an Expert Score calculation can differ from product to product, since more publications may have reviewed one product than another.

To figure out the weighting distribution, we use a Rubric Weighting system. The Rubric Weighting system is basically the Trust Score converted from a percentage to a decimal. Here are some examples:

| Trust Score | Rubric Weight |
| --- | --- |
| 101% | 1.01 |
| 91% | 0.91 |
| 89% | 0.89 |
| 84% | 0.84 |

The final formula is a weighted average: each Converted Expert Score is multiplied by its publication’s Rubric Weight, the results are summed, and that sum is divided by the sum of the Rubric Weights:

Expert Score = ((80 x 0.92) + (80 x 0.68) + (73 x 1.03) + (70 x 0.91)) / (0.92 + 0.68 + 1.03 + 0.91)

266.89 / 3.54 = 75.39% = Expert Score
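A sketch of the same weighted-average calculation in code, using the numbers from the tables above (the raw-to-percentage conversion is treated here as a simple proportional rescaling, which matches the converted scores shown):

```python
# Sketch of the Expert Score weighted average for the Dell S2721HGF example.

reviews = [
    # (publication, raw score, scale max, publication trust score %)
    ("PC Mag",          4.0, 5,  92),
    ("Laptop Mag",      4.0, 5,  68),
    ("RTINGs",          7.3, 10, 103),
    ("Trusted Reviews", 3.5, 5,  91),
]

weighted_sum = 0.0
weight_total = 0.0
for name, raw, scale_max, pts in reviews:
    converted = raw / scale_max * 100  # convert to the 0-100 scale
    rubric_weight = pts / 100          # Trust Score expressed as a decimal
    weighted_sum += converted * rubric_weight
    weight_total += rubric_weight

expert_score = weighted_sum / weight_total
print(round(weighted_sum, 2), round(weight_total, 2), round(expert_score, 2))
# 266.89 3.54 75.39
```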

The Expert Score is then used alongside the Customer Score to calculate the Dell S2721HGF monitor’s True Score.

Calculating a Customer Score:

The second component of a True Score is the Customer Score, which is calculated by taking a weighted average of three large online retailers’ average customer scores for a single product. Let’s continue using the Dell S2721HGF monitor’s data as an example.

The Customer Score formula is a weighted average in which each retailer’s average rating is weighted by its number of customer reviews:

Customer Score = (Sum of each retailer’s Average Customer Score x its Number of Customer Reviews) / (Total Number of Customer Reviews), converted to the 0-100 scale

Here’s a table of the customer score data of the Dell S2721HGF monitor:

| Online Retailer | Dell S2721HGF Average Customer Score | Number of Customer Reviews |
| --- | --- | --- |
| Amazon | 4.80 | 2780 |
| Best Buy | 4.70 | 795 |
| Walmart | 4.70 | 1786 |

We need to convert the Customer Scores to the 0-100 logarithmic scale first.
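A sketch of the Customer Score calculation, assuming each retailer’s average rating is weighted by its review count and rescaled to 0-100 (this reproduces, to within rounding, the 95.05 figure used in the True Score calculation below):

```python
# Sketch of the Customer Score for the Dell S2721HGF, weighting each
# retailer's average rating by its number of customer reviews.

retailers = [
    # (retailer, average customer score out of 5, number of reviews)
    ("Amazon",   4.80, 2780),
    ("Best Buy", 4.70, 795),
    ("Walmart",  4.70, 1786),
]

weighted_sum = sum(score / 5 * 100 * count for _, score, count in retailers)
total_reviews = sum(count for _, _, count in retailers)

customer_score = weighted_sum / total_reviews
print(round(customer_score, 2))  # ~95.04, quoted as 95.05 in the example below
```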

Calculating the True Score v1.0:

This is our initial method of calculating True Scores, which uses a 75-25 weighting between the expert and customer scores. This version is live on the site but is in the process of being updated to version 2.0, which uses a probabilistic model. You can see a bit more about this here.

75% of the weight goes to the experts. We value the customer’s voice and their input on long-term product usage, but the problem of fake customer reviews is concerning, which is why customers are given only 25%.

Here’s the True Score formula:

True Score = (Expert Score x 0.75) + (Customer Score x 0.25)

We have all the components now, so the final calculation is very simple.

True Score = (75.39 x 0.75) + (95.05 x 0.25)

True Score = 80%

The True Score of the Dell S2721HGF monitor is 80%.
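For completeness, here is the same 75/25 blend as a tiny code sketch:

```python
# The v1.0 True Score is a fixed 75/25 blend of the two component scores.
def true_score_v1(expert_score: float, customer_score: float) -> float:
    return expert_score * 0.75 + customer_score * 0.25

print(round(true_score_v1(75.39, 95.05)))  # 80
```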

Calculating the True Score v2.0 – Bayesian Model:

Update: as of April 2024, all True Scores are calculated using True Score v2.0.

The product True Scores are soon going to be recalculated by a Bayesian model, which is a type of probabilistic model. In a Bayesian model, beliefs and uncertainties are expressed in terms of probabilities.

If statistics aren’t your forte, don’t fear, we’ll break it down for you. Imagine you have a question or a problem, but you’re not sure about the answer. For us, our question is, “What is the most accurate and true rating of a product, AKA a product’s True Score?” 

A Bayesian model helps us answer that question based on what we already think and the information we have. We first had two hypotheses on what a product True Score is:

  • One hypothesis was that the higher the Trust Score of a publication (meaning the more in-depth their testing and transparency is), the more accurate their product rating is.
  • The other hypothesis was that the greater the number of customer reviews a product has, the more accurate that average product rating is.

Then, we gathered evidence that relates to our hypotheses, which we did through our research to get the final Trust Scores of all sites plus our scraping of the biggest e-commerce retailers for their customer reviews.

The special thing about the model is that it uses probabilities. Probabilities are like chances or possibilities. Instead of saying the True Score is definitely true or false (which we can’t definitively determine), the model tells you how likely a certain True Score is to be true or false.

The final True Score we give to a product is the integer with the highest probability of being the True Score.
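As a deliberately simplified illustration (not our production model), imagine treating every expert review and the pooled customer rating as a noisy observation of the True Score, where more trusted or more heavily reviewed sources get higher precision. A simple Gaussian posterior then assigns a probability to each integer score, and we pick the most probable one. All variances and weights below are made-up illustrative values:

```python
import math

# Simplified illustration of "pick the integer with the highest probability".
# Each source is treated as a noisy observation of the latent True Score; its
# precision grows with its Trust Score or review volume. The real system uses
# Hierarchical Bayesian Models, not this toy.

observations = [
    # (observed score on the 0-100 scale, relative precision weight)
    (80, 0.92),   # PC Mag, weighted by Trust Score
    (80, 0.68),   # Laptop Mag
    (73, 1.03),   # RTINGs
    (70, 0.91),   # Trusted Reviews
    (95, 1.50),   # pooled customer score, weighted by review volume
]

BASE_VARIANCE = 100.0  # assumed noise of a weight-1.0 source (illustrative)

# Gaussian posterior over the latent True Score with a flat prior.
precision = sum(w / BASE_VARIANCE for _, w in observations)
posterior_mean = sum(x * w / BASE_VARIANCE for x, w in observations) / precision
posterior_sd = math.sqrt(1 / precision)

def normal_cdf(x: float) -> float:
    return 0.5 * (1 + math.erf((x - posterior_mean) / (posterior_sd * math.sqrt(2))))

# Probability mass the posterior assigns to each integer score 0..100.
probs = {k: normal_cdf(k + 0.5) - normal_cdf(k - 0.5) for k in range(101)}
true_score = max(probs, key=probs.get)
print(true_score)  # 81 for these illustrative numbers
```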

PUBLICATION TRUST SCORES – PHASE 2

The Trust Scores are a work in progress. Having completed Phase 1 of Publication Trust Score Research, we’re now working on Phase 2, which will build on the current Trust Scores and assess new criteria:

  • How many products have they tested?
  • How many Performance Criteria do they evaluate in reviews?
  • Have they reviewed five of the top brands of a specific category?
  • Do they review any newcomer brands’ products in a category?
  • Do they cover how a product has evolved from previous models?

Future phases are to come so that we can further refine the Publication Trust Scores.

MATCHING OUR CRITERIA TO GOOGLE’S

Gadget Review’s criteria were carefully selected to match up with Google’s published criteria on what makes up a quality review.

| CRITERIA CATEGORY | GOOGLE’S CRITERIA | GADGET REVIEW’S CRITERIA | POINT VALUE |
| --- | --- | --- | --- |
| QUALIFICATION | The writer or publication should demonstrate expertise within the category. Demonstrate that you are knowledgeable about what you are reviewing; show you are an expert. | Has the pillar author been writing for X amount of years? | 3.0 |
| | | Has the author on the Pillar Buying Guide (“best headphones”, “best tvs”, etc.) written at least 10 reviews for the category on the site? | 5.0 |
| | Share quantitative measurements about how a product measures up in various categories of performance. | Do they provide correct units of measurement to help support that they actually tested? | 10.0 |
| | | Custom Questions | 20.0 |
| Competitor Product Analysis | Compare a product against similar ones to help a consumer make the best buying decision possible. Explain what sets a product apart from its competitors. | Does the content contain feature comparison tables of similar products? | 10.0 |
| Category of Performance Analysis | Analyze the various aspects of the product to thoroughly determine its quality. Focus on the most important decision-making factors, based on your experience or expertise (for example, a car review might determine that fuel economy and safety are key decision-making factors and rate performance in those areas). | Are there Categories of Performance ratings, scores, badges, etc.? | 0.5 |
| Helpful Content Structure | Organize the content in a way that effectively communicates information to consumers. Evaluate the product from a user’s perspective. | Did the reviewer demonstrate that the product was tested in a realistic usage scenario? | 1.0 |
| | Consider including links to multiple sellers to give the reader the option to purchase from their merchant of choice. | Are multiple retailers’ product listings included? | 1.0 |

CHANGELOG

Dec 18, 2023: Version 2.0 True Score System

Our True Score system now utilizes Bayes’ Theorem, a much more comprehensive, probabilistic model that uses machine learning to adapt scores on the fly.

To attain the most accurate and insightful product score, we employ advanced Hierarchical Bayesian Models (HBMs). These models are adept at pooling information from a multitude of sources, both from customer-generated reviews and expert assessments. Expert review websites are carefully evaluated based on a rigorous and systematic ranking scheme, where each website undergoes a thorough analysis by seasoned professionals. These individual expert scores are then integrated across different review platforms.

The utilization of HBMs allows us to merge these expert evaluations coherently, accounting for possible variations and biases inherent in different reviewing methodologies. This consolidation of expert scores adds a layer of sophistication to our models, enabling us to glean deeper insights into the true quality of each product. The hierarchical structure of the models ensures that both expert and customer reviews are weighed appropriately, lending a balanced, unbiased viewpoint that is representative of both professional and public opinion.

Hierarchical Bayesian Models are sophisticated machine learning tools employed in a wide variety of contexts, including healthcare analytics and recommendation systems like those used by Spotify and Netflix. These models navigate complex data structures effectively, synthesizing multiple layers of information to produce robust conclusions. This is particularly vital when dealing with sparse or inconsistent data sets, as the models can “borrow strength” from the entire pool of data, thereby mitigating the impact of outliers or skewed distributions.

By leveraging the capabilities of HBMs, we achieve more than simple review aggregation; we offer a nuanced synthesis that accounts for variability, uncertainty, and the multidimensional nature of product quality. These models are designed to adaptively update as new data are accrued, offering a dynamic, continuously refined product evaluation. This state-of-the-art analytical approach provides an exhaustive, ongoing assessment of product quality, serving as an invaluable tool for market analysis and product development.
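As a toy illustration of that “borrowing strength” idea (a hand-rolled sketch, not our production Hierarchical Bayesian Model), a partial-pooling estimate shrinks products with very few reviews toward the category-wide average so that a handful of extreme ratings can’t dominate. The product names and numbers below are hypothetical:

```python
# Toy illustration of partial pooling ("borrowing strength"): sparse products
# are shrunk toward the category-wide average. Not the production HBM.

category = {
    # product: (average customer score on the 0-100 scale, number of reviews)
    "monitor_a": (95, 2800),
    "monitor_b": (90, 1200),
    "monitor_c": (60, 5),     # sparse data: only 5 reviews
}

PRIOR_STRENGTH = 50  # assumed "pseudo-review" count behind the category mean

category_mean = sum(score for score, _ in category.values()) / len(category)

for name, (score, n) in category.items():
    # Weight the product's own mean by its review count and the category
    # mean by the prior strength, then renormalize.
    pooled = (n * score + PRIOR_STRENGTH * category_mean) / (n + PRIOR_STRENGTH)
    print(name, round(pooled, 1))

# monitor_c moves from 60 toward the category mean (~81.7), while the
# well-reviewed monitors barely move.
```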

March 6, 2023: Version 2.0 Category Trust Score Created

Note: While it may seem odd that our v2.0 was created before v1.5, this is not an error! The intent of our v2.0 confidence research was to create a new and improved process with the Category side of our criteria, but it is a time-consuming, top-to-bottom process.

The General side of the criteria remained the same as it was in v1.0 aside from adjusted point values. We wanted to avoid leaving the gaps v1.0’s category research had, however, so v1.5 was created after the fact to update existing research with new data from some of the most important parts established in v2.0.

As of this writing (July 12, 2023), a majority of our current research uses the v1.5 model, but will be updated over time to v2.0.

  • The biggest change in this update is the addition of our Custom Questions to break out what were previously just two generic queries regarding quantitative and benchmarked testing.
    • In this system of “Custom Questions” (or CQs for short), for each category (e.g. best TVs, best gaming mice, etc.) we research leading reviews and buying guides from top authorities in the category like RTINGS, PCMag and GearLab to see what the experts test for. We also search ecommerce listings and Google keywords to see what consumers care about most. Informed by this research, we brainstorm 3-10 questions that outline what we should be looking for in each review or buying guide to prove that a given publication really knows their stuff. These fulfill the same purpose as the questions they replaced, in a more granular and specialized manner.
    • We try our best to keep these CQs focused on quantitative data that can be measured in hard numbers, concrete and defined units. However, for certain categories such as blenders or VPNs, the nature of the products and services in question means that the best we can reasonably do is to scrutinize what is reported on qualitatively, such as the texture of a blender’s smoothie or the user experience and security features of a VPN, in service of the same goal to determine who is and isn’t a trustworthy expert in the consumer space.
  • Previously, we were checking for real-world photos of products in either buying guides or product reviews. As of v2.0, we now check buying guides and reviews separately, as many publications will satisfy this question in the latter but not the former.
  • Formerly, great consideration was granted to how long the author of a given buying guide had been writing in the industry. Obviously, a well-tenured writing staff is generally a vote of confidence for a publication, but we found that during the research process, scores for the same site could vary wildly based on which article happened to be picked for research. A few steps were taken to rectify this problem:
    • Where before we had multiple incremental questions checking for a tenure ranging between 3 months and 10 years, we now only make one single check for an author who’s been writing for the category for at least one year.
    • Using formulaically assembled Google search links, we have standardized the way we search for articles on each site, helping to reduce variance in the research process (see the sketch after this list).
    • Instead of just one product review, we try to look for three in the category from each site, and separately denote whether each claims to test or not.
  • In addition to checking for the presence of a testing methodology in the category, we also check if it was marked with the date it was published or last updated. Many of the categories we cover include emerging or otherwise fast-moving technologies that improve near-daily, and so in the world of tech reviews it’s important to keep your information up to date. This is also a factor of transparency, which feeds into a publication’s overall trustworthiness. A testing methodology with no listed date on it is a hit to its veracity.
  • The score value of each criterion has been adjusted.
  • Along the same lines of our confidence research for publications, we implemented a similar process for reviewing YouTube videos and channels to be able to cover videos that appear in Google searches.
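As a hypothetical sketch of how a formulaically assembled search link can be built (the query template, site, and author below are placeholder examples, not our internal formula):

```python
from urllib.parse import urlencode

# Hypothetical example of a standardized, site-restricted Google search link
# for finding an author's category articles on a given publication.

def research_search_url(site: str, author: str, category: str) -> str:
    query = f'site:{site} "{author}" {category} review'
    return "https://www.google.com/search?" + urlencode({"q": query})

print(research_search_url("rtings.com", "Jane Doe", "headphones"))
# https://www.google.com/search?q=site%3Artings.com+%22Jane+Doe%22+headphones+review
```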

May 23, 2023: Version 1.5 Category Trust Score Created

  • Score weights adjusted to reflect our focus on evidence-based testing, both visual and reported through text data, charts, and graphs.
    • Point values of Phase 1.5 Research were converted to a 100-point scale.
    • Qualification Category of Performance changed from 22% to 10%
    • Visual Evidence Category of Performance changed from 33% to 30%
    • Methodology Category of Performance changed from 22% to 20%
    • Testing Proof Category of Performance changed from 22% to 40%
  • Removed questions from the Testing Proof section that are replaced and covered in greater detail by our new custom questions segment.
    • Quantitative product test results and/or category scores in PR and/or BG
    • BONUS: Quantitative benchmark (comparative) product tests in PR and/or BG exist
  • Added a question to Methodology to help show the focus of our research: getting to the bottom of who does and doesn’t test.
    • Do they claim to test?
  • Added questions to Testing Proof designed to look for proof that any claims of testing a publication makes are properly supported:
    • Do they provide correct units of measurement to help support that they actually tested?
    • Did the reviewer demonstrate that the product was tested in a realistic usage scenario? (e.g. assessing a pair of headphones’ sound quality with different genres of music, riding an e-bike on various inclines, blending ice in a blender, etc.)
    • Custom Questions (Up to 10 per category)
      • The criteria for this and all following custom questions are determined per category, relating to relevant aspects that should be tested to demonstrate technical understanding of the products; these are often tests of performance criteria by way of measurement or use case, such as testing for brightness, how well something crushes ice, or how much debris a vacuum picks up.
      • These questions are connected to performance criteria/categories of performance
      • Each category receives a different number of custom questions, but the sum point value of all of the custom questions is always the same, regardless of whether there are 3 or 10
    • Based on the custom questions and your own assessment, is their claim to test truthful?