Mythogin — Why Other Ratings Fail

The Value of Mythogin

The purpose of Mythogin isn’t to develop new technology to identify good stories and filter out the bad. Our focus is more modest: to provide a transparent lens for evaluating stories, so that you can decide for yourself how helpful that lens is.

Storytelling is deeply personal, situational, and intrinsically subjective. What moves one person on one day might not move them on another. However, part of what makes a story "good" or "bad" is objective: characterization, pacing, premise, originality, plotting, and so forth. It is this objective basis that makes it practical to identify and filter out "bad" stories.

The lens we use to distinguish good stories is twofold (bifocals, as it were): the first lens is the originality of the story; the second is the storytelling. Being original isn’t enough to make a story good; it could simply be weird. On the other hand, some stories can be good even while partly covering old ground, because the execution is strong (great writing, great acting, and so on).

Even though we can’t promise that you will like every one of our selections, we can promise that each one has something of value. It’s also possible that your favorite story isn’t on this list. Please don’t be offended; we might have overlooked it. Write to us and tell us why it is your favorite, and we will take it seriously.

We will be adding listings to this catalogue as we discover more content. And we will be providing more detail on our reasoning for the selection of each title as time allows.

Why It’s so Hard to Find Good Stories

It’s becoming increasingly difficult to identify quality stories in a media landscape drowning in unoriginal content. Large sites like IMDb and Rotten Tomatoes can aggregate very large numbers of ratings, but such systems break down when users rate stories only with a 1 or a 10. If users aren’t willing to use the entire scale, the scale itself becomes counterproductive: it stops measuring the story and starts measuring the polarization of the audience.

Amazon’s system used to be good until it started selling visibility and rankings to distributors. Today, it is difficult to distinguish advertising from genuine user content.

Netflix has a powerful recommendation algorithm that can compare similar content. However, the value of that algorithm depends on the quality of the content it recommends. When Netflix ran out of quality content, it began mass-producing generic content to feed its own algorithm. As a result, its business shifted from being a content provider to being a mass marketer. For this and other reasons, Netflix is in serious business trouble. We will discuss this in more detail in our Industry Pages.

The overproduction of generic content and the lack of objective measurement make it difficult and time-consuming to find good stories these days.

Why Other Ratings Systems Fail

Rotten Tomatoes compresses judgment. Netflix optimizes attention. Amazon corrupts discovery by mixing ratings, advertising, marketplace power, and self-preferencing. IMDb turns taste into factional pressure.

Ratings systems promise clarity in a crowded media environment. They appear to give viewers a way to separate quality from noise: a percentage, a star rating, a recommendation row, a popularity rank, or an algorithmic match. But these systems often measure something other than artistic value. They measure tolerability, engagement, visibility, purchasing pressure, fandom intensity, or factional mobilization.

That does not make them useless. A rating can still function as a warning light. It can show when a film or show generated unusually broad dissatisfaction. It can help viewers avoid obvious disasters. But a warning light is not the same thing as judgment. A rating can tell us that people reacted. It cannot necessarily tell us whether a story has structure, originality, emotional force, moral seriousness, imaginative power, or lasting cultural value.

The deeper failure is that modern ratings systems often collapse the distinction between behavior and meaning. A click is treated like interest. A completed season is treated like satisfaction. A Fresh label is treated like quality. A five-star product is treated like merit. A high IMDb score is treated like consensus. But each of these signals can be distorted by incentives that have little to do with whether a story is actually worth remembering.

Netflix: When Recommendation Becomes a Substitute for Quality

Netflix’s recommendation engine did not fail because it became less sophisticated. It failed because sophistication was redirected away from discovery and toward attention capture.

In its strongest period, Netflix used personalization to solve a real problem: viewers had access to a large and varied catalog, but they needed help finding titles actually worth their time. The algorithm functioned like a guide through abundance. It could connect viewers to older films, foreign films, documentaries, cult favorites, niche genres, and overlooked works that ordinary advertising or cable programming would never have placed in front of them. The value of the system depended on the value of the catalog. Netflix helped users discover stories because there were stories worth discovering.

The original promise of Netflix personalization was that it could make a large library feel intimate. Instead of forcing every viewer through the same front door, Netflix could build different paths through the catalog. A person who liked political thrillers might be led toward older conspiracy films. A person who watched science fiction might be led toward stranger speculative dramas. A person who liked prestige crime stories might be led toward foreign noir, documentaries, or character-driven mysteries. At its best, the algorithm expanded taste. It used familiar preferences as a bridge toward unfamiliar value.

Streaming changed the recommendation engine by giving Netflix far more behavioral data, but more data did not necessarily produce better judgment. Netflix says its system considers a user’s viewing history, ratings, similar users’ preferences, title information such as genre and actors, time of day, preferred language, device, and how long someone watched a title. It also personalizes rows, title placement, and the order of titles within each row. That is technically impressive, but it also reveals the nature of the system: Netflix is not merely recommending stories; it is arranging the viewer’s entire field of attention.

The central weakness of Netflix’s modern algorithm is that it measures behavior more easily than satisfaction. A viewer may finish a series because it is excellent, but the same viewer may also finish it because it is easy, familiar, mildly addictive, visually loud, or convenient to leave running in the background. Completion is not the same as admiration. A viewer may abandon a difficult film not because it is bad, but because the timing is wrong, the device is wrong, the mood is wrong, or the film asks for more attention than the viewer can give that night. The algorithm sees the action, but the meaning of the action remains ambiguous.

This ambiguity matters because Netflix increasingly depends on behavior as a substitute for judgment. A click can mean curiosity, boredom, accident, habit, or genuine interest. A completed season can mean love, inertia, sunk cost, or hate-watching. A thumbs-up can mean “this was excellent,” “this was fine,” “show me more of this genre,” or “my child liked it.” These signals are useful for prediction, but they are not the same as criticism. They tell Netflix what a viewer is likely to do next. They do not tell Netflix whether a story has structure, depth, originality, emotional force, or lasting cultural value.

The algorithm becomes more dangerous when it stops discovering value and starts simulating it. If Netflix has a deep catalog, personalization can guide users toward worthwhile material. But if the catalog becomes weaker, more generic, and more self-produced, the recommendation engine is forced into a different role. It must make disposable content appear attractive. It must use genre labels, thumbnail choices, row placement, similarity claims, autoplay logic, and behavioral prediction to make the next option feel good enough. At that point, recommendation is no longer a guide through value. It becomes an interface for managing disappointment.

A recommendation engine cannot redeem a weak catalog; it can only rearrange it. This is the hard limit Netflix eventually confronts. The algorithm can sort the available titles with extraordinary precision, but it cannot make derivative thrillers less derivative, generic documentaries less generic, or disposable series more memorable. It can place a mediocre show in the perfect row, under the perfect thumbnail, after the perfect viewing history, but it cannot give that show a stronger premise, better characters, cleaner structure, or more meaningful emotional payoff. Personalization can reduce friction. It cannot create artistic value.

This is where attention capture begins to fail on its own terms. Netflix’s system can make the next click easier, but it cannot make repeated disappointment disappear. The first weak recommendation is merely a bad match. The fifth or tenth weak recommendation becomes a lesson. The viewer gradually stops believing that Netflix is helping them find something good. The homepage starts to feel less like a library and more like a slot machine: rows of bright surfaces, familiar categories, recycled formulas, and promises that rarely pay off. Once trust collapses, the algorithm is no longer building engagement. It is burning credibility.

The deeper shift is from discovery to retention. Discovery assumes there is value in the catalog and the viewer needs help finding it. Retention assumes the viewer may leave and the platform needs to keep them occupied. Discovery expands taste. Retention narrows behavior. Discovery says, “Here is something worthwhile you may not have found.” Retention says, “Here is something close enough to what you already watched that you may not close the app.” That difference changes the moral purpose of the system.

Netflix’s modern recommendation environment often mistakes habit for taste. If a viewer watches several mediocre crime dramas, the system may conclude that the viewer wants more crime dramas. But the viewer may not want more mediocrity. The viewer may want one excellent story. A system optimized around behavioral adjacency struggles with that distinction. It is very good at finding what resembles the last thing. It is much weaker at finding what is better than the last thing.
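To make the adjacency problem concrete, here is a toy sketch in Python. It is not Netflix’s system: the `Title` fields, the `similarity` function, and the quality numbers are all hypothetical. It ranks a small catalog purely by resemblance to the last title watched, and the two strongest titles never surface because quality is not part of the ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Title:
    name: str
    genre: str
    tone: str        # hypothetical surface feature, e.g. "gritty"
    quality: float   # hypothetical 0-10 judgment; deliberately NOT used for ranking


def similarity(a: Title, b: Title) -> int:
    """Count shared surface features: the kind of adjacency behavior data can see."""
    return int(a.genre == b.genre) + int(a.tone == b.tone)


def recommend_by_adjacency(last: Title, catalog: list[Title], k: int = 2) -> list[Title]:
    # Rank only by resemblance to the last thing watched; quality plays no role.
    candidates = [t for t in catalog if t != last]
    return sorted(candidates, key=lambda t: similarity(last, t), reverse=True)[:k]


catalog = [
    Title("Crime Drama A", "crime", "gritty", 4.0),
    Title("Crime Drama B", "crime", "gritty", 4.5),
    Title("Crime Drama C", "crime", "gritty", 5.0),
    Title("Foreign Noir", "crime", "bleak", 9.0),
    Title("Quiet Mystery", "mystery", "gritty", 8.5),
]

picks = recommend_by_adjacency(catalog[0], catalog)
print([t.name for t in picks])
```

Adding a quality term to the sort key would change the result, but that presumes a quality signal that behavioral data does not actually contain.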

This is why Netflix can feel both personalized and empty. The rows may be tailored. The thumbnails may be optimized. The categories may be precise. The homepage may know the viewer’s habits in unsettling detail. Yet the experience can still feel hollow because the system is not fundamentally organized around quality. It is organized around likelihood of selection. The viewer is not being asked to encounter the best available work. The viewer is being nudged toward the most statistically plausible next watch.

That distinction also explains why the algorithm can become self-reinforcing. Netflix recommends titles based on prior behavior. The user chooses from the options Netflix makes visible. Netflix then treats those choices as evidence of what the user wants. Over time, the system can trap the viewer inside a narrowed version of their own habits. Instead of using preference as a bridge to stronger or stranger work, the platform uses preference as a loop. It gives the viewer more of what the viewer already tolerated, then records that tolerance as demand.
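The loop can be sketched in a few lines. This is a deliberately crude model built on two hypothetical assumptions: the recommender surfaces only the genre the viewer has watched most, and the viewer always watches what is surfaced. Under those assumptions, a one-title lead in one genre snowballs, and that genre’s share of the history climbs from 50% to about 86% in ten rounds.

```python
from collections import Counter


def run_feedback_loop(history: Counter, rounds: int) -> Counter:
    """Toy model: recommend the most-watched genre, assume the viewer watches it."""
    history = Counter(history)  # copy so the caller's history is not modified
    for _ in range(rounds):
        # Hypothetical recommender: only the viewer's most-watched genre is surfaced.
        shown = max(history, key=history.get)
        # Hypothetical viewer: chooses from what is visible, so the shown genre is watched.
        history[shown] += 1
    return history


start = Counter({"crime": 2, "documentary": 1, "sci-fi": 1})
after = run_feedback_loop(start, rounds=10)
share = after["crime"] / sum(after.values())
print(dict(after), f"crime share: {share:.0%}")
```

Nothing in the loop asks whether the viewer enjoyed the extra crime titles; each watch is simply recorded as more demand.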

The problem is not that Netflix knows too little about its users; the problem is that it may know the wrong things too well. It knows what people start, stop, finish, search for, rate, and return to. It knows what time they watch, which devices they use, and which titles they recently engaged with. But the qualities that make a story endure are not so easily captured by behavioral telemetry. Narrative structure, moral seriousness, imaginative force, character development, thematic coherence, and emotional afterlife are not the same kind of data as watch time. The system is strongest where human judgment is thinnest.

Netflix’s dependence on recommendation also helps explain why its later business choices belong to the same larger story, even if they should be analyzed separately. When a platform’s growth depends on keeping users inside a subscription environment, the recommendation engine becomes central to the business model. If the engine can no longer compensate for declining content value, the company must lean harder on other tools: pricing, password-sharing restrictions, advertising tiers, live events, games, bundling, and new metrics of success. Netflix announced in 2024 that it would stop reporting quarterly subscriber additions beginning in 2025 and urged investors to focus more on revenue and operating margins than customer additions, a shift that makes sense for a mature attention business trying to define success beyond simple subscriber growth.

The business argument, however, begins with the cultural argument: no attention system can indefinitely substitute for quality. Netflix can personalize the shelf, rank the rows, modify the artwork, and predict the next click. It can reduce the time between boredom and playback. It can make weak content easier to choose. But if the viewer repeatedly discovers that the chosen content is forgettable, the platform loses the very trust its algorithm depends on. Recommendation works only when the viewer believes there is something worth finding.

Netflix’s algorithm therefore reveals the central failure of modern entertainment discovery. The company built one of the most sophisticated systems in media history for predicting what people might watch next. But prediction is not judgment, and engagement is not value. A platform can learn how to capture attention without learning how to honor it. That is the real failure. Netflix learned how to recommend what keeps people watching. It did not learn how to guarantee that what they watched would be worth remembering.

Rotten Tomatoes: The Problem with Measuring “Good Enough”

Rotten Tomatoes does not measure greatness. It measures the percentage of people willing to call something tolerable. That makes it a useful warning system, but a poor guide to artistic value. The site can tell viewers whether a film or show passed a broad threshold of approval, but it cannot tell them whether the work is original, profound, emotionally powerful, structurally excellent, or culturally significant. Its central weakness is that it turns criticism into a binary signal: Fresh or Rotten.

The Tomatometer is useful because it gives viewers basic direction in a crowded media environment. Rotten Tomatoes defines the Tomatometer as the percentage of professional critic reviews that are positive, with a title considered “Fresh” when at least 60% of reviews are positive and “Rotten” when less than 60% are positive. That system has obvious practical value. If only a small minority of critics respond positively to a movie, viewers have reason to be cautious. If most critics broadly recommend it, viewers have reason to believe it probably functions at some basic level. As a quick consumer warning system, Rotten Tomatoes can help people avoid obvious disasters.
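The published definition is simple enough to write down directly. The Python sketch below computes a Tomatometer-style percentage and applies the 60% Fresh line. The function names are hypothetical, and the sketch ignores details the definition above does not cover, such as how an individual review is classified as positive or how many reviews are needed before a score is shown.

```python
def tomatometer(reviews: list[bool]) -> float:
    """Percentage of reviews classified as positive (True = positive)."""
    if not reviews:
        raise ValueError("a score needs at least one review")
    return 100 * sum(reviews) / len(reviews)


def fresh_or_rotten(score: float, threshold: float = 60.0) -> str:
    """Fresh at or above the threshold, Rotten below it."""
    return "Fresh" if score >= threshold else "Rotten"


# Six positive reviews out of ten sits exactly on the line.
score = tomatometer([True] * 6 + [False] * 4)
print(score, fresh_or_rotten(score))
```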

The problem is that basic approval is not the same as artistic achievement. A film that receives polite approval from nearly everyone may earn an excellent Tomatometer score, even if few critics think it is remarkable. Meanwhile, a stranger, more ambitious, more divisive film may receive a weaker score because it produces sharper disagreement. The system rewards consensus more than depth. It is better at identifying broad acceptability than greatness.

Rotten Tomatoes loses magnitude by treating mild approval and passionate admiration as the same kind of vote. A critic who thinks a movie is barely good enough and a critic who thinks it is a masterpiece may both contribute to the same Fresh score. Likewise, a critic who thinks a film narrowly fails and a critic who thinks it is worthless may both count toward Rotten. The percentage tells us how many critics leaned positive, but it does not adequately show how strongly they felt. This makes the score appear more precise than it really is.
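A small worked example shows how magnitude disappears. The sketch assumes each critic’s opinion can be expressed as a grade from 0 to 10 and, hypothetically, that grades of 6 or above count as positive (in practice a review is classified as positive or negative, not cut at a fixed number). A uniformly lukewarm film scores 100%, while a divisive film with a higher average grade lands at 50% and is labeled Rotten.

```python
from statistics import mean

# Hypothetical: treat a grade of 6 or more (out of 10) as a positive review.
POSITIVE_CUTOFF = 6.0


def summarize(grades: list[float]) -> dict:
    positive = sum(1 for g in grades if g >= POSITIVE_CUTOFF)
    return {
        "tomatometer": 100 * positive / len(grades),  # what the percentage shows
        "mean_grade": mean(grades),                   # the magnitude it discards
    }


safe = [6.5] * 10                  # every critic mildly approves
ambitious = [9.5] * 5 + [5.0] * 5  # half think it is superb, half think it narrowly fails

print("safe:", summarize(safe))
print("ambitious:", summarize(ambitious))
```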

This weakness becomes especially damaging for flawed movies with real strengths. Many interesting films are not clean successes. They may have structural problems, uneven pacing, awkward exposition, or failed endings, while still containing extraordinary performances, visual imagination, thematic ambition, emotional force, or cultural importance. A real critic can hold those contradictions together. Rotten Tomatoes struggles to do so because the system ultimately pushes the judgment toward Fresh or Rotten. A movie that is half-broken but alive can be punished more harshly than a safer movie that is competent but forgettable.

The system therefore tends to favor movies that are agreeable over movies that are risky. A polished, conventional film can do very well because most critics find it acceptable. It may be cleanly made, well-paced, professionally acted, and easy to recommend. But those qualities do not necessarily make it memorable or important. A more original film may disturb expectations, divide audiences, or fail in visible ways because it is attempting something more difficult. Rotten Tomatoes often makes the safer work look superior because the safer work produces fewer objections.

Rotten Tomatoes also hides the reasons behind the judgment. A 75% score can mean that most critics thought a movie was solid but unexceptional. It can also mean that some critics loved it while others strongly disliked it. It can mean critics admired the performances but disliked the script, praised the direction but rejected the story, or respected the ambition but doubted the execution. The number compresses these very different critical situations into the same public symbol. It gives the viewer a result without preserving the argument.

The audience score has a parallel weakness. Rotten Tomatoes’ Popcornmeter represents the percentage of fans who rated a movie or show positively; for audience ratings, a full popcorn bucket appears when at least 60% of users give a rating of 3.5 stars or higher. This creates the same threshold problem. It does not measure lasting satisfaction, intensity of response, or the quality of the experience. It measures whether enough users crossed a minimum line of approval. That can be useful, but it is still not the same as judgment.
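The same threshold arithmetic applies to the Popcornmeter. In this hypothetical sketch, a film that most users rate a lukewarm 3.5 stars clears the 60% line, while a film that half its audience rates 5 stars does not, even though its average rating is much higher.

```python
from statistics import mean


def popcornmeter(stars: list[float]) -> float:
    """Percentage of user ratings at or above 3.5 stars."""
    return 100 * sum(1 for s in stars if s >= 3.5) / len(stars)


lukewarm = [3.5] * 7 + [1.0] * 3   # most users barely approve
divided = [5.0] * 5 + [3.0] * 5    # half love it, half find it middling

for name, stars in [("lukewarm", lukewarm), ("divided", divided)]:
    score = popcornmeter(stars)
    bucket = "full" if score >= 60 else "not full"
    print(name, score, bucket, "average stars:", mean(stars))
```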

Verified audience ratings solve only part of the audience-score problem. Confirming that a user bought a ticket can reduce some forms of bad-faith rating, but it does not make the audience representative. Verified viewers are people who already chose to see the movie, which often means they were predisposed by marketing, fandom, franchise loyalty, genre preference, or cultural identification. Verification can prove attendance. It cannot prove balanced judgment.

The deeper cultural problem is that Rotten Tomatoes turns criticism into traffic signage. A review is supposed to be an argument about what a work is, what it attempts, how it succeeds, where it fails, and why it matters. Rotten Tomatoes reduces that argument to a public signal: go, stop, safe, unsafe, Fresh, Rotten. That is convenient for consumers, but corrosive for criticism. It trains audiences to ask whether a movie has been approved, rather than asking what kind of experience the movie offers or what kind of value it contains.

Rotten Tomatoes is therefore most useful when treated as a warning light, not a measure of greatness. It can tell viewers when a film has generated broad approval or broad dissatisfaction. It can help identify obvious caution signs. But it cannot preserve nuance, magnitude, ambition, partial success, or artistic risk. The score may help answer whether a movie is probably tolerable. It cannot answer whether the movie is beautiful, necessary, unforgettable, or worth arguing about.

The 60% line matters because it turns “a majority found this acceptable” into a public signal that can look much stronger than it really is. A higher threshold would make the score more trustworthy as a recommendation, though it would also make Rotten Tomatoes less permissive and probably more controversial.

The weakness is intensified by how low the Fresh threshold is. A movie only needs 60% of critics to give it a broadly positive review in order to receive the Fresh label. That means a film can be branded positively even when a large minority of critics found it unsuccessful. This may be useful as a basic warning system: if a movie cannot clear even that low bar, viewers should probably be cautious. But the threshold is too weak to function as a serious mark of quality. It tells us that a movie was tolerable to enough people, not that it was excellent, ambitious, or memorable. Rotten Tomatoes’ own Certified Fresh designation implicitly acknowledges this problem by requiring a stronger 75% score and additional review standards. The ordinary Fresh label therefore should not be read as praise. It is closer to a minimum viability badge.
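The width of that margin is easy to compute. The sketch below models only the percentage thresholds (60% for Fresh, 75% for Certified Fresh) and ignores the additional review standards that Certified Fresh requires; the function name is hypothetical. For a title with 200 reviews, Fresh tolerates up to 80 negative reviews, and even Certified Fresh tolerates 50.

```python
def max_negative_reviews(total_reviews: int, threshold_pct: int) -> int:
    """Most negative reviews a title can carry while still meeting threshold_pct.

    Uses integer ceiling division so the minimum positive count is exact.
    """
    min_positive = -(-total_reviews * threshold_pct // 100)
    return total_reviews - min_positive


for label, pct in [("Fresh", 60), ("Certified Fresh (percentage only)", 75)]:
    print(label, max_negative_reviews(200, pct))
```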

That is the final limitation of the Tomatometer: it mistakes aggregated approval for meaningful judgment. It can count how many people were willing to say yes. It cannot tell us how much that yes mattered.

Amazon: When Discovery Becomes an Auction

Amazon’s weakness is not that every rating is fake or every recommendation is paid. The problem is structural: the user cannot easily tell whether visibility reflects merit, popularity, advertising spend, Amazon’s commercial interest, or some mixture of all four.

Amazon pollutes ratings not simply by allowing bad reviews or sponsored listings, but by collapsing the boundary between judgment and promotion. The star rating may come from users, but the path to that star rating is shaped by advertising, platform incentives, and Amazon’s own commercial interests. The result is a marketplace where visibility looks like merit, popularity looks like quality, and recommendation becomes indistinguishable from sales pressure.

Amazon Turns Discovery Into an Auction

Amazon’s ratings become less trustworthy because they sit inside a paid-placement environment. A five-star item does not simply compete against other five-star items on relevance, quality, or user satisfaction. It competes inside a marketplace where sellers can buy visibility.

Amazon’s own advertising page says Sponsored Products are cost-per-click ads that promote individual listings, and that ads may appear “at the top of, alongside, or within shopping results and on product pages.” Amazon also says these ads help increase product visibility and drive sales.

That is the first pollution point: the rating may be user-generated, but the visibility is commercially engineered. A product can look like a natural recommendation while actually benefiting from paid placement. The user sees star ratings, review counts, badges, rankings, and familiar Amazon interface cues, but the path that brought the product into view may be advertising-driven.

Paid Visibility Contaminates the Meaning of Popularity

Amazon’s system creates a feedback loop between paid placement and apparent legitimacy. A seller buys placement. Placement creates clicks. Clicks create sales. Sales improve ranking signals. Ranking signals create more visibility. Visibility produces more reviews. More reviews make the product look more trustworthy.
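The loop can be simulated with a toy model. Every number and rule below is a hypothetical assumption, not Amazon’s ranking logic: two products convert shoppers at the same rate, organic impressions are split in proportion to past sales, and product A also buys sponsored impressions for the first three rounds. After the ads stop, A keeps roughly three quarters of the organic visibility because the ranking now rewards the sales the ads produced.

```python
def simulate(rounds=6, ad_rounds=3, organic_slots=1000, ad_slots=500, conversions_per_1000=100):
    """Toy paid-placement loop for two equally good products, A and B.

    Hypothetical rules: organic impressions are split in proportion to past
    sales (+1 smoothing), both products convert at the same rate, and only A
    buys extra sponsored impressions during the first ad_rounds rounds.
    """
    sales = {"A": 0, "B": 0}
    for r in range(rounds):
        total = sales["A"] + sales["B"] + 2
        organic_a = organic_slots * (sales["A"] + 1) // total
        impressions = {"A": organic_a, "B": organic_slots - organic_a}
        if r < ad_rounds:
            impressions["A"] += ad_slots  # paid placement on top of organic results
        for product in sales:
            # Same conversion rate for both: the only difference is visibility.
            sales[product] += impressions[product] * conversions_per_1000 // 1000
    return sales


print(simulate())             # three rounds of ads for A
print(simulate(ad_rounds=0))  # no ads: the products stay even
```

With `ad_rounds=0` the two products split sales evenly, which isolates paid placement as the only source of the gap.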

At that point, the rating system has not necessarily been falsified, but it has been subsidized into prominence. The product may have real ratings, but the user is not seeing it merely because it is the best result. The user is seeing it because advertising helped push it into the zone where ratings could compound.

That is a subtler problem than fake reviews. It is not just “the score is wrong.” It is that the entire discovery environment has been tilted before the user ever evaluates the score.

Amazon’s Conflict of Interest Is Built Into the Platform

Amazon is not a neutral marketplace. It is the store, the search engine, the advertising broker, the logistics provider, the data collector, the seller of its own products, and the judge of which products are easiest to see.

That creates an obvious conflict: Amazon can profit from third-party sellers paying for placement, while also promoting its own products or content inside the same environment.

The FTC’s antitrust complaint against Amazon makes this exact kind of allegation. The FTC accused Amazon of degrading the customer experience by replacing relevant organic search results with paid ads and of biasing search results to prefer Amazon’s own products over products Amazon allegedly knew were better quality. These are allegations, not final court findings, but they identify the central structural problem clearly.

The issue is not simply that Amazon has private-label products or original content. Many retailers do. The issue is that Amazon also controls the visibility architecture. It can decide what appears first, what appears “recommended,” what appears sponsored, what appears organic, and what receives the trust halo of the Amazon interface.

Ratings Become Weaker When Surrounded by Promotional Signals

Amazon’s star ratings might still provide useful information, but the surrounding system makes them harder to interpret. A user is not only seeing ratings. The user is seeing “Sponsored” placement, Amazon’s Choice badges, Best Seller labels, Prime eligibility, review counts, discounts, personalized recommendations, algorithmic rankings, and sometimes Amazon-owned or Amazon-favored offerings.

Each of these signals can feel like evidence of quality. But many of them may reflect platform incentives rather than independent value. That means Amazon does not need to fake ratings in order to distort judgment. It can simply surround ratings with enough commercial cues that the user mistakes visibility for quality.

Fake Reviews Are Only the Obvious Version of the Problem

Fake reviews matter, but they are not the deepest issue. The deeper issue is that Amazon’s whole marketplace depends on turning social proof into conversion.

Regulators have repeatedly focused on fake-review problems. In 2025, after a UK Competition and Markets Authority investigation, Amazon pledged stronger action against fake reviews and “catalogue abuse,” where sellers attach reviews from successful products to different products in order to boost ratings. The Guardian reported that Amazon said it blocked 275 million fake reviews in 2024.

That is important, but it also shows the scale of the trust problem. If the marketplace requires constant policing against fake reviews, review manipulation, catalogue abuse, and seller gaming, then the rating system is always operating under pressure.

Amazon ratings are not useless. The sharper point is that they are useful only after discounting for the polluted environment in which they appear.

Amazon’s Recommendation System Is Designed to Sell, Not Judge

Amazon personalization is powerful, but its purpose is commercial. Amazon says it uses shopping activity, preferences, search, browsing, and purchase history to personalize recommendations and product descriptions across the shopping journey.

That is helpful when the user wants convenience. But it is not neutral evaluation. The system is designed to move users toward purchases, not to identify the best product, the most honest review pattern, or the most durable value.

This matters because Amazon recommendations can feel like objective discovery while actually functioning as sales architecture. The recommendation is not criticism. It is merchandising with data.

Prime Video Has the Same Conflict in Entertainment Form

Prime Video extends the problem from products to stories. Amazon is not just helping users find films and shows. It also owns and promotes Amazon Originals, sells or rents third-party titles, bundles content through Prime, supports advertising, and gives some content partners promotional tools.

Amazon’s own Prime Video support materials describe marketing and promotions tools for content providers, including Entertainment Spotlight Ads for self-service providers and invite-only promotional tools for managed contract partners.

That creates the same basic question: when Prime Video surfaces a title, is it doing so because the title is excellent, because it fits the user, because it is included with Prime, because it is monetizable, because a partner promoted it, or because Amazon wants its own content to dominate attention?

The user cannot easily tell.

The Real Failure Is the Collapse of Trust Boundaries

Amazon’s system fails because it collapses categories that should remain distinct. A rating should tell users what other people thought. A recommendation should help users find what is relevant or worthwhile. An advertisement should disclose that visibility was purchased. A platform-owned product should be clearly understood as self-interested placement. A marketplace ranking should help users compare options fairly.

Amazon blends these functions into one interface. The result is a discovery system that looks informational but behaves commercially.

IMDb: When the Scale Becomes a Weapon

IMDb appears to offer a more precise measure of quality than Rotten Tomatoes because it uses a ten-point scale. But that precision collapses when users treat the scale as a weapon. A 1 becomes a punishment, a 10 becomes a defense, and the average becomes less a measure of artistic value than a record of factional pressure. IMDb’s weighted system may reduce the damage, but because the weighting is opaque, users are left with a number whose social meaning they cannot fully interpret.

IMDb may be more useful for older films released before the polarization era, which accumulated large vote totals over time and rarely became immediate fronts in the online culture war. For newer or controversial titles, however, the number often requires suspicion before interpretation.

IMDb’s Strength Is Also Its Weakness

IMDb seems better than Rotten Tomatoes because it lets registered users rate released titles on a scale from 1 to 10. That should, in theory, capture magnitude: a 6 is different from an 8, and a 4 is different from a 1.

That makes IMDb look more nuanced than Rotten Tomatoes. But the system only works if users actually use the full scale. If most people vote honestly, IMDb can be useful. If enough users treat the score as a battlefield, the scale becomes performative. A 1 does not mean “this is nearly worthless.” It means “I want to punish this.” A 10 does not mean “this is a masterpiece.” It means “I want to defend this.”
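The 1-or-10 problem shows up in basic statistics. In this hypothetical sketch, one audience rates a title in the middle of the scale and another votes only at the extremes. Both produce the same average, so the headline number cannot tell the two situations apart; only the spread, which a single score hides, reveals the difference.

```python
from statistics import mean, pstdev

honest = [5] * 5 + [6] * 5       # viewers using the middle of the scale
polarized = [1] * 5 + [10] * 5   # two factions voting only at the extremes

for name, votes in [("honest", honest), ("polarized", polarized)]:
    print(f"{name}: mean={mean(votes)}, spread={pstdev(votes)}")
```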

That is the central failure.

The 1-or-10 Problem Turns Rating Into Political Speech

The biggest weakness of IMDb is extremity. Many users do not ask, “What score best reflects the quality of this work?” They ask, “What score will help push the public number in the direction I want?”

That turns the rating into a political statement. A user who thinks a movie is mediocre may give it a 1 because they believe it is overrated, ideologically offensive, too “woke,” not woke enough, disrespectful to the source material, hostile to the fanbase, or useful as a target in some larger online fight. Another user may give a 10 not because the work is excellent, but because they want to counter the backlash.

At that point, IMDb is not measuring taste. It is measuring mobilization. The score becomes less a judgment of the movie than a record of how intensely different factions wanted to move the number.
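A little arithmetic shows why mobilized extremes are so damaging to an average. The sketch below uses two invented vote histograms (not real IMDb data): one film where most voters choose 5 or 6, and one where voters choose only 1 or 10. It computes a plain unweighted mean, not IMDb's weighted rating (whose method IMDb does not disclose), plus the standard deviation as a rough measure of how spread out the votes are.

```python
import math

# Invented vote histograms (star value -> number of votes). Not IMDb data.
MIDDLING = {4: 10, 5: 40, 6: 40, 7: 10}  # voters use the middle of the scale
POLARIZED = {1: 50, 10: 50}              # voters use only the two extremes


def mean_rating(votes: dict[int, int]) -> float:
    """Plain unweighted mean of a rating histogram."""
    total = sum(votes.values())
    return sum(star * count for star, count in votes.items()) / total


def rating_spread(votes: dict[int, int]) -> float:
    """Population standard deviation (root-mean-square distance of votes from the mean)."""
    total = sum(votes.values())
    mu = mean_rating(votes)
    variance = sum(count * (star - mu) ** 2 for star, count in votes.items()) / total
    return math.sqrt(variance)


if __name__ == "__main__":
    for name, votes in [("middling", MIDDLING), ("polarized", POLARIZED)]:
        print(f"{name}: mean={mean_rating(votes):.1f}, spread={rating_spread(votes):.2f}")
```

Both histograms print a mean of 5.5. Only the spread (about 0.81 versus 4.50) reveals that one score reflects a broad consensus around the middle and the other reflects a fight.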

IMDb Knows the System Is Vulnerable

IMDb’s own rating system implicitly acknowledges this problem. IMDb says it publishes weighted vote averages rather than raw averages, that not all votes have the same impact, and that it may apply an alternate weighting calculation when unusual voting activity is detected. It also says it does not disclose the exact method used to generate the rating, to protect the system from abuse.

That is important. IMDb is not pretending the simple average is reliable. It knows the public score needs protection against manipulation.

But this creates a second problem: opacity. The weighted score may be more reliable than the raw average, but the user cannot fully interpret it. IMDb shows a number, but the exact process behind that number is hidden. That may be necessary for anti-abuse reasons, but it also means the viewer has to trust IMDb’s black box.

So IMDb has a double weakness: the raw vote is vulnerable to manipulation, but the corrected vote is opaque.

Review Bombing Exposes the Emotional Structure of the System

IMDb has repeatedly had to deal with “unusual voting activity” around controversial titles. The 2023 live-action The Little Mermaid received an IMDb warning after unusual voting activity tied to review bombing, and reports described IMDb applying alternate weighting to preserve reliability. Similar warnings appeared around Snow White in 2025 after a wave of one-star ratings.

Those cases show the problem clearly. The score becomes a symbolic fight. Some users are not rating the film as a film. They are voting on casting, politics, franchise resentment, corporate anger, culture-war identity, or online group membership.

That does not mean every negative vote is illegitimate. A film can deserve criticism. But when huge waves of extreme ratings appear before a serious public evaluation has had time to form, the score stops functioning as audience judgment. It becomes a protest mechanism.

Positive Manipulation Is Harder to Prove, but Structurally Plausible

The suggestion that studios or other interested parties buy ratings needs careful phrasing. Without evidence about a specific studio, title, or campaign, it would be wrong to say flatly that studios are buying IMDb ratings. The stronger and more defensible argument is this: IMDb ratings are vulnerable not only to backlash campaigns but also to promotional manipulation. Whether the buyer is a studio, distributor, producer, filmmaker, fan group, or marketing contractor, the system creates an incentive to inflate ratings, because a higher IMDb score can shape public perception.

There is visible evidence that a market for IMDb-rating manipulation exists. Some sites openly advertise paid IMDb votes, customized star ratings, and packages designed to raise a movie or show’s IMDb score. One such service advertises customized 7-, 8-, 9-, and 10-star ratings, while another markets IMDb ratings as a way to improve visibility and make a film look more popular.

That does not prove major studios are buying ratings. It proves something narrower but still important: IMDb ratings are valuable enough that third parties sell manipulation services around them.

That narrower claim is enough for the argument here.

Paid Ratings Would Be Especially Hard to Detect

Negative review bombing is often obvious because it produces visible waves of 1-star votes. Positive manipulation can be harder to identify because it can mimic ordinary enthusiasm.

A new movie getting a surge of 10s may look like fan excitement. A streaming release getting high early votes may look like a passionate audience. An indie film with suspiciously high ratings may look like grassroots support. Unless the pattern is extreme, outside viewers cannot easily tell whether the score reflects genuine admiration, coordinated fandom, paid promotion, or some mixture of all three.

That makes positive manipulation more dangerous than obvious review bombing. A wave of 1-star attacks often looks suspicious. A wave of 10-star praise often looks like success.

IMDb’s Scale Can Exaggerate Fandom

IMDb also rewards the intensity of highly motivated audiences. The people who rate a title are not necessarily the full audience. They are the people motivated enough to log in and vote.

That can skew the system toward fandoms, haters, activists, franchise loyalists, and people with strong identity investments in the work. A casual viewer who thought a movie was “fine” may never vote. A furious fan or defender might.

So IMDb does not measure “the audience.” It measures the voting audience. That distinction matters.
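The gap between "the audience" and "the voting audience" can be illustrated with a toy calculation. Everything in the sketch below is a hypothetical assumption: the number of viewers holding each opinion, and the turnout rates, which assume that viewers with strong feelings vote far more often than viewers who thought the film was fine.

```python
# Hypothetical numbers throughout, invented for illustration.
# How many viewers hold each opinion of a film (star value -> viewers).
AUDIENCE = {1: 200, 5: 3000, 6: 4000, 7: 2000, 10: 800}

# Assumed turnout: out of every 100 viewers with a given opinion, how many
# log in and vote. Strong opinions are assumed to vote twenty times as often.
TURNOUT_PER_100 = {1: 40, 5: 2, 6: 2, 7: 2, 10: 40}


def mean_rating(votes: dict[int, int]) -> float:
    """Plain unweighted mean of a rating histogram."""
    total = sum(votes.values())
    return sum(star * count for star, count in votes.items()) / total


def voting_audience(audience: dict[int, int], turnout_per_100: dict[int, int]) -> dict[int, int]:
    """Histogram of the viewers who actually vote (integer arithmetic, rounded down)."""
    return {
        star: viewers * turnout_per_100[star] // 100
        for star, viewers in audience.items()
    }


if __name__ == "__main__":
    voters = voting_audience(AUDIENCE, TURNOUT_PER_100)
    print(f"whole audience mean:  {mean_rating(AUDIENCE):.2f}")
    print(f"voting audience mean: {mean_rating(voters):.2f} from {sum(voters.values())} votes")
```

With these invented inputs, the whole audience averages 6.12, but the 580 people who vote average about 7.48. Nothing was manipulated; the difference comes entirely from who chooses to vote.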

IMDb Is Most Useful After Discounting the Extremes

IMDb is not useless. It can still provide a rough signal, especially for older titles with large vote counts and less active controversy. But the score should be read with suspicion when the rating distribution is polarized, when the title is politically charged, when it belongs to a combative fandom, when its early votes are unusually high or low, or when the vote breakdown is dominated by 1s and 10s.

A healthy distribution should have texture. It should show meaningful use of the middle. If the middle disappears, the score is probably not measuring quality. It is measuring conflict.
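One way a reader might "check for texture" in a published vote breakdown is sketched below. This is a hypothetical reading aid, not IMDb's method and not a Mythogin formula: the 60% cutoff for 1s and 10s, the 4 to 7 "middle" band, and both example histograms are invented assumptions. It reports how much of the vote sits at the extremes and in the middle, and what the average looks like once the 1s and 10s are set aside.

```python
import math

# Illustrative assumptions for this sketch only.
EXTREME_SHARE_LIMIT = 0.6   # hypothetical cutoff for "dominated by 1s and 10s"
MIDDLE_BAND = range(4, 8)   # stars 4, 5, 6, 7 count as "the middle"

# Invented vote histograms (star value -> number of votes). Not IMDb data.
TEXTURED = {1: 3, 2: 2, 3: 4, 4: 8, 5: 14, 6: 22, 7: 24, 8: 15, 9: 5, 10: 3}
GUTTED = {1: 38, 2: 3, 5: 4, 6: 3, 7: 4, 9: 6, 10: 42}


def share(votes: dict[int, int], stars) -> float:
    """Fraction of all votes whose star value is in `stars`."""
    total = sum(votes.values())
    return sum(count for star, count in votes.items() if star in stars) / total


def mean_without_extremes(votes: dict[int, int]) -> float:
    """Unweighted mean after setting aside every 1 and every 10 (NaN if nothing is left)."""
    kept = {star: count for star, count in votes.items() if star not in (1, 10)}
    total = sum(kept.values())
    if total == 0:
        return math.nan
    return sum(star * count for star, count in kept.items()) / total


def looks_polarized(votes: dict[int, int]) -> bool:
    """True when the share of 1s and 10s exceeds the assumed cutoff."""
    return share(votes, (1, 10)) > EXTREME_SHARE_LIMIT


if __name__ == "__main__":
    for name, votes in [("textured", TEXTURED), ("gutted", GUTTED)]:
        print(
            f"{name}: extremes={share(votes, (1, 10)):.0%}, "
            f"middle={share(votes, MIDDLE_BAND):.0%}, "
            f"polarized={looks_polarized(votes)}, "
            f"mean without 1s/10s={mean_without_extremes(votes):.1f}"
        )
```

The invented "gutted" histogram has 80% of its votes at 1 or 10 and is flagged; the "textured" one, with most votes between 4 and 7, is not. Any real cutoff would be a judgment call, which is why the sketch treats it as a named assumption rather than a rule.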

Why Mythogin Uses a Different Lens

Ratings can summarize reaction. Mythogin tries to explain value.

Mythogin does not pretend that taste can be made perfectly objective. Storytelling is personal, situational, and emotional. A story may reach one person at the right moment and leave another person cold. No rating system can eliminate that subjectivity.

But subjectivity does not mean that all judgment is arbitrary. Some parts of storytelling can be discussed with clarity: premise, characterization, pacing, dramatic structure, originality, moral conflict, emotional payoff, symbolic depth, and cultural significance. These features do not guarantee personal enjoyment, but they help explain why some stories endure while others merely pass the time.

Our catalogue is built around that distinction. We are not simply asking whether a story is popular, frictionless, or broadly tolerable. We are asking what kind of value it offers. Does it contain narrative intelligence? Does it deepen perception? Does it show human beings under meaningful pressure? Does it provide delight, catharsis, insight, or mythic resonance?

That is why other ratings fail. They often measure reaction without explaining value. Mythogin’s purpose is to make the act of recommendation more transparent: not just “this is good,” but why it might matter.