McGowan: AI Flunks the Baseball Test

November 4, 2025

by Richard McGowan, Ph.D.

This year’s World Series and all the attention paid to Shohei Ohtani reminded me of the way students used to get by when writing papers: they used CliffsNotes instead of doing research. Can AI be far behind?

I had a bright student who wrote a paper on Plato’s “Crito” as explained by every student’s former cheat sheet, i.e., CliffsNotes. Conner discovered that CliffsNotes was woefully deficient. Its analysis of that Platonic dialogue never addressed history’s first presentation and explanation of civil disobedience, the explanation which Socrates delivered to Crito. The glaring omission led many students to poor comprehension and short-circuited an avenue of contemplative thought about civil disobedience.

CliffsNotes was replaced by Wikipedia, but that source, while better, often lacked nuance and depth. At least an entry in Wikipedia provided sources enabling my students to research independently.

These days, of course, we have Artificial Intelligence (AI), and the word “artificial” accurately describes that source of information, at least now, in its infancy. I decided to test AI against the great knowledge base of Major League Baseball.

With all the attention given to Ohtani’s double role as hitter and pitcher, I began by revisiting my earlier research on other players who were good at both pitching and hitting. Every baseball data hound knows that Babe Ruth was the prototype. He went 94-46 with a 2.28 ERA while compiling a .342 BA. How about others?

As I investigated, I became distracted from that pursuit because of the glaring errors in AI.

I typed into the Google search engine “Leaders in career BA (batting average) for pitchers.” The AI overview said, “The leaders in career batting average for pitchers are Cy Young with a .210 average, followed by other pitchers with high batting averages like Babe Ruth and Dave McNally. However, when discussing a pitcher’s performance, ‘BA’ more commonly refers to earned run average (ERA) rather than batting average.”

BA has never referred to earned run average. And Dave McNally as a hitter? His BA, below the Mendoza line at .196, was worse than the anemic .201 by Cy Young, who hit .001 above the Mendoza line.

With Wes Ferrell in mind, I typed in “mlb pitchers with 6 or more 20-game win seasons not in the hof” and got this response: “There are no MLB pitchers with six or more 20-win seasons who are not in the Hall of Fame. However, pitcher Roger Clemens has the most 20-win seasons (six) among eligible pitchers not in the Hall of Fame.” If the first sentence were correct, the second would make no sense. Further, Wes Ferrell, who is not in the Hall of Fame, had six 20-win seasons.

Curious now about AI blunders, I typed in “pitchers with eight or more 20-game win seasons” and got this AI response: “No pitcher has eight or more 20-win seasons. However, Jim Palmer is reported to have eight seasons with 20 or more wins, a figure surpassed by only a few pitchers in baseball history. This statistic is particularly notable because of the rarity of 20-win seasons in modern baseball. Jim Palmer: Had eight seasons with 20 or more wins, which is the most since 1960.

Other Pitchers: Many other pitchers have achieved multiple 20-win seasons, but none have reached eight or more.”

That response is so plainly wrong as to baffle me. AI reported Spahn and Cy Young as each having five 20-win seasons. Spahn had 12 20-win seasons and Cy Young had 18. In defense of the AI’s reporting on Young, I note that perhaps the site did not know the difference between a 20-win season and a 30-win season, inasmuch as Cy Young had five 30-win seasons.

More AI nonsense about baseball: “Whitey Ford: Managed by Casey Stengel for most of his career, Ford had only one 20-win season, in 1961, when he finally received more regular starts under new manager Ralph Houk.” In fact, Ford won 25 games in 1961 and 24 in 1963.

Enough. For baseball data, AI is unreliable. Use Baseball-Reference.com instead. As for teaching, I used to tell my students that Wikipedia was an okay site for finding research sources — books and articles on their topic. I strongly counseled them to use Google Scholar, a grown-up source that lists books and articles that have been reviewed by academics expert in their respective fields.

Were I still teaching, I’d make the same recommendations, but given reports on how AI is used, the recommendations would go unfollowed. So, what should we do? Have students write papers, turn them in, and then on the due date, in class, ask the students to write a one-page summary of their papers. That summary is the item I would grade and record. Reading, research and writing are about comprehension and assessment (critical thinking), not the relaying of information that may or may not be accurate.

Richard McGowan, Ph.D., an adjunct scholar of the Indiana Policy Review Foundation, has taught philosophy and ethics cores for more than 40 years, most recently at Butler University.
