Interview: We Talked to a Teacher Who Integrated AI in His History Class
A history teacher shares his successes and mistakes integrating ChatGPT, Gemini, and Claude in the classroom, with practical lessons for teachers who want to use

Miguel Torres, a high school history teacher in Valencia, Spain, started experimenting with AI chatbots in his classroom in September 2025. By February 2026, 78% of his students reported improved research skills and critical thinking, according to an internal survey conducted by his school district. The pilot program, which combined ChatGPT, Gemini, and Claude with traditional source analysis, has since been adopted by four other schools in the region. Torres agreed to share his methodology, mistakes, and the realities of teaching with AI tools that many educators still view with suspicion.
The experiment matters because it addresses a tension facing schools worldwide: how to harness AI’s educational potential without undermining academic rigor. While some districts ban chatbots outright, others lack clear frameworks, leaving teachers to work out their own approach.
- Torres structured AI use around source verification, requiring students to fact-check every chatbot claim against primary documents.
- The pilot showed a 34% reduction in plagiarism incidents compared to the previous academic year, measured by Turnitin reports.
- Students used AI to generate hypotheses about historical events, then tested them through archival research and peer debate.
- The school allocated €1,200 to the pilot, part of which paid for teacher training in prompt engineering and AI ethics before the program launched.
Why a history teacher decided to embrace AI
Torres observed that students were already using ChatGPT for homework, often uncritically copying answers. Rather than fighting a losing battle, he redesigned his curriculum to treat AI as a research assistant that requires constant supervision. «I realized banning it was pointless,» he explains. «They’d just use it in secret, and I’d lose the chance to teach them how to spot hallucinations or biased framing.»
The decision came after a November 2024 incident in which three students submitted nearly identical essays on the Spanish Civil War, all containing a fabricated quote attributed to Manuel Azaña. When confronted, the students admitted they’d trusted ChatGPT without checking sources. Torres saw an opportunity to turn the failure into a teaching moment.
He spent the 2024-25 winter break designing lesson plans that embedded AI verification exercises. Students would ask a chatbot to summarize a historical event, then compare the output against primary sources from the school library’s digital archive. The goal was to make critical thinking visible and measurable.
The methodology: treating AI as a flawed research partner
The core framework divides AI use into three phases: hypothesis generation, verification, and synthesis. According to Torres, this structure mirrors professional historical research and prevents students from treating chatbot answers as final truths. Each phase includes specific deliverables and rubrics adapted from Stanford History Education Group guidelines.
In the first phase, students pose open-ended questions to ChatGPT, Gemini, or Claude—questions like «What economic factors led to the fall of the Roman Empire?» or «How did women participate in the French Resistance?» The chatbot’s answer becomes a working hypothesis, not a conclusion.
Phase two requires students to locate at least three primary or secondary sources that either support or contradict the AI-generated hypothesis. Torres provides access to JSTOR, Google Scholar, and the National Library of Spain’s digital collections. Students document discrepancies in a shared spreadsheet, noting when the AI invents facts, oversimplifies, or omits key actors.
The final phase involves writing a 500-word analysis that incorporates both the chatbot’s insights and the students’ own findings. Torres grades on evidence quality, argumentation, and the ability to explain where the AI went wrong. A student who uncritically copies ChatGPT receives a failing mark; one who refutes the AI with archival evidence can earn top marks.
«The best essays are the ones that prove the AI wrong. That’s when I know they’ve actually learned to think like historians.»
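The three phases described above lend themselves to a simple record-keeping structure. The sketch below is one hypothetical way to model a student investigation in Python. The class names, fields, and the `ready_for_synthesis` helper are illustrative assumptions, not Torres's actual spreadsheet layout; only the three-source minimum and the three discrepancy types come from his description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Discrepancy(Enum):
    """Error types students note when the AI diverges from a source."""
    INVENTED_FACT = "invented fact"
    OVERSIMPLIFICATION = "oversimplification"
    OMITTED_ACTOR = "omitted key actor"


@dataclass
class SourceCheck:
    """Phase 2: one source compared against the AI-generated hypothesis."""
    citation: str
    supports_hypothesis: bool
    discrepancy: Optional[Discrepancy] = None


@dataclass
class Investigation:
    """Phase 1 output plus the accumulated phase 2 checks."""
    question: str
    model: str
    hypothesis: str
    checks: List[SourceCheck] = field(default_factory=list)

    def ready_for_synthesis(self, minimum_sources: int = 3) -> bool:
        # Phase 3 should start only once enough sources have been consulted.
        return len(self.checks) >= minimum_sources

    def discrepancies(self) -> List[SourceCheck]:
        # Checks where the chatbot's answer diverged from a source.
        return [c for c in self.checks if c.discrepancy is not None]


# Placeholder data for illustration only (hypothetical sources).
inv = Investigation(
    question="How did women participate in the French Resistance?",
    model="Claude",
    hypothesis="(chatbot answer pasted here)",
)
inv.checks.append(SourceCheck("Source A (placeholder)", True))
inv.checks.append(
    SourceCheck("Source B (placeholder)", False, Discrepancy.OMITTED_ACTOR)
)
print(inv.ready_for_synthesis())  # False: only two of three sources logged
```

Keeping the discrepancy type as a fixed set of categories, rather than free text, would make it easier to count how often each kind of error appears across a class.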
Mistakes, hallucinations, and teachable moments
Torres documented 47 instances of AI hallucinations across five months of classroom use, ranging from invented dates to fictional historians. These errors became the foundation of a unit on misinformation and source literacy. One particularly egregious example involved Claude generating a plausible-sounding citation for a non-existent book titled The Hidden Role of Sephardic Merchants in the Reconquista.
Students initially trusted the citation because it included realistic details: a publisher (Oxford University Press), a year (2018), and an author with a credible-sounding name (Dr. Elena Ruiz-Domènech). Only after searching university library catalogs and Google Books did they realize the source was fabricated. Torres now uses the incident to teach ISBN verification and author credential checks.
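One part of ISBN verification can be automated. The sketch below validates the check digit of an ISBN-13 using the standard alternating weights of 1 and 3. The function name is hypothetical, and this is not necessarily the procedure Torres teaches. A passing checksum only shows that the number is well-formed; confirming that the book exists still requires a catalog search.

```python
def is_valid_isbn13(isbn: str) -> bool:
    """Return True if the string is a 13-digit ISBN with a consistent check digit."""
    # Hyphens and spaces are formatting only, so strip them first.
    cleaned = isbn.replace("-", "").replace(" ", "")
    if len(cleaned) != 13 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    # Digits in even positions (0-based) weigh 1, odd positions weigh 3.
    total = sum(
        int(digit) * (1 if index % 2 == 0 else 3)
        for index, digit in enumerate(cleaned)
    )
    # The weighted sum of all 13 digits must be a multiple of 10.
    return total % 10 == 0


print(is_valid_isbn13("978-0-306-40615-7"))  # True
print(is_valid_isbn13("978-0-306-40615-8"))  # False: wrong check digit
```

A fabricated citation can still carry a well-formed ISBN, so this check is best treated as a first filter before the library lookup, not a replacement for it.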
Another common error involved anachronistic language. ChatGPT described medieval guilds using terms like «collective bargaining» and «labor rights,» projecting modern concepts onto the past. Torres assigned students to identify three anachronisms in a chatbot-generated essay about the Black Death, then rewrite the problematic sentences using period-appropriate vocabulary.
The misinformation unit culminated in a class debate. Half the students defended a chatbot-generated thesis about the causes of World War I; the other half attacked it using archival sources. The exercise revealed how persuasive AI-generated text can be, even when factually flawed. Post-debate surveys showed that 83% of students felt more confident identifying rhetorical manipulation, a skill Torres considers essential for navigating today’s information landscape.
Tools, budgets, and institutional support
The school allocated €1,200 for the pilot, covering teacher training, a subscription to ChatGPT Plus for classroom demos, and access to academic databases. Torres received 12 hours of professional development focused on prompt engineering and AI ethics. The training, delivered by a consultant from the Universitat de València, emphasized the importance of transparent AI use and the risks of over-reliance.
Torres chose to use free-tier versions of ChatGPT, Gemini, and Claude for student assignments, rotating models to expose students to different strengths and weaknesses. He found that Claude excelled at nuanced historical analysis but sometimes refused to answer politically sensitive questions. ChatGPT provided more confident answers but hallucinated more frequently. Gemini offered good summaries but struggled with non-English sources.
The school also updated its academic integrity policy to distinguish between prohibited AI use (submitting unedited chatbot text) and permitted use (using AI as a brainstorming or drafting tool with full attribution). The new policy, approved by the faculty council in January 2026, requires students to attach an «AI use log» to major assignments, documenting every prompt and the role AI played in their work.
| AI model | Best use case (according to Torres) | Main limitation |
|---|---|---|
| ChatGPT | Generating essay outlines and debate arguments | Frequent citation hallucinations |
| Claude | Nuanced analysis of historiographical debates | Overly cautious with controversial topics |
| Gemini | Summarizing academic articles and timelines | Weaker performance with non-English sources |
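The «AI use log» required by the updated integrity policy could be kept as a simple table. The following is a minimal sketch, assuming one row per prompt. The field names and the CSV format are illustrative assumptions, since the policy as described specifies only that every prompt and the role AI played must be documented.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable


@dataclass
class LogEntry:
    model: str   # which chatbot was used
    prompt: str  # the exact prompt the student typed
    role: str    # role AI played, e.g. "brainstorming" or "drafting"


def render_log(entries: Iterable[LogEntry]) -> str:
    """Serialize log entries to CSV text that can be attached to an assignment."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(LogEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Placeholder entry for illustration only.
log = [
    LogEntry("Gemini", "Summarize the causes of World War I", "brainstorming"),
]
print(render_log(log))
```

CSV is used here only because it opens in any spreadsheet program; a shared document or form would serve the same purpose.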
Student reactions: from skepticism to strategic use
Initial student surveys from September 2025 revealed that 62% viewed AI as a «cheating tool,» while only 18% saw it as a legitimate research aid. By March 2026, attitudes had changed markedly: 71% reported using AI strategically for brainstorming and source discovery. The shift, Torres believes, came from explicit instruction on when and how to deploy chatbots.
One student, Carla Méndez, described using Claude to explore alternative explanations for the collapse of the Weimar Republic. «I asked it to argue against the standard textbook narrative,» she said in a class reflection essay. «Then I had to figure out which counterarguments held up under scrutiny. It forced me to actually engage with the sources instead of just memorizing facts.»
Not all feedback was positive. Some students complained that the verification phase took longer than traditional research methods. Others felt penalized for trusting the AI, even when the hallucinations were difficult to detect. Torres responded by adjusting his grading rubric to reward students who documented their verification process, even if they initially accepted a false claim.
Parents expressed mixed reactions. A few raised concerns about data privacy and the ethics of using commercial AI tools in public schools. Torres addressed these worries by ensuring that students never entered personally identifiable information into chatbots and by using school-provided email accounts with privacy settings enabled. The school also sent consent forms home, allowing parents to opt their children out of AI-based assignments.
What other teachers can learn from the Valencia pilot
Torres identifies three critical success factors: explicit verification protocols, administrative support, and a willingness to treat AI failures as learning opportunities rather than catastrophes. Schools considering similar programs, he argues, should budget for teacher training and invest in database subscriptions that give students access to reliable sources.
He recommends starting small, with one unit in one class, and iterating based on student performance data. His school tracked metrics like plagiarism rates, research citation quality, and student self-reported confidence in evaluating sources. The data collected so far shows measurable improvements in all three areas.
Torres also stresses the importance of transparency. He shares his own AI-use logs with students, modeling the behavior he expects. When preparing lectures, he sometimes uses ChatGPT to generate discussion questions, then fact-checks them before class. «If I’m going to ask students to be honest about their AI use,» he says, «I have to model that honesty myself.»
Other schools in Spain have begun adapting Torres’s framework. A secondary school in Barcelona launched a similar pilot in March 2026, focusing on literature instead of history. Early reports suggest that the verification-based approach transfers well to other humanities subjects, though science teachers remain divided on whether the methodology applies to STEM fields.
For teachers outside Spain, Torres points to resources like Anthropic’s Claude for Education initiative, which offers lesson plan templates and prompt libraries designed for classroom use. He also recommends joining online communities where educators share AI integration strategies, such as the AI in Education subreddit and the EdTech Hub’s Slack channel.
What happens when the technology changes faster than the curriculum
Torres acknowledges that his framework may need constant revision as AI capabilities evolve. OpenAI released GPT-4 in March 2023 and GPT-5 in August 2025; by early 2026, reports were circulating of newer models that could generate more convincing citations and access real-time web data. Each new release threatens to outpace the verification skills students have just learned.
The challenge raises a broader question for education systems: should schools teach specific tool literacy (how to fact-check ChatGPT) or general AI literacy (how to evaluate any automated text, regardless of the underlying model)? Torres leans toward the latter. «In five years, ChatGPT might not exist, or it might work completely differently,» he says. «But the skill of questioning a confident-sounding claim? That’s timeless.»
The Valencia pilot ends in June 2026. The school district plans to evaluate its long-term impact by tracking participating students’ performance on Spain’s university entrance exams, which include essay components that reward critical thinking and source analysis. If the results hold, the district may expand the program to all secondary schools in the region. For now, Torres continues refining his lesson plans, waiting to see whether his students’ AI-augmented education translates into measurable academic success—or whether the experiment will be remembered as a well-intentioned detour in the history of pedagogy.