Objectives and Perspectives
A lesson plan is a teacher’s daily guide that outlines what students need to learn, how it will be taught, and how learning will be measured (Jackson et al., 2002). Yet creating high-quality lesson plans is challenging, especially for teachers with limited resources and support. To address this challenge and improve the quality of the lesson plans teachers develop, we propose a framework for automatically generating lesson plans with large language models (LLMs). We draw inspiration from iterative improvement methods in software development (Salo et al., 2007; Touvron et al., 2023; Zhao et al., 2023). The proposed framework (Figure 1) consists of three stages: component generation, self-critique, and refinement.
Methods
In the component generation stage, the LLM generates each part of the lesson plan step by step, using essential information provided by teachers (lesson topic, grade level, and subject) and lesson objectives retrieved from real-world datasets via retrieval-augmented generation (RAG; Lewis et al., 2020; Gao et al., 2023). This step-by-step approach is intended to keep the generated content relevant and coherent. In the self-critique stage, the LLM integrates the generated components into a complete lesson plan and evaluates it against criteria from senior educators, producing a critical review and suggestions for improvement (Madaan et al., 2024). The refinement stage uses the feedback from the self-critique to improve the lesson plan iteratively: the LLM revisits each component and makes the necessary changes so that the final plan meets high quality standards, a process akin to reflective practice in education.
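The three stages can be sketched as a small pipeline. This is a minimal illustration and not the authors' implementation: the `LLM` callable, the component names in `COMPONENTS`, the prompt wording, and the `rounds` parameter are all assumptions made for the example, and retrieval is represented only by a list of objectives passed in by the caller.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function that maps a prompt string to generated text.
LLM = Callable[[str], str]

# Assumed component names; the abstract does not list the full set.
COMPONENTS: List[str] = ["objectives", "materials", "procedures", "assessment"]


def generate_components(llm: LLM, topic: str, grade: str, subject: str,
                        retrieved_objectives: List[str]) -> Dict[str, str]:
    """Stage 1: generate each component in turn, conditioning on earlier ones."""
    plan: Dict[str, str] = {}
    references = "; ".join(retrieved_objectives)
    for name in COMPONENTS:
        earlier = "\n".join(f"{k}: {v}" for k, v in plan.items())
        prompt = (
            f"Subject: {subject}\nGrade: {grade}\nTopic: {topic}\n"
            f"Reference objectives: {references}\n"
            f"Components so far:\n{earlier}\n"
            f"Write the '{name}' component of the lesson plan."
        )
        plan[name] = llm(prompt)
    return plan


def self_critique(llm: LLM, plan: Dict[str, str], criteria: List[str]) -> str:
    """Stage 2: assemble the full plan and ask for a review against the criteria."""
    full_plan = "\n\n".join(f"{name.upper()}\n{text}" for name, text in plan.items())
    criteria_text = "; ".join(criteria)
    prompt = (
        "Review this lesson plan against the criteria and suggest improvements.\n"
        f"Criteria: {criteria_text}\n\n{full_plan}"
    )
    return llm(prompt)


def refine(llm: LLM, plan: Dict[str, str], critique: str) -> Dict[str, str]:
    """Stage 3: revisit every component with the critique in hand."""
    return {
        name: llm(f"Critique:\n{critique}\n\nRevise the '{name}' component:\n{text}")
        for name, text in plan.items()
    }


def build_lesson_plan(llm: LLM, topic: str, grade: str, subject: str,
                      retrieved_objectives: List[str], criteria: List[str],
                      rounds: int = 1) -> Dict[str, str]:
    """Run generation once, then `rounds` critique-and-refine passes (assumed count)."""
    plan = generate_components(llm, topic, grade, subject, retrieved_objectives)
    for _ in range(rounds):
        critique = self_critique(llm, plan, criteria)
        plan = refine(llm, plan, critique)
    return plan
```

Passing the model in as a plain callable keeps the sketch independent of any particular API client; a wrapper around a real chat-completion call could be supplied in its place.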
Data Sources
In this study, we used the GPT-4 API (Achiam et al., 2023) to generate mathematics lesson plans on more than 80 elementary school topics (Table 1). To ensure quality and effectiveness, we compiled a dataset of exemplary lesson plans from diverse educational platforms, all crafted by human educators.
Four versions of the lesson plans were evaluated: version 1 (the whole plan generated in a single pass), version 2 (generated component by component), version 3 (generated component by component and then refined), and a human version (designed by a human educator). The evaluation criteria (Table 2) comprised a global score (30 points) for overall integrity, accuracy, consistency, and logic, and a component score (66 points) for individual elements, such as the clarity of objectives, completeness of materials, and effectiveness of procedures.
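Because the global and component scores have different maxima (30 and 66 points), comparing them across versions is easier once each is expressed as a fraction of its maximum. The sketch below shows one way to record and normalize a score; the `PlanScore` class and its field names are hypothetical, and the values in the usage lines are arbitrary placeholders, not results from the study.

```python
from dataclasses import dataclass

GLOBAL_MAX = 30      # maximum global score stated in the evaluation criteria
COMPONENT_MAX = 66   # maximum component score stated in the evaluation criteria


@dataclass(frozen=True)
class PlanScore:
    """Hypothetical record of one lesson plan's scores."""
    version: str
    global_score: float
    component_score: float

    def __post_init__(self) -> None:
        # Reject scores outside the ranges defined by the rubric.
        if not 0 <= self.global_score <= GLOBAL_MAX:
            raise ValueError(f"global_score must be in [0, {GLOBAL_MAX}]")
        if not 0 <= self.component_score <= COMPONENT_MAX:
            raise ValueError(f"component_score must be in [0, {COMPONENT_MAX}]")

    @property
    def global_fraction(self) -> float:
        return self.global_score / GLOBAL_MAX

    @property
    def component_fraction(self) -> float:
        return self.component_score / COMPONENT_MAX


# Arbitrary placeholder values, chosen only to illustrate the calculation.
example = PlanScore("example", global_score=15, component_score=33)
print(example.global_fraction, example.component_fraction)
```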
Results
The evaluation results (Table 3) showed that version 2 scored lower on global coherence than versions 1 and 3, indicating that component-by-component generation alone did not ensure coherence and relevance. Component scores increased from version 1 to version 2 to version 3, suggesting that component-specific generation produced more targeted and specific content. Overall, version 3 emerged as the most effective: it combined the global coherence of version 1 with the component specificity of version 2 and demonstrated the potential of iterative generation and refinement for producing high-quality lesson plans.
Significance of the Study
This study highlights the efficacy of using AI to generate structured and comprehensive lesson plans, and it provides valuable insights for future optimization in educational content creation.