It sounds like you’re trying to build something similar to what we’ve worked on recently, and the good news is that you don’t need to train a custom model for this. Instead, you can use a stock Large Language Model (LLM) like OpenAI’s GPT-4, which handles this well without the considerable extra work of training your own model.
Why You Don’t Need to Train a Model:
GPT-4 can generate personalized progress reports just by feeding in some basic data—like a student’s initial and most recent grades—and letting the model handle the rest. The model can even adapt its tone depending on whether the student’s performance has improved or declined.
What We Did:
We pulled student data from MongoDB and then generated personalized progress summaries using OpenAI’s GPT-4. The model produced motivational or constructive feedback based on the student’s performance. Here’s how you can do it yourself.
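Before the script itself, it helps to see the data shape it expects. The field names below are taken from the code; the actual values are made up for illustration:

```python
# Hypothetical sample documents matching the fields the script reads.
# In the real database these live in the `students` and `marks` collections.
student = {
    "_id": 1,       # any unique id; marks reference it via student_id
    "name": "Dave",
}

marks = [
    {"student_id": 1, "date": "2024-01-01", "average_mark": 8.2},
    {"student_id": 1, "date": "2024-03-31", "average_mark": 8.7},
]

# The script sorts marks by date and compares the first and last entries.
first, last = marks[0], marks[-1]
improved = last["average_mark"] > first["average_mark"]
print(improved)  # True
```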
The Script:
```python
from openai import OpenAI
import os
from pymongo import MongoClient

# Initialize OpenAI client
client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY")  # this can be omitted if set in the environment
)

# MongoDB connection URI and setup
mongo_uri = os.getenv("MONGO_URI")
mongo_client = MongoClient(mongo_uri)
db = mongo_client['student_test']
students_collection = db['students']
marks_collection = db['marks']

# Fetch student and marks data and create a markdown report
def generate_student_report():
    students = list(students_collection.find().limit(10))  # Limiting to 10 students

    # Debugging step: print the students data
    print("Students data fetched:", students)

    summaries = []
    for student in students:
        print(f"Processing student: {student['name']} (ID: {student['_id']})")

        # Fetch the student's marks data, oldest first
        marks = list(marks_collection.find({"student_id": student["_id"]}).sort("date", 1))

        # Debugging step: print marks data
        print(f"Marks data for {student['name']}: {marks}")

        if len(marks) >= 2:
            # Fetch first and last mark for the comparison
            first_mark = marks[0]
            last_mark = marks[-1]

            # Determine whether performance has improved or declined
            if last_mark['average_mark'] > first_mark['average_mark']:
                prompt = f"""
                Write a brief summary of {student['name']}'s performance. They started with an average mark of {first_mark['average_mark']} on {first_mark['date']} and their average mark improved to {last_mark['average_mark']} on {last_mark['date']}.
                Provide this summary in a motivational, encouraging tone.
                """
            else:
                prompt = f"""
                Write a brief summary of {student['name']}'s performance. They started with an average mark of {first_mark['average_mark']} on {first_mark['date']} but their average mark dropped to {last_mark['average_mark']} on {last_mark['date']}.
                Provide this summary in a constructive tone, acknowledging the drop and suggesting ways for {student['name']} to improve their performance in the future.
                """

            print(f"Generated prompt: {prompt}")

            # Call the OpenAI API using the instantiated client
            response = client.chat.completions.create(
                model="gpt-4",
                messages=[
                    {"role": "system", "content": "You are an expert academic coach. Analyze the student's performance and provide a summary of their progress."},
                    {"role": "user", "content": prompt}
                ],
                max_tokens=150
            )

            # Debugging step: print the API response
            print("GPT-4 API response:", response)

            # Store the summary and student's name
            summaries.append({
                "student_name": student["name"],
                "summary": response.choices[0].message.content.strip(),
                "first_mark": first_mark['average_mark'],
                "last_mark": last_mark['average_mark']
            })

    # Generate markdown report
    generate_markdown_report(summaries)

def generate_markdown_report(summaries):
    report_file = "student_progress_report.md"

    # Open a file in write mode
    with open(report_file, "w") as file:
        file.write("# Student Progress Report\n\n")
        for summary in summaries:
            file.write(f"## {summary['student_name']}\n\n")
            file.write(f"{summary['summary']}\n\n")
            file.write(f"Initial Grade: {summary['first_mark']}\n")
            file.write(f"Latest Grade: {summary['last_mark']}\n\n")
            file.write("---\n\n")

    print(f"Markdown report generated: {report_file}")

# Fetch the data and generate the report
generate_student_report()
```
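To run it, the script expects two environment variables. Something along these lines should work (the package names are the standard `openai` and `pymongo` distributions; the filename and placeholder values are just examples):

```shell
pip install openai pymongo
export OPENAI_API_KEY="your-openai-key"
export MONGO_URI="mongodb://localhost:27017"   # adjust to your deployment
python generate_report.py                      # or whatever you named the script
```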
How It Works:
- OpenAI Integration: We use GPT-4 to generate the personalized summary of each student’s performance.
- MongoDB for Student Data: The script pulls the student names and grades from MongoDB. We compare the student’s first and last grades to create a summary.
- Conditionally Generate Responses: If the student’s performance improves, the response is positive and motivational. If their grades drop, the response is more constructive and suggests ways to improve.
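The conditional branching can be isolated into a small pure function, which makes the tone selection easy to test without touching the database or the API. This is a sketch; `build_prompt` is not part of the original script:

```python
def build_prompt(name: str, first: dict, last: dict) -> str:
    """Pick a motivational or constructive prompt based on the grade trend."""
    if last["average_mark"] > first["average_mark"]:
        tone = "motivational, encouraging"
        change = f"improved to {last['average_mark']}"
    else:
        tone = "constructive"
        change = f"dropped to {last['average_mark']}"
    return (
        f"Write a brief summary of {name}'s performance. "
        f"They started with an average mark of {first['average_mark']} "
        f"on {first['date']} and their average mark {change} on {last['date']}. "
        f"Provide this summary in a {tone} tone."
    )

# An improving student gets the encouraging variant
prompt = build_prompt(
    "Dave",
    {"average_mark": 8.2, "date": "2024-01-01"},
    {"average_mark": 8.7, "date": "2024-03-31"},
)
print("motivational" in prompt)  # True
```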
Sample Report:
Here’s an example of the report it generates:
If the student’s performance improves:
## Dave

Dave has shown remarkable improvement. He started with an average mark of 8.2 on 2024-01-01 and improved to 8.7 on 2024-03-31.

Initial Grade: 8.2
Latest Grade: 8.7
If the student’s performance drops:
## Sarah

Sarah's performance has declined. She started with an average mark of 8.5 on 2024-01-01 but dropped to 7.9 on 2024-03-31. Sarah should focus on consistency in her study habits and seek additional support to bring her grades back up.

Initial Grade: 8.5
Latest Grade: 7.9
Feel free to modify the prompt to suit your personal style. Hope this helps!