An Indianapolis-based nonprofit that regulates student athletes receives student transcripts from high schools around the country. The transcripts contain the required information, but in thousands of different formats. As a result, employees had to enter the documents into a database manually, a time-consuming and error-prone process that cost the organization hundreds of man-hours per week. Other local companies were tasked with automating the data entry but were ultimately unsuccessful.
The goal was to systematically identify courses and grades across academic years, regardless of transcript format, and to parse all transcripts into a standard data format that could eventually be entered into the client’s transcript system, reducing manual time and cost.
How We Solved It:
- EduSource used a text extraction library to pinpoint the x and y coordinates for course name, course ID, and course grade.
- We then captured the coordinates of each data point within each document format.
- We also pre-scrubbed and parsed the data before loading it into the database. Until now, those steps had to be done manually.
- EduSource demonstrated the ability to automate the process of identifying and capturing transcript data across thousands of formats. Phase 2 of this project will save the company hundreds of thousands of dollars annually.
- We are also beginning to apply machine learning trained on previous transcripts, so that when the algorithm recognizes a format it places the transcript into the correct template.
- Future steps are to build an API to the database and to use Optical Character Recognition (OCR) to capture information from scanned transcripts and images.
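
The coordinate-based approach in the steps above can be illustrated with a minimal Python sketch. It is not EduSource's implementation. It assumes a text extraction library has already returned each word with its x and y position, and every name in it (`Word`, `TEMPLATE`, `group_rows`, `parse_row`, `parse_page`) and every column boundary is hypothetical. Words are grouped into rows by their y coordinate, then assigned to course ID, course name, or grade by the x-range they fall in for that transcript format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word on the page
    y: float  # vertical position of the word on the page


# Hypothetical template for ONE transcript format: the x-range of each column.
# A real system would hold one such template per recognized format.
TEMPLATE = {
    "course_id": (0, 100),
    "course_name": (100, 400),
    "grade": (400, 500),
}


def group_rows(words, tolerance=2.0):
    """Group words whose y coordinates lie within `tolerance` of the row start."""
    rows = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        if rows and abs(rows[-1][0].y - word.y) <= tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return rows


def parse_row(row, template):
    """Assign each word in a row to the field whose x-range contains it."""
    record = {field: [] for field in template}
    for word in sorted(row, key=lambda w: w.x):
        for field, (x_min, x_max) in template.items():
            if x_min <= word.x < x_max:
                record[field].append(word.text)
                break
    return {field: " ".join(parts) for field, parts in record.items()}


def parse_page(words, template=TEMPLATE):
    """Turn positioned words into standard course records."""
    return [parse_row(row, template) for row in group_rows(words)]


# Made-up sample data standing in for the output of a text extraction library.
sample = [
    Word("ENG101", 20, 100), Word("English", 120, 100.5),
    Word("9", 180, 100), Word("A-", 420, 100),
    Word("MAT201", 20, 115), Word("Algebra", 120, 115),
    Word("II", 175, 115), Word("B+", 420, 115),
]
records = parse_page(sample)
```

Keeping the column positions in a per-format template means the parsing logic stays the same across formats; supporting a new transcript layout would only require a new set of coordinates.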