Contributor: ThinkFWD
Computer says “no”? The pitfalls and potential of machine assessment

Ever since the room-sized computers of the mid-twentieth century were programmed to read data from a series of punched cards, the potential for computers to recognise, process and respond to human input has existed. This potential is now being realised.

Nowadays, whether it’s a computer-based skills test, a Centrelink online claim form or even a web-based insurance application (the methodology is really much the same), we’ve all had contact with some form of machine assessment. But for education, with testing forming such a time-consuming, resource-intensive yet crucial part of the process, the potential for computers to lighten the load of assessment holds great significance. The question is: where are we on this journey in Australia today?

Rise of the machines

There are different kinds of machine assessment and they range from pretty straightforward to mind-bendingly complex.

Multiple choice testing is a familiar example and it’s easy to see how using computers to mark these sorts of tests would be fairly simple. But, of course, things have moved on considerably.

New technology, from the internet and browser-based applications to tablets and touchscreens, has led to extensive user interface advancements in computer tests. Hotspots can be used to test image comprehension, and drag-and-drop functionality allows more interactive ways of getting students to respond to questions and tasks than they could on paper or by simply selecting a radio button.

But more interestingly, advances in artificial intelligence now mean computers can accurately judge persuasive writing in long form as well. This may sound like science fiction, but so were touchscreen tablets in the 1960s, when Dr Ellis Page first demonstrated that a computer could be used to score written responses to essay questions. More than 50 years later, a far more mature and practical evolution of that software is available in Australia through the Australian Council for Educational Research (ACER), which offers a range of online assessment modules for schools, including eWrite tests that use the world-leading IntelliMetric® system from Vantage Technologies to score long-form answers.

Whilst take-up of ACER’s platform is currently optional, when the National Assessment Program - Literacy and Numeracy (NAPLAN) moves online from 2017, more and more schools will be exposed to algorithmic machine assessment in the NAPLAN persuasive writing test, in addition to the more straightforward multiple-choice sections.

Accurate, reliable and without variation or bias: a natural fit for artificial intelligence

Long-form machine assessment uses advanced computational techniques such as cognitive processing, artificial intelligence, natural language understanding and computational linguistics to score responses on a range of criteria, including orientation and engagement, register, text structure, ideas, vocabulary, paragraphing, sentences, punctuation in sentences and spelling. The resulting assessment has proven to be more accurate, reliable and consistent than human marking. It also removes any bias arising from a marker’s foreknowledge of particular students, or from two markers disagreeing about what matters most in a topic. Scores come back much more quickly than manual marking allows and, importantly, the workload on teachers is reduced.
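As a toy illustration of what criterion-based scoring looks like structurally, the sketch below aggregates per-criterion scores into a single overall mark. The criterion names come from the list above; treating the overall mark as a flat average is an assumption for illustration, since real engines such as IntelliMetric weight criteria in ways vendors don’t publish.

```typescript
// Criterion names come from the list above; the flat average is an
// assumption for illustration, not how any real engine combines them.
const CRITERIA = [
  "orientation and engagement",
  "register",
  "text structure",
  "ideas",
  "vocabulary",
  "paragraphing",
  "sentences",
  "punctuation in sentences",
  "spelling",
] as const;

type Criterion = (typeof CRITERIA)[number];
type CriterionScores = Record<Criterion, number>;

// Aggregate the per-criterion scores into a single overall mark.
function overallScore(scores: CriterionScores): number {
  const total = CRITERIA.reduce((sum, c) => sum + scores[c], 0);
  return total / CRITERIA.length;
}
```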

Of course there are criticisms, with some believing that while a computer can score some core features of an essay, it has no ability to appreciate, or be swayed by, the overall persuasive effect of a well-written argument.

Critically, the marking program is not able to generate the initial model answers itself. Three hundred varied responses to a task must first be marked by humans and graded into a range of percentage strata. The responses and their corresponding grades are then fed into the program so that it can learn to recognise the hallmarks of a good answer, before applying that inferred intelligence to other responses. Once the machine assessment is complete, a percentage of answers are also double-marked using 20 control scripts, representing responses from excellent to very poor, to ensure there isn’t any significant deviation.
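To make the shape of that calibrate-then-verify process concrete, here is a minimal sketch in TypeScript. The type names, the scoring engine interface and the deviation tolerance are all assumptions for illustration, not details of IntelliMetric or any real product.

```typescript
// Hypothetical types and engine interface, for illustration only.
interface GradedResponse {
  text: string;
  humanScore: number; // percentage grade assigned by a human marker
}

interface ScoringEngine {
  train(samples: GradedResponse[]): void;
  score(text: string): number;
}

// Calibration: around 300 varied, human-marked responses teach the
// engine the hallmarks of a good answer before it marks anything itself.
function calibrate(engine: ScoringEngine, humanMarked: GradedResponse[]): void {
  if (humanMarked.length < 300) {
    throw new Error("need a full set of human-marked training responses");
  }
  engine.train(humanMarked);
}

// Quality control: re-mark control scripts spanning excellent to very
// poor and return any where the machine deviates significantly from the
// agreed human grade (the 10-point tolerance is an assumed threshold).
function flagDeviations(
  engine: ScoringEngine,
  controlScripts: GradedResponse[],
  tolerancePoints = 10
): GradedResponse[] {
  return controlScripts.filter(
    (s) => Math.abs(engine.score(s.text) - s.humanScore) > tolerancePoints
  );
}
```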

Don’t worry, people aren’t obsolete just yet

There are still things that the software just can’t handle, and some scripts are returned unmarked and flagged for individual manual moderation. This could be because a student was able to input a special character that the system doesn’t recognise, or because a response wasn’t long enough for the computer to mark. In the latter case, the answer would usually be a poor effort which represents a fail – but it might just be that the student has so succinctly and brilliantly summed up an answer that it falls outside of the machine’s ability to assess.
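Those pre-checks might look something like the sketch below. The character whitelist and the minimum length are assumed values, not real product settings; the key point is that failures are routed to a human marker rather than auto-failed.

```typescript
// Illustrative pre-checks only; the character whitelist and the
// minimum word count are assumptions, not real product settings.
const MIN_WORDS = 50;

// Returns a reason for manual moderation, or null if the response
// can safely be sent to the scoring engine.
function needsManualModeration(response: string): string | null {
  // Characters outside the engine's expected set trigger human review.
  if (/[^\x20-\x7E\s]/.test(response)) {
    return "unrecognised character";
  }
  // Very short answers are returned unmarked rather than auto-failed:
  // brevity can mean a poor effort or a brilliantly succinct one.
  const words = response.trim().split(/\s+/).filter(Boolean).length;
  if (words < MIN_WORDS) {
    return "response too short to mark reliably";
  }
  return null;
}
```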

Despite the benefits and potential of automation, human input and control will always be central to the process. The human brain is incredibly complex and no-one is claiming that software can emulate its remarkable abilities just yet. But let’s not forget that teachers have had their entire lives to understand human behaviour and communication, plus years of education, professional training and experience, before they are qualified to mark exams – the computer can assimilate data from hundreds of answers and then grade papers in seconds. However, well-educated, experienced and intelligent people still have to design the curricula, set the tests, and decide what good and bad answers look like to give the software enough variation to learn the difference.

Challenges and opportunities abound

For standardised national testing and other exams that operate on a large scale, a move to machine assessment could lead to some big savings in terms of time and cost. It’s also a good answer to some of the challenges of placement testing, which must often be completed remotely and assesses a candidate’s ability to reason more than their knowledge of details within a particular curriculum.

There are, of course, concerns over cheating the system, as with any testing. And it seems that there are more inventive ways of manipulating the system with computers than the old-school methods of writing notes on your hand or good old-fashioned plagiarism.

Whilst online testing platforms include safeguards, such as disabling browser-generated grammar prompts and spellcheck, blocking copy and paste, and freezing the user’s screen so they can’t switch to another application to look up notes or browse the internet, there are still ways the system can be fooled. As with any exam where the stakes are high, students may go to extraordinary lengths to do just that.
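On the browser side, some of those safeguards can be sketched in a few lines of TypeScript. The function name and wiring are assumptions for illustration; real lockdown environments go considerably further.

```typescript
// A browser-side sketch of the safeguards described above. The
// function name and wiring are assumptions, not a real product's API.
function lockDownTestArea(answerBox: HTMLTextAreaElement): void {
  // Disable browser-generated spelling and grammar prompts.
  answerBox.spellcheck = false;
  answerBox.setAttribute("autocomplete", "off");

  // Block copy, cut and paste inside the answer area.
  for (const evt of ["copy", "cut", "paste"] as const) {
    answerBox.addEventListener(evt, (e) => e.preventDefault());
  }

  // Flag the session if the user switches to another tab or
  // application to look up notes or browse the internet.
  document.addEventListener("visibilitychange", () => {
    if (document.visibilityState === "hidden") {
      console.warn("Test window lost focus: session flagged for review");
    }
  });
}
```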

For tests taken remotely, establishing the test-taker’s true identity is an immediate issue. Even if the system receives the correct usernames, passwords and other identifiers, how can it be sure that the right person actually supplied them? In the US, there are businesses offering verification services that go as far as having remote invigilators watch students complete their tests via webcam, whilst simultaneously viewing the content of the student’s screen to ensure they don’t use any notes or google a few things during the exam. Test-takers must identify themselves with photo identification, answer personal questions from a database and do a webcam sweep of their room and workspace to prove that there are no notes or other sources of information on show. Some programs also track a user’s typing patterns for keystroke verification, and in the US there are pilot schemes using biometric recognition, such as iris scanning, to verify students’ identities.
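Keystroke verification, at its simplest, compares a candidate’s typing rhythm against an enrolled profile. The sketch below uses only the mean gap between keypresses and an assumed tolerance; production systems model far richer timing features.

```typescript
// A toy view of keystroke verification: compare the mean gap between
// keypresses in this session with an enrolled profile. The 40 ms
// tolerance is an assumption for illustration.
function meanGapMs(keyTimesMs: number[]): number {
  if (keyTimesMs.length < 2) return 0; // not enough data to profile
  const gaps = keyTimesMs.slice(1).map((t, i) => t - keyTimesMs[i]);
  return gaps.reduce((sum, g) => sum + g, 0) / gaps.length;
}

function matchesTypingProfile(
  enrolledMeanGapMs: number,
  sessionKeyTimesMs: number[],
  toleranceMs = 40
): boolean {
  const observed = meanGapMs(sessionKeyTimesMs);
  return Math.abs(observed - enrolledMeanGapMs) <= toleranceMs;
}
```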

The future of machine testing: plenty to talk about

Machine assessment is growing rapidly. In the US, Pearson has recently released the Test of English Language Learning (TELL), which uses speech recognition so that students can give both written and verbal answers for machine assessment. TELL also offers dynamic reporting and tracks development throughout a school year, the idea being that teachers receive immediate feedback on their pupils’ progress and can adjust their instruction accordingly.

While machine testing is developing deeper functionality and more impressive features, it’s not yet perfect, and as each current issue is overcome new ones will surely arise.

The truth is that no other testing system is perfect either, nor ever has been, and in the fast-moving digital age, continual evolution to stay on top of the shifting sands is paramount. The old adage that change is constant rings ever true when we think about computers. Modern technology has given us the ability to experience truly innovative pedagogy, and today’s educators and students are in a prime position to experience firsthand the “science fiction” that Dr Page’s generation could only dream about.
