Friday, October 8, 2010

Evaluation: Collecting Data to Prove and Improve

Evaluation is a vital component of the design and research process. However, games present unique challenges as well as unique opportunities for measurement. Games allow us to measure far more than “task-related” assessment or assessment of “targeted skills” (Bente & Breuer, 2009). They let researchers collect and code information on states such as “arousal, attention, and workload,” as well as their mutual affiliations and transactions. Games also provide a variety of built-in assessments, such as health bars, leaderboards, end-of-mission stats, and more, that can be manipulated to monitor and provide feedback on selected serious goals (Bente & Breuer, 2009). The ultimate goal of such evaluation is threefold: to provide assessments that harvest data to improve in-game learning experiences, to provide assessments that quantify player data to adapt or tailor gameplay, and to provide assessments that measure the effectiveness of the serious game in meeting its target outcomes (Bente & Breuer, 2009).
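As a rough sketch of what such embedded measurement might look like in practice (all class and channel names here are my own illustration, not from Bente & Breuer), a game could log the same signals it already shows the player, such as health changes and mission stats, as timestamped telemetry for later analysis:

```python
import time
from collections import defaultdict

class EmbeddedTelemetry:
    """Logs in-game signals (health, mission stats, power-ups) as
    timestamped events so they can double as assessment data."""

    def __init__(self):
        self.events = []

    def log(self, channel, value):
        # channel: e.g. "health", "mission_time", "powerup"
        self.events.append((time.time(), channel, value))

    def summary(self):
        # Aggregate per channel: count and mean, a crude stand-in
        # for the richer coding a research team would perform.
        totals = defaultdict(list)
        for _, channel, value in self.events:
            totals[channel].append(value)
        return {ch: (len(vals), sum(vals) / len(vals))
                for ch, vals in totals.items()}

# Example: the same health bar the player sees also feeds the log.
telemetry = EmbeddedTelemetry()
telemetry.log("health", 100)
telemetry.log("health", 65)
telemetry.log("mission_time", 42.5)
print(telemetry.summary())
```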



Two types of assessment are especially important in serious games today: formative and summative (Bente & Breuer, 2009). A formative assessment tells you where the player's mastery of an objective stands at a given moment before or during play, while a summative assessment tells you the player's mastery of an objective at the end of play. Both provide meaningful data that can be used to measure and improve the serious game.
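To make the distinction concrete, here is a minimal sketch (hypothetical names, not drawn from either source) in which one stream of responses yields both formative snapshots during play and a summative score at the end:

```python
class MasteryTracker:
    """Tracks correct/incorrect responses on one learning objective."""

    def __init__(self, objective):
        self.objective = objective
        self.responses = []  # True = correct, False = incorrect

    def record(self, correct):
        self.responses.append(correct)

    def formative_check(self, window=5):
        # Formative: mastery estimate over recent attempts,
        # available at any moment before or during play.
        recent = self.responses[-window:]
        return sum(recent) / len(recent) if recent else None

    def summative_score(self):
        # Summative: mastery over the whole session, reported at the end.
        return sum(self.responses) / len(self.responses) if self.responses else None

tracker = MasteryTracker("fractions")
for correct in [False, False, True, True, True, True]:
    tracker.record(correct)
    print("formative:", tracker.formative_check())
print("summative:", tracker.summative_score())
```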

For example, imagine you are a teacher in a classroom of 30 students. You are required to improve student achievement so that 80% of your students achieve mastery on a high-stakes state assessment. You teach, but never perform formative assessments. Therefore, when the summative state assessment scores arrive, you are surprised that only 65% of your students achieved mastery. If you had administered formative assessments on the content to be mastered, including a beginning-of-year pre-test, you would have known the baseline of the group of students and been able to tailor, or differentiate, instruction to ensure that many more of them were successful. You would also have known which skills were strengths and which were weaknesses, allowing you to create unit plans that spend the appropriate amount of time instructing and reviewing key skills.

This process is actually part of “evidence-centered design.” You must model competence: what a student needs to know (or know how to do) in order to accomplish a task. You must mine evidence that students are achieving the goals by denoting measurable behaviors, or benchmarks, that can be observed. From this knowledge you craft the task, a situation that provides evidence of the benchmarks. Then, using the data collected about student performance, the game, instruction, or delivery of the intervention should adapt to allow the appropriate amount of practice or review necessary for mastery (Bente & Breuer, 2009).
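One way to picture that loop in code, as a sketch under my own assumptions rather than an implementation from the literature: define the benchmark behaviors up front, update a mastery estimate from each observation, and let the estimate decide whether the game serves more practice or advances:

```python
# Evidence-centered design loop, sketched: competence model ->
# observable benchmarks -> task -> evidence -> adaptation.
# Benchmark names and target thresholds are invented for illustration.
BENCHMARKS = {
    "reads_graph": 0.8,       # target proportion of successful observations
    "forms_hypothesis": 0.7,
}

def update_estimate(estimates, benchmark, success, rate=0.2):
    """Nudge the running mastery estimate toward the latest observation."""
    old = estimates.get(benchmark, 0.5)
    estimates[benchmark] = old + rate * ((1.0 if success else 0.0) - old)

def next_task(estimates):
    """Serve practice on the weakest unmastered benchmark, else advance."""
    gaps = {b: target - estimates.get(b, 0.5)
            for b, target in BENCHMARKS.items()
            if estimates.get(b, 0.5) < target}
    if not gaps:
        return "advance_to_next_unit"
    return "practice:" + max(gaps, key=gaps.get)

estimates = {}
update_estimate(estimates, "reads_graph", success=True)
update_estimate(estimates, "forms_hypothesis", success=False)
print(estimates, "->", next_task(estimates))
```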

I have already mentioned that evaluations in games should not necessarily look like those in formal education, as entertainment games have created opportunities to borrow and modify systems that also effectively provide rewards and feedback for players. Because of the nature of games, it is important to assess and provide feedback through multiple channels, e.g., music, health bars, power-ups, etc. This provides the player with control (Bente & Breuer, 2009). It also provides the developer and research team with rich data to analyze.
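A small sketch of what multi-channel feedback could look like (channel names are my own illustration): the same assessment event fans out to several presentation channels, and each delivery is also recorded for the research team:

```python
def music_cue(event):
    print(f"[music] tempo shift for {event}")

def health_bar(event):
    print(f"[health bar] updated after {event}")

def powerup(event):
    print(f"[power-up] granted for {event}")

class FeedbackBus:
    """Fans one assessment event out to several feedback channels
    and keeps a log of every delivery for later analysis."""

    def __init__(self, channels):
        self.channels = channels
        self.log = []

    def emit(self, event):
        for channel in self.channels:
            channel(event)
            self.log.append((channel.__name__, event))

bus = FeedbackBus([music_cue, health_bar, powerup])
bus.emit("objective_completed")
print(bus.log)
```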

Take, for example, Barab et al.'s research on the educational MUVE Quest Atlantis, developed with an NSF grant at Indiana University. For the evaluation, they used four different levels of assessment: immediate, close, proximal, and distal. From these varied assessments, administered both formatively and summatively, they were able to measure several different areas of the game: the effectiveness of the narrative and of scientific inquiry, students' ability to create project-based work after the game, how they learned the concepts and skills within the game, and whether they were then able to apply those concepts and skills in a high-stakes testing scenario. This rich collection of data not only measured effectiveness both quantitatively and qualitatively, but also provided valuable feedback for researchers on how to improve upon the outcomes of the “quest” measured (Barab et al., 2007).

This study provides an effective example of different methods for evaluation in serious games, in particular educational games, where “standards-based instruction” is of the utmost importance in today's educational climate. The study also provides meaningful evaluations that could feasibly be modified for other areas of serious games research. What it appeared to lack was significant use of in-game reward systems like those used in entertainment games to collect data (Bente & Breuer, 2009). It is notable, however, that the branching dialogue provided an interesting narrative form of feedback to students and could be a valuable evaluative and feedback tool in future games (Barab et al., 2007). Clearly, evaluation of serious games, like the medium itself, is in its infancy, and there are many opportunities still to borrow and innovate within the field to add more depth to the research.

Works Cited

Barab, S., Dodge, T., Tuzun, H., Job-Sluder, K., Jackson, C., Arici, A., Job-Sluder, L., Carteaux, R., Jr., Gilbertson, J., & Heiselt, C. (2007). The Quest Atlantis Project: A socially-responsive play space for learning. In B. E. Shelton & D. Wiley (Eds.), The Educational Design and Use of Simulation Computer Games (pp. 159-186). Rotterdam, The Netherlands: Sense Publishers.

Bente, G., & Breuer, J. (2009). Making the implicit explicit: Embedded measurement in serious games. In U. Ritterfeld, M. Cody, & P. Vorderer (Eds.), Serious Games: Mechanisms and Effects (pp. 322-343). New York: Routledge.
