References

ACARA (Australian Curriculum, Assessment and Reporting Authority). 2014. “NAPLAN Achievement in Reading, Persuasive Writing, Language Conventions and Numeracy: National Report for 2014.” Sydney: ACARA.
AERA, APA, and NCME. 2014. Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association. https://blackwells.co.uk/bookshop/product/Standards-for-Educational-and-Psychological-Testing-by-American-Educational-Research-Association-American-Psychological-Association-National-Council-on-Measurement-in-Education-Joint-Committee-on-Standards-for-Educational-and-Psychological-Testing-U-S-/9780935302356.
Baartman, Liesbeth. 2008. “Assessing the Assessment: Development and Use of Quality Criteria for Competence Assessment Programmes.” Doctoral Thesis, Utrecht University.
Baartman, Liesbeth, Theo Bastiaens, Paul Kirschner, and Cees van der Vleuten. 2006. “The Wheel of Competency Assessment: Presenting Quality Criteria for Competency Assessment Programs.” Studies in Educational Evaluation 32 (2): 153–70. https://doi.org/10.1016/j.stueduc.2006.04.006.
Baartman, Liesbeth, Theo Bastiaens, Paul Kirschner, and Cees van der Vleuten. 2007. “Evaluating Assessment Quality in Competence-Based Education: A Qualitative Comparison of Two Frameworks.” Educational Research Review 2 (2): 114–29. https://doi.org/10.1016/j.edurev.2007.06.001.
Basturk, Ramazan. 2008. “Applying the Many‐facet Rasch Model to Evaluate PowerPoint Presentation Performance in Higher Education.” Assessment & Evaluation in Higher Education 33 (4): 431–44. https://doi.org/10.1080/02602930701562775.
Ben-Simon, Anat, and Randy Elliot Bennett. 2007. “Toward More Substantively Meaningful Automated Essay Scoring.” The Journal of Technology, Learning and Assessment 6 (1). https://ejournals.bc.edu/index.php/jtla/article/view/1631.
Biggs, John. 1996. “Enhancing Teaching Through Constructive Alignment.” Higher Education 32 (3): 347–64. https://doi.org/10.1007/BF00138871.
Biggs, John B., and Catherine So-kum Tang. 2011. Teaching for Quality Learning at University: What the Student Does. 4th ed. SRHE and Open University Press Imprint. Maidenhead, England; New York: McGraw-Hill, Society for Research into Higher Education & Open University Press.
Brennan, Robert L., ed. 2006. Educational Measurement. 4th ed. Series on Higher Education. Westport: American Council on Education/Praeger.
Brennan, Robert L., and Eugene G. Johnson. 1995. “Generalizability of Performance Assessments.” Educational Measurement: Issues and Practice 14 (4): 9–12. https://doi.org/10.1111/j.1745-3992.1995.tb00882.x.
Chapelle, Carol A. 2012. “Validity Argument for Language Assessment: The Framework Is Simple….” Language Testing 29 (1): 19–27. https://doi.org/10.1177/0265532211417211.
Chapelle, Carol A., Mary K. Enright, and Joan Jamieson. 2010. “Does an Argument-Based Approach to Validity Make a Difference?” Educational Measurement: Issues and Practice 29 (1): 3–13. https://doi.org/10.1111/j.1745-3992.2009.00165.x.
Childs, Ruth, and Andrew Jaciw. 2019. “Matrix Sampling of Items in Large-Scale Assessments.” Practical Assessment, Research, and Evaluation 8 (1). https://doi.org/10.7275/gwvh-4z51.
Cohen, Allan, and James Wollack. 2006. “Test Administration, Security, Scoring, and Reporting.” In Educational Measurement, edited by Robert L. Brennan, 4th ed., 355–86. American Council on Education/Praeger.
Cronbach, Lee J. 1971. “Test Validation.” In Educational Measurement, edited by R. L. Thorndike, 2nd ed., 443–507. Washington, DC: American Council on Education.
Cronbach, Lee J., and Goldine C. Gleser. 1965. Psychological Tests and Personnel Decisions. Urbana: University of Illinois Press.
Cronbach, Lee J., and Paul E. Meehl. 1955. “Construct Validity in Psychological Tests.” Psychological Bulletin 52 (4): 281–302. https://doi.org/10.1037/h0040957.
Crooks, Terry J., Michael T. Kane, and Allan S. Cohen. 1996. “Threats to the Valid Use of Assessments.” Assessment in Education: Principles, Policy & Practice 3 (3): 265–86. https://doi.org/10.1080/0969594960030302.
Curcin, Milja, Andrew Boyle, Tom May, and Zeeshan Rahman. 2014. “A Validation Framework for Work-Based Observational Assessment in Vocational Qualifications.” Coventry: Office of Qualifications and Examinations Regulation.
Darling-Hammond, Linda, and Frank Adamson. 2014. Beyond the Bubble Test: How Performance Assessments Support 21st Century Learning. 1st ed. San Francisco, CA: Jossey-Bass & Pfeiffer Imprints, Wiley.
Davey, Tim, Steve Ferrara, P. W. Holland, Rich Shavelson, Noreen M. Webb, and Lauress L. Wise. 2015. “Psychometric Considerations for the Next Generation of Performance Assessment.” Princeton, NJ: Educational Testing Service.
De Maeyer, Sven, Vincent Donche, Jan Vanhoof, Peter Van Petegem, Liesje Coertjens, Jetje De Groof, and Alexia Deneire. 2016. “Hoe Zijn Competenties Grootschalig Te Toetsen? Ontwikkeling van Een Evaluatiematrix Voor Toetsprogramma’s En Een Inventarisatie van ‘Good Practices.’ Eindrapport.” Departement Onderwijs.
van der Vleuten, Cees P. M., and Lambert W. T. Schuwirth. 2005. “Assessing Professional Competence: From Methods to Programmes.” Medical Education 39 (3): 309–17. https://doi.org/10.1111/j.1365-2929.2005.02094.x.
Dienst Beroepsopleiding. 2008. “Competentieleren: Een Gedachte-Experiment: Rapport.” Brussel: Dienst Beroepsopleiding, Departement Onderwijs en Vorming.
Educational Assessment Research Unit (EARU) and New Zealand Council for Educational Research (NZCER). 2014. “National Monitoring Study of Student Achievement (Wanangatia Te Putanga Tauira): Health and Physical Education 2013.” New Zealand: Ministry of Education.
Eisner, Elliot W. 1999. “The Uses and Limits of Performance Assessment.” The Phi Delta Kappan 80 (9): 658–60. https://www.jstor.org/stable/20439532.
Engelhard, George, Jr. 2002. “Monitoring Raters in Performance Assessments.” In Large-Scale Assessment Programs for All Students, edited by Gerald Tindal and Thomas M. Haladyna. Routledge.
Figel, J. 2007. “Key Competences for Lifelong Learning: European Reference Framework.” Luxembourg: Office for Official Publications of the European Communities.
Fitzpatrick, R., and E. Morrison. 1971. “Performance and Product Evaluation.” In Educational Measurement, edited by R. L. Thorndike, 2nd ed., 237–70. Washington, DC: American Council on Education.
Gorin, Joanna S., and Robert J. Mislevy. 2013. “Inherent Measurement Challenges in the Next Generation Science Standards for Both Formative and Summative Assessment.” New Jersey: Educational Testing Service. https://www.ets.org/Media/Research/pdf/gorin-mislevy.pdf.
Gulikers, Judith, and Niek van Benthum. 2017. “Toetsen van competenties.” In Toetsen in het hoger onderwijs, edited by Henk van Berkel, Anneke Bax, and Desirée Joosten-ten Brinke, 227–39. Houten: Bohn Stafleu van Loghum. https://doi.org/10.1007/978-90-368-1679-3_18.
Haertel, E. 2006. “Reliability.” In Educational Measurement, edited by Robert L. Brennan, 4th ed., 65–110. Westport: Praeger Publishers.
Hambleton, Ronald K., and Mary J. Pitoniak. 2006. “Setting Performance Standards.” In Educational Measurement, edited by Robert L. Brennan, 4th ed., 433–70. Westport: Praeger Publishers.
Hambleton, Ronald K., Richard M. Jaeger, Barbara S. Plake, and Craig Mills. 2000. “Setting Performance Standards on Complex Educational Assessments.” Applied Psychological Measurement 24 (4): 355–66. https://doi.org/10.1177/01466210022031804.
Heldsinger, S., and S. Humphry. 2010. “Using the Method of Pairwise Comparison to Obtain Reliable Teacher Assessments.” The Australian Educational Researcher 37 (2): 1–19. https://doi.org/10.1007/BF03216919.
Heldsinger, S., and S. Humphry. 2013. “Using Calibrated Exemplars in the Teacher-Assessment of Writing: An Empirical Study.” Educational Research 55 (3): 219–35. https://doi.org/10.1080/00131881.2013.825159.
Hill, Richard K., and Charles A. DePascale. 2003. “Reliability of No Child Left Behind Accountability Designs.” Educational Measurement: Issues and Practice 22 (3): 12–20. https://doi.org/10.1111/j.1745-3992.2003.tb00133.x.
Holland, P. W., and Neil J. Dorans. 2006. “Linking and Equating.” In Educational Measurement, edited by Robert L. Brennan, 4th ed., 187–220. Westport: Praeger Publishers.
Hornsby, D., and M. Wu. 2012. “Misleading Everyone with Statistics.” http://sydney.edu.au/education_social_work/news_events/resources/No_NAPLAN.pdf.
Johnson, Robert L., James A. Penny, and Belita Gordon. 2009. Assessing Performance: Designing, Scoring, and Validating Performance Tasks. New York: The Guilford Press.
Kane, Michael T. 2006. “Validation.” In Educational Measurement, edited by Robert L. Brennan, 4th ed., 17–64. Westport: Praeger Publishers.
Kane, Michael T. 2013. “Validating the Interpretations and Uses of Test Scores.” Journal of Educational Measurement 50 (1): 1–73. https://doi.org/10.1111/jedm.12000.
Kane, Michael T., Terry J. Crooks, and Allan S. Cohen. 1999. “Validating Measures of Performance.” Educational Measurement: Issues and Practice 18 (2): 5–17. https://doi.org/10.1111/j.1745-3992.1999.tb00010.x.
Kimbell, Richard, Tony Wheeler, Soo Miller, and Alastair Pollitt. 2007. E-Scape Portfolio Assessment: Phase 2 Report. London: Goldsmiths.
Kish, Leslie. 2005. Statistical Design for Research. https://nbn-resolving.org/urn:nbn:de:101:1-20141021261.
Kolen, Michael J. 2006. “Scaling and Norming.” In Educational Measurement, edited by Robert L. Brennan, 4th ed., 155–86. Westport: Praeger Publishers.
Kolen, Michael J., and Robert L. Brennan. 2014. Test Equating, Scaling, and Linking: Methods and Practices. 3rd ed. Statistics for Social Science and Public Policy. New York: Springer.
Kuhlemeier, Hans, Bas Hemker, and Huub van den Bergh. 2013. “Impact of Verbal Scale Labels on the Elevation and Spread of Performance Ratings.” Applied Measurement in Education 26 (1): 16–33. https://doi.org/10.1080/08957347.2013.739425.
Kuhlemeier, Hans, A. van Til, Bas Hemker, W. de Klijn, and H. Feenstra. 2013. “Balans van de Schrijfvaardigheid in Het Basis- En Speciaal Basisonderwijs 2. Uitkomsten van de Peiling in 2009 in Groep 5, Groep 8 En de Eindgroep van Het SBO.” PPON-reeks 53. Arnhem: Cito.
Lane, Suzanne. 2010. Performance Assessment: The State of the Art. SCOPE Student Performance Assessment Series. Stanford, CA: Stanford University, Stanford Center for Opportunity Policy in Education. https://edpolicy.stanford.edu/sites/default/files/publications/performance-assessment-state-art_1.pdf.
Lane, Suzanne. 2015. “Performance Assessment: The State of the Art.” In Beyond the Bubble Test, edited by Linda Darling-Hammond and Frank Adamson, 131–84. San Francisco: John Wiley & Sons, Inc. https://doi.org/10.1002/9781119210863.ch5.
Lane, Suzanne, and C. Stone. 2006. “Performance Assessment.” In Educational Measurement, edited by Robert L. Brennan, 4th ed., 387–432. American Council on Education/Praeger.
Lesterhuis, Marije, Vincent Donche, Sven De Maeyer, Tine van Daal, Roos Van Gasse, Liesje Coertjens, San Verhavert, Anneleen Mortier, Tanguy Coenen, and Peter Vlerick. 2015. “Competenties Kwaliteitsvol Beoordelen: Brengt Een Comparatieve Aanpak Soelaas?” Tijdschrift Voor Hoger Onderwijs 33 (2): 55–67.
Lesterhuis, Marije, San Verhavert, Liesje Coertjens, Vincent Donche, and Sven De Maeyer. 2017. “Comparative Judgement as a Promising Alternative to Score Competences.” In Innovative Practices for Higher Education Assessment and Measurement, edited by E. Cano and G. Ion, 119–36. Hershey, PA: IGI Global. https://doi.org/10.4018/978-1-5225-0531-0.ch007.
Linn, Robert L., Eva L. Baker, and Stephen B. Dunbar. 1991. “Complex, Performance-Based Assessment: Expectations and Validation Criteria.” Educational Researcher 20 (8): 15–21. https://doi.org/10.3102/0013189X020008015.
Lissitz, Robert W., and Feifei Li. 2011. “Standard Setting in Complex Performance Assessments: An Approach Aligned with Cognitive Diagnostic Models.” Psychological Test and Assessment Modeling 53 (4): 461–85.
Lizzio, Alf, and Keithia Wilson. 2004. “Action Learning in Higher Education: An Investigation of Its Potential to Develop Professional Capability.” Studies in Higher Education 29 (4): 469–88. https://doi.org/10.1080/0307507042000236371.
Lu, L. R. 2012. “A Validation Framework for Automated Essay Scoring Systems.” Unpublished doctoral dissertation, Faculty of Education, University of Wollongong, Australia.
Mazzeo, J., S. Lazer, and M. J. Zieky. 2006. “Monitoring Educational Progress with Group-Score Assessments.” In Educational Measurement, edited by Robert L. Brennan, 4th ed., 681–99. Westport: Praeger Publishers.
Messick, Samuel. 1989. “Validity.” In Educational Measurement, edited by R. L. Linn, 3rd ed., 13–103. The American Council on Education/Macmillan Series on Higher Education. New York: American Council on Education/Macmillan.
Messick, Samuel. 1994. “The Interplay of Evidence and Consequences in the Validation of Performance Assessments.” Educational Researcher 23 (2): 13–23. https://doi.org/10.3102/0013189X023002013.
Messick, Samuel. 1996. “Validity of Performance Assessments.” In Technical Issues in Large-Scale Performance Assessment, edited by G. Phillips, 1–18. Washington, DC: National Center for Education Statistics.
Moss, Pamela A. 1994. “Can There Be Validity Without Reliability?” Educational Researcher 23 (2): 5–12. https://doi.org/10.3102/0013189X023002005.
National Research Council. 2014. Developing Assessments for the Next Generation Science Standards. Committee on Developing Assessments of Science Proficiency in K-12. Washington, DC: The National Academies Press.
Newhouse, C. Paul. 2011. “Using IT to Assess IT: Towards Greater Authenticity in Summative Performance Assessment.” Computers & Education 56 (2): 388–402. https://doi.org/10.1016/j.compedu.2010.08.023.
Newhouse, C. Paul. 2013. “Literature Review and Conceptual Framework.” In Digital Representations of Student Performance for Assessment, edited by P. John Williams and C. Paul Newhouse, 9–28. Rotterdam: SensePublishers. https://doi.org/10.1007/978-94-6209-341-6_2.
Pecheone, Raymond, and Stuart Kahl. 2015. “Where We Are Now.” In Beyond the Bubble Test, edited by Linda Darling-Hammond and Frank Adamson, 53–91. San Francisco: John Wiley & Sons, Inc. https://doi.org/10.1002/9781119210863.ch3.
Powers, Donald E., and Mary E. Fowles. 1998. “Effects of Preexamination Disclosure of Essay Topics.” Applied Measurement in Education 11 (2): 139–57. https://doi.org/10.1207/s15324818ame1102_2.
Powers, Donald E., Mary E. Fowles, Marisa Farnum, and Paul Ramsey. 1994. “Will They Think Less of My Handwritten Essay If Others Word Process Theirs? Effects on Essay Scores of Intermingling Handwritten and Word-Processed Essays.” Journal of Educational Measurement 31 (3): 220–33. https://www.jstor.org/stable/1435267.
Prodromou, Luke. 1995. “The Backwash Effect: From Testing to Teaching.” ELT Journal 49 (1): 13–25. https://doi.org/10.1093/elt/49.1.13.
Rubin, D. 1996. “A Preface Relating Alternative Assessment, Test Fairness, and Assessment Utility to Communication.” In Large Scale Assessment of Oral Communication: K–12 and Higher Education, edited by S. Morreale and P. Backlund, 1–4. Annandale: Speech Communication Association. https://files.eric.ed.gov/fulltext/ED399578.pdf.
Schmeiser, C., and C. Welch. 2006. “Test Development.” In Educational Measurement, edited by Robert L. Brennan, 4th ed., 307–54. Westport: Praeger Publishers.
Shavelson, Richard J. 2010. “On the Measurement of Competency.” Empirical Research in Vocational Education and Training 2 (1): 41–63. https://doi.org/10.1007/BF03546488.
Shaw, Stuart, Victoria Crisp, and Nat Johnson. 2012. “A Framework for Evidencing Assessment Validity in Large-Scale, High-Stakes International Examinations.” Assessment in Education: Principles, Policy & Practice 19 (2): 159–76. https://doi.org/10.1080/0969594X.2011.563356.
Sireci, Stephen G. 2009. “Packing and Unpacking Sources of Validity Evidence: History Repeats Itself Again.” In The Concept of Validity: Revisions, New Directions, and Applications, edited by Robert W. Lissitz, 19–37. Charlotte, NC: IAP Information Age Publishing.
Stecher, Brian. 2015. “Looking Back.” In Beyond the Bubble Test, edited by Linda Darling-Hammond and Frank Adamson, 15–52. San Francisco: John Wiley & Sons, Inc. https://doi.org/10.1002/9781119210863.ch2.
Steedle, Jeffrey T., and Steve Ferrara. 2016. “Evaluating Comparative Judgment as an Approach to Essay Scoring.” Applied Measurement in Education 29 (3): 211–23. https://doi.org/10.1080/08957347.2016.1171769.
Straetmans, G. 2014. “Toetsen met performance assessment methodieken.” In Toetsen in het hoger onderwijs, edited by Henk van Berkel, Anneke Bax, and Desirée Joosten-ten Brinke. Houten: Bohn Stafleu van Loghum.
Tan, Xuan, and Rochelle Michel. 2011. “Why Do Standardized Testing Programs Report Scaled Scores?” R&D Connections 16. Princeton, NJ: Educational Testing Service.
Toulmin, Stephen. 2003. The Uses of Argument. Updated ed. Cambridge, UK; New York: Cambridge University Press.
van Daal, Tine, Marije Lesterhuis, Liesje Coertjens, Vincent Donche, and Sven De Maeyer. 2019. “Validity of Comparative Judgement to Assess Academic Writing: Examining Implications of Its Holistic Character and Building on a Shared Consensus.” Assessment in Education: Principles, Policy & Practice 26 (1): 59–74. https://doi.org/10.1080/0969594X.2016.1253542.
Weigel, Tanja, Martin Mulder, and Kate Collins. 2007. “The Concept of Competence in the Development of Vocational Education and Training in Selected EU Member States.” Journal of Vocational Education & Training 59 (1): 53–66. https://doi.org/10.1080/13636820601145549.
Wools, Saskia. 2015. “All About Validity: An Evaluation System for the Quality of Educational Assessment.” Enschede: University of Twente.
Wools, Saskia, P. Sanders, and E. Roelofs. 2007. Beoordelingsinstrument: Kwaliteit van Competentie Assessment. Arnhem: Cito.