Posts Tagged ‘assessment’

testing, testing, one, two, three

April 26, 2011

Testing.  It’s become so much a part of the life of a learner or a teacher, at any age.  And it’s a fascinating topic.

Okay, I’m one of those weird people who think of test-taking as a sort of competitive athletic event, one at which I’m really quite good, even while thinking that the vast majority of tests I’ve ever taken were nearly completely pointless.  No, that’s not my impostor syndrome kicking in.  It has to do with a central concept in test design, which I’ll explain below.

What I love most about assessment is how useful it can be when done well.  One of my colleagues says that testing doesn’t bring out the best in people, it doesn’t bring out the worst in people, but it brings out the most in people.  We put you in a situation where your normal compensatory strategies for getting along in the world aren’t going to work.  As Peter Ossorio says, when you ask a person to do something they can’t do, they’ll do something they can do.  You’ll figure out something to do, the best you can, and what you do will be a reflection, in some way, of you.  It’s like science — each test is an experiment that you and I do together.  No one bit of data proves anything by itself, but when we put things together and look for themes, consistencies, divergences, a story begins to emerge, and it often does so surprisingly quickly.

But what bugs me is how little most folks understand about tests of all stripes — most importantly, how they’re built, how they work, what they’re good for… and what they aren’t.  So what I’d like to do is to kick off a random-access series of posts on various aspects of assessment, including ordinary classroom tests, high-stakes testing for No Child Left Behind (also known as No Child Allowed Ahead, or No Teacher Left Standing) and other similar “accountability” movements, bubble tests like the dreaded SAT and its ilk, and, of course, my favorite, the one-on-one kinds of tests used for special education and other diagnostic work, the kind that seriously geeky people like me give.  Those include cognitive tests, neuropsychological tests, academic tests, psychological tests, behavioral questionnaires, and other fun stuff.  I’ll start there because, well, because I like them and I think they’re really pretty interesting.  I’ll try to chew off manageable chunks to talk about, and over time, I hope people learn something.

The most serious and popular misconception I encounter is a fundamental misunderstanding of what tests can do.  They’re not magic, and neither are those of us who give them magicians.  We’re just very observant (or at least we’re supposed to be!), and we’re using them to make a series of structured observations.

Again, this is like science.  When I was training as a molecular biologist, one of the things I had thwacked into my head (through reading in the literature some of the truly impressively weird things that happened when people didn’t remember it) was that no experiment ever tells you anything about the real world.  It tells you what happened on that day when that person did that experiment in that way.  You might use that information to conjecture about the nature of the real world based on your data, and over time, as you build up more data, you can get a better and better sense of what the real world might be like.  But you might see a different experiment, claiming to answer the same question, where you get different results.  Uh, oh.  Where do you look, to figure out what was going on to find the difference that made the difference?  In the Materials and Methods, the specifics of how the experiment was designed and constructed.  Very often, that’s where the difference lies.  You cannot separate data from the experiment that generated it.

Same with assessment.   No test, no matter how beautifully it’s designed, how skillfully it’s administered, and how insightfully it’s interpreted, can possibly tell you anything incontrovertibly true about the real human being.  The test tells you what that person did on that day on that test with that tester in that environment.  It might reflect something probably true about the person, but you have to stay humble with your interpretation.

Since you will always value what you measure, it makes sense to think very carefully about how to measure what you actually value.  In education, we talk about the idea of “alignment” — we’d say that this test is or is not well-aligned to the skills we want the student to be able to demonstrate.  That’s what I was talking about above, why I don’t respect the very bubble tests that I tend to be able to blow out of the water.  They typically test what is easy to measure, but not what a thoughtful professional would consider all that valuable.  At the conclusion of many thousands of hours of clinical training, psychologists in most states have to take a detailed fact-recall bubble test covering basically the entire field.  We have to prove that we know which classic theorist suggested that you were running from the bear because you were afraid, versus which one suggested that you were afraid because you were running from the bear.  But we don’t have to demonstrate the capacity to actually manifest any clinical competencies with actual, oh, I dunno, human beings in distress.  In test design, we talk about the very-closely-related concept of “validity,” which comes in many flavors.  In this case, the construct validity of the test — how it defines what it is that it’s trying to measure — is awful.  Fact knowledge within a domain is a useful thing, and might be a good prerequisite to beginning clinical work.  But the public is not protected from incompetent psychologists by choosing only those who can remember the facts printed in their textbooks.

I think the best-aligned test I ever took was the qualifying exam for the Ph.D. I didn’t get in cancer biology.  I was required to dive in to fields I was unfamiliar with, learn about the prior research in those fields, and propose new lines of research that would answer important unanswered questions.  Minus the speed with which I had to do it (three of these, in completely different fields, within a single week!), this test was testing very much what I would need to do if I became a principal investigator running my own lab someday.   Of course, the alignment/construct validity of that test wasn’t perfect either.  What it didn’t explore was the personality traits which set me up to be a very sad and bored and frustrated person in the lab, the precise difference between thinking about science, which I love and am good at, and doing bench science on a day-to-day basis, which I don’t and am not.

What I find most concerning about the high-stakes testing (aka “accountability”) movement in education is that it tends to use tests with poor validity in a variety of domains (construct validity, content validity, and predictive validity being the most notable), and that it tends to ignore other underlying methodological differences between comparison groups (most notably, differences in the populations being served and the resources available to teachers and administrators to serve them, but also differences in how various jurisdictions define their goals and standards).  When science teachers teach kids about experimental controls, we start with the idea of a “fair game.”  But there’s no way on earth that these “games” are fair.  There’s nothing truly “standardized” about these experiments, and almost every interpretation that is made of them is a massive overinterpretation from inadequate data.  Gives serious testing a bad name.  Harrumph.

Okay, so my plans for this series of posts right now involve topics like the various types of validity and reliability (the twin pillars of assessment for people who actually want usable data!), and a sort of overview of each of the major types of clinical testing (e.g., cognitive, academic, neuropsychological, behavioral, projective) and what they are and aren’t good for.  I’ll do classroom and educational and high-stakes stuff later, but I’d rather start with what I do the most of.  If there are specific ideas or questions you’d like me to address, feel free to drop them in the comments area here.

bubble, bubble, toil and trouble… (multiple choice exams)

December 23, 2010

I hear a lot, particularly around late fall / early winter, about students who have a particular difficulty with multiple-choice exams, like the SAT, ACT, GRE, and so on.  I personally think bubble tests are nearly always worthless in terms of telling us anything we actually want to know about a kid (it is possible to create a really good multiple-choice exam to explore real understanding of content and mastery of higher-order thinking skills… but it is extremely rarely actually done).  Unfortunately, they are a fact of life for students.

Sometimes, bright kids are like me — I cordially despise these tests, but I’m very good at them.   Always have been.  Even when I haven’t actually learned the content (in fact, especially so, since I have a terrible memory for the kind of isolated facts these tests so often rely upon).   I consider them something of a competitive sport.   But very often, “bubble tests” are a bright-to-gifted kid’s personal nemesis — “I understand it all, but I just can’t get things right on the stupid test!  I can’t remember the nitty little details and I can’t decide what answer they think I’m supposed to give and it’s all just awful!”

Neither the research literature nor the professional lore would support the idea that some people should be diagnosed with 315.9 Learning Disorder, Not Otherwise Specified, Cannot Take Multiple-Choice Exams.  Typically, if a kid (or adult) has persistent problems with bubble tests, one or more of several things is going on…

Test anxiety:  These tests tend to push an already-slightly-anxious person’s buttons.   The pace tends to be very rapid (in the realm of one minute or less per question), the stakes tend to be perceived as high (will you get into such-and-such program?), there are more wrong answers than right answers but they’re all pulling at you… eek!  According to the inverted-U hypothesis (aka the Yerkes-Dodson law), overly-high levels of arousal tend to decrease performance.  Moderate levels of arousal are good (see below under EF/ADHD), but if you get too buzzy, you end up crossing the line into freaked-out, and no one can concentrate well or do their best when they’re freaked out.

Lack of test savvy:  While I don’t think there’s any value to the kinds of apocryphal lore kids like to rely on (“if you’re not sure, choose C”), it is worth recognizing that these tests are written by human beings.  Your goal is not actually to get the right answer.  Your goal is to think like the writer of the test questions, to see the underlying question they’re trying to ask, to spot the trick they’re trying to lead you into, and to choose the answer they want you to choose.  This is absolutely a learned and learnable skill.

Executive functioning problems (including ADHD):  Most often, the kids with ADHD are impulsively choosing the first “pretty-good” answer they see, or they drift off before reading the question thoroughly and carefully considering all possibilities.  Many of them have a hard time keeping their arousal level high enough to stay focused, keeping their focus on the task, and maintaining a working tempo that will let them finish in time.  Note also that kids who aren’t getting enough sleep will also typically struggle with these things.   Coffee can sometimes help… but it’s not as good as having the brain properly rested in the first place.

Issues with speed and pace: This is properly a subset of “executive functioning” above, but it’s something that a lot of folks have trouble with specifically on this kind of test, even when they don’t have trouble with it in real life.  Timed test-taking is itself a skill.  It is difficult to maintain the pace and rhythm needed to get the whole thing done.   People often get bogged down in a few hard questions and then can’t pick up the pace after they extricate themselves from the bog (too much mud on the boots, if you will).  Also, many kids have a hard time maintaining the required effort over the long period of time the tests take (the high-stakes tests such as the SAT are often several hours long).  Staying focused that long without reorienting cues is something we don’t practice that much these days.

Language comprehension problems (including Asperger’s):  Test questions are often quite finicky in terms of language — they’re highly specific in their meaning, and if you don’t read really carefully and focus on (1) exactly what they’re saying and (2) exactly what they mean (yes, I know those might sometimes seem like opposites… that’s part of the game!), you will trip up.  The wrong answer choices are almost always based upon the typical misreadings of the questions — these “attractor” answers are the reason that some kids actually do worse than chance when they guess.  Kids who have trouble in this domain often also have subtle weaknesses in the rest of the “real world” in terms of reading comprehension, analytical writing, and oral direction-following.

Overthinking:  I’ve often seen bright-to-gifted kids overthink these stupid multiple-choice questions, choose the second-best answer on a technicality because, “Well, it could be that,” etc.  Sometimes they’re getting all proud of themselves for coming up with a technicality, like, “Lookit me, I’m smarter than the test, ha, ha!”  But in school, teachers only rarely grant credit retroactively for coming up with a clever justification, and on those high-stakes tests, you’re almost never going to get credit this way.  The goal is not to get the right answer.  The goal is to get the answer the test writer wanted you to get.  Personally, when I was taking the tests for my high school teaching credential in science, while I pegged the upper reaches of the scores, I found it interesting that I had a relative weakness in the areas of science I knew the best (courtesy of ten years of training as a molecular biologist).  Why? Because I got stuck saying things like, “Okay, B is the actual right answer.  However, the overwhelming majority of the population thinks it’s C, and lots of textbooks say it’s C, too.  Did the person who wrote the test know about B, such that C is the attractor answer, or am I supposed to say C because that’s probably what the person who wrote the test thought the answer was?”

So, what to do?

What I generally recommend in terms of intervention, regardless of the cause, is to provide explicit instruction and guided practice in the specific skills involved.

Numero uno, the most likely area of weakness.  Make it a habit.  Always.  Read the question carefully.  Read all of the choices.  Think through what each choice means and why it would be a good or bad choice.  Then (and only then) choose the best one.

Practice solving items by thinking out loud with a test-savvy tutor.   When mistakes are made, go over the explanations for the answers and use these as learning opportunities to understand better how test-writers think.

If there are specific high-stakes tests at issue then get the Big Thick Book of Real Practice Tests from the local bookstore and study the test itself. Learn to identify common question types.  In fact, it’s often worth it to practice rapidly identifying question type as a separate skill.   When you’re good at knowing the general kinds of questions, your study can then focus on strategies which fit each type — that makes your work a lot more efficient.   It’s better to practice a whole bunch of questions of the same type and master the skill, saving the mixed practice for when you’re reviewing skills you’ve already mastered.

As with any sport or musical instrument, regular practice, on items that are difficult enough to be challenging, is what you need to improve.  Massed practice (“cramming”) might feel like, “Ooh, I’m doing something heroic, this has gotta work.”  But it doesn’t work anywhere near as well as regular practice.  I know, I know.  You think it does.  Everyone thinks it does.  Sorry.  It doesn’t.   You’re not special.

Frankly, I generally don’t recommend the courses from test-prep companies unless you’re a kid who honestly won’t do the Big Thick Book technique reliably.  The courses tend to be basically just the same thing as the books, only there’s a grownup standing at the front of the room keeping you on task.  If that’s the only way you’ll reliably study, well, okay, fine.  But if you’re trying to take a high-stakes test that will get you into, oh, say, college, where, did anyone mention, no one reminds you to get out of bed or do your homework, perhaps this would be a good time to learn to get yourself to do the stuff you don’t like to do.

Particularly if anxiety or drifting-off is an issue, practice, practice, practice, under the most realistic conditions you can muster up.  Try out different techniques for reducing your anxiety or getting yourself woken up to the right level, and figure out what works best for you and is legal under test conditions (that is, if you do best with music, sorry, you will almost never be allowed to have an mp3 player on a standardized test, so you need to come up with something else).  If the unfamiliar location of a high-stakes test is a problem, try taking practice tests in different locations (public libraries are good).  Take them timed.  No food.   No potty.  No breaks.  No standing up.

For timing or pace issues, practice with a loop timer, gradually decreasing the time per item, to work on tempo.  Set a tempo that will get you finished in approximately 80% of the time allotted — that leaves time to work on the really hard questions that will take more thought.
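
As a quick sketch of that budgeting arithmetic (the function name and the 25-minute example are mine, not from any real test):

```python
def target_seconds_per_item(total_minutes, num_items, first_pass_fraction=0.80):
    """Seconds to budget per item so that the first pass through the
    section uses only first_pass_fraction of the allotted time,
    leaving the remainder for the really hard questions."""
    return total_minutes * 60 * first_pass_fraction / num_items

# A hypothetical 25-minute, 20-question section works out to
# about a minute per item on the first pass.
print(target_seconds_per_item(25, 20))
```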

Practice a strategy that will maximize the number of items answered probably-correctly.  A lot of people get stuck on hard items and won’t move on until they’ve figured out the answer.  It’s much more advantageous to look at each question, and if you know or can quickly figure out the right answer, do it, and if you don’t, circle the item number and move on.  That both ensures that you get to all the easy ones (which have the same point value as the hard ones!) and puts the content of the hard ones into your head where it can cook.  Once you’ve done that, go back and do all of the moderately-hard ones, the ones you can get with some serious thought.  Cross off the circles as you answer them, so that you can easily scan for the not-yet-done ones on later passes.  Don’t waste time on the very-hard ones until you’ve done the moderately-hard ones.

Yes, I know, if you’re taking a computer-adaptive test (where it insists that you answer each question because it’s adjusting the difficulty level of the next question based on whether you got this one right), you may not be able to use this strategy, but if you can, it is a huge benefit.  Note, by the way, that on some computer-administered tests, you can skip forward and go back as you wish.  If so, then use the scratch paper to keep track of the item numbers you have skipped and cross them off as you get them dealt with.

Know the scoring rules of the test. If there is no penalty for guessing, you should make sure to answer every question even if all you’re doing is bubbling randomly in the last minute.  If there is a penalty for guessing (typically -1/(n-1) where n is the number of choices, such that a purely random guessing pattern would result in a score of zero), you need to get a bit more strategic.   Some people are good guessers — they guess above chance.  If you’re one of those people who guesses at or above chance, again, you should always guess on every item.
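
To make that arithmetic concrete, here’s a minimal sketch (hypothetical function name) of the expected points per guessed item under that penalty formula:

```python
def expected_guess_score(n_choices, p_correct=None):
    """Expected points per guessed item under formula scoring:
    +1 for a right answer, -1/(n-1) for a wrong one.
    If p_correct is None, assume a purely random guess (1/n)."""
    if p_correct is None:
        p_correct = 1 / n_choices
    penalty = 1 / (n_choices - 1)
    return p_correct * 1 + (1 - p_correct) * (-penalty)

# Purely random guessing on a 4-choice item nets (essentially) zero,
# while a "good guesser" who is right 40% of the time comes out ahead.
print(expected_guess_score(4))
print(expected_guess_score(4, p_correct=0.4))
```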

However, some people are not good guessers, and actually guess below chance, typically because they’re getting caught up by those attractor answers.  The usual advice, to guess whenever you can eliminate even one answer as definitely wrong, is wrong, or at least oversimplified.  Even if you can eliminate one or two of four choices, you can still lose points overall if the attractors pull you to the wrong answer among the remaining choices too often (narrow four choices down to two, but pick the wrong one more than 3/4 of the time, and your guessing is still below chance).  “Almost right” or, “It was my second choice,” doesn’t count (this is, as the proverb says, neither horseshoes nor hand grenades).  You need to gather data on your own guessing patterns to know whether guessing is an advantageous strategy for you.  This is a great use for the Big Thick Book.
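
A sketch of the break-even arithmetic behind that warning (hypothetical function name; assumes the classic -1/(n-1) penalty):

```python
def expected_after_elimination(n_choices, p_correct):
    """Expected points per item when you've eliminated some options
    but pick among the survivors with accuracy p_correct.  The wrong-
    answer penalty is still based on the original number of choices."""
    penalty = 1 / (n_choices - 1)
    return p_correct - (1 - p_correct) * penalty

# On a 4-choice item, break-even is 25% accuracy, no matter how many
# choices you eliminated first.  Narrowing to two but then falling
# for the attractor more than 3/4 of the time still loses points.
print(expected_after_elimination(4, 0.25))  # break-even
print(expected_after_elimination(4, 0.20))  # below chance: negative
```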

Furthermore, if you’re a bad guesser, or even if you’re a decent one, study what is tripping you up when you guess wrong.  What are the traps you’re getting caught in?  Can you create specific rules and checklists for yourself to make sure you don’t forget about them?  For example, when I’m doing quantitative comparisons, I always check to see what happens if the variables have values of 0, 1, -1, some other negative number, and a fraction between 0 and 1, trying to find a situation where the obvious answer is wrong.    When I do reading comprehension tests, I always read the questions first, and then read the passage with a pencil in hand so I can mark it up.   Stuff like that.
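
That habit of probing the troublemaker values can even be mechanized; here’s a toy sketch (a hypothetical helper, not part of any real test-prep tool) that checks a quantitative comparison at each of those values:

```python
# The "troublemaker" probe values: zero, one, negative one, another
# negative number, and a fraction between 0 and 1.
PROBE_VALUES = [0, 1, -1, -2, 0.5]

def compare_everywhere(qty_a, qty_b, values=PROBE_VALUES):
    """Return the set of relationships ('A>B', 'B>A', 'A=B') observed
    across the probe values.  More than one relationship means the
    answer is 'cannot be determined from the information given'."""
    results = set()
    for x in values:
        a, b = qty_a(x), qty_b(x)
        results.add('A>B' if a > b else ('B>A' if b > a else 'A=B'))
    return results

# The "obvious" claim that x^2 beats x falls apart at 0, 1, and 1/2:
print(compare_everywhere(lambda x: x * x, lambda x: x))
```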

What are the specific skills or content areas that they seem to always throw at you and you always forget?  How can you make sure you get it into your head long enough to write it on the scratch paper as soon as the test starts?  I’m a hawk when it comes to cheating, but even I don’t think it’s cheating if you write a “cheatsheet” out of your head during the test.   I knew one dyslexic young man who could not for the life of him memorize the quadratic formula, but he was a great conceptual thinker and could remember easily how to derive it.  He got to the point where he could rederive that thing in thirty seconds flat on the scratch paper (and he went on to major in mathematics in college).
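
For the record, the rederivation he practiced is the standard completing-the-square argument, sketched here:

```latex
ax^2 + bx + c = 0
\;\Rightarrow\; x^2 + \tfrac{b}{a}x = -\tfrac{c}{a}
\;\Rightarrow\; \left(x + \tfrac{b}{2a}\right)^2 = \tfrac{b^2}{4a^2} - \tfrac{c}{a} = \tfrac{b^2 - 4ac}{4a^2}
\;\Rightarrow\; x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
```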

Anyone have any special tried-and-true techniques that work for them for studying for bubble tests?  Inspirational stories on how you destroyed a stupid test that had been making you miserable?  Post them below!