Building Value in Certification—A Status Report on the Enhanced NIC Interview and Performance Examination

By Clarence “Buck” Chaffee, President(1)
The Caviart Group, LLC

In July 2011, NAD and RID announced that the Boards of both organizations had approved a plan for developing the next generation of NIC Certification. The stated purpose of the changes was “to strengthen the effectiveness and increase the value of the NIC credential.” Since that announcement, a number of changes have been implemented: a national study (pre-Job/Task Analysis survey) of deaf, hard of hearing, and hearing consumers of interpreting services was conducted; the enhanced NIC Interview and Performance Examination was launched; and a new interview and performance examination scoring process was implemented.

So now, nearly a year after the enhancements were announced, the time has come to assess how the program is doing. Have these changes done what they were intended to do? Is the NIC credential more effective and more valuable as promised and, if so, how can anyone tell?

This interim report addresses these questions and answers some of the additional questions that have surfaced in the community about the enhanced NIC Interview and Performance Examination. It explains the logic and process used as well as the evidence to date as to the effectiveness of the changes.

To understand whether the enhanced NIC Interview and Performance Examination is an improvement, it is first important to understand what certification is and what it means for a certification program to be “effective and valuable.”

What is certification really about?

Certification is essentially a warranty provided by an organization that says that the people to whom the organization has issued a certificate have the knowledge and skills required to competently perform a given job at a given level. This means that anyone who engages or receives the warranted services from certificants has the organization’s assurance that the certified person will be able to perform those services competently at the level at which they are certified.

This does not mean that all certificants are of equal ability. In fact, many will be able to perform additional services beyond those addressed in the certification, and some will be able to perform the job at a higher level. What the certification means is that all certificants can do the described job at least at the level of basic competency.

So, what is the job? What is the level? And what does it mean to be competent?

While we might think that all members of a profession do the same thing, it is quite difficult to reach a consensus on what members of a profession actually do. No matter the profession or the certification, some people will always declare that the certification is “off-base,” “unfair,” or “invalid,” either because the examination covers tasks that the critic does not perform or because it fails to address something that the critic believes to be important.

This is true in interpreting, just as in every other profession, because every person has a slightly different view about what it means to be an interpreter. Practitioners define their profession based on their personal experience. But every person’s education and experience are slightly different, and every person’s job has different demands. In addition, professions are constantly changing to meet the changing market conditions, to respond to changes in laws and regulations, and to keep up with changes in technology. Some people will be ahead of the curve, and some will be behind the curve.

Given the wide variety of opinions and the fluidity of professions, the only way for any certification program to establish a meaningful credential is to take a stand for what the certification means at a given point in time—to formally declare the nature of the job that they are going to certify and the knowledge, skills, and abilities that they are going to warrant. Certainly organizations must do the appropriate due diligence before drawing this line, but at the end of the day they must declare and define what the credential stands for.

The ultimate success of the credential will then be measured by its value to consumers, its validity, and its defensibility. If the defined job meets the needs of a significant number of consumers, they will value the credential; if the process of defining the job uses professional judgment and is based on scientifically gathered information, the program will be valid; and if the execution of the program meets recognized credentialing industry standards without bias, the program will be legally defensible.

What is NAD-RID’s declaration of the job of a certified NIC (Level I) interpreter?

The enhanced NIC is NAD-RID’s warranty issued to interpreters who can perform competently at a basic level of interpreting (currently called Level I).(2) The profile of this person is as follows:

A minimally competent Level I interpreter can facilitate communications between deaf and hearing individuals (be it in interactive discussions or one-way presentations) in common settings in which vocabulary is at a first-year, undergraduate college level and is presented at normal, conversational speed. They can quickly adapt to different situations and communication styles and are able to take in and relay the essence of a message so that the communication is complete, effective, and transmitted without undue delay or interruption.(3)

Some interpreters or consumers will certainly disagree with portions of this statement. That is fine. NAD-RID understands that no profile will satisfy everyone. The bottom line, however, is that this statement summarizes what both organizations believe is needed to serve consumers and what NAD-RID is willing to warrant as the job that can be performed competently by every enhanced NIC (Level I) certificant.

How did RID decide what would make the enhanced NIC (Level I) credential valuable?

Members of a profession may assume that a certification is for the benefit of the people who become certified. Industry leaders may believe that the purpose of the program is to increase the respect or prestige of the profession. And educators may think that the purpose is to validate their educational programs and objectives. While all of these are ancillary benefits, none of these groups is the true audience for whom a credential is built or the source from which the credential derives its value.

In order to have value, a certification must be developed for the benefit of consumers—those individuals who rely on and benefit from the services provided by individuals holding the credential. In NAD-RID’s case, consumers include those deaf, hard of hearing, and hearing persons whose communications are facilitated by an interpreter and those employers, agencies, or other entities that engage interpreters to facilitate such communications.

Determining exactly how the enhanced NIC examination should be structured and what content it should address to best meet the needs of today’s consumers was not a simple task. In fact, it took over two years and several studies for NAD-RID to develop this information.

The first step was to convene the NIC Task Force to review the NIC paradigm and develop a new and comprehensive analysis and delineation of contemporary practice for interpreters. This group also reviewed the concerns of consumers and interpreters about the NIC certification program and outlined a plan for the next generation of national interpreter credentials. RID and NAD convened a joint task force of industry thought leaders and subject matter experts, including deaf and hearing interpreters,(4) to address these tasks. The task force met three times over a period of at least one year to review data and documentation about the NIC, to consider alternatives for the future, and to draft a master plan for the development of the next generation of national interpreter credentials. This master plan was ultimately approved by both the RID and the NAD Boards of Directors. A priority of the master plan was to modify current testing requirements to create an “enhanced” NIC examination aimed at a single level of certification, built to better meet the needs of today’s consumers.

The NAD-RID NIC Task Force believed that a number of changes had taken place both in the manner in which interpreting was conducted and in the needs of consumers since the NIC was originally developed. For example, they felt that the traditional interpreter/consumer arrangement, in which the interpreter was known in the community and developed a working relationship with consumers, was occurring less frequently and that more interpreter/consumer interactions occurred remotely with little or no advance information or interaction between the interpreters and the consumers, such as in video relay environments. The Task Force felt that the demand for interpreters at different levels of skill and with different areas of specialization was also increasing. These and similar changes were seen as placing even greater importance on the certification to accurately and consistently indicate the level of skill of the interpreter.

While such changes were believed to be true, it was important to gather reliable data about the needs of consumers before constructing a new certification program. To validate these beliefs, RID conducted a needs analysis in 2011.(5) This study included a major survey (presented in both text and ASL) which asked deaf, hard of hearing, and hearing consumers of interpreting services what services they used most often and in what context. It also asked consumers what services they need that require specialized knowledge and skill. The study asked interpreters similar questions. Over 4,600 people responded to the survey, providing RID with a wealth of detailed information needed to structure the NIC (Level I) credential as well as future credentials at higher levels and in areas of specialization.

This study confirmed several concepts put forth by the NAD-RID NIC Task Force. First, the changes in the manner in which consumers use interpreting services were confirmed. In fact, the results of the study indicate that deaf consumers now use remote interpreting services as much as, if not more often than, they use in-person interpreting services.

Frequency by Method of Interpreting

                            Method of Interpreting
Consumers                   In person      Remote
Deaf                          3.54          3.69
Hard of Hearing               2.52          2.48
Hearing                       2.89          2.23
Grand Total                   3.25          3.10

Consumer Responses by Frequency of Method of Interpreting(6)

The study also confirmed the need for general interpreting skills for personal (day-to-day) communications and the need for specialized skills for other communications.

Degree of Importance of Specialized Knowledge and Skill

Service Settings            Degree of Importance
Legal                               4.84
Mental health                       4.67
Medical                             4.66
Graduate Education                  4.45
Undergraduate Education             4.26
K-12                                4.18
Corporate Education                 3.99
Business                            3.78
Entertainment                       3.48
Personal                            2.51

Service Settings by Consumer’s Mean Degree of Importance of Specialized Knowledge and Skill(7)
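
To make the two tables above concrete, the following sketch shows how mean ratings like these are computed from raw Likert-scale survey responses. This is a minimal illustration only; the response records and field names are invented, not the actual Pre-JTA data.

```python
# Minimal sketch of how the survey means above could have been computed.
# The records below are invented; the real Pre-JTA survey had over
# 4,600 respondents rating items on 1-5 Likert scales.

from collections import defaultdict
from statistics import mean

# Each record: (consumer group, method of interpreting, frequency
# rating on the 1 = "Never" ... 5 = "Always" scale from footnote 6).
responses = [
    ("Deaf", "In person", 4), ("Deaf", "Remote", 4),
    ("Hard of Hearing", "In person", 3), ("Hard of Hearing", "Remote", 2),
    ("Hearing", "In person", 3), ("Hearing", "Remote", 2),
]

cells = defaultdict(list)
for group, method, rating in responses:
    cells[(group, method)].append(rating)          # per-group cell
    cells[("Grand Total", method)].append(rating)  # pooled row

for (group, method), ratings in sorted(cells.items()):
    print(f"{group:15}  {method:9}  mean = {mean(ratings):.2f}")
```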

In addition to the information generated about consumer needs, NAD-RID also had information from the 2002 Role Delineation Study of Certified Interpreters.(8) This study identified the tasks performed by interpreters and the knowledge, skills, and abilities required to perform those tasks. All of the content of the enhanced NIC examination was identified in this study as important to the competent practice of interpreting.(9)

The information from these studies was used to specify the content and format for the enhanced NIC (Level I) examination, as follows:

  1. First, an examination Scoring Group, a group of subject-matter experts recommended by the NIC Task Force, finalized the profile and definition of the individual to be certified by the enhanced NIC (Level I) examination.(10)
  2. Then multiple, independent panels of experienced deaf, interpreter, and hearing NIC raters were convened to review the content of the NIC Interview and Performance Examination and to identify those portions of the examination that they believed most effectively discriminated the skills of interpreters at the level described.
  3. The examination Scoring Group then developed scoring criteria to rate the segments identified by the rater panels using a proven scoring algorithm.(11)
  4. A scoring validation study was then conducted to ensure that the content and segment length identified by the raters were sufficient for consistent assessment of examinees’ skills. To do so, a sample of actual, previously-scored NIC candidate submissions was independently scored by each member of the Scoring Group using the new scoring criteria and scoring process. The Scoring Group then discussed their ratings as a group to confirm that their holistic assessment of candidates’ skill was in line with the assessment generated through the application of the scoring criteria.
  5. Rater consistency was calculated and found to be within industry standards.

The results of this validation study confirmed that the test subject matter, test length, and scoring process provided an accurate and consistent measure of candidate performance.

With this validation in hand, RID proceeded to develop the enhanced NIC Interview and Performance Examination. A number of examination vignettes were developed, reviewed, and evaluated by the Scoring Group. The final examination vignettes were selected from these drafts.

The Scoring Group judged these cases to have the most appropriate subject matter, vocabulary, and stimulus material, and then proceeded to develop the scoring criteria for these vignettes.

The final enhanced NIC Interview and Performance Examination was launched on December 1, 2011. Unlike multiple-choice tests, for which the cut score can be established prior to live testing, performance-based tests require actual candidate solutions before the cut score can be established. Reviewing actual solutions ensures that the testing materials and instructions are clear, that the test vignettes perform as intended, and that the scoring criteria are appropriate and fair to candidates.

Once a sufficient pool of candidates had completed the examination, the Scoring Group met one more time to review actual samples of the candidate solutions and to revise the scoring criteria as needed. The Scoring Group then developed a “cut score” using a modified Angoff method.(12) A cut score is the minimum score required to pass a test.
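
As a rough illustration of the Angoff approach referenced above (and in footnote 12): each subject-matter expert estimates the probability that a minimally competent candidate would perform each scored element acceptably, and the cut score is derived from those estimates. The numbers and structure below are invented, and the specific modifications RID’s Scoring Group applied are not detailed in this report.

```python
# Minimal sketch of an Angoff-style cut score computation with
# invented rater judgments. Each value is one rater's estimate of the
# probability that a minimally competent (Level I) candidate would
# handle that scored segment acceptably.
angoff_estimates = {
    "segment_1": [0.70, 0.65, 0.75, 0.70],
    "segment_2": [0.55, 0.60, 0.50, 0.55],
    "segment_3": [0.80, 0.85, 0.80, 0.75],
}

# Expected performance of a borderline candidate on each segment is
# the mean of the raters' estimates; the cut score is their sum.
expected = {seg: sum(v) / len(v) for seg, v in angoff_estimates.items()}
cut_score = sum(expected.values())

print(f"Cut score: {cut_score:.2f} of {len(angoff_estimates)} segments")
```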

Finally, the Scoring Group undertook another scoring validation process to ensure that the results of the scoring process and the applied cut score agreed with their holistic evaluation of the candidates’ skill.

After all of these development and validation steps, the rater training materials were selected and rater training was performed. Newly trained raters then scored several sample vignettes to ensure that their scoring matched the scoring established by the Scoring Group. Only those raters who passed this screening were allowed to score actual examinations.
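
A sketch of what such a rater screening might look like in code follows; the benchmark calls and the 80% threshold are illustrative assumptions, not RID’s published criteria.

```python
# Minimal sketch of a rater-qualification check: a trainee's pass/fail
# calls on sample vignettes are compared against the Scoring Group's
# established calls. The threshold is an assumption for illustration.

SCORING_GROUP_CALLS = {"vignette_A": "pass", "vignette_B": "fail",
                       "vignette_C": "pass", "vignette_D": "fail"}

def rater_qualifies(trainee_calls: dict, threshold: float = 0.8) -> bool:
    matches = sum(SCORING_GROUP_CALLS[v] == call
                  for v, call in trainee_calls.items())
    return matches / len(SCORING_GROUP_CALLS) >= threshold

# 4/4 agreement qualifies; 2/4 does not.
print(rater_qualifies({"vignette_A": "pass", "vignette_B": "fail",
                       "vignette_C": "pass", "vignette_D": "fail"}))  # True
print(rater_qualifies({"vignette_A": "fail", "vignette_B": "pass",
                       "vignette_C": "pass", "vignette_D": "fail"}))  # False
```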

How does RID ensure that certificants can perform the job warranted by an NIC (Level I) certification?

Since no assessment is so accurate as to guarantee the competence of certificants by itself, all certification programs rely on a series of requirements and assessments to increase the likelihood that only competent applicants obtain the certification. For the enhanced NIC certification, these standards include requirements for education as well as the requirement that applicants pass a two-part examination.

NAD-RID ensures the credibility of the process by making certain that the application review process is thorough and that the examinations are valid, accurate, and reliable. They do the latter by carefully building the examinations to meet or exceed all recognized testing standards(13) and then by carefully analyzing the results of the examinations.

What is the difference between the previous NIC examination and the current enhanced NIC examination?

As was the case with the original NIC examination, the enhanced NIC examination still consists of two parts: a selected response (i.e., multiple-choice) examination and a constructed response (i.e., interview and performance) examination. Both parts must be passed independently.

The format and content of the multiple-choice section (the NIC Knowledge Examination) have not changed. The actual items in the interview and performance examination differ from those used before; however, the content of the enhanced NIC examination is based on the same Role Delineation Study as the original NIC examination.

A major difference is that the enhanced NIC examination results in a single level of credential, whereas the previous NIC examination could result in one of three levels of credentials.(14)

What is the evidence that the enhanced NIC examination is valid, fair, and legally defensible?

The first question in determining the quality of an examination has to do with content validity. Examinations can have both “content validity” and “face validity.” While content validity is critical, face validity is not. In fact, face validity has more to do with market acceptance than it does with actual test quality.

An examination that has content validity accurately measures only the knowledge and skills that are required for competent performance of the job. This is critically important as it would be unfair and unreasonable for an examination to assess knowledge or skills that were not needed to do the job, as defined in the certification profile.

While the examination must be job-related, it does not have to look anything like the actual job. A multiple-choice examination may be perfectly valid even though it does not look like the job to be certified.

[In fact, multiple-choice examinations do not look like many jobs, as most jobs do not require that professionals select a response out of four (4) possible choices.]

Face validity, on the other hand, means that the testing process “looks something like” what the candidate does on the job. Examinations that have face validity still are not exactly what the candidate does on the job, because they are merely tests.(15)

Is the enhanced NIC examination valid?

Each part of the enhanced NIC examination has been designed and constructed in accordance with the credentialing industry standards that define validity.(16)

The multiple-choice examination (NIC Knowledge Examination) is valid, because it tests for knowledge and skills that were scientifically identified in the Role Delineation Study as being important for competent practice.

The performance-based examination is also valid because it tests for knowledge and skills that were scientifically identified in the Role Delineation Study as being important for competent practice. (Additionally, it has significant face validity(17) in that it “looks like” what interpreters do on the job.)

Why don’t the enhanced NIC Performance vignettes mirror real interpreting practice?

The interview and performance vignettes are not intended to replicate actual practice, and they do not need to replicate practice to be valid. Instead, these vignettes present candidates with a series of challenges carefully designed to demonstrate a candidate’s ability or lack of ability. While a concern has been raised within the RID membership that these shorter vignettes cannot possibly assess candidates’ skills, it is important to clarify that there is absolutely no merit to this suggestion.

Tests are designed to produce accurate and reliable information as efficiently as possible. The enhanced NIC interview and performance vignettes are approximately three (3) to five (5) minutes in length. This length was established, as described earlier in this report, by multiple panels of expert interpreters and deaf consumers who were asked to identify the amount of time that it takes to accurately assess a candidate’s skill.(18) The vignette design also went through numerous validation studies to confirm that the test content and length were appropriate for assessing candidates’ skills.

The vignettes can be shorter than actual practice encounters because they exclude material that does not contribute to the accurate identification of a candidate’s ability level. In reality, conversations usually start with socially-appropriate introductions, followed by some background or introductory information. Our subject matter experts determined that not only do these niceties consume a considerable amount of time, but they are also so basic that they require little interpreting skill and therefore provide little information about the candidate’s skill.

Instead of using testing time to assess unessential elements that do not contribute to the needed measurement, the enhanced NIC vignettes go right to the heart of the communication encounter and present candidates with a series of tasks containing content of sufficient complexity to quickly and accurately separate those who are minimally competent from those who are not.

We understand some may have concerns because this may not be how interpreting exchanges normally occur in practice, but the purpose of the examination is to assess a candidate’s skills, not to replicate reality. In fact, this is true of all performance examinations, across professions. Following are some examples:

Tower crane operators typically work an eight (8) hour shift during which they do not leave the tower. The practical examination for tower crane operators, however, is only seven (7) to ten (10) minutes in duration. Essentially, it was determined that this is as long as it takes for a crane operator to demonstrate his or her skill in manipulating the crane ball through a number of predetermined obstacles. The certification program that relies on this test is now mandatory in a number of states where major, fatal accidents previously occurred due to crane operator error.

Another example is police officers. The daily activities of police officers rarely involve life-threatening situations. In fact, most police officers never fire their guns in the field. In performance examinations, however, police officer candidates are presented with complex situations containing numerous problems, some of which require them to fire their guns repeatedly. This is done because it is important to know that if an officer is presented with such situations, he or she will be able to respond appropriately. The test is valid even though it is not very indicative of what the officers will actually do in day-to-day practice.

While the interpreting field does not resemble the examples listed above, the bottom line is that no certification test, regardless of how much face validity it has, is actually what professionals do in reality.

Is the enhanced NIC certification fair?

Fairness of certification programs must be examined from two perspectives—from that of the candidate and from that of the consumer. For candidates, a program is considered to be fair if the test content is valid, if it correctly identifies those candidates who pass and those candidates who fail, and if every candidate has the same opportunity to demonstrate his or her competence. For consumers, a program is fair if it accurately identifies those individuals who are competent to do the job that they are certified to do.

We have already addressed content validity, so the next question is: “Does every candidate have the same opportunity to pass?” This is an area in which the enhanced NIC examination has made significant improvements, and NAD-RID believes the answer to this question is “Yes!”

First of all, the enhanced NIC Interview and Performance Examination is shorter than the original examination and contains numerous breaks. This reduces the potential impact of fatigue affecting the candidate’s score. (It should be noted that the current examination is not intended to be a test of endurance, as there is no evidence in the needs analysis or the current Role Delineation Study that suggests that physical endurance is required for the job.)(19)

Additionally, examination conditions have been standardized. RID has carefully specified the testing conditions that are to be provided at every test site. This means that, to the extent possible, all candidates are tested under the same conditions. There is also an appeals process for candidates who feel that the testing conditions affected their ability to successfully perform the examination.

The timing and flow of the examination have also been set for all candidates, further ensuring a level playing field for everyone. Candidates cannot advance or rewind the video stimulus, and all candidates must take the examination in the order in which it is presented. (These requirements ensure that the examination assesses only candidates’ interpreting skills and not their test management skills.)

Substantial changes have been made in the scoring of the examination to ensure the accurate scoring of all vignettes and to significantly reduce the chance that bias could affect the outcome of any candidate’s score. The enhanced NIC Interview and Performance Examinations are scored using a new automated, distributed scoring system. This online system automatically distributes candidates’ vignettes individually to approved raters. This means that every examination is scored by a number of different raters(20) and that the distribution of vignettes to raters is random. The system also measures the consistency of scoring and automatically sends exam vignettes for additional scoring if the initial raters disagree on the pass/fail status of that vignette.
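
The routing logic just described can be sketched as follows. This is an illustrative reconstruction of the flow described in the text, not RID’s actual system; all names and the rating interface are hypothetical.

```python
# Illustrative sketch of the distributed scoring flow described above:
# two randomly assigned raters score each vignette independently, and
# a third, independent rater is added automatically if the first two
# disagree on pass/fail.

import random

def score_vignette(vignette_id: str, raters: list, get_call) -> str:
    # Raters see only an anonymous candidate ID, never the candidate's
    # identity or test location.
    first, second = random.sample(raters, 2)
    calls = [get_call(first, vignette_id), get_call(second, vignette_id)]
    if calls[0] == calls[1]:
        return calls[0]
    # Disagreement: route to a third rater, whose call decides.
    third = random.choice([r for r in raters if r not in (first, second)])
    return get_call(third, vignette_id)

# Example with a stubbed rating function that passes everything:
result = score_vignette("candidate_123_vignette_2",
                        ["r1", "r2", "r3", "r4"],
                        lambda rater, vid: "pass")
print(result)  # "pass"
```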

Further, candidates are identified only by their candidate ID number in the scoring process and in their videos. There is no indication of the candidate’s identity or location of testing.

Beyond the mechanics of the scoring process, the scoring criteria are also improved. The Scoring Group developed vignette-specific criteria for each problem. The criteria guide all raters to the accurate assessment of each vignette in the examination. In fact, analysis of the scoring process has indicated that the first and second raters agreed on the pass/fail status of each vignette 83% of the time.(21) This is an excellent rate of agreement, and it should continue to rise as raters gain additional experience and receive additional feedback.(22) The final score for a vignette on which the initial raters disagreed is determined by adding a third, independent rater.
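
For clarity, the 83% figure is a simple percent-agreement statistic. A minimal sketch of the computation, with invented rating pairs, follows.

```python
# Minimal sketch of the percent-agreement statistic cited above: the
# share of vignettes on which the first and second raters made the
# same pass/fail call. These pairs are invented; the report cites 83%
# agreement across 1,448 double-scored vignettes (footnote 21).

pairs = [("pass", "pass"), ("fail", "fail"), ("pass", "fail"),
         ("pass", "pass"), ("fail", "fail"), ("pass", "pass")]

agreement = sum(first == second for first, second in pairs) / len(pairs)
print(f"First/second rater agreement: {agreement:.0%}")  # 83% here
```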

Is the enhanced NIC examination legally defensible?

The enhanced NIC Interview and Performance Examination is valid and legally defensible. It has been constructed to meet all recognized industry standards, it uses a proven scoring algorithm, it has passed thorough validation studies, and it has been shown to produce very consistent scoring results. The Caviart Group was selected based on its extensive experience designing, administering, and scoring performance examinations. Given the group’s in-depth understanding of testing principles and its success in defending performance examinations in the legal arena, it is Mr. Chaffee’s professional opinion that this certification examination is legally defensible.

Is the enhanced NIC credential effective and valuable?

In the past, consumers expressed concerns that they were not always sure that they could rely on the NIC credential to identify interpreters with the skills they needed to do a job. There were a number of reasons for that uncertainty. Foremost, there was not a clear warranty of what the certified interpreters could do, and consumers were confused by the titles assigned to the various levels and what they meant.

NAD-RID has recognized these issues and taken positive steps to resolve them. Hopefully, the steps will achieve the goals for which they have been designed, and the enhanced NIC certification will earn the value and respect from consumers that it deserves.

NAD-RID is committed to keeping the members and consumers informed about the NIC Certification Program. Both organizations are confident that new functionalities, such as ongoing assessment of active raters, and future steps, such as the implementation of a web-based test delivery platform to administer the NIC exam and the release of the Job/Task Analysis, will not only enhance the value of the NIC credential, but will attract more stakeholders to become involved in the continuous effort to ensure that RID credentials are meeting the needs of the marketplace.


Footnotes 

(1) See Appendix A for information about The Caviart Group, LLC, and Mr. Chaffee’s credentials.

(2) RID plans to certify higher and specialized levels of interpreting skill in future editions of the program.

(3) The profile was developed by the NIC Task Force and the NIC Scoring Group, which were composed of experienced deaf and hearing interpreters.

(4) The NAD-RID NIC Task Force Members include: Michael Canale, CI, CT, NAD IV (Chair), Sherri Collins, Robyn Dean, CI, CT, Kelly Flores, CI, CT, Judith Gilliam, RSC, CDI, Lisandra Gold, CI, CT, Gino Gouby, CDI, CLIP-R, Jo Linda Greenfield, NIC Master, NAD V, SC: L, CI, CT, TC, Daniel Langholtz, CDI, CLIP-R, RSC, Elizabeth Morgan, NIC Master, CI, CT, SC:L, NAD V, Geri Mu, NIC Master, CI, CT, Debbie Peterson, CDI, Linda Ross, NIC Master, CI, CT, TC, and Amanda Smith, NIC Master, CI, CT, SC:L, Ed:K-12.

(5) “Results of the RID Pre-JTA Survey” by The Caviart Group, LLC, 2011.

(6) The frequency scale consists of five levels from “Never” (Value = 1) to “Always” (Value = 5).

(7) The importance scale consists of five levels from “Not Important” (Value = 1) to “Extremely Important” (Value = 5).

(8) “Role Delineation Study of Certified Interpreters” by Castle, Inc., 2002.

(9) The content of a certification examination is deemed to be valid if it can be scientifically related to the knowledge, skills, and ability required to perform the job. Studies such as the RID Role Delineation Study are recognized by national and international certification accreditation agencies (including NCCA Standards for the Accreditation of Certification Programs (Standard 10) and ANSI ISO Standard 17024 Section G.4.3.4) as the preferred method of ensuring the content validity of certification examinations.

(10) See page 2 for the developed definition of the profile of an NIC (Level I) interpreter.

(11) This algorithm was developed by Mr. Chaffee with researchers from the Educational Testing Service (ETS)—see References. Mr. Chaffee has used this process for more than 20 years to score more than 1 million human-scored examinations for licensure as architects and as landscape architects. It also serves as the basis for the computer-simulation scoring for architects and sonographers and is currently being used to score the work samples of technical communicators. Mr. Chaffee has repeatedly and successfully defended this scoring process in the legal arena.

(12) Dr. Angoff was an Educational Testing Service (ETS) research scientist who developed a method for determining examination cut scores in 1971. This method is widely accepted in the testing industry.

(13) The enhanced NIC examination was built to meet the certification program accreditation requirements of the NCCA and of the ANSI ISO 17024.

(14) Future examinations will be developed to assess candidates for higher levels and for specialized credentials.

(15) Tests, by design, are always a sampling of the knowledge and skills required to perform a job. Although tests occur in unrealistic conditions and compressed time frames, the reasoning behind testing this sampling of knowledge and skills is that if a candidate can demonstrate the knowledge and skills tested, there is a high probability that they also possess the highly-related knowledge and skills that were not directly tested.

(16) NCCA Standards for the Accreditation of Certification Programs (Standard 10) and ANSI ISO Standard 17024 Section G.4.3.4.

(17) Face validity is a term that refers to tests that “look like” what candidates do on the job. These are still tests, so they do not consist of exactly what candidates actually do on the job.

(18) In determining the appropriate length of these vignettes, RID interviewed and convened panels of experienced NIC examination raters and asked them to indicate the length of exercise they felt was needed to accurately demonstrate a candidate’s skills. The work of these panels was then reviewed by a completely different examination development committee and then by an examination Scoring Group. This was done to ensure that the vignette cases were appropriate and effective in assessing candidates’ skills. In addition, the Scoring Group looked at the resulting overall score of candidates as determined by the scoring system and compared that result to their holistic assessment of the candidates’ competency. There was a very high level of agreement between these two assessment approaches.

(19) The ADA requires that any physical attribute (such as physical endurance) included as a part of a test be specifically identified in formal studies of the profession. Without such documentation, RID would not be able to defend the NIC administration in court. The ADA protects the civil rights of all candidates.

(20) A candidate’s complete exam may be scored by as many as 21 different raters, which significantly reduces the chance that rater bias will impact the final score of any candidate.

(21) Typically, 80% internal consistency on a human-scored exam is considered to be very good. Throughout the process, raters are not aware whether they are providing a first, second, or third rating. Initial analysis of the data from 1,448 vignettes scored at least twice in the new scoring system indicates that the first and second raters agreed on the pass/fail status of the vignette in 83% of the cases. The remaining 17%, in which the first and second raters disagreed, were scored by a third rater. The divergent score is thrown out, so 100% of the final scores reflect rater agreement.

(22) See References.


References 

American National Standards Institute, Washington, DC, ANSI/ISO/IEC 17024:2003.

Bauer, M., Williamson, D. M., Steinberg, L. S., Mislevy, R. J., & Behrens, J. T. (2001, April). How to create complex measurement models: A case study of principled assessment design. Paper presented at the annual meeting of the National Council on Measurement in Education, Seattle, WA.

Bejar I. I. (1991). A methodology for scoring open-ended architectural design problems. Journal of Applied Psychology, 76(4), 522-532.

Bejar, I. I., & Braun, H. (1994). On the synergy between assessment and instruction: Early lessons from computer-based simulations. Machine-Mediated Learning, 4, 5-25.

Bejar, I. I., & Braun, H. (1999, March). Architectural simulations: From research to implementation: Final report to the National Council of Architectural Registration Boards (ETS RM-99-2). Princeton, NJ: ETS.

Bejar, I. I., & Whalen, S. J. (1997, March). A system for interactive standard setting. Paper presented at the annual conference of the National Council on Measurement in Education, Chicago, IL.

Bejar, I. I., Yepes-Baraya, M., & Miller, S. (1997, March). Characterization of complex performance: From qualitative features to scores and decisions. Paper presented at the annual conference of the National Council on Measurement in Education, Chicago, IL.

Braun, H. I., Bennett, R. E., Frye, D., & Soloway, E. (1990). Scoring constructed responses using expert systems. Journal of Educational Measurement, 27, 93-108.

Clauser, B. E., Margolis, M. J., Clyman, S. G., & Ross, L. P. (1997). Development of automated scoring algorithms for complex performance assessments: A comparison of two approaches. Journal of Educational Measurement, 34, 141-161.

Clauser, B. E., Subhiyah, R. G., Nungester, R. J., Ripkey, D. R., Clyman, S. G., & McKinley, D. (1995). Scoring a performance-based assessment by modeling the judgment process of experts. Journal of Educational Measurement, 32, 397-415.

Cronbach, L. J. (1988). Five perspectives on the validity argument. In H. Wainer & H. I. Braun (Eds.), Test validity. Hillsdale, NJ: Lawrence Erlbaum Associates.

Kenney, J. F. (1997). New testing methodologies for the Architect Registration Examination. CLEAR Exam Review, 8(2), 23-28.

National Commission for Certifying Agencies, Standards for the Accreditation of Certification Programs, 2004.

Sebrechts, M. M., Bennett, R. E., & Rock, D. A. (1991). Agreement between expert system and human raters’ scores on complex constructed-response quantitative items. Journal of Applied Psychology, 76, 856-862.

Williamson D. M., Bejar I. I., & Hone, A. S. (1999). “Mental model” comparison of automated and human scoring. Journal of Educational Measurement, 36(2), 158-184.


Appendix 

About The Caviart Group

The Caviart Group is a partnership of leading experts in the field of credentialing and testing with extensive experience in managing large and complex national and international projects. The principals have more than 50 years of combined experience in creating and managing professional credentialing programs and in designing, developing, and administering occupational and professional examinations. They have been responsible for the examination of more than one million candidates in the U.S. and abroad in both paper-and-pencil and computer-based formats.

The company’s broad service categories include credentialing and assessment services such as job analyses; examination design, development, and delivery; psychometric and examination audits; and business consulting services, including strategic planning, new product development, and marketing.

The principals have conceived, developed, and grown several very successful national and international credentialing and licensure programs from the ground up and are uniquely qualified and experienced with both traditional and cutting-edge technology.

The Caviart Group brings to every project not only the demonstrated experience and skill necessary to perform the tasks, but also the qualities of professionalism—character, integrity and judgment—needed to ensure the credibility of results. We also provide exceptional customer care and service. Our goal is to become an integral part of our client’s team and to establish a long-term relationship.

Current/Recent Projects

  • Conducting Job/Practice Analysis or Market Analysis Studies
    • Association of Financial Professionals
    • Commission on Dietetic Registration
    • Registry of Interpreters for the Deaf
    • Global Association of Risk Professionals
    • Certification Commission of Healthcare Interpreters
    • Muscular-Skeletal Sonographers Certification Examination
    • Electronic Warfare Professionals
    • Society of Technical Communicators
    • Professional Landcare Network
  • Creating a new certification program for the Commission on Dietetic Registration
  • Creating a new certification program for the Society of Technical Communicators
  • Creating a new certification program and examinations for the Association of School Business Officials (ASBO)
  • Developing a computer-based simulation exam for the American Registry for Diagnostic Medical Sonographers (ARDMS)
  • Providing organizational advice and candidate management systems design for the Pharmacy Technician Certification Board (PTCB)
  • Redesigning the performance examination scoring process and criteria for the Registry of Interpreters for the Deaf (RID)
  • Designing the Candidate Management System for the National Institute for Automotive Service Excellence (ASE)
  • Providing advice and counsel to the Certified Financial Planners (CFP) Board
  • Creating advanced performance-based items for engineering technologists
  • Developing advanced, multi-media examination items for medical specialty certification boards
  • Providing project management for the Institute of Internal Auditors (IIA) conversion to computer-based testing, design of the candidate management system, and quality control for IIA’s global certification program

Current and Previous Clients Include

American Board of Family Medicine (ABFM)
American Board of Internal Medicine (ABIM)
American Registry for Diagnostic Medical Sonography (ARDMS)
Association of Financial Professionals (AFP)
Board of Certified Safety Professionals (BCSP)
Certification Commission of Healthcare Interpreters (CCHI)
Commission on Dietetic Registration (CDR)
Association of Old Crows (Electronic Warfare Professionals)
Global Association of Risk Professionals (GARP)
National Institute for Automotive Service Excellence (ASE)
Institute for Certified Professional Managers (ICPM)
Institute of Internal Auditors (IIA)
National Environmental Balancing Bureau (NEBB)
National Institute for Certification in Engineering Technologies (NICET)
Pharmacy Technician Certification Board (PTCB)
Professional Landcare Network (PLANET)
Registry of Interpreters for the Deaf (RID)
Society of Technical Communicators (STC)

Caviart Group Qualifications

Clarence “Buck” Chaffee, President

Mr. Chaffee is the president of The Caviart Group—a certification and testing design and development company specializing in the creation of cutting-edge technology for certification programs. He directs the company’s work on advanced technology, which includes major projects involving the creation and implementation of major international testing programs, research and development of computer-delivered performance tests with interactive 3-D technology, and the development of state-of-the-art item banking and candidate management systems.

Mr. Chaffee has more than 30 years of senior-level experience in the field of certification and testing. He has been involved in the design and development of advanced examination items since the 1980s, when he led the research and development of computer-based simulations for architectural registration examinations. He has extensive experience in developing hands-on and computer-based performance tests.

He has managed the conversion of several programs to computer-based testing and directed statistical analyses at all levels, including the development of cutting-edge scoring algorithms. Mr. Chaffee has also worked with state and provincial licensure boards for 25 years and has a reputation in the regulatory arena for honesty and integrity as well as for producing examinations of the highest quality. He is recognized as a leading authority in the development of high-stakes examinations and is a respected author, speaker, and expert witness on topics involving the examination, education, and regulation of professionals.

Prior to founding The Caviart Group, Mr. Chaffee served as the Executive Director of the Council of Landscape Architectural Registration Boards (CLARB). He oversaw the development and grading of the Landscape Architect Registration Examination (L.A.R.E.) from its inception in 1992 to 2006. He has served as President and Chief Executive Officer of the Center for Collaboration and Education in Design (C2Ed)—a for-profit corporation providing high quality continuing education over the Internet for design professionals. He also served as Executive Director of the Landscape Architectural Registration Boards Foundation—a charity which supports research and other efforts to advance the education of landscape architects.

Mr. Chaffee has directed international studies of the profession of landscape architecture in 1990, 1998, 2003 and 2006. Previously, he served for eight years as the Director of Examinations for the National Council of Architectural Registration Boards where he was responsible for the research behind the first computer-administered and scored Architect Registration Examination.

Mr. Chaffee holds a professional degree in architecture from Virginia Polytechnic Institute and State University. He is a licensed architect, is NCARB certified, and is an honorary member of the American Society of Landscape Architects. He has been an invited speaker at the American Society of Association Executives, the National Organization for Competency Assurance, the Performance Testing Council, and the Association for Test Publishers. He previously served as the chairman of the steering committee for the Certification Networking Group in Washington, D.C.