As a responsible test provider, we are committed to demonstrating the reliability and validity of our tests, and we are proud of the work we do to establish them.

This page will be updated as new information about the validity and reliability of our tests becomes available. This version is effective December 2020.

TEAC Validity

A case for test validity can be made in a number of ways.  One of these, “validity as argument” (Kane, 2001), conceives of validity as an argument made with respect to the meaning and use of scores from a test developed for a particular purpose.

The goal of TEAC is to assess English within the framework of the ICAO language proficiency requirements for the purposes of interacting within civil aviation.  Given such a meaning and use, it seems reasonable to apply the “validity as argument” approach in developing a validity case. Kane’s (2001) proposal for test validity arguments follows the structure of arguments outlined by Toulmin (2003):

  • A claim (the conclusion we wish to justify)
  • The grounds for the claim (data or information currently available)
  • Warrants (the link between the grounds and the claim)
  • Backing (theoretical models, prior research or supporting data for the warrants)
  • Rebuttals (counter-claims that the warrant does not justify the link between the ground and the claim).
    As a new test, work must continue to be done to establish backing between warrants and claims for TEAC; however, there are reasons to suppose that such backing will be found.

    Here we outline the case for a validity argument for TEAC, existing reasons to suppose that the backing will be found for the warrants, and proposals for a research agenda to establish this backing.


    Claim: The TEAC provides an assessment of language proficiency relevant to the requirements of the ICAO LPRs, and performance in TEAC is predictive of workplace linguistic performance in plain-language situations.


    Grounds (1): The ICAO Language Proficiency Requirements at Level 4 are sufficient to predict success in workplace plain-language interactions.

    Grounds (2): The TEAC reliably assesses language proficiency in terms of the ICAO Language Proficiency Requirements at all levels of proficiency.


    Warrants (1.1): The ICAO LPRs are valid.

    Warrants (1.2): TEAC Performance is predictive of plain-language performance in real-world aviation scenarios.

    Warrants (2.1): Part One allows for rater judgements to be made of candidate use of Pronunciation, Interactions, work-related Vocabulary, basic Structures, discourse-related Fluency and Comprehension of questions.

    Warrants (2.2): Part Two allows for rater judgements to be made of candidate passive recognition and use of work-related Vocabulary & Structures as well as Comprehension.

    Warrants (2.3): Part Three allows for rater judgements to be made of candidate passive recognition and use of Vocabulary associated with non-routine events, a wide range of Structures as well as Comprehension and Interactions.

    Warrants (2.4): Part Four allows for rater judgements to be made of candidate use of Pronunciation, Interactions, a wide range of Vocabulary, more complex Structures, Comprehension of questions and both discoursal and flow-related Fluency.

    Warrants (2.5): Rater judgements are reliable and grounded in the ICAO LPRs. 


    Backing (1.1): This is assumed to be true.  If it is not, there is little that we, as test developers, can do about it, and we are bound to test according to these LPRs.

    Backing (1.2): A planned concurrent / predictive validity study will ask flight instructors and ATC instructors who are unaware of candidates' performance in TEAC to assess their trainees for the difficulties they have experienced, or are likely to experience, in communicating during non-routine aviation scenarios over the radio.  These assessments can then be correlated with TEAC performance by means of linear regression.  The null hypothesis of such a study is that there is no relationship between TEAC performance and instructor assessments, but particular attention will be paid to the cut-offs between the levels TEAC assesses, for example Level 3 - Level 4 and Level 5 - Level 6.
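    The planned regression analysis can be sketched as follows.  The scores below are hypothetical and the helper function is purely illustrative of the intended calculation:

```python
# Sketch of the planned concurrent-validity analysis: correlating TEAC
# scores with independent instructor assessments via simple linear
# regression.  All data below are hypothetical, for illustration only.

def linear_regression(x, y):
    """Return (slope, intercept, pearson_r) for paired observations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

# Hypothetical TEAC overall scores (ICAO levels) and instructor
# assessments (higher = fewer expected communication difficulties).
teac = [3, 4, 4, 5, 5, 6]
instructor = [2.5, 3.8, 4.1, 4.9, 5.2, 5.8]

slope, intercept, r = linear_regression(teac, instructor)
# Under the null hypothesis r would be near 0; a strong positive r
# would support Warrant 1.2.
```

    In the real study, significance testing and attention to the level cut-offs would be added on top of this basic calculation.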

    In the interim, we have sought the opinions of SMEs and experienced ELEs as to whether they believe the test tasks are designed in such a way as to elicit language appropriate to the LPRs, and current items have been revised to that effect.  SMEs have been consulted extensively on content to ensure that test tasks meet their expectations of appropriate requirements.

    Findings from this early study are mainly in the form of correspondence, but can be discussed with interested parties on request.

    Backing (2.1 - 2.4): A future study will analyse sampled candidate responses and invite independent language experts to judge whether these responses can be categorised in terms of the categories within the ICAO LPRs, and at which level, both independently and in comparison with our own raters' assessments.

    In the short term, we are asking representatives of airlines, aviation training organisations, air navigation service providers, and approved language test entities to complete a survey in order to provide backing for these warrants.  Results are still coming in at the time of writing. 

    Backing (2.5): We have commenced rater standardisation and developed procedures to ensure that rater standardisation for TEAC is stronger than in any previous tests we have worked on. These procedures include making other raters' ratings transparent to raters, having all tests assessed by both SMEs and ELEs, and holding annual group rating exercises.

    Early studies have shown good reliability for the SME and ELE rating team on a sample of 12 practice tests (Alpha coefficient > 0.9). This will be considered further below.


    References

    Fulcher, G., & Davidson, F. (2007). Language Testing and Assessment: An Advanced Resource Book. London and New York: Routledge.

    Kane, M. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38(4), 319–342.

    Toulmin, S. E. (2003). The Uses of Argument (updated edition). Cambridge, UK: Cambridge University Press.

    TEAC Reliability

    A sample of 12 tests was used to conduct initial inter-rater reliability studies among the core team who work to develop TEAC's rating standards. 

    Scores in each profile category were considered separately for reliability, as well as the overall (final) score.  These scores were submitted independently via TEAC's online rater assessment tool and compared to find Cronbach's alpha and the intra-class correlation coefficient (two-way random, absolute agreement).

    Both statistics report a coefficient for which +1 represents complete agreement, -1 complete negative agreement, and 0 no correlation.
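    As an illustration of the first statistic, Cronbach's alpha can be computed by treating each rater as an "item" and each rated test as a case.  The ratings below are hypothetical; this is a minimal sketch of the standard formula, not our production tooling:

```python
# Minimal sketch of Cronbach's alpha for an inter-rater study,
# treating each rater as an "item".  Ratings are hypothetical.

def cronbach_alpha(ratings):
    """ratings: one inner list of scores per rater, same tests in order."""
    k = len(ratings)      # number of raters
    n = len(ratings[0])   # number of rated tests

    def variance(xs):     # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = sum(variance(r) for r in ratings)
    totals = [sum(r[i] for r in ratings) for i in range(n)]
    return (k / (k - 1)) * (1 - item_vars / variance(totals))

# Three hypothetical raters scoring the same six tests:
raters = [
    [4, 5, 3, 6, 4, 5],
    [4, 5, 4, 6, 4, 5],
    [3, 5, 3, 6, 4, 4],
]
alpha = cronbach_alpha(raters)  # approaches 1 as raters agree
```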

    A value of >0.7 is considered acceptable for many purposes, but for high-stakes language assessment, values above 0.8 are preferable.  Lenguax's internal target for senior rater ICC absolute agreement is 0.85.
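    The two-way random, absolute-agreement, single-measures ICC (ICC(2,1) in the commonly used Shrout & Fleiss naming) can be sketched from a two-way ANOVA decomposition.  The table below is hypothetical and the function is illustrative only:

```python
# Sketch of ICC(2,1): two-way random effects, absolute agreement,
# single measures.  table[i][j] is the rating of test i by rater j.
# Data passed in are hypothetical.

def icc_2_1(table):
    n = len(table)       # subjects (tests)
    k = len(table[0])    # raters
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(table[i][j] for i in range(n)) / n for j in range(k)]
    ssr = k * sum((m - grand) ** 2 for m in row_means)   # between-subjects
    ssc = n * sum((m - grand) ** 2 for m in col_means)   # between-raters
    sst = sum((table[i][j] - grand) ** 2
              for i in range(n) for j in range(k))
    sse = sst - ssr - ssc                                # residual
    msr = ssr / (n - 1)
    msc = ssc / (k - 1)
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse))
```

    With perfect agreement the coefficient is exactly 1; systematic rater differences (a severity offset) reduce it, which is why absolute agreement is the stricter and more appropriate form for score-level decisions.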

    Initial Results

    Single-measure ICC absolute agreement among Lenguax raters was highly reliable (>0.85) for Structure, Vocabulary and Comprehension rating, and reliable (>0.8) for Pronunciation, Fluency and Interactions.  Cronbach's alpha was 0.954 (N = 12).

    This represents a pleasing initial benchmark, and a basis on which to build.

    Rater Training

    Since the initial rating, Lenguax has launched its own rater course for the training of new raters following this standard.  Existing raters have access to all the online training and example tests. 

    Statistical Analysis

    In addition to the Cronbach alpha and ICC calculations, rater course participants and other raters are monitored regularly using FACETS software, which enables statistical modelling of rater severity among many other powerful tools.  As more tests are rated, the software is able to model with increasing accuracy the various elements that affect reliability, and will allow us to provide more targeted training to new and experienced raters.
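    FACETS fits a many-facet Rasch model.  As a rough illustration of the underlying idea only (not the software's actual estimation procedure), in the simplest dichotomous case the log-odds of an acceptable performance are candidate ability minus task difficulty minus rater severity, all in logits.  The parameter values below are hypothetical:

```python
import math

# Illustrative many-facet Rasch sketch: the same performance receives
# a lower expected score from a more severe rater.  Values are
# hypothetical logits, for illustration only.

def p_success(ability, difficulty, severity):
    """Probability of an acceptable judgement (all inputs in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty - severity)))

lenient = p_success(ability=1.0, difficulty=0.0, severity=-0.5)
severe = p_success(ability=1.0, difficulty=0.0, severity=0.5)
# FACETS estimates these severities from the full rating matrix, so
# that reported scores can be interpreted net of rater effects.
```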

    BLC Reliability

    The Benchmark Level Check (BLC) is an online, 120-question assessment of listening comprehension, grammar and vocabulary.  It aims to provide an approximate benchmark of English language proficiency, giving decision-makers an idea of candidates' ability in terms of the ICAO LPRs.

    Development of BLC is planned in three phases.

    Item Trialling

    An initial sample of items was trialled with approximately 100 volunteers, and item facility values were studied in order to gain an understanding of the benchmark level of difficulty required for a candidature of this nature.
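    Item facility is simply the proportion of candidates answering an item correctly.  The response data below are hypothetical (1 = correct):

```python
# Sketch of the item-facility calculation from the trial.
# Responses are hypothetical: 1 = correct, 0 = incorrect.

def item_facility(responses):
    return sum(responses) / len(responses)

trial_responses = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]  # ten candidates
facility = item_facility(trial_responses)          # 0.7
# Items with facility near 1.0 or 0.0 discriminate poorly at the
# benchmark level and are flagged for revision.
```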

    Test Reliability

    Using WINSTEPS software, items can be studied for parameters such as item infit and logit difficulty.  In this way, underperforming items can be identified and revised to improve reliability.  This work is currently ongoing.
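    As a rough intuition for logit difficulty (not the iterative joint estimation WINSTEPS actually performs), for a sample whose mean ability is centred at 0 logits, an item's difficulty is approximately the log-odds of an incorrect response:

```python
import math

# Illustrative link between item facility and Rasch logit difficulty.
# This centring assumption is a simplification for illustration only.

def approx_logit_difficulty(facility):
    return math.log((1 - facility) / facility)

mid = approx_logit_difficulty(0.5)    # 0.0: half the sample succeeds
hard = approx_logit_difficulty(0.25)  # about +1.10 logits
easy = approx_logit_difficulty(0.75)  # about -1.10 logits
```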

    Mapping to ICAO Levels

    Once the items are sufficiently revised, a mapping project can be undertaken to determine the logit difficulty corresponding to ICAO test reference scores for a representative sample of candidates who take both an ICAO test and the BLC.  In this way, probabilistic estimates linking BLC scores to ICAO test scores can be empirically determined.  This work is expected to be done in Q1 or Q2 2021.
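    One simple form such a probabilistic mapping could take is an empirical estimate of the probability of ICAO Level 4 or above within each BLC score band.  Everything below (band width, scores, levels) is hypothetical:

```python
# Sketch of a possible BLC-to-ICAO mapping: estimate P(ICAO level >= 4)
# per BLC score band from candidates with both scores.  All data and
# the band width are hypothetical.

def level4_probability_by_band(pairs, band_width=20):
    """pairs: (blc_score, icao_level) tuples -> {band_start: P(level >= 4)}."""
    bands = {}
    for score, level in pairs:
        band = (score // band_width) * band_width
        bands.setdefault(band, []).append(1 if level >= 4 else 0)
    return {band: sum(v) / len(v) for band, v in sorted(bands.items())}

sample = [(45, 3), (52, 3), (63, 4), (68, 3),
          (74, 4), (81, 4), (88, 5), (95, 5)]
estimates = level4_probability_by_band(sample)
# e.g. candidates scoring in the 60-79 band reached Level 4+ in
# 2 of 3 cases in this toy sample.
```

    A fitted logistic model over the logit difficulties would be the more principled version of the same idea.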


    We are happy to discuss these matters with interested parties and decision-makers.  You can make contact using the form below.