The development of VALID, an open access multimedia language data archive, will facilitate and promote innovative research in the area of language impairment. There are a number of compelling reasons for exchanging and sharing this kind of data. In a small country like the Netherlands, considerably less language data is collected than in much larger regions, especially for language disorders, a highly specific research domain. Combining a wide range of language impairments in one data archive not only enhances the study of similar impairments but also advances comparisons between different disorders. Moreover, the inclusion of different age groups allows for quasi-longitudinal research designs. Finally, analysis of task properties and effects that are specific to pathological language groups can make a significant contribution to evidence-based research. Three examples illustrate the scientific merits of a VALID data archive. In each case, the availability of specific data collections makes it possible to address research questions that could otherwise not be answered or would demand elaborate collection of new data.

  1. Methodology: availability of normative data. The consequences of language impairment can only be verified by comparison to typical development, which requires matching participants with an impairment to typically developing controls. This matching procedure is painstaking and often requires a large pool of potential matches from which to select controls. If data are available from a significant number of controls, matching becomes substantially easier.
  2. Infrequent clinical conditions. For those researchers who focus on highly specific subgroups of language and speech problems, such as Landau-Kleffner syndrome or aphasia in Dutch Sign Language, it is hard to find subjects. Research becomes feasible when data can be accessed from other data sets.
  3. Comorbidity. Research on language impairment typically entails verifying comorbid symptoms in a clinical group (such as SLI and dyslexia) and comparing that group on behavioral variables to groups that have only the ‘other’ disorder. The availability in one data archive of both ‘pure’ clinical groups and data from individuals who show symptoms of more than one disorder makes such research more feasible.

The aim of the curation project is to curate five data sets. Together with the BISLI data set that is currently being curated (FESLI, Functional Elements in Specific Language Impairment), they will form the launching platform for the VALID data archive. For all data sets concerned, written informed consent has been obtained from the participants or their caretakers. Informants or their caretakers have agreed to share their speech/language data and metadata on the condition of anonymity, which the data providers and infrastructure specialists will ensure when the resources are curated.