Plan | VALID

Five databases will be curated in this Resource Curation Project.

All audio files will be converted to WAV files (linear PCM). All transcriptions (currently mostly Praat TextGrids) will be converted to CHAT or ELAN format. The video recordings and the transcripts will be synchronized. Research data consisting of test forms, SPSS data files, SPSS syntax files, and Excel output files will be properly documented and kept in their original formats, since converting them would cause confusion. All databases will receive appropriate CMDI metadata files, both at database level and at recording session level (per speaker).
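
As an illustration of the audio requirement, the following minimal sketch shows one way to check which files already are readable linear PCM WAV files. It is not part of the project's tooling: the function names and the directory layout are assumptions made for this example. It relies on the fact that Python's standard `wave` module only reads uncompressed PCM data.

```python
import wave
from pathlib import Path


def describe_pcm_wav(path):
    """Return basic properties of a WAV file, or None if it cannot be read as PCM.

    The standard-library `wave` module only reads uncompressed PCM data,
    so a failure to open the file suggests it still needs conversion.
    (Hypothetical helper, named for this example only.)
    """
    try:
        with wave.open(str(path), "rb") as wav:
            return {
                "channels": wav.getnchannels(),
                "sample_width_bytes": wav.getsampwidth(),
                "sample_rate_hz": wav.getframerate(),
                "frames": wav.getnframes(),
            }
    except (wave.Error, EOFError):
        # wave.Error: not a RIFF/WAVE file or not PCM; EOFError: truncated file.
        return None


def files_needing_conversion(directory):
    """List files with a .wav suffix under `directory` that are not readable PCM."""
    return sorted(
        p for p in Path(directory).rglob("*.wav") if describe_pcm_wav(p) is None
    )
```

Calling `files_needing_conversion` on a database directory would give a worklist of audio files that still have to be converted; files in other containers (for example with a different suffix) would need a separate inventory.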

Starting from existing profiles (such as those for DBD and LESLLA), a new profile specific to data resources on language and speech impairments will be established. Care will be taken to use ISOcat categories. Categories that do not yet exist in ISOcat will be defined in ISOcat terms and proposed to the TDG. After curation, all data will be deposited at the MPI, and all resources and metadata files will receive persistent identifiers. The MPI will perform a metadata compliance and harvesting test. Eventually, all metadata will be searchable and browsable in the MPI metadata catalogue, and the annotations will be searchable with the MPI search engine “TROVA”.