Week 3

So far this week, I have been working on automating the creation of metadata files for the audio data we will be working with. For the CORAAL data we have been using so far, it should be easier in theory to create metadata files for that data since it has a standard and fairly straightforward file naming convention. To make the scripts more generalizable, I have decided to use the CORAAL data naming convention as a standard for any file that the user may choose to work with. I spoke with my graduate student supervisor and we agree that a format where the original file name is preserved but has a “CORAAL-like” ID attached would be good. So far, I have 3 scripts. One to download the data, one to create the metadata files, and one to sort the metadata files. I must modify the first script so that it can rename files that it downloads using information that the user would want reflected in the titles such as socio-economic grouping, gender, and age group. The sorting script is in its second version currently. A broad issue has been designing the scripts to be general enough but also we cannot predict every use case so there must be some rules for standardization of the file names. Hopefully, once all three scripts have undergone the necessary modifications, they can be placed in any new speech directory and the user would only need to feed the download script and sorting script a few pieces of information along the way.

Meeting of Women in CompSci that my DREU partner and I were invited to.

(From left to right) My DREU partner, Dr. Gray, and I.

Written on June 17, 2022