Week 2

This week I began and finished implementing the first bash script to automate the training and testing of the chatbot AI. The logic of the script is fairly straightforward: it randomly divides the data from a folder (in this case the yesno folder) into training and testing datasets, with 80% of the original data going to testing and the other 20% to training. The main issues I had were with the syntax of the portions of the code that copy the data and create a folder for each audio file in the test and train folders.

Another issue is that my main machine does not have much disk space left, so I cannot download extremely large datasets. The graduate student working with me has assured me that this probably will not be a problem, and there are other machines in the lab I can use. These preliminary steps have made me more familiar with the data in the Kaldi workspace as well as the other prep scripts in it.

I have started on our second main task, which is similar to the first but uses the larger Valdosta (VLD) dataset. I need to automate the downloading, extraction, and partitioning of this data. Finally, I need to create 3 metadata files from a provided metadata text file. I may need to read through this file at some point and figure out a concise way to sift through the metadata in it.
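For reference, the random split and per-file folder creation described for the first script could be sketched roughly as follows. This is not the actual script from the project, only a minimal illustration: the function name `split_dataset`, its arguments, and the `train/` and `test/` output layout are all hypothetical, and it assumes GNU `shuf` is available. The percentage sent to training is left as a parameter rather than fixed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the project's real script.
# Usage: split_dataset SRC_DIR OUT_DIR TRAIN_PERCENT
# Randomly assigns TRAIN_PERCENT percent of the files in SRC_DIR to
# OUT_DIR/train and the rest to OUT_DIR/test.
split_dataset() {
    local src="$1" out="$2" train_percent="$3"
    local files=() f name subset i=0

    # Collect the regular files in SRC_DIR in random order (assumes GNU shuf).
    while IFS= read -r f; do
        files+=("$f")
    done < <(find "$src" -maxdepth 1 -type f | shuf)

    local total=${#files[@]}
    # Integer arithmetic: the training count is rounded down.
    local n_train=$(( total * train_percent / 100 ))

    for f in "${files[@]}"; do
        if (( i < n_train )); then subset=train; else subset=test; fi
        name=$(basename "$f")
        # One folder per audio file, named after the file without its extension.
        mkdir -p "$out/$subset/${name%.*}"
        cp "$f" "$out/$subset/${name%.*}/"
        i=$(( i + 1 ))
    done
}
```

Calling something like `split_dataset yesno data 20` would then copy a random 20% of the files into `data/train` and the remaining 80% into `data/test`, each file inside its own folder. Quoting every path expansion is what keeps the copy and `mkdir` steps from breaking on file names that contain spaces.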

Written on June 12, 2022