Week 5
This week, I got my scripts for getting data and running meta procedures working well enough that Kaldi could extract CMVN stats and MFCC features from the data. These are descriptors that are important in preparing and training a model. Having the data and metadata files formatted correctly and in the right locations also means it will be possible to extract the multiple “phone” files related to the sounds in the CORAAL audio. The biggest frustration this week was understanding exactly how to sort my files. It was easy enough to call Bash’s sorting function, but Bash seems to have no built-in way to overwrite a file’s contents with the sorted result. It was also a challenge to figure out which script should do the sorting to be most efficient, and whether it would make sense to create temporary files to help with overwriting and copying the sorted data. I decided not to use temporary files.
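For the sorting problem, one option that avoids managing a separate temporary file is the -o flag of the standard sort utility, which reads all of its input before writing and so can safely name the input file as the output. The sketch below is only an illustration under assumptions: the file name utt2spk and the utterance and speaker IDs are made up, and it is not necessarily how my scripts ended up doing it. It also sets LC_ALL=C, since Kaldi's data preparation expects files sorted in byte order.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory and file name, for illustration only.
workdir="$(mktemp -d)"
utt2spk="$workdir/utt2spk"

# Unsorted sample lines in "<utterance-id> <speaker-id>" form (made-up IDs).
printf '%s\n' 'spk2_utt1 spk2' 'spk1_utt2 spk1' 'spk1_utt1 spk1' > "$utt2spk"

# sort -o writes to the named file only after reading all input,
# so the input file and the output file can be the same.
# LC_ALL=C forces plain byte-wise ordering for this one command.
LC_ALL=C sort -o "$utt2spk" "$utt2spk"

# Show the sorted file.
cat "$utt2spk"
```

Note that the more obvious-looking form, sort file > file, does not work: the shell truncates the file before sort ever reads it, which is probably why in-place sorting feels unsupported at first.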