My name is Carter Huffman, and I'm currently the CTO of Modulate, where I'm working on our Voice Conversion technology! We're a tiny startup based out of the Cambridge Innovation Center in Cambridge, MA. If you're in the area, I'd love to meet up! Feel free to send me a message at whuffman@whuffman.net.
Before starting Modulate, I spent two years in the Machine Learning and Instrument Autonomy group at the Jet Propulsion Laboratory. Some of my projects included:
My current work at Modulate focuses on creating representations of human speech that disentangle the sound of a person's voice (their timbre) from the content of their speech (word/phoneme choice and pronunciation, emotional content, cadence, etc.). The objective is to manipulate these parts of the representation independently: for example, modifying the sound of a person's voice without affecting the content (you can hear an example of this here). Deep learning is a great choice for creating abstract, manipulable representations of high-dimensional data such as audio, but most robust applications rely on having ground-truth labels for the targets you care about (text-to-speech systems, for example, train on audio/text pairs). Voice conversion, by contrast, needs to manipulate (or preserve) characteristics of speech for which labeled data is scarce or nonexistent, such as emotional content or emphasis. Finding ways to force neural networks to capture these abstract aspects of the data, without being able to reference them directly during training, is part of what makes voice conversion a difficult (and interesting) problem!
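To make the "disentangle, then recombine" idea concrete, here is a deliberately toy sketch (nothing like Modulate's actual models): speech is stood in for by per-frame feature vectors, "timbre" is modeled as a global time-averaged embedding, and "content" as the per-frame residual. Swapping the timbre component between two utterances then gives a crude analogue of voice conversion. All names and the averaging scheme are my own illustrative assumptions.

```python
import numpy as np

def disentangle(frames):
    """Split per-frame speech features into a global 'timbre' vector
    (the time average) and per-frame 'content' residuals."""
    timbre = frames.mean(axis=0)   # speaker-level component (toy assumption)
    content = frames - timbre      # frame-level component
    return timbre, content

def convert(content, target_timbre):
    """Reconstruct frames from one utterance's content and another
    speaker's timbre."""
    return content + target_timbre

rng = np.random.default_rng(0)
speaker_a = rng.normal(loc=1.0, size=(50, 8))   # utterance from speaker A
speaker_b = rng.normal(loc=-1.0, size=(40, 8))  # utterance from speaker B

timbre_a, content_a = disentangle(speaker_a)
timbre_b, _ = disentangle(speaker_b)

# A's frame-to-frame variation, carried over B's average statistics.
converted = convert(content_a, timbre_b)
```

In a real system both components would come from learned encoders, and the hard part (as described above) is getting the network to separate them without labels for either one; the additive split here is only to show the shape of the manipulation.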
Beyond deep learning for voice conversion, speech synthesis, and related tasks, I'm interested in the broader scope of probabilistic graphical models, and in particular their use in interpretable machine learning. The key property is the ability to enforce specific dependency structures between the variables in a model a priori, based not only on an understanding of causal relationships but also on regulatory or ethical concerns. I expect that, over time, regulation and ethical considerations will push data scientists in many fields toward models that give explicit control over the dependence or independence between particular variables and predictions. While a neural network can be shown to make predictions independent of some variable, that independence comes from dataset balancing, and is not enforced in the model itself.
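As a toy illustration of that key property (my own example, not tied to any particular library): if we factor a discrete model as P(A)·P(X|A)·P(Y|X), then Y is conditionally independent of a sensitive variable A given X purely by construction. We can verify this numerically by building the joint from arbitrary conditional tables and checking that P(Y|X,A) does not vary with A, with no dataset balancing involved.

```python
import numpy as np

rng = np.random.default_rng(1)

# Factorization chosen a priori: P(A) * P(X|A) * P(Y|X).
# The arrow structure itself guarantees Y is independent of A given X.
p_a = np.array([0.3, 0.7])                        # P(A), 2 values
p_x_given_a = rng.dirichlet(np.ones(3), size=2)   # P(X|A), shape (2, 3)
p_y_given_x = rng.dirichlet(np.ones(2), size=3)   # P(Y|X), shape (3, 2)

# Joint P(A, X, Y) follows directly from the factorization (shape (2, 3, 2)).
joint = (p_a[:, None, None]
         * p_x_given_a[:, :, None]
         * p_y_given_x[None, :, :])

# Conditional P(Y | X, A), computed by normalizing the joint over Y.
p_y_given_xa = joint / joint.sum(axis=2, keepdims=True)
```

The slices `p_y_given_xa[0]` and `p_y_given_xa[1]` (one per value of A) come out identical for any choice of the conditional tables, which is exactly the kind of structural guarantee the paragraph above contrasts with balancing a neural network's training data.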