My name is Carter Huffman, and I'm currently the CTO of Modulate, where I'm working on our Voice Conversion technology! We're a tiny startup based out of the Cambridge Innovation Center in Cambridge, MA. If you're in the area, I'd love to meet up! Feel free to send me a message at whuffman@whuffman.net.
Before starting Modulate, I spent two years in the Machine Learning and Instrument Autonomy group at the Jet Propulsion Laboratory. Some of my projects included:
My current work at Modulate focuses on creating representations of human speech that disentangle the sound of a person's voice (their timbre) from the content of their speech (word/phoneme choice and pronunciation, emotional content, cadence, etc.). The objective is to manipulate these parts of the representation independently: for example, modifying the sound of a person's voice without affecting the content (you can hear an example of this here). Deep learning is a great choice for creating abstract, manipulable representations of high-dimensional data such as audio, but most robust applications rely on having ground-truth labels for the targets you care about (text-to-speech systems, for example, train on audio/text pairs). Voice conversion, by contrast, needs to manipulate (or preserve) characteristics of speech for which labeled data is scarce or nonexistent, such as emotional content or emphasis. Finding ways to force neural networks to capture these abstract aspects of the data, without being able to reference them directly during training, is part of what makes voice conversion a difficult (and interesting) problem!
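To make the "disentangle, then recombine" idea concrete, here is a deliberately toy sketch (nothing like Modulate's actual models): speech is stood in for by per-frame feature vectors, "timbre" is modeled as a global time-averaged embedding, and "content" as the per-frame residual. Swapping the timbre component between two utterances then gives a crude analogue of voice conversion. All names and the averaging scheme are my own illustrative assumptions.

```python
import numpy as np

def disentangle(frames):
    """Split per-frame speech features into a global 'timbre' vector
    (the time average) and per-frame 'content' residuals."""
    timbre = frames.mean(axis=0)   # speaker-level component (toy assumption)
    content = frames - timbre      # frame-level component
    return timbre, content

def convert(content, target_timbre):
    """Reconstruct frames from one utterance's content and another
    speaker's timbre."""
    return content + target_timbre

rng = np.random.default_rng(0)
speaker_a = rng.normal(loc=1.0, size=(50, 8))   # utterance from speaker A
speaker_b = rng.normal(loc=-1.0, size=(40, 8))  # utterance from speaker B

timbre_a, content_a = disentangle(speaker_a)
timbre_b, _ = disentangle(speaker_b)

# A's frame-to-frame variation, carried over B's average statistics.
converted = convert(content_a, timbre_b)
```

In a real system both components would come from learned encoders, and the hard part (as described above) is getting the network to separate them without labels for either one; the additive split here is only to show the shape of the manipulation.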
Beyond deep learning for voice conversion, speech synthesis, and related tasks, I'm interested in the broader scope of probabilistic graphical models, and in particular their use in interpretable machine learning. The key property is the ability to enforce specific dependency structures between the variables in a model a priori, based not only on an understanding of causal relationships but also on regulatory or ethical concerns. I expect that, over time, regulation and ethical considerations will push data scientists in many fields toward models that give explicit control over the dependence or independence between particular variables and predictions. While a neural network can be shown to make predictions independent of some variable, that independence comes from dataset balancing, and is not enforced in the model itself.
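As a toy illustration of that key property (my own example, not tied to any particular library): if we factor a discrete model as P(A)·P(X|A)·P(Y|X), then Y is conditionally independent of a sensitive variable A given X purely by construction. We can verify this numerically by building the joint from arbitrary conditional tables and checking that P(Y|X,A) does not vary with A, with no dataset balancing involved.

```python
import numpy as np

rng = np.random.default_rng(1)

# Factorization chosen a priori: P(A) * P(X|A) * P(Y|X).
# The arrow structure itself guarantees Y is independent of A given X.
p_a = np.array([0.3, 0.7])                        # P(A), 2 values
p_x_given_a = rng.dirichlet(np.ones(3), size=2)   # P(X|A), shape (2, 3)
p_y_given_x = rng.dirichlet(np.ones(2), size=3)   # P(Y|X), shape (3, 2)

# Joint P(A, X, Y) follows directly from the factorization (shape (2, 3, 2)).
joint = (p_a[:, None, None]
         * p_x_given_a[:, :, None]
         * p_y_given_x[None, :, :])

# Conditional P(Y | X, A), computed by normalizing the joint over Y.
p_y_given_xa = joint / joint.sum(axis=2, keepdims=True)
```

The slices `p_y_given_xa[0]` and `p_y_given_xa[1]` (one per value of A) come out identical for any choice of the conditional tables, which is exactly the kind of structural guarantee the paragraph above contrasts with balancing a neural network's training data.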