Boris Smus

interaction engineering

Web-based voice command recognition

Last time we converted audio buffers into images. This time we'll take these images and train a neural network using deeplearn.js. The result is a browser-based demo that lets you speak a command ("yes" or "no") and see the output of the classifier in real time, like this:

Curious to play with it and see whether it recognizes "yay" or "nay" in addition to "yes" and "no"? Try it out live. You will quickly see that the performance is far from perfect. But that's OK with me: this example is intended to be a reasonable starting point for all sorts of audio recognition on the web. Now, let's dive into how this works.

Continued →

Audio features for web-based ML

One of the first problems presented to students of deep learning is to classify handwritten digits in the MNIST dataset. This was recently ported to the web thanks to deeplearn.js. The web version has distinct educational advantages over the relatively dry TensorFlow tutorial. You can immediately get a feeling for the model, and start building intuition for what works and what doesn't. Let's preserve this interactivity, but change domains to audio. This post sets the scene for the auditory equivalent of MNIST. Rather than recognize handwritten digits, we will focus on recognizing spoken commands. We'll do this by converting sounds like this:

Into images like this one, called log-mel spectrograms. In the next post, we'll feed these images into the same types of models that do handwriting recognition so well:

Final log-mel spectrogram.

The audio feature extraction technique I discuss here is generic enough to work for all sorts of audio, not just human speech. The rest of the post explains how. If you don't care and just want to see the code, or play with some live demos, be my guest!
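The rest of the post explains the actual pipeline. As a rough illustration of the core idea only, here is a minimal TypeScript sketch that turns one frame's power spectrum into log-mel energies. It assumes the common HTK-style mel formula, triangular filters evenly spaced on the mel scale, a power spectrum already computed by an FFT, and a small offset before the log. All function names and parameter values (40 filters, a 1024-point FFT, 16 kHz audio) are hypothetical, not taken from the post's code. A full spectrogram stacks such frames over time.

```typescript
// Assumption: HTK-style mel formula; the post's own code may use another variant.
function hzToMel(hz: number): number {
  return 2595 * Math.log10(1 + hz / 700);
}

// Inverse of hzToMel.
function melToHz(mel: number): number {
  return 700 * (Math.pow(10, mel / 2595) - 1);
}

// Build triangular filters evenly spaced on the mel scale, from 0 Hz to Nyquist.
// Each filter is an array of weights over FFT bins 0..fftSize/2.
function melFilterbank(numFilters: number, fftSize: number, sampleRate: number): number[][] {
  const numBins = fftSize / 2 + 1;
  const maxMel = hzToMel(sampleRate / 2);
  // numFilters + 2 edge points: each triangle spans three consecutive points.
  const binEdges: number[] = [];
  for (let i = 0; i < numFilters + 2; i++) {
    const hz = melToHz((maxMel * i) / (numFilters + 1));
    binEdges.push((hz * fftSize) / sampleRate); // fractional FFT bin index
  }
  const filters: number[][] = [];
  for (let m = 1; m <= numFilters; m++) {
    const left = binEdges[m - 1];
    const center = binEdges[m];
    const right = binEdges[m + 1];
    const weights = new Array<number>(numBins).fill(0);
    for (let k = 0; k < numBins; k++) {
      if (k > left && k <= center) weights[k] = (k - left) / (center - left); // rising edge
      else if (k > center && k < right) weights[k] = (right - k) / (right - center); // falling edge
    }
    filters.push(weights);
  }
  return filters;
}

// Sum the power spectrum under each filter, then take the log.
function logMel(powerSpectrum: number[], filters: number[][]): number[] {
  return filters.map((weights) => {
    let energy = 0;
    for (let k = 0; k < weights.length; k++) energy += weights[k] * powerSpectrum[k];
    return Math.log(energy + 1e-6); // assumed small offset to avoid log(0)
  });
}

// Example with hypothetical parameters: 40 filters, 1024-point FFT, 16 kHz audio.
const filters = melFilterbank(40, 1024, 16000);
const flatSpectrum = new Array<number>(513).fill(1);
const frame = logMel(flatSpectrum, filters);
```

Spacing the filters evenly in mels rather than in hertz gives narrow bands at low frequencies and wide bands at high frequencies, which is the point of the mel scale; the log then compresses the large dynamic range of audio energies.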

Continued →

UIST 2017 highlights

Picking up where I left off 3 years ago with this year's UIST highlight reel. As expected, the research creatively applied interesting principles, but many applications were adorably contrived. Also, I miss academia!

Continued →

Memento Mori

The association of sundials with time has inspired their designers over the centuries to display mottoes as part of the design. Often these cast the device in the role of memento mori, inviting the observer to reflect on the transience of the world and the inevitability of death. – Wikipedia

WatchKit screenshot

This rich tradition is now available on Apple Watch.

A respectful truce?

It's a fact that men greatly outnumber women in software engineering. As for why, there is a fundamental disagreement between social constructivists and evolutionary psychologists.

Constructivists say that it’s because of systematic oppression of women. Boys in the 80s grew up with computers, learned programming and made life a living hell for their female classmates in CS101. Planet Money has the scoop. Evolutionary psychologists say that it’s because men and women have evolved differences in interests, and they point to studies indicating that progressive countries have larger gender representation gaps because women are free to choose careers based on interest. I find it likely that both explanations contribute in part to the gender representation disparity. Compelling arguments can be made for and against both, and both are rooted in academic disciplines that are dangerously squishy. As a result, I don’t think we currently have a good scientific way of settling the question.

So what? Science, schmience! Followers of evo psych and of constructivism are at war. Each side continuously pisses the other off with inflammatory rhetoric. It is true that white males are historically privileged, but hammering “white male privilege” into their heads predictably puts white males on the defensive. Similarly, arguing that evo psych means women are less interested in, and therefore less capable of, software engineering will predictably annoy women, especially those rightfully proud of their software engineering prowess and tired of dealing with similar allegations for their whole career.

I've spent way too much time reading and thinking about this lately, and it saddens me to conclude that the wisest course of action is to avoid discussing this topic entirely (oops, too late). Scott Aaronson, Sarah Constantin and Stacey Jeffery propose a respectful truce between the two camps. Advocates of evolutionary psychology should:

do everything they can to foster diversity, including by creating environments that are welcoming for women, and by supporting affirmative action, women-only scholarships and conferences, and other diversity policies and also agree never to talk in public about possible cognitive-science explanations for gender disparities in which careers people choose, or overlapping bell curves, or anything else potentially inflammatory.

Meanwhile, social constructivists should:

avoid libelling [white men] as misogynist monsters, who must be scaring all the women away with their inherently gross, icky, creepy, discriminatory brogrammer maleness.


Filter playground

"You don't understand anything until you learn it more than one way." – Marvin Minsky

In my short Web Audio book, I covered the BiquadFilterNode, but didn't have any sense for how it worked. As I sat down to read Human and Machine Hearing, it became clear that I needed to catch up on some digital filtering fundamentals.

What follows is an introduction to digital filters via an explorable explanation I built to help myself better understand some DSP concepts. My approach was to present the concepts as visually and aurally as possible, maximizing opportunities to build intuition. I learned a lot in the process. Read on for the introduction, jump ahead to the Filter Playground, or watch the accompanying video.
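To make the idea of a digital filter concrete, here is a minimal TypeScript sketch of a biquad low-pass filter, the two-pole, two-zero filter family that BiquadFilterNode is named after. It assumes the low-pass coefficient formulas from Robert Bristow-Johnson's Audio EQ Cookbook and a direct form I implementation. The function names are hypothetical, and the sketch is not claimed to reproduce BiquadFilterNode's exact parameter handling. Each output sample is simply a weighted sum of the last few inputs and outputs.

```typescript
// Biquad coefficients, already divided by a0 so that a0 = 1.
interface Biquad {
  b0: number;
  b1: number;
  b2: number;
  a1: number;
  a2: number;
}

// Assumption: low-pass formulas from the Audio EQ Cookbook.
function lowpass(cutoffHz: number, q: number, sampleRate: number): Biquad {
  const w0 = (2 * Math.PI * cutoffHz) / sampleRate;
  const cosW0 = Math.cos(w0);
  const alpha = Math.sin(w0) / (2 * q);
  const a0 = 1 + alpha;
  return {
    b0: (1 - cosW0) / 2 / a0,
    b1: (1 - cosW0) / a0,
    b2: (1 - cosW0) / 2 / a0,
    a1: (-2 * cosW0) / a0,
    a2: (1 - alpha) / a0,
  };
}

// Direct form I difference equation:
// y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
function applyBiquad(c: Biquad, input: number[]): number[] {
  let x1 = 0, x2 = 0, y1 = 0, y2 = 0; // previous inputs and outputs, initially silent
  return input.map((x) => {
    const y = c.b0 * x + c.b1 * x1 + c.b2 * x2 - c.a1 * y1 - c.a2 * y2;
    x2 = x1;
    x1 = x;
    y2 = y1;
    y1 = y;
    return y;
  });
}

// Example: 1 kHz low-pass at 44.1 kHz; Q = 1/sqrt(2) gives a Butterworth response.
const coeffs = lowpass(1000, Math.SQRT1_2, 44100);
const dcOut = applyBiquad(coeffs, new Array<number>(2000).fill(1)); // constant input
const nyquistOut = applyBiquad(
  coeffs,
  Array.from({ length: 2000 }, (_, n) => (n % 2 === 0 ? 1 : -1)), // fastest possible oscillation
);
```

Once the initial transient dies out, the constant (0 Hz) input passes with unity gain, while the signal alternating at the Nyquist frequency is driven to zero, because this low-pass design places a double zero there.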

Continued →

The end of endings

Despite the obnoxious title, this a16z podcast was unusually insightful.

One big theme in the podcast can be summarized as the end of endings: a trend away from stories with a coherent, finite arc, towards something never-ending. Amorphous TV shows like Lost, as well as Hollywood's discovery that a coherent franchise like Star Wars can be milked for many more dollars, are examples of this. The stats back it up: sequels went from 10% of Hollywood's hits in the 90s to 50% today, as can be seen in the box office data. Looking at TV shows through the same lens, each show is a one-hour movie followed by tens of sequels. And successful franchises aim to immerse their fans (especially kids) in their universe with figurines, games, lunch boxes, and bed sheets.

The end of endings also shows up in user interfaces, and is epitomized by the infinite scroller. Millions of souls fixate on the next pretty picture, the next baby picture, the next outrage. Entertainment that never ends!

This is interesting in the context of attention. In the realm of meditation, the challenge is to focus on something mundane, like your breath. In the realm of entertainment, the challenge is to snap out of an immersive world carefully constructed to consume your attention for as long as possible.

Intelligence, aliens, and self-improvement

I recently enjoyed a podcast about intelligence, aliens and learning where Kevin Kelly was interviewed by Sam Harris. Here are some things I liked about it:

  • Kelly makes a great argument that with text we already have no way of knowing what is true. We have to rely on the source. This reduced my fears about fake audio and video news.

  • It's obvious that the world is changing quickly, which means lifelong learning is necessary. One thing I haven't systematically understood is how I learn best. How would you even figure that out?

  • Chimpanzees are amazing at short-term spatial memory, and on average do better than humans in these cognitive tasks.

  • Currently, the best chess players are human-computer hybrids. Harris claims that computers alone will eventually be better than these hybrids, but his claim that "the ape will just be adding noise" is unsubstantiated.

  • About midway, the two discuss the analogy between AI and aliens. I hadn't thought of this before either. But I wondered how apt that analogy is, given that we largely share an environment with whatever AIs we create. To the extent that the AIs truly are alien, I wonder how much of a unifying force they could be for humanity.

  • I strongly agreed with Kelly when he said that building machine systems enhances our self-understanding. When building VR rendering and head-tracking systems, you benefit from knowing some basics of human visual perception. Human psychology is critical for building user interfaces. You need an understanding of how human hearing works to build good lossy audio compression.

  • An interesting thought experiment that supersedes Asimov's laws of robotics: what if we built AGIs that were terribly worried about what people think? Then we would reduce the machine menace problem to merely the evils of humanity.

Convincing the inconvincible

I highly recommend two podcast episodes on the theme "how to argue", featuring Daryl Davis, the black friend of white supremacists. Mr. Davis is remarkably lucid in dealing with very difficult issues, and presents great tips for convincing the inconvincible.

  1. Gather your information. Understand their position as well as you do your own. Expect to hear things you will disagree with and keep your cool.

  2. Invite them. Have a conversation, not a debate. Debates get people's guards up. Instead, seek to understand them. People want to be heard and to speak their mind freely without retaliation.

  3. Look for commonality. You can find something in 5 minutes and then build on it. Look to other issues. Politics? As you find more commonality the differences matter less.

  4. Talking is not fighting. When the talking stops, the ground becomes fertile for violence (cf. Sam Harris). The longer you talk, the more commonalities you can find.

  5. Patience.

  6. Make a good first impression. Be respectful.

  7. Don't be condescending or insulting even when you hear things you don't like. Expect opinions that are ridiculous and facts that are wrong. Keep cool.

  8. Don't explain somebody else's movement. Let them explain it and address the points that they themselves define. Let them finish and don't cut them off even though you probably know a lot about their ideas already.

"While you are actively learning about someone else you are passively teaching them about yourself." – Daryl Davis

Climate metaquiz results

Last week I ran a Climate metaquiz, and 123 people responded. The point of a metaquiz is to test how well political groups know the other side; the questions on personal beliefs and knowledge about the climate are secondary. The small sample size and potential sampling biases are both important caveats to keep in mind here. All that said, Republicans outperformed Democrats on the factual part of the quiz, despite reporting low confidence. However, Democrats outperformed Republicans on the metaquiz part: Republicans tended to exaggerate the level of climate change-related handwringing among Democrats, as well as Democrats' eagerness to exaggerate the facts in the name of behavior change.

Continued →