Boris Smus

interaction engineering

Required reading for VR enthusiasts

Several books, movies, and articles have helped to shape my opinions on Virtual Reality. I was initially surprised to learn from some former colleagues that it is possible to view these works as aspirational rather than cautionary. I generally prefer the dystopian interpretation.

Balancing technological pessimism

Pessimists Archive is a podcast that chronicles pessimistic reactions to emerging technologies as they were becoming mainstream. Technology here is defined broadly, covering a wide range of topics: bikes, coffee, pinball machines, vaccines, recorded music. The podcast is very accessible, focused more on social and psychological issues and less on the tech itself.

Continued →

How rationalists can win

The belief that "rationalists should win" is widely held in the rationalist community. So: does being a good rationalist actually help you win? Certainly in some domains, like engineering and science, which focus on quantification, systematization, and prediction. There, having a hyper-rational mindset is clearly an advantage. As for winning at life, which I will take to mean greater success in survival, evolution, and human flourishing, I don't think rationality helps very much.

Continued →

Web-based voice command recognition

Last time we converted audio buffers into images. This time we'll take these images and train a neural network using deeplearn.js. The result is a browser-based demo that lets you speak a command ("yes" or "no"), and see the output of the classifier in real-time, like this:

Curious to play with it and see whether it recognizes yay or nay in addition to yes and no? Try it out live. You will quickly see that the performance is far from perfect. But that's ok with me: this example is intended to be a reasonable starting point for doing all sorts of audio recognition on the web. Now, let's dive into how this works.

Continued →

Audio features for web-based ML

One of the first problems presented to students of deep learning is to classify handwritten digits in the MNIST dataset. This was recently ported to the web thanks to deeplearn.js. The web version has distinct educational advantages over the relatively dry TensorFlow tutorial. You can immediately get a feeling for the model, and start building intuition for what works and what doesn't. Let's preserve this interactivity, but change domains to audio. This post sets the scene for the auditory equivalent of MNIST. Rather than recognize handwritten digits, we will focus on recognizing spoken commands. We'll do this by converting sounds like this:

Into images like this, called log-mel spectrograms, and in the next post, feed these images into the same types of models that do handwriting recognition so well:

Final log-mel spectrogram.

The audio feature extraction technique I discuss here is generic enough to work for all sorts of audio, not just human speech. The rest of the post explains how. If you don't care and just want to see the code, or play with some live demos, be my guest!
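As a rough taste of the idea, here is a minimal TypeScript sketch of the last two steps of a log-mel pipeline: pooling an FFT power spectrum through triangular mel filters, then taking the log. It is not the code from the post. The function names (`hzToMel`, `melFilterbank`, `logMelFrame`), the HTK-style mel formula, and the small offset added before the log are all assumptions chosen for illustration, and the power spectrum is assumed to have been computed elsewhere.

```typescript
// Hz -> mel, using the common HTK-style formula (an assumption, other variants exist).
function hzToMel(hz: number): number {
  return 2595 * (Math.log(1 + hz / 700) / Math.LN10);
}

// Inverse of hzToMel.
function melToHz(mel: number): number {
  return 700 * (Math.pow(10, mel / 2595) - 1);
}

// Build `numFilters` triangular filters over `numBins` FFT bins that span
// 0 Hz to the Nyquist frequency. Filter edges are evenly spaced in mel.
function melFilterbank(numFilters: number, numBins: number, sampleRate: number): number[][] {
  const nyquist = sampleRate / 2;
  const maxMel = hzToMel(nyquist);
  // numFilters + 2 edge points: each filter uses three consecutive points.
  const edgesHz: number[] = [];
  for (let i = 0; i < numFilters + 2; i++) {
    edgesHz.push(melToHz((maxMel * i) / (numFilters + 1)));
  }
  const filters: number[][] = [];
  for (let m = 1; m <= numFilters; m++) {
    const lo = edgesHz[m - 1];
    const mid = edgesHz[m];
    const hi = edgesHz[m + 1];
    const weights: number[] = [];
    for (let k = 0; k < numBins; k++) {
      const hz = (k * nyquist) / (numBins - 1);
      const rising = (hz - lo) / (mid - lo);
      const falling = (hi - hz) / (hi - mid);
      // Triangle: 0 outside [lo, hi], peaking at 1 when hz equals mid.
      weights.push(Math.max(0, Math.min(rising, falling)));
    }
    filters.push(weights);
  }
  return filters;
}

// Turn one frame's power spectrum into log-mel energies.
function logMelFrame(powerSpectrum: number[], filters: number[][]): number[] {
  return filters.map((weights) => {
    let energy = 0;
    for (let k = 0; k < weights.length; k++) {
      energy += weights[k] * powerSpectrum[k];
    }
    // Small offset (an arbitrary choice) avoids log(0) for silent bands.
    return Math.log(energy + 1e-6);
  });
}
```

Stacking the output of `logMelFrame` for successive frames, one column per frame, gives the kind of image described above.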

Continued →

UIST 2017 highlights

Picking up where I left off 3 years ago with this year's UIST highlight reel. As expected, the research creatively applied interesting principles, but many applications were adorably contrived. Also, I miss academia!

Continued →

Memento Mori

The association of sundials with time has inspired their designers over the centuries to display mottoes as part of the design. Often these cast the device in the role of memento mori, inviting the observer to reflect on the transience of the world and the inevitability of death. – Wikipedia

WatchKit screenshot

This rich tradition is now available on Apple Watch.

A respectful truce?

It's a fact that men greatly outnumber women in software engineering. As for why, there is a fundamental disagreement between social constructivists and evolutionary psychologists.

Continued →

Filter playground

"You don't understand anything until you learn it more than one way." – Marvin Minsky

In my short Web Audio book, I covered the BiquadFilterNode, but didn't have any sense for how it worked. As I sat down to read Human and Machine Hearing, it became clear that I needed to catch up on some digital filtering fundamentals.
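For a concrete picture of what a biquad filter computes, here is a minimal TypeScript sketch of a second-order low-pass filter. It assumes the widely used Audio EQ Cookbook coefficient formulas and a Direct Form I difference equation. The names `lowpassCoefficients` and `applyBiquad` are hypothetical, and the result is not guaranteed to match what BiquadFilterNode does internally (for one thing, its Q parameter is interpreted differently for low-pass filters).

```typescript
interface BiquadCoefficients {
  b0: number;
  b1: number;
  b2: number;
  a1: number;
  a2: number;
}

// Low-pass coefficients (Audio EQ Cookbook form), normalized so that a0 = 1.
function lowpassCoefficients(cutoffHz: number, sampleRate: number, q: number): BiquadCoefficients {
  const w0 = (2 * Math.PI * cutoffHz) / sampleRate;
  const cosW0 = Math.cos(w0);
  const alpha = Math.sin(w0) / (2 * q);
  const a0 = 1 + alpha;
  return {
    b0: (1 - cosW0) / 2 / a0,
    b1: (1 - cosW0) / a0,
    b2: (1 - cosW0) / 2 / a0,
    a1: (-2 * cosW0) / a0,
    a2: (1 - alpha) / a0,
  };
}

// Direct Form I:
// y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
function applyBiquad(input: number[], c: BiquadCoefficients): number[] {
  const output: number[] = [];
  let x1 = 0, x2 = 0; // previous two inputs
  let y1 = 0, y2 = 0; // previous two outputs
  for (const x0 of input) {
    const y0 = c.b0 * x0 + c.b1 * x1 + c.b2 * x2 - c.a1 * y1 - c.a2 * y2;
    output.push(y0);
    x2 = x1;
    x1 = x0;
    y2 = y1;
    y1 = y0;
  }
  return output;
}
```

A constant (0 Hz) input should pass through this filter essentially unchanged once the transient settles, while a signal alternating between +1 and -1 (the Nyquist frequency) should be attenuated toward zero.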

What follows is an introduction to digital filters via an explorable explanation I built to help myself better understand some DSP concepts. The approach I took was to try to present the concept as visually and aurally as possible, maximizing opportunities to build intuition. I learned a lot in the process. Read on for an introduction, jump ahead to the Filter Playground, or check out this video:

Continued →

The end of endings

Despite the obnoxious title, this a16z podcast was unusually insightful.

One big theme in the podcast can be summarized as an end of endings: a trend away from one story with a coherent, finite arc, towards something neverending. Amorphous TV shows like Lost, as well as Hollywood's discovery that a coherent franchise like Star Wars can be milked for many more dollars, are examples of this. The stats back it up: Hollywood moved from 10% of the hits of the 90s being sequels to 50% today, as can be seen in the box office data. Looking at TV shows through the same lens, each show is a one-hour-long movie, followed by tens of sequels. And successful franchises aim to immerse their fans (especially kids) in their universe with figurines, games, lunch boxes, bed sheets.

The end of endings also shows up in user interfaces, and is epitomized by the infinite scroller. Millions of souls fixate on the next pretty picture, the next baby picture, the next outrage. Entertainment that never ends!

This is interesting in the context of attention. In the realm of meditation, the challenge is to focus on something mundane, like your breath. In the realm of entertainment, the challenge is to snap out of an immersive world carefully constructed to consume your attention for as long as possible.