Boris Smus

interaction engineering

Balancing technological pessimism

Pessimists Archive is a podcast that chronicles pessimistic reactions to emerging technologies as they became mainstream. Technology here is defined broadly, covering everything from bikes, coffee, and pinball machines to vaccines and recorded music. The podcast is very accessible, focused more on social and psychological issues and less on the tech itself.

Continued →

How rationalists can win

The belief that "rationalists should win" is widely held in the rationalist community. So: does being a good rationalist actually help you win? Certainly in some domains, like engineering and science, which focus on quantification, systematization, and prediction. There, a hyper-rational mindset is clearly an advantage. As for winning at life, which I take to mean greater success in survival, evolution, and human flourishing, I don't think rationality helps very much.

Continued →

Web-based voice command recognition

Last time we converted audio buffers into images. This time we'll take those images and train a neural network using deeplearn.js. The result is a browser-based demo that lets you speak a command ("yes" or "no") and see the output of the classifier in real time, like this:

Curious to play with it, and see whether it recognizes "yay" or "nay" in addition to "yes" and "no"? Try it out live. You will quickly see that the performance is far from perfect, but that's OK with me: this example is intended to be a reasonable starting point for doing all sorts of audio recognition on the web. Now, let's dive into how this works.
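For context, here is a minimal sketch of the browser plumbing such a demo needs: capturing microphone audio and handing each buffer to a classifier. The `classifyFrame` function is a hypothetical placeholder for a trained model (e.g. one built with deeplearn.js), not the demo's actual code.

```typescript
// Sketch: stream microphone audio into a command classifier.
// `classifyFrame` is a hypothetical placeholder, not the demo's real code.
declare function classifyFrame(samples: Float32Array): void;

async function listen() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const context = new AudioContext();
  const source = context.createMediaStreamSource(stream);

  // ScriptProcessorNode delivers raw sample buffers as they arrive.
  const processor = context.createScriptProcessor(1024, 1, 1);
  processor.onaudioprocess = (event) => {
    const samples = event.inputBuffer.getChannelData(0);
    classifyFrame(samples);  // feature extraction + inference would happen here
  };
  source.connect(processor);
  processor.connect(context.destination);
}
```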

Continued →

Audio features for web-based ML

One of the first problems presented to students of deep learning is to classify handwritten digits in the MNIST dataset. This was recently ported to the web thanks to deeplearn.js. The web version has distinct educational advantages over the relatively dry TensorFlow tutorial. You can immediately get a feeling for the model, and start building intuition for what works and what doesn't. Let's preserve this interactivity, but change domains to audio. This post sets the scene for the auditory equivalent of MNIST. Rather than recognize handwritten digits, we will focus on recognizing spoken commands. We'll do this by converting sounds like this:

Into images like this, called log-mel spectrograms, and in the next post, feed these images into the same types of models that do handwriting recognition so well:

Final log-mel spectrogram.

The audio feature extraction technique I discuss here is generic enough to work for all sorts of audio, not just human speech. The rest of the post explains how. If you don't care and just want to see the code, or play with some live demos, be my guest!
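To make the sound-to-image step concrete, here's a rough sketch of the mel-scale math for a single frame, assuming the FFT magnitude spectrum has already been computed (for example with an AnalyserNode). The band count and constants are illustrative defaults, not necessarily the values used in the post.

```typescript
// Sketch of log-mel feature extraction for one frame of audio.
// `spectrum` is assumed to be the FFT magnitude spectrum of a single frame.
const hzToMel = (hz: number) => 1127 * Math.log(1 + hz / 700);
const melToHz = (mel: number) => 700 * (Math.exp(mel / 1127) - 1);

function logMelFrame(spectrum: Float32Array, sampleRate: number,
                     melBands = 40): Float32Array {
  const nyquist = sampleRate / 2;
  const out = new Float32Array(melBands);

  // Band edges spaced evenly on the mel scale (roughly logarithmic in Hz).
  const edges: number[] = [];
  for (let i = 0; i < melBands + 2; i++) {
    edges.push(melToHz((hzToMel(nyquist) * i) / (melBands + 1)));
  }

  for (let band = 0; band < melBands; band++) {
    const [lo, center, hi] = [edges[band], edges[band + 1], edges[band + 2]];
    let energy = 0;
    for (let bin = 0; bin < spectrum.length; bin++) {
      const freq = (bin / spectrum.length) * nyquist;  // approximate bin -> Hz
      // Triangular weighting: ramps up from lo to center, down from center to hi.
      let weight = 0;
      if (freq > lo && freq <= center) weight = (freq - lo) / (center - lo);
      else if (freq > center && freq < hi) weight = (hi - freq) / (hi - center);
      energy += weight * spectrum[bin] * spectrum[bin];
    }
    out[band] = Math.log(energy + 1e-6);  // log compression, avoiding log(0)
  }
  return out;
}
```

Stacking these per-frame vectors over time yields the log-mel spectrogram image shown above.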

Continued →

UIST 2017 highlights

Picking up where I left off 3 years ago, here is this year's UIST highlight reel. As expected, the research creatively applied interesting principles, but many applications were adorably contrived. Also, I miss academia!

Continued →

Memento Mori

The association of sundials with time has inspired their designers over the centuries to display mottoes as part of the design. Often these cast the device in the role of memento mori, inviting the observer to reflect on the transience of the world and the inevitability of death. – Wikipedia

WatchKit screenshot

This rich tradition is now available on Apple Watch.

A respectful truce?

It's a fact that men greatly outnumber women in software engineering. As for why, there is a fundamental disagreement between social constructivists and evolutionary psychologists.

Continued →

Filter playground

"You don't understand anything until you learn it more than one way." – Marvin Minsky

In my short Web Audio book, I covered the BiquadFilterNode but had no real sense of how it worked. As I sat down to read Human and Machine Hearing, it became clear that I needed to catch up on some digital filtering fundamentals.

What follows is an introduction to digital filters via an explorable explanation I built to help myself better understand some DSP concepts. The approach I took was to present the concepts as visually and aurally as possible, maximizing opportunities to build intuition. I learned a lot in the process. Read on for an introduction, jump ahead to the Filter Playground, or check out this video:
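As a taste of the underlying API, here's a minimal sketch of inspecting a BiquadFilterNode's magnitude and phase response using the standard Web Audio getFrequencyResponse call; the filter type, cutoff, and Q values are arbitrary examples.

```typescript
// Sketch: query a BiquadFilterNode's frequency response.
// Parameter values are arbitrary examples.
const context = new AudioContext();
const filter = context.createBiquadFilter();
filter.type = 'lowpass';          // second-order low-pass biquad
filter.frequency.value = 1000;    // cutoff frequency in Hz
filter.Q.value = 5;               // resonance around the cutoff

// Evaluate the response at a few frequencies of interest.
const frequencies = new Float32Array([100, 500, 1000, 2000, 8000]);
const magnitudes = new Float32Array(frequencies.length);
const phases = new Float32Array(frequencies.length);
filter.getFrequencyResponse(frequencies, magnitudes, phases);

frequencies.forEach((f, i) => {
  const db = 20 * Math.log10(magnitudes[i]);  // convert linear gain to decibels
  console.log(`${f} Hz: ${db.toFixed(1)} dB, phase ${phases[i].toFixed(2)} rad`);
});
```

Sweeping the frequency array across the audible range is essentially how a response curve like the one in the playground can be plotted.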

Continued →

The end of endings

Despite the obnoxious title, this a16z podcast was unusually insightful.

One big theme of the podcast can be summarized as the end of endings: a trend away from one story with a coherent, finite arc towards something never-ending. Amorphous TV shows like Lost, as well as Hollywood's discovery that a coherent franchise like Star Wars can be milked for many more dollars, are examples of this. The stats back it up: as the box office data shows, sequels went from about 10% of Hollywood's hits in the 90s to 50% today. Looking at TV shows through the same lens, each show is a one-hour movie followed by dozens of sequels. And successful franchises aim to immerse their fans (especially kids) in their universe with figurines, games, lunch boxes, and bed sheets.

The end of endings also shows up in user interfaces, and is epitomized by the infinite scroller. Millions of souls fixate on the next pretty picture, the next baby picture, the next outrage. Entertainment that never ends!

This is interesting in the context of attention. In the realm of meditation, the challenge is to focus on something mundane, like your breath. In the realm of entertainment, the challenge is to snap out of an immersive world carefully constructed to consume your attention for as long as possible.

Intelligence, aliens, and self-improvement

I recently enjoyed a podcast about intelligence, aliens and learning where Kevin Kelly was interviewed by Sam Harris. Here are some things I liked about it:

  • Kelly makes a great argument that with text we already have no way of knowing what is true; we have to rely on the source. This eases my fears about audio and video fake news.

  • It's obvious that the world is changing quickly, which means lifelong learning is necessary. One thing I haven't systematically understood is how I learn best. How would you even figure that out?

  • Chimpanzees are amazing at short-term spatial memory, and on average do better than humans at these cognitive tasks.

  • Currently, the best chess players are human-computer hybrids. Harris claims that computers alone will eventually be better than these hybrids, but his claim that "the ape will just be adding noise" is unsubstantiated.

  • About midway through, the two discuss the analogy between AI and aliens. I hadn't thought of this before. But I wondered how salient the analogy is, given that we have a largely shared environment with whatever AIs we create. To the extent that the AIs truly are alien, I wonder how much of a unifying force they could be for humanity.

  • I strongly agree with Kelly when he says that building machine systems enhances our self-understanding. When building VR rendering and head-tracking systems, you benefit from knowing some basics of human visual perception. Human psychology is critical for building user interfaces. You need an understanding of how human hearing works to build good lossy audio compression.

  • An interesting thought experiment that supersedes Asimov's laws of robotics: what if we built AGIs that were terribly worried about what people think? Then we reduce the machine-menace problem to merely the evils of humanity.