Humanity’s Evolving Relationship with Computers

By Joshua Newnham, Lead Design Technologist — Method London

--

In this post we discuss the evolution of the relationship we have with our computational companions, and look to better understand the application of emotional sensing technologies.

Having an interest in the intersection of design and artificial intelligence exposes you to a lot of interesting concepts and tools that seem appealing and relevant at first glance, but it is only when you start working with them that you begin asking deeper and more meaningful questions about their application and value to the end user. This was the case with emotional sensing technologies, like those offered by the popular emotion recognition service provider Affectiva, which provides tools for recognizing a user's emotion from an image of their face.

As a technologist, you are first attracted by the how, and it is only after you become comfortable with the intricate details of its inner workings that you start questioning the why. It was only after learning about and building systems that could satisfactorily classify emotion, given some text or an image of a face, that I started questioning how these could be applied. It is only recently that I have realized their significance and applicability; this realization is the starting point of this post.

[Image: Lutzmann Motorwagen, via https://commons.wikimedia.org/wiki/File:Lutzmann_Motorwagen.jpg]

The need for a new lens

No introduction of new technology is complete without a mention of the "horseless carriage", a term used by Don Norman when describing the adoption and evolution of design for new technologies. It highlights that we, designers and technologists, normally project our existing mental models onto new technologies, and it is only after many iterations that we start creating new mental models better suited to the technology. An obvious story that illustrates this is how television shows were initially designed and broadcast; they mostly disregarded the element that made television richer than radio, namely the ability to use images to portray a story. Instead, early television shows amounted to little more than radio shows with images of the presenters.

Despite my awareness of and interest in Affective Computing, it was hard to envision the use of emotion recognition beyond analytics and reporting. Of course, conceptually, I would often talk about the computer being able to recognize and respond to the emotion of the user, but I didn't dig much deeper, as I couldn't see how our existing applications, such as Microsoft Word, could make effective use of it; that is, until recently. But to better understand and appreciate the significance of this revelation, it is important to take a step back and review what computers were, how they have been evolving, and their likely trajectory (with respect to their application and how we interact with them).

A brief history of the computer

The concept of the computer was devised in the 19th century by an English mathematics professor named Charles Babbage; his design was appropriately named the Analytical Engine, highlighting its purpose of performing and outputting mathematical calculations. The concept was finally realized around 1943 and found application in trajectory calculations for military purposes. Users tended to be highly trained professionals who interacted with the computer using punch cards detailing explicit instructions for it to follow.

[Image: via https://apple2history.org/history/ah16/]

Next came industrial computers in the form of mainframes. Produced by the likes of International Business Machines (IBM), they resembled their predecessors in many ways and still required highly trained users, but replaced physical punch cards with a Command Line Interface (CLI) for submitting instructions. During this era, credit is due to a small group of individuals for advancing Human Computer Interaction (HCI): Steve Russell, who saw computers as more than batch calculators and envisioned (and created) interactive programs that led to the first era of computer games, such as Spacewar!; John McCarthy, considered the father of Artificial Intelligence (AI), who envisioned the potential for computers to perform the tasks of humans; and Doug Engelbart who, paradoxically, envisioned computers that augmented us rather than replaced us, and who pioneered many of the direct manipulation concepts we still use today, including the mouse and the Graphical User Interface (GUI).

In the late 70s we saw the rise of Personal Computers (PCs); despite their name they were far from personal, but they finally became affordable and applicable to a large enough population to be considered mainstream. The killer application at the time was the spreadsheet, a sophisticated calculator for office productivity. Because of their availability and adoption, usability soon became very important, and issuing commands via a terminal was a barrier for most users.

[Image: Apple Macintosh desktop, via https://en.wikipedia.org/wiki/File:Apple_Macintosh_Desktop.png]

It wasn't until the early 80s, with the introduction of the GUI, that interacting with computers became (somewhat) democratized. The GUI used many metaphors borrowed from the real world; this, along with direct manipulation and rapid feedback, made computers accessible to an audience beyond computer experts. During this time we saw the rise of the web and an extended range of applications and use-cases for computers, going from pure analytical tools to being used for tasks like communication, entertainment, and creative work. This adoption led to an acceleration in the digitization of our physical world; information, entertainment, and our relationships became bytes.

The next significant milestone that influenced how we used computers came around the mid 80s with the proliferation of the internet; email turned computers into communication devices. People were no longer just interacting with computers; they were interacting with other people through computers. This paradigm, communicating and collaborating via a computer, is now referred to as social computing.

[Image: via https://www.pexels.com/photo/iphone-6-apple-hand-time-9041/]

Then came the iPhone (and then Android); computers finally became truly personal. Touch further reduced the friction of use, and the addition of sensors, connectivity, and a further increase in digitization strengthened their relevance and convenience for the real world and 'real people'. But up until recently they (computers) still required us to explicitly instruct them, and they communicated through static interfaces. Despite having increased the level of abstraction from the CLI, the core interaction model still remained the same; this is now changing.

We are now entering an era where we are seeing the convergence of Artificial Intelligence (AI) and Intelligence Augmentation (IA), whereby we have systems that use 'intelligence' to better understand us and our intent (through voice, image, text, or gesture) and that are capable of performing tasks semi-autonomously and, sometimes, proactively.

[Image: Jibo, via https://www.jibo.com/]

To further illustrate the evolution of how we interact with computers, I will borrow a plot from Mark Billinghurst, a computer interface researcher, that highlights our progression towards natural user interfaces over time.

This plot not only highlights the diminishing friction between us and computers (natural user interfaces) but also how our interactions are shifting from explicit to implicit, i.e. more and more of our systems are becoming anticipatory.

Another notable trend is in the role and function of applications, which are shifting from dealing with clean, discrete instructions to dealing with high degrees of ambiguity; early applications were used for calculating missile trajectories, while modern applications recommend songs, movies, and partners, and organize your meetings. The final trend I want to highlight is how the form of the computer is changing, from a keyboard and screen to many other forms, from portable slates we carry around in our pockets to intelligent speakers that sit next to our beds.

The intention of the above is not to provide a comprehensive (or accurate) history lesson in computing but rather to highlight how the function and form of computers, and our relationship with them, have been evolving over time, and their likely trajectory: shifting from a purely functional tool to a close companion. Just as the GUI borrowed heavily from the physical world to make interacting with computers more familiar and natural, so too will these new systems need to recognize, react to, and portray emotion; we will find it frustrating to talk to something deemed intelligent if it is unable to recognize and respond to our emotional state. Being able to exhibit emotion also provides another means of communicating the current state of the system, helping the user build a more accurate and helpful mental model of what they are interacting with; portraying confusion, for example, could help the user understand that the system needs assistance.

In short: instead of emotion being used purely for analytics and reporting, emotional intelligence makes a lot of sense when you're talking with a Virtual Personal Assistant (VPA), a digital avatar, or a physically embodied computer such as a robot; essentially any time you're dealing with a computer that can be interacted with naturally, has some autonomy, deals with ambiguity and uncertainty, knows you and your preferences, and requires a level of trust. Sound familiar? These traits have typically been confined to people, but now our computational companions have acquired them too.

Let’s briefly look at a couple of use-cases where emotional intelligence makes sense and how it can be applied.

One example that illustrates this shift in computing well is DragonBot, a research project from the Social Robotics Group at MIT exploring intelligent tutoring systems. DragonBot uses emotional awareness to adapt to the student; one of its applications, for example, is a reading game that adjusts the difficulty of the task (the words, in this case) based on the user's ability as inferred from their recognized emotion.

Conversational agents (chatbots) are an obvious opportunity for using emotion recognition. Currently chatbots perform what is known as Natural Language Understanding (NLU) to determine a response; this response typically depends on the given context and the inferred intent, but it won't be long (and some systems, such as Emotibot, already exist) before it becomes standard to also use the recognized emotion when determining the response to the user, adapting not only the language but also the tone of the reply. This can not only increase the effectiveness of communication but also gives us the opportunity to avoid creating undesirable behaviors in how we communicate with one another. We often joke in the studio about how voice assistants, such as Alexa, are creating behaviors in children where they demand things rather than ask for them: "Alexa! Tell me the time!"
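
To make this concrete, here is a deliberately simplified Python sketch, not how any particular assistant actually works, of a response policy keyed on both the inferred intent and the recognized emotion; all of the intents, emotions, and replies are invented for illustration.

# Hypothetical sketch: choose a reply based on (intent, emotion).
RESPONSES = {
    ("ask_time", "neutral"): "It's 3 o'clock.",
    ("ask_time", "angry"): "It's 3 o'clock. Sorry to keep you waiting.",
    ("ask_time", "sad"): "It's 3 o'clock. Is there anything else I can help with?",
}

def respond(intent, emotion):
    # Fall back to the neutral variant, then to a generic reply, when no
    # emotion-specific response exists.
    return RESPONSES.get(
        (intent, emotion),
        RESPONSES.get((intent, "neutral"), "Sorry, I didn't catch that."),
    )

print(respond("ask_time", "angry"))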

As conversational interfaces become more pervasive, so too will the need for developing effective ways of recognizing and adapting to the user's emotion, especially in domains such as medical assistance (Ada) and mental health (Woebot).

Generally, emotion recognition can be used either to automatically increase engagement or to automatically adapt to its user(s); Disney Research provides many more examples of where emotion recognition will play a role in adapting content, from their exploration of interactive preschool television programming to their interactive narrative authoring tool, and beyond; I encourage you to spend some time exploring.

As mentioned above, the catalyst for this exploration stemmed from my initial curiosity about how to recognize emotion, which itself stemmed from an initiative here at Method called FINE.

FINE is an ecosystem designed to support the mental health of young children. Emotion is very much at the heart of it, for both input and output. Through the camera and keyboard, we monitor and infer the emotional state of the user(s), and using this data we then present the aggregate mood through a shared device. This encourages communication as well as offering an empathetic companion in the form of a virtual avatar taught empathy through crowd-sourced intelligence.
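
As a purely hypothetical illustration of the aggregation idea (not the actual FINE implementation), the sketch below averages per-observation emotion distributions from the camera and keyboard into a single shared mood; the emotion set and numbers are made up.

import numpy as np

EMOTIONS = ["angry", "happy", "sad", "neutral"]

# Each row is a probability distribution over EMOTIONS from one observation
# (a camera frame or a typed message), for any member of the household.
observations = np.array([
    [0.10, 0.60, 0.10, 0.20],  # camera, user A
    [0.05, 0.70, 0.05, 0.20],  # keyboard, user A
    [0.30, 0.20, 0.30, 0.20],  # camera, user B
])

aggregate = observations.mean(axis=0)  # the shared, household-level mood
print("Shared mood:", EMOTIONS[int(aggregate.argmax())])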

The application of emotion recognition is very domain specific, but I hope I have presented a strong enough argument above for its opportunity and likelihood of adoption in the coming years. Recognition, on the other hand, is universal, and therefore I will spend the rest of this post briefly introducing and summarizing the approaches we took for FINE to infer the emotion of the user, using both an image of their face and text they have written.

Recognizing emotion from our facial expressions

A quick search on Google about what percentage of communication comes through body language quickly highlights that most communication is nonverbal (with body language accounting for 55% of the overall message, tone for 38%, and words for only 7%). So it should come as no surprise that a lot can be inferred simply by looking at someone's face; this is the premise for being able to infer someone's emotion simply by examining their facial expression. The task then becomes one of classifying facial expressions to determine emotion, and luckily this has been well studied and data has been made available.

The dataset used to train our classifier comes from a Kaggle competition; it consists of over 20,000 grayscale images of faces that have been manually labeled as one of seven emotions: angry, disgust, fear, happy, sad, surprise, or neutral. As with any Machine Learning (ML) project, our first task is to build intuition around the data and come up with some theoretical hypotheses of how we might go about performing classification. Below are some examples of the faces from our dataset along with their associated labels.
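
For those wanting to follow along before the Notebook is shared, a minimal loading sketch might look like the following; it assumes the Kaggle-style CSV layout of an integer emotion label plus a space-separated string of 48x48 grayscale pixel values, and the file and column names are assumptions to adjust for your own copy of the data.

import numpy as np
import pandas as pd

# Label ordering as commonly documented for the Kaggle data; worth
# double-checking against your own copy.
LABELS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

df = pd.read_csv("fer2013.csv")  # file name assumed
X = np.stack(df["pixels"].apply(
    lambda p: np.asarray(p.split(), dtype=np.float32).reshape(48, 48)
))
X = X[..., np.newaxis] / 255.0   # normalise to [0, 1] and add a channel axis
y = df["emotion"].values         # integer index into LABELS
print(X.shape, y.shape)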

Our assumption is that there is some common pattern between expression and emotion; one way of exploring and validating this is through visualization. To visualize it, we can take the average face for each emotion; below we show what this looks like for the emotions angry, happy, and surprised.
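
For the curious, composites like these can be produced simply by averaging all images that share a label. The sketch below assumes the X and y arrays from the loading snippet above, and the label indices follow the LABELS ordering there, so check them against the dataset's own documentation.

import matplotlib.pyplot as plt

def average_face(X, y, label_index):
    # Mean over all images carrying the given label, dropping the channel axis.
    return X[y == label_index].mean(axis=0).squeeze()

for name, idx in [("angry", 0), ("happy", 3), ("surprise", 5)]:
    plt.figure()
    plt.title(name)
    plt.imshow(average_face(X, y, idx), cmap="gray")
plt.show()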

We can clearly see that there are distinct expressions for each of these emotions; our next task is to learn these patterns. For this experiment we used a Convolutional Neural Network (or ConvNet) to learn them (we forgo the details here but will be sharing the Notebook for those interested in the technical specifics). After 15 epochs of training we achieved a validation accuracy near 60% (not bad given the baseline would be around 14%); the results of training are shown below.
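
As a rough indication of the kind of ConvNet involved (the architecture below is illustrative rather than the exact network from our Notebook), a small Keras model for 48x48 grayscale inputs and seven emotion classes might look like this.

from tensorflow.keras import layers, models

# Illustrative architecture only: a small ConvNet over 48x48 grayscale faces
# with seven emotion classes.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(48, 48, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(7, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X, y, validation_split=0.2, epochs=15, batch_size=64)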

Recognizing emotion from text

We saw before that text (the words we use) only accounts for 7% of the overall message; this, and the fact that language is inherently ambiguous, makes it a more difficult source, but it is still valuable and something that can easily be monitored passively. For this prototype we trained a Recurrent Neural Network (once again, we will skip the details here but will be sharing the Notebook for those interested in them) and ported the model to Core ML, Apple's ML framework. Accompanying this was a custom iOS keyboard that passively monitored what the user typed and used the model to determine their current emotional state.
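
Again as an illustration rather than the exact model from our Notebook, a small Keras sketch of such a Recurrent Neural Network, with a note on the Core ML conversion step, might look like this; the vocabulary size, sequence length, and layer dimensions are assumptions.

from tensorflow.keras import layers, models

# Assumed hyperparameters, for illustration only.
VOCAB_SIZE, SEQ_LEN, NUM_EMOTIONS = 20000, 40, 13

text_model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128, input_length=SEQ_LEN),
    layers.LSTM(64),
    layers.Dense(NUM_EMOTIONS, activation="softmax"),
])
text_model.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])

# Conversion to Core ML is handled by Apple's coremltools package; the exact
# call depends on your coremltools and Keras versions, roughly:
# import coremltools as ct
# mlmodel = ct.convert(text_model)
# mlmodel.save("EmotionFromText.mlmodel")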

Data for text was more difficult to come across; although there were a few datasets from credible sources, none contained a substantial number of examples for training a Deep Neural Network. Herein lies an important point: labeled data is scarce, and acquiring it can be expensive. Various datasets were tried before we finally settled on one made available by CrowdFlower, consisting of around 40,000 tweets that have each been labeled with one of 13 emotions (such as happiness, sadness, and anger). One issue with this dataset was the imbalance in examples for each emotion; the plot below shows the distribution. Despite this, our goal was feasibility and application rather than accuracy, so we continued with it.
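
Inspecting the imbalance (and one common mitigation, not necessarily the one our Notebook uses) takes only a few lines of pandas; the file and column names below follow the copy of the CrowdFlower data we used and may differ in other distributions.

import pandas as pd

tweets = pd.read_csv("text_emotion.csv")       # file name assumed
counts = tweets["sentiment"].value_counts()    # column name assumed
print(counts)                                  # shows the imbalance across labels

# A common mitigation: weight each class inversely to its frequency during training.
class_weights = {label: len(tweets) / (len(counts) * n)
                 for label, n in counts.items()}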

Despite the significant imbalance and the limited number of training examples, we were still able to obtain a validation accuracy of around 35% after 12 epochs.

Below, the classification is shown being performed on the device (albeit in the simulator in this instance).

Here we have only explored the most obvious sources for recognizing emotion; others include tone of voice, behavior (or model-based signals), and pose. The important thing to take away is the trend away from explicit towards implicit interactions, and how emotion will be a valuable input in determining how your system engages with the user.

From Human Computer Interaction (HCI) to Human Computer Relationships (HCR)

We conclude this post by, again, emphasizing the evolution of HCI, and how our relationship with computers is becoming just as important as how we interact with them.

The original focus of HCI was the concept of usability. Where the initial definition of usability was solely focused on simplicity, i.e. "easy to learn, easy to use", it has continuously evolved alongside advancements in technology. It now subsumes qualities such as fun, well-being, collective efficacy, aesthetic tension, enhanced creativity, flow, support for human development, and others.

It has moved beyond the individual user sitting at their desktop, and will continue to move, driven by the new frontiers made possible by technology. This dependency on technology means we must continually investigate, develop, and harness new possibilities for enhancing human activity and experience. These technologies now offer the opportunity to recognize the emotion of the user; what will you do with it?

--

Method is a global strategic design and engineering consultancy. We use design and technology to solve big challenges for businesses, people, and the planet.