Autism spectrum disorder (ASD) is a neurodevelopmental condition characterized by social interaction and communication deficits and the presence of restricted, repetitive patterns of behavior (1). Children and adults with ASD often have difficulty in responding to social overtures, recognizing the emotional states of others from visual or auditory cues, and understanding the importance of gaze as a social cue (2). Therapies are diverse, but are typically time-, resource-, and labor-intensive, and can put substantial strain on families and caregivers (3).
Technology-based interventions for ASD, and robotics in particular, have been seen as a potential approach for augmenting the efforts of families and clinicians to provide on-demand, personalized social skills training (4). The robots envisioned by these efforts are part of a new field called socially assistive robotics, which aims to construct systems that support social and cognitive growth by using social rather than physical means (5–7). These robots share characteristics of educational robots, which attempt to convey information typically via a tutor-student relationship (8), and rehabilitation robots, which provide structured physical therapy for deficits such as stroke and paralysis (9).
Exploratory studies from dozens of research groups have shown that many individuals with ASD enjoy interacting with robots and, in many cases, even demonstrate more appropriate social behaviors with robots than they do with peers or caregivers (10, 11). These initial exploratory studies focused on short interactions, spanning tens of minutes or less, under controlled laboratory or clinical conditions, often involving sample sizes of five children or fewer, and exclusively on robot-directed behavior (7). Although these studies generated considerable excitement, they held little clinical value. Results tended to fade with repeated exposures and may have been the result of novelty; appropriate control conditions were rarely considered; and experiments failed to demonstrate learning that generalized to human-directed actions (12). A few studies did examine longer-term interactions (13) or demonstrated improved adult-directed social behavior (14), but none was able to demonstrate clinically meaningful skill acquisition that generalized beyond the specific robot encounter.
We report here a demonstration of directly assessed improvements in social skills in children with ASD after an in-home, 1-month intervention in which daily social skills games were conducted by an autonomous, socially assistive robot (Fig. 1). This study differs from previous work in this domain in four important aspects. First, this study used a fully autonomous robot system that operates for a 1-month deployment duration with no adjustments made by clinical or research staff. Many socially assistive robots still operate under teleoperative control, because autonomous operation for this duration is a substantial challenge in the robotics community even when static program requirements are used throughout the deployment (6). Second, unlike previous work where predefined protocols are followed explicitly (15), the system used here must adapt to the strengths and weaknesses of the individual child by changing the difficulty of individual tasks based on the child’s preferences and performance. Because individuals with ASD have substantial individual differences in the type and severity of their social skill deficits, the need to adapt to an individual child is essential to enabling a positive learning outcome. Further, the interaction between the need for autonomy and the need for adaptation creates additional technical challenges. Third, this study provided therapy directly in homes with a fully autonomous robot. Whereas clinical and laboratory spaces represent known environmental conditions that can be controlled or explicitly planned for, the unconstrained home environment requires more complex sensing and behavioral routines to deal with greater variation in environmental conditions. Last, this study focused primarily on demonstrations of clinically meaningful measures of performance using standard evaluation metrics that are conducted by an independent assessor away from the robot. 
This represents a challenging evaluation standard, because a child must not only learn a skill while practicing with the robot but also be capable of generalizing that skill to interactions with an adult in an environment that differs from the practice games used by the robot.
Our system consists of a social robot, touch screen monitor, and two RGB cameras. The system supports triadic interactions between the robot, the child, and the caregiver. Software running on the perception computer uses an elevated camera to track both the child’s and caregiver’s attentional foci, whereas the other camera records the intervention session (Fig. 2). The main computer controls the flow of the intervention and the robot’s behavior to ensure presentation of coherent, meaningful intervention.
Our robot-assisted intervention included a 30-min session every day for 30 days and involved triadic interactions among the social robot, the child, and the caregiver, providing opportunities for the child to interact and share experiences with the caregiver (Fig. 2). The robot modeled social gaze behaviors, such as making eye contact (Fig. 3) and sharing attention throughout the sessions, and provided feedback to and guided the participants in six interactive games. The six games targeted different social skills, including social and emotional understanding, perspective-taking, and ordering and sequencing (Fig. 4). Each session began with the robot telling a daily story to engage the participants. The session continued with three games, which varied from day to day, and concluded with a caregiver survey, where the caregivers rated their observations of the child’s social communication skills.
Our robot system was designed to engage and facilitate interactions between the child and the caregiver, therefore providing opportunities for the child to practice social skills in a fun, natural way.
The robot models appropriate social gaze behavior by demonstrating context-contingent gaze and facilitates mutual gaze and experience sharing between the child and the caregiver. When the child is engaged with the robot (A), the robot directs the child’s attention to relevant task content on the screen (B). As the child’s attention shifts to the robot-directed focus on the screen, the robot then attempts to redirect gaze to the caregiver (C) in the hope of redirecting the child’s visual attention to the caregiver (D). (These demonstration images were recreated in the laboratory to show both robot and child behavior because this perspective was not recorded by the deployed system.)
A set of interactive games was developed to allow children with ASD to practice social skills through play. The games were designed to support interactions between the caregiver and the child and between the robot and the child. The games targeted three social skills, including social and emotional understanding (A) (Story), perspective-taking (B) (Rocket), and ordering and sequencing (C) (Train).
Fourteen families with a child with ASD enrolled in this study. Two families withdrew, one due to unrelated health problems of a caregiver and one due to technical difficulties with the robot installation. Among the 12 families who finished the study, five of the children with ASD were females and seven were males. Caregivers reported that all 12 children identified racially as white and 4 of the 12 as Hispanic or Latino. These participants’ ages ranged from 6 to 12 years old [mean (M) = 9.02, SD = 1.41]. All had nonverbal intelligence quotient scores of ≥70 as determined by the Differential Ability Scales (DAS; M = 94.17, SD = 20.06) (16). Diagnosis of ASD was based on standard-in-field clinical best-estimate (CBE) diagnosis by licensed clinical psychologists and/or speech-language pathologists with extensive experience in autism diagnosis. Measures used in the diagnostic process included the Autism Diagnostic Interview–Revised (ADI-R) (17) caregiver interview and the Autism Diagnostic Observation Schedule (ADOS) (18) semistructured play observation. Scores on the ADI-R and the ADOS reflect the presence of autism symptoms, with higher scores reflecting greater autism severity. The ADI-R is broken down into four domains: reciprocal social interactions (M = 17.64, SD = 6.98; cutoff for ASD, 10); communication (M = 16.36, SD = 4.74; cutoff for ASD, 8); restricted, repetitive, and stereotyped behaviors (M = 6.00, SD = 1.41; cutoff for ASD, 3); and history of early abnormal development (M = 3.44, SD = 0.73; cutoff for ASD, 1). The ADOS yields outputs including a calibrated severity score [M = 7.08, SD = 2.02; cutoff for ASD (scale from 1 to 10), 4]. All participants, in addition to receiving a CBE of ASD, scored above the ASD cutoff on either the ADOS or the ADI-R.
All participants were recruited from a large database of children with ASD who have either participated in previous research studies with our laboratory or expressed interest in participation. Eligible families were contacted via email to inquire about their interest in participating. Given the scope of the project, the first 12 eligible families were enrolled. Inclusionary criteria were (i) age between 4 and 12 years old, (ii) good medical health, (iii) cooperation with testing, (iv) English being a language spoken in the family, and (v) a diagnosis of ASD that met the characterization cutoffs described above. Exclusionary criteria were (i) a fragile health status and (ii) suspected or diagnosed hearing loss or visual impairment or diagnosed neurological abnormality significantly affecting visual or auditory acuity.
All children in the study were enrolled in school programming full time and received intensive special education services as consistent with the state standards for educating children with ASD. Because the scope and form of these services and therapies varied substantially across participants based not only on their individual needs but also on family preferences and local resource availability, we used a single-subject withdrawal design (ABA) that allowed each child to serve as their own control (see Materials and Methods for details). Caregivers were instructed to maintain consistent intervention services during their participation in the study.
Engagement and skills performance
A total of 127 hours of data was collected from the interaction between the 12 children, the robot, and their caregivers. These data included video and audio data, head orientation of both child and caregiver, interaction logs containing the robot utterances and actions, game logs for the tablet-based games, and caregiver survey responses. Because our primary study design was focused on showing the efficacy of this intervention, we focus in this paper on the analysis of child social performance as measured by game performance, caregiver reports, and clinical measures.
The children combined initiated a total of 653 games with the robot, which resulted in 540 complete games for analysis. (Games that were shortened because of the end of time in the session were not considered for analysis.) On average, each child performed 23.25 sessions with the robot across the month, and each session lasted, on average, for 27 min and 42 s. After a month of interacting with the robot on a daily basis, the robot was able to maintain engagement with the child during the interactions: Children played with the robot for an average of 27 min during the first five sessions and an average of 25 min during the last five sessions.
The robot adapted the difficulty of each individual game based on the child’s history of performance in each skill set. On the emotion-understanding game “Story,” 86% of children reached the most difficult level of the game by the last session. On the perspective-taking games, 58 and 100% of children reached the highest level on “Rocket” and “House,” respectively. On the sequencing and ordering game “Train,” 67% of the children reached the highest level. The “Spaceship” and “Traveler” games used only a single difficulty level and were excluded from this analysis.
Binomial generalized linear mixed models (Fig. 5) were used to model the level attained by children as a proportion of the maximum possible level as a function of the specific game and session number. Game and session number were included as both fixed and random effects. Likelihood ratio tests on the resultant model indicated significant main effects of game, session, and their interaction (all P < 0.001). In terms of overall performance (i.e., intercept) and gains over sessions (i.e., slope), the House game was easier than other games [intercept, slope: P = 0.001, P = 0.030 (versus Story); P < 0.001, P < 0.001 (versus Rocket); P = 0.014, P = 0.030 (versus Train)].
Curves were modeled in a binomial generalized linear mixed model with session and game as fixed and random effects. The 95% confidence intervals are shown. Children advanced in the level of each game when they achieved over 75% of correct answers and regressed a level when giving less than 25% correct answers. When achieving between 25 and 75% of the correct answers, the children would remain at the same level.
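The level-adjustment rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the deployed system's code: the function name `next_level`, level numbering that starts at 1, clamping at the lowest and highest levels, and the handling of a game with no recorded answers are all hypothetical choices made for the example.

```python
def next_level(level: int, correct: int, total: int,
               max_level: int, min_level: int = 1) -> int:
    """Return the difficulty level for the next game.

    Rule from the text: advance when more than 75% of answers are correct,
    regress when fewer than 25% are correct, otherwise stay at the same level.
    Clamping to [min_level, max_level] is an assumption of this sketch.
    """
    if total <= 0:
        # Assumption: with no answers recorded, keep the current level.
        return level
    accuracy = correct / total
    if accuracy > 0.75:
        return min(level + 1, max_level)
    if accuracy < 0.25:
        return max(level - 1, min_level)
    return level
```

With these thresholds, a score of exactly 75% or exactly 25% leaves the level unchanged, which matches the "between 25 and 75%" band described in the text.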
Performance on the joint attention probe was measured and recorded at four time points: (i) T0, 30 days before intervention began; (ii) T1, on the first day of robot intervention; (iii) T2, on the last day of intervention; and (iv) T3, 30 days after the end of the intervention. The difference between time points T0 and T1 was computed to measure change in joint attention during a period of time with no robot intervention and is denoted as the pretest. The difference between time points T1 and T2 was calculated to measure joint attention changes resulting from the robot-administered intervention and is denoted as the test phase. Last, the difference between time points T1 and T3 was evaluated to measure the stability of any changes recorded during the robot-administered intervention and is denoted as the posttest.
Two participants were excluded for lack of data at one or more time points. Another participant was excluded for being out of the age range in which the task was normed (7 to 12 years of age). Group means were as follows: for T0, M = 16.89 and SD = 4.46; for T1, M = 15.67 and SD = 3.81; for T2, M = 20.89 and SD = 3.79; and for T3, M = 18.22 and SD = 5.02. A linear mixed model with compound symmetry repeated covariance effects indicated a significant time point effect [F(3,24) = 5.03, P = 0.008]. Planned comparisons showed that, although no pretest or posttest effect was observed (T1 − T0, P = 0.395; T3 − T1, P = 0.083), joint attention improvements occurred in the test phase (T2 − T1, P = 0.001; Fig. 6). Test phase changes were negatively associated with nonverbal reasoning performance on the DAS [r(9) = −0.750, P = 0.020]. These results are consistent with greater joint attention gains made by children with lower nonverbal ability. Exploration of relationships between baseline nonverbal ability and average baseline (T0 and T1) joint attention performance indicated a strong positive relationship [r(9) = 0.831, P = 0.005], suggesting that children with lower nonverbal reasoning skills had more capacity to grow in terms of joint attention skills. Joint attention performance at T1 was also positively correlated (Pearson’s r) with modeled participant overall performance on the House [r(9) = 0.702, P = 0.035] and Story [r(9) = 0.705, P = 0.034] games, suggesting shared variance in performance.
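As a descriptive check, the three phase contrasts (pretest, test, and posttest) can be computed directly from the group means reported above. The sketch below is only arithmetic on those published means; the dictionary and function names are hypothetical, and the result is not a substitute for the mixed-model planned comparisons, which are the source of the reported P values.

```python
# Group means of the joint attention probe, as reported in the text.
GROUP_MEANS = {"T0": 16.89, "T1": 15.67, "T2": 20.89, "T3": 18.22}

# Each phase is defined as (later time point, earlier time point).
PHASES = {
    "pretest": ("T1", "T0"),   # change with no robot intervention
    "test": ("T2", "T1"),      # change across the robot intervention
    "posttest": ("T3", "T1"),  # retained change 30 days after the intervention
}


def phase_changes(means: dict) -> dict:
    """Return the difference in group means for each phase, rounded to 2 decimals."""
    return {
        phase: round(means[later] - means[earlier], 2)
        for phase, (later, earlier) in PHASES.items()
    }


if __name__ == "__main__":
    for phase, change in phase_changes(GROUP_MEANS).items():
        print(f"{phase}: {change:+.2f}")
```

Read this way, the mean score dropped slightly over the pretest, rose by about five points over the test phase, and remained above the T1 mean at T3, although the T3 − T1 comparison was not statistically significant.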
Probe scores for the child at four different time points: 30 days before the robot intervention started, on the start day of the robot intervention, on the last day of the robot intervention, and 30 days after the end of the robot intervention. There was a significant increase in joint attention scores when comparing before the robot intervention and after it. n.s., not significant; *P < 0.05. Error bars indicate SE.
Caregivers completed an on-screen survey immediately after each day’s intervention session during the test phase. In all but one family, these interactions were conducted with the same caregiver (one father, one grandmother, and nine mothers).
The survey consisted of five-point Likert scale ratings. The questions were grouped into two categories, questions on how children interacted with caregivers during the past 24 hours and parallel questions about interactions with other people, followed by one final question regarding engagement. We compared the ratings scored by the caregivers on the first day and the last day of interventions with paired sample t tests. All 12 caregivers’ responses were included in the analysis.
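For readers unfamiliar with the comparison described above, a paired-sample t statistic can be sketched with the Python standard library alone. The function name `paired_t` is hypothetical, and the sign convention (first-day rating minus last-day rating, so that improvement yields a negative t) is an assumption inferred from the negative t values reported in this paper; no study data are reproduced here.

```python
import math
from statistics import mean, stdev


def paired_t(first_day, last_day):
    """Return (t, degrees_of_freedom) for a paired-sample t test.

    Differences are taken as first_day - last_day (an assumed convention),
    so higher last-day ratings produce a negative t statistic.
    """
    if len(first_day) != len(last_day):
        raise ValueError("paired samples must have the same length")
    if len(first_day) < 2:
        raise ValueError("at least two pairs are required")
    diffs = [a - b for a, b in zip(first_day, last_day)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    n = len(diffs)
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

Calling `paired_t` on two equal-length lists of Likert ratings returns the statistic and its degrees of freedom (n − 1, which is 11 for 12 caregivers); the P value would then be looked up from the t distribution, for example with a statistics package.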
Caregivers reported increased social skill performance between their child and themselves, including more eye contact [t(11) = −2.462, P = 0.03] with them on the last day of the intervention (M = 3.75, SD = 1.06) compared with the first day (M = 3.00, SD = 0.00), more attempts to initiate communication [t(11) = −2.930, P = 0.014] with them on the last day (M = 4.08, SD = 1.00) than on the first day (M = 3.17, SD = 0.39), and more frequent responses to communication bids from the caregiver [t(11) = −3.000, P = 0.012] on the last day (M = 3.83, SD = 0.94) than on the first day (M = 3.08, SD = 0.29; Fig. 7A).
Caregivers reported increased eye contact, increased initiation of communication, and increased response to communication bids with them (A) and with other people (B). On the basis of comparisons of ratings from the last day of the robot intervention (T2) to the first day of the intervention (T1), these results showed that caregivers were able to observe improved communication abilities of the children beyond our robot-assisted intervention sessions over the period of 30 days. Error bars indicate SE.
Caregivers also reported increased social skill performance between their child and other people, including more eye contact [t(11) = −3.447, P = 0.005] with other people on the last day of the intervention (M = 3.83, SD = 0.83) when compared with the first day (M = 3.08, SD = 0.29), more attempts to initiate communication [t(11) = −3.527, P = 0.005] with other people on the last day (M = 3.91, SD = 0.90) than on the first day (M = 3.00, SD = 0.00), and more frequent responses to communication bids from other people [t(11) = −3.458, P = 0.005] on the last day (M = 3.75, SD = 0.75) than on the first day (M = 2.91, SD = 0.29; Fig. 7B).
Last, caregivers were asked daily to rate how easy it was to engage their child with the robot therapy session. To confirm that the continued length of engagement was not solely a result of compliance to the protocol instruction, we modeled the engagement rating with a cumulative link mixed model fitted with an adaptive Gauss-Hermite quadrature approximation as a function of day with random participant effects. This model revealed no significant effect of day on engagement (P = 0.822). This suggests that participant engagement did not change in a systematic fashion throughout the study.
The potential benefit of a socially assistive robot lies in the ability to provide personalized, on-demand, and structured cognitive or social support to augment the efforts of clinicians, teachers, and families. In the ideal case, robots could provide personalized support, whenever and wherever needed, and could be capable of producing lasting enhancements in social and communicative skills not only in human-robot interactions but also in human-human interactions (4). The system presented here takes steps in this direction beyond the current state of the art but also does not yet live up to all of these grand visions. We focus our discussion around the points in which the current work makes substantial improvements and also describe the limitations and areas requiring continued focus as this field progresses.
Our deployed robots operated autonomously without any experimenter intervention for a total of 127 hours over 279 sessions. Caregivers contacted our 24/7 help line a total of eight times: six times for confirmation that they were using the system correctly, which required no action from our team, and two times for a technical issue that was prompted by the sudden disappearance of an online software library, which required a software update and was resolved quickly. Robot-assisted autism intervention in previous studies consisted mostly of short episodic interactions that rarely lasted more than 30 min (14, 19) and usually required experimenters to supervise robot-directed actions [although see (20, 21) for exceptions]. Moving from teleoperated to autonomous interactions presents substantial challenges in computational perception and robot control to create meaningful therapeutic training. Although challenging, increasing robot autonomy in assisted therapy has the potential to reduce therapists’ cognitive load and to ensure consistent therapy for children with autism (21). Our system demonstrated the possibility and potential of autonomous robot interventions for autism, which would enable the implementation and application of robot-assisted intervention at a large scale in various environments, accelerating us toward the goal of achieving clinical significance.
Sustaining engagement with participants is key to effective interventions. Repetitive and unchallenging tasks are likely to bore participants, who then would disengage from the intervention and miss opportunities to practice and improve on targeted skills. As informed by the challenge point theory (22), optimal learning occurs when the task is neither too easy nor too difficult. Our system sought to keep the participating children challenged and adapted the difficulty level of practice games to the children’s skill performances as measured in the games. This adaptation allowed the children to practice and to improve the target skills at their own pace. Our results confirmed that the children continued to engage with our system throughout the test phase. We speculate that such engagement with our robot-assisted intervention was crucial to the observed improvements in the children’s social skills.
Deployment in uncontrolled environments
Deployment of robotic systems outside controlled laboratory settings is challenging. Our deployment needed to address various environmental constraints and to meet different human considerations. For example, the setup location of our system was constrained by electrical power, network connectivity, and family preferences. In one case, a child was particularly sensitive to light; therefore, our system had to be set up in a dimmed room, which created additional challenges for our perception system. Furthermore, our deployment needed to accommodate other family members’ needs, especially those of the participating child’s siblings. We provided robotic toys to the siblings, so that they would not interrupt daily intervention sessions. We also made the operation of the system user-friendly by automating startup procedures and by providing a simple checklist to ensure that caregivers would feel comfortable operating the systems on their own on a daily basis. These challenges, constraints, and considerations are unique to field deployment of robotic systems aiming to interact with nonspecialist users every day over a long period of time. Yet, meeting these requirements is a practical necessity for the integration of robots into our environments to provide daily support.
Contributions of the social robot
Although the focus of this study is not to understand the specific influence that any of the individual components of our system (including the robot, tablet-based games, perception system, etc.) have on our robot-mediated intervention, we believe that the social robot contributed positively to the observed behavior based on three converging lines of research. First, in triadic interactions between a child, an adult, and a third interaction partner, children with ASD demonstrate more social behavior toward the adult when the third interaction partner is a robot rather than a tablet-based game or another adult (14). Second, the embodiment of the robot provides necessary affordance to convey gaze cues that are central to our behavioral intervention. Known as the Mona Lisa gaze effect (23), agents on a flat screen are limited in accurate communication of gaze directionality. Third, in tutoring interactions with both adults and typically developing children, physically embodied robots increase learning outcomes (24, 25), increase compliance to instructions (25), and increase user engagement during the interaction (26, 27) over screen-based agents. Nonetheless, we acknowledge that, in this study, the impact of the robot (or any other system component) cannot be measured independently. We present this as a limitation of this study and an area for future work.
Improvements in caregiver-reported social behavior
Over the month of the robot-based intervention, caregivers reported improved social behavior directed both toward themselves and toward others in areas including eye contact, initiation of communication, and responses to communication. The change in reported behavior on the caregiver survey could be, in part, related to the caregiver attending more to the child’s social communication skills. It is unlikely that the change is due solely to this increased focus given the nature of the sample; caregivers of children with ASD provide ongoing support to their children in this area and generally monitor social communication development.
Improvements in clinical measures
Our results also showed improvements in children’s joint attention in the absence of the robot, indicating that the children were able to demonstrate greater skill in the context of human interactions. These results are independent of the caregiver reports and are not subject to the same limitations. These results advance and differ substantially from previous research in robot-assisted autism therapy, where behavioral improvements in children with ASD were mostly documented in the context of robot-directed interactions (20). Successful demonstration of improvement in human interactions is the ultimate goal of robot-assisted intervention, evolving beyond the mediation and scaffolds that assistive robots offer during interactions with other people. Our results provide evidence illustrating possible transferable social skills beyond robot-mediated interactions in naturalistic human interactions.
The present results have multiple clinical implications. Joint attention is the critical foundation for many higher-level social communication skills, including reciprocal exchanges and perspective-taking. Therefore, with improvements in joint attention after this intervention, in time, we may see downstream effects on other higher-level skills. We did see broader gains in the context of the current study, even in this 1-month intervention. Future work with larger and longer trials will clarify this promising, yet preliminary, result. These results support the potential for robot intervention studies in group treatment to facilitate interactions between peers and to improve both foundational and high-level social skills in this context.
The specific developmental growth seen in the children during their participation in this study is likely due to our system, as opposed to other treatments they received, because the children did not show the same magnitude of gains during the pretest phase as they did during the test phase, and the children’s concomitant treatments remained stable throughout their participation in the study. The lack of significant effects between the first day of the test phase and the end of the posttest phase further emphasizes that the improvements are more likely due to the intervention components and may deteriorate over time in the absence of continued support. That said, from a clinical intervention perspective, our study is an open-label pilot. Future studies extending the duration of the study and with randomization with appropriate control groups are necessary to verify the gains we have observed and attributed to our intervention. In addition, the factors associated with long-term preservation of joint attention improvements in ASD remain to be elucidated. Last, future randomized controlled studies will need better control for practice effects of the tasks.
Although our results provide evidence of benefits and the possibility of using robot-assisted autism therapy for clinical intervention, limitations of our system motivate future research on the development of effective robot-based interventions. First, our system relied on prespecified interaction content, which included daily opening stories and a fixed set of behavioral responses. This approach was appropriate for our target scenarios, but it would not scale well for interventions that span a longer period of time (e.g., over 30 min per day and over 30 days). How to generate coherent, engaging interaction content automatically is a core challenge for realizing a long-term, autonomous robot-assisted intervention or human-robot interaction in general. Second, our intervention personalization was focused on adjusting difficulty levels of the practice games, analogous to personalization of educational contents in intelligent tutoring systems (ITS). Our personalization algorithm was simplistic, although it matched the complexity of personalization algorithms successfully used to demonstrate learning gains in other ITS [e.g., (28)]. More complex and detailed modeling of a child’s capabilities would likely provide a more substantial impact. Furthermore, to effectively support the wide variety of behavioral characteristics of individuals with ASD, adaptive models that prioritize and personalize needs and preferences in addition to skill performance are necessary to maximize the potential of robot-assisted interventions. Third, our system was designed to provide targeted interventions, involving interactions between the robot, child, and caregiver for about 30 min each day. Although this design provided structures for targeted intervention, it missed naturalistic intervention opportunities outside of the intended sessions.
These three limitations necessitate smart, adaptive systems that can provide personalized, engaging interventions to children with ASD in a variety of situations over long periods of time.
MATERIALS AND METHODS
Objectives and study design
The objectives of this study were to investigate how a social robot may deliver behavioral intervention to children with ASD both autonomously and effectively outside clinical settings, as well as how such robot-assisted intervention can improve these children’s social-communicative abilities. This study was modeled after single-subject withdrawal (ABA) designs (29, 30). This design included pretest (A), test (B), and posttest (A) phases, each phase lasting for about 30 days. The pretest phase served as a comparison baseline, capturing possible maturation of social communication abilities and the effectiveness of any other therapies or interventions that the family may have been using. The test phase involved the in-home deployment of a socially assistive robot system that engaged the participating child in our intervention program, which was based on intervention activities commonly used in clinical settings. The posttest phase sought to explore whether the benefits provided by our robot-assisted intervention would be sustained after the removal of the system. This study design is suitable for investigating the effects of a single intervention and for when there are wide variances in participants’ characteristics and responses to the intervention. Informed consent from families and assent from minors were obtained in all cases, as approved by the Yale University Institutional Review Board.
To assess a child’s ability to respond to joint attention bids in their familiar environments, we used the validated, naturalistic joint attention assessment of Bean and Eigsti (31). This assessment includes six naturalistic prompts that can be delivered at any point during an interaction with the child and is designed particularly for school-age children and adolescents. The six prompts examine different aspects of joint attention, including gaze following, response to name and a greeting opportunity, and recognition of the other person’s current interest. This assessment of joint attention was administered four times throughout the study while a researcher was interacting with a child in play-based activities.
To understand whether a child’s behaviors of social communication changed over time outside intervention sessions, we asked the child’s caregiver to fill out a survey regarding his/her own observations of the child’s communicative behaviors at the end of each daily session. The survey questions sought to measure the broader influence of our robot-assisted intervention outside of intervention sessions, focusing on the child’s ability to make eye contact with, initiate communication with, and respond to communication bids from the caregiver and others.
Robot-assisted intervention system
Our intervention system consisted of a social robot, a 24-inch touch screen, two external color cameras, and two computers. The social robot used was an early prototype of the Jibo robot (32). The Jibo robot is a 12-inch table-top robot with three degrees of freedom, capable of turning its head and body around 360°. The robot can exhibit expressive behaviors through body movements, a ring of color-changing light-emitting diode lights, and a pair of animated eyes (e.g., blinking and dilation). These capabilities allow the robot to make eye contact with the participants and signal shared attention. In addition, the robot can deliver information verbally to the participants through its internal speakers. The 24-inch touch screen presented educational content and served as a shared medium that the robot and the participants could all interact with and reference. One of the cameras tracked both the child’s and the caregiver’s attentional foci as approximated by head orientations, whereas the other camera recorded the intervention session.
Our software system (Fig. 8), which involved attention tracking and intervention presentation, was implemented in the Robot Operating System (ROS) framework (33). The attention-tracking subsystem, running on one of the computers, continuously approximated users’ attentional targets in the environment. With RGB camera stream input, the system estimated and tracked head poses and orientations by using Constrained Local Model face tracking and landmark detection algorithms (34) and approximated attentional targets according to the estimated head poses and orientations. The intervention presentation subsystem, running on the other computer, ensured smooth delivery of curricular content. It controlled the robot’s behaviors, scheduled intervention content, and adjusted difficulty levels of the social skills games.
Our software system consists of several components responsible for attention tracking of the participants, robot behavior control, and intervention presentation. These components together create rich, engaging interactions for our robot-assisted autism therapy. These components operate within the ROS framework.
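As an illustration of how attentional targets might be approximated from head orientation, the following minimal sketch picks the target whose direction lies closest to the estimated gaze direction. It is not the deployed implementation: the target positions, the coordinate convention (the head faces +z at zero yaw and pitch), the 20° tolerance, and all function names are assumptions made for this example.

```python
import math

# Hypothetical target positions (meters) in the same frame as the head
# position. These values are illustrative and do not come from the study.
TARGETS = {
    "robot": (0.4, 0.0, 0.6),
    "screen": (0.0, 0.0, 0.5),
    "caregiver": (-0.6, 0.0, 1.0),
}


def gaze_direction(yaw_deg, pitch_deg):
    """Unit vector for a head orientation (assumed: +z is straight ahead)."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )


def angle_between(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


def attention_target(head_pos, yaw_deg, pitch_deg,
                     targets=TARGETS, max_angle_deg=20.0):
    """Return the name of the target nearest the gaze ray, or None."""
    direction = gaze_direction(yaw_deg, pitch_deg)
    best_name, best_angle = None, max_angle_deg
    for name, pos in targets.items():
        to_target = tuple(p - h for p, h in zip(pos, head_pos))
        if all(c == 0 for c in to_target):
            continue  # head coincides with target; direction undefined
        angle = angle_between(direction, to_target)
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name
```

Head orientation is only a proxy for gaze, so the angular tolerance in a sketch like this trades missed detections against false attributions and would need tuning for a given room layout.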
In our implementation, we manually prepared interaction scripts that specified predefined behavioral animations for the robot, daily opening stories, and verbal encouragement and feedback to the participants. When the robot was not displaying prespecified behaviors in a prepared interaction script, it maintained eye contact with the child to show engagement. The robot also shared attention with the child by looking toward the visual content on the screen from time to time throughout each session. These behaviors of making eye contact and sharing visual attention were meant to model social gaze behaviors for the child.
In addition to targeting the core social skill of joint attention, we designed and developed six interactive games that provided opportunities for the child to practice social and emotional understanding, perspective-taking, and ordering and sequencing while interacting with the robot and the caregiver. The six games were an emotional understanding game (Story), two barrier games that facilitate perspective-taking (House and Rocket), and three ordering and sequencing games (Train, Spaceship, and Traveler). Each game had multiple difficulty levels (between four and eight, depending on the game), except for Spaceship and Traveler, which had only one difficulty setting. The system adjusted the difficulty level according to the child’s performance in the game. Inspired by the challenge point theory (22), our personalization module focused on delivering learning content at appropriate levels of difficulty to increase learning gains while reducing frustration. The personalization module kept track of the child’s performance in game activities, providing approximate measures of the child’s abilities in social and emotional understanding, perspective-taking, and ordering and sequencing. Using these performance measures, the module followed a simple decision tree mechanism to decide the difficulty level of the game for the next round of interaction. In our implementation, we used 25% and 75% as the criteria for decreasing and increasing a difficulty level, respectively. Similar approaches of discrete adaptation have been used in robot-child tutoring applications (25, 35, 36).
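The threshold-based adaptation described above can be sketched as follows. This is a hedged illustration rather than the study's code: it assumes that the 25% and 75% criteria apply to a success rate over recent rounds, that the thresholds are inclusive, and that levels move one step at a time. The function name and parameters are hypothetical.

```python
def next_difficulty(level, success_rate, min_level=1, max_level=4,
                    lower=0.25, upper=0.75):
    """Choose the difficulty level for the next round of a game.

    Assumptions (not specified in the text): success_rate is the fraction
    of correct responses in recent rounds, thresholds are inclusive, and
    the level changes by at most one step per round.
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    if success_rate >= upper:
        # Performing well: raise the challenge, capped at the top level.
        return min(level + 1, max_level)
    if success_rate <= lower:
        # Struggling: lower the challenge, floored at the bottom level.
        return max(level - 1, min_level)
    # Performance is in the target band: keep the current level.
    return level
```

For a game with eight levels, the same function would be called with `max_level=8`; the single-level games (Spaceship and Traveler) would not need it at all.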
Social and emotional understanding
The Story game targets the skills of social and emotional understanding. A typical example of this game is as follows. The robot provides a social situation, displayed as cartoon-like images on the touch screen, and asks the child to choose what he/she thinks the story character is feeling at different points in the story by selecting one of multiple options displayed on the screen. As the child progresses, the social stories become longer and more complex. To succeed in this game, the child needs to understand the social situations and emotional states of the characters.
Perspective-taking
Two virtual barrier games, Rocket and House, target the ability of taking the other person’s perspective on a joint task. Modeled after physical barrier games commonly used in clinical interventions, these games provide spatial information to either the child or the caregiver and ask them to relay that information to the other verbally. In both games, the robot facilitates interactions between the child and the caregiver and acts as a game moderator by keeping time and providing motivational support. In Rocket, the child and the caregiver take turns building a rocket ship. The first player builds a rocket by dragging modular component parts onto a rocket template while the second player looks away. The screen is then reset to hide this design, and the first player must explain to the second player how to recreate the design. If the two designs are identical, then the players have succeeded and win the game. Similarly, in House, the child and the robot take turns in the roles of builder and guesser. The builder builds a virtual house that is composed of various designs and materials while the guesser looks away. The game then shows six possible designs, one of which was constructed by the builder. The guesser then asks questions about whether the builder’s house has a particular design or material to guess which of the shown designs is the builder’s. These games provide opportunities for the child not only to understand that the caregiver or the robot has a different perspective but also to practice turn-taking and verbal communication.
Ordering and sequencing
The Train sequencing game targets the skills of ordering and sequencing. In this game, the robot instructs the child to build a train by dragging parts onto a template. To succeed in this game, the child needs to follow the robot’s instructions carefully in sequence. Two additional games, Spaceship and Traveler, involve various tasks such as sorting objects in order. In an ordering task, the child needs to place objects in the right order to complete the task successfully.
Substantial effort was put into making the system robust and easy to use. Before the deployment described in this paper, we conducted multiple pilot tests of the system and the installation process in the homes of the research team. We attempted to make the system easy for families to use by limiting the startup procedure to four button presses, providing in-home training on the first day, and continuously logging the system state so that most troubleshooting required only powering the system off and then on again, with no loss of data. Multiple efforts were made to minimize disruptions to normal operations after installation: Backup power supplies in the system base guarded against short power failures; hardware components (including the cameras, robots, and tablets) were secured in place to the table; and a troubleshooting sheet and a 24/7 technical support line (via email and phone) were provided to participating families. Last, the system limited play use to conform to the study design; the robot would play games for only one session each day and for a maximum of 30 min.
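The usage limit described above (one session per day, at most 30 min) could be enforced with a small gate such as the one below. This is a sketch under stated assumptions, not the deployed logic: the class name, its methods, and the use of calendar dates to define a "day" are all hypothetical.

```python
from datetime import datetime, timedelta

MAX_SESSION = timedelta(minutes=30)  # per-session cap stated in the text


class SessionGate:
    """Allows one session per calendar day and caps its duration."""

    def __init__(self):
        self._last_session_date = None
        self._session_start = None

    def try_start(self, now):
        """Start a session if none has been started on this date."""
        if self._last_session_date == now.date():
            return False  # a session already took place today
        self._last_session_date = now.date()
        self._session_start = now
        return True

    def should_stop(self, now):
        """True once the current session has reached the time cap."""
        if self._session_start is None:
            return False
        return now - self._session_start >= MAX_SESSION
```

A controller would call `try_start` when the family presses the start button and poll `should_stop` during play to end the session gracefully.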
Acknowledgments: We are indebted to the other members of the Expedition in Computing on Socially Assistive Robotics for their contribution of social skills game software and the early development of other socially assistive systems that led to the deployment reported here, especially M. Mataric, C. Breazeal, C. Nass, F. Volkmar, M. Jung, A. Ramachandran, S. Sebo, D. Becerra, C. Claybaugh, J. Kory, S. Shen, M. Baranov, and A. Waugh. We also thank L. Hall for scheduling and administrative support, L. Scassellati for photo assistance, and all of the families that welcomed a robot into their homes. Funding: Support was provided by NSF Expedition in Computing award #1139078 (Socially Assistive Robotics), for which B.S. is the principal investigator. F.S. was supported by NIH grant no. K01MH104739. Author contributions: All authors took part in experimental design and deployment. B.S., C.-M.H., M.Q., and N.S. designed and constructed the robotic system and associated software. L.B., M.M., P.V., and F.S. conducted clinical assessments and provided clinical oversight. F.S. and C.-M.H. provided statistical analyses. Competing interests: L.B. is now supported by Vän Robotics, which constructs tutoring robots, though not for children with ASD, and F.S. and B.S. serve as advisory board members. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper or the Supplementary Materials. Software is available from github.com/ScazLab. Contact B.S. for other materials.