Monday, 6 April 2015

Speaking Actors Living in the Phone

A Hollywood actor who couldn't speak wouldn't get far. How soon before we say the same of mobile apps?

This article is a bit off-topic for this blog, but it's a bit of fun and the subject deserves more attention. Over the past 18 months the Google Text-to-Speech API has come a long way on the Android platform. I decided to take advantage of it in the Libanius quiz app. For example, when the app presents a quiz item, it should speak the prompt ("Clairvoyance" in the picture).

Libanius uses the actor model from Akka to coordinate its subsystems. To see how Android can be set up to use Akka actors, see this from Typesafe, or this from me. Assuming the Akka infrastructure is in place, a speaking actor can be created like this:

class Voice(implicit ctx: Context) extends Actor with TextToSpeech.OnInitListener {

  val tts = new TextToSpeech(ctx, this)

  override def receive = {
    case Speak(text: String, quizGroupKeyType: String) => // see next code snippet
    case _ => logError("Voice received an unknown command")

The Context of the Android activity is passed to the actor because the TextToSpeech class needs it for initialization. Speak is just a case class for the message that is sent to the actor. We have to implement what happens when that message is received.

The prompt could be in various different languages. The Google API supplies a different voice for each language, so we need to take advantage of this.

  case Speak(text: String, quizGroupKeyType: String) =>
    KnownQuizGroups.getLocale(quizGroupKeyType).foreach(speak(text, _))

KnownQuizGroups is a simple map of Libanius quiz group types (e.g. "Spanish") to locales
recognized by TextToSpeech.

The speak method finally uses the TextToSpeech API to set the voice according to the given locale, and actually speak the text. It works fine whether the text is a word or a full sentence.

def speak(text: String, locale: Locale) {
  tts.speak(text, TextToSpeech.QUEUE_FLUSH, null)

private[this] def setSpeechLanguage(locale: Locale): Unit = {
  val result: Int = tts.setLanguage(locale)
  if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED)
    logError("Language is not available.")

Also we must remember in the Speak code to reset the language after the message is spoken.


While we're adding sound to the Libanius quiz, how about a ping sound for a correct answer, and a buzz sound for a wrong answer? This is a job for another actor: SoundPlayer. The guts of it look like this:

class SoundPlayer(implicit ctx: Context) extends Actor {

  val audioManager: AudioManager =
  val soundPool: SoundPool = new SoundPool(10, AudioManager.STREAM_MUSIC, 0)
  var soundPoolMap = Map[SoundSample, Int]()

  override def receive = {
    case Load() => 
    case Play(soundSample: SoundSample) =>
      val curVolume = audioManager.getStreamVolume(AudioManager.STREAM_MUSIC), curVolume, curVolume, 1,  0, 1f)
    case _ =>
      logError("SoundPlayer received an unknown command")

There is more complete code on Github. This is all for the Android platform, but of course you can also use Akka actors for sound on other platforms like the Mac: see Alvin Alexander's Wikipedia Reader page for some sample code. Have fun!