This is Animalese.
KK Slider: [unintelligible babble]
It's the pseudo-language of Animal Crossing. This is what
it sounds like in the Japanese version:
KK Slider: [unintelligible babble]
And here's what it sounds like in the
English version:
KK Slider: [slightly deeper unintelligible babble]
They sound different, which is weird because it's supposed to
be nonsense, right?
So... why does Nintendo dub KK Slider?
To understand we need to
do a taxonomy of all the ways games have
tried to represent -- or avoid representing --
human speech.
In the beginning, there was
the Word, and the Word was:
Oop! That... was supposed to say voice synthesis.
Early attempts at adding audio to games were a
mix of pre-recorded voices and genuine
voice synthesizers. But they were mostly
gimmicky, expensive add-ons. Voice chips
made more sense in arcade machines
because they were already a huge
investment of space and money -- but it was
still a technical struggle to get them
to work. Like Q*bert, known for his mad ups
and foul mouth, this drop of Tang was
originally supposed
to speak English instead of:
Q*Bert: [garbled synthetic phonemes]
but audio engineer David Thiel couldn't get the voice chip to
produce the sounds he was hoping for. So
instead of continuing to mess with it,
he just said [Q*bert garble curse] and had it string together some incoherent phonemes instead.
Thiel, like many designers that
followed, came to the conclusion that
human voices just weren't worth the fuss.
Other developers opted for a style
that's entirely unique to video games,
and I looked around but I couldn't find
a single definitive phrase used to
describe this style -- which I think speaks
to how much we take it for granted, even
though it is super weird.
I'm talking about using nonsensical sound effects to stand in for language, or simply put
[slow, low beeps that appear in time with the words]
[very high-pitched piercing beeps that appear in time with the words]
[sharp, high-pitch beeps that appear in time with the words]
For the purpose of this video -- and
because it's cool to name things --
I'm going to call this beep speech.
The earliest examples of beep speech I could
find were in JRPGs like Star Arthur
Legend - Planet Mephius [short pattern of beeps]
and Legend of Zelda [mid-pitch beeps]
Some American games used a similar
trope of mimicking on-screen text, but
it's not meant to stand in for a voice
so it's not quite the same.
That distinction is important because of beep speech's peculiar function; games that
use beep speech slowly reveal text and
accompany each word with audio, which
makes the player process information as
if they were really listening to somebody speak.
It's not a straight info-dump; it replicates the act of listening,
which makes it easier to stay
engaged with the written text. That's
assuming you enjoy listening to bebe bebe be bebeep
which is a great weakness
of the beep speech of the cartridge era.
Because audio capabilities were still
limited, most games use the same beep for
every character in every situation. Later
games - including Animal Crossing - could
pitch the beeps higher or lower, and that
really helped spice things up.
Then there were games like Star Fox which gave each
character a different kind of "voice" so
you could easily distinguish your kind
friends Slippy [synthetic sounds similar to frog croaks]
from that no-good hotshot Falco [deeper babble]
These were synthetic voices and
total nonsense with no real association
with the text. Another strategy was to
use vocal grunts -- things like sighs
and yells and other non-language forms
of communication.
These were great for adding variety,
conveying emotion, and giving a character
a voice without giving them language.
Although they use different strategies,
Star Fox and Ocarina of Time have two
weird things in common:
first of all, both
have friendly frogs that never get their due.
[rhythmic frog croaking]
Second of all, both have English
language lines even in the original Japanese versions.
Navi in Ocarina of Time: Hello!
The [GOOD LUCK]
and [HEY, LISTEN] were the same in every
version of the games. And that points to
one of the biggest strengths of beep speech
and vocal grunts: you DON'T have
to translate them. A shiver is a shiver
in every language.
Link in Ocarina of Time: [shivers]
Localizing a game was -
and is - a huge expenditure of time and
money, which makes these non-voice
options the perfect replacement for
voice lines. Quality localization is
basically a requirement for most games
now, but the 90s and early 2000s were a
dark time for translations and voice
acting alike, leaving us with such gems
as:
Dracula in Castlevania: What is a man?!
Barry in Resident Evil: A Jill Sandwich!
And that's when localization happened at all - sorry Earthbound fans. During this period,
beep speech was usually a stand-in for a
real language. But Banjo Kazooie made a
huge innovation in that their gibberish
was... just what it was.
Bottles: [a gentle honking]
Like Star Fox, the characters
had distinct voices - but they
weren't synthetic. They were powered by
real human pipes, which is wild because
it's human voices replicating a
synthetic style, that was made to replace
human voices, like an aural ouroboros.
An auralboros.
Plenty of games of this era
had full voice acting... but they weren't
on the N64. Nintendo's insistence on
using cartridges would continue to
restrict their options for speech
representation. On other consoles, games
became more invested in the cinematic
experience of having characters say real shit.
Which meant a greater investment in voice acting. That caused a split in style where beep speech,
previously just fine for serious stories,
came to represent a more lighthearted
cartoonish feel.
Mushi's Mama in Okami: [soft mid-tone beeps]
By the early 2000s, a trend emerged of
entirely fictional spoken languages.
Whereas beep speech stood in for the
player's native speech, these constructed
languages were more about making certain
characters and settings appear foreign --
while still empowering the player to
understand what they're saying. It's
during this period that
both Animal Crossing and The Sims arrived.
Simlish actually predates The Sims; it first
appeared in SimCopter. But for The Sims,
the team at Maxis knew they'd need
something more elaborate. Because the
game was so much about the human
condition, they wanted to communicate
emotions which would encourage players
to connect with their creations. Plus the
practical considerations - anything that's
comprehensible can become repetitive, and having a huge scroll of dialogue meant
writing, translating, and redubbing a huge
scroll of dialogue. Following the style
of Banjo Kazooie, they captured the real
human voices of two improvisers and then
spent a year remixing that audio to become the perfect blend of nonsense.
Sim 1: Dag dag aulf,
Sim 2: Anamana blastamana
But that strategy can't work with every
franchise. Animal Crossing had different
intentions and different styles, and so
they needed a different approach. When
you hear Animalese for the first
time, it sounds a lot like a standard
voice synthesis. But KK Slider is
actually saying REAL WORDS.
Here he is slowed down:
KK Slider: [deeper and slower than normal, words that match the text box identifiable]
The synthetic voice doesn't exactly
nail the pronunciation of each word, but
that works to its advantage; once it's
sped back up, it's even harder to tell
that KK is speaking English. Dōbutsu no Mori, the original N64 Japanese
version of Animal Crossing, features
Animalese in Japanese. Region-specific
Animalese is also the default language
in New Leaf... but not Wild World or City
Folk. Instead they use a pretty standard
sounding voice synthesis called Bebebese.
That's because Animal Crossing was never
intended to be localized. In fact,
Nintendo didn't localize the first
version of Animal Crossing; the American
release was based on the updated
GameCube game Dōbutsu no Mori+.
Members of the Nintendo Treehouse had to
advocate for it to be translated,
partially because they had already
gotten addicted to playing it. Because
they never intended to localize the game,
Nintendo included a lot of specific
Japanese cultural elements, including of
course the language. All of those had to
be changed in the American version
because the style of translation at the
time called for completely eradicating
any hint of a foreign culture. The
prevailing notion was that American
audiences didn't want anything that had
what cultural theorist Kōichi Iwabuchi
called "cultural odor,"
a phrase I hate to say out loud but have to respect the usefulness of. The localizers for the
first Animal Crossing did an amazing
job replacing content and adding new
events for American audiences -- so much so that their game was actually re-localized
back into Japanese and released
for the GameCube as Dōbutsu no Mori e+.
So when it came time to make Animal
Crossing: Wild World, Nintendo needed a
localization strategy from the start.
And that strategy was to make a game with no regionality at ALL.
No cherry blossoms.
No Halloween.
And no regional Animalese.
The Bebebese of Wild World stuck around for City Folk,
but by New Leaf, Animalese had made a triumphant,
multilingual return. Why?
Well!
I don't know.
But my theory is this! City Folk got
a lot of criticism for being too similar
to early entries in the franchise. The
next game had to distinguish itself
significantly to avoid another letdown.
Aya Kyogoku, who co-directed New Leaf
alongside Isao Moro,
viewed the game as a tool to communicate
with both animal characters and other
players. So it made sense that the
communication in game would be more
elaborate than Bebebese.
But doing full voice acting would have
been exorbitantly expensive;
City Folk had a huge script: around 640,000 words.
For perspective, Infinite Jest
clocks in at about 483,000 words,
so this cute little game about bugs and letters has it beat
by over a hundred thousand words, and
that means
it's better.
On top of that it's just plain science that when creatures speak in adorable baby talk
they're cute and you just want to squish
their widdle faces.
All of that probably
made it worthwhile to switch from
Bebebese back to Animalese, an easy way
to show that you're turning over a New
Leaf.
That brings us up to New Horizons and
Nintendo has once again flipped the
script by making the language...
Tom Nook: [a few recognizable English words then nonsense babble]
semi-Animalese?
It's not quite Bebebese; there are
parts that still sound like words, and
certain sounds do repeat with specific
text like "I" being pronounced like:
Goose: Ah
Rocket: Ah
Gulliver: Ah
but the weird thing is this has still been localized!
You can hear the difference in
the way Nook addresses the audience in
the Japanese and English Nintendo Direct.
He's using the same quasi-English here
that appears in New Horizons, which
brings us to an important question:
Why localize Animalese?
And why not localize Simlish?
Melissa Baese-Berk: One of the ways
that we best understand how languages
differ from each other is in terms of
what's called their prosody or their
sort-of rhythmic and pitch information.
Jenna: This is Dr. Baese-Berk,
a psycholinguist at the University of Oregon, studying how people process speech,
especially as a second language.
Prosody is a linguistic concept that
covers a lot of speech elements that
aren't explicitly phonetic -- like if you
hear somebody talking through a wall, you
can often tell if they're speaking your
native language or not, even if you can't
hear specific words.
Dr. Baese-Berk: We know that the rhythm matters a ton for recognizability, and when you disturb the rhythm
information and pitch information, it can
have really big consequences for how you
understand the speech. That said there's
a lot of variability, so I could have
sort of weird prosody, weird pitch and
rhythm information, and you could
probably still tell that I'm the native
speaker of English.
Jenna: Which is exactly why
Animalese is so difficult to parse, even
though it's just been peppered with a
few audio artifacts. The sped-up pace
alters the prosody and makes it very
hard to understand.
Hard... but not impossible.
Dr. Baese-Berk: There are ways in which you can distort the signal so much that it feels at
first just like it's impossible for you
to understand it, but once you have
started to figure it out, it becomes easy
to understand.
Jenna: I can vouch for that,
having listened to a lot of Animal
Crossing clips while researching this
video. I've gotten to the point where I
can sort of like... half-understand the
villagers while they're speaking, and
I've also begun to dream in Animalese,
and that's... that's probably fine
right?
Dr. Baese-Berk: How we define gibberish is
going to be based on our native language,
right? So how gibberish-y something
sounds is going to be related to how
similar or different it might sound to
your native language, and... but I could
imagine if it sounded so distinct from
your native language, it might not even
sound like gibberish; it might just sound
like something that isn't really
language-y.
Jenna: So even though it's gibberish,
the distance from your native language can determine, even
subconsciously, whether you perceive it
as a language. Which means that Simlish
probably isn't as universal as the team
at Maxis hopes.
Dr. Baese-Berk: The specific sounds that
they're using are English-like sounds, in
part because we know producing
non-English-like sounds is something
that's really hard if you're a native
English speaker. So if you're improvising
and producing gibberish, you're going to
produce the sounds that are within your
inventory.
Jenna: A lot of effort was put into
mixing and chopping up this audio but
the raw materials were still inevitably
lacking in variety. Simlish could still
be localized to make more regionally
accessible forms of gibberish.
But you're not a member of the community in The Sims;
you're an overseer God who
occasionally intervenes to drown
somebody. So it matters less how familiar
the nonsense sounds.
But for Animal Crossing, the villagers can't speak a gibberish that's too distant from the
player's native language, because the
games are about becoming part of a
community. Animal Crossing needs to feel -
and sound - warm and familiar.
Dr. Baese-Berk: That level of comfort and familiarity
is something that is probably easy to induce via language.
Jenna: Localized Animalese can make you
feel more comfortable and at home,
because even if you don't realize it,
everybody is speaking your language.
