The researchers weren’t interested in what the daters discussed, or even whether they seemed to share personality traits, backgrounds, or interests. Instead, they were searching for subtle similarities in how they structured their sentences—specifically, how often they used function words such as it, that, but, about, never, and lots. This synchronicity, known as “language style matching,” or LSM, happens unconsciously. But the researchers found it to be a good predictor of mutual affection: An analysis of conversations involving 80 speed daters showed that couples with high LSM scores were three times as likely as those with low scores to want to see each other again.
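As a rough illustration of the idea, the sketch below scores how closely two speakers' rates of function-word use match. The word list is only the six examples named above, and the scoring formula (one minus the normalized difference in rates) is an assumption chosen for illustration, not necessarily the researchers' exact method. The names `function_word_rate` and `style_match` are hypothetical.

```python
import re

# Illustrative function-word list: only the six examples named in the text.
# A real analysis would use a much larger inventory of function words.
FUNCTION_WORDS = {"it", "that", "but", "about", "never", "lots"}


def function_word_rate(text: str) -> float:
    """Fraction of the words in `text` that are function words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in FUNCTION_WORDS)
    return hits / len(words)


def style_match(text_a: str, text_b: str) -> float:
    """Score from 0 to 1; 1 means identical function-word rates.

    Assumed formula: 1 - |a - b| / (a + b). If neither speaker uses
    any function words, the rates match trivially and the score is 1.
    """
    a = function_word_rate(text_a)
    b = function_word_rate(text_b)
    if a + b == 0:
        return 1.0
    return 1.0 - abs(a - b) / (a + b)


if __name__ == "__main__":
    dater_1 = "I never thought about it that way, but it makes sense."
    dater_2 = "It is funny that you say that, but I never cook."
    print(round(style_match(dater_1, dater_2), 3))
```

A score near 1 means the two speakers use function words at similar rates; a score near 0 means one of them uses far more than the other.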
It’s not just speech patterns that can encode chemistry. Other studies suggest that when two people unknowingly coordinate nonverbal cues, such as hand gestures, eye gaze, and posture, they’re more apt to like and understand each other. These findings raise a tantalizing question: Could a computer know whom we’re falling for before we do?
…
Welcome to the vision of Eli Finkel. A professor of psychology and management at Northwestern University and a co-author of the LSM study, Finkel is a prominent critic of popular dating sites such as eHarmony and Chemistry, which claim to possess a formula that can connect you with your soul mate. Finkel’s beef with these sites, he says, isn’t that they “use math to get you dates,” as OKCupid puts it. It’s that they go about it all wrong. As a result, Finkel argues, their matching algorithms likely foretell love no better than chance.
The problem, he explains, is that they rely on information about individuals who have never met—namely, self-reported personality traits and preferences. Decades of relationship research show that romantic success hinges more on how two people interact than on who they are or what they believe they want in a partner. Attraction, scientists tell us, is created and kindled in the glances we exchange, the laughs we share, and the other myriad ways our brains and bodies respond to one another.
Which is why, according to Finkel, we’ll never predict love simply by browsing photographs and curated profiles, or by answering questionnaires. “So the question is: Is there a new way to leverage the Internet to enhance matchmaking, so that when you get face to face with a person, the odds that you’ll be compatible with that person are higher than they would be otherwise?”
At the end of the 1950s, researchers in the United States, Russia, and Western Europe were confident that high-quality machine translation (MT) of scientific and technical documents would be possible within a very few years. After the promise had remained unrealized for a decade, the National Academy of Sciences of the United States published the much-cited but little-read report of its Automatic Language Processing Advisory Committee. The ALPAC Report recommended that the resources that were being expended on MT as a solution to immediate practical problems should be redirected towards more fundamental questions of language processing that would have to be answered before any translation machine could be built. The number of laboratories working in the field was sharply reduced all over the world, and few of them were able to obtain funding for more long-range research programs in what then came to be known as computational linguistics.
There was a resurgence of interest in machine translation in the 1980s and, although the approaches adopted differed little from those of the 1960s, many of the efforts, notably in Japan, were rapidly deemed successful. This seems to have had less to do with advances in linguistics and software technology or with the greater size and speed of computers than with a better appreciation of special situations where ingenuity might make a limited success of rudimentary MT. The most conspicuous example was the METEO system, developed at the University of Montreal, which has long provided the French translations of the weather reports used by airlines, shipping companies, and others. Some manufacturers of machinery have found it possible to translate maintenance manuals used within their organizations (not by their customers) largely automatically by having the technical writers use only certain words and only in carefully prescribed ways.
Why Machine Translation Is Hard
Many factors contribute to the difficulty of machine translation, including words with multiple meanings, sentences with multiple grammatical structures, uncertainty about what a pronoun refers to, and other problems of grammar. But two common misunderstandings make translation seem altogether simpler than it is. First, translation is not primarily a linguistic operation, and second, translation is not an operation that preserves meaning.
There is a famous example that makes the first point well. Consider the sentence:
The police refused the students a permit because they feared violence.
Suppose that it is to be translated into a language like French in which the word for ‘police’ is feminine. Presumably the pronoun that translates ‘they’ will also have to be feminine. Now replace the word ‘feared’ with ‘advocated’. Now, suddenly, it seems that ‘they’ refers to the students and not to the police and, if the word for students is masculine, it will therefore require a different translation. The knowledge required to reach these conclusions has nothing linguistic about it. It has to do with everyday facts about students, police, violence, and the kinds of relationships we have seen these things enter into.
The second point is, of course, closely related. Consider the following question, stated in French: Où voulez-vous que je me mette? It means literally, “Where do you want me to put myself?” but it is a very natural translation for a whole family of English questions of the form “Where do you want me to sit/stand/sign my name/park/tie up my boat?” In most situations, the English “Where do you want me?” would be acceptable, but it is natural and routine to add or delete information in order to produce a fluent translation. Sometimes it cannot be avoided, because there are languages like French in which pronouns must show number and gender, Japanese where pronouns are often omitted altogether, Russian where there are no articles, Chinese where nouns do not differentiate singular and plural nor verbs present and past, and German where flexibility of the word order can leave uncertainties about what is the subject and what is the object.
The Structure of Machine Translation Systems
While there have been many variants, most MT systems, and certainly those that have found practical application, have parts that can be named for the chapters in a linguistics textbook. They have lexical, morphological, syntactic, and possibly semantic components, one for each of the two languages, for treating basic words, complex words, sentences, and meanings. Each feeds into the next until a very abstract representation of the sentence is produced by the last one in the chain.
There is also a ‘transfer’ component, the only one that is specialized for a particular pair of languages, which converts the most abstract source representation that can be achieved into a corresponding abstract target representation. The target sentence is produced from this essentially by reversing the analysis process. Some systems make use of a so-called ‘interlingua’ or intermediate language, in which case the transfer stage is divided into two steps, one translating a source sentence into the interlingua and the other translating the result of this into an abstract representation in the target language.
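The analysis, transfer, and generation stages described above can be sketched with a toy English-to-French example. Everything here is invented for illustration: the three small lexicons, the function names, and the simplification that one number feature governs the whole sentence. It is a minimal sketch of the architecture, not a description of any real MT system.

```python
# Toy analysis -> transfer -> generation pipeline (English to French).
# Every lexicon entry and function name here is invented for illustration.

# Analysis lexicon: surface word -> (lemma, part of speech, features)
EN_LEXICON = {
    "the": ("the", "DET", {}),
    "cat": ("cat", "NOUN", {"number": "sg"}),
    "cats": ("cat", "NOUN", {"number": "pl"}),
    "sleeps": ("sleep", "VERB", {"number": "sg"}),
    "sleep": ("sleep", "VERB", {"number": "pl"}),
}

# Transfer lexicon: the only table specific to this language pair.
EN_TO_FR = {"the": "le", "cat": "chat", "sleep": "dormir"}

# Generation lexicon: (lemma, number) -> surface French form.
FR_FORMS = {
    ("le", "sg"): "le", ("le", "pl"): "les",
    ("chat", "sg"): "chat", ("chat", "pl"): "chats",
    ("dormir", "sg"): "dort", ("dormir", "pl"): "dorment",
}


def analyze(sentence):
    """Source analysis: words -> abstract (lemma, pos, features) records."""
    return [EN_LEXICON[w] for w in sentence.lower().split()]


def transfer(source_repr):
    """Transfer: swap source lemmas for target lemmas, keep the features."""
    return [(EN_TO_FR[lemma], pos, feats) for lemma, pos, feats in source_repr]


def generate(target_repr):
    """Target generation: inflect each lemma, reversing the analysis step."""
    # In this toy, the determiner and verb agree with the noun,
    # so the noun's number is used for every word.
    number = next(f["number"] for _, pos, f in target_repr if pos == "NOUN")
    return " ".join(FR_FORMS[(lemma, number)] for lemma, _, _ in target_repr)


def translate(sentence):
    return generate(transfer(analyze(sentence)))


if __name__ == "__main__":
    print(translate("The cats sleep"))
```

Only `EN_TO_FR` knows about both languages; the analysis and generation tables are each monolingual, which is the design point the transfer architecture makes. Words outside the toy lexicons raise a `KeyError`.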
One other problem for computers is dealing with metaphor. Metaphors are a common part of language and occur frequently in the computer world:
How can I kill the program?
How do I get back into dos?
My car drinks gasoline
One approach treats metaphor as a failure of regular semantic rules
Compute the normal meaning of get into—dos violates its selection restrictions
dos isn’t an enclosure so the interpreter fails
Next, the interpreter has to search for an unconventional meaning for get into and recompute the meaning
If an unconventional meaning isn’t available, you can try using context, or world knowledge
Statistical procedures aren’t likely to generate interpretations for new metaphors.
Interpretation routines might result in overgeneralizations:
How can I kill dos? -> *How can I give birth to dos?
*How can I slay dos?
Mary caught a cold from John -> *John threw Mary his cold.
Catching a cold is unintentional (as opposed to catching a thief)
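The selection-restriction approach outlined above can be sketched as follows. The semantic types, the literal and metaphorical sense tables, and the function name `interpret` are all invented for illustration; this is a minimal sketch of the fallback logic, assuming a verb carries one literal sense and a short list of stored metaphorical senses.

```python
# Toy semantic types; all entries are invented for illustration.
SEMANTIC_TYPE = {"dos": "program", "house": "enclosure", "car": "enclosure"}

# Literal senses: verb -> (required object type, paraphrase)
LITERAL = {
    "get into": ("enclosure", "physically enter"),
    "kill": ("animate", "cause to die"),
}

# Stored metaphorical senses, tried only when the literal one fails.
METAPHORICAL = {
    "get into": [("program", "start running")],
    "kill": [("program", "terminate")],
}


def interpret(verb, obj):
    """Return (reading kind, paraphrase) for a verb-object pair."""
    obj_type = SEMANTIC_TYPE.get(obj)
    required, meaning = LITERAL[verb]
    # Step 1: compute the normal meaning and check selection restrictions.
    if obj_type == required:
        return ("literal", f"{meaning} {obj}")
    # Step 2: the restriction is violated, so look for an unconventional sense.
    for required, meaning in METAPHORICAL.get(verb, []):
        if obj_type == required:
            return ("metaphorical", f"{meaning} {obj}")
    # Step 3: no stored sense fits; context or world knowledge would be needed.
    return ("unresolved", None)


if __name__ == "__main__":
    print(interpret("get into", "dos"))
    print(interpret("get into", "house"))
```

Because the metaphorical senses are stored in a table, the sketch can only recover metaphors it already knows, which is the same limitation the notes raise for new metaphors.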
Getting Started
The best way to learn about language processing is to write your own computer programs. To do this, users will need access to a computer that can display information on the internet. Anyone with an email account on a personal computer has this type of access. The exercises in this class are written for the Perl programming language. This language is widely available on mainframe computers, and allows users to manipulate strings of text with a modicum of ease. In order to use Perl on a mainframe computer, however, the reader will have to access the computer directly via a terminal emulation program.
The only other item that you will need for Perl programming is a text editor. Text editors provide a means of writing the commands that make up a Perl program. Mainframe computers typically have a program that allows users to write text files. You can also use these programs to write a Perl program. The University of Kansas mainframe uses the Pico and vi editors. Once you have assembled the basic tools for creating Perl programs you are ready to begin language processing.
The image of humans conversing with their computers is both a thoroughly accepted cliche of science fiction and the ultimate goal of computer programming, and yet, the year 2001 has come and gone without the appearance of anything like the HAL 9000 talking computer featured in the movie 2001: A Space Odyssey.
Computational linguists attempt to use computers to process human languages. The field of computational linguistics has two general aims:
The technological. To enable computers to analyze and process natural language.
The psychological. To model human language processing on computers.
From the technological perspective, natural language applications include:
Speech recognition. Today, many personal computers include speech recognition software.
Natural language interfaces to software. For example, demonstration systems have been built that let a user ask for flight information.
Document retrieval and information extraction from written text. For example, a computer system could scan newspaper articles, looking for information about events of a particular type and enter the information into a database.
The rapid growth of the Internet/WWW and the emergence of the information society pose exciting new challenges to computational linguistics. Although the new media combine text, graphics, sound and movies, the whole wealth of multimedia information can only be structured, indexed and navigated through language. For browsing, navigating, filtering and processing the information on the web, we need language technology. The increasingly multilingual nature of the web constitutes an additional challenge for language technology. The multilingual web can only be mastered with the help of multilingual tools for indexing and navigating.
Computational linguists adopting the psychological perspective hypothesize that at some abstract level, the brain is a kind of biological computer, and that an adequate answer to how people understand and generate language must be in terms formal and precise enough to be modeled by a computer.
Semiotics is the theory of signs, and reading signs is a part of everyday life: from road signs that point to a destination, to smoke that warns of fire, to the symbols buried within art and literature. Semiotic theory can, however, appear mysterious and impenetrable. This introductory book decodes that mystery using visual examples instead of abstract theory.
This new edition features an expanded introduction that carefully and clearly presents the world of semiotics before leading into the book’s 76 sections of key semiotic concepts. Each short section begins with a single image or sign, accompanied by a question inviting us to interpret what we are seeing. Turning the page, we can compare our response with the theory behind the sign, and in this way, actively engage in creative thinking.
A fascinating read, this book provides practical examples of how meaning is made in contemporary culture.
It was early 1954 when computer scientists, for the first time, publicly revealed a machine that could translate between human languages. It became known as the Georgetown-IBM experiment: an “electronic brain” that translated sentences from Russian into English.
The scientists believed a universal translator, once developed, would not only give Americans a security edge over the Soviets but also promote world peace by eliminating language barriers.
They also believed this kind of progress was just around the corner: Leon Dostert, the Georgetown language scholar who initiated the collaboration with IBM founder Thomas Watson, suggested that people might be able to use electronic translators to bridge several languages within five years, or even less.
The process proved far slower. (So slow, in fact, that about a decade later, funders of the research launched an investigation into its lack of progress.) And more than 60 years later, a true real-time universal translator — a la C-3PO from Star Wars or the Babel Fish from The Hitchhiker’s Guide to the Galaxy — is still the stuff of science fiction.
…
Stimulating Machines’ Brains
After decades of jumping linguistic and technological hurdles, scientists have settled on a technical approach known as the neural network method, in which machines are trained to emulate the way people think: in essence, creating an artificial version of the neural networks of our brains.
Neurons are nerve cells that are activated by all aspects of a person’s environment, including words. The longer someone exists in an environment, the more elaborate that person’s neural network becomes.
With the neural network method, the machine converts every word into its simplest representation — a vector, the equivalent of a neuron in a biological network, that contains information not only about each word but about a whole sentence or text. In the context of machine learning, a science that has been developed over the years, a neural network produces more accurate results the more translations it attempts, with limited assistance from a human.
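To make "a word as a vector" concrete, here is a minimal sketch. The three-number vectors are hand-made for illustration, whereas a trained network would learn much longer vectors from data. Averaging word vectors to get a sentence vector is a simplification assumed here, not how a translation system actually builds its sentence representation. The names `WORD_VECTORS`, `cosine`, and `sentence_vector` are hypothetical.

```python
import math

# Hand-made 3-number vectors, invented for illustration only.
WORD_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "car": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity: close to 1 when two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sentence_vector(words):
    """Average the word vectors, position by position (a crude summary)."""
    vectors = [WORD_VECTORS[w] for w in words]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


if __name__ == "__main__":
    print(cosine(WORD_VECTORS["cat"], WORD_VECTORS["dog"]))
    print(cosine(WORD_VECTORS["cat"], WORD_VECTORS["car"]))
    print(sentence_vector(["cat", "car"]))
```

In this toy table, "cat" and "dog" score as far more similar to each other than either does to "car", which is the kind of relationship a learned vector space is meant to capture.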
Though machines can now “learn” similarly to the way humans learn, they still face some limits, says Yoshua Bengio, a computer science professor at the University of Montreal who studies neural networks. One of the limits is the sheer amount of data required — children need far less to learn a language than machines do.
The way I thought you used a dictionary was that you looked up words you’ve never heard of, or whose sense you’re unsure of. You would never look up an ordinary word — like example, or sport, or magic — because all you’ll learn is what it means, and that you already know.
…
The New Oxford American dictionary, by the way, is not like singularly bad. Google’s dictionary, the modern Merriam-Webster, the dictionary at dictionary.com: they’re all like this. They’re all a chore to read. There’s no play, no delight in the language. The definitions are these desiccated little husks of technocratic meaningese, as if a word were no more than its coordinates in semantic space.
John McPhee’s secret weapon
John McPhee, one of the great American writers of nonfiction and almost peerless as a prose stylist, once wrote an essay for the New Yorker about his process, called “Draft #4.” He explains that for him, draft #4 is the draft after the painstaking labor of creation is done, when all that’s left is to punch up the language, to replace shopworn words and phrases with stuff that sings.
The way you do it, he says, is “you draw a box not only around any word that does not seem quite right but also around words that fulfill their assignment but seem to present an opportunity.” You go looking for le mot juste.
But where?
“Your destination is the dictionary,” he writes:
Suppose you sense an opportunity beyond the word “intention.” You read the dictionary’s thesaurian list of synonyms: “intention, intent, purpose, design, aim, end, object, objective, goal.” But the dictionary doesn’t let it go at that. It goes on to tell you the differences all the way down the line — how each listed word differs from all the others. Some dictionaries keep themselves trim by just listing synonyms and not going on to make distinctions. You want the first kind, in which you are not just getting a list of words; you are being told the differences in their hues, as if you were looking at the stripes in an awning, each of a subtly different green.
As Blaze Miskulin puts it below, “English is a mutt. And a slut. It was born of the random fucking of multiple cultures, languages, and dialects, and it will hop in bed with any language that tickles its fancy. It’s also a thief. English will blatantly steal any word or phrase that it finds interesting. We like it? Fuck it, it’s English, now.”
I made a minor statement in another post, and I got a reply that struck me as rather odd—both in its content and its somewhat aggressive tone.
I made the comment that English isn’t easy for foreigners to learn, and gave an example of phrasal verbs to illustrate my point. The response was… odd.
So, I thought I would share my experience and insight on EFL (English as a Foreign Language) with the O-Deck at large.
Bastards
The thing that most native English speakers don’t understand is that English isn’t a single language in the way that French, German, and Chinese are. Each of those has a very long history and a high degree of isolation (German maybe less than the others, but still significant).
In the course of my teaching, I often have the opportunity to explain the basic history of English. This is usually prompted by a situation where I have to explain that “well… this word is actually French, and that word is Latin, and we often use them in combination with this other word that comes from German…”
The simplest way I can explain it is that English is the bastard child of a bastard child. There are linguistic historians out there who can explain the details far better than I could ever hope to, but it boils down to this: English is a mutt. And a slut. It was born of the random fucking of multiple cultures, languages, and dialects, and it will hop in bed with any language that tickles its fancy.
It’s also a thief. English will blatantly steal any word or phrase that it finds interesting. We like it? Fuck it, it’s English, now.
I express this in rude terms (something English is excellent for, by the way), but I want to stress that this is one of the things I absolutely *love* about English. And I believe that this is one of the most significant reasons that it has become the lingua franca for the world. French used to be the universal language. But the French became too wrapped up in preserving the “purity” of their language. The world doesn’t want a “pure” language; it wants a slut that accepts any and all comers.
From the Other Side
For about 25 years before coming to China, my work and play revolved around English. I was a competitive speaker, a performer, a speech and drama coach, a published author, a copywriter, an editor, and more. I had a mastery of the English language. And then I came to China and started teaching it to people for whom it was not their mother tongue. I’m not ashamed to admit that I was brought down a few pegs. I was suddenly confronted with people for whom the very basis of the language was utterly alien. Culture guides language, language guides culture and thought.
For example: In which direction does time move?
Ask anyone who speaks a European language, and they will say “Forward, (of course!)”. Things are “ahead of you” or “behind you”. You “move forward” or “go back”. You “look ahead” and “look back”. We have “foresight” and “hindsight”.
In Chinese, time moves down. In Chinese, “next week” is “down one week” (xia ge xing qi —下个星期). Tomorrow, however, is “bright day” (ming tian—明天).
Chinese language “thinks” in a very different way from English—and most European languages.
Put Down the Dictionary
One of the most frustrating things I have to deal with is “the dictionary”. In a recent Forbes article, Amelia Friedman recounted this typical interaction with undergraduates:
We don’t need to be bilingual because we already have the skills we need in the global marketplace. We know how to use Google Translate, we’ve traveled abroad in college, and we watch Anderson Cooper at least twice per week.
[emphasis added]
The most common thing I say in my classes is “Put down the dictionary!” I don’t care how advanced you think your “universal translator” is… it’s wrong. I can’t count the number of times I’ve had someone thrust their phone in my face, open to the dictionary app, saying “But the dictionary says…” only for me to respond: “The dictionary is wrong. If you say that in America (England, Australia, Anywhere), nobody will understand you.”
N.B. Sometimes it’s “amazo-fucking wrong”.
Turn Up, Turn Down, Turn In, Turn Out
One of the areas where communication breaks down is phrasal verbs.
Phrasal verbs (as I noted in my previous post) are real and integral parts of the English language. They are also one example of how English is a difficult language to truly learn.
In the thread that spawned this post, the commenter referred to phrasal verbs as “crazy colloquialisms”. This is a serious misunderstanding of the English language (and, frankly, language in general).
A phrasal verb is when multiple words (a phrase) combine to mean something different from the meanings of the individual words. English isn’t unique in this, but it’s still a factor that makes English difficult to understand.
Phrasal verbs are not something that is “outside” common speech. They are inherent English phrases which use words in ways which do not fall within the expected parameters. They are, quite frankly, the way we speak.
No matter where you go in the world, you will turn in reports, look up information, ask out pretty women, call in sick, turn down invitations, and run out of beer. (Goddammit! Who drank all the beer?!)
That’s a Horse of a Different Color
Idioms: It’s a love-hate relationship. I absolutely love idioms. Not as much as I love colloquialisms (look down yonder), but they make me as happy as a pig in shit.
On the other hand… teaching them as EFL is… well… problematic (to put it politely). While some idioms can be explained, the vast majority of them are simply “common cultural knowledge”. I will say, with authority and conviction: “This phrase means X. I don’t know why, don’t bother asking”.
This is, of course, true of any language. The first time someone said “ma ma hu hu” to me, I had no clue what they were saying. Then a “helpful” (read: snarky asshole) co-worker translated for me. It means… “horse horse tiger tiger”. Ummm…. what?? “Just so-so”. Not a single person I have asked (all native speakers) has been able to explain to me why “horse horse tiger tiger” means “Just so-so”. It’s just an idiom.
As My Granpa Used to Say…
Colloquialisms are, I think, the best part of English. I loves me some local sayings.
For those who don’t know, a colloquialism is a “colorful saying”. They are frequently regional or local. Unlike slang (see down yonder), colloquialisms have a long lifespan—as evidenced by the fact that they are commonly preceded by the phrase “As my grandpa used to say…”
While colloquialisms can nestle into any grammatical corner, the most common (and most fun) fall into the range of metaphors and similes.
“He’s as nervous as a long-tailed cat in a room full of rocking chairs.”
The one I remember from my dad was “She’s five-foot tall and three axe-handles wide” (she’s short and fat).
Because they are regional or local, even native speakers won’t always understand colloquialisms. I ran into that somewhat frequently when I was living in Texas and Virginia (I’m a Wisconsin boy). But then… they often didn’t understand what I was saying.
Between native speakers, colloquialisms are either easy to figure out (there’s enough shared culture) or easily identified as colloquialisms—at which point we just say “What the fuck does that mean?”
Foreigners, however, frequently lack an inherent “feeling” for the language that lets them know that a set of words should not be taken literally. And they certainly don’t have the cultural and regional understanding to correctly interpret what’s being said.
That’s sick
Slang. Oh gods… slang. How do I hate thee? Let me count the ways.
Slang is the bane of anyone trying to teach EFL.
One of the core intents of slang is to make itself unintelligible to “old people” and “uncool people”. So… basically anyone who deals with language in an international context.
Slang has a half-life shorter than most transuranic elements.
I don’t teach slang. I will often refer to slang—but almost always to show how it causes breakdowns in communication, and shouldn’t be used.
My example: At a previous job (selling boats) I did a photoshoot. 3 models volunteered their time in exchange for copies of the photos (TFP; Time for Print (that’s jargon)). After weeding out the bad shots, I sent copies of the photos to the models. I also posted copies online. One of the models commented on a photo of hers with this: “That makes my legs look sick!”
I was taken aback. I thought the photo was quite good and showed her in a positive light. Why was she saying that it made her look ill or deformed? I e-mailed another one of the models and asked her what I had done wrong. She replied: “Sick means really good”.
That’s when I knew I was officially “old”.
The First Shall Be Last, and the Simple Shall Be Complex
Nothing has advanced my understanding of English more than spending 3 years teaching it to Chinese students.
You think you know English? Okay… Find a non-European who is just learning English and try to explain these words to them:
very
so
the
Now…
Explain the difference between “a”[uh] and “a”[ay].
“I have [uh] pencil.”
“I have [ay] pencil.”
To Wrap It Up…
One of the things I stress to my students is that language is not words; language is ideas. I loves me some colloquialisms, but I will get down-right violent about the complete wrongness of using “literally” to mean “figuratively”. It’s not about the words, it’s about the ideas.
Language isn’t about rules or definitions or any of the shit the textbooks insist is ultra-important. Language is about communicating. There are a bazillion tools out there to teach people vocabulary and grammar and all that other textbook stuff.