An Inside Update on Natural Language Processing

by 7wData
July 4, 2016

This article is an interview with computational linguist Jason Baldridge. It’s a good read for data scientists, researchers, software developers, and professionals working in media, consumer insights, and market intelligence. It’s for anyone who’s interested in, or needs to know about, natural language processing (NLP).

Jason and NLP go way back. As a linguistics graduate student at the University of Edinburgh, in 2000, Jason co-created the OpenNLP text-processing framework, now part of Apache. He joined the University of Texas linguistics faculty in 2005 and, a few years back, helped build a text-analytics system for social-media agency Converseon. Jason’s Austin start-up, People Pattern, applies NLP and machine learning for social-audience insights; he co-founded the company in 2013 and serves as chief scientist. Finally, he’ll keynote on “Personality and the Science of Sharing” and teach a tutorial at the 2016 Sentiment Analysis Symposium.

In sum, Jason is an all-around cool guy, and he deserves special recognition for providing the most thorough Q&A responses I have ever received to an interview request. The interview? This one, covering AI, neural networks, computational linguistics, Java vs. Scala, and accuracy evaluation, with a detour into Portuguese-English translation challenges.

Seth Grimes> Let’s jump in the deep end. What’s the state of NLP, of natural language processing?

Jason Baldridge> There’s work to be done.

The first thing to keep in mind is that many of the most interesting NLP tasks are AI-complete. That means we are likely to need representations and architectures that recognize, capture, and learn knowledge about people and the world in order to exhibit human-level competence in these tasks. Do we need to represent word senses, predicate-argument relations, discourse models, etc.? Almost certainly. An optimistic deep learning person might say “the network will learn all that,” but I’m skeptical that a generic model structure will learn all these things from the data that is available to it.

Jason> No, they are a great set of tools and techniques that are providing large improvements for many tasks. But they aren’t magic, and they won’t suddenly solve every problem we throw at them out of the box. When it comes to language, the only competent device we know of for processing human language fully, the human brain, is the result of hundreds of millions of years of evolution. That process has given it a complex architecture that dwarfs the relatively puny networks that are used for language and vision tasks today.

Humans learn language from a surprisingly small amount of data, and they go through different phases in that process, moving from memorization to generalization (including overgeneralization, e.g., “Mommy goed to the store”). I love the boldness and confidence of the neural optimists, but I think we will need to figure out the architectures and the reward mechanisms by which a very deep network processes, represents, stores, and generalizes information, and how all of that relates to language. That will imply choices about how lexicons are stored, how morphological and syntactic regularities are captured, and so on.

Seth> Is there academic computational-linguistics work that you’d call out as interesting, surfaced in NLP software tools or not?

Jason> The vectorization of words and phrases is one of the big overall trends these days: words and phrases are represented as vectors, and those vectors are used as the inputs for NLP tasks. The good part is that the vectors are learned on large, unlabeled corpora. This injects knowledge into supervised learning tasks that have much less data.

For example, “pope,” “catholic,” and “vatican” will have similar vectors, so training examples that have just one of these words will still contribute toward better learning of shared parameters. Without this, a classifier based on bags-of-words sees these words as being as separate as “apple,” “hieroglyph,” and “bucket.”
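
To make the contrast concrete, here is a minimal Python sketch. It is an illustration only, not anything from Jason's systems: the three-dimensional "dense" vectors are made-up numbers chosen by hand, standing in for vectors that would really be learned from a large unlabeled corpus, and the helper name `cosine` is hypothetical. It shows that in a bag-of-words encoding every pair of distinct words has zero similarity, while dense vectors can place related words close together.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

vocab = ["pope", "catholic", "vatican", "apple", "hieroglyph", "bucket"]

# Bag-of-words view: each word gets its own dimension (a one-hot vector),
# so any two different words are orthogonal.
one_hot = {
    word: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, word in enumerate(vocab)
}

# ASSUMPTION: hand-made toy vectors, NOT learned embeddings. Real vectors
# would come from training on unlabeled text and have far more dimensions.
dense = {
    "pope":       [0.9, 0.8, 0.1],
    "catholic":   [0.8, 0.9, 0.2],
    "vatican":    [0.9, 0.7, 0.0],
    "apple":      [0.0, 0.1, 0.9],
    "hieroglyph": [0.1, -0.8, 0.2],
    "bucket":     [-0.7, 0.1, 0.3],
}

for a, b in [("pope", "vatican"), ("pope", "catholic"), ("pope", "apple")]:
    print(f"{a:>5} vs {b:<9} "
          f"one-hot: {cosine(one_hot[a], one_hot[b]):.2f}  "
          f"dense: {cosine(dense[a], dense[b]):.2f}")
```

With one-hot vectors, "pope" is exactly as unrelated to "vatican" as it is to "apple". With the toy dense vectors, the first two pairs score close to 1 and the last scores much lower, which is why a training example containing only one of the related words can still help a model that shares parameters across the vector dimensions.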
