LinkedIn Knowledge Graph – KDnuggets Interview

We interview LinkedIn about its recently published LinkedIn Knowledge Graph, which connects many millions of members, jobs, companies, and more.

LinkedIn recently published The LinkedIn Knowledge Graph (LKG). It is an impressive achievement, connecting 450M members, 190M historical job listings, 9M companies, 200+ countries, 35K skills in 19 languages, 28K schools, 1.5K fields of study, 600+ degrees, 24K titles in 19 languages, and 500+ certificates, among other entities, as of Oct 6, 2016.

I had an opportunity to ask LinkedIn a few questions, and here are the answers from Bee-Chung Chen, Senior Staff Engineer & Applied Researcher at LinkedIn, and Deepak Agarwal, VP of Engineering, Head of Relevance at LinkedIn, two of the leaders of the LKG project.

“Data Scientist” is the canonical form of a title entity in the taxonomy. A member or a job with title string “Data Mining Scientist” is standardized to title “Data Scientist” by our title standardizer (a supervised binary classifier) based on title string features and other member/job metadata (e.g., the skills of the member or the skills required by the job).
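One way to picture this kind of standardizer is as a yes/no decision on a (raw title string, candidate title entity) pair. The sketch below is only an illustration under stated assumptions: the pairwise framing, the two features (token overlap and skill overlap), the example skills, and the weights are all hypothetical stand-ins, not LinkedIn's actual features or model. In a real supervised setup the weights would be learned from labeled data.

```python
import math

def tokens(text):
    """Lowercased whitespace tokens of a title string."""
    return set(text.lower().split())

def jaccard(a, b):
    """Jaccard overlap of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def features(title_string, skills, candidate_title, candidate_skills):
    """Two illustrative features: title-token overlap and skill overlap."""
    return [
        jaccard(tokens(title_string), tokens(candidate_title)),
        jaccard(set(skills), set(candidate_skills)),
    ]

# Hypothetical parameters standing in for weights learned from training data.
WEIGHTS = [3.0, 3.0]
BIAS = -2.0

def is_match(feature_vector, threshold=0.5):
    """Logistic score on the features, thresholded to a binary decision."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, feature_vector))
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability >= threshold

# Hypothetical skill sets, made up for this example.
DATA_SCIENTIST_SKILLS = ["machine learning", "statistics", "data mining"]

close_pair = features(
    "Data Mining Scientist",
    ["machine learning", "statistics", "python"],
    "Data Scientist",
    DATA_SCIENTIST_SKILLS,
)
distant_pair = features(
    "Predictive Analytics Specialist",
    ["statistics", "forecasting"],
    "Data Scientist",
    DATA_SCIENTIST_SKILLS,
)
```

With these made-up inputs, `is_match(close_pair)` accepts the pair and `is_match(distant_pair)` rejects it, because the second title string shares no tokens with the canonical title and only one skill.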

However, not all similar title strings can be mapped to the same entity by this supervised method, e.g., “Predictive Analytics Specialist” is not standardized to “Data Scientist”, partially because collecting high-quality and high-volume training data for this task is challenging.

To augment the binary decision in such an entity-level standardization task, we also provide the similarity among these three title strings in the following two ways simultaneously. First, the LinkedIn title taxonomy has a hierarchical structure, title → super title → function, which enables a higher-level similarity. For example, these three title strings can all belong to the same super title and/or the same function.

Downstream data mining applications can select the most suitable title granularity level.
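The title → super title → function hierarchy can be sketched as a small lookup, where two titles are compared at the finest level they share. This is a minimal sketch, not LinkedIn's implementation: the class, the function names, and every super title and function value below are hypothetical and chosen only to make the example run.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TitlePath:
    """One title entity with its ancestors in the hierarchy."""
    title: str
    super_title: str
    function: str

# Levels ordered from finest to coarsest granularity.
LEVELS = ("title", "super_title", "function")

# Hypothetical taxonomy entries (the super title and function names are made up).
DATA_SCIENTIST = TitlePath("Data Scientist", "Data Specialist", "Engineering")
PREDICTIVE_ANALYST = TitlePath(
    "Predictive Analytics Specialist", "Data Specialist", "Engineering"
)
RECRUITER = TitlePath("Recruiter", "Recruiting Specialist", "Human Resources")

def finest_shared_level(a: TitlePath, b: TitlePath) -> Optional[str]:
    """Return the finest level at which two titles coincide, or None."""
    for level in LEVELS:
        if getattr(a, level) == getattr(b, level):
            return level
    return None

def roll_up(path: TitlePath, level: str) -> str:
    """Project a title onto the granularity a downstream application chose."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    return getattr(path, level)
```

In this toy taxonomy, `finest_shared_level(DATA_SCIENTIST, PREDICTIVE_ANALYST)` is `"super_title"`, so an application that groups by super title treats the two as the same, while one that groups by title keeps them apart.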

Yves Mulkers

Yves Mulkers is the founder of 7wData and a widely followed voice in the data and AI community. He curates the 7wData and AI Beat newsletters, reaching hundreds of thousands of data and AI professionals, and writes on data strategy, analytics, AI, and the evolving data ecosystem.