Mining Information From A Resume

by 7wData
May 29, 2017

This article demonstrates a framework for mining relevant entities from a text resume. It shows how separation of parsing logic from entity specification can be achieved. Although only one resume sample is considered here, the framework can be enhanced further to be used not only for different resume formats, but also for documents such as judgments, contracts, patents, medical papers, etc.

Majority of world’s unstructured data is in the textual form. To make sense of it, one must, either go through it painstakingly or employ certain automated techniques to extract relevant information. Looking at the volume, variety and velocity of such textual data, it is imperative to employ Text Mining techniques to extract the relevant information, transforming unstructured data into structured form, so that further insights, processing, analysis, visualizations are possible.

This article deals with a specific domain, of applicant profiles or resumes. They, as we know, come not only in different file formats (txt, doc, pdf, etc.) but also with different contents and layouts. Such heterogeneity makes extraction of relevant information, a challenging task. Even though it may not be possible to fully extract all the relevant information from all the types of formats, one can get started with simple steps and at least extract whatever is possible from some of the known formats.

Broadly there are two approaches: linguistics based and Machine Learning based. In “linguistic” based approaches pattern searches are made to find key information, whereas in “Machine Learning” approaches supervised-unsupervised methods are used to extract the information. “Regular expression” (RegEx), used here, is one of the “linguistic” based pattern-matching method.

A primitive way of implementing entity extraction in a resume could be to write the pattern-matching logic for each entity, in a code-program, monolithically. In case of any change in the patterns, or if there is an introduction of new entities/patterns, one needs to change the code-program. This makes maintenance cumbersome as the complexity increases. To alleviate this problem, separation of parsing-logic and specification of entities is proposed in a framework, which is demonstrated below.

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Mining Information From A Resume

Leave a Reply Cancel reply

Upcoming Events