400 Categorized Job Titles for Data Scientists
- by 7wData
This article lists job titles for data scientists and describes the simple but powerful classifier used to categorize them. The analysis provides a breakdown per job category, granular reports that you can download for free (job titles broken down by company, category and level), and NLP (natural language processing) source code. It is based on connections from multiple LinkedIn profiles, totaling more than 10,000 professionals. The first study was published in June 2013.
The table below shows the top job titles in the business analytics category. The full list has 700+ job titles shared by at least two practitioners, across 11 categories.
The full table can be downloaded here (Excel spreadsheet). If you include job titles held by only one person, there are 7,000+ job titles: this is another example of a system governed by a Zipf distribution, with a very long tail. A more detailed spreadsheet (including job title, job category, level, and company name) is available exclusively to DSC members. If you are not yet a member, you can sign up here to access the spreadsheet.
We analyzed the LinkedIn data (connections with job title and company, from well-connected data scientists), cleaned the job title field, and created three extra fields: job category, level, and cleaned job title.
To identify job categories and levels, we first created a data dictionary of all one-token and two-token keywords found in job titles, ranked by frequency, after filtering out tokens that carry no meaning on their own (such as vice, which in job titles always appears as part of vice president).
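The dictionary step can be sketched in a few lines. The original analysis used Perl; the Python below is only an illustrative sketch, not the author's script. The function names, the sample titles, and the filter lists are assumptions, apart from "vice", which the text names explicitly.

```python
import re
from collections import Counter

# Assumed filter lists: only "vice" comes from the article.
FUNCTION_WORDS = {"of", "and", "the", "for"}
STOP_TOKENS = FUNCTION_WORDS | {"vice"}

def tokenize(title):
    """Lowercase a job title and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", title.lower())

def keyword_dictionary(titles):
    """Count one-token and two-token keywords, most frequent first."""
    counts = Counter()
    for title in titles:
        tokens = tokenize(title)
        # One-token keywords, minus tokens that mean nothing alone.
        counts.update(t for t in tokens if t not in STOP_TOKENS)
        # Two-token keywords: adjacent pairs without function words,
        # so "vice president" survives even though "vice" alone does not.
        counts.update(
            f"{a} {b}"
            for a, b in zip(tokens, tokens[1:])
            if a not in FUNCTION_WORDS and b not in FUNCTION_WORDS
        )
    return counts.most_common()

if __name__ == "__main__":
    sample = [
        "Vice President of Analytics",
        "Data Scientist",
        "Senior Data Scientist",
        "Vice President, Data Science",
    ]
    for keyword, freq in keyword_dictionary(sample)[:5]:
        print(freq, keyword)
```

Reviewing the top of such a ranked list by hand is what suggests which keywords should drive the category and level rules.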
The job categories, levels and cleaned job titles were computed with the Perl script in section 3. While this is a clustering problem (creating a taxonomy of job titles for data scientists), our simple and scalable approach makes it look, from a computational point of view, more like an indexing problem than pure clustering.
The idea was to write a script quickly and produce the results in less than two hours of work, from start to finish. The input file jobs.txt contains the raw job title and company, as entered by LinkedIn connections. The first step uses regular expressions to clean the job titles. If you are unfamiliar with this type of code, read our data science cheat sheet first. Note that the many "if" statements in the code are in hierarchical order: the first matching rule wins, so you cannot reorder them without changing the results.
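The two ideas in this paragraph, regex cleaning followed by an ordered cascade of rules, can be illustrated as follows. This is a Python sketch under stated assumptions, not the Perl script itself: the abbreviations, keywords, category names and level names are all hypothetical (only "business analytics" appears in the article as a category).

```python
import re

def clean_title(raw):
    """Normalize a raw job title with a few regular expressions."""
    title = raw.lower()
    title = re.sub(r"[^a-z0-9&/ ]+", " ", title)        # drop punctuation
    title = re.sub(r"\bsr\b", "senior", title)          # assumed abbreviation
    title = re.sub(r"\bvp\b", "vice president", title)  # assumed abbreviation
    return re.sub(r"\s+", " ", title).strip()           # collapse whitespace

# Ordered rules: specific keywords must come before general ones,
# which plays the role of the hierarchical "if" statements.
# All keywords and labels below are hypothetical.
CATEGORY_RULES = [
    ("data scien", "data science"),
    ("business analy", "business analytics"),
    ("analy", "analytics"),
    ("data", "data management"),
]
LEVEL_RULES = [
    ("vice president", "vp"),
    ("president", "president"),
    ("director", "director"),
    ("senior", "senior"),
]

def first_match(title, rules, default):
    """Return the label of the first rule whose keyword occurs in title."""
    for keyword, label in rules:
        if keyword in title:
            return label
    return default

def categorize(raw):
    """Return (cleaned title, category, level) for one raw job title."""
    title = clean_title(raw)
    return (
        title,
        first_match(title, CATEGORY_RULES, "other"),
        first_match(title, LEVEL_RULES, "unspecified"),
    )

if __name__ == "__main__":
    for raw in ["Sr. Business Analyst, BI", "VP, Analytics", "Data Scientist"]:
        print(categorize(raw))
```

In this sketch, moving the "analy" rule above "business analy" would send every business analyst to the generic analytics category, and moving "president" above "vice president" would mislabel every vice president. That is the kind of problem reordering causes.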