Programming in ‘natural’ language is coming sooner than you think
- by 7wData
Sometimes major shifts happen virtually unnoticed. On May 5, IBM announced Project CodeNet to very little media or academic attention.
CodeNet is a follow-up to ImageNet, a large-scale dataset of labelled images that is free for non-commercial use. ImageNet is now central to progress in deep-learning computer vision.
CodeNet is an attempt to do for Artificial Intelligence (AI) coding what ImageNet did for computer vision: it is a dataset of over 14 million code samples, covering 50 programming languages, written to solve 4,000 coding problems. The dataset also contains a wealth of additional metadata, such as the amount of memory required to run each piece of software and the log output of running code.
IBM’s own stated rationale for CodeNet is that it is designed to swiftly update legacy systems programmed in outdated code, a development long-awaited since the Y2K panic over 20 years ago, when many believed that undocumented legacy systems could fail with disastrous consequences.
However, as security researchers, we believe the most important implication of CodeNet — and similar projects — is the potential for lowering barriers, and the possibility of Natural Language Coding (NLC).
In recent years, companies such as OpenAI and Google have been rapidly improving Natural Language Processing (NLP) technologies: machine-learning programs designed to understand and mimic natural human language and to translate between languages. Training such systems requires access to large datasets of text written in the desired human languages. NLC applies all of this to code as well.
Coding is a difficult skill to learn, let alone master, and an experienced coder is expected to be proficient in multiple programming languages. NLC, in contrast, leverages NLP technologies and a vast database such as CodeNet to let anyone use English (or, ultimately, French, Chinese or any other natural language) to code. It could make a task like designing a website as simple as typing "make a red background with an image of an airplane on it, my company logo in the middle and a contact me button underneath", and having that exact website spring into existence, the result of automatic translation of natural language to code.
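To make that translation step concrete, here is a toy sketch in Python. It is not a real NLC system: a real one would use a large language model trained on a dataset like CodeNet to interpret the request, whereas here the "understanding" step is faked with a hand-written spec extracted from the example request above (the asset file names are hypothetical).

```python
# Toy illustration of natural-language-to-code translation, NOT a real NLC system.
# Step 1 (faked here): an NLP model would parse the user's request into a
# structured spec. This spec is hand-written from the example request in the text.
spec = {
    "background": "red",
    "image": "airplane.png",  # hypothetical asset name
    "logo": "logo.png",       # hypothetical asset name
    "button": "Contact me",
}

def generate_page(spec: dict) -> str:
    """Step 2: translate the structured spec into code (here, HTML)."""
    return (
        f'<body style="background: {spec["background"]}; text-align: center">\n'
        f'  <img src="{spec["image"]}" alt="airplane">\n'
        f'  <img src="{spec["logo"]}" alt="company logo">\n'
        f'  <button>{spec["button"]}</button>\n'
        f'</body>'
    )

print(generate_page(spec))
```

The point of the sketch is the pipeline shape: natural language in, structured intent in the middle, executable code out. The hard research problem that CodeNet-scale data targets is the first step, which this toy skips entirely.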
It is clear that IBM is not alone in its thinking. GPT-3, OpenAI's industry-leading NLP model, has already been used to code a website or app from a written description of what you want, and Microsoft has secured exclusive rights to the model.