Where GREP Came From - Computerphile
TLDR
The script delves into the history and functionality of 'grep', a UNIX command-line tool introduced in the 1970s for pattern searching in text files. It discusses the limitations of early computing, the creation of the 'ed' text editor by Ken Thompson, and the introduction of regular expressions. The story of 'grep' being developed by Thompson to assist a colleague in analyzing the Federalist Papers is highlighted, showcasing Thompson's ingenuity in repurposing 'ed' commands into a new tool. The summary also touches on an assignment given by the speaker in 1993, where students were challenged to convert 'ed' into 'grep' in C, emphasizing the evolution from assembly language to C.
Takeaways
- 'grep' is a command used for searching text patterns in files, introduced in the early 1970s.
- It can handle an unbounded number of files and is often used in Unix pipelines.
- The origin of 'grep' is attributed to the need for a tool to search through large documents that could not be managed by the 'ed' text editor.
- The 'ed' editor was simple and straightforward, reflecting the limited computing resources and personal tastes of its creators, Ken Thompson and Dennis Ritchie.
- 'ed' operated line by line and used single-letter commands, including the 'g' command for global operations on lines matching a regular expression.
- Regular expressions, a method for specifying text patterns, were integrated into 'ed' and later became a fundamental part of 'grep'.
- Ken Thompson is credited with quickly creating 'grep' to help a colleague, Lee McMahon, analyze the Federalist Papers.
- The Federalist Papers analysis project was a catalyst for the creation of 'grep', as Lee needed to search through a large document set.
- 'grep' was initially written in PDP-11 assembly language, reflecting the constraints of early computing systems.
- The 'g/re/p' command in 'ed', which prints lines matching a regular expression, is the direct predecessor to the 'grep' command.
- A real-life assignment given by the speaker involved converting 'ed' source code into 'grep', highlighting the evolution from 'ed' to 'grep'.
Q & A
What is the primary function of 'grep'?
-'grep' is a command-line utility used for searching plain-text data sets for lines that match a regular expression. It is used to search for patterns of text in one or more files or input streams.
What is the origin of the name 'grep'?
-The name 'grep' comes from the 'ed' command form 'g/re/p', where 'g' stands for global, 're' for regular expression, and 'p' for print. The command finds every line in the file that matches the pattern and prints it.
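As a rough illustration of what 'g/re/p' does, here is a minimal Python sketch. It assumes the text has already been loaded into an in-memory list of lines, much as 'ed' keeps a buffer. The function name `global_print` and the sample buffer are hypothetical, and Python's `re` syntax is used in place of the regular expression syntax 'ed' itself accepted.

```python
import re

def global_print(buffer, pattern):
    """Mimic ed's g/re/p: return every buffered line that matches pattern."""
    regex = re.compile(pattern)  # Python regex syntax, an assumption of this sketch
    return [line for line in buffer if regex.search(line)]

# Hypothetical buffer contents, standing in for a file loaded into the editor.
buffer = [
    "the quick brown fox",
    "jumps over",
    "the lazy dog",
]

# Equivalent in spirit to typing g/^the/p inside the editor.
for line in global_print(buffer, r"^the"):
    print(line)
```

The 'g' part is the loop over every line, the 're' part is the compiled pattern, and the 'p' part is the final print.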
What was the computing environment like in the early days of UNIX?
-In the early days of UNIX, the computing environment was characterized by limited computing resources with machines like the PDP-11, which had very little memory, typically around 32K to 64K bytes, and very small secondary storage, a few megabytes of disk.
Who were the main contributors to the early UNIX system?
-The main contributors to the early UNIX system were Ken Thompson and Dennis Ritchie, who were responsible for the development of many of the system's utilities and programming languages.
What was the 'ed' text editor, and how was it used?
-The 'ed' text editor was a simple, line-oriented text editor written by Ken Thompson. It was used for editing text one line at a time, with single-letter commands such as 'p' for print, 'd' for delete, 's' for substitute, and 'a' for append.
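To make the line-oriented, single-letter command style concrete, the sketch below interprets a very small subset of 'ed'-like commands over a list of lines. It is an illustrative toy and not how 'ed' is implemented: the function name `run_ed_command` is hypothetical, only numeric addresses with 'p', 'd', and 's/old/new/' are handled, and patterns follow Python's `re` syntax.

```python
import re

def run_ed_command(buffer, command):
    """Apply one ed-style command to buffer (a list of lines).

    Supports 'Np', 'N,Mp', 'Nd', 'N,Md' and 'Ns/old/new/' with 1-based
    line numbers. Returns the lines that would be printed.
    """
    m = re.fullmatch(r"(\d+)(?:,(\d+))?([pds])(.*)", command)
    if m is None:
        raise ValueError("?")  # ed reports errors with a bare question mark
    first = int(m.group(1))
    last = int(m.group(2)) if m.group(2) else first
    op, rest = m.group(3), m.group(4)
    if not (1 <= first <= last <= len(buffer)):
        raise ValueError("?")

    if op in "pd" and rest:
        raise ValueError("?")
    if op == "p":
        return buffer[first - 1:last]
    if op == "d":
        del buffer[first - 1:last]
        return []

    # op == "s": replace the first match of old on each addressed line.
    parts = rest.split("/")
    if len(parts) < 3 or parts[0] != "":
        raise ValueError("?")
    old, new = parts[1], parts[2]
    for i in range(first - 1, last):
        buffer[i] = re.sub(old, new, buffer[i], count=1)
    return []

lines = ["alpha", "beta", "gamma"]
print(run_ed_command(lines, "1,2p"))  # print lines 1 through 2
run_ed_command(lines, "2s/e/E/")      # substitute on line 2
run_ed_command(lines, "1d")           # delete line 1
print(lines)
```

Each command is an optional line address followed by one letter, which is the shape the 'g/re/p' form builds on.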
What is a regular expression, and how was it used in the 'ed' editor?
-A regular expression is a pattern of text used to specify, search, and manipulate strings of text. In the 'ed' editor, regular expressions were used to match patterns within the text, allowing for operations like search and replace to be performed on the matching lines.
Why was 'grep' created, and what was its initial purpose?
-'grep' was created by Ken Thompson to help Lee McMahon analyze the Federalist Papers. It was designed to search through multiple documents and find all occurrences of a particular regular expression, which was not possible with the 'ed' editor due to memory limitations.
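The memory point can be sketched in Python: instead of loading a whole document into a buffer, a grep-like tool can read its inputs one line at a time and keep only the current line. This is a simplified illustration under assumptions, not the original implementation. The function name `grep`, the file names, and the sample text are hypothetical, and `io.StringIO` stands in for files opened from disk.

```python
import io
import re

def grep(pattern, named_streams):
    """Yield (name, line) for each matching line, reading one line at a time."""
    regex = re.compile(pattern)
    for name, stream in named_streams:
        for line in stream:  # only the current line is held in memory
            line = line.rstrip("\n")
            if regex.search(line):
                yield name, line

# Hypothetical inputs standing in for several documents on disk.
inputs = [
    ("first.txt", io.StringIO("one fish\ntwo fish\n")),
    ("second.txt", io.StringIO("red fish\nblue bird\n")),
]

for name, line in grep(r"fish", inputs):
    print(f"{name}:{line}")
```

Because nothing is buffered beyond a single line, the size and number of inputs are limited by time and not by memory, which is the property the editor lacked.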
What was the significance of the Federalist Papers in the context of 'grep'?
-The Federalist Papers were significant because they were the large dataset that Lee McMahon wanted to analyze to determine the authors of the documents. The inability to edit the entire collection in 'ed' due to memory constraints led to the creation of 'grep'.
What was the original programming language of 'grep', and how has it evolved?
-The original 'grep' was written in PDP-11 assembly language. It has since evolved and is now commonly implemented in higher-level languages like C for broader compatibility and ease of use.
How did the speaker use the 'ed' editor and 'grep' in a programming assignment?
-The speaker assigned students the task of converting the 'ed' editor's source code, which was about 1800 lines of C, into a 'grep' program in C. This exercise was meant to demonstrate the relationship between the two utilities and challenge the students to replicate 'grep's functionality.
Outlines
Introduction to 'grep' and Its Origin
The script begins with an introduction to 'grep', a command-line utility for searching text patterns in files, which has been a staple in UNIX systems since the 1970s. It highlights 'grep's ability to filter through vast amounts of input, whether from files or other programs, and contrasts this with the limitations of text editors. The origin of 'grep's quirky name is teased as a point of interest for the discussion. The historical context of early UNIX systems, with their limited computing power and memory, is set, mentioning the PDP-11 computer and the influence of Ken Thompson and Dennis Ritchie on the simplicity of UNIX software. The 'ed' text editor, created by Ken Thompson, is described, emphasizing its minimalist design and the use of single-letter commands, which was reflective of the era's technology and user preferences.
The 'ed' Editor and the Birth of 'grep'
This paragraph delves deeper into the 'ed' editor, explaining its functionality and the use of regular expressions to match text patterns, which was a sophisticated feature compared to shell wildcards. The limitations of 'ed' when dealing with large files due to memory constraints are discussed, leading to the story of Lee McMahon, who needed to analyze large documents like the Federalist Papers. McMahon's challenge with 'ed' and his conversation with Ken Thompson set the stage for the creation of 'grep'. Thompson's quick development of 'grep', which could search through multiple documents for regular expressions, is highlighted. The narrative describes the 'g' command in 'ed' as the inspiration for 'grep', showing Thompson's ingenuity in repurposing existing tools. The story concludes with a personal anecdote from the script's narrator about an assignment given to students to recreate 'grep' from 'ed', underscoring the complexity of Thompson's achievement.
Keywords
grep
UNIX
PDP-11
ed
regular expressions
Federalist Papers
Ken Thompson
Dennis Ritchie
Natural Language Processing
assembly language
C programming language
Highlights
Introduction to 'grep', a command used for searching text patterns in files, dating back to the early 1970s.
'grep' allows for pattern searching in multiple files or input from other programs, a feature not easily replicated in text editors.
The origin of 'grep' is explored, revealing its creation during the early days of UNIX.
UNIX initially ran on a PDP-11, a machine with limited computing power and memory.
Early UNIX software was simple and straightforward, reflecting the hardware limitations and the tastes of its creators, Ken Thompson and Dennis Ritchie.
The 'ed' text editor, written by Ken Thompson, was a basic, line-oriented editor that minimized the use of paper.
The 'ed' editor used single-letter commands and supported line addressing for operations like printing or deleting specific lines.
Regular expressions were introduced in 'ed', allowing for pattern matching in text editing.
The 'g' command in 'ed' was used for global operations on lines matching a regular expression, which inspired the creation of 'grep'.
Lee McMahon's interest in text analysis led to the development of 'grep' for handling large documents that could not fit in 'ed'.
'grep' was created by Ken Thompson to search through documents for occurrences of a specific regular expression.
The name 'grep' is derived from the 'g/re/p' command in 'ed', signifying global search, regular expression, and print.
Ken Thompson's quick development of 'grep' is highlighted as an example of his programming genius.
A historical anecdote from 1993 involves converting the 'ed' source code into a 'grep' program as a class assignment.
The original 'grep' was written in PDP-11 assembly language, while the class assignment required conversion to C.
The challenge of the class assignment was for students to replicate 'grep' functionality without Ken Thompson's expertise.