Where GREP Came From - Computerphile

Computerphile
6 Jul 2018, 10:06

TLDR: The script covers the history and function of 'grep', a UNIX command-line tool introduced in the early 1970s for searching text files for patterns. It discusses the limitations of early computing, the 'ed' text editor written by Ken Thompson, and the role of regular expressions. It highlights the story of Thompson developing 'grep' to help a colleague analyze the Federalist Papers, which shows his ingenuity in repurposing an 'ed' command as a standalone tool. It also touches on an assignment the speaker gave in 1993, in which students were challenged to convert 'ed' into 'grep' in C, whereas the original 'grep' was written in assembly language.

Takeaways
  • 'grep' is a command used for searching text patterns in files, introduced in the early 1970s.
  • It can handle an unbounded number of files and is often used in Unix pipelines.
  • The origin of 'grep' is attributed to the need for a tool to search through large documents that could not be managed by the 'ed' text editor.
  • The 'ed' editor was simple and straightforward, reflecting the limited computing resources and personal tastes of its creators, Ken Thompson and Dennis Ritchie.
  • 'ed' operated line by line and used single-letter commands, including the 'g' command for global operations on lines matching a regular expression.
  • Regular expressions, a method for specifying text patterns, were integrated into 'ed' and later became a fundamental part of 'grep'.
  • Ken Thompson is credited with quickly creating 'grep' to help a colleague, Lee McMahon, analyze the Federalist Papers.
  • The Federalist Papers analysis project was a catalyst for the creation of 'grep', as Lee needed to search through a large document set.
  • 'grep' was initially written in PDP-11 assembly language, reflecting the constraints of early computing systems.
  • The 'g/re/p' command in 'ed', which prints lines matching a regular expression, is the direct predecessor to the 'grep' command.
  • A real-life assignment given by the speaker involved converting 'ed' source code into 'grep', highlighting the evolution from 'ed' to 'grep'.
Q & A
  • What is the primary function of 'grep'?

    -'grep' is a command-line utility used for searching plain-text data sets for lines that match a regular expression. It is used to search for patterns of text in one or more files or input streams.

  • What is the origin of the name 'grep'?

    -The name 'grep' is derived from the command 'g' in the 'ed' text editor, which stands for 'global'. It was used in the form 'g/re/p', where 're' stands for regular expression and 'p' for print, to find all occurrences of a pattern and print them.

  • What was the computing environment like in the early days of UNIX?

    -In the early days of UNIX, the computing environment was characterized by limited computing resources with machines like the PDP-11, which had very little memory, typically around 32K to 64K bytes, and very small secondary storage, a few megabytes of disk.

  • Who were the main contributors to the early UNIX system?

    -The main contributors to the early UNIX system were Ken Thompson and Dennis Ritchie, who were responsible for the development of many of the system's utilities and programming languages.

  • What was the 'ed' text editor, and how was it used?

    -The 'ed' text editor was a simple, line-oriented text editor written by Ken Thompson. It was used for editing text one line at a time, with single-letter commands such as 'p' for print, 'd' for delete, 's' for substitute, and 'a' for append (adding text).

  • What is a regular expression, and how was it used in the 'ed' editor?

    -A regular expression is a pattern of text used to specify, search, and manipulate strings of text. In the 'ed' editor, regular expressions were used to match patterns within the text, allowing for operations like search and replace to be performed on the matching lines.

  • Why was 'grep' created, and what was its initial purpose?

    -'grep' was created by Ken Thompson to help Lee McMahon analyze the Federalist Papers. It was designed to search through multiple documents and find all occurrences of a particular regular expression, which was not possible with the 'ed' editor due to memory limitations.

  • What was the significance of the Federalist Papers in the context of 'grep'?

    -The Federalist Papers were significant because they were the large dataset that Lee McMahon wanted to analyze to determine the authors of the documents. The inability to edit the entire collection in 'ed' due to memory constraints led to the creation of 'grep'.

  • What was the original programming language of 'grep', and how has it evolved?

    -The original 'grep' was written in PDP-11 assembly language. It has since evolved and is now commonly implemented in higher-level languages like C for broader compatibility and ease of use.

  • How did the speaker use the 'ed' editor and 'grep' in a programming assignment?

    -The speaker assigned students the task of converting the 'ed' editor's source code, which was about 1800 lines of C, into a 'grep' program in C. This exercise was meant to demonstrate the relationship between the two utilities and challenge the students to replicate 'grep's functionality.

Outlines
00:00
Introduction to 'grep' and Its Origin

The script begins with an introduction to 'grep', a command-line utility for searching text patterns in files, which has been a staple in UNIX systems since the 1970s. It highlights 'grep's ability to filter through vast amounts of input, whether from files or other programs, and contrasts this with the limitations of text editors. The origin of 'grep's quirky name is teased as a point of interest for the discussion. The historical context of early UNIX systems, with their limited computing power and memory, is set, mentioning the PDP 11 computer and the influence of Ken Thompson and Dennis Ritchie on the simplicity of UNIX software. The 'ed' text editor, created by Ken Thompson, is described, emphasizing its minimalist design and the use of single-letter commands, which was reflective of the era's technology and user preferences.

05:04
The 'ed' Editor and the Birth of 'grep'

This paragraph delves deeper into the 'ed' editor, explaining its functionality and the use of regular expressions to match text patterns, which was a sophisticated feature compared to shell wildcards. The limitations of 'ed' when dealing with large files due to memory constraints are discussed, leading to the story of Lee McMahon, who needed to analyze large documents like the Federalist Papers. McMahon's challenge with 'ed' and his conversation with Ken Thompson set the stage for the creation of 'grep'. Thompson's quick development of 'grep', which could search through multiple documents for regular expressions, is highlighted. The narrative describes the 'g' command in 'ed' as the inspiration for 'grep', showing Thompson's ingenuity in repurposing existing tools. The story concludes with a personal anecdote from the script's narrator about an assignment given to students to recreate 'grep' from 'ed', underscoring the complexity of Thompson's achievement.
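The memory point above can be illustrated with a small sketch. This is not Thompson's implementation (the original was PDP-11 assembly); it is a minimal Python illustration, assuming a hypothetical helper named `grep_streams` and made-up file names and text. It reads each input one line at a time, so memory use does not grow with the size or number of documents, unlike an editor that must load the whole text into its buffer.

```python
import io
import re


def grep_streams(pattern, named_streams):
    """Yield (name, line_number, line) for each line that matches pattern.

    Hypothetical helper for illustration: each stream is read one line
    at a time, so only the current line is held in memory.
    """
    regex = re.compile(pattern)
    for name, stream in named_streams:
        for number, line in enumerate(stream, start=1):
            if regex.search(line):
                yield name, number, line.rstrip("\n")


# Stand-in "files"; the names and contents are invented for this example.
papers = [
    ("paper1.txt", io.StringIO("a first line\nupon the whole\n")),
    ("paper2.txt", io.StringIO("whilst we wait\nupon reflection\n")),
]

hits = list(grep_streams(r"upon", papers))
for name, number, line in hits:
    print(f"{name}:{number}:{line}")
```

Because the function is a generator over any iterable of lines, the same code would work on real open files or on the output of another program.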

Keywords
grep
Grep is a command-line utility for searching plain-text data for lines that match a regular expression. It is a fundamental tool in UNIX and UNIX-like operating systems. In the context of the video, 'grep' is introduced as a tool that allows users to search for patterns of text in files or input streams, which is crucial for tasks that would be cumbersome in a text editor. The name 'grep' is derived from the 'g' (global) command in the 'ed' text editor, which was used to apply operations globally on lines matching a regular expression.
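As a rough illustration of what 'g/re/p' does, the sketch below applies "print" to every line of a buffer that matches a regular expression. The function name `global_print` and the sample lines are assumptions made for this example, and Python's `re` dialect is richer than the regular expressions 'ed' supported.

```python
import re


def global_print(lines, pattern):
    """Return every line containing a match for pattern, in the spirit of g/re/p.

    Hypothetical helper: 'g' visits all lines, 're' is the pattern,
    and 'p' (print) is modeled by collecting the matching lines.
    """
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


# A made-up buffer of text lines.
buffer = [
    "print the report",
    "delete the draft",
    "reprint on request",
]

matches = global_print(buffer, r"print")
for line in matches:
    print(line)
```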
UNIX
UNIX is a family of multitasking, multi-user computer operating systems that derive from the original UNIX System developed in the 1970s at the Bell Labs. The video discusses the early days of UNIX, highlighting the limitations of computing power and memory, which influenced the design of simple and straightforward software like 'grep'. UNIX is the environment where 'grep' was first developed and is still widely used today.
PDP-11
The PDP-11 is a series of 16-bit minicomputers that Digital Equipment Corporation (DEC) sold from 1970 into the 1990s. In the video, the PDP-11 is mentioned as the computer on which early UNIX ran, emphasizing its limited computing power and memory, which influenced the development of early UNIX software, including 'grep'.
ed
Ed is a line-oriented text editor in UNIX that is used for editing text files. It is known for its simplicity and was written by Ken Thompson. The video explains that 'ed' was the standard text editor in the early days of UNIX and that it introduced the concept of regular expressions, which are crucial for the functionality of 'grep'. The 'ed' editor's commands and its influence on 'grep' are discussed in detail.
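To make the line-oriented model concrete, here is a toy sketch of an 'ed'-style buffer. The class `LineBuffer` and its methods are hypothetical and named after the single-letter commands they imitate; real 'ed' parses typed commands such as `2d` or `1,2s/a/A/`, which this sketch does not attempt.

```python
import re


class LineBuffer:
    """Hypothetical toy model of an ed-style buffer; lines are addressed from 1."""

    def __init__(self, lines):
        self.lines = list(lines)

    def p(self, start, end=None):
        """Print (here: return) lines start..end inclusive."""
        end = start if end is None else end
        return self.lines[start - 1:end]

    def d(self, start, end=None):
        """Delete lines start..end inclusive."""
        end = start if end is None else end
        del self.lines[start - 1:end]

    def s(self, start, end, pattern, replacement):
        """Substitute the first match of pattern on each addressed line."""
        for i in range(start - 1, end):
            self.lines[i] = re.sub(pattern, replacement, self.lines[i], count=1)


buf = LineBuffer(["alpha", "beta", "gamma", "delta"])
buf.d(2)                  # roughly ed's "2d"
buf.s(1, 2, r"a", "A")    # roughly ed's "1,2s/a/A/"
print(buf.p(1, 3))        # roughly ed's "1,3p"
```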
regular expressions
Regular expressions are a powerful tool used in text processing for matching patterns of text. They allow for complex searches and manipulations of text based on specific rules. In the video, regular expressions are described as a way to specify patterns of text in the 'ed' text editor, which is foundational to the functionality of 'grep'. The script provides examples of how regular expressions were used in 'ed' to match words like 'print'.
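A short comparison may help show how a regular expression differs from a shell-style wildcard. The word list is invented for this example, and the sketch uses Python's `re` and `fnmatch` modules as stand-ins; their dialects are not identical to those of 'ed' or the UNIX shell.

```python
import fnmatch
import re

words = ["print", "printer", "sprint", "point"]

# Shell-style wildcard: "*" stands for any run of characters,
# and the pattern must cover the whole string.
wildcard_hits = [w for w in words if fnmatch.fnmatchcase(w, "p*nt")]

# Regular expression: "*" means "zero or more of the preceding item",
# and "^" anchors the match to the start of the line.
regex_hits = [w for w in words if re.search(r"^s*print", w)]

print(wildcard_hits)
print(regex_hits)
```

The wildcard accepts "point" because `*` can absorb any characters, while the regular expression states more precisely which characters may precede "print".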
Federalist Papers
The Federalist Papers are a series of 85 articles advocating for the ratification of the United States Constitution. In the video, the Federalist Papers are used as an example of a large text corpus that Lee McMahon wanted to analyze using textual analysis techniques. The inability to edit these documents in 'ed' due to memory limitations led to the creation of 'grep'.
Ken Thompson
Ken Thompson is an American pioneer in computer science. He was one of the creators of UNIX and the 'ed' text editor. The video highlights Thompson's role in developing 'grep', showcasing his ability to quickly create a program that met a specific need, in this case, searching large text files.
Dennis Ritchie
Dennis Ritchie was an American computer scientist who co-created the C programming language and helped develop UNIX. Although not directly mentioned in the context of 'grep', Ritchie's influence on UNIX and its tools is implied through his partnership with Ken Thompson in the development of UNIX and its associated software.
Natural Language Processing
Natural Language Processing (NLP) is a field of computer science that focuses on the interaction between computers and human language. In the video, Lee McMahon's interest in analyzing the Federalist Papers is described as an early form of NLP, where he sought to determine the authors of the documents through textual analysis.
assembly language
Assembly language is a low-level programming language used to instruct a computer's hardware directly. The video mentions that the original 'grep' was written in PDP 11 assembly language, emphasizing the technical skill required to create such a utility in the early days of computing.
C programming language
C is a general-purpose, procedural computer programming language developed in the early 1970s by Dennis Ritchie. The video mentions a class assignment in which the C source code of the 'ed' text editor was converted into a 'grep' program, in contrast to the original 'grep', which was written in assembly language.
Highlights

Introduction to 'grep', a command used for searching text patterns in files, dating back to the early 1970s.

'grep' allows for pattern searching in multiple files or input from other programs, a feature not easily replicated in text editors.

The origin of 'grep' is explored, revealing its creation during the early days of UNIX.

At the time, UNIX ran on a PDP-11, a machine with limited computing power and memory.

Early UNIX software was simple and straightforward, reflecting the hardware limitations and the tastes of its creators, Ken Thompson and Dennis Ritchie.

The 'ed' text editor, written by Ken Thompson, was a basic, line-oriented editor that minimized the use of paper.

The 'ed' editor used single-letter commands and supported line addressing for operations like printing or deleting specific lines.

Regular expressions were introduced in 'ed', allowing for pattern matching in text editing.

The 'g' command in 'ed' was used for global operations on lines matching a regular expression, which inspired the creation of 'grep'.

Lee McMahon's interest in text analysis led to the development of 'grep' for handling large documents that could not fit in 'ed'.

'grep' was created by Ken Thompson to search through documents for occurrences of a specific regular expression.

The name 'grep' is derived from the 'g/re/p' command in 'ed', signifying global search, regular expression, and print.

Ken Thompson's quick development of 'grep' is highlighted as an example of his programming genius.

A historical anecdote from 1993 involves converting the 'ed' source code into a 'grep' program as a class assignment.

The original 'grep' was written in PDP-11 assembly language, while the class assignment required conversion to C.

The challenge of the class assignment was to replicate 'grep' functionality despite not being Ken Thompson.
