Beware global search and replace!

I’m old enough to remember when cutting and pasting were really done with scissors and glue (or Scotch tape). When I was a graduate student in the late 1970s, few physicists typed their own papers, and if they did they left gaps in the text, to be filled in later with handwritten equations. The gold standard of technical typing was the IBM Correcting Selectric II typewriter. Among its innovations was the correction ribbon, which allowed one to remove a typo with the touch of a key. But it was especially important for scientists that the Selectric could type mathematical characters, including Greek letters.

IBM Selectric typeballs

It wasn’t easy. Many different typeballs were available, to support various fonts and special characters. Typing a displayed equation or in-line equation usually involved swapping back and forth between typeballs to access all the needed symbols. Most physics research groups had staff who knew how to use the IBM Selectric and spent much of their time typing manuscripts.

Though the IBM Selectric was used by many groups, typewriters have unique personalities, as forensic scientists know. I had a friend who claimed he had learned to recognize telltale differences among documents produced by various IBM Selectric machines. That way, whenever he received a referee report, he could identify its place of origin.

Manuscripts did not evolve through 23 typeset versions in those days, as one of my recent papers did. Editing was arduous and frustrating, particularly for a lowly graduate student like me, who needed to beg Blanche to set aside what she was doing for Steve Weinberg and devote a moment or two to working on my paper.

It was tremendously liberating when I learned to use TeX in 1990 and started typing my own papers. (Not LaTeX in those days, but Plain TeX embellished by a macro for formatting.) That was a technological advance that definitely improved my productivity. An earlier generation had felt the same way about the Xerox machine.

But as I was reminded a few days ago, while technological advances can be empowering, they can also be dangerous when used recklessly. I was editing a very long document, and decided to make a change. I had repeatedly used $x$ to denote an n-bit string, and thought it better to use $\vec x$ instead. I was walking through the paper with the replace button, changing each $x$ to $\vec x$ where the change seemed warranted. But I slipped once, and hit the “Replace All” button instead of “Replace.” My computer curtly informed me that it had made the replacement 1011 times. Oops …

This was a revocable error. There must have been a way to undo it (though it was not immediately obvious how). Or I could have closed the file without saving, losing some recent edits but limiting the damage.

But it was late at night and I was tired. I panicked, immediately saving and LaTeXing the file. It was a mess.

Okay, no problem, all I had to do was replace every \vec x with x and everything would be fine. Except that in the original replacement I had neglected to specify “Match Case.” In 264 places $X$ had become $\vec x$, and the new replacement did not restore the capitalization. It took hours to restore every $X$ by hand, and there are probably a few more that I haven’t noticed yet.
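Here is a minimal sketch of why the second replacement could not undo the first. It uses Python's `re` module as a stand-in for the editor's "Replace All" and "Match Case" options. The sample sentence and the `replace_all` helper are hypothetical, invented only for this illustration.

```python
import re

# Hypothetical one-line stand-in for the manuscript.
original = r"Let $x$ be an n-bit string and $X$ a register."

def replace_all(text, old, new, match_case):
    """Replace every literal occurrence of `old` with `new`."""
    flags = 0 if match_case else re.IGNORECASE
    # re.escape makes `old` literal; the lambda keeps backslashes in `new` literal.
    return re.sub(re.escape(old), lambda m: new, text, flags=flags)

# "Replace All" without "Match Case": $X$ is swept up along with $x$.
damaged = replace_all(original, "$x$", r"$\vec x$", match_case=False)

# Reversing the replacement cannot know which $x$ used to be $X$.
undone = replace_all(damaged, r"$\vec x$", "$x$", match_case=False)

# With "Match Case" the capital $X$ is never touched.
careful = replace_all(original, "$x$", r"$\vec x$", match_case=True)

print(damaged)  # Let $\vec x$ be an n-bit string and $\vec x$ a register.
print(undone)   # Let $x$ be an n-bit string and $x$ a register.
print(careful)  # Let $\vec x$ be an n-bit string and $X$ a register.
```

The case-insensitive replacement throws away information, so no second replacement can reverse it.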

Which brings me to the cautionary tale of one of my former graduate students, Robert Navin. Rob’s thesis had two main topics, scattering off vortices and scattering off monopoles. On the night before the thesis due date, Rob made a horrifying discovery. The crux of his analysis of scattering off vortices concerned the singularity structure of a certain analytic function, and the chapter about vortices made many references to the poles of this function. What Rob realized at this late stage is that these singularities are actually branch points, not poles!

What to do? It’s late and you’re tired and your thesis is due in a few hours. Aha! Global search and replace! Rob replaced every occurrence of “pole” in his thesis by “branch point.” Problem solved.

Except … Rob had momentarily forgotten about that chapter on monopoles. Which, when I read the thesis, had been transformed into a chapter on monobranch points. His committee accepted the thesis, but requested some changes …
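A whole-word match would have spared the monopoles. The sketch below uses Python's `re` module; the sample text is made up for illustration, and nothing here is meant to suggest which tool Rob actually used.

```python
import re

# Hypothetical excerpt; not taken from the actual thesis.
thesis = "The poles of f matter. Monopoles are in chapter 3. Each pole is simple."

# Plain substring replacement also rewrites "pole" inside "Monopoles".
naive = thesis.replace("pole", "branch point")

# \b matches only at a word boundary, so "pole" inside "Monopoles" is skipped.
# The optional (s?) group carries a plural ending over to the replacement.
whole_word = re.sub(r"\bpole(s?)\b", r"branch point\1", thesis)

print(naive)
# The branch points of f matter. Monobranch points are in chapter 3. Each branch point is simple.
print(whole_word)
# The branch points of f matter. Monopoles are in chapter 3. Each branch point is simple.
```

Most editors offer the same safeguard as a "whole word" or "match entire word" option in the replace dialog.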

Rob Navin no longer does physics, but has been very successful in finance. I’m sure he’s more careful now.

7 thoughts on “Beware global search and replace!”

  1. Several times, I have had journal typesetters attempt to be “helpful” and make global find-and-replace changes in my accepted manuscripts. As a result, my collaborators and I have lost hundreds of hours trying to undo the damage. Find-and-replace should be banned at that stage of the publishing process.

  2. When I was an undergrad I was doing some experiments on an NMR spectrometer which I ran remotely. It involved this complicated reverse ssh tunnel that Ike set up for me, and I only had access on the weekends when his lab wasn’t using the machine. I had a MATLAB script to gather tons of data, which I started on a lab computer at 5:30pm on Friday, and after I saw it running successfully for 10-15 minutes I left for the weekend. But I didn’t realize that I had used the “more on” command, which causes MATLAB to pause after every page of output. So probably around half an hour later it just stopped, waiting for me to press space. I found this Monday morning when I came back, and ever since then have called it the “moron” command.

  3. This is a problem that version control software is good at defusing. Not only will git (or others) give you the option of going back to a known-good version that’s not too far behind (assuming you commit reasonably often), it also gives you built-in tools that are useful for saving the edits you soured with the global replace.

    For example:

    – You are on branch A, working on commit Y and accidentally do a bad global search and replace
    – Create a new branch B (based on A) and commit your now-broken state there
    – Revert to A, redo the global search and replace (without the edits you want), and commit the results to a new branch C
    – Check out C, then merge B into C, creating a commit X
    – X’s diff contains most of the edits you want, and none of the search and replace edits (you can lose some when they affect the same line or especially the same characters)
    – Cherry-pick X onto A
    – A now contains (most of) the edits you wanted, and none of the global search and replace

    Quickly scanning a staged change’s diff before committing it is also a good way to double-check that your changes made sense (e.g. you’d probably notice “monobranch point”).
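The diff-scanning habit can be tried without git. This is only a sketch using Python's standard `difflib` module; the two sample lines are hypothetical, and git's own diff output would look somewhat different.

```python
import difflib

# Hypothetical lines from a document, before a global replace.
before = [
    "The poles of f lie on the real axis.",
    "Chapter 3 treats monopoles.",
]

# The reckless global replace, applied to every line.
after = [line.replace("pole", "branch point") for line in before]

# A unified diff lists each removed line (-) next to its replacement (+).
diff = list(difflib.unified_diff(before, after,
                                 fromfile="before", tofile="after",
                                 lineterm=""))
print("\n".join(diff))
```

The line `+Chapter 3 treats monobranch points.` stands out in the printed diff, which is the kind of surprise a quick scan is meant to catch.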

    • Additionally, Dropbox saves recent versions of your files. You might be able to restore from the second-most-recent version that was uploaded to Dropbox.

      Version control systems are more powerful, though.

  4. “This was a revocable error. There must have been a way to undo it (though it was not immediately obvious how).”

    Press down the control key and keep it pressed down. Then press the Z key many times. Each time it undoes one instance of the change. So press the Z key 400 times to undo 400 changes.

  5. I had very similar experiences when I was a student in the 70s regarding physical gluing and pasting, and later when I was a postdoc regarding early computerized typesetting. My first paper was typed in 1973 or so with an ordinary typewriter (by a professional typist, but without the symbols, subscripts and superscripts), with additional amazing manual scissor work of gluing and pasting (that I did). I think I still have it somewhere. (It took 6 years for it to be published. The topic was a combinatorial identity related to the quantities $A_n(x,y;a,b)$, which appear in Riordan’s book Combinatorial Identities, and I sent it to Riordan himself. He had a lot of comments, the first of which was that “On $A_n(x,y;a,b)$ and $F_n(x,y;a,b)$” is not a good title, especially since the latter is a notation I invented in that paper.)

    The second experience, regarding global changes, concerned a “trilogy” of papers I wrote in 1984-5. The first part was ready and submitted, and parts II and III were in draft form. This was already the era of computerized typesetting, and I used something called “troff” (before TeX and LaTeX, I think). At some point I wanted to make a small change, but the editor interpreted a period as a wildcard and corrupted the entire files. In the end I did not write these two papers, which is not so bad, except that I had promised my wife that I was writing a trilogy of papers, and she kept asking me for many years what had happened to my trilogy.
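The period-as-wildcard trap is easy to reproduce. In this sketch Python's `re` module stands in for whatever editor was involved, which is an assumption made only for illustration. The sample line is hypothetical.

```python
import re

# Hypothetical line containing both an equation number and an unrelated number.
line = "See eq. 3.1 and item 311."

# In a regular expression "." matches any character, so "3.1" also matches "311".
unescaped = re.sub("3.1", "3.2", line)

# re.escape turns the period into a literal one before the search.
escaped = re.sub(re.escape("3.1"), "3.2", line)

print(unescaped)  # See eq. 3.2 and item 3.2.
print(escaped)    # See eq. 3.2 and item 311.
```

Many editors have a "regular expression" toggle in the replace dialog. With it switched off, a period is searched for literally.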

  6. Yeah, we were using troff (pronounced “Tea-Roff”) at Caltech in the mid-1980s, too. I never learned to use it myself, so relied on Helen Tuck (the group administrator then) to do edits for me.
