Taming Messy URLs

[Image: A wet, shaggy dog walks through a puddle]

CMOS 14.10 in the Spotlight

From the perspective of writers and editors, URLs do their best work behind the scenes or just off the page, in a browser’s address bar. In that role—as an internet address that will take you to a specific page online—it doesn’t matter all that much what a URL looks like so long as it works.

Looks do matter, however, when you need to mention or cite a URL in the text or share it with someone. Most domain names are easy enough to deal with (think Amazon.com). It’s when URLs go beyond the home page that they tend to get messy.

But just because it’s been copied from an address bar and takes you to the right place doesn’t mean you’re stuck with it. Whenever you’re dealing with a URL that isn’t reasonably short and readable by humans, you should look for alternatives.

Getting to the Right Link

Writers typically pay attention to a URL (uniform resource locator) whenever they need to link a bit of text or because they’re citing a source. It’s usually part of an editor’s job, in turn, to double-check that these links work as intended.

Let’s say you’re writing a blog post, and you mention The Great Gatsby. To make sure your readers are on the same page, you link the title to the version you read: the 2020 mass-market paperback published by Simon & Schuster’s Pocket Books division under its Scribner imprint and featuring an introduction by Jesmyn Ward.

This isn’t as easy as it sounds. As of this year, Fitzgerald’s original 1925 text has fallen out of copyright, and dozens of editions, of varying quality, have flooded the market. But if you go to Amazon.com and enter some of the publication details in a search, you should eventually find the right one:

[Image: Cover of The Great Gatsby, with an introduction by Jesmyn Ward]

Here’s the URL resulting from that search, copied and pasted from the address bar in Chrome:

https://​www.amazon.com/Great-Gatsby-F-Scott-Fitzgerald/dp/1982146702/ref=sr_1_3?dchild=1&keywords=great+gatsby+mass+market+paperback+jesmyn+ward&qid=1625929848&sr=8-3

That URL is kind of long, but at least it works (try it). Still, it’s ugly.

Making It Reader-Friendly

The link from Amazon is ugly, but it doesn’t have to be. To improve it, try deleting from the end of the URL, starting at the point where it looks like it’s intended for a computer rather than a human—in this case, the part beginning “/ref=sr_1_3?dchild= . . .”

Here’s the result:

https://​www.amazon.com/Great-Gatsby-F-Scott-Fitzgerald/dp/1982146702

This new and improved URL leads to the same place as the ugly longer version.

You might wonder why we didn’t also get rid of the part with the numbers—“/dp/1982146702”—to make an even shorter URL:

https://​www.amazon.com/Great-Gatsby-F-Scott-Fitzgerald

But that URL doesn’t work at all:

[Image: A page from Amazon.com showing a picture of a small dog and the following text: “SORRY, we couldn’t find that page. Try searching or go to Amazon’s home page. Waffles. Meet the dogs of Amazon.”]

It turns out that the numbers, in this case the ten-digit version of the ISBN for the Simon & Schuster edition (following dp, for the product detail page), are an essential part of the URL.*

As for the part we cut off in the version that works (i.e., after the ISBN), that stuff reflects the search terms used on Amazon to find the title and some other details about how we arrived on that page. But those things matter only to the people maintaining the page and tracking our activity. For everyone else, they’re not important.
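For anyone cleaning up many such links at once, the same cut can be sketched in a few lines of Python using the standard library's urllib.parse. This is only an illustration: the function name trim_amazon_url is made up for this example, and the rule it applies (keep the path through the ID that follows "dp" and drop everything after it) is an assumption based on the one URL discussed above, not a documented Amazon convention.

```python
from urllib.parse import urlsplit, urlunsplit


def trim_amazon_url(url: str) -> str:
    """Keep scheme, host, and the path through the ID after 'dp'; drop the rest.

    Hypothetical helper for illustration; the 'dp' rule is an assumption
    drawn from the single example URL in this post.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    if "dp" in segments:
        # Keep everything up to and including the segment that follows "dp".
        cut = segments.index("dp") + 2
        segments = segments[:cut]
    # Rebuild the URL with an empty query string and an empty fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/".join(segments), "", ""))


long_url = (
    "https://www.amazon.com/Great-Gatsby-F-Scott-Fitzgerald/dp/1982146702/"
    "ref=sr_1_3?dchild=1&keywords=great+gatsby+mass+market+paperback+jesmyn+ward"
    "&qid=1625929848&sr=8-3"
)
print(trim_amazon_url(long_url))
# https://www.amazon.com/Great-Gatsby-F-Scott-Fitzgerald/dp/1982146702
```

Whatever a script produces, the result still needs to be tested in a browser before it goes into a manuscript.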

If in doubt, use a process of elimination. The shortest usable version of a URL (from Amazon or anywhere else) is the one that breaks, or no longer goes to the intended page, if you delete its last character. Cut the final “2” from the improved example above and we’re back to a cute dog from Amazon telling us that the link doesn’t work. Restore the “2” and the link works again, with or without anything that originally followed it (starting with the forward slash).
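The process of elimination can also be sketched as a short Python routine. Two assumptions are worth flagging: this version trims one path segment at a time rather than one character at a time, and it relies on a caller-supplied check, here called still_works, that decides whether a candidate URL still reaches the intended page. In real editing that check is a person clicking the link; the fake_check function below is only a stand-in so the example can run without a network connection.

```python
from typing import Callable
from urllib.parse import urlsplit, urlunsplit


def shortest_working_url(url: str, still_works: Callable[[str], bool]) -> str:
    """Drop the query string, then trailing path segments, while the URL still works.

    `still_works` is a hypothetical callback: it should return True only if
    the candidate leads to the intended page (not merely to some page).
    """
    parts = urlsplit(url)
    segments = parts.path.rstrip("/").split("/")
    best = url
    while segments:
        candidate = urlunsplit((parts.scheme, parts.netloc, "/".join(segments), "", ""))
        if not still_works(candidate):
            break  # the previous candidate was the shortest one that worked
        best = candidate
        segments.pop()  # try again with one fewer path segment
    return best


def fake_check(candidate: str) -> bool:
    # Stand-in for a human test: pretend the page loads only if the
    # product ID from the example above is still present.
    return "/dp/1982146702" in candidate


full_url = (
    "https://www.amazon.com/Great-Gatsby-F-Scott-Fitzgerald/dp/1982146702/"
    "ref=sr_1_3?dchild=1&qid=1625929848&sr=8-3"
)
print(shortest_working_url(full_url, fake_check))
# https://www.amazon.com/Great-Gatsby-F-Scott-Fitzgerald/dp/1982146702
```

Note that a trimmed URL that loads without an error has not necessarily passed the test. A link that quietly redirects to a home page has failed just as surely as one that shows the dog.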

Plan B: Find a Better Source

Sometimes the thing to do—rather than trying to edit an Amazon or Google Books† or other third-party link into shape—is to go to the source. That’s not always possible, but in this case it’s a recent book, and the publisher maintains a page for it.

A Google search should get you there. Try “simon and schuster mass market gatsby” (or something similar). Disregarding the ads, the first hit seems like the right one, so we’ll go there. Here’s the URL, copied from the address bar for that page:

https://​www.simonandschuster.com/books/The-Great-Gatsby/F-Scott-Fitzgerald/9781982146702

This time there’s no search syntax (that stuff doesn’t carry over from our Google search), and the URL looks a lot like the edited version of the Amazon URL. The main difference is that the one from Simon & Schuster uses the longer 13-digit version of the ISBN.
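The two ISBNs are the same identifier in two forms. A ten-digit ISBN becomes its 13-digit counterpart by adding the prefix 978 in front of the first nine digits and recalculating the final check digit. The Python sketch below shows the arithmetic; the function name isbn10_to_isbn13 is our own, and the code does not verify that the input is a valid ISBN to begin with.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a 10-digit ISBN to its 13-digit form (illustrative helper)."""
    digits = isbn10.replace("-", "")
    if len(digits) != 10:
        raise ValueError("expected a 10-character ISBN")
    # Prefix 978 and drop the old check digit (the tenth character).
    core = "978" + digits[:9]
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... over 12 digits.
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)


print(isbn10_to_isbn13("1982146702"))
# 9781982146702
```

The output matches the number at the end of the Simon & Schuster URL above.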

And at the Simon & Schuster page there’s less danger than there would be at Amazon that a reader will click over to one of the many other editions from other publishers by mistake.

Subscriptions and Proxies and Permalinks

Now let’s say you’re doing research on Gatsby, and you want to cite a particular book by John Irwin that you’ve read through your library’s subscription to EBSCOhost. That should be easy, because EBSCO offers a “Cite” link on the page that includes a Chicago-style author-date citation (edited here to correct minor stylistic errors):

Irwin, John T. 2014. F. Scott Fitzgerald’s Fiction: “An Almost Theatrical Innocence.” Baltimore: Johns Hopkins University Press. https://​search-ebscohost-com.proxy.uchicago.edu/login.aspx?direct=true&db=e000xna&AN=662214&site=ehost-live&scope=site.

That URL is ugly, so let’s instead try copying what EBSCO offers as a “permalink”:

http://​proxy.uchicago.edu/login?url=https://search.ebscohost.com/login.aspx?direct=true&db=e000xna&AN=662214&site=ehost-live&scope=site&ebv=EB&ppid=pp_1

But that’s no better; in fact, neither of those URLs is as useful as we would hope. Besides being ugly, both depend on “proxy.uchicago.edu” in the domain. That’s because EBSCOhost requires a subscription, and we logged in with our credentials through the University of Chicago Library.

Readers who don’t have access to that library will land on a page asking them to log in. Nor will it help to edit the link to get rid of the proxy info:

https://​search.ebscohost.com/login.aspx?direct=true&db=e000xna&AN=662214&site=ehost-live&scope=site&ebv=EB&ppid=pp_1

If you don’t have the right credentials, you’ll still end up on a page that asks you to log in—while providing no details about the source (and no cute dog).
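An editor checking a long reference list might want to flag links like these automatically. The Python sketch below is a rough heuristic, not a general rule: it assumes that a proxied link has "proxy" as one label in its host name and that a proxy login link carries its target after "url=", as the two examples above do. Other libraries may set up their proxies differently, and the function names are invented for this illustration.

```python
from urllib.parse import urlsplit


def looks_proxied(url: str) -> bool:
    """Heuristic (an assumption): True if 'proxy' is one label of the host name."""
    host = urlsplit(url).hostname or ""
    return "proxy" in host.split(".")


def strip_proxy_prefix(url: str) -> str:
    """Return the target after 'url=' in a proxy login link, else the URL unchanged."""
    query = urlsplit(url).query
    if query.startswith("url="):
        return query[len("url="):]
    return url


permalink = (
    "http://proxy.uchicago.edu/login?url=https://search.ebscohost.com/login.aspx"
    "?direct=true&db=e000xna&AN=662214&site=ehost-live&scope=site&ebv=EB&ppid=pp_1"
)
print(looks_proxied(permalink))       # True
print(strip_proxy_prefix(permalink))  # the same link without the proxy prefix
```

Stripping the prefix only reproduces the edit described above. The result still leads to a login page for readers without credentials, so the flag is the more useful half of this sketch.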

Your best option in a case like this one may be simply not to use the URL—not in your Chicago-style citation and not as an embedded link behind the title or other text. Instead, cite EBSCOhost as the source in place of a URL:

Irwin, John T. 2014. F. Scott Fitzgerald’s Fiction: “An Almost Theatrical Innocence.” Baltimore: Johns Hopkins University Press. EBSCOhost.

Interested readers should be able to find the book from the publication details alone, either at EBSCO (when they can get access) or at a library or bookstore.

DOIs

Subscription access isn’t always a barrier to providing a link. If the URL points to a page that includes details about the source and where or how to get access to it, it will be useful to your readers. A URL based on a DOI (for Digital Object Identifier, an international standard for facilitating persistent links) will always do that.

For example, try clicking on https://doi.org/10.1017/S0021875815000663 (for an article by Laura Goldblatt published in the February 2016 issue of Journal of American Studies). Even if you don’t have access to that journal, you’ll get an abstract for the article and other useful details. (For more on DOIs, see CMOS 14.8.)
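Sources sometimes give a DOI on its own or with a "doi:" label rather than as a full link. Turning it into a URL is a matter of putting https://doi.org/ in front of it, as in this small Python sketch. The function name doi_to_url is our own, and the code handles only the two input forms shown; it makes no attempt to validate the DOI itself.

```python
def doi_to_url(doi: str) -> str:
    """Build a doi.org URL from a bare or 'doi:'-labeled DOI (illustrative helper)."""
    doi = doi.strip()
    # Remove a leading label or resolver address if one is already present.
    for prefix in ("https://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    return "https://doi.org/" + doi


print(doi_to_url("doi:10.1017/S0021875815000663"))
# https://doi.org/10.1017/S0021875815000663
```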

Bitly et al.

Any URL can be radically shortened via a service like Bitly. For example, the URL for the Simon & Schuster page for The Great Gatsby could be rendered as https://bit.ly/3r6H30U (linked here for your convenience).

That’s very useful in many scenarios, but there’s an obvious disadvantage to such a link. Readers can’t tell what it is or where it might lead. That’s why Chicago recommends against using third-party link-shortening services for URLs in source citations. (Shortened DOIs are an exception; see CMOS 14.10.)

One thing you can do—for any URL—is to omit the protocol (“https://” or “http://”; the s is for secure), which browsers supply by default. Chicago recommends leaving it in for URLs in source citations, partly because when you copy and paste an address from your browser, the protocol gets carried with it. But in other contexts, or if your style allows, you can leave it out.
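If your style does drop the protocol and you need to apply that across a long list of links, a short Python sketch like the following would do it. The function name strip_protocol is invented for this example, and it removes only a leading "http://" or "https://" and nothing else.

```python
import re


def strip_protocol(url: str) -> str:
    """Remove a leading 'http://' or 'https://' (illustrative helper)."""
    return re.sub(r"^https?://", "", url, flags=re.IGNORECASE)


print(strip_protocol(
    "https://www.simonandschuster.com/books/The-Great-Gatsby/F-Scott-Fitzgerald/9781982146702"
))
# www.simonandschuster.com/books/The-Great-Gatsby/F-Scott-Fitzgerald/9781982146702
```

As with any edited link, paste the shortened form into a browser to confirm that it still works.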


* You could also get rid of the part with the title and author and use https://www.amazon.com/dp/1982146702. That link still works as intended, but it no longer tells human readers what it points to.

† For an example featuring a URL from Google Books, see CMOS 14.10.

Top image: Shaggy, by Frank Shepherd, licensed under CC BY-SA 2.0.


4 thoughts on “Taming Messy URLs”

  1. One solution you might mention with regard to long (or any) URLs is that they can be formatted to wrap elegantly, especially in fully justified text. This is achieved by inserting the zero-width space, also called no-width space (Alt+8203), wherever needed. CMS, 17th ed., par. 7.46, gives a list of such possible breaks. In Microsoft Word these spaces look like a small grey rectangle in portrait orientation with another one inside it when paragraph marks are displayed. Of course, one needs to insert these spaces in all possible places to ensure correct wrapping when the URL’s position changes. Writers proficient with Visual Basic for Applications can write a macro to do this on any block of selected text or on all URLs if so desired.

  2. A publisher I’m working for is actually now dropping both http:// and www. So far every link I’ve tested works. Strongly agree with the importance of testing, at proofs too.

  3. I’ve found that it’s usually safe to delete “?utm” and “?fbclid” and anything that follows them. (But of course I always test this!)

  4. The publisher I just retired from, and several others I’ve worked for, omitted “https://” or “http://” only when it was followed by “www.”; copy/pasting URLs without “www.” into a browser frequently doesn’t work, while copy/pasting URLs that have it always works.
    I can’t stress enough what you devote only one sentence to: the editor (or copyeditor) MUST test every link in a manuscript, and if any changes are made during the editing-typesetting-proofing stages, the links must be checked again. I can’t tell you how many times a dot or the m of .com got accidentally dropped!
