ZSH Globbing as an Alternative to Find Command

Dmitry Antonyuk
Dec 30, 2020

The find command is one of the most frequently used commands in a Unix environment. Following the Unix principle “Do one thing and do it well”, it searches for files using a wide range of filters: file names, creation or modification dates, ownership, permissions, etc. There are plenty of examples illustrating how this command can be used. If you use the terminal, you know how important it is.

However, I noticed that I haven’t used it since I switched to zsh. Zsh is an interactive Unix shell, often described as an extended Bourne shell with many features beyond those of bash. It has a lot of nice features, and one of them is file globbing. File globbing lets you specify the files you want to operate on. Even if you have never heard of it, you have definitely used it. Just remember how you list JPG files in some directory:

ls *.jpg

Let’s walk through several examples to see how else it can be used.
But before that, make sure the EXTENDED_GLOB option is set:

setopt extended_glob

We need it for some specific features of file globbing, so it’s best to
put it in your .zshrc file.

Searching by names, directories, extensions

File name globbing is extremely impressive. It supports alternatives,
negation, recursive search, ignoring case, and much more.

Just look at this example:

ls **/(^(2020|copy)*).{jpg,jpeg,gif,png}

It’ll find all the files that have extensions jpg, jpeg, gif, or
png, do not start with 2020 or copy, in the current directory and
recursively in subdirectories. The parts of the pattern are pretty much
self-explanatory.

By the way, to test this example I needed a set of files, so I’ve created
them using

touch {1,2,3,4,5,2020-1,2020-2,copy1,copy2}.{jpg,jpeg,png,gif,txt}

One more example. Zsh supports approximate matching, which lets you find
files by name even when the pattern contains spelling mistakes. E.g.

ls (#a1)cpy*

will find all files starting with `copy` although the pattern says cpy.
The number 1 determines how many mistakes are allowed.

Search by File Types

Zsh globbing patterns may end with a list of qualifiers, written in
trailing parentheses. With the EXTENDED_GLOB option the syntax is (#qX),
where X stands for the qualifiers. I’m going to use the slightly
simpler form (X), which requires the BARE_GLOB_QUAL option (it is set
by default). We’re going to use qualifiers a lot in the next
sections.

OK. So using qualifiers we can specify what kind of file we are looking
for. E.g. when we are grepping something recursively like this:

grep pattern **/*

we can get a bunch of warning messages about directories, which obviously can’t be searched like files. Of course, we could pass the -d skip option to grep (in GNU grep), but it’s easier to say “handle only regular
files” by using the . qualifier:

grep pattern **/*(.)

Zsh has qualifiers for directories (/), symbolic links (@), sockets
(=), and pipes (p). It even has qualifiers for “full” (i.e.
non-empty) directories (F). To find empty ones use (/^F).

Search by Permissions

Qualifiers can be used to search files by the file access permissions.
E.g. to find recursively all owner-executable files:

ls **/*(.x)

Note that we use . here to specify regular files since directories
could have an owner-executable flag as well.

To emulate find’s -perm argument, one can use the f qualifier:

ls **/*(f755)

returns the files whose mode is exactly 755: readable and executable by
everyone, writable only by the owner.

Qualifier f is extremely flexible: (f7??) gives the files where the owner can read, write, and execute regardless of the permissions for the
group and the others; (f-100) gives the files where the owner does not have execute permission. You can even use the more readable form
(f:gu+w,o-rx:) to specify which permissions must be set or unset.

Search by Ownership

Using qualifiers it’s possible to test against the user and group owning
the files:

ls **/*(u/warlock/)
ls **/*(g/wizards/)

As usual, ^ negates the following qualifier:

ls **/*(.^u/warlock/)

finds all regular files owned by anyone but warlock.

Search by Time

Sometimes we need to find the files changed last week, or not modified
since yesterday, etc. In that case, we use time qualifiers. There are
three of them: a for access time, m for modification time, and c for
inode change time (zsh has no qualifier for creation time). The syntax
is the same for all three qualifiers, so I’m
going to provide examples for only one of them, let’s say c.

E.g. to search for the files changed in the last two weeks we can use

ls **/*(cw-2)

w is for weeks here. Other unit specifiers are M (months), d
(days), h (hours), m (minutes), and s (seconds). If no unit specifier
is given, days are assumed.

If you want to remove some old files, let’s say older than 2 months, put
a + before the number (a bare 2 would match only files that are exactly
2 months old):

rm *.log(cM+2)

Search by Content Length

This qualifier is similar to the time qualifiers. Let me show an
example:

ls -la **/*.java(Lk+100)

It displays all the Java files larger than 100K. `k` is the
size specifier for kilobytes. If no specifier is given, the size is
taken to be in bytes.

Note that with a size specifier the file size is rounded up to the next
whole unit before it is compared. E.g. m4 means “anything between 3
megabytes (exclusive) and 4 megabytes (inclusive)”:

ls -la **/*(Lm4)

So if you want files with a size of 3 megabytes or less, use:

ls -la **/*(Lm-4)

Yes, it’s 4: sizes are rounded up, so a file of 3,284,835 bytes is treated as 4m and excluded, while every file of 3 megabytes or less counts as at most 3m.

Search by Custom Filter

There are situations when the provided filters are not enough. With the
find command we would use the -exec argument. For zsh globbing there are the + and e qualifiers. The e qualifier lets you specify a string of
shell code that tests the file, while the + qualifier lets you specify a
command name. Both of them receive the name of the file being tested in the REPLY shell parameter. Let me show you examples:

ls -la *(e:'[[ $(file -bi $REPLY) = image/* ]]':)

It’ll find all the files that have image/* mime type. We can extract
it to a separate function and use it instead:

im() { [[ $(file -bi $REPLY) = image/* ]] }
ls -la *(+im)

We can even change the output to something else. E.g. we can replace
each file name with the file’s size and sum the sizes up:

im() { [[ $(file -bi $REPLY) = image/* ]] && reply=$(wc -c <$REPLY) }

sum=0
for size in *(+im)
do
  ((sum += size))
done

echo $sum

Drawbacks

Of course, there is no such thing as an ideal tool, and file globbing
has drawbacks as well. One of them is that the system limits the total
length of the arguments passed to an external command (ARG_MAX), so a glob that expands to a huge number of files can fail with an “argument list too long” error. Try getconf ARG_MAX to see the limit on your system. The find command, on the other
hand, has no such limitation and will print all the files it finds.

There are several solutions to this problem. First of all, we can
simply use find. Another one is to split the work into several commands, or even process the files one by one in a for loop.

Conclusion

I tried to show how flexible and expressive zsh file globbing can be,
what cases it covers, and how to extend it to cover the cases it does not handle out of the box. We saw that in most cases there is no need for the find command, since we can do the same using the file globbing described in this article.

So next time you need the git log for all js and ts files larger than 100K that were modified within the last week and are readable by everyone, just use zsh:

git log -- **/*.{js,ts}(Lk+100mw-1f/ugo+r/)

Not so tricky, after all.
