Find your Files

Fabien Potencier

April 22, 2010

The best practices for finding files with PHP have evolved a lot in the last few years. Back in 2004, one of the very first things I did with PHP was to port the File::Find::Rule Perl module to PHP. File::Find::Rule is a great way to describe the files and directories you want to work with. I used the native opendir, readdir, and closedir PHP functions, and it did the job quite well. The PHP class was named sfFinder, and it can still be found in all symfony versions. Even though the class is bundled with symfony, I know that a few people use it for all kinds of things, not necessarily related to symfony.

But the code is starting to show its age: first because I have learned a lot about PHP since then, and also because there is a better way now. Enter iterators! PHP 5 comes bundled with a bunch of iterator classes that ease all kinds of, well, iterations. You can iterate over an iterator with the standard foreach statement, a very powerful PHP construct.

PHP Iterators

So, how do you get all the files and directories recursively with PHP iterators? Frankly, I don't know. Well, I know more or less which classes to use and how to assemble them, but instead of thinking too hard, I always copy and paste an existing snippet of code to get it right. Here is such a snippet:

// some flags to filter . and .. and follow symlinks
$flags = \FilesystemIterator::SKIP_DOTS | \FilesystemIterator::FOLLOW_SYMLINKS;
 
// create a simple recursive directory iterator
$iterator = new \RecursiveDirectoryIterator($dir, $flags);
 
// make it a truly recursive iterator
$iterator = new \RecursiveIteratorIterator($iterator, \RecursiveIteratorIterator::SELF_FIRST);
 
// iterate over it
foreach ($iterator as $file)
{
  // do something with $file (a \SplFileInfo instance)
}
 

Notice the fancy \ character before each built-in class? That's the way you reference built-in PHP classes when using them in a PHP 5.3 namespace context.

As you can see for yourself, nothing complex. You just need to know which iterators to use, their possible flags, and how to compose them. So, the first barrier to entry is the learning curve. There are a lot of great tutorials and presentations on the Internet about iterators, but the official documentation on php.net probably lacks some good examples.
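
To make the flags and modes a little more concrete, here is a minimal, self-contained sketch. It assumes a throwaway directory created under the system temporary directory (the finder_demo_ prefix and the file names are made up for the illustration). It compares the SELF_FIRST mode with the default LEAVES_ONLY mode, then uses CHILD_FIRST to clean up after itself.

```php
<?php
// Hypothetical throwaway tree: index.php and lib/util.php.
$dir = sys_get_temp_dir().'/finder_demo_'.uniqid();
mkdir($dir.'/lib', 0777, true);
touch($dir.'/index.php');
touch($dir.'/lib/util.php');

// Collect the sorted file and directory names visited for a given mode.
$collect = function ($mode) use ($dir)
{
  $iterator = new \RecursiveIteratorIterator(
    new \RecursiveDirectoryIterator($dir, \FilesystemIterator::SKIP_DOTS),
    $mode
  );

  $names = array();
  foreach ($iterator as $file)
  {
    $names[] = $file->getFilename();
  }
  sort($names);

  return $names;
};

// SELF_FIRST yields each directory as well as its children.
$selfFirst = $collect(\RecursiveIteratorIterator::SELF_FIRST);

// LEAVES_ONLY (the default) yields files but not directories.
$leavesOnly = $collect(\RecursiveIteratorIterator::LEAVES_ONLY);

// CHILD_FIRST yields children before their directory, which suits cleanup.
$cleanup = new \RecursiveIteratorIterator(
  new \RecursiveDirectoryIterator($dir, \FilesystemIterator::SKIP_DOTS),
  \RecursiveIteratorIterator::CHILD_FIRST
);
foreach ($cleanup as $file)
{
  $file->isDir() ? rmdir($file->getPathname()) : unlink($file->getPathname());
}
rmdir($dir);
```

With this tree, $selfFirst contains the lib directory in addition to the two files, while $leavesOnly contains only the files.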

The other "problem" is that everything is very object-oriented. And as soon as you want to filter the iterator, you will need to create your own classes, which seems impractical most of the time. That's because PHP iterators are very powerful and have been written to be general-purpose iterators.

What is filtering? Let's say I want to exclude all files ending with .rb from the iterator. I can create a simple \FilterIterator for that:

class ExcludeRubyFilesFilterIterator extends \FilterIterator
{
  public function accept()
  {
    $fileinfo = $this->getInnerIterator()->current();
 
    if (preg_match('/\.rb$/', $fileinfo))
    {
      return false;
    }
 
    return true;
  }
}
 

This filter iterator can be used with the previous one by wrapping it like this:

$iterator = new ExcludeRubyFilesFilterIterator($iterator);
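
For readers on PHP 5.4 or later, SPL's \CallbackFilterIterator can express the same rule without declaring a class. The sketch below is only an illustration: it uses an in-memory \ArrayIterator of made-up file names instead of a real directory, so that it runs on its own.

```php
<?php
// Hypothetical file list standing in for a directory iterator.
$files = new \ArrayIterator(array(
  new \SplFileInfo('lib/app.php'),
  new \SplFileInfo('lib/task.rb'),
  new \SplFileInfo('README'),
));

// Same rule as ExcludeRubyFilesFilterIterator, written as a callback:
// returning false rejects the current element.
$iterator = new \CallbackFilterIterator($files, function (\SplFileInfo $fileinfo)
{
  return !preg_match('/\.rb$/', $fileinfo->getFilename());
});

$kept = array();
foreach ($iterator as $file)
{
  $kept[] = $file->getFilename();
}
```

Here $kept ends up with app.php and README; task.rb is filtered out.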
 

That's easy enough. But when I need to find files and directories, I always need the same kinds of specialized filters: excluding VCS files (such as .svn and .git directories), or filtering files by name or by size.

The Symfony Finder Component

Instead of writing the same iterators over and over again, I have packaged them in a Symfony Component: the Finder component.

The Symfony Finder Component provides many specialized Iterator classes for finding files and directories. It also adds a wrapper on top of them to ease their day-to-day usage.

As with any Symfony component, you first need to bootstrap your script with a class loader that is able to load classes that follow the PHP 5.3 interoperability standards, like the Symfony UniversalClassLoader class:

require_once '/path/to/src/Symfony/Foundation/UniversalClassLoader.php';
 
use Symfony\Foundation\UniversalClassLoader;
 
$classLoader = new UniversalClassLoader();
$classLoader->registerNamespace('Symfony', '/path/to/src');
$classLoader->register();
 

Now, let's see how to use the Finder class, the main class of the component:

use Symfony\Components\Finder\Finder;
 
$finder = new Finder();
$iterator = $finder->files()->in(__DIR__);
 
foreach ($iterator as $file)
{
  print $file->getRealPath()."\n";
}
 

The above code prints the real path of every file in the current directory, recursively. Notice that the Finder class uses a fluent interface, which means that all methods return the Finder instance. The only exception is the in() method, which builds and returns an Iterator for the given directory, or for an array of directories:

$iterator = $finder->files()->in(array('/path1', '/path2'));
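
The fluent pattern itself can be sketched in a few lines. The MiniFinder class below is hypothetical and is not the Symfony class: its name() method returns $this so calls can be chained, and its in() method ends the chain by returning an iterator. To stay self-contained, in() filters an in-memory list of names rather than a directory, and it relies on \CallbackFilterIterator, which requires PHP 5.4 or later.

```php
<?php
// Hypothetical MiniFinder: not the Symfony class, only the fluent pattern.
class MiniFinder
{
  private $patterns = array();

  // A fluent method records its argument and returns $this for chaining.
  public function name($pattern)
  {
    $this->patterns[] = $pattern;

    return $this;
  }

  // The terminal method builds an iterator, here over an in-memory list.
  public function in(array $filenames)
  {
    $patterns = $this->patterns;

    return new \CallbackFilterIterator(new \ArrayIterator($filenames), function ($filename) use ($patterns)
    {
      foreach ($patterns as $pattern)
      {
        if (fnmatch($pattern, $filename))
        {
          return true;
        }
      }

      // With no pattern registered, every name is accepted.
      return 0 === count($patterns);
    });
  }
}

$finder = new MiniFinder();
$iterator = $finder->name('*.php')->name('*.py')->in(array('a.php', 'b.rb', 'c.py'));
$matches = iterator_to_array($iterator, false);
```

In this sketch $matches holds a.php and c.py, since a name is kept as soon as one of the registered patterns matches it.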
 

You can convert an iterator to an array with the iterator_to_array() function, and get the number of items with iterator_count().
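
One detail of these two functions is worth a small sketch: iterator_to_array() preserves keys by default, so elements that share a key overwrite each other. The example below uses made-up file names in plain \ArrayIterator instances, chained with SPL's \AppendIterator, to show the difference. It is a generic SPL illustration, not a statement about the keys the Finder iterators produce.

```php
<?php
// Two hypothetical result sets chained together; both start at key 0.
$append = new \AppendIterator();
$append->append(new \ArrayIterator(array('a.php', 'b.php')));
$append->append(new \ArrayIterator(array('c.php')));

// Walks the whole iterator and counts the elements.
$count = iterator_count($append);

// Keys are preserved by default, so the second key 0 replaces the first.
$withKeys = iterator_to_array($append);

// Passing false discards the keys and reindexes from 0.
$withoutKeys = iterator_to_array($append, false);
```

Here $count is 3 and $withoutKeys has three entries, but $withKeys has only two because c.php replaced a.php under key 0.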

If you want to restrict the iterator to only return PHP files in the current directory, use the name() and maxDepth() methods:

$iterator = $finder
  ->files()
  ->name('*.php')
  ->maxDepth(0)
  ->in(__DIR__);
 

The name() method accepts globs, strings, or regexes:

$finder
  ->files()
  ->name('/\.php$/');
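
As a rough idea of how one method can accept globs, strings, and regexes, here is a minimal sketch. The matchesName() helper is hypothetical and is not taken from the component. It assumes that a pattern wrapped in slashes is a regex and that anything else can be handed to PHP's fnmatch(), for which a plain string is just a glob without wildcards.

```php
<?php
// Hypothetical helper: decide whether $filename matches $pattern.
function matchesName($pattern, $filename)
{
  // Assumption: "/.../" optionally followed by modifiers means a regex.
  if (preg_match('#^/.+/[imsxuADU]*$#', $pattern))
  {
    return 1 === preg_match($pattern, $filename);
  }

  // Otherwise treat it as a glob; a plain string only matches itself.
  return fnmatch($pattern, $filename);
}

$globMatch   = matchesName('*.php', 'index.php');
$regexMatch  = matchesName('/\.php$/', 'index.php');
$stringMatch = matchesName('README', 'README');
$noMatch     = matchesName('*.php', 'index.rb');
```

The first three calls return true and the last one returns false.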
 

There are also methods to exclude files by name or to exclude the content of whole directories from matching:

$finder
  ->files()
  ->name('test.*')
  ->notName('*.rb')
  ->exclude('ruby');
 

The result should contain files named test with any extension, except those ending with .rb (so test.rb is excluded), and the iterator won't match any file in ruby directories (ruby/foo/test.php won't match, for instance).

If you want to follow links, use the followLinks() method:

$finder
  ->files()
  ->followLinks();
 

You can also restrict files by size:

$finder
  ->files()
  ->name('/\.php$/')
  ->size('< 1.5K');
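
To illustrate what such a size expression has to encode, here is a minimal sketch of a parser. The sizeMatches() function is hypothetical and not part of the component. Treating K, M, and G as powers of 1000 is an assumption of this sketch, so check the component itself for the magnitudes it really uses.

```php
<?php
// Hypothetical helper: does a size in bytes satisfy an expression like "< 1.5K"?
function sizeMatches($expression, $bytes)
{
  if (!preg_match('/^\s*(<=|>=|<|>|==)?\s*([0-9.]+)\s*([KMG]?)\s*$/i', $expression, $m))
  {
    throw new \InvalidArgumentException(sprintf('Invalid size expression "%s".', $expression));
  }

  // A missing operator means equality.
  $operator = '' === $m[1] ? '==' : $m[1];

  // Assumption of this sketch: decimal magnitudes (1K = 1000 bytes).
  $units = array('' => 1, 'K' => 1000, 'M' => 1000000, 'G' => 1000000000);
  $target = (float) $m[2] * $units[strtoupper($m[3])];

  switch ($operator)
  {
    case '<':  return $bytes < $target;
    case '<=': return $bytes <= $target;
    case '>':  return $bytes > $target;
    case '>=': return $bytes >= $target;
    default:   return $bytes == $target;
  }
}

$small = sizeMatches('< 1.5K', 1000);
$large = sizeMatches('< 1.5K', 2000);
```

Under the decimal assumption, 1.5K is 1500 bytes, so $small is true and $large is false.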
 

Most of the methods are cumulative. So, if you want to get all PHP and Python files with a size between 1K and 2K, here is the code:

$finder
  ->files()
  ->name('*.php')
  ->name('*.py')
  ->size('>= 1K')
  ->size('<= 2K');
 

By default, the iterator ignores popular VCS files. This can be changed with the ignoreVCS() method.
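
With plain SPL iterators, ignoring VCS directories could be sketched as follows. This is not the component's IgnoreVcsFilterIterator: it uses \RecursiveCallbackFilterIterator (PHP 5.4 or later), an assumed list of VCS directory names, and a throwaway tree with made-up names under the system temporary directory.

```php
<?php
// Hypothetical throwaway tree: .git/config and src/main.php.
$dir = sys_get_temp_dir().'/vcs_demo_'.uniqid();
mkdir($dir.'/.git', 0777, true);
mkdir($dir.'/src', 0777, true);
touch($dir.'/.git/config');
touch($dir.'/src/main.php');

// Assumed list of VCS directory names, for illustration only.
$vcsNames = array('.svn', '.git', '.hg', 'CVS');

$directory = new \RecursiveDirectoryIterator($dir, \FilesystemIterator::SKIP_DOTS);

// Rejecting a directory here also prevents the iterator from descending into it.
$filtered = new \RecursiveCallbackFilterIterator($directory, function (\SplFileInfo $current) use ($vcsNames)
{
  return !in_array($current->getFilename(), $vcsNames, true);
});

$found = array();
foreach (new \RecursiveIteratorIterator($filtered) as $file)
{
  $found[] = $file->getFilename();
}

// Remove the throwaway tree.
unlink($dir.'/.git/config');
unlink($dir.'/src/main.php');
rmdir($dir.'/.git');
rmdir($dir.'/src');
rmdir($dir);
```

Only main.php ends up in $found, because the .git directory is pruned before its content is visited.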

As the in() method returns an \Iterator instance, you can wrap it with your own specialized iterator. But instead of creating a class, you can also use the filter() method:

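$filter = function (\SplFileInfo $fileinfo)
{
  // reject files whose name (not the whole path) is longer than 10 characters
  if (strlen($fileinfo->getFilename()) > 10)
  {
    return false;
  }
  return true;
};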
 
$finder
  ->files()
  ->name('*.php')
  ->filter($filter);
 

This example excludes all the files with a file name of more than 10 characters.

To sort the result by name, use the sortByName() method:

$finder
  ->files()
  ->name('*.php')
  ->sortByName();
 

Notice that the sort* methods need to get all matching elements to do their job. For large iterators, this can be rather slow.
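
The reason is easy to see in a sketch: an iterator cannot be sorted lazily, so every element has to be pulled into an array first. The example below is only an illustration with made-up file names and is not the component's SortableIterator.

```php
<?php
// Hypothetical unsorted result set.
$iterator = new \ArrayIterator(array(
  new \SplFileInfo('lib/zeta.php'),
  new \SplFileInfo('lib/alpha.php'),
  new \SplFileInfo('lib/Beta.php'),
));

// Every element is loaded into memory before the sort can start.
$files = iterator_to_array($iterator, false);
usort($files, function (\SplFileInfo $a, \SplFileInfo $b)
{
  return strcmp($a->getFilename(), $b->getFilename());
});

// Wrap the sorted array so that callers still receive an iterator.
$sorted = new \ArrayIterator($files);

$names = array();
foreach ($sorted as $file)
{
  $names[] = $file->getFilename();
}
```

Because strcmp() compares bytes, Beta.php sorts before alpha.php here; a case-insensitive order would need strcasecmp() instead.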

Under the hood, the Finder class uses specialized iterator classes:

  • ChainIterator
  • CustomFilterIterator
  • DateRangeFilterIterator (coming soon)
  • ExcludeDirectoryFilterIterator
  • FileTypeFilterIterator
  • FilenameFilterIterator
  • IgnoreVcsFilterIterator
  • LimitDepthFilterIterator
  • SizeRangeFilterIterator
  • SortableIterator

Have a look at the code to learn more about these iterators and how they work.

Discussion

David J.  — April 22, 2010 09:27   #1
That's good news. sfFinder was a good tool, and porting it to Symfony2 and the new PHP coding standards was a must-have.
Something that was missing in sfFinder was the ability to specify rules on the element path and not only on the name. I never found out how to include/exclude files or folders with conditions similar to the ones we can use with rsync (for example /web/*_dev.php or /plugins/*/web). We were only able to set conditions on the file name...
Is this a new feature of this Finder release? I'm going to look at the code to see if it's already built in. If not, it could be a good challenge ;-)
Patricio  — April 22, 2010 09:34   #2
Nice =)

But why ignore CVS files by default? I know they are annoying many times, but ignoring them seems like a strange behavior to be the default...
CoolGoose  — April 22, 2010 09:44   #3
This is awesome.

And yes, as you said, iterators are harder to grasp than the usual PHP concepts, and they're not advertised a lot.
Bob  — April 22, 2010 12:49   #4
Hi Fabien !

I'm wondering why you made an exception in the fluent interface with the in() method.

It would be nice if we could chain in() methods to specify more than one directory to search into.

To get the iterator one could use a specific method iterator() or even use a cast :

$iterator = $finder->files()->in($dir1)->in($dir2)->iterator();

$iterator = (\FilesystemIterator) $finder->files()->in($dir1)->in($dir2);

What do you think ?
Fabien  — April 22, 2010 13:04   #5
@Bob: The in() method already takes an array of directories, so there is no need to chain them. Your iterator() method already exists, it's in() ;)
Bob  — April 22, 2010 13:08   #6
@Fabien: I agree, I just thought the interface would be even more fluent this way... :)
MisterA  — April 22, 2010 13:18   #7
Pff, just great, as usual. Thx fabpot.

@Bob: You're right, why should in() be different from name() or size()? Kind of curious IMHO.
nedy  — April 22, 2010 14:29   #8
Great job, Fabien!

One suggestion though. IMO the method name exclude() is somewhat ambiguous. If I glance at the code, I cannot tell whether the method is excluding directories or files.
It would be much easier to understand what the method does if it were named excludeDirectory() or excludeDir().
Jonathan Nieto  — April 23, 2010 01:53   #9
Great job, as always!


@nedy: agree with you
Peter  — April 23, 2010 10:16   #10
It is good to see a really nice update to the finder; it certainly makes lots of sense to adopt the delights of the SPL.

P.S. If you have any particular thoughts, ideas, requests for what we can do to improve the documentation (even if it is just "finish it") then do let me know.
http://www.b3-e.com  — April 28, 2010 12:44   #11
Nice :)
That's good news
Ross  — May 04, 2010 09:26   #12
A few questions:

- Could Finder implement IteratorAggregate and use that interface for the building step instead of in()? This could enable some extra magic and allow stacking functions that would otherwise be kickers.

- Will you completely rule out multiple in() calls? If I pass Finder through an event, I'd have to track an instance AND an array. That feels a bit less than ideal.

- Can Finder's ChainIterator be replaced with SPL's AppendIterator? (I should just try that with the test suite...)

- Any intention to further Finder/another component into a general purpose DSL for iterators?

The first few points are fairly trivial to implement, I'd be glad to send a patch if you have any interest. Thanks in advance.
Fabien  — May 04, 2010 11:08   #13
@Ross:

Thanks for your comment.

I have just replaced ChainIterator with AppendIterator, as it indeed does the same thing.

For the first two points, I will play a bit with the idea and see how it goes. Thanks for the suggestions.
Fabien  — May 04, 2010 11:38   #14
@Ross:

Using IteratorAggregate is definitely a good move. I have just committed my changes:

http://github.com/fabpot/symfony/commit/aaeb48f744af3673aa4fc23529a3f4c955e4776d

It means that the in() method is now fluent as any other method of the class.

You can get the iterator with the getIterator() method, but most of the time, PHP will do the right thing, thanks to the IteratorAggregate interface.

So, iterating over a Finder instance is the same as before, with the added benefit you mentioned in your comment (the possibility to chain in() methods, and to pass the finder as an argument).
Jason  — May 05, 2010 05:26   #15
Is there a recommended way to include a Symfony Component's files into a symfony 1.4 application (i.e. checking out the entire Symfony2 codebase into someplace like root/lib/vendor/symfony2/Symfony)?

I don't see these components packaged independent of one another anywhere yet (perhaps I'm not looking in the right places, or perhaps it is too early).

I have a lot of use for this Finder component in an existing symfony 1.4 application.
Fabien  — May 05, 2010 07:45   #16
@Jason: symfony 1.4 has the equivalent, the sfFinder class. The interface is quite the same and the possibilities are the same too.