The state of YAML in PHP

Fabien Potencier

December 21, 2009

My first exposure to YAML was in 2001, back in the days when I was mainly working with Perl. Well, I was not using YAML per se at that time, but rather Data::Denter, a Perl library that provides data serialization/deserialization. I used this library mainly for debugging purposes. From its documentation:

"It formats nested data structures in an indented fashion. It is optimized for human readability/editability, safe deserialization, and (eventually) speed."

At the end of 2002, the module was deprecated in favor of a new serialization language, YAML, with the added bonus of being programming language independent. I promptly switched to the Perl YAML module, and I never looked back. I used YAML as a means to debug my Perl programs, but I also started to use it more and more to store configuration data.

When I started to use PHP at the end of 2004, one of the first things that bothered me was the poor support for YAML in the PHP world.

By the way, the fact that symfony uses YAML a lot has nothing to do with Ruby on Rails ;) It just happens that Ruby also has some Perl heritage!

But first, what is YAML?

According to the official YAML website, YAML (YAML Ain't Markup Language) is a human-friendly data serialization standard for all programming languages.

YAML can be used to describe both simple and complex data structures. It's an easy-to-learn language that describes data. Like PHP, it has a syntax for simple types like strings, booleans, floats, integers, and arrays, and even for more complex ones like objects.

Nowadays, YAML is a heavily used format for configuration files, mainly because even non-programmers are able to understand and modify YAML files easily.

To sum up the benefits of YAML, I often say that YAML files are as expressive as XML files and as readable as INI files.

Since the creation of YAML, another lightweight data-interchange format has come to life: JSON. JSON is quite similar to YAML (and as a matter of fact, JSON is a subset of YAML); but even if it is easy for humans to read and write, I think it is not as readable as YAML, and a bit too verbose.
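For comparison, here is a minimal sketch that prints a small nested structure as JSON with PHP's native json_encode() function. The data is made up for illustration.

```php
<?php
// Hypothetical configuration data, for illustration only.
$config = array(
    'section1' => array('foo' => 'bar'),
    'section2' => array('bar' => 'foo'),
);

// json_encode() is a native PHP function (available since PHP 5.2).
$json = json_encode($config);

echo $json, "\n";
// {"section1":{"foo":"bar"},"section2":{"bar":"foo"}}
```

Every key and every string value needs quotes, and every level needs braces, which is where the extra verbosity comes from.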

YAML

If you already know what YAML is and how to use it to describe your data structures, just skip this section.

Besides strings, booleans, and numbers, let's have a look at one of the simplest configuration structures you can describe with YAML:

key: value
foo: bar
 

The above snippet is the simplest way to express key/value pairs in YAML. The foo key has a bar value. The equivalent PHP code would be:

array('key' => 'value', 'foo' => 'bar')
 

And that pretty much covers what you can do with INI files. Speaking of INI files, you can also group key/value pairs under "sections". Here is how this is possible with YAML:

section1:
  foo: bar
 
section2:
  bar: foo
 

The equivalent PHP code reads as follows:

array(
  'section1' => array('foo' => 'bar'),
  'section2' => array('bar' => 'foo'),
)
 

That works because there are several ways to describe key/value pairs: the short notation (foo: bar), and the expanded one, where you use indentation to describe nested structures as above.

The same data structure can also be described as follows:

section1: { foo: bar }
section2: { bar: foo }
 

The {} is how you enclose a hash. That's one of the greatest benefits of YAML as a description format: you can visually organize your data by using one of the three possible notations.

Unlike PHP, YAML distinguishes between hashes (mappings) and arrays (sequences):

[1, 'a string', "another string"]
 

The above snippet, a YAML sequence, is the equivalent of the following PHP code:

array(1, 'a string', "another string")
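Because PHP uses the same array type for both, code that converts a PHP array to YAML has to decide which of the two it is looking at. Here is a minimal sketch of one possible check. The isSequence() helper is hypothetical and is not presented as what any particular library does.

```php
<?php
// Hypothetical helper: returns true when the array looks like a YAML
// sequence, i.e. its keys are exactly 0, 1, 2, ... in order.
function isSequence(array $value)
{
    // Assumption: an empty array is treated as an (empty) sequence.
    if (count($value) === 0) {
        return true;
    }

    return array_keys($value) === range(0, count($value) - 1);
}

var_dump(isSequence(array(1, 'a string', "another string"))); // bool(true)
var_dump(isSequence(array('key' => 'value', 'foo' => 'bar'))); // bool(false)
```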
 

If you mix and match mappings and sequences, short and verbose notations, you can describe very complex data structures:

section1:
  foo: { bar: foo }
  bar: [1, 2]
  foobar:
    - 'a string'
    - 'another one'
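
Following the pattern of the earlier examples, the structure above corresponds to the following PHP array. This is a hand-written sketch, not the output of a parser.

```php
<?php
// Hand-written PHP equivalent of the YAML document above.
$config = array(
    'section1' => array(
        'foo'    => array('bar' => 'foo'),              // inline mapping
        'bar'    => array(1, 2),                        // inline sequence
        'foobar' => array('a string', 'another one'),   // expanded sequence
    ),
);

var_export($config['section1']['foobar'][1]);
// 'another one'
```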
 

This section has barely scratched the surface of what you can express with YAML. If you want to learn more, you will find plenty of documentation on the Internet.

YAML in PHP

YAML is human-friendly, but not so developer-friendly for someone willing to write a parser for it. The YAML specification is really huge. If you read it carefully, you can easily imagine that writing a YAML parser is not an easy task. As I mainly use YAML as a configuration format, like many other developers, I'm looking for a fast, incomplete but correct library rather than a fat, spec-compliant one.

Back in 2005, I was looking for such a YAML parser and dumper for PHP. Chris Wanstrath, who would go on to co-found GitHub a few years later, wrote one such limited parser and dumper, Spyc, specifically to be used as a simple configuration library.

I used it for symfony 1.0. I fixed some bugs from time to time, but as time passed, I found many limitations and became more and more frustrated with it. Eventually, I decided to write a more robust and stable YAML parser and dumper for symfony.

Since then, Alexey Zakhlestin created a PECL extension that wraps the Syck library.

At the beginning of 2009, I decided to release the symfony YAML parser and dumper as a standalone library, with no dependency whatsoever. It means that you can start using it today.

The YAML Symfony Component

Released under the MIT license, the YAML Symfony Component can be used in any application, even commercial ones.

When I created this YAML library for PHP, I had several goals in mind:

  • Ease of use: Installation should be easy and fast. Install it via PEAR, download an archive, or check out the SVN or Git repository, and you are ready to go. No configuration. Drop the files in a directory and start using it right away.

  • Fast: One of the main goals of Symfony YAML was to find the right balance between speed and features.

  • Unit tested: The library is unit-tested (with more than 400 unit tests as of today).

  • "Real" Parser: To correctly handle a large subset of the YAML specification, a dedicated parser has been written by hand. The parser is robust, easy to understand, and simple enough to extend.

  • Clear error messages: Whenever you have a syntax problem with your YAML files, the library should output helpful messages with the filename and the line number where the problem occurred. It eases debugging a lot.

And of course, as YAML is not so well-known in the PHP world, the YAML component also comes with full documentation.

The easiest way to install the Symfony YAML Component is probably to use the PEAR installer:

$ pear channel-discover pear.symfony-project.com
$ pear install symfony/YAML

Using YAML in your Projects

The Symfony YAML library consists of two main classes: one to parse YAML strings, and the other to dump a PHP variable to a YAML string. On top of these two core classes, the main sfYaml class acts as a thin wrapper and simplifies common uses:

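// Loading a YAML file or a YAML string
$var = sfYaml::load('/path/to/file.yml');

// Dumping a PHP variable to YAML; $inline is the nesting level at which
// the dumper switches from the expanded notation to the inline one
$yaml = sfYaml::dump($var, $inline);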
 

YAML for PHP 5.3

The previous sections use the PHP 5.2 compatible version of the library. If you have already switched to PHP 5.3, the good news is that the YAML Component is already available for that version too. For now, it is only available in the Symfony 2 Subversion repository:

$ svn co http://svn.symfony-project.com/branches/2.0/lib/Symfony/Components/YAML/ YAML

use Symfony\Components\YAML\YAML;
 
// loading a YAML file or a YAML string
$var = YAML::load('/path/to/file.yml');
 
// Dumping a PHP variable to YAML
$yaml = YAML::dump($var, $inline);
 

This version can be autoloaded with any autoloader that follows the standards discussed by some PHP developers. Symfony 2 provides such an autoloader:

require_once __DIR__.'/lib/Symfony/Foundation/ClassLoader.php';
 
use Symfony\Foundation\ClassLoader;
 
$loader = new ClassLoader('Symfony', __DIR__.'/lib');
$loader->register();
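
The standards themselves are not spelled out here. Assuming they boil down to mapping namespace separators to directories, a minimal autoloader could look like the following sketch. The classToPath() helper is hypothetical and is not part of Symfony; it only illustrates the idea.

```php
<?php
// Hypothetical helper, not part of Symfony: maps a namespaced class name
// to a file path under a base directory.
function classToPath($class, $baseDir)
{
    // Drop a leading backslash, then turn namespace separators into slashes.
    $relative = str_replace('\\', '/', ltrim($class, '\\'));

    return rtrim($baseDir, '/').'/'.$relative.'.php';
}

// Register an autoloader that requires the mapped file when it exists.
spl_autoload_register(function ($class) {
    $file = classToPath($class, __DIR__.'/lib');
    if (is_file($file)) {
        require $file;
    }
});

echo classToPath('Symfony\Components\YAML\YAML', '/lib'), "\n";
// /lib/Symfony/Components/YAML/YAML.php
```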
 

The YAML Symfony Component is already used by and bundled with many popular open-source PHP projects such as symfony, Doctrine, and PHPUnit. Other projects, like the upcoming Okapi2 framework and the mootools plugins repository announced a few days ago, make heavy use of YAML and also use the YAML Symfony Component.

Next time you look for a flexible means to store or share data, consider using YAML!

Discussion

simo  — December 21, 2009 09:39   #1
The Yaml component is great! But I really think the ability to convert from xml to yml and vice-versa (as the symfony framework does) is missing. IMHO, a native implementation would make it more powerful and useful.

Is there a plan to go through that feature? thanks for your reply
Florian Mueller  — December 21, 2009 10:09   #2
Hi,

Wouldn't it be better to return the contents as an iterator (maybe php 5.3 iterators or your own) and then eg provide functionality to determine line number and column of entry:

$contents = YAML::load($file);
while ($root = $contents->next()) {
    // we just want the value of foo
    $foo = $root->getValue('foo');

    // $fooBar is an entry object
    $fooBar = $root->get('foobar');
    $line = $fooBar->getLine();
    $column = $fooBar->getColumn();
    $fooBarValue = $fooBar->getValue();

    // an entry object is iterable as well
    $bar = $root->get('bar');
    while ($b = $bar->next()) {
        // ...
    }
}

cheers, Florian
Fabien  — December 21, 2009 10:38   #3
@simo: You can have create a generic converter from XML to YAML or vice-versa, because the semantics are quite different. In symfony, we support both YAML and XML, but the conversion is hand-crafted for each feature.
romanb  — December 21, 2009 11:14   #4
I prefer XML over YAML any day because I get automatic validation + intellisense + code completion against a DTD/XSD by any decent XML editor. Compared to that, working with YAML files that you're not yet familiar with can be a real pain.

Or did I miss something and there is a way to describe the valid structure of a YAML document (an equivalent to a DTD/XSD?) which can then be used by tools/IDEs to validate the document as you type and give inline help+intellisense etc.?

rinie  — December 21, 2009 12:19   #5
Unfortunately YAML uses indentation for blocks.
I prefer freestyle whitespace and {},
so Json...
Ren  — December 21, 2009 13:47   #6
The basics of YAML are human readable and editable.

But whomever is driving the YAML spec seems to put those desirable features aside, and added a load more complexity.

Robin  — December 21, 2009 17:56   #7
YAML can be great for config files and such, although I agree with Romanb that it can be a problem that there is no standard schema definition for YAML yet.
But YAML should not be used on unreliable data streams; it is possible to describe data without an end delimiter. Therefore the parser can not be sure it has read all the data. With e.g. XML it uses an end element for that..
Anyway, great to have a choice :)
eswar  — December 21, 2009 18:11   #8
hi
sapphirecat  — December 21, 2009 19:49   #9
@Fabien: "You can have create a generic converter from XML to YAML..."

Perhaps that would be better said as "You cannot easily create a generic converter from XML to YAML..."

I tried it once, and the specific problem I ran into was that XML provides more dimensions of data than YAML, because XML tags _span_ document text. Elements and attributes can be fairly easily translated to YAML, but handling anything else seemed to require making the generated YAML look like the DOM API (list of children, node types, etc.). Otherwise, I couldn't losslessly preserve e.g. a paragraph with some emphasized words.
simo  — December 21, 2009 23:15   #10
@Fabien > I thought the conversion was done by a generic and smart class! Pity, it's not that easy! it makes sense to switch from one to an other.

@sapphirecat > thanks for decoding ;-))
Fabian Spillner  — December 22, 2009 22:42   #11
Thank you for sharing the sfYaml component! It's so powerful and useful, it works great, and I couldn't imagine a project without Yaml configuration files. Thank you again!
Jeff Dickey  — December 23, 2009 08:22   #12
I've been using YAML for a while now (see, for example, this recent blog entry at http://archlever.blogspot.com/2009/11/reuse-renew-recycle-data-structures.html). It's now my default format for configuration files and structured persistence in general, after nearly ten years working with XML. XML has better tools, as people have noted, but the only tool you really need for YAML 1.1 is a text editor; if your YAML is getting too unwieldy, that's an indication that you may want to refactor or redesign your code that uses it.

And as far as comparisons with JSON go.... you can express JSON /in/ YAML; there's at least a couple of people out there who've written code to do that. To me, that's akin to replacing a flat screwdriver with a Philips; don't use either on any "nails".
Olivier El Mekki  — December 26, 2009 01:00   #13
I like YAML as well as JSON, but I never conceived them as similar.

Maybe their designs are alike, but I won't try to use json for configuration files (not expressive enough), just as I won't try to send YAML back to an ajax request (json is built into the javascript layer of many browsers). They each happen to have a different purpose.

Now, there is this new BSON, used in mongo database...
bout de papier  — January 13, 2010 14:44   #14
Hello,

Since it is no longer possible to comment on the use of templates, and I can't resist, I'm taking the liberty of doing so here :)

Just to say: the use of templates risks making symfony end up like other very French projects (SPIP or Dotclear, to name two): forgotten!

Leaving aside whether it is useful or not, there is the matter of the community and its history (admittedly a rather short one), which has shown several times that this was a bad idea.