Monday, July 30, 2012

#perl: How to manipulate XML files using the XML::Twig module

The goal is to use XML::Twig to add and remove fields in a sample XML file.

We have a list of characters with names and scores. We'll want to remove the score field and add a percentage score field. This library is object oriented, so we'll be using methods. In this example we will also load all the content into memory.

It will help you understand the code if you know that Twig creates a tree structure, where each character is a "child" of the root, and each character's name, score, and league fields are children of that character, which you can fetch with the first_child method.
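To make the tree shape concrete, here is a minimal sketch that parses a tiny document from a string instead of a file. The root tag `characters` and the score and league values are invented for illustration; only the `character`, `name`, `score`, and `league` element names come from this post.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data: the root tag and the score/league values are made up.
my $xml = <<'XML';
<characters>
  <character><name>Fred Flintstone</name><score>150</score><league>A</league></character>
  <character><name>Barney Rubble</name><score>120</score><league>B</league></character>
</characters>
XML

my $twig = XML::Twig->new;
$twig->parse($xml);                          # build the whole tree in memory

my $root   = $twig->root;                    # the element above the characters
my @people = $root->children('character');   # each character is a child of the root

foreach my $person (@people) {
    # name, score, and league are children of each character element
    my $name   = $person->first_child('name')->text;
    my $league = $person->first_child('league')->text;
    print "$name plays in league $league\n";
}
```

Passing a tag name to `children` keeps the list limited to `character` elements, so the loop does not have to check what kind of node it received.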

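Input file (excerpt showing only the name fields; each name sits in a character element alongside score and league fields):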
<?xml version="1.0"?>
<name>Fred Flintstone</name>
<name>Barney Rubble</name>


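use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig->new;
# build a new twig by parsing the input (the file name sample.xml is assumed here)
$twig->parsefile('sample.xml');
# get the root of the twig. This is above the characters
my $root = $twig->root;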
# each character is a child
my @people = $root->children;
# you can print the whole thing simply using
#$twig->print;
# or make it nicer
#$twig->print(pretty_print => 'indented');
# or iterate and pick what you want and the formatting
foreach my $person (@people) {
    # print the name
    print $person->first_child('name')->text, "\n";
    # list the score
    print $person->first_child('score')->text, "\n";
    # print the whole content, from character tag to its closing
    $person->print;
    print "\n";
}

# *** ADDING A FIELD - perc_score (score/300)
my $out_of = 300;
foreach my $person (@people) {
    # get that person's score
    my $score = $person->first_child('score')->text;
    # compute % score
    my $perc_score = sprintf("%3.2f", $score/$out_of * 100);
    my $element = XML::Twig::Elt->new('perc_score', $perc_score); # create a new element
    $element->paste('last_child', $person); # paste it into the document
}
# admire the results
$twig->print(pretty_print => 'indented');
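The percentage step can be tried in isolation. The following is a minimal sketch on a single record parsed from a string; the score of 150 is invented for illustration, and the variable names simply mirror the ones used above.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical record: the score of 150 is made up for illustration.
my $twig = XML::Twig->new;
$twig->parse('<character><name>Fred Flintstone</name><score>150</score></character>');
my $person = $twig->root;

my $out_of     = 300;
my $score      = $person->first_child('score')->text;
my $perc_score = sprintf('%.2f', $score / $out_of * 100);   # 150/300*100 gives "50.00"

# Build the element with its text content and attach it as the last child.
my $element = XML::Twig::Elt->new('perc_score', $perc_score);
$element->paste(last_child => $person);

# Serialize only the new element to check the result.
print $person->last_child->sprint, "\n";
```

`paste` takes a position and a reference element, so the same call with `first_child` instead of `last_child` would put the new field before the name.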

# name of the field we will remove
my $field = 'score';
# open the output file first: everything outside the twig roots is written
# to this filehandle while the file is being parsed
open my $stream_out, '>', 'sample_out.xml' or die "Can't save output to file: $!";
# new twig that will use a subroutine to go through the content
my $twig3 = XML::Twig->new(twig_roots => { $field => \&field }, twig_print_outside_roots => $stream_out);
$twig3->parsefile('sample.xml'); # the file name sample.xml is assumed here
close $stream_out;
sub field {
    my($twig, $elt) = @_;
    # the element is never printed, and that is what removes it from the output
    $twig->purge; # free the memory used so far
}

# the same thing can also be done by loading the content into an array and iterating through it, instead of using a "twig_handler" subroutine:

foreach my $character ($root->children('character')) {
    # the actual removal of score
    $_->delete for $character->children('score');
}
# save to a file; print takes the filehandle as its first parameter
open my $out, '>', 'sample_out.xml' or die "Can't save output to file: $!";
$twig->print($out, pretty_print => 'indented');
close $out;

# *** FINDING THE LEADER - person's name and actual score

my $leader_name;
my $leader_score = 0;
my $twig2 = XML::Twig->new(twig_handlers => { character => \&player });
# new only registers the handler; parsing is what runs player, once for each character as its closing tag is read
$twig2->parsefile('sample.xml'); # the file name sample.xml is assumed here
print "\nLeader in the league: $leader_name score: $leader_score\n";
sub player {
    my($twig, $player) = @_; # handler's params are always the twig and the element
    my $score = $player->first_child('score')->text;
    if($score > $leader_score) {
        $leader_score = $score;
        $leader_name = $player->first_child('name')->text;
    }
}
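When the handler actually runs is easy to check with a counter. This is a minimal sketch, not the code from this post: it parses a string rather than a file, the root tag and the scores are invented, and `$calls` is a hypothetical helper variable added only to count handler invocations.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data: the root tag and the scores are made up.
my $xml = <<'XML';
<characters>
  <character><name>Fred Flintstone</name><score>150</score></character>
  <character><name>Barney Rubble</name><score>120</score></character>
</characters>
XML

my $leader_name;
my $leader_score = 0;
my $calls        = 0;    # hypothetical counter: how many times the handler ran

my $twig = XML::Twig->new(
    twig_handlers => {
        character => sub {
            my ($t, $player) = @_;    # the twig and the element just parsed
            $calls++;
            my $score = $player->first_child('score')->text;
            if ($score > $leader_score) {
                $leader_score = $score;
                $leader_name  = $player->first_child('name')->text;
            }
        },
    },
);

print "Handler calls after new:   $calls\n";    # nothing has been parsed yet
$twig->parse($xml);
print "Handler calls after parse: $calls\n";    # one call per character
print "Leader: $leader_name, score: $leader_score\n";
```

The counter stays at zero until `parse` is called, which shows that constructing the twig only registers the handler and that the parse is what triggers it.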

