Parsing the WoW Armory without XML
A month or so ago, Blizzard moved the WoW Armory to Battle.net servers. The new WoW Armory does not currently offer XML feeds for its data, so I spent a few hours working with PHP and DOM to create a 'parser' for it. The script below is a trimmed-down version of the one currently used for the We Know Roster. I am only providing the back-end script, which does the parsing and stores the information in a MySQL database. To display the results on a front end, query that database.
The script pulls the following information for each member of a specified guild: Name, Race, Class, Level, Rank, Achievement Points, Profession 1 Name+Level, Profession 2 Name+Level, Talent 1, and Talent 2. Characters showing 0 achievement points are treated as inactive and skipped.
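The following is a minimal, self-contained sketch of the roster-parsing step that shows how the fields above map to the roster table. It assumes the same cell positions the PHP backend below relies on: name in the first td, level in the fourth, rank in the fifth, and achievement points in the sixth. It also skips characters with 0 achievement points. The function name parseRosterRows and the sample markup are made up for illustration, and the real Armory page may lay things out differently.

```php
<?php
// Hypothetical helper: turn a saved roster page into one array per active member.
// ASSUMPTION: column order matches the backend script
// (cell 0 = name, 3 = level, 4 = rank, 5 = achievement points).
function parseRosterRows(string $html): array
{
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The first tbody holds the roster rows.
    $bodies = $dom->getElementsByTagName('tbody');
    if ($bodies->length === 0) {
        return [];
    }

    $members = [];
    foreach ($bodies->item(0)->getElementsByTagName('tr') as $row) {
        $cells = $row->getElementsByTagName('td');
        if ($cells->length < 6) {
            continue; // not a character row
        }
        $ap = (int) trim($cells->item(5)->nodeValue);
        if ($ap <= 0) {
            continue; // inactive characters show 0 achievement points
        }
        $members[] = [
            'name'  => trim($cells->item(0)->nodeValue),
            'level' => trim($cells->item(3)->nodeValue),
            'rank'  => trim($cells->item(4)->nodeValue),
            'ap'    => $ap,
        ];
    }
    return $members;
}

// Made-up sample markup with the assumed column layout.
$sample = '<table><tbody>'
    . '<tr><td>Alice</td><td></td><td></td><td>85</td><td>1</td><td>4500</td></tr>'
    . '<tr><td>Bob</td><td></td><td></td><td>80</td><td>3</td><td>0</td></tr>'
    . '</tbody></table>';
print_r(parseRosterRows($sample)); // Bob is skipped: 0 achievement points
```

Keeping the parsing in a function that takes a string makes it easy to test against a saved copy of the page before pointing it at the live Armory.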
The scripts below need to be modified before they will work, so you should be comfortable with PHP and the command line before using them. I will develop a more user-friendly version only if Blizzard does not supply useful XML or JSON feeds within a reasonable amount of time.
The Bash Script:
The bash script downloads the latest roster page (HTML) from the new Armory. This could probably be done from the PHP script, but because the file is several thousand lines long, I found it more efficient to save it first and read it locally.
Pay special attention to the paths. You will need to change them to match your system.
#!/bin/bash
# Replace YOUR_SERVER_HERE and YOUR_GUILD_NAME_HERE with your realm and guild name.
# If your guild name is two or more words, encode the spaces as %20,
# e.g. Your%20Guild%20Name
wget --directory-prefix=/path/to/your/desired/directory/ "http://us.battle.net/wow/en/guild/YOUR_SERVER_HERE/YOUR_GUILD_NAME_HERE/roster"
mv /path/to/your/desired/directory/roster /path/to/your/desired/directory/roster.html
php /path/to/php/file/ParseRoster.php
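If you would rather fetch the page from PHP, as mentioned above, the sketch below shows one possible approach. rawurlencode() produces the %20-style guild name that the bash comment asks for. The helper only downloads the page again when the saved copy is older than a chosen age. The names buildRosterUrl and loadRosterHtml, and the one-hour default, are hypothetical choices and not part of the original script.

```php
<?php
// Hypothetical helper: build the same roster URL the bash script downloads.
// rawurlencode() turns spaces into %20, e.g. "Your Guild Name" -> "Your%20Guild%20Name".
// ASSUMPTION: the realm is passed exactly as it appears in the Armory URL.
function buildRosterUrl(string $realm, string $guild): string
{
    return 'http://us.battle.net/wow/en/guild/'
        . rawurlencode($realm) . '/' . rawurlencode($guild) . '/roster';
}

// Hypothetical helper: reuse the saved copy if it is recent enough,
// otherwise download the page once and save it locally for the parser.
function loadRosterHtml(string $url, string $cachePath, int $maxAgeSeconds = 3600): string
{
    if (is_file($cachePath) && time() - filemtime($cachePath) < $maxAgeSeconds) {
        return file_get_contents($cachePath);
    }
    $html = file_get_contents($url);
    if ($html === false) {
        throw new RuntimeException("Could not download $url");
    }
    file_put_contents($cachePath, $html);
    return $html;
}
```

For example, buildRosterUrl('YOUR_SERVER_HERE', 'Your Guild Name') returns http://us.battle.net/wow/en/guild/YOUR_SERVER_HERE/Your%20Guild%20Name/roster.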
The SQL Dump:
Import this into a MySQL database.
--
-- Table structure for table `roster`
--

CREATE TABLE IF NOT EXISTS `roster` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(255) NOT NULL,
  `race` VARCHAR(255) NOT NULL,
  `class` VARCHAR(255) NOT NULL,
  `level` VARCHAR(255) NOT NULL,
  `rank` VARCHAR(255) NOT NULL,
  `ap` VARCHAR(255) NOT NULL,
  `prof1name` VARCHAR(255) DEFAULT NULL,
  `prof1value` VARCHAR(255) DEFAULT NULL,
  `prof2name` VARCHAR(255) DEFAULT NULL,
  `prof2value` VARCHAR(255) DEFAULT NULL,
  `talent1` VARCHAR(255) DEFAULT NULL,
  `talent2` VARCHAR(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;
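The backend below builds its INSERT statement as a string and runs it with the mysql_* functions, which were removed in PHP 7.0. The sketch below performs the same insert into this table using PDO and a prepared statement. The function name insertRosterRow and the shape of the $m array, which uses the column names as keys, are hypothetical. The connection string in the comment is only an example.

```php
<?php
// Hypothetical helper: insert one guild member into the `roster` table using a
// prepared statement, so quotes in any value cannot break the SQL.
// ASSUMPTION: $pdo is an already-connected PDO handle, for example
//   new PDO('mysql:host=localhost;dbname=YOUR_DB', $user, $password)
// and $m uses the table's column names as keys.
function insertRosterRow(PDO $pdo, array $m): void
{
    // `rank` is quoted because RANK is a reserved word in MySQL 8.0+.
    $stmt = $pdo->prepare(
        'INSERT INTO roster (name, race, class, level, `rank`, ap,
             prof1name, prof1value, prof2name, prof2value, talent1, talent2)
         VALUES (:name, :race, :class, :level, :rank, :ap,
             :prof1name, :prof1value, :prof2name, :prof2value, :talent1, :talent2)'
    );
    $stmt->execute([
        'name'       => $m['name'],
        'race'       => $m['race'],
        'class'      => $m['class'],
        'level'      => $m['level'],
        'rank'       => $m['rank'],
        'ap'         => $m['ap'],
        // Professions and talents are nullable columns; missing ones become NULL.
        'prof1name'  => $m['prof1name'] ?? null,
        'prof1value' => $m['prof1value'] ?? null,
        'prof2name'  => $m['prof2name'] ?? null,
        'prof2value' => $m['prof2value'] ?? null,
        'talent1'    => $m['talent1'] ?? null,
        'talent2'    => $m['talent2'] ?? null,
    ]);
}
```

Inside the roster loop, you would call this once per active member in place of mysql_query(). Reuse a single PDO connection for the whole run.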
The PHP Backend:
The PHP file should be fairly straightforward.
A few notes:
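- I keep my database credentials in a config file. If you have a similar file, include it; otherwise, add the appropriate mysql_connect() call. Note that the mysql_* functions were removed in PHP 7.0, so on newer PHP versions the database calls need to be ported to mysqli or PDO.
- Make sure the path to the roster.html file is correct.
- Replace eitrigg in the character URL with your own realm.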
<?php
/**
 * This script will parse the new WoW Armory without an XML file.
 * It currently pulls the Name, Level, Class, Race, Rank,
 * Achievement Points, Professions, and Talents of every member
 * in a specified guild. The script works for me but may not work
 * as expected on every system. Use at your own risk.
 *
 * Requires PHP 5: the mysql_* functions used below were removed in PHP 7.0.
 *
 * @author Josh Grochowski (josh[dot]kastang[at]gmail[dot]com)
 */

set_time_limit(8000);

// Should open the MySQL connection used by mysql_query() below.
include("/path/to/config/file.php");

// The Armory pages are not well-formed HTML; suppress the libxml warnings.
libxml_use_internal_errors(true);

getRosterInformation();

function getRosterInformation() {
    $html = file_get_contents("/path/to/roster/file/roster.html");

    $dom = new DOMDocument;
    // preserveWhiteSpace only takes effect if it is set before loading.
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($html);

    // The first tbody tag marks the start of the actual
    // 'roster' part of the HTML.
    $roster = $dom->getElementsByTagName('tbody');

    // Each character has its own tr block.
    $char = $roster->item(0)->getElementsByTagName('tr');

    foreach ($char as $c) {
        // Character information is split into individual td blocks.
        $charInfo = $c->getElementsByTagName('td');
        $charImages = $c->getElementsByTagName('img');

        // I only care about active characters. Inactive characters
        // display 0 achievement points.
        if ((int)$charInfo->item(5)->nodeValue > 0) {
            $name  = trim($charInfo->item(0)->nodeValue);
            $race  = $charImages->item(0)->getAttribute('src');
            $class = $charImages->item(1)->getAttribute('src');
            $level = trim($charInfo->item(3)->nodeValue);
            $rank  = trim($charInfo->item(4)->nodeValue);
            $ap    = trim($charInfo->item(5)->nodeValue);

            // Returns an array containing the profession names/levels
            // and talents of each individual character.
            $charArray = getCharacterInformation($name);

            // Escape every value so a stray quote cannot break the query.
            $values = array_map('mysql_real_escape_string', array(
                $name, $race, $class, $level, $rank, $ap,
                $charArray['profName1'], $charArray['profValue1'],
                $charArray['profName2'], $charArray['profValue2'],
                $charArray['talent1'], $charArray['talent2']));

            // `rank` is quoted because RANK is a reserved word in MySQL 8.0+.
            $query = "INSERT INTO roster (name, race, class, level, `rank`, ap, "
                   . "prof1name, prof1value, prof2name, prof2value, talent1, talent2) "
                   . "VALUES ('" . implode("', '", $values) . "')";
            mysql_query($query) or die(mysql_error());

            // Wait 5 seconds between character requests to keep from getting
            // banned from the WoW Armory servers. This can probably be lowered
            // to three or four seconds, but a ban can last as long as 48 hours.
            sleep(5);
        }
    }
}

function getCharacterInformation($charName) {
    // Link to the character's page on the WoW Armory.
    // The realm (eitrigg) is hard-coded; replace it with yours.
    $charInfo = file_get_contents("http://us.battle.net/wow/en/character/eitrigg/"
        . rawurlencode($charName) . "/simple");

    $dom = new DOMDocument;
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($charInfo);

    $xpath = new DOMXPath($dom);

    // Profession names
    $profName = $xpath->query('//span[@class="profession-details"]/span[@class="name"]');

    // Profession values
    $profValue = $xpath->query('//span[@class="profession-details"]/span[@class="value"]');

    // Talents
    $talents = $xpath->query('//span[@class="name-build"]/span[@class="name"]');

    $charArray = array(
        "profName1"  => $profName->item(0)->nodeValue,
        "profValue1" => $profValue->item(0)->nodeValue,
        "profName2"  => $profName->item(1)->nodeValue,
        "profValue2" => $profValue->item(1)->nodeValue,
        "talent1"    => $talents->item(0)->nodeValue,
        "talent2"    => $talents->item(1)->nodeValue);

    return $charArray;
}
?>
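getCharacterInformation() reads ->nodeValue from item(0) and item(1) directly. If a character has fewer than two professions or only one talent spec, one of those items is null and PHP emits a notice or warning. The sketch below runs the same XPath queries but returns null for a missing entry. It assumes the class names used in the script (profession-details, name-build, name, value), which may have changed on the live site. The name extractCharacterDetails is hypothetical.

```php
<?php
// Hypothetical helper: same XPath queries as getCharacterInformation(),
// but a missing profession or talent yields null instead of a warning.
// ASSUMPTION: the character page still uses these span class names.
function extractCharacterDetails(string $html): array
{
    $previous = libxml_use_internal_errors(true); // page HTML is not well-formed
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Returns the trimmed text of the $i-th match, or null if there is none.
    $nth = function (string $query, int $i) use ($xpath): ?string {
        $node = $xpath->query($query)->item($i);
        return $node === null ? null : trim($node->nodeValue);
    };

    $name   = '//span[@class="profession-details"]/span[@class="name"]';
    $value  = '//span[@class="profession-details"]/span[@class="value"]';
    $talent = '//span[@class="name-build"]/span[@class="name"]';

    return [
        'profName1'  => $nth($name, 0),
        'profValue1' => $nth($value, 0),
        'profName2'  => $nth($name, 1),
        'profValue2' => $nth($value, 1),
        'talent1'    => $nth($talent, 0),
        'talent2'    => $nth($talent, 1),
    ];
}
```

Because it takes the page HTML as a string, you can check it against a saved character page without hitting the Armory. The returned keys match what the roster loop expects from getCharacterInformation().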
