
Ruby XML, XSLT and XPath tutorials

What is XML?

XML stands for eXtensible Markup Language.

XML is a subset of the Standard Generalized Markup Language (SGML), a markup language used to give electronic documents a structure.

It can be used to mark up data and define data types, and it is a meta-language that lets users define their own markup languages. It is well suited to transport over the web and provides a uniform, application- and vendor-independent way to describe and exchange structured data.

For more content, check out our XML tutorial


XML parser structure and API

The two main kinds of XML parser are DOM and SAX.

  • A SAX parser is event-based. It scans the XML document from beginning to end, and each time it encounters a syntactic construct (such as a start tag, an end tag, or text) it calls the event handler registered for that construct, passing the event to the application.
  • A DOM (Document Object Model) parser builds a hierarchical representation of the document as a tree in memory, with each node of the tree represented by an object. Once parsing is complete, the entire DOM tree is held in memory.
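The difference between the two models can be sketched with REXML, the library introduced in the next section. This is a minimal illustration under stated assumptions: the inline sample string, the TagCounter class, and the dom_book_titles and stream_tag_counts helpers are all made up for the example.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# DOM style: parse the whole document into a tree, then navigate it.
def dom_book_titles(xml)
  doc = REXML::Document.new(xml)
  doc.root.elements.to_a('book').map { |book| book.text }
end

# SAX style: the parser walks the document once and calls back into
# a listener for every start tag it meets; no tree is kept.
class TagCounter
  include REXML::StreamListener

  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def tag_start(name, _attributes)
    @counts[name] += 1
  end
end

def stream_tag_counts(xml)
  counter = TagCounter.new
  REXML::Document.parse_stream(xml, counter)
  counter.counts
end

# Hypothetical sample data, used only for this sketch.
sample = '<library><book id="1">Ruby</book><book id="2">XML</book></library>'
p dom_book_titles(sample)
p stream_tag_counts(sample)
```

The DOM helper can revisit any node after parsing, while the stream helper only sees each tag once, as it goes by. That is why stream parsing tends to suit large documents and tree parsing tends to suit small ones.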

Parsing and creating XML in Ruby

XML documents can be parsed in Ruby with the REXML library.

REXML is an XML toolkit written in pure Ruby that conforms to the XML 1.0 specification.

REXML has shipped with Ruby since version 1.8. From Ruby 3.0 onward it is distributed as a bundled gem rather than as part of the standard library proper.

To load the library, require rexml/document.

All of its classes and methods are contained in a single REXML module.

REXML has the following advantages over other parsers:

  • It is written entirely in Ruby.
  • It supports both SAX-style and DOM-style parsing.
  • It is lightweight, at fewer than 2000 lines of code.
  • Its methods and classes are easy to understand.
  • It offers a SAX2-based API and full XPath support.
  • It ships with Ruby, so no separate installation is needed.
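The section title also mentions creating XML. A minimal sketch of building a document in memory with REXML might look like the following. The build_collection helper and its input format are hypothetical; the element and attribute names simply mirror the movies.xml sample used in this tutorial.

```ruby
require 'rexml/document'

# Build a small <collection> document in memory.
# build_collection is a hypothetical helper written for this sketch.
def build_collection(shelf, movies)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')
  collection = doc.add_element('collection', 'shelf' => shelf)

  movies.each do |movie|
    node = collection.add_element('movie', 'title' => movie[:title])
    node.add_element('format').text = movie[:format]
    node.add_element('stars').text = movie[:stars].to_s
  end
  doc
end

doc = build_collection('New Arrivals',
                       [{ title: 'Trigun', format: 'DVD', stars: 10 }])

# Write the document to standard output with two-space indentation.
doc.write($stdout, 2)
puts
```

Calling doc.to_s instead of doc.write returns the same document as a single string without added indentation, which is usually the safer choice when whitespace inside elements matters.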

The following sample XML is saved as movies.xml:

<collection shelf="New Arrivals">
  <movie title="Enemy Behind">
    <type>War, Thriller</type>
    <format>DVD</format>
    <year>2003</year>
    <rating>PG</rating>
    <stars>10</stars>
    <description>Talk about a US-Japan war</description>
  </movie>
  <movie title="Transformers">
    <type>Anime, Science Fiction</type>
    <format>DVD</format>
    <year>1989</year>
    <rating>R</rating>
    <stars>8</stars>
    <description>A schientific fiction</description>
  </movie>
  <movie title="Trigun">
    <type>Anime, Action</type>
    <format>DVD</format>
    <episodes>4</episodes>
    <rating>PG</rating>
    <stars>10</stars>
    <description>Vash the Stampede!</description>
  </movie>
  <movie title="Ishtar">
    <type>Comedy</type>
    <format>VHS</format>
    <rating>PG</rating>
    <stars>2</stars>
    <description>Viewable boredom</description>
  </movie>
</collection>

DOM parser

Let's start by parsing the XML data in tree (DOM) mode. First we require the rexml/document library; for convenience, the REXML module is often included into the top-level namespace:

Example

#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)

# Get the root element
root = xmldoc.root
puts "Root element : " + root.attributes["shelf"]

# Print the title of every movie
xmldoc.elements.each("collection/movie") { |e|
  puts "Movie Title : " + e.attributes["title"]
}

# Print the type of every movie
xmldoc.elements.each("collection/movie/type") { |e|
  puts "Movie Type : " + e.text
}

# Print the description of every movie
xmldoc.elements.each("collection/movie/description") { |e|
  puts "Movie Description : " + e.text
}

The output of the above example is:

Root element : New Arrivals
Movie Title : Enemy Behind
Movie Title : Transformers
Movie Title : Trigun
Movie Title : Ishtar
Movie Type : War, Thriller
Movie Type : Anime, Science Fiction
Movie Type : Anime, Action
Movie Type : Comedy
Movie Description : Talk about a US-Japan war
Movie Description : A schientific fiction
Movie Description : Vash the Stampede!
Movie Description : Viewable boredom
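Because the DOM tree stays in memory, the same kind of traversal can collect values into ordinary Ruby data structures instead of printing them. The following is a sketch only: movies_to_hashes is a hypothetical helper, and it takes the XML as a string so that it does not depend on movies.xml being present.

```ruby
require 'rexml/document'

# Convert every <movie> element into a Hash.
# movies_to_hashes is a hypothetical helper written for this sketch.
def movies_to_hashes(xml)
  doc = REXML::Document.new(xml)
  doc.elements.to_a('collection/movie').map do |movie|
    record = { 'title' => movie.attributes['title'] }
    # Each child element (type, format, stars, ...) becomes a key.
    movie.elements.each { |child| record[child.name] = child.text }
    record
  end
end

sample = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Ishtar">
      <type>Comedy</type>
      <format>VHS</format>
      <stars>2</stars>
    </movie>
  </collection>
XML

movies_to_hashes(sample).each do |m|
  puts "#{m['title']}: #{m['format']}, #{m['stars']} stars"
end
```

All values come back as strings, so numeric fields such as stars would still need an explicit conversion (for example with to_i) before any arithmetic.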

SAX parser

The following example processes the same data file, movies.xml, in a stream-oriented way. Stream (SAX-style) parsing is not recommended for a file this small; it is used here only as a simple demonstration:

Example

#!/usr/bin/ruby -w

require 'rexml/document'
require 'rexml/streamlistener'
include REXML

class MyListener
  include REXML::StreamListener

  # Called for every start tag with the tag name and its attributes
  def tag_start(*args)
    puts "tag_start: #{args.map { |x| x.inspect }.join(', ')}"
  end

  # Called for every run of character data
  def text(data)
    # Skips the whitespace between tags; as written, this pattern
    # also skips single-word text such as "DVD" or "2003"
    return if data =~ /^\w*$/
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")
    puts "  text : #{abbrev.inspect}"
  end
end

list = MyListener.new
xmlfile = File.new("movies.xml")
Document.parse_stream(xmlfile, list)

The output of the above example is:

tag_start: "collection", {"shelf"=>"New Arrivals"}
tag_start: "movie", {"title"=>"Enemy Behind"}
tag_start: "type", {}
  text : "War, Thriller"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text : "Talk about a US-Japan war"
tag_start: "movie", {"title"=>"Transformers"}
tag_start: "type", {}
  text : "Anime, Science Fiction"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text : "A schientific fiction"
tag_start: "movie", {"title"=>"Trigun"}
tag_start: "type", {}
  text : "Anime, Action"
tag_start: "format", {}
tag_start: "episodes", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text : "Vash the Stampede!"
tag_start: "movie", {"title"=>"Ishtar"}
tag_start: "type", {}
tag_start: "format", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text : "Viewable boredom"
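A stream listener keeps no tree, so any context it needs has to be tracked by hand. The sketch below shows one way to do that, using tag_start, text, and tag_end together to record each movie's star rating. StarCollector and stream_stars are hypothetical names, and the sketch assumes that each stars value arrives as a single text event.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Collects the star rating of each movie while streaming.
# StarCollector is a hypothetical class written for this sketch.
class StarCollector
  include REXML::StreamListener

  attr_reader :stars

  def initialize
    @stars = {}           # movie title => rating
    @current_title = nil  # title of the <movie> being read
    @in_stars = false     # true only between <stars> and </stars>
  end

  def tag_start(name, attributes)
    @current_title = attributes['title'] if name == 'movie'
    @in_stars = (name == 'stars')
  end

  def text(data)
    @stars[@current_title] = data.to_i if @in_stars
  end

  def tag_end(name)
    @in_stars = false if name == 'stars'
  end
end

def stream_stars(xml)
  collector = StarCollector.new
  REXML::Document.parse_stream(xml, collector)
  collector.stars
end

sample = '<collection><movie title="Trigun"><stars>10</stars></movie>' \
         '<movie title="Ishtar"><stars>2</stars></movie></collection>'
stream_stars(sample).each { |title, rating| puts "#{title}: #{rating}" }
```

The two instance variables act as a very small state machine. For deeper nesting, a stack of open tag names pushed in tag_start and popped in tag_end would serve the same purpose.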
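
XPath and Ruby

REXML's XPath class can be used to query a parsed document with XPath expressions.

Example

#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)

# Get the first movie element
movie = XPath.first(xmldoc, "//movie")
p movie

# Print the type of every movie
XPath.each(xmldoc, "//type") { |e| puts e.text }

# Get the formats of all movies, returned as an array
names = XPath.match(xmldoc, "//format").map { |x| x.text }
p names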

The output of the above example is:

<movie title='Enemy Behind'> ... </>
War, Thriller
Anime, Science Fiction
Anime, Action
Comedy
["DVD", "DVD", "DVD", "VHS"]
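XPath expressions can also filter on attributes and be evaluated relative to an element that has already been found. The following sketch illustrates both. The find_movie and formats_by_title helpers are hypothetical, and find_movie assumes that the title contains no single quote, since it is interpolated directly into the expression.

```ruby
require 'rexml/document'

# Return the first <movie> whose title attribute equals the given title,
# or nil if there is none. Hypothetical helper for this sketch.
def find_movie(xml, title)
  doc = REXML::Document.new(xml)
  REXML::XPath.first(doc, "//movie[@title='#{title}']")
end

# Map each movie title to its format by running a relative XPath
# ("format") against each <movie> element. Hypothetical helper.
def formats_by_title(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//movie').each_with_object({}) do |movie, result|
    format = REXML::XPath.first(movie, 'format')
    result[movie.attributes['title']] = format&.text
  end
end

sample = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Trigun"><format>DVD</format></movie>
    <movie title="Ishtar"><format>VHS</format></movie>
  </collection>
XML

puts find_movie(sample, 'Ishtar').attributes['title']
formats_by_title(sample).each { |title, format| puts "#{title}: #{format}" }
```

A movie without a format element is mapped to nil rather than raising an error, because the safe navigation operator is used when reading the text.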

XSLT and Ruby

Two XSLT processors are available for Ruby. Each is described briefly below:

Ruby-Sablotron

This processor is written and maintained by Masayoshi Takahashi. It is written mainly for the Linux operating system and requires the following libraries:

  • Sablot
  • Iconv
  • Expat

You can find these libraries at Ruby-Sablotron.

XSLT4R

XSLT4R was written by Michael Neumann. It provides a simple command-line interface and can also be used from within third-party applications to transform XML documents.

XSLT4R needs XMLScan to operate. XMLScan is included in the XSLT4R archive and is also a 100% Ruby module. These modules can be installed using the standard Ruby installation method (i.e. ruby install.rb).

The XSLT4R command-line syntax is as follows:

ruby xslt.rb stylesheet.xsl document.xml [arguments]

To use XSLT4R from within an application, require xslt and pass in the parameters you need. For example:

Example

require "xslt"

# File.read returns the whole file as a single string
stylesheet = File.read("stylesheet.xsl")
xml_doc = File.read("document.xml")

arguments = { 'image_dir' => '/....' }
sheet = XSLT::Stylesheet.new(stylesheet, arguments)

# Output to StdOut
sheet.apply(xml_doc)

# Output to 'str'
str = ""
sheet.output = [str]
sheet.apply(xml_doc)






