But one key feature that the Scala libraries lack is support for validating XML documents against an XSD schema. Fortunately, the JDK has a standard mechanism for this in the javax.xml.validation package, and with a little work we can combine the two to get the best of both platforms.

The scala.xml.XML object provides a set of simple functions for reading XML from various sources (files, streams, etc.). A quick inspection of its code shows that all the load* methods are wrappers around the loadXML method of the scala.xml.parsing.NoBindingFactoryAdapter class.

NoBindingFactoryAdapter (via its parent class scala.xml.parsing.FactoryAdapter) extends the standard org.xml.sax.helpers.DefaultHandler class and serves as a bridge between the Java and Scala XML worlds. The relevant loadXML method is defined on FactoryAdapter and looks like this:

def loadXML(source: InputSource): Node = {
  // create parser
  val parser: SAXParser = try {
    val f = SAXParserFactory.newInstance()
    f.setNamespaceAware(false)
    f.newSAXParser()
  } catch {
    case e: Exception =>
      Console.err.println("error: Unable to instantiate parser")
      throw e
  }
  // parse file
  scopeStack.push(TopScope)
  parser.parse(source, this)
  scopeStack.pop
  return rootElem
}

Essentially, this method creates a standard javax.xml.parsers.SAXParser object and passes itself in as the ContentHandler. The callback methods in FactoryAdapter convert the standard SAX parsing events into instances of the core Scala XML types. This is our hook for introducing XSD validation.

The javax.xml.validation.ValidatorHandler class acts as a filter of sorts (although it is not a true org.xml.sax.XMLFilter): it intercepts the callbacks sent from a SAX parser to a ContentHandler and reports errors if the data doesn't conform to its schema definition. We will create a subclass of NoBindingFactoryAdapter that interposes a ValidatorHandler between the parser and the callbacks defined in FactoryAdapter to implement validation more or less transparently.

We will use the loadXML method of FactoryAdapter as a starting point for the implementation:

import javax.xml.parsers.SAXParser
import javax.xml.parsers.SAXParserFactory
import javax.xml.validation.Schema
import javax.xml.validation.ValidatorHandler
import org.xml.sax.InputSource
import org.xml.sax.XMLReader
import scala.xml.Elem
import scala.xml.TopScope
import scala.xml.parsing.NoBindingFactoryAdapter

class SchemaAwareFactoryAdapter(schema: Schema) extends NoBindingFactoryAdapter {
  override def loadXML(source: InputSource): Elem = {
    // create a namespace-aware parser
    val parser: SAXParser = try {
      val f = SAXParserFactory.newInstance()
      f.setNamespaceAware(true)
      f.setFeature("http://xml.org/sax/features/namespace-prefixes", true)
      f.newSAXParser()
    } catch {
      case e: Exception =>
        Console.err.println("error: Unable to instantiate parser")
        throw e
    }
    // route SAX events through the validator before they reach this adapter
    val xr: XMLReader = parser.getXMLReader()
    val vh: ValidatorHandler = schema.newValidatorHandler()
    vh.setContentHandler(this)
    xr.setContentHandler(vh)
    // parse file
    scopeStack.push(TopScope)
    xr.parse(source)
    scopeStack.pop
    return rootElem.asInstanceOf[Elem]
  }
}

The key differences here are that we have enabled namespace awareness on the parser (a requirement for schema validation!) and placed our ValidatorHandler between the parser and the FactoryAdapter instance. Because we are actually overriding NoBindingFactoryAdapter#loadXML (which has a return type of Elem), we need the cast in the final line.

So now we can validate XML as follows:

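import java.io.File
import java.io.FileInputStream
import javax.xml.XMLConstants
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory
import org.xml.sax.InputSource
import scala.xml.XML

// A schema can be loaded in like ...
val sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
val s = sf.newSchema(new StreamSource(new File("foo.xsd")))
// and whenever we would want to do something like:
//   val is = new InputSource(new FileInputStream("foo.xml"))
//   val xml = XML.load(is)
// instead we'll use our class:
val is = new InputSource(new FileInputStream("foo.xml"))
val xml = new SchemaAwareFactoryAdapter(s).loadXML(is)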
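
One thing the listing above leaves implicit is how validation failures surface. As far as the JDK documentation describes it, a ValidatorHandler with no ErrorHandler set throws the SAXParseException for the first validation error it meets, so loadXML would fail with that exception. The following is a minimal sketch, not part of the original class, of collecting the messages instead by calling setErrorHandler. It uses only JDK classes; the inline schema, the greeting element, the ValidationErrorsSketch object and the validationErrors helper are all hypothetical names chosen for illustration, and a plain DefaultHandler stands in for the FactoryAdapter.

```scala
import java.io.StringReader
import javax.xml.XMLConstants
import javax.xml.parsers.SAXParserFactory
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.{Schema, SchemaFactory}
import org.xml.sax.{ErrorHandler, InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler
import scala.collection.mutable.ListBuffer

object ValidationErrorsSketch {
  // Hypothetical schema: a <greeting> element whose content must be an integer.
  val xsd: String =
    """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      |  <xs:element name="greeting" type="xs:integer"/>
      |</xs:schema>""".stripMargin

  val schema: Schema =
    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
      .newSchema(new StreamSource(new StringReader(xsd)))

  /** Parses `xml` through a ValidatorHandler and returns the validation messages. */
  def validationErrors(xml: String): List[String] = {
    val errors = ListBuffer.empty[String]

    val f = SAXParserFactory.newInstance()
    f.setNamespaceAware(true) // required for schema validation
    val xr = f.newSAXParser().getXMLReader()

    val vh = schema.newValidatorHandler()
    // Record recoverable validation errors instead of aborting on the first one.
    vh.setErrorHandler(new ErrorHandler {
      def warning(e: SAXParseException): Unit = ()
      def error(e: SAXParseException): Unit = { errors += e.getMessage; () }
      def fatalError(e: SAXParseException): Unit = throw e
    })
    vh.setContentHandler(new DefaultHandler) // stand-in for the FactoryAdapter
    xr.setContentHandler(vh)

    xr.parse(new InputSource(new StringReader(xml)))
    errors.toList
  }
}
```

Calling ValidationErrorsSketch.validationErrors("<greeting>hello</greeting>") should return a non-empty list, while a document such as "<greeting>42</greeting>" should return an empty one. The same setErrorHandler call could presumably be made on vh inside SchemaAwareFactoryAdapter.loadXML if collecting errors is preferable to failing fast; whether that is the right policy depends on the application.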