WH-347 Form to Payroll XML Example
Building Up From the Schema
Every well-formed XML file should begin with an XML preamble (formally, the XML declaration). This alerts the receiving program that the file should comply with the fundamental rules of XML. It should start in column 1 of the very first line, and usually looks exactly like the following:
<?xml version="1.0" encoding="UTF-8"?>
This just announces that it is an XML file, following version 1.0 of that specification (the version in wide use today), and that the text is encoded in UTF-8. (If you are writing plain English text using only ASCII characters, you are already writing valid UTF-8 whether you intended to or not, because ASCII is a subset of UTF-8. For accented letters, special symbols, and other languages, you need to make sure your software actually saves the file in this character encoding.)
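If you generate or receive these files with a program, a quick check that the preamble really is the first thing in the file can save a confusing validator error later. The Python sketch below uses only the standard library; the name check_preamble is our own, not part of any payroll tool, and it is a rough pre-flight check, not a full XML parser.

```python
import re

# An XML 1.0 declaration starting at the very first byte of the file.
# The optional group captures the declared encoding name, if present.
# (A leading byte-order mark, which XML also permits, is not handled here.)
_DECLARATION = re.compile(
    rb"""<\?xml\s+version\s*=\s*["']1\.0["']"""
    rb"""(?:\s+encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["'])?"""
)


def check_preamble(data: bytes) -> str:
    """Return the declared encoding, or raise ValueError if the data does
    not start with an XML 1.0 declaration in column 1 of line 1."""
    match = _DECLARATION.match(data)  # match() only looks at position 0
    if match is None:
        raise ValueError("file does not begin with an XML 1.0 declaration")
    # With no encoding declaration (and no byte-order mark), XML assumes UTF-8.
    encoding = (match.group(1) or b"UTF-8").decode("ascii")
    # Raises UnicodeDecodeError if the bytes are not valid in that encoding.
    data.decode(encoding)
    return encoding


print(check_preamble(b'<?xml version="1.0" encoding="UTF-8"?>\n<Payroll/>'))  # → UTF-8
```

For a file on disk you might call check_preamble(open("payroll.xml", "rb").read()), where payroll.xml is a hypothetical file name. A single leading space or blank line before the declaration is enough to make the check fail, just as it would upset a strict XML parser.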
Immediately following the preamble, the file should have the root element for a Payroll, which is defined in the schema by the following:
That's a roundabout way of saying that the element is called Payroll and is of type PayrollType, and that anything of type PayrollType contains PayrollHeaderType content, followed by an optional PayrollBenefitPrograms element and an optional PayrollEmployees element. Based on that, we would expect a valid file, without the optional content, to look something like the following:
So what constitutes valid PayrollHeaderType content? That's much simpler, and is defined by:
That says that PayrollHeaderType is a sequence of seven elements, named vendorID, contractID, and so on, each of type xs:string, xs:integer, or xs:date. The comments element is optional, signified by the minOccurs="0" attribute. Plugging that in, we might get:
That looks good. Let's check it with either the online or downloadable validator.
[Error] xml:2:10: cvc-elt.1: Cannot find the declaration of element 'Payroll'.
Well, that's not good. But the declaration of element 'Payroll' is right there in the schema, on line 5. So what's the problem?
It's a little bit tricky. What if multiple schemas each defined an element called Payroll? It might not matter, unless someone tried to use both schemas at once. In that case, the validator (and any other software) would have a problem: which definition's rules should apply? So schemas generally declare a namespace for their definitions. That way, a user can say something like "this Payroll is the one defined in this schema, but that Payroll is different, defined in that schema." That's what happened here. If you look closely at the schema, you'll see the <schema> element has an attribute called targetNamespace whose value is http://www.transxml.net/schema/prl/1.2. That means that the elements it directly declares, like Payroll, are actually in that namespace.
How is the validator to know that? Well, our XML file can tell it. Add an attribute called xmlns to the Payroll element, and set its value to http://www.transxml.net/schema/prl/1.2, as shown below.
That xmlns attribute announces that the names of this element and all of its descendant elements belong to the specified namespace (unless a descendant declares a different one), rather than being treated as unqualified names with no namespace. That should solve the problem with the Payroll element. When we run the file through the validator, we see that the problem has been solved.
The source document is valid Payroll XML.
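One way to see what the xmlns attribute changes is to parse two tiny fragments with Python's standard xml.etree.ElementTree module, which reports a namespaced name as {namespace-uri}localName. The fragments below are hypothetical and are not complete Payroll files; the vendorID child is there only to show that descendants pick up the namespace too.

```python
import xml.etree.ElementTree as ET

PRL = "http://www.transxml.net/schema/prl/1.2"

# No xmlns: the names are unqualified (in no namespace at all).
bare = ET.fromstring("<Payroll><vendorID>AC-999</vendorID></Payroll>")

# Default namespace declared on Payroll: it and its descendants are in PRL.
qualified = ET.fromstring(
    f'<Payroll xmlns="{PRL}"><vendorID>AC-999</vendorID></Payroll>'
)

for root in (bare, qualified):
    # ElementTree shows a namespaced name as {namespace-uri}localName.
    print(root.tag, [child.tag for child in root])
```

The first line of output shows the bare name Payroll; the second shows {http://www.transxml.net/schema/prl/1.2}Payroll, which is the name the schema's targetNamespace actually declares.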
Now that we have a valid starting point, we can move on to building a more complete Payroll XML example in the next section.
Converting from Form WH-347
Now consider a realistic but very simple situation as might be reported on the standard US Department of Labor WH-347 form. (The DOL makes a PDF version of the form and instructions available for download.)
We will work from an example form that you can download here. We will also display snippets from the form as we go.
Start with the minimal example we created in the last section, but with all of the data left blank, as shown below:
We need to start by filling these values in, from the WH-347 form when possible, but otherwise from other sources we can access. The top of the form has some of the data we need:
The XML file needs the vendorID, which isn't in the form. Instead, the form provides the name of the contractor. We will need to look up or otherwise know that contractor's DOT-provided vendor number. Let's assume for the example that it is AC-999. We also need the contractID, which is available in the form: it's 2011-1234. The form also has the payroll number in it (23) and the payroll ending date (06/18/2011). The payroll beginning date is not here, but the form always covers a single seven-day week, so we can figure out that the beginning date must be 06/12/2011, six days before the ending date. Finally, there's nothing yet about fringe benefits or comments, but those elements are just strings, which can be blank, so we will skip them for now.
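The beginning-date reasoning is simple date arithmetic. A minimal Python sketch, assuming (as the form implies) that the payroll period is seven consecutive days ending on the payroll ending date, with both end points included; payroll_begin_date is a hypothetical helper name:

```python
from datetime import date, timedelta


def payroll_begin_date(end: date) -> date:
    """First day of a seven-day payroll week that ends (inclusive) on `end`."""
    return end - timedelta(days=6)  # 6, not 7: both end points are in the week


end = date(2011, 6, 18)            # payroll ending date from the form
begin = payroll_begin_date(end)
print(begin.strftime("%m/%d/%Y"))  # → 06/12/2011
```

Subtracting 7 instead of 6 is the classic off-by-one here; it would produce an eight-day period.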
Putting those values into the file, we get the following:
Let's try to validate this.
[Error] xml:6:51: cvc-datatype-valid.1.2.1: '06/12/2011' is not a valid value for 'date'.
[Error] xml:6:51: cvc-type.3.1.3: The value '06/12/2011' of element 'payrollBeginDate' is not valid.
[Error] xml:7:47: cvc-datatype-valid.1.2.1: '06/18/2011' is not a valid value for 'date'.
[Error] xml:7:47: cvc-type.3.1.3: The value '06/18/2011' of element 'payrollEndDate' is not valid.
Apparently, our dates are not in an acceptable format. The schema says that they must be of type xs:date. The xs prefix is associated, in the schema's namespace declarations, with the namespace http://www.w3.org/2001/XMLSchema, which means xs:date is one of the built-in data types defined by the W3C XML Schema specification itself, rather than by the Payroll XML schema or any of the files it includes. Rather than hunting through that specification, we'll just give the answer: dates of that type must be written as YYYY-MM-DD. That changes the file to the following:
The validator checks the file, and tells us that we are okay (so far).
The source document is valid Payroll XML.
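Converting the form's MM/DD/YYYY dates by hand is error-prone once there are many of them. A minimal Python sketch of the conversion, using only the standard datetime module (to_xs_date is a hypothetical helper name):

```python
from datetime import datetime


def to_xs_date(us_date: str) -> str:
    """Convert MM/DD/YYYY (as written on the WH-347) to the YYYY-MM-DD
    lexical form that xs:date requires."""
    # strptime raises ValueError for malformed or impossible dates.
    return datetime.strptime(us_date, "%m/%d/%Y").date().isoformat()


print(to_xs_date("06/12/2011"))  # → 2011-06-12
print(to_xs_date("06/18/2011"))  # → 2011-06-18
```

Going through a real date object, rather than just rearranging the digits, has a side benefit: an impossible date such as 02/30/2011 is rejected with a ValueError instead of being passed on to the validator.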
But we are not done. The most important information, the actual hours and pay for each employee, still needs to be entered. All of that is part of one of the optional elements mentioned in the previous section, PayrollEmployees. So we want to add that to the file.
When we check it with the validator, we get an error message:
[Error] xml:9:41: cvc-complex-type.2.4.b: The content of element 'PayrollEmployees' is not complete. One of '{"http://www.transxml.net/schema/prl/1.2":PayrollEmployee}' is expected.
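This error is about occurrence counts. Each element in a schema sequence carries minOccurs and maxOccurs bounds (both default to 1 when omitted), and an empty PayrollEmployees element has zero PayrollEmployee children. The Python sketch below mimics just that counting rule for a single repeated child; it is not a schema validator, check_occurrences is a hypothetical helper, and the bounds used (at least one, no upper limit) are our assumption about what the schema says.

```python
import xml.etree.ElementTree as ET

PRL = "http://www.transxml.net/schema/prl/1.2"


def check_occurrences(parent, child_name, min_occurs=1, max_occurs=None):
    """Compare the number of direct children named child_name (in the Payroll
    namespace) with minOccurs/maxOccurs; max_occurs=None means "unbounded".
    Returns an error message, or None if the count is within bounds."""
    count = len(parent.findall(f"{{{PRL}}}{child_name}"))
    if count < min_occurs:
        return f"{child_name}: found {count}, need at least {min_occurs}"
    if max_occurs is not None and count > max_occurs:
        return f"{child_name}: found {count}, allowed at most {max_occurs}"
    return None


empty = ET.fromstring(f'<PayrollEmployees xmlns="{PRL}"/>')
two = ET.fromstring(
    f'<PayrollEmployees xmlns="{PRL}">'
    "<PayrollEmployee/><PayrollEmployee/>"
    "</PayrollEmployees>"
)
print(check_occurrences(empty, "PayrollEmployee"))  # → PayrollEmployee: found 0, need at least 1
print(check_occurrences(two, "PayrollEmployee"))    # → None
```

The empty PayrollEmployee elements in the second fragment pass this count but would still fail the real schema, since their own required children are missing; the real validator checks every level.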
So, while the PayrollEmployees element is itself optional, if it is included, it must not be empty. Let's look at the definition of the PayrollEmployees element in the schema. It says it is of type PayrollEmployeePropertyType, so we will look at that definition.
That says that anything of this type is a sequence of PayrollEmployee elements. That sequence must have at least one occurrence (which is why the example above was not valid). Our WH-347 form has data for two employees, as shown below:
We will need two PayrollEmployee elements, one for John J. Jones and one for Mary M. Smith. So what does a PayrollEmployee look like? The schema says it's of type PayrollEmployeeType, which is defined as follows:
This is again pretty straightforward, for the most part. It is a sequence of 13 elements, many of which are optional. Most of them are of a type we've already encountered, xs:string. But there are some new types here. Any type with a name starting with xs: is a standard XML Schema data type. The only new one here is xs:boolean, which should be written as true or false (the schema standard also accepts 1 and 0). Capitalization is important: True and FALSE are not allowed.
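Because booleans are an easy place to trip up, here is a small Python sketch of the xs:boolean rules: the XML Schema specification allows exactly the lexical forms true, false, 1, and 0, after surrounding whitespace is stripped. The helper names are hypothetical.

```python
# xs:boolean accepts exactly these four lexical forms (lowercase only).
XS_BOOLEAN = {"true": True, "false": False, "1": True, "0": False}


def parse_xs_boolean(text: str) -> bool:
    """Interpret element text as xs:boolean, or raise ValueError."""
    # xs:boolean collapses whitespace, so surrounding spaces and newlines
    # are ignored; the comparison itself is case-sensitive.
    value = text.strip(" \t\r\n")
    if value not in XS_BOOLEAN:
        raise ValueError(f"{text!r} is not a valid xs:boolean")
    return XS_BOOLEAN[value]


def format_xs_boolean(flag: bool) -> str:
    """Write a Python bool using the spelled-out lowercase form."""
    return "true" if flag else "false"


print(parse_xs_boolean(" false\n"), format_xs_boolean(True))  # → False true
```

Writing the spelled-out form, as format_xs_boolean does, keeps the file readable to people even though 1 and 0 would also validate.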
There is an element of type PayrollEmployeeLaborPropertyType, which is declared elsewhere in this schema, and which we will get to soon. There is also an element of type txl:StreetAddressPropertyType, which uses a new namespace prefix, txl. To find its definition, look back at the top of the PRL-GML20170101pxs.xsd file. The schema element declares the txl namespace to be http://www.transxml.net/schema/txl/1.0. The rules for that namespace are imported in line 4, which refers to the file TXL-GML20060501pxs.xsd. We will need to look in that file for the definition of that type. And the element with that type, employeeAddress, is required, so we can't get much further without it. However, the address is not in the form, so even though it is required, it will have to be obtained elsewhere.
The elements lastName, firstName, middleInitial, and partialSsn clearly correspond to data shown in the form. socialSecurityNumber and vendorSuppliedEmployeeID are each optional and not shown in the form, so we will leave them out (though some agencies might insist that those optional data elements be included for them to process the file).
gender and ethnicity are each required, and not in the form, so they will have to be found elsewhere or left as empty strings. Even though their types are simply strings, each agency is likely to have its own list of valid values and want those values to be used here. We will assume that we've checked that list and determined that John J. Jones is gender M and ethnicity A, and Mary M. Smith is gender F and ethnicity B.
The remaining fields are employeeAddress, which is not in the form but is required, and several optional fields. We may need to fill in some of those optional fields, but for now we will leave them out.
Recall that the employeeAddress field is defined as being of type txl:StreetAddressPropertyType. That type is not defined in our main payroll schema file, PRL-GML20170101pxs.xsd. That's because its name carries the namespace prefix txl, which stands for the namespace http://www.transxml.net/schema/txl/1.0. The main schema file imports the definition of that namespace from another file:
So we have to look at the file TXL-GML20060501pxs.xsd to find this definition:
Wow. Well, let's take it from the top. Any element of type StreetAddressPropertyType in this namespace consists of a sequence of StreetAddress elements, each of type StreetAddressType. It is legal to have zero elements, according to the schema, but that does not make sense here. We will have a single StreetAddress element.
The schema also says that the StreetAddress elements can have XML attributes as defined in yet another namespace, but it turns out that those attributes are all optional, and not important for our use. So far, that means the employeeAddress element we are trying to create will have a single child element, a txl:StreetAddress element. That txl prefix is needed because StreetAddress belongs to the txl namespace rather than to the default namespace we declared on Payroll, and the schema for that namespace requires every element name to be qualified with the namespace. Note the exact required capitalization of each element name.
So what about each StreetAddress element? That is of type StreetAddressType, which means that it contains one or more elements named street, exactly one element each named cityName, stateCode, and postalCodeID, and optional elements named postalCodeExtension and addressCategory. All of these elements are of type xs:string, except for the postalCodeExtension element, which is of type xs:integer.
Let's assume we have found that John J. Jones lives at 123 Main Street, Anytown FL 34567 and Mary M. Smith lives at 456 First Avenue, Anytown FL 345CDE. That gives us most of the data fields. The postalCodeExtension is not available, but the schema makes it optional (some agencies might want it filled in anyway). The addressCategory is also optional, but let's assume that the agency has told us that it uses the code H for home addresses, and fill that in. Because postalCodeID is a string, it can hold alphanumeric values like the one in Mary M. Smith's address, up to the 10 characters the element allows.
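If you build these files with a program rather than by hand, the nesting and namespaces described above can be expressed directly. Below is a minimal sketch with Python's standard xml.etree.ElementTree, assuming that the child order listed above matches the schema's sequence and that employeeAddress itself is in the main Payroll namespace. employee_address and txl are hypothetical helpers, and the sketch builds only the address fragment, not a full PayrollEmployee.

```python
import xml.etree.ElementTree as ET

PRL = "http://www.transxml.net/schema/prl/1.2"
TXL = "http://www.transxml.net/schema/txl/1.0"


def txl(name: str) -> str:
    """Qualified name in the txl namespace, in ElementTree's {uri}name form."""
    return f"{{{TXL}}}{name}"


def employee_address(streets, city, state, postal_code, category=None):
    """Build employeeAddress > txl:StreetAddress with children in the order
    street+, cityName, stateCode, postalCodeID, addressCategory?."""
    wrapper = ET.Element(f"{{{PRL}}}employeeAddress")
    address = ET.SubElement(wrapper, txl("StreetAddress"))
    for line in streets:  # one or more street elements
        ET.SubElement(address, txl("street")).text = line
    ET.SubElement(address, txl("cityName")).text = city
    ET.SubElement(address, txl("stateCode")).text = state
    ET.SubElement(address, txl("postalCodeID")).text = postal_code
    # postalCodeExtension is optional and we don't have it, so it is omitted.
    if category is not None:
        ET.SubElement(address, txl("addressCategory")).text = category
    return wrapper


# Choose the prefixes used on output ("" = default namespace). Module-wide setting.
ET.register_namespace("", PRL)
ET.register_namespace("txl", TXL)

jones = employee_address(["123 Main Street"], "Anytown", "FL", "34567", "H")
smith = employee_address(["456 First Avenue"], "Anytown", "FL", "345CDE", "H")
print(ET.tostring(jones, encoding="unicode"))
```

When it serializes namespaced names, ElementTree writes the xmlns and xmlns:txl declarations on the outermost element by itself, so a generated fragment always has its prefixes bound.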
We are not ready to create the final section, but we can start to fill it out, as shown below:
We know this isn't complete, because we have not filled in any of the optional PayrollEmployeeLabors elements. Even though they are optional, they are much of the point of this effort, and there are hours shown on the WH-347 form. But even though it is incomplete, is what we have so far correct? Let's try it out in one of the validators. We get the following:
[Fatal Error] xml:18:32: The prefix "txl" for element "txl:StreetAddress" is not bound.
The error is related to namespaces. The prefix txl doesn't mean anything to the validator, because nothing in our file binds it to a namespace. Recall that we declared, with the xmlns attribute, that the file uses the namespace http://www.transxml.net/schema/prl/1.2, which is defined by the file PRL-GML20170101pxs.xsd. The txl: elements belong to a separate namespace, defined in a separate file, which we also need to declare. We can't just add a second xmlns attribute to the Payroll element, because an element can have only one default namespace. Instead, we declare the second namespace with an xmlns:txl attribute, which announces that it will be used for elements prefixed with txl:, as shown below.
We will actually split the declaration over two lines, which is perfectly allowable in XML.
With that declaration added, we have the following file:
Sure enough, that fixes it. The validator reports:
The source document is valid Payroll XML.
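The unbound-prefix failure happens at the XML parsing level, before any schema checking, so any namespace-aware parser will reject the file. Below is a Python sketch reproducing both cases with the standard xml.etree.ElementTree module; the fragment is hypothetical and trimmed down to the namespace question, and parse_with is our own helper name.

```python
import xml.etree.ElementTree as ET

PRL = "http://www.transxml.net/schema/prl/1.2"
TXL = "http://www.transxml.net/schema/txl/1.0"

# A trimmed-down, hypothetical fragment: just enough to show the prefix problem.
FRAGMENT = "<employeeAddress><txl:StreetAddress/></employeeAddress>"


def parse_with(declarations: str) -> ET.Element:
    """Parse FRAGMENT inside a Payroll root carrying the given xmlns attributes."""
    return ET.fromstring(f"<Payroll {declarations}>{FRAGMENT}</Payroll>")


# Only the default namespace: "txl" is not bound, so the parser itself fails,
# before any schema validation could even begin.
try:
    parse_with(f'xmlns="{PRL}"')
except ET.ParseError as err:
    print("rejected:", err)

# Default namespace plus xmlns:txl, split over two lines inside the start tag.
root = parse_with(f'xmlns="{PRL}"\n         xmlns:txl="{TXL}"')
print(root[0][0].tag)  # → {http://www.transxml.net/schema/txl/1.0}StreetAddress
```

Note that the default namespace still applies to employeeAddress, while StreetAddress lands in the txl namespace only because of its prefix.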
Let's move to a lower level of detail. Each PayrollEmployee element can have a PayrollEmployeeLabors element of type PayrollEmployeeLaborPropertyType. Information about each employee's labor is certainly on the WH-347 form, so we want to include that element in the XML file. Let's look at the definition of PayrollEmployeeLaborPropertyType:
The PayrollEmployeeLabors element just contains a sequence of PayrollEmployeeLabor elements. Each of those elements is of type PayrollEmployeeLaborType. What does PayrollEmployeeLaborType define?
Wow! It's a sequence of 36 child elements. Luckily, 27 of them are optional, so we will only deal with them if we have matching data on the WH-347 form. The nine required elements are craftCode, laborClass, projectID, ojtProgramIndicator, apprentice, totalHours, grossPay, totalDeductions, and netPay. The form also has several unlabeled columns, empty in our example, that might be used in some cases; when they are, some of the optional elements could likely be filled in from them. Let's take a look at the employee section of the WH-347 form again, this time for the full width:
The value for totalHours should probably be the sum of the TOTAL HOURS column for straight time and overtime. grossPay is probably the same as the GROSS AMOUNT EARNED column, netPay the same as NET WAGES PAID FOR WEEK, and totalDeductions the same as TOTAL DEDUCTIONS.
craftCode is not shown on the form, so we will have to find it elsewhere. Let's assume we did that, and the craftCode for John J. Jones is L-31 and the craftCode for Mary M. Smith is D-83. The laborClass either matches the WORK CLASSIFICATION column or can be derived from it. We will assume that they match in this case. There are some new data types used here. totalHours must be decimal, which is a standard type that represents a decimal number (like 40.0). But the various "pay" elements are all of type txl:CurrencyPropertyType. For the moment, let's see if those are just simple numbers, too, and come back to them if needed.
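Because totalHours is a decimal and the pay amounts are money, it is safer to do any arithmetic on them in decimal rather than binary floating point. A minimal Python sketch, assuming our interpretation above (totalHours is straight time plus overtime). The figures are made up for illustration, not taken from the sample form, and the net-pay cross-check is an optional extra we added rather than something the schema requires.

```python
from decimal import Decimal


def total_hours(straight_time: str, overtime: str = "0") -> Decimal:
    """totalHours as interpreted above: straight-time plus overtime hours.
    Decimal keeps values such as 0.1 hours exact, unlike binary floats."""
    return Decimal(straight_time) + Decimal(overtime)


def net_pay_matches(gross: str, deductions: str, net: str) -> bool:
    """Optional cross-check: net wages should equal gross minus deductions."""
    return Decimal(gross) - Decimal(deductions) == Decimal(net)


# Hypothetical figures for illustration only (not from the sample form).
print(total_hours("40", "4.5"))                        # → 44.5
print(net_pay_matches("1000.00", "250.00", "750.00"))  # → True
```

Passing the form's values in as strings, rather than as Python floats, is what keeps them exact from the start.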
That leaves three required elements that don't seem to match anything in the form. ojtProgramIndicator and apprentice are each boolean values, but don't show on the form. Just as with gender and ethnicity, we have to get those values elsewhere. For this simple example, we will assume they are both false.
The projectID field seems to be for the case where a single contract can have multiple projects. It allows each entry to be for a separate project, such as when a large contract is broken into several projects. This example doesn't do that, so the projectID value will be the same in each entry, and match the overall contractID.
Putting these values together with the actual form, the updated XML file now looks like the following:
Run this through the validator to see if we are on the right track. Apparently, not quite:
[Error] xml:35:44: cvc-complex-type.2.3: Element 'grossPay' cannot have character [children], because the type's content type is element-only.
[Error] xml:35:44: cvc-complex-type.2.4.b: The content of element 'grossPay' is not complete. One of '{"http://www.transxml.net/schema/txl/1.0":Currency}' is expected.
[Error] xml:36:57: cvc-complex-type.2.3: Element 'totalDeductions' cannot have character [children], because the type's content type is element-only.
[Error] xml:36:57: cvc-complex-type.2.4.b: The content of element 'totalDeductions' is not complete. One of '{"http://www.transxml.net/schema/txl/1.0":Currency}' is expected.
[Error] xml:37:40: cvc-complex-type.2.3: Element 'netPay' cannot have character [children], because the type's content type is element-only.
[Error] xml:37:40: cvc-complex-type.2.4.b: The content of element 'netPay' is not complete. One of '{"http://www.transxml.net/schema/txl/1.0":Currency}' is expected.
[Error] xml:65:44: cvc-complex-type.2.3: Element 'grossPay' cannot have character [children], because the type's content type is element-only.
[Error] xml:65:44: cvc-complex-type.2.4.b: The content of element 'grossPay' is not complete. One of '{"http://www.transxml.net/schema/txl/1.0":Currency}' is expected.
[Error] xml:66:57: cvc-complex-type.2.3: Element 'totalDeductions' cannot have character [children], because the type's content type is element-only.
[Error] xml:66:57: cvc-complex-type.2.4.b: The content of element 'totalDeductions' is not complete. One of '{"http://www.transxml.net/schema/txl/1.0":Currency}' is expected.
[Error] xml:67:40: cvc-complex-type.2.3: Element 'netPay' cannot have character [children], because the type's content type is element-only.
[Error] xml:67:40: cvc-complex-type.2.4.b: The content of element 'netPay' is not complete. One of '{"http://www.transxml.net/schema/txl/1.0":Currency}' is expected.
The six "pay" elements are all rejected. Just putting a number in for something of type txl:CurrencyPropertyType is clearly not allowed. We need to look up the definition of that type, but where is it? Note the namespace prefix of txl, which means we have to look in the file TXL-GML20060501pxs.xsd. Sure enough, CurrencyPropertyType is defined there:
That's pretty daunting. Starting at the top, it says that an element of type CurrencyPropertyType contains a sequence of exactly one element, named Currency. There is something there about an attributeGroup, but we will see if we can work without worrying about that. The Currency element is of type CurrencyType. That in turn contains three child elements, amount, country, and currencyUnits. The last two children are optional, and since this is based on a U.S. federal form, we can figure that U.S. dollars are assumed and leave them out. That leaves just the amount element, which is a decimal value, so plain numbers should work.
To fix each of the "pay" elements, instead of just a simple number like 1800.00 we need a Currency child element that contains an amount child element, which can contain the decimal number (1800.00 in this example). Both those new elements are defined as being in the namespace http://www.transxml.net/schema/txl/1.0. So we must change the various "pay" elements' values to contain the proper nested elements. For example, change the first grossPay as follows:
<grossPay><txl:Currency><txl:amount>1800.00</txl:amount></txl:Currency></grossPay>
And we are back to having a valid file:
The source document is valid Payroll XML.
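When the file is produced by a program, the same nesting can come from one helper used for all three "pay" elements. A minimal Python sketch with xml.etree.ElementTree; pay_element is a hypothetical name, and leaving out country and currencyUnits reflects the U.S.-dollar assumption above.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

PRL = "http://www.transxml.net/schema/prl/1.2"
TXL = "http://www.transxml.net/schema/txl/1.0"


def pay_element(name: str, amount: Decimal) -> ET.Element:
    """Build <name><txl:Currency><txl:amount>...</txl:amount></txl:Currency></name>.
    country and currencyUnits are left out, assuming U.S. dollars."""
    pay = ET.Element(f"{{{PRL}}}{name}")
    currency = ET.SubElement(pay, f"{{{TXL}}}Currency")
    amount_el = ET.SubElement(currency, f"{{{TXL}}}amount")
    # Always two decimal places, e.g. Decimal("1800") is written as 1800.00.
    # (More precise inputs are rounded with Decimal's default half-even rule.)
    amount_el.text = str(amount.quantize(Decimal("0.01")))
    return pay


ET.register_namespace("", PRL)
ET.register_namespace("txl", TXL)
print(ET.tostring(pay_element("grossPay", Decimal("1800")), encoding="unicode"))
```

The same call with "totalDeductions" or "netPay" as the name produces the other two elements, so all six corrections in the file come from one code path.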
If only we were done. But there is important information entered in the WH-347 form that is not yet in the XML file. Some of that is optional data similar to what is already handled, namely the straightTimeHourlyRate, overtimeHourlyRate, straightTimeHours, overtimeHours, fICAAmount, and federalWithholdingTaxAmount elements. And we will interpret regularHourlyRate as being the same as straightTimeHourlyRate. Adding them to the file gives us:
That is still valid, and all that is left to enter from the form are the hours worked each day. Those are part of the optional PayrollEmployeeLaborHours child element of the PayrollEmployeeLabor element. That element is of type PayrollEmployeeLaborHourPropertyType, defined as follows:
That says that the content of this element is a sequence of one or more PayrollEmployeeLaborHour elements, and each of those contains a laborHourDate element and then either an hourlyHours element or a salariedEmployeeHours element. If there is an hourlyHours element, it contains a straightTimeHours child element and an optional overtimeHours child element.
The sample WH-347 form does not make it clear whether the employees are salaried or hourly, but we will assume they are all hourly. In that case, it is pretty clear how to map the form to these elements. One question is whether a date without hours for an employee should result in a PayrollEmployeeLaborHour element with only a laborHourDate child, or should simply be omitted. We will omit it and only create PayrollEmployeeLaborHour elements if the employee shows time worked that day.
When we try that out in the validator, we get:
The source document is valid Payroll XML.
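The daily mapping, including our choice to skip days with no time worked, can be sketched in Python as well. This assumes the seven daily values are listed starting from the payrollBeginDate (06/12/2011 in our example), that the hours elements are in the main Payroll namespace, and that all employees are hourly. labor_hours and q are hypothetical helpers, and the hours in the usage lines are illustrative, not from the sample form.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta
from decimal import Decimal

PRL = "http://www.transxml.net/schema/prl/1.2"


def q(name: str) -> str:
    """Qualified name in the main Payroll namespace, in ElementTree's {uri}name form."""
    return f"{{{PRL}}}{name}"


def labor_hours(week_start, straight, overtime=None):
    """Build a PayrollEmployeeLaborHours element from seven daily values.

    straight and overtime are sequences of seven Decimal hour counts, one per
    day starting at week_start. Days with no time worked get no element.
    """
    if overtime is None:
        overtime = [Decimal(0)] * 7
    hours = ET.Element(q("PayrollEmployeeLaborHours"))
    for offset, (st, ot) in enumerate(zip(straight, overtime)):
        if st == 0 and ot == 0:
            continue  # our choice: omit idle days entirely
        entry = ET.SubElement(hours, q("PayrollEmployeeLaborHour"))
        day = week_start + timedelta(days=offset)
        ET.SubElement(entry, q("laborHourDate")).text = day.isoformat()
        hourly = ET.SubElement(entry, q("hourlyHours"))
        ET.SubElement(hourly, q("straightTimeHours")).text = str(st)
        if ot:  # overtimeHours is optional, so only write it when nonzero
            ET.SubElement(hourly, q("overtimeHours")).text = str(ot)
    return hours


# Illustrative week: 8 hours Monday through Friday, 2 overtime hours Wednesday.
week = labor_hours(
    date(2011, 6, 12),
    [Decimal(h) for h in ("0", "8", "8", "8", "8", "8", "0")],
    [Decimal(h) for h in ("0", "0", "0", "2", "0", "0", "0")],
)
for entry in week:
    print(entry.find(q("laborHourDate")).text)
```

Because laborHourDate is computed from the week's start date, the dates come out in the YYYY-MM-DD form xs:date requires without any separate conversion step.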
You will have noted that there were many judgment calls made here, and some could have gone differently. You will have to decide what works best in your situation based on what your state agency expects and the data you have available.