Data conversion: validating XML
This tool assumes your document is UTF-8 and reports, by line number, each place where a non-UTF-8 character is encountered.
Because it identifies and counts every character in a document, it is rather slow, but very useful.
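To make the idea concrete, here is a minimal Python sketch of this kind of check. It is not the tool described above; the function name check_utf8 is hypothetical, and this version reports only the first invalid byte on each offending line.

```python
from collections import Counter


def check_utf8(data: bytes):
    """Scan bytes that are assumed to be UTF-8.

    Returns (bad_lines, counts):
      bad_lines - list of (line_number, byte_offset) for lines that fail
                  to decode; only the first bad byte per line is reported
      counts    - Counter of every character on the lines that decoded
    """
    bad_lines = []
    counts = Counter()
    # Splitting on the newline byte is safe: 0x0A never occurs inside
    # a multi-byte UTF-8 sequence.
    for lineno, raw in enumerate(data.split(b"\n"), start=1):
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError as err:
            bad_lines.append((lineno, err.start))
            continue
        counts.update(text)
    return bad_lines, counts


if __name__ == "__main__":
    # Line 2 contains a lone Latin-1 byte (0xEF), which is not valid UTF-8.
    sample = "caf\u00e9\n".encode("utf-8") + b"na\xefve\n"
    bad, counts = check_utf8(sample)
    for lineno, offset in bad:
        print(f"line {lineno}: invalid UTF-8 at byte offset {offset}")
    print(f"{sum(counts.values())} characters counted on valid lines")
```

Decoding and counting every character is what makes a check like this slow on large files, which matches the trade-off noted above.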
For many collections, converting and preparing data is the most time-consuming and difficult part of mounting the collection online.
Because each conversion project is specific to your material and cannot be easily generalized, DLXS does not formally support mechanisms for converting data to various formats.
Nevertheless, we do provide some documentation on strategies, tools, and methods that we have found helpful for data conversion.
Some of this documentation is class-specific, and some deals with more general Unicode and XML issues.
Many of you will be fortunate enough to have files already in XML, for example finding aids in EAD 2002. Many of you may be in a position where you'll want to convert your SGML files to XML.
You will find four sample files that we'll examine for character encoding and then convert to UTF-8. Copy these to your own directory; they are completely expendable and won't serve a purpose in tomorrow's Text Class implementation.
First, we'll look at which character or numeric entities, if any, are used in these documents.
Now that we know which items need what character treatments, we'll convert them, using ncr2utf to turn the entities into the characters. &amp; is the ampersand (as is &#38;); if you convert these to the character, you will run into validation problems down the road, as bare ampersands are not permitted in XML.
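The two steps, listing the entities in use and then converting numeric character references, can be sketched in Python as follows. This is a stand-in for illustration, not the ncr2utf tool itself; the names list_entities and ncr_to_char are hypothetical. The sketch leaves named entities alone and, as an assumption of its own, keeps any reference to & or < escaped so the result stays valid XML.

```python
import re
from collections import Counter

# Matches decimal NCRs (&#233;), hex NCRs (&#xE9;) and named entities (&eacute;).
ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9._-]*);")

# Characters that must remain escaped in XML content.
KEEP_ESCAPED = {"&", "<"}


def list_entities(text):
    """Count each distinct entity or character reference in the text."""
    return Counter(m.group(0) for m in ENTITY_RE.finditer(text))


def ncr_to_char(text):
    """Replace numeric character references with the characters they name.

    Named entities are left untouched, and so are references to
    '&' and '<', which would break validation if written bare.
    """
    def repl(match):
        body = match.group(1)
        if not body.startswith("#"):
            return match.group(0)  # named entity: needs a DTD to resolve
        try:
            code = int(body[2:], 16) if body[1] in "xX" else int(body[1:])
            char = chr(code)
        except ValueError:
            return match.group(0)  # out-of-range code point: leave as-is
        return match.group(0) if char in KEEP_ESCAPED else char

    return ENTITY_RE.sub(repl, text)


if __name__ == "__main__":
    sample = "Caf&#233; &amp; cr&#xE8;me &#38; sugar"
    for entity, n in sorted(list_entities(sample).items()):
        print(n, entity)
    print(ncr_to_char(sample))
```

Running the listing step first shows whether a file contains only numeric references, which can be converted mechanically, or also named entities, which need their declarations to resolve.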
Because the file we want to work with is now UTF-8, we need to set some environment variables so that the tools from the sp package know to treat it as UTF-8.
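As a rough illustration, the Python sketch below builds such an environment and passes it to a child process. It assumes the variables in question are SP_ENCODING and SP_CHARSET_FIXED, which the SP/OpenSP character-set documentation describes; the text above does not name them, so check them against your installation. The helper names sp_utf8_env and run_sp_tool are hypothetical.

```python
import os
import subprocess


def sp_utf8_env(base=None):
    """Return a copy of the environment with the SP UTF-8 settings added.

    Assumption: SP_ENCODING and SP_CHARSET_FIXED are the relevant
    variables for the installed sp tools.
    """
    env = dict(os.environ if base is None else base)
    env["SP_CHARSET_FIXED"] = "YES"   # treat the document character set as Unicode
    env["SP_ENCODING"] = "utf-8"      # read input files as UTF-8
    return env


def run_sp_tool(args):
    """Run an sp command line (a list of strings) with the UTF-8 settings."""
    return subprocess.run(args, env=sp_utf8_env(), capture_output=True, text=True)
```

A call might look like run_sp_tool(["onsgmls", "-s", "sample.xml"]), where both the command and the file name are placeholders for whatever is installed locally. Setting the variables on a copy of the environment keeps the change scoped to the child process rather than the whole session.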