In certain occasions e.g. when you want to save scrapped html page or just a specific part of it (e.g. div with its contents) the recommendation I found is to use the blob type and save xml document as binary stream.
In groovy, this is achieved in a rather simple way:
class Document {
int id
String title
String xmlContent
}
Sql sql = Sql.newInstance(
"jdbc:mysql://localhost:3306/documents_db?useUnicode=true&characterEncoding=UTF-8&useBlobToStoreUTF8OutsideBMP=true",
"user", "pass", "com.mysql.jdbc.Driver")
// We assume db table similar to:
sql.execute("""
CREATE TABLE IF NOT EXISTS `document` (
`id` bigint(32) NOT NULL AUTO_INCREMENT,
`title` varchar(500) NOT NULL,
`xmlContent` blob,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
""")
def saveDocuments(def docs) {
println "Saving documents to database..."
def documents = sql.dataSet("document")
docs.each {Document doc ->
documents.add(title: doc.title, xmlContent: doc.xmlContent)
}
}
When you want to retrieve data, with Groovy this is not much complicated either:
def loadDocuments() {
def list = []
sql.eachRow("SELECT * FROM document") {
Document doc = new Document(it.toRowResult())
if (it['xmlContent'] != null) {
doc.xmlContent = new String(it['xmlContent'])
}
list.add(doc)
}
println "Loaded ${list.size()} documents from database"
list
}
I tried the code with mysql database but I assume that you can achieve the same with any other database.
One hint for mysql blobs and utf-8: although utf-8 charset behaves very well with other database table fields (if you add useUnicode=true
and characterEncoding=UTF-8
parameters to your connection string), in order to work with utf-8 blob fields you should adjust one more parameter: useBlobToStoreUTF8OutsideBMP=true
. With this parameter set, your utf8 encoded xml documents will behave well with mysql database.
Leave a Reply