Signatures are useful for spinning articles

Thinking about the different ways spun text could be put to good use made me think about what is unique about an article. Keeping everything as simple as possible, I came up with the idea of a signature, created each time a document is processed.

So let’s think of an agreed signature for a document. The rule should be that a template and a signature should be all that is needed to create the final document. The simplest signature possible (for spinning only, at the moment) is therefore a record of which option was chosen in each snippet used to create the output. Snippets are the strings between the delimiters, the places in the document that actually change.

So in the following example, if the template is

A {Day|Week} In {A|The} Life

The only possible outputs would be

A Week In A Life
A Day In A Life
A Week In The Life
A Day In The Life

So the simplest signature records which option was chosen in each of the two snippets that were spun. The signatures for the above lines would be

2,1
1,1
2,2
1,2

respectively.
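
As a minimal sketch of the idea in Python (not the application's actual code), the function below spins a template that has no nested groups and records the 1-based option number chosen for each snippet. The names `GROUP` and `spin_flat` are made up for this illustration, and the comma-separated output format is simply the one shown above.

```python
import random
import re

# Hypothetical helper: matches one non-nested {a|b|c} group.
GROUP = re.compile(r"\{([^{}]*)\}")


def spin_flat(template, rng=random):
    """Spin a template without nested groups; return (text, signature)."""
    picks = []

    def choose(match):
        options = match.group(1).split("|")
        index = rng.randrange(len(options))
        picks.append(index + 1)  # option numbers are 1-based, as in the post
        return options[index]

    # re.sub calls choose() once per group, left to right.
    text = GROUP.sub(choose, template)
    return text, ",".join(str(n) for n in picks)


text, signature = spin_flat("A {Day|Week} In {A|The} Life")
print(signature, "->", text)
```

Each run prints one of the four lines listed above together with its signature; which one depends on the random choices.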

This is easily read and understood, and it lets a reader work out what was used where, so it looks like a good idea to stick to exactly that format for now. Maybe later, for efficiency, bit arrays could be used instead, but for now, to get this working, we’ll store them in that fashion.
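
The post does not spell out what the bit array version would look like, so the following is only one possible reading: a Python sketch that packs a signature into a single integer, giving each snippet just enough bits for its number of options. The function names and the `option_counts` parameter (how many options each spun snippet has) are assumptions made for this example.

```python
def pack(signature, option_counts):
    """Pack a signature such as '2,1' into one integer."""
    packed = 0
    picks = [int(n) for n in signature.split(",")]
    for pick, count in zip(picks, option_counts):
        width = max(1, (count - 1).bit_length())  # bits needed for 0..count-1
        packed = (packed << width) | (pick - 1)
    return packed


def unpack(packed, option_counts):
    """Reverse pack(): turn the integer back into a signature string."""
    picks = []
    for count in reversed(option_counts):
        width = max(1, (count - 1).bit_length())
        picks.append((packed & ((1 << width) - 1)) + 1)
        packed >>= width
    return ",".join(str(n) for n in reversed(picks))


counts = [2, 2]  # two snippets with two options each
number = pack("2,1", counts)
print(number, unpack(number, counts))
```

For the two-snippet template above, every signature fits in two bits, so a long list of produced documents could be stored as a list of small integers.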

Although the situation is more complex when you want a spin within a spin, I found that parsing a template into an array of snippets was really helpful and dealt with it easily. Each time a pair of delimiters is found, the text between them is added to the array of snippets. If the snippets themselves contain delimiters then more snippets are created to deal with them.

So the following template will produce an array of three snippets

This product is {top|the {highest sales|best performing} widget} of its range

0: This product is {1} of its range
1: top|the {2} widget
2: highest sales|best performing

So the signature of this template will contain 2 numbers, for snippets 1 and 2. Snippet 0 does not contain any ‘|’ split delimiter, so it needn’t be recorded. The number of possible signatures for this template is therefore exactly the same as for the earlier one: only 4. Note that when ‘top’ is chosen in snippet 1, snippet 2 is never used, so two of those signatures (1,1 and 1,2) produce the same output.
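
The parsing step can be sketched in Python as follows. This is an illustration under my own assumptions rather than the application's real parser: `parse_template` is a made-up name, `{` and `}` are taken to be the only delimiters, and each group is numbered in the order its opening brace is met.

```python
def parse_template(template):
    """Split a spin template into a flat list of snippets.

    Every {...} group becomes its own snippet, and its parent keeps a
    numbered placeholder such as {1} that points at it.
    """
    snippets = [[]]  # snippet 0 collects the top-level text
    stack = [0]      # indexes of the snippets that are currently open
    for ch in template:
        if ch == "{":
            snippets.append([])
            index = len(snippets) - 1
            snippets[stack[-1]].append("{%d}" % index)
            stack.append(index)
        elif ch == "}":
            if len(stack) == 1:
                raise ValueError("unmatched '}' in template")
            stack.pop()
        else:
            snippets[stack[-1]].append(ch)
    if len(stack) != 1:
        raise ValueError("unmatched '{' in template")
    return ["".join(parts) for parts in snippets]


template = ("This product is {top|the {highest sales|best performing} widget}"
            " of its range")
for number, snippet in enumerate(parse_template(template)):
    print("%d: %s" % (number, snippet))
```

Run on the template above, this prints the same three numbered snippets. A stack keeps track of which snippet is open, so a spin within a spin needs no special handling.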

What are the advantages of these signatures?

Well, their neat fit with the template means that you could store a record of the documents that have been produced very efficiently (more efficiently still if bit arrays are used, but that’s for later). So you would have a template and a list of signatures that define which documents have already been done, meaning you can work in separate stages. You could, for example, generate a new unique document some time later, whenever you needed it, say when submitting to directories over an extended period of time.

It also means that you do not need to produce everything at once; you only create these documents when they are needed. That is useful if you are working with several different documents at once, since you don’t have to produce all the output in marathon sessions.

Another advantage comes back to ensuring uniqueness in the application. The signatures themselves can be the key: the application can avoid reproducing the same article simply by checking the existing signatures. I still need to work this out in a future post, but I have had a few ideas for keeping it as efficient and simple as I can.
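
The details are left for a future post, so the following Python sketch is only one simple way the check could work, not the planned design. It assumes the used signatures are kept in a set of strings and that `option_counts` (a name invented here) lists how many options each spun snippet has.

```python
import itertools


def next_unused_signature(option_counts, used):
    """Return the first signature not in `used`, or None if all are taken."""
    ranges = [range(1, count + 1) for count in option_counts]
    for combo in itertools.product(*ranges):
        signature = ",".join(str(n) for n in combo)
        if signature not in used:
            return signature
    return None


used = {"1,1", "1,2"}  # documents already produced
signature = next_unused_signature([2, 2], used)
if signature is not None:
    used.add(signature)
print(signature)
```

Walking through every combination in order is fine for small templates. A template with many snippets would probably call for picking random signatures and retrying on a clash instead.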

The biggest advantage is that the signatures are much smaller than the documents they describe, so they use less memory and could be kept in a spreadsheet or database to show which documents have been produced.

Another advantage is that the same output document can be produced again when needed by applying the signature to the template. In practice this means that keeping loads of output files is not really necessary, and you can still look back at earlier documents that you produced.
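
Here is a hedged Python sketch of applying a signature to a template. It assumes the template has already been parsed into the numbered snippet array described earlier, and that the signature holds one number for each snippet containing a ‘|’, in snippet order. The names `render` and `PLACEHOLDER` are invented for this example.

```python
import re

# Hypothetical placeholder format: {1}, {2}, ... pointing at other snippets.
PLACEHOLDER = re.compile(r"\{(\d+)\}")


def render(snippets, signature):
    """Rebuild a document from parsed snippets and a signature like '2,1'."""
    picks = [int(n) for n in signature.split(",")]
    # Only snippets that contain '|' take a number from the signature.
    spun = [i for i, snippet in enumerate(snippets) if "|" in snippet]
    if len(picks) != len(spun):
        raise ValueError("signature does not match template")
    choice = dict(zip(spun, picks))

    def expand(index):
        text = snippets[index]
        if index in choice:
            options = text.split("|")
            pick = choice[index]
            if not 1 <= pick <= len(options):
                raise ValueError("option %d out of range for snippet %d"
                                 % (pick, index))
            text = options[pick - 1]
        # Replace each placeholder with its own expanded snippet.
        return PLACEHOLDER.sub(lambda m: expand(int(m.group(1))), text)

    return expand(0)


snippets = [
    "This product is {1} of its range",
    "top|the {2} widget",
    "highest sales|best performing",
]
print(render(snippets, "2,1"))
```

Because the same snippets and the same signature always expand to the same text, only the template and the list of signatures need to be kept.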

What are the disadvantages?

Well, I have been looking around for the best scripting technology to put into the article template. Scripting is actually the feature I am personally most looking forward to using because, although spinning gives colour and uniqueness to the final document if used correctly, the most power comes when the document is driven by the data you have collected.

It also means that uniqueness of the signature is of less importance, because the material then gets fed from data that is constantly changing. And it breaks the rule I stated at the beginning, that “a template and a signature should be all that is needed to create the final document”.

What would need to be done is to add in the smallest set of data (the decision points, or the narrowest part of the decision funnel) that would be needed to recreate the output article. There are more interesting ideas there, but we’ll leave them at the back of my mind for the moment.

Actually, the most troublesome bit at the moment is deciding on the technologies to use… Things are not as clear-cut as they were a few years ago, and I get this urge to bring back a 10-year-old technology, VBScript, just because of its simplicity for any user…

Yet another post (or more) there, I am sure.

So Version 1.0.2 will have signatures; they will be a useful feature going forward.
