Python Scripting with Scribus

en+emdash.py

Here is another script which came about as a modification of an existing one, in this case Autoquote.py. I'm presenting this one first because it is shorter, and hopefully simpler to understand. Both scripts have at their core the idea of analyzing the content of a text frame, character by character, with some transformation in mind. In Autoquote.py, the idea was to transform typewriter quotes, such as we have on our keyboards, into typographic quotes. This began as a question on the mailing list about whether the conversion could be automated. Most word processors do this automatically as you type, but Scribus does not, whether you are typing directly into a text frame or, as is more often the case, loading a text file into the frame.

The task of en+emdash.py is to transform typewriter hyphens into typographic endashes and emdashes. The idea is that a single hyphen stays a hyphen, two in a row are changed to an endash, and three in a row become an emdash.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# File: en+emdash.py - convert hyphens to en and em dashes

"""
USAGE

You must have a document open, and a text frame selected.
There are no dialogs. The presumption is that you have encoded a single
hyphen to mean a single hyphen, two hyphens to mean an en-dash, and three
hyphens to mean an em-dash.

"""
import sys      # needed for the sys.exit() calls below
import scribus

if not scribus.haveDoc():
    scribus.messageBox('Usage Error', 'You need a Document open', icon=0, button1=1)
    sys.exit(2)

c = 0  # index of the character currently being examined

if scribus.selectionCount() == 0:
    scribus.messageBox('Scribus - Usage Error',
        "There is no object selected.\nPlease select a text frame and try again.",
        scribus.ICON_WARNING, scribus.BUTTON_OK)
    sys.exit(2)
if scribus.selectionCount() > 1:
    scribus.messageBox('Scribus - Usage Error',
        "You have more than one object selected.\nPlease select one text frame and try again.", scribus.ICON_WARNING, scribus.BUTTON_OK)
    sys.exit(2)
textbox = scribus.getSelectedObject()
pageitems = scribus.getPageItems()
boxcount = 1
for item in pageitems:
    if (item[0] == textbox):
        if (item[1] != 4):
            scribus.messageBox('Scribus - Usage Error', "This is not a textframe. Try again.", scribus.ICON_WARNING, scribus.BUTTON_OK)
            sys.exit(2)
contents = scribus.getTextLength(textbox)

ndash = u"\u2013"
mdash = u"\u2014"
prevchar = ''

while 1:
    if ((c == contents) or (c > contents)): break
    if ((c + 1) > contents - 1):
        nextchar = ' '
    else:
        scribus.selectText(c+1, 1, textbox)
        nextchar = scribus.getText(textbox)
    scribus.selectText(c, 1, textbox)
    char = scribus.getText(textbox)
    if (len(char) != 1):
        c += 1
        continue
    if (prevchar == chr(45)):
      if (ord(char) == 45):
          scribus.deleteText(textbox)
          char = ndash
          c -= 1
      else:
          scribus.deleteText(textbox)
          scribus.insertText(chr(45) + char, c, textbox)
          c += 1
            
    elif (prevchar == ndash):
      if (ord(char) == 45):
          scribus.deleteText(textbox)
          scribus.insertText(mdash, c, textbox)
          char = mdash
      else:
          scribus.deleteText(textbox)
          scribus.insertText(ndash, c, textbox)
          c += 1
          scribus.insertText(char, c, textbox)
      
    else:
      if (ord(char) == 45):
          scribus.deleteText(textbox)
          c -= 1
      else:
          scribus.deleteText(textbox)
          scribus.insertText(char, c, textbox)
            
    c += 1
    prevchar = char
    contents = scribus.getTextLength(textbox)

scribus.setRedraw(1)
scribus.docChanged(1)
scribus.messageBox("Finished", "That should do it!",icon=scribus.ICON_NONE,button1=scribus.BUTTON_OK)

We start out checking for an open document, and then whether some object is selected. Just as we did with centervert.py, we then check to make sure this is a text frame. Once these conditions are satisfied, we get the length of the text in the frame, because we're going to use it as the upper limit when indexing the characters. We then define our variables for the typographic endash and emdash. In case you're wondering, the u"\u2013" notation is how you specify a Unicode character by its code point in Python. Incidentally, if you wanted to enter an endash on the main canvas, you can use its Unicode value: press Ctrl+Shift+U, and then enter 2013.
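As a quick check of that notation, here is a small sketch that can be run in any Python interpreter, outside Scribus. It only uses the standard unicodedata module to print the code point and official name of each of the three characters the script deals with.

```python
import unicodedata

ndash = u"\u2013"   # same definitions as in the script
mdash = u"\u2014"

# chr(45) is the plain keyboard hyphen the script tests for
for ch in (chr(45), ndash, mdash):
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))
```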

Our while loop, specified with while 1:, is going to keep running until we break it internally, which comes in the next line when our indexing goes beyond the length of our text, where c is our indexing variable. In Python, the first character of some variable mytext could be specified by mytext[0]. Since we have specified the count of the characters as contents, the last character is going to be mytext[contents - 1]. I chose to work with 3 characters, the current character, called char, the previous one, called prevchar, and the following character, nextchar.
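To make the indexing concrete, here is a sketch on an ordinary Python string rather than a text frame. The function name windows() and the sample text are my own, for illustration only; the variable names prevchar, char and nextchar match the script, and like the script it uses a space as nextchar when there is no following character.

```python
mytext = "ab--cd"
contents = len(mytext)

first = mytext[0]              # first character
last = mytext[contents - 1]    # last character


def windows(text):
    """Return (prevchar, char, nextchar) for every position in text."""
    result = []
    prevchar = ''
    for c in range(len(text)):
        char = text[c]
        # past the end there is no next character, so use a space
        if (c + 1) > len(text) - 1:
            nextchar = ' '
        else:
            nextchar = text[c + 1]
        result.append((prevchar, char, nextchar))
        prevchar = char
    return result


for triple in windows(mytext):
    print(triple)
```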

Look at these two lines:

    scribus.selectText(c, 1, textbox)
    char = scribus.getText(textbox)

This is what you must do to pluck one or more characters from somewhere in the middle of a text frame. You first select the character(s) with selectText(), then call getText(), which returns whatever has been selected. Later on, you can see that when we get ready to substitute a character, we simply call deleteText(), which deletes whatever is selected. The variable names (char, nextchar) then keep track of what was selected.
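The select-then-act pattern could be wrapped in small helpers, sketched below. The helper names char_at() and replace_at() are hypothetical and not part of Scribus; the api argument is assumed to be the scribus module, passed in as a parameter only so that the helpers can be tried outside Scribus with a stand-in object. The four calls used are the same ones that appear in the script.

```python
def char_at(api, index, frame):
    """Return the character at position index in the named text frame.

    api is assumed to be the scribus module; frame is the frame's name.
    """
    api.selectText(index, 1, frame)   # select exactly one character
    return api.getText(frame)         # returns whatever is selected


def replace_at(api, index, new_text, frame):
    """Replace the character at position index with new_text."""
    api.selectText(index, 1, frame)   # the selection decides what is deleted
    api.deleteText(frame)             # deletes the selected character
    api.insertText(new_text, index, frame)
```

Inside Scribus, a call might look like char_at(scribus, 0, textbox), where textbox is the frame name returned by getSelectedObject().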

Overall, the script works in the following way. As we march through the text frame with our index, we check whether char is a hyphen (in Python, chr(45)). If it is, we look at the previous character (prevchar). If that is also a hyphen, the two hyphens are replaced with an endash. If prevchar is instead an endash, that is the same as saying there were three hyphens in a row, so the endash and the hyphen after it are replaced with an emdash.

When these substitutions occur, we have effectively shortened the text in the text frame, so to avoid skipping any characters we have to adjust our indexing with c -= 1 whenever we substitute endashes and emdashes for hyphens. This also explains why we need to refresh the length with getTextLength() at the end of the while loop.
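The effect of shrinking text on the index can be tried out on a plain Python list of characters. This is a simplified variant of the idea, not the script's exact bookkeeping: the function name convert_in_place() is my own, it works on a list instead of a text frame, and it replaces characters directly instead of deleting and reinserting them. What it does share with the script is the c -= 1 adjustment after each substitution and the fact that the length is read again on every pass.

```python
HYPHEN = chr(45)
NDASH = u"\u2013"
MDASH = u"\u2014"


def convert_in_place(chars):
    """Convert runs of hyphens in a list of single characters."""
    c = 0
    while c < len(chars):                 # length is re-read on every pass
        if c > 0 and chars[c] == HYPHEN:
            if chars[c - 1] == HYPHEN:
                chars[c - 1:c + 1] = [NDASH]   # two items become one
                c -= 1                         # list is shorter, step back
            elif chars[c - 1] == NDASH:
                chars[c - 1:c + 1] = [MDASH]   # endash + hyphen -> emdash
                c -= 1
        c += 1
    return chars


sample = list("pages 4--7 --- see note")
print(u"".join(convert_in_place(sample)))
```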

This script makes use of the docChanged() command. The reason, in theory, is to let Scribus know that you have modified the document, so that if you try to close it or shut down Scribus, you get a warning about saving the document. The last command, messageBox(), is just there to show that the script has finished successfully.

As I look at this script and analyze it, I can see that I never actually made use of the variable nextchar. This is a holdover from Autoquote.py, but it harms nothing here. Another thing about this script is that there is a certain kind of "error" it doesn't handle, which is when you have four or more hyphens in a row. One of the reasons I didn't work on that is that it only raises the question of what the script should do in that case. Another reason is that this is supposed to be working on a conscientiously prepared body of text, where someone has put in double or triple hyphens expressly for this purpose. Here is what some experimentation shows:

4 dashes = emdash then hyphen
5 dashes = emdash then endash
6 dashes = emdash then emdash
7 dashes = emdash then emdash then hyphen

So extrapolating from this, you will have as many emdashes in a row as possible, then either nothing or a hyphen or an endash. You won't ever see an endash then a hyphen because that would be made into an emdash.
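That extrapolated rule can be written down as a short function. This is only a sketch of the pattern observed above, not code taken from the script, and the name dashes_for_run() is my own: every full group of three hyphens becomes an emdash, and the remainder decides what follows.

```python
def dashes_for_run(n):
    """Predict the result for a run of n hyphens, per the pattern above."""
    # remainder 0 -> nothing, 1 -> hyphen, 2 -> endash
    tail = {0: u"", 1: u"-", 2: u"\u2013"}
    return u"\u2014" * (n // 3) + tail[n % 3]


for n in range(1, 8):
    print(n, dashes_for_run(n))
```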