Sunday, February 16, 2014

Perl: Matching non-ASCII characters in a converted RTF

I have a data file that was converted from RTF to TXT.  When I started trying to parse it using Perl, my regular expressions weren't able to split up lines that looked like they had whitespace delimiters; they would just ignore the whitespace.

After my initial confusion, I figured that the whitespace must be something other than an ASCII space character, tab, etc.  With some experimentation, I noticed that each stretch of "whitespace" was actually made up of several bytes.

To try to figure out what the bytes/characters were, I wrote a little Perl code segment that looked like this:

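# Find each run of unexpected characters, print its bytes in hex, then
# replace it so that the next pass through the loop finds a different run.
while ($filecontents =~ /([^\d\w\s\t\.:;&\,\-\(\)]+)/){
    my $f = $1;
    my $d = $1;
    $f =~ s/(.)/sprintf("%x ",ord($1))/eg;
    print "f is $f\n";
    # \Q...\E makes sure the run is matched as literal text, not as a pattern
    $filecontents =~ s/\Q$d\E/zzz/g;
}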

Basically, the code goes through the file, finds runs of oddball characters and prints their bytes in hex.  When I ran it, it produced the following:

   f is e2 80 83
   f is e2 80 a8
   f is e2 81 84
   f is c2 b0 

Note that each of those looks like a multi-byte character, but what are they?

Well, I do love the internet.  I cut and pasted e2 80 83 into Google and found that it was the UTF-8 encoding of an "em space", aka Unicode character U+2003.
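Perl can also do this lookup itself.  The sketch below assumes the hex dumps above are valid UTF-8 and uses the core Encode and charnames modules; the helper name describe_bytes is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);
use charnames ();   # needed for charnames::viacode

# Hypothetical helper: turn a hex dump such as "e2 80 83" into a string
# like "U+2003 EM SPACE". Assumes the bytes are valid UTF-8.
sub describe_bytes {
    my ($hex) = @_;

    # Rebuild the raw byte string from the space-separated hex values.
    my $bytes = join '', map { chr(hex($_)) } split ' ', $hex;

    # Decode the bytes into characters (code points).
    my $chars = decode('UTF-8', $bytes);

    my @descriptions;
    for my $char (split //, $chars) {
        my $code = ord($char);
        my $name = charnames::viacode($code) // '(unnamed)';
        push @descriptions, sprintf('U+%04X %s', $code, $name);
    }
    return join ', ', @descriptions;
}

for my $dump ('e2 80 83', 'e2 80 a8', 'e2 81 84', 'c2 b0') {
    print "$dump => ", describe_bytes($dump), "\n";
}
```

Run against the four dumps, this reports an em space, a line separator, a fraction slash and a degree sign.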

Once I was able to get the Unicode character, I could just replace all of the em spaces with a regular space, and the rest of my program worked as designed.  Same idea with the other special characters.  Two of those characters were not whitespace, but were non-ASCII characters as well (fraction slash and degree symbol).

Note that, at least in my case, I had to match using the hex bytes rather than the Unicode code point.  In other words:

    $filecontents =~ s/\xe2\x80\x83/ /g;   # em space (U+2003)
    $filecontents =~ s/\xe2\x80\xa8/ /g;   # line separator (U+2028)

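I originally assumed this was a UTF-16 versus UTF-8 issue, but the more likely explanation is that I read the file as raw bytes and never decoded it.  To Perl, each em space was then three separate bytes (e2 80 83) rather than the single character \x{2003}, so only the byte-level match could succeed.  For next time, I should try decoding the file as UTF-8 when I read it (for example with an :encoding(UTF-8) layer on open).  Maybe it would be easier :)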
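Here is a minimal sketch of the difference between matching raw bytes and matching decoded characters, assuming the data really is UTF-8.  The sample string is made up and stands in for the file contents.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for raw file contents:
# "12", an em space, "34", a line separator, "56".
my $bytes = "12\xe2\x80\x8334\xe2\x80\xa856";

# On raw bytes the em space is three separate bytes, so a match on the
# code point \x{2003} finds nothing.
my $raw_hits = () = $bytes =~ /\x{2003}/g;

# After decoding, each character is a single code point and can be
# matched directly.
my $text = decode('UTF-8', $bytes);
(my $cleaned = $text) =~ s/[\x{2003}\x{2028}]/ /g;

# In a decoded string, \s also matches these Unicode whitespace
# characters, so a plain whitespace split works without any replacement.
my @fields = split /\s+/, $text;

print "raw matches: $raw_hits\n";
print "cleaned: $cleaned\n";
print "fields: @fields\n";
```

With a real file, opening it with the :encoding(UTF-8) layer should give the same decoded text without the explicit decode call.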

Monday, February 10, 2014

UIWebView, URL history, and redirects

Sometimes it seems the simplest things turn out to be much more complicated than they should be.

In several of our apps, we include help files written in HTML which are loaded locally from the bundle into a UIWebView.  Sometimes those help files contain links to web pages.

The problem is that UIWebView doesn't treat locally loaded webpages as part of the history stack.  Thus if the user clicks a link and visits a web page, there is no simple way for the app to return to the original locally loaded HTML file because [webview canGoBack] returns NO.  Grr.

My first attempt to deal with the issue was to just reload the local file if [webview canGoBack] was NO.  However, that breaks down if the same locally loaded HTML file contains links to two web pages and the user visits each.  On the attempt to return from the 2nd webpage, [webview canGoBack] will return YES and [webview goBack] will display the first webpage, because the first webpage was never removed from the UIWebView's history, and there's apparently no way (that I was able to find) to get rid of it.  Grrr.

The next step was to try to maintain my own count of visited URLs and back out of them as the user clicked the back button.  I implemented this by adding to the count in the UIWebView delegate method shouldStartLoadWithRequest when the navigationType was UIWebViewNavigationTypeLinkClicked, and subtracting from it with each click of the back button.  Great!  Except that redirects also load with a navigationType of UIWebViewNavigationTypeLinkClicked, so the link count was incremented for each redirect and the user would have to click "back" several times for no clear reason.  Grrrr.

I was finally able to make it work with some insight from this tutorial.  The key thing I learned was that the webViewDidFinishLoad delegate method is not called until the redirects have been resolved.  (But it may be called several times as the contents of the page load.)

Below are the key elements of the code:

- (BOOL)webView:(UIWebView *)webView shouldStartLoadWithRequest:(NSURLRequest *)request navigationType:(UIWebViewNavigationType)navigationType
{
    // we use the linkClicked boolean because with redirects, this method is still called multiple times and we only want
    // to increment once.
    if (linkClicked == NO && navigationType == UIWebViewNavigationTypeLinkClicked){
        linkClicked = YES;

        if (linkStack == 0){
            // leaving the local file: remember where the user was
            scrollOffset = webView.scrollView.contentOffset;
        }
        // the increment mentioned above: one more page to back out of
        linkStack++;
    }
    return YES;
}

- (void)webViewDidFinishLoad:(UIWebView *)webView
{
    if (linkStack == 0 && scrollOffset.y > 0){
        // reset the scroll if we are coming back to a page after a clicked link.
        [wv.scrollView setContentOffset:scrollOffset];
        scrollOffset = CGPointMake(0, 0);
    }
    NSURLRequest* request = [webView request];
    if ([[request mainDocumentURL] isEqual:lastMainDoc]){
        // same page as last time (presumably a repeat callback): nothing more to do
        return;
    }
    linkClicked = NO;
    [self setLastMainDoc:[request mainDocumentURL]];
    NSLog(@"finished loading %@", [[request mainDocumentURL] absoluteString]);
}

// Back button action.  The method name here is only a placeholder.
- (IBAction)backButtonTapped:(id)sender
{
    if (linkStack > 0){
        if (linkStack == 1 || [wv canGoBack] == NO){
            [self loadFile];

        } else {
            [wv goBack];
        }
        // the decrement described above: one less page to back out of
        linkStack--;
    }
}

- (void)loadFile
{
    // load local HTML file
}

I'm sure there are a few other ways to address this problem, but this is what worked for me....